NOT THE PYPI PACKAGE YOU'RE LOOKING FOR

Latest attack on PyPI users shows crooks are only getting better

The code found in the malicious packages closely resembled legit offerings.


More than 400 malicious packages were recently uploaded to PyPI (Python Package Index), the official code repository for the Python programming language, in the latest indication that the targeting of software developers using this form of attack isn’t a passing fad.

All 451 packages found recently by security firm Phylum contained almost identical malicious payloads and were uploaded in bursts in quick succession. Once installed, the packages create a malicious JavaScript extension that loads each time a browser is opened on the infected device, a trick that lets the malware persist across reboots.

The JavaScript monitors the infected developer’s clipboard for any cryptocurrency addresses that may be copied to it. When an address is found, the malware replaces it with an address belonging to the attacker. The objective: intercept payments the developer intended to make to a different party.

In November, Phylum identified dozens of packages, downloaded hundreds of times, that used highly encoded JavaScript to surreptitiously do the same thing. Specifically, it:

  • Created a textarea on the page
  • Pasted any clipboard contents to it
  • Used a series of regular expressions to search for common cryptocurrency address formats
  • Replaced any identified addresses with the attacker-controlled addresses in the previously created textarea
  • Copied the textarea to the clipboard

“If at any point a compromised developer copies a wallet address, the malicious package will replace the address with an attacker-controlled address,” Phylum Chief Technical Officer Louis Lang wrote in the November post. “This surreptitious find/replace will cause the end user to inadvertently send their funds to the attacker.”
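The regular-expression step in the list above can be illustrated with a minimal Python sketch. The two patterns are assumptions chosen for illustration (a legacy Base58 Bitcoin address and a hex Ethereum address); the article does not reproduce the expressions the malware actually used, and the name find_wallet_addresses is hypothetical. The sketch only identifies address-like strings, which is also how a defender might flag clipboard text that deserves a second look before a payment is sent.

```python
import re

# Illustrative patterns only; these are assumptions, not the malware's expressions.
ADDRESS_PATTERNS = {
    "bitcoin_legacy": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
    "ethereum": re.compile(r"\b0x[a-fA-F0-9]{40}\b"),
}


def find_wallet_addresses(text):
    """Return (label, match) pairs for every address-like string in text."""
    found = []
    for label, pattern in ADDRESS_PATTERNS.items():
        for match in pattern.findall(text):
            found.append((label, match))
    return found


# Synthetic strings that merely fit the formats; they are not real wallets.
sample_btc = "1" + "A" * 33
sample_eth = "0x" + "ab" * 20
clipboard_text = f"pay {sample_btc} or {sample_eth} today"

print(find_wallet_addresses(clipboard_text))
```

Comparing the address that was copied with the address that was pasted remains the simplest check against this kind of silent substitution.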

New obfuscation method

Besides vastly increasing the number of malicious packages uploaded, the latest campaign also uses a significantly different way to cover its tracks. Whereas the packages disclosed in November used encoding to conceal the behavior of the JavaScript, the new packages write function and variable identifiers in what appear to be random 16-bit combinations of Chinese language ideographs found in the following table:

Unicode code point   Ideograph   Definition
0x4eba               人          man; people; mankind; someone else
0x5200               刀          knife; old coin; measure
0x53e3               口          mouth; open end; entrance, gate
0x5973               女          woman, girl; feminine
0x5b50               子          child; fruit, seed of
0x5c71               山          mountain, hill, peak
0x65e5               日          sun; day; daytime
0x6708               月          moon; month
0x6728               木          tree; wood, lumber; wooden
0x6c34               水          water, liquid, lotion, juice
0x76ee               目          eye; look, see; division, topic
0x99ac               馬          horse; surname
0x9a6c               马          horse; surname
0x9ce5               鳥          bird
0x9e1f               鸟          bird
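As a rough illustration of why this works as a naming scheme, the sketch below builds a random identifier from the code points in the table and confirms that Python accepts it as a variable name. The helper random_identifier, the length of 8, and the fixed seed are assumptions made for the example; the article does not say how the attackers generated their names.

```python
import random

# Code points taken from the table above.
CODE_POINTS = [
    0x4EBA, 0x5200, 0x53E3, 0x5973, 0x5B50, 0x5C71, 0x65E5, 0x6708,
    0x6728, 0x6C34, 0x76EE, 0x99AC, 0x9A6C, 0x9CE5, 0x9E1F,
]


def random_identifier(length, rng):
    """Join `length` randomly chosen ideographs into a single name."""
    return "".join(chr(rng.choice(CODE_POINTS)) for _ in range(length))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
name = random_identifier(8, rng)

# Python 3 permits non-ASCII identifiers, so the name is valid as written.
print(name, name.isidentifier())

# Assign to the generated name and read the value back.
namespace = {}
exec(f"{name} = 'win32'", namespace)
print(namespace[name])
```

To a reader who does not know the characters, names like these are hard to tell apart, but they have no effect on what the code does.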

Using this table, the line of code

''.join(map(getattr(__builtins__, oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + copyright.__str__()[4 << 0]), [(((1 << 4) - 1) << 3) - 1, ((((3 << 2) + 1)) << 3) + 1, (7 << 4) - (1 << 1), ((((3 << 2) + 1)) << 2) - 1, (((3 << 3) + 1) << 1)]))

retrieves the built-in function chr and maps it over the list of integers [119, 105, 110, 51, 50]. The line then joins the resulting characters into the string 'win32'.

Phylum researchers explained:

We can see a series of these kinds of calls oct.__str__()[-3 << 0]. The [-3 << 0] evaluates to [-3] and oct.__str__() evaluates to the string '<built-in function oct>'. Using Python’s index operator [] on a string with a -3 will grab the 3rd character from the end of the string, in this case '<built-in function oct>'[-3] will evaluate to 'c'. Continuing with this on the other 2 here gives us 'c' + 'h' + 'r' and simply evaluating the complex bitwise arithmetic tacked on to the end leaves us with:

''.join(map(getattr(__builtins__, 'c' + 'h' + 'r'), [119, 105, 110, 51, 50]))

The getattr(__builtins__, 'c' + 'h' + 'r') just gives us the built-in function chr and then it maps chr to the list of ints [119, 105, 110, 51, 50] and then joins it all together into a string ultimately giving us 'win32'. This technique is continued throughout the entirety of the code.
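The walk-through above can be reproduced piece by piece. The sketch below assumes CPython, where str(oct) is '<built-in function oct>'. It uses the builtins module in place of __builtins__, because the latter is a plain dict inside imported modules. It also reads copyright through getattr with a stand-in string, since that name is only injected by the site module. Both substitutions are conveniences for the example, not part of the original code.

```python
import builtins

# The shift expressions are ordinary integers in disguise.
index_c = -3 << 0  # -3
index_h = -1 << 2  # -4
index_r = 4 << 0   # 4

# `copyright` is normally added by the site module; fall back to a stand-in
# string with the same leading text if it is missing.
copyright_text = str(getattr(builtins, "copyright", "Copyright"))

# '<built-in function oct>'[-3] is 'c', '<built-in function hex>'[-4] is 'h',
# and 'Copyright ...'[4] is 'r'.
function_name = str(oct)[index_c] + str(hex)[index_h] + copyright_text[index_r]

# The integer expressions copied from the obfuscated line.
code_points = [
    (((1 << 4) - 1) << 3) - 1,
    ((((3 << 2) + 1)) << 3) + 1,
    (7 << 4) - (1 << 1),
    ((((3 << 2) + 1)) << 2) - 1,
    (((3 << 3) + 1) << 1),
]

decoded = "".join(map(getattr(builtins, function_name), code_points))
print(function_name, code_points, decoded)
```

Evaluating each fragment on its own, in an isolated environment rather than by running the whole package, is enough to peel this layer away.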

While giving the appearance of highly obfuscated code, the technique is ultimately easy to defeat, the researchers said, simply by observing what the code does when it runs.

The latest batch of malicious packages attempts to capitalize on typos developers make when downloading one of these legitimate packages:

  • bitcoinlib
  • ccxt
  • cryptocompare
  • cryptofeed
  • freqtrade
  • selenium
  • solana
  • vyper
  • websockets
  • yfinance
  • pandas
  • matplotlib
  • aiohttp
  • beautifulsoup
  • tensorflow
  • scrapy
  • colorama
  • scikit-learn
  • pytorch
  • pygame
  • pyinstaller

Packages that target the legitimate vyper package, for instance, used 13 names that omitted or duplicated a single character or transposed two adjacent characters of the correct name:

  • yper
  • vper
  • vyer
  • vype
  • vvyper
  • vyyper
  • vypper
  • vypeer
  • vyperr
  • yvper
  • vpyer
  • vyepr
  • vypre

“This technique is trivially easy to automate with a script (we leave this as an exercise for the reader), and as the length of the name of the legitimate package increases, so do the possible typosquats,” the researchers wrote. “For example, our system detected 38 typosquats of the cryptocompare package published nearly simultaneously by the user named pinigin.9494.”
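A minimal sketch of the kind of script the researchers allude to follows. It applies the three edits described above (omit one character, duplicate one character, transpose two adjacent characters). The function name typo_variants is hypothetical, and whether the attacker's tooling used exactly these rules is an assumption.

```python
def typo_variants(name):
    """Return names one omission, duplication, or adjacent swap away from name."""
    variants = set()
    for i in range(len(name)):
        variants.add(name[:i] + name[i + 1:])        # omit character i
        variants.add(name[:i] + name[i] + name[i:])  # duplicate character i
    for i in range(len(name) - 1):
        # transpose characters i and i + 1
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)  # swapping two identical letters changes nothing
    return variants


# The 13 names listed in the article for vyper.
listed = {
    "yper", "vper", "vyer", "vype", "vvyper", "vyyper", "vypper",
    "vypeer", "vyperr", "yvper", "vpyer", "vyepr", "vypre",
}

generated = typo_variants("vyper")
print(sorted(generated - listed))          # variants not in the article's list
print(len(typo_variants("cryptocompare")))
```

For vyper, these rules produce the 13 listed names plus one more, vypr. For cryptocompare they produce 38 names, the same count the researchers report, although that agreement alone does not confirm the attacker's method.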

Malicious packages whose names closely resemble those of legitimate packages have been turning up in legitimate code repositories since at least 2016, when a college student uploaded 214 booby-trapped packages with slightly modified names of legitimate packages to the PyPI, RubyGems, and NPM repositories. The result: The imposter code was executed more than 45,000 times on more than 17,000 separate domains, and more than half of those executions were given all-powerful administrative rights. So-called typosquatting attacks have flourished ever since.

The names of all 451 malicious packages the Phylum researchers found are included in the blog post. It’s not a bad idea for anyone who intended to download one of the legitimate packages targeted to double-check that they didn’t inadvertently obtain a malicious doppelganger.
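One way to run that double-check is to compare the names of installed distributions against a list of known-bad names. The sketch below is a minimal version under stated assumptions: KNOWN_BAD_NAMES holds only a few of the vyper typosquats named above as a placeholder (the full list is in the Phylum post), and the helper names are hypothetical. Name normalization follows the PEP 503 rule of lowercasing and collapsing runs of hyphens, underscores, and dots.

```python
import re
from importlib.metadata import distributions

# Placeholder sample; substitute the full list from the Phylum blog post.
KNOWN_BAD_NAMES = ["yper", "vper", "vyer", "vype", "vvyper", "vypre"]


def normalize(name):
    """Normalize a project name as PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_suspect_packages(installed_names, known_bad_names):
    """Return the installed names that match a known-bad name."""
    bad = {normalize(n) for n in known_bad_names}
    return sorted(n for n in installed_names if normalize(n) in bad)


def installed_distribution_names():
    """List the names of distributions visible to this interpreter."""
    names = []
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with missing metadata
            names.append(name)
    return names


if __name__ == "__main__":
    print(find_suspect_packages(installed_distribution_names(), KNOWN_BAD_NAMES))
```

An empty result only means that none of the listed names is installed in the environment that was checked; other virtual environments need to be checked separately.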

Channel Ars Technica