Latest attack on PyPI users shows crooks are only getting better


More than 400 malicious packages were recently uploaded to PyPI (Python Package Index), the official code repository for the Python programming language, in the latest indication that the targeting of software developers using this form of attack isn’t a passing fad.

All 451 packages found recently by security firm Phylum contained almost identical malicious payloads and were uploaded in bursts in quick succession. Once installed, the packages create a malicious JavaScript extension that loads each time a browser is opened on the infected device, a trick that lets the malware persist across reboots.

The JavaScript monitors the infected developer’s clipboard for any cryptocurrency addresses that may be copied to it. When an address is found, the malware replaces it with an address belonging to the attacker. The objective: intercept payments the developer intended to make to a different party.

In November, Phylum identified dozens of packages, downloaded hundreds of times, that used highly encoded JavaScript to surreptitiously do the same thing. Specifically, the JavaScript:

Created a textarea on the page
Pasted any clipboard contents to it
Used a series of regular expressions to search for common cryptocurrency address formats
Replaced any identified addresses with the attacker-controlled addresses in the previously created textarea
Copied the textarea to the clipboard

“If at any point a compromised developer copies a wallet address, the malicious package will replace the address with an attacker-controlled address,” Phylum Chief Technical Officer Louis Lang wrote in the November post. “This surreptitious find/replace will cause the end user to inadvertently send their funds to the attacker.”
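The regular-expression step is what makes the swap possible, and the same matching can be turned around to spot address-like strings before a payment is sent. The following is a minimal Python sketch, not the malware's JavaScript: the two patterns (legacy Bitcoin and Ethereum) are simplified assumptions chosen for illustration, and `find_addresses` is a hypothetical helper name.

```python
import re

# Simplified, illustrative patterns (assumptions, not the attacker's regexes).
ADDRESS_PATTERNS = {
    # Legacy Bitcoin: starts with 1 or 3, then 25-34 Base58 characters.
    "bitcoin-legacy": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
    # Ethereum: "0x" followed by exactly 40 hexadecimal digits.
    "ethereum": re.compile(r"\b0x[a-fA-F0-9]{40}\b"),
}


def find_addresses(text):
    """Return (kind, address) pairs for every address-like string in text."""
    hits = []
    for kind, pattern in ADDRESS_PATTERNS.items():
        hits.extend((kind, match) for match in pattern.findall(text))
    return hits
```

A wallet or clipboard tool could run a check like this on both the copied and the pasted text and ask for confirmation when the two addresses differ.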

New obfuscation method

Besides vastly increasing the number of malicious packages uploaded, the latest campaign also uses a significantly different way to cover its tracks. Whereas the packages disclosed in November used encoding to conceal the behavior of the JavaScript, the new packages write function and variable identifiers in what appear to be random 16-bit combinations of Chinese language ideographs found in the following table:

Unicode code point	Ideograph	Definition
0x4eba	人	man; people; mankind; someone else
0x5200	刀	knife; old coin; measure
0x53e3	口	mouth; open end; entrance, gate
0x5973	女	woman, girl; feminine
0x5b50	子	child; fruit, seed of
0x5c71	山	mountain, hill, peak
0x65e5	日	sun; day; daytime
0x6708	月	moon; month
0x6728	木	tree; wood, lumber; wooden
0x6c34	水	water, liquid, lotion, juice
0x76ee	目	eye; look, see; division, topic
0x99ac	馬	horse; surname
0x9a6c	马	horse; surname
0x9ce5	鳥	bird
0x9e1f	鸟	bird
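Python accepts such ideographs as identifiers, which is why the renamed code still runs. As a hedged illustration of how a reviewer might flag names like these, the sketch below walks a module's syntax tree and collects every identifier that contains non-ASCII characters. The helper name `non_ascii_identifiers` and the sample source are hypothetical, and non-ASCII names are not malicious in themselves, so a hit is only a prompt for a closer look.

```python
import ast


def non_ascii_identifiers(source):
    """Return the sorted set of identifiers in source that are not pure ASCII."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        # Names live in different fields depending on the node type:
        # Name.id, FunctionDef/ClassDef.name, arg.arg, Attribute.attr.
        for field in ("id", "name", "arg", "attr"):
            value = getattr(node, field, None)
            if isinstance(value, str) and not value.isascii():
                found.add(value)
    return sorted(found)


# Hypothetical sample using ideographs from the table above.
sample = "def 人山水(日月):\n    木 = 日月 + 1\n    return 木\n"
flagged = non_ascii_identifiers(sample)
```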

Using this table, the line of code

''.join(map(getattr(__builtins__, oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + copyright.__str__()[4 << 0]), [(((1 << 4) - 1) << 3) - 1, ((((3 << 2) + 1)) << 3) + 1, (7 << 4) - (1 << 1), ((((3 << 2) + 1)) << 2) - 1, (((3 << 3) + 1) << 1)]))

looks up the built-in function chr and maps it over the list of integers [119, 105, 110, 51, 50]. The line then joins the resulting characters into the string 'win32'.

Phylum researchers explained:

We can see a series of these kinds of calls oct.__str__()[-3 << 0]. The [-3 << 0] evaluates to [-3] and oct.__str__() evaluates to the string '<built-in function oct>'. Using Python’s index operator [] on a string with a -3 will grab the 3rd character from the end of the string, in this case '<built-in function oct>'[-3] will evaluate to 'c'. Continuing with this on the other 2 here gives us 'c' + 'h' + 'r' and simply evaluating the complex bitwise arithmetic tacked on to the end leaves us with:
''.join(map(getattr(__builtins__, 'c' + 'h' + 'r'), [119, 105, 110, 51, 50]))
The getattr(__builtins__, 'c' + 'h' + 'r') just gives us the built-in function chr and then it maps chr to the list of ints [119, 105, 110, 51, 50] and then joins it all together into a string ultimately giving us 'win32'. This technique is continued throughout the entirety of the code.
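The walkthrough can be reproduced step by step. The sketch below is an illustrative breakdown rather than Phylum's tooling, and it makes two substitutions, both assumptions for portability: it uses the `builtins` module instead of `__builtins__`, and it slices the literal word "Copyright" instead of calling `copyright.__str__()`, since `copyright` is only present when Python's `site` module has been loaded.

```python
import builtins

# Each letter of "chr" is sliced out of a built-in function's string form.
c = str(oct)[-3 << 0]      # '<built-in function oct>'[-3]
h = str(hex)[-1 << 2]      # '<built-in function hex>'[-4]
r = "Copyright"[4 << 0]    # stand-in for copyright.__str__()[4]

# The bit-shift arithmetic hides plain character codes.
codes = [
    (((1 << 4) - 1) << 3) - 1,
    ((((3 << 2) + 1)) << 3) + 1,
    (7 << 4) - (1 << 1),
    ((((3 << 2) + 1)) << 2) - 1,
    (((3 << 3) + 1) << 1),
]

# Look up the function by its reassembled name, then decode the string.
func = getattr(builtins, c + h + r)
decoded = "".join(map(func, codes))
```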

While giving the appearance of highly obfuscated code, the technique is ultimately easy to defeat, the researchers said, simply by observing what the code does when it runs.
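Running untrusted code to observe it calls for a sandbox. As a complementary approach, and a swapped-in technique rather than the one the researchers describe, the integer arithmetic can be folded statically without executing anything. The sketch below assumes the obfuscated expressions use only integer literals, shifts, addition, subtraction, and unary minus; `fold_source` is a hypothetical helper name.

```python
import ast
import operator

# Only these binary operators are folded; anything else is rejected.
_BIN_OPS = {
    ast.LShift: operator.lshift,
    ast.RShift: operator.rshift,
    ast.Add: operator.add,
    ast.Sub: operator.sub,
}


def _fold(node):
    """Recursively evaluate a node made only of integer arithmetic."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_fold(node.operand)
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_fold(node.left), _fold(node.right))
    raise ValueError("unsupported expression: " + ast.dump(node))


def fold_source(expression):
    """Parse an expression string and fold it to an int without running it."""
    return _fold(ast.parse(expression, mode="eval").body)
```

Because the evaluator rejects names, calls, and attribute access, it cannot be tricked into invoking code from the sample under analysis.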

The latest batch of malicious packages attempts to capitalize on typos developers make when downloading one of these legitimate packages:

bitcoinlib
ccxt
cryptocompare
cryptofeed
freqtrade
selenium
solana
vyper
websockets
yfinance
pandas
matplotlib
aiohttp
beautifulsoup
tensorflow
scrapy
colorama
scikit-learn
pytorch
pygame
pyinstaller
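One defensive use of a list like this is to check a requested package name against it before installing. The sketch below is an assumption-laden illustration, not a Phylum tool: it computes the optimal string alignment distance (insertions, deletions, substitutions, and adjacent transpositions) and flags any name that is exactly one edit away from a known package. `KNOWN_PACKAGES` holds only a few names from the list above, and `possible_targets` is a hypothetical helper name.

```python
KNOWN_PACKAGES = ["vyper", "pandas", "selenium", "cryptocompare", "colorama"]


def osa_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "vpyer" versus "vyper".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def possible_targets(name, known=KNOWN_PACKAGES):
    """Return known packages that name is exactly one edit away from."""
    return [pkg for pkg in known if osa_distance(name, pkg) == 1]
```

Note that a distance of one also covers single-character substitutions, which is broader than the typo types this campaign is described as using.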

Packages that target the legitimate vyper package, for instance, used 13 package names that omitted or duplicated a single character or transposed two adjacent characters of the correct name:

yper
vper
vyer
vype
vvyper
vyyper
vypper
vypeer
vyperr
yvper
vpyer
vyepr
vypre

“This technique is trivially easy to automate with a script (we leave this as an exercise for the reader), and as the length of the name of the legitimate package increases, so do the possible typosquats,” the researchers wrote. “For example, our system detected 38 typosquats of the cryptocompare package published nearly simultaneously by the user named pinigin.9494.”
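For defenders, the same enumeration is useful for building a watchlist of names to monitor or block. The sketch below is one possible take on the exercise, assuming only the three edit types described above (omit one character, duplicate one character, swap two adjacent characters). `typo_variants` is a hypothetical helper name, and nothing here is known to match the attacker's actual script.

```python
def typo_variants(name):
    """Return sorted one-edit typos of name: omissions, duplications, swaps."""
    variants = set()
    for i in range(len(name)):
        # Omit the character at position i.
        variants.add(name[:i] + name[i + 1:])
        # Duplicate the character at position i.
        variants.add(name[:i] + name[i] + name[i:])
    for i in range(len(name) - 1):
        # Swap the characters at positions i and i + 1.
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Swapping two identical neighbors reproduces the original name.
    variants.discard(name)
    return sorted(variants)


watchlist = typo_variants("vyper")
```

For vyper this yields 14 names: the 13 listed above plus vypr, the one single-character omission that does not appear in the list.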


The code found in the malicious packages closely resembled legit offerings.
