Dec 24, 2021 Written by Adam Goldschmidt

How I found (and fixed) a vulnerability in Python

Following James Kettle's research on web cache poisoning, I decided to deepen my knowledge of the field and look for these vulnerabilities in the open source domain. I focused my research on some of the most popular Python web frameworks, such as Flask, Bottle, and Tornado. I never imagined that this research would end with me fixing a security vulnerability in Python 3.9.

But wait - let's start at the beginning. As part of my research, I set up local instances of these frameworks so I could try to exploit them. Many of them turned out to be vulnerable, but Tornado caught my attention, because its maintainer told me that they were using Python’s standard library to parse the URL.

Python’s source code

When I looked at Python’s source code, it became clear to me that the vulnerability was much more critical and far-reaching than I had thought - every package that relied on Python’s standard library to parse query strings was vulnerable.

Python’s URL parsing module (urllib.parse, formerly urlparse) treated the semicolon as a query parameter separator, just like the ampersand - whereas most proxies recognized only ampersands as separators. That meant that an attacker who separated query parameters with a semicolon (;) could cause the proxy (running with its default configuration) and the server to interpret the same request differently, resulting in malicious requests being cached as safe ones.

Exploitation example

GET /?link=http://google.com&utm_content=1;link='><t>alert(1)</script> HTTP/1.1
Host: somesite.com
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

urlparse saw 3 parameters here: link, utm_content, and then link again. The proxy, on the other hand, treated the full string 1;link='><t>alert(1)</script> as the value of utm_content. Since tracking parameters such as utm_content are typically left out of the cache key, the key would have contained only somesite.com/?link=http://google.com, even though the server also received the injected second link parameter.
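To make the mismatch concrete, here is a minimal sketch of the two interpretations in Python. It does not call the standard library parser; split_pairs and cache_key are hypothetical helpers written for illustration, and the rule that utm_* parameters are left out of the cache key is an assumption about a typical proxy configuration, not a description of any specific product.

```python
# Query string from the request above.
QUERY = "link=http://google.com&utm_content=1;link='><t>alert(1)</script>"


def split_pairs(query, separators):
    """Split a query string into (name, value) pairs.

    Every character in `separators` is treated as a field separator.
    Hypothetical helper for illustration only.
    """
    primary = separators[0]
    for extra in separators[1:]:
        query = query.replace(extra, primary)
    pairs = []
    for field in query.split(primary):
        if not field:
            continue
        name, _, value = field.partition("=")
        pairs.append((name, value))
    return pairs


def cache_key(pairs, host="somesite.com"):
    """Build a cache key that ignores utm_* parameters (assumed proxy rule)."""
    keyed = [f"{name}={value}" for name, value in pairs if not name.startswith("utm_")]
    return f"{host}/?" + "&".join(keyed)


# The proxy splits on "&" only; the old server-side parsing split on "&" and ";".
proxy_view = split_pairs(QUERY, "&")
legacy_view = split_pairs(QUERY, "&;")

print(proxy_view)
print(legacy_view)
print(cache_key(proxy_view))
```

The proxy view has two parameters and the legacy server view has three, yet the cache key is built only from what the proxy sees.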

I immediately contacted the Python security team and opened a bug ticket. I also created a pull request on the CPython repository. It took about a month of going back and forth on the PR, during which I learned to adhere to the rules for Python contributors - and it got merged 🎉 on Feb 15 and released on Feb 19. The fix was backported to older versions of Python as well.
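For readers who want to check their own interpreter, here is a short sketch of the behavior after the fix. It assumes a Python version that includes the patch, where urllib.parse.parse_qsl splits on a single separator (the ampersand by default) and accepts a separator keyword argument. The query strings are shortened, made-up examples.

```python
from urllib.parse import parse_qsl

# Shortened, made-up query string in the same shape as the exploit above.
query = "link=http://google.com&utm_content=1;link=evil"

# On a patched interpreter only "&" separates parameters by default,
# so the semicolon stays inside the value of utm_content.
default_pairs = parse_qsl(query)
print(default_pairs)

# Applications that really do use ";" as their separator can opt in explicitly.
semicolon_pairs = parse_qsl("a=1;b=2", separator=";")
print(semicolon_pairs)
```

On an unpatched interpreter the separator argument does not exist and the first call returns three pairs instead of two, which makes this a quick way to tell the two apart.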

The moral of the story is to always strive to dig deeper. You think you found something interesting? Challenge your hypothesis, think about the root cause, and try to trace it further along the chain. That might lead to even more fascinating results.
