A Python security fix breaks (some) bignums
Typically, an urgent security release of a project is not prompted by a two-year-old CVE, but such is the case for the recent Python security releases, which cover four versions of the language. The bug is a denial of service (DoS) that can be caused by converting enormous numbers to strings—or vice versa—but it was not deemed serious enough to fix when it was first reported. Evidently more recent reports, including a remote exploit of the bug, have raised its importance, leading to a rushed-out fix. But the fix breaks some existing Python code, and the handling of the incident has left something to be desired, so the project is looking at ways to improve its processes.
Python integers can have an arbitrary size; once they are larger than can be stored in a native integer, they are stored as arbitrary-length "bignum" values. So Python can generally handle much larger integer values than some other languages. Up until recently, Python would happily output a one followed by 10,000 zeroes for print(10**10000). But as a GitHub issue describes, that behavior has been changed to address CVE-2020-10735, which can be triggered by converting large values to and from strings in bases other than those that are a power of two—the default is base 10, of course.
The fix is to restrict the number of digits in strings that are being converted to integers (or that can result from the conversion of integers) to 4300 digits for those bases. If the limit is exceeded, a ValueError is raised. There are mechanisms that can be used to change the limit, as described in the documentation for the Python standard types. The value for the maximum allowable digits can be set with an environment variable (PYTHONINTMAXSTRDIGITS), a command-line argument (-X int_max_str_digits), or from within code using sys.set_int_max_str_digits(). It can be set to zero, meaning there is no limit, or any number greater than or equal to a lower-limit threshold value, which is set to 640.
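The mechanics can be seen in a short sketch; the `hasattr()` guard is there because the limit API only exists on interpreters that carry the fix (3.10.7 and the corresponding 3.9, 3.8, and 3.7 patch releases):

```python
import sys

# Guard: sys.set_int_max_str_digits() only exists on patched interpreters.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(4300)      # the default limit
    try:
        int("9" * 5000)                   # 5000 digits: over the limit
    except ValueError as exc:
        print("refused:", exc)
    sys.set_int_max_str_digits(0)         # zero disables the limit
    print(len(str(int("9" * 5000))))      # works again: prints 5000
```

Note that the limit applies to both directions: `int()` on a long string and `str()` on a huge integer.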
Collateral damage
After the release was made on September 7, Oscar Benjamin posted
a complaint about it to the Core Development section of the Python
discussion forum. He noted that the new limit causes problems for the
SageMath mathematics application
(bug report)
and the SymPy math
library (Benjamin's
bug report). The fix for the bug is a significant backward-compatibility break, because the "fact that int(string) and str(integer) work with arbitrarily large integers has been a clearly intended and documented feature of Python for many years".
Furthermore, the ways to alter the limit are all global in nature; a library cannot really do much about it, since changing it would affect the rest of the program. Applications like SageMath can probably just set the limit (back) to zero, but not SymPy:
For Sage that is probably a manageable fix because Sage is mostly an application. SymPy on the other hand is mostly a library. A library should not generally alter global state like this so it isn't reasonable to have import sympy call sys.set_int_max_str_digits because the library needs to cooperate with other libraries and the application itself. Even worse the premise of this change in Python is that calling sys.set_int_max_str_digits(0) reopens the vulnerability described in the CVE so any library that does that should then presumably be considered a vulnerability in itself. The docs also say that max_str_digits is set on a per interpreter basis but is it threadsafe? What happens if I want to change the limit so that I can call str and then change it back again afterwards?
He wondered why the choice was made to switch the longstanding behavior of
int() and str(), rather than creating new variants, such
as safe_int(), that enforced this limit. Programs that wanted the
new behavior could switch to using those functions, but the default would
remain unchanged. Benjamin was hoping that the change could be
reconsidered before it became "established as the status quo
" but
that is not going to happen.
Steve Dower apologized for how the bug fix was handled. The problem had been reported to various security teams, which then converged on the Python Security Response Team (PSRT), he said. That group agreed that fixing it in the Python runtime was the right approach, but making that actually happen took too long:
The delay between report and fix is entirely our fault. The security team is made up of volunteers, our availability isn't always reliable, and there's nobody "in charge" to coordinate work. We've been discussing how to improve our processes. However, we did agree that the potential for exploitation is high enough that we didn't want to disclose the issue without a fix available and ready for use.
As might be guessed, the problem stems from untrusted input. The PSRT tried a number of different approaches but settled on forcing the limit, he said:
The code doing int(gigabyte_long_untrusted_string) could be anywhere inside a json.load or HTTP header parser, and can run very deep. Parsing libraries are everywhere, and tend to use int indiscriminately (though they usually handle ValueError already). Expecting every library to add a new argument to every int() call would have led to thousands of vulnerabilities being filed, and made it impossible for users to ever trust that their systems could not be DoS'd. We agree it's a heavy hammer to do it in the core, but it's also the only hammer that has a chance of giving users the confidence to keep running Python at the boundary of their apps.
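Dower's point about int() hiding inside parsers is easy to illustrate with the standard library's own JSON parser; the observed behavior depends on the interpreter version (patched versions raise ValueError, older ones parse the number, and would burn quadratic CPU time on much larger inputs):

```python
import json

# A JSON payload carrying a single 5000-digit integer. json.loads()
# calls int() internally, so on a patched interpreter the digit limit
# fires deep inside the parser.
payload = "[" + "7" * 5000 + "]"
try:
    value = json.loads(payload)
    print("parsed a big integer")     # unpatched, or limit disabled
except ValueError as exc:
    print("rejected:", exc)           # patched interpreter, default limit
```

For what it is worth, json.loads() does accept a parse_int hook, which is one place an individual application could intervene, but as Dower notes, expecting every caller of every parser to do so is not realistic.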
Gregory P. Smith, who opened the GitHub issue and authored the fix, noted that the int_max_str_digits setting is not specific to a particular thread, so changing it in one thread may affect others, as Benjamin suggested. Because the scope of the change was simply a security fix, no new features could be added, Smith said, but he expects that some kind of execution context to contain the setting (and perhaps others) will be added for Python 3.12, which is due in 2023. Meanwhile, he understands that some libraries may need to change the setting even though that has more wide-ranging effects:
If a project/framework/library decides to say, "we can't live with this limit", and include a Warning in their documentation telling users that the limit has been disabled, the consequences, why, how to opt-out of that or re-enable it, and what different limits may be more appropriate for what purposes if they choose to do so, that is entirely up to them. I consider it reasonable even though I fully agree with the philosophy that changing a global setting from a library is an anti-pattern.
Beyond that, Smith is sympathetic
to the complaints, but the intent was "to pick something that limited
the global amount of pain caused
". They saw no way to satisfy
all of the different users of Python, so "we chose the 'secure by
default' approach as that flips the story such that only the less common
code that actually wants to deal with huge integers in decimal needs to do
something
" to deal with the change.
In addition, it had gotten to the point where the PSRT was concerned that other Python distributors would decide to apply their own fixes, which could lead to fragmentation within the ecosystem. Smith is also a member of the Python steering council; he and the other members felt that this was the only reasonable path:
We don't take this kind of change lightly. We weren't happy with any of the available options.
But those also included not being happy with continuing to do nothing and sitting on it for longer, or continuing to do nothing and letting it become public directly or via any of the "disappointed at our lack of action" reporters.
But Benjamin noted
that PEP 387
("Backwards Compatibility Policy") seems to preclude removing a feature
without notice, though it does provide a steering council escape hatch, which
was used here. While security is a reasonable concern, he was unhappy that
"security apparently trumps compatibility to the point that even
providing a reasonable alternative for a long standing very much core
feature can be neglected
". In particular, he complained that no
alternative conversion function (e.g. large_int()) was added for
code that wants to be able to convert arbitrarily long strings and
integers. He wrote a version of that function to demonstrate that the
tools being provided are inadequate to allow libraries like SymPy to write their
own:
    def large_int(s: str):
        # This function is not thread-safe.
        # It could fail to parse a large integer.
        # It could also undermine security of other threads
        old_val = sys.get_int_max_str_digits()
        try:
            sys.set_int_max_str_digits(0)
            return int(s)
        finally:
            sys.set_int_max_str_digits(old_val)
Since another thread could be changing the limit at the same time, all bets are off for a multi-threaded Python program when calling that function. Benjamin and others also brought up the performance of the current conversion code, better algorithms for conversions, and other plans for the future, all of which may well deserve more coverage down the road. But much of the focus of the discussion was about the process of making a change like this, with no warning, and its effects on other projects in the Python community.
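One could try to patch over the race with a lock, as in the sketch below (unlimited_int_digits is a hypothetical name, not a proposed API). But the lock only serializes code that goes through this same helper; any other thread calling int() directly while the limit is lifted still sees it disabled, which is exactly Benjamin's objection:

```python
import sys
import threading
from contextlib import contextmanager

_limit_lock = threading.Lock()   # only serializes users of this helper

@contextmanager
def unlimited_int_digits():
    # On interpreters without the limit there is nothing to do.
    if not hasattr(sys, "set_int_max_str_digits"):
        yield
        return
    with _limit_lock:
        old = sys.get_int_max_str_digits()
        sys.set_int_max_str_digits(0)
        try:
            yield
        finally:
            sys.set_int_max_str_digits(old)

# The conversion itself is protected, but other threads calling int()
# concurrently observe the temporarily disabled limit.
with unlimited_int_digits():
    n = int("9" * 5000)
```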
For example, "Damian" pointed
out that there is a community using Python for studying number theory;
the language has a low barrier to entry "because of the unbounded nature
of Python integers and the very easy syntax to learn
". But this change
has erected a new barrier and the way to restore the previous behavior is
buried; "it took me ~15 paragraphs of reading from the release notes to
the documentation, to subtopics in the documentation
" to find what is
needed. Smith agreed
("yuck
") and filed a bug to improve
the documentation.
Why 4300?
The limit of 4300 digits seems rather arbitrary—why not 4000 or 5000, for
example? Tim Peters said
that it did not make sense to him; "I haven't been able to find the
reasoning behind it
". Tests on his system showed that numbers that
large could be converted "well over a thousand times per second
",
which seemed like it would not really create a denial-of-service problem.
But
Smith said
that the limit was chosen for just that reason; he filled in a bit more
background on the attack, though the full details of it still have not been
disclosed as of this writing:
Picture a CPU core handling N things [per] second, which can include a few megabytes of data which could be something full of a thousand integer strings (ie: JSON). It's still possible to get something to spin an entire CPU core for a second in an application like that even with the 4300 digit limit in place. What isn't possible anymore is getting it to spend infinite time by sending the same amount of data. With mitigation in place an attack now [needs] to scale the data volume linearly with desired cpu wastage just like other existing boring avenues of resource waste.
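The quadratic cost Smith is describing can be modeled with a toy version of schoolbook string-to-integer conversion (this is an illustration, not CPython's actual implementation):

```python
def parse_decimal(s):
    # Schoolbook conversion: value = value*10 + digit, repeated.
    # Each step multiplies an ever-growing bignum, so the total work
    # grows roughly with the square of the input length.
    value, work, size = 0, 0, 0
    for ch in s:
        value = value * 10 + (ord(ch) - ord("0"))
        size += 1          # digits currently held in `value`
        work += size       # cost proxy for this step's multiply-add
    return value, work

_, w100 = parse_decimal("9" * 100)
_, w200 = parse_decimal("9" * 200)
print(round(w200 / w100, 2))   # ~3.98: doubling the input quadruples the work
```

That superlinear growth is what turns a large untrusted string into a cheap way to pin a CPU core.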
Steven D'Aprano lamented that secrecy, however, noting that there have been multiple disclosures of similar bugs along the way.
I don't buy the need to push out a fix to this without public discussion. This class of vulnerability was already public, and has been for years. Even if it turns out that the chosen solution is the best, or only, solution, the secrecy and "trust us, we know best" from the security team was IMO not justified.
Smith said
that the vulnerability is not "in any way exciting, new, or novel
".
The decision was made to keep it private "because it hadn't publicly
been associated with the Python ecosystem
". That may turn out to be
the wrong choice, in truth, but we will not know that until we see the
exploit(s) the security team was working with.
Peters
asked
again about the value of the digit limit. Neil Schemenauer said
that it "was chosen to not break a numpy unit test
". Ofek Lev called
that "extraordinarily arbitrary
" and wondered if there was proof
that was actually the case. Andre Müller was in
favor of the change "because it makes Python safer
", but he also
wondered about 4300. Christian Heimes filled
in the details:
4,300 happens to be small enough to have int(s) in under 100 usec on modern hardware and large enough to pass unit tests of popular projects that were in our internal test [systems]. My initial value was 2,000, which is easier to remember and more than large enough for a 4k RSA key in decimal notation. We later bumped it up to 4,300 because the test suites of some popular Python projects like NumPy were failing with 2,000.
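Heimes's RSA figure is simple arithmetic to check: a 4096-bit modulus written in decimal needs 1,234 digits, comfortably under even his initial 2,000 limit.

```python
# Decimal digit count of a 4096-bit RSA modulus (upper bound 2**4096).
rsa4k_digits = len(str(2 ** 4096))
print(rsa4k_digits)   # 1234
```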
Chris Angelico thought
that the limit was "extremely aggressively chosen
"; it could be
quite a bit higher without causing major problems, he said. Dower agreed
that was possible, "but as we've seen from the examples above, the
people who are unhappy about the limit won't be happy until it's completely
gone
". Given that, having a lower limit means those people will
encounter and work around the problem that much sooner.
The discussion continues as of this writing and it looks like there are some real efforts toward better fixes down the road, but for now users of lengthy bignums need to make adjustments in their code. It is unfortunate that the community did not get to pitch in on the solution, as there are a few different suggestions in the thread that seem at least plausibly workable—with less impact. We will have to wait and see what the original vulnerability report(s) have to say; no one has said so specifically, but one hopes those will be released at some point soon. The two-year gap between report and fix is worrisome as well, but the security team is aware of that problem and, with luck, working on a fix for that as well.
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 12:00 UTC (Wed) by ballombe (subscriber, #9523) [Link]
"A huge integer will always consume a near-quadratic amount of CPU time in conversion to or from a base 10 (decimal) string with a large number of digits. No efficient algorithm exists to do otherwise."
Certainly, efficient quasi-linear time and space algorithms exist and are implemented in GNU MP, for example.
(they can print 10^10000000 with conversion from base 2 to base 10 in much less than 1s)
The secondary issue is that python has its own bignum implementation which is very slow.
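The quasi-linear approach ballombe alludes to can be sketched as a divide-and-conquer conversion (a simplification; GMP's real implementation is more involved, and the total cost is subquadratic only to the extent that the underlying multiplication is):

```python
def fast_parse(s):
    # Split the digit string in half, convert each half recursively,
    # and combine with a single large multiplication. With fast
    # multiplication this beats the schoolbook quadratic method.
    if len(s) <= 9:
        return int(s)            # short strings: use the builtin directly
    mid = len(s) // 2
    return fast_parse(s[:mid]) * 10 ** (len(s) - mid) + fast_parse(s[mid:])
```

Incidentally, because this never hands the builtin more than nine digits at a time, it also sidesteps the new limit, though the point here is the algorithm, not a workaround.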
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 12:56 UTC (Wed) by vstinner (subscriber, #42675) [Link]
Python performance is correct with "small bigints", bad for "large bigints" :-) I don't know the threshold, it may depend on the machine. I would say that up to 1000 decimal digits, the Python bigint algorithms are fast enough for most use cases on a desktop CPU. For numbers larger than 10,000 decimal digits, you might want even faster algorithms which are more complicated, less people understand them, and it's harder to maintain them. GMP is just a perfect place for such highly optimized algorithms ;-) For example, Python avoids assembly code, whereas GMP loves it :-)
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 13:12 UTC (Wed) by Wol (subscriber, #4433) [Link]
One system I used had microcode specifically for BCD arithmetic, specifically for arbitrary precision. Given that the CPU speed was probably measured in the low MHz, that tells you how long ago it was ...
Cheers,
Wol
A Python security fix breaks (some) bignums
Posted Sep 16, 2022 12:48 UTC (Fri) by colejohnson66 (subscriber, #134046) [Link]
x86 had BCD related instructions (AAA/AAS/AAM/AAD
for BCD and DAA/DAS
for ASCII), but when AMD introduced AMD64/x86-64, those opcodes became undefined/reserved (and still are). You could now only use them in 16 and 32 bit modes. However, even in those modes, the instructions are microcoded, so they're relatively slow. And context switching to a 32 bit thread just to do BCD logic would be horrible performance wise. But even on a 32 bit thread, it's probably about as slow as doing the logic yourself using the normal instructions.
A Python security fix breaks (some) bignums
Posted Sep 16, 2022 12:50 UTC (Fri) by colejohnson66 (subscriber, #134046) [Link]
Correction: DAA/DAS
are not ASCII, but packed BCD (two numbers per byte). AAA/AAS/AAM/AAD
are for "unpacked" BCD (one number per byte)
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 13:25 UTC (Wed) by hkario (subscriber, #94864) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 22:24 UTC (Wed) by gray_-_wolf (subscriber, #131074) [Link]
A Python security fix breaks (some) bignums
Posted Sep 15, 2022 0:07 UTC (Thu) by vstinner (subscriber, #42675) [Link]
* https://mail.python.org/pipermail/python-3000/2007-Septem...
* https://mail.python.org/pipermail/python-3000/2007-Octobe...
* https://bugs.python.org/issue1814
Python is fast to allocate Python integer objects and algorithms on Python ints are fast enough for numbers up to 2^64 (20 decimal digits) which fits most use cases.
I tried that in the early days of Python 3, since the Python 2 "small int" type (single digit) was removed. Python 2 had two integer types ("small" and "long"), Python 3 has a single integer type (only "long").
A Python security fix breaks (some) bignums
Posted Sep 15, 2022 11:01 UTC (Thu) by pbonzini (subscriber, #60935) [Link]
The disadvantage is that mpn is very low level.
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 14:22 UTC (Wed) by tchernobog (subscriber, #73595) [Link]
If something has O(n^2) behavior, just document it. It is the responsibility of the caller to make sure nobody is trying to trigger a potentially expensive operation due to unsanitized input coming from outside.
I hear talk here about "safe_int", but it's not like the previous implementation crashed or led to undefined behavior.
In most cases, input is not coming from outside your applications. Only the boundary of the system really needs a conditional check on the size of each value being parsed.
So, I do not buy this argument _at all_:
> Parsing libraries are everywhere, and tend to use int indiscriminately
> (though they usually handle ValueError already).
>
> Expecting every library to add a new argument to every int() call would
> have led to thousands of vulnerabilities being filed, and made it impossible for
> users to ever trust that their systems could not be DoS'd.
Sometimes I get the feeling that Python wants to be foolproof in all situations, even at the expense of performance, but if a data structure or algorithm just takes quadratic time... the responsibility really lies with the user.
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 14:43 UTC (Wed) by ballombe (subscriber, #9523) [Link]
Most other software and languages that support bigints do it in quasi-linear time and space.
So there is no real reason for the user to expect it to be quadratic.
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 16:41 UTC (Wed) by iabervon (subscriber, #722) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 21:15 UTC (Wed) by k8to (guest, #15413) [Link]
Not that I like the numeric handling context pattern, but .. it's kinda already out there.
A Python security fix breaks (some) bignums
Posted Sep 18, 2022 18:09 UTC (Sun) by flussence (subscriber, #85566) [Link]
Regardless, that seems like the sort of thing code should have, even if the spec doesn't.
A Python security fix breaks (some) bignums
Posted Sep 18, 2022 18:39 UTC (Sun) by iabervon (subscriber, #722) [Link]
What you're probably thinking of is that email has a header line length limit, but it's not widely enforced, and it turns out that some messages people actually want to receive don't obey it.
A Python security fix breaks (some) bignums
Posted Sep 23, 2022 9:08 UTC (Fri) by njs (subscriber, #40338) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 18:04 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
There are a *lot* of JSON-spewing HTTP endpoints that Python is asked to convert to Python objects and integers happen to be part of that. I don't think such a claim holds much water.
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 15:30 UTC (Wed) by linuxrocks123 (subscriber, #34648) [Link]
First off, that CVE is bogus. Using a less-than-optimal algorithm for string conversion is not a security vulnerability. If someone is converting untrusted strings to integers, it's up to them to make sure the string isn't huge. And, if they don't do that, and this "vulnerability" is exploited, they won't get pwned, and they'll figure out the problem and fix their application soon enough. Given all that, breaking BACKWARD COMPATIBILITY IN THE LANGUAGE to prevent griefing is massively unjustified.
Second, if you're going to run around with your head cut off because someone, somewhere, might be able to waste CPU cycles on a web server, why not fix the code by changing the string conversion algorithm from a quadratic one to an n-log-n one, since such an algorithm exists? That would have been an unmitigated win for everyone. Doesn't that make more sense than BREAKING YOUR USER'S PROGRAMS?
For comparison, C deprecated gets in 1999 and removed it in 2011. And C is a compiled language so, unlike with Python, removals of features don't cause running code to break when you run updates.
Thank you guys so much for stopping support for Python 2.7. All your shitty decisions since then have made it abundantly clear to me that your support is a net negative.
A Python security fix breaks (some) bignums
Posted Sep 21, 2022 7:06 UTC (Wed) by IkeTo (subscriber, #2122) [Link]
Are you sure that O(n log n) time suffices for a conversion between binary and decimal for integers of length n? From what I read in the online gmp manual like this, this and this, it appears to me that you'll need no less than O(n^1.33) time.
A Python security fix breaks (some) bignums
Posted Sep 21, 2022 9:53 UTC (Wed) by paulj (subscriber, #341) [Link]
A Python security fix breaks (some) bignums
Posted Sep 22, 2022 2:34 UTC (Thu) by IkeTo (subscriber, #2122) [Link]
Would I use that notation if I did have some thoughts about it? I spent some time pondering the question.
The answer is, no. I'll still use the O notation anyway. Why? Because using Omega implies a certain level of rigor. Either you say a particular algorithm is Omega(f(n)) time because you have analyzed it and found that the worst-case time is at least a constant times f(n), or you say a particular problem is Omega(f(n)) time because you have analyzed the problem and found that to solve the problem you must expend that amount of effort or more in the worst case.
Neither is the intention of my reply. I just want to express my feeling that the original comment, saying that there is a way to convert a decimal string to a bigint representation in O(n log n) time, looks highly dubious.
First, there is no particular algorithm I'm talking about. I just want to say that the algorithm that the original poster might have in mind probably does not have that complexity guarantee. I checked the gmp manual, and think that any trivial way to compose an algorithm doing string to bigint conversion seems to need an amount of time far exceeding O(n log n).
Second, I'm not talking about the conversion problem. I never intend to say that there is no possible way to build a conversion routine better than using gmp.
So my usage has no rigor, and in such cases I want a "layman" notation and don't want to use Omega.
Sorry for nit picking so badly. :-)
A Python security fix breaks (some) bignums
Posted Sep 22, 2022 9:49 UTC (Thu) by paulj (subscriber, #341) [Link]
There's nothing wrong with just using clear natural language in technical writing / maths.
A Python security fix breaks (some) bignums
Posted Sep 22, 2022 11:15 UTC (Thu) by Wol (subscriber, #4433) [Link]
The linux raid wiki contribution guidelines say "please write in the FIRST person". :-)
Cheers,
Wol
A Python security fix breaks (some) bignums
Posted Sep 21, 2022 17:49 UTC (Wed) by jsm28 (subscriber, #104141) [Link]
A Python security fix breaks (some) bignums
Posted Sep 21, 2022 23:59 UTC (Wed) by IkeTo (subscriber, #2122) [Link]
Are you sure that multiplying two n-digit numbers can be done in O(n) time? The O(n^1.33) in my last message comes solely from a single multiplication.
A Python security fix breaks (some) bignums
Posted Sep 22, 2022 18:21 UTC (Thu) by jsm28 (subscriber, #104141) [Link]
A Python security fix breaks (some) bignums
Posted Sep 22, 2022 21:54 UTC (Thu) by IkeTo (subscriber, #2122) [Link]
A Python security fix breaks (some) bignums
Posted Sep 22, 2022 22:04 UTC (Thu) by jsm28 (subscriber, #104141) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 16:14 UTC (Wed) by mb (subscriber, #50428) [Link]
If some application doesn't restrict arbitrarily long and arbitrarily chosen strings from being put into Python's bignum parser, it's the application's fault. It is *obvious* that such a thing could result in arbitrary calculation times and memory allocations.
This problem *cannot* even be fixed completely in Python. The chosen limit is not low enough if the application allows a large or arbitrary number of conversions in parallel or in a row.
Therefore this change didn't really fix the application DoS.
And safe_int() actually makes the developer think less about their code. It has safe in its name, so it must be safe, right? Call it a million times in a row? Still safe, because the name says so!
In fact, which actual application is affected by this? Any names of applications that are DoS-able by bignum parsing? Just fix them, ok?
What is Python going to restrict next to prevent applications from doing dumb things?
Restrict the multiplication operator for strings, because it could result in arbitrary allocations and big computation times, if the factor is untrusted? Shall we restrict loop iteration counts?
This change does just one thing: Break applications. Please revert it.
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 21:24 UTC (Wed) by k8to (guest, #15413) [Link]
In C++, we had a problem where the client could send a bogus content length that the server would just malloc() a buffer for (and fall over instantly). I just hacked the code to say "if the content length is over X, just reject the request", where X was defaulted to something like 20MB. This was easy because we owned the whole code stack, and even if we were using a third-party request parser, maintaining a diff would be reasonable. (Obviously, the right thing to do here was to handle the content in a streaming manner, and we did that later.)
In pythonworld, it feels very strange to go around patching third-party parsers. You can do it, even dynamically, but it seems like you're doing it wrong, and I'd say you probably are doing it wrong. So saying it's just the application's fault feels weird. Probably the parsers should adjust to be more conservative in what they expect, at least by default. Or at *least* have a way to run in a conservative mode. But my assumption is that today, they don't, so some solution was wanted.
A Python security fix breaks (some) bignums
Posted Sep 15, 2022 15:19 UTC (Thu) by jafd (subscriber, #129642) [Link]
Programmers and their tendency to always find answers to complicated questions that are simple, elegant, and wrong. Grr.
A Python security fix breaks (some) bignums
Posted Sep 15, 2022 15:47 UTC (Thu) by Wol (subscriber, #4433) [Link]
Quite often I think they can't see the forest for the leaves ...
Cheers,
Wol
A Python security fix breaks (some) bignums
Posted Sep 16, 2022 0:48 UTC (Fri) by khim (subscriber, #9252) [Link]
> Quite often I think they can't see the forest for the leaves ...

Except in this case the forest definitely has the problem, while all the trees in it look sane.
- Python supports bignums, and since it's a dynamic language, they are supported everywhere.
- Python supports reading and writing bignums, and these use quadratic algorithms.
- Certain data structures support integers (cough JSON cough).
- And these are used by millions of developers who have no idea what they are actually doing (many don't have any CS education at all and wouldn't know how a linear algorithm differs from a quadratic one).
- And after #1, #2, #3, and #4 became widespread knowledge, script kiddies started attacking these web sites. This works because of the quadratic nature of parsing: now your server processes a 1MB input not 1000 times slower than a 1KB input but 1000000 times slower, which is quite a problem in practice.
What are your options? You may follow mb's advice and say that millions of developers should, somehow, fix their code. But they don't even know Python supports bignums, and they definitely wouldn't know how to fix their bespoke parsers!
You cannot change JSON and, more importantly, you cannot change other data structures which embed numbers in textual form!
You probably can fix Python and switch from a quadratic algorithm to something better… but that's a significant change which is hard to do in a jiffy.
I don't like the decision of the Python maintainers, but I fail to see what other choice they could have made.
A Python security fix breaks (some) bignums
Posted Sep 17, 2022 0:47 UTC (Sat) by butlerm (subscriber, #13312) [Link]
A Python security fix breaks (some) bignums
Posted Sep 18, 2022 0:30 UTC (Sun) by mathstuf (subscriber, #69389) [Link]
A Python security fix breaks (some) bignums
Posted Sep 17, 2022 9:46 UTC (Sat) by mb (subscriber, #50428) [Link]
Sorry, you are completely losing track here.
For once, can you please name just a couple of these presumably millions of affected applications?
I might have missed something in the discussion, but to this date I don't know of a single affected application.
And then: This theoretical problem is *not* a serious security issue. It's a simple DoS. Your application stops responding. If you care so much about your application, you should probably have a monitor that just restarts it anyway.
This is not a privilege escalation. And it's not a private data leak. It's a DoS.
>What are you options?
The options are simple and obvious: Don't change anything in Python.
Don't fix application bugs, that don't exist.
Don't fix theoretical application bugs, by effectively removing a long standing useful feature.
If there are applications throwing untrusted data into quadratic algorithms, then these applications probably have more problems than that. The developer clearly didn't know what she was doing.
Are we now going to remove all algorithms with potentially long run time from Python, just because someone might throw untrusted data into it?
A Python security fix breaks (some) bignums
Posted Sep 18, 2022 23:18 UTC (Sun) by ras (subscriber, #33059) [Link]
They have effectively changed JSON with the fix, to wit: integers with more than 4300 digits are now banned in Python.
A Python security fix breaks (some) bignums
Posted Sep 16, 2022 19:40 UTC (Fri) by k8to (guest, #15413) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 17:46 UTC (Wed) by tux3 (subscriber, #101245) [Link]
For Python 4, I would simply replace CPython with a eBPF-in-userspace transpiler.
CPython could just pick a default eBPF complexity limit and provide a PYTHONMAXINSTRUCTIONS env var to override it.
And without unbounded loops, people writing their own network servers won't have to worry about checking lengths when parsing or adding fancy things like timeouts and memory limits to their parsers or network request handlers.
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 17:54 UTC (Wed) by mfuzzey (subscriber, #57966) [Link]
There are many ways untrusted input can lead to performance issues or worse, and it should be up to applications accepting untrusted input to sanitize it first, because only the application knows whether the data is trustworthy.
If we really want to impose restrictions at the Python level where none existed before, they should be *opt-in*, so existing code won't break; any application that takes untrusted input, is worried about a DoS, doesn't need huge integers, and doesn't want to do the sanitizing itself can then set the limit lower.
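Application-side sanitizing of that kind is a few lines; a hypothetical helper (safe_int and its 4300 default are illustrative, not part of any Python API) might look like:

```python
def safe_int(s: str, max_digits: int = 4300) -> int:
    """Reject oversized numeric strings before handing them to int().

    A hypothetical application-level check: the application picks the
    limit that fits its data, instead of relying on an interpreter-wide
    setting that breaks unrelated code.
    """
    digits = s.strip().lstrip("+-")
    if len(digits) > max_digits:
        raise ValueError(f"integer string exceeds {max_digits} digits")
    return int(s)
```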
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 21:30 UTC (Wed) by k8to (guest, #15413) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 21:32 UTC (Wed) by k8to (guest, #15413) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 21:35 UTC (Wed) by k8to (guest, #15413) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 19:11 UTC (Wed) by GhePeU (subscriber, #56133) [Link]
A Python security fix breaks (some) bignums
Posted Sep 15, 2022 15:14 UTC (Thu) by jafd (subscriber, #129642) [Link]
A Python security fix breaks (some) bignums
Posted Sep 19, 2022 12:46 UTC (Mon) by geert (subscriber, #98403) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 21:19 UTC (Wed) by ssmith32 (subscriber, #72404) [Link]
Wouldn't it address the security concerns, and provide an escape hatch, to have a valid implementation (perhaps as a C module) of large_int or, even better, a per-call override argument for the limit on the existing int() function?
A per-call optional argument to disable the limit jumped to my mind immediately, so surely it crossed the mind of someone smarter than I am... so I'm curious why that functionality was not included in addition to the global flag?
Anyone already working on a bigint python module?
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 21:34 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
What's wrong with `def int(input, maxlen=4300):` as the function? Then SciPy and other packages can be modified to pass `maxlen=None` or whatever. Alas, old versions reject unknown keyword arguments, so that's no fun either… But it'd at least avoid the global state problem that is currently here.
A Python security fix breaks (some) bignums
Posted Sep 15, 2022 0:10 UTC (Thu) by devkev (subscriber, #74096) [Link]
This. But even without adding a default keyword arg, you could get the same benefit from a def int_maxlen(s, maxlen) function. With such a function, individual conversions that want a larger limit can change int(foo) to int_maxlen(foo, maxlen), i.e. basically the same as int(foo, maxlen=maxlen).
And this would also allow the awful unsafe large_int() proposed in the article to instead be trivial (and safe): return int_maxlen(s, 0)
The real problem in this whole kerfuffle is that there's no targeted way to opt-out of the fix, only the system-wide sledgehammer.
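With the API that actually shipped, such an int_maxlen() can only be emulated by flipping the global setting around each call. A sketch under that assumption (int_maxlen is the commenter's hypothetical name; the hasattr guard makes it a plain int() on unpatched interpreters, and since the documented setter accepts only 0 or values of at least 640, small requests are clamped upward):

```python
import sys

def int_maxlen(s: str, maxlen: int) -> int:
    # Hypothetical per-call override, emulated via the global setting.
    # maxlen=0 means "no limit", mirroring sys.set_int_max_str_digits(0).
    if not hasattr(sys, "set_int_max_str_digits"):
        return int(s)  # unpatched interpreter: there is no limit to lift
    old = sys.get_int_max_str_digits()
    try:
        # The setter rejects values between 1 and 639, so clamp upward.
        sys.set_int_max_str_digits(maxlen if maxlen == 0 else max(maxlen, 640))
        return int(s)
    finally:
        sys.set_int_max_str_digits(old)
```

Note that this is exactly the global-state dance the parent comments wanted to avoid: another thread converting integers concurrently would briefly see the lifted limit.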
Not two years old - nineteen years old
Posted Sep 14, 2022 22:20 UTC (Wed) by maney (subscriber, #12630) [Link]
Did Guido have to decommission the time machine when he resigned as BDFL?
Not two years old - nineteen years old
Posted Sep 16, 2022 19:59 UTC (Fri) by jwilk (subscriber, #63328) [Link]
A Python security fix breaks (some) bignums
Posted Sep 14, 2022 22:22 UTC (Wed) by gray_-_wolf (subscriber, #131074) [Link]
A Python security fix breaks (some) bignums
Posted Sep 16, 2022 3:00 UTC (Fri) by milesrout (subscriber, #126894) [Link]
A Python security fix breaks (some) bignums
Posted Sep 16, 2022 10:58 UTC (Fri) by flussence (subscriber, #85566) [Link]
The vast majority of people I've seen talk about this webapp absolutely loathe it; it's the poster child for "works on my machine"-ware.
A Python security fix breaks (some) bignums
Posted Sep 16, 2022 12:32 UTC (Fri) by milesrout (subscriber, #126894) [Link]
A Python security fix breaks (some) bignums
Posted Sep 17, 2022 12:22 UTC (Sat) by marduk (subscriber, #3831) [Link]
A Python security fix breaks (some) bignums
Posted Sep 18, 2022 12:23 UTC (Sun) by Homer512 (subscriber, #85295) [Link]
A Python security fix breaks (some) bignums
Posted Sep 21, 2022 6:19 UTC (Wed) by IkeTo (subscriber, #2122) [Link]
On the other hand, I don't think it is a difficult problem to solve. There are many places that could host the maximum int conversion length. The only difficulty is that the problem is already in the wild and needs to be addressed ASAP, without going through the usual PEP process. Personally I don't worry about it; the problem will be a short-lived one.
A Python security fix breaks (some) bignums
Posted Sep 21, 2022 22:35 UTC (Wed) by Homer512 (subscriber, #85295) [Link]
> old_val = sys.get_int_max_str_digits()
> try:
> sys.set_int_max_str_digits(0)
> return library_call()
> finally:
> sys.set_int_max_str_digits(old_val)
Controlling entry points to a library may be less painful than changing every int conversion within it, as long as you are careful around callbacks, generators, etc. But because the limit is a global setting, you can't override it for just one library call without affecting everything else running in the interpreter at the same time.
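That try/finally dance can be packaged once, sketched here as a context manager (unlimited_int_digits is a hypothetical name; the hasattr guard makes it a no-op on unpatched interpreters, and the caveat above stands: the setting is interpreter-wide, so concurrent threads and suspended generators see the lifted limit too):

```python
import sys
from contextlib import contextmanager

@contextmanager
def unlimited_int_digits():
    # Hypothetical wrapper for a library entry point: lift the limit,
    # run the caller's block, then restore whatever was set before.
    if not hasattr(sys, "set_int_max_str_digits"):
        yield  # unpatched interpreter: nothing to lift
        return
    old = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(0)  # 0 disables the limit entirely
    try:
        yield
    finally:
        sys.set_int_max_str_digits(old)
```

Callers would then write `with unlimited_int_digits(): result = library_call()` around each trusted entry point.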
A Python security fix breaks (some) bignums
Posted Sep 22, 2022 1:04 UTC (Thu) by yadiwainoalbert (guest, #160641) [Link]