Our Plan for Python 3.13 (github.com/faster-cpython)
531 points by bratao 11 months ago | 399 comments



The "Faster Python" team are doing a fantastic job, it's incredible to see the progress they are making.

However, there is also a bit of a struggle going on between them and the project to remove the GIL (global interpreter lock) from CPython. There will be a performance impact on single-threaded code if the "no GIL" project is merged, something in the region of 10%. It seems that the Faster Python devs are pushing back against that as it impacts their own goals. Their argument is that the "sub-interpreters" they are adding (each with its own GIL) will fulfil the same use cases as multithreaded code without a GIL, but sub-interpreters still have the overhead of encoding and passing data in the same way you have to with subprocesses.

There is also the argument that it could "divide the community", as some C extensions may not be ported to the new ABI that the no-GIL project would result in. Again, though, I'm unconvinced by that: the Python community has been through worse (Python 3), and even asyncio completely divides the community now.

It's somewhat unfortunate that this internal battle is happening, both projects are incredible and will push the language forward.

Once the GIL has been removed, it opens up all sorts of interesting opportunities for new concurrency APIs that could make concurrent code much easier to write.

My observation is that the Faster Python team are better placed politically; they have GvR on the team, whereas no-GIL is being proposed by an "outsider". It just smells a little of NIH syndrome.


The GIL will never be removed from the main Python implementation. Historically, the main value of GIL-removal proposals and implementations has been to spur the core team to speed up single-core code.

I think it's too late to consider removing the GIL from the main implementation. Like Guido said in the PEP thread, the Python core team burned the community for 10 years with the 2-3 switch, and a GIL change would likely be as impactful; we'd have 10 years of people complaining their stuff didn't work. Frankly, I wish Guido would just come out and tell Sam "no, we can't put this in CPython. You did great work, but compatibility issues trump performance".

Kind of a shame, because Hugunin implemented a Python on top of the CLR some 20 years ago and showed some extremely impressive performance results. Like Jython, PyPy and other implementations, it never caught on, because compatibility with CPython is one of the most important criteria for people dealing with lots of Python code.


> The GIL will never be removed from the main python implementation.

I don't see why. It's a much easier transition than 2 to 3.

Make each package declare right at the top whether it is non-GIL compatible. Have both modes available in the interpreter. If every piece of code imported has the declaration at the top, then the program runs in non-GIL mode; otherwise it runs in classic GIL mode.

At first most code would still run in GIL mode, but over time most packages would be converted, especially if people stopped using packages that weren't compatible.


Python is too dynamic a language to have non-GIL compatibility declared simply at the top. Code can be imported/evaled/generated at runtime, at any point in a script, which means Python would need to be able to switch from non-GIL to GIL mode at any time during execution.
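That dynamism is easy to demonstrate; here is a small sketch (the `load_plugin` helper is hypothetical):

```python
# Illustration of the dynamism described above: imports can happen at
# any point of execution, not just at the top of a file, so a static
# "non-GIL compatible" marker can't know what gets loaded here.
import importlib

def load_plugin(name):
    # The module is chosen at runtime, possibly from config or user input.
    return importlib.import_module(name)

mod = load_plugin("json")  # could just as well have been a C extension
print(mod.loads("[1, 2]"))
```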


It's true that there are dynamic imports, but presumably it would be on the library maintainer to know about that; you could also throw a catchable error on GIL-requiring imports, or something like that.

All I'm saying is that it's solvable, and more solvable than 2 to 3.


I completely agree, something like:

   from __future__ import multicore
(The proposal seems to be to call it that rather than "no GIL", as it's more positive)

And maybe if done in an __init__.py it applies to the whole module.

Do that for v3.x then drop it completely for v4

I think the issue may be that maintaining both systems is too complex.


That's not an insurmountable problem.

As long as all the data structures stay the same, all you really need to do is flush out all the non-GIL bytecode and load in (or generate) GIL bytecode.

Sure, there might be a stutter when this happens. You will also want a way to either start in GIL mode, or force it to stay in non-GIL mode, throwing errors. But it's a very solvable problem.


When it comes to concurrency and C-level APIs across the amount of code Python represents, every problem IS insurmountable. This is like Linux's "we don't break userspace" motto: there is no conceivable possibility of removing the GIL without breaking userspace. Programs will be buggy; some will stop working. Worse, until someone works around the halting problem, we can't just crawl Python code and decide whether it relies on the GIL. People will have to be paid to inspect one function at a time to determine where the GIL is needed.


Removing the GIL is easy. It's already been done with Jython and IronPython. No changes to Python code are needed.

You do still need locks for concurrency; it's just that instead of a coarse-grained global lock, you replace it with a bunch of fine-grained locks. Which is actually one of the sticking points with the proposal to remove the GIL: all those fine-grained locks impact performance for single-threaded Python programs.

The impossible problem is removing the GIL without breaking compatibility with the CPython API. There is a bunch of non-Python code (mostly C/C++) that interfaces with Python code via the CPython API, and it's that code which breaks when you remove the GIL.

The proposals floating around don't actually fix that issue; it's more of a "let's break API compatibility anyway", allowing libraries to be updated to the new API incrementally.

There are no issues with the halting problem here. It's really easy to detect at runtime if python code is interacting with c/c++ code via the old cpython API. Remember, the halting problem only applies to static analysis, not dynamic runtime checks.


This is extremely naive, and you answer your own question:

> [Par 2] ["Removing GIL creates performance costs that people don't want"]

> [Par 3] ["Removing GIL breaks CPython API which tons of codebases rely on and people will have to be paid to fix all that code"]

In short, it's not possible to remove GIL without requiring code change.

> There are no issues with the halting problem here. It's really easy to detect at runtime if python code is interacting with c/c++ code via the old cpython API. Remember, the halting problem only applies to static analysis, not dynamic runtime checks.

This is not true, and you're thinking of this wrong. When presented with a "will my code work after GIL removal" type of problem, the question becomes "will my code call the C/C++ API". Whether it's detected at runtime is irrelevant, because the question you want to answer is "does this piece of software require any change at all to work without the GIL". To do that, you need a tool that statically analyzes whether known-bad functions (those that trivially/deterministically call into C/C++) are reached from the rest of the codebase.

Imagine you work for a company that has a package "xyz" written in C/C++ and called from Python. Now that the GIL is removed in an imaginary Python 4, my boss comes to me and asks "how much code needs to change for us to port to Python 4?". You can apply heuristics such as "how many modules import xyz". But since the correctness implications are contagious, any module that doesn't import xyz but imports a module that imports xyz is affected as well. You can be more granular and verify whether individual functions use objects from xyz.

Which brings me to my original point that there are two cases: either this is a static-analysis problem (for which we lack tooling), or it's work where someone has to crawl the entire codebase, inspect function by function, and run unit test by unit test to determine whether GIL removal breaks anything. I know because people did exactly this in the 2 -> 3 change, and it was an extraordinarily expensive ($$$) transition. Your view is naive beyond comprehension, and it's hard to think of GIL removal as anything easier than the 2 -> 3 transition, which was a shitshow of a magnitude the software industry hadn't seen before.


It's super easy to check whether any C/C++ code has been imported. It's not too hard to get a reasonably good result with static analysis: just find all import statements and see whether each maps to a Python file or a dll/so file, recursing as needed. You don't need to look at any code other than the import statements.
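A rough sketch of that import-scanning idea (function names are mine, and the `.so`/`.pyd` suffix check is deliberately simplistic):

```python
# Hedged sketch: statically list which imported modules resolve to
# native extension files (.so/.pyd) versus pure-Python sources.
# As described above, only the import statements are inspected.
import ast
import importlib.util

EXT_SUFFIXES = (".so", ".pyd")

def imported_module_names(source):
    """Collect top-level module names from import statements."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names

def native_imports(source):
    """Return the subset of imports that resolve to C-extension files."""
    result = set()
    for name in imported_module_names(source):
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            continue  # unresolvable here; a real tool would recurse paths
        if spec and spec.origin and spec.origin.endswith(EXT_SUFFIXES):
            result.add(name)
    return result
```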

For an even better result the main python interpreter can be modified with a flag to just list all imported dll/so files. Run your application with that flag and see what it imported and it will catch weird edge cases (like, python code that messes with import paths)
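Even without a dedicated interpreter flag, the runtime variant can be approximated today by inspecting `sys.modules` after the application's imports have run (a sketch, using the same simplistic suffix check):

```python
# After your app's imports have executed, list every loaded module
# whose backing file is a native extension. This catches dynamic
# imports and import-path tricks that a static scan would miss.
import sys
import json  # example import; a real app would have run its own code

def loaded_native_modules():
    native = []
    for name, mod in list(sys.modules.items()):
        origin = getattr(mod, "__file__", None)
        if origin and origin.endswith((".so", ".pyd")):
            native.append(name)
    return sorted(native)
```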

There is no need to check the entire codebase. It's only the C/C++ code that directly uses the cpython api that needs to be checked. Any python code can be ignored, along with C/C++ code that doesn't interact with the cpython API.

This isn't like the 2 -> 3 transition where you needed to touch almost every line of python code. The amount of touched code for most companies should be tiny.

Also, the proposal isn't for a Python 4 where everyone is required to upgrade to this new paradigm. It's for an optional mode that allows running without the GIL. The only people who actually need to worry about it are people who want multithreading.

Your theoretical company can do a quick check to see what c/c++ libraries are imported and estimate how much code would need to be checked. If it's too daunting, they can just keep operating with the GIL enabled.


> Your theoretical company can do a quick check to see what c/c++ libraries are imported and estimate how much code would need to be checked. If it's too daunting, they can just keep operating with the GIL enabled.

Once again, this doesn't make sense, and you're naively glossing over things. This won't work, because one of your dependencies will likely use no-GIL mode (numpy or something like that). This essentially forks the ecosystem in two: if you have 5 million lines of GIL Python code, you have no way of importing a no-GIL dependency. Nobody will sign up for that kind of mess; Python package management is already a huge mess. What happens if a library goes from GIL to no-GIL? Runtime exception. Yeah, thanks but no thanks.

Beyond that, it's unclear how best to respond to your claim that static analysis is easy and that you don't need to inspect Python code line by line. Understand that there are companies with millions of lines of code where Python is mixed with C/C++ FFI calls all over the place. I have a codebase like that, where almost everything is FFI/Python mixed, and I know many companies do. In our case we really do need to go line by line to see what's affected. Even if we didn't, our boss would ask us to, because if anything fails it'll be our responsibility, so we at least need to do the busy work of confirming nothing is affected. You seem to identify a small subset of reality, determine that your proposed solution is practical there, when in real life it would cause tons of codebases to be almost rebuilt. Knowing Python folks, I know no one has any interest in such a thing.


> all you really need to do is [SMOP]. But it's a very solvable problem.

Awesome! I look forward to seeing your PR shortly. ;-)


I too enjoyed the parent comment. It's right up there with "as soon as we have sufficiently smart enough compilers, that won't be an issue anymore".

I needed to Google SMOP, which tells me: "small matter of programming". Correct?


> I needed to Google SMOP, which tells me: "small matter of programming". Correct?

Yes, it can mean either "small..." or "simple matter of programming", which is more the meaning I had in mind. But either fit.


I'm confused. Why do people think my suggested solution is difficult enough to be comparable to the halting problem or "a smart enough compiler"?

We are talking about a scenario where cpython has already been modified to support both GIL and non-GIL operating modes, and a set of libraries that support the new non-GIL mode have been manually annotated by programmers.

All I'm suggesting is that on top of this, we modify the "load library" code to detect at runtime when we are loading a library that doesn't support the new non-GIL mode and switch modes.


The simplest solution is to get a patch accepted by the Python team. Good luck with that.


I mean that seems horribly tricky, but also totally doable.

Having read about JavaScript/Java VM optimisations in JITs and GC I would be surprised if a global state change like this is not manageable - think deoptimising the JS when you enter the debugger in the dev tools in your browser.


Java-the-bytecode is just as dynamic, yet somehow HotSpot manages to be rather nippy.


It is wildly underappreciated just how insanely good the current JVM is. It scares me how much it has improved in the last ten years. Since it has been open source for a long time, many university researchers have run experiments and proposed HotSpot improvements. Plus, Sun/Oracle employs a small army of PhD researchers to work on it "night and day". If Project Valhalla ever sets sail (pun intended), I could imagine other languages beginning to target the JVM instead of their own (worse) VMs. (Raku talked about it a few years ago.) When I move from Java to Python, I am always painfully reminded how terrible the VM performance is in a tight loop that doesn't use a list comprehension.


I'm no fan of Python after the 2-3 debacle, but to be fair, you're comparing apples and oranges here. Why not instead compare it to another C-like language (Java was originally intended to be the replacement for C++ back in 1995, and Python dates back to 1989.)

Go or Rust, or even Erlang or Haskell would be great examples that are more inline with the design goals (then and now) of Java.


    comparing apples and oranges
That is exactly my point. People start a small project in Python. Velocity is very high. Then the project grows into a monstrosity. Suddenly, the awful Python "VM" (slow as hello) and the lack of true multi-threading are major project liabilities. They should have started with something much less sexy: DotNet (C#) or Java. Repetitive? Hell yes, but easy to maintain and cheap to hire devs for.


FWIW, the JVM is one of the supported backends of the Raku Programming Language. Has been for quite some time actually. It was the first additional backend supported after Parrot, and provided a template for supporting the MoarVM and Javascript backends since then.


I'm sorry but I really don't think it's "easy" and your suggestion would be just part of a much, much larger solution.


I don't see why you wouldn't have holdouts using GIL for valid single-threaded performance reasons for years/decades. And that's ignoring legacy code - even 2.7 is still alive and kicking in some corners.


I'm sure you would, but anyone who cared about it would work around it, either by not using that library or finding a different one or even forking the one that isn't updated. Just like most people use Python 3.x now, but there are some 2.7 holdouts. But those holdouts aren't holding back the entire ecosystem at this point.


They did for many, many years. And the result soured many devs and companies on Python. And many libraries have not been migrated.

But yes, I'm very glad they've been cut off now. It does eventually shift to just the new stuff.


The difference is that no-GIL is a minority use case. Most people are using Python in single-threaded workflows (numpy/scipy, machine learning, build scripts, etc.) and don't care about the GIL. The GIL mostly affects webdevs.

Probably the library support will be much worse than 2-3, but if only a minority of users are impacted by libraries that don’t support it then it’s not a big issue. GIL concerned users can either carefully select their dependencies to be no-GIL supporting, or they can decide to go with the GIL.

2-3 was bad because every single library, app, and tiny script had to be migrated, if it used a string or a print statement.


[flagged]


jedberg doesn't seem like a naïve person, glancing at his work history. Perhaps you could explain why you think this opinion is naïve?


Even skilled programmers often make the mistake of saying "why don't you just..." or "it's easy, just..." when completely ignoring large important factors, such as following process, ensuring backward compatibility, stakeholder alignment (ugh), and addressing long tail problems.


Yes, but it's reasonable to note such things instead of simply derogating the remark.


"There are two types of programmers: new ones who don't know how complicated things are, and experienced ones who have forgotten it" -from an article on HN the other day about setting up python envs.


There also are programmers who are aware of much more complicated things being done in sister projects, like JVM and JS runtimes in this case.


An appeal to authority? I suppose it'd be better if you named the article's author or linked to it.


Not appealing to anyone; I thought it was a clever quote.

* https://www.bitecode.dev/p/why-not-tell-people-to-simply-use

EDIT -- Here is the quote:

There are usually two kinds of coders giving advises. A fresh one that has no idea how complex things really are, yet. Or an experienced one, that forgot it.


It's clever, but like (almost) all such things, doesn't always apply.


I really think the GIL is saving a bunch of poorly written multi-threaded C++ wrappers/libraries out there. If they remove it, a bunch of bugs will appear in other libraries that might not be Python's fault.


They're not "poorly written", the fact that you don't need to do any locking in C/C++ code is part of the existing Python API. Right now when Python code calls into C/C++ code the entire call is treated as if it's a single atomic bytecode instruction. Adding extra locking would just make the code slower and would accomplish absolutely nothing, which is why people don't do it.


The interpreter could do locks around these calls automatically to make them atomic, while leaving itself multithreaded.
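As a toy model of that suggestion (with `native_call` standing in as a hypothetical C-extension entry point), the interpreter would effectively be doing something like:

```python
# Sketch: serialize every call into native code behind one shared lock
# so each call appears atomic, while pure-Python threads run unlocked.
# Note this re-serializes all extension calls against each other.
import threading
import functools

_extension_lock = threading.Lock()

def atomic_extension_call(func):
    """Wrap a native entry point so each call runs under the shared lock."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _extension_lock:
            return func(*args, **kwargs)
    return wrapper

@atomic_extension_call
def native_call(x):
    # Stand-in for a C function that assumes it cannot be entered
    # concurrently from two threads.
    return x * 2
```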


In order for the call into C to appear atomic to a multithreaded interpreter, all threads in the interpreter would need to be blocked during the call. That's possible to do, but you've just re-introduced the GIL whenever any thread is within a C extension.

In the unlocked case, one could use low-overhead tricks used for GC safepoints in some interpreters. One low-overhead technique is a dedicated memory page from which a single byte is read at the beginning at of every opcode dispatch, and you mark that page non-readable when you need to freeze all the threads executing in the interpreter. You'd then have the SIGSEGV handler block each faulting thread until the one thread returned from C. That's fairly heavy in the case it's used, but pretty light-weight if not used.


Nevertheless, this is still a concern for the wider ecosystem if Python libraries suddenly start to break due to underlying issues. I don't think this can be neglected.


Like in any other language, it's best to avoid native dependencies.


Unfortunately, that's simply impossible if you want to tap more than 10% of your machine's power when using Python.


> I think it's too late to consider removing the gil from the main implementation.

I think it'll happen one day. Is Python going anywhere?

Give it 20 years. The 2 -> 3 switch will be like the Y2K bug, only remembered by the oldest programmers. The memories of pain will fade, leaving only entertaining war stories. The GIL will still be there, and still be annoying.

Then, when everyone has forgotten, the community will be ready. For an incredibly long and grinding transition to Python 4.


Wat! I'm experiencing 2-3 pain right now! Google has given me till January next year to port all my apps that have been running happily for 10 years. It's no small job either.

I'm already in a state where I can't upgrade to the latest gcloud tools because 2.7 is removed from them. I'm stuck on a version from late last year until I've finished my ports or abandon my projects.

I was a very happy user of GAE standard for many many years, but they burned me in the end and I've learnt my lesson about building on proprietary tools.

Standard frameworks and standard databases for me from now on!


That's 4 years after the EOL of Python 2.7: https://www.python.org/doc/sunset-python-2/


In 20 years, you wouldn't have Python 4. You will have something like ChatGPT that you interact with and it writes code for you, down to the machine level instructions that are all hyperoptimized. Coding will be half typing, half verbal.


Imagine debugging hyperoptimized machine code. - Or would you just blame yourself for not stating your natural language instructions clearly enough and start over? I guess all of these complex problems would somehow be solved for everyone within the next 21 years and 364 days.


You wouldn't debug it directly. The interaction will be something like telling the compiler to run the program for a certain input, and seeing wrong output, and then saying, "Hey, it should produce this output instead".

The algorithm will be smart enough to simply regenerate the code minimally to produce the correct output.


I think you have described a new definition of hell.


Yeah, when people start running into weird failure modes with no insight into the code... sounds like a nightmare.


Input-output mapping. Point out what the output should be for any failure mode, and the compiler will auto-fix the code.


What you wrote is naive. I don't think AI will ever be able to guess what the correct output for arbitrary inputs should look like just from examples. You need to specify an algorithm that produces the correct output for all inputs. AI might be able to optimize that, but not fill in gaps in the definition of the goal.


Good luck with that.


While titillating to think about, what you're describing is more than 20 years away for AI. The best it can do today is handle generalized problems; actually writing new, original, complex code is not within the bounds of current AI capabilities.


Yeah, I bet nothing of that sort happens. This programming sci-fi is entertaining to read but it's not grounded in reality.


I agree with this take.

The GIL is hiding all kinds of concurrency bugs. If the CPython team disables it by default, then all hell is going to break loose.

It's better to carve out special concurrency constructs for those that need it.


Not sure why this was flagged dead. If you look at many Stack Overflow answers around threading, a lot of them explicitly rely on the GIL, quoting source/documentation "proving" that it's safe while avoiding the use of threading.Lock() and the like.

As an early Python programmer, I copy-pasted these types of answers. My old threaded code absolutely has these bugs. I've even seen code in production that has these bugs, because they're sometimes dumb performance enhancements, rather than bugs, unless you happen to use a Python interpreter without a GIL, especially one that doesn't exist yet.

Great care would have to be taken to make sure the GIL was not disabled by default, for anything an existing thread touches (or some super, dynamic aware, smarts to know if it can be disabled).


I believe that GIL-removal projects aim to preserve this behaviour:

https://peps.python.org/pep-0703/#container-thread-safety

That's why gilectomy carries an unreasonable single-threaded performance cost: many operations now need to take a lock where before they relied on the GIL.


Why do you consider code that relies on the GIL to be buggy? Isn't the GIL a documented, stable part of Python? (Hence why it will probably never be removed).


5th sentence in my comment:

> I've even seen code in production that has these bugs, because they're sometimes dumb performance enhancements, rather than bugs, unless you happen to use a Python interpreter without a GIL, especially one that doesn't exist yet.


The GIL protects interpreter resources, not your program. If you have concurrent access to your own objects you need your own locks.


“Interpreter resources” are just python primitives, from the user perspective. And, from that perspective, you can sometimes get away without using user managed locks, by relying on the GIL, in your objects. For example, you can trivially use a list as a multi threaded queue, using `.append()` and `.pop()`.
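Concretely, the pattern looks like this (a sketch that is only safe because `append` and `pop` each execute as a single atomic step under the GIL; there is no `threading.Lock` anywhere):

```python
# Producer/consumer using a bare list as the queue, relying on the GIL
# for the atomicity of append() and pop(0). Safe here with exactly one
# producer and one consumer; with the GIL removed, or with more
# consumers, the check-then-pop below becomes a data race.
import threading

work = []
results = []
done = threading.Event()

def producer():
    for i in range(1000):
        work.append(i)
    done.set()  # set only after all appends are finished

def consumer():
    # Exit only once the producer is done AND the queue is drained.
    while not (done.is_set() and not work):
        try:
            results.append(work.pop(0) * 2)  # FIFO handoff
        except IndexError:
            pass  # queue momentarily empty; spin

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```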


> Like jython, and pypy and other implementations, it never caught on because compatibility with cpython is one of the most important criteria for people dealing with lots of python code.

I think this is more an issue of popular packages being developed and tested against CPython, with not enough effort available to port/test them against anything else. There's no special magic in CPython that Python programmers love; they just want their code to run. If they've got a numpy dependency (IIRC it doesn't support PyPy, but I'm not going to look it up, so I may be corrected on that point; this is a long parenthetical), they can't use an interpreter that doesn't support it. Even if it worked but had bugs it didn't have in CPython, they'd still use CPython. Most people aren't writing Python for super-duper-fast performance, so they're fine leaving a little performance on the table by using the interpreter that their dependencies support. Whatever that is.


Jython doesn't have a good story for running native libraries like Numpy.


Sure, I didn't claim it did. My point is that Python programmers don't tend to have a particular fondness for an interpreter. They tend to care only about the ecosystem. If you came up with a cxpython interpreter that was faster than cpython and supported all the modules the same way (including C interop) Python programmers would jump over to it. If your cxpython was faster than cpython but didn't support everything they'd ignore it.

Case in point: Python 2.7. While 3.x offered a lot of improvements it took years for some popular modules to support it. No one bothered to look twice at Python 3 until their dependencies supported it.

Python programmers don't tend to care much about the interpreter so much as the code they wrote or use running correctly.


Numpy also wasn't very prominent in those days among Python users, the explosion of data science Python use happened later.

(Actually NumPy didn't even exist for most of Jython's active development, but some of its predecessors did)


> the python core team burned the community for 10 years with the 2-3 switch, and a GIL change would be likely as impactful

A core team led by him, which also had the opportunity to make much more impactful changes during 3, including removing the GIL, since they were going to mess up compatibility anyway, but didn't.

All that mess (and resulting split and multi-year slowdown in Py3 adoption) just to put in the utf-8 change and some trivial changes. It's only after 3.5 or so that 3 became interesting.


I believe it's a lot easier to state this long after the migration took place. Removing the GIL, or any host of other changes, could have split the community to a place where the language would have been abandoned except amongst a small minority.

At this point, removing the GIL is such a large change that it would probably be better to wait until practical single-threaded performance gains have been exhausted, and then remove it. At that point you would have community consensus behind the change, and the holdouts wouldn't have a stranglehold keeping the community from moving forward.

I'm not sure if I would call the move from 2 -> 3 a trivial change, but maybe you're more well versed in Python than I am (and I'm fully willing to admit that may be the case).


It was such a big screwup they had to backtrack on one of the changes, resulting in 2.7 -> 3.3 being easier than 2.7 -> 3.0


Do you have in mind u'unicode' syntax?


Yep

Edit: To be clear, it specifically made it difficult to support 2 and 3 at the same time, which was a problem for libraries. And without the libraries supporting 3, app code wouldn't be able to migrate either.


I think that's called "learning from one's mistakes". I hope.


It's hard to argue that Python 3 should have been even less compatible.


Is it? It took the five-year (at least) adoption hit anyway. How much worse would it have been if it had more features people wanted?

I'd say it should have been less compatible where needed to add the more substantial changes people wanted, as opposed to taking the hit for nothing!

And it should have been more compatible for things that were stupid decisions they eventually had to take back, like not having a bytes/str solution.


> if it had more features people want

What people wanted was features to help with the migration.

Yes, of course having those features would have helped.

But doing that required experience the Python developers didn't have when they were doing 3.0!

The Python developers thought people could do a one-off syntactic code translation (2to3), perhaps even at install time, rather than what most people did - write to the common subset of 2 and 3, with helpers like the 'six' package.
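A minimal example of that common-subset style (stdlib only, no `six`, so it stays self-contained):

```python
# One file that runs unchanged on Python 2.6+ and Python 3, written to
# the common subset via __future__ imports rather than 2to3 conversion.
from __future__ import print_function, division, unicode_literals

def mean(values):
    # `division` makes `/` true division on Python 2, matching Python 3
    return sum(values) / len(values)

print(mean([1, 2, 3, 4]))  # 2.5 on both major versions
```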

What are the "more substantial changes" you propose? The walrus operator? Matching? Other things that Python 3 eventually gained, and which took years in some cases to develop?

Or are you proposing something that would have it more difficult to write to the common subset?

That subset compatibility necessity extends to the Python/C API. Get rid of global state and you'll need to replace things like:

   PyErr_SetString(PyExc_ValueError, "embedded null character");
with something that passes in the current execution state. Make that too hard, and you inhibit migration of existing extension modules, which further inhibits the migration.


> The Python developers thought people could do a one-off syntactic code translation (2to3), perhaps even at install time

Perhaps they overestimated how much of the change 2to3 would handle, but they surely didn't think that, because they already knew they had put in all kinds of changes that aren't automatable via "syntactic code translation", where context is needed (e.g. the str changes).

> What are the "more substantial changes" you propose? The walrus operator? Matching? Other things that Python 3 eventually gained, and which took years in some cases to develop?

No GIL, more optimization - a wasted opportunity for a big speed bump, dropping legacy stuff, typing support, an improved C extension model, JIT, specialization, green threads, and so on. Things like the walrus operator would be at the very bottom...

They could have gone for "compatibility" to ease the migration (or make it a non-issue, just run backwards compatibly) sure.

But since they decided to go ahead and break things, and bump the version name, a change the community has been discussing since the mid-90s as some grand vision of big changes, they could have done more substantial stuff at least.

As it stands, they broke compatibility and cost the community 5-6 years for not much. Everything big came later in 3.x, and not because it couldn't just as well have been added piecemeal to some 2.8 and onwards...


> but they surely didn't think that

They surely did! And it tells me you don't know the history that well.

You can read PEP 3000 where they say almost exactly that, at https://peps.python.org/pep-3000/ :

  The recommended development model for a project that needs to 
  support Python 2.6 and 3.0 simultaneously is as follows:

  0.  You should have excellent unit tests with close to full coverage.
  1.  Port your project to Python 2.6.
  2.  Turn on the Py3k warnings mode.
  3.  Test and edit until no warnings remain.
  4.  Use the 2to3 tool to convert this source code to 3.0 syntax.
        Do not manually edit the output!
  5.  Test the converted source code under 3.0.
  6.  If problems are found, make corrections to the 2.6 version of
        the source code and go back to step 3.
  When it’s time to release, release separate 2.6 and 3.0 tarballs (or
  whatever archive form you use for releases).
They really did not consider having a usable subset that could work on Python 2 and Python 3. The same PEP says:

  There is no requirement that Python 2.6 code will run unmodified on
  Python 3.0. Not even a subset. (Of course there will be a tiny subset,
  but it will be missing major functionality.)
Remember, it wasn't even until Python 3.3 that they restored the u'unicode' syntax "provided solely to reduce the number of purely mechanical changes in migrating to Python 3, making it easier for developers to focus on the more significant semantic changes (such as the stricter default separation of binary and text data)." https://docs.python.org/3.3/whatsnew/3.3.html

And it wasn't until Python 3.5 that they supported old-style %-formatting, like (b"%d" % n), both because it's useful in wire protocols, and "to help ease migration from, and/or have a single code base with, Python 2" https://peps.python.org/pep-0461/ .

> No GIL ... and so on

The features you listed required years of development, and some, like the JIT support, are still in-progress, while the no-gil proposal seems beyond what the steering committee is willing to accept.

It sounds like you want them to have delivered Python 3.12, with all these features (many of them iterated on over several releases), instead of 3.0, and without any migration path from 2.x Python or Python/C extensions beyond wholesale rewriting.

There is no way they would have managed to deliver a stable release in a timely fashion following your suggestion. As it was, it took about a decade to deliver a version that had an effective migration pathway. Python 3.5 is the first version I supported, in no small part because it supported bytes % interpolation.

> And not because it couldn't just as well be added piecemeal to some 2.8 and onwards...

Perhaps it could have been done in an abstract and technical sense. But the Python core developers decided they didn't have the resources (money, people, etc) for that path.

I don't see how some of the changes, like exception chaining, could have been implemented without breaking backwards compatibility. Even things as simple as evaluating (a<b) or "Hello".encode("base64") changed at a pretty deep level.

Remember, Python 3 was always meant as "a relatively mild improvement on Python 2, [because] we can gain a lot by not attempting to reimplement the language from scratch." https://peps.python.org/pep-3000/ That was how the effort was justified.

The Perl6/Raku experience was definitely part of the zeitgeist in the discussion against larger changes like you think were needed.

What are your counter-examples that might soothe the concerns should a cusp like this appear in the future? I can't think of any good ones.

> and cost the community 5-6 years for nothing much. Everything big came later in 3.x.

The core developers argued that the technical debt in the code base was high, and while it would take years, those changes would enable the later big improvements you now see.

You seem to look at the big improvements and somehow believe they could have been done 15 years ago, on the old code base, with the available knowledge and resources of the people then.

You believe this because, why?


> How worse would it be

raku


Raku failed to deliver a stable release in a timely manner. And it didn't stick to sensible new features, but tried to make an uber-language with everything plus the kitchen sink, and even a multi-language VM.


You are literally asking what could happen if Python 3 had done the same.


> How worse would it be

8 years

or never


Agreed. It would have pushed the 2->3 migration from "very painful for the ecosystem" to a full on perl5->6 break between the two versions. Not sure it would have survived that.


It's not like Python adoption and use was significantly boosted by Python3: it was and would have been just fine without it.

The problem with languages written by language designers/implementers is that they often don't know when to stop. Wirth is a shining example here: Pascal -> Modula(n) -> Oberon. You need to move on, and leave the old alone.


I wouldn't say that GIL will never be removed, but I believe the GIL cannot be removed without breaking a lot of existing code.

That means there could be another drama with migrations if that would be done.

I think the most likely way they can effectively eliminate the GIL would be to provide a compile option that would basically say "this enables parallelization, but your code is no longer allowed to do A, B, C" (there would probably be a lot more things).

People who want to get it would then adapt their code, and there could be pressure for other packages to make them work in that mode.


> Histortically, the main value of GIL removal proposals and implementations has been to spur the core team to speed up single core codes.

While I'm all for increasing single threaded performance [side rant], is this really a good argument? My understanding is that Moore's Law only currently exists because of the existence of more cores. Clock frequency has stagnated since 2004[0]. I mean feature size (Moore's Law) still scales[1], but IPS/core seems significantly flatter than IPS[2]. Aren't most of our performance gains from multicore? And as we push more in this direction (e.g. GPU taking over compute) doesn't this make things even worse for python? (Yes, I understand that you can make C/CUDA calls, but doesn't the GIL cause problems here as well as prevent a native python solution?)

[side rant] Still frustrated by things like the difference in speeds between math.sqrt(2.) (~0.002s/10k), np.sqrt(2.) (~0.010/10k), and torch.sqrt(torch.tensor([2.])) (~0.039/10k). I know there's more going on in terms of capabilities, but the large differences are of course frustrating.

[0] http://cpudb.stanford.edu/visualize/clock_frequency

[1] http://cpudb.stanford.edu/visualize/technology_scaling

[2] https://en.wikipedia.org/wiki/Instructions_per_second#Millio...


That difference (at least in the math->numpy case) is entirely down to conversions (or at least it was when I profiled it ~7 years ago). If you're careful with types (i.e. avoiding converting from float to numpy scalar values), then the difference disappears.


> compatibility issues trump performance".

Surely it would be possible to drop back to GIL mode if any c extension is loaded which is incompatible with running lockless?

Then you usually get speed, yet still maintain compatibility.


> if any c extension is loaded which is incompatible with running lockless?

It'd be slightly magical to do. You'd probably have a giant RWLock, which is used for checking whether you're in the "lock free" mode. But at least almost all code could hit it only for read, and it could go away one day.


GIL is one of the things that make Python an annoyance to work with. In saner languages, you could handle multiple requests at the same time, or easily spin something off in a thread to work in the background. In Python you can't do this. You need to duplicate your process, then pay the price in memory usage and the other things multiple processes hinder (communication between workers, and pre-computed values that are no longer shared, so you need something external again). To deploy your app, you end up with 10 different deploys because each of them has to have a different entry point and a separate task to fulfill.


> GIL is one of the things that make Python an annoyance to work with

For your particular usecase, yes. Personally I've been using Python for like 20 years for various tasks and have never once been really bothered by its presence. Worst case was having to wait somewhat longer for things to complete. For my case: still worth it compared to making things multithreaded. And async fixed the rest. And the things which I actually need to be fast aren't usually in Python anyway. I'm not saying the GIL should stay; it's just that it doesn't seem like much of a problem in the general land of Python. Or in other words: how many Python users out there even know what the GIL means and does?


> For your particular usecase, yes.

The use case they are describing is a standard web server or web application. That's a pretty important and widely applicable use case to dismiss out of hand as "your particular usecase".


The GIL is not held during IO, which is what most web applications and web servers should be spending the vast majority of their time doing.

https://docs.python.org/3/library/threading.html
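
A stdlib-only sketch of that point, using time.sleep as a stand-in for a blocking socket read (blocking calls like this release the GIL, so the threads overlap):

```python
import threading
import time

def fake_io(results, i):
    # a blocking call like socket.recv or time.sleep releases the GIL,
    # so the other threads keep running while this one waits
    time.sleep(0.2)
    results[i] = i

start = time.perf_counter()
results = [None] * 8
threads = [threading.Thread(target=fake_io, args=(results, i)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

print(results)
# eight 0.2s waits complete in roughly 0.2s of wall time, not 1.6s
print(f"elapsed: {elapsed:.2f}s")
```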

If that’s too limiting, preforking and other forms of process-based parallelism are a tried and true approach that has been used for years to run python, ruby, PHP, and once upon a time Perl web applications at enormous scale. The difference between threads and processes on Linux is relatively minor.

Saying that python doesn’t work for web application use cases because of the GIL is frankly sort of bizarre given the large number of python web applications in the wild chugging along delivering value.


> The GIL is not held during IO, which is what most web applications and web servers should be spending the vast majority of their time doing.

While this has been oft-repeated for years, more or less language-independently, I have become convinced it no longer accurately describes ruby on rails apps. People still say it about ruby/rails too though. But my rails web apps are spending 50-80% of their wall time in cpu, rather than blocking on IO. Depending on app and action. And whenever I ask around for people who have actual numbers, they are similar -- as long as they are projects with enough developer experience to avoid things like n+1 ORM problems.

I don't have experience with python, so I can't speak to python. but python and ruby are actually pretty similar languages, including with performance characteristics, and the GIL. Python projects tend to use more stuff that's in C, which would make more efficient use of CPU, so that could be a difference. (Also not unrelated to what we're talking about!)

But I have become very cautious of accepting "most web applications are spending the vast majority of their time on io blocking rather than CPU" as conventional wisdom without actually having any kind of numbers. vast majority? I would doubt it, but we need empirical numbers.


But are they CPU limited?

There’s a difference between spending most of your time in CPU, and being CPU limited. If I’m serving 10 requests/second then even if I’m 90% CPU and 10% IO, it doesn’t matter because I’m 99% idle.

My mental model for a very high-interpreter overhead language like Ruby or Python for webdev, is that it is appropriate for sites that don’t see that much absolute interactive traffic. e.g. it’s good for a blog where you can put a cache in front of it, or an intranet service where you control the number of users. I would never use Python for a fully interactive high traffic service sitting on the open internet. There are languages that are much better at that.


> If I’m serving 10 requests/second then even if I’m 90% CPU and 10% IO, it doesn’t matter because I’m 99% idle.

I get your point I think, but that depends on the capacity of the host you are on, and how much the total amount of CPU is, not just the proportion vs IO.

Rails may not be a good comparison to python, perhaps it is especially a CPU hog, but I have definitely seen rails apps for which 90% CPU and 10 rps would saturate the hosts CPU yeah.

I mean, the math is--- if you have a 111ms response time, 90% of that was spent on cpu (90% of 111 is 100), and you have 10 requests per second (100ms * 10 == 1 second) -- you are now saturating a single core cpu, right? Those are not crazy numbers.

But of course Rails has a GIL too -- so you can be "cpu limited" with spare CPU available on other cores, depending on how you've set things up -- that's the original conversation topic here, right? How the GIL may or may not complicate attempts to make efficient use of resources?

For what we're talking about, the issue I guess is whether they would be CPU limited with the GIL but not without the GIL. Which seems plausible.

(And of course Rails is very very commonly used "for a fully interactive high traffic service sitting on the open internet" -- so I don't see why python, a language with pretty similar relevant characteristics, couldn't or shouldn't be? Despite having little python experience, the issues with GIL are something very very similar in Ruby/Rails, is why I'm in the conversation)

It's easy to start getting confused about what we're talking about here, or be talking about different things at once.


> I would never use Python for a fully interactive high traffic service sitting on the open internet.

So, nothing like Instagram, Dropbox, or YouTube?


> which is what most web applications and web servers should be spending the vast majority of their time doing.

Sure... but if you have dozens of threads spending most of their time doing I/O, that still leaves many threads wanting to do things other than I/O.

> The difference between threads and processes on Linux is relatively minor.

Except having any shared state between processes is painful. If you're hitting an outside database for everything, it's fine.


The dismissiveness really goes the other way. Pythons like IronPython and Jython don't have a GIL. CPython does because it's primarily a glue language for extensions that might not be thread-safe. Web apps were given huge accommodation with async, so you can't say their needs are being dismissed. Why must we break the C in CPython for a use-case that could use one of the GIL-free Pythons?


That's somewhat out of context. With the bit you quoted I meant "sure working around the GIL by implemening a web server in that particular way is annoying". I'm not saying that "web server" as a whole is not important or not widely applicable, merely that amongst all other usecases and applications of Python out there, web servers are just one of many. And the particular implementation stated like "10 different deploys" is even a subset of that 'one' and as explained by fellow comments, probably not the most appropriate one.


> The use case they are describing is a standard web server or web application.

I believe this is what they were referring to when they said “async fixed the rest”.


I think Data Scientists would like a word with you. They have plenty of time since their parallel pipeline was OOMKilled.


Other languages do that: JavaScript, PHP, Erlang.

Python multiprocessing is pretty usable.

I like multithreading, but also... it has more footguns than the rest of programming combined. [1]

I'm not convinced Python's approach is that bad in practice.

[1] https://news.ycombinator.com/item?id=22165193


It is, until the GIL bites you in the ass. As it is, you get different behavior if your call is calling out to external code vs. being pure python. Note that you really don't know if a random function call is python or is wrapping an external call, so you effectively get random behavior.

The time it got me was a thread whose only job was to time out another process. Tests worked great, but the timeout didn't work in production because the call it was wrapping went out to C code, so nothing would run until the call returned. We even still got the timeout error in the logs and it looked like it was working (it even tossed the valid results it had waited for), but not at the time of the timeout - only after the call finally returned a few hours later.
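
This failure mode can be reproduced with only the stdlib: re matching runs in C and holds the GIL for the entire call, so a watchdog thread can wake from its sleep on time but cannot reacquire the GIL (and thus cannot act) until the call returns. A sketch, using catastrophic regex backtracking as a stand-in for the long C call:

```python
import re
import threading
import time

fired = []

def watchdog():
    time.sleep(0.05)  # the sleep releases the GIL and wakes on schedule...
    fired.append(time.perf_counter())  # ...but this line needs the GIL back

start = time.perf_counter()
t = threading.Thread(target=watchdog)
t.start()

# catastrophic backtracking keeps the C regex engine busy without ever
# releasing the GIL, so the watchdog is stuck until the call returns
re.match(r"(a+)+$", "a" * 24 + "b")
regex_done = time.perf_counter()
t.join()

# the watchdog only got to run around the time the C call finished,
# not at the 0.05s mark it was scheduled for
print(f"watchdog acted {fired[0] - start:.2f}s after start (scheduled at 0.05s)")
```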


So.... It would have been better if GIL were even more aggressive?


> you could handle multiple requests at the same time

To be fair to Python and the GIL, it's totally capable of parallelizing requests when most of the work is network-bound, which is probably the common case. And when the work is CPU-bound, but the CPU-intensive part is written in C, it's also possible for C to release the GIL. So it's really only "heavy computational work directly in Python" programs that are affected by this. (On the other hand, Python applications do naturally expand to look like this over time...)
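
A rough stdlib sketch of the affected case: pure-Python CPU work gains nothing from threads under the GIL (exact timings are machine-dependent; the point is only that the threaded version is not meaningfully faster):

```python
import threading
import time

def burn(n):
    # pure-Python arithmetic: holds the GIL except at bytecode switch points
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

start = time.perf_counter()
burn(N)
burn(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# under the GIL the two threads take turns, so no ~2x speedup appears
print(f"serial={serial:.2f}s threaded={threaded:.2f}s")
```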


> You need to duplicate your process, then pay the price of memory usage ...

I believe the implementation of the Python multiprocessing package uses fork() on *nix systems which means memory should be copy-on-write.

Processes do have the advantage of being self-contained, meaning if one crashes, it will not take down the entire system. Also, the message passing model can theoretically scale to processes running on separate hardware. Threads cannot do that.


This is your opinion, but many disagree. I personally find processes infinitely easier to deal with than threads; it's not even a discussion in my mind. For me, threads are banned from use until absolutely necessary. When I write code in C++ there are cases where a problem cannot be solved unless threads are used. It's fine to use threads there. For everything else I appeal to either multiprocessing or `async` concurrency.


The other languages are not saner. You are basically saying "Python's GIL is annoying because I can't write parallel, performant code in Python". Python has never been and is not a performant language. It's designed for rapid and easy development.

The multiprocessing+asyncio combination in Python fulfills the goal of utilizing all the resources, albeit at a higher memory cost, but memory is dirt cheap these days. You have a master process and then worker processes. For the things you would write in Python, where in >90% of cases you are network latency limited, the paradigm of a master process and worker processes with IPC over unix sockets works extremely well. Set up a web app with FastAPI, a gunicorn master, and uvicorn workers, and it will be plenty fast enough for anything you do.


No it is not.

If you want to reach peak performance, a single-threaded app with no locks is the way to go, with work being sharded (not shared) among multiple single-threaded apps.

Multi-threaded apps with shared state introduce more complexity than the performance they gain when compared to multiple single-threaded apps each running an asyncio event loop.

For example LMAX Disruptor


Yeah, they clearly stated that here https://discuss.python.org/t/pep-703-making-the-global-inter...

I really wish that GIL go away. It is better to pay this price now, multi-threading is the future.


>multi-threading is the future

Haha, reminds me of an image I saw with a farmer from some developing country saying "irrigation is the future".

For everyone else multithreading has been the status quo for quite a long time.


It's relatively simple to make the GIL go away: one approach would be to compile to some VM that has a good concurrent garbage collector. Yes, this will break some assumptions here and there, but it's not too difficult to overcome, especially if you bump the version number to Python 4.

However, that leaves a lot of C code that you can't talk to anymore because the C code requires the old Python FFI. I think this is where the main problem lies.


>It's relatively simple to make the GIL go away: just compile to some VM that has a good concurrent garbage collector would be one approach. Yes, this will break some assumptions here and there, but not too difficult to overcome especially if you bump the version number to Python 4.

"It's easy to lower the air-conditioning costs of Las Vegas: just move the town to New England".

The problem isn't "how to remove the GIL" in the abstract. It's how to remove the GIL, not impact extensions at all (or as little as possible), keep single-threaded performance, and have zero impact on user programs.

To which the above isn't any kind of solution.


> It's relatively simple to make the GIL go away: just compile to some VM that has a good concurrent garbage collector would be one approach

Sure, if you don't mind paying a 50-90% performance penalty on single-threaded code, or completely abandoning C-API compatibility and having C extensions start from scratch, then there are simple approaches.

If you look at any past attempt to remove the GIL, you would see that satisfying these two requirements (not having terrible single-threaded performance, and not having an almost completely new C-API) is actually very complex and takes a lot of expertise to implement.


This might be a dumb question, but why would removing the GIL break FFI? Is it just that existing no-GIL implementations/proposals have discarded/ignored it, or is there a fundamental requirement, e.g. C programs unavoidably interact directly with the GIL? (In which case, couldn't a "legacy FFI" wrapper be created?) I know that the C-API is only stable between minor releases [0] compiled in the same manner [1], so it's not like the ecosystem is dependent upon it never changing.

I cannot seem to find much discussion about this. I have found a no-GIL interpreter that works with numpy, scikit, etc. [2][3] so it doesn't seem to be a hard limit. (That said, it was not stated if that particular no-GIL implementation requires specially built versions of C-API libs or if it's a drop-in replacement.)

[0]: https://docs.python.org/3/c-api/stable.html#c-api-stability

[1]: https://docs.python.org/3/c-api/stable.html#platform-conside...

[2]: https://github.com/colesbury/nogil

[3]: https://discuss.python.org/t/pep-703-making-the-global-inter...


> C programs unavoidably interact directly with the GIL?

Bingo. They don't have to, but often the point of C extensions is performance, which usually means turning on parallelism. E.g. Numpy will release the GIL in order to use machine threads on compute-heavy tasks. I'm not worried about the big 5 (numpy, scipy, pandas, pytorch, and sklearn), they have enough support that they can react to a GILectomy. It's everyone else that touches the GIL but may not have the capacity or ability to update in a timely manner.

I don't think this is something which can be shimmed or ABI-versioned, either. It's deeeep and touches huge swaths of the cpython codebase.


Thanks, that explains a lot. Sounds like a task that would have to be done in Python 4, if ever it exists.


> or is there a fundamental requirement, e.g. C programs unavoidably interact directly with the GIL?

C programs can both use the GIL for thread safety and make assumptions about the safety of interacting with a Python object.

Some of those assumptions are not real guarantees from the GIL but in practice are good enough; they would no longer be good enough in a no-GIL world.

> I know that the C-API is only stable between minor releases [0] compiled in the same manner [1], so it's not like the ecosystem is dependent upon it never changing.

There is a limited API tagged as abi3[1] which is unchanging and doesn't require recompiling and any attempt to remove the GIL so far would break that.

> so it's not like the ecosystem is dependent upon it never changing

But the wider C-API does not change much between major versions, it's not like the way you interact with the garbage collector completely changes causing you to rethink how you have to write concurrency. This allows the many projects which use Python's C-API to relatively quickly update to new major versions of Python.

> I have found a no-GIL interpreter that works with numpy, scikit, etc. [2][3] so it doesn't seem to be a hard limit.

The version of nogil Python you are linking to is the product of years of work by an expert funded by Meta to work on this full time, drawing on knowledge from many previous attempts to remove the GIL, including the "gilectomy". Also, you are linking to the old version based on Python 3.9; there is a new version based on Python 3.12[2]

This strays away from the points I was making, but with this specific attempt to remove the GIL if it is adopted it is unlikely to be switched over in a "big bang", e.g. Python 3.13 followed by Python 4.0 with no backwards compatibility on C extensions. The Python community does not want to repeat the mistakes of the Python 2 to 3 transition.

So far more likely is to try and find a way to have a bridge version that supports both styles of extensions. There is a lot of complexity in this though, including how to mark these in packaging, how to resolve dependencies between packages which do or do not support nogil, etc.

And even this attempt to remove the GIL is likely to make things slower in some applications, both in terms of real-world performance (some benchmarks, such as MyPy, show a nearly 50% slowdown, and there may be even worse edge cases not yet discovered) and in terms of lost development, as the Faster CPython project will be unlikely to land a JIT in 3.13 or 3.14 as they currently plan.

[1]: https://docs.python.org/3/c-api/stable.html#c.Py_LIMITED_API [2]: https://github.com/colesbury/nogil-3.12


>> However, that leaves a lot of C code that you can't talk to anymore because the C code requires the old Python FFI. I think this is where the main problem lies.

This is exactly the problem, but people have a hard time grasping this because most people interacting with Python have no understanding of how C code interacts with Python, or don't understand the C module ecosystem. I'm not sure if the Python community has a good accounting of this either because I don't recall seeing much quantitative analysis of how many modules would need to be updated etc.

This would help compare with the Python 2 to 3 conversion efforts. Even then, the site listing (shaming?) popular modules' compatibility status made a mid-to-late appearance in the process of killing Python 2. Quantifying module updates is an obvious thing to have from the get-go for anyone looking to follow through on removing the GIL, but it's not a fun task.


This needs more thinking but how about a hybrid approach, where you have Thread objects, and GILFreeThread objects?

The Thread objects work with old code, but run more slowly.

The GILFreeThread objects are fast.

If an object is passed from a Thread to a GILFreeThread or the other way around, then special safety code is attached to the object so that manipulating the object from the other side doesn't cause issues.

The advantage is that now the module implementers have time to migrate from the old system to the new system. And users can work with both the old modules and "converted" modules in the same system, with minor changes.


That sounds like a maintenance and stability nightmare, if it's even possible. You are effectively red/blue splitting the entire codebase. PyObject and the GIL touch everything in the codebase.


The red/blue splitting happens behind the scenes, so it's different. Not really a color problem, because the user doesn't have to know about it.

But yeah, you will basically have two versions of Python running at the same time, with some (hopefully invisible) translation between them.


> But the red/blue splitting happens behind the scenes, so it's different.

Respectfully, I don't believe you have spent any appreciable time looking at the CPython source code. If you had, you would understand how unreasonable this expectation is. I don't say this to tear you down, I say this to convey the magnitude of what you are describing. It would involve touching tens of thousands of LoC. You are talking about a multi-million dollar project that would result in a ton of near-duplication of code.

The red/blue is inescapable because you have to redefine PyObject to have two flavors, PyObject with GIL and GilFreePyObject. You now have to check which one you are dealing with constantly.


> You now have to check which one you are dealing with constantly.

No, because if you're running inside a Thread you will know that you will see only PyObjects, whereas if you're running inside a GilFreeThread you will know that you will only see GilFreePyObjects.

If you're manipulating the PyObject (necessarily from a Thread) then there will be behind-the-scenes translation code that will manipulate the corresponding GilFreePyObject for you. But you don't have to know about it.


What exactly does "running inside a Thread/GilFreeThread" in the context of the cpython runtime mean? You pretty much need an entire copy of the virtual machine code.

These are C structs we are talking about here, not some Rust trait you can readily parameterize over abstractly. That either means lots of manual code duplication, or some gnarly preprocessor action. Both are a maintenance nightmare.


Yes, the assumption is that writing a "double-headed Python" runtime is far less work than converting the entire ecosystem to a new Python runtime.

I think this is the correct view, because at this moment people are writing various approaches in an attempt at getting rid of the GIL. It's the ecosystem of modules that's the real problem, where you want to basically put in as little effort as possible per module, at least initially.


Please read any amount of CPython interpreter code to begin to understand what you’re asking for “behind the scenes”.


This sounds a bit like COM and its apartment-threaded vs. free-threaded objects. The "special safety code" in that case is a proxy object that sends messages to the thread that owns the actual object when its methods are invoked.
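
A toy sketch of that proxy idea in current Python (the names here are made up for illustration): method calls are marshalled over a queue to the single thread that owns the real object, which is roughly what a COM apartment-threaded proxy does:

```python
import queue
import threading

class OwnedObjectProxy:
    """Hypothetical sketch: forward method calls to the one thread
    that owns the real object, so no other thread ever touches it."""

    def __init__(self, factory):
        self._calls = queue.Queue()
        self._owner = threading.Thread(target=self._loop, args=(factory,), daemon=True)
        self._owner.start()

    def _loop(self, factory):
        obj = factory()  # the real object only ever lives on this thread
        while True:
            method, args, reply = self._calls.get()
            if method is None:  # shutdown sentinel
                return
            reply.put(getattr(obj, method)(*args))

    def call(self, method, *args):
        reply = queue.Queue()
        self._calls.put((method, args, reply))
        return reply.get()  # block until the owning thread answers

    def close(self):
        self._calls.put((None, (), None))
        self._owner.join()

# any thread may hold the proxy; only the owner thread touches the list
proxy = OwnedObjectProxy(list)
proxy.call("append", 1)
proxy.call("append", 2)
length = proxy.call("__len__")
print(length)  # 2
proxy.close()
```

The cost, of course, is a cross-thread round trip per method call, which hints at why a transparent version of this inside CPython would be slow.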


JRuby is a good path to this in the Ruby world.


this would literally break every single python package out there man


[flagged]


> Also, there aren't good programmers in Python core dev.

You seem pretty confident that you know what you are doing.


...you mean like how the nogil project already has a working Numpy module?


No it doesn't.

There are a couple dozen projects which link with NumPy. In order to declare NumPy working, you need to support all of them too.


I'm so conflicted.

~40 of the programs I am responsible for are single threaded. They were relatively quick to develop and were made by Electrical Engineers rather than career/degreed programmers.

2 programs use multithreading, I had to do that. The learning curve was not a huge deal, but the development time adds at least hours. In my case days (due to testing).

I imagine it's too hard to have an optional flag at the start of each program that can let the user decide?


> I imagine its too hard to have an optional flag at the start of each program that can let the user decide?

Adding nogil would mean deep changes to the interpreter. I imagine maintaining both versions would be almost like forking the project.


>but the development time adds at least hours

So? Some "hours of development time" is nothing.


True, and as you get better at it, those hours will be fewer.

The weeks of debugging never go away, though.

(Or at least, not as long as you're using shared state, but that's really the only thing under consideration here.)


I'm mostly with you. I think this probably affects part time/newbie programmers more.

A few hours of dev time is 1 or 2 nights of work.


The problem is that Python types are not thread safe, so you have to jump through more hoops to get safe parallelization in Python. These changes, it seems, would make writing multithreaded code much easier.
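
One of the hoops, sketched with the stdlib: a read-modify-write like `counter += 1` spans several bytecodes and is not guaranteed atomic by the language, so code that must stay correct across interpreter versions (or a future no-GIL build) takes a lock:

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # += on a shared int is a read-modify-write across several
        # bytecodes, so it is guarded here rather than relying on the GIL
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```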


It's not like CPython is depriving anyone of GIL-free Python. IronPython and Jython are GIL-free. You can have it yesterday.

CPython maintains Python's original purpose as a glue language. You know how much worse CPython would be for that use-case without the GIL.

GVR has said he'll support removing the GIL as long as there's no performance hit. Otherwise, it's simply asking for a fundamental change of direction and too much sacrifice from everyone who needs CPython specifically, just to improve the productivity of people who could use a different Python.


> I imagine it's too hard to have an optional flag at the start of each program that can let the user decide?

The actual next-step proposal toward no-gil is GIL as a build-time flag (which isn’t quite the same as a runtime flag, but not too far off, either.)

https://peps.python.org/pep-0703/


Not for Python, I feel. In sheer volume, the vast majority of my Python programs are single-threaded. I want my programs to run very quickly.

Those that are multi-threaded are seeing minor to medium load.

If expecting extreme load (like Twitter scale), then Python is usually not the answer (rather go to a statically typed language like Java, Go, Rust etc).


> In sheer volume, the vast majority of my Python programs are single-threaded

Obviously, if multithreading is near-useless in Python, very few programs will take advantage of it.


Likewise I can say: Obviously we still have the GIL because very few people want it removed.


Well, yes, that's pretty much obvious: the majority of Python programmers do not care, and those that cared have moved on, as that is easier than changing the language.

And I'll be the first to admit that improving single-threaded performance has a higher priority than removing the GIL.


> Not for Python, I feel. In sheer volume, the vast majority of my Python programs are single-threaded.

Yes, they are single threaded, because using multiple threads brings very little benefit in most cases...

> If expecting extreme load (like Twitter scale), then Python is usually not the answer (rather go to a statically typed language like Java, Go, Rust etc).

So that means we shouldn't get any performance improvements, because there are faster languages out there?


If single-threaded perf is important, you've already lost by using Python. You're only ever going to get ok-ish performance, or slightly more ok-ish.


>multi-threading is the future.

Yes, with ML powered compilers recognizing what you are trying to do and generating the actual multithreaded code for you.

And it won't be multithreaded code as you know it, in the sense of OS-specific threading code with context switching and whatnot. It will be compiled compute graphs targeted at specific ML hardware, likely with static addressing.


My point of view is that anyone who wants to write multithreaded code shouldn't be trusted to. Making it easier for people to justify this kind of footgun is a problem.

Also, no matter how much you wish it otherwise, retrofitting concurrency on an existing project guarantees that you'll wind up with subtle concurrency bugs. You might not encounter them often, and they're hard to spot, but they'll be there.

Furthermore existing libraries that expect to be single-threaded are now a potential source of concurrency bugs. And there is no particular reason to expect the authors of said libraries to have either the skills or interest to track those bugs down and fix them. Nor do I expect that multi-threaded enthusiasts who use those libraries in unwise ways will recognize the potential problems. Particularly not in a dynamic language like Python that doesn't have great tool support for tracking down such bugs in an automated way.

As a result if "no GIL" ever gets merged, I expect that the whole Python ecosystem will get much worse as well. But that's no skin off of my back - I've learned plenty of languages. I can simply learn one that hasn't (yet) made this category of mistake.


My deep interest is multithreaded code. For a software engineer working on business software, I'm not sure if they should be spending too much time debugging multithreaded bugs because they are operating at the wrong level of abstraction from my perspective for business operations.

I'm looking for an approach to writing concurrent code with parallelism that is elegant and easy to understand and hard to introduce bugs. This requires alternative programming approaches and in my perspective, alternative notations.

One such design uses monotonic state machines which can only move in one direction. I've designed a syntax and written a parser and very toy runtime for the notation.

https://github.com/samsquire/ideas5#56-stateful-circle-progr...

https://github.com/samsquire/ideas4#558-state-machine-formul...

The idea is inspired by LMAX Disruptor and queuing systems.


And your approach can be built into a system that does multi-threading away from Python, thereby achieving parallelism without requiring that Python supports it as well.

That's basically what all machine learning code written in Python does. It calls out to libraries that can themselves parallelize, use the GPU, etc. And then gets the answer back. You get parallelism without any special Python support.


Just to add a bit of my opinion after reading your comment in the context of this thread, and not on the merits of your idea: you are precisely the type of person I'd keep very, very far away from multithreading in any business software project, and also why I advocate for the GIL to stay. If you want to do that, go solo in your own time, or try applying for a research position in some giant tech co.


>My point of view is that anyone who wants to write multithreaded code, shouldn't be trusted to. Making it easier for people to justify this kind of footgun is a problem.

Out of curiosity, have you done any Rust programming and used Rayon?

It's hard to convey how easy and impactful multi-threading can be if properly enclosed in a safe abstraction.


Python is dynamic AF and Rust's whole shtick is compile-time safety. Python was built from the ground up to be dynamic and "easy", Rust was meticulously designed to be strict and use types to enforce constraints.

It's hard to convey how difficult it would be to retrofit python to be able to truly "enclose multithreading in a safe abstraction".


I have only read about and played a tiny bit with Rust. But as I noted at https://news.ycombinator.com/item?id=36342081, I see it as fundamentally different than the way people want to add multithreading to Python. People want to lock code in Python. But Rust locks data with its compile-time checked ownership model.

See https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.h... for more.


This is a good point. With tsan and helgrind multithreading becomes manageable in C/C++, nothing like that exists (or will exist) for Python.


>My point of view is that anyone who wants to write multithreaded code, shouldn't be trusted to. Making it easier for people to justify this kind of footgun is a problem.

It's 2023 already. 1988 called.


Did you know that Coverity actively REMOVED checks for concurrency bugs?

It turns out that when the programmer doesn't understand what the tool says, managers believe the programmer and throw out the tool. Coverity was finding itself in situations where they were finding real bugs, and being punished for it by losing the sale. So they removed the checks for those bugs.

I'll revisit my opinion of multithreaded code when things like that stop happening. In the meantime there are models of how to run code on multiple threads that work well enough with different primitives. See Erlang, Go, and Rust for three of them. Also, if you squint sideways, microservices. (Though most people set up microservices in a way that makes debugging problematic. Topic for another day.)


> Did you know that Coverity actively REMOVED checks for concurrency bugs?

Source?



Specifically, on the last page of that:

> As an example, for many years we gave up on checkers that flagged concurrency errors; while finding such errors was not too difficult, explaining them to many users was.

(And thanks; I was also wondering about that)


This is very witty and all, but what does it mean?


I think he was subtly Rick Rolling you.

https://en.wikipedia.org/wiki/Never_Gonna_Give_You_Up

On 12 March 1988, "Never Gonna Give You Up" reached number one in the American Billboard Hot 100 chart after having been played by resident DJ, Larry Levan, at the Paradise Garage in 1987. The single topped the charts in 25 countries worldwide.


Actually it just meant that this is a tired old argument from when C/C++ programmers were new to multithreaded code.

Now we have languages and language facilities (consider Rust, Haskell, and others) to make it much safer. Same with green threads and what Go and now Java does.


And yet, the actual proposals for how to do multithreading in Python are essentially the same as the old C/C++ approaches. And since that hasn't changed, there is no reason to expect better results.


This just posted on the Python forum is a brilliant rundown of the conflicting "Faster Python" and "No GIL" projects, and a proposal (plus call for funding) for a route forward.

I think everyone would agree that trying to combine both would be ideal!

"A fast, free threading Python"

https://discuss.python.org/t/a-fast-free-threading-python/27...


It is from the person most involved in the faster-with-GIL effort, and its recommendation is to prioritize that effort in any case, and if the resources are available for that and no-gil, do both.

Not that I disagree with the recommendation, but one of the sides saying “as long as resources make us choose, choose my side” is...not really surprising.


Not surprising, but I'm very happy they are trying to find a route forward for both. That I commend.

I think from memory "Faster Python" is Microsoft funded and "No GIL" is funded by Facebook. If they can find a way to fund a combined effort that would be good.

I suspect the conflicting funding also adds to the general political difficulty around this.


This is a good read and deserves to be on the homepage with its own thread too!


> There is also the argument that it could "divide the community" as some C extensions may not be ported to the new ABI that the no GIL project will result in. However again I'm unconvinced by that, the Python community has been through worse (Python 3) and even asyncIO completely divides the community now.

I think the fact that you can name two other recent things which have divided the community is a solid argument for being at least a little gun-shy about making big, breaking changes. There's the cost of the changes themselves, but there's also a cost to the language as a whole to add yet-another-upheaval.

Performance is important, but not breaking things is also important. I can understand the appeal of doing something suboptimal (but better than current) in favor of not introducing a bunch of harder to predict side effects, both in code and the community.


Do sub-interpreters actually work with C extensions? I get that the extension API has long supported it. However, I wonder if in practice extensions rely on process-global state to stash information.

If so, sub-interpreters invite all kinds of nasty bugs. Keep in mind that porting the most popular extensions is an easy exercise, so the more interesting question is how the hidden majority of extensions fares.


The other thing I don't get is that the whole sub interpreters thing seems to totally break extension modules as well: https://github.com/PyO3/pyo3/issues/2274. In theory parts of sub-interpreters have been around for a while and it just happens that every extension module out there is incompatible with it because no one used it. But if it's going to become the recommended way to do parallelism going forward then they'll have to become compatible with it.

The serialization thing is also a huge issue. Half of the time I want to use multiprocessing I end up finding that the serialization of data is the bottleneck and have to somehow re-architect my code to minimize it.

I would much prefer a world in which asyncio is 2x faster and can benefit from real parallelism across threads. Libraries like anyio already make it super easy to work with async + threads. It would make Python a viable option for workloads where it currently just isn't.


(Disclosure: Author of a Python C extension that wraps SQLite that has 2 decades of development behind it.)

Have a look at the current documentation for writing extensions. This approach is essentially unchanged since Python 2.0. https://docs.python.org/3/extending/newtypes_tutorial.html

In particular note how everything is declared static - ie only one instance of that data item will exist. If there are multiple interpreters then there needs to be one instance per sub-interpreter. That means no more static and initialisation has to be changed to attach these to the module object which is then attached to a specific interpreter instance. It also means every location you needed to access a previously static item (which often happens) has to change from a direct reference through new APIs chasing back from objects to get their owning module and then get the reference. That is the code churn the PyO3 issue is having to address. One bonus however is that you can then cleanly unload modules.

This may still not be sufficient. For example I wrap SQLite and it has some global items like a logging callback. If my module was loaded into two different sub interpreters and both registered the logging callback, only one would win. These kind of gnarly issues are hard to discover and diagnose.

Removing the GIL also won't magically help. I already release it at every possible opportunity. If it did go away, I would have to reintroduce a lock anyway to prevent concurrency in certain places. And there would have to be more locking around various Python data structures. For example if I am processing items in a list, I'd need a lock to prevent the list changing while processing. Currently the GIL handles that and ensures fewer bugs.

I've also experienced the serialization overhead with multiprocessing. I made a client's code so much faster that any form of Python concurrency was slower because of all the overhead. I had to rearchitect the code to work on batches of data items instead of the far more natural one at a time. That finally allowed a performance improvement with multiprocessing.


The GIL has been a blocker for many years. It's nice that the team is making progress of course. IMHO it's one of those bandaids they need to rip off.

I was listening to the interview with Chris Lattner by Lex Fridman a week ago or so. Very interesting discussion on his project Mojo, which intends to build a new language that is backwards compatible and a drop-in replacement for Python, with opt-in strict typing, better support for native/primitive types where that makes sense, easier integration with hardware optimizations, and of course no GIL. The idea is that the migration path for existing code is that it should just work, and then you optimize it and provide the compiler with type hints and other information so it can do a better job. Very ambitious roadmap, and I'm curious to see if they'll be able to deliver.

The main goal seems to be to enable programmers to do the things you currently can't do in Python because it's too slow, without running into a brick wall in terms of performance.

I mostly work with JVM languages and a few other things, but I occasionally do a bit with Python as well. I've always liked it as a language, but I'm by no means an expert in it. I recently spent a day building a simple geocoder, and since I know about the GIL, I went straight for the multiprocessing library and did not bother with threads. IMHO there's absolutely no point in attempting to use threads in Python with the GIL in place. I needed to geocode a few hundred thousand things in a reasonable time frame, so all I wanted was to use a few different processes concurrently so I could cut the runtime down to something reasonable.

Python is ok for single-threaded stuff, but you run into a brick wall doing anything with multiple processes or threads and juggling state. In the end I just gave up and wrote a bunch of logic that splits the input into files, processes the files with separate processes, waits for that to finish, and then combines the output files. Just a lot of silly boilerplate and abusing the file system for sharing state. It does what it needs to, but it feels a bit primitive and backwards, and I'm not proud of the solution.

Removing the GIL, adding some structured concurrency, and maybe some other features would make Python a lot more capable for data processing. And since a lot of people already use Python for that sort of thing, I don't think that would be such a bad thing. Data science and data processing are the core use cases for Python. I don't think people actually care a lot about raw Python performance; it's never been that great to begin with. If it's performance critical, it's mostly being done via native libraries already.


> Data science and data processing are the core use case for python

Indeed. One would almost hope that all the different aspects of "performance" and "concurrency", their memory, disk or network profile etc., would get their own dedicated labels. The conflation of these distinct dimensions is a major source of confusion (and thus a waste of bandwidth).


This is an interesting writeup. Could you go for asyncio.gather[0] or TaskGroups[1] these days? Or would that not help?

[0] https://docs.python.org/3/library/asyncio-task.html#asyncio....

[1] https://docs.python.org/3/library/asyncio-task.html#asyncio....
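For what it's worth, a minimal sketch of the gather route (`geocode` here is a hypothetical stand-in for a real non-blocking call, not an actual API); it only helps if the work is I/O-bound, since the event loop runs on one thread:

```python
import asyncio

async def geocode(address):
    # stand-in for a real non-blocking HTTP call (e.g. via an async
    # HTTP client); awaiting lets other tasks run while this one waits
    await asyncio.sleep(0.01)
    return f"coords-for-{address}"

async def main(addresses):
    # run all lookups concurrently on a single thread
    return await asyncio.gather(*(geocode(a) for a in addresses))

out = asyncio.run(main(["a", "b", "c"]))
print(out)  # ['coords-for-a', 'coords-for-b', 'coords-for-c']
```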


Is this io-bound or cpu-bound? Hard to tell from your one word description, “geocode”. Is that local or a network call?

If you’ve broken up the input already, I’d use the shell to parallelize, i.e. a for loop with &. If network, async is probably what you want.


Mostly IO bound. In this case the actual limit was the API rate limiting of the geocoder I was using. A couple of thousand calls per minute. Quite a bit more than what you can do with a single thread but not quite what a decent laptop would be able to do.

Python has blocking IO, so a network call blocks the process. If you have 250ms response times, you are doing 4 requests per second. Without the GIL, threads would be a good way to scale that: with 10 threads you should be able to do 2400 requests per minute. But with the GIL, forget about that. With coroutines and non-blocking IO, it could all be single threaded. There are some ways to do that with Python of course, but then you are going to have to use some specialized frameworks and step away from the standard library a bit.

Using the shell vs. the multiprocessing module is the same difference. I've done both with Python. I have a slight preference for the multiprocessing module so I don't have to deal with bash weirdness on top of all the Python boilerplate. The first time I did stuff like that with Python was probably 13 years ago, around 2010. Not a lot has changed or progressed on this front in Python since then. The GIL made scaling this unnecessarily hard then, and it still does. Threads are a no-go because of it, the standard library mostly offers blocking IO, and when you go with multiprocessing, things like shared memory are very limited, so you end up using files or databases for state.

Things like Node.js, Go, or Kotlin would handle this type of workload with a lot less fuss. With Kotlin, I'd be using coroutines and some multi-threaded scope to launch them in, or I could build on some of the Java internals, or a mix of both. I'd be able to write similar code and choose between blocking or non-blocking IO. I'd be forking and joining things, maybe using channels and flows to pass data around and relying on back pressure to keep things progressing at a more or less optimal rate. Not an option for this project, as my client is just more Python focused, and that's fine. But I'm just signaling that Python is a bit out of its depth here. I'm singling out Kotlin only because I happen to have spent a lot of time with it; if you have a hammer, everything seems a nail.

I think python could be so much better but that starts with modernizing its internals. Mojo seems like a huge step in the right direction.


> There is going to be a performance impact on single threaded code if the "no GIL" project is merged, something in the region of 10%.

10% doesn't seem like much to me. I still don't get why people today care so much about single-threaded performance.


Single threaded performance is still more important than multi-threaded. Most applications are single threaded, and single threaded programs are much easier to write and debug. Removing the GIL from python will not change that.

If no-GIL has a 10% single thread performance hit, that means that essentially all my existing python code would be that much worse.


> If no-GIL has a 10% single thread performance hit, that means that essentially all my existing python code would be that much worse.

Maybe in 100% CPU-bound code, but most code is I/O bound and no one will notice the change. Just my opinion.


Maybe that's just my bubble, but I see much more python in data science projects than in web servers. And in (python) data science even your file reading/writing code quickly gets CPU bound.


That's because you are doing it wrong. You'll need to split every step of your data science pipeline into a microservice, then put it in the cloud for resilience. Then the application will be so fast that it is no longer CPU bound but I/O bound.


Not in pure Python, that's in specialized libs, like numpy, pandas and co, done in C.

So, the hit on the Python interpreter wouldn't translate to a hit on those.


That's going to be CPU-bound in numpy's C extensions rather than Python itself, one would hope. The worst of all worlds is that we get a 10% perf cut to python execution and numpy breaks because the C API is ripped up.


There was a time, like 5-10 years ago, when Python was really popular for grassroots web projects. Nowadays it looks like that is mostly Node.


It might be more helpful to think of it in terms of supported use cases, rather than just pure volume.


Maybe, but if your code is I/O bound, then multi-threading isn't going to help you either.


But the GIL doesn't need to be held in I/O-bound code anyway, so why does it matter?
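Right: CPython drops the GIL whenever a thread blocks on I/O, so plain threads already overlap I/O waits. A quick sketch, with `time.sleep` standing in for a blocking network call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(i):
    # stand-in for a blocking network call; CPython releases the GIL
    # while a thread is sleeping or waiting on a socket
    time.sleep(0.2)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fake_request, range(10)))
elapsed = time.perf_counter() - start
print(results)
print(f"{elapsed:.1f}s")  # ~0.2s for 10 x 0.2s calls, despite the GIL
```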


>If no-GIL has a 10% single thread performance hit, that means that essentially all my existing python code would be that much worse.

So? Especially since the "Faster Python" team already made Python 3.11 "10–60% faster than 3.10", and 3.12 is even faster still, whereas their overall plan is to get it 2-5 times faster compared to 3.9.

So at the worst case, with a 10% hit, you'd balance out the 3.11 speed, and your code would be as fast as 3.10.


But your software would still run 10% slower than it needs to. Single threaded code is like 99% of all code written.


>But your software would still run 10% slower than it needs to

There's no absolute objective "needs to", or even any static baseline. Python can have, and often has had, a performance regression that drops your code's speed by 10% at any time. It's no big deal in itself.

Also consider a further speedup of e.g. 50% in upcoming versions (they have promised more).

If you're OK with the X speed of today's Python, you should be ok with X + 40% - even if it's not the X + 50% it could have been due to the 10% GIL's removal toll.


That's not an argument either, as your software is already 10000% slower than it needs to be, since you have written it in Python.


Are your current python programs slow and it matters?

Why haven't you implemented multithreading?

(don't get me wrong, I know the cost of implementation, but if speed matters, multithreading is a very reasonable step in python)


Because of the GIL, and also because you have to use tons of locks to get around the lack of thread safety of Python's objects.


> Why haven't you implemented multi-threading?

Because that makes programs slower in Python.

Multi-threading in Python is for when you need time-slicing for CPU-intensive tasks so that they don't block other work that needs to be done.


His point remains; he just phrased it badly. Why haven't you implemented a multiprocess pool?


Because it's a global solution to a local problem.

With threads, I can encapsulate the use of threads in a class, whose clients never even notice that threads are in use. Sure, threads are a global resource too, but much of the time you can get away with pretending that they're not and create them on demand. Not so with multiprocess. If you use that, then the whole program has to be onboard with it.

Threads work great in Python. Well not for maximising multicore performance, of course, but for other things, for structuring programs they're great. Just shuttle work items and results back and forth using queue.Queue, and you're golden - Python threads are super reliable. And if the threads are doing mostly GIL-releasing stuff, then even multicore performance can be good.
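The queue.Queue pattern described above, as a minimal sketch (the squaring is a stand-in for real work, and the `None` sentinel is one common shutdown convention):

```python
import queue
import threading

work = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = work.get()
        if item is None:  # sentinel: shut this worker down
            return
        results.put(item * item)  # stand-in for real work

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(8):
    work.put(n)
for _ in threads:
    work.put(None)  # one sentinel per worker
for t in threads:
    t.join()

out = sorted(results.get() for _ in range(8))
print(out)  # [0, 1, 4, 9, 16, 25, 36, 49]
```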


>Not so with multiprocess. If you use that, then the whole program has to be onboard with it

Huh? In Python you just need a function to call, and multiprocessing will run it wrapped in a process from the pool, while API-wise it looks just as it would if it were a thread pool (but with no sharing in the process case, obviously).

So what would the rest of the program be onboard with?

And all this could also be hidden inside some subpackage within your package, the rest of the program doesn't need to know anything, except to collect the results.
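A minimal sketch of that shape (`cpu_bound` is a made-up example function; the main requirement is that the function and its arguments pickle cleanly):

```python
from multiprocessing import Pool

def cpu_bound(n):
    # a pure function with no shared state, so it pickles cleanly
    # and can run in a worker process
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # guard so spawned workers don't re-run this block
    with Pool(processes=4) as pool:
        print(pool.map(cpu_bound, [10, 100, 1000]))  # [285, 328350, 332833500]
```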


multiprocessing needs to run copies of your program that are sufficiently initialised that they can execute the function, yet no initialisation code must be run that should not be run multiple times.

That means you either use fork - which is a major can of worms for a reusable library to use.

Or you write something like this in your entry point module:

    import multiprocessing

    if __name__ == '__main__':
        multiprocessing.freeze_support()
        once_only_application_code()
Suppose I don't realise that your library is using multiprocessing, and I carelessly call it from this two-line script:

    import library_that_uses_multiprocessing_internally
    library_that_uses_multiprocessing_internally.do_stuff()
That's basically a fork bomb.

And where do you put the multiprocessing.set_start_method call? Surely not in the library.


>multiprocessing needs to run copies of your program that are sufficiently initialised that they can execute the function, yet no initialisation code must be run that should not be run multiple times.

Huh? As far as I remember multiprocessing just sends pickled versions of the function to run and any of its dependencies (other functions, closures, etc). As long as the function doesn't use global state that's not available when pickled, it's fine. But it doesn't re-initialize your whole program for each process in the pool.

>That means you either use fork - which is a major can of worms for a reusable library to use.

How did we get into reusable library authoring?

Yes, multiprocessing is not just turnkey to use inside a reusable library you make.

But the context here was programs, was it not?

>Or you write something like this in your entry point module:

Hmmm? This is to have it support freezing the script (that is, using a tool to make it a distributable, like PyInstaller). That's not necessarily a use case most have.


> How did we get into a reusable library authoring?

That was always my premise. Maybe I didn't make it clear enough, because I tend to just take it for granted that that's how you write code, in a style that's suited for reuse.

> But the context were programs here, or not?

That's the "global" I was talking about: Code that's using multiprocessing needs to know the context that it's embedded in. Any moment I might grab that piece of code and transfer it to a library of reusable components, because that's how I work - code that starts out as part of a standalone program doesn't necessarily stay that way. Multiprocessing gets in the way of that.


>That was always my premise. Maybe I didn't make it clear enough, because I tend to just take it for granted that that's how you write code, in a style that's suited for reuse.

That's somewhat condescending. You can write code "in a style that's suitable for reuse" without being a library author - well, without publishing public packages anyway. Re-use is not only about some totally generic package that can run under any arbitrary context willy nilly.

And of course there are tons of programs where the parts don't make sense as libraries, because they're tied to the specific functionality and overall design (whether because of the domain logic required or due to optimization or other constraints). You write them to be modular and clean, but not with "arbitrary people running my code in whatever context" in mind.

Not to mention the mountains of purpose-specific throwaway scripts, e.g. in the scientific community especially, where Python is big, there's little regard for reuse (even less so for library building), and it's not because multiprocess is stopping them :)

So, yeah, I'd say, even if not 100% suitable for generic reusable library-style code, it doesn't mean multiprocess can't be applied in a huge number of specific people's problems and codebases.

>Code that's using multiprocessing needs to know the context that it's embedded in.

If you want to speed up your Python program and there's something that can run in parallel with no shared state, you can use multiprocess to run it.

If having it as a "reusable component" that hides away the fact multiprocess is used, and that can be called in any arbitrary context, is your concern, it's a valid one, but then perhaps a specific Python program and its performance is not your main priority. Library writing is, instead :)

Else, it's enough that the user calling multiprocessing knows the function that is to be passed and its dependencies (or lack thereof). Other than that, they don't have to change their top level program's architecture.


I didn't mean to say that no one should ever use multiprocessing. I was laying out the reasons why I don't.

I'm really looking forward to subinterpreters. I think they have great potential for supporting a style of multiprocessing that is both faster and better isolated.


> Sure, threads are a global resource too, but much of the time you can get away with pretending that they're not and create them on demand.

I think you would love Trio and applying the idea to threads.


Applying which idea? async does not appeal to me, if that's what you mean.


Because not everything is trivially parallelizable and multiprocess makes it harder to share data?


>Because not everything is trivially parallelizable

A lot of things are though...


CPython is so hopelessly slow, I wouldn't care about 10%. For most of the stuff written in Python, users don't really care about speed.

The impact won't be on users / Python programmers who don't develop native extensions. It will suck for people who already had a painful workaround for Python's crappy parallelism, but will now have to have two workarounds for different kinds of brokenness. It still pays off to make these native extensions; however, their authors will create a lot of problems for the clueless majority of Python users, which will likely end up in some magical "workarounds" for problems introduced by this change that very few people understand. This will result in more cargo cult in a community that's already on a witch hunt.


Exactly, and the thing is, the Faster Python project will completely surpass that 10% performance change.


Sometimes people see it as losing 10% performance that you never get back


because a lot of code is single-threaded


And if you're really concerned about speed, Python is not the language to choose.


Every program in any language has the potential to be concerned about speed. This cute maxim is ultimately a punchline, not really a serious point.


Right, and single thread performance won’t matter as much if it becomes easier to implement multithreading. This hurts legacy code, but I imagine it would be worth it in the long run.


It would remain as hard as it has always been. Also, threads are very heavy, locking kills performance, and if you don't have the GIL, you'll need to manage explicit locks, which will be just as slow but also cause an incredible number of subtle bugs.


> if you don't have GIL, you'll need to manage explicit locks

You need to do that with multithreaded Python code with the GIL. The GIL only guarantees that operations that take a single bytecode are thread-safe. But many common operations (including built-in operators, functions, and methods on built-in types) take more than one bytecode.
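A quick way to see this (a minimal sketch; `counter` and the lock are made up for illustration): `counter += 1` compiles to several bytecodes, so a thread switch can interleave two updates even with the GIL held.

```python
import dis
import threading

counter = 0
lock = threading.Lock()

def unsafe_bump():
    global counter
    counter += 1  # compiles to LOAD / ADD / STORE: a thread switch can land mid-update

def safe_bump():
    global counter
    with lock:  # the GIL alone does not make this lock unnecessary
        counter += 1

# The disassembly shows the read-modify-write is several bytecodes, not one:
dis.dis(unsafe_bump)
```

Under the GIL this races only occasionally, which is exactly what makes it easy to miss.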


> locking kills performance, and if you don't have GIL, you'll need to manage explicit locks

I was under the impression that the Python thread scheduler is dependent on the host OS (rather than being intelligent enough to magically schedule away race conditions, deadlocks, etc.), so you still need to manage locks, semaphores, etc. if you write multi-threaded Python code. I don't see how removing the GIL would make this any worse. (Maybe make it slightly harder to debug, but at that point it would be in-line with debugging multi-threaded Java/C/etc. code.)

Or would this affect single-threaded code somehow?


In python you always have a lock, the GIL. If you take it away you end up actually having to do synchronization for real. Which is hard and error prone.


>I still don't get why today people care so much about single thread performance.

For about 10 minutes a few years ago, when the M1 had the best single threaded performance per buck, people cared.

Now that the M1 isn't the leader in single threaded, we are back to 'multithread is most important'.

Which has always been true. If your program needs an improvement in speed, you can multithread it. The opposite isn't true.


Not all algorithms can be chunked up. Single thread performance is and will always be important.


> If your program needs an improvement in speed, you can multithread it. The opposite isnt true.

What do you mean by "the opposite"? "If your program doesn't need an improvement in speed, you can't multithread it"? "If you can multithread your program, then it doesn't need an improvement in speed"? Well, yeah, obviously both of those statements are false but they're also quite useless, so who cares?


Add negative threads to fix a program that runs too fast?


Well, having too many threads can slow down a program as well (extra context switches, extra synchronization) so... no idea.


You can improve performance by moving to a single thread. Pinning work to a single core will improve cache performance, avoid overhead of flushing TLBs and other process specific kernel structures, and more.


> Their argument is that the "sub interpreters" they are adding (each with its own GIL) will fulfil the same use cases of multithreaded code without a GIL, but they still have the overhead of encoding and passing data in the same way you have to with sub processors.

This is smart, though, because (even if it's not great) there's a lot of evidence that it works in practice. Specifically, this is almost exactly what JavaScript does with workers. It's not a great API and it's cumbersome to write code for, but it got implemented successfully and people use it successfully (and it didn't slow down the whole web).


If we could just get efficient passing of objects graphs from one subinterpreter to another, which is not in the current plan, I think that would solve a lot of use cases. That would allow producer/consumer-style processing with multiple cores without the serialization overhead of the multiprocessing module.
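As a rough illustration of that serialization overhead (sizes are illustrative, not from the multiprocessing docs): anything crossing a multiprocessing boundary is pickled on one side and unpickled on the other, so a large object graph pays the full copy cost on every hop.

```python
import pickle

# A moderately sized object graph, like a producer might hand to a consumer.
graph = {i: list(range(100)) for i in range(1_000)}

# multiprocessing pickles on send and unpickles on receive; both the CPU
# time and the bytes copied scale with the whole graph, not with the work.
blob = pickle.dumps(graph)
restored = pickle.loads(blob)
print(f"{len(blob)} bytes copied per hop")
```

Zero-copy sharing between subinterpreters would sidestep exactly this step.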

Removing the GIL seems like it could make things more complicated in many ways by making lots of currently thread-safe code subtly unsafe, but I might be wrong about this. (...in which case it would just make things very slow because everything is synchronized?)


As someone who observed Python core development for many years: a major change to the interpreter REQUIRES core-dev buy-in. There have been at least 5 big projects which proposed large changes, and they have all been declined.

It is a NIH syndrome, if a big project doesn't originate in the dev team, it will not be accepted.


I have often wondered what the solution to the serialisation of objects between subinterpreters is.

If its garbage collection that's the problem, I think you could transfer ownership between threads, so the subinterpreter takes ownership of the object and all references to it in the source interpreter are voided.

Alternatively you can do something like Java and all objects are in a global object allocator, passing things between threads doesn't require interpretation, just a reference.


This feels like it's playing out as I expected. I followed Python, and the Python community, really closely from 2008-2016 when there were tons of relatively small scale experiments happening. This all happened organically to a large extent and there was no one coordinating a grand vision. It seems like we have a continuation of this giving rise to the concern that there is some battle.

I suspect there will be some butting of heads for a while before they work things out after seeing how the community reacts. All of this could be handled better with some thoughtful proactive engagement, but that's not really how things operate and there is no one to really enforce it.


Where is the progress? They claim "10-60%" improvements over Python 3.9 (I believe) and I don't notice much, partly because Python has always been sped up by C-extensions.

The price is an added complexity of the code base.

I truly don't understand why Python always gets a pass and people applaud every announcement, no matter how trivial or elusive it is.

The GIL effort is another matter. I'd rather have a simple interpreter with no-GIL than this mess of relatively small speedups.

But like other GvR efforts where he presided over a small group of people who did the work his efforts will of course go in. Like asyncio, the suboptimal "match" statement, the peg parser (which adds new workloads for other implementations because the de-facto DFL cannot be bothered with publishing an LALR grammar).

Python is a glue language, people can go elsewhere for speed.


I do hope the dialogue stays cordial, constructive and open rather than becoming distinct entrenched camps - the Python community has a strong and mature community spirit so this seems plausible and not too much wishful thinking.

Much as No GIL would be an adventure, I'm leaning towards the more gradual and stable changes from the FasterPython team and I can see that throwing No GIL into the mix adds complexity at an inopportune moment.


It’s overstating the case to call this a “struggle” between factions; it’s an important discussion with a lot of ramifications (and while it is unresolved a lot of work is stalled).

https://discuss.python.org/t/a-fast-free-threading-python/27...


Nogil would give far larger returns and I wish they’d focus on that. That’s the best way to a faster python.


Oh, boy. Will any of that impact backward compatibility?

I don't develop anything in Python, but it is used by several applications of importance to me. The lack of compatibility between versions is a thing that bites me hard, and I tend to curse Python because of it.


>There is also the argument that it could "divide the community" as some C extensions may not be ported to the new ABI that the no GIL project will result in

I think the arguments are a red herring. It's more rationalizations for not wanting to do it.


If the GIL were an optional interpreter parameter you could spawn GILed subinterpreters and GILless subinterpreters according to your needs.


Why not just try to make multiprocessing easier?


Removing the GIL should be seen as a last option.


I'm going to admit that what I really want to see is a strong push to standardize and fully incorporate package management and distribution into python's core. Despite the work done on it, it's still a mess as far as I can see, and there is no single source of truth (that I know of) on how to do it.

For that matter, pip can't even search for packages any more, and instead directs you to browse the PyPI website in a browser. Whatever the technical reasons for that, it's a user interface fail. Conda can do it!!!!! (as well as just about any package management system I've ever used)


> I'm going to admit that what I really want to see is a strong push to standardize and fully incorporate package management and distribution into python's core. Despite the work done on it, it's still a mess as far as I can see, and there is no single source of truth (that I know of) on how to do it.

Package management is standardized in a series of PEPs[1]. Some of those PEPs are living documents that have versions maintained under the PyPA packaging specifications[2].

The Python Packaging User Guide[3] is, for most things, the canonical reference for how to do package distribution in Python. It's also maintained by the PyPA.

(I happen to agree, even with all of this, that Python packaging is a bit of a mess. But it's a much better defined mess than it was even 5 years ago, and initiatives to bring packaging into the core need to address ~20 years of packaging debt.)

[1]: https://peps.python.org/topic/packaging/

[2]: https://packaging.python.org/en/latest/specifications/index....

[3]: https://packaging.python.org/en/latest/flow/


which is why I said, "Despite the work done on it"


Yes, that was meant more for the “source of truth” part.


No, I do appreciate you taking the time to do it (I was too lazy)


Is there anything in there about managing dependencies within a python project? What is the canonical way to do that in python today?


It depends (unfortunately) on what you mean by a Python project:

* If you mean a thing that's ultimately meant to be `pip` installable, then you should use `pyproject.toml` with standard metadata (PEP 518/621). That includes the dependencies for your project; the PyPUG linked above should have an example of that.

* If you mean a thing that's meant to be deployed with a bunch of Python dependencies, then `requirements.txt` is probably still your best bet.
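For the first case, a minimal sketch of a `pyproject.toml` (project name and dependencies are illustrative):

```toml
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "myproject"        # illustrative
version = "0.1.0"
dependencies = [
    "requests>=2.28",
]
```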


I meant the second. requirements.txt is a really bad solution for that, and that is the frustration many of us have that have used languages with much better solutions.


Requirements feels like a dirty hack but it does work fine. It has ==, ~=, and >= for version numbers, as well as allowing you to flag dependencies for different target os, etc. And then you can add setup.py if you need custom steps. But yes, it feels dirty to maintain requirements.txt, requirements-dev.txt, etc.
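For example (package names illustrative):

```
requests==2.31.0                      # exact pin
numpy~=1.26                           # compatible release: >=1.26, <2.0
rich>=13; sys_platform == "win32"     # environment marker for a target OS
```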

Poetry is the most common solution that I've seen in the wild. You spec everything using pyproject.toml and then "poetry install" and it will manage a venv of its own. But you still need to tell people to "pip install poetry" as step 0 which is annoying.

If you don't care about deploying python files, and rather just the final product, I'd recommend either nuitka or pyinstaller. These are for bundling your project into an executable without a python runtime needed (--onefile type of options for single file output). Neither supports cross compilation though.


What flow do you use with requirements.txt that gives you reproducible builds across a team and environments? Using ==, ~=, and >= will not give you reproducible builds.


We do it like this:

- configure the project with pyproject.toml,

- use pip-compile (from the pip-tools package) to create a lockfile,

- commit the lockfile into git,

- whenever we want to update the dependencies, do it through pip-compile again (if you give it an existing lockfile as output, it will keep what's in there and change only what's required).
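In commands, that flow looks roughly like this (filenames are illustrative; `pip-compile` ships with the pip-tools package):

```shell
# create (or refresh) the lockfile from pyproject.toml
pip-compile --generate-hashes -o requirements.lock pyproject.toml
git add requirements.lock

# bump a single dependency while keeping everything else pinned
pip-compile --generate-hashes -o requirements.lock --upgrade-package requests pyproject.toml
```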

Since all our requirements are cross-platform and on PyPI, we can install the same env everywhere.


Hash-pinning with requirements.txt will get you the closest to this, but it's not possible in the general case to have a cross-environment reproducible build with Python. The closest you can hope for is a build that reproduces in the same environment.

This problem is shared by the majority of language packaging ecosystems; the only one I'm aware of that might avoid it is Java.


Rust and Go both have proper lock files... Both of which are good enough to satisfy Nix's requirements for reproducibility, such that it re-uses the lock file hashes. This "it's hard, no one does it right" feels like a cope.


We get (very) close to cross-environment reproducible builds for Python with https://github.com/pantsbuild/pex (via Pants). For instance, we build Linux x86-64 artifacts that run on AWS Lambda, and can build them natively on ARM macOS. (As pointed out elsewhere, wheels are an important part of this.)

This is not raw requirements.txt, but isn’t too far off: Pants/PEX can consume one to produce a hash-pinned lock file.


How do you get a reproducible build in python for the same os/arch? As in, what concrete steps do you take?

This is very easy in nearly every other language that is popular. No one ever answers this clearly in threads like this short of saying “use poetry” which makes my point. I’ve asked many times.


I explicitly said that you can't. Python's packaging ecosystem wasn't designed with reproducibility in mind, and has never claimed to prioritize reproducibility. The best you can do is get close, and hash-pinning gets you pretty close.

I'm not aware of any other major language or language packaging ecosystem that makes reproducibility straightforward. Certainly not Ruby or NPM, and not even brand new ones like Rust's Crates. Java appears to be the closest[1], but is operating with significant advantages (distributing reproducible bytecode to all users, minimizing system dependencies, etc.).

Edit: In addition to hash-pinning, you can instruct `pip` to only install built distributions, i.e. wheels. If you do both hash-pinning and built distributions only, your package installation step _should_ reproduce exactly on machines of the same OS, architecture, and Python version. But again, this is guaranteed nowhere.
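Concretely, that install step looks like this (the requirements file needs a `--hash` entry for every package for hash-checking to kick in):

```shell
pip install --require-hashes --only-binary :all: -r requirements.txt
```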

[1]: https://reproducible-builds.org/docs/jvm/


You are splitting pedantic hairs in order to avoid talking about the obvious. Python's dependency management is much worse than ruby's and nearly every other popular language.

  In ruby you add dependencies to a Gemfile then ...
  $ bundle install
  $ git add Gemfile Gemfile.lock
and other members of your team can have the same build as you.

requirements.txt doesn't solve this basic need.

Edit: formatting.


I think you're confusing lockfiles with reproducibility. Lockfiles are good, but they don't guarantee reproducibility: a locked (or pinned, hashed, etc.) dependency might always be the exact same source artifact, but it can install in different ways (e.g. due to local toolchain differences, different versions of dependencies, sensitivity to timestamps, sensitivity to user-controlled environment variables, etc.).

Reproducibility is a much harder problem than dependency locking, and (again) I'm not aware of any language level packaging ecosystem that really supports it out of the box.

Python doesn't have reproducible builds, but it does have lockfiles (via hashed and pinned requirements). They're not particularly good (for all the reasons mentioned upthread), but they do indeed exist. If you use them as I've said, then your environment will be approximately as repeatable as with any other language packaging ecosystem (and arguably more so in some cases, since wheel installs are reproducible where gem installs aren't).


Binary wheels are just archives, I don't see differences in the way they install between different systems.

Source wheels that contain C/C++ code are so annoying to install on Windows that we don't use them. But most packages provide binary wheels anyway.


> Binary wheels are just archives, I don't see differences in the way they install between different systems.

The subtlety here is in which binary wheel is selected: a particular (host, arch, libc) tuple may cause `pip` to select a more specific wheel for the same version of the package, or even an entirely different wheel. This makes wheels themselves reproducible between systems, but it also means that which wheel isn't guaranteed.
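For example, one release of a package typically ships many wheels distinguished only by filename tags (filenames illustrative); pip picks the most specific match for the local interpreter and platform:

```
numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
numpy-1.26.4-cp311-cp311-macosx_11_0_arm64.whl
numpy-1.26.4-cp311-cp311-win_amd64.whl
```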


What is the most common python flow for dependency locking? Is it in any of the PEPs?


That would be PEP 508.

508 notably excludes hash-pinning, which is a significant limitation -- hash-pinning is defined by how `pip` implements it.


Thanks, is there any kind of tutorial on how you should use this in a python project?

Edit: I think this is one of the biggest problems someone coming to python has. Python advocates say some version of, "you can roughly do that" but there isn't a clear explanation of how to do it.

Edit 2: I see that the official docs have a Pipenv flow outlined. Is Pipenv the way people do this in python these days?


> Thanks, is there any kind of tutorial on how you should use this in a python project?

The PyPUG docs contain examples of using all of the PEPs mentioned above. TL;DR: all you need for 99% of use cases is a pyproject.toml.


Those docs say to use Pipenv, or am I looking at the wrong docs? Really not sure why python people can’t articulate a clear flow to follow. It’s all riddles.


> * If you mean a thing that's meant to be deployed with a bunch of Python dependencies, then `requirements.txt` is probably still your best bet.

This is exactly how we got in this mess. Using ``setup.cfg`` or ``pyproject.toml`` for all projects makes this easy as now your deployable project can be installed via pip like every other one.

1. ``python -m virtualenv .``

2. ``source ./bin/activate.fish``

3. ``pip install -U https://my.program.com/path/to/tarball.tar.xz``


Wait, why are they two different things? Can't pip be used for deployment?


It can -- the second flow also uses `pip`.

The terminology here is confusing: the first is the flow that produces a "distribution" (i.e., an sdist or bdist), while the second is the flow that produces an "environment" (i.e., a specific set of packages installed in some prefix).


It's beyond "mess" well into "fiasco" and frankly I'm astounded people think there's a more important issue facing the language right now. Look, for an example of a high-prestige project, at Spleeter, which spends multiple pages of its wiki describing how to install it with Conda and then summarizes "Note: as of 2021 we no longer recommend using Conda to install Spleeter" and nothing else.


What are you smoking? The readme for spleeter clearly shows the two simple commands needed to install -- one being a conda install for system level dependencies and one being pip for the spleeter python package itself.


Python’s dependency management (or lack thereof), its import system, and the lack of strong typing really make me hate it. It’s the first language I really felt adept with, but once I learned Go, I never looked back. Every time I have to use Python it’s like coding with crayons.


Premature optimization is the root of all evil.

Python isn't perfect, and mostly unsuitable for any system where performance is in consideration.

That leaves everything else, including personal utility scripts and packages I use each day to automate random stuff. And I hugely appreciate how fast and simple it is to develop in Python, unlike certain languages that literally depend on IDEs due to the verbosity and unnecessary cognitive load.


> And I hugely appreciate how fast and simple it is to develop in python

Indeed it’s crazy. A few pip installs and I had a multiprocessing pandas (dask) with a web gui, and a workflow system (also with a web gui), and a pipeline to convert csv to parquet in like 20 lines of code


What aspect of python's type system do you find insufficient?


I think type hints help a lot. A codebase with classes and type hints reads much better than one using ad-hoc dictionaries for every data structure.
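A tiny sketch of that contrast (names made up):

```python
from dataclasses import dataclass

# Ad-hoc dict: typos and missing keys only fail at runtime, and a reader
# has to hunt through the code to learn what fields exist.
user = {"name": "Ada", "email": "ada@example.com"}

# With a class and type hints, the shape is declared once and tools
# (mypy, IDEs) can check every access.
@dataclass
class User:
    name: str
    email: str

typed = User(name="Ada", email="ada@example.com")
print(typed.name)
```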

But on the other hand I don't really like Go, so maybe it's different languages for different tastes.


I gave a talk about this at the Packaging Summit during Pycon which was well received, so the team is definitely aware of the problem.

However, the sense I got was that it was going to be a lot of work to “fix Python packaging” which wasn't feasible with an all-volunteer group.

At work, we're migrating away from pip as a distribution mechanism for this reason; I don't expect to see meaningful improvements to the developer experience anytime soon.

This is especially true because pip today is roughly where npm was in 2015, so there's a lot of fundamental infrastructure work (including security) that still needs to happen. An example of this is that PyPI just got the ability to namespace packages.


> An example of this is that PyPI just got the ability to namespace packages.

You're thinking of organizations, which are not namespaces: https://blog.pypi.org/posts/2023-04-23-introducing-pypi-orga...


Right, but to an average developer, organizations look and feel very much like namespaces.

LWN even used namespaces in the title of the article describing the feature, which doesn’t help the confusion: https://lwn.net/Articles/930509/


That article is about the packaging summit talk on introducing namespaces, not about organizations. In fact, when talking about organizations, it explicitly says:

> But support for namespaces is not part of the new feature.


> we're migrating away from pip as a distribution mechanism for this reason

Could you elaborate on what you’re using as a replacement?


Not the parent but pipenv is decent, poetry is even better:

- clear separation of dev and production dependencies

- lock file with the current version of all dependencies for reproducible builds (this is slightly different from the dependency specification)

- no accidental global installs because you forgot to activate a virtual environment

- (not sure if supported by pip) installing libraries directly from a git repo, which is very useful if you have internal libraries

- easier updates


We distribute a CLI tool (dbt), so we’re migrating to distributing using the following mechanisms:

1. curl script that installs dbt for end users

2. zipped snapshots for dbt Cloud, the SaaS hosted version of dbt

Eventually we want to create Docker images from 2, but we’re not there yet.


There it is. The obligatory comment on every Python thread on HN. It's the most popular programming language in the world. Other people can figure it out, apparently.


I think we can be more charitable than this: it's possible to be both immensely popular and to have a sub-par packaging experience that users put up with. That's where Python is.


The trouble is people compare it to greenfield languages of the past few years with nowhere near the scope, userbase or legacy of Python. Long time Python users like me don't have any of the problems that the non-Python users that always post these comments have. It would be nice to have improvements to packaging, sure, but it's always just completely non-constructive stuff like "it's not as easy as <brand new language with no legacy>".


Java and Ruby both have much better dependency management experiences and both have been around for far longer than a few years.


As someone who dealt with Java and Python 20yrs back, I don't think Java is a valid comparison.

Java had a terrible or nonexistent OS integration story - it didn't even try to have OS native stuff. It was its own separate island that worked best when you stayed on the island. On Linux, Python was included in the OS, so you had the two worlds of distro packaging and application development/deployment dependencies already in conflict. Macs also shipped their own Python that you had to avoid messing up. And on Windows, Python was also trying to support the standard download-a-setup.exe method for library distribution. Java only ever had the developer dependency use case to think about.

Before Maven most Java apps just manually vendored all their dependencies into their codebase, or you manually wrangled assembling stuff in place using application specific classpaths and additions to path env vars etc.


Today, java has much better dependency management than python. Nearly all popular languages do.


Yeah, but you were refuting someone who was pointing out why that is: the difference between a later green-field system in a smaller problem space, which could learn from earlier systems, and one with a lot of extra use cases pulling in different directions, plus complications and legacy to overcome. The competing 3rd party Python packaging projects early on were learning a lot of lessons the hard way, which both left behind legacy to clean up and paved the way for other languages to skip ahead of all that, usually with a single blessed solution.

Of course a well resourced language that didn't have to worry about native OS/distro integration and only started solving dependency distribution and management later after learning from others is going to have a better system. It would be a total surprise if it didn't.


There is no excuse for the state of python dependency management. Every similar language has figured this out.


I agree with all of this! Ironically, grievances around Python packaging are a function of Python’s overwhelming success; non-constructive complaints about the packaging experience reflect otherwise reasonable assumptions about how painless it should be, given how nice everything else is.

(This doesn’t make them constructive, but it does make them understandable.)


Big assumptions here.



Packaging is a big topic right now, and a lot is happening - that includes a lot of good tool improvements. I think that's one reason for these comments, because it's close to top of mind


I love python. It is my go-to language for just about everything. But that also means that I feel the pain points pretty acutely. And you know what, I'm not alone.

