Subinterpreters for Python
A project that has been floating around in the Python world for a number of years is now working its way toward inclusion into the language—or not. "Subinterpreters", which are separate Python interpreters that can currently be created via the C API for extensions, are seen by some as a way to get a more Go-like concurrency model for Python. The first step toward that goal is to expose that API in the standard library. But there are questions about whether subinterpreters are actually a desirable feature for Python at all, as well as whether the hoped-for concurrency improvements will materialize.
PEP 554
Eric Snow's PEP 554 ("Multiple Interpreters in the Stdlib") would expose the existing subinterpreter support from the C API in the standard library. That would allow Python programs to use multiple separate interpreters; the PEP also proposes to add a way to share some data types between the instances. The eventual goal is to allow those subinterpreters to run in parallel, but the implementation is not there yet.
In particular, giving each subinterpreter its own global interpreter lock (GIL) is not (yet) on the table. The GIL prevents multiple threads from executing Python bytecode at the same time. It exists mainly because the CPython memory-management code and garbage collector are not thread-safe. But the existence of the GIL has meant that other features, C-based extensions for example, depend on it for proper functioning. There have been efforts to remove the GIL from Python along the way, including the Gilectomy project. Subinterpreters are seen by some as another way of addressing the "GIL problem".
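The effect of the GIL on CPU-bound threads can be observed with a small experiment. The following is only a sketch: the prime-counting workload and the function names are chosen for illustration and do not come from the PEP or the discussion. On a CPython build with the GIL, the threaded run will typically take about as long as the sequential one, though exact timings depend on the machine and the Python version.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    # One call after another in the main thread.
    return [count_primes(limit) for limit in limits]


def run_threaded(limits):
    # One thread per work item; each thread stores its result by index.
    results = [None] * len(limits)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [
        threading.Thread(target=worker, args=(i, limit))
        for i, limit in enumerate(limits)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


limits = [20_000] * 4

start = time.perf_counter()
sequential = run_sequential(limits)
sequential_time = time.perf_counter() - start

start = time.perf_counter()
threaded = run_threaded(limits)
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.3f}s, threaded: {threaded_time:.3f}s")
```

Both runs produce the same results; only the elapsed times are of interest here.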
The PEP proposes adding an interpreters module to the standard library that will allow the creation of subinterpreters as follows:
    interp = interpreters.create()
Interpreters can then run code passed as a string to the run() method. Data is not shared between these interpreters unless it is done explicitly by using "channels" created this way:
    recv, send = interpreters.create_channel()
As might be guessed, simple objects (e.g. bytes, strings, integers) can then be sent and received using the send() and recv() methods of the corresponding channel objects.
The run() method blocks until the subinterpreter completes, though it can be executed in a separate thread, as shown by this example from the PEP, which uses the threading module:
    interp = interpreters.create()
    def run():
        interp.run('print("during")')
    t = threading.Thread(target=run)
    print('before')
    t.start()
    print('after')
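Since the proposed interpreters module is not something that can simply be imported, the semantics described above can only be approximated with existing tools. What follows is a rough stand-in built from threads and queue.Queue, not the PEP's implementation. The StandInInterpreter, SendEnd, and RecvEnd classes, the has_name() helper, and the shared parameter are all hypothetical names invented for this sketch; the restriction to bytes, strings, and integers is likewise an assumption based on the description above. It shows three things: code strings run in separate namespaces, run() blocks unless wrapped in a thread, and data crosses only through an explicit channel.

```python
import queue
import threading

# Types the stand-in channel accepts; an assumption based on the article's
# "simple objects (e.g. bytes, strings, integers)".
SIMPLE_TYPES = (bytes, str, int)


class StandInInterpreter:
    """Hypothetical stand-in for a subinterpreter (NOT the PEP 554 API).

    Each instance executes code strings in its own private namespace, so
    names defined in one instance are invisible to the others.
    """

    def __init__(self):
        self._namespace = {}

    def run(self, source, shared=None):
        # Blocks until the code string has finished executing.
        # `shared` is a hypothetical way to hand channel ends to the code.
        if shared:
            self._namespace.update(shared)
        exec(source, self._namespace)

    def has_name(self, name):
        # Demo helper only: report whether `name` exists in this namespace.
        return name in self._namespace


class SendEnd:
    """Sending end of the stand-in channel."""

    def __init__(self, q):
        self._q = q

    def send(self, obj):
        if not isinstance(obj, SIMPLE_TYPES):
            raise TypeError(f"cannot send {type(obj).__name__} over a channel")
        self._q.put(obj)


class RecvEnd:
    """Receiving end of the stand-in channel."""

    def __init__(self, q):
        self._q = q

    def recv(self):
        # Blocks until a value is available.
        return self._q.get()


def create_channel():
    # Both ends wrap the same thread-safe queue.
    q = queue.Queue()
    return RecvEnd(q), SendEnd(q)


first = StandInInterpreter()
second = StandInInterpreter()
first.run("x = 41")  # defines x only inside `first`

recv, send = create_channel()

# run() blocks, so it is executed in a separate thread while the main
# thread waits on the receiving end of the channel.
t = threading.Thread(
    target=first.run,
    args=("send.send(x + 1)",),
    kwargs={"shared": {"send": send}},
)
t.start()
answer = recv.recv()
t.join()

print(answer, second.has_name("x"))
```

Unlike real subinterpreters, the stand-in instances share one interpreter state, including imported modules and any C-extension globals, so this models only the programming interface and none of the isolation or GIL questions discussed in the article.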
Because the GIL is shared between all of the interpreters, however, the concurrency gains are minimal. In the most recent revisions, the PEP tries to make it clear that exposing the feature from the C API is worth doing regardless of what happens with the GIL:
PEP 554 has been around since 2017, but Snow thinks it is getting ready for "pronouncement" (a decision to accept or reject it) now. While he believes there is value to exposing the interface in its own right, the PEP has had trouble separating itself from the ongoing GIL work; PEP 554 could perhaps be added to Python 3.9, though the GIL changes are not complete. In mid-April, Snow posed a question to the python-dev mailing list, wondering if it made sense to hold off on the PEP until 3.10 because there is no per-interpreter GIL.
While PEP 554 might be accepted and the implementation ready in time for 3.9, the separate effort toward a per-interpreter GIL is unlikely to be sufficiently done in time. That will likely happen in the next couple months (for 3.10).
So...would it be sufficiently problematic for users if we land PEP 554 in 3.9 without per-interpreter GIL?
His main concern is that users will be confused and frustrated by encountering subinterpreters with a shared GIL, which will have lots of limitations; that might lead them to not reconsider the feature when those limitations are lifted for 3.10. He listed four options for proceeding: merging it without the GIL changes; doing the same but marking it as a "provisional" module; not merging until the GIL changes are ready; and holding off in the same way but adding a 3.9-only subinterpreters module to the Python Package Index (PyPI). He was in favor of the first or the second option.
C extensions
But others are concerned that adding subinterpreter support to the standard library will put additional burdens onto the developers of C-based extensions. Those extensions sometimes use global variables, which do not play well with subinterpreters—whether they are created via the existing C API or the proposed standard library interpreters module. That means that using subinterpreters could lead to strange, hard-to-find problems when combined with extensions.
CPython core developer Nathaniel Smith, who is also a core developer of the C-based extension NumPy, was particularly unhappy with the proposal:
NumPy core developer Sebastian Berg chimed in as well. He suggested that it could take up to a solid year of work to support subinterpreters in NumPy. He also said that the proposal to raise an exception when subinterpreters import extensions that are not subinterpreter-ready is helpful, though it likely will still lead to bugs being filed against the extensions. The PEP proposes to raise ImportError for any extension that does not support PEP 489 ("Multi-phase extension module initialization"); multi-phase initialization eliminates the problems with global state variables for the extensions by moving them into their own module-specific dictionary object.
Both Smith and Berg are skeptical of the existing C-level subinterpreter support. Berg said: "I believe you must consider subinterpreters basically a non-feature at this time. It has neither users nor reasonable ecosystem support", while Smith said that he might write a PEP to propose that subinterpreters be completely eliminated from Python. Snow replied to Berg that there are existing users, however:
That's not to say that alone justifies exposing the C-API, of course. :)
Benefits?
Beyond the concerns about extensions, though, Smith is not convinced of the benefits for concurrency that could eventually come from subinterpreter support. PEP 554 is careful not to directly connect the interpreters module with the eventual plan to stop sharing the GIL between subinterpreters, though it is clearly the eventual goal for some. Smith is skeptical of that plan as well:
Berg concurred to a certain extent. He said that there is a need for a wider vision, beyond the PEP's smaller goals, to explain what the plans are for subinterpreters so that a fuller picture can be considered. Snow agreed that there was a need for better documentation, an informational PEP or other justification document, though that has not appeared as yet. Ultimately, the decision on the PEP rests with Antoine Pitrou, who is the delegate for the PEP. He is generally favorably inclined toward it:
He had some concrete suggestions on things to improve in the API and suggested that the feature be added provisionally (effectively option two in Snow's original message). He also explicitly solicited more feedback. Mark Shannon reviewed the PEP and said that he was in favor of the idea, but that it did not make sense to add the module to the standard library without showing that it would be beneficial for parallelism:
If per-[subinterpreter] GILs are possible then, and only then, sub-interpreters will provide true parallelism and (limited) shared memory concurrency.
The problem is that we don't know whether we can implement per-[subinterpreter] GILs without too large a negative performance impact. I think we can, but we can't say so for certain.
Snow disagreed, not surprisingly, but Shannon put together a table comparing different existing approaches to concurrency in Python with PEP 554 and an "ideal" communicating sequential processes (CSP) model. Go's concurrency model is roughly based around CSP; adding it to Python has also been tried along the way. Shannon said:
As it stands, multiprocessing a better fit for CSP than PEP 554.
IMO, sub-interpreters only become a useful option for concurrency if they allow true parallelism and are not much more expensive than threads.
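For comparison, process-based message passing of the kind Shannon alludes to can be sketched with nothing but the standard library. This example deliberately uses subprocess and pickle rather than the multiprocessing module itself, to keep it short and independent of process start methods; CHILD_SOURCE and sum_of_squares_in_child() are names made up for the illustration. The parent serializes a list, sends the bytes to a separate Python process over a pipe, and deserializes the reply.

```python
import pickle
import subprocess
import sys

# Code run by the child process: read one pickled object from stdin,
# compute on it, and write the pickled result to stdout.
CHILD_SOURCE = """
import pickle, sys
numbers = pickle.load(sys.stdin.buffer)
pickle.dump(sum(n * n for n in numbers), sys.stdout.buffer)
"""


def sum_of_squares_in_child(numbers):
    """Send `numbers` to a separate Python process and return its reply."""
    payload = pickle.dumps(numbers)  # serialize in the parent
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        input=payload,               # bytes travel over a pipe
        capture_output=True,
        check=True,
    )
    return pickle.loads(completed.stdout)  # deserialize the reply


result = sum_of_squares_in_child([1, 2, 3, 4])
print(result)
```

The two processes share no Python objects; everything crosses the boundary as bytes, and each process has its own interpreter and its own GIL.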
Snow sees concurrency as something of a side issue, but he is thinking of taking up the suggestion by Berg and others to more fully document the complete plan:
There was plenty of other discussion, but Snow eventually deferred the PEP until the 3.10 time frame:
It is an interesting feature and one that numerous core developers think could really help the performance of Python programs on multiple cores. But, without the GIL changes, it is difficult to know for sure whether it will be a substantial win. As Smith put it: "[...] the new concurrency model in PEP 554 has never actually been used, and it isn't even clear whether it's useful at all. Designing useful concurrency models is *stupidly* hard." We will have to wait to see if subinterpreters can clear that hurdle.
Subinterpreters for Python
Posted May 13, 2020 23:26 UTC (Wed) by geofft (subscriber, #59789) [Link]
From the very bottom of the PEP, it sounds like all the necessary changes to the CPython API/implementation have already been merged? (Or is it just that the work has been done in a fork but not yet merged?) If the CPython changes can be merged by themselves, does the "interpreters" module need to be part of the standard library, or can it be a CPython extension module with the same functionality?
That seems like it would help with the concern of breaking other extension modules. "NumPy doesn't work with this other random extension module I found on PyPI" is much easier for the developers to dismiss (and much less likely to be filed, at all) than "NumPy doesn't work with this thing in Python 3.9 core."
(Also given the "removing dead batteries" PEP and the points it makes about maintenance burden, implicit endorsement of things in the standard library, etc., it seems worth keeping new batteries out until they're fully charged.)
Subinterpreters for Python
Posted May 14, 2020 0:40 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
> [I] expect that communicating between subinterpreters is going
> to end up looking an awful lot like communicating between
> subprocesses via shared memory.
>
> The trade-off between the two models will then be that one still
> just looks like a single process from the point of view of the
> outside world, and hence doesn't place any extra demands on the
> underlying OS beyond those required to run CPython with a single
> interpreter, while the other gives much stricter isolation
> (including isolating C globals in extension modules), but also
> demands much more from the OS when it comes to its IPC
> capabilities.
I must admit that this comes across as a bit abstruse to me. All modern operating systems have something which resembles a pipe or socket, and for the most part, they also have a reasonable way of sharing memory between processes (admittedly, the Windows way is a bit odd, but it certainly exists). Perhaps Nick was thinking of some other form of IPC, but if so, I cannot divine it from what he has written. A great deal of IPC-related complexity has already been implemented in the multiprocessing module (it has a number of primitives for copying objects across the process boundary).
I find it unfortunate that so many C extensions have tacitly assumed there can be only one Python interpreter in the whole process, but it appears that this assumption is endemic to large swaths of the community. It seems imprudent to provide a stdlib facility which will break popular extensions such as NumPy, and I suspect that this will block the PEP's adoption in the short term.
Subinterpreters for Python
Posted May 15, 2020 20:37 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
If the CPython API weren't designed around there being only one interpreter, maybe this could all have been avoided. I still don't get why an extra pyctx parameter wasn't added to every C API at some point (Python3 seems like it would have been a perfect time). The existing ones can become macros (with trampolines for ABI compat if that mattered) that pass the existing global context around. Wean yourself off that and now CPython isn't encouraging static globals anymore.
Granted, I'm sure it's a lot of work and Python already burnt too many bridges for such an API break to happen again and be accepted.
But I'd not be so quick to blame the extension developers when the core doesn't have its act together either.
Subinterpreters for Python
Posted May 17, 2020 6:20 UTC (Sun) by NYKevin (subscriber, #129325) [Link]
Subinterpreters for Python
Posted May 14, 2020 5:04 UTC (Thu) by Rudd-O (guest, #61155) [Link]
Can the same thing be done with multiprocessing or threading? No, not really, as the semantics of those mechanisms aren't quite right.
Tragic torpedoing of a good idea.
Subinterpreters for Python
Posted May 14, 2020 5:46 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
Subinterpreters for Python
Posted May 14, 2020 13:01 UTC (Thu) by enchantner (guest, #138900) [Link]
Subinterpreters for Python
Posted May 14, 2020 16:44 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
Subinterpreters for Python
Posted May 14, 2020 19:37 UTC (Thu) by enchantner (guest, #138900) [Link]
Pygolang
Posted May 15, 2020 8:35 UTC (Fri) by kirr (guest, #14329) [Link]
> Uhh... What? Go's channels can be trivially expressed ...
For reference:
https://pypi.org/project/pygolang/
and in particular GIL-avoidance mode:
Subinterpreters for Python
Posted May 14, 2020 21:06 UTC (Thu) by njs (subscriber, #40338) [Link]
If you want Go-style multi-core efficiency, though, without the GIL messing things up... then that's harder. CPython is stuck with using the GIL to protect access to Python objects. Therefore, if you want each subinterpreter to have its own GIL, then that means you can never pass Python objects between subinterpreters.
So the reality is that subinterpreters are like subprocesses: each new subinterpreter has to re-load all modules from scratch, passing objects between subinterpreters requires pickling/sending bytes/unpickling, etc. PEP 554 uses the words "CSP" and "channel" a lot, but this was never going to look like Go.
Subinterpreters for Python
Posted May 14, 2020 5:09 UTC (Thu) by eric.saint.etienne (subscriber, #123009) [Link]
Subinterpreters wouldn't help with performance on interpreters with no GIL like Jython or IronPython and possibly Python for GraalVM.
Subinterpreters for Python
Posted May 14, 2020 5:50 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
There are exactly two real-world implementations of Python: CPython and PyPy. Both will have to be modified heavily to support GIL-less world, with subinterpreters being a somewhat lesser evil.
Subinterpreters for Python
Posted May 14, 2020 12:38 UTC (Thu) by Conan_Kudo (subscriber, #103240) [Link]
There is also an experimental GraalVM implementation of Python 3.7, but yeah, the landscape of Python implementations is kind of barren with Python 3 at the moment.
Subinterpreters for Python
Posted May 14, 2020 8:42 UTC (Thu) by flussence (subscriber, #85566) [Link]
I'm flabbergasted that this is all Python can muster in 2020. Cooperative multitasking based on string eval.
Subinterpreters for Python
Posted May 14, 2020 8:58 UTC (Thu) by k3ninho (subscriber, #50375) [Link]
Not even pickle'd for efficient serialisation between subinterps.
I've not read the proposal, but it being absent from this article suggests there's no aysnc-yield pairing either for saying 'evaluate until this other component is ready to progress global tasking state'. That's something that V8 and ES-16 made easy to manage with (resolve, reject) promises.
K3n.
Subinterpreters for Python
Posted May 14, 2020 10:46 UTC (Thu) by anselm (subscriber, #2796) [Link]
Tcl offered a language-level subinterpreter feature quite similar to this (including a facility to use subinterpreters that had all their dangerous commands removed, to execute untrusted Tcl code) in the late 1990s/very early 2000s. At the time nobody else had anything of the sort and it was pretty nifty.
Subinterpreters for Python
Posted May 14, 2020 12:39 UTC (Thu) by jezuch (subscriber, #52988) [Link]
This typo is even self-explanatory! :)
(Sorry, couldn't resist)