The return of lazy imports for Python
Back in September, we looked at a Python Enhancement Proposal (PEP) to add "lazy" imports to the language; the execution of such an import would be deferred until its symbols were needed in order to save program-startup time. While the problem of startup time for short-running, often command-line-oriented, tools is widely acknowledged in the Python community, and the idea of deferring imports is generally popular, there are concerns about the effect of the feature on the ecosystem as a whole. Since our article, the PEP has been revised and discussed further, but the feature was recently rejected by the steering council (SC) because of those concerns; that has not completely ended the quest for lazy imports, however.
Updated PEP
In early October, Germán Méndez Bravo started a new discussion thread to discuss the updates that had been made to PEP 690 ("Lazy Imports"). He and co-author Carl Meyer "have (hopefully) considered and addressed each and all of the suggestions in the previous discussion thread, by either providing rejection reasons or improving the API and implementation". They updated the reference implementation of the feature, so that interested developers could try it out.
Méndez Bravo also posted some benchmark results that he got when testing three different versions of the interpreter: vanilla CPython, CPython with lazy imports added but unused, and CPython using lazy imports. The idea was to measure the impact of the feature on the operation of the interpreter, rather than the gains that might be found for a particular command-line-program use case. He summarized the impact as pretty minimal: the build with lazy imports present but disabled showed no measurable difference from the vanilla interpreter, while the build with lazy imports in use was only 1% slower than either of the other two.
SC member Brett Cannon had some "personal feedback" about the updated proposal. In his opinion, there are too many ways to enable and disable the feature. In particular, he found the enable_lazy_imports_in_module() API to be "too magical". It was meant for SciPy use cases, Méndez Bravo said, so that individual modules could control their imports without impacting the rest of an application, but Cannon said that since those modules would already need to be modified, they should be changed to do something more explicit. The PEP authors seem to have agreed with that, since that call was removed from the final version of the PEP.
The conversation then languished for a month before another SC member, Petr Viktorin, picked it back up in mid-November. Once again, he was speaking for himself and not the committee; he had concerns about modifying the Python dict object to support the feature. Because the PEP specifies that lazy imports are to be transparent, dictionary lookup is changed to handle lazy objects that represent modules that have not (yet) actually been imported, as described in the Implementation section of the PEP. The Rationale section explains the intended behavior:
The aim of this feature is to make imports transparently lazy. "Lazy" means that the import of a module (execution of the module body and addition of the module object to sys.modules) should not occur until the module (or a name imported from it) is actually referenced during execution. "Transparent" means that besides the delayed import (and necessarily observable effects of that, such as delayed import side effects and changes to sys.modules), there is no other observable change in behavior: the imported object is present in the module namespace as normal and is transparently loaded whenever first used: its status as a "lazy imported object" is not directly observable from Python or from C extension code.
The lazy objects are stored in a module's symbol dictionary (i.e. module.__dict__); in order to ensure that any code that digs around in the module dictionary cannot expose the lazy objects, the underlying dictionary code must be changed. Viktorin was concerned that the behavior could be an obstacle for dictionary optimizations and features in the future. Méndez Bravo agreed that there was a bit of complexity added to the dictionary code, but thought that it was manageable—and that doing things that way was better than other alternatives that had been tried in the Cinder CPython fork where the lazy imports work began. Meta was able to achieve up to 70% reduction in startup times on Python command-line tools using Cinder's lazy imports.
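The idea of a namespace dictionary that resolves lazy objects on lookup can be sketched in pure Python. This is only an illustration, not the PEP's implementation, which lives in CPython's C dictionary code; the `LazyImport` and `LazyNamespace` names are hypothetical.

```python
import importlib


class LazyImport:
    """Hypothetical placeholder for a module that has not been imported yet."""

    def __init__(self, name):
        self.name = name

    def resolve(self):
        # Perform the real import on demand.
        return importlib.import_module(self.name)


class LazyNamespace(dict):
    """Dict that swaps a LazyImport placeholder for the real module on lookup."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, LazyImport):
            value = value.resolve()
            super().__setitem__(key, value)  # cache the loaded module
        return value


namespace = LazyNamespace()
namespace["json"] = LazyImport("json")

# dict.__getitem__ bypasses the override, so the placeholder is still visible.
raw = dict.__getitem__(namespace, "json")

# A normal lookup triggers the import and replaces the placeholder.
encoded = namespace["json"].dumps({"lazy": True})
```

In this sketch, methods such as `get()` and `values()` would still hand back the placeholder, which hints at why hiding lazy objects completely requires changes to the dictionary itself rather than a thin wrapper.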
PEP rejected
On December 2, Gregory P. Smith posted the steering council's decision to reject the PEP. The main reason was the effect that it would have on the Python user community:
But a problem we deem significant when adding lazy imports as a language feature is that it becomes a split in the community over how imports work. A need to test code both ways in both traditional and lazy import setups arises. It creates a divergence between projects who expect and rely upon import time code execution and those who forbid it. It also introduces the possibility of unexpected import related exceptions occurring in code at the time of first use virtually anywhere. Such exceptions could bubble up from transitive dependency first use in unanticipated places.
A world in which Python only supported imports behaving in a lazy manner would likely be great. But we cannot rewrite history and make that happen. As we do not envision the Python [language] transitioning to a world where lazy imports are the default, let alone only, import behavior. Thus introducing this concept would add complexity to our ecosystem.
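The council's point about exceptions surfacing at first use can be illustrated with today's Python, using a manually deferred import as a stand-in for a lazy one. The package name below is hypothetical and chosen so that the import fails.

```python
def render_report():
    # Hypothetical dependency; the import only runs when the function is called.
    import package_that_is_not_installed_xyz
    return package_that_is_not_installed_xyz.render()


# Defining render_report() succeeded even though the dependency is missing.
# The failure only shows up at the first call, possibly far from startup.
try:
    render_report()
except ModuleNotFoundError as exc:
    failure = exc
    print(f"import failed at first use: {exc}")
```

With an eager, module-level import, the same error would have been raised when the program started rather than in the middle of its run.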
The SC also had some concerns with the implementation described in the PEP, including the changes needed to the dictionary implementation, but ultimately decided that those did not matter; the SC would have said "no" even if those problems were addressed. To a certain extent, though, the SC rejection opened the floodgates to more discussion of the feature.
Both Guido van Rossum and PEP sponsor Barry Warsaw expressed disappointment with the rejection, though both could understand the council's reasoning for doing so. Both also noted that the PEP was the best proposal for the feature that they had seen. As Warsaw put it: "It was the best option so far for solving a common use case, and one that puts pressure on ecosystems to move away from Python." Meyer wondered if there was any appetite for a revised proposal that changed to explicitly specifying each use (e.g. lazy import foo) and that created a dict subclass to be used for module dictionaries if they contain lazy imports. That would address many of the areas of concern, though it would not really change the fragmentation issue.
One big question that underlies much of the debate about the feature is around who should decide whether lazy imports are enabled—or supported. PEP 690 envisions application authors enabling lazy imports for the entire application and opting out of laziness for just the few modules that are dependent on being eagerly imported. Back in August, Méndez Bravo described following that process with code at Instagram (which is where Cinder came from), where it worked well.
But others are not so sure that it is application developers who should be making the determination. Viktorin would rather see ways for library authors to take advantage of the feature:
Overall, I think we should make it easier for libraries to use lazy imports themselves, à la SciPy or Mercurial.
The current proposal is made for "applications" with tightly controlled set of dependencies. Those are relatively rare in open-source code, and closed-source ones don't have a good way to report bugs that only appear in a specific setup back to the libraries they're using. And the libraries can't test things themselves very well.
Adding explicit lazy syntax to the import sites would allow libraries to slowly opt into the feature. The PEP rejected that approach, but he thought the reasons might be specific to the Meta/Instagram use case. "Porting to explicit lazy imports, library by library, would take time and effort, but might eventually give better results ecosystem-wide." Doing so would also allow the implementation to avoid some of the problem areas:
With explicit lazy imports, we could get away with rougher side effects, avoiding too much magic. Dicts could focus on being containers. Code that needs too much introspection or dynamic features simply wouldn't opt in.
There is concern that library maintainers will be pressured to support lazy imports of their library, however. Warsaw wondered if adding explicit "eager import" syntax would help library maintainers avoid that pressure, but Viktorin did not think it would change anything:
Lazy imports need to be tested, and to be generally useful (outside big apps with rigid dependency chains), they should be tested in individual library test suites. There'll be demand for testing, maintenance, mental overhead around the fact that your library can be imported in two different ways.
That is, of course, already the case, since imports can already be deferred in various ways. Since there is no direct language support for delaying imports, however, that leaves it up to the user of a library, which is part of what Warsaw liked in the PEP:
What I liked about the PEP was that it (at least attempted) to put the burden on the application developer, which is where I think the majority of the responsibility lies. For example, if I turned on implicit lazy imports in my Python CLI, and I found that one of my dependencies can't be lazily imported, I think I'd report the issue (or file a PR) to the dependency, but then I'd just eager-ify the import and my CLI would be none the worse off.
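One of the existing ways to defer an import is the standard library's `importlib.util.LazyLoader`. The sketch below follows the recipe in the importlib documentation; the `lazy_import()` helper name comes from that recipe, and `colorsys` is just an arbitrary standard-library module picked for the demonstration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose body runs on first attribute access."""
    spec = importlib.util.find_spec(name)  # None if the module cannot be found
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up laziness; does not run the module body
    return module


colorsys = lazy_import("colorsys")

# The first attribute access triggers the real import.
black = colorsys.rgb_to_hsv(0.0, 0.0, 0.0)
```

This approach is opt-in for each import site, so the decision rests with whoever writes the importing code, which is the situation the quote above describes.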
But, as Cannon noted, it is important to consider both the application and the library when looking at doing a lazy import:
The tricky bit with lazy imports as a concept is both the code doing the import and code being imported are affected. Right now there's no handshake in both directions saying both sides of this "transaction" agree that lazy imports are a good thing. You almost need the code being lazily imported to opt into the possibility, and then the person doing the importing saying they want those semantics.
Meyer did not think that having libraries opt into being lazily imported made sense, however. If lazy import foo is shallow, where only foo itself is lazily imported and not any of the imports it contains (unless specified as lazy in foo), then the feature is "effectively just syntactic (and maybe performance) sugar for manually inlining the import, which is already possible and not infrequently done". The PEP gives an example of the manual inlining that he mentions:
    # regular import
    import foo

    def func1():
        return foo.bar()

    def func2():
        return foo.baz()

    # manually inlined
    def func1():
        import foo
        return foo.bar()

    def func2():
        import foo
        return foo.baz()

In the second case, foo will not actually be imported until one of the functions is called. At that point, any imports in foo will be processed (eagerly) as well. Meyer also listed some reasons why he thinks it makes sense to add the syntactic sugar. For one, manual inlining is verbose ("Sometimes syntactic sugar tastes sweet"), but also:
Manual inlining invokes the import system every time the function is called, which has a noticeable cost. The PEP 690 approach reduces this overhead to zero, after the initial reference that triggers the import.
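The per-call cost of a manually inlined import can be measured with `timeit`. This is a rough micro-benchmark sketch; the function names are hypothetical, `json` is an arbitrary module, and the numbers will vary by machine and Python version.

```python
import json  # eager, module-level import
import timeit


def use_global():
    # Only a global-name lookup and an attribute lookup.
    return json.dumps


def use_inlined():
    import json  # the import statement is executed again on every call
    return json.dumps


calls = 100_000
global_seconds = timeit.timeit(use_global, number=calls)
inlined_seconds = timeit.timeit(use_inlined, number=calls)

print(f"module-level import: {global_seconds:.4f}s for {calls} calls")
print(f"inlined import:      {inlined_seconds:.4f}s for {calls} calls")
```

Even though the module is already cached in sys.modules after the first call, each execution of the inlined statement still has to go through the import machinery to find it there.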
It is not entirely clear where things go from here. The discussion has largely tailed off as of this writing, but it is a feature that some find useful. The performance and memory-saving benefits that Méndez Bravo reported are certainly eye opening. Finding some way to bring those benefits to all Python users, without fracturing the ecosystem, would definitely be welcome. Perhaps the explicit approach will gain some more traction—and a PEP of its own—before too long.
The return of lazy imports for Python
Posted Dec 13, 2022 22:42 UTC (Tue) by mb (subscriber, #50428) [Link]
> which has a noticeable cost.
Yes, why is that so expensive anyway?
This cost hit me often in the past already.
Can't the import system bail out early, if the import has already been done?
The return of lazy imports for Python
Posted Dec 14, 2022 3:13 UTC (Wed) by xi0n (subscriber, #138144) [Link]
The return of lazy imports for Python
Posted Dec 14, 2022 19:31 UTC (Wed) by warrax (subscriber, #103205) [Link]
The return of lazy imports for Python
Posted Dec 15, 2022 0:13 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
The return of lazy imports for Python
Posted Dec 14, 2022 3:28 UTC (Wed) by nybble41 (subscriber, #55106) [Link]
The repeated import issue doesn't seem that difficult to work around:
    def lazy_foo():
        if not hasattr(lazy_foo, "cached"):
            import foo
            lazy_foo.cached = foo
        return lazy_foo.cached

    def func1():
        lazy_foo().bar()

    def func2():
        lazy_foo().baz()

This will import foo exactly once, the first time either func1() or func2() calls lazy_foo(). After that lazy_foo() just returns the module which was already imported.
The return of lazy imports for Python
Posted Dec 14, 2022 6:03 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
The return of lazy imports for Python
Posted Dec 16, 2022 5:24 UTC (Fri) by mgedmin (subscriber, #34497) [Link]
The return of lazy imports for Python
Posted Dec 25, 2022 21:35 UTC (Sun) by empiko (guest, #162849) [Link]
Alternatively, you can do this:

    foo = None

    def func():
        global foo  # without this, the import below would make foo a local name
        if foo is None:
            import foo
        ...
The return of lazy imports for Python
Posted Dec 14, 2022 3:29 UTC (Wed) by coderanger (subscriber, #134639) [Link]
The return of lazy imports for Python
Posted Dec 14, 2022 13:35 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
The return of lazy imports for Python
Posted Dec 14, 2022 18:46 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
(It's sometimes necessary for things like dynamic plugins that you want to load at runtime, but this is the exception rather than the rule. Most of the time, you're better off loading each module no more than once.)
The return of lazy imports for Python
Posted Dec 15, 2022 0:12 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
The return of lazy imports for Python
Posted Dec 14, 2022 13:41 UTC (Wed) by 0x3333 (subscriber, #158599) [Link]
The return of lazy imports for Python
Posted Dec 14, 2022 14:09 UTC (Wed) by osma (subscriber, #6912) [Link]
Scientific Python SPEC 1
Posted Dec 14, 2022 15:15 UTC (Wed) by hodgestar (subscriber, #90918) [Link]
There is also the Scientific Python SPEC 1 lazy importer. It requires libraries to opt-in (good) and uses the PEP 562 ability to override module `__getattr__`, etc that was implemented in Python 3.7. Python 3.7 is currently the oldest Python that is not end of life.
Scientific Python SPEC 1
Posted Dec 14, 2022 18:52 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
IMHO this is objectively superior to the PEP, since it allows laziness to be an implementation detail of the library, rather than something the application developer has to worry about. If an application developer goes plumbing lazy_loader into an existing library, they know perfectly well that they're carrying a patch and can't reasonably expect upstream to support it. Conversely, if an application developer doesn't want anything to do with lazy loading, they don't even have to know that it is there, because it Just Works.
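The PEP 562 module-level `__getattr__` hook mentioned above can be sketched as follows. This is not the SPEC 1 loader's actual code; `make_lazy_package()` and the `demo` package are hypothetical, and the module object is built in memory only so that the example is self-contained. In a real library, `__getattr__` would simply be defined in the package's `__init__.py`.

```python
import importlib
import types


def make_lazy_package(name, lazy_attrs):
    """Build a module whose listed attributes are imported on first access."""
    package = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr in lazy_attrs:
            module = importlib.import_module(lazy_attrs[attr])
            setattr(package, attr, module)  # cache so the hook is not hit again
            return module
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    # Storing the function in the module's namespace enables the PEP 562 hook.
    package.__getattr__ = __getattr__
    return package


demo = make_lazy_package("demo", {"js": "json"})

before = "js" in vars(demo)    # nothing imported yet
text = demo.js.dumps([1, 2])   # first access runs the import
after = "js" in vars(demo)     # now cached on the package
```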