
The return of lazy imports for Python

By Jake Edge
December 13, 2022

Back in September, we looked at a Python Enhancement Proposal (PEP) to add "lazy" imports to the language; the execution of such an import would be deferred until its symbols were needed in order to save program-startup time. While the problem of startup time for short-running, often command-line-oriented, tools is widely acknowledged in the Python community, and the idea of deferring imports is generally popular, there are concerns about the effect of the feature on the ecosystem as a whole. Since our article, the PEP has been revised and discussed further, but the feature was recently rejected by the steering council (SC) because of those concerns; that has not completely ended the quest for lazy imports, however.

Updated PEP

In early October, Germán Méndez Bravo started a new thread to discuss the updates that had been made to PEP 690 ("Lazy Imports"). He and co-author Carl Meyer "have (hopefully) considered and addressed each and all of the suggestions in the previous discussion thread, by either providing rejection reasons or improving the API and implementation". They also updated the reference implementation of the feature, so that interested developers could try it out.

Méndez Bravo also posted some benchmark results that he got when testing three different versions of the interpreter: vanilla CPython, CPython with lazy imports added but unused, and CPython using lazy imports. The idea was to measure the impact of the feature on the operation of the interpreter, rather than the gains that might be found for a particular command-line-program use case. He summarized the impact as pretty minimal: the build with lazy imports present but disabled showed no measurable difference from the vanilla interpreter, while the other two combinations were only 1% slower.

SC member Brett Cannon had some "personal feedback" about the updated proposal. In his opinion, there are too many ways to enable and disable the feature. In particular, he found the enable_lazy_imports_in_module() API to be "too magical". It was meant for SciPy use cases, Méndez Bravo said, so that individual modules could control their imports without impacting the rest of an application, but Cannon said that since those modules would already need to be modified, they should be changed to do something more explicit. The PEP authors seem to have agreed with that, since that call was removed from the final version of the PEP.

The conversation then languished for a month before another SC member, Petr Viktorin, picked it back up in mid-November. Once again, he was speaking for himself and not the committee; he had concerns about modifying the Python dict object to support the feature. Because the PEP specifies that lazy imports are to be transparent, dictionary lookup is changed to handle lazy objects that represent modules that have not (yet) actually been imported, as described in the Implementation section of the PEP. The Rationale section explains the intended behavior:

The aim of this feature is to make imports transparently lazy. "Lazy" means that the import of a module (execution of the module body and addition of the module object to sys.modules) should not occur until the module (or a name imported from it) is actually referenced during execution. "Transparent" means that besides the delayed import (and necessarily observable effects of that, such as delayed import side effects and changes to sys.modules), there is no other observable change in behavior: the imported object is present in the module namespace as normal and is transparently loaded whenever first used: its status as a "lazy imported object" is not directly observable from Python or from C extension code.

The lazy objects are stored in a module's symbol dictionary (i.e. module.__dict__); in order to ensure that any code that digs around in the module dictionary cannot expose the lazy objects, the underlying dictionary code must be changed. Viktorin was concerned that the behavior could be an obstacle for dictionary optimizations and features in the future. Méndez Bravo agreed that there was a bit of complexity added to the dictionary code, but thought that it was manageable, and that doing things that way was better than other alternatives that had been tried in the Cinder CPython fork where the lazy imports work began. Meta was able to achieve a reduction of up to 70% in startup times on Python command-line tools using Cinder's lazy imports.
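
For comparison, the standard library already ships a coarser mechanism, importlib.util.LazyLoader, which swaps in a placeholder module class rather than changing dictionary lookup. The sketch below follows the recipe in the importlib documentation; it is not the PEP 690 mechanism, and the helper name lazy_import and the choice of colorsys are purely illustrative.

```python
import importlib.util
import sys
import types


def lazy_import(name):
    """Return a module object whose body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot lazily import {name!r}")
    # Wrap the real loader so that exec_module() only sets up a placeholder.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    # Unlike PEP 690, the placeholder shows up in sys.modules right away.
    sys.modules[name] = module
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# type() does not go through attribute lookup, so it does not trigger loading.
still_lazy = type(colorsys) is not types.ModuleType
hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)   # first real use runs the module body
now_loaded = type(colorsys) is types.ModuleType
```

Note that this approach is visible to introspection (the placeholder has its own class until first use), which is the kind of observable difference that the PEP's "transparent" requirement was meant to rule out.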

PEP rejected

On December 2, Gregory P. Smith posted the steering council's decision to reject the PEP. The main reason was the effect that it would have on the Python user community:

But a problem we deem significant when adding lazy imports as a language feature is that it becomes a split in the community over how imports work. A need to test code both ways in both traditional and lazy import setups arises. It creates a divergence between projects who expect and rely upon import time code execution and those who forbid it. It also introduces the possibility of unexpected import related exceptions occurring in code at the time of first use virtually anywhere. Such exceptions could bubble up from transitive dependency first use in unanticipated places.

A world in which Python only supported imports behaving in a lazy manner would likely be great. But we cannot rewrite history and make that happen. As we do not envision the Python [language] transitioning to a world where lazy imports are the default, let alone only, import behavior. Thus introducing this concept would add complexity to our ecosystem.

The SC also had some concerns with the implementation described in the PEP, including the changes needed to the dictionary implementation, but ultimately decided that those did not matter; the SC would have said "no" even if those problems were addressed. To a certain extent, though, the SC rejection opened the floodgates to more discussion of the feature.

Both Guido van Rossum and PEP sponsor Barry Warsaw expressed disappointment with the rejection, though both could understand the council's reasoning for doing so. Both also noted that the PEP was the best proposal for the feature that they had seen. As Warsaw put it: "It was the best option so far for solving a common use case, and one that puts pressure on ecosystems to move away from Python." Meyer wondered if there was any appetite for a revised proposal that changed to explicitly specifying each use (e.g. lazy import foo) and that created a dict subclass to be used for module dictionaries if they contain lazy imports. That would address many of the areas of concern, though it would not really change the fragmentation issue.
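
To make the dict-subclass idea concrete, here is a toy sketch in pure Python. It is not Meyer's design or anything from the PEP; the class names LazyImport and LazyNamespace are hypothetical, and a real implementation would live in C and hook into how module namespaces are created.

```python
import importlib


class LazyImport:
    """Hypothetical placeholder standing in for a not-yet-imported module."""

    def __init__(self, name):
        self.name = name


class LazyNamespace(dict):
    """Toy namespace dict that swaps placeholders for real modules on lookup."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if isinstance(value, LazyImport):
            # First reference: perform the deferred import and cache the result.
            value = importlib.import_module(value.name)
            super().__setitem__(key, value)
        return value


namespace = LazyNamespace(fractions=LazyImport("fractions"))
half = namespace["fractions"].Fraction(1, 2)   # triggers the import

# The placeholder has been replaced, so later lookups are ordinary dict hits.
resolved = not isinstance(dict.__getitem__(namespace, "fractions"), LazyImport)
```

Even in this small form the leak that worried reviewers is easy to see: dict.get(), values(), and C-level lookups do not go through an overridden __getitem__, so they would still expose the placeholder object.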

One big question that underlies much of the debate about the feature is around who should decide whether lazy imports are enabled—or supported. PEP 690 envisions application authors enabling lazy imports for the entire application and opting out of laziness for just the few modules that are dependent on being eagerly imported. Back in August, Méndez Bravo described following that process with code at Instagram (which is where Cinder came from), where it worked well.

But others are not so sure that it is application developers who should be making the determination. Viktorin would rather see ways for library authors to take advantage of the feature:

Overall, I think we should make it easier for libraries to use lazy imports themselves, à la SciPy or Mercurial.

The current proposal is made for "applications" with tightly controlled set of dependencies. Those are relatively rare in open-source code, and closed-source ones don't have a good way to report bugs that only appear in a specific setup back to the libraries they're using. And the libraries can't test things themselves very well.

Adding explicit lazy syntax to the import sites would allow libraries to slowly opt into the feature. The PEP rejected that approach, but he thought the reasons might be specific to the Meta/Instagram use case. "Porting to explicit lazy imports, library by library, would take time and effort, but might eventually give better results ecosystem-wide." Doing so would also allow the implementation to avoid some of the problem areas:

With explicit lazy imports, we could get away with rougher side effects, avoiding too much magic. Dicts could focus on being containers. Code that needs too much introspection or dynamic features simply wouldn't opt in.

There is concern that library maintainers will be pressured to support lazy imports of their library, however. Warsaw wondered if adding explicit "eager import" syntax would help library maintainers avoid that pressure, but Viktorin did not think it would change anything:

Lazy imports need to be tested, and to be generally useful (outside big apps with rigid dependency chains), they should be tested in individual library test suites. There'll be demand for testing, maintenance, mental overhead around the fact that your library can be imported in two different ways.

That is, of course, already the case, since imports can already be deferred in various ways. Since there is no direct language support for delaying imports, however, that leaves it up to the user of a library, which is part of what Warsaw liked in the PEP:

What I liked about the PEP was that it (at least attempted) to put the burden on the application developer, which is where I think the majority of the responsibility lies. For example, if I turned on implicit lazy imports in my Python CLI, and I found that one of my dependencies can't be lazily imported, I think I'd report the issue (or file a PR) to the dependency, but then I'd just eager-ify the import and my CLI would be none the worse off.

But, as Cannon noted, it is important to consider both the application and the library when looking at doing a lazy import:

The tricky bit with lazy imports as a concept is both the code doing the import and code being imported are affected. Right now there's no handshake in both directions saying both sides of this "transaction" agree that lazy imports are a good thing. You almost need the code being lazily imported to opt into the possibility, and then the person doing the importing saying they want those semantics.

Meyer did not think that having libraries opt into being lazily imported made sense, however. If lazy import foo is shallow, where only foo itself is lazily imported and not any of the imports it contains (unless specified as lazy in foo), then the feature is "effectively just syntactic (and maybe performance) sugar for manually inlining the import, which is already possible and not infrequently done". The PEP gives an example of the manual inlining that he mentions:

    # regular import
    import foo

    def func1():
        return foo.bar()

    def func2():
        return foo.baz()

    # manually inlined
    def func1():
        import foo
        return foo.bar()

    def func2():
        import foo
        return foo.baz()

In the second case, foo will not actually be imported until one of the functions is called. At that point, any imports in foo will be processed (eagerly) as well. Meyer also listed some reasons why he thinks it makes sense to add the syntactic sugar. For one, manual inlining is verbose ("Sometimes syntactic sugar tastes sweet"), but also:

Manual inlining invokes the import system every time the function is called, which has a noticeable cost. The PEP 690 approach reduces this overhead to zero, after the initial reference that triggers the import.
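
The per-call cost that Meyer describes can be measured with a small timeit sketch. The function names and the choice of json are illustrative assumptions, and the numbers will vary by machine and Python version, so no particular ratio is claimed here.

```python
import json      # eager, module-level import
import timeit


def with_global():
    # The name resolves through the module's globals.
    return json.dumps


def with_inline_import():
    # The import statement runs on every call; after the first call it still
    # has to find "json" in sys.modules and rebind the local name.
    import json
    return json.dumps


global_time = timeit.timeit(with_global, number=200_000)
inline_time = timeit.timeit(with_inline_import, number=200_000)
print(f"global lookup: {global_time:.3f}s  inline import: {inline_time:.3f}s")
```

Both functions return the same object; only the work done to reach it differs.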

It is not entirely clear where things go from here. The discussion has largely tailed off as of this writing, but it is a feature that some find useful. The performance and memory-saving benefits that Méndez Bravo reported are certainly eye opening. Finding some way to bring those benefits to all Python users, without fracturing the ecosystem, would definitely be welcome. Perhaps the explicit approach will gain some more traction—and a PEP of its own—before too long.





The return of lazy imports for Python

Posted Dec 13, 2022 22:42 UTC (Tue) by mb (subscriber, #50428)

> Manual inlining invokes the import system every time the function is called,
> which has a noticeable cost.

Yes, why is that so expensive anyway?
This cost hit me often in the past already.

Can't the import system bail out early, if the import has already been done?

The return of lazy imports for Python

Posted Dec 14, 2022 3:13 UTC (Wed) by xi0n (subscriber, #138144)

Because of import hooks. The meaning of the same ‘import foo’ can change in between invocations if a hook was added or removed from sys.meta_path.

The return of lazy imports for Python

Posted Dec 14, 2022 19:31 UTC (Wed) by warrax (subscriber, #103205)

Would that not be detectable via a quick check? How frequent are changes to the hooks?

The return of lazy imports for Python

Posted Dec 15, 2022 0:13 UTC (Thu) by mathstuf (subscriber, #69389)

While the direct configuration of hooks changing might be detectable, knowing what each hook cares about is…hard. Environment? Some module global state? Modules already loaded?

The return of lazy imports for Python

Posted Dec 14, 2022 3:28 UTC (Wed) by nybble41 (subscriber, #55106)

The repeated import issue doesn't seem that difficult to work around:

def lazy_foo():
    if not hasattr(lazy_foo, "cached"):
        import foo
        lazy_foo.cached = foo
    return lazy_foo.cached

def func1():
    lazy_foo().bar()

def func2():
    lazy_foo().baz()

This will import foo exactly once, the first time either func1() or func2() calls lazy_foo(). After that lazy_foo() just returns the module which was already imported.

The return of lazy imports for Python

Posted Dec 14, 2022 6:03 UTC (Wed) by NYKevin (subscriber, #129325)

That has to call an entire function, compared to looking up a global. Even if the LOAD_GLOBAL code path grows a little extra complexity for lazy objects, a function call is still much more expensive, because it sets up and tears down an entire stack frame object. If you do that pervasively throughout the entire application, you'll end up churning the heap allocator unnecessarily (Python puts frame objects on the heap so that they can be referenced and kept alive by traceback objects when an exception is thrown).

The return of lazy imports for Python

Posted Dec 16, 2022 5:24 UTC (Fri) by mgedmin (subscriber, #34497)

Python 3.11 creates frame objects lazily, only when needed. I wonder how that affects this consideration.

The return of lazy imports for Python

Posted Dec 25, 2022 21:35 UTC (Sun) by empiko (guest, #162849)

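Alternatively, you can do this:

foo = None

def func():
    # Without "global", the import below would make foo a local name and
    # the "is None" test would raise UnboundLocalError.
    global foo
    if foo is None:
        import foo
    ...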

The return of lazy imports for Python

Posted Dec 14, 2022 3:29 UTC (Wed) by coderanger (subscriber, #134639)

It does bail out long before loading the file a second time, but there's still some processing before the point where it can tell it's definitely a duplicate (relative->absolute resolution, etc).
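
A small sketch of that early bail-out, assuming nothing beyond the standard import machinery: a do-nothing finder placed on sys.meta_path is consulted for a genuine first import, but not when the module is already cached in sys.modules. The class name RecordingFinder and the use of graphlib are illustrative only.

```python
import sys


class RecordingFinder:
    """Hypothetical finder that only records which names it is asked about."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path, target=None):
        self.seen.append(fullname)
        return None   # decline, so the normal finders handle the import


finder = RecordingFinder()
sys.meta_path.insert(0, finder)
try:
    sys.modules.pop("graphlib", None)   # force a genuine first import
    import graphlib                     # the finder is consulted here
    first = list(finder.seen)
    import graphlib                     # served from the sys.modules cache
    second = list(finder.seen)
finally:
    sys.meta_path.remove(finder)
```

The second import statement still executes (name resolution, the sys.modules lookup, rebinding the name), which is the residual per-call work, but it never reaches the finders.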

The return of lazy imports for Python

Posted Dec 14, 2022 13:35 UTC (Wed) by mathstuf (subscriber, #69389)

Note that another difference in "manual inlining" is that when unloading/reloading a module, all of its global imports become `None`. Function-local and class-local imports are fine. This causes lots of…fun when a module is registered as callbacks and you suddenly start getting `logging is None` exceptions. Why this seems so "rigorously undocumented" is beyond me; granted, it might be *implied* by `importlib.reload`'s docs, but they do not seem to be written for mere mortals.

The return of lazy imports for Python

Posted Dec 14, 2022 18:46 UTC (Wed) by NYKevin (subscriber, #129325)

Rule of thumb: reload() is for testing a module at the REPL (after you have modified it and want to test the modified version). If you put reload() in a .py file, it is likely to end in tears unless you know exactly what you are doing.

(It's sometimes necessary for things like dynamic plugins that you want to load at runtime, but this is the exception rather than the rule. Most of the time, you're better off loading each module no more than once.)

The return of lazy imports for Python

Posted Dec 15, 2022 0:12 UTC (Thu) by mathstuf (subscriber, #69389)

Yeah, this was part of a reconfigure for buildbot where restarting everything for a config update stopped all builds and caused them to start from scratch. This was disruptive enough that doing some, frankly, horrible things to `sys.modules` to support this was worth it.

The return of lazy imports for Python

Posted Dec 14, 2022 13:41 UTC (Wed) by 0x3333 (subscriber, #158599)

Guys, just focus on making the interpreter faster 🤣🤣🤣

The return of lazy imports for Python

Posted Dec 14, 2022 14:09 UTC (Wed) by osma (subscriber, #6912)

I for one, as an application developer, would welcome the syntactic and performance sugar of optional, shallow lazy imports. I've done inlined imports in the past to avoid some extremely slow imports like NumPy, but that is cumbersome and adds a lot of unnecessary lines of code.

Scientific Python SPEC 1

Posted Dec 14, 2022 15:15 UTC (Wed) by hodgestar (subscriber, #90918)

There is also the Scientific Python SPEC 1 lazy importer. It requires libraries to opt-in (good) and uses the PEP 562 ability to override module `__getattr__`, etc that was implemented in Python 3.7. Python 3.7 is currently the oldest Python that is not end of life.
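
For readers unfamiliar with PEP 562, here is a minimal sketch of a module-level `__getattr__` used for opt-in laziness. It is not the SPEC 1 implementation; the names demo_pkg, _LAZY_SUBMODULES and _lazy_getattr are hypothetical, and a real library would define `__getattr__` directly in its `__init__.py` rather than building a module object by hand.

```python
import importlib
import types

# Hypothetical stand-in for a library's package module (normally __init__.py).
demo_pkg = types.ModuleType("demo_pkg")

# Attribute name exposed by the package -> module that really provides it.
_LAZY_SUBMODULES = {"stats": "statistics"}


def _lazy_getattr(name):
    """Module-level __getattr__ (PEP 562): called only when normal lookup fails."""
    if name in _LAZY_SUBMODULES:
        module = importlib.import_module(_LAZY_SUBMODULES[name])
        setattr(demo_pkg, name, module)   # cache so this hook is not hit again
        return module
    raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")


demo_pkg.__getattr__ = _lazy_getattr

before = "stats" in vars(demo_pkg)      # nothing bound yet
mean = demo_pkg.stats.mean([1, 2, 3])   # first access performs the import
after = "stats" in vars(demo_pkg)       # now cached as a normal attribute
```

Because the hook lives inside the library, callers just write `demo_pkg.stats` and never need to know whether the import was deferred.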

Scientific Python SPEC 1

Posted Dec 14, 2022 18:52 UTC (Wed) by NYKevin (subscriber, #129325)

That seems really nifty, and it looks like they put the meat of the work on PyPI as a separate library: https://pypi.org/project/lazy_loader/

IMHO this is objectively superior to the PEP, since it allows laziness to be an implementation detail of the library, rather than something the application developer has to worry about. If an application developer goes plumbing lazy_loader into an existing library, they know perfectly well that they're carrying a patch and can't reasonably expect upstream to support it. Conversely, if an application developer doesn't want anything to do with lazy loading, they don't even have to know that it is there, because it Just Works.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds