Lazy imports for Python
Starting a Python application typically results in a flurry of imports as modules from various locations (and the modules they import) get added into the application process. All of that occurs before the application even gets started doing whatever it is the user actually launched it for; that delay can be significant—and annoying. Beyond that, many of those imports may not be necessary at all for the code path being followed, so eagerly doing the import is purely wasted time. A proposal back in May would add a way for applications to choose lazy imports, where the import is deferred until the module is actually used.
PEP 690
The lazy imports proposal was posted to the Python discussion forum by one of its authors, Germán Méndez Bravo. He noted that the feature is in use in the Cinder CPython fork at Meta, where it has demonstrated "startup time improvements up to 70% and memory-use reductions up to 40%" on real-world Python command-line tools. So he and Carl Meyer teamed up on PEP 690 ("Lazy Imports") to propose the feature for CPython itself; since neither of them is a core developer, Barry Warsaw stepped up as the PEP's sponsor. The PEP has changed some since it was posted, based on feedback in the discussion, some of which will be covered below; the May 3 version can be found at GitHub, along with other historical versions.
The core of the idea is the concept of a lazy reference that is not visible to Python programs; it is purely a construct in the C code of the interpreter. When run with lazy imports enabled, a statement like import foo will simply add the name foo to the global namespace (i.e. globals()) as a lazy reference; any access to that name will cause the import to be executed, so the lazy reference acts like a thunk. Similarly, from foo import bar will add bar to the namespace, such that when it is used it will be resolved as foo.bar, which will import foo at that time.
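PEP 690's lazy references live in the interpreter's C code and are invisible to Python programs, so they cannot be reproduced exactly in pure Python. As a rough analogy only, the sketch below uses a hypothetical `LazyRef` proxy class (not part of the PEP or any proposed API) that behaves like a thunk: the named module is imported with `importlib.import_module()` only when an attribute is first looked up on the proxy.

```python
import importlib
import types
from typing import Any, Optional


class LazyRef:
    """Hypothetical proxy that defers importing a module until first use."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[types.ModuleType] = None

    def _resolve(self) -> types.ModuleType:
        # First use: perform the real import and cache the module object.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr: str) -> Any:
        # Only called for names not found on the proxy itself, i.e. for
        # anything that belongs to the wrapped module.
        return getattr(self._resolve(), attr)


colorsys = LazyRef("colorsys")     # stands in for "import colorsys"
print(colorsys._module is None)     # True: nothing imported yet
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # first use triggers the import
print(colorsys._module is None)     # False: the import has now happened
```

Unlike the PEP's mechanism, this proxy is visible to the program (`type(colorsys)` is `LazyRef`, not a module), and it does not cover the `from foo import bar` case; it only illustrates the "resolve on first access" behavior.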
The original proposal enabled lazy imports by way of a command-line flag (-L) to the interpreter or via an environment variable. But Inada Naoki pointed out that it would be better to have an API to enable lazy imports. For an application like Mercurial, which already uses a form of lazy importing but might want to switch to the new mechanism, setting an environment variable just for the tool is not sensible and adding a command-line argument to the "#!/usr/bin/env python" shebang (or hashbang) line of a Python script is not possible on Linux. Meyer agreed that an API (e.g. importlib.set_lazy_imports()) should be added.
The PEP specifically targets application developers as the ones who should choose lazy imports and test their applications to ensure that they still work. The PEP says:
Since lazy imports are a potentially-breaking semantic change, they should be enabled only by the author or maintainer of a Python application, who is prepared to thoroughly test the application under the new semantics, ensure it behaves as expected, and opt-out any specific imports as needed (see below). Lazy imports should not be enabled speculatively by the end user of a Python application with any expectation of success.
It is the responsibility of the application developer enabling lazy imports for their application to opt-out any library imports that turn out to need to be eager for their application to work correctly; it is not the responsibility of library authors to ensure that their library behaves exactly the same under lazy imports.
The environment variable for enabling the feature fell by the wayside, but the -L option remains and an explicit API was added. There are also ways for developers to opt out of lazy imports; the PEP proposes a few different mechanisms. To start with, there are several types of imports that will never be processed lazily, regardless of any settings. For example:
```python
import importlib  # not lazy (aka "eager")

# make subsequent imports lazy (API proposed in PEP 690)
importlib.set_lazy_imports()

import foo             # lazy
from bar import baz    # lazy
from xyz import *      # star imports are always eager

try:
    import abc         # always eager in try except block
except ImportError:
    import def_        # eager

with A() as a:
    import ghi         # always eager inside with block
```

Imports that are not at the top level of a module (i.e. those inside a class or function definition) are also always eager. If a developer knows that an import needs to be done eagerly, presumably because the side effects from importing it need to happen before the rest of the code is executed (or perhaps because the module does not work correctly when lazily imported), the import can be done in a try block or the proposed new context manager can be used:
```python
from importlib import eager_imports  # proposed in PEP 690

with eager_imports():
    import foo  # eager
```

For third-party code that cannot (or should not) be modified, there is an exclude list available, which will force modules on the list to be eagerly imported when they are encountered:
```python
from importlib import set_lazy_imports  # proposed in PEP 690

set_lazy_imports(excluding=['foo', 'bar.baz'])
```

In that example, foo and bar.baz will be eagerly imported, though bar is still lazily imported, as are all of the imports contained in foo and bar.baz.
Libraries
Several library authors expressed concerns that they would effectively be forced to support (and test) lazy imports of their library, which is an added burden for maintainers. Thomas Kluyver put it this way:
Realistically, we won't get to tell everyone that if they want to use our library they can't use this new lazy import thing that Python just added. Especially as it's meant to make startup faster, and performance tricks always get cargo-culted to people who don't want to think about what they mean (one weird trick to make your Python scripts start 70% faster!). Within a year or so of releasing a version of Python with this option, we'll probably have to ensure our libraries and examples work with and without it. I'm sure we'd manage, but please remember that opt-in features for application developers aren't really optional for library developers.
Marc-Andre Lemburg said that it makes more sense to explicitly choose which imports are done lazily. Enabling it globally for a large code base is potentially dangerous; instead something like "lazy import foo" should be used for imports that are only accessed from some subset of the program. He acknowledged that this can already be done, by placing the import where the functionality is being used, but thought that explicitly calling out the lazy imports was a better approach. Gregory P. Smith disagreed: "The startup time benefit of lazy imports only comes from enabling them broadly, not by requiring all code in your application's entire transitive dependencies to be modified to declare most of their imports as lazy."
Meyer wondered about real world examples of the kinds of problems that library authors might encounter. There is a persistent idea in the discussion about libraries opting into being imported lazily, but he does not think that makes sense. However, library authors may not really want to determine whether their library can be imported that way, Paul Moore said:
My concern is more that as a library developer I have no intention of even thinking about whether my code is "lazy import safe". I just write "normal" Python code, and if my test suite passes, I'm done. I don't particularly want to run my test suite twice (with and without lazy imports) and even if I did, what am I supposed to do if something fails under lazy imports? The fact that it works under "normal" imports means it's correct Python, so why should [I] make my life harder by avoiding constructs just to satisfy an entirely theoretical possibility that someone might want to import my code lazily?
Meyer said that was a reasonable position for a library developer to take, but also recognized that the maintainer "might get user complaints about it, and this is a significant cost of the PEP". He also pointed out that most of the concerns being raised also apply to the existing importlib.util.LazyLoader class, which provides a more limited kind of lazy imports. Beyond that, there is no real way to decide that a module is "safe" for lazy import:
What I think the discussions of "library opt-out" are missing is that "safe for lazy imports" is fundamentally not even a meaningful or coherent property of a single module or library in isolation. It is only meaningful in the context of an actual application codebase. This is because no single module or library can ever control the ordering of imports or how the import-time code path flows: it is an emergent property of the interaction of all modules in the codebase and their imports. [...] I think the nature of the opt-out in PEP 690 is not well understood. It is not an exercise in categorizing modules into neatly-defined objective categories of "safe for lazy import" and "not safe for lazy import." (If it were, the only possible answer would be that no module is ever fully lazy import safe.) Rather, it is a way for an application developer to say "in the context of my specific entire application and how it actually works, I need to force these particular imports to be eager in order for the effects I actually need to happen in time."
Warsaw agreed with that; library authors "can't declare their modules safe for lazy import because they have no idea how their libraries are consumed or in what order they will be imported". On the other hand, application authors are in a position to work all of that out:
As an application author though, I know everything I need to know about what modules I consume, how they are imported, and whether they are safe or not. At least theoretically. Nobody is in a more advantageous position to understand the behavior of my application, and to make declarations about what modules can and cannot be safely lazily imported. And nobody else is in a position to actually test that assumption.
To me, the PEP gives the application author end-consumer the tools they need to build a lazy-friendly application.
As one data point on the real-world prevalence of lazy-import problems, Meyer said: "the Instagram Server codebase is multiple million lines of code, uses lazy imports applied globally, and has precisely five modules opted out". Méndez Bravo published a lengthy blog post that described the process of converting that code base to use lazy imports. For the most part, the problems encountered were not due to importing libraries; even third-party and standard library modules largely just worked when lazy imports were enabled globally.
Toward the end of June the discussion picked back up when Matplotlib developer Thomas A Caswell reiterated the concerns about the feature's impact on libraries and their authors, though he is "still enthusiastic about this proposal". Matplotlib and other SciPy libraries have lengthy import times and have tried various ways of deferring imports but "at every step discovered a subtle way that a user was relying on a side-effect". He expects that PEP 690 will "produce a stream of really interesting bugs across the whole ecosystem", though he would be happy to be wrong about that.
David Lord, who helps maintain Flask, Jinja, and other libraries, focused on the push from users to support lazy imports in libraries. He said that other features added to Python over the years (asyncio and typing) had created a lot of extra work when users clamored for them to be supported. "I really hope this doesn't add a third huge workload to my list of things to juggle as a maintainer."
Moore is worried that users will perceive lazy imports as a magic button they can press for better performance:
My fear is that most users will get the impression that "enable lazy imports" is essentially a "go_faster=True" setting, and will enable them and have little or no ability to handle the (inevitable) problems. They will therefore push the issue back to library maintainers.
He is in favor of improving startup time and reducing the cost of imports but would prefer to see it done with some form of opt-in for library authors. Meyer reminded everyone that a form of the feature already exists in the language "in a very similar global opt-in way" with LazyLoader. The PEP makes the feature "more usable and more effective and faster", however, which may make it more popular, so library developers may see more user requests. Furthermore, the existence of LazyLoader has not led to the problems envisioned: "The Python ecosystem doesn't seem to have been overwhelmed by people trying it 'just to see if it makes things faster.'" But it may be that LazyLoader is not all that well-known, so it has not been (ab)used much.
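For comparison, the existing `importlib.util.LazyLoader` is typically used through a small helper like the one below, which follows the recipe in the importlib documentation (the `lazy_import()` name is just a local helper, not a standard-library function). It has to be applied explicitly, one module at a time, and the module is still located eagerly by `find_spec()`; only the execution of its code is deferred until the first attribute access.

```python
import importlib.util
import sys
import types


def lazy_import(name: str) -> types.ModuleType:
    """Return a module whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)  # the search itself is eager
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # arranges for execution to happen later
    return module


calendar = lazy_import("calendar")  # module body has not run yet
print(calendar.isleap(2024))        # True; this access executes the module
```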
The discussion has wound down at this point, though in early August, Mark Shannon argued that the PEP "just feels too magical" because it does not use an explicit mechanism to mark lazy imports (e.g. lazy import foo). He said that "explicit approaches have been rejected with insufficient justification". Warsaw disagreed and thought that the PEP did justify its choices; he encouraged those who want to see an explicit approach to create a competing PEP.
Méndez Bravo recounted the process he went through when converting the Instagram code, which started out with an explicit approach. As he worked through it, he realized that nearly all of the imports could be done lazily, so he switched to the global approach. All in all, it worked well:
There are many different types of uses of Python and some communities have different patterns, but all the evidence we do have is that the percentage of modules we tried and worked without any issues out of the box with lazy imports enabled was high, and that just enabling lazy imports in a few modules doesn't yield many benefits at all. The true power comes when you enable laziness in whole systems. We've saved terabytes of memory in some systems and reduced start times from minutes to just a few seconds, just by making things lazy.
Opinions are split on PEP 690, but it seems clear that it provides a useful tool for some. Python creator Guido van Rossum is in favor: "I am eager to have this available even if there are potential problems", but others are less enthusiastic even though the underlying problem is widely acknowledged. The PEP is targeted at the 3.12 release, which does not have a feature freeze until next May, so there is still plenty of time. One might guess that the next step is to ask the steering council to decide on the PEP. The outcome of that is not obvious, though if more people start using lazy imports in Cinder without major problems, it might help sway the decision. Time will tell.
Index entries for this article | |
---|---|
Python | Import |
Python | Python Enhancement Proposals (PEP)/PEP 690 |
Lazy imports for Python
Posted Sep 7, 2022 19:46 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
- setting up lazy imports only works from a `__name__ == '__main__'` context; and
- modules/packages have a way to say "I will never support lazy loading" to avoid having every single consumer have to add it to their exclusion list.
Lazy imports for Python
Posted Sep 7, 2022 21:57 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
The PEP says that laziness will be per-module. If you call set_lazy_imports() in foo.py, and bar.py does not call set_lazy_imports(), then all of the imports in bar.py will be eagerly evaluated (even if foo.py lazily imports those same modules first - what happens is that bar.py's import gets eagerly evaluated, and then when foo.py's lazy import eventually needs to be evaluated, the import machinery resolves it back to the same module that bar.py already imported, just like it would if both imports were eager). IMHO if some library wants to use lazy imports as part of an internal implementation detail that application code will never see, then we might as well allow it.
> - modules/packages have a way to say "I will never support lazy loading" to avoid having every single consumer have to add it to their exclusion list.
This basically translates to "I'm deliberately causing nontrivial import-time side effects that every application will necessarily depend on." You can't possibly know what "every application" is going to depend on (you might be a dependency of a dependency of a dependency that the application barely calls into at all), so I'm skeptical that this is a real thing. But regardless, I don't think Python should encourage this sort of chicanery, or go out of its way to support it, because import-time side effects are evil.
Lazy imports for Python
Posted Sep 7, 2022 22:06 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
This is incorrect, I misread https://peps.python.org/pep-0690/#deep-eager-imports-over.... My bad.
Lazy imports for Python
Posted Sep 8, 2022 13:12 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
The library is VTK. What happens is that there are "object factory" types that can give you back a subclass if they are "registered". So you ask for a `vtkOpenGLRenderWindow` and you get back one that actually does X or Cocoa without having to know which you want at every call site. There are a variety of classes that do this.
In C++, there's a mechanism to see "oh, you included the base class" and the buildsystem injects "and this TU knows about overloads in library X and Y, so call X and Y's registration mechanisms" through that include through some injected global static ctors that ensure registration happens. Due to the way various platforms actually work, this needs to be done in every TU to be reliable[1]. Python accesses this by importing the module that has overloads and then it works "by magic" due to the module initializer hooking this mechanism (since the wrappers need to include the relevant headers).
> because import-time side effects are evil.
I'm not saying they're not, but alas C++ gives such little control over global initializers, that for some solutions, there's no reasonable alternative. As much as *I'd* like to make these things scoped and explicit, getting everyone to update is beyond the software process capital I'm willing to spend (as if I had that much anyways).
[1] For example, static builds on macOS (at one point?) only call object-local global initializers when something from the object is actually used. There's no place to put a library ctor at the static-library level, so you just need to inject "everywhere" to be reliable.
Lazy imports for Python
Posted Sep 7, 2022 21:40 UTC (Wed) by willy (subscriber, #9762) [Link]
Lazy imports for Python
Posted Sep 7, 2022 21:57 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
```python
import mount # Filesystem support
import mount_xfs # Enable XFS support
mount.mount(type='xfs') # Oops, XFS isn't actually loaded because the `mount_xfs` name wasn't tickled.
```
Lazy imports for Python
Posted Sep 7, 2022 21:59 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
```python
import mount      # Filesystem support
import mount_xfs  # XFS support
mount.mount(type=mount_xfs)
```
Or maybe you use type=mount_xfs.mount_type or something like that, but you can just pass the whole module if you want to. Python doesn't care. They're all PyObject* under the hood.
Lazy imports for Python
Posted Sep 8, 2022 13:03 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
For the project I know that will be affected by this, it is all in the library ctor stuff that stuff gets set up; there's no API *to* call. Again, not the best design, but doing it explicitly would make "everyone" unhappy because of how much work it ends up doing for users.
[1] Via feature request pressure to "support lazy loading" over time.
Lazy imports for Python
Posted Sep 8, 2022 16:51 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
Having said that, you might consider moving that logic into an init() function that the application can call into, if possible. Then, when someone inevitably asks for lazy loading support, tell them "Sure, we support lazy loading, just make sure you call this init() function." Since they're already making a code change anyway (to enable lazy loading), they should not have a backcompat objection to that.
Lazy imports for Python
Posted Sep 8, 2022 17:51 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
There's no function for `init` to bind; the relevant code is all stored in global initializers (again, not the design I'd prefer, but it's what I have).
Lazy imports for Python
Posted Sep 8, 2022 20:24 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
Lazy imports for Python
Posted Sep 9, 2022 8:15 UTC (Fri) by farnz (subscriber, #17727) [Link]
Or, on the assumption that you're willing to do work to support lazy loading, you move all of the global code except the imports into the init function, and then have the last line of your module call init(). This is a bigger refactor, but it means that old users get the behaviour they expect (import runs the code), and new users can do a lazy load followed by a call to init to get the same behaviour.
Lazy imports for Python
Posted Sep 10, 2022 16:57 UTC (Sat) by NYKevin (subscriber, #129325) [Link]
main.py:
```python
import primary_library
import some_plugin
import another_plugin
```
Each of the plugin files imports primary_library and then calls some magic init function from there, but the actual API is primary_library (i.e. you just import some_plugin for the magic init side-effect and not to actually use it directly). The plugins are third-party code. You can't just fix it in primary_library, because primary_library doesn't know about the plugins and can't find and init them by itself, even if the application does call primary_library.init().
The reasonable solution is to require the application to call primary_library.init(some_plugin) and explicitly say which plugin(s) to init. But that might be a more significant refactoring job. OTOH, explicit is better than implicit, and this is IMHO a superior coding style to just "write import x and magic happens."
Lazy imports for Python
Posted Sep 9, 2022 19:00 UTC (Fri) by smurf (subscriber, #17840) [Link]
The fix for this is to park each imported part in a separate module, then use the filename to load it.
Lazy imports for Python
Posted Sep 10, 2022 20:03 UTC (Sat) by NYKevin (subscriber, #129325) [Link]
Lazy imports for Python
Posted Sep 11, 2022 3:18 UTC (Sun) by smurf (subscriber, #17840) [Link]
Lazy imports for Python
Posted Sep 9, 2022 13:36 UTC (Fri) by azumanga (subscriber, #90158) [Link]
Now that exception will be thrown in whatever function the library is first used in. That may well be surprising, and could cause issues I imagine.
Lazy imports for Python
Posted Sep 9, 2022 19:05 UTC (Fri) by smurf (subscriber, #17840) [Link]
The same kind of argument was made 20 years ago when libc started to use lazy loading. The solution was a loader flag that loaded eagerly and reported when something didn't resolve. In Python, the equivalent solution might well be a dedicated import checker.
Fortunately, tools like that do exist already. :-P
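As a rough illustration of what such a checker does at its simplest, the hypothetical `unresolvable()` helper below (not one of the existing tools the comment alludes to) asks the import system whether each module name can be found, without running the module. One caveat: for a top-level name `importlib.util.find_spec()` only searches, but for a dotted name it imports the parent packages.

```python
import importlib.util
from typing import Iterable, List


def unresolvable(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found on the import path."""
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:  # a parent package does not exist
            missing.append(name)
    return missing


print(unresolvable(["json", "os.path", "no_such_module_xyz"]))
# ['no_such_module_xyz']
```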
Lazy imports for Python
Posted Sep 11, 2022 19:24 UTC (Sun) by NYKevin (subscriber, #129325) [Link]
Lazy imports for Python
Posted Sep 14, 2022 0:02 UTC (Wed) by xnox (subscriber, #63320) [Link]
```python
import setup
import daemon
daemon.start()
```
This can fail if setup has the side effect of exporting environment variables that daemon.start() expects to be set.
These are the same issues as with the shell's source command, which is also hard to make "lazy".
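That hazard can be reproduced today with `importlib.util.LazyLoader`, which defers a module's execution in a similar way. In this sketch, a throwaway module named `demo_setup` (hypothetical, written to a temporary directory) sets an environment variable at import time; when it is imported lazily, the variable does not exist until something touches an attribute of the module.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
import types
from pathlib import Path


def lazy_import(name: str) -> types.ModuleType:
    """LazyLoader recipe from the importlib docs: defer running the module."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical "setup" module whose main job is an import-time side effect.
    Path(tmp, "demo_setup.py").write_text(
        "import os\n"
        "os.environ['DEMO_DAEMON_CONFIG'] = '/etc/demo.conf'\n"
        "READY = True\n"
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make sure the new file is noticed
    try:
        setup = lazy_import("demo_setup")
        # The import "happened", but the module body has not run yet.
        print("DEMO_DAEMON_CONFIG" in os.environ)  # False
        setup.READY  # first attribute access executes the module body
        print("DEMO_DAEMON_CONFIG" in os.environ)  # True
    finally:
        sys.path.remove(tmp)
```

A daemon started between those two points would not see the variable, which is the kind of import the PEP expects application developers to force to be eager.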
Lazy imports for Python
Posted Sep 8, 2022 7:40 UTC (Thu) by yaap (subscriber, #71398) [Link]
Lazy imports for Python
Posted Sep 8, 2022 8:20 UTC (Thu) by josh (subscriber, #17465) [Link]
Lazy imports for Python
Posted Sep 8, 2022 13:00 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
Lazy imports for Python
Posted Sep 8, 2022 20:30 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
This either doesn't work, or it's basically useless.
foo.py:
```python
import bar
```
bar.py:
```python
print("Side effect!")
```
main.py:
```python
import foo
```
Now, if you make main.py lazily import foo.py, then the side effects of bar.py will stop happening, which main might have accidentally relied on. So you would need to treat import statements as "statements other than declarations" - which effectively means that almost any nontrivial module will be eagerly imported. Or, alternatively, you have to recursively trace through all of the modules in the entire dependency tree and check each one for this lazy import flag - which is still pretty expensive and doesn't really save you all that much (consider the disk seeks!).
Lazy imports for Python
Posted Sep 14, 2022 6:39 UTC (Wed) by arvidma (guest, #6353) [Link]
This way, you only pay the cost once and any subsequent runs can take the fast path.
Lazy imports for Python
Posted Sep 15, 2022 8:54 UTC (Thu) by smurf (subscriber, #17840) [Link]
Lazy imports for Python
Posted Sep 8, 2022 8:30 UTC (Thu) by taladar (subscriber, #68407) [Link]
Lazy imports for Python
Posted Sep 8, 2022 10:17 UTC (Thu) by LtWorf (subscriber, #124958) [Link]
At work we have an internal Python tool that loads 604 files just to handle --version.
This is mostly due to type annotations, which force imports just to declare a function that uses a certain type.
Importing modules before using them is annoying and requires a lot of discipline, but it is only a partial solution.
As developer of typedload (like pydantic but better) I am not sure this would help me much. The code imports a bunch of modules to support those types, but the types are referenced so the import happens anyway. I'd have to do a lot of trickery to only reference them when they are actually needed, but that would probably mean that the library would no longer work with cython.
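A common way to keep annotation-only imports out of start-up, independent of PEP 690, is to guard them with `typing.TYPE_CHECKING` and write the annotations as strings (`from __future__ import annotations` does the same for every annotation in a file). Static type checkers still see the import; the interpreter never executes it. As the comment notes, this does not help when the types are needed at run time, so it is only a partial workaround. The `describe()` function is a hypothetical example.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers such as mypy; at run time
    # TYPE_CHECKING is False, so this import never executes.
    from decimal import Decimal


def describe(amount: "Decimal") -> str:
    """Format an amount; the annotation is only a string at run time."""
    return f"{amount:.2f}"


print("Decimal" in globals())  # False: decimal was not imported here
```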
Lazy imports for Python
Posted Sep 9, 2022 10:28 UTC (Fri) by georgm (subscriber, #19574) [Link]
```
# time firewall-cmd --state
running

real    0m1.675s
user    0m0.861s
sys     0m0.105s
```
for a simple DBUS call...
Hopefully this will benefit from such a change.
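To see where a tool like that spends its start-up time, CPython (3.7 and later) has the `-X importtime` option, which prints one line per imported module to standard error. The hypothetical `import_times()` helper below runs a fresh interpreter with that option and lists the slowest imports; its parser assumes the `self [us] | cumulative | imported package` column layout that CPython prints.

```python
import subprocess
import sys


def import_times(module: str) -> list:
    """Return (cumulative microseconds, module name) pairs, slowest first."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        # Skip the header row, whose middle column is not a number.
        if len(fields) != 3 or not fields[1].strip().isdigit():
            continue
        rows.append((int(fields[1]), fields[2].strip()))
    return sorted(rows, reverse=True)


for cumulative_us, name in import_times("json")[:5]:
    print(f"{cumulative_us:>8} us  {name}")
```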
Lazy imports for Python
Posted Sep 9, 2022 19:12 UTC (Fri) by smurf (subscriber, #17840) [Link]
DBus is nice, but its metadata system could have used a somewhat simpler data model.
Lazy imports for Python
Posted Sep 9, 2022 19:14 UTC (Fri) by smurf (subscriber, #17840) [Link]