How virtual environments work

After needing to do a deep dive on the venv module (I will explain why later in this post), I thought I would explain how virtual environments work to help demystify them.

Why do virtual environments exist?

Back in my day, there was no concept of environments in Python: all you had was your Python installation and the current directory. That meant when you installed something you either installed it globally into your Python interpreter or you just dumped it into the current directory. Both of these approaches had their drawbacks.

Installing globally meant you didn't have any isolation between your projects. This led to issues like version conflicts between what one of your projects might need compared to another one. It also meant you had no idea what requirements your project actually had, since you had no way of testing your assumptions about what you needed. This was an issue if you needed to share your code with someone else, as you didn't have a way to test that you weren't accidentally wrong about what your dependencies were.

Installing into your local directory didn't isolate your installs based on Python version or interpreter version (or even interpreter build type, back when you had to compile your extension modules differently for debug and release builds of Python). So while you could install everything into the same directory as your own code (which you did, and thus didn't use src directory layouts for simplicity), there wasn't a way to install different wheels for each Python interpreter you had on your machine so you could have multiple environments per project (I'm glossing over the fact that back in my day you also didn't have wheels or editable installs).

Enter virtual environments. Suddenly you had a way to install projects as a group that was tied to a specific Python interpreter. That got us the isolation/separation of only installing things you depend on (and being able to verify that through your testing), as well as having as many environments as you want to go with your projects (e.g. an environment for each version of Python that you support). So all sorts of wins! It's an important feature to have while doing development (which is why it can be rather frustrating for users when Python distributors leave venv out).

How do virtual environments work?

💡
Virtual environments are different than conda environments (in my opinion; some people disagree with me on this view). The key difference is that conda environments allow projects to install arbitrary shell scripts which are run when you activate a conda environment (which is done implicitly when you use conda run). This is why you are always expected to activate a conda environment, as some conda packages require those shell scripts to be run. I won't be covering conda environments in this post.

Their structure

There are two parts to virtual environments: their directories and their configuration file. As a running example, I'm going to assume you ran the command py -m venv --without-pip .venv in some directory on a Unix-based OS (you can substitute py with whatever Python interpreter you want, including the Python Launcher for Unix).

For simplicity I'm going to focus on the Unix case and not cover Windows in depth.

A virtual environment has 3 directories and potentially a symlink in the virtual environment directory (i.e. within .venv):

  1. bin (Scripts on Windows)
  2. include (Include on Windows)
  3. lib/pythonX.Y/site-packages where X.Y is the Python version (Lib/site-packages on Windows)
  4. lib64 symlinked to lib if you're using a 64-bit build of Python that's on a POSIX-based OS that's not macOS

The Python executable for the virtual environment ends up in bin as various symlinks back to the original interpreter (e.g. .venv/bin/python is a symlink; Windows has a different story). The site-packages directory is where projects get installed into the virtual environment (including pip if you choose to have it installed into the virtual environment). The include directory is for any header files that might get installed for some reason from a project. The lib64 symlink is for consistency on those Unix OSs where they have such directories.

The configuration file is pyvenv.cfg and it lives at the top of your virtual environment directory (e.g. .venv/pyvenv.cfg). As of Python 3.11, it contains a few entries:

  1. home (the directory where the executable used to create the virtual environment lives; os.path.dirname(sys._base_executable))
  2. include-system-site-packages (should the global site-packages be included, effectively turning off isolation?)
  3. version (the Python version down to the micro version, but not with the release level, e.g. 3.12.0, but not 3.12.0a6)
  4. executable (the executable used to create the virtual environment; os.path.realpath(sys._base_executable))
  5. command (the CLI command that could have recreated the virtual environment)

On my machine, the pyvenv.cfg contents are:

home = /home/linuxbrew/.linuxbrew/opt/python@3.11/bin
include-system-site-packages = false
version = 3.11.2
executable = /home/linuxbrew/.linuxbrew/Cellar/python@3.11/3.11.2_1/bin/python3.11
command = /home/linuxbrew/.linuxbrew/opt/python@3.11/bin/python3.11 -m venv --without-pip /tmp/.venv
Example pyvenv.cfg

One interesting thing to note is pyvenv.cfg is not a valid INI file according to the configparser module due to lacking any sections. To read fields in the file you are expected to use line.partition("=") and to strip the resulting key and value.
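To make the partition-and-strip approach concrete, here is a minimal sketch of a reader for pyvenv.cfg. The function name read_pyvenv_cfg and the sample file contents are made up for illustration; this is not the code the site module itself uses.

```python
import tempfile
from pathlib import Path


def read_pyvenv_cfg(path):
    """Parse a pyvenv.cfg file into a dict of stripped keys and values.

    Hypothetical helper for illustration only.
    """
    config = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            # No "=" on this line (e.g. a blank line), so nothing to record.
            continue
        config[key.strip()] = value.strip()
    return config


# Try it on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "pyvenv.cfg"
    cfg_path.write_text(
        "home = /usr/bin\n"
        "include-system-site-packages = false\n"
        "version = 3.11.2\n",
        encoding="utf-8",
    )
    parsed = read_pyvenv_cfg(cfg_path)
    print(parsed["home"])
```

Because partition() splits on the first "=" only, a value that itself contains "=" (which the command entry could) stays intact.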

And that's all there is to a virtual environment! When you don't install pip they are extremely fast to create: 3 directories, a symlink, and a single file. And they are simple enough you can probably create one manually.
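As a rough illustration of what "manually" could look like, here is a sketch that builds the layout described above on a POSIX system. It is not how venv is implemented: the function name create_minimal_venv is hypothetical, it assumes the library directory is named lib, it only creates a single python symlink rather than the various ones venv makes, and it only writes three of the pyvenv.cfg entries.

```python
import os
import sys
import tempfile
from pathlib import Path


def create_minimal_venv(env_dir):
    """Create a bare-bones, pip-less virtual environment (POSIX-only sketch)."""
    env_dir = Path(env_dir)
    version = sys.version_info
    # The interpreter behind any virtual environment currently in use; fall
    # back to sys.executable if the private attribute is unavailable.
    base_executable = getattr(sys, "_base_executable", None) or sys.executable

    # The three directories.
    bin_dir = env_dir / "bin"
    bin_dir.mkdir(parents=True)
    (env_dir / "include").mkdir()
    site_packages = (
        env_dir / "lib" / f"python{version.major}.{version.minor}" / "site-packages"
    )
    site_packages.mkdir(parents=True)

    # The lib64 symlink, for 64-bit builds on POSIX systems other than macOS.
    if sys.maxsize > 2**32 and sys.platform != "darwin":
        (env_dir / "lib64").symlink_to("lib", target_is_directory=True)

    # A symlink back to the original interpreter.
    (bin_dir / "python").symlink_to(base_executable)

    # The configuration file.
    lines = [
        f"home = {os.path.dirname(base_executable)}",
        "include-system-site-packages = false",
        f"version = {version.major}.{version.minor}.{version.micro}",
    ]
    (env_dir / "pyvenv.cfg").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return env_dir


# Build one in a temporary directory and list what ended up in it.
with tempfile.TemporaryDirectory() as tmp:
    env = create_minimal_venv(Path(tmp) / ".venv")
    print(sorted(p.name for p in env.iterdir()))
```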

One point I would like to make is how virtual environments are designed to be disposable and not relocatable. Because of their simplicity, virtual environments are viewed as something you can throw away and recreate quickly (if it takes your OS a long time to create 3 directories, a symlink, and a file consisting of 292 bytes like on my machine, you have bigger problems to worry about than virtual environment relocation 😉). Unfortunately, people tend to conflate environment creation with package installation, when they are in fact two separate things. What projects you choose to install with which installer is actually separate from environment creation and probably influences your "getting started" time the most.

How Python uses a virtual environment

During start-up, Python automatically calls the site.main() function (unless you specify the -S flag). That function calls site.venv() which handles setting up your Python executable to use the virtual environment appropriately. Specifically, the site module:

  1. Looks for pyvenv.cfg in either the same or parent directory as the running executable (which is not resolved, so the location of the symlink is used)
  2. Looks for include-system-site-packages in pyvenv.cfg to decide whether the system site-packages ends up on sys.path
  3. Sets sys._home if home is found in pyvenv.cfg (sys._home is used by sysconfig)

That's it! It's a surprisingly simple mechanism for what it accomplishes.
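The lookup in step 1 can be approximated in a few lines. This is a sketch of the idea rather than the site module's actual code, and find_pyvenv_cfg is a name I made up. The last line uses the comparison of sys.prefix and sys.base_prefix, which is the usual way to ask from inside Python whether a virtual environment is in use.

```python
import os
import sys


def find_pyvenv_cfg(executable):
    """Return the pyvenv.cfg beside or one directory above executable, else None.

    Hypothetical helper for illustration only.
    """
    # abspath() rather than realpath(): symlinks are deliberately not
    # resolved, so the location of the symlink itself is what counts.
    exe_dir = os.path.dirname(os.path.abspath(executable))
    candidates = [
        os.path.join(exe_dir, "pyvenv.cfg"),
        os.path.join(os.path.dirname(exe_dir), "pyvenv.cfg"),
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


# Prints a path when run from a virtual environment's interpreter, else None.
print(find_pyvenv_cfg(sys.executable))
# True when the running interpreter is using a virtual environment.
print(sys.prefix != sys.base_prefix)
```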

One thing to notice here about how all of this works is that virtual environment activation is optional. Because the site module works off of the symlink to the executable in the virtual environment to resolve everything, activation is just a convenience. Honestly, all the activation scripts do is:

  1. Put the bin/ (or Scripts/) directory at the front of your PATH environment variable
  2. Set VIRTUAL_ENV to the directory containing your virtual environment
  3. Tweak your shell prompt to let you know your PATH has been changed
  4. Register a deactivate shell function which undoes the other steps

In the end, whether you type python after activation or .venv/bin/python makes no difference to Python. Some tooling like the Python extension for VS Code or the Python Launcher for Unix may check for VIRTUAL_ENV to pick up on your intent to use a virtual environment, but it doesn't influence Python itself.
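Since activation boils down to environment variables, a tool can get a similar effect without running any shell script. Here is a sketch of that idea for the Unix layout; the function name activated_environ is hypothetical, and it only covers the PATH and VIRTUAL_ENV changes (not the prompt or the deactivate function).

```python
import os


def activated_environ(env_dir, environ=None):
    """Return a copy of environ with the PATH and VIRTUAL_ENV changes applied.

    Hypothetical helper; assumes the Unix layout with a bin/ directory.
    """
    env = dict(os.environ if environ is None else environ)
    env_dir = os.path.abspath(env_dir)
    bin_dir = os.path.join(env_dir, "bin")
    # Put the environment's bin/ directory at the front of PATH.
    env["PATH"] = os.pathsep.join(filter(None, [bin_dir, env.get("PATH", "")]))
    # Record which virtual environment is meant to be in use.
    env["VIRTUAL_ENV"] = env_dir
    return env


env = activated_environ(".venv", {"PATH": "/usr/bin"})
print(env["PATH"])
print(env["VIRTUAL_ENV"])
```

The resulting mapping could be passed as the env argument of subprocess.run() so that a child process resolves python to the one in the virtual environment.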

Introducing microvenv

In the Python extension for VS Code, we have an issue where Python beginners end up on Debian or a Debian-based distro like Ubuntu and want to create a virtual environment. Due to Debian removing venv from the default Python install and beginners not realizing there is more to install than python3, they often end up failing at creating a virtual environment (at least initially, as you can install python3-venv separately; in the next version of Debian there will be a python3-full package you can install which will include venv and pip, but it will probably take a while for all the instructions online to be updated to suggest that over python3). We believe the lack of venv is a problem as beginners should be using environments, but asking them to install yet more software can be a barrier to getting started (I'm also ignoring the fact pip isn't installed by default on Debian either, which also complicates the getting started experience for beginners).

But venv is not shipped separately from the rest of Python's stdlib, so we can't simply install it from PyPI somehow or easily ship it as part of the Python extension to work around this. Since venv is in the stdlib, it's developed along with the version of Python it ships with, so there's no single copy which is fully compatible with all maintained versions of Python (e.g. Python 3.11 added support to use sysconfig to get the directories to create for a virtual environment, various fields in pyvenv.cfg have been added over time, new language features may be used, etc.). While we could ship a copy of venv for every maintained version of Python, we potentially would have to ship for every micro release to guarantee we always had a working copy, and that's a lot of upstream tracking to do. And even if we only shipped copies from each minor release of Python, we would still have to track every micro release in case a bug in venv was fixed.

Hence I have created microvenv. It is a project which provides a single .py file which you use to create a minimal virtual environment. You can either execute it as a script or call its create() function that is analogous to venv.create(). It's also compatible with all maintained versions of Python. As I (hopefully) showed above, creating a virtual environment is actually straightforward, so I was able to replicate the necessary bits in less than 100 lines of Python code (specifically 87 lines in the 2023.1.1 release). That actually makes it small enough to pass in via python -c, which means it could be embedded in a binary as a string constant and passed as an argument when executing a Python executable as a subprocess if you wanted to (directly executing microvenv.py also works). Hopefully that means a tool could guarantee it can always construct a virtual environment somehow.
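To show the python -c idea in general terms, here is a sketch of a tool running embedded source in a subprocess. The embedded script below is a made-up stand-in that only prints a message (it is not microvenv's source), and run_embedded is a hypothetical name.

```python
import subprocess
import sys

# Stand-in for the source of a small script stored as a string constant.
EMBEDDED_SOURCE = """\
import sys
print("would create a virtual environment at", sys.argv[1])
"""


def run_embedded(python, *args):
    """Run EMBEDDED_SOURCE with the given interpreter and return its stdout."""
    result = subprocess.run(
        [python, "-c", EMBEDDED_SOURCE, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# With -c, sys.argv[0] is "-c" and the extra arguments follow it.
print(run_embedded(sys.executable, ".venv"), end="")
```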

To keep microvenv simple, small, and maintainable, it does not contain any activation scripts. I personally don't want to be a shell script expert for multiple shells, nor do I want to track the upstream activation scripts (and they do change in case you were thinking "it shouldn't be that hard to track"). Also, in VS Code we are actually working towards implicitly activating virtual environments by updating your environment variables directly instead of executing any activation shell scripts, so the shell scripts aren't needed for our use case (we are actively moving away from using any activation scripts where we can as we have run into race condition problems with them when sending the command to the shell; thank goodness for conda run, but we also know people still want an activated terminal).

I'm also skipping Windows support because we have found the lack of venv to be a unique problem for Linux in general, and Debian-based distros specifically.

I honestly don't expect anyone except tool providers to use microvenv, but since it could be useful to others beyond VS Code, I decided it was worth releasing on its own. I also expect anyone using the project to only use it as a fallback when venv is not available (which you can deduce by running py -c "from importlib.util import find_spec; print(find_spec('venv') is not None)"). And before anyone asks why we don't just use virtualenv, its wheel is 8.7MB compared to microvenv at 3.9KB; 0.05% the size, or 2175x smaller. Granted, a good chunk of what makes up virtualenv's wheel is probably from shipping pip and setuptools in the wheel for fast installation of those projects after virtual environment creation, but we also acknowledge our need for a small, portable, single-file virtual environment creator is rather niche and something virtualenv currently doesn't support (for good reason).
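The same availability check can be done from inside a program when deciding which creator to use. This is only a sketch of the decision; module_available and choose_creator are hypothetical names, and the actual call into venv or microvenv is left out.

```python
from importlib.util import find_spec


def module_available(name):
    """Return True if a top-level module called name can be found."""
    return find_spec(name) is not None


def choose_creator():
    """Prefer venv, naming microvenv only as the fallback."""
    return "venv" if module_available("venv") else "microvenv"


print(choose_creator())
```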

Our plan for the Python extension for VS Code is to use microvenv as a fallback mechanism for our Python: Create Environment command (FYI we also plan to bootstrap pip via its pip.pyz file from bootstrap.pypa.io by downloading it on-demand, which is luckily less than 2MB). That way we can start suggesting to users in various UX flows to create and use an environment when one isn't already being used (as appropriate, of course). We want beginners to learn about environments if they don't already know about them and also remind experienced users when they may have accidentally forgotten to create an environment for their workspace. That way people get the benefit of (virtual) environments with as little friction as possible.