Managing dependencies in a Python project

Jacek Kołodziej

Intro

Assume you're maintaining a project (an application or a library) and it needs some new dependency - a library, maybe a framework. How do you actually add it to the project? You look around and see multiple alternatives:

So many options! Oh, and it seems like the choice depends on whether you're maintaining an application or a library, why is that?

Please note it is not a guide for any particular dependency management tool (although I will suggest a few) but rather a high-level overview of managing dependencies in a Python project - and why you would want to do it in a well-structured way. Also, there's a Tl;dr section near the end if you're not interested in the details.

Otherwise, I will try to explain why there's more than one way to do it and what the tradeoffs between them are, so you can make an informed decision - no matter whether you're reasonably new to having dependencies in your project (and figuring out how to start) or you're already familiar with dependency management but struggle to grasp the big picture at once.

Glossary

But first let's go through some definitions:

I believe the pinning vs. restricting wording is not very widespread but I intend to use it throughout this article to be concise.

If you think there's a better name for them, feel free to correct me.

Managing dependencies

What does that mean, exactly?

This may feel obvious but actually listing our needs will make things down the road easier:

  1. listing the libraries, frameworks etc. your project depends on directly
  2. being able to easily upgrade/downgrade them - a specific one or all at once
  3. being able to drop a dependency (because your code ceases to use it) - along with transitive dependencies that are no longer necessary
  4. discerning the transitive dependencies from the direct ones
  5. installing them - for runtime, but also for running tests or some static analysis - either on a CI server or locally, and also somehow isolated from the system (e.g. in a virtual environment or a Docker image) or not
  6. installing them with some degree of reproducibility

I understand sometimes it's not as easy to classify your project as an application or a library - it might be a library but with a CLI for example - but let's start with this simple distinction and build on that.

But first, let's dive more into...

Reproducibility

If you have some series of commands you execute to install your project along with its dependencies - be it for deployment, for running tests, development or something else - then reproducibility of that process means:

  1. all your dependencies (direct and transitive) are installed in the same versions each time
  2. or all dependencies are installed with the same code each time - this is different because a dependency may very well be released again under the same version but with different code - possibly by another party with malicious intent; of course it doesn't happen often but the point is: it may happen

To achieve the first you need a full list of your dependencies (including transitive ones) with their exact versions. The second, in addition, requires some form of hashes/checksums of the dependencies' code. Don't worry, there are good tools for that.
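To make the difference between "same version" and "same code" concrete, here is a minimal sketch of a checksum check. The `lock` mapping, the `verify` helper and the archive contents are made up for illustration; real tools record digests in their own lock file formats.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of an archive's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(name: str, version: str, data: bytes, lock: dict) -> bool:
    """Check that the archive for name==version matches the recorded digest.

    `lock` is a hypothetical mapping {(name, version): expected_sha256}.
    """
    expected = lock.get((name, version))
    return expected is not None and sha256_of(data) == expected


# Hypothetical archive contents standing in for a real sdist or wheel.
original = b"print('hello from foo 1.2')\n"
lock = {("foo", "1.2"): sha256_of(original)}

# Same version, same code: accepted.
assert verify("foo", "1.2", original, lock)

# Same version released again with different code: rejected.
tampered = b"print('something else entirely')\n"
assert not verify("foo", "1.2", tampered, lock)
```

An exact version alone would accept both archives; only the recorded digest tells them apart.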

Why is reproducibility useful?

Be aware that depending on what your commands actually do, there are more things that can change: system libraries (including the Python interpreter used for runtime and installation), system kernel etc. but these are out of scope of this article. Let's just remember that reproducibility is not exactly a binary thing but rather a spectrum of choices.

...vs. automatic upgrades

On the other hand we may want auto upgrades of the dependencies when we install them - so they get installed in their newest possible versions for your project (honoring the version restrictions) without having to do anything besides re-running the installation (build); this of course takes reproducibility away.
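A toy illustration of why this costs reproducibility: the restriction stays the same, but what gets installed depends on what has been released by the time you run the build. The version lists and the `newest_allowed` helper are hypothetical, and only plain dotted numeric versions are handled (real installers follow much richer rules).

```python
def parse(version: str) -> tuple:
    """Turn '1.10.2' into (1, 10, 2). Only plain dotted numbers are handled."""
    return tuple(int(part) for part in version.split("."))


def newest_allowed(available, at_least=None, below=None):
    """Return the newest version v with at_least <= v < below, or None."""
    candidates = [
        v for v in available
        if (at_least is None or parse(v) >= parse(at_least))
        and (below is None or parse(v) < parse(below))
    ]
    return max(candidates, key=parse, default=None)


# Hypothetical package index contents on two different days.
monday = ["1.1.0", "1.1.2"]
friday = ["1.1.0", "1.1.2", "1.2.0", "2.0.0"]

# The restriction foo>=1.1,<2 is unchanged, the installed version is not.
print(newest_allowed(monday, at_least="1.1", below="2"))  # 1.1.2
print(newest_allowed(friday, at_least="1.1", below="2"))  # 1.2.0
```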

At the end of the day you need to consider your situation and choose which option will work better for you. My personal default is reproducibility and more control, then checking whether the benefits of auto upgrades outweigh the costs of lost reproducibility.

Now, this all applies easily to application projects. What does reproducibility even mean for maintaining libraries?

Reproducibility for libraries

Let's focus on maintaining libraries for a while. The terms build and reproducibility are more nuanced here, because your dependencies get installed:

For Python libraries, you shouldn't ever pin your dependencies' versions for the second purpose (in an attempt to make it somehow reproducible) - because it's very easy to lead your users into version conflicts (dependency hell) that way.

You may consider reproducibility for CI and even development. That's a double-edged sword, though:

I don't know of any good solution to that: either you make your process block changes in your library until you fix such a problem or go with reproducibility and have a process of regularly updating the frozen versions (like Dependabot) - the more frequent the updates, the quicker you'd catch any breakages.

Version restricting

Let's discuss this aspect of dependency management a bit more: why do we sometimes restrict the version of a dependency to some foo<2.0?

There are a few possible reasons and most of them (if not all) are really situational and depend on a particular project (e.g. its versioning scheme and support guarantees*):

  1. foo>=1.2 - e.g. to be sure foo gets installed in the version 1.2 or higher because your code depends on some feature or behavior that was introduced in 1.2
  2. bar!=1.2.3 - e.g. because bar-1.2.3 introduced some bug that you are sure will get fixed in bar-1.2.4
  3. baz<2 - e.g. when you're fairly sure baz-2.0 will introduce some changes that will break your project and you don't want to get surprised by that release
  4. baz>1.2,<2 - you can combine the first and third forms

As new versions of your dependencies get released (or when your code using them changes) the related restrictions will likely need changing, too.
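To see how such restrictions behave, here is a deliberately simplified checker. It is only a sketch: it understands plain dotted numeric versions and the operators used above, nothing more. Real tools implement the full version specifier rules (for example via the `packaging` library), so treat `satisfies` as a hypothetical helper for experimenting.

```python
import re

_CLAUSE = re.compile(r"^\s*(>=|<=|==|!=|>|<)\s*([0-9]+(?:\.[0-9]+)*)\s*$")


def parse(version: str) -> tuple:
    """Turn '1.2.3' into (1, 2, 3)."""
    return tuple(int(part) for part in version.split("."))


def _pad(a: tuple, b: tuple):
    """Pad the shorter tuple with zeros so that 1.2 compares equal to 1.2.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version: str, restriction: str) -> bool:
    """Return True if `version` meets every comma-separated clause."""
    for clause in restriction.split(","):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        left, right = _pad(parse(version), parse(bound))
        ok = {
            ">=": left >= right,
            "<=": left <= right,
            "==": left == right,
            "!=": left != right,
            ">": left > right,
            "<": left < right,
        }[op]
        if not ok:
            return False
    return True


print(satisfies("1.2.0", ">=1.2"))     # True
print(satisfies("1.2.3", "!=1.2.3"))   # False
print(satisfies("2.0", "<2"))          # False
print(satisfies("1.9.5", ">1.2,<2"))   # True
```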

Side-note: it's a good idea to document each restriction (e.g. in commit messages) so you'll know why it is there in the first place and will be able to upgrade them with more confidence. Also, to avoid cargo-culting: imagine you're inheriting a requirements.txt file or copying some part of a project to the next one, then see flask<1.2, then ask yourself: is the <1.2 part still required? A little piece of documentation (especially if it's quickly accessible through git annotate) would definitely help.

* no versioning scheme will guarantee shielding from all possible breaking changes; it may only reduce the probability

Suboptimal approaches

There are two forms of dependency specification that I often encounter, each keeping the dependencies in a single file; they do their job to some extent but each is lacking:

1. Direct dependencies only

A list of (only) direct dependencies:

flask
requests

or, with versions restricted for any reason:

flask>=1.1
requests>2,<3

Such a list is often put:

So this form:

2. Frozen list only

Pretty much a result of a pip freeze command run in an environment with all (hopefully!) the necessary dependencies installed - a complete, flattened tree of dependencies (direct and transitive) with their exact versions:

certifi==2020.12.5
chardet==4.0.0
click==7.1.2
Flask==1.1.2
idna==2.10
itsdangerous==1.1.0
Jinja2==2.11.3
MarkupSafe==1.1.1
requests==2.25.1
urllib3==1.26.4
Werkzeug==1.0.1

Such a list is often put... also into a requirements.txt file (sometimes named differently, e.g. requirements-frozen.txt - the exact name doesn't really matter).
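For a feel of where such a list comes from, here is a rough sketch that builds name==version lines from the standard library's `importlib.metadata`. It is not how pip freeze is implemented, only an approximation of its output shape; `installed_pairs` and `freeze_lines` are names I made up.

```python
from importlib import metadata


def installed_pairs():
    """Yield (name, version) for every distribution visible to this interpreter."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip installs whose metadata has no readable name
            yield name, dist.version


def freeze_lines(pairs):
    """Format (name, version) pairs as sorted, de-duplicated name==version lines."""
    unique = {(name, version) for name, version in pairs}
    ordered = sorted(unique, key=lambda pair: (pair[0].lower(), pair[1]))
    return [f"{name}=={version}" for name, version in ordered]


# Sample pairs taken from the list above, so the result does not depend
# on what happens to be installed where you run this.
sample = [("requests", "2.25.1"), ("Flask", "1.1.2"), ("click", "7.1.2")]
print("\n".join(freeze_lines(sample)))
```

Calling `freeze_lines(installed_pairs())` instead would list whatever is installed in the current environment, which is exactly the problem: the result reflects the environment, not what your project actually needs.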

It'd be preferred to also have checksums of the source code of each version (for even more reproducibility and security assurances) but pip freeze doesn't provide that.

Assuming such a frozen list really represents a complete dependency tree:

There's a tool that can help a little bit with these problems - pipdeptree - but it'd still require brittle, manual work.
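To show what kind of question you'd have to answer by hand with a frozen list only, here is a small sketch that works on a dependency map. The map below is hand-written for the example list above and the helper names are hypothetical; this is not how pipdeptree works internally.

```python
def reachable(graph, roots):
    """Return every package reachable from `roots`, roots included."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(graph.get(name, ()))
    return seen


def orphans_after_drop(graph, direct, dropped):
    """Packages that become unnecessary once `dropped` is no longer a direct dependency."""
    before = reachable(graph, direct)
    after = reachable(graph, [name for name in direct if name != dropped])
    return before - after


# Hand-written, illustrative map: package -> packages it requires.
graph = {
    "Flask": ["click", "itsdangerous", "Jinja2", "Werkzeug"],
    "Jinja2": ["MarkupSafe"],
    "requests": ["certifi", "chardet", "idna", "urllib3"],
}
direct = ["Flask", "requests"]

# Everything in the frozen list that is not a direct dependency.
transitive = reachable(graph, direct) - set(direct)
print(len(transitive))  # 9

# What could be removed if the code stopped using requests.
print(sorted(orphans_after_drop(graph, direct, "requests")))
```

The frozen list alone carries neither the `graph` nor the `direct` information, which is why both have to be reconstructed before you can safely drop anything.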

3. Mixed

Some projects - likely in the absence of a clear dependency management system - try to mix these two together:

flask>=1.1
pytest<5
requests-2.22
urllib3>1.26

Imagine a possible history of this file:

There are good explanations for why we may end up in such a situation and I see it may be bearable to manage dependencies like this - but it doesn't make it any better: it's less structured which makes it harder to read, reason about and update. It can also lead to dependency hell more easily. Eventually it becomes a place where - if any changes are necessary - you make random modifications in frustration and mumble please just work already! (I'm calling this code piñata).
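If you suspect one of your files has drifted into this state, a quick check can help. The sketch below labels each line as pinned, restricted or unrestricted and flags files where pinned entries sit next to the other kinds. It is a toy parser under simplifying assumptions (no extras, environment markers, URLs or options), and the sample lines are made up.

```python
import re

_LINE = re.compile(r"^([A-Za-z0-9._-]+)\s*(.*)$")


def classify(line: str):
    """Return (name, kind) with kind in 'pinned', 'restricted', 'unrestricted'.

    Returns None for blank lines and comments.
    """
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    match = _LINE.match(text)
    if match is None:
        raise ValueError(f"cannot read line: {line!r}")
    name, spec = match.group(1), match.group(2).strip()
    if not spec:
        return name, "unrestricted"
    if spec.startswith("==") and "," not in spec and "*" not in spec:
        return name, "pinned"
    return name, "restricted"


def looks_mixed(lines) -> bool:
    """True if exact pins appear alongside restricted or unrestricted entries."""
    kinds = {result[1] for result in map(classify, lines) if result}
    return "pinned" in kinds and len(kinds) > 1


# Made-up file contents for illustration.
lines = ["flask>=1.1", "pytest<5", "requests==2.25.1", "urllib3"]
for line in lines:
    print(classify(line))
print(looks_mixed(lines))  # True
```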

To sum up

Direct dependencies only Frozen list only Mixed
A. listing direct dependencies

who knows?

B. easy up-/downgrading
C. easy dropping a dependency
D. discerning transitive dependencies
E. installing dependencies
F. reproducibility

Proper solutions

Now we see that these options, while seemingly doing the same job, really serve different purposes. When a project resorts to using one over the other (or, even worse, crams them together), it surrenders a few aspects of full-fledged dependency management (knowingly or not).

Which leads to the realization: both of these forms are useful and you would likely want to use both - which results in a dependency management system that fits all the needs we discussed earlier, along with a clear way to make your builds reproducible (if you need it).

"But maintaining them manually feels like a burden" - you'd say. And it is! That's why the frozen list's maintenance should be automated using the best tools you can use, e.g. Pipenv, Poetry or pip-tools. They were designed to do exactly this:

There are probably more options available. But if you're certain no ready solution suits your needs, you can go with some custom scripts - but please please review the existing tools first. :)

Tool guides

Each of the official how-to guides will explain their usage better than this (already long) article would do - so let me just point you towards their documentation:

If you're 100% sure you want to go more low-level with:

Tl;dr

If you want to manage dependencies for:

If you have any "why?" questions about these - I hope the article above answers them well enough. :)