Python Dependencies Are Fixable

I like Python. I've had a lot of success with it on projects large and small. It is fast enough for most of the things I need, and when it isn't, migrating from it to a more performant language isn't challenging. The depth of the standard library has been incredible for stable, long-living code. However, the one thing I hear often when discussing Python with younger programmers is "well, the dependency management is so bad I wouldn't bother with the language." Lately the narrative seems to be evolving into "it is so broken that we need a new package system to fix it", which to me is the programming version of Spock dying in the warp core. Let's make absolutely sure we have no other options.

Kirk (William Shatner) bids farewell to Spock (Leonard Nimoy) in the emotional finale of Star Trek II: The Wrath of Khan. (Photo: Paramount/Courtesy Everett Collection)

The problem here isn't one of engineering. We already have all the solutions needed to fix this for the majority of users. The issue is an incorrect approach to managing defaults. Pip, like many engineering-led projects, doesn't account for the power of defaults. Engineers tend toward preserving existing behavior and providing the tooling for people to do things correctly themselves. That's the wrong mentality. The experts who drive the project should be adjusting the default behavior to follow best practices.

Defaults are so important, and I think so intimidating to change, that this decision has been pushed back for years and years. If we have a better user experience for people and we know it's what they should be using, we should not expect users to discover that best way on their own. You have to make them opt out of the correct flow, not discover and opt in to the right way to do things. Change is scary though, and maintainers don't have a massive corporate structure to hide behind. Whatever ire the change generates isn't directed at Faceless Corporation PR, it's directed at the people who made the decision.

Golang taught us this lesson. I work a lot with Golang at work and on some side projects. It is an amazing demonstration of the power of defaults and the necessity of experts pushing users forward. Golang code at every job looks like code at every other job, which is the direct result of intentional design. Shipping gofmt bundled with the language increased the quality and readability of Golang everywhere. Decentralized dependency management became an "of course" moment for people when they tried it. Keeping the language simple in the face of demands for increased complexity has preserved its appeal. The list goes on and on.

PyPA needs to push the ecosystem forward or give up on the project and officially endorse a new approach. Offering people 400 options is destroying confidence in the core language. The design mentality has to change from "it isn't a problem if there is a workaround" to the correct one: for most users, the default is the only option they'll ever try.

Why it isn't that broken

Why do I think that we don't need to start fresh? Here's the workflow I use, which is not unique to me. I start a new Python repo and immediately make a venv with python -m venv venv. Then I activate it with source venv/bin/activate and start doing whatever I want. I write all my code, feel pretty good about it and decide to lock down my dependencies.
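
Spelled out, the entire setup is just a couple of commands. requests here is only a stand-in for whatever you actually depend on:

python -m venv venv          # create the virtual environment in ./venv
source venv/bin/activate     # activate it for this shell session
pip install requests         # installs go into the venv, not the global interpreter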

I run pip freeze > requirements.in which gives me all the packages I have installed with their versions. It's 2024, so I need more security and confidence than a list of packages with a version number. The easiest way to get that is with package hashes, which is easy to do with pip-tools. pip-compile --generate-hashes requirements.in outputs a requirements.txt with the hashes I want, along with the dependencies of those packages.

build==1.0.3 \
    --hash=sha256:538aab1b64f9828977f84bc63ae570b060a8ed1be419e7870b8b4fc5e6ea553b \
    --hash=sha256:589bf99a67df7c9cf07ec0ac0e5e2ea5d4b37ac63301c4986d1acb126aa83f8f
    # via
    #   -r requirements.in
    #   pip-tools
cachetools==5.3.2 \
    --hash=sha256:086ee420196f7b2ab9ca2db2520aca326318b68fe5ba8bc4d49cca91add450f2 \
    --hash=sha256:861f35a13a451f94e301ce2bec7cac63e881232ccce7ed67fab9b5df4d3beaa1
    # via
    #   -r requirements.in
    #   google-auth
certifi==2023.11.17 \
    --hash=sha256:9b469f3a900bf28dc19b8cfbf8019bf47f7fdd1a65a1d4ffb98fc14166beb4d1 \
    --hash=sha256:e036ab49d5b79556f99cfc2d9320b34cfbe5be05c5871b51de9329f0603b0474
    # via
    #   -r requirements.in
    #   aioquic
    #   aioquic-mitmproxy
    #   mitmproxy
    #   requests
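
One detail worth knowing: when any hash appears in a requirements file, pip switches into hash-checking mode on its own, but you can also demand it explicitly so a hashless file fails loudly. The flag below is standard pip, nothing extra to install:

pip install --require-hashes -r requirements.txt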

Now I know every package I have, why I have it, and the specific hashes of those packages, so they can't be silently swapped out from under me by a supply chain attack. My Dockerfile is also pretty idiot-proof.

FROM python:3.12-slim

# Create a non-root user
RUN groupadd -r nonroot && useradd -r -g nonroot nonroot
WORKDIR /app

# Copy the lockfile first so the install layer caches until dependencies change
COPY requirements.txt .

RUN pip3 install -r requirements.txt

COPY . .

RUN chown -R nonroot:nonroot /app

USER nonroot

ENTRYPOINT ["./gunicorn.sh"]
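
For completeness, building and running it looks like this; the image tag is arbitrary:

docker build -t my-python-app .
docker run --rm my-python-app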

Yay, it's running. I can feel pretty confident handing this project over to someone new and having them run into minimal problems getting all of this running. Need to check for updates? Not a big deal.

pip-review
Flask==3.0.2 is available (you have 3.0.0)
Jinja2==3.1.3 is available (you have 3.1.2)
MarkupSafe==2.1.5 is available (you have 2.1.3)
pip==24.0 is available (you have 23.3.1)
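
pip-review can also apply those updates for you, either one confirmation at a time or all at once. After upgrading, re-run pip freeze and pip-compile to refresh the lockfile and its hashes:

pip-review --interactive    # prompt before upgrading each package
pip-review --auto           # upgrade everything without prompting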

Basically, if you know the happy path, there are no serious problems here. But you need to know all these steps, and they're documented in random places all over the internet. How did we get here, and what can be done to fix it?

Why do people think it is so bad

What is the combination of decisions that got us to this place? Why is the average user's opinion so low? I think it's everything below.

  • Bare pip sucks for daily tasks. We can't declare a minimum version of Python, we don't get any information about dependency relationships in the file, we don't have a concept of developer dependencies vs production dependencies (pip-tools can paper over that gap; see the sketch after this list), we don't have hashes so we're wide open to upstream attacks, it's slow, it's not clear how to check for updates, there's no alert for a critical update, the list goes on and on.
    • What pip-compile does should always be the minimum. It should have been the minimum years and years ago.
    • Where pip shines is the range of scenarios it covers and its backwards compatibility. We don't want to throw out all that work to switch to a new package manager unless the situation is unfixable. To me the situation is extremely fixable, but we need to change defaults.
  • People used Python as a bash replacement. This was a weird period where, similar to Perl, there was an assumption that Python would be installed on any system you were working with and so you could write Python scripts to do things and then package them up as Linux packages. If your script had dependencies, you would also pull those in as Linux packages.
    • To be blunt, this dependency management system never should have been allowed. It caused no end of confusion for everyone and ended up with people using super old packages. If your Python application had dependencies, you should have included them.
    • Starting to write Python on Linux, running apt-get install requests, and then later being told to use pip and remove the system package, even though packages are how you get software on Linux, has thrown off beginners for as long as I have been doing this job.
  • The nature of dependencies has changed, and how we think about including third-party software has evolved. I was shocked when I started working with NodeJS teams at how aggressively and (frankly) recklessly they would add dependencies to a project. However, NPM and Node are designed around that model of lots of external dependencies, and they've adopted a lot of things that people have come to expect.
    • The package.json, package-lock.json and node_modules directory as a consistent design across all projects is huge. It completely eliminated confusion and ensures you can have super-easy project switching along with reproducible environments.
    • Node defaulting to per-project and not global is what Python should have switched over to years ago. Again, this is just what people expect when they're talking about having lots of dependencies.
    • People have a lot more dependencies. When I started in this field, the idea of every project adding a 66 MB dependency like boto would have been unthinkable. Not because of disk space, but because it's just so much code to bring into a project. Now people don't even blink at adding more libraries. pip was designed in a world where a requirements.txt was 10 lines. Now we could easily be talking 200.
    • If we're not going to switch over to per-project dependencies, then at the very least venv needs to become the default. I don't care how you do it. Make a file that sits at the top level of a directory that tells Python we're using a venv, or have Python check for the existence of a venv folder and use it by default if it exists. You've got to have something a bit easier here.
    • The reason this isn't a crisis today is that it's effectively a basic .profile fix:
   # Wrap cd so that entering a directory containing a venv activates it automatically
   cd() {
       builtin cd "$@"
       if [ -f "venv/bin/activate" ]; then
           source venv/bin/activate
       fi
   }
  • Finally, people think it's bad because Golang and Rust exist. Expectations for how dependency management should work have evolved across the industry. Work has been done to expand pip to meet more of those expectations, but we're still pretty far behind.
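
On the dev-dependency gap mentioned at the top of this list: pip-tools' layered-requirements pattern is the usual workaround today. The file name requirements-dev.in is just convention, not something pip enforces. You constrain dev tools to the versions already locked for production:

# requirements-dev.in
-c requirements.txt    # dev tools must resolve against the production lockfile
pytest
pip-review

Compiling it gives dev tooling its own hashed lockfile, separate from production:

pip-compile --generate-hashes requirements-dev.in -o requirements-dev.txt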

Where to go from here

Anyone familiar with the Apple ecosystem will know the term "Sherlocking". It's where Apple monitors the ecosystem of third-party apps and will periodically copy one and make it part of the default OS. While unfair at times to those third parties, it's a clever design from Apple's perspective. They can let someone else do all the work of figuring out what users like and don't like, what designs succeed or fail on their platform and then swoop in when there is general consensus.

pip needs to do some Sherlocking. PyPA has already done a ton of hard engineering work. We have the ability to create a more secure, easier-to-debug dependency management system with existing, almost-stock tooling. It doesn't require any fundamental changes to the ecosystem or the investment of a lot of engineering effort.

What it requires is being confident enough in your work to make a better experience for everyone by enduring a period of complaints. Or it's time to give up and endorse something like uv. Sitting and waiting for the problem to resolve itself through some abstract concept of community consensus has been a trainwreck. Either make the defaults conform to modern expectations, or warn users when they run pip that it's a deprecated project and they should go install whatever else.

Questions/comments/concerns: https://c.im/@matdevdug