Shrink your Conda Docker images with conda-pack

If you’re building a Docker image that’s based on Conda, the resulting image can be huge. For example, as I’ll show below, a simple image with just Python 3.8 and NumPy can be over 950MB!

Large images waste bandwidth, disk, time, and CPU: how do you make the image smaller?

In this article I’ll show one way to do it, by combining the conda-pack tool with multi-stage builds. In the example case of just Python and NumPy, the image shrinks to 330MB, almost two-thirds smaller.

If you’re not familiar with multi-stage builds, I recommend reading my introduction to multi-stage builds first.

The problem: a giant image

Let’s create a standard Conda Docker image with just a couple of dependencies. Here’s the environment.yml:

name: example
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy

And here’s the Dockerfile: we install Python 3.8 and NumPy, and when we run the image it imports NumPy to make sure everything is working. (If you’re not familiar with conda run, you might want to read my article on activating Conda environments in Docker.)

FROM continuumio/miniconda3

COPY environment.yml .
RUN conda env create -f environment.yml

ENTRYPOINT ["conda", "run", "-n", "example", \
            "python", "-c", \
            "import numpy; print('success!')"]

Note: Outside any specific best practice being demonstrated, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.

The resulting image is 970MB, which is surprisingly large. Where is all the disk space going?

  1. Conda caches downloaded packages by default.
  2. The base environment, where the Conda toolchain is installed, takes up a lot of space; for example, it has its own copy of Python, in this case Python 3.7.

That second reason means the base image we used, continuumio/miniconda3, is 430MB. For comparison, the python:3.8-slim-buster image is 115MB, and it already includes the version of Python we’d want to use.

Let’s get rid of Conda!

The first reason for extra size is fairly standard with package managers, and can typically be fixed by either configuration or a well-targeted rm -rf. The second problem, however, is Conda-specific: the base Conda environment is necessary for installation of packages, but once we’re running the code it really doesn’t add much.
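
For the first reason, here is a minimal sketch of the configuration-style fix, assuming the same environment.yml as above. It uses conda clean, which is a real Conda subcommand, but this variant is not part of the original example and its resulting size was not measured for this article:

```dockerfile
FROM continuumio/miniconda3

COPY environment.yml .

# Create the environment and clear Conda's package cache in the
# same RUN step. Files deleted in a later step would still be
# stored in the earlier layer, so the image would not shrink.
RUN conda env create -f environment.yml && \
    conda clean --all --yes

ENTRYPOINT ["conda", "run", "-n", "example", \
            "python", "-c", \
            "import numpy; print('success!')"]
```

This only addresses the package cache; the base environment and the Conda toolchain are still in the image.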

Enter conda-pack, a tool that lets you package a Conda environment into a standalone environment, with no need for the Conda toolchain. Once we’ve packaged up our environment that way, we can copy it into a new image that only contains that self-contained environment.

Again, if you’re not familiar with multi-stage builds, I recommend reading my introduction to multi-stage builds first.

Here’s what our new Dockerfile looks like:

# The build-stage image:
FROM continuumio/miniconda3 AS build

# Install the package as normal:
COPY environment.yml .
RUN conda env create -f environment.yml

# Install conda-pack:
RUN conda install -c conda-forge conda-pack

# Use conda-pack to create a standalone environment
# in /venv:
RUN conda-pack -n example -o /tmp/env.tar && \
  mkdir /venv && cd /venv && tar xf /tmp/env.tar && \
  rm /tmp/env.tar

# /venv is at the same path it'll have in the final
# image, so now fix up paths:
RUN /venv/bin/conda-unpack


# The runtime-stage image; we can use Debian as the
# base image since the Conda env also includes Python
# for us.
FROM debian:buster AS runtime

# Copy /venv from the previous stage:
COPY --from=build /venv /venv

# When image is run, run the code with the environment
# activated:
SHELL ["/bin/bash", "-c"]
ENTRYPOINT source /venv/bin/activate && \
           python -c "import numpy; print('success!')"

If we build the image, the resulting image is much smaller, and it still works just fine:

$ docker image build -t condapack .
...
$ docker container run condapack
success!
$ docker image ls condapack
REPOSITORY   TAG      IMAGE ID       SIZE
condapack    latest   6e7906bd0634   330MB

Why does this work?

Conda is an interesting packaging system in that it includes everything you need to run your program, other than the standard C library. So when we install the python=3.8 package in the environment.yml, that installs Python and all the C libraries it needs.

When we use conda-pack to package our Conda environment into an isolated environment that doesn’t need Conda, the result is a directory with programs that can be run on almost any Linux distribution. So we can just copy that directory onto a plain old small Debian image, and get a self-contained running application.
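
To illustrate that portability, here is a sketch in which only the runtime base image changes. The build stage repeats the steps from the Dockerfile above; debian:buster-slim is an assumption swapped in for debian:buster, chosen because it is also glibc-based and ships bash. This variant was not built or measured for this article, so treat it as a starting point rather than a verified result:

```dockerfile
# Build stage: same steps as the Dockerfile shown earlier.
FROM continuumio/miniconda3 AS build
COPY environment.yml .
RUN conda env create -f environment.yml
RUN conda install -c conda-forge conda-pack
RUN conda-pack -n example -o /tmp/env.tar && \
  mkdir /venv && cd /venv && tar xf /tmp/env.tar && \
  rm /tmp/env.tar
RUN /venv/bin/conda-unpack

# Runtime stage: assumed alternative base image. It only has to
# provide the standard C library (glibc) and bash, since the
# Conda environment brings Python and its other C libraries.
FROM debian:buster-slim AS runtime
COPY --from=build /venv /venv
SHELL ["/bin/bash", "-c"]
ENTRYPOINT source /venv/bin/activate && \
           python -c "import numpy; print('success!')"
```

The "almost any" caveat matters here: Conda packages are linked against glibc, so a base image built on a different C library, such as Alpine with musl, is the likely exception.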

Next steps

If you’re using Conda in your Docker image, conda-pack is an easy way to shrink your image. However, make sure to read my article on fast multi-stage builds; naive usage of multi-stage builds results in very slow rebuilds in CI.