Cython, Rust, and more: choosing a language for Python extensions

Sometimes pure Python code isn’t enough, and you need to implement an extension in a compiled language like C, C++, or Rust. Maybe your code is slow, and you need to speed it up. Maybe you just need access to a library written in another language.

Depending on your particular situation and needs, you may want to choose a different tool. But which one?

Let’s see what your options are, and then go through a variety of scenarios and see which of the options is most appropriate.

C, C++, Cython, and Rust: a quick overview

All four of these languages compile down to machine code, and can potentially run orders of magnitude faster than Python. Beyond that, they differ in many significant ways.

C

C was originally created in the early 1970s, and while it has been extensively updated since then, it’s still fundamentally the same language it was back then. In terms of provided functionality, C is quite simple, providing only functions; there are no classes, for example.

The default Python interpreter is implemented in C, and is therefore often referred to as CPython. As such, the default API you would use to implement Python extensions is C, using the provided C API. In addition, Unix systems like Linux are implemented in C, so operating system and library APIs are usually presented as C APIs.

The C API for CPython is also fairly verbose, with lots and lots of boilerplate. Here’s an (untested) example of a simple wrapper for the system() C API, taken from the Python documentation:

// Licensed under PSF License Agreement

#include "Python.h"   /* must be included before any standard headers */
#include <stdlib.h>

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    return PyLong_FromLong(sts);
}

static PyMethodDef SpamMethods[] = {
    {"system",  spam_system, METH_VARARGS,
     "Execute a shell command."},
    {NULL, NULL, 0, NULL}        /* Sentinel */
};

static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "spam",
    NULL,
    -1,
    SpamMethods
};

PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModule_Create(&spammodule);
}

All that for one small function! Some of that boilerplate is shared by every function in the module, but fundamentally the API is just very verbose. Other problems will be covered later, as they are shared with C++ and Cython.

C++

C++ originally started as “C with classes”, but has since grown into a quite different, very expressive, and correspondingly very complex, programming language. It is backwards compatible with C code, other than a few edge cases.

The greater expressiveness of C++ means it’s possible to create much less verbose APIs for writing Python extensions. The pybind11 library is a great example of how much more succinct C++ can be for writing Python extensions.

Here’s a sketch (untested), the equivalent of the previous C example. It’s a lot shorter!

#include <stdlib.h>
#include <pybind11/pybind11.h>

PYBIND11_MODULE(spam, m) {
    m.def("system", &system, "Wrap the system() C API");
}

Cython

Given how verbose the Python C API is, another option is Cython. Cython is a hybrid language: it implements Python syntax, but can interleave code that gets translated into C or C++.

The C example above would look something like this in Cython (untested, but close enough to the real thing):

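cdef extern from "<stdlib.h>" nogil:
    int system(const char *command)

def my_system(bytes command):
    # By default a bytes object converts automatically to const char *;
    # a str would need to be encoded first.
    return system(command)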

Conversion between Python and C/C++ types can be done automatically in many cases, so e.g. there is no need to do PyLong_FromLong().

Cython can call into both C and C++ code, and even subclass C++ classes. The C++ support is somewhat limited, though, given how complex the C++ language is.

Shared problems with C, C++, and Cython

Memory unsafety

C, C++, and Cython all have manual memory management: memory must be manually allocated and freed, unlike Python which uses automatic garbage collection. And partially due to the way memory management is implemented, they all suffer from a lack of memory safety: it’s extremely easy to overwrite the wrong memory by mistake, read from uninitialized memory, and so on. This can lead to crashes, silent data corruption, and security bugs; up to 70% of security bugs in software written in C or C++ are due to memory unsafety.

Lack of build system and package repositories

C and C++ lack both a standard build system and a standard way to install dependencies. In Python, you can pip install a package from source or from a package repository; C has no equivalent.

Python does provide a way to build C Python extensions in a cross-platform way. But if you are depending on existing third-party libraries you are going to have a frustrating time of it, especially if you are doing cross-platform packaging.

Concurrency

Writing fast concurrent code that works without crashing, corrupting data, or otherwise breaking can be quite difficult in these languages.

The (often superior) alternative: Rust

Rust is a new language that was designed from the ground up as a replacement for C++, though it can also be used to replace C. Rust by default cannot have memory unsafety problems, and the same architecture that allows this also allows “fearless concurrency”: you can write concurrent code without worrying about data races or memory corruption. At the same time, Rust can be just as fast as C or C++, and it has escape hatches to allow doing the unsafe operations C and C++ allow.

Rust also has a modern build and package manager called Cargo, and a modern package repository (crates.io). That means that adding new Rust dependencies is trivial, in contrast to the nightmare of bespoke, baroque, or hacked-up build systems that most C libraries provide.

In short, Rust is fundamentally superior to C and C++ in both safety and tooling, while still giving you access to the same performance and functionality.

Rust has two main downsides. First, in order to achieve safety it has to use a rather different programming model than C or C++, which has a bit of a learning curve. Second, as a fairly new language, it has fewer libraries available, and fewer learning resources.

Rust can call C code and, to a lesser extent, C++, but as we’ll discuss next, if that’s your main use case you’re better off with other solutions.

For Python integration, you can use PyO3 to write Python extensions, and then for packaging either Maturin for simple Rust-focused extensions or setuptools-rust if you need to integrate with an existing Python package with a setup.py.

Note: Whether or not any particular tool or technique will speed things up depends on where the bottlenecks are in your software.

Choosing the right tool for the job

Given these options, let’s go over a number of scenarios and see which of these options you might want to choose.

Scenario #1: Making a math calculation run faster

Imagine you have a Python function doing some math, and you need it to run a lot faster. In this case, Cython is the easiest route to speeding up the code from the options we’ve discussed.

Since Cython supports the same syntax as Python, you can just take the same exact Python code, sprinkle in some C types on variables, and it is likely to run vastly faster:

def fib(int n):
    # Return the smallest Fibonacci number that is at least n.
    cdef int a, b
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return b
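
For comparison, here is a sketch of what the untyped starting point might look like in pure Python, assuming the function is meant to return the Fibonacci number the loop stops on. The timing harness and its parameters are illustrative assumptions, not measurements from this article; it only gives you a baseline to compare against once you compile the typed version.

```python
import timeit


def fib(n):
    """Return the smallest Fibonacci number (starting from 1) that is >= n."""
    a, b = 0, 1
    while b < n:
        a, b = b, a + b
    return b


# Baseline timing for the pure-Python version. The argument and the number
# of repetitions are arbitrary choices made for illustration.
seconds = timeit.timeit(lambda: fib(1_000_000_000), number=10_000)
print(f"pure Python: {seconds:.4f}s for 10,000 calls")
```

One difference worth keeping in mind: Python integers grow as needed, while the typed Cython version uses fixed-width C ints, so the two can disagree once values approach the limits of a C int.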

Other alternatives

  • Numba uses JIT compilation to make this sort of Python function run faster.
  • Mypyc can use Python type annotations to compile code into native extensions, but note that it’s still experimental.

Scenario #2: Implementing a well-known data structure, algorithm, or API client

In this situation, the first thing to do is to look for an existing implementation. For example, a client asked me to speed up some code that would check if any of a collection of strings appeared in another string. A common algorithm for solving this problem is Aho-Corasick, so I looked for existing implementations.

As it turns out, one already existed, written in Cython, so I could have just suggested that. However, given their need for performance, I did some further testing, and the Rust aho-corasick library turned out to be even faster, so I wrapped that for Python. Given an existing library, the wrapping code was fairly short, and the result is much faster than the Cython library. Mind you, the extra performance came from a more sophisticated implementation of the algorithm, not from the implementation language.
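
To make the problem concrete, here is a minimal pure-Python sketch of it: a naive check that scans the text once per pattern, and a small textbook-style Aho-Corasick automaton that scans the text only once. This is an illustration of the idea, not the implementation used by either library mentioned above, and the function names are made up for this example.

```python
from collections import deque


def contains_any_naive(patterns, text):
    """Naive approach: one substring search per pattern."""
    return any(p in text for p in patterns)


def build_automaton(patterns):
    """Build the trie, failure links, and match flags for Aho-Corasick."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # fail[state] -> state to fall back to on a mismatch
    match = [False]  # match[state] -> True if some pattern ends here
    for pattern in patterns:
        state = 0
        for ch in pattern:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                match.append(False)
            state = nxt
        match[state] = True
    # Breadth-first traversal, so shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also matches if the suffix it falls back to matches.
            match[nxt] = match[nxt] or match[fail[nxt]]
            queue.append(nxt)
    return goto, fail, match


def contains_any(automaton, text):
    """Return True if any pattern occurs in text, scanning text once."""
    goto, fail, match = automaton
    if match[0]:  # an empty pattern matches everything
        return True
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if match[state]:
            return True
    return False


patterns = ["he", "she", "his", "hers"]
automaton = build_automaton(patterns)
print(contains_any(automaton, "ushers"), contains_any_naive(patterns, "ushers"))
```

In pure Python the automaton is unlikely to beat the naive version, since `in` runs in optimized C. That is exactly why this kind of algorithm is a good candidate for a compiled extension.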

If there’s an existing implementation in Rust and it seems robust and well-maintained, consider wrapping it.

If you can only find an existing library in C or C++, or if none exist that you can find, we’re on to the next scenarios.

Scenario #3: Wrapping a C or C++ library

If you are going to wrap a C or C++ library:

  • For C, the easiest option is Cython. You could use the Python C API directly, but that involves a lot of boilerplate.
  • For C++, you can use Cython, but Cython has limited C++ support, and you need to reimplement all the headers using Cython’s syntax. So instead I would suggest pybind11, or the faster nanobind library if you’re on a compiler that can support C++17.

Other alternatives that reduce boilerplate

  • cffi lets you write Python extensions for C that will also run much faster on the alternative PyPy interpreter.
  • Swig lets you generate Python extensions, as well as other programming language extensions at the same time with the same configuration.
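
As a rough point of comparison for how little boilerplate these approaches can need, here is a sketch that calls the same system() function using ctypes from the Python standard library. Note that ctypes is a different tool from cffi and is not one of the options discussed above; it is used here only because it needs no extra installation. The sketch assumes a Unix-like system such as Linux or macOS, and the names `libc` and `my_system` are illustrative.

```python
import ctypes

# Assumption: on Linux and macOS, CDLL(None) gives access to symbols already
# loaded into the current process, which includes the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature: int system(const char *command);
libc.system.argtypes = [ctypes.c_char_p]
libc.system.restype = ctypes.c_int


def my_system(command: bytes) -> int:
    """Run a shell command and return the raw status from system()."""
    return libc.system(command)


status = my_system(b"exit 0")
print(status)
```

The trade-off is that argument and return types are declared by hand at runtime, so a wrong declaration can crash the process rather than fail at compile time.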

Scenario #4: Writing a large pile of code from scratch

Sometimes you can’t use existing libraries, and you’re just going to have to write the code yourself. And you want it to be in a fast language for performance reasons; otherwise you’d just be writing in Python.

In this case you could use C or C++, but then you’re adding more code to the world that is likely to be unsafe and buggy. It’s just too difficult to write correct C or C++.

Cython is another alternative, but eventually you’re just falling back to C/C++, and frankly I wouldn’t want to write large amounts of code in Cython. Since it compiles to C/C++, you have two phases of compilation, and the errors from Cython can be quite difficult to debug. And any code you write in Cython is heavily tied to Python, and can’t be reused in other contexts.

So if you do need to write a significant amount of code, I strongly recommend using Rust. There will be a learning curve, but the resulting combination of safety, expressiveness, and performance is well worth it.

Choosing the right tool

By now you’ve no doubt gotten the message that I really love Rust. But even so, notice that in two of the scenarios above I recommend other alternatives.

Different situations require different approaches. So don’t just pick the tool you’re most familiar with, though you should definitely take familiarity into account in your decision! You should also spend a little time thinking about long-term maintainability, about security, and about whether there are ways you can reuse existing code.