Automating The Dark Arts

Packaging and Distributing cppyy-generated Python Bindings for C++ Projects with CMake and Setuptools

TL;DR

I rewrote the cppyy CMake modules to be much more user friendly and to work using only Anaconda/PyPI packages, and to generate more feature-complete and customizable Python packages using CMake’s configure_file, while also supporting distribution of cppyy pythonization functions. I then rewrote the existing k-nearest-neighbors example project to use my new system, and wrote bindings generation for bbhash as an example with a real library. Finally, I wrote a recipe for cookiecutter to generate project templates for CMake-built cppyy bindings.

A Bit About Bindings

I’ve been writing Python bindings for C++ code for a number of years now. My first experiences were with raw CPython for our lab’s khmer/oxli project; if you’ve ever done this before, you’re aware that it’s laborious, filled with boilerplate and incantations which are required but easily overlooked. After a while, we were able to convince Titus that the maintenance burden and code bloat would be much lessened by switching to Cython for bindings generation. After a major refactor, most of the old raw CPython was excised.

Cython, however, while powerful, is much more suited for interacting with C than C++. Its support for templates is limited (C++11 features are only marginally supported, let alone newer features like variadic templates); it often fails with more than a few overloads; and there’s no built-in mechanism for generating Python classes from templated code. Furthermore, one has to redefine the C/C++ interface in cdef extern blocks in Cython .pxd files, before dealing with many issues converting types and dealing with encodings. In short, Cython obviates the need for a lot of the boilerplate required with raw CPython, while requiring its own boilerplate and keeping three distinct layers of code synchronized.

My own research codebase makes considerably more use of C++11 and beyond template features than the khmer/oxli codebase, but I still wrap it in Python for easier high-level scripting and, in particular, testing. I’ve been working around Cython’s template limitations with a jury-rigged solution: Cython files defined with Jinja2 templates and a hacknied template type substitution mechanism, all hooked together with my own build system written in pydoit with its own pile of half-assed code for deducing Cython compilation dependencies, a feature which setuptools for Cython currently lacks. This system has worked surprisingly well, regardless of my labmates’ looks of horror when I describe it, but its obviously brittle. Luckily, there is a better way…

Automatic Bindings Generation

While discussing this eldritch horror of a build system, my labmate Luiz mentioned a project he’d heard of in passing called cppyy. I began looking into it, and then tested it out on my project. I quickly came to the conclusion that it’s the darkest bit of code art I’ve ever laid eyes on, and I can only assume that Wim Lavrijsen and coauthors spent many a long night around a demon-summoning circle to bring it into existence. Regardless, when it comes to projects that so nicely solve my problems, I can work around Azazel being a core dev.

cppyy is built around the cling interactive C++ interpreter, which itself grew out of the ROOT project at CERN. It uses Clang/LLVM to generate introspection data for C++ code, and then generates and JITs bindings for use in Python. A demo and some more explanation is available on the Python Wiki, though I believe the cppyy docs and their notebook tutorial are more up to date.

cppyy does, to put it mildly, some damn cool shit. A few examples are:

And remember that this happens automatically. Awesome!

Meta

This post isn’t meant to be a complete tutorial on cppyy. For that, you should look through the documentation and tutorial. Rather, I’m aiming to head off problems that I ran into along the way, and then provide some solutions. So, read on!

Getting it Working

Now, with much love to our colleagues at CERN and elsewhere, my experience interacting with code written by physicists, or even touched by physicists, has been checkered at best. cppyy suffers from some of that familiar lack of tutorials and documentation, but is greatly served by an extremely responsive developer who also happens to seem like a nice person. wlav was very helpful during my experimentation, and I thank them for that.

With that said, I’m going to go through some of the problems I ran into. Ultimately, I decided that if I was going to solve those problems for me, I ought to solve them for everybody, hence this post and the associated software.

To start testing, I immediately began trying to run the associated tools on my own code to see if I could get a few classes working. First, I would need to install cppyy, which turns out to be quite simple. Unfortunately, there is not a recent package on conda-forge, but PyPI is up-to-date, and you can install with pip install cppyy. This will build cppyy’s modified libcling.

Dependencies

The documentation suggested running the code through genreflex, which would require an interface file #includeing my headers and explicitly declaring any template specializations I would need. genreflex runs rootcling with a bunch of preconfigured options, which ends up calling into clang and hence needs properly configured includes and library paths. It’s likely it will fail at first, due to being unable to find libclang.so or some variant; this can be solved by a sudo apt install clang-7 libclang-7-dev, or whatever the equivalent for your distribution. You can then go on to run genreflex and then compile with your system compilers as described in the docs.

This is fine for mucking about on your own computer, but ultimately, with modern scientific software, it’s desirable to get this working in a conda environment (and for my purposes, bioconda). This required a lot of trial and error, but ultimately, the necessary minimal invocation for the rest of this post is:

conda create -n cppyy-example python=3 cxx-compiler c-compiler clangdev libcxx libstdcxx-ng libgcc-ng cmake
conda activate cppyy-example 
pip install cppyy clang

The cxx-compiler and c-compiler packages bring in the conda-configured gcc and g++ binaries, and libcxx, libstdcxx-ng, and libgcc-ng bring in the standard libraries. Finally, clangdev brings in libclang.so, which is needed down the line, as is the Python clang package. Then we install cppyy from PyPI with pip, which will build its own cling and whatnot.

At this point, you should be able to generate and build bindings, and then load the resulting dictionaries, as described.

Build Systems

Now that I was able to get things working, I wanted to get it fit into a proper build system. I had already decided to convert my project to CMake, and cppyy happens to include its own cmake modules for automating the process. Unfortunately, I rather quickly ran into issues here:

Results

A First Pass: bbhash

So, I went about fixing these things. I took a step away from my rather more complex project, and aimed at wrapping a smaller library (which happens to be one of my dependencies), bbhash, which provides minimal perfect hash functions. This also gave me the chance to learn a lot more about CMake. The results can be found on my github. Essentially, this implementation solves the problems listed above: it uses genreflex and a selection XML; it properly finds and utilizes all the conda compilers and libraries; it allows for packaging pythonizations with a discovery mechanism similar to pytest and other such packages; and it uses templates for the generation of the necessary Python package files. The end result also provides installation targets for the both the underlying C++ library and the resulting bindings. Finally, it’s portable enough that I even have it running and continuous integration.

The Second pass: knn-nearest-neighbors-example

The bbhash example above, while more complex than the existing toy example, does not use a dynamically linked library: rather, the underling header-only library is stuck into a static library and bundled directly into the bindings’ shared library. I figured I should also apply my work to the existing example, and at the same time fix the previously mentioned issue, so I went ahead and used the same project structure from cppyy-bbhash for a new knn-example. This time, a shared library is created for the underlying C++, and dynamically linked to the bindings library.

The Third pass: a cookiecutter template

Now that I’d worked out most of the kinks, I figured I ought to make usage a bit easier. So, I created a cookiecutter recipe that will sketch out a basic project structure with my CMake modules and packaging templates. This is a work in progress, but is sufficient to reproduce the previous two repositories.

fin. Well, not really.

I plan to continue work on improving the cookiecutter template and ironing out any more kinks. And of course, I now have to finish applying this work to my own project, as I set out to do in the first place! Finally, I plan on working up a conda recipe demonstrating distribution, so that hopefully, one will soon be able to do a simple conda install mybindings. Look for a future post on that front :)

If you got this far, thanks for your patience and happy hacking!