<center>

Implementing C++ Virtual Functions in Cython
===

*Written by [JDC](https://twitter.com/jdcaballerov). Originally published 2021-02-02 on the [Monadical blog](https://monadical.com/blog.html).*

</center>

___

*TL;DR: This is an extensive article that assumes Cython knowledge and describes two strategies for using C++ code from Python when that code requires implementing the virtual functions of C++ abstract classes. Elements of these solutions have been discussed before, but they are scattered through forums, GitHub issues, and Stack Overflow. The first strategy implements C++ wrapper classes and then wraps them with Cython classes. The second strategy allows us to write the virtual functions for the abstract classes in Cython/Python.*

## Introduction

In [an earlier post](https://monadical.com/posts/knowledge-in-box.html), we described a project we did for [Kiwix](https://www.kiwix.org/en/about/), which we had a lot of fun working on. Kiwix offers an awesome service--an offline reader that makes a huge amount of online content available to people who have no or limited internet access. [More than half of the world’s population](https://www.itu.int/dms_pub/itu-s/opb/pol/S-POL-BROADBAND.18-2017-PDF-E.pdf) is in this position, due to infrastructure issues, censorship, or affordability. That’s 4 billion people without access to powerful resources like Wikipedia and YouTube that the rest of us take for granted. Kiwix scrapes sites like these in their entirety and stores them using a library called [‘libzim’](https://github.com/openzim/libzim). Libzim packages the content into a [ZIM file](https://wiki.openzim.org/wiki/ZIM_file_format), which can be easily saved to a phone, computer, or USB. Users can then browse websites as if they were online--all for free!

![kiwix logo](https://docs.monadical.com/uploads/upload_6049f777464802e30b5c4b226bcc5798.png)

Now, libzim is a C++ library and Kiwix’s content scraper is written in Python.
Kiwix was originally copying all the scraped content to disk (the local file system) and then using another tool to bundle it with libzim. But this was pretty slow and took up too much disk space. You have to imagine the sheer magnitude of information Kiwix is dealing with here. Wikipedia on its own is vast--never mind when you add to that [YouTube](https://www.youtube.com/), [Project Gutenberg](https://www.gutenberg.org/), [Stack Exchange](https://stackexchange.com/), [Codecademy](https://www.codecademy.com/), [TED](https://www.ted.com/)...! And we’re not just talking about a one-shot process: Kiwix has to regularly download, organize, and bundle content to keep up with changes to these resources. The time had come to optimize this process.

![zim file](https://docs.monadical.com/uploads/upload_a6d949cbffbae82d7a340e6c7c3f0594.png)

This is where Monadical came in: our job was to develop Python bindings for libzim, removing the intermediate disk step and speeding up the communication between libzim and the scraper. If you want a higher-level overview of this project, you can check out [How to Fit All Human Knowledge in a Box](https://monadical.com/posts/knowledge-in-box.html). In this post, I want to explain the technical details of our solution and how we were able to bind Python with libzim by implementing virtual functions in Cython.

### The Challenge

After considering [different binding alternatives](https://realpython.com/python-bindings-overview/), we decided that [Cython](https://github.com/cython/cython) was the best fit because of its documentation, its active community, and the projects that use it, such as [numpy](https://cython.readthedocs.io/en/latest/src/tutorial/numpy.html).
For each scraped piece of content (bytes), i.e., web pages, images, videos, JavaScript, and other resources, we needed to construct an article and call the `addArticle` [function](https://github.com/openzim/libzim/blob/7716d8545bf26f7c3cc381affdcf8d2ca63e1768/include/zim/writer/creator.h#L48) on the `zim::writer::Creator` [class](https://github.com/openzim/libzim/blob/master/include/zim/writer/creator.h) within the library. This function has the following prototype:

```cpp
void addArticle(std::shared_ptr<Article> article);
```

However, this is how the class `zim::writer::Article` of the `article` parameter is [defined](https://github.com/openzim/libzim/blob/master/include/zim/writer/article.h):

```cpp
class Article
{
  public:
    virtual Url getUrl() const = 0;
    virtual std::string getTitle() const = 0;
    virtual bool isRedirect() const = 0;
    virtual bool isLinktarget() const;
    virtual bool isDeleted() const;
    virtual std::string getMimeType() const = 0;
    virtual bool shouldCompress() const = 0;
    virtual bool shouldIndex() const = 0;
    virtual Url getRedirectUrl() const = 0;
    virtual zim::size_type getSize() const = 0;
    virtual Blob getData() const = 0;
    virtual std::string getFilename() const = 0;
    virtual ~Article() = default;

    // returns the next category id, to which the article is assigned to
    virtual std::string getNextCategory();
};
```

Notice that the `Article` class is composed mostly of [pure virtual functions](https://www.geeksforgeeks.org/pure-virtual-functions-and-abstract-classes/) (the ones marked `= 0`) and is therefore an abstract class. That means we needed to provide `Article` objects that implement the functions of the interface, so that `libzim` is able to call those functions on the objects, e.g., to obtain the contents, title, etc. when writing the content to disk.
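
To make the abstract-class constraint concrete, here is a rough Python analogy using the standard `abc` module. It is only a sketch: the class names `Article` and `HtmlArticle`, the method names, and `write_article` are hypothetical stand-ins and not part of libzim. The abstract methods play the role of the `= 0` functions, and `write_article` plays the role of the library, which only ever talks to the interface.

```python
from abc import ABC, abstractmethod


class Article(ABC):
    """Hypothetical Python stand-in for the abstract zim::writer::Article."""

    @abstractmethod
    def get_title(self) -> str:
        """Counterpart of a pure virtual `getTitle() const = 0`."""

    @abstractmethod
    def get_data(self) -> bytes:
        """Counterpart of a pure virtual `getData() const = 0`."""

    def is_deleted(self) -> bool:
        """Counterpart of a plain virtual function that has a default body."""
        return False


class HtmlArticle(Article):
    """Concrete subclass: it supplies every abstract method."""

    def __init__(self, title: str, html: str) -> None:
        self._title = title
        self._html = html

    def get_title(self) -> str:
        return self._title

    def get_data(self) -> bytes:
        return self._html.encode("UTF-8")


def write_article(article: Article) -> str:
    """Plays the role of the library: it only calls the interface."""
    return f"{article.get_title()}: {len(article.get_data())} bytes"


try:
    # C++ rejects instantiating an abstract class at compile time;
    # Python raises TypeError at run time instead.
    Article()
    abstract_is_instantiable = True
except TypeError:
    abstract_is_instantiable = False

summary = write_article(HtmlArticle("Monadical", "<h1>Hello</h1>"))
```

The difference that matters for the rest of this article is that libzim expects a real C++ subclass, so a Python class like `HtmlArticle` cannot be handed to it directly.
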
The first strategy we considered was to implement the `Article` class in a `C++` wrapper that keeps its data members in C++ Land, wrap that `C++` wrapper class with a Cython class, and pass the contents in from Python/Cython to obtain a property-based pythonic API. Let's see how that works.

## Part 1: Basic Cython Wrapper

To start with, let's create a minimal `setup.py` file for our Cython wrapper.

`setup.py`

```python
import os

# setuptools replaces the deprecated distutils and understands
# long_description_content_type
from setuptools import setup, Extension
from Cython.Build import cythonize


def read(fname):
    with open(os.path.join(os.path.dirname(__file__), fname)) as f:
        return f.read()


setup(
    name="python-libzim",
    version="0.0.1",
    author="Monadical Inc - Juan Diego Caballero",
    author_email="jdc@monadical.com",
    description=("A python-facing API for creating and interacting with ZIM files"),
    license="GPLv3",
    long_description=read('README.md'),
    long_description_content_type='text/markdown',
    ext_modules=cythonize(
        [
            Extension(
                "libzim",
                ["libzim/*.pyx", "libzim/wrappers.cpp"],
                libraries=["zim"],
                extra_compile_args=['-std=c++11'],
                language="c++",
            ),
        ],
        compiler_directives={'language_level': "3"},
    ),
)
```

### A Simple `C++` Wrapper

To implement the abstract class's virtual functions, let's define a `zim::writer::Article` wrapper named `ArticleWrapper` that will implement the required functions:

`libzim/wrappers.cpp`

```cpp
#include <string>
#include <iostream>
#include <zim/zim.h>
#include <zim/article.h>
#include <zim/blob.h>
#include <zim/writer/article.h>
#include <zim/file.h>
#include <zim/search.h>
#include <zim/writer/creator.h>

class ArticleWrapper : public zim::writer::Article
{
  public:
    virtual ~ArticleWrapper() = default;

    ArticleWrapper(char ns, // namespace
                   std::string url,
                   std::string title,
                   std::string mimeType,
                   std::string redirectUrl,
                   bool _shouldIndex,
                   std::string content) : ns(ns),
                                          url(url),
                                          title(title),
                                          mimeType(mimeType),
                                          redirectUrl(redirectUrl),
                                          _shouldIndex(_shouldIndex),
                                          content(content)
    {
    }

    char ns;
    std::string url;
    std::string title;
    std::string mimeType;
    std::string redirectUrl;
    bool _shouldIndex;
    std::string content;
    std::string fileName;

    // Virtual Member functions implementations
    // ....
    virtual std::string getTitle() const
    {
        return title;
    }
    // ....
    virtual std::string getMimeType() const
    {
        return mimeType;
    }
};
```

This simplified implementation defines functions over its data members as follows:

```cpp
    virtual std::string getTitle() const
    {
        return title;
    }

    virtual zim::Blob getData() const
    {
        return zim::Blob(&content[0], content.size());
    }
```

To enable a `Creator`'s `addArticle` function to accept our newly defined `ArticleWrapper` class and to set some defaults, we subclass the original `zim::writer::Creator` as `OverriddenZimCreator` and wrap it in `CreatorWrapper` as follows:

`libzim/wrappers.cpp` (continued)

```cpp
class OverriddenZimCreator : public zim::writer::Creator
{
  public:
    OverriddenZimCreator(std::string mainPage)
        : zim::writer::Creator(true),
          mainPage(mainPage) {}

    virtual zim::writer::Url getMainUrl()
    {
        return zim::writer::Url('A', mainPage);
    }

    void setMainUrl(std::string newUrl)
    {
        mainPage = newUrl;
    }

    std::string mainPage;
};

class CreatorWrapper
{
  public:
    CreatorWrapper(OverriddenZimCreator *creator) : _creator(creator)
    {
    }

    ~CreatorWrapper()
    {
        delete _creator;
    }

    static CreatorWrapper *create(std::string fileName, std::string mainPage, std::string fullTextIndexLanguage, int minChunkSize)
    {
        bool shouldIndex = !fullTextIndexLanguage.empty();

        OverriddenZimCreator *c = new OverriddenZimCreator(mainPage);
        c->setIndexing(shouldIndex, fullTextIndexLanguage);
        c->setMinChunkSize(minChunkSize);
        c->startZimCreation(fileName);
        return (new CreatorWrapper(c));
    }

    void addArticle(std::shared_ptr<ArticleWrapper> article)
    {
        _creator->addArticle(article);
    }

    void setMainUrl(std::string newUrl)
    {
        _creator->setMainUrl(newUrl);
    }

    zim::writer::Url getMainUrl()
    {
        return _creator->getMainUrl();
    }

    OverriddenZimCreator *_creator;
};
```

Up to this point, we have normal `C++` wrapping code.
We haven't created any Cython-related code other than the minimal `setup.py`.

### Defining the interface

To use the wrapper's and library's code from Cython/Python, we need to let Cython know about the interface. We use the `*.pxd` files for this. Let's describe all the functions and data types to be used or called from our Cython/Python code:

`zim_wrapper.pxd`

```python
from libcpp.string cimport string
from libc.stdint cimport uint32_t, uint64_t
from libcpp cimport bool
from libcpp.memory cimport shared_ptr, unique_ptr


cdef extern from "zim/zim.h" namespace "zim":
    ctypedef uint32_t size_type
    ctypedef uint64_t offset_type


cdef extern from "zim/blob.h" namespace "zim":
    cdef cppclass Blob:
        char* data() except +
        char* end() except +
        int size() except +


cdef extern from "zim/writer/url.h" namespace "zim::writer":
    cdef cppclass Url:
        string getLongUrl() except +


cdef extern from "wrappers.cpp":
    cdef cppclass ArticleWrapper:
        ArticleWrapper(char ns,
                       string url,
                       string title,
                       string mimeType,
                       string redirectUrl,
                       bool _shouldIndex,
                       string content) except +
        string getTitle() except +
        const Blob getData() except +
        string getMimeType() except +
        bool isRedirect() except +
        Url getUrl() except +
        Url getRedirectUrl() except +
        char ns
        string url
        string title
        string mimeType
        string redirectUrl
        string content

    cdef cppclass CreatorWrapper:
        @staticmethod
        CreatorWrapper *create(string fileName, string mainPage, string fullTextIndexLanguage, int minChunkSize) except +
        void addArticle(shared_ptr[ArticleWrapper] article) except +
        Url getMainUrl() except +
        void setMainUrl(string) except +
```

### The Cython Wrapper

With the `C++` wrappers defined in `wrappers.cpp` and a description of the interface that Cython can understand in `zim_wrapper.pxd`, let's write the Cython/Python code that will use and call the C++ functions and data. Cython requires that we write `*.pyx` files that allow us to combine Cython/Python code with `C++`.
The strategy for getting a pythonic API is to wrap the `C++` data types and class functions with the Cython classes `cdef class ZimArticle` and `cdef class ZimCreator`. Each class holds an "internal" pointer to an object of its corresponding `C++` class (either `ArticleWrapper` or `CreatorWrapper`) and wraps the functions and data types. Let's start by defining Cython's `ZimArticle` and a `__cinit__` to set its "internal" pointer:

`pyzim.pyx`

```python
# This imports all the definitions from zim_wrapper.pxd (ArticleWrapper, etc)
cimport zim_wrapper as zim


cdef class ZimArticle:
    # This is an "internal" pointer to hold a C++ ArticleWrapper
    cdef zim.ArticleWrapper *c_zim_article

    def __cinit__(self, url="", content="", namespace="A", mimetype="text/html",
                  title="", redirect_article_url="", should_index=True):
        # Creates a new ArticleWrapper object
        c_zim_art = new zim.ArticleWrapper(
            ord(namespace),                        # Namespace
            url.encode('UTF-8'),                   # url
            title.encode('UTF-8'),                 # title
            mimetype.encode('UTF-8'),              # mimeType
            redirect_article_url.encode('UTF-8'),  # redirectUrl
            should_index,                          # shouldIndex
            content.encode('UTF-8'))               # content (std::string needs bytes)
        self.c_zim_article = c_zim_art

    def __dealloc__(self):
        if self.c_zim_article != NULL:
            del self.c_zim_article
```

The code is straightforward: it creates a new `C++` `ArticleWrapper` object, stores it in the class's "internal" pointer `c_zim_article`, and takes its input from Python via the `__cinit__` constructor.

Now let's write Python wrappers for `ArticleWrapper` data members. Fortunately, Cython has a neat feature that enables a very pythonic property-based API.
Notice that when we wrote `zim_wrapper.pxd` above, we also included `ArticleWrapper`'s public data members (e.g., `char ns`, `string title`, etc.):

```python
cdef extern from "wrappers.cpp":
    cdef cppclass ArticleWrapper:
        ArticleWrapper(char ns,
                       string url,
                       string title,
                       string mimeType,
                       string redirectUrl,
                       bool _shouldIndex,
                       string content) except +
        string getTitle() except +
        const Blob getData() except +
        string getMimeType() except +
        bool isRedirect() except +
        Url getUrl() except +
        Url getRedirectUrl() except +
        char ns
        string url
        string title
        string mimeType
        string redirectUrl
        string content
```

This allows us to use very simple accessor methods (setters, getters) on the Python/Cython side. We can access the public members with a simple dot operator:

```python
    @property
    def title(self):
        """Get the article's title"""
        return self.c_zim_article.title.decode('UTF-8')

    @title.setter
    def title(self, new_title):
        """Set the article's title"""
        self.c_zim_article.title = new_title.encode('UTF-8')
```

We follow a similar strategy for wrapping the C++ `CreatorWrapper` in Cython. However, notice that the Cython wrapper for `add_article` accepts Cython `ZimArticle` objects. This gives callers a pythonic API, while the function itself takes care of the internal details of dereferencing the internal pointer and creating a shared pointer, as follows:

```python
    def add_article(self, ZimArticle article):
        """Add a ZimArticle to the Creator object.

        Parameters
        ----------
        article : ZimArticle
            The article to add to the file
        """
        # Make a shared pointer to ArticleWrapper from the ZimArticle object (dereference internal c_zim_article).
        # make_shared copy-constructs a new ArticleWrapper from the dereferenced object.
        cdef shared_ptr[zim.ArticleWrapper] art = make_shared[zim.ArticleWrapper](dereference(article.c_zim_article))
        self.c_creator.addArticle(art)
```

A complete file with the simplified implementation will look like this:

#### Implementation

`pyzim.pyx`

```python
from libcpp.string cimport string
from libcpp cimport bool
from libcpp.memory cimport shared_ptr, unique_ptr, make_shared

import datetime
import copy
from collections import defaultdict
from cython.operator import dereference, preincrement

cimport zim_wrapper as zim

#########################
#       ZimArticle      #
#########################

cdef class ZimArticle:
    """
    A class to represent a Zim Article.

    Attributes
    ----------
    *c_zim_article : zim.ArticleWrapper
        a pointer to the C++ article object

    Properties
    -----------
    namespace : str
        the article namespace
    title : str
        the article title
    content : str
        the article content
    longurl : str
        the article long url i.e {NAMESPACE}/{url}
    url : str
        the article url
    mimetype : str
        the article mimetype
    is_redirect : bool
        flag if the article is a redirect
    redirect_longurl: str
        the long redirect article url i.e {NAMESPACE}/{redirect_url}
    redirect_url : str
        the redirect article url
    """
    cdef zim.ArticleWrapper *c_zim_article

    VALID_NAMESPACES = ["-","A","B","I","J","M","U","V","W","X"]

    def __cinit__(self, url="", content="", namespace="A", mimetype="text/html",
                  title="", redirect_article_url="", should_index=True):
        """Constructs a ZimArticle from parameters.

        Parameters
        ----------
        url : str
            Article url without namespace
        content : str or bytes
            Article content, either str or bytes
        namespace : {"A","-","B","I","J","M","U","V","W","X"}
            Article namespace (the default is A)
        mimetype : str
            Article mimetype (the default is text/html)
        title : str
            Article title
        redirect_article_url : str
            Article redirect if article is a redirect
        should_index : bool
            Flag if article should be indexed (the default is True)
        """

        # Encoding must be set to UTF-8
        #cdef bytes py_bytes = content.encode(encoding='UTF-8')
        #cdef char* c_string = py_bytes

        bytes_content = b''
        if isinstance(content, str):
            bytes_content = content.encode('UTF-8')
        else:
            bytes_content = content

        if namespace not in self.VALID_NAMESPACES:
            raise ValueError("Invalid Namespace")

        c_zim_art = new zim.ArticleWrapper(
            ord(namespace),                        # Namespace
            url.encode('UTF-8'),                   # url
            title.encode('UTF-8'),                 # title
            mimetype.encode('UTF-8'),              # mimeType
            redirect_article_url.encode('UTF-8'),  # redirectUrl
            should_index,                          # shouldIndex
            bytes_content)                         # content
        self.__setup(c_zim_art)

    def __dealloc__(self):
        if self.c_zim_article != NULL:
            del self.c_zim_article

    # The pointer type must be one declared in zim_wrapper.pxd (ArticleWrapper)
    cdef __setup(self, zim.ArticleWrapper *art):
        """Assigns an internal pointer to the wrapped C++ article object.

        A python ZimArticle always maintains a pointer to a wrapped zim.ArticleWrapper C++ object.
        The python object reflects the state, accessible with properties, of a wrapped C++ zim.ArticleWrapper,
        this ensures a valid wrapped article that can be passed to a zim.CreatorWrapper.

        Parameters
        ----------
        *art : zim.ArticleWrapper
            Pointer to a C++ article object
        """
        # Set new internal C zim.ArticleWrapper article
        self.c_zim_article = art

    @property
    def namespace(self):
        """Get the article's namespace"""
        return chr(self.c_zim_article.ns)

    @namespace.setter
    def namespace(self, new_namespace):
        """Set the article's namespace"""
        if new_namespace not in self.VALID_NAMESPACES:
            raise ValueError("Invalid Namespace")
        self.c_zim_article.ns = ord(new_namespace[0])

    @property
    def title(self):
        """Get the article's title"""
        return self.c_zim_article.title.decode('UTF-8')

    @title.setter
    def title(self, new_title):
        """Set the article's title"""
        self.c_zim_article.title = new_title.encode('UTF-8')

    @property
    def content(self):
        """Get the article's content"""
        data = self.c_zim_article.content
        try:
            return data.decode('UTF-8')
        except UnicodeDecodeError:
            return data

    @content.setter
    def content(self, new_content):
        """Set the article's content"""
        if isinstance(new_content, str):
            self.c_zim_article.content = new_content.encode('UTF-8')
        else:
            self.c_zim_article.content = new_content

    @property
    def longurl(self):
        """Get the article's long url i.e {NAMESPACE}/{url}"""
        return self.c_zim_article.getUrl().getLongUrl().decode("UTF-8", "strict")

    @property
    def url(self):
        """Get the article's url"""
        return self.c_zim_article.url.decode('UTF-8')

    @url.setter
    def url(self, new_url):
        """Set the article's url"""
        self.c_zim_article.url = new_url.encode('UTF-8')

    @property
    def redirect_longurl(self):
        """Get the article's redirect long url i.e {NAMESPACE}/{redirect_url}"""
        return self.c_zim_article.getRedirectUrl().getLongUrl().decode("UTF-8", "strict")

    @property
    def redirect_url(self):
        """Get the article's redirect url"""
        return self.c_zim_article.redirectUrl.decode('UTF-8')

    @redirect_url.setter
    def redirect_url(self, new_redirect_url):
        """Set the article's redirect url"""
        self.c_zim_article.redirectUrl = new_redirect_url.encode('UTF-8')

    @property
    def mimetype(self):
        """Get the article's
        mimetype"""
        return self.c_zim_article.mimeType.decode('UTF-8')

    @mimetype.setter
    def mimetype(self, new_mimetype):
        """Set the article's mimetype"""
        self.c_zim_article.mimeType = new_mimetype.encode('UTF-8')

    @property
    def is_redirect(self):
        """Get if the article is a redirect"""
        return self.c_zim_article.isRedirect()

    def __repr__(self):
        return f"{self.__class__.__name__}(url={self.longurl}, title={self.title})"


#########################
#       ZimCreator      #
#########################

cdef class ZimCreator:
    """
    A class to represent a Zim Creator.

    Attributes
    ----------
    *c_creator : zim.CreatorWrapper
        a pointer to the C++ Creator object
    _filename : str
        zim filename
    """

    cdef zim.CreatorWrapper *c_creator
    cdef object _filename
    cdef object _main_page
    cdef object _index_language
    cdef object _min_chunk_size

    def __cinit__(self, str filename, str main_page = "", str index_language = "eng", min_chunk_size = 2048):
        """Constructs a ZimCreator from parameters.

        Parameters
        ----------
        filename : str
            Zim file path
        main_page : str
            Zim file main_page
        index_language : str
            Zim file index language (default eng)
        min_chunk_size : int
            Minimum chunk size (default 2048)
        """

        self.c_creator = zim.CreatorWrapper.create(filename.encode("UTF-8"), main_page.encode("UTF-8"), index_language.encode("UTF-8"), min_chunk_size)
        self._filename = filename
        self._main_page = self.c_creator.getMainUrl().getLongUrl().decode("UTF-8", "strict")
        self._index_language = index_language
        self._min_chunk_size = min_chunk_size

    @property
    def filename(self):
        """Get the filename of the ZimCreator object"""
        return self._filename

    @property
    def main_page(self):
        """Get the main page of the ZimCreator object"""
        return self.c_creator.getMainUrl().getLongUrl().decode("UTF-8", "strict")[2:]

    @main_page.setter
    def main_page(self, new_url):
        """Set the main page of the ZimCreator object"""
        # Check if url longformat is used (length check avoids an IndexError on short urls)
        if len(new_url) > 1 and new_url[1] == '/':
            raise ValueError("Url should not include a namespace")

        self.c_creator.setMainUrl(new_url.encode('UTF-8'))

    @property
    def index_language(self):
        """Get the index language of the ZimCreator object"""
        return self._index_language

    @property
    def min_chunk_size(self):
        """Get the minimum chunk size of the ZimCreator object"""
        return self._min_chunk_size

    def add_article(self, ZimArticle article):
        """Add a ZimArticle to the Creator object.

        Parameters
        ----------
        article : ZimArticle
            The article to add to the file
        """
        # Make a shared pointer to ArticleWrapper from the ZimArticle object (dereference internal c_zim_article)
        cdef shared_ptr[zim.ArticleWrapper] art = make_shared[zim.ArticleWrapper](dereference(article.c_zim_article))
        self.c_creator.addArticle(art)

    def __repr__(self):
        return f"{self.__class__.__name__}(filename={self.filename})"
```

### Compiling the Extension

To compile our extension, we run the following command:

```bash
python3 setup.py build_ext -i
```

### A Pythonic API

This implementation using the `ZimArticle` wrapper enables a very pythonic property-based API.

`example.py`

```python
import libzim

test_content = '''<!DOCTYPE html>
<html class="client-js">
<head><meta charset="UTF-8">
<title>Monadical</title>
</head>
<h1> ñññ Hello, it works ñññ </h1></html>'''

# Create a filled article
article = libzim.ZimArticle(namespace="A", url="Monadical", title="Monadical", content=test_content, should_index=True)
print(article.longurl)
print(article.url)

# Create an empty article then fill it
article2 = libzim.ZimArticle()
article2.content = test_content
article2.url = "Monadical_SAS"
article2.title = "Monadical SAS"
```

Using the creator is also straightforward:

```python
# Write the articles
zim_creator = libzim.ZimCreator('test.zim', main_page="welcome", index_language="eng", min_chunk_size=2048)

# Add article to zim file
zim_creator.add_article(article)
zim_creator.add_article(article2)
```

### Summary of the solution using the first strategy

![](https://docs.monadical.com/uploads/upload_bd5d4032c302a212cdd36955504e9d88.png)

In this solution, the article wrapper
class `ArticleWrapper` in C++ Land implements the virtual functions of the abstract class `zim::writer::Article` (e.g., `getData()`, `getTitle()`) and declares public data members (e.g., `title`, `content`). A Cython wrapper class `ZimArticle` holds a pointer to a new object of the C++ `ArticleWrapper` class, and uses its constructor and Python properties as accessor functions to fill and read that object's public data members.

## Part 2: Writing virtual functions in Cython

The first strategy produces a neat pythonic API. However, it might involve holding a huge amount of content in memory and might not be appropriate for every use case. Let’s say that, instead of getting the article content from a short Python string, we’re getting it from a video stream reader. With the first solution, we would need to hold all the content in memory (`std::string content`) until it could be written to disk by `libzim`.

Think of this strategy as working a bit like service in a restaurant: the kitchen has to prepare and store the whole table’s order before it can go out. This is fine if you’re dealing with tables of four, but if a hundred customers walk in (i.e. someone wants to download Wikipedia), you’re going to need a different approach.

Wouldn’t it be nice if we could implement the `getTitle()` function--and the others required by the `libzim` interface--directly in Python, so that the data is lazy loaded by `libzim` when needed? In this case, the data would be siphoned from the reader to the disk by libzim. This was the second strategy we considered. Let’s try it out.

Cython not only allows us to call C code from Python, it also allows us [to make declarations from a Cython module available for use by external C code](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#using-cython-declarations-from-c), thus exposing a public API.
To implement this strategy, we will declare a public Cython API that receives a pointer to a Python object and the name of a function to call on that object. The return value from Python Land will be passed to C++ via the public API.

![](https://docs.monadical.com/uploads/upload_8e468898c543f7410bdea5b7aadfea45.png)

### Getting an article title in C++: A sample execution journey

To fully understand this strategy, let’s look at how an article title finally becomes available in C++ Land.

As you may notice in the diagram above, our C++ `ArticleWrapper` class no longer declares public data members but holds a pointer to a Python object: `PyObject *obj`.

When a Cython class `ZimArticle` is constructed, a new `ArticleWrapper` is created and initialized with a pointer to self (the current `ZimArticle`). This makes a pointer to the Cython `ZimArticle` available in the C++ wrapper object.

The next step is using a public API, exposing a Python/Cython function `cy_call_fct` callable from C++ Land that takes as arguments a pointer to a Python/Cython object (a `ZimArticle`) and the name of a function to call on that object. The result is returned to the caller in C++ Land.

Then, we declare member functions in C++ Land (e.g., `getTitle()`) that use the public API to obtain the data from Python/Cython Land.

Let’s follow the call from C++ Land inside libzim. When, deep inside `libzim`, the function `getTitle()` is called on an `ArticleWrapper` object (a subclass of `zim::writer::Article`), this implementation returns whatever it obtains from calling `cy_call_fct(obj, "get_title")`. This is the public API function exposed from Python/Cython Land; it returns the result of evaluating the function `get_title` on the object `obj`, which is the `ZimArticle`. This way, we end up with the string `"Hello"` lazily loaded and available in C++ Land.

What we have constructed is the equivalent of implementing C++ virtual functions in Python/Cython.
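
Before turning to the real Cython and C++ code, the journey above can be modelled in plain Python. This is only a sketch under stated assumptions: `ArticleWrapperModel` and `StreamedArticle` are hypothetical names, and the wrapper here is an ordinary Python class rather than a C++ object. It illustrates two points: the wrapper stores only a reference to the Python object, and the content is produced only when the "library" side asks for it.

```python
class ArticleWrapperModel:
    """Hypothetical pure-Python model of the C++ wrapper.

    It stores no article data, only a reference to the Python object
    (the role played by `PyObject *obj` in the text).
    """

    def __init__(self, py_obj):
        self._obj = py_obj

    def _call(self, method_name):
        # Models the public API call: look the method up by name and run it.
        return getattr(self._obj, method_name)()

    def getTitle(self):
        return self._call("get_title")

    def getData(self):
        return self._call("get_data")


class StreamedArticle:
    """Python-side article whose content is produced only on demand."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.reads = 0  # counts how often the content was actually built

    def get_title(self):
        return "Hello"

    def get_data(self):
        self.reads += 1
        return b"".join(self._chunks)


article = StreamedArticle([b"<h1>", b"Hello", b"</h1>"])
wrapper = ArticleWrapperModel(article)

reads_before = article.reads  # nothing has been loaded yet
title = wrapper.getTitle()
data = wrapper.getData()      # content is pulled only when the caller asks
reads_after = article.reads
```

In the real implementation the lookup by name crosses the language boundary, which is why it has to go through a function that C++ can call.
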
![show me the code quote](https://docs.monadical.com/uploads/upload_fcc2f5b4843698a64cdaf7dd740668ad.jpg)

### Implementation

First, let’s declare `ZimArticle`. Its `__init__` constructor creates a `ZimArticleWrapper`, passing a pointer to self (`<cpy_ref.PyObject*>self`) as the initialization argument, and saves a pointer to that wrapper.

`libzim.pyx`

```python3
cimport libzim
cimport cpython.ref as cpy_ref
from cython.operator import dereference
from libcpp.string cimport string


cdef class ZimArticle:
    cdef ZimArticleWrapper* c_article

    def __init__(self):
        self.c_article = new ZimArticleWrapper(<cpy_ref.PyObject*>self)

    def get_url(self):
        raise NotImplementedError

    def get_title(self):
        return "Hello"
```

Now, let’s declare a public API function that returns a string from evaluating a function on Python/Cython objects.

`libzim.pyx`

```python3
import traceback  # needed for traceback.format_exc() below

cdef public api:
    string string_cy_call_fct(object obj, string method, string *error) with gil:
        """Lookup and execute a pure virtual method on ZimArticle returning a string"""
        try:
            func = getattr(obj, method.decode('UTF-8'))
            ret_str = func()
            return ret_str.encode('UTF-8')
        except Exception as e:
            # Report the failure to the C++ caller through the out-parameter
            error[0] = traceback.format_exc().encode('UTF-8')
        return b""
```

With the public API defined, we will be able to call `string_cy_call_fct` from C++ code by including the Cython auto-generated header file `libzim_api.h`.
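
The error convention used by `string_cy_call_fct` (return an empty string and put the traceback in an out-parameter for the caller to check) can be tried out in plain Python. The following is a minimal sketch, not the project's code: `call_returning_string`, `call_or_raise`, and the `Article` class are hypothetical names, and a returned tuple stands in for the `string *error` out-parameter.

```python
import traceback


def call_returning_string(obj, method_name):
    """Pure-Python stand-in for `string_cy_call_fct`: returns (value, error)."""
    try:
        func = getattr(obj, method_name)
        return func().encode("UTF-8"), b""
    except Exception:
        # Same idea as writing the traceback into the `error` out-parameter.
        return b"", traceback.format_exc().encode("UTF-8")


def call_or_raise(obj, method_name):
    """Stand-in for the C++ caller: a non-empty error becomes an exception."""
    value, error = call_returning_string(obj, method_name)
    if error:
        raise RuntimeError(error.decode("UTF-8"))
    return value


class Article:
    def get_title(self):
        return "Hello"

    def get_url(self):
        raise NotImplementedError("get_url is not implemented")


article = Article()
title = call_or_raise(article, "get_title")

try:
    call_or_raise(article, "get_url")
    failure = ""
except RuntimeError as exc:
    failure = str(exc)
```

The point of this design is that a Python exception never has to unwind through C++ frames: it is captured as text on the Python side and raised again as a native error by the caller.
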
Let’s implement the C++ wrapper that uses the Cython public API to obtain the title:

`lib.h`

```cpp
// -*- c++ -*-
#ifndef libzim_LIB_H
#define libzim_LIB_H 1

struct _object;
typedef _object PyObject;

#include <zim/zim.h>
#include <zim/writer/article.h>

#include <string>

class ZimArticleWrapper : public zim::writer::Article
{
public:
    PyObject *m_obj;

    ZimArticleWrapper(PyObject *obj);
    virtual ~ZimArticleWrapper();
    virtual std::string getTitle() const;

private:
    std::string callCythonReturnString(std::string) const;
};

#endif // !libzim_LIB_H
```

`lib.cxx`

```cpp
#include <Python.h>
#include "lib.h"

// THE FILE BELOW IS AUTOGENERATED BY CYTHON AND DECLARES BOTH import_libzim AND string_cy_call_fct
#include "libzim_api.h"

#include <cstdlib>
#include <iostream>
#include <stdexcept> // std::runtime_error

/*
#########################
#       ZimArticle      #
#########################
*/

ZimArticleWrapper::ZimArticleWrapper(PyObject *obj) : m_obj(obj)
{
    // The import function is named after the Cython module (libzim.pyx here)
    if (import_libzim())
    {
        std::cerr << "Error executing import_libzim!\n";
        throw std::runtime_error("Error executing import_libzim");
    }
    else
    {
        Py_XINCREF(this->m_obj);
    }
}

ZimArticleWrapper::~ZimArticleWrapper()
{
    // Take the GIL before touching the Python reference count
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
    Py_XDECREF(this->m_obj);
    PyGILState_Release(gstate);
}

std::string ZimArticleWrapper::callCythonReturnString(std::string methodName) const
{
    if (!this->m_obj)
        throw std::runtime_error("Python object not set");

    std::string error;

    std::string ret_val = string_cy_call_fct(this->m_obj, methodName, &error);
    if (!error.empty())
        throw std::runtime_error(error);

    return ret_val;
}

std::string ZimArticleWrapper::getTitle() const
{
    return callCythonReturnString("get_title");
}
```

Finally, to use the wrapper from Cython we need to describe the interface:

`libzim.pxd`

```python3
from cpython.ref cimport PyObject
from libcpp.string cimport string


# The base class must be declared so that ZimArticleWrapper can refer to it
cdef extern from "zim/writer/article.h" namespace "zim::writer":
    cdef cppclass Article:
        pass


cdef extern from "lib.h":
    cdef cppclass ZimArticleWrapper(Article):
        ZimArticleWrapper(PyObject *obj) except +
        const string getTitle() except +
```

## Conclusion

This
article presented two strategies for implementing C++ virtual functions with Cython. The first consisted of implementing the functions in C++ and wrapping them with Cython. The required data was passed from Cython/Python and data copies were kept in C++ Land. Although this strategy works in general, and enables a neat pythonic API, we needed an approach that could better accommodate the huge amount of scraped content involved in our project.

The second strategy was a much better fit for us, since it means that there’s no need to hold all that content in memory. It also allows code to be implemented in Python instead of C++, making it more accessible, as well as easier and faster to develop.

The second strategy also means that we can program to an interface and not to an implementation. Unlike Strategy One, which requires intermediate implementations in C++, Strategy Two allows us to provide the implementation directly from Cython.

If you thought that was cool, check out our other projects [here](https://monadical.com/projects.html).

References
====

https://github.com/cython/cython/wiki/enchancements-inherit_CPP_classes

https://stackoverflow.com/questions/10126668/can-i-override-a-c-virtual-function-within-python-with-cython

https://stackoverflow.com/questions/32257889/passing-cython-class-object-as-argument-to-c-function

https://groups.google.com/forum/#!topic/cython-users/vAB9hbLMxRg

---

<center>
<img src="https://monadical.com/static/logo-black.png" style="height: 80px"/><br/>
Monadical.com | Full-Stack Consultancy

*We build software that outlasts us*

</center>


