Some sessions from the Python Language Summit
The Python Language Summit is an annual gathering for the developers of various Python implementations, though, this year, the gathering actually happened via videoconference—as with so many other conferences due to the pandemic. The invite-only gathering typically has numerous interesting sessions, as can be seen in the LWN coverage of the summit from 2015 to 2018, as well as in the 2019 summit coverage on the Python Software Foundation (PSF) blog. Those writeups were penned by A. Jesse Jiryu Davis, who reprised his role for this year's summit. In this article, I will summarize some of the sessions that caught my eye.
Language specification
Mark Shannon shared his thoughts on a more formal definition of the Python language. It would not only help developers of alternative implementations understand the nuances and corner cases of the language, it would also help developers of the CPython reference implementation fully understand that code base. He noted that Java has a language specification and he thinks that Python could benefit from having one as well.
Shannon proposed splitting the specification up into three parts: code loading (parsing, importing, and so on), execution, and the C API. For his presentation, he looked in more detail at the execution specification. For example, he broke down a function call into a series of steps: create a stack frame, move the function arguments from the current frame to the new one, save the instruction pointer, and push the frame onto the stack.
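Those four steps can be illustrated with a toy model. The sketch below is not taken from Shannon's presentation or from CPython; the Frame and ToyMachine classes, their fields, and the idea of a per-frame value stack holding pending arguments are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Toy stack frame; every name here is hypothetical."""
    name: str
    local_vars: dict = field(default_factory=dict)
    value_stack: list = field(default_factory=list)  # pending argument values
    saved_ip: int = 0


class ToyMachine:
    """Illustrative interpreter state, not CPython's real structures."""

    def __init__(self):
        self.frames = [Frame("<module>")]
        self.ip = 0  # instruction pointer of the running frame

    def call(self, name, param_names):
        caller = self.frames[-1]
        # 1. Create a stack frame for the callee.
        callee = Frame(name)
        # 2. Move the arguments from the current frame to the new one.
        split = len(caller.value_stack) - len(param_names)
        args = caller.value_stack[split:]
        del caller.value_stack[split:]
        callee.local_vars.update(zip(param_names, args))
        # 3. Save the instruction pointer so the caller can resume later.
        caller.saved_ip = self.ip
        # 4. Push the frame onto the stack; execution continues in the callee.
        self.frames.append(callee)
        self.ip = 0
        return callee


machine = ToyMachine()
machine.frames[-1].value_stack.extend([1, 2])
machine.ip = 7
frame = machine.call("add", ["x", "y"])
print(frame.local_vars)            # {'x': 1, 'y': 2}
print(machine.frames[0].saved_ip)  # 7
```

Writing each step as a separate statement is the point of the exercise: a specification at this level of detail makes it clear which state each step touches.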
Breaking things down that way will allow developers to rework how certain features are interrelated. The example he gave was that iterators came first, so generators were defined in terms of iterators, even though generators are the lower-level concept. If you were starting from scratch, it would make more sense to specify iterators as being built on generators. In his nascent formal semantics repository, Shannon made a start on defining iterators in terms of generators.
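To give a flavor of what building iterators on generators looks like at the Python level, here is a minimal sketch. It is not the definition from Shannon's repository; the iterate() helper and the Countdown class are hypothetical names used only for illustration.

```python
def iterate(sequence):
    """Hypothetical helper: walk an indexable sequence using a generator."""
    index = 0
    while True:
        try:
            item = sequence[index]
        except IndexError:
            return  # finishing the generator signals exhaustion
        yield item
        index += 1


class Countdown:
    """An iterable whose __iter__ is a generator function.

    No hand-written __next__ method is needed; the generator object
    returned by __iter__ supplies the iterator protocol.
    """

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


print(list(iterate("abc")))  # ['a', 'b', 'c']
print(list(Countdown(3)))    # [3, 2, 1]
```

In both cases the iteration state lives in the suspended generator frame rather than in explicit attributes, which is one way to see generators as the lower-level concept.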
The goal of the work would be to assist in language development, but it will take a fair amount of work to get there.
The audience reaction appears to have been positive, overall, with some amount of confusion as to how to get there—and how exactly the specification would be used once it exists. It remains to be seen if Shannon (or someone else) wants to put in a fairly large chunk of work for an unclear amount of benefit.
HPy
One of the problems that CPython has struggled with over the years is its C API for extensions. On one hand, it has led to a number of high-powered, popular extensions like NumPy, but on the other, the API is too closely tied to CPython internals. The too-close tie not only hampers alternative implementations (e.g. PyPy), but also stands in the way of changes that CPython developers might like to make—efforts to remove the Global Interpreter Lock (GIL), in particular.
That is the backdrop against which Antonio Cuni presented HPy (the "H" is for "handle"), which is a new API that might provide a way forward. HPy came about from a conversation between CPython, PyPy, and Cython developers at last year's EuroPython; it could replace the existing C API with one that is based on handles, rather than direct pointers to CPython objects.
Currently, Python objects in C extensions are PyObject pointers that have their lifetimes managed through reference counts. HPy would instead turn those into HPy types that effectively wrap the underlying PyObject pointers, which decouples the extension from the reference counts. A mapping between HPy objects and PyObject pointers would need to be maintained, but if, say, PyPy wanted to move objects around as part of its garbage-collection strategy, it would simply need to update the map appropriately. Handles need to be explicitly closed, and only once per handle, which is different than the Py_INCREF()/Py_DECREF()-style of reference-count management used today.
There is a debug mode that will help catch multiple calls to HPy_Close(), which should help with porting extensions to the API. To a large extent, HPy "basically works", Cuni said, but there are still lots of things that need to be addressed, including support for custom Python types in extensions.
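The handle idea can be modeled in a few lines of Python. This is only a sketch of the concept, not HPy's actual implementation or API: the HandleTable class and its new(), deref(), relocate(), and close() methods are hypothetical, and the behavior without debug mode (silently ignoring a second close) is an assumption chosen to keep the toy simple.

```python
class HandleTable:
    """Toy model of handles: opaque integers that map to objects."""

    def __init__(self, debug=False):
        self._objects = {}   # handle -> object
        self._next = 1
        self._debug = debug

    def new(self, obj):
        """Open a fresh handle for obj; each handle is closed exactly once."""
        handle = self._next
        self._next += 1
        self._objects[handle] = obj
        return handle

    def deref(self, handle):
        """Look up the object currently behind a handle."""
        return self._objects[handle]

    def relocate(self, old, new):
        """Simulate a moving garbage collector: only the map is updated."""
        for handle, obj in self._objects.items():
            if obj is old:
                self._objects[handle] = new

    def close(self, handle):
        if handle not in self._objects:
            if self._debug:
                raise RuntimeError(f"handle {handle} closed more than once")
            return  # toy behavior: the mistake goes unnoticed without debug
        del self._objects[handle]


table = HandleTable(debug=True)
h = table.new(["payload"])
moved = list(table.deref(h))           # a copy stands in for a moved object
table.relocate(table.deref(h), moved)
print(table.deref(h) is moved)         # True
table.close(h)
try:
    table.close(h)
except RuntimeError as exc:
    print(exc)                         # handle 1 closed more than once
```

The extension code only ever holds the integer h, so the runtime is free to change what sits behind it; that indirection is what decouples extensions from the interpreter's memory management.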
There are plans afoot to port parts of NumPy to HPy; NumPy is something of the "gold standard" in terms of Python C extensions. The PyPy team has done a lot of work to make NumPy work with PyPy; no change to the C API is likely to go far without somehow supporting NumPy. Eventually, Cuni would like to write a Python Enhancement Proposal (PEP) to add HPy to CPython; it would not replace the existing C API, but would coexist with it, at least for a while.
Property-based testing
Handwritten tests do a good job of finding problems and preventing regressions, but they are limited to the types of tests that the developer can think of—bugs that come from unforeseen areas or interactions may well be missed. There are alternatives that try to fill in those gaps, either through exhaustive testing or by fuzzing; using coverage-guided fuzzing can yield even better results. But fuzz testing is generally geared toward finding inputs that cause programs to crash; property-based testing is an alternative for finding logic bugs of various sorts.
Zac Hatfield-Dodds gave a summit presentation on property-based testing; he is one of the leads for the Hypothesis project, which is "a Python library for creating unit tests which are simpler to write and more powerful when run, finding edge cases in your code you wouldn’t have thought to look for". Hatfield-Dodds proposed adding these kinds of tests to the Python standard library.
Instead of looking for a particular mapping from an input to an output, as many unit tests do, a property-based test would describe what properties the function should always maintain: commutative, sorted values, idempotent, and so on. The framework takes those descriptions and alters the inputs to see if it can find places where it breaks.
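The general shape of such a test can be sketched with nothing but the standard library. What follows is a deliberately simplified stand-in, not Hypothesis itself, which is far more capable (it also shrinks failing inputs to minimal examples, for instance); check_property(), random_int_list(), and the property functions are hypothetical names.

```python
import random


def check_property(prop, generate, runs=200, seed=0):
    """Return the first generated input that violates prop, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def random_int_list(rng):
    """Generate a list of 0 to 20 small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]


def sort_is_idempotent(xs):
    # Sorting an already sorted list must not change it.
    return sorted(sorted(xs)) == sorted(xs)


def sort_is_ordered(xs):
    # Every element must be <= its successor in the output.
    out = sorted(xs)
    return all(a <= b for a, b in zip(out, out[1:]))


def elements_are_unique(xs):
    # A deliberately false claim, to show a counterexample being found.
    return len(set(xs)) == len(xs)


print(check_property(sort_is_idempotent, random_int_list))  # None
print(check_property(sort_is_ordered, random_int_list))     # None
print(check_property(elements_are_unique, random_int_list) is not None)
```

The test author states what must always hold and leaves the choice of inputs to the framework, which is how cases nobody thought to write by hand get exercised.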
Hatfield-Dodds suggested that property-based tests be created for CPython, its builtins, the standard library, other implementations like PyPy, and more. Those tests could be run as part of the continuous integration (CI) for CPython and shared with other language implementations. They could also be integrated with coverage-guided fuzzing frameworks, such as for the OSS-Fuzz project.
Several developers in the audience noted that property-based testing looks useful. Łukasz Langa pointed to an effort to use the techniques in Hypothesis on a Python code formatter, which found a lot of bugs. Paul Ganssle used property-based testing for his reimplementation of date.fromisoformat() in the datetime module; it worked well, but those tests were not merged with the new code. A subsequent bug was introduced that would likely have been caught if those tests had been run, so he was strongly behind the idea of adding that kind of testing. It is not clear where things go from here, but the technique seems like a promising addition to the testing arsenal.
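A round-trip property is a natural fit for a parser like date.fromisoformat(). The sketch below is not Ganssle's test suite; it is a minimal standard-library illustration, with random_date() and roundtrips() as hypothetical helper names, of the kind of property such tests might assert.

```python
import random
from datetime import date


def random_date(rng):
    """Pick a date uniformly from the full range that date supports."""
    ordinal = rng.randint(date.min.toordinal(), date.max.toordinal())
    return date.fromordinal(ordinal)


def roundtrips(d):
    # Property: parsing the ISO 8601 string for a date returns that date.
    return date.fromisoformat(d.isoformat()) == d


rng = random.Random(0)
failures = [d for d in (random_date(rng) for _ in range(1000))
            if not roundtrips(d)]
print(failures)  # []
```

A check like this is cheap to keep in a test suite, and a regression in either the formatter or the parser would show up as a non-empty list.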
Mobile Python
Russell Keith-Magee returned to the summit to give a presentation on Python for mobile systems. He presented in 2015 and last year on the status of the long-running project to make CPython available on iOS and Android. He began his presentation this year by noting that the BeeWare project has its tools mostly running on Android now, thanks to a grant from the PSF.
CPython has been running on iOS for a while now, but Android has been problematic until recently. The strategy used to be to compile the Python to Java bytecode, then run that on Android, "but Android devices are now fast enough, and the Android kernel permissive enough, to run CPython itself".
Distribution size is an issue for mobile platforms, however. Each app bundles the entire Python runtime, so making that as small as possible is a priority for the project. There have been some ideas on slimming down CPython by removing much or all of the standard library in order to minimize its size. The idea of a "kernel Python" (which was inspired by a presentation from Amber Brown at the 2019 summit) is one that a number of different projects would like to see.
Currently, Keith-Magee maintains CPython forks to support iOS for Python 3.5 through 3.8; for Android, he has a handful of patches and a list of unit tests that need to be skipped. There is no continuous-integration (CI) testing as he has not found a service that provides phones to run on. If mobile Python is to become a reality, the changes for iOS and Android need to get upstream and some kind of CI system needs to be established.
He wondered if the CPython core developers were interested in changing the situation; it will take both money and work to get there, but there is no point in doing it if there is no appetite for it in the core. It sounds like several audience members were in favor of adding support for mobile Python, including Python founder Guido van Rossum and former release manager Ned Deily. Whether that translates to a renewed push, with some funding from the PSF or elsewhere, remains to be seen.
And more
There were, of course, lots of other sessions, as well as two rounds of lightning talks. Those interested in different facets of Python development will find taking a spin through the reports rewarding. At the end of the videoconference, Victor Stinner said: "Thanks TCP/IP for making this possible." That is a sentiment that will likely be shared widely these days.
Some sessions from the Python Language Summit
Posted May 28, 2020 11:44 UTC (Thu) by djc (subscriber, #56880) [Link]
https://github.com/indygreg/PyOxidizer
This seems like a promising way to pack up Python code into an executable in an efficient way.
Some sessions from the Python Language Summit
Posted May 29, 2020 17:04 UTC (Fri) by dloewenherz (guest, #139206) [Link]
My first LWN comment! Regarding mobile Python: "There have been some ideas on slimming down CPython by removing much or all of the standard library in order to minimize its size." This seems like a great idea, especially for mobile platforms that enforce security controls to the local filesystem that makes much of the stdlib unusable anyways. I wonder if this is a project that other folks have already embarked upon?