
Some sessions from the Python Language Summit

By Jake Edge
May 27, 2020

The Python Language Summit is an annual gathering for the developers of various Python implementations; this year, as with so many other conferences during the pandemic, it was held via videoconference. The invite-only gathering typically has numerous interesting sessions, as can be seen in the LWN coverage of the summit from 2015 to 2018, as well as in the 2019 summit coverage on the Python Software Foundation (PSF) blog. Those writeups were penned by A. Jesse Jiryu Davis, who reprised his role for this year's summit. In this article, I will summarize some of the sessions that caught my eye.

Language specification

Mark Shannon shared his thoughts on a more formal definition of the Python language. It would not only help developers of alternative implementations understand the nuances and corner cases of the language, it would also help developers of the CPython reference implementation fully understand that code base. He noted that Java has a language specification and he thinks that Python could benefit from having one as well.

Shannon proposed splitting the specification up into three parts: code loading (parsing, importing, and so on), execution, and the C API. For his presentation, he looked in more detail at the execution specification. For example, he broke down a function call into a series of steps: create a stack frame, move the function arguments from the current frame to the new one, save the instruction pointer, and push the frame onto the stack.
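The four call steps listed above can be modeled in a few lines of Python. This is only an illustrative sketch, not taken from Shannon's presentation or from CPython; the names Frame, ToyInterpreter, call(), and ret() are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """A toy stack frame: local variables plus a saved instruction pointer."""
    function_name: str
    local_vars: dict = field(default_factory=dict)
    # Where the caller resumes once this frame returns.
    return_address: Optional[int] = None


class ToyInterpreter:
    """Hypothetical model of the call sequence; real interpreters differ."""

    def __init__(self):
        self.frames = [Frame("<module>")]
        self.instruction_pointer = 0

    def call(self, function_name, parameter_names, arguments):
        # Step 1: create a stack frame for the callee.
        new_frame = Frame(function_name)
        # Step 2: move the arguments from the current frame to the new one.
        new_frame.local_vars.update(zip(parameter_names, arguments))
        # Step 3: save the instruction pointer so the caller can resume.
        new_frame.return_address = self.instruction_pointer
        # Step 4: push the frame and start at the callee's first instruction.
        self.frames.append(new_frame)
        self.instruction_pointer = 0
        return new_frame

    def ret(self):
        # Undo the call: pop the frame and restore the caller's position.
        finished = self.frames.pop()
        self.instruction_pointer = finished.return_address
        return finished
```

Writing the steps out this way makes it easier to see which parts of a call are observable behavior and which are implementation detail, which is the kind of question a specification would need to settle.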

Breaking things down that way will allow developers to rework how certain features are interrelated. The example he gave was that iterators came first, so generators were defined in terms of iterators, even though generators are the lower-level concept. If you were starting from scratch, it would make more sense to specify iterators as being built on generators. In his nascent formal semantics repository, Shannon made a start on defining iterators in terms of generators.
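To give a flavor of what "iterators built on generators" could mean, here is a small sketch in plain Python. It is not taken from Shannon's repository; iterate() and Countdown are hypothetical names, and the sketch only shows that iteration can be expressed with yield alone, without a hand-written __next__() method.

```python
def iterate(sequence):
    """Produce the items of an indexable sequence using only a generator."""
    index = 0
    while index < len(sequence):
        yield sequence[index]
        index += 1


class Countdown:
    """An iterable whose __iter__ is a generator function."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # The generator object returned here serves as the iterator.
        current = self.start
        while current > 0:
            yield current
            current -= 1
```

Here list(iterate("abc")) walks the string through the generator, and a for loop over Countdown(3) never touches an explicit iterator class.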

The goal of the work would be to assist in language development, but it will take a fair amount of work to get there:

He concluded that a semi-formal spec of Python would help alternative implementations match CPython, would make PEPs less ambiguous, and would clarify whether any existing "odd behavior is a feature or a bug." It would be possible to reason about the correctness of optimizations. However, writing the spec is work, and it could deter good PEPs in the future if authors are daunted by writing their proposals in terms of the spec.

The audience reaction appears to have been positive, overall, with some amount of confusion as to how to get there—and how exactly the specification would be used once it exists. It remains to be seen if Shannon (or someone else) wants to put in a fairly large chunk of work for an unclear amount of benefit.

HPy

One of the problems that CPython has struggled with over the years is its C API for extensions. On one hand, it has led to a number of high-powered, popular extensions like NumPy, but on the other, the API is too closely tied to CPython internals. The too-close tie not only hampers alternative implementations (e.g. PyPy), but also stands in the way of changes that CPython developers might like to make—efforts to remove the Global Interpreter Lock (GIL), in particular.

That is the backdrop against which Antonio Cuni presented HPy (the "H" is for "handle"), which is a new API that might provide a way forward. HPy came about from a conversation between CPython, PyPy, and Cython developers at last year's EuroPython; it could replace the existing C API with one that is based on handles, rather than direct pointers to CPython objects.

Currently, Python objects in C extensions are PyObject pointers that have their lifetimes managed through reference counts. HPy would instead turn those into HPy types that effectively wrap the underlying PyObject pointers, which decouples the extension from the reference counts. A mapping between HPy objects and PyObject pointers would need to be maintained, but if, say, PyPy wanted to move objects around as part of its garbage-collection strategy, it would simply need to update the map appropriately. Handles need to be explicitly closed, and only once per handle, which is different from the Py_INCREF()/Py_DECREF() style of reference-count management used today.
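The handle idea can be illustrated with a toy table written in Python. This is a conceptual sketch only, not the HPy API (which is a C API); HandleTable and its methods are hypothetical names. It shows the two properties described above: a relocated object only requires a table update, and each handle may be closed exactly once.

```python
class HandleTable:
    """Toy model of opaque handles; not the real HPy API."""

    def __init__(self):
        self._objects = {}      # handle number -> object
        self._next_handle = 1

    def open(self, obj):
        # Every call hands out a fresh handle, even for the same object.
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = obj
        return handle

    def dup(self, handle):
        # A second, independent handle to the same object.
        return self.open(self.deref(handle))

    def deref(self, handle):
        return self._objects[handle]

    def close(self, handle):
        # Closing an unknown handle is reported as an error.
        if handle not in self._objects:
            raise ValueError(f"handle {handle} closed twice or never opened")
        del self._objects[handle]

    def relocate(self, old_obj, new_obj):
        # Simulates a moving collector: only the table changes, so the
        # handle numbers held by extension code stay valid.
        for handle, obj in list(self._objects.items()):
            if obj is old_obj:
                self._objects[handle] = new_obj
```

Unlike reference counts, where any Py_DECREF() can balance any Py_INCREF() on the same object, each handle here is a distinct resource, so a double close is detectable.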

There is a debug mode that will help catch multiple calls to HPy_Close(), which should help with porting extensions to the API. To a large extent, HPy "basically works", Cuni said, but there are still lots of things that need to be addressed, including support for custom Python types in extensions.

Cuni said the "HPy strategy to conquer the world" is to create a zero-overhead façade that maps HPy to the C API (using compile-time macros), then port third-party C extensions to pure HPy, one function at a time. It must be faster on alternative implementations than their existing C API emulations; early benchmarks show a 3x speedup on PyPy and 2x on GraalPython, a JVM-based Python.

There are plans afoot to port parts of NumPy to HPy; NumPy is something of the "gold standard" in terms of Python C extensions. The PyPy team has done a lot of work to make NumPy work with PyPy; no change to the C API is likely to go far without somehow supporting NumPy. Eventually, Cuni would like to write a Python Enhancement Proposal (PEP) to add HPy to CPython; it would not replace the existing C API, but would coexist with it, at least for a while.

Property-based testing

Handwritten tests do a good job of finding problems and preventing regressions, but they are limited to the types of tests that the developer can think of—bugs that come from unforeseen areas or interactions may well be missed. There are alternatives that try to fill in those gaps, either through exhaustive testing or by fuzzing; using coverage-guided fuzzing can yield even better results. But fuzz testing is generally geared toward finding inputs that cause programs to crash; property-based testing is an alternative for finding logic bugs of various sorts.

Zac Hatfield-Dodds gave a summit presentation on property-based testing; he is one of the leads for the Hypothesis project, which is "a Python library for creating unit tests which are simpler to write and more powerful when run, finding edge cases in your code you wouldn’t have thought to look for". Hatfield-Dodds proposed adding these kinds of tests to the Python standard library.

Instead of looking for a particular mapping from an input to an output, as many unit tests do, a property-based test describes the properties that a function should always maintain: commutativity, sorted output, idempotence, and so on. The framework takes those descriptions and varies the inputs to look for cases where a property fails.
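As a rough illustration of the idea, the following sketch checks a few properties against randomly generated inputs using only the standard library. It is deliberately not Hypothesis, whose generators and search are far more sophisticated; check_property() and random_int_list() are hypothetical helpers.

```python
import random


def random_int_list(rng, max_length=20):
    """Generate a list of small integers of random length."""
    length = rng.randint(0, max_length)
    return [rng.randint(-1000, 1000) for _ in range(length)]


def check_property(prop, generate, runs=200, seed=0):
    """Return the first generated input for which prop is false, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def sorted_is_idempotent(values):
    # Sorting an already sorted list must not change it.
    once = sorted(values)
    return sorted(once) == once


def sorted_is_ordered(values):
    # Every element must be <= its successor.
    result = sorted(values)
    return all(a <= b for a, b in zip(result, result[1:]))


def sum_is_commutative(values):
    # Adding the items in reverse order must give the same total.
    return sum(values) == sum(reversed(values))
```

A call such as check_property(sorted_is_ordered, random_int_list) returns None when no counterexample turns up, while a false claim (say, that reversing a list leaves it unchanged) yields a concrete failing input.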

Hypothesis searches for bugs by randomizing the input, or trying interesting values that tend to trigger edge cases, or retrying inputs that triggered bugs in previous runs. When Hypothesis finds a bug, it evolves the input, searching for the simplest input that reproduces the same bug.
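The "simplest input" search can be sketched as a greedy loop. This is a minimal illustration for lists of integers, assuming a caller-supplied still_fails() predicate; it is not how Hypothesis implements shrinking, and the result is simpler than the starting point but not necessarily the smallest possible failing input.

```python
def shrink(failing_input, still_fails):
    """Greedily simplify a failing list of ints while still_fails stays true."""
    current = list(failing_input)
    improved = True
    while improved:
        improved = False
        # First try dropping each element in turn.
        for index in range(len(current)):
            candidate = current[:index] + current[index + 1:]
            if still_fails(candidate):
                current = candidate
                improved = True
                break
        if improved:
            continue
        # Then try halving each element toward zero.
        for index, value in enumerate(current):
            if value == 0:
                continue
            sign = 1 if value > 0 else -1
            smaller = sign * (abs(value) // 2)
            candidate = current[:index] + [smaller] + current[index + 1:]
            if still_fails(candidate):
                current = candidate
                improved = True
                break
    return current
```

Each accepted step either shortens the list or reduces the magnitude of one element, so the loop always terminates.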

Hatfield-Dodds suggested that property-based tests be created for CPython, its builtins, the standard library, other implementations like PyPy, and more. Those tests could be run as part of the continuous integration (CI) for CPython and shared with other language implementations. They could also be integrated with coverage-guided fuzzing frameworks, such as OSS-Fuzz.

Several developers in the audience noted that property-based testing looks useful. Łukasz Langa pointed to an effort to use the techniques in Hypothesis on a Python code formatter, which found a lot of bugs. Paul Ganssle used property-based testing for his reimplementation of date.fromisoformat() in the datetime module; it worked well, but those tests were not merged with the new code. A subsequent bug was introduced that would likely have been caught if those tests had been run, so he was strongly behind the idea of adding that kind of testing. It is not clear where things go from here, but the technique seems like a promising addition to the testing arsenal.
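The article does not say which properties Ganssle tested, but a round-trip check is a natural candidate for a parser like date.fromisoformat(): formatting a date and parsing the result should give back the original. The sketch below assumes that property and uses only the standard library; random_date() and find_round_trip_failure() are hypothetical helpers.

```python
import random
from datetime import date


def random_date(rng):
    """Pick a date uniformly from the full range that date supports."""
    ordinal = rng.randint(date.min.toordinal(), date.max.toordinal())
    return date.fromordinal(ordinal)


def find_round_trip_failure(runs=1000, seed=0):
    """Return a date that does not survive isoformat/fromisoformat, or None."""
    rng = random.Random(seed)
    # Start with boundary values that hand-written tests often forget.
    candidates = [date.min, date.max, date(2000, 2, 29)]
    candidates += [random_date(rng) for _ in range(runs)]
    for value in candidates:
        if date.fromisoformat(value.isoformat()) != value:
            return value
    return None
```

A check like this is cheap to keep in a test suite, so a later regression in either the formatter or the parser would show up as a concrete failing date.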

Mobile Python

Russell Keith-Magee returned to the summit to give a presentation on Python for mobile systems. He presented in 2015 and last year on the status of the long-running project to make CPython available on iOS and Android. He began his presentation this year by noting that the BeeWare project has its tools mostly running on Android now, thanks to a grant from the PSF.

CPython has been running on iOS for a while now, but Android has been problematic until recently. The strategy used to be to compile the Python to Java bytecode, then run that on Android, "but Android devices are now fast enough, and the Android kernel permissive enough, to run CPython itself".

Distribution size is an issue for mobile platforms, however. Each app bundles the entire Python runtime, so making that as small as possible is a priority for the project. There have been some ideas on slimming down CPython by removing much or all of the standard library in order to minimize its size. The idea of a "kernel Python" (which was inspired by a presentation from Amber Brown at the 2019 summit) is one that a number of different projects would like to see.
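One way to see where the bytes go is to measure the standard library on disk. The sketch below is only an illustration of that measurement, not a tool used by BeeWare; directory_size() and largest_entries() are hypothetical helpers.

```python
import os


def directory_size(path):
    """Total size in bytes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_entries(path, limit=10):
    """Return (name, bytes) pairs for the biggest top-level entries in path."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_symlink():
            continue
        if entry.is_dir():
            sizes.append((entry.name, directory_size(entry.path)))
        elif entry.is_file():
            sizes.append((entry.name, entry.stat().st_size))
    # Biggest first, so the best candidates for removal are at the top.
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:limit]
```

On a typical installation, calling largest_entries(sysconfig.get_path("stdlib")) after importing sysconfig lists the modules and packages that contribute most to the bundle size.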

Senthil Kumaran observed, "BeeWare, MicroPython, Embedded Python, Kivy all seem to have a need for a kernel-only Python," and suggested they combine forces to create one.

Currently, Keith-Magee maintains CPython forks to support iOS for Python 3.5 through 3.8; for Android, he has a handful of patches and a list of unit tests that need to be skipped. There is no continuous-integration (CI) testing, as he has not found a service that provides phones to run the tests on. If mobile Python is to become a reality, the changes for iOS and Android need to get upstream and some kind of CI system needs to be established.

Mobile Python suffers a chicken-and-egg problem: there is no corporate funding for Python on mobile because Python doesn't support mobile, so there is no one relying on mobile Python who is motivated to fund it.

He wondered if the CPython core developers were interested in changing the situation; it will take both money and work to get there, but there is no point in doing it if there is no appetite for it in the core. It sounds like several audience members were in favor of adding support for mobile Python, including Python founder Guido van Rossum and former release manager Ned Deily. Whether that translates to a renewed push, with some funding from the PSF or elsewhere, remains to be seen.

And more

There were, of course, lots of other sessions, as well as two rounds of lightning talks. Those interested in different facets of Python development will find taking a spin through the reports rewarding. At the end of the videoconference, Victor Stinner said: "Thanks TCP/IP for making this possible." That is a sentiment that will likely be shared widely these days.





Posted May 28, 2020 11:44 UTC (Thu) by djc (subscriber, #56880) [Link]

In reference to the Mobile Python section, I wonder if the group is aware of PyOxidizer?

https://github.com/indygreg/PyOxidizer

This seems like a promising way to pack up Python code into an executable in an efficient way.

Posted May 28, 2020 16:31 UTC (Thu) by hkario (subscriber, #94864) [Link]

Hypothesis is really nice; the API to generate inputs (when the existing classes aren't sufficient) is really powerful. It also integrates both with pure unittest tests and with more advanced runners like pytest.

Posted May 29, 2020 3:15 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Woo, it looks like HPy is the API change I've been looking for! I hope it ends up being the standard C API (even if it means I have to rewrite a bunch of stuff; allowing our Python use to be sealed away from any other Python in-process will be quite great).

Posted May 29, 2020 17:04 UTC (Fri) by dloewenherz (guest, #139206) [Link]

My first LWN comment! Regarding mobile Python:
"There have been some ideas on slimming down CPython by removing much or all of the standard library in order to minimize its size."
This seems like a great idea, especially for mobile platforms that enforce security controls on the local filesystem that make much of the stdlib unusable anyway. I wonder if this is a project that other folks have already embarked upon?

Posted May 29, 2020 20:27 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Isn't that specific problem solved with freezing the stdlib? <https://wiki.python.org/moin/Freeze> I think it's more about size here (e.g., stripping the Amiga audio decoders or SMTP libraries), not mechanism.


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds