
What should be in the Python standard library?

By Jake Edge
January 9, 2019

Python has always touted itself as a "batteries included" language; its standard library contains lots of useful modules, often more than enough to solve many types of problems quickly. From time to time, though, some have started to rethink that philosophy, to reduce or restructure the standard library, for a variety of reasons. A discussion at the end of November on the python-dev mailing list revived that debate to some extent.

Jonathan Underwood raised the issue, likely unknowingly, when he asked about possibly adding some LZ4 compression library bindings to the standard library. As the project page indicates, it fits in well with the other compression modules already in the standard library. Responses were generally favorable or neutral, though some, like Brett Cannon, wondered if it made sense to broaden the scope a bit to create something similar to hashlib but for compression algorithms. Gregory P. Smith had a different take, however:

I don't think adding lz4 to the stdlib is worthwhile. It isn't required for core functionality as zlib is (lowest common denominator zip support). I'd argue that bz2 doesn't even belong in the stdlib, but we shouldn't go removing things. PyPI makes getting more algorithms easy.

If anything, it'd be nice to standardize on some stdlib namespaces that others could plug their modules into. Create a compress in the stdlib with zlib and bz2 in it, and a way for extension modules to add themselves in a managed manner instead of requiring a top level name? Opening up a designated namespace to third party modules is not something we've done as a project in the past though. It requires care. I haven't thought that through.

Steven D'Aprano objected to Smith's assertion about the Python Package Index (PyPI): "PyPI makes getting more algorithms easy for *SOME* people." He noted that in many environments (e.g. schools, companies) users cannot install additional software on the computers they are using, so PyPI is not the panacea it is sometimes characterized as.

That led Cannon to suggest discussing the standard library and its role: "We have never really had a discussion about how we want to guide the stdlib going forward (e.g. how much does PyPI influence things, focus/theme, etc.)." Paul Moore wasn't sure that discussing the matter would really resolve anything, though:

I'm not sure a formal discussion on this matter will help much - my feeling is that most people have relatively fixed views on how they would like things to go (large stdlib/batteries included vs external modules/PyPI/slim stdlib). The "problem" isn't so much with people having different views (as a group, we're pretty good at achieving workable compromises in the face of differing views) as it is about people forgetting that their experience isn't the only reality, which causes unnecessary frustration in discussions. That's more of a people problem than a technical one.

A larger standard library would help those without access to PyPI, Antoine Pitrou argued, while a smaller one does not provide huge benefits: "Python doesn't become magically faster or more powerful by including less in its standard distribution: the best it does is make the distribution slightly smaller." But there are definite downsides to having a large standard library, Benjamin Peterson said:

These include:
  • The [development] of stdlib modules slows to the rate of the Python release schedule.
  • stdlib modules become a permanent maintenance burden to CPython core developers.
  • The blessed status of stdlib modules means that users might use a substandard stdlib module when a better third-party alternative exists.

Steve Dower would rather see a smaller standard library with some kind of "standard distribution" of PyPI modules that is curated by the core developers. Later in the thread, he listed numerous different Python distributions as examples of what he meant, but that just highlighted another problem, Moore said: which of those should he recommend to his users? Right now, the standard library provides the base that a Python script can rely on:

Every single one of those distributions includes the stdlib. If we remove the stdlib, what will end up as the lowest common denominator functionality that all Python scripts can assume? Obviously at least initially, inertia will mean the stdlib will still be present, but how long will it be before someone removes urllib in favour of the (better, but with an incompatible API) requests library? And how then can a "generic" Python script get a resource from the web?
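For reference, the kind of "generic" script Moore has in mind needs nothing beyond urllib today. The following is a small sketch; the helper name `fetch_text()` is made up for illustration, and the `data:` URL is used only so that the example runs without network access.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10.0):
    """Return the body of `url` decoded as text, using only the stdlib."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# A data: URL keeps the example self-contained; an https:// URL would
# go through exactly the same call.
print(fetch_text("data:text/plain;charset=utf-8,hello%20stdlib"))
```

A script written this way runs on any Python 3 installation, which is the property that would be lost if urllib were dropped in favor of a separately installed library.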

Moore acknowledged that maintaining modules in the standard library has a "significant cost" but wondered if moving to the distribution model was simply shifting those costs to users—without users gaining much from it. Nathaniel Smith looked at the list of distributions and came to a different conclusion: the "single-box-of-batteries" model is not really solving the problems it needs to solve.

If Python core wants to be in the business of providing a single-box-of-batteries that solves Paul's problem, then we need to rethink how the stdlib works. Or, we could decide we want to leave that to the distros that are better at it, and focus on our core strengths like the language and interpreter. But if the stdlib isn't a single-box-of-batteries, then what is it?

It's really hard to tell whether specific packages would be good or bad additions to the stdlib, when we don't even know what the stdlib is supposed to be.

But Moore found that to be overstated somewhat. For him (and presumably others), the standard library is what you can expect to find when you have Python installed. That means that various things like StackOverflow answers, tutorials, books, and so on can rely upon those pieces being present, "much like you'd expect every Linux distribution to include grep". In addition, the "batteries included" attribute is likely to have been part of what helped Python grow into one of the most popular languages, D'Aprano said. "The current model for the stdlib seems to be working well, and we mess with it at our peril."

Nathaniel Smith sees some advantages to the "standard distribution" model, though he is not sure that it would really be the best option. "But what I like about it is that it could potentially reduce the conflict between what our different user groups need, instead of playing zero-sum tug-of-war every time this comes up." Others don't see it that way, though; "not every need can be solved by the stdlib", as Pitrou put it. He continued:

So, yes, there's a discussion for each concretely proposed package about whether it's sufficiently useful (and stable etc.) to be put in the stdlib. Every time it's a balancing act, and obviously it's an imperfect decision. That doesn't mean it cannot be done.

Moore concurred: "In exploring alternatives, let's not lose sight of the fact that the stdlib has been a huge success, so we know we *can* deliver an extremely successful distribution based on that model, no matter how much it might trigger regular debates :-)" In any case, as he pointed out, a more concrete proposal (in the form of a PEP) is going to be needed before any real progress can be made. Dower floated some ideas about what a distribution might look like along the way, but, without something like a PEP to discuss, participants are often talking past each other based on their assumptions.

The topic has come up before on the Python mailing lists and at Python Language Summits. In 2015, there was a discussion at the summit on adding the popular Requests module to the standard library. Participants recognized that there were significant barriers—development pace, certificate handling, no asyncio support—to moving it into the standard library. In the end, it made sense for Requests to stay out. At the 2018 summit, Christian Heimes brought up a number of batteries that should perhaps be removed from the set, though the effort to create a PEP listing them seems to have stalled.

No firm conclusions were drawn in the discussion, but part of the underlying problem seems to be a lack of clarity on what the purpose of the standard library is. At the 2015 summit, Cannon suggested an informational PEP be drafted to solidify that; until that happens, there will be wildly differing views on what role the standard library serves. At the moment, though, there is no process to accept or reject a PEP even if one were on offer; that will have to await the new Python Steering Council, which will be elected in early February. One of the first orders of business for that group is likely to be addressing the PEP process.

As far as adding LZ4 goes, the overall feeling from the thread is that it would be useful to have it in the standard library—at least for those not looking to change the standard library model. Adding LZ4 also requires a PEP, however, so that process may be stalled by the governance change, as well.




What should be in the Python standard library?

Posted Jan 10, 2019 14:58 UTC (Thu) by mageta (subscriber, #89696) [Link]

I'd not put it past them, now that people have just gotten over the pain of the Python 2->3 migration, to go and break tons of scripts again by minimizing the stdlib. Have they really learned nothing?

What should be in the Python standard library?

Posted Jan 10, 2019 17:38 UTC (Thu) by smurf (subscriber, #17840) [Link]

Presumably, if you get your Python from a distribution, the recommended set will include the pieces that have been removed from stdlib.

Also, there are downsides to having a large stdlib. Presumably² the Python developers are able to reason about them and strike a, well, reasonable balance between maintenance burden and compatibility issues, esp. since this would fall flat on its face at script startup time instead of crashing some indeterminate time later.

What should be in the Python standard library?

Posted Jan 10, 2019 23:00 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

I disagree. I think it's pretty clear that Python 3 was an exceptional situation, and that going forward, they intend to adhere to PEP 4 for future deprecations. Furthermore, I'm rather skeptical that a significant number of actually used modules are going to be removed any time soon, even if they are deprecated. (For example, macpath is slated for removal in 3.8, but I doubt anyone cares at this point. On the other hand, there is no indication that they intend to remove optparse any time soon.)

However, I should also point out that the table of contents is getting unwieldy. It might make sense to reorganize it, or to split it into multiple pages. That would not break anyone's old scripts.

What should be in the Python standard library?

Posted Jan 10, 2019 23:26 UTC (Thu) by karkhaz (subscriber, #99844) [Link]

> It might make sense to reorganize it, or to split it into multiple pages. That would not break anyone's old scripts.

It breaks the flow of navigating to that page and browser-searching for a word related to the module that you want to use, whose name you do not know. For me, this has always been a reliable way of finding the right module.

If you split the ToC into multiple pages, I now need to guess what arbitrary page or category somebody has placed a module into, browse to that, search, discover that my guess was wrong, navigate back up and try again, and by now my concentration is long gone.

The ToC is not broken, there is no need to fix it.

Compressors in the Python standard library?

Posted Jan 10, 2019 15:17 UTC (Thu) by zougloub (subscriber, #46163) [Link]

A compresslib like the hashlib or codecs modules (actually codecs can be abused for stuff like that, but it results in a very limited API) makes sense for a simple encode()/decode() usage, but some compressors have extended APIs (functions or compression parameters) so they can't just be exposed through a simplified API.
But the thing is that, as of today, much advanced functionality (e.g. flushing, dictionary handling) isn't even exposed or documented, even in the zlib module.

In any case, given the number of compressors, moving the various 3-4 letter compressor names down a namespace would clearly be beneficial (except for compatibility of course, but there could be shims for the main/current compressors).

Compressors in the Python standard library?

Posted Jul 25, 2019 17:17 UTC (Thu) by k8to (guest, #15413) [Link]

Indeed, even with the limited functionality exposed by the existing compression & archive tools, they have pretty wildly varied interfaces. You could try to present tar, zip, etc. with some base functionality and extensions, but they are expressed completely differently. That's probably hard because their behavior varies so much. It seems more achievable for simple compress/decompress datastream tools like zlib, bzip2, xz, lz4, but there are many variations here too.
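As a rough illustration of where the datastream compressors already line up, the sketch below drives the incremental compressor objects from zlib, bz2, and lzma through one helper. The helper `compress_stream()` is invented for this example; the variation starts once algorithm-specific constructor arguments (levels, filters, dictionaries) come into play, which this sketch deliberately leaves at their defaults.

```python
import bz2
import lzma
import zlib


def compress_stream(compressor, chunks):
    """Feed chunks to an incremental compressor and collect its output."""
    out = [compressor.compress(chunk) for chunk in chunks]
    out.append(compressor.flush())  # emit whatever is still buffered
    return b"".join(out)


chunks = [b"spam " * 50, b"eggs " * 50, b"ham " * 50]
original = b"".join(chunks)

# Each entry is (incremental compressor, matching one-shot decompress function).
pairs = {
    "zlib": (zlib.compressobj(), zlib.decompress),
    "bz2": (bz2.BZ2Compressor(), bz2.decompress),
    "lzma": (lzma.LZMACompressor(), lzma.decompress),
}

results = {}
for name, (compressor, decompress) in pairs.items():
    blob = compress_stream(compressor, chunks)
    assert decompress(blob) == original  # round trip must be lossless
    results[name] = len(blob)
```

All three objects answer to `compress()` and `flush()`, so a common base interface for this class of tools looks plausible; the archive formats (tar, zip) are a different matter.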

Making them more regular would make it more reasonable to "drop in" additional compression algorithms, but that isn't completed work for sure.

What should be in the Python standard library?

Posted Jan 10, 2019 17:27 UTC (Thu) by MatyasSelmeci (subscriber, #86151) [Link]

A lot of (mostly internal) software we write needs to run out of the box from a git checkout without any additional setup. If libraries we use went away or got moved to PyPI, we'd have to vendor them in.

What should be in the Python standard library?

Posted Jan 10, 2019 18:36 UTC (Thu) by hkario (subscriber, #94864) [Link]

Same here; dealing with PyPI, with all environments considered, is painful.

The standard library is a strength of the language, not a burden. Just because it makes the core of the language move slower doesn't mean that the project itself is moving slower.

There are people that do not use core language features too (e.g. generator functions), that doesn't mean we should think about moving them to PyPI.

What should be in the Python standard library?

Posted Jul 25, 2019 17:21 UTC (Thu) by k8to (guest, #15413) [Link]

I wonder if PyPI would be less objectionable if the Python packaging tools were more pleasant. Obviously some people just can't take this approach at all (for example, for Python programs that are intended to be self-contained), but I have often written tools to be entirely self-contained simply because of the number of times that the packaging tools have broken on me.

Granted, sometimes the problem isn't the tools but rather things that are just difficult to deploy like 'cryptography'. But I still struggle with the status quo. Debian packaging tends to just work. I install a package and it runs. With Python packages I get conflicts, build failures, and inscrutable errors that make little sense. I know Python has it a bit harder because it doesn't dictate the ecosystem it runs on, but it feels like some kind of binary package approach would make it vastly more reliable for those cases.

What should be in the Python standard library?

Posted Jan 10, 2019 21:37 UTC (Thu) by iabervon (subscriber, #722) [Link]

I wonder if it would be a good idea for the stdlib to consist primarily of packages from PyPI, where you can rely on having at least some version of the package supplied with Python, and may have a later version (either from PyPI, or supplied with a later Python patch release). I can think of a bunch of problems with this (what if the new version isn't backwards compatible, what if the package developers stop maintaining it, etc.), but all of them happen with the standard library or with PyPI anyway.

What should be in the Python standard library?

Posted Jan 12, 2019 11:16 UTC (Sat) by smcv (subscriber, #53363) [Link]

> I wonder if it would be a good idea for the stdlib to consist primarily of packages from PyPI, where you can rely on having at least some version of the package supplied with Python, and may have a later version (either from PyPI, or supplied with a later Python patch release).

The Perl standard library has worked like this for a long time (with CPAN as the equivalent of PyPI).

What should be in the Python standard library?

Posted Jan 13, 2019 0:40 UTC (Sun) by ms-tg (subscriber, #89231) [Link]

> > I wonder if it would be a good idea for the stdlib to consist primarily of packages from PyPI, where you can rely on having at least some version of the package supplied with Python, and may have a later version (either from PyPI, or supplied with a later Python patch release).

> The Perl standard library has worked like this for a long time (with CPAN as the equivalent of PyPI).

And the Ruby standard library is going through the same evolutionary path, where bits of the standard library are being extracted to RubyGems, but the language ships with a defined set of “default gems” and pre-installs an additional set of “bundled gems”.

For more information please see
https://stdgems.org/

This is intended to meet the continued interests of a batteries-included common install everywhere, while recognizing that libraries stagnate and tend to go unmaintained in the classic standard library.

What should be in the Python standard library?

Posted Jan 17, 2019 21:34 UTC (Thu) by atnot (subscriber, #124910) [Link]

This is already the case in some places. For example, the Python `json` module is an older version of the `simplejson` PyPI module.
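The long-standing idiom for taking advantage of that arrangement is an import fallback: use the PyPI release when it happens to be installed, and the bundled copy otherwise. A minimal sketch (the `record` data is made up for the example):

```python
try:
    import simplejson as json  # newer PyPI release, if it is installed
except ImportError:
    import json  # the copy that ships with Python

record = {"module": "json", "bundled": True}
encoded = json.dumps(record, sort_keys=True)
assert json.loads(encoded) == record  # same API either way
```

This only works because the two modules keep compatible interfaces for the calls used here; it is the property a PyPI-backed stdlib would have to guarantee more generally.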

What should be in the Python standard library?

Posted Jan 13, 2019 14:21 UTC (Sun) by nilsmeyer (guest, #122604) [Link]

> He noted that in many environments (e.g. schools, companies) users cannot install additional software on the computers they are using, so PyPI is not the panacea it is sometimes characterized as.

I wonder to what extent Python should be required to cater to broken corporate (and school) policies?

What should be in the Python standard library?

Posted Jan 13, 2019 20:08 UTC (Sun) by mb (subscriber, #50428) [Link]

To what extent is not allowing the installation of random/malicious software broken?
https://www.zdnet.com/article/twelve-malicious-python-lib...

What should be in the Python standard library?

Posted Jan 14, 2019 10:23 UTC (Mon) by nilsmeyer (guest, #122604) [Link]

That's a false premise, and if anything the article (ironically on a website that seems to be rife with adware) shows that there is a good process in place to deal with malicious software.

What should be in the Python standard library?

Posted Jan 18, 2019 14:43 UTC (Fri) by flussence (subscriber, #85566) [Link]

Unless these institutions are disabling access to all web VCS interfaces containing this code, and preventing use of clipboard or file save functions, *and* adding file hashes of everything in PyPI/etc to some sort of filesystem blocklist to prevent manual transcribing of these *plain text files*, they're not achieving anything here other than being obnoxious and puritanical.

What should be in the Python standard library?

Posted Jan 17, 2019 9:40 UTC (Thu) by Wol (subscriber, #4433) [Link]

> I wonder to what extent Python should be required to cater to broken corporate (and school) policies?

To what extent do you understand why those policies are in place? Would you like to go to jail?

Dunno how easy it is to do, but the idea of namespaces sounds very interesting to me. Split the stdlib up into modules, each in their own namespace, and allow drop-in replacements for each module.

That way, if the standard implementation stagnates, it's a reasonably easy job for it to be forked, improved, and fed back in.

Cheers,
Wol

What should be in the Python standard library?

Posted Jan 17, 2019 21:50 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

Is there any point in trying to control software installation when one of the packages on the "allowed" list is a Turing-complete program interpreter like Python? Sure, it only runs programs written in Python, but that just inconveniences users without actually limiting what they can do. A standard Python installation includes a foreign function interface (ctypes) as part of the standard library, and consequently can do anything a C program would be able to do.
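For the record, the foreign function interface that ships with CPython is ctypes (CFFI is a separate PyPI package). A small sketch of calling straight into C with it, assuming a CPython interpreter; `Py_GetVersion` is part of the CPython C API, reachable here through `ctypes.pythonapi`:

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
get_version = ctypes.pythonapi.Py_GetVersion
get_version.restype = ctypes.c_char_p  # C signature: const char *Py_GetVersion(void)
get_version.argtypes = []

c_version = get_version().decode("ascii")
print(c_version.split()[0], sys.version.split()[0])
```

The same mechanism can load any shared library present on the machine, which is the point being made: no installation step is involved.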

What should be in the Python standard library?

Posted Jan 19, 2019 17:28 UTC (Sat) by jgu (guest, #129944) [Link]

Poster of the original email about LZ4 bindings here. I have to say, I was a bit taken aback by how a simple offer of contribution seemed to trigger such an existential crisis on the mailing list.

I am not sure I came away with the impression that the feeling was positive towards the addition of the lz4 bindings, as the final paragraph suggests - opinion seems very divided on that. I do see merit in the "compresslib" proposal though, and have been giving that some thought and prototyping.


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds