Python and deprecations redux

By Jake Edge
February 1, 2022

The problem of how to deprecate pieces of the Python language in a minimally disruptive way has cropped up in various guises over the last few years; in truth, it has been wrangled with throughout much of the language's 30-year history. The scars of the biggest deprecation, that of Python 2, are still rather fresh, both for users and the core developers, so no one wants (or plans) a monumental change of that sort. But the language community does want to continue evolving Python, which means leaving some "baggage" behind; how to do so without leaving further scars is a delicate balancing act, as yet another discussion highlights.

We looked in on some discussion of the topic back in December, but it pops up frequently. There is a policy on handling deprecations that is described in PEP 387 ("Backwards Compatibility Policy"), but the reality of how they are handled is often less clear-cut. Python has two warnings that can be raised when features slated for deprecation are used: PendingDeprecationWarning and DeprecationWarning. The former is meant to give even more warning for a feature that will coexist with its replacement for multiple releases, while the latter indicates something that could be removed two releases after the warning is added, which is effectively two years given the relatively recent annual release cycle.
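As a minimal sketch of how such a warning is typically raised, consider a deprecated alias that forwards to its replacement. The names old_name(), new_name(), and call_and_collect() are hypothetical and used only for illustration; swapping in PendingDeprecationWarning as the category would give the earlier, softer notice.

```python
import warnings


def new_name(x):
    """The replacement API (hypothetical)."""
    return x * 2


def old_name(x):
    """Deprecated alias kept around for a transition period (hypothetical)."""
    warnings.warn(
        "old_name() is deprecated; use new_name() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    return new_name(x)


def call_and_collect(func, *args):
    """Call func and return (result, categories of the warnings it raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning in this block
        result = func(*args)
    return result, [w.category for w in caught]
```

Calling call_and_collect(old_name, 21) returns the normal result along with a single DeprecationWarning, while the same call with new_name() raises none.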

But, as noted in that earlier discussion, the deprecation period is for a minimum of two release cycles. There are concerns that this time frame is being treated as a deadline of sorts, to the detriment of some parts of the ecosystem. So on January 18, Victor Stinner, Tomáš Hrnčiar, and Miro Hrončok proposed postponing some deprecations that had been scheduled for Python 3.11, which is due in October. The message referred to an early January posting by Hrnčiar to the Python discussion forum that described the problems Fedora had encountered when building its packages using a development version of 3.11.

In particular, two specific sets of deprecations were causing the most trouble for Fedora packages. Removing deprecated aliases from the unittest module (bug 45162) and getting rid of deprecated pieces from the configparser module (bug 45173) led to the bulk of the problems that Fedora encountered. The unittest deprecation caused 61 Fedora packages to break, while the configparser changes broke another 28. In the proposal, Stinner said that they and others had reported the problems upstream and often contributed a fix, but that there is still a lengthy process before the changes actually reach the distribution:

The problem is that fixing a Fedora package requires multiple steps:
  1. Propose a pull request upstream
  2. Get the pull request merged upstream
  3. Wait for a new release upstream
  4. Update the Fedora package downstream, or backport the change in Fedora (only needed by Fedora)

Reverting those two changes, which caused most of the problems Fedora has run into in its testing of the new version of Python, will allow for "more time on updating projects to Python 3.11 for the other remaining incompatible changes". As reported by Hrnčiar, four other changes led to problems building Python packages, but those were fewer in number.

Silencing deprecations

In a reply to the proposal, Antoine Pitrou wondered whether it showed "that making DeprecationWarning silent by default was a mistake?" He is referring to the changes to the visibility of DeprecationWarning that have occurred over the years. While DeprecationWarning is useful for the developers of a Python package, it is often seen by users, who may not be in a position to do much about it. The warnings were made invisible by default for Python 2.7 and 3.2 (in 2010 and 2011), but that policy was changed for Python 3.7 in 2017 with PEP 565 ("Show DeprecationWarning in __main__").
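The difference between a silent and a visible policy can be sketched with the warnings module's filter actions. This is only an approximation: the interpreter's real default filter list is more nuanced (under PEP 565, the warning is shown when it is triggered by code in __main__), and count_deprecations() is a hypothetical helper.

```python
import warnings


def count_deprecations(action):
    """Return how many DeprecationWarnings get through under a filter action."""
    with warnings.catch_warnings(record=True) as caught:
        # "ignore" approximates a silent-by-default policy;
        # "default" shows the first occurrence for each location.
        warnings.simplefilter(action, DeprecationWarning)
        warnings.warn("feature X is deprecated", DeprecationWarning)
    return len(caught)
```

Outside of code, the same choice can be made per run with the interpreter's -W option, for example "python -W default::DeprecationWarning script.py", or with the PYTHONWARNINGS environment variable.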

Guido van Rossum did not think that the evidence was quite that clear, but deprecations are tricky:

At best it shows that deprecations are complicated no matter how well you plan them. I remember that "noisy by default" deprecation warnings were widely despised.

Some ideas for further tweaks that could be made to the visibility of the warnings were raised. Richard Damon suggested having them only be visible when running unit tests. It turns out that pytest already enables those warnings, as Brett Cannon pointed out. That is something of a double-edged sword, though, Christopher Barker noted: "It's really helpful for my code, but they often get lost in the noise of all the ones I get from upstream packages." Gregory P. Smith pointed out that the standard library unit tests enable the warnings as well; "Getting the right people to pay attention to them is always the hard part."
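One way a project can keep such warnings from getting lost is to make its own test suite fail on them. What follows is a sketch under stated assumptions, not a recommendation from the thread: legacy_helper() and the test class are hypothetical, and the example uses the standard unittest module rather than pytest.

```python
import unittest
import warnings


def legacy_helper():
    """Hypothetical function that still relies on a deprecated feature."""
    warnings.warn("legacy_helper() is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


class StrictDeprecationTest(unittest.TestCase):
    def test_warning_becomes_error(self):
        # Within this block, any DeprecationWarning is raised as an exception.
        with warnings.catch_warnings():
            warnings.simplefilter("error", DeprecationWarning)
            with self.assertRaises(DeprecationWarning):
                legacy_helper()

    def test_warning_is_emitted(self):
        # assertWarns checks for the warning without turning it into an error.
        with self.assertWarns(DeprecationWarning):
            legacy_helper()
```

For pytest users, a comparable effect should be available from the command line with "pytest -W error::DeprecationWarning".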

Fixing deprecations

There was a bit of discussion about how to silence warnings from imported modules, possibly semi-automatically, but Steven D'Aprano had a bit of a warning about that approach:

If we use a library, then we surely care about that library working correctly, which means that if the library generates warnings, we *should* care about them. They are advanced notice that the library is going to break in the future.

Of course I understand that folks are busy maintaining their own project, and have neither the time nor the inclination to take over the maintenance of every one of their dependencies. But we shouldn't just dismiss warnings in those dependencies as "warnings I don't care about" and ignore them as Not My Problem.

Like it or not, it is My Problem and we should care about them.
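For reference, the kind of per-module silencing that D'Aprano cautions against can be sketched with warnings.filterwarnings(). The module names, the ping() function, and the helpers are hypothetical; the stand-in modules are built in memory so that the example is self-contained.

```python
import types
import warnings

# Source for a stand-in module; ping() and the module names are hypothetical.
_SOURCE = '''
import warnings

def ping():
    warnings.warn("ping() is deprecated", DeprecationWarning)
'''


def make_module(name):
    """Build an in-memory module so the example needs no files on disk."""
    module = types.ModuleType(name)
    exec(_SOURCE, module.__dict__)
    return module


def visible_warnings(module):
    """Return how many DeprecationWarnings from module.ping() get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", DeprecationWarning)
        # Inserted in front of the filter above, so it takes precedence.
        warnings.filterwarnings(
            "ignore",
            category=DeprecationWarning,
            module=r"noisy_dependency(\.|$)",
        )
        module.ping()
    return len(caught)
```

The filter matches on the module that the warning is attributed to, so a library that warns with stacklevel=2 attributes the warning to its caller, and a filter naming the library would not catch it.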

In the world of open-source software, the lines between users and "vendors" of software are blurred, he said. Users often have the ability, and certainly have the legal right, to change the code based on observing problems of this (or any other) nature, but there is something of a social problem, "and you cannot fix social problems with technology". Ignoring warnings breaks some assumptions about how open source works:

The open source mantra about many eyes making bugs shallow doesn't work when everyone is intentionally closing their eyes to the warnings of pending bugs.

Barker said that he does try to submit fixes upstream when he notices problems of that sort, as did others in the thread. There is still the problem, mentioned by Stinner, that even once fixes are contributed, releases including them may still take a while; as Stephen J. Turnbull put it: "even if you submit a patch, there's no guarantee that the next version (or three) will contain it".

With regard to silencing DeprecationWarning, Steve Dower said that it was not necessarily a mistake to do so:

If we'd gone the other way, perhaps we'd be looking at massive complaints from "regular" end users about all the noisy warnings that they can't fix and saying that making it noisy was the mistake.

He was not opposed to reverting the changes as proposed, though he thought it might be "a bit premature" to do so now, roughly nine months before the release. They can be reverted closer to the release if the packages in question still are not fixed (and released). If they do get reverted now, because "they cause churn for no real benefit", that would be reasonable; those who are opposed can argue that the benefit is real, however, "as long as they also argue in favour of the churn". He also made a broader point:

We shouldn't pretend to be surprised that something we changed causes others to have to change. We *know* that will happen. Either we push forward with the changes, or we admit we don't really need them.

Stinner pointed to two different examples of the kinds of problems that Fedora has found by testing with development versions of upcoming Python releases. There are advantages to finding these problems as early as possible: "If issues are discovered earlier, we get more time to discuss and design how to handle them." He thinks it makes sense to revert these particularly problematic deprecations now because it will help flush out more problems further down in the dependency chain:

In Fedora, if a frequently used dependency is broken, a long list of packages "fail to build". (In Fedora, the package test suite must pass to build a package successfully.) If it takes 9 months to fix this dependency, we will likely miss other issues before the Python final version in dependent packages.

Sebastian Rittau said "that some (semi-) automated way to actively test and notify important projects of deprecations/removals before a release would be a great addition to the Python ecosystem", though he acknowledged that it might be difficult to do. Stinner replied that, in effect, Fedora is already doing that, albeit with "changes already merged in Python". He has done some work on ways to automatically test Python with patches applied, to test upcoming or proposed changes, but it turned out to be rather complicated.

Smith was also in favor of the reversions; he thanked the Fedora team for helping bring these problems to light, and noted that being proactive is a better way forward:

Deprecation removals are hard. Surfacing these to the impacted upstream projects to provide time for those to integrate the changes is the right way to make these changes stick in 3.12 or later. [...]

As you've done the work to clean up a lot of other OSS projects, I suggest we defer this until 3.12 with the intent that we won't defer it again. That doesn't mean we can't hold off on it, just that we believe pushing for this now and proactively pushing for a bunch of cleanups has improved the state of the world such that the future is brighter. That's a much different strategy than our passive aggressive DeprecationWarnings.

Toward the end of the original proposal message, Stinner had some thoughts on being even more proactive in the future. He suggested that, before making an incompatible change, developers search the Python Package Index (PyPI) for uses of the feature in question "and try to update these projects *before* making the change". Once the number of affected projects has been reduced to some low number (he suggested 15), the change could be made in Python.

The Python ecosystem is huge, with an amazing number of projects, libraries, packages, tools, and so on, subsets of which are gathered up together into Linux (and other) distributions. All of those packages support differing ranges of Python versions, which makes the job of distributions that much harder, since they typically settle on one Python version to maintain throughout the life of a particular distribution release. Deprecating pieces along the way makes that ever more difficult, of course.

There are other software projects that take a different approach; the Linux kernel somewhat famously almost never deprecates something unless it truly can no longer be supported (e.g. ancient hardware or an API that leads to a security hole), but Python (and some other languages) have not chosen that course. There are certainly advantages to leaving things behind, especially when replacing them with something emphatically and unquestionably better, but it does have its downsides as well. It would seem that Python is drawing closer to finding the right balance when the deprecation route is taken, though there are always likely to be bumps along the way.


Index entries for this article
PythonDeprecation


Python and deprecations redux

Posted Feb 2, 2022 3:49 UTC (Wed) by tbird20d (subscriber, #1901) [Link]

I've come to the unfortunate conclusion that writing code in Python is a fool's errand. I have been a fan and user of the language since Python 1.5. Every project that I have migrated to version 3 of the language suffers a stream of breakages as new versions of the language and libraries come out. I'm starting to resist the urge to move things forward. I'm not sure what I'll do when distros stop shipping Python 2.7.

Python and deprecations redux

Posted Feb 2, 2022 10:15 UTC (Wed) by ddevault (subscriber, #99589) [Link]

I've come to the same conclusion. Unfortunately, I am responsible for a number of large Python codebases. Slowly rewriting them...

Python and deprecations redux

Posted Feb 2, 2022 20:24 UTC (Wed) by tnoo (subscriber, #20427) [Link]

There is Conda that lets you cleanly isolate different versions of libraries and Python.

Just activate the proper Conda environment for each code base you have.

Very convenient.

Python and deprecations redux

Posted Feb 3, 2022 15:22 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]

Depending on how reliant the code bases are on new versions of packages, you may want to consider either building Python 2.7 from source or Tauthon:

https://github.com/naftaliharris/tauthon

I'm personally staying on Python 2.7 forever. I've never migrated anything and never will because I refuse to let other people create work for me. Oh, and I'll never have to worry about deprecations again :)

The Python Lumberjack
-------------------------------

Oh, I'm on 2.7 and I'm okay.
I sleep well at night, and do real work during the day.
No future deprecations will be coming my way!
Oh, I'm on 2.7 and I'm okay!

Python and deprecations redux

Posted Feb 17, 2022 1:12 UTC (Thu) by nix (subscriber, #2304) [Link]

Tauthon, hm, I was wondering about that...

... no commits for nearly a year now. I'd call that more or less dead, alas :(

Python and deprecations redux

Posted Mar 2, 2022 8:17 UTC (Wed) by cpitrat (subscriber, #116459) [Link]

Well, as long as you don't have security concerns and don't need new libraries and don't need hardware not supported by the old version ...

Python and deprecations redux

Posted Feb 2, 2022 11:41 UTC (Wed) by ceplm (subscriber, #41334) [Link]

Next service pack of SLE-15 won’t have Python 2. Ding dong ding dong, your time is coming!

Python and deprecations redux

Posted Feb 2, 2022 11:44 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

My previous company successfully solved this by moving everything from Python to Go and Java.

Both are exceptionally good at maintaining backwards compatibility (arguably, too much so in case of Java).

Python and deprecations redux

Posted Feb 2, 2022 14:52 UTC (Wed) by madscientist (subscriber, #16861) [Link]

Python (2, at least, haven't tried P3) is very easy to build locally from source. We've been doing this for years, initially so that everyone could use Python 2.7 even if their local system had an older version installed, and now so everyone can use Python 2.7 even if their local system had a newer version installed.

Python and deprecations redux

Posted Feb 2, 2022 21:27 UTC (Wed) by ceplm (subscriber, #41334) [Link]

(working for SUSE, but my opinions do not reflect opinions of my employer, yada yada)

Aren’t these boring cleanup tasks (like running sed -e 's/assertRegexpMatches/assertRegex/' on all files in packages in distro) exactly what enterprise distributors should do? There are some things which people won’t do unless they are paid to do so, and nobody else will pay for this misery.

We are just finishing another similar ultra-boring thing: eliminating nose (that’s nose1) from the distro. There were hundreds of patches sent upstream, some of them trivial, some (whole ipython universe, boto) far far from trivial, some of them were just sent by us upstream, some of them we have to develop in close cooperation with upstream because of their complexity.

Python and deprecations redux

Posted Feb 2, 2022 21:41 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link]

> Aren’t these boring cleanup tasks (like running sed -e 's/assertRegexpMatches/assertRegex/' on all files in packages in distro) exactly what enterprise distributors should do?

They do for packages they ship. However the ecosystem of Python is much much larger than the packages shipped by the distros. So once you go beyond the core set, you start hitting the rough edges. In the container world, it's not uncommon for devs to just bypass the distro packages and use Pip directly because they want a newer version or they just don't know any better. So the distro work doesn't benefit them.

Python and deprecations redux

Posted Feb 2, 2022 21:52 UTC (Wed) by Wol (subscriber, #4433) [Link]

> Aren’t these boring cleanup tasks (like running sed -e 's/assertRegexpMatches/assertRegex/' on all files in packages in distro) exactly what enterprise distributors should do? There are some things which people won’t do unless they are paid to do so, and nobody else will pay for this misery.

Do the majority of programmers work for software houses, or for end users? And I think you'll find there are a LOT of people (like me, now) for whom programming is a large chunk of the job, but they're not called programmers. And for many, Python is their tool of choice.

So all your hard work REMOVING "nose", and similar, is actually MAKING work for them.

Cheers,
Wol

Python and deprecations redux

Posted Feb 2, 2022 21:58 UTC (Wed) by ceplm (subscriber, #41334) [Link]

Because of course you don’t care that you sell to your client software with known bugs including security bugs (that’s also for those who promote here building your own 2.7 from the upstream tarball … we have currently 37 patches on 2.7, which they won’t apply).

Python and deprecations redux

Posted Feb 2, 2022 22:37 UTC (Wed) by Wol (subscriber, #4433) [Link]

WHAT customers? Didn't you actually read what I wrote?

Cheers,
Wol

Python and deprecations redux

Posted Feb 2, 2022 21:36 UTC (Wed) by eplanit (guest, #121769) [Link]

Same here. I've wanted to like Python, but the v2 vs. v3 split makes it impractical. I've experienced more misery than benefit from it, mostly due to the version split and its cumbersome ecosystem of dependencies. I've been willing to forgive the annoying indentation-based syntax, and recognize that a lot of people are more than ok with its nuances.

In my 30+ years of being a software engineer, I've known no other language to be so popular, yet I've also known no other to have a decade+ long v2 vs. v3 split and such a list of peculiarities.

I've become more a fan of Golang -- you still have to manage your dependencies for developing, but you can ship a simple executable and live the day with much less stress (and much simpler installation instructions for your user/customer).

Python and deprecations redux

Posted Feb 3, 2022 8:18 UTC (Thu) by LtWorf (subscriber, #124958) [Link]

I agree, but I'm curious as to what you think is the replacement for python.

Python and deprecations redux

Posted Feb 3, 2022 15:31 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]

You can get off the treadmill pretty easily: just package your own Python 2.7 build with your packages.

This site may help:
https://github.com/pts/staticpython

Deprecated shouldn't mean removal

Posted Feb 2, 2022 4:09 UTC (Wed) by david.a.wheeler (subscriber, #72896) [Link]

There seems to be a confusion between the word deprecated and the word removal. Deprecated just means that you're discouraging people from using it. There's no particular reason a deprecated feature MUST be removed, that should be evaluated case by case and only removed when the cost of removal is *now* low compared to its benefits. For example, the unittest aliases... Removing them clearly causes harm, yet they are just aliases. What is the harm of being kind to your users? I don't see a strong advantage to removing them other than having an unwise policy of always removing deprecated names. Sure, things should sometimes be removed, but removal is pretty user-hostile.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 7:05 UTC (Wed) by mb (subscriber, #50428) [Link]

Deprecating and eventually removing some big functionality, that causes major maintenance burden, is one thing.
But removing simple aliases and other trivial things is another.

Removing simple aliases will make the life of the library/language maintainer almost no better, but it will force the users to create compatibility layers and monkey patching, if they must support older Python versions or other interpreters that don't have the same change schedule as CPython. (The 2-3 transition is still a thing! If old aliases are removed, that possibly makes the 2-3 transition even harder again.)

We should have a deprecation period of at least 10 years by default.
For individual features and cases that duration could be reduced, if it's really a big pain to the library/language maintainers.

Trivial aliases should never be removed from central parts such as the stdlib or other big libraries, unless the functionality implementing these interfaces goes away as a whole. Just make these trivial deprecated things vanish from the documentation and be done with it.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 8:03 UTC (Wed) by Wol (subscriber, #4433) [Link]

Can't you add a deprecation layer? So for example everything deprecated in 3.7 moves into a module called "deprecated37". That's then left live in production, but as far as possible deleted from devs' workstations.

At which point we sort of end up with the same situation linux is in, where stuff is never actively removed until it bit-rots and no-one cares to fix it ...

Cheers,
Wol

Deprecated shouldn't mean removal

Posted Feb 2, 2022 10:18 UTC (Wed) by mb (subscriber, #50428) [Link]

> So for example everything deprecated in 3.7 moves into a module called "deprecated37".

That doesn't improve the situation for deprecation of trivial changes. Moving breaks the existing API.

> At which point we sort of end up with the same situation linux is in, where stuff is never actively removed until it bit-rots and no-one cares to fix it ...

On Linux decades old binaries can usually be run without problems.
Try that with Python scripts.

Linux ABI backward compatibility is not perfect, but it is way better than Python backward compatibility.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 17:05 UTC (Wed) by vstinner (subscriber, #42675) [Link]

> Linux ABI backward compatibility is not perfect, but it is way better than Python backward compatibility.

While the Linux source code (~30M LOC) is way bigger than the Python source code (1M LOC), the API exposed by the Linux kernel (syscall, ioctl, devices, etc.) looks smaller than the API of the Python language and its large standard library.

The Linux kernel has around 300 syscalls and Python has around 300 stdlib modules. The API of a Linux syscall looks smaller to me than the API of a whole stdlib module. For example, the Python module os provides more than 200 functions and also contains os.path submodule which also provides around 40 functions.

The discussed unittest module provides 80 methods and functions, and its unittest.mock sub-module provides 30 functions and methods.

Well, to be honest, I don't know the Linux kernel "API" well, so maybe I'm just plain wrong and ioctl(), BPF & co. are way larger than the Python API. Or maybe Linux API and Python API cannot be compared because they are too different ;-)

Note: Python also provides a C API which exposes more than 100 structures and around 1500 functions (1000 public and 500 "private" functions, but in practice many of these "private" functions are used by 3rd party C extensions). It's challenging to introduce new feature without breaking any of these functions which were not designed to be used by 3rd party code initially (not designed to remain "stable" forever), 30 years ago.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 17:59 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

> It's challenging to introduce new feature without breaking any of these functions which were not designed to be used by 3rd party code initially (not designed to remain "stable" forever), 30 years ago.

The core PyObject changes often enough that C API users need updates too, so it's not any more sacred in the backwards compatibility landscape than anything else. I've not seen any other efforts into making it more future-proof either.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 0:38 UTC (Thu) by vstinner (subscriber, #42675) [Link]

> The core PyObject changes often enough that C API users need updates too, so it's not any more sacred in the backwards compatibility landscape than anything else.

The PyObject structure has been the same since the initial Python commit in 1990. Only the structure name changed from "object" to "PyObject" (in the early years of Python). What do you mean by frequent PyObject changes? Could you be more specific?

> I've not seen any other efforts into making it more future-proof either.

I'm actively working on bending the C API towards a more stable API (and get a stable ABI) in the long term. For example, I wrote PEP 620, PEP 670 and PEP 674.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 13:15 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

3.8 renamed `tp_print` to `tp_vectorcall_offset` and, in the process, it went from a pointer to an integer type. Code which passed `nullptr` from C++ code no longer compiled. Looking at our preprocessor definitions around it, it seems that `tp_print` had been moved to the end of the structure at that time, but removed in 3.9, breaking definitions that initialized every field (IIRC, extensions are fine as long as zero-initialization is fine, but removing a member breaks pedantic code).

I'll also note that the PyConfig initialization routines (added in 3.8) are way better, but 3.10 introduced a new initialization codepath for the interpreter that broke how we supplemented `sys.path` in our interpreter wrapper. Unfortunately, I did not get around to this until after the final 3.10 release. Basically, Py_Main resets `sys.path` and we need to defer the addition of our own paths until after initialization.

I'll also note that PyConfig is missing "add this to sys.path" as the only options are "do the default stuff" and "I'll do all the work myself" with no middle ground (at least as far as the docs indicate).

> I'm actively working on bending the C API towards a more stable API (and get a stable ABI) in the long term. For example, I wrote PEP 620, PEP 670 and PEP 674.

That is good to hear. These PEPs seem like real improvements are on the roadmap, thank you.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 15:56 UTC (Thu) by vstinner (subscriber, #42675) [Link]

> 3.8 renamed `tp_print` to `tp_vectorcall_offset` and, in the process, it went from a pointer to an integer type.

Oh, you're talking about the PyTypeObject structure and defining "static types". Since Python 3.2, there is a new PyType_FromSpec() API which doesn't suffer from these issues. In Python 3.9 and 3.10, this API has been completed to support more PyTypeObject members. I'm not sure that PyType_FromSpec() is well advertized. See the PEP 630 for a good overview of current best practices: https://www.python.org/dev/peps/pep-0630/

> I'll also note that PyConfig is missing "add this to sys.path" as the only options are "do the default stuff" and "I'll do all the work myself" with no middle ground (at least as far as the docs indicate).

Aha, the "Path Configuration" is the most complex part of the Python initialization. In Python 3.10, you can call PyConfig_Read(config) to compute the default Path Configuration, and then modify config.module_search_paths to insert or append your own paths.

In Python 3.11, Modules/getpath.c has been reimplemented in pure Python (Modules/getpath.py). I'm not sure how it impacts PyConfig API, I didn't follow these recent changes. I designed and implemented PEP 587 (the new PyConfig C API) in Python 3.8.

We lack user feedback on these APIs. You may open an issue at bugs.python.org to elaborate your use case and explain how the current API doesn't fit your needs.

Deprecated shouldn't mean removal

Posted Feb 17, 2022 15:34 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

> Oh, you're talking about the PyTypeObject structure and defining "static types".

Whoops, indeed. Sorry.

> I'm not sure that PyType_FromSpec() is well advertized.

No, it is not. I've opened an issue to start using this instead.

> We lack user feedback on these APIs. You may open an issue at bugs.python.org to elaborate your use case and explain how the current API doesn't fit your needs.

Thanks; I'll look at summarizing there.

Deprecated shouldn't mean removal

Posted Feb 4, 2022 18:21 UTC (Fri) by mb (subscriber, #50428) [Link]

> While the Linux source code (~30M LOC) is way bigger than the Python source code (1M LOC), the API exposed by the Linux kernel (syscall, ioctl, devices, etc.) looks smaller than the API of the Python language and its large standard library.

Is it harder to prevent accidental API breakage in Python than in Linux?
Probably yes.

But that's not the point.
You are breaking the API on purpose! (= deprecation and eventual removal).
That's the point.

The Linux rule is pretty simple: Don't break applications.

And I don't think that would be impossible for Python.
Other complex languages do manage to achieve that goal. Look at Rust, for example, which has very strict rules for backward compatibility.

Deprecated shouldn't mean removal

Posted Feb 10, 2022 14:34 UTC (Thu) by irvingleonard (guest, #156786) [Link]

I would argue that it's unfair to compare Linux and Python. The accurate comparison would be to compare the Python Standard Library and glibc, since very little in the actual language has been deprecated (like the print statement demoted to a function). The question then becomes: how stable has glibc been over the years and how do they handle deprecations?

Deprecated shouldn't mean removal

Posted Feb 2, 2022 8:49 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

I'm torn. On the one hand, I'm generally sympathetic to the argument that aliases are cheap and removing them is a waste of developer resources. Removals cause a lot of churn and the people who have to do that work are, generally speaking, not the people who decide what gets removed in the first place.

On the other hand, CPython is its own project, and if their developers don't want to maintain these aliases, we don't really have the right to demand that they maintain them "for free."

Perhaps a compromise solution would be for an independent group of developers to maintain a single, de facto standardized compatibility layer for each new minor version of Python, which monkey-patches all of the "easy" deprecated aliases back in, and maybe also supplies simple implementations for some of the other removed functionality (perhaps with inferior performance or quality of implementation, if copying the CPython code wholesale is not practical). Given the amount of work which CPython has already caused through deprecation, and the relative simplicity of this sort of monkey-patching, I find it mildly confusing that such an effort does not exist already.

I'm aware of Tauthon, which is (apparently?) still plugging along, but putting my SRE hat on for a moment, I wouldn't let it anywhere near any of my production systems without a lot of very intensive testing and analysis. It's far too big and complicated compared to the sort of shim that I'm imagining.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 10:31 UTC (Wed) by mb (subscriber, #50428) [Link]

> On the other hand, CPython is its own project, and if their developers don't want to maintain these aliases, we don't really have the right to demand that they maintain them "for free."

Well, yes. There's no right to demand. That's correct.
But CPython is certainly not on its own when making decisions. CPython is not some kind of end user application, where only the project and its end users are affected by decisions. CPython is a (de facto standard defining implementation of a) programming language.

And Python developers *are* very good at their decision processes. They do generally care a lot about their users.
But the current deprecation process for trivial things just causes a lot of work for nothing on the user side, and a bad reputation for (C)Python.

That's why I'm in favor of sticking with deprecated things forever, if they are relatively easy to maintain. Or at least *until* they become hard to maintain. Just hide them from the documentation, so that no new development is based on it.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 12:08 UTC (Wed) by smurf (subscriber, #17840) [Link]

The problem with this is that there's no pressure to drop deprecated calls when chances are that they will stay around forever.

Then, when it's apparent that one becomes too much of a maintainer burden, the pressure to remove things *now* becomes rather high. So it'll get dropped even if there are still users out there.

On the other hand, if it's clear that once something's deprecated it'll vanish after two more releases, everybody has some incentive to actually fix their code before that happens.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 15:16 UTC (Wed) by david.a.wheeler (subscriber, #72896) [Link]

> The problem with this is that there's no pressure to drop deprecated calls when chances are that they will stay around forever.

That is NOT a problem in many cases. Deprecated aliases that last thousands of years are PERFECTLY FINE.

> On the other hand, if it's clear that once something's deprecated it'll vanish after two more releases, everybody has some incentive to actually fix their code before that happens.

I'm big on fixing code over time, but all developers have to prioritize their code. Please let me focus on what's important, not on the spelling of a method name.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 8:39 UTC (Thu) by nim-nim (subscriber, #34454) [Link]

Look, it’s a balancing act.

If you develop using a feature-poor stack (in C for example), you may never need to change your code due to someone else’s decision.

If you develop using a feature-rich batteries included stack there is a ton of features you get almost free, often better coded than you would yourself (because even if you had the capability, you never had the time to rewrite properly all of them). But, it’s only almost-free, this kind of stack is never really done and you have to adapt your code over time to its changes.

Asking the feature-rich stack to provide perfect eternal backwards compatibility is not reasonable. Pinning its state (like some people do on the static building and container side) is only deferring the technical debt, with eventual software abandonment in the future when the debt pile has grown so much it weighs more than the software value.

This kind of write and forget dev only works for games that need to pass a couple Christmas seasons working and nothing more. Also sane people do not let games touch serious data (financial or other).

Deprecated shouldn't mean removal

Posted Feb 3, 2022 11:06 UTC (Thu) by Wol (subscriber, #4433) [Link]

> If you develop using a feature-rich batteries included stack there is a ton of features you get almost free, often better coded than you would yourself (because even if you had the capability, you never had the time to rewrite properly all of them). But, it’s only almost-free, this kind of stack is never really done and you have to adapt your code over time to its changes.

Might be a bit late for Python, but if it's batteries-included, the language should be split into 3 parts. "Core" which is guaranteed to (almost) never change, "Battery Packs" where all the nifty things live, and "Recycle Bin" where battery packs go to die. Then the development environment can moan every time it goes to load a battery pack and finds it in the recycle bin.

Cheers,
Wol

Deprecated shouldn't mean removal

Posted Feb 3, 2022 16:56 UTC (Thu) by atnot (subscriber, #124910) [Link]

I really hope python heads in this direction. As many have pointed out "the standard library is where modules go to die". It has accumulated far too many de facto unmaintained libraries that have better alternatives or could be written better today.

However last time this was proposed Guido stormed out of the room in anger and made the presenter quit python altogether so perhaps it's not wise to propose it again. (https://lwn.net/Articles/790677/)

Deprecated shouldn't mean removal

Posted Feb 3, 2022 17:52 UTC (Thu) by Wol (subscriber, #4433) [Link]

Reading that previous article, maybe we don't want to call it Recycle Bin, just call it "Norwegian Blues" :-)

If nobody wants to maintain it, that's where it goes ...

Cheers,
Wol

Deprecated shouldn't mean removal

Posted Feb 10, 2022 14:35 UTC (Thu) by irvingleonard (guest, #156786) [Link]

The problem is that's basically a tradeoff.

It's very expensive to improve stable code, because of the constraint that you can't break existing stuff. It's only "safe" to add new stuff, which you'll have to maintain "forever", so you better foresee any future need or you'll soon be writing the 3rd version of your function, and so the story goes. With every version you increase the maintenance burden (you better have tests) and you discourage any change upstream: any change in the actual language would end up affecting 3 functions instead of 1, and that's only for "this" thing. This approach has the advantage that anyone using the code will be able to do so "forever", at the expense that it will be "all" that you'll get from it. Need a better "std_fancy_function5"? You're out of luck, the maintainer ran out of hair after version 4 and outright quit after 5; but we're looking for maintainers, so you could contribute "std_fancy_function6" and maintain the other 5...

On the other hand an evolving stdlib will keep breaking stuff, and generating work for developers, sometimes just annoying, sometimes useful, BUT you could get that "std_fancy_function6" basically "for free"; just keep in mind that you have to update all your code using 1-5.

The current state of affairs is something in between: the stdlib is so important that major changes are discouraged (which helps very little in the usability side) but at the same time such changes are not prohibited and eventually find their way to a stable version (which infuriates some people). It's a lose-lose situation, where some people get burned because of the changes while others end up using 3rd party libraries because of the limitations of the stdlib counterpart.

Other solutions for abandon-code:
- Avoid the stdlib, since it's the major source of changes; you can probably create "set and forget scripts" as long as you don't rely on any module. I would say that creating "a program" this way would be too much, but simple scripts should survive for a very long time.
- Just hang on, the code will eventually mature enough, like the 3.10, which added/changed very little in comparison to other versions.
- Use another, compiled, language and just statically link everything (I wouldn't use this code in anything critical since you would be "baking" all the libraries' bugs into your binary, forever)

Deprecated shouldn't mean removal

Posted Feb 2, 2022 17:08 UTC (Wed) by rgmoore (✭ supporter ✭, #75) [Link]

> On the other hand, if it's clear that once something's deprecated it'll vanish after two more releases, everybody has some incentive to actually fix their code before that happens.

Assuming there's somebody who's actively maintaining the code. There's a lot of code out there that is in low-effort maintenance mode. That means the first anyone will know about it is that their application breaks, after which someone will have to scramble to fix it. It seems as if Python is basically saying it's for projects that will always be under active development forever, and people who want to write something that will keep working with minimal maintenance need not apply.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 18:57 UTC (Wed) by fenncruz (subscriber, #81417) [Link]

If the code is really only in maintenance mode then shouldn't it pin the Python version?

Deprecated shouldn't mean removal

Posted Feb 2, 2022 20:21 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

Linux distros keep bumping the Python versions. So if you're not using extra-long-support distros, you'll have to upgrade eventually.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 20:28 UTC (Wed) by tnoo (subscriber, #20427) [Link]

> So if you're not using extra-long-support distros, you'll have to upgrade eventually.

or you use conda and run your code in the exact environment you need

Deprecated shouldn't mean removal

Posted Feb 2, 2022 21:36 UTC (Wed) by Kamiccolo (subscriber, #95159) [Link]

> use conda

please, stop.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 6:10 UTC (Thu) by tnoo (subscriber, #20427) [Link]

care to elaborate?

Deprecated shouldn't mean removal

Posted Feb 3, 2022 16:21 UTC (Thu) by sb (subscriber, #191) [Link]

Whenever someone expresses dismay at the maintenance burden of using Python, and how so much of that burden was entirely avoidable, someone else comes along and says something like "just add shmronda, very convenient" :-)

This gets old after a while, especially because the suggestion is often presented as a unique sine qua non for Python but just happens to be that person's preferred workaround at the time and there are several others, presented likewise by their proponents.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 11:55 UTC (Wed) by azumanga (subscriber, #90158) [Link]

The problem is the Python developers don't want backwards compatibility. There are more than enough people who like Python 2 that it could have carried on, with minimal fixes, forever.

The Python dev team went as far as to threaten legal action against anyone who tried to keep Python 2 alive, if they called their project anything close to "Python".

I am aware that sounds surprising, so here is a link: https://github.com/naftaliharris/tauthon/issues/47#issuec...

Deprecated shouldn't mean removal

Posted Feb 2, 2022 15:39 UTC (Wed) by rgmoore (✭ supporter ✭, #75) [Link]

> The problem is the Python developers don't want backwards compatibility.

Exactly this. I think the root is that the devs learned the absolute wrong answer from the 2 to 3 transition. I think we can all accept that was bad, and nobody wants to go through it again. But what most programming languages would learn from that is to avoid backward-incompatible changes whenever possible. What Python learned from it was to make backwards incompatible changes unavoidable, so developers are forced to change with the language rather than relying indefinitely on things that are eventually going away.

That's a reasonable approach for programs that are being actively developed, at least as long as the overall trajectory of the language is positive. In that case, developers are willing to pay a deprecation tax to keep up. It's terrible for programs that are being developed slowly or expected to keep functioning with minimal maintenance, since those programs don't need the new features and have to spend their limited maintenance effort on keeping up with apparently unnecessary changes. It's especially egregious in the case of a language like Python that depends on the runtime to function, since there's no way to avoid dealing with the changes.

The long-term problem is that this should be really scary to people considering Python for projects that aim to reach a stable final product. The Python devs' attitude says it's a bad language for projects that aim for stability. You can't build a stable program on an unstable language, and Python intends to remain unstable.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 19:36 UTC (Wed) by cyperpunks (subscriber, #39406) [Link]

> The long-term problem is that this should be really scary to people considering Python for projects that aim to reach a stable final product. The Python devs' attitude says it's a bad language for projects that aim for stability. You can't build a stable program on an unstable language, and Python intends to remain unstable.

Python is dead as a viable language if the current policy doesn't stop very soon. It's just too risky to base your work on such an unreliable project. It's not just the deprecation policy, it's the very short release cycles, the idiotic lack of a crypto lib in core (the dependency on Rust in cryptography just makes the point of madness very clear), the non-development of pip and the whole "we don't care because we are free (as in beer)" attitude. It's sad.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 4:22 UTC (Thu) by roc (subscriber, #30627) [Link]

Ironically Python has never been stronger in terms of the number of users of the language. That's partly to do with the growth of machine learning etc.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 17:02 UTC (Wed) by mb (subscriber, #50428) [Link]

>The problem is the Python developers don't want backwards compatibility.

I don't think that is true.
We're talking about corner cases here. Overall the language is pretty backwards compatible, aside from the 2-3 transition. (See for example how match had been implemented into the parser).
However, corner cases are still very important.

> The Python dev team went as far as to threaten legal action against anyone who tried to keep Python 2 alive, if they called their project anything close to "Python".

Well, I won't argue whether it is Ok to threaten legal actions here.
But I would simply _expect_ people to rename the project, or at least clearly mark it as a fork, if they fork it.
This is a matter of decency towards their users and to the original project.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 17:36 UTC (Wed) by Wol (subscriber, #4433) [Link]

> But I would simply _expect_ people to rename the project, or at least clearly mark it as a fork, if they fork it.

The problem with that, is that the NEW name goes to the OLD software, while the OLD name stays with the NEW software.

Okay, I can understand the project owners not wanting the fork to keep the name, but equally the fork is changing NOTHING BUT the name, that's the whole point of the fork! So if they're not changing anything else, why do they need to change that?

That's why Perl6 forking off as Raku was a victory for common sense over personal pride.

Cheers,
Wol

Deprecated shouldn't mean removal

Posted Feb 3, 2022 0:49 UTC (Thu) by jkingweb (subscriber, #113039) [Link]

> The Python dev team went as far as to threaten legal action against anyone who tried to keep Python 2 alive, if they called their project anything close to "Python".

After reading the whole thread, I think that's a gross mischaracterization of what actually happened.

It was pointed out that calling the software "Python 2.8", while technically accurate (for some definition of accurate), was legally problematic and potentially a source of significant confusion. The author was open to changing the name, and while alternatives were being discussed, a third party took it upon themselves to besmirch van Rossum's character. Thus the latter responded negatively to that, but it seems to have been in a sarcastic, deadpan way. I find it hard to interpret that as an actual threat.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 1:00 UTC (Thu) by vstinner (subscriber, #42675) [Link]

> There are more than enough people who like Python 2 that it could have carried on, with minimal fixes, forever.

I read that often in the last 10 years. So far, I didn't see any volunteer doing it, even after Python 2.7 support ended 2 years ago.

Red Hat backports security fixes to Python 2.7 in Fedora, RHEL 7 and RHEL 8 until 2024. Fedora patches are public: https://src.fedoraproject.org/rpms/python2.7/tree/rawhide

The problem is that users expect more than just the language and the stdlib when they want "Python". They also expect large Python projects like numpy, Jupyter or PyTorch, but these projects already dropped Python 2 support: https://python3statement.org/

> The Python dev team went as far as to threaten legal action against anyone who tried to keep Python 2 alive, if they called their project anything close to "Python".

Tauthon is *not* Python 2.7. It is something between Python 2.7 and Python 3 which could be called "Python 2.8". PEP 404 rejected the idea of a Python 2.8 version: https://www.python.org/dev/peps/pep-0404/

Tauthon description: "Fork of Python 2.7 with new syntax, builtins, and libraries backported from Python 3."

Anyone is free to fork Python 2.7, add recent Fedora security fixes and maybe fix a few bugs. Since most Linux distributions still ship Python 2.7 in 2022, there is no need to maintain a Python 2.7 fork right now. You're free to continue using Python 2.7.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 13:37 UTC (Thu) by farnz (subscriber, #17727) [Link]

As a side note, there's also a general tendency to complain when Red Hat stops doing maintenance work. When Red Hat stopped making "make X11 work well for people not using Wayland" part of his job, Adam Jackson stepped down from being the Xorg release manager - he'd only been doing it because Red Hat made it part of his job.

No-one else stepped up to that job, so it didn't get done, but people were willing to complain bitterly that it wasn't happening, and that Red Hat "should" have made Adam continue to do it for them, not because Red Hat needed it for their product.

I could see the same happening with Python 2.7; Red Hat will stop supporting it in 2024 unless someone (or a group of someones) steps up with significant money for them to do it. If no-one else steps into that breach, I predict that in 2025 or so, we'll see a selection of complaints that no-one is supporting Python 2.7 any more, but they were using it, and someone should have supported it for them so that whatever problem comes up with Python 2.7 didn't bite them.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 15:06 UTC (Thu) by cortana (subscriber, #24596) [Link]

For context, by that point Python 2.7 will be 14 years old. At the time it was released, we were still 3 years away from Ruby 2.0, 1 year away from Java 7, 9 months away from GCC 4.6...

and a year away from Perl 5.14. :)

Deprecated shouldn't mean removal

Posted Feb 3, 2022 15:29 UTC (Thu) by azumanga (subscriber, #90158) [Link]

Where exactly can I sign up to help keep Python 2.7 going?

All messages I have seen have made very clear that there will be no more releases at all, not "there is insufficient support".

Deprecated shouldn't mean removal

Posted Feb 3, 2022 15:54 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]

The problem is that no one has forked the project yet, so there's no place for people to go to contribute to an upstream.

You could sign up by changing that: create a fork of Python 2.7 on GitHub and incorporate the Fedora patches.

Since the Python project has trademarked the name "Python" and wants to be your enemy, you'll have to change the name. "Snek" may be a good choice. Be aware that you should be able to still advertise your project as a "continuation of the Python 2.7 codebase" or "a project to provide continued maintenance for the Python 2.7 codebase" since those are true statements and you can generally use a trademark when you are making true statements. That's not legal advice, but I think it's true. Satisfy yourself as to the accuracy of what I said beforehand so that if the Python project tries to bully you, and they might, you can feel safe standing up for yourself.

Someone will eventually do this work if you don't, probably when the last distro stops providing patches. However, if you think you'd be good at it, call dibs by doing it now.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 20:20 UTC (Thu) by BenHutchings (subscriber, #37955) [Link]

Snek already exists as a Python-like programming language.

Deprecated shouldn't mean removal

Posted Feb 4, 2022 6:29 UTC (Fri) by linuxrocks123 (subscriber, #34648) [Link]

Ah, of course it does. Okay then. "27thon".

Deprecated shouldn't mean removal

Posted Feb 5, 2022 9:07 UTC (Sat) by adobriyan (subscriber, #30858) [Link]

or simply Py27.

Deprecated shouldn't mean removal

Posted Feb 3, 2022 15:55 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]

There's also Tauthon

https://github.com/naftaliharris/tauthon

That's more of a Python 2.8 than continued maintenance for 2.7, though.

Deprecated shouldn't mean removal

Posted Feb 4, 2022 10:15 UTC (Fri) by ceplm (subscriber, #41334) [Link]

https://github.com/naftaliharris/tauthon seems like the most viable option, but I think it is a complete exercise in futility. Just fix your code and switch to Python 3.*; it is much better to be out here.

Deprecated shouldn't mean removal

Posted Feb 2, 2022 23:08 UTC (Wed) by vstinner (subscriber, #42675) [Link]

> We should have a deprecation period of at least 10 years by default.

This article is about unittest and configparser. Most unittest deprecation warnings were added to Python 2.7 in 2010: 12 years ago. configparser deprecations were added to Python 3.2 in 2011: 11 years ago.

This article is not about *removing* the deprecated features but *keeping* them for one more year (Python 3.11) and better advertising these deprecations (that developers managed to ignore for longer than 10 years).

Deprecated shouldn't mean removal

Posted Feb 2, 2022 23:59 UTC (Wed) by fman (subscriber, #121579) [Link]

> but removal is pretty user-hostile.

Hostile nails it. That is exactly the feeling you get when being at the sharp end of the stick with no hope for sympathy for your frustrations from the core dev team.

Python and deprecations redux

Posted Feb 2, 2022 8:42 UTC (Wed) by azumanga (subscriber, #90158) [Link]

Sorry to sound blunt, but just stop breaking things!

None of these changes seem to allow future improvements, or fix security issues, they are just "tidying".

I'm happy for documentation of deprecated functions to be hidden, and maybe an always on warning.

It increasingly feels like Python actively hates the idea that I might write a program and it just be done. I don't want to keep doing tidy up every year. I have C99 programs doing useful work; they haven't needed any "cleanup" in 20 years.

Python and deprecations redux

Posted Feb 2, 2022 9:47 UTC (Wed) by cortana (subscriber, #24596) [Link]

> I'm happy for documentation of deprecated functions to be hidden

I can't stand this. If I'm looking through source code and I see an unfamiliar function, I want the Python Standard Library documentation to document it. Absolutely add a note that it's deprecated but don't hide it!

Python and deprecations redux

Posted Feb 2, 2022 10:42 UTC (Wed) by smcv (subscriber, #53363) [Link]

If the documentation for deprecated functions is hidden, then there's also nowhere to say what the non-deprecated replacement is.

Replacing the entire documentation for the deprecated foo_bar_baz function with "deprecated equivalent of foo_bar(baz=True), use that instead" is often fine, though.

Python and deprecations redux

Posted Feb 2, 2022 22:25 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

I should prefer to know *why*

Take Rust's str::trim_left_matches(). This function is deprecated since 1.33 about 3 years ago. Unlike Python it's unlikely Rust will ever actually remove deprecated standard library functions but nevertheless it is deprecated and your code should call str::trim_start_matches()

The documentation tells you this, but if you want to understand why you need to read the full description of the functions and perhaps if you've never seen them before, read a little about other human writing systems. Nobody is surprised by what trim_start_matches does but there is potential to be surprised by trim_left_matches depending on how you're thinking about the problem.

Python and deprecations redux

Posted Feb 2, 2022 11:01 UTC (Wed) by mb (subscriber, #50428) [Link]

> I can't stand this. If I'm looking through source code and I see an unfamiliar function, I want the Python Standard Library documentation to document it.

You can look into an older version of the documentation.

The latest documentation should only include the name of the deprecated interface and a big deprecation statement that tells us since when it has been deprecated (so you can look up the old documentation).
All technical description of the interface should be removed. Optionally the name of the new interface could be added, if that's applicable.

Python and deprecations redux

Posted Feb 4, 2022 13:47 UTC (Fri) by ceplm (subscriber, #41334) [Link]

Well, when you want to be blunt: pick a different language. I have been around Python since the early 2000s (Python 1.5.2) and I very much remember that the unspoken (or sometimes even spoken) social contract among Pythonistas (on comp.lang.python, which was then still The Thing™) was that the language changes and you had better have your test suites ready, because you will need to accommodate those changes regularly. The language was slowly changing, updates were regularly done, people accommodated to changes, and everybody was happy. Then the unfortunate py2k/py3k change happened and all hell broke loose. One unfortunate side effect was that Python 2.7 stayed stale for so long that people started to believe that stagnation is the rule and let their test suites rot away (which is a bad idea even for other reasons). Now, when life has returned to normal, and Python’s development is happily chugging away again, they are surprised.

All those stories of how “my thirty year old C program still works just fine” are based on two assumptions, which I am not sure people are willing to accept and which Python certainly won’t satisfy. The language (C, Fortran, COBOL) must be dead. So, even languages which are slightly alive (C++, Java, Perl) could be problematic and need periodic adjustments. And of course, aside from a dead language, you cannot use any libraries (because those change) and you need a dead environment (how are your C-languages for Plan9 or CP/M doing?). So, if you have a program which uses just stdio.h from the K&R book, it will still work (perhaps), with plenty of warnings, but anything more involved will have problems.

Python and deprecations redux

Posted Feb 4, 2022 19:21 UTC (Fri) by klindsay (subscriber, #7459) [Link]

Fortran is not dead, new features get added to the language. For example, Fortran 2003 added support for object oriented programming.

While there are some deleted features, the new features are added in a largely backwards compatible way. This is intentional. Backwards compatibility is a high priority of the standards committee. The 2003 standard includes the sentence "This standard protects the users’ investment in existing software by including all but five of the language elements of Fortran 90 that are not processor dependent.".

So there are programming languages that evolve and simultaneously prioritize backwards compatibility. It's not one or the other, as your second paragraph seems to imply.

Python and deprecations redux

Posted Feb 4, 2022 19:49 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> my thirty year old C program still works just fine

With Python it's getting difficult to run 5-year old code. Which is a problem.

Python and deprecations redux

Posted Feb 6, 2022 1:23 UTC (Sun) by abartlet (subscriber, #3928) [Link]

This is my strong feeling as well. Samba keeps getting hit by this, for changes that really don't seem to have any more rationale than "someone deprecated this a while back, we better finish removing it".

Of course Samba deprecates features and options as well - but I like to think we only do the removal when we really need to, not just to follow up a deprecation that almost by definition was not discussed with all/any significant fraction of the users.

Samba only finds out when Fedora builds break.

Python and deprecations redux

Posted Feb 6, 2022 3:09 UTC (Sun) by pabs (subscriber, #43278) [Link]

Seems like the Samba CI could be checking for deprecations when they happen instead of hitting them when the removals happen?
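
As a rough illustration of such a check, here is a minimal sketch using only the standard warnings module. The old_api() function and the deprecations_raised() helper are hypothetical stand-ins, not anything from Samba or CPython.

```python
import warnings


def old_api():
    # Hypothetical deprecated function standing in for a stdlib alias.
    warnings.warn(
        "old_api() is deprecated; use new_api()",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


def deprecations_raised(func):
    """Call func and return the deprecation messages it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, overriding the default filters.
        warnings.simplefilter("always")
        func()
    wanted = (DeprecationWarning, PendingDeprecationWarning)
    return [str(w.message) for w in caught if issubclass(w.category, wanted)]
```

A CI job could fail whenever the returned list is non-empty. A blunter option is to run the test suite with "python -W error::DeprecationWarning", which turns each such warning into an exception.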

Python and deprecations redux

Posted Feb 2, 2022 12:24 UTC (Wed) by jezuch (subscriber, #52988) [Link]

In Java land, the compiler prints something like "your code is using deprecated stuff"; it doesn't even mention what stuff, and it prints it *once*. But then it adds: "enable this lint to see all of them".

It's been like this for decades.

There's also a well-established process for deprecation for removal, according to which the deprecated things are clearly marked as such in the code. It is used for features which are known to not be widely used, or are known to be positively harmful (like the Applet API, or, more recently, finalization - so yes, it happens even for features once considered a core part of the language). Apart from that there's tons of deprecated stuff, and it just sits there undisturbed.

I have no point to make, and I don't even really care about Python :) But any time I read about Python community struggling with something, I see that it's successfully been done elsewhere. And it feels really amateurish in comparison.

Python and deprecations redux

Posted Feb 10, 2022 14:35 UTC (Thu) by irvingleonard (guest, #156786) [Link]

So, that raises a lot of questions:

- Who maintains the deprecated stuff in Java? (since it "just sits there undisturbed")
- Do the maintainers allow new bug reports for deprecated code?
- What about patches?
- If that's all possible then what's the difference to non-deprecated code? Just vocal stance against its use?

Python and deprecations redux

Posted Feb 10, 2022 20:20 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> - Who maintains the deprecated stuff in Java? (since it "just sits there undisturbed")
Java maintainer (Oracle).

> - Do the maintainers allow new bug reports for deprecated code?
Yes, for security issues.

> - What about patches?
Not really.

> - If that's all possible then what's the difference to non-deprecated code? Just vocal stance against its use?
Deprecation warnings during compilation.

Python and deprecations redux

Posted Feb 25, 2022 21:39 UTC (Fri) by irvingleonard (guest, #156786) [Link]

So, it's just a terminology issue. There's no Java equivalent to Python's "deprecation/removal" process, they just "vocally encourage and discourage" some pieces or others, but otherwise support the whole thing, forever.

Basically what I said here https://lwn.net/Articles/884324/ with the caveat that if Java works for you, you should definitely use Java.

Python and deprecations redux

Posted Feb 25, 2022 22:58 UTC (Fri) by dtlin (subscriber, #36537) [Link]

Java does remove APIs. https://docs.oracle.com/en/java/javase/16/migrate/removed...
Not close to the same rate as Python, but even beyond the API of the standard classpath, Java modularization breaks a good number of programs, both by making the runtime stricter and by removing previously standard components as well.

Python and deprecations redux

Posted Feb 28, 2022 14:30 UTC (Mon) by irvingleonard (guest, #156786) [Link]

I'm lost here. So, there's no difference between Java and Python in this regard, other than the timings? (how long Java keeps stuff around after deprecation vs Python's, etc.).

Python and deprecations redux

Posted Feb 2, 2022 13:51 UTC (Wed) by zeekec (subscriber, #2414) [Link]

I wonder* if it would be worthwhile to have two levels of warnings for deprecations. For the first two releases after deprecation, the warning would be silent by default, as it is now. For the next two(one) the warning would be visible by default (but silenceable). That would give people that care time to get their package ready and time to harangue those that don't.

* Note: I'm too lazy to google to see if this has already been suggested.

Python and deprecations redux

Posted Feb 2, 2022 15:34 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

No idea if it has been suggested either, but you want to conditionalize the warnings based on availability of the replacement. If Python had had some way of saying "this code expects to work with Python 3.5", anything deprecated in 3.6 could be silent because 3.5 (presumably) doesn't have the replacement. Exceptions would be:

- APIs where the replacement works in the declared version (please use the better spelling, pattern, replacement, etc.)
- APIs slated for removal Real Soon Now™ (this is effectively a request to bump the minimum requirement)

Of course no project currently has a way to tell the standard library what version it expects to work with, so this only works once such things can be communicated. Note that you also might have mixing with different modules/packages expecting different minimums, so this is something that should be attached to where the code is declared, not something set from the top-level package.

For bonus points, have a second version which states what version it is aware of. Then you can warn about any API deprecated before that second version additionally (the code is expected to have been made runtime-conditional to avoid any version skew issues).

Python and deprecations redux

Posted Feb 2, 2022 16:45 UTC (Wed) by vstinner (subscriber, #42675) [Link]

It's possible to start by introducing a PendingDeprecationWarning in Python 3.N, convert it to a DeprecationWarning in Python 3.N+1, and remove the function in Python 3.N+3. Python offers various ways to show or hide PendingDeprecationWarning or DeprecationWarning. Sometimes, a deprecation starts by only adding a mention to the documentation, before emitting a warning at runtime.

A problem is that Python's default behavior is for *users*, not developers: PendingDeprecationWarning and DeprecationWarning are hidden by default. The Python Development Mode (-X dev or PYTHONDEVMODE=1) shows these warnings.
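As a rough illustration of the show/hide behavior described above, here is a minimal sketch using the stdlib warnings module. The function old_api() and its message are hypothetical; the explicit filters stand in for what a user or a developer would typically configure, rather than relying on the interpreter defaults.

```python
import warnings


def old_api():
    """Hypothetical function that is being phased out."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to old_api()
    )
    return 42


def call_with_filter(action):
    """Call old_api() under the given filter action; return the captured warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, DeprecationWarning)
        old_api()
    return caught


hidden = call_with_filter("ignore")  # roughly what an end user sees
shown = call_with_filter("always")   # what a developer opts into

print(len(hidden), len(shown))
```

Running a test suite with -W error::DeprecationWarning is a stricter variant of the same idea: the warning becomes an exception instead of a message.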

pytest (a popular library for writing tests) and unittest (stdlib module) have shown DeprecationWarning warnings for a few years now; that wasn't the case previously. Things are evolving to better handle deprecations in Python.

Python and deprecations redux

Posted Feb 2, 2022 19:38 UTC (Wed) by iabervon (subscriber, #722) [Link]

What I'd like to see is support for hiding DeprecationWarnings (or PendingDeprecationWarnings) for only a provided list of (deprecation, call site) pairs. That way, you could get the deprecation warnings for code you develop and suppress the warnings for code you're just using (ideally, after reporting the warnings to developers of that other code). For that matter, even if you are also a developer of some of your dependencies, you tend to be working in one role or another at any particular time.
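The existing warnings filters get part of the way there: a filter can match on the module a warning is attributed to (and on a message regex), though not on an explicit list of (deprecation, call site) pairs. A minimal sketch, where vendored_lib and myapp are hypothetical module names and warn_explicit() is used only to simulate warnings coming from two different places:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    # Show every DeprecationWarning...
    warnings.simplefilter("always", DeprecationWarning)
    # ...except those attributed to the (hypothetical) dependency.
    # filterwarnings() inserts at the front, so this filter wins.
    warnings.filterwarnings(
        "ignore",
        category=DeprecationWarning,
        module=r"vendored_lib(\.|$)",
    )

    # Simulate a deprecated call inside the dependency.
    warnings.warn_explicit(
        "old call in a dependency", DeprecationWarning,
        filename="vendored_lib/util.py", lineno=10, module="vendored_lib.util",
    )
    # Simulate a deprecated call inside our own code.
    warnings.warn_explicit(
        "old call in our code", DeprecationWarning,
        filename="myapp/core.py", lineno=20, module="myapp.core",
    )

messages = [str(w.message) for w in caught]
print(messages)
```

Because the filter matches the module the warning is attributed to, the stacklevel chosen by the code raising the warning decides which side of the fence a given warning lands on.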

Python and deprecations redux

Posted Feb 3, 2022 0:28 UTC (Thu) by rra (subscriber, #99804) [Link]

The Python coverage offered by LWN of late has truly been excellent. These articles are quite valuable and interesting, and a great help to my day-to-day programming and planning. Thank you for publishing them!

It seems like every post about Python on LWN prompts a flurry of comments from people who want to posture about how much they dislike Python, and I hope that's not discouraging to the LWN writers. Please know that many of us use Python regularly and appreciate your coverage and your typical thorough and dispassionate job of keeping us up to date.

Python and deprecations redux

Posted Feb 3, 2022 1:44 UTC (Thu) by jake (editor, #205) [Link]

> I hope that's not discouraging to the LWN writers

I don't think it is, really. Programming languages seem to bring out a certain level of disdain from fans of other languages (or non-fans of Python or whatever) in comments sections everywhere. It is not a perfect language or community, by any means, but lots of folks (including LWN) use it for all sorts of interesting things and generally enjoy doing so. The Python community has given us a huge gift.

That said, your note certainly helped encourage us to continue covering it like we do. We appreciate it greatly.

jake

Python and deprecations redux

Posted Feb 3, 2022 7:59 UTC (Thu) by AdamW (subscriber, #48457) [Link]

In that case, I'll just +1 the OP here. Agree entirely, and thanks for the coverage, Jake.

Python and deprecations redux

Posted Feb 3, 2022 23:46 UTC (Thu) by vstinner (subscriber, #42675) [Link]

Even if I'm basically reading all python-dev emails, Discourse and (private) Discourse discussions, PEPs, etc. I'm still learning new things from these excellent summaries :-) Moreover, it's convenient when I need to share a summary of the discussion with someone. The traffic on python-dev can be insane some weeks, and it's really hard to digest everything.

Python and deprecations redux

Posted Feb 5, 2022 17:45 UTC (Sat) by willy (subscriber, #9762) [Link]

> Programming languages seem to bring out a certain level of disdain from fans of other languages

I don't think that's really what's going on in the comments for this particular article. Speaking for myself, I don't write Python (or any language that might remotely be considered a competitor), but I do want to be able to run code other people wrote without having to figure out what "the new way" to do that thing is.

I just want Python to be better, and I get the strong sense the other critics here want the same thing.

Python and deprecations redux

Posted Feb 3, 2022 9:31 UTC (Thu) by danpb (subscriber, #4831) [Link]

> Sebastian Rittau said "that some (semi-) automated way to actively test and notify important projects of deprecations/removals before a release would be a great addition to the Python ecosystem", though he acknowledged that it might be difficult to do. Stinner replied that, in effect, Fedora is already doing that, albeit with "changes already merged in Python".....

> .... Stinner had some thoughts on being even more proactive in the future. He suggested that before making an incompatible change, doing a search of the Python Package Index (PyPI) for uses of the feature in question "and try to update these projects *before* making the change". Once the number of affected projects has been reduced to some low number (he suggested 15), the change could be made in Python.

Not knowingly breaking stuff on PyPI is great, but what about the millions of projects using python code that don't exist on PyPI? PyPI merely hosts the code designed and published as reusable modules, but there's likely orders of magnitude more python code existing in leaf applications (both open source and private to an organization) that's just as important, probably more so to those who use it.

Python and deprecations redux

Posted Feb 3, 2022 23:50 UTC (Thu) by vstinner (subscriber, #42675) [Link]

The automated https://github.com/asottile/pyupgrade tool looks interesting for such a task! We need more of these tools!

I wrote a similar tool for C extensions: https://github.com/pythoncapi/pythoncapi_compat

At least, more and more incompatible changes are documented with practical instructions on how to update existing code in the "What's New in Python 3.x" document. Example with Python 3.11: https://docs.python.org/dev/whatsnew/3.11.html

Python and deprecations redux

Posted Feb 6, 2022 4:01 UTC (Sun) by ras (subscriber, #33059) [Link]

Reading the comments here I must be odd in thinking that mostly I don't mind Python's mechanism for handling change. The only time I've cursed it is the time they didn't stick to it - which is the Python2 --> Python3 transition.

As it happens just about everything that transition did could have been handled the old way. The one exception was the change to the way Python handled Unicode. Which is ironic because IMHO, the way they handle Unicode and Bytes in Python3 is objectively worse than Python2. In fact their treatment of how bytes is handled, and in particular that b'abc'[0] returns an integer and somehow deciding b'abc'[0] in b'abc' would be False is inexplicable. It's almost like they forgot their mission was to improve Python, not make it more like Java.
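For readers following along, here is a minimal sketch of how indexing, slicing, membership and equality behave on Python 3 bytes. As far as I can tell, the membership test accepts either an int or a bytes object, so the expression that comes out False is the equality comparison between an indexed element and a one-byte literal.

```python
data = b"abc"

first = data[0]          # indexing yields an int
first_slice = data[0:1]  # slicing yields a bytes object of length 1

int_membership = first in data          # int tested against the byte values
bytes_membership = first_slice in data  # subsequence test, as with str
equals_literal = first == b"a"          # int compared with bytes

print(first, first_slice, int_membership, bytes_membership, equals_literal)
```

Code ported from Python 2 that did data[0] == 'a' is the usual place this bites; slicing (data[0:1]) keeps the old comparison working.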

The sad part about all that is had they stuck to the original way of doing transitions rather than using the flag day method for the Unicode change, I suspect the worst of those decisions would not have made it through the process.

Python and deprecations redux

Posted Feb 10, 2022 14:34 UTC (Thu) by irvingleonard (guest, #156786) [Link]

AFAIK the py2->py3 transition was made incompatible by design. The idea was that all the existing code would have to be revisited to check for the string/bytes distinction (is the variable holding "text" or "bytes"?). There were tons of landmines for programmers using IO in any way, since the encode/decode thing was "optional", and a huge hole to attack your code (what if I use a different encoding in my request? what if you forget to put that line to encode your response?). When you live in an English only world and use UTF-8 (or ASCII) everywhere then you'll see it as a painful transition, otherwise you'll see it as a great change, it depends on your worldview.

Python and deprecations redux

Posted Feb 10, 2022 20:19 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> When you live in an English only world and use UTF-8 (or ASCII) everywhere then you'll see it as a painful transition, otherwise you'll see it as a great change, it depends on your worldview.

Really? Which one of these 5 non-UTF encodings do you prefer: KOI8-R, KOI8-U, CP1251, DOS866, or perhaps the most standard of them all: ISO/IEC 8859-5?

At the time Python3 was developed, UTF-8 made national encodings basically useless and stupid.

Python and deprecations redux

Posted Feb 25, 2022 21:30 UTC (Fri) by irvingleonard (guest, #156786) [Link]

What if it is none of them? Check this response https://lwn.net/Articles/886174/

Python and deprecations redux

Posted Feb 25, 2022 21:43 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

This response is equally invalid.

Python and deprecations redux

Posted Feb 11, 2022 2:22 UTC (Fri) by ras (subscriber, #33059) [Link]

> it depends on your worldview.

I can't agree. The (b'abc'[0] in b'abc') is False thing has nothing to do with my world view. I'd be amazed if that little "lets copy Java" feature doesn't cause more 2to3 transition bugs than the rest combined.

Nor does my world view matter when it comes to unix file names having no well defined encoding. I'd agree the world would be a better place if they did, but the reality is they don't. Python2's string handling handled the situation without missing a beat. Python3 looks like it handled it by going into denial and assuming it was possible to convert a file name into a readable string, despite the fact they knew full well it could be written by a python program with LANG=af_ZA.ISO-8859-1 and later read by that same program with LANG=af_ZA.UTF-8.
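For context, a minimal sketch of the escape hatch Python 3 later grew for exactly this case: the "surrogateescape" error handler (PEP 383), which as I understand it is what os.fsdecode() and os.fsencode() use on most Unix platforms. The file name below is a made-up example of a Latin-1 name being read in a UTF-8 locale.

```python
raw_name = b"caf\xe9.txt"  # Latin-1 encoded name; not valid UTF-8

# Undecodable bytes are smuggled through as lone surrogate code points.
as_text = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate, so such a str cannot be
# printed or written out as UTF-8 without special handling.
try:
    as_text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ascii(as_text), round_tripped == raw_name, strict_ok)
```

The name survives a round trip through str, but it is not "readable text" in any meaningful sense, which is arguably the point being made above.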

I've never met anybody with a world view in which the ISO 10646 vision of allocating one true number to every grapheme was a bad thing. I've never met anybody with a world view that led to them thinking all readable text shouldn't be encoded using such numbers. Unfortunately I've never lived in a world where code assuming those two things were a given didn't cause the programs using them to break far, far too often. Maybe you just haven't dealt with enough crap coming in from the internet to notice.

Python2 had a workable compromise that let us move towards ISO 10646 in a graceful way while being subject to gobs of broken text. As a "be conservative in what you do, be liberal in what you accept" compromise it wasn't bad, although I agree it could have done more in nudging us more towards "conservative in what you do". Python3 was apparently arrogant enough to think it could fix the language encoding problem by just assuming it was already fixed and this would force the world to follow. Oddly, they were wrong, the world didn't change on a dime just because Python3 was a thing. Turned out you can't change the world by forcing Python programmers to, as you say, revisit python code "to check for the string/bytes distinction", because making such a binary distinction is impossible in some cases. It's possible for Unix file names and configuration files to be both - blocks of ASCII text intermingled with some unknown encoding. It's not just Linux, HTML / HTTP / RFC 5821 all tend to assume you will treat it as an unknown encoding with meaningful bits of ASCII embedded. If you are lucky, some of those meaningful bits might even tell you the encoding of the rest.

The really sad bit is Unicode didn't even get the ISO 10646 vision right. After the UCS2 / UTF-16 encoding debacle, they left themselves with such a small encoding space they dropped the "one coding point for each grapheme" thing in favour of diacritics. With diacritics ("o" in string) could well return True when there is in fact no "o" in the string. What were they thinking? Perhaps it was "hey, I've found a neat way we can trick programmers into introducing a whole pile of new exploits!".

As you can probably gather, I've come to find the entire Unicode thing (not just Python3's part in it), a depressing subject. We could have done so much better.

Python and deprecations redux

Posted Feb 11, 2022 16:40 UTC (Fri) by jwilk (subscriber, #63328) [Link]

> After the UCS2 / UTF-16 encoding debacle, they left themselves with such a small encoding space they dropped the "one coding point for each grapheme" thing in favour of diacritics.

Even Unicode 1.0 had codepoints for diacritics (see §2.5 "Non-spacing Marks"). UTF-16 was introduced only in Unicode 2.0.

Python and deprecations redux

Posted Feb 14, 2022 10:54 UTC (Mon) by farnz (subscriber, #17727) [Link]

Diacritics is an interesting example to pick, since depending on the language, ó is an o. Or sometimes not. Depends on which language you're speaking.

And this is why Unicode defines normalization forms, so that you can take in an arbitrary set of Unicode codepoints, and turn them into a uniform sequence of codepoints regardless of what the user has entered (by doing things like setting an order for multiple combining marks).
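A minimal sketch of what that looks like from Python, using the stdlib unicodedata module. The two spellings of ó below are the precomposed code point and the base letter followed by a combining acute accent; which form is "right" for a substring test depends on the application.

```python
import unicodedata

composed = "\u00f3"     # ó as a single code point
decomposed = "o\u0301"  # o followed by COMBINING ACUTE ACCENT

# The two spellings render the same but compare unequal as code points.
same_before = composed == decomposed

# A naive substring test finds a plain "o" only in the decomposed form.
o_in_composed = "o" in composed
o_in_decomposed = "o" in decomposed

# Normalizing both sides to one form makes comparisons predictable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

print(same_before, o_in_composed, o_in_decomposed, nfc == composed, nfd == decomposed)
```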

Python and deprecations redux

Posted Feb 25, 2022 21:29 UTC (Fri) by irvingleonard (guest, #156786) [Link]

I think the main problem is the bytes/bytearray presentation method. People keep joining bytes and text in their minds, and that doesn't help. From my point of view, bytes (and bytearrays) are a low level data handling tool, they aren't letters, or anything else for that matter, just a sequence of bits. When you're doing text this is just a nuisance, with something there about "encodings", but it's just a blessing for anything else, aka binary protocols.

Say I have this protocol where I send a 3bit flag followed by five 1bit flags, and they happen to be "1100001", aka 0x61; python will show b'a'. Does it mean I sent an "a"? Am I expecting an "a"? Not really, I will struct my way into it, but python doesn't really know what it means before that, it can only tell me that "it looks like an 'a'", which makes a lot of people think, incorrectly, that "1100001"/0x61 will "always be an 'a'".
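A minimal sketch of that example in Python 3. The bit layout is an assumption made for illustration (the 3-bit field in the high bits, the five 1-bit flags below it); the original description doesn't specify one.

```python
packet = b"a"  # displayed as b'a', but here it is just the octet 0x61

value = packet[0]  # indexing bytes gives the integer 0x61 (0b01100001)

# Assumed layout: bits 7..5 hold the 3-bit field, bits 4..0 hold five flags.
field = value >> 5
flags = [(value >> shift) & 1 for shift in range(4, -1, -1)]

print(hex(value), field, flags)
```

For wider or mixed-size fields the struct module does the unpacking of whole bytes, with shifts and masks like these applied afterwards.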

That being said, regarding the "byte search complaint", I see there might be a reason to "search a string of bits within a stream of them using byte boundaries" and I presume it was a political decision (this is just guesswork, I don't really know):
- if you're working with text, decode the bytes and use the text tools
- if you're working with binaries, struct your bytes and do the search there

I've tried working with binary stuff in python 2 and I can definitely tell you: python 3's bytes (and bytearrays) are a strong step forward in that regard.

Python and deprecations redux

Posted Feb 25, 2022 21:50 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> I've tried working with binary stuff in python 2 and I can definitely tell you: python 3's bytes (and bytearrays) are a strong step forward in that regard.

No, it's not. Python3 doesn't actually make stuff any better, because Py3 strings are not actually strings. They are just sequences of Unicode codepoints.

For example, Py3 allows you to split strings across combining characters, so you can easily get nonsense like diacritics separated from the characters they should go on top of. Or worse, RTL text in an incorrect direction.
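A minimal sketch of the splitting problem: Python 3 str indexes and slices by code point, so a slice boundary can fall between a base letter and its combining mark. The word is just an illustrative example.

```python
word = "cafe\u0301"  # "café" spelled as e + COMBINING ACUTE ACCENT

length = len(word)  # counts code points, not user-perceived characters

# Slicing at index 4 separates the accent from the letter it belongs to.
head, tail = word[:4], word[4:]

print(length, ascii(head), ascii(tail))
```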

Heck, Py3's standard library doesn't even have locale-specific upper/lowercasing built in (see: Turkish dotless I). See how it's done in a language where developers actually care about correctness: https://pkg.go.dev/unicode#SpecialCase

Py3 just pretends that these complexities don't exist if you're writing English and allows developers to pat themselves on the back for "eating their veggies".

Python and deprecations redux

Posted Feb 26, 2022 12:56 UTC (Sat) by irvingleonard (guest, #156786) [Link]

We could both be right, you see. You're talking about strings and I'm talking about bytes, just from there we get to what I said earlier: binary processing is better in Python 3 at the expense of string simplicity. Fair?

Python and deprecations redux

Posted Feb 26, 2022 20:33 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

Not really. I fail to see how binary processing is better in Py3 compared to Py2. Can you give a concrete example?

Moreover, new features like format strings don't work well with bytes. So if I have a binary protocol, I can't just do something like this: fb'{part1}\x11{part2}'
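For reference, a minimal sketch of what is and isn't available for bytes, assuming Python 3.5 or later (where PEP 461 added %-interpolation for bytes). The variable names and values are made up; the fb prefix check reflects current releases and simply asks the compiler.

```python
part1, part2 = b"AB", b"CD"

# printf-style interpolation works on bytes; %b takes a bytes-like value.
framed = b"%b\x11%b" % (part1, part2)

# join() is the other common way to assemble a frame.
joined = b"\x11".join([part1, part2])

# There is no combined f-string/bytes literal; ask the compiler to confirm.
try:
    compile("fb'{part1}'", "<example>", "eval")
    fb_prefix_accepted = True
except SyntaxError:
    fb_prefix_accepted = False

print(framed, joined, fb_prefix_accepted)
```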

Python and deprecations redux

Posted Feb 28, 2022 15:51 UTC (Mon) by irvingleonard (guest, #156786) [Link]

Like I said, binary in python 2 is possible, just harder. Let me put it this way: lower level languages don't assume the type of a blob/bytes/bit-string, ever, since that's the most basic "type" that you could have (everything will translate to "bytes", eventually). Ints, floats, bools, and all flavors of "string" are "high level types", working on top of "bytes".

Python 2 saw that and said: nah, we'll do strings by default. So, with this you could say that python 2's integers end up as strings and the other way around, which doesn't mean text, which is also a possible casting. Say: you can cast the int 255 to a string (binary) and it will be 0xff or you could cast it to a string (text) and it would be 0x323535 depending on which casting function you use (same origin and end types). Floats should be even more "interesting". To add salt to the injury, python will autocast using whatever function they think should be used in some circumstances (__repr__).

From where I see it, you shouldn't process bytes directly unless you know what you're doing (you're so into it, that you know how to handle your data in encoded form) or you're just working with binary data. Bytes should be treated as a low level data format, that should support only low level functions and should be converted (decoded) into your data types for any meaningful processing (it's a new layer) or should be used by your application specific functionalities (audio functions, video functions, code functions, etc.). The main problem is that the documentation keeps linking them to "strings" (text) because of the history (aka Python 2) but python 3 bytes are not text, and shouldn't be shown as text, ever; but then we have binary strings b'Confusing?' so we can keep people scratching their heads and complaining about "byte strings" shortcomings.

Python and deprecations redux

Posted Feb 28, 2022 19:03 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> Like I said, binary in python 2 is possible, just harder.
Uh.... Whut?!?

> Python 2 saw that and said: nah, we'll do strings by default. So, with this you could say that python 2's integers end up as strings and the other way around, which doesn't mean text, which is also a possible casting.
Py2 does not have a functional distinction between byte arrays and strings.

> Say: you can cast the int 255 to a string (binary) and it will be 0xff or you could cast it to a string (text) and it would be 0x323535 depending on which casting function you use (same origin and end types).

>>> chr(255)
'\xff'
>>> str(255)
'255'
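For comparison, a minimal sketch of the Python 3 counterparts of those two conversions. In Python 3 chr() returns a one-character text string, so the single-byte form has to be asked for explicitly.

```python
n = 255

as_single_byte = bytes([n])               # one octet with value 0xff
via_to_bytes = n.to_bytes(1, "big")       # same result, explicit width
as_ascii_digits = str(n).encode("ascii")  # the three digits as bytes
as_text_char = chr(n)                     # a str holding U+00FF, not a byte

print(as_single_byte, via_to_bytes, as_ascii_digits.hex(), ascii(as_text_char))
```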

> From where I see it, you shouldn't process bytes directly unless you know what you're doing
Whut?!?

I've written probably a hundred thousand lines of code in Py2 that worked with binary protocols, using regular strings. My biggest problems were printing with correct escaping and binary formatting.

I've also moved that code to Py3. Not once did I have a case where Py3 strings caused me to say: "Wow! That strings/binary separation is so nice, it saved me a day of debugging!". On the other hand, I've probably wasted weeks on: "Oh fuck. I forgot .encode() in that exception handler and that's why the application crashes".

Python and deprecations redux

Posted Feb 28, 2022 21:20 UTC (Mon) by irvingleonard (guest, #156786) [Link]

Ok, so we're on the same page:
- Py2 does not have a functional distinction between byte arrays and strings.
- Different casting function for the same origin->destination (the binary/text distinction lies there)

To make this work you'd have to be a great programmer (which apparently you are, so kudos to you), able to keep track of which of your variables are holding text and which binaries (perfectly doable, with a good naming convention and hard discipline) and of course, a huge landmine (you treat binary as text and it might blow in your face). Now do 3rd party libraries: is this function returning text or "binary"? does this class expect binary? Again, everything out-of-band, via documentation or naming convention or some other trick. I'm not as disciplined as you apparently are, so, this is hard for me, I would rather have a totally independent type and have a type separation between binary and text.

Python and deprecations redux

Posted Feb 28, 2022 22:19 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> - Py2 does not have a functional distinction between byte arrays and strings.
Yes, and it's awesome!

> - Different casting function for the same origin->destination (the binary/text distinction lies there)
???

I still don't get it. Can you give an example of different casting in Py2?

> Now do 3rd party libraries: is this function returning text or "binary"?
Here's the question: why does it matter?

Python and deprecations redux

Posted Mar 2, 2022 5:13 UTC (Wed) by irvingleonard (guest, #156786) [Link]

Obviously our minds work in different ways. You're happy with python 2's extremely "Pythonic" strings and I prefer a more traditional binary & text separation. Hopefully there will be enough people with your taste and you will have an "alternative python" with all the niceness of strings that do all.

Cheers.

Python and deprecations redux

Posted Feb 28, 2022 23:57 UTC (Mon) by ras (subscriber, #33059) [Link]

> "Oh fuck. I forgot .encode() in that exception handler and that's why the application crashes".

I admire your restraint.

Python and deprecations redux

Posted Feb 27, 2022 0:41 UTC (Sun) by ras (subscriber, #33059) [Link]

> binary processing is better in Python 3 at the expense of string simplicity.

I'm with Cyberax here - I don't see that at all. And I do work with bytes.

Aside from the (b'abc'[1] in b'abc') thing (what were they thinking?), bytes and old Python2 strings are almost the same now.

Granted they weren't. In their initial vision there were two very different things. Strings were the old Python2 strings, but representing Unicode points rather than bytes. Bytes were a brand new thing - a sequence of uint8's. Maybe I'm wrong, but I get the feeling they were introduced because of the planned move to Unicode strings, they needed a "raw i/o" type, so they copied Java's. So they had two different sets of methods. I don't know why they introduced the b'123' syntax because it doesn't fit well within this vision. Perhaps it was to ease porting from Python2. In any case, it was the hint that something was wrong, and the vision would be severely compromised in the years to come.

What you call "string abuse" is what I call them being forced to acknowledge data in the real world is not like that. It isn't separated cleanly into strings and bytes. We can see Python3 was in the end forced to acknowledge that because it backported (nearly?) all the str methods to bytes. They did that because while the Java model worked, it creates a lot of code bloat. Java is the spiritual home of code bloat so no one thought it was out of place there, but it becomes painfully evident in Python3. So they've been forced to kludge their way around it by duplicating code. Unfortunately the kludges won't work anywhere near as well at reducing bloat as strings and bytes did in Python2.

> I can't just do something like this: fb'{part1}\x11{part2}'

And yet b'{part1}\x11{part2}' % locals() works fine, and the f string is just syntactic sugar for that. Had their new string model become "str is subclass of a bytes object that contains pure unicode strings", and defined the str() and repr() functions to be things that always return utf-8 (ie a bytes object that is also a str) then there is no problem - it all fits together naturally and beautifully. That contrasts to the fb'{part1}\x11{part2}' restriction you mention which is purely artificial. It's not imposed by the real world, or math. It's there because someone is inflicting their vision of intellectual purism on the rest of us.

When you say "at the expense of simplicity", I'm struggling to think of something that is simpler. The only thing I can come up with is bytes.__getitem__ returns an int. Yes, that is slightly more convenient than ord(b'abc'), and perhaps bytes((a, b, c,)) is more convenient than ''.join(chr(c) for c in (a, b, c,)), but all that is completely blown away by the (b'abc'[0] in b'abc') fubar. I'd give up on the former to get the latter fixed any day. The rest is pretty much the same in both worlds now, due to the method duplication. Even print(b'abc\x11') gives very similar output to print('abc').

> I'm just saying that from a fresh dev point of view, one that started using python around 2.5-2.6, it felt like a great thing to fix.

No one is arguing that sequences of bytes and sequences of unicode do feel very different conceptually, or that it doesn't feel intellectually fulfilling to make a sharp distinction between the two, or that at the start making Unicode and bytes mix better than oil and water didn't seem like it would be a very fruitful endeavour. It certainly was a difficult endeavour. In fact I suspect it would never have even been attempted, had the real world not intervened with example after example of a single blob of text being a mixture of ASCII and other crap, and it was real convenient to treat it all as text working just with the ASCII. Sure carefully parsing it to separate out the ASCII and crap is intellectually purer, but it also generates the sort of bloat we see in Java. It didn't help it is on occasion real convenient to apply a regex to something you knew damned well was just binary, or that the other string operations like slice, copy, join happened all the time with binary. While duplicating these otherwise two identical sets of operations into two incompatible types is intellectually purer it also increases the real world cognitive load on the programmer which is neither nice nor pure.

Python and deprecations redux

Posted Feb 28, 2022 16:28 UTC (Mon) by irvingleonard (guest, #156786) [Link]

I wouldn't say that bytes are sequences of "uint8"; that might be a shorthand for "anything" but it's also an encoding: you could have bytes that represent a list of 8bit unsigned integers, as in (abs(int), abs(int), abs(int), abs(int)) with int <= 127 (assuming cpython has such an encoding, not sure if int starts at 8bits or 16bits, or something bigger). I would describe "bytes" as a stream of bits grouped in octets, and the number of bits is a multiple of 8. The meaning attributed to such bits and octets will depend on your code.

There's another response from me about Python 2 strings here https://lwn.net/Articles/886363/

Regarding the origin theory, I really have no information on the whys and hows, so it's just theoretical discussion at this point. I could make the opposite case that you did.
If the main reason was to decouple "text" from "bits", hence the string/bytes distinction, then they just created a new problem: how do you cast one into the other? I'm sure there are many ways, and the current one could be great, or lousy, or good enough. I don't have expertise to opine there, all I can say is that it "feels good enough" for my use cases (and I see it flexible enough).

Python and deprecations redux

Posted Feb 26, 2022 2:02 UTC (Sat) by ras (subscriber, #33059) [Link]

> Say I have this protocol where I send a 3bit flag followed by five 1bit flags, and they happen to be "1100001", aka 0x61; python will show b'a'. Does it mean I sent an "a"? Am I expecting an "a"? Not really, I will struct my way into it, but python doesn't really know what it means before that, it can only tell me that "it looks like an 'a'", which makes a lot of people think, incorrectly, that "1100001"/0x61 will "always be an 'a'".

Yes there is some tension between the various representations of blob of bytes. You see it everywhere - not just in languages. For example tcpdump will give you both the hex and printable representations. And contrary to what you say, I've never met a systems programmer who was confused by that - they instinctively know which one they want, even if the packet dump contains a mixture of text and binary data.

There are many resolutions to this tension. Java for instance chooses to represent byte blobs as a sequence of uint8's, and strings as something entirely different. C (and later Python) made the observation that an ASCII string is also a sequence of bytes. They could have created a separate type system with mostly duplicated operations for bytes and strings (copies, concatenate, search, ...), but that would be insane, which is how we ended up in the Python2 world. You don't like that Python2 always defaults to representing blobs of bytes as their ASCII equivalents where they exist, \xHH otherwise, but as someone who's dealt with this for decades while I need both, the string representation is usually the more useful one so it's a nice default, and it’s easy enough to convert to another representation - just [ord(c) for c in strng] say.

Then ISO 10646 came along. At first blush a sequence of ISO 10646 code points doesn't look much like a sequence of bytes. In fact in the beginning code points weren't a sequence of well defined anything as ISO 10646 had a variety of encodings, some of varying length. Java did the obvious and created those two parallel type systems - one for byte blobs and one for ISO 10646 text strings. (And sadly was sucked in by the UCS2 delusion Unicode created to justify its existence, as were many others at the time.) The duplication aside, two incompatible type systems worked well enough where there always was a sharp distinction between printable text and all other data. Unfortunately the plethora of different ways of encoding text prior to ISO 10646 meant in older data you had ASCII mixed with god knows what. The C/Python2 approach continued to handle that situation (which includes ISO 10646 using some unknown encoding) well, but Unicode hardly at all.

Then Ken Thompson gave us UTF-8, an ISO 10646 encoding that could once again be treated as a strict sub-type of a sequence of bytes. The Unicode sub-type / trait or whatever you want to call it could literally just be a marker that said "I guarantee this byte sequence is valid UTF-8", and its methods could just be inherited from a byte sequence. It's a near perfect solution - the type system duplication is gone, unclean data mixing ASCII, bytes and text of unknown encodings could be handled without tripping over the type system and unwanted exceptions all the time. Hell, it even meant C's null terminated strings worked with Unicode, all the existing regex engines continued to work, wchar_t wasn't anywhere near as necessary as it first appeared, and in general C's Unicode handling went from almost non-existent to ok'ish to, with the addition of a Unicode library or two, "perfectly serviceable". The old man gave us youngin's a lesson in software engineering. Again.
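A minimal sketch of the property being praised here: in UTF-8 every byte of a multi-byte sequence has the high bit set, so byte-level operations keyed on ASCII delimiters never cut a character in half. The path is a made-up example.

```python
path = "donn\u00e9es/caf\u00e9.txt".encode("utf-8")  # "données/café.txt"

# Splitting the raw bytes on an ASCII delimiter is safe...
parts = path.split(b"/")

# ...because each piece is still valid UTF-8 and decodes cleanly.
decoded = [p.decode("utf-8") for p in parts]

# Every byte used to encode a non-ASCII character is >= 0x80,
# so it can never be mistaken for "/" or any other ASCII byte.
e_acute = "\u00e9".encode("utf-8")
high_bits_only = all(byte >= 0x80 for byte in e_acute)

print(parts, decoded, high_bits_only)
```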

Then Python3 was created to "fix" Python2's Unicode problem - and adopted the "two entirely different type systems" approach.

But at its heart, my disappointment in this whole sorry saga isn't about that. We all make wrong technical choices all the time. What really irks me is that Python adopted a one-off deviation from its standard change control processes (which had worked, and it seems to me continue to work very well despite this article's attempts to stir the pot) that let the mistake persist - the Python2 / Python3 transition. Had they done it the normal way, the way being discussed in this article, the entire Python user population would have gone along with them. As the deficiencies came to light they would have squawked long and loud, and my guess is it would have been fixed. But as it was, we Python2 users had what we thought was an out. (Admittedly we were deluding ourselves, but Python2's deprecation was so far away it seemed like it was some unknown future self's problem.) For me it was when my innocent-looking os.listdir() blew up in my face, and it gradually dawned on me that I had to write everything as b'' if I wanted my program to be reliable. It was so much bloat, and so unreliable, that I stuck with Python2. They fixed os.listdir() of course, and attempted to fix many of the other failings - but at the cost of piling on more and more weird encodings to "unicode", adding more and more complexity, and creating more and more duplication between bytes operations and string operations.
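One of the "weird encodings" in question is presumably the surrogateescape error handler (PEP 383), which is how Python 3 lets a str carry bytes that did not decode. A minimal sketch, using a made-up file name and calling the codec directly rather than touching the filesystem:

```python
# A made-up file name encoded as Latin-1, so it is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# With surrogateescape the undecodable byte 0xE9 is smuggled into the
# str as the lone surrogate U+DCE9 instead of raising an exception.
as_str = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = as_str.encode("utf-8", errors="surrogateescape")

# The resulting str is not "real" Unicode, though: a strict encode fails.
try:
    as_str.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```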

And now it looks like we are stuck with the result.

Python and deprecations redux

Posted Feb 26, 2022 13:37 UTC (Sat) by irvingleonard (guest, #156786) [Link]

I see the original problem as "string abuse". Python strings are supposed to be text, with all the fancy functionality that you can build around text; but "blobs" are usually int arrays (lists) and all that text functionality makes zero sense for them. Say I'm working with a video file: substring search or replace would probably do very little, and others like title, upper, or lower would be outright useless. Assuming that "you probably want to treat it as a string" was a shorthand solution that helped python a lot in the text processing department, but it pushed all binary processing into the "treat it as a string" field, which is extremely painful (not impossible, just painful). With python 3 that was walked back to a more general default, where you can still have all that text niceness but there's an extra layer to it: the binary conversion.

I'm not saying that it is perfect, or that it was done right, or that it didn't have politics involved; I'm just saying that from a fresh dev point of view, one that started using python around 2.5-2.6, it felt like a great thing to fix. There's also the "application of things" in different places: every time something new appears there's this group of people that feel that they have to use it. In your example: paths are text, and will always be text, that's the whole point of them, so it makes very little sense to talk about "bytes" in that area. I'm sure they found a reason to use bytes in path functions, but that only shows that those are incomplete; or maybe they wanted to provide a "low level os interface" (and then you should use pathlib for the regular stuff instead), not sure what they were thinking.

Python and deprecations redux

Posted Mar 5, 2022 14:31 UTC (Sat) by nix (subscriber, #2304) [Link]

> In your example: paths are text, and will always be text, that's the whole point of them, so it makes very little sense to talk about "bytes" in that area

What? Paths in Unix are a sequence of any bytes at all other than \0 (with / having a constrained meaning as a path separator). They are absolutely not required to be UTF-8 in whole or in part (and in fact can be partly UTF-8 and partly some other encoding in the same path). Any Python program that isn't going to give up and die when faced with a perfectly valid file whose name it doesn't like must deal with this, which means not using Python strings for paths.

Python 3's mistake is that it doesn't acknowledge that what is true of paths is true of *almost all other string-like data ever*. Network traffic, documents, you name it: many of them will be almost entirely UTF-8 except for little bits, rarely encountered, that are not, and you *must handle those little bits too*.
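As a small illustration of the mixed-encoding case, here is a sketch with a made-up path whose directory name is UTF-8 and whose file name is Latin-1. It works on the bytes directly and only decodes for display; the path and the helper name `describe` are invented for the example.

```python
# Made-up path: "résumés" is UTF-8 encoded, "café.txt" is Latin-1 encoded.
path = b"/srv/r\xc3\xa9sum\xc3\xa9s/caf\xe9.txt"

# Treating the whole path as UTF-8 text fails outright.
try:
    path.decode("utf-8")
    whole_path_is_utf8 = True
except UnicodeDecodeError:
    whole_path_is_utf8 = False

# Splitting on the separator byte needs no knowledge of any encoding.
components = path.split(b"/")


def describe(component: bytes) -> str:
    """Display helper: decode what is valid UTF-8, escape the rest as \\xHH."""
    return component.decode("utf-8", errors="backslashreplace")


labels = [describe(c) for c in components]
print(labels)  # ['', 'srv', 'résumés', 'caf\\xe9.txt']
```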

Python and deprecations redux

Posted Feb 9, 2022 12:00 UTC (Wed) by iq-0 (subscriber, #36655) [Link]

Deprecations are something that should be exposed. The tricky thing about languages like Python is that there is no clear difference between a developer running an application and an end-user. But hiding things because "end-users might be exposed to them" is not something the language should be trying to influence directly, but is something that could be supported indirectly:

Provide a (semi-)standardized way to signal deprecations. Provide a way for application developers to cleanly redirect those warnings and allow easy silencing of these warnings, e.g. using an environment variable, and hint at that in the warning message.

This will initially lead to a number of users being exposed to warnings they don't want, though ones they should probably care about. They can complain to the maintainers of the tool they're trying to use.

In that case:

a) it's maintained: the maintainer will probably find a way for users to be shielded from these warnings
b) it's not maintained: the user can easily shield themselves from the warnings by silencing them

In either case somebody is explicitly taking responsibility for hiding the deprecations and, in doing so, takes on the burden of any problems that result from ignoring them.
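For reference, much of this can already be approximated with Python's standard warnings module and the PYTHONWARNINGS environment variable. The sketch below is only an illustration of the idea: old_api() and new_api() are hypothetical names, and the hint text in the message is one possible wording, not an established convention.

```python
import warnings


def old_api():
    """Hypothetical deprecated function that hints at how to silence itself."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead "
        "(set PYTHONWARNINGS=ignore::DeprecationWarning to silence)",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to old_api
    )
    return 42


# An application developer can capture (and then redirect) the warnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api()

# Somebody explicitly taking responsibility for hiding them.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore", DeprecationWarning)
    old_api()

print(len(caught), len(silenced))  # 1 0
```

The standard library's logging.captureWarnings(True) is another way to redirect warnings, in that case into the logging system.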


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds