Python and deprecations redux
The problem of how to deprecate pieces of the Python language in a minimally disruptive way has cropped up in various guises over the last few years; in truth, it has been wrangled with throughout much of the language's 30-year history. The scars of the biggest deprecation, that of Python 2, are still rather fresh, both for users and the core developers, so no one wants (or plans) a monumental change of that sort. But the language community does want to continue evolving Python, which means leaving some "baggage" behind; how to do so without leaving further scars is a delicate balancing act, as yet another discussion highlights.
We looked in on some discussion of the topic back in December, but the topic pops up frequently. There is a policy on handling deprecations that is described in PEP 387 ("Backwards Compatibility Policy"), but the reality of how they are handled is often less clear-cut. Python has two warnings that can be raised when features slated for deprecation are used: PendingDeprecationWarning and DeprecationWarning. The former is meant to give even more warning for a feature that will coexist with its replacement for multiple releases, while the latter indicates something that could be removed two releases after the warning is added (effectively two years, based on the relatively recent annual release cycle).
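As a rough illustration of the two categories, here is a minimal sketch using the standard warnings module. The functions old_api(), aging_api(), and new_api() are hypothetical names invented for this example; they do not come from the discussion.

```python
import warnings


def new_api():
    """The replacement that callers are expected to move to."""
    return 42


def aging_api():
    """Hypothetical feature that will coexist with new_api() for a while."""
    warnings.warn(
        "aging_api() will be deprecated in a future release; use new_api()",
        PendingDeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this line
    )
    return new_api()


def old_api():
    """Hypothetical feature that is scheduled for removal."""
    warnings.warn(
        "old_api() is deprecated; use new_api()",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api()


# Record every warning, regardless of the interpreter's default filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_api()
    aging_api()

categories = [w.category.__name__ for w in caught]
print(categories)  # ['DeprecationWarning', 'PendingDeprecationWarning']
```

Both calls still return the value from new_api(); the warning is the only visible difference, which is why its visibility matters so much.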
But, as noted in that earlier discussion, the deprecation period is for a minimum of two release cycles. There are concerns that the time frame is being treated as a deadline of sorts, to the detriment of some parts of the ecosystem. So on January 18, Victor Stinner, Tomáš Hrnčiar, and Miro Hrončok proposed postponing some deprecations that had been scheduled for Python 3.11, which is due in October. The message referred to an early January posting by Hrnčiar to the Python discussion forum that described the problems Fedora had encountered when building its packages using a development version of 3.11.
In particular, two specific sets of deprecations were causing the most trouble for Fedora packages. Removing deprecated aliases from the unittest module (bug 45162) and getting rid of deprecated pieces from the configparser module (bug 45173) led to the bulk of the problems that Fedora encountered. The unittest deprecation caused 61 Fedora packages to break, while the configparser changes broke another 28. In the proposal, Stinner said that they and others had reported the problems upstream and often contributed a fix, but that there is still a lengthy process before the changes actually reach the distribution:
The problem is that fixing a Fedora package requires multiple steps:
- Propose a pull request upstream
- Get the pull request merged upstream
- Wait for a new release upstream
- Update the Fedora package downstream, or backport the change in Fedora (only needed by Fedora)
Reverting those two changes, which caused most of the problems Fedora has run into in its testing of the new version of Python, will allow for "more time on updating projects to Python 3.11 for the other remaining incompatible changes". As reported by Hrnčiar, four other changes led to problems building Python packages, but those were fewer in number.
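For a sense of what the fixes to affected projects look like, here is a small sketch that sticks to the long-standing replacement names. It assumes that the removals include the unittest aliases assertEquals and assertRegexpMatches and the configparser names SafeConfigParser and readfp(); the INI text, load_port(), and PortTest are made up for the example.

```python
import configparser
import io
import unittest

INI_TEXT = "[server]\nport = 8080\n"


def load_port(text):
    """Parse the hypothetical INI text using the non-deprecated names."""
    parser = configparser.ConfigParser()   # rather than SafeConfigParser
    parser.read_file(io.StringIO(text))    # rather than parser.readfp(...)
    return parser.getint("server", "port")


class PortTest(unittest.TestCase):
    def test_port(self):
        # assertEqual rather than the assertEquals alias
        self.assertEqual(load_port(INI_TEXT), 8080)
        # assertRegex rather than the assertRegexpMatches alias
        self.assertRegex(INI_TEXT, r"port\s*=\s*\d+")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PortTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.wasSuccessful())  # True
```

The replacement names have been available throughout Python 3, so code written this way should run on both old and new interpreters without a compatibility layer.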
Silencing deprecations
In a reply to the proposal, Antoine Pitrou wondered whether it showed "that making DeprecationWarning silent by default was a mistake?" He is referring to the changes to the visibility of DeprecationWarning that have occurred over the years. While DeprecationWarning is useful for the developers of a Python package, it is often seen by users, who may not be in a position to do much about it. The warnings were made invisible by default for Python 2.7 and 3.2 (in 2010 and 2011), but that policy was changed for Python 3.7 in 2017 with PEP 565 ("Show DeprecationWarning in __main__").
Guido van Rossum did not think that the evidence was quite that clear, but deprecations are tricky:
At best it shows that deprecations are complicated no matter how well you plan them. I remember that "noisy by default" deprecation warnings were widely despised.
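The visibility rule from PEP 565 can be approximated with two explicit filters, which may make the behavior easier to see. This is only a sketch: the real defaults are installed by the interpreter at startup and contain more entries, and shown_count() and the module names passed to it are hypothetical.

```python
import warnings


def shown_count(module_name):
    """Count DeprecationWarnings that get through for a given module name."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        # Approximation of the PEP 565 behavior: hide DeprecationWarning ...
        warnings.filterwarnings("ignore", category=DeprecationWarning)
        # ... except when it is attributed to code running in __main__.
        # filterwarnings() inserts at the front, so this rule is checked first.
        warnings.filterwarnings(
            "default", category=DeprecationWarning, module="__main__"
        )
        # warn_explicit() lets the example choose the module the warning
        # is attributed to, standing in for where the deprecated call lives.
        warnings.warn_explicit(
            "old feature used",
            DeprecationWarning,
            filename="example.py",
            lineno=1,
            module=module_name,
            registry={},
        )
    return len(caught)


print(shown_count("__main__"))      # 1
print(shown_count("some_library"))  # 0
```

Developers who want to see everything can run the interpreter with -W default or set the PYTHONWARNINGS environment variable, which override these defaults.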
Some ideas for further tweaks to the visibility of the warnings were raised. Richard Damon suggested having them only be visible when running unit tests. It turns out that pytest already enables those warnings, as Brett Cannon pointed out. That is something of a double-edged sword, though, Christopher Barker noted: "It's really helpful for my code, but they often get lost in the noise of all the ones I get from upstream packages." Gregory P. Smith pointed out that the standard library unit tests enable the warnings as well; "Getting the right people to pay attention to them is always the hard part."
Fixing deprecations
There was a bit of discussion about how to silence warnings from imported modules, possibly semi-automatically, but Steven D'Aprano had a bit of a warning about that approach:
If we use a library, then we surely care about that library working correctly, which means that if the library generates warnings, we *should* care about them. They are advanced notice that the library is going to break in the future.
Of course I understand that folks are busy maintaining their own project, and have neither the time nor the inclination to take over the maintenance of every one of their dependencies. But we shouldn't just dismiss warnings in those dependencies as "warnings I don't care about" and ignore them as Not My Problem.
Like it or not, it is My Problem and we should care about them.
In the world of open-source software, the lines between users and "vendors" of software are blurred, he said. Users often have the ability, and certainly have the legal right, to change the code based on observing problems of this (or any other) nature, but there is something of a social problem, "and you cannot fix social problems with technology". Ignoring warnings breaks some assumptions about how open source works:
The open source mantra about many eyes making bugs shallow doesn't work when everyone is intentionally closing their eyes to the warnings of pending bugs.
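For reference, the kind of per-dependency silencing that D'Aprano cautions against can be done with a module-scoped filter. The sketch below assumes a hypothetical dependency called noisy_dependency; silence_dependency() and emitted() are illustrative helpers, not anything proposed in the thread. Note that it only hides the warnings; the underlying breakage still arrives when the feature is removed.

```python
import re
import warnings


def silence_dependency(package):
    """Ignore DeprecationWarnings attributed to one package and its submodules."""
    # The module argument is a regular expression matched against the start
    # of the module name, so anchor it at a dot or at the end of the name.
    pattern = rf"{re.escape(package)}(\.|$)"
    warnings.filterwarnings("ignore", category=DeprecationWarning, module=pattern)


def emitted(module_name):
    """Count warnings that get through when attributed to module_name."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")            # show everything ...
        silence_dependency("noisy_dependency")     # ... except this package
        warnings.warn_explicit(
            "old feature used",
            DeprecationWarning,
            filename="example.py",
            lineno=1,
            module=module_name,
            registry={},
        )
    return len(caught)


print(emitted("noisy_dependency.utils"))   # 0
print(emitted("my_project.core"))          # 1
print(emitted("noisy_dependency_extras"))  # 1
```

Keeping the filter narrow, one named package at a time, at least leaves a written record of which dependencies are emitting warnings that someone should eventually report upstream.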
Barker said that he does try to submit fixes upstream when he notices problems of that sort, as did others in the thread. There is still the problem, mentioned by Stinner, that even once fixes are contributed, releases including them may still take a while; as Stephen J. Turnbull put it: "even if you submit a patch, there's no guarantee that the next version (or three) will contain it".
With regard to silencing DeprecationWarning, Steve Dower said that it was not necessarily a mistake to do so:
If we'd gone the other way, perhaps we'd be looking at massive complaints from "regular" end users about all the noisy warnings that they can't fix and saying that making it noisy was the mistake.
He was not opposed to reverting the changes as proposed, though he thought it might be "a bit premature" to do so now, roughly nine months before the release. They can be reverted closer to the release if the packages in question still are not fixed (and released). If they do get reverted now, because "they cause churn for no real benefit", that would be reasonable; those who are opposed can argue that the benefit is real, however, "as long as they also argue in favour of the churn". He also made a broader point:
We shouldn't pretend to be surprised that something we changed causes others to have to change. We *know* that will happen. Either we push forward with the changes, or we admit we don't really need them.
Stinner pointed to two different examples of the kinds of problems that Fedora has found by testing with development versions of upcoming Python releases. There are advantages to finding these problems as early as possible: "If issues are discovered earlier, we get more time to discuss and design how to handle them." He thinks it makes sense to revert these particularly problematic deprecations now because it will help flush out more problems further down in the dependency chain:
In Fedora, if a frequently used dependency is broken, a long list of packages "fail to build". (In Fedora, the package test suite must pass to build a package successfully.) If it takes 9 months to fix this dependency, we will likely miss other issues before the Python final version in dependent packages.
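Because a removal shows up in Fedora as a failing test suite, one way for a project to get the same signal earlier is to escalate DeprecationWarning to an error in its own tests. The following is a sketch of that idea, not something proposed in the thread; legacy_helper() is a hypothetical deprecated function.

```python
import io
import unittest
import warnings


def legacy_helper():
    """Hypothetical deprecated function in the code under test."""
    warnings.warn("legacy_helper() is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


class DeprecationTests(unittest.TestCase):
    def test_escalated_to_error(self):
        # With the "error" action, the warning is raised as an exception,
        # so a deprecated call fails loudly instead of scrolling past.
        with warnings.catch_warnings():
            warnings.simplefilter("error", DeprecationWarning)
            with self.assertRaises(DeprecationWarning):
                legacy_helper()

    def test_warning_is_emitted(self):
        # assertWarns documents that the deprecation is expected for now.
        with self.assertWarns(DeprecationWarning):
            self.assertEqual(legacy_helper(), "ok")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeprecationTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, result.wasSuccessful())  # 2 True
```

The same escalation is available without touching the tests, for example by running the interpreter with -W error::DeprecationWarning or through pytest's filterwarnings configuration option.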
Sebastian Rittau said "that some (semi-) automated way to actively test and notify important projects of deprecations/removals before a release would be a great addition to the Python ecosystem", though he acknowledged that it might be difficult to do. Stinner replied that, in effect, Fedora is already doing that, albeit with "changes already merged in Python". He has done some work on ways to automatically test Python with patches applied, to test upcoming or proposed changes, but it turned out to be rather complicated.
Smith was also in favor of the reversions; he thanked the Fedora team for helping bring these problems to light, and noted that being proactive is a better way forward:
Deprecation removals are hard. Surfacing these to the impacted upstream projects to provide time for those to integrate the changes is the right way to make these changes stick in 3.12 or later. [...] As you've done the work to clean up a lot of other OSS projects, I suggest we defer this until 3.12 with the intent that we won't defer it again. That doesn't mean we can't hold off on it, just that we believe pushing for this now and proactively pushing for a bunch of cleanups has improved the state of the world such that the future is brighter. That's a much different strategy than our passive aggressive DeprecationWarnings.
Toward the end of the original proposal message, Stinner had some thoughts on being even more proactive in the future. He suggested that, before making an incompatible change, developers search the Python Package Index (PyPI) for uses of the feature in question "and try to update these projects *before* making the change". Once the number of affected projects has been reduced to some low number (he suggested 15), the change could be made in Python.
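The per-project half of such a search could be as simple as walking a source file's syntax tree and looking for the deprecated names. The sketch below is an assumption about how that might be done, not Stinner's tooling; the alias table lists a few unittest aliases, and find_deprecated_calls() and SAMPLE are hypothetical.

```python
import ast

# A few deprecated unittest aliases and their replacements.
DEPRECATED_ALIASES = {
    "assertEquals": "assertEqual",
    "assertNotEquals": "assertNotEqual",
    "assertRegexpMatches": "assertRegex",
    "assertRaisesRegexp": "assertRaisesRegex",
}


def find_deprecated_calls(source, filename="<string>"):
    """Return (line, old_name, new_name) for each deprecated attribute used."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # Attribute access covers self.assertEquals(...) and similar calls.
        if isinstance(node, ast.Attribute) and node.attr in DEPRECATED_ALIASES:
            hits.append((node.lineno, node.attr, DEPRECATED_ALIASES[node.attr]))
    return sorted(hits)


SAMPLE = '''\
import unittest

class T(unittest.TestCase):
    def test_it(self):
        self.assertEquals(1 + 1, 2)
        self.assertRegexpMatches("abc", "b")
        self.assertEqual(2, 2)
'''

hits = find_deprecated_calls(SAMPLE)
for lineno, old, new in hits:
    print(f"line {lineno}: {old} -> {new}")
```

Matching on attribute names alone can produce false positives (any object with an attribute of the same name), so a real survey would need a human to review the results before sending patches upstream.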
The Python ecosystem is huge, with an amazing number of projects, libraries, packages, tools, and so on, subsets of which are gathered up together into Linux (and other) distributions. All of those packages support differing ranges of Python versions, which makes the job of distributions that much harder, since they typically settle on one Python version to maintain throughout the life of a particular distribution release. Deprecating pieces along the way makes that ever more difficult, of course.
There are other software projects that take a different approach; the Linux kernel somewhat famously almost never deprecates something unless it truly can no longer be supported (e.g. ancient hardware or an API that leads to a security hole), but Python (and some other languages) have not chosen that course. There are certainly advantages to leaving things behind, especially when replacing them with something emphatically and unquestionably better, but it does have its downsides as well. It would seem that Python is drawing closer to finding the right balance when the deprecation route is taken, though there are always likely to be bumps along the way.
Python and deprecations redux
Posted Feb 2, 2022 3:49 UTC (Wed) by tbird20d (subscriber, #1901) [Link]
Python and deprecations redux
Posted Feb 2, 2022 10:15 UTC (Wed) by ddevault (subscriber, #99589) [Link]
Python and deprecations redux
Posted Feb 2, 2022 20:24 UTC (Wed) by tnoo (subscriber, #20427) [Link]
Just activate the proper Conda environment for each code base you have.
Very convenient.
Python and deprecations redux
Posted Feb 3, 2022 15:22 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]
https://github.com/naftaliharris/tauthon
I'm personally staying on Python 2.7 forever. I've never migrated anything and never will because I refuse to let other people create work for me. Oh, and I'll never have to worry about deprecations again :)
The Python Lumberjack
-------------------------------
Oh, I'm on 2.7 and I'm okay.
I sleep well at night, and do real work during the day.
No future deprecations will be coming my way!
Oh, I'm on 2.7 and I'm okay!
Python and deprecations redux
Posted Feb 17, 2022 1:12 UTC (Thu) by nix (subscriber, #2304) [Link]
... no commits for nearly a year now. I'd call that more or less dead, alas :(
Python and deprecations redux
Posted Mar 2, 2022 8:17 UTC (Wed) by cpitrat (subscriber, #116459) [Link]
Python and deprecations redux
Posted Feb 2, 2022 11:41 UTC (Wed) by ceplm (subscriber, #41334) [Link]
Python and deprecations redux
Posted Feb 2, 2022 11:44 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]
Both are exceptionally good at maintaining backwards compatibility (arguably, too much so in case of Java).
Python and deprecations redux
Posted Feb 2, 2022 14:52 UTC (Wed) by madscientist (subscriber, #16861) [Link]
Python and deprecations redux
Posted Feb 2, 2022 21:27 UTC (Wed) by ceplm (subscriber, #41334) [Link]
Aren’t these boring cleanup tasks (like running sed -e 's/assertRegexpMatches/assertRegex/' on all files in packages in distro) exactly what enterprise distributors should do? There are some things which people won’t do unless they are paid to do so, and nobody else will pay for this misery.
We are just finishing another similar ultra-boring thing: eliminating nose (that’s nose1) from the distro. There were hundreds of patches sent upstream, some of them trivial, some (whole ipython universe, boto) far far from trivial; some of them were just sent upstream by us, and some we had to develop in close cooperation with upstream because of their complexity.
Python and deprecations redux
Posted Feb 2, 2022 21:41 UTC (Wed) by rahulsundaram (subscriber, #21946) [Link]
They do for packages they ship. However the ecosystem of Python is much much larger than the packages shipped by the distros. So once you go beyond the core set, you start hitting the rough edges. In the container world, it's not uncommon for devs to just bypass the distro packages and use Pip directly because they want a newer version or they just don't know any better. So the distro work doesn't benefit them.
Python and deprecations redux
Posted Feb 2, 2022 21:52 UTC (Wed) by Wol (subscriber, #4433) [Link]
Do the majority of programmers work for software houses, or for end-users? And I think you'll find there are a LOT of people (like me, now) for whom programming is a large chunk of the job, but they're not called programmers. And for many, Python is their tool of choice.
So all your hard work REMOVING "nose", and similar, is actually MAKING work for them.
Cheers,
Wol
Python and deprecations redux
Posted Feb 2, 2022 21:58 UTC (Wed) by ceplm (subscriber, #41334) [Link]
Python and deprecations redux
Posted Feb 2, 2022 22:37 UTC (Wed) by Wol (subscriber, #4433) [Link]
Cheers,
Wol
Python and deprecations redux
Posted Feb 2, 2022 21:36 UTC (Wed) by eplanit (guest, #121769) [Link]
In my 30+ years of being a software engineer, I've known no other language to be so popular, yet I've also known no other to have a decade+ long v2 vs. v3 split and such a list of peculiarities.
I've become more a fan of Golang -- you still have to manage your dependencies for developing, but you can ship a simple executable and live the day with much less stress (and much simpler installation instructions for your user/customer).
Python and deprecations redux
Posted Feb 3, 2022 8:18 UTC (Thu) by LtWorf (subscriber, #124958) [Link]
Python and deprecations redux
Posted Feb 3, 2022 15:31 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]
This site may help:
https://github.com/pts/staticpython
Deprecated shouldn't mean removal
Posted Feb 2, 2022 4:09 UTC (Wed) by david.a.wheeler (subscriber, #72896) [Link]
Deprecated shouldn't mean removal
Posted Feb 2, 2022 7:05 UTC (Wed) by mb (subscriber, #50428) [Link]
But removing simple aliases and other trivial things is another.
Removing simple aliases will make the life of the library/language maintainer almost no better, but it will force the users to create compatibility layers and monkey patching, if they must support older Python versions or other interpreters that don't have the same change schedule as CPython. (The 2-3 transition is still a thing! If old aliases are removed, that possibly makes the 2-3 transition even harder again.)
We should have a deprecation period of at least 10 years by default.
For individual features and cases that duration could be reduced, if it's really a big pain to the library/language maintainers.
Trivial aliases should never be removed from central parts such as the stdlib or other big libraries, unless the functionality implementing these interfaces goes away as a whole. Just make these trivial deprecated things vanish from the documentation and be done with it.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 8:03 UTC (Wed) by Wol (subscriber, #4433) [Link]
At which point we sort of end up with the same situation linux is in, where stuff is never actively removed until it bit-rots and no-one cares to fix it ...
Cheers,
Wol
Deprecated shouldn't mean removal
Posted Feb 2, 2022 10:18 UTC (Wed) by mb (subscriber, #50428) [Link]
That doesn't improve the situation for deprecation of trivial changes. Moving breaks the existing API.
> At which point we sort of end up with the same situation linux is in, where stuff is never actively removed until it bit-rots and no-one cares to fix it ...
On Linux decades old binaries can usually be run without problems.
Try that with Python scripts.
Linux ABI backward compatibility is not perfect, but it is way better than Python backward compatibility.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 17:05 UTC (Wed) by vstinner (subscriber, #42675) [Link]
While the Linux source code (~30M LOC) is way bigger than the Python source code (1M LOC), the API exposed by the Linux kernel (syscall, ioctl, devices, etc.) looks smaller than the API of the Python language and its large standard library.
The Linux kernel has around 300 syscalls and Python has around 300 stdlib modules. The API of a Linux syscall looks smaller to me than the API of a whole stdlib module. For example, the Python module os provides more than 200 functions and also contains os.path submodule which also provides around 40 functions.
The discussed unittest module provides 80 methods and functions, and its unittest.mock sub-module provides 30 functions and methods.
Well, to be honest, I don't know well the Linux kernel "API", so I'm maybe just plain wrong, ioctl(), BPF & cie are way larger than the Python API. Or maybe Linux API and Python API cannot be compared because they are too different ;-)
Note: Python also provides a C API which exposes more than 100 structures and around 1500 functions (1000 public and 500 "private" functions, but in practice many of these "private" functions are used by 3rd party C extensions). It's challenging to introduce new feature without breaking any of these functions which were not designed to be used by 3rd party code initially (not designed to remain "stable" forever), 30 years ago.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 17:59 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
The core PyObject changes often enough that C API users need updates too, so it's not any more sacred in the backwards compatibility landscape than anything else. I've not seen any other efforts into making it more future-proof either.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 0:38 UTC (Thu) by vstinner (subscriber, #42675) [Link]
The PyObject structure is the same since the initial Python commit in 1990. Only the structure name changed from "object" to "PyObject" (in the early years of Python). What do you mean by frequent PyObject changes? Could you be more specific?
> I've not seen any other efforts into making it more future-proof either.
I'm actively working on bending the C API towards a more stable API (and get a stable ABI) in the long term. For example, I wrote PEP 620, PEP 670 and PEP 674.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 13:15 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
I'll also note that the PyConfig initialization routines (added in 3.8) are way better, but 3.10 introduced a new initialization codepath for the interpreter that broke how we supplemented `sys.path` in our interpreter wrapper. Unfortunately, I did not get around to this until after the final 3.10 release. Basically, Py_Main resets `sys.path` and we need to defer the addition of our own paths until after initialization.
I'll also note that PyConfig is missing "add this to sys.path" as the only options are "do the default stuff" and "I'll do all the work myself" with no middle ground (at least as far as the docs indicate).
> I'm actively working on bending the C API towards a more stable API (and get a stable ABI) in the long term. For example, I wrote PEP 620, PEP 670 and PEP 674.
That is good to hear. These PEPs seem like real improvements are on the roadmap, thank you.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 15:56 UTC (Thu) by vstinner (subscriber, #42675) [Link]
Oh, you're talking about the PyTypeObject structure and defining "static types". Since Python 3.2, there is a new PyType_FromSpec() API which doesn't suffer from these issues. In Python 3.9 and 3.10, this API has been completed to support more PyTypeObject members. I'm not sure that PyType_FromSpec() is well advertized. See the PEP 630 for a good overview of current best practices: https://www.python.org/dev/peps/pep-0630/
> I'll also note that PyConfig is missing "add this to sys.path" as the only options are "do the default stuff" and "I'll do all the work myself" with no middle ground (at least as far as the docs indicate).
Aha, the "Path Configuration" is the most complex part of the Python initialization. In Python 3.10, you can call PyConfig_Read(config) to compute the default Path Configuration, and then modify config.module_search_paths to insert or append your own paths.
In Python 3.11, Modules/getpath.c has been reimplemented in pure Python (Modules/getpath.py). I'm not sure how it impacts PyConfig API, I didn't follow these recent changes. I designed and implemented PEP 587 (the new PyConfig C API) in Python 3.8.
We lack user feedback on these APIs. You may open an issue at bugs.python.org to elaborate your use case and explain how the current API doesn't fit your needs.
Deprecated shouldn't mean removal
Posted Feb 17, 2022 15:34 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
Whoops, indeed. Sorry.
> I'm not sure that PyType_FromSpec() is well advertized.
No, it is not. I've opened an issue to start using this instead.
> We lack user feedback on these APIs. You may open an issue at bugs.python.org to elaborate your use case and explain how the current API doesn't fit your needs.
Thanks; I'll look at summarizing there.
Deprecated shouldn't mean removal
Posted Feb 4, 2022 18:21 UTC (Fri) by mb (subscriber, #50428) [Link]
Is it harder to prevent accidental API breakage in Python than in Linux?
Probably yes.
But that's not the point.
You are breaking the API on purpose! (= deprecation and eventual removal).
That's the point.
The Linux rule is pretty simple: Don't break applications.
And I don't think that would be impossible for Python.
Other complex languages do manage to achieve that goal. Look at Rust, for example, which has very strict rules for backward compatibility.
Deprecated shouldn't mean removal
Posted Feb 10, 2022 14:34 UTC (Thu) by irvingleonard (guest, #156786) [Link]
Deprecated shouldn't mean removal
Posted Feb 2, 2022 8:49 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
On the other hand, CPython is its own project, and if their developers don't want to maintain these aliases, we don't really have the right to demand that they maintain them "for free."
Perhaps a compromise solution would be for an independent group of developers to maintain a single, de facto standardized compatibility layer for each new minor version of Python, which monkey-patches all of the "easy" deprecated aliases back in, and maybe also supplies simple implementations for some of the other removed functionality (perhaps with inferior performance or quality of implementation, if copying the CPython code wholesale is not practical). Given the amount of work which CPython has already caused through deprecation, and the relative simplicity of this sort of monkey-patching, I find it mildly confusing that such an effort does not exist already.
I'm aware of Tauthon, which is (apparently?) still plugging along, but putting my SRE hat on for a moment, I wouldn't let it anywhere near any of my production systems without a lot of very intensive testing and analysis. It's far too big and complicated compared to the sort of shim that I'm imagining.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 10:31 UTC (Wed) by mb (subscriber, #50428) [Link]
Well, yes. There's no right to demand. That's correct.
But CPython is certainly not alone on its own when doing decisions. CPython is not some kind of end user application, where only the project and its end users are affected by decisions. CPython is a (de facto standard defining implementation of a) programming language.
And Python developers *are* very good in their decision processes. They do generally care a lot about their users.
But the current deprecation process of trivial things just causes a lot of work for nothing on the user side, and bad reputation for (C)Python.
That's why I'm in favor of sticking with deprecated things forever, if they are relatively easy to maintain. Or at least *until* they become hard to maintain. Just hide them from the documentation, so that no new development is based on it.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 12:08 UTC (Wed) by smurf (subscriber, #17840) [Link]
Then, when it's apparent that one becomes too much of a maintainer burden, the pressure to remove things *now* becomes rather high. So it'll get dropped even if there are still users out there.
On the other hand, if it's clear that once something's deprecated it'll vanish after two more releases, everybody has some incentive to actually fix their code before that happens.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 15:16 UTC (Wed) by david.a.wheeler (subscriber, #72896) [Link]
That is NOT a problem in many cases. Deprecated aliases that last thousands of years are PERFECTLY FINE.
> On the other hand, if it's clear that once something's deprecated it'll vanish after two more releases, everybody has some incentive to actually fix their code before that happens.
I'm big on fixing code over time, but all developers have to prioritize their code. Please let me focus on what's important, not on the spelling of a method name.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 8:39 UTC (Thu) by nim-nim (subscriber, #34454) [Link]
If you develop using a feature-poor stack (in C for example), you may never need to change your code due to someone else’s decision.
If you develop using a feature-rich batteries included stack there is a ton of features you get almost free, often better coded than you would yourself (because even if you had the capability, you never had the time to rewrite properly all of them). But, it’s only almost-free, this kind of stack is never really done and you have to adapt your code over time to its changes.
Asking the feature-rich stack to provide perfect eternal backwards compatibility is not reasonable. Pinning its state (like some people do on the static building and container side) is only deferring the technical debt, with eventual abandonment of the software in the future when the debt pile has grown so much that it weighs more than the software's value.
This kind of write and forget dev only works for games that need to pass a couple Christmas seasons working and nothing more. Also sane people do not let games touch serious data (financial or other).
Deprecated shouldn't mean removal
Posted Feb 3, 2022 11:06 UTC (Thu) by Wol (subscriber, #4433) [Link]
Might be a bit late for Python, but if it's batteries-included, the language should be split into 3 parts. "Core" which is guaranteed to (almost) never change, "Battery Packs" where all the nifty things live, and "Recycle Bin" where battery packs go to die. Then the development environment can moan every time it goes to load a battery pack and finds it in the recycle bin.
Cheers,
Wol
Deprecated shouldn't mean removal
Posted Feb 3, 2022 16:56 UTC (Thu) by atnot (subscriber, #124910) [Link]
However last time this was proposed Guido stormed out of the room in anger and made the presenter quit python altogether so perhaps it's not wise to propose it again. (https://lwn.net/Articles/790677/)
Deprecated shouldn't mean removal
Posted Feb 3, 2022 17:52 UTC (Thu) by Wol (subscriber, #4433) [Link]
If nobody wants to maintain it, that's where it goes ...
Cheers,
Wol
Deprecated shouldn't mean removal
Posted Feb 10, 2022 14:35 UTC (Thu) by irvingleonard (guest, #156786) [Link]
It's very expensive to improve stable code, because of the constraint that you can't break existing stuff. It's only "safe" to add new stuff, which you'll have to maintain "forever", so you had better foresee any future need or you'll soon be writing the 3rd version of your function, and so the story goes. With every version you increase the maintenance burden (you better have tests) and you discourage any change upstream: any change in the actual language would end up affecting 3 functions instead of 1, and that's only for "this" thing. This approach has the advantage that anyone using the code will be able to do so "forever" at the expense that it will be "all" that you'll get from it. Need a better "std_fancy_function5"? You're out of luck, the maintainer ran out of hair after version 4 and outright quit after 5; but we're looking for maintainers, so you could contribute "std_fancy_function6" and maintain the other 5...
On the other hand, an evolving stdlib will keep breaking stuff and generating work for developers, sometimes just annoying, sometimes useful, BUT you could get that "std_fancy_function6" basically "for free"; just keep in mind that you have to update all your code using 1-5.
The current state of affairs is something in between: the stdlib is so important that major changes are discouraged (which helps very little in the usability side) but at the same time such changes are not prohibited and eventually find their way to a stable version (which infuriates some people). It's a lose-lose situation, where some people get burned because of the changes while others end up using 3rd party libraries because of the limitations of the stdlib counterpart.
Other solutions for abandon-code:
- Avoid the stdlib, since it's the major source of changes, you probably create "set and forget scripts" as long as you don't rely on any module. I would say that creating "a program" this way would be too much, but simple scripts should survive for a very long time.
- Just hang on, the code will eventually mature enough, like the 3.10, which added/changed very little in comparison to other versions.
- Use another, compiled, language and just statically link everything (I wouldn't use this code in anything critical since you would be "baking" all the libraries' bugs into your binary, forever)
Deprecated shouldn't mean removal
Posted Feb 2, 2022 17:08 UTC (Wed) by rgmoore (✭ supporter ✭, #75) [Link]
On the other hand, if it's clear that once something's deprecated it'll vanish after two more releases, everybody has some incentive to actually fix their code before that happens.
Assuming there's somebody who's actively maintaining the code. There's a lot of code out there that is in low-effort maintenance mode. That means the first anyone will know about it is that their application breaks, after which someone will have to scramble to fix it. It seems as if Python is basically saying it's for projects that will always be under active development forever, and people who want to write something that will keep working with minimal maintenance need not apply.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 18:57 UTC (Wed) by fenncruz (subscriber, #81417) [Link]
Deprecated shouldn't mean removal
Posted Feb 2, 2022 20:21 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]
Deprecated shouldn't mean removal
Posted Feb 2, 2022 20:28 UTC (Wed) by tnoo (subscriber, #20427) [Link]
or you use conda and run your code in the exact environment you need
Deprecated shouldn't mean removal
Posted Feb 2, 2022 21:36 UTC (Wed) by Kamiccolo (subscriber, #95159) [Link]
please, stop.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 6:10 UTC (Thu) by tnoo (subscriber, #20427) [Link]
Deprecated shouldn't mean removal
Posted Feb 3, 2022 16:21 UTC (Thu) by sb (subscriber, #191) [Link]
Whenever someone expresses dismay at the maintenance burden of using Python, and how so much of that burden was entirely avoidable, someone else comes along and says something like "just add shmronda, very convenient" :-) This gets old after a while, especially because the suggestion is often presented as a unique sine qua non for Python but just happens to be that person's preferred workaround at the time and there are several others, presented likewise by their proponents.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 11:55 UTC (Wed) by azumanga (subscriber, #90158) [Link]
The Python dev team went as far as to threaten legal action against anyone who tried to keep Python 2 alive, if they called their project anything close to "Python".
I am aware that sounds surprising, so here is a link: https://github.com/naftaliharris/tauthon/issues/47#issuec...
Deprecated shouldn't mean removal
Posted Feb 2, 2022 15:39 UTC (Wed) by rgmoore (✭ supporter ✭, #75) [Link]
The problem is the Python developers don't want backwards compatibility.
Exactly this. I think the root is that the devs learned the absolute wrong answer from the 2 to 3 transition. I think we can all accept that was bad, and nobody wants to go through it again. But what most programming languages would learn from that is to avoid backward-incompatible changes whenever possible. What Python learned from it was to make backwards incompatible changes unavoidable, so developers are forced to change with the language rather than relying indefinitely on things that are eventually going away.
That's a reasonable approach for programs that are being actively developed, at least as long as the overall trajectory of the language is positive. In that case, developers are willing to pay a deprecation tax to keep up. It's terrible for programs that are being developed slowly or expected to keep functioning with minimal maintenance, since those programs don't need the new features and have to spend their limited maintenance effort on keeping up with apparently unnecessary changes. It's especially egregious in the case of a language like Python that depends on the runtime to function, since there's no way to avoid dealing with the changes.
The long-term problem is that this should be really scary to people considering Python for projects that aim to reach a stable final product. The Python devs' attitude says it's a bad language for projects that aim for stability. You can't build a stable program on an unstable language, and Python intends to remain unstable.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 19:36 UTC (Wed) by cyperpunks (subscriber, #39406) [Link]
The long-term problem is that this should be really scary to people considering Python for projects that aim to reach a stable final product. The Python devs' attitude says it's a bad language for projects that aim for stability. You can't build a stable program on an unstable language, and Python intends to remain unstable.
Python is dead as a viable language if the current policy doesn't stop very soon. It's just too risky to base your work on such an unreliable project. It's not just the deprecation policy, it's the very short release cycles, the idiotic lack of a crypto lib in core (the dependency on Rust in cryptography just to make the point of madness very clear), the non-development of pip and the whole "we don't care because we are free (as in beer)" attitude. It's sad.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 4:22 UTC (Thu) by roc (subscriber, #30627) [Link]
Deprecated shouldn't mean removal
Posted Feb 2, 2022 17:02 UTC (Wed) by mb (subscriber, #50428) [Link]
I don't think that is true.
We're talking about corner cases here. Overall the language is pretty backwards compatible, aside from the 2-3 transition. (See for example how match had been implemented into the parser).
However, corner cases are still very important.
> The Python dev team went as far as to threaten legal action against anyone who tried to keep Python 2 alive, if they called their project anything close to "Python".
Well, I won't argue whether it is Ok to threaten legal actions here.
But I would simply _expect_ people to rename the project, or at least clearly mark it as a fork, if they fork it.
This is a matter of decency towards their users and to the original project.
Deprecated shouldn't mean removal
Posted Feb 2, 2022 17:36 UTC (Wed) by Wol (subscriber, #4433) [Link]
The problem with that, is that the NEW name goes to the OLD software, while the OLD name stays with the NEW software.
Okay, I can understand the project owners not wanting the fork to keep the name, but equally the fork is changing NOTHING BUT the name, that's the whole point of the fork! So if they're not changing anything else, why do they need to change that?
That's why Perl6 forking off as Raku was a victory for common sense over personal pride.
Cheers,
Wol
Deprecated shouldn't mean removal
Posted Feb 3, 2022 0:49 UTC (Thu) by jkingweb (subscriber, #113039) [Link]
After reading the whole thread, I think that's a gross mischaracterization of what actually happened.
It was pointed out that calling the software "Python 2.8", while technically accurate (for some definition of accurate), was legally problematic and potentially a source of significant confusion. The author was open to changing the name, and while alternatives were being discussed, a third party took it upon themselves to besmirch van Rossum's character. Thus the latter responded negatively to that, but it seems to have been in a sarcastic, deadpan way. I find it hard to interpret that as an actual threat.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 1:00 UTC (Thu) by vstinner (subscriber, #42675) [Link]
I read that often in the last 10 years. So far, I didn't see any volunteer doing it, even after Python 2.7 support ended 2 years ago.
Red Hat backports security fixes to Python 2.7 in Fedora, RHEL 7 and RHEL 8 until 2024. Fedora patches are public: https://src.fedoraproject.org/rpms/python2.7/tree/rawhide
The problem is that users expect more than just the language and the stdlib when they want "Python". They also expect large Python projects like numpy, Jupyter or PyTorch, but these projects already dropped Python 2 support: https://python3statement.org/
> The Python dev team went as far as to threaten legal action against anyone who tried to keep Python 2 alive, if they called their project anything close to "Python".
Tauthon is *not* Python 2.7. It is something between Python 2.7 and Python 3 which could be called "Python 2.8". PEP 404 rejected the idea of a Python 2.8 version: https://www.python.org/dev/peps/pep-0404/
Tauthon description: "Fork of Python 2.7 with new syntax, builtins, and libraries backported from Python 3."
Anyone is free to fork Python 2.7, add recent Fedora security fixes and maybe fix a few bugs. Since most Linux distributions still ship Python 2.7 in 2022, there is no need to maintain a Python 2.7 fork right now. You're free to continue using Python 2.7.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 13:37 UTC (Thu) by farnz (subscriber, #17727) [Link]
As a side note, there's also a general tendency to complain when Red Hat stops doing maintenance work. When Red Hat stopped making "make X11 work well for people not using Wayland" part of his job, Adam Jackson stepped down from being the Xorg release manager - he'd only been doing it because Red Hat made it part of his job.
No-one else stepped up to that job, so it didn't get done, but people were willing to complain bitterly that it wasn't happening, and that Red Hat "should" have made Adam continue to do it for them, not because Red Hat needed it for their product.
I could see the same happening with Python 2.7; Red Hat will stop supporting it in 2024 unless someone (or a group of someones) steps up with significant money for them to do it. If no-one else steps into that breach, I predict that in 2025 or so, we'll see a selection of complaints that no-one is supporting Python 2.7 any more, but they were using it, and someone should have supported it for them so that whatever problem comes up with Python 2.7 didn't bite them.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 15:06 UTC (Thu) by cortana (subscriber, #24596) [Link]
and a year away from Perl 5.14. :)
Deprecated shouldn't mean removal
Posted Feb 3, 2022 15:29 UTC (Thu) by azumanga (subscriber, #90158) [Link]
All messages I have seen have made very clear that there will be no more releases at all, not "there is insufficient support".
Deprecated shouldn't mean removal
Posted Feb 3, 2022 15:54 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]
You could sign up by changing that: create a fork of Python 2.7 on GitHub and incorporate the Fedora patches.
Since the Python project has trademarked the name "Python" and wants to be your enemy, you'll have to change the name. "Snek" may be a good choice. Be aware that you should be able to still advertise your project as a "continuation of the Python 2.7 codebase" or "a project to provide continued maintenance for the Python 2.7 codebase" since those are true statements and you can generally use a trademark when you are making true statements. That's not legal advice, but I think it's true. Satisfy yourself as to the accuracy of what I said beforehand so that if the Python project tries to bully you, and they might, you can feel safe standing up for yourself.
Someone will eventually do this work if you don't, probably when the last distro stops providing patches. However, if you think you'd be good at it, call dibs by doing it now.
Deprecated shouldn't mean removal
Posted Feb 3, 2022 15:55 UTC (Thu) by linuxrocks123 (subscriber, #34648) [Link]
https://github.com/naftaliharris/tauthon
That's more of a Python 2.8 than continued maintenance for 2.7, though.
Deprecated shouldn't mean removal
Posted Feb 4, 2022 10:15 UTC (Fri) by ceplm (subscriber, #41334) [Link]
Deprecated shouldn't mean removal
Posted Feb 2, 2022 23:08 UTC (Wed) by vstinner (subscriber, #42675) [Link]
This article is about unittest and configparser. Most unittest deprecation warnings were added to Python 2.7 in 2010: 12 years ago. configparser deprecations were added to Python 3.2 in 2011: 11 years ago.
This article is not about *removing* the deprecated features but *keeping* them for one more year (Python 3.11) and better advertising these deprecations (that developers managed to ignore for longer than 10 years).
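For readers who have not met these deprecations, here is a minimal sketch of what the replacement spellings look like. It assumes the deprecated names in question are the `assertEquals` alias in unittest and `SafeConfigParser` / `readfp()` in configparser; the INI content and the test class are made up for illustration.

```python
import configparser
import io
import unittest

# Made-up configuration text, used only for this illustration.
INI_TEXT = "[server]\nport = 8080\n"

parser = configparser.ConfigParser()      # rather than SafeConfigParser
parser.read_file(io.StringIO(INI_TEXT))   # rather than readfp()


class PortTest(unittest.TestCase):
    def test_port(self):
        # assertEqual rather than the deprecated assertEquals alias
        self.assertEqual(parser.getint("server", "port"), 8080)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PortTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The replacement names have existed for as long as the deprecations have, so code written this way should run unchanged on old and new interpreters alike.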
Deprecated shouldn't mean removal
Posted Feb 2, 2022 23:59 UTC (Wed) by fman (subscriber, #121579) [Link]
Hostile nails it. That is exactly the feeling you get when being at the sharp end of the stick with no hope for sympathy for your frustrations from the core dev team.
Python and deprecations redux
Posted Feb 2, 2022 8:42 UTC (Wed) by azumanga (subscriber, #90158) [Link]
None of these changes seem to allow future improvements, or fix security issues, they are just "tidying".
I'm happy for documentation of deprecated functions to be hidden, and maybe an always on warning.
It increasingly feels like Python actively hates the idea that I might write a program and it just be done. I don't want to keep doing tidy up every year. I have C99 programs doing useful work; they haven't needed any "cleanup" in 20 years.
Python and deprecations redux
Posted Feb 2, 2022 9:47 UTC (Wed) by cortana (subscriber, #24596) [Link]
I'm happy for documentation of deprecated functions to be hidden
I can't stand this. If I'm looking through source code and I see an unfamiliar function, I want the Python Standard Library documentation to document it. Absolutely add a note that it's deprecated but don't hide it!
Python and deprecations redux
Posted Feb 2, 2022 10:42 UTC (Wed) by smcv (subscriber, #53363) [Link]
Replacing the entire documentation for the deprecated foo_bar_baz function with "deprecated equivalent of foo_bar(baz=True), use that instead" is often fine, though.
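A minimal sketch of that pattern, assuming hypothetical functions named after the ones in the comment (the upper-casing behaviour is invented purely so there is something to observe):

```python
import warnings


def foo_bar(value, baz=False):
    """Hypothetical replacement API: return value, upper-cased when baz is set."""
    return value.upper() if baz else value


def foo_bar_baz(value):
    """Deprecated equivalent of foo_bar(value, baz=True); use that instead."""
    warnings.warn(
        "foo_bar_baz() is deprecated; use foo_bar(value, baz=True) instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this wrapper
    )
    return foo_bar(value, baz=True)


# Record the warning explicitly so the demonstration does not depend on
# the interpreter's default warning filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = foo_bar_baz("spam")
```

The one-line docstring is the whole documentation for the old name, and the wrapper keeps existing callers working while pointing them at the replacement.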
Python and deprecations redux
Posted Feb 2, 2022 22:25 UTC (Wed) by tialaramex (subscriber, #21167) [Link]
Take Rust's str::trim_left_matches(). This function is deprecated since 1.33 about 3 years ago. Unlike Python it's unlikely Rust will ever actually remove deprecated standard library functions but nevertheless it is deprecated and your code should call str::trim_start_matches()
The documentation tells you this, but if you want to understand why you need to read the full description of the functions and perhaps if you've never seen them before, read a little about other human writing systems. Nobody is surprised by what trim_start_matches does but there is potential to be surprised by trim_left_matches depending on how you're thinking about the problem.
Python and deprecations redux
Posted Feb 2, 2022 11:01 UTC (Wed) by mb (subscriber, #50428) [Link]
You can look into an older version of the documentation.
The latest documentation should only include the name of the deprecated interface, a big deprecation statement that tells us since when it has been deprecated (so you can look up the old documentation).
All technical description of the interface should be removed. Optionally the name of the new interface could be added, if that's applicable.
Python and deprecations redux
Posted Feb 4, 2022 13:47 UTC (Fri) by ceplm (subscriber, #41334) [Link]
All those stories about how “my thirty year old C program still works just fine” are based on two assumptions, which I am not sure people are willing to accept and which Python certainly won’t satisfy. The language (C, Fortran, COBOL) must be dead. So, even languages which are slightly alive (C++, Java, Perl) could be problematic and need periodic adjustments. And of course, aside from the dead language, you cannot use any libraries (because those change), and you need a dead environment (how are your C-languages for Plan9 or CP/M doing?). So, if you have a program which uses just stdio.h from the K&R book, it will work still (perhaps), with plenty of warnings, but anything more involved will have problems.
Python and deprecations redux
Posted Feb 4, 2022 19:21 UTC (Fri) by klindsay (subscriber, #7459) [Link]
While there are some deleted features, the new features are added in a largely backwards compatible way. This is intentional. Backwards compatibility is a high priority of the standards committee. The 2003 standard includes the sentence "This standard protects the users’ investment in existing software by including all but five of the language elements of Fortran 90 that are not processor dependent.".
So there are programming languages that evolve and simultaneously prioritize backwards compatibility. It's not one or the other, as your second paragraph seems to imply.
Python and deprecations redux
Posted Feb 4, 2022 19:49 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]
With Python it's getting difficult to run 5-year old code. Which is a problem.
Python and deprecations redux
Posted Feb 6, 2022 1:23 UTC (Sun) by abartlet (subscriber, #3928) [Link]
Of course Samba deprecates features and options as well - but I like to think we only do the removal when we really need to, not just to follow up a deprecation that almost by definition was not discussed with all/any significant fraction of the users.
Samba only finds out when Fedora builds break.
Python and deprecations redux
Posted Feb 6, 2022 3:09 UTC (Sun) by pabs (subscriber, #43278) [Link]
Python and deprecations redux
Posted Feb 2, 2022 12:24 UTC (Wed) by jezuch (subscriber, #52988) [Link]
It's been like this for decades.
There's also a well-established process for deprecation for removal, according to which the deprecated things are clearly marked as such in the code. It is used for features which are known to not be widely used, or are known to be positively harmful (like the Applet API, or, more recently, finalization - so yes, it happens even for features once considered a core part of the language). Apart from that there's tons of deprecated stuff, and it just sits there undisturbed.
I have no point to make, and I don't even really care about Python :) But any time I read about Python community struggling with something, I see that it's successfully been done elsewhere. And it feels really amateurish in comparison.
Python and deprecations redux
Posted Feb 10, 2022 14:35 UTC (Thu) by irvingleonard (guest, #156786) [Link]
- Who maintains the deprecated stuff in Java? (since it "just sits there undisturbed")
- Do the maintainers allow new bug reports for deprecated code?
- What about patches?
- If that's all possible then what's the difference to non-deprecated code? Just vocal stance against its use?
Python and deprecations redux
Posted Feb 10, 2022 20:20 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
Java maintainer (Oracle).
> - Do the maintainers allow new bug reports for deprecated code?
Yes, for security issues.
> - What about patches?
Not really.
> - If that's all possible then what's the difference to non-deprecated code? Just vocal stance against its use?
Deprecation warnings during compilation.
Python and deprecations redux
Posted Feb 25, 2022 21:39 UTC (Fri) by irvingleonard (guest, #156786) [Link]
Basically what I said here https://lwn.net/Articles/884324/ with the caveat that if Java works for you, you should definitely use Java.
Python and deprecations redux
Posted Feb 25, 2022 22:58 UTC (Fri) by dtlin (subscriber, #36537) [Link]
Not close to the same rate as Python, but even beyond the API of the standard classpath, Java modularization breaks a good number of programs, both by making the runtime stricter and by removing previously standard components as well.
Python and deprecations redux
Posted Feb 28, 2022 14:30 UTC (Mon) by irvingleonard (guest, #156786) [Link]
Python and deprecations redux
Posted Feb 2, 2022 13:51 UTC (Wed) by zeekec (subscriber, #2414) [Link]
* Note: I'm too lazy to google to see if this has already been suggested.
Python and deprecations redux
Posted Feb 2, 2022 15:34 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
- APIs where the replacement works in the declared version (please use the better spelling, pattern, replacement, etc.)
- APIs slated for removal Real Soon Now™ (this is effectively a request to bump the minimum requirement)
Of course no project currently has a way to tell the standard library what version it expects to work with, so this only works once such things can be communicated. Note that you also might have mixing with different modules/packages expecting different minimums, so this is something that should be attached to where the code is declared, not something set from the top-level package.
For bonus points, have a second version which states what version it is aware of. Then you can warn about any API deprecated before that second version additionally (the code is expected to have been made runtime-conditional to avoid any version skew issues).
Python and deprecations redux
Posted Feb 2, 2022 16:45 UTC (Wed) by vstinner (subscriber, #42675) [Link]
A problem is that Python default behavior is for *users*, not developers: PendingDeprecationWarning and DeprecationWarning are hidden by default. The Python Development Mode (-X dev or PYTHONDEVMODE=1) shows these warnings.
pytest (popular library to write tests) and unittest (stdlib module) have shown DeprecationWarning warnings for a few years now: it wasn't the case previously. Things are evolving to better handle deprecations in Python.
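As a rough illustration of surfacing these warnings on purpose, the sketch below turns DeprecationWarning into an exception for one block of code. The `old_api()` function is hypothetical and simply stands in for any deprecated call.

```python
import warnings


def old_api():
    """Hypothetical function standing in for any deprecated call."""
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


# Turn DeprecationWarning into an exception inside this block, similar in
# spirit to running a test suite with `python -W error::DeprecationWarning`.
with warnings.catch_warnings():
    warnings.simplefilter("error", DeprecationWarning)
    try:
        old_api()
        raised = False
    except DeprecationWarning:
        raised = True
```

For a whole program, the development mode mentioned above (`python -X dev script.py` or `PYTHONDEVMODE=1`) shows the warnings without any code changes.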
Python and deprecations redux
Posted Feb 2, 2022 19:38 UTC (Wed) by iabervon (subscriber, #722) [Link]
Python and deprecations redux
Posted Feb 3, 2022 0:28 UTC (Thu) by rra (subscriber, #99804) [Link]
It seems like every post about Python on LWN prompts a flurry of comments from people who want to posture about how much they dislike Python, and I hope that's not discouraging to the LWN writers. Please know that many of us use Python regularly and appreciate your coverage and your typical thorough and dispassionate job of keeping us up to date.
Python and deprecations redux
Posted Feb 3, 2022 1:44 UTC (Thu) by jake (editor, #205) [Link]
I don't think it is, really. Programming languages seem to bring out a certain level of disdain from fans of other languages (or non-fans of Python or whatever) in comments sections everywhere. It is not a perfect language or community, by any means, but lots of folks (including LWN) use it for all sorts of interesting things and generally enjoy doing so. The Python community has given us a huge gift.
That said, your note certainly helped encourage us to continue covering it like we do. We appreciate it greatly.
jake
Python and deprecations redux
Posted Feb 3, 2022 7:59 UTC (Thu) by AdamW (subscriber, #48457) [Link]
Python and deprecations redux
Posted Feb 3, 2022 23:46 UTC (Thu) by vstinner (subscriber, #42675) [Link]
Python and deprecations redux
Posted Feb 5, 2022 17:45 UTC (Sat) by willy (subscriber, #9762) [Link]
I don't think that's really what's going on in the comments for this particular article. Speaking for myself, I don't write Python (or any language that might remotely be considered a competitor), but I do want to be able to run code other people wrote without having to figure out what "the new way" to do that thing is.
I just want Python to be better, and I get the strong sense the other critics here want the same thing.
Python and deprecations redux
Posted Feb 3, 2022 9:31 UTC (Thu) by danpb (subscriber, #4831) [Link]
> .... Stinner had some thoughts on being even more proactive in the future. He suggested that before making an incompatible change, doing a search of the Python Package Index (PyPI) for uses of the feature in question "and try to update these projects *before* making the change". Once the number of affected projects has been reduced to some low number (he suggested 15), the change could be made in Python.
Not knowingly breaking stuff on PyPI is great, but what about the millions of projects using python code that don't exist on PyPI? PyPI merely hosts the code designed and published as reusable modules, but there's likely orders of magnitude more python code existing in leaf applications (both open source and private to an organization) that's just as important, probably more so to those who use it.
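Projects that are not on PyPI can at least run the same kind of search over their own trees. The following is only a sketch of the idea, not the tooling the core developers use; the set of deprecated names is an assumption chosen for illustration, and the sample source is made up.

```python
import ast

# Names assumed for illustration; extend with whatever a release deprecates.
DEPRECATED_ATTRS = {"assertEquals", "readfp", "SafeConfigParser"}
DEPRECATED_NAMES = {"SafeConfigParser"}


def find_deprecated_uses(source, filename="<string>"):
    """Return sorted (line, name) pairs for deprecated names found in source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Attribute) and node.attr in DEPRECATED_ATTRS:
            hits.append((node.lineno, node.attr))
        elif isinstance(node, ast.Name) and node.id in DEPRECATED_NAMES:
            hits.append((node.lineno, node.id))
    return sorted(hits)


# Made-up sample; the code is only parsed, never executed.
SAMPLE = '''\
import configparser
cfg = configparser.SafeConfigParser()
cfg.readfp(open("app.ini"))
'''

hits = find_deprecated_uses(SAMPLE)
```

Because it only parses the source, a scan like this works even when the target interpreter has already removed the names in question.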
Python and deprecations redux
Posted Feb 3, 2022 23:50 UTC (Thu) by vstinner (subscriber, #42675) [Link]
I wrote a similar tool for C extensions: https://github.com/pythoncapi/pythoncapi_compat
At least, more and more incompatible changes are documented with practical instructions on how to update existing code in the "What's New in Python 3.x" document. Example with Python 3.11: https://docs.python.org/dev/whatsnew/3.11.html
Python and deprecations redux
Posted Feb 6, 2022 4:01 UTC (Sun) by ras (subscriber, #33059) [Link]
As it happens just about everything that transition did could have been handled the old way. The one exception was the change to the way Python handled Unicode. Which is ironic because IMHO, the way they handle Unicode and Bytes in Python3 is objectively worse than Python2. In fact their treatment of bytes, and in particular that b'abc'[0] returns an integer and somehow deciding b'abc'[0] in b'abc' would be False, is inexplicable. It's almost like they forgot their mission was to improve Python, not make it more like Java.
The sad part about all that is had they stuck to original way of doing transitions rather than using the flag day method for the Unicode change, I suspect the worst of those decisions would not have made it through the process.
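For anyone who wants to check the behaviour under discussion, here is a small sketch of how indexing, slicing, membership and equality behave for bytes on CPython 3, as far as I can tell. Note that the membership test appears to succeed; it is the equality comparison against a bytes literal that comes out False.

```python
data = b"abc"

first = data[0]          # indexing yields an int in Python 3
first_slice = data[:1]   # slicing yields a bytes object of length one

int_is_member = first in data          # ints in range(256) are accepted by `in`
slice_is_member = first_slice in data  # so are bytes-like subsequences
equals_literal = first == b"a"         # an int never compares equal to bytes
```

Code ported from Python 2 that compared `data[0]` with a one-character string is the typical place where this difference shows up.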
Python and deprecations redux
Posted Feb 10, 2022 14:34 UTC (Thu) by irvingleonard (guest, #156786) [Link]
Python and deprecations redux
Posted Feb 10, 2022 20:19 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]
Really? Which one of these 5 non-UTF encodings do you prefer: KOI8-R, KOI8-U, CP1251, DOS866, or perhaps the most standard of them all: ISO/IEC 8859-5?
At the time Python3 was developed, UTF-8 made national encodings basically useless and stupid.
Python and deprecations redux
Posted Feb 25, 2022 21:30 UTC (Fri) by irvingleonard (guest, #156786) [Link]
Python and deprecations redux
Posted Feb 25, 2022 21:43 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]
Python and deprecations redux
Posted Feb 11, 2022 2:22 UTC (Fri) by ras (subscriber, #33059) [Link]
I can't agree. The (b'abc'[0] in b'abc') is False thing has nothing to do with my world view. I'd be amazed if that little "lets copy Java" feature doesn't cause more 2to3 transition bugs than the rest combined.
Nor does my world view matter when it comes to unix file names having no well defined encoding. I'd agree the world would be a better place if they did, but the reality is they don't. Python2's string handling handled the situation without missing a beat. Python3 looks like they handled it by going into denial and assuming it was possible to convert a file name into a readable string, despite the fact they knew full well it could be written by a python program with LANG=af_ZA.ISO-8859-1 and later read by that same program with LANG=af_ZA.UTF-8.
I've never met anybody with a world view in which the ISO 10646 vision of allocating one true number to every grapheme was a bad thing. I've never met anybody with a world view that led them to think all readable text shouldn't be encoded using such numbers. Unfortunately I've never lived in a world where code assuming those two things were a given didn't cause the programs using them to break far, far too often. Maybe you just haven't dealt with enough crap coming in from the internet to notice.
Python2 had a workable compromise that let us move towards ISO 10646 in a graceful way while being subject to gobs of broken text. As a "be conservative in what you do, be liberal in what you accept" compromise it wasn't bad, although I agree it could have done more in nudging us more towards "conservative in what you do". Python3 was apparently arrogant enough to think it could fix the language encoding problem by just assuming it was already fixed and that this would force the world to follow. Oddly, they were wrong, the world didn't change on a dime just because Python3 was a thing. Turned out you can't change the world by forcing Python programmers to, as you say, revisit python code "to check for the string/bytes distinction", because making such a binary distinction is impossible in some cases. It's possible for Unix file names and configuration files to be both - blocks of ASCII text intermingled with some unknown encoding. It's not just Linux, HTML / HTTP / RFC 5821 all tend to assume you will treat it as an unknown encoding with meaningful bits of ASCII embedded. If you are lucky, some of those meaningful bits might even tell you the encoding of the rest.
The really sad bit is Unicode didn't even get the ISO 10646 vision right. After the UCS2 / UTF-16 encoding debacle, they left themselves with such a small encoding space they dropped the "one coding point for each grapheme" thing in favour of diacritics. With diacritics ("o" in string) could well return True when there is in fact no "o" in the string. What were they thinking? Perhaps it was "hey, I've found a next way we can trick programmers into introducing a whole pile of new exploits!".
As you can probably gather, I've come to find the entire Unicode thing (not just Python3's part in it), a depressing subject. We could have done so much better.
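As a point of reference for the file name case, Python 3 does have an escape hatch: the surrogateescape error handler from PEP 383, which is what `os.fsdecode()` and `os.fsencode()` use on POSIX systems. The sketch below applies the handler directly so it does not depend on the local filesystem encoding; the file name is made up.

```python
raw_name = b"caf\xe9.txt"  # made-up Latin-1 file name, not valid UTF-8

# Strict decoding fails, which is the situation the comment describes.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles the undecodable byte through as a lone surrogate,
# so the original bytes can be recovered exactly.
text_name = raw_name.decode("utf-8", errors="surrogateescape")
round_trip = text_name.encode("utf-8", errors="surrogateescape")
```

The resulting string round-trips, but it is not printable text in any meaningful sense, which is arguably the point being made above.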
Python and deprecations redux
Posted Feb 11, 2022 16:40 UTC (Fri) by jwilk (subscriber, #63328) [Link]
Even Unicode 1.0 had codepoints for diacritics (see §2.5 "Non-spacing Marks"). UTF-16 was introduced only in Unicode 2.0.
Python and deprecations redux
Posted Feb 14, 2022 10:54 UTC (Mon) by farnz (subscriber, #17727) [Link]
Diacritics is an interesting example to pick, since depending on the language, ó is an o. Or sometimes not. Depends on which language you're speaking.
And this is why Unicode defines normalization forms, so that you can take in an arbitrary set of Unicode codepoints, and turn them into a uniform sequence of codepoints regardless of what the user has entered (by doing things like setting an order for multiple combining marks).
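A short sketch of what that looks like with the stdlib `unicodedata` module, including the `"o" in string` question raised earlier in the thread:

```python
import unicodedata

composed = "\u00f3"                                   # ó as a single code point
decomposed = unicodedata.normalize("NFD", composed)   # "o" + U+0301 combining acute

o_in_composed = "o" in composed        # no bare "o" in the composed form
o_in_decomposed = "o" in decomposed    # but there is one after decomposition
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
```

Normalizing both sides to the same form before comparing or searching gives consistent answers, although which answer is "right" still depends on the language, as noted above.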
Python and deprecations redux
Posted Feb 25, 2022 21:29 UTC (Fri) by irvingleonard (guest, #156786) [Link]
Say I have this protocol where I send a 3bit flag followed by five 1bit flags, and they happen to be "01100001", aka 0x61; python will show b'a'. Does it mean I sent an "a"? Am I expecting an "a"? Not really, I will struct my way into it, but python doesn't really know what it means before that, it can only tell me that "it looks like an 'a'", which makes a lot of people think, incorrectly, that "01100001"/0x61 will "always be an 'a'".
That being said, regarding the "byte search complaint", I see there might be a reason to "search a string of bits within a stream of them using byte boundaries" and I presume it was a political decision (this is just guesswork, I don't really know):
- if you're working with text, decode the bytes and use the text tools
- if you're working with binaries, struct your bytes and do the search there
I've tried working with binary stuff in python 2 and I can definitely tell you: python 3's bytes (and bytearrays) are a strong step forward in that regard.
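To make the 0x61 example concrete, here is a minimal sketch of pulling such a byte apart. The field layout (one 3-bit field followed by five 1-bit flags, most significant bit first) is an assumption based on the description above, and the variable names are invented.

```python
import struct

packet = b"a"  # one byte, 0x61 == 0b01100001; the repr only looks like text

(value,) = struct.unpack("B", packet)  # unpack as an unsigned 8-bit integer
mode = value >> 5                       # top three bits as one field
flags = [(value >> bit) & 1 for bit in range(4, -1, -1)]  # remaining five bits
```

Nothing about the byte says "letter a" once it has been decoded this way; the `b'a'` display is just the default repr for bytes in the printable ASCII range.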
Python and deprecations redux
Posted Feb 25, 2022 21:50 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]
No, it's not. Python3 doesn't actually make stuff any better, because Py3 strings are not actually strings. They are just sequences of Unicode codepoints.
For example, Py3 allows you to split strings across combining characters, so you can easily get nonsense like diacritics separated from the characters they should go on top of. Or worse, RTL text in an incorrect direction.
Heck, Py3's standard library doesn't even have locale-specific upper/lowercasing built in (see: Turkish dotless I). See how it's done in a language where developers actually care about correctness: https://pkg.go.dev/unicode#SpecialCase
Py3 just pretends that these complexities don't exist if you're writing English and allows developers to pat themselves on the back for "eating their veggies".
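A small sketch of the two behaviours described here, using a made-up word with a combining accent and the Turkish capital letters as examples:

```python
word = "cafe\u0301"  # "café" spelled as e + U+0301 combining acute accent

length = len(word)                 # counts code points, not perceived characters
head, tail = word[:4], word[4:]    # the slice separates the accent from its base

dotted_lower = "\u0130".lower()    # İ (capital I with dot above)
plain_lower = "I".lower()          # always "i"; Turkish would expect dotless "ı"
```

`str.lower()` takes no locale argument, so language-specific casing needs a third-party library such as PyICU.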
Python and deprecations redux
Posted Feb 26, 2022 12:56 UTC (Sat) by irvingleonard (guest, #156786) [Link]
Python and deprecations redux
Posted Feb 26, 2022 20:33 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]
Moreover, new features like format string don't work well with bytes. So if I have a binary protocol, I can't just do something like this: fb'{part1}\x11{part2}'
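There is no `fb''` literal, but for what it's worth, printf-style formatting on bytes (PEP 461, Python 3.5 and later) covers this particular case. A minimal sketch, with made-up payloads standing in for `part1` and `part2`:

```python
part1 = b"HELLO"  # made-up payloads for illustration
part2 = b"WORLD"

# printf-style formatting works on bytes since Python 3.5 (PEP 461).
framed = b"%b\x11%b" % (part1, part2)

# Joining pieces is another option when the parts are already bytes.
joined = b"".join([part1, b"\x11", part2])
```

Neither is as readable as an f-string, and `%b` only accepts bytes-like values, so any text still has to be encoded first.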
Python and deprecations redux
Posted Feb 28, 2022 15:51 UTC (Mon) by irvingleonard (guest, #156786) [Link]
Python 2 saw that and said: nah, we'll do strings by default. So, with this you could say that python 2's integers end up as strings and the other way around, which doesn't mean text, which is also a possible casting. Say: you can cast the int 255 to a string (binary) and it will be 0xff or you could cast it to a string (text) and it would be 0x323535 depending on which casting function you use (same origin and end types). Floats should be even more "interesting". To add insult to injury, python will autocast using whatever function it thinks should be used in some circumstances (__repr__).
From where I see it, you shouldn't process bytes directly unless you know what you're doing (you're so into it, that you know how to handle your data in encoded form) or you're just working with binary data. Bytes should be treated as a low level data format, that should support only low level functions and should be converted (decoded) into your data types for any meaningful processing (it's a new layer) or should be used by your application specific functionalities (audio functions, video functions, code functions, etc.). The main problem is that the documentation keeps linking them to "strings" (text) because of the history (aka Python 2) but python 3 bytes are not text, and shouldn't be shown as text, ever; but then we have binary strings b'Confusing?' so we can keep people scratching their heads and complaining about the shortcomings of "byte strings".
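For reference, here is a rough sketch of the two conversions of 255 mentioned above, written for Python 3. The variable names are made up for the example.

```python
value = 255

# "Binary" conversion: the integer becomes a single byte.
as_raw_byte = bytes([value])

# "Text" conversion: the decimal digits become three ASCII bytes.
as_ascii_text = str(value).encode("ascii")

raw_hex = as_raw_byte.hex()
text_hex = as_ascii_text.hex()

# A fixed-width encoding needs an explicit size and byte order.
fixed_width = value.to_bytes(2, "big")

# Gotcha: bytes(n) with a bare int creates n zero bytes, not one byte.
zero_filled = bytes(3)
```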
Python and deprecations redux
Posted Feb 28, 2022 19:03 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]
Uh.... Whut?!?
> Python 2 saw that and said: nah, we'll do strings by default. So, with this you could say that python 2's integers end up as strings and the other way around, which doesn't mean text, which is also a possible casting.
Py2 does not have a functional distinction between byte arrays and strings.
> Say: you can cast the int 255 to a string (binary) and it will be 0xff or you could cast it to a string (text) and it would be 0x323535 depending on which casting function you use (same origin and end types).
>>> chr(255)
'\xff'
>>> str(255)
'255'
> From where I see it, you shouldn't process bytes directly unless you know what you're doing
Whut?!?
I've written probably a hundred thousand lines of code in Py2 that worked with binary protocols, using regular strings. My biggest problems were printing with correct escaping and binary formatting.
I've also moved that code to Py3. Not once did I have a case where Py3 strings caused me to say: "Wow! That strings/binary separation is so nice, it saved me a day of debugging!". On the other hand, I've probably wasted weeks on: "Oh fuck. I forgot .encode() in that exception handler and that's why the application crashes".
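The kind of failure described here can be reproduced in a few lines. Both helper functions are hypothetical and exist only to illustrate the str/bytes mix-up.

```python
def naive_describe(payload: bytes) -> str:
    # Concatenating str and bytes raises TypeError in Python 3.
    return "bad payload: " + payload


def describe(payload: bytes) -> str:
    # Rendering the bytes explicitly (here as hex) avoids the mix-up.
    return "bad payload: " + payload.hex()


try:
    naive_describe(b"\x01\xff")
    mixing_failed = False
except TypeError:
    mixing_failed = True

message = describe(b"\x01\xff")
```

Because the error only surfaces when the line runs, a rarely exercised exception handler is exactly where it tends to hide.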
Python and deprecations redux
Posted Feb 28, 2022 21:20 UTC (Mon) by irvingleonard (guest, #156786) [Link]
- Py2 does not have a functional distinction between byte arrays and strings.
- Different casting function for the same origin->destination (the binary/text distinction lies there)
To make this work you'd have to be a great programmer (which apparently you are, so kudos to you), able to keep track of which of your variables are holding text and which binary data (perfectly doable, with a good naming convention and hard discipline) and of course, there's a huge landmine (you treat binary as text and it might blow up in your face). Now do 3rd party libraries: is this function returning text or "binary"? does this class expect binary? Again, everything is out-of-band, via documentation or naming convention or some other trick. I'm not as disciplined as you apparently are, so this is hard for me; I would rather have a totally independent type and have a type separation between binary and text.
Python and deprecations redux
Posted Feb 28, 2022 22:19 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]
Yes, and it's awesome!
> - Different casting function for the same origin->destination (the binary/text distinction lies there)
???
I still don't get it. Can you give an example of different casting in Py2?
> Now do 3rd party libraries: is this function returning text or "binary"?
Here's the question: why does it matter?
Python and deprecations redux
Posted Mar 2, 2022 5:13 UTC (Wed) by irvingleonard (guest, #156786) [Link]
Cheers.
Python and deprecations redux
Posted Feb 28, 2022 23:57 UTC (Mon) by ras (subscriber, #33059) [Link]
I admire your restraint.
Python and deprecations redux
Posted Feb 27, 2022 0:41 UTC (Sun) by ras (subscriber, #33059) [Link]
I'm with Cyberax here - I don't see that at all. And I do work with bytes.
Aside from the (b'abc'[1] in b'abc') thing (what were they thinking?), bytes and old Python2 strings are almost the same now.
Granted they weren't. In their initial vision, strings and bytes were two very different things. Strings were the old Python2 strings, but representing Unicode points rather than bytes. Bytes were a brand new thing - a sequence of uint8's. Maybe I'm wrong, but I get the feeling they were introduced because of the planned move to Unicode strings: they needed a "raw i/o" type, so they copied Java's. So they had two different sets of methods. I don't know why they introduced the b'123' syntax because it doesn't fit well within this vision. Perhaps it was to ease porting from Python2. In any case, it was the hint that something was wrong, and the vision would be severely compromised in the years to come.
What you call "string abuse" is what I call them being forced to acknowledge data in the real world is not like that. It isn't separated cleanly into strings and bytes. We can see Python3 was in the end forced to acknowledge that because it backported (nearly?) all the str methods to bytes. They did that because while the Java model worked, it creates a lot of code bloat. Java is the spiritual home of code bloat so no one thought it was out of place there, but it becomes painfully evident in Python3. So they've been forced to kludge their way around it by duplicating code. Unfortunately the kludges won't work anywhere near as well at reducing bloat as strings and bytes did in Python2.
> I can't just do something like this: fb'{part1}\x11{part2}'
And yet b'{part1}\x11{part2}' % locals() works fine, and the f string is just syntactic sugar for that. Had their new string model become "str is a subclass of a bytes object that contains pure unicode strings", and defined the str() and repr() functions to be things that always return utf-8 (ie a bytes object that is also a str), then there would be no problem - it would all fit together naturally and beautifully. That contrasts with the fb'{part1}\x11{part2}' restriction you mention, which is purely artificial. It's not imposed by the real world, or math. It's there because someone is inflicting their vision of intellectual purism on the rest of us.
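For readers who want to try this, here is a minimal sketch of the %-style formatting that bytes has supported since Python 3.5 (PEP 461). Note that it uses % placeholders such as %b rather than {name} fields; the part1 and part2 values are invented for the example.

```python
part1 = b"ab"
part2 = b"cd"

# %b interpolates bytes-like objects into a bytes template.
framed = b"%b\x11%b" % (part1, part2)

# Numeric codes such as %d and %x produce ASCII digits.
header = b"len=%d id=%02x" % (len(framed), 255)

# An alternative that needs no formatting at all: join the pieces.
joined = b"\x11".join([part1, part2])
```

Brace-style fields are not available here: bytes has no format() method and there is no fb'' literal, which is the restriction being argued about.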
When you say "at the expense of simplicity", I'm struggling to think of something that is simpler. The only thing I can come up with is bytes.__getitem__ returns an int. Yes, that is slightly more convenient than ord(b'abc'), and perhaps bytes((a, b, c,)) is more convenient than ''.join(chr(c) for c in (a, b, c,)), but all that is completely blown away by the (b'abc'[0] in b'abc') fubar. I'd give up on the former to get the latter fixed any day. The rest is pretty much the same in both worlds now, due to the method duplication. Even print(b'abc\x11') gives very similar output to print('abc').
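A quick sketch of how indexing, slicing and membership on bytes actually behave in Python 3 may help here. Which of these results the comment objects to is not spelled out, so this just records them; the names are illustrative.

```python
data = b"abc"

first_item = data[0]      # indexing yields an int
first_slice = data[0:1]   # slicing keeps the bytes type

# `in` accepts either an int in range(256) or a bytes subsequence.
int_is_member = data[0] in data
slice_is_member = data[0:1] in data

# An int never compares equal to a bytes object.
item_equals_literal = data[0] == b"a"

# Building bytes from a sequence of ints.
rebuilt = bytes((97, 98, 99))
```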
> I'm just saying that from a fresh dev point of view, one that started using python around 2.5-2.6, it felt like a great thing to fix.
No one is disputing that sequences of bytes and sequences of unicode feel very different conceptually, or that it feels intellectually fulfilling to make a sharp distinction between the two, or that at the start making Unicode and bytes mix better than oil and water didn't seem like it would be a very fruitful endeavour. It certainly was a difficult endeavour. In fact I suspect it would never have even been attempted, had the real world not intervened with example after example of a single blob of text being a mixture of ASCII and other crap, and it was real convenient to treat it all as text working just with the ASCII. Sure, carefully parsing it to separate out the ASCII and crap is intellectually purer, but it also generates the sort of bloat we see in Java. It didn't help that it is on occasion real convenient to apply a regex to something you knew damned well was just binary, or that the other string operations like slice, copy, join happened all the time with binary. While duplicating these two otherwise identical sets of operations into two incompatible types is intellectually purer, it also increases the real world cognitive load on the programmer, which is neither nice nor pure.
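Applying a regex to binary data, as mentioned above, is still possible in Python 3 provided the pattern is itself bytes. A minimal sketch, with a made-up blob:

```python
import re

# ASCII words embedded in otherwise arbitrary binary data.
blob = b"\x00\x01GET\xff\xfe/index\x00"

# A bytes pattern (rb"...") matches directly against bytes.
words = re.findall(rb"[A-Za-z]+", blob)

# A str pattern against bytes is rejected with TypeError.
try:
    re.findall(r"[A-Za-z]+", blob)
    str_pattern_allowed = True
except TypeError:
    str_pattern_allowed = False
```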
Python and deprecations redux
Posted Feb 28, 2022 16:28 UTC (Mon) by irvingleonard (guest, #156786) [Link]
There's another response from me about Python 2 strings here https://lwn.net/Articles/886363/
Regarding the origin theory, I really have no information on the whys and hows, so it's just theoretical discussion at this point. I could make the opposite case to the one you made.
If the main reason was to decouple "text" from "bits", hence the string/bytes distinction, then they just created a new problem: how do you cast one into the other? I'm sure there are many ways, and the current one could be great, or lousy, or good enough. I don't have expertise to opine there, all I can say is that it "feels good enough" for my use cases (and I see it flexible enough).
Python and deprecations redux
Posted Feb 26, 2022 2:02 UTC (Sat) by ras (subscriber, #33059) [Link]
Yes there is some tension between the various representations of a blob of bytes. You see it everywhere - not just in languages. For example tcpdump will give you both the hex and printable representations. And contrary to what you say, I've never met a systems programmer who was confused by that - they instinctively know which one they want, even if the packet dump contains a mixture of text and binary data.
There are many resolutions to this tension. Java for instance chooses to represent byte blobs as a sequence of uint8's, and strings as something entirely different. C (and later Python) made the observation that an ASCII string is also a sequence of bytes. They could have created a separate type system with mostly duplicated operations for bytes and strings (copies, concatenate, search, ...), but that would be insane, which is how we ended up in the Python2 world. You don't like that Python2 always defaults to representing blobs of bytes as their ASCII equivalents where they exist, \xHH otherwise, but as someone who's dealt with this for decades, while I need both, the string representation is usually the more useful one so it's a nice default, and it’s easy enough to convert to another representation - just [ord(c) for c in strng] say.
Then ISO 10646 came along. At first blush a sequence of ISO 10646 code points doesn't look much like a sequence of bytes. In fact in the beginning code points weren't a sequence of well defined anything as ISO 10646 had a variety of encodings, some of varying length. Java did the obvious and created those two parallel type systems - one for byte blobs and one for ISO 10646 text strings. (And sadly was sucked in by the UCS2 delusion Unicode created to justify its existence, as were many others at the time.) The duplication aside, two incompatible type systems worked well enough where there always was a sharp distinction between printable text and all other data. Unfortunately the plethora of different ways of encoding text prior to ISO 10646 meant in older data you had ASCII mixed with god knows what. The C/Python2 approach continued to handle that situation (which includes ISO 10646 using some unknown encoding) well, but the Unicode approach hardly at all.
Then Ken Thompson gave us UTF-8, an ISO 10646 encoding that could once again be treated as a strict sub-type of a sequence of bytes. The Unicode sub-type / trait or whatever you want to call it could literally just be a marker that said "I guarantee this byte sequence is valid UTF-8", and its methods could just be inherited from a byte sequence. It's a near perfect solution - the type system duplication is gone, and unclean data mixing ASCII, bytes and text of unknown encodings could be handled without tripping over the type system and unwanted exceptions all the time. Hell, it even meant C's null terminated strings worked with Unicode, all the existing regex engines continued to work, wchar_t wasn't anywhere near as necessary as it first appeared, and in general C's Unicode handling went from almost non-existent to ok'ish and, with the addition of a Unicode library or two, to "perfectly serviceable". The old man gave us youngin's a lesson in software engineering. Again.
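The property being praised here, that byte-level operations on ASCII delimiters stay safe on UTF-8 data, can be shown with a short sketch. The sample path is an assumption made up for illustration.

```python
text = "caf\u00e9/na\u00efve/r\u00e9sum\u00e9.txt"
encoded = text.encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence has its high bit set, so an
# ASCII delimiter such as "/" can never appear inside an encoded character.
byte_parts = encoded.split(b"/")
text_parts = [part.decode("utf-8") for part in byte_parts]

# No NUL byte appears unless the text itself contains U+0000, which is
# why NUL-terminated C strings keep working.
has_nul = b"\x00" in encoded
```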
Then Python3 was created to "fix" Python2's Unicode problem - and adopted the "two entirely different type systems" approach.
But at its heart, my disappointment in this whole sorry saga isn't about that. We all make wrong technical choices all the time. What really irks me is Python adopted a one off deviation from their standard change control processes (which had worked, and seems to me continue to work very well despite this article's attempts to stir the pot) that let the mistake persist - the Python2 / Python3 transition. Had they done it the normal way, the way being discussed in this article, the entire Python user population would have gone along with them. As the deficiencies came to light they would have squawked long and loudly, and my guess is it would have been fixed. But as it was, we Python2 users had what we thought was an out. (Admittedly we were deluding ourselves, but Python2’s deprecation was so far away it seemed like it was some unknown future self's problem.) For me it was when my innocent looking os.listdir() blew up in my face, and it gradually dawned on me that I had to write everything as b'' if I wanted my program to be reliable. It was so much bloat, and so unreliable, that I stuck with Python2. They fixed os.listdir() of course, and attempted to fix many of the other failings - but at the cost of piling on more and more weird encodings to "unicode", adding more and more complexity, and creating more and more duplication between bytes operations and string operations.
And now it looks like we are stuck with the result.
Python and deprecations redux
Posted Feb 26, 2022 13:37 UTC (Sat) by irvingleonard (guest, #156786) [Link]
I'm not saying that it is perfect, or that it was done right, or that it didn't have politics involved, I'm just saying that from a fresh dev point of view, one that started using python around 2.5-2.6, it felt like a great thing to fix. There's also the "application of things" in different places: every time something new appears there's this group of people that feel that they have to use it. In your example: paths are text, and will always be text, that's the whole point of them, so it makes very little sense to talk about "bytes" in that area, and I'm sure they found a reason to use bytes in path functions but that only shows that those are incomplete; or maybe they wanted to provide a "low level os interface" (and then you should use pathlib for the regular stuff instead), not sure what they were thinking.
Python and deprecations redux
Posted Mar 5, 2022 14:31 UTC (Sat) by nix (subscriber, #2304) [Link]
What? Paths in Unix are a sequence of any bytes at all other than \0 (with / having a constrained meaning as a path separator). They are absolutely not required to be UTF-8 in whole or in part (and in fact can be partly UTF-8 and partly some other encoding in the same path). Any Python program that isn't going to give up and die when faced with a perfectly valid file whose name it doesn't like must deal with this, which means not using Python strings for paths.
Python 3's mistake is that it doesn't acknowledge that what is true of paths is true of *almost all other string-like data ever*. Network traffic, documents, you name it: many of them will be almost entirely UTF-8 except for little bits, rarely encountered, that are not, and you *must handle those little bits too*.
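A sketch of how Python 3 copes with such names may be useful. It uses the surrogateescape error handler from PEP 383, which os.fsdecode() and os.fsencode() rely on for POSIX paths; the file name below is a made-up example.

```python
import os

raw_name = b"caf\xe9.txt"   # Latin-1 bytes, not valid UTF-8

# A strict decode refuses the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate code
# point, so the original bytes can be recovered exactly.
as_text = raw_name.decode("utf-8", "surrogateescape")
round_trip = as_text.encode("utf-8", "surrogateescape")

# Passing a bytes path asks the os module for bytes names in return.
names = os.listdir(os.fsencode(os.curdir))
all_bytes = all(isinstance(name, bytes) for name in names)
```

The resulting str still cannot be printed or encoded strictly, so the undecodable bytes remain the caller's problem; they are just deferred.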
Python and deprecations redux
Posted Feb 9, 2022 12:00 UTC (Wed) by iq-0 (subscriber, #36655) [Link]
Provide a (semi-)standardized way to signal deprecations. Provide a way for application developers to cleanly redirect those warnings and allow easy silencing of these warnings, e.g. using an environment variable, and hint at that in the warning message.
This will initially lead to a number of users being exposed to warnings they don't want, though they should probably care about. They can complain to the maintainers of the tool they're trying to use.
In that case:
a) it's maintained: the maintainer will probably find a way for users to be shielded for these warnings
b) it's not maintained: The user can easily shield themselves from the warnings by silencing them
In either case somebody is explicitly taking responsibility for hiding the deprecations and, in doing so, taking on the burden of any resulting problems from ignoring them.
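Much of this can already be approximated with the standard warnings module. The sketch below is an illustration under that assumption, not a description of what the comment proposes; old_api() is a hypothetical deprecated function.

```python
import warnings


def old_api():
    """Hypothetical deprecated function."""
    warnings.warn(
        "old_api() is deprecated; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,   # attribute the warning to the caller
    )
    return 42


# An application can capture the warnings and redirect them as it likes.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_api()

# Or it can take responsibility for hiding them.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore", DeprecationWarning)
    old_api()
```

End users can get the same silencing without touching code, by setting PYTHONWARNINGS=ignore::DeprecationWarning or passing -W ignore::DeprecationWarning to the interpreter.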