Python structural pattern matching morphs again
A way to specify multiply branched conditionals in the Python language—akin to the C switch statement—has been a longtime feature request. Over the years, various proposals have been mooted, but none has ever crossed the finish line and made it into the language. A highly ambitious proposal that would solve the multi-branch-conditional problem (and quite a bit more) has been discussed—dissected, perhaps—in the Python community over the last six months or so. We have covered some of the discussion in August and September, but the ground has shifted once again so it is time to see where things stand.
It seems quite possible that this could be the last major change that is made to the language—if it is made at all. As with many mature projects, there is a good deal of conservatism that tends to rear its head when big changes are proposed for Python. But this proposal has the backing of project founder (and former benevolent dictator for life) Guido van Rossum and has attracted support from other core developers—as well as opposition from within that group. It may also depend on one's definition of major, of course, but large syntactic and semantic language changes are definitely finding major headwinds in the Python community these days.
Background
The basic idea behind the "structural pattern matching" proposal is fairly straightforward, but there are some rather deep aspects to it as well. Our previous coverage, as well as the various Python Enhancement Proposals (PEPs) surrounding the feature (linked below), will be helpful to readers who want to dig in a ways. For those who just want the high-level introduction, this example taken from PEP 622 ("Structural Pattern Matching") gives much of the flavor of the proposed feature:
    def make_point_3d(pt):
        match pt:
            case (x, y):
                return Point3d(x, y, 0)
            case (x, y, z):
                return Point3d(x, y, z)
            case Point2d(x, y):
                return Point3d(x, y, 0)
            case Point3d(_, _, _):
                return pt
            case _:
                raise TypeError("not a point we support")
The make_point_3d() function uses the proposed match statement to extract the relevant information from its pt argument, which may be passed as a two-tuple, three-tuple, Point2d, or Point3d. The x, y, and z (if present) are matched in the object passed and assigned to those variables, which are then used to create a Point3d with the right values. The use of "_" as a wildcard is consistent with other languages that have similar constructs, and is even used in a similar fashion as a convention in Python, but is perhaps one of the more contentious parts of the proposal. The final case matches anything at all that has not been matched by an earlier case.
If you squint at that example, it looks ... Python-ish, perhaps. But the case entries have some substantial differences from the existing language. In particular, constructs like Point2d(x, y) do not instantiate a Point2d object, but test if the match argument matches that type. If so, x and y are not looked up in the local scope, but are, instead, assigned to. It is different enough from the usual way of reading Python code that some have called it a domain-specific language inside Python for matching, which is seen (by some) as something to be avoided.
Another contentious part of the proposal is the handling of names, which are always treated as variables that get filled in from the match (called "capture variables"), as opposed to looking the name up and using its current value as a constant to be matched. That does not sit well with some, who mainly think that the capture variables should be indicated with some kind of sigil (e.g. ?var); other uses of names should conform to Python's usual practice. But the long list of authors for PEP 622 unanimously agreed that the common capturing case should not be made "ugly" for consistency with other parts of Python. Part of the reasoning is that other languages which have the feature also default to capture variables for unadorned names.
But programmers will want to be able to use constants in their case entries. The first version of PEP 622 required a sigil in the form of a dot prepended to names that should be used as constants (e.g. .CONSTANT), but that was not wildly popular—to put it mildly. Round two of that PEP switched to requiring constants to be in a namespace, which might be seen as something of a cop-out, since that effectively still requires the dot (e.g. namespace.constant).
Three new PEPs
When last we left the saga, PEP 622 was being handed off to the Python steering council for consideration. The council members discussed the PEP among themselves as well as with the PEP's authors. The result of that was announced by one of those authors, Van Rossum, toward the end of October. It turned out that "there were a lot of problems with the text" of PEP 622, so the authors abandoned it in favor of three new PEPs:
- PEP 634: "Structural Pattern Matching: Specification"
- PEP 635: "Structural Pattern Matching: Motivation and Rationale"
- PEP 636: "Structural Pattern Matching: Tutorial"
Make that four
A few days before Van Rossum's announcement, steering council member Thomas Wouters posted a PEP addressing the use of "_": PEP 640 ("Unused variable syntax"). It would create a new unused variable that can be assigned to, though the binding (or assignment) is not actually performed and that variable cannot be used in any other way. The PEP proposes to use "?" as that variable.
Currently, some Python code conventionally uses "_" for unused variables, though that name has no special treatment in the language. In particular, the "unused" value does get bound to the name "_". It is often used as follows:
    x, _, z = (2, 3, 4)  # x=2, z=4 (but _=3 as well)
    for _ in range(10):
        do_something()
    # _=9 here
Using "unused", "dummy", or other regular names is possible too, of course. The problem that Wouters (and others) see is that the structural pattern matching proposal gives an additional meaning to "_", but does not extend it to the rest of the language. It is this inconsistency that led to the PEP:
Introducing ``?`` as special syntax for unused variables *both inside and outside pattern matching* allows us to retain that consistency. It avoids the conflict with internationalization *or any other uses of _ as a variable*. It makes unpacking assignment align more closely with pattern matching, making it easier to explain pattern matching as an extension of unpacking assignment.
There is one other oddity with "_": it has ... interesting ... behavior in the Python read-eval-print loop (REPL), where "_" is normally assigned to the value of the last-executed expression.
    >>> 2+2
    4
    >>> _
    4

If any of that is done in the REPL after the user explicitly assigns to "_", though, it always holds the last value that was assigned. So there is a fair amount of established usage of "_" that PEP 640 is trying to sidestep.
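The mechanism behind that behavior can be sketched outside the REPL. The default display hook stores each displayed value in the builtins module under the name "_", while an explicit assignment at the prompt creates a global that shadows it. The namespace dictionary below is a hypothetical stand-in for the REPL's global namespace.

```python
import builtins
import sys

# The interpreter's original display hook prints the value and
# stores it as builtins._ (this is what the REPL does for "2+2").
sys.__displayhook__(2 + 2)

# Stand-in for the REPL's globals; name lookup falls back to builtins.
namespace = {"__builtins__": builtins}
print(eval("_", namespace))   # found in builtins

# An explicit assignment creates a global "_" that shadows builtins._ ...
exec("_ = 'mine'", namespace)

# ... so later displayed values no longer show up under "_".
sys.__displayhook__(10)
print(eval("_", namespace))   # still the explicitly assigned value
```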
In Wouters's posting, he noted that adding "?" as the unused variable had benefits entirely independent of the pattern matching proposal, but he believes they are too small if PEP 634 is not adopted. So he thinks that PEP 640 should be rejected in that case. The reaction to the PEP was generally somewhat negative, though there was not a lot of discussion of the PEP itself in that thread. The main objection is that debugging uses of the unused variable when its value cannot be queried will be difficult.
Or five
Van Rossum's announcement of the three PEPs was also met with a fairly abbreviated thread (at least by the standards set in earlier rounds) that mostly consisted of tangential discussions on various pieces. But, as he was with PEP 622, Mark Shannon is not convinced that this form of pattern matching is needed at all in the language. He argued that it is a bad fit for a dynamically-typed procedural language like Python and that PEP 635 fails to offer a convincing case for the value of the feature (though the arguments have improved since PEP 622, he said).
Shannon had a number of specific areas where he believes that the proposal falls short, which were mostly met with disagreement, but Nick Coghlan noted that he shared some of Shannon's concerns. In fact, Coghlan had just posted an announcement of PEP 642 ("Constraint Pattern Syntax for Structural Pattern Matching") addressing some of those problems. His idea is that the existing assignment syntax can be tweaked slightly to accommodate pattern matching, while retaining the possibility that it could be used elsewhere in the language down the road.
In the original version of the PEP, Coghlan combines literal and value (e.g. namespace.constant) patterns from PEP 634 into "constraint patterns". These constraint patterns can be tested either for equality or identity in a case. He used "?" as a prefix for equality and "?is" for identity and replaced the non-binding "_" wildcard with "?". The end result is that names are looked up and literals used if they are marked with "?"; literals that are not marked would raise a SyntaxError. It would look something like:
    MISSING=404

    match foo:
        case ?0:
            print('foo equals zero')
        case ?is None:
            print('foo is None')
        case ?MISSING:
            print('foo not found (404)')
        case (a, b):
            print(f'foo is a two-tuple: {a} {b}')
        case _:  # still works, _ is just a normal capture variable
            print('foo is something wildly unexpected')
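Since the "?" syntax was only ever a proposal, it cannot be run anywhere, but its intended meaning can be approximated with an ordinary if/elif chain. This is a sketch of the semantics described above, not code from the PEP. The describe() function is a hypothetical name, and the sequence test follows the rule in PEP 634 (any sequence other than str, bytes, or bytearray), which is assumed here to carry over to PEP 642.

```python
from collections.abc import Sequence

MISSING = 404


def describe(foo):
    if foo == 0:                        # case ?0 (equality constraint)
        return "foo equals zero"
    elif foo is None:                   # case ?is None (identity constraint)
        return "foo is None"
    elif foo == MISSING:                # case ?MISSING (name is looked up)
        return "foo not found (404)"
    elif (isinstance(foo, Sequence)
          and not isinstance(foo, (str, bytes, bytearray))
          and len(foo) == 2):           # case (a, b) (names are captured)
        a, b = foo
        return f"foo is a two-tuple: {a} {b}"
    else:                               # case _
        return "foo is something wildly unexpected"


for value in (0, None, 404, (1, 2), "ab"):
    print(describe(value))
```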
Steven D'Aprano did not like the PEP, but he had several suggestions, some of which were subsequently adopted by Coghlan. In particular, he dropped the need to have equality markers for literal values and switched away from using "?" entirely. Literal patterns are simply "case 0:", equality uses "==", and identity uses "is". D'Aprano also suggested that the problem with "_" in match is overblown:
Wouters sees things differently, however:
[...] The use of something else, like '?', leaves existing uses of '_' unambiguous, and allows structured pattern matching and iterable unpacking to be thought of the same. It reduces the complexity of the language because it no longer uses the same syntax for disparate things.
Tobias Kohn, one of the PEP 622 authors and co-author of PEP 635 with Van Rossum, noted that the idea of "load sigils" had been discussed and, in fact, the authors had settled on dot (".") for that case, but it proved to be unpopular. Kohn said that there is nothing in the current structural pattern matching proposal that precludes adding, say, "?" as a load sigil in the future. But he thinks those kinds of things can wait:
Deciding
While there are five PEPs floating around, two of them are informational in nature (635 and 636), so the steering council needs to decide if it will accept PEP 634 and add structural pattern matching to the language. It also needs to decide whether to augment or modify the feature with either PEP 640 to add "?" as an unused variable or PEP 642 to add constraint patterns and, effectively, load sigils. It could choose to adopt all three since Coghlan had switched PEP 642 to use "__" (double underscore) as its wildcard matching variable.
It is a complicated set of questions; if anything is adopted, it seems likely to have a significant impact on the language for a long time to come. The 2020 steering council will not be making the decision, however. The election for the 2021 steering council is currently underway; it completes on December 16. As reported by Wouters in early November, the current council will make a strong recommendation on the PEPs to the incoming council, which will make the final determination. There is no huge rush since the schedule for Python 3.10 shows the first beta, which is also the feature-freeze date, in early May 2021.
As part of the effort to make that recommendation, steering council member Brett Cannon posted a poll to the Python Discourse instance. He posted to the "Committers" category, where only core developers can comment and answer the poll. There were five options, one rejecting pattern matching entirely, three accepting PEP 634 with and without the other PEPs, and one for those who want pattern matching but not as defined in any of the PEPs.
When the voting closed on November 23, the clear split among core developers was evident. Half of the 34 voters wanted to accept PEP 634 in some form, while 44% (15 voters) did not want pattern matching at all and two voters (6%) wanted pattern matching but not as proposed. The poll is not binding in any way, of course, but it is indicative of the fault lines in the community with regard to the feature. Whichever way the council decides, it is likely to leave a sizable contingent unhappy.
Several commented in the poll thread about why they were voting one way or another; those in favor tended to see ways they could use the feature in their own code and were not overly bothered by any perceived inconsistencies. For the "no pattern matching" folks, Larry Hastings may have spoken for many of them when he said:
I can see how the PEP authors arrived at this approach, and I believe them when they say they thought long and hard about it and they really think this is the best solution. Therefore, since I dislike this approach so much, I’m pessimistic that anybody could come up with a syntax for pattern matching in Python that I would like. That’s why I voted for I don’t want pattern matching rather than I want pattern matching, but not as defined in those PEPs. It’s not that I’m against the whole concept of pattern matching, but I now believe it’s impossible to add it to Python today in a way that I would want.
There is a great deal more discussion in the python-dev mailing list for those who might want to dig in further. Coghlan's post of version two of PEP 642 and a suggestion by David Mertz to use words rather than sigils both led to interesting discussions. Paul Sokolovsky pointed participants to a recent academic paper [PDF] written by the authors of PEP 622 about pattern matching for Python; the paper sparked some discussion. Shannon also posted about some work he has been doing to define the precise semantics of pattern matching, which is something that is currently lacking. And so on.
It is, in short, one of the most heavily discussed Python features of all time. It seems likely that it even surpasses the discussion in the "PEP 572 mess", which brought the walrus operator (":=") to Python, but also led to Van Rossum's retirement. But maybe it only seems as large. In any case, the soon-to-be-elected steering council is in something of an unenviable position, but it seems clear that the question of this style of pattern matching for Python will finally be laid to rest early in 2021—one way or the other.
Python structural pattern matching morphs again
Posted Dec 3, 2020 1:58 UTC (Thu) by logang (subscriber, #127618) [Link]
> Explicit is better than implicit.
> Simple is better than complex.
> Flat is better than nested.
> Readability counts.
> Special cases aren't special enough to break the rules.
> There should be one-- and preferably only one --obvious way to do it.
> Although never is often better than *right* now.
> If the implementation is hard to explain, it's a bad idea.
Not to mention python tends to prefer duck typing a lot and most of the examples are counter to that.
If accepted, I'd almost certainly ask my colleagues not to use it and suggest traditional methods during review.
Python structural pattern matching morphs again
Posted Dec 3, 2020 9:44 UTC (Thu) by lxsameer (guest, #65438) [Link]
To me, it seems that because pattern matching and type systems have been a trend in the past few years, they want them in Python; otherwise it violates Python's Zen and would be a mess.
Python structural pattern matching morphs again
Posted Dec 3, 2020 10:41 UTC (Thu) by rsidd (subscriber, #2582) [Link]
Python structural pattern matching morphs again
Posted Dec 3, 2020 17:32 UTC (Thu) by nybble41 (subscriber, #55106) [Link]
Inferred static types are very different from dynamic types, even if they appear superficially similar at the source level. Even with polymorphism, the type of the value being matched must be known in advance for each particular instance. The compiler can use this information to generate jump tables rather than long if/else chains and to statically test for exhaustiveness in the match patterns. The simplest pattern matches (e.g. destructuring a product type) can be optimized out altogether.
You can get approximately the same behavior with runtime tests in dynamic languages, but the performance cost is considerable.
Python structural pattern matching morphs again
Posted Dec 3, 2020 17:12 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
> Not to mention python tends to prefer duck typing a lot and most of the examples are counter to that.
Agreed. If it allowed something like this, I think it'd be way more powerful:
    match ptlike:
        case c(x=x, y=y):
            return Point3d(x, y, 0)
to match on *any* class with an x and y attribute (rather than `hasattr` checks or an `AttributeError` catch)? But maybe the AttributeError is just more Pythonic anyways… This makes me think…is the "in front of parens" token *always* looked up in the scope and not allowed to be a captured variable? If so, it's kind of odd that even it gets to be special too…
Python structural pattern matching morphs again
Posted Dec 3, 2020 17:29 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
IMHO this whole thing would make a lot more sense if it *just* supported the regular unpacking syntax a la PEP 448, and (maybe) a very conservative extension for matching simple enum constants. Everything else is just going to require too many weird special cases to get right.
Python structural pattern matching morphs again
Posted Dec 3, 2020 17:55 UTC (Thu) by ms (subscriber, #41272) [Link]
Haskell has n+k patterns (albeit behind an option - they're off by default), though one of the args must be a literal constant. But, err, I suspect Prolog and the halting-problem in general is where it would end if you took this to its logical conclusion.
Python structural pattern matching morphs again
Posted Dec 10, 2020 17:30 UTC (Thu) by papik (guest, #15175) [Link]
I was thinking... Python already has an assign-ish keyword "as". Maybe it is more pythonic something like:
    match ptlike:
        as Point2d(x, y):
            return Point3d(x, y, 0)
But in my opinion there are too many colons...
Python structural pattern matching morphs again
Posted Dec 22, 2020 10:27 UTC (Tue) by intgr (subscriber, #39733) [Link]
I've seen lots of backpedalling on that approach. It's been a slow transition, but starting with the introduction of Abstract Base Classes (ABC), and more importantly, type hints, the whole community has become more accepting of nominative typing.
Data types that used to be hidden as internal implementation details are more frequently being exposed as public types, etc.
It would be possible to achieve "duck typing" type hints with protocols (structural typing) as well, but ironically the current implementation in Python is problematic.
Python structural pattern matching morphs again
Posted Dec 3, 2020 9:17 UTC (Thu) by rsidd (subscriber, #2582) [Link]
This is not akin to the C switch statement as much as to the match construct in the ML family (eg ocaml). Immensely useful. There is something similar in Haskell, and it is implemented as a macro in Julia.
Python structural pattern matching morphs again
Posted Dec 3, 2020 18:20 UTC (Thu) by dbaker (guest, #89236) [Link]
I write python pretty much all day. I'm excited to have this. Maybe I'm the only one.
Python structural pattern matching morphs again
Posted Dec 4, 2020 9:12 UTC (Fri) by jezuch (subscriber, #52988) [Link]
Mind you, exactly the same thing is happening in Java right now, where they are trying to extend the switch statement to support pattern matching. The issues are also almost identical. The difference is that in Java they envision it as part of a bigger picture - not just pattern matching, but turning the entire type system around to be more like algebraic types. (See for example the recent addition of Record types.) For this reason this is not a simple task and the work has been going on for a couple of years now. I like to read the discussions on their mailing list and, sorry to say that, the way it happens in Python looks terribly amateurish in comparison :) I'm actually in awe of the projects Valhalla and Amber members.
Python structural pattern matching morphs again
Posted Dec 4, 2020 21:42 UTC (Fri) by Polynka (guest, #129183) [Link]
I cannot believe that somebody wrote those two words, by each other, unironically.
> rust and zig have pattern matching
Oh, sure, those languages are fashionable right now, but in maybe, 15 years, they and their features will probably go out of fashion and style – just like, 15 years ago every language had to be object-oriented and have exceptions as the preferred method of the error handling, but apparently these things are now Evil, and new languages like Holy Rust do not have them or have them in a greatly reduced form – now procedural programming and result types are The Way.
But Python is a language that’s by now firmly established as one of the “major languages”, and we will probably still use it widely in 15 years (unless it goes the way of Perl/Raku). There’s no real reason to force some feature on it because it’s now fashionable, but when implemented badly, it could be an ugly legacy wart in the near future.
Python structural pattern matching morphs again
Posted Dec 7, 2020 17:51 UTC (Mon) by dbaker (guest, #89236) [Link]
I actually was trying to be a little snarky, I tend to think that Rust and Zig have some nice features, but also some really bad ones :)
Python structural pattern matching morphs again
Posted Dec 10, 2020 12:26 UTC (Thu) by HelloWorld (guest, #56129) [Link]
Python structural pattern matching morphs again
Posted Dec 22, 2020 10:08 UTC (Tue) by intgr (subscriber, #39733) [Link]
Suggesting Rust, on the other hand, requires learning a new paradigm that affects pretty much every line of code. As good as that paradigm may be, it's a hard sell if the old paradigm works "well enough".
Python structural pattern matching morphs again
Posted Dec 22, 2020 17:16 UTC (Tue) by mathstuf (subscriber, #69389) [Link]
This system has been running for over 4 years now, gained support for non-`master` default branches in July, updated its GraphQL schema usage in June, added a couple of small features this past spring, a few trickling in over time (as they do), and hasn't really had an "oh shit" moment in over a year (I've stopped scrolling the history; there have been 2 that I can remember that weren't "gitlab or github changed something on us" (those are just error logs, not crashes) and they were logic errors possible in any language). It's also performant enough that we just deploy the debug build so the backtraces are useful (when needed).
If you're crafting your own data structures, yes, Rust is going to be a tough sell over more convenient languages (though I argue that it is still worth it). But if you're writing code that needs to be correct in production, performance is of at least some note, and where threading can really help out there, Rust is definitely the top of my list for what to implement it in.
Python structural pattern matching morphs again
Posted Dec 22, 2020 17:27 UTC (Tue) by mpr22 (subscriber, #60784) [Link]
This is really cool, and I can't think of much higher praise for a language's performance :)
Python structural pattern matching morphs again
Posted Dec 22, 2020 21:43 UTC (Tue) by mathstuf (subscriber, #69389) [Link]
I've talked about the project before on here, but it all lives here for anyone curious:
https://gitlab.kitware.com/utils
All of the "rust" and "ghostflow" repositories; the "source-formatters" repo is relevant in that it's something we point our configurations at to do formatting verification and fixing. "git-import-third-party" is relevant for anyone wanting to use the "third party" check to make sure vendored code is not modified outside of dedicated patch tracking mechanisms.
Python structural pattern matching morphs again
Posted Dec 26, 2020 19:55 UTC (Sat) by jezuch (subscriber, #52988) [Link]
To be fair, this is inferred most of the time (just like types and some other things); it's only needed (AFAICT, possibly wrongly) when you return a reference from a function which might also be one of the arguments. At least this was the moment it "clicked" for me and I learned to stop wondering WTH I need those annotations for and love lifetimes.
And besides, I think that many experienced programmers already know this "paradigm" without knowing it, so to speak.
Python structural pattern matching morphs again
Posted Dec 10, 2020 12:32 UTC (Thu) by HelloWorld (guest, #56129) [Link]
Python structural pattern matching morphs again
Posted Dec 5, 2020 21:35 UTC (Sat) by iustin (subscriber, #102433) [Link]
As a Haskell programmer in my free time who writes mostly Python at work, I'm very excited by this, and after all the typing annotations added to Python (which I just love), I'm quite surprised that a simple pattern matching results in "I don't want it in _my_ Python".
<snip lots of explanations because it won't convince people who made up their mind>
… but if Python doesn't add it, in one form or another, I think it's a loss for the Python language. Not for the community, since yes, it would raise the complexity of the language a tiny, tiny bit.
Python structural pattern matching morphs again
Posted Dec 6, 2020 19:59 UTC (Sun) by ehiggs (subscriber, #90713) [Link]
Python structural pattern matching morphs again
Posted Dec 6, 2020 20:47 UTC (Sun) by iustin (subscriber, #102433) [Link]
At the very least, it gives a more succinct view on what the author of the code thought/how they saw the code flow should happen, (the same way as type annotations show what they thought of the involved types), so in my book they're very useful.
Python structural pattern matching morphs again
Posted Dec 16, 2020 13:11 UTC (Wed) by smitty_one_each (subscriber, #28989) [Link]
This seems a trend in programming languages away from procedural and into more abstract mathematical realms.
The genius of Python is in striking a balance between "featuritis" and adding stuff for stuff's sake to the language and really hashing out the details that are going to matter in the long run.
Structural pattern matching seems an esoteric tool that one will gladly reach for when needed but mostly ignore in casual work.