"Structural pattern matching" for Python, part 2
This article brought to you by LWN subscribers Subscribers to LWN.net made this article — and everything that surrounds it — possible. If you appreciate our content, please buy a subscription and make the next set of articles possible. |
We left the saga of PEP 622
("Structural Pattern Matching
") at the end of June, but the
discussion of
a Python "match" statement—superficially similar to a C
switch but with extra data-matching features—continued. At this
point, the next steps are up to the Python steering
council, which will determine the fate of the PEP. But there is lots
of discussion to catch up on from the last two months or so.
As a quick review, the match statement is meant to choose a particular code block to execute from multiple options based on conditions specified for each of the separate case entries. The conditions can be simple, as in:
match x: case 1: print('1') case 2: print('2')That simply compares the value of x to the numeric constants, but match can do far more than that. It is a kind of generalized pattern matching that can instantiate variables in the case condition when it gets matched, as the following example from the PEP shows:
# The target is an (x, y) tuple match point: case (0, 0): print("Origin") case (0, y): print(f"Y={y}") case (x, 0): print(f"X={x}") case (x, y): print(f"X={x}, Y={y}") case _: raise ValueError("Not a point")
Anti-PEP
On the other hand, there are some cognitive hurdles that folks will need to clear in order to understand this new syntax—and that's what is driving many of the complaints about the feature. Within a day after Guido van Rossum posted the first version of the PEP, he asked that commenters refrain from piling onto the huge thread so that the authors could have some time to hash things out. For the most part, that is what happened, but there was some concern that perhaps the opposition's arguments against the PEP would not get a full airing in the PEP itself. Mark Shannon, who had earlier questioned the need for the feature, proposed an "Anti-PEP" as a mechanism for opponents to marshal their arguments:
Whether the ultimate decision is made by the steering committee or by a PEP delegate, it is hard to make a decision between the pros and cons, when the pros are in a single formal document and the cons are scattered across the internet.
An Anti-PEP is a way to ensure that those opposed to a PEP can be heard and, if possible, have a coherent voice. Hopefully, it would also make things a lot less stressful for PEP authors.
Notably, an Anti-PEP would not propose an alternative, it would just collect the arguments against a proposed PEP. But Brett Cannon (and others) thought that there is another mechanism to be used:
Most other commenters agreed that the "Rejected Ideas" section (or
perhaps a new "Objections" section) is the right place to record such
arguments, though Raymond
Hettinger agreed
with Shannon about the need for a more formalized opposition document:
"The current process doesn't make it
likely that a balanced document is created for decision making
purposes.
" They were in the minority, however, so a formal anti-PEP
process seems improbable.
On July 1, Van Rossum
announced
a match statement "playground" that can be used
to test the proposed feature. It is a Binder
instance that runs a Jupyter Python
kernel that has been modified with a "complete implementation of the
PEP
". But the playground and other aspects of the process made
Rob Cliffe uneasy; he was concerned that the PEP was being
"railroaded through
".
Cliffe was under the impression that PEPs are never implemented until they have been accepted, but several people in the thread pointed out that is not the case. Chris Angelico said:
Round 2
The second version of the PEP was announced
on July 8. Van Rossum noted that the __match__() protocol
(for customizing a class's matching behavior)
had been removed at the behest of Daniel
Moisset, who was added as the sixth author of the PEP (with Van Rossum,
Brandt Bucher, Tobias Kohn, Ivan Levkivskyi, and Talin). The protocol is
not required for the feature; "postponing it will allow us to design it at a later
time when we have more experience with how `match` is being used
".
Moisset was added in part for his contribution of new text that "introduces the subject
matter much more gently than the
first version did
"
The other big change was to drop the leading dot for constants (e.g. .CONST) in the case statements. The problem addressed by that feature is that identifier strings in the patterns to be matched can either be a variable to be stored into if there is a match or a constant value to be looked up in order to match it (sometimes called "load-and-compare" values); Python cannot determine which it is without some kind of syntactical element or convention (e.g. all uppercase identifiers are constants). Consider the following example from the announcement:
USE_POLAR = "polar" USE_RECT = "rect"[...]
match t: case (USE_RECT, real, imag): return complex(real, imag) case (USE_POLAR, r, phi): return complex(r * cos(phi), r * sin(phi))
Python cannot distinguish the USE_RECT constant, which should cause
it to only match tuples with "rect" as the first element, from the
real and imag variables that should be filled in with the
values from the match. Since the original choice of prepending a dot to
the constants was quite unpopular, that was removed. It is a problem that
other languages with this kind of match have struggled with and
the PEP authors have as well; it was the
first issue in their bug tracker. Adding a sigil for the to-be-stored variables,
as has been suggested (e.g. ?real) "makes this common case ugly
and inconsistent
". In the end, they have decided to only allow
constants that come from a namespace:
If that part of the proposal is a "deal-breaker
", Van Rossum
said, then, when any
other problems have been resolved, that decision could be reconsidered. He
also outlined the other outstanding items, with the authors' position on
them:
Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.
That post predictably set off yet another mega-thread on various aspects of the new syntax. The alignment of else (if added) was one such topic; Stefano Borini suggested that the code to be executed should not be two levels of indentation in from the match, but be more like if statements. Glenn Linderman took that further:
match: t case ("rect", real, imag): return complex(real, imag) case ("polar", r, phi): return complex( r* cos(phi), r*sin(phi) else: return None
He compared that syntax favorably with that of try/except/else blocks, in addition to it resolving the else-alignment question. The PEP authors had addressed the idea, noting that there are two possibilities: either the match expression is in its own single-statement block (as Linderman has it), which is unique in the language, or the match and its expression are introducing a block with a colon, but that block is not indented like every other block after a colon in Python.
match: expression case ... # or match expression: case ...Either would violate a longstanding expectation in Python, so the authors rejected both possibilities.
Larry Hastings wondered about the special treatment being given to the "_" wildcard match. That symbol acts like a regular identifier, except in case statements, where it does not get bound (assigned to) for a match; it can also be used more than once in a case, which is not allowed for other match variables:
match x: case (_, _): # match any tuple print(_) # _ will not have a value case (x, x): # ILLEGAL
Hastings argued that if the same variable can be used more than once and that underscore does get bound, the special case disappears. The cost is an extra store for the binding, which could be optimized away as a dead store if that was deemed important. Moisset pointed out a few technical hurdles, and Van Rossum added more, but also thought it really did not make sense to do the binding for a value that is being explicitly described as a "don't care":
The need for a wildcard pattern has already been explained -- we really want to disallow `Point(x, y, y)` but we really need to allow `Point(z, _, _)`. Generating code to assign the value to `_` seems odd given the clear intent to *ignore* the value.
Compelling case?
As he was with the first version, Shannon is far from convinced that Python
needs match and that the PEP actually lays out a compelling case
for it. He returned to that question in a mid-July
post that pointed out several flaws that he saw in the justification
for the feature.
In general, the examples in the PEP do not show any real-world uses of
the new syntax that he finds compelling.
As he said
in a followup message: "I worry that the PEP is treating pattern matching as an ideal which we
should be striving towards. That is a bad thing, IMO.
"
Others disagreed with that view, however. Kohn, one of the PEP authors who also participated in that short thread, started his own short thread with a different way to look at the feature. It is not, he said, adding a switch statement to Python; instead, it is providing a mechanism for doing function overloading in various guises:
Kohn went through a few detailed examples of function overloading and the visitor design
pattern, showing how the proposed match feature would bring
multiple benefits to the code. It is worth a read, especially for those
who may not fully see the scope of what the feature can do. On the
flipside, though, Shannon posted
a link to his lengthy
deconstruction of PEP 622, which was met with skepticism—at best.
Several responders felt that it was simply a rehash of various arguments
that had already been made. Van Rossum effectively dismissed
it entirely:
"[...] it just repeats Mark's own arguments, which are exclusively focused on the
examples in the PEP (it's as if Mark read nothing *but* the
examples)
".
But Shannon (and others) pointed out his analysis of code in the standard library, which is linked in the critique. He concluded that in the roughly 600,000 lines of Python code in CPython, there were only three or four examples of places where the match statement made things more clear. While Stephen J. Turnbull is in favor of the PEP, he agreed that Shannon's analysis of the standard library was useful. Unlike Shannon, Turnbull thought that most of the pattern-matching rewrites were more clear, but there were not many of them; however, there is still a missing piece:
To the steering council
At this point, the PEP is in the hands of the steering council, which could approve it as written, ask for changes, or reject it entirely. One suspects that an outright rejection would likely kill the idea forever, though some pieces of it could perhaps survive in other guises; Shannon had some suggestions along those lines.
If modification is requested, else seems like a logical addition, though there is no real consensus on how it should be aligned, either within the author group or the Python community at large. The requirement that constants be referred to via a namespace (e.g. color.RED) is another area that could draw attention from the council. Doing things that way requires that match variables not be referred to via a namespace, which has a few downsides, including the inability to assign to self.x in a match pattern. While it was deemed "ugly" (at best), the original idea of requiring constants to have a dot prepended to them (e.g. .RED) would at least remove that restriction on match variables.
The council is an unenviable position here; one suspects that Van Rossum knows how the members feel after the "PEP 572 mess" that led to his resignation as benevolent dictator for life (BDFL)—thus to the creation of the council. Contentious decisions are draining and the aftermath can be painful as well. In this case, the opposition does not seem as strong as it was to the "walrus operator" from PEP 572, but the change here is far more fundamental—and the effects are far-reaching.
The council has not really tipped its hand in the multiple, typically long, threads discussing the feature—most of its members have not participated at all. Clearly a lot of work has gone into the PEP, which may make it somewhat harder to reject outright. The opinion of the language's inventor and former BDFL also likely helps tip the scales somewhat in the PEP's favor; most of the community seemed to like the idea overall, while there were (lots of) quibbles about details. It would not be surprising to find that Python 3.10 (slated for October 2021) shows up with a match statement.
Index entries for this article | |
---|---|
Python | Enhancements |
Python | match statement |
Python | Python Enhancement Proposals (PEP)/PEP 622 |
(Log in to post comments)
"Structural pattern matching" for Python, part 2
Posted Sep 1, 2020 17:57 UTC (Tue) by martin.pitt (subscriber, #26246) [Link]
"Structural pattern matching" for Python, part 2
Posted Sep 1, 2020 19:25 UTC (Tue) by logang (subscriber, #127618) [Link]
I personally find the syntax incredibly confusing and overly magic in any language -- so many rust examples that I've seen just leave me scratching my head.
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 2:38 UTC (Wed) by atnot (subscriber, #124910) [Link]
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 6:01 UTC (Wed) by NYKevin (subscriber, #129325) [Link]
Perhaps the phrase "tagged unions" means something different to people who write Rust, but to my mind, PyObject* is a tagged union (and thus, so is every Python object). Am I misinterpreting you?
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 7:24 UTC (Wed) by ehiggs (subscriber, #90713) [Link]
Since so many things can be a PyObject, this isn't feasible unless you just had a fallthrough case (in Rust this uses _). The problem then is that you will never catch bugs when extending the code because you will happily use the fall through case when a new field in the enum is added.
"Structural pattern matching" for Python, part 2
Posted Sep 13, 2020 11:11 UTC (Sun) by plugwash (subscriber, #29694) [Link]
I would argue it's more akin to a base class than to a tagged union with the "ob_type" field serving a role similiar to a vptr.
To me at least "tagged union" implies that the list of subtypes is determined in advance and sufficient memory is allocated to store any of the subtypes.
"Structural pattern matching" for Python, part 2
Posted Sep 3, 2020 16:38 UTC (Thu) by jcpunk (subscriber, #95796) [Link]
I agree. I've yet to be convinced this is actually any better thanif XXX: print(x) elseif YYY: print(y) elseif ZZZ: print(z) else: raise ValueError()
"Structural pattern matching" for Python, part 2
Posted Sep 1, 2020 20:26 UTC (Tue) by NYKevin (subscriber, #129325) [Link]
But maybe I just haven't studied it enough.
Zen of Python
Posted Sep 2, 2020 8:28 UTC (Wed) by tnoo (subscriber, #20427) [Link]
For my feeling this proposal violates almost all Zen mantras.
(maybe unless you are Dutch --wink--)
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 13:00 UTC (Wed) by tnemeth (subscriber, #37648) [Link]
I'm sorry to disagree here but IHMO the "default:" case can need to access the `_` value, for error reporting...
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 14:08 UTC (Wed) by NAR (subscriber, #1313) [Link]
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 14:19 UTC (Wed) by tnemeth (subscriber, #37648) [Link]
case x:
print("error: ", str(x))
an not use the `_` (which isn't meant to be used).
"Structural pattern matching" for Python, part 2
Posted Sep 3, 2020 13:22 UTC (Thu) by hartb (subscriber, #96640) [Link]
I guess you still have the object-to-be-matched handy:match t: ... case _: print("error: ", str(t))
"Structural pattern matching" for Python, part 2
Posted Sep 4, 2020 16:04 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
I was worried about the same thing too, but the walrus operator saves you in this related case as well:match t := some_expr(): case ...: case _: # use tThough I don't like the locality mismatch here. For a large (probably considered un-Pythonic) match, your variable assignment is far away when it could be right next to it. You also have to consider that `t` is now always assigned to rather than just in the catch-all case.
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 13:18 UTC (Wed) by caliloo (subscriber, #50055) [Link]
For me this tastes of feature FOMO more than design opinion, which is a red flag. The thing about variable match or assign in the brackets make my hair stand up from a language design stand point. If that can be alleviated, then for me it’s fine...
"Structural pattern matching" for Python, part 2
Posted Sep 6, 2020 21:18 UTC (Sun) by smitty_one_each (subscriber, #28989) [Link]
It does seem that, with the newer release cycle, Python is going down the Java "featuritis" road.
On the other hand, this seems to take Python more in a Haskell-ish direction.
Good news? Bad news? Who can say?
On the other hand, the languages that fail to add new features seem to be the ones nobody's using any longer. It's more of an economic than technical arms race, apparently.
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 19:25 UTC (Wed) by mb (subscriber, #50428) [Link]
we chose a simpler rule: named constants are only recognized when referenced via some namespace, such as `mod.var` or `Color.RED`Oh, come on...
That is even more awful than using cryptic special characters like dots.
That means whether a variable is read from or written to depends on how I "fetch" a variable name.
The equal character would IMO be the obvious and intuitive character to mark that something is going to be assigned to.
match t: case (USE_RECT, real=, imag=): return complex(real, imag) case (USE_POLAR, r=, phi=): return complex(r * cos(phi), r * sin(phi))
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 21:25 UTC (Wed) by nybble41 (subscriber, #55106) [Link]
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 21:32 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
Changing variable scope to be anything other than the parent function/class/module declaration would be a massive change. Anything digging around in frame objects would need to learn to poke wherever these new scopes end up being declared. You'd basically need to add variable declarations too since this fairly common pattern would be broken:if foo is not None: local_foo = foo else: local_foo = DefaultValue()(Sure, here `foo` could probably be reused, but code like this still exists.)
"Structural pattern matching" for Python, part 2
Posted Sep 2, 2020 23:24 UTC (Wed) by nybble41 (subscriber, #55106) [Link]
Yes, I anticipated that. I think it would be worth it, since the only way to get reasonable scopes in current Python involves nested functions or lambdas, but in the end that's not for me to decide.
> Anything digging around in frame objects would need to learn to poke wherever these new scopes end up being declared.
Making it harder to mess around with frame objects seems like a feature, not a bug. Making them completely inaccessible would be even better. (And, apart from debugging, how would you even use such a thing without creating a very tight coupling with the function's internal implementation? I'm not seeing any realistic scenario where this code would not need to be updated anyway.)
> You'd basically need to add variable declarations too…
I think that could be avoided by simply defaulting to function scope as current Python does when assigning to an unknown variable. Nested block scopes would be introduced by specific statements where binding makes sense such as loop variables in a "for" statements, "as" clauses in "with" statements (as per the original version of PEP 343[1]), or placeholders in "match" statements. A "local" keyword could be added to explicitly introduce scoped variables into the current block, similar to the "global" and "nonlocal" keywords, if there is a demand for it.
"Structural pattern matching" for Python, part 2
Posted Sep 3, 2020 7:06 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
How exactly would you define "reasonable?"
IMHO, the Python rule is very straightforward and easy to remember. Moreover, I've never run into a situation where it actually causes a problem of some sort. Sure, variables hang around for a little longer than they would otherwise, but so what? If you really care, you can explicitly write del x, which is probably a Good Idea anyway, because if you care, then you should call attention to the fact that you care (not-caring is the default state of affairs, so x must be special in some way).
> Making it harder to mess around with frame objects seems like a feature, not a bug. Making them completely inaccessible would be even better.
And while we're at it, let's put the e in creat(2), where it should've been all along.
You can't break backcompat on a feature just because you personally dislike that feature. It exists. It needs to be supported. Unless we want to do *another* 3.x-style flag day, frame objects are here to stay.
> I think that could be avoided by simply defaulting to function scope as current Python does when assigning to an unknown variable.
As I mentioned in the comments the last time we discussed this idea:
- There is no such thing as an "unknown" variable, except at the global scope.
- It is always possible to determine at compile time which variables belong to which scopes, with the exception of the global scope. The bytecode compiler will then emit bytecode that only consults the single scope where the variable actually lives, or consults the global scope if the variable is not recognized.
- The global scope is not known at compile time, for a number of reasons, most obviously the fact that from foo import * is legal at the global scope (it's a SyntaxError anywhere else). This doesn't matter, because the "unknown = global" algorithm is good enough for the current design.
- Because the global scope is not known at compile time, it is not possible to determine at compile time whether an unrecognized variable in a match expression is a global or a nonexistent variable that should be created and bound to the case's scope.
- Therefore, allowing match expressions to create a new scope does not actually solve this problem, and would introduce a significant amount of complexity for no adequately justified reason.
"Structural pattern matching" for Python, part 2
Posted Sep 3, 2020 14:17 UTC (Thu) by sbaugh (subscriber, #103291) [Link]
Isn't the fact that Python functions aren't true closures a direct consequence of its lack of proper lexical scope? I'd say that causes a *lot* of problems:
>>> fs = [lambda: i for i in range(5)]
>>> [f() for f in fs]
[4, 4, 4, 4, 4]
"Structural pattern matching" for Python, part 2
Posted Sep 3, 2020 22:48 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
What (I think) you really want is capture-by-value semantics instead of capture-by-name, which has nothing to do with whether or not the loop has a separate scope.
"Structural pattern matching" for Python, part 2
Posted Sep 9, 2020 13:37 UTC (Wed) by milesrout (subscriber, #126894) [Link]
And of course there is in also a very easy solution:lambda i=i: i
captures i
“properly”. Perhaps not very intuitive for beginners, but once you've learnt it you've learnt it.
"Structural pattern matching" for Python, part 2
Posted Sep 9, 2020 14:37 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
>>> d = []
>>> f = lambda x=d: print(x)
>>> f()
[]
>>> d.append(3)
>>> f()
[3]
Unless I'm missing what you're trying to capture here?
"Structural pattern matching" for Python, part 2
Posted Sep 10, 2020 2:11 UTC (Thu) by milesrout (subscriber, #126894) [Link]
>>> a = 0
>>> b = []
>>> f = lambda: print(a)
>>> g = lambda a=a: print(a)
>>> h = lambda: print(b)
>>> i = lambda b=b: print(b)
>>> (f(), g(), h(), i())
0
0
[]
[]
>>> a = 1; c.append(3)
>>> (f(), g(), h(), i())
1
0
[3]
[3]
f captures the *name* 'a', which refers to the immutable value 0. g has an optional parameter, the default value of which is the value of a i.e. the immutable value 0. Like any Python function, g will store internally references to the default values for its parameters. It can't 'capture' the variable a because you could put *any* expression as the default value for g's parameter, and that expression is evaluated *when the function is defined*, with the resulting value stored inside the closure object somewhere.
When f is called, it prints a i.e. the value referred to by the name 'a' that f captured. If 'a' has changed then what f prints will change. When g is called, it prints the value of its parameter or if it isn't given one, the default value, which is zero. The 'a' in f refers not to the value that 'a' had when f was created, but to the name 'a' itself in the definition environment of f.
When a is rebound to 1, it doesn't change the value that a was originally bound to. 'a = 1' is an operation on the name 'a', not an operation on the value pointed to by a. We reassign 'a' so that it refers to a different value, but we don't modify the value that 'a' previously pointed to. So when f() is called a second time, it prints 1 (the new value pointed to by a) while when g() is called a second time it prints 0 (the value you can think of as being pointed to by some 'g.__internal_stuff__.parameters[0].default_value' attribute).
h and i actually work exactly the same way, but with one crucial difference: the value that the name 'b' refers to is a list and is thus mutable. So when we do the 'b.append(3)' operation we are not changing the name 'b'! We're modifying the value *pointed to by b*. When we defined i, we evaluated the expression "b" and pointed some internal "this is the default value of my parameter" field of the closure object at the result of that evaluation. Evaluating the expression "b" results in that list object, the list object pointed to by the name 'b'.
So when we call h(), it prints the value pointed to by the name 'b'. When we call i(), it prints the value of parameter, if it exists, or the value pointed to by some internal 'default' reference for that parameter. That default reference hasn't been changed, it still points to the same object. However that object itself has been changed by the call to append. We only ever created *one* list object, and there are two references to it.
The 'mutable default value' issue for Python functions basically has nothing to do with variable capture at all. It's present even for normal function definitions:
def foo(x=[]):
x.append(1)
print(x)
>>> foo()
[1]
>>> foo()
[1, 1]
The main problem that the variable capture issue causes happens even with immutable values, and in fact is primarily an immutable value issue. The first time I encountered it was with a GUI library. I was doing something like this:
>>> for i in range(5):
>>> buttons[i].on_click(lambda: buttons[i].colour = RED)
but of course the problem with this is that all the buttons will make the last button red when clicked. That's not because of a mutable value: the value pointed to by i doesn't change. The problem here is that a for loop assigns to the iteration variable. That code is the same as writing
>>> i = 0
>>> buttons[i].on_click(lambda: buttons[i].colour = RED)
>>> i = 1
>>> buttons[i].on_click(lambda: buttons[i].colour = RED)
etc.
And there it's more clear what's happening vis a vis name capture. The solution is of course to instead write buttons[i].on_click(lambda i=i: buttons[i].colour = RED).
"Structural pattern matching" for Python, part 2
Posted Sep 10, 2020 13:21 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
"Structural pattern matching" for Python, part 2
Posted Sep 4, 2020 11:31 UTC (Fri) by gray_-_wolf (subscriber, #131074) [Link]
I would have assume that
match p:
case (x, x):
would match points where p.x == p.y. However that does not seem to be the case. How can I express such a match?
Second question, if constants must be namespaced, is there no way to express something like:
x = 1
match p:
case (x, y):
For points that are `(1, .*)`? Is there something like `local` namespaces for local variables in python?
I'm not a python programmer, so I might be lacking some parts of the proper mindset, but it seems... limiting.
"Structural pattern matching" for Python, part 2
Posted Sep 4, 2020 15:54 UTC (Fri) by ecree (guest, #95790) [Link]
Apparently the PEP authors decided to punt on what they call "algebraic matching" of repeated names.
IMNSHO, if this PEP isn't doing algebraic matching then it's really far from clear what it's for, and I'm tempted to side with those who say this creature is being feeped purely for the sake of feeping.
"Structural pattern matching" for Python, part 2
Posted Sep 6, 2020 11:35 UTC (Sun) by kleptog (subscriber, #1183) [Link]
An alternative to algebraic pattern matching is allowing extra conditional, like in elixir (which also offers algebraic matching). Then it could look like:match size: case (x,y) if x == y: circle() case (x,y): ellipse()The whole power of such a match statement is that you can match entire nested structures, which this PEP does offer. An optional extra if clause for the more fancy cases can help here.
"Structural pattern matching" for Python, part 2
Posted Sep 7, 2020 16:45 UTC (Mon) by ecree (guest, #95790) [Link]
On thinking more about this, I have a fairly fundamental question, which I didn't see raised anywhere:
Why can't this be done as a library? Why is a language feature needed?
Something like
import matching
# ...
if m := matching.match(p, (matching.capture('x'), matching.capture('x')):
print("Diagonal at %d" % (m.x,))
elif m := matching.match(p, (0, matching.capture('y')):
print("X-axis as %d" % (m.y,))
elif m := matching.match(p, (matching.capture('x'), matching.capture('y')):
print("Point at (%d,%d)" % (m.x, m.y))
else:
raise ValueError("Not a 2-tuple")
return Point(*p)
Yes, it's more verbose than the 'match' statement, but mainly because of the length of names, and still much more concise and declarative than doing everything with isinstance() and len(). And it makes the difference between constants and capture variables totally explicit!
"Structural pattern matching" for Python, part 2
Posted Sep 8, 2020 7:21 UTC (Tue) by johill (subscriber, #25196) [Link]
You can probably use some more "magic" to make that shorter, something likefrom matching import match, capture with match(p) as matches: if m := matches((capture.x, capture.y)): print("Diagonal at %d" % (m.x, )) elif m := matches((0, capture.y)): print("X-axis at %d" % (m.y, )) elif m := matches((capture.x, capture.y)): print("Point at (%d, %d)" % (m.x, m.y)) else: raise ValueError("Not a 2-tuple") return Point(*p)You need something like
class CapturedVar: def __init__(self, name): self.name = name class capture: def __getattr__(self, name): return CapturedVar(name)and "match(...)" returns an object that is callable; the called method knows about CapturedVar instances and bumps those into the returned MatchedVars object ("m").
"Structural pattern matching" for Python, part 2
Posted Sep 8, 2020 7:29 UTC (Tue) by johill (subscriber, #25196) [Link]
And maybe CapturedVar() can be callable again, returning another CapturedVar() instance with a type attached, so you can writefrom matching import match, capture with match(p) as matches: if m := matches((capture.x(int), capture.y(int))): print("Diagonal at %d" % (m.x, )) elif m := matches((0, capture.y(int))): print("X-axis at %d" % (m.y, )) elif m := matches((capture.x(int), capture.y(int))): print("Point at (%d, %d)" % (m.x, m.y)) elif m := matches((capture.x, capture.y)): raise ValueError("It's a tuple, but not with integers") else: raise ValueError("Not a 2-tuple") return Point(*p)
"Structural pattern matching" for Python, part 2
Posted Sep 8, 2020 7:34 UTC (Tue) by johill (subscriber, #25196) [Link]
Sorry, obviously in both cases the first if was meant to beif m := matches((capture.x(int), capture.x(int))): print("Diagonal at %d" % (m.x, ))And that of course shows that the matches() call needs to be fairly smart about disentangling instances of CapturedVar() etc. But I believe it can be done.
"Structural pattern matching" for Python, part 2
Posted Sep 8, 2020 16:15 UTC (Tue) by ecree (guest, #95790) [Link]
I agree. And that's a clever idea of yours to use a context manager to get the 'only mention p once' semantics.
"Structural pattern matching" for Python, part 2
Posted Sep 8, 2020 17:39 UTC (Tue) by johill (subscriber, #25196) [Link]
I also just realized that I sort of sketched the basics of "algebraic matching", where the original PEP completely bypassed that issue and (x, x) doesn't even do what seems obvious? Or am I reading that wrong?Of course some more complex expressions would be ... hard. Probably not impossible, but hard. You'd have to make the CapturedVar() be able to track all kinds of things, if you wanted to be able to write
... if m := matches((capture.x(int), capture.x(int)**2)): print("Integer point on the parabola at %d" % m.x) ...It still seems possible, but ... tricky to say the least. Effectively, "CapturedVar" would have to be able to be operated on in a symbolic fashion, always returning a new CapturedVar object that encapsulates the expression, and can later evaluate it. So given an instance
x_square := CapturedVar('x') ** 2you'd have to be able to calculate
x_square(7)so you can compare to
capture.x(7).
But you can see the ambiguity here! I previously allowed
capture.x(int)to indicate you wanted a specific type ...
This would be better if the types were to rely on type annotations, but I don't think you can do that in this fashion.
I suppose you could detect if the argument was a type or an instance or something? But tricky to differentiate ...
Oh, I have an idea. The
x_square(7)calculation is purely internal, so we can do it as
x_square(value=y)(say the signature is "__call__(self, type=None, value=None)" so you can differentiate more easily. The value passing syntax is only needed internally to evaluate when you have a given thing in hand.
Well, basically ... it get complex, but I haven't hit anything I couldn't do.
Now I'm tempted to actually implement it :-)
"Structural pattern matching" for Python, part 2
Posted Sep 8, 2020 18:33 UTC (Tue) by johill (subscriber, #25196) [Link]
while all of that works nicely with tuples because you can build some knowledge of them into the code, it really doesn't work at all with what the PEP calls "Class Patterns".
You can't write "m := matches(Point(capture.x, capture.y))" because you quite probably cannot instantiate a Point() object with two CapturedVar instances.
so you'd have to rewrite that as something like "m := matches_instance(Point, capture.x, capture.y)" at which point (pun intended) new syntax seems to make sense...
"Structural pattern matching" for Python, part 2
Posted Sep 8, 2020 23:39 UTC (Tue) by ecree (guest, #95790) [Link]
because you quite probably cannot instantiate a Point() object with two CapturedVar instances.
But can't you, though?
As long as CapturedVar
suitably 'ducks' its type (e.g. for ints, implementing + and **, like you suggested), Point.__init__
should be entirely happy with it unless it's trying to do things that depend on the actual values.
Sure, it's not something that will work for every class, but nor is the rest of the "Match Protocol" in PEP 622. And if the class has a __match_args__
, then you could potentially have a matching.Duck
that uses that to "play the rôle of" the class in a match expression:
from matching import match, capture, Duck
with match(p) as matches:
if m := matches(Duck(Point)(capture.x(int), capture.x(int))):
print("Diagonal at %d" % (m.x, ))
elif m := matches(capture.p(Point)(capture.x(int), capture.y(int))):
# the second call on capture.p implies a Duck
# if the match succeeds, it is replaced with the concrete Point
print("Other point at %s % (m.p, ))
So you'd use plain Point()
for dataclasses and anything else that's not too fussy, Duck(Point)()
otherwise.
Still not quite as pretty as the PEP 622 syntax, but it still seems like a good way to get experience with "how this feature would be used in practice" before baking it into the language. Which to my mind is reason enough to do it.
"Structural pattern matching" for Python, part 2
Posted Sep 9, 2020 6:22 UTC (Wed) by johill (subscriber, #25196) [Link]
I don't think that'll work. It has to implement the algebraic operations to build a parse tree to be able to do algebraic matching, but you still can't really call Point.__init__ with it I'd think, it might check the type, or other random things in there. Hence the match protocol though!
> Sure, it's not something that will work for every class, but nor is the rest of the "Match Protocol" in PEP 622. And if the class has a __match_args__, then you could potentially have a matching.Duck that uses that to "play the rôle of" the class in a match expression: [...]
But yeah, you're right, the PEP also cannot do magic wrt. class matching, it has __match_args__ for that, and indeed with something like the "Duck()" you propose - or what I wrote as "m := matches_instance(Point, capture.x, capture.y)" - you can indeed do the same thing.
I neglected to think enough about it I suppose, but (fairly obviously) the only magic in the PEP can be the syntax, making "Point(x, y)" not be an instantiation call but some other kind of expression. Without new syntax we have to open that up manually by doing "Duck(Point)(capture.x, capture.y)" or my "m := matches_instance(Point, capture.x, capture.y)".
The rest ... yeah can probably be done, now that I think about it. In mostly the same ways.
And that shows to some extent where and how the algebraic matching breaks down and probably why they punted on it, though I'm not convinced it's that bad. Obviously, you don't want to get into the territory of writing "Point(x**2, x**3)" and suddenly needing an arbitrary solver to do the match, but I think you could do it if you restrict to having to mention each variable "naked" at least once. I.e. "Point(x, x**3)" would be OK and "Point3d(x, y, x+y)" would also be OK - all of this because once you have values for x/y it's easy to check the remaining expressions. But something like "Point(x+y, x-y)" would not be allowed.
"Structural pattern matching" for Python, part 2
Posted Sep 9, 2020 13:37 UTC (Wed) by milesrout (subscriber, #126894) [Link]
At that point, how is this better than
(x, y) = size
if x == y:
circle()
else:
ellipse()
I would contend that it is not. It is an extra level of indentation and a whole new complicated language construct.
"Structural pattern matching" for Python, part 2
Posted Sep 9, 2020 22:16 UTC (Wed) by johill (subscriber, #25196) [Link]
The match/case version can handle such things without you having to put another "is a 2-tuple" if outside of it.
"Structural pattern matching" for Python, part 2
Posted Sep 10, 2020 2:12 UTC (Thu) by milesrout (subscriber, #126894) [Link]
That's quite intentional. If you are passing the wrong type of value to a function then your code should crash. The code will also crash if I pass a string to a function that expects an integer. Code should not be full of defensive checks for programmer error: they're code paths that are hard if not impossible to test and by design are intended to never be run (nobody makes programmer errors on purpose). Type checks are for dispatching based on type in Python, and most of the time 'isinstance' checks are the wrong way of doing things. "if isinstance elif isinstance elif isinstance"-style code is considered unPythonic, and I don't think that papering over that code with a language construct makes it a good idea.Now there are possibly some cases where you want to be able to pass different sizes of tuples to a single function. However I think in most of those cases you probably shouldn't be using tuples, you should be using something like a named tuple, a dictionary, a custom class, a dataclass with an optional field, or just separate arguments to a single function, some of which are optional.
Or possibly you should actually be making a decision 'am I working with 2-tuples or 3-tuples?' at a higher level and switch. If you had some code that is meant to work with 2D or 3D or 4D vectors for example, it would be better to have separate functions for the length of 2D, 3D and 4D vectors than a single function that does
def veclen(v):
match v:
case (x,y): return sqrt(x*x + y*y)
case (x,y,z): return sqrt(x*x + y*y + z*z)
case (x,y,z,w): return sqrt(x*x + y*y + z*z + w*w)
Then you should make the determination as to whether you are working with 2D, 3D or 4D vectors at a high level and write all the functions dealing with 2D, 3D or 4D vectors separately.
For example if you really want to write 'length-agnostic' code with tuples up to some maximum length, then you should encapsulate that variation in size in Vector2D, Vector3D and Vector4D etc. classes with a common Vector superclass such that Vector's methods can be those that are parametric in the dimension.
And if it's really truly length-agnostic tuple code then you should be writing loops over lists, not matching on tuples.
"Structural pattern matching" for Python, part 2
Posted Sep 11, 2020 6:29 UTC (Fri) by johill (subscriber, #25196) [Link]
"Structural pattern matching" for Python, part 2
Posted Sep 11, 2020 12:53 UTC (Fri) by milesrout (subscriber, #126894) [Link]
"Structural pattern matching" for Python, part 2
Posted Sep 11, 2020 13:27 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
This feels like one of those "cannot derive ought from is" statements. Is there any indication that something better than `isinstance` if-trees was explicitly rejected earlier in the language's design process? I'd expect the previous approach was "just treat it like a duck and if it quacks well enough, what do you care?", but with types now becoming more and more of a thing in general (even in Python), this seems like a natural thing to be considering around this time (to me at least).
"Structural pattern matching" for Python, part 2
Posted Sep 12, 2020 4:30 UTC (Sat) by milesrout (subscriber, #126894) [Link]
Yes. Switch statement suggestions have been rejected over and over again. Pattern matching proposals have been rejected before. 'if isinstance' isn't considered bad Python because it's *ugly* but because it's *bad design*. Dispatching on types is just not a good way to write code in a dynamically typed language. There is no - and can be no - exhaustiveness checking for example, to give just one example of why it's a problem.
>but with types now becoming more and more of a thing in general (even in Python), this seems like a natural thing to be considering around this time (to me at least).
Adding type annotations to Python is yet another example of the continual application of bad ideas to clutter up Python, done by people trying to roleplay as Haskell programmers. Pattern matching is the same thing. It's blatant cargo culting.
"Structural pattern matching" for Python, part 2
Posted Sep 16, 2020 16:02 UTC (Wed) by HelloWorld (guest, #56129) [Link]
That problem has nothing to do with pattern matching and everything to do with dynamic typing, which is quite simply an inherently bad idea.
"Structural pattern matching" for Python, part 2
Posted Sep 9, 2020 13:37 UTC (Wed) by milesrout (subscriber, #126894) [Link]
"Structural pattern matching" for Python, part 2
Posted Sep 15, 2020 13:28 UTC (Tue) by udifuchs (guest, #141354) [Link]
match pt:
case Point2D(x, y):
It looks like the class is being initiated.
I was wondering if using square brackets would make more sense,
as it is similar to the syntax used for type annotations.
For example:
match pt:
case Point2D[x, y]:
One issue with this idea is that the square bracket use might be confused with
item access. In enum, for example, this is a valid syntax:
>>> Sides["SPAM"] == Sides.SPAM == Sides("Spam")
True
Resolving this conflict would probably require not allowing such item access
inside a case statement.
"Structural pattern matching" for Python, part 2
Posted Sep 16, 2020 15:59 UTC (Wed) by HelloWorld (guest, #56129) [Link]
>
> It looks like the class is being initiated.
No, it doesn't, because there is a case keyword right before it, it's pretty much impossible to miss. If the reader doesn't know what that means and starts making wild guesses, the problem isn't with the language but with the reader.
"Structural pattern matching" for Python, part 2
Posted Sep 18, 2020 4:23 UTC (Fri) by udifuchs (guest, #141354) [Link]
The PEP mentions that assignment targets in python also look like expressions, but this is true in a very narrow way. It is a syntax error to write 'int(x) = 3', while 'case int(x):' is valid.
More specifically, while reading the is_tuple() example in the PEP, it took me a while to realize that LParen() is initialized in the original code but not in the new code. If LParen.__init__() has some side effects, the two versions of the code that look very similar could be very different.