|
|
Subscribe / Log in / New account

"Structural pattern matching" for Python, part 2

This article brought to you by LWN subscribers

Subscribers to LWN.net made this article — and everything that surrounds it — possible. If you appreciate our content, please buy a subscription and make the next set of articles possible.

By Jake Edge
September 1, 2020

We left the saga of PEP 622 ("Structural Pattern Matching") at the end of June, but the discussion of a Python "match" statement—superficially similar to a C switch but with extra data-matching features—continued. At this point, the next steps are up to the Python steering council, which will determine the fate of the PEP. But there is lots of discussion to catch up on from the last two months or so.

As a quick review, the match statement is meant to choose a particular code block to execute from multiple options based on conditions specified for each of the separate case entries. The conditions can be simple, as in:

    match x:
        case 1:
	    print('1')
	case 2:
	    print('2')
That simply compares the value of x to the numeric constants, but match can do far more than that. It is a kind of generalized pattern matching that can instantiate variables in the case condition when it gets matched, as the following example from the PEP shows:
# The target is an (x, y) tuple
match point:
    case (0, 0):
        print("Origin")
    case (0, y):
        print(f"Y={y}")
    case (x, 0):
        print(f"X={x}")
    case (x, y):
        print(f"X={x}, Y={y}")
    case _:
        raise ValueError("Not a point")
In the second case, for example, y will be instantiated to the value of the second element in the point tuple if the first is zero; that allows its value to be printed. This example also shows the "_" value being used as the match-all wildcard, though that character was not an entirely popular choice; it is, however, commonly used by other languages for that purpose. Even that example barely scratches the surface of the power and reach of the proposed match statement.

Anti-PEP

On the other hand, there are some cognitive hurdles that folks will need to clear in order to understand this new syntax—and that's what is driving many of the complaints about the feature. Within a day after Guido van Rossum posted the first version of the PEP, he asked that commenters refrain from piling onto the huge thread so that the authors could have some time to hash things out. For the most part, that is what happened, but there was some concern that perhaps the opposition's arguments against the PEP would not get a full airing in the PEP itself. Mark Shannon, who had earlier questioned the need for the feature, proposed an "Anti-PEP" as a mechanism for opponents to marshal their arguments:

When deciding on PEP 484 ["Type Hints"], I had to decide between a formally written PEP on one hand, and a mass of emails and informal polls I had done at conferences, on the other. I hope I made the right decision.

Whether the ultimate decision is made by the steering committee or by a PEP delegate, it is hard to make a decision between the pros and cons, when the pros are in a single formal document and the cons are scattered across the internet.

An Anti-PEP is a way to ensure that those opposed to a PEP can be heard and, if possible, have a coherent voice. Hopefully, it would also make things a lot less stressful for PEP authors.

Notably, an Anti-PEP would not propose an alternative, it would just collect the arguments against a proposed PEP. But Brett Cannon (and others) thought that there is another mechanism to be used:

[...] that's what the Rejected Ideas section is supposed to capture. If a PEP is not keeping a record of what is discussed, including opposing views which the PEP is choosing not to accept, then that's a deficiency in the PEP and should be fixed. And if people feel their opposing view was not recorded properly, then that should be brought up.

Most other commenters agreed that the "Rejected Ideas" section (or perhaps a new "Objections" section) is the right place to record such arguments, though Raymond Hettinger agreed with Shannon about the need for a more formalized opposition document: "The current process doesn't make it likely that a balanced document is created for decision making purposes." They were in the minority, however, so a formal anti-PEP process seems improbable.

On July 1, Van Rossum announced a match statement "playground" that can be used to test the proposed feature. It is a Binder instance that runs a Jupyter Python kernel that has been modified with a "complete implementation of the PEP". But the playground and other aspects of the process made Rob Cliffe uneasy; he was concerned that the PEP was being "railroaded through".

However, PEP 622 only seems to have been presented to the Python community only *after* a well-developed (if not finalised) implementation was built.  A fait accompli.  So there will inevitably be resistance from the developers to accept changes suggested on python-dev.  And since the PEP has Guido's authority behind it, I think it is likely that it will eventually be accepted pretty much as it was originally written.

Cliffe was under the impression that PEPs are never implemented until they have been accepted, but several people in the thread pointed out that is not the case. Chris Angelico said:

Speaking with my PEP Editor hat on, I would be *thrilled* if more proposals came with ready-to-try code. Only a very few have that luxury, and a lot of the debating happens with nothing but theory - people consider what they *think* they'd do, without actually being able to try it out and see if it really does what they expect. Having a reference implementation isn't necessary, of course, but it's definitely a benefit and not a downside. Also, there HAVE been proposals with full reference implementations that have ended up getting rejected; it's not a guarantee that it'll end up getting merged.

Round 2

The second version of the PEP was announced on July 8. Van Rossum noted that the __match__() protocol (for customizing a class's matching behavior) had been removed at the behest of Daniel Moisset, who was added as the sixth author of the PEP (with Van Rossum, Brandt Bucher, Tobias Kohn, Ivan Levkivskyi, and Talin). The protocol is not required for the feature; "postponing it will allow us to design it at a later time when we have more experience with how `match` is being used". Moisset was added in part for his contribution of new text that "introduces the subject matter much more gently than the first version did"

The other big change was to drop the leading dot for constants (e.g. .CONST) in the case statements. The problem addressed by that feature is that identifier strings in the patterns to be matched can either be a variable to be stored into if there is a match or a constant value to be looked up in order to match it (sometimes called "load-and-compare" values); Python cannot determine which it is without some kind of syntactical element or convention (e.g. all uppercase identifiers are constants). Consider the following example from the announcement:

USE_POLAR = "polar"
USE_RECT = "rect"
[...]
match t:
    case (USE_RECT, real, imag):
	return complex(real, imag)
    case (USE_POLAR, r, phi):
	return complex(r * cos(phi), r * sin(phi))

Python cannot distinguish the USE_RECT constant, which should cause it to only match tuples with "rect" as the first element, from the real and imag variables that should be filled in with the values from the match. Since the original choice of prepending a dot to the constants was quite unpopular, that was removed. It is a problem that other languages with this kind of match have struggled with and the PEP authors have as well; it was the first issue in their bug tracker. Adding a sigil for the to-be-stored variables, as has been suggested (e.g. ?real) "makes this common case ugly and inconsistent". In the end, they have decided to only allow constants that come from a namespace:

No other language we’ve surveyed uses special markup for capture variables; some use special markup for load-and-compare, so we’ve explored this. In fact, in version 1 of the PEP our long-debated solution was to use a leading dot. This was however boohed off the field, so for version 2 we reconsidered. In the end nothing struck our fancy (if `.x` is unacceptable, it’s unclear why `^x` would be any better), and we chose a simpler rule: named constants are only recognized when referenced via some namespace, such as `mod.var` or `Color.RED`.

If that part of the proposal is a "deal-breaker", Van Rossum said, then, when any other problems have been resolved, that decision could be reconsidered. He also outlined the other outstanding items, with the authors' position on them:

There’s one other issue where in the end we could be convinced to compromise: whether to add an `else` clause in addition to `case _`. In fact, we probably would already have added it, except for one detail: it’s unclear whether the `else` should be aligned with `case` or `match`. If we are to add this we would have to ask the Steering Council to decide for us, as the authors deadlocked on this question.

Regarding the syntax for wildcards and OR patterns, the PEP explains why `_` and `|` are the best choices here: no other language surveyed uses anything but `_` for wildcards, and the vast majority uses `|` for OR patterns. A similar argument applies to class patterns.

That post predictably set off yet another mega-thread on various aspects of the new syntax. The alignment of else (if added) was one such topic; Stefano Borini suggested that the code to be executed should not be two levels of indentation in from the match, but be more like if statements. Glenn Linderman took that further:

That would also sidestep the dilemma of whether else: (if implemented) should be indented like case: or like match: because they would be the same.
match:
    t
case ("rect", real, imag):
    return complex(real, imag)
case ("polar", r, phi):
    return complex( r* cos(phi), r*sin(phi)
else:
    return None

He compared that syntax favorably with that of try/except/else blocks, in addition to it resolving the else-alignment question. The PEP authors had addressed the idea, noting that there are two possibilities: either the match expression is in its own single-statement block (as Linderman has it), which is unique in the language, or the match and its expression are introducing a block with a colon, but that block is not indented like every other block after a colon in Python.

    match:
        expression
    case ...
    
    # or
    
    match expression:
    case ...
Either would violate a longstanding expectation in Python, so the authors rejected both possibilities.

Larry Hastings wondered about the special treatment being given to the "_" wildcard match. That symbol acts like a regular identifier, except in case statements, where it does not get bound (assigned to) for a match; it can also be used more than once in a case, which is not allowed for other match variables:

    match x:
        case (_, _):  # match any tuple
	    print(_)  # _ will not have a value
	case (x, x):  # ILLEGAL

Hastings argued that if the same variable can be used more than once and that underscore does get bound, the special case disappears. The cost is an extra store for the binding, which could be optimized away as a dead store if that was deemed important. Moisset pointed out a few technical hurdles, and Van Rossum added more, but also thought it really did not make sense to do the binding for a value that is being explicitly described as a "don't care":

When I write `for x, _, _ in pts` the main point is not that I can write `print(_)` and get the z coordinate. The main point is that I am not interested in the y or the z coordinates (and showing this to the reader up front). The value assigned to `_` is uninteresting [...]

The need for a wildcard pattern has already been explained -- we really want to disallow `Point(x, y, y)` but we really need to allow `Point(z, _, _)`. Generating code to assign the value to `_` seems odd given the clear intent to *ignore* the value.

Compelling case?

As he was with the first version, Shannon is far from convinced that Python needs match and that the PEP actually lays out a compelling case for it. He returned to that question in a mid-July post that pointed out several flaws that he saw in the justification for the feature. In general, the examples in the PEP do not show any real-world uses of the new syntax that he finds compelling. As he said in a followup message: "I worry that the PEP is treating pattern matching as an ideal which we should be striving towards. That is a bad thing, IMO."

Others disagreed with that view, however. Kohn, one of the PEP authors who also participated in that short thread, started his own short thread with a different way to look at the feature. It is not, he said, adding a switch statement to Python; instead, it is providing a mechanism for doing function overloading in various guises:

Indeed, pattern matching is much more closely related to concepts like regular expressions, sequence unpacking, visitor patterns, or function overloading.  In particular, the patterns themselves share more similarities with formal parameters than with expressions.

Kohn went through a few detailed examples of function overloading and the visitor design pattern, showing how the proposed match feature would bring multiple benefits to the code. It is worth a read, especially for those who may not fully see the scope of what the feature can do. On the flipside, though, Shannon posted a link to his lengthy deconstruction of PEP 622, which was met with skepticism—at best. Several responders felt that it was simply a rehash of various arguments that had already been made. Van Rossum effectively dismissed it entirely: "[...] it just repeats Mark's own arguments, which are exclusively focused on the examples in the PEP (it's as if Mark read nothing *but* the examples)".

But Shannon (and others) pointed out his analysis of code in the standard library, which is linked in the critique. He concluded that in the roughly 600,000 lines of Python code in CPython, there were only three or four examples of places where the match statement made things more clear. While Stephen J. Turnbull is in favor of the PEP, he agreed that Shannon's analysis of the standard library was useful. Unlike Shannon, Turnbull thought that most of the pattern-matching rewrites were more clear, but there were not many of them; however, there is still a missing piece:

In the Analysis Mark argues that idioms amenable to pattern matching in existing stdlib code are quite rare (a couple of dozen instances, and as I wrote above I think his query was reasonably representative). While that certainly is a useful analysis, it cannot account for the fact that pattern matching is a *paradigm* that has not been available in Python in the past. *Availability of a pleasant syntax for a new paradigm causes code to be designed differently.* That is something that none of us can claim to be able to quantify accurately, although the experience Guido and others have with "async" may be relevant to guesstimating it.

To the steering council

At this point, the PEP is in the hands of the steering council, which could approve it as written, ask for changes, or reject it entirely. One suspects that an outright rejection would likely kill the idea forever, though some pieces of it could perhaps survive in other guises; Shannon had some suggestions along those lines.

If modification is requested, else seems like a logical addition, though there is no real consensus on how it should be aligned, either within the author group or the Python community at large. The requirement that constants be referred to via a namespace (e.g. color.RED) is another area that could draw attention from the council. Doing things that way requires that match variables not be referred to via a namespace, which has a few downsides, including the inability to assign to self.x in a match pattern. While it was deemed "ugly" (at best), the original idea of requiring constants to have a dot prepended to them (e.g. .RED) would at least remove that restriction on match variables.

The council is an unenviable position here; one suspects that Van Rossum knows how the members feel after the "PEP 572 mess" that led to his resignation as benevolent dictator for life (BDFL)—thus to the creation of the council. Contentious decisions are draining and the aftermath can be painful as well. In this case, the opposition does not seem as strong as it was to the "walrus operator" from PEP 572, but the change here is far more fundamental—and the effects are far-reaching.

The council has not really tipped its hand in the multiple, typically long, threads discussing the feature—most of its members have not participated at all. Clearly a lot of work has gone into the PEP, which may make it somewhat harder to reject outright. The opinion of the language's inventor and former BDFL also likely helps tip the scales somewhat in the PEP's favor; most of the community seemed to like the idea overall, while there were (lots of) quibbles about details. It would not be surprising to find that Python 3.10 (slated for October 2021) shows up with a match statement.


Index entries for this article
PythonEnhancements
Pythonmatch statement
PythonPython Enhancement Proposals (PEP)/PEP 622


(Log in to post comments)

"Structural pattern matching" for Python, part 2

Posted Sep 1, 2020 17:57 UTC (Tue) by martin.pitt (subscriber, #26246) [Link]

I find Mark Shannon's analysis very compelling. Even though I just looked at the examples (not the discussion), these are the ones that the proponents picked - so I'd expect them to be prime candidates for match. If even these are so riddled with ambiguities and subtle bugs, then IRL code will hardly get better. There is such a thing for changing the language for change's sake!

"Structural pattern matching" for Python, part 2

Posted Sep 1, 2020 19:25 UTC (Tue) by logang (subscriber, #127618) [Link]

I agree. Though, it seems more like they are changing the language just to add a feature other languages have without considering whether it's good for python or even a good feature at all.

I personally find the syntax incredibly confusing and overly magic in any language -- so many rust examples that I've seen just leave me scratching my head.

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 2:38 UTC (Wed) by atnot (subscriber, #124910) [Link]

This is my problem with this feature. I love rust's pattern matching, but what really makes that useful is the rest of the language, most notably tagged unions or as rust calls them enums. Python has none of those features. If python were to have those, which I would love, I think match could be incredibly useful, but it doesn't.

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 6:01 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

> most notably tagged unions or as rust calls them enums.

Perhaps the phrase "tagged unions" means something different to people who write Rust, but to my mind, PyObject* is a tagged union (and thus, so is every Python object). Am I misinterpreting you?

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 7:24 UTC (Wed) by ehiggs (subscriber, #90713) [Link]

You are correct that a PyObject is a tagged union. The point here, however, is that match in Rust is exhaustive. You must write the match case for every possible branch if you want your Rust code to compile. Making match on PyObject exhaustive would mean you need a branch covering all possible types.

Since so many things can be a PyObject, this isn't feasible unless you just had a fallthrough case (in Rust this uses _). The problem then is that you will never catch bugs when extending the code because you will happily use the fall through case when a new field in the enum is added.

"Structural pattern matching" for Python, part 2

Posted Sep 13, 2020 11:11 UTC (Sun) by plugwash (subscriber, #29694) [Link]

> You are correct that a PyObject is a tagged union.

I would argue it's more akin to a base class than to a tagged union with the "ob_type" field serving a role similiar to a vptr.

To me at least "tagged union" implies that the list of subtypes is determined in advance and sufficient memory is allocated to store any of the subtypes.

"Structural pattern matching" for Python, part 2

Posted Sep 3, 2020 16:38 UTC (Thu) by jcpunk (subscriber, #95796) [Link]

I agree. I've yet to be convinced this is actually any better than
if XXX:
  print(x)
elseif YYY:
  print(y)
elseif ZZZ:
  print(z)
else:
  raise ValueError()

"Structural pattern matching" for Python, part 2

Posted Sep 1, 2020 20:26 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

I still think indentation is a problem here. If you're following PEP 8, then you have four spaces of indents for the match clause, four spaces for the enclosing case, four spaces for the enclosing function, and (possibly) four spaces for the enclosing class. That's a quarter of the entire line just devoted to indentation. Then, if you want to put this thing inside of a loop or other block, it gets even worse. So, basically, the only way this will be readable is if it's doing some kind of top-level dispatch, where it destructures into local variables, and then immediately hands those variables off to another function or method. That's probably somewhat useful, in certain contexts (e.g. consuming an AST and dispatching on the atom type) but now I get this strange feeling that we're basically reimplementing the vtable, in a language that already has very elaborate support for polymorphism and OOP.

But maybe I just haven't studied it enough.

Zen of Python

Posted Sep 2, 2020 8:28 UTC (Wed) by tnoo (subscriber, #20427) [Link]

> import this

For my feeling this proposal violates almost all Zen mantras.
(maybe unless you are Dutch --wink--)

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 13:00 UTC (Wed) by tnemeth (subscriber, #37648) [Link]

> The value assigned to `_` is uninteresting [...]

I'm sorry to disagree here but IHMO the "default:" case can need to access the `_` value, for error reporting...

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 14:08 UTC (Wed) by NAR (subscriber, #1313) [Link]

In Erlang/Elixir world if I want to print out that value for error reporting, I don't use the _, but give some meaningful name to that variable. Of course, _ doesn't really mean it's "uninteresting", it means that the variable is intentionally not used.

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 14:19 UTC (Wed) by tnemeth (subscriber, #37648) [Link]

Indeed. I discussed this issue with a friend who has developed in Scala. He told me that in what he understood from the article -- and what I completely missed -- is that if I want error reporting, I may simply have to use

case x:
print("error: ", str(x))

an not use the `_` (which isn't meant to be used).

"Structural pattern matching" for Python, part 2

Posted Sep 3, 2020 13:22 UTC (Thu) by hartb (subscriber, #96640) [Link]

I guess you still have the object-to-be-matched handy:
match t:
    ...
    case _:
        print("error: ", str(t))

"Structural pattern matching" for Python, part 2

Posted Sep 4, 2020 16:04 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

I was worried about the same thing too, but the walrus operator saves you in this related case as well:
match t := some_expr():
    case ...:
    case _:
        # use t
Though I don't like the locality mismatch here. For a large (probably considered un-Pythonic) match, your variable assignment is far away when it could be right next to it. You also have to consider that `t` is now always assigned to rather than just in the catch-all case.

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 13:18 UTC (Wed) by caliloo (subscriber, #50055) [Link]

What irks me is all the mention of other languages being surveyed to see how they did it. If python had started like that its particular style of indentation that makes it so readable and concise would never have existed.
For me this tastes of feature FOMO more than design opinion, which is a red flag. The thing about variable match or assign in the brackets make my hair stand up from a language design stand point. If that can be alleviated, then for me it’s fine...

"Structural pattern matching" for Python, part 2

Posted Sep 6, 2020 21:18 UTC (Sun) by smitty_one_each (subscriber, #28989) [Link]

I hadn't heard of "Fear Of Missing Out" before. Thanks.

It does seem that, with the newer release cycle, Python is going down the Java "featuritis" road.

On the other hand, this seems to take Python more in a Haskell-ish direction.

Good news? Bad news? Who can say?

On the other hand, the languages that fail to add new features seem to be the ones nobody's using any longer. It's more of an economic than technical arms race, apparently.

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 19:25 UTC (Wed) by mb (subscriber, #50428) [Link]

we chose a simpler rule: named constants are only recognized when referenced via some namespace, such as `mod.var` or `Color.RED`

Oh, come on...

That is even more awful than using cryptic special characters like dots.
That means whether a variable is read from or written to depends on how I "fetch" a variable name.

The equal character would IMO be the obvious and intuitive character to mark that something is going to be assigned to.
match t:
    case (USE_RECT, real=, imag=):
	return complex(real, imag)
    case (USE_POLAR, r=, phi=):
	return complex(r * cos(phi), r * sin(phi))

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 21:25 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

Maybe I just lack a properly Pythonic mindset, but the most bizarre part of this proposal to me is the fact that the names are *assigned* rather than *bound* within the lexical scope of the match arm. Every other language I know of with pattern matching creates lexical bindings for match variables rather than allowing the match itself to have side effects on existing variables. If the match placeholders were bound lexically then it would make perfect sense that they can't be qualified. As implicit assignments, however, the restriction does seem a bit arbitrary. At the moment Python appears to lack a concept of lexical block scope. Perhaps that deficiency should be rectified before moving on more advanced features like pattern matching.

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 21:32 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Changing variable scope to be anything other than the parent function/class/module declaration would be a massive change. Anything digging around in frame objects would need to learn to poke wherever these new scopes end up being declared. You'd basically need to add variable declarations too since this fairly common pattern would be broken:
if foo is not None:
    local_foo = foo
else:
    local_foo = DefaultValue()
(Sure, here `foo` could probably be reused, but code like this still exists.)

"Structural pattern matching" for Python, part 2

Posted Sep 2, 2020 23:24 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> Changing variable scope to be anything other than the parent function/class/module declaration would be a massive change.

Yes, I anticipated that. I think it would be worth it, since the only way to get reasonable scopes in current Python involves nested functions or lambdas, but in the end that's not for me to decide.

> Anything digging around in frame objects would need to learn to poke wherever these new scopes end up being declared.

Making it harder to mess around with frame objects seems like a feature, not a bug. Making them completely inaccessible would be even better. (And, apart from debugging, how would you even use such a thing without creating a very tight coupling with the function's internal implementation? I'm not seeing any realistic scenario where this code would not need to be updated anyway.)

> You'd basically need to add variable declarations too…

I think that could be avoided by simply defaulting to function scope as current Python does when assigning to an unknown variable. Nested block scopes would be introduced by specific statements where binding makes sense such as loop variables in a "for" statements, "as" clauses in "with" statements (as per the original version of PEP 343[1]), or placeholders in "match" statements. A "local" keyword could be added to explicitly introduce scoped variables into the current block, similar to the "global" and "nonlocal" keywords, if there is a demand for it.

[1] https://docs.python.org/2.5/whatsnew/pep-343.html

"Structural pattern matching" for Python, part 2

Posted Sep 3, 2020 7:06 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

> Yes, I anticipated that. I think it would be worth it, since the only way to get reasonable scopes in current Python involves nested functions or lambdas, but in the end that's not for me to decide.

How exactly would you define "reasonable?"

IMHO, the Python rule is very straightforward and easy to remember. Moreover, I've never run into a situation where it actually causes a problem of some sort. Sure, variables hang around for a little longer than they would otherwise, but so what? If you really care, you can explicitly write del x, which is probably a Good Idea anyway, because if you care, then you should call attention to the fact that you care (not-caring is the default state of affairs, so x must be special in some way).

> Making it harder to mess around with frame objects seems like a feature, not a bug. Making them completely inaccessible would be even better.

And while we're at it, let's put the e in creat(2), where it should've been all along.

You can't break backcompat on a feature just because you personally dislike that feature. It exists. It needs to be supported. Unless we want to do *another* 3.x-style flag day, frame objects are here to stay.

> I think that could be avoided by simply defaulting to function scope as current Python does when assigning to an unknown variable.

As I mentioned in the comments the last time we discussed this idea:

- There is no such thing as an "unknown" variable, except at the global scope.
- It is always possible to determine at compile time which variables belong to which scopes, with the exception of the global scope. The bytecode compiler will then emit bytecode that only consults the single scope where the variable actually lives, or consults the global scope if the variable is not recognized.
- The global scope is not known at compile time, for a number of reasons, most obviously the fact that from foo import * is legal at the global scope (it's a SyntaxError anywhere else). This doesn't matter, because the "unknown = global" algorithm is good enough for the current design.
- Because the global scope is not known at compile time, it is not possible to determine at compile time whether an unrecognized variable in a match expression is a global or a nonexistent variable that should be created and bound to the case's scope.
- Therefore, allowing match expressions to create a new scope does not actually solve this problem, and would introduce a significant amount of complexity for no adequately justified reason.

"Structural pattern matching" for Python, part 2

Posted Sep 3, 2020 14:17 UTC (Thu) by sbaugh (subscriber, #103291) [Link]

>Moreover, I've never run into a situation where it actually causes a problem of some sort.

Isn't the fact that Python functions aren't true closures a direct consequence of its lack of proper lexical scope? I'd say that causes a *lot* of problems:

>>> fs = [lambda: i for i in range(5)]
>>> [f() for f in fs]
[4, 4, 4, 4, 4]

"Structural pattern matching" for Python, part 2

Posted Sep 3, 2020 22:48 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

lambda does create a new lexical scope already, and it doesn't help with that problem. In order for your example to work "correctly," you would need to establish a new lexical scope for each iteration (not just one for the whole loop, because it would still have i=4 at the end of the loop). I would find that very hard to reason about, because that's not what I think of when I hear the word "lexical."

What (I think) you really want is capture-by-value semantics instead of capture-by-name, which has nothing to do with whether or not the loop has a separate scope.

"Structural pattern matching" for Python, part 2

Posted Sep 9, 2020 13:37 UTC (Wed) by milesrout (subscriber, #126894) [Link]

And of course there is in also a very easy solution: lambda i=i: i captures i “properly”. Perhaps not very intuitive for beginners, but once you've learnt it you've learnt it.

"Structural pattern matching" for Python, part 2

Posted Sep 9, 2020 14:37 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Hmm. That doesn't seem fool-proof:

>>> d = []
>>> f = lambda x=d: print(x)
>>> f()
[]
>>> d.append(3)
>>> f()
[3]

Unless I'm missing what you're trying to capture here?

"Structural pattern matching" for Python, part 2

Posted Sep 10, 2020 2:11 UTC (Thu) by milesrout (subscriber, #126894) [Link]

That's conflating two separate things: capturing names and capturing names that refer to mutable values.

>>> a = 0
>>> b = []
>>> f = lambda: print(a)
>>> g = lambda a=a: print(a)
>>> h = lambda: print(b)
>>> i = lambda b=b: print(b)
>>> (f(), g(), h(), i())
0
0
[]
[]
>>> a = 1; c.append(3)
>>> (f(), g(), h(), i())
1
0
[3]
[3]

f captures the *name* 'a', which refers to the immutable value 0. g has an optional parameter, the default value of which is the value of a i.e. the immutable value 0. Like any Python function, g will store internally references to the default values for its parameters. It can't 'capture' the variable a because you could put *any* expression as the default value for g's parameter, and that expression is evaluated *when the function is defined*, with the resulting value stored inside the closure object somewhere.

When f is called, it prints a i.e. the value referred to by the name 'a' that f captured. If 'a' has changed then what f prints will change. When g is called, it prints the value of its parameter or if it isn't given one, the default value, which is zero. The 'a' in f refers not to the value that 'a' had when f was created, but to the name 'a' itself in the definition environment of f.

When a is rebound to 1, it doesn't change the value that a was originally bound to. 'a = 1' is an operation on the name 'a', not an operation on the value pointed to by a. We reassign 'a' so that it refers to a different value, but we don't modify the value that 'a' previously pointed to. So when f() is called a second time, it prints 1 (the new value pointed to by a) while when g() is called a second time it prints 0 (the value you can think of as being pointed to by some 'g.__internal_stuff__.parameters[0].default_value' attribute).

h and i actually work exactly the same way, but with one crucial difference: the value that the name 'b' refers to is a list and is thus mutable. So when we do the 'b.append(3)' operation we are not changing the name 'b'! We're modifying the value *pointed to by b*. When we defined i, we evaluated the expression "b" and pointed some internal "this is the default value of my parameter" field of the closure object at the result of that evaluation. Evaluating the expression "b" results in that list object, the list object pointed to by the name 'b'.

So when we call h(), it prints the value pointed to by the name 'b'. When we call i(), it prints the value of parameter, if it exists, or the value pointed to by some internal 'default' reference for that parameter. That default reference hasn't been changed, it still points to the same object. However that object itself has been changed by the call to append. We only ever created *one* list object, and there are two references to it.

The 'mutable default value' issue for Python functions basically has nothing to do with variable capture at all. It's present even for normal function definitions:

def foo(x=[]):
x.append(1)
print(x)

>>> foo()
[1]
>>> foo()
[1, 1]

The main problem that the variable capture issue causes happens even with immutable values, and in fact is primarily an immutable value issue. The first time I encountered it was with a GUI library. I was doing something like this:

>>> for i in range(5):
>>> buttons[i].on_click(lambda: buttons[i].colour = RED)

but of course the problem with this is that all the buttons will make the last button red when clicked. That's not because of a mutable value: the value pointed to by i doesn't change. The problem here is that a for loop assigns to the iteration variable. That code is the same as writing

>>> i = 0
>>> buttons[i].on_click(lambda: buttons[i].colour = RED)
>>> i = 1
>>> buttons[i].on_click(lambda: buttons[i].colour = RED)
etc.

And there it's more clear what's happening vis a vis name capture. The solution is of course to instead write buttons[i].on_click(lambda i=i: buttons[i].colour = RED).

"Structural pattern matching" for Python, part 2

Posted Sep 10, 2020 13:21 UTC (Thu) by mathstuf (subscriber, #69389) [Link]

Ah, I see now what you're doing. Yes, that's a nifty trick (assuming you can guarantee that you're only using rebound variables rather than holding a window into a variable someone else eventually mutates). But it's Python and I consider that kind of thinking just necessary for making robust software in that language.

"Structural pattern matching" for Python, part 2

Posted Sep 4, 2020 11:31 UTC (Fri) by gray_-_wolf (subscriber, #131074) [Link]

Given that the same variable in the match multiple times is illegal, how is one supposed to match things like Point with x == y?

I would have assume that

match p:
case (x, x):

would match points where p.x == p.y. However that does not seem to be the case. How can I express such a match?

Second question, if constants must be namespaced, is there no way to express something like:

x = 1
match p:
case (x, y):

For points that are `(1, .*)`? Is there something like `local` namespaces for local variables in python?

I'm not a python programmer, so I might be lacking some parts of the proper mindset, but it seems... limiting.

"Structural pattern matching" for Python, part 2

Posted Sep 4, 2020 15:54 UTC (Fri) by ecree (guest, #95790) [Link]

Apparently the PEP authors decided to punt on what they call "algebraic matching" of repeated names.

IMNSHO, if this PEP isn't doing algebraic matching then it's really far from clear what it's for, and I'm tempted to side with those who say this creature is being feeped purely for the sake of feeping.

"Structural pattern matching" for Python, part 2

Posted Sep 6, 2020 11:35 UTC (Sun) by kleptog (subscriber, #1183) [Link]

An alternative to algebraic pattern matching is allowing extra conditional, like in elixir (which also offers algebraic matching). Then it could look like:
match size:
    case (x,y) if x == y:
        circle()
    case (x,y):
        ellipse()
The whole power of such a match statement is that you can match entire nested structures, which this PEP does offer. An optional extra if clause for the more fancy cases can help here.

"Structural pattern matching" for Python, part 2

Posted Sep 7, 2020 16:45 UTC (Mon) by ecree (guest, #95790) [Link]

On thinking more about this, I have a fairly fundamental question, which I didn't see raised anywhere:

Why can't this be done as a library? Why is a language feature needed?

Something like

    import matching

    # ...

    if m := matching.match(p, (matching.capture('x'), matching.capture('x')):
        print("Diagonal at %d" % (m.x,))
    elif m := matching.match(p, (0, matching.capture('y')):
        print("X-axis as %d" % (m.y,))
    elif m := matching.match(p, (matching.capture('x'), matching.capture('y')):
        print("Point at (%d,%d)" % (m.x, m.y))
    else:
        raise ValueError("Not a 2-tuple")
    return Point(*p)
Yes, it's more verbose than the 'match' statement, but mainly because of the length of names, and still much more concise and declarative than doing everything with isinstance() and len(). And it makes the difference between constants and capture variables totally explicit!

"Structural pattern matching" for Python, part 2

Posted Sep 8, 2020 7:21 UTC (Tue) by johill (subscriber, #25196) [Link]

You can probably use some more "magic" to make that shorter, something like
  from matching import match, capture

  with match(p) as matches:
    if m := matches((capture.x, capture.y)):
      print("Diagonal at %d" % (m.x, ))
    elif m := matches((0, capture.y)):
      print("X-axis at %d" % (m.y, ))
    elif m := matches((capture.x, capture.y)):
      print("Point at (%d, %d)" % (m.x, m.y))
    else:
      raise ValueError("Not a 2-tuple")
  return Point(*p)
You need something like
class CapturedVar:
  def __init__(self, name):
    self.name = name

class capture:
  def __getattr__(self, name):
    return CapturedVar(name)
and "match(...)" returns an object that is callable; the called method knows about CapturedVar instances and bumps those into the returned MatchedVars object ("m").

"Structural pattern matching" for Python, part 2

Posted Sep 8, 2020 7:29 UTC (Tue) by johill (subscriber, #25196) [Link]

And maybe CapturedVar() can be callable again, returning another CapturedVar() instance with a type attached, so you can write
  from matching import match, capture

  with match(p) as matches:
    if m := matches((capture.x(int), capture.y(int))):
      print("Diagonal at %d" % (m.x, ))
    elif m := matches((0, capture.y(int))):
      print("X-axis at %d" % (m.y, ))
    elif m := matches((capture.x(int), capture.y(int))):
      print("Point at (%d, %d)" % (m.x, m.y))
    elif m := matches((capture.x, capture.y)):
      raise ValueError("It's a tuple, but not with integers")
    else:
      raise ValueError("Not a 2-tuple")
  return Point(*p)

"Structural pattern matching" for Python, part 2

Posted Sep 8, 2020 7:34 UTC (Tue) by johill (subscriber, #25196) [Link]

Sorry, obviously in both cases the first if was meant to be
    if m := matches((capture.x(int), capture.x(int))):
      print("Diagonal at %d" % (m.x, ))
And that of course shows that the matches() call needs to be fairly smart about disentangling instances of CapturedVar() etc. But I believe it can be done.

"Structural pattern matching" for Python, part 2

Posted Sep 8, 2020 16:15 UTC (Tue) by ecree (guest, #95790) [Link]

> But I believe it can be done.

I agree. And that's a clever idea of yours to use a context manager to get the 'only mention p once' semantics.

"Structural pattern matching" for Python, part 2

Posted Sep 8, 2020 17:39 UTC (Tue) by johill (subscriber, #25196) [Link]

I also just realized that I sort of sketched the basics of "algebraic matching", where the original PEP completely bypassed that issue and (x, x) doesn't even do what seems obvious? Or am I reading that wrong?

Of course some more complex expressions would be ... hard. Probably not impossible, but hard. You'd have to make the CapturedVar() be able to track all kinds of things, if you wanted to be able to write

...
    if m := matches((capture.x(int), capture.x(int)**2)):
        print("Integer point on the parabola at %d" % m.x)
...
It still seems possible, but ... tricky to say the least. Effectively, "CapturedVar" would have to be able to be operated on in a symbolic fashion, always returning a new CapturedVar object that encapsulates the expression, and can later evaluate it. So given an instance
  x_square := CapturedVar('x') ** 2
you'd have to be able to calculate
x_square(7)
so you can compare to
capture.x(7)
.

But you can see the ambiguity here! I previously allowed

capture.x(int)
to indicate you wanted a specific type ...

This would be better if the types were to rely on type annotations, but I don't think you can do that in this fashion.

I suppose you could detect if the argument was a type or an instance or something? But tricky to differentiate ...

Oh, I have an idea. The

x_square(7)
calculation is purely internal, so we can do it as
x_square(value=y)
(say the signature is "__call__(self, type=None, value=None)" so you can differentiate more easily. The value passing syntax is only needed internally to evaluate when you have a given thing in hand.

Well, basically ... it get complex, but I haven't hit anything I couldn't do.

Now I'm tempted to actually implement it :-)

"Structural pattern matching" for Python, part 2

Posted Sep 8, 2020 18:33 UTC (Tue) by johill (subscriber, #25196) [Link]

But actually...

while all of that works nicely with tuples because you can build some knowledge of them into the code, it really doesn't work at all with what the PEP calls "Class Patterns".

You can't write "m := matches(Point(capture.x, capture.y))" because you quite probably cannot instantiate a Point() object with two CapturedVar instances.

so you'd have to rewrite that as something like "m := matches_instance(Point, capture.x, capture.y)" at which point (pun intended) new syntax seems to make sense...

"Structural pattern matching" for Python, part 2

Posted Sep 8, 2020 23:39 UTC (Tue) by ecree (guest, #95790) [Link]

because you quite probably cannot instantiate a Point() object with two CapturedVar instances.

But can't you, though?

As long as CapturedVar suitably 'ducks' its type (e.g. for ints, implementing + and **, like you suggested), Point.__init__ should be entirely happy with it unless it's trying to do things that depend on the actual values.

Sure, it's not something that will work for every class, but nor is the rest of the "Match Protocol" in PEP 622. And if the class has a __match_args__, then you could potentially have a matching.Duck that uses that to "play the rôle of" the class in a match expression:

from matching import match, capture, Duck

with match(p) as matches:
    if m := matches(Duck(Point)(capture.x(int), capture.x(int))):
        print("Diagonal at %d" % (m.x, ))
    elif m := matches(capture.p(Point)(capture.x(int), capture.y(int))):
        # the second call on capture.p implies a Duck
        # if the match succeeds, it is replaced with the concrete Point
        print("Other point at %s % (m.p, ))

So you'd use plain Point() for dataclasses and anything else that's not too fussy, Duck(Point)() otherwise.

Still not quite as pretty as the PEP 622 syntax, but it still seems like a good way to get experience with "how this feature would be used in practice" before baking it into the language. Which to my mind is reason enough to do it.

"Structural pattern matching" for Python, part 2

Posted Sep 9, 2020 6:22 UTC (Wed) by johill (subscriber, #25196) [Link]

> As long as CapturedVar suitably 'ducks' its type (e.g. for ints, implementing + and **, like you suggested), Point.__init__ should be entirely happy with it unless it's trying to do things that depend on the actual values.

I don't think that'll work. It has to implement the algebraic operations to build a parse tree to be able to do algebraic matching, but you still can't really call Point.__init__ with it I'd think, it might check the type, or other random things in there. Hence the match protocol though!

> Sure, it's not something that will work for every class, but nor is the rest of the "Match Protocol" in PEP 622. And if the class has a __match_args__, then you could potentially have a matching.Duck that uses that to "play the rôle of" the class in a match expression: [...]

But yeah, you're right, the PEP also cannot do magic wrt. class matching, it has __match_args__ for that, and indeed with something like the "Duck()" you propose - or what I wrote as "m := matches_instance(Point, capture.x, capture.y)" - you can indeed do the same thing.

I neglected to think enough about it I suppose, but (fairly obviously) the only magic in the PEP can be the syntax, making "Point(x, y)" not be an instantiation call but some other kind of expression. Without new syntax we have to open that up manually by doing "Duck(Point)(capture.x, capture.y)" or my "m := matches_instance(Point, capture.x, capture.y)".

The rest ... yeah can probably be done, now that I think about it. In mostly the same ways.

And that shows to some extent where and how the algebraic matching breaks down and probably why they punted on it, though I'm not convinced it's that bad. Obviously, you don't want to get into the territory of writing "Point(x**2, x**3)" and suddenly needing an arbitrary solver to do the match, but I think you could do it if you restrict to having to mention each variable "naked" at least once. I.e. "Point(x, x**3)" would be OK and "Point3d(x, y, x+y)" would also be OK - all of this because once you have values for x/y it's easy to check the remaining expressions. But something like "Point(x+y, x-y)" would not be allowed.

"Structural pattern matching" for Python, part 2

Posted Sep 9, 2020 13:37 UTC (Wed) by milesrout (subscriber, #126894) [Link]

At that point, how is this better than

(x, y) = size
if x == y:
    circle()
else:
    ellipse()
I would contend that it is not. It is an extra level of indentation and a whole new complicated language construct.

"Structural pattern matching" for Python, part 2

Posted Sep 9, 2020 22:16 UTC (Wed) by johill (subscriber, #25196) [Link]

Your version crashes if you pass it a 3-tuple or something else.

The match/case version can handle such things without you having to put another "is a 2-tuple" if outside of it.

"Structural pattern matching" for Python, part 2

Posted Sep 10, 2020 2:12 UTC (Thu) by milesrout (subscriber, #126894) [Link]

That's quite intentional. If you are passing the wrong type of value to a function then your code should crash. The code will also crash if I pass a string to a function that expects an integer. Code should not be full of defensive checks for programmer error: they're code paths that are hard if not impossible to test and by design are intended to never be run (nobody makes programmer errors on purpose). Type checks are for dispatching based on type in Python, and most of the time 'isinstance' checks are the wrong way of doing things. "if isinstance elif isinstance elif isinstance"-style code is considered unPythonic, and I don't think that papering over that code with a language construct makes it a good idea.

Now there are possibly some cases where you want to be able to pass different sizes of tuples to a single function. However I think in most of those cases you probably shouldn't be using tuples, you should be using something like a named tuple, a dictionary, a custom class, a dataclass with an optional field, or just separate arguments to a single function, some of which are optional.

Or possibly you should actually be making a decision 'am I working with 2-tuples or 3-tuples?' at a higher level and switch. If you had some code that is meant to work with 2D or 3D or 4D vectors for example, it would be better to have separate functions for the length of 2D, 3D and 4D vectors than a single function that does


    def veclen(v):
      match v:
        case (x,y): return sqrt(x*x + y*y)
        case (x,y,z): return sqrt(x*x + y*y + z*z)
        case (x,y,z,w): return sqrt(x*x + y*y + z*z + w*w)

Then you should make the determination as to whether you are working with 2D, 3D or 4D vectors at a high level and write all the functions dealing with 2D, 3D or 4D vectors separately.

For example if you really want to write 'length-agnostic' code with tuples up to some maximum length, then you should encapsulate that variation in size in Vector2D, Vector3D and Vector4D etc. classes with a common Vector superclass such that Vector's methods can be those that are parametric in the dimension.

And if it's really truly length-agnostic tuple code then you should be writing loops over lists, not matching on tuples.

"Structural pattern matching" for Python, part 2

Posted Sep 11, 2020 6:29 UTC (Fri) by johill (subscriber, #25196) [Link]

Well, that's a question of your architecture/design, but a large part of the whole point of the match statement was to be able to match different types in the same place.

"Structural pattern matching" for Python, part 2

Posted Sep 11, 2020 12:53 UTC (Fri) by milesrout (subscriber, #126894) [Link]

And what I am saying, quite clearly I think, is that I do not think that that is the right thing to be doing in Python code. It's not supported by the language *for good reason*.

"Structural pattern matching" for Python, part 2

Posted Sep 11, 2020 13:27 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

> It's not supported by the language *for good reason*.

This feels like one of those "cannot derive ought from is" statements. Is there any indication that something better than `isinstance` if-trees was explicitly rejected earlier in the language's design process? I'd expect the previous approach was "just treat it like a duck and if it quacks well enough, what do you care?", but with types now becoming more and more of a thing in general (even in Python), this seems like a natural thing to be considering around this time (to me at least).

"Structural pattern matching" for Python, part 2

Posted Sep 12, 2020 4:30 UTC (Sat) by milesrout (subscriber, #126894) [Link]

>Is there any indication that something better than `isinstance` if-trees was explicitly rejected earlier in the language's design process?

Yes. Switch statement suggestions have been rejected over and over again. Pattern matching proposals have been rejected before. 'if isinstance' isn't considered bad Python because it's *ugly* but because it's *bad design*. Dispatching on types is just not a good way to write code in a dynamically typed language. There is no - and can be no - exhaustiveness checking for example, to give just one example of why it's a problem.

>but with types now becoming more and more of a thing in general (even in Python), this seems like a natural thing to be considering around this time (to me at least).

Adding type annotations to Python is yet another example of the continual application of bad ideas to clutter up Python, done by people trying to roleplay as Haskell programmers. Pattern matching is the same thing. It's blatant cargo culting.

"Structural pattern matching" for Python, part 2

Posted Sep 16, 2020 16:02 UTC (Wed) by HelloWorld (guest, #56129) [Link]

> Dispatching on types is just not a good way to write code in a dynamically typed language. There is no - and can be no - exhaustiveness checking for example, to give just one example of why it's a problem.

That problem has nothing to do with pattern matching and everything to do with dynamic typing, which is quite simply an inherently bad idea.

"Structural pattern matching" for Python, part 2

Posted Sep 9, 2020 13:37 UTC (Wed) by milesrout (subscriber, #126894) [Link]

Python used to have simple and pleasant syntax. With async/await, := assignment and now this, the syntax seems to be getting significantly more complex with every new version of the language for very little benefit.

"Structural pattern matching" for Python, part 2

Posted Sep 15, 2020 13:28 UTC (Tue) by udifuchs (guest, #141354) [Link]

The syntax that I find confusing is the class pattern matching:

match pt:
case Point2D(x, y):

It looks like the class is being initiated.
I was wondering if using square brackets would make more sense,
as it is similar to the syntax used for type annotations.
For example:

match pt:
case Point2D[x, y]:

One issue with this idea is that the square bracket use might be confused with
item access. In enum, for example, this is a valid syntax:

>>> Sides["SPAM"] == Sides.SPAM == Sides("Spam")
True

Resolving this conflict would probably require not allowing such item access
inside a case statement.

"Structural pattern matching" for Python, part 2

Posted Sep 16, 2020 15:59 UTC (Wed) by HelloWorld (guest, #56129) [Link]

> case Point2D(x, y):
>
> It looks like the class is being initiated.

No, it doesn't, because there is a case keyword right before it, it's pretty much impossible to miss. If the reader doesn't know what that means and starts making wild guesses, the problem isn't with the language but with the reader.

"Structural pattern matching" for Python, part 2

Posted Sep 18, 2020 4:23 UTC (Fri) by udifuchs (guest, #141354) [Link]

I know that a computer parser could easily understand this syntax. Still, as a human, I find it confusing. Most keywords in python are followed by an expression. The case keyword is followed by a pattern that has a syntax similar to an expression. This is confusing.

The PEP mentions that assignment targets in python also look like expressions, but this is true in a very narrow way. It is a syntax error to write 'int(x) = 3', while 'case int(x):' is valid.

More specifically, while reading the is_tuple() example in the PEP, it took me a while to realize that LParen() is initialized in the original code but not in the new code. If LParen.__init__() has some side effects, the two versions of the code that look very similar could be very different.


Copyright © 2020, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds