Late-bound argument defaults for Python

By Jake Edge
November 10, 2021

Python supports default values for arguments to functions, but those defaults are evaluated at function-definition time. A proposal to add defaults that are evaluated when the function is called has been discussed at some length on the python-ideas mailing list. The idea came about, in part, due to yet another resurrection of the proposal for None-aware operators in Python. Late-bound defaults would help with one use case for those operators, but there are other, stronger reasons to consider their addition to the language.

In Python, the defaults for arguments to a function can be specified in the function definition, but, importantly, they are evaluated in the scope where the function is defined. So default arguments cannot refer to other arguments to the function, as those are only available in the scope of the function itself when it gets called. For example:

    def foo(a, b = None, c = len(a)):
        ...

That definition specifies that a has no default, b defaults to None if no argument gets passed for it, and c defaults to the value of len(a). But that expression will not refer to the a in the argument list; it will instead look for an a in the scope where the function is defined. That is probably not what the programmer intended. If no a is found, the function definition will fail with a NameError.
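
A minimal sketch of that behavior in current Python, meant to be run as a standalone script and assuming that no name a exists in the defining scope when the first definition runs; definition_error and length_seen are names invented for the illustration:

```python
# Part 1: with no `a` in the defining scope, the def statement itself fails,
# because the default expression len(a) is evaluated at definition time.
try:
    def foo(a, b=None, c=len(a)):
        return c
except NameError as exc:
    definition_error = exc
else:
    definition_error = None

# Part 2: once an outer `a` exists, the default is computed from it, once.
a = "xyz"

def foo(a, b=None, c=len(a)):
    return c

# The argument "hello" has no influence on the default for c.
length_seen = foo("hello")
print(type(definition_error).__name__, length_seen)
```

The second call shows the subtler failure mode: the definition succeeds, but c silently takes its default from the outer a rather than from the argument.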

Idea

On October 24, Chris Angelico introduced his proposal for late-bound arguments. He used an example function, derived from the bisect.bisect_right() function in the standard library, to demonstrate the idea. The function's arguments are specified as follows:

    def bisect(a, x, lo=0, hi=None):

He notes that there is a disparity between lo and hi: "It's clear what value lo gets if you omit it. It's less clear what hi gets." Early in his example function, hi is actually set to len(a) if it is None. Effectively None is being used as a placeholder (or sentinel value) because Python has no way to directly express the idea that hi should default to the length of a. He proposed new syntax to identify hi as a late-bound argument:

    def bisect(a, x, lo=0, hi=:len(a)):

The "=:" would indicate that if no argument is passed for hi, the expression would be evaluated in the context of the call and assigned to hi before any of the function's code is run. It is interesting to note that the documentation for bisect.bisect_right() linked above looks fairly similar to Angelico's idea (just lacking the colon) even though the actual code in the library uses a default value of None. It is obviously useful to know what the default will be without having to dig into the code.
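
For comparison, here is a sketch of how the placeholder idiom looks today. The body is a simplified binary search written for illustration, not the standard library's code; it only aims to show where the real default for hi ends up.

```python
import inspect

def bisect(a, x, lo=0, hi=None):
    """Return the insertion point for x in sorted list a, after any equal items."""
    # The real default lives here, out of sight of the signature.
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(bisect([1, 2, 4, 4, 7], 4))
print(inspect.signature(bisect))
```

The printed signature reports hi=None, which says nothing about len(a); that is the gap the proposal is meant to close.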

In his post, Angelico said that in cases where None is a legitimate value, there is another way to handle the default, but it also obscures what the default will be:

And the situation only gets uglier if None is a valid argument, and a unique sentinel is needed; this standard idiom makes help() rather unhelpful:

    _missing = object()
    def spaminate(thing, count=_missing):
        if count is _missing: count = thing.getdefault()

Proposal: Proper syntax and support for late-bound argument defaults.

    def spaminate(thing, count=:thing.getdefault()):
        ...

[...]

The purpose of this change is to have the function header define, as fully as possible, the function's arguments. Burying part of that definition inside the function is arbitrary and unnecessary.
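
The help() complaint is easy to reproduce with the introspection tools that help() builds on. In the sketch below, Thing and its getdefault() method are hypothetical stand-ins for whatever object spaminate() would receive.

```python
import inspect

_missing = object()  # unique sentinel, so that None stays a usable argument

class Thing:
    """Hypothetical object that knows its own default count."""
    def getdefault(self):
        return 3

def spaminate(thing, count=_missing):
    if count is _missing:
        count = thing.getdefault()
    return count

signature_text = str(inspect.signature(spaminate))
print(signature_text)            # shows the sentinel's repr, not the real default
print(spaminate(Thing()))        # falls back to thing.getdefault()
print(spaminate(Thing(), None))  # None is passed through untouched
```

The signature contains something like count=<object object at 0x...>, which tells a reader nothing about thing.getdefault().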

The first order of business in these kinds of discussions is the inevitable bikeshedding about how the operator is spelled. Angelico chose a "deliberately subtle" syntax, noting that in many cases it will not matter when the argument is bound. It is visually similar to the walrus operator (":="), but that is not legal in a function definition, so there should be no ambiguity, he said.

Ethan Furman liked the idea but would rather see a different operator (perhaps "?=") used because of the potential confusion with the walrus operator. Guido van Rossum was also in favor of the feature, but had his spelling suggestion as well:

I like that you're trying to fix this wart! I think that using a different syntax may be the only way out. My own bikeshed color to try would be `=>`, assuming we'll introduce `(x) => x+1` as the new lambda syntax, but I can see problems with both as well :-).

New syntax for lambda expressions has also been discussed, with most settling on "=>" as the best choice, in part because "->" is used for type annotations; some kind of "arrow" operator is commonly used in other languages for defining anonymous functions. Several others were similarly in favor of late-bound defaults and many seemed to be happy with Van Rossum's spelling, but Brendan Barnwell was opposed to both; he was concerned that it would "encourage people to cram complex expressions into the function definition". Since it would only truly be useful—readable—for a simpler subset of defaults, it should not be added, he said. Furthermore:

To me, this is definitely not worth adding special syntax for. I seem to be the only person around here who detests "ASCII art" "arrow" operators but, well, I do, and I'd hate to see them used for this. The colon or alternatives like ? or @ are less offensive but still too inscrutable to be used for something that can already be handled in a more explicit way.

But Steven D'Aprano did not think that the addition of late-bound defaults would "cause a large increase in the amount of overly complex default values". Angelico was also skeptical that the feature was some sort of bad-code attractant. "It's like writing a list comprehension; technically you can put any expression into the body of it, but it's normally going to be short enough to not get unwieldy." In truth, any feature can be abused; this one does not look to them to be particularly worse in that regard.

PEP 671

Later that same day, Angelico posted a draft of PEP 671 ("Syntax for late-bound function argument defaults"). In it, he adopted the "=>" syntax, though he noted a half-dozen other possibilities. He also fleshed out the specification of the default expression and some corner cases:

The expression is saved in its source code form for the purpose of inspection, and bytecode to evaluate it is prepended to the function's body.

Notably, the expression is evaluated in the function's run-time scope, NOT the scope in which the function was defined (as are early-bound defaults). This allows the expression to refer to other arguments.

Self-referential expressions will result in UnboundLocalError::

    def spam(eggs=>eggs): # Nope

Multiple late-bound arguments are evaluated from left to right, and can refer to previously-calculated values. Order is defined by the function, regardless of the order in which keyword arguments may be passed.
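
Because => is only proposed syntax, the left-to-right rule can at best be approximated with today's idiom. The sketch below assumes a hypothetical header def window(data, start=>0, end=>len(data), size=>end - start) and spells out by hand the checks that the PEP describes as being prepended to the function body; window and _UNSET are invented names.

```python
_UNSET = object()  # stands in for "no argument was passed"

def window(data, start=_UNSET, end=_UNSET, size=_UNSET):
    # Defaults are filled in definition order, whatever order the caller
    # used for keyword arguments, so later ones can use earlier ones.
    if start is _UNSET:
        start = 0
    if end is _UNSET:
        end = len(data)
    if size is _UNSET:
        size = end - start
    return start, end, size

print(window("abcdef"))
print(window("abcdef", size=2, start=1))
```

Passing size before start in the second call changes nothing: end is still computed from data, in the order the function defines.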

But one case, which had been raised by Ricky Teachey in the initial thread, was discussed at some length when Jonathan Fine asked about the following function definition:

    def puzzle(*, a=>b+1, b=>a+1):
        return a, b

Angelico was inclined to treat that as a syntax error, "since permitting it would open up some hard-to-track-down bugs". Instead it could be some kind of run-time error in the case where neither argument is passed, he said. He is concerned that allowing "forward references" to arguments that have yet to be specified (e.g. b in a=>b+1 above) will be confusing and hard to explain. D'Aprano suggested handling early-bound argument defaults before their late-bound counterparts and laid out a new process for argument handling that was "consistent and understandable". In particular, he saw no reason to make some kinds of late-bound defaults into a special case:

Note that step 4 (evaluating the late-bound defaults) can raise *any* exception at all (it's an arbitrary expression, so it can fail in arbitrary ways). I see no good reason for trying to single out UnboundLocalError for extra protection by turning it into a syntax error.

Angelico noted that it was still somewhat difficult for even experienced Python programmers to keep straight, but, in addition, he had yet to hear of a real use case. Erik Demaine offered two examples, "though they are a bit artificial"; he said that simply evaluating the defaults in left-to-right order (based on the function definition) was reasonably easy to understand. Angelico said that any kind of reordering of the evaluation was not being considered; as he sees it:

The two options on the table are:

1) Allow references to any value that has been provided in any way
2) Allow references only to parameters to the left

Option 2 is a simple SyntaxError on compilation (you won't even get as far as the def statement). Option 1 allows everything all up to the point where you call it, but then might raise UnboundLocalError if you refer to something that wasn't passed.

The permissive option allows mutual references as long as one of the arguments is provided, but will give a peculiar error if you pass neither. I think this is bad API design.
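
The permissive option can be imitated in current Python to see the "peculiar error" first-hand. This is only an emulation under assumptions: the sentinel _UNSET and the del statements are a trick to leave a parameter unbound, which is roughly the state described for an argument that was not passed.

```python
_UNSET = object()

def puzzle(*, a=_UNSET, b=_UNSET):
    a_missing = a is _UNSET
    b_missing = b is _UNSET
    # Unbind whatever was not passed, so that reading it fails the same way
    # an unassigned local would.
    if a_missing:
        del a
    if b_missing:
        del b
    # Emulated late-bound defaults, evaluated left to right.
    if a_missing:
        a = b + 1
    if b_missing:
        b = a + 1
    return a, b

print(puzzle(a=1))
print(puzzle(b=1))
try:
    puzzle()
except UnboundLocalError as exc:
    print("puzzle() failed:", type(exc).__name__)
```

With one argument supplied, the other is derived from it; with neither, the first default reads an unbound b and the call fails at run time.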

Van Rossum pointed out that the syntax-error option would break new ground: "Everywhere else in Python, undefined names are runtime errors (NameError or UnboundLocalError)." Angelico sees the error in different terms, though, noting that mismatches in global and local scope are a syntax error; he gave an example:

>>> def spam():
...     ham
...     global ham
...
  File "<stdin>", line 3
SyntaxError: name 'ham' is used prior to global declaration

He also gave a handful of function definitions using the new feature that differed only subtly from one another; he was concerned about the "bizarre inconsistencies" that can arise, because they "are difficult to explain unless you know exactly how everything is implemented internally". He would prefer to see real-world use cases for the feature to decide whether it should be supported at all, but was adamant that the strict left-to-right interpretation was easier to understand:

If this should be permitted, there are two plausible semantic meanings for these kinds of constructs:

1) Arguments are defined left-to-right, each one independently of each other
2) Early-bound arguments and those given values are defined first, then late-bound arguments

The first option is much easier to explain [...]

D'Aprano explained that the examples cited were not particularly hard to understand and fell far short of the "bizarre inconsistencies" bar. There is a clear need to treat the early-bound and late-bound defaults differently:

However there is a real, and necessary, difference in behaviour which I think you missed:
    def func(x=x, y=>x)  # or func(x=x, @y=x)
The x=x parameter uses global x as the default. The y=x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:
    def method(self, x=>self.attr)  # @x=self.attr

    def bisect(a, x, lo=0, hi=>len(a))  # @hi=len(a)
Without that difference in behaviour, probably fifty or eighty percent of the use-cases are lost. (And the ones that remain are mostly trivial ones of the form arg=[].) So we need this genuine inconsistency.
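
The difference D'Aprano describes can be seen with today's tools by placing an early-bound default next to a hand-written emulation of the late-bound one. The function names early and late, and the sentinel _UNSET, are assumptions made for this sketch.

```python
_UNSET = object()
x = "global"

def early(x=x, y=x):
    # Both defaults were computed once, from the outer x, at definition time.
    return x, y

def late(x=x, y=_UNSET):
    # Emulates the proposed y=>x: y falls back to the local x at call time.
    if y is _UNSET:
        y = x
    return x, y

print(early("arg"))
print(late("arg"))
```

Only the emulated late-bound version lets y follow the x that the caller actually passed.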

As can be seen, D'Aprano prefers a different color for the bikeshed: using "@" to prepend late-bound default arguments. He also said that Angelico had perfectly explained the "harder to explain" option in a single sentence; both are equally easy to explain, D'Aprano said. Beyond that, it does not make sense to "prohibit something as a syntax error because it *might* fail at runtime". In a followup message, he spelled that out further:

We don't do this:
    y = x+1  # Syntax error, because x might be undefined
and we shouldn't make this a syntax error
    def func(@spam=eggs+1, @eggs=spam-1):
either just because `func()` with no arguments raises. So long as you pass at least one argument, it works fine, and that may be perfectly suitable for some uses.

Winding down

While many of the participants in the threads seem reasonably happy with the idea, or at least neutral on it, there is some difference of opinion on the details, as noted above. But several thread participants are looking for a more general "deferred evaluation" feature, and are concerned that late-bound argument defaults will preclude the possibility of adding such a feature down the road. Beyond that, Eric V. Smith wondered about how late-bound defaults would mesh with Python's function-introspection features. Those parts of the discussion got a little further afield from Angelico's proposal, so they merit further coverage down the road.

At first blush, Angelico's idea to fix this "wart" in Python seems fairly straightforward, but the discussion has shown that there are multiple facets to consider. It is not quite as simple as "let's add a way to evaluate default arguments when the function is called"—likely how it was seen at the outset. That is often the case when looking at new features for an established language like Python; there is a huge body of code that needs to stay working, but there are also, sometimes conflicting, aspirations for features that could be added. It is a tricky balancing act.

As with many python-ideas conversations, there were multiple interesting sub-threads, touching on language design, how to teach Python (and this feature), how other languages handle similar features (including some discussion of ALGOL thunks), the overall complexity of Python as it accretes more and more features, and, of course, additional bikeshedding over the spelling. Meanwhile, Angelico has been working on a proof-of-concept implementation, so PEP 671 (et al.) seems likely to be under discussion for some time to come.


Late-bound argument defaults for Python

Posted Nov 10, 2021 18:26 UTC (Wed) by smurf (subscriber, #17840) [Link]

This would *finally* allow us to default arguments to empty lists, dictionaries, or in fact any other new-and-empty object.

Better late than never.

Late-bound argument defaults for Python

Posted Nov 10, 2021 20:42 UTC (Wed) by benhoyt (subscriber, #138463) [Link]

> Better late than never.

I see what you did there. :-)

My question: what about not using new syntax? I get that it wouldn't be backwards compatible, but it could be done on a file-by-file basis with a "from __future__ import late_bound_defaults", avoiding the need for new syntax entirely. Was that discussed at all?

Late-bound argument defaults for Python

Posted Nov 10, 2021 22:40 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

That loses the ability to default arguments to some computation in the defining scope:

def some_functor(blah):
    dflt = something_expensive(blah)
    def f(d=dflt):
        pass
    return f

Late-bound argument defaults for Python

Posted Nov 11, 2021 1:10 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

Here is a more straightforward example of that use case:

funcs = []
for i in range(4):
    funcs.append(lambda x=i: x)
print([f() for f in funcs])  # [0, 1, 2, 3]

If you write the seemingly-obvious lambda: i, you get [3, 3, 3, 3] instead, because at the moment the function is actually called, i=3. In effect, we use one wart in the language to cancel out another.

Late-bound argument defaults for Python

Posted Nov 11, 2021 8:03 UTC (Thu) by epa (subscriber, #39769) [Link]

Why would that ability be lost? Since dflt isn't in scope at call time, it can be bound at compile time instead. There would only be an ambiguity if you had an argument called dflt also.

Late-bound argument defaults for Python

Posted Nov 11, 2021 15:58 UTC (Thu) by nybble41 (subscriber, #55106) [Link]

That was my first thought as well, but in Python unknown / unassigned variables default to being local, not free variables with lexical scope as in most other languages. In the absence of a "nonlocal" declaration, which is not possible here due to the syntax, the "dflt" in the default argument expression is merely an uninitialized local variable and would not resolve to the variable of the same name in the enclosing function.

Late-bound argument defaults for Python

Posted Nov 12, 2021 0:16 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

> That was my first thought as well, but in Python unknown / unassigned variables default to being local, not free variables with lexical scope as in most other languages.

It's more complicated than that.

In Python, *most* interesting things happen at runtime, but name resolution is one of the few things that actually happens at compile time (I believe for performance reasons). The compiler looks at each variable, and follows a process like* this (first matching rule wins):

0. If the scope has an explicit global/nonlocal statement, then the variable is interpreted as such and bytecode is emitted accordingly.
1. If, anywhere in the scope, there's an assignment, then the variable is local to that scope, and we emit LOAD_FAST/STORE_FAST bytecode.
2. If a variable of the same name exists in an enclosing function (not class) scope, then it's a non-local or "closed over" variable. We emit complicated bytecode which sets up a closure. Closure variables are looked up by name at the time the closure is executed, so if the enclosing function rebinds the variable before returning, the closure will observe the new binding. This is different to how function parameters work, and so is a common source of confusion. If you're used to C++ closures, this is roughly equivalent to using [&] instead of [=], automatically, on every closure, without any option of doing it differently.
3. If none of the above rules applies, then it's a global or a builtin, and we emit LOAD_GLOBAL (we do not emit STORE_GLOBAL, because rule 1 or rule 0 would have applied in that case). LOAD_GLOBAL checks for globals and then for builtins at runtime.

Corollary: The set of variables in each non-global scope is fixed at compile time, because we have to emit the correct bytecode in order for a variable to be looked up in any non-global scope. You cannot add new variables to a non-global scope at runtime, and trying to evaluate a non-global variable before you assign to it raises UnboundLocalError instead of looking for a global variable of the same name.

In the specific case shown, since there is no assignment involved (i.e. the default value expression does not involve the walrus operator), I assume that the late-binding logic would generate a closure under rule 2 (and if it did not, then IMHO that would be a pretty egregious bug). The resulting bytecode would be ugly, but it should work correctly. However, if some_functor() had rebound dflt to some other value before returning (for example, if dflt were a loop variable), and we were using late-binding defaults, then the late-binding would very likely get the new value and not the value that was originally computed by something_expensive(blah). This is probably not what the programmer intended to happen, and in fact early-binding defaults are commonly used to work around this problem.

* I did not actually pull up the CPython source code, so it's possible that I have missed a case or oversimplified something.
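
The compile-time classification described in the rules above can be observed through code-object attributes, which avoids depending on exact opcode names (those vary between CPython versions). A sketch, with all function names invented for illustration and the code assumed to run at module level:

```python
counter = 0

def reads_global():
    return counter            # no assignment here: resolved as a global

def assigns_later():
    value = counter           # fails: the assignment below makes counter local
    counter = value + 1
    return counter

def make_closure():
    counter = 10
    def inner():
        return counter        # closed-over variable from make_closure
    counter = 20              # rebinding is visible to inner() afterwards
    return inner

print(reads_global.__code__.co_names)      # names looked up as globals/builtins
print(assigns_later.__code__.co_varnames)  # names compiled as locals
inner = make_closure()
print(inner.__code__.co_freevars, inner())

try:
    assigns_later()
except UnboundLocalError as exc:
    print(type(exc).__name__)
```

The same name, counter, lands in three different categories depending only on what the compiler sees in each function body.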

Late-bound argument defaults for Python

Posted Nov 12, 2021 16:40 UTC (Fri) by nybble41 (subscriber, #55106) [Link]

Ah, so it depends on whether there is an assignment. Thank you for clearing that up. It still seems quite non-intuitive that adding an assignment later in the function (without an explicit "nonlocal" declaration) can completely change the scope of an earlier use of the variable, but at least the given example would work.

> However, if some_functor() had rebound dflt to some other value before returning (for example, if dflt were a loop variable), and we were using late-binding defaults, then the late-binding would very likely get the new value and not the value that was originally computed by something_expensive(blah). This is probably not what the programmer intended to happen …

Actually that is exactly how I would expect it to work, based on the behavior of free variables in other languages, e.g. Common Lisp or even Javascript. Closures capture the variable (a.k.a. the binding), not the specific value in the variable at the time the closure is created. Though my preference is for languages like Haskell (or Rust) where (shared) variables are always immutable so this situation doesn't come up. (Of course if the variable is something like an IORef or STRef then reading from it gives the most recent value, but in that case the effect is rather obvious and generally intentional.)

Of course, these other languages with closures and mutation also have support for explicit *local* (not just function) scope, so you can express whether you want to assign to an existing variable (CL: setf) or create a new binding (CL: let). Python has no block scope—apart from some special cases like generator & dict/set comprehension expressions—which makes it difficult to control the scope of free variables in a closure. Even within comprehensions like "[(lambda: i) for i in range(N)]" the variable is reused across the loop iterations rather than being freshly bound for each value, so this just gives a list of functions which all return N-1. To get the more intuitive result you would need something ugly like "[(lambda j: lambda: j)(i) for i in range(N)]" to emulate the missing local binding in the body of the loop.
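
The comprehension behavior and both workarounds can be checked directly. This is a small sketch; N and the list names are arbitrary choices for the illustration.

```python
N = 4

# Every lambda closes over the same variable i, which ends at N - 1.
shared = [(lambda: i) for i in range(N)]

# An extra function call creates a fresh binding j per iteration.
nested = [(lambda j: lambda: j)(i) for i in range(N)]

# The early-bound default idiom captures the value at definition time.
defaulted = [(lambda i=i: i) for i in range(N)]

print([f() for f in shared])
print([f() for f in nested])
print([f() for f in defaulted])
```

The third form is the early-binding trick mentioned elsewhere in this thread, and it is exactly the use that a blanket switch to late binding would take away.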

Late-bound argument defaults for Python

Posted Dec 1, 2021 18:12 UTC (Wed) by hellcat_coder (guest, #155524) [Link]

Can't we default arguments to empty lists, dictionaries or even objects as of now in functions? Or am I missing something?

Late-bound argument defaults for Python

Posted Dec 1, 2021 21:57 UTC (Wed) by mathstuf (subscriber, #69389) [Link]

Sure, you *can*. But they don't act like you think:

def f(p=[]):
    p.append(1)
    return p
print(f()) # [1]
print(f()) # [1, 1]

This is because the default is only created upon function definition, not upon function entry. And given the mutability of Python, you can fiddle with that default inadvertently.

Late-bound argument defaults for Python

Posted Nov 10, 2021 20:30 UTC (Wed) by keithp (subscriber, #5140) [Link]

Hah! Snek is ahead of the game — it only supports late-bound arguments as that required less code in the compiler and runtime.

    Welcome to Snek version 1.7
    > def foo(a, b = len(a)):
    +  return b
    + 
    > foo("hello")
    5

Now to decide if I care enough to go implement early-bound arguments.

Late-bound argument defaults for Python

Posted Nov 10, 2021 21:54 UTC (Wed) by khim (subscriber, #9252) [Link]

C++ supported late-bound defaults exclusively for more than a quarter-century.

Late-bound argument defaults for Python

Posted Nov 12, 2021 10:03 UTC (Fri) by Visse (subscriber, #145030) [Link]

Could this be achieved using decorators instead of adding new syntax?
Something like:
      @late_bound_arguments
      def foo(a, b = None, c = len(a)):
           ...
This would have the benefit of being more searchable than new syntax.

Late-bound argument defaults for Python

Posted Nov 12, 2021 10:51 UTC (Fri) by smurf (subscriber, #17840) [Link]

Short answer: No. Decorators aren't designed to do that and adding that capability to the CPython core would be an unholy mess.

Late-bound argument defaults for Python

Posted Nov 12, 2021 18:53 UTC (Fri) by malmedal (subscriber, #56172) [Link]

No, not something that references another argument or needs to be evaluated late.

What you can do is to write something such that this:
@late
def foo(a, b=[]):
    ...

would give b a new empty list on each invocation instead of reusing the original.

Late-bound argument defaults for Python

Posted Nov 12, 2021 20:00 UTC (Fri) by NYKevin (subscriber, #129325) [Link]

Even doing that is questionable, because the decorator does not "see" a code object that evaluates to an empty list. It just sees an empty list, and has to figure out how to manufacture a new empty list each time the function is called. So this would not work:

@late
def foo(x=bar()):
    ...

By the time @late receives control, bar() has already been called and has returned some value. There is no way that @late can plausibly figure out that it needs to call bar() again in the future. The best it can do is call something like copy.copy() on bar()'s return value, and hope that's close enough.
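
A sketch of what such a decorator could look like, and where it runs out of road. The decorator late is hypothetical (no such helper exists in the standard library); it uses inspect.signature() to find unused defaults and copy.copy() to hand out a fresh shallow copy of each.

```python
import copy
import functools
import inspect

def late(func):
    """Hypothetical decorator: hand out a shallow copy of each unused default."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, param in signature.parameters.items():
            if name not in bound.arguments and param.default is not param.empty:
                # All the decorator has is the already-evaluated object.
                bound.arguments[name] = copy.copy(param.default)
        return func(*bound.args, **bound.kwargs)

    return wrapper

@late
def foo(a, b=[]):
    b.append(a)
    return b

calls = []

def bar():
    calls.append("called")
    return []

@late
def baz(x=bar()):
    return x

print(foo(1), foo(2))   # separate lists, so no accumulation
baz()
baz()
print(len(calls))       # bar() ran once, at definition time
```

The empty-list case works, but bar() is never called again: the decorator can only copy the value it was given, which is the limitation described above.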

Late-bound argument defaults for Python

Posted Nov 12, 2021 20:41 UTC (Fri) by malmedal (subscriber, #56172) [Link]

Yes, that was what I meant by "or needs to be evaluated late". Perhaps that was a bit terse.

Late-bound argument defaults for Python

Posted Nov 13, 2021 11:48 UTC (Sat) by lobachevsky (subscriber, #121871) [Link]

Maybe I'm missing that from the discussion, but defining late bindings left to right in the function definition makes me somewhat unhappy when I think about keyword-only arguments.

For me the point of keyword-only arguments is that I don't need to know the order in which they appear in the function definition, but if they were to use late-bound arguments, the definition order would become important again. This seems like a potential footgun for API breakage if the definition order were to change.

Late-bound argument defaults for Python

Posted Nov 13, 2021 13:02 UTC (Sat) by smurf (subscriber, #17840) [Link]

Why? This is about default values, which by definition don't appear in your function call in the first place.

Presumably these get evaluated left-to-right *after* your arguments get bound. Anything else would make no sense whatsoever.

Late-bound argument defaults for Python

Posted Nov 17, 2021 12:33 UTC (Wed) by foom (subscriber, #14868) [Link]

With that definition, you could have funny functions like:
def f(a=:b+1, b=:a+1): ...

And it'd work if you call it like f(a=5) or f(b=5), but f() would return an unknown variable error.

Late-bound argument defaults for Python

Posted Nov 17, 2021 13:55 UTC (Wed) by smurf (subscriber, #17840) [Link]

Not really. Binding to function args should proceed strictly left-to-right, evaluating a =: clause iff the function call didn't pass a value for it.

Late-bound argument defaults for Python should focus on the source scope

Posted Nov 18, 2021 16:12 UTC (Thu) by ccurtis (guest, #49713) [Link]

I like the "@" syntax ("&" would also work for me), but the function signature proposed seems fundamentally flawed. In this statement:
The x=x parameter uses global x as the default. The y=x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:
    def method(self, x=>self.attr)  # @x=self.attr

    def bisect(a, x, lo=0, hi=>len(a))  # @hi=len(a)
This comment implies that a "global x" exists and so means a finer-grained specification is warranted. If true, this seems a much better approach:
    def method(self, x=self.attr+@self.attr)  # 'self' is global self, '@self' is the local self
That said, I know nothing more about Python than what I just read and I don't expect Van Rossum to be reading this, but if it makes sense someone may want to mention it on the list. It is the RHS scope that is of interest...


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds