Late-bound argument defaults for Python
Python supports default values for arguments to functions, but those defaults are evaluated at function-definition time. A proposal to add defaults that are evaluated when the function is called has been discussed at some length on the python-ideas mailing list. The idea came about, in part, due to yet another resurrection of the proposal for None-aware operators in Python. Late-bound defaults would help with one use case for those operators, but there are other, stronger reasons to consider their addition to the language.
In Python, the defaults for arguments to a function can be specified in the function definition, but, importantly, they are evaluated in the scope where the function is defined. So default arguments cannot refer to other arguments to the function, as those are only available in the scope of the function itself when it gets called. For example:
def foo(a, b = None, c = len(a)): ...
That definition specifies that a has no default, b defaults to None if no argument gets passed for it, and c defaults to the value of len(a). But that expression will not refer to the a in the argument list; it will instead look for an a in the scope where the function is defined. That is probably not what the programmer intended. If no a is found, the function definition will fail with a NameError.
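The behavior described above can be observed in current Python. The following is a minimal sketch, not part of the proposal; the variable names definition_error and result are only there for illustration.

```python
# Early-bound defaults are evaluated once, when the "def" statement runs,
# using names from the scope that contains the definition.

definition_error = None
try:
    # No name "a" exists yet, so evaluating len(a) fails at definition time.
    def foo(a, b=None, c=len(a)):
        return c
except NameError as exc:
    definition_error = exc

# With an outer "a" in place, the definition succeeds, but the default is
# computed from that outer value, not from the argument passed later.
a = "outer"

def foo(a, b=None, c=len(a)):
    return c

result = foo("a much longer string")  # c is len("outer"), not len of the argument
print(type(definition_error).__name__, result)
```

The default for c is frozen at 5 (the length of "outer") no matter what is passed for a.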
Idea
On October 24, Chris Angelico introduced his proposal for late-bound arguments. He used an example function, derived from the bisect.bisect_right() function in the standard library, to demonstrate the idea. The function's arguments are specified as follows:
def bisect(a, x, lo=0, hi=None):
He notes that there is a disparity between lo and hi: "It's clear what value lo gets if you omit it. It's less clear what hi gets." Early in his example function, hi is actually set to len(a) if it is None. Effectively None is being used as a placeholder (or sentinel value) because Python has no way to directly express the idea that hi should default to the length of a. He proposed new syntax to identify hi as a late-bound argument:
def bisect(a, x, lo=0, hi=:len(a)):
The "=:" would indicate that if no argument is passed for hi, the expression would be evaluated in the context of the call and assigned to hi before any of the function's code is run. It is interesting to note that the documentation for bisect.bisect_right() linked above looks fairly similar to Angelico's idea (just lacking the colon) even though the actual code in the library uses a default value of None. It is obviously useful to know what the default will be without having to dig into the code.
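For comparison, here is a minimal sketch of the None-placeholder idiom as it has to be written today. It is modeled on the signature quoted above rather than copied from the standard library, and the name bisect_right_sketch is hypothetical.

```python
import inspect


def bisect_right_sketch(a, x, lo=0, hi=None):
    """Return the index at which x would be inserted in sorted list a,
    to the right of any existing entries equal to x."""
    # None stands in for "the length of a", which the header cannot express.
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1
    return lo


# The signature only reports the placeholder, not the effective default.
print(inspect.signature(bisect_right_sketch))
print(bisect_right_sketch([1, 2, 2, 3], 2))
```

The real default for hi is only visible by reading the function body, which is the gap the proposal aims to close.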
In his post, Angelico said that in cases where None is a legitimate value, there is another way to handle the default, but it also obscures what the default will be:
And the situation only gets uglier if None is a valid argument, and a unique sentinel is needed; this standard idiom makes help() rather unhelpful:
_missing = object()
def spaminate(thing, count=_missing):
    if count is _missing: count = thing.getdefault()
Proposal: Proper syntax and support for late-bound argument defaults.
def spaminate(thing, count=:thing.getdefault()): ...
[...]
The purpose of this change is to have the function header define, as fully as possible, the function's arguments. Burying part of that definition inside the function is arbitrary and unnecessary.
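The complaint about help() can be reproduced with the inspect module. This is a sketch only: the Thing class and the return value of spaminate() are hypothetical stand-ins, since the quoted example does not define them.

```python
import inspect

_missing = object()  # unique sentinel, so that None remains a usable argument


class Thing:
    """Hypothetical stand-in for the 'thing' in the quoted example."""

    def getdefault(self):
        return 3


def spaminate(thing, count=_missing):
    if count is _missing:
        count = thing.getdefault()
    return ["spam"] * count


# The header shows only an opaque object, not where the default comes from.
signature_text = str(inspect.signature(spaminate))
print(signature_text)
```

The printed signature contains something like "count=<object object at 0x...>", which tells a reader nothing about thing.getdefault().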
The first order of business in these kinds of discussions is the inevitable bikeshedding about how the operator is spelled. Angelico chose a "deliberately subtle" syntax, noting that in many cases it will not matter when the argument is bound. It is visually similar to the walrus operator (":="), but that is not legal in a function definition, so there should be no ambiguity, he said.
Ethan Furman liked the idea but would rather see a different operator (perhaps "?=") used because of the potential confusion with the walrus operator. Guido van Rossum was also in favor of the feature, but had his spelling suggestion as well:
I like that you're trying to fix this wart! I think that using a different syntax may be the only way out. My own bikeshed color to try would be `=>`, assuming we'll introduce `(x) => x+1` as the new lambda syntax, but I can see problems with both as well :-).
New syntax for lambda expressions has also been
discussed, with most settling on "=>" as the best choice,
in part because "->" is used for type annotations; some kind
of "arrow" operator is commonly used in other languages for defining
anonymous functions.
Several others were similarly in favor of late-bound defaults and many seemed to be happy with Van Rossum's spelling, but Brendan Barnwell was opposed to both; he was concerned that it would "encourage people to cram complex expressions into the function definition". Since it would only truly be useful (that is, readable) for a simpler subset of defaults, it should not be added, he said. Furthermore:
To me, this is definitely not worth adding special syntax for. I seem to be the only person around here who detests "ASCII art" "arrow" operators but, well, I do, and I'd hate to see them used for this. The colon or alternatives like ? or @ are less offensive but still too inscrutable to be used for something that can already be handled in a more explicit way.
But Steven D'Aprano did not think that the addition of late-bound defaults would "cause a large increase in the amount of overly complex default values". Angelico was also skeptical that the feature was some sort of bad-code attractant. "It's like writing a list comprehension; technically you can put any expression into the body of it, but it's normally going to be short enough to not get unwieldy." In truth, any feature can be abused; this one does not look to them to be particularly worse in that regard.
PEP 671
Later that same day, Angelico posted a draft of PEP 671 ("Syntax for late-bound function argument defaults"). In it, he adopted the "=>" syntax, though he noted a half-dozen other possibilities. He also fleshed out the specification of the default expression and some corner cases:
The expression is saved in its source code form for the purpose of inspection, and bytecode to evaluate it is prepended to the function's body.
Notably, the expression is evaluated in the function's run-time scope, NOT the scope in which the function was defined (as are early-bound defaults). This allows the expression to refer to other arguments.
Self-referential expressions will result in UnboundLocalError::
def spam(eggs=>eggs): # Nope
Multiple late-bound arguments are evaluated from left to right, and can refer to previously-calculated values. Order is defined by the function, regardless of the order in which keyword arguments may be passed.
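The left-to-right rule in the draft can be approximated in current Python with a sentinel. The sketch below is only an emulation: span(), _UNSET, and the parameter names are hypothetical, and the "=>" forms in the comments use the proposed spelling, which is not valid syntax today.

```python
_UNSET = object()  # hypothetical sentinel meaning "no argument was passed"


def span(seq, hi=_UNSET, mid=_UNSET):
    # Emulates the proposed header: def span(seq, hi=>len(seq), mid=>hi // 2)
    # Defaults are filled in definition order, after the passed arguments
    # are bound, so a later default may use an earlier one.
    if hi is _UNSET:
        hi = len(seq)
    if mid is _UNSET:
        mid = hi // 2
    return hi, mid


print(span("abcdef"))               # both defaults computed
print(span("abcdef", hi=4))         # mid derived from the passed hi
print(span("abcdef", mid=1, hi=4))  # keyword order at the call site is irrelevant
```

Because the checks sit at the top of the body in definition order, the order in which the caller writes keyword arguments has no effect, which matches the behavior the draft specifies.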
But one case, which had been raised by Ricky Teachey in the initial thread, was discussed at some length when Jonathan Fine asked about the following function definition:
def puzzle(*, a=>b+1, b=>a+1):
    return a, b
Angelico was inclined to treat that as a syntax error, "since permitting it would open up some hard-to-track-down bugs". Alternatively, it could be some kind of run-time error in the case where neither argument is passed, he said.
He is concerned that allowing "forward references" to arguments that have yet to be specified (e.g. b in a=>b+1 above) will be confusing and hard to explain. D'Aprano suggested handling early-bound argument defaults before their late-bound counterparts and laid out a new process for argument handling that was "consistent and understandable". In particular, he saw no reason to make some kinds of late-bound defaults into a special case:
Note that step 4 (evaluating the late-bound defaults) can raise *any* exception at all (it's an arbitrary expression, so it can fail in arbitrary ways). I see no good reason for trying to single out UnboundLocalError for extra protection by turning it into a syntax error.
Angelico noted that D'Aprano's scheme was still somewhat difficult for even experienced Python programmers to keep straight and, in addition, that he had yet to hear of a real use case. Erik Demaine offered two examples, "though they are a bit artificial"; he said that simply evaluating the defaults in left-to-right order (based on the function definition) was reasonably easy to understand. Angelico said that any kind of reordering of the evaluation was not being considered; as he sees it:
The two options on the table are:
1) Allow references to any value that has been provided in any way
2) Allow references only to parameters to the left
Option 2 is a simple SyntaxError on compilation (you won't even get as far as the def statement). Option 1 allows everything all up to the point where you call it, but then might raise UnboundLocalError if you refer to something that wasn't passed.
The permissive option allows mutual references as long as one of the arguments is provided, but will give a peculiar error if you pass neither. I think this is bad API design.
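The permissive option can be imitated in current Python to see what the "peculiar error" would look like. This is a rough sketch under stated assumptions: it uses **given (a hypothetical name) in place of the proposed keyword-only late-bound parameters, and it is not how the PEP would implement anything.

```python
def puzzle(**given):
    # Hypothetical stand-in for: def puzzle(*, a=>b+1, b=>a+1)
    unexpected = set(given) - {"a", "b"}
    if unexpected:
        raise TypeError(f"unexpected keyword arguments: {sorted(unexpected)}")

    # Bind whatever the caller passed.
    if "a" in given:
        a = given["a"]
    if "b" in given:
        b = given["b"]

    # Then fill in the missing values, left to right.
    if "a" not in given:
        a = b + 1  # b is a local name; if it was never bound, this raises
    if "b" not in given:
        b = a + 1
    return a, b


print(puzzle(a=5))
print(puzzle(b=5))
try:
    puzzle()
except UnboundLocalError as exc:
    print("neither argument passed:", type(exc).__name__)
```

Passing either argument works; passing neither fails only at call time, and the error names a local variable rather than a missing argument.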
Van Rossum pointed out that the syntax-error option would break new ground: "Everywhere else in Python, undefined names are runtime errors (NameError or UnboundLocalError)." Angelico sees the error in different terms, though, noting that mismatches in global and local scope are a syntax error; he gave an example:
>>> def spam():
...     ham
...     global ham
...
  File "<stdin>", line 3
SyntaxError: name 'ham' is used prior to global declaration
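Both positions can be checked outside the interactive prompt. The sketch below is only an illustration: it compiles the quoted function from a string so that the SyntaxError can be caught, then shows an undefined name surviving compilation and failing only when called. The names source, eggs, and undefined_name are hypothetical.

```python
# A scope mismatch is rejected by the compiler, before anything runs.
source = "def spam():\n    ham\n    global ham\n"
compile_error = None
try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    compile_error = exc.msg


# An undefined name, by contrast, compiles without complaint...
def eggs():
    return undefined_name  # deliberately never defined anywhere


# ...and only fails when the function is actually called.
runtime_error = None
try:
    eggs()
except NameError as exc:
    runtime_error = type(exc).__name__

print(compile_error)
print(runtime_error)
```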
Angelico also gave a handful of function definitions using the new feature that differed only subtly from one another; he was concerned about the "bizarre inconsistencies" that can arise, because they "are difficult to explain unless you know exactly how everything is implemented internally". He would prefer to see real-world use cases for the feature to decide whether it should be supported at all, but was adamant that the strict left-to-right interpretation was easier to understand:
If this should be permitted, there are two plausible semantic meanings for these kinds of constructs:
1) Arguments are defined left-to-right, each one independently of each other
2) Early-bound arguments and those given values are defined first, then late-bound arguments
The first option is much easier to explain [...]
D'Aprano explained that the examples cited were not particularly hard to understand and fell far short of the "bizarre inconsistencies" bar. There is a clear need to treat the early-bound and late-bound defaults differently:
However there is a real, and necessary, difference in behaviour which I think you missed:
def func(x=x, y=>x) # or func(x=x, @y=x)
The x=x parameter uses global x as the default. The y=x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:
def method(self, x=>self.attr) # @x=self.attr
def bisect(a, x, lo=0, hi=>len(a)) # @hi=len(a)
Without that difference in behaviour, probably fifty or eighty percent of the use-cases are lost. (And the ones that remain are mostly trivial ones of the form arg=[].) So we need this genuine inconsistency.
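The scope difference in that quote can be imitated with a sentinel in current Python. The following is a sketch under that assumption; _UNSET and the Widget class are hypothetical, and the "=>" forms appear only in comments because they are proposed syntax.

```python
_UNSET = object()  # hypothetical sentinel standing in for a late-bound default

x = "outer x"


def func(x=x, y=_UNSET):
    # Emulates: def func(x=x, y=>x)
    # The default for x was taken from the outer x when "def" ran;
    # the default for y is taken from the local x on each call.
    if y is _UNSET:
        y = x
    return x, y


class Widget:
    def __init__(self, attr):
        self.attr = attr

    def method(self, x=_UNSET):
        # Emulates: def method(self, x=>self.attr)
        if x is _UNSET:
            x = self.attr
        return x


x = "rebound later"  # does not affect the default captured at definition time

print(func())         # both values come from the captured outer x
print(func("arg"))    # y follows the local x, not the outer one
print(Widget(7).method())
```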
As the comments in his examples show, D'Aprano prefers a different color for the bikeshed: prefixing late-bound default arguments with "@". He also said that Angelico had perfectly explained the "harder to explain" option in a single sentence; both are equally easy to explain, D'Aprano said. Beyond that, it does not make sense to "prohibit something as a syntax error because it *might* fail at runtime". In a followup message, he spelled that out further:
We don't do this:
y = x+1 # Syntax error, because x might be undefined
and we shouldn't make this a syntax error
def func(@spam=eggs+1, @eggs=spam-1):
either just because `func()` with no arguments raises. So long as you pass at least one argument, it works fine, and that may be perfectly suitable for some uses.
Winding down
While many of the participants in the threads seem reasonably happy—or at least neutral—on the idea, there is some difference of opinion on the details as noted above. But several thread participants are looking for a more general "deferred evaluation" feature, and are concerned that late-bound argument defaults will preclude the possibility of adding such a feature down the road. Beyond that, Eric V. Smith wondered about how late-bound defaults would mesh with Python's function-introspection features. Those parts of the discussion got a little further afield from Angelico's proposal, so they merit further coverage down the road.
At first blush, Angelico's idea to fix this "wart" in Python seems fairly straightforward, but the discussion has shown that there are multiple facets to consider. It is not quite as simple as "let's add a way to evaluate default arguments when the function is called"—likely how it was seen at the outset. That is often the case when looking at new features for an established language like Python; there is a huge body of code that needs to stay working, but there are also, sometimes conflicting, aspirations for features that could be added. It is a tricky balancing act.
As with many python-ideas conversations, there were multiple interesting sub-threads, touching on language design, how to teach Python (and this feature), how other languages handle similar features (including some discussion of ALGOL thunks), the overall complexity of Python as it accretes more and more features, and, of course, additional bikeshedding over the spelling. Meanwhile, Angelico has been working on a proof-of-concept implementation, so PEP 671 (et al.) seems likely to be under discussion for some time to come.
Posted Nov 10, 2021 18:26 UTC (Wed) by smurf (subscriber, #17840) [Link] (9 responses)
Better late than never.
Posted Nov 10, 2021 20:42 UTC (Wed) by benhoyt (subscriber, #138463) [Link] (6 responses)
> Better late than never.
I see what you did there. :-)
My question: what about not using new syntax? I get that it wouldn't be backwards compatible, but it could be done on a file-by-file basis with a "from __future__ import late_bound_defaults", avoiding the need for new syntax entirely. Was that discussed at all?
Posted Nov 10, 2021 22:40 UTC (Wed) by mathstuf (subscriber, #69389) [Link] (5 responses)
That loses the ability to default arguments to some computation in the defining scope:
def some_functor(blah):
    dflt = something_expensive(blah)
    def f(d=dflt):
        pass
    return f
Posted Nov 11, 2021 1:10 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
Here is a more straightforward example of that use case:
funcs = []
for i in range(4):
    funcs.append(lambda x=i: x)
print([f() for f in funcs]) # [0, 1, 2, 3]
If you write the seemingly-obvious lambda: i, you get [3, 3, 3, 3] instead, because at the moment the function is actually called, i=3. In effect, we use one wart in the language to cancel out another.
Posted Nov 11, 2021 8:03 UTC (Thu) by epa (subscriber, #39769) [Link] (3 responses)
Posted Nov 11, 2021 15:58 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (2 responses)
Posted Nov 12, 2021 0:16 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (1 responses)
It's more complicated than that.
In Python, *most* interesting things happen at runtime, but name resolution is one of the few things that actually happens at compile time (I believe for performance reasons). The compiler looks at each variable, and follows a process like* this (first matching rule wins):
0. If the scope has an explicit global/nonlocal statement, then the variable is interpreted as such and bytecode is emitted accordingly.
1. If, anywhere in the scope, there's an assignment, then the variable is local to that scope, and we emit LOAD_FAST/STORE_FAST bytecode.
2. If a variable of the same name exists in an enclosing function (not class) scope, then it's a non-local or "closed over" variable. We emit complicated bytecode which sets up a closure. Closure variables are looked up by name at the time the closure is executed, so if the enclosing function rebinds the variable before returning, the closure will observe the new binding. This is different to how function parameters work, and so is a common source of confusion. If you're used to C++ closures, this is roughly equivalent to using [&] instead of [=], automatically, on every closure, without any option of doing it differently.
3. If none of the above rules applies, then it's a global or a builtin, and we emit LOAD_GLOBAL (we do not emit STORE_GLOBAL, because rule 1 or rule 0 would have applied in that case). LOAD_GLOBAL checks for globals and then for builtins at runtime.
Corollary: The set of variables in each non-global scope is fixed at compile time, because we have to emit the correct bytecode in order for a variable to be looked up in any non-global scope. You cannot add new variables to a non-global scope at runtime, and trying to evaluate a non-global variable before you assign to it raises UnboundLocalError instead of looking for a global variable of the same name.
In the specific case shown, since there is no assignment involved (i.e. the default value expression does not involve the walrus operator), I assume that the late-binding logic would generate a closure under rule 2 (and if it did not, then IMHO that would be a pretty egregious bug). The resulting bytecode would be ugly, but it should work correctly. However, if some_functor() had rebound dflt to some other value before returning (for example, if dflt were a loop variable), and we were using late-binding defaults, then the late-binding would very likely get the new value and not the value that was originally computed by something_expensive(blah). This is probably not what the programmer intended to happen, and in fact early-binding defaults are commonly used to work around this problem.
* I did not actually pull up the CPython source code, so it's possible that I have missed a case or oversimplified something.
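Two of the rules listed in the comment above can be checked with a short sketch. It only exercises the behavior observable from Python code and does not inspect bytecode; all names in it are hypothetical.

```python
counter = "outer"


def reads_before_assignment():
    try:
        # Rule 1: the assignment further down makes "counter" local to this
        # function, so this read does not fall back to the outer name.
        return counter
    except UnboundLocalError:
        return "unbound"
    counter = "local"  # never executed, but still decides the scope


def make_closure():
    value = "first"

    def inner():
        # Rule 2: "value" is looked up when inner() runs, not when it is defined.
        return value

    value = "second"  # rebinding before returning is visible to inner()
    return inner


print(reads_before_assignment())
print(make_closure()())
```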
Posted Nov 12, 2021 16:40 UTC (Fri) by nybble41 (subscriber, #55106) [Link]
> However, if some_functor() had rebound dflt to some other value before returning (for example, if dflt were a loop variable), and we were using late-binding defaults, then the late-binding would very likely get the new value and not the value that was originally computed by something_expensive(blah). This is probably not what the programmer intended to happen …
Actually that is exactly how I would expect it to work, based on the behavior of free variables in other languages, e.g. Common Lisp or even Javascript. Closures capture the variable (a.k.a. the binding), not the specific value in the variable at the time the closure is created. Though my preference is for languages like Haskell (or Rust) where (shared) variables are always immutable so this situation doesn't come up. (Of course if the variable is something like an IORef or STRef then reading from it gives the most recent value, but in that case the effect is rather obvious and generally intentional.)
Of course, these other languages with closures and mutation also have support for explicit *local* (not just function) scope, so you can express whether you want to assign to an existing variable (CL: setf) or create a new binding (CL: let). Python has no block scope—apart from some special cases like generator & dict/set comprehension expressions—which makes it difficult to control the scope of free variables in a closure. Even within comprehensions like "[(lambda: i) for i in range(N)]" the variable is reused across the loop iterations rather than being freshly bound for each value, so this just gives a list of functions which all return N-1. To get the more intuitive result you would need something ugly like "[(lambda j: lambda: j)(i) for i in range(N)]" to emulate the missing local binding in the body of the loop.
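The comprehension behavior described in the comment above is easy to verify. This sketch compares the shared-variable form, the nested-lambda workaround, and the early-bound default workaround mentioned earlier in the thread; N is fixed at 4 purely for illustration.

```python
N = 4

# Every closure refers to the same variable i, which ends at N - 1.
shared = [(lambda: i) for i in range(N)]
shared_results = [f() for f in shared]

# The outer lambda is called immediately, giving each inner lambda its own j.
rebound = [(lambda j: lambda: j)(i) for i in range(N)]
rebound_results = [f() for f in rebound]

# An early-bound default captures the value of i at definition time.
defaulted = [(lambda x=i: x) for i in range(N)]
defaulted_results = [f() for f in defaulted]

print(shared_results, rebound_results, defaulted_results)
```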
Posted Dec 1, 2021 18:12 UTC (Wed) by hellcat_coder (guest, #155524) [Link] (1 responses)
Posted Dec 1, 2021 21:57 UTC (Wed) by mathstuf (subscriber, #69389) [Link]
Sure, you *can*. But they don't act like you think:
def f(p=[]):
    p.append(1)
    return p
print(f()) # [1]
print(f()) # [1, 1]
This is because the default is only created upon function definition, not upon function entry. And given the mutability of Python, you can fiddle with that default inadvertently.
Posted Nov 10, 2021 20:30 UTC (Wed) by keithp (subscriber, #5140) [Link] (1 responses)
Hah! Snek is ahead of the game — it only supports late-bound arguments as that required less code in the compiler and runtime.
Welcome to Snek version 1.7
> def foo(a, b = len(a)):
+     return b
+
> foo("hello")
5
Now to decide if I care enough to go implement early-bound arguments.
Posted Nov 10, 2021 21:54 UTC (Wed) by khim (subscriber, #9252) [Link]
Posted Nov 12, 2021 10:03 UTC (Fri) by Visse (subscriber, #145030) [Link] (4 responses)
Could this be achieved using decorators instead of adding new syntax?
Something like:
@late_bound_arguments
def foo(a, b = None, c = len(a)): ...
This would have the benefit of being more searchable than new syntax.
Posted Nov 12, 2021 10:51 UTC (Fri) by smurf (subscriber, #17840) [Link]
Posted Nov 12, 2021 18:53 UTC (Fri) by malmedal (subscriber, #56172) [Link] (2 responses)
What you can do is to write something such that this:
@late
def foo(a, b=[]):
...
would give b a new empty list on each invocation instead of reusing the original.
Posted Nov 12, 2021 20:00 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (1 responses)
Even doing that is questionable, because the decorator does not "see" a code object that evaluates to an empty list. It just sees an empty list, and has to figure out how to manufacture a new empty list each time the function is called. So this would not work:
@late
def foo(x=bar()): ...
By the time @late receives control, bar() has already been called and has returned some value. There is no way that @late can plausibly figure out that it needs to call bar() again in the future. The best it can do is call something like copy.copy() on bar()'s return value, and hope that's close enough.
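The limitation described in the two comments above can be demonstrated with a small sketch. The late decorator here is hypothetical (no such decorator exists in the standard library); it takes the copy.copy() approach mentioned above, which is one guess at what such a decorator could do.

```python
import copy
import functools
import inspect


def late(func):
    """Hypothetical decorator: give each call a fresh copy of every default."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, param in signature.parameters.items():
            missing = name not in bound.arguments
            if missing and param.default is not inspect.Parameter.empty:
                # Only the already-evaluated value is available to copy.
                bound.arguments[name] = copy.copy(param.default)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@late
def foo(a, b=[]):
    b.append(a)
    return b


calls = []  # records each time bar() actually runs


def bar():
    calls.append("called")
    return []


@late
def baz(x=bar()):  # bar() runs here, once, before @late ever sees the function
    return x


print(foo(1), foo(2))  # each call gets its own list
baz()
baz()
print(len(calls))      # bar() was not called again
```

Copying works for the empty-list case, but the decorator never sees the expression bar(), only its result, so it cannot re-evaluate it.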
Posted Nov 12, 2021 20:41 UTC (Fri) by malmedal (subscriber, #56172) [Link]
Posted Nov 13, 2021 11:48 UTC (Sat) by lobachevsky (subscriber, #121871) [Link] (3 responses)
For me, the point of keyword-only arguments is that I don't need to know the order in which they appear in the function definition, but if they were to use late-bound arguments now, the function definition order becomes important again. This seems like a potential footgun for API breakage if the definition order were to change.
Posted Nov 13, 2021 13:02 UTC (Sat) by smurf (subscriber, #17840) [Link] (2 responses)
Presumably these get evaluated left-to-right *after* your arguments get bound. Anything else would make no sense whatsoever.
Posted Nov 17, 2021 12:33 UTC (Wed) by foom (subscriber, #14868) [Link] (1 responses)
def f(a=:b+1, b=:a+1): ...
And it'd work if you call it like f(a=5) or f(b=5), but f() would return an unknown variable error.
Posted Nov 17, 2021 13:55 UTC (Wed) by smurf (subscriber, #17840) [Link]
Late-bound argument defaults for Python should focus on the source scope
Posted Nov 18, 2021 16:12 UTC (Thu) by ccurtis (guest, #49713) [Link]
I like the "@" syntax ("&" would also work for me), but the function signature proposed seems fundamentally flawed. In this statement:
> The x=x parameter uses global x as the default. The y=x parameter uses the local x as the default. We can live with that difference. We *need* that difference in behaviour, otherwise these examples won't work:
> def method(self, x=>self.attr) # @x=self.attr
> def bisect(a, x, lo=0, hi=>len(a)) # @hi=len(a)
This comment implies that a "global x" exists and so means a finer-grained specification is warranted. If true, this seems a much better approach:
def method(self, x=self.attr+@self.attr) # 'self' is global self, '@self' is the local self
That said, I know nothing more about Python than what I just read and I don't expect Van Rossum to be reading this, but if it makes sense someone may want to mention it on the list. It is the RHS scope that is of interest...