The return of Python dictionary "addition"

By Jake Edge
October 29, 2019

Back in March, we looked at a discussion and Python Enhancement Proposal (PEP) for a new dictionary "addition" operator for Python. The discussion back then was lively and voluminous, but the PEP needed some updates and enhancements in order to proceed. That work has now been done and a post about the revised PEP to the python-ideas mailing list has set off another mega-thread.

PEP 584 ("Add + and += operators to the built-in dict class") has gotten a fair amount bigger, even though it has lost the idea of dictionary "subtraction", which never gained significant backing the last time. It also has two authors now, with Brandt Bucher joining Steven D'Aprano, who wrote the original PEP. The basic idea is fairly straightforward; two dictionaries can be joined using the "+" operator or one dictionary can be updated in place with another's contents using "+=". From the PEP:

    >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> d + e
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> e + d
    {'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}
    
    >>> d += e
    >>> d
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

As can be seen, it is effectively an "update" operation (similar to using the .update() method) where the last value for a particular key "wins". That is why "cheese" is "cheddar" for d + e, but it is 3 for e + d. The example also shows that the operation is not commutative, which bothered some commenters even though there are already several such "arithmetic" operators that are not commutative; list "addition" using "+" isn't either, for example.
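The semantics described above can be approximated without the proposed operator. What follows is a minimal sketch, assuming a hypothetical helper called merged() (it is not part of the PEP or the standard library) that copies its left operand and then updates the copy:

```python
def merged(left, right):
    """Return a new dict: a copy of `left` updated with `right`.

    Hypothetical stand-in for the proposed `left + right`.
    """
    result = dict(left)   # shallow copy, so `left` is not modified
    result.update(right)  # values from `right` win for shared keys
    return result


d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

print(merged(d, e)['cheese'])  # cheddar
print(merged(e, d)['cheese'])  # 3

# Not commutative, just like list concatenation
print(merged(d, e) == merged(e, d))  # False
print([1, 2] + [3] == [3] + [1, 2])  # False
```

Note that a key shared by both operands keeps the position it had in the left operand while taking its value from the right one, which is why "cheese" comes first in the e + d output above.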

There were some objections to removing subtraction, some +1 and -1 responses, and others along those lines, but the biggest chunk of the thread was taken up by the question of how to "spell" the operator. The question seems to boil down to whether to use "|" instead of "+"; that was also part of the discussion back in March and is mentioned in the PEP as well. The operation is seen by some as being analogous to the set union operation, which uses "|".

Richard Musil kicked off a big sub-thread by making the argument for the set-union usage, though he suggested an entirely new operator ("|<") for it. He is concerned about the ambiguity of the + operator in Python and that choosing something completely new will ensure that users do not guess incorrectly about what it does. Chris Angelico did not see things that way, however:

Adding a time delta to a datetime isn't quite the same as adding two numbers. Adding two strings is even more different. Adding two tuples, different again. Yet they are all "adding" in a logical way.

But Paul Moore is unsure that there is any real need for a new dictionary addition operator:

IMO, debating the "meaning" of addition, and whether the + operator is appropriate here, is not the key question. The real questions for me are whether the update operation is used frequently enough to require an additional way of spelling it, and whether using the + operator leads to cleaner more readable code.

He said that he has never needed that kind of operator and suggested that someone do a survey of real-world Python code to see if it would be improved using the new operator, though he did admit to not following the debate closely. It turns out that a big chunk of the PEP is taken up by examples of how the new operator might be used, taking examples from third-party code (including SymPy and Sphinx). Moore was not entirely impressed with them, however, saying that only four out of the roughly 20 examples were improved with the switch, though another few were arguable.

Andrew Barnert thought that Moore's observation actually made a good argument in favor of the proposal; if those who are not in favor of the proposal think that roughly a quarter of the examples are an improvement using it, that's a pretty strong vote in its favor. Beyond that, though, he thinks the need for + (which he calls "copying update") makes for a compelling case, more so than just for the += operator ("mutating update"):

I don’t think it’s about the mutating update operation; I think everyone can live with the update method there. The problem is the copying update. The only way to spell it is to store a copy in a temporary variable and then update that. Which you can’t do in an expression. You can do _almost_ the same thing with {**a, **b}, but not only is this ugly and hard to discover, it also gives you a dict even if a was some other mapping type, so it’s making your code more fragile, and usually not even doing so intentionally.

With the "{**a, **b}" example, he is referring to using the dictionary unpacking operator, "**", which is specified in PEP 448 ("Additional Unpacking Generalizations"), to do the update operation. While that "works", it suffers from the drawbacks he mentions; it is also a fairly universally disliked language idiom. Most are fine with the "**" operator itself, but using it in that way is considered rather non-obvious and is quite unpopular.
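The type-fragility drawback can be seen in a small sketch. It uses collections.OrderedDict as the "other mapping type"; the variable names are chosen purely for illustration:

```python
from collections import OrderedDict

a = OrderedDict(x=1, y=2)
b = {'y': 20, 'z': 30}

# Dictionary unpacking always builds a plain dict
unpacked = {**a, **b}
print(type(unpacked).__name__)  # dict

# Copy-then-update keeps the left operand's type, but it needs a
# temporary variable and two statements, so it is not an expression
copied = a.copy()
copied.update(b)
print(type(copied).__name__)  # OrderedDict
```

Both spellings produce the same keys and values here; only the type of the result differs.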

D'Aprano pointed out that adding two dictionaries using + has come up frequently over the years, seemingly independently; "to many people, the use of + for this purpose is obvious and self-explanatory". The thread continued with some arguing for each spelling of the operator; in some sense, the arguments often came down to "taste". There were also some more exotic ideas (spellings other than + or |, providing a "did you mean ... ?" kind of error for + to lead users to |, and so on), but Guido van Rossum said he is "not crazy" about the "did you mean ... ?" idea; he indicated that he sees the field as already having been narrowed down:

So the choice is really only three way.

1) Add d1 + d2 and d1 += d2 (using similarity with list + and +=)
2) Add d1 | d2 and d1 |= d2 (similar to set | and |=)
3) Do nothing

We're not going to introduce a brand new operator for this purpose, nor are we going to use a different existing operator.

Beyond that, his preference would be to use |, but he is not completely opposed to +:

So if we want to cater to what most beginners will know, + and += would be the best choice. But if we want to be more future-proof and consistent, | and |= are best -- after all dicts are closer to sets (both are hash tables) than to lists. (I know you can argue that dicts are closer to lists because both support __getitem__ -- but I find that similarity shallower than the hash table nature.)

In the end I'm +0.5 on | and |=, +0 on + and +=, and -0 on doing nothing.
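One place where the kinship between dictionaries and sets already shows up in the language is in dictionary key views, which support the set operators. A short sketch using the dictionaries from the PEP example (this is existing behavior, not the proposed operator):

```python
d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

# Key views behave like sets, so | and & already work on them
all_keys = d.keys() | e.keys()
shared_keys = d.keys() & e.keys()

print(sorted(all_keys))     # ['aardvark', 'cheese', 'eggs', 'spam']
print(sorted(shared_keys))  # ['cheese']

# Sets spell union with |; they do not define +
print({1, 2} | {2, 3} == {1, 2, 3})  # True
```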

While the discussion went on at length, no real consensus was reached. As seems always to be the case in the Python world, though, the discussion was never heated or even really contentious; in the end, it comes down to personal preferences. As D'Aprano put it, even if the PEP "fails", it will have succeeded at some level:

If this PEP accomplishes nothing else, at least it will be a single source of information about dict addition the next hundred times somebody asks "Why can't I add two dicts?"

One would guess that the discussion will move from python-ideas to python-dev before too long and then likely to the steering council for some kind of decision. We know how one member of the council (Van Rossum) is leaning at this point, but we'll have to wait and see how the rest of that group feels as none have been active in the discussion. It seems like a reasonable "addition" to the language, however spelled, though using + seems more likely to head off newbie queries. Lists and dictionaries are much more integral to Python than sets are; those who are new to the language will probably see list "addition" well before they ever meet sets.





The return of Python dictionary "addition"

Posted Oct 29, 2019 18:15 UTC (Tue) by Otus (subscriber, #67685) [Link]

> Beyond that, though, he thinks the need for += (which he calls "copying update") makes for a compelling case, more so than just for the + operator ("mutating update"):

Is that the wrong way around? I would have thought += is the mutating one.

The return of Python dictionary "addition"

Posted Oct 29, 2019 18:21 UTC (Tue) by Otus (subscriber, #67685) [Link]

I mean += being the mutating, rather than + of course.

The return of Python dictionary "addition"

Posted Oct 29, 2019 23:45 UTC (Tue) by jake (editor, #205) [Link]

Yes, of course. I had them backwards. Thanks for the correction!

jake

The return of Python dictionary "addition"

Posted Oct 29, 2019 23:53 UTC (Tue) by Anssi (subscriber, #52242) [Link]

Yep, a += b is mutating and similar to a.update(b), while a + b is copying and similar to {**a, **b}.

And the latter is where most of the need for the new operation is.
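The difference shows up as soon as a second name refers to the same dictionary. A minimal sketch using only the existing spellings (the variable names are illustrative):

```python
a = {'x': 1}
alias = a            # second name for the same dict object
b = {'x': 2, 'y': 3}

# Copying update: builds a new dict, `a` is untouched
c = {**a, **b}
print(a)             # {'x': 1}
print(c is a)        # False

# Mutating update: changes the object in place, visible via `alias`
a.update(b)
print(alias)         # {'x': 2, 'y': 3}
print(alias is a)    # True
```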

The return of Python dictionary "addition"

Posted Oct 30, 2019 2:16 UTC (Wed) by droundy (subscriber, #4559) [Link]

I wonder if anyone considered having the + operator add the values when a key is present in both dicts? That would seem to be both intuitive and useful.

Keeping either the first or second value disambiguates an error case, but doesn't really feel correct or intuitive.

The return of Python dictionary "addition"

Posted Oct 30, 2019 5:17 UTC (Wed) by marcH (subscriber, #57642) [Link]

Do you have a real example?

That would be mixing operations at the dict/set/list level with operations on the elements themselves. I'd find that anything but intuitive.

> Keeping either the first or second value disambiguates an error case, but doesn't really feel correct or intuitive.

"Intuitive" is always somewhat subjective/cultural, however keeping the second value is "correct" because this new operation is a (non-commutative) "update" operation. I'm stopping now to paraphrase the article.

The return of Python dictionary "addition"

Posted Oct 30, 2019 17:14 UTC (Wed) by droundy (subscriber, #4559) [Link]

As a real example, data like histograms. As others have pointed out, Counter does this, but it could also be interesting for any sort of totals, e.g. for computing the average of any value (age, date, weight).

The return of Python dictionary "addition"

Posted Oct 30, 2019 9:54 UTC (Wed) by embe (subscriber, #46489) [Link]

That is available as collections.Counter:
    >>> import collections
    >>> a = collections.Counter({'a': 1})
    >>> b = collections.Counter({'a': 2})
    >>> a + b
    Counter({'a': 3})

The return of Python dictionary "addition"

Posted Oct 30, 2019 12:45 UTC (Wed) by weberm (subscriber, #131630) [Link]

Except the Counter code is not compatible with a generic '+'; the entries in the Counter need to be numbers. Whereas 'a'+'b' works, Counter({'a':'a'})+Counter({'a':'b'}) doesn't, nor would Counter({'a':[1]}) + Counter({'a':[2]}) produce Counter({'a':[1,2]}).
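For illustration, here is a rough sketch of what a generic value-adding merge might look like. The add_values() helper is hypothetical; it is not part of PEP 584 or the standard library. The final lines show the Counter failure, which in current CPython comes from Counter comparing each summed count against zero:

```python
from collections import Counter


def add_values(left, right):
    """Merge two dicts, combining values for shared keys with +.

    Hypothetical helper illustrating the suggestion in this thread.
    """
    result = dict(left)
    for key, value in right.items():
        if key in result:
            result[key] = result[key] + value
        else:
            result[key] = value
    return result


print(add_values({'a': 'a'}, {'a': 'b'}))  # {'a': 'ab'}
print(add_values({'a': [1]}, {'a': [2]}))  # {'a': [1, 2]}

# Counter's + fails for values that cannot be compared with an int
try:
    Counter({'a': 'a'}) + Counter({'a': 'b'})
except TypeError as exc:
    print('TypeError:', exc)
```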

The return of Python dictionary "addition"

Posted Oct 30, 2019 19:48 UTC (Wed) by NYKevin (subscriber, #129325) [Link]

But you can have dictionaries which contain other dictionaries, and + then becomes a deep-merge operation. Do you really want that as a binary operator? I think it's trying to squeeze too much functionality into one character.

The return of Python dictionary "addition"

Posted Oct 30, 2019 16:17 UTC (Wed) by pgdx (guest, #119243) [Link]

> I wonder if anyone considered having the + operator add the values when a key
> is present in both dicts?

Yes, that has been addressed in both the email list (python-ideas) and in
PEP-584. It has been concluded that it's not a good (enough) idea.

Quoting PEP-584:

> Add the values (as Counter does).
>
> Too specialised to be used as the default behaviour.

The return of Python dictionary "addition"

Posted Nov 16, 2019 8:29 UTC (Sat) by iq-0 (subscriber, #36655) [Link]

Sure, but the issue is more that such behaviour might be expected when seeing ‘+’ being used.

It’s not always clear that you’re looking at code that is dealing with dictionaries, though the data in them are logically “addable”.

Using set operators would immediately signal to the reader there is no mathematical addition taking place.

The return of Python dictionary "addition"

Posted Oct 30, 2019 22:36 UTC (Wed) by rioting_pacifist (guest, #134765) [Link]

I think using "|" hides it from the average python developer, for the sake of "technical correctness".

I don't think it's worth it, as "+" is the syntax an average python user would expect (especially given its appeal to non-computer scientists).

The return of Python dictionary "addition"

Posted Oct 31, 2019 6:18 UTC (Thu) by buck (subscriber, #55985) [Link]

> I think using "|" hides it from the average python developer,
> for the sake of "technical correctness" .

Another possibility (and i don't know if this was brought up in
the PEP discussion, so sue me if i'm seeming to plagiarize) is
that making the operator a "|" makes it just non-obvious enough
to lead the "average python programmer" to stop and wonder,
"why didn't they just use '+'?", which might lead him or her to the
realization that it's got the maybe non-obvious behavior of dropping
values for overlapping keys, which might not occur if it was as
natural as seeing it spelled "+" by somebody, trying that out on
your own at some point, having the compiler accept it, and not
realizing you didn't think about key overlap

I.e., setting a trap for the unwary (like me)

A/k/a, leading you up the garden path

Indeed, i find myself even more sympathetic to the line of thinking
that says that the behavior doesn't line up well enough with my
naive notion of either "+" or "|" so why operator-ize it?, but now
i'm sure i must be rehashing the PEP discussion the article had
attempted to distill to the most pertinent highlights, probably
eschewing stuff like i'm now cluttering up the comment trail with

The return of Python dictionary "addition"

Posted Nov 4, 2019 17:11 UTC (Mon) by hkario (subscriber, #94864) [Link]

There are a lot of "nifty" features in Python that are hidden unless you learn about them (iterators, list comprehension is common when teaching).

the |= is used for set, and the keys in dict are a set not a list, so it's more correct to use |= rather than +=, as the behaviour will NOT mirror behaviour from list

The return of Python dictionary "addition"

Posted Nov 5, 2019 9:17 UTC (Tue) by timrichardson (subscriber, #72836) [Link]

Apparently knowing sets means you have moved beyond the "average" python user. Personally I hope the set notation wins ... + is more discoverable, but | may help the 'average' programmer discover sets, which sounds like a greater benefit, particularly if they come to sets already knowing that "union" is a thing. However, either approach is a lot better than the current solutions.

The return of Python dictionary "addition"

Posted Nov 8, 2019 0:51 UTC (Fri) by Pc5Y9sbv (guest, #41328) [Link]

Given the ordering sensitivity, some might argue this is more of an append/concatenate operation. Which + also does for strings in Python.

Interestingly, PostgreSQL reuses their concatenation operator for JSONb objects with the same "copy and update" semantics proposed here:

'foo'::text || 'bar'::text produces 'foobar'

'{"a":1, "b":2}'::jsonb || '{"a":3}'::jsonb produces '{"a":3, "b":2}'::jsonb

They also support subtraction of key strings from JSONb objects, but not subtraction of one object from another:

'{"a":1, "b":2}'::jsonb - 'a' produces '{"b":2}'::jsonb

The return of Python dictionary "addition"

Posted Oct 31, 2019 2:28 UTC (Thu) by lsl (guest, #86508) [Link]

> You can do _almost_ the same thing with {**a, **b}, but not only is this ugly and hard to discover […]

So Python goes TIMTOWTDI now?

The unpacking thing appears to follow logically from existing language rules (it's the same as writing out the KV pairs of a and b in order, or at least I hope it is). It's hard to see how the issues with it justify the introduction of a new operator. Seems good enough already.

The return of Python dictionary "addition"

Posted Nov 1, 2019 12:51 UTC (Fri) by sytoka (guest, #38525) [Link]

A few more years and Python will finally be of the same quality as Perl and Raku ;-)

The return of Python dictionary "addition"

Posted Nov 2, 2019 9:50 UTC (Sat) by rav (subscriber, #89256) [Link]

> it's the same as writing out the KV pairs of a and b in order, or at least I hope it is

For the curious, dict literals indeed keep the last value if the same key is specified multiple times, at least according to my test on Python 3.7.4:

    >>> {"a": 1, "a": 2}
    {'a': 2}

I was actually surprised by this - I had guessed it would be a SyntaxError.

The return of Python dictionary "addition"

Posted Nov 2, 2019 21:02 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

How would it be a SyntaxError with this code? That being such a different exception type depending on whether literals were used or not…icky.

    a = 'a'
    {a: 1, a: 2, 'a': 3}


Copyright © 2019, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds