
Applying PEP 8


By Jake Edge
September 8, 2021

Two recent threads on the python-ideas mailing list have overlapped to a certain extent; both referred to Python's style guide, but the discussion indicates that the advice in it may have been stretched further than intended. PEP 8 ("Style Guide for Python Code") is the longstanding set of guidelines and suggestions for code that is going into the standard library, but the "rules" in the PEP have been applied in settings and tools well outside of that realm. There may be reasons to update the PEP—some unrelated work of that nature is ongoing, in fact—but Pythonistas need to remember that the suggestions in it are not carved in stone.

Emptiness

On August 21, Tim Hoffmann posted his idea for an explicit emptiness test (e.g. isempty()) in the language; classes would be able to define an __isempty__() method to customize how the test treats their instances. Currently, PEP 8 recommends relying on the fact that empty sequences are false, rather than using any other test for emptiness:

# Correct:
if not seq:
if seq:

# Wrong:
if len(seq):
if not len(seq):
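The PEP 8 idiom works because truth testing falls back to __len__() when a class does not define __bool__(), so a zero length means false. A minimal sketch; the Bag class is hypothetical and exists only to show that fallback:

```python
class Bag:
    """Hypothetical container that defines __len__() but not __bool__()."""

    def __init__(self, *items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)


# Built-in sequences: empty ones are false, non-empty ones are true,
# regardless of what the elements themselves are.
for seq in ([], (), "", [0], (None,), "x"):
    print(repr(seq), bool(seq))

# With no __bool__(), truth testing uses __len__(): zero length is false.
print(bool(Bag()), bool(Bag(1, 2)))  # False True
```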

But Hoffmann said that an isempty() test would be more explicit and more readable, quoting entries from PEP 20 ("The Zen of Python"). He also pointed to a video of a talk by Brandon Rhodes, where Rhodes suggested that the second ("Wrong") version of the test was more explicit, thus a better choice. Effectively, Hoffmann wanted to take that even further, but Steven D'Aprano said that Python already has an explicit way to test collections for emptiness:

We do. It's spelled:
    len(collection) == 0
You can't get more explicit than that.

He perhaps should have known that the last line would be too absolute for other Python developers to resist; Serhiy Storchaka and others came up with "more explicit" tests that D'Aprano laughingly acknowledged. But, perhaps more to the point, Chris Angelico wondered what actual problems isempty() would solve. Testing a collection in a boolean context (e.g. in an if statement or using bool()), as suggested in the PEP, works for many types, he said; "Are there any situations that couldn't be solved by either running a type checker, or by using len instead of bool?"

But, as Thomas Grainger pointed out, both NumPy arrays and pandas DataFrames have a different idea about what constitutes emptiness; evaluating those types as booleans does not test whether they are empty. A DataFrame or a multi-element array raises ValueError, while a single-element array takes on the truth value of its one element. NumPy and pandas are popular Python projects for use in scientific and data-analysis contexts, so their behavior is important to take into account. As another example, Grainger mentioned the "false" nature of time objects set to midnight, a problem that was addressed back in 2014.
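The outcome of the midnight change can be checked directly. A minimal sketch, assuming Python 3.5 or later, where every datetime.time object is true (earlier versions treated a time representing midnight UTC as false):

```python
import datetime

midnight = datetime.time(0, 0)

# On Python 3.5 and later, a time object is always true, midnight included.
print(bool(midnight))  # True

# An explicit comparison asks the intended question directly.
print(midnight == datetime.time.min)  # True
```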

While the wisdom of treating zero as false in Python in general was questioned by Christopher Barker, Angelico said that the real problem with the false midnight was in treating midnight as zero (thus false). In any case, Hoffmann believes that objects should be able to decide whether they are empty: "It's a basic concept and like __bool__ and __len__ it should be upon the objects to specify what empty means." In a later message, he conceded that adding a new emptiness protocol (i.e. __isempty__()) may well be overkill, however.

Several commenters asked about use cases where emptiness-test problems manifest; Hoffmann said that SciPy and Matplotlib both have functions that accept either NumPy arrays or Python lists and sometimes need to decide whether that data is empty. Using len() works, but:

We often can return early in a function if there is no data, which is where the emptiness check comes in. We have to take extra care to not do the PEP-8 recommended emptiness check using `if not data`.

He suggested that having two different ways to test for emptiness depending on the types of the expected data was "unsatisfactory"; "IMHO whatever the recommended syntax for emptiness checking is, it should be the same for lists and arrays and dataframes." But Paul Moore objected to the rigid adherence to PEP 8:

You can write a local isempty() function in matplotlib, and add a requirement *in your own style guide* that all emptiness checks use this function.

Why do people think that they can't write project-specific style guides, and everything must be in PEP 8? That baffles me.
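Moore's suggestion could look something like the following sketch. The isempty() helper, the summarize() function, and the FakeArray class are all hypothetical; FakeArray only imitates the way NumPy arrays refuse to act as a single bool, so the example runs without NumPy. (With real multi-dimensional arrays, len() counts only the first axis, so a project might prefer the array's size there.)

```python
def isempty(data):
    """Hypothetical project-local emptiness test based on len(), not truthiness."""
    return len(data) == 0


class FakeArray:
    """Stand-in for an array type whose truth value is ambiguous."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __bool__(self):
        # Like NumPy: refuse for several elements, use the lone element otherwise.
        if len(self._values) > 1:
            raise ValueError("truth value of a multi-element array is ambiguous")
        return bool(self._values and self._values[0])


def summarize(data):
    # Early return when there is no data, using the project's own helper.
    if isempty(data):
        return "no data"
    return f"{len(data)} item(s)"


print(summarize([]))                    # no data
print(summarize(FakeArray([0])))        # 1 item(s)
print(summarize(FakeArray([1, 2, 3])))  # 3 item(s)

# The PEP 8 idiom misfires here: a one-element array holding 0 looks "empty".
print(not FakeArray([0]))  # True
```

The same helper then works for lists and array-like objects alike, which is the consistency Hoffmann was after, without any change to PEP 8.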

But the inconsistency for using the object in a boolean context versus checking its len() led Hoffmann to suggest that PEP 8 needs changing, "because 'For sequences, (strings, lists, tuples), use the fact that empty sequences are false:' is not a universal solution". While Moore was not opposed to changing the wording in PEP 8, he said that things are not as clear cut as Hoffmann seems to think:

PEP 8 is a set of *guidelines* that people should use with judgement and thought, not a set of rules to be slavishly followed. And in fact, I'd argue that describing a numpy array or a Pandas dataframe as a "sequence" is pretty inaccurate anyway, so assuming that the statement "use the fact that empty sequences are false" applies is fairly naive anyway.

But if someone wants to alter PEP 8 to suggest using len() instead, I'm not going to argue, I *would* get cross, though, if the various PEP 8 inspired linters started complaining when I used "if seq" to test sequences for emptiness.

Hoffmann eventually decided not to pursue either a language change or one for PEP 8. There are some differences of opinion within the thread, but, by and large, the Python core developers do not see anything that requires much in the way of change. Meanwhile, PEP 8 popped up again right at the end of August.

is versus ==

Nick Parlante posted a lengthy message about a problem he has encountered when teaching a first course in programming using Python. Unlike other languages (e.g. Java), Python has a much simpler rule for how to do comparisons:

To teach comparisons in Python, I simply say "just use ==" - it works for ints, for strings, even for lists. Students are blown away by how nice and simple this is. This is how things should work. Python really gets this right.

The problem is that PEP 8 has an entry in the "Programming Recommendations" section that says: "Comparisons to singletons like None should always be done with is or is not, never the equality operators." Singletons are classes that only have one instance—all references to None in Python are to the same object. Parlante calls the entry in the PEP the "mandatory-is rule" and said that it complicates teaching the language unnecessarily; tests like "x == None" generally work perfectly well.
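The singleton property is easy to observe: calling NoneType cannot create a second instance. A short sketch (the names a and b are arbitrary):

```python
# NoneType cannot produce a new instance; calling it returns None itself.
NoneType = type(None)
print(NoneType() is None)  # True

# Every name bound to None refers to that one object.
a = None
b = None
print(a is b, id(a) == id(b))  # True True
```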

Students often first encounter is in a warning from code that tests a variable for equality to None, Parlante said. Integrated development environments (IDEs) will typically complain about violations of PEP 8, he said, which is usually "very helpful". But there is an exception: "Having taught thousands of introductory Python students, the one PEP8 rule that causes problems is this mandatory-is rule." He suggested making the "rule" less ironclad by adding language about it being optional to the PEP.

Angelico said that the two operators are asking different questions, however; it is important to eventually understand the difference, but "just use ==" is a fine place to start. He also pointed to the specific language in the PEP and noted, again, that "EVERYTHING in that document is optional for code that isn't part of the Python standard library". He suggested turning off the specific warning in the IDE if it was causing problems. Ultimately, it is up to the instructor to determine the best approach for their course—including the style guide.

Parlante pushed back a bit on the correctness of using is as specified in the PEP, but Angelico provided several examples of where the "x == None" test will not work. Perhaps unsurprisingly, NumPy was used in one of the examples; the point is that equality is not the right question to ask, because some objects have odd views on what it means or, like NumPy, are unwilling to even attempt to decide. For example, comparing a multi-element NumPy array with == produces an array of element-wise results, and NumPy raises ValueError when that result is used in a boolean context such as an if statement.
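The following sketch shows why "x == None" depends on the object's cooperation while "x is None" does not. Vector and Mask are hypothetical stand-ins that mimic NumPy's element-wise == and its refusal to collapse the result into one bool; they are not NumPy's implementation:

```python
class Mask:
    """Result of an element-wise comparison; refuses to act as a single bool."""

    def __init__(self, flags):
        self.flags = flags

    def __bool__(self):
        if len(self.flags) > 1:
            raise ValueError("ambiguous truth value; use any() or all()")
        return bool(self.flags and self.flags[0])


class Vector:
    """Hypothetical array-like type whose == compares element by element."""

    def __init__(self, *values):
        self.values = list(values)

    def __eq__(self, other):
        # Compare every element against `other`, NumPy-style.
        return Mask([v == other for v in self.values])


v = Vector(1, 2, 3)

# Identity needs no cooperation from the object, so this always works.
print(v is None)  # False

# Equality is delegated to Vector.__eq__(), whose Mask result cannot be an if test.
try:
    if v == None:  # noqa: E711 (deliberately not following PEP 8)
        print("unreachable")
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```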

Barker noted that he also teaches Python to beginners, but that he does teach about the difference between is and == early on. There are benefits to that approach, he said:

I have seen all too many instances of code like:
if c is 0:
    ...
Which actually DOES work, for small integers, on cPython -- but it is a very bad idea. (though I think it raises a Warning now)

And your students are almost guaranteed to encounter an occasion where using == None causes problems at some point in their programming careers -- much better to be aware of it early on!
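Barker's "if c is 0" example happens to work because CPython caches small integer objects, which is an implementation detail rather than a language guarantee; since Python 3.8 the compiler does emit a SyntaxWarning for is used with a literal. A minimal sketch of why == is the right question for numbers (values are built at run time so the compiler cannot share one constant):

```python
# Build equal values at run time so they are not one shared constant.
a = int("1000")
b = int("1000")

# Equality asks what the code means: do these numbers have the same value?
print(a == b)  # True

# Identity asks whether they are the same object; the answer is an
# implementation detail (typically False on CPython for values this large).
print(a is b)
```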

Barker suggested that Parlante leave the "mandatory is" warning turned on in the IDE, but D'Aprano had a different take. He is "not fond of linters that flag PEP 8 violations" and agreed with Angelico's configuration suggestion. As a practical matter, D'Aprano said, changing PEP 8 in order to affect the IDEs is likely to be a slow way to go about fixing this problem—if there even is one.

But "PEP-8 zealots" (as D'Aprano called them) are actually acting as a force for good, Parlante said. Students naturally pick up good habits by seeing complaints from the IDE and fixing them, even though they come from the completely optional guidelines in PEP 8. "I hope people who care about PEP8 can have a moment of satisfaction, appreciating how IDEs have become a sort of gamified instrument to bring PEP8 to the world at low cost."

He has something of an ulterior motive to get to a more "== tolerant" world, but few, if any, commenters see things his way; as "Todd" put it:

Using "==" is a brittle approach that will only work if you are lucky and only get certain sort of data types. "is" works no matter what. The whole point of Python in general and pep8 in particular is to encourage good behavior and discourage bad behavior, so you have an uphill battle convincing people to remove a rule that does exactly that.

Furthermore, as David Mertz pointed out, there are some important concepts that may be getting swept under the rug:

Moreover, I would strongly discourage any instructor from papering over the difference between equality and Identity. I guess in a pure functional language there's no difference. But in Python it's of huge importance.

As noted REPEATEDLY, this isn't just about 'is None'. As soon as you see these, it is a CRUCIAL distinction:

a = b = []
c, d = [], []
Nary a None in sight, yet the distinction is key.

In the example, all four variables are assigned to an empty list, but a and b are assigned to the same list. So:

    >>> a == c
    True
    >>> a is b
    True
    >>> a is c
    False
    >>> c is d
    False
Adding elements to a will add them to b and vice versa, which is decidedly not the case for the other two lists.
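Continuing Mertz's example, a short sketch of that aliasing in action:

```python
a = b = []
c, d = [], []

a.append("x")  # mutates the single list that a and b both name
print(b)       # ['x']
print(c, d)    # [] []: separate lists, untouched

c.append("y")  # affects c only
print(a == c)  # False: the contents now differ
print(d)       # []
```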

D'Aprano considers the dangers of using an equality test for None to be a bit overblown, but he still sees benefits in using is:

There are a bunch of reasons, none of which on their own are definitive, but together settle the issue (in my opinion).
  1. Avoid rare bugs caused by weird objects.
  2. Slightly faster and more efficient.
  3. Expresses the programmer's intent.
  4. Common idiom, so the reader doesn't have to think about it.

It looks rather unlikely that we will see any changes to PEP 8 for either of the ideas raised in these two threads. It is important to recognize what PEP 8 is (and is not)—no matter what IDEs and linters do by default. Hopefully the PEP's goals and intent were reinforced in the discussions. Meanwhile, Barker has been working on changes to the PEP to remove Python-2-specific language from it.

Other communities might not appreciate this kind of discussion, some of which can question the foundations of the language at times. But Python (and the python-ideas mailing list in particular) seems to welcome it for the most part. Over the years, those sorts of discussions have led to PEPs of various kinds—some adopted, others not—and to a better understanding of the underpinnings of the language and its history.


Index entries for this article
Python: Enhancements
Python: Python Enhancement Proposals (PEP)/PEP 8



Applying PEP 8

Posted Sep 8, 2021 23:16 UTC (Wed) by tau (subscriber, #79651)

Operator overloading is a sharp knife. If you're going to define operator overloads on a class then the onus is on you to ensure that they obey whatever standard axioms those operators are expected to adhere to, whether the operator is equal-to or less-than or whatever else it might be. A class whose instances can compare equal to None is defective: None's entire raison d'etre is to be a sentinel value that is not equal to anything other than itself.
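One way a class can keep its == well behaved is to return NotImplemented for types it does not understand; Python then tries the reflected operation and, for == and !=, finally falls back to an identity comparison, so such an object never compares equal to None. A minimal sketch with a hypothetical Point class:

```python
class Point:
    """Hypothetical value type whose __eq__ only handles other Points."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            # Decline; Python tries the reflected method, then falls back
            # to identity, so a Point is never == None.
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Objects that compare equal must hash equally.
        return hash((self.x, self.y))


p = Point(1, 2)
print(p == Point(1, 2))  # True
print(p == None)         # False
print(p != None)         # True
```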

Working around (anticipated) bugs in the manner that the Postel principle suggests instead of fixing them is generally considered to be a discredited idea these days, whether you're double-checking invariants that a callee has the responsibility to uphold, or silently swallowing errors, or any other attempts to "do the right thing" despite the fact that the program has already gone off the rails.

Better in my view to think twice about overloading operators in the first place.

Applying PEP 8

Posted Sep 9, 2021 0:35 UTC (Thu) by NYKevin (subscriber, #129325)

This is a deliberate design decision of NumPy. The basic idea is that the binary operations, such as +, are done element-by-element. So for example, if x and y are arrays, then x + y is a new array all of whose entries are the sum of the corresponding entries in x and y.

The problem is that sometimes, you need to do this for equality. x == y does not produce True or False at all. Instead, it produces an array of booleans, one for each entry in x and y.* If you really want a boolean, you just write (x == y).all() instead, which checks whether all elements of the boolean array are truthy (and therefore, whether x and y have the same entries, after broadcasting). OTOH, you can instead keep the array of booleans and use it in a variety of other contexts. For example, if you have a third array z, you could write z[x == y], which gives you a one-dimensional array of all elements of z for which the corresponding elements of x and y are equal.

You might wonder why (x == y) can't just automatically coerce to bool when used in a boolean context. The problem is that you probably intuitively expect that np.array([True, False]) should be truthy (because that's the behavior of an ordinary Python list), but calling .all() on it produces False (because that's what "all" means). So they have a second method, .any(), which produces True for this array (however, unlike a Python list, it doesn't merely check "are there elements?" but instead looks to see if at least one element is actually truthy). The problem is that ndarray would have to pick one of these two behaviors as The Way That Coercing Works, and that would cause confusion in one or the other situation. So instead, it raises ValueError and forces you to be explicit about which one you mean.
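The same ideas can be sketched with plain Python lists, which makes the ambiguity visible without NumPy. This is only an illustration of the semantics described above, not how NumPy implements them:

```python
x = [1, 2, 3, 4]
y = [1, 0, 3, 0]
z = ["a", "b", "c", "d"]

# Element-wise "==", one boolean per position (what x == y means for arrays).
mask = [xi == yi for xi, yi in zip(x, y)]
print(mask)  # [True, False, True, False]

# Collapsing the mask to one bool requires choosing a reduction.
print(all(mask), any(mask))  # False True

# A plain list picks neither: any non-empty list is true.
print(bool(mask))  # True

# Boolean-mask selection, like z[x == y] for arrays.
print([zi for zi, keep in zip(z, mask) if keep])  # ['a', 'c']
```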

The other possibility would be for (x == y) to produce something other than an array of booleans. However:

1. If it just produces a flat bool, then you have no reasonable way of recovering the "which elements matched?" information. So you would need a separate method for that, such as x.equals(y), which would be harder to read and understand (compare and contrast the .dot() method before the introduction of the @ operator).
2. If it produces some sort of "I can't believe it's not a bool array," then you're going to run into the same confusion, but even worse because now behavior varies based on the dynamic type of the array.

* If x and y are not the same size, or one of them is a scalar value and not an array at all, see https://numpy.org/doc/stable/user/basics.broadcasting.html

Applying PEP 8

Posted Sep 9, 2021 7:02 UTC (Thu) by zuki (subscriber, #41808)

Yes. The Numpy truthiness rules might be unexpected for people not familiar with Numpy, but they work very well. I can't count the number of times the lack of an implicit cast from array of bools to bool has saved my skin by forcing me to decide whether I'm assuming .any() or .all().

Applying PEP 8

Posted Sep 9, 2021 12:38 UTC (Thu) by dona73110 (guest, #113155)

> A class whose instances can compare equal to None is defective: None's entire raison d'etre is to be a sentinel value that is not equal to anything other than itself.

Except that we're told comparisons to such singletons should *always* be done with 'is' or 'is not', never the equality operators. So overloading __eq__ does not break that convention.

Overloading operators opens a wide door for returning intermediate objects in order to build up complex expression objects (which may include '==' in subexpressions) --similar to lambda expressions, except you can combine them with regular arithmetic operators-- which can then be applied to individual input values, sequences, or custom containers.

In that case, the object returned by the operator-overload method is not the result of the regular operator, but an object, like a lambda, that represents the operator expression and can be applied to other input.
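A minimal sketch of such an expression builder, in the spirit of what query libraries do; the Expr class, evaluate() helper, and X placeholder are hypothetical and not any library's API:

```python
class Expr:
    """Hypothetical deferred expression: operators build callables."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, value):
        return self.fn(value)

    def __add__(self, other):
        return Expr(lambda v: self(v) + evaluate(other, v))

    def __mul__(self, other):
        return Expr(lambda v: self(v) * evaluate(other, v))

    def __eq__(self, other):
        # Returns a new expression rather than True or False.
        return Expr(lambda v: self(v) == evaluate(other, v))


def evaluate(operand, value):
    # Sub-expressions are applied to the input; constants pass through.
    return operand(value) if isinstance(operand, Expr) else operand


X = Expr(lambda v: v)       # placeholder for "the input value"
is_seven = X * 2 + 1 == 7   # builds an Expr; nothing is compared yet

print(is_seven(3))                            # True
print([n for n in range(10) if is_seven(n)])  # [3]

# "== None" also builds an expression, so it cannot serve as a None test.
print(type(X == None).__name__)  # Expr
```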

...

Anyway: Always use 'is' or 'is not' when testing for None! Is that such a hard rule to teach?

Applying PEP 8

Posted Sep 9, 2021 14:28 UTC (Thu) by dezgeg (subscriber, #92243)

Sure, writing 'if x is None:' is not big deal over 'if x == None:'... but what about cases like 'if None in some_list:' or 'some_list.count(None)' or similar where == is implicitly used?
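Membership tests and list.count() compare each element by identity first and then with ==, so an unusual __eq__() leaks into them even though no == appears in the code. A sketch with a hypothetical AlwaysEqual class, plus identity-only alternatives:

```python
class AlwaysEqual:
    """Hypothetical object whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True


items = [AlwaysEqual(), 1, 2]

# Implicit == comparisons: the odd __eq__ makes None appear to be present.
print(None in items)      # True
print(items.count(None))  # 1

# Identity-only versions of the same questions.
print(any(item is None for item in items))  # False
print(sum(item is None for item in items))  # 0
```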

Applying PEP 8

Posted Sep 9, 2021 17:30 UTC (Thu) by proski (subscriber, #104)

But what if I don't know if I'm comparing to None or not? A variable can use a number to enable some functionality or None to disable it, e.g.
def cycle_loop(special_cycle):
    for cycle_counter in range(100):
        if cycle_counter == special_cycle:
            print('Doing the magic on cycle {}'.format(cycle_counter))

cycle_loop(10)
cycle_loop(None)
To follow the comparison rule, I would need to write a longer conditional:
if special_cycle is not None and cycle_counter == special_cycle:
Another alternative is to use a special integer number instead of None. Both approaches seem suboptimal to me.

Applying PEP 8

Posted Sep 9, 2021 17:40 UTC (Thu) by NYKevin (subscriber, #129325)

This is fine, because you can prove that None will only ever be compared to integers. Integers don't have a "clever" __eq__ override, and so it works as expected. The problem is when you start comparing against arbitrary objects, at which point x == y is not even required to return a bool.

In other words: You have to know the types of the objects which you are working with, and how those types will behave. But that's true of almost any programming in Python, or any language for that matter.

Applying PEP 8

Posted Sep 9, 2021 13:02 UTC (Thu) by kleptog (subscriber, #1183)

Another place where this goes a bit funny is in SQLAlchemy. In the query construction it's really nice to be able to say .filter(model.field > 5). This is operator overloading but does something completely different. This leads to PEP8 complaints in statements like .filter(model.field == None). For that they offer the alternative model.field.is_(None).

Operator overloading is good if it doesn't add to the cognitive load and works as you expect. Obviously operators do different things depending on the objects (matrix multiplication doesn't work like normal multiplication) and that's fine, as long as it works as expected. I think both NumPy and SQLAlchemy do a reasonable job here.

Applying PEP 8

Posted Sep 9, 2021 22:57 UTC (Thu) by Wol (subscriber, #4433)

If you're going to drag SQL into this, don't forget NULL is also overloaded.

Let's take the question "How old is someone?". If there is no date of birth for someone who is alive, reality is UNKNOWN, and it's encoded as NULL. If, however, they are dead, reality is NOT_VALID, which is also encoded as NULL.

So when you compare two people's ages, comparing NULL with NULL returns NULL, but what on earth does that mean? If either or both are dead, then it should return FALSE, but if they're both alive it should return UNKNOWN.
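SQL's three-valued logic is easy to observe from Python with the standard-library sqlite3 module, which returns SQL NULL as None. A minimal sketch; the people table and its rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL yields NULL (unknown), not true.
eq_result = conn.execute("SELECT NULL = NULL").fetchone()
print(eq_result)  # (None,)

# IS compares NULLs much as Python's "is None" does.
is_result = conn.execute("SELECT NULL IS NULL").fetchone()
print(is_result)  # (1,)

# A WHERE clause treats an unknown result as false, so "= NULL" matches nothing.
conn.execute("CREATE TABLE people (name TEXT, born TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ann", "1950-01-01"), ("Bob", None)])
by_eq = conn.execute("SELECT name FROM people WHERE born = NULL").fetchall()
by_is = conn.execute("SELECT name FROM people WHERE born IS NULL").fetchall()
print(by_eq, by_is)  # [] [('Bob',)]
conn.close()
```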

Cheers,
Wol

Applying PEP 8

Posted Sep 10, 2021 13:06 UTC (Fri) by mathstuf (subscriber, #69389)

The date of birth doesn't change when someone dies, though. They gain a death date, but their birthdate wouldn't NULL out just because of it. We do, after all, celebrate the x00th birthdays of famous people despite their being very dead. Asking "what is the maximum age this person reached?" and "how old would this person be if alive today?" are both valid questions, so I think the FALSE/UNKNOWN distinction depends more on the question being asked than on the NULL state of the birthdate field.

Applying PEP 8

Posted Sep 10, 2021 16:09 UTC (Fri) by Wol (subscriber, #4433)

Their birthday doesn't disappear, true, but their AGE does. We say "how old WOULD they be", not "how old ARE they".

That's the point. If they're dead, AGE is INVALID. If we don't know their birthday or deathday, AGE is UNKNOWN. SQL uses NULL for both, and telling the difference results in a horribly complicated query. (Of course, if we know their birthday, and they have a deathday but we don't know that, things get REALLY interesting :-)

Pick/MV by default uses the empty string, which makes matters worse, but provided your application isn't fazed by it you can always return "dead" or "unknown" instead. This should be a computed field anyway so you know the result will be correct. Comparing two ages will still be a pain but at least you can define the truth table without worrying over whether values have multiple meanings - if they're both numbers the answer is true or false, if either is "dead" the result is INVALID, otherwise the result is UNKNOWN.
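That truth table, with distinct markers in place of one overloaded NULL, might be sketched like this; the DEAD and UNKNOWN markers and the same_age() function are hypothetical:

```python
DEAD = "dead"        # hypothetical marker: the person has no current age
UNKNOWN = "unknown"  # hypothetical marker: the current age is not known


def same_age(a, b):
    """Compare two computed ages using the truth table described above."""
    if isinstance(a, int) and isinstance(b, int):
        return a == b     # both numbers: an ordinary True or False
    if DEAD in (a, b):
        return "INVALID"  # either person is dead
    return "UNKNOWN"      # otherwise at least one age is unknown


print(same_age(40, 40))       # True
print(same_age(40, DEAD))     # INVALID
print(same_age(UNKNOWN, 40))  # UNKNOWN
print(same_age(DEAD, UNKNOWN))  # INVALID
```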

Cheers,
Wol

Applying PEP 8

Posted Sep 10, 2021 18:37 UTC (Fri) by nybble41 (subscriber, #55106)

> If they're dead, AGE is INVALID. If we don't know their birthday or deathday, AGE is UNKNOWN.

If you define it that way I don't see how you could ever report any age other than INVALID or UNKNOWN. If the records say they're dead then, as you say, the result is INVALID. If the records do not say they are dead then their status is UNKNOWN (not "alive") since any claim that they are alive may be obsolete, and their age would likewise be UNKNOWN since it could be either a number (if there is a birthdate), UNKNOWN (if there is no birthdate), or INVALID (if they are actually dead and it just hasn't been recorded yet).

If you added an "alive as of date" field you could calculate their last *known* age based on when their status was confirmed. Short of that you can only answer "how old WOULD they be", not "how old ARE they".

I'm not certain it really makes sense to try to deal with age-at-death and current-age-of-living-subject in the same query. This seems like a type error. On the other hand, "time-since-birth" would apply equally to everyone, provided you have their birthdate on file.

Applying PEP 8

Posted Sep 9, 2021 8:16 UTC (Thu) by xav (guest, #18536)

Rust's approach on these problems is really clean:
- no implicit conversion to boolean, you have to be explicit and write if expr == 0
- all collection types have .len() and .is_empty()
- general explicitness where it matters

Applying PEP 8

Posted Sep 9, 2021 19:42 UTC (Thu) by hkario (subscriber, #94864)

The nice thing about `if seq:` is that it treats None and an empty sequence alike: both are false, so neither one runs the body.

Obviously None objects don't define their length or have a sensible is_empty. So code that needs to handle both None and empty objects would be more complex as it can't assume .len() will work.
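The trade-off can be sketched in Python by comparing the implicit test with an explicit version that distinguishes the two cases; both functions are hypothetical:

```python
from typing import Optional, Sequence


def describe_implicit(seq):
    # Truthiness lumps None and an empty sequence together.
    return "nothing" if not seq else f"{len(seq)} items"


def describe_explicit(seq: Optional[Sequence]) -> str:
    # Each case is spelled out, so the caller can tell them apart.
    if seq is None:
        return "missing"
    if len(seq) == 0:
        return "empty"
    return f"{len(seq)} items"


print(describe_implicit(None), describe_implicit([]))  # nothing nothing
print(describe_explicit(None), describe_explicit([]))  # missing empty

# len() itself has no answer for None.
try:
    len(None)
except TypeError as exc:
    print("TypeError:", exc)
```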

Applying PEP 8

Posted Sep 9, 2021 19:55 UTC (Thu) by mathstuf (subscriber, #69389)

In Rust, this would be `Option<&[T]>` and the `None` case is explicit anyways. If an API takes such a type, then `None` is different than `Some(&[])`. If not, just take a `&[T]` and let the caller make the empty slice if they have a `None`. IMO, this is far cleaner and lets the API actually communicate something about its requirements.

But Python is a different beast and `if seq:` is indeed convenient (if `None` and the "empty sequence" cases are indeed the same).

Applying PEP 8

Posted Sep 10, 2021 12:38 UTC (Fri) by taladar (subscriber, #68407)

Does the code really become more complex when it is more explicit about what it does? It certainly becomes more readable over code that handles different kinds of values and implicitly does something different with each of them.

Applying PEP 8

Posted Sep 10, 2021 13:20 UTC (Fri) by Wol (subscriber, #4433)

Remember KISS, and especially Einstein's version - "make it as simple as possible, BUT NO SIMPLER".

Making code more explicit makes things easier for the application programmer. Does it make it easier for the compiler programmer? That's often the problem - by simplifying one side too far, the savings there are far outweighed by the increased complexity on the other side.

Cheers,
Wol

Applying PEP 8

Posted Sep 10, 2021 15:57 UTC (Fri) by joey (guest, #328)

ctrl-f monoid
no matches
hmm

(sorry for a haskeller's perspective)

Applying PEP 8

Posted Sep 11, 2021 2:35 UTC (Sat) by NYKevin (subscriber, #129325)

I don't see how category theory would help here. The truthiness half of this is only loosely reminiscent of "is this the identity/unit element?" and it doesn't really match up 1:1 with either of those definitions (particularly since you would need to pick out a *single* binary operation for which it is the identity/unit, and such an operation may or may not actually exist for any given type). Meanwhile, equality is (usually, barring weird stuff like NumPy) designed to look like an equivalence relation, which is very much a set theory concept.
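The "only loosely reminiscent" point can be made concrete: the empty values of several built-in types are identity elements for + and are also false, but 1, the identity for *, is true. A small sketch:

```python
# Identity elements for "+": combining with them leaves a value unchanged.
identities = {str: "", list: [], tuple: (), int: 0}
samples = {str: "abc", list: [1, 2], tuple: (3,), int: 5}

for kind, unit in identities.items():
    value = samples[kind]
    assert unit + value == value == value + unit
    print(kind.__name__, repr(unit), bool(unit))  # each of these units is false

# 1 is the identity for "*" but is true, so falsiness tracks one particular
# operation's identity at best, not "identity element" in general.
print(1 * 7 == 7, bool(1))  # True True
```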


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds