Applying PEP 8
Two recent threads on the python-ideas mailing list have overlapped to a
certain extent; both referred to Python's style guide, but the discussion
indicates that the advice in it may have been stretched further than intended. PEP 8
("Style Guide for Python Code") is the longstanding set of
guidelines and suggestions for code that is going into the standard
library, but the "rules" in the PEP have been applied in settings and tools well outside of that
realm. There may be reasons to update the PEP—some unrelated work of that nature is
ongoing, in fact—but Pythonistas need to remember that the suggestions in
it are not carved in stone.
Emptiness
On August 21, Tim Hoffmann posted his idea for an explicit emptiness test (e.g. isempty()) in the language; classes would be able to define an __isempty__() member function to customize its behavior. Currently, PEP 8 recommends using the fact that empty sequences are false, rather than any other test for emptiness:
    # Correct:
    if not seq:
    if seq:

    # Wrong:
    if len(seq):
    if not len(seq):
But Hoffmann said that an isempty() test would be more explicit
and more readable, quoting entries from PEP 20
("The Zen of Python"). He also pointed to a video of a talk by Brandon
Rhodes, where Rhodes suggested that the second ("Wrong") version of the
test was more explicit, thus a better choice. Effectively Hoffmann wanted
to take that even further, but Steven D'Aprano said
that Python already has an explicit way to test collections for emptiness:
We do. It's spelled:

    len(collection) == 0

You can't get more explicit than that.
He perhaps should have known that the last line would be too absolute for
other Python developers to resist; Serhiy Storchaka
and others came up with "more explicit" tests that D'Aprano laughingly
acknowledged.
But, perhaps more to the point, Chris Angelico wondered
what actual problems isempty() would solve. Testing a collection
in a boolean context (e.g. in an if statement or using bool()), as suggested in the PEP, works
for many types, he said; "Are there any situations
that couldn't be solved by either running a type checker, or by using
len instead of bool?"
But, as Thomas Grainger pointed out, both NumPy arrays and pandas DataFrames have a different idea about what constitutes emptiness; evaluating those types as booleans will not produce the results expected. NumPy and pandas are popular Python projects for use in scientific and data-analysis contexts, so their behavior is important to take into account. Grainger also mentioned the "false" nature of time objects set to midnight, which was addressed back in 2014, as another example.
While the wisdom of treating zero as false in Python in general was questioned
by Christopher Barker, Angelico said
that the real problem with the false midnight was in treating midnight as
zero (thus false). In any case, Hoffmann believes
that objects should be able to decide whether they are empty: "It's a
basic concept and like __bool__ and __len__ it should be upon the objects
to specify what empty means." In a later message, he conceded
that adding a new emptiness protocol (i.e. __isempty__())
may well be overkill, however.
Several commenters asked about use cases where emptiness-test problems manifest; Hoffmann said that SciPy and Matplotlib both have functions that can accept NumPy arrays or Python lists and need to decide if they are empty at times. Using len() works, but:
We often can return early in a function if there is no data, which is where the emptiness check comes in. We have to take extra care to not do the PEP-8 recommended emptiness check using `if not data`.
He suggested that having two different ways to test for emptiness depending
on the types of the expected data was "unsatisfactory";
"IMHO whatever the recommended syntax for
emptiness checking is, it should be the same for lists and arrays and
dataframes." But Paul Moore objected
to the rigid adherence to PEP 8:
You can write a local isempty() function in matplotlib, and add a requirement *in your own style guide* that all emptiness checks use this function.

Why do people think that they can't write project-specific style guides, and everything must be in PEP 8? That baffles me.
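As a rough illustration of Moore's suggestion, a project-local helper might look like the sketch below. The `StrictArray` class is a hypothetical stand-in (it is not a NumPy or pandas type) for a container that refuses to be used in a boolean context, and the body of `isempty()` is an assumption, not code taken from Matplotlib.

```python
class StrictArray:
    """Hypothetical container that declines to pick a single truth value."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __bool__(self):
        # With more than one element there is no obvious answer, so refuse.
        if len(self._items) > 1:
            raise ValueError("truth value of a multi-element array is ambiguous")
        return bool(self._items and self._items[0])


def isempty(data):
    """Project-local emptiness check that relies on len(), not truthiness."""
    return len(data) == 0


print(isempty([]))                   # True
print(isempty(StrictArray([0, 0])))  # False
```

With a helper like this, the same spelling works for lists and for the array-like type, whereas `if not data` would raise ValueError for the two-element `StrictArray`.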
But the inconsistency for using the object in a boolean context versus
checking its len() led
Hoffmann to suggest
that PEP 8 needs changing, "because 'For sequences, (strings,
lists, tuples), use the fact that empty sequences are false:' is not a
universal solution". While Moore was not
opposed to changing the wording in PEP 8, he said that things are
not as clear cut as Hoffmann seems to think:
PEP 8 is a set of *guidelines* that people should use with judgement and thought, not a set of rules to be slavishly followed. And in fact, I'd argue that describing a numpy array or a Pandas dataframe as a "sequence" is pretty inaccurate anyway, so assuming that the statement "use the fact that empty sequences are false" applies is fairly naive anyway.

But if someone wants to alter PEP 8 to suggest using len() instead, I'm not going to argue, I *would* get cross, though, if the various PEP 8 inspired linters started complaining when I used "if seq" to test sequences for emptiness.
Hoffmann eventually decided not to pursue either a language change or one for PEP 8. There are some differences of opinion within the thread, but, by and large, the Python core developers do not see anything that requires much in the way of change. Meanwhile, PEP 8 popped up again right at the end of August.
is versus ==
Nick Parlante posted a lengthy message about a problem he has encountered when teaching a first course in programming using Python. Unlike other languages (e.g. Java), Python has a much simpler rule for how to do comparisons:
To teach comparisons in Python, I simply say "just use ==" - it works for ints, for strings, even for lists. Students are blown away by how nice and simple this is. This is how things should work. Python really gets this right.
The problem is that PEP 8 has an entry in the "Programming
Recommendations" section that says: "Comparisons to singletons like None
should always be done with is or is not, never the
equality operators." Singletons are
classes that only have one instance—all references to None in
Python are to the same object.
Parlante calls the entry in the PEP the "mandatory-is rule"
and said that it complicates teaching the language unnecessarily; tests
like "x == None" generally work perfectly well.
Students often first encounter is in a warning from code that
tests a variable for equality to None, Parlante said. Integrated development
environments (IDEs) will typically complain about violations of PEP 8,
he said, which is usually "very helpful". But there is an
exception: "Having taught thousands of introductory Python students, the one PEP8
rule that causes problems is this mandatory-is rule."
He suggested making the "rule" less ironclad by adding language about it
being optional to the PEP.
Angelico said
that the two operators are asking different questions, however; it is
important to eventually understand the difference, but "just use =="
is a fine place to start.
He also pointed to the specific language in the PEP and noted, again, that
"EVERYTHING in that document is optional for code that isn't part of
the Python standard library". He
suggested turning off the specific warning in the IDE if it was causing
problems. Ultimately, it is up to the instructor to determine the best
approach for their course—including the style guide.
Parlante pushed back a bit on the correctness of using is as specified in the PEP, but Angelico provided several examples of where the "x == None" test will not work. Perhaps unsurprisingly, NumPy was used in one of the examples; the point is that equality is not the right question to ask because some objects have odd views on what it means or, like NumPy, are unwilling to even attempt to decide. NumPy raises ValueError when the result of comparing multi-element arrays with == is used in a boolean context, for example.
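The kind of failure being discussed can be sketched without NumPy. The classes below are hypothetical and exist only for illustration: `AlwaysEqual` has an "odd view" of equality, while `ElementwiseLike` returns a comparison result that refuses to become a bool, loosely imitating the array behavior mentioned in the thread.

```python
class AlwaysEqual:
    """Hypothetical object that claims to be equal to everything."""

    def __eq__(self, other):
        return True


class Ambiguous:
    """Hypothetical comparison result that refuses to be truth-tested."""

    def __bool__(self):
        raise ValueError("ambiguous truth value")


class ElementwiseLike:
    """Hypothetical object whose == does not return a plain bool."""

    def __eq__(self, other):
        return Ambiguous()


odd = AlwaysEqual()
print(odd == None)  # True, although odd is clearly not None
print(odd is None)  # False: identity cannot be overridden

arr = ElementwiseLike()
try:
    if arr == None:
        print("unreachable")
except ValueError as exc:
    print("equality test failed:", exc)
print(arr is None)  # False, and no exception
```

The `is` test never consults the object, so it gives the same answer no matter how `__eq__` is defined.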
Barker noted that he also teaches Python to beginners, but that he does teach about the difference between is and == early on. There are benefits to that approach, he said:
I have seen all too many instances of code like:

    if c is 0:
        ...

Which actually DOES work, for small integers, on cPython -- but it is a very bad idea. (though I think it raises a Warning now)

And your students are almost guaranteed to encounter an occasion where using == None causes problems at some point in their programming careers -- much better to be aware of it early on!
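The warning Barker mentions can be observed directly. This is a small sketch that assumes CPython 3.8 or later, where compiling an `is` comparison against a literal emits a SyntaxWarning; the exact wording of the message varies between versions, so none is shown here.

```python
import warnings

# Compile (but do not run) the questionable comparison and record warnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile("c is 0", "<example>", "eval")

syntax_warnings = [w for w in caught if issubclass(w.category, SyntaxWarning)]
for w in syntax_warnings:
    print(w.category.__name__, "-", w.message)
```

Whether `c is 0` happens to evaluate to True for a given integer is an implementation detail, which is why the comparison is flagged at compile time.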
Barker suggested that Parlante leave the "mandatory is" warning turned on in the IDE,
but D'Aprano had a different
take. He is "not fond of linters that flag PEP 8
violations" and agreed with Angelico's configuration suggestion. As
a practical matter, D'Aprano said, changing PEP 8 in order to affect the IDEs is
likely to be a slow way to go about fixing this problem—if there even is
one.
But "PEP-8 zealots" (as D'Aprano called them) are actually
acting as a force for good, Parlante said. Students naturally pick up good
habits by seeing complaints from the IDE and fixing them, even though they come
from the completely optional guidelines in PEP 8. "I hope people
who care about PEP8 can have a moment of satisfaction, appreciating how
IDEs have become a sort of gamified instrument to bring PEP8 to the world
at low cost."
He has something of an ulterior motive to get to a more "== tolerant"
world, but few, if any, commenters see things his way; as "Todd" put
it:
Using "==" is a brittle approach that will only work if you are lucky and only get certain sort of data types. "is" works no matter what. The whole point of Python in general and pep8 in particular is to encourage good behavior and discourage bad behavior, so you have an uphill battle convincing people to remove a rule that does exactly that.
Furthermore, as David Mertz pointed out, there are some important concepts that may be getting swept under the rug:
Moreover, I would strongly discourage any instructor from papering over the difference between equality and Identity. I guess in a pure functional language there's no difference. But in Python it's of huge importance.

As noted REPEATEDLY, this isn't just about 'is None'. As soon as you see these, it is a CRUCIAL distinction:

    a = b = []
    c, d = [], []

Nary a None in sight, yet the distinction is key.
In the example, all four variables are assigned to an empty list, but a and b are assigned to the same list. So:

    >>> a == c
    True
    >>> a is b
    True
    >>> a is c
    False
    >>> c is d
    False

Adding elements to a will add them to b and vice versa, which is decidedly not the case for the other two lists.
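The consequence of that shared identity shows up as soon as one of the lists is modified. A short sketch that reuses the variable names from Mertz's example:

```python
a = b = []      # one list object with two names
c, d = [], []   # two separate list objects

a.append(1)
print(b)        # [1]  (b is the same object as a)
print(c, d)     # [] []  (the other lists are untouched)

c.append(1)
print(a == c)   # True: equal contents
print(a is c)   # False: still different objects
```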
D'Aprano thinks the dangers of using an equality test for None are a bit overblown, but using is is still beneficial:
There are a bunch of reasons, none of which on their own are definitive, but together settle the issue (in my opinion).
- Avoid rare bugs caused by weird objects.
- Slightly faster and more efficient.
- Expresses the programmer's intent.
- Common idiom, so the reader doesn't have to think about it.
It looks rather unlikely that we will see any changes to PEP 8 for either of the ideas raised in these two threads. It is important to recognize what PEP 8 is (and is not)—no matter what IDEs and linters do by default. Hopefully the PEP's goals and intent were reinforced in the discussions. Meanwhile, Barker has been working on changes to the PEP to remove Python-2-specific language from it.
Other communities might not appreciate this kind of discussion, some of which can question the foundations of the language at times. But Python (and the python-ideas mailing list in particular) seems to welcome it for the most part. Over the years, those sorts of discussions have led to PEPs of various kinds—some adopted, others not—and to a better understanding of the underpinnings of the language and its history.
Applying PEP 8
Posted Sep 8, 2021 23:16 UTC (Wed) by tau (subscriber, #79651) [Link]
Working around (anticipated) bugs in the manner that the Postel principle suggests instead of fixing them is generally considered to be a discredited idea these days, whether you're double-checking invariants that a callee has the responsibility to uphold, or silently swallowing errors, or any other attempts to "do the right thing" despite the fact that the program has already gone off the rails.
Better in my view to think twice about overloading operators in the first place.
Applying PEP 8
Posted Sep 9, 2021 0:35 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
The problem is that sometimes, you need to do this for equality. x == y does not produce True or False at all. Instead, it produces an array of booleans, one for each entry in x and y.* If you really wanted a boolean, you just write (x == y).all() instead, which checks whether all elements of the boolean array are truthy (and therefore, whether x and y have the same entries, after broadcasting). OTOH, you can instead keep the array of booleans and use it in a variety of other contexts. For example, if you have a third array z, you could write z[x == y], which gives you a one-dimensional array of all elements of z for which the corresponding elements of x and y are equal.
You might wonder why (x == y) can't just automatically coerce to bool when used in a boolean context. The problem is that you probably intuitively expect that np.array([True, False]) should be truthy (because that's the behavior of an ordinary Python list), but calling .all() on it produces False (because that's what "all" means). So they have a second method, .any(), which produces True for this array (however, unlike a Python list, it doesn't merely check "are there elements?" but instead looks to see if at least one element is actually truthy). The problem is that ndarray would have to pick one of these two behaviors as The Way That Coercing Works, and that would cause confusion in one or the other situation. So instead, it raises ValueError and forces you to be explicit about which one you mean.
The other possibility would be for (x == y) to produce something other than an array of booleans. However:
1. If it just produces a flat bool, then you have no reasonable way of recovering the "which elements matched?" information. So you would need a separate method for that, such as x.equals(y), which would be harder to read and understand (compare and contrast the .dot() method before the introduction of the @ operator).
2. If it produces some sort of "I can't believe it's not a bool array," then you're going to run into the same confusion, but even worse because now behavior varies based on the dynamic type of the array.
* If x and y are not the same size, or one of them is a scalar value and not an array at all, see https://numpy.org/doc/stable/user/basics.broadcasting.html
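The three competing meanings of "truthy" described in this comment can be seen with a plain Python list. This is only an analogy using built-ins, not NumPy's own behavior: an array type that allowed implicit coercion would have to pick one of these answers.

```python
flags = [True, False]

print(bool(flags))  # True: the list is non-empty
print(all(flags))   # False: not every element is truthy
print(any(flags))   # True: at least one element is truthy
```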
Applying PEP 8
Posted Sep 9, 2021 7:02 UTC (Thu) by zuki (subscriber, #41808) [Link]
Applying PEP 8
Posted Sep 9, 2021 12:38 UTC (Thu) by dona73110 (guest, #113155) [Link]
Except that we're told comparisons to such singletons should *always* be done with 'is' or 'is not', never the equality operators. So overloading __eq__ does not break that convention.
Overloading operators opens a wide door for returning intermediate objects in order to build up complex expression objects (which may include '==' in subexpressions) --similar to lambda expressions, except you can combine them with regular arithmetic operators-- which can then be applied to individual input values, sequences, or custom containers.
In that case, the returned object (of the operator overload function) does not return the result of the regular operator, but an object like a lambda that represents the action of the operator expression, to be applied to other input.
...
Anyway: Always use 'is' or 'is not' when testing for None! Is that such a hard rule to teach?
Applying PEP 8
Posted Sep 9, 2021 14:28 UTC (Thu) by dezgeg (subscriber, #92243) [Link]
Applying PEP 8
Posted Sep 9, 2021 17:30 UTC (Thu) by proski (subscriber, #104) [Link]
But what if I don't know if I'm comparing to None or not? A variable can use a number to enable some functionality or None to disable it, e.g.

    def cycle_loop(special_cycle):
        for cycle_counter in range(100):
            if cycle_counter == special_cycle:
                print('Doing the magic on cycle {}'.format(cycle_counter))

    cycle_loop(10)
    cycle_loop(None)

To follow the comparison rule, I would need to write a longer conditional:

    if cycle_counter is not None and cycle_counter == special_cycle:

Another alternative is to use a special integer number instead of None. Both approaches seem suboptimal to me.
Applying PEP 8
Posted Sep 9, 2021 17:40 UTC (Thu) by NYKevin (subscriber, #129325) [Link]
In other words: You have to know the types of the objects which you are working with, and how those types will behave. But that's true of almost any programming in Python, or any language for that matter.
Applying PEP 8
Posted Sep 9, 2021 13:02 UTC (Thu) by kleptog (subscriber, #1183) [Link]
Operator overloading is good if it doesn't add to the cognitive load and works as you expect. Obviously operators do different things depending on the objects (matrix multiplication doesn't work like normal multiplication) and that's fine, as long as it works as expected. I think both NumPy and SQLAlchemy do a reasonable job here.
Applying PEP 8
Posted Sep 9, 2021 22:57 UTC (Thu) by Wol (subscriber, #4433) [Link]
Let's take the question "How old is someone". If there is no date of birth for someone who is alive, reality is UNKNOWN, and it's encoded as NULL. If, however, they are dead, reality is NOT_VALID, which is also encoded as NULL.
So when you compare two peoples' ages, comparing NULL with NULL returns NULL, but what on earth does that mean? If either or both are dead, then it should return FALSE, but if they're both alive it should return UNKNOWN.
Cheers,
Wol
Applying PEP 8
Posted Sep 10, 2021 13:06 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
Applying PEP 8
Posted Sep 10, 2021 16:09 UTC (Fri) by Wol (subscriber, #4433) [Link]
That's the point. If they're dead, AGE is INVALID. If we don't know their birthday or deathday, AGE is UNKNOWN. SQL uses NULL for both, and telling the difference results in a horribly complicated query. (Of course, if we know their birthday, and they have a deathday but we don't know that, things get REALLY interesting :-)
Pick/MV by default uses the empty string, which makes matters worse, but provided your application isn't fazed by it you can always return "dead" or "unknown" instead. This should be a computed field anyway so you know the result will be correct. Comparing two ages will still be a pain but at least you can define the truth table without worrying over whether values have multiple meanings - if they're both numbers the answer is true or false, if either is "dead" the result is INVALID, otherwise the result is UNKNOWN.
Cheers,
Wol
Applying PEP 8
Posted Sep 10, 2021 18:37 UTC (Fri) by nybble41 (subscriber, #55106) [Link]
If you define it that way I don't see how you could ever report any age other than INVALID or UNKNOWN. If the records say they're dead then, as you say, the result is INVALID. If the records do not say they are dead then their status is UNKNOWN (not "alive") since any claim that they are alive may be obsolete, and their age would likewise be UNKNOWN since it could be either a number (if there is a birthdate), UNKNOWN (if there is no birthdate), or INVALID (if they are actually dead and it just hasn't been recorded yet).
If you added an "alive as of date" field you could calculate their last *known* age based on when their status was confirmed. Short of that you can only answer "how old WOULD they be", not "how old ARE they".
I'm not certain it really makes sense to try to deal with age-at-death and current-age-of-living-subject in the same query. This seems like a type error. On the other hand, "time-since-birth" would apply equally to everyone, provided you have their birthdate on file.
Applying PEP 8
Posted Sep 9, 2021 8:16 UTC (Thu) by xav (guest, #18536) [Link]
- no implicit conversion to boolean, you have to be explicit and write if expr == 0
- all collection types have .len() and .is_empty()
- general explicitness where it matters
Applying PEP 8
Posted Sep 9, 2021 19:42 UTC (Thu) by hkario (subscriber, #94864) [Link]
Obviously None objects don't define their length or have a sensible is_empty. So code that needs to handle both None and empty objects would be more complex as it can't assume .len() will work.
Applying PEP 8
Posted Sep 9, 2021 19:55 UTC (Thu) by mathstuf (subscriber, #69389) [Link]
But Python is a different beast and `if seq:` is indeed convenient (if `None` and the "empty sequence" cases are indeed the same).
Applying PEP 8
Posted Sep 10, 2021 12:38 UTC (Fri) by taladar (subscriber, #68407) [Link]
Applying PEP 8
Posted Sep 10, 2021 13:20 UTC (Fri) by Wol (subscriber, #4433) [Link]
Making code more explicit makes things easier for the application programmer. Does it make it easier for the compiler programmer? That's often the problem - by simplifying one side too far, the savings there are far outweighed by the increased complexity on the other side.
Cheers,
Wol
Applying PEP 8
Posted Sep 10, 2021 15:57 UTC (Fri) by joey (guest, #328) [Link]
no matches
hmm
(sorry for a haskeller's perspective)
Applying PEP 8
Posted Sep 11, 2021 2:35 UTC (Sat) by NYKevin (subscriber, #129325) [Link]