Hacker News
5% of 666 Python repos had comma typo bugs (inc V8, TensorFlow and PyTorch) (codereviewdoctor.medium.com)
360 points by rikatee on Jan 7, 2022 | hide | past | favorite | 327 comments



The high-level goals of python end up creating these little syntactic landmines that can get even experienced coders. My personal nomination for the worst one of these is that having a comma after a single value often (depending on the surrounding syntax) creates a tuple. It's easy to miss and creates maddening errors where nothing works how you expect.
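A minimal sketch of that pitfall:

```python
# A stray trailing comma silently turns a value into a singleton tuple.
x = 42
y = 42,  # note the comma: y is (42,), not 42

assert x == 42
assert y == (42,)
assert isinstance(y, tuple)

# The same typo inside a function is a classic source of confusing errors:
def get_port():
    return 8080,  # meant to return 8080, actually returns (8080,)

assert get_port() == (8080,)
assert get_port() != 8080
```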

I've moved away from working in Python in general, but I think the #1 feature I want in the core of the language is the ability to make violating type hints an exception[1]. The core team has been slowly integrating type information, but it feels like they have really struggled to articulate a vision about what type information is "for" in the core ecosystem. I think a little more opinion from them would go a long way to ecosystem health.

[1] I know there are libraries that do this, I am not seeking recommendations.


A lot of people in this thread are using this to make fun of Python, but the exact same issue exists in languages like C++. Here are some I fixed recently:

https://github.com/UWQuickstep/quickstep/pull/9

https://github.com/tensorflow/tensorflow/pull/51578

https://github.com/mono/mono/pull/21197

https://github.com/llvm/llvm-project/pull/335


I didn't understand anyone to be saying that Python is the only language to have this flaw.

Also, I personally don't mind this approach to string concatenation. I think it's a fine compromise between easy formatting and clarity. I was whining about a corner case of tuple construction - which as far as I know is not a feature of any other language.


A better compromise is to insist on this:

    (
       "one",
       (
          "a very very"
          "long long two"
       ),
       "three"
    )
And of course a,b should be syntactically invalid. It must be (a,b)


I think automatic string concatenation and singleton tuples were not introduced according to some high-level goal. They are just historical baggage. Automatic string concatenation comes from C, and the singleton tuple syntax probably just seemed like a good idea at first.

In hindsight, singleton tuples are not common or useful enough to deserve their own syntax. If the way to create them was something like this:

    t = tuple.single("hello")
we'd think it's ugly or inconsistent, but definitely not confusing or bug-prone.


One place where singleton tuples used to be common is with the old "%"-formatting, specifically in the case where there is a single argument and its value might be a tuple:

    x = (1,2,3)
    #print("the value of x is %s" % x)   # breaks if x is a tuple
    print("the value of x is %s" % (x,)) # works even if x is a tuple
There is a readable way to create singleton tuples, without the sneaky trailing comma or a new function like tuple.single:

    tuple(["hello"])
The square brackets can be slightly annoying. I recall writing the following function to omit them:

    def tup(*args):
        return tuple(args)
This basically lets you use the usual tuple syntax, just prefixed with the word "tup". The advantages are that you don't need a trailing comma for singleton tuples, and it's more obvious that a tuple is being created (it can be difficult to distinguish between tuple literals and parentheses used for grouping in a complex expression).
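For instance, the helper behaves like this (definition repeated so the sketch is self-contained):

```python
def tup(*args):
    return tuple(args)

assert tup("hello") == ("hello",)  # singleton tuple, no trailing comma needed
assert tup(1, 2, 3) == (1, 2, 3)
assert tup() == ()
```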

I am reminded of a somewhat similar issue with empty set literals: {1,2} is a set, {1} is a set, but {} is a dict. The way to create empty sets is using set().
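A quick demonstration:

```python
# Braces with elements make a set, but empty braces make a dict.
assert type({1, 2}) is set
assert type({1}) is set
assert type({}) is dict  # not an empty set!
assert type(set()) is set
```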


I’ve been writing Python professionally full time for 8 years and still occasionally make the trailing-comma-tuple mistake. These days at least I’ll recognize it and be able to find it quickly rather than wasting time. It can be caught with a linter, but not every codebase is readily linted.


The lack of a static type-system is IMO what makes these one-character mistakes very annoying. The compiler can't tell you something is wrong, so you're just left to figure out why things are broken, just to realize it was the smallest of typos.


I love how simple and forgiving Python is for small projects. The "trailing comma creates a tuple" situation comes out of, as far as I can tell, a desire to create maximally convenient syntax in the scenarios where tuples are intended. I think that's great for small code!

I just wish that the core team would take that same zeal for a "pythonic" experience with small code and use it to develop more scaled-up systems for dealing with larger code bases. My idea is to enforce strong pre-conditions on function calls using type hints, but I am sure there are other ways to do it.


For a language that is so incredibly picky about its whitespace rules, it's a little laissez-faire on the string-concatenation/tuple syntax side. I say this as someone who loves Python and uses it extensively.


If you use mypy (as anyone should for any non-hobby Python usage) then Python has one of the strongest type systems available. Optional types, generics, "Any" escape hatches, everything you could want.


mypy is a great project and I agree that basically every project at scale should use it. However, I think you're wrong about the strength of the Python type system and what a good type system can "get" you. I think mypy does an amazing job at static checking, but more powerful type systems go far beyond static checks and into changing how you structure and write code. The newly introduced "structural pattern matching"[1] is an example of the kind of feature that could be usefully expanded by making types a first-class part of the Python runtime.

Again - the dynamism of Python means teams can write amazing extensions to Python (like mypy), but that isn't a replacement for the core team having a plan for how they think typing information should be used at runtime. Their current answer seems to be "nothing," which disappoints me.

[1] https://www.python.org/dev/peps/pep-0622/


> one of the strongest type systems available.

This is simply not true - Python with mypy isn’t even as strong as Typescript, let alone Rust, F#, Haskell and so forth.


Would mypy have caught any of the issues highlighted in the article?


No. Mypy only cares about types; it would only have been caught if something was expecting a tuple of a certain length, otherwise not.

The problem in the article is more related to syntax than types: both forms are valid syntax, with different but still very similar outcomes.

Pylint, on the other hand, can find it with the implicit-str-concat check enabled.


The "trailing comma creates a tuple" bug actually comes from a disconnect between what people think defines a tuple (parenthesis) and what really does (comma). I always put parenthesis around a tuple for clarity.


What's the reason for allowing something like

foo,

to be a tuple? Why not make this a syntax error? Is there a use for single-value tuples?


Yes, singleton tuples are useful, for example for passing to a function which expects a tuple (because it might expect zero, one or multiple values).

The thing is, your example is the way a tuple should be defined. The parentheses are merely allowed (and ignored). Why? I see this as a mistake by the language creators. But to be fair, it is difficult to make a perfect language (or anything really), and Python is pretty close imho.
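One concrete standard-library example: str.startswith accepts either a single string or a tuple of prefixes, so tuple-based call sites sometimes need a singleton:

```python
prefixes = ("img_",)  # singleton tuple
assert "img_001.png".startswith(prefixes)
assert not "doc_001.txt".startswith(prefixes)

# The same call site works unchanged with zero, one, or many prefixes:
assert not "anything".startswith(())
assert "doc_001.txt".startswith(("img_", "doc_"))
```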


C lets me do this, and doesn't say much about it.

  char ch_arr[3][10] = {
      "uno",
      "dos" 
      "tres"
  };


C's type system is neither very static nor outstandingly helpful by today's standards, yes.


On the other hand a good type system doesn't:

    # let ch_arr = [
      "uno";
      "dos"
      "tres";
    ];;
    Error: This expression has type string
           This is not a function; it cannot be applied.


This is not a good type system. It's a bad language where you can invoke functions without parentheses.


Why should function arguments be delimited by parentheses? We don’t do that in Bash or Objective-C, for example


To be able to detect the difference between "this expression is a function" and "this expression is the result of the computation of a function".

The parentheses are not for function arguments; they are for "invocation".


> to be able to detect the difference between "this expression is a function" and "this expression is the result of the computation of a function"

If function (or method) arguments don't require parentheses, referring to (rather than calling) a function/method usually requires quite distinct syntax, so it's quite easy to tell it apart from a call.

It may not be familiar to people coming from languages where no-parens refers to the function and parens call it, but being clear and distinct and being intuitive to people indoctrinated in contrary syntax are not the same thing.

E.g., in ruby (which has methods but not functions in the strict sense) I can call a method with:

  thing.square # or thing.square()
Or access the corresponding method object with:

  thing.method :square # or, thing.method(:square)
Either of the former options is distinct from both of the latter.


A syntactic feature like this can only be judged in the context of the language. I'm sure there are languages where parentheses around function arguments prevent ambiguity, but there are also languages where calls are unambiguous without them, or where the opposite is true.

For other examples of languages where invocations don't use parentheses for arguments, see OCaml and Haskell. In fact, I'd argue that if they tried to add that feature to those languages (parens around arguments to a function), it'd make things very confusing given the way functions and tuples work.


Can you show an example of how they could be confused?


What does this do?


An array of char arrays. But with the missing comma, it does something similar to what Python does in the linked article. Instead of ("uno", "dos", "tres"), you get ("uno", "dostres").


Fully agreed. If python had a proper static type system, those typos would hardly matter, and you'd have the best of both worlds: Convenient, concise syntax, but still confidence in your code.

I say "had a proper type system", but actually it turns out that it does have something like that: When I use python for anything else than a most tiny script now, I use "mypy"[1] which implements static typing according to some existing Python standard (whether that came about because of mypy or the other way around, I don't know).

It is so, so good to have mypy telling me where I messed up my code instead of receiving a cryptic, weird runtime error, or worse, no error and erratic runtime behavior. Because not knowing that a particular type is unexpected and wrong, values often get passed along and even manipulated until the resulting failure is not very indicative of the actual problem anymore.

[1] http://mypy-lang.org


I’m not clear how a type system would pick up a missing comma in a list of strings, unless the type was specific enough that the contents of the list or the length was encoded in the type.


True, in this particular case that would only help for fixed length strings, which is far from the encompassing case. I was thinking more generally and lost what the actual issue here is about.


I feel like it's been pretty clear from day one that type hints are meant for static analysis with tools like mypy. They're not exclusive to that use and have a lot of other possible applications, but the primary goal has always been static analysis.


I’d rather have a compile-time error than an exception (or both), which in many cases is achievable. I know mypy does this; maybe I should alias python="mypy && python".


For those looking to avoid this specific problem, there is a flake8 rule: https://pypi.org/project/flake8-no-implicit-concat.

More broadly, the https://codereview.doctors makers are making the point that their tool caught an easy-to-miss issue that most wouldn't think to add a rule for. A bit of an open question to me how many of those there really are at the language level, but still seems like a neat project.


Also, all but one of the issues they found relate to test code; it seems people are a little less careful there compared to functional code.

Also, in terms of mistakes, codereviewdoctor twice linked to the same issue in their blog https://github.com/tensorflow/tensorflow/issues/53636 and raised the PR against the wrong project https://github.com/tensorflow/tensorflow/pull/53637 (I guess TensorFlow vendors Keras, easy mistake).


https://github.com/tensorflow/tensorflow/tree/0d8705c82c64df...

    STOP!
    This folder contains the legacy Keras code which is stale and about to be deleted. The current Keras code lives in github/keras-team/keras.

    Please do not use the code from this folder.
Yeah, not the most obvious notice.

The fact they didn't find the same mistake(s) in keras-team/keras (I assume they scanned it; it's one of the most popular Python repos) makes me believe these issues have been fixed/removed in the up-to-date keras repo.


once tensorflow pointed to keras-team this happened

https://github.com/keras-team/keras/issues/15854

resulting in

https://github.com/keras-team/keras/pull/15876


The automatic bug report generation tool produces the following:

"Absent comma results in unwatned string concatenation on line 330"

Bug-ception!


> all but one of the issues they found relate to test code, it seems people are a little less careful compared to functional code.

Also a factor that bugs in functional code are more visible, both during development and to users once shipped. So there may have been an equal number or more such bugs in the non-test code, that just didn't remain in the code base for this long.


IME, Black will add parentheses to clearly and explicitly indicate a tuple where there is a trailing comma. I figured this out when I made the trailing-comma mistake and wondered why Black kept reformatting my code.


Black rules. I love it that I don't need to have a discussion about style with anyone when Black is used on the project.


TBH I think every language should have a linter like this, and teams should just apply it and never need a discussion about formatting.


The URL in this comment has an incorrect TLD: it should be `doctor` (singular).

https://codereview.doctor/


there is also https://pypi.org/project/flake8-tuple/

typo in the url (or in HN's markup) btw: it's https://codereview.doctor


The removal of implicit string concatenation was proposed for Py3k[1], but was rejected.

[1] https://www.python.org/dev/peps/pep-3126/


The rejection notice seems completely counterintuitive to me. How is adding a plus "harder" compared to removing a footgun?

> This PEP is rejected. There wasn't enough support in favor, the feature to be removed isn't all that harmful, and there are some use cases that would become harder.


This change would break a lot of legacy code for no good reason

The most common way to split a string in lines is using this concatenation formula.


> This change would break a lot of legacy code for no good reason

Preventing a bug that occurs in 5% of observed codebases (and anecdotally, happens to me during development all the time) seems like about as good as reasons get.

Swapping a perfectly fine print statement for a function, on the other hand… that’s the breaking change in Py3k that’s never seemed worth it to me.


I've never heard from Guido on this, but I've always felt that he created the print keyword in the very early days just because it was easy, and he always thought the language would remain a small niche language. But as its popularity increased, the print keyword stuck out like a sore thumb and he just had to fix it.


But wasn't this proposal part of the move to Python 3? Strings were broken left and right anyway.


Right, there was lots of deliberate breakage, _and_ this is purely syntactic, hence the sort of thing 2to3 could trivially deal with.


> the sort of thing 2to3 could trivially deal with

2to3 could also trivially add +, and if anything, that would actually help surface these kind of bugs, because if you randomly see a + in the middle of your list of strings, it's much easier to spot the bug than if there was a missing comma.


> The most common way to split a string in lines is using this concatenation formula.

Is it really? I tend to avoid it in favour of """ or '\n'.join(<list of lines>), because it looks like a mistake.

Triple quotes are kind of annoying if the string is indented, but you can just not indent the string to avoid the whitespace.


Both of your solutions are great but don't fully cover the use case. They are useful for multiline strings, but implicit concatenation is also often used to break long strings that may not have newlines.
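A sketch of that usage:

```python
# Implicit concatenation splits a long literal across source lines
# without putting newlines into the value:
msg = (
    "This is a long message that "
    "continues on the next source line."
)
assert "\n" not in msg
assert msg == "This is a long message that continues on the next source line."
```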


In my opinion that would be better served with ''.join(['hello', 'world']).

No footgun potential, and as others have mentioned the “good usage” would often be bad simply because it ends up looking like a mistake even if it’s intentional.


I use it, personally. The other two options I find too aesthetically displeasing: not indenting the string looks bad when it's within an indented block of code, and using join and putting the strings in a list is just too much boilerplate. I will use """ if I don't care about the extra space put at the start of each line by the indentation.



Does Python support the concept of allowing code to opt in to new safety features? I can understand rejecting something like this for the sake of legacy compatibility (something Python has abandoned too readily in the past), but it seems like an option—or maybe even a default—might be nice.

I suppose this is also something you could catch with a linter?



I'd say that's a "kind of", since it implies the feature will eventually become mandatory. I was thinking more along the lines of Javascript's 'use strict';


No, there's a general aversion to "use flag" features among the Python core dev due to not wanting to support multiple versions of Python behavior and how they may interact over the long term.

"from __future__" is meant to only ever be used temporarily with a specific Python version slated for it becoming the default behavior.

This discussion about flags has come up recently as the debate over accepting PEP 649 or PEP 563 or something else continues. If the Steering Council does not accept PEP 563, it will need to be figured out how to deprecate "from __future__ import annotations" without making it the default, and how to implement its replacement.


Most of the "bugs" caught here (including in TensorFlow and in my own project, Xarray) seems to actually be typos in the test suite. This is certainly a good catch (and yes, linters should check for this!), but seems a little oversold to me.


Same :P I'm actually responsible for one of these (https://github.com/pytorch/pytorch/issues/70607), but it's a typo in a list of tests to skip.


A typo in a list of tests to skip means tests are run that are not intended to be run. This can lead to unexpected failures, so in my opinion is not the same as the errors in test suites where tests run with other test data than intended but should still pass.


Literally the second item in the "Zen of Python" (https://www.python.org/dev/peps/pep-0020/):

Explicit is better than implicit.

And yet, s = ["one", "two" "three"] will implicitly and silently do something that is probably wrong most of the time.
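Spelled out:

```python
s = ["one", "two" "three"]  # missing comma: silent implicit concatenation
assert s == ["one", "twothree"]
assert len(s) == 2  # three items intended, two produced
```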


I mean the zen being wrong is kind of a meme at this point. The whole “only one obvious way to do it” isn’t just false but the exact opposite is true. Python is one of the most flexible languages with many many ways to do the same thing; more than any other language I can think of.


Notice that, in the original quote,

    There should be one-- and preferably only one --obvious way to do it.
the author used two different ways of hyphenating (three, if you count the whole PEP 20). PEP 20 is clearly not meant to be taken as law. Nor PEP 8. Nor PEP 257.

People frequently mistake "one obvious way" with "one way". There are lots of ways to iterate through something, for example, but there is really one obvious way. And the philosophy here still applies: when you read anyone else's python code, the obvious way is probably doing the obvious thing. I think that is the more appropriate takeaway from PEP 20.


> the author used two different ways of hyphenating

No, first, it doesn't use hyphenating at all, it uses hyphens as an ASCII approximation for typographical dashes used to set off a phrase (a distinct function from hyphenation), and, second, in that quote they used one way of doing it: “two dashes set closed on the side of the main sentence and set open on the side of set-off phrase”.

It is an unusual way of doing it—just as with actual typographical dashes, setting open or closed symmetrically would be more common—but it's not two ways.

EDIT: And the third use (in the heading and later in the body) is separating parts where neither is a mid-sentence appositive phrase, and uses open-on-both-sides. So that's not a different way of doing the same thing; it's a different way of doing a semantically different thing.

Actually, I think the dash use makes a good illustration of how the “it” in “one way to do it” is intended.


> “two dashes set closed on the side of the main sentence and set open on the side of set-off phrase”.

Eh, I don't think that's the interpretation the author was going for. The author wanted to show two different ways of approximating a dash, and he had limited options.

If he'd done this-- for example-- he would have been showing one way, not two.

If he'd done this --for example-- you would have called it "two dashes set open on the side of the main sentence and set closed on the side of set-off phrase".

If he'd done this-- for example -- it would have been too obvious (on the same line).

I suppose he could have done this-- for example--but I still think that would have been too obvious. You're not supposed to see it on a first read.

> And the third use (in the heading and later in the body) is seperating parts where neither is a mid-sentence appositive phrase, and uses open-on-both sides. So that's not a different way of doing the same thing, it's a different way of doing a semantically different thing.

It's a different use of a dash, but it's still a place where you'd typically use a dash.

-----

Edit: You know what, thinking about it again—perhaps both interpretations are valid. That almost adds to the effectiveness of the whole thing.


It's not even obvious how to run Python or dependencies in the first place. Even putting aside the 2.7/3.x fiasco (that still causes problems even today), you're left with figuring out wheel vs egg vs easy-install vs setuptools vs poetry vs pip vs pip3 vs pip3.7 vs pip3.8 vs piptools vs conda vs anaconda vs miniconda vs virtualenv vs pyenv vs pipenv vs pyflow.


it's like you read my mind.


> And the philosophy here still applies: when you read anyone else's python code, the obvious way is probably doing the obvious thing.

I don't get what you mean by this.

When I read someone else's code, what is obvious to me isn't necessarily what was obvious to the author. For an illustration of this, have a look at the day 1 solution thread from this year's Advent of Code - https://www.reddit.com/r/adventofcode/comments/r66vow/2021_d... (you can search for Python solutions) - and see how many different ways there are to solve a fairly straightforward problem.


I can think of at least 2 obvious ways to iterate through something: for loops and comprehensions.


You're right that both iterate through something but `for` loops and comprehensions aren't used as if they were interchangeable.

For example, you'll sometimes see people do bad stuff like this:

  >>> lst = []
  >>> 
  >>> [lst.append(i + i) for i in range(10)]
  [None, None, None, None, None, None, None, None, None, None]
  >>> 
  >>> lst
  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
  >>> 
When they should be doing this:

  >>> lst = []
  >>> 
  >>> for i in range(10):
  ...     lst.append(i + i)
  ... 
  >>> lst
  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
  >>> 
Or just this:

  >>> lst = [i + i for i in range(10)]
  >>> 
  >>> lst
  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
  >>>


The first append version will more often be in a loop. It's unlikely that someone will know enough to use comprehensions but not enough to still use append.


Agreed. I've mainly seen the first `append` version in code written by people who've just discovered comprehensions and code golf.


    lst = [range(0, 10, 2)]


That's wrong in multiple ways. You want

    lst = list(range(0, 20, 2))


Ohh, yeah, you're right.


Even simpler!


To generate a list/dictionary/generator from an input iterable, you use a comprehension of the appropriate type.

To iterate through it without doing one of those things, you use a for loop.

In “one obvious way to do it”, “it” refers to a concrete task; the same is not necessarily intended to be true of arbitrarily broad generalizations of classes of tasks.


I suspect this comment was an elaborate nerdsnipe.


The author uses "only one" to clarify "one". So obviously "one" means at least one.

    There should be at least one-- preferably only one --obvious way to do it.
Kinda funny meta joke considering everybody conflates "one" and "only one" to mean the same thing. Preferably there would only be one obvious way to describe "one". :p


> I mean the zen being wrong is kind of a meme at this point. The whole “only one obvious way to do it” isn’t just false but the exact opposite is true. Python is one of the most flexible languages with many many ways to do the same thing; more than any other language I can think of.

Not in comparison to Perl, which usually has multiple ways to do anything, each 'obvious' to different sets of people (each Perl codebase therefore seems to have a distinct dialect based on which 'obvious' alternatives are chosen).

The other direction languages can take that is being contrasted, is there being one non-obvious way to do something.

Python's 'most obvious way' isn't necessarily the fastest/most concise/most efficient/scalable/etc. way to do something in Python, but it will usually be obvious to most Python developers. And although broad styles have certainly developed over time (imperative, functional, OO) as Python has gained power and flexibility, the dictum still largely holds true.


10 years ago I'd have agreed with you. But Perl has gone a long way in pulling back from some of that insanity, while Python has been giving C++ a run for its money in terms of features.


I'd totally agree - there's been a burst of sort of the perl style stuff (:= ?) to gain relatively small wins.

ie, instead of

for line in lines: print(line)

we are supposed to be using

while line := f.readline(): print(line)

I've not been super impressed with this type of thing.

That said, string formatting is better with f strings.

They also rolled back some the forced breakage from trying to force unicode with 3 which made a big difference. 3.3 added back u''

Lots of good cleanups: lstrip vs removeprefix, etc.

Underscores in numeric literals (10000000 vs 10_000_000)

So lots of good stuff still landing.
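A quick sketch of a few of those additions (f-strings are 3.6+, removeprefix is 3.9+, underscore digit grouping is 3.6+):

```python
n = 10_000_000  # underscores group digits; the value is unchanged
assert n == 10000000

assert f"n = {n:,}" == "n = 10,000,000"  # f-string with thousands separator

# removeprefix (3.9+) vs. the classic lstrip trap:
assert "prefix_value".removeprefix("prefix_") == "value"
assert "www.example".lstrip("w.") == "example"  # lstrip strips *characters*, not a prefix
```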


> ie, instead of

> for line in lines: print(line)

> we are supposed to be using

> while line := f.readline(): print(line)

No, we’re not. Walrus, in loops, IME, is more for replacing this pattern:

  while True:
    myvar = get_it()
    if not ok(myvar):
      break
    # code that uses myvar
with this pattern:

  while ok(myvar := get_it()):
    # code that uses myvar


False. I have been harshly attacked here on HN for suggesting things like for line in lines - literally been called "stupid".

I'm not the only one who looked at the recommended examples of the use case here and went, huh?

https://news.ycombinator.com/item?id=17450890

Recommended new way:

  if any(len(longline := line) >= 100 for line in lines):
     print("Extremely long line:", longline)
Old way:

    for line in lines:
        if len(line) >= 100:
            print("Extremely long line:", line)
            break

I prefer the old way. These were examples in the PEP!

In your example get_it() might be better as a generator or iterable. A lot of code looks great if you push that type of thing down a bit, and sometimes memory is helped as well. Then you iterate over it, for values in get_it. This keeps python very natural. You start to get a lot of weird line noise type code with := vs the old python style which while a bit longer was basically psudo-code.


I still don't see the need for things like the walrus-operator.

All it does is increase line-noise, and for what? So we don't have to write 2 short lines, or save an indentation level somewhere?

There is a good reason why assignments in Golang are not expressions, even though they are in C, and the language is otherwise deliberately close to the mindset of C; The added convenience makes the code much harder to read.

Sure;

    int c;
    while ((c = getchar()) != EOF) {
        // do something
    }
requires fewer lines than;

    int c;
    while (1) {
        c = getchar();
        if (c == EOF)
            break;
        // do something
    }
but the longer version is also easier to read, because each line carries less information. That's what people call "line noise".

IMO, := is a step in the wrong direction, and sadly I see python take more and more of these, going from the deliberately simple and clear language to something that's becoming needlessly hard to read by piling on things it doesn't even need.


Except exit.

I knew Python wasn't for me in my first foray into it when I fired its REPL and then went to exit it with control-C or whatever and it literally printed out the right way to do it but then didn't do it. Python was more interested in having me do things a certain way even when it knew what I intended to do, just to be a twit.


Ctrl-c raises a KeyboardInterrupt error, which is useful for programs to catch. If you type

   >>> exit
   Use exit() or Ctrl-D (i.e. EOF) to exit
You will get that error response. The goal is to have the REPL behave exactly the same as the scripting language. exit() is supposed to be called as a function to keep the language consistent, so just typing `exit` merely prints that message rather than exiting.


> which is useful for programs to catch.

Useful would be, if the default handler for SIGINT would not raise an exception, but have a useful default like eg. terminating the program. Go handles SIGINT this way by default.

If I want an exception, I can just tell the program:

    import signal
    signal.signal(signal.SIGINT, throwException)  # pass the handler, don't call it
The way it is now, the exception bubbles up to runtime, and if it isn't handled (eg. in the REPL) the program crashes, or worse, hangs if there are other threads of execution running:

    import threading
    import time
    def sleepN():
        for i in range(20):
            time.sleep(1)
    threading.Thread(target=sleepN).start()
    time.sleep(20)
Press c-C here, and the thread will still run, because the bubbled up Excp only kills the main thread. This is a real footgun in applications which rely on SIGINT being a termination signal, and have long running threads.


The REPL prints the value of a variable that you type in. exit is a variable, and so the REPL prints its value. If you want to run it as a function, you can do that, and indeed its string value is a message telling you to do that.

    $ python3
    Python 3.9.2 (default, Feb 28 2021, 17:03:44)
    [GCC 10.2.1 20210110] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> exit
    Use exit() or Ctrl-D (i.e. EOF) to exit
    >>> exit.eof
    'Ctrl-D (i.e. EOF)'
    >>> exit.name
    'exit'
    >>> exit = 42
    >>> exit
    42
    >>> exit()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'int' object is not callable
    >>>
I would have special-cased exit, though.


ipython has 'exit' without the parentheses.


It was a meme when the Zen was written; the spaces around the em dash are handled 3 different ways, twice in the very line you abbreviated, removing the joke.


Python finally ended up following Perl's TMTOWTDI motto! https://en.wikipedia.org/wiki/There%27s_more_than_one_way_to...


the zen of python was written in the 90s.

from that context it makes sense, because the only goal of python in the 1990s was to be more popular than perl, which was notorious for having many ways of doing the same thing.

but yeah, python has had significant feature creep over the years; it's nowhere near the small, clear language it used to be.


And still no expressive switch/case statement, breaking out of loops and ending scripts early (for explorative programming).


>no expressive switch/case statement

match/case (not a drop in switch statement)

>breaking out of loops

  break
>ending scripts early (for explorative programming)

  exit() or sys.exit()


I think by breaking out of loops they meant breaking out of nested loops.


exit() or sys.exit() kills the kernel, so for explorative programming in something like Spyder it is not that useful.


Are you looking for breakpoint()?


> And still no expressive switch/case statement

There's match/case in 3.10 - https://www.python.org/dev/peps/pep-0636/


Matplotlib is an example of a library with at least two "correct" ways of plotting


But only one of them is recommended - the one that makes less sense.


How is working with figure and axes objects the one that makes less sense?

Is it really that crazy do set up a figure, axes on that figure, and plot on the axes, returning an artist object for each plotting command?


Yes, it is crazy. I guess this isn't really the place for it but ... From the official docs:

    The Figure is the final image that may contain 1 or more Axes.

    The Axes represent an individual plot (don't confuse this with the word "axis", which refers to the x/y axis of a plot).
This is infuriatingly bad and I firmly believe that it makes sense only to people who already know how it works. There's an image, axes (this word alone is a crime), plot, figure... it's like they took a bunch of synonyms and arranged them randomly to put together an API.


>axes (this word alone is a crime),

why so? you prefer something like axiis?


See, that's the thing:

> Axes object is the region of the image with the data space.

In matplotlib axes is not the plural of axis. It has its own meaning specific to the API. And at the same time it's the plural form of another word (axis) which is also relevant in this context and it sounds almost identical when pronounced.


I like the wording in the MATLAB docs (since Matlab committed the original sin, the axes/axis/figure API has been around since the late '80s, matplotlib is just a port to python):

https://www.mathworks.com/help/matlab/ref/axes.html

https://www.mathworks.com/help/matlab/ref/axis.html

https://www.mathworks.com/help/matlab/ref/figure.html

So they emphasize the cartesianess of the axes.


I dunno. One sets global values everywhere, then collects them all into a plot. The other creates a bunch of apparently disconnected objects, sets a bunch of different attributes on each one, and then gets the plot from one of those objects.

If I was designing something like it, I wouldn't recommend either. The global one has many fewer WTFs per character, but the objects one looks like it works in a multithreaded program or that you can create more than one plot without displaying them (but I've never tested this).


one is more or less based on matlab's plotting procedures, the other is an attempt at a cogent OOP implementation. However, the OOP paradigm just doesn't seem very good for plotting.

Personally, I like plotting in R way better than in python. It has a lot better developer UX.


Which two ways?



It's sort of like the Unix Philosophy. It sounds good and is probably a good thing to strive for generally, but it's ultimately pointless when it comes to actually evaluating whether approach A is better than approach B.


> Complex is better than complicated

What? Complexity is something artificial that we try to avoid. Problems can be complicated, and we try to simplify them; the more complicated the problem, the more complex the solutions we tend to develop. So comparing them does not make sense?

Or did I always know them wrong?


Complex: consisting of many different and connected parts.

Complicated: consisting of many interconnecting parts or elements; intricate.

Nothing specifically artificial about either one. Software that is well decomposed is Complex (made of many smaller connected parts). Software that is poorly decomposed is Complicated (made of many smaller interconnected parts).

Connected vs interconnected?

Interconnected: connected at multiple points or levels (aka spaghetti code)


Complicated: this mutha is hard all by itself

Complex: we took all of these simple steps, lumped them together, now we have this


Yeah that was what I was trying to say!


It's not particularly well-worded. A lot of dictionaries list complex/complicated as synonyms.

I always took it to mean 'complex' as in having many connected parts, and 'complicated' more as in over-complicated or convoluted - the opposite of 'simple'. In other words, breaking something complicated into a system of intentionally-designed pieces is probably better than a chunk of opaque code to brute-force the current case. A good system is probably also 'simpler', despite having more pieces and interconnects.


I first encountered the notion of complex/complicated in Antifragile I believe, and IIRC it's based on the [Cynefin framework](https://en.wikipedia.org/wiki/Cynefin_framework).

My understanding is that:

* Complex domains lend themselves to experimentation and emergent behavior.

* Complicated domains lend themselves to analysis, expertise, and rule following.

The Wikipedia article offers the domains as containing "unknown unknowns" and "known unknowns" respectively.

I'm trying to think how this maps to Python -- the language is complicated, while the problems we're solving are expected to be complex? Or, maybe, the language lives at the boundary between complicated and complex. We push complicated procedures into the language, and let the programmers deal with complex issues?


Hmmm, it sounds like you're expecting "two" and "three" to be separate list elements because of some sort of implicit behavior due to being written in a list context. This is the opposite of what "Explicit is better than implicit" means.

This is a list and you must explicitly place a comma when you want to start a new element in the list. Is there ever a time a new element follows a previous one and is NOT separated by a comma? No, this is explicit.

Whereas, strings also always concatenate in this manner be it in a list context or not. It seems like you're assuming behaviors from other languages would be the same in another.
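For anyone who wants to see the behavior under discussion concretely, here's a minimal sketch; the only difference between the two lists is a single comma:

```python
# Both lines are valid Python; adjacent string literals concatenate
# silently, so the missing comma never raises an error.
separate = ["one", "two", "three"]
merged = ["one", "two" "three"]  # missing comma

print(separate)  # ['one', 'two', 'three']
print(merged)    # ['one', 'twothree']
```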


No, we don't want it to implicitly be a list item. We want it to fail as invalid syntax. If I wanted the two and three strings to be combined, I would have /explicitly/ used an operator for that. It's the implicit behavior of that which is the problem.


Not to mention the implicit string concatenation that you get instead.


Ah yes, why would anyone expect lists' main purpose to be listing?

Sarcasm aside, I'd assume people primarily list things in between [ and ], and sometimes concatenate things in there too. The language should err on the side of doing what people expect, unless explicitly told not to.

> It seems like you're assuming behaviors from other languages would be the same in another.

Rather, I think people expect a language, especially one this big and important, to work for them, and not to be designed with unergonomic features instead.


> it sounds like you're expecting "two" and "three" to be separate list elements

I'd expect that to be an error.


Funny enough, in dynamic languages i expect it to do something unexpected and unwanted.

This is why i like Go/Rust. I detest the implicit warts of these languages.


It's not related to being dynamic or not, it's a syntactical choice: that's also the way to concatenate string literals in C.


Well, there are dynamic languages and dynamic languages. There are Python and Ruby and there are Elixir, Erlang and Lisps.


I'm not a python programmer, but the implicit string concatenation seems surprising to me.


It's idiomatic in C.


I'm not a python programmer either, but I would be seriously annoyed at implicit anything instead of syntax error


Your sarcasm is misplaced. I would prefer a SyntaxError to either of the implicit behaviours.


I could see lisp programmers missing the commas out of muscle memory


> Is there ever a time a new element follows a previous one and is NOT separated by a comma?

Yes:

  [ "one, two", "three" ]
The comma is not an absolute context-free indicator of element separation.


This is not what implicit is about.


Implicit concatenation sure seems implicit to me


Implicit things are rarely nice in code for production environments. It makes bug tracing and security much more complicated


This is indeed the point. Some use cases are amazing and increase quality while others are just pure evil.


A lot of people are criticising dynamic typing for this.

It doesn't seem to have anything to do with typing discipline.

    words = (
        'yes',
        'correct',
        'affirmative'
        'agreed',
     )
Would be a tuple (immutable list) of strings, while

    words = (
        'yes',
        'correct',
        'affirmative',
        'agreed',
     )
would also be a tuple of strings.

If haskell had for some reason decided to have the same syntax sugar, it also would have caused an issue.
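To make the gotcha above executable, here's the same pair of tuples with a length check; only the element count gives the typo away:

```python
# The missing comma silently merges 'affirmative' and 'agreed'
# into a single element.
with_typo = (
    'yes',
    'correct',
    'affirmative'
    'agreed',
)
correct = (
    'yes',
    'correct',
    'affirmative',
    'agreed',
)
print(len(with_typo), len(correct))  # 3 4
```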


You got me for a second there.


I am a bit in shock. Accidental string concatenation. Python just lost a lot of reputation in my brain.


Misspelling a variable on the lhs of an assignment just causes a new variable to be created with the new name. That's a lot worse in my book.
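A minimal sketch of that failure mode (the variable names are made up):

```python
# A typo on the left-hand side silently creates a fresh variable
# instead of updating the intended one -- no error is raised.
total = 0
for i in range(1, 4):
    totl = total + i  # typo: `totl` instead of `total`

print(total)  # still 0
```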


I don't think that's the same kind of thing. Your example is a tradeoff that anyone who uses a language that doesn't require explicit variable declaration faces, and it's pretty tough to argue such languages really shouldn't exist.

A missing operator resulting in implicit behavior is much more subtle, and not at all obvious. For those who use Python, it is worse.


> ...it's pretty tough to argue such languages really shouldn't exist.

"Shouldn't exist" is too strong.

Dynamic languages that let you create a new variable via assignment shouldn't be used to create non-trivial software. How about that?

Scripting languages have a place. That place is 100% in creating quick-and-dirty scripts and tools. Or in doing some kind of one-off data transform (as is common in machine learning scenarios). Anything that has a life span of two weeks or less, or a code length of fewer than a hundred lines? Yeah, script languages rock for that.

Explicit/static typing adds vastly more value to large projects than the cost of the overhead. The fact that you can't really gain that value in Python means that Python should be relegated to quick and dirty scripts.

Same for JavaScript, Ruby, and other completely dynamic languages.

You'll note that all of these languages are getting types one way or another, meaning that there are a lot of people who do recognize their value. Though TypeScript is years ahead of the rest in the completeness and sophistication of its type system; bugs like the comma bug detailed by OP, along with essentially every JavaScript "wat" bug, simply can't happen in TypeScript in strict mode. And static types enable entire other categories of bugs to be detectable via a linter as well.


I've been building non-trivial software in dynamic languages for twenty years. They work great.

I'd take a project in a dynamic language with a decent test suite over a project without tests in a statically typed language any day of the week.


But a dynamic language needs all the tests a compiled language needs AND the type/syntax checks that the compiler handles in a static language.

There are reasons to use a dynamic language (or specifically Python), but I haven't heard one explanation of how it helps you write fewer tests.


> I'd take a project in a dynamic language with a decent test suite over a project without tests in a statically typed language any day of the week.

I'd take the opposite. I've read too many useless tests in python codebases that can be accomplished by a static type checker. "Decent" does a lot of heavy lifting in your comment. And what about a dynamically typed codebase without any tests? I'm sure they exist.

I'd rather dive into a big ball of mud with a compiler that will help point me to my mistakes before I release them, than having to sift through a ball of mud trying to find that mistake with production services flailing.

That all being said I've worked with both types of languages in successful projects. But I prefer the development experience of the statically typed variety.


  it's pretty tough to argue such languages really shouldn't exist
Well, I agree with OP so that is at least two people. I really don't see it as a good trade.


Explicit variable declaration is just adding a keyword (such as var or let) when you're declaring a new variable instead of modifying one.

The cognitive burden of having to memorize and look for which variables are new vs which are being modified is simply not worth it in my opinion, even for a scripting language. Maybe for esolangs, simple math or first time learning programming.

In any case, it's a short coming of the language (IMO) but not a deal breaker. We learn to live with it.


I'd say unexpected behavior is always worse than expected one.

Yes, you'll certainly find somebody who doesn't know what 'not statically typed' means, but ... And yes, there are also C(++) users who expect strings to be concatenated like that.


You seem to also not know what "not statically typed" means. It certainly does not mean "not properly scoped".


Yes, of course. But notice that no scope keywords exist in Python, while `+` does exist to concatenate strings (too).


I left python around the times of 2/3 drama, are nonlocal and global not there anymore?


Keywords like namespace, no; but functions and classes and modules provide for a lot of scoping opportunities.


The problem is `fop` should be `foo`:

    foo = 5
    fop = 6
Keywords like `let` solve this problem:

   let foo = 5
   fop = 6 # error


Not entirely:

  let foo = a();
  let foo = b(foo);
  let fop = c(foo);
  let foo = d(foo);
(Which is valid, e.g., in Rust.)


I hate variable shadowing; I'm very surprised that they allowed it in Rust. I saw unexpected behaviour caused by variable shadowing in C++ (a global variable hidden by a member variable) just last week.


You do get a warning, though. And most Rust projects I've seen usually adhere to 0 warnings.


A good IDE has many other safety nets for that error.

Auto completion, highlight matching variable, gray out unread variables and warning of unused assignment.

I’ve written lots of python and can’t recall ever having this issue. More likely is a logic typo of two similar variables like length_x vs length_y, where a “let” wouldn’t have saved you anyway if both are already defined.

JavaScript, pre strict TS, on the other hand, where a missing var implied global, was a real motherlode of bugs. Or Kotlin's "val" vs "var" changing semantics completely... wow. But those are different concepts from basic definition, I know.


Or := for declaration like Go and Toit


Yes, or another symbol instead of `=` for assignment, like `<-` (F#)


While I agree, this is somehow something I expect. Implicit string concatenation without operator or function around it sounds just like a terrible idea. It breaks the basic syntax concept of `foo X bar`. On the other hand it is probably very handy with DSLs and things like that.


Not so much DSLs, it's probably something as banal and ancient as

  usage = (
    'usage: foo [options] filenames...\n'
    '  -f force concatenation\n'
    '  -c for convenience\n'
  )
  print(usage)
Edit: forgot to add parentheses


Isn't that common for all/most languages that don't require explicit typing?


This is a scoping rule, not typing. Scoping is a mechanism of symbol resolution, i.e. what do you mean by `foo` at line N: is it a local, an argument, a global, or a symbol defined in an enclosing scope? Most languages use explicit local definitions, searching for implicit ones in outer scopes bottom-up, ending at the global scope. Python was the first popular non-BASIC language to make implicit assignments local, shadowing, and function-scoped:

  x = 1
  def setx():
    if True:
      x = 2  # completely different x, local to the function
    print(x)  # prints 2, visible outside of `if`
  setx()
  print(x)  # prints 1
This led to a funny keyword 'nonlocal', because you can't simply ignore scoping and pretend you're BASIC in any serious program.

(In my opinion, python had a good start, but got lost in the woods for no clear reason. It's a movie mutant of a language, which tried to appeal to non-programmers and somehow succeeded, and then realized that non-programmers eventually become programmers, and it's not hard. Now it's too late to fix this mess. End of opinion.)
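For reference, a minimal sketch of what `nonlocal` buys you (the counter example is made up for illustration):

```python
def counter():
    n = 0
    def inc():
        # Without the `nonlocal` declaration, `n += 1` would make `n`
        # a new local of inc() and raise UnboundLocalError.
        nonlocal n
        n += 1
        return n
    return inc

c = counter()
print(c(), c(), c())  # 1 2 3
```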


JavaScript (strict mode) doesn't have explicit typing, but it still requires variables to be declared.


Same for Perl.


Uh? Perl optionally requires you to declare variables, which is a good idea IMHO: no noise for small scripts, and any experienced Perl programmer will have learned that 'use strict' is a really good idea for big scripts.


It would be impossible in any language that requires either explicit typing or some kind of 'let' keyword. (Or, in the fringe case, a language like Go which uses a different operator for initialisation-plus-assignment.)


Exactly. That's why I asked about languages that don't require explicit typing. My point is that it's a feature of many languages rather than a Python idiosyncrasy.


Declaration and explicit typing are logically orthogonal, but few if any languages require typing but not declaration. Lots require declaration but not typing.


It is common to all languages that have the same syntax for definition and mutation.

In Scheme, for example, this is not an issue.


That’s a complaint against the entire type system, nothing to do with misspelling.


It has nothing to do with the type system? It's an issue with implicit declaration. You could very easily require explicit declaration while retaining the selfsame type system.


Huh, you're right. It would be bizarre to see something like this in Python though. I've never even thought of it as being implicit declaration.


I was going to comment something like "who would even use this?" and then I remembered that I have in fact used that feature :) It's a somewhat "nice" way to write long strings and keep the code from getting too wide. I never did it inside an array, but I found breaking up a long string into smaller ones and wrapping them in parens without a comma was convenient, for things like error messages.

But that's just what comes with a hyper flexible language like python. You can do lots of things in lots of different ways, but you can also screw things up just as easily, and your IDE won't tell you because technically it's valid code.
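Here's a minimal sketch of that "nice" usage (the message is made up): the pieces join at compile time into one string, with no `+` in sight:

```python
# Adjacent literals inside parentheses form a single string,
# keeping each source line short.
error_message = (
    "Failed to connect to the database: "
    "check that the credentials are correct "
    "and the host is reachable."
)
print(error_message)
```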


I completely get that. That is a very nice feature for building DSL or libraries with special needs. But it makes the overall language very dangerous.

Is this "operator" overloadable on each type in Python?

And that scares me a lot. I think I have to reevaluate my position towards Python.


It's not really an operator. It's part of the syntax of string literals. "foo" "bar" is an alternative way of writing the string literal "foobar". If foo is not a string literal, foo "bar" is invalid syntax.
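A quick way to check that this is literal-only syntax rather than an operator: the merge happens at compile time, and a name next to a literal is rejected by the parser.

```python
# Adjacent string *literals* merge at compile time...
s = "foo" "bar"
print(s)  # foobar

# ...but a name next to a literal is a parse error:
try:
    compile('x "bar"', '<demo>', 'eval')
except SyntaxError:
    print("SyntaxError")
```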


Okay... So it is not a implicit operator. That is good. Some small reputation points are regained.

Thanks.


Why not just use plusses? Or perhaps a join func, which would accomplish the same.

I get the use case as you described it, but it just seems like minimal effort to accomplish and have some semblance of explicit/safety.


or if that's the use case, require the whitespace to include a \n or \r\n... It's not like python doesn't have significant whitespace already.


That wouldn't fix most of the cases highlighted by the tool in the article.

So strange that Python has completely different syntax from C, but they chose to copy this obscure syntactic feature _even though they have the plus operator on strings_.


Heh. I use it all the time the way you do and didn't realize this is alien to many developers (no one on my team ever complained about it).

It's common in some languages and used the way you use it. I looked in PEP8 and it seems they don't discuss this.

I think it's a perfectly valid use case, but clearly there are two camps to this. If this is so contentious, I would recommend PEP8 be revised to either explicitly endorse it as a way to split long lines or to explicitly discourage it and recommend the + operator instead.


You could get the same behavior by enforcing a + operator in between:

  mylongstring = "hello" +
    "world"
No idea if python's way of indentations allows this but sounds like it should


No, it doesn't; you need parentheses:

    mylongstring = ("hello" +
       "world")
or, without `+`

      mylongstring = ("hello"
       "world")


Use \

    mylongstring = "hello " \
      "world " \
         "my " \
     "name " \
    "is"


The use of \ is discouraged in Python. From PEP8:

> The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.


See I knew Python just wants to be more like lisp.


Not sure if it's irony or not. After all, this is not really accidental string concatenation but an easy-to-make typo which can go undetected due to the dynamic typing (and the lack of thorough type annotation in most code).

The string concatenation in itself should not be a problem as it's really just string constants. (But again, it might be irony exactly because of this :) )


Unfortunately no irony.

I come from a programming platform (C#) where productivity is a key element of language design. I highly doubt that Anders Hejlsberg would have accepted such an error-prone concept as an invisible implicit operator on a key type like strings.


Well, I guess it's true for most languages that productivity is intended to be a key element of design. (For Python, definitely. But I also remember James Gosling saying this about Java.) This implicit concatenation seems to be inherited from C.

I kind of remembered that some languages support it for breaking strings into multiple lines conveniently. I'm a bit surprised that it works even on one line (I've never used it that way, because why would I have), but you're likely to make the mistake in multiline statements anyway. I've also checked and it doesn't work in Java (which I kind of remembered, though I mostly do Python these days).


> for breaking strings into multiple lines conveniently

What is inconvenient about just adding a + at the end or beginning of the line?


In most languages an array with 3 elements has the same type as an array with 2 elements so the type system isn't going to warn you about the difference between

("foo" "bar", "baz")

and

("foo", "bar", "baz")


They still tend to differentiate between 2- and 3 element tuples (but I agree that the implicit concatenation is problematic).
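As a hypothetical illustration in Python terms: a static checker such as mypy does track tuple arity, so it would flag the missing comma as a 2-tuple assigned where a 3-tuple is declared; at runtime the annotation is not enforced, so the code below runs silently.

```python
from typing import Tuple

# mypy would report an arity mismatch here; CPython itself does not
# check annotations, so this executes without error.
triple: Tuple[str, str, str] = ("foo" "bar", "baz")
print(len(triple))  # 2, not 3
```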


Fair enough. I was only thinking about the str vs tuple case. So when you have 2 elements in the parenthesis.


C/C++ has the exact same thing, no?


I really like the idea of automated code review tools that point out unusual or suspicious solutions and code patterns. Kind of like an advanced linter that looks deeper into the code structure. With emerging AI tools like Github Copilot, it seems like the inevitable future. Programming is very pattern-oriented and even though these kinds of tools might not necessarily be able to point out architectural flaws in a codebase, there might be lots of low-hanging fruits in this area and opportunities to add automated value.


Consider that you may be describing a compiler. Typos are not generally a problem in statically typed languages with notable exceptions such as dictionary key lookups etc.

Even without static typing, argument length verification etc. can be done with a suitable compiler. In python we are left chasing 100% code coverage in unit tests as it's the only way to be certain that the code doesn't include a silly mistake.


I think 100% code coverage is folly. Spreading tests so widely near-inevitably means they're also going to be thin. In any codebase I'm working on, I would focus my attention on testing functions which are either (a) crucially important or (b) significantly complex (and I mean real complexity, not just the cyclomatic complexity of the control flow inside the function itself).


Fully agree, but I never want to see a missed function argument programming error in customer facing code. In python you really do need code coverage to achieve this goal - static languages have some additional flexibility.


Or a rich suite of linters religiously applied. Never save a file with red lines in flymake or the equivalent. Ed: actually, I am unsure if my current suite would miss required parameters. I tend to have defaults for all but the first parameter or two, so not a big issue for me I guess. I do like a compile time check on stuff tho, one of the reasons I am doing more and more tools in Go.


I actually recently joined a startup working on this problem!

One of our products is a universal linter, which wraps the standard open-source tools available for different ecosystems, simplifies the setup/installation process for all of them, and a bunch of other usability things (suppressing existing issues so that you can introduce new linters with minimal pain, CI integration, and more): you can read more about it at http://trunk.io/products/check or try out the VSCode extension[0] :)

[0] https://marketplace.visualstudio.com/items?itemName=Trunk.io


cool product :) it is just linting or do any of the tools do code transformation to offer the fix for the lint failure? (code review doctor also offers the fix if you add the github PR integration)


If a linter provides autofix suggestions, we will propagate it all the way back to the user!


This is basically linting, i.e. code analysis. The techniques used might be more current (as they have been evolving, as you say, for pattern matching) but linting is just that: a code review tool to find usual bugs. (This is what did happen in this blog post. It wasn't looking for unusual solutions but usual mistakes.) The packaging, form of the feedback seems also different and that in itself may make a lot of difference in ease of use and thus adoption.


Admittedly, the difference here is that codereview.doctor spent time tuning a custom lint on a variety of repos. In an org with a sufficiently large monorepo (or enough repos, but I don't really know how the tooling scales there) it's possible to justify spending time doing that, but for most companies it's one of those "one day we'll get around to it" issues.


yeah something like sonarqube or https://codereview.doctor (if you use GitHub)


Or people could just write it correctly in the first place! Controversial I know! Seems like people would rather half-ass things and then let some AI autocorrect fix it up for whatever reason rather than doing it properly.


As a comparison, in Ruby

  puts "a" "b" == "ab" # true
and

  puts "a"
    "b" == "ab"
prints "a" with "b" == "ab" evaluated to false and discarded. This could create bugs as with Python. However

  ["a"
     "b"] == ["ab"]
is a syntax error at the beginning of the second line: the parser expects a `]`. It would evaluate to true if it were on one line.


In Ruby one too many commas can also cause problems:

  # list
  list = "a", "b",

  # function
  def foobar
  end

  # => ["a", "b", :foobar]


I actually prefer Python approach here in that within () [] {} newlines are simply whitespace with no special meaning - this allows for very flexible formatting of expressions which is still unambiguous.

The implicit concat of string literals is the culprit here. It really should require "+".


Ironic to see this today. I spent an hour debugging this very same issue this morning.

I was just doing some simple refactoring, changing a hard-coded string into a parameterized list of f-strings that's filtered and joined back into a string.

I’m glad that I had unit tests that caught the problem! I couldn’t figure out why it was breaking, that comma is very devilish to spot with the naked eye. I’m surprised my linters didn’t catch it either. Maybe time to revisit them.


I like this. It's clearly meant as marketing for their product, but imo the best kind of marketing. They don't just run their tool and automatically make tickets, but check for false positive and (offer to) make pr's.

It's both good for those projects and for the company that does the marketing, since they reach their exact target group. Plus it gets them on the front page of HN.


A great addition to prune a ton of false-positives is to check the length of the strings. Almost always, the intentional implicit concats will have a very long string that reaches the max line length, whereas the accidental ones are almost always very short strings.
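A sketch of that filter (the function name and the 40-character threshold are made up for illustration):

```python
def looks_accidental(left: str, right: str, short: int = 40) -> bool:
    """Hypothetical heuristic: implicit concatenation of two short
    literals is more likely a missing comma than deliberate wrapping
    of a long, line-length-limited string."""
    return len(left) < short and len(right) < short

print(looks_accidental("affirmative", "agreed"))  # True: probably a typo
print(looks_accidental(
    "usage: foo [options] filenames, each processed in order",
    "and written back to the output directory, overwriting",
))  # False: looks like deliberate line wrapping
```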


Nice! Internally we have a PCRE support on our code search and I regularly run a regex to find and fix these. I've also found a ton on opensource project which I've been trying to fix:

https://github.com/YosysHQ/prjtrellis/pull/176

https://github.com/UWQuickstep/quickstep/pull/9

https://github.com/tensorflow/tensorflow/pull/51578

https://github.com/mono/mono/pull/21197

https://github.com/llvm/llvm-project/pull/335

https://github.com/PyCQA/baron/pull/156

https://github.com/dagwieers/pygments/pull/1

https://github.com/zhuyifei1999/guppy3/pull/12

https://github.com/pyusb/pyusb/pull/277

https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull...

It is indeed a very common mistake in Python, and can be very hard to debug. It bit me once and wasted a whole day for me, so I've been finding/fixing them ever since trying to save others the same pain I went through.

EDIT: I will point out that I've found this error in other non-Python code too, such as c++ (see the 2nd PR for example).

Here's the regex for anyone curious:

[([{]\s*\n?(\s*['"](\w)+['"],\n)+(\s*['"]\w+['"]\n)(\s*['"]\w+['"],\n)*
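For anyone who wants to try it without a PCRE code search, the pattern as written also runs under Python's own `re` module (a quick sanity check; the sample strings are made up):

```python
import re

# The poster's pattern, verbatim.
pattern = re.compile(
    r"""[([{]\s*\n?(\s*['"](\w)+['"],\n)+(\s*['"]\w+['"]\n)(\s*['"]\w+['"],\n)*"""
)

buggy = 'things = [\n    "foo",\n    "bar"\n    "baz",\n]\n'
clean = 'things = [\n    "foo",\n    "bar",\n    "baz",\n]\n'

print(bool(pattern.search(buggy)))  # True: "bar" is missing its comma
print(bool(pattern.search(clean)))  # False
```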


Just to be clear, the V8 "bug" was in the test runner code and caused mis-parsing of command line options for testing for non-SSE hardware. Not exactly a critical bug.


The way the bug arrived in that test runner is interesting. It sneaked in mid-review. Possibly bugs added in the middle of code reviews are more likely to get through.

https://chromium-review.googlesource.com/c/v8/v8/+/2629465/3...

Personally, I prefer uniform lists with leading commas, because it's easier to add and remove lines for later, inevitable refactoring. For example, I prefer:

  things = [
    'foo'
  , 'bar'
  , 'baz'
  ]
This drives some people crazy, but I think it's the One True Way.


Isn't

  things = [
    'foo',
    'bar',
    'baz',
  ]
even better? In your case, if you want to add something to the beginning of the list you'll have to modify two lines.


Depending on the context, yes. But sometimes you are not allowed the last comma.

ETA: Let me expand on why it's important to put the comma first. Which list is more clear to you:

    a
  , dog
  , weather
  , banana
  , b
  , car
or

  a,
  dog,
  weather,
  banana,
  b,
  car
With the leading commas, they all line up, and you can see them in a neat little row. I really prefer it especially in contexts where the trailing comma is not permitted, such as a SQL query:

  SELECT
    name
  , date
  , operation
  FROM
    stuff


The whole "666" thing really threw me off. I thought it was some Python specific term or something at first glance. They open with a sentence that mentions "5% of the 666 Python open source GitHub repositories" as though there were only 666 total open source Python GH repos. Picking a number with other fun connotations or whatever to use as a sample is fine, but without setting that context, it was kind of distracting from their main content.


Did you figure out what the context is, and if you did, would you mind spelling it out for me? I still haven't figured out what correction to make to that sentence to get it to make sense.


In a blog post about the evils of typos there was a typo! Classic: https://en.wikipedia.org/wiki/Muphry%27s_law ;)


Also this classic:

> Apple I was the first product ever announced by the company in 1976. The computer was put on sale for $666.66 at the time.

https://9to5mac.com/2021/11/25/steve-woz-signs-rare-1976-app...


They ran their static analyzer over a sample of GH repos. They chose 666 as the number for their sample size. That's all.


It's further evidence that the Illuminati intentionally put these typo bugs there to destabilize the global order.


tl;dr: Python implicitly concatenates adjacent string literals, so ['foo' 'bar'] becomes ['foobar'], leading to silent bugs when a comma is accidentally dropped.

I've been bitten by this one at work, and can't help but think it is an insane behaviour, given that ['foo' + 'bar'] explicitly concatenates the strings, and ['foo', 'bar'] is the much more common desired result.

edit: This also applies to un-separated strings, so ['foo''bar'] also becomes ['foobar']
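A minimal demonstration of the behavior:

```python
# Adjacent string literals are merged at compile time, with or
# without whitespace between them:
items = ['foo' 'bar']          # missing comma: one element, not two
assert items == ['foobar']

also = ['foo''bar']            # no separator at all: same result
assert also == ['foobar']

intended = ['foo', 'bar']      # the comma is what makes two elements
assert intended == ['foo', 'bar']
```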


I assume it's based on the C behavior, where it can be handy with macros

I don't think it fits well in python


Maybe. We must remember that Python was designed at the very end of the 80s, so what was normal for developers back then could be unexpected nowadays. An example: the self in Python's OO is a C pointer to a struct of data and function pointers. It would have been perfectly clear to anybody writing OO code in plain C at the time (raising hand). Five years later, new OO languages (Java, Ruby) kept self inside the classes but hid it in method definitions.


But Python 3 was designed in the 2000s and had many breaking changes. Seems like they could have changed this behavior with that version.


I assumed it was borrowed from shell, where everything can just be put next to each other since it's all text.


It's a holdover from C, where implicit string literal concatenation is very useful in the preprocessor.


I luckily never accidentally used this space-concatenation thing, but in my early days learning Python I was bitten multiple times by the fact that a=(1) doesn't create a 1-element tuple.
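For anyone who hasn't hit this: it's the comma, not the parentheses, that makes a tuple.

```python
a = (1)    # just a parenthesized int: the parens only group
b = (1,)   # the trailing comma creates the 1-element tuple
c = 1,     # parentheses are optional; the comma alone suffices
assert a == 1 and type(a) is int
assert b == (1,) and type(b) is tuple
assert c == b
```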


I still don't understand why it doesn't! So I still get bit from time to time.


If a person decides to add parentheses to some booleans or arithmetic,

    (4 + 5) * (8 + 2)

    (this and that) or (theother)
These elements should not become 1-tuples after the interior contents are evaluated. I sometimes add parentheses even around single variables just for visual clarity.

Also, this allows you to do dot-access on int / float literals, if you want to

    # doesn't work
    4.to_bytes(8, 'little')

    # works
    (4).to_bytes(8, 'little')


In principle, a 1-tuple shouldn't even be a thing - any single value is a 1-tuple by itself already. However, in a dynamically typed language, this approach complicates things elsewhere - e.g. if you have a value / 1-tuple that is a list, you'd expect iteration over it to give you list elements, not the single element that is a list. But if you have a value that is a tuple of unknown size, you don't want to special-case iteration for when that size is 1.
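To illustrate the ambiguity described above: iterating a 1-tuple containing a list yields the list as a single element, while iterating the bare list yields its elements, so collapsing 1-tuples into their value would silently change iteration behavior.

```python
# A 1-tuple wrapping a list vs. the list itself behave
# differently under iteration:
assert list(iter(([1, 2, 3],))) == [[1, 2, 3]]  # one element: the list
assert list(iter([1, 2, 3])) == [1, 2, 3]       # three elements
```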


It depends what you mean by tuple. In Python, tuples are basically just immutable lists. Just as lists with 1 element are useful, so are tuples with 1 element. You might be dealing with a tuple of unknown length, where the length could be 1. In other contexts, the word "tuple" often carries the connotation of "having a known fixed length", in which case the notion of a 1-tuple as distinct from the value itself is less useful.


Presumably because parantheses don't really have anything to do with tuples, it's commas that do. Parantheses are there to help the parser group things in case of ambiguity, and to support expressions spanning multiple lines.


Since you typed it twice, I don't think it's a typo. It's parentheses not parantheses.


Thank you! I guess spelling from my native language is creeping over to English on occasion :)


This seems like not a big deal. It’s a common mistake and is in 5% of repos but it’s not causing major damage.

And there’s no evaluation of importance as to whether these instances are in test files or non-critical code. Packages are big and can have hundreds or thousands of files.

It could be that if these mattered, they would have been detected and fixed.

A good example for unit tests and perhaps checking to see if these bugs are covered or not covered.

I like these kinds of analyses but don't like them presented like they're some significant failure.


5% of 'released' software is quite a lot; more importantly, it's a class of errors that definitely should not exist. This is effectively a 'bug' in the language: there just isn't any real upside.

Python has a few of these things, which is really sad.


It's a class of error that would be caught by even the most basic testing. A better title for the article is that 5% of 666 Python repos have typos demonstrating that the code in them is completely untested. It doesn't matter which language it is: untested code is untested code in any language.


The errors were usually in tests themselves. Are you arguing that tests need their own tests to test that they are testing the right thing? Usually I think people believe that tests do not need to be tested and should not be tested, i.e., that you measure "100% coverage" against non-test code alone.


I don't think anyone could disagree: you could never exceed 0% code coverage if your definition was recursive (i.e. included tests, tests-of-tests, tests-of-tests-of-tests, ...).


Only if you generate infinitely many tests does your coverage approach 0%. But 100% covered code + 0% covered tests = ~50% total coverage.

Also, the obvious solution is self-testing code. (Jokes aside, structures like code contracts attempt something like this).


unfortunately like 10% of the bugs were in the tests themselves. e.g., the sentry one https://codereviewdoctor.medium.com/5-of-666-python-repos-ha...

the tests are only as good as the code they're written with, and as good as the code review process they were merged under.


One of the habits I have when writing kernel code is to intentionally break code in the kernel to verify that my test is checking what I think it's checking. That's because of a lesson I learned a long, long time ago after someone reviewed my code and caught a problem: when your code has security implications, you need to make sure the boundary conditions that your tests are supposed to cover actually get tested. Having implemented a number of syscalls exposed to untrusted userland over the years, this habit has saved my bacon several times and avoided CVEs.


I believe that, whenever possible, tests should be written in a different language than the one used for the code under test (even better, in a dedicated, mostly declarative, testing language).

It avoids replicating the same category of errors in both the test and the code under test, especially when some calculation or some sub-tests generation is made in the test.


"It's a class of error that would be caught by even the most basic testing. "

You could say that about anything and everything in software.

It's not acceptable that testing has to be the safety net for something the language should 100% accommodate.

The whole point of the language is to provide algorithmic clarity and avoid these things.

This isn't really an issue of 'trade-offs'; it's just a bad feature of the language that should have been remedied more than a decade ago.

The lack of proper declaration of variables is even more absurd, there's only downside to that.


I checked those 11 links to issues for major software. 10 bugs were actually in tests...


This is understandable since many of those projects are not written in python. So the python code in them is only in incidental scripts like test harnesses. If V8 was written in python then performance would probably not be very good.


I do not see this just from a verification perspective, but also from a productivity perspective.


9 out of 10, actually; the Tensorflow links are the same link.


There were proposals to fix some of these, but the Unicode zeal beat out some of the more boring (but, I'd say, just as important) cleanups.


yeah the impact varies. the sentry one seems pretty big: https://codereviewdoctor.medium.com/5-of-666-python-repos-ha...

The test did not work, but it did not fail either. Imagine being the dev maintaining the code that the test professes to cover. Imagine being the user relying on the feature the test was meant to check (if the feature under test actually broke).


I mean, if you're ultimately going to combine the list into a string anyway, it's no big deal.

Along those lines. I wonder how many of these come from ad-hoc file path handling instead of using pathlib.

