Hacker News new | past | comments | ask | show | jobs | submit login
Google Python Style Guide (google.github.io)
239 points by quick_brown_fox on Feb 11, 2023 | hide | past | favorite | 173 comments



It is a shame that the section on default arguments isn’t a bit longer. Perhaps it’s out of scope to talk about anti-patterns, but in my experience default arguments cause a lot of distress to a good code base.

Defaults are useful when you are providing a library function for other teams to use. If you’re inside a more private code base and doing work on the implementation of your team’s service then it is wise to avoid default arguments.

The problem is they provide a point after which it seems acceptable to add a flood of more default arguments. This is particularly the case for junior developers who lack confidence to refactor instead of patch. Default arguments go hand in hand with conditional logic and cause functions to bloat into do-everything multi-page monsters without any focus and no tractable flow of logic.

Forgive the contrived example, but what was once this:

  def greet(name):
    print(f"Hello {name}")
ends up becoming this, all because no one would bite the bullet and pick this apart into individual functions:

  def greet(
    name,
    language=None,
    io=None,
    is_ci=False,
    and_return=False,
  ):
    greeting = "Hello"
    if language:
      greeting = translate(greeting)
    message = f"{greeting} {name}"
    fn = print
    flush = False
    if is_ci:
      fn = log
      flush = True
    fn(
      message,
      flush=flush,
      io=io if io else stdout,
    )
    if and_return:
      return message

The slow rot of more and more defaults makes the function longer and longer. Moreover, each time someone adds a new option it gets harder to justify why they shouldn’t do it when the previous person was allowed.


I think the example is indeed contrived - you said it yourself!

To me it doesn't illustrate the problem with default parameters specifically.

For example it shows that the programmer doesn't know dependency injection and first-class functions. Printing could be passed in as a function param, but then it might actually be sensible to provide a default callable (eg, print), depending on how the greet function is going to be called.

Language seems to be a perfectly sensible thing to have a default on.

Then, and_return... That is, like, super contrived, man... I mean if a programmer doesn't know that a function caller can simply call for side effects and ignore the return value, they likely have much bigger problems than their judgement to use defaults or not.

I empathize with your plight though - you're probably a great programmer, and I think it's very difficult for someone who is good at their craft to come up with genuinely but subtly shitty examples.


You are right — the return thing is very silly. What about this:

  def greet(…, bow=False):
    …
    if bow:
      take_a_bow()
Except imagine take_a_bow() instead as 10 lines of code performing the post-greeting bow ceremony. That code takes additional arguments regarding what kind of hand flourish to perform while bowing. The flourish_type has to be an optional argument to greet (because bow is), but inside the bow you have to assert flourish_type is not None, because you can’t bow without knowing what flourish to give.
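A minimal sketch of that coupling, with hypothetical names (take_a_bow here is a one-line stand-in for the ten-line ceremony):

```python
def take_a_bow(flourish_type):
    # Stand-in for the ten-line post-greeting bow ceremony.
    return f"*bows with a {flourish_type} flourish*"

def greet(name, bow=False, flourish_type=None):
    message = f"Hello {name}"
    if bow:
        # flourish_type is "optional" only because bow is; whenever
        # bow=True it is actually mandatory, hence the assert.
        assert flourish_type is not None, "cannot bow without a flourish"
        message += " " + take_a_bow(flourish_type)
    return message
```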

I’ve seen some dark stuff over the years.

Also: Welcome to HN!


  def greet(bow=None):
    ...
    if bow:
      bow()

  def elsewhere(gesture):
    def bow():
      print(gesture)
    greet(bow)
Wrap that.


OMG please leave your dependency injection in your Java project and keep Python clean with its easy to use list of default parameters.


DI is not a pattern for providing defaults but one that separates dependency initialisation from your callable's responsibilities. Those dependencies are often a complex graph of inner dependencies that you would otherwise need to provide and initialise yourself, and a DI container can also decide whether a dependency lives for a single call, a request scope, a thread scope or a process scope. Injecting defaults from a configuration object or a properties/settings file is a very nice feature too.

Also stop being so religious and defensive when somebody mentions things that are not standard in your language of choice as nobody forces you to use this.

I was successfully building very testable and maintainable codebases (~50 kloc) in Python while also using a very small in-house DI framework, and it was subjectively (by me and my colleagues) much better than what we had before, when we followed standard Python patterns and the ways Python frameworks teach you to work.


What's the point of a DI framework? I never got the point of even thinking about DI explicitly like it's something special. It's so obvious that it hardly deserves to have a name, let alone a framework.


> (by me and my collegues)

Can I guess you were all Java developers?


No, around 6-7 Python developers including me; we used it in two different projects. Also it's probably worth mentioning that I'm not the author of that lib: my die-hard anti-Java colleague created it by just reading about DI, consulting with another colleague who had used Spring more-or-less since high school, and researching existing DI libraries. I myself did some programming in Java before that, but it was mainly hobbyist gamedev, later commercial Android and light Java backend work using Spring (that's where I first saw DI used), intermixed with around 7 years of professional Python backend programming in two different companies. Now, 10 years on, I'm a Java engineer; I've had enough of using dynamic languages to write moderately complex web applications.

I still love using Python for the REPL, small scripts or prototyping, and I think having things like mypy is great, as it takes away much of the burden without being a huge obstacle in situations where you really need duck typing. Also I'm thankful that it taught me early that the debugger is one of the developer's best friends and the best documentation is just reading the code.


Thanks yeah I'm just bitter and in a bad mood today. :)


You can use dependency injection in Python while remaining completely pythonic - nothing in the Python I write resembles Java. Dependency injection is not mutually exclusive with using defaults either, so I'm not sure what we're talking about here.
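For instance, a hedged sketch of pythonic DI (the names greet and emit are invented for illustration): the dependency is just a parameter, and a default keeps casual use easy.

```python
def greet(name, emit=print):
    # The output dependency is injected as a plain callable;
    # the default keeps casual call sites simple.
    emit(f"Hello {name}")

greet("world")  # uses print

# A test can inject a recorder instead of monkeypatching stdout.
collected = []
greet("world", emit=collected.append)
```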


For me, DI is where this happens at the public interface to things. I’m quite happy with subprocess.Popen’s keyword argument salad because they are relatively ergonomic and make good sense.

When that kind of “handy defaults for ya!” programming happens to a function inside a package… that has four call sites… all of which were added by the same team of three people… just refactor your stuff, get functional, and say explicitly what you actually need.



Yeah, pandas seems to flout a lot of the convention about not grouping lots of different control flow into a single function.

But at the same time I wonder how it would look refactored. How many read-from-CSV functions would we be left with?


> How many read from csv functions would we be left with?

It probably couldn't be split that way, because many parameters build on one another. Some are deprecated and others are clearly incompatible, but out of 50 parameters you could likely imagine calling this with 20 if the environment and the CSV you're ingesting are wonky enough.

I think feasible refactorings would be:

- rationalise currently separate parameters into meatier objects, e.g. there's at least half a dozen parameters which deal with date parsing, a dozen which configure the low-level CSV parsing, etc... Those could probably be coalesced into configuration objects

- a builder-type API, but you'd end up at the same result using intermediate steps instead of a function. Not really useful unless you leverage (1) and each builder step configures a non-trivial amount of the system, so rather than 50 parameters you'd have maybe 10 builder steps, each with 0~10 knobs

- or you'd build the thing as a bunch of composable transformers on top of a base parser

Of note: the latter at least might be undesirable from the Pandas POV, as it would imply layers of recursive Python calls, which might be much slower than whatever Pandas currently does (I've no idea).
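A hypothetical sketch of the first option above; the dataclass names and fields here are invented and deliberately tiny, not pandas' actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DateOptions:
    parse_dates: bool = False
    dayfirst: bool = False
    date_format: Optional[str] = None

@dataclass
class ParserOptions:
    delimiter: str = ","
    quotechar: str = '"'
    skip_blank_lines: bool = True

def read_csv(path, dates=None, parser=None):
    # Resolve missing option groups to their defaults; actual parsing elided.
    dates = dates or DateOptions()
    parser = parser or ParserOptions()
    return dates, parser

d, p = read_csv("data.csv", dates=DateOptions(dayfirst=True))
```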


I think that this style (such as it is) comes from R, and scientific computing more generally. I grew up with R and never realised how terrible long argument functions are until relatively recently.


`pyarrow`'s `read_csv` function[0] has just four default arguments (all defaulted to None): three option objects and one memory pool option.

  pyarrow.csv.read_csv(input_file, read_options=None, parse_options=None, convert_options=None, MemoryPool memory_pool=None)

You can then pass a `ReadOptions`[1] object if needed.

For example:

  read_options = csv.ReadOptions(
      column_names=["animals", "n_legs", "entry"], skip_rows=1)
  csv.read_csv(io.BytesIO(s.encode()), read_options=read_options)

You can see how ReadOptions is written on this link [2]. It's interesting they use a `cdef class` from `Cython` for this.

This doesn't solve all issues (the ReadOptions object and the others will inevitably have a bunch of default arguments) but I do think it's safer and it's easier to have a mental map of the things you need to decide and what's decided for you.

[0] https://arrow.apache.org/docs/python/generated/pyarrow.csv.r... [1] https://arrow.apache.org/docs/python/generated/pyarrow.csv.R... [2] https://github.com/apache/arrow/blob/master/python/pyarrow/_...


This is where you could use a builder pattern where you specify everything that diverges from the default using chained method calls.
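A sketch of that idea; CsvReader and its methods are hypothetical, and each chained call overrides exactly one default:

```python
class CsvReader:
    def __init__(self):
        # Defaults live in one place; chained calls override them.
        self._delimiter = ","
        self._skip_rows = 0

    def delimiter(self, d):
        self._delimiter = d
        return self  # returning self enables chaining

    def skip_rows(self, n):
        self._skip_rows = n
        return self

    def read(self, text):
        lines = text.splitlines()[self._skip_rows:]
        return [line.split(self._delimiter) for line in lines]

rows = CsvReader().delimiter(";").skip_rows(1).read("header\na;b")
```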


So you end up at the same point, but now you need additional intermediate structures and infrastructure which do nothing to help. And for Python specifically it's also a pain in the ass to format due to the whitespace sensitivity.


proving the comment's point - this is a library function! exactly the right case for default args


Yet, I've never had an issue using that function!


matplotlib has entered the chat.


You don't suggest any solution. Do you want more function overloading or maybe config objects?

Adding default parameters works well with existing code. It is not bad and lazy because it is easy.


Imagine two different call sites want to do two different things with the message. One wants to log() it, another wants to print() it. In my example this has been implemented by passing a flag to greet() to tell it what to do.

If greet() gave up responsibility for outputting the message and instead just constructed it, then your code would look like this:

  def site1():
    print(greet("x"))

  def site2():
    log(greet("y"))
And greet would be half as long.


Except wouldn't the ideal in this case be to use default parameters?

  def site(message, port=print):
    port(greet(message))


This bit in particular:

  if and_return:
    return greeting
Is a nice touch. I've definitely seen that pattern in the wild.


I can't conceive where this would be useful, could you expand on this?


I usually see it for things like “return_generator”. Then you need to write an overloaded signature to show that it could return a list or generator depending on that param.

Then it’s even worse when there’s also an “allow_raise” param, where the return type will not be None if allow_raise=True. Now you need to write 4 overloaded signatures to account for the 2 polymorphic params
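A sketch of that burden with invented names; even one boolean mode parameter already forces a pair of overloads, and each additional one doubles the count:

```python
from typing import Iterator, List, Literal, overload

@overload
def fetch(return_generator: Literal[True]) -> Iterator[int]: ...
@overload
def fetch(return_generator: Literal[False] = ...) -> List[int]: ...

def fetch(return_generator: bool = False):
    # The flag flips the return type, which is what forces the overloads.
    data = [1, 2, 3]
    return iter(data) if return_generator else data
```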


It's more likely to be listDir(withSizes=False)

I work on a very large codebase like this, where most functions return lists of strings/tuples and most DTOs are dictionaries with string keys, instead of classes with methods to retrieve information in different ways. As a result, parameters keep getting added to return information in more and more ways.


> Moreover, each time someone adds a new option it gets harder to justify why they shouldn’t do it when the previous person was allowed.

I’m not sure languages should be limited in order to avoid problems with the lack of project leadership.


I think the example is a bit grey.

In my opinion, a function should list its dependencies and allow changing them. Having said that, I don't believe the `is_ci` decision should happen in the function. The decision should happen at the entrypoint, and it should drive which implementations the code will use for the dependencies.

I would look for the cause of the rot in the function becoming the merge point of multiple contexts, not in the default values per se. As for whether default arguments make merging multiple contexts into a single function easier - code reviews might help here.

In any case, very good example


mmmm yes the entire R ecosystem…

Others also mention pandas

Data science workflows in general love to do this


SRP violation?


Pylint is extremely slow; I've preferred flake8 for some time. But now I'm looking into Ruff, which handles the whole codebase in less than a second (vs minutes). Happy with it so far.

https://github.com/charliermarsh/ruff

This article also highlights some pain points I have with python. They say avoid big list comprehensions, but there is no nice way of piping data in Python one can switch to instead. Especially with lambdas being so underpowered (so they also say to avoid).

They say to avoid conditional expressions for all but simple cases, and I agree. Which is what makes me wish everything (like ifs, switches/when etc.) in Python was an expression (à la Elm, Kotlin etc.). Because right now it's hard to assign a variable conditionally without having to re-assign / mutate it, which feels impure.

Default arguments being reused between calls is just weird. So I understand the rationale of Google's style guide, but it's a big flaw in the language imo; it causes lots of weird bugs for novice programmers.
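The reuse gotcha referred to here is that a mutable default is evaluated once, at function definition time, and then shared across calls:

```python
def append_bad(item, bucket=[]):
    # BUG: the [] above is created once and shared by every call.
    bucket.append(item)
    return bucket

first = append_bad(1)   # [1]
second = append_bad(2)  # [1, 2] -- same list object as `first`!

def append_good(item, bucket=None):
    if bucket is None:
        bucket = []  # a fresh list on every call
    bucket.append(item)
    return bucket
```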

I disagree on allowing the @property decorator. You think you're doing a lookup, but it's suddenly a function call doing a db lookup or so. Huge foot gun.

I feel the 80 character rule is too low, and often hard to avoid with keyword arguments, to the django orm etc. End up having to litter comments to disable it all over the place, and waste a lot of time when CI breaks.

As for formatting, whitespace etc. I'm over caring about that for languages. I just have some autoformatter set up to fix everything on save, and someone else can argue about the rules.


> This article also highlights some pain points I have with python. They say avoid big list comprehensions, but there is no nice way of piping data in Python one can switch to instead. Especially with lambdas being so underpowered (so they also say to avoid).

Write generator functions, i.e. functions that yield their results. They are surprisingly powerful. I usually find I need only one or two.
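For instance, a pipeline of two small generators (names invented) instead of one dense comprehension:

```python
def parse_ints(lines):
    # Skip blanks and convert the rest; values are yielded lazily.
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)

def only_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            yield n

# Stages compose without materializing intermediate lists.
result = list(only_even(parse_ints(["1", " 2 ", "", "3", "4"])))
```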


Have been using Ruff for a week; it's so awesome that I'm surprised the VSCode extension for Ruff has only 8500 users.


I only started a week or two ago as well, and I had to go double check directly on the Ruff git repo to see if the extension was legit. I just couldn't believe the real one had so few installs.


ruff is the ticket. it replaced isort, flake8 for me, never looked back


it also doesn't know how to deal with simple constructs like match...case

I agree that ruff seems to be the way forward, it's (almost) at feature-parity, it is extremely fast, but I think it needs polishing


I just tried ruff last night and ran into the match-case support issue. I'm following https://github.com/charliermarsh/ruff/issues/282 and looking forward to trying ruff again once that issue is closed.


I was gonna ask if match case wasn't a really recent thing, but it seems to be from 3.10 released in October 2021.


I feel like it has taken quite a few tools some time to catch up. Or if they've caught up, it wasn't straight out of the box. Like, I had to upgrade my linter to handle match cases, but that bump (as we were running an older version) also introduced new rules breaking other code. Since I didn't want to take on that work right then, I rewrote it as an old-fashioned if statement and put a task in the backlog to upgrade the linter.

So I've actually seen little use of match cases so far.


to be honest it is fairly recent and is not exactly "basic", I'm sure ruff people will end up covering it in the near future, I just thought it might bring some balance to mention that it doesn't yet go a hundred percent


IIRC this is Ruff's criteria for 1.0


A Python style preference I'm rapidly deriving is as soon as there's more than two arguments to a function, especially when you're mixing in args with defaults, I'm thinking about enforcing kwarg syntax only at a language level. Just to force the code to be more readable at the call site.

E.g.,

    def bla(self, a, b, c=2, d=True):


    self.bla("x", "x", 3)

Ain't no way in hell I'm wanting those arguments to be used positionally by callers. They're getting THE KWARG IS MANDATORY STAR.

    def bla(self, *, a, b, c=2, d=True):

    self.bla(a="x", b="x", c=3)
It adds more vertical space to your code when you have meaningful argument names, but it makes it a lot clearer what is what, and the language enforces what previously was just good style.

It's especially important when Python's mocking comes into play.

In languages like Java, you can determine what is what based on types... (with static imports for both)

    X x = new X(mock(Y.class));
Makes it far clearer what X is working with than

    x: X = X(MagicMock())


Your mock example could be improved (in both languages) by defining a variable for the Y mock. You probably want a variable anyway, so you can define the mock’s behaviour or verify it was used correctly.

Also, Java’s a pretty bad example, as it lacks default arguments and named arguments, leading to the ugly and verbose builder pattern. (Side note, why didn’t Java add them yet, after so many years of their usefulness being seen in Scala, Kotlin, C# to name a few related languages?)


While it's not required, there's nothing stopping you from doing any of:

    x: X = X(MagicMock(Y))
    x: X = X(MagicMock(Y(...)))
    x: X = X(MagicMock(spec=Y))
which all make it fairly clear that the Mock is of Y. (or better yet, use fakes instead of mocks)


Yeah it's the fact that it's not required that makes me insert the *.


Kwargs is a feature I miss now I work in a typescript/nodejs code base.

Sure, you can use objects, but it's not the same.


In TS, an object is close to being kwargs. My main issue is that it’s much more verbose since you separately write the destructure and type, so you have to write the args twice.

But one nice thing is that consumption is less verbose than Python. In Python you write “greet(name=name)” but in TS you write “greet({name})”


  2.14 True/False Evaluations
  Use the “implicit” false if at all possible.

This one is my personal bug-bear. I find this:

  if not users:
     ...
significantly worse than:

  if users == []:
      ...
The second is totally explicit, reminds the reader that users is (expected to be) a list and makes it totally clear that we can only enter the conditional block if users is an empty list.

The first option:

a) obfuscates the type of users on first reading

b) evaluates to True if users is None (or LOADS of other things?!) which can lead to hard-to-find bugs.

Granted, type-checking can help here but purely from a readability perspective the second option seems way more friendly and for almost no downside. The same holds true for all of the "False-y" objects:

  if users == {}:
  if users == 0:
  if users is None:
  if users == ():
  if users is False:
Why is the implicit:

  if not users:
an improvement in any of these cases?

  If you need to distinguish False from None then chain the expressions, such as if not x and x is not None:.
!!!

Why not just:

  if x is False:

?


Dynamic languages work because they are very polymorphic across types. Your function is more general if it doesn't have to know whether you pass a list or a tuple or a dict; the implicit false will work on all of those. Making strict type checks quickly makes dynamic typing impossible to work with, and then you start to require strict typing everywhere, and at that point I'd not work in Python but in some other language.

This goes for when you make a library as well: your library will be easier to use if you are less strict about the inputs you take, since that allows your users to work in a more naturally dynamic way. I love static types, but I have worked on making Python libraries, and there, accepting a wide range of inputs is an important part of usability.


On the other hand, big codebases without type hints rely on their engineers remembering types of every argument. Or the functions being overly defensive.

To each their own; I prefer knowing what argument types a function accepts so I don't need to think about it and can focus on writing business logic. If the function could accept more types, I'd just improve it.


I used type hints everywhere in python, they are orthogonal to what I talked about.


Isn't your first paragraph all about not knowing what argument types the function accepts just looking at its declaration, as it defeats the purpose of using a dynamic language?


Type hints support generics and abstract interfaces; you use those to declare what behaviour you rely on within the function, and then you try to do what you need as dynamically as possible.

That is for library code, maybe it would be too cumbersome to try to do that for code with less reuse. I have never worked on a large python codebase that wasn't a library so I'm not sure what is best there.


This makes it easier for errors to go unnoticed in large codebases. If the function expects to take a list, and someone passes a tuple, it's likely that they passed the wrong value by accident.

For beginners, and in toy examples, it's kind of neat when code bends over backwards to work. Take this example:

    >>> def all_uppercase(lst):
    ...     return [s.upper() for s in lst]
    ...
    >>> all_uppercase('hi')
    ['H', 'I']
    >>>
Kind of neat, right? It's almost like a joke in code. Ha ha, iterating over a string gives you strings! But the charm of finding cute things to do with unexpected inputs doesn't scale. What is a helpful attitude at a small scale translates to "errors should manifest as far away as possible from the programming mistake that caused them" at a large scale. 99 times out of 100, if your code expects a list and somebody passes a tuple, they want a stack trace, not a return value.

> I have worked on making python libraries and there accepting a wide range of inputs is an important part of usability

I work with a large Python codebase at work, and this is a frequent source of frustration. I frequently track down bugs and find that on some untested code path our code passes nonsensical values of the wrong type to a third-party library, and the library just... finds some way of interpreting it.

Even if all the code paths get tested, they can't be tested with every possible input. Property-based testing seems like overkill for our application, and our tests are already almost slow enough to be annoying. And what if the third-party library is side-effecting in a way that's hard to test? It gets mocked out. And I find that the mocks are configured to expect the nonsensical values, because the original programmer found that they "work."

All because libraries don't want to make an unfriendly impression by throwing a stack trace.


> Why is the implicit:

> if not users:

> an improvement in any of these cases?

Function iterates over the input. User provides a list,

     if users == ():
test fucks up because lists and tuples are never equal.

Literally no gain, only pain.


I don't quite understand. In the case where the function is expecting a list, why would you want to execute the logic for "empty list" on an empty tuple?

Wouldn't this just be the developer using the wrong comparison for the types the function is expecting (hence more reason to be explicit instead of using the implicit false)?


> I don't quite understand. In the case where the function is expecting a list, why would you want to execute the logic for "empty list" on an empty tuple?

Because for most functions that's not a relevant or useful distinction, in Python a tuple is an immutable list, both are sequences.

By mis-handling empty tuples you're just unnecessarily constraining the caller. Not only that, but you might also create an inconsistency which is hard for the caller to notice if your function only fucks up on empty collections.


Good point!


A good answer is because you don't necessarily know if the thing you're getting is a list, a tuple, or a RepeatedCompositeFieldContainer (a protobuf list), or some other type that meets the Sequence/MutableSequence abstract base class contract. `if not foo` will check that they're all empty, while `if foo == []` will have unexpected behavior if foo starts returning a set tomorrow.

The generalization of this is to code against as generic an API as possible; you wouldn't write `list.__eq__(x, y)` in your code, but you're suggesting almost exactly that.

(granted you can still run into this kind of issue if foo is a generator, but that's a less common way to explode).

The style guide does tell you to use explicit `x is None`, instead of implicit bool when checking noneness, specifically to disambiguate between binary and ternary values, but usually that's not what you want.
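A sketch of the difference (first_or_none and first_strict are invented names): `if not seq` reads emptiness through `__bool__`/`__len__`, which every built-in container implements, whereas `== []` is only ever true for an actual list.

```python
def first_or_none(seq):
    if not seq:  # empty list, tuple, set, dict, str ... all handled
        return None
    return next(iter(seq))

def first_strict(seq):
    if seq == []:  # only an actual empty *list* compares equal
        return None
    return next(iter(seq))  # so an empty tuple falls through and blows up

first_or_none([])   # None
first_or_none(())   # None, as intended
first_or_none([5])  # 5
```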


>type-checking can help

If we use Python as a strongly typed language, it makes no difference which one you use.

If we don't (i.e. use Python as it is: a dynamically-typed language), then this is just a preference.

Using (or exploiting, depending on how you think of it) truthiness this way is actually an intentional choice in lots of cases, especially if you have an "else" condition.

Think of it this way: you're going to split the conditions in two: `users` is non-empty, which is the "good" condition; and `users` is empty, which is the "bad" condition.

Then you have the unexpected condition that `users` is something it shouldn't be, most commonly None. In most cases, this is a "bad" condition, so it makes sense that it's grouped together with `users == []`.

If `users` is True or False as you said (which you should prevent from happening in other ways anyway), then indeed it will not be captured by `users == []`, but it would still be broken/unmanaged on the "else" side.


> If we use Python as a strongly typed language, it makes no difference which one you use. If we don't (i.e. use Python as it is: a dynamically-typed language),

Nitpick: Python is strongly typed, it's also dynamic. The strongly-weakly typed axis is different from the static-dynamic axis.


Thanks for letting me know


> Think it this way: you're going to split the conditions into two: `users` is non-empty, which is the "good" condition; and `users` is empty, which is the "bad" condition.

In a lot of cases though, an empty collection isn't a "bad" condition at all, e.g. it's a valid collection to apply filters/maps to.

Similarly when people get used to doing "if not i" for ints, but then forget about the times that zero is a valid value.

It's true that dynamic coercion is a feature of the language, but coding conventions generally are often about enforcing "least surprise" to remove a burden from the person reading the code.


> an empty collection isn't a "bad"

Then you don't need to check if it's empty to begin with.


I think they're making the distinction between special cases and error cases.

An empty list could be valid (e.g. a search for users returning no results), so you still need to differentiate between "special cases that need special logic" and "bad input".

Lumping the two together in one `if` block makes any code less readable imo, because they're not the same thing.


Yes, agreed, but the trouble is that if you see "if not users", you have to second-guess the intent behind it.

Is an empty list being routed to the else branch because it is an error in this instance or because it's an error in 90% of the codebase so the author forgot to handle it explicitly here?

Or is the author always expecting users will be a full or empty list and that other falsy values will never occur?


Ideally you'd use `if len(x) == 0`, which handles lists but also list-like things like tuples, while not letting None and False through.


len will count the object whereas “not” will only check for truthiness. This can matter in cases where counting takes significantly longer; for example, a sql query set.
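A toy illustration of that distinction (LazyRows is a made-up stand-in, not Django's QuerySet): truthiness can be answered cheaply, while `len()` forces a full count.

```python
class LazyRows:
    def __init__(self, rows):
        self._rows = rows
        self.counted = False  # records whether a full count was forced

    def __bool__(self):
        # A real ORM could answer this by fetching at most one row.
        return len(self._rows) > 0

    def __len__(self):
        # A real ORM would run COUNT(*) here -- potentially expensive.
        self.counted = True
        return len(self._rows)

qs = LazyRows([1, 2, 3])
if qs:            # truthiness check: no count performed
    pass
```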


This is purely subjective. First of all, I prefer putting type hints everywhere; PyCharm helps remind one in a lot of cases.

If there is a bug and somebody passes invalid type to my function, I would just fix it and move on.

Very often both None and empty list are not an interesting case and I return early. Thus `not users` makes sense.

There are cases, though, when None means a sane default should be used instead and you can't use default argument values due to mutability. In these cases `users is None` makes perfect sense.

There are also cases I explicitly check for True and False - tests. In such cases I wouldn't rely on truthy/falsy values and assert True and False values by reference.

Having said that, it's all subjective. You like this style, someone else likes another style. What ultimately matters are two things: automatic formatters and consistency.


One thing I haven't seen anyone else mention, besides it not being idiomatic Python, is that "if users == []" and its ilk allocate a new empty object for the comparison. It's unnecessarily slow and wasteful.

Others have mentioned that the comparison to, say, a tuple will also fail. If the intent is to ensure a list instance, use isinstance instead.


You are 100% right. Truthiness leads to bugs. I would have thought that was well known by now.


When you're comparing collection literals, often type checking is part of equality.


No, this isn't explicit enough:

    if users == []:
Nor is this:

    if (users == []) == True:
Nor is this:

    if ((users == []) == True) == True:
...

/s


I hate style guides.

Probably the worst thing about this one is the pydoc. And not just pydoc: pydoc, javadoc, doxygen. It is all useless garbage that litters the code with useless comments explaining that the get_height method "gets the height", while at the same time nobody actually explains anything remotely useful in comments.


This seems a rather broad generalization. Style guides are certainly very useful, especially when paired with auto-formatting, since they remove a lot of useless arguing.

Sure, we can discuss whether all the points make sense, but it seems excessive to dismiss them entirely just because you dislike mandatory method docstrings.


"They remove a lot of useless arguing". It is not like I have not heard this argument before. Unfortunately, I have also experienced how it works out in practice. It goes more or less like this: "No, your desired change to the coding style cannot go into the formatting rules, because we want to prevent useless arguing, so we follow such-and-such style guide to the letter." "Actually, you are not following such-and-such style guide to the letter, because these formatting rules are not according to such-and-such style guide." "Yes, we decided to have these changes because we think they are good." Utterly despicable.

So besides the useless comments I also have that experience described in the previous paragraph.

And to top it all off, another 'nice' experience with style guides was being forced to used the long discredited 'hungarian notation' which at the time was already long past its expiry date with not really anyone competent believing it to be a good thing.

Maybe my bad experiences are not typical but I have had so many bad experiences with style guides that I am now at the stage where I consider anyone enforcing a style guide to be my personal enemy and most likely a despicable person.


I don't think it is an overgeneralization. Whenever I see lint rules regarding comments applied, the comments routinely end up being no-shit-sherlock comments.

There probably are ways to encourage good comments, but enforcing that somebody write A comment, ANY comment, in specific places at the point when they're eager to merge is pretty much a recipe for shitty comments.


If you are terrible at writing docs, then that’s your fault, not the style guide’s for enforcing documentation. Some fields might be self-explanatory and the docs for them might be pointless, but it’s better to be overzealous here rather than allow people to skip documenting stuff, because then nothing will be documented.

Also, the requirement to have documentation comments is just one rule out of hundreds, so why focus on that one?


I disagree that it's better to be overzealous here, because it reinforces the habit of writing subpar docstrings. Alternatively, if you always do the work (and it is work) to write excellent docstrings, it will affect the way you write code by discouraging you from splitting code into small, obvious functions. It just makes it slightly more annoying.

Just as with comments, I think you should assume the reader is a somewhat competent human being. Explain what needs to be explained, and make a *conscious* decision about it. For most public functions I agree that this usually involves a docstring, but not always.


Actually, I am quite good at writing docs. That is why I hate mandated comments in certain places. I would like to write better docs than that. In fact your comment tells me you (or perhaps your colleagues) are the one who is terrible at writing docs. "because then nothing will be documented" says it all. I actually would write docs voluntarily and they would actually be helpful.

Why focus on that one? Because it turns out that rule will force me to have 50% of the text that I see in a file be these useless comments.


Some things can’t be linted and require a code reviewer to point them out. Noting the absence of documentation where it is clearly needed would be one of those things.

  def get_height():  # noqa
    …
Is just as annoying as useless documentation.


>On the other hand, never describe the code. Assume the person reading the code knows Python (though not what you’re trying to do) better than you do.

># BAD COMMENT: Now go through the b array and make sure whenever i occurs

># the next element is i+1

In practice style guides will usually advise against the kind of comment you're blaming them for.


It forces you to make that decision explicitly. Yes sometimes you just put "Gets the height" and move on. But at least you thought about it and those functions that do need clarification get it because you are reminded of it and the codebase is better for it.


No, it turns the code base into absolute garbage where 50% of the characters on display are part of a comment that no one needs.


Pydoc comments aren’t always for the reader of the code. They’re for people reading the API reference documentation and the code completion window.


I always roll my eyes when I see some "documentation" like that. Usually such a community has some overly strict pre-commit hooks you are supposed to use and tons of tooling as well, but the quality of documentation is still bad. Then people lean back and think that they have good documentation now.


When you dogmatically require that every function must have a comment, that's what happens, regardless of whether it makes sense or not. There's even tools like GhostDoc which will automatically generate useless (and sometimes hilarious) comments for you, just to show how pointless it is, especially when it's only for satisfying another equally useless tool that checks for their presence.


Functions are not usually as obvious as the creator thinks they are.

Does get_height() include the optional border width or not? What does it return while the item is hidden? Does it return an int or float? Can it return a string value like "auto" or a null value? Under what circumstances should its value not be relied upon, e.g. while an animation is active? Why was it written as a function rather than a property? And so forth.

Requiring a comment is helpful in reminding the author to explain why this code was even separated out as a separate function in the first place, and all of the potential gotchas around it.
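For instance, a Google-style docstring can pin down exactly these questions. A minimal sketch (the `Widget` class and its semantics are made up for illustration):

```python
class Widget:
    """A hypothetical UI element, used only to illustrate the docstring."""

    def __init__(self, height: float, hidden: bool = False):
        self._height = height
        self._hidden = hidden

    def get_height(self) -> float:
        """Returns the content height in pixels, excluding any border.

        Returns:
            The height as a float. 0.0 is returned while the widget is
            hidden. Callers should not cache this value while an
            animation is active, as it may change between frames.
        """
        if self._hidden:
            return 0.0
        return self._height
```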


Gets the height of what? In what?


self, # height is in meters

(also use numericalunits if you have so many dimensions that they're easily mixed up)


You are right. I give you too little information.

  class Rectangle:
    ...

    def get_height(self) -> Micrometer:
      ...
Now we clearly need to sufficiently document this method by saying "gets the height of the Rectangle in Micrometers".

And yes, this is the garbage kind of answer that one gets whenever one brings up the point that I brought up. Honestly, I am already fondly hoping you will never be a colleague of mine.

This kind of 'documentation' has to be the worst and most disgusting kind of cargo culting ever invented in programming.


No need to make things personal.

Personally, I’d agree with your example. But does this apply to all other situations? Often documentation comments can be very useful


I'd definitely agree here. There's a lot of things that cannot be expressed in a language, and need comments.

Even for e.g. get_height, does it make a DB call? Is it expensive and shouldn't be called in a tight loop? Can it ever return 0?


You're making the case against comment guidelines: It's poor engineering to resort to a comment to tell you all that.

if get_height returns a simple value, then it "smells" like it should be a simple getter.

So regardless of your comment, people will use it exactly like that and get burnt. Your comment only helps once they're already suffering enough to go digging for why things are so slow.

If the operation needs a DB call it should be exposing a way to handle the infinitely higher likelihood of failure with something like a result type, and it probably shouldn't be a simple blocking call

-

No one is saying you should never have comments by the way, but pydoc-style guidelines where you're encouraged to fill out a template end up with terrible signal to noise.

Comments should be seen as an absolute last resort and a liability.

The only parts of comments that can be automatically refactored are the absolute least useful ones. Things like intent aren't magically pulled out and updated automatically (yet), so you're now putting the onus on every single person who ever touches that code again to keep your comment up to date; otherwise it can end up worse than nothing.


Since auto complete works great in all code editors, each variable and function name should be allowed to be long enough to explain its type and uses.


Absolutely not.

I worked on a team like this who regularly wrote function names well into the 160-200 character range. It was useless.


To those that wish to automate a subset of these conventions, there is a tool called Sourcery[1] that I, personally, am a huge fan of! Not only does it have a large set of default rules[2], but it can also allow you to write your own rules that may be specific to your team or organization, and as mentioned it can enable you to follow Google's Python style guide as well[3].

There are some refactorings that Sourcery suggests that I don't agree with myself, namely the usage of 'contextlib.suppress'[4], as I don't like to introduce an additional 'import' statement just to do something so trivial. I wish Sourcery would factor the cost of having too many 'import' statements into its heuristics.

---

[1]: https://sourcery.ai/

[2]: https://docs.sourcery.ai/Reference/Default-Rules/ (expand the sub-pages)

[3]: https://docs.sourcery.ai/Reference/Optional-Rules/gpsg/

[4]: https://docs.sourcery.ai/Reference/Default-Rules/refactoring...


Related:

Python Style Guide from Google - https://news.ycombinator.com/item?id=11839332 - June 2016 (27 comments)

Google's Python style guide - https://news.ycombinator.com/item?id=3861617 - April 2012 (86 comments)

Google Python Style Guide - https://news.ycombinator.com/item?id=1311126 - May 2010 (23 comments)


One thing I never understood from it is the recommendation to avoid staticmethods, classmethods[1]. I was befuddled by it at first, looked it up on moma, and even asked around, but never got a convincing answer during my time there. IIRC, their C++ style guide had even stronger opinions, diverging from the norm.

[1]: https://google.github.io/styleguide/pyguide.html#217-functio...


@staticmethod is kinda pointless in two ways. One, it has the same effect as @classmethod, except the first argument, so you can always just use @classmethod. Two, you could argue that static methods don’t make much sense in a language which has top-level functions (unlike Java). If something is part of the class but does not make use of the class state and should be usable from outside of the class, then it should be a plain function, not a @staticmethod. (Overzealous linters that suggest @staticmethod when the method could be a function, or when the method is part of some public interface and does not necessarily have to be static in other implementations, are dumb.)
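A minimal illustration of the overlap (the `Parser` class and its methods are made-up examples):

```python
class Parser:
    @staticmethod
    def strip_bom(text: str) -> str:
        # Uses no class state at all, so it could just as well be a
        # module-level function.
        return text.lstrip("\ufeff")

    @classmethod
    def from_bytes(cls, data: bytes) -> "Parser":
        # Receives the class, so it also does the right thing for
        # subclasses: Subclass.from_bytes(...) returns a Subclass.
        return cls()


def strip_bom(text: str) -> str:
    # The plain-function equivalent of the static method above;
    # behavior is identical.
    return text.lstrip("\ufeff")
```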


I think the argument is that static methods are just functions; they are not strongly tied to a class or its objects, so let's not have two kinds of entities (normal functions and static methods) that do the same thing.

Class methods are a weird thing that can be used to affect all objects of a class, so are kinda global in a sense, which is why they should be avoided.


static methods are nice for clarifying scope


To add to the sibling comments, nudging you into module level constructors for classes will encourage more modularity in general.

If you have a class like this with a separate constructor in the same scope as the class…

  class Cow:
    def __init__(self, name):
      …

  def random_cow() -> Cow:
    return Cow(uuid())
…you are more likely to roll this all up into farm.cow than you are to lump all the animals together in a single farm module.

Modularity is nice of course because it helps you step away from implementation detail (close the file, forget about how it works, and just use it) and your code gets split up into little pieces that helps your e.g. bazel monorepo build/test work efficiently.


I seem to remember they used to indent Python code with 2 spaces. Glad to see it's 4 now.


I liked the look of 2 spaces tbh. Code had a certain "compact" feeling to it combined with the 80 column limit.


For an even better feeling, combine 2 spaces with a more sensible line length. Why does the Google Python style guide allow only 80 columns and 4 spaces, but the Java style guide allows 100 columns and 2 spaces? Python may be simpler than Java, but the difference feels too large IMO, and it’s time to get rid of old TTY vestiges and embrace the larger screens.


When dealing with screen splits, max line length is important. If I have my screen split vertically, I can't display more than 90 characters per line without having to horizontally scroll or deal with soft wrapping even with a large screen. If I'm resolving conflicts using a 3 way vertical split with diff3, having a shorter line length is even more important.


No, anything above 80 characters per line doesn’t fit two documents on the screen at the same time. Not everyone codes with a font size of 10.


This depends on your font size and on your screen size. I can manage 230 columns with 14px Consolas on a 1080p screen, which is plenty for two 100-column documents, and almost enough for 120-column. And even if it won’t fully fit, most lines don’t reach the length limit, and for the remaining ones, in a two-file scenario, you could enable the soft word wrap feature of your editor, or just cut them off and scroll a little when absolutely necessary.


Fwiw, I didn’t care about any of this until I worked with a visually-impaired programmer.

In fact, a better article would have been to try and predict the eye powers of a programmer based on their formatting choices.


Java is much more verbose than python, lines get a lot longer.


Which is why I use 3. I noticed 4 was easier to read, but was too bloated, so why not 3? Is there some specific reason why they want even numbers? 3 seems so right to me.

Edit: Did someone seriously downvote me for how many spaces I indent with? Funniest downvote I've gotten so far.


I am pretty sure it used to be 2 spaces as well. Some public repositories such as https://github.com/tensorflow/tensorflow/tree/master/tensorf... appear to use 2 space indent throughout.

I think all these will be covered by the "be consistent" clause, and whoever made the first commit decides the style.


It’s strange that some code in the style guide is still 2-space indented, though. For example, all the code from 3.19.3 until the end of the document.


I agree 4 spaces is preferable, but working in a code base with both conventions due to legacy code is hardly ideal.


It's still 2 spaces. This is the public version of the style guide, using 4.


Content aside, as a front end dev, I am liking the clean design of that site:

* no goofy animations (no slide down when the table of contents expands, just BOOM open)

* no distracting images / icons

* no dark mode toggle (blasphemous nowadays, I know...)

* black and white

* logical font sizing for section / sub section numbers

Just clean and simple. My compliments to the designer.


> no dark mode toggle (blasphemous nowadays, I know...)

Why the hate ? I love an option for dark mode or else I go blinded by the lights on my monitor.


No hate, just a preference.


I might not be understanding you, but are you saying your preference is that the preference not be offered?


See also how yapf defines the Google formatting style:

https://github.com/google/yapf/blob/v0.32.0/yapf/yapflib/sty...


I would always prefer keyword argument calling in a professional environment.

The exception to this rule could be functions that have only one argument.


About the relative imports: https://google.github.io/styleguide/pyguide.html#233-decisio...

The guide states this is unclear:

    import jodie
I agree, but why not use:

    from . import jodie


I was happy to see “Nested local functions or classes are fine”

Being a mostly Lisp developer, I like to nest local functions in order to close over locally defined variables. Glad that is considered OK in Python.


I’m personally defaulting to pylance, flake8, black, mypy (strict), autoflake and isort to keep my Python code sensible. That said, it usually takes me like 15 minutes to set this all up in VSCode.


Someone in this thread mentioned Ruff. I think you might be interested.

https://beta.ruff.rs/docs/


Can you please share a guide on your setup?


Golang has the right idea. There's an inbuilt language style, and what you think doesn't matter, because there's already a formatting tool and we'll all bend to its will.


Black has been around a while and has pretty widespread use: https://pypi.org/project/black/


Black is opinionated, but you just run it, accept the opinions, and never think about formatting after that. No debates over how to format a code block; focus on the actual logic being implemented.


yup, after i embraced black, i can't go back (no pun intended) to not using it. it makes reading code so much easier.


It’s been a while since I’ve given black a shot, but I recall getting some really gnarly/ugly line wraps out of it and thinking it made the code less readable. I liked the idea but not the subjective style choices the devs made


Black defaults to a line length of 88. While it is slightly larger than the old PEP 8 standard of 79, and while it was chosen with the intention of avoiding some ugly line wraps, it is still quite small and still leads to ugliness. Try raising the limit to a more reasonable 100-120 and see if it helps.


I mostly don't like how removing a parameter to a function can make it become 1 line, and then in the diff I have no idea of what has happened.

In reviews I always ask that there must be a separate formatting commit, at the end.

Also, because our builds fail if the code is not formatted, that means constant reformatting and moving around of commits.

In the end the time wasted to start the container to run black (if you use the distribution one, every version formats differently), to run black (which is terribly slow), and juggle the commits around is hardly worth it.

However I believe from a management perspective it gets rid of discussions about style in the reviews, so it looks like time is being saved because now the developers waste it each on their own in silence, without communicating.


All your points are right in what I’d say is the most common ambient professional environment.

We had an internal debate about how to gauge code quality. One camp only allowed the combination of black format plus coverage. To play devil's advocate, I suggested counting the number of asserts removed or added per merge request.


The way I think about it is that if black's output looks weird, you should probably restructure your code. It's not _always_ true, but often enough it means the code can be made simpler, some values created in flight can be given names, and so on.


Not saying there’s harm in realizing this, but the weirdest formatting choices in black seem to arrive deep in numerical formulae, where formatting actually makes differences in clarity. Under our typesetting-perfectionist overlords, our only recourse is to grow variable names so that the formulae make sense.

And this defeats trying to reuse variable names from publication, so that a more-international audience can follow ‘ss’ rather than sum_of_squared_residuals_dude.

Julia promotes this as a war cry over Python.


Formatting is one small part of a style guide, and yes in every self-respecting place formatting is automated. What is then left is more interesting and important: naming, code organization, management of state, ...


Formatting is only a subset of style, there's plenty of questions it doesn't solve even in Go. Off the top of my head: value v pointer receivers, named return values, public fields v accessors, var v :=, in-band versus out-of-band signalling, closures in `go` statements (and even raw `go` statements at all), channel types, iface side-casting, etc...


And generally my answer to all of that, for both Go and Python, is: be consistent with yourself and with the codebase. And if you're going to change stuff in a big way, have a reason. And if you don't have a reason, don't change.


> And generally my answer to all of that, for both Go and Python, is: be consistent with yourself and with the codebase.

That's nice but not everybody thinks that way, and when the project gets big enough inconsistencies get introduced everywhere so there is no way to be "consistent with the codebase" unless you impose consistency via a style guide (and ideally a linter, so reviewers don't have to be the one performing style checking)


80 chars line width is too small imho. This isn’t the 90s anymore.


Why so? I prefer a shorter width to a longer one because it's easier to scan. I can also have multiple tabs open without having to scroll.


> Use the “implicit” false if at all possible.

I understand that it makes the code less verbose, but I don't agree that it is "less error prone", as stated.

> May look strange to C/C++ developers.

Yes..


Nitpick

“2.5 Mutable Global State

Avoid mutable global state.

[…]

2.5.2 Pros

Occasionally useful.”

So, are these pros of mutable global state or of avoiding it? Elsewhere pros and cons appear to refer to the description of the guideline, not the title (see 2.2 Imports).


There are pros to using it and pros to avoiding it. That’s not a contradiction because they are different pros. Just avoid it if you can, unless the pain of doing so is too great to justify.


" [...]

2.5.4 Decision

Avoid mutable global state."

This style guide touches both pros and cons then states the decision.


It's worth noting that Guido himself had an, um, "interesting" time getting Python Readability at Google.


This is cool but my biggest issue with python is how difficult it is to just use without blowing up my workstation.

Honestly, even with a version manager it can become a nightmare, and it’s the primary reason I’ve stayed away from it. Also, I’m not really a mathematician or someone who needs to use the extensive Python libraries to do some cool AI stuff.

Here’s to hoping I never have to deal with actually maintaining or working on a python codebase. Cheers!


> without blowing up my workstation

You should file a bug report. Sounds like an über P1 because of the physical workstation damage.


Ha ha. Very funny.

But I meant more that even though I was using pyenv, it somehow broke my gcloud CLI, and it took me a few minutes of frustration to get things working again because I had to install some other dependencies I didn’t have. Eventually I just ended up using it in a VM. :/

Not exactly the most pleasant dx compared to the other programming languages I work with on the daily.


1. create virtualenv: python3 -m venv bla

2. activate virtualenv: . bla/bin/activate

3. install stuff: pip install blablablabla

4. do things

5. remove bla and repeat if you want to start clean
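The steps above as a shell sketch (`bla` is the throwaway env name from the comment; `requests` stands in for whatever you actually need):

```shell
# 1-2. create and activate a throwaway virtualenv named "bla"
python3 -m venv bla
. bla/bin/activate

# 3. install packages into the env, not the system Python
pip install requests

# 4. do things...

# 5. tear it down; delete the directory to start clean
deactivate
rm -rf bla
```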

> Here’s to hoping I never have to deal with actually maintaining or working on a python codebase. Cheers!

In this case, just use distribution packages.


What exactly is the problem with using the virtualenv? It ain't rocket science; hell, it isn't even "AI development".


Are there any similar guide for React/TS/JS. Or frontend in general?


Prefixing with underscores to pretend that a variable is private still seems like the most pointless and ugly thing ever.


Why is it pointless to communicate that a function should not be called from the outside? Also, most autocomplete tools respect this and only show them when you start typing _.

I think it’s actually great that in Python you can still call these functions in a pinch.
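A quick sketch of the convention (the `Cache` class is hypothetical):

```python
class Cache:
    def __init__(self):
        self._entries = {}  # leading underscore: internal detail, not API

    def get(self, key):
        return self._lookup(key)

    def _lookup(self, key):
        # "Private" by convention only: linters flag outside access
        # (e.g. pylint's protected-access warning) and autocomplete
        # hides it by default, but Python itself does not stop you.
        return self._entries.get(key)


c = Cache()
print(c.get("missing"))      # None, via the public API
print(c._lookup("missing"))  # also works, "in a pinch"
```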


> linters will flag protected member access

Sometimes you've got internal code. Sometimes people use it. Communicating which code is internal helps people avoid depending on it. Allowing people to access internals allows them to decide if it's worth the risk, and sometimes it is worth the risk. Tooling support makes it easier to work with this model.


We mostly prefix function and method names, not variables.

Prefixing module variables with an underscore is a bit strange.


How else do you communicate to a user of your code in a language with no access controls that there be dragons if you mess with an object’s internal data structures?

Type hinting could do it today but you’d need a PEP to add it.


You can use postfix if you want.


Google again with the not invented here syndrome


WHAT, always use four spaces instead of TAB?? Oh my god.


I feel like I'm either too young or somehow sheltered to understand why people care about this. Hasn't it been very standard for a while now to press the tab key and have your editor insert 4 spaces?


Yeah, probably. I'm not so comfortable with all the whitespace in Python, but the thought of having to tap space 4x constantly while writing code is so hugely off-putting to me as an idea. I'd rather just continue pressing tab a single time, all the time, until the day I die.


I don't think anyone taps space four times; they just set tab to insert four spaces. So from a typing perspective, there's no difference.

It's a pretty standard setting in every editor I've used, including vim.


Every single editor has some equivalent to `expandtab`, so you just use that instead of pressing space x times.


Do you have to press tab 3 times if you're inside blocks?

Do you use notepad.exe or something more ancient like edlin?


For reference, this has always been the recommended style in PEP 8 (initially published in 2001). https://peps.python.org/pep-0008/#indentation


Jesus Christ


In comparison, Godot's python-like GDScript prefers tabs:

https://docs.godotengine.org/en/stable/tutorials/scripting/g...

I have come to appreciate tabs here, but you have to have already established tabs as the prevailing convention for this to work.



