September 13, 2019 · Python

Should you use "dot notation" or "bracket notation" with pandas?

If you've ever used the pandas library in Python, you probably know that there are two ways to select a Series (meaning a column) from a DataFrame:

# dot notation
df.col_name

# bracket notation
df['col_name']

Which method should you use? I'll make the case for each, and then you can decide...

Why use bracket notation?

The case for bracket notation is simple: It always works.

Here are the specific cases in which you must use bracket notation, because dot notation would fail:

# column name includes a space
df['col name']

# column name matches a DataFrame method or attribute
df['count']

# column name matches a Python keyword
df['class']

# column name is stored in a variable
var = 'col_name'
df[var]

# column name is an integer
df[0]

# new column is created through assignment
df['new'] = 0

In other words, bracket notation always works, whereas dot notation only works under certain circumstances. That's a pretty compelling case for bracket notation!
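To make two of these failure modes concrete, here is a minimal sketch on a made-up DataFrame (the column names and values are invented for illustration). It shows what dot notation actually returns when a column is named like a method, and what happens when you try to create a column through attribute assignment:

```python
import warnings

import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({"col_name": [1, 2, 3], "count": [10, 20, 30]})

# Column name collides with a DataFrame method:
# dot notation gives you the method, bracket notation gives you the column
dot_result = df.count
bracket_result = df["count"]
print(callable(dot_result))            # True (it's the DataFrame.count method)
print(type(bracket_result).__name__)   # Series

# Attribute assignment sets a plain Python attribute, not a column
# (pandas may emit a warning here depending on the value, so it is silenced)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.new = 0
created_by_dot = "new" in df.columns
print(created_by_dot)                  # False

# Bracket assignment creates the column
df["new"] = 0
created_by_bracket = "new" in df.columns
print(created_by_bracket)              # True
```

The other cases (spaces, keywords, integers) never get this far with dot notation, because something like df.class is a syntax error before pandas is even involved.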

As stated in the Zen of Python:

There should be one-- and preferably only one --obvious way to do it.

Why use dot notation?

If you've watched any of my pandas videos, you may have noticed that I use dot notation. Here are four reasons why:

Reason 1: Dot notation is easier to type

Dot notation is three fewer characters to type than bracket notation. And in terms of finger movement, typing a single period is much more convenient than typing brackets and quotes.

This might sound like a trivial reason, but if you're selecting columns dozens (or hundreds) of times a day, it makes a real difference!

Reason 2: Dot notation is easier to read

Most of my pandas code is made up of chains of selections and methods. By using dot notation, my code is mostly adorned with periods and parentheses (plus an occasional quotation mark):

# dot notation
df.col_one.sum()
df.col_one.isna().sum()
df.groupby('col_two').col_one.sum()

If you instead use bracket notation, your code is adorned with periods and parentheses plus lots of brackets and quotation marks:

# bracket notation
df['col_one'].sum()
df['col_one'].isna().sum()
df.groupby('col_two')['col_one'].sum()

I find the dot notation code easier to read, as well as more aesthetically pleasing.

Reason 3: Dot notation is easier to remember

With dot notation, every component in a chain is separated from its neighbors by a period. For example, this line of code has four components, so there are three periods separating them:

# dot notation
df.groupby('col_two').col_one.sum()

If you instead use bracket notation, some of your components are separated by periods, and some are not:

# bracket notation
df.groupby('col_two')['col_one'].sum()

With bracket notation, I often forget whether there's supposed to be a period before ['col_one'], after ['col_one'], or both before and after ['col_one'].

With dot notation, it's easier for me to remember the correct syntax.
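Whichever one you find easier to remember, the two chains select the same thing. A quick sketch, assuming a small made-up DataFrame with the same column names as above:

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame({
    "col_one": [1, 2, 3, 4],
    "col_two": ["a", "b", "a", "b"],
})

# Same selection written both ways
dot_totals = df.groupby("col_two").col_one.sum()
bracket_totals = df.groupby("col_two")["col_one"].sum()

print(dot_totals.equals(bracket_totals))  # True
print(dot_totals.to_dict())               # {'a': 4, 'b': 6}
```

So the choice here is purely about how the code reads, not about what it computes.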

Reason 4: Dot notation limits the usage of brackets

Brackets can be used for many purposes in pandas:

df[['col_one', 'col_two']]
df.iloc[4, 2]
df.loc['row_label', 'col_one':'col_three']
df.col_one['row_label']
df[(df.col_one > 5) & (df.col_two == 'value')]

If you also use bracket notation for Series selection, you end up with even more brackets in your code:

df['col_one']['row_label']
df[(df['col_one'] > 5) & (df['col_two'] == 'value')]

As you use more brackets, each bracket becomes slightly more ambiguous as to its purpose, imposing a higher mental burden on the person reading the code. By using dot notation for Series selection, you reduce bracket usage to only the essential cases.
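For a side-by-side feel of the difference, here is a small sketch (the data and row labels are made up) that runs the filter and the row lookup from above in both styles and checks that they agree:

```python
import pandas as pd

# Hypothetical data and row labels, only for illustration
df = pd.DataFrame(
    {"col_one": [3, 8, 9], "col_two": ["value", "value", "other"]},
    index=["r1", "r2", "r3"],
)

# Boolean filtering: brackets only for the filter itself...
dot_filtered = df[(df.col_one > 5) & (df.col_two == "value")]
# ...versus brackets for the filter and for every column selection
bracket_filtered = df[(df["col_one"] > 5) & (df["col_two"] == "value")]

print(dot_filtered.equals(bracket_filtered))  # True
print(list(dot_filtered.index))               # ['r2']

# Selecting a single value from a Series by row label
dot_value = df.col_one["r2"]
bracket_value = df["col_one"]["r2"]
print(dot_value == bracket_value)             # True
```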

Conclusion

If you prefer bracket notation, then you can use it all of the time! However, you still have to be familiar with dot notation in order to read other people's code.

If you prefer dot notation, then you can use it most of the time, as long as you are diligent about renaming columns when they contain spaces or collide with DataFrame methods. However, you still have to use bracket notation when creating new columns.
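If you go the dot notation route, one way to stay diligent is to check which columns need renaming before you start. The helper below is a hypothetical sketch, not part of pandas: dot_unsafe_columns is a made-up name, and the check is a heuristic that flags labels that aren't valid Python identifiers, are keywords, or match an existing DataFrame attribute.

```python
import keyword

import pandas as pd


def dot_unsafe_columns(df):
    """Return the column labels that can't be selected with dot notation.

    Hypothetical helper (not a pandas function). A label is treated as
    safe only if it is a string, a valid identifier, not a Python
    keyword, and not already an attribute of the DataFrame class.
    """
    unsafe = []
    for col in df.columns:
        is_safe = (
            isinstance(col, str)
            and col.isidentifier()
            and not keyword.iskeyword(col)
            and not hasattr(pd.DataFrame, col)
        )
        if not is_safe:
            unsafe.append(col)
    return unsafe


# Made-up columns covering the cases from the bracket notation section
df = pd.DataFrame(
    {"col one": [1], "count": [2], "class": [3], 0: [4], "col_two": [5]}
)
unsafe = dot_unsafe_columns(df)
print(unsafe)  # ['col one', 'count', 'class', 0]
```

Anything this returns either needs to be renamed (for example with df.rename) or selected with brackets.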

Which do you prefer? Let me know in the comments below!

Addendum

There were some thoughtful comments about this issue on Twitter, mostly in favor of bracket notation.
