Counting occurrences in Python with collections.Counter

Trey Hunner
Python 3.8 to 3.12

Python's Counter class is one of the most useful data structures that's also frequently overlooked. Counter objects are mappings (dictionary-like objects) that are specially built just for counting up occurrences of items.

I'd like to share how I typically use Counter objects in Python.

What is a Counter?

Python's collections.Counter objects are similar to dictionaries but they have a few extra features that can simplify item tallying.

>>> from collections import Counter
>>> colors = ["purple", "pink", "green", "yellow", "yellow", "purple", "purple", "black"]
>>> color_counts = Counter(colors)
>>> color_counts
Counter({'purple': 3, 'yellow': 2, 'pink': 1, 'green': 1, 'black': 1})
>>> color_counts['purple']
3

Creating a Counter object

There are two ways you'll usually see a Counter object made:

  1. Using a for loop
  2. By passing in an iterable

Here's an example of using a for loop to increment keys within a Counter object:

from collections import Counter

words = Counter()
for word in text.split():
    words[word.strip(".,!?\"'")] += 1

Note that this is similar to a dictionary, except that when a key doesn't exist within a Counter that key's value will default to 0.
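
As a quick sketch of that default (the pets list is made up for illustration): looking up a key that was never counted returns 0 rather than raising a KeyError, and the lookup on its own doesn't store the key.

```python
from collections import Counter

pets = Counter(["cat", "dog", "cat"])

# Looking up a key that was never counted returns 0 instead of raising KeyError.
missing = pets["hamster"]

# The lookup alone does not store the key, so membership and length are unchanged.
has_hamster = "hamster" in pets
size = len(pets)

print(missing, has_hamster, size)
```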

Here's an example of passing an iterable to Counter:

from collections import Counter

words = Counter(w.strip(".,!?\"'") for w in text.split())

Note that we're passing a generator expression to the Counter class here. It's pretty common to see a generator expression passed to Counter if the items you're counting need a bit of normalizing or altering before they're counted up (we're stripping punctuation in our case).

Of these two ways to use Counter, passing an iterable directly into Counter is simpler and usually preferable to using a for loop.
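
To see that both approaches produce the same counts, here's a small runnable sketch. The text string is a made-up sample, since the snippets above assume a text variable already exists.

```python
from collections import Counter

# Sample text, made up for illustration.
text = "Hello, world! Hello again, world."

# Approach 1: increment keys in a for loop.
looped = Counter()
for word in text.split():
    looped[word.strip(".,!?\"'")] += 1

# Approach 2: pass a generator expression straight to Counter.
passed = Counter(w.strip(".,!?\"'") for w in text.split())

print(looped == passed)
print(passed)
```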

Let's look at some of the most useful operations that Counter objects support.

Getting the N most common items

The feature I use Counter for most often is the most_common method.

The most_common method is like the dictionary items method but sorts the items by their values (their counts) in descending order.

>>> color_counts.items()
dict_items([('purple', 3), ('pink', 1), ('green', 1), ('yellow', 2), ('black', 1)])
>>> color_counts.most_common()
[('purple', 3), ('yellow', 2), ('pink', 1), ('green', 1), ('black', 1)]

Unlike the items method, most_common also accepts a number to indicate how many of the most common items you'd like (it returns all items by default).

>>> color_counts.most_common(2)
[('purple', 3), ('yellow', 2)]
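
There's no matching "least common" method, but since most_common() with no argument returns every item, a reversed slice of its result works. This is a sketch of that slicing idiom, with n chosen arbitrarily:

```python
from collections import Counter

color_counts = Counter(
    ["purple", "pink", "green", "yellow", "yellow", "purple", "purple", "black"]
)

n = 2
# most_common() returns every item, highest count first, so walking
# backward from the end of that list gives the n least common items.
least_common = color_counts.most_common()[:-n - 1:-1]
print(least_common)
```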

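Keep in mind that if there's a tie for the n-th most common item, only one of the tied items makes the cut. As of Python 3.7, items with equal counts are ordered by when they were first encountered, so the item counted earliest wins. For example, here there are two items that tie for "most common item" but most_common(1) just returns the one that was counted first: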

>>> color_counts["yellow"] += 1
>>> color_counts.most_common(2)
[('purple', 3), ('yellow', 3)]
>>> color_counts.most_common(1)
[('purple', 3)]
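
Here's a minimal sketch of that tie behavior, assuming Python 3.7 or later, where items with equal counts keep the order in which they were first encountered. The example strings are made up:

```python
from collections import Counter

# Both letters appear twice, but "b" is counted first.
b_first = Counter("bbaa").most_common(1)

# Same letters and counts, but now "a" is counted first.
a_first = Counter("aabb").most_common(1)

print(b_first, a_first)
```

If you need a different tie-breaker (alphabetical, say), you'd have to sort the items yourself.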

Examples of getting the most common items

Here we're asking for the 5 most frequently seen characters in a string:

>>> from collections import Counter
>>> message = "Python is pretty nifty!"
>>> Counter(message.casefold()).most_common(5)
[('t', 4), ('y', 3), (' ', 3), ('p', 2), ('n', 2)]

Or the most common word in a string (assuming there's no punctuation):

>>> lyric = "don't worry about it just do what you do and do it good"
>>> Counter(lyric.split()).most_common(1)
[('do', 3)]

Or, using a regular expression, we could get all words that appear more than once displayed in descending order of commonality, with punctuation removed:

>>> from collections import Counter
>>> import re
>>> bridge = """
... If you read the album cover by now
... You know that my name is what my name is
... When I came in here to try and
... Do this, something I've never done before
... Mr. Jones, Booker T., said to me
... Don't worry about it
... Just do what you do
... And do it good
... """
>>> words = re.findall(r"[A-Za-z']+", bridge)
>>> for word, count in Counter(words).most_common():
...     if count <= 1:
...         break
...     print(word)
...
do
you
my
name
is
what
to
it
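
Note that the counting above is case-sensitive, so "Do" and "do" are tallied separately. If you'd rather merge them, one option is to casefold the text before splitting it into words. A small sketch using two of the lines above (the line variable is just for illustration):

```python
from collections import Counter
import re

line = "Do this, something I've never done before. Just do what you do"

# Casefolding first makes "Do" and "do" count as the same word.
words = re.findall(r"[A-Za-z']+", line.casefold())
counts = Counter(words)

print(counts["do"])
```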

Adding items to a Counter

Like dictionaries, Counter objects have an update method:

>>> letters = Counter("hi")
>>> letters.update({"a": 1, "b": 1, "c": 2})
>>> letters
Counter({'c': 2, 'h': 1, 'i': 1, 'a': 1, 'b': 1})

But unlike dictionaries, the update method on Counter objects is usually used to count additional items:

>>> letters = Counter("hi")
>>> letters.update("hiya")
>>> letters
Counter({'h': 2, 'i': 2, 'y': 1, 'a': 1})

You can pass an iterable to update and the Counter object will loop over it and increase the counts of those items.
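
This is handy when the items arrive in pieces. A sketch, assuming some hypothetical batches of log levels:

```python
from collections import Counter

# Hypothetical batches of log levels, standing in for data that arrives in pieces.
batches = [
    ["INFO", "INFO", "ERROR"],
    ["WARNING", "INFO"],
    ["ERROR", "ERROR"],
]

levels = Counter()
for batch in batches:
    # Each call adds this batch's occurrences to the running counts.
    levels.update(batch)

# A mapping of counts is added to the existing counts, not swapped in
# the way dict.update would replace values.
levels.update({"INFO": 2})

print(levels)
```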

Subtracting items from a Counter

Counter objects also have a subtract method:

>>> colors = Counter()
>>> colors.subtract(["red", "green", "blue", "green", "blue", "green"])
>>> colors
Counter({'red': -1, 'blue': -2, 'green': -3})

If we only ever subtract items from our Counter, every count is negative, so the most_common method will return the least-subtracted items first:

>>> colors.most_common(1)
[('red', -1)]

It's rare that I use negatives in counters, but they can occasionally be handy. Use them with caution though: arithmetic operators on Counter objects discard any item whose resulting count is zero or negative, so those values can silently disappear:

>>> colors
Counter({'red': -1, 'blue': -2, 'green': -3})
>>> colors + Counter({'red': 2, 'green': 1})
Counter({'red': 1})

Removing negative counts

What if you want to discard all negatives and zero counts from your Counter object?

You can use the unary + operator to remove every item that doesn't have a positive count:

>>> from collections import Counter
>>> letters = Counter('aaabbc')
>>> letters.subtract('abbcc')
>>> letters
Counter({'a': 2, 'b': 0, 'c': -1})
>>> letters = +letters
>>> letters
Counter({'a': 2})

Arithmetic with Counter objects

You can even add Counter objects together:

>>> fruit_counts = Counter(["pear", "kiwi", "pear", "lime"])
>>> more_fruit_counts = Counter(["pear", "lime"])
>>> fruit_counts + more_fruit_counts
Counter({'pear': 3, 'lime': 2, 'kiwi': 1})

And you can subtract them:

>>> fruit_counts - more_fruit_counts
Counter({'pear': 1, 'kiwi': 1})

Note that the result of these operators only keeps positive counts: any item whose value would be 0 or negative is left out of the new Counter object.
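
That removal applies to the arithmetic operators, not to the subtract method. Here's a small sketch contrasting the two (the stock and sold names are made up):

```python
from collections import Counter

stock = Counter({"pear": 1, "lime": 1})
sold = Counter({"pear": 2, "lime": 1})

# The - operator builds a new Counter and keeps only positive counts.
with_operator = stock - sold

# The subtract method changes a Counter in place and keeps zero and negative counts.
with_method = Counter(stock)
with_method.subtract(sold)

print(with_operator)
print(with_method)
```

So if you need to know that you're "in the red" on an item, use subtract. If you only care about what's left over, the - operator is the tidier choice.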

Counter comprehensions

By far my most common use for Counter is passing in a generator expression to count up a specific aspect of each item in an iterable.

For example, how many users in a list of users have each subscription type:

Counter(
    user.subscription_type
    for user in users
)
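
To make that runnable, here's a sketch with a hypothetical User dataclass and a few made-up users. Your own user objects would only need a subscription_type attribute:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class User:
    # Hypothetical stand-in for whatever user objects your code has.
    name: str
    subscription_type: str


users = [
    User("Ada", "pro"),
    User("Grace", "free"),
    User("Linus", "pro"),
]

# Count how many users have each subscription type.
subscription_counts = Counter(user.subscription_type for user in users)
print(subscription_counts)
```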

Or, counting up each word in a string, while ignoring surrounding punctuation marks:

words = Counter(w.strip(".,!?\"'") for w in text.split())

Those are actually generator expressions passed into the Counter class, but they act like comprehensions: they use a comprehension-like syntax to create a new object (a Counter object).

Use Counter for counting occurrences of many items

The next time you need to count how many times a particular item occurs, consider using collections.Counter.
