Python Big O: the time complexities of different data structures in Python

Trey Hunner
9 min. read · Python 3.8—3.12

Let's look at the time complexity of different Python data structures and algorithms.

This article is primarily meant to act as a Python time complexity cheat sheet for those who already understand what time complexity is and how the time complexity of an operation might affect your code. For a more thorough explanation of time complexity see Ned Batchelder's article/talk on this subject.

Time Complexity ⏱️

Time complexity is one of those Computer Science concepts that's scary in its purest form, but often fairly practical as a rough "am I doing this right" measurement.

In the words of Ned Batchelder, time complexity is all about "how your code slows as your data grows".

Time complexity is usually discussed in terms of "Big O" notation. This is basically a way to discuss the order of magnitude for a given operation while ignoring the exact number of computations it needs. In "Big O" land, we don't care if something is twice as slow, but we do care whether it's n times slower where n is the length of our list/set/slice/etc.

Here's a graph of the common time complexity curves:

[Graph: O(1), O(log n), O(n), O(n log n), and O(n²) curves plotted together, each steeper in slope than the last]

Remember that these lines are simply about orders of magnitude. If an operation is on the order of n, that means 100 times more data will slow things down about 100 times. If an operation is on the order of n² (that's n*n), that means 100 times more data will slow things down 100*100 times.

I usually think about those curves in terms of what would happen if we suddenly had 1,000 times more data to work with:

  • O(1): no change in time (constant time!)
  • O(log n): ~10 times slow down
  • O(n): 1,000 times slow down
  • O(n log n): 10,000 times slow down
  • O(n²): 1,000,000 times slow down! 😲

With that very quick recap behind us, let's take a look at the relative speeds of all common operations on each of Python's data structures.

List 📋

Python's lists are similar to arrays or array lists in some other languages.

Here are the time complexities of some common list operations:

Big O Operation Notably
O(1) sequence.append(item) Fast!
O(1) sequence.pop() Fast!
O(n) sequence.insert(0, item) Slow!
O(n) sequence.pop(0) Slow!
O(1) sequence[index]
O(1) sequence[index] = value
O(n) item in sequence Slow!

I've called out append, pop, and insert above because new Python programmers are sometimes surprised by the relative speeds of those operations.

Adding or removing items from the end of a list are both very fast operations regardless of how large the list is. On the other hand, adding or removing items from the beginning of a list is very slow (it requires shifting all items after the change).

Note that indexing and assigning to indexes is fast regardless of the index. Also note that the in operator requires looping over the list, unlike sets (as we'll see below).
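
Here's a quick sketch of those operations (the numbers list here is made up just for illustration):

>>> numbers = [2, 1, 3, 4]
>>> numbers.append(7)     # O(1): adds to the end
>>> numbers.pop()         # O(1): removes from the end
7
>>> numbers.insert(0, 9)  # O(n): every item after index 0 shifts over
>>> numbers.pop(0)        # O(n): every remaining item shifts back
9
>>> 3 in numbers          # O(n): checks each item one-by-one
True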

In case you're curious, here are even more list operations:

Big O Operation Notably
O(1) len(sequence)
O(k) sequence.extend(iterable)
O(k) sequence[index:index+k]
O(n) sequence.index(item) Slow!
O(n) sequence.count(item) Slow!
O(n) for item in sequence:

For the extend method, k represents the length of the given iterable. For slicing, k represents the length of the slice.
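
For example (again with a made-up numbers list):

>>> numbers = [2, 1, 3, 4]
>>> numbers.extend([7, 8])    # O(k) where k is 2 here
>>> numbers
[2, 1, 3, 4, 7, 8]
>>> numbers[1:4]              # O(k) where k is 3 (the slice length)
[1, 3, 4]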

Double-Ended Queue ↔️

Lists are stack-like. That means it's inexpensive to perform operations on the most-recently added items (at the end of a list).

For inexpensive operations involving the least-recently added item (the beginning of a list), we'd need a queue-like structure. That's what Python's collections.deque data structure is for.

>>> from collections import deque
>>> queue = deque([2, 1, 3, 4])

Here are the time complexities of common deque operations:

Big O Operation Notably
O(1) queue.append(item)
O(1) queue.pop()
O(1) queue.appendleft(item) Fast!
O(1) queue.popleft() Fast!
O(n) item in queue
O(1) queue[0] or queue[-1]
O(n) queue[i] Slow!
O(n) for item in queue:

Note that we can efficiently add and remove items from the beginning of a deque with the appendleft and popleft methods. If you find yourself calling the insert or pop methods on a list with an index of 0, you could probably speed your code up by using a deque instead.
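
Continuing with the queue deque from above, here's a rough sketch:

>>> queue.appendleft(0)    # O(1): adds to the left end
>>> queue.popleft()        # O(1): removes from the left end
0
>>> queue.append(7)        # O(1): adds to the right end
>>> queue.pop()            # O(1): removes from the right end
7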

Also note that looking up arbitrary indexes in deque objects requires looping! Unlike lists, deque objects are implemented as doubly-linked lists. Fortunately, looking up arbitrary indexes is pretty unusual for both lists and deque objects in Python (since our for loops are not index-based).

Dictionary 🗝️

Dictionaries are meant for grouping or accumulating values based on a key. Our "dictionaries" in Python are called hash maps (or sometimes "associative arrays") in many other programming languages.

Here are the time complexities of some common dictionary operations:

Big O Operation Notably
O(1) mapping[key] = value Fast!
O(1) mapping[key]
O(1) mapping.get(key)
O(1) mapping.pop(key) Fast!
O(1) key in mapping Fast!
O(n) for k, v in mapping.items():

Note that the only expensive operation on a dictionary involves explicitly looping over the dictionary.

Thanks to the power of hashing, dictionaries are very fast at all operations related to key lookups. Checking for containment, inserting a new item, updating the value of an item, and removing an item are all constant time operations (that's O(1) in big O).
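
Here's a quick sketch of those key-based operations (with a made-up counts dictionary):

>>> counts = {"purple": 3, "blue": 2}
>>> counts["green"] = 1    # O(1): add a new key
>>> counts["purple"]       # O(1): key lookup
3
>>> "blue" in counts       # O(1): key containment check
True
>>> counts.pop("blue")     # O(1): remove a key
2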

Here are time complexities of slightly less common dictionary operations:

Big O Operation Explanation
O(1) next(iter(mapping)) Get first item
O(1) next(reversed(mapping)) Get last item
O(n) value in mapping.values() Value containment
O(k) mapping.update(iterable) Add many items

The k in O(k) for the update method represents the number of items in the given iterable.

Note that getting the first and last items is a bit awkward, but very fast.
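
For example, since dictionaries maintain insertion order:

>>> colors = {"purple": 1, "blue": 2, "green": 4}
>>> next(iter(colors))        # The first key inserted
'purple'
>>> next(reversed(colors))    # The last key inserted
'green'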

Also note that checking whether a dictionary contains a particular value is slow! Dictionaries are optimized for fast key lookups, but not fast value lookups. Key containment checks are fast, but value containment checks require looping over the whole dictionary.

Set 🎨

Sets store distinct items.

Unlike lists, sets don't maintain the order of their items. Instead, they're optimized for quick containment checks.

Here are the time complexities of some common set operations:

Big O Operation Notably
O(1) my_set.add(item) Fast!
O(1) my_set.remove(item) Fast!
O(1) item in my_set Fast!
O(n) for item in my_set:

Like dictionaries, the only expensive operation on a set involves explicitly looping over the set.

Most importantly, asking whether a set contains an item (item in my_set) is fast, unlike lists.
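
A quick sketch (with a made-up set of numbers):

>>> my_set = {2, 1, 3, 4}
>>> my_set.add(7)      # O(1)
>>> 3 in my_set        # O(1): a hash lookup, no looping needed
True
>>> my_set.remove(1)   # O(1)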

Sets also support various operations between multiple sets:

Big O Operation Explanation
O(n) set1 & set2 Intersection
O(n) set1 | set2 Union
O(n) set1 ^ set2 Symmetric difference
O(n) set1 - set2 Asymmetric difference

I'm assuming the sets are the same size here. If the sets are different sizes, some of those operations will be on the order of either the smallest or largest set size (depending on the operation).

Also note that all of those operations work the same way between dictionary keys as well! For example, if you wanted to efficiently find the common keys between two dictionaries, you can use the & operator:

>>> colors1 = {"purple": 1, "blue": 2, "green": 4}
>>> colors2 = {"red": 2, "purple": 3, "blue": 1}
>>> colors1.keys() & colors2.keys()
{'blue', 'purple'}

Counter 🧮

Python's collections module includes a Counter class which can efficiently count the number of times each item occurs within a given iterable.

This collections.Counter class is really just a specialized dictionary with some extra operations.

Here are the time complexities of some common Counter operations:

Big O Operation
O(1) counter[item]
O(1) counter.pop(item)
O(n) for k, v in counter.items():
O(n log n) for k, v in counter.most_common():
O(n log k) for k, v in counter.most_common(k):

Note that the most_common method does the same thing as the dictionary items method, except it sorts the items first. If a number is passed to most_common, though, it will efficiently look up the k most common items instead (similar to the heapq.nlargest function noted in traversal techniques below).
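
For example (counting letters here just for illustration):

>>> from collections import Counter
>>> counter = Counter("abracadabra")
>>> counter.most_common(2)    # The 2 most common items: O(n log k)
[('a', 5), ('b', 2)]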

Here are a few more somewhat common Counter operations:

Big O Operation
O(k) counter.update(iterable)
O(k) counter.subtract(iterable)
O(n) counter.total()

The k in O(k) above represents the length of the given iterable to the update and subtract methods.

Heap / Priority Queue ⛰️

Need a heap, possibly for the sake of implementing your own priority queue? Python's heapq module has you covered.

Here are the time complexities of various heap-related operations provided by the heapq module:

Big O Operation Notably
O(n) heapq.heapify(sequence)
O(log n) heapq.heappop(sequence) Fast!
O(log n) heapq.heappush(sequence, item) Fast!
O(1) sequence[0]

The heapq module really just performs operations on a list to treat it like a heap.
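
Here's a rough sketch of what that looks like (with a made-up numbers list):

>>> import heapq
>>> numbers = [7, 2, 9, 4, 1]
>>> heapq.heapify(numbers)        # O(n): rearranges the list into a heap
>>> numbers[0]                    # O(1): the smallest item is always first
1
>>> heapq.heappush(numbers, 3)    # O(log n)
>>> heapq.heappop(numbers)        # O(log n): removes and returns the smallest item
1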

It's pretty unusual to implement long-running daemon processes that add items to and remove items from a custom priority queue, so you're unlikely to need a heap directly within your own code.

The heapq module does have some handy helper utilities that are heap-powered though (see traversal techniques below).

Sorted List 🔤

Need to find items or ranges of items within a sorted list? The bisect module has an implementation of binary search for you.

Here are the time complexities of the bisect module's various binary search operations:

Big O Operation Context / Notably
O(n log n) sorted_sequence = sorted(sequence) If not yet sorted
O(n) sorted_sequence.index(item) (For comparison's sake)
O(log n) bisect.bisect(sorted_sequence, item) Fast!
O(n) bisect.insort(sorted_sequence, item)

Note that you can combine bisect.bisect_left and bisect.bisect_right to efficiently find all items in a sorted list that are within a certain upper and lower bound.
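
Here's a rough sketch of that bounded lookup (with a made-up sorted_numbers list):

>>> import bisect
>>> sorted_numbers = [1, 2, 4, 4, 5, 7, 9]
>>> lo = bisect.bisect_left(sorted_numbers, 4)     # First position where 4 could go
>>> hi = bisect.bisect_right(sorted_numbers, 5)    # Position just past the last 5
>>> sorted_numbers[lo:hi]                          # Everything from 4 through 5
[4, 4, 5]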

Keep in mind that the act of sorting a list takes more time than traversing, so unless you're working with already sorted data or you're repeatedly bisecting your sorted list, it may be more efficient to simply loop over an unsorted list.

Also keep in mind that adding a new value to a sorted list is slow for the same reason that the list insert method is slow: all values after the insertion index will need to be shuffled around.

Traversal Techniques 🔍

Lastly, let's look at a few common lookup/traversal techniques.

Big O Operation
O(n) min(iterable)
O(n) max(iterable)
O(n log n) sorted(iterable)
O(n log k) heapq.nsmallest(k, iterable)
O(n) statistics.multimode(iterable)

Most traversal techniques require looping over the given iterable, so they're O(n) at minimum.

The traversals that require more time are the ones that involve comparisons between more than just two values (like sorting every item).

Efficient sorting is O(n log n) in time complexity terms. Whether you're using the list sort method or the built-in sorted function, Python attempts to sort as efficiently as it can.

If you don't really care about sorting every value, but instead you just need the k largest or smallest values, the heapq module has some heap-powered utilities that are even faster than sorting.
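
For example (with a made-up numbers list):

>>> import heapq
>>> numbers = [7, 2, 9, 4, 1, 8]
>>> heapq.nlargest(2, numbers)     # O(n log k) with k as 2
[9, 8]
>>> heapq.nsmallest(2, numbers)
[1, 2]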

Other Data Structures? 📚

Computer Science includes a number of other classical structures, including but not limited to:

  • Lists (CS lists, not Python lists): linked lists, skip lists
  • Trees: binary trees, B-trees, red-black trees, tries
  • Graphs: directed, undirected, and weighted
  • Probabilistic: Bloom filters, locality-sensitive hashing

Why aren't these structures included in the Python standard library?

Well, as Brandon Rhodes noted in his PyCon 2014 talk, many of the classic CS data structures don't really make sense in Python because data structures in Python don't actually contain data but instead contain references to data (see variables and objects in Python).

When you do need a data structure that's optimized for specific operations, you can always look up an implementation online or find a PyPI module (such as sortedcollections).

Beware of Loops-in-Loops! 🤯

Note that time complexity can really compound when you're performing operations within a loop.

For example, this code has an O(n²) time complexity because it contains a loop inside a loop:

counts = {}
for item in my_list:
    counts[item] = my_list.count(item)

The for loop looks like a loop, but where's the other loop?

The list count method actually performs an implicit loop because it needs to loop over the list to perform its counting!

Since we're performing an O(n) operation for each iteration of our loop, this code is O(n) * O(n), which is usually written as O(n*n) or O(n²). Remember that really steep line in the time complexity plot above? That's O(n²)!

Sometimes it's impossible to avoid an O(n²) operation. But it's often possible to change your algorithm or your data structures to greatly alter your code's time complexity. In our case we could avoid the count method call in our loop by incrementing an item count for each item we see:

counts = {}
for item in my_list:
    if item not in counts:
        counts[item] = 0
    counts[item] += 1

Dictionary containment checks, key lookups, and item assignments are all O(1) operations. So this new for loop now has an O(n) time complexity!

This code doesn't look any faster at a quick glance. But it will be much faster for large amounts of data. For 1,000 times more data, our code will only be 1,000 times slower, whereas the previous loop would have been 1,000,000 times slower!

You can play with different list sizes for each of the above loops in this code snippet.

Note: for readability's sake, our whole loop could also just be one line with collections.Counter.
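
Something like this (reusing the same my_list from above):

>>> from collections import Counter
>>> counts = Counter(my_list)    # One O(n) pass that counts every item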

Mind Your Data Structures 🗃️

Choosing between data structures involves a trade-off between features, speed, and memory usage.

For example, sets are much faster than lists at containment checks, but they don't maintain any ordering. Dictionaries are just as fast at key lookups and they maintain item insertion order, but they require more memory.

In day-to-day Python usage, time complexity tends to matter most for avoiding loops within loops.

If you take away just two things from this article, they should be:

  1. Refactor O(n²) time complexity code to O(n) whenever possible
  2. When performance really matters, avoid O(n) whenever O(1) or O(log n) are possible

The next time you're worried about slow code, consider your code's time complexity. The biggest code speed ups often come from thinking in orders of magnitude.
