Let's look at the time complexity of different Python data structures and algorithms.
This article is primarily meant to act as a Python time complexity cheat sheet for those who already understand what time complexity is and how the time complexity of an operation might affect their code. For a more thorough explanation of time complexity, see Ned Batchelder's article/talk on this subject.
Time complexity is one of those Computer Science concepts that's scary in its purest form, but often fairly practical as a rough "am I doing this right" measurement.
In the words of Ned Batchelder, time complexity is all about "how your code slows as your data grows".
Time complexity is usually discussed in terms of "Big O" notation.
This is basically a way to discuss the order of magnitude for a given operation while ignoring the exact number of computations it needs.
In "Big O" land, we don't care if something is twice as slow, but we do care whether it's `n` times slower, where `n` is the length of our list/set/slice/etc.
Here's a graph of the common time complexity curves:
Remember that these lines are simply about orders of magnitude.
If an operation is on the order of `n`, that means 100 times more data will slow things down about 100 times. If an operation is on the order of `n²` (that's `n*n`), that means 100 times more data will slow things down `100*100` times.
I usually think about those curves in terms of what would happen if we suddenly had 1,000 times more data to work with:
- `O(1)`: no change in time (constant time!)
- `O(log n)`: ~10 times slow down
- `O(n)`: 1,000 times slow down
- `O(n log n)`: 10,000 times slow down
- `O(n²)`: 1,000,000 times slow down! 😲

With that very quick recap behind us, let's take a look at the relative speeds of all common operations on each of Python's data structures.
Python's lists are similar to arrays or array lists in some other languages.
Here are the time complexities of some common list operations:
| Big O | Operation | Notably |
|---|---|---|
| `O(1)` | `sequence.append(item)` | Fast! |
| `O(1)` | `sequence.pop()` | Fast! |
| `O(n)` | `sequence.insert(0, item)` | Slow! |
| `O(n)` | `sequence.pop(0)` | Slow! |
| `O(1)` | `sequence[index]` | |
| `O(1)` | `sequence[index] = value` | |
| `O(n)` | `item in sequence` | Slow! |
I've called out `append`, `pop`, and `insert` above because new Python programmers are sometimes surprised by the relative speeds of those operations.
Adding or removing items from the end of a list are both very fast operations regardless of how large the list is. On the other hand, adding or removing items from the beginning of a list is very slow (it requires shifting all items after the change).
Note that indexing and assigning to indexes are both fast, regardless of the index.
Also note that the `in` operator requires looping over the list, unlike sets (as we'll see below).
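Here's a rough timing sketch of that end-versus-beginning difference, using the `timeit` module (the list size and iteration counts here are made up for illustration; exact timings will vary by machine):

```python
from timeit import timeit

items = list(range(100_000))

# Appending and popping at the end don't depend on the list's length: O(1)
end_ops = timeit("items.append(0); items.pop()",
                 globals={"items": items}, number=1_000)

# Inserting and popping at the beginning shift every other element: O(n)
front_ops = timeit("items.insert(0, 0); items.pop(0)",
                   globals={"items": items}, number=1_000)

print(f"end: {end_ops:.4f}s  front: {front_ops:.4f}s")
```

On a typical machine, the front operations will be slower by several orders of magnitude.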
In case you're curious, here are even more list operations:
| Big O | Operation | Notably |
|---|---|---|
| `O(1)` | `len(sequence)` | |
| `O(k)` | `sequence.extend(iterable)` | |
| `O(k)` | `sequence[index:index+k]` | |
| `O(n)` | `sequence.index(item)` | Slow! |
| `O(n)` | `sequence.count(item)` | Slow! |
| `O(n)` | `for item in sequence:` | |
For the `extend` method, `k` represents the length of the given iterable. For slicing, `k` represents the length of the slice.
Lists are stack-like. That means it's inexpensive to perform operations on the most-recently added items (at the end of a list).
For inexpensive operations involving the least-recently added item (the beginning of a list), we'd need a queue-like structure.
That's what Python's `collections.deque` data structure is for.
>>> from collections import deque
>>> queue = deque([2, 1, 3, 4])
Here are the time complexities of common `deque` operations:
| Big O | Operation | Notably |
|---|---|---|
| `O(1)` | `queue.append(item)` | |
| `O(1)` | `queue.pop()` | |
| `O(1)` | `queue.appendleft(item)` | Fast! |
| `O(1)` | `queue.popleft()` | Fast! |
| `O(n)` | `item in queue` | |
| `O(1)` | `queue[0]` or `queue[-1]` | |
| `O(n)` | `queue[i]` | Slow! |
| `O(n)` | `for item in queue:` | |
Note that we can efficiently add and remove items from the beginning of a deque with the `appendleft` and `popleft` methods. If you find yourself calling the `insert` or `pop` methods on a list with an index of `0`, you could probably speed your code up by using a `deque` instead.
Also note that looking up arbitrary indexes in `deque` objects requires looping! Unlike lists, `deque` objects are implemented as doubly-linked lists. Fortunately, looking up arbitrary indexes is pretty unusual for both lists and `deque` objects in Python (since our `for` loops are not index-based).
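For example (with made-up queue items), a `deque` handles both ends in constant time:

```python
from collections import deque

queue = deque()
queue.append("first")       # O(1): add to the right end
queue.append("second")      # O(1)
queue.appendleft("zeroth")  # O(1): add to the left end (O(n) for a list!)

print(queue.popleft())  # "zeroth" (O(1) removal from the left)
print(queue.pop())      # "second" (O(1) removal from the right)
```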
Dictionaries are meant for grouping or accumulating values based on a key. Our "dictionaries" in Python are called hash maps (or sometimes "associative arrays") in many other programming languages.
Here are the time complexities of some common dictionary operations:
| Big O | Operation | Notably |
|---|---|---|
| `O(1)` | `mapping[key] = value` | Fast! |
| `O(1)` | `mapping[key]` | |
| `O(1)` | `mapping.get(key)` | |
| `O(1)` | `mapping.pop(key)` | Fast! |
| `O(1)` | `key in mapping` | Fast! |
| `O(n)` | `for k, v in mapping.items():` | |
Note that the only expensive operation on a dictionary involves explicitly looping over the dictionary.
Thanks to the power of hashing, dictionaries are very fast at all operations related to key lookups. Checking for containment, inserting a new item, updating the value of an item, and removing an item are all constant time operations (that's `O(1)` in big O).
Here are time complexities of slightly less common dictionary operations:
| Big O | Operation | Explanation |
|---|---|---|
| `O(1)` | `next(iter(mapping))` | Get first item |
| `O(1)` | `next(reversed(mapping))` | Get last item |
| `O(n)` | `value in mapping.values()` | Value containment |
| `O(k)` | `mapping.update(iterable)` | Add many items |
The `k` in `O(k)` for the `update` method represents the number of items in the given iterable. Note that getting the first and last items is a bit awkward, but very fast.
Also note that checking whether a dictionary contains a particular value is slow! Dictionaries are optimized for fast key lookups, but not fast value lookups. Key containment checks are fast, but value containment checks require looping over the whole dictionary.
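If you do need repeated value lookups, one common workaround is to invert the dictionary once up front (a sketch, assuming the values are hashable and distinct):

```python
colors = {"purple": 1, "blue": 2, "green": 4}

print("blue" in colors)      # True: key containment is O(1)
print(2 in colors.values())  # True, but value containment is O(n)

# Invert the dictionary once (O(n)); after that, each lookup by value is O(1)
by_value = {value: key for key, value in colors.items()}
print(by_value[2])           # blue
```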
Sets store distinct items.
Unlike lists, sets don't maintain the order of their items. Instead, they're optimized for quick containment checks.
Here are the time complexities of some common set operations:
| Big O | Operation | Notably |
|---|---|---|
| `O(1)` | `my_set.add(item)` | Fast! |
| `O(1)` | `my_set.remove(item)` | Fast! |
| `O(1)` | `item in my_set` | Fast! |
| `O(n)` | `for item in my_set:` | |
As with dictionaries, the only expensive operation on a set is explicitly looping over the set.
Most importantly, asking whether a set contains an item (`item in my_set`) is fast, unlike the same check on a list.
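A quick timing sketch of that difference with `timeit` (the data size here is made up, and exact numbers will vary by machine):

```python
from timeit import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)

# Worst case for the list: the item we're searching for is at the very end
list_time = timeit("99_999 in haystack_list",
                   globals={"haystack_list": haystack_list}, number=100)
set_time = timeit("99_999 in haystack_set",
                  globals={"haystack_set": haystack_set}, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The set's containment check stays fast no matter how large the set grows.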
Sets also support various operations between multiple sets:
| Big O | Operation | Explanation |
|---|---|---|
| `O(n)` | `set1 & set2` | Intersection |
| `O(n)` | `set1 \| set2` | Union |
| `O(n)` | `set1 ^ set2` | Symmetric difference |
| `O(n)` | `set1 - set2` | Asymmetric difference |
I'm assuming the sets are the same size here. If the sets are different sizes, some of those operations will be on the order of either the smallest or largest set size (depending on the operation).
Also note that all of those operations work the same way between dictionary keys as well!
For example, if you wanted to efficiently find the common keys between two dictionaries, you could use the `&` operator:
>>> colors1 = {"purple": 1, "blue": 2, "green": 4}
>>> colors2 = {"red": 2, "purple": 3, "blue": 1}
>>> colors1.keys() & colors2.keys()
{'blue', 'purple'}
Python's `collections` module includes a `Counter` class, which can efficiently count the number of times each item occurs within a given iterable. This `collections.Counter` class is really just a specialized dictionary with some extra operations. Here are the time complexities of some common `Counter` operations:
| Big O | Operation |
|---|---|
| `O(1)` | `counter[item]` |
| `O(1)` | `counter.pop(item)` |
| `O(n)` | `for k, v in counter.items():` |
| `O(n log n)` | `for k, v in counter.most_common():` |
| `O(n log k)` | `for k, v in counter.most_common(k):` |
Note that the `most_common` method does the same thing as the dictionary `items` method, except it sorts the items first. If a number is passed to `most_common`, it will efficiently look up the `k` most common items instead (similar to the `heapq.nlargest` function noted in traversal techniques below).
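For example (counting the letters in a made-up word):

```python
from collections import Counter

letters = Counter("bookkeeper")

print(letters["e"])            # 3 (an O(1) lookup, just like a dictionary)
print(letters.most_common(1))  # [('e', 3)] (finds the top item without a full sort)
```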
Here are a few more somewhat common `Counter` operations:
| Big O | Operation |
|---|---|
| `O(k)` | `counter.update(iterable)` |
| `O(k)` | `counter.subtract(iterable)` |
| `O(n)` | `counter.total()` |
The `k` in `O(k)` above represents the length of the iterable given to the `update` and `subtract` methods.
Need a heap, possibly for the sake of implementing your own priority queue?
Python's `heapq` module has you covered. Here are the time complexities of various heap-related operations provided by the `heapq` module:
| Big O | Operation | Notably |
|---|---|---|
| `O(n)` | `heapq.heapify(sequence)` | |
| `O(log n)` | `heapq.heappop(sequence)` | Fast! |
| `O(log n)` | `heapq.heappush(sequence, item)` | Fast! |
| `O(1)` | `sequence[0]` | |
The `heapq` module really just performs operations on a list to treat it like a heap. It's pretty unusual to implement long-running daemon processes that add items to and remove items from a custom priority queue, so you're unlikely to need a heap directly within your own code. The `heapq` module does have some handy heap-powered helper utilities, though (see traversal techniques below).
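If you do want to use a heap directly, here's a minimal priority-queue-style sketch (the task tuples are made up for illustration):

```python
import heapq

tasks = [(3, "write tests"), (1, "fix bug"), (2, "review PR")]

heapq.heapify(tasks)                  # O(n): rearrange the list into a heap, in place
heapq.heappush(tasks, (0, "deploy"))  # O(log n): add an item

print(tasks[0])              # (0, 'deploy'): peeking at the smallest item is O(1)
print(heapq.heappop(tasks))  # (0, 'deploy'): removing the smallest item is O(log n)
print(heapq.heappop(tasks))  # (1, 'fix bug')
```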
Need to find items or ranges of items within a sorted list?
The `bisect` module has an implementation of binary search for you. Here are the time complexities of the `bisect` module's various binary search operations:
| Big O | Operation | Context / Notably |
|---|---|---|
| `O(n log n)` | `sorted_sequence = sorted(sequence)` | If not yet sorted |
| `O(n)` | `sorted_sequence.index(item)` | (For comparison's sake) |
| `O(log n)` | `bisect.bisect(sorted_sequence, item)` | Fast! |
| `O(n)` | `bisect.insort(sorted_sequence, item)` | |
Note that you can combine `bisect.bisect_left` and `bisect.bisect_right` to efficiently find all items in a sorted list that are within a certain upper and lower bound.
Keep in mind that the act of sorting a list takes more time than traversing, so unless you're working with already sorted data or you're repeatedly bisecting your sorted list, it may be more efficient to simply loop over an unsorted list.
Also keep in mind that adding a new value to a sorted list is slow for the same reason that the list `insert` method is slow: all values after the insertion index will need to be shuffled around.
Lastly, let's look at a few common lookup/traversal techniques.
| Big O | Operation |
|---|---|
| `O(n)` | `min(iterable)` |
| `O(n)` | `max(iterable)` |
| `O(n log n)` | `sorted(iterable)` |
| `O(n log k)` | `heapq.nsmallest(k, iterable)` |
| `O(n)` | `statistics.multimode(iterable)` |
Most traversal techniques require looping over the given iterable, so they're `O(n)` at minimum. The traversals that require more time are the ones that involve comparisons between more than just two values (like sorting every item). Efficient sorting is `O(n log n)` in time complexity terms.
Whether you're using the list `sort` method or the built-in `sorted` function, Python attempts to sort as efficiently as it can. If you don't really care about sorting every value, but instead just need the `k` largest or smallest values, the `heapq` module has some heap-powered utilities that are even faster than sorting.
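For example (with made-up data), `heapq.nsmallest` and `heapq.nlargest` avoid sorting the whole iterable:

```python
import heapq

temperatures = [68, 71, 58, 80, 62, 75, 66, 59]

# O(n log k): each keeps a k-sized heap while scanning, instead of sorting everything
print(heapq.nsmallest(3, temperatures))  # [58, 59, 62]
print(heapq.nlargest(2, temperatures))   # [80, 75]
```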
Computer Science includes a number of other classical data structures that we haven't discussed here.
Why aren't these structures included in the Python standard library? Well, as Brandon Rhodes noted in his PyCon 2014 talk, many of the classic CS data structures don't really make sense in Python because data structures in Python don't actually contain data but instead contain references to data (see variables and objects in Python).
When you do need a data structure that's optimized for specific operations, you can always look up an implementation online or find a PyPI module (such as sortedcollections).
Note that time complexity can really compound when you're performing operations within a loop.
For example, this code has an `O(n²)` time complexity because it contains a loop inside a loop:
counts = {}
for item in my_list:
    counts[item] = my_list.count(item)
The `for` loop looks like a loop, but where's the other loop? The list `count` method actually performs an implicit loop because it needs to loop over the list to perform its counting! Since we're performing an `O(n)` operation for each iteration of our loop, this code is `O(n) * O(n)`, which is usually written as `O(n*n)` or `O(n²)`.
Remember that really steep line in the time complexity graph above? That's `O(n²)`!
Sometimes it's impossible to avoid an `O(n²)` operation.
But it's often possible to change your algorithm or your data structures to greatly alter your code's time complexity.
In our case, we could avoid the `count` method call in our loop by incrementing an item count for each item we see:
counts = {}
for item in my_list:
    if item not in counts:
        counts[item] = 0
    counts[item] += 1
Dictionary containment checks, key lookups, and item assignments are all `O(1)` operations. So this new `for` loop now has an `O(n)` time complexity!
This code doesn't look any faster at a quick glance. But it will be much faster for large amounts of data. For 1,000 times more data, our code will only be 1,000 times slower, whereas the previous loop would have been 1,000,000 times slower!
You can play with different list sizes for each of the above loops in this code snippet.
Note: for readability's sake, our whole loop could also just be one line with `collections.Counter`.
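That one-line version looks like this (with a made-up `my_list`):

```python
from collections import Counter

my_list = ["apple", "berry", "apple", "cherry", "apple", "berry"]

# Counter performs the same O(n) counting loop for us
counts = Counter(my_list)
print(counts)  # Counter({'apple': 3, 'berry': 2, 'cherry': 1})
```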
Choosing between data structures involves a trade-off between features, speed, and memory usage.
For example, sets are faster at lookups than lists, but they don't maintain item order. Dictionaries are just as fast at key lookups as sets, and they maintain item insertion order, but they require more memory.
In day-to-day Python usage, time complexity tends to matter most for avoiding loops within loops.
If you take away just two things from this article, they should be:

1. Convert `O(n²)` time complexity code to `O(n)` whenever possible
2. Avoid `O(n)` whenever `O(1)` or `O(log n)` are possible

The next time you're worried about slow code, consider your code's time complexity. The biggest code speed-ups often come from thinking in orders of magnitude.
Need to fill in gaps in your Python skills?
Sign up for my Python newsletter where I share one of my favorite Python tips every week.