Introduction

While Python makes it easy to turn an idea into a program, you will quickly run into bottlenecks that make your code less performant than you might want it to be. One such bottleneck is memory, of which Python consumes a lot compared to statically typed languages. Indeed, anyone asking online for advice on how to optimize their Python application will likely be told: "Rewrite it in Rust". For obvious reasons, this is not very practical advice most of the time. Thus, we must make do with what we have: Python, and libraries written for Python.

What follows is a tour of the memory model behind your Python application: how objects are allocated, where they are stored, and how they are eventually cleaned up. While the discussion leans toward the theoretical, there are nevertheless several practical takeaways you can use to reduce your code's memory usage.

The Python memory model

The memory model can be explained with three concepts: the stack and the heap, which are the main areas of memory in a Python application, and pointers, which are variables that hold memory addresses referring to objects. Each of these concepts is explained below.

Stack

The stack is the area of memory that holds the execution flow of your application. Whenever a function calls another function, a new stack frame is pushed onto the top of the stack. When a function returns, its stack frame is destroyed, and control passes to the frame directly below it. Execution of a program ends when the bottommost frame on the stack returns.

The "stack trace", the error message shown on an exception, is named for the stack, and represents the state of execution at the moment the error occured.

The stack is also where local variables live. This is why local variables are cleaned up at the end of a function's execution, and why code outside a function cannot access the local variables within that function's scope.

Heap

The heap is the largest memory space of the application, and it is where Python places every object it creates. Unlike the stack, the heap is not structured in any particular way; space for new objects is made available as needed. As such, the lifetime of objects on the heap is not predetermined: objects exist for as long as they are considered to be needed.

Of course, objects on the heap must be reachable from the stack. Similarly, objects must be able to reach other objects. This reachability is facilitated by pointers.

Pointers

Everyone who has worked with C or C++ will know what pointers are. They will also understand why most other programming languages try so hard to abstract the concept away from the developer altogether. Pointers are memory addresses. They are named as they are because they 'point' to an address in memory that is assumed to hold data representing a specific type of object.

In Python, when an object is created, what actually happens is that the object is allocated on the heap, and a pointer to it is made available within the local scope that requested its initialization. This means that the earlier statement, "local variables live on their respective stack frame", is not entirely accurate: the actual object lives somewhere on the heap, while the (initially only) pointer to the object lives inside the stack frame.
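
You can observe this indirection directly. In CPython, id() happens to return an object's memory address, so two variables bound to the same object report the same address:

>>> a = [1, 2, 3]   # the list object lives on the heap
>>> b = a           # copies the pointer, not the list
>>> a is b          # both names point at the same heap object
True
>>> id(a) == id(b)
True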

The upshot of this distinction is that destroying a stack frame only cleans up the pointer, not the actual object. This is because another reference to the same object may have been created since, and that reference may still be live. We will see how Python handles clearing the associated object in the section on garbage collection.
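
A short sketch of an object outliving the frame that created it:

def make():
    local = [1, 2, 3]  # the pointer 'local' lives in this stack frame
    return local       # a second reference escapes to the caller

survivor = make()      # make()'s frame is gone, but the list lives on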

The size of a pointer is equal to the word size of your system's architecture: 8 bytes on the 64-bit systems that most modern computers are.

How Python handles objects in memory

C-style arrays vs Python lists

In C, an array type is effectively a pointer to the first element of that array. No additional information is carried along, not even the array's size. To facilitate this type of data structure, the elements of an array are required to be adjacent to each other in memory. Furthermore, the type of the data needs to be known in advance, so that the amount of memory each element occupies can be anticipated. The result is an optimal layout of array data in memory, with decreased memory usage and increased performance.

Python's dynamic nature, specifically its dynamic typing, prohibits this kind of efficient data storage. With few exceptions, all objects refer to each other indirectly, through a pointer. This practice extends to Python's collections, list, set and tuple: all standard data structures are represented in memory by an underlying array of pointers.

The fact that almost all objects in Python are accessed through pointers is a big contributor to Python's flexibility. It is, for example, why Python's list and tuple data structures can hold objects of different types. But it is also why Python collections are much less space-efficient than those of statically typed languages. Certain standard library modules, most notably array, overcome this limitation by letting you define sequences of a fixed-size data type for optimized storage. This can be a good solution if you need to keep large sequences of numerical values in memory, although most real-world applications will prefer a third-party library specific to their needs, numpy being the most well-known example.
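
As a rough illustration, consider a thousand integers stored both ways. Note that sys.getsizeof on the list counts only its pointer array; each int it points to costs roughly another 28 bytes elsewhere on the heap, whereas the array packs the raw values into a single flat buffer:

import array
import sys

my_nums = list(range(1000))

print(sys.getsizeof(my_nums))                    # ~8 KB: 1000 pointers plus list overhead
print(sys.getsizeof(array.array('q', my_nums)))  # ~8 KB total: 1000 raw 8-byte values, no per-item objects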

Reference counting and the garbage collector

Note that these concepts hold for the CPython implementation. Alternative implementations such as PyPy do not rely on reference counting for garbage collection.

Python uses a combination of reference counting and reachability checks. Reference counting tracks the number of times an object has been assigned to a variable, minus the number of times it was unassigned. An object not referred to by any other object has a reference count of zero, and can thus be cleaned up. Python also uses reachability checks to see whether any chain of references makes an object reachable from the stack. In so doing, it can clean up cyclical but disconnected graphs of objects, which cannot be discovered by reference counting alone: for example, two objects holding references to each other while not being referenced anywhere else.
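
A minimal sketch of such a cycle, and of the collector reclaiming it:

import gc

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a  # each object keeps the other's refcount above zero

del a, b                 # no references remain from the stack, but the cycle persists
print(gc.collect())      # the reachability check finds and frees the cycle;
                         # prints the number of unreachable objects found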

CPython's garbage collector is generational. Objects are first created in generation 0, and the garbage collector checks each generation individually. An object that survives a collection round of its generation is bumped up to a higher generation. Higher generations are checked for expired objects less frequently, adhering to the general observation that the longer something has been around, the longer it will likely remain around. In total, there are three generations.
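
The generations can be inspected through the gc module. The thresholds shown below are long-standing CPython defaults (they may differ between versions), and the counters vary from moment to moment:

>>> import gc
>>> gc.get_threshold()  # allocations that trigger a gen-0 pass; promotion ratios for gens 1 and 2
(700, 10, 10)
>>> gc.get_count()      # current allocation counters, one per generation
(43, 2, 1)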

Limited manual control over garbage collection can be exerted through the gc module, which is part of the stdlib. For example, a collection can be triggered manually with gc.collect(). Garbage collection can also be suspended altogether through gc.disable() and gc.enable(). This can be useful during stretches of execution where unpredictable pauses are (temporarily) undesirable.
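
A minimal sketch of that pattern, where process_requests() is a hypothetical stand-in for your latency-sensitive work:

import gc

gc.disable()            # suspend automatic collection
try:
    process_requests()  # hypothetical latency-sensitive section
finally:
    gc.enable()         # resume automatic collection
    gc.collect()        # clean up whatever accumulated in the meantime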

Because everything in Python is an object, and because every object is managed by this machinery, every object in CPython carries two extra fields: ob_refcnt, used for reference counting, and ob_type, which points to the object's class. Together these account for an irreducible 16 bytes of memory overhead per object. This is quite significant, given that an equivalent int in a statically typed language is often only 4 bytes.
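
This floor is easy to observe: a bare object carries nothing but those two fields.

>>> import sys
>>> sys.getsizeof(object())
16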

Memory footprint of common Python types

We can use the sys module to inspect the memory usage of objects. In CPython, a float occupies 24 bytes, as we might expect: 8 bytes for a double-precision floating-point number, plus the 16 bytes of overhead:

>>> sys.getsizeof(1.0)
24

Integers in Python are a bit stranger, because they can grow without bound. While a 4-byte integer can only hold approximately 4 billion distinct values, Python ints go on forever. This flexibility costs a small amount of additional overhead: 28 bytes for small-value ints, with memory usage increasing as the value of the integer grows:

>>> sys.getsizeof(0)
28
>>> sys.getsizeof(2**256)
60

Python uses Unicode for strings. ASCII characters will preferably be represented by a single byte each, plus an irreducible overhead of at least 49 bytes. If a string contains non-ASCII characters, Python switches to an alternative string representation with an irreducible overhead of at least 73 bytes, and a size per character that depends on the largest Unicode code point used (up to 4 bytes per character).

>>> sys.getsizeof('abc')
52
>>> sys.getsizeof('🐍')
80

Of the standard collection types, tuple is ever-so-slightly more memory-efficient than list. Both of them, however, are far less memory-hungry than set, which has much greater overhead due to the hashing required for quick uniqueness checks. This overhead scales with the length of the collection. It may therefore be worthwhile to prefer lists for large collections and ensure uniqueness some other way. (If fast membership checks are required, you can keep the list sorted and binary-search it, roll your own tree-like collection, or import a third-party library; this allows checks in O(log n) while consuming less memory, as sketched after the numbers below.) The same memory-inefficiency of sets also extends to dictionary keys.

>>> import random
>>> my_list = [random.randint(0, 1000) for _ in range(100)]

>>> sys.getsizeof(my_list)
920
>>> sys.getsizeof(tuple(my_list))
840
>>> sys.getsizeof(set(my_list))
8408
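
As one sketch of the sorted-list approach mentioned above, the stdlib bisect module gives O(log n) membership checks without a set's hashing overhead, at no memory cost beyond the list itself:

>>> import bisect
>>> sorted_list = sorted(my_list)
>>> def contains(seq, value):
...     i = bisect.bisect_left(seq, value)  # binary search: O(log n)
...     return i < len(seq) and seq[i] == value
...
>>> contains(sorted_list, my_list[0])
True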

Memory footprint of Python objects

While the above covers the most common Python built-ins, it does not yet touch upon the internal representation of objects in memory. The most important thing to know about Python classes is that the values of their fields are, by default, stored in a dictionary: specifically, the field called __dict__.

This is what makes Python objects incredibly flexible, as fields can be added to an object at will, even after initialization.

For example, code like this works perfectly fine in Python:

class Foo:
    def __init__(self):
        self.bar = 2

foo = Foo()
foo.xyz = 3  # valid

However, apart from being possibly undesired (and frustrating, because a typo in a field name won't lead to an error), this feature also causes a large increase in the memory footprint of your objects:

>>> sys.getsizeof(foo.__dict__)
296

As most people will have noticed, however, this flexibility does not extend to Python's built-in types. Specifically, both of these lines of code will lead to an AttributeError:

int(0).xyz = 3
object().xyz = 3

Looking further into this, we quickly realize that the __dict__ attribute simply does not exist for these types:

int(0).__dict__  # Also an AttributeError

Python has a feature called slotted classes, an alternative way for an object to store its fields that is less flexible, but more memory-efficient. The syntax is quite straightforward, albeit a bit verbose:

class Foo:
    __slots__ = ('bar', 'baz')  # specify exactly which fields the class has

    def __init__(self, bar, baz):
        self.bar = bar
        self.baz = baz

foo = Foo('bar', 'baz')
foo.xyz = 3  # This is now an AttributeError

This syntax specifies ahead of time which fields the object is expected to have, although it does not require that these fields be initialized by __init__ (or at any time). As an added bonus, access to a slotted attribute is faster than the lookup through __dict__.
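
To make the savings concrete, you can compare the per-instance footprint of the two approaches yourself; a minimal sketch (exact numbers depend on your CPython version):

import sys

class DictPoint:
    def __init__(self):
        self.x, self.y = 1, 2

class SlotPoint:
    __slots__ = ('x', 'y')
    def __init__(self):
        self.x, self.y = 1, 2

d, s = DictPoint(), SlotPoint()
print(sys.getsizeof(d) + sys.getsizeof(d.__dict__))  # the object plus its __dict__
print(sys.getsizeof(s))                              # two fields stored inline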

It should be noted that most Python programmers will prefer dataclasses wherever possible. As it happens, slots can be specified for a dataclass (Python 3.10+) without the boilerplate we saw above:

from dataclasses import dataclass

@dataclass(slots=True)
class Foo:
    bar: str
    baz: str

foo = Foo('bar', 'baz')

So this will likely be your preferred way of defining slotted classes.

Conclusion

Understanding the memory model underlying your Python application is a useful guide to the performance pitfalls that come with Python's flexible nature. Although it is common knowledge that Python is "less efficient" than statically typed languages, the factors contributing to this inefficiency are not as widely understood. With only a little explanation, I hope to have given you a grasp on how Python uses your computer's memory, and to have shown how some relatively simple tweaks can offer substantial improvements in your application's memory usage.

As a closing note, almost no commercial Python application uses "pure" Python to get its work done. There is almost always a reliance on third-party libraries or frameworks that are at least partially written in a compiled language. When writing performance-critical code, where memory efficiency truly becomes a concern, the knowledge above will only get you so far, and it becomes necessary to immerse yourself in the memory-efficient options that Python libraries in your domain provide.

updated on March 19th: Edited the string encoding section based on a correction I received.