Disassemble Your Python Code

Gaining insights into your Python code using the dis-module

Florian Dahlitz
15 min
April 14, 2020

Why Do I Want to Disassemble My Code?

By disassembling your code, you get an understanding of how Python treats it and what the resulting bytecode looks like. While writing code together with other people, or reading or watching tutorials, you may come across phrases like "writing it this way is the same as writing it that way" or "this piece of code is equivalent to that one". By disassembling the code pieces in question and inspecting the generated bytecode, you can check whether Python really generates the same bytecode for them. You may even be able to explain some surprising (mis-)behaviors.

Furthermore, disassembling Python code helps you understand Python internals better. So without further introduction, let's jump into it!

What Is in the Module?

The dis module supports the analysis of CPython bytecode by disassembling it. The CPython bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter.

Note: Bytecode is an implementation detail of the CPython interpreter. According to the docs, there is no guarantee that bytecode will not be added, removed or changed between Python versions. Furthermore, the use of the dis module should not be considered to work across Python releases or Python VMs. This article was written based on CPython 3.8.2.

We can roughly separate the dis-module into four parts. First, there is the Instruction object representing a single bytecode instruction. Second, there is the Bytecode object, which is a wrapper used to access details of the compiled code. Additionally, a set of analysis functions and a set of opcode collections are available. We will have a look at all four of them.

Instruction

The Python code you write is compiled into a sequence of bytecode instructions before it is executed. A single instruction consists of:

  • opcode: numeric code for operation, corresponding to the opcode values and the bytecode values in the opcode collections. A mapping from the human readable opnames to the numeric codes is available as the dictionary dis.opmap.
  • opname: human readable name for operation
  • arg: numeric argument to operation (if any), otherwise None
  • argval: resolved arg value (if known), otherwise same as arg
  • argrepr: human readable description of operation argument
  • offset: start index of operation within bytecode sequence
  • starts_line: line started by this opcode (if any), otherwise None
  • is_jump_target: True if other code jumps to here, otherwise False

Let's take the RETURN_VALUE instruction as an example. As the name suggests, it's the instruction used to represent a return-statement. Some of the values, such as offset and starts_line, depend on the actual code; the values below are taken from an example we will have a look at later. The representation of the instruction may look like this:

Instruction(opname='RETURN_VALUE',
            opcode=83,
            arg=None,
            argval=None,
            argrepr='',
            offset=16,
            starts_line=None,
            is_jump_target=False)

The human readable name of the instruction is RETURN_VALUE and the corresponding opcode is 83. The operation takes no numeric argument (arg), hence argval is None and argrepr is empty. The offset, which is the start index of the operation within the bytecode sequence, is 16. The return-statement in this example is not the first instruction of its line, hence starts_line is None. No code jumps to this instruction, resulting in is_jump_target=False.
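
To inspect these fields yourself, a minimal sketch along the following lines may help. It assumes a small example function named func (the same shape as the one used later in this article) and simply picks the last instruction. The concrete opcode numbers and offsets you see depend on your CPython version, so the sketch only checks relationships between the fields.

```python
import dis


def func():
    x = []
    x.append(5)
    return x


# get_instructions() returns a generator, so materialize it to index into it.
instructions = list(dis.get_instructions(func))
last = instructions[-1]

# Instruction is a named tuple: every field is available as an attribute.
print(last.opname, last.opcode, last.arg, repr(last.argrepr), last.offset)

# opname and opcode are two views of the same operation.
print(dis.opname[last.opcode] == last.opname)
print(dis.opmap[last.opname] == last.opcode)
```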

The RETURN_VALUE instruction is just one of many.

Note: TOS stands for Top of Stack and refers to the element at the top of the stack. TOS1 refers to the second top-most element of the stack.

The available instructions can be grouped into six categories:

  1. General: Contains general purpose instructions such as ROT_TWO (swaps the two top-most stack items).
  2. Unary: Unary operations take the element at the top of the stack, apply the desired operation, and push the result back on the stack. An example of a unary operation is UNARY_POSITIVE, which is implemented as TOS = +TOS.
  3. Binary: Similar to unary operations, the binary operations remove the two top-most elements of the stack, apply the operation, and push the result back on the stack. For instance, the BINARY_ADD operation is implemented as TOS = TOS1 + TOS and is nothing else but a simple addition.
  4. In-place: In-place operations are pretty similar to binary operations except that they try to do the operation in-place when TOS1 supports it. Both TOS and TOS1 are removed from the stack, too. The resulting TOS may be the original TOS1, but doesn't have to be. For example, INPLACE_ADD is implemented as TOS = TOS1 + TOS (but as an in-place operation).
  5. Coroutine: Coroutine opcodes have been available since the introduction of the async and await keywords in Python 3.5. This category consists of all coroutine-related instructions.
  6. Miscellaneous: The miscellaneous opcodes are by far the largest group. A prominent member of this category is the RETURN_VALUE instruction, which returns the TOS to the caller of the function.
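
The TOS/TOS1 notation becomes clearer with a toy model. The following sketch is not how CPython implements its evaluation loop; it is a hypothetical mini stack machine written for illustration, and the PUSH operation as well as the run() helper are made up for this purpose. It mimics a few of the operations named above on a plain Python list used as a stack.

```python
import operator


def run(program):
    """Run a toy stack program given as (name, argument) pairs."""
    stack = []
    for name, arg in program:
        if name == "PUSH":  # hypothetical helper operation, not a CPython opcode
            stack.append(arg)
        elif name == "ROT_TWO":
            # Swap the two top-most stack items.
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif name == "UNARY_NEGATIVE":
            # TOS = -TOS
            stack.append(operator.neg(stack.pop()))
        elif name == "BINARY_ADD":
            # TOS = TOS1 + TOS
            tos = stack.pop()
            tos1 = stack.pop()
            stack.append(operator.add(tos1, tos))
        elif name == "INPLACE_ADD":
            # Like BINARY_ADD, but TOS1 may be modified and reused.
            tos = stack.pop()
            tos1 = stack.pop()
            stack.append(operator.iadd(tos1, tos))
        else:
            raise ValueError(f"unknown operation: {name}")
    return stack


print(run([("PUSH", 2), ("PUSH", 3), ("BINARY_ADD", None)]))

# With a list as TOS1, the in-place variant returns the original object.
items = [1]
result = run([("PUSH", items), ("PUSH", [2]), ("INPLACE_ADD", None)])
print(result[0] is items, items)
```

The last two lines illustrate the remark that the resulting TOS of an in-place operation may be the original TOS1: the list is extended rather than replaced.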

Bytecode

The Bytecode object is a wrapper for Python code and provides easy access to details of the compiled code. It's a convenience wrapper for the bytecode analysis functions, e.g. get_instructions(). Iterating over a Bytecode instance yields the bytecode operations as Instruction instances. Let's take a simple example:

def func():
    return 5

The function at hand does nothing else but return the number 5. Let's extend the code snippet to create a Bytecode based on the function.

# bytecode_intro.py

import dis


def func():
    return 5


bytecode = dis.Bytecode(func)

You can access the compiled code object via bytecode.codeobj. The compiled code object can, for instance, be passed to the built-in eval() function to evaluate it.

Note: You shouldn't do that in production code!

# code of bytecode_intro.py
result = eval(bytecode.codeobj)
print(result)

This will print the number 5 if executed. To find out at which line in the file our code object starts, we can access the first_line attribute (this might be None in some cases).

# code of bytecode_intro.py
print(bytecode.first_line)

Executing the file will print 6: the first line is a comment followed by a blank line, the third line is the import-statement followed by two blank lines, so the function definition starts in line 6.
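
The relationship between a Bytecode instance and the underlying function can be checked directly. The following is a small sketch, assuming the same func() as above; it only relies on attributes mentioned in this section plus the standard __code__ attribute of functions.

```python
import dis


def func():
    return 5


bytecode = dis.Bytecode(func)

# The wrapped code object is the function's own code object.
print(bytecode.codeobj is func.__code__)

# Without an explicit first_line, the attribute mirrors co_firstlineno.
print(bytecode.first_line == func.__code__.co_firstlineno)

# Iterating over the Bytecode instance yields Instruction instances.
opnames = [instruction.opname for instruction in bytecode]
print(opnames)
```

The printed opnames differ between CPython versions, which is why the sketch does not show an expected listing.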

The two remaining instance methods are dis() and info(). In essence, both work like the corresponding dis.dis() and dis.code_info() functions, so we will have a look at them in the Bytecode Analysis section.

Additionally, the Bytecode class provides a class method called from_traceback(). This method constructs a Bytecode instance from a given traceback and sets the current_offset to the instruction responsible for the exception. The class method can be used to have a closer look at breaking code.
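
As a rough illustration of from_traceback(), the sketch below provokes an exception in a hypothetical divide() function, keeps the traceback, and builds a Bytecode instance from it. The names divide, tb and innermost are chosen for this example only.

```python
import dis


def divide(numerator, denominator):
    return numerator / denominator


try:
    divide(1, 0)
except ZeroDivisionError as error:
    tb = error.__traceback__

bytecode = dis.Bytecode.from_traceback(tb)

# Walk to the innermost traceback entry, i.e. the frame that raised.
innermost = tb
while innermost.tb_next is not None:
    innermost = innermost.tb_next

# The Bytecode instance wraps the failing function's code object ...
print(bytecode.codeobj is divide.__code__)
# ... and current_offset is the offset of the instruction that raised.
print(bytecode.current_offset == innermost.tb_lasti)

# dis() returns the listing as a string, with the current instruction marked.
print(bytecode.dis())
```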

Bytecode Analysis

The analysis functions of the dis-module convert the input directly to the desired output. This is useful if you only want to perform a single operation and don't need the intermediate analysis object. The dis-module provides nine analysis functions:

  • code_info()
  • show_code()
  • dis()
  • distb()
  • disassemble()/disco()
  • get_instructions()
  • findlinestarts()
  • findlabels()
  • stack_effect()

We will have a look at each of them using the following code example:

# analysis_functions.py

import dis


def func():
    x = []
    x.append(5)
    return x

The func() function does nothing else but create an empty list, add the number 5 to the end of the list and return the list afterwards.


The code_info() function returns a formatted multi-line string with detailed information about the code object for the supplied input. As the function only returns the string, it has to be printed explicitly. Adding the line print(dis.code_info(func)) at the end of the example file and executing the script afterwards may result in something like this:

$ python analysis_functions.py
Name:              func
Filename:          analysis_functions.py
Argument count:    0
Positional-only arguments: 0
Kw-only arguments: 0
Number of locals:  1
Stack size:        3
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: 5
Names:
   0: append
Variable names:
   0: x

Note: The contents of code_info() are highly dependent on the Python VM and Python version, so don't expect to always see the same information.

As we can see, code_info() shows us the name of the function as well as the filename (in the REPL the filename would be something like <stdin>). Furthermore, the total number of arguments is displayed as well as the number of positional-only and keyword-only arguments. code_info() reveals that only one local variable is defined, namely the variable x. Two constants are used (None and 5) as well as one name (the method append()).
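
The information in this report comes from attributes of the code object, which you can also read directly. The sketch below assumes the func() from the example file and uses the standard co_* attributes; which constants end up in co_consts varies between CPython versions, so they are only printed here.

```python
import dis


def func():
    x = []
    x.append(5)
    return x


# code_info() returns a string; nothing is printed until we do it ourselves.
info = dis.code_info(func)
print(info)

# The same facts, read straight from the code object.
code = func.__code__
print(code.co_name, code.co_argcount, code.co_varnames, code.co_names)
print(code.co_consts)
```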

The next function is show_code(). In fact, this is only a convenient shorthand for print(code_info(x)). You can specify an optional file if you don't want it to be printed to sys.stdout.
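
For instance, the optional file argument can be used to capture the report in memory instead of printing it. This is a minimal sketch using io.StringIO as the target; the variable names buffer and report are chosen for illustration.

```python
import dis
import io


def func():
    x = []
    x.append(5)
    return x


# Redirect the report into an in-memory text buffer.
buffer = io.StringIO()
dis.show_code(func, file=buffer)
report = buffer.getvalue()

# The captured text is the code_info() string plus a trailing newline.
print(report == dis.code_info(func) + "\n")
```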


A more frequently used function to gain insights into your code is dis(). As the name indicates, it disassembles your code. If you provide a class, it will disassemble all methods including class and static methods. Furthermore, nested functions and code objects (comprehensions, generator expressions) are disassembled recursively.

Let's add dis.dis(func) to our analysis_functions.py script and execute it.

$ python analysis_functions.py
  7           0 BUILD_LIST               0
              2 STORE_FAST               0 (x)

  8           4 LOAD_FAST                0 (x)
              6 LOAD_METHOD              0 (append)
              8 LOAD_CONST               1 (5)
             10 CALL_METHOD              1
             12 POP_TOP

  9          14 LOAD_FAST                0 (x)
             16 RETURN_VALUE

The numbers 7, 8, and 9 in the first column are the line numbers in the file. The second column (0, 2, 4, 6, and so on) shows the instruction offsets. The third column shows the opnames and the fourth column the numeric argument of the instruction (if it has one), for example an index into the constants or names, or the number of arguments of a call. The last column, in parentheses, shows the resolved argument: the name of the method or variable, or the constant that's being loaded.

In the example at hand, the function first builds an empty list and stores it in the variable x. In line 8 the variable x is loaded as well as its method append(). After the constant 5 is pushed onto the stack, the method is called with one argument (CALL_METHOD). The call leaves its return value (None, as returned by append()) on top of the stack; since that value is not used, it is popped (POP_TOP). We are now in line 9 of the script, load the variable x and return it to the caller.

As you can see, the dis() function allows us to see how Python converts our code to bytecode instructions. If you want to have a first example use-case, try to find out if and how a list comprehension and an equivalent for-loop differ in their bytecode instructions.
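
One possible way to approach that comparison is sketched below. The two functions and the opnames() helper are hypothetical names for this example. Depending on the CPython version, a list comprehension is either compiled into a nested code object or inlined into the surrounding function, so the helper also walks nested code objects found in co_consts.

```python
import dis
import types


def with_comprehension(n):
    return [i * 2 for i in range(n)]


def with_loop(n):
    result = []
    for i in range(n):
        result.append(i * 2)
    return result


def opnames(code):
    """Collect the opnames of a code object and of code objects nested in it."""
    names = {instruction.opname for instruction in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


comprehension_ops = opnames(with_comprehension.__code__)
loop_ops = opnames(with_loop.__code__)

# Instructions that appear in only one of the two variants.
print(sorted(comprehension_ops - loop_ops))
print(sorted(loop_ops - comprehension_ops))
```

The comprehension uses the dedicated LIST_APPEND instruction, whereas the loop has to look up and call the append() method on every iteration.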


Now let's have a look at the distb() function. This function disassembles the top-of-stack function of a traceback. If none was passed, the last traceback is used. This way the instruction causing the exception is indicated. To see an easy example, open the Python REPL by typing python (or python3 on your machine) and try to execute dis.disb(). This results in an AttributeError as the function disb() does not exist.

>>> import dis
>>> dis.disb()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    dis.disb()
AttributeError: module 'dis' has no attribute 'disb'

Now we can call the distb() function without passing a traceback, so it uses the most recent one.

>>> dis.distb()
  1           0 LOAD_NAME                0 (dis)
    -->       2 LOAD_METHOD              1 (disb)
              4 CALL_METHOD              0
              6 PRINT_EXPR
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

As we can see, Python failed to load the method disb() as it simply did not exist.


The next function we will look at is disassemble() (or disco() if you prefer - it's an alias and points to disassemble()). To me it seems to be a function like dis(), except that you need to supply an actual compiled code object instead of a reference to a function, module or something similar. I extended the analysis_functions.py script and came up with something like this:

# analysis_functions.py

import dis


def func():
    x = []
    x.append(5)
    return x


bytecode = dis.Bytecode(func)
dis.disassemble(bytecode.codeobj)

This results in the same output as with the dis() function example:

$ python analysis_functions.py
  7           0 BUILD_LIST               0
              2 STORE_FAST               0 (x)

  8           4 LOAD_FAST                0 (x)
              6 LOAD_METHOD              0 (append)
              8 LOAD_CONST               1 (5)
             10 CALL_METHOD              1
             12 POP_TOP

  9          14 LOAD_FAST                0 (x)
             16 RETURN_VALUE

However, supplying a value to the lasti argument adds a pointer to the specified offset. So calling disassemble() with lasti=4 points to the LOAD_FAST instruction with the offset 4:

$ python analysis_functions.py
  7           0 BUILD_LIST               0
              2 STORE_FAST               0 (x)

  8 -->       4 LOAD_FAST                0 (x)
              6 LOAD_METHOD              0 (append)
              8 LOAD_CONST               1 (5)
             10 CALL_METHOD              1
             12 POP_TOP

  9          14 LOAD_FAST                0 (x)
             16 RETURN_VALUE
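
Since the offsets of instructions differ between CPython versions, a hard-coded lasti value may not hit an instruction boundary everywhere. A slightly more portable sketch takes the offset from get_instructions() instead; picking the third instruction here is an arbitrary choice for illustration.

```python
import dis
import io


def func():
    x = []
    x.append(5)
    return x


# Pick an existing instruction and use its offset as lasti.
instructions = list(dis.get_instructions(func))
target = instructions[2]

buffer = io.StringIO()
dis.disassemble(func.__code__, lasti=target.offset, file=buffer)
listing = buffer.getvalue()

# Exactly one line of the listing carries the arrow.
marked = [line for line in listing.splitlines() if "-->" in line]
print(listing)
```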

If you have a look at the implementation of the functions [1], you can see that at some point dis() calls disassemble() and that both rely on the same private helper functions.


The get_instructions() function returns a generator yielding a series of Instruction named tuples. In fact, it's the same as iterating over the Bytecode instance discussed earlier. We can add the following for-loop to our analysis_functions.py script to produce the result shown below it:

# analysis_functions.py code
for instruction in dis.get_instructions(func):
    print(instruction)

$ python analysis_functions.py
Instruction(opname='BUILD_LIST', opcode=103, arg=0, argval=0, argrepr='', offset=0, starts_line=7, is_jump_target=False)
Instruction(opname='STORE_FAST', opcode=125, arg=0, argval='x', argrepr='x', offset=2, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=4, starts_line=8, is_jump_target=False)
Instruction(opname='LOAD_METHOD', opcode=160, arg=0, argval='append', argrepr='append', offset=6, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_CONST', opcode=100, arg=1, argval=5, argrepr='5', offset=8, starts_line=None, is_jump_target=False)
Instruction(opname='CALL_METHOD', opcode=161, arg=1, argval=1, argrepr='', offset=10, starts_line=None, is_jump_target=False)
Instruction(opname='POP_TOP', opcode=1, arg=None, argval=None, argrepr='', offset=12, starts_line=None, is_jump_target=False)
Instruction(opname='LOAD_FAST', opcode=124, arg=0, argval='x', argrepr='x', offset=14, starts_line=9, is_jump_target=False)
Instruction(opname='RETURN_VALUE', opcode=83, arg=None, argval=None, argrepr='', offset=16, starts_line=None, is_jump_target=False)
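
Because the instructions are ordinary named tuples, they are easy to process further. As one possible use, the sketch below counts how often each opname occurs in func(); the name histogram is chosen for this example.

```python
import collections
import dis


def func():
    x = []
    x.append(5)
    return x


# Count the occurrences of every opname in the function.
histogram = collections.Counter(
    instruction.opname for instruction in dis.get_instructions(func)
)

for opname, count in histogram.most_common():
    print(f"{opname:<20} {count}")
```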

findlinestarts() is used to find the offsets which are starts of lines in the source code. To do so, the generator function uses the co_firstlineno and co_lnotab attributes of the supplied code object. The line starts are then generated as (offset, lineno) pairs. Let's create a Bytecode instance for our func() function and supply the code object to the findlinestarts() function. Subsequently, we iterate over it and print the generated elements.

# analysis_functions.py code
bytecode = dis.Bytecode(func)
for pair in dis.findlinestarts(bytecode.codeobj):
    print(pair)

For the example at hand this results in the following printed pairs:

$ python analysis_functions.py
(0, 7)
(4, 8)
(14, 9)
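
The pairs can also be read relative to the start of the function, which makes them independent of where the function sits in the file. The sketch below does this for func(); note that newer CPython versions may additionally report the line of the def statement itself.

```python
import dis


def func():
    x = []
    x.append(5)
    return x


code = func.__code__
line_starts = list(dis.findlinestarts(code))

for offset, lineno in line_starts:
    # Line numbers relative to the def line: 1 is the first body line.
    print(f"offset {offset:>3} starts body line {lineno - code.co_firstlineno}")

offsets = [offset for offset, _ in line_starts]
lines = {lineno for _, lineno in line_starts}
```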

The findlabels() function returns a list of all offsets that are jump targets in the raw compiled bytecode string. It's used to set the is_jump_target attribute of an Instruction instance.
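
A function needs some kind of branching to have jump targets at all. The following sketch uses a hypothetical sign() function with an if-statement and compares the result of findlabels(), which expects the raw bytes from co_code, with the is_jump_target flags of the instructions.

```python
import dis


def sign(x):
    if x > 0:
        return "positive"
    return "not positive"


# findlabels() works on the raw bytecode, not on the function itself.
labels = dis.findlabels(sign.__code__.co_code)

# The same offsets, derived from the Instruction objects.
jump_targets = [
    instruction.offset
    for instruction in dis.get_instructions(sign)
    if instruction.is_jump_target
]

print(labels)
print(jump_targets)
```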


The last analysis function in the dis-module is the stack_effect() function. It's not directly implemented in the dis-module itself but in Modules/_opcode.c [2]. According to the documentation it "[c]ompute[s] the stack effect of opcode with argument oparg." [3] As I couldn't find a use-case for it in day-to-day Python programming, I'll leave it at that.
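
To at least get a feeling for what the function reports, here is a brief sketch. It asks for the net change in stack size caused by two instructions: POP_TOP, which takes no argument, and LOAD_CONST, which needs one (the value 0 is an arbitrary index chosen for this example).

```python
import dis

# POP_TOP removes one item from the stack.
pop_effect = dis.stack_effect(dis.opmap["POP_TOP"])

# LOAD_CONST pushes one item; it requires an argument.
load_effect = dis.stack_effect(dis.opmap["LOAD_CONST"], 0)

print(pop_effect, load_effect)
```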

Opcode Collections

The dis-module provides a set of opcode collections, which can be used for automatic introspection of bytecode instructions. Currently, ten of them exist. I don't want to look at all of them, but let's have a look at two of them so you get a feeling of what's waiting for you.

The opmap collection is a dictionary mapping operation names to bytecodes. If you remember the Instruction object from the beginning, this mapping can be used to get the opcode for an opname. The reverse lookup, from opcode to opname, is provided by the dis.opname sequence. As opmap is quite a large dictionary, I don't want to post it here. If you want to see it, open a Python REPL session, import the dis-module by typing import dis and print the opmap collection by typing dis.opmap.

The second collection I want to introduce is the haslocal collection. It provides a sequence of bytecodes that access a local variable. If you have your REPL still open and type dis.haslocal, you get the sequence [124, 125, 126]. If you now look these numbers up in dis.opname (or search opmap for them), you know that only the instructions LOAD_FAST, STORE_FAST, and DELETE_FAST are accessing a local variable.
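
Both lookups fit into a few lines. The sketch below goes from an opname to its opcode and back, and then resolves the haslocal opcodes to names. On CPython versions newer than 3.8 the numbers differ and haslocal may contain additional instructions.

```python
import dis

# opname -> opcode via the opmap dictionary ...
opcode = dis.opmap["RETURN_VALUE"]
# ... and opcode -> opname via the opname sequence.
name = dis.opname[opcode]
print(opcode, name)

# Resolve the opcodes that access a local variable to readable names.
local_ops = [dis.opname[op] for op in dis.haslocal]
print(local_ops)
```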

Summary

Congratulations, you made it through the article and learned a lot about Python's dis-module! Now you know what an instruction is and what it consists of. You noticed that the Bytecode object is used to provide easy access to the details of the compiled code and had a look at all the bytecode analysis functions available in the module.

You want to explore your Python code, but don't know where to start? Take a small function you wrote, supply it to some of the analysis functions and try to understand what's happening. I hope you enjoyed reading the article and would be happy to receive feedback (contact information). Feel free to share the article with your friends and colleagues. Stay curious and keep coding!

References