[Python] Pointers/references in Python

Coming from lower-level programming languages like pure C/C++, you may think there are no pointers/references in dynamically-typed higher-level PLs like Lisp or Python.

In fact, there are, just hidden.

Ever tried to create a 2D-array in Python 2/3? Ahh, the classic bug:

>>> a=[[None]*4]*4

>>> a
[[None, None, None, None], [None, None, None, None], [None, None, None, None], [None, None, None, None]]

>>> a[0][0]=1

>>> a
[[1, None, None, None], [1, None, None, None], [1, None, None, None], [1, None, None, None]]

You do not create a 2D-array. You create a 4-item list of None-s and 4-item list of lists. The 'outer' list contains 4 links (or pointers) to the same 4-item list of None-s! They are like 'mirrors'.

So there are in fact only two lists in memory at this moment. Inner list: [None, None, None, None], and after modification: [1, None, None, None]. Outer list: [pointer, pointer, pointer, poiner].

Using id() to get address of object:

>>> a=[[None]*4]*4
>>> id(a)
140258511872128
>>> id(a[0])
140258513079296
>>> id(a[1])
140258513079296
>>> id(a[2])
140258513079296
>>> id(a[3])
140258513079296

... after modification all addresses are unchanged ...

>>> a[0][0]=1
>>> id(a)
140258511872128
>>> id(a[0])
140258513079296
>>> id(a[1])
140258513079296
>>> id(a[2])
140258513079296
>>> id(a[3])
140258513079296

... or, for each element of list ...

>>> list(map(lambda x: list(map(id, x)), a))
[
[9788992, 9484816, 9484816, 9484816],
[9788992, 9484816, 9484816, 9484816],
[9788992, 9484816, 9484816, 9484816],
[9788992, 9484816, 9484816, 9484816]
]

Correct way of creating a 2D-array is creating each element:

>>> a = [[None for x in range(4)] for y in range(4)]

... all pointers in the 'outer' list are different ...

>>> list(map(id, a))
[140258511872960, 140258511872064, 140258511825984, 140258511872512]

Hence you will have 5 lists in memory: 4 'inner' lists and 1 'outer'.

This is why there exist 'shallow copy' and 'deep copy'. These procedures are exist also in Lisp family of PLs.

'Shallow copy' copies only links/pointers. 'Deep copy' recreates all structure recursively, hence, works slow.

See the source code of the Python module that does deep copy, it recreates dictionaries and lists copying all elements one-by-one.


Update from IRC #python (Libera):

00:50 < SnoopJ> Everything, see also: `lst = [1,2,3]; lst.append(lst)`
00:51 < SnoopJ> Everything, but in fact, in CPython, pretty much everything is PyObject*, so references and even pointers are all over
                Python ;)

Let's see:

>>> lst = [1,2,3]; lst.append(lst)
  
>>> lst
[1, 2, 3, [...]]

>>> lst[0]
1

>>> lst[1]
2

>>> lst[2]
3

>>> lst[3]
[1, 2, 3, [...]]

Note the last line -- it's like recursion. Recursive structure cannot be dumped fully due to infinite loop, you must stop at some point.

What are pointers?

>>> id(lst)
140258511875328

>>> list(map(id, lst))
[9788992, 9789024, 9789056, 140258511875328]

Comments at lobste.rs.


Another update. Citing NetworkX manuals about usefulness of deep/shallow copies:

Deepcopy – A “deepcopy” copies the graph structure as well as all data attributes and any objects they might contain. The entire graph object is new so that changes in the copy do not affect the original object. (see Python’s copy.deepcopy)

Data Reference (Shallow) – For a shallow copy the graph structure is copied but the edge, node and graph attribute dicts are references to those in the original graph. This saves time and memory but could cause confusion if you change an attribute in one graph and it changes the attribute in the other. NetworkX does not provide this level of shallow copy.

Independent Shallow – This copy creates new independent attribute dicts and then does a shallow copy of the attributes. That is, any attributes that are containers are shared between the new graph and the original. This is exactly what dict.copy() provides.


List of my other blog posts.

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.