Python No Copy / is

Python has a very consistent way of handling memory between lines of code or between functions - there is just one string, or list, or dict, and references to that one object proliferate across the lines.

Toward the end of the page are explanations of how to make a copy and how to use the is operator.

Python Does Not Copy #1 - Lists

Suppose your code is manipulating a string or list or dict, and so has a reference to this structure. What happens if there is an assignment =? Does this result in two lists?

>>> lst = [1, 2]
>>> lst2 = lst

No, there is just the one list. The = works in a "shallow" way, creating an additional reference to the one list.

[Figure: lst and lst2 point to the same list]

Python always works this way - there is just one list or dict or whatever the code created explicitly, e.g. in this case [1, 2], and then references to that one list are spread around.

We can observe that there's just one list, since changes made on the lst2 variable can be seen on the lst variable.

>>> lst
[1, 2]
>>> lst2.append(3)
>>> lst2
[1, 2, 3]
>>> lst    # shows there was just one list
[1, 2, 3]
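The same observation can be written as a small script. This sketch uses the built-in id(), which returns an integer identity for an object, as one extra way to confirm that both names refer to the one list.

```python
lst = [1, 2]
lst2 = lst            # no copy: a second reference to the same list

lst2.append(3)        # modify the list through lst2
print(lst)            # [1, 2, 3] - the change shows up through lst too

# Equal ids mean both variables refer to the one list object
print(id(lst) == id(lst2))   # True
```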

Python Does Not Copy #2 - Nesting

Suppose there is a dictionary d, and a list is stored as a value inside it. What happens when code refers to that list inside the dict? Does that make a copy?

No copy is made. There is the one list inside of the dict, and lst points to that list, even though it is nested inside of something else.

[Figure: lst points to the list inside of the dict]

>>> d = {}
>>> d['a'] = [1, 2]  # put list inside of dict
>>> d
{'a': [1, 2]}
>>> lst = d['a']     # a reference to the list
>>> lst
[1, 2]

As before, we can observe that there is one list with multiple references by changing it.

>>> lst = d['a']
>>> lst.append(3)
>>> lst
[1, 2, 3]
>>> d     # Observe the list within d is changed too
{'a': [1, 2, 3]}
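The sharing works in the other direction too. In this sketch (the variable name nums is just for illustration), an existing list is stored into a dict, and the dict holds a reference to that list rather than a copy, so a change made through nums is visible through d.

```python
nums = [1, 2]
d = {'a': nums}        # the dict stores a reference to nums, not a copy
nums.append(3)         # change the list through the nums variable
print(d)               # {'a': [1, 2, 3]}
print(d['a'] is nums)  # True - one list, two ways to reach it
```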

Python Does Not Copy #2a - Nesting Again

Say we have the above case with the list inside the dict at key 'a'. The expression d['a'] is a reference to that list. This means the expression d['a'] can be used in code to examine or modify that list.

>>> d = {}
>>> d['a'] = [1, 2]
>>> d['a']             # look at list
[1, 2]
>>> d['a'].append(13)  # refer to list and change it
>>> d['a'] 
[1, 2, 13]
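A common place this shows up is building a dict of lists. The sketch below, using made-up sample words, groups words by their first letter. The line d[key].append(word) works because d[key] is a reference to the list stored in the dict, so appending to it changes that list in place, with no need to store it back.

```python
words = ['apple', 'avocado', 'banana']   # sample data for illustration

d = {}
for word in words:
    key = word[0]              # group by first letter
    if key not in d:
        d[key] = []            # store a new empty list in the dict
    d[key].append(word)        # d[key] is that list, so this modifies it in place

print(d)   # {'a': ['apple', 'avocado'], 'b': ['banana']}
```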

Python Does Not Copy #3 - Parameters and Return

Suppose we have this code:

def exclaim(strs):
    """
    Modifies the strs list,
    appending '!' to each str element.
    """
    for i in range(len(strs)):
        strs[i] += '!'

def caller():
    lst = ['a', 'b', 'c']
    exclaim(lst)
    # what's in lst now?

What happens when exclaim() is called, passing in the list of strings? Does this make a copy of the list? No. As before, no copy is made.

[Figure: caller and called function point to the same memory]

The called function exclaim() just gets a reference to the list that caller created.
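One way to check this directly is to compare identities inside and outside the function. In this sketch, param_id() is a hypothetical helper, not part of the original code; it just reports the id() of whatever its parameter refers to.

```python
def param_id(strs):
    """Hypothetical helper: return the identity of the object strs refers to."""
    return id(strs)

lst = ['a', 'b', 'c']
# Equal ids mean the function received the caller's list itself, not a copy
print(param_id(lst) == id(lst))   # True
```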

Caller / Called Functions Share Memory = Communicate Changes

Because the caller and the called function share the one list, changes made by the called function are seen by the caller - it's just the one data structure being worked on by two or more functions.

This is a form of data communication from the called function back to the caller, but it is less explicit than using return. With return, there is a visible line with an expression, and the value of that expression is what gets sent back.

In contrast, this "shallow" communication, which is a fine technique, is more implicit: the change happens wherever the shared list is modified, and every reference to the list sees it. The contract of the exclaim() function is that whatever list the caller provides will be modified.
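For contrast, here is a sketch of a return-based version. The name exclaimed() is hypothetical, not from the original: instead of modifying the caller's list, it builds a new list and returns it, so the result reaches the caller only through the return value and the original list is left alone.

```python
def exclaimed(strs):
    """
    Hypothetical return-based variant of exclaim():
    builds and returns a new list, leaving strs unchanged.
    """
    result = []
    for s in strs:
        result.append(s + '!')
    return result

lst = ['a', 'b', 'c']
lst2 = exclaimed(lst)   # the result comes back explicitly via return
print(lst)              # ['a', 'b', 'c'] - original untouched
print(lst2)             # ['a!', 'b!', 'c!']
```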

Making a Copy

What if code wants to make a copy of, say, a list? Perhaps the code is going to modify a list, but wants to keep a copy of its original form. This is not a very common need, but here is how to do it.

Both lists and dicts have a .copy() function that will construct and return a copy.

This code makes a list a, and then creates a copy b with its own memory.

>>> a = [1, 2, 3]
>>> b = a.copy()   # b is copy of a
>>> a              # a and b look the same
[1, 2, 3]
>>> b
[1, 2, 3]

We can tell that b is a copy by modifying a and observing that b is not changed:

>>> a.append(4)  # modify a
>>> a
[1, 2, 3, 4]
>>> b            # see b is not changed
[1, 2, 3]

Another, older way to make a list copy is with a slice covering the whole list, like this:

>>> b = a[:]     # make copy with slice

Some may prefer the .copy() function since it spells out the intent, and it works for dicts too.
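One detail worth knowing: .copy() (like the a[:] slice) makes a "shallow" copy, in the same sense of shallow used above. The outer list or dict is new, but anything nested inside it is still shared, following the same no-copy rule. The sketch below shows this with a dict holding a list; copy.deepcopy() from the standard library copy module is one way to copy the nested structure as well.

```python
import copy

d = {'a': [1, 2]}
d2 = d.copy()          # new dict, but the values inside are not copied
d2['b'] = [5]          # adding a key to d2 does not affect d
print(d)               # {'a': [1, 2]}

d2['a'].append(3)      # d2['a'] is the same list object as d['a']
print(d)               # {'a': [1, 2, 3]}

d3 = copy.deepcopy(d)  # deepcopy also copies the nested list
d3['a'].append(4)
print(d)               # {'a': [1, 2, 3]} - unchanged this time
```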

Detect a Copy: is

Say we have a and b lists that appear to have the same values. How can we tell if it's one list with two variables pointing to it, vs. two lists which happen to look the same? This is what the is operator does - True if the two values are literally the same object.

>>> a = [1, 2, 3]
>>> b = a
>>> a is b          # "is" True case
True
>>> c = [1, 2, 3]   # c looks like a
>>> a is c          # "is" False case
False
>>> a is not c      # "is not" variant
True
>>> a == c          # == returns True
True

Programmers generally need to have a mental model of how the lists and whatnot are shared between the parts of the code. Thinking about the shared memory between functions is common.

However, the need for code to use the is operator to detect whether two values are copies is very rare. Generally when code is passed a list or something similar, the code can just use it, rather than checking how the list is allocated in memory.

The main use of is is for PEP8 style conformance, which states that comparisons to singletons like None should be written with is or is not instead of ==, like this (for True and False, PEP8 instead recommends testing the value directly, as in if flag:):

if x is None:        # PEP8 None comparison
    print('None detected')

This use of is is not really about copying. The issue is that a datatype can override the == operator (by defining __eq__) and so provide unpredictable behavior. The is operator cannot be overridden, and there is only one None object, so the is None comparison is reliable.
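To see the problem concretely, here is a sketch with a hypothetical class AlwaysEqual (made up for illustration) whose __eq__ claims equality with anything. The == comparison to None is fooled, while is still gives the right answer.

```python
class AlwaysEqual:
    """Hypothetical class whose == claims equality with anything."""
    def __eq__(self, other):
        return True

x = AlwaysEqual()
print(x == None)   # True - the overridden __eq__ answers, misleadingly
print(x is None)   # False - is compares identity and cannot be overridden
```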


Copyright 2020 Nick Parlante