Python Style Readable

Readable Code

What is the best code? Different situations call for different goals, but an excellent all-purpose goal for code is that it is readable. When someone looks at the code, what it does is apparent. The code "reads" nicely, as the combination of function names and variable names narrates what it does.

Readable code tends to be less buggy, since a bug is a case where what the code says literally diverges from what the author had in mind.

Readable-1 - Good Function Names

Good function names are the first step in readable code. Function names often use verbs indicating what calling the function will accomplish. Look at how the function names below make the surrounding code read nicely.

delete_files(files)


if is_url_sketchy(url):
    display_alert('That url looks sketchy!')
else:
    html = download_url(url)


s = remove_digits(s)


count = count_duplicates(coordinates)


canvas.draw_line(0, 0, 10, 10)

The name of a function does not need to spell out every true detail about it. In Python, the sweet spot is probably one to three words, separated by underbars _, summarizing the main idea of the function.
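As a minimal sketch of how a short verb-style name pairs with a definition, here is one possible body for remove_digits(). The implementation is an assumption for illustration; the text above only shows the call.

```python
def remove_digits(s):
    """Return a copy of s with all digit chars removed."""
    result = ''
    for ch in s:
        # Keep every char that is not a digit
        if not ch.isdigit():
            result += ch
    return result


s = 'abc123xyz9'
s = remove_digits(s)   # s is now 'abcxyz'
```

The two-word name summarizes the main idea. Details such as "returns a new string" are left to the docstring.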

Boolean Functions: is_xxx() has_xxx()

If a function returns a boolean value, starting its name with is_ or has_ can be a good choice. Think about how the function call will read when used in an if or while:

if is_weak(password):
    ...

This is not a requirement though. The Bit function bit.front_clear() works well enough. It could be named bit.is_front_clear(), but that may not be enough of an improvement to merit adding another word.
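Here is a small sketch of a boolean function and how its name reads at the call site. The rule inside is_weak() is a made-up assumption for illustration, and check_password() is a hypothetical caller.

```python
def is_weak(password):
    """Return True if password looks weak (hypothetical rule)."""
    # Assumed rule for illustration: too short, or made only of digits
    return len(password) < 8 or password.isdigit()


def check_password(password):
    """Hypothetical caller, showing how the is_ name reads in an if."""
    if is_weak(password):
        return 'Please choose a stronger password'
    return 'OK'
```

The line "if is_weak(password):" reads almost like an English sentence, which is the point of the is_ prefix.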

Function Name - Principle Of Least Surprise

The Principle of Least Surprise is a convention for function names. When designing a function, e.g. is_url_sketchy(url), imagine that another programmer is writing code to call this function. Assume that all the other programmer knows is its name, since they don't bother to read the documentation. Therefore, the function should only take actions that one might expect given its name. So is_url_sketchy() should not, say, delete a bunch of files.

compute_ ?

Is it a good idea for a function name to start with a word like compute_? There is no fixed answer about that. On the one hand, this is often quite accurate! The function does compute and return something when called, so the verb is fair. On the other hand, computing things is what most functions do, so we could say it's kind of implied. A good way to judge is to imagine what the code calling this function will look like. Below are a couple of calls to functions that compute distance. What do you think of the naming?

if distance(loc1, loc2) < 1.0:
  ...


if compute_distance(loc1, loc2) < 1.0:
  ...

In this case, the word "distance" has a clear meaning, so the compute_ does not add much.
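For reference, here is a sketch of what a distance() function might look like. It assumes each location is an (x, y) tuple, which the text does not specify.

```python
import math


def distance(loc1, loc2):
    """Return the straight-line distance between two (x, y) locations."""
    # Assumption: each loc is an (x, y) tuple
    x1, y1 = loc1
    x2, y2 = loc2
    return math.hypot(x2 - x1, y2 - y1)


def is_close(loc1, loc2):
    """Hypothetical caller: True if the locations are less than 1.0 apart."""
    return distance(loc1, loc2) < 1.0
```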

Readable-2 - Good Variable Names

The code in a function is a story, a narrative, and the variable and function names help you keep the parts of the story clear in your mind. A variable name provides a short label for a bit of data in the story.

Bugs - Mix Up Two Values

Many bugs result from the programmer mixing up two data values just in the two minutes they are working on those lines, resulting in a round of debugging. Therefore the payoff for good variable names is right now. It's more efficient to keep things straight the whole time, and good variable names are part of the solution.

brackets() Example

This example is from a previous lecture. "left" is a fine variable name in it; "x" or "i" would not be good choices.

brackets(s): Look for a pair of brackets '[...]' within s, and return the text between the brackets, so the string 'cat[dog]bird' returns 'dog'. If there are no brackets, return the empty string. If the brackets are present, there will be only one of each, and the right bracket will come after the left bracket.

def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    return s[left + 1:right]

brackets() Good Variable Names

The variables left and right are good variable names: they are short, but long enough to identify each value in the computation. Short variable names are easy to type and easy to read. A longer name is needed only if there are more details that need to be spelled out. In this case just left and right capture the main idea with one word each. The variable name does not need to capture every detail of the data value, just enough to label it versus the others in the function.
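To see the names at work, here is the function again with a few sample calls. The first call is the example from the specification; the other two inputs are added here for illustration.

```python
def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    return s[left + 1:right]


with_text = brackets('cat[dog]bird')   # 'dog'
no_brackets = brackets('catdog')       # '' since there is no '['
empty_pair = brackets('x[]y')          # '' since nothing is between the brackets
```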

Good Variable Examples

url           # a url string

urls          # a list of urls, w/ "s"

count         # counting something

count_coffee  # counting multiple things
count_tea

Too Long and Too Short Names

Here are some other possible names for left, exploring how long or short a variable name could be.

left                  # fine
left_index            # fine


int_index_of_left_paren   # too long
index_of_left_paren       # too long
# Don't need to spell out
# every detail in the name

a         # meaningless
li        # cryptic
l         # too short, and don't use "l"

When You Need More Words

Suppose the algorithm stored both the index and the character at that index - two values it would be very easy to mix up in the code. In that case, the variable names need added words to keep the two values straight:

left_index       # index of left char
left_ch          # char at that index
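Here is a sketch of a function that needs both values at once. The function first_open() and its spec are hypothetical, invented to show the two names side by side.

```python
def first_open(s):
    """
    Hypothetical example: find the first opening char '(', '[' or '{' in s.
    Return its index and the char itself, or (-1, '') if there is none.
    """
    for left_index in range(len(s)):
        left_ch = s[left_index]    # the char at that index
        if left_ch in '([{':
            return left_index, left_ch
    return -1, ''
```

With names like i and c it would be easy to pass the wrong one along. The _index and _ch suffixes keep the int and the string apart.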

From the Sand homework, the x_from and x_to variables are good variable name examples. That code was difficult, but at least each variable was labeled as what it was. The code would have been more difficult if the four x/y variables were named a, b, c, d.

brackets() - Bad Names a, b, c Example

Here is a version of brackets() with bad, meaningless names - a, b, c:

def brackets(a):
    c = a.find('[')
    if c == -1:
        return ''
    b = a.find(']')
    return a[b + 1:c]  # buggy?

Last Line - Good vs. Bad Vars

Looking at the last lines of the good and bad versions demonstrates the role of good variable names. Look at the last line of the bad names version below. Is that line correct?

# Bad names version
return a[b + 1:c]  # buggy?


# Good names version
return s[left + 1:right]

With a bad variable name, you have to look upwards in the code to remind yourself what value it holds. That's the sign of bad variable naming! The name of the variable should tell the story right there, without you scrolling up to remind yourself what it holds. Save yourself some time and give the variable a sensible name.
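Running the bad names version settles the "buggy?" question. In the sketch below, the function is renamed brackets_bad (a hypothetical name) so it can sit next to the good version.

```python
def brackets_bad(a):
    c = a.find('[')        # c is actually the left index
    if c == -1:
        return ''
    b = a.find(']')        # b is actually the right index
    return a[b + 1:c]      # slice runs from right + 1 to left: backwards


def brackets(s):
    left = s.find('[')
    if left == -1:
        return ''
    right = s.find(']')
    return s[left + 1:right]


bad_result = brackets_bad('cat[dog]bird')   # '' which is wrong
good_result = brackets('cat[dog]bird')      # 'dog'
```

The last line of the bad version has the two indexes swapped. Written as s[right + 1:left], the mistake would be visible at a glance; written as a[b + 1:c], it hides.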

Idiomatic Short Variable Names

Some situations are so common that there are standard, idiomatic short variable names for them.

Never name a variable with the single lowercase letter l or the letter O - these look too much like the digits 1 and 0.
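As a sketch, here are a few short names that are widely used this way: s for a generic string, n for a count, i for an index, and ch for a single char. The function repeat_each() is hypothetical, and this particular set of names is an assumption based on common practice and on the examples in these notes.

```python
def repeat_each(s, n):
    """Return s with each of its chars repeated n times."""
    result = ''
    for i in range(len(s)):   # i: index into s
        ch = s[i]             # ch: one char of s
        result += ch * n      # n: how many copies
    return result
```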

What Generic s Means

Using a generic variable like the string s in brackets(s) means the function should work with any string and we are not making any more specific claim about the input string. If we were writing a function that took a url string or an email string, we would name the parameter url or email.
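For contrast, here is a sketch of a function whose parameter is not a generic string. The function email_domain() is hypothetical; the point is only that the parameter name email tells the caller what kind of string is expected.

```python
def email_domain(email):
    """Return the part of email after the '@', or '' if there is no '@'."""
    at = email.find('@')
    if at == -1:
        return ''
    return email[at + 1:]
```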

Add Var Strategy

Look at the variables "left = .." "right = .." above. One variable at a time, they are picking off parts of the problem, making little progress steps. Solve part of the problem and store it in a well-named variable. This is a nice strategy to keep in mind (also known as Decomp by Var). If you are staring at the blank screen with the whole problem to do, think of some sub-part of the problem you could compute and store in a local variable. Keep going this way, picking off and naming bits of the solution. This is just the old divide-and-conquer strategy, but applied to these smaller steps within a function.
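Here is a sketch of the strategy on a different small problem. The function minutes_since_midnight() and its input format, a string like '2:30', are assumptions made up for this illustration.

```python
def minutes_since_midnight(time_str):
    """
    Given a string like '2:30' (hours:minutes),
    return the total number of minutes as an int.
    """
    # Each line picks off one part of the problem and names it
    colon = time_str.find(':')
    hours = int(time_str[:colon])
    minutes = int(time_str[colon + 1:])
    return hours * 60 + minutes
```

Each variable is one small step of progress. By the last line, the answer is a simple expression over well-named parts.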


Avoid Re-Computation - Store in Var

Suppose we have this loop, which builds n copies of the lowercase form of s. This code is fine; we will just point out a slight improvement.

def n_copies(s, n):
    result = ''
    for i in range(n):
        result += s.lower()
    return result

Notice that s.lower() computes the lowercase form of s in the loop. The readability is fine, but the code computes that lowercase form again and again and again. The lowercase of 'Hello' is the same 'hello' every time through the loop. This is a little wasteful. We could compute it once, store it in a variable, and use the variable in the loop:

def n_copies(s, n):
    result = ''
    low = s.lower()
    for i in range(n):
        result += low
    return result

This is a slight improvement. It would be especially important if the s.lower() computation were slow. The most important requirement of a function is calling its helpers correctly to compute the correct result. Here we are looking at a secondary goal: is there unnecessary re-computation we could eliminate?
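One way to see the difference is to count how many times the helper runs. In this sketch, counted_lower() is a hypothetical stand-in for s.lower() that records each call in a one-element list, and the two function names are made up to tell the versions apart.

```python
def counted_lower(s, counter):
    """Stand-in for s.lower() that also counts its calls in counter[0]."""
    counter[0] += 1
    return s.lower()


def n_copies_recompute(s, n, counter):
    result = ''
    for i in range(n):
        result += counted_lower(s, counter)   # runs n times
    return result


def n_copies_stored(s, n, counter):
    result = ''
    low = counted_lower(s, counter)           # runs once
    for i in range(n):
        result += low
    return result


recompute_calls = [0]
stored_calls = [0]
n_copies_recompute('Hello', 3, recompute_calls)   # recompute_calls[0] is 3
n_copies_stored('Hello', 3, stored_calls)         # stored_calls[0] is 1
```

Both versions return the same string. Only the number of helper calls changes.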


Here is another example of good and bad variable names.

switcheroo(s) Example

Here is a somewhat difficult string logic problem. Getting this perfect is not so easy, but the variable names can help.

switcheroo(s): Given a string s of even length, if the string length is 2 or less, return it unchanged. Otherwise take off the first and last chars. Consider the remaining middle piece. Split the middle into front and back halves. Swap the order of these two halves, and return the whole thing with the first and last chars restored. So 'x1234y' returns 'x3412y'.

switcheroo(s) - Good Var Names

The variable names here help us keep the various parts clear through the narrative, even at the moment we are working out each line. The variable names are naturally similar to those in the specification.

def switcheroo(s):
    if len(s) <= 2:
        return s
    first = s[0]
    last = s[len(s) - 1]
    middle = s[1:len(s) - 1]
    halfway = len(middle) // 2
    return first + middle[halfway:] + middle[:halfway] + last

The variable names don't have to be super detailed. Just enough to label the concepts through this narrative. Note that the one letter "s" is fine - there is nothing semantic about s that we need to keep track of beyond it's a string. In contrast, "first", "last", etc. have specific roles in the algorithm, and each one-word variable name tries to capture that role.
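Here is the function again with a few sample calls. The first is the example from the specification; the other inputs are added here to exercise the short-string case and the smallest middle.

```python
def switcheroo(s):
    if len(s) <= 2:
        return s
    first = s[0]
    last = s[len(s) - 1]
    middle = s[1:len(s) - 1]
    halfway = len(middle) // 2
    return first + middle[halfway:] + middle[:halfway] + last


spec_example = switcheroo('x1234y')   # 'x3412y'
short_string = switcheroo('ab')       # 'ab', unchanged
small_middle = switcheroo('abcd')     # 'acbd', middle 'bc' becomes 'cb'
```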

switcheroo(s) - Bad Var Names

Here is the above function written without any good variables, and without the benefit of spacing the steps of the algorithm out over several lines. Just because something is one line does not make it better. I believe it's correct, but it's hard to tell!

def switcheroo(s):
    if len(s) <= 2:
        return s
    return (s[0] + s[1:len(s) - 1][(len(s) - 2) // 2:] +
            s[1:len(s) - 1][:(len(s) - 2) // 2] + s[len(s) - 1])

This is a good example of code that is not readable.

The bad code also repeats computations, like (len(s) - 2) // 2. The good solution computes that value once and stores it in the variable halfway for use by later lines.  
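Since the dense version is hard to check by eye, one option is to check it by running it against the readable version. This is a sketch: the names switcheroo_readable, switcheroo_dense and versions_agree are hypothetical, chosen so both versions can live in one file.

```python
def switcheroo_readable(s):
    if len(s) <= 2:
        return s
    first = s[0]
    last = s[len(s) - 1]
    middle = s[1:len(s) - 1]
    halfway = len(middle) // 2
    return first + middle[halfway:] + middle[:halfway] + last


def switcheroo_dense(s):
    if len(s) <= 2:
        return s
    return (s[0] + s[1:len(s) - 1][(len(s) - 2) // 2:] +
            s[1:len(s) - 1][:(len(s) - 2) // 2] + s[len(s) - 1])


def versions_agree(strings):
    """Return True if both versions give the same result on every string."""
    for s in strings:
        if switcheroo_dense(s) != switcheroo_readable(s):
            return False
    return True


agree = versions_agree(['', 'ab', 'abcd', 'x1234y', 'abcdefgh'])
```

Agreement on a handful of inputs is evidence, not proof. The readable version is still the one you can verify by reading.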

 

Copyright 2020 Nick Parlante