File Read Write

We'll talk about file reading first, since it is much more common; file writing is covered at the end.

Python makes it easy to read the data out of a text file. There are a few different forms, depending on if you want to process the file line by line or all at once.

Say the variable filename holds the name of a text file as a string, like 'poem.txt'. The file 'poem.txt' is out in the file system with lines of text in it. Here is the standard code to open a file and loop over all its lines. If you need to loop over the lines of a file, you can paste this code in as a starting point. The code is explained below.

with open(filename) as f:
    for line in f:
        line = line.strip()
        # Use line here

1. `with open(filename) as f`

The phrase with open(filename) as f opens a connection to that file and stores it in the variable f. Code that wants to read the data from the file works through f, which is a sort of conduit to the file.

(Figure: the variable f is set to point to the file data.)

Aside: The connection to the file is kept open so long as the code is running within the indented with/open section. When the run leaves the indented section, the connection is automatically "closed", freeing up its resources. This is not something the programmer needs to worry about, just a bit of automatic cleanup.

2. `for line in f:`

The loop for line in f: reads through all the lines of a text file. On the first iteration of the loop, the variable line is set to point to the text of the first line from the file. On the second iteration, it points to the second line of text, and so on through the whole file. If the file is 100 lines long, the loop will iterate 100 times.

(Figure: the variable line points to each line of the file in turn.)

Aside: the loop looks at one line at a time. This has the advantage of requiring very little memory — only enough memory to hold the chars of a single line at a time, even though the file might have thousands or millions of lines of text in total.

3. `line = line.strip()`

Each line from the file has a newline char '\n' at its end. The newline char is basically the char produced by the return key on the keyboard. Sometimes this newline char gets in the way. To remove this complication, we will always remove the newline char with line = line.strip(). The .strip() function removes whitespace chars from the beginning and end of a string, including the newline char.
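As a quick illustration of what strip() does to a line, here is a minimal sketch. The sample string is made up for this example, and rstrip('\n') is shown only as a possible alternative for cases where leading spaces should be kept.

```python
# A line as it might arrive from a file: stray spaces plus a trailing newline.
raw_line = '  Roses are red \n'

stripped = raw_line.strip()        # removes whitespace from both ends
only_end = raw_line.rstrip('\n')   # removes only trailing newline chars

print(repr(stripped))
print(repr(only_end))
```

Printing with repr() makes the remaining spaces and the missing '\n' visible.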

Print File Example

For example, here is the standard file-read loop with print() added in the loop, so it prints out each line from the file.

with open(filename) as f:
    for line in f:
        line = line.strip()
        print(line)

Text Line Split

Often each text line has multiple items on it, separated by some char such as comma ','. The string split() function is very handy here, dividing a string into parts based on a separator char, like this:

>>> line = 'apple,12,donut,99'
>>> line.split(',')  # note ',' parameter
['apple', '12', 'donut', '99']
>>>
>>> line = 'aa:bb:cc'
>>> line.split(':')
['aa', 'bb', 'cc']
>>>

Text Line vs. Int and Float

The data in the lines is fundamentally text. Use functions like int() and float() to convert text to a number:

>>> line = '123'     # Line from the file
>>> int(line)        # Compute int value
123
>>>
>>> line = '3.14'
>>> float(line)
3.14
>>>
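Putting split() and int() together, a line of comma-separated text can be turned into Python values. The sketch below assumes a line format of alternating names and counts, as in the 'apple,12,donut,99' example; parse_line is a hypothetical helper name, not part of any library.

```python
def parse_line(line):
    """Hypothetical helper: turn 'name,count,name,count' text into (name, int) pairs."""
    parts = line.strip().split(',')
    pairs = []
    # Step through the parts two at a time: a name followed by its count.
    for i in range(0, len(parts) - 1, 2):
        pairs.append((parts[i], int(parts[i + 1])))
    return pairs


result = parse_line('apple,12,donut,99\n')
print(result)
```

Note that int() raises a ValueError if the text is not a valid integer, so real data may need extra checking.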

Unicode Encoding

The form open(filename, encoding='utf-8') specifies the encoding to use to interpret the file to unicode chars, in this case 'utf-8' which is a very common encoding.

If reading a file crashes with a "UnicodeDecodeError", probably the reading code needs to specify the particular encoding used by that file.
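Here is a small sketch of the encoding parameter in action. To keep it self-contained it first writes its own sample file into a temporary folder; the file name and its contents are made up for the example.

```python
import os
import tempfile

# Write a small sample file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'poem.txt')
with open(path, 'w', encoding='utf-8') as f:
    print('Café über naïve', file=f)

# Read it back, naming the encoding explicitly.
with open(path, encoding='utf-8') as f:
    text = f.read()

# Reading the same bytes with an encoding that does not fit can fail.
try:
    with open(path, encoding='ascii') as f:
        f.read()
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(len(text), decode_failed)
```

The second read fails because the accented chars are stored as bytes that are not valid in the 'ascii' encoding.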

1. text = f.read()

(You can try these in the >>> interpreter by running Python in a folder that contains a text file to read.)

You can read the whole file into a single string — less code and bother than going line by line. This is handy if the code does not need to consider each line separately.

with open(filename) as f:
    text = f.read()
    # Look at text str

In this example text is the string 'Roses are red\nViolets are blue\n' — the whole contents of the file in one string.

This approach will require memory in Python to store all of the bytes of the file. As an estimate, look at the byte size of the file in your operating system file viewer.

The read() function is designed to be called once, returning the contents of the file. Do not call read() a second time: once the whole file has been read, a further call just returns the empty string. Store the string returned in a variable and use that to access the file contents.
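The call-once behavior of read() can be seen in a short sketch. It writes its own two-line sample file into a temporary folder first, so the path used here is an assumption of the example rather than a file from the text.

```python
import os
import tempfile

# Create a two-line sample file to read.
path = os.path.join(tempfile.mkdtemp(), 'poem.txt')
with open(path, 'w') as f:
    print('Roses are red', file=f)
    print('Violets are blue', file=f)

with open(path) as f:
    text = f.read()    # the whole file contents as one string
    again = f.read()   # nothing left to read at this point

print(repr(text))
print(repr(again))
```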

File split()

Recall that the string function s.split() with no parameters splits on whitespace, returning a list of "words". Whitespace includes '\n', and the no-param form of split() treats a run of multiple whitespace chars as a single separator.

Therefore, split() works beautifully with the whole text of a file, treating '\n' like just another whitespace char. Here it is applied to our text file:

>>> text = 'Roses are red\nViolets are blue\n'
>>> text.split()
['Roses', 'are', 'red', 'Violets', 'are', 'blue']

So text = f.read() may be followed by words = text.split(). Now we have a list of words, without bothering with lines or looping.

Demo: read the whole book into one string, then split it into words. Python looks powerful here.

>>> with open('alice-book.txt') as f:
...   text = f.read()
>>> len(text)   # num chars,  len > 149,000 !
149103
>>> 
>>> text[:200]  # first 200 chars
"Alice's Adventures in Wonderland\n\n                ALICE'S ADVENTURES IN WONDERLAND\n\n                          Lewis Carroll\n\n               THE MILLENNIUM FULCRUM EDITION 3.0\n\n\n\n\n                     "
>>>
>>> words = text.split()  # split into words
>>> len(words)  # num words
26963
>>> words[:20]  # Look at first 20
["Alice's", 'Adventures', 'in', 'Wonderland', "ALICE'S", 'ADVENTURES', 'IN', 'WONDERLAND', 'Lewis', 'Carroll', 'THE', 'MILLENNIUM', 'FULCRUM', 'EDITION', '3.0', 'CHAPTER', 'I', 'Down', 'the', 'Rabbit-Hole']

Conclusion: with 3 lines of Python, we have a list of all the words, ready for a loop or whatever.

2. lines = f.readlines()

Another technique: the call f.readlines() returns a list of strings, one for each line. Sometimes it's more useful to have all the lines at once, vs. getting them one at a time in the standard loop. You can access the lines in any order, use a slice to get rid of some, and so on.

with open(filename) as f:
    lines = f.readlines()
    # lines[0], lines[1], .. are the file lines

For our example, the lines list is ['Roses are red\n', 'Violets are blue\n']. The lines in the list are analogous to the lines you would get with for-line-in-f, but in the form of a list.
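As a sketch of what having all the lines in a list allows, the code below skips the first line with a slice and strips the rest. It writes its own three-line sample file first; the third line is made up for the example.

```python
import os
import tempfile

# Create a three-line sample file to read.
path = os.path.join(tempfile.mkdtemp(), 'poem.txt')
with open(path, 'w') as f:
    print('Roses are red', file=f)
    print('Violets are blue', file=f)
    print('Sugar is sweet', file=f)

with open(path) as f:
    lines = f.readlines()

# Use a slice to skip the first line, and strip the newline from the others.
rest = []
for line in lines[1:]:
    rest.append(line.strip())

print(lines[-1])   # the last line, accessed directly by index
print(rest)
```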

File Closing

The with/open() form automates closing the file reference when the code is done using it. Closing the file frees up some memory resources associated with keeping the file open. In older versions of Python (and in other languages) the programmer is supposed to call f.close() manually when done with the file. Here is an example, not using the with/open() form, and instead closing the file manually:

f = open(filename)
# ...use f...
f.close()

Most code should just use the with open(..) structure so the file close is handled automatically.
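The automatic close can be observed through the f.closed attribute of the file object. The sketch below creates its own sample file, and also shows one common way to make the manual form close the file even if an error happens, using try/finally; that pattern is an addition here, not something the text above requires.

```python
import os
import tempfile

# Create a one-line sample file to read.
path = os.path.join(tempfile.mkdtemp(), 'poem.txt')
with open(path, 'w') as f:
    print('Roses are red', file=f)

with open(path) as f:
    open_inside = not f.closed   # True: the file is open inside the with section
    text = f.read()
closed_after = f.closed          # True: closed once the with section is left

# Manual form: try/finally makes sure close() runs even if reading fails.
f2 = open(path)
try:
    text2 = f2.read()
finally:
    f2.close()

print(open_inside, closed_after, f2.closed)
```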

File Writing

File "writing" is the opposite direction of reading — taking data in Python variables and writing it out to a text file. The CS106A projects typically do lots of reading, which is the most common form.

Here is example code that writes to a file (you can try this in the interpreter). First specify 'w' in the open(). Then call print('Hello', file=f) to print data out to the file as a series of text lines. This is the same print() function that writes to standard output, here used to write to the opened file.

>>> with open('out.txt', 'w') as f:
...   print('Hello there', file=f)
...   print('Opposite of reading!', file=f)

After running those lines, a file out.txt now exists in the directory from which we ran Python:

$ cat out.txt
Hello there
Opposite of reading!

Be very careful with the open(filename, 'w') form: if the file already exists, its contents are erased the moment it is opened. It is perhaps safer to use the output redirection technique described in the next section instead.
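The erasing behavior of 'w' is easy to demonstrate. This sketch works in a temporary folder so it cannot damage a real file; the os.path.exists() check is one possible way to notice an existing file before overwriting it, offered as a suggestion rather than a required step.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')

with open(path, 'w') as f:
    print('first version', file=f)

# The file exists now, so opening it with 'w' again wipes out its contents.
already_there = os.path.exists(path)
with open(path, 'w') as f:
    print('second version', file=f)

with open(path) as f:
    text = f.read()

print(already_there, repr(text))
```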

Output Redirection

Instead of coding the file-writing in your program, there is an alternative in the terminal that handles simple cases easily. This feature works on Mac, Windows, and Linux.

Many programs print output to standard output. When you run the program in the terminal, you see this printed out right there, like this run of a super.py program that prints out that today is just great.

$ python3 super.py
Everything today is just super
Most excellent
$

What if we wanted to write that text to an out.txt file? An easy way is to use the > output redirection feature in the command line, like this:

$ python3 super.py > out.txt  # send output to out.txt
$
$ cat out.txt   # see what's in the file
Everything today is just super
Most excellent
$

The > takes the output of a program and stores it in the file instead of showing it on screen. So a simple form of program prints its output to standard output. This way the user can see the output right away in the terminal. If the user wants to save the output in a file, they can run the program again with > to save the output to a file. This is why file-writing code is so rare compared to file-reading code: output redirection handles simple writing cases without any Python code required.

Text Files vs. Binary Files

The examples above all use text files made of lines and chars, which is the most common case. It's also possible to read a "binary" file, treating the file contents as raw bytes, not text. To open a file in read-binary mode, pass 'rb' as the mode, as shown below. Then f.read() returns a Python "bytes" structure of the raw bytes, where len() and square brackets work to access each value. Calling f.read(100) reads at most 100 bytes, so you may need to call read() multiple times in a loop to get through the whole file.

with open('binary-file', 'rb') as f:
    raw = f.read(100)
    # raw is
    # b'\xa7\x0f\x12\xff\x6e...
    # raw[0] == hex a7 i.e. 167
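Calling read() in a loop might look like the sketch below. It first writes 256 sample bytes to a temporary file using 'wb' mode and f.write(), which are not covered above and are used here only to set up the example; the chunk size of 100 matches the code above.

```python
import os
import tempfile

# Set up: write 256 sample bytes (values 0..255) to a temporary file.
path = os.path.join(tempfile.mkdtemp(), 'binary-file')
with open(path, 'wb') as f:
    f.write(bytes(range(256)))

total = 0    # number of bytes seen so far
chunks = 0   # number of read() calls that returned data
with open(path, 'rb') as f:
    while True:
        raw = f.read(100)   # up to 100 bytes at a time
        if not raw:         # empty bytes b'' means the end of the file
            break
        total += len(raw)
        chunks += 1

print(total, chunks)
```

The loop ends when read() returns an empty bytes value, which is how the end of the file is signaled.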

For more details, see the official python.org File Read/Write docs.