
How to Think Like a Computer Scientist: Learning with Python 3

13. Files

13.1. Reading and writing files

While a program is running, its data is stored in random access memory (RAM). RAM is fast and inexpensive, but it is also volatile, which means that when the program ends, or the computer shuts down, data in RAM disappears. To make data available the next time you turn on your computer and start your program, you have to write it to a non-volatile storage medium, such as a hard drive, USB drive, or CD-RW.

Data on non-volatile storage media is stored in named locations on the media called files. By reading and writing files, programs can save information between program runs.

Working with files is a lot like working with a notebook. To use a notebook, you have to open it. When you're done, you have to close it. While the notebook is open, you can either write in it or read from it. In either case, you know where you are in the notebook. You can read the whole notebook in its natural order or you can skip around. All of this applies to files as well.

To open a file, you specify its name and indicate whether you want to read or write. Opening a file creates what we call a file handle. In this example, the variable myfile refers to the new handle object. Our program calls methods on the handle, and this makes changes to the actual file, which is usually located on our disk.

    myfile = open('test.dat', 'w')

The open function takes two arguments. The first is the name of the file, and the second is the mode. Mode 'w' means that we are opening the file for writing. If there is no file named test.dat on the disk, it will be created. If there already is one, it will be replaced by the file we are writing.
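
The replacing behaviour of mode 'w' can be checked with a small experiment. This is only a sketch: it works in a temporary scratch directory (an assumption made here so that no real file is overwritten), and it uses the write, close and read methods that are introduced in the paragraphs that follow.

```python
import os
import tempfile

# Assumption of this sketch: use a throwaway directory instead of the
# current directory, so no existing test.dat is harmed.
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "test.dat")

# First open: the file does not exist yet, so mode 'w' creates it.
myfile = open(path, "w")
myfile.write("first version")
myfile.close()

# Second open: the file exists, so mode 'w' empties it before we write.
myfile = open(path, "w")
myfile.write("second")
myfile.close()

# Read the file back to see which text survived.
myfile = open(path, "r")
contents = myfile.read()
myfile.close()
print(contents)
```

Only the text from the second write remains, because opening an existing file in mode 'w' discards its old contents.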

To put data in the file we invoke the write method on the handle:

    myfile.write("Now is the time")
    myfile.write("to close the file")

Closing the file handle tells the system that we are done writing and makes the disk file available for reading by other programs (or by ourselves):

    myfile.close()

Now we can open the file again, this time for reading, and read the contents into a string. This time, the mode argument is 'r' for reading:

    >>> mynewhandle = open('test.dat', 'r')

If we try to open a file that doesn't exist, we get an error:

    >>> mynewhandle = open('test.cat', 'r')
    FileNotFoundError: [Errno 2] No such file or directory: 'test.cat'

Not surprisingly, the read method reads data from the file. With no arguments, it reads the entire contents of the file into a single string:

    >>> text = mynewhandle.read()
    >>> print(text)
    Now is the timeto close the file

There is no space between time and to because we did not write a space between the strings.

read can also take an argument that indicates how many characters to read:

    >>> myfile = open('test.dat', 'r')
    >>> print(myfile.read(5))
    Now i

If not enough characters are left in the file, read returns the remaining characters. When we get to the end of the file, read returns the empty string:

    >>> print(myfile.read(1000006))
    s the timeto close the file
    >>> print(myfile.read())

    >>>

The following function copies a file, reading and writing up to fifty characters at a time. The first argument is the name of the original file; the second is the name of the new file:

    def copy_file(oldfile, newfile):
        h_infile = open(oldfile, 'r')
        h_outfile = open(newfile, 'w')
        while True:
            text = h_infile.read(50)    # read at most 50 characters
            if text == "":              # empty string: end of file
                break
            h_outfile.write(text)
        h_infile.close()
        h_outfile.close()

This function continues looping, reading up to 50 characters from h_infile and writing the same characters to h_outfile, until the end of the input file is reached, at which point text is empty and the break statement is executed.
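
The same chunked copy can also be written with Python's with statement, which closes both handles automatically, even if an error interrupts the loop. This is a variant for comparison, not the version the chapter uses; the function name copy_file_with, the chunk_size parameter, and the scratch file names are assumptions made for this sketch.

```python
import os
import tempfile


def copy_file_with(oldfile, newfile, chunk_size=50):
    """Copy oldfile to newfile, reading chunk_size characters at a time."""
    # Both handles are closed automatically when the with block ends.
    with open(oldfile, "r") as h_infile, open(newfile, "w") as h_outfile:
        while True:
            text = h_infile.read(chunk_size)
            if text == "":          # empty string means end of file
                break
            h_outfile.write(text)


# Try it out in a throwaway directory (hypothetical file names).
scratch_dir = tempfile.mkdtemp()
source = os.path.join(scratch_dir, "original.txt")
target = os.path.join(scratch_dir, "copy.txt")

with open(source, "w") as handle:
    handle.write("Now is the time" * 10)    # 150 characters, three chunks

copy_file_with(source, target)

with open(target, "r") as handle:
    copied = handle.read()
print(copied == "Now is the time" * 10)
```

The loop is unchanged from copy_file; only the opening and closing of the handles differs.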

A handle is somewhat like a TV remote control

We're all familiar with a remote control for a TV. You perform operations on the remote control (switch channels, change the volume, etc.), but the real action happens on the TV. So, by simple analogy, we'd call the remote control your handle to the underlying TV.

Sometimes we want to emphasize the difference: the file handle is not the same as the file, and the remote control is not the same as the TV it controls. But at other times we prefer to treat them as a single mental chunk, or abstraction, and we'll just say "close the file", or "flip the TV channel".

13.2. Text files

A text file is a file that contains printable characters and whitespace, organized into lines separated by newline characters. One of the Python design goals was to provide methods that made text file processing easy.

Notice the subtle difference in abstraction here: in the previous section, we simply regarded a file as containing many characters, and could read them one at a time, many at a time, or all at once. In this section, particularly for reading data, we're interested in files that are organized into lines, and we will process them line-at-a-time.

To demonstrate, we'll create a text file with three lines of text separated by newlines:

    >>> h_outfile = open("test.dat", "w")
    >>> h_outfile.write("line one\nline two\nline three\n")
    >>> h_outfile.close()

The readline method reads all the characters up to and including the next newline character:

    >>> h_infile = open("test.dat", "r")
    >>> print(h_infile.readline())
    line one

    >>>

readlines returns all of the remaining lines as a list of strings:

    >>> print(h_infile.readlines())
    ['line two\n', 'line three\n']

In this case, the output is in list format, which means that the strings appear with quotation marks and the newline character appears at the end of each.
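
Because every string returned by readline or readlines keeps its trailing newline, programs often strip it before using the text. A minimal sketch, assuming a scratch copy of the three-line file; the variable names first, rest and cleaned are introduced here for illustration, and rstrip is the standard string method.

```python
import os
import tempfile

# Assumption of this sketch: write the three-line file into a throwaway
# directory rather than the current directory.
scratch_dir = tempfile.mkdtemp()
path = os.path.join(scratch_dir, "test.dat")

h_outfile = open(path, "w")
h_outfile.write("line one\nline two\nline three\n")
h_outfile.close()

h_infile = open(path, "r")
first = h_infile.readline()     # 'line one\n'
rest = h_infile.readlines()     # ['line two\n', 'line three\n']
h_infile.close()

# rstrip("\n") removes the newline at the end of each string.
cleaned = [first.rstrip("\n")] + [line.rstrip("\n") for line in rest]
print(cleaned)
```

The printed list holds the three lines without their newline characters.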

At the end of the file, readline returns the empty string and readlines returns the empty list:

    >>> print(h_infile.readline())

    >>> print(h_infile.readlines())
    []

The following is an example of a line-processing program. filter makes a copy of oldfile, omitting any lines that begin with #:

     1  def filter(oldfile, newfile):
     2      infile = open(oldfile, 'r')
     3      outfile = open(newfile, 'w')
     4      while True:
     5          text = infile.readline()
     6          if text == "":
     7              break
     8          if text[0] == '#':
     9              continue
    10          outfile.write(text)
    11      infile.close()
    12      outfile.close()

The continue statement ends the current iteration of the loop, but continues looping. The flow of execution moves to the top of the loop, checks the condition, and proceeds accordingly.

Thus, if text is the empty string, the loop exits. If the first character of text is a hash mark, the flow of execution goes to the top of the loop. Only if both conditions fail do we copy text into the new file.

Let's consider one more case: suppose your original file contained empty lines. At line 6 above, would this program not find the first empty line in the file, and terminate immediately? No! Recall that readline always includes the newline character in the string it returns, so even an empty line in your file would arrive in the text variable on line 5 containing its newline character. It is only when we try to read beyond the end of the file that we get back the empty string.

13.3. Directories

Files on non-volatile storage media are organized by a set of rules known as a file system. File systems are made up of files and directories, which are containers for both files and other directories.

When you create a new file by opening it and writing, the new file goes in the current directory (wherever you were when you ran the program). Similarly, when you open a file for reading, Python looks for it in the current directory.
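
To see the current directory at work, the standard os module offers getcwd (report the current directory) and chdir (change it). The sketch below is an illustration only: it temporarily switches to a scratch directory, creates a file by bare name, and confirms that the file landed in that directory. The names scratch_dir, original_dir and created_here are assumptions of this example.

```python
import os
import tempfile

scratch_dir = tempfile.mkdtemp()    # throwaway directory for the experiment
original_dir = os.getcwd()          # remember where the program started

os.chdir(scratch_dir)               # make the scratch directory current
try:
    # A bare file name is looked up in the current directory.
    h_outfile = open("test.dat", "w")
    h_outfile.write("hello\n")
    h_outfile.close()
    created_here = os.path.exists(os.path.join(scratch_dir, "test.dat"))
finally:
    os.chdir(original_dir)          # always restore the starting directory

print(created_here)
```

The file opened as plain test.dat ends up inside the scratch directory, because that was the current directory when open was called.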

If you want to open a file somewhere else, you have to specify the path to the file, which is the name of the directory (or folder) where the file is located:

    >>> wordsfile = open('/usr/share/dict/words', 'r')
    >>> wordlist = wordsfile.readlines()
    >>> print(wordlist[:6])
    ['\n', 'A\n', "A's\n", 'AOL\n', "AOL's\n", 'Aachen\n']

This (Unix) example opens a file named words that resides in a directory named dict, which resides in share, which resides in usr, which resides in the top-level directory of the system, called /. It then reads each line into a list using readlines, and prints out the first 6 elements from that list.

A Windows path might be "c:/temp/words.txt" or "c:\\temp\\words.txt". Because backslashes are used to escape things like newlines and tabs, you need to write two backslashes in a literal string to get one! So the length of these two strings is the same!

You cannot use / or \ as part of a filename; they are reserved as a delimiter between directory and filenames.

The file /usr/share/dict/words should exist on Unix-based systems, and contains a list of words in alphabetical order.
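
The claim about the two Windows path strings can be verified directly. The sketch below also shows a raw string literal (the r prefix), a standard Python notation that the text above does not mention; it is included here as an extra, since it lets you write each backslash once. No file is opened, so the example runs on any system.

```python
forward = "c:/temp/words.txt"       # forward slashes need no escaping
escaped = "c:\\temp\\words.txt"     # each \\ in the source is one backslash
raw = r"c:\temp\words.txt"          # raw string: backslashes taken literally

# Both spellings from the text contain the same number of characters.
print(len(forward), len(escaped))

# The escaped and raw spellings are the very same string.
print(escaped == raw)
```

Each string is 17 characters long, and the escaped and raw literals compare equal.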