13. Files — How to Think Like a Computer Scientist: Learning With

How to Think Like a Computer Scientist: Learning with Python 3

13. Files

13.1. Reading and writing files

While a program is running, its data is stored in random access memory (RAM). RAM is fast and inexpensive, but it is also volatile, which means that when the program ends, or the computer shuts down, data in RAM disappears. To make data available the next time you turn on your computer and start your program, you have to write it to a non-volatile storage medium, such as a hard drive, USB drive, or CD-RW.

Data on non-volatile storage media is stored in named locations on the media called files. By reading and writing files, programs can save information between program runs.

Working with files is a lot like working with a notebook. To use a notebook, you have to open it. When you're done, you have to close it. While the notebook is open, you can either write in it or read from it. In either case, you know where you are in the notebook. You can read the whole notebook in its natural order or you can skip around. All of this applies to files as well.

To open a file, you specify its name and indicate whether you want to read or write. Opening a file creates what we call a file handle. In this example, the variable myfile refers to the new handle object. Our program calls methods on the handle, and this makes changes to the actual file, which is usually located on our disk.

    myfile = open('test.dat', 'w')

The open function takes two arguments. The first is the name of the file, and the second is the mode. Mode 'w' means that we are opening the file for writing. If there is no file named test.dat on the disk, it will be created. If there already is one, it will be replaced by the file we are writing.

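The behaviour of mode 'w' can be checked with a short sketch like the one below. It is an illustration rather than part of the original example: it works in a temporary directory (an assumption, so no real test.dat is touched), uses os.path.getsize to inspect the file, and relies on the write and close methods that the text introduces next.

```python
import os
import tempfile

# Work in a throwaway directory (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'test.dat')

existed_before = os.path.exists(path)       # no file yet

myfile = open(path, 'w')                    # 'w' creates the missing file
myfile.close()
exists_after = os.path.exists(path)
size_when_created = os.path.getsize(path)   # a new file starts out empty

myfile = open(path, 'w')
myfile.write("old contents")                # 12 characters
myfile.close()
size_with_data = os.path.getsize(path)

myfile = open(path, 'w')                    # reopening with 'w' replaces the old file
myfile.close()
size_after_reopen = os.path.getsize(path)

print(existed_before, exists_after, size_with_data, size_after_reopen)
```

Opening an existing file with 'w' discards its old contents as soon as the file is opened, even if nothing is written afterwards.
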
To put data in the file we invoke the write method on the handle:

    myfile.write("Now is the time")
    myfile.write("to close the file")

Closing the file handle tells the system that we are done writing and makes the disk file available for reading by other programs (or by ourselves):

    myfile.close()

Now we can open the file again, this time for reading, and read the contents into a string. This time, the mode argument is 'r' for reading:

    >>> mynewhandle = open('test.dat', 'r')

If we try to open a file that doesn't exist, we get an error:

    >>> mynewhandle = open('test.cat', 'r')
    FileNotFoundError: [Errno 2] No such file or directory: 'test.cat'

Not surprisingly, the read method reads data from the file. With no arguments, it reads the entire contents of the file into a single string:

    >>> text = mynewhandle.read()
    >>> print(text)
    Now is the timeto close the file

There is no space between time and to because we did not write a space between the strings.

read can also take an argument that indicates how many characters to read:

    >>> myfile = open('test.dat', 'r')
    >>> print(myfile.read(5))
    Now i

If not enough characters are left in the file, read returns the remaining characters. When we get to the end of the file, read returns the empty string:

    >>> print(myfile.read(1000006))
    s the timeto close the file
    >>> print(myfile.read())

    >>>

The following function copies a file, reading and writing up to fifty characters at a time. The first argument is the name of the original file; the second is the name of the new file:

    def copy_file(oldfile, newfile):
        h_infile = open(oldfile, 'r')
        h_outfile = open(newfile, 'w')
        while True:
            text = h_infile.read(50)
            if text == "":
                break
            h_outfile.write(text)
        h_infile.close()
        h_outfile.close()

This function continues looping, reading up to 50 characters from h_infile and writing the same characters to h_outfile, until the end of the input file is reached, at which point text is empty and the break statement is executed.

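The same copy loop can also be sketched with Python's with statement, which closes both handles automatically, even if an error interrupts the loop. This is a variation on copy_file, not the chapter's own version: the name copy_file_with, the chunk_size parameter, and the temporary directory used for the demonstration are all assumptions made for illustration.

```python
import os
import tempfile


def copy_file_with(oldfile, newfile, chunk_size=50):
    """Copy oldfile to newfile, reading up to chunk_size characters at a time."""
    with open(oldfile, 'r') as h_infile, open(newfile, 'w') as h_outfile:
        while True:
            text = h_infile.read(chunk_size)
            if text == "":              # read returns "" at the end of the file
                break
            h_outfile.write(text)
    # Both handles are closed here, when the with block ends.


# Demonstration in a temporary directory (hypothetical file names).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'test.dat')
dst = os.path.join(workdir, 'copy.dat')

with open(src, 'w') as h:
    h.write("Now is the time")
    h.write("to close the file")

copy_file_with(src, dst, chunk_size=5)      # a small chunk forces several loop passes

with open(dst, 'r') as h:
    copied = h.read()
print(copied)
```

A small chunk_size makes it easy to confirm that the final, shorter chunk is still written before the loop ends.
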
A handle is somewhat like a TV remote control

We're all familiar with a remote control for a TV. You perform operations on the remote control (switch channels, change the volume, etc.), but the real action happens on the TV. So, by simple analogy, we'd call the remote control your handle to the underlying TV.

Sometimes we want to emphasize the difference: the file handle is not the same as the file, and the remote control is not the same as the TV it controls. But at other times we prefer to treat them as a single mental chunk, or abstraction, and we'll just say "close the file", or "flip the TV channel".

13.2. Text files

A text file is a file that contains printable characters and whitespace, organized into lines separated by newline characters. One of the Python design goals was to provide methods that made text file processing easy.

Notice the subtle difference in abstraction here: in the previous section, we simply regarded a file as containing many characters, and could read them one at a time, many at a time, or all at once. In this section, particularly for reading data, we're interested in files that are organized into lines, and we will process them a line at a time.

To demonstrate, we'll create a text file with three lines of text separated by newlines:

    >>> h_outfile = open("test.dat", "w")
    >>> h_outfile.write("line one\nline two\nline three\n")
    >>> h_outfile.close()

The readline method reads all the characters up to and including the next newline character:

    >>> h_infile = open("test.dat", "r")
    >>> print(h_infile.readline())
    line one

    >>>

readlines returns all of the remaining lines as a list of strings:

    >>> print(h_infile.readlines())
    ['line two\n', 'line three\n']

In this case, the output is in list format, which means that the strings appear with quotation marks and the newline character appears at the end of each.

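As a sketch of the line-at-a-time style, the example below repeats the readline and readlines calls and then loops over the handle directly, which yields one line per iteration with its newline still attached. The for loop over a handle is standard Python but is not shown in the text above; the temporary directory and the variable names first, rest and stripped are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical location for the three-line test file.
path = os.path.join(tempfile.mkdtemp(), 'test.dat')

h_outfile = open(path, 'w')
h_outfile.write("line one\nline two\nline three\n")
h_outfile.close()

h_infile = open(path, 'r')
first = h_infile.readline()     # up to and including the first newline
rest = h_infile.readlines()     # every remaining line, as a list of strings
h_infile.close()

# Looping over the handle reads one line per iteration.
stripped = []
h_infile = open(path, 'r')
for line in h_infile:
    stripped.append(line.rstrip('\n'))   # drop the trailing newline
h_infile.close()

print(repr(first))
print(rest)
print(stripped)
```

Printing repr(first) rather than first makes the trailing newline visible, which explains the blank line that appears when a line read by readline is passed straight to print.
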
At the end of the file, readline returns the empty string and readlines returns the empty list:

    >>> print(h_infile.readline())

    >>> print(h_infile.readlines())
    []

The following is an example of a line-processing program. filter makes a copy of oldfile, omitting any lines that begin with #:

     1  def filter(oldfile, newfile):
     2      infile = open(oldfile, 'r')
     3      outfile = open(newfile, 'w')
     4      while True:
     5          text = infile.readline()
     6          if text == "":
     7              break
     8          if text[0] == '#':
     9              continue
    10          outfile.write(text)
    11      infile.close()
    12      outfile.close()

The continue statement ends the current iteration of the loop, but continues looping. The flow of execution moves to the top of the loop, checks the condition, and proceeds accordingly.

Thus, if text is the empty string, the loop exits. If the first character of text is a hash mark, the flow of execution goes to the top of the loop. Only if both conditions fail do we copy text into the new file.

Let's consider one more case: suppose your original file contained empty lines. At line 6 above, would this program not find the first empty line in the file, and terminate immediately? No! Recall that readline always includes the newline character in the string it returns, so even an empty line in your file would arrive in the text variable on line 5 containing its newline character. It is only when we try to read beyond the end of the file that we get back the empty string.

13.3. Directories

Files on non-volatile storage media are organized by a set of rules known as a file system. File systems are made up of files and directories, which are containers for both files and other directories.

When you create a new file by opening it and writing, the new file goes in the current directory (wherever you were when you ran the program). Similarly, when you open a file for reading, Python looks for it in the current directory.

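To see which directory counts as "current", a sketch along these lines may help. It uses os.getcwd and os.chdir from the standard library, which the text above does not mention, and it switches into a temporary directory (an assumption) so that the bare name test.dat is created somewhere harmless.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()        # throwaway directory for the sketch
original_dir = os.getcwd()          # remember where the program started

os.chdir(workdir)                   # make workdir the current directory
try:
    h = open('test.dat', 'w')       # a bare name is resolved in the current directory
    h.write("hello\n")
    h.close()
    created_here = os.path.exists(os.path.join(os.getcwd(), 'test.dat'))
    listing = os.listdir('.')
finally:
    os.chdir(original_dir)          # always restore the starting directory

print(created_here, listing)
```

The try/finally block is there only so that the program's working directory is restored even if opening or writing the file fails.
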
If you want to open a file somewhere else, you have to specify the path to the file, which is the name of the directory (or folder) where the file is located:

    >>> wordsfile = open('/usr/share/dict/words', 'r')
    >>> wordlist = wordsfile.readlines()
    >>> print(wordlist[:6])
    ['\n', 'A\n', "A's\n", 'AOL\n', "AOL's\n", 'Aachen\n']

This (Unix) example opens a file named words that resides in a directory named dict, which resides in share, which resides in usr, which resides in the top-level directory of the system, called /. It then reads each line into a list using readlines, and prints out the first 6 elements of that list.

A Windows path might be "c:/temp/words.txt" or "c:\\temp\\words.txt". Because backslashes are used to escape things like newlines and tabs, you need to write two backslashes in a literal string to get one! So the length of these two strings is the same!

You cannot use / or \ as part of a filename; they are reserved as a delimiter between directory and filenames.

The file /usr/share/dict/words should exist on Unix-based systems, and contains a list of words in alphabetical order.
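The claim about string lengths is easy to check. The sketch below compares the two spellings of the Windows path and, as an addition not covered in the text, shows a raw string literal (prefix r) and os.path.split from the standard library; the variable names are assumptions for illustration, and no file is actually opened.

```python
import os

forward = "c:/temp/words.txt"
escaped = "c:\\temp\\words.txt"     # each \\ in the literal is one backslash in the string
raw = r"c:\temp\words.txt"          # raw string: backslashes are taken literally

same_length = len(forward) == len(escaped)
same_string = escaped == raw

# Build the Unix path from its directory names, then split off the filename.
parts = ['usr', 'share', 'dict', 'words']
unix_path = '/' + '/'.join(parts)
directory, filename = os.path.split(unix_path)

print(len(forward), len(escaped), same_length, same_string)
print(directory, filename)
```

Both Windows spellings are 17 characters long, because the doubled backslash exists only in the source code, not in the resulting string.
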