File and I/O (USP Chapters 4 and 5)

Instructor: Dr. Tongping Liu

Outline

❚ Basics of File Systems
❚ Directory and Unix File System: inode
❚ UNIX I/O System Calls: open, close, read, write, ioctl
❚ File Representations: FDT, SFT, inode table
❚ fork and inheritance, Filters and redirection
❚ File pointers and buffering
❚ Directory operations
❚ Links of Files: Hard vs. Symbolic

Storing Information

❚ Applications may store information in their address space, but this is a bad idea.
Ø Size is limited to the size of the virtual address space
ü May not be sufficient, e.g. airline reservations, banking
Ø The data is lost when the application terminates
ü Even when the computer doesn’t crash!
Ø Multiple processes might want to access the same data
ü Imagine a telephone directory of one company

File Systems

❚ For long-term information storage:
Ø Should be able to store a very large amount of information
Ø Information must survive the processes using it
Ø Should provide concurrent access to multiple processes
❚ Solution:
Ø Store information on disks in units called “files”
Ø Files are persistent until users delete them explicitly
Ø Files are managed by the OS
❚ File Systems: How the OS manages files!

Refer to the slides of Profs. Sirer and George at Cornell

File Naming

❚ Motivation:
Ø People cannot remember block, sector, …
Ø They should have human-readable names
❚ Basic idea:
Ø A process creates a file and gives it a name
ü Other processes can access the file by the name
Ø Naming conventions are OS dependent
ü Names are usually limited to 255 characters
ü Digits and special characters are sometimes allowed
ü MS-DOS and Windows are not case sensitive, but UNIX-like OSs are

File Attributes

❚ File-specific info maintained by the OS
Ø Modification date, creation time, etc.
Ø Varies a lot across different OSs
❚ Some examples:
Ø Name – human-readable form
Ø Identifier – a unique tag (number) that identifies a file within the file system
Ø Type – required for systems supporting different types (e.g. ro, exe)
Ø Location – pointer to the file location on the device
Ø Size
Ø Protection – controls who can read, write, execute the file
Ø Time, date, and user identification – data for protection, security, and usage monitoring

File Protection

❚ File owner/creator should be able to control:
Ø what can be done
Ø by whom
❚ Types of access
Ø Read
Ø Write
Ø Execute
Ø Append
Ø Delete
Ø List

Access Example

❚ Mode of access: read, write, execute
❚ Three classes of users (bits in the order R W X):
Ø Owner: 7 = 1 1 1
Ø Group: 6 = 1 1 0
Ø Public: 1 = 0 0 1
❚ Example: mode 761 on the file game (digits in the order owner, group, public)

Outline

❚ Basics of File Systems
❚ Directory and Unix File System: inode (SGG 12)
❚ UNIX I/O System Calls: open, close, read, write, ioctl
❚ File Representations: FDT, SFT, inode table
❚ fork and inheritance, Filters and redirection
❚ File pointers and buffering
❚ Directory operations
❚ Links of Files: Hard vs. Symbolic

Directory: An Example

An example structure of the file system.

Structure of a Unix File System

A directory entry contains only a name and an index into a table with the information of a file (inode)

Definition of inode

❚ The inode is a data structure in a Unix-style file system that describes a filesystem object, such as a file or directory.
❚ Each inode stores the attributes and disk block location(s) of the object's data.
❚ Inode attributes may include metadata (times of last change, access, modification), as well as owner and permission data.

More about “inode”

❚ Each file system maintains an array of inodes, the inode table, which lists all of its files. Each individual inode has a unique number (unique to that filesystem): the inode number.
❚ An inode stores:
Ø File type: regular file, directory, pipe, etc.
Ø Permissions to that file: read, write, execute
Ø Link count: the number of hard links to the inode
Ø User ID: owner of the file
Ø Group ID: group owner
Ø Size of the file, or major/minor number in the case of some special files
Ø Time stamps: access time, modification time and (inode) change time
Ø Attributes: 'immutable', for example
Ø Access control list: permissions for special users/groups
Ø Link to the location of the file
Ø Other metadata about the file

Traditional inode Structure

Pointers to location of file

File Size and Block Size

❚ Block size is 8K and pointers are 4 bytes
Ø How large a file can be represented using only one single indirect pointer?
Ø What about one double indirect pointer?
Ø What about one triple indirect pointer?

File Size and Block Size (cont.)

❚ Block size is 8K and pointers are 4 bytes
❚ One single indirect pointer
Ø A block contains 8K / 4 = 2K pointers, which identify 2K blocks of the file
Ø File size = 2K * 8K = 16 MB
❚ One double indirect pointer
Ø 2K pointers → 2K blocks, where each block contains 2K pointers to identify blocks for the file
Ø Number of blocks of the file = 2K * 2K = 4M
Ø File size = 4M * 8K = 32 GB
❚ One triple indirect pointer
Ø Number of blocks of the file = 2K * 2K * 2K = 8G (= 2^33)
Ø File size = 8G * 8K = 64 TB (= 2^46 bytes)
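The arithmetic above can be checked with a few lines of C. This is only a sketch; the function name indirect_capacity is an assumption made for this example, and it counts only the data reachable through one indirect pointer of the given level.

```c
#include <stdint.h>

/* Bytes addressable through one indirect pointer of the given level
 * (1 = single, 2 = double, 3 = triple). */
uint64_t indirect_capacity(uint64_t block_size, uint64_t pointer_size, int level)
{
    uint64_t pointers_per_block = block_size / pointer_size;
    uint64_t blocks = 1;

    /* Each level of indirection multiplies the reachable block count. */
    for (int i = 0; i < level; i++)
        blocks *= pointers_per_block;
    return blocks * block_size;
}
```

With block_size = 8192 and pointer_size = 4, levels 1, 2 and 3 give 2^24, 2^35 and 2^46 bytes, matching 16 MB, 32 GB and 64 TB.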

Outline

❚ Basics of File Systems
❚ Directory and Unix File System: inode
❚ UNIX I/O System Calls: open, close, read, write, ioctl
❚ File Representations: FDT, SFT, inode table
❚ fork and inheritance, Filters and redirection
❚ File pointers and buffering
❚ Directory operations
❚ Links of Files: Hard vs. Symbolic

Unix I/O Related System Calls

❚ Device independence: uniform device interface
❚ I/O through device drivers with standard interface
❚ Use file descriptors (FDs)
Ø FD is an abstract indicator (handle) used to access a file or other input/output resource
❚ 3 file descriptors open when a program starts
Ø STDIN_FILENO (0), STDOUT_FILENO (1), STDERR_FILENO (2)
❚ 6 main system calls for I/O
Ø open, close, read, write, ioctl, lseek
Ø Return -1 on error and set errno

File position

❚ An open file has a file position that keeps track of where the next character is to be read or written.
Ø On GNU systems, and all POSIX.1 systems, the file position is simply an integer representing the number of bytes from the beginning of the file.
Ø The file position is normally set to the beginning of the file when it is opened, and each time a character is read or written, the file position is incremented. In other words, access to a file is normally sequential.
Ø File position can be changed by lseek

open

#include <fcntl.h>
#include <sys/stat.h>
int open(const char *path, int oflag);
int open(const char *path, int oflag, mode_t mode);

❚ oflag: O_RDONLY, O_WRONLY, O_RDWR, O_APPEND, O_CREAT, O_EXCL, O_NOCTTY, O_NONBLOCK, O_TRUNC
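To see the file position at work, the sketch below opens a file, moves the position with lseek, and reads the byte found there. The helper name byte_at is invented for this example; only open, lseek, read and close are real calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: return the byte stored at the given offset of a
 * file (0..255), or -1 on any error or if the offset is past the end. */
int byte_at(const char *path, off_t offset)
{
    unsigned char c;
    int result = -1;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    /* Move the file position; the next read starts at 'offset'. */
    if (lseek(fd, offset, SEEK_SET) != (off_t)-1 && read(fd, &c, 1) == 1)
        result = c;
    close(fd);
    return result;
}
```

Without the lseek call, the read would return the first byte, since the position starts at the beginning of the file.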

open (cont.)

❚ O_CREAT flag: must use the 3-parameter form of open and specify permissions (mode)

POSIX Symbolic Names for Permissions, defined in sys/stat.h

Historical layout of the permissions mask.
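A minimal sketch of the three-parameter form of open with O_CREAT follows. The function name create_private_file is hypothetical; S_IRUSR and S_IWUSR are two of the POSIX symbolic permission names from sys/stat.h.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a new file that only its owner may read
 * and write. O_EXCL makes the call fail if the file already exists.
 * Returns the new descriptor, or -1 with errno set. */
int create_private_file(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
}
```

The permissions actually stored are the requested mode with the bits of the process umask cleared, so the file may end up with fewer permissions than requested but never more.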

Program 4.9 (p106): copyfilemain.c, copy a file.

close and its usage

#include <unistd.h>
int close(int fildes);

❚ Open files are closed when the program exits normally.

System Call: read

#include <unistd.h>
ssize_t read(int fildes, void *buf, size_t nbyte);

❚ Need to allocate a buffer *buf to hold the bytes read;
❚ size_t is an unsigned integer type (unsigned long on typical 64-bit Unix systems);
❚ ssize_t is the corresponding signed type;
❚ Can return fewer bytes than requested;
❚ Return value of -1 with errno set to EINTR is not usually an error.

An example to read in a line

Read one char at a time!
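The "read one char at a time" idea can be sketched as follows. This is not the textbook's code; read_line and its return convention are assumptions chosen for this example. It retries when read is interrupted (EINTR) and stops after a newline, at end of file, or when the buffer is full.

```c
#include <errno.h>
#include <unistd.h>

/* Sketch: read one line from fd into buf (at most nbytes - 1 characters),
 * keeping the newline if one was read. Returns the number of characters
 * stored, 0 at end of file, or -1 on error. */
ssize_t read_line(int fd, char *buf, size_t nbytes)
{
    size_t count = 0;

    if (nbytes == 0) {
        errno = EINVAL;
        return -1;
    }
    while (count < nbytes - 1) {
        ssize_t n = read(fd, buf + count, 1);   /* one char at a time */
        if (n == -1 && errno == EINTR)
            continue;                           /* interrupted: retry */
        if (n == -1)
            return -1;                          /* real error */
        if (n == 0)
            break;                              /* end of file */
        if (buf[count++] == '\n')
            break;                              /* end of line */
    }
    buf[count] = '\0';
    return (ssize_t)count;
}
```

One system call per character is simple but slow, which is part of the motivation for the buffered FILE interface.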

System Call: write

#include <unistd.h>
ssize_t write(int fildes, const void *buf, size_t nbyte);

❚ Not an error if the return value is > 0 but less than nbyte
Ø Must restart write for the remaining bytes if it wrote fewer than requested
❚ Return value of -1 with errno set to EINTR is not usually an error

Example for write

Function copyfile reads from one file and writes out to another
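The restart rule for write can be sketched like this. The names write_all and copy_fd are invented for this example and only approximate what a copyfile function might do; they are not the textbook's implementation.

```c
#include <errno.h>
#include <unistd.h>

/* Write all nbyte bytes, restarting after partial writes and EINTR.
 * Returns nbyte on success, -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t nbyte)
{
    const char *p = buf;
    size_t left = nbyte;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n == -1) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;
        }
        p += n;                     /* skip what was already written */
        left -= (size_t)n;
    }
    return (ssize_t)nbyte;
}

/* Copy everything readable from fromfd to tofd.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int fromfd, int tofd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fromfd, buf, sizeof buf);
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1)
            return -1;
        if (n == 0)
            return total;           /* end of input */
        if (write_all(tofd, buf, (size_t)n) == -1)
            return -1;
        total += n;
    }
}
```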

Outline

❚ Basics of File Systems
❚ Directory and Unix File System: inodes
❚ UNIX I/O System Calls: open, close, read, write, ioctl
❚ File Representations: FDT, SFT, inode table
❚ fork and inheritance, Filters and redirection
❚ File pointers and buffering
❚ Directory operations
❚ Links of Files: Hard vs. Symbolic

File Representation

❚ File Descriptor Table:
Ø An array of pointers indexed by the file descriptors
Ø The pointers point to entries in the System File Table
❚ System File Table:
Ø Contains an entry for each open of a file
Ø Entries contain pointers to a table of inodes kept in memory
Ø Other information in an entry: current file offset; file mode; count of file descriptors using this entry
Ø When a file is closed, the count is decremented. The entry is freed when the count becomes 0.
❚ In-Memory Inode Table: copies of in-use inodes

File Representation (cont.)

Figure 4.2 (page 120): Relationship between the file descriptor table (per process), the system file table (all processes) and the in-memory inode table (all processes).

System file table: dynamic information

What Happened for File Read/Write

1. The process passes the file descriptor to the kernel through a system call (read/write)
2. The kernel accesses the entry in the file table on behalf of the process and gets the pointer to the inode
3. For read/write, after obtaining the file pointer (offset) from the file table, the kernel can compute the block/sector information.
4. In the end, it issues the I/O request. The I/O scheduler may combine multiple requests to nearby locations.

inode: Store File Information

Traditional UNIX inode structure.

Questions about file-related tables

1. If a file is opened twice in the same process, how many entries will be created in the file descriptor table? How many will be created in the open file table? How many in the inode table?
Ø 2 entries in the file descriptor table and the open file table, but only one inode
2. What if one file is opened separately by two processes?
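The answer to the first question can be observed from user space. In this sketch (the function name is made up for illustration), a file is opened twice: both descriptors start reading at offset 0 because each open gets its own system file table entry, while fstat reports the same inode for both.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical check: open path twice and read one byte through each
 * descriptor. Returns 1 if both reads saw the same first byte and both
 * descriptors refer to the same inode, 0 if not, -1 on error. */
int opened_twice_check(const char *path)
{
    char a, b;
    struct stat sa, sb;
    int result = -1;
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);

    if (fd1 != -1 && fd2 != -1 &&
        read(fd1, &a, 1) == 1 && read(fd2, &b, 1) == 1 &&
        fstat(fd1, &sa) == 0 && fstat(fd2, &sb) == 0) {
        /* Independent offsets: both reads start at byte 0.
         * Shared inode: same inode number on the same device. */
        result = (a == b) && (sa.st_ino == sb.st_ino) &&
                 (sa.st_dev == sb.st_dev);
    }
    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    return result;
}
```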

Inheritance of File Descriptors

❚ When fork creates a child
Ø child inherits a copy of the parent's address space, including the file descriptor table

File Representation (cont.)

File is opened BEFORE fork

Both parent and child share the same system file table entry.

Parent’s FDT: 0 stdin, 1 stdout, 2 stderr, 3 File_A
Child’s FDT: 0 stdin, 1 stdout, 2 stderr, 3 File_A
System File Table: stdin, stdout, stderr, File_A

Question: how many stdin, stdout, stderr in system file table?

File is opened AFTER fork

parent and child use different system file table entries


Example of File Read

❚ Suppose the file my.dat contains "abcdefghijklmnop" and no errors occur; what will be the possible outputs?

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        char c = '!';
        char d = '!';
        int myfd;
        if (fork() == -1) {
            return 1;
        }
        /* Opened after fork: each process gets its own file offset. */
        if ((myfd = open("my.dat", O_RDONLY)) == -1) {
            return 1;
        }
        read(myfd, &c, 1);
        read(myfd, &d, 1);
        printf("Process %ld got %c\n", (long)getpid(), c);
        printf("Process %ld got %c\n", (long)getpid(), d);
        return 0;
    }

Possible outputs (each process reads a and then b; only the order of the lines varies):

    Process 17514 got a    Process 17515 got a    Process 17515 got a
    Process 17514 got b    Process 17515 got b    Process 17514 got a
    Process 17515 got a    Process 17514 got a    Process 17515 got b
    Process 17515 got b    Process 17514 got b    Process 17514 got b

Read Example (open/fork are switched)

❚ Suppose the file my.dat contains "abcdefghijklmnop" and no errors occur; what will be the possible outputs?

    int main(void) {
        char c = '!';
        char d = '!';
        int myfd;
        /* Opened before fork: parent and child share one file offset. */
        if ((myfd = open("my.dat", O_RDONLY)) == -1) {
            return 1;
        }
        if (fork() == -1) {
            return 1;
        }
        read(myfd, &c, 1);
        read(myfd, &d, 1);
        printf("Process %ld got %c\n", (long)getpid(), c);
        printf("Process %ld got %c\n", (long)getpid(), d);
        return 0;
    }

One possible output:

    Process 17514 got a
    Process 17514 got b
    Process 17515 got c
    Process 17515 got d

Many possibilities on the order

Exercise

❚ Suppose myinfile is a regular file containing "wxyz". Consider the following code segment and assume that it is executed without error.

    fd = open("myinfile", O_RDONLY);
    fork();
    read(fd, buf, 1);
    read(fd, buf+1, 1);
    printf("%c%c", buf[0], buf[1]);

❚ Explain why this output is not possible: xwzy

Redirection

❚ A program can modify the file descriptor table entry so that it points to a different entry in the system file table. This action is known as redirection.

A Redirection Example

❚ Example 4.35 (p129)
Ø cat > my.file

More file redirection examples

    Symbol   Redirection
    >        output redirect
    >>       append output
    |        pipe output to another command
    <        input redirection

Output redirection takes the output of a command and places it into a named file. Input redirection reads the file as input to the command.

Redirection: dup2

❚ Copy one file descriptor table entry into another
❚ Can be done with the dup2 system call

#include <unistd.h>
int dup2(int fildes, int fildes2);

Ø First closes fildes2 silently if fildes2 is open
Ø Then copies the pointer in entry fildes into entry fildes2

File positions for dup() vs. open()

❚ If you open a descriptor and then duplicate it to get another descriptor, these two descriptors share the same file position: changing the file position of one descriptor will affect the other.
❚ If you open a file twice, even in the same program, you get two streams or descriptors with independent file positions.
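The shared file position of duplicated descriptors can be demonstrated with a short sketch. The function read_via_dup is an illustrative name, not a library call; it reads one byte through the original descriptor and the next byte through its duplicate.

```c
#include <fcntl.h>
#include <unistd.h>

/* Sketch: read one byte via fd and one byte via dup(fd). Both
 * descriptors share one system file table entry, so the second read
 * continues where the first stopped. Returns 0 on success, -1 on error. */
int read_via_dup(const char *path, char out[2])
{
    int rc = -1;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    int copy = dup(fd);
    if (copy != -1) {
        if (read(fd, &out[0], 1) == 1 && read(copy, &out[1], 1) == 1)
            rc = 0;
        close(copy);
    }
    close(fd);
    return rc;
}
```

For a file that starts with "ab", out receives 'a' and 'b'; had the file been opened twice instead, both reads would have returned 'a'.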

Example Implementation: cat > my.file

FDT for Redirection Example

Figure 4.7 (page 131): The status of the file descriptor table during the execution of Program 4.18.
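The listing for Program 4.18 is not reproduced in these notes. As a rough sketch of the same idea (the function name redirect_to_file is an assumption for this example), redirection with dup2 takes three steps: open the file, copy its descriptor table entry over the target descriptor, and close the now redundant original descriptor.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: make target_fd refer to the file at path, creating or
 * truncating it. Returns 0 on success, -1 on error. */
int redirect_to_file(int target_fd, const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);

    if (fd == -1)
        return -1;
    /* dup2 silently closes target_fd first if it is open. */
    if (dup2(fd, target_fd) == -1) {
        close(fd);
        return -1;
    }
    if (fd != target_fd)
        close(fd);          /* the original descriptor is no longer needed */
    return 0;
}
```

Calling redirect_to_file(STDOUT_FILENO, "my.file") before writing to standard output has roughly the effect of the shell's > my.file.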

Outline

❚ Basics of File Systems
❚ Directory and Unix File System: inodes
❚ UNIX I/O System Calls: open, close, read, write, ioctl
❚ File Representations: FDT, SFT, inode table
❚ fork and inheritance, Filters and redirection
❚ File pointers and buffering
❚ Directory operations
❚ Links of Files: Hard vs. Symbolic

File Pointers and Buffering

❚ Use fopen, fclose, fread, fwrite, fprintf, fscanf, etc.
❚ Example: open a file for output using file pointers

File Pointers and Buffering (cont.)

❚ I/O using file pointers → read/write from/to a buffer
❚ Buffer is filled or emptied when necessary
❚ Buffer size may vary, depending on the OS
❚ A write may fill part of the buffer without causing any physical I/O to the file
❚ If a write is done to standard output and the program crashes, the data written may not show up on screen
❚ Standard error is NOT buffered
❚ Interleaving output to standard output and standard error → output may appear in an unpredictable order
❚ Force the physical output to occur with fflush
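The effect of buffering and fflush can be observed by comparing the file size the kernel reports before and after the flush. This is a sketch under the assumption of a typical stdio implementation whose buffer is larger than a few bytes; both function names are invented for the example.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file as the kernel currently sees it, or -1 on error. */
long file_size_on_disk(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_size : -1;
}

/* Sketch: write five bytes through a fully buffered FILE pointer and
 * record the file size before and after fflush.
 * Returns 0 on success, -1 on error. */
int buffered_write_demo(const char *path, long *before, long *after)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    setvbuf(fp, NULL, _IOFBF, BUFSIZ);      /* request full buffering */
    fprintf(fp, "hello");                   /* stays in the user-space buffer */
    *before = file_size_on_disk(path);
    fflush(fp);                             /* force the physical write */
    *after = file_size_on_disk(path);
    return fclose(fp) == 0 ? 0 : -1;
}
```

On such an implementation, before is 0 and after is 5: the data only reaches the file when the buffer is flushed.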

fopen() vs. open()

1. fopen() returns a C standard FILE pointer, while open() returns a file descriptor.
2. fopen() is a library function, while open() is a system call.
3. fopen() provides buffered I/O. If you are mainly reading or writing a file sequentially, this gives a big speed improvement, since fread and fgets typically read a whole buffer (4K, 8K) at a time. This does not hold if you are not accessing the file sequentially.
4. A FILE * gives you the ability to use fscanf and other stdio functions.

Exercise: bufferout.c

printf("a");

Exercise: bufferinout.c

Fileiofork.c

❚ What is the output?

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("This is my output.");
        fork();
        return 0;
    }

Output: This is my output.This is my output.

Fork copies the original buffer, which still holds the unflushed text, to the new process, so both processes print it when they exit.

Fileioforkline.c

❚ What is the output?

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        printf("This is my output.\n");
        fork();
        return 0;
    }

Output: This is my output.

Standard output is usually line buffered when it is connected to a terminal. This means that the buffer is flushed when it contains a newline, so it is already empty when fork runs.

Current Working Directory

#include <unistd.h>
int chdir(const char *path);
char *getcwd(char *buf, size_t size);

Use PATH_MAX to determine the size of the buffer needed

Functions for Directory Access

#include <dirent.h>

DIR *opendir(const char *dirname);
Ø provides a handle for the other functions
struct dirent *readdir(DIR *dirp);
Ø gets the next entry in the directory
void rewinddir(DIR *dirp);
Ø restarts from the beginning
int closedir(DIR *dirp);
Ø closes the handle

❚ Functions are not re-entrant (like strtok)

Lists Files in a Directory
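The listing for this slide is not reproduced in these notes. A minimal sketch of listing a directory with opendir, readdir and closedir might look like the following; the function name list_directory and its return convention are assumptions made for this example.

```c
#include <dirent.h>
#include <stdio.h>

/* Sketch: print the name of every entry in the directory to 'out'.
 * Returns the number of entries printed, or -1 if the directory
 * cannot be opened. */
int list_directory(const char *path, FILE *out)
{
    struct dirent *entry;
    int count = 0;
    DIR *dirp = opendir(path);

    if (dirp == NULL)
        return -1;
    /* readdir returns NULL when there are no more entries. */
    while ((entry = readdir(dirp)) != NULL) {
        fprintf(out, "%s\n", entry->d_name);
        count++;
    }
    closedir(dirp);
    return count;
}
```

A call such as list_directory(".", stdout) prints the entries of the current directory, usually including "." and "..".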

Functions to Access File Status

#include <sys/stat.h>

int lstat(const char *restrict path, struct stat *restrict buf);
Ø Same as stat; for a symbolic link → information about the link, not the file it links to
int stat(const char *restrict path, struct stat *restrict buf);
Ø uses the name of a file
int fstat(int fildes, struct stat *buf);
Ø used for open files

Contents of the struct stat

❚ System dependent
❚ At least the following fields

Example: Last Access Time of A File

Access and Modified Times

File Mode

Use the following macros to test the st_mode field for the file type.

    S_ISBLK(m)    block special file
    S_ISCHR(m)    character special file
    S_ISDIR(m)    directory
    S_ISFIFO(m)   pipe or FIFO special file
    S_ISLNK(m)    symbolic link
    S_ISREG(m)    regular file
    S_ISSOCK(m)   socket

Check Whether a File is A Directory
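The listings for the access-time and directory-check slides are not reproduced here. The sketch below shows how stat, the st_atime field and the S_ISDIR macro could be combined; the function names are chosen for this example and may differ from the textbook's programs.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Return nonzero if path names a directory (symbolic links are
 * followed because stat is used), 0 otherwise or on error. */
int is_directory(const char *path)
{
    struct stat statbuf;

    if (stat(path, &statbuf) == -1)
        return 0;
    return S_ISDIR(statbuf.st_mode);
}

/* Print the last access time of path. Returns 0 on success, -1 on error. */
int print_access_time(const char *path)
{
    struct stat statbuf;

    if (stat(path, &statbuf) == -1) {
        perror("stat");
        return -1;
    }
    /* ctime ends its result with a newline, so none is added here. */
    printf("%s last accessed at %s", path, ctime(&statbuf.st_atime));
    return 0;
}
```

Replacing st_atime with st_mtime prints the last modification time instead.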

Outline

❚ Basics of File Systems
❚ Directory and Unix File System: inodes
❚ UNIX I/O System Calls: open, close, read, write, ioctl
❚ File Representations: FDT, SFT, inode table
❚ fork and inheritance, Filters and redirection
❚ File pointers and buffering
❚ Directory operations
❚ Links of Files: Hard vs. Symbolic

Links in Unix

❚ Link: association between a filename and an inode
❚ Two types of links in Unix: Hard vs. Symbolic/Soft
Ø Hard link: another directory entry (name) that refers to the same underlying inode
Ø Soft link: a link to another filename in the file system
❚ When a file is created
Ø A new inode is created
Ø The inode tracks the number of hard links to it

Creating links

❚ Create a hardlink
❚ Create a softlink

Links in Unix (cont.)

❚ New hard link to an existing file
Ø creates a new directory entry
Ø uses no other additional disk space
Ø increments the link count in the inode
❚ Remove a hard link
Ø the rm command or the unlink system call
Ø decrements the link count in the inode
❚ When the inode’s link count → 0
Ø The inode and associated disk space are freed
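The link count bookkeeping can be watched through the st_nlink field of struct stat. In this sketch both function names are invented for illustration; link and unlink are the real system calls behind ln and rm.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of hard links to path, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Sketch: create name1, add a second hard link name2, remove it again,
 * and record the link count after each step. Both names are removed
 * before returning. Returns 0 on success, -1 on error. */
int hard_link_demo(const char *name1, const char *name2, long counts[3])
{
    int fd = open(name1, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);

    if (fd == -1)
        return -1;
    close(fd);
    counts[0] = link_count(name1);      /* new file: one link */
    if (link(name1, name2) == -1) {
        unlink(name1);
        return -1;
    }
    counts[1] = link_count(name1);      /* second name: count incremented */
    unlink(name2);
    counts[2] = link_count(name1);      /* name2 removed: count decremented */
    unlink(name1);                      /* last link gone: inode is freed */
    return 0;
}
```

On a file system that supports hard links, counts ends up as 1, 2, 1.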

Example: Hard Link

Figure 5.4 (page 163): A directory entry, inode, and data block for a simple file.

Example: Hard Link (cont.)

Creating command: ln /dirA/name1 /dirB/name2

Figure 5.5 (page 165): Two hard links to the same file.

File Modification with Multiple Hard Links

❚ Exercise 5.17, page 166: What would happen to Figure 5.5 after the following operations:

    open("/dirA/name1");
    read
    close
    modify memory image of the file
    unlink("/dirA/name1");
    open("/dirA/name1");
    write
    close

File Modification (cont.)

Figure 5.6 (page 167): The situation after editing the file. A new file with a new inode is created.

After unlink, name1 is deleted, while name2 still points to the original file.

Symbolic Link

❚ Symbolic link: special type of file that contains the name of another file
❚ A reference to the name of a symbolic link
Ø The OS uses the name stored in the file, not the name of the link itself
❚ Create a symbolic link
Ø ln -s /dirA/name1 /dirB/name2
❚ Symbolic links do not affect the link count in the inode
❚ Symbolic links can span filesystems (while hard links cannot)

Example: A Symbolic Link

Figure 5.8 (page 170): An ordinary file with a symbolic link to it.

A new inode is created whose content is the name of the original file.
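A short sketch shows that a symbolic link is its own file whose content is a name. The function symlink_demo is a made-up name for this example; symlink, lstat, readlink and unlink are the real calls.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: create a symbolic link linkpath -> target, confirm with lstat
 * that the new file is itself a symbolic link, read back the stored
 * name into 'stored', then remove the link again.
 * Returns 0 on success, -1 on error. */
int symlink_demo(const char *target, const char *linkpath,
                 char *stored, size_t size)
{
    struct stat st;
    int ok = 0;

    /* The target does not have to exist for symlink to succeed. */
    if (size == 0 || symlink(target, linkpath) == -1)
        return -1;
    /* lstat describes the link itself; stat would follow it. */
    if (lstat(linkpath, &st) == 0 && S_ISLNK(st.st_mode)) {
        ssize_t n = readlink(linkpath, stored, size - 1);
        if (n != -1) {
            stored[n] = '\0';   /* readlink does not add a terminator */
            ok = 1;
        }
    }
    unlink(linkpath);           /* removes the link, never the target */
    return ok ? 0 : -1;
}
```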

Hard vs. Symbolic Links

❚ Hard links
Ø Hard links are restricted to a given filesystem
Ø A hard link references a physical file (unless the filesystem is corrupt)
Ø A count of hard links is kept in the inode
Ø Removing the last hard link frees the physical file
❚ Symbolic links
Ø Can make a symbolic link to a file that does not exist
Ø Even if it did exist once, a symbolic link might not reference anything
Ø Removing a symbolic link cannot free a physical file

Summary

❚ Basics of File Systems
❚ Directory and Unix File System: inodes
❚ UNIX I/O System Calls: open, close, read, write, ioctl
❚ File Representation
Ø File descriptor table, System File Table, In-memory inode table
❚ File pointers and buffering
❚ fork and file operations
❚ Filters and redirection
❚ Links of Files: Hard vs. Symbolic