Files

• A Unix is a sequence of m bytes: CS 3214 – B0 ,B1, .... , Bk , .... , Bm-1 Computer Systems • All I/O devices are represented as files: – /dev/sda2 (/usr disk partition) – /dev/tty2 (terminal) Supplementary Material on Unix • Even the kernel is represented as a file: System Calls and Standard I/O – /dev/kmem (kernel memory image) – /proc (kernel data structures)

CS 3214

Unix File Types Unix I/O • Regular file – Binary or text file. • The elegant mapping of files to devices allows – Unix does not know the difference! kernel to export simple interface called Unix I/O. • file • Key Unix idea: All input and output is handled in a – A file that contains the names and locations of other files. consistent and uniform way. • Character special and block special files • Basic Unix I/O operations (system calls): – Terminals (character special) and disks (block special) – Opening and closing files • FIFO () • open()and close() – A file type used for interprocess comunication – Changing the current file position (seek) • Socket • lseek() – side note: Use off_t – may be 64bits! – A file type used for network communication between – Reading and writing a file processes • read() and write()

CS 3214 CS 3214

Opening Files • Opening a file informs the kernel that you are A note on Unix errors getting ready to access that file.

int fd; /* */ • Most Unix system functions return 0 on success

if ((fd = open(“/etc/hosts”, O_RDONLY)) < 0) { – Notable exceptions: write, read, lseek perror(“open”); – Always check the section “RETURN VALUE” in man exit(1); } page if in doubt • Returns a small identifying integer file descriptor • If return value indicates error, a global variable ‘errno’ is set with more information – fd == -1 indicates that an error occurred • Each process created by a Unix shell begins life – errno.h include symbolic constants with three open files associated with a terminal: – perror() prints the error in a standard form – 0: standard input • Error values can be unspecific at times – 1: standard output • Reminder: in the project, we will deduct points if – 2: standard error return values of system calls aren’t checked CS 3214 CS 3214

1 Reading Files Closing Files • Reading a file copies bytes from the current file position to memory, and then updates file • Closing a file informs the kernel that you are position. char buf[512]; finished accessing that file. int fd; /* file descriptor */ int nbytes; /* number of bytes read */ int fd; /* file descriptor */ int retval; /* return value */ /* Open file fd ... */ /* Then read up to 512 bytes from file fd */ if ((retval = close(fd)) < 0) { if ((nbytes = read(fd, buf, sizeof(buf))) < 0) { perror(“close”); perror(“read”); exit(1); exit(1); } } • Closing an already closed file is a recipe for • Returns number of bytes read from file fd into disaster in threaded programs (more on this later) buf • Moral: Always check return codes, even for – nbytes < 0 indicates that an error occurred. seemingly benign functions such as close() – short counts (nbytes < sizeof(buf ) ) are possible and are not errors! CS 3214 CS 3214

Writing Files • Writing a file copies bytes from memory to the current Dealing with Short Counts file position, and then updates current file position. char buf[512]; int fd; /* file descriptor */ • Short counts can occur in these situations: int nbytes; /* number of bytes read */ /* Open the file fd ...*/ – Encountering (end-of-file) EOF on reads /* Then write up to 512 bytes from buf to file fd */ if ((nbytes = write(fd, buf, sizeof(buf)) < 0) { – Reading text lines from a terminal perror(“write”); exit(1); – Reading and writing network sockets or Unix } pipes • Returns number of bytes written from buf to file fd. • Short counts never occur in these situations: – nbytes < 0 indicates that an error occurred. – As with reads, short counts are possible and are not – Reading from disk files (except for EOF) errors! – Writing to disk files (except when disk is full or • Transfers up to 512 bytes from address buf to file fd over quota)

CS 3214 CS 3214

/* * rio_readn - robustly read n bytes (unbuffered) */ ssize_t rio_readn(int fd, void *usrbuf, size_t n) Interrupted System Calls { size_t nleft = n; ssize_t nread; • Intention of signal design is to deliver a notification to the char *bufp = usrbuf; process as early as possible • What if the process is in a “BLOCKED” state? while (nleft > 0) { – Say waiting for I/O to complete, in the middle of a read() system if ((nread = read(fd, bufp, nleft)) < 0) { call reading from terminal/network? if (errno == EINTR) /* interrupted by sig handler return */ • In this case, syscalls can be “interrupted.” nread = 0; /* and call read() again */ – The signal handler is executed. If it returns there are 2 options: else • Abort the current system call: returns -1 and sets errno to EINTR (then return -1; /* errno set by read() */ the application must handle it!) } • Restart the current system call – done automatically if SA_RESTART is else if (nread == 0) set in sigaction break; /*EOF*/ • Example: sysrestart.c nleft -= nread; bufp += nread; • Practical implication: if you need to react to signals by } diverting program execution, you cannot set SA_RESTART return (n - nleft); /*return>= 0*/ and must check for EINTR on every blocking system call. } CS 3214 CS 3214

2 /* statcheck.c - Querying and manipulating a file’s meta data */ File Metadata #include "csapp.h" int main (int argc, char **argv) • Metadata is data about data, in this case file data. { struct stat; • Maintained by kernel, accessed by users with the char *type, *readok; stat and fstat functions. bass> ./statcheck statcheck.c Stat(argv[1], &stat); type: regular, read: yes /* Metadata returned by the stat and fstat functions */ if (S_ISREG(stat.st_mode)) /* file type*/bass> 000 statcheck.c struct stat { type = "regular"; bass> ./statcheck statcheck.c dev_t st_dev; /*device*/ else if (S_ISDIR( stat.st_mode)) type: regular, read: no ino_t st_ino; /*inode*/ type = "directory"; mode_t st_mode; /* protection and file type */ else nlink_t st_nlink; /* number of hard links */ type = "other"; uid_t st_uid; /*userIDofowner*/ if ((stat.st_mode & S_IRUSR)) /* OK to read?*/ gid_t st_gid; /* groupID ofowner */ readok = "yes"; dev_t st_rdev; /* device type (if inode device) */ else off_t st_size; /* total size, in bytes */ readok = "no"; unsigned long st_blksize; /* blocksize for filesystem I/O */ Accessing File Metadata unsigned long st_blocks; /* number of blocks allocated */ printf("type: %s, read: %s\n", type, readok); time_t st_atime; /* time of last access */ exit(0); time_t st_mtime; /* time of last modification */ } time_t st_ctime; /* time of last change */ }; CS 3214 CS 3214

Reading Directories Open Files vs. Closed Files

#include • Some functions operate on closed files: • opendir/readdir/ #include #include – And they do not open them: access(), unlink(), mkdir(), closedir symlink(), link() int – Some do: open(), creat() – Note: no writedir() main() (why not?) { • Files can have multiple names, e.g. “hard links” DIR * dir = opendir("."); – Deleting a file removes one name, hence “unlink()” assert(dir); • dirfd() gets struct dirent *ent; – When last link is gone, the file is gone • Interesting Unix property: actual fd while (ent = readdir(dir)) { printf("%s\n", ent->d_name); – Can work with already deleted files – Usable in fchdir() } – “creat()” -> “unlink()” -> “write()” -> “read()” works assert(closedir(dir) == 0); – Use for temporary files, will be deleted when last fd is } closed • Avoid old API mktemp, use mkstemp()

CS 3214 CS 3214

#include #include #include Using Temporary #include Standard I/O Functions #include Files /* Toy example. Real applications use * mkstemp() instead. */ • The C standard library (libc.a) contains a int collection of higher-level standard I/O functions main() { – Documented in Appendix B of K&R. char name [80], buf[3]; • Examples of standard I/O functions: snprintf(name, sizeof name, "/tmp/tmpfile.%d", getpid()); – Opening and closing files (fopen and fclose) – Reading and writing bytes (fread and fwrite) int fd = open(name, O_CREAT|O_RDWR|O_TRUNC, 0600); assert(fd != -1); – Reading and writing text lines (fgets and fputs) assert(unlink(name) == 0); – Formatted reading and writing ( and assert(write(fd, "Hi", 3) == 3); fscanf assert(lseek(fd,(off_t)0, SEEK_SET) == 0); fprintf) assert(read(fd, buf, 3) == 3); – Adjusting offset (fseek/fsetpos/rewind and assert(strcmp(buf, "Hi") == 0); ftell/fgetpos) assert(access(name, R_OK) == -1); printf("Success. Check if %s exists.\n", name); • Yes, it’s ‘rewind’, not ‘frewind’ return 0; } CS 3214 CS 3214

3 Standard I/O Streams Buffering in Standard I/O • Standard I/O models open files as streams – Abstraction for a file descriptor and a buffer in memory. • Standard I/O functions use buffered I/O • C programs begin life with three open streams (defined printf(“h”); in stdio.h) printf(“e”); – stdin (standard input) printf(“l”); printf(“l”); – stdout (standard output) printf(“o”); buf – stderr (standard error) printf(“\n”);

#include extern FILE *stdin; /* standard input (descriptor 0) */ h e l l o \n . . extern FILE *stdout; /* standard output (descriptor 1) */ extern FILE *stderr; /* standard error (descriptor 2) */ fflush(stdout); int main() { fprintf(stdout, “Hello, world\n”); } write(1, buf += 6, 6);

CS 3214 CS 3214

Flushing Standard I/O Pros and Cons of Unix I/O

• Streams are flushed • Pros – When a newline is output and the stream refers to a terminal – Unix I/O is the most general and lowest overhead (isatty(fileno(fd)) == true) form of I/O. – When the buffer size is exceeded – When the program properly exits (exit(3), not _exit(2) or signal- • All other I/O packages are implemented using Unix I/O induced termination) functions. • Flushing related bugs are a common cause of time-wasting – Unix I/O provides functions for accessing file and frustrating errors metadata. – Particularly when standard I/O streams are built from file • Cons descriptors obtained elsewhere via fdopen()! – Dealing with short counts is tricky and error prone. – Particularly in languages that have adopted similar models: Fortran, Perl are big offenders – Efficient reading of text lines requires some form of • Else you need fflush() buffering, also tricky and error prone – Also noteworthy: setbuf(stdout, NULL) turns off buffering – this of – Both of these issues are addressed by the standard course defeats the purpose of stdio! I/O and other packages

CS 3214 CS 3214

Pros and Cons of Standard I/O Pros and Cons of Standard I/O (cont)

• Pros: • Restrictions on streams: – Buffering increases efficiency by decreasing the – Restriction 1: input function cannot follow output number of read and write system calls function without intervening call to fflush, – Short counts are handled automatically fseek, fsetpos, or rewind. • Latter three functions all use lseek to change file • Cons: position. – Provides no function for accessing file metadata – Restriction 2: output function cannot follow an – Standard I/O is not appropriate for input and input function with intervening call to fseek, output on network sockets fsetpos, or rewind. – There are poorly documented restrictions on • Restriction on sockets: streams that interact badly with restrictions on – You are not allowed to change the file position of sockets a socket

CS 3214 CS 3214

4 Pros and Cons of Standard I/O Choosing I/O Functions • Workaround for restriction 1: • General rule: Use the highest-level I/O functions you can – Flush stream after every output. – Many C programmers are able to do all of their work using • Workaround for restriction 2: the standard I/O functions • When to use standard I/O? – Open two streams on the same descriptor, – When working with disk or terminal files one for reading and one for writing: • When to use raw Unix I/O – ‘dup()’ is needed FILE*fpin,*fpout; – When you need to fetch file metadata because fclose() – In rare cases when you need absolute highest performance fpin = fdopen(sockfd, “r”); – In signal handlers (none of stdio is async-signal-safe) closes underlying fpout = fdopen(dup(sockfd), “w”); Unix file descriptor: • What about network sockets? … – Generally, don’t use standard I/O or raw Unix I/O on fclose(fpin); sockets fclose(fpout); – Need to create layer to handle short reads appropriately

CS 3214 CS 3214

5