Systems Programming/ and

Alice E. Fischer

September 9, 2015

Alice E. Fischer, Systems Programming – Lecture 3, September 9, 2015

Outline

1 Compile and Run

2 Unix Topics
    System Calls
    The Unix File System
    Directories and Files
    I-Nodes

3 Directory Operations

4 Summary

Coding Standards

Part of every program grade will be based on these coding standards:
- Keep all functions short. 30 lines is normally too long.
- Global variables and use of goto are permanently forbidden.
- Use class member variables and local variables appropriately.
- Test data should be submitted and should test all parts of the program.
- Write your code so that it conforms to the style standards on the next slide.

Style Standards

- Keep your code and your comments within 80 columns.
- Eliminate unnecessary words from your comments. Do not repeat the obvious. Comments are for ideas that are not obvious.
- Use appropriate whitespace, not too little, not too much. Put a space after every comma, semicolon, //, and anywhere else that will help to break up the words in your statement.
- Do not put blank lines randomly all over the code. Do not put multiple consecutive blank lines in your file. If you need a visual divider, use a line of //-----------------
- Spelling and grammar need to be correct. Abbreviations are OK. Outright misspellings are not.
- Indentation must be consistent. Learn how to use your IDE to do this. The indentation style should be one of the two nationally recognized styles for C.
- Indent 3 or 4 spaces at every level. Not fewer, not more.

Using the Tools

Attached to the website are two files, tools.cpp and tools.hpp. Download and save the pair. Please use the functions in the tools module as follows:
- In tools.hpp, put your own name on line 9.
- Call banner() or fbanner() or both at the beginning of each program you write. This will put a standard header on the output.
- Call fatal( format, output-list ); to exit the program after a fatal error. It flushes the streams and exits properly.
- Use hold() if you want to pause execution so you can read the output screen. Do not use a system call or anything non-standard.
- Study the code in the fatal() function until you understand how variable-length argument lists work. Please read the relevant section of the handout from the first week (Chapter 20).
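As a rough illustration of how a variable-length argument list works, here is a minimal sketch. It is not the course's fatal() function: format_message is a hypothetical helper invented for this example, and it only formats a message into a caller-supplied buffer instead of exiting.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not the course's fatal()): formats a printf-style
   message into buf. Returns the length the full message needs, or a
   negative value if formatting fails. */
int format_message(char* buf, size_t size, const char* format, ...)
{
    va_list args;
    va_start(args, format);     /* args now refers to the values after format */
    int n = vsnprintf(buf, size, format, args);
    va_end(args);               /* every va_start needs a matching va_end */
    return n;
}
```

A caller might write format_message(buf, sizeof buf, "%s: error %d", name, code); the function itself never learns how many arguments follow except by reading the format string.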

The Command Shell

- Stages of Compilation
- Conditional Compilation
- Linking

Stages of Compilation

Between submitting your source code to the compiler and running your program, these things must happen:

1. Lexical analysis: the words and symbols in your code are identified.
2. Preprocessing: conditional compilation, include files, macros, and constant definitions are removed from the code and replaced by compilable code.
3. Compilation: your code is parsed and converted to object code.
4. Linking: modules of your program are combined into a load module, and linked to the system libraries.
5. Loading: the executable file is loaded into main memory and its process is put on the system's run queue.

The Preprocessor

Both C and C++ have a preprocessor, with its own language that is quite separate from the language itself. The preprocessor is used in four ways:
- To include header files for code modules: #include
- To define and un-define symbols and literal constants: #define UNIX
- To define macros: short, untyped, inline functions. These were important in C, but have largely been replaced by inline functions with full type-checking in C++.
- To do conditional compilation, which is used to create portable code modules.

Preprocessing happens after lexical analysis and before compilation.
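The difference between a macro and an inline function can be sketched as follows. The names SQUARE, square, and next_value are made up for this illustration. The macro is pure text substitution, so an argument with a side effect is evaluated twice, while the inline function evaluates it once and checks its type.

```c
/* Macro: text substitution with no type checking. */
#define SQUARE(x) ((x) * (x))

/* Inline function: the argument is evaluated once and type-checked. */
static inline int square(int x) { return x * x; }

static int calls = 0;
static int next_value(void) { return ++calls; }  /* helper with a side effect */

/* Counts how often next_value() runs when it is a macro argument. */
int macro_evaluations(void)
{
    calls = 0;
    (void) SQUARE(next_value());  /* expands to ((next_value()) * (next_value())) */
    return calls;
}

/* Counts how often next_value() runs when it is a function argument. */
int function_evaluations(void)
{
    calls = 0;
    (void) square(next_value());
    return calls;
}
```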

Conditional Compilation

Conditional compilation enables us to “make” a single source code module in multiple ways that are appropriate for different OS and hardware architectures.

When you download source code for a major software project, it will be full of conditional compilation blocks. The top-level blocks will include other files that are also full of conditional compilation blocks. It soon becomes difficult to figure out what code is being compiled!

Conditional Compilation Directives

These directives include or exclude a block of code from the current module during the current compile. Excluded code is not compiled.

#ifdef SYMBOL
    If the SYMBOL is defined, the following code block will be compiled.
#ifndef SYMBOL
    If the SYMBOL is not defined, the following code block will be compiled.
#if followed by a constant expression
    The if-condition is tested. When true, the lines of code up to the matching #endif are included in the program. When false, they are skipped (not compiled).
    #if can be used with the keywords "defined" or "!defined":
    #if defined (SYMBOL) and #if !defined (SYMBOL)
#elif constant-expression and #else
    Any number of #elif and one final #else can follow the #if.
#endif
    Is used with #ifdef, #ifndef, and #if.
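The directives above can be combined as in this small sketch. The macros __APPLE__ and __linux__ are predefined by common compilers on those platforms. PLATFORM_NAME, BUFFER_SIZE, and the two functions are names invented for this example.

```c
/* Choose a platform label at compile time. */
#if defined(__APPLE__)
    #define PLATFORM_NAME "macOS"
#elif defined(__linux__)
    #define PLATFORM_NAME "Linux"
#else
    #define PLATFORM_NAME "unknown"
#endif

/* Supply a default only if the build did not pass -DBUFFER_SIZE=... */
#ifndef BUFFER_SIZE
    #define BUFFER_SIZE 256
#endif

const char* platform_name(void) { return PLATFORM_NAME; }
int buffer_size(void) { return BUFFER_SIZE; }
```

Only one of the three PLATFORM_NAME definitions is ever seen by the compiler; the other two branches are discarded by the preprocessor.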

Conditional Compilation, Continued

Include guards are the most common use of conditional compilation.
- An include guard is a preprocessor command (or command combo) that will include or exclude a block of code based on what has previously been included.
- Include guards are used to ensure that no block of code is included twice in the same compile module.
- Every header file should be protected by include guards. Three preprocessor commands are needed:

    #ifndef MODULE_NAME
    #define MODULE_NAME
    ... code for the module ...
    #endif

- An alternative that is supported by some, but not all, systems is

    #pragma once
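To see the guard at work, the sketch below pastes the text of a hypothetical header, point.h, into one file twice, which is what happens when two different headers both include it. POINT_H, Point, and point_sum are illustrative names only.

```c
/* ---- first copy of the header text, as if from #include "point.h" ---- */
#ifndef POINT_H
#define POINT_H
typedef struct { int x; int y; } Point;
#endif

/* ---- second copy: skipped entirely, because POINT_H is now defined ---- */
#ifndef POINT_H
#define POINT_H
typedef struct { int x; int y; } Point;
#endif

int point_sum(Point p) { return p.x + p.y; }
```

Without the guard, the second typedef would declare Point again with a different anonymous struct type, and the compile would fail.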

Linking

At link-time, system and local libraries are searched for functions that were called by the program. (See bottom of log.txt.)
- User functions are linked statically: the entry point of the function is stored in the transfer vector of the module that calls it.
- System functions are linked dynamically: a stub function is hard-linked into the code. The actual function code is brought in at load time, if it is not already there.
- If some other process has already loaded the needed function, your process will be linked to the same copy.
- To make this work, system code must be reentrant, that is, the code must not modify itself at run time.

Unix Topics

- System Calls vs. Library Function Calls
- The Unix File System
- Directories and Files
- Directory Operations

System Calls vs. Library Function Calls

- A system call is a call to a function that executes in the kernel with supervisor privileges. System calls are documented in section 2 of the Unix manual.
- Library function calls execute with user privileges. (Therefore, they are less dangerous.)
- A library function sometimes calls a system function, causing a context switch:
    - The user's process (in the OS) goes from RUN state to BLOCKED state.
    - Control is transferred to the system function, which has its own stack and environment, and runs in supervisor mode.
    - The system function reaches back into the user's stack to get its parameters.
    - When execution is finished, the user's process moves into the READY state.
- System calls cause two context swaps.
- System calls may not be portable because systems are different.

The OS Process Queue

[Figure: process state diagram with the states Starting, READY, RUN, BLOCKED, and Done. Transitions: Starting to READY (login or execve), READY to RUN (my turn), RUN to READY (time is up), RUN to BLOCKED (system call or I/O), BLOCKED to READY (completion), RUN to Done (termination).]

A system call takes your process out of RUN. It stays in BLOCKED until the system call finishes. Frequent system calls will affect performance.
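The cost of frequent system calls can be sketched by writing the same text in two ways. Both functions are hypothetical examples. The first makes one write() system call per character. The second uses the stdio library, which collects characters in a user-space buffer and, for a small amount of text on a buffered stream, typically hands them to the kernel in a single write() when the buffer is flushed.

```c
#include <stdio.h>
#include <unistd.h>

/* One write() system call per character. Returns the number of calls made,
   or -1 if a write fails. */
long write_unbuffered(int fd, const char* text, size_t len)
{
    long calls = 0;
    for (size_t k = 0; k < len; ++k) {
        if (write(fd, &text[k], 1) != 1) return -1;
        ++calls;
    }
    return calls;
}

/* Library calls only: fputc() stores each character in the stream's buffer,
   and fflush() passes the buffered data to the kernel. Returns 0 or -1. */
int write_buffered(FILE* out, const char* text, size_t len)
{
    for (size_t k = 0; k < len; ++k)
        if (fputc(text[k], out) == EOF) return -1;
    return fflush(out) == EOF ? -1 : 0;
}
```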

The Unix File System

- Mounted file systems
- The UNIX root and bin directories
- Directories, files, and pathnames
- Links and I-nodes
- System libraries for directory processing

Hardware – Unix System – Program

[Figure: two directory entries, "John's file" and "Mary's link to John's file", both refer to I-node 3451716 (2 links, 6 blocks), which in turn points to the file's data blocks.]

A Fully Mounted File System

- Windows pathnames all start with a device code, such as C: Unix pathnames don't.
- All Unix pathnames start at the root directory, which is written as: /
- A removable device that stores files (disk, CD-ROM, stick memory) must be mounted before use.
- For this purpose, several mount points (empty directories) are built into the Unix file system. For example, /media and /mnt.
- My main disk, backup disk, and stick memory are all mounted (automatically) on the /Volumes directory.
- mount() is a system call. Happily, it is not something you will be using.

Mounting a Removable Device

- Modern systems automatically mount removable devices when they are inserted and unmount them when they are removed. (Many years ago, users had to issue the mount command themselves.)
- To mount a device, a Unix system must go to the root directory of the file system on the device.
- That directory is grafted onto the file-tree as a subdirectory of the mount point.
- Thereafter, the files on it can be reached from anywhere else in the file system using an ordinary path name.
- Thus, a pathname might start on one device, then go through a mount point, and onto another device.

The Fedora Root Directory

The Organization of a Linux File System:

    /
    boot/                         Where the kernel files live.
    dev/                          Devices and device types
    etc/                          Configuration files
    home/                         Everybody's home directories
    media/                        For mounting media
    root/                         The home directory of the root user.
    * run/media/mike/stickname/   Root directory from stick.
    tmp/                          Things that should disappear on power off.
    usr/                          Where non-kernel system files live.
    usr/bin/                      Executable essential system commands
    usr/sbin/                     Executables for sys administration
    usr/lib/                      System libraries (C, etc.)
    usr/lib64/
    var/                          System log files, other files that change

    * The run/ and tmp/ directories exist only at run time.

Subdirectories shown in the accompanying tree diagram:

    home/   alice/ bob/ mike/ lost+found/
    usr/    bin/ games/ kerberos/ local/ share/ etc/ include/ lib/ libexec/ sbin/ src/

What is in the bin directory?

The bin directory stores the executable code and scripts for commands that are necessary for basic system administration. Here are the most familiar and useful:

    bash      cat    df     ps
    tcsh      echo   ln     pwd
    date      chmod  ls     rm
    hostname  chown  mkdir  rmdir
    kill      cp     mv     unlink

Some things are in /bin in any Unix-like system; the things listed here are the same in OS-X and Linux. However, Linux has important commands in /bin that are found in /usr/bin in OS-X.

In /bin or in /usr/bin: More Basic Commands

    Languages:    awk, sed, make, c99, gcc, g++, java, python, ruby
    Directories:  cd, find, mount, stat
    Files:        cat, diff, less, more, touch
    Utilities:    ftp, grep, gzip, gunzip, tar, ping, man, svn, rdiff-backup, rsync, sort, uniq, which
    Shell:        alias, chsh, echo, logout, passwd, path, rehash, slogin, ssh, su, umask, sudo, whoami

Directories and Path Names

- A pathname is a sequence of directory names, separated by slashes, and optionally ending with a file name.
- A pathname can be either absolute or relative.
- An absolute pathname begins with slash: / (slash, the root directory)
- Avoid absolute pathnames in your code, since they generally fail when an application is moved to another directory or another machine.
- We can write a relative pathname starting with these special directories:
    ~/ (tilde, the user's home directory)
    ./ (dot, the current working directory)
    ../ (dot dot, the parent of the current working directory)
- Any other pathname is interpreted relative to the current working directory.

Types of Directory Entries

Files, devices, directories, links, and inter-process connections are treated uniformly.
- The first two entries are always . and ..
- The type of the entry is stored in the I-node, not in the directory.
- Directories are treated just like files.
- A soft link, or symbolic link, is a short file that stores the pathname of another file.
- A hard link is a second directory entry that points to the same I-node.
- Devices (block- and character-oriented) and communication channels (pipes and sockets) are treated like files.
- The directory is the only place where the file name is stored.

File Types

The entries in a directory can be any of the following types:

    Symbol       Val   Meaning
    DT_UNKNOWN     0   just in case
    DT_FIFO        1   pipe
    DT_CHR         2   character device
    DT_DIR         4   directory
    DT_BLK         6   block device
    DT_REG         8   regular file
    DT_LNK        10   symbolic link
    DT_SOCK       12   socket
    DT_WHT        14   system use only: to hide files.
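A program can turn these codes into readable labels with a switch, as in the sketch below. The function name entry_type_name is invented for this example. The DT_* constants come from dirent.h on Linux and OS X but are an extension rather than part of the basic standard, so the _DEFAULT_SOURCE definition is an assumption about a glibc build. DT_WHT is left out because not every system defines it.

```c
#define _DEFAULT_SOURCE      /* glibc: make the DT_* constants visible */
#include <dirent.h>

/* Maps the d_type code of a directory entry to a readable label. */
const char* entry_type_name(unsigned char d_type)
{
    switch (d_type) {
        case DT_FIFO: return "pipe";
        case DT_CHR:  return "character device";
        case DT_DIR:  return "directory";
        case DT_BLK:  return "block device";
        case DT_REG:  return "regular file";
        case DT_LNK:  return "symbolic link";
        case DT_SOCK: return "socket";
        default:      return "unknown";     /* DT_UNKNOWN and anything else */
    }
}
```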

I-Nodes

The POSIX standard for the I-node for a regular file requires:
- The length of the file in bytes.
- Device ID (this identifies the device containing the file).
- The User ID of the file's owner.
- The Group ID of the file.
- The file mode, including the file type and u-g-o access privileges.
- Additional system and user flags to further protect the file.
- Timestamps telling when the I-node itself was last changed (ctime, change time), the file content last modified (mtime, modification time), and last accessed (atime, access time).
- A link count telling how many hard links point to the I-node.
- One or a few file-content block pointers or indirect pointers.

I-Node Information

The stat command, which relies on the stat system call, retrieves a file's I-node number and some of the information in its I-node. Example:

    bash-3.2$ stat Elephant.pdf
    234881026                  // device ID number
    3451716                    // I-node number
    -rw-r--r--                 // type, permissions
    1                          // # of hard links to file
    alice staff                // owner, group
    0                          // device type
    87339                      // file size in bytes
    "Jan 10 02:38:05 2010" "Jan 3 14:05:56 2010"   // acc, mod
    "Jan 4 18:32:36 2010" "Jan 3 14:05:56 2010"    // Imd, brn
    4096                       // block size
    176 0                      // # of blocks in file, ?
    Elephant.pdf               // Name of file

Links: Hard and Soft

A hard link is a second pointer to the same I-node.
- It is exactly like a file; one cannot distinguish the original from the link.
- Hard links only work within one physical disk partition.
- A file will not be deleted until the last hard link is deleted.
- You lose control of your file if you give a hard link to another user.

A soft link, or symbolic link, is a file that contains a single string, the pathname of another file.
- That path name can be either absolute or relative.
- Moving a file or a directory breaks soft links that point to it from the outside.
- Deleting a file breaks soft links to it, but the broken link will not be discovered until someone tries to use it.
- Soft links are used more often than hard links.
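The two kinds of link can be compared with a short experiment. This is a sketch under assumptions: link_demo and the three file names are hypothetical, and dir must be an existing writable directory on a file system that supports both kinds of link. It uses the standard calls link(), symlink(), lstat(), and unlink().

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates dir/original.txt, a hard link to it, and a soft link to it, then
   checks the I-node data. Returns 0 if everything matches the description
   of hard and soft links, -1 otherwise. */
int link_demo(const char* dir)
{
    char orig[512], hard[512], soft[512];
    snprintf(orig, sizeof orig, "%s/original.txt", dir);
    snprintf(hard, sizeof hard, "%s/hard.txt", dir);
    snprintf(soft, sizeof soft, "%s/soft.txt", dir);
    unlink(soft); unlink(hard); unlink(orig);   /* clear leftovers; errors ignored */

    FILE* f = fopen(orig, "w");
    if (f == NULL) return -1;
    fputs("data\n", f);
    fclose(f);

    if (link(orig, hard) != 0) return -1;               /* second name, same I-node */
    if (symlink("original.txt", soft) != 0) return -1;  /* new file holding a pathname */

    struct stat a, b, c;
    if (lstat(orig, &a) != 0 || lstat(hard, &b) != 0 || lstat(soft, &c) != 0)
        return -1;
    int ok = a.st_ino == b.st_ino     /* hard link shares the I-node          */
          && a.st_nlink == 2          /* link count now counts both names     */
          && S_ISLNK(c.st_mode)       /* soft link is its own kind of file    */
          && c.st_ino != a.st_ino;    /* and has its own I-node               */

    unlink(soft); unlink(hard); unlink(orig);
    return ok ? 0 : -1;
}
```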

The I-Node is the Actual File

[Figure: two directory entries, "John's file" and "Mary's link to John's file", both refer to I-node 3451716 (2 links, 6 blocks), which in turn points to the file's data blocks.]

The directory entry points to an I-node. The I-node contains the file info and points to data blocks. For longer files, there is a single-indirect pointer to a block that holds further data-block pointers. For huge files, there are double- and triple-indirect pointers that point to a block full of single- or double-indirect pointers.

Soft Link

A soft link is a separate file containing only a pathname.

[Figure: the directory entry for John's soft link refers to I-node 3451729 (1 link, 1 block), whose data block holds the pathname to the directory entry for his original file.]

Directory Operations

- Library Functions for Directory Processing
- Error Codes
- Structure of a Directory Entry
- Testing the Type of the Entry

Library Functions for Directory Processing

To use the directory functions, you must include

    #include <dirent.h>

These libraries are documented in Section 3 of the Unix manual. For further details, check the Unix man pages.

You will need to capture the return value from these functions, then test it and handle error codes appropriately.

The direntDemoJoined program gives a skeleton program with two classes for dealing with directories.

For Program 5, you will need to call the library functions on the next few slides. The error return values for each one are given.

The Current Working Directory

Get the absolute path name of the current working directory.

    #include <unistd.h>
    char* getcwd(char* buf, size_t size);

- If size=0 and buf is NULL, space will be allocated for the pathname. The caller is then responsible for freeing it.
- The return value is NULL if the command fails, and the global variable errno is set to indicate the error. Do not rely on the contents of buf after a failure.

Error codes are:

    EACCES   Permission was denied for a component of the pathname.
    EINVAL   The size argument is zero.
    ENOENT   A component of the pathname no longer exists.
    ENOMEM   Insufficient memory is available.
    ERANGE   The size argument is greater than zero but smaller than the length of the pathname plus 1.
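Capturing and testing the return value might look like the sketch below. The wrapper current_directory is a hypothetical name. It uses a caller-supplied buffer and reports the error code instead of exiting, so a program can decide what to do next.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Stores the current working directory in buf.
   Returns 0 on success, or the errno value that getcwd() set. */
int current_directory(char* buf, size_t size)
{
    if (getcwd(buf, size) == NULL) {
        int err = errno;    /* save it before another call can change it */
        fprintf(stderr, "getcwd failed: %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

Calling it with a buffer that is too small, for example a 1-byte array, returns ERANGE, which matches the last entry in the table of error codes.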

Directory Access

Open a directory stream.

    #include <dirent.h>
    DIR* opendir(const char* dname);

The pointer NULL is returned if dname cannot be accessed, or if the system cannot malloc enough memory to hold the whole thing. The global variable errno is set to indicate the error.

Close the directory stream and free the associated memory:

    int closedir(DIR* dirp);

A return value of 0 means success. On failure, -1 is returned and the global variable errno is set to indicate the error.

Directory Processing

Read a directory entry.

    struct dirent* readdir(DIR* dirp);

The return value is NULL if there are no more entries in this directory or an error occurred. In the event of an error, errno may be set to any of these values. (The names fd, buf, and basep refer to arguments of the underlying system call, not to arguments of readdir.)

    EBADF    fd is not a valid file descriptor open for reading.
    EFAULT   Either buf or basep point outside the allocated address space.
    EIO      An I/O error occurred while reading from or writing to the file system.
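Because readdir returns NULL both at the end of the directory and on an error, a loop has to use errno to tell the two apart. The usual idiom is to set errno to 0 before each call. The sketch below, with the hypothetical name list_directory, combines opendir, readdir, and closedir in that way.

```c
#include <dirent.h>
#include <errno.h>
#include <stdio.h>

/* Prints the name of every entry in the directory dname.
   Returns the number of entries, or -1 if any call fails. */
int list_directory(const char* dname)
{
    DIR* dirp = opendir(dname);
    if (dirp == NULL) {
        perror(dname);
        return -1;
    }

    int count = 0;
    for (;;) {
        errno = 0;      /* so that NULL with errno still 0 means "no more entries" */
        struct dirent* entry = readdir(dirp);
        if (entry == NULL) break;
        printf("%s\n", entry->d_name);
        ++count;
    }
    int read_error = errno;     /* nonzero only if readdir itself failed */

    if (closedir(dirp) != 0 || read_error != 0) return -1;
    return count;
}
```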

Structure of a Directory Entry

Basically, the directory lists only the name of the file. All other information is in the I-node.

This is one possible implementation of the directory entry:

    struct dirent {
        ino_t      d_ino;              // file or I-node number
        __uint16_t d_reclen;           // length of this record
        __uint8_t  d_type;             // file type, see below
        __uint8_t  d_namlen;           // strlen( d_name )
        char       d_name[255 + 1];    // name must be <= 255
    };

This is platform dependent; always use the standard interface functions to process directory entries. Use the member names d_name, d_type, and d_ino in your code.

Testing the Entry Type

The header file sys/stat.h defines macros that let you test the type of a directory entry, given the st_mode field of its I-node stats. Use them as if they were functions.

    #define S_ISDIR(m)  (((m) & S_IFMT) == S_IFDIR)   // directory
    #define S_ISREG(m)  (((m) & S_IFMT) == S_IFREG)   // regular file
    #define S_ISLNK(m)  (((m) & S_IFMT) == S_IFLNK)   // symbolic link

    #define S_ISSOCK(m) (((m) & S_IFMT) == S_IFSOCK)  // socket
    #define S_ISFIFO(m) (((m) & S_IFMT) == S_IFIFO)   // pipe or socket

You will need the first three for program 3; we will use pipes and sockets later in the term.
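A minimal sketch of the first three macros in use follows. The function entry_kind and its one-character codes are invented for this example. It calls lstat() rather than stat() so that a symbolic link is reported as a link instead of being followed.

```c
#include <sys/stat.h>

/* Classifies path: 'd' directory, '-' regular file, 'l' symbolic link,
   '?' any other type, 'E' if lstat() fails. */
char entry_kind(const char* path)
{
    struct stat info;
    if (lstat(path, &info) != 0) return 'E';

    if (S_ISDIR(info.st_mode)) return 'd';
    if (S_ISREG(info.st_mode)) return '-';
    if (S_ISLNK(info.st_mode)) return 'l';
    return '?';
}
```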

System Calls for Directory Processing

System calls are documented in Section 2 of the Unix manual. For Program 3, you will need to make the following system call:

Get the stats out of the I-node for a regular file.

    int lstat(const char* path, struct stat* buf);

A return value of 0 means success. If the function fails, it sets a global variable, errno, to the code for the error and returns -1. The stat type definition follows:

Reading the I-node Stats

    struct stat {
        dev_t     st_dev;      // device
        ino_t     st_ino;      // inode
        mode_t    st_mode;     // file type and protection
        nlink_t   st_nlink;    // number of hard links
        uid_t     st_uid;      // user ID of owner
        gid_t     st_gid;      // group ID of owner
        dev_t     st_rdev;     // device type (if inode device)
        off_t     st_size;     // total size, in bytes
        blksize_t st_blksize;  // blocksize for filesystem I/O
        blkcnt_t  st_blocks;   // number of blocks allocated
        time_t    st_atime;    // time of last access
        time_t    st_mtime;    // time of last modification
        time_t    st_ctime;    // time I-node info last changed
    };
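Reading a few of these fields could look like the following sketch. The function print_stats is a hypothetical name. The casts are there because the exact integer types behind ino_t, nlink_t, off_t, and blkcnt_t differ from one system to another.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Prints selected I-node fields for path. Returns 0 on success, -1 on error. */
int print_stats(const char* path)
{
    struct stat info;
    if (lstat(path, &info) != 0) {
        perror(path);
        return -1;
    }

    char when[32];
    strftime(when, sizeof when, "%Y-%m-%d %H:%M:%S", localtime(&info.st_mtime));

    printf("I-node:   %llu\n", (unsigned long long) info.st_ino);
    printf("Links:    %lu\n", (unsigned long) info.st_nlink);
    printf("Size:     %lld bytes\n", (long long) info.st_size);
    printf("Blocks:   %lld\n", (long long) info.st_blocks);
    printf("Modified: %s\n", when);
    return 0;
}
```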

Summary

Tonight's topics include:
- How compilation and linking work.
- What is a system call?
- The Unix file system and I-nodes
- Directories and files
- Directory operations
- Error return values and error codes
