A Gentler Introduction to Unix

Zhizhang Shen ∗ Dept. of Computer Science and Technology Plymouth State University April 21, 2020

Abstract

Since this course is to run C/C++ in a Unix-like environment, we start with a brief introduction to Unix, based on [1], to learn some of the basic Unix commands, the directory structure of the Unix system, and how to compile and run a C program consisting of either one file or multiple files. In the rest of this course, we will then proceed to learn how to write programs in C to solve various problems, and go through the same process to compile and run such programs. Later on, we will also look at a variety of Unix system calls, and shell scripts consisting of such calls, and see a comprehensive example of how C is used to write programs that make use of system information as kept by Unix.

Contents

1 An introduction
  1.1 How to log into the system?
  1.2 Simple Unix commands

2 Unix file system
  2.1 What is in a name?
  2.2 Work with files
  2.3 Work with directories
  2.4 Background processing
  2.5 Miscellaneous commands

3 A Unix shell
  3.1 How does a shell work?
  3.2 Options and arguments of commands
  3.3 Redirection
  3.4 Group commands
  3.5 Pipes and tees
  3.6 Quoting special characters

4 The editors

5 C programming in Unix
  5.1 First thing first
  5.2 Are you bored with it?
  5.3 Speak mathematically
  5.4 Cut it into pieces

6 A debugger
  6.1 Set breakpoints
  6.2 How to proceed from a breakpoint?
  6.3 Check out the values
  6.4 A final example

7 shell scripts
  7.1 Positional variables
  7.2 User-defined variables

8 Unix system interface
  8.1 Look at one call in details
    8.1.1 An example
    8.1.2 Another example
    8.1.3 Scripts again?
  8.2 An example: Listing directories
    8.2.1 Given a standard interface...
    8.2.2 ... and an implementation of the interface
    8.2.3 A little summary

∗ Address correspondence to Dr. Zhizhang Shen, Dept. of Computer Science and Technology, Plymouth State University, Plymouth, NH 03264, USA. Email address: [email protected].

1 An introduction

Unix is a popular and powerful operating system 1, which supports multitasking and time-sharing. (Think about how Turing operates, where multiple users can use it “at the same time”.) It consists of a kernel, a file system, a shell, and various utilities. As the core of the system, the kernel controls everything inside the computer and manages all kinds of resources. In particular, it implements various queues to support the multitasking ability, which we will further explore in CS 4310 Operating Systems. With the support of the file system, Unix organizes all kinds of data into files, categorized into directories (referred to as folders on the Windows platform). The Unix shell interprets user commands and passes their requests to the kernel so that they can be carried out. A utility is just a useful piece of software that makes the system user friendly. For example, we can print out a calendar, list all the files in various forms, find a word in a file, etc.

There are various versions of Unix, such as System V Unix and BSD Unix, and Unix-like systems, such as Linux. Turing is a Linux-based system. As mentioned earlier, even Apple iOS is built on top of Unix.

Since Unix is a multiuser system, we need a user account/password to get into the system, which is just the usual myPlymouth account credential.

1.1 How to log into the system?

We can get into Turing via PuTTY, an SSH and telnet client, with the above credential. The PuTTY program can be found in all the programming labs in Memorial, and can also be obtained via https://www.putty.org/. If you use a Mac, try https://www.ssh.com/ssh/putty/mac/. Once you start PuTTY, you should enter turing.plymouth.edu into the “Host Name” box 2, then click “Open”, as shown in Figure 1. You will then be prompted for your credential. If everything works out, it should look like Figure 2. You are now ready to play with it.

1.2 Simple Unix commands

Once we get into the system, we can immediately try out a few simple things.

• Change your password.

% passwd
Changing password for user zshen.
Current password: 1234567
New password: 7654321
Re-enter password: 7654321
passwd: all authentication tokens updated successfully.
%

Figure 1: What does PuTTY look like?

Figure 2: How to log into turing?

1 Both Apple iOS and Android are based on Unix.
2 You can use turing later on.

Here, by ‘%’, we mean the command line prompt, or simply prompt. It is often associated with a user path, which we will explain further later on. For example, in my case, the actual prompt looks like the following (Cf. Figure 2):

/home/zshen >

Question: What is the path in your case?

• What is the date today?

% date
Tue Jan 7 08:06:31 EST 2020

• Who just logged on?

% who am i
zshen pts/0 2020-01-07 08:06 (vpnuser-1-20.plymouth.edu)

• What is the command that you want to use? Here “man” certainly refers to “manual”.

% man date
DATE(1)                  User Commands                  DATE(1)

NAME
       date - print or set the system date and time
...

• I am done.

% logout

Labwork 1.2: Unix commands are case sensitive. Try out the following commands with turing to see what happens: WHO, 2018, DATE, Who Am I, e.g.,

% WHO

You might also want to use the manual command 3 man to check if they are valid Unix commands, e.g.,

% man WHO

Collect what you have done with the above in a ‘.txt’ file, using, e.g., Notepad++, and send it in as part of your report for this lab 4. (Check out the course page for the whole assignment for Lab 1.)

3 Notice that you can quit the manual mode by entering ‘q’, standing for “quit”.
4 An effective and efficient way to send in a lab report for this course is to copy the stuff that you have done in turing, and paste it into a .txt file. Make sure that you clearly label your solutions to the problems. For example, collect all the stuff you just did for Labwork 1.2 after a label “Labwork1.2”.

2 Unix file system

There is nothing special about the Unix file system, compared with what we deal with every day in Windows, except perhaps that a folder in Windows is referred to as a directory in Unix. An ordinary file can be either a text file or a binary file. A text file is usually represented in ASCII codes; a binary file, on the other hand, is composed of a sequence of binary bits, intended for a computer to use. A special file contains information necessary to work with a certain device.

There are lots of files kept in a computer. To facilitate the mundane task of looking for a file, a bunch of related files are organized into a directory, and a bunch of related directories are organized into a parent directory, ..., until everything is collected under a root directory. From an operating system’s perspective, a directory is also regarded as a file. Thus, the whole thing is organized as a tree, which you should know very well after taking CS2381 Data Structure and Intermediate Programming. Figure 3 shows an example of such a file system, based on a tree.

Figure 3: An example of a Unix tree

To dig a bit deeper, the very reason to use a tree structure to organize all these directories and files is that, as you should have learned in MA2200 Finite Mathematics, MA2250 Mathematics for Computer Scientists, or MA3200 Discrete Mathematics, in such a structure the path between any two nodes, especially that between the root directory and any other directory and/or file, is unique. This simple fact allows us to give the same name to different entities in such a structure and still identify each of them uniquely, which will be made clear later when we discuss the naming of such entities as directories and files. Another fact is that it is relatively fast for us to look for a specific file in such a tree structure, which will be made clear when you take CS3221 Algorithm Analysis.

More specifically, at the top level, a Unix file tree contains the following directories: bin, containing the software for the shell and most of the other commands; dev, containing all the special files, especially drivers, associated with various devices; etc, containing various administrative files, such as the list of the authorized users; home, containing the home directories of various users; tmp, containing you-know-what; usr, containing something useful; and var, containing something that varies, e.g., users’ mailboxes.

Right after logging into the system, you are in your working directory. For example, when I log into the system, I am in my working directory /home/zshen, as shown in Figure 2. You can always find out your current working directory by using the “print working directory” command, pwd.

% pwd
/home/zshen
%

If you want to have a look at the tree structure rooted at the current working directory, you can use the tree command. For example, a sub-tree rooted at the working directory of counting is shown in Figure 4, where ls lists all the stuff in a working directory.

Figure 4: Usage of the tree command

Once you know where you are, you can change to any directory you want to work with, by using the “directory change” command, cd. The issue is certainly how you know where to go, and how to tell the system where you want to go. We will explain how to change to the desired directory after understanding how to identify the location of such directories in the following section.

2.1 What is in a name?

Every file and directory is identified by a name. On most systems, a name contains between 1 and 255 characters out of the following: uppercase letters, lowercase letters, digits, period, underscore and comma. You should avoid special symbols such as ‘&’, ‘*’, ‘\’, various brackets, ‘%’, etc., when coming up with a name, and you should not use reserved words 5 or special characters 6 as the names of your directories/files.

To refer to a file located in your current directory, you only need to use its name. But, if you want to use a file located in another directory, you have to use its pathname, which uniquely identifies its location in the whole file structure, which is a tree.

For example, the root directory is almost always labeled with ‘/’, and the absolute pathnames for the root’s child directories start with the root directory, i.e., ‘/’. In general, to form the absolute pathname of a node name in a tree structure, we start with ‘/’, the name of the root, collect the names of all the nodes along the unique path from the root to name in a sequence, and use ‘/’ to separate all such names. For example, the absolute name of the directory home is /home as shown in Figure 3, where the directory ‘home’ contains two sub-directories: jack and jill. Their absolute path names 7 are ‘/home/jack’ and ‘/home/jill’, respectively. Similarly, the absolute pathname for the file kangaroo is ‘/home/jill/Marsupials/kangaroo’.

Figure 5: What does a marsupial look like?

Although the absolute path name uniquely identifies a file, and allows us to give the same name to different files, it is certainly an overkill, since we have to specify all the intermediate names. For example, as shown in Figure 4, the absolute name of the directory counting is

/home/zshen/Courses/CS2470/programs/Core02/counting

5 Check out some of the Unix reserved words from the course page.
6 Check out some of the Unix special characters from the course page.
7 In ros, a robotic operating system, which some of you will study in the EMTR program, an absolute pathname is called a global name [3, Sec. 5.2].

Moreover, we usually don’t have access rights to some of these directories, especially the root directory. Furthermore, the host computer most likely won’t have the directory structure that you will use. Thus, we almost always use a relative pathname 8, a pathname relative to the current working directory, to refer to a certain node in a file structure. This notion is also based on the key structural property of a tree: there is a unique path between any two nodes, going through the root of the minimum tree that includes both nodes.

For example, in Figure 3, if we are in the directory jack, and want to refer to the directory Marsupials, we notice that home is the root of the minimum tree that contains both of these two nodes. We thus 1) move up to home, the parent of jack, as represented with ..; 2) then move down to jill; and 3) move further down to Marsupials, while using ‘/’ to separate all these three names. As a result, the name of Marsupials relative to jack is “../jill/Marsupials”.

Note: It is easy to tell an absolute path name apart from a relative one: the former starts with a slash ‘/’, while the latter doesn’t.

We can also use ‘.’ to refer to the current working directory, as in, e.g., ‘./Oceans’, assuming the current working directory is still “jack”. But we prefer to use “Oceans” as it saves us two key strokes. On the other hand, because the current working directory is usually not on the list of directories that the shell searches for commands, when we want to execute a program in the current working directory, e.g., a.out, we have to say

% ./a.out
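The relative and absolute names discussed above can be tried out safely in a throwaway directory. The following is only a sketch: the variable name scratch and the file content “hop” are made up for illustration, and only the jack/jill part of Figure 3 is recreated.

```shell
# Recreate a small part of Figure 3 inside a scratch directory.
# "scratch" is a hypothetical name; mktemp -d picks the actual location.
scratch=$(mktemp -d)
mkdir -p "$scratch/home/jack/Oceans" "$scratch/home/jill/Marsupials"
echo "hop" > "$scratch/home/jill/Marsupials/kangaroo"

# Pretend that jack is our working directory.
cd "$scratch/home/jack"

# Relative pathname: up to home (..), down to jill, then to Marsupials.
ls ../jill/Marsupials

# A pathname starting with '/' reaches the same directory from anywhere.
ls "$scratch/home/jill/Marsupials"

# Both names lead to the very same file.
rel=$(cat ../jill/Marsupials/kangaroo)
abs=$(cat "$scratch/home/jill/Marsupials/kangaroo")
echo "relative: $rel, absolute: $abs"
```

Since $scratch itself starts with ‘/’, the second name is an absolute pathname, just a longer one than in Figure 3.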

Labwork 2.1:

1. Which of the following are valid names for ordinary Unix files, and why 9?

foo, guess?, book.chap1, BOOK.chap2, 2good2Btrue, {2bad}, rank*, serial#

2. Which of the following are valid names for directories, and why?

Dir2, Directory.3, *Hook, |Line1|, "Sinker", money.$, .hideNseek, 777

3. In Figure 3, what are the absolute path names for root, bin, jill, and kangaroo, respectively?

4. Again, in Figure 3, suppose that Marsupials is now the working directory, what are the relative pathnames of root, bin, jill, and kangaroo, respectively?

5. Still in Figure 3, directory jack has two subdirectories, Continents and Oceans. What are their absolute pathnames?

8 It is called relative graph resource name, or simply relative name in ros, including a special case of private name, relative to a running node [3, Sec. 5.3].
9 You might want to think about them now, and try them out later on. Things might have changed during the last thirty years.

Collect what you have done in a ‘.txt’ file and send it in as part of your report for this lab, together with what you have done for Labwork 1.2. (Check out the course page for the whole assignment for Lab 1.)

2.2 Work with files

Once a directory is identified, we can list all the files included in that directory using the ls command, as you already saw in Figure 4.

1. List all the files in the current directory 10

% ls
a.out          first.c        hello        lucky     second.c
basicExamples  format.c       hello.c      pow.c     sqroot.c
char           formatFirst.c  longestLine  power.c   temperature
counting       getline.c      lsOutput     printd.c

2. A hidden file, i.e., a file with its name starting with a ‘.’, will not be listed, unless we use the list all command.

% ls -a
.              char      formatFirst.c  longestLine  power.c   temperature
..             counting  getline.c      lsOutput     printd.c
a.out          first.c   hello          lucky        second.c
basicExamples  format.c  hello.c        pow.c        sqroot.c

3. The command ls shows us the names of files and directories. We can have a look at the content of a text file by using more, e.g.,

% more first.c
#include <stdio.h>

main(){
  printf("I feel lucky today \n");
}

10 What you will get is certainly different from mine.

4. To create a “new file”, you can either 1) redirect the “standard output” of a command into a “new file”; 2) copy an existing file into a “new file”; 3) use a text editor to create a “new file”, which we will do a lot in this course; or 4) use a computer program to generate a “new file”.

(a) It is easy to create a file, e.g., “sampleRedirect.txt”, by redirecting the standard output of a command into such a file with ‘>’. The following uses the output of the “ls” command as the content of the file sampleRedirect.txt.

% ls
output.txt
% ls > sampleRedirect.txt
% more sampleRedirect.txt
output.txt
sampleRedirect.txt

In particular, the following creates an empty file (?).

% > testFile
% more testFile
%

(b) You can also copy an existing file with the cp command. For example, in the following, you copy “output.txt” into “anotherOutput.txt”.

% ls
output.txt sampleRedirect.txt
% cp output.txt anotherOutput.txt
% ls
anotherOutput.txt output.txt sampleRedirect.txt

(c) If you want to redirect the output of a command and append it to an existing file, you can use the ‘>>’ operator. In the following, you add the output of the program “ls” at the end of the file “output.txt”.

% more output.txt
This is a test

% ls
anotherOutput.txt output.txt sampleRedirect.txt
% ls >> output.txt
% more output.txt
This is a test

anotherOutput.txt
output.txt
sampleRedirect.txt

(d) What we will do mostly in this course, just as we did in the programming courses, is to use an editor to come up with a program, which is actually a text file. We will talk a lot more about this approach in Section 4.

5. We can change the name of a file, or even move a file to another place, with the move command ‘mv’. For example, the following renames “output.txt” to “result.txt”.

% ls
anotherOutput.txt output.txt sampleRedirect.txt
% mv output.txt result.txt
% ls
anotherOutput.txt result.txt sampleRedirect.txt

In general, you can also move a file somewhere else by changing its absolute path name. Here the command “mkdir” creates a directory sampleDir within the current working directory. We will further discuss “mkdir” in Section 2.3.

% mkdir sampleDir
% ls
anotherOutput.txt result.txt sampleDir sampleRedirect.txt
% ls sampleDir
%

The following moves the file result.txt into the just created directory sampleDir.

% mv result.txt sampleDir
% ls
anotherOutput.txt sampleDir sampleRedirect.txt
% ls sampleDir
result.txt

You should use the man(ual) command to check out the other formats, options, etc. of this “mv” command.

6. To delete a file, you use “rm”. Go through the following process step by step and understand what happens in each step.

% ls
anotherOutput.txt sampleDir sampleRedirect.txt
% mv sampleDir/result.txt result.txt
% ls
anotherOutput.txt result.txt sampleDir sampleRedirect.txt
% cp result.txt output.txt
% ls
anotherOutput.txt output.txt result.txt sampleDir sampleRedirect.txt
% rm output.txt
% ls
anotherOutput.txt result.txt sampleDir sampleRedirect.txt

7. With the “ls” command, the ‘l’ option shows all the information of the items in a directory.

% ls -l
total 16
...
-rw-r--r-- 1 zshen domain^users 15 Mar 27 2013 test1.txt
...

Looking at the above long listing, you might be wondering what, e.g., “rw-r--r--” means. These characters describe the access privileges of a file or a directory for three groups of users: u (the owner of this file), g (other users in this file’s group), and o (everyone else); a stands for everybody, i.e., ugo. For each group, three characters tell whether the file can be read (r), written (w) or executed (x). Thus, “rw-r--r--” means that the owner of the file can read and update, but cannot execute, this file; all the members of her group, as well as the public, can only read this file.

8. Only the owner of a file, or a directory, can change such privileges with the chmod command. Go through the following process step by step and understand what happens in each step.

% ls -l test1.txt
-rw-r--r-- 1 zshen domain^users 15 Mar 27 2013 test1.txt
% more test1.txt
-42+45*-
% chmod u-r test1.txt
% more test1.txt
test1.txt: Permission denied
% ls -l test1.txt
--w-r--r-- 1 zshen domain^users 15 Mar 27 2013 test1.txt
% chmod u+r test1.txt
% ls -l test1.txt
-rw-r--r-- 1 zshen domain^users 15 Mar 27 2013 test1.txt
% more test1.txt
-42+45*-

The following gives everyone the read access privilege to the file.

% chmod a=r test1.txt
% ls -l test1.txt
-r--r--r-- 1 zshen domain^users 15 Mar 27 2013 test1.txt
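The chmod steps above can be replayed on a throwaway file while recording only the permission column of the long listing. This is a sketch under assumptions: the scratch directory and the file name demo.txt are invented for illustration, and cut -c1-10 is simply one way of keeping the first ten characters of the “ls -l” line.

```shell
# Work in a scratch directory so that no real file is touched.
scratch=$(mktemp -d)      # hypothetical scratch location
cd "$scratch"
echo "some text" > demo.txt

# Set the privileges explicitly: owner read/write, group and others read.
chmod u=rw,g=r,o=r demo.txt
before=$(ls -l demo.txt | cut -c1-10)

# Take the read privilege away from the owner only.
chmod u-r demo.txt
during=$(ls -l demo.txt | cut -c1-10)

# Make the file read-only for everybody (a = ugo).
chmod a=r demo.txt
after=$(ls -l demo.txt | cut -c1-10)

echo "$before"   # -rw-r--r--
echo "$during"   # --w-r--r--
echo "$after"    # -r--r--r--
```

The ‘=’ form sets a group’s privileges exactly, while ‘+’ and ‘-’ add or remove a single privilege and leave the others alone.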

2.3 Work with directories

It is quite common to develop a tree-like directory structure to manage the files more efficiently by using various directory related commands.

1. When working with directories, the most important thing is to know where you are, i.e., what your working directory is, with the “pwd” command.

% pwd
/home/zshen/testDir/parent
% ls
anotherOutput.txt result.txt sampleDir sampleRedirect.txt

2. To create another directory within the current directory, you use “mkdir”.

% mkdir anotherDir
% ls
anotherDir anotherOutput.txt result.txt sampleDir sampleRedirect.txt

3. Use “cd” to change the working directory to this newly created directory anotherDir.

% cd anotherDir
% ls

4. Come back to the previous working directory. Here is where we are at this moment.

% pwd
/home/zshen/testDir/parent/anotherDir

Remember “..” stands for the parent directory of the current working directory.

% cd ..
% pwd
/home/zshen/testDir/parent
% ls
anotherDir anotherOutput.txt result.txt sampleDir sampleRedirect.txt

5. To remove a directory, we use “rmdir”.

% rmdir anotherDir
% ls
anotherOutput.txt result.txt sampleDir sampleRedirect.txt

6. What have we got? Using some of the previous commands, we can find out the contents of the current working directory.

% pwd
/home/zshen/Courses/CS2470/programs/Core02/basicExamples
% ls
a.out    recip    sqroot.c            testScanf.c
hello.c  recip.c  testAnotherScanf.c  tripExample

The tree command provides a more structural perspective of the content, as we saw in Figure 4.
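The directory commands in this section can be strung together into one short session. The sketch below assumes a scratch directory created with mktemp; the directory name full and the file name keep.txt are made up to show one extra fact, namely that rmdir only removes a directory that is empty.

```shell
scratch=$(mktemp -d)      # hypothetical scratch location
cd "$scratch"

mkdir parent
cd parent
mkdir anotherDir

# Move down, note where we are, then come back up with "..".
cd anotherDir
here=$(basename "$(pwd)")
cd ..
back=$(basename "$(pwd)")
echo "went down to $here and came back to $back"

# An empty directory can be removed with rmdir ...
rmdir anotherDir

# ... but rmdir refuses to remove a directory that still holds a file.
mkdir full
echo "data" > full/keep.txt
rmdir full 2>/dev/null || echo "rmdir: full is not empty, so it stays"
ls
```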

Labwork 2.3: When sending in a lab report, write down as much detail as you could for the assignments, and label them clearly.

1. Create the directory structure as shown in Figure 3. Notice that “kangaroo” is the name of a file, not a directory. Since “/” is a special character standing for the real root directory, to which you don’t have access rights, use “root” as the name of the root directory in your structure. Use the tree command to demonstrate what you have done.

2. Set up some additional subdirectories for jack as follows: Continents now holds Africa, Antarctica, Asia, Australia, Europe, NAmerica, and SAmerica; and each of these directories holds some countries and regions of their own.

(a) Use the tree command to demonstrate this expanded file structure.
(b) List the absolute path names for the following countries: Norway, India, Egypt and Argentina 11.
(c) Assume that your current working directory is USA, accomplish the following using a single command line with the help of relative pathnames.
    i. List the content of Marsupials.
    ii. List the content of Australia.
    iii. Place a copy of the kangaroo file in the Australia directory under Continents.
(d) Repeat the above using absolute pathnames.

11 Make sure you put these countries in the right continent.

3. The echo command takes a line of input that you type in through the keyboard and kicks it right back on the screen. Try out the following:

% echo try this out.
try this out.

Create a file “fun” by redirecting the output of the echo command, with the help of the redirection mechanism (Cf. Step 4a of Sec. 2.2).

4. Use the commands who, who am i, and date (Cf. Sec.1.2) to append more data to the fun file.

5. Come up with a file “stuff” and place it in your working directory. Then do the following:

(a) Give everyone permission to read stuff, but do not change any other access privilege. (b) Permit the owner and group members to read and write the file; remove these privileges from everyone else. (c) Give the owner and group members permission to execute stuff while giving the owner sole permission to read or write it.

Confirm what you have done (Cf. Step 8 of Sec.2.2)

6. A hidden file has a name starting with ‘.’. (Cf. Step 2 of Sec.2.2) Use the “cal” utility (What does it do? How to find it out?) and the redirection operator ‘>’ to create a .hidden file. How to tell that you have created this file? How to show the content of this file?

7. Create a new directory “Misc” and move the file fun into Misc.

8. Without leaving your current working directory (What is it?), create a directory “Vacations” under Misc. Then create calendars for the summer months (June, July and August) of 2020 12, and move them into the Vacations directory.

9. Redo Labworks 2.1.1 and 2.1.2 by testing them out, using appropriate commands. Tell me what happens when you try to create a file with its name being, e.g., “{2bad}”, etc.

12 They will be here soon...

2.4 Background processing

As we mentioned earlier, Unix is a powerful operating system that supports multitasking in terms of multiple processes, each identified with a PID. Since there is only one processor in the machine 13, at any time, one process is running, while the others are simply waiting. The ps command will show the PIDs of all the processes that have been running.

% ps
  PID TTY          TIME CMD
21847 pts/0    00:00:00 bash
22164 pts/0    00:00:00 man
22167 pts/0    00:00:00 sh
22168 pts/0    00:00:00 sh
22173 pts/0    00:00:00 less
25002 pts/0    00:00:00 ps

We can use the kill command to terminate a process, stop to suspend a process, bg to bring it to the background, and fg to bring a process to the foreground. The ‘&’ symbol at the end of a line puts it into the background.

% (sleep 60; echo stop this command) &
[2] 25128
% ps
  PID TTY          TIME CMD
21847 pts/0    00:00:00 bash
22164 pts/0    00:00:00 man
22167 pts/0    00:00:00 sh
22168 pts/0    00:00:00 sh
22173 pts/0    00:00:00 less
25128 pts/0    00:00:00 bash
25129 pts/0    00:00:00 sleep
25130 pts/0    00:00:00 ps

% kill 25129
% -bash: line 42: 25129 Terminated              sleep 60
stop this command

Question: What does “sleep 60” do? You might want to check out the syntax of this “sleep” command. Also change the 60 to other values to see what happens.
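The same background-and-kill cycle can be scripted without reading the PID off the ps listing. This sketch relies on a few standard shell features that the notes have not introduced, so treat them as assumptions here: the variable $! holds the PID of the most recent background command, “kill -0” only checks whether a process exists, and wait collects the exit status of a background process.

```shell
# Start a long-running command in the background.
sleep 30 &
pid=$!            # PID of the background sleep

# kill -0 sends no signal; it only tests whether the process is there.
if kill -0 "$pid" 2>/dev/null; then
    state="running"
else
    state="gone"
fi
echo "process $pid is $state"

# Terminate it, then collect its exit status.
kill "$pid"
wait "$pid" 2>/dev/null
status=$?

# A process ended by a signal reports an exit status greater than 128.
echo "exit status after kill: $status"
```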

13 There are perhaps multiple processing cores, but still limited, as compared with a large number of processes and threads. A lot more details along this line will be discussed in CS 4310 Operating Systems.

2.5 Miscellaneous commands

There are also quite a few other commands, such as grep and crypt. To check them out for details, you can use the manual command “man” to find out their respective syntax and semantics. For example, “grep” essentially does the “search” of a target string in a file, or pattern recognition, which we will explore later on in the Function chapter.

% man grep

GREP(1)                                                  GREP(1)

NAME
       grep, egrep, fgrep - print lines matching a pattern

SYNOPSIS
       grep [options] PATTERN [FILE...]
       grep [options] [-e PATTERN | -f FILE] [FILE...]

DESCRIPTION
       Grep searches the named input FILEs (or standard input if no
       files are named, or the file name - is given) for lines
       containing a match to the given PATTERN. By default, grep
       prints the matching lines.

       In addition, two variant programs egrep and fgrep are
       available. Egrep is the same as grep -E. Fgrep is the same
       as grep -F.

OPTIONS
       -A NUM, --after-context=NUM
              Print NUM lines of trailing context after matching
              lines. Places a line containing -- between contiguous
              groups of matches.

The following shows an application.

% more first.c
#include <stdio.h>
main(){
printf("I feel lucky today\n");
}

% grep printf first.c
printf("I feel lucky today\n");
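A few other common grep options can be tried on a similar file. This is a sketch only: the file is recreated in a scratch directory with a here-document, a shell feature not covered in these notes, and the three options shown (-n for line numbers, -c for a count of matching lines, -i for ignoring case) are just a small selection from the manual page.

```shell
scratch=$(mktemp -d)      # hypothetical scratch location
cd "$scratch"

# Recreate a small C file to search in.
cat > first.c <<'EOF'
#include <stdio.h>
int main(void){
  printf("I feel lucky today\n");
}
EOF

# -n prefixes each matching line with its line number.
grep -n printf first.c
where=$(grep -n printf first.c | cut -d: -f1)

# -c prints only how many lines match.
count=$(grep -c include first.c)

# -i ignores the difference between upper and lower case.
grep -i LUCKY first.c

echo "printf is on line $where; $count line(s) mention include"
```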

Labwork 2.5: How can we use grep to display the two lines that follow each match of “printf” in the above file, first.c, starting with the matching line?

3 A Unix shell

As we mentioned earlier, a Unix shell interprets and executes user commands.

3.1 How does a shell work?

The general process looks like the following:

1. The shell displays a prompt such as ‘% ’.

%

2. You type in a command such as cal.

% cal

3. The shell interprets your command by looking for the appropriate software. If it can’t find it, it will print out an error message.

% wrong
-bash: wrong: command not found

4. If it finds the command, the kernel runs the requested software to give you the output.

% cal 2 2020
   February 2020
Su Mo Tu We Th Fr Sa
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29

5. The shell then prints another prompt and waits for the next command.

%

Note: The above provides a general user interface of a Unix shell, which you will work on as your final project for this course. Such a user interface is also the core of the GUI (Graphical User Interface) of a contemporary operating system, where we just use the mouse and windows to provide the needed information to the operating system for its execution.

3.2 Options and arguments of commands

For the “cal” command, you can specify which month and year you want. Here both the month and the year are the arguments of this command. Sometimes, we have options for a command. For example, we have -a, -F, -l, -r, -s, -t, and -u for the listing command ls, where, e.g., the -F option classifies all the objects within a directory by showing a ‘/’ after each directory and a ‘*’ after each executable file.

% ls
parent testFile testFile.txt
% ls -F
parent/ testFile* testFile.txt

Labwork 3.2:

1. What does “-a” stand for? Find it out and do the same for all the other options of ls, the listing command.

2. Give an example of the usage of each of the aforementioned options.

3.3 Redirection

We already talked about this stuff earlier, in Step 4 of Sec. 2.2, e.g.,

% more result.txt
This is a test

anotherOutput.txt
output.txt
sampleRedirect.txt

The following gets the input from result.txt for the utility “wc”, which then sends its output to another file, “wcOutput.txt”.

% wc < result.txt > wcOutput.txt
% more wcOutput.txt
 5  7 64
%

Question: What does “wc” do? We will check out its implementation details in the C Core unit later, also in the Function chapter.
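The three redirection operators seen so far, ‘>’, ‘>>’ and ‘<’, can be put side by side in a short session. This is a sketch in a scratch directory with made-up file contents; the -l option asks wc for the line count only.

```shell
scratch=$(mktemp -d)      # hypothetical scratch location
cd "$scratch"

# '>' creates the file, or overwrites it if it already exists.
echo "This is a test" > result.txt

# '>>' appends to the end of an existing file.
echo "one more line" >> result.txt

# '<' feeds the file to wc as its standard input, while '>' sends
# the output of wc into another file.
wc -l < result.txt > wcOutput.txt

cat result.txt
cat wcOutput.txt      # the number of lines in result.txt, i.e., 2
```

Running the first command again would wipe out the appended line, since ‘>’ always starts the file afresh.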

Labwork 3.3:

1. Use man to dig out all the details of this wc utility.

2. Run wc to find out only the number of characters in the calendars for all the summer months, i.e., June through August, of 2020.

3.4 Group commands

Besides typing one command at a time, we can also type in several at a time and place them in the same line, which will then be executed sequentially. For example,

% who am i; cal 2 2020
zshen pts/1 2019-11-19 06:46 (vpnuser-1-11.plymouth.edu)
   February 2020
Su Mo Tu We Th Fr Sa
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
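Grouping matters as soon as redirection is involved. The sketch below, run in a scratch directory with made-up file names, contrasts a plain ‘;’ sequence with the parenthesized form that appeared in Sec. 2.4: with parentheses, the whole group shares one redirection; without them, only the last command is redirected.

```shell
scratch=$(mktemp -d)      # hypothetical scratch location
cd "$scratch"

# Two commands on one line, executed one after the other.
echo first; echo second

# With parentheses, the output of both commands goes into both.txt.
(echo first; echo second) > both.txt

# Without parentheses, only "second" goes into last.txt;
# "first" is still printed on the screen.
echo first; echo second > last.txt

cat both.txt
cat last.txt
```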

3.5 Pipes and tees

The pipe command, ‘|’, sends the output of one command as the input to another command. For example,

% ls | more
input
output
output.txt
sampleRedirect

Here, the command ls sends its output, the content of your working directory, through a pipe (you can think of it as a temporary file), which is passed as input to more, which then displays such a content. On the other hand, “tee” does one more thing: besides passing the output of one command to another as input, it also saves the output in a file. In the following, besides passing the output of ls to more to display, it also saves such a content into the file lsOut.

% ls | tee lsOut | more
input
output
output.txt
sampleRedirect
% ls
input lsOut output output.txt sampleRedirect
% more lsOut
input
output
output.txt
sampleRedirect
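A pipeline with tee in the middle can feed any command, not just more. In this sketch, which uses a scratch directory and invented file names, the listing of a separate directory is both saved into a file and counted by wc.

```shell
scratch=$(mktemp -d)      # hypothetical scratch location
cd "$scratch"

# A small directory to list; the file names are made up.
mkdir data
touch data/input data/output data/output.txt

# ls writes into the pipe, tee saves a copy into listing.txt and
# passes everything on, and wc -l counts the lines it receives.
count=$(ls data | tee listing.txt | wc -l)

echo "files listed: $count"
cat listing.txt
```

Listing a separate directory, rather than the working directory, sidesteps the question of whether the file created by tee shows up in its own listing.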

Labwork 3.5:

1. Use “|tee” to list the content of your working directory, and also save them in a separate file.

3.6 Quoting special characters

To prevent confusion, a file name should not contain any of the special characters that Unix uses for some specific purpose. But, when we do need to use any of the special characters, we have to tell the system, using the escape character ‘\’. For example, ‘*’ stands for a wild card, i.e., everything, while “\*” stands for itself.

% echo *
anotherDir anotherOutput.txt a.out hello.c result.txt sampleDir wcOutput.txt
% echo \*
*

We use the grave accent, ‘`’, i.e., the key above the “Tab” key on a standard keyboard 14, to indicate a command that we want the shell to run. For example,

% date
Thu Jan 9 07:22:12 EST 2020

Nothing leads to nothing.

% echo It is not date
It is not date

If we use ‘’’, apostrophe (Decimal code 39), by pressing the key to the left of the Enter key, we get the following:

% echo It is not 'date'
It is not date

On the other hand, we will get the following if we use ‘`’, the grave accent.

% echo It is not `date`
It is not Thu Jan 9 07:25:48 EST 2020

Labwork 3.6:

1. What does each of the following do, and why?

14In the standard ASCII code chart, the binary, Octadecimal, Decimal, and Hexadecimal code of grave accent is 110 0000, 140, 96 and 60, respectively.

% echo *
% echo /*
% echo \*
% echo "\*"
% echo "/*"
% echo
% echo */*

2. Investigate the wild cards such as ‘*’ and ‘?’, and use “”, another utility (What does it do?), to show the contents of all the files with their names ending in “ing”. How do you list all the files with their names containing either ‘x’ or ‘X’?
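The grave-accent substitution of Section 3.6 has a rough counterpart in C: the POSIX function popen() runs a command through the shell and lets the program read its output. The following is only a sketch under that assumption; the helper name first_line_of is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations (popen, pclose) */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run `cmd` through the shell and store the first
   line of its output (without the newline) in buf.
   Returns 0 on success, -1 on failure. */
int first_line_of(const char *cmd, char *buf, size_t size)
{
    FILE *p = popen(cmd, "r");   /* start cmd and read its standard output */
    int ok;

    if (p == NULL)
        return -1;
    ok = fgets(buf, (int) size, p) != NULL;
    pclose(p);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}
```

For example, first_line_of("date", buf, sizeof buf) would leave in buf the same text that the shell substitutes for the grave-accented date.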

4 The editors

There are a few editors available in Unix that we can use to enter a program, such as vi(m), pico and emacs. We once all used pico 15 to edit emails. It is pretty easy to use, but you can only work with one line at a time.

% pico newFile

Unix does come with a full-screen editor, vim.

% vim newFile

Although it is not as easy, you are strongly urged to use vim as the editor. Check out the course page for a vim tutorial.

Labwork 4: Make sure that you know vim reasonably well, before proceeding to the next unit.

5 C programming in Unix

We will now go over how to compile and run a program once you have saved it somewhere. This compilation and execution process is the same for all the other programs that we will come up with in the rest of this course.

15Well, pico is not available in turing, but nano is.

5.1 First thing first

It is pretty easy to compile a C program in Unix, using the cc, or gcc 16 facility. For example, after editing and saving your program, “hello.c”, in your working directory, you can check out its content as follows:

% more hello.c
/* the first program in c */
#include <stdio.h>

int main(void){
    printf("Hello, world\n");
    return 0;
}

The following line compiles this program 17:

% cc hello.c

You now can run it 18

% ./a.out
Hello, world

5.2 Are you bored with it?

You don’t need to always call the executable a.out, for example

% cc hello.c -o hello
% ./hello
I feel lucky today

Here is another example which calculates the reciprocal of an integer.

% more recip.c
/* Compute reciprocals */

#include <stdio.h>

int main(void){
    int n;
    float recip;

    printf("This program computes reciprocals.\n");
    printf("Enter a number: ");
    scanf("%d", &n);
    recip=1/(float) n;   /* the cast forces a floating-point division */
    printf("The reciprocal of %d is %f.\n", n, recip);
    return 0;
}
% cc recip.c -o recip
% ./recip
This program computes reciprocals.
Enter a number: 34
The reciprocal of 34 is 0.029412.

16 Later on, when we work with C++ programs in Unit 9, we use the g++ command.
17 Technically, you might also want to go through the following section on debugger. But, it might serve the purpose better if you get some C programming experience first.
18 Revisit Sec. 2.1 as to why we have to prefix “a.out” with “./”.
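The cast in recip.c matters because dividing one int by another discards the fraction. A minimal sketch of the difference follows; both function names are hypothetical and are not part of recip.c.

```c
/* Hypothetical helpers contrasting the two kinds of division. */

int truncated_reciprocal(int n)
{
    return 1 / n;            /* int / int: the fractional part is discarded */
}

float reciprocal(int n)
{
    return 1 / (float) n;    /* n becomes a float first, so the division is done in floating point */
}
```

For n = 34, the first function returns 0, while the second returns a value close to 0.029412, as in the run above.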

5.3 Speak mathematically

When using a mathematical function, we have to add on the -lm option to link to the mathematics library. For example,

% more sqroot.c
/* Compute square roots */

#include <stdio.h>
#include <math.h>

int main(void){
    float n, x, root;

    printf("This program computes square root.\n");
    printf("Enter a number: ");

    //printf("%3.1f", sqrt(32));
    scanf("%f", &n);
    x=n;
    printf("The number you entered is %f: ", x);
    printf("The square root of %f is %f.\n", n, sqrt(n));
    return 0;
}

We now bring in “lm”, a library of the math functions 19, including that of sqrt(n), to serve the compiling purpose.

19 Check out the relevant link in the course page.

% cc sqroot.c -lm

Ready to run?

% ./a.out
This program computes square root.
Enter a number: 34
The number you entered is 34.000000: The square root of 34.000000 is 5.830952.

5.4 Cut it into pieces

We can also work with a program consisting of multiple files, just as we did with Java. For example,

% more main.c
/* Illustrate multiple source files */
void chicago(void);
void indiana(void);

int main(void){
    chicago();
    indiana();
    chicago();
    return 0;
}

% more chicago.c
/* The chicago() function */
#include <stdio.h>

void chicago(void){
    printf("\nI'm waiting at O'Hare International,\n ");
    printf("Airport, a fun place.\n");
}

% more indiana.c
/* The indiana() function */
#include <stdio.h>

void indianapolis(void);

void indiana(void){
    printf("Back home again, Indiana.\n");
    indianapolis();
    printf("Wander Indiana--come back soon!\n ");
}

% more indy.c
/* The indianapolis() function */
#define POP2000 1.6
#include <stdio.h>

void indianapolis(void){
    printf("\nWelcome to Indianapolis, Indiana.\n");
    printf("Population: %f million.\n", POP2000);
}

Compile all the pieces individually. Notice the “-c” option 20.

% cc -c main.c
% cc -c chicago.c
% cc -c indiana.c
% cc -c indy.c

Link all the object files, i.e., the “.o” files.

% cc *.o

Run the executable.

% ./a.out

I'm waiting at O'Hare International,
Airport, a fun place.
Back home again, Indiana.

Welcome to Indianapolis, Indiana.
Population: 1.600000 million.
Wander Indiana--come back soon!

I'm waiting at O'Hare International,
Airport, a fun place.
%

Notice that, at the end, the shell prompt comes back, ready for the next command.

20Check out the other popular “cc” options in the course page.

Labwork 5:

1. To help you better organize the fifty or so programs that you will soon develop in this course, come up with a directory structure under your working directory, for all the chapters that we will cover in this course. Each chapter, e.g., Core, is assigned its own directory. And some of them, such as unixNotes, must have several sub-directories, e.g., one for the current section, editor. A sample, and partial structure, is given as follows for your reference:

% pwd
/home/zshen/Courses/CS2470
% tree
.
+-- Core
+-- Function
+-- Type
+-- unixNotes
    +-- editor
        +-- hello.c
        +-- multi
            +-- chicago.c
            +-- indiana.c
            +-- indy.c
            +-- main.c

6 directories, 5 files

2. Use an editor, e.g., vim, to add in all the programs that we have gone through in this section, and put them under editor. The above structure shows hello.c, and a group of files, as shown in Section 5.4, collected under the sub-directory, multi.

3. Test out all these program samples, and make sure they all run smoothly.

4. Let me know that “I’m done.”. You don’t need to send in anything else.

6 A debugger

A C debugger, gdb, is a very useful tool for a C programmer, similar to what jdb could do for you if you program in Java. To help us find programming errors, it allows us to check out various snapshots of the execution of a program, namely, the values of various variables, and to determine whether a particular line in a program will be the next one to be executed, allowing us to find logic errors. We can also use it to execute one line at a time, instead of letting the whole thing go through in a single shot. We can also use it to observe how a program behaves with a collection of choice points. All these features help us to diagnose whether the program is what we have in mind, and whether it does the right thing.

We will now check out these possibilities through a particular debugger, gdb, that accompanies cc. Let’s use the following code as a running example, which prints out the value of sum at the end of its execution.

1. #include <stdio.h>
2.
3. main(){
4.   int i, sum;
5.
6.   sum=0;
7.   for(i=0; i<10; i++)
8.     if(i<5)
9.       sum=sum+1;
10.    else
11.      sum=sum+((i-3)/2+(i/3)); //line 11
12.
13.  printf("sum=%d\n", sum);
14. }

We can certainly run it as follows.

% cc sum.c
% ./a.out
sum=24

Let’s send it through the debugger gdb:

% cc -g sum.c
% gdb a.out
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-".
For bug reporting instructions, please see:
...
Reading symbols from /home/zshen/Courses/CS2470/programs/Debugger/a.out...done.
(gdb)

In the above, to facilitate the debugger, we use the “-g” switch to include additional information in the executable, such as the symbol table built during the compilation process 21. Moreover, since we do not ask for any optimization here, none is done when compiling the code, which makes debugging less complicated. After compiling, we run the debugger on the executable a.out, which simply loads the executable into the debugger environment. We are now ready to execute various debugger commands. Once any such command is done, control comes back to the debugger prompt

(gdb)

Before we get into something, we always want to know how to get out.

Question: How to get out of gdb?

(gdb) quit
%

6.1 Set break points

It is often useful to execute a program partially, i.e., through part of a program, which can be done by setting a break point. It tells the debugger to run the program until that break point is reached, where the program pauses. We can then do other things to the paused program. For example,

(gdb) break 11
Breakpoint 1 at 0x400541: file sum.c, line 11.
(gdb) run
Starting program: /home/zshen/Courses/CS2470/programs/Debugger/a.out

Breakpoint 1, main () at sum.c:11
11 sum=sum+((i-3)/2+(i/3));
Missing separate debuginfos, use: debuginfo- glibc-2.17-260.el7_6.3.x86_64
(gdb)

Question: What should be the value of i at this point?
Answer: 5, since if i were less than 5, the program would go through the then part instead.

Question: How do we find it out?
Answer: We can certainly check this out using another debugger command, display:

(gdb) display i
1: i = 5
(gdb)

21 If you take CS 3221 with me, this is an example that I gave as an application of the hash table.

Question: What should be the value of sum at this point?
Answer: Also 5, since we went through from i=0 through i=4.

Question: How do we find it out?
Answer:

(gdb) display sum
2: sum = 5

We can also check out where the program is paused.

(gdb) where
#0 main () at sum.c:11

6.2 How to proceed from a break point?

There are three ways: step, next and continue.

1. The step command executes the next line in the logic flow and then pauses again, i.e., “one step at a time”.

Question: What is the next line at this point in the logic flow?
Answer: Since line 11 is the last line of the loop body, the control has to go back to the for line, i.e., line 7.

Question: How do we find it out?
Answer: Use step.

(gdb) step
7 for(i=0;i<10;i++)
1: i = 5

Question: Then what?

(gdb) step
8 if(i<5)
2: sum = 7
1: i = 6
(gdb)

The first step executes line 11, and brings the control to the loop at line 7, when the value of i is still 5. The next step does the increment of i and brings the control to line 8, where the value of i turns into 6.

2. The next command does the same, except that, if the next line is a function call, it completes everything that call does before pausing again.

(gdb) next

Breakpoint 1, main () at sum.c:11
11 sum=sum+((i-3)/2+(i/3));
1: i = 6
(gdb) next
7 for(i=0;i<10;i++)
1: i = 6
(gdb) next
8 if(i<5)
1: i = 7

The next command is often preferable to step.

3. The continue command lets the execution continue until either a breakpoint is reached, the program exits normally, or it tries to do something illegal.

(gdb) continue
Continuing.

Breakpoint 1, main () at sum.c:11
11 sum=sum+((i-3)/2+(i/3));
1: i = 7

6.3 Check out the values

There are two ways to observe the value of a variable: print and display.

1. The print command is a one-time request to see the value. The debugger shows its current value then forgets about it.

(gdb) print i
$1 = 7

2. As we have already seen, the display command shows it on an ongoing basis: The debugger will show its value every time the program is paused.

6.4 A final example

The program sum.c generates different values of sum for different values of i. We might want to see the big picture as to which specific values of sum are generated for which specific values of i. What we can do is to set break points at lines 8 and 9, and display the values of sum and i in that context.

% gdb a.out
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
...
Reading symbols from /home/zshen/Courses/CS2470/programs/Debugger/a.out...done.
(gdb) break 8
Breakpoint 1 at 0x400535: file sum.c, line 8.
(gdb) break 9
Breakpoint 2 at 0x40053b: file sum.c, line 9.
(gdb) run
Starting program: /home/zshen/Courses/CS2470/programs/Debugger/a.out

Breakpoint 1, main () at sum.c:8
8 if(i<5)
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.3.x86_64
(gdb) display i
1: i = 0
(gdb) step

Breakpoint 2, main () at sum.c:9
9 sum=sum+1;
1: i = 0
(gdb) display sum
2: sum = 0
(gdb) continue
Continuing.

Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 1
1: i = 1
(gdb) step

Breakpoint 2, main () at sum.c:9
9 sum=sum+1;
2: sum = 1
1: i = 1
(gdb) continue
Continuing.

Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 2
1: i = 2
(gdb) continue
Continuing.

Breakpoint 2, main () at sum.c:9
9 sum=sum+1;
2: sum = 2
1: i = 2
(gdb) step
7 for(i=0;i<10; i++)
2: sum = 3
1: i = 2
(gdb) continue
Continuing.

Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 3
1: i = 3
(gdb) continue
Continuing.

Breakpoint 2, main () at sum.c:9
9 sum=sum+1;
2: sum = 3
1: i = 3
(gdb) continue'
Unmatched single quote.
(gdb) continue
Continuing.

Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 4
1: i = 4
(gdb) continue
Continuing.

Breakpoint 2, main () at sum.c:9
9 sum=sum+1;
2: sum = 4
1: i = 4
(gdb) continue
Continuing.

//When i<5, we just add 1 to sum--zs
Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 5
1: i = 5
(gdb) continue
Continuing.

//When i>=5, the rule changes, and the
//first such value for sum becomes 7--zs
Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 7
1: i = 6
(gdb) continue
Continuing.

Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 10
1: i = 7
(gdb) continue
Continuing.

Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 14
1: i = 8
(gdb) continue
Continuing.

Breakpoint 1, main () at sum.c:8
8 if(i<5)
2: sum = 18
1: i = 9
(gdb) continue
Continuing.
sum=24
[Inferior 1 (process 102194) exited with code 07]
(gdb)
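The values seen in this session can be cross-checked without the debugger. The sketch below repeats the loop of sum.c inside a hypothetical helper, sum_through, which is not part of the original program; it returns the value of sum once the loop body has run for i = 0 through last_i.

```c
/* Hypothetical helper: the value of sum in sum.c after the loop body
   has been executed for i = 0, 1, ..., last_i. */
int sum_through(int last_i)
{
    int i, sum = 0;

    for (i = 0; i <= last_i; i++)
        if (i < 5)
            sum = sum + 1;
        else
            sum = sum + ((i - 3) / 2 + (i / 3));   /* integer divisions, as on line 11 */
    return sum;
}
```

When the debugger pauses at line 8 with i = k, the body has been executed for i = 0 through k-1, so the displayed sum should equal sum_through(k-1); sum_through(9) gives the final value printed by the program.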

The order of the break points does not matter: the debugger will pause whenever a break point is reached.

Break points can be removed using the clear command.

(gdb) clear 9
Deleted breakpoint 2
(gdb) continue
Continuing.

As we just saw, once we are done, we simply quit.

(gdb) quit
A debugging session is active.

Inferior 1 [process 18704] will be killed.

Quit anyway? (y or n) y
%

Incidentally, this debugger also works with g++, i.e., with C++ programs.

Labwork 6

1. Play with the debugger with the sum.c program, and familiarize yourself with its features.

2. Find a non-trivial program in C that you did this semester. Identify one or more aspects that you want to investigate, in particular, a malfunctioning feature of your program, i.e., something that you did wrong. Then use various features of the gdb debugger to locate the bugs, and come up with a correct program.

7 shell scripts

A very powerful feature of Unix is that we can group together various Unix commands to come up with a script, or a little program, that the shell will execute. For example, the following script, simpleShell, contains the following lines:

% more simpleShell
#A simple shell script
cal
date
who

Here ‘#’ indicates a comment. We can then run the program by using a redirection:

% more simpleShell
#A simple shell script
cal
date
who

% sh < simpleShell
     April 2020
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30

Wed Apr 8 19:57:45 EDT 2020
zshen pts/11 2020-04-08 19:39 (vpnuser-1-119.plymouth.edu)
%

We can also make this file executable by changing its access privilege using chmod, which was discussed in Item 8 in Section 2.2:

% chmod u+x simpleShell
% ./simpleShell
     April 2020
Su Mo Tu We Th Fr Sa
          1  2  3  4
 5  6  7  8  9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30

Wed Apr 8 19:59:20 EDT 2020
zshen pts/11 2020-04-08 19:39 (vpnuser-1-119.plymouth.edu)
%

In this sense, the shell becomes a system programming language, where the control structures are pretty much the same as in other languages, but the variables are different: besides user-defined variables, there are two other kinds, environment variables and positional variables. Environment variables keep values for special shell variables such as HOME, TERM and PATH, referring to your home directory, the type of terminal you use, and where the system should look for commands, respectively. They set the scene for everything, but we will not dwell on them here, to avoid a mess.

7.1 Positional variables

Positional variables are used to capture the command-line arguments used by the shell script, numbered 0, 1, 2, ..., 9. Similar to the argv[] usage (Cf. Page 35, notes of Pointers and Arrays Chapter), $0 keeps the name of this very program, $1 the first argument, etc. For example, the following is the echo.args script.

% more echo.args
#!/bin/sh
#Illustrate the use of positional variables
echo $0 $1 $2 $3 $4 $5 $6 $7 $8 $9

Notice that the very first line

#!/bin/sh

specifies that we will always use a particular shell, sh, to run the script. Thus,

% ./echo.args I love Spring, but is it coming soon?
./echo.args I love Spring, but is it coming soon?
%

Positional variables $1 through $9 are also collected as $*. Their number is denoted as $#. For example,

% more echoRead.args
#!/bin/sh
#Illustrate the use of positional parameters, user-defined variables and
#the read command.
echo 'What is your name?'
read name
echo "Well, $name, you typed $# arguments:"
echo $0 $*

Thus,

% ./echoRead.args This is great!
What is your name?
Zhizhang
Well, Zhizhang, you typed 3 arguments:
./echoRead.args This is great!

You can also set up those variables using the set command, which is very useful. For example,

% more weeklyReminder
#!/bin/sh
# A daily reminder service
set `date`
echo "Remember for today:"
case $1 in
Mon) echo "Plan the week.";;
Tue) echo "Take clothes to the cleaners.";;
Wed) echo "Attend group meeting.";;
Thu) echo "Make plans for the weekend."; echo "Pick up clothes at the cleaners.";;
Fri) echo "Answer e-mail";;
Sat) echo "You should not be here working."; echo "Finish your work and log off.";;
Sun) echo "Call Grandma and Grandpa.";;
esac

Let’s try to run this piece out....

% date
Wed Apr 8 20:01:23 EDT 2020
% chmod u+x weeklyReminder
% ./weeklyReminder
Remember for today:
Attend group meeting.
%

I have to attend group meeting Wednesday.

Just one more example, setDate,

% more setDate
#!/bin/sh
#Demonstrate the set command
set `date`
echo "Time: $4 $5"
echo "Day: $1"
echo "Date: $3 $2 $6"

Note that when you type this `date` stuff, you have to use the grave accent, i.e., the key above the Tab key and to the left of 1.

% chmod u+x setDate
% ./setDate
Time: 20:03:57 EDT
Day: Wed
Date: 8 Apr 2020
%

Question: What is going on?

Answer: The grave-accented `date` runs the date command, and set assigns the words of its output

Wed Apr 8 20:03:57 EDT 2020

as the values of the positional parameters $1, $2, $3, $4, $5 and $6.
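For comparison with C, where command-line words arrive in argv[], the following sketch splits a line into blank-separated fields in a similar spirit to what set does with the output of date. The helper split_fields is a hypothetical name used for illustration, and it ignores the shell's quoting and wild card rules.

```c
#include <string.h>

/* Hypothetical helper: split `line` in place at blanks, tabs and newlines.
   fields[0] plays the role of $1, fields[1] of $2, and so on.
   Returns the number of fields found (at most max). */
int split_fields(char *line, char *fields[], int max)
{
    int n = 0;
    char *word = strtok(line, " \t\n");

    while (word != NULL && n < max) {
        fields[n++] = word;
        word = strtok(NULL, " \t\n");
    }
    return n;
}
```

Applied to the line shown above, fields[0] would be "Wed" and fields[3] would be "20:03:57", matching $1 and $4 in setDate.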

Labwork 7.1:

1. Play with all the scripts and really understand them.

2. Write a script to come up with a daily reminder, e.g., for the week days,

(a) If it is about 6 a.m., get up
(b) If it is about 7 a.m., have breakfast
(c) If it is about 8 a.m., start the first class
(d) If it is about 10 p.m., time to do homework

If it is either Saturday or Sunday, do something appropriate for the weekend, e.g., cleaning the backyard.

Note: For the “about” part, you might put in a tolerance, say, “about 6 a.m.” is the segment between 5:55 and 6:05 in the morning. You might also use a wild card, e.g., “05:5*” stands for the period from 5:50 through 5:59. Do some research on the Web to dig out more relevant expressions. In either case, you certainly need to investigate the ancient syntax for the conditional structure as used in shell.

7.2 User-defined variables

Please don’t try the following delFile, if you don’t want to accept the consequence.

% more delFile
#!/bin/sh
#Delete a file interactively
filename=$1
if test ! -f $filename
then
    echo "There is no file \"$filename\"."
else
    echo "Do you want to delete \"$filename\"?"
    read choice

    if test $choice = y
    then
        rm $filename
        echo \"$filename\" deleted
    else
        echo \"$filename\" not deleted.
    fi
fi

% ./delFile fakedir
There is no file "fakedir".
% ls testFile
testFile
% ./delFile testFile
Do you want to delete "testFile"?
y
"testFile" deleted
% ls testFile
ls: cannot access testFile: No such file or directory

Labwork 7.2: Modify the delFile script so that it tests whether the item to be deleted is a directory; in this case, it should call rmdir instead of rm to delete it. Otherwise, it does nothing. Notice that the following line tests whether $dirname is a directory: test -d sends back True if the value of $dirname holds a directory’s name, and ‘!’ negates that result.

if test ! -d $dirname

8 Unix system interface

Unix provides its services via a set of system calls, which can be used by user programs. We now look at a few important and useful system calls. Since most of Unix is written in C, such system calls also give us an insightful look at C.

8.1 Look at one call in details

We will experiment with an important Unix system call, fork(), which creates a new process. When a process creates a child process, the system simply creates a copy of the calling process (the parent process), except that the child process has its own identification number, pid, and its own pointers to shared kernel entities 22. After fork() is called, both processes execute the statement following the fork() call, each with its own 23 data space. If the fork() call succeeds, it returns the pid of the newly created child process in the parent process, and it returns 0 in the child process. This tells apart the execution of the two processes, as we will see in the later demos.

It may seem like overkill to create another process to carry out a certain computation, but it is actually essential, at least from a reliability point of view: if the newly created child process does anything wrong, it might crash, while the parent process will still run normally. This technique is used extensively. For example, after the Unix system is booted, a process 0 is created, which is the ancestor of every process created during that session. Whenever a command is to be executed, or whenever a user is to log on, a new process is created.

For those who have taken, or are taking, CS3221 Algorithms: there we discussed its usage in spawning processes to potentially take care of parallel tasks. This important tool will be discussed in much more detail through projects assigned in CS4310 Operating Systems. You are also expected to use this technique when taking on the final project on shell for this course.
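As a minimal sketch of the two return values of fork() and of the separate data spaces described above, consider the following helper. The name child_sees and its parameters are hypothetical; waitpid() and the WIFEXITED and WEXITSTATUS macros are standard POSIX facilities for collecting the child's exit status.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that increments its own copy of
   `value` and reports it as its exit status (so value+1 must be below 256).
   Stores the parent's copy in *parent_value_after.
   Returns the child's exit status, or -1 on failure. */
int child_sees(int value, int *parent_value_after)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;               /* fork failed: no child was created */
    if (pid == 0) {              /* fork() returned 0: we are the child */
        value++;                 /* changes the child's copy only */
        _exit(value);            /* leave at once, without flushing stdio buffers */
    }
    /* fork() returned the child's pid: we are the parent */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    *parent_value_after = value; /* still the original value */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling child_sees(88, &after) should return 89, the value seen by the child, while after stays at 88 in the parent.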

8.1.1 An example

Below is a simple example, fork.c, involving the fork() call.

/* Slight modification of Richard Stevens example from "Advanced Programming in the UNIX Environment" */

#include <stdio.h>      /* printf */
#include <stdlib.h>     /* exit */
#include <sys/types.h>  /* pid_t */
#include <unistd.h>     /* write, fork, getpid, sleep */

int globalVar = 6; /* external variable in the initialized data */
char buf[] = "Write to the standard output\n";

int main(void){

    int localVar; /* local (automatic) variable on the stack */
    pid_t pid;

    localVar = 88;
    if (write(STDOUT_FILENO, buf, sizeof(buf)-1) != sizeof(buf)-1){
        printf("write error");
        exit(1);
    }
    printf("Now, the fork starts\n");

    if ( (pid = fork()) < 0){
        printf("fork error\n");
        exit(1);
    }
    else if (pid == 0) { /* the child */
        globalVar++;
        localVar++;
    }
    else /* the parent */
        sleep(2);

    printf("pid = %d, globalVar = %d, localVar = %d\n", getpid(), globalVar, localVar);
    exit(0);
}

22 On the other hand, the system call vfork() creates a child process that shares its data space with its parent.
23 This is not true with the vfork() call, which we will work with in CS4310 Operating Systems.

Labwork 8.1.1: Run the program, then check it out carefully so that you can answer the following questions:

1. What are the process ids for the child and the parent processes?

2. Which parts are done by both processes, by the child only, and by the parent only?

3. Why does only the child process increment the two variables, i.e., globalVar and localVar? Why are both of them affected?

4. Which process(es), child and/or parent, execute(s) the sleep command? Why is that?

8.1.2 Another example

More realistically, a parent wants to create a process to get something done, and will actually wait until the newly created process completes. This is quite similar to the final project, where the parent process might want to fork off a child process to do, e.g., the ls command, and will wait there until it is done. The following code, consisting of two files, demonstrates this situation.

/* The parent.c file */
#include <stdio.h>     /* printf */
#include <stdlib.h>    /* exit */
#include <sys/wait.h>  /* wait */
#include <unistd.h>    /* fork, execve, getpid, sleep */

int main(){
    if(fork()==0){
        /* NULL for the argument and environment lists is accepted on Linux, but is not portable */
        execve("child", NULL, NULL);
        exit(0);
    }
    printf("Process[%d]: Parent in execution...\n", getpid());
    sleep(2);
    if(wait(NULL)>0)
        printf("Process[%d]: Parent detects terminating child \n", getpid());
    printf("Process[%d]: parent terminating...\n", getpid());
    return 0;
}

/* The child.c file */
#include <stdio.h>   /* printf */
#include <unistd.h>  /* getpid, sleep */

int main(){
    printf("Process[%d]: child in execution...\n", getpid());
    sleep(2);
    printf("Process[%d]: child terminating...\n", getpid());
    return 0;
}

Notice that we have to name the executable file associated with child.c as child.

% cc child.c -o child

We can then compile and run parent.c and get the following:

% cc parent.c
% ./a.out
Process[9715]: Parent in execution...
Process[9716]: child in execution...
Process[9716]: child terminating...
Process[9715]: Parent detects terminating child
Process[9715]: parent terminating...

The parent program creates a process, which uses the system function execve(3) to execute the child program compiled from child.c. The parent then calls the system function wait to block itself until the OS kernel signals it to continue, because one of its child processes has just terminated.
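The fork, exec and wait steps described above can be folded into one helper, sketched below. The name run_command is hypothetical, and the sketch swaps execve() for execvp(), another member of the exec family, which looks the program up along PATH and keeps the current environment; the notes' own examples use execve().

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program named by argv[0] with the
   NULL-terminated argument list argv, wait for it to finish, and
   return its exit status, or -1 on failure. */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* returns only if the program could not be started */
        _exit(127);              /* the status shells conventionally use for "command not found" */
    }
    if (waitpid(pid, &status, 0) < 0)   /* block until this child terminates */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With char *cmd[] = { "ls", "fake", NULL }, a call run_command(cmd) would run ls on the fake subdirectory and hand back its exit status.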

Note: Check out the link regarding another example of using execve(3) on the course page.

8.1.3 Scripts again?

Anything executable, including the shell scripts discussed in Section 7, can be used to start a child process in this forking context. We start with the following parent1.c file.

/* parent1.c */
#include <stdio.h>     /* printf */
#include <stdlib.h>    /* exit */
#include <sys/wait.h>  /* wait */
#include <unistd.h>    /* fork, execve, getpid, sleep */

int main(int argc, char * argv[]){
    if(fork()==0){
        char *cmd[] = { "delFile", argv[1], NULL };

        //How many parameters do you see?
        if(argc==1)
            execve("child", NULL, NULL);
        else if(argc==2)
            execve("delFile", cmd, NULL);
        else exit(1);
        exit(0);
    }
    printf("Process[%d]: Parent in execution...\n", getpid());
    sleep(2);
    if(wait(NULL)> 0)
        printf("Process[%d]: Parent detects terminating child \n", getpid());
    printf("Process[%d]: parent terminating...\n", getpid());
    return 0;
}

Below is the original child.c file.

% more child.c
#include <stdio.h>   /* printf */
#include <unistd.h>  /* getpid, sleep */

int main(){
    printf("Process[%d]: child in execution...\n", getpid());
    sleep(2);
    printf("Process[%d]: child terminating...\n", getpid());
    return 0;
}

We now have the delFile script that we saw earlier.

% more delFile
#!/bin/sh
#Delete a file interactively
filename=$1
if test ! -f $filename
then
    echo "There is no file \"$filename\"."
else
    echo "Do you want to delete \"$filename\"?"
    read choice

    if test $choice = y
    then
        rm $filename
        echo \"$filename\" deleted
    else
        echo \"$filename\" not deleted.
    fi
fi
%

We compile the child.c into an executable child,

% cc child.c -o child

then compile the parent1.c program.

% cc parent1.c

If we call it without giving it any parameter, thus argc=1, the executable of parent1.c executes the child program.

% ./a.out
Process[31193]: Parent in execution...
Process[31194]: child in execution...
Process[31194]: child terminating...
Process[31193]: Parent detects terminating child
Process[31193]: parent terminating...

On the other hand, if we call it with a parameter, e.g., fakeFile, thus argc=2, it calls the delFile script to delete the file with that name.

% ./a.out fakeFile
Process[31221]: Parent in execution...
Do you want to delete "fakeFile"?
y
"fakeFile" deleted
Process[31221]: Parent detects terminating child
Process[31221]: parent terminating...
%

The above situation is pretty similar to what you have to do in the final project as well, as different commands need different numbers of parameters. Even the same command might need different parameters to do different things. For example, ls will list all the stuff in the current directory, while ls fake will list the content of the fake subdirectory.

Labwork 8.1.3:

1. Play with the programs as shown in this section. You may change some of the stuff, such as the time that the child process waits, to see if there is any change of the printout.

2. Find out the general format, and discuss the various usages, of those “strange” functions, particularly, the exec family, fork(), getpid(), and wait(). You may need these when completing the project....

You now should be ready to work on the very profitable project.

8.2 An example: Listing directories

After writing something simple, let’s now read something that is a lot more complicated, which is taken from the textbook[2, §8]. What we have discussed so far concerns what is in a file, i.e., its content. Sometimes, we also need to know something about a file. For example, a Unix service, ls, which we often use, lists the names of files in a directory, together with other information such as their size, permissions, and so on.

There are two aspects of this issue: Since a directory is just a file, we only need to get access to this file and print out its content. But it is necessary to use a system call to get other information, such as the sizes of files. We will show how to write such a program, fsize, that finds out the size of a file; when the argument is a directory, it recursively finds out the sizes of all the files contained in that directory.

As we mentioned in § 2.1, a directory is a file that consists of a list of filenames with some other information, including the location of those files. Technically, the location of a file is given as an index into another table called the inode list. The inode for a file is where all the information about that file, except its file name, is kept. Thus, an entry in a directory about a file contains only two pieces: a file name and an index number.

However, the format and precise contents of a directory vary with the system. Thus, we have to specify a standard interface to deal with such a variety, by dividing the directory listing task into two pieces. The outer level defines a structure, Dirent, and three routines, opendir, readdir and closedir, to provide system-independent access to the file name and the associated inode number in a directory entry. We will write fsize with this interface in mind. Then, we will show how to implement such a structure and routines on the directory structures of Version 7 and System V Unix. The advantage of such an approach is that, when given a different system, we only need to revise the implementation of such an interface, rather than changing everything.
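Current systems already ship library routines with the same three names, declared in the POSIX header <dirent.h>. As a hedged sketch of the open, read and close pattern, the helper below counts the entries of a directory. Note the differences from the interface developed in these notes: the POSIX entry type is struct dirent and its name field is d_name. The function name count_entries is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: count the entries of a directory, skipping "."
   and "..". Returns -1 if the directory cannot be opened. */
int count_entries(const char *dirname)
{
    DIR *dfd = opendir(dirname);
    struct dirent *dp;
    int n = 0;

    if (dfd == NULL)
        return -1;
    while ((dp = readdir(dfd)) != NULL) {   /* one directory entry per call */
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;                       /* skip the directory itself and its parent */
        n++;
    }
    closedir(dfd);
    return n;
}
```

A recursive directory walk follows the same loop, calling a function on each entry's name instead of counting.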

8.2.1 Given a standard interface...

The Dirent structure contains the name of a file and its associated inode index number. The declaration of this structure and other routines are collected in a file dirent.h as the standard interface we can work with.

#define NAME_MAX 14 /* longest file name */

typedef struct { /* portable directory structure */
    long ino; /* inode number */
    char name[NAME_MAX+1]; /* name + '\0' terminator */
} Dirent;

typedef struct { /* a minimal directory structure with no buffer */
    int fd; /* file descriptor for a directory */
    Dirent d; /* the Dirent for this directory */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);

The system call stat takes a file name and returns all of the information in the inode for that file, or -1 if there is an error.

char *name;
struct stat stbuf;
int stat(char *, struct stat *);

With the above declarations, the call looks as follows:

stat(name, &stbuf);

which fills the structure stbuf with the inode information for the file called name. The structure stat, as declared in <sys/stat.h>, typically contains the following information:

    struct stat { /* inode information returned by stat */
        dev_t  st_dev;    /* device of inode */
        ino_t  st_ino;    /* inode number */
        short  st_mode;   /* mode bits */
        short  st_nlink;  /* number of links to file */
        short  st_uid;    /* owner's user id */
        short  st_gid;    /* owner's group id */
        dev_t  st_rdev;   /* for special files */
        off_t  st_size;   /* file size in characters */
        time_t st_atime;  /* time last accessed */
        time_t st_mtime;  /* time last modified */
        time_t st_ctime;  /* time inode last changed */
    };
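As a small illustration of the call, here is a minimal sketch that wraps stat and returns only the st_size field. It assumes a present-day POSIX system; the helper name file_size is hypothetical and is not part of the notes.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* file_size (hypothetical helper): size of the file called name,
   in bytes, or -1 if stat cannot access it */
long file_size(const char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1)   /* stat reports failure with -1 */
        return -1;
    return (long) stbuf.st_size;    /* size as recorded in the inode */
}
```

A caller only has to compare the result with -1 to tell an inaccessible file from a valid size.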

Now we have some idea of where the information shown under a file's properties is kept, and of how the OS knows who has what access rights to a file. The types dev_t and ino_t are defined in <sys/types.h>, which should be included as well. The st_mode entry contains a set of flags that further describe the nature of the file in question. The following is just a part of that description, which is also defined in <sys/stat.h>.

    #define S_IFMT  0160000 /* type of file */
    #define S_IFDIR 0040000 /* directory */
    #define S_IFCHR 0020000 /* character special */
    #define S_IFBLK 0060000 /* block special */
    #define S_IFREG 0100000 /* regular */
    /*...*/
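The flags are used by masking st_mode with S_IFMT and comparing the result with one of the type values. A minimal sketch, assuming the definitions supplied by <sys/stat.h> on a present-day POSIX system (the exact bit values there may differ from the ones listed above); the function name file_kind is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* ask the headers for the S_IF* macros */
#include <sys/types.h>
#include <sys/stat.h>

/* file_kind (hypothetical helper): describe the file type stored in mode */
const char *file_kind(mode_t mode)
{
    switch (mode & S_IFMT) {        /* keep only the file type bits */
    case S_IFDIR: return "directory";
    case S_IFCHR: return "character special";
    case S_IFBLK: return "block special";
    case S_IFREG: return "regular";
    default:      return "other";
    }
}
```

The permission bits in the low part of the mode do not affect the result, because the mask removes them before the comparison.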

We are now ready to write the program fsize. If the mode st_mode tells us that the file is not a directory, its size st_size can be printed immediately. Otherwise, we have to process that directory one file at a time. The process is recursive, since such a directory might contain subdirectories. Let’s look at it.

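    #include <stdio.h>
    #include <string.h>
    #include "syscalls.h"
    #include <sys/types.h> /* typedefs such as dev_t and ino_t */
    #include <sys/stat.h>  /* structure returned by stat */
    #include "dirent.h"

    void fsize(char *);

    /* print the sizes of the files named as arguments */
    int main(int argc, char *argv[])
    {
        if (argc == 1)             /* default: current directory */
            fsize(".");
        else
            while (--argc > 0)
                fsize(*++argv);
        return 0;
    }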

The function fsize prints the size of the file. If the file is a directory, it first calls a function dirwalk to process all the files within that directory.

    int stat(char *, struct stat *);
    void dirwalk(char *, void (*fcn)(char *));

    void fsize(char *name)
    {
        struct stat stbuf;

        if (stat(name, &stbuf) == -1) {
            fprintf(stderr, "fsize: can't access %s\n", name);
            return;
        }
        if ((stbuf.st_mode & S_IFMT) == S_IFDIR)  /* check the st_mode flags */
            dirwalk(name, fsize);                 /* go ahead with the guts first */
        /* prints out information for this file or directory */
        printf("%8ld %s\n", stbuf.st_size, name);
    }

The function dirwalk applies a function, as given in terms of a pointer, to each file (subdirectory) in a directory. The general flow is clear: it opens the directory, goes through all the files within with the function, then closes the directory and returns. To find out the size of a file, it calls fsize indirectly.

    #define MAX_PATH 1024

    void dirwalk(char *dir, void (*fcn)(char *))
    {
        char name[MAX_PATH];
        Dirent *dp; /* portable interface */
        DIR *dfd;

        /* calls opendir to open the directory */
        if ((dfd = opendir(dir)) == NULL) {
            fprintf(stderr, "dirwalk: can't open %s\n", dir);
            return;
        }
        while ((dp = readdir(dfd)) != NULL) {
            if (strcmp(dp->name, ".") == 0 || strcmp(dp->name, "..") == 0)
                continue; /* skip itself and parent, see Section 2.1 */
            if (strlen(dir) + strlen(dp->name) + 2 > sizeof(name))
                /* dir/dp->name plus '\0' needs more than MAX_PATH characters */
                fprintf(stderr, "dirwalk: name %s/%s too long\n", dir, dp->name);
            else {
                sprintf(name, "%s/%s", dir, dp->name);
                /* fcn is fsize when dirwalk is called from fsize */
                (*fcn)(name);
            }
        }
        /* Done when no more files are left */
        closedir(dfd);
    }
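The length test and the sprintf call in dirwalk can also be combined into a single step. The following is a minimal sketch, not the approach taken in these notes: it assumes a C99 library that provides snprintf, and the helper name join_path is hypothetical.

```c
#include <stdio.h>

/* join_path (hypothetical helper): write "dir/name" into out, which
   holds outsize bytes; return 0 on success, -1 if it does not fit */
int join_path(char *out, size_t outsize, const char *dir, const char *name)
{
    /* snprintf never writes more than outsize bytes and returns the
       length the complete string would have had */
    int n = snprintf(out, outsize, "%s/%s", dir, name);

    if (n < 0 || (size_t) n >= outsize)
        return -1;      /* encoding error or truncated result */
    return 0;
}
```

With such a helper, the buffer can never overflow even if the explicit length test is forgotten, since snprintf truncates instead of writing past the end.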

8.2.2 ... and an implementation of the interface

We are done with the standard interface, and now have to get down to the next level to provide a minimal implementation of it. The following one is for Version 7 and System V Unix, two of the more popular versions of Unix, which use the following directory information as declared in <sys/dir.h>.

    /* See Chapter 4 notes */
    #ifndef DIRSIZ
    #define DIRSIZ 14
    #endif

    struct direct { /* directory entry */
        ino_t d_ino;           /* inode number */
        char d_name[DIRSIZ];   /* long name does not have '\0' */
    };

It is clear that in those two versions, the names are rather short. Some other versions allow much longer names, which could lead to a more complicated directory structure. The type ino_t that occurs in the above direct and stat structures is a typedef that describes the index into the inode list. It is commonly implemented as an unsigned short, but this may change from system to system.

Now, we will check out the three basic routines: opendir, readdir and closedir. The routine opendir opens the directory as a file, verifies that it is indeed a directory, allocates a directory structure for it, and then records the information.

    /* fstat is similar to stat, but applies to a file descriptor rather than a file name */
    int fstat(int fd, struct stat *);

    DIR *opendir(char *dirname)
    {
        int fd;
        struct stat stbuf;
        /* DIR is part of the standard interface */
        DIR *dp;

        if ((fd = open(dirname, O_RDONLY, 0)) == -1
            || fstat(fd, &stbuf) == -1
            || (stbuf.st_mode & S_IFMT) != S_IFDIR
            || (dp = (DIR *) malloc(sizeof(DIR))) == NULL)
            return NULL;
        dp->fd = fd;
        return dp;
    }
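One detail of the chained test in opendir is worth noting: when open succeeds but a later test fails, the routine returns NULL while the file descriptor stays open. A minimal sketch of the same open-and-verify steps that releases the descriptor on failure is given below. It assumes a present-day POSIX system, and the helper name open_directory_fd is hypothetical.

```c
#define _XOPEN_SOURCE 700   /* ask the headers for the S_IF* macros */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* open_directory_fd (hypothetical helper): open dirname for reading
   and return its descriptor, or -1 if it cannot be opened or is not
   a directory */
int open_directory_fd(const char *dirname)
{
    struct stat stbuf;
    int fd = open(dirname, O_RDONLY);

    if (fd == -1)
        return -1;
    if (fstat(fd, &stbuf) == -1 || (stbuf.st_mode & S_IFMT) != S_IFDIR) {
        close(fd);          /* do not leak the descriptor on failure */
        return -1;
    }
    return fd;
}
```

The caller owns the returned descriptor and is expected to close it when done.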

The routine closedir simply closes a directory opened by an invocation of opendir.

    void closedir(DIR *dp)
    {
        if (dp) {
            /* for the close system call, see Section 6.3 of these lab notes */
            close(dp->fd);
            /* for the free function, see the address arithmetic unit in Chapter 5 notes */
            free(dp);
        }
    }

The important(:-)) routine readdir uses the read system call to read each directory entry. If a directory slot is not currently in use, its inode number is 0, and the slot is skipped. Otherwise, the inode number and the associated name are placed in a static structure, as called for by the standard interface, and a pointer to that structure is returned.

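    #include <sys/dir.h> /* local directory structure */

    Dirent *readdir(DIR *dp)
    {
        struct direct dirbuf; /* local directory structure */
        static Dirent d;      /* portable structure to be returned */

        while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf)) == sizeof(dirbuf)) {
            if (dirbuf.d_ino == 0) /* slot not in use */
                continue;
            d.ino = dirbuf.d_ino;
            strncpy(d.name, dirbuf.d_name, DIRSIZ);
            d.name[DIRSIZ] = '\0'; /* ensure termination */
            return &d;
        }
        return NULL;
    }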
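The name copy in readdir deserves a closer look: a name that fills all DIRSIZ characters of d_name carries no '\0', so the terminator has to be added by hand, and the destination needs room for one extra character. A minimal sketch of just that step, using hypothetical names FIELD_WIDTH and copy_fixed_name:

```c
#include <string.h>

#define FIELD_WIDTH 14  /* hypothetical name; same value as DIRSIZ above */

/* copy_fixed_name (hypothetical helper): copy a name field of
   FIELD_WIDTH characters, which may lack a '\0', into dst, which
   must hold FIELD_WIDTH + 1 characters */
void copy_fixed_name(char *dst, const char *src)
{
    strncpy(dst, src, FIELD_WIDTH);  /* copies at most FIELD_WIDTH chars */
    dst[FIELD_WIDTH] = '\0';         /* terminate a full-width name */
}
```

For a shorter name, strncpy stops at the first '\0' in the source and pads the rest of the destination with '\0', so the result is a proper string in both cases.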

8.2.3 A little summary

The function fsize is one of those programs that are not “system programs” in the sense that they are not written exclusively for the system; they merely make use of the information provided by the system, through the stat structure in this case. For such programs, all the system information should be included via a header file, such as sys/stat.h, rather than embedded into the programs themselves. There are at least two advantages to this approach: one is that such information is shared consistently among all the files that need it; the other is that as much redundancy as possible is removed. The division of the design into a standard interface and a system-dependent implementation is also an important point, since it makes the fsize function independent of any specific system.

Another important point of this example is that it demonstrates how to use C, a high-level programming language, to write programs at the system level. It also demonstrates a few applications of bitwise operators, space allocation, etc.
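The same division survives on present-day POSIX systems, where <dirent.h> supplies opendir, readdir and closedir, and the entry structure is struct dirent with a d_name member. The following minimal sketch runs the same kind of loop as dirwalk against that library interface. The function name count_entries is hypothetical, and the code is an illustration rather than a replacement for the routines developed above.

```c
#include <dirent.h>
#include <string.h>

/* count_entries (hypothetical helper): number of entries in dirname,
   not counting "." and "..", or -1 if the directory cannot be opened */
int count_entries(const char *dirname)
{
    DIR *dfd = opendir(dirname);
    struct dirent *dp;
    int count = 0;

    if (dfd == NULL)
        return -1;
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;   /* skip the directory itself and its parent */
        count++;
    }
    closedir(dfd);
    return count;
}
```

The calling code does not need to know how the underlying system lays out its directories, which is exactly the benefit of the standard interface.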

References

[1] Anderson, P., Just Enough Unix (Fourth Ed.), McGraw-Hill, Boston, MA, 2003.

[2] Kernighan, B., and Ritchie, D., The C Programming Language (Second Ed.), Prentice Hall, Englewood Cliffs, NJ, 1988.

[3] O’Kane, J. M., A Gentle Introduction to ROS. Available at https://www.cse.sc.edu/~jokane/agitr/
