Workbook 7. Standard I/O and Pipes

Table of Contents

1. Standard In and Standard Out
    Discussion
        Three types of programs
        Standard in (stdin) and Standard out (stdout)
        Redirecting stdout
            Writing Output to a File
            Appending Output to a File
        Redirecting stdin
        Under the Hood: Open Files and File Descriptors
            Open Files and File Descriptors
            Redirection
    Examples
        Example 1. Getting Out of sort's
        Example 2. Automated FTP Transfers
        Example 3. Automating Graph Generation with gnuplot
    Online Exercises
        Online Exercise 1. Using Standard In and Standard Out
            Specification
            Deliverables
            Suggested Strategy for Automating gnuplot
            Questions
2. Standard Error
    Discussion
        Standard Error (stderr)
        Redirecting stderr
        Combining stdout and stderr: Old School
        Combining stdout and stderr: New School
        Summary
    Examples
        Example 1. Using /dev/null to filter out stderr
    Online Exercises
        Specification
        Deliverables
        Questions
3. Pipes
    Discussion
        Pipes
        Filtering output using grep
        Pipes and stderr
        Commands as filters
    Examples
        Example 1. Listing Processes by Name
        Example 2. Searching History Efficiently
        Example 3. Unix philosophy: Simple Tools that Work Well Together
    Online Exercises
        Specification
        Deliverables
        Clean Up
        Questions

Chapter 1. Standard In and Standard Out

Key Concepts

• Terminal based programs tend to read information from one source, and write information to one destination.

• The source programs read from is referred to as Standard In (stdin), and is usually connected to a terminal's keyboard.

• The destination programs write to is referred to as Standard Out (stdout), and is usually connected to a terminal's display.

• When using the shell, stdout can be redirected using > or >>, and stdin can be redirected using <.

Discussion

Many Linux commands read input from the keyboard and display output to the terminal. In this Workbook, you'll learn how you can redirect where input is read from and where output goes. The output of one command can be used as the input for another command, allowing simple commands to be used together to perform more complicated tasks.

Three types of programs

In Linux (and Unix), programs can generally be grouped into the following three designs.

Graphical Programs Graphical programs are designed to run in the X graphical environment. They expect the user to be using a mouse, and use common graphical components, such as popup menus and buttons, for user input. The mozilla web browser is an example of a graphical program.

Screen Programs Screen based programs expect to use a text console. They make use of the entire display, and handle text placement and screen redraws in sophisticated ways. They do not require a mouse, and are appropriate for terminals and virtual consoles. The vi and nano text editors, and the links web browser, are examples of screen based programs.

Terminal Programs Terminal programs collect input and display output in a stream, seldom if ever redrawing the screen, as if writing directly to a printer that does not allow the cursor to move back up the page. Because of their simplicity, terminal based programs are often called simply commands. ls, grep, and useradd are examples of terminal based programs.

This chapter focuses on the last type of program. Do not let the simplicity of the way these commands receive input and output fool you. You will find that many of these commands are very sophisticated, and allow you to use the command line interface in powerful ways.


Standard in (stdin) and Standard out (stdout)

Terminal based programs generally read information as a stream from a single source, such as a terminal's keyboard. Likewise, they generally write information as a stream to a single destination, such as a display. In Linux (and Unix), the input stream is referred to as Standard In (usually abbreviated stdin), and the output stream is referred to as Standard Out (usually abbreviated stdout).

Usually, stdin and stdout are connected to the terminal that runs the command. Sometimes, in order to automate commonly repeated commands, or in order to record the output of a command for later inclusion in a report or email, people find it convenient to redirect stdin from or stdout into files.

Redirecting stdout

Writing Output to a File

When a terminal based program generates output, it generally writes that output to its stdout stream, without knowing what is connected to the receiving end of that stream. Usually, the stdout stream is connected to the terminal that started the process, so the output is written to the terminal's display. The bash shell uses > to redirect a process's stdout stream to a file.

For example, suppose the machine elvis is using becomes very sluggish and non-responsive. In order to diagnose the problem, elvis would like to examine the currently running processes. Because the machine is so sluggish, however, he wants to collect the information now, but analyze it later. He can redirect the output of the ps aux command into the file sluggish.txt, and come back to examine the file when the machine is more responsive.

[elvis@station elvis]$ ps aux > sluggish.txt
[elvis@station elvis]$

Notice that no output is displayed to the terminal. The ps command writes to stdout, as it always does, but stdout is redirected by the bash shell to the file sluggish.txt. The user elvis can examine the file later, at a more convenient time.

[elvis@station elvis]$ head sluggish.txt
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1380 76 ? S Jun02 0:04 init [
root 2 0.0 0.0 0 0 ? SW Jun02 0:00 [keventd]
root 3 0.0 0.0 0 0 ? SW Jun02 0:00 [kapmd]
root 4 0.0 0.0 0 0 ? SWN Jun02 0:00 [ksoftirqd_CPU0]
root 9 0.0 0.0 0 0 ? SW Jun02 0:00 [bdflush]
root 5 0.0 0.0 0 0 ? SW Jun02 0:00 [kswapd]
root 6 0.0 0.0 0 0 ? SW Jun02 0:00 [kscand/DMA]
root 7 0.0 0.0 0 0 ? SW Jun02 0:37 [kscand/Normal]
root 8 0.0 0.0 0 0 ? SW Jun02 0:00 [kscand/HighMem]

Appending Output to a File

If the file sluggish.txt already existed, its original contents would be lost. This is often referred to as clobbering a file. To append a command's output to a file, rather than clobbering it, bash uses >>.

Suppose that elvis wanted to record a timestamp of when the sluggish behavior was happening, as well as a list of currently running processes. He could first create (or clobber) the file with the output of the date command, using >, and then append to it the output of the ps aux command using >>.

[elvis@station elvis]$ date > sluggish.txt
[elvis@station elvis]$ ps aux >> sluggish.txt
[elvis@station elvis]$ head sluggish.txt


Tue Jun 3 16:57:23 EDT 2003
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 1380 76 ? S Jun02 0:04 init [
root 2 0.0 0.0 0 0 ? SW Jun02 0:00 [keventd]
root 3 0.0 0.0 0 0 ? SW Jun02 0:00 [kapmd]
root 4 0.0 0.0 0 0 ? SWN Jun02 0:00 [ksoftirqd_CPU0]
root 9 0.0 0.0 0 0 ? SW Jun02 0:00 [bdflush]
root 5 0.0 0.0 0 0 ? SW Jun02 0:00 [kswapd]
root 6 0.0 0.0 0 0 ? SW Jun02 0:00 [kscand/DMA]
root 7 0.0 0.0 0 0 ? SW Jun02 0:37 [kscand/Normal]

Redirecting stdin

Just as bash uses > to coax commands into delivering their output somewhere other than the display, bash uses < to cause them to read input from somewhere other than the keyboard. The user elvis is still trying to figure out why his machine was acting sluggish. He talked to his local system administrator, who thought that looking at the list of currently running processes sounded like a good idea, and asked elvis to mail him a copy.

Using the terminal based mail command, elvis first writes an email message to the administrator "manually", from the keyboard. The mail command expects a recipient as an argument, and the subject line can be specified with the -s command line switch. The email body is then entered from the keyboard. The end of the message text is signaled by a lone period on a line.

[elvis@station elvis]$ mail -s "Computer is sluggish" [email protected]

Hey sysadmin...

I'm sending a list of processes that were running when the computer was acting sluggish in a separate email.

Thanks!
--elvis
.

Cc:

For his follow-up message, elvis can easily mail the output of the ps command he recorded in the file sluggish.txt. He just redirects the mail command's stdin stream to be read from the file.

[elvis@station elvis]$ mail -s "ps output" [email protected] < sluggish.txt

The system administrator will receive an email from elvis, with "ps output" as its subject line, and the contents of the file sluggish.txt as its body.

In the first case, the mail process's stdin was connected to the terminal, and the message body was provided by the keyboard. In the second case, bash arranged for the mail process's stdin to be connected to the file sluggish.txt, and the message body was provided by its contents. The mail command doesn't change its basic behavior: It reads the body of the email message from stdin. [1]

Under the Hood: Open Files and File Descriptors

Open Files and File Descriptors

To fully appreciate how processes manage Standard In, Standard Out, and files, we must introduce the concept of a file descriptor. In order to read information from or write information to a file, a process must open the file.

[1] This is an oversimplification. Some commands do respond differently if stdin or stdout is a terminal instead of a file. The mail command, for instance, prompted elvis with a Cc: prompt when stdin was a terminal, but not when stdin was the file sluggish.txt. In general, however, commands should be thought of as reading from a file or a terminal interchangeably, with only occasional, minor changes in behavior.

Linux (and Unix) processes keep track of the files they currently have open by assigning each an integer. The integer is called a file descriptor.

The Linux kernel provides an easy way to examine the open files and file descriptors of a currently running process, using the /proc file system. Every process has an associated subdirectory under /proc, named after its PID (process ID). The process's subdirectory in turn has a subdirectory called fd (for file descriptor). Within the /proc/pid/fd subdirectory, a symbolic link exists for every file the process has open. The name of the symbolic link is the open file's integer file descriptor, and the symbolic link resolves to the open file itself.

In the following, elvis cats the file /usr/share/hwdata/oui.txt, and then almost immediately suspends the program with a CTRL+Z.

[elvis@station elvis]$ cat /usr/share/hwdata/oui.txt

[1]+ Stopped cat /usr/share/hwdata/oui.txt

Using the ps command to look up the process's PID, elvis next examines the process's /proc/pid/fd directory.

[elvis@station elvis]$ ps
PID TTY TIME CMD
1368 pts/1 00:00:00 bash
1910 pts/1 00:00:00 cat
1911 pts/1 00:00:00 ps
[elvis@station elvis]$ ls -l /proc/1910/fd
total 0
lrwx------ 1 elvis elvis 64 Sep 13 06:42 0 -> /dev/tty1
lrwx------ 1 elvis elvis 64 Sep 13 06:42 1 -> /dev/tty1
lrwx------ 1 elvis elvis 64 Sep 13 06:42 2 -> /dev/tty1
lr-x------ 1 elvis elvis 64 Sep 13 06:42 3 -> /usr/share/hwdata/oui.txt

elvis observes that the PID of the cat process is 1910. elvis now looks in the subdirectory which corresponds to the observed PID.

Not surprisingly, the cat process has the file /usr/share/hwdata/oui.txt open (it must be able to read the file to display its contents). Perhaps a little surprisingly, it is not the only, or even the first, file that the process has open. The cat command has three other files open before it, or, more exactly, the same file open three times: /dev/tty1.

As a Linux (and Unix) convention, every process inherits three open files upon startup. The first, file descriptor 0, is Standard In. The second, file descriptor 1, is Standard Out, and the third, file descriptor 2, is Standard Error (to be discussed in the next Lesson). What open files did the cat command inherit from the bash shell that started it? The device node /dev/tty1 for all three.

Table 1.1. Standard In, Standard Out, and Standard Error File Descriptors

Stream          Descriptor   Abbreviation
Standard In     0            stdin
Standard Out    1            stdout
Standard Error  2            stderr

Recall that /dev/tty1 is the device node which connects to the console serial driver within the kernel. Whatever elvis types can be read from this file, and whatever is written to this file is displayed on elvis's terminal. What happens if the cat process reads from stdin? It reads input from elvis's keyboard. What happens if it writes to stdout? Whatever is written is displayed on elvis's terminal.
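You can confirm this convention in any bash shell of your own (a quick experiment, not part of elvis's session). Because bash expands $$ to its own PID, the following commands list the current shell's descriptors; on a freshly opened terminal, descriptors 0, 1, and 2 should all point at the same terminal device node:

echo $$              # the PID of the current shell
ls -l /proc/$$/fd    # descriptors 0, 1, and 2, all resolving to your terminal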


Redirection

In the next example, elvis cats the /usr/share/hwdata/oui.txt file, but this time redirects stdout to the file /tmp/foo. Again, elvis suspends the command in mid-stride with the CTRL+Z control sequence.

[elvis@station elvis]$ cat /usr/share/hwdata/oui.txt > /tmp/foo

[1]+ Stopped cat /usr/share/hwdata/oui.txt >/tmp/foo

Using the same technique as above, elvis examines the files opened by the cat command, and the file descriptors associated with them.

[elvis@station elvis]$ ps
PID TTY TIME CMD
1368 pts/1 00:00:00 bash
1976 pts/1 00:00:00 cat
1977 pts/1 00:00:00 ps
[elvis@station elvis]$ ls -l /proc/1976/fd
total 0
lrwx------ 1 elvis elvis 64 Sep 13 07:05 0 -> /dev/pts/1
l-wx------ 1 elvis elvis 64 Sep 13 07:05 1 -> /tmp/foo
lrwx------ 1 elvis elvis 64 Sep 13 07:05 2 -> /dev/pts/1
lr-x------ 1 elvis elvis 64 Sep 13 07:05 3 -> /usr/share/hwdata/oui.txt

Notice that file descriptor 1 (in other words, Standard Out) is no longer connected to the terminal, but instead to the file /tmp/foo.

What happens when elvis redirects both Standard Out and Standard In?

[elvis@station elvis]$ cat < /usr/share/hwdata/oui.txt > /tmp/foo

[1]+ Stopped cat < /usr/share/hwdata/oui.txt > /tmp/foo
[elvis@station elvis]$ ps
PID TTY TIME CMD
1368 pts/1 00:00:00 bash
1980 pts/1 00:00:00 cat
1988 pts/1 00:00:00 ps
[elvis@station elvis]$ ls -l /proc/1980/fd
total 0
lr-x------ 1 elvis elvis 64 Sep 13 07:07 0 -> /usr/share/hwdata/oui.txt
l-wx------ 1 elvis elvis 64 Sep 13 07:07 1 -> /tmp/foo
lrwx------ 1 elvis elvis 64 Sep 13 07:07 2 -> /dev/pts/1

File descriptor 0 (Standard In) is not connected to the terminal, but instead to the file /usr/share/hwdata/oui.txt.

When the cat command is called without arguments (i.e., without any filenames of files to display), it displays Standard In instead. Rather than opening a specified file (using file descriptor 3, as above), the cat command reads from stdin instead.

What is the effective difference between the following three commands?

[elvis@station elvis]$ cat < /usr/share/hwdata/oui.txt > /tmp/foo
[elvis@station elvis]$ cat /usr/share/hwdata/oui.txt > /tmp/foo
[elvis@station elvis]$ cp /usr/share/hwdata/oui.txt /tmp/foo

There is none. In order to appreciate the real benefit of designing commands to read from Standard In in lieu of named files, we must wait until pipes are introduced in a subsequent Lesson.


Examples

The following examples include a quick example of how new users can often get confused by commands that read from Standard In, and a couple of more "real world" examples that use the ftp and gnuplot programs. The ftp and gnuplot programs are both complicated programs, and these examples merely introduce enough of their functionality to emphasize one of the main themes of this Workbook: if the program is driven from a command line interface, it can usually be automated with a simple text script and redirection.

Example 1. Getting Out of sort's

In the following, blondie uses the sort command to sort the animals found in the text file zoo.

[blondie@station blondie]$ cat zoo
elephant
seal
ape
giraffe
fish
[blondie@station blondie]$ sort zoo
ape
elephant
fish
giraffe
seal

As the name of the command suggests, the sort command (in its simplest form) reads a file, and writes the contents sorted line by line alphabetically. Like the cat command, the sort command, when not provided any arguments (i.e., filenames of files to sort), will look to stdin for its input.

[blondie@station blondie]$ sort < zoo
ape
elephant
fish
giraffe
seal

While this behavior seems (and is) perfectly reasonable, it often confuses new users who innocently type a command's name, "just to see what it does". In the following, assume that blondie does not yet know about Standard In. Exploring, she invokes the sort command. Not understanding that the sort command is waiting to read Standard In, i.e., her keyboard, she tries to somehow exit the command she's started. Finally, a friend whispers to her, "CTRL+D".

[blondie@station blondie]$ sort
ls
quit
man sort
exit
get me out of this
CTRL+D
exit
get me out of this
ls
man sort
quit
[blondie@station blondie]$

Upon typing CTRL+D, the conventional "End of File" control sequence (recall Workbook 1), the sort command prints a sorted list of everything it read on Standard In.
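The same behavior can also be put to work deliberately. As a quick illustration (not part of blondie's session, and the filename is made up), cat with no arguments simply copies stdin to stdout, so redirecting its output and ending the input with CTRL+D is a fast way to create a small text file without opening an editor:

cat > shopping.txt
milk
bread
CTRL+D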


Example 2. Automated FTP Transfers

The user blondie routinely grabs a README file from the ftp server for the Linux kernel project, ftp.kernel.org. The kernel.org ftp server allows anonymous users, that is, users who sign in using the username "anonymous". When prompted for a password, anonymous ftp users do not need to supply one, but conventionally give their email address in lieu of a password.

[blondie@station student]$ ftp ftp.kernel.org
Connected to ftp.kernel.org (204.152.189.116).
220 ProFTPD [ftp.kernel.org]
Name (ftp.kernel.org:blondie): anonymous
331 Anonymous login ok, send your complete email address as your password.
Password: (blondie types in her email address)
230 Anonymous access granted, restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> ls
227 Entering Passive Mode (204,152,189,116,237,224).
150 Opening ASCII mode data connection for file list
drwxr-s--- 2 korg mirrors 4096 May 21 2001 for_mirrors_only
drwx------ 2 root root 16384 Mar 18 00:27 lost+found
drwxrwsr-x 8 korg korg 4096 Mar 24 17:46 pub
226 Transfer complete.
ftp> cd pub
250 CWD command successful.
ftp> ls
227 Entering Passive Mode (204,152,189,116,237,229).
150 Opening ASCII mode data connection for file list
drwxrws--- 2 korg korg 4096 Mar 18 04:05 RCS
-r--r--r-- 1 korg korg 1963 Oct 4 2001 README
-r--r--r-- 1 korg korg 578 Mar 18 04:04 README_ABOUT_BZ2_FILES
drwxrwsr-x 4 korg korg 4096 Oct 26 2000 dist
...
226 Transfer complete.
ftp> get README
local: README remote: README
227 Entering Passive Mode (204,152,189,116,237,237).
150 Opening BINARY mode data connection for README (1963 bytes).
226 Transfer complete.
1963 bytes received in 0.000564 secs (3.4e+03 Kbytes/sec)
ftp> quit
221 Goodbye.

When the ftp command pauses with the ftp> prompt, blondie types commands to navigate the ftp server's directories. If blondie downloaded this file often, she might be tempted to write a simple text file, getreadme.ftp, that would reproduce the commands she typed from the keyboard. She could then run the same command, ftp ftp.kernel.org. This time, however, she would use < to cause bash to redirect stdin from the file getreadme.ftp. When the ftp command reads input from its stdin stream, the information is provided by the file instead of the keyboard.

First, blondie uses a simple text editor to create the file getreadme.ftp containing all of the commands that she typed interactively on the keyboard (including the password she supplied to the anonymous ftp server, [email protected]).

[blondie@station blondie]$ cat getreadme.ftp
anonymous
[email protected]
ls
cd pub
ls
get README
quit


Notice how the contents of the file match exactly what she typed when using the ftp command above. Next, she reruns the ftp ftp.kernel.org, but redirects stdin from the newly created file.

[blondie@station blondie]$ ftp ftp.kernel.org < getreadme.ftp
Password:Name (ftp.kernel.org:blondie): ?Invalid command
drwxr-s--- 2 korg mirrors 4096 May 21 2001 for_mirrors_only
drwx------ 2 root root 16384 Mar 18 00:27 lost+found
drwxrwsr-x 8 korg korg 4096 Mar 24 17:46 pub
drwxrws--- 2 korg korg 4096 Mar 18 04:05 RCS
-r--r--r-- 1 korg korg 1963 Oct 4 2001 README
-r--r--r-- 1 korg korg 578 Mar 18 04:04 README_ABOUT_BZ2_FILES
drwxrwsr-x 4 korg korg 4096 Oct 26 2000 dist
-r--r--r-- 1 korg korg 1507 Oct 11 2001 index.html
drwxrwsr-x 8 korg korg 4096 Jan 21 2002 linux
drwxrwsr-x 3 korg korg 4096 Mar 24 17:46 scm
drwxrwsr-x 3 korg korg 4096 Oct 11 2001 site
drwxrwsr-x 11 korg korg 4096 Jan 1 2002 software
[blondie@station blondie]$ ls -l README
-rw-rw-r-- 1 blondie blondie 1963 Jun 3 17:37 README

After the command has run, blondie has a new README file in her directory, which was downloaded by the ftp command. blondie did encounter a couple of hitches, however.

• First, the command paused, and she needed to hit RETURN once to get it to complete. For security reasons, many commands, when reading passwords, do not read the passwords from stdin, but from the terminal directly. (Commands do not have to rely on stdin as their sole source of input, but most choose to do so.) When ftp attempted to read the password from the terminal, the program suspended until blondie hit the RETURN key.

• Secondly, there is an odd line specifying ?Invalid command. Because the password was read from the terminal directly, it was not consumed from the getreadme.ftp file. When the ftp command went to read its next line of input, it read the email address line, which it reasonably didn't recognize as a command.

• Lastly, directory listings were dumped to the terminal when the command was run. When the ftp command ran the ls commands from getreadme.ftp, it wrote the output to stdout, which is still connected to the terminal. Since blondie knows where the file is located, and has embedded that information into the script, she does not need to see these listings every time she runs the command.

To address these issues, she first takes advantage of a ~/.netrc file. The ftp command is designed to look for such a file in the user's home directory, and if it exists, use it to provide the user's username and password. After examining the netrc(5) man page, blondie uses a simple text editor to create the following ~/.netrc file.

[blondie@station blondie]$ cat .netrc
default login anonymous password [email protected]

Because the ~/.netrc file will now provide her username and password, she removes them from her getreadme.ftp script. Secondly, she removes the unnecessary ls commands from the script as well.

[blondie@station blondie]$ cat getreadme.ftp
cd pub
get README
quit

Armed with her ~/.netrc file (to provide her username and password) and her modified getreadme.ftp (to provide the commands for the ftp program), she reruns the ftp command, and the operation runs smoothly.

[blondie@station blondie]$ head .netrc getreadme.ftp
==> .netrc <==


default login anonymous password user@site

==> getreadme.ftp <==
cd pub
get README
quit
[blondie@station blondie]$ ls
getreadme.ftp
[blondie@station blondie]$ ftp ftp.kernel.org < getreadme.ftp
[blondie@station blondie]$ ls
getreadme.ftp README

Example 3. Automating Graph Generation with gnuplot

The user madonna would like to be able to easily generate plots of her machine's CPU activity. She is familiar with the vmstat command, which samples and tables many parameters which concern system performance. The command can take two numeric arguments: the first specifies the sampling period in seconds, and the second the number of samples to collect.

She is interested in the last three columns, which are the percentage of time the CPU is spending in the user ("us"), system ("sy"), and idle ("id") state. She collects 60 seconds worth of data from her machine, sampling every second.

[madonna@station madonna]$ vmstat 1 60 > stats.txt
[madonna@station madonna]$ head stats.txt
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
2 6 0 17348 65604 277768 0 0 15 16 126 221 1 0 97 1
1 5 0 15736 66008 277788 0 0 376 6269 314 725 5 2 0 93
1 6 0 11496 67224 277392 0 0 1216 8 422 1533 15 16 0 69
0 6 0 10492 67944 277676 0 0 940 28 338 1193 7 8 0 85
0 6 0 10168 68324 277644 0 0 576 0 261 992 6 1 0 93
3 3 0 8848 69424 277864 0 0 1252 64 429 1386 10 16 0 74
3 3 0 8056 70188 277892 0 0 1068 1148 422 1215 8 16 0 76
1 6 0 12248 71084 277636 0 0 940 28 341 1275 9 4 0 87

A little frustrated that the two lines of text headers will interfere with the plotting of the data, madonna opens the file stats.txt in a text editor and easily deletes them.
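Editing the file by hand works, but the two header lines could also be stripped non-interactively. For instance, GNU tail can be told to start printing at the third line (an alternative sketch, not the approach madonna actually takes here; the output filename is made up for the illustration):

tail -n +3 stats.txt > stats-noheader.txt    # keep everything from line 3 onward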

To plot the data, she uses gnuplot, a sophisticated plotting package which uses commands read from a terminal interface to generate plots of mathematical functions and numeric data. After some browsing through the online help available within gnuplot, she develops the following commands to plot her data as a PNG graphics file called cpu.png.

[madonna@station madonna]$ gnuplot

G N U P L O T Version 3.7 patchlevel 3

...

Terminal type set to 'x11'
gnuplot> set term png
Terminal type set to 'png'
Options are ' small color'
gnuplot> set output 'cpu.png'
gnuplot> plot 'stats.txt' using 0:13 title "user" with lines, 'stats.txt' using 0:14 title "system" with lines, 'stats.txt' using 0:15 title "idle" with lines
gnuplot> quit

After quitting gnuplot, she returns to the bash shell, where she uses the eog image viewer to view her plot.

[madonna@station madonna]$ eog cpu.png


Figure 1.1. madonna's plot of CPU activity

Because madonna thinks she will want to generate a similar plot often, and doesn't want to go through the agony of typing in gnuplot's plot command every time, she generates a script which can be used to automate gnuplot. Using a text editor, she creates the file cpu_plot.gnuplot, which contains all of the gnuplot commands she entered from the keyboard, paying close attention to put one command per line.

[madonna@station madonna]$ cat cpu_plot.gnuplot
set term png
set output 'cpu.png'
plot 'stats.txt' using 0:13 title "user" with lines, 'stats.txt' using 0:14 title "system" with lines, 'stats.txt' using 0:15 title "idle" with lines

Now she can easily plot newly collected data by redirecting her script as gnuplot's stdin.

[madonna@station madonna]$ gnuplot < cpu_plot.gnuplot

Online Exercises

Online Exercise 1. Using Standard In and Standard Out

Lab Exercise

Objective: Use bash shell redirection to effectively control Standard In and Standard Out.

Estimated Time: 20 mins.

Specification

1. The hostname command reports your station's currently assigned hostname. Run the command (without arguments), and redirect the output to the file ~/stdoutlab.txt.

2. The uptime command reports how much time has passed since your machine was booted, and other system utilization information. Run the uptime command (without arguments), using redirection to append the output to the file ~/stdoutlab.txt.


3. The uname -a command lists information about your current kernel version. Run the command, using redirection to append the output to the file ~/stdoutlab.txt.

If you have completed the previous three steps successfully, you should be able to reproduce output similar to the following. (Do not be concerned if your actual information differs from that shown below).

[student@station student]$ cat stdoutlab.txt
station
07:09:31 up 11:30, 5 users, load average: 0.19, 0.06, 0.01
Linux station 2.4.20-20.9 #1 Mon Aug 18 11:45:58 EDT 2003 i686 i686 i386 GNU/Linux

4. Generate a simple text file, ~/script.gnuplot, that will act as a script to drive gnuplot. When gnuplot reads your script from stdin, it should generate a plot of a simple mathematical expression, such as the sine of x (sin(x)), or x squared (x**2). The plot should be generated as a PNG graphics file called "gnuplot.png".

Once completed, your script should be able to be used as in the following example.

[student@station student]$ ls
script.gnuplot
[student@station student]$ gnuplot < script.gnuplot
[student@station student]$ ls
gnuplot.png script.gnuplot
[student@station student]$ file gnuplot.png
gnuplot.png: PNG image data, 640 x 480, 8-bit colormap, non-interlaced
[student@station student]$ eog gnuplot.png

Deliverables

1. A file called ~/stdoutlab.txt, which contains the output of the hostname command, followed by the output of the uptime command, followed by the output of the uname -a command.

2. A script ~/script.gnuplot, which when used as stdin for the gnuplot command, generates a PNG graphics file titled gnuplot.png containing a plot of a simple mathematical function.

Suggested Strategy for Automating gnuplot

Using Example 3 as your guide, experiment with gnuplot interactively, until you are able to generate a simple plot. If you are using a text terminal instead of the X graphics environment, you can generate text plots by setting your gnuplot output terminal as "dumb":

gnuplot> set term dumb

Once you can produce graphs, set your terminal type to png (for PNG graphics), and your output file to "gnuplot.png", using the following two commands ...

gnuplot> set term png
gnuplot> set output "gnuplot.png"

... and generate your plot again. Once you have figured out the sequence of commands to generate a plot as a PNG file, record the commands as your gnuplot script.
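To give a sense of scale, one possible script (a sketch only; your expression and options may differ) can be as short as three lines:

set term png
set output "gnuplot.png"
plot sin(x)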
Questions
Use the following transcript to answer the next 3 questions.

[prince@station prince]$ who


prince tty1 Sep 22 20:02
[prince@station prince]$ cat < /etc/services >> ~/services

1. Which file is the cat command using for Standard In?

a. /dev/tty1

b. /etc/services

c. /dev/null

d. ~/services

e. None of the above

2. Which file is the cat command using for Standard Out?

a. ~/services

b. /dev/tty1

c. /etc/services

d. /dev/null

e. None of the above

3. Which file is the bash shell using for Standard Out?

a. /dev/tty1

b. ~/services

c. /dev/null

d. /etc/services

e. None of the above

4. Which of the following command lines would append lines to the (already existing) file df.out?

a. df > df.out

b. df >+ df.out

c. df.out << df

d. df.out +< df

e. None of the above

Use the following transcript to answer the next 3 questions.

[prince@station prince]$ ps
PID TTY TIME CMD
1409 pts/0 00:00:00 bash
1542 pts/0 00:00:00 vmstat
1543 pts/0 00:00:00 ps
[prince@station prince]$ ls -l /proc/1542/fd
total 6
lrwx------ 1 prince prince 64 Sep 22 20:42 0 -> /dev/pts/0


l-wx------ 1 prince prince 64 Sep 22 20:42 1 -> /tmp/vmstat.out
lrwx------ 1 prince prince 64 Sep 22 20:42 2 -> /dev/pts/0
lr-x------ 1 prince prince 64 Sep 22 20:42 3 -> /proc/uptime
lr-x------ 1 prince prince 64 Sep 22 20:42 4 -> /proc/stat
lr-x------ 1 prince prince 64 Sep 22 20:42 5 -> /proc/meminfo

5. Which of the following files is the vmstat command using for Standard Out?

a. /proc/stat

b. /dev/tty1

c. /dev/pts/0

d. /tmp/vmstat.out

e. There is not enough information provided.

6. Which of the following occurred when the vmstat command was started?

a. The bash shell created a new file.

b. The bash shell clobbered an already existing file.

c. The bash shell appended to an already existing file.

d. The vmstat command created a new file.

e. There is not enough information provided.

7. Which of the following files is the vmstat command using for Standard In?

a. /proc/stat

b. /tmp/vmstat.out

c. /dev/pts/0

d. /dev/tty1

e. There is not enough information provided.

Use the following transcript to answer the next 3 questions.

[prince@station prince]$ cal > cal.out
[prince@station prince]$ ls -l /dev/stdin
lrwxrwxrwx 1 root root 17 Apr 1 11:13 /dev/stdin -> ../proc/self/fd/0
[prince@station prince]$ cat /dev/stdin < cal.out
   September 2003
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30

8. Which of the following would best describe the file /proc/self?

a. The file is a symbolic link which resolves to the current login shell.

b. The file is a symbolic link which resolves to the root directory.


c. The file is a symbolic link which resolves to /dev/stdout.

d. The file is a symbolic link which resolves to /tmp.

e. The file is a symbolic link which resolves to the /proc/pid directory corresponding to the current process.

9. For the cat process, to what file would the symbolic link /proc/self/fd/0 resolve?

a. /dev/stdin

b. cal.out

c. /dev/tty1

d. /tmp

e. /proc/self

10. For the cat process, to what file would the symbolic link /proc/self/fd/1 most likely resolve?

a. /proc/self

b. /tmp

c. cal.out

d. /dev/stdin

e. /dev/tty1

Chapter 2. Standard Error

Key Concepts

• Unix programs commonly report error conditions to a destination called Standard Error (stderr).

• Usually, stderr is connected to a terminal's display, and error messages are found intermixed with standard output.

• When using the bash shell, the stderr stream can be redirected to a file using 2>.

• When using bash, the stderr stream can be combined with the stdout stream using 2>&1 or >&.

Discussion

Standard Error (stderr)

We have discussed the standard input and output streams, stdin and stdout, and how to use > and < in the bash command line to redirect them. We are now ready to confuse matters a little by introducing a second output stream, commonly used for reporting error conditions, called Standard Error (often abbreviated stderr).

In the following sequence, elvis is using the head -1 command to generate a list of the first lines of all the files in the /etc/rc.d directory.

[elvis@station elvis]$ ls -F /etc/rc.d/
init.d/ rc0.d/ rc2.d/ rc4.d/ rc6.d/ rc.sysinit*
rc* rc1.d/ rc3.d/ rc5.d/ rc.local* rc.sysinit.rpmsave*
[elvis@station elvis]$ head -1 /etc/rc.d/*
==> /etc/rc.d/init.d <==
head: /etc/rc.d/init.d: Is a directory

==> /etc/rc.d/rc <==
#! /bin/bash

==> /etc/rc.d/rc0.d <==
head: /etc/rc.d/rc0.d: Is a directory

==> /etc/rc.d/rc1.d <==
head: /etc/rc.d/rc1.d: Is a directory

==> /etc/rc.d/rc2.d <==
head: /etc/rc.d/rc2.d: Is a directory

==> /etc/rc.d/rc3.d <==
head: /etc/rc.d/rc3.d: Is a directory

==> /etc/rc.d/rc4.d <==
head: /etc/rc.d/rc4.d: Is a directory

==> /etc/rc.d/rc5.d <==
head: /etc/rc.d/rc5.d: Is a directory

==> /etc/rc.d/rc6.d <==
head: /etc/rc.d/rc6.d: Is a directory

==> /etc/rc.d/rc.local <==
#!/bin/sh


==> /etc/rc.d/rc.sysinit <==
#!/bin/bash

==> /etc/rc.d/rc.sysinit.rpmsave <==
#!/bin/bash

The head command, when fed multiple file names as arguments, conveniently decorates the output with the name of each file, followed by the first specified number of lines (in this case, one). When the head command encounters a directory, however, it merely complains. Next, elvis runs the same command, redirecting stdout to the file rcsummary.out.

[elvis@station elvis]$ head -1 /etc/rc.d/* > rcsummary.out
head: /etc/rc.d/init.d: Is a directory
head: /etc/rc.d/rc0.d: Is a directory
head: /etc/rc.d/rc1.d: Is a directory
head: /etc/rc.d/rc2.d: Is a directory
head: /etc/rc.d/rc3.d: Is a directory
head: /etc/rc.d/rc4.d: Is a directory
head: /etc/rc.d/rc5.d: Is a directory
head: /etc/rc.d/rc6.d: Is a directory

Most of the output is obediently redirected to the file rcsummary.out, but the directory complaints are still displayed. Although not obvious at the outset, the head command is really sending output to two independent streams. Normal output is written to Standard Out, but error messages are written to a separate stream called Standard Error (often abbreviated stderr). Usually, both streams are connected to the terminal, and so the two are difficult to distinguish. By redirecting stdout, however, the information written to stderr is obvious.

Redirecting stderr

Just as bash uses > to redirect stdout, bash uses 2> to redirect stderr. For example, elvis repeats the head command from above, but instead of redirecting stdout to rcsummary.out, he redirects stderr to the file rcsummary.err.

[elvis@station elvis]$ head -1 /etc/rc.d/* 2> rcsummary.err
==> /etc/rc.d/init.d <==

==> /etc/rc.d/rc <==
#! /bin/bash

==> /etc/rc.d/rc0.d <==

==> /etc/rc.d/rc1.d <==

==> /etc/rc.d/rc2.d <==

==> /etc/rc.d/rc3.d <==

==> /etc/rc.d/rc4.d <==

==> /etc/rc.d/rc5.d <==

==> /etc/rc.d/rc6.d <==

==> /etc/rc.d/rc.local <==
#!/bin/sh

==> /etc/rc.d/rc.sysinit <==
#!/bin/bash

==> /etc/rc.d/rc.sysinit.rpmsave <==
#!/bin/bash


The output is the complement to the previous example. We now see the normal output displayed to the screen, but no error messages. Where did the error messages go? It shouldn't be hard to guess.

[elvis@station elvis]$ cat rcsummary.err
head: /etc/rc.d/init.d: Is a directory
head: /etc/rc.d/rc0.d: Is a directory
head: /etc/rc.d/rc1.d: Is a directory
head: /etc/rc.d/rc2.d: Is a directory
head: /etc/rc.d/rc3.d: Is a directory
head: /etc/rc.d/rc4.d: Is a directory
head: /etc/rc.d/rc5.d: Is a directory
head: /etc/rc.d/rc6.d: Is a directory

In the following example, both > and 2> are used to redirect stdout and stderr independently.

[elvis@station elvis]$ head -1 /etc/rc.d/* > rcsummary.out 2> rcsummary.err
[elvis@station elvis]$

In this case, the standard output can be found in the file rcsummary.out, error messages can be found in rcsummary.err, and nothing is left over to be displayed to the screen.

Combining stdout and stderr: Old School

Often, someone would like to redirect the combined stdout and stderr streams to a single file. As a first attempt, elvis tries the following command.

[elvis@station elvis]$ head -1 /etc/rc.d/* > rcsummary.both 2> rcsummary.both

Upon examining the file rcsummary.both, however, elvis doesn't find what he expects.

[elvis@station elvis]$ cat rcsummary.both
head: /etc/rc.d/init.d: I ==> /etc/rc.dhead: /etc/rc.d/rc0.d: Is a directory head: /etc/rc.d/rc1.d: Is a direc ==> head: /etc/rc.d/rc2. ==> /etc/rc.d/rc3head: / ==> /etc/rc.d/rc4.d <==

==> /etc/rc.d/rc5.d <==

==> /etc/rc.d/rc6.d <==

==> /etc/rc.d/rc.local <==
#!/bin/sh

==> /etc/rc.d/rc.sysinit <==
#!/bin/bash

==> /etc/rc.d/rc.sysinit.rpmsave <==
#!/bin/bash

The bash shell opened the file rcsummary.both twice, but treated each open file independently. When stdout and stderr both wrote to the file, they clobbered each other's information. What is needed instead is some way to tell bash to effectively combine stderr and stdout into a single stream, and then redirect that stream to a single file. As you would expect, there is such a way.

[elvis@station elvis]$ head -1 /etc/rc.d/* > rcsummary.both 2>&1

Although awkward, the last token 2>&1 should be thought of as saying "take stderr, and send it wherever stdout is currently going". Now rcsummary.both contains the expected output.

[elvis@station elvis]$ cat rcsummary.both
==> /etc/rc.d/init.d <==


head: /etc/rc.d/init.d: Is a directory

==> /etc/rc.d/rc <==
#! /bin/bash

==> /etc/rc.d/rc0.d <==
head: /etc/rc.d/rc0.d: Is a directory

==> /etc/rc.d/rc1.d <==
head: /etc/rc.d/rc1.d: Is a directory
...

Much of this output was truncated, and replaced with "...".

Combining stdout and stderr: New School

Using 2>&1 to combine stdout and stderr was introduced in the original Unix shell, the Bourne shell (sh). Because bash is designed to be backwards compatible with sh, it supports the syntax as well. The syntax, however, is inconvenient. Besides being difficult to write, the order of the redirections is important: "> out.txt 2>&1" and "2>&1 > out.txt" do not have the same effect!
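To see why the order matters, compare the two command lines below (an illustrative sketch reusing the head example from above). In the first, stdout is pointed at out.txt first, so 2>&1 then sends stderr to the same place. In the second, stderr is duplicated onto stdout while stdout still points at the terminal, and only afterwards is stdout redirected, so the error messages remain on the screen:

head -1 /etc/rc.d/* > out.txt 2>&1     # stdout and stderr both end up in out.txt
head -1 /etc/rc.d/* 2>&1 > out.txt     # stderr stays on the terminal; only stdout goes to out.txt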

In order to simplify things, bash uses >& to combine stdout and stderr, as in the following example.

[elvis@station elvis]$ head -1 /etc/rc.d/* >& rcsummary.both

Summary

The following table summarizes the syntax used by the bash shell for redirecting stdin, stdout, and stderr learned in this and the previous lesson.

Table 2.1. Redirecting stdin, stdout, and stderr in bash

syntax               effect
cmd < file           Redirect stdin from file.
cmd > file           Redirect stdout into file, overwriting (clobbering) file if it exists.
cmd >> file          Redirect stdout into file, appending to file if it exists.
cmd 2> file          Redirect stderr into file, overwriting (clobbering) file if it exists.
cmd 2>> file         Redirect stderr into file, appending to file if it exists.
cmd > file 2>&1      Combine stdout and stderr, and redirect both into file. (Portable syntax)
cmd >& file          Combine stdout and stderr, and redirect both into file. (Convenient syntax)

Examples

Example 1. Using /dev/null to filter out stderr

The user elvis has recently learned that, besides the /home/elvis and /tmp directories he's familiar with, he may also own files in the /var directory. These files are usually spooling files for received but not yet viewed email, print jobs waiting to be sent to the printer, etc.

Curious, he uses the find command to find all files within the /var directory that he owns.

[elvis@station elvis]$ find /var -user elvis
find: /var/lib/slocate: Permission denied


find: /var/lib/nfs/statd: Permission denied
find: /var/lib/xdm/authdir: Permission denied
...
find: /var/spool/lpd/three-west: Permission denied
find: /var/spool/lpd/one-east-color: Permission denied
find: /var/spool/lpd/server1: Permission denied
/var/spool/mail/elvis
find: /var/spool/at: Permission denied
...
find: /var/tux: Permission denied
find: /var/tomcat4/webapps: Permission denied

(Much of the output of this command has been truncated, and replaced with "...").

Although the find command appropriately reported the /var/spool/mail/elvis file, the output is difficult to find among all of the "Permission denied" error messages being reported from various subdirectories of /var. In order to help separate the wheat from the chaff, elvis redirects stderr to some file in the /tmp directory.

[elvis@station elvis]$ find /var -user elvis 2> /tmp/foo
/var/spool/mail/elvis

While this works, elvis is left with a file called /tmp/foo that he really didn't want. In situations like this, when a user wants to discard a stream of information, experienced Unix users usually redirect output to a pseudo device called /dev/null.

[elvis@station elvis]$ find /var -user elvis 2> /dev/null
/var/spool/mail/elvis

As the following long listing shows, /dev/null is a character device node, like those used for conventional device drivers.

[elvis@station elvis]$ ls -l /dev/null
crw-rw-rw- 1 root root 1, 3 Jan 30 05:24 /dev/null

When a user writes to /dev/null, the information is merely discarded by the kernel. When a user reads from /dev/null, they encounter an immediate end of file. Notice that /dev/null is one of the few files in Red Hat Enterprise Linux that has world writable permissions by default.
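Both behaviors are easy to confirm for yourself (a quick illustration, not part of elvis's session): anything written to /dev/null simply vanishes, and a read from it returns zero bytes.

echo "discard me" > /dev/null    # the text is thrown away by the kernel
wc -c < /dev/null                # reports 0 bytes: an immediate end of file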
Online Exercises

Lab Exercise
Objective: Effectively manage Standard Out and Standard Error Streams

Estimated Time: 10 mins.

Specification

1. Use the following command line to display the contents of all files within the /etc/X11 directory.

[elvis@station elvis]$ cat /etc/X11/*
cat: /etc/X11/applnk: Is a directory
cat: /etc/X11/desktop-menus: Is a directory
cat: /etc/X11/fs: Is a directory
cat: /etc/X11/gdm: Is a directory
cat: /etc/X11/lbxproxy: Is a directory
#!/bin/sh

PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin


...

2. Repeat the command line, but redirect stdout to a file called ~/stderrlab.out and stderr to a file called ~/stderrlab.err.

3. Repeat the command again, but combine stdout and stderr into a single stream, and redirect the stream to the file ~/stderrlab.both.

Deliverables

1. A file called ~/stderrlab.out, which contains the stdout stream from the command cat /etc/X11/*.

2. A file called ~/stderrlab.err, which contains the stderr stream from the command cat /etc/X11/*.

3. A file called ~/stderrlab.both, which contains the combined stdout and stderr streams from the command cat /etc/X11/*.

Questions

Use the following transcript to answer the next question.

[madonna@station madonna]$ cat chmod.err
chmod: changing permissions of `/tmp/orbit-elvis': Operation not permitted
chmod: changing permissions of `/tmp/orbit-elvis-a8e8d915': Operation not permitted
chmod: changing permissions of `/tmp/orbit-hogan': Operation not permitted
chmod: changing permissions of `/tmp/orbit-root': Operation not permitted
chmod: changing permissions of `/tmp/orbit-student': Operation not permitted

1. Which of the following command lines most likely created the file chmod.err?

a. chmod a+r /tmp/* > chmod.err

b. chmod a+r /tmp/* 2> chmod.err

c. chmod a+r /tmp/* >> chmod.err

d. chmod a+r /tmp/* e> chmod.err

e. None of the above

2. Which of the following command lines would combine stdout and stderr, and redirect the combined stream to the file /tmp/find.out?

a. find /etc > /tmp/find.out 2>&1

b. find /etc >> /tmp/find.out

c. find /etc >& /tmp/find.out

d. find /etc >>& /tmp/find.out

e. Both A and C

3. Which of the following command lines would combine stdout and stderr, and redirect the combined stream to the file /tmp/find.out, appending to the file if it already existed?


a. find /etc > /tmp/find.out 2>&1

b. find /etc >> /tmp/find.out

c. find /etc >> /tmp/find.out 2>&1

d. find /etc >>& /tmp/find.out

e. Both A and C

Use the following transcript to answer the next two questions.

[madonna@station madonna]$ cat /etc/t* > /tmp/cat.out 2> /tmp/cat.err

[1]+ Stopped cat /etc/t* >/tmp/cat.out 2>/tmp/cat.err
[madonna@station madonna]$ ps
PID TTY TIME CMD
2419 pts/1 00:00:00 bash
3126 pts/1 00:00:00 cat
3127 pts/1 00:00:00 ps
[madonna@station madonna]$ ls -l /proc/3126/fd
total 4
lrwx------ 1 madonna madonna 64 Sep 23 04:15 0 -> /dev/pts/1
l-wx------ 1 madonna madonna 64 Sep 23 04:15 1 -> /tmp/cat.out
l-wx------ 1 madonna madonna 64 Sep 23 04:15 2 -> /tmp/cat.err
lr-x------ 1 madonna madonna 64 Sep 23 04:15 3 -> /usr/share/hwdata/oui.txt
[madonna@station madonna]$ fg
[madonna@station madonna]$ ???????
cat: /etc/tomcat4: Is a directory

4. Which file descriptor does Linux use as Standard Error?

a. 0

b. 1

c. 2

d. 3

e. None of the above

5. Which of the following command lines could replace the question marks as the last command line in the transcript?

a. cat /dev/pts/1

b. cat /tmp/cat.out

c. cat /tmp/cat.err

d. cat /usr/share/hwdata/oui.txt

e. cat /dev/null

6. To which of the following files would you expect the symbolic link /dev/stderr to resolve?

a. ../proc/self/fd/2

b. ../proc/self/fd/1

c. ../proc/self/fd/0


d. /dev/null

e. None of the above

Use the following transcript to answer the next two questions.

[madonna@station madonna]$ cat groups.sh
#!/bin/bash

if id $1
then
echo "The user $1 belongs to the following groups: $(id -Gn $1) "
else
echo "The user $1 does not exist"
fi
[madonna@station madonna]$ ./groups.sh elvis
uid=501(elvis) gid=501(elvis) groups=501(elvis),201(wrestle),202(physics),203(emperors),205(music)
The user elvis belongs to the following groups: elvis wrestle physics emperors music
[madonna@station madonna]$ ./groups.sh barney
id: barney: No such user
The user barney does not exist

7. Which of the following replacements for line number 3 of the file groups.sh would cause the script to display a single line beginning "The user", in all cases?

a. if id -q $1

b. if id $1 >/dev/null 2>&1

c. if id $1 2> /dev/null

d. if id $1 > /dev/null

e. None of the above

8. Which of the following replacements for line number 7 of the file groups.sh would cause the script to complain on Standard Error if the user does not exist (instead of Standard Out)?

a. echo "The user $1 does not exist" > /dev/stderr

b. echo -e "The user $1 does not exist"

c. echo "The user $1 does not exist" > /proc/self/fd/2

d. echo -E "The user $1 does not exist"

e. A and C

9. Assuming cmd is some simple command and its arguments (but no shell metacharacters), which of the following would redirect Standard Error (only) to the file /tmp/errors, appending to the file if it already existed?

a. cmd 2> /tmp/errors

b. cmd >> /tmp/errors

c. cmd >+ /tmp/errors

d. cmd 2>+ /tmp/errors

e. cmd 2>> /tmp/errors


10. Assuming cmd is some simple command and its arguments (but no shell metacharacters), which of the following would be guaranteed to execute cmd, but generate no visible output?

a. cmd 2> /dev/null

b. cmd > /dev/null

c. (cmd)

d. cmd > /dev/null 2>&1

e. silent (cmd)

Chapter 3. Pipes

Key Concepts

• The stdout stream from one process can be connected to the stdin stream of another process, using what Unix calls a "pipe".

• Many commands in Unix are designed to operate as a filter, reading input from stdin and sending output to stdout.

• bash uses "|" to create a pipe between two commands.

Discussion

Pipes

In the previous Lessons, we have seen that a process's output can be redirected to somewhere other than the terminal display, or that a process can be asked to read input from some location other than the terminal keyboard. One of the most common, and most powerful, forms of redirection is a combination of the two, where the output (Standard Out) of one command is "piped" directly into the input (Standard In) of another command, forming what Linux (and Unix) refers to as a pipe.

When two commands are joined by a pipe, the stdout stream of the first process is tied directly to the stdin stream of the second process, so that multiple processes can be combined in a sequence. In order to create a pipe using bash, the two commands are joined with a vertical bar |. (On most keyboards, this character is found on the same key as the backslash, above the RETURN key.) All processes that are joined in a pipe are referred to as a process group.

As an example, consider prince, who is trying to find the largest files underneath the /etc directory. He begins by composing a find command that will list all files with a size greater than 100 Kbytes.

[prince@station prince]$ find /etc -size +100k 2>/dev/null
/etc/termcap
/etc/gconf/gconf.xml.defaults/schemas/desktop/gnome/interface/%gconf.xml
/etc/gconf/gconf.xml.defaults/schemas/apps/mailcheck_applet/prefs/%gconf.xml
/etc/gconf/gconf.xml.defaults/schemas/apps/tasklist_applet/prefs/%gconf.xml
...

Observing that the find command seems to list the files in no particular order, prince decides he would like the files to be listed alphabetically. He could redirect the output to a file, and then sort the file. Instead, he takes advantage of the fact that the sort command, when invoked without arguments, looks to Standard In for the data to sort. He pipes the output of his find command into sort.

[prince@station prince]$ find /etc -size +100k 2>/dev/null | sort
/etc/aep/aeptarg.bin
/etc/gconf/gconf.xml.defaults/schemas/apps/gedit-2/preferences/editor/save/%gconf.xml
/etc/gconf/gconf.xml.defaults/schemas/apps/gnomemeeting/general/%gconf.xml
...
/etc/makedev.d/cciss
/etc/makedev.d/dac960
/etc/squid/squid.conf
/etc/squid/squid.conf.default
/etc/termcap

The files are now listed in alphabetical order.


Filtering output using grep

The traditional Unix grep command is commonly used in pipes to reduce data to only the "interesting" parts. The grep command will be discussed in detail in a later Workbook. Here, we introduce grep in its simplest form.

The grep command is used to search for and extract lines which contain a specified string of text. For example, in the following, prince prints all lines that contain the text "root" from the /etc/passwd file.

[prince@station prince]$ grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

The first argument to the grep command is the string of text to be searched for, and any remaining arguments are files to be searched for the text. If the grep command is called with only one argument (a string to be searched for, but no files to search), it looks to Standard In as its source of data on which to operate.

In the following, prince has so many files in his home directory that he is having trouble keeping track of them. He's trying to find a directory called templates that he created a few months ago. He uses the locate command to help him find it.

[prince@station prince]$ locate templates
/etc/openldap/ldaptemplates.conf
/usr/share/doc/libxslt-1.0.27/html/libxslt-templates.html
/usr/share/doc/libxslt-1.0.27/templates.gif
/usr/share/doc/docbook-style-xsl-1.58.1/docsrc/templates.xml
/usr/share/man/man5/ldaptemplates.conf.5.gz
/usr/share/man/man3/ldap_free_templates.3.gz
/usr/share/man/man3/ldap_init_templates_buf.3.gz
/usr/share/man/man3/ldap_init_templates.3.gz
...

Unfortunately for prince, there are many files which contain the text templates in their name on the system, and prince becomes overwhelmed with lines and lines of output. In order to reduce the information to more relevant files, prince next takes stdout from the locate command, and creates a pipe to stdin of the grep command, "grepping" for the word "prince".

[prince@station prince]$ locate templates | grep prince
/home/prince/.kde/share/apps/quanta/templates
/home/prince/proj/templates

Because the grep command is not given a file to search, it looks to stdin, where it finds the stdout stream of the locate command. Filtering the stream, grep duplicates to its stdout only the lines that match the specified text, "prince". The rest are discarded. The user prince easily finds his directory under ~/proj, as well as another directory created by the application quanta.

Pipes and stderr

In the next example, prince is curious to see where he shows up in the system's configuration files, and "greps" for his name in the /etc directory.

[prince@station prince]$ grep prince /etc/*
grep: /etc/aliases.db: Permission denied
grep: /etc/at.deny: Permission denied
grep: /etc/default: Permission denied
/etc/group:music:x:205:elvis,blondie,prince,madonna,student
/etc/group:prince:x:502:
grep: /etc/group-: Permission denied


grep: /etc/group.lock: Permission denied
...
grep: /etc/lvmtab: Permission denied
/etc/passwd:prince:x:502:502::/home/prince:/bin/bash
grep: /etc/passwd-: Permission denied
grep: /etc/passwd.lock: Permission denied
...
grep: /etc/sudoers: Permission denied
/etc/termcap:# From: John Doe
grep: /etc/vsftpd.conf.rpmsave: Permission denied

Again, prince is overwhelmed by the amount of output from this command. He tries the same trick, "grepping" it down for all lines that contain the word "passwd".

[prince@station prince]$ grep prince /etc/* | grep passwd
grep: /etc/aliases.db: Permission denied
grep: /etc/at.deny: Permission denied
grep: /etc/default: Permission denied
grep: /etc/group-: Permission denied
grep: /etc/group.lock: Permission denied
...
grep: /etc/lvmtab: Permission denied
/etc/passwd:prince:x:502:502::/home/prince:/bin/bash
grep: /etc/passwd-: Permission denied
grep: /etc/passwd.lock: Permission denied
...

While stdout from the first grep command was appropriately filtered, stderr is unaffected, and still gets displayed to the screen. How would prince go about suppressing stderr as well?
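One straightforward answer (a sketch, not shown in the original transcript) is to discard the first command's stderr with the /dev/null technique from the previous chapter, before the pipe:

grep prince /etc/* 2>/dev/null | grep passwd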
Commands as filters
The concept of a pipe extends naturally, so that multiple commands can be used together, each reading information from stdin, somehow modifying or filtering the information, and passing the result to stdout. In a subsequent Workbook, you will find that there are many standard Linux (and Unix) commands that are designed for this purpose, including some that you are already familiar with: grep, head, tail, cut, sort, sed, and awk, to name a few.

Examples

Example 1. Listing Processes by Name

Often, one would like to list information about processes which are running a specific command. While ps aux tables a lot of information about currently running processes, the number of processes running on the machine can make the output overwhelming. The grep command can help simplify the output.

In the following, prince would like to list information about the processes which are implementing his web server, the httpd command. He lists all processes, but then reduces the output to only those lines which contain the text httpd.

[prince@station prince]$ ps aux | grep httpd
root 889 0.0 0.0 18248 100 ? S Sep22 0:00 /usr/sbin/httpd
apache 907 0.0 0.5 18436 1320 ? S Sep22 0:00 /usr/sbin/httpd
apache 913 0.0 0.7 18436 1952 ? S Sep22 0:00 /usr/sbin/httpd
apache 914 0.0 0.5 18436 1332 ? S Sep22 0:00 /usr/sbin/httpd
apache 1979 0.0 0.5 18360 1524 ? S Sep22 0:00 /usr/sbin/httpd
apache 1980 0.0 0.8 18388 2140 ? S Sep22 0:00 /usr/sbin/httpd
apache 1981 0.0 0.5 18360 1524 ? S Sep22 0:00 /usr/sbin/httpd
prince 4905 0.0 0.2 3572 640 pts/1 S 06:19 0:00 grep httpd
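Notice the last line of output: the grep process itself matches, because its own command line contains the text httpd. A common refinement (an aside, not part of the original example) is to filter that line back out with a second grep:

ps aux | grep httpd | grep -v grep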


Example 2. Searching Command History Efficiently

The user prince recently spent some time constructing a find command line which listed all large files underneath the /etc directory, including their sizes. Rather than repeating his efforts, he would like to see if the command is still in his history. Because his history contains hundreds of lines, he uses the grep command to help reduce the output.

[prince@station prince]$ history | grep find
  102  find /var -user elvis
  175  find -exec file {} \;
  434  find /etc -name *.conf | head
  675  find /etc -size +100k
  680  find /etc -size +100k -exec ls -s {} \; 2>/dev/null
  682  find -size +100k /etc
  683  find /etc -size +100k
  690  history | grep find
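If the history held many more matching lines, the output of grep could itself be piped through yet another filter. For example, tail would restrict the display to the most recent matches (a hypothetical variation on the command above):

[prince@station prince]$ history | grep find | tail -n 3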

He now locates the command line he wanted, and uses history substitution to easily repeat the command.

[prince@station prince]$ !680
find /etc -size +100k -exec ls -s {} \; 2>/dev/null
728 /etc/termcap
132 /etc/gconf/gconf.xml.defaults/schemas/desktop/gnome/interface/%gconf.xml
304 /etc/gconf/schemas/gedit.schemas
...
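As an aside, bash also supports history substitution by command prefix (a general shell feature, not part of the original transcript):

[prince@station prince]$ !find

This would re-run the most recent command beginning with find (in the history above, command 683), which is why prince uses the explicit event number !680 to repeat exactly the line he wants.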

Example 3. Unix philosophy: Simple Tools that Work Well Together

Linux, like Unix, is fundamentally based on the philosophy that complex systems should be created out of simple, specialized components that inter-operate easily. Following this philosophy, many standard Linux programs are designed to work as filters, reading data from a standard source, manipulating the data, and delivering the result to a standard destination.

This philosophy is important enough to be illustrated using a long, detailed example. The example will use commands you're probably not yet familiar with. Don't worry about the details of how to use the commands, but instead focus on how the commands work together, each command performing a small part, to produce the desired result.

Suppose a system administrator is examining DHCP lease messages in a well-known log file, /var/log/messages. If you are not familiar with DHCP, it is a protocol by which IP addresses can be assigned to machines based on the hardware (MAC) address that is built into a machine's Ethernet card. In the following lines extracted from /var/log/messages, focus on the line containing the word DHCPOFFER. Note that the IP address 192.168.0.11 is being offered to the Ethernet card with a hardware address of 00:08:74:37:c5:c3.

...
May 27 12:18:21 server1 dhcpd: DHCPACK on 192.168.0.110 to 00:09:6b:d0:ce:8f via eth0
May 27 12:18:27 server1 login(pam_unix)[1981]: session closed for user root
May 27 12:19:15 server1 named[24350]: listening on IPv4 interface eth1, 192.168.22.20#53
May 27 12:19:21 server1 vsftpd: warning: can't get client address: Bad file descriptor
May 27 12:19:21 server1 last message repeated 3 times
May 27 12:20:27 server1 dhcpd: DHCPDISCOVER from 00:08:74:37:c5:c3 via eth0
May 27 12:20:27 server1 dhcpd: DHCPOFFER on 192.168.0.11 to 00:08:74:37:c5:c3 via eth0
May 27 12:20:27 server1 dhcpd: DHCPREQUEST for 192.168.0.11 (192.168.0.254) from 00:08:74:37:c5:c3 via eth0
...

Without worrying about the details of the DHCP protocol, suppose the administrator wanted to extract from the log file a list of IP addresses and the hardware addresses to which they were offered. An experienced administrator might take the following approach.


Realizing that the /var/log/messages file is a very large file, in this case over 1000 lines in length, the administrator first uses the grep command to reduce the information to only the relevant lines.

[root@station log]# grep DHCPOFFER messages
May 27 11:46:22 server1 dhcpd: DHCPOFFER on 192.168.0.1 to 00:08:74:d9:41:9e via eth0
May 27 11:46:22 server1 dhcpd: DHCPOFFER on 192.168.0.1 to 00:08:74:d9:41:9e via eth0
May 27 11:46:30 server1 dhcpd: DHCPOFFER on 192.168.0.1 to 00:08:74:d9:41:9e via eth0
May 27 11:46:30 server1 dhcpd: DHCPOFFER on 192.168.0.1 to 00:08:74:d9:41:9e via eth0
May 27 11:48:40 server1 dhcpd: DHCPOFFER on 192.168.0.2 to 00:08:74:d9:41:32 via eth0
...
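Before filtering further, the administrator could confirm how many lines the grep actually leaves behind by piping its output through wc -l, which simply counts lines (a hypothetical check, not part of the original transcript):

[root@station log]# grep DHCPOFFER messages | wc -l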

This is a start, but the administrator is still dealing with too much information (the grep listing above shows only the first 5 of the 90 lines produced by this command). In order to extract just the relevant information, namely the IP address and hardware address, the administrator takes the output of the grep command, and pipes it to a command called sed, which strips the first few words from each line.

[root@station log]# grep DHCPOFFER messages | sed "s/^.*on //"
192.168.0.1 to 00:08:74:d9:41:9e via eth0
192.168.0.1 to 00:08:74:d9:41:9e via eth0
192.168.0.1 to 00:08:74:d9:41:9e via eth0
192.168.0.1 to 00:08:74:d9:41:9e via eth0
192.168.0.2 to 00:08:74:d9:41:32 via eth0
192.168.0.2 to 00:08:74:d9:41:32 via eth0
192.168.0.2 to 00:08:74:d9:41:32 via eth0
192.168.0.2 to 00:08:74:d9:41:32 via eth0
192.168.0.3 to 00:08:74:d9:40:a4 via eth0
...
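To see what the sed expression does in isolation, it can be applied to a single sample line from the log (a hypothetical quick test, not part of the original transcript). The substitution replaces everything from the beginning of the line up to and including the last occurrence of "on " with nothing:

[root@station log]# echo "May 27 12:20:27 server1 dhcpd: DHCPOFFER on 192.168.0.11 to 00:08:74:37:c5:c3 via eth0" | sed "s/^.*on //"
192.168.0.11 to 00:08:74:37:c5:c3 via eth0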

If you are not familiar with the sed command (and you probably aren't), don't worry about the details. Just note that sed's argument had the effect of removing the leading text from each line, up to the word "on". There's still too much extra text, however, so the administrator takes the output of this grep-sed combination, and pipes it to a command called awk.

[root@station log]$ grep DHCPOFFER messages | sed "s/^.*on //" | awk '{print $1,$3}'
...
192.168.0.14 00:08:74:34:fe:bc
192.168.0.5 00:08:74:34:fd:36
192.168.0.15 00:08:74:37:c8:eb
192.168.0.15 00:08:74:37:c8:eb
192.168.0.6 00:08:74:d9:41:a3
192.168.0.16 00:08:74:d9:41:ac
192.168.0.7 00:08:74:d9:41:53
192.168.0.16 00:08:74:d9:41:ac
192.168.0.17 00:08:74:35:00:e3
...
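In isolation, the effect of the awk program can be seen by applying it to a single line of sed's output (again, a hypothetical test, not part of the original transcript). It prints only the first and third whitespace-separated fields:

[root@station log]# echo "192.168.0.11 to 00:08:74:37:c5:c3 via eth0" | awk '{print $1,$3}'
192.168.0.11 00:08:74:37:c5:c3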

Again, without worrying about the details of the awk command, note that the result is to extract just the first and third columns from the previous output. In order to sort the data, and remove duplicate lines, the administrator now takes the output from the chain, and pipes it in turn through the commands sort and uniq.

[root@station log]$ grep DHCPOFFER messages | sed "s/^.*on //" | awk '{print $1,$3}' | sort | uniq
192.168.0.10 00:08:74:d9:40:95
192.168.0.1 00:08:74:d9:41:9e
192.168.0.110 00:09:6b:d0:ce:8f
192.168.0.11 00:08:74:37:c5:c3
192.168.0.12 00:08:74:d9:41:dd
192.168.0.13 00:08:74:35:00:d0
192.168.0.14 00:08:74:34:fe:bc
192.168.0.15 00:08:74:37:c8:eb
192.168.0.16 00:08:74:d9:41:ac
192.168.0.17 00:08:74:35:00:e3
192.168.0.2 00:08:74:d9:41:32
192.168.0.3 00:08:74:d9:40:a4
192.168.0.4 00:08:74:d9:3f:7f
192.168.0.5 00:08:74:34:fd:36
192.168.0.6 00:08:74:d9:41:a3
192.168.0.7 00:08:74:d9:41:53
192.168.0.8 00:08:74:d9:41:7b
192.168.0.9 00:08:74:35:00:1f

This is almost the list that the administrator wants, but the sort command didn't work quite right. The information is sorted, but it's sorted alphabetically, not by IP address. The administrator modifies the sort command with a couple of command line switches, specifying to sort numerically, keying on the fourth field, where fields are separated by a period:

[root@station log]$ grep DHCPOFFER messages | sed "s/^.*on //" | awk '{print $1, $3}' | sort -n -k4 -t. | uniq
192.168.0.1 00:08:74:d9:41:9e
192.168.0.2 00:08:74:d9:41:32
192.168.0.3 00:08:74:d9:40:a4
192.168.0.4 00:08:74:d9:3f:7f
192.168.0.5 00:08:74:34:fd:36
192.168.0.6 00:08:74:d9:41:a3
192.168.0.7 00:08:74:d9:41:53
192.168.0.8 00:08:74:d9:41:7b
192.168.0.9 00:08:74:35:00:1f
192.168.0.10 00:08:74:d9:40:95
192.168.0.11 00:08:74:37:c5:c3
192.168.0.12 00:08:74:d9:41:dd
192.168.0.13 00:08:74:35:00:d0
192.168.0.14 00:08:74:34:fe:bc
192.168.0.15 00:08:74:37:c8:eb
192.168.0.16 00:08:74:d9:41:ac
192.168.0.17 00:08:74:35:00:e3
192.168.0.110 00:09:6b:d0:ce:8f

This is the list that the administrator wanted, in the order that she wanted. She redirects all of this output to a file in her home directory.

[root@station log]$ grep DHCPOFFER messages | sed "s/^.*on //" | awk '{print $1,$3}' | sort -n -k4 -t. | uniq > ~/ip_dhcp.txt
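The sort switches are worth a second look: -t. tells sort to treat the period as a field separator, -k4 makes the fourth field (the last octet of the IP address) the sort key, and -n requests a numeric rather than alphabetic comparison. Their effect can be checked in isolation on a few sample addresses (a hypothetical test, not part of the original transcript):

[root@station log]$ printf "192.168.0.10\n192.168.0.2\n192.168.0.1\n" | sort -n -t. -k4
192.168.0.1
192.168.0.2
192.168.0.10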

In this example, an (admittedly experienced) administrator browsing a log file was able to spend a few minutes and develop a chain of commands that filtered the original information down to exactly the information that she wanted. She did this with a handful of tools that are in most Unix administrators' mental toolkits: grep, sed, awk, sort, and uniq.

If the administrator were using an operating system that wasn't designed around the philosophy "small tools that work together", she would have needed to rely on some programmer to develop the ip_mac_extractor utility, and possibly rely on that programmer to create a graphical user interface for the utility as well. Instead, because she was able to use the flexibility of the command line, she was able to handle the information herself.

Online Exercises

Lab Exercise

Objective: Use pipes to effectively filter information.

Estimated Time: 10 mins.


Specification

1. You would like to create a sorted list of all TCP services found in the file /etc/services. Pipe the output of the command grep tcp /etc/services into the command sort. Redirect the output of this pipe into the file ~/pipelab.txt.

2. Using the less pager, you would like to browse the output of the ls -R / command, viewing only files that contain the letter s. Compose a command line using two pipes to chain the commands ls -R /, grep s, and less. Leave the less pager in the foreground while you grade your exercise.

Deliverables

1. A file called ~/pipelab.txt, which contains the output of the command grep tcp /etc/services piped through the command sort.

2. An active less pager, which is browsing the output of the command ls -R /, piped through the command grep s.

Clean Up

After your Exercise has been graded, you may quit the less pager.

Questions

1. Which of the following command lines would reduce the output of the locate conf command to only files whose name or path contains the text python?

a. locate conf >> grep python

b. locate conf | grep python

c. locate conf < grep python

d. locate conf | grep < python

e. None of the above

2. Which of the following command lines would produce a sorted list of all processes which contain the text sshd?

a. ps aux | sort | grep sshd

b. ps aux | grep sshd | sort

c. grep sshd | ps aux | sort

d. sort | ps aux | grep sshd

e. A or B

Use the following transcript to answer the next 6 questions.

[elvis@station elvis]$ ls -R / 2> /dev/null | grep etc | less
[1]+  Stopped                 ls --color=tty -R / 2>/dev/null | grep etc | less
[elvis@station elvis]$ ps
  PID TTY          TIME CMD
 1603 pts/2    00:00:01 bash
 2391 pts/2    00:00:00 ls
 2392 pts/2    00:00:00 grep
 2393 pts/2    00:00:00 less
 2394 pts/2    00:00:00 ps
[elvis@station elvis]$ ls -l /proc/2391/fd
total 0
lrwx------    1 elvis    elvis          64 Sep 23 09:49 0 -> /dev/pts/2
l-wx------    1 elvis    elvis          64 Sep 23 09:49 1 -> pipe:[20966]
l-wx------    1 elvis    elvis          64 Sep 23 09:49 2 -> /dev/null

3. Which file (or pipe) is tied to stderr of the ls process?

a. A pipe to the grep process

b. /dev/null

c. /dev/pts/2

d. A pipe to the less process

e. None of the above

4. Which file (or pipe) is tied to stderr of the grep process?

a. /dev/null

b. /dev/pts/2

c. A pipe to the ls process

d. A pipe to the less process

e. None of the above

5. Which file (or pipe) is tied to stdin of the grep process?

a. A pipe to the less process

b. A pipe to the ls process

c. /dev/pts/2

d. /dev/null

e. None of the above

6. Which file (or pipe) is tied to stdin of the less process?

a. A pipe to the ls process

b. /dev/pts/2

c. /dev/null

d. A pipe to the grep process

e. None of the above


7. To what expression would the symbolic link /proc/2392/fd/0 resolve?

a. /dev/pts/2

b. pipe:[20966]

c. /dev/null

d. /tmp

e. /dev/tty2

8. When bash reports the stopped process group, the ls has been expanded to ls --color=tty. The --color=tty option tells the ls command to generate color control sequences only if its stdout is tied to a terminal. Which of the following is true?

a. The ls command generates color control sequences, which can be observed by the less pager.

b. The ls command does not generate color control sequences.

c. The ls command generates color control sequences, but they are filtered out by the grep command.

d. The ls command never generates color control sequences (the --color=tty is only included for purposes of backwards compatibility).

e. None of the above apply.

9. Which of the following command lines would allow the less pager to browse grep's error messages, as well as output?

a. grep root /etc/* 2>&1 | less

b. grep root /etc/* | less 2>&1

c. grep root /etc/* >>| less

d. grep root /etc/* 2| less

e. None of the above

The -j command line switch causes the ps command to generate "job control oriented" output. Use the following transcript to answer the next question.

[elvis@station elvis]$ ls -R / | grep etc | less

[1]+  Stopped                 ls --color=tty -R / | grep etc | less
[elvis@station elvis]$ ps -j
  PID  PGID   SID TTY          TIME CMD
 1215  1215  1215 pts/0    00:00:00 bash
 3242  3242  1215 pts/0    00:00:00 ls
 3243  3242  1215 pts/0    00:00:00 grep
 3244  3242  1215 pts/0    00:00:00 less
 3246  3246  1215 pts/0    00:00:00 ps

10. Which of the following would be the most reasonable expansion of the second column header, PGID?

a. Process Group ID

b. Parent Generation ID

c. Process Gender ID

d. Powder Gondola ID

e. None of the above
