Process Text Streams Using Filters

OBJECTIVE:

Candidates should be able to apply filters to text streams.

Key knowledge area(s): Send text files and output streams through text utility filters to modify the output using standard commands found in the GNU textutils package.

KEY FILES, TERMS, UTILITIES: unexpand hexdump tac

3 cat cat the editor

- used as a rudimentary text editor.

cat > short-message
we are curious
to meet
penguins in Prague
Ctrl+D

* Ctrl+D is used for ending interactive input.

4 cat cat the reader

More commonly used to flush text to stdout.

Options:

-n  number each line of output
-b  number only non-blank output lines
-A  show all non-printing characters: line ends appear as $, tabs as ^I, carriage returns as ^M

Example

cat /etc/resolv.conf
▶ search mydomain.org
  nameserver 127.0.0.1

5 tac tac reads back-to-front

This command is the same as cat except that the text is read from the last line to the first.

tac short-message
▶ penguins in Prague
  to meet
  we are curious
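The cat and tac slides can be tried without interactive typing. The sketch below is only an illustration: it assumes a scratch directory created with mktemp and uses printf in place of keyboard input; the variable names are made up for the example.

```shell
# Scratch directory so that no real files are touched (assumption).
workdir=$(mktemp -d)

# Same three lines as the interactive "cat > short-message" session.
printf '%s\n' 'we are curious' 'to meet' 'penguins in Prague' > "$workdir/short-message"

# cat prints the file from the first line to the last; -n numbers each line.
cat -n "$workdir/short-message"

# tac prints the same lines from the last to the first.
reversed=$(tac "$workdir/short-message")
printf '%s\n' "$reversed"
```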

6 head or tail using head or tail

- often used to analyze logfiles.
- by default, output 10 lines of text.

List the first 20 lines of /var/log/messages:

head -n 20 /var/log/messages
head -20 /var/log/messages

List the last 20 lines of /etc/aliases:

tail -20 /etc/aliases

7 head or tail

The tail utility has an added option that allows one to list the end of a text starting at a given line.

List text starting at line 25 in /var/log/messages:

tail -n +25 /var/log/messages

(Older versions of tail also accept the form tail +25.)

tail can continuously read a file using the -f option. This is most useful when you are expecting a file to be modified in real time.
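A small sketch of the head and tail forms on a file with known contents. The 30-line sample file and the variable names are assumptions made for the example; tail -f is left out because it does not terminate on its own.

```shell
# Build a 30-line sample file in a scratch directory (assumed names).
workdir=$(mktemp -d)
seq 1 30 > "$workdir/sample.log"

# First 3 lines of the file.
first=$(head -n 3 "$workdir/sample.log")

# Last 3 lines of the file.
last=$(tail -n 3 "$workdir/sample.log")

# Everything from line 25 to the end of the file.
from25=$(tail -n +25 "$workdir/sample.log")

printf 'first:\n%s\nlast:\n%s\nfrom line 25:\n%s\n' "$first" "$last" "$from25"
```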

8 wc

The wc utility counts the number of bytes, words, and lines in files.

Options for wc output:

-l        count number of lines
-w        count number of words
-c or -m  count number of bytes or characters

With no argument, wc will count what is typed in stdin.
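To see the three counters side by side, one possible sketch is shown below. The sample file is invented for the example; reading it through a redirection keeps the file name out of the wc output.

```shell
# Sample file: 3 lines, 6 words, 28 bytes (made-up content).
workdir=$(mktemp -d)
printf '%s\n' 'one two' 'three four five' 'six' > "$workdir/words.txt"

# With input on stdin, wc prints only the number, not a file name.
lines=$(wc -l < "$workdir/words.txt")
words=$(wc -w < "$workdir/words.txt")
bytes=$(wc -c < "$workdir/words.txt")

echo "lines=$lines words=$words bytes=$bytes"
```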

9 nl

By default, the nl utility produces the same output as cat -b: only non-blank lines are numbered.

Number all lines including blanks

nl -ba /etc/lilo.conf

Number only lines with text

nl -bt /etc/lilo.conf
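The difference between the two numbering styles shows up as soon as the input contains a blank line. The following sketch uses an invented three-line file in place of /etc/lilo.conf.

```shell
# Three lines, the second one blank (made-up content).
workdir=$(mktemp -d)
printf 'first\n\nthird\n' > "$workdir/lines.txt"

# -ba numbers every line, so "third" gets number 3.
all=$(nl -ba "$workdir/lines.txt")

# -bt numbers only lines with text, so "third" gets number 2.
text_only=$(nl -bt "$workdir/lines.txt")

printf '%s\n---\n%s\n' "$all" "$text_only"
```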

10 expand / unexpand

The expand command is used to replace TABs with spaces.

One can also use unexpand for the reverse operation.
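A round trip through both commands, as a rough sketch. It relies on the default tab stops of 8 columns; the variable names are illustrative only.

```shell
# A line that starts with one TAB character.
tabbed=$(printf '\tindented')

# expand turns the TAB into spaces (tab stops every 8 columns by default).
spaced=$(printf '%s\n' "$tabbed" | expand)

# unexpand converts the leading run of spaces back into a TAB.
restored=$(printf '%s\n' "$spaced" | unexpand)

# cat -A makes the difference visible: a TAB is shown as ^I.
printf '%s\n' "$tabbed" "$spaced" | cat -A
```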

11 hexdump

A common tool for viewing binary files. Another is od (octal dump).
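As an illustration, three bytes can be dumped in two ways. The sketch uses od rather than hexdump, on the assumption that od is more likely to be installed because it ships with coreutils.

```shell
# Dump "A", "B" and a newline as hexadecimal byte values.
# -An drops the address column, -tx1 prints one hex value per byte.
dump=$(printf 'AB\n' | od -An -tx1)
echo "$dump"

# The same bytes shown as characters, with the newline printed as \n.
printf 'AB\n' | od -c
```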

12 split

The split tool can split a file into smaller files using criteria such as size or number of lines.

split -l 5 /etc/passwd

This will create files called xaa, xab, xac, xad ... each containing at most 5 lines (the last file may be shorter). It is possible to give a meaningful prefix for the file names (other than 'x'), such as 'passwd-5', on the command line:

split -l 5 /etc/passwd passwd-5

This creates files identical to the ones above (xaa, xab, xac, xad ...) but the names are now passwd-5aa, passwd-5ab, passwd-5ac, passwd-5ad ...
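The effect of the line limit and the prefix can be checked on a small file. In this sketch the 12-line input and the "part-" prefix are assumptions chosen for the example.

```shell
# A 12-line input file in a scratch directory (assumed names).
workdir=$(mktemp -d)
seq 1 12 > "$workdir/numbers.txt"

# Split into pieces of at most 5 lines, using the prefix "part-".
split -l 5 "$workdir/numbers.txt" "$workdir/part-"

# 12 lines give three pieces: part-aa (5), part-ab (5), part-ac (2).
wc -l "$workdir"/part-*

# Concatenating the pieces in name order rebuilds the original file.
cat "$workdir"/part-* | cmp - "$workdir/numbers.txt" && rebuilt=yes
```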

13 uniq

The uniq tool will send to STDOUT only one version of consecutive identical lines.

uniq > /tmp/UNIQUE
line 1
line 2
line 2
line 3
line 3
line 3
line 1
^D

14 uniq

The file /tmp/UNIQUE has the following content:

cat /tmp/UNIQUE
line 1
line 2
line 3
line 1

Non-consecutive identical lines are still printed to STDOUT. To remove them as well, sort the input first:

sort | uniq > /tmp/UNIQUE
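The same seven input lines can be fed in from a variable to compare uniq with and without a preceding sort. This is a sketch only; the -c option at the end is an addition that counts occurrences.

```shell
# The seven lines used in the interactive example.
input=$(printf '%s\n' 'line 1' 'line 2' 'line 2' 'line 3' 'line 3' 'line 3' 'line 1')

# uniq alone collapses only adjacent repeats, so "line 1" appears twice.
adjacent=$(printf '%s\n' "$input" | uniq)

# Sorting first makes all identical lines adjacent.
all_unique=$(printf '%s\n' "$input" | sort | uniq)

# -c prefixes each line with the number of times it occurred.
printf '%s\n' "$input" | sort | uniq -c
```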

15 cut

The cut utility can extract a range of characters or fields from each line of a text.

The -c option is used to manipulate characters.

Syntax: cut -c {range1,range2}

Example

cut -c5-10,15- /etc/

16 cut

Syntax: cut -d {delimiter} -f {fields}

Example

cut -d: -f 1,7 --output-delimiter=" " /etc/passwd

The default output-delimiter is the same as the original input delimiter. The --output-delimiter option allows you to change this.
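Both modes of cut can be tried on a single line in /etc/passwd format. The account shown is made up for the example, and --output-delimiter is assumed to be available, which is the case for GNU cut.

```shell
# One line in /etc/passwd format (hypothetical account).
entry='tux:x:1000:1000:Tux Penguin:/home/tux:/bin/bash'

# Character mode: characters 1 to 3.
chars=$(printf '%s\n' "$entry" | cut -c1-3)

# Field mode: fields 1 and 7, with ":" as the input delimiter.
fields=$(printf '%s\n' "$entry" | cut -d: -f1,7)

# Same fields, but separated by a space on output.
spaced=$(printf '%s\n' "$entry" | cut -d: -f1,7 --output-delimiter=' ')

printf '%s\n' "$chars" "$fields" "$spaced"
```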

17 join and paste

paste concatenates two files next to each other.

Syntax: paste text1 text2

With join you can further specify which fields you are considering.

Syntax:

join -1 {field_num} -2 {field_num} text1 text2

or, in the older form:

join -j1 {field_num} -j2 {field_num} text1 text2

Text is sent to stdout only if the specified fields match.

join expects both files to be sorted on the join field. Comparison is done one line at a time, so with unsorted input, matches that occur later in a file can be missed.
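A short sketch of the difference between the two tools. The two sample files are invented and are already sorted on their first field, since join works on sorted input.

```shell
# Two small files keyed on their first field (made-up content).
workdir=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob' '3 carol' > "$workdir/names.txt"
printf '%s\n' '1 prague' '3 oslo' > "$workdir/cities.txt"

# paste glues line N of each file together, separated by a TAB,
# whether or not the lines have anything in common.
pasted=$(paste "$workdir/names.txt" "$workdir/cities.txt")

# join prints only the lines whose first fields match in both files.
joined=$(join -1 1 -2 1 "$workdir/names.txt" "$workdir/cities.txt")

printf '%s\n---\n%s\n' "$pasted" "$joined"
```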

18 sort

By default, sort will arrange a text in alphabetical order.

To perform a numerical sort use the -n option.
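The two orderings differ as soon as the numbers have different lengths. A minimal sketch, with four sample values chosen for the example:

```shell
# Four numbers, one per line.
numbers=$(printf '%s\n' 10 9 100 1)

# Default sort compares character by character, so "100" sorts before "9".
alpha=$(printf '%s\n' "$numbers" | sort)

# -n compares the numeric values instead.
numeric=$(printf '%s\n' "$numbers" | sort -n)

printf 'alphabetical:\n%s\nnumeric:\n%s\n' "$alpha" "$numeric"
```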

19 fmt

You can modify the number of characters per line of output using fmt. By default, fmt joins lines and outputs lines of up to 75 characters.

Options
-w  number of characters per line
-s  split long lines but do not refill
-u  place one space between each word and two spaces at the end of a sentence
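As a rough illustration of -w, the sketch below refills one long line to a width of 20 characters. The sample text and the width are arbitrary choices for the example.

```shell
# One line of ten words (made-up text).
text='one two three four five six seven eight nine ten'

# Refill the text so that no output line is longer than 20 characters.
wrapped=$(printf '%s\n' "$text" | fmt -w 20)

printf '%s\n' "$wrapped"
```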

20 pr

Long files can be paginated to fit a given size of paper with the pr utility.

One can control the page length (default is 66 lines) and page width (default 72 characters) as well as the number of columns.

When outputting text to multiple columns, each column gets an equal share of the defined page width and lines that do not fit are truncated.

This means that characters are dropped unless the original text is edited to avoid this.
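Multi-column output can be tried on a short list. This sketch assumes GNU pr; the -t option is used to omit the page header and trailer so that only the columns are printed.

```shell
# Print nine short lines in three columns, filled top to bottom.
# -3 selects three columns, -t omits page headers and trailers.
columns=$(seq 1 9 | pr -3 -t)

printf '%s\n' "$columns"
```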

21 tr

The tr utility translates one set of characters into another.

Example: changing uppercase letters into lowercase

tr 'A-Z' 'a-z' < file.txt

Replacing delimiters in /etc/passwd:

tr ':' ' ' < /etc/passwd

tr takes only two arguments, the two character sets. The file is not an argument, so input must arrive on stdin through a redirection or a pipe.
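Because tr reads only from stdin, it fits naturally at the end of a pipe. A small sketch with made-up input strings:

```shell
# Translate every uppercase letter to its lowercase counterpart.
upper='Penguins In PRAGUE'
lower=$(printf '%s\n' "$upper" | tr 'A-Z' 'a-z')

# Replace each ":" delimiter with a space.
spaced=$(printf '%s\n' 'root:x:0:0' | tr ':' ' ')

printf '%s\n' "$lower" "$spaced"
```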

22 sed

The sed utility is most often used to search and replace patterns in text. It supports most regular expressions.

Syntax: sed [options] 'command' [INPUTFILE]

The input file is optional since sed also works on file redirections and pipes.

Example: (using file MODIF)

Delete all commented lines:

sed '/^#/ d' MODIF

The search pattern is placed between the two slashes //.

23 sed

Example continued: (using file MODIF)

Substitute /dev/hda1 by /dev/sdb3:

sed 's/\/dev\/hda1/\/dev\/sdb3/g' MODIF

The s in the command stands for 'substitute'. The g stands for 'globally' and forces the substitution to take place throughout each line.

If the line contains the keyword KEY then substitute ':' with ';' globally:

sed '/KEY/ s/:/;/g' MODIF
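The delete and substitute commands can be checked against a small stand-in for the MODIF file. The file contents below are invented for the example. The substitution uses "|" as the separator instead of "/", an alternative that sed accepts and that avoids escaping every slash in a path.

```shell
# A small stand-in for the MODIF file (contents are made up).
workdir=$(mktemp -d)
cat > "$workdir/MODIF" <<'EOF'
# mount table
/dev/hda1 /boot ext2
KEY a:b:c
other a:b:c
EOF

# Delete all commented lines.
no_comments=$(sed '/^#/ d' "$workdir/MODIF")

# Substitute /dev/hda1 by /dev/sdb3, with "|" as the separator.
renamed=$(sed 's|/dev/hda1|/dev/sdb3|g' "$workdir/MODIF")

# Change ":" to ";" only on lines that contain KEY.
keyed=$(sed '/KEY/ s/:/;/g' "$workdir/MODIF")

printf '%s\n' "$keyed"
```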

24 sed

You can issue several commands, each starting with -e. Example: (1) delete all blank lines, then (2) substitute 'OLD' by 'NEW' in the file MODIF

sed -e '/^$/ d' -e 's/OLD/NEW/g' MODIF

These commands can also be written to a file, then each line is interpreted as a new command to execute (no quotes are needed).

An example COMMANDS file:

1 s/old/new/
/keyword/ s/old/new/g
23,25 d

The syntax to use this COMMANDS file is:

sed -f COMMANDS MODIF
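The -e and -f forms should give the same result when they carry the same commands. The sketch below checks this on an invented three-line MODIF file; the COMMANDS file here holds the two commands from the -e example rather than the sample file shown above.

```shell
# A three-line stand-in for MODIF, with one blank line (made-up content).
workdir=$(mktemp -d)
printf '%s\n' 'OLD one' '' 'OLD two' > "$workdir/MODIF"

# Two commands on the command line, each introduced by -e.
inline=$(sed -e '/^$/ d' -e 's/OLD/NEW/g' "$workdir/MODIF")

# The same two commands stored in a file, one per line, without quotes.
printf '%s\n' '/^$/ d' 's/OLD/NEW/g' > "$workdir/COMMANDS"
from_file=$(sed -f "$workdir/COMMANDS" "$workdir/MODIF")

printf '%s\n' "$inline"
```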

25 sed

Summary of Options

Command-line flags
-e  Execute the following command
-f  Read commands from a file
-n  Do not print lines unless a command requests it

sed commands
d  Delete an entire line
r  Read a file and append to output
s  Substitute
w  Output to a file

26

cat – concatenate files and print on the standard output
cut – remove sections from each line of files
expand – convert tabs to spaces
fmt – simple optimal text formatter
head – output the first part of files
join – join lines of two files on a common field
nl – number lines of files
od – dump files in octal and other formats
paste – merge lines of files
sort – sort lines of text files
split – split a file into pieces
tac – concatenate and print files in reverse
tail – output the last part of files
tr – translate or delete characters
unexpand – convert spaces to tabs
uniq – remove duplicate lines from a sorted file
wc – print the number of bytes, words, and lines in files
