Process Text Streams Using Filters
OBJECTIVE: Candidates should be able to apply filters to text streams.

KEY KNOWLEDGE AREAS: Send text files and output streams through text utility filters to modify the output using standard UNIX commands found in the GNU textutils package.

KEY FILES, TERMS, UTILITIES: cat, nl, tail, cut, paste, tr, expand, pr, unexpand, fmt, sed, uniq, head, sort, wc, hexdump, split, join, tac

cat

cat the editor: cat can be used as a rudimentary text editor.

    cat > short-message
    we are curious
    to meet
    penguins in Prague
    Ctrl+D

Ctrl+D ends interactive input.

cat the reader: more commonly, cat is used to send the contents of a file to stdout.

Options:
    -n  number each line of output
    -b  number only non-blank output lines
    -A  show non-printing characters, including carriage returns (^M), tabs (^I) and line ends ($)

Example:

    cat /etc/resolv.conf
    ▶ search mydomain.org
      nameserver 127.0.0.1

tac

tac reads back-to-front. This command is the same as cat except that the text is read from the last line to the first.

    tac short-message
    ▶ penguins in Prague
      to meet
      we are curious

head or tail

head and tail are often used to analyze logfiles. By default, each outputs 10 lines of text.

List the first 20 lines of /var/log/messages:

    head -n 20 /var/log/messages
    head -20 /var/log/messages

List the last 20 lines of /etc/aliases:

    tail -20 /etc/aliases

The tail utility has an added option that lists the end of a text starting at a given line. List text starting at line 25 in /var/log/messages:

    tail -n +25 /var/log/messages

(Older versions of tail also accept the obsolete form tail +25.)

tail can continuously read a file using the -f option. This is most useful when you are expecting a file to be modified in real time.

wc

The wc utility counts the number of bytes, words, and lines in files.

Options for wc output:
    -l  count number of lines
    -w  count number of words
    -c  count number of bytes
    -m  count number of characters

With no file argument, wc counts what it reads from stdin.
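The head, tail, tac and wc options described above can be tried together on a small file. The following is a minimal sketch, assuming a GNU userland; the file sample.txt and the variable names are hypothetical and exist only for illustration.

```shell
# Create a hypothetical five-line sample file in a temporary directory.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"
printf 'one\ntwo\nthree\nfour\nfive\n' > "$sample"

first_two=$(head -n 2 "$sample")     # first two lines: one, two
last_two=$(tail -n 2 "$sample")      # last two lines: four, five
from_third=$(tail -n +3 "$sample")   # from line 3 to the end: three, four, five
reversed=$(tac "$sample")            # last line first: five ... one
line_count=$(wc -l < "$sample")      # reading from stdin omits the file name

printf '%s\n' "$first_two" "$last_two" "$from_third" "$reversed" "$line_count"

# Remove the temporary directory again.
rm -r "$tmpdir"
```

Redirecting the file into wc, rather than naming it as an argument, keeps the file name out of the output, which is convenient when the count is stored in a variable.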
nl

The nl utility has the same output as cat -b.

Number all lines, including blank ones:

    nl -ba /etc/lilo.conf

Number only lines with text:

    nl -bt /etc/lilo.conf

expand / unexpand

The expand command replaces TABs with spaces. unexpand performs the reverse operation.

hexdump

A common tool for viewing binary files. Another is od (octal dump).

split

The split tool can split a file into smaller files using criteria such as size or number of lines.

    split -l 5 /etc/passwd

This creates files called xaa, xab, xac, xad ... Each file contains at most 5 lines (the last one may contain fewer).

It is possible to give the files a more meaningful prefix than 'x', such as 'passwd-5', on the command line:

    split -l 5 /etc/passwd passwd-5

This creates files identical to the ones above (xaa, xab, xac, xad ...), but the names are now passwd-5aa, passwd-5ab, passwd-5ac, passwd-5ad ...

uniq

The uniq tool sends to STDOUT only one copy of consecutive identical lines.

    uniq > /tmp/UNIQUE
    line 1
    line 2
    line 2
    line 3
    line 3
    line 3
    line 1
    ^D

The file /tmp/UNIQUE has the following content:

    cat /tmp/UNIQUE
    line 1
    line 2
    line 3
    line 1

Non-consecutive identical lines are still printed to STDOUT. To remove them as well, sort the input first:

    sort | uniq > /tmp/UNIQUE

cut

The cut utility can extract a range of characters or fields from each line of a text.

The -c option is used to manipulate characters.

Syntax: cut -c {range1,range2}

Example:

    cut -c5-10,15- /etc/passwd

The -d and -f options are used to manipulate fields.

Syntax: cut -d {delimiter} -f {fields}

Example:

    cut -d: -f 1,7 --output-delimiter=" " /etc/passwd

The default output delimiter is the same as the input delimiter. The --output-delimiter option allows you to change this.

join and paste

paste merges two files side by side, line by line.

Syntax: paste text1 text2

With join you can further specify which fields you are considering.
Syntax:

    join -j1 {field_num} -j2 {field_num} text1 text2

or

    join -1 {field_num} -2 {field_num} text1 text2

Text is sent to stdout only if the specified fields match. join expects both files to be sorted on the join field: lines are compared in order, so matches that appear out of order later in a file can be missed.

sort

By default, sort arranges a text in alphabetical order. To perform a numerical sort, use the -n option.

fmt

You can modify the number of characters per line of output using fmt. By default, fmt joins lines and outputs lines of up to 75 characters.

Options:
    -w  number of characters per line
    -s  split long lines but do not refill
    -u  place one space between words and two spaces at the end of a sentence

pr

Long files can be paginated to fit a given size of paper with the pr utility. One can control the page length (default is 66 lines) and page width (default 72 characters) as well as the number of columns.

When outputting text to multiple columns, each column is truncated to an equal share of the defined page width. This means that characters are dropped unless the original text is edited to avoid this.

tr

The tr utility translates one set of characters into another.

Example, changing uppercase letters into lowercase:

    tr 'A-Z' 'a-z' < file.txt

Replacing delimiters in /etc/passwd:

    tr ':' ' ' < /etc/passwd

tr takes only the two character sets as arguments. The file is not an argument; tr reads from stdin, so the file is supplied by redirection.

sed

The sed utility is most often used to search and replace patterns in text. It supports most regular expressions.

Syntax: sed [options] 'command' [INPUTFILE]

The input file is optional since sed also works on file redirections and pipes.

Example (using file MODIF). Delete all commented lines:

    sed '/^#/ d' MODIF

The search pattern is placed between the two slashes.

Example continued (using file MODIF). Substitute /dev/hda1 by /dev/sdb3:

    sed 's/\/dev\/hda1/\/dev\/sdb3/g' MODIF

The s in the command stands for 'substitute'.
The g stands for "globally" and forces the substitution to take place throughout each line.

If the line contains the keyword KEY, then substitute ':' with ';' globally:

    sed '/KEY/ s/:/;/g' MODIF

You can issue several commands, each starting with -e.

Example: (1) delete all blank lines, then (2) substitute 'OLD' by 'NEW' in the file MODIF:

    sed -e '/^$/ d' -e 's/OLD/NEW/g' MODIF

These commands can also be written to a file; each line is then interpreted as a new command to execute (no quotes are needed).

An example COMMANDS file:

    1 s/old/new/
    /keyword/ s/old/new/g
    23,25 d

The syntax to use this COMMANDS file is:

    sed -f COMMANDS MODIF

sed: Summary of Options

Command-line flags:
    -e  Execute the following command
    -f  Read commands from a file
    -n  Suppress automatic printing of lines

sed commands:
    d  Delete an entire line
    r  Read a file and append to output
    s  Substitute
    w  Write output to a file

Summary

    cat - concatenate files and print on the standard output
    cut - remove sections from each line of files
    expand - convert tabs to spaces
    fmt - simple optimal text formatter
    head - output the first part of files
    join - join lines of two files on a common field
    nl - number lines of files
    od - dump files in octal and other formats
    paste - merge lines of files
    sort - sort lines of text files
    split - split a file into pieces
    tac - concatenate and print files in reverse
    tail - output the last part of files
    tr - translate or delete characters
    unexpand - convert spaces to tabs
    uniq - remove duplicate lines from a sorted file
    wc - print the number of bytes, words, and lines in files
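In practice, several of the filters summarised above are often chained with pipes. The following is a minimal sketch, assuming a small passwd-style file; the file accounts.txt, its contents and the variable names are hypothetical and are not taken from a real system.

```shell
# Build a hypothetical passwd-style sample file in a temporary directory.
tmpdir=$(mktemp -d)
accounts="$tmpdir/accounts.txt"
printf '%s\n' \
  '# sample accounts' \
  'alice:x:1000:1000::/home/alice:/bin/bash' \
  'bob:x:1001:1001::/home/bob:/bin/sh' \
  'carol:x:1002:1002::/home/carol:/bin/bash' > "$accounts"

# sed drops the comment line, cut keeps field 7 (the login shell),
# and sort | uniq reduces the result to one line per distinct shell.
shells=$(sed '/^#/ d' "$accounts" | cut -d: -f7 | sort | uniq)

# cut keeps fields 1 and 7, then tr replaces the ':' delimiter with a space.
names=$(sed '/^#/ d' "$accounts" | cut -d: -f1,7 | tr ':' ' ')

printf '%s\n' "$shells" "$names"

# Remove the temporary directory again.
rm -r "$tmpdir"
```

Sorting before uniq matters here: the two /bin/bash lines are not adjacent in the input, so uniq alone would keep both.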