Wildcards and Regular Expressions
Hour 9

Objectives
- Regular expressions
- File name wildcards and hiding wildcards from the shell
- Searching for strings and patterns: grep, fgrep, and egrep

Copyright © 1998-2002 Delroy A. Brinkerhoff. All Rights Reserved.
Hour 9 Unix Slide 1 of 12

Regular Expressions: a formal language
- Formal (computer) languages are categorized by their strength (i.e., by the complexity of the grammars they accept)
- Regular expressions are the simplest of these languages
- Regular expressions are formed from ordinary characters and metacharacters
  - Some characters just represent themselves
  - Metacharacters have an extended, regular expression meaning
  - The set of metacharacters and their meanings vary between regular expression languages (i.e., not every Unix command accepts the same regular expression language)

Wildcards: file name shortcuts
- Wildcard characters
  - A simplified form (subset) of regular expressions
  - A shortcut for specifying
    - a single file name
    - multiple file names with one expression
  - Implemented (expanded) by the shell (Bourne, Korn, C, and others)
    - Wildcards therefore work with any command (including user-written programs) that accepts multiple file names on the command line
- If a command must interpret these characters itself, hide them from the shell with quotation marks
  - " (double quotation marks) inhibit wildcard expansion
  - ' (single quotation marks) inhibit wildcard expansion, variable substitution ($varname), and command substitution (introduced later)

Wildcard Metacharacters: selecting related files
- * (asterisk, or “splat” in Unix) matches zero or more occurrences of any character
  - ls *
    lists all non-dot files (much like plain ls, except that ls lists the contents of any directory that * matches)
  - cp c*s bak
    copies all files whose names begin with ‘c’ and end with ‘s’ to directory bak
- ? (question mark) matches any one character
  - cp *.? bak
    copies all files whose names end with a dot (period) followed by a single character
- [xyz] matches one character: x, y, or z
  - cp [ABC]* /bak
    copies all files whose names begin with ‘A,’ ‘B,’ or ‘C’ to the /bak directory
- [a-z] matches one character in the range a to z
  - ls [A-Z]*.c
    lists all files whose names begin with a capital letter and end with a .c extension

Wildcard Examples
% ls -a
.       .login    DOCS       Mail         prog.c   wc.c
..      .profile  DOCS.bak   Mandel.java  sprio.z  wc.java
.TRASH  Count.p   Hypo.java  pgp.Z        wc.asm   wc.p
% ls *
Count.p   Hypo.java    pgp.Z    wc.asm   wc.p
DOCS      Mail         prog.c   wc.c
DOCS.bak  Mandel.java  sprio.z  wc.java
% ls *.?
Count.p  pgp.Z  prog.c  sprio.z  wc.c  wc.p
% ls [A-Z]*
Count.p  DOCS  DOCS.bak  Hypo.java  Mail  Mandel.java
% ls *.c
prog.c  wc.c
% ls *.[zZ]
pgp.Z  sprio.z
% ls wc.*
wc.asm  wc.c  wc.java  wc.p
% ls w*a
wc.java

Other Wildcards: not supported by all (especially older) shells
- Supported by the Bourne and Korn shells
- [!xyz] matches any single character except ‘x,’ ‘y,’ or ‘z’
% ksh
$ ls [!Dw]*
Count.p    Mail         pgp.Z   sprio.z
Hypo.java  Mandel.java  prog.c

The grep Command: pattern matching with a regular expression processor
- grep [ -ivlnw ] re-pattern [file ...]
  - -i case insensitive
  - -v invert the test (print lines that don’t match)
  - -l list the names of files that contain a match, but don’t print the matched lines
  - -n print the line number with each matched line
  - -w match whole words only
- Input is line-oriented text
- Searches each line for a specified regular expression pattern
  - Processes limited regular expressions like those described in the regexp(5) manual page
- Output (by default) is every line that contains a match for the pattern

grep Examples: simple regular expressions
- grep BUFSIZ *.c
  - Searches for the pattern or string “BUFSIZ” in all files whose names end with .c
  - This pattern contains no regular expression metacharacters
  - *.c is interpreted by the shell, not by grep
- grep "Cranston Snort" /etc/passwd
  - Searches for the string “Cranston Snort” in /etc/passwd
  - The quotation marks are required to “hide” the space from the shell
    - without them, grep would search for Cranston in two files: Snort and /etc/passwd
    - single quotation marks also work
- grep -v Wilson $HOME/.phone
  - Searches the .phone file in the user’s home directory for “Wilson”
  - Prints the lines that do not contain “Wilson”
  - The shell expands the HOME environment variable

Regular Expressions
- See the regexp(5) or ed(1) man pages for more regular expressions
- C matches the character C exactly
- \C escapes C (treats C as a “normal” character)
- . matches any single character except new-line
- ^R matches regular expression R at the beginning of the line
- R$ matches regular expression R at the end of the line
- R* repeats the preceding regular expression R zero or more times (in ab*, only the b is repeated)
- [xyz] matches one of x, y, or z
- [^xyz] matches any one character except x, y, or z
- [a-z] matches any one character from a to z
- [^a-z] matches any one character not in the range a to z

Advanced grep Examples: filtering out non-matching lines
- The regular expression pattern must be hidden (quoted) from the shell
- grep '^Count' prog.c
  - “Count” must be at the beginning of the line
- grep 'Count$' prog.c
  - “Count” must be at the end of the line
- grep '^Count\$' prog.c
  - “Count$” must be at the beginning of the line; the $ is matched exactly
- make | grep '[Ee]rror'
  - Reads standard input (the output of make) and searches for “Error” or “error”
  - Only make’s standard output goes through the pipe; messages written to standard error bypass grep unless redirected (e.g., make 2>&1 | grep '[Ee]rror' in the Bourne and Korn shells)
- grep 'A.*y' /usr/dict/words
  - Searches the on-line dictionary for words that contain an ‘A’ followed somewhere later by a ‘y’
  - grep '^A.*y$' /usr/dict/words restricts the search to words that start with ‘A’ and end with ‘y’

The fgrep and egrep Commands: other regular expression processors
- fgrep [ -ivlnw ] [ -f pattern-file ] string [file ...]
  - Fast grep: searches only for fixed strings (i.e., no regular expressions)
  - Same options as grep
  - -f pattern-file takes the list of patterns from pattern-file
- egrep [ -ivlnw ] [ -f file ] re-pattern [file ...]
  - Extended (regular expression) grep
  - Same options as grep
  - -f file takes the list of full regular expressions from file
  - Accepts full regular expressions as described on the regexp(5) manual page, plus the following
    - R+ matches one or more occurrences of the full regular expression R
    - R? matches zero or one occurrence of the full regular expression R
    - R1|R2 matches regular expression R1 or R2
    - (R) groups regular expression R for other RE operators

egrep Examples: great for crossword puzzles
- egrep -i 'bomb|explosive|terrorist' mail/*
  - Searches for any of the words “bomb,” “explosive,” or “terrorist”
  - Case is ignored
  - Note that there are no spaces around ‘|’ (a space would become part of the pattern)
- egrep '(oo)+k' /usr/dict/words
  - Searches for words containing “ook,” “ooook,” “ooooook,” ... (one or more “oo” pairs followed by ‘k’)
- egrep 'school(book)?' /usr/dict/words
  - Searches for words containing “school,” optionally followed by “book”; anchoring it as '^school(book)?$' matches only “school” and “schoolbook”
- egrep '^[Ss].....te....r$' /usr/dict/words
  - Searches for a word 13 letters long that
    - begins with S or s
    - has “te” as the 7th and 8th letters
    - ends with ‘r’
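The "Wildcard Examples" slide can be reproduced with a short script. This is a minimal sketch, assuming a POSIX shell and mktemp -d (available on Linux, the BSDs, and macOS); the file names come from the slide, but every entry is created here as an empty regular file. It uses echo rather than ls because echo simply prints the argument list the shell built, which makes the shell's expansion visible. LC_ALL=C is an added assumption that fixes the sort order (capitals first) and makes [A-Z] mean ASCII capitals only.

```shell
#!/bin/sh
# Rebuild the sample directory from the "Wildcard Examples" slide in a scratch
# directory. Names are from the slide; all entries are empty regular files here.
LC_ALL=C; export LC_ALL      # byte-order sorting; [A-Z] means ASCII capitals only
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT   # remove the scratch directory when the script ends
cd "$demo" || exit 1
touch .login .profile .TRASH Count.p DOCS DOCS.bak Hypo.java Mail \
      Mandel.java pgp.Z prog.c sprio.z wc.asm wc.c wc.java wc.p

# echo just prints its arguments, so each line shows the list the shell built
echo *          # all non-dot names
echo *.?        # a dot followed by exactly one character
echo [A-Z]*     # names that start with a capital letter
echo *.[zZ]     # names that end in .z or .Z
echo wc.*       # names that start with "wc."
echo w*a        # names that start with w and end with a
echo [!Dw]*     # negated bracket: names that do not start with D or w
```

Each output line lists the same names the slide shows for the matching ls command, printed on one line instead of in columns. None of the dot files appear, because a leading dot must be matched explicitly.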
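The quoting rules on the "Wildcards" slide, and the reason grep "Cranston Snort" /etc/passwd needs quotation marks, can be checked directly. This is a sketch assuming a POSIX shell; nargs is a hypothetical helper defined only for this example, and the $(...) form of command substitution is the POSIX and Korn shell spelling (the original Bourne shell used backquotes).

```shell
#!/bin/sh
# Show which shell metacharacters survive each kind of quoting.
LC_ALL=C; export LC_ALL
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch a.c b.c                # two sample files for the wildcard to match
name=world

# nargs (hypothetical helper): prints how many arguments it received
nargs() { echo "$#"; }

echo *.c                     # shell expands the wildcard: a.c b.c
echo "*.c"                   # double quotes: no expansion, prints *.c
echo '*.c'                   # single quotes: no expansion, prints *.c
echo "hello $name"           # double quotes still substitute: hello world
echo 'hello $name'           # single quotes do not: hello $name
echo "sum: $(expr 2 + 3)"    # command substitution inside double quotes: sum: 5
echo 'sum: $(expr 2 + 3)'    # single quotes keep it literal
nargs Cranston Snort /etc/passwd     # 3: the space splits the words
nargs "Cranston Snort" /etc/passwd   # 2: the quotes hide the space
```

The last two lines mirror the grep example: without quotation marks, grep would receive Cranston as its pattern and two file names.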
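The options listed on "The grep Command" slide can be tried on two small files. This sketch assumes a POSIX grep; the file names phone.txt and notes.txt and their contents are made up for illustration. It also relies on grep's exit status, which is nonzero when no line matches.

```shell
#!/bin/sh
# Try the grep options -i, -v, -n, -w, and -l on two made-up files.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
printf '%s\n' 'Wilson 555-0101' 'Brown 555-0102' 'wilson lowercase' > phone.txt
printf '%s\n' 'Wilsonville office' 'no names here' > notes.txt

grep Wilson phone.txt            # case-sensitive: only "Wilson 555-0101"
grep -i wilson phone.txt         # -i: both Wilson lines
grep -v Wilson phone.txt         # -v: the lines without "Wilson"
grep -n Brown phone.txt          # -n: prints 2:Brown 555-0102
grep -w Wilson notes.txt || echo "no whole-word match"  # Wilsonville is not the word Wilson
grep -l Wilson phone.txt notes.txt   # -l: only the names of matching files
```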
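The anchors, escapes, and bracket expressions from the "Regular Expressions" and "Advanced grep Examples" slides can be exercised without prog.c or /usr/dict/words by piping sample lines into grep. In this sketch, sample is a hypothetical stand-in function that prints a few made-up lines; the patterns themselves are the ones from the slides, plus the anchored '^A.*y$' variant.

```shell
#!/bin/sh
# sample (hypothetical): prints made-up lines standing in for prog.c and
# /usr/dict/words, so the slide's patterns can be tried on standard input.
sample() {
    printf '%s\n' \
        'Count the lines' \
        'int Count' \
        'Count$ is a literal dollar' \
        'Error: bad input' \
        'minor error' \
        'Ayers' \
        'Abbey'
}

sample | grep '^Count'     # Count at the start of a line (lines 1 and 3)
sample | grep 'Count$'     # Count at the end of a line (line 2)
sample | grep '^Count\$'   # \$ is an ordinary dollar sign here (line 3)
sample | grep '[Ee]rror'   # Error or error (lines 4 and 5)
sample | grep 'A.*y'       # an A followed later by a y: Ayers and Abbey
sample | grep '^A.*y$'     # starts with A and ends with y: Abbey only
```

The last two commands show why anchors matter: 'A.*y' also matches Ayers, which neither ends with ‘y’ nor needs to start with ‘A’.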
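The egrep and fgrep examples can be tried the same way. POSIX spells these commands grep -E and grep -F, and recent GNU grep warns that the names egrep and fgrep are obsolescent, so this sketch uses the option forms. The function words is a hypothetical stand-in for /usr/dict/words with a made-up word list.

```shell
#!/bin/sh
# words (hypothetical): a tiny made-up stand-in for /usr/dict/words
words() {
    printf '%s\n' book brook hook cookbook oook school schoolbook \
        preschool scholar schoolteacher
}

words | grep -E 'hook|scholar'          # R1|R2 alternation: hook, scholar
words | grep -E '(oo)+k'                # one or more "oo" pairs, then k
words | grep -E 'school(book)?'         # unanchored: any word containing school
words | grep -E '^school(book)?$'       # anchored: exactly school or schoolbook
words | grep -E '^[Ss].....te....r$'    # the crossword pattern: schoolteacher
printf '%s\n' a.c abc | grep -F 'a.c'   # fixed string: only the line a.c
printf '%s\n' a.c abc | grep 'a.c'      # regular expression: . matches b too
```

Because '(oo)+k' is not anchored, any word containing “ook” matches it; the crossword pattern finds “schoolteacher” in this list.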