Wildcards and Regular Expressions

Hour 9 Objectives

Copyright © 1998-2002 Delroy A. Brinkerhoff. All Rights Reserved. Hour 9 Slide 1 of 12

Regular Expressions: A formal language

- Formal (computer) languages are categorized by their strength, that is, by the complexity of the grammar that generates them
- Regular expressions describe the simplest of these languages, the regular languages
- Regular expressions are built from ordinary characters and metacharacters

Wildcards: Name shortcuts

- Wildcard characters


Wildcard Metacharacters: Selecting related files

- * (asterisk, or “splat” in Unix) matches zero or more occurrences of any character
  - ls * lists all non-dot files (much like plain ls, except that any subdirectory is listed by its contents)
  - cp c*s bak copies all files whose name begins with ‘c’ and ends with ‘s’ to directory bak
- ? (question mark) matches any one character
  - cp *.? bak copies all files whose name ends with a “dot” (period) and a single character to directory bak
- [xyz] matches one character from x, y, or z
  - cp [ABC]* bak copies all files whose name begins with ‘A’, ‘B’, or ‘C’ to directory bak
- [a-z] matches one character from the range a to z
  - ls [A-Z]*.c lists all files whose name begins with a capital letter and ends with a .c extension

Wildcard Examples

    % ls -a
    .       .login    DOCS       Mail         prog.c   wc.c
    ..      .profile  DOCS.bak   Mandel.java  sprio.z  wc.java
    .TRASH  Count.p   Hypo.java  pgp.Z        wc.asm   wc.p
    % ls *
    Count.p   Hypo.java    pgp.Z    wc.asm   wc.p
    DOCS      Mail         prog.c   wc.c
    DOCS.bak  Mandel.java  sprio.z  wc.java
    % ls *.?
    Count.p  pgp.Z  prog.c  sprio.z  wc.c  wc.p
    % ls [A-Z]*
    Count.p  DOCS  DOCS.bak  Hypo.java  Mail  Mandel.java
    % ls *.c
    prog.c  wc.c
    % ls *.[zZ]
    pgp.Z  sprio.z
    % ls wc.*
    wc.asm  wc.c  wc.java  wc.p
    % ls w*a
    wc.java
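The session above can be replayed without touching real files by rebuilding the listing in a scratch directory. This is a minimal POSIX sh sketch, assuming `mktemp -d` is available and that every name on the slide is an ordinary empty file (the slide does not say which entries, if any, are directories). `LC_ALL=C` is set so that `[A-Z]` and the order of the expansions follow plain byte values. `echo` is used instead of `ls` to make it visible that the shell, not the command, expands the wildcards.

```shell
#!/bin/sh
# Rebuild the sample directory in a scratch location and show what each
# wildcard expands to. Assumption: all names are ordinary (empty) files.
LC_ALL=C                      # byte-order character ranges and sorting
export LC_ALL

demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT    # remove the scratch directory on exit
cd "$demo" || exit 1

touch .TRASH .login .profile Count.p DOCS DOCS.bak Hypo.java Mail \
      Mandel.java pgp.Z prog.c sprio.z wc.asm wc.c wc.java wc.p

# The shell replaces each pattern with the matching names before echo runs.
echo '*      ->' *        # every name that does not start with a dot
echo '*.?    ->' *.?      # a dot followed by exactly one character
echo '[A-Z]* ->' [A-Z]*   # first character is a capital letter
echo '*.c    ->' *.c
echo '*.[zZ] ->' *.[zZ]   # ends in .z or .Z
echo 'wc.*   ->' wc.*
echo 'w*a    ->' w*a      # starts with w and ends with a
```

Each `echo` line should print the same names the slide shows for the corresponding `ls` command, on one line instead of in columns.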

Other Wildcards: Not supported by all (especially older) shells

- Supported by the Bourne and Korn shells
- [!xyz] matches any one character except x, y, or z

    % ksh
    $ ls [!Dw]*
    Count.p    Mail         pgp.Z   sprio.z
    Hypo.java  Mandel.java  prog.c
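A small sh sketch of the negated bracket, reusing the hypothetical file names from the earlier listing (dot files omitted) and assuming `mktemp -d` is available. The second half applies the same pattern in a `case` statement, which uses the same matching rules on a single string without consulting the file system.

```shell
#!/bin/sh
# [!...] in a POSIX sh pattern matches one character that is NOT listed.
# The file names are the hypothetical ones from the earlier listing.
LC_ALL=C
export LC_ALL

demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch Count.p DOCS DOCS.bak Hypo.java Mail Mandel.java \
      pgp.Z prog.c sprio.z wc.asm wc.c wc.java wc.p

# Names whose first character is neither D nor w.
echo [!Dw]*
# -> Count.p Hypo.java Mail Mandel.java pgp.Z prog.c sprio.z

# case uses the same pattern rules, one string at a time.
for name in DOCS Mail wc.c; do
    case $name in
        [!Dw]*) echo "$name: kept" ;;
        *)      echo "$name: excluded" ;;
    esac
done
```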

The grep Command: Pattern matching with a regular expression processor

- grep [ -ivlnw ] re-pattern [file ...]
  - -i ignore case (case-insensitive match)
  - -v invert the test (print lines that don’t match)
  - -l list only the names of files that contain a match, not the matching lines
  - -n print the line number with each matched line
  - -w match whole words only
- Input is line-oriented text
- Searches each line for a specified regular expression pattern
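The options above can be tried on two throwaway files. This is a sketch with made-up file names and contents (`log1`, `log2`), assuming `mktemp -d` is available. Note that `-w` is not part of POSIX grep, although GNU and BSD grep both accept it.

```shell
#!/bin/sh
# grep options on two small, made-up files (log1 and log2 are hypothetical).
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

printf '%s\n' 'Error: disk full' 'warning: low memory' 'no errors here' > log1
printf '%s\n' 'all good' > log2

grep -i error log1          # Error: disk full / no errors here
grep -v error log1          # Error: disk full / warning: low memory
grep -n warning log1        # 2:warning: low memory
grep -i -w error log1       # Error: disk full  ("errors" is a different word)
grep -l -i error log1 log2  # log1
```

Without `-i`, the capitalized “Error” does not count as a match, which is why it survives the `-v` filter.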


grep Examples: Simple regular expressions

- grep BUFSIZ *.c
  - prints each line that contains BUFSIZ in the files ending in .c

- C matches character C exactly
- \C escapes C (treats C as a “normal” character)
- . matches any single character except newline
- ^R matches regular expression R when it is at the beginning of the line
- R$ matches regular expression R when it is at the end of the line
- R* matches zero or more repetitions of R, where R is a single-character expression, as in a*, .*, or [0-9]*
- [xyz] matches one of x, y, or z
- [^xyz] matches any one character except x, y, or z
- [a-z] matches any one character from a to z
- [^a-z] matches any one character not in the range a to z
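Each metacharacter in the list can be checked in isolation against a tiny word list. A sketch with a made-up file (`words`), assuming `mktemp -d`; `LC_ALL=C` keeps `[A-Z]` limited to the capital letters. Most patterns are anchored with `^` and `$` so that each one must account for the whole line.

```shell
#!/bin/sh
# Basic regular-expression metacharacters tried one at a time with grep.
# The word list is made up for illustration.
LC_ALL=C                      # so [A-Z] means exactly the capital letters
export LC_ALL

demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

printf '%s\n' cat cot coat ct a.b axb Cat > words

grep '^c.t$' words     # .     any one character        -> cat, cot
grep '^co*t$' words    # o*    zero or more o's         -> cot, ct
grep 'a.b' words       # .     unescaped                -> a.b, axb
grep 'a\.b' words      # \.    a literal period         -> a.b
grep '^c[ao]t$' words  # [ao]  one of a or o            -> cat, cot
grep '^[^c]' words     # [^c]  any character except c   -> a.b, axb, Cat
grep '^[A-Z]' words    # [A-Z] one capital letter       -> Cat
```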


Advanced grep Examples: Filtering out non-matching lines

- Quote the regular expression pattern to hide its metacharacters from the shell
- grep '^Count' prog.c
  - “Count” must be at the beginning of the line
- grep 'Count$' prog.c
  - “Count” must be at the end of the line
- grep '^Count\$' prog.c
  - “Count$” must be at the beginning of the line; the $ is matched literally
- make | grep '[Ee]rror'
  - prints only the lines of make’s output that contain “Error” or “error”
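Quoting changes what grep actually receives. The sketch below uses a made-up three-line `prog.c` (not the slide's real file) and assumes `mktemp -d`. The unquoted `\$` case shows the shell removing the backslash, and the `[Ee]rror` case shows the shell treating unquoted brackets as a filename wildcard once a matching file exists.

```shell
#!/bin/sh
# Why grep patterns are quoted. prog.c here is a made-up three-line file.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

printf '%s\n' 'Count$ = 3' 'Count' 'Total Count' > prog.c

grep '^Count' prog.c     # Count$ = 3 / Count
grep 'Count$' prog.c     # Count / Total Count
grep '^Count\$' prog.c   # Count$ = 3   (grep sees \$, a literal dollar sign)
grep ^Count\$ prog.c     # Count        (unquoted: the shell removes the \,
                         #               so grep sees ^Count$)

# Unquoted brackets are also filename wildcards. Once a file named
# "error" exists, the shell rewrites the pattern before grep runs.
touch error
echo grep [Ee]rror       # prints: grep error
echo grep '[Ee]rror'     # prints: grep [Ee]rror
```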

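- fgrep [ -ivlnw ] [ -f pattern-file ] string [file ...]
  - searches for fixed strings: no character in string is treated as a metacharacter
  - -f reads the strings to search for from pattern-file, one per line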
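A sketch of fixed-string searching, written with `grep -F`, the POSIX spelling of fgrep (recent GNU grep releases warn that the separate fgrep command is obsolescent). The files `notes` and `patterns` are made up, and `mktemp -d` is assumed. The middle command shows the same text interpreted as a regular expression instead.

```shell
#!/bin/sh
# Fixed-string search with grep -F (equivalent to fgrep).
# notes and patterns are made-up files for illustration.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

printf '%s\n' 'total = a*b' 'total = ab' 'price: $5.00' > notes
printf '%s\n' 'a*b' '$5.00' > patterns     # one string per line

grep -F 'a*b' notes          # * is an ordinary character: total = a*b
grep 'a*b' notes             # as a regex, a*b matches any line with a b:
                             #   total = a*b / total = ab
grep -F -f patterns notes    # any listed string: total = a*b / price: $5.00
```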

- Extended operators understood by egrep:
  - R1|R2 matches regular expression R1 or R2
  - (R) groups regular expression R so that other operators, such as *, apply to all of it
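Alternation and grouping are easiest to compare side by side. This sketch uses `grep -E`, the POSIX spelling of egrep, on a made-up file `pets`, and assumes `mktemp -d`. The second and third commands differ only by the parentheses: without them, `^` binds to `cat` and `$` binds to `dog`.

```shell
#!/bin/sh
# Alternation and grouping with grep -E (equivalent to egrep).
# pets is a made-up file.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

printf '%s\n' cat dog catdog catcat bird > pets

grep -E 'cat|bird' pets        # R1|R2: cat, catdog, catcat, bird
grep -E '^cat|dog$' pets       # "^cat" or "dog$": cat, dog, catdog, catcat
grep -E '^(cat|dog)$' pets     # whole line is cat or dog: cat, dog
grep -E '^(cat)*dog$' pets     # * repeats the whole group: dog, catdog
```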

egrep Examples: Great for crossword puzzles

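- egrep -i 'bomb|explosive|terrorist' mail/*
  - prints every line in the files in the mail directory that contains any of the three words, in any mix of upper and lower case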
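For the crossword use, each known letter is written literally and each blank becomes `.`, anchored so the word length is fixed. The word list below is made up; on systems that ship a dictionary file (often `/usr/share/dict/words`, though its presence and location vary) the same commands can be pointed at it. Written with `grep -E` and assuming `mktemp -d`.

```shell
#!/bin/sh
# Crossword-style lookups with grep -E on a made-up word list.
demo=$(mktemp -d) || exit 1
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1

printf '%s\n' bomb tomb comb womb climb numb > words

grep -E '^.omb$' words          # slot "?omb": bomb, tomb, comb, womb
grep -E '^(b|t)omb$' words      # first letter b or t: bomb, tomb
grep -i -E 'BOMB|climb' words   # -i ignores case: bomb, climb
```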