Useful Examples of the Usage of Unix/Linux Filters and Utilities
tr

tr a-z A-Z
    capitalise all letters
tr A-Z a-z
    make all letters lower case
tr ' ' '\n'
    put each word on a separate line (literally: change each space into a newline character)
tr ' ' '\012'
    put each word on a separate line (the previous example with the newline written in octal)
tr -d a-c
    delete the letters a, b and c
tr -d '.,:;!?()[]0-9'
    delete punctuation marks and digits
tr -s ' '
    delete extra spaces (leave only one space in each sequence)
tr -sc A-Za-z '[\012*]'
    put each word on its own line (every run of non-letters becomes a single newline)

Flags:
    -d  delete
    -s  squeeze
    -c  complement (= anything except what is listed)

sort

sort
    sort in alphabetical order, starting from the first character
sort +1
    skip the first field and start the sort from the second field (i.e. from the first space after the first string; be careful if the number of spaces after the first string varies, because this affects the sort order)
sort -b +1
    skip the first field and also the spaces (blanks) after it, so that the sort starts from the second field (string)
sort -n
    sort numerically, smallest first
sort -n +1
    sort numerically, smallest first, starting from the second field (which should be a number)
sort -r
    sort in reverse order (z on top and a on bottom)
sort -rn
    sort numerically in reverse order (biggest number on top and smallest on bottom)

Note: the +1 field syntax is obsolete and is rejected by many current versions of sort; the POSIX equivalent of +1 is -k 2 (for example, sort -b -k 2).

uniq

uniq
    remove consecutive duplicate lines (remember to sort the lines first)
uniq -c
    remove consecutive duplicate lines (remember to sort the lines first) but prefix each line with its number of occurrences

rev

rev
    reverse the order of characters on each line (first becomes last and vice versa)
rev | sort | rev
    produces a list of lines sorted from the last character leftwards

sed

sed "/ kama /d"
    delete the lines that contain the word 'kama'
sed /kama/d
    delete the lines that contain the string 'kama', also as part of a longer word
sed -e "/ kama /d" -e "/ ikiwa /d"
    delete the lines that contain the word 'kama' or the word 'ikiwa'
    (note that the flag -e has to be repeated before each string to be deleted)
sed -n /Tanzania/p
    print only the lines that contain the string Tanzania
sed /Tanzania/p
    print the lines that contain the string Tanzania twice, and all the other lines once
sed -e s/Ali/Juma/
    substitute the first occurrence of Ali on each line with Juma (= rewrite Ali as Juma)
sed -e s/Ali/Juma/g
    substitute all occurrences of Ali with Juma (= rewrite Ali as Juma)

egrep

egrep 'kama'
    retrieve lines where the string kama occurs
egrep ' kama '
    retrieve lines where the word kama occurs
egrep ' [kK]ama '
    retrieve lines where the word 'kama' or 'Kama' occurs
egrep ' (k|K)ama '
    retrieve lines where the word 'kama' or 'Kama' occurs
egrep -i ' kama '
    retrieve lines where the word 'kama', 'Kama', 'KAMA' or any other mix of upper and lower case occurs
egrep ' kama ' | egrep ' Kama '
    retrieve lines where both the word 'kama' and the word 'Kama' occur (the second egrep filters the output of the first)
egrep -v ' kama '
    retrieve lines where the word 'kama' does NOT occur

Flags:
    -i  ignore case
    -v  complement (retrieve all lines except those where the listed strings occur)

Using regular expressions

sed -n '/[uU]piganaji/p'
    print the lines that contain the strings: upiganaji, Upiganaji (inside brackets the alternatives are simply listed, without |)
sed -n '/ [kK]ama /p'
    print the lines that contain the words: kama, Kama (the expression must be surrounded by quotes)
sed -n '/ .... /p'
    print the lines that contain four characters surrounded by spaces (typically a four-letter word)
egrep -i '^(m|wa)piganaji'
    retrieve the lines that begin with the word mpiganaji, Mpiganaji, wapiganaji or Wapiganaji
egrep -i '^(m|wa)piganaji$'
    retrieve the lines that consist of only the word mpiganaji, Mpiganaji, wapiganaji or Wapiganaji
egrep '((^| )(n|N)i($| ))'
    retrieve the lines that contain the word ni or Ni, regardless of its place on the line
egrep -i '((^| )ni($| ))'
    retrieve the lines that contain the word ni or Ni (in any mix of case), regardless of its place on the line

Combining commands

tr -d '.,:;!?()[]0-9' | tr -s ' ' '\n' | sed -n -e '/aji$/p' | sed -e 's/aji$/+aji/g' | sort | uniq -c | sort -nr
    puts the words into word-per-line format, retrieves the words that end with 'aji', rewrites those words so that a + sign comes in front of the ending, sorts the lines, removes duplicates but keeps the count, and sorts again numerically in reverse order, so that the most frequent word comes first.
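
The combined command can be tried out on a small input. The following is a minimal sketch, not part of the original list: the sample sentence and the function name aji_counts are hypothetical and chosen only for illustration, while the pipeline itself is the one given above.

```shell
#!/bin/sh
# Hypothetical sample text, invented for this illustration.
sample='Mpiganaji na wapiganaji: mchezaji (1) na mpiganaji, wachezaji 2 na mchezaji!'

# aji_counts (hypothetical name): reads text on standard input and prints
# a frequency list of the words ending in "aji", most frequent first.
aji_counts() {
    tr -d '.,:;!?()[]0-9' |      # delete punctuation marks and digits
    tr -s ' ' '\n' |             # one word per line, squeezing repeated separators
    sed -n -e '/aji$/p' |        # keep only the words that end with "aji"
    sed -e 's/aji$/+aji/g' |     # mark the ending with a + sign
    sort |                       # bring identical words together
    uniq -c |                    # collapse duplicates, keeping the count
    sort -nr                     # highest count first
}

printf '%s\n' "$sample" | aji_counts
```

With this sample the first output line is mchez+aji with a count of 2, followed by four words with a count of 1 each. Mpigan+aji and mpigan+aji are counted separately because the pipeline does not fold case; adding tr A-Z a-z as a first step would merge them. The order of the words that share the same count depends on the sort implementation and locale.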