Searching Inside Files & Pattern Matching

Patricia J Riddle
Lecture 6 COMPSCI 215

Types of Searching
• Searching inside files – using regular expressions (this lecture)
• Searching for files – using recursive file find (next lecture)

REGULAR EXPRESSIONS
• Used by several different UNIX commands:
  – ed - a line-based text editor to create, display, modify, and otherwise manipulate text files
  – sed - a stream editor that reads input files, modifies the input according to a list of commands, and writes the result to STDOUT
  – awk - a pattern-directed scanning and processing tool
  – grep - searches input files or STDIN for lines matching a given pattern and, by default, prints the matching lines
  – vi - a programmer's text editor for all kinds of plain text (it is especially useful for editing programs) (but it is not as good as emacs!)
• Provide a convenient and consistent way of specifying patterns to be matched

The awk Command
• The awk command, named after its authors, Aho, Weinberger, and Kernighan, was the most powerful utility for text manipulation and report writing before the advent of perl
  – The POSIX awk also appears as nawk (new awk) on most systems and as gawk (GNU awk) on Linux
  – It combines features of several filters, but also has two unique features:
    1. It can identify and manipulate individual fields in a line
    2. It is the only UNIX filter that can perform computations
  – Also, it accepts extended REs for pattern matching, has C-type programming constructs, and provides several built-in variables and functions

WILDCARDS
A very limited form of regular expressions, recognised by the shell when you use filename substitution:
  *      (asterisk, or "splat") matches zero or more characters
  ?      (question mark) matches any single character
  […]    matches any one of the characters listed within the brackets
           [abc] (set) matches any one of the characters listed, i.e. a, b, or c
           [x-z] (range) matches any one character in the range x-z
Be advised that the asterisk and the question mark are treated differently by these programs than by the shell!

Shortcomings of Wildcards
• Wildcards are great for specifying files in a directory, but not powerful enough for searching inside files (because they are expanded by the shell before the command is run)
• What we need is something that searches inside files:
  – egrep -n PATTERN [FILES…]

ggim001$ egrep -n Ju.y *
The_Relief_of_Tobruk.txt:23:(on a July estimate) did not return, and it was but a small consolation that
The_Relief_of_Tobruk.txt:59:By 10 July General Freyberg (1) was able to point out to the New Zealand
inter1.txt:193:July
inter1.txt:645:July

REGULAR EXPRESSIONS
• A pattern that describes a set of strings
• A tiny, highly specialised programming language
  – Sometimes part of other programming languages such as Python, Perl or Java
• Theoretical underpinnings - finite state automata
• A regular expression (RE) has an elaborate character set that goes well beyond the shell's wildcards
  – REs take care of the usual query requirements

Regular Expressions Vs. Wildcards
• Alert! Regular expressions are NOT wildcards
• Although they look similar, wildcards and REs have different meanings
  – Wildcards are expanded by the shell
  – REs are interpreted by some other program: often by grep (egrep, fgrep) but also by awk, sed (stream editor) and perl (practical extraction and report language)
  – REs are even interpreted by Java, via the java.util.regex package

Regular Expressions - Fixed Strings
• Most common case: most characters in an RE match themselves
• The exceptions are called meta-characters: . * ? + [ ] { } | \ ( ) ^ $
• To match a meta-character, escape it with a \; for example, to look for the name p.j.delmas:
    $ egrep "p\.j\.delmas" cs_staff.lst

A Single Character
• A character that is not a meta-character matches itself
• Bracketed structure […]
  – Matches a single character in the set
  – Example: the pattern [ra] matches either an r or an a
  – Can contain a range, both for letters and for digits
  – [a-zA-Z0-9] matches a single alphanumeric character
  – Can be complemented (negated) by putting a caret (^) as the first character (like the ! in a wildcard)
  – [^a-zA-Z] matches a single non-alphabetic character
• Fullstop, or period (.), matches any single character
• First use of ^

Repeating Characters
A regular expression may be followed by one of several repetition operators:
  ?        The preceding item is matched zero or one time
  *        The preceding item is matched zero or more times
  +        The preceding item is matched one or more times
  \{n\}    The preceding item is matched exactly n times
  \{n,\}   The preceding item is matched n or more times
  \{n,m\}  The preceding item is matched at least n times, but no more than m times
The backslashed interval forms are basic-RE syntax (grep, sed); with egrep (grep -E) they are written {n}, {n,} and {n,m}. Likewise, ? and + are extended-RE operators; in a basic RE they match themselves literally.

Repeating Characters
• * (asterisk) refers to the immediately preceding character
  – Its interpretation is totally different from that of the * used by wildcards
  – It indicates that the previous character can occur many times, or not at all
      g*   matches a null string and also g gg ggg gggg …
      g+   matches g gg ggg gggg …
      .*   matches a null string or any number of characters
  – The * has significance in a regular expression only if it is preceded by a character
    • If it is the first character in a regular expression, then it matches itself

Example: the grep Command
• The grep command allows you to search one or more files for particular character patterns:
    grep pattern file(s)
  – Every line of each file that contains pattern is displayed at the terminal
  – If more than one file is specified to grep, each line is also immediately preceded by the name of the file it came from
  – If the pattern does not exist in the specified file(s), grep simply displays nothing
  – It is generally a good idea to enclose the grep pattern in a pair of single quotes to "protect" it from the shell:
      $ grep * stars - the shell sees the asterisk * and automatically substitutes the names of all the files in your current directory
      $ grep '*' stars - the quotes remove the asterisk's special meaning for the shell, so the two arguments, * and stars, are passed to grep

Specifying Pattern Locations
Anchoring a pattern is necessary when it can occur in more than one place in a line:
  ^ (caret) matches an empty string at the beginning of a line
    – ^pat matches the pattern pat at the beginning of a line
  $ (dollar) matches an empty string at the end of a line
    – pat$ matches the pattern pat at the end of a line

ggim001$ egrep -n Ju.y *
The_Relief_of_Tobruk.txt:23:5816 (on a July estimate) did not return, and
The_Relief_of_Tobruk.txt:59:By 10 July General Freyberg (1) was able to point out to the New Zealand
inter1.txt:193:July
inter1.txt:645:July
inter2.txt:193:July
inter4.txt:480: 2 July

ggim001$ egrep -n ^Ju.y *
inter1.txt:193:July
inter1.txt:645:July
inter2.txt:193:July

Second use of ^

Regular Expression Character Subset
  *        Matches zero or more occurrences of the previous character
  .        Matches a single character (like the ? wildcard)
  [prq]    Matches a single character p, r, or q
  [c1-c2]  Matches a single character in the range between c1 and c2
  [^pqr]   Matches a single character which is not p, q, or r
  ^pat     Matches pattern pat at beginning of line
  pat$     Matches pattern pat at end of line
Some of these symbols are also meaningful to the shell, so REs should be quoted
  – The expressions are interpreted by the command, and quoting ensures that the shell cannot interfere

Large Regular Expressions
• Larger REs can be built from smaller REs in two ways:
  – Concatenation: the resulting RE matches any string formed by concatenating two substrings that respectively match the concatenated sub-expressions
    • Example: '[a-z][0-9]*' matches any string that begins with a lowercase letter followed by zero or more digits
  – Alternation: two REs may be joined by the infix operator |; the resulting RE matches any string matching either sub-expression
    • Example: [a-z]|[0-9]

A Few Examples
$ ls cha*
chap1 chap2
$ cat [a-z]hap[0-9]
text1
text2
$ cat [a-z]hap[0-9] | tr '[a-z]' '[A-Z]'
TEXT1
TEXT2
$ cat [a-z]hap[0-9] | tr '[a-z]' '[A-QU-Z]'
WE]W1
WE]W2
$ egrep -n '[a-z]' chap*
chap1:1:text1
chap2:1:text2

Saving Matched Characters
• The characters matched within a regular expression are captured by enclosing them inside \(…\)
• The captured characters are stored in "registers" 1, …, 9 and retrieved as \1, …, \9, respectively
  – Successive occurrences of the \(…\) construct are assigned to successive registers: with ^\(...\)\(...\), the first three characters on the line are stored in register 1, and the next three characters in register 2
  – The RE ^\(.\) matches the first character in the line and stores it in register 1
  – The RE ^\(.\).*\1$ matches all lines in which the first (^.) and the last character (\1$) on the line are the same
    • Here, the RE .* matches all the characters in between

An Example
• List all the ordinary files in your directory last modified in July, in increasing order of file size (assuming file names do not contain " Jul "):
    ls -l | grep '^-.* Jul .*' | sort -n -k 5
• The first command, ls -l, lists directory contents
  – The -l option displays the long format: file mode, number of links, owner name, group name, number of bytes in the file, the abbreviated month, day-of-month and hour:minute the file was last modified, and the file name

ggim001$ ls -l
total 160
drwxr-xr-x 13 ggim001 ggim001 442 Jul 9 11:35 Exercises
-rw-r--r-- 1 ggim001 ggim001 526 Jul 9 12:46 Test.txt
drwxr-xr-x 17 ggim001 ggim001 578 Jun 27 15:01 bak-dir
-rwxr-xr-x 1 ggim001 ggim001 294 Jun 27 14:57 bak.bash
-rwxr-xr-x 1 ggim001 ggim001 425 Jun 26 18:26 calc.bash
…

• The second command, grep '^-.* Jul .*', allows you to search its input (i.e. …
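The July-listing pipeline from the "An Example" slide can be tried without depending on a real directory by feeding it canned ls -l lines. This is a minimal sketch assuming a POSIX shell with GNU grep and sort; the first six listing lines are copied from the slide, while notes.txt is a hypothetical extra file added so that the sort has two July files to order.

```shell
# Canned `ls -l` output: the first six lines come from the slide's example;
# notes.txt is a hypothetical extra July file so the sort has work to do.
listing='total 160
drwxr-xr-x 13 ggim001 ggim001 442 Jul 9 11:35 Exercises
-rw-r--r-- 1 ggim001 ggim001 526 Jul 9 12:46 Test.txt
drwxr-xr-x 17 ggim001 ggim001 578 Jun 27 15:01 bak-dir
-rwxr-xr-x 1 ggim001 ggim001 294 Jun 27 14:57 bak.bash
-rwxr-xr-x 1 ggim001 ggim001 425 Jun 26 18:26 calc.bash
-rw-r--r-- 1 ggim001 ggim001 88 Jul 2 09:10 notes.txt'

# ^-            keep ordinary files only (mode string starts with '-')
# ' Jul '       keep lines whose month field is Jul
# sort -n -k 5  order numerically by field 5, the size in bytes
july_files=$(printf '%s\n' "$listing" | grep '^-.* Jul .*' | sort -n -k 5)
printf '%s\n' "$july_files"
```

Replacing the printf stage with a real ls -l gives the slide's command. Note that ' Jul ' matches July of any year, and a file whose name contains " Jul " would slip through, which is why the slide states that assumption.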
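To see why the p.j.delmas pattern on the Fixed Strings slide needs its backslashes, this sketch runs the escaped and unescaped forms over a hypothetical three-line staff list, an inline stand-in for cs_staff.lst. It assumes GNU grep; grep -F (equivalent to fgrep) is shown as an alternative that treats every character literally.

```shell
# Inline stand-in for cs_staff.lst (hypothetical contents)
staff='p.j.delmas
pxjydelmas
p.j.riddle'

# Unescaped: each . matches any character, so pxjydelmas matches too
loose=$(printf '%s\n' "$staff" | grep -c 'p.j.delmas')
# Escaped: \. matches only a literal full stop
exact=$(printf '%s\n' "$staff" | grep 'p\.j\.delmas')
# Fixed-string mode (grep -F, equivalent to fgrep): no meta-characters at all
fixed=$(printf '%s\n' "$staff" | grep -F 'p.j.delmas')
# The same escaping works for other meta-characters, e.g. a literal asterisk
star=$(printf '%s\n' 'a*b' ab | grep '\*')
```

The single quotes matter as well: they keep the shell from touching the backslashes and the asterisk before grep sees them.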
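A small sketch of the operators from the Repeating Characters and A Single Character slides, assuming GNU grep. The interval \{2,3\} is written with backslashes in a basic RE (plain grep) and without them in an extended RE (grep -E, equivalent to egrep); ? and + are used here only with grep -E. All sample strings are made up for illustration.

```shell
# Hypothetical sample lines
sample='7
42
718
8174
abc'

# Basic RE (plain grep): interval braces need backslashes
bre=$(printf '%s\n' "$sample" | grep '^[0-9]\{2,3\}$')
# Extended RE (grep -E, same as egrep): the same interval, no backslashes
ere=$(printf '%s\n' "$sample" | grep -E '^[0-9]{2,3}$')
# Negated bracket expression: lines containing at least one non-digit
nondigit=$(printf '%s\n' "$sample" | grep '[^0-9]')
# ? makes the preceding u optional (extended RE)
spelling=$(printf '%s\n' color colour colouur | grep -E '^colou?r$')
# g* also matches the empty string, so it matches every line;
# g+ needs at least one g
star_count=$(printf '%s\n' abc gg | grep -c 'g*')
plus_count=$(printf '%s\n' abc gg | grep -E -c 'g+')
```

The last two counts show why g* on its own is rarely a useful search: it succeeds on every line, including "abc".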
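The difference between the two egrep runs on the Specifying Pattern Locations slide can be reproduced on a few lines of text. The sample lines below are hypothetical, modelled on the slide's output, and plain grep is used in place of egrep: egrep is equivalent to grep -E, and Ju.y means the same thing in basic and extended REs.

```shell
# Hypothetical sample text modelled on the slide's output
sample='5816 (on a July estimate) did not return, and
By 10 July General Freyberg (1) was able to point out
July
 2 July
Judy arrived in June'

# Unanchored: Ju.y may match anywhere in the line
anywhere=$(printf '%s\n' "$sample" | grep -c 'Ju.y')
# Anchored with ^: the match must start at the beginning of the line
at_start=$(printf '%s\n' "$sample" | grep -n '^Ju.y')
# Anchored with $: the match must finish at the end of the line
at_end=$(printf '%s\n' "$sample" | grep -c 'Ju.y$')
```

As on the slide, the line " 2 July" is found without the anchor but not with ^, because of its leading space.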
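The register examples from the Saving Matched Characters slide can be checked directly. This sketch assumes grep and sed with POSIX basic REs and uses made-up words; the sed line illustrates the two-register case, ^\(...\)\(...\), by swapping the first two three-character groups.

```shell
words='level
banana
racecar
a
abca'
# \(.\) stores the first character in register 1, .* skips the middle,
# and \1$ demands the same character again at the end of the line.
# A one-character line such as "a" fails: \1 needs a second copy.
same_ends=$(printf '%s\n' "$words" | grep '^\(.\).*\1$')

# Two registers: the first three characters go to \1, the next three to \2;
# the replacement writes them back in swapped order.
swapped=$(printf '%s\n' abcdefg | sed 's/^\(...\)\(...\)/\2\1/')
```

Back-references are what take these patterns beyond plain wildcards: the second half of the match depends on what the first half actually matched.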