grep: Searches for a pattern in files

grep [options] pattern [file-list]

The grep utility searches one or more files, line by line, for a pattern, which can be a simple string or another form of a regular expression. The grep utility takes various actions, specified by options, each time it finds a line that contains a match for the pattern. This utility takes its input from files you specify on the command line or from standard input.

Arguments

The pattern is a regular expression, as defined in Appendix A. You must quote regular expressions that contain special characters, SPACEs, or TABs. An easy way to quote these characters is to enclose the entire expression within single quotation marks.

The file-list is a list of the pathnames of ordinary files that grep searches. With the -r option, file-list may contain directories whose contents are searched.

Options

Without any options grep sends lines that contain a match for pattern to standard output. When you specify more than one file on the command line, grep precedes each line that it displays with the name of the file that it came from followed by a colon.

Major Options

You can use only one of the following three options at a time. Normally you do not need to use any, because grep defaults to -G, which is regular grep.

-E (extended)
    Interprets pattern as an extended regular expression (page 836). The command grep -E is the same as egrep. See "Notes" later in this section.

-F (fixed)
    Interprets pattern as a fixed string of characters. The command grep -F is the same as fgrep.

-G (grep)
    Interprets pattern as a basic regular expression. This is the default major option if none is specified.

Other Options

--count  -c
    Displays only the number of lines that contain a match in each file.

--context=n  -C n
    Displays n lines of context around each matching line.

--file=file  -f file
    Reads file, which contains one pattern per line, and finds lines in the input that match each of the patterns.

--no-filename  -h
    Does not display the filename at the beginning of each line when searching through multiple files.

--ignore-case  -i
    Causes lowercase letters in the pattern to match uppercase letters in the file, and vice versa. Use this option when you are searching for a word that may be at the beginning of a sentence (that is, may or may not start with an uppercase letter).

--files-with-matches  -l (lowercase "l")
    Displays only the name of each file that contains one or more matches. A filename is displayed only once, even if the file contains more than one match.

--max-count=n  -m n
    Stops reading each file, or standard input, after displaying n lines containing matches.

--line-number  -n
    Precedes each line by its line number in the file. The file does not need to contain line numbers.

--quiet or --silent  -q
    Does not write anything to standard output; only sets the exit code.

--recursive  -r or -R
    Recursively descends directories in file-list and processes files within these directories.

--no-messages  -s (silent)
    Does not display an error message if a file in file-list does not exist or is not readable.

--invert-match  -v
    Causes lines not containing a match to satisfy the search. When you use this option by itself, grep displays all lines that do not contain a match for the pattern.

--word-regexp  -w
    With this option, the pattern must match a whole word. This option is helpful if you are searching for a specific word that may also appear as a substring of another word in the file.

--line-regexp  -x
    The pattern matches whole lines only.

Notes

The grep utility returns an exit status of 0 if it finds a match, 1 if it does not find a match, and 2 if the file is not accessible or there is a syntax error.

egrep and fgrep

Two utilities perform functions similar to that of grep.
The egrep utility (same as grep -E) allows you to use extended regular expressions, which include a different set of special characters than basic regular expressions. The fgrep utility (same as grep -F) is fast and compact but processes only simple strings, not regular expressions.

Examples

The following examples assume that the working directory contains three files: testa, testb, and testc:

    File testa    File testb    File testc
    aaabb         aaaaa         AAAAA
    bbbcc         bbbbb         BBBBB
    ff-ff         ccccc         CCCCC
    cccdd         ddddd         DDDDD
    dddaa

The grep utility can search for a pattern that is a simple string of characters. The following command line searches testa and displays each line containing the string bb:

    $ grep bb testa
    aaabb
    bbbcc

The -v option reverses the sense of the test. The following example displays the lines in testa without bb:

    $ grep -v bb testa
    ff-ff
    cccdd
    dddaa

The -n option displays the line number of each displayed line:

    $ grep -n bb testa
    1:aaabb
    2:bbbcc

The grep utility can search through more than one file. Here grep searches through each file in the working directory. The name of the file containing the string precedes each line of output.

    $ grep bb *
    testa:aaabb
    testa:bbbcc
    testb:bbbbb

When the search for the string bb is done with the -w option, grep produces no output because none of the files contains the string bb as a separate word:

    $ grep -w bb *
    $

The search that grep performs is case sensitive. Because the previous examples specified lowercase bb, grep did not find the uppercase string BBBBB in testc. The -i option causes both uppercase and lowercase letters to match either case of letter in the pattern:

    $ grep -i bb *
    testa:aaabb
    testa:bbbcc
    testb:bbbbb
    testc:BBBBB
    $ grep -i BB *
    testa:aaabb
    testa:bbbcc
    testb:bbbbb
    testc:BBBBB

The -c option displays the number of lines in each file that contain a match:

    $ grep -c bb *
    testa:2
    testb:1
    testc:0

The -f option finds matches for each pattern in a file of patterns.
The next example shows gfile, which holds two patterns, one per line, and grep searching for matches to the patterns in gfile:

    $ cat gfile
    aaa
    bbb
    $ grep -f gfile test*
    testa:aaabb
    testa:bbbcc
    testb:aaaaa
    testb:bbbbb

The following command line displays from text2 lines that contain a string of characters starting with st, followed by zero or more characters (.* represents zero or more characters in a regular expression), and ending in ing:

    $ grep 'st.*ing' text2
    ...

The ^ regular expression, which matches the beginning of a line, can be used alone to match every line in a file. Together with the -n option, ^ can be used to display the lines in a file, preceded by their line numbers:

    $ grep -n '^' testa
    1:aaabb
    2:bbbcc
    3:ff-ff
    4:cccdd
    5:dddaa

The next command line counts the number of times #include statements appear in C source files in the working directory. The -h option causes grep to suppress the filenames from its output. The input to sort is all lines from *.c that match #include. The output from sort is an ordered list of lines that contains many duplicates. When uniq with the -c option processes this sorted list, it outputs repeated lines only once, along with a count of the number of repetitions in its input.

    $ grep -h '#include' *.c | sort | uniq -c
    9 #include "buff.h"
    2 #include "poly.h"
    1 #include "screen.h"
    6 #include "window.h"
    2 #include "x2.h"
    2 #include "x3.h"
    2 #include <math.h>
    3 #include <stdio.h>

The final command calls the vim editor with a list of files in the working directory that contain the string Sampson. The $(…) command substitution construct (page 329) causes the shell to execute grep in place and supply vim with a list of filenames that you want to edit:

    $ vim $(grep -l 'Sampson' *)
    ...

The single quotation marks are not necessary in this example, but they are required if the regular expression you are searching for contains special characters or SPACEs.
It is generally a good habit to quote the pattern so that the shell does not interpret special characters it may contain.
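
The exit statuses listed under Notes, together with the -q, -s, -x, and -F options, are not exercised by the examples above. The following is a minimal sketch of how a script might use them. It assumes a POSIX shell with mktemp available; the scratch directory and the variable names (workdir, found_status, and so on) are illustrative and do not come from the original text. The sample file reproduces the contents of testa.

```shell
#!/bin/sh
# Sketch only: the scratch directory and all variable names are assumptions.
workdir=$(mktemp -d)
printf 'aaabb\nbbbcc\nff-ff\ncccdd\ndddaa\n' > "$workdir/testa"

# -q writes nothing to standard output; only the exit status is used.
found_status=0
grep -q bb "$workdir/testa" || found_status=$?

missing_status=0
grep -q zz "$workdir/testa" || missing_status=$?

# -s suppresses the error message for the missing file;
# the exit status still reports the problem.
error_status=0
grep -qs bb "$workdir/no-such-file" || error_status=$?

# -x requires the pattern to match a whole line; -c counts matching lines.
whole_line_count=$(grep -c -x 'ff-ff' "$workdir/testa" || true)

# -F treats the pattern as a fixed string, so '.' is a literal period here
# and the line ff-ff is not counted.
fixed_count=$(grep -c -F 'f.f' "$workdir/testa" || true)

echo "found=$found_status missing=$missing_status error=$error_status"
echo "whole_line_count=$whole_line_count fixed_count=$fixed_count"

rm -r "$workdir"
```

Capturing the status with `|| status=$?` keeps the script usable under `set -e`, where a bare grep that finds no match would otherwise end the script.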