Process Text Streams Using Filters

OBJECTIVE: Candidates should be able to apply filters to text streams.

KEY KNOWLEDGE AREAS:
Send text files and output streams through text utility filters to modify the output using standard UNIX commands found in the GNU textutils package.

KEY FILES, TERMS, UTILITIES:
cat, nl, tail, cut, paste, tr, expand, pr, unexpand, fmt, sed, uniq, head, sort, wc, hexdump, split, join, tac

cat: the editor
cat can be used as a rudimentary text editor.
    cat > short-message
    we are curious
    to meet
    penguins in Prague
    Ctrl+D
Ctrl+D ends interactive input.

cat: the reader
More commonly, cat is used to send text to stdout.
Options:
    -n    number each line of output
    -b    number only non-blank output lines
    -A    show non-printing characters, including carriage returns, tabs, and line ends
Example:
    cat /etc/resolv.conf
Output:
    search mydomain.org
    nameserver 127.0.0.1

tac: reads back-to-front
This command is the same as cat except that the text is read from the last line to the first.
    tac short-message
Output:
    penguins in Prague
    to meet
    we are curious

head or tail
head and tail are often used to analyze log files. By default, each outputs 10 lines of text.
List the first 20 lines of /var/log/messages:
    head -n 20 /var/log/messages
    head -20 /var/log/messages
List the last 20 lines of /etc/aliases:
    tail -20 /etc/aliases
The tail utility can also list the end of a text starting at a given line.
List text starting at line 25 in /var/log/messages:
    tail -n +25 /var/log/messages
(Older versions of tail also accept the form tail +25.)
tail can continuously read a file using the -f option. This is most useful when you expect a file to be modified in real time.

wc
The wc utility counts the number of bytes, words, and lines in files.
Options for wc output:
    -l    count lines
    -w    count words
    -c    count bytes
    -m    count characters
With no file argument, wc counts what it reads from stdin.
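
The head, tail, and wc behavior described above can be tried on a small generated file instead of a system log. The following is a minimal sketch, assuming a POSIX shell with the standard utilities available; the temporary file and its "entry N" contents are made up for illustration.

```shell
# Build a hypothetical 30-line sample file in a temporary location.
sample=$(mktemp)
i=1
while [ "$i" -le 30 ]; do
    echo "entry $i"
    i=$((i + 1))
done > "$sample"

# First 3 lines and last 2 lines of the file.
first=$(head -n 3 "$sample")
last=$(tail -n 2 "$sample")

# Everything from line 25 to the end of the file (lines 25 through 30).
from25=$(tail -n +25 "$sample")

# Count lines; reading from stdin keeps the file name out of the output.
lines=$(wc -l < "$sample")

echo "$first"
echo "$last"
echo "$from25"
echo "lines: $lines"

rm -f "$sample"
```

Redirecting the file into wc, rather than naming it as an argument, is a small design choice: the result is then just the number, which is easier to reuse in a script.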

nl
By default, the nl utility produces the same output as cat -b.
Number all lines, including blank ones:
    nl -ba /etc/lilo.conf
Number only lines with text:
    nl -bt /etc/lilo.conf

expand / unexpand
The expand command replaces TABs with spaces. unexpand performs the reverse operation.

hexdump
A common tool for viewing binary files. Another is od (octal dump).

split
The split tool can split a file into smaller files using criteria such as size or number of lines.
    split -l 5 /etc/passwd
This creates files called xaa, xab, xac, xad ... each containing at most 5 lines.
A more meaningful prefix than 'x', such as 'passwd-5', can be given on the command line:
    split -l 5 /etc/passwd passwd-5
This creates files identical to the ones above (xaa, xab, xac, xad ...), but the names are now passwd-5aa, passwd-5ab, passwd-5ac, passwd-5ad ...

uniq
The uniq tool sends to STDOUT only one copy of consecutive identical lines.
    uniq > /tmp/UNIQUE
    line 1
    line 2
    line 2
    line 3
    line 3
    line 3
    line 1
    ^D
The file /tmp/UNIQUE has the following content:
    cat /tmp/UNIQUE
    line 1
    line 2
    line 3
    line 1
Non-consecutive identical lines are still printed to STDOUT. To remove every duplicate, sort the input first:
    sort | uniq > /tmp/UNIQUE

cut
The cut utility can extract a range of characters or fields from each line of a text.
The -c option is used to select characters.
Syntax:
    cut -c {range1,range2}
Example:
    cut -c5-10,15- /etc/passwd
The -d and -f options are used to select fields.
Syntax:
    cut -d {delimiter} -f {fields}
Example:
    cut -d: -f 1,7 --output-delimiter=" " /etc/passwd
The default output delimiter is the same as the input delimiter. The --output-delimiter option allows you to change this.

join and paste
paste places the lines of two files next to each other.
Syntax:
    paste text1 text2
With join you can further specify which fields you are considering.
Syntax:
    join -j1 {field_num} -j2 {field_num} text1 text2
or
    join -1 {field_num} -2 {field_num} text1 text2
Text is sent to stdout only if the specified fields match. join expects both files to be sorted on the join field: lines are compared in order, so with unsorted input, matches that appear later in a file can be missed.

sort
By default, sort arranges a text in alphabetical order. To perform a numerical sort use the -n option.

fmt
You can modify the number of characters per line of output using fmt. By default, fmt joins lines and outputs lines of up to 75 characters.
Options:
    -w    number of characters per line
    -s    split long lines but do not refill
    -u    place one space between words and two spaces at the end of a sentence

pr
Long files can be paginated to fit a given size of paper with the pr utility. One can control the page length (default 66 lines) and page width (default 72 characters) as well as the number of columns.
When text is output in multiple columns, each column is truncated evenly to fit the defined page width. This means that characters are dropped unless the original text is edited to avoid this.

tr
The tr utility translates one set of characters into another.
Example, changing uppercase letters into lowercase:
    tr 'A-Z' 'a-z' < file.txt
Replacing delimiters in /etc/passwd:
    tr ':' ' ' < /etc/passwd
tr takes only the two character sets as arguments. The file is not an argument; it must be supplied on stdin.

sed
The sed utility is most often used to search and replace patterns in text. It supports most regular expressions.
Syntax:
    sed [options] 'command' [INPUTFILE]
The input file is optional since sed also works on file redirections and pipes.
Example (using file MODIF), delete all commented lines:
    sed '/^#/ d' MODIF
The search pattern is placed between the two slashes.
Example continued (using file MODIF), substitute /dev/hda1 by /dev/sdb3:
    sed 's/\/dev\/hda1/\/dev\/sdb3/g' MODIF
The s in the command stands for 'substitute'.
The g stands for "globally" and forces the substitution to take place throughout each line.
If the line contains the keyword KEY, then substitute ':' with ';' globally:
    sed '/KEY/ s/:/;/g' MODIF

You can issue several commands, each starting with -e.
Example: (1) delete all blank lines, then (2) substitute 'OLD' by 'NEW' in the file MODIF:
    sed -e '/^$/ d' -e 's/OLD/NEW/g' MODIF
These commands can also be written to a file; each line is then interpreted as a new command to execute (no quotes are needed).
An example COMMANDS file:
    1 s/old/new/
    /keyword/ s/old/new/g
    23,25 d
The syntax to use this COMMANDS file is:
    sed -f COMMANDS MODIF

sed: summary of options
Command-line flags:
    -e    execute the following command
    -f    read commands from a file
    -n    do not print lines unless explicitly requested
sed commands:
    d     delete an entire line
    r     read a file and append it to the output
    s     substitute
    w     write output to a file

Summary of commands
    cat         concatenate files and print on the standard output
    cut         remove sections from each line of files
    expand      convert tabs to spaces
    fmt         simple optimal text formatter
    head        output the first part of files
    join        join lines of two files on a common field
    nl          number lines of files
    od          dump files in octal and other formats
    paste       merge lines of files
    sort        sort lines of text files
    split       split a file into pieces
    tac         concatenate and print files in reverse
    tail        output the last part of files
    tr          translate or delete characters
    unexpand    convert spaces to tabs
    uniq        remove duplicate lines from a sorted file
    wc          print the number of bytes, words, and lines in files
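
Several of the filters summarized above are often chained in a pipeline. The following is a minimal sketch, assuming a POSIX shell; the sample data is a made-up passwd-style file, not a real /etc/passwd, and the variable names are hypothetical. The last command uses | instead of / as the delimiter of the s command, which sed allows and which avoids escaping the slashes in path names.

```shell
# Hypothetical passwd-style sample data (not a real /etc/passwd).
data=$(mktemp)
cat > "$data" <<'EOF'
# sample accounts
root:x:0:0:root:/root:/bin/bash
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/sh

carol:x:1002:1002:Carol:/home/carol:/bin/bash
EOF

# Drop comment lines and blank lines, keep field 7 (the login shell),
# then count how many accounts use each shell.
shells=$(sed -e '/^#/ d' -e '/^$/ d' "$data" | cut -d: -f7 | sort | uniq -c)

# Login names (field 1) converted to upper case, one per line.
names=$(sed -e '/^#/ d' -e '/^$/ d' "$data" | cut -d: -f1 | tr 'a-z' 'A-Z')

# Replace /bin/bash with /bin/sh. With -n and the p flag, sed prints only
# the lines where a substitution happened, so wc -l counts the changed lines.
changed=$(sed -n 's|/bin/bash|/bin/sh|p' "$data" | wc -l)

echo "$shells"
echo "$names"
echo "changed: $changed"

rm -f "$data"
```

sort comes before uniq -c here because uniq only collapses consecutive identical lines.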