GAWK: Effective AWK Programming: A User’s Guide for GNU Awk, Edition 4, June 2011


Arnold D. Robbins

“To boldly go where no man has gone before” is a Registered Trademark of Paramount Pictures Corporation.

Published by:
Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA
Phone: +1-617-542-5942
Fax: +1-617-542-2652
Email: [email protected]
URL: http://www.gnu.org/

ISBN 1-882114-28-0

Copyright © 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011 Free Software Foundation, Inc.

This is Edition 4 of GAWK: Effective AWK Programming: A User’s Guide for GNU Awk, for the 4.0.0 (or later) version of the GNU implementation of AWK.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License”, the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled “GNU Free Documentation License”.

a. “A GNU Manual”
b. “You have the freedom to copy and modify this GNU manual. Buying copies from the FSF supports it in developing GNU and promoting software freedom.”

To Miriam, for making me complete.
To Chana, for the joy you bring us.
To Rivka, for the exponential increase.
To Nachum, for the added dimension.
To Malka, for the new beginning.

Short Contents
Foreword
Preface
1 Getting Started with awk
2 Running awk and gawk
3 Regular Expressions
4 Reading Input Files
5 Printing Output
6 Expressions
7 Patterns, Actions, and Variables
8 Arrays in awk
9 Functions
10 Internationalization with gawk
11 Advanced Features of gawk
12 A Library of awk Functions
13 Practical awk Programs
14 dgawk: The awk Debugger
A The Evolution of the awk Language
B Installing gawk
C Implementation Notes
D Basic Programming Concepts
Glossary
GNU General Public License
GNU Free Documentation License
Index

Table of Contents
Foreword
Preface
  History of awk and gawk
  A Rose by Any Other Name
  Using This Book
  Typographical Conventions
  The GNU Project and This Book
  How to Contribute
  Acknowledgments
1 Getting Started with awk
  1.1 How to Run awk Programs
    1.1.1 One-Shot Throwaway awk Programs
    1.1.2 Running awk Without Input Files
    1.1.3 Running Long Programs
    1.1.4 Executable awk Programs
    1.1.5 Comments in awk Programs
    1.1.6 Shell-Quoting Issues
      1.1.6.1 Quoting in MS-Windows Batch Files
  1.2 Data Files for the Examples
  1.3 Some Simple Examples
  1.4 An Example with Two Rules
  1.5 A More Complex Example
  1.6 awk Statements Versus Lines
  1.7 Other Features of awk
  1.8 When to Use awk
2 Running awk and gawk
  2.1 Invoking awk
  2.2 Command-Line Options
  2.3 Other Command-Line Arguments
  2.4 Naming Standard Input
  2.5 The Environment Variables gawk Uses
    2.5.1 The AWKPATH Environment Variable
    2.5.2 Other Environment Variables
  2.6 gawk’s Exit Status
  2.7 Including Other Files Into Your Program
  2.8 Obsolete Options and/or Features
  2.9 Undocumented Options and Features
3 Regular Expressions
  3.1 How to Use Regular Expressions
  3.2 Escape Sequences
  3.3 Regular Expression Operators
  3.4 Using Bracket Expressions
  3.5 gawk-Specific Regexp Operators
  3.6 Case Sensitivity in Matching
  3.7 How Much Text Matches?
  3.8 Using Dynamic Regexps
4 Reading Input Files
  4.1 How Input Is Split into Records
  4.2 Examining Fields
  4.3 Nonconstant Field Numbers
  4.4 Changing the Contents of a Field
  4.5 Specifying How Fields Are Separated
    4.5.1 Whitespace Normally Separates Fields
    4.5.2 Using Regular Expressions to Separate Fields
    4.5.3 Making Each Character a Separate Field
    4.5.4 Setting FS from the Command Line
    4.5.5 Field-Splitting Summary
  4.6 Reading Fixed-Width Data
  4.7 Defining Fields By Content
  4.8 Multiple-Line Records
  4.9 Explicit Input with getline
    4.9.1 Using getline with No Arguments
    4.9.2 Using getline into a Variable
    4.9.3 Using getline from a File
    4.9.4 Using getline into a Variable from a File
    4.9.5 Using getline from a Pipe
    4.9.6 Using getline into a Variable from a Pipe
    4.9.7 Using getline from a Coprocess
    4.9.8 Using getline into a Variable from a Coprocess
    4.9.9 Points to Remember About getline
    4.9.10 Summary of getline Variants
  4.10 Directories On The Command Line
5 Printing Output
  5.1 The print Statement
  5.2 print Statement Examples
  5.3 Output Separators
  5.4 Controlling Numeric Output with print
  5.5 Using printf Statements for Fancier Printing
    5.5.1 Introduction to the printf Statement
    5.5.2 Format-Control Letters
    5.5.3 Modifiers for printf Formats
    5.5.4 Examples Using printf
  5.6 Redirecting Output of print and printf
  5.7 Special File Names in gawk
    5.7.1 Special Files for Standard Descriptors
    5.7.2 Special Files for Network Communications
    5.7.3 Special File Name Caveats
  5.8 Closing Input and Output Redirections
6 Expressions
  6.1 Constants, Variables and Conversions
    6.1.1 Constant Expressions
      6.1.1.1 Numeric and String Constants
      6.1.1.2 Octal and Hexadecimal Numbers
      6.1.1.3 Regular Expression Constants
    6.1.2 Using Regular Expression Constants
    6.1.3 Variables
      6.1.3.1 Using Variables in a Program
      6.1.3.2 Assigning Variables on the Command Line
    6.1.4 Conversion of
Recommended publications
  • at, batch: Execute Commands at a Later Time
    at, batch: execute commands at a later time
    at [-csm] [-f script] [-qqueue] time [date] [+ increment]
    at -l [job...]
    at -r job...
    batch
    at and batch read commands from standard input to be executed at a later time. at allows you to specify when the commands should be executed, while jobs queued with batch will execute when the system load level permits. Executes commands read from stdin or a file at some later time. Unless redirected, the output is mailed to the user.
    Example A.1
      1 at 6:30am Dec 12 < program
      2 at noon tomorrow < program
      3 at 1945 August 9 < program
      4 at now + 3 hours < program
      5 at 8:30am Jan 4 < program
      6 at -r 83883555320.a
    EXPLANATION
      1. At 6:30 in the morning on December 12th, start the job.
      2. At noon tomorrow, start the job.
      3. At 7:45 in the evening on August 9th, start the job.
      4. In three hours, start the job.
      5. At 8:30 in the morning of January 4th, start the job.
      6. Removes previously scheduled job 83883555320.a.
    awk: pattern scanning and processing language
    awk [-f program-file] [-Fc] [prog] [parameters] [filename...]
    awk scans each input filename for lines that match any of a set of patterns specified in prog.
    Example A.2
      1 awk '{print $1, $2}' file
      2 awk '/John/{print $3, $4}' file
      3 awk -F: '{print $3}' /etc/passwd
      4 date | awk '{print $6}'
    EXPLANATION
      1. Prints the first two fields of file, where fields are separated by whitespace.
      2. Prints fields 3 and 4 if the pattern John is found.
  • Buffer Overflow Exploits
    Buffer overflow exploits
    EJ Jung
    A Bit of History: Morris Worm
    Worm was released in 1988 by Robert Morris
      • Graduate student at Cornell, son of NSA chief scientist
      • Convicted under the Computer Fraud and Abuse Act, sentenced to 3 years of probation and 400 hours of community service
      • Now a computer science professor at MIT
    Worm was intended to propagate slowly and harmlessly and to measure the size of the Internet
    Due to a coding error, it created new copies as fast as it could and overloaded infected machines
    $10-100M worth of damage
    Morris Worm and Buffer Overflow
    One of the worm’s propagation techniques was a buffer overflow attack against a vulnerable version of fingerd on VAX systems
      • By sending a special string to the finger daemon, the worm caused it to execute code creating a new worm copy
      • Unable to determine the remote OS version, the worm also attacked fingerd on Suns running BSD, causing them to crash (instead of spawning a new copy)
    For more history:
      • http://www.snowplow.org/tom/worm/worm.html
    Buffer Overflow These Days
    Most common cause of Internet attacks
      • Over 50% of advisories published by CERT (Computer Emergency Response Team) are caused by various buffer overflows
    Morris worm (1988): overflow in fingerd
      • 6,000 machines infected
    CodeRed (2001): overflow in MS-IIS server
      • 300,000 machines infected in 14 hours
    SQL Slammer (2003): overflow in MS-SQL server
      • 75,000 machines infected in 10 minutes (!!)
    Attacks on Memory Buffers
    Buffer is a data storage area inside computer memory (stack or heap)
      • Intended to hold pre-defined
  • Wait, I Don't Want to Be the Linux Administrator for SAS VA
    SESUG Paper 88-2017
    Wait, I don’t want to be the Linux Administrator for SAS VA
    Jonathan Boase, Zencos Consulting
    ABSTRACT
    Whether you are a new SAS administrator or switching to a Linux environment, you have a complex mission. This job becomes even more formidable when you are working with a system like SAS Visual Analytics that requires multiple users loading data daily. Eventually a user will have data issues or create a disruption that causes the system to malfunction. When that happens, what do you do next? In this paper, we go through the basics of a SAS Visual Analytics Linux environment and how to troubleshoot the system when issues arise.
    INTRODUCTION
    Many companies choose to implement SAS Visual Analytics in a Linux environment. With a distributed deployment, Linux is the only choice, but many companies choose it even when it is not required because it reduces operating costs. If you are the newly chosen SAS platform administrator, you might be more versed in a Windows environment and feel intimidated by Linux. This paper introduces basic Linux commands and methods for troubleshooting a SAS Visual Analytics environment. The paper assumes that SAS Visual Analytics is installed on a SAS 9.4 platform for Linux and that the reader has some familiarity with other operating systems, such as Windows.
    PLATFORM ADMINISTRATION 101
    SAS platform administrators work with three main product areas. Each area provides different functionality based on the task the administrator needs to perform. The following figure defines each area and provides a general overview of its purpose.
    Figure 1. Platform Administrator Tools
    Operating System
      • Contains installed software
      • Contains logs used for troubleshooting
      • Administer host system users
    SAS Management Console
      • Access and manage the metadata
      • Control database connections
      • Manage user accounts
      • Manage the LASR server
    SAS Environment Manager
      • Monitor the environment
      • Configure custom alerts
    With any operating system, there is always a lot to learn.
  • Practical C Programming, 3rd Edition
    Practical C Programming, 3rd Edition
    By Steve Oualline
    3rd Edition, August 1997
    ISBN: 1-56592-306-5
    This new edition of "Practical C Programming" teaches users not only the mechanics of programming, but also how to create programs that are easy to read, maintain, and debug. It features more extensive examples and an introduction to graphical development environments. Programs conform to ANSI C.
    Table of Contents
    Preface
      How This Book is Organized; Chapter by Chapter; Notes on the Third Edition; Font Conventions; Obtaining Source Code; Comments and Questions; Acknowledgments; Acknowledgments to the Third Edition
    I. Basics
    1. What Is C?
      How Programming Works; Brief History of C; How C Works; How to Learn C
    2. Basics of Program Writing
      Programs from Conception to Execution; Creating a Real Program; Creating a Program Using a Command-Line Compiler; Creating a Program Using an Integrated Development Environment; Getting Help on UNIX; Getting Help in an Integrated Development Environment; IDE Cookbooks; Programming Exercises
    3. Style
      Common Coding Practices; Coding Religion; Indentation and Code Format; Clarity; Simplicity; Summary
    4. Basic Declarations and Expressions
      Elements of a Program; Basic Program Structure; Simple Expressions; Variables and Storage; Variable Declarations; Integers; Assignment Statements; printf Function; Floating Point; Floating Point Versus Integer Divide; Characters; Answers; Programming Exercises
    5. Arrays, Qualifiers, and Reading Numbers
      Arrays; Strings; Reading Strings; Multidimensional Arrays; Reading Numbers; Initializing Variables; Types of Integers; Types of Floats; Constant Declarations; Hexadecimal and Octal Constants; Operators for Performing Shortcuts; Side Effects; ++x or x++; More Side-Effect Problems; Answers; Programming Exercises
    6.
  • Download the Specification
    Internationalizing and Localizing Applications in Oracle Solaris
    Part No: E61053
    November 2020
    Copyright © 2014, 2020, Oracle and/or its affiliates.
    License Restrictions Warranty/Consequential Damages Disclaimer
    This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
    Warranty Disclaimer
    The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
    Restricted Rights Notice
    If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:
    U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial
  • potranslator Documentation, Release 1.1.5
    potranslator Documentation
    Release 1.1.5
    SekouD
    Nov 01, 2018
    Contents
    1 potranslator
      1.1 Supported Languages
      1.2 Quick Start for auto-translation with potranslator
      1.3 Basic Features
      1.4 Optional features
      1.5 Installation
      1.6 Commands, options, environment variables
      1.7 License
      1.8 Original
      1.9 CHANGES
    2 Installation
      2.1 Stable release
      2.2 From sources
    3 Usage
      3.1 From a Python program
      3.2 Commands, options, environment variables
    4 Package Api Documentation for potranslator
      4.1 API Reference for the classes in potranslator.potranslator.py
    5 Contributing
      5.1 Types of Contributions
      5.2 Get Started!
      5.3 Pull Request Guidelines
      5.4 Tips
      5.5 Deploying
    6 Credits
      6.1 Development Lead
  • Drilling Network Stacks with packetdrill
    Drilling Network Stacks with packetdrill
    Neal Cardwell and Barath Raghavan
    Testing and troubleshooting network protocols and stacks can be painstaking. To ease this process, our team built packetdrill, a tool that lets you write precise scripts to test entire network stacks, from the system call layer down to the NIC hardware. packetdrill scripts use a familiar syntax and run in seconds, making them easy to use during development, debugging, and regression testing, and for learning and investigation.
    Have you ever had the experience of staring at a long network trace, trying to figure out what on earth went wrong? When a network protocol is not working right, how might you find the problem and fix it? Although tools like tcpdump allow us to peek under the hood, and tools like netperf help measure networks end-to-end, reproducing behavior is still hard, and knowing when an issue has been fixed is even harder.
    These are the exact problems that our team used to encounter on a regular basis during kernel network stack development. Here we describe packetdrill, which we built to enable scriptable network stack testing.
    Neal Cardwell received an M.S. in Computer Science from the University of Washington, with research focused on TCP and Web performance. He joined Google in 2002. Since then he has worked on networking software for google.com, the Googlebot web crawler, the network stack in the Linux kernel, and TCP performance and testing. [email protected]
    Barath Raghavan received a Ph.D. in Computer Science from UC San Diego and a B.S. from UC Berkeley. He joined Google in 2012 and was previously a
  • Automatic Discovery of API-Level Vulnerabilities
    Automatic Discovery of API-Level Vulnerabilities
    Vinod Ganapathy†, Sanjit A. Seshia‡, Somesh Jha†, Thomas W. Reps†, Randal E. Bryant‡
    †Computer Sciences Department, University of Wisconsin-Madison
    ‡School of Computer Science, Carnegie Mellon University
    UW-MADISON COMPUTER SCIENCES TECHNICAL REPORT: UW-CS-TR-1512, JULY 2004.
    Abstract
    A system is vulnerable to an API-level attack if its security can be compromised by invoking an allowed sequence of operations from its API. We present a formal framework to model and analyze APIs, and develop an automatic technique based upon bounded model checking to discover API-level vulnerabilities. If a vulnerability exists, our technique produces a trace of API operations demonstrating an attack. Two case studies show the efficacy of our technique. In the first study we present a novel way to analyze printf-family format-string attacks as API-level attacks, and implement a tool to discover them automatically. In the second study, we model a subset of the IBM Common Cryptographic Architecture API, a popular cryptographic key-management API, and automatically detect a previously known vulnerability.
    1 Introduction
    Software modules communicate through application programming interfaces (APIs). Failure to respect an API's usage conventions or failure to understand the semantics of an API may lead to security vulnerabilities. For instance, Chen et al. [13] demonstrated a security vulnerability in sendmail-8.10.1 that was due to a misunderstanding of the semantics of UNIX user-id setting commands. A programmer mistakenly assumed that setuid(getuid()) would always drop all privileges.
  • Bash Crash Course + bc + sed + awk
    Bash Crash course + bc + sed + awk∗
    Andrey Lukyanenko, CSE, Aalto University
    Fall, 2011
    There are many Unix shell programs: bash, sh, csh, tcsh, ksh, etc. A comparison of them can be found online¹. We will primarily focus on the capabilities of the bash v.4 shell².
    1. Each bash script can be considered a text file which starts with #!/bin/bash. This line tells the program that tries to open the file (an editor or an interpreter) what to do with the file and how it should be treated. The special character sequence #! at the beginning is a magic number; check man magic and /usr/share/file/magic for existing magic numbers if interested.
    2. Each script (assume you created a “scriptname.sh” file) can be invoked by the command <dir>/scriptname.sh in a console, where <dir> is the absolute or relative path to the script directory, e.g., ./scriptname.sh for the current directory. If it has #! as the first line, it will be invoked by this command; otherwise it can be called by the command bash <dir>/scriptname.sh. Notice: to call the script as ./scriptname.sh it has to be executable, i.e., run chmod 555 scriptname.sh beforehand.
    3. Variables in bash can be treated as integers or strings, depending on their value. There is a set of operations and rules available for them. For example:
        #!/bin/bash
        var1=123      # Assigns value 123 to var1
        echo var1     # Prints 'var1' to output
        echo $var1    # Prints '123' to output
        var2 =321     # Error (var2: command not found)
        var2= 321     # Error (321: command not found)
        var2=321      # Correct
        var3=$var2    # Assigns value 321 from var2 to var3
        echo $var3    # Prints '321' to output
  • Open Source License and Copyright Information for GPLv3 and LGPLv3
    Open Source License and Copyright Information for GPLv3/LGPLv3
    Dell EMC PowerStore
    June 2021, Rev A02
    Revisions
      Date            Description
      May 2020        Initial release
      December 2020   Version updates for some licenses, and addition and deletion of other components
      June 2021       Version updates for some licenses, and addition and deletion of other components
    The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any software described in this publication requires an applicable software license.
    Copyright © 2020-2021 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners.
    Table of contents
      Revisions
      Table of contents
  • UNIX X Command Tips and Tricks, David B. Horvath
    SESUG Paper 122-2019
    UNIX X Command Tips and Tricks
    David B. Horvath, MS, CCP
    ABSTRACT
    SAS® provides the ability to execute operating system level commands from within your SAS code, generically known as the “X Command”. This session explores the various commands, the advantages and disadvantages of each, and their alternatives. The focus is on UNIX/Linux, but much of the same applies to Windows as well. Under SAS EG, any issued commands execute on the SAS engine, not necessarily on the PC. Topics covered:
      X, %sysexec, Call system, Systask command, Filename pipe, &SYSRC, Waitfor
    Alternatives will also be addressed: how to handle NOXCMD being the default for your installation, saving results, and error checking.
    INTRODUCTION
    In this paper I will be covering some of the basics of the functionality within SAS that allows you to execute operating system commands from within your program. There are multiple ways you can do so: external to data steps, within data steps, and within macros. All of these, along with error checking, will be covered.
    RELEVANT OPTIONS
    Execution of any of the SAS System command execution commands depends on one option's setting:
      XCMD    Enables the X command in SAS.
    This option can only be set at startup; specifying it in a running session produces a warning:
        options xcmd;
                ____
                30
        WARNING 30-12: SAS option XCMD is valid only at startup of the SAS System. The SAS option is ignored.
    Unfortunately, if NOXCMD is set at startup time, you're out of luck. Sorry! You might want to have a conversation with your system administrators to determine why and whether you can get it changed.
  • Useful Commands in Linux and Other Tools for Quality Control
    Useful commands in Linux and other tools for quality control
    Ignacio Aguilar, INIA Uruguay, 05-2018
    Unix Basic Commands
      pwd        show working directory
      ls         list files in working directory
      ll         as before, but with more information
      mkdir d    make a directory d
      cd d       change to directory d
    Copy and moving commands
      To copy a file:                            cp /home/user/is .
      To copy a directory:                       cp -r /home/folder .
      To move file aa into bb in folder test:    mv aa ./test/bb
    To delete
      rm yy      delete the file yy
      rm -r xx   delete the folder xx
    Redirections & pipe
    Redirection is useful to read from or write to a file:
      aa < bb    program aa reads from file bb    (e.g., blupf90 < in)
      aa > bb    program aa writes to file bb     (e.g., blupf90 < in > log)
    The pipe "|" is similar to redirection, but instead of writing to a file, it passes the content as input to another command.
      tee        copy standard input to standard output and save it in a file
      echo       write its arguments to standard output
    Example: program blupf90 reads the name of the parameter file and writes its output both to the terminal and to a log file:
      echo par.b90 | blupf90 | tee blup.log
    Other popular commands
      head file          print first 10 lines
      tail file          print last 10 lines
      more file          list file page-by-page
      less file          list file line-by-line or page-by-page
      wc -l file         count lines
      grep text file     find lines that contain text
      cat file1 file2    concatenate files
      sort               sort file
      cut                cut specific columns
      join               join lines of two files on specific columns
      paste              paste lines of two files
      expand             replace TABs with spaces
      uniq               retain unique lines of a sorted file (only adjacent duplicates are removed)
    head / tail
      $ head pedigree.txt
      1 0 0
      2 0 0
      3 0 0
      4 0 0
      5 0 0
      6 0 0
      7 0 0
      8 0 0
      9 0 0
      10
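The awk commands in Example A.2 of the at/awk excerpt above depend on awk's field splitting, the topic of section 4.5 of the GAWK manual: with the default field separator, runs of whitespace separate fields, while -F: makes every colon a separator, so adjacent colons yield an empty field. As a rough illustration only, the Python sketch below mimics those two cases. The helper names awk_fields and field are hypothetical, and regular-expression FS values, which real awk also supports, are deliberately not modelled.

```python
def awk_fields(record, fs=" "):
    """Split one record into fields roughly the way awk does.

    Only two FS cases are modelled (an assumption of this sketch):
      * fs == " " (awk's default): runs of whitespace separate fields and
        leading/trailing whitespace is ignored. Python's str.split() also
        treats other whitespace characters as separators, so this is approximate.
      * any other single character (e.g. ":" for -F:): every occurrence is a
        separator, so adjacent separators produce empty fields.
    """
    if record == "":
        return []                      # an empty record has NF == 0
    if fs == " ":
        return record.split()          # no-argument split collapses whitespace runs
    if len(fs) == 1:
        return record.split(fs)        # literal single-character separator
    raise ValueError("only ' ' or a single-character FS is modelled in this sketch")


def field(record, n, fs=" "):
    """Return what awk's $n would hold: $0 is the whole record and
    fields beyond NF are the empty string."""
    if n == 0:
        return record
    fields = awk_fields(record, fs)
    return fields[n - 1] if n <= len(fields) else ""


# Rough equivalents of two commands from Example A.2:
passwd_line = "root:x:0:0:root:/root:/bin/bash"
print(field(passwd_line, 3, fs=":"))    # like awk -F: '{print $3}'   -> 0
line = "  John   Smith  42 "
print(field(line, 1), field(line, 2))   # like awk '{print $1, $2}'   -> John Smith
```

The point the sketch makes concrete is the asymmetry between the two modes: whitespace splitting never produces empty fields, whereas a single-character separator such as ":" preserves them, which is why empty fields in /etc/passwd still keep later field numbers stable.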
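The uniq entry in the Linux commands excerpt above says it keeps the unique lines of a sorted file. The reason sorting matters is that uniq only collapses identical lines that are adjacent. The Python sketch below models that default behavior and a `uniq -c` style count using itertools.groupby. The helper names uniq and uniq_count are hypothetical, and the sketch is an illustration, not a reimplementation of the command-line tool's options.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of identical adjacent lines, as uniq does by default."""
    return [line for line, _run in groupby(lines)]


def uniq_count(lines):
    """Like `uniq -c`: a (count, line) pair for each run of adjacent duplicates."""
    return [(len(list(run)), line) for line, run in groupby(lines)]


names = ["bob", "ann", "ann", "bob"]
print(uniq(names))                # ['bob', 'ann', 'bob']  (the second "bob" is not adjacent, so it survives)
print(uniq(sorted(names)))        # ['ann', 'bob']         (equivalent of sort | uniq)
print(uniq_count(sorted(names)))  # [(2, 'ann'), (2, 'bob')]
```

Running the same input with and without sorting shows why pipelines such as `sort file | uniq` are the usual idiom.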