GAWK: Effective AWK Programming: A User’s Guide for GNU Awk, Edition 4, June 2011
GAWK: Effective AWK Programming
A User’s Guide for GNU Awk
Edition 4
June, 2011

Arnold D. Robbins

“To boldly go where no man has gone before” is a Registered Trademark of Paramount Pictures Corporation.

Published by:
Free Software Foundation
51 Franklin Street, Fifth Floor
Boston, MA 02110-1301 USA
Phone: +1-617-542-5942
Fax: +1-617-542-2652
Email: [email protected]
URL: http://www.gnu.org/
ISBN 1-882114-28-0

Copyright © 1989, 1991, 1992, 1993, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011 Free Software Foundation, Inc.

This is Edition 4 of GAWK: Effective AWK Programming: A User’s Guide for GNU Awk, for the 4.0.0 (or later) version of the GNU implementation of AWK.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with the Invariant Sections being “GNU General Public License”, the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled “GNU Free Documentation License”.

a. “A GNU Manual”
b. “You have the freedom to copy and modify this GNU manual. Buying copies from the FSF supports it in developing GNU and promoting software freedom.”

To Miriam, for making me complete.
To Chana, for the joy you bring us.
To Rivka, for the exponential increase.
To Nachum, for the added dimension.
To Malka, for the new beginning.

Short Contents

Foreword
Preface
1 Getting Started with awk
2 Running awk and gawk
3 Regular Expressions
4 Reading Input Files
5 Printing Output
6 Expressions
7 Patterns, Actions, and Variables
8 Arrays in awk
9 Functions
10 Internationalization with gawk
11 Advanced Features of gawk
12 A Library of awk Functions
13 Practical awk Programs
14 dgawk: The awk Debugger
A The Evolution of the awk Language
B Installing gawk
C Implementation Notes
D Basic Programming Concepts
Glossary
GNU General Public License
GNU Free Documentation License
Index

Table of Contents

Foreword
Preface
  History of awk and gawk
  A Rose by Any Other Name
  Using This Book
  Typographical Conventions
  The GNU Project and This Book
  How to Contribute
  Acknowledgments
1 Getting Started with awk
  1.1 How to Run awk Programs
    1.1.1 One-Shot Throwaway awk Programs
    1.1.2 Running awk Without Input Files
    1.1.3 Running Long Programs
    1.1.4 Executable awk Programs
    1.1.5 Comments in awk Programs
    1.1.6 Shell-Quoting Issues
      1.1.6.1 Quoting in MS-Windows Batch Files
  1.2 Data Files for the Examples
  1.3 Some Simple Examples
  1.4 An Example with Two Rules
  1.5 A More Complex Example
  1.6 awk Statements Versus Lines
  1.7 Other Features of awk
  1.8 When to Use awk
2 Running awk and gawk
  2.1 Invoking awk
  2.2 Command-Line Options
  2.3 Other Command-Line Arguments
  2.4 Naming Standard Input
  2.5 The Environment Variables gawk Uses
    2.5.1 The AWKPATH Environment Variable
    2.5.2 Other Environment Variables
  2.6 gawk’s Exit Status
  2.7 Including Other Files Into Your Program
  2.8 Obsolete Options and/or Features
  2.9 Undocumented Options and Features
3 Regular Expressions
  3.1 How to Use Regular Expressions
  3.2 Escape Sequences
  3.3 Regular Expression Operators
  3.4 Using Bracket Expressions
  3.5 gawk-Specific Regexp Operators
  3.6 Case Sensitivity in Matching
  3.7 How Much Text Matches?
  3.8 Using Dynamic Regexps
4 Reading Input Files
  4.1 How Input Is Split into Records
  4.2 Examining Fields
  4.3 Nonconstant Field Numbers
  4.4 Changing the Contents of a Field
  4.5 Specifying How Fields Are Separated
    4.5.1 Whitespace Normally Separates Fields
    4.5.2 Using Regular Expressions to Separate Fields
    4.5.3 Making Each Character a Separate Field
    4.5.4 Setting FS from the Command Line
    4.5.5 Field-Splitting Summary
  4.6 Reading Fixed-Width Data
  4.7 Defining Fields By Content
  4.8 Multiple-Line Records
  4.9 Explicit Input with getline
    4.9.1 Using getline with No Arguments
    4.9.2 Using getline into a Variable
    4.9.3 Using getline from a File
    4.9.4 Using getline into a Variable from a File
    4.9.5 Using getline from a Pipe
    4.9.6 Using getline into a Variable from a Pipe
    4.9.7 Using getline from a Coprocess
    4.9.8 Using getline into a Variable from a Coprocess
    4.9.9 Points to Remember About getline
    4.9.10 Summary of getline Variants
  4.10 Directories On The Command Line
5 Printing Output
  5.1 The print Statement
  5.2 print Statement Examples
  5.3 Output Separators
  5.4 Controlling Numeric Output with print
  5.5 Using printf Statements for Fancier Printing
    5.5.1 Introduction to the printf Statement
    5.5.2 Format-Control Letters
    5.5.3 Modifiers for printf Formats
    5.5.4 Examples Using printf
  5.6 Redirecting Output of print and printf
  5.7 Special File Names in gawk
    5.7.1 Special Files for Standard Descriptors
    5.7.2 Special Files for Network Communications
    5.7.3 Special File Name Caveats
  5.8 Closing Input and Output Redirections
6 Expressions
  6.1 Constants, Variables and Conversions
    6.1.1 Constant Expressions
      6.1.1.1 Numeric and String Constants
      6.1.1.2 Octal and Hexadecimal Numbers
      6.1.1.3 Regular Expression Constants
    6.1.2 Using Regular Expression Constants
    6.1.3 Variables
      6.1.3.1 Using Variables in a Program
      6.1.3.2 Assigning Variables on the Command Line
    6.1.4 Conversion of