AWKREFERENCE DEFINITIONS This card describes POSIX AWK, as well as the three freely CONTENTS available implementations (see FTP/HTTP Infor mation Action Statements ...... 7 below). Common extensions (in twoormore versions) are Arrays ...... 11 printed in light blue. Features specific to just one version— AwkProgram Execution ...... 4 usually GNU AWK (gawk)—are printed in dark blue. Bit Manipulation Functions (gawk)...... 16 Exceptions and deprecated features are printed in red. Features Bug Reports...... 2 mandated by POSIX are printed in black. Closing Redirections ...... 12 Several type faces are used to clarify the meaning: Command Line Arguments (standard)...... 2 • Courier Bold is used for computer input. Command Line Arguments (gawk)...... 3 • Times Italic is used for emphasis, to indicate user input and for Command Line Arguments (mawk)...... 4 syntactic placeholders, such as variable or action. Conversions And Comparisons ...... 9 •Times Roman is used for explanatory text. Copying Permissions...... 18 Definitions ...... 2 number −afloating point number as in ANSI C, such as 3, 2.3, Dynamic Extensions (gawk)...... 14 1.4e2 or 4.1E5. Numbers may also be giveninoctal or Environment Variables (gawk)...... 16 hexadecimal: e.g., 011 or 0x11. Escape Sequences...... 8 escape sequences −aspecial sequence of characters beginning Expressions ...... 11 with a backslash, used to describe otherwise unprintable Fields ...... 6 characters. (See Escape Sequences below.) FTP/HTTP Information...... 18 Historical Features (gawk)...... 16 string −agroup of characters enclosed in double quotes. Strings Input Control ...... 12 may contain escape sequences. Internationalization (gawk)...... 18 regexp −aregular expression, either a regexp constant enclosed in Lines And Statements...... 5 forward slashes, or a dynamic regexp computed at run-. Localization (gawk)...... 17 Regexp constants may contain escape sequences. Numeric Functions ...... 14 Output Control...... 12 name −avariable, array or function name. Pattern Elements...... 7 entry(N)−entry entry in section N of the reference POSIX Character Classes (gawk)...... 6 manual. Printf Formats ...... 13 Records ...... 6 pattern −anexpression describing an input record to be matched. Regular Expressions...... 5 action −statements to execute when an input record is matched. Special ...... 14 String Functions ...... 15 rule −apattern-action pair,where the pattern or action may be Time Functions (gawk)...... 16 missing. User-defined Functions ...... 17 Variables ...... 8 COMMAND LINE ARGUMENTS (standard) Arnold Robbins wrote this reference card. We thank Brian Command line arguments control setting the field separator, Kernighan and Michael Brennan who reviewed it. setting variables before the BEGIN rule is run, and the location of AWKprogram source code. Implementation-specific command line arguments change the behavior of the running interpreter. OTHER FSF PRODUCTS: −F fs use fs for the input field separator. Free Software Foundation, Inc. −v var=val assign the value val to the variable var before 59 Temple Place — Suite 330 execution of the program begins. Such variable Boston, MA 02111-1307 USA values are available to the BEGIN rule. −f prog-file read the AWK program source from the file Phone: +1-617-542-5942 prog-file,instead of from the first command line Fax(including Japan): +1-617-542-2652 argument. Multiple −f options may be used. E-mail: [email protected] −− the end of options. URL: http://www.gnu.org The following options are accepted by both awk and gawk (ignored by gawk,not in mawk). Source Distributions on CD-ROM −mf val set the maximum number of fields to val −mr val set the maximum record size to val DeluxeDistributions Emacs, Gawk, Make and GDB Manuals BUGREPORTS Emacs and GDB References If you find a bug in this reference card, please report it via electronic mail to [email protected]. Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002 Free Software Foundation, Inc. 1 2

Copyright 05-01-02 18:25:12, FSF, Inc. (all) COMMAND LINE ARGUMENTS (gawk) COMMAND LINE ARGUMENTS (mawk) The following options are specific to gawk.You may also use The following options are specific to mawk. ``−W option'' for full POSIX compliance. Long options may −W print an assembly listing of the program abbreviated as long as the abbreviation remains unique. to stdout and exit zero. −−assign var=val just like −v. −W exec file read program text from file.Noother −−field-separator fs just like −F. options are processed. Useful with #!. −−file prog-file just like −f. −W interactive unbuffer stdout and line buffer −−compat, −−traditional stdin.Lines are always records, disable gawk-specific extensions (the use of ignoring RS. −−traditional is preferred). −W posix_space \n separates fields when RS = "". −−copyleft, −−copyright −W sprintf=num adjust the size of mawk's internal print the short version of the GNU copyright sprintf buffer. information on stdout. −W version print version and copyright on stdout −−dump-variables[=file] and limit information on stderr and Print a sorted list of global variables, their types exit zero. and final values to file.Ifnofile is provided, The options may be abbreviated using just the first letter,e.g., gawk uses awkvars.out. −We, −Wv and so on. −−gen−po the program and print a GNU gettext format .po format file on standard output, containing the text of all strings that were AWKPROGRAM EXECUTION marked for localization. AWKprograms are a sequence of pattern-action statements and −−help, −−usage optional function definitions. print a short summary of the available options on stdout,then exit zero. pattern { action statements } −−lint[=fatal] function name(parameter list){statements } warn about constructs that are dubious or non- awk first reads the program source from the prog-file(s), if portable to other awks. With an optional specified, from arguments to −−source, or from the first non- argument of fatal,lint warnings become fatal option argument on the command line. The program text is read errors. as if all the prog-file(s) and command line source texts had been −−lint−old warn about constructs that are not portable to the concatenated. original version of Unix awk. −−non−decimal−data AWKprograms execute in the following order.First, all variable recognize octal and hexadecimal values in input assignments specified via the −v option are performed. Next, data. Use this option with great caution! awk executes the code in the BEGIN rules(s), if any, and then −− disable common and GNU extensions. Enable proceeds to read the files 1 through ARGC − 1 in the ARGV interval expressions in regular expression array.(Adjusting ARGC and ARGV thus provides control overthe matching (see Regular Expressions below). input files that will be processed.) If there are no files named on −−profile[=prof_file] the command line, awk reads the standard input. send profiling data to prof_file (default: If a command line argument has the form var=val,itistreated as awkprof.out). With gawk,the profile is just avariable assignment. The variable var will be assigned the value a``pretty printed''version of the program. With val.(This happens after any BEGIN rule(s) have been run.) pgawk,the profile contains execution counts in Command line variable assignment is most useful for the left margin of each statement in the program. dynamically assigning values to the variables awk uses to control −−re−interval howinput is broken into fields and records. It is also useful for enable interval expressions in regular expression controlling state if multiple passes are needed overasingle data matching (see Regular Expressions below). file. Useful if −−posix is not specified. −−source ’text’ If the value of a particular element of ARGV is empty (""), awk use text as AWK program source code. skips overit. −−version print version information on stdout and exit Foreach record in the input, awk tests to see if it matches any zero. pattern in the AWK program. Foreach pattern that the record In compatibility mode, anyother options are flagged as invalid, matches, the associated action is executed. The patterns are butare otherwise ignored. In normal operation, as long as tested in the order theyoccur in the program. program text has been supplied, unknown options are passed on Finally,after all the input is exhausted, awk executes the code in to the AWK program in ARGV for processing. This is most useful the END rule(s), if any. for running AWK programs via the #! executable interpreter mechanism. If a program only has a BEGIN rule, no input files are processed. If a program only has an END rule, the input will be read. pgawk accepts twosignals. SIGUSR1 causes it to dump a profile and function call stack to the profile file. It then continues to run. SIGHUP causes it to dump the profile and function call stack and then exit.

3 4

Copyright 05-01-02 18:25:12, FSF, Inc. (all) LINES AND STATEMENTS POSIX CHARACTER CLASSES (gawk) AWKisaline-oriented language. The pattern comes first, and In regular expressions, within character ranges ([...]), the then the action. Action statements are enclosed in { and }.Either notation [[:class:]] defines character classes: the pattern or the action may be missing, but not both. If the alnum alphanumeric lower lower-case pattern is missing, the action is executed for every input record. alpha alphabetic print printable Amissing action is equivalent to blank space or tab punct punctuation {print } cntrl control space whitespace digit decimal upper upper-case which prints the entire record. graph non-spaces xdigit hexadecimal Comments begin with the # character,and continue until the end Recognition of these character classes is disabled when of the line. Normally,astatement ends with a , but lines −−traditional is supplied. ending in a ``,'', {, ?, :, && or || are automatically continued. Lines ending in do or else also have their statements automatically continued on the following line. In other cases, a RECORDS line can be continued by ending it with a ``\'', in which case the Normally,records are separated by newline characters. Assigning newline is ignored. However, a ``\''after a # is not special. values to the built-in variable RS controls howrecords are Multiple statements may be put on one line by separating them separated. If RS is anysingle character,that character separates with a ``;''. This applies to both the statements within the action records. Otherwise, RS is a regular expression. (Not Bell Labs part of a pattern-action pair (the usual case) and to the pattern- awk.) Te xtinthe input that matches this regular expression action statements themselves. separates the record. gawk sets RT to the value of the input text that matched the regular expression. The value of IGNORECASE also affects howrecords are separated when RS is a regular expression. If RS is set to the null string, then records are separated by one or more blank lines. When RS is set to the null REGULAR EXPRESSIONS string, the newline character always acts as a field separator,in Regular expressions are the extended kind originally defined by addition to whatevervalue FS may have. mawk does not apply egrep. Additional GNU regexp operators are supported by exceptional rules to FS when RS = "". gawk.Aword-constituent character is a letter,digit, or underscore (_). FIELDS Summary of Regular Expressions As each input record is read, awk splits the record into fields, In Decreasing Precedence using the value of the FS variable as the field separator.IfFS is a (r) regular expression (for grouping) single character,fields are separated by that character. If FS is c if non-special char,matches itself the null string, then each individual character becomes a separate \c turn offspecial meaning of c field. Otherwise, FS is expected to be a full regular expression. ˆ beginning of string (note: not line) In the special case that FS is a single space, fields are separated $ end of string (note: not line) by runs of spaces and/or tabs and/or .Leading and . anysingle character,including newline trailing whitespace are ignored. The value of IGNORECASE also [...] anyone character in ... or range affects howfields are split when FS is a regular expression. [ˆ...] anyone character not in ... or range If the FIELDWIDTHS variable is set to a space-separated list of \y word boundary numbers, each field is expected to have a fixed width, and gawk \B middle of a word splits up the record using the specified widths. The value of FS is \< beginning of a word ignored. Assigning anew value to FS overrides the use of \> end of a word FIELDWIDTHS,and restores the default behavior. \w anyword-constituent character \W anynon-word-constituent character Each field in the input record may be referenced by its position, \‘ beginning of a string $1, $2 and so on. $0 is the whole record. Fields may also be \’ end of a string assigned newvalues. r* zero or more occurrences of r The variable NF is set to the total number of fields in the input r+ one or more occurrences of r record. r? zero or one occurrences of r r{n,m} n to m occurrences of r (POSIX: see note below) References to non-existent fields (i.e., fields after $NF)produce r1| r2 r1 or r2 the null-string. However, assigning to a non-existent field (e.g., $(NF+2) = 5)increases the value of NF,creates any The r{n,m} notation is called an interval expression.POSIX intervening fields with the null string as their value, and causes mandates it for AWK regexps, but most awksdon'timplement it. the value of $0 to be recomputed with the fields being separated Use −−re−interval or −−posix to enable this feature in by the value of OFS.References to negative numbered fields gawk. cause a fatal error.Decreasing the value of NF causes the trailing fields to be lost (not Bell Labs awk).

5 6

Copyright 05-01-02 18:25:12, FSF, Inc. (all) PA TTERN ELEMENTS ESCAPE SEQUENCES AWKpatterns may be one of the following. Within strings constants ("...")and regexp constants (/.../), escape sequences may be used to generate otherwise unprintable BEGIN characters. This table lists the available escape sequences. END expression \a alert (bell) \r carriage return pat1,pat2 \b backspace \t horizontal tab \f form feed \v vertical tab BEGIN and END are special patterns that provide start-up and \n newline \\ backslash clean-up actions respectively.Theymust have actions. There can \ddd octal value ddd \xhh hexvalue hh be multiple BEGIN and END rules; theyare merged and executed \" double quote \/ forward slash as if there had just been one large rule. Theymay occur anywhere in a program, including different source files. VARIABLES Expression patterns can be anyexpression, as described under ARGC number of command line arguments. Expressions. ARGIND indexinARGV of current data file. The pat1,pat2 pattern is called a rangepattern.Itmatches all ARGV array of command line arguments. Indexed input records starting with a record that matches pat1,and from 0 to ARGC −1.Dynamically changing continuing until a record that matches pat2,inclusive.Itdoes not the contents of ARGV can control the files combine with anyother pattern expression. used for data. BINMODE controls ‘‘binary’’mode for all file I/O. Values of 1, 2, or 3, indicate input, output, or all files, respectively,should use binary I/O. ACTION STATEMENTS (Not Bell Labs awk.) Applies only to non- break POSIX systems. For gawk,string values of break out of the nearest enclosing do, for,orwhile loop. "r",or"w" specify that input files, or continue output files, respectively,should use binary skip the rest of the loop body.Evaluate the condition part of I/O. String values of "rw" or "wr" specify the nearest enclosing do or while loop, or go to the incr that all files should use binary I/O. Any part of a for loop. other string value is treated as "rw",but delete array[index] generates a warning message. delete element index from array array. CONVFMT conversion format for numbers, default delete array value is "%.6g". delete all elements from array array. ENVIRON array containing the current environment. do statement while (condition) The array is indexedbythe environment execute statement while condition is true. The statement is variables, each element being the value of always executed at least once. that variable. exit [ expression ] ERRNO string describing the error if a getline terminate input record processing. Execute the END rule(s) if or read fails, or if close() present. If present, expression becomes awk’s return value. fails. for (init; cond; incr) statement FIELDWIDTHS white-space separated list of fieldwidths. execute init.Evaluate cond.Ifitistrue, execute statement. Used to parse the input into fields of fixed Execute incr before going back to the top to re-evaluate cond. width, instead of the value of FS. Anyofthe three may be omitted. Amissing cond is name of the current input file. If no files considered to be true. givenonthe command line, FILENAME is for (var in array) statement ‘‘−’’. FILENAME is undefined inside the execute statement once for each subscript in array,with var BEGIN rule (unless set by getline). set to a different subscript each time through the loop. FNR record number in current input file. if (condition) statement1 [ else statement2 ] FS input field separator,aspace by default (see if condition is true, execute statement1,otherwise execute Fields above). statement2.Each else matches the closest if. IGNORECASE if non-zero, all regular expression and string next see Input Control. operations ignore case. Array subscripting nextfile (not mawk) see Input Control. and asort() are not affected. while (condition) statement LINT provides dynamic control of the −−lint while condition is true, execute statement. option from within an AWK program. { statements } When true, gawk prints lint warnings. alist of statements enclosed in braces can be used anywhere When assigned the string value "fatal", that a single statement would otherwise be used. lint warnings become fatal errors, exactly like −−lint=fatal.Any other true value just prints warnings. NF number of fields in the current input record. NR total number of input records seen so far. OFMT output format for numbers, "%.6g",by default. Old versions of awk used this for number to string conversion. 7 8

Copyright 05-01-02 18:25:12, FSF, Inc. (all) VARIABLES (continued) NOTES OFS output field separator,aspace by default. ORS output record separator,anewline by default. PROCINFO elements of this array provide access to info about the running AWK program. See GAWK: Effective AWK Programming for details. RLENGTH length of the string matched by match(); −1 if no match. RS input record separator,anewline by default (see Records above). RSTART indexofthe first character matched by match();0ifnomatch. RT record terminator. gawk sets RT to the input text that matched the character or regular expression specified by RS. SUBSEP character(s) used to separate multiple subscripts in array elements, by default "\034".(See Arrays below). TEXTDOMAIN the application’stextdomain for internationalization; used to find the localized translations for the program’s strings.

CONVERSIONS AND COMPARISONS Variables and fields may be (floating point) numbers, strings or both. Context determines howthe value of a variable is interpreted. If used in a numeric expression, it will be treated as a number,ifused as a string it will be treated as a string. To force a variable to be treated as a number,add 0 to it; to force it to be treated as a string, concatenate it with the null string. When a string must be converted to a number,the conversion is accomplished using strtod(3). A number is converted to a string by using the value of CONVFMT as a format string for sprintf(3), with the numeric value of the variable as the argument. However, ev enthough all numbers in AWK are floating-point, integral values are always converted as integers. Comparisons are performed as follows: If twovariables are numeric, theyare compared numerically.Ifone value is numeric and the other has a string value that is a ‘‘numeric string,’’ then comparisons are also done numerically.Otherwise, the numeric value is converted to a string, and a string comparison is performed. Two strings are compared, of course, as strings. Note that string constants, such as "57",are not numeric strings, theyare string constants. The idea of ‘‘numeric string’’only applies to fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the elements of an array created by split() that are numeric strings. The basic idea is that user input,and only user input, that looks numeric, should be treated that way. Note that the POSIX standard applies the concept of ‘‘numeric string’’everywhere, eventostring constants. However, this is clearly incorrect, and none of the three free awksdothis. (Fortunately,this is fixed in the next version of the standard.) Uninitialized variables have the numeric value 0 and the string value "" (the null, or empty,string).

9 10

Copyright 05-01-02 18:25:12, FSF, Inc. (all) ARRAYS INPUT CONTROL An array subscript is an expression between square brackets ([ getline set $0 from next record; set NF, NR, FNR. and ]). If the expression is a list (expr, expr ...), then the getline < file set $0 from next record of file;set NF. subscript is a string consisting of the concatenation of the (string) getline v set v from next input record; set NR, FNR. value of each expression, separated by the value of the SUBSEP getline v < file set v from next record of file. variable. This simulates multi-dimensional arrays. For example: cmd |getline pipe into getline;set $0, NF. cmd |getline v pipe into getline;set v. i="A"; j ="B"; k ="C" cmd |& getline co-process pipe into getline;set $0, NF. x[i, j, k] = "hello, world\n" cmd |& getline v assigns "hello, world\n" to the element of the array x co-process pipe into getline;set v. indexedbythe string "A\034B\034C".All arrays in AWK are next associative,i.e., indexedbystring values. stop processing the current input record. Read next input record and start overwith the first pattern in the program. Use the special operator in in an if or while statement to see Upon end of the input data, execute any END rule(s). if a particular value is an array index. nextfile if (val in array) stop processing the current input file. The next input record print array[val] comes from the next input file. FILENAME and ARGIND are updated, FNR is reset to 1, and processing starts overwith the If the array has multiple subscripts, use (i, j) in array. first pattern in the AWK program. Upon end of input data, Use the in construct in a for loop to iterate overall the elements execute any END rule(s). Earlier versions of gawk used of an array. next file,astwo words. This usage is no longer supported. mawk does not currently support nextfile. Use the delete statement to delete an element from an array. Specifying just the array name without a subscript in the delete getline returns 0 on end of file and −1 on an error. Upon an statement deletes the entire contents of an array. error, ERRNO contains a string describing the problem.

EXPRESSIONS OUTPUT CONTROL Expressions are used as patterns, for controlling conditional fflush([file]) action statements, and to produce parameter values when calling flush anybuffers associated with the open output file or pipe functions. Expressions may also be used as simple statements, file. If no file,then flush standard output. If file is null, then particularly if theyhav e side-effects such as assignment. flush all open output files and pipes (not Bell Labs awk). Expressions mix operands and operators.Operands are constants, print fields, variables, array elements, and the return values from print the current record. Terminate output record with ORS. function calls (both built-in and user-defined). print expr-list print expressions. Each expression is separated by the value Regexp constants (/pat/), when used as simple expressions, i.e., of OFS.Terminate the output record with ORS. not used on the right-hand side of ˜ and !˜,orasarguments to printf fmt, expr-list the gensub(), gsub(), match(), split(),and sub(), format and print (see Pr intf Formats below). functions, mean $0 ˜ /pat/. system(cmd) The AWK operators, in order of decreasing precedence, are: execute the command cmd,and return the exit status (may not be available on non-POSIX systems). (...) grouping $ field reference I/O redirections may be used with both print and printf. ++ −− increment and decrement, prefix and postfix print "hello" > file ˆ ** exponentiation print data to file.The first time the file is written to, it is +−! unary plus, unary minus, and logical negation truncated. Subsequent commands append data. */% multiplication, division, and modulus print "hello" >> file +− addition and subtraction append data to file.The previous contents of file are not lost. space string concatenation print "hello" | cmd <> than, greater than print data down a to cmd. <= >= less than or equal, greater than or equal print "hello" |& cmd != == not equal, equal print data down a pipeline to co-process cmd. ˜!˜ regular expression match, negated match in array membership && logical AND, short circuit CLOSING REDIRECTIONS || logical OR, short circuit close(file) ?: in-line conditional expression close input or output file, pipe or co-process. =+=−=*=/=%=ˆ=**= close(command, how) assignment operators close one end of co-process pipe. Use "to" for the write end, or "from" for the read end. On success, close() returns zero for a file, or the exit status for aprocess. It returns −1 if file wasnev eropened, or if there was a system problem. ERRNO describes the error.

11 12

Copyright 05-01-02 18:25:12, FSF, Inc. (all) PRINTF FORMATS SPECIAL FILENAMES The printf statement and sprintf() function accept the When doing I/O redirection from either print or printf into a following conversion specification formats: file or via getline from a file, all three implementations of awk recognize certain special filenames internally.These %c an ASCII character filenames allowaccess to open file descriptors inherited from the %d adecimal number (the integer part) parent process (usually the ). These filenames may also be %i adecimal number (the integer part) used on the command line to name data files. The filenames are: %e afloating point number of the form [−]d.dddddde[+ −]dd "−" standard input %E like %e,but use E instead of e /dev/stdin standard input (not mawk) %f afloating point number of the form /dev/stdout standard output [−]ddd.dddddd /dev/stderr standard error output %g use %e or %f,whicheverisshorter,with The following names are specific to gawk. nonsignificant zeros suppressed %G like %g,but use %E instead of %e /dev/fd/n %o an unsigned octal integer File associated with the open n. %u an unsigned decimal integer /inet/tcp/lport/rhost/rport %s acharacter string File for TCP/IP connection on local port lport to remote host %x an unsigned hexadecimal integer rhost on remote port rport.Use a port of 0 to have the %X like %x,but use ABCDEF for 10–15 system pick a port. Usable only with the |& two-way I/O %% Aliteral %;noargument is converted operator. /inet/udp/lport/rhost/rport Optional, additional parameters may lie between the % and the Similar,but use UDP/IP instead of TCP/IP. control letter: /inet/raw/lport/rhost/rport count$ use the count’thargument at this point in the Reserved for future use. formatting (a positional specifier). Use in Other special filenames provide access to information about the translated versions of format strings, not in the running gawk process. Reading from these files returns a single original text of an AWK program. record. The filenames and what theyreturn are: − left-justify the expression within its field. space for numeric conversions, prefix positive values /dev/pid process ID of current process with a space and negative values with a minus /dev/ppid parent process ID of current process sign. /dev/pgrpid ID of current process + used before the width modifier means to always /dev/user asingle newline-terminated record. supply a sign for numeric conversions, evenif The fields are separated with spaces. the data to be formatted is positive.The + $1 is the return value of getuid(2), overrides the space modifier. $2 is the return value of geteuid(2), # use an ‘‘alternate form’’for some control letters. $3 is the return value of getgid(2) , and %o supply a leading zero. $4 is the return value of getegid(2). %x, %X supply a leading 0x or 0X for a nonzero result. Anyadditional fields are the group IDs %e, %E, %f the result always has a decimal point. returned by getgroups(2). Multiple groups %g, %G trailing zeros are not removed. may not be supported on all systems. 0 aleading zero acts as a flag, indicating output These filenames are nowobsolete. Use the PROCINFO array to should be padded with zeros instead of spaces. obtain the information theyprovide. This applies eventonon-numeric output formats. Only has an effect when the field width is wider than the value to be printed. NUMERIC FUNCTIONS width pad the field to this width. The field is normally atan2(y, x) the arctangent of y/x in radians. padded with spaces. If the 0 flag has been used, cos(expr) the cosine of expr,which is in radians. pad with zeros. exp(expr) the exponential function (e ˆ x). .prec precision. The meaning of the prec varies by int(expr) truncates to integer. control letter: log(expr) the natural logarithm function (base e). %d, %o, %i, rand() arandom number between 0 and 1. %u, %x, %X the minimum number of digits to print. sin(expr) the sine of expr,which is in radians. %e, %E, %f the number of digits to print to the right of the sqrt(expr) the square root function. decimal point. srand([expr]) uses expr as a newseed for the random %g, %G the maximum number of significant digits. number generator.Ifnoexpr,the time of day %s the maximum number of characters to print. is used. Returns previous seed for the random number generator. The dynamic width and prec capabilities of the ANSI C printf() routines are supported. A * in place of either the DYNAMIC EXTENSIONS (gawk) width or prec specifications causes their values to be taken from extension(lib, func) printf sprintf() * $ the argument list to or . Use n to use dynamically load the shared library lib and call func in it to positional specifiers with a dynamic width or precision. initialize the library.This adds newbuilt-in functions to gawk.Itreturns the value returned by func.

13 14

Copyright 05-01-02 18:25:12, FSF, Inc. (all) STRING FUNCTIONS STRING FUNCTIONS (continued) asort(s [, d]) toupper(str) sorts the source array s,replacing the indices with numeric returns a copyofthe string str,with all the lower-case values 1 through n (the number of elements in the array), and characters in str translated to their corresponding upper-case returns the number of elements. If destination d is supplied, s counterparts. Non-alphabetic characters are left unchanged. is copied to d, d is sorted, and s is unchanged. gensub(r, s, h [, t]) search the target string t for matches of the regular expression TIME FUNCTIONS (gawk) r.Ifhis a string beginning with g or G,replace all matches gawk provides the following functions for obtaining time stamps of r with s.Otherwise, h is a number indicating which match and formatting them. of r to replace. If t is not supplied, $0 is used instead. Within mktime(datespec) the replacement text s,the sequence \n,where n is a digit turns datespec into a time stamp of the same form as returned from 1 to 9, may be used to indicate just the text that by systime().The datespec is a string of the form "YYYY matched the nth parenthesized subexpression. The sequence MM DD HH MM SS[ DST]". \0 represents the entire matched text, as does the character strftime([format [, timestamp]]) &.Unlike sub() and gsub(),the modified string is formats timestamp according to the specification in format. returned as the result of the function, and the original target The timestamp should be of the same form as returned by string is not changed. systime().Iftimestamp is missing, the current time of gsub(r, s [, t]) day is used. If format is missing, a default format equivalent for each substring matching the regular expression r in the to the output of date(1) is used. string t,substitute the string s,and return the number of systime() substitutions. If t is not supplied, use $0.An&in the returns the current time of day as the number of seconds replacement text is replaced with the text that was actually since the Epoch. matched. Use \& to get a literal &.See GAWK: Effective AWKPro gramming for a fuller discussion of the rules for &'s and backslashes in the replacement text of gensub(), BIT MANIPULATION FUNCTIONS (gawk) sub() and gsub() gawk provides the following functions for doing bitwise index(s, t) operations. returns the indexofthe string t in the string s,or0iftis not and(v1, v2) present. returns the bitwise AND of the values provided by v1 and v2. length( ) [s] compl(val) $0 returns the length of the string s,orthe length of if s is returns the bitwise complement of val. not supplied. lshift(val, count) match( , , ) s r [ a] returns the value of val,shifted left by count bits. returns the position in s where the regular expression r or(v1, v2) occurs, or 0 if r is not present, and sets the values of variables returns the bitwise OR of the values provided by v1 and v2. RSTART RLENGTH and . If a is supplied, the text matching rshift(val, count) [0] all of r is placed in a .Ifthere were parenthesized returns the value of val,shifted right by count bits. [1] subexpressions, the matching texts are placed in a , xor(v1, v2) [2] a ,and so on. teturns the bitwise XOR of the values provided by v1 and v2. split(s, a [, r]) splits the string s into the array a using the regular expression r,and returns the number of fields. If r is omitted, FS is used ENVIRONMENT VARIABLES (gawk) instead. The array a is cleared first. Splitting behaves The environment variable AWKPATH specifies a search path to identically to field splitting. (See Fields,above.) use when finding source files named with the −f option. The sprintf(fmt, expr-list) default path is ".:/usr/local/share/awk".Ifafile name prints expr-list according to fmt,and returns the resulting giventothe −f option contains a ``/''character,nopath search is string. performed. strtonum(s) POSIXLY_CORRECT gawk examines s,and returns its numeric value. If s begins with a If exists then behavesexactly as if −−posix leading 0, strtonum() assumes that s is an octal number. the option had been given. If s begins with a leading 0x or 0X, strtonum() assumes that s is a hexadecimal number. HISTORICAL FEATURES (gawk) sub(r, s [, t]) 1. It is possible to call the length() built-in function not only just like gsub(),but only the first matching substring is with no argument, but evenwithout parentheses. This feature is replaced. marked as ``deprecated''inthe POSIX standard, and gawk issues substr(s, i [, n]) awarning about its use if −−lint is specified on the command returns the at most n-character substring of s starting at i.If line. nis omitted, the rest of s is used. tolower(str) 2. The continue and break statements may be used outside returns a copyofthe string str,with all the upper-case the body of a while, for,ordo loop. Historical AWK characters in str translated to their corresponding lower-case implementations have treated such usage as equivalent to the counterparts. Non-alphabetic characters are left unchanged. next statement. gawk supports this usage if −−traditional is specified. 15 16

Copyright 05-01-02 18:25:12, FSF, Inc. (all) USER-DEFINED FUNCTIONS INTERNATIONALIZATION (gawk) Functions in AWK are defined as follows: gawk provides the following functions for runtime message translation. function name(parameter list) { bindtextdomain(directory [, domain]) statements specifies the directory where gawk looks for the .mo files, } in case theywill not or cannot be placed in the ‘‘standard’’ locations (e.g., during testing.) It returns the directory where Functions are executed when theyare called from within domain is ‘‘bound.’’ expressions in either patterns or actions. Actual parameters supplied in the function call instantiate the formal parameters The default domain is the value of TEXTDOMAIN.When declared in the function. Arrays are passed by reference, other directory is the null string (""), bindtextdomain() variables are passed by value. returns the current binding for the given domain. dcgettext(string [, domain [, category]]) Local variables are declared as extra parameters in the parameter returns the translation of string in text domain domain for list. The convention is to separate local variables from real locale category category.The default value for domain is the parameters by extra spaces in the parameter list. For example: current value of TEXTDOMAIN.The default value for #aand b are local category is "LC_MESSAGES". function f(p, q, a, b) If you supply a value for category,itmust be a string equal to { one of the known locale categories. You must also supply a ..... text domain. Use TEXTDOMAIN to use the current domain. } dcngettext(string1 , string2 , number [, domain [, /abc/ { ... ; f(1, 2) ; ... } category]]) The left parenthesis in a function call is required to immediately returns the plural form used for number of the translation of followthe function name without anyintervening white space. string1 and string2 in text domain domain for locale category This is to avoid a syntactic ambiguity with the concatenation category.The default value for domain is the current value operator.This restriction does not apply to the built-in functions. of TEXTDOMAIN.The default value for category is "LC_MESSAGES". Functions may call each other and may be recursive.Function parameters used as local variables are initialized to the null string If you supply a value for category,itmust be a string equal to and the number zero upon function invocation. one of the known locale categories. You must also supply a text domain. Use TEXTDOMAIN to use the current domain. Use return to return a value from a function. The return value is undefined if no value is provided, or if the function returns by ‘‘falling off’’the end. FTP/HTTP INFORMATION func function The word may be used in place of . Note: This Host: ftp.gnu.org usage is deprecated. File: /gnu/gawk/gawk-3.1.1.tar.gz GNU awk (gawk). There may be a later version. LOCALIZATION (gawk) http://cm.bell-labs.com/who/bwk/awk.tar.gz There are several steps involved in producing and running a Bell Labs awk.This version requires an ANSI C compiler; localizable awk program. GCC (the GNU C compiler) works well. 1. Add a BEGIN action to assign a value to the TEXTDOMAIN Host: ftp.whidbey.net variable to set the text domain for your program. File: /pub/brennan/mawk1.3.3.tar.gz BEGIN { TEXTDOMAIN = "myprog" } Michael Brennan’s mawk.There may be a newer version. This allows gawk to find the .mo file associated with your program. Without this step, gawk uses the messages text domain, which probably won’twork. COPYING PERMISSIONS Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002 Free 2. Mark all strings that should be translated with leading Software Foundation, Inc. underscores. Permission is granted to makeand distribute verbatim copies of bindtextdomain() dcgettext() 3. Use the , ,and/or this reference card provided the copyright notice and this dcngettext() functions in your program, as appropriate. permission notice are preserved on all copies. 4. Run Permission is granted to copyand distribute modified versions of gawk −−gen−po −f myprog.awk > myprog.po this reference card under the conditions for verbatim copying, provided that the entire resulting derivedwork is distributed under .po to generate a file for your program. the terms of a permission notice identical to this one. 5. Provide appropriate translations, and build and install a Permission is granted to copyand distribute translations of this .mo corresponding file. reference card into another language, under the above conditions The internationalization features are described in full detail in for modified versions, except that this permission notice may be GAWK: Effective AWK Programming. stated in a translation approvedbythe Foundation.

17 18

Copyright 05-01-02 18:25:12, FSF, Inc. (all) Copyright 05-01-02 18:25:12, FSF, Inc. (all)