Quick Reference Card
version 0.02 – editor: John Bokma – freelance programmer
DRAFT VERSION, check: http://johnbokma.com/perl/

Backslashed Character Escapes 61

\n         Newline (usually LF)
\r         Carriage return (usually CR)
\t         Horizontal tab (HT)
\f         Form feed (FF)
\b         Backspace (BS)
\a         Alert (BEL)
\e         Escape (ESC)
\0         Null character (NUL)
\033       ESC in octal
\x7f       DEL in hexadecimal
\cC        Control-C
\x{263a}   Unicode, ☺ (smiley)
\N{NAME}   Named character

Translation Escapes 61

\u   Force next character to uppercase (“titlecase” in Unicode).
\l   Force next character to lowercase.
\U   Force all following characters to uppercase.
\L   Force all following characters to lowercase.
\Q   Backslash all following non-"word" characters (quotemeta).
\E   End \U, \L, or \Q.

Quote Constructs 63

Customary  Generic  Meaning                Interpolates
''         q//      Literal string         No
""         qq//     Literal string         Yes
``         qx//     Command execution      Yes
()         qw//     Word list              No
//         m//      Pattern match          Yes
s///       s///     Pattern substitution   Yes
y///       tr///    Character translation  No
""         qr//     Regular expression     Yes

Note: no interpolation is done if you use single quotes for delimiters.

Operator Precedence 87

Associativity  Arity  Precedence Class
None           0      Terms, and list operators (leftward)
Left           2      ->
None           1      ++ --
Right          2      **
Right          1      ! ~ \ and unary + and unary -
Left           2      =~ !~
Left           2      * / % x
Left           2      + - .
Left           2      << >>
Right          0,1    Named unary operators
None           2      < > <= >= lt gt le ge
None           2      == != <=> eq ne cmp
Left           2      &
Left           2      | ^
Left           2      &&
Left           2      ||
None           2      .. ...
Right          3      ?:
Right          2      = += -= *= and so on
Left           2      , =>
Right          0+     List operators (rightward)
Right          1      not
Left           2      and
Left           2      or xor

File Test Operators 98

-r   File is readable by effective UID/GID.
-w   File is writable by effective UID/GID.
-x   File is executable by effective UID/GID.
-o   File is owned by effective UID/GID.
-R   File is readable by real UID/GID.
-W   File is writable by real UID/GID.
-X   File is executable by real UID/GID.
-O   File is owned by real UID/GID.
-e   File exists.
-z   File has zero size.
-s   File has nonzero size (returns size).
-f   File is a plain file.
-d   File is a directory.
-l   File is a symbolic link.
-p   File is a named pipe (FIFO).
-S   File is a socket.
-b   File is a block special file.
-c   File is a character special file.
-t   Filehandle is open to a tty.
-u   File has setuid bit set.
-g   File has setgid bit set.
-k   File has sticky bit set.
-T   File is a text file.
-B   File is a binary file (opposite of -T).
-M   Age of file (at startup) in (fractional) days since modification.
-A   Age of file (at startup) in (fractional) days since last access.
-C   Age of file (at startup) in (fractional) days since inode change.

Pattern Modifiers 147

/i    Ignore alphabetic case distinctions (case insensitive).
/s    Let . match newline and ignore deprecated $* variable.
/m    Let ^ and $ match next to embedded \n.
/x    Ignore (most) whitespace and permit comments in pattern.
/o    Compile pattern only once.
/g    Globally find all matches.
/cg   Allow continued search after failed /g match.

Additional s/// Modifiers 153

/g   Replace globally, that is, all occurrences.
/e   Evaluate the right side as an expression.

tr/// Modifiers 156

/c   Complement SEARCHLIST.
/d   Delete found but unreplaced characters.
/s   Squash duplicate replaced characters.

General Regex Metacharacters 159

Symbol  Atomic  Meaning
\…      Varies  De-meta next nonalphanumeric character, meta next
                alphanumeric character (maybe).
…|…     No      Alternation (match one or the other).
(…)     Yes     Grouping (treat as a unit).
[…]     Yes     Character class (match one character from a set).
^       No      True at beginning of string (or after a newline, maybe).
.       Yes     Match one character (except newline, normally).
$       No      True at end of string (or before any newline, maybe).

Regex Quantifiers 159-160

Quantifier   Atomic  Meaning
*            No      Match 0 or more times (maximal).
+            No      Match 1 or more times (maximal).
?            No      Match 0 or 1 time (maximal).
{COUNT}      No      Match exactly COUNT times.
{MIN,}       No      Match at least MIN times (maximal).
{MIN,MAX}    No      Match at least MIN but not more than MAX times (maximal).
*?           No      Match 0 or more times (minimal).
+?           No      Match 1 or more times (minimal).
??           No      Match 0 or 1 time (minimal).
{MIN,}?      No      Match at least MIN times (minimal).
{MIN,MAX}?   No      Match at least MIN but not more than MAX times (minimal).

Extended Regex Sequences 160

Extension        Atomic  Meaning
(?#…)            No      Comment, discard.
(?:…)            Yes     Cluster-only parentheses, no capturing.
(?imsx-imsx)     No      Enable/disable pattern modifiers.
(?imsx-imsx:…)   Yes     Cluster-only parentheses plus modifiers.
(?=…)            No      True if lookahead assertion succeeds.
(?!…)            No      True if lookahead assertion fails.
(?<=…)           No      True if lookbehind assertion succeeds.
(?<!…)           No      True if lookbehind assertion fails.
(?>…)            Yes     Match nonbacktracking subpattern.
(?{…})           No      Execute embedded Perl code.
(??{…})          Yes     Match regex from embedded Perl code.
(?(…)…|…)        Yes     Match with if-then-else pattern.
(?(…)…)          Yes     Match with if-then pattern.

Alphanumeric Regex Metasymbols 161-162

Symbol     Atomic  Meaning
\0         Yes     Match the null character (ASCII NUL).
\NNN       Yes     Match the character given in octal, up to \377.
\n         Yes     Match nth previously captured string (n in decimal, e.g. \1).
\a         Yes     Match the alarm character (BEL).
\A         No      True at the beginning of a string.
\b         Yes     Match the backspace character (BS) (inside a character class).
\b         No      True at a word boundary.
\B         No      True when not at a word boundary.
\cX        Yes     Match the control character Ctrl-X (\cZ).
\C         Yes     Match one byte (C char) even in utf8 (dangerous).
\d         Yes     Match any digit character.
\D         Yes     Match any non-digit character.
\e         Yes     Match the escape character (ASCII ESC, not backslash).
\E         -       End case (\L, \U) or quotemeta (\Q) translation.
\f         Yes     Match the form feed character (FF).
\G         No      True at end-of-match position of prior m//g.
\l         -       Lowercase the next character only.
\L         -       Lowercase till \E.
\n         Yes     Match the newline character (usually NL, but CR on Macs).
\N{NAME}   Yes     Match the named char (\N{greek:Sigma}).
\p{PROP}   Yes     Match any character with named property.
\P{PROP}   Yes     Match any character without the named property.
\Q         -       Quote (de-meta) metacharacters till \E.
\r         Yes     Match the return character (usually CR, but NL on Macs).
\s         Yes     Match any whitespace character.
\S         Yes     Match any nonwhitespace character.
\t         Yes     Match the tab character (HT).
\u         -       Titlecase next character only.
\U         -       Uppercase (not titlecase) till \E.
\w         Yes     Match any “word” character (alphanum plus “_”).
\W         Yes     Match any nonword character.
\xHEX      Yes     Match the character given one or two hex digits.
\x{abcd}   Yes     Match the character given in hexadecimal.
\X         Yes     Match Unicode “combining character sequence” string.
\z         No      True at end of string only.
\Z         No      True at end of string or before optional newline.

Classic Character Classes 167

Symbol  Meaning                As Bytes          As utf8
\d      Digit                  [0-9]             \p{IsDigit}
\D      Nondigit               [^0-9]            \P{IsDigit}
\s      Whitespace             [ \t\n\r\f]       \p{IsSpace}
\S      Nonwhitespace          [^ \t\n\r\f]      \P{IsSpace}
\w      Word character         [a-zA-Z0-9_]      \p{IsWord}
\W      Non-(word character)   [^a-zA-Z0-9_]     \P{IsWord}

Composite Unicode Properties 168-169

Property   Equivalent
IsASCII    [\x00-\x7f]
IsAlnum    [\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}\p{IsNd}]
IsAlpha    [\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}]
IsCntrl    \p{IsC}
IsDigit    \p{IsNd}
IsGraph    [^\pC\p{IsSpace}]
IsLower    \p{IsLl}
IsPrint    \P{IsC}
IsPunct    \p{IsP}
IsSpace    [\t\n\f\r\p{IsZ}]
IsUpper    [\p{IsLu}\p{IsLt}]
IsWord     [_\p{IsLl}\p{IsLu}\p{IsLt}\p{IsLo}\p{IsNd}]
IsXDigit   [0-9a-fA-F]

Perl also provides the following composites:

Property  Meaning                              Normative
IsC       Crazy control characters and such    Yes
IsL       Letters                              Partly
IsM       Marks                                Yes
IsN       Numbers                              Yes
IsP       Punctuation                          No
IsS       Symbols                              No
IsZ       Separators (Zeparators?)             Yes

POSIX-Style Character Classes 174-175

Class    Meaning
alnum    Any alphanumeric, that is an alpha or a digit.
alpha    Any letter. (That's a lot more letters than you think, unless
         you're thinking Unicode, in which case it's still a lot.)
ascii    Any character with an ordinal value between 0 and 127.
cntrl    Any control character. Usually characters that don't produce
         output as such, but instead control the terminal somehow; for
         example, newline, form feed, and backspace.
digit    A character representing a decimal digit, such as 0 to 9.
         (Includes other characters under Unicode.) Equivalent to \d.
graph    Any alphanumeric or punctuation character.
lower    A lowercase letter.
print    Any alphanumeric or punctuation character or space.
punct    Any punctuation character.
space    Any space character. Includes tab, newline, form feed, and
         carriage return (and a lot more under Unicode). Equivalent to \s.
upper    Any uppercase (or titlecase) letter.
word     Any identifier character, either an alnum or underline.
xdigit   Any hexadecimal digit. Equivalent to [0-9a-fA-F].

You can negate the POSIX character classes by prefixing the class
name with a ^ following the [:. (This is a Perl extension.)
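
The short script below is an illustrative sketch, not part of the original card. It exercises a few of the entries listed above: maximal versus minimal quantifiers, the tr/// modifiers /s, /d and /c, the s/// modifiers /g and /e, and \Q...\E quoting. All variable names and sample strings are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Maximal (greedy) vs. minimal quantifiers on the same input.
my $text = 'aXbXc';
my ($greedy)  = $text =~ /^(.*)X/;     # .* takes as much as it can
my ($minimal) = $text =~ /^(.*?)X/;    # .*? stops at the first X

# tr/// modifiers (an empty replacement list reuses the search list).
(my $squashed    = 'aaabbbccc')     =~ tr/a-c//s;    # /s squashes repeated characters
(my $deleted     = 'hello world')   =~ tr/lo//d;     # /d deletes found, unreplaced characters
(my $digits_only = 'tel: 555-0100') =~ tr/0-9//cd;   # /c complements the search list

# s/// with /g (all occurrences) and /e (right side is an expression).
(my $doubled = 'a1b22') =~ s/(\d+)/$1 * 2/ge;

# \Q...\E de-metas the interpolated string, so the dot is literal.
my $literal      = 'a.c';
my $quoted_hit   = 'abc' =~ /^\Q$literal\E$/ ? 1 : 0;
my $unquoted_hit = 'abc' =~ /^$literal$/     ? 1 : 0;

print "greedy=$greedy minimal=$minimal\n";
print "squashed=$squashed deleted=$deleted digits_only=$digits_only\n";
print "doubled=$doubled quoted_hit=$quoted_hit unquoted_hit=$unquoted_hit\n";
```

The `(my $copy = ...) =~ tr/...//` form is used so that each sample string is modified in a copy rather than in place; this is a stylistic choice for the sketch, not something the card prescribes.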