Perl Regular Expressions Tip Sheet

Functions and Call Routines

regex-id = prxparse(perl-regex)
    Compile the Perl regular expression perl-regex and return a regex-id
    to be used by other PRX functions.

pos = prxmatch(regex-id | perl-regex, source)
    Search in source and return the position of the match, or zero if no
    match is found.

new-string = prxchange(regex-id | perl-regex, times, old-string)
    Search and replace times number of times in old-string and return the
    modified string in new-string.

call prxchange(regex-id, times, old-string, new-string, res-length,
               trunc-value, num-of-changes)
    Same as the prxchange function, but the result is placed in new-string
    and its length in res-length. If the result is too long to fit into
    new-string, trunc-value is set to 1. The number of changes is placed
    in num-of-changes.

text = prxposn(regex-id, n, source)
    After a call to prxmatch or prxchange, prxposn returns the text of
    capture buffer n.

call prxposn(regex-id, n, pos, len)
    After a call to prxmatch or prxchange, call prxposn sets pos and len
    to the position and length of capture buffer n.

call prxnext(regex-id, start, stop, source, pos, len)
    Search in source between positions start and stop. Set pos and len to
    the position and length of the match. Also set start to pos+len+1 so
    another search can easily begin where this one left off.

call prxdebug(on-off)
    Pass 1 to enable debug output to the SAS Log.
    Pass 0 to disable debug output to the SAS Log.

call prxfree(regex-id)
    Free memory for a regex-id returned by prxparse.

Basic Syntax

Character   Behavior
/…/         Starting and ending regex delimiters
|           Alternation
()          Grouping

Wildcards/Character Class Shorthands

Character   Behavior
.           Match any one character
\w          Match a word character (alphanumeric plus "_")
\W          Match a non-word character
\s          Match a whitespace character
\S          Match a non-whitespace character
\d          Match a digit character
\D          Match a non-digit character

Character Classes

Character   Behavior
[…]         Match a character in the brackets
[^…]        Match a character not in the brackets
[a-z]       Match a character in the range a to z

Position Matching

Character   Behavior
^           Match beginning of line
$           Match end of line
\b          Match word boundary
\B          Match non-word boundary

Repetition Factors (greedy, match as many times as possible)

Character   Behavior
*           Match 0 or more times
+           Match 1 or more times
?           Match 1 or 0 times
{n}         Match exactly n times
{n,}        Match at least n times
{n,m}       Match at least n but not more than m times

Advanced Syntax

Character        Behavior
non-meta         Match the character itself
character
{}[]()^$.|*+?\   Metacharacters; to match these characters literally,
                 override (escape) them with \
\                Override (escape) the next metacharacter
\n               Match capture buffer n
(?:…)            Non-capturing group

Lazy Repetition Factors (match minimum number of times possible)

Character   Behavior
*?          Match 0 or more times
+?          Match 1 or more times
??          Match 0 or 1 time
{n}?        Match exactly n times
{n,}?       Match at least n times
{n,m}?      Match at least n but not more than m times

Look-Ahead and Look-Behind

Character   Behavior
(?=…)       Zero-width positive look-ahead assertion.
            E.g. regex1(?=regex2): a match is found if both regex1 and
            regex2 match. regex2 is not included in the final match.
(?!…)       Zero-width negative look-ahead assertion.
            E.g. regex1(?!regex2): a match is found if regex1 matches
            and regex2 does not match. regex2 is not included in the
            final match.
(?<=…)      Zero-width positive look-behind assertion.
            E.g. (?<=regex1)regex2: a match is found if both regex1 and
            regex2 match. regex1 is not included in the final match.
(?<!…)      Zero-width negative look-behind assertion.

Basic Example

    data _null_;
      pos=prxmatch('/world/', 'Hello world!');
      put pos=;
      txt=prxchange('s/world/planet/', -1, 'Hello world!');
      put txt=;
    run;

Output:
    pos=7
    txt=Hello planet!

Search and Replace #1

    data _null_;
      input;
      _infile_ = prxchange('s/</&lt;/', -1, _infile_);
      put _infile_;
      datalines;
    x + y < 15
    x < 10 < y
    y < 11
    ;

Output:
    x + y &lt; 15
    x &lt; 10 &lt; y
    y &lt; 11

Search and Replace #2

    data reversed_names;
      input name & $32.;
      datalines;
    Jones, Fred
    Kavich, Kate
    Turley, Ron
    Dulix, Yolanda
    ;

    data names;
      set reversed_names;
      name = prxchange('s/(\w+), (\w+)/$2 $1/', -1, name);
    run;

    proc sql; /* Same as prior data step */
      create table names as
      select prxchange('s/(\w+), (\w+)/$2 $1/', -1, name) as name
      from reversed_names;
    quit;

Output:
    Obs  name
    1    Fred Jones
    2    Kate Kavich
    3    Ron Turley
    4    Yolanda Dulix

Search and Extract

    data _null_;
      length first last phone $ 16;
      retain re;
      if _N_ = 1 then do;
        re = prxparse("/\(([2-9]\d\d)\) ?" ||
                      "[2-9]\d\d-\d\d\d\d/");
      end;
      input first last phone & $16.;
      if prxmatch(re, phone) then do;
        area_code = prxposn(re, 1, phone);
        if area_code ^in ("828" "336" "704" "910" "919" "252") then
          putlog "NOTE: Not in NC: " first last phone;
      end;
      datalines;
    Thomas Archer (919)319-1677
    Lucy Barr (800)899-2164
    Tom Joad (508) 852-2146
    Laurie Gil (252)352-7583
    ;

Output:
    NOTE: Not in NC: Lucy Barr (800)899-2164
    NOTE: Not in NC: Tom Joad (508) 852-2146

Data Validation

    data phone_numbers;
      length first last phone $ 16;
      input first last phone & $16.;
      datalines;
    Thomas Archer (919)319-1677
    Lucy Barr 800-899-2164
    Tom Joad (508) 852-2146
    Laurie Gil (252)152-7583
    ;

    data invalid;
      set phone_numbers;
      where not prxmatch("/\([2-9]\d\d\) ?" ||
                         "[2-9]\d\d-\d\d\d\d/", phone);
    run;

    proc sql; /* Same as prior data step */
      create table invalid as
      select * from phone_numbers
      where not prxmatch("/\([2-9]\d\d\) ?" ||
                         "[2-9]\d\d-\d\d\d\d/", phone);
    quit;

Output:
    Obs  first   last  phone
    1    Lucy    Barr  800-899-2164
    2    Laurie  Gil   (252)152-7583

For complete information refer to the Base SAS documentation at
support.sas.com/base.
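The tip sheet describes call prxnext but gives no example of it. The following is a minimal sketch of how it might be used to walk through every match in a string, freeing the compiled pattern with call prxfree afterwards. The sample text, the pattern /\d+/, and the variable names (text, found) are illustrative assumptions and are not part of the original tip sheet.

```sas
data _null_;
  /* Hypothetical sample string; any character value would do. */
  text = 'Order 12 shipped 345 items in 6 boxes';

  /* Compile the pattern once: one or more digits. */
  re = prxparse('/\d+/');

  /* Search the whole string. */
  start = 1;
  stop = length(text);

  /* The first call finds the first match and advances start. */
  call prxnext(re, start, stop, text, pos, len);
  do while (pos > 0);
    /* Extract the matched text using the returned position and length. */
    found = substr(text, pos, len);
    put found= pos= len=;
    /* Continue from the updated start position. */
    call prxnext(re, start, stop, text, pos, len);
  end;

  /* Release the memory held by the compiled pattern. */
  call prxfree(re);
run;
```

The loop ends when call prxnext sets pos to zero, which indicates that no further match lies between start and stop.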
