Textual Data Processing (Perl)
Total Page:16
File Type:pdf, Size:1020Kb
• CSCI-GA.3033.003 • 6/7/2012 Textual Data Processing (Perl) Visibility in VBA Project P1 Project P2 CSCI-GA.3033.003 Module M1A Module M1B Scripting Languages Sub S 6/7/2012 S is visible from M1B M1B is visible from P2 Textual data processing (Perl) if S is not private if P2 has a reference to P1 and M1B is not private © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 1 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 2 Perl Outline About Perl • Perl Basics • Practical Extraction and Reporting Language • Regular Expressions – Regular expressions – String interpolation • At 7:30pm: Quiz 1 – Associative arrays • TIMTOWDI – There is more than one way to do it – Make easy jobs easy – … without making hard jobs impossible – Pathologically Eclectic Rubbish Lister © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 3 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 4 Concepts Perl Orthogonality Related Languages Definition of Language design Violation of orthogonal principle orthogonality • Predecessors: C, sed, awk, sh At right angles Uniform rules for VBA object • Successors: (unrelated) feature interaction assignments – PHP (“PHP 1” = collection of Perl scripts) Not redundant Few, but general, VBA positional – Python, JavaScript (different languages, inspired features vs. named args by Perl’s strengths + weaknesses) Perl is diagonal rather than orthogonal: • Perl 5 (current version, since 1994) “If I walk from one corner of the park to another, • Perl 6 I don’t walk due east and then due north. I go – Larry Wall has been talking about it since 2001 northeast.” [Larry Wall] – Release unclear; not backward compatible ⇒ shortcut features even when not orthogonal © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 5 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 6 • © Martin Hirzel • 1 • CSCI-GA.3033.003 • 6/7/2012 Textual Data Processing (Perl) Perl Perl How to Write + Run Code Lexical Peculiarities • perl [-w] -e 'perl code' • “term” = token – “-w” flag produces warnings • Single-line comments: # • perl [-w] script.pl • Semicolon required after statements unless • script.pl {last; in; block} – Write the file in Vi or Emacs or … • Quotes around certain strings (bare words) – chmod u+x script.pl Makes script executable optional in certain cases (e.g., as hash key) – #!/usr/bin/perl -w • v-string: v13.10 = "\x{13}\x{10}" In first line of script specifies interpreter • Interpolation; pick-your-own-quotes; • perl [-w] -d -e 42 Heredocs; POD (plain old documentation) – Edit-eval-print loop (debug the script “42”) © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 7 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 8 Perl Perl Types Sigils, a.k.a. “Funny Characters” • Symbol that must appear in front of (composite) scalar file handle variable, showing its type – $=scalar, @=array, %=hash, &=function, *=typeglob numeric string void reference array hash – E.g., $a[0] is element 0 of array @a • Unlike shell, Perl requires sigil also on function (other int double UNIVERSAL left-hand side of assignment reference references) • ${id} is the same as $id (blessed references) • Function sigil & not required for call © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 9 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 10 Perl Perl Variable Declarations Interpolation Read if Implicit print $a + 1; undef • Expansion of values embedded in string $b = 5; non-existent • Single-quoted string literal 'abcde' Local, my $c; – Only interpolate \' and \\ lexical scope my ($d,$e)=(3,4); • Double-quoted string literal "abcde" sub f{ Global Hides locals; – More escape sequences, e.g., \n our $g; unlimited print $g++ – Variables only, starting with @ or $ used locally lifetime } – Use curleys to delimit: "time ${hours}h" sub h{print $i;} Local, Can also • Trick to interpolate arbitrary expressions sub k{ localize single dynamic scope local $i=5; h – "… @{[expr]} …" or "… @{[scalar expr]} …" } array/hash item © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 11 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 12 • © Martin Hirzel • 2 • CSCI-GA.3033.003 • 6/7/2012 Textual Data Processing (Perl) Concepts (…), "…", `…`, print, sort, … L Terms, function call, quoting, list operators (leftward) -> 2 L Dereference and member access Static vs. Dynamic Scoping ++, -- 1 N Auto-increment, auto-decrement Perl ** 2 R Exponentiation !, ~, \, +, - 1 R Negation (!, ~, -), reference (\),no-op (+) Static scoping Dynamic scoping =~, !~ 2 L Binding to regular expression pattern match Bound in closest nesting Bound in closest calling *, /, %, x 2 L Multiplicative (x is string repetition) scope in program text function at runtime +, -, . 2 L Additive (. is string concatenation) <<, >> 2 L Bitwise shift #!/usr/bin/perl -w #!/usr/bin/perl -w eval, sqrt, -f, -e, … 1 N Named unary operators, file test operators $x = 's'; $x = 's'; <, >, <=, >=, lt, gt, le, ge 2 N Relational (lt, gt, le, ge is for strings) sub g { sub g { ==, !=, <=>, eq, ne, cmp 2 N Equality (eq, ne, cmp is for strings) (<=>, cmp yield -1/0/1) my $x = 'd'; local $x = 'd'; &, |, ^ 2 L Bitwise (not all same precedence) return h() return h() && 2 L Logical and (short-circuit) } } ||, // 2 L Logical or (||), Defined-or (//) (short-circuit) sub h { sub h { Operators .., ... 2 N Range (in list context) or bistable (in scalar context) return $x return $x ?: 3 R Ternary conditional =, +=, -=, *=, … 2 R Assignment; return l-value of target } } ,, => 2 L List (in list context) or sequencing (in scalar context) print g(), "\n"; #s print g(), "\n"; #d print, sort, … N List operators (rightward) print $x, "\n"; #s print $x, "\n"; #s not, and, or, xor 2 R Logical (short-circuit; not all same precedence) © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 13 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 14 Perl Perl Operators: List vs. Named Unary Input and Output • Different precedence rules • Output • List operator (most user-defined functions) – print "Hello, world!"; – High leftward, low rightward precedence – print STDERR "boo!"; – @a = (1,5,sort 9,2); print @a; #1529 – printf "sqrt(%.2f)=%.2f\n", 2, sqrt(2); • Named unary operator • Input – Lower than arithmetic, higher than comparison – $lineFromStdIn = <>; – @a = (1,5,sqrt 9,2); print @a; #1532 – open MYFILE, '<recipe' or die "$!"; • Call either one with parentheses – $lineFromMyFile = <MYFILE>; – Highest precedence – @allLines = <MYFILE>; – @a = (1,5,sort(3+6),2); print @a; #1592 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 15 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 16 Perl Perl Arrays Hashes • Resizable • Associative arrays, using hash tables • Literals: list @a=(1,3,5), range @b=2..4 • Literals: none; use list literals instead • Indexing: e.g. $a[1] – %h = (lb => 1, oz => 16, g => 453); – Zero-based; negative index counts from end • Indexing: string, bare-word quotes optional – $#a returns last index of @a, in this case, 2 – $h{"oz"}, $h{oz} – Write to non-existent index auto-vivifies – Write to non-existing index auto-vivifies • Free: undef @a, truncate: $#a=1 • Free: undef %h; delete: undef $h{oz}; • Array slice: using multiple indices, e.g., @a • Hash slice: returns array, e.g., [0,2] or @a[1..2] @a = @h{lb, g}; returns (1, 453) • Using array in scalar context: returns length • Using hash in scalar context: returns string – scalar(@a); # 3 = size – scalar(%h); # ”2/8"=buckets/capacity © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 17 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 18 • © Martin Hirzel • 3 • CSCI-GA.3033.003 • 6/7/2012 Textual Data Processing (Perl) Perl Perl Array and Hash References Data Structures : hash • References are scalars, only scalars can be $recipe = [ $recipe { qty = 2 stored in other arrays/hashes qty => 2, unit => "cup", unit = "cup" • Literals: “anonymous array/hash composers” ing => "flour", ing = "flour" – $ar=[1,3,5];, $hr={x=>2,y=>0}; }, { : hash • Indexing: e.g., $ar->[1]; $hr->{y}; qty => 4, qty = 4 – Dereference operator -> optional in nested chain, unit => "oz", : array ing => "sugar", unit = "oz" e.g., $rarh->[123]{abc} }, ing = "sugar" – Write to non-existent index auto-vivifies all indices { on the way qty => 3, : hash unit => "tbsp", qty = 3 • Using reference in string context: returns ing => "milk", string describing the type and address }, unit = "tbsp" ]; – print $ar; # "ARRAY(0x18bef54)" ing = "milk" © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 19 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 20 Perl Perl Control Statements What is Truth if(expr){…} [elsif(expr){…}] … [else{…}] [label:] while(expr){…} [continue{…}] • Strings "" and "0" are false, Proper [label:] until(expr){…} [continue{…}] everything else is true statements [label:] for(expr; expr; expr) {…} (man • File read <…> returns "" at end-of-file perlsyn) [label:] foreach [var] (list) {…} [continue{…}] [label:] for [var] (list) {…} [continue{…}] • Number 0 converted to string "0" [label ] [ ] : {…} continue{…} • Array in scalar context 0 iff empty Modifiers stmt (if | unless | while | until | foreach) expr Operators do | eval | sub • Hash in scalar context "0" iff empty (man continue | goto | last | next | redo | return • Reference iff perlfunc) die | dump | exit "" undef © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 21 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 22 Concepts Perl Blocks as Function Arguments Writing Subroutines • do{…}, eval{…}, sub{…} look like control • Declaration statements, but aren’t – sub [id] [proto] [attrs] [{…}] – They are functions with block argument – No id: anonymous; no proto: list operator; no – Other examples: grep, map, sort block: declaration without definition – You can write such functions yourself – To return