Textual Data Processing (Perl)

Textual Data Processing (Perl)

• CSCI-GA.3033.003 • 6/7/2012 Textual Data Processing (Perl) Visibility in VBA Project P1 Project P2 CSCI-GA.3033.003 Module M1A Module M1B Scripting Languages Sub S 6/7/2012 S is visible from M1B M1B is visible from P2 Textual data processing (Perl) if S is not private if P2 has a reference to P1 and M1B is not private © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 1 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 2 Perl Outline About Perl • Perl Basics • Practical Extraction and Reporting Language • Regular Expressions – Regular expressions – String interpolation • At 7:30pm: Quiz 1 – Associative arrays • TIMTOWDI – There is more than one way to do it – Make easy jobs easy – … without making hard jobs impossible – Pathologically Eclectic Rubbish Lister © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 3 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 4 Concepts Perl Orthogonality Related Languages Definition of Language design Violation of orthogonal principle orthogonality • Predecessors: C, sed, awk, sh At right angles Uniform rules for VBA object • Successors: (unrelated) feature interaction assignments – PHP (“PHP 1” = collection of Perl scripts) Not redundant Few, but general, VBA positional – Python, JavaScript (different languages, inspired features vs. named args by Perl’s strengths + weaknesses) Perl is diagonal rather than orthogonal: • Perl 5 (current version, since 1994) “If I walk from one corner of the park to another, • Perl 6 I don’t walk due east and then due north. I go – Larry Wall has been talking about it since 2001 northeast.” [Larry Wall] – Release unclear; not backward compatible ⇒ shortcut features even when not orthogonal © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 5 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 6 • © Martin Hirzel • 1 • CSCI-GA.3033.003 • 6/7/2012 Textual Data Processing (Perl) Perl Perl How to Write + Run Code Lexical Peculiarities • perl [-w] -e 'perl code' • “term” = token – “-w” flag produces warnings • Single-line comments: # • perl [-w] script.pl • Semicolon required after statements unless • script.pl {last; in; block} – Write the file in Vi or Emacs or … • Quotes around certain strings (bare words) – chmod u+x script.pl Makes script executable optional in certain cases (e.g., as hash key) – #!/usr/bin/perl -w • v-string: v13.10 = "\x{13}\x{10}" In first line of script specifies interpreter • Interpolation; pick-your-own-quotes; • perl [-w] -d -e 42 Heredocs; POD (plain old documentation) – Edit-eval-print loop (debug the script “42”) © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 7 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 8 Perl Perl Types Sigils, a.k.a. “Funny Characters” • Symbol that must appear in front of (composite) scalar file handle variable, showing its type – $=scalar, @=array, %=hash, &=function, *=typeglob numeric string void reference array hash – E.g., $a[0] is element 0 of array @a • Unlike shell, Perl requires sigil also on function (other int double UNIVERSAL left-hand side of assignment reference references) • ${id} is the same as $id (blessed references) • Function sigil & not required for call © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 9 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 10 Perl Perl Variable Declarations Interpolation Read if Implicit print $a + 1; undef • Expansion of values embedded in string $b = 5; non-existent • Single-quoted string literal 'abcde' Local, my $c; – Only interpolate \' and \\ lexical scope my ($d,$e)=(3,4); • Double-quoted string literal "abcde" sub f{ Global Hides locals; – More escape sequences, e.g., \n our $g; unlimited print $g++ – Variables only, starting with @ or $ used locally lifetime } – Use curleys to delimit: "time ${hours}h" sub h{print $i;} Local, Can also • Trick to interpolate arbitrary expressions sub k{ localize single dynamic scope local $i=5; h – "… @{[expr]} …" or "… @{[scalar expr]} …" } array/hash item © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 11 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 12 • © Martin Hirzel • 2 • CSCI-GA.3033.003 • 6/7/2012 Textual Data Processing (Perl) Concepts (…), "…", `…`, print, sort, … L Terms, function call, quoting, list operators (leftward) -> 2 L Dereference and member access Static vs. Dynamic Scoping ++, -- 1 N Auto-increment, auto-decrement Perl ** 2 R Exponentiation !, ~, \, +, - 1 R Negation (!, ~, -), reference (\),no-op (+) Static scoping Dynamic scoping =~, !~ 2 L Binding to regular expression pattern match Bound in closest nesting Bound in closest calling *, /, %, x 2 L Multiplicative (x is string repetition) scope in program text function at runtime +, -, . 2 L Additive (. is string concatenation) <<, >> 2 L Bitwise shift #!/usr/bin/perl -w #!/usr/bin/perl -w eval, sqrt, -f, -e, … 1 N Named unary operators, file test operators $x = 's'; $x = 's'; <, >, <=, >=, lt, gt, le, ge 2 N Relational (lt, gt, le, ge is for strings) sub g { sub g { ==, !=, <=>, eq, ne, cmp 2 N Equality (eq, ne, cmp is for strings) (<=>, cmp yield -1/0/1) my $x = 'd'; local $x = 'd'; &, |, ^ 2 L Bitwise (not all same precedence) return h() return h() && 2 L Logical and (short-circuit) } } ||, // 2 L Logical or (||), Defined-or (//) (short-circuit) sub h { sub h { Operators .., ... 2 N Range (in list context) or bistable (in scalar context) return $x return $x ?: 3 R Ternary conditional =, +=, -=, *=, … 2 R Assignment; return l-value of target } } ,, => 2 L List (in list context) or sequencing (in scalar context) print g(), "\n"; #s print g(), "\n"; #d print, sort, … N List operators (rightward) print $x, "\n"; #s print $x, "\n"; #s not, and, or, xor 2 R Logical (short-circuit; not all same precedence) © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 13 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 14 Perl Perl Operators: List vs. Named Unary Input and Output • Different precedence rules • Output • List operator (most user-defined functions) – print "Hello, world!"; – High leftward, low rightward precedence – print STDERR "boo!"; – @a = (1,5,sort 9,2); print @a; #1529 – printf "sqrt(%.2f)=%.2f\n", 2, sqrt(2); • Named unary operator • Input – Lower than arithmetic, higher than comparison – $lineFromStdIn = <>; – @a = (1,5,sqrt 9,2); print @a; #1532 – open MYFILE, '<recipe' or die "$!"; • Call either one with parentheses – $lineFromMyFile = <MYFILE>; – Highest precedence – @allLines = <MYFILE>; – @a = (1,5,sort(3+6),2); print @a; #1592 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 15 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 16 Perl Perl Arrays Hashes • Resizable • Associative arrays, using hash tables • Literals: list @a=(1,3,5), range @b=2..4 • Literals: none; use list literals instead • Indexing: e.g. $a[1] – %h = (lb => 1, oz => 16, g => 453); – Zero-based; negative index counts from end • Indexing: string, bare-word quotes optional – $#a returns last index of @a, in this case, 2 – $h{"oz"}, $h{oz} – Write to non-existent index auto-vivifies – Write to non-existing index auto-vivifies • Free: undef @a, truncate: $#a=1 • Free: undef %h; delete: undef $h{oz}; • Array slice: using multiple indices, e.g., @a • Hash slice: returns array, e.g., [0,2] or @a[1..2] @a = @h{lb, g}; returns (1, 453) • Using array in scalar context: returns length • Using hash in scalar context: returns string – scalar(@a); # 3 = size – scalar(%h); # ”2/8"=buckets/capacity © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 17 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 18 • © Martin Hirzel • 3 • CSCI-GA.3033.003 • 6/7/2012 Textual Data Processing (Perl) Perl Perl Array and Hash References Data Structures : hash • References are scalars, only scalars can be $recipe = [ $recipe { qty = 2 stored in other arrays/hashes qty => 2, unit => "cup", unit = "cup" • Literals: “anonymous array/hash composers” ing => "flour", ing = "flour" – $ar=[1,3,5];, $hr={x=>2,y=>0}; }, { : hash • Indexing: e.g., $ar->[1]; $hr->{y}; qty => 4, qty = 4 – Dereference operator -> optional in nested chain, unit => "oz", : array ing => "sugar", unit = "oz" e.g., $rarh->[123]{abc} }, ing = "sugar" – Write to non-existent index auto-vivifies all indices { on the way qty => 3, : hash unit => "tbsp", qty = 3 • Using reference in string context: returns ing => "milk", string describing the type and address }, unit = "tbsp" ]; – print $ar; # "ARRAY(0x18bef54)" ing = "milk" © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 19 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 20 Perl Perl Control Statements What is Truth if(expr){…} [elsif(expr){…}] … [else{…}] [label:] while(expr){…} [continue{…}] • Strings "" and "0" are false, Proper [label:] until(expr){…} [continue{…}] everything else is true statements [label:] for(expr; expr; expr) {…} (man • File read <…> returns "" at end-of-file perlsyn) [label:] foreach [var] (list) {…} [continue{…}] [label:] for [var] (list) {…} [continue{…}] • Number 0 converted to string "0" [label ] [ ] : {…} continue{…} • Array in scalar context 0 iff empty Modifiers stmt (if | unless | while | until | foreach) expr Operators do | eval | sub • Hash in scalar context "0" iff empty (man continue | goto | last | next | redo | return • Reference iff perlfunc) die | dump | exit "" undef © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 21 © Martin Hirzel CSCI-GA.3033.003 NYU 6/7/12 22 Concepts Perl Blocks as Function Arguments Writing Subroutines • do{…}, eval{…}, sub{…} look like control • Declaration statements, but aren’t – sub [id] [proto] [attrs] [{…}] – They are functions with block argument – No id: anonymous; no proto: list operator; no – Other examples: grep, map, sort block: declaration without definition – You can write such functions yourself – To return

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    6 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us