Perl Is a Language of Getting Your Job Done » Larry Wall

Practical Extraction and Report Language

« Perl is a language of getting your job done »

« There is more than one way to do it »

Larry Wall

VI, March 2005 Page 1 Practical Extraction and Report Language http://perl.oreilly.com

"Perl is both a programming language and an application on your computer that runs those programs"

Perl history

A few dates:

1969 UNIX was born at Bell Labs.

1970 Brian Kernighan suggested the name "Unix" and the operating system we know today was born.

1972 The C programming language is born at Bell Labs (C is one of Perl's ancestors).

1973 "grep" is introduced by Ken Thompson as an external utility: Global Regular Expression Print.

1976 Steven Jobs and Steven Wozniak found Apple Computer (1 April).

1977 The awk programming language is designed by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan (awk is one of Perl's ancestors).

Perl history

1987 Perl 1.000 is unleashed upon the world

NAME
perl - Practical Extraction and Report Language

SYNOPSIS
perl [options] filename args

DESCRIPTION
Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it. (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS.) Expression syntax corresponds quite closely to C expression syntax. If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then perl may be for you. There are also translators to turn your sed and awk scripts into perl scripts. OK, enough hype.

Perl history

1994 Perl5: last major release (Currently Perl 5.8.6).

1996 Creation of the CPAN repository of modules and documentation (Comprehensive Perl Archive Network).

2005 Perl 5.8.6

Supported Operating Systems: Unix systems / Macintosh (OS 7-9 and X) / Windows / VMS

Perl Features

Perl's database integration interface (DBI) supports third-party databases including Oracle, Sybase, Postgres, MySQL and others.

Perl works with HTML, XML, and other markup languages.

Perl supports Unicode.

Perl is Y2K compliant.

Perl supports both procedural and object-oriented programming.

Perl interfaces with external C/C++ libraries through XS or SWIG.

Perl is extensible. There are over 500 third-party modules available from the Comprehensive Perl Archive Network (CPAN).

Perl history

Perl and the Web

Perl is the most popular web programming language due to its text manipulation capabilities and rapid development cycle.

Perl's CGI.pm module, part of Perl's standard distribution, makes handling HTML forms simple.

Perl can handle encrypted Web data, including e-commerce transactions.

Perl can be embedded into web servers (mod_perl) to speed up processing by as much as 2000%.

Perl's DBI package makes web-database integration easy.

Perl Hello world !

My first program (hello.pl):

#!/usr/local/bin/perl

use strict;
use warnings;

#tell the program to print "Hello world"
print "Hello world" ;

#tell the program to exit
exit ;

The first line of a Perl program is called the "interpretation" line or "shebang line". It starts with "#!", gives the path to the Perl interpreter, and tells the computer that this is a Perl program.

To find out whether you should use /usr/bin/perl or /usr/local/bin/perl, type "which perl" in your shell:

computerX: vioannid$ which perl
/usr/bin/perl

computerY: vioannid$ which perl
/usr/local/bin/perl

use strict;

A command like use strict is called a pragma. Pragmas are instructions to the Perl interpreter to do something special when it runs your program. "use strict" does two things that make it harder to write bad software: it makes you declare all your variables, and it makes it harder for Perl to mistake your intentions when you are using subroutines.
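A minimal sketch of the first effect of "use strict". The variable name $total is made up for the illustration, and the offending code is compiled through a string eval only so that the error can be caught and printed instead of stopping the script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile a small snippet at run time; $total is never declared with "my",
# so "use strict" refuses to compile it and eval returns undef.
my $ok = eval q{
    use strict;
    $total = 5;
    1;
};

# $@ holds the compile error reported by Perl.
print "strict rejected the code: $@" unless $ok;
```

In a normal script the same mistake stops the program before it runs, which is the point of the pragma.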

ALL STATEMENTS END IN A SEMICOLON ";" (similar to the use of the period "." in the English language)

use warnings;

Comments are good, but the most important tool for writing good Perl is the "warnings" pragma. Turning on warnings will make Perl yelp and complain at a huge variety of things that are almost always sources of bugs in your programs.

Perl normally takes a relaxed attitude toward things that may be problems: it assumes that you know what you're doing, even when you don't…
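A small sketch of what "use warnings" reports. The variable names are invented for the example, and the __WARN__ handler is only there to collect the message in an array rather than let it go straight to STDERR:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in @caught instead of printing them immediately.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $count;               # declared, but never given a value
my $sum = $count + 1;    # warns: use of an uninitialized value

print "sum is $sum\n";
print "warning: $_" for @caught;
```

Without the pragma the addition silently treats the undefined value as 0 and no message is shown.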

Comments

All lines starting with "#" are not taken into account in the execution of the program. Good comments are short, but instructive. They tell you things that aren't clear from reading the code.

Blank lines or spaces are also not taken into account in the execution of the program. However, they help in the reading of the code.

Print statement:

… prints !

By default, the standard output is the shell window from which the program is executed.

ALL STATEMENTS END IN A SEMICOLON ";" (similar to the use of the period "." in the English language)

The exit statement:

Tells the computer to exit the program.

Although not explicitly required in Perl, it is definitely common.

Output (do not forget to make the file executable: vioannid$ chmod a+x perl_01.pl):

vioannid$ ./perl_01.pl
Hello worldvioannid$

Perl Hello world !!

Print (perl_02.pl):

#!/usr/local/bin/perl

use strict;
use warnings;

#play with the print statement

#words separated by newlines
print "Hello\nworld\n" ;

#words separated by tabs & a final newline
print "Hello\tworld\n" ;

#usage of the period to concatenate strings
print "Hello"."world"."\n";

#tell the program to exit
exit ;

Output:

vioannid$ ./perl_02.pl
Hello
world
Hello   world
Helloworld
vioannid$

Important (line endings):
Unix & all Unix flavors: \n
Mac OS : \r
Windows: \r\n

Perl variables

Perl has 3 data types: scalars / arrays / hashes

scalars

a single string (of any size, limited only by the available memory), or a number, or a reference to something

Scalar values are always named with '$' (even when referring to a scalar that is part of an array or a hash). The '$' works semantically like the English word "the" in that it indicates a single value is expected.

my $variable_1 = "Hello world !\n"; #note the quotes
my $variable_two = 30; #note the absence of quotes
$marks[4] #the fifth element of the array "marks"

Perl variables

Perl has 3 data types: scalars / arrays / hashes

arrays (of scalars)

Normal arrays are ordered lists of scalars indexed by number (starting with 0).

Entire arrays are denoted by '@', which works much like the word "these" or "those" does in English, in that it indicates multiple values are expected.

my @numbers = ("One", "Two", "Three", "Four", "Five");
@numbers = (1..5); #same as "@numbers = (1, 2, 3, 4, 5);"
$numbers[0] = "One";
$numbers[1] = "Two";
…
my @anyarray = (6, "hello", @numbers);

index   0     1     2     3     4     …
value   One   Two   Three Four  Five

Perl variables

Perl has 3 data types: scalars / arrays / hashes

hashes (associative arrays of scalars)

Hashes are unordered collections of scalar values indexed by their associated string key. Entire hashes are denoted by '%'.

my %var = ("a","first","","3");
my %codon3 = (
    "TTT" => "Phe",
    "TTA" => "Leu",
);
print $codon3{'TTT'};

Key   Value
TTT   Phe
TCT   Ser
TGT   Cys
TAT   Tyr
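A short sketch of common hash operations, built on the codon table from this slide. The DNA string and the added "TGG" entry are assumptions made for the illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %codon3 = (
    "TTT" => "Phe", "TTA" => "Leu", "TCT" => "Ser",
    "TGT" => "Cys", "TAT" => "Tyr",
);

# Add one more key/value pair.
$codon3{"TGG"} = "Trp";

# Test whether a key is present without creating it.
my $known = exists $codon3{"AAA"} ? "yes" : "no";
print "AAA known: $known\n";

# Hashes are unordered, so sort the keys for a stable listing.
foreach my $codon (sort keys %codon3) {
    print "$codon => $codon3{$codon}\n";
}

# Translate a made-up DNA string three letters at a time.
my $dna     = "TTTTGTTAT";
my $peptide = join "-", map { $codon3{$_} } $dna =~ /(...)/g;
print "$dna => $peptide\n";
```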

Perl special variables (extract)

$_ The default input and pattern-searching space.

$& The string matched by the last successful pattern match.
$` The string preceding whatever was matched by the last successful pattern match.
$' The string following whatever was matched by the last successful pattern match.

$! If a system or library call fails, it sets this variable. This means that the value of $! is meaningful only immediately after a failure.

$/ The input record separator, newline by default.

$$ The process number of the Perl running this script.

@ARGV command-line arguments (space separation by default). Note: $ARGV[0] is the first command-line argument …
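A brief sketch showing a few of these variables in action. The sample string and the file name are invented for the illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "thioredoxin motif";    # made-up sample string
my ($before, $match, $after);

# After a successful match, $` $& and $' hold the three parts of the string.
if ($text =~ /redox/) {
    ($before, $match, $after) = ($`, $&, $');
}
print "before: ", $before, "\n";
print "match:  ", $match,  "\n";
print "after:  ", $after,  "\n";

# $_ is the default variable of foreach (and of print).
foreach ("a", "b", "c") {
    print;
    print "\n";
}

# $! explains why the last system call failed (the file name is hypothetical).
open(my $fh, "<", "no_such_file_for_this_demo.txt")
    or print "open failed: $!\n";

# $$ is the process number of this script.
print "running as process $$\n";
```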

Perl variables

Programs using variables :

1) value set in the script:

#!/usr/local/bin/perl
use strict;
use warnings;
my $name = "John Doe";
print "Hello $name !\n" ;
exit ;

2) value taken from the first command-line argument:

#!/usr/local/bin/perl
use strict;
use warnings;
my $name = $ARGV[0];
print "Hello $name !\n" ;
exit ;

3) value typed by the user:

#!/usr/local/bin/perl
use strict;
use warnings;
print "\nEnter your name (then press \"return\" when done):\t";
#get information from the terminal window
my $name = <STDIN>;
print "Hello $name !\n" ;
exit ;

Interpolation & quoting: the quotes have different meanings …

my $price = '$100';
print "the price is $price"; #this is called interpolation
…
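A minimal sketch of the difference between the two kinds of quotes, reusing the $price example (the other variable names are made up for the illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $price = '$100';    # single quotes: the "$" is kept literally

# Double quotes interpolate: $price is replaced by its value.
my $double = "the price is $price";

# Single quotes do not interpolate: the text is kept as typed.
my $single = 'the price is $price';

print "$double\n";    # the price is $100
print "$single\n";    # the price is $price
```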

Perl variables

Program using variables :

#!/usr/local/bin/perl

use strict; use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien" , "RochPhilippe", "Francisco", "Sandra Yukie", "Simona", "Christophe", "Dominique", "Michaela", "Lionel", "Gabriele", "Michael", "Charlotte", "Subhash", "Adam", "Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta", "Viviane", "Stanislav", "Kyrill", "Petr", "Sebastien");

print "Hello\n @names !\n" ;

exit ;

Some array functions:

sort            sorts all the elements of an array.
reverse         reverses the order of all the elements of an array.
shift, unshift  removes the first element / places an element at the first position of the array.
pop, push       removes the last element / places an element at the last position of the array.
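A short sketch of these functions on a four-name list taken from the array above; the comments give the state of the array after each step:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien");

my @sorted   = sort @names;       # Claire Fabien Pedro Yemima
my @reversed = reverse @names;    # Fabien Yemima Claire Pedro

my $first = shift @names;         # "Pedro";  @names: Claire Yemima Fabien
unshift @names, "Uta";            #           @names: Uta Claire Yemima Fabien
my $last  = pop @names;           # "Fabien"; @names: Uta Claire Yemima
push @names, "Joel";              #           @names: Uta Claire Yemima Joel

print "sorted:   @sorted\n";
print "reversed: @reversed\n";
print "final:    @names\n";
```

Note that sort and reverse return new lists, while shift, unshift, pop and push change the array itself.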

Perl statement modifiers

Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating semicolon (or block ending). The possible modifiers are:

if EXPR
unless EXPR
while EXPR
until EXPR
foreach LIST

The same keywords also introduce compound statements, in which the condition is followed by a block: if (EXPR) { }, unless (EXPR) { }, while (EXPR) { }, until (EXPR) { }, foreach (LIST) { }.

The EXPR following the modifier is referred to as the "condition". Its truth or falsehood determines how the modifier will behave.

if executes the statement once if and only if the condition is true.
unless is the opposite: it executes the statement if the condition is false (unless the condition is true).
The foreach modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn).
while repeats the statement while the condition is true.
until does the opposite: it repeats the statement until the condition is true (or while the condition is false).

The while and until modifiers have the usual "while loop" semantics (conditional evaluated first).
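A small sketch with one statement per modifier. The variables $n and @log are invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @log;
my $n = 3;

push @log, "positive" if $n > 0;          # runs: the condition is true
push @log, "not ten"  unless $n == 10;    # runs: the condition is false
push @log, "item $_"  foreach (1 .. 2);   # runs once per list item

$n-- while $n > 0;     # repeats while true: $n ends at 0
$n++ until $n >= 2;    # repeats until true: $n ends at 2

print "$_\n" for @log;
print "n is now $n\n";
```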

Perl statement modifiers

if / if else / if elsif else

#!/usr/local/bin/perl

use strict;
use warnings;

print "\nEnter your name (then press \"return\" when done):\t";

#get information from the terminal window
my $name = <STDIN>;

#remove trailing "\n" if any
chomp $name;

if ($name eq "Couchepin") {
    print "Hello Mr President !\n" ;
}
else {
    print "Hello $name !\n" ;
}

exit ;

Perl statement modifiers

if / if else / if elsif else (name.pl) :

#!/usr/local/bin/perl

use strict;
use warnings;

print "\nEnter your name (then press \"return\" when done):\t";

#get information from the terminal window
my $name = <STDIN>;

#remove trailing "\n" if any
chomp $name;

if ($name eq "Couchepin") {
    print "Hello Mr President !\n" ;
}
elsif ($name eq "Falquet") {
    print "Good day to you Master $name !\n" ;
}
else {
    print "Hello $name !\n" ;
}

exit ;

Perl statement modifiers

Perl looping: the for/foreach loop

"Passing an array":

foreach my $element ( @array ) {
    # do something with the element
}

"Passing a hash":

foreach my $key (keys %hash) {
    print "The value of $key is $hash{$key}\n";
}

"Specify 3 EXPR inside the (): initialization, condition and loop expression":

for (my $i = 0; $i <= 10; $i = $i + 1) {
    #execute the contents of the block as long as $i is less than or equal to 10
}

Perl statement modifiers

Perl looping the for/foreach loop :

#!/usr/local/bin/perl

use strict; use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien" , "RochPhilippe", "Francisco", "Sandra Yukie", "Simona", "Christophe", "Dominique", "Michaela", "Lionel", "Gabriele", "Michael", "Charlotte", "Subhash", "Adam", "Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta", "Viviane", "Stanislav", "Kyrill", "Petr", "Sebastien", "Haleh");

foreach my $name (@names) { print "Hello $name !\n"; }

exit ;

Perl statement modifiers

Perl looping the for/foreach loop :

#!/usr/local/bin/perl

use strict; use warnings;

my $counter;

for ($counter=1;$counter<=10;$counter++){ print "I can count up to $counter !\n"; }

exit ;

Perl statement modifiers

Perl looping: the while loop

while ( condition ) {
    #execute the contents of the block
}

ATTENTION: Infinite Loop !!!

while (1) {
    #execute the contents of the block forever !
}

True/False

In Perl some values are considered true:

- number with a nonzero value
- string with nonzero length (except the string "0", which is false)
- array with at least one element
- hash with at least one key/value pair

For example:

$lang = "Perl";          # <- true
$version = 5.6;          # <- true
$zero = 0;               # <- false
$empty = "";             # <- false
@states = ();            # <- false
%table = (1 => "one");   # <- true
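A quick sketch for checking these rules yourself. The helper truth() is hypothetical and written only for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl evaluates a value in a condition.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my @states = ();
my %table  = (1 => "one");

print 'string "Perl": ', truth("Perl"),         "\n";   # true
print 'number 5.6:    ', truth(5.6),            "\n";   # true
print 'number 0:      ', truth(0),              "\n";   # false
print 'string "":     ', truth(""),             "\n";   # false
print 'string "0":    ', truth("0"),            "\n";   # false
print 'string "0.0":  ', truth("0.0"),          "\n";   # true (non-empty, not "0")
print 'empty array:   ', truth(scalar @states), "\n";   # false (0 elements)
print 'filled hash:   ', truth(scalar %table),  "\n";   # true
```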

Perl statement modifiers

Perl looping: the while loop

#!/usr/local/bin/perl

use strict;
use warnings;

my $number = 1;

while ($number<=10) {
    print "I can count up to $number !";
    $number+=1; #Ha !
}

exit ; #really ?

Tip:

To stop a "looping" script press CTRL+C …

Perl statement modifiers

Perl looping while loop / do until

while loop

"Activity" may never be executed.

do until

"Activity" is executed at least once !

Perl operators

Arithmetic           Numeric comparison            String comparison
+  addition          == equality                   eq equality
-  subtraction       != inequality                 ne inequality
*  multiplication    <  less than                  lt less than
/  division          >  greater than               gt greater than
                     <= less than or equal         le less than or equal
                     >= greater than or equal      ge greater than or equal

Why do we have separate numeric and string comparisons?

Because we don't have special variable types, and Perl needs to know whether to sort numerically (where 99 is less than 100) or alphabetically (where 100 comes before 99).
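A minimal sketch of the sorting difference. The values are made up, and the numeric sort uses the <=> operator, which is standard Perl but not listed in the table:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @values = (100, 99, 9);

# Default sort compares as strings: "100" comes before "9" and "99".
my @alphabetical = sort @values;

# A comparison block with <=> compares as numbers.
my @numerical = sort { $a <=> $b } @values;

print "alphabetical: @alphabetical\n";    # 100 9 99
print "numerical:    @numerical\n";       # 9 99 100
```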

Perl operators

#!/usr/local/bin/perl

use strict;
use warnings;

my $x = 100;
my $y = 99;

if ($x > $y) {
    print "\"$x\" is numerically greater than \"$y\"\n" ;
}
else {
    print "\"$x\" is numerically smaller than \"$y\"\n" ;
}

if ($x gt $y) {
    print "\"$x\" is alphabetically greater than \"$y\"\n" ;
}
else {
    print "\"$x\" is alphabetically smaller than \"$y\"\n" ;
}

exit ;

Output:

"100" is numerically greater than "99"
"100" is alphabetically smaller than "99"

Perl operators

Boolean      Miscellaneous
&& and       =   assignment
|| or        .   string concatenation
!  not       x   string multiplication
             ..  range operator (creates a list of numbers)

Many operators can be combined with a "=" as follows:

$a += 1;     # same as $a = $a + 1   # same as $a++

$a -= 1;     # same as $a = $a - 1   # same as $a--

$a .= "\n";  # same as $a = $a . "\n";
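A short sketch of the miscellaneous operators and the combined assignments; all variable names are made up for the illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $rule  = "-" x 5;                    # string multiplication: "-----"
my @range = (1 .. 5);                   # range operator: 1 2 3 4 5
my $word  = "Hello" . " " . "world";    # concatenation

my $count = 1;
$count += 4;      # now 5
$count -= 2;      # now 3
$word  .= " !";   # now "Hello world !"

print "$rule\n";
print "@range\n";
print "$word\n";
print "count is $count\n";
```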

Perl functions

Functions in Perl are called subroutines.

Functions are useful to avoid typing redundant code over and over.

Functions help in the clarity of scripts.

There are already many available functions in Perl: http://search.cpan.org/~nwclark/perl-5.8.6/pod/perlfunc.pod

Syntax of Perl subroutines:

sub name {
    my (list of arguments) = @_;
    list of statements to execute
    return some value
}

Perl functions

#!/usr/local/bin/perl

use strict;
use warnings;

my $height = 220;
my $weight = 120;

#to calculate the BFI you need the height in cm and the weight in kg
my $bfi = &cal($height, $weight);
print "$bfi\n";
exit;

sub cal {
    if (@_ != 2) {
        die "&cal should get exactly two arguments!\n" ;
    }
    my ($cm, $kg) = @_ ;
    my $index = ($kg)/(($cm / 100)*($cm / 100));
    return $index;
}

Output:

24.7933884297521

Notice on Body Fat Index (BFI):
BFI < 20       => weight is too low
20 < BFI < 25  => weight is correct
BFI > 25       => Oups !

Perl functions

#!/usr/local/bin/perl

use strict;
use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien", "Uta");

foreach (@names) {
    my $size = length($_);
    print "*"x($size+2),"\n";
    print "*$_*\n";
    print "*"x($size+2),"\n";
}

exit ;

Output:

*******
*Pedro*
*******
********
*Claire*
********
********
*Yemima*
********
********
*Fabien*
********
*****
*Uta*
*****

What if you need this "pretty print" more than once ?

my @names1 = ("Pedro", "Claire", "Yemima", "Fabien" ,"Uta");
my @names2 = ("Sandra Yukie", "Simona", "Christophe", "Dominique");
my @names3 = ("Lionel", "Michael", "Charlotte", "Subhash", "Adam");
my @names4 = ("Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Viviane");
my @names5 = ("Stanislav", "Kyrill", "Petr", "Sebastien", "Haleh");

Perl functions

#!/usr/local/bin/perl

use strict;
use warnings;

my @names1 = ("Pedro", "Claire", "Yemima", "Fabien" ,"Francisco");
my @names2 = ("Sandra Yukie", "Simona", "Christophe", "Dominique", "Michaela");
my @names3 = ("Lionel", "Gabriele", "Michael", "Charlotte", "Subhash", "Adam");
my @names4 = ("Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta", "Viviane");
my @names5 = ("Stanislav", "Kyrill", "Petr", "Sebastien", "Haleh");

&pretty_print(@names1);
&pretty_print(@names2);
&pretty_print(@names3);
&pretty_print(@names4);
&pretty_print(@names5);

exit ;

sub pretty_print {
    foreach (@_) {
        my $size = length($_);
        print '*'x($size+2),"\n";
        print "*$_*\n";
        print '*'x($size+2),"\n";
    }
}

Perl File handles

A "file handle" is a connection between your Perl script and the outside world.

You can open a file for input or output using the open() function.

open(INFILE, "input.txt") or die "Can't open input.txt: $!";
open(OUTFILE, ">output.txt") or die "Can't open output.txt: $!";
open(LOGFILE, ">>logfile") or die "Can't open logfile: $!";

print() can also take an optional first argument specifying which filehandle to print to:

print STDERR "This is your final warning\n";
print OUTFILE $record;
print LOGFILE $logmessage;

Use whatever filehandle name you like EXCEPT STDIN, STDOUT and STDERR, which are already taken !
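A minimal sketch of the three open modes (write, append, read). It is not the slide's own code: it assumes the three-argument form of open() with lexical filehandles ($out, $log, $in), a widely used alternative to the bareword handles shown here, and it writes to a scratch file created with the core File::Temp module:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration; it is removed when the script exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# ">" creates or overwrites the file.
open(my $out, ">", $path) or die "Can't open $path for writing: $!";
print $out "first record\n";
close $out;

# ">>" appends to the end of the file.
open(my $log, ">>", $path) or die "Can't open $path for appending: $!";
print $log "second record\n";
close $log;

# "<" opens the file for reading.
open(my $in, "<", $path) or die "Can't open $path for reading: $!";
my @lines = <$in>;
close $in;

print "read ", scalar(@lines), " lines\n";
print @lines;
```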

Perl File handles

Perl special file handles

There are three connections that always exist and are always "open" when your program starts:

STDIN, STDOUT, and STDERR.

Actually, these names are file handles. File handles are variables used to manipulate files.

STDIN reads from standard input, which is usually the keyboard in a normal Perl script (or the input sent by a browser in a CGI script; Cgi-lib.pl reads from it automatically).

STDOUT (Standard Output) and STDERR (Standard Error) by default write to a console (or a browser in CGI).

We have been using the STDOUT file handle without knowing it for every print() statement during this presentation. The print() function uses STDOUT as the default if no other file handle is specified.

Perl File handles

You can read from an open filehandle using the "<>" operator.

In scalar context it reads a single line (or a single record) from the filehandle, and in list context it reads the whole file in, assigning each line to an element of the list:

my $line = <INFILE>;
my @lines = <INFILE>;

Reading in the whole file at one time is called slurping. It can be useful but it may be a memory hog. Most text file processing can be done a line at a time with Perl's looping constructs. The "<>" operator is most often seen in a while loop:

while (<INFILE>) { # assigns each line in turn to $_
    print "Just read in this line: $_";
}

When you're done with your filehandles, you should close() them (though Perl will clean up after you if you forget…):

close INFILE;

You can replace the regular record separator "\n" with something else:

$/ = "\/\/\n";   # for a file containing SwissProt entries
$/ = ">";        # for a fasta file
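A sketch of reading fasta records with $/ set to ">". The two entries are made up, and the data is read from an in-memory string (opening a reference to a scalar) only to keep the example self-contained:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up fasta data; a real script would open a file instead.
my $fasta = ">seq1\nVKLIESKEAF\nQEALAAAGDK\n>seq2\nITSAEQFLNA\n";
open(my $in, "<", \$fasta) or die "Can't open in-memory file: $!";

my %sequence;
{
    local $/ = ">";    # one record per fasta entry, inside this block only
    while (my $record = <$in>) {
        chomp $record;                  # chomp removes the current $/, here ">"
        next unless length $record;     # the text before the first ">" is empty
        my ($header, @lines) = split /\n/, $record;
        $sequence{$header} = join "", @lines;
    }
}
close $in;

foreach my $id (sort keys %sequence) {
    print "$id: $sequence{$id}\n";
}
```

Using local restores the default separator at the end of the block, so later reads are line by line again.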

Perl regular expressions

Idea: powerful way to search for text patterns …

>sw:THIO_RAT/110
VKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKY ……
>te:CB530525/66168
VKQIESKYAFQEALNSAGEKLVVVDFSATWCGPCKMIKPFFHSLSEKY ……
>tr:Q5R9M3_PONPY/210
VKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKY ……
>tg:NT039170_956/56151
VKLIESKEAFQEALAAERDKLVMVDFSATWCGPCKMIKPFFHSSCDKY ……
>te:CV502349/88193
VSLITTKESWDQKLAEAKKegKIVIANFSASWCGPCRMISPFYCELKY ……
>sw:TRXL2_ARATH/98174
ITSAEQFLNALKDAGDRLVIVDFYGTWCGSCRAMFPKLCKFGHTAKEH ……
>te:OMY_1368_2/13111
ISSEEQWEEALSGPGLLVIEVYQRWCGPCKAVQNIFRKLRSHTHHTEY ……
>te:CA246724/110160
SKATYDEQWAAhkSSGKLMVIDFSASWCGPCRFIEPAFKELTHTASRF ……
>tr:Q84XR8_CHLRE/68169
ILTADTYHGFLEKNAEKLVVTDFYAVWCGPCKVIAPEIERTLANEMMT ……
>tg:AL772421_11/578
KLVVIEFGASWCEPSRRIAPVFAEYAKKMNKDKNDHDKDGDKDGMKEF ……
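A minimal sketch of a pattern search on three of the sequences listed on this slide. The pattern WCG[PS]C is an assumption chosen for the illustration (it happens to occur in several of the fragments) and is not a definition taken from the slides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three of the fragments shown on the slide, keyed by their header.
my %fragment = (
    "sw:THIO_RAT/110"      => "VKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKY",
    "sw:TRXL2_ARATH/98174" => "ITSAEQFLNALKDAGDRLVIVDFYGTWCGSCRAMFPKLCKFGHTAKEH",
    "tg:AL772421_11/578"   => "KLVVIEFGASWCEPSRRIAPVFAEYAKKMNKDKNDHDKDGDKDGMKEF",
);

# W, C, G, then P or S, then C. The parentheses capture the matched text in $1.
my %hit;
foreach my $id (sort keys %fragment) {
    if ($fragment{$id} =~ /(WCG[PS]C)/) {
        $hit{$id} = $1;
        print "$id: found $1\n";
    }
    else {
        print "$id: no match\n";
    }
}
```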
