Perl for Bioinformatics Perl and Bioperl I

Perl for Bioinformatics Perl and BioPerl I Jason Stajich July 13, 2011 Perl for Bioinformatics Slide 1/83 Outline Perl Intro Basics Syntax and Variables More complex: References Routines Reading and Writing Regular Expressions BioPerl Intro Useful modules Data formats Databases in BioPerl Sequence objects details Trees Multiple Alignments BLAST and Sequence Database searching Other general Perl modules Perl for Bioinformatics Perl Intro Slide 2/83 Outline Perl Intro Basics Syntax and Variables More complex: References Routines Reading and Writing Regular Expressions BioPerl Intro Useful modules Data formats Databases in BioPerl Sequence objects details Trees Multiple Alignments BLAST and Sequence Database searching Other general Perl modules Perl for Bioinformatics Perl Intro Basics Slide 3/83 Why Perl for data processing & bioinformatics Fast text processing Regular expressions Extensive module libraries for pre-written tools Large number of users in community of bioinformatics Scripts are often faster to write than full compiled programs Cons Syntax sometimes confusing to new users; 'There's more than one way to do it' can also be confusing. (TMTOWTDI) Cons Not a true object-oriented language so some abstraction is clunky and hacky Scripting languages (Perl, Python, Ruby) generally easier to write simply than compiled ones (C, C++, Java) as they are often not strongly typed and less memory management control. Perl for Bioinformatics Perl Intro Basics Slide 4/83 Perl packages CPAN - Comprehensive Perl Archive http://www.cpan.org Perl for Bioinformatics Perl Intro Basics Slide 5/83 Help! Perldoc online http://perldoc.perl.org/ Or on your computer - type 'perldoc' For functions use -f 'perldoc -f sprintf' For modules just the name 'perldoc List::Util' Perl for Bioinformatics Perl Intro Basics Slide 6/83 Hello world #!/usr/bin/perl −w use s t r i c t ; p r i n t "hello world\n" ; >> perl hello.pl >> h e l l o world Perl for Bioinformatics Perl Intro Basics Slide 7/83 Variables & Syntax $ s c a l a r − single value, can be string ,number @array − list of values %hash − paired values: key and value # comments look like this my $variable;#a var declared with my for this scope my ($n1,$n2) = (10,20);# declared and initialized Perl for Bioinformatics Perl Intro Basics Slide 8/83 Strings my $number = "12" ; my $msg = "This could be a message" ; my $composite = "There are $number apples" ; print $composite , "\n" ; >> There are 12 apples Perl for Bioinformatics Perl Intro Basics Slide 9/83 Quotable my $ s t r = 'literally' ; my $ s t r 2 = "interpreted as $str" ; my $executed = `/usr/bin/clustalw seqs.fa `; # special characters my $tab = "\t" ; my $newline = "\t" ; my $singlequote = "'" ; my $dblquote = "\"" ;# OR'"'; Perl for Bioinformatics Perl Intro Basics Slide 10/83 Numerics my $n = 1 6 ; my $g = 3∗∗2;# 3^2 my $d = 12.34; my $ i r = 1 / 3 ; my $h = 1e −3; p r i n t $ i r , "\n" ; p r i n t $h , "\n" ; p r i n t f "%e\n" , $h ;# print in scientific notation >> 0.333333333333333 >> 0.001 >> 1.000000 e−03 Perl for Bioinformatics Perl Intro Basics Slide 11/83 Outline Perl Intro Basics Syntax and Variables More complex: References Routines Reading and Writing Regular Expressions BioPerl Intro Useful modules Data formats Databases in BioPerl Sequence objects details Trees Multiple Alignments BLAST and Sequence Database searching Other general Perl modules Perl for Bioinformatics Perl Intro Syntax and Variables Slide 12/83 Print me something #!/usr/bin/perl −w use s t r i c t ; p r i n t "a message\n" ; my $v1 = 'CFTR' ; my $v2 = '20.34 ' ; p r i n t f "a LOD score for marker %s is %d\n" , $v1 , $v2 ; >> a LOD score for the marker CFTR is 20 Perl for Bioinformatics Perl Intro Syntax and Variables Slide 13/83 Logic i f ( $A < $B) f g # if this e l s i f ( $B < $C ) f g # otherwise if this e l s e fg # do this as last resort unless( $bool n o t ) f g # do something if bool n o t is false while( $bool ) fg # loop if bool is true until ( $bool n o t ) fg # loop if bool n o t is false f o r ( INIT ; TEST ; INCREMENT ) f g for( $i = 0; $i < 5 ; $ i++ ) f p r i n t $i , "\n" ; g >> 1 >> 2 >> 3 ... Perl for Bioinformatics Perl Intro Syntax and Variables Slide 14/83 Logic and Truth i f ( $a == $b )#a is numerically equivalent tob i f ( $a eq $b )#a is lexically equivalent tob i f ( $a < $b )#a is less thanb i f ( $a != $b )#a is not equal tob(numerically) i f ( $a ne $b )#a is not lexically equal(string comparison) i f ( @ l i s t )# true if @list is not empty i f ( $a )# true if $a is not0 and not undefined Perl for Bioinformatics Perl Intro Syntax and Variables Slide 15/83 Arrays and Lists my @ f r u i t = ( 'apple ' , 'pear ' , 'peach ' ); my @tropical = qw(kiwi passion star);# qw for quote words my @all = (@fruit ,@tropical);# lists are flattened my @mixed = (1, 'pineapple' , 1 8 . 3 ) ;# can be mixed types my @sorted = sort @mixed;#alpha numeric sort my @ordered = sort f $a<=>$bg (10,3,17,200,9); print $ordered[0], "\n" ;# print the1st item print $ordered[ −1] , "\n" ;# print the last item >> 3 >> 200 Lists and strings start counting at '0' not '1' Perl for Bioinformatics Perl Intro Syntax and Variables Slide 16/83 Arrays To add or remove entries from lists: can operate as linked-lists and stacks pop to remove from end push to add to end shift remove from front unshift add to front splice to arbitrarily remove from any position my @tools = qw(rake shovel); push @tools , 'hammer ' ;# now there will be3 items my $first = shift @tools;# will be'rake' splice(@tools , 1,0, 'saw ' );# insert'saw' after'hammer' p r i n t j o i n ( "," , @ t o o l s ) , "\n" ; >> rake ,saw, shovel ,hammer Perl for Bioinformatics Perl Intro Syntax and Variables Slide 17/83 Split and Join my @lst = qw(In the locust wind comes a rattle and hum); p r i n t j o i n ( "," , @ l s t ) , "\n" ;# combine list toa string my @newlist = split(/n s +/, "GENE1 GENE2 GENE3 VALUE1" ); print $newlist[2], "\n" ;# split string, get3rd item >> In ,the ,locust ,wind,comes,a, rattle ,and,hum >> GENE3 Perl for Bioinformatics Perl Intro Syntax and Variables Slide 18/83 Hashes my %f r u i t = ( 'apple ' => 'red ' );# initialize with value $ f r u i t f'banana 'g = 'yellow ' ;# adda value $ f r u i t f'apple 'g = 'green ' ;# updatea value my @keys = keys %fruit;# get keys of hash my @vals = values %fruit;# get the values p r i n t j o i n ( "," , @keys ) , "\n" ; p r i n t j o i n ( "," , @ v a l s ) , "\n" ; >> banana , apple >> yellow ,green Perl for Bioinformatics Perl Intro Syntax and Variables Slide 19/83 Manipulate strings substr - get a substring and also manipulate in place length - get length of a string . - concatenate two strings my $ l e f t = 'ABC' ; my $ r i g h t = 'XYZ' ; my $concat = $left .$right; print length($concat), " is length of string $concat\n" ; p r i n t " pos 3-4 is " , substr($concat ,2,2), "\n" ; my $lastchar = substr($concat, −1 ,1); substr($concat ,1,2, '' );# replace2nd and3rd characters p r i n t "concat is $concat\n" ; >> 6 is the length of string ABCXYZ >> pos 2−4 i s CX >> concat is AXYZ Perl for Bioinformatics Perl Intro Syntax and Variables Slide 20/83 Manipulate strings 2 How about walking through each base in a sequence? Could use split and turn it into an array and use a for loop OR Can use substr to request each character in the string, one at a time. Turns out this is faster. my $sequence = 'ACGGTAGCATA' ; for( my $i = 0; $i < length $sequence; $i++ ) f my $base = substr($sequence, $i, 1);# get thei −th base print $base, "\n" ; g >> A >> C >> G >> G >> T ... Perl for Bioinformatics Perl Intro Syntax and Variables Slide 21/83 Outline Perl Intro Basics Syntax and Variables More complex: References Routines Reading and Writing Regular Expressions BioPerl Intro Useful modules Data formats Databases in BioPerl Sequence objects details Trees Multiple Alignments BLAST and Sequence Database searching Other general Perl modules Perl for Bioinformatics Perl Intro More complex: References Slide 22/83 References { pointers in Perl my %i n f o = ( 'gene123 ' => 1 . 0 2 ) ; my $ r e f = n% i n f o ;# backslash to make reference p r i n t $ r e f −>f'gene123 ' g , "\n" ;# arrow to dereference my $ r e f 2 = f 'gene1 ' => 2 g ;# fg f o r anonymous hash reference p r i n t $ r e f 2 −>f'gene1 ' g , "\n" ; p r i n t j o i n ( "," , k e y s %f$ r e f 2 g ), "\n" ;#% fg to derefa hash my $refarray = [ 'gene1 ' , 'gene2 ' ];#[] for anonymous array reference p r i n t j o i n ( ";" , @$refarray), "\n" ;#@ fg to dereference array >> 1 . 0 2 >> gene1 >> gene1;gene2 Perl for Bioinformatics Perl Intro More complex: References Slide 23/83 References used for Hashes of Arrays (HoA) If you wanted to store more than 1 thing per key in a hash my %kitchen = ( 'ingredients' => [ qw(flour eggs milk) ] ); p r i n t j o i n ( "," ,@f $ k i t c h e n f'ingredients' g g), "\n" ; $ k i t c h e n f'chefs 'g = [qw( Anthony Emeril)]; push @f $ k i t c h e n f'chefs ' gg , 'Julia ' ; for my $info ( keys %kitchen ) f p r i n t " $info : " , j o i n ( "\t" ,@f $ k i t c h e n f $ i n f o g g), "\n" ; g >> flour ,eggs ,milk >> chefs: Anthony Emeril Julia >> ingredients: flour eggs milk Perl for Bioinformatics Perl Intro More complex: References Slide 24/83 Array of Hashes (HoH) my @library = ( f 'title ' => 'To Kill a Mockingbird' , 'author_last'=> 'Lee ' , 'author_first'=> 'Harper ' g , f 'title ' => 'Moby Dick ' , 'author_last' => 'Mellville' , 'author_first' => 'Herman ' g , f 'title ' => 'Old Man and the sea' , 'author_last' => 'Hemmingway' , 'author_first'=> 'Ernest ' g ); for

Perl for Bioinformatics Perl and Bioperl I

Easybuild Documentation Release 20210907.0

Bioperl What’S Bioperl?

The Bioperl Toolkit: Perl Modules for the Life Sciences

Vasco Da Rocha Figueiras Algoritmos Para Genómica Comparativa

Pipenightdreams Osgcal-Doc Mumudvb Mpg123-Alsa Tbb

Web & Grid Technologies in Bioinformatics, Computational And

BINF 634 Bioinformatics Programming

(12) United States Patent (10) Patent No.: US 8,407,013 B2 Rogan (45) Date of Patent: Mar

Program Performance Review

Technical Notes All Changes in Fedora 13

IT-SC Beginning Perl for Bioinformatics James Tisdall Publisher: O'reilly First Edition October 2001 ISBN: 0-596-00080-4, 384 Pages

Towards Left Duff S Mdbg Holt Winters Gai Incl Tax Drupal Fapi Icici