Bioinformatics: Issues and Algorithms Perl Programming 2
Perl Programming 2
Bioinformatics: Issues and Algorithms
CSE 308-408 • Fall 2007 • Lecture 5

CSE 308-408 · Bioinformatics: Issues and Algorithms · Lopresti · Fall 2007 · Lecture 5

Administrative notes

• Homework #1 is due on Tuesday, Sept. 11 at 5:00 pm. Submit your work using Blackboard Assignment function.
• Homework #2 will be available on Blackboard on Thursday, Sept. 13 at 9:00 am.

CSE Department Ice Cream Social (yum!)
Location: Packard Lab 360
Date: Tues., Sept. 11, 4:10 pm – 5:00 pm

Arrays

As we know, in bioinformatics, much of the data we care about consists of collections of genetic sequences. Simple scalar variables won't suffice ...

    #! /usr/bin/perl -w

    # The 'arrays1' program.

    @list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

    print "$list_of_sequences[1]\n";

• ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' ) is a Perl list data structure.
• Perl array variables start with “@”.

    metis:~/CSE308/Chapter4% arrays1
    GCTCAGTTCT
    metis:~/CSE308/Chapter4%

Why did this print GCTCAGTTCT and not TTATTATGTT?

Arrays in Perl (and many other languages) start at index [0]:

    TTATTATGTT  [0]
    GCTCAGTTCT  [1]
    GACCTCTTAA  [2]

Manipulating arrays

    #! /usr/bin/perl -w

    # The 'arrays2' program.
    @list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

    print "$list_of_sequences[1]\n";

    $list_of_sequences[1] = 'CTATGCGGTA';
    $list_of_sequences[3] = 'GGTCCATGAA';

    print "$list_of_sequences[1]\n";

Array contents before and after the two assignments:

    Before              After
    TTATTATGTT  [0]     TTATTATGTT  [0]
    GCTCAGTTCT  [1]     CTATGCGGTA  [1]
    GACCTCTTAA  [2]     GACCTCTTAA  [2]
                        GGTCCATGAA  [3]

What does this do when it runs?

    metis:~/CSE308/Chapter4% arrays2
    GCTCAGTTCT
    CTATGCGGTA
    metis:~/CSE308/Chapter4%

How big is an array?

    #! /usr/bin/perl -w

    # The 'arrays3' program.

    @list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

    print "The array size is: ", $#list_of_sequences+1, ".\n";
    print "The array size is: ", scalar @list_of_sequences, ".\n";

• $#list_of_sequences returns the largest array index.
• Perl's scalar function converts the array to a scalar by counting the number of list elements.

    metis:~/CSE308/Chapter4% arrays3
    The array size is: 3.
    The array size is: 3.
    metis:~/CSE308/Chapter4%

Adding elements to an array

    #! /usr/bin/perl -w

    # The 'arrays4' program.

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

    print "The array size is: ", $#sequences+1, ".\n";

    @sequences = ( @sequences, 'CTATGCGGTA' );

    print "The array size is: ", scalar @sequences, ".\n";

• In ( @sequences, 'CTATGCGGTA' ), Perl combines these two lists.

    metis:~/CSE308/Chapter4% arrays4
    The array size is: 3.
    The array size is: 4.
    metis:~/CSE308/Chapter4%

But be careful

Notice the effect of this code:

    #! /usr/bin/perl -w

    # The 'arrays6' program.

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

    print "The array size is: ", $#sequences+1, ".\n";
    print "@sequences\n";

    @sequences = ( 'CTATGCGGTA' );

    print "The array size is: ", scalar @sequences, ".\n";
    print "@sequences\n";

• @sequences = ( 'CTATGCGGTA' ); overwrites the array.

    metis:~/CSE308/Chapter4% arrays6
    The array size is: 3.
    TTATTATGTT GCTCAGTTCT GACCTCTTAA
    The array size is: 1.
    CTATGCGGTA
    metis:~/CSE308/Chapter4%

Adding elements to an array

An obvious extension:

    metis:~/CSE308/Chapter4% more arrays8
    #! /usr/bin/perl -w

    # The 'arrays8' program.

    @sequence_1 = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequence_2 = ( 'GCTCAGTTCT', 'GACCTCTTAA' );
    @combined_sequences = ( @sequence_1, @sequence_2 );
    print "@combined_sequences\n";
    metis:~/CSE308/Chapter4%
    metis:~/CSE308/Chapter4% arrays8
    TTATTATGTT GCTCAGTTCT GACCTCTTAA GCTCAGTTCT GACCTCTTAA
    metis:~/CSE308/Chapter4%

Removing elements from an array: splicing

Perl provides a function for “surgically removing” part of an array:

    #! /usr/bin/perl -w

    # The 'remove1' program.

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA', 'TTATTATGTT' );
    @removed_elements = splice @sequences, 1, 2;
    print "@removed_elements\n";
    print "@sequences\n";

• splice @sequences, 1, 2 removes two array elements starting at index [1].

    metis:~/CSE308/Chapter4% splice1
    GCTCAGTTCT GACCTCTTAA      Removed elements
    TTATTATGTT TTATTATGTT      New array
    metis:~/CSE308/Chapter4%

    splice @sequences, OFFSET, LENGTH

    OFFSET   Start removing at this array index
    LENGTH   Remove this many elements

Notes:
• Splice subroutine returns removed elements.
• If no value for LENGTH provided, every element from OFFSET onward is removed.
• If no value for OFFSET provided, every element is removed.
• In latter case, more efficient to write @sequences = ();

Accessing elements in an array: slicing

To access array elements without removing them, use slice:

    #! /usr/bin/perl -w

    # The 'slices' program - slicing arrays.

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );

    print "@sequences\n";
    @seq_slice = @sequences[ 1 .. 3 ];
    print "@seq_slice\n";
    print "@sequences\n";
    @removed = splice @sequences, 1, 3;
    print "@sequences\n";
    print "@removed\n";

• @sequences[ 1 .. 3 ]: Slice to access elements 1-3
• splice @sequences, 1, 3: Splice to remove elements 1-3

    europa:~/CSE308/Chapter4% slices
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC
    GCTCAGTTCT GACCTCTTAA CTATGCGGTA                          Slice
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC
    TTATTATGTT ATCTGACCTC                                     Splice
    GCTCAGTTCT GACCTCTTAA CTATGCGGTA
    europa:~/CSE308/Chapter4%

    @dnas[ 1 .. 9 ]     Access 2nd through 10th elements in array (".." is the Perl range operator)
    @dnas[ 1, 4, 9 ]    Access 2nd, 5th, and 10th elements in array

Notes:
• To access list of elements from array, use a slice.
• To remove list of elements from array, use splice.
• Both return the elements in question.
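
The contrast between slicing and splicing in the notes above can be checked with a short sketch. It is only an illustration under stated assumptions: the slides name an array @dnas but never show its contents, so the ten sequences below are made up, and the sketch adds `use strict` and `my` declarations, which the lecture programs do not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: ten made-up sequences (not taken from the lecture).
my @dnas = ( 'AAAA', 'CCCC', 'GGGG', 'TTTT', 'ACGT',
             'TGCA', 'AACC', 'GGTT', 'ATAT', 'CGCG' );

# A slice with the range operator copies the 2nd through 10th elements.
my @range_slice = @dnas[ 1 .. 9 ];

# A slice with an index list copies the 2nd, 5th, and 10th elements.
my @picked = @dnas[ 1, 4, 9 ];

# Slicing leaves the original array untouched.
my $size_after_slice = scalar @dnas;

# splice removes three elements starting at index 1 and returns them.
my @removed = splice @dnas, 1, 3;
my $size_after_splice = scalar @dnas;

print "Range slice:         @range_slice\n";
print "Picked:              @picked\n";
print "Size after slicing:  $size_after_slice\n";
print "Removed:             @removed\n";
print "Size after splicing: $size_after_splice\n";
print "Remaining:           @dnas\n";
```

Both operations return the selected elements, but only splice changes the size of the array it is applied to.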

Pushing, popping, shifting, and unshifting

Often, manipulation of arrays involves single elements, so Perl provides special functions to make this easier:

    shift     Removes and returns first element from array
    pop       Removes and returns last element from array
    unshift   Adds element (or list) onto start of array
    push      Adds element (or list) onto end of array

    @sequences = ( 'TTATTATGTT',      # Start of array
                   'GCTCAGTTCT',
                   'GACCTCTTAA',
                   'CTATGCGGTA',
                   'ATCTGACCTC' );    # End of array

    #! /usr/bin/perl -w

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );

    print "@sequences\n";
    $last = pop @sequences;                #1 Removes last element
    print "@sequences\n";
    $first = shift @sequences;             #2 Removes first element
    print "@sequences\n";
    unshift @sequences, $last;             #3 Places element at start
    print "@sequences\n";
    push @sequences, ( $first, $last );    #4 Places elements at end
    print "@sequences\n";

    europa:~/CSE308/Chapter4% pushpop
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA                          #1
    GCTCAGTTCT GACCTCTTAA CTATGCGGTA                                     #2
    ATCTGACCTC GCTCAGTTCT GACCTCTTAA CTATGCGGTA                          #3
    ATCTGACCTC GCTCAGTTCT GACCTCTTAA CTATGCGGTA TTATTATGTT ATCTGACCTC    #4
    europa:~/CSE308/Chapter4%

The same steps, one operation at a time:

    Start       TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC

    “pop”       pop last element (ATCTGACCTC) into $last
                TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA

    “shift”     shift element (TTATTATGTT) into $first
                GCTCAGTTCT GACCTCTTAA CTATGCGGTA

    “unshift”   unshift one new element (ATCTGACCTC) from $last
                ATCTGACCTC GCTCAGTTCT GACCTCTTAA CTATGCGGTA

    “push”      push on two new elements (TTATTATGTT ATCTGACCTC) from $first, $last
                ATCTGACCTC GCTCAGTTCT GACCTCTTAA CTATGCGGTA TTATTATGTT ATCTGACCTC

Iterating over all elements of an array

Perl makes it easy to iterate over all the elements of an array:

    #! /usr/bin/perl -w

    # The 'iterateW' program - iterate over an entire array.

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );

    $index = 0;
    $last_index = $#sequences;

    while ( $index <= $last_index )
    {
        print "$sequences[ $index