
String literals & variables
for Bioinformatics, 140.636
F. Pineda

Suggestion
- Log in to the cluster and try out some of the ideas as we go along, using Perl "one-liners":

    perl -e 'valid perl statements'

    perl -e 'print "hello world"'
    perl -e 'print "hello world\n"'

Strings in Perl
- A string is a sequence of characters surrounded by double quotes, e.g. "Hello world".
- You can also use single quotes or the qw operator (which quotes a list of words); read about these in Schwartz & Phoenix.

Escape sequences
- Perl makes one character special: the backslash, '\'.
- Any single character that follows a backslash is interpreted in a special way, e.g.
    - \n means newline
    - \t means tab
    - \" means double quote (in a double-quoted string)
- Try these one-liners:

    perl -e 'print "hello world"'
    perl -e 'print "hello world\n"'
    perl -e 'print "\"Hello world\"\n";'

Useful escape sequences
- \a       ASCII alarm (bell)
- \b       ASCII backspace
- \e       ASCII escape
- \f       ASCII formfeed
- \n       ASCII newline
- \r       ASCII return
- \t       ASCII tab
- \c       Control the following character, e.g. \cC is Control-C
- \l       Lowercase the following character
- \u       Uppercase the following character
- \U…\E    Uppercase everything between \U and \E
- \L…\E    Lowercase everything between \L and \E

When are escape sequences allowed?

- All escape sequences are allowed in double-quoted strings.
- Only \' and \\ are allowed in single-quoted strings.
- Since you are beginners, I recommend that you always use double-quoted strings so you don't have to think about the difference.

String operators
- Concatenation (dot): .
    - e.g. "hello" . " world" . "\n"
    - concatenates strings into a single string
- Repetition: x
    - e.g. "ha " x 3
    - repeats the string and concatenates the copies together

String functions

    # returns the length of the string
    length("CTAAACCCTAAACCCTA");

    # returns the string in reversed order
    # and does not change the original string
    scalar reverse("CTAAACCCTAAACCCTA");

    # returns a substring; the arguments are string, starting position, length
    substr("CTAAACCCTAAACCCTA", 1, 3);
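The string operators and functions above can be tried together in one short script. This is only a minimal sketch: the variable names ($dna, $greeting, $laugh, and so on) are made up for illustration, and the sequence is the same example string used on the slides.

```perl
use strict;
use warnings;

my $dna = "CTAAACCCTAAACCCTA";   # example sequence from the slides

# String operators: concatenation (.) and repetition (x)
my $greeting = "hello" . " world";
my $laugh    = "ha " x 3;

# String functions
my $len      = length($dna);           # number of characters
my $reversed = scalar reverse($dna);   # reversed copy; $dna itself is unchanged
my $piece    = substr($dna, 1, 3);     # 3 characters starting at index 1 (the 2nd character)

print "$greeting\n";
print "$laugh\n";
print "length: $len\n";
print "reversed: $reversed\n";
print "substr: $piece\n";
```

Because Perl counts from 0, substr($dna, 1, 3) skips the leading C and returns TAA.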

Try them out:

    perl -e 'print substr("CTAAACCCTAAACCCTA",1,3);'

Perl starts counting at 0!
(Warning: biological sequence data starts counting at 1.)

chomp()
- A line from the keyboard is terminated by a newline, which is inserted when you hit the return key.
- chomp() is a function that removes this trailing newline.
- If there is more than one newline, chomp() removes only the last one.
- chomp() is so useful that it will appear in nearly every program you write -- trust me!
- chomp() avoids uncertainty about whether a line of input has a \n or not. You should always "chomp" a line of input immediately after reading it.
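The behavior of chomp() described above can be checked without typing at the keyboard. The sketch below uses string literals as stand-ins for lines of input (an assumption made purely so that the script is self-contained) and relies on Perl's default line ending.

```perl
use strict;
use warnings;

# Stand-ins for lines of input; a real program would read them from the keyboard or a file.
my $line    = "CTAAACCC\n";
my $removed = chomp($line);      # $line is now "CTAAACCC"; chomp returns 1

my $double = "CTAAACCC\n\n";
chomp($double);                  # only the last newline is removed

my $bare = "CTAAACCC";
my $none = chomp($bare);         # nothing to remove, so chomp returns 0

print "removed=$removed none=$none\n";
print "length after chomping the double newline: ", length($double), "\n";
```

The string with two newlines still ends in one newline after a single chomp(), which is why its length is 9 rather than 8.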

    $line = <STDIN>;
    chomp($line);

Transliteration tr///
- Our first pattern matching operator!
- This operator does 2 things:
    - It replaces characters by other characters (transliteration).
    - It returns the number of transliterated characters.
- The binding operator =~
    - Specifies the variable to which the transliteration is applied.
    - Do not think of =~ as assigning the return value.
- The general form of the statement is
    variable =~ tr/characters/replacementCharacters/
- Think of this as if it were a function:
    tr(variable, characters, replacementCharacters)

tr///
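As a minimal sketch of the two things tr/// does, the script below applies it to a variable with =~ and separately captures the count it returns. The variable names ($seq, $changed, $a_count) are chosen here for illustration only.

```perl
use strict;
use warnings;

my $seq = "CTAAACCCTAAACCCTA";

# =~ binds tr/// to $seq, which is modified in place.
# The value returned by tr/// is the number of characters it matched.
my $changed = ($seq =~ tr/T/U/);

# With an empty replacement list the string is left unchanged,
# so tr/// can be used purely for counting.
my $a_count = ($seq =~ tr/A//);

print "$seq\n";
print "changed: $changed, A count: $a_count\n";
```

Note that $changed holds a count, not the new string; the new string lives in $seq.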

Example 1: Transcribe a DNA string to an RNA string

    $s = 'CTAAACCCTAAACCCTA';   # $s contains DNA sequence
    $s =~ tr/T/U/;              # replace thymine with uracil
    print "$s\n";               # $s contains RNA sequence

Example 2: Reverse complement the DNA sequence

    $s = scalar reverse('CTAAACCCTAAACCCTA');   # reverse the DNA sequence
    $s =~ tr/ATCG/TAGC/;                        # complement it

Example 3: Count the number of thymine bases

    $s = 'CTAAACCCTAAACCCTA';
    $count = ($s =~ tr/T//);
    print $count, "\n";

Summary of string functions and operators
- length(string)
    - Returns the length of string
- scalar reverse(string)
    - Returns the string in reverse order
- substr(string, start, len)
    - Returns the substring of length len starting at start
- chop(string)
    - Removes the last character from a string
    - Returns the character removed
- chomp(string)
    - Removes the newline from the end of string
    - Returns 1 if a newline was removed, 0 otherwise
- tr/searchCharacters/replacementCharacters/
    - Replaces INDIVIDUAL characters (not patterns of characters!)
    - Returns the number of characters that were matched

Numeric literals
- Internally all numbers are doubles
- Numeric literals
    - Floats
    - Integers
    - Non-decimal integers
- Operators
    - Binary operators (+, -, *, /)
    - Exponentiation (**), which is also a binary operator

Automatic number/string conversion
- What if a numeric operator is applied to a string, or vice versa?
- Numbers are converted to strings and strings to numbers as required by operators, e.g. "25" + "5" gives the same result as 25 + 5.
- A string without a well-defined numerical interpretation is interpreted as zero, i.e. 25 + "fernando" yields 25.

Perl's built-in data types
- Scalar variables
    - A place to store a single value
    - A string or a number
- Arrays of scalars
    - An ordered list of scalars
- Hashes of scalars
    - i.e. associative arrays or lookup tables
    - An unordered collection of key/value pairs
    - Look up a value with a key
- Let's start with scalars

Scalar variables
Just a name for a place in memory to put stuff.

String literals can be assigned to scalar variables:

    $DNA = "CTAAACCCTAAACCCTA";
    print $DNA;

Scalar variables
- Referenced by $ followed by a Perl identifier
- A Perl identifier is…
    - a letter or underscore (not a digit) followed by letters, digits, or underscores
    - case sensitive
- Scalar variables can contain
    - a numeric value
    - a string value
    - a reference value
    - undef
- Variables that have never been assigned have the value undef.
- Operators that can be applied to literals can also be applied to scalar variables.

undef
- Variables that have never been assigned have the value undef.
- Some functions return undef.
- <STDIN> returns undef when it encounters an end-of-file, e.g.
    - the end of data in an input file
    - Ctrl-D from a Unix terminal window
- defined() returns false if a variable has the value undef.

Printing scalar variables
- Example
    $foo = "hello world";
    print $foo;
    print "\n";
- Interpolation
    - Scalar variables can be interpolated in double-quoted strings (not single-quoted strings), e.g.
        $foo = "a silly word";
        print "foo is $foo.\n";

Variable operators & assignment

- Expressions are built up with operators
    - Numeric operators (+, -, *, /, **) apply to numeric data
    - String operators ( . ) apply to string data
- Scalar assignment ( = )
    - The variable on the left is assigned the value of the expression on the right, e.g.
        $foo = expression
        $foo = $boo + $hoo
- Binary assignment operators
    - Shorthand for $foo = $foo + expression:
        $foo += expression
    - Works for all binary operators, i.e. +=, -=, *=, /=, **=, .=

Operator precedence
- Operator precedence determines the order of operations in complicated expressions:

    $result = 5+3**4*2;

- Same precedence as most other languages:

    $result = 5+((3**4)*2);

- Style: be explicit; use parentheses even if it's obvious
- See the table on p. 32 of Schwartz & Phoenix
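To see that the two forms of the precedence example agree, and to try the binary assignment operators, a short script along the following lines can be run. It is a sketch only; the variable names ($implicit, $explicit, $n, $s) are invented for this illustration.

```perl
use strict;
use warnings;

# ** binds tighter than *, which binds tighter than +
my $implicit = 5 + 3**4 * 2;
my $explicit = 5 + ((3**4) * 2);

# Binary assignment operators update the variable on the left
my $n = 10;
$n  += 5;     # 15
$n  *= 2;     # 30
$n **= 2;     # 900

my $s = "ha";
$s .= "!";    # "ha!"

print "$implicit $explicit $n $s\n";   # prints "167 167 900 ha!"
```

Both expressions evaluate to 167 (3**4 is 81, times 2 is 162, plus 5), so the parentheses change nothing except readability.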