Beginning Perl Scripting
Simple scripts, Expressions, Operators, Statements, Variables

Simon Prochnik & Lincoln Stein

Suggested Reading

Learning Perl (6th ed.): Chap. 2, 3, 12
Unix & Perl to the Rescue (1st ed.): Chap. 4
Chapters 1, 2 & 5 of Learning Perl.

Lecture Notes

1. What is Perl?
2. Some simple Perl scripts
3. Mechanics of creating a Perl script
4. Statements
5. Literals
6. Operators
7. Functions
8. Variables
9. Processing the Command Line

Problems

What is Perl?

Perl is a Programming Language

Written by Larry Wall in the late 1980s to process mail on Unix systems and since extended by a huge cast of characters. The name is said to stand for:

1. Pathologically Eclectic Rubbish Lister
2. Practical Extraction and Report Language

Perl Properties

1. Interpreted Language
2. "Object-Oriented"
3. Cross-platform
4. Forgiving
5. Great for text
6. Extensible, rich set of libraries
7. Popular for web pages
8. Extremely popular for bioinformatics

Other Languages Used in Bioinformatics

C, C++
    Compiled languages, hence very fast. Used for computation (BLAST, FASTA, Phred, Phrap, ClustalW). Not very forgiving.

Java
    Interpreted, fully object-oriented language. Built into web browsers. Supposed to be cross-platform, getting better.

Python, Ruby
    Interpreted, fully object-oriented languages. Rich set of libraries. Elegant syntax. Smaller user community than Java or Perl.

Some Simple Scripts

Here are some simple scripts to illustrate the "look" of a Perl program.

Print a Message to the Terminal

Code:

    #!/usr/bin/perl
    # file: message.pl
    use strict;
    use warnings;
    print "When that Aprill with his shoures soote\n";
    print "The droghte of March hath perced to the roote,\n";
    print "And bathed every veyne in swich licour\n";
    print "Of which vertu engendered is the flour...\n";

Output:

    (~) 50% perl message.pl
    When that Aprill with his shoures soote
    The droghte of March hath perced to the roote,
    And bathed every veyne in swich licour
    Of which vertu engendered is the flour...

Do Some Math

Code:

    #!/usr/bin/perl
    # file: math.pl
    use strict;
    use warnings;
    print "2 + 2 =", 2+2, "\n";
    print "log(1e23)= ", log(1e23), "\n";
    print "2 * sin(3.1414)= ", 2 * sin(3.1414), "\n";

Output:

    (~) 51% perl math.pl
    2 + 2 =4
    log(1e23)= 52.9594571388631
    2 * sin(3.1414)= 0.000385307177203065

Run a System Command

Code:

    #!/usr/bin/perl
    # file: system.pl
    use strict;
    use warnings;
    system "ls";

Output:

    (~/docs/grad_course/perl) 52% perl system.pl
    index.html      math.pl~          problem_set.html~  what_is_perl.html
    index.html~     message.pl        simple.html        what_is_perl.html~
    math.pl         problem_set.html  simple.html~

Return the Time of Day

Code:

    #!/usr/bin/perl
    # file: time.pl
    use strict;
    use warnings;
    my $time = localtime;   # "my" is required because of "use strict"
    print "The time is now $time\n";

Output:

    (~) 53% perl time.pl
    The time is now Thu Sep 16 17:30:02 1999

Mechanics of Writing Perl Scripts

Some hints to help you get going.

Creating the Script

A Perl script is just a text file. Use any text (programmer's) editor. Don't use word processors like Word.

By convention, Perl script files end with the extension .pl.

I suggest Emacs, because it is already installed on almost all Unix machines, but there are many good options: vi, vim, TextWrangler, Eclipse.

The Emacs text editor has a Perl mode that will auto-format your Perl scripts and highlight keywords. Perl mode will be activated automatically if you end the script name with .pl.

GUI-based script writing tools (Aquamacs, XEmacs, TextWrangler, Eclipse) are easier to use, but you may have to install them yourself.

Let's write a simple Perl script. It'll be a simple text file called time.pl and will contain the lines above. Let's try doing this in Emacs.

Emacs Essentials

A GUI version such as Aquamacs is simpler to use. Run it by adding the icon for the application to your Dock, then clicking on the icon.

You can also run Emacs in a Terminal window. Emacs will be installed on almost every Unix system you encounter.

    (~) 50% emacs

The same shortcuts you can use on the command line work in Emacs, e.g. control-a (^a) moves the cursor to the beginning of the line.

The most important Emacs-specific commands:

    control-x control-f (^x ^f)    open a file
    control-x control-w (^x ^w)    save as...
    control-x control-c (^x ^c)    quit
    control-g (^g)                 cancel command
    shift-control-_ (^_)           undo typing
    control-h ? (^h ?)             help!!
    option-; (M-;)                 add comment
    option-/ (M-/)                 variable/subroutine name auto-completion (cycles through options)

Running the Script

Don't forget to save any changes in your script before running it. The filled red circle at the top left of the Emacs GUI window has a dot in it if there are unsaved changes.

Option 1 (quick, not used much)

Run the perl program from the command line, giving it the name of the script file to run.

    (~) 50% perl time.pl
    The time is now Thu Sep 16 18:09:28 1999

Option 2 (as shown in examples above)

Put the magic comment #!/usr/bin/perl at the top of your script. It is easy to mistype this line, and a mistake here causes confusing errors (see below). Double check it, or copy it from a friend who has it working. Always add use strict; use warnings; to the top of your script, as in the example below.

    #!/usr/bin/perl
    # file: time.pl
    use strict;
    use warnings;
    my $time = localtime;   # "my" is required because of "use strict"
    print "The time is now $time\n";

Now make the script executable with chmod +x time.pl:

    (~) 51% chmod +x time.pl

Run the script as if it were a command:

    (~) 52% ./time.pl
    The time is now Thu Sep 16 18:12:13 1999

Note that you have to type "./time.pl" rather than "time.pl" because, by default, bash does not search the current directory for commands to execute. To avoid this, you can add the current directory (".") to your search PATH environment variable. To do this, create a file in your home directory named .profile and enter the following line in it:

    export PATH=$PATH:.

The next time you log in, your path will contain the current directory and you can type "time.pl" directly.
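
To check whether the current directory is already on your search path, a small Perl script can inspect the PATH environment variable. This is only a sketch: the file name checkpath.pl and the subroutine name path_has_current_dir are made up for illustration and are not part of the original notes. It assumes a Unix-style PATH whose entries are separated by colons.

```perl
#!/usr/bin/perl
# file: checkpath.pl (hypothetical name, for illustration only)
use strict;
use warnings;

# Return 1 if the given PATH string includes the current directory.
# The shell treats both "." and an empty entry as the current directory.
sub path_has_current_dir {
    my ($path) = @_;
    # The -1 limit keeps trailing empty entries, as in "/usr/bin:"
    my @dirs = split /:/, $path, -1;
    foreach my $dir (@dirs) {
        return 1 if $dir eq '.' or $dir eq '';
    }
    return 0;
}

# %ENV holds the environment variables of the running script
my $path = defined $ENV{PATH} ? $ENV{PATH} : '';
if (path_has_current_dir($path)) {
    print "The current directory is on your PATH; you can type time.pl\n";
} else {
    print "The current directory is not on your PATH; type ./time.pl\n";
}
```

Run it with perl checkpath.pl before and after editing .profile (and logging in again) to see the difference.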

Common Errors

Plan out your script before you start coding. Write the code, then run it to see if it works. Every script goes through a few iterations before you get it right. Here are some common errors:

Syntax Errors

Code:

    #!/usr/bin/perl
    # file: time.pl
    use strict;
    use warnings;
    time = localtime;
    print "The time is now $time\n";

Output:

    (~) 53% time.pl
    Can't modify time in scalar assignment at time.pl line 3, near "localtime;"
    Execution of time.pl aborted due to compilation errors.

Runtime Errors

Code:

    #!/usr/bin/perl
    # file: math.pl
    use strict;
    use warnings;
    my $six_of_one = 6;
    my $half_dozen = 12/2;
    my $result = $six_of_one/($half_dozen - $six_of_one);
    print "The result is $result\n";

Output:

    (~) 54% math.pl
    Illegal division by zero at math.pl line 6.

Forgetting to Make the Script Executable

    (~) 55% test.pl
    test.pl: Permission denied.

Getting the Path to Perl Wrong on the #! line

Code:

    #!/usr/local/bin/pearl
    # file: time.pl
    use strict;
    use warnings;
    my $time = localtime;
    print "The time is now $time\n";

Output:

    (~) 55% time.pl
    time.pl: Command not found.

This gives a very confusing error message, because the command that wasn't found is 'pearl', not time.pl.

Useful Perl Command-Line Options

You can call Perl with a few command-line options to help catch errors:

    -c    Perform a syntax check, but don't run.
    -w    Turn on verbose warnings. Much like use warnings;, but applies to the whole program, including any modules it loads.
    -d    Turn on the Perl debugger.

Usually you will invoke these from the command line, as in perl -cw time.pl (syntax check time.pl with verbose warnings). You can also put them in the top line: #!/usr/bin/perl -w.

Perl Statements

A Perl script consists of a series of statements and comments. Each statement is a command that is recognized by the Perl interpreter and executed. Statements are terminated by the semicolon character (;). They are also usually separated by a newline character to enhance readability.

A comment begins with the # sign and can appear anywhere.
Everything from the # to the end of the line is ignored by the Perl interpreter. Comments are commonly used for human-readable notes. Use comments plentifully, especially at the beginning of a script to describe what it does, at the beginning of each section of your code, and for any complex code.

Some Statements

    $sum = 2 + 2; # this is a statement
    $f = <STDIN>; $g = $f++; # these are two statements
    $g = $f
         /
         $sum; # this is one statement, spread across 3 lines

The Perl interpreter will start at the top of the script and execute all the statements, in order from top to bottom, until it reaches the end of the script. This execution order can be modified by loops and control structures.

Blocks

It is common to group statements into blocks using curly braces. You can execute the entire block conditionally, or turn it into a subroutine that can be called from many different places.

Example blocks:

    { # block starts
        my $EcoRI = 'GAATTC';
        my $sequence = <STDIN>;
        print "Sequence contains an EcoRI site" if $sequence=~/$EcoRI/;
    } # block ends

    my $sequence2 = <STDIN>;
    if (length($sequence2) < 100) { # another block starts
        print "Sequence is too small. Throw it back\n";
        exit 0;
    } # and ends

    foreach my $sequence (@sequences) { # another block
        print "sequence length = ",length($sequence),"\n";
    }

Literals

A literal is a constant value that you embed directly in the program code.
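
To make the definition concrete, here is a minimal sketch with a few numeric and string literals. The file name literals.pl and the array name @literals are made up for illustration; the values themselves are arbitrary.

```perl
#!/usr/bin/perl
# file: literals.pl (hypothetical name, for illustration only)
use strict;
use warnings;

# Each value in this list is a literal: it is written directly in the code.
my @literals = (
    42,          # an integer literal
    3.14,        # a floating-point literal
    1e3,         # scientific notation for 1000
    'GAATTC',    # a single-quoted string literal
);

# Print each literal on its own line
foreach my $value (@literals) {
    print $value, "\n";
}

# In a double-quoted string literal, \t and \n are interpreted
# as a tab and a newline.
print "tab\there\n";
```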