Modern Perl for Biologists I | the Basics M-BIM / INBTBI
Total Page:16
File Type:pdf, Size:1020Kb
Modern Perl for Biologists I | The Basics M-BIM / INBTBI Denis BAURAIN / ULiège Edition 2020–2021 Denis BAURAIN / ULiège ii Modern Perl for Biologists I | The Basics Acknowledgment This course has been developed over eight years (2013–2020) primarily as teaching materials for the Modern Perl module of the (INBTBI) Bioinformatics Training organized by the Biotechnology Training Centre in the GIGA Tower of the University of Liège (Belgium). Until 2016, the Biotechnology Training Centre was directed by Laurent Corbesier, who is warmly thanked for his support. The training itself was funded through the Bioinformatics Training Program, 6th call 2010–2014, BioWin (the Health Cluster of Wallonia), Le Forem (the Walloon Office for Employment and Training), and the Biotechnology Training Centre (Forem-GIGA). This two-part document benefitted from the feed-back of about 150 students, trainees and colleagues, among which Arnaud Di Franco, Agnieszka Misztak, Loïc Meunier and Sinaeda Anderssen are espe- cially acknowledged. If you spot typos, grammatical or technical errors, or have any suggestion on how to improve the next edition of this course, let me know ([email protected]). I will answer all messages. Finally, I thank Pierre (Pit) Tocquin for his help with the mighty triad Markdown / Pandoc / LATEX. How to read this course? This document is written in American English. The first time they are introduced, programming terms and biological terms are typeset in bold and italic, respectively, whereas computer keywords are always typeset in a monospaced font (Consolas). All these terms are indexed at most once per section in which they appear (see Index). While the main text is typeset in Palatino, user statements (either at the shell or as small code excerpts) and computer answers are typeset in the same monospaced font as computer keywords. $ echo "Hello world!" # user statement (note the shell prompt) Hello world! # computer answer Example BOX Advanced material that can be skipped on first reading is enclosed in boxes with a light blue background. A List of Boxes is available for convenience. General programming advices and good practices that apply beyond the Perl language are typeset in a smaller size and introduced by a yellow bar surmounted by an icon. 1 say <<'EOT'; 2 Complete program listings are typeset 3 in the same monospaced font as keywords. 4 Their lines are numbered. 5 EOT Denis BAURAIN / ULiège iii Modern Perl for Biologists I | The Basics Denis BAURAIN / ULiège iv Modern Perl for Biologists I | The Basics Contents Acknowledgment ......................................... iii How to read this course? ..................................... iii I Lesson 1 1 1 Introduction 3 1.1 What is Perl? ........................................ 3 1.2 What is Modern Perl? ................................... 3 1.3 Why for biologists? .................................... 4 1.4 Covered topics ....................................... 4 1.5 Homework guidelines ................................... 5 1.6 Perl Cheat Sheet ...................................... 6 2 Before beginning 9 2.1 The need for a sandbox .................................. 9 2.2 How to install our sandbox? ............................... 9 3 First steps in Perl 11 3.1 Motivation ......................................... 11 3.2 Our first killer app ..................................... 11 3.3 How to make our own rev_comp? ............................ 12 3.4 How to document our program? ............................. 14 3.5 Perl variables ........................................ 15 3.6 How to see what’s going on? ............................... 17 3.7 A closer look to our killer app ............................... 18 3.8 Basic Perl syntax ...................................... 19 3.8.1 Names ........................................ 19 3.8.2 Sigils ......................................... 20 3.8.3 Context ........................................ 20 3.8.4 Scope ......................................... 21 3.8.5 strict and warnings pragmas ........................... 23 3.9 List builtin functions .................................... 23 3.9.1 push & pop ...................................... 23 3.9.2 unshift & shift .................................. 24 3.9.3 split & join .................................... 25 4 Digging deeper into Perl 29 4.1 Another killer app ..................................... 29 4.2 The code for our own translate ............................. 30 Homework 33 Denis BAURAIN / ULiège v Modern Perl for Biologists I | The Basics CONTENTS II Lesson 2 35 5 Looking at the novelties in translate.pl 37 5.1 Shebang line ........................................ 37 5.2 Modern::Perl ....................................... 37 5.3 Perl values: Strings .................................... 38 5.3.1 Defining and concatenating strings ......................... 38 5.3.2 Alternate quoting operators ............................ 40 5.4 String builtin functions .................................. 40 5.4.1 length ........................................ 41 5.4.2 substr ........................................ 41 5.4.3 uc & lc ........................................ 42 5.5 The for loop ........................................ 43 5.5.1 foreach-style for loop ................................ 43 5.5.2 C-style for loop ................................... 45 5.6 Boolean expressions .................................... 47 5.6.1 Perl’s vision of truth ................................. 47 5.6.2 The undef value ................................... 48 5.6.3 Logical defined-or operator ............................. 48 5.7 The say builtin function .................................. 49 6 Batch vs. interactive programs 51 6.1 Acquiring user input ................................... 51 6.2 How to improve our killer app? .............................. 51 6.3 Let’s make our first game! ................................. 53 Homework 57 III Lesson 3 59 7 Becoming a control freak 61 7.1 Control flow in Perl .................................... 61 7.2 Branching directives .................................... 61 7.2.1 if & unless ..................................... 62 7.2.2 else & elsif .................................... 63 7.2.3 die and the postfix form .............................. 66 7.2.4 Interlude—defining multiline strings with the heredoc syntax . 66 7.3 Looping directives ..................................... 67 7.3.1 The while loop and its variants ........................... 68 7.3.2 Loop control directives ............................... 71 8 Other novelties in codon_quizz.pl 79 8.1 keys & shuffle ...................................... 79 8.2 Reading from the standard input stream ......................... 80 8.2.1 The readline operator ............................... 80 8.2.2 Line endings ..................................... 81 9 A first look at operators 83 9.1 What are operators? .................................... 83 9.2 Comparison operators ................................... 84 9.3 Logical operators ..................................... 85 9.4 Operator precedence and associativity .......................... 86 Denis BAURAIN / ULiège vi Modern Perl for Biologists I | The Basics CONTENTS 10 Using Perl to compute some stats 87 10.1 A killer app with a fast bite ................................ 87 10.2 The code for our own codon_usage ........................... 88 Homework 91 IV Lesson 4 93 11 Input/output in Perl 95 11.1 Reading files ........................................ 95 11.1.1 FASTA format .................................... 95 11.1.2 open & close .................................... 95 11.1.3 autodie and $! ................................... 96 11.2 Writing files ........................................ 98 12 A first look at mathematics in Perl 101 12.1 Perl values: Numbers ................................... 101 12.2 Numeric and in-place operators ............................. 102 13 More on hashes 105 13.1 Hash uses ......................................... 105 13.2 values & sort and reuse of variable names ....................... 110 14 Towards more complex programs 113 14.1 A production-grade translation tool ........................... 113 Homework 117 V Lesson 5 119 15 Looking at the novelties in xxl_xlate.pl 121 15.1 Would you like some syntactic sugar? . 121 15.1.1 Ordered hashes: Tie::IxHash . 121 15.1.2 each and list assignment .............................. 121 15.1.3 The ternary conditional operator . 122 15.2 Writing portable code ................................... 122 15.2.1 LWP::Simple ..................................... 122 15.2.2 Path::Class ..................................... 123 15.2.3 File::Basename ................................... 123 16 Regular expressions 125 16.1 What are regular expressions? .............................. 125 16.2 Defining regular expressions ............................... 125 16.3 Using regexes ....................................... 126 16.4 When not to use regexes? ................................. 127 16.4.1 Literal searches: index ............................... 128 16.4.2 Transliteration: tr/// ................................ 129 16.5 Anchors .......................................... 132 16.6 Character classes ...................................... 135 16.7 Metacharacters ....................................... 136 16.8 Quantifiers ......................................... 137 16.9 Capturing groups ..................................... 139 16.10 Non-capturing groups ................................... 142 16.11 Modifiers .........................................