
Introduction to Bioinformatics

Prof. Dr. Nizamettin AYDIN


Learning objectives

• After this lecture you should be able to understand:
– sequence, iteration and selection;
– building blocks of programming;
– three C’s: constants, comments and conditions;
– use of variable containers;
– use of some Perl operators and its pattern-matching technology;
– Perl input/output
– …

Setting The Technological Scene

• One of the objectives of this course is
– to enable students to acquire an understanding of, and ability in, a programming language (Perl, Python) as the main enabler in the development of programs in the area of Bioinformatics.
• Modern computers are organised around two main components:
– Hardware
– Software

Introduction to Computing

• Computer: electronic genius?
– NO! Electronic idiot!
– Does exactly what we tell it to, nothing more.
• All computers, given enough time and memory, are capable of computing exactly the same things.

[Figure: Supercomputer = Workstation = PDA]

Introduction to Computing

• In theory, a computer can compute anything that’s possible to compute
– given enough memory and time
• In practice, solving problems involves computing under constraints.
– time
• weather forecast, next frame of animation, ...
– cost
• cell phone, automotive engine controller, ...
– power
• cell phone, handheld video game, ...

Copyright 2000 N. AYDIN. All rights reserved.

Layers of Technology

• Operating system...
– Interacts directly with the hardware
– Responsible for ensuring efficient use of hardware resources
• Tools...
– Software that takes advantage of what the operating system has to offer.
– Programming languages, editors, interface builders...
• Applications...
– Most useful category of software
– Web browsers, email clients, web servers, word processors, etc...

Transformations Between Layers

[Figure: layers of abstraction, top to bottom: Problems, Algorithms, Language, Instruction Set Architecture, Microarchitecture, Circuits, Devices]

How do we solve a problem using a computer?

• A systematic sequence of transformations between layers of abstraction.
– Problem → Algorithm: choose algorithms and data structures
– Algorithm → Program (Programming): use language to express design
– Program → Instr Set Architecture (Compiling/Interpreting): convert language to machine instructions

Deeper and Deeper…

– Instr Set Architecture → Microarch (Processor Design): choose structures to implement ISA
– Microarch → Circuits (Logic/Circuit Design): gates and low-level circuits to implement components
– Circuits → Devices (Process Engineering & Fabrication): develop and manufacture lowest-level components

Descriptions of Each Level…

• Problem Statement
– stated using "natural language"
– may be ambiguous, imprecise
• Algorithm
– step-by-step procedure, guaranteed to finish
– definiteness, effective computability, finiteness
• Program
– express the algorithm using a computer language
– high-level language, low-level language
• Instruction Set Architecture (ISA)
– specifies the set of instructions the computer can perform
– data types, addressing mode

…Descriptions of Each Level

• Microarchitecture
– detailed organization of a processor implementation
– different implementations of a single ISA
• Logic Circuits
– combine basic operations to realize microarchitecture
– many different ways to implement a single function (e.g., addition)
• Devices
– properties of materials, manufacturability

Many Choices at Each Level

• Problem: solve a system of equations
• Algorithm: Gaussian elimination, Jacobi iteration, Red-black SOR, Multigrid
• Language: C, C++
• ISA: PowerPC, Intel, Atmel AVR
• Microarchitecture: Centrino, Pentium 4, Xeon (etc.)
• Circuits: Ripple-carry adder, Carry-lookahead adder
• Devices: CMOS, Bipolar, GaAs
• Tradeoffs: cost, performance, power

Successive over-relaxation (SOR) is a method for solving a linear system of equations.

The Computer Level Hierarchy

• Each virtual machine layer is an abstraction of the level below it.
• The machines at each level execute their own particular instructions, calling upon machines at lower levels to perform tasks as required.
• Computer circuits ultimately carry out the work.

[Figure: C, Fortran, Ada, etc. and Basic, Java → Compiler / Byte Code → Assembler → Executable → Instruction Set Architecture → HW Implementation 1, HW Implementation 2, ..., HW Implementation N]

• Software?
• Program or collection of programs.
• Enables the hardware to process data.

Programming

• Methodologies for creating computer programs that perform a desired function.
– Problem Solving
• How do we figure out what to tell the computer to do?
• Convert problem statement into algorithm, using stepwise refinement.
• Convert algorithm into machine instructions.
– Debugging
• How do we figure out why it didn’t work?
• Examining registers and memory, setting breakpoints, etc.

Time spent on the first can reduce time spent on the second!

Stepwise Refinement

• Also known as systematic decomposition.
• Start with problem statement:
• Decompose task into a few simpler subtasks.
• Decompose each subtask into smaller subtasks, and these into even smaller subtasks, etc.... until you get to the machine instruction level.

Problem Statement

• Because problem statements are written in English, they are sometimes ambiguous and/or incomplete.
– Where is “file” located?
– How big is it?
– How do I know when I’ve reached the end?
– How should final count be printed? A decimal number?
– If the character is a letter, should I count both upper-case and lower-case occurrences?
• How do you resolve these issues?
– Ask the person who wants the problem solved, or
– Make a decision and document it.

Three Basic Constructs

• There are three basic ways to decompose a task:
– Sequential: Subtask 1, then Subtask 2
– Conditional: test condition; if True do Subtask 1, if False do Subtask 2
– Iterative: test condition; while True, repeat Subtask

[Figure: a Task box decomposed into the Sequential, Conditional and Iterative flowcharts]

Sequential

• Do Subtask 1 to completion, then do Subtask 2 to completion, etc.

[Figure: "Count and print the occurrences of a character in a file" decomposed into: Get character input from keyboard → Examine file and count the number of characters that match → Print number to the screen]

Conditional

• If condition is true, do Subtask 1; else, do Subtask 2.

[Figure: "Test character. If match, increment counter." decomposed into: file char = input? True → Count = Count + 1; False → skip]

Iterative

• Do Subtask over and over, as long as the test condition is true.

[Figure: "Check each element of the file and count the characters that match." decomposed into: more chars to check? True → Check next char and count if matches (repeat); False → done]

Why Write Programs?

• Automate computer work that you do by hand
– save time & reduce errors
• Run the same analysis on lots of similar data files
• Analyze data
• Make decisions
• Create new analysis methods
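The three constructs from the preceding slides (sequential, conditional, iterative) can be combined to solve the character-counting problem used in the figures. The following is only a minimal sketch: the helper name count_char and the sample sequence are illustrative assumptions, not part of the lecture, and it counts in a string rather than in a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often $target occurs in $text.
sub count_char {
    my ($text, $target) = @_;
    my $count = 0;                           # sequential: initialise first
    foreach my $char (split //, $text) {     # iterative: visit every character
        if ($char eq $target) {              # conditional: test for a match
            $count++;
        }
    }
    return $count;
}

# Sample input is an assumption chosen for illustration.
my $occurrences = count_char("GATTACA", "A");
print "Found $occurrences occurrence(s) of A\n";
```

Each line of the helper maps onto one box of the flowcharts: initialise the counter, loop while characters remain, and increment only when the test succeeds.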

Why Perl?

• What Is Perl?
• PERL is a "Practical Extraction and Report Language"
• (or Pathologically Eclectic Rubbish Lister)
– freely available for MVS, VMS, MS/DOS, OS/2, and other operating systems.
• Perl has powerful text manipulation functions.
– It eclectically combines features and purposes of many command languages.
– Perl has enjoyed popularity for programming World Wide Web electronic forms and generally as glue and gateway between systems, databases, and users.

As a software tool: Perl

• Fairly easy to learn
• Many powerful functions for working with text: search & extract, modify, combine
• Can control other programs
• Free and available for all operating systems
• Historically one of the most popular languages in bioinformatics
• Many pre-built “modules” are available that do useful things

History

• Originally written by Larry Wall at NASA’s Jet Propulsion Labs
– to process mail on Unix systems
• Started as ‘glue’ language,
– for the use of Larry and officemates.
• It combines the best features of several languages.
• Version 1: December 18, 1987
• Current stable release (as of these slides) is Perl 5.18.2
• Perl motto: TMTOWTDI - There’s More Than One Way To Do It

Strengths of Perl

• Very easy to learn
• Very portable
– extended by a lot of people and many biologists!
• High level language
• Powerful text processing
• It’s free
• What makes Perl a good programming language for Biological data?
– Fast in file manipulation
– DBI modules provide bridge for other applications
– CGI module provides easy web interface

Getting and Installing Perl

• http://www.perl.org/
• http://www.perl.com/CPAN/
• http://www.activestate.com/
• Perl tutorials:
http://www.internetbiologists.org/IB-perl/index.html
http://learn.perl.org/library/beginning_perl/
• Bioinformatics related web pages:
http://www.geocities.com/bioinformaticsweb/index.html
http://glasnost.itcarlow.ie/~biobook/index.html

What is Perl Used For

• CGI (common gateway interface) Programming (dynamically generating web pages).
– (Example : www.amazon.com, www.slashdot.org, www.deja.com)
• Extracting data from one source and translating it to another format.
• Manipulating databases, simple search and replace operation.
• Data management in Human Genome Project
• Programming, automating administration tasks, ... etc.

Platform to Use

• http://www.perl.com/CPAN/ports/index.html
• Perl Ports (Binary Distributions)
• CPAN/ports (Comprehensive Perl Archive Network)
• Perl runs on over 100 platforms!
• Acorn | AIX | Amiga | Apple | Atari | AtheOS | BeOS | BSD | BSD/OS | Coherent | Compaq | Concurrent | Darwin | DG/UX | Digital | Digital UNIX | DEC OSF/1 | /ptx | Embedix | EMC | EPOC | FreeBSD | Fujitsu | GNU Darwin | Guardian | HP | HP-UX | IBM | IRIX | Japanese | JPerl | LynxOS | Mac OS | Mac OS X | Macintosh | MachTen | MinGW | MiNT | MorphOS | MPE/iX | MS-DOS | MVS | NetBSD | NetWare | NEWS-OS | NextStep | NonStop | NonStop-UX | Novell | ODT | Open UNIX | OpenBSD | OpenVMS | OS/2 | OS/390 | OS/400 | OSF/1 | OSR | Plan 9 | Pocket PC | PowerMAX | Psion | QNX | Reliant UNIX | RISCOS | SCO | Sequent | SGI | Sharp | Siemens | SINIX | Solaris | SONY | Sun | Syllable | Stratus | Tandem | Tivo | Tru64 | UNIX | Unixware | U/WIN | VMS | VOS | Win32 | WinCE | Windows 3.1 | Windows 95/98/Me/NT/2000/XP/VISTA/W7 W8 | z/OS | …….

Win95 / Win98 / WinME / WinNT / Win2000/W2K / WinXP (Win32)

• Starting from Perl 5.005 the Win32 support has been integrated to the Perl standard distribution. But if you insist on a binary:
• ActivePerl (Perl for Win32, Perl for ISAPI, PerlScript, Perl Package Manager)
• Apache/Perl (binaries for both Perl-5.6/Apache-1.0/mod_perl-1 and Perl-5.8/Apache-2/mod_perl-2)
• DeveloperSide.Net (compiled under VS.NET and includes the latest versions of Apache2, PHP, MySQL, OpenSSL, mod_perl, Apache::ASP, and a few other components)
• IndigoPerl (Perl for Win32, integrated Apache webserver, GUI Package Manager)
• niPerl (MSI installer, Win32::GUI, Win32::GUI::XMLBuilder, Documentation Viewer, WGX, PAR ready, built-in SciTE editor)
• PXPerl (libwin32 included, compiled with Intel C++ Compiler for maximum performance, Explorer integration (file association etc), self-configures on install with the local Visual C++ binaries)
• SiePerl for Win32 by Siemens, contains several modules
• Prebuilt Perls by Rich Megginson, a special installer is used.
• a perl environment for MS Windows.

IDEs for Perl

• Perl Application Development and Refactoring Environment (Windows, Linux, Mac OS X)
• Arachno Perl from Scriptolutions (Windows and Linux)
• multiplatform IDE has Perl plugins
• Komodo from ActiveState (Windows and Linux)
• Open Perl IDE (Windows)
• Perl Builder from Solutionsoft (Windows and Linux)
• PerlDevKit from ActiveState (IDE Windows, Linux and Solaris)
• PerlEdit from IndigoStar (Windows and Linux)
• Perl Oasis from Johan Lindström (Windows)
• PerlWiz from Arctan Computer Ventures (Windows)
• SciTE from the project (Windows and X/+)
• visiPerl+ from Help Consulting (Windows)

or editors (Perl programs are just plain text so any editor will do).
– CodeWright | GNU | Epsilon | gVim | MicroEmacs | MultiEdit | nvi | PFE | SlickEdit | UltraEdit | XEMacs

or shell environments (UNIX tool environments, tcsh and zsh are just the shell).
– Cygwin | MKS ksh | U/WIN sh | tcsh | (csh/tcsh book) zsh (zsh in general)

How to Get Help

• To get information on a particular function
    >perldoc -f function
for example
    >perldoc -f print

How to Get Help

• To search the Perl FAQ for a keyword
    >perldoc -q keyword
for example
    >perldoc -q reverse

How to Get Help

• Websites: www.perl.com, www.perlclinic.com, www.perlfaq.com, www.perl.org, www.tpj.com, www.activestate.com, www.perlarchive.com
• News groups, books

Again: What Is Perl?

• Interpreted Language ?
– (such as Basic, which needs another program called an interpreter to process the code every time you want to run the program)
• Compiled language ?
– (such as C, which uses a compiler to process the code before the code is ever run)
• Perl is in between like java:
– interpreter reads and compiles the program at once,
– not into the specific machine code,
– but into a special virtual machine code.
• It is also called bytecode.

How to Write Perl Code (WIN32)

• Form a working folder (directory)
• Open Notepad (or any text editor) and type in the perl code following the convention
• Save the file with extension pl, or plx
• In Windows you can double click on the file.
– The program will run (probably a window will appear and disappear)
• Go to MSDOS prompt.
• Change to working directory and type perl xxxx.pl
• Or you use one of the IDEs

First Perl Program

• Here is a simple program to illustrate how a Perl program looks: (Print a message to the terminal)

Code:
    #!/niPerl/bin -w
    print "merhaba \n";

• Save this file as merhaba.pl
• Run it by typing
    > perl merhaba.pl

Debugging Perl

• Debugging
– Finding the errors and fixing them.
• It is a specialized skill and it takes practice to become good at it.
• Among beginner programmers, it is common to bang your head against the keyboard for what seems like hours, only to discover the problem was actually in a completely different part of your script than where you were looking.

...Debugging...

• Perl tries its best to tell you where the error is.
– Read the error message carefully.
• Sometimes it gives multiple line errors.
• Go to the first line number and try to see the error.
• Try to look at an earlier line too,
– sometimes the error doesn’t trigger until a next statement.
• After fixing the first error, run the script again,
– the next errors may have gone away.

...Debugging...

• Use the -c switch
– to check for possible errors.
• By entering the command perl -c scriptname you make Perl try to compile your script without actually running it
– The -c switch compiles the script without invoking the warnings feature

    print "merhaba \n";

    >perl -c merhaba.pl

...Debugging...

• Syntax Errors (the closing quote is missing):

    print "merhaba \n;

    >perl -c hmerhaba1.pl

...Debugging...

• Syntax Errors (print is misspelled):

    pint "merhaba \n ";

    >perl -c hmerhaba2.pl

Syntax and Semantics

• A Perl program may be syntactically correct, but semantically wrong.
• Semantics has to do with meaning of language.
– This means that the program satisfies the rules and regulations of language but does not do what you expected it to do.

    print; "merhaba \n ";

    > perl -c hmerhaba3.pl

...Debugging...

• Use the -w switch
– to tell Perl to warn you about things it thinks might be problems in your script.

    >perl -w hmerhaba3.pl

...Debugging...

• Use the -cw switches (combination of -c and -w)
– the -cw compiles it with warnings turned on.

In either case, you can get feedback on any problems your script might have without having to actually run it, which may save you time with long scripts.

    >perl -c -w hmerhaba3.pl
    >perl -cw hmerhaba3.pl

...Debugging...

• Try to isolate the problem.
– By commenting out chunks of code, then rerunning the script, you can often narrow down where the problem is occurring.
– Even better is to avoid the need for this by building and debugging the script in small increments.
– Create a simple framework first, get it working, then add increasingly complex features on, testing each component before moving to the next.
– In this way you will uncover bugs as you go, and it will usually be obvious where the bug resides; it is in the small section of code you just added.
– While it may seem faster to code up the whole thing first, then do all the debugging at the end, it rarely works out that way.

...Debugging...

• Use the strict pragma
– Perl is a great language for writing quick, one-off scripts, in part because of its default behavior of having new variables simply spring into existence on first mention.
– This can lead to problems as your script grows, however.
– A typo in the name of a variable will mean that your script is suddenly using a new, different variable from the one you intended, which can be a real head-scratcher to debug.

In computer programming, a directive or pragma (from "pragmatic") is a language construct that specifies how a compiler (or other translator) should process its input. Directives are not part of the grammar of a programming language, and may vary from compiler to compiler.

...Debugging...

• Use the strict pragma
– By putting the following line near the top of the script:
    use strict;
• you are telling Perl that you are willing to be held to a higher standard.
– In particular, you're saying you are willing to declare all your variables before using them.
– Besides protecting you from typos in your variable names (because the script will abort with an error message during the initial compilation phase if it encounters an undeclared variable),
– this also lets you properly "scope" your variables, thereby limiting their visibility, rather than letting them be "global" variables that could conceivably interact with other variables of the same name elsewhere in the script.
– All of this translates into big savings in debugging time.
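As a rough illustration of the strict pragma described above, the sketch below declares every variable with my. The helper name total and its inputs are assumptions made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: add up a list of numbers.
sub total {
    my @numbers = @_;       # declared with 'my', so visible only inside total()
    my $sum = 0;
    foreach my $n (@numbers) {
        # A typo such as '$summ += $n;' would stop compilation here,
        # because $summ was never declared.
        $sum += $n;
    }
    return $sum;
}

print total(1, 2, 3), "\n";
```

Without use strict, the mistyped variable in the comment would silently become a new global variable and the function would keep returning 0.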

...Debugging

• Resist the temptation to attribute the problem to some previously undiscovered bug in Perl.
– Every novice Perl programmer eventually comes up against a bug that defies all efforts to identify and eradicate it.
– As the programmer's frustration level mounts, an idea begins to creep into his or her head:
• it must not be a problem in the script, but is something broken in Perl itself.
• It is almost certainly a bug in your script, not in Perl.

Return to our 1st program

• #!/niPerl/bin -w
• Every line starting with # is a comment and is ignored by Perl.
• However, # and ! together at the start of the 1st line tell UNIX how the file should be run.
– In this case the file should be passed to the Perl interpreter, which is in niPerl/bin
• Within Perl, and almost all other programming languages, each line in a program is referred to as a ‘‘statement’’.
– Perl statements end with, and are separated from any other statements by, a semicolon, that is, the ‘‘;’’ character.

...1st program

• 2nd line
    print "merhaba \n";
• print function tells perl to display the given text.
• Text inside the quotes is not interpreted as code and is called string.
• \n is used to start a new line

    #!/niPerl/bin -w
    print "merhaba \n";

Program 1

    print "Welcome to the Wonderful World of Bioinformatics!\n";

    >perl -cw welcome1.pl
    >perl welcome1.pl

Another version of welcome

    print "Welcome ";
    print "to ";
    print "the ";
    print "Wonderful ";
    print "World ";
    print "of ";
    print "Bioinformatics!";
    print "\n";

    >perl -cw welcome1.pl
    >perl welcome1.pl

Iteration (Repetition)

• Using the Perl while construct

    # The 'iter1' program - a (Perl) program,
    # which does not stop until someone presses Ctrl-C.
    use constant TRUE => 1;
    use constant FALSE => 0;
    while ( TRUE )
    {
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        sleep 1;
    }

Running forever ...

    >perl iter1.pl

    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    .
    .

Iteration

• Rather than use the word iteration or repetition to refer to this mechanism, many programmers favour the use of the word loop.
– In this context, loop is both a noun and a verb.
• Typical programmer utterances might be ‘‘the loop prints the message five times’’ or ‘‘this program loops forever’’.
– Programs that loop forever, just like the forever (iter1) program in this section, are referred to as an infinite loop.
• The infinite loop is generally regarded as a very bad thing, and programmers are encouraged not to introduce such loops into programs.

Introducing variable containers

• Two constant definitions come after the comment lines:
    use constant TRUE => 1;
    use constant FALSE => 0;
– Perl treats a value of 1 as true and a value of 0 as false.
• A constant is a container within a program whose value cannot be changed under any circumstance.
– Although not required, it is a convention to give constants all UPPERCASE names.
• Perl is case-sensitive.
– This means that when naming variables in Perl, case is significant.
• So, ‘‘TRUE’’ is a different symbol to ‘‘true’’.

Introducing variable containers

• The opposite of a constant is a variable container, or variable for short.
– A variable’s value can change over the lifetime of the program.
• When you need to change the value of an item, use a variable container.
• Perl has excellent support for all types of variable containers.
• The simplest type of variable container is the scalar.
– Scalars can hold a number, a word, a sentence or the contents of a disk-file.
• Within Perl programs, scalars are given a name prefixed with a dollar sign ($).
– Here are some example scalar names:
• $name, $_address, $programming_101, $z, $abc, $count
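The difference between a constant and a scalar variable container can be seen in a few lines. This is a small sketch only; the constant MAX_READS and the values stored in the scalars are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant MAX_READS => 3;    # hypothetical constant; UPPERCASE by convention

my $count = 0;                  # a scalar holding a number
my $name  = "first sample";     # a scalar holding a short piece of text

$count = $count + 1;            # a variable's value may change ...
$name  = "second sample";       # ... as often as the program needs

print "count=$count name=$name max=", MAX_READS, "\n";
```

Note that the constant is printed as a separate list item: unlike $count and $name, a constant is not expanded inside a double-quoted string.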

Variable containers and loops

• To demonstrate the use of variable containers within loops, a version of forever that displays ten messages and then stops can be created.

    # The 'iter2' program - a (Perl) program,
    # which stops after ten iterations.
    use constant HOWMANY => 10;
    $count = 0;
    while ( $count < HOWMANY )
    {
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        $count++;
    }

Running tentimes ...

    >perl iter2.pl

    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!
    Welcome to the Wonderful World of Bioinformatics!

Variable containers and loops

• In Perl, it is not necessary to set the value of a variable container before it’s used.
• Perl has a number of rules that are applied to the first usage of a variable container
– Perl sets a scalar to zero if it is first used within a numeric context.
– This feature can be very convenient.
– However, it is always a good idea to give variable containers an explicit starting value, as it indicates precisely what the intentions for the variable are.
• Any programmer reading the tentimes program should be in no doubt that the $count scalar is to be used within a numeric context.

Selection

• One of the basic building blocks of programming is selection.
• The use of a selection mechanism allows a program to choose one of a number of possible courses of action.
• Here’s the general form of the selection statement in Perl:

    if ( some condition is true )
    {
        do something
    }
    else
    {
        do something else
    }
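The general form of the selection statement can be turned into something runnable. The sketch below is only an illustration: the helper classify_base is a made-up name, and the string comparison operator eq is used because the values being compared are letters rather than numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: choose one of three courses of action for a DNA base.
sub classify_base {
    my ($base) = @_;
    if ( $base eq 'A' or $base eq 'G' )
    {
        return "purine";
    }
    elsif ( $base eq 'C' or $base eq 'T' )
    {
        return "pyrimidine";
    }
    else    # anything that is not A, C, G or T
    {
        return "unknown";
    }
}

print classify_base('G'), "\n";
print classify_base('T'), "\n";
print classify_base('N'), "\n";
```

The elsif branch adds a second test between if and else, so exactly one of the three blocks runs for any input.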

Using the Perl if construct

• Here is another variation on the forever program that prints the message five times.

    # The 'fivetimes' program - a (Perl) program,
    # which stops after five iterations.
    use constant TRUE => 1;
    use constant FALSE => 0;
    use constant HOWMANY => 5;
    $count = 0;
    while ( TRUE )
    {
        $count++;
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        if ( $count == HOWMANY )
        {
            last;
        }
    }

There Really Is MTOWTDI

• Where MTOWTDI stands for ‘‘more than one way to do it’’.
• This philosophy is one of the great strengths of Perl, but care is needed.
• The next examples illustrate the good and the bad of this philosophy,
• starting with a couple of not so good examples followed by a couple of much improved ones.

The oddeven program

    # The 'oddeven' program - a (Perl) program,
    # which iterates four times, printing 'odd' when $count
    # is an odd number, and 'even' when $count is an even
    # number.
    use constant HOWMANY => 4;
    $count = 0;
    while ( $count < HOWMANY )
    {
        $count++;
        if ( $count == 1 )
        {
            print "odd\n";
        }
        elsif ( $count == 2 )
        {
            print "even\n";
        }
        elsif ( $count == 3 )
        {
            print "odd\n";
        }
        else # at this point $count is four.
        {
            print "even\n";
        }
    }

The terrible program

• Here is another program that does exactly the same thing as oddeven.

    #! /usr/bin/perl -w
    # The 'terrible' program - a poorly formatted 'oddeven'.
    use constant HOWMANY => 4; $count = 0;
    while ( $count < HOWMANY ) { $count++;
    if ( $count == 1 ) { print "odd\n"; } elsif ( $count == 2 )
    { print "even\n"; } elsif ( $count == 3 ) { print "odd\n"; }
    else # at this point $count is four.
    { print "even\n"; } }

• Notice that the program statements that make up the terrible program are exactly the same as those that make up the oddeven program.
– The difference between the two programs has to do with how they are laid out, or formatted.

The terrible program

• Like a lot of modern programming languages, Perl is classified as free format.
• This means that you can write a program using whatever formatting you prefer, as perl can just as easily process a well-formatted program, such as oddeven, as it can a poorly formatted program, such as terrible.
• Do yourself and everyone else a favour, and be sure to format your programs to be as readable as possible.
• Use plenty of whitespace, blank lines and indentation to make your programs easier to read.

The oddeven2 program

• Here is another version of oddeven.
• It produces exactly the same output as both oddeven and terrible:

    #! /usr/bin/perl -w
    # The 'oddeven2' program - another version of 'oddeven'.
    use constant HOWMANY => 4;
    $count = 0;
    while ( $count < HOWMANY )
    {
        $count++;
        if ( $count % 2 == 0 )
        {
            print "even\n";
        }
        else # $count % 2 is not zero.
        {
            print "odd\n";
        }
    }

Using the modulus operator

• That percentage sign (%) is another Perl operator, the modulus operator.
• Given two numbers, ‘‘A % B’’ returns the remainder after A has been divided by B, assuming both are positive numbers.

    print 5 % 2, "\n"; # prints a '1' on a line.
    print 4 % 2, "\n"; # prints a '0' on a line.
    print 7 % 4, "\n"; # prints a '3' on a line.

The oddeven3 program

• The following code is even shorter than oddeven2.

    #! /usr/bin/perl -w
    # The 'oddeven3' program - version of 'oddeven'.
    use constant HOWMANY => 4;
    $count = 0;
    while ( $count < HOWMANY )
    {
        $count++;
        print "even\n" if ( $count % 2 == 0 );
        print "odd\n" if ( $count % 2 != 0 );
    }
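The modulus operator is useful for more than odd/even tests. As one possible illustration, the sketch below uses % 3 to label each base of a short sequence with its position inside a codon. The helper codon_position, the sample sequence and the assumption that the reading frame starts at the first base are all invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: position (1, 2 or 3) of a base within its codon,
# assuming the reading frame starts at index 0.
sub codon_position {
    my ($index) = @_;           # 0-based index into the sequence
    return $index % 3 + 1;      # remainder 0, 1, 2 becomes position 1, 2, 3
}

my $seq = "ATGGCC";             # sample sequence, an assumption
for my $i ( 0 .. length($seq) - 1 ) {
    print substr($seq, $i, 1), " is at codon position ", codon_position($i), "\n";
}
```

Because the remainder cycles through 0, 1 and 2, the positions repeat 1, 2, 3, 1, 2, 3 along the sequence.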

Processing Data Files

• The input operator in Perl is
    <>
– When Perl encounters this operator within a program, it looks for and returns a line of input from standard input,
• which is the name given to the mechanism that is currently providing input data to the program.
– Unless Perl is told otherwise, the default input mechanism is the keyboard.
• A program takes a line of data from the keyboard whenever the input operator is used.
– Consider the following program statement:
    $line = <>;
– A line is read from the keyboard and put into the $line scalar

Processing Data Files

• The following is a small program called getlines that exploits the program statement in the previous slide:

    #! /usr/bin/perl -w
    # The 'getlines' program which processes lines.
    while ( $line = <> )
    {
        print $line;
    }

• The getlines program has a condition part that uses <> to look for and return a line from standard input.
• The line, when available, is assigned to the $line scalar, which is then checked for trueness.
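The same read-a-line-in-a-loop idea used by getlines works with any filehandle, not only standard input. The sketch below is an illustration under assumptions: count_lines is a made-up helper, and it opens an in-memory filehandle on a string (available in Perl 5.8 and later) so that it can run without a keyboard or a disk-file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the lines in a piece of text by reading it
# one line at a time, just as getlines reads from standard input.
sub count_lines {
    my ($text) = @_;
    open( my $fh, '<', \$text ) or die "cannot open in-memory file: $!";
    my $n = 0;
    while ( my $line = <$fh> )    # loop ends when no more lines are available
    {
        $n++;
    }
    close $fh;
    return $n;
}

print count_lines("odd\neven\nodd\neven\n"), "\n";
```

Replacing the in-memory filehandle with <> or <STDIN> gives the keyboard-driven behaviour described on the slide.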

Processing Data Files

• It turns out that, in addition to using numerics to represent true and false, strings also have a truth value.
– A string with no characters is false (as is the string "0"), otherwise it is true.
• In addition to standard input, Perl has
– standard output,
• the default place to display normal messages,
– standard error,
• the default place to display error messages.
– Unless told otherwise, Perl uses the screen as the default for both standard output and standard error.
• To make things convenient, standard input, standard output and standard error go by the shorthand names of STDIN, STDOUT and STDERR respectively.

Processing Data Files

• Consider the following command line statement:

    > perl getlines terrible

    > perl getlines terrible welcome3
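The truth value of strings mentioned above can be checked directly. This is a small sketch; the helper name truth is an assumption, and the sample values are chosen to show the edge cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report how Perl judges a value in a condition.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print "empty string: ", truth(""),       "\n";
print "string '0':   ", truth("0"),      "\n";
print "string '0.0': ", truth("0.0"),    "\n";
print "a line:       ", truth("even\n"), "\n";
```

A line read from a file always ends in a newline character, so even a line that looks blank is a non-empty string and counts as true.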

Processing Data Files

• The getlines program uses the <> input operator.
• When used in association with a named disk-file, it uses the contents of the disk-file as input, and there can be more than one named disk file.
• Programmers refer to the list of things on the command-line that follow a program name as its command-line arguments or parameters.
• The last invocation of getlines has two command-line arguments, the word ‘‘terrible’’ and the word ‘‘welcome3’’.

Introducing Patterns

• Perl has another programming language built into it.
• This language within a language makes extensive use of Perl’s regular expression, pattern-matching technology.
• The Perl on-line documentation defines a regular expression to be simply a string that describes a pattern.
– The pattern identifies what it is hoped to match.
• The actual how of finding the pattern is taken care of by the Perl program.

Introducing Patterns

• A programming language that allows the programmer to specify what is required is often referred to as a declarative language.
– The programmer declares what’s required, and the technology works out the details.
• A programming language that allows the programmer to specify exactly how a result is to be arrived at is often referred to as a procedural language.
– The programmer defines the procedure to be followed, and the technology blindly follows the instructions.
• Most programming languages can be classified as one or the other, either declarative or procedural.
• Remarkably, Perl can be one or the other, or both.

Introducing Patterns

• For introducing regular expressions, consider the following program, called patterns:

    #! /usr/bin/perl -w
    # The 'patterns' program - introducing regular expressions.
    while ( $line = <> )
    {
        print $line if $line =~ /even/;
    }

Introducing Patterns

• The 'patterns' program is very similar to the 'getlines' program, except for the print command within the loop:

    print $line if $line =~ /even/;

• Here is the equivalent in words:
  – display the contents of the scalar called $line if and only if the scalar called $line contains the pattern "even".
• In the above statement, =~ is called the binding operator.
• The binding operator compares something (usually a scalar variable container) against a pattern.
  – For now, a pattern is defined as any sequence of characters surrounded by the forward-leaning slash character "/".
• In the example above, the pattern is the word "even".
• If $line contains the pattern "even" anywhere in the line, it is said to match.

Introducing Patterns

• When programmers refer to a character that surrounds something of interest, such as the forward-leaning slash surrounding the patterns in this section, they call that character a delimiter.
• The character delimits the something of interest.
• The "/" character is the default delimiter for regular expression patterns in Perl.
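The behaviour of the binding operator can be tried without an input file. The following is a minimal sketch, assuming a handful of made-up sample lines (the array @lines and its contents are illustrative and not part of the 'patterns' program); it keeps every line that contains "even" anywhere.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the program's input.
my @lines = ("odd one\n", "an even number\n", "seven dwarves\n", "nothing here\n");

# Collect each line in which the pattern "even" occurs anywhere.
my @matched;
foreach my $line (@lines) {
    push @matched, $line if $line =~ /even/;
}

print @matched;
```

Note that "seven dwarves" is kept as well, because the pattern is allowed to match inside a longer word.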

Running patterns ...

• To illustrate what's going on, try the following command-lines:

    > perl patterns terrible
    > perl patterns oddeven
    > perl patterns welcome2

Input/Output

• Data entering a program is referred to as its input, while data produced by a program is its output.
  – Rather than refer to (and write) "input/output", most programmers simply say "IO", which is written as I/O.
• I/O facilities are often referred to as streams.
  – It is possible to have many streams associated with a program, with some of them classed as input streams and others classed as output streams.
  – As a minimum, every Perl program has three standard streams available to it:
    • STDIN, STDOUT, and STDERR

I/O-STDIN

• The standard input (STDIN) is the default place from which data enters a program.
• Typically, STDIN is the keyboard, but it can also be a disk-file.
• To read data from STDIN, use the input operator:

    my $data = <STDIN>;

  or

    my $data = <>;

• Perl is smart enough to know that an "empty" input operator refers to STDIN by default (when no file names are given on the command line).

I/O-STDIN

    print "Enter a number: ";
    $a = <STDIN>;    # <> is okay too
    print "Enter another number: ";
    $b = <STDIN>;    # <> is okay too
    chomp $a;        # chomp: removes \n from string
    chomp $b;        # chomp: removes \n from string
    $c = $a + $b;
    print "The sum of the numbers that you entered is $c";

• If the argument of chomp (here $a) is omitted, it is assumed to be $_.
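The role of chomp is easier to see when the input is fixed. The sketch below is an illustration only: it assumes the two numbers arrive from an in-memory filehandle (the string $typed stands in for the keyboard), so it runs without waiting for anyone to type.

```perl
use strict;
use warnings;

# Simulated keyboard input: two lines, each ending in a newline.
my $typed = "12\n30\n";
open(my $in, '<', \$typed) or die "Can't open in-memory input: $!\n";

my $first  = <$in>;    # reads "12\n", newline included
my $second = <$in>;    # reads "30\n", newline included
close($in);

chomp $first;          # remove the trailing newline
chomp $second;

my $sum = $first + $second;
print "The sum of the numbers is $sum\n";
```

Replacing $in with STDIN gives the interactive version shown on the slide.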

I/O-STDIN

    print "Enter the username: ";
    $username = <STDIN>;
    chomp $username;
    if ($username =~ /ibstudent/) {
        print "Welcome IB student!\n\n";
    }
    else {
        print "Bad username, sorry!\n\n";
    }

• Syntax of if, else statement:

    if (a condition is met)
        {do something;}
    else
        {do something else;}

I/O-STDOUT

• The standard output stream (STDOUT) is the default place to which data is sent by a program.
• Typically, STDOUT is the screen, but it can also be a disk-file.
• To write data to STDOUT, use the output operator:

    print STDOUT $data;

  or

    print $data;

• Perl is smart enough to know that print sends data to STDOUT by default.

I/O-STDOUT

• STDOUT can be altered outside your program, with the "redirection" operator.
• So if you were running your Perl program and you wanted to keep the output for review instead of letting it flash by on the screen, you could redirect STDOUT with the ">" symbol and have the output sent to a file like this:

    > perl perltest.pl > output.txt

I/O-STDOUT

• Writing the output to a particular file from within a Perl program:
  – Specify the file you want to use by opening a filehandle to it.
  – Use the new filehandle in the print statement, instead of the default STDOUT filehandle.
  – When finished, close the filehandle.

    open OUTPUT, ">output.txt";
    print OUTPUT "hello world\n";
    close OUTPUT;

I/O-STDOUT

• An example:
  – The following script creates a new web page, saved in the file c:/perlex/index.html.

    open HTML, ">c:/perlex/index.html";
    print HTML "Content-Type: text/html\n\n";
    print HTML "";
    print HTML "Written by Perl!";
    print HTML "";
    close HTML;

FILEHANDLE

• A filehandle is a special type of variable that is associated with an input source or an output destination.
• It is used to tell your program where input comes from or where you want output to go.
  – open a file for reading

    open FILEHANDLE, "chromosome2";

    alternative form:

    open FILEHANDLE, "< chromosome2";

  – open a file for writing

    open FILEHANDLE, ">myprediction";

  – open a file for appending

    open FILEHANDLE, ">>mypredictions";

FILEHANDLE

• Reading from file "DNA" and copying each line to "DNAcopy":

    open IN, "DNA" or die "Can't open input file: $!\n";
    open OUT, ">DNAcopy" or die "Can't open output file: $!\n";
    while ($line = <IN>)
    {
        print OUT $line;
    }

• Syntax of die function:

    or die "output message"
    || die "output message"

  – Because || binds more tightly than or, the || form needs parentheses around the arguments of open, for example open(IN, "DNA") || die "output message".
• Variable $! describes the error, such as "file not found".

FILEHANDLE

• Reading from file "DNA" and copying each line to "DNAcopy":

    open IN, "DNA";
    open OUT, ">DNAcopy";
    while ($line = <IN>)
    {
        print OUT $line;
    }

• Syntax of while statement:

    while (something is happening)
        {do something;}
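The copy loop can be wrapped in a reusable subroutine. This is a sketch rather than the lecture's own code: it swaps in the three-argument form of open with lexical filehandles (a common modern style, not the two-argument style used on the slides), and the subroutine name copy_lines and the temporary files are assumptions made for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Copy every line of $source to $target; returns the number of lines copied.
sub copy_lines {
    my ($source, $target) = @_;
    open(my $in,  '<', $source) or die "Can't open input file $source: $!\n";
    open(my $out, '>', $target) or die "Can't open output file $target: $!\n";
    my $count = 0;
    while (my $line = <$in>) {
        print $out $line;
        $count++;
    }
    close($in)  or warn "Errors while closing $source: $!";
    close($out) or warn "Errors while closing $target: $!";
    return $count;
}

# Usage with two temporary files standing in for "DNA" and "DNAcopy".
my ($dna_fh, $dna_file) = tempfile();
print $dna_fh "ACGT\nTTGA\n";
close($dna_fh);
my (undef, $copy_file) = tempfile();

my $copied = copy_lines($dna_file, $copy_file);
print "Copied $copied lines\n";
```

Closing the output filehandle inside the subroutine ensures the copy is fully written before the caller looks at it.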

FILEHANDLE

• It is a good habit to close the things that you open.
• Close the filehandle once you are done with it.
• This will also happen automatically when your program ends.
• Output is buffered, so some of it may not reach the file until the filehandle is closed; code that depends on the finished file can fail while the file is still open.
• Also, as long as you keep files open you occupy system resources.

    close IN or warn "Errors while closing filehandle: $!";

• Syntax of warn function:

    or warn "output message"
    || warn "output message"

  – As with die, the || form needs parentheses, for example close(IN) || warn "output message".

Perl Variables

• Perl programs use variables to store data in memory.
• Perl is a typeless language that doesn't force the programmer to distinguish between the types of data stored in a variable.
• Perl provides 3 built-in variable types:
  – Scalar
  – Array
  – Hash

Scalar variables

• In Perl the most basic variable type is a scalar variable.
• It holds a single value.
• The value can be any kind of data, including, but not limited to, string, integer, float, object and reference to other variables or sets of variables.
• Scalar variables are preceded with a dollar sign ($), and their names consist of letters, digits and underscores.
• Following are all valid Perl variable assignments.

    $lang = "Perl";     # <-- string. - Notice quotes
    $version = 5.6;     # <-- float. - Notice lack of quotes
    $year = 2001;       # <-- integer.
    $x = 10;
    $value = $x + 1;
    $number_of_items = 15;
    $word = "hello";
    $text = "This is a sentence but is still a scalar";

Array variables...

• Arrays are handy when you want to store more than one item of data in a single variable, but still want to be able to refer to the items independently.
• An array in Perl is distinguished from scalar variables by the @ sign that precedes its name.
• You can initialize an array variable by giving a list of values, separated by commas, inside parentheses:

    @desimal = ("bir", "iki", "uc", ... "dokuz");   # Turkish for "one", "two", "three", ... "nine"
    @array = ( 1, 2 );
    @values = ( $x, $y, 3, 5);

Array variables...

• You can also create an arbitrary array, and later re-assign its elements using the ([ ]) operator.
  – Important: when you refer to individual elements of an array, you use the $ sign just like in scalar variables:

    $desimal[0] = "bir";
    $desimal[1] = "iki";
    $desimal[2] = "üç";
    # ...
    $desimal[8] = "dokuz";

• The number inside the [ ] is usually called the array's index, or just index.
  – In Perl array indices start at 0, not 1.
    • That's why the 10th element of an array has an index of 9.

Array variables

• Programming languages such as C/C++ require that all the elements of an array be of the same type, such as all integers, all characters, all strings.
• This is not the case with Perl.
  – You can mix all kinds of data types in an array:

    $array[0] = "apple";   # <-- string
    $array[1] = 12;        # <-- integer
    $array[2] = 3.47;      # <-- float
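Zero-based indexing and mixed element types can be checked with a few lines. The following is a small sketch; the array names @digits and @mixed are chosen for the illustration only.

```perl
use strict;
use warnings;

# Ten elements, stored at indices 0 through 9.
my @digits = (0 .. 9);
my $first = $digits[0];    # first element, index 0
my $tenth = $digits[9];    # tenth element, index 9

# A single array may hold strings, integers and floats side by side.
my @mixed = ("apple", 12, 3.47);
my $how_many = scalar @mixed;    # number of elements in @mixed

print "first=$first tenth=$tenth elements=$how_many\n";
```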

Appending elements to an array

• When you want to add an extra element to the end of the array, you will need to know the last index of the array.
• The special symbol $# can be prepended to the name of the array to get the last index number.
• For example:

    $last_index = $#desimal;
    $desimal[ $last_index + 1 ] = "on";   # "on" is Turkish for "ten"

• or

    $desimal[ $#desimal + 1 ] = "on";

• or

    push (@desimal, "on");

Hash variables...

• Perl supports hash variables, which are also known as associative arrays.
  – Arrays, because they store multiple values, just like ordinary arrays.
  – Associative, because they associate the values of the elements not with an index, but with names, also called keys.
• You generate keys yourself, and you can refer to those values with those keys.
• The distinguishing signature of a hash is a % sign, which is prepended to its name:

    %person = ();    # <-- creating an empty hash

• or

    %person = ( "l_name" => "AYDIN",
                "f_name" => "Nizamettin",
                "email"  => '[email protected]' );

Hash variables

• You can access the individual values of a hash using the {} operator.
• Just like in arrays, we use not the % sign, but $ to refer to individual values:

    $name = $person{"f_name"};
    $email = $person{"email"};

Hash-related functions

• Just like for arrays, Perl provides several built-in functions for working with hashes.
  – keys() - returns all the keys (names) of the hash as an array:

    @names = keys(%person);

  – values() - returns all the values of the hash as an array:

    @values = values(%person);

  – delete() - deletes a key/value pair from the hash:

    delete $person{email};

  – exists() - returns a true value if a specific key of the hash really exists:

    if ( exists( $person{f_name} ) ) {
        # do something accordingly...
    }
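The hash functions above can be exercised together in one short script. This is a sketch under stated assumptions: the extra key "title" is added purely for illustration, and the keys are sorted because Perl does not guarantee any particular order for keys().

```perl
use strict;
use warnings;

my %person = ( l_name => "AYDIN", f_name => "Nizamettin" );

# Add a key/value pair (the key "title" is an illustrative assumption).
$person{title} = "Prof. Dr.";

# keys() returns the keys in no guaranteed order, so sort them for display.
my @names = sort keys %person;

my $had_title = exists $person{title} ? 1 : 0;   # key present before deletion
delete $person{title};
my $has_title = exists $person{title} ? 1 : 0;   # key gone after deletion

foreach my $key (sort keys %person) {
    print "$key => $person{$key}\n";
}
```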

String & Array...

• How to compute string length?

    $length = length $variable;

  The length function returns the length of a string (number of characters).
• How to find the position of a character in a string?

    $length = rindex($variable, 'N') + 1;

  The rindex function returns the position (counted from 0) of the first 'N' from the right.
• Instead of 'rindex', 'index' can also be used:

    $q = $a . '?';           # "tag" the end of the string with '?'
    $x = index ($q, '?');    # get the position of '?'

String & Array

• Converting a string into an array:
  – Use the 'qw' operator to create an array:

    @hum_ubiq = qw(M Q I F V K T L T G K T);

    Notice that the characters have to be separated by spaces.
  – Use the 'split' function:

    $ubiquitin = 'MQIFVKTLTGKT';
    @array = split(//, $ubiquitin);

    It splits the string $ubiquitin at a separator substring defined within slashes (// defines an empty pattern, so the string is split into single characters).
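The string functions above can be compared side by side on the short ubiquitin fragment. The sketch below is illustrative; the variable names are assumptions, and join is used only to show that split loses no characters.

```perl
use strict;
use warnings;

my $ubiquitin = 'MQIFVKTLTGKT';

my $length  = length $ubiquitin;          # number of characters
my $first_k = index($ubiquitin, 'K');     # position of the first K, counted from 0
my $last_k  = rindex($ubiquitin, 'K');    # position of the last K, counted from 0

# An empty pattern splits the string into single characters.
my @residues = split(//, $ubiquitin);
my $rejoined = join('', @residues);       # glue the characters back together

print "length=$length first K at $first_k, last K at $last_k\n";
print "residues: @residues\n";
```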

Example

• This is the amino acid sequence of human ubiquitin:

    MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKE
    GIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLH
    LVLRLRGG

• The human UBC gene encodes a precursor composed of nine direct repeats of this sequence, plus an additional valine residue (V) at the C terminus.
• Using Perl, create the sequence of the precursor, calculate its length, approximate molecular weight, and the corresponding number of nucleotides in mRNA, and finally print out the results.

precursor

• Main Entry: pre·cur·sor
• Pronunciation: \pri-ˈkər-sər, ˈprē-ˌ\
• Function: noun
• Etymology: Middle English precursoure, from praecursor, from praecurrere to run before, from prae- pre- + currere to run (more at CURRENT)
• Date: 15th century
• 1 a : one that precedes and indicates the approach of another b : PREDECESSOR
  2 : a substance, cell, or cellular component from which another substance, cell, or cellular component is formed
• synonyms see FORERUNNER
• pre·cur·so·ry \-ˈkərs-rē, -ˈkər-sə-\ adjective

Example script 1

    @ubi = qw(M Q I F V K T L T G K T I T L E V E P S D T I
              E N V K A K I Q D K E G I P P D Q Q R L I F A G K Q L E
              D G R T L S D Y N I Q K E S T L H L V L R L R G G);
    @pre_ubi = (@ubi) x 9;
    push (@pre_ubi, 'V');
    $length = @pre_ubi;
    $mw = $length * 0.11;          # about 0.11 kDa (110 Da) per residue
    $RNA_length = $length * 3;
    print "The sequence of the human ubiquitin precursor is:\n@pre_ubi\n";
    print "Its length is $length amino acids.\n";
    print "Its approximate molecular weight is $mw kilodaltons.\n";
    print "It is encoded by an mRNA of approximately $RNA_length nucleotides.\n";

Example script 2

    $ubi = 'MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG';
    $pre_ubi = ($ubi x 9) . 'V';
    $length = length ($pre_ubi);
    $mw = $length * 0.11;          # about 0.11 kDa (110 Da) per residue
    $RNA_length = $length * 3;
    print "The sequence of the human ubiquitin precursor is:\n$pre_ubi\n";
    print "Its length is $length amino acids.\n";
    print "Its approximate molecular weight is $mw kilodaltons.\n";
    print "It is encoded by an mRNA of approximately $RNA_length nucleotides.\n";

Perl Operators…

• Arithmetic Operators
  – They perform some sort of mathematical function.

    Operator   Function
    +          Addition
    -          Subtraction, Negative Numbers, Unary
    *          Multiplication
    /          Division
    %          Modulus
    **         Exponent

Perl Operators…

• To use these, you will place them in your statements like a mathematical expression.
• So, if you want to store the sum of two variables in a third variable, you would write something like this:

    $adrevenue = 20;
    $sales = 10;
    $total_revenue = $adrevenue + $sales;

Perl Operators…

• Assignment Operators

    Operator   Function
    =          Normal Assignment
    +=         Add and Assign
    -=         Subtract and Assign
    *=         Multiply and Assign
    /=         Divide and Assign
    %=         Modulus and Assign
    **=        Exponent and Assign

• Increment/Decrement

    Operator   Function
    ++         Increment (Add 1)
    --         Decrement (Subtract 1)

Perl Operators…

• String Operators

    Operator   Function
    .          Concatenate Strings
    .=         Concatenate and Assign

    $a = "Hello";
    $b = "World";
    $c = $a . $b;          # $c is now "HelloWorld"
    $d = $a . " " . $b;    # $d is now "Hello World"

Perl Operators…

• Numeric Comparison
  – These operators are used to compare two numbers, but not to compare strings.
  – These operators are typically used in some type of conditional statement that executes a block of code or initiates a loop.

    Operator   Function
    ==         Equal to
    !=         Not Equal to
    >          Greater than
    <          Less than
    >=         Greater than or Equal to
    <=         Less than or Equal to

Perl Operators…

• String Comparison
  – These are similar to the numerical comparisons, but they work with strings.

    Operator   Function
    eq         Equal to
    ne         Not Equal to
    gt         Greater than
    lt         Less than
    ge         Greater than or Equal to
    le         Less than or Equal to

Perl Operators…

• String Comparison
  – The greater-than and less-than operators compare strings using alphabetical order.
  – Something that starts with "c" is greater than something that starts with "a".
  – Also, small letters are greater than capital letters.
  – Thus, "hello" is greater than "Hello".
  – The comparison follows the order of the character codes, one character at a time.

Perl Operators

• Logical Operators
  – These are often used when you need to check more than one condition.

    Operator   Function
    &&         AND
    ||         OR
    !          NOT

• So, if you want to see if a number is less than or equal to 10, and also greater than zero:

    $number = 5;
    if (($number <= 10) && ($number > 0)) {
        ...code....
    }
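The difference between the numeric and the string operators shows up as soon as numbers are compared as text. The sketch below is illustrative (the variable names are assumptions); each comparison is stored as 1 for true and 0 for false.

```perl
use strict;
use warnings;

# Numeric comparison: 10 is greater than 9.
my $numeric = (10 > 9) ? 1 : 0;

# String comparison works character by character: "1" sorts before "9",
# so the string "10" is NOT greater than the string "9".
my $as_text = ("10" gt "9") ? 1 : 0;

# Lowercase letters sort after capital letters.
my $case = ("hello" gt "Hello") ? 1 : 0;

# Combining conditions with a logical operator.
my $number = 5;
my $in_range = (($number <= 10) && ($number > 0)) ? 1 : 0;

print "numeric=$numeric as_text=$as_text case=$case in_range=$in_range\n";
```

Using gt where > was intended (or the reverse) is a common source of subtle bugs, which is why the two operator families are kept separate.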

Perl Functions...

• A function is a collection of code (statements), which can be easily configured by passing lists of arguments.
• Functions in Perl are called subroutines, and have a general syntax of:

    sub name {
        my (list of arguments) = @_;
        list of statements to execute
        return some value
    }

  – The arguments passed by the caller arrive in the special array @_.

Perl Functions...

• Let's create a simple function to compute the area of a triangle.
• This function receives 2 arguments, the triangle's width and its height, and returns the final computed value:

    sub ucgen {                                   # ucgen: triangle
        my ($genislik, $yukseklik) = @_;          # width, height
        my $alan = $genislik * $yukseklik / 2;    # area
        return $alan;
    }

• We can now call the above function with various arguments, and each time it will return a computed value for a triangle:

    $alan1 = ucgen(3, 4);
    $alan2 = ucgen(45, 38);
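The same subroutine is sketched below with English names. The name triangle_area is a choice made for this illustration; the arguments are unpacked from @_ as described above.

```perl
use strict;
use warnings;

# Area of a triangle from its width (base) and height.
sub triangle_area {
    my ($width, $height) = @_;    # arguments arrive in @_
    return $width * $height / 2;
}

my $area1 = triangle_area(3, 4);
my $area2 = triangle_area(45, 38);

print "Area 1: $area1\n";
print "Area 2: $area2\n";
```

Declaring the working variables with my keeps them local to the subroutine, so repeated calls cannot interfere with each other or with the rest of the program.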

Matching and Substitution in Perl

• The ease and power of Perl's pattern matching is one of its true strengths and a big reason why Perl is as popular as it is.
• Almost every script you write in Perl will have some kind of pattern matching operation because so often you want to seek something out, and then take an action when you find it.
• Matching and substitution are very important because this is how you do editing "on the fly".
• This is how you create content customized to your Web visitor.

Matching and Substitution in Perl

• You need to be able to open HTML templates and swap in information pertaining to your visitor.
• Matching and then substituting is just the way to do it.
• Also in many other administrative tasks, such as searching through log files or web pages for particular words or sequences, pattern matching is the way to go.
• Pattern matching, in Perl at least, is the process of looking through sections of text for particular words, letters-within-words, character sequences, numbers, strings of numbers, or HTML tags.
• These more complicated search expressions fall into the category of "regular expressions".
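Swapping visitor information into a template can be sketched with Perl's substitution operator s///. The template text, the placeholders NAME and DATE, and the values substituted for them are all assumptions made for this illustration.

```perl
use strict;
use warnings;

# A hypothetical one-line template with two placeholders.
my $template = "Hello NAME, your last visit was on DATE.\n";

# Work on a copy so the template itself stays unchanged.
my $page = $template;
$page =~ s/NAME/Ada/;       # replace the first occurrence of NAME
$page =~ s/DATE/Monday/;    # replace the first occurrence of DATE

print $page;
```

Adding the g modifier, as in s/NAME/Ada/g, would replace every occurrence instead of only the first.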

The Binding Operator...

• When you do a pattern match, you need three things:
  – the text you are searching through
  – the pattern you are looking for
  – a way of linking the pattern with the searched text
• As a simple example, let's say you want to see whether a string variable contains the text "success".
• Here's how you could write the problem in Perl:

    $word = "success";
    if ( $word =~ m/success/ ) {
        print "Found success\n";
    } else {
        print "Did not find success\n";
    }

The Binding Operator...

• The "=~" construct, called the binding operator, is what binds the string being searched with the pattern that specifies the search.
  – The binding operator links these two together and causes the search to take place.
• Next, the "m/success/" construct is the matching operator, m//, in action.
  – The "m" stands for matching to make it easy to remember. The slash characters here are the "delimiters".
    • They surround the specified pattern.
• In m/success/, the matching operator is looking for a match of the letter sequence: success.
• Generally, the matching statement returns 1 if there was a match, and a false value (the empty string) if there wasn't.

The Binding Operator

• Negative Matching
  – In some cases you are more interested in whether a pattern does not match a string rather than that it does. In this case you could write

    if ( ! ($string =~ m/search text/) ) ...

    The inner parentheses are needed because ! binds more tightly than =~.
  – but as usual, Perl makes it easier for you and offers you more than one way to do it.
• In this case, there's the "negative" binding operator, !~, so you could write this:

    if ( $string !~ m/search text/ ) ...

Matching..

• Parentheses () group pattern elements.
• An * means that the preceding character, element, or group of elements may occur zero times, one time, or many times.
• A plus + means that the preceding element or group of elements must occur at least once.
• A ? matches zero or one times.
• So:

    /fr.*nd/    matches "frnd", "friend", "front and back"
    /fr.+nd/    matches "frond", "friend", "front and back" but not "frnd"
    /10*1/      matches "11", "101", "1001", "100000001"
    /b(an)*a/   matches "ba", "bana", "banana", "banananana"
    /flo?at/    matches "flat" and "float" but not "flooat"
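The quantifier examples above can be verified mechanically. The sketch below assumes a small helper, matches (a name invented for this illustration), that turns the result of a match into 1 or 0, and runs a few of the listed patterns through it.

```perl
use strict;
use warnings;

# Return 1 if $string matches $pattern, 0 otherwise.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ $pattern ? 1 : 0;
}

# Pairs of (pattern, string) taken from the examples above.
my @checks = (
    [ qr/fr.*nd/,   "frnd"   ],
    [ qr/fr.+nd/,   "frnd"   ],
    [ qr/fr.+nd/,   "friend" ],
    [ qr/b(an)*a/,  "banana" ],
    [ qr/flo?at/,   "flooat" ],
);

foreach my $check (@checks) {
    my ($pattern, $string) = @$check;
    my $verdict = matches($pattern, $string) ? "match" : "no match";
    print "$string: $verdict\n";
}
```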

Matching...

• Square brackets [ ] match a class of single characters.

    [0123456789]   matches any single digit
    [0-9]          matches any single digit
    [0-9]+         matches any sequence of one or more digits
    [a-z]+         matches any lowercase word
    [A-Z]+         matches any uppercase word
    [ab n]*        matches the null string "", "b", any number of blanks, "nab a banana"

Matching...

• [^...] matches characters that are not "...":

    [^0-9]         matches any non-digit character

• Curly braces allow more precise specification of repeated fields. For example [0-9]{6} matches any sequence of 6 digits, and [0-9]{6,10} matches any sequence of 6 to 10 digits.
• Patterns float, unless anchored. The caret ^ (outside [ ]) anchors a pattern to the beginning, and dollar-sign $ anchors a pattern at the end, so:

    /at/           matches "at", "attention", "flat", & "flatter"
    /^at/          matches "at" & "attention" but not "flat"
    /at$/          matches "at" & "flat", but not "attention"
    /^at$/         matches "at" and nothing else
    /^at$/i        matches "at", "At", "aT", and "AT"
    /^[ \t]*$/     matches a "blank line", one that contains nothing or any combination of blanks and tabs
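Character classes, counted repetition and anchors are often combined to validate a whole string. The following sketch assumes two hypothetical helpers, is_six_digit_id and is_blank_line, written only to illustrate the patterns above.

```perl
use strict;
use warnings;

# True (1) only if the whole string is exactly six digits.
sub is_six_digit_id {
    my ($text) = @_;
    return $text =~ /^[0-9]{6}$/ ? 1 : 0;
}

# True (1) if the line holds nothing but blanks and tabs (or nothing at all).
sub is_blank_line {
    my ($line) = @_;
    return $line =~ /^[ \t]*$/ ? 1 : 0;
}

foreach my $candidate ("123456", "1234567", "12a456") {
    my $verdict = is_six_digit_id($candidate) ? "valid" : "invalid";
    print "$candidate: $verdict\n";
}
```

Without the ^ and $ anchors, [0-9]{6} would also accept "1234567", because six digits can be found inside it.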

Matching...

• The . matches any single character. Other characters simply match themselves, but the characters +?.*^$()[]{}|\ and usually / must be escaped with a backslash \ to be taken literally. Thus:

    /10.2/         matches "10Q2", "1052", and "10.2"
    /10\.2/        matches "10.2" but not "10Q2" or "1052"
    /\*+/          matches one or more asterisks
    /A:\\DIR/      matches "A:\DIR"
    /\/usr\/bin/   matches "/usr/bin"

• If a backslash precedes an alphanumeric character, this sequence takes a special meaning, typically a short form of a [ ] character class. For example, \d is the same as the [0-9] digits character class.

    /[-+]?\d*\.?\d*/ is the same as /[-+]?[0-9]*\.?\d*/

  Either of the above matches decimal numbers: "-150", "-4.13", "3.1415", "+0000.00", etc.

Matching

• A simple \s specifies "white space", the same as the character class [ \t\n\r\f] (blank, tab, newline, carriage return, form-feed). A character may be specified in hexadecimal as a \x followed by two hexadecimal digits; \x1b is the ESC character.
• A | specifies "or".

    if ($answer =~ /^y|^yes|^yeah/i ) {
        print "Affirmative!";
    }

  prints "Affirmative!" for $answer equal to "y" or "yes" or "yeah" (or "Y", "YeS", or "yessireebob, that's right").
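Because each alternative in /^y|^yes|^yeah/i is anchored only at the start, any answer that begins with "y" is accepted. The sketch below contrasts that pattern with a stricter variant that groups the alternatives and anchors both ends; the subroutine names loose_yes and strict_yes are assumptions made for this illustration.

```perl
use strict;
use warnings;

# The pattern from the slide: true for anything that starts with "y" or "Y".
sub loose_yes {
    my ($answer) = @_;
    return $answer =~ /^y|^yes|^yeah/i ? 1 : 0;
}

# A stricter variant: the whole answer must be "y", "yes" or "yeah".
sub strict_yes {
    my ($answer) = @_;
    return $answer =~ /^(y|yes|yeah)$/i ? 1 : 0;
}

foreach my $answer ("YeS", "yellow", "no") {
    print "$answer: loose=", loose_yes($answer),
          " strict=", strict_yes($answer), "\n";
}
```

Which variant is appropriate depends on how forgiving the program should be about what the user types.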

Regular Expressions

• All pattern matching in Perl is based on the concept of regular expressions.
• Regular expressions are an important part of computer science, and entire books are devoted to the topic.
• Regular expressions form a standard way of expressing almost any text pattern unambiguously.
• A mechanism to select specific strings from a set of character strings.
• A set of characters, metacharacters, and operators that define a string or group of strings in a search pattern.

Regular Expressions

• A string containing wildcard characters and operations that define a set of one or more possible strings.
• A regular expression (abbreviated as regexp, regex, or regxp, with plural forms regexps, regexes, or regexen) is a string that describes or matches a set of strings, according to certain syntax rules.
• Regular expressions are used by many text editors and utilities to search and manipulate bodies of text based on certain patterns.
• Many programming languages support regular expressions for string manipulation.

http://en.wikipedia.org/wiki/Regular_expression

http://www.regular-expressions.info/tutorial.html

http://www.english.uga.edu/humcomp/perl/regex2a.html

http://www.perl.com/doc/manual/html/pod/perlre.html