Introduction to Bioinformatics
Introduction to Perl
Prof. Dr. Nizamettin AYDIN
[email protected]
http://www3.yildiz.edu.tr/~naydin
Copyright 2000 N. AYDIN. All rights reserved.

Learning objectives
• After this lecture you should be able to understand:
– sequence, iteration and selection;
– basic building blocks of programming;
– three C’s: constants, comments and conditions;
– use of variable containers;
– use of some Perl operators and its pattern-matching technology;
– Perl input/output
– …

Setting The Technological Scene
• One of the objectives of this course is
– to enable students to acquire an understanding of, and ability in, a programming language (Perl, Python) as the main enabler in the development of computer programs in the area of Bioinformatics.
• Modern computers are organised around two main components:
– Hardware
– Software

Introduction to Computing
• Computer: electronic genius?
– NO! Electronic idiot!
– Does exactly what we tell it to, nothing more.
• All computers, given enough time and memory, are capable of computing exactly the same things.
[Figure: Supercomputer = Workstation = PDA]
• In theory, a computer can compute anything that is possible to compute, given enough memory and time.
• In practice, solving problems involves computing under constraints:
– time (weather forecast, next frame of animation, ...)
– cost (cell phone, automotive engine controller, ...)
– power (cell phone, handheld video game, ...)

Layers of Technology
• Operating system...
– Interacts directly with the hardware
– Responsible for ensuring efficient use of hardware resources
• Tools...
– Software that takes advantage of what the operating system has to offer.
– Programming languages, databases, editors, interface builders...
• Applications...
– Most useful category of software
– Web browsers, email clients, web servers, word processors, etc...
Transformations Between Layers
[Figure: layers from top to bottom: Problems, Algorithms, Language, Instruction Set Architecture, Microarchitecture, Circuits, Devices]

How do we solve a problem using a computer?
• A systematic sequence of transformations between layers of abstraction.
– Problem → Algorithm. Software Design: choose algorithms and data structures
– Algorithm → Program. Programming: use language to express design
– Program → Instr Set Architecture. Compiling/Interpreting: convert language to machine instructions

Deeper and Deeper…
– Instr Set Architecture → Microarch. Processor Design: choose structures to implement ISA
– Microarch → Circuits. Logic/Circuit Design: gates and low-level circuits to implement components
– Circuits → Devices. Process Engineering & Fabrication: develop and manufacture lowest-level components

Descriptions of Each Level…
• Problem Statement
– stated using "natural language"
– may be ambiguous, imprecise
• Algorithm
– step-by-step procedure, guaranteed to finish
– definiteness, effective computability, finiteness
• Program
– express the algorithm using a computer language
– high-level language, low-level language
• Instruction Set Architecture (ISA)
– specifies the set of instructions the computer can perform
– data types, addressing mode

…Descriptions of Each Level
• Microarchitecture
– detailed organization of a processor implementation
– different implementations of a single ISA
• Logic Circuits
– combine basic operations to realize microarchitecture
– many different ways to implement a single function (e.g., addition)
• Devices
– properties of materials, manufacturability

Many Choices at Each Level
• Problem: solve a system of equations
• Algorithm: Gaussian elimination, Jacobi iteration, Red-black SOR, Multigrid
• Language: FORTRAN, C, C++, Java
• Instruction Set Architecture: PowerPC, Intel x86, Atmel AVR
• Microarchitecture: Centrino, Pentium 4, Xeon
• Circuits: Ripple-carry adder, Carry-lookahead adder
• Devices: CMOS, Bipolar, GaAs
• Tradeoffs: cost, performance, power (etc.)
Successive over-relaxation (SOR): a method for solving a linear system of equations.
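To make the SOR note above concrete, here is a minimal sketch of plain successive over-relaxation. It is written in Python, which the course objectives name alongside Perl, and it is not the red-black variant listed on the slide. The function name sor_solve, the relaxation factor omega = 1.25, the tolerance and the 2×2 test system are all assumptions chosen for illustration; none of them come from the lecture.

```python
def sor_solve(a, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Solve a*x = b by successive over-relaxation (illustrative sketch).

    a is a square matrix given as a list of rows, b is the right-hand side.
    omega is the relaxation factor; omega = 1.0 reduces to Gauss-Seidel.
    """
    n = len(b)
    x = [0.0] * n                      # initial guess (assumption: all zeros)
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # Sum of off-diagonal terms, using the newest values of x.
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / a[i][i]
            # Blend the old value with the Gauss-Seidel update.
            new_xi = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:           # stop once the iterates settle
            return x
    raise RuntimeError("SOR did not converge within max_iter sweeps")


# Hypothetical test system: 4x + y = 1, x + 3y = 2.
solution = sor_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

Solving the two equations by hand gives x = 1/11 and y = 7/11, so the printed values should be close to those. A direct method such as Gaussian elimination would reach the same answer in a fixed number of steps; choosing between the two is one of the algorithm-level choices the slide lists.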
The Computer Level Hierarchy
• Each virtual machine layer is an abstraction of the level below it.
• The machines at each level execute their own particular instructions, calling upon machines at lower levels to perform tasks as required.
• Computer circuits ultimately carry out the work.
• Software?
– Program or collection of programs.
– Enables the hardware to process data.
[Figure: languages (C, Fortran, Ada, etc.; Basic; Java), translation steps (Compiler, Byte Code, Assembly Language, Assembler, Interpreter, Executable), the Instruction Set Architecture, and HW Implementations 1, 2, ..., N]

Programming
• Methodologies for creating computer programs that perform a desired function.
– Problem Solving
• How do we figure out what to tell the computer to do?
• Convert problem statement into algorithm, using stepwise refinement.
• Convert algorithm into machine instructions.
– Debugging
• How do we figure out why it didn’t work?
• Examining registers and memory, setting breakpoints, etc.
Time spent on the first can reduce time spent on the second!

Stepwise Refinement
• Also known as systematic decomposition.
• Start with problem statement:
• Decompose task into a few simpler subtasks.
• Decompose each subtask into smaller subtasks, and these into even smaller subtasks, etc.... until you get to the machine instruction level.

Three Basic Constructs
• There are three basic ways to decompose a task.
[Figure: a Task box decomposed three ways, using Subtask 1 and Subtask 2 boxes and a Test condition with True and False branches]

Problem Statement
• Because problem statements are written in English, they are sometimes ambiguous and/or incomplete.
– Where is “file” located?
– How big is it?
– How do I know when I’ve reached the end?
– How should final count be printed? A decimal number?
– If the character is a letter, should I count both upper-case and lower-case occurrences?
• How do you resolve these issues?
– Ask the person who wants the problem solved, or
– Make a decision and document it.

The three constructs are Sequential, Conditional and Iterative.

Sequential
• Do Subtask 1 to completion, then do Subtask 2 to completion, etc.
• Example: "Count and print the occurrences of a character in a file" becomes
– Get character input from keyboard
– Examine file and count the number of characters that match
– Print number to the screen

Conditional
• If condition is true, do Subtask 1; else, do Subtask 2.
• Example: "Test character. If match, increment counter." becomes
– Test: file char = input?
– True: Count = Count + 1

Iterative
• Do Subtask over and over, as long as the test condition is true.
• Example: "Check each element of the file and count the characters that match." becomes
– Test: more chars to check?
– True: Check next char and count if matches.

Why Write Programs?
• Automate computer work that you do by hand
– save time & reduce errors
• Run the same analysis on lots of similar data files
• Analyze data
• Make decisions
• Create new analysis methods

Why Perl?
• Fairly easy to learn the basics
• Many powerful functions for working with text: search & extract, modify, combine
• Can control other programs
• Free and available for all operating systems
• Most popular language in bioinformatics
• Many pre-built “modules” are available that do useful things

As a software tool: Perl
• What Is Perl?
• PERL is a "Practical Extraction and Report Language"
• (or Pathologically Eclectic Rubbish Lister)
– freely available for Unix, MVS, VMS, MS/DOS, Macintosh, OS/2, Amiga, and other operating systems.
• Perl has powerful text manipulation functions.
– It eclectically combines features and purposes of many command languages.
– Perl has enjoyed popularity for programming World Wide Web electronic forms and generally as glue and gateway between systems, databases, and users.
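The character-counting task used on the Sequential, Conditional and Iterative slides can be sketched in a few lines. This is only an illustration, written in Python (which the course objectives name alongside Perl), and it is not code from the lecture. The function name count_char, the sample text, and the use of an in-memory stream in place of a real file and keyboard input are assumptions made so the example runs on its own. Matching is case-sensitive here, which is one way of "making a decision and documenting it" for the upper-case/lower-case question raised on the Problem Statement slide.

```python
import io


def count_char(stream, target):
    """Count how often `target` occurs in a text stream.

    Decision (documented, as the slides advise): matching is case-sensitive.
    """
    count = 0
    while True:                    # Iterative: repeat while chars remain
        ch = stream.read(1)        # read the next character
        if ch == "":               # empty string means end of input
            break
        if ch == target:           # Conditional: does file char match input?
            count = count + 1      # True branch: increment the counter
    return count


# Sequential: get the character, examine the data, print the result.
target = "a"                                  # stands in for keyboard input
data = io.StringIO("gattaca\nGATTACA\n")      # stands in for the file
print(count_char(data, target))
```

To count in a real file, the same function could be called on a handle opened with `open(path)`, where `path` is whatever file name the problem statement settles on.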
History
• Originally written by Larry Wall at NASA’s Jet Propulsion Labs
– to process mail on Unix systems
– extended by a lot of people and many biologists!
• Started as ‘glue’ language,
– for the use of Larry and officemates.
• It combines the best features of several languages.
• Version 1: December 18, 1987
• Current stable release (at the time of writing) is Perl 5.18.2
• Perl motto: TMTOWTDI (There’s More Than One Way To Do It)

Strengths of Perl
• Very easy to learn
• Very portable
• High level language
• Powerful text processing
• It’s free
• What makes Perl a good programming language for Biological data?
– Fast in file manipulation
– DBI modules provide database bridge for other applications
– CGI module provides easy web interface

Getting and Installing Perl
• http://www.perl.org/
• http://www.perl.com/CPAN/
• http://www.activestate.com/
• Perl tutorials:
– http://www.internetbiologists.org/IB-perl/index.html
– http://learn.perl.org/library/beginning_perl/
• Bioinformatics related web pages:
– http://www.geocities.com/bioinformaticsweb/index.html
– http://glasnost.itcarlow.ie/~biobook/index.html

What is Perl Used For
• CGI (common gateway interface) Programming (dynamically generating web pages).
– (Example websites: www.amazon.com, www.slashdot.org, www.deja.com)
• Extracting data from one source and translating it to another format.
• Manipulating databases, simple search and replace operation.
• Data management in Human Genome Project
• Internet programming, automating administration tasks, etc.

Which Platform to Use
• http://www.perl.com/CPAN/ports/index.html
• Perl Ports (Binary Distributions)

Win95 / Win98 / WinME / WinNT / Win2000/W2K / WinXP (Win32)
• Starting from Perl 5.005 the Win32 support has been integrated to the Perl standard source code distribution. But if you