Sequence Alignment in Bioinformatics

Sequence alignment in bioinformatics Ewan Birney Sanger Centre Wellcome Trust Genome Campus Hinxton Cambridge CB SA England Email birneysangeracuk March Contents Intro duction Information ow in biology Probabilistic Finite State Machines PFSMs in bioinformatics Complex FSMs Previous use of PFSM in bioinformatics DNA comp osition mo dels prole hidden Markovmodels PFSM interpretations of pairwise sequence alignment Probabilistic mo dels of RNA Genome Mapping Gene Prediction Metho ds Other techniques Dynamite Intro duction The use of dynamic programming in bioinformatics PFSMs and dynamic programming Finding the maximum likeliho o d path Finding the total probability of observations Dynamite The Dynamite language formal denition of a dynamic programming recursion Implementations provided by Dynamite Viterbi quadratic memory alignment Viterbi score only linear memory Forwards score only linear memory Recursive linear memory alignment Serial database search pthreads Database search OneMo del BioXLG p ort Software engineering details of Dynamite Innovations in Dynamite Sp ecial states Lab els Compile time error detection by the Dynamite compiler An optimiser for dynamic programming Example Dynamite programs Dna Blo ck Aligner DBA Deriving an alignment from a D sup erp osition of protein structures Comparing two transmembrane proteins Other Dynamic Programming to olkits UNIX pattern matchers Dong and Searls Lefebvre Discussion GeneWise Intro duction The bio chemistry of premRNA splicing Current computer approaches to predicting splicing patterns PFSMs in Gene Prediction Performance of ab initio Gene Prediction programs Combining Homology with Gene Prediction Combining Probabilistic Mo dels GeneWise mo del Parameterisation Splice Site Mo dels Co don emission probabilities Insertion or Deletion errors Intron parameterisation Path scoring Flanking Regions Co ding region scoring Using the GeneWise algorithm Example of using GeneWise Evaluation of GeneWise Other evaluations of GeneWise Guigo and Agarwal The Drosophila annotation exp eriment Discussion of GeneWise Pfam a protein family database Intro duction protein prole HMMs of domains The Pfam database Requirements for Pfam as database The Pfam Database Management System Triggers on data entry Pro ductivity to ols Underlying Sequence database up date Middleware Layer Some Example families The RNA Recognition Motif Protein complexes Discussion Genome Intro duction Halfwise Worm Genome Comparison to curated worm genes Indication of annotation errors GeneWise accuracy in the worm Comparison to protein Pfam analysis Human Chromosome Comparison to curated genes Co ding density of Human vs Celegans Discussion Conclusion List of Tables Performance of Performance of Performance of Viterbi Performance of Guigo and Agarwal assessment Pfam Database Management System utilities Pfam Pro ductivitytools Blast Sensitivity Pfam across the worm Introns in the worm Potential annotation errors Halfwise accuracy Pfam across chromosome Chromosome accuracy List of Figures Information ow in Biology Simple nite state machine Simple nite state machine parameterised Non deterministic nite state machine DNA motif PFSM Alignment PFSM Simple vs Complex PFSM The two state gap mo del Basic Viterbi recursion Dynamic Programming implementation Divide and conquor recursion Dna Blo ck Aligner DBA Dna Blo ck Aligner output Transmembrane matching PFSM Transmembrane comparison output Diagram of splicing Mo del combination Gene mo del for GeneWise protein prole HMM The merging of gene prediction and homology mo dels GeneWise GeneWise An example output of GeneWise Diagram of the Pfam middleware The RRM seed alignment Preface.

Sequence Alignment in Bioinformatics

Curriculum Vitae – Prof. Anders Krogh Personal Information

Predicting Transmembrane Topology and Signal Peptides with Hidden Markov Models

Biological Sequence Analysis Probabilistic Models of Proteins and Nucleic Acids

Tporthmm : Predicting the Substrate Class Of

A Jumping Profile HMM for Remote Protein Homology Detection

Dear Delegates,History of Productive Scientiﬁc Discussions of New Challenging Ideas and Participants Contributing from a Wide Range of Interdisciplinary ﬁelds

I S C B N E W S L E T T

Reading Genomes Bit by Bit

Predicting Cellular Localization and Membrane Topology

Mrub 1325, Mrub 1326, Mrub 1327, And

Scientific Program

University of Bologna International Master Course in Bioinformatics Interdepartmental Center “Luigi Galvani” Department of Pharmacy and Biotechnology