Bioinformatics I: Sequence Analysis and Phylogenetics

Lecture Notes
Institute of Bioinformatics, Johannes Kepler University Linz

Bioinformatics I
Sequence Analysis and Phylogenetics
Winter Semester 2016/2017
by Sepp Hochreiter

Institute of Bioinformatics
Johannes Kepler University Linz
A-4040 Linz, Austria
Tel. +43 732 2468 4520
Fax +43 732 2468 4539
http://www.bioinf.jku.at

© 2008 Sepp Hochreiter

This material, whether in printed or electronic form, may be used for personal and educational use only. Any reproduction of this manuscript, in whole or in part, in printed or electronic form, requires explicit prior acceptance of the author.

Legend

(!): explained later in the text, forward reference
italic: important term (in most cases explained)

Literature

D. W. Mount, Bioinformatics: Sequence and Genome Analysis, CSHL Press, 2001.
D. Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology, Cambridge Univ. Press, 1999.
R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis, Cambridge Univ. Press, 1998.
M. Waterman, Introduction to Computational Biology, Chapman & Hall, 1995.
Setubal and Meidanis, Introduction to Computational Molecular Biology, PWS Publishing, 1997.
Pevzner, Computational Molecular Biology, MIT Press, 2000.
J. Felsenstein, Inferring Phylogenies, Sinauer, 2004.
W. Ewens, G. Grant, Statistical Methods in Bioinformatics, Springer, 2001.
M. Nei, S. Kumar, Molecular Evolution and Phylogenetics, Oxford, 2000.
Blast: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html

Contents

1 Biological Basics
  1.1 The Cell
  1.2 Central Dogma of Molecular Biology
  1.3 DNA
  1.4 RNA
  1.5 Transcription
    1.5.1 Initiation
    1.5.2 Elongation
    1.5.3 Termination
  1.6 Introns, Exons, and Splicing
  1.7 Amino Acids
  1.8 Genetic Code
  1.9 Translation
    1.9.1 Initiation
    1.9.2 Elongation
    1.9.3 Termination
  1.10 Folding

2 Bioinformatics Resources
  2.1 Data Bases
  2.2 Software
  2.3 Articles

3 Pairwise Alignment
  3.1 Motivation
  3.2 Sequence Similarities and Scoring
    3.2.1 Identity Matrix
    3.2.2 PAM Matrices
    3.2.3 BLOSUM Matrices
    3.2.4 Gap Penalties
  3.3 Alignment Algorithms
    3.3.1 Global Alignment: Needleman-Wunsch
      3.3.1.1 Linear Gap Penalty
      3.3.1.2 Affine Gap Penalty
      3.3.1.3 KBand Global Alignment
    3.3.2 Local Alignment: Smith-Waterman
    3.3.3 Fast Approximations: FASTA, BLAST and BLAT
      3.3.3.1 FASTA
      3.3.3.2 BLAST
      3.3.3.3 BLAT
  3.4 Alignment Significance
    3.4.1 Significance of HSPs
    3.4.2 Significance of Perfect Matches

4 Multiple Alignment
  4.1 Motivation
  4.2 Multiple Sequence Similarities and Scoring
    4.2.1 Consensus and Entropy Score
    4.2.2 Tree and Star Score
    4.2.3 Weighted Sum of Pairs Score
  4.3 Multiple Alignment Algorithms
    4.3.1 Exact Methods
    4.3.2 Progressive Algorithms
      4.3.2.1 ClustalW
      4.3.2.2 TCoffee
    4.3.3 Other Multiple Alignment Algorithms
      4.3.3.1 Center Star Alignment
      4.3.3.2 Motif- and Profile-based Methods
      4.3.3.3 Probabilistic and Model-based Methods
      4.3.3.4 Divide-and-conquer Algorithms
  4.4 Profiles and Position Specific Scoring Matrices

5 Phylogenetics
  5.1 Motivation
    5.1.1 Tree of Life
    5.1.2 Molecular Phylogenies
    5.1.3 Methods
  5.2 Maximum Parsimony Methods
    5.2.1 Tree Length
    5.2.2 Tree Search
      5.2.2.1 Branch and Bound
      5.2.2.2 Heuristics for Tree Search
        5.2.2.2.1 Stepwise Addition Algorithm
        5.2.2.2.2 Branch Swapping
        5.2.2.2.3 Branch and Bound Like
    5.2.3 Weighted Parsimony and Bootstrapping
    5.2.4 Inconsistency of Maximum Parsimony
  5.3 Distance-based Methods
    5.3.1 UPGMA
    5.3.2 Least Squares
    5.3.3 Minimum Evolution
    5.3.4 Neighbor Joining
    5.3.5 Distance Measures
      5.3.5.1 Jukes Cantor
      5.3.5.2 Kimura
      5.3.5.3 Felsenstein / Tajima-Nei
      5.3.5.4 Tamura
      5.3.5.5 Hasegawa (HKY)
      5.3.5.6 Tamura-Nei
  5.4 Maximum Likelihood Methods
  5.5 Examples

A Amino Acid Characteristics

B A*-Algorithm

C Examples
  C.1 Pairwise Alignment
    C.1.1 PAM Matrices
    C.1.2 BLOSUM Matrices
    C.1.3 Global Alignment: Needleman-Wunsch
      C.1.3.1 Linear Gap Penalty
      C.1.3.2 Affine Gap Penalty
    C.1.4 Local Alignment: Smith-Waterman
      C.1.4.1 Linear Gap Penalty
      C.1.4.2 Affine Gap Penalty
  C.2 Phylogenetics
    C.2.1 UPGMA
    C.2.2 Neighbor Joining

List of Figures

1.1 Prokaryotic cells of bacterium and cyanophyte (photosynthetic bacteria).
1.2 Eukaryotic cell of a plant.
1.3 Cartoon of the “human genome project”.
1.4 Central dogma is depicted.
1.5 The deoxyribonucleic acid (DNA) is depicted.
1.6 The 5 nucleotides.
1.7 The hydrogen bonds between base pairs.
1.8 The base pairs in the double helix.
1.9 The DNA is depicted in detail.
1.10 The storage of the DNA in the nucleus.
1.11 The storage of the DNA in the nucleus as cartoon.
1.12 The DNA is right-handed.
1.13 The difference between RNA and DNA is depicted. ..