Protein Sequence-Structure Relationships Sumudu Pamoda Leelananda Iowa State University

Total Page:16

File Type:pdf, Size:1020Kb

Protein Sequence-Structure Relationships Sumudu Pamoda Leelananda Iowa State University Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2011 Protein sequence-structure relationships Sumudu Pamoda Leelananda Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Biochemistry, Biophysics, and Structural Biology Commons Recommended Citation Leelananda, Sumudu Pamoda, "Protein sequence-structure relationships" (2011). Graduate Theses and Dissertations. 10344. https://lib.dr.iastate.edu/etd/10344 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Protein sequence-structure relationships by Sumudu Pamoda Leelananda A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Biophysics Program of Study Committee Robert L. Jernigan, Major Professor Drena Dobbs David Fernandez-Baca Mark Hargrove Andrzej Kloczkowski Edward Yu Iowa State University Ames, Iowa 2011 Copyright © Sumudu Pamoda Leelananda, 2011. All rights reserved. ii DEDICATION To my dear parents for their unconditional love and support iii TABLE OF CONTENTS ACKNOWLEDGEMENTS ........................................................................................... vi ABSTRACT................................................................................................................... viii CHAPTER 1: INTRODUCTION .................................................................................. 1 1.1 Designability ........................................................................................................................................... 1 1.2 Contact potentials .................................................................................................................................. 4 1.3 A guide to the present study .................................................................................................................. 5 CHAPTER 2: DESIGNABILITY OF LATTICE MODELS OF PROTEINS .......... 8 2.1 Topology and designability of lattice conformations .......................................................................... 9 2.1.1 Methods 10 2.1.1.1 Enumeration of compact conformations ................................................................................. 10 2.1.1.2 Calculation of energy of conformations ................................................................................. 11 2.1.1.3 Calculation of designability .................................................................................................... 15 2.1.1.4 Generation of contact graphs .................................................................................................. 15 2.1.1.5 Regression analysis ................................................................................................................ 17 2.1.1.5 Naïve Bayes prediction........................................................................................................... 17 2.1.2 Results 19 2.1.3 Discussion 24 2.1.3.1 Application to larger lattices and real protein structures ........................................................ 27 2.2 Sequence space diversity ..................................................................................................................... 28 2.2.1 Methods 28 2.2.2 Results 31 2.2.2.1 Extended set of conformations ............................................................................................... 31 2.2.2.2 Degeneracy ............................................................................................................................. 34 2.2.2.3 Reduced set of conformations ................................................................................................ 36 2.2.3 Discussion 37 iv 2.3 Use of machine learning algorithms ................................................................................................... 39 2.3.1 Distinguishing between designable and non-designable sequences 39 2.3.1.1 Methods .................................................................................................................................. 39 2.3.1.2 Results .................................................................................................................................... 41 2.3.1.3 Discussion .............................................................................................................................. 42 2.3.2 Application to real proteins 44 2.3.2.1 Methods .................................................................................................................................. 44 2.3.2.2 Results .................................................................................................................................... 45 2.3.2.3 Discussion .............................................................................................................................. 45 2.4 A preliminary study of disordered proteins using a simple lattice .................................................. 46 2.4.1 Methods 47 2.4.2 Results 47 2.4.3 Discussion 51 CHAPTER 3: DESIGNABILITY OF REAL PROTEIN STRUCTURES .............. 53 3.1 Thermodynamic stability of highly designable structures ............................................................... 54 3.1.1 Methods 54 3.1.2 Results 56 3.1.3 Discussion 57 3.2 Designability of off-lattice models of real proteins ............................................................................ 58 3.2.1 Methods 58 3.2.1.1 Dihedral angles ....................................................................................................................... 60 3.2.2 Results 61 3.2.3 Discussion 62 3.3 Topology and designability of real proteins ....................................................................................... 64 3.3.1 Methods 64 3.3.1.1 Selection of datasets ............................................................................................................... 64 3.3.1.2 Generation of contact graphs and calculation of designability ............................................... 69 3.3.1.3 Naïve Bayes prediction........................................................................................................... 70 3.3.2 Results 70 3.3.3 Discussion 76 3.4 Sequence space diversity ..................................................................................................................... 80 3.4.1 Methods 80 v 3.4.2 Results 81 3.4.3 Discussion 86 3.5 Eigenvalues of contact graphs and designability ............................................................................... 87 3.5.1 Methods 88 3.5.2 Results 88 3.5.3 Discussion 91 3.6 Use of machine learning algorithms ................................................................................................... 91 3.6.1 Distinguishing between designable and non-designable sequences 92 3.6.1.1 Methods .................................................................................................................................. 92 3.6.1.2 Results .................................................................................................................................... 92 3.6.1.3 Discussion .............................................................................................................................. 93 3.6.2 Application to real proteins 93 3.6.2.1 Methods .................................................................................................................................. 93 3.6.2.2 Results .................................................................................................................................... 93 3.6.2.3 Discussion .............................................................................................................................. 94 CHAPTER 4: FOUR-BODY OPTIMIZED POTENTIALS ..................................... 95 4.1 Methods................................................................................................................................................. 96 4.1.1 Geometric construction of four bodies 96 4.2 Results ................................................................................................................................................. 101 4.2.1 Performance of different individual potential functions 101 4.2.2 Performance of the optimized potential 103 4.3 Discussion ........................................................................................................................................... 105 4.4 Participation in CASP9 ..................................................................................................................... 110 CHAPTER 5: SUMMARY AND CONCLUSIONS ................................................. 112 APPENDIX ................................................................................................................... 117 REFERENCES ............................................................................................................ 118 vi ACKNOWLEDGEMENTS I would like to take this opportunity to thank my advisor, Dr. Robert L. Jernigan, who has been guiding me from the day I joined the Jernigan Lab in 2008. It has been a pleasure to work with him. He has always
Recommended publications
  • Proteome-Scale Prediction of Structure and Func- Tion
    Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press The proteome folding project: proteome-scale prediction of structure and func- tion Kevin Drew 1, Patrick Winters 1, Glenn L. Butterfoss 1, Viktors Berstis 2, Keith Uplinger 2, Jonathan Armstrong 2, Michael Riffle 3, Erik Schweighofer 4, Bill Bovermann 2, David R. Good- lett 5, Trisha N. Davis 3, Dennis Shasha 6, Lars Malmström 7, Richard Bonneau 1,4,6 * 1 Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003, USA 2 IBM, Austin, TX, 78758, USA 3 Department of Biochemistry, Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA 4 Institute for Systems Biology, Seattle, WA 98103, USA 5 Medicinal Chemistry Department, University of Washington, Seattle, WA 98195, USA 6 Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, New York, 10003, USA 7 Institute of Molecular Systems Biology, ETH Zurich, Zurich CH 8093, Switzerland * To whom correspondence should be addressed. Richard Bonneau, PhD. New York University Center for Genomics and Systems Biology 100 Washington Sq. E. 1009 Main Building New York, NY 10003 Voice: 646 460 3026 Email: [email protected] Running title: proteome folding Keywords: proteome, de novo folding, function prediction, Rosetta, grid computing 1 Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press 2 Downloaded from genome.cshlp.org on October 5, 2021 - Published by Cold Spring Harbor Laboratory Press Abstract: The incompleteness of proteome structure and function annotation is a critical problem for biol- ogists and, in particular, severely limits interpretation of high-throughput and next-generation experiments.
    [Show full text]
  • The History of the CATH Structural Classification of Protein Domains
    Accepted Manuscript The History of the CATH Structural Classification of Protein Domains Ian Sillitoe, Natalie Dawson, Janet Thornton, Christine Orengo PII: S0300-9084(15)00251-5 DOI: 10.1016/j.biochi.2015.08.004 Reference: BIOCHI 4793 To appear in: Biochimie Received Date: 1 July 2015 Accepted Date: 1 August 2015 Please cite this article as: I. Sillitoe, N. Dawson, J. Thornton, C. Orengo, The History of the CATH Structural Classification of Protein Domains, Biochimie (2015), doi: 10.1016/j.biochi.2015.08.004. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. ACCEPTED MANUSCRIPT The History of the CATH Structural Classification of Protein Domains Ian Sillitoe*, Natalie Dawson, Janet Thornton and Christine Orengo University College London Darwin Building, Gower Street, University College London, WC1E 6BT, UK * Corresponding author: Email: [email protected] Phone: +442076792171 MANUSCRIPT ACCEPTED ACCEPTED MANUSCRIPT Abstract This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains.
    [Show full text]
  • An Analysis of Intron Positions in Relation to Protein Structure, Function, and Evolution
    An Analysis of Intron Positions in Relation to Protein Structure, Function, and Evolution Gordon Stuart Whamond Biomolecular Structure and Modelling Unit Department of Biochemistry and Molecular Biology University College London and European Bioinformatics Institute Hinxton Cambridge A thesis submitted to the University of London in the Faculty of Science for the degree of Doctor of Philosophy September 2004 ProQuest Number: U642089 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. uest. ProQuest U642089 Published by ProQuest LLC(2015). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code. Microform Edition © ProQuest LLC. ProQuest LLC 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, Ml 48106-1346 Abstract The genes of most eukaryotic organisms are organised into coding (exon) and non­ coding (intron) sequences. Since their discovery, introns have been the subject of a considerable amount of research focused on trying to locate correlations be­ tween intron positions and protein structure. However, in many cases these studies have suffered from using small datasets. In recent years, the amount of available nucleotide sequence data, protein sequence data, and protein structure data has rapidly increased, allowing more reliable and accurate conclusions to be drawn from contemporaneous analyses. The work presented in this thesis is an investigation of intron positions in relation to protein sequence, structure, and function.
    [Show full text]
  • Dompro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Netw
    DOMpro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Networks Jianlin Cheng [email protected] Michael J. Sweredoski [email protected] Pierre Baldi [email protected] Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California Irvine, Irvine, CA 92697, USA Abstract. Protein domains are the structural and functional units of proteins. The abil- ity to parse protein chains into different domains is important for protein classification and for understanding protein structure, function, and evolution. Here we use machine learning algorithms, in the form of recursive neural networks, to develop a protein domain predictor called DOMpro. DOMpro predicts protein domains using a combination of evolutionary infor- mation in the form of profiles, predicted secondary structure, and predicted relative solvent accessibility. DOMpro is trained and tested on a curated dataset derived from the CATH database. DOMpro correctly predicts the number of domains for 69% of the combined dataset of single and multi-domain chains. DOMpro achieves a sensitivity of 76% and specificity of 85% with respect to the single-domain proteins and sensitivity of 59% and specificity of 38% with respect to the two-domain proteins. DOMpro also achieved a sensitivity and specificity of 71% and 71% respectively in the Critical Assessment of Fully Automated Structure Prediction 4 (CAFASP-4) (Fischer et al., 1999; Saini and Fischer, 2005) and was ranked among the top ab initio domain predictors. The DOMpro server, software, and dataset are available at http://www.igb.uci.edu/servers/psss.html. Keywords: protein structure prediction, domain, recursive neural networks 1.
    [Show full text]
  • Understanding the Relationship Between Enzyme Structure and Catalysis
    Understanding the Relationship Between Enzyme Structure and Catalysis Alex Gutteridge∗ Darwin College, Cambridge This dissertation is submitted for the degree of Doctor of Philosophy October 15, 2005 ∗EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD. Preface This dissertation is the result of my own work, and includes nothing which is the outcome of work done in collaboration, except where specifically indicated in the text. The length of this dissertation does not exceed the word limit specified by the Graduate School of Biological, Medical and Veterinary Sciences. i Abstract The three dimensional structure of an enzyme is of fundamental importance to almost every function it performs. In this thesis, we aim to analyse the structures of many enzymes in order to gain insights into how they perform catalysis, and the role of structure in enzyme function. As well as improving our understanding of these important biological molecules, these insights will help in developing new tools for annotating enzyme structures of unknown function and for designing novel enzymes. Predicting the location of the active site, and the identity of the catalytic residues, is an important first step for annotating an enzyme of unknown function. We have developed a neural network trained to distinguish catalytic and non-catalytic residues based on a mixture of sequence and structural parameters. We find that the correct location of the active site can be predicted in ∼70% of cases. We also find that the most important factor in making a prediction is the conservation score of each residue. However, including structural data does improve the predictions that are made over those made on the basis of conservation alone.
    [Show full text]
  • HMM Based Approach for Classifying Protein Structures
    International Journal of BioBioBio-Bio ---Science and BioBio----Technology Vol. 111,1, No. 111,1,,, DecemberDecember,, 2002009999 HMM based approach for classifying protein structures Georgina Mirceva 1 and Danco Davcev 2 Faculty of Electrical Engineering and Information Technologies University Ss. Cyril and Methodius, Skopje, Macedonia [email protected], [email protected] Abstract To understand the structure-to-function relationship, life sciences researchers and biologists need to retrieve similar structures from protein databases and classify them into the same protein fold. With the technology innovation the number of protein structures increases every day, so, retrieving structurally similar proteins using current structural alignment algorithms may take hours or even days. Therefore, improving the efficiency of protein structure retrieval and classification becomes an important research issue. In this paper we propose novel approach which provides faster classification (minutes) of protein structures. We build separate Hidden Markov Model (HMM) for each class. In our approach we align tertiary structures of proteins. Viterbi algorithm is used to find the most probable path to the model. We have compared our approach against an existing approach named 3D HMM, which also performs alignment of tertiary structures of proteins by using HMM. The results show that our approach is more accurate than 3D HMM. Keywords : Protein Data Bank (PDB), protein classification, Structural Classification of Proteins (SCOP), Hidden Markov Model (HMM), 3D HMM. 1. Introduction To understand the structure-to-function relationship, life sciences researchers and biologists need to retrieve similar structures from protein databases and classify them into the same protein fold. The structure of a protein molecule is the main factor which determines its chemical properties as well as its function.
    [Show full text]
  • Exploring the Function and Evolution of Proteins Using Domain Families
    Exploring the Function and Evolution of Proteins Using Domain Families Adam James Reid Research Department of Structural and Molecular Biology University College London A thesis submitted to University College London for the degree of Doctor of Philosophy January 2009 1 Declaration I, Adam James Reid, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. Adam James Reid January 2009 2 Abstract Proteins are frequently composed of multiple domains which fold independently. These are often evolutionarily distinct units which can be adapted and reused in other proteins. The classification of protein domains into evolutionary families facilitates the study of their evolution and function. In this thesis such classifications are used firstly to examine methods for identifying evolutionary relationships (homology) between protein domains. Secondly a specific approach for predicting their function is developed. Lastly they are used in studying the evolution of protein complexes. Tools for identifying evolutionary relationships between proteins are central to computational biology. They aid in classifying families of proteins, giving clues about the function of proteins and the study of molecular evolution. The first chapter of this thesis concerns the effectiveness of cutting edge methods in identifying evolutionary relationships between protein domains. The identification of evolutionary relationships between proteins can give clues as to their function. The second chapter of this thesis concerns the development of a method to identify proteins involved in the same biological process. This method is based on the concept of domain fusion whereby pairs of proteins from one organism with a concerted function are sometimes found fused into single proteins in a different organism.
    [Show full text]
  • BGRS\SB-2018) the Eleventh International Conference
    Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences Novosibirsk State University BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE\SYSTEMS BIOLOGY (BGRS\SB-2018) The Eleventh International Conference Abstracts 20–25 August, 2018 Novosibirsk, Russia Novosibirsk ICG SB RAS 2018 УДК 575 B60 Bioinformatics of Genome Regulation and Structure\Systems Biology (BGRS\SB-2018) : The Eleventh International Conference (20–25 Aug. 2018, Novosibirsk, Russia); Abstracts / Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences; Novosibirsk State University. – Novosibirsk: ICG SB RAS, 2018. – 267 pp. – ISBN 978-5-91291-036-4. ISBN 978-5-91291-036-4 © ICG SB RAS, 2018 International program committee • Nikolai Kolchanov, Institute of Cytology and Genetics of SB RAS, Novosibirsk, Russia (Conference Chair) • Ralf Hofestädt, University of Bielefeld, Germany • Mikhail Fedoruk, Novosibirsk State University, Novosibirsk, Russia • Lubomir Aftanas, State Scientific-Research Institute of Physiology & Basic Medicine, Novosibirsk, Russia • Ivo Grosse, Halle-Wittenberg University, Halle, Germany • Olga Lavrik, Institute of Chemical Biology and Fundamental Medicine of SB RAS, Novosibirsk, Russia • Dmitry Afonnikov, Institute of Cytology and Genetics of SB RAS, Novosibirsk, Russia • Andrey Lisitsa, IBMC, Moscow, Russia • Mikhail Voevoda, Institute of Internal Medicine and Preventive Medicine – ICG SB RAS, Russia • Vladimir Ivanisenko, Institute of Cytology and Genetics of SB RAS, Novosibirsk, Russia • Sergey
    [Show full text]
  • The History of the CATH Structural Classification of Protein Domains
    Biochimie 119 (2015) 209e217 Contents lists available at ScienceDirect Biochimie journal homepage: www.elsevier.com/locate/biochi Review The history of the CATH structural classification of protein domains * Ian Sillitoe , Natalie Dawson, Janet Thornton, Christine Orengo University College London, Darwin Building, Gower Street, WC1E 6BT, UK article info abstract Article history: This article presents a historical review of the protein structure classification database CATH. Together Received 1 July 2015 with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more Accepted 1 August 2015 than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to Available online 4 August 2015 capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive Keywords: function annotation resources has also meant that domain families can be functionally annotated Protein structure allowing insights into functional divergence and evolution within protein families. Structure classification © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Contents 1. Historical background . ................................................ 209 2. Structural approaches used to recognize fold similarities and homologues . .................... 211 3. Domain recognition . ................................................ 212 4. Superfolds and the likely existence of limited folding arrangements in nature . .................... 212 5. Expansion of SCOP and CATH with predicted structures to further explore the evolution of domains . ............... 214 6. Exploiting domain structure superfamilies in CATH to examine the evolution of protein functions . ................... 214 7.
    [Show full text]
  • Measuring Uncertainty of Protein Secondary Structure
    Wright State University CORE Scholar Browse all Theses and Dissertations Theses and Dissertations 2011 Measuring Uncertainty of Protein Secondary Structure Alan Eugene Herner Wright State University Follow this and additional works at: https://corescholar.libraries.wright.edu/etd_all Part of the Computer Engineering Commons, and the Computer Sciences Commons Repository Citation Herner, Alan Eugene, "Measuring Uncertainty of Protein Secondary Structure" (2011). Browse all Theses and Dissertations. 422. https://corescholar.libraries.wright.edu/etd_all/422 This Dissertation is brought to you for free and open access by the Theses and Dissertations at CORE Scholar. It has been accepted for inclusion in Browse all Theses and Dissertations by an authorized administrator of CORE Scholar. For more information, please contact [email protected]. MEASURING UNCERTAINTY OF PROTEIN SECONDARY STRUCTURE A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy By Alan Eugene Herner B.A. Wright State University, 1980 M.S. Wright State University, 1980 M.S. Wright State University, 2001 _____________________________________________ 2011 Wright State University COPYRIGHT BY Alan E. Herner 2011 WRIGHT STATE UNIVERSITY SCHOOL OF GRADUATE STUDIES January 7, 2011 I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY ALAN E. HERNER ENTITLED Measuring Uncertainty of Protein Secondary Structure BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY. ________________________ Michael L. Raymer, PhD Dissertation Director ________________________ Arthur A. Goshtasby, PhD Director, Computer Science and Engineering PhD Program ________________________ Andrew Hsu, PhD Dean School of Graduate Studies Committee Final Examination ____________________ Michael L. Raymer, PhD ____________________ Gerald Alter, PhD ____________________ Travis Doom, PhD ____________________ Ruth Pachter, PhD _____________________ Mateen Rizki, PhD ABSTRACT Herner, Alan E.
    [Show full text]
  • Insights Into the Evolution of Protein Domains Give Rise to Improvements of Function Prediction
    Insights into the Evolution of Protein Domains give rise to Improvements of Function Prediction Dissertation zur Erlangung des naturwissenschaftlichen Doktorgrades der Bayrischen Julius-Maximilians-Universität Würzburg vorgelegt von Birgit Pils aus Bad Soden im Taunus Würzburg 2005 Eingereicht am: ....................................................................................................... Mitglieder der Promotionskommission: Vorsitzender: ........................................................................................................... Gutachter: Prof. Dr. Jörg Schultz Gutachter: Prof. Dr. Jürgen Kreft Tag des Promotionskolloquiums: ........................................................................... Doktorurkunde ausgehändigt am: ........................................................................... II ERKLÄRUNG Hiermit erkläre ich ehrenwörtlich, daß ich die vorliegende Dissertation selbständig angefertigt und keine anderen als die angegebenen Quellen und Hilfsmittel verwendet habe. Die Dissertation wurde bisher weder in gleicher noch ähnlicher Form in einem anderen Prüfungsverfahren vorgelegt. Außer dem Diplom in Biochemie von der Universität Witten/Herdecke habe ich bisher keine weiteren akademischen Grade erworben oder versucht zu erwerben. Würzburg, August 2005 Birgit Pils III Acknowledgements Many individuals have supported me during the development of this thesis whom I would like to express my deepest gratitude. First and foremost, I would like to thank my mentor Prof. Dr. Jörg Schultz for his inspiring
    [Show full text]
  • Alternative Splicing and Protein Structure Evolution
    Alternative Splicing and Protein Structure Evolution Fabian Birzele Munchen¨ 2008 Alternative Splicing and Protein Structure Evolution Fabian Birzele Dissertation an der Fakultat¨ fur¨ Mathematik, Informatik und Statistik der Ludwig–Maximilians–Universitat¨ Munchen¨ vorgelegt von Fabian Birzele aus Wurzburg¨ Munchen,¨ den 09.12.2008 Erstgutachter: Prof. Dr. Ralf Zimmer Zweitgutachter: Prof. Dr. Dimtrij Frishman Tag der mundlichen¨ Prufung:¨ 27.01.2009 Contents Abstract xiii Zusammenfassung xv 1 Introduction 1 I Protein Structure Comparison 7 2 Introduction to Protein Structure Analysis 9 2.1 Goals and open questions in protein structure analysis . 9 2.2 Existing approaches to protein structure comparison . 10 2.2.1 SCOP and CATH - the standard of truth . 10 2.2.2 Automated protein structure alignment and comparison . 11 3 A Systematic Comparison of SCOP and CATH 13 3.1 Methods . 14 3.1.1 Mapping of domain assignments . 14 3.1.2 Mapping inner nodes of the hierarchies . 15 3.2 Results . 15 3.2.1 Datasets . 15 3.2.2 Detailed Comparison of SCOP and CATH . 16 3.3 Applications of the SCOP-CATH mapping . 20 3.3.1 Benchmarking Structure-Comparison methods . 20 3.3.2 Inter-Fold Similarities revealed by Consistency Checks . 24 3.4 Conclusion . 25 4 Protein Structure Alignment considering Phenotypic Plasticity 27 4.1 Outline of the method . 29 4.2 Conclusion . 30 5 Vorolign - fast protein structure alignment using Voronoi contacts 33 5.1 Methods . 34 5.1.1 Voronoi tessellation of protein structures . 34 vi CONTENTS 5.1.2 Properties of Voronoi cells . 35 5.1.3 Similarity of Voronoi cells .
    [Show full text]