Protein Sequence-Structure Relationships Sumudu Pamoda Leelananda Iowa State University

Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2011 Protein sequence-structure relationships Sumudu Pamoda Leelananda Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Biochemistry, Biophysics, and Structural Biology Commons Recommended Citation Leelananda, Sumudu Pamoda, "Protein sequence-structure relationships" (2011). Graduate Theses and Dissertations. 10344. https://lib.dr.iastate.edu/etd/10344 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Protein sequence-structure relationships by Sumudu Pamoda Leelananda A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Biophysics Program of Study Committee Robert L. Jernigan, Major Professor Drena Dobbs David Fernandez-Baca Mark Hargrove Andrzej Kloczkowski Edward Yu Iowa State University Ames, Iowa 2011 Copyright © Sumudu Pamoda Leelananda, 2011. All rights reserved. ii DEDICATION To my dear parents for their unconditional love and support iii TABLE OF CONTENTS ACKNOWLEDGEMENTS ........................................................................................... vi ABSTRACT................................................................................................................... viii CHAPTER 1: INTRODUCTION .................................................................................. 1 1.1 Designability ........................................................................................................................................... 1 1.2 Contact potentials .................................................................................................................................. 4 1.3 A guide to the present study .................................................................................................................. 5 CHAPTER 2: DESIGNABILITY OF LATTICE MODELS OF PROTEINS .......... 8 2.1 Topology and designability of lattice conformations .......................................................................... 9 2.1.1 Methods 10 2.1.1.1 Enumeration of compact conformations ................................................................................. 10 2.1.1.2 Calculation of energy of conformations ................................................................................. 11 2.1.1.3 Calculation of designability .................................................................................................... 15 2.1.1.4 Generation of contact graphs .................................................................................................. 15 2.1.1.5 Regression analysis ................................................................................................................ 17 2.1.1.5 Naïve Bayes prediction........................................................................................................... 17 2.1.2 Results 19 2.1.3 Discussion 24 2.1.3.1 Application to larger lattices and real protein structures ........................................................ 27 2.2 Sequence space diversity ..................................................................................................................... 28 2.2.1 Methods 28 2.2.2 Results 31 2.2.2.1 Extended set of conformations ............................................................................................... 31 2.2.2.2 Degeneracy ............................................................................................................................. 34 2.2.2.3 Reduced set of conformations ................................................................................................ 36 2.2.3 Discussion 37 iv 2.3 Use of machine learning algorithms ................................................................................................... 39 2.3.1 Distinguishing between designable and non-designable sequences 39 2.3.1.1 Methods .................................................................................................................................. 39 2.3.1.2 Results .................................................................................................................................... 41 2.3.1.3 Discussion .............................................................................................................................. 42 2.3.2 Application to real proteins 44 2.3.2.1 Methods .................................................................................................................................. 44 2.3.2.2 Results .................................................................................................................................... 45 2.3.2.3 Discussion .............................................................................................................................. 45 2.4 A preliminary study of disordered proteins using a simple lattice .................................................. 46 2.4.1 Methods 47 2.4.2 Results 47 2.4.3 Discussion 51 CHAPTER 3: DESIGNABILITY OF REAL PROTEIN STRUCTURES .............. 53 3.1 Thermodynamic stability of highly designable structures ............................................................... 54 3.1.1 Methods 54 3.1.2 Results 56 3.1.3 Discussion 57 3.2 Designability of off-lattice models of real proteins ............................................................................ 58 3.2.1 Methods 58 3.2.1.1 Dihedral angles ....................................................................................................................... 60 3.2.2 Results 61 3.2.3 Discussion 62 3.3 Topology and designability of real proteins ....................................................................................... 64 3.3.1 Methods 64 3.3.1.1 Selection of datasets ............................................................................................................... 64 3.3.1.2 Generation of contact graphs and calculation of designability ............................................... 69 3.3.1.3 Naïve Bayes prediction........................................................................................................... 70 3.3.2 Results 70 3.3.3 Discussion 76 3.4 Sequence space diversity ..................................................................................................................... 80 3.4.1 Methods 80 v 3.4.2 Results 81 3.4.3 Discussion 86 3.5 Eigenvalues of contact graphs and designability ............................................................................... 87 3.5.1 Methods 88 3.5.2 Results 88 3.5.3 Discussion 91 3.6 Use of machine learning algorithms ................................................................................................... 91 3.6.1 Distinguishing between designable and non-designable sequences 92 3.6.1.1 Methods .................................................................................................................................. 92 3.6.1.2 Results .................................................................................................................................... 92 3.6.1.3 Discussion .............................................................................................................................. 93 3.6.2 Application to real proteins 93 3.6.2.1 Methods .................................................................................................................................. 93 3.6.2.2 Results .................................................................................................................................... 93 3.6.2.3 Discussion .............................................................................................................................. 94 CHAPTER 4: FOUR-BODY OPTIMIZED POTENTIALS ..................................... 95 4.1 Methods................................................................................................................................................. 96 4.1.1 Geometric construction of four bodies 96 4.2 Results ................................................................................................................................................. 101 4.2.1 Performance of different individual potential functions 101 4.2.2 Performance of the optimized potential 103 4.3 Discussion ........................................................................................................................................... 105 4.4 Participation in CASP9 ..................................................................................................................... 110 CHAPTER 5: SUMMARY AND CONCLUSIONS ................................................. 112 APPENDIX ................................................................................................................... 117 REFERENCES ............................................................................................................ 118 vi ACKNOWLEDGEMENTS I would like to take this opportunity to thank my advisor, Dr. Robert L. Jernigan, who has been guiding me from the day I joined the Jernigan Lab in 2008. It has been a pleasure to work with him. He has always

Protein Sequence-Structure Relationships Sumudu Pamoda Leelananda Iowa State University

Proteome-Scale Prediction of Structure and Func- Tion

The History of the CATH Structural Classification of Protein Domains

An Analysis of Intron Positions in Relation to Protein Structure, Function, and Evolution

Dompro: Protein Domain Prediction Using Profiles, Secondary Structure, Relative Solvent Accessibility, and Recursive Neural Netw

Understanding the Relationship Between Enzyme Structure and Catalysis

HMM Based Approach for Classifying Protein Structures

Exploring the Function and Evolution of Proteins Using Domain Families

BGRS\SB-2018) the Eleventh International Conference

The History of the CATH Structural Classification of Protein Domains

Measuring Uncertainty of Protein Secondary Structure

Insights Into the Evolution of Protein Domains Give Rise to Improvements of Function Prediction

Alternative Splicing and Protein Structure Evolution