Analysis on Protein Structures Using Statistical and Bioinformatical Methods Aimin Yan Iowa State University
Total Page:16
File Type:pdf, Size:1020Kb
Iowa State University Capstones, Theses and Retrospective Theses and Dissertations Dissertations 2008 Analysis on protein structures using statistical and bioinformatical methods Aimin Yan Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/rtd Part of the Bioinformatics Commons, and the Biostatistics Commons Recommended Citation Yan, Aimin, "Analysis on protein structures using statistical and bioinformatical methods" (2008). Retrospective Theses and Dissertations. 15827. https://lib.dr.iastate.edu/rtd/15827 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Analysis on protein structures using statistical and bioinformatical methods by Aimin Yan A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Bioinformatics and Computational Biology Program of Study Committee: Robert L.Jernigan, Major Professor Zhijun Wu Vasant Honavar David Fernandez-Baca Alicia Carriquiry Kai-Ming Ho Iowa State University Ames, Iowa 2008 Copyright c Aimin Yan, 2008. All rights reserved. 3316183 3316183 2008 ii DEDICATION I would like to dedicate this thesis to my mother(Cungui Pan), my father(Jinyuan Yan) and my sisters(Caili Yan, Caihong Yan, Caiqin Yan, Caixia Yan) for their unconditional love. Equivalently, I want to thank my wife, Shan Yu, for her patience and love. I also want to thank my son, Ivan Yu Yan, for his smiles that give me a lot of relax during my studies. iii TABLE OF CONTENTS LIST OF TABLES . vii LIST OF FIGURES . ix ABSTRACT . xiii CHAPTER 1. General introduction . 1 References............................................... 6 CHAPTER 2. How do side chains orient globally in protein structures? . 10 Abstract . 10 Introduction . 11 Data set and method . 16 The used protein structures . 16 Ω angle calculation . 17 Residue surface area and curvature . 17 Data processing and Statistical analysis . 17 Results and Discussion . 18 Relationship between Ω angle and hydrophobicity . 18 Difference in Ω angle between exposed and buried residues in 144 monomeric structures 21 Difference in Ω angle distributions between exposed, interfacial, and buried residues in 192 dimeric Structures . 21 Relationship between Ω angle and mean residue depth . 24 Ω angle for different sizes of monomeric protein structures . 24 Variation of Ω angle in different solvent-accessible layers of structure . 25 The difference of side chain orientation preference in surface regions and buried regions in 1982 monomeric protein structures . 27 The influence of the local surface shape on the side chain orientation preference . 29 iv Visualization of side chain vectors . 30 Conclusions . 33 References . 34 CHAPTER 3. Prediction of side chain orientations in proteins by statistical ma- chine learning methods . 37 Abstract . 37 Introduction . 37 Materials and Methods . 39 Data Set . 39 Statistical methods . 44 Further analysis for the residual plots in the different models . 57 Measurement of prediction accuracy . 58 Software............................................. 58 Results and Discussion . 58 Comparison of prediction results for the training set . 58 Comparison of prediction results for the test set . 59 Comparison between the prediction from our statistical models and random assignment for side chain Ω angle . 59 Conclusions . 65 References . 65 CHAPTER 4. The effects of different superpositioning methods for structures in the NMR ensemble of structures on the correspondence between the experi- mental conformational changes and the motions generated from elastic network model.............................................. 69 Abstract . 69 Introduction . 69 Materials and Method . 71 Structure used in this study . 71 Superposition methods . 72 The experimental conformation changes obtained using PCA . 76 Protein motion simulations based on the median structure . 77 v Overlap measurement between the protein principal conformation change and the motion simulated from the anisotropic network models . 77 Results and Discussion . 77 The effects of the different superposition methods on the simulated motions . 77 The effects of two superposition methods on the observed conformational variabilities . 78 The Least square based method is better for most of proteins . 79 The Maximum likelihood based method is better for some proteins . 81 The overlap between the experimental conformational change space and the normal mode space . 83 A case study using calcium-binding protein . 86 Conclusion . 91 References . 91 CHAPTER 5. The simulated motions of partially assembled 30S ribosome struc- tures .............................................. 93 Abstract . 93 Introduction . 93 Materials and Method . 95 Structure used in this study . 95 Overlap matrix calculation . 97 Correlation of substructure motions in the complete 30S subunit . 97 Calculation of deformation energy . 97 Protein removal method . 97 Computation Cost . 98 Results and Discussion . 99 The influence of removing all protein subunits on the simulated motions of 16sRNA . 99 Influence of removal of one protein from the 30S complex . 100 Influence of removing pairs of proteins from the 30S complex . 105 The correlation of motion of different subunits . 108 Remove the sets of protein subunits based on their binding order . 112 Whether the subunits that have the larger contact also have the stronger correlated motion generated from ENM . 120 Contacts between the protein subunits and 16SrRNA subunit in the 30S structure . 121 vi Conclusion . 124 References . 124 CHAPTER 6. Principal component shaving method for clustering the structures within an ensemble of NMR-derived protein . 128 Abstract . 128 Introduction . 128 Materials and Method . 129 Structure used in this study . 129 Principal component shaving method . 129 RMSD calculation for the structures within an ensemble . 130 Results and Discussion . 130 The cluster from principal component shaving . 130 Conclusion . 131 References . 131 CHAPTER 7. General Conclusion . 133 Coarse-grained side chain orientation . ..