Measuring Uncertainty of Protein Secondary Structure
Wright State University CORE Scholar, Theses and Dissertations (2011)
Alan Eugene Herner, Wright State University

Repository Citation
Herner, Alan Eugene, "Measuring Uncertainty of Protein Secondary Structure" (2011). Browse all Theses and Dissertations. 422. https://corescholar.libraries.wright.edu/etd_all/422

MEASURING UNCERTAINTY OF PROTEIN SECONDARY STRUCTURE

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

By

Alan Eugene Herner
B.A. Wright State University, 1980
M.S. Wright State University, 1980
M.S. Wright State University, 2001

2011
Wright State University

COPYRIGHT BY
Alan E. Herner
2011

WRIGHT STATE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

January 7, 2011

I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY ALAN E. HERNER ENTITLED Measuring Uncertainty of Protein Secondary Structure BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY.

Michael L. Raymer, PhD
Dissertation Director

Arthur A. Goshtasby, PhD
Director, Computer Science and Engineering PhD Program

Andrew Hsu, PhD
Dean, School of Graduate Studies

Committee on Final Examination

Michael L. Raymer, PhD
Gerald Alter, PhD
Travis Doom, PhD
Ruth Pachter, PhD
Mateen Rizki, PhD

ABSTRACT

Herner, Alan E. PhD. Department of Computer Science and Engineering, Wright State University, 2011. Measuring Uncertainty of Protein Secondary Structure.

This dissertation develops and demonstrates a method for measuring the secondary structure uncertainty of protein sequences using Shannon’s information theory. The method is applied to a newly developed large dataset of chameleon sequences and to several protein hinges culled from the Hinge Atlas. For each amino acid in a sequence, uncertainty is computed for the central residue of the tripeptide centered on that amino acid, using Cuff and Barton’s CB513 as the reference set. Secondary structure uncertainty is shown to be relatively high in chameleon regions [avg = 1.27 bits] but relatively low in the flanking regions 1-7 residues from a chameleon [N-terminal flank avg = 1.12 bits; C-terminal flank avg = 1.16 bits]. These differences are highly statistically significant [p = 9.6E-18 and p = 2.9E-12, respectively]. By contrast, the secondary structure uncertainty of hinge regions was not found to differ to a statistically significant degree once a Bonferroni multiple-test correction was applied. A new hand-curated database of long “chameleon” sequences was also developed; it contains nine sequences of length eight and eighty-five sequences of length seven.
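One reading of the measurement summarized above can be sketched as follows. The uncertainty of a residue is taken to be the Shannon entropy H = -Σ p_s log2 p_s over the secondary structure states s observed at the central position of the same tripeptide in the reference set. With three states (helix, strand, coil), H ranges from 0 to log2 3, or about 1.585 bits. The Python below is a minimal sketch, not the dissertation's implementation, and it rests on several assumptions that do not come from the source. The reference set is assumed to be already parsed into pairs of equal-length amino acid and 3-state (H/E/C) strings, and probabilities are raw frequencies without pseudocounts. Terminal residues and tripeptides never seen in the reference are reported as None. All function names are hypothetical, and parsing CB513 itself is not shown.

```python
import math
from collections import Counter, defaultdict


def tripeptide_state_counts(reference):
    """Count the secondary structure states seen at the central residue of each tripeptide.

    Assumed input format (hypothetical): an iterable of (sequence, states) pairs of
    equal length, where states uses 3-state labels such as 'H', 'E' and 'C'.
    """
    counts = defaultdict(Counter)
    for seq, states in reference:
        if len(seq) != len(states):
            raise ValueError("sequence and state strings must have equal length")
        # Slide a window of width 3; the middle position supplies the state.
        for i in range(1, len(seq) - 1):
            counts[seq[i - 1:i + 2]][states[i]] += 1
    return counts


def shannon_entropy(state_counts):
    """Shannon entropy, in bits, of the empirical state distribution (no pseudocounts)."""
    total = sum(state_counts.values())
    probs = [n / total for n in state_counts.values() if n > 0]
    return sum(-p * math.log2(p) for p in probs)


def per_residue_uncertainty(seq, counts):
    """Entropy for each residue, treated as the center of its tripeptide.

    Terminal residues and tripeptides absent from the reference map to None
    (an assumption; the dissertation's handling of these cases is not given here).
    """
    result = [None] * len(seq)
    for i in range(1, len(seq) - 1):
        tripeptide = seq[i - 1:i + 2]
        if tripeptide in counts:  # membership test does not insert into the defaultdict
            result[i] = shannon_entropy(counts[tripeptide])
    return result


if __name__ == "__main__":
    # Toy reference set (illustrative only, not CB513 data).
    toy_reference = [("AAAG", "HHHC"), ("AAAG", "HHEC")]
    counts = tripeptide_state_counts(toy_reference)
    print(per_residue_uncertainty("AAAG", counts))  # [None, 0.0, 1.0, None]
```

In the toy example, "AAA" is always helical in the reference, so its central residue scores 0 bits. "AAG" is split evenly between helix and strand, so its central residue scores 1 bit. Averages such as the 1.27 bits reported for chameleon regions therefore lie closer to the three-state maximum than to zero.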
TABLE OF CONTENTS

1.0 INTRODUCTION
1.1 Overview
1.1.1 Research Objective and Significance
1.1.2 Organization of the Report
1.2 Proteins
1.2.1 Amino acid composition and peptide bonds
1.2.2 Types of Amino Acids
1.2.3 Planarity and Dihedral angles
1.2.4 Conformational Constraints
1.2.5 Molecular forces involved in protein folding
1.2.5.1 Hydrogen bonds
1.2.5.3 Ionic (charge) interactions
1.2.5.4 Covalent bonds
1.2.5.5 Van der Waals forces
1.2.6 Protein Structure
1.2.6.1 Primary Structure
1.2.6.2 Secondary Structure
1.2.6.2.1 Alpha Helix
1.2.6.2.2 Extended strand
1.2.6.2.3 Random Coil
1.2.6.3 Motifs
1.2.6.4 Tertiary Structure
1.2.6.5 Quaternary Structure
1.2.7 Theories of Folding
1.2.7.1 Framework Model
1.2.7.2 Hydrophobic Collapse Model
1.2.7.3 Nucleation Model
1.2.7.4 Unified Model
1.3 Protein Data and Databases
1.3.1 Experimental data
1.3.1.1 X-ray crystallography
1.3.1.2 Nuclear Magnetic Resonance (NMR)
1.3.2 Dictionary of Secondary Structure of Proteins (DSSP)
1.3.3 Data sets
1.3.3.1 Redundancy and Homology
1.3.3.2 Data Sources
1.3.3.2.1 wwProtein Data Bank (PDB)
1.3.3.2.2 Customized Data Sets
1.3.3.3 Data Formats
1.3.3.3.1 FASTA
1.3.3.3.2 Protein Data Bank
1.3.4 Eight to three reduction
2.0 LITERATURE REVIEW
2.1 Secondary Structure Prediction
2.1.1 Foundations
2.1.1.1 Early Investigations
2.1.1.2 Thermodynamic Hypothesis
2.1.1.3 Levinthal’s Paradox
2.1.2 Illustrative Papers
2.1.2.1 Physico-chemical
2.1.2.1.1 Helical wheels
2.1.2.1.2 Physical rules
2.1.2.1.3 Molecular Dynamics
2.1.2.2 Statistical