The Incidence and Functional Relevance of Intrinsic Disorder in Enzymes and the Protein Data Bank Shelly Deforte University of South Florida, [email protected]
University of South Florida
Scholar Commons
Graduate Theses and Dissertations
Graduate School
6-27-2016

Intrinsic Disorder Where You Least Expect It: The Incidence and Functional Relevance of Intrinsic Disorder in Enzymes and the Protein Data Bank
Shelly Deforte
University of South Florida, [email protected]

Follow this and additional works at: http://scholarcommons.usf.edu/etd
Part of the Bioinformatics Commons, Medicine and Health Sciences Commons, and the Molecular Biology Commons

Scholar Commons Citation
Deforte, Shelly, "Intrinsic Disorder Where You Least Expect It: The Incidence and Functional Relevance of Intrinsic Disorder in Enzymes and the Protein Data Bank" (2016). Graduate Theses and Dissertations. http://scholarcommons.usf.edu/etd/6219

This Thesis is brought to you for free and open access by the Graduate School at Scholar Commons. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected].

Intrinsic Disorder Where You Least Expect It: The Incidence and Functional Relevance of Intrinsic Disorder in Enzymes and the Protein Data Bank

by Shelly DeForte

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Molecular Medicine
College of Medicine
University of South Florida

Major Professor: Vladimir Uversky, Ph.D.
Yu Chen, Ph.D.
Robert Deschenes, Ph.D.
Sandy Westerheide, Ph.D.
Bin Xue, Ph.D.

Date of Approval: June 14, 2016

Keywords: intrinsically disordered protein, x-ray crystallography, structural biology, enzyme function

Copyright © 2016, Shelly DeForte

Table of Contents

List of Tables
List of Figures
Abstract
1. Introduction to Intrinsically Disordered Proteins
1.1 The Dominant Paradigms in Protein Science
1.2 Defining Intrinsically Disordered Proteins
1.3 The Subtler Side of Disorder
1.4 The Line Between Order and Disorder
1.5 The Mechanisms of Disorder
1.5.1 Entropy
1.5.2 Accessibility
1.5.3 Plasticity
1.6 Biological Functions
1.6.1 Signaling
1.6.2 Regulation
1.7 Disorder and Protein Evolution
1.8 The Tools of the Un-Structural Biologist
1.8.1 Experimental techniques
1.8.1.1 X-Ray crystallography
1.8.1.2 Nuclear Magnetic Resonance
1.8.1.3 Combining experimental techniques
1.8.2 Bioinformatics analysis
1.8.2.1 Sequence characteristics
1.8.2.2 Disorder prediction
1.8.2.3 Classification of function
1.8.2.4 Proteome level studies
1.9 Protein Intrinsic Disorder and Disease
1.10 Protein Intrinsic Disorder and Drug Design and Discovery
1.10.1 The story of PTP1B
1.11 The Field of Protein Intrinsic Disorder
1.12 Intrinsic Disorder Where You Least Expect It
2. Intrinsic Disorder in the Protein Data Bank
2.1 Background
2.1.1 The Protein Data Bank
2.1.2 Missing regions in the PDB
2.1.3 B-factors
2.1.4 Missing regions and the development of disorder prediction
2.1.5 Previous studies
2.2 Results
2.2.1 A new method for the characterization of missing regions
2.2.2 Ambiguous regions have greater secondary structure variation
2.2.3 Different types of missing regions have distinct characteristics
2.2.4 Disorder prediction correlates with missing residue conservation
2.2.5 Static disorder and wobbly domains are rare in the PDB
2.3 Conclusions
3. Intrinsic Disorder in Enzymes
3.1 Background
3.1.1 Intrinsically disordered enzymes in the literature
3.1.2 Previous studies
3.2 Results
3.2.1 Experimental design
3.2.2 Enzymes and non-enzymes have a similar incidence of IDPRs
3.2.3 Enzymes and non-enzymes have IDPRs of similar lengths
3.2.4 Disorder is specific to enzyme length and type
3.2.5 Disorder increases with organismic complexity
3.2.6 Eukaryotic proteins in the PDB are highly truncated
3.2.7 Long IDPRs in enzymes are associated with specific functions
3.2.8 Promiscuity is not correlated with disorder in enzymes
3.3 Conclusions
4. Materials and Methods
4.1 PubMed Data and Analysis
4.1.1 IDP terminology in PubMed
4.2 PDB Data and Analysis
4.2.1 Parsing and preparation of the PDB dataset
4.2.2 The assignment of missing residues from PDB files
4.2.3 The creation of PDB composite data in Python
4.2.4 Amino acid composition
4.2.5 Disorder, binding, and MoRF predictions
4.3 Reference Proteomes Data and Analysis
4.3.1 Parsing and preparation of the Reference QFO dataset
4.3.2 Enzyme Commission (EC) designations
4.3.3 Disorder prediction
4.3.4 Disorder analysis
4.3.4.1 Disorder calculations
4.3.4.2 Expectation values
4.3.4.3 Transmembrane domains
4.3.5 Gene Ontology enrichment
References
Appendix A: Glossary
Appendix B: IDP Search Terms and PubMed IDs
Appendix C: IDP Search Terms, Search Results, and Disorder Scores
Appendix D: Intrinsically Disordered Enzymes
Appendix E: Copyright Permissions

List of Tables

Table 1 Secondary structure abbreviations.