Discovering Meaning from Biological Sequences: Focus on Predicting

Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2013 Discovering meaning from biological sequences: focus on predicting misannotated proteins, binding patterns, and G4-quadruplex secondary Carson Michael Andorf Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Bioinformatics Commons, and the Computer Sciences Commons Recommended Citation Andorf, Carson Michael, "Discovering meaning from biological sequences: focus on predicting misannotated proteins, binding patterns, and G4-quadruplex secondary" (2013). Graduate Theses and Dissertations. 13533. https://lib.dr.iastate.edu/etd/13533 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Discovering meaning from biological sequences: focus on predicting misannotated proteins, binding patterns, and G4-quadruplex secondary structures by Carson Michael Andorf A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Computer Science Program of Study Committee: Vasant Honavar, Co-major Professor Drena Dobbs, Co-major Professor David Fernandez-Baca Robert Jernigan Taner Sen Guang Song Iowa State University Ames, Iowa 2013 Copyright © Carson Michael Andorf, 2013. All rights reserved. ii DEDICATION TO MY FAMILY, FOR THEIR OVERWHELMING SUPPORT iii TABLE OF CONTENTS Page DEDICATION .......................................................................................................... ii ACKNOWLEDGEMENTS ...................................................................................... vi ABSTRACT .............................................................................................................. viii CHAPTER 1 GENERAL INTRODUCTION ...................................................... 1 Data Mining Approaches for Discovery of Protein Sequence-Structure- Function Relationships ........................................................................................ 2 Overview of Prediction of Misannotated Proteins .............................................. 3 Overview of Protein-Protein Interaction Networks and the Relationship of Binding Patterns in Hub Proteins ........................................................................ 4 Genome-Wide Analysis of G4-Quaduplex Motifs .............................................. 6 A Survey of Current Methods ............................................................................. 7 Summary...... ....................................................................................................... 26 Dissertation Organization .................................................................................... 27 References........................................................................................................... 29 CHAPTER 2 EXPLORING INCONSISTENCIES IN GENOME-WIDE PROTEIN FUNCTION ANNOTATIONS: A MACHINE LEARNING APPROACH ............................................................ 46 Abstract............................................................................................................... 46 Background......................................................................................................... 47 Results................................................................................................................. 49 Discussion.... ....................................................................................................... 53 Conclusion.... ....................................................................................................... 55 Methods........ ....................................................................................................... 56 Authors' Contributions ........................................................................................ 61 Acknowledgements ............................................................................................. 61 References........................................................................................................... 62 Figures................................................................................................................. 66 Tables........... ....................................................................................................... 68 Supplementary Information found in the Appendices Section ........................... 71 iv Page CHAPTER 3 PREDICTING THE BINDING PATTERNS OF HUB PROTEINS: A STUDY USING YEAST PROTEIN INTERACTION NETWORKS ...................................................... 73 Abstract............................................................................................................... 73 Introduction... ...................................................................................................... 75 Results and Discussion ........................................................................................ 82 Conclusion.... ....................................................................................................... 91 Materials and Methods ........................................................................................ 92 Acknowledgements ............................................................................................. 101 References........................................................................................................... 102 Figures................................................................................................................. 108 Tables............ ...................................................................................................... 114 Supporting Information ....................................................................................... 118 CHAPTER 4 GENOME-WIDE, COMPUTATIONAL IDENTIFICATION G-QUADRUPLEX ELEMENTS AS POTENTIAL COORDINATE REGULATORS OF GENETIC RESPONSES TO HYPOXIA IN MAIZE (ZEA MAYS L.) ............................................................... 120 Abstract............................................................................................................... 120 Introduction... ...................................................................................................... 121 Material and Methods .......................................................................................... 123 Results................................................................................................................. 129 Discussion.... ....................................................................................................... 136 Acknowledgements ............................................................................................. 143 Funding............................................................................................................... 143 References........................................................................................................... 143 Figures................................................................................................................. 152 Tables........... ....................................................................................................... 159 Supporting Information....................................................................................... 167 CHAPTER 5 CONCLUSIONS .............................................................................. 168 Summary and Discussion .................................................................................... 168 Contributions to the Field .................................................................................... 170 Future work on protein classification .................................................................. 177 Future work toward understanding the biology of G4-quadruplexes ................. 180 References ......................................................................................................... 182 v Page APPENDIX A: Supplementary Data ........................................................................ 185 APPENDIX B: Supplementary Note ........................................................................ 194 APPENDIX C: Supplementary Table 1 .................................................................... 195 APPENDIX D: Supplementary Table 2 .................................................................... 201 APPENDIX E: Supplementary Table 3 .................................................................... 207 APPENDIX F: Supplementary Table 4 .................................................................... 213 APPENDIX G: Supplementary Table 5 .................................................................... 220 APPENDIX H: Supplementary Table 6 .................................................................... 221 APPENDIX I: Supplementary Table 7 ..................................................................... 226 APPENDIX J: Figure S1 ........................................................................................... 227 APPENDIX K: Figure S2 ......................................................................................... 228 APPENDIX L: Table S1 ..........................................................................................

Discovering Meaning from Biological Sequences: Focus on Predicting

OXSR1 (A-4): Sc-271707

The Proximal Signaling Network of the BCR-ABL1 Oncogene Shows a Modular Organization

Defining Functional Interactions During Biogenesis of Epithelial Junctions

(MMP-9) in Oral Cancer Squamous Cells—Are There Therapeutical Hopes?

OXSR1 Mouse Monoclonal Antibody [Clone ID: OTI1F3] Product Data

Transcriptomic Analysis of Native Versus Cultured Human and Mouse Dorsal Root Ganglia Focused on Pharmacological Targets Short

OSR1 (Phospho Thr185) Antibody Cat

1 UST College of Science Department of Biological Sciences

Plenary and Platform Abstracts

Aneuploidy: Using Genetic Instability to Preserve a Haploid Genome?

Mining Hub Genes Correlated with Immune Infiltrating Level Across

Comparative Transcriptome Profiling of the Human and Mouse Dorsal Root Ganglia: an RNA-Seq-Based Resource for Pain and Sensory Neuroscience Research