Towards Automating Structural Analysis of Complex RNA Molecules and Some Applications in Nanotechnology
TOWARDS AUTOMATING STRUCTURAL ANALYSIS OF COMPLEX RNA MOLECULES AND SOME APPLICATIONS IN NANOTECHNOLOGY

Lorena G. Parlea

A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

May 2015

Committee:
Neocles B. Leontis, Advisor
R. Marshall Wilson, Graduate Faculty Representative
Craig L. Zirbel
Carol A. Heckman
George S. Bullerjahn

© 2015 Lorena G. Parlea. All Rights Reserved

ABSTRACT

Neocles B. Leontis, Advisor

RNA has emerged as a versatile and multi-faceted player in gene expression and the informational metabolism of living cells. RNA molecules can function by virtue of their sequences, storing or transmitting genetic information, as well as by forming complex three-dimensional (3D) structures that can bind specifically to proteins, small molecules or other RNA or DNA molecules to carry out diverse recognition functions, including chemical catalysis. As a result of revolutions in RNA 3D structure determination and high-throughput DNA and RNA sequencing, on-line databases are brimming with new structure and sequence data. The large amounts of new data are creating new challenges in data management, curation, search, visualization, and access. For example, ribosomes have been solved in many different functional states, with tRNAs variously bound to the A-, P-, or E-sites, or associated with different translation factors (i.e., initiation, elongation, termination or recycling factors) or antibiotics. Detailed and accurate functional annotations are needed to enable focused database searches for specific states and bound ligands and to uncover new relationships regarding structure, function, and evolution of RNA molecules and their complexes.
As large numbers of new structures are accumulating in databases faster than they can be manually annotated, automated annotation procedures need to be developed and deployed by databases such as the Nucleic Acid Database (NDB). In addition to annotation of individual structures, related structures must be identified, compared and clustered, and representative structures chosen for detailed analysis. To facilitate research, clustering should be dynamic and user-driven.

The unifying theme of this dissertation is to develop new conceptual frameworks and analytical approaches to assess and improve the automated annotation and analysis of RNA 3D structures, and to connect these data to structural changes in functional state and evolutionary changes. More specifically, the dissertation focuses on analysis of 3D structures of ribosomal RNA (rRNA) from the large (LSU) and small (SSU) subunits of ribosomes, and the hairpin and internal loop motifs extracted from them.

All atomic-resolution 3D structures of the SSU of the bacterium T. thermophilus found in the NDB as of April 2014 were analyzed manually (structure equivalence class NR_4.0_81883.24, release 1.56 of ribosome structures posted in the NDB; see http://rna.bgsu.edu/rna3dhub/nrlist). Each structure was manually examined to determine the functional annotations for the state of the ribosome with all bound tRNAs, mRNAs and other factors. These data were combined in a single spreadsheet found at this link: http://tinyurl.com/16S-T-Thermophilus-summary.

NDB maintains a Non-redundant (NR) set of structures that is updated each week and used to construct the 3D Motif Atlas of RNA hairpin and internal loop motifs (http://rna.bgsu.edu/rna3dhub/motifs). To assess the quality of motif clustering in the Motif Atlas, links and meta-data were downloaded and analyzed for all motif instances in release 1.14 (http://rna.bgsu.edu/rna3dhub/motifs/release/IL/1.14).
Overall, motifs found in conserved regions of the rRNAs were placed in the same motif groups by the automated clustering.

Information must flow between different parts of the ribosome to report on the state of one part that is relevant to another part. One hypothesis is that networks of RNA tertiary and quaternary interactions play a central role in the ribosome. As a first step toward automatically detecting and comparing networks of RNA-RNA interactions in the ribosome, a new module was created for the FR3D (“Find RNA 3D”) suite of RNA analysis tools. This module can perform the analysis of large structures automatically.

To my Mother: “Multumesc ca esti mama mea.” (“Thank you for being my mother.”)

ACKNOWLEDGMENTS

I want to thank my advisor, Dr. Neocles Leontis, for his mentorship, support and encouragement throughout my graduate studies. Thank you for believing in me. I also wish to thank all my committee members for their expertise and time, and for agreeing to be part of my journey. I want to thank Dr. Zirbel for always being patient, and for helping with writing the programs and sorting out the bugs. My thanks also go to Dr. Heckman for being a teacher, a mentor, a great boss and a friend. I would like to thank Dr. Bullerjahn as well for the time and the input given on my research. I would like to acknowledge my colleagues: Jesse Stombaugh, for the advice and input in my research; Blake Sweeney, for providing the meta-data for two of my projects; and Kirill Afonin, for giving me the motivation to finish my dissertation.

I would like to thank my family for their unconditional love and support, even from a distance. Special thanks go to my mother and my sister, for everything they did to see me through graduate school, for being my cheerleaders, my critics, my friends, and my support system. I would like to express my love and gratitude to Elena and Dmitry Khon, who became my family far away from home. Without them, I would not have been able to reach the finish line.
Special thanks go to my friend Mina Coman, who was my avid supporter towards the end of my studies. Finally, I want to thank all the friends I made here, too many to acknowledge individually, who encouraged and helped me throughout. You brought color to my life and made this journey thrilling.

TABLE OF CONTENTS

I. INTRODUCTION
I.1. RNA history and its importance as a biological molecule
I.1.1. Historical overview of RNA research
I.1.2. The roles of RNA and RNA nucleotides in biological processes
I.1.3. Methods for scrutinizing RNA structure and function
I.1.4. RNA structural organization
I.1.5. Types of RNA
I.1.6. Types of RNA and the biological processes in which they are involved
I.1.6.1. RNA synthesis (Transcription)
I.1.6.2. mRNA processing
I.1.6.3. The RNAs involved in translation
I.1.6.4. The bacterial ribosome and rRNA
I.1.6.5. The bacterial ribosome and tRNA
I.1.6.6. Protein synthesis (Translation)
I.1.7. The structure of the ribosome
I.2. RNA Structural Motifs
I.2.1. Defining Motifs at Different Levels of Structure
I.2.1.1. Hierarchical architectures and folding of structured RNA molecules
I.2.1.2. Defining the modular units of RNA Structure
I.2.1.3. Modular and recurrent 3D motifs
I.2.1.4. Neutral substitutions in helices
I.2.2. Identifying, classifying and annotating nucleotide interactions that stabilize RNA 3D motifs
I.2.2.1. Reduced representations of RNA 3D structure
I.2.2.2. Classification and Annotation of Base-pairing Interactions in RNA Structures
I.2.2.3. Annotation of secondary structures
I.2.2.4. Structure-neutral mutations in recurrent RNA 3D motifs
I.2.3. Defining recurrent 3D motifs and identifying them in structures
I.2.3.1. Classification of “loop” motifs
I.2.3.2. Defining and naming 3D motifs
I.2.3.3. Defining tertiary interaction motifs
I.2.4. Classification of motifs according to function
I.2.5. Conclusions
II. METHODS AND MATERIALS
III. RESULTS AND DISCUSSION
III.1. Structural elements in the bacterial ribosome
III.1.1. Motivation and overview
III.1.2. Annotations of long-range tertiary interactions
III.1.3. Helical elements vs. stacking structural elements
III.1.4. Defining the hierarchical structure of an RNA molecule
III.1.5. Structural elements in the ribosome
III.1.6. Assigning nucleotides to each level of hierarchical organization
III.1.7. Programs for calculating the interaction network
III.1.8. Visualization of results for 2J00 and 2J01
III.1.9. Conclusions
III.2. Towards automated extraction of key information of ribosome structures in the Nucleic Acid Database (NDB)
III.3. Evaluation of classification of recurrent motifs in the ribosome structures by the RNA 3D Motif Atlas