Modular Organization and Composability of RNA
Total Page:16
File Type:pdf, Size:1020Kb
University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Fall 2009 Modular Organization and Composability of RNA Miler T. Lee University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/edissertations Part of the Bioinformatics Commons, Biology Commons, Computational Biology Commons, Evolution Commons, Genomics Commons, and the Molecular and Cellular Neuroscience Commons Recommended Citation Lee, Miler T., "Modular Organization and Composability of RNA" (2009). Publicly Accessible Penn Dissertations. 244. https://repository.upenn.edu/edissertations/244 This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/244 For more information, please contact [email protected]. Modular Organization and Composability of RNA Abstract Life is organized. Organization is largely achieved via composability -- that at some level of abstraction, a system consists of smaller parts that serve as building blocks -- and modularity -- the tendency for these blocks to be independent units that recombine to form functionally different systems. Here, we explore the organization, composition, and modularity of ribonucleic acid (RNA) molecules, biopolymers that adopt three-dimensional structures according to their specific nucleotide sequence. eW address three themes: the efficacy of specific sequenceso t function as modules or as the context in which modules are inserted; the sources of novel modules in modern genomes; and the resolutions at which functionally relevant modules exist in RNA. First, we investigate the structural modularity of RNA sequences by developing the Self-Containment Index, a method to quantify in silico the degree to which RNA structures deviate in changing genomic contexts. We show that although structural modularity is not a general property of natural RNAs, precursor microRNAs are strongly modular, which we hypothesize is a consequence of their unique biogenesis and evolutionary history. Next, we consider the role of modularity in the regulation of subcellular localization. We identify a novel module, the ID element retrotransposon, contained in the introns of rat neuronal genes, and demonstrate that it is sufficiento t drive localization of mRNAs to dendrites via regulated retention of intron sequence. This mechanism shows that introns can provide the context for functional module insertion, and that transposable elements can be co-opted as source material for these modules. As a further example, we present evidence that a Camk2a localization signal can be mimicked by Alu retrotransposon sequence. Finally, we propose that RNAs can be conceptually decomposed into sets of basic RNA functions. To identify these, we automatically construct an ontology of RNA function using Wikipedia documents. We show that many of the functions encoded in ontology terms are significantly associated with common structural features, highlighting an underlying structure-function relationship that can be encapsulated in elemental RNA building-block units. In sum, we show how the phenomena of organization, composition, and modularity can frame RNA research in an evolutionary context. Degree Type Dissertation Degree Name Doctor of Philosophy (PhD) Graduate Group Genomics & Computational Biology First Advisor Junhyong Kim Keywords RNA structure, transposons, intron retention, transcript localization, evolution, modularity Subject Categories Bioinformatics | Biology | Computational Biology | Evolution | Genomics | Molecular and Cellular Neuroscience This dissertation is available at ScholarlyCommons: https://repository.upenn.edu/edissertations/244 COPYRIGHT Miler T.S. Lee 2009 “It is interesting that people try to find meaningful patterns in things that are essentially random.” — Lt. Cmdr. Data, Star Trek: The Next Generation, Ep. 5.22 iii Acknowledgements Inspiration and guidance come in many forms. This work would not have been possi- ble without the help of countless individuals and entities, some of whom I can name below: My advisor, Dr. Junhyong Kim, who deserves more praise than I am capable of lavishing. Current and former members of the Kim Lab, who despite working on disparate topics, have helped guide my research and refine my ideas. Our collaborator Dr. James Eberwine and his lab, with whom we’ve closely worked on several fruitful projects. In particular, Chapter 4 of this dissertation contains work from a project that was a joint effort between his student Peter Buckley and myself. The members of my dissertation committee: Dr. Harold Riethman, Dr. James Eberwine, Dr. Sampath Kannan, Dr. Scott Poethig, and formerly Dr. Artemis Hatzigeorgiou, who have each provided me with several years of helpful feedback. Numerous other mentors, colleagues, and staff at the University of Pennsylvania, particularly the faculty who were instrumental in drawing me to Penn, including Dr. Richard Spielman, Dr. Warren Ewens, Dr. Maja Bucan, and Dr. Lyle Ungar; as well as my fellow students in the Genomics and Computational Biology Program, whose combined talents are the greatest strength of this graduate group; and the many graduate group coordinators who have occupied the office in 1425 Blockley Hall and iv have done their part to manage the chaos. The U.S. Department of Energy for providing me with a generous fellowship to fund my doctoral work, for which I am truly grateful. Also, the staff at the Krell Institute for their continued stewardship of the fellowship and an awe-inspiring group of young computational scientists. Mentors and colleagues at several other institutions who have contributed to my never-ending education: Sandia National Laboratories; The University of California, Berkeley; Stanford University; The College Preparatory School; and Bentley School. The various and sundry locations where I wrote the bulk of this dissertation: Good Karma; Brew; the Penn Bookstore; Barnes and Noble; Borders; Starbucks; the Walnut Bridge Coffee House; the Coffee Bar at the Radisson; Last Drop Coffee House; the Green Line Cafe; LaVa Cafe; Chapterhouse; Lovers and Madmen; Sweet Endings; Capo Giro; Rittenhouse, Washington, and Franklin Squares; Clark Park; the Schuylkill Banks; Locust Walk; the steps of the National Constitution Center; the basement of the Mellon Independence Center; Andy’s Laundry; the Penn Graduate Student Center; several living rooms, dining rooms, and kitchens; my bed; the shower; the 14th floor of Blockley Hall; the San Leandro Public Library; Delta Airlines flights 3229, 3432, 1643, 1644, 3212, and 2504; the Philadelphia, Minneapolis-St. Paul, Salt Lake City, and Oakland International Airports; SEPTA bus routes 12, 21, 40, 42, and 44 and regional rail route R1; and my office in the Carolyn Lynch Laboratory. And lastly, in order but not in importance: my parents, who from the day I was born taught me through their words, their actions, and their love how to live a life of meaning; and my friends, without whom I’d be a very dull boy indeed. v This work was supported by the Department of Energy Computational Science Grad- uate Fellowship Program of the Office of Science and National Nuclear Security Ad- ministration in the Department of Energy under contract DE-FG02-97ER25308. vi ABSTRACT MODULAR ORGANIZATION AND COMPOSABILITY OF RNA Miler T.S. Lee Supervisor: Junhyong Kim, Ph.D. Life is organized. Organization is largely achieved via composability – that at some level of abstraction, a system consists of smaller parts that serve as building blocks – and modularity – the tendency for these blocks to be independent units that re- combine to form functionally different systems. Here, we explore the organization, composition, and modularity of ribonucleic acid (RNA) molecules, biopolymers that adopt three-dimensional structures according to their specific nucleotide sequence. We address three themes: the efficacy of specific sequences to function as modules or as the context in which modules are inserted; the sources of novel modules in modern genomes; and the resolutions at which functionally relevant modules exist in RNA. First, we investigate the structural modularity of RNA sequences by developing the Self-Containment Index, a method to quantify in silico the degree to which RNA structures deviate in changing genomic contexts. We show that although structural modularity is not a general property of natural RNAs, precursor microRNAs are strongly modular, which we hypothesize is a consequence of their unique biogenesis and evolutionary history. Next, we consider the role of modularity in the regulation of subcellular localiza- tion. We identify a novel module, the ID element retrotransposon, contained in the introns of rat neuronal genes, and demonstrate that it is sufficient to drive localization vii of mRNAs to dendrites via regulated retention of intron sequence. This mechanism shows that introns can provide the context for functional module insertion, and that transposable elements can be co-opted as source material for these modules. As a fur- ther example, we present evidence that a Camk2a localization signal can be mimicked by Alu retrotransposon sequence. Finally, we propose that RNAs can be conceptually decomposed into sets of basic RNA functions. To identify these, we automatically construct an ontology of RNA function using Wikipedia documents. We show that many of the functions encoded in ontology terms are significantly associated with common structural features, high- lighting an underlying structure-function relationship that can be