Identifying RNA-Protein Interaction Sites Throughout Eukaryotic Transcriptomes
Total Page:16
File Type:pdf, Size:1020Kb
University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations 2015 Identifying RNA-Protein Interaction Sites Throughout Eukaryotic Transcriptomes Ian M. Silverman University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/edissertations Part of the Bioinformatics Commons, Biology Commons, and the Genetics Commons Recommended Citation Silverman, Ian M., "Identifying RNA-Protein Interaction Sites Throughout Eukaryotic Transcriptomes" (2015). Publicly Accessible Penn Dissertations. 2016. https://repository.upenn.edu/edissertations/2016 This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/2016 For more information, please contact [email protected]. Identifying RNA-Protein Interaction Sites Throughout Eukaryotic Transcriptomes Abstract Gene expression is regulated at both the transcriptional and post-transcriptional levels. While transcription controls only the rate of RNA production, numerous and diverse mechanisms regulate the processing, stability and translation of RNAs at the post-transcriptional level. At the heart of this regulation are RNA-binding proteins (RBPs) and their RNA targets. Thousands of RBPs are encoded in mammalian genomes, each with hundreds to thousands of RNA targets. Therefore, cataloging these interactions represents a significant challenge. Recent advances in high-throughput sequencing technologies have greatly expanded the toolkit that researchers have to probe RNA-protein interactions, but these technologies are still in their infancy and thus new methods and applications are required to move our understanding forward. We developed a novel, high-throughput approach to globally identify regions of RNAs that interact with proteins throughout a transcriptome of interest. We applied this technique to human HeLa cells and provide evidence that our approach captures both known and novel RNA-protein interaction sites. We identified global patterns of RNA-protein interactions, found evidence for co-binding of functionally related genes, and revealed that disease associated single-nucleotide polymorphisms are enriched within protein interaction sites. We also performed detailed analysis of the RNA targets for two specific RBPs; oly(A)-bindingP protein cytoplasmic 1 (PABPC1) and Argonaute (AGO). First, we used CLIP-seq to generate a transcriptome-wide map of PABPC1 interaction sites in the mouse transcriptome. This analysis revealed that PABPC1 binds directly to the highly conserved polyadenylation signal sequence and to translation initiation and termination sites. We also showed that PABPC1 binds to A-rich regions in the 5’ untranslated region of a subset of messenger RNAs (mRNAs) and negatively regulates their gene expression. Finally, we applied a recently developed approach to isolate and sequence AGO-bound microRNA precursors (pre-miRNAs). We uncovered widespread trimming and tailing, identified novel intermediates and created an index for pre-miRNA processing efficiency. We discovered that numerous pre-miRNA-like elements are embedded within mRNAs, but do not produce functional small RNAs. In total, these studies provide several advances in our understanding of the global landscape of RNA-protein interactions and serve as a foundation for future mechanistic studies. Degree Type Dissertation Degree Name Doctor of Philosophy (PhD) Graduate Group Cell & Molecular Biology First Advisor Brian D. Gregory Keywords AGO, CLIP-seq, microRNA, PABPC1, PIP-seq, RNA-binding protein Subject Categories Bioinformatics | Biology | Genetics This dissertation is available at ScholarlyCommons: https://repository.upenn.edu/edissertations/2016 IDENTIFYING RNA-PROTEIN INTERACTION SITES THROUGHOUT EUKARYOTIC TRANSCRIPTOMES Ian Michael Silverman A DISSERTATION in Cell and Molecular Biology Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy 2015 Supervisor of Dissertation __________________________________ Brian D. Gregory Assistant Professor of Biology Graduate Group Chairperson __________________________________ Daniel S. Kessler, Ph.D. Associate Professor of Cell and Developmental Biology Dissertation Committee Kristen W. Lynch, Ph.D. (Chair) Professor of Biochemistry and Biophysics Nancy M. Bonini, Ph.D. Florence R.C. Murray Professor of Biology Stephen A. Liebhaber, M.D. Professor of Genetics Christopher D. Brown Assistant Professor of Genetics IDENTIFYING RNA-PROTEIN INTERACTION SITES THROUGHOUT EUKARYOTIC TRANSCRIPTOMES COPYRIGHT 2015 Ian Michael Silverman This work is licensed under the Creative Commons Attribution- NonCommercial-ShareAlike 3.0 License To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.0/ ACKNOWLEDGMENTS The contents of this dissertation would not have been possible without the assistance and support of so many people. I am forever grateful to my advisor Brian Gregory, who has ferociously lead and mentored me over the past 5 years. Thank you for teaching me “to go where the biology takes me” and that a failed experiment is just the start of a new project. Thank you also to my committee members, who have guided me through the Ph.D. process and encouraged me to pursue multiple research interests during my training. I would also like to thank my undergraduate mentors; David Davies and Claudia Marques at Binghamton University and Jens Gundlach and Marcus Collins at the University of Washington. Thanks for giving me the opportunity to do exciting science and inspiring me to pursue a graduate degree. There is no doubt that I am indebted to my labmates, both former and current; Isabelle Dragomir and Matthew Willmann, for teaching me to work with RNA and basic molecular biology techniques; Fan Li, Nate Berkowitz and Sager Gosai for being patient with me as I learned programming and bioinformatics skills and for being excellent collaborators and friends. Thanks also to Lee Vandivier, Qi Zheng, Shawn Foley, and Xiang Yu, Anissa Alexander, Steve Anderson and Lucy Shan for your support and for keeping the lab a fun and interesting place to work. My dissertation research could not have been completed without the assistance of members of other laboratories. Thanks go to John Rinn, Cole Trapnell and Loyal Goff, who helped to develop the concept of PIP-seq and initiated the experiments. I’d also like to thank members of the Liebhaber lab, especially Hemant Kini, Xinjun Ji and Lou Ghanem for being excellent collaborators and also mentoring me during my time at Penn. I look forward to our future collaborations. Thanks also to members of the Mourelatos lab including Nikolaus Vrettos and Anastasios (Tassos) Vourekas for their assistance with microRNA immunoprecipitation experiments. In addition to my dissertation work, I have had the honor of working with many excellent collaborators from a wide array of laboratories, both at Penn and other institutions. Thanks go to iii Li-San Wang and Paul Ryvkin, for letting me contribute to their project on RNA modifications. Many thanks to Muredach Reilly, Mingyao Li and their lab members, especially Hanrui Zhang and Yichuan Liu for collaborative work on transcriptome profiling in human blood and adipose tissue. It was a pleasure working with David Fredrick and Joseph Baur on transcriptome profiling of a mouse model of muscular dystrophy. Thanks to Xiafeng Cao and Toalan Zhao for allowing me to analyze one of the first Arabidopsis CLIP-seq datasets. Also thanks to Henry Daniell, John Cupp and Elise Van Buskirk for working with us on transcriptome analysis of squalene producing Tobacco plants. All of my work has been dependent on the hard work and dedication of two high- throughput sequencing cores. The late Penn Genome Frontiers Institute sequencing core, run by Jeanne Geskes and staffed by Davinder Sandu and Amber Kiliti. Also, thanks to the Next Generation Sequencing Core, their tireless leader Jonathan Schug, and members of his staff, especially Allan Fox, Joe Grubb, Christina Theodorou and Haleigh Zillges. Many thanks to the CAMB coordinators, especially Meagan Schofer, for keeping our graduate program running like a well-oiled machine. I would also like to thank my classmates and friends in GGR, CAMB and other programs at Penn. You have made graduate school an amazing experience and I’ve learned so much from each of you. Thanks to all of my non-science friends who have supported my interest in science and pretended to be interested when I discussed RNA at length. I must thank my family for their love and support. To my parents, Sue and Dan, thank you for encouraging me to ask questions and providing me with endless opportunities to learn. Thanks to my brother Eric, and his wife Lauren, for setting an example and demonstrating that hard work and dedication can pay off. Thanks also to my sister Julie, who showed me that an interest in science is not equivalent to an interest in medicine and for paving the way for my scientific career. Katie, thank you for keeping me balanced through my time in graduate school. I don’t know how I would have made it through alone, but I’m happy I don’t have to find out. iv ABSTRACT IDENTIFYING RNA-PROTEIN INTERACTION SITES THROUGHOUT EUKARYOTIC TRANSCRIPTOMES Ian Michael Silverman Brian D. Gregory, Ph.D. Gene expression is regulated at both the transcriptional and post-transcriptional levels. While transcription controls only the rate of RNA production, numerous and diverse mechanisms regulate the processing,