Methods in and Applications of the Sequencing of Short Non-Coding RNAs (2013)
University of Pennsylvania ScholarlyCommons
Publicly Accessible Penn Dissertations
2013

Methods in and Applications of the Sequencing of Short Non-Coding RNAs
Paul Ryvkin
University of Pennsylvania, [email protected]

Follow this and additional works at: https://repository.upenn.edu/edissertations
Part of the Bioinformatics Commons, Genetics Commons, and the Molecular Biology Commons

Recommended Citation
Ryvkin, Paul, "Methods in and Applications of the Sequencing of Short Non-Coding RNAs" (2013). Publicly Accessible Penn Dissertations. 922. https://repository.upenn.edu/edissertations/922

This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/922
For more information, please contact [email protected].

Methods in and Applications of the Sequencing of Short Non-Coding RNAs

Abstract
Short non-coding RNAs are important for all domains of life. With the advent of modern molecular biology, their applicability to medicine has become apparent in settings ranging from diagnostic biomarkers to therapeutics and fields ranging from oncology to neurology. In addition, a critical, recent technological development is high-throughput sequencing of nucleic acids. The convergence of modern biotechnology with developments in RNA biology presents opportunities in both basic research and medical settings. Here I present two novel methods for leveraging high-throughput sequencing in the study of short non-coding RNAs, as well as a study in which they are applied to Alzheimer's Disease (AD). The computational methods presented here include High-throughput Annotation of Modified Ribonucleotides (HAMR), which enables researchers to detect post-transcriptional covalent modifications to RNAs in a high-throughput manner. In addition, I describe Classification of RNAs by Analysis of Length (CoRAL), a computational method that allows researchers to characterize the pathways responsible for short non-coding RNA biogenesis.
Lastly, I present an application of the study of non-coding RNAs to Alzheimer's disease. When these methods are applied to the study of AD, it is apparent that several classes of non-coding RNAs, particularly tRNAs and tRNA fragments, show striking changes in the dorsolateral prefrontal cortex of affected human brains. Interestingly, the nature of these changes differs between mitochondrial and nuclear tRNAs, implicating an association between Alzheimer's disease and perturbation of mitochondrial function. In addition, by combining known genetic factors of AD with genes that are differentially expressed and targets of regulatory RNAs that are differentially expressed, I construct a network of genes that are potentially relevant to the pathogenesis of the disease. By combining genetics data with novel results from the study of non-coding RNAs, we can further elucidate the molecular mechanisms that underlie Alzheimer's disease pathogenesis.

Degree Type
Dissertation

Degree Name
Doctor of Philosophy (PhD)

Graduate Group
Genomics & Computational Biology

First Advisor
Li-San Wang

Keywords
Alzheimer's disease, machine learning, non-coding RNA, RNA, RNA modification, sequencing

Subject Categories
Bioinformatics | Genetics | Molecular Biology

This dissertation is available at ScholarlyCommons: https://repository.upenn.edu/edissertations/922

METHODS IN AND APPLICATIONS OF THE SEQUENCING OF SHORT NON-CODING RNAS
Paul Ryvkin
A DISSERTATION in Genomics and Computational Biology
Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
2013

Supervisor of Dissertation
Li-San Wang, Ph.D.
Assistant Professor of Pathology and Laboratory Medicine

Graduate Group Chairperson
Maja Bucan, Ph.D.
Professor of Genetics

Dissertation Committee:
James Eberwine, Ph.D. (Chair)
Professor of Pharmacology
Brian Gregory, Ph.D.
Assistant Professor of Biology
F. Bradley Johnson, M.D., Ph.D.
Associate Professor of Pathology and Laboratory Medicine
Tandy Warnow, Ph.D.
Professor of Computer Sciences at University of Texas at Austin

METHODS IN AND APPLICATIONS OF THE SEQUENCING OF SHORT NON-CODING RNAS
COPYRIGHT 2013 Paul Ryvkin
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.0/

Dedication
This work is dedicated to my loving parents, Mark and Yelena Ryvkin, to whom I owe everything I’ve achieved and am yet capable of achieving.

Acknowledgements
First I must thank my thesis advisor, Li-San Wang, whose careful guidance and perseverance show through in this work. Thanks also go to my thesis committee, who graciously took the time to provide useful input throughout the process. This section would not be complete without thanking all of my former and current labmates, particularly: Fan Li for transforming my ugly hacks into useful apps, Kajia Cao for keeping my head out of the clouds, Fanny Leung for helping with the benchwork and the machine learning algorithms, Otto Valladares for keeping the servers humming, Micah Childress for putting a public face on my software, and everyone else in the Wang lab. My appreciation goes to the staff at the Institute for Biomedical Informatics, particularly Hannah Chervitz and Tiffany Barlow for their organizational prowess. I’d also like to thank all the GCB students I’ve known: the elder for their sage advice, the co-matriculating for their commiseration, and the younger for excellent times had.
In the course of my time at Penn I’ve worked with many, many other researchers, without whom this work would not have been possible. I’d like to thank Brad Johnson, who taught me so many important molecular biology techniques ranging from nucleic acids extraction to keeping the supernatant.
I also thank Brian Gregory and everyone in his lab, particularly Isabelle Dragomir for her help with the sequencing library preparation and Lee Vandivier and Ian Silverman for their help with validating experiments. Thanks also go to Alice Chen-Plotkin for her help with tissue processing, Vivianna Van Deerlin for her help with large-scale RNA extraction, Theresa Schuck for her help with tissue dissection, and Virginia Lee for her ever insightful input. I particularly appreciate everyone at the Center for Neurodegenerative Disease Research and everyone in Gerard Schellenberg’s lab for being gracious hosts for much of this work. A special thank-you goes to John Trojanowski and the Institute on Aging for providing the funding for the Alzheimer’s study, which yielded almost all of the data necessary for this work. Additional funding came from the National Institutes of Health, the National Institute of General Medical Sciences, the National Human Genome Research Institute, the National Institute on Aging, the Penn Alzheimer’s Disease Center, and the National Science Foundation.
Finally I’d like to thank my office in Blockley Hall for sheltering my computer from the elements, my bicycle for faithfully transporting me from point A to point B, my two cats for being endlessly fascinating felids, the never-boring city of Philadelphia, the food and company at Grace Tavern, the scenery of Rittenhouse Park, and the game of Bridge (special thanks to Kathleen Sprouffske, Miler Lee, Rumen Kostadinov, and Aaron Goodman). I also thank my wonderful girlfriend Chrystelle Browman for supporting me despite the unique challenges of dating a PhD student.

ABSTRACT
METHODS IN AND APPLICATIONS OF THE SEQUENCING OF SHORT NON-CODING RNAS
Paul Ryvkin
Li-San Wang
Short non-coding RNAs are important for all domains of life.
With the advent of modern molecular biology, their applicability to medicine has become apparent in settings ranging from diagnostic biomarkers to therapeutics and fields ranging from oncology to neurology. In addition, a critical, recent technological development is high-throughput sequencing of nucleic acids. The convergence of modern biotechnology with developments in RNA biology presents opportunities in both basic research and medical settings. Here I present two novel methods for leveraging high-throughput sequencing in the study of short non-coding RNAs, as well as a study in which they are applied to Alzheimer’s Disease (AD). The computational methods presented here include High-throughput Annotation of Modified Ribonucleotides (HAMR), which enables researchers to detect post-transcriptional covalent modifications to RNAs in a high-throughput manner. In addition, I describe Classification of RNAs by Analysis of Length (CoRAL), a computational method that allows researchers to characterize the pathways responsible for short non-coding RNA biogenesis. Lastly, I present an application of the study of non-coding RNAs to Alzheimer’s disease. When these methods are applied to the study of AD, it is apparent that several classes of non-coding RNAs, particularly tRNAs and tRNA fragments, show striking changes in the dorsolateral prefrontal cortex of affected human brains. Interestingly, the nature of these changes differs between mitochondrial and nuclear tRNAs, implicating an association between Alzheimer’s disease and perturbation of mitochondrial function. In addition, by combining known genetic factors of AD with genes that are differentially expressed and targets of regulatory RNAs that are differentially expressed, I construct a network of genes that are potentially relevant to the pathogenesis of the disease. By combining genetics data with novel results from the study of non-coding RNAs, we can