Discovering Alternative Splicing of Deeply Conserved Exons and Characterizing the Intronome in the Unicellular Yeast S
Total Page:16
File Type:pdf, Size:1020Kb
DISCOVERING ALTERNATIVE SPLICING OF DEEPLY CONSERVED EXONS AND CHARACTERIZING THE INTRONOME IN THE UNICELLULAR YEAST S. POMBE USING LARIAT SEQUENCING A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy By Ali Raza Awan August 2013 © 2013 Ali Raza Awan Discovering Alternative Splicing of Deeply Conserved Exons and Characterizing the Intronome in the Unicellular Yeast S. pombe using Lariat Sequencing Ali Raza Awan, Ph.D. Cornell University 2013 Alternative splicing is a potent regulator of gene expression that vastly increases proteomic diversity in multi-cellular eukaryotes. Although it is widespread in vertebrates, little is known about the evolutionary origins of this process owing in part to the absence of phylogenetically conserved events that cross major eukaryotic clades. The unicellular fission yeast, Schizosaccharomyces pombe, is an organism evolutionarily distant from mammals that nonetheless shares many of the hallmarks of the major mammalian alternative splicing form, exon skipping. Further, S. pombe is a highly genetically tractable organism that has been considered as an attractive potential model system in which to study exon skipping. However, evidence of exon skipping from RNA-seq studies in S. pombe has remained elusive. To better search for such evidence I have developed a novel lariat sequencing approach that offers high sensitivity for detecting splicing events, and applied it to study splicing in S. pombe. Using this approach, I discovered multiple examples of exon skipping, several of which involve exons that are conserved with dozens of animals, plants, other fungi and even protists. Strikingly, some of the specific alternative splicing patterns found in S. pombe are identical in mammals. In addition, I discovered hundreds of novel splicing events, many of which occur in genes that were thought to be intronless. Finally, I present only the second large-scale map of the intronic branch point sequences in any organism, allowing me to test a prediction about differences in splice site recognition in introns from non-coding regions versus those that separate protein-coding exons. BIOGRAPHICAL SKETCH Ali Raza Awan was born on June 29th, 1981 to Munia Rab and Kazim Awan. Ali was born in London, but moved with his family to Pakistan at the age of six months. After three years in his father’s family home in Quetta, Ali moved again, this time to Jeddah, Saudi Arabia, where he spent most of his childhood. It was in Jeddah that Ali developed his love for nature and the outdoors. Situated next to the Red Sea and near the Empty Quarter, Jeddah was ideally situated for the young Ali to explore the riches of the beach as well as the vastness of the desert. Ali spent many a happy weekend with his parents, his brother and family friends picnicking by the seaside, or on trips out into nearby mountains. It was during this time, snorkelling in the Red Sea, collecting seashells and hunting for fossils, that Ali became passionate about the natural world. After moving to Dubai as a teenager, Ali was exposed to new aspects of the science of biology. At the beginning of high school, Ali’s father bought him a book called “Introducing Genetics” by Steve Jones and Borin Van Loom. Little did he know that this book would propel Ali on a path to his future career. Genetics combined Ali’s love of nature with his fondness for mathematics (his mother taught the subject). The Biology A-levels in high iv ! school formalized Ali’s foray into genetics, which was further bolstered by an inspirational biology teacher, Mr Lochhead. After high school, Ali was accepted into Cornell University as an undergraduate, where he began his college career majoring in biology and computer science. On a whim, sometime during the winter of his sophomore year, Ali decided to apply to transfer to Stanford University. Six months later, Ali made the switch and spent three wonderful years soaking up the California sunshine, pursuing a Bachelors degree in Biology and a Masters degree in Bioinformatics, growing his hair, playing the guitar and becoming a hippie of sorts. After graduating and two brief stints at biotech start-ups in St. Louis and Dubai, Ali decided that it was time to get a PhD. Funnily enough, he ended up back where he started: at Cornell University, with many cold winters ahead of him. At Cornell, Ali decided that the lab of newly appointed Assistant Professor Jeff Pleiss was the perfect academic home. The research promised an exciting mix of genomics and bioinformatics where Ali felt that his background would nicely complement the high-throughput methods-based outlook of the lab. When he is not working on science, Ali is interested in music (listening and playing), literature, football (soccer), and spending time with family and friends. v ! ACKNOWLEDGEMENTS I thank my parents Munia and Kazim and my brother Taimur, for all of their love and support, without which none of this would have been possible. This was especially important during some of the times when it felt like winter would never end in Ithaca. I thank Jeff for sharing his immense enthusiasm for doing science, and showing me how to be more positive about my own work. I also thank Jeff for teaching me the importance of effective communication in science, right from my very first department seminar down to the time I presented my work in the RNA Society conference in Kyoto. I owe Eric Alani a huge debt of gratitude for always having an open door and giving me sound advice about how to navigate the challenges of the PhD during some of the times when Jeff unfortunately had to be away. I extend thanks to Andrew Grimson for the similar role he played, and for the encouragement he gave me. I thank John Lis for the support and advice he has given me along the way, especially with regard to my choice of postdoctoral position. vi ! I thank my friends, especially Satyaki Prasad, Syud Ahmed, Gabriel Hoffman, Yin He and Stephane Bentolila for all the good cheer through these six years and just for making Ithaca a lot more interesting in general. Finally, I am grateful to my aunt Bugs Khala and my mum (again) for putting me up and putting up with me to make it possible to write my thesis during an intense two week period in the Upper West side of Manhattan. vii ! TABLE OF CONTENTS CHAPTER 1: INTRODUCTION 1.1 Alternative Splicing is a Key Modulator of Eukaryotic Gene Expression .................................................................................................................... 1 1.2 The Regulation of Alternative Splicing is Multi-Layered ...............................11 1.3 Genome-wide Approaches to Study Alternative Splicing in a Genetically Tractable System .........................................................................................................19 CHAPTER 2: DEVELOPING LARIAT SEQUENCING TO DETECT EXON SKIPPING IN S. POMBE 2.1 Introduction ......................................................................................................... 40 2.2 Results ................................................................................................................... 43 2.3 Discussion ............................................................................................................ 78 2.4 Materials and Methods ....................................................................................... 83 CHAPTER 3: OPTIMIZING LARIAT SEQUENCING TO BETTER PROFILE THE WHOLE S. POMBE INTRONOME 3.1 Introduction ....................................................................................................... 121 3.2 Results ................................................................................................................. 123 3.3 Discussion .......................................................................................................... 145 3.4 Materials and Methods ..................................................................................... 146 CHAPTER 4: MAPPING S. POMBE BRANCHPOINTS ON A GENOMIC SCALE 4.1 Introduction ....................................................................................................... 176 4.2 Results ................................................................................................................. 180 4.3 Methods .............................................................................................................. 195 CHAPTER 5: FUTURE DIRECTIONS 5.1 Srrm1 and Exon Definition ............................................................................ 207 5.2 Circular Exonic RNAs ..................................................................................... 217 viii ! CHAPTER 1: INTRODUCTION 1.1 Splicing is a Key Modulator of Eukaryotic Gene Expression The Nobel Prize-winning discovery of “split genes” in the late 1970’s (Berget, Moore, & Sharp, 1977; Chow, Gelinas, Broker, & Roberts, 1977) dramatically changed our understanding of the architecture of eukaryotic gene structure. While it was already known that non-coding untranslated regions flanked the protein-coding segments of genes, this discovery led to the idea that the protein-coding region itself consisted of coding segments (exons) interrupted by non-coding regions called introns (Cech, 1983). Intronic sequences are typically removed from the RNA transcript in the nucleus via a process called splicing (Padgett, Grabowski, Konarska, Seiler, & Sharp, 1986). The chemical mechanism