Prevalence and Significance of Nonsense-Mediated mRNA Decay
Prevalence and Significance of Nonsense-Mediated mRNA Decay Coupled with Alternative Splicing in Diverse Eukaryotic Organisms

By

Courtney Elizabeth French

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Molecular and Cell Biology in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor Steven E. Brenner, Co-Chair
Professor Donald C. Rio, Co-Chair
Professor Britt A. Glaunsinger
Professor Sandrine Dudoit

Spring 2016

Copyright 2016 by Courtney Elizabeth French

Abstract

Alternative splicing plays a crucial role in increasing protein diversity and in regulating gene expression at the post-transcriptional level. In humans, almost all genes produce more than one mRNA isoform, and many other species also have a substantial, though variable, fraction of alternatively spliced genes. Alternative splicing is regulated by splicing factors, often in a developmental-stage- or tissue-specific manner. Misregulation of alternative splicing, via mutations in splice sites, splicing regulatory elements, or splicing factors, can lead to disease states, including cancers. Thus, characterizing how alternative splicing shapes the transcriptome will yield greater insight into the regulation of numerous cellular pathways and many aspects of human health.

A critical tool for investigating alternative splicing is high-throughput mRNA sequencing (RNA-seq).
This technology produces hundreds of millions of short (~100 bp) sequencing reads from mRNA molecules and can be used both to discover novel transcripts and to quantify their expression. Although short read length is a limitation of the technology in its current form, RNA-seq has led to the discovery of hundreds of thousands of new transcripts and has revealed how much complexity alternative splicing adds to the transcriptome, particularly in humans. Here, I used RNA-seq analysis to investigate the global effect of post-transcriptional regulation via alternative splicing coupled to nonsense-mediated mRNA decay, and to examine natural human variation in alternative splicing, particularly in genes associated with differential therapeutic drug response.

The nonsense-mediated mRNA decay pathway (NMD), which degrades transcripts containing a premature termination codon, plays an important role in post-transcriptional gene regulation when coupled to alternative splicing. If a gene produces an alternative isoform that is targeted by NMD, the abundance of its protein-producing transcripts can be regulated post-transcriptionally at the level of alternative splicing. This mechanism has been shown to be important in the regulation of a number of genes, including many of the splicing factors themselves. I used RNA-seq analysis of cells in which NMD had been inhibited to discover, on a genome-wide scale, alternative isoforms that are NMD targets in humans and in several other diverse eukaryotic species. I found that around 20% of expressed human genes are potentially regulated by alternative splicing coupled to NMD and that these genes fall into many different functional categories. I also found that hundreds to thousands of genes produce NMD-targeted alternative isoforms in each of frog, zebrafish, fly, fission yeast, and plant, highlighting the prevalence of this relatively understudied mode of gene regulation across the three major branches of eukaryotic organisms.
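The abstract does not spell out how a termination codon is judged "premature." A widely used heuristic, the 50-nt rule, predicts NMD when the stop codon lies more than about 50 nucleotides upstream of the last exon-exon junction, since an exon junction complex left downstream of the stop codon is thought to trigger decay. The Python sketch below is a minimal illustration under stated assumptions, not the pipeline used in this dissertation: the `Transcript` class, the helper names, the 0-based transcript coordinates, and the choice to measure distance from the last nucleotide of the stop codon are all hypothetical conventions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transcript:
    """Hypothetical minimal transcript model (illustration only).

    exon_lengths: lengths of the exons in 5'->3' transcript order.
    stop_codon_start: 0-based transcript coordinate of the first
        nucleotide of the annotated stop codon.
    """
    exon_lengths: List[int]
    stop_codon_start: int

    def __post_init__(self) -> None:
        if not self.exon_lengths or any(n <= 0 for n in self.exon_lengths):
            raise ValueError("exon lengths must be positive")
        if not 0 <= self.stop_codon_start <= sum(self.exon_lengths) - 3:
            raise ValueError("stop codon must lie within the transcript")


def junction_positions(exon_lengths: List[int]) -> List[int]:
    """Return the transcript coordinate where each downstream exon begins,
    i.e. the position of every exon-exon junction."""
    positions = []
    total = 0
    for length in exon_lengths[:-1]:  # the last exon has no downstream junction
        total += length
        positions.append(total)
    return positions


def three_prime_utr_length(tx: Transcript) -> int:
    """Length of the 3' UTR: the nucleotides after the stop codon."""
    return sum(tx.exon_lengths) - (tx.stop_codon_start + 3)


def is_predicted_nmd_target(tx: Transcript, threshold: int = 50) -> bool:
    """50-nt rule (assumed convention): predict NMD when the last exon-exon
    junction lies more than `threshold` nt downstream of the stop codon's end."""
    junctions = junction_positions(tx.exon_lengths)
    if not junctions:
        return False  # single-exon transcript: no downstream junction
    stop_end = tx.stop_codon_start + 3  # first nucleotide after the stop codon
    return junctions[-1] - stop_end > threshold


if __name__ == "__main__":
    # Same three-exon structure, stop codons at different positions.
    exons = [100, 200, 150]
    for stop in (90, 280, 350):
        tx = Transcript(exons, stop)
        print(stop, three_prime_utr_length(tx), is_predicted_nmd_target(tx))
```

For a three-exon transcript with exons of 100, 200, and 150 nt, a stop codon starting at position 90 is flagged, while one starting at position 280 is not, even though it still leaves a 167-nt 3' UTR that spans a junction. The rule keys on junction position rather than on 3' UTR length, which is the distinction this abstract draws when comparing predictors of NMD.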
I also gained insight into the features that define NMD targets, which are thought to vary between species, although this question remains unresolved. I found that an exon-exon junction downstream of the termination codon is a much stronger predictor of NMD than 3’ UTR length in every species except yeast.

I also used RNA-seq to investigate alternative splicing in genes of pharmacologic importance. Natural human variation in the expression level and activity of genes involved in drug disposition and action (“pharmacogenes”) can affect drug response and toxicity. Previous studies have relied primarily on microarrays to characterize gene expression differences, or have focused on a single tissue or a small number of samples. Here, we used RNA-seq to determine the expression levels and alternative splicing of 389 selected pharmacogenes across four pharmacologically relevant tissues (liver, kidney, heart, and adipose) and lymphoblastoid cell lines (LCLs), which are widely used in pharmacogenomics studies. Analysis of data from 18 individuals for each of the 5 tissue types (90 samples in total) revealed substantial variation in both expression levels and splicing across samples and tissue types. Comparison with an independent RNA-seq dataset yielded a consistent picture. This in-depth exploration also revealed 183 previously unannotated splicing events in pharmacogenes. Overall, this study serves as a rich resource for the research community to inform biomarker discovery, drug discovery, and drug use.*

In conclusion, the roles of alternative splicing and NMD in the regulation of cellular processes and in human health are wide-open but critical fields of study. Advances in sequencing technologies have had, and will continue to have, a major impact on studies of these mechanisms. New long-read technologies will likely soon be readily available and promise to greatly increase our ability to interpret RNA-seq results accurately.
As the cost of sequencing continues to fall, ever more data will be generated, giving a clearer view of how the transcriptome varies between individuals and shapes differences in disease risk and drug response.

* This paragraph was co-written with Aparna Chhibber and Sook Wah Yee and modified from previously published work: Chhibber A*, French CE*, Yee SW*, Gamazon ER*, et al. 2016. Transcriptomic variation of pharmacogenes in multiple human tissues and lymphoblastoid cell lines. Pharmacogenomics Journal. (*co-first authors)

Table of Contents

CHAPTER 1 USING RNA-SEQ ANALYSIS FOR ISOFORM EXPRESSION AND ALTERNATIVE SPLICING INVESTIGATIONS
ABSTRACT
INTRODUCTION TO RNA-SEQ
READ ALIGNMENT
ASSEMBLY OF NOVEL TRANSCRIPTS
QUANTIFYING TRANSCRIPT EXPRESSION
USING A REFERENCE GENE ANNOTATION
IMPACT OF READ DEPTH
READ DEPTH VERSUS NUMBER OF SAMPLES
CONCLUSIONS
REFERENCES

CHAPTER 2 TRANSCRIPTOME ANALYSIS OF ALTERNATIVE SPLICING COUPLED WITH NONSENSE MEDIATED MRNA DECAY IN HUMAN CELLS
ABSTRACT
INTRODUCTION
RESULTS
Over two thousand genes produce transcripts identified as confident NMD targets
Diverse categories of genes are affected by NMD, particularly splicing
NMD-targeted genes are enriched for ultraconserved elements
The 50nt Rule is a strong predictor of NMD while a longer 3’ UTR has little effect
Premature termination codons generated by alternative splicing events or uORFs
DISCUSSION
MATERIALS AND METHODS