Ubiquitously Transcribed Genes Use Alternative Polyadenylation to Achieve Tissue-Specific Expression
RESOURCE/METHODOLOGY

Steve Lianoglou,1,2 Vidur Garg,3 Julie L. Yang,1 Christina S. Leslie,1 and Christine Mayr3,4

1Computational Biology Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA; 2Physiology, Biophysics, and Systems Biology Graduate Program, Weill Cornell Medical College, New York, New York 10021, USA; 3Cancer Biology and Genetics Program, Memorial Sloan-Kettering Cancer Center, New York, New York 10065, USA

More than half of human genes use alternative cleavage and polyadenylation (ApA) to generate mRNA transcripts that differ in the lengths of their 3′ untranslated regions (UTRs), thus altering the post-transcriptional fate of the message and likely the protein output. The extent of 3′ UTR variation across tissues and the functional role of ApA remain poorly understood. We developed a sequencing method called 3′-seq to quantitatively map the 3′ ends of the transcriptome of diverse human tissues and isogenic transformation systems. We found that cell type-specific gene expression is accomplished by two complementary programs. Tissue-restricted genes tend to have single 3′ UTRs, whereas a majority of ubiquitously transcribed genes generate multiple 3′ UTRs. During transformation and differentiation, single-UTR genes change their mRNA abundance levels, while multi-UTR genes mostly change 3′ UTR isoform ratios to achieve tissue specificity. However, both regulation programs target genes that function in the same pathways and processes that characterize the new cell type. Instead of finding global shifts in 3′ UTR length during transformation and differentiation, we identify tissue-specific groups of multi-UTR genes that change their 3′ UTR ratios; these changes in 3′ UTR length are largely independent from changes in mRNA abundance.
Finally, tissue-specific usage of ApA sites appears to be a mechanism for changing the landscape targetable by ubiquitously expressed microRNAs.

[Keywords: alternative polyadenylation; tissue-specific regulation of gene expression; transcriptome analysis; 3′ UTR isoform; gene regulation; computational biology]

Supplemental material is available for this article.

Received August 24, 2013; revised version accepted September 17, 2013.

4Corresponding author
E-mail [email protected]
Article published online ahead of print. Article and publication date are online at http://www.genesdev.org/cgi/doi/10.1101/gad.229328.113. Freely available online through the Genes & Development Open Access option.

© 2013 Lianoglou et al. This article, published in Genes & Development, is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.

GENES & DEVELOPMENT 27:000–000 Published by Cold Spring Harbor Laboratory Press; ISSN 0890-9369/13; www.genesdev.org

Protein expression is determined by the rate of transcription and by post-transcriptional processes that lead to changes in mRNA transport, stability, and translational efficiency. These post-transcriptional processes are mediated by RNA modifications, secondary structure, microRNAs (miRNAs), and RNA-binding proteins that recognize regulatory elements located in the 3′ untranslated regions (UTRs) of transcripts (Bartel 2004; Sonenberg and Hinnebusch 2009; Darnell 2010; Ascano et al. 2012; Meyer et al. 2012; Vogel and Marcotte 2012). It was recently discovered that a large fraction of human genes generate alternative mRNA isoforms that differ in the length of their 3′ UTRs due to the recognition of alternative cleavage and polyadenylation (ApA) sites (Tian et al. 2005; Flavell et al. 2008; Sandberg et al. 2008; Ji et al. 2009; Mayr and Bartel 2009). As a consequence, changes in the relative abundance of 3′ UTR isoforms determine whether the regulatory elements that enable post-transcriptional regulation are largely present or absent from the mRNA. It has been shown for specific genes that the shorter mRNA isoforms escape regulation by miRNAs and other RNA-binding proteins and can produce as much as 40-fold more protein (Mayr and Bartel 2009). There are reports in the literature that show that the longer mRNA isoforms can produce more protein or that both mRNA isoforms produce comparable amounts of protein (Ranganathan et al. 1995; An et al. 2008). However, in the majority of tested genes, the shorter mRNA isoforms produce more protein (Wiestner et al. 2007; Sandberg et al. 2008; Mayr and Bartel 2009; Singh et al. 2009; Akman et al. 2012; Martin et al. 2012; Bava et al. 2013). Thus, induction of widespread changes in the ratio of ApA isoform abundance is a gene regulatory process that can have extensive consequences for gene expression, as it sets the stage for post-transcriptional regulation.

The first transcriptome-wide studies on ApA reported that proliferation or oncogenic transformation was associated with a shift toward generation of shorter mRNA isoforms, whereas differentiation seemed to correlate with lengthening of 3′ UTRs (Flavell et al. 2008; Sandberg et al. 2008; Ji et al. 2009; Mayr and Bartel 2009; Elkon et al. 2012, 2013; Lin et al. 2012; Tian and Manley 2013). Furthermore, specific tissues appeared to produce overall shorter or longer 3′ UTRs (Zhang et al. 2005; Ramskold et al. 2009; Shepard et al. 2011; Li et al. 2012; Smibert et al. 2012; Ulitsky et al. 2012). Recently, several 3′ end sequencing methods were published. Whereas Derti et al. (2012) showed that their protocol is quantitative with respect to mRNA abundance levels, none of the protocols was established to be quantitative with respect to 3′ UTR isoform expression (Fu et al. 2011; Jan et al. 2011; Derti et al. 2012; Elkon et al. 2012; Lin et al. 2012; Hoque et al. 2013). In the present study, we performed a quantitative and statistically rigorous analysis of alternative 3′ UTR isoform expression across a large number of human tissues as well as in isogenic cell transformation experiments and uncovered a far more complex picture of ApA than in these early reports.

Using 3′-seq, we built an atlas of human polyadenylation (pA) cleavage events that contains a large majority of functional pA sites. We extensively validated our sequencing results and demonstrate that the number of isoforms detected as well as the relative abundance ratio of alternative 3′ UTR isoforms is consistent with results obtained by Northern blots. We then applied a new computational approach to identify significant tissue- and condition-specific differences in 3′ UTR isoform levels that takes into account changes in mRNA level and the variation between biological replicates. This comprehensive analysis revealed that the generation of alternative 3′ UTR isoforms is a characteristic of ubiquitously transcribed genes that are involved in diverse gene regulatory processes and are distinct from the classical housekeeping genes that generate single UTRs. However, like housekeeping genes, this new class of ubiquitously transcribed genes is less stringently regulated at the transcriptional level. Instead, ubiquitously transcribed multi-UTR genes use differential abundance of their 3′ UTR isoforms to achieve tissue- and context-dependent expression. They also have permissive proximal pA sites that enable read-through transcription to allow inclusion of 3′ UTR regula- [...] -tional regulation. We also demonstrate that differentiation or transformation leads to changes in mRNA abundance of single-UTR genes as well as alterations in 3′ UTR isoform levels of multi-UTR genes. These changes are specific to the conditions analyzed, and, remarkably, both groups of genes are involved in the pathways and processes characteristic of the new cell state, even though multi-UTR genes largely do not alter mRNA levels. If alternative 3′ UTRs indeed confer differential protein expression, alterations in mRNA abundance and changes in 3′ UTR isoform ratios cooperate in accomplishing the activated expression program.

Although we have been able to measure transcriptome-wide mRNA expression levels for more than a decade, deconvolving mRNA length from abundance has only recently been made possible through directed sequencing methods like 3′-seq. Our analysis identifies a key component of gene expression programs, the global changes in 3′ UTR isoform expression, that has been largely invisible until now.

Results

The 3′ cleavage events of genes are the same across tissues

We hypothesized that the differential usage of 3′ UTR isoforms is a coordinated gene expression program to regulate protein levels. To uncover the extent of this potential regulatory program, we set out to quantitatively map the 3′ ends of mRNAs in diverse human tissues and cell lines at single-nucleotide resolution. To this end, we developed a next-generation sequencing protocol called 3′-seq (Fig. 1A) and applied it to 14 human samples, including testis, ovary, brain, breast, skeletal muscle, naïve B cells, embryonic stem (ES) cells, and human cell lines (NTERA2, HeLa, MCF7, MCF10A, and HEK293) (Supplemental Table 1). In total, we obtained 98 million reads uniquely mapping to the genome, with the vast majority mapping to known 3′ UTRs (Supplemental Fig. 1A). We implemented a computational pipeline (see the Supplemental Material) to identify peaks representing 3′ ends and removed peak events due to internal priming (Supplemental Fig. 1B). We focused our analysis on mRNA 3′ ends that mapped to annotated 3′ UTRs or were located within 5 kb downstream from RefSeq gene annotations and found 2187 genes to have longer 3′ UTRs (>20 nucleotides [nt]; median, 1870 nt) than annotated. For seven of these genes, we tested by Northern blot analysis whether the 3′ ends were connected to the upstream genes and confirmed them for all seven genes (Supplemental Fig. 1C; data not shown). This shows that our data can be used to reannotate human 3′ UTRs. We used the 3′ UTR lengths from our atlas for downstream analyses.

To identify the human genes that generate single or multiple 3′ UTRs, we generated an atlas of cleavage events (80,371 3′ ends for 13,718 genes) from all of our samples.
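As an illustration only, the distinction between single-UTR and multi-UTR genes drawn from such an atlas can be sketched in Python. The input layout (gene, cleavage position, read count), the function name classify_utr_genes, the gene labels, and the 10% minimum usage cutoff are all hypothetical assumptions made for this sketch; they are not the criteria of the 3′-seq pipeline, which is described in the Supplemental Material.

```python
from collections import defaultdict


def classify_utr_genes(events, min_fraction=0.10):
    """Label genes as single-UTR or multi-UTR from cleavage-event counts.

    events: iterable of (gene, cleavage_position, read_count) tuples.
    min_fraction: hypothetical cutoff; a 3' end counts as "used" only if it
    carries at least this fraction of the gene's reads (an assumption of
    this sketch, not a value taken from the paper).
    """
    # Pool read counts per gene and per cleavage position across samples.
    reads_per_site = defaultdict(lambda: defaultdict(int))
    for gene, position, reads in events:
        reads_per_site[gene][position] += reads

    summary = {}
    for gene, sites in reads_per_site.items():
        total = sum(sites.values())
        if total == 0:
            continue  # no reads at all: the gene cannot be classified
        # Relative usage of each 3' end within the gene.
        usage = {pos: n / total for pos, n in sites.items()}
        n_used = sum(1 for frac in usage.values() if frac >= min_fraction)
        label = "multi-UTR" if n_used > 1 else "single-UTR"
        summary[gene] = (label, usage)
    return summary


# Toy input with made-up gene labels, positions, and counts.
events = [
    ("GENE_A", 1000, 80),
    ("GENE_A", 2500, 20),
    ("GENE_B", 400, 50),
    ("GENE_C", 100, 95),
    ("GENE_C", 900, 5),
]
for gene, (label, usage) in sorted(classify_utr_genes(events).items()):
    print(gene, label, {pos: round(frac, 2) for pos, frac in sorted(usage.items())})
```

Working with within-gene usage fractions rather than raw counts keeps the classification separate from overall mRNA abundance, which is the distinction the text draws between changes in isoform ratios and changes in expression level.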