Analysis of Intronic and Exonic Reads in RNA-Seq Data Characterizes Transcriptional and Post-Transcriptional Regulation

AN A LYSIS Analysis of intronic and exonic reads in RNA-seq data characterizes transcriptional and post-transcriptional regulation Dimos Gaidatzis1,2,4, Lukas Burger1,2,4, Maria Florescu1–3 & Michael B Stadler1,2,4 RNA-seq experiments generate reads derived not only from alignment14,15, transcript assembly16,17, transcript quantification14,18 mature RNA transcripts but also from pre-mRNA. Here we and differential expression analysis19–21. Although RNA-seq mostly present a computational approach called exon-intron split generates reads that map to exons, it also captures less abundant analysis (EISA) that measures changes in mature RNA and intronic sequences6. However, their interpretation has remained pre-mRNA reads across different experimental conditions to controversial. Some have suggested that they originate from DNA quantify transcriptional and post-transcriptional regulation contamination and can thus be used as a quality metric for RNA-seq of gene expression. We apply EISA to 17 diverse data sets to data22 (see also RNA-seq guidelines of the Roadmap Epigenomics show that most intronic reads arise from nuclear RNA and Consortium, http://www.roadmapepigenomics.org/), whereas others changes in intronic read counts accurately predict changes have hypothesized that they stem from unknown exons or intronic in transcriptional activity. Furthermore, changes in post- enhancers6,7. In a study based on exon arrays, probes mapping to transcriptional regulation can be predicted from differences introns were used to investigate pre-mRNA dynamics23. Three between exonic and intronic changes. EISA reveals both recent studies based on RNA-seq provided evidence that intronic transcriptional and post-transcriptional contributions to reads might correlate with transcriptional activity. In two of these, expression changes, increasing the amount of information that the read coverage along introns was related to nascent transcrip- can be gained from RNA-seq data sets. tion in combination with co-transcriptional splicing events24 and Nature America, Inc. All rights reserved. America, Inc. Nature 5 later was used to fit a detailed transcriptional model within a single Cellular RNAs are regulated at multiple stages, including transcrip- sample25. More recently, levels of exonic reads were found to lag © 201 tion, RNA maturation and degradation. Several analytic methods 15 min behind levels of intronic reads for a set of oscillating tran- have been developed to measure these processes on a transcriptome- scripts during Caenorhabditis elegans development, suggesting that wide scale. For example, global run-on sequencing (GRO-seq)1 uses intronic levels are a proxy for nascent transcription26. npg incorporation of a nucleotide analog to enrich for nascent RNA. In Here we analyze 17 published RNA-seq data sets covering a wide Nascent-seq2,3, newly transcribed RNAs are isolated by purification range of cell types and perturbations. We find that changes in intronic of their complex with proteins and the DNA template. Cellular frac- read counts are not technical artifacts but instead directly measure tionation techniques4 have also been adapted to measure nascent changes in transcriptional activity. Using EISA to compare intronic transcripts, which are enriched in the nucleus. mRNA half-lives have and exonic changes across different experimental conditions allows been determined, for example, by blockage of transcription followed the separation of transcriptional and post-transcriptional contribu- by transcriptional profiling5. RNA sequencing, the most widely used tions to observed changes in steady-state RNA levels, without the need method for transcriptome analysis, has been applied in numerous for additional experiments such as GRO-seq1 or Nascent-seq2,3. This studies to determine steady-state mRNA levels6,7 and alternative approach opens up the possibility of using standard RNA-seq experi- splicing events8 and to identify previously unknown transcripts and ments to determine whether expression changes observed in gene noncoding RNAs9–11. In general, these protocols aim to enrich for knockout, knock-down or overexpression experiments are caused by mature mRNA by selection of polyadenylated RNA or by depletion transcriptional or post-transcriptional mechanisms, thereby provid- of ribosomal RNA. ing information pertinent to gene function. Many computational methods (reviewed in refs. 12,13) have been developed for the analysis of RNA-seq data, to enable spliced RESULTS Quantifying expression changes in exons and introns 1Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland. Given that exonic reads essentially reflect mature cytoplasmic 2Swiss Institute of Bioinformatics, Basel, Switzerland. 3University of Basel, Basel, Switzerland. 4These authors contributed equally to this work. mRNAs, we determined whether in silico separation of exonic and Correspondence should be addressed to D.G. ([email protected]) or intronic reads from a standard RNA-seq experiment could be used L.B. ([email protected]) or M.B.S. ([email protected]). to separate transcriptional and post-transcriptional contributions to Received 10 April 2014; accepted 22 May 2015; published online 22 June 2015; expression changes. In our approach, expression changes across con- doi:10.1038/nbt.3269 ditions were quantified separately for exonic (∆exon) and intronic 722 VOLUME 33 NUMBER 7 JULY 2015 NATURE BIOTECHNOLOGY A N A LYSIS Figure 1 Comparison of exonic and intronic changes for single genes a under controlled transcriptional and post-transcriptional perturbations. ∆intron (a) Illustration of the computational approach. mRNA is displayed in the form of nascent unspliced or partially spliced transcripts (nucleus) and mature mRNAs (cytoplasm). RNA-seq reads that map to annotated transcripts are separated into exonic (blue) and intronic reads (red). ∆exon The differences in exonic (∆exon) and intronic (∆intron) read counts between experimental conditions are then quantified and compared to each other. (b) ∆exon and ∆intron after treatment of human A549 ∆exon b 4 c 4 intron ) ∆ cells with the glucocorticoid dexamethasone. Five target genes of the 2 176/30 glucocorticoid receptor known to be upregulated in a receptor-dependent 2,402/470 356/83 ) 27 753/238 2 manner are shown. Numbers indicate the exonic and intronic read 1,826/567 2 4,615/1,662 2 66/24 1,705/718 536/265 counts with and without treatment. (c) Same as in b for siRNA-target 54/24 122/70 genes in five different knock-down experiments. 1,195/876 0 0 223/1,141 115/131 911/5,312 160/382 99/1,728 95/92 2,416/4,643 138/134 –2 exon –2 reads (∆intron) and the relationship between ∆exon and ∆intron ∆ siRNA – control (log was investigated (Fig. 1a). To test the feasibility of the approach, we ∆intron monitored the dynamics of intron and exon levels of single genes Response to dexamethasone (log –4 –4 under well-characterized transcriptional and post-transcriptional HUR ELL2 ADAR ZFP36 SNRPC ERRFI1 TIPARP ZNF143 perturbations. We first calculated expression changes of glucocor- YTHDF2 ticoid receptor target genes upon stimulation of human A549 cells CDC42EP3 with glucocorticoid dexamethasone27. From a set of genes shown previously to be transcriptionally induced in a receptor-dependent (chromatin, cytoplasm) and unfractionated cells (polyA RNA) after manner27, we selected a subset of five genes and determined their 0, 30, 60 and 120 min28 (Supplementary Data 2). Figure 2a shows RNA levels by counting either reads mapping to exons (which is the the time evolution of expression levels of exons and introns in unfrac- common approach) or only reads mapping to introns. As expected, tionated RNA-seq as well as the chromatin and cytoplasmic fractions the selected genes were upregulated on the exonic level upon stimula- separately for all originally identified clusters of lipid-A-induced tran- tion (Fig. 1b). Notably, a similar upregulation was observed on the scripts28. We hypothesized that if intronic reads in standard unfrac- intronic level, suggesting that the intronic reads may provide infor- tionated RNA reflect changes in transcription, the intronic expression mation about the transcriptional changes that occur upon treatment. patterns should more closely follow the chromatin fraction, which However, intronic changes do not always mirror exonic changes. represents nascent RNA2,3, than the cytoplasmic fraction. This is indeed We performed the same analysis for five genes knocked down in dif- the case when considering all genes in all clusters (P = 1.03 × 10−9, ferent short interfering RNA (siRNA) experiments (Supplementary one-sided paired t-test). Considering each cluster separately, we Table 1). In all cases, we observed exonic downregulation of the tar- obtained significant P-values (<0.02) for the first four clusters A1, A2, Nature America, Inc. All rights reserved. America, Inc. Nature geted genes whereas intronic levels remained virtually unchanged B and C. As the clusters D, E and F contained genes that change less 5 (Fig. 1c). The correlated intronic and exonic changes during direct abruptly over time, we speculate that in these cases it is more difficult transcriptional activation (Fig. 1b) and the lack of intronic changes to detect differences between transcription and mRNA levels in the © 201 upon post-transcriptional perturbations (Fig. 1c) indicate that a cytoplasm. As expected, the exonic changes of the unfractionated pool comparison of exonic and intronic expression changes can separate closely

Analysis of Intronic and Exonic Reads in RNA-Seq Data Characterizes Transcriptional and Post-Transcriptional Regulation

Primepcr™Assay Validation Report

Exon Amplification: a Strategy to Isolate Mammalian Genes Based on RNA Splicing (Gene Cloning/Polymerase Chain Reaction) ALAN J

Mrna Turnover Philip Mitchell* and David Tollervey†

Supplemental Information

Sequences at the Exon-Intron Boundaries* (Split Gene/Mrna Splicing/Eukaryotic Gene Structure) R

Nuclear PTEN Safeguards Pre-Mrna Splicing to Link Golgi Apparatus for Its Tumor Suppressive Role

Deep Sequencing of Subcellular RNA Fractions Shows Splicing to Be Predominantly Co-Transcriptional in the Human Genome but Inefficient for Lncrnas

Unraveling Regulation and New Components of Human P-Bodies Through a Protein Interaction Framework and Experimental Validation

Mouse Prpf19 Conditional Knockout Project (CRISPR/Cas9)

Chem 465 Biochemistry II Test 3

Messenger-Rna-Binding Proteins and the Messages They Carry

Alternatively Spliced Genes