S41467-019-13840-9.Pdf

ARTICLE https://doi.org/10.1038/s41467-019-13840-9 OPEN Accurate quantification of circular RNAs identifies extensive circular isoform switching events Jinyang Zhang 1,2, Shuai Chen1, Jingwen Yang1 & Fangqing Zhao1,2,3* Detection and quantification of circular RNAs (circRNAs) face several significant challenges, including high false discovery rate, uneven rRNA depletion and RNase R treatment efficiency, and underestimation of back-spliced junction reads. Here, we propose a novel algorithm, fi 1234567890():,; CIRIquant, for accurate circRNA quanti cation and differential expression analysis. By con- structing pseudo-circular reference for re-alignment of RNA-seq reads and employing sophisticated statistical models to correct RNase R treatment biases, CIRIquant can provide more accurate expression values for circRNAs with significantly reduced false discovery rate. We further develop a one-stop differential expression analysis pipeline implementing two independent measures, which helps unveil the regulation of competitive splicing between circRNAs and their linear counterparts. We apply CIRIquant to RNA-seq datasets of hepatocellular carcinoma, and characterize two important groups of linear-circular switching and circular transcript usage switching events, which demonstrate the promising ability to explore extensive transcriptomic changes in liver tumorigenesis. 1 Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, 100101 Beijing, China. 2 University of Chinese Academy of Sciences, 100049 Beijing, China. 3 Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, 650223 Kunming, China. *email: [email protected] NATURE COMMUNICATIONS | (2020) 11:90 | https://doi.org/10.1038/s41467-019-13840-9 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-13840-9 ircular RNA (circRNA) is a large class of RNA molecules Additionally, we profile the switching events in circRNA junction that contain a covalent circular structure formed by non- ratio and circular transcript usage, and characterize these two C ′ ′ canonical 3 to 5 end-joining event called back-splicing. groups of circRNAs with potential biological functions. We Previous studies have shown that circRNAs are widely present believe that CIRIquant, which provides an accurate and efficient and some are conserved in eukaryotic organisms, and have a quantification approach to characterize circRNAs and perform relative low abundance compared with canonical linear mRNA differential expression analysis after correcting experimental and transcripts1. Although the exact function of most circRNAs is still computational biases, will greatly improve our understanding of ambiguous, studies have shown that circRNAs may function as circRNA diversities and functions. sponges to sequester miRNAs or RNA binding proteins (RBPs)2–4. More recently, several studies found that internal ribosome entry sites or N6-methyladenosine can promote the Results translation of circRNAs, which results in the biogenesis of Challenges in quantifying the expression of circRNAs. To rig- circRNA-derived proteins5–7. Thus, circRNAs have great poten- orously evaluate the challenges in current quantification of cir- tials to play important roles in cellular metabolic process, and the cRNA expression, we collected 63 transcriptomic samples from characterization and quantification of circRNAs from high- six previous studies16–22, including both RiboMinus and Ribo- throughput RNA-seq data has become an emerging problem in Minus/RNase R RNA-seq libraries of four species (human, circRNA studies. mouse, fly and roundworm, Supplementary Table 1). All these A number of computational methods have been developed for RNA-seq datasets were aligned to their reference genomes and characterizing circRNAs8. Most of these methods employ align- the ribosomal RNA (rRNA) sequences using HISAT223 to assess ment based strategies to recognize back-spliced junction (BSJ) the mapping rate and rRNA sequence fraction. As shown in reads from circRNAs, and have limited sensitivity and notable Fig. 1a, RNA-seq datasets from four species showed an extra- false-positive rate in circRNA identification9. Recently, a model- ordinarily high variance in the efficiency of rRNA sequence based strategy is employed by Sailfish-cir10, which uses a quasi- depletion, which is largely due to the limited specificity and mapping method to acquire direct estimation of circular tran- efficiency of current RiboMinus transcriptome isolation kit. For script expression. This statistical model depends on the unique each sample, the raw reads were further aligned to the reference sequence between circular and linear transcripts, which limits its genome using the BWA-MEM algorithm24, and then subjected to ability on quantifying exonic circRNAs. Moreover, the ratio of CIRI2 for circRNA detection and quantification. We chose CPM BSJ reads to canonical linear reads at the junction, which repre- (counts per million mapped reads) to represent circRNA sents the splicing preference in precursor mRNAs, is also an expression levels to remove the biases derived from different important factor in circRNA analysis. However, among all cur- library insert size and sequencing depth. For 34 pairs of Ribo- rently available circRNA detection tools, only CIRI211,12 can Minus and RiboMinus/RNase R samples, the overall number of output the junction ratio directly. DCC13 and Sailfish-cir estimate detected circRNAs ranged from several thousand to over 30 the expression values of both linear and circular transcripts, thousand, in which most of circRNAs detected in the RiboMinus which can be used to calculate circRNA’s junction ratio. Hence, a samples can be validated in the RiboMinus/RNase R samples reliable computational tool is urgently needed for accurate (Fig. 1b). Although the total number of identified circRNAs quantification of circRNAs and their parental linear transcripts. increased after RNase R enrichment, over 50–80% highly Differential expression analyses of circRNAs in different sam- expressed circRNAs could be detected in both RiboMinus and ples is a routine analysis in circRNA studies. Currently, simple RiboMinus/RNase R samples. Considering that RNase R treat- statistical tests (e.g., t-test) or differential expression analysis ment is widely used for circRNA enrichment, we compared pipelines designed for linear RNA transcripts (e.g., DESeq214) are expression levels of both gene and circRNA in RNase R-treated often used to evaluate the significance of differentially expressed and untreated samples to investigate whether RNase R treatment circRNAs. Since most of circRNAs are expressed at extremely low can introduce bias in transcript quantification. As shown in levels, RNase R treatment are usually performed to enrich cir- Fig. 1b (right), gene expression levels decreased over 2-fold, which cRNAs. For studies using RiboMinus/RNase R-treated RNA-seq is in concordance with the degradation effect of linear RNAs in libraries, the variation of enrichment coefficient in the RNase R RNase R treatment. However, the relative expression value of treatment step may result in biased estimation of circRNA circRNAs also showed a certain level of reduction. Considering expression levels. In addition, analysis of experimental replicates that the RNase R treatment is expected to increase the saturation has shown the poor reproducibility of circRNA identification15, level of circRNA detection, the more circRNAs are identified, the which further indicates that accurate characterization and quan- more circular reads and the lower the relative expression value for tification of circular transcripts is crucial in circRNA studies. each specific circRNA. To overcome these limitations, we propose CIRIquant for To further investigate the effect of RNase R treatment on accurate quantification of both circRNAs and their parental circRNA quantification, we divided the detected circRNAs into transcripts and filtration of false positives BSJ reads. CIRIquant two groups, highly expressed circRNAs and lowly expressed can employ currently widely used tools (e.g., CIRI211,12, CIR- circRNAs, according to the ranking of their expression values. Cexplorer216, find_circ2, etc.) for circRNA identification, then Then, we calculated the proportion of reads from the two groups generated pseudo-reference sequences for the identified circRNA of circRNAs after RNase R treatment. As shown in Fig. 1c and transcripts to re-align putative BSJ reads. Using both alignment Supplementary Fig. 1, we observed an increased proportion of BSJ results of reads against the reference genome and pseudo cir- reads for highly expressed circRNAs after RNase R treatment, cRNA transcripts, CIRIquant not only achieves more accurate indicating that highly expressed circRNAs tended to be enriched and sensitive identification of BSJ reads, but also enables reliable in RiboMinus/RNase R-treated data. In order to assess the quantification of junction ratio for circRNAs. Moreover, CIRI- efficiency of RNase R treatment, we calculated the enrichment quant provides a convenient function for one-stop circRNA coefficient for circRNAs in different RNA-seq datasets. Surpris- differential expression analysis. We apply CIRIquant to survey ingly, the enrichment coefficient exhibited distinct difference circRNA expression profiles between hepatocellular carcinoma among samples from different species or even within the same (HCC) tumor samples and their adjacent normal tissues, which

S41467-019-13840-9.Pdf

A Flexible Microfluidic System for Single-Cell Transcriptome Profiling

Transposon Activation Mutagenesis As a Screening Tool for Identifying

WO 2019/079361 Al 25 April 2019 (25.04.2019) W 1P O PCT

Algorithms for Transcriptome Quantification and Reconstruction from RNA-Seq Data

The Function and Evolution of C2H2 Zinc Finger Proteins and Transposons

Muscle Enriched Lamin Interacting Protein (Mlip) Binds Chromatin and Is Required for Myoblast Differentiation

Whole Genome Sequencing of Familial Non-Medullary Thyroid Cancer Identiﬁes Germline Alterations in MAPK/ERK and PI3K/AKT Signaling Pathways

Supplementary Table 1 Genes Tested in Qrt-PCR in Nfpas

Population-Haplotype Models for Mapping and Tagging Structural Variation Using Whole Genome Sequencing

Supplemental Information

Evolutionary Analysis of Viral Sequences in Eukaryotic Genomes

Characterizing Genomic Duplication in Autism Spectrum Disorder by Edward James Higginbotham a Thesis Submitted in Conformity