
Downloaded from genome.cshlp.org on September 24, 2021 - Published by Cold Spring Harbor Laboratory Press Method Statistically robust methylation calling for whole- transcriptome bisulfite sequencing reveals distinct methylation patterns for mouse RNAs Carine Legrand,1 Francesca Tuorto,1 Mark Hartmann,1,2 Reinhard Liebers,1 Dominik Jacob,3 Mark Helm,3 and Frank Lyko1 1Division of Epigenetics, DKFZ-ZMBH Alliance, German Cancer Research Center, 69120 Heidelberg, Germany; 2Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), 69120 Heidelberg, Germany; 3Institute of Pharmacy and Biochemistry, Johannes Gutenberg-University Mainz, 55128 Mainz, Germany Cytosine-5 RNA methylation plays an important role in several biologically and pathologically relevant processes. However, owing to methodological limitations, the transcriptome-wide distribution of this mark has remained largely un- known. We previously established RNA bisulfite sequencing as a method for the analysis of RNA cytosine-5 methylation patterns at single-base resolution. More recently, next-generation sequencing has provided opportunities to establish tran- scriptome-wide maps of this modification. Here, we present a computational approach that integrates tailored filtering and data-driven statistical modeling to eliminate many of the artifacts that are known to be associated with bisulfite sequencing. By using RNAs from mouse embryonic stem cells, we performed a comprehensive methylation analysis of mouse tRNAs, rRNAs, and mRNAs. Our approach identified all known methylation marks in tRNA and two previously unknown but evolutionary conserved marks in 28S rRNA. In addition, mRNAs were found to be very sparsely methylated or not meth- ylated at all. Finally, the tRNA-specific activity of the DNMT2 methyltransferase could be resolved at single-base resolution, which provided important further validation. Our approach can be used to profile cytosine-5 RNA methylation patterns in many experimental contexts and will be important for understanding the function of cytosine-5 RNA methylation in RNA biology and in human disease. [Supplemental material is available for this article.] 5-Methylcytosine (m5C) is the longest-known and best-under- m5C in RNA can be reliably detected using radioactive label- stood epigenetic modification of DNA (Jones 2012). The genome- ing and thin-layer chromatography (Hengesbach et al. 2008). wide analysis of m5C patterns has greatly aided our understanding However, this method only allows for an indirect quantification of epigenetic gene regulation (Bock et al. 2010). Changes in DNA of global methylation levels. In comparison, high-performance methylation patterns have been found to underpin organismal liquid chromatography coupled to mass spectrometry (LC-MS) is development and cellular differentiation (Smith and Meissner more accurate and also allows the analysis of larger sample num- 2013) and also provide valuable biomarkers for the detection of bers (Thüring et al. 2016) but currently does not provide any infor- human diseases, including cancer (Heyn and Esteller 2012). mation about the sequence context of the methylation marks. m5C also represents a well-known modification of RNA Several methods for the mapping of RNA cytosine methylation (Motorin et al. 2010). In comparison to DNA modifications, RNA marks have been proposed (Hussain et al. 2013a; Li et al. 2016); modifications are substantially more diverse and complex, but however, they are usually based on indirect detection and thus rep- their functional significance is only beginning to be elucidated resent approximations of the actual distribution. Direct mapping (Gilbert et al. 2016; Tuorto and Lyko 2016). RNA modifications of RNA m5C marks in their native sequence context is currently are particularly enriched in tRNAs, where they are often linked only provided by RNA bisulfite sequencing (Schaefer et al. 2009). to translational regulation (Agris 2008). In this context, it has Bisulfite sequencing is based on the selective deamination of been shown that m5C modification of tRNA plays an important unmethylated cytosines, thus converting unprotected cytosines role in tRNA stability and in the regulation of translational fidelity to uracils, followed by sequencing-based detection of methyla- (Schaefer et al. 2010; Tuorto et al. 2012, 2015; Blanco et al. 2014). tion-related sequence polymorphisms (Clark et al. 1994). RNA Furthermore, m5C is also a widely conserved modification of bisulfite sequencing can accurately identify the presence of select- rRNA, where it is implied in the quality control of ribosome bio- ed known methylation marks in tRNA and rRNA and has proven to genesis (Sharma et al. 2013; Bourgeois et al. 2015; Schosserer be very useful for the molecular characterization of RNA cytosine-5 et al. 2015). These processes have been associated with a variety methyltransferases (Schaefer et al. 2009). of human diseases (Blanco and Frye 2014). A few studies have also utilized RNA bisulfite sequencing to map RNA m5C marks at the transcriptome level and found evidence for the presence of m5C in mammalian mRNAs and non- coding RNAs (Squires et al. 2012; Hussain et al. 2013b; Khoddami Corresponding author: [email protected] Article published online before print. Article, supplemental material, and publi- © 2017 Legrand et al. This article, published in Genome Research, is available cation date are at http://www.genome.org/cgi/doi/10.1101/gr.210666.116. under a Creative Commons License (Attribution-NonCommercial 4.0 Inter- Freely available online through the Genome Research Open Access option. national), as described at http://creativecommons.org/licenses/by-nc/4.0/. 27:1589–1596 Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/17; www.genome.org Genome Research 1589 www.genome.org Downloaded from genome.cshlp.org on September 24, 2021 - Published by Cold Spring Harbor Laboratory Press Legrand et al. and Cairns 2013; Amort et al. 2017). While it has been suggested Sequenced reads were aligned with BSMAP (Xi and Li 2009) and that the human coding and noncoding transcriptome contains subjected to initial quality control (read length ≥25 nt, aligned up to 10,000 methylation sites (Squires et al. 2012; Amort et al. uniquely, forward and reverse reads at the same location). A closer 2017), the function of these marks has remained elusive. examination of candidate methylation sites revealed many obvi- Furthermore, the available studies could not define a common ous false-positive sites that were related to inefficient conversion set of substrate mRNAs or consensus methylation target sequenc- (i.e., tracts of three or more consecutive nonconverted Cs) or mis- es, which raised the possibility that some of the results were influ- alignment (Supplemental Fig. S1). We therefore implemented enced by incomplete deamination, secondary structures, or other filters for the removal of these artifacts (for details, see Supplemen- confounding factors that are known to affect bisulfite sequencing tal Methods). For methylation calling, a Poisson distribution was (Supplemental Table S1). It is also possible that methylation call- fit to each sample, and nonconversion P-values were calculated. Fi- ing was influenced by insufficient statistical stringency. nally, available replicates were joined, and a combined nonconver- Standard analytical pipelines for whole-transcriptome bisul- sion P-value was calculated (Fig. 1B; Methods). fite sequencing (WTBS) data sets are currently not available. Fur- For the present study, we sequenced and analyzed three rep- thermore, available tools do not sufficiently address the need for licates of bisulfite-converted libraries from mouse embryonic statistical approaches in the elimination of stochastic artifacts. stem (ES) cells and various genotypes; 19,198,431 to 84,435,380 Our approach addresses these issues and identifies profoundly dif- read pairs were available after sequencing of each sample (Supple- ferent methylation patterns for mRNA, rRNA, and tRNA. mental Table S2). Bisulfite conversion rates were determined from unmethylated regions of rRNA and ranged from 98.4%–98.6% Results (Supplemental Table S2). Initial data analysis using the mRNA data set from a single Whole-transcriptome bisulfite sequencing libraries were prepared wild-type replicate showed that the vast majority of cytosines by separating total RNA samples into small (<200 nt) and long had conversion ratios >90% (Fig. 2A). To further analyze the (>200 nt) RNA fractions. Depending on whether ribosomal RNA nonconverted cytosine reads, we estimated the Poisson parameter λ was also examined, an rRNA depletion step was included or not. p as the count of nonconverted (Nn) cytosines separately within The long fraction was fragmented to a size distribution appropriate each coverage bin between 10× and 1200× (bins of 10 in coverage, for Illumina sequencing. Both fractions were DNase-digested, extended to ensure a minimum of 60 data points were used; for fur- bisulfite converted, and end-repaired prior to cDNA library prepa- ther details, see Methods). The average and median coverages were ration for deep sequencing (Fig. 1A; Supplemental Methods). 44× and 20×, respectively. The ratio r = λp/(Nn + Nc) was computed in each bin (where Nc = count of convert- ed cytosines, N = Nn + Nc is the coverage). r had a tight distribution and showed no strong dependence on N. The median of r was 0.0164, with 95% confidence interval [0.0164; 0.0166] (Fig. 2B; Supple- mental Table S3). Not surprisingly,
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages9 Page
-
File Size-