bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Acute depletion of METTL3 identifies a role for N6-methyladenosine in alternative intron/exon inclusion in the nascent transcriptome
Guifeng Wei1, Mafalda Almeida1, Greta Pintacuda1,4, Heather Coker1, Joseph S Bowness1, Jernej Ule2,3, and Neil Brockdorff1,*
1. Developmental Epigenetics, Department of Biochemistry, University of Oxford, Oxford, UK
2. The Francis Crick Institute, London, UK
3. Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, Queen Square, London, UK
4. Present address: Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
*. Correspondence, Neil Brockdorff, [email protected]
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Abstract RNA N6-methyladenosine (m6A) modification plays important roles in multiple aspects of RNA regulation. m6A is installed co-transcriptionally by the METTL3/14 complex, but its direct roles in RNA processing remain unclear. Here we investigate the presence of m6A in nascent RNA of mouse embryonic stem cells (mESCs). We find that around 10% m6A peaks are in introns, often close to 5’-splice sites. RNA m6A peaks significantly overlap with RBM15 RNA binding sites and the histone modification H3K36me3. Interestingly, acute dTAG depletion of METTL3 reveals that inclusion of m6A-bearing alternative introns/exons in the nascent transcriptome is disrupted. For terminal or variable-length exons, m6A peaks are generally located upstream of a repressed 5’- splice site, and downstream of an enhanced 5’-splice site. Intriguingly, genes with the most immediate effects on splicing include several components of the m6A pathway, suggesting an autoregulatory function. Our findings demonstrate a direct crosstalk between m6A machinery and the regulation of RNA processing.
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Introduction RNA is subject to diverse post-transcriptional modifications that have emerged as new layers of gene regulation (Fu et al., 2014; Roundtree et al., 2017a; Yue et al., 2015). Among these, N6- methyladenosine (m6A) is the most prevalent and abundant internal RNA modification on mRNA. m6A was initially identified in the 1970s (Desrosiers et al., 1974; Perry et al., 1975), and the enzyme that catalyzes this modification was described in the mid-1990s (Bokar et al., 1994; Bokar et al., 1997). Accumulating evidence suggests that RNA m6A modifications are in most part installed by the METTL3/14 core heterodimer (Liu et al., 2014), which together with accessory proteins WTAP (Ping et al., 2014), VIRMA (Schwartz et al., 2014), RBM15/15B (Patil et al., 2016), CBLL1 (Ruzicka et al., 2017), and ZC3H13 (Knuckles et al., 2018; Wen et al., 2018) forms the m6A writer complex. Structural studies have revealed that METTL3 is the only catalytic subunit, while METTL14 has a degenerate active site and maintains the complex’ integrity and substrate RNA recognition (Sledz and Jinek, 2016; Wang et al., 2016a; Wang et al., 2016b). Similar to DNA and histone modifications pathways, the m6A pathway has specific eraser (FTO and ALKBH5) and reader proteins (YTH-domain-containing proteins, YTHDC1/2 and YTHDF1/2/3) (Zaccara et al., 2019).
Global m6A patterns have been profiled using m6A specific antibodies coupled to high- throughput sequencing (Dominissini et al., 2013; Ke et al., 2015; Linder et al., 2015; Meyer et al., 2012). Antibody-free m6A profiling methods, MAZTER-seq (Garcia-Campos et al., 2019) and m6A-REF-seq (Zhang et al., 2019) have been developed since, but are limited to a subset of the m6A (m6ACA) sites. Extensive m6A profiling in a variety of RNA populations from diverse species and tissues have revealed that the majority of mRNAs are m6A modified with preferred sites occurring in clusters, most commonly in the 3’ UTR and around the stop codon (Dominissini et al., 2012; Meyer et al., 2012). Individual m6A sites have the consensus sequence DRACH (Dominissini et al., 2012; Linder et al., 2015; Meyer et al., 2012). The installation of m6A by the writer complex occurs co-transcriptionally and sites are found both in exons (the majority) and introns (Ke et al., 2017; Louloupi et al., 2018). An important factor for targeting m6A to defined sites is the RNA binding protein RBM15/15B, a subunit of the m6A writer complex (Coker et al., 2020; Patil et al., 2016). Additionally, the METTL14 subunit recognizes the histone modification H3K36me3, which is enriched within gene bodies of active genes (Huang et al., 2019). Finally some transcription factors (TF) have been proposed to facilitate m6A targeting, for example SMAD2/3 (Bertero et al., 2018) and CEBPZ (Barbieri et al., 2017), although only for a small number of transcripts in certain conditions and/or cell types. bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The m6A modification has important functions in mRNA metabolism, notably in the regulation of RNA processing (Alarcon et al., 2015b), nuclear export (Roundtree et al., 2017b), turnover (Ke et al., 2017; Liu et al., 2020; Wang et al., 2014), and translation (Barbieri et al., 2017; Wang et al., 2015). There are however contradictory findings, for example in relation to alternative splicing (Alarcon et al., 2015a; Ke et al., 2017; Xiao et al., 2016), translation (Wang et al., 2015; Zaccara and Jaffrey, 2020), and X chromosome inactivation (Coker et al., 2020; Nesterova et al., 2019; Patil et al., 2016). Confounding factors include the difficulty in discriminating primary and secondary effects following chronic long-term knockout/knockdown of m6A writers/readers, and cell lethality effects linked to the important role of m6A in essential cell functions (Barbieri et al., 2017; Xiao et al., 2018).
In this study we analyse m6A in the nascent transcriptome of mouse embryonic stem cells (mESCs) with a view to uncover important roles of METTL3 complex in alternative splicing. We make use of the dTAG protein degron system (Nabet et al., 2018) to rapidly and acutely deplete the m6A methyltransferase METTL3, thereby discriminating primary from secondary effects. We characterize intronic m6A sites and show that they correlate with both RBM15 binding and the histone H3K36me3 modification. Functional analysis following acute depletion of METTL3 uncovered numerous changes to alternative splicing occurring at or near to m6A methylation sites. Intriguingly, m6A-dependent alternative splicing affects several genes encoding factors in the m6A pathway, suggestive of autoregulatory functions.
Results Mapping m6A in the nascent mESC transcriptome Chromatin associated RNA (ChrRNA) is substantially enriched for nascent transcripts (Nesterova et al., 2019). Thus, to investigate the roles of m6A in nascent RNA processing in mouse embryonic stem cells (mESCs), we performed MeRIP-seq from ChrRNA, referred to henceforth as ChrMeRIP (Figure 1A and S1A). Sequencing of input showed that 70~80% of reads are intronic (Figure S1B). To minimize specific antibody bias, we used two commercially available m6A antibodies (SySy and Abcam) to identify high confidence m6A-modified RNA regions. Using maximum ORF and longest ncRNA isoforms as representative transcripts (see Methods), refined peak calling analysis (see Methods) classified 5277, 5472, and 6319 m6A peaks into Confidence group 1 (high), Confidence group2 (medium), and Confidence group3 (low), respectively (Figure 1B-D and S1C,D). The trend of m6A peak intensity in the different groups accords with their confidence classifications (Figure 1B and S1E). Notably, the overlap between peaks from SySy and Abcam antibodies is approximately half, which is similar to the differences in peak detection bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 1
A Nuclei Isolation D Conf.Group1 (n=5277) Conf.Group2 (n=5472) protein_coding protein_coding (6%) ncRNA ncRNA ncRNA (9%) (0%) (2%) ncRNA (2%) intergenic (1%) Nucleoplasm RNA (7%) Trizol intron intron (6.23%) Chromatin-bound RNA intergenic (10.307%) intergenic (10%) (6.82%) intergenic Data Analysis Sequencing (10.325%) input SySy Location Location
Abcam @m6A-IP exon exon (79.368%) (86.94%) Confident Group1 protein_coding Confident Group2 SySy, Abcam protein_coding (78%) (85%) Confident Group3
B Antibody Abcam C E Cfg1 Cfg2 62.15% p=1e-224 45.06% p=1e-124 SySy 2 2 10 Abcam SySy 1.5 1.5 1 1
0.5 0.5 exon SySy 0.49 1.00 0 0 58.68% p=1e-24 53.02% p=1e-32 5 2 2 Abcam 1.00 0.46 1.5 1 1
0.5 intron 0 0
0 1.0 0.9 0.8 0.7 0.6 0.5 log2(Fold Change) 54.79% p=1e-24 50.31% p=1e-30 2 2 % peaks overlapped 1.5 1.5 1 1
0.5 0.5
0 0 intergenic
Peaks Cfg1(sysy,abcam) Cfg2(sysy) Cfg3(abcam) F protein_coding ncRNA intron 0.05 0.100 Cfg1 (n=4501) Cfg1 (n=87) 0.05 Cfg1 (n=329) Cfg2 (n=4255) Cfg2 (n=88) Cfg2 (n=564) 0.04 0.075 0.04 0.03 0.050 0.03 0.02 Fraction 0.02 0.025 0.01 0.01 0.00 0.000
5’ 3’ 5’ 3’ 5’UTR CDS 3’UTR 5’-splice site 3’-splice site Position in normalized transcript Position in normalized transcript Position in normalized intron G GENCODE VM23 Comprehensive Transcript Set GENCODE VM23 Comprehensive Transcript Set Ythdc1 Spen
500 _ 1kb - 0-0 - mESC_Abcam_Rep1 mESC_Abcam_Rep1 2 kb - 0-0 - -500 _ 500 _ - 0-0 - mESC_Abcam_Rep2 mESC_Abcam_Rep2 - 0-0 - -500 _ 400 _ - 0-0 - mESC_SySy_Rep1 mESC_SySy_Rep1 - 0-0 - -400 _ 400 _ - 0-0 - mESC_SySy_Rep2 mESC_SySy_Rep2 - 0-0 - -400 _ 2.5 _ 2.5 _ RBM15_irCLIP_Rep1_Pos_Strand RBM15_irCLIP_Rep1_Neg_Strand
0 _ 0 _ 2 _ 2 _ RBM15_irCLIP_Rep2_Pos_Strand RBM15_irCLIP_Rep2_Neg_Strand
0 _ 0 _ 2 _ 2 _ H3K36me3(PSU) H3K36me3(PSU)
0 _ 0 _ bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 1. ChrMeRIP-seq reveals that 6-10% of m6A peaks are located in introns. (A) Schematic illustrating the experimental and computational workflow for ChrMeRIP-seq. Chromatin-associated RNAs (ChrRNA) enriched for introns was used for MeRIP with two commercially available m6A antibodies (SySy and Abcam). Three confidence groups of m6A sites were identified. (B) Boxplots showing the m6A intensity distributions for Confidence group 1 (Cfg1), Confidence group 2 (Cfg2), and Confidence group 3 (Cfg3) groups (see Methods for details). Pink and cyan represent m6A intensity from Abcam and SySy antibodies respectively. (C) Heatmap showing the peak overlap between two antibodies. (D) Pie chart output from RNAmpp analysis showing the distribution of m6A peaks for Cfg1 (left) and Cfg2 (right) group. Peak numbers are indicated above. The MaxORF and longestNcRNA isoform was chosen for each gene. (E) Most representative motifs called for each subgroup (exonic, intronic, and intergenic) in Cfg1 and Cfg2 group. (F) RNA Metaprofiles (RNAmpp) of m6A peaks in transcriptome for Cfg1 (red) and Cfg2 (blue) groups. Left plot is for aggregated protein-coding gene, middle for noncoding RNA, and right for normalized intron. (G) UCSC genome browser screenshots showing example genes (left, Ythdc1 intron11; right Spen intron2) harboring intronic m6A methylation. From top to bottom, tracks denote gene annotation, MeRIP-seq (Abcam 2 replicates, SySy 2 replicates), RBM15 irCLIP-seq (2 replicates), and H3K36me3 ChIP-seq.
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
surveyed between studies (Figure 1C) (McIntyre et al., 2020). Despite the large fraction of intronic reads in the input, only 6.2% of confidence group1 (Cfg1) and 10.3% of confidence group2 (Cfg2) peaks are from intronic regions (Figure 1D and S2A), as defined by the position of the single-nucleotide peak summit (see Methods). This is slightly higher than previously reported for MeRIP-seq or m6A-CLIP studies using only messenger RNA from mESCs (Figure S2B) (Batista et al., 2014; Geula et al., 2015; Ke et al., 2017), and is in line with ChrMeRIP-seq from HeLa cells (Ke et al., 2017). The majority of intronic m6A modification occurs in protein-coding genes rather than noncoding RNAs (Figure 1D and S2A,B).
We developed RNA metaprofile plot (RNAmpp) to describe the distribution of m6A in the nascent transcriptome (see Methods). m6A peaks from both confidence groups 1 and 2 are enriched around stop-codon regions or at the beginning of the 3’-UTR in mRNAs, and at all genomic locations the canonical DRACH m6A motif (GGACU) is the most highly represented motif (Figure 1E,F and S2C). The few m6A peaks which map to lncRNAs are not close to the 3’-end regions, but are rather distributed randomly across the transcripts, as exemplified by Xist, Norad, and Malat1 lncRNA genes (Figure S2D) (Coker et al., 2019; Patil et al., 2016). We found several clear examples of intronic m6A peaks located close to 5’-splice sites, such as the two intronic m6A peaks from Ythdc1 intron 11 and Spen intron 2 (Figure 1G).
Given that the pattern of exonic m6A has been extensively characterized (Dominissini et al., 2012; Meyer et al., 2012), we sought to specifically investigate the deposition pattern and characteristics of intronic m6A modification. To reduce bias, we compared the GC content, conservation level, and relative position of intronic m6A peaks to their size-matched control regions derived from random regions within the same intron (see Methods). This analysis shows that intronic m6A-methylations are more common in regions which have high GC content, are evolutionarily conserved, and are in proximity to 5’-splice sites (Figure 2A-D and S3A-C). When compared to randomly-chosen introns from the same genes as control, longer introns are preferentially methylated (Figure S3D).
Most exonic methylation is deposited around stop-codon regions, as previously noted (Dominissini et al., 2012; Meyer et al., 2012), but intronic methylation sites are fairly evenly distributed relative to host transcripts (Figure S3E). Therefore, we queried in particular whether intronic methylated regions are located close to alternative exons or introns using the comprehensive GENCODE annotation set (vM24). Indeed, we found that higher confidence groups of m6A-methylation were more likely to reside in alternative intron/exon regions (see bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 2
10.0 A Cfg1 ExonORIntron
7.5 Alternative Constitutive
5.0 Phastcon
0.00 2.5 0.25 m6A_Intensity 0.50 0.75 0.0 1.00
0.00 0.25 0.50 0.75 1.00 5’-splice site relative Position 3’-splice site
0.4 ● B 37.7% C Cfg1 Cfg2 Cfg3 D Cfg1 Cfg2 Cfg3 34.6% ● ● ● ● 6 1.00 ● ● ● ● ● ● ● ● ● ● 0.3 ● ● ● ● ● ● ● ● ● ● 28.5% ● ● ● ● ● ● ● ● ● ● ● 25.7% ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.75 ● ● ● ● ● ● 4 ● ● ● ● ● ● ● 0.1 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.50 ● ● ● Cfg1 Cfg2 Cfg3 Random ● ● ● ● ● ● ● ● ● ● ● ● ● 2 ● ● 0.0 Phastcon ● ● ● ● 2.9% ● ● 11.0% m6A_Intensity ● ● ● ● ● 0.2 ● 0.25 0.4 53.5% 0 0.6 66.7% ● ● ● ● 0.8 ● ● 0.00 ● Alternative PercentageAlternative 25% toward 5’-splice site ● 1.0 Alternative Constitutive Alternative Constitutive Alternative Constitutive Peak Control Peak Control Peak Control bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Figure 2. Pattern of intronic m6A modification. (A) Dot plot show the relative position, m6A intensity, conservation, and location in alternative or constitute introns for all intronic m6A methylation in Cfg1. Here 0 and 1 in the X-axis represent the 5’- and 3’-splice sites respectively. The Y-axis denotes m6A intensity calculated as an average of all replicates. Dot area indicates the PhastCon conversation score from UCSC browser. Red and blue dot color denotes location in alternative and constitutive introns respectively. (B) Barplots (top) showing the fraction of m6A peak located in first quarter (close to 5’-splice site) of introns for Cfg1, Cfg2, and Cfg3 classes, as well as random simulated peak summits. Barplots (bottom) showing the fraction of m6A peaks located in alternative exon/introns for all the groups. The percentages for each bar were labelled. (C) Boxplots showing intensity of intronic m6A peaks located in alternative and constitute intron for all classes. (D) Boxplot of PhastCon scores for all classes of m6A peaks located in intron regions annotated from the MaxORF_LongestNcRNA isoforms, compared with controls matched for size and intron- of-origin.
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.10.291179; this version posted September 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Methods) than low confidence groups or random genomic regions (Figure 2A,B). This indicated a potential role of METTL3 complex and its methylation sites in the regulation of alternative splicing.
Intronic m6A modifications correlate with RBM15 binding and H3K36me3 The RNA-binding protein RBM15 plays an important role in targeting m6A to defined sites in mRNA (Patil et al., 2016). We went on to examine if this pathway is linked to the intronic m6A sites that we observe in nascent RNA. To map binding sites for RBM15 in mESCs we performed infrared crosslinking immunoprecipitation followed by sequencing (irCLIP-seq) (Zarnegar et al., 2016), making use of an mESC line in which both emGFP-Rbm15 and Xist RNA are induced by treatment with doxycycline (Coker et al., 2020) (Figure S4A). RBM15 interacts with the Xist A- repeat region and contributes to the deposition of m6A methylation at sites immediately downstream (Coker et al., 2020; Patil et al., 2016) (Figure S4B), and thus provides a useful positive control.
Cross-linking induced truncation sites (CITS or RT stops) are the main signature occurring in irCLIP-seq datasets. The RNA meta-profile plot shows where these CITSs reside across normalized transcripts, with two main peaks at the start of the transcript and near the stop-codon region (Figure 3A,B), in agreement with the RBM15/15B binding profile in human cells (Patil et al., 2016). Examination of Rbm15 binding across introns shows a preference for the 5’-splice site, consistent with the profile of intronic m6A (Figure 3B and S4C). Motif analysis of CITSs revealed an RBM15 binding consensus comprising 3 or 4 consecutive U bases, both for exonic and intronic sites (Figure 3C and S4D). This is also the case for crosslinking-induced mutations (CIMS) (Figure 3C). Importantly, we found that RBM15 binding is centered at m6A peak summits for all exonic, intronic, and intergenic regions and in general correlates with peak confidence (Figure 3D and S5A,B). We further split each m6A confidence group by strong or weak RBM15 binding, based on whether strong RBM15 biding sites (CITS>=3 in 2 replicates) intersected the m6A peak (Figure 3E-J and S5C-E). Over half of sites in all sub-groupings of Cfg1 and Cfg2 were assigned as strong RBM15-binding, with the only exceptions being Cfg3 intronic and intergenic groups (Figure 3E,H, and S5C). RBM15 binding was centered on m6A peaks across all the confidence groups (Figure 3F,I and S5D).
In addition to RBM15, the histone modification H3K36me3 has been proposed to play a role in directing m6A to defined sites in mRNA, mediating interaction with the METTL14 subunit of the core m6A complex (Huang et al., 2019). Consistent with this model we observe that high bioRxiv preprint preprint (whichwasnotcertifiedbypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. 0.00 0.01 0.02 0.03 0.04 0.00 0.01 0.02 0.03 0.04 0.0 0.2 0.4 0.6 0.8 D C A Figure 3 RBM15 irCLIPCITS(n=29415) intergenic ncRNA -0.5 -0.5 -0.5 (1%) (19%) 0.5 1.5 0.5 1.5 0.5 1.5 0 1 2 0 1 2 0 1 2 RBM15_irCLIP intergenic (19.0%) CITS IS28.65%p=1e-125 CITS protein_coding CIMS doi: (15%) m6A m6A m6A (15.9%) intron 60.70% p=1e-35 53.73% p=1e-259 https://doi.org/10.1101/2020.09.10.291179 Location 0.5Kb 0.5Kb 0.5Kb Cfg3 Cfg2 Cfg1 cN (3%) ncRNA protein_coding (65.1%) CITS>=4 CITS>=4 exon
Cfg3 Cfg2 Cfg1 (62%) -0.5 Intronic m6Asites m6A B Fraction 0.01 0.02 0.5Kb protein_coding 5’ 5’UTR E F
0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 density 0.00 0.01 0.02 0.03 0.04 0.05 Position innormalizedtranscript G 0.4 0.6 0.8
Cfg1 0.0 -5.0 exon 04 92 52.1% 59.2% 60.4%
; 0.2
this versionpostedSeptember11,2020. CDS
0.4 H3K36me3 0.6
C
0.8
1.0 1.2 5.0Kb
nrnintergenic intron 1.4 0 1000 500 0 0.3 0.4 0.5 0.6 0.2
3’UTR
0.0 -5.0
0.2
0.4 H3K36me3 0.6 C
3’
intergenic intron exon 0.8
0.025 0.050 0.075 0.100 0.125 1.0
5.0Kb 5’-splice site 1.2 intron H 0.00 0.01 0.02 0.03 J I 0.3 0.4 0.5 0.6