ARTICLE

https://doi.org/10.1038/s42003-021-02406-5 | OPEN

Cytosine and adenine deaminase base-editors induce broad and nonspecific changes in expression and splicing

Jiao Fan1,2,5, Yige Ding3,5, Chao Ren1, Ziguo Song3, Jie Yuan1, Qiuzhen Chen1,4, Chenchen Du3, Chao Li3, Xiaolong Wang3 ✉ & Wenjie Shu1 ✉

Cytosine or adenine base editors (CBEs or ABEs) hold great promise in therapeutic applications because they enable precise conversion of targeted bases without generating double-strand breaks. However, both CBEs and ABEs induce substantial off-target DNA editing, and extensive off-target RNA single nucleotide variations in transfected cells. Therefore, the potential effects of deaminases induced by DNA base editors are of great importance for their clinical applicability. Here, the transcriptome-wide deaminase effects on gene expression and splicing are examined. Differentially expressed genes (DEGs) and differential alternative splicing (DAS) events induced by base editors are identified. Both CBEs and ABEs generated thousands of DEGs and hundreds of DAS events. For engineered CBEs or ABEs, base editor-induced variants had little effect on the elimination of DEGs and DAS events. Interestingly, more DEGs and DAS events are observed as a result of overexpression of cytosine and adenine deaminases. This study reveals a previously overlooked aspect of deaminase effects in transcriptome-wide gene expression and splicing, and underscores the need to fully characterize such effects of deaminase enzymes in base editor platforms.

1 Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing, China. 2 Institute of Geriatrics, National Clinical Research Center of Geriatrics Disease, Second Medical Center of Chinese PLA General Hospital, Beijing, China. 3 Key Laboratory of Animal Genetics, Breeding and Reproduction of Shaanxi Province, College of Animal Science and Technology, Northwest A&F University, Yangling, China. 4 Computer School, University of South China, Hengyang, China. 5 These authors contributed equally: Jiao Fan, Yige Ding. ✉ email: [email protected]; [email protected]

COMMUNICATIONS BIOLOGY | (2021) 4:882 | https://doi.org/10.1038/s42003-021-02406-5 | www.nature.com/commsbio 1 ARTICLE COMMUNICATIONS BIOLOGY | https://doi.org/10.1038/s42003-021-02406-5

Recently, a variety of cytosine and adenine base editors (CBEs and ABEs) that combine cytosine or adenosine deaminases with CRISPR-Cas9 have been developed. These editors enable highly efficient and precise targeted C-to-T or A-to-G base conversions1–4. Previous studies have reported the induction of unexpected off-target DNA editing in mammalian cells by both CBEs and ABEs, which has limited their broad use in biomedical studies and therapeutic applications3,5. Adeno-associated viruses (AAVs) are the most commonly utilized delivery system for gene therapies, mainly because of their efficiency of DNA editing and their capability to maintain long-term gene expression without toxicity or immune responses to the viral vector in vivo6–8. Thus, the extent of the potential effects of DNA base editor-induced deaminases is of great importance. Recent studies have shown that both CBEs and ABEs can indeed induce extensive transcriptome-wide deamination of cytosine or adenosine in human cells; this leads to the generation of thousands of off-target RNA single nucleotide variations (SNVs)9–13. Whether the presence of this large number of unwanted RNA SNVs has genetic or transcriptional consequences has not been comprehensively explored to date, and this lack of data may lead to underestimation of the effect. Although engineering-improved CBEs or ABEs can largely decrease the number of off-target RNA SNVs in human cells11,12, a comprehensive characterization of the effects of CBE or ABE deaminases on RNA is crucial to fully realize their utility.

This study analyzed RNA-seq datasets from two recent studies9,12. Transcriptome-wide differentially expressed genes (DEGs) and differential alternative splicing (DAS) events that had been induced by CBEs or ABEs were identified. Both cytosine and adenine base editors had generated hundreds of DEGs and DAS events. However, these phenomena had previously not been reported. Furthermore, engineering of CBE or ABE variants had only negligible effects on the prevention of DEG and DAS occurrences and even increased the emergence of DEGs and DAS events because of the overexpression of cytosine and adenine deaminases. These data highlight a previously unreported aspect of deaminase effects on gene expression and splicing, which apparently cannot be eliminated by genetic engineering. These results have important implications for the future use of base editors both in research and as therapeutic agents, thus highlighting the need for a full characterization of deaminase enzyme activities in base editors.

Results

Base editors induce transcriptome-wide DEGs. To evaluate the deaminase effects of base editors, an RNA-seq based analysis of transcriptome-wide DEGs was conducted. This was applied to one type of CBE (BE3; APOBEC1-nCas9-UGI) and one type of ABE (ABE7.10; TadA-TadA*-nCas9), combined with GFP and either with or without a single guide RNA (sgRNA) in HEK293T cells. Cells receiving GFP alone were used as controls. We first compared GFP versus GFP cells and found that the numbers of DEGs were close to 0 (Supplementary Fig. 1a). A total of 1178, 757, 172, and 316 DEGs were identified in cells expressing APOBEC1, BE3 without sgRNA, BE3-site 3, and BE3-RNF2, respectively (Fig. 1a, b and Supplementary Data 1). Moreover, DEGs were found in both coding and non-coding regions (Supplementary Fig. 1b, c). Similarly, 1761, 859, 215, and 357 DEGs were identified in cells expressing only TadA-TadA*, ABE7.10 without sgRNA, and ABE7.10 with either site 1 or site 2 sgRNA, respectively (Fig. 1c, d, Supplementary Fig. 1d, e and Supplementary Data 1). To explore the relationship between off-target RNA SNVs and DEGs, we identified 17,826, 12,393, 6530, and 8893 putative RNA SNVs in CBEs and 7501, 3495, 3505, and 6661 putative RNA SNVs in ABEs, respectively (Supplementary Methods and Supplementary Tables 1 and 2). We noted that only about 5 and 10% of DEGs were mapped to RNA SNVs, respectively (Supplementary Tables 1 and 2). Notably, transfection of APOBEC1 or TadA-TadA* induced a higher number of DEGs compared with other transfected cells. This suggests that increased DEGs in CBE- or ABE-treated cells were likely caused by the overexpression of the deaminases APOBEC1 or TadA (Fig. 1a–d and Supplementary Data 1). Moreover, the number of DEGs increased with higher levels of expression of CBEs or ABEs (Fig. 1e, f). We observed 46.2 ± 9.0% (mean ± sem) or 52.4 ± 8.7% (mean ± sem) overlap between any two groups from the BE3- or ABE7.10-transfected cells with or without sgRNA, respectively (Fig. 1g, h). Together, the DEGs induced by CBEs and ABEs were independent of sgRNAs and were presumably a result of APOBEC1 and TadA-TadA* overexpression, respectively.

Next, we characterized these transcriptome-wide DEGs identified in BE3-treated cells. More than 60.0% of the DEGs were identified in BE3-treated or APOBEC1-expressing cells (Supplementary Fig. 1f). These data indicate that the occurrence of this substantial number of DEGs was, in fact, induced by BE3 or APOBEC1, rather than a spontaneous occurrence. Similarly, 59.3% of ABE7.10-induced DEGs were expressed, which is consistent with the action of TadA (Supplementary Fig. 1g). Notably, 12 and 20 cancer-associated genes were observed among the DEGs induced by BE3 and ABE7.10, respectively. These genes included ETV5, CDKN1A, and DDIT3, which highlights concerns about the oncogenic potential associated with DNA base editing (Fig. 1i, j). GO enrichment analysis of the DEGs in BE3- or ABE7.10-treated groups identified enrichment of regulation of cell proliferation, angiogenesis, cell cycle arrest, and SMAD signal transduction (Supplementary Fig. 1h, i and Supplementary Data 2). Each of these pathways has a well-established role in the progression of cancer.

Bulk RNA-seq is based on large pools of cells, which offers the potential to mute random signals via population averaging. To account for this, single-cell RNA-seq analysis of DEGs was performed for each of the three groups12 (GFP-alone, BE3-site 3, and ABE7.10-site 1; Fig. 1k). Hundreds of DEGs for BE3- or ABE7.10-edited cells were consistently observed compared with the GFP-alone group (Fig. 1k). Notably, the percentage of DEGs shared by any of these BE3- or ABE7.10-edited cells (19.8 ± 1.6% or 15.5 ± 1.6%) was much lower than that of cell populations (52.4 ± 8.7%, 46.2 ± 9.0%) (Supplementary Fig. 1j). These data indicate that BE3- or ABE7.10-induced DEGs were essentially random and independent of the utilized sgRNA.

Engineering of deaminases causes a modest reduction in DEGs. Recent studies have reported that off-target RNA SNVs induced by DNA base editors can be eliminated via the engineering of deaminases11,12. To test whether such engineered deaminases of CBE or ABE variants can alleviate their off-target effects on gene expression, DEGs were identified in transfected HEK293T cells that expressed engineered CBE or ABE variant groups. For CBE engineering, five CBE variants were examined: a point mutation and double mutations to BE3 (BE3W90A and BE3W90Y/R126E), BE3 with human APOBEC3A (hA3A), as well as its two variants BE3(hA3AR128A) and BE3(hA3AY130F). These variants had a decreased number of off-target RNA SNVs while maintaining BE3-like DNA site-specific activity. BE3W90A and BE3(hA3A) induced a similar number of DEGs compared with BE3 (118, 123, and 172, respectively). BE3W90Y/R126E, BE3(hA3AR128A), and


[Fig. 1 image, panels a–k. Axes: |Log2 Fold change|, Chromosome, No. of DEGs. Panel e: n = 172 (BE3-site 3) and 2,657 (BE3(FNLS)-site 3). Panel f: n = 215 (ABE7.10-site 1) and 3,063 (ABEmax-site 1). Gene labels in panels i and j: LCK, MYB, ETV5, MYOD1, CDKN1A, ROBO2, LMO1, CARS, C15orf65, BIRC3, MYC, DDIT3, MAML2, HNF1A, RET, FBLN2, EPHA3, FAM131B, ERBB4, DNAJB1, IDH2, NBEA, HERPUD1, JAK3, RNF11, PREX2, HOXA13, STK39.]

BE3(hA3AY130F) induced the largest numbers of DEGs compared with BE3 (1574, 3036, and 4524, respectively) (Fig. 2a, Supplementary Fig. 2a, b and Supplementary Data 3). More than 56.1% of the DEGs were identified in CBE variant-expressing cells (Supplementary Fig. 2c). For ABE engineering, the introduction of the mutations D53E or F148A into both TadA and TadA* showed that the numbers of DEGs in ABE7.10D53E-transfected cells increased relative to ABE7.10 with site 1 sgRNA (404 vs. 215 DEGs, respectively) (Fig. 2b, Supplementary Fig. 2d, e and Supplementary Data 3). The DEGs of ABE7.10F148A-transfected cells at site 1 were slightly fewer than those of ABE7.10-transfected cells (132 vs. 215 DEGs, respectively). Surprisingly, ABE7.10F148A- and site 2 sgRNA co-transfected cells induced 1854 DEGs (Fig. 2b). For ABE engineering, about 58.0% of DEGs occurred in the expressed


Fig. 1 Identification and characterization of transcriptome-wide differentially expressed genes (DEGs) induced by DNA base editors. a Jitter plots of RNA-seq experiments showing the DEGs identified in HEK293T cells transfected with APOBEC1, BE3, BE3-site 3, and BE3-RNF2. Cells transfected with a GFP-encoding plasmid served as a control for all comparisons. n, total number of identified DEGs. b Manhattan plots showing representative distributions of DEGs across the human transcriptome in cells overexpressing BE3. Chromosomes are represented in different colors. The number of DEGs for each chromosome is presented at the right. c Jitter plots of RNA-seq experiments showing the DEGs identified in HEK293T cells transfected with TadA-TadA*, ABE7.10, ABE7.10-site 1, and ABE7.10-site 2. The GFP group served as the control for all comparisons. n, total number of identified DEGs. d Manhattan plots showing the representative distributions of DEGs across the human transcriptome for ABE7.10. Chromosomes are indicated in different colors. The number of DEGs for each chromosome is presented at the right. e Jitter plots of RNA-seq experiments showing the DEGs identified in HEK293T cells transfected with BE3-site 3 or with BE3(FNLS)–site 3. The GFP group served as the control for all comparisons. n, total number of identified DEGs. f Jitter plots from RNA-seq experiments showing the DEGs identified in HEK293T cells transfected with ABE-site 1 or ABEmax–site 1. The GFP group served as a control for all comparisons. n, total number of identified DEGs. g, h Ratio of shared DEGs between any two samples in the APOBEC1, BE3, TadA, and ABE7.10 overexpression groups. The proportion in each cell was calculated as the number of overlapping DEGs between two samples divided by the number of DEGs in the row. i, j Cancer-related genes in BE3 or ABE7.10-induced DEGs. k Bar plots showing the number of identified DEGs in single-cell RNA-seq samples in BE3-site 3 and ABE7.10-site 1 overexpressing cells.
Single-cell RNA-seq samples of GFP plasmid transfected cells served as a control for all comparisons. Error bars represent SD for three independent experiments. Corresponding data were presented in Supplementary Data 1.
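The shared-DEG ratio described for panels g and h (overlapping DEGs between two samples divided by the number of DEGs in the row sample) can be illustrated with a minimal Python sketch. The group names and gene identifiers below are hypothetical toy values, not data from this study, and the way the off-diagonal ratios are summarized as mean ± sem is an assumption, since the exact summary procedure is not stated here.

```python
from itertools import permutations
from math import sqrt


def overlap_matrix(deg_sets):
    """Return {row: {col: |row ∩ col| / |row|}} for every pair of groups."""
    matrix = {}
    for row, row_genes in deg_sets.items():
        matrix[row] = {}
        for col, col_genes in deg_sets.items():
            if not row_genes:
                # An empty row group has no DEGs to share.
                matrix[row][col] = 0.0
            else:
                matrix[row][col] = len(row_genes & col_genes) / len(row_genes)
    return matrix


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, 0.0
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, sqrt(variance / n)


# Hypothetical toy DEG sets (illustration only).
toy_degs = {
    "groupA": {"g1", "g2", "g3", "g4"},
    "groupB": {"g2", "g3", "g5"},
    "groupC": {"g3", "g4", "g5", "g6"},
}

ratios = overlap_matrix(toy_degs)
# Off-diagonal entries: every ordered pair of distinct groups.
off_diagonal = [ratios[r][c] for r, c in permutations(toy_degs, 2)]
mean_overlap, sem_overlap = mean_sem(off_diagonal)
print(f"shared DEGs: {100 * mean_overlap:.1f} ± {100 * sem_overlap:.1f}%")
```

Because the denominator is the row sample, the matrix is not symmetric: a small DEG set can be largely contained in a big one while the reverse ratio stays low.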

genes (Supplementary Fig. 2f). Moreover, smaller ABEmax variants lacking the TadA domain (referred to as miniABEmax)9 were generated, and two miniABEmax variants (K20A/R21A and V82G) were also generated. Each of these editors was assayed with sgRNAs targeting two endogenous sites (HEK site 2 and ABE site 16) and one site that does not occur in the human genome (non-targeting (NT)). DEGs of ABEmax, miniABEmax, miniABEmax(K20A/R21A), and miniABEmax(V82G) were identified. These four groups, at the three sgRNA targets, all induced thousands of DEGs when a protein 2A (P2A)-enhanced GFP (EGFP) fusion (without nCas9) was used as control. Interestingly, ABEmax induced the highest number of DEGs compared with miniABEmax and the two miniABEmax variants (K20A/R21A and V82G) (Fig. 2c and Supplementary Data 4).

Next, the nCas9-dependence of ABE-mediated DEGs was examined. DEGs induced by ABEmax, miniABEmax, miniABEmax(K20A/R21A), and miniABEmax(V82G) were identified using three sgRNAs in conjunction with nCas9-UGI-NLS fusion. The four groups all induced hundreds of DEGs at particular sgRNA sites (Fig. 2d and Supplementary Data 5). With the exception of miniABEmax(V82G) at sgRNA site 2, ABEmax induced the most DEGs compared with miniABEmax and the two miniABEmax variants (K20A/R21A and V82G) (Fig. 2d). Moreover, all four ABE variants yielded substantially fewer DEGs when compared against the nCas9 with NT sgRNA negative control than when compared against the GFP control (Fig. 2c, d).

To examine the nCas9-dependence of BE3-mediated DEGs, the DEGs induced by three variants (BE3-R33A, BE3-R33A/K34A, and BE3-E63Q) were examined using RNF2 gRNA. These three variants all had on-target DNA editing efficiencies comparable to that of wild-type BE3 with reduced RNA editing. The nCas9-UGI-NLS fusion was used as a negative control in this experiment. Modest decreases in the numbers of transcriptome-wide DEGs were observed. Expression of WT BE3 induced 1585 DEGs, and BE3-R33A or BE3-R33A/K34A induced 1145 and 1031 DEGs, respectively (Fig. 2e and Supplementary Data 6). Furthermore, we sought to determine whether CBEs that harbor other cytidine deaminases (e.g., human APOBEC3A (hA3A), enhanced A3A (eA3A; an engineered A3A with more precise and specific DNA-editing activity), human activation-induced cytidine deaminase (hAID)) induce DEGs independent of nCas9. For this, DEGs in HEK293T cells transfected with plasmids expressing each of the CBEs (with sgRNA targeting the RNF2 gene) were examined. By comparison, the two previously described variants R33A and R33A/K34A induced slightly higher numbers of DEGs than eA3A-BE3 and hAID-BE3 (Fig. 2e and Supplementary Data 6), and more than 65% of DEGs were expressed, which may have confounding effects (Supplementary Fig. 2g).

Next, we examined whether the numbers of DEGs and off-target RNA SNVs are correlated. Overall, the numbers of off-target RNA SNVs accounted for only approximately 5.6% of the variation in the numbers of DEGs as identified by RNA-seq (Fig. 2f). The correlation in BE3-treated or ABE-treated groups was considered separately. The numbers of off-target RNA SNVs explained only 1.5% of the variation in CBE-treated groups, while they explained 31% in ABE-treated groups (Supplementary Fig. 2h, i). These data suggest that off-target RNA SNVs are likely not responsible for the significant observed transcriptional differences between groups treated with either base editors or controls.

Base editors induce transcriptome-wide DAS events. To test whether base editors cause extensive isoform dynamics, transcriptome-wide DAS events were identified and categorized into seven types: exon skipping (ES), alternative 5′ and 3′ splice sites (A5/A3), intron retention (RI), mutually exclusive exons (MX), and alternative first and last exons (AF/AL). We first identified DAS events between GFP groups by randomly selecting three repeats from six repeats of GFP cells as control groups and the other three repeats as treatment groups. The numbers of DAS events between GFP groups were mostly less than 100 (Supplementary Fig. 3a). In total, 1779 (APOBEC1), 568 (BE3 without sgRNA), 1821 (BE3-site 3), and 1612 (BE3-RNF2) DAS events were identified compared with GFP control groups (Fig. 3a and Supplementary Data 7). More than 90% of DAS events occurred in expressed genes (Supplementary Fig. 3b). Furthermore, 1949 (TadA-TadA*), 1324 (ABE7.10 without sgRNA), 513 (ABE7.10-site 1), and 1115 (ABE7.10-site 2) DAS events were identified (Fig. 3b and Supplementary Data 7), and about 92.8% of DAS events were related to expressed genes (Supplementary Fig. 3c). Similarly, we observed that less than 5% of DAS events were mapped to RNA SNVs in CBE and ABE (Supplementary Tables 1 and 2). Next, the percentages of overlap in DAS events for CBE and ABE groups were examined. A total of 51.1 ± 8.1% (mean ± sem) and 56.8 ± 7.6% (mean ± sem) of overlaps between any two groups from the BE3- or ABE7.10-transfected cells were found, respectively, independent of the presence of sgRNA (Fig. 3c, d).

Moreover, thousands of DAS events were identified in RNA-seq samples from engineered CBE or ABE variant groups compared with GFP-transfected samples. For engineered CBEs,


Fig. 2 Identification of DEGs for different engineered base editor variants. a Jitter plots of RNA-seq experiments in HEK293T cells showing DEGs identified for BE3-site 3, BE3W90Y/R126E-site 3, BE3W90A-site 3, BE3(hA3A)-site 3, BE3(hA3AR128A)-site 3, and BE3(hA3AY130F)-site 3 groups. Cells transfected with GFP encoding plasmids serve as a control for all comparisons. n total number of identified DEGs. Corresponding data were presented in Supplementary Data 3. b Jitter plots from RNA-seq experiments in HEK293T cells showing the DEGs identified for ABE7.10-site 1, ABE7.10D53E-site 1, ABE7.10F148A-site 1, ABE7.10-site 2, and ABE7.10F148A-site 2 groups. Cells transfected with GFP-encoding plasmids served as a control for all comparisons. n total number of identified DEGs. Corresponding data were presented in Supplementary Data 3. c Bar plots showing the number of DEGs identified in RNA-seq experiments in HEK293T cells that expressed ABEmax, miniABEmax, miniABEmax(K20A/R21A), or miniABEmax(V82G) as well as one of three different sgRNAs (HEK site 2, ABE site 16, and a non-targeted control (NT)). Cells transfected with GFP-encoding plasmids served as a control for all comparisons. Corresponding data were presented in Supplementary Data 4. d Bar plots showing the number of DEGs identified in RNA-seq experiments in HEK293T cells that expressed ABEmax, miniABEmax, miniABEmax(K20A/R21A), or miniABEmax(V82G) as well as one of three different gRNAs (HEK site 2, ABE site 16, and a non-targeted control (NT)). Cells expressing nCas9 with NT sgRNA served as a control for all comparisons. Corresponding data were presented in Supplementary Data 5. e Bar plots showing the number of identified DEGs in RNA-seq experiments in HEK293T cells that expressed WT BE3, SECURE-BE3(R33A), SECURE-BE3(R33A/K34A), hA3A-BE3, eA3A-BE3, and hAID-BE3 with co-expression of a gRNA, targeting a site on the RNF2 gene. Cells expressing nCas9 served as a control for all comparisons. 
Corresponding data were presented in Supplementary Data 6. f Correlations between the numbers of DEGs and the numbers of off-target RNA SNVs in BE3-treated groups. g Overall correlations between the numbers of DEGs and the numbers of off-target RNA SNVs in CBE-treated and ABE-treated groups. R2 values were calculated using robust linear regressions on the numbers of DEGs and the numbers of off-target RNA SNVs.
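As a rough illustration of how an R2 value between off-target RNA SNV counts and DEG counts can be obtained, the sketch below fits an ordinary least-squares line in plain Python. This is a swapped-in simplification: the caption states that robust linear regression was used, which is not reproduced here, so the printed value is not expected to match the reported figures. The eight data points are the SNV and DEG counts quoted in the Results text for the canonical CBE and ABE groups; pairing them in the listed order is an assumption, and the engineered variants are not included.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line y ~ a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1.0 - ss_res / syy


# Counts quoted in the Results text (pairing by listed order is assumed):
# APOBEC1, BE3, BE3-site 3, BE3-RNF2, TadA-TadA*, ABE7.10, ABE7.10-site 1, ABE7.10-site 2
rna_snvs = [17826, 12393, 6530, 8893, 7501, 3495, 3505, 6661]
degs = [1178, 757, 172, 316, 1761, 859, 215, 357]

r2_all = r_squared(rna_snvs, degs)
print(f"OLS R^2 across these eight groups: {r2_all:.3f}")
```

An R2 near zero would indicate that SNV counts explain little of the variation in DEG counts, which is the interpretation applied to the reported values in the text.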

DAS events of BE3W90A- and BE3(hA3A)-transfected cells at site 3 were fewer than those of BE3 (647, 1232, and 1821, respectively). In contrast, BE3(hA3AR128A) at the same three sites produced similar numbers of DAS events compared with BE3 (1785 vs. 1821) (Fig. 3e and Supplementary Data 8). For engineered ABEs, the numbers of DAS events in ABE7.10D53E site 1-transfected cells increased compared with that of ABE7.10-site 1 (1233 vs. 513, respectively) (Fig. 3f and Supplementary Data 8). When an F148A mutation was introduced into both TadA and TadA*, the DAS events of ABE7.10F148A-transfected cells at both sites 1 and 2 increased compared with ABE7.10 (1334 vs. 513 or 1964 vs. 1115, respectively) (Fig. 3f and Supplementary Data 8). For these DAS events identified in engineered CBE or ABE variant groups, the rates of expressed genes were also more than 90% (Supplementary Fig. 3d, e). Furthermore, a different study11 reported that a large number of DAS events were induced by four ABE variants (ABEmax, miniABEmax, miniABEmax(K20A/R21A), and miniABEmax(V82G)) in HEK293 cells at site 2, ABE site 16, and non-targeting sgRNA compared with EGFP or nCas9 controls (Fig. 3g, h, Supplementary Data 9 and Data 10). DAS events associated with BE3 variants (BE3-R33A, BE3-R33A/K34A, hA3A-BE3, eA3A-BE3, and hAID-BE3; all using the RNF2 gRNA) were observed, compared with the nCas9-UGI-NLS fusion negative control. These variants all decreased the number of DAS events compared with WT BE3 (Fig. 3i and Supplementary Data 11) and 93.6% of DAS events were related to expressed


[Fig. 3 image, panels a–i. Axis: No. of DAS events. Event types in legend: SE, A5, A3, MX, RI, AF, AL. sgRNA groups in panels g and h: HEK site 2, ABE site 16, NT.]

genes (Supplementary Fig. 3f). Thus, engineering CBE or ABE deaminases can only moderately decrease the frequency of DAS events.

Validation of the identified DEGs and DAS in BE-treated cells. To further validate DEGs and DAS identified in cells treated with base editors, we also separately performed RNA-seq in BE3 or TadA-TadA* transfected HEK293T cells. DEGs and DAS events were identified in comparison with GFP groups (Supplementary Fig. 4a). Totals of 1796 and 1151 DEGs were identified in BE3 and TadA-TadA* transfected HEK293T cells, including 41 and 36 cancer-related genes, respectively (Fig. 4a, b and Supplementary Fig. 4b, c). Totals of 1209 and 1518 DAS events were observed in BE3 and TadA-TadA*-treated groups, respectively (Fig. 4c). Several DEGs were randomly selected from each BE editor and their expressional alterations were validated. These DEGs included MYC, GPC3, SGK1, HSP90AA1, SETD2, PPM1D, and NCOA1, which had been identified as oncogenes or tumor suppressor genes. For DEGs, differential expression was verified by qPCR (Fig. 4d, e and Supplementary Fig. 4d, e). For DAS, several potential candidates of skipped exons were identified: RBM15, BAG4, ADRM1, TCEAL9, and PPP5C for BE3-treated groups as well as UBE2I, ISCU, RIF1, GADD45A, and APEH for TadA-TadA*-treated groups. PCR analyses of the GFP groups and base editor-treated groups were performed using primers flanking the skipped exons of these target genes. The results verified that the exon inclusion levels of these genes were all significantly changed in response to treatment with the base editor (Fig. 4f–i, Supplementary Fig. 5a–f, and Supplementary Fig. 7a). In summary, these results suggest that large numbers of


Fig. 3 Transcriptome-wide differential alternative splicing (DAS) events induced by DNA base editors. a Bar plots from RNA-seq experiments showing identified DAS events in HEK293T cells expressing APOBEC1, BE3, BE3-site 3, and BE3-RNF2. Cells expressing GFP served as a control for all comparisons. Corresponding data were presented in Supplementary Data 7. b Bar plots from RNA-seq experiments showing identified DAS events in HEK293T cells transfected with TadA-TadA*, ABE7.10, ABE7.10-site 1, and ABE7.10-site 2. Cells expressing GFP served as a control for all comparisons. Corresponding data were presented in Supplementary Data 7. c, d Ratio of shared DAS events between any two samples in APOBEC1, BE3, TadA, and ABE7.10 groups. The proportion in each cell was calculated by the number of overlapping DAS events between two samples divided by the number of DAS events in the row. e Bar plots of RNA-seq experiments in HEK293T cells showing the identified DAS events for BE3-site 3, BE3W90Y/R126E-site 3, BE3W90A-site 3, BE3 (hA3A)-site 3, BE3(hA3AR128A)-site 3, and BE3(hA3AY130F)-site 3 groups. GFP expressing cells served as a control for all comparisons. Corresponding data were presented in Supplementary Data 8. f Bar plots from RNA-seq experiments in HEK293T cells showing the identified DAS events for ABE7.10-site 1, ABE7.10D53E-site 1, ABE7.10F148A-site 1, ABE7.10-site 2, and ABE7.10F148A-site 2 groups. GFP expressing cells served as a control for all comparisons. Corresponding data were presented in Supplementary Data 8. g Bar plots showing the number of identified DAS events in RNA-seq experiments in HEK293T cells expressing ABEmax, miniABEmax, miniABEmax(K20A/R21A), or miniABEmax(V82G) as well as one of three different gRNAs (HEK site 2, ABE site 16, and a non-targeted control (NT)). Cells expressing GFP served as a control for all comparisons. Corresponding data were presented in Supplementary Data 9. 
h Bar plots showing the number of identified DAS events in RNA-seq experiments in HEK293T cells expressing ABEmax, miniABEmax, miniABEmax(K20A/R21A), or miniABEmax(V82G) as well as one of three different sgRNAs (HEK site 2, ABE site 16, and a non-targeted control (NT)). Cells transfected with nCas9 and NT sgRNA served as a control for all comparisons. Corresponding data were presented in Supplementary Data 10. i Bar plots showing the number of identified DAS events in RNA-seq experiments in HEK293T cells expressing WT BE3, SECURE-BE3(R33A), SECURE-BE3(R33A/K34A), hA3A-BE3, eA3A-BE3, and hAID-BE3 with co-expression of a sgRNA, targeting a site on the RNF2 gene. nCas9 expressing cells served as a control for all comparisons. Corresponding data were presented in Supplementary Data 11.
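The GFP-versus-GFP baseline described in the Results (three of six GFP replicates randomly chosen as the control group, the other three as the treatment group) can be sketched as follows. The replicate labels, the fixed random seed, and the helper names are hypothetical; the splicing analysis that would be run on each split is not shown.

```python
import random
from itertools import combinations


def random_split(replicates, k, rng):
    """Randomly pick k replicates as pseudo-controls; the rest are pseudo-treatments."""
    chosen = set(rng.sample(replicates, k))
    control = [r for r in replicates if r in chosen]
    treatment = [r for r in replicates if r not in chosen]
    return control, treatment


def all_distinct_splits(replicates, k):
    """Enumerate each half/half split once, ignoring mirrored control/treatment swaps."""
    if 2 * k != len(replicates):
        raise ValueError("mirror de-duplication assumes two equal halves")
    anchor = replicates[0]
    splits = []
    for control in combinations(replicates, k):
        # Keeping only splits whose control half contains the first replicate
        # drops the mirrored copy of every split.
        if anchor in control:
            treatment = tuple(r for r in replicates if r not in control)
            splits.append((control, treatment))
    return splits


# Hypothetical replicate labels.
gfp_replicates = [f"GFP_rep{i}" for i in range(1, 7)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
control_group, treatment_group = random_split(gfp_replicates, 3, rng)
splits = all_distinct_splits(gfp_replicates, 3)
print(control_group, treatment_group, len(splits))
```

With six replicates there are only ten distinct 3-versus-3 splits, so the whole null distribution of GFP-versus-GFP event counts could in principle be enumerated rather than sampled.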

DEGs and DAS events exist universally across base editor-treated cells.

Base editors induce transcriptome-wide DEGs and DAS events in other mammalian cell types. We also tested whether our findings apply to other mammalian cell types by investigating the changes in gene expression and splicing induced by base editors in HeLa cells (Supplementary Fig. 4a). We identified 1386 and 732 DEGs in BE3- and TadA-TadA*-transfected HeLa cells, including 34 and 31 cancer-related genes, respectively (Fig. 5a, b, Supplementary Fig. 6a, b, and Supplementary Data 13). In addition, totals of 2035 and 1263 DAS events were observed in BE3- and TadA-TadA*-treated HeLa cells, respectively (Fig. 5c and Supplementary Data 3). We compared the DEGs and DAS events between HEK293T and HeLa cells and found that only a few DEGs and DAS events overlapped between the two cell lines (Supplementary Fig. 6c). This is consistent with the observations of off-target SNVs in different cell types. In the same way, we randomly selected several DEGs, including ETV5, HSPE1, DUSP1, DNAJB1, SERPINH1, and DDIT3, and verified their differential expression in HeLa cells by qPCR (Fig. 5d, e). For DAS, several candidate skipped exons were validated by PCR analyses using primers flanking the skipped exons of the target genes, including QPRT and ZNHIT1 for BE3-treated HeLa cells as well as DHX9 and RLIM for TadA-TadA*-treated cells (Fig. 5f–i and Supplementary Fig. 7b). Together, these data indicate that the transcriptome-wide DEGs and DAS events induced by base editors occur ubiquitously in mammalian cells.

Increased expression of deaminases induces an increase in DEGs and DAS events. Because our data suggested that higher expression levels of CBEs or ABEs tend to induce more DEGs or DAS events, we hypothesized that the expression levels of deaminases affect gene expression and alternative splicing. Note that this hypothesis does not preclude the possibility that there are other potential mechanisms for these widespread changes induced by base editors. To test this hypothesis, HEK293T cells were transfected with a gradient concentration of APOBEC1 (0 μg, 1.5 μg, 2.5 μg, and 3.5 μg), and the expression of APOBEC1 in cells was confirmed by qPCR (Fig. 6a and Supplementary Fig. 4a). RNA-seq was then performed on these gradient concentration-transfected cells. We also calculated FPKM values of APOBEC1 from the RNA-seq data and observed the gradient expression of deaminases in HEK293T cells (Fig. 6b). We found that the number of DEGs increased when higher levels of APOBEC1 were expressed. There were 0, 990, 2119, and 2581 DEGs identified in HEK293T cells transfected with 0 μg, 1.5 μg, 2.5 μg, or 3.5 μg APOBEC1, respectively (Fig. 6c and Supplementary Data 14). Totals of 146, 652, 1061, and 1346 DAS events were observed in the corresponding APOBEC1-transfected groups, respectively (Fig. 6c, Supplementary Data 14). In parallel, we examined the potential effect of expression of ABE deaminases by performing gradient concentration transfection of TadA-TadA* in HEK293T cells (Supplementary Fig. 4a) and confirmed the expression levels of deaminases in each group (Fig. 6d, e). We also observed an increase in the numbers of DEGs and DAS events induced by increased expression of ABE deaminases (Fig. 6f, Supplementary Data 14). There were 0, 1264, 1076, and 2312 DEGs and 146, 956, 1384, and 1607 DAS events identified in HEK293T cells transfected with 0 μg, 1.5 μg, 2.5 μg, or 3.5 μg TadA-TadA*, respectively (Fig. 6f). Taken together, our results confirmed our supposition that a higher expression level of CBEs or ABEs results in more DEGs and DAS events.

Discussion
Recent developments of DNA base editors (both CBEs and ABEs) resulted in the rapid emergence of state-of-the-art tools for genome editing in model organisms. Despite the advantages these base-editing methods offer, several potential issues, such as target specificity, off-target DNA edits, and the recently reported base editor-induced off-target RNA SNVs, impede the use of these rapidly evolving techniques for biomedical applications. To alleviate these disadvantages, further efforts to more fully characterize the potential effects of deaminases are critical. This study comprehensively examined the deaminase effects of CBEs or ABEs on RNA. To this end, transcriptome-wide DEGs and DAS events induced by CBEs or ABEs were examined in RNA-seq samples9,12. The data presented here show that canonical CBEs and ABEs generated substantial numbers of DEGs and DAS events, which were independent of sgRNA and appear to be caused by overexpression of APOBEC1 and TadA-TadA*, respectively.

Notably, recent studies have shown that the engineering of deaminases (both CBE and ABE variants) may eliminate incidences of off-target SNVs without substantially affecting on-target DNA editing12. However, in the present study, both CBE and ABE variants failed to adequately decrease transcriptome-wide DEGs and DAS events. In fact, hundreds to thousands of DEGs and DAS events were still generated. Although a few high-fidelity variants (i.e., BE3(hA3AR128A), BE3(hA3AY130F), and ABE7.10F148A-site 2) showed significantly fewer off-target RNA SNVs than canonical BEs in vitro, they induced significantly more DEGs and DAS events. In addition, these DEGs and DAS events were independent of Cas9, suggesting that they were the result of deaminase overexpression alone.

Genetic correction of inherited diseases depends largely on a single, most commonly used delivery system: AAVs. These viruses exhibit prolonged gene expression in vivo6–8. Thus, the fact that base editors (i.e., CBEs or ABEs) and their variants continuously generate extensive DEGs and DAS events over time

[Figure 4 panels: a, volcano plot for BE3 (1,796 DEGs) and b, volcano plot for TadA-TadA* (1,151 DEGs), each plotting -log10(FDR) against log2(fold change); c, numbers of DAS events by type (SE, A5, A3, MX, RI, AF, AL) for BE3 and TadA-TadA*; d, e, relative mRNA expression by qPCR (GFP vs. BE3 and GFP vs. TadA-TadA*); f–i, sashimi plots and exon inclusion (%) from PCR validation for RBM15 (chr1), BAG4 (chr8), UBE2I (chr16), and ISCU (chr12).]


Fig. 4 Validation of DEGs and DAS events identified from RNA-seq data in HEK293T cells. a Volcano plots showing the significant DEGs (adjusted p-value ≤ 0.05 and |Fold change| ≥ 2; downregulated (blue), upregulated (red), and unchanged genes (gray)) in BE3-treated groups. Cells expressing GFP served as control. b Volcano plots showing the significant DEGs (adjusted p-value ≤ 0.05 and |Fold change| ≥ 2; downregulated (blue), upregulated (red), and unchanged genes (gray)) in TadA-TadA*-treated groups. Cells expressing GFP served as control. c Bar plots from RNA-seq experiments showing identified DAS events in HEK293T cells transfected with BE3 and TadA-TadA*. Cells expressing GFP served as a control for all comparisons. Corresponding data of (a–c) were presented in Supplementary Data 12. d DEG validation in BE3-treated groups by qPCR. The tested genes were randomly selected and the RNA levels are shown relative to those of the control GFP group, normalized to 1. Data of qPCR are presented as mean ± sem, n = 3 biologically independent samples. e DEG validation in TadA-TadA*-treated groups by qPCR. The tested genes were randomly selected and the RNA levels are shown relative to those of the control GFP group, normalized to 1. Data of qPCR are presented as mean ± sem, n = 3 biologically independent samples. f Sashimi plot and PCR validation of the RBM15 gene in GFP and BE3-treated cells. The alternative exon (Exon 2 (E2)) is marked in gray. g Sashimi plot and PCR validation of the BAG4 gene in GFP and BE3-treated cells. The alternative exon (Exon 3 (E3)) is marked in gray. h Sashimi plot and PCR validation of the UBE2I gene in GFP and TadA-TadA*-treated cells. The alternative exon (Exon 2 (E2)) is marked in gray. i Sashimi plot and PCR validation of the ISCU gene in GFP and TadA-TadA*-treated cells. The alternative exon (Exon 2 (E2)) is marked in gray. Error bars represent SD for three independent experiments.
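The color coding of the volcano plots in panels a and b can be illustrated with a short sketch. It assumes that "|Fold change| ≥ 2" means a fold change of at least 2 in either direction (that is, |log2 fold change| ≥ 1); the function name `volcano_point` and the example values are hypothetical and are not taken from the study.

```python
import math


def volcano_point(fold_change, adj_p, fc_cutoff=2.0, p_cutoff=0.05):
    """Return (x, y, label) for one gene on a volcano plot.

    x is log2(fold change), y is -log10(adjusted p-value).
    Assumption: |Fold change| >= 2 is read as |log2 FC| >= log2(2) = 1.
    Both inputs must be strictly positive.
    """
    x = math.log2(fold_change)
    y = -math.log10(adj_p)
    significant = adj_p <= p_cutoff
    if significant and x >= math.log2(fc_cutoff):
        label = "upregulated"      # drawn in red in the figure
    elif significant and x <= -math.log2(fc_cutoff):
        label = "downregulated"    # drawn in blue in the figure
    else:
        label = "unchanged"        # drawn in gray in the figure
    return x, y, label


if __name__ == "__main__":
    # Hypothetical (fold change, adjusted p-value) pairs for illustration.
    for fc, p in [(4.0, 0.01), (0.25, 0.01), (4.0, 0.2), (1.5, 0.001)]:
        print(fc, p, volcano_point(fc, p))
```

A gene with a large fold change but an adjusted p-value above 0.05 stays in the "unchanged" class, which is why both thresholds appear in the legend.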

to come poses additional risks in gene therapies, and oncogenesis is of particular concern in this context. Both BE3- and ABE7.10-treated groups showed a dramatic increase in the levels of ETV5 (Ets variant gene 5; also known as ERM). This variant has been associated with metastatic progression in several types of human cancers14,15. Differential expression of the transcriptional target of the tumor suppressor TP53 (encoding p53), cyclin-dependent kinase inhibitor 1A (CDKN1A; encoding p21), was found16. To further validate DEGs and DAS events identified in base editor-treated cells, RNA-seq was conducted and the expression of genes and AS events in base editor-treated cells was detected. The significantly up-regulated expression of MYC was confirmed to be induced by BE3. The MYC oncogene is overexpressed in more than half of human cancers17. The confounding effects of unwanted DEGs and DAS events must be accounted for in relevant studies, especially if constitutive base editor expression is applied. For therapeutic applications in humans, both the duration and level of base editor deaminase expression should be minimized.

Recently, the first 3-D cryo-electron microscopy structure of base editors has been reported, which enables a better understanding of the molecular basis for DNA adenosine deamination by ABEs18. Activity assays indicated why ABEs are prone to create off-target edits: in ABEs, the deaminase protein fused to Cas9 is always active. As Cas9 hops around the nucleus, it binds and releases hundreds or even thousands of DNA segments before it finds its intended target. The attached deaminase does not wait for a perfect match and often edits a base before Cas9 finds its final target. The high activity of the fused deaminase may partly explain the transcriptome-wide effects on gene expression and alternative splicing. Indeed, we found that increased expression of deaminases induces an increased number of DEGs and DAS events. Future studies are necessary to explore the proposed mechanism and identify potential unanticipated activities of base editors.

The rapid evolution of genome editing technologies has led us into a new era. We can now easily edit our genomes as well as the genomes of many other organisms. In addition to the off-target RNA SNV effects induced by genome-editing technologies reported by recent studies, we show here a critical and previously unreported aspect of the effects deaminases impose on gene expression and alternative splicing. We present further insight into the current understanding of the potential effects of deaminases induced by DNA base editors, which has important implications for the application of these technologies in both research and clinical settings. The most basic requirement for the application of genome-editing technologies in clinical treatment is accuracy and safety. Safety assessments for human therapeutic applications should include a comprehensive evaluation of the potential functional consequences induced by transcriptome-wide DEGs and DAS events. Although these DEGs or DAS events could exist for only a short period of time under transient expression of base editors, the longer-term functional consequences of widespread unwanted gene expression and alternative splicing will need to be accounted for in research studies.

Methods
RNA-seq data collection, quality control, and processing. The raw sequencing data of cells treated with CBE and ABE, in addition to those treated with engineered base editor variants, were downloaded from the Sequence Read Archive BioProject (accession number PRJNA528149) and Gene Expression Omnibus (GEO, accession number GSE129894). Single-cell RNA-seq data for CBE and ABE groups were downloaded from SRA (accession number PRJNA528561). The quality of RNA-seq data was controlled using FastQC (v0.11.8). The RNA-seq data were trimmed with Trim Galore (v0.6.1) using default parameters for paired-end data to remove both low-quality reads and adaptors. The trimmed RNA-seq data were aligned to the GRCh38 human transcriptome with STAR (v2.5.0a)19 using default parameters. Gene expression was quantified as fragments per kilobase of transcript per million mapped reads (FPKM) in HTSeq (v0.10.0)20.

Transient transfection and sequencing. Plasmids were constructed with the ClonExpress II One Step Cloning Kit (Vazyme) following the standard protocol. HEK293T (ATCC CRL-3216) and HeLa (ATCC CCL-2) cells were obtained from the American Type Culture Collection (ATCC). Mycoplasma contamination was determined by PCR of the supernatant of HEK293T or HeLa cells. HEK293T cells and HeLa cells were seeded in 10-cm dishes and cultured in Dulbecco's modified Eagle's medium (DMEM, Thermo Fisher Scientific) supplemented with 10% FBS (Bi) and penicillin–streptomycin at 37 °C with 5% CO2. Cells were transfected with 30 μg plasmids using Lipofectamine 3000 (Thermo Fisher Scientific). Three days after transfection, cells were digested with 0.05% trypsin and prepared for FACS. GFP-positive cells were sorted and retained in Trizol (Solarbio). For RNA-seq, about 1,000,000 cells (the top 5% according to GFP signal strength) were collected and their RNA was extracted following the standard protocol. To construct the library, mRNAs were fragmented and converted into cDNA using random hexamers or oligo(dT) primers. The 5′ and 3′ ends of cDNA were ligated with adaptors, and correctly ligated cDNA fragments were enriched and amplified by PCR. For HEK293T cells, nine groups of transfected cells were used: cells that expressed only GFP, TadA-TadA*, or BE3. Data for GFP and BE3 are shown in Fig. 4a, c, Supplementary Data 12, and data for GFP and TadA-TadA* are shown in Fig. 4b, c, Supplementary Data 12. For HeLa cells, eight groups of transfected cells were used: cells that expressed only GFP, TadA-TadA*, or BE3. Data for GFP and BE3 are shown in Fig. 5a, c, Supplementary Data 13, and data for GFP and TadA-TadA* are shown in Fig. 5b, c, Supplementary Data 13. For gradient concentration transfection assays, 0 μg, 1.5 μg, 2.5 μg, and 3.5 μg APOBEC1 or TadA-TadA* was transfected in HEK293T cells; data for HEK293T cells transfected with different concentrations of APOBEC1 are shown in Fig. 6a–c, Supplementary Data 14, and data for HEK293T cells transfected with different concentrations of TadA-TadA* are shown in Fig. 6d–f, Supplementary Data 14. High-throughput mRNA sequencing was conducted using Nova-PE150. The RNA-seq data were processed using the same analysis pipeline mentioned above. All sequencing data have been deposited under GEO.

Identification of DEGs in base editor-treated cells. DEGs were identified using DESeq2 (v1.20.0)21. A gene was identified as a DEG if its fold change in expression was >2 or <0.5. The statistical significance of differential expression was evaluated using a Student's t-test on base editor-treated vs. GFP or nCas9 groups. The results were filtered according to a false discovery rate of <0.05 using the Benjamini–Hochberg procedure.
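The false discovery rate filter named above can be sketched in a few lines. This is a minimal, dependency-free illustration of the Benjamini–Hochberg adjustment, not the authors' pipeline (which relies on DESeq2); the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the i-th smallest of m p-values the raw adjustment is p * m / i;
    a running minimum taken from the largest p-value downwards keeps the
    adjusted values monotone, and 1.0 is the upper bound.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene p-values, for illustration only.
    raw = [0.01, 0.04, 0.03, 0.005]
    fdr = benjamini_hochberg(raw)
    kept = [i for i, q in enumerate(fdr) if q < 0.05]
    print(fdr, kept)
```

In this reading, a gene is retained as a DEG only if its adjusted value is below 0.05 and its fold change is >2 or <0.5, the two criteria given in the paragraph above.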


[Figure 5 panels: a, volcano plot for BE3 (1,386 DEGs) and b, volcano plot for TadA-TadA* (732 DEGs), each plotting -log10(FDR) against log2(fold change); c, number of DAS events by type (SE, A5, A3, MX, RI, AF, AL) for BE3 and TadA-TadA*; d, e, relative mRNA expression by qPCR (GFP vs. BE3 and GFP vs. TadA-TadA*) for DDIT3, ETV5, HSPE1, DUSP1, DNAJB1, and SERPINH1; f–i, sashimi plots and exon inclusion (%) from PCR validation for QPRT (chr16), ZNHIT1 (chr7), DHX9 (chr1), and RLIM (chrX, "-" strand).]

Identification of DAS events in base editor-treated cells. For DAS analysis, clean data were mapped to the GRCh38 reference genome using default parameters in SUPPA2 (v2.2.1)22. Seven types of DAS events (SE, A5/A3, RI, MX, and AF/AL) were identified, using a false discovery rate <0.05.

Validation of DEGs and DAS events in base editor-treated cells. Total RNA was extracted from transfected cells using the TRIzol reagent (Invitrogen). First-strand cDNA synthesis was performed using the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scientific). For qPCR, the TransStart® Green qPCR SuperMix (Trans) was used following the manufacturer's instructions. Amplification was performed in a LightCycler® 96 real-time PCR system (Roche). Fold changes in gene expression were calculated using the comparative threshold cycle (Ct) method with GAPDH as internal control, based on the following equation: fold change (test vs. control) $= 2^{-\Delta\Delta C_t}$, where $\Delta\Delta C_t = (C_{t,\mathrm{test/gene}} - C_{t,\mathrm{control/gene}}) - (C_{t,\mathrm{test/GAPDH}} - C_{t,\mathrm{control/GAPDH}})$. The sequences of all qPCR primers used in this study are listed in Supplementary Table 1. Validation of DAS events was performed by PCR analysis using primers (listed in Supplementary Data 15) flanking the skipped exons of target genes.

Statistics and reproducibility. All values are presented as mean ± sem. Unpaired Student's t-tests (two-tailed) were used for comparisons, and P < 0.05 was


Fig. 5 Validation of DEGs and DAS events identified from RNA-seq data in HeLa cells. a Volcano plots showing the significant DEGs (adjusted p-value ≤ 0.05 and |Fold change | ≥ 2; downregulated (blue), upregulated (red), and unchanged genes (gray)) in BE3-treated groups. Cells expressing GFP served as control. b Volcano plots showing the significant DEGs (adjusted p-value ≤ 0.05 and |Fold change | ≥ 2; downregulated (blue), upregulated (red), and unchanged genes (gray)) in TadA-TadA*-treated groups. Cells expressing GFP served as control. Corresponding data were presented in Supplementary Data 13. c Bar plots from RNA-seq experiments showing identified DAS events in HeLa cells transfected with BE3 and TadA-TadA*. Cells expressing GFP served as a control for all comparisons. Corresponding data were presented in Supplementary Data 3. d DEG validation in BE3-treated groups by qPCR. The tested genes were randomly selected and the RNA levels are shown relative to those of the control GFP group, normalized to 1. Data of qPCR are presented as mean ± sem, n = 3 biologically independent samples. e DEG validation in TadA-TadA*-treated groups by qPCR. The tested genes were randomly selected and the RNA levels are shown relative to those of the control GFP group, normalized to 1. Data of qPCR are presented as mean ± sem, n = 3 biologically independent samples. f Sashimi plot and PCR validation of the QPRT gene in GFP and BE3-treated cells. The alternative exon (Exon 2 (E2)) is marked in gray. g Sashimi plot and PCR validation of the ZNHIT1 gene in GFP and BE3-treated cells. The alternative exon (Exon 2 (E2)) is marked in gray. h Sashimi plot and PCR validation of the DHX9 gene in GFP and TadA-TadA*-treated cells. The alternative exon (Exon 2 (E2)) is marked in gray. i Sashimi plot and PCR validation of the RLIM gene in GFP and TadA-TadA*-treated cells. The alternative exon (Exon 2 (E2)) is marked in gray. Error bars represent SD for three independent experiments.
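The relative expression values in panels d and e are derived with the comparative Ct method given in Methods. The sketch below is a direct transcription of that equation, with GAPDH as the internal control; the function name `fold_change_ddct` and the Ct values in the example are hypothetical and are not measurements from the study.

```python
def fold_change_ddct(ct_test_gene, ct_control_gene,
                     ct_test_gapdh, ct_control_gapdh):
    """Fold change (test vs. control) by the comparative Ct method.

    ddCt = (Ct_test/gene - Ct_control/gene)
           - (Ct_test/GAPDH - Ct_control/GAPDH)
    fold change = 2 ** (-ddCt)
    """
    ddct = (ct_test_gene - ct_control_gene) - (ct_test_gapdh - ct_control_gapdh)
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target gene crosses threshold two cycles
    # earlier in the treated sample while GAPDH is unchanged.
    print(fold_change_ddct(22.0, 24.0, 18.0, 18.0))
    # The control compared with itself is 1, matching "normalized to 1".
    print(fold_change_ddct(24.0, 24.0, 18.0, 18.0))
```

Because a lower Ct means more template, a negative ΔΔCt corresponds to up-regulation in the treated sample and a positive ΔΔCt to down-regulation.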

[Figure 6 panels: a, relative mRNA expression of APOBEC1 by qPCR; b, FPKM values of APOBEC1; c, numbers of DEGs (bar labels 990, 2119, 2581) and DAS events (bar labels 146, 652, 1061, 1346) for GFP and APOBEC1 at 1.5, 2.5, and 3.5 μg; d, relative mRNA expression of TadA-TadA* by qPCR; e, FPKM values of TadA-TadA*; f, numbers of DEGs (bar labels 1264, 1706, 2312) and DAS events (bar labels 146, 956, 1384, 1607) for GFP and TadA-TadA* at 1.5, 2.5, and 3.5 μg.]

Fig. 6 Increased expression of deaminases induces an increase in DEGs and DAS events. a Expression of APOBEC1 detected by qPCR in HEK293T cells transfected with a gradient concentration of APOBEC1 (0, 1.5, 2.5, and 3.5 μg). All values are presented as mean ± sem. b FPKM values of APOBEC1 calculated from RNA-seq data in HEK293T cells transfected with a gradient concentration of APOBEC1 (0, 1.5, 2.5, and 3.5 μg). All values are presented as mean ± sem. c The number of DEGs and DAS events in cells transfected with a gradient concentration of APOBEC1 (0, 1.5, 2.5, and 3.5 μg). Corresponding data were presented in Supplementary Data 14. d Expression of TadA detected by qPCR in HEK293T cells transfected with a gradient concentration of TadA-TadA* (0, 1.5, 2.5, and 3.5 μg). All values are presented as mean ± sem. e FPKM values of TadA calculated from RNA-seq data in HEK293T cells transfected with a gradient concentration of TadA-TadA* (0, 1.5, 2.5, and 3.5 μg). All values are presented as mean ± sem. f The number of DEGs and DAS events in cells transfected with a gradient concentration of TadA-TadA* (0, 1.5, 2.5, and 3.5 μg). Error bars represent SD for two or three independent experiments.

considered to indicate statistically significant differences. The number of repeated experiments is described in the figure legends.

Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The authors declare that all relevant data supporting the findings presented in this study are available within the article and its Supplementary Information files, or from the corresponding author upon reasonable request. The raw sequencing data of cells treated with CBE and ABE, in addition to those treated with engineered base editor variants, were downloaded from the Sequence Read Archive BioProject (accession number PRJNA528149) and Gene Expression Omnibus (GEO, accession number GSE129894). Single-cell RNA-seq data for CBE and ABE groups were downloaded from SRA (accession number PRJNA528561). Fig. 1 and Supplementary Fig. 1a–e correspond to Supplementary Data 1. Supplementary Fig. 1h, i correspond to Supplementary Data 2. Fig. 2a, b, Supplementary Fig. 2a, b, d, and Fig. 5c correspond to Supplementary Data 3. Fig. 2c corresponds to Supplementary Data 4. Fig. 2d corresponds to Supplementary Data 5. Fig. 2e corresponds to Supplementary Data 6. Fig. 3a, b correspond to Supplementary Data 7. Fig. 3e, f correspond to Supplementary Data 8. Fig. 3g corresponds to Supplementary Data 9. Fig. 3h corresponds to Supplementary Data 10.


Fig. 3i corresponds to Supplementary Data 11. Fig. 4a–c correspond to Supplementary Data 12. Fig. 5a–c and Supplementary Fig. 6a, b correspond to Supplementary Data 13. Fig. 6c corresponds to Supplementary Data 14.

Received: 18 October 2020; Accepted: 30 June 2021

References
1. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
2. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
3. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).
4. Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353 (2016).
5. Seo, H. & Kim, J. S. Towards therapeutic base editing. Nat. Med. 24, 1493–1495 (2018).
6. Maeder, M. L. et al. Development of a gene-editing approach to restore vision loss in Leber congenital amaurosis type 10. Nat. Med. 25, 229–233 (2019).
7. Rossidis, A. C. et al. In utero CRISPR-mediated therapeutic editing of metabolic genes. Nat. Med. 24, 1513–1518 (2018).
8. Villiger, L. et al. Treatment of a metabolic liver disease by in vivo genome base editing in adult mice. Nat. Med. 24, 1519–1525 (2018).
9. Grunewald, J. et al. CRISPR DNA base editors with reduced RNA off-target and self-editing activities. Nat. Biotechnol. 37, 1041–1048 (2019).
10. Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat. Biotechnol. 38, 620–628 (2020).
11. Grunewald, J. et al. Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors. Nature 569, 433–437 (2019).
12. Zhou, C. et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275–278 (2019).
13. Rees, H. A., Wilson, C., Doman, J. L. & Liu, D. R. Analysis and minimization of cellular RNA editing by DNA adenine base editors. Sci. Adv. 5, eaax5717 (2019).
14. Llaurado, M. et al. ETV5 transcription factor is overexpressed in ovarian cancer and regulates cell adhesion in ovarian cancer cells. Int. J. Cancer 130, 1532–1543 (2012).
15. Monge, M. et al. ERM/ETV5 up-regulation plays a role during myometrial infiltration through matrix metalloproteinase-2 activation in endometrial cancer. Cancer Res. 67, 6753–6759 (2007).
16. Otto, T. & Sicinski, P. Cell cycle proteins as promising targets in cancer therapy. Nat. Rev. Cancer 17, 93–115 (2017).
17. Dang, C. V. MYC on the path to cancer. Cell 149, 22–35 (2012).
18. Lapinaite, A. et al. DNA capture by a CRISPR-Cas9-guided adenine base editor. Science 369, 566–571 (2020).
19. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
20. Anders, S., Pyl, P. T. & Huber, W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).
21. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
22. Trincado, J. L. et al. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 19, 40 (2018).

Acknowledgements
This work was jointly supported by the National Natural Science Foundation of China (61873276, 31972526, 31772571). We thank Professor Hui Yang at Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, and Professor Branden S. Moriarity at Masonic Cancer Center and Center for Genome Engineering, the University of Minnesota for technical and analytical support.

Author contributions
W.S. and X.W. designed research; J.F., Y.D., C.R., Z.S., J.Y., Q.C., C.D., and C.L. performed research; W.S., J.F., and X.W. wrote the manuscript. All authors read and approved the manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s42003-021-02406-5.

Correspondence and requests for materials should be addressed to X.W. or W.S.

Peer review information Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: George Inglis. Peer reviewer reports are available.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021
