www.nature.com/npjgenmed

ARTICLE OPEN Loss of grand histone H3 lysine 27 trimethylation domains mediated transcriptional activation in esophageal squamous carcinoma

Jian Yuan 1,2,6, Qi Jiang1,2,6, Tongyang Gong 3,6, Dandan Fan1,2,6, Ji Zhang1,2, Fukun Chen1,2, Xiaolin Zhu3, Xinyu Wang1,2, ✉ ✉ ✉ Yunbo Qiao5, Hongyan Chen 3 , Zhihua Liu 3 and Jianzhong Su 1,2,4

Trimethylation of histone H3 lysine 27 trimethylation (H3K27me3) may be recruited by repressive Polycomb complexes to mediate silencing, which is critical for maintaining embryonic stem cell pluripotency and differentiation. However, the roles of aberrant H3K27me3 patterns in tumorigenesis are not fully understood. Here, we discovered that grand silencer domains (breadth > 50 kb) for H3K27me3 were significantly associated with epithelial cell differentiation and exhibited high gene essentiality and conservation in esophageal epithelial cells. These grand H3K27me3 domains exhibited high modification signals involved in gene silencing, and preferentially occupied the entirety of topologically associating domains and interact with each other. We found that widespread loss of the grand H3K27me3 domains in of esophageal squamous cell carcinomas (ESCCs) were enriched in involved in epithelium and endothelium differentiation, which were significantly associated with overexpression with increase of active modifications of H3K4me3, H3K4me1, and H3K27ac marks, as well as DNA hypermethylation in the gene bodies. A total of 208 activated genes with loss of grand H3K27me3 domains in ESCC were identified, where the higher expression and

1234567890():,; mutation of T-box transcription factor 20 (TBX20) were associated with worse patients’ outcomes. Our results showed that knockdown of TBX20 may have led to a striking defect in esophageal cancer cell growth and carcinogenesis-related pathway, including cell cycle and homologous recombination. Together, our results reveal that loss of grand H3K27me3 domains represent a catalog of remarkable activating regulators involved in carcinogenesis. npj Genomic Medicine (2021) 6:65 ; https://doi.org/10.1038/s41525-021-00232-6

INTRODUCTION Specific transcription outputs could be contained in the spread Histone methylation primarily on lysine and arginine residues is of epigenetic modifications over a genomic locus. Active governed by multiple positive and negative regulators, to either chromatin marks are usually restricted to specific genomic loci 10 activate or repress transcription, which are essential for cell-fate but have also been observed in broader deposits . Super- determination and embryonic stem cell pluripotency and differ- enhancers, have been used to describe large clusters of fi entiation1. Abnormal expression patterns or genomic alterations in H3K27ac peaks, were speci cally drive the expression of cell identity genes and associated with key oncogenes in many cancer histone methylation can lead to the induction and maintenance of 11–13 various cancers2,3. One of the best examples of deregulated types . In addition, the broad H3K4me3 domains have been histone methylation resulting in changes of and found to associate not only with increased transcriptional precision of cell-type-specific genes related to cell identity but genome integrity in cancer is the extensive loss and gain of also with the increased transcriptional elongation and enhancer histone H3 lysine 27 trimethylation (H3K27me3). The recurrent activity of tumor-suppressor genes14,15. More recently, several gain-of-function or loss-of-function mutations in the gene studies have proposed methods to identity broad H3K27me3 encoding enhancer of zeste homolog 2 (EZH2) is the one of the domain in a genome-wide manner and found these broad genic catalytic subunits of Polycomb repressive complex 2 (PRC2) and 4 repression domains serve as a strong repressive machinery via catalyzes the methylation of H3K27me3 . Gain-of-function muta- chromatin interactions and signify enhanced silencing of onco- tions leading to the overexpression of PRC2 components or genes16,17. Taken together, these findings indicated that modifica- enhancement of the catalytic activity of EZH2 lead to an aberrant tion breadth is essential for the regulation of the transcriptional 5,6 increase in global levels of H3K27me3 . However, loss-of- output across multiple cell or tumor types. function mutations leading to a decrease in PRC2 activity and Esophageal cancer is the sixth leading cause of cancer-related affecting the H3K27 substrate of PRC2 have been found in several mortality worldwide. In China, esophageal squamous cell carcino- types of cancer, resulting in a decrease in the global level of mas (ESCCs) are the predominant histological subtype of H3K27me37,8. Together, aberrant H3K27me3 in different tumor esophageal cancer and fourth leading cause of cancer death18,19. contexts is thought to reflect the crucial roles of the specific Aberrant keratinocyte differentiation is considered to be a key cellular transcriptional programme and chromatin environment9. mechanism in the initiation of ESCC, which is driven by genomic

1School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou, China. 2Institute of Biomedical Big Data, Wenzhou Medical University, Wenzhou, China. 3State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China. 4Oujiang Laboratory, Wenzhou, China. 5Precise Genome Engineering Center, School of Life Sciences, Guangzhou University, Guangzhou, China. 6These authors contributed equally: Jian Yuan, Qi Jiang, Tongyang Gong, Dandan Fan. ✉ email: [email protected]; [email protected]; [email protected]

Published in partnership with CEGMR, King Abdulaziz University J. Yuan et al. 2 changes of lineage-specific transcription factors (TFs) such as domains span DNA regions whose median length is tenfold wider mutations in Notch1/2/3 and ZNF750 and copy number amplifica- than the typical domains, and have levels of signal density that are tions of SOX2 and TP6320. However, how H3K27me3 and other at least a twofold greater than those at the typical domains chromatin modifications influence cell differentiation and devel- (Fig. 1e). Consistent with previous reports23, H3K27me3 was opment of ESCC remains unclear. mostly around located at the transcription start sites (TSSs), In this study, we generated genome-wide maps of H3K27me3, whereas grand H3K27me3 domains were consistently found to H3K4me3, H3K27ac H3K4me1, and DNA methylation, as well as cover whole genes extending both TSS and TES among species gene expression profiles in both normal (NE2) and cancer cell lines (Supplementary Fig. 3). (KYSE450) of ESCC to investigate the role of the epigenetic To further investigate whether the grand H3K27me3 domains alteration in oncogenesis. We found that these extremely grand have features that might distinguish them from other H3K27me3 H3K27me3 domains are of high density and represses transcrip- domains, we performed gene essentiality and conservation tional activity more strongly than typical domains. Remarkably, analyses with genes marked by H3K27me3 width quantile subsets. genes with loss of grand H3K27me3 domains were preferentially We found genes with the top 5% grand H3K27me3 domain have enriched in epithelium/endothelial cell differentiation and blood significant greater missense Z score (Fig. 1f, P = 3.70 × 10−14, vessel development involved in carcinogenesis. Using the grand Wilcoxon rank sum test) and lower dN/dS (Fig. 1g, P = 1.53 × H3K27me3 domains, we discovered the novel carcinogenic genes 10−09, Wilcoxon rank sum test) than other genes without grand of T-box TF 20 (TBX20) in ESCC and experimentally validated our H3K27me3 domains, which implied that genes marked by grand findings in vitro and in vivo. H3K27me3 domains are more constrained and intolerance to the class of variation. A significant negative correlation was found between H3K27me3 domain breadth and gene expression, which RESULTS indicated grand H3K27me3 domains thoroughly repressed gene Top 5% broadest H3K27me3 domains preferentially mark cell- expression in NE2 cells (Fig. 1h, R = −0.76, P = 8.8 × 10−5, Spear- fate determination in esophageal epithelial cell line man’s rank correlation). Together, we uncovered grand To investigate the importance of epigenomic modification in H3K27me3 silence domains (GSDs) that represent a distinct stabilizing normal cell identity and in carcinogenesis of ESCC, we subclass of H3K27me3 domains, which are strongly and uniquely first generated ChIP-seq datasets for H3K27me3, H3K27ac, associated with gene constraint and gene silencing in NE2 cells. H3K4me1, and H3K4me3 and RNA-seq in human esophageal 1234567890():,; epithelial cell line (NE2). Although the vast majority of H3K27me3 Grand H3K27me3 silence domains coincide with topologically domains was present in 4-5 kb regions, the H3K27me3 domains associating domains had a skewness distribution, in which broadest H3K27me3 Although Polycomb-dependent chromatin looping contributes to domains can spanning up to 300 kb in NE2 cells (Fig. 1a). gene silencing have also been described24–26, the relationship Consistent with the notions that exceptionally broad between higher-order chromatin structure and GSDs remains 15 H3K4me3 signature marks tumor-suppressor genes and that unclear. We used publicly available H3K27me3 ChIP-seq and Hi-C 16 broad H3K27me3 marks oncogenes , we observed that the ChIP- datasets from the IMR90 cells and found GSDs in IMR90 cells have seq for H3K4me3 can cover up to 10 kb in the well-known tumor- a strongly positive correlation of functional enrichments with suppressor gene, TP5321, and that broad H3K27me3 domains GSDs in NE2 cells (R = 0.84, P = 1.6 × 10−201 for enrichment score, spanned up to 100 kb, which were present in the RSPO3 Pearson’s correlation; Supplementary Fig. 4). By mapping oncogenes22 (Fig. 1b). H3K27me3 profiles with the Hi-C data in IMR90 cells, we observed To assess whether broad H3K27me3 and other histone that multiple anchors and topologically associated domains (TAD) modification domains mark specific genes in human esophagus often corresponded to regions of GSDs (Fig. 2a). In detail, we system, we performed and pathway enrichment found that 643 out of 840 GSDs overlap with the TADs and 431 analysis for genes with distinct breadth domains. In NE2 cells, top mark the anchors. To conduct a quantitative comparison of 5% broadest H3K27me3 domain showed a striking pattern of different breadth, we retrieved three subsets of H3K27me3 most enrichment, along with the H3K27me3 breadth continuum, domain: top 5% broadest H3K27me3 domains (GSDs), top 5% for genes involved in cell-fate determination (false discovery rate narrowest H3K27me3 domains, and random 5% H3K27me3 − (FDR) = 1.0 × 10 11), as well as genes involved in epithelial tube domains as a control. Interestingly, we found 30% GSDs across − morphogenesis (FDR = 2.0 × 10 10) (Fig. 1c, Supplementary the entirety of TAD in comparison with surrounding regions, Table 1). And the pathways in cancer (hsa05200) were also which inferred these TADs might lead to the formation of − enriched in the broadest H3K27me3 domains (FDR = 1.5 × 10 04). repressed compartments (Fig. 2b). Similar patterns were also In contrast, the broadest domains of other histone marks of observed in fibroblast derived cell line (GM23248) and lympho- H3K4me3, H3K4me1, and H3K27ac were not specifically enriched blastoid cell line (GM12878) (Supplementary Fig. 5). Compared for epithelium development (Supplementary Fig. 1). As expected, GSDs with narrow and control domains, interactions between we found that the set of 5% broadest H3K27me3 domains was anchors inside GSDs were more likely to interact with each other enriched in oncogenes in NE2 cells (Supplementary Fig. 2). Taking (Fig. 2c). Alternatively, we calculated the expected number of these data together, we conclude that top 5% broadest interactions for shuffle GSDs under random simulations. The H3K27me3 peaks in NE2 cells involved in specific gene repression observed number of GSDs interacted with each other is related to cell-fate determination, epithelium development, and significantly higher than the results obtained by randomly carcinogenesis. shuffling GSD assignments (P < 0.001, Fig. 2d). In addition, GSDs These observations motivated us to perform a systematic tend to interact with more anchors to form multiplex chromatin analysis of the function characterization between top 5% broadest interactions (Fig. 2e), suggesting that they tend to cosegregate H3K27me3 domains and the remaining typical domains. Notably, with multiple anchors simultaneously for a consequence of phase we found broad H3K27me3 domains were strongly associated separation consistent with previous studies27,28. with higher ChIP-seq signal (total reads) and intensities (Fig. 1d). To infer the target genes of GSDs and potential regulatory Based on functional specificity and high correlation with signal function, we defined putative target genes as genes that mapped density, we defined the top 5% broadest peaks which greater than within, or in distance (≤3 kb) to, a downstream interaction region. 50 kb in width and totally encompassed 813 H3K27me3 domains A total of 388 putative target genes were defined to the 430 as grand H3K27me3 domain (Fig. 1d). The grand H3K27me3 anchor-related GSDs, and 60 genes were overlapped with 1172

npj Genomic Medicine (2021) 65 Published in partnership with CEGMR, King Abdulaziz University J. Yuan et al. 3

a b chr 17 7,670,000 7,690,000 50 H3K27me3 0 50 2500 2500 2000 H3K4me3 1500 0 50 H3K4me1 0 50 1500 H3K27ac 0 50 1000 RNA-seq 1500 1500 0

1000 TP53

chr 6 127,100,000 127,200,000 500 20 500 H3K27me3 0 500 500 20 H3K4me3 0 20 Number of H3K4me1 ChIP-seq peaks Number of H3K4me3 ChIP-seq peaks Number of H3K27ac ChIP-seq peaks

Number of H3K27me3 ChIP-seq peaks H3K4me1 0 0 0 0 0 20 0102030 010203040 010203040 010203040 H3K27ac 0 H3K4me3 breadth (kb) H3K27ac breadth (kb) H3K4me1 breadth (kb) H3K27me3 breadth (kb) 20 RNA-seq 0

RSPO3 c d

100 ) 100 Cell fate determination (%) (%

Endocrine system development e 80 il 80

Digestive system development nt ntile a Epithelial tube morphogenesis a

Epithelial cell fate commitment 60 qu 60 Epithelial cell proliferation ight

Muscle cell fate commitment ignal qu 40 40 s Notch signaling pathway he Hippo signaling pathway 20 20 Transcriptional misregulation in cancer R = 0.93 R = 0.71 Pathways in cancer p < 2.2X10-16 p < 2.2X10-16 PI3K−Akt signaling pathway 0 0 H3K27me3 H3K27me3 Canonical Wnt signaling pathway 0 2040 60 80 100 0 2040 60 80 100 WNT SIGNALING H3K27me3 breadth quantile (%) H3K27me3 breadth quantile (%) Stem cell fate commitment Cardiac ventricle morphogenesis e Pattern specification process Typical peaks Grand H3K27me3 domain Amine transport Median size:7.6kb Median size:75.4kb Limb morphogenesis H3K27me3 10kb Embryonic organ development 6 H3K27ac Cell fate commitment H3K4me3 Reproduction 4 Tissue morphogenesis H3K4me1 CNS neuron differentiation 2 Fold enrichment 357 H3K27me3 breadth 0 q -log10( -value) 0510 NE2 H3K27me3 density -2 Start End Start End fgh P = 3.70X10-14 P = 1.53X10-09

6 1.0 0.4 R = -0.76 p = 8.8X10-5 4 0.8 0.3 2 0.6

0.2

0 dN/dS 0.4 Gene expression

missense Z score 0.1 −2 0.2

−4 0.0 0.0

H3K27me3 breadth H3K27me3 breadth H3K27me3 breadth Fig. 1 Grand H3K27me3 silencer domains marked subsets of genes in esophageal epithelial cell line. a Breadth distribution of H3K27me3, H3K4me3, H3K4me1, and H3K27ac ChIP-seq peaks in human esophageal epithelial cell line (NE2). Inserts example histone modification regions with different width in NE2; black bar represent ChIP-seq peaks called by SICER. b H3K27me3, H3K4me3, H3K4me1, and H3K27ac density at the tumor-suppressor TP53 and oncogene RSPO3 in NE2 cells. c Functional enrichment for genes along the H3K27me3 breadth continuum. The color and size of each point represented the −log10 (P value) values and enrichment scores. d The index of H3K27me3 peak signal or peak height (y-axis) plotted against peak breadth (x-axis). Blue dots indicate the top 5% broadest H3K27me3 domains which we defined “grand H3K27me3 silencer domains.” e Metagenes of H3K27me3, H3K4me3, H3K4me1, and H3K27ac ChIP-seq density (reads per kilobase per million mapped reads). Metagenes are centered on the H3K27me3 region (7.6 kb for typical peaks and 75.4 kb for grand H3K27me3 domains), with 3 kb surrounding each peak region. Boxplots representing the missense Z scores (f), the ratio of substitution rates, denoted dN/dS, between human and mouse (g) and gene expression level, and TPM (h) associated with various H3K27me3 domain breadths in NE2.

Published in partnership with CEGMR, King Abdulaziz University npj Genomic Medicine (2021) 65 4 p eoi eiie(01 65 (2021) Medicine Genomic npj 2 Fig. asa 5k eouino h eoi einCr1 49.3 Chr.17 region genomic the of resolution 15-kb at maps nbak n ersnaiegnswti h Asaeson 32m3 34e,HKm1 32a,adCC hPsqadDNA and ChIP-seq CTCF and H3K27ac, H3K4me1, H3K4me3, H3K27me3, shown. are TADs the 49.3 Chr.17 within on genes tracks representative methylation and black, in cosTD.TD r omlzdt oiin0 ersnigteTDcne,adpositions and center, TAD the representing 0, position to normalized are TADs TADs. across arwpa,adcnrlpa mn aeois xmlso rwe iwo rn 32m3dmismdae neatosaeshown. are interactions domains, mediated H3K27me3 domains grand H3K27me3 the grand of of fractions view the browser shows of Barplot Examples (>3). categories. interaction among pro multiplexed peak performed). Contact were control and permutations and (2), of peak, interaction replications narrow double 1000 (1), histogram, interaction blue (the single location genomic domain H3K27me3 e grand of permutation random d c 12GD akdgnsad38rno eetdgns Isr)Sera orlto coef correlation Spearman (Insert) genes. selected random 388 and genes marked GSDs 1172 r orltdfrG em hrdbten17 Ssmre ee n 8 Sstree ee.(net pamncorrelation Spearman (Insert) genes. targeted GSDs 388 and genes marked GSDs 1172 between shared terms GO coef for correlated are eecmaio ewe Ssmre n Sstargeted. GSDs and marked GSDs between comparison gene rprino rn 32m3dmis arwpas rcnrlpaspeeetal soit ihec te i hoai interactions. chromatin via other each with associate preferentially peaks control or peaks, narrow domains, H3K27me3 grand of Proportion nlsso rn 32m3dmismdae utpee neatosb aeoiigalitrcin nogadHK7e oan to domains H3K27me3 grand into interactions all categorizing by interactions multiplexed mediated domains H3K27me3 grand of Analysis h bevdnme fgadHK7e oan htitrc ihohrmmes(h e ie esstenmesotie by obtained numbers the versus line) red (the members other with interact that domains H3K27me3 grand of number observed The fi f cients. de cd ab h itiuino rn 32m3slne oan screae ihte3 eoeognzto.a organization. genome 3D the with correlated is domains silencer H3K27me3 grand of distribution The IMR-90 DNAme IMR-90 H3K4me1 IMR-90 H3K4me3 IMR-90 H3K27ac IMR-90 CTCF IMR-90 H3K27me3 RefSeq Genes GSDs markedgene

Proportion of H3K27me3 peak chr 1749,500,00050,000,000

fi (intra-interaction) h 0.00 0.05 0.10 0.15 0.20 0.25 e nldn h 32m3dniympaditrcin o op)frtemlilxdcpueaeshown. are capture multiplexed the for loops) (or interactions and map density H3K27me3 the including les eeotlg n aha nihetsigni enrichment pathway and ontology Gene 1172

GSDs 108

narrow Resolution =5kb chr.17 49.3Mb 200 10 10 10 =15 10 2 0 0 0 0 0 1 0 control GSDs Targetedgene 0328 60 4 ZNF652 PHB

– Density of 1,000 shuffles 0.00 0.05 0.10 0.15 01M o IMR90. for Mb 50.1 NGFR NXPH3 Peak numberofintra-interaction 01 025 20 15 10 5 0 g SPOP

GO enrichment of 328 GSD-targets KAT7 TAC4 (-log10(p-value)) 2 3 4 5 6 DLX4 DLX3 b .Ya tal. et Yuan J. oiino ek o rn 32m3dmis arwpas n oto peaks control and peaks, narrow domains, H3K27me3 grand for peaks of Position GO enrichmentof1172GSD-genes – angiogenesis embryonic morphogenesis 01M nnra lung normal in Mb 50.1 fi g ac ( cance 02 040 30 20 10 epithelial celldifferentiation eeotlg n aha nihetsigni enrichment pathway and ontology Gene chr.17 50.1Mb 108 (-log − 10 log (p-value))

embryonic development Proportion of H3K27me3 peak 10 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 multiplex chromatininteractions R=0.64, P=1.7X10 ( P Proportion of H3K27me3 peaks 234 au)vle)aentcreae o Otrssae between shared terms GO for correlated not are values) value) 0.1 0.2 0.3 37 12>=3 0 ulse nprnrhpwt EM,Kn buai University Abdulaziz King CEGMR, with partnership in Published 62 fi -3 137 rbatcl ieIR0 neato op r contoured are loops Interaction IMR90. line cell broblast Control Narrow GSDs 9 -13 16 SsControl GSDs h H3K27me3 peaksinTAD(-1to1) -2 GO enrichment of 328 random genes 60 1 fi (-log10(p-value)) cients. 2.0 2.4 2.8 6 − -1 23456 n ersn h A boundary. TAD the represent 1 and 1 Narrow 10 10 10 chr 8 h 232,100,000 chr 2 GO enrichmentof1172GSD-genes 0 0 0 chr 1 143,430,000 0123 fi PCDSL LPL1PRSS56 ALPPALP1 DIS3L2 NPPC ac ( cance TOP1MT (-log CT E2NRF5NOL9 HES2TNFRSF25 ACOT7 10 6,415,000 MAFA (p-value)) xml fH- contact Hi-C of Example − log 143,800,000 GSDMD 232,500,000 10 f R=0.20, P=0.33 6,550,000 endarmof diagram Venn ( ZNF623 P au)values) value) J. Yuan et al. 5 GSD-marked genes (Fig. 2f). To understand the function of the GSDs loss activated genes with acquired active promoter and putative GSD target genes, we analyzed the biological processes hypermethylated gene body and pathways of the genes targeted by GSDs. Excluding 60 genes To determine the molecular basis of GSDs mediated gene activity, marked by GSDs from domain target genes, it was nominally we performed integrated analyses of ChIP-seq and whole-genome enriched (P < 0.01) for 235 GO terms. Of these, 106 terms were bisulfite sequencing (WGBS) data in NE2 and KYSE450 cells. For enriched in both GSD-targeted genes and GSD-marked genes 208 activated genes in cancer found associated with GSD loss, a (Supplementary Table 2). Furthermore, we observed a significant significant increase was observed in H3K4me3, H3K4me1, and positive correlation in enrichment P value for shared terms (R = H3K27ac levels in KYSE450 cells (Fig. 4a). Notably, this was 0.67, P = 8.9 × 10−16, Fig. 2g). GO term analysis of the GSD- accompanied with an increase of DNA methylation level at gene targeted genes indicated enrichment of terms related to body but not hypomethylation at promoters (Fig. 4a). In our development and differentiation processes. In contrast, there previous study, we discovered a strong correlation between gene- was no correlation for function enrichment between GSD-marked body hypermethylation of DNA methylation canyons (large domains of low DNA methylation) and overexpression of homeo- genes and random selected genes (Fig. 2h). Together, our results box genes29. It indicates that GSDs loss could induce de novo DNA demonstrate that the biological nature of GSDs tend to be methylation in gene body to maintain transcriptional stability. chromatin looping and relevance to the coordinately regulation of To further explore genomic architecture of GSDs activated functionally related genes. genes, we searched for TF-binding motifs enriched in the H3K4me3 and H3K27ac peaks of 208 upregulated genes using a Widespread loss of GSDs induced gene activity in ESCC motif analysis algorithm. In KYSE450 cells, TF-binding motifs were To find out how loss or gain of GSDs affects gene expression only enriched (Fisher’s exact test FDR < 0.05) in upregulated genes levels in ESCC, we first compared their H3K27me3 levels and when using 50,000 background sequence regions as a control. The expression changes between cancer KYSE450 cells and normal top scoring TF motifs were activator 1 (AP1) family, cells (NE2) (Fig. 3a). We found that average H3K27me3 levels of including FOSL2, JunB, BATF, ATF3, and Jun, in H3K4me3 and GSD-marked genes but not narrow and control group were H3K27ac peaks (Fig. 4b), which suggests that the TF AP1 likely decreased in KYSE450 cells (Supplementary Fig. 6). Specifically, contribute to regulation of GSDs associated genes. 557 of 844 (66.0%) GSDs associated genes and 155 (18.4%) GSDs associated genes showed H3K27me3 signal loss and gain in RNA-seq analysis reveals novel gene signatures regulated by KYSE450 cells compared with NE2 cells, respectively (Fig. 3b). TBX20 in ESCC Next, we investigated the relationship between change in To understand how the GSDs activated genes defined above are H3K27me3 signal and change in gene expression. As expected, controlled, we focused on TFs because of their broad role in 30 H3K27me3 signal loss in KYSE450 cells was correlated with initiating programs key to cell fate and identity in cancer . transcription upregulation, whereas gain of H3K27me3 signal was Among 24 TFs of 208 activated genes with no known role in ESCC, correlated with downregulated transcription (Fig. 3c). Only 30 of we found expression of TBX20 was associated with TCGA ESCC ’ = 155 (19.4%) genes that were gained H3K27me3 signal showed patients outcome in univariate log-rank analysis (P 0.042, log- significantly downregulated than in KYSE450 cells. Conversely, rank test, Fig. 5a, Supplementary Table 4). We also found that the overall survival of 563 patients with TBX20 mutation was worse 208 of the 557 genes (37.3%) that showed less H3K27me3 are than 18,632 patients without TBX20 mutation in TCGA Pan-Cancer highly expression in KYSE450 cells (Fig. 3c, Supplementary − Resource (P = 9.04 × 10 4, log-rank test, Fig. 5a). TBX20 expression Table 3). Although narrow and random H3K27me3 peaks were was low in normal esophageal epithelial cells, but TBX20 was also found to be correlated with differential gene expression, the highly expressed in KYSE450 and KYSE510 cells and TCGA = − strength of these correlation was smaller than GSDs (R 0.54 patients’ samples (Fig. 5b, Supplementary Fig. 11). Using a for GSDs, −0.46 for narrow and random domains, Spearman’s lentiviral-based RNA interference (short hairpin RNA (shRNA)) correlation; Fig. 3c, Supplementary Fig. 7). To test the effect of approach, we knocked down TBX20 in KYSE510 cells and H3K27me3 breadth loss on transcriptional consistency, we quantified cell growth in vivo (Fig. 5c). As expected, we found used H3K27me3 ChIP-seq dataset and expression RNA-seq that TBX20 depletion significantly suppressed cell growth in vitro obtained in another ESCC cell lines, KYSE510, and found the (Fig. 5d) and in vivo (Fig. 5e, Supplementary Fig. 12). increased transcriptional expression associated with the loss of To identify the molecules and pathways responsible for the H3K27me3 signal (Supplementary Fig. 8). biological effects of TBX20, we performed RNA-seq analysis We next examined whether the gene expression upregulated after knockdown of TBX20 and obtained high correlations by H3K27me3 loss in ESCC cells also exist in human tumors. In the between the biological replicates in each experiment (Supple- ESCC dataset from The Cancer Genome Atlas (TCGA), the 208 mentary Fig. 13). We next performed differential gene GSDs activated genes were also significantly upregulated com- expression analysis, which at an FDR of 5%, identified − fi pared to the background genes (P = 2.0 × 10 36, Wilcoxon rank 266 signi cantly differentially expressed transcripts after fi sum test, Fig. 3d). Although narrow and random H3K27me3 peaks TBX20 knockdown in KYSE510. Expression was signi cantly “ ” loss also results in gene activation, we observed stronger activity higher for 112 transcripts ( upregulated abundance )and significantly lower for 154 transcripts (“downregulated abun- on GSDs associated genes than genes harboring narrow or control dance”) in TBX20-depleted KYSE510 cells (Fig. 5f, Supplemen- H3K27me3 peaks. The total 208 differentially expressed GSDs tary Table 5). To determine enriched DNA motifs at TBX20 associated genes were mainly enriched in cell-fate commitment −08 regulated genes, motif analysis was performed on differentially (FDR = 2.0 × 10 ) and blood vessel development (FDR = 4.0 × fi −04 expressed transcripts. As expected, the most signi cantly 10 , Supplementary Fig. 9). Furthermore, gene set enrichment enriched motif at differentially expressed genes was the analysis (GSEA) of representative gene set in KYSE450 and TBX20 (Supplementary Fig. 14). Gene ontology and pathway KYSE510 cells on epithelium development, endothelium develop- enrichment analysis revealed a significant association between ment, and blood vessel development is shown in Fig. 3e and TBX20 and important cancer-related pathways; downregulation Supplementary Fig. 10, which also indicated a significant was associated with cell cycle (P = 5.3 × 10−08) and double-strand correlation between upregulated epithelium and endothelium break repair via homologous recombination (P = 9.2 × 10−07) development programs and GSDs loss in ESCC. in TBX20-deletion KYSE510s (Fig. 5g, Supplementary Table 6).

Published in partnership with CEGMR, King Abdulaziz University npj Genomic Medicine (2021) 65 J. Yuan et al. 6 a Super-silencers Narrow peaks Control peaks 15 10 5 H3K27me3 signal (KYSE450) H3K27me3 signal (KYSE450) H3K27me3 signal (KYSE450) 0 5 10 15 20 25 30 0 0 5 10 15 20

0 5 10 15 20 25 30 051015 0 5 10 15 20 H3K27me3 signal (NE2) H3K27me3 signal (NE2) H3K27me3 signal (NE2) b c d NE2 KYSE450 NE2 KYSE450 Super-silencer-up: 208 TCGA_ESCC Super-silencer-dn: 30 All gene (19645) Broad_up (208) Narrow_up (106) Control_up (106) 0.6 0.8 1.0 0 5 10 15 0.4 −5 Cumulative fraction 0.2 Loss in tumor (n = 557) Gain in tumor (n = 155) -10 mRNA fold change (log2) R=-0.54, P= 5.3X10-36 −15 0.0 −20 −10 0 10 Mean H3K27me3 signal −10−5 0 5 10 2kbGene 2kb −1 01 KYSE450-NE2 mRNA fold change (log2) e Epithelium development Endothelium development Blood vessel development 0.0 0.0 FDR : 0.002 FDR : 0.019 0.0 FDR : 0.017 ES : -0.596 ES : -0.588 ES : -0.549 Normalized ES : -1.543 Normalized ES : -1.767 −0.2 Normalized ES : -1.485

−0.2 ichment Score −0.2 r richment Score richment richment Score richment

−0.4 −0.4 −0.4 Running En Running En Running En ic r ric 10 H3K27me3 gain 10 H3K27me3 gain ric 10 H3K27me3 gain 0 0 0 −10 −10 −10

ed List Met −20 −20 −20 ed List Met k H3K27me3 loss H3K27me3 loss H3K27me3 loss (Signal/Noise) (Signal/Noise) −30 −30

(Signal/Noise) −30 Ran Ranked List Met Ranked 0 250 500 750 1000 0 250 500 750 1000 Rank 0 250 500 750 1000 Rank in Ordered Dataset Rank in Ordered Dataset Rank in Ordered Dataset Fig. 3 Widespread loss of grand H3K27me3 silencer domains in esophageal squamous cell carcinoma. a Scatterplot showing average H3K27me3 signal of indicated peak-related genes in KYSE450 (y-axis) and NE2 (x-axis). H3K27me3 gains and losses in KYSE450 cells, compared to NE2 cells (determined using > 1-fold), are outside the dotted lines, respectively. b In grand H3K27me3 domains marked gene, heatmaps showing H3K27me3 distribution in NE2 and KYSE450 cells. H3K27me3 on gene bodies (50 equal-sized bins) ± 2 kb (ten equal-sized bins) is represented on the brown scale. Each row represents the same gene in NE2 and KYSE450 cells. c H3K27me3 difference for GSDs marked gene (mean values; x-axis) plotted against expression of the corresponding genes (y-axis; log2 values) in KYSE450 cells versus NE2 cells (along axes): red, selected H3K27me3 losses and upregulated genes; blue, H3K27me3 gains and downregulated genes; and black diagonal line, linear regression. d Cumulative distribution of gene expression changes in TCGA ESCC. The x-axis is log2 (fold change) in TCGA tumor versus normal. The y-axis represents the cumulative fraction of genes: red, genes with grand H3K27me3 domains loss-induced upregulation; blue, narrow peak loss-induced gene upregulation 3; green, control peak loss-induced gene upregulation; and gray, entire mRNA. e GSEA of H3K27me3 losses in GSDs marked genes between NE2 and KYSE450 (normalized enrichment score (NES); false discovery rate-adjusted P value (FDR)).

These results indicate that loss of GSDs associated TBX20 and have been identified21,31–40. Although epigenomic alterations, the pathways it controls represent an important hub of tumor such as changes in DNA methylation41,42 and histone modifica- control in ESCC. tions43,44, also have been observed in ESCCs, it is not clear how they contribute to tumor development. The increasing knowl- edge on how epigenetic modifications such as H3K27me3 DISCUSSION influence patterns of gene expression in cancer holds great ESCC is a common malignancy without effective therapy. The promises for understanding biological processes, and develop- hundreds of whole exomes and genomes of ESCCs have been ing a novel molecular prognosis biomarker, patient stratifica- sequenced in the past few years, and numerous key aberrations tions approaches, and therapeutic intervention strategies.

npj Genomic Medicine (2021) 65 Published in partnership with CEGMR, King Abdulaziz University J. Yuan et al. 7

a H3K4me3 H3K27ac

NE2 KYSE450 NE2 KYSE450 25 3 NE2 NE2 20 KYSE450 2 KYSE450

15 1

10 0

-1

5 H3K27ac signal H3K4me3 signal

0 -2 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 H3K4me1 DNA methylation

NE2 KYSE450 NE2 KYSE450 4 1.0 NE2 NE2 3 KYSE450 0.8 KYSE450

2 0.6

1 0.4

0 0.2 H3K4me1 signal Methylation level -1 0.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 -5.0 TSS TES 5.0 b Top 10 motifs enriched (q-value < 0.05) in H3K4me3 peaks Top 10 motifs enriched (q-value < 0.05) in H3K27ac peaks

Name MotifP-value Q-value Name MotifP-value Q-value GG C GA C TCC TAA C TT A CTA FOSL2 ATG CA 1e-17 0.0000 Jun-AP1 TG AT 1e-7 0.0000 GA A T G TT T CG C TG T G AG G C A C A CG CG AG T CG TA A C C T AC A T T A G T C G A CCCTGA TCAGG C TGA TCA GG GA C T AAAT AG C T A A T JUNB1e-14 0.0000 NFE2L2T CA G 1e-7 0.0000 GGT TAC T G C T T CT C A AC T G C GGCG AA TGA A G G C A T GT T A CA GA GT G C TC GTT CCC C CA G G CTGTGCGATCTCGCAA G A TCA TA G T GA G TCC GC G A GTG BATF AG C 1e-14 0.0000 FOSL2 TG A 1e-6 0.0001 T C

CG A C A CG T CG AG T CG T A CT T A A A CA C T CG CA TA T G CC GGC TG CAA G G C T CTAT A T A T A C T A GT A T G C A BACH2 G C 1e-12 0.0000 NRF2 T T 1e-6 0.0002 A A C AGC C AATT C AG CG T GT T C T T C G G AC TC A G C G T C G TT TT A C A GGAGAC A G T C CGA C G A CA TGA TCA GTGCTGAGTCA C GA C TCA AG CT TA G Jun-AP1 T AT 1e-12 0.0000 BACH2 G C 1e-5 0.0002 A A AGC CG C TG T G T G T C G CG T GT TA A C C AG AC TTAG AC TAC TTA G AC T G C G A C TGA TCA GG CG TGACTCA G CC AA C TTA TG CAT GAA ATF3 1e-11 0.0000 AARET 1e-5 0.0014 T G T C T TCG A C TA CG T GACG A G T ACG T A CTGC A CG A C AG T A GC T CG A GA CG T CCTGA TCAGGG TTGCATCA CT A AAG CGG C GA T CGAG T TC A A A C GT C T TT A T AGC C C C CA AC TGT E2F71e-8 0.0000 p63 A A 1e-4 0.0027 TA T T T A A A T T TG T T A T C G G G T G C T G A A A A A A TA GA A A CC CG CA G T C GT C G A GT C G TCACTTCGCGC G GTC C GG GGTC CC G G CA CC CT GA TC GA TC A GA A AT TTGA AT TTG G G T T A C E2F1 A 1e-6 0.0000 p53 T A A T A 1e-4 0.0051 C TT C A T GAAA C T G T C T T T G T T C C A T A A A A A C C T TG C G GC A G A T A C G CA C G GC G T C C TC GCGGGAA C CG G GGCC C GGGGT AAG G C GC C G T CC GA CC CT G T A GA T TA A T GGCT T CG A CAC TC T A CCAC GCG CA AC TGT TA A p63 A A 1e-5 0.0002 HRE A T C G 1e-4 0.0058 TA T T GG G T A A C G G A T A TT A G T TG T T C G A A CC CG CA G T CA A G T G C G A GT C G T CT T GC CG CA A A GTC C GG GGTC CC G G ATAA GAGAAATTTCCT TT G G G GT GT CAG IRF8 C CC 1e-4 0.0003 SREBF2 TA C CT 1e-3 0.0077 A C G AA A C A G A G T CG A T CG T T A T G T A G G T A C C TG C TG T A T GTT G C TTGT ACT CC T C G C A G AAA AGA A TCCTCAGACCAG

Fig. 4 Grand H3K27me3 silencer domain breadth dynamics encodes epigenomic signature and cancer-associated TF binding. a H3K4me3, H3K4me1, H3K27ac, and DNA methylation density plots (left) and average density (right) across all modified regions in KYSE450 versus NE2 cell line. Columns represent the region from −5 kb with respect to the transcription start sites (TSSs) to +10 kb with respect to the transcription end site (TES). b Top ten TF-binding motifs enriched in H3K4me3, H3K4me1, and H3K27ac peaks of upregulated GSDs marked genes. Motif and statistical analysis were performed in HOMER software.

In this study, we have demonstrated the existence of GSDs was GSDs differ from typical silencers, which are preferentially associated with highly conserved and functional essential genes in segregated by TAD boundaries, associated with each other via human esophageal epithelial cells. GSD is mutually exclusive with chromatin interactions, and their anchors form links at arbitrary H3K4me3, H3K4me1, and H3K27ac marks and is linked to gene distances and construct multiplex interactions. expression repression. We characterized the GSDs landscape of In addition to peak breadth encoding information on gene NE2 cells, and identified that GSDs associated genes are enriched function, we showed that its breadth dysregulation is also a link in the processes exclusive to squamous cell carcinoma biology, between H3K27me3 and changes to gene expression. These GSDs such as epithelium differentiation and development, as well as are more sensitive to perturbation than typical silencers, and the cell-fate determination and carcinogenesis. In addition, these expression of GSDs associated genes is preferentially affected. In

Published in partnership with CEGMR, King Abdulaziz University npj Genomic Medicine (2021) 65 J. Yuan et al. 8

a b chr 7 35,200,000 35,300,000 TCGA_Pan-Cancer NE2 30 TCGA_ESCC 0 1.00 H3K27me3 30 1.00 High expression of TBX20 (n=44) TBX20 mutated (n=563) KYSE450 0 Low expression of TBX20 (n=44) TBX20 not mutated (n=18,632) 60 NE2 0 0.75 H3K4me3 60 0.75 KYSE450 0 20 NE2 H3K4me1 0 20 0.50 0.50 KYSE450 0 10 NE2 H3K27ac 0 0.25 0.25 10 Survival Probability Survival Probability KYSE450 0 Log Rank Log Rank 1 P NE2 = 0.042 P = 9.04e-4 0 WGBS 1 0.00 0.00 KYSE450 0 500 1000 1500 2000 0 5 10 15 0 Overall survival (days) Overall survival (years) 10 NE2 0 10 RNA-seq KYSE450 0 c d 10 KYSE510 0 2.0 siNC ** TBX20 siTBX20 e shRNA Lentiviral Transduction 1.5 shNC

1.0 shTBX20#1

Culture dish

0.5 shTBX20#2

Subcutaneous tumor Absorbance at 450nm 0.0

024 days f g TBX20 knockdown in KYSE510 GO:0007059: segregation 112 Up, FDR <= 0.05 GO:0044770: cell cycle phase transition GO:0006310: DNA recombination 154 Down, FDR <= 0.05 GO:0045047: protein targeting to ER ko04110: Cell cycle GO:0051348: negative regulation of transferase activity GO:0070482: response to oxygen levels GO:0071103: DNA conformation change M281: PID FAK PATHWAY GO:0007584: response to nutrient ko04114: Oocyte meiosis GO:0000722: telomere maintenance via recombination GO:0008380: RNA splicing (Fold change) (Fold

2 GO:0071417: cellular response to organonitrogen compound ko00270: Cysteine and methionine metabolism

log M129: PID PLK1 PATHWAY M239: PID A6B1 A6B4 INTEGRIN PATHWAY GO:0006890: retrograde vesicle-mediated transport, Golgi to ER −4 −2 0 2 4 GO:0010038: response to metal ion GO:0006984: ER-nucleus signaling pathway

0 5 10 15 01234567

log2 (Mean count) -log10(P) Fig. 5 RNA-seq analysis reveals genes and pathways targeted by TBX20. a Kaplan–Meier estimates of overall survival of patients with low- or high-expression level of TBX20 in TCGA ESCC patients (left) and with mutated or not mutated TBX20 in TCGA Pan-Cancer (right). Statistical analysis was performed using the log-rank test and univariate Cox analysis. b Genomic snapshot of H3K27me3, H3K4me3, H3K4me1, H3K27ac, and RNA-seq in NE2, KYSE450, and KYSE510 at the TBX20 loci. c Experimental design to test the role of GSDs mediated oncogenesis candidates in cell proliferation and subcutaneous tumor growth. d Cell proliferation was measured using the CCK-8 assay. Error bars, s.d. calculated from five replicates. ***P < 0.001, **P < 0.01, **P < 0.05, compared to control siRNA in Wilcoxon rank sum test. e TBX20 silencing inhibited tumor growth as measured by tumor weight. Error bars, s.d. calculated from night replicates. ***P < 0.001, compared to control in Wilcoxon rank sum test. f MA plot comparing control and TBX20 knockdown KYSE510 cells. g Heat map of Metascape-enriched clusters. Each cluster contains multiple gene sets to eliminate redundancy. Analysis used genes meeting an FDR q < 0.05 threshold.

addition, GSDs dysregulation is linked to enhancer activity and oncogenes29. In this study, we found that loss of H3K27me3 in gene activation via coordination of H3K4me3 and H3K27ac parallel with hypermethylation of gene body but not promoter deposition and DNA hypermethylation of gene bodies. In the within GSDs marked genes is associated with increased gene , promoter hypermethylation preferentially leads expression. Therefore, we unveiled the more extent role of to the silencing of gene expression45,46, whereas hypomethylation epigenetic mechanism we discovered in homeobox oncogenes to of high-CpG dense promoters showed the most profound increase GSDs medicated gene-body hypermethylation in tumorigenesis. in gene expression47,48. As we know, most gene bodies have low In addition, we found that GSDs loss upregulated a transcriptional CpG density and are heavily methylated and its methylation is program driven by the TFs of AP1 family. As per previous study, involved in maintaining gene transcription efficiency49,50. AP1 can exert its oncogenic and antioncogenic effects by Recently, we found gene-body hypermethylation of canyon genes regulating genes involved in cell proliferation, differentiation, is strongly associated with increased expression of homeobox apoptosis, angiogenesis, and tumor invasion51. In esophageal

npj Genomic Medicine (2021) 65 Published in partnership with CEGMR, King Abdulaziz University J. Yuan et al. 9 adenocarcinoma, AP1 was identified as a molecular switch to alter function to calculate ChIP-seq signal, subtract corresponding background gene expression and hence cause cells to adopt a cancer fate52. input signal, normalize the number of reads per bin in reads per kilobases The biological associations with H3K27me3 dysregulation may per million (RPKM) method, and generate BigWig format files. The input- be attributed to the function of the tissue and the pathophy- subtracted peak signal within a region was measured as a RPKM value siological status of the model in relation to the ongoing gene using bigWigAverageOverBed. The ChIP-seq signal values at each expression programs. Here, a major new discovery driven by the across individual genes were submitted to deepTools to draw profiles and GSDs dysregulation was the identification of TBX20 as a key heatmaps. regulator in ESCC. We have not only found that high expression of TBX20 is related to the poor prognosis of patients but also we RNA-seq and data analysis verified the critical role of TBX20 in tumor growth in vivo. Total RNA was extracted with TRIzol reagent (Life Technologies). Notably, RNA-seq analysis revealed that TBX20 regulates several Complementary DNA libraries were constructed using an Illumina TruSeq pathways that are important for tumor progression, including RNA Sample Prep kit according to the manufacturer’s protocol. A total of cell cycle33,53 and homologous recombination repair54,55,which 150 base paired-end reads were sequenced using the Illumina HiSeq X-Ten opening a path to identifying new targets for therapy of platform. RNA-seq raw reads were mapped to the human genome version aggressive cancers and giving support to the relevance of hg38 using STAR-2.6.160 with default parameter values, used the htseq- TBX20 network to ESCC. count function in HTSeq61 to calculate read counts for each gene. In conclusion, by exploring epigenetic changes associated with Transcript abundances at the gene level were calculated as TPM values 62 tumorigenesis in ESCC cell line, we discovered that for carcino- using RSEM . We utilized the function to generate a BigWig file that genic sets, characterized by loss of grand H3K27me3 domains, contains RNA-seq signal (read density) at each base pair across the fi 63 their expression levels are raised in cancer. Establishing a catalog genome. The BigWig le was then submitted to IGV to visualize RNA-seq signal at individual genes. Differential gene expression analysis between of these cis-regulatory elements and trans-regulatory factors 64,65 would greatly increase our understanding of the molecular factors groups was performed with the DESeq2 R package . The genes with a log fold change value of >1 and an FDR of <0.05 were considered to be that regulate the transcriptional programs in ESCC. The increasing 2 significantly differentially expressed. knowledge on how epigenetic modifications such as loss of grand H3K27me3 domains influence patterns of gene expression in cancer holds great promises for understanding biological pro- WGBS and data analysis cesses, as well as for their use in prognosis, patient stratifications, DNA was extracted using Qiagen DNeasy & Blood Tissue Kit. Fifty and therapeutic intervention. nanograms of genomic DNA was used for library preparation. DNA was bisulfite converted with the EZ DNA Methylation Gold Kit (Zymo Research) for use with the TruSeq library preparation methods. A total METHODS of 150 base paired-end reads were sequenced using the Illumina HiSeq fi Cell cultures and siRNA transfection X-Ten platform. All of the WGBS reads were rst processed using Skewer (v0.2.2)66 to trim adapter and low-quality reads. Adapter-trimmed reads The human ESCC cell lines KYSE450 and KYSE510 were provided by Dr. were mapped to the human genome version hg38 using Bismark Yutaka Shimada (Kyoto University, Kyoto, Japan). The immortalized normal v0.20.167 with default parameter values. Methylation BigWig file was esophageal epithelium NE2 cells were a gift from Dr. Enmin Li (Medical generated by iBSTools v1.3.029. College of Shantou University, Guangdong, China). HEK293T cells were purchased from the ATCC. The ESCC cells were cultured in RPMI-1640 medium with 10% FBS. NE2 cells were cultured in in a 1:1 mixture of Hi-C data analysis fi de ned K-SFM (10744-019, Gibco) and EpiLife medium (MEP1500CA, We accessed publicly available human IMR90 cell line sequencing reads Gibco). HEK293T cells were cultured in DMEM with 10% FBS. All cells were in FASTQ format from the ENCODE databases, the reads were mapped to cultured in cell incubator with 5% CO2 at 37 °C. All of the cells were the human reference genome version hg38 using Juicer68 with default authenticated by short tandem repeat analysis and regularly tested for parameter values. TADs and long-range chromatin interactions which mycoplasma contamination. siRNAs were purchased from Dharmacon and contain the genomic location that has been mapped to the human were transfected following the standard protocol of Lipofectamine 2000 Transfection Reagent. siRNAs used in this study are listed in Supplementary genome version hg19. We converted these data from the hg19 to the Information, Supplementary Table 7. hg38 version of the human genome using the tool liftOver (https:// genome.ucsc.edu/util.html). The Hi-C maps of regions were visualized using Juicebox v1.11.0869. ChIP-seq and ChIP-chip data analysis pipeline The ChIP experiments were performed following the manufacturer’s protocol (#9003, CST). Briefly, 4 × 106 cells were seed in 10 cm dishes and Functional enrichment analysis 70 cultured overnight. Cells were cross-linked with 1% formaldehyde at RT for We used Metascape for pathway analysis. Pathway with a P-value 10 min, followed by the addition of glycine to terminate cross-linking (accumulative hypergeometric distribution) smaller than 0.01, and an reaction. After washing with cold PBS, cells were harvested into 1.5 mL EP enrichment factor > 1.5 (the enrichment factor is the ratio between the tubes and fragmented by micrococcal nuclease and ultrasonic wave. The observed counts and the counts expected by chance) are shown. GSEA cleared chromatin samples were incubated with antibody overnight at was conducted through the Bioconductor package “clusterProfiler”71. 4 °C, and then incubated with magnetic beads for 2 h. Beads bound with chromatin were then extensively washed and eluted. Eluted chromatin was digested at 65 °C by proteinase K and DNA was purified using columns Motif analysis provided in this kit. The purified ChIP DNA samples and their input DNA To detect enriched sequence motifs in H3K27ac, H3K4me3, and H3K4me1 were diluted for library construction using UltraTM II DNA Library Prep Kit peaks associated with altered super silencers, we performed motif (#E7645, NEB). The libraries were sequenced by Illumina HiSeq X Ten analysis using HOMER72 with default parameters except size given and Sequencer. ChIP-seq raw reads were aligned to the human reference mask. The ranks of the most highly enriched TFs are presented according genome sequence version hg38 using Bowtie2 (version 2.3.4)56. For IMR90 to the Q-value. cell datasets downloaded in BAM format from the ENCODE databases (https://www.encodeproject.org/), the reads were mapped to the human reference genome version hg38. Significant H3K27me3 peaks were called Survival analysis by using SICER257 with default parameter values, while H3K27ac, Kaplan–Meier survival curves and log-rank tests were used to compare the H3K4me3, and H3K4me1 were analyzed with all default parameters except survival distribution between the high-expression and low-expression -w 200 and -g 400. H3K27me3 peaks were merged within a distance of groups through the platform DrBioRight73. The survival difference between 2 kb. Identified peaks were annotated by intersection with the reference two groups with or without mutations was visualized using TCGA (https:// using BEDTools v2.25.058. We then used the deepTools59 bamCompare portal.gdc.cancer.gov).

Published in partnership with CEGMR, King Abdulaziz University npj Genomic Medicine (2021) 65 J. Yuan et al. 10 Cell proliferation assay 8. Lane, A. A. et al. Triplication of a 21q22 region contributes to B cell transformation Cell proliferation was measured using CCK-8 reagent at days 0, 2, and 4. through HMGN1 overexpression and loss of histone H3 Lys27 trimethylation. Nat. Specifically, CCK-8 reagent (Dojindo) was added to the cell culture medium Genet. 46, 618–623 (2014). at a ratio of 1:10, and the absorbance was measured at 450 nm after 9.Comet,I.,Riising,E.M.,Leblanc,B.&Helin,K.Maintainingcellidentity:PRC2- incubation for 1 h at 37 °C. mediated regulation of transcription and cancer. Nat. Rev. Cancer 16, 803–810 (2016). 10. Parker, S. et al. Chromatin stretch enhancer states drive cell-specific gene reg- Plasmid construction, lentivirus package, and stable cell line ulation and harbor human disease risk variants. Proc. Natl. Acad. Sci. USA 110, generation 17921–17926 (2013). shRNA sequences were cloned into pSIH1-puro vector (Addgene). The 11. Hnisz, D. et al. Super-enhancers in the control of cell identity and disease. Cell shRNA sequences targeting TBX20 were as follows: sh1, ACAACAAGAGG- 155, 934–947 (2013). TACCGCTA and sh2, TCAAACAGATGGTGTCTTT. Lentivirus was produced 12. Lovén, J. et al. Selective inhibition of tumor oncogenes by disruption of super- using HEK293T cells with the second-generation packaging system psPAX2 enhancers. Cell 153, 320–334 (2013). (#12260, Addgene) and pMD2.G plasmid (#12259, Addgene). KYSE510 cells 13. Whyte, W. A. et al. Master transcription factors and mediator establish super- were infected with lentivirus in medium supplemented with 8 mg/mL enhancers at key cell identity genes. Cell 153, 307–319 (2013). polybrene (Sigma-Aldrich) for 48 h and were then selected with 1 mg/mL 14. Benayoun, B. A. et al. H3K4me3 breadth is linked to cell identity and transcrip- puromycin (Sangon Biotech) for 2 weeks. tional consistency. Cell 158, 673–688 (2014). 15. Chen, K. et al. Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor-suppressor genes. Nat. Genet. 47,1149(2015). Animal experiments 16. Zhao, D. et al. Broad genic repression domains signify enhanced silencing of Balb/c-nude mice were purchased from Charles River (USA). All animal oncogenes. Nat. Commun. 11, 5560 (2020). protocols were approved by the Animal Care and Use Committee of the 17. Cai, Y. et al. H3K27me3-rich genomic regions can function as silencers to repress Chinese Academy of Medical Sciences Cancer Hospital (Beijing, China). For gene expression via chromatin interactions. Nat. Commun. 12,1–22 (2021). 6 subcutaneous xenograft, 5 × 10 ESCC cells were injected subcutaneously 18. Pennathur, A., Gibson, M. K., Jobe, B. A. & Luketich, J. D. Oesophageal carcinoma. into 6-week-old female BALB/c-nude mice. Tumor growth was monitored Lancet 381, 400–412 (2013). weekly by caliper measurements and calculated individually as: tumor 19. Siegel, R. L., Miller, K. D. & Jemal, A. Cancer statistics, 2016. CA Cancer J. Clin. 66, 2 volume = a × b /2 (a represents tumor length and b represents tumor 7–30 (2016). width). After 5 weeks, tumors were harvested. Tumor weights were 20. Enzinger, P. C. & Mayer, R. J. Esophageal cancer. N. Engl. J. Med. 349, 2241–2252 measured and analyzed by Wilcoxon rank sum test. (2003). 21. Network, C. G. A. R. Integrated genomic characterization of oesophageal carci- Reporting summary noma. Nature 541, 169 (2017). 22. Hilkens, J. et al. RSPO3 expands intestinal stem cell and niche compartments and Further information on research design is available in the Nature Research drives tumorigenesis. Gut 66, 1095–1105 (2017). Reporting Summary linked to this article. 23. Young, M. D. et al. ChIP-seq analysis reveals distinct H3K27me3 profiles that correlate with transcriptional activity. Nucleic Acids Res. 39, 7415–7427 (2011). DATA AVAILABILITY 24. X, Z. et al. Large DNA methylation nadirs anchor chromatin loops maintaining hematopoietic stem cell identity. Mol. Cell 78, 506–521 (2020). The sequence data obtained in this study have been submitted to the NCBI Sequence 25. McLaughlin, K. et al. DNA methylation directs Polycomb-dependent 3D Read Archive (SRA) under SRA accession number PRJNA679744. The accession genome re-organization in naive pluripotency. Cell Rep. 29, 1974–1985 numbers for raw data from public database or the source of processed dataset are (2019). indicated in the Supplementary Data (Supplementary Tables 8–10). Human reference 26. Rhodes, J. et al. Cohesin disrupts Polycomb-dependent chromosome interactions genome sequence version hg38 was downloaded from the UCSC Genome Browser in embryonic stem cells. Cell Rep. 30, 820–835 (2020). website (https://genome.ucsc.edu). All the data used are publicly available at https:// 27. Alberti, S. Phase separation in biology. Curr. Biol. 27, R1097–R1102 (2017). github.com/JiangQi1996/H3K27me3. The datasets used and analyzed during the 28. Strom, A. et al. Phase separation drives heterochromatin domain formation. current study are available from the corresponding author on reasonable request. Nature 547, 241–245 (2017). 29. Su, J. et al. Homeobox oncogene activation by pan-cancer DNA hypermethyla- tion. Genome Biol. 19, 108 (2018). CODE AVAILABILITY 30. Roy, N. & Hebrok, M. Regulation of cellular identity in cancer. Dev. Cell 35, All the codes used are publicly available at https://github.com/JiangQi1996/ 674–684 (2015). H3K27me3. 31. Gao, Y.-B. et al. Genetic landscape of esophageal squamous cell carcinoma. Nat. Genet. 46, 1097 (2014). Received: 7 May 2021; Accepted: 23 July 2021; 32. Hao, J.-J. et al. Spatial intratumoral heterogeneity and temporal clonal evolution in esophageal squamous cell carcinoma. Nat. Genet. 48, 1500 (2016). 33. Lin, D.-C. et al. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nat. Genet. 46, 467 (2014). 34. Qin, H.-D. et al. Genomic characterization of esophageal squamous cell carcinoma reveals critical genes underlying tumorigenesis and poor prognosis. Am. J. Hum. REFERENCES Genet. 98, 709–727 (2016). 1. Jambhekar, A., Dhall, A. & Shi, Y. Roles and regulation of histone methylation in 35. Sawada, G. et al. Genomic landscape of esophageal squamous cell carcinoma in a animal development. Nat. Rev. Mol. Cell Biol. 20, 625–641 (2019). Japanese population. Gastroenterology 150, 1171–1182 (2016). 2. Dawson, M. & Kouzarides, T. Cancer epigenetics: from mechanism to therapy. Cell 36. Song, Y. et al. Identification of genomic alterations in oesophageal squamous cell 150,12–27 (2012). cancer. Nature 509, 91 (2014). 3. Michalak, E. M., Burr, M. L., Bannister, A. J. & Dawson, M. A. The roles of DNA, RNA 37. Zhang, L. et al. Genomic analyses reveal mutational signatures and frequently and histone methylation in ageing and cancer. Nat. Rev. Mol. Cell Biol. 20, altered genes in esophageal squamous cell carcinoma. Am. J. Hum. Genet. 96, 573–589 (2019). 597–611 (2015). 4. Laugesen, A., Højfeldt, J. W. & Helin, K. Molecular mechanisms directing PRC2 38. Hu, N. Genomic landscape of somatic alterations in esophageal squamous cell recruitment and H3K27 methylation. Mol. Cell 74,8–18 (2019). carcinoma and gastric cancer. Cancer Res. 76, 1714–1723 (2016). 5. Béguelin, W. et al. EZH2 is required for germinal center formation and somatic 39. Chang, J. et al. Genomic analysis of oesophageal squamous-cell carcinoma EZH2 mutations promote lymphoid transformation. Cancer Cell 23, 677–692 identifies alcohol drinking-related mutation signature and genomic alterations. (2013). Nat. Commun. 8, 15290 (2017). 6. Wilson, B. G. et al. Epigenetic antagonism between Polycomb and SWI/SNF 40. Cheng, C. et al. Whole-genome sequencing reveals diverse models of structural complexes during oncogenic transformation. Cancer Cell 18, 316–328 (2010). variations in esophageal squamous cell carcinoma. Am. J. Hum. Genet. 98, 7. Abdel-Wahab, O. et al. ASXL1 mutations promote myeloid transformation 256–274 (2016). through loss of PRC2-mediated gene repression. Cancer Cell 22,180–193 41.Tokugawa,T.,Sugihara,H.,Tani,T.&Hattori,T.Modesofsilencingofp16indevel- (2012). opment of esophageal squamous cell carcinoma. Cancer Res. 62, 4938–4944 (2002).

npj Genomic Medicine (2021) 65 Published in partnership with CEGMR, King Abdulaziz University J. Yuan et al. 11 42. Baba, Y. et al. LINE-1 hypomethylation, DNA copy number alterations, and CDK6 69. Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for amplification in esophageal squamous cell carcinoma. Clin. Cancer Res. 20, Hi-C data. Cell Syst. 6, 256–258.e251 (2018). 1114–1124 (2014). 70. Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis 43. He, L. et al. Prognostic impact of H3K27me3 expression on locoregional pro- of systems-level datasets. Nat. Commun. 10,1–10 (2019). gression after chemoradiotherapy in esophageal squamous cell carcinoma. BMC 71. Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing Cancer 9, 461 (2009). biological themes among gene clusters. OMICS 16, 284–287 (2012). 44. Jiang, Y.-Y. et al. Targeting super-enhancer-associated oncogenes in oesophageal 72. Heinz, S. et al. Simple combinations of lineage-determining transcription factors squamous cell carcinoma. Gut 66, 1358–1368 (2017). prime cis-regulatory elements required for macrophage and B cell identities. Mol. 45. Herman, J. G. & Baylin, S. B. Gene silencing in cancer in association with promoter Cell 38, 576–589 (2010). hypermethylation. N. Engl. J. Med. 349, 2042–2054 (2003). 73. Li, J., Chen, H., Wang, Y., Chen, M.-J. M. & Liang, H. Next-generation analytics for 46. Bergman, Y. & Cedar, H. DNA methylation dynamics in health and disease. Nat. omics data. Cancer Cell 39,3–6 (2020). Struct. Mol. Biol. 20, 274–281 (2013). 47. Stefanska, B. et al. Definition of the landscape of promoter DNA hypomethylation in liver cancer. Cancer Res. 71, 5891–5903 (2011). ACKNOWLEDGEMENTS 48. Bert, S. et al. Regional activation of the cancer genome by long-range epigenetic The authors thank the Geneseeq Technology Inc. for sequencing services. This work was – remodeling. Cancer Cell 23,9 22 (2013). supported by the National Natural Science Foundation of China (61871294), Zhejiang 49. Lister, R. et al. Human DNA methylomes at base resolution show widespread Provincial Natural Science Foundation of China (LR19C060001), and the Scientific – epigenomic differences. Nature 462, 315 322 (2009). Research Foundation for Talents of Wenzhou Medical University (QTJ18023) to J.S. 50. Neri, F. et al. Intragenic DNA methylation prevents spurious transcription initia- tion. Nature 543,72–77 (2017). 51. Eferl, R. & Wagner, E. F. AP-1: a double-edged sword in tumorigenesis. Nat. Rev. AUTHOR CONTRIBUTIONS Cancer 3, 859–868 (2003). 52. Britton, E. et al. Open chromatin profiling identifies AP1 as a transcriptional J.S., Z.L. and H.C. conceived and designed the study; J.Y., Q.J., X.W. and D.F. performed regulator in oesophageal adenocarcinoma. PLoS Genet. 13, e1006879 (2017). the data analysis and statistical analyses; J.Y., J.S. and H.C. drafted the manuscript; and 53. Ohashi, S. et al. Recent advances from basic and clinical studies of esophageal T.G., X.Z. and Y.Q. conducted the experiments. squamous cell carcinoma. Gastroenterology 149, 1700–1715 (2015). 54. Qin, Q. et al. Small-molecule survivin inhibitor YM155 enhances radio- sensitization in esophageal squamous cell carcinoma by the abrogation of G2 COMPETING INTERESTS checkpoint and suppression of homologous recombination repair. J. Hematol. The authors declare no competing interests. Oncol. 7, 62 (2014). 55. Yang, Q. et al. NRAGE is involved in homologous recombination repair to resist the DNA-damaging chemotherapy and composes a ternary complex with RNF8- ADDITIONAL INFORMATION BARD1 to promote cell survival in squamous esophageal tumorigenesis. Cell Supplementary information The online version contains supplementary material – Death Differ. 23, 1406 1416 (2016). available at https://doi.org/10.1038/s41525-021-00232-6. 56. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357 (2012). Correspondence and requests for materials should be addressed to Hongyan Chen, fi 57. Zang, C. et al. A clustering approach for identi cation of enriched domains Zhihua Liu or Jianzhong Su. from histone modification ChIP-Seq data. Bioinformatics 25, 1952–1958 (2009). Reprints and permission information is available at http://www.nature.com/ fl 58. Quinlan, A. R. & Hall, I. M. BEDTools: a exible suite of utilities for comparing reprints genomic features. Bioinformatics 26, 841–842 (2010). 59. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims – data analysis. Nucleic Acids Res. 44, W160 W165 (2016). in published maps and institutional affiliations. 60. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29,15–21 (2013). 61. Anders, S., Pyl, P. T. & Huber, W. HTSeq—a Python framework to work with high- – throughput sequencing data. Bioinformatics 31, 166 169 (2015). Open Access This article is licensed under a Creative Commons fi 62. Li, B. & Dewey, C. N. RSEM: accurate transcript quanti cation from RNA-Seq data Attribution 4.0 International License, which permits use, sharing, with or without a reference genome. BMC Bioinform. 12, 323 (2011). adaptation, distribution and reproduction in any medium or format, as long as you give – 63. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29,24 26 (2011). appropriate credit to the original author(s) and the source, provide a link to the Creative 64. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and Commons license, and indicate if changes were made. The images or other third party dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). material in this article are included in the article’s Creative Commons license, unless 65. Anders, S. & Huber, W. Differential expression analysis for sequence count data. indicated otherwise in a credit line to the material. If material is not included in the Nat. Preced. 11, R106 (2010). article’s Creative Commons license and your intended use is not permitted by statutory 66. Jiang, H., Lei, R., Ding, S.-W. & Zhu, S. Skewer: a fast and accurate adapter trimmer regulation or exceeds the permitted use, you will need to obtain permission directly for next-generation sequencing paired-end reads. BMC Bioinform. 15, 182 (2014). from the copyright holder. To view a copy of this license, visit http://creativecommons. fl 67. Krueger, F. & Andrews, S. R. Bismark: a exible aligner and methylation caller for org/licenses/by/4.0/. Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011). 68. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop- resolution Hi-C experiments. Cell Syst. 3,95–98 (2016). © The Author(s) 2021, corrected publication 2021

Published in partnership with CEGMR, King Abdulaziz University npj Genomic Medicine (2021) 65