Intragenic CpG islands play important roles in bivalent chromatin assembly of developmental genes

Sun-Min Leea, Jungwoo Leeb, Kyung-Min Nohc, Won-Young Choib, Sejin Jeond, Goo Taeg Ohd, Jeongsil Kim-Hae, Yoonhee Jinf, Seung-Woo Chof, and Young-Joon Kima,b,1

aDepartment of Biochemistry, College of Life Science and Technology, Yonsei University, Seoul 03722, Republic of Korea; bDepartment of Integrated Omics for Biomedical Science, Graduate School, Yonsei University, Seoul 03722, Republic of Korea; cGenome Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany; dDepartment of Life Sciences, Ewha Womans University, Seoul 03760, Republic of Korea; eDepartment of Integrative Bioscience and Biotechnology, College of Life Sciences, Sejong University, Seoul 05006, Republic of Korea; and fDepartment of Biotechnology, College of Life Science and Technology, Yonsei University, Seoul 03722, Republic of Korea

Edited by Roger D. Kornberg, Stanford University School of Medicine, Stanford, CA, and approved January 31, 2017 (received for review August 9, 2016)

CpG, 5′-C-phosphate-G-3′, islands (CGIs) have long been known for their association with enhancers, silencers, and promoters, and for their epigenetic signatures. They are maintained in embryonic stem cells (ESCs) in a poised but inactive state via the formation of bivalent chromatin containing both active and repressive marks. CGIs also occur within coding sequences, where their functional role has remained obscure. Intragenic CGIs (iCGIs) are largely absent from housekeeping genes, but they are found in all genes associated with organ development and cell lineage control. In this paper, we investigated the epigenetic status of iCGIs and found that they too reside in bivalent chromatin in ESCs. Cell type-specific DNA methylation of iCGIs in differentiated cells was linked to the loss of both the H3K4me3 and H3K27me3 marks, and disruption of physical interaction with promoter regions, resulting in transcriptional activation of key regulators of differentiation such as PAXs, HOXs, and WNTs. The differential epigenetic modification of iCGIs appears to be mediated by cell type-specific transcription factors distinct from those bound at promoters, and these transcription factors may be involved in the hypermethylation of iCGIs upon cell differentiation. iCGIs thus play a key role in the cell type-specific regulation of transcription.

intragenic CpG islands | DNA methylation | bivalent chromatin | embryonic stem cell | differentiation

Significance

The decision-making process of cellular phenotype specification is controlled by the interplay between genetic and epigenetic elements. Intragenic CGIs (iCGIs) associated with developmental regulators have sequence features that favor DNA methylation and bivalent histone modification, i.e., both activating histone H3 lysine 4 trimethylation and repressing H3K27me3 marks. The epigenetic transition from bivalent modification to DNA methylation on iCGIs during differentiation results in cell type-specific activation of their associated genes. This process is accompanied by loss of physical interactions with promoter regions, and the motifs of developmental regulators are enriched at iCGIs, indicating involvement of these regulators in the epigenetic transition. Our work uncovers the role of iCGIs in cell type-specific differentiation.

Author contributions: S.-M.L. and Y.-J.K. designed research; S.-M.L., J.L., K.-M.N., S.J., and G.T.O. performed research; K.-M.N., Y.J., and S.-W.C. contributed new reagents/analytic tools; S.-M.L., J.L., and W.-Y.C. analyzed data; and S.-M.L., J.K.-H., and Y.-J.K. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
1To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1613300114/-/DCSupplemental.

To provide a common starting ground for the diverse epigenetic landscapes of differentiated cells, embryonic stem cells (ESCs) have a relatively open chromatin structure (1). At the same time, many promoters of developmentally regulated genes are in an intermediate condition, poised for use in specific cell lineages or for inactivation. These promoters are often marked both with activating [trimethylation of histone H3 lysine 4 (H3K4me3)] and repressing (H3K27me3) epigenetic modifications (2). Such bivalent histone modifications facilitate conversion to active or repressive states by further modification upon differentiation (2). However, the key drivers of lineage-specific histone modifications, in particular those of stem cells, are poorly understood.

Along with histone modifications, DNA methylation is a key regulator of gene transcription in the mammalian life cycle (3). Approximately 70% of the total CpG, 5′-C-phosphate-G-3′, sites in the mammalian genome are methylated, but CpG sites in crowded contexts, such as CpG islands (CGIs), are frequently unmethylated (4). These unmethylated CGIs are highly enriched at promoters, in particular at the promoters of housekeeping genes (4, 5), and serve as major binding sites for activating histone modifiers. SETD1 (SET domain-containing 1) and MLL/KMT2A [mixed-lineage leukemia or lysine (K)-specific methyltransferase 2A] proteins create active H3K4me3 at CGIs, whereas KDM2A [lysine (K)-specific demethylase 2A] and TET1 (tet methylcytosine dioxygenase 1) proteins protect promoters from the spread of gene body-associated epigenetic patterns by removing H3K36me1/2 and DNA methylation, respectively (6). A number of unmethylated CGIs are also highly enriched for the repressive Polycomb-repressive complex 2 (PRC2) (7) and are prone to bivalent modification by the coaccumulation of MLL and PRC2 complexes (8). CGIs thus provide integration sites for diverse regulatory signals mediated by DNA and histone modifications.

CGIs show distinct epigenetic modification patterns depending on their locations, whether promoter-associated, intragenic, or intergenic. Promoter-associated CGIs (pCGIs) are usually marked with H3K4me3, whereas nonpromoter-associated CGIs are frequently methylated (9). The modifications often correlate with expression levels of the associated genes (10). Methylation of pCGIs is regarded as a repressive signature because some methylated CGIs recruit the H3K9 methyltransferase to form repressive chromatin (11), but the intragenic CGIs (iCGIs) of actively expressed genes are highly DNA methylated, pointing to a positive role of methylation in this context (12).

Some iCGIs are known to act as orphan promoters, and cell type-specific DNA methylation on iCGIs can repress alternative transcription from iCGIs in a manner that inversely correlates with H3K4me3 levels (9). The effects of iCGI hypermethylation on expression of the associated genes are still unclear. Deaton et al. showed that the cell type-specific DNA methylation of iCGIs tends to correlate with the silencing of the associated genes in the immune system (13). On the other hand, Lee et al. have observed repression induced by iCGI hypomethylation in cancerous cells, indicating a positive correlation between methylation and gene expression (14). Therefore, neither the regulatory roles of iCGIs nor the cause of modification changes are currently well understood.

www.pnas.org/cgi/doi/10.1073/pnas.1613300114 | PNAS Early Edition

To elucidate the epigenetic signatures required for the maintenance of ESCs and their differentiation into specific cell types, we reanalyzed publicly available epigenetic modification data (the analyzed databases are listed in Dataset S1) and validated the results experimentally with special emphasis on the roles of iCGIs. We observe that iCGIs serve as key platforms for the production of bivalent chromatin at sites in important developmental genes. The hypermethylation of iCGIs linked to the simultaneous loss of both H3K4me3 and H3K27me3 triggers transcriptional activation of the associated gene by releasing repressive interactions with the promoter. We found that iCGIs are highly enriched for binding sites for development-specific transcription factors that enable cell type-specific epigenetic modification of iCGIs. We describe a previously unidentified regulatory mechanism controlling differentiation that is mediated by the epigenetic modification of iCGIs.

Results

iCGIs Show a Distinct Sequence Signature. To understand the functional differences between CGIs within promoters (pCGIs) and within genes (iCGIs), we examined their epigenetic modification patterns using human ESC data (Gene Expression Omnibus: GSE16256). A 2D density plot of CGIs and their associated modifications, DNA methylation and H3K4me3, revealed that these modifications are mutually exclusive (Fig. 1A). We grouped all CGIs into two types according to whether they had high levels of H3K4me3 modification or DNA methylation, using k-means clustering (CGIm, methylated, or CGIum, unmethylated, H3K4me3-enriched) (SI Appendix, Fig. S1A). pCGIs were primarily H3K4me3-modified, whereas most iCGIs were primarily cytosine-methylated (Fig. 1A; SI Appendix, Fig. S1 B and C; and Dataset S2). Although this differential modification of CGIs is not well understood, H3K4 methyltransferase preferentially binds to 5′-CGG-3′ (hereafter CGG) (15), which indicates sequence-related CGI modifications. Examination of CGG/C and CGA/T frequencies in pCGIs and iCGIs revealed that most iCGIs had low CGG/C frequencies, whereas pCGIs had high CGG/C frequencies (Fig. 1B). When we examined the relationship between the epigenetic modification and the CGG/C ratio [the (CGG/C):(CGA/T) ratio] for each CGI, the H3K4me3 modification preferentially occurred in CGIs with high CGG/C ratios, whereas DNA methylation preferentially occurred in CGIs with low CGG/C ratios (<1.6) (Fig. 1C). By contrast, H3K27me3 modifications gradually increased as CGG/C ratios decreased, but then decreased as DNA methylation became more dominant. Therefore, the distinct modification patterns of CGIs appear to be a result of their sequences.

Mouse ESCs (GSE30202) contain fewer CGIs (SI Appendix, Fig. S1D) and have a higher preference for the H3K4me3 modification than human ESCs (SI Appendix, Fig. S1 A and E and Dataset S2) (16). Nevertheless, the overall trend of sequence-related CGI methylation appears to be evolutionarily conserved (SI Appendix, Fig. S1F). CGI modification patterns remained relatively constant as CGG/C ratios decreased, but H3K4me3 levels dropped

[Figure 1 image not reproduced. Panels A–F: DNA methylation versus H3K4me3 at pCGIs and iCGIs in hESCs (A); CGT/A versus CGG/C percentages at pCGIs and iCGIs (B); enrichment profiles along CGIs ranked by (CGG/C)/(CGT/A) ratio in hESCs (C) and mESCs (D and E); and (CGG/C)/(CGT/A) ratio versus CGI length (bp) for pCGIs and iCGIs, colored by DNA methylation (F).]

Fig. 1. Sequence features of iCGIs favor H3K27me3 and DNA methylation. (A) Density scatter plot showing the distribution patterns of H3K4me3 [x axis, log2(RPKM+1)] and DNA methylation levels (y axis, methylation rate from 0 to 1) at pCGIs and iCGIs in human H1 ESCs. Density is color-coded from low (blue) to high (red) as shown in the reference color triangle. (B) Density scatter plots showing the distributions of pCGIs and iCGIs based on their sequence features (x axis, percentile ranks of each CGI’s CGG/C percentage; y axis, CGT/A percentage). (C) Enrichments of H3K4me3 (red line, left y axis), H3K27me3 (blue line, right y axis), and DNA methylation (black line, far-right y axis) on CGIs in human ESCs. CGIs were arranged from left to right by CGG/C to CGT/A ratio. (D) Enrichments of H3K4me3 (red line, left y axis), H3K27me3 (blue line, right y axis), and DNA methylation (black line, far-right y axis) on CGIs in mESCs. CGIs were arranged from left to right by CGG/C to CGT/A ratio. (E) Enrichments of CXXC1 (pink line), MLL2 (green line), SUZ12 (blue line), DNMT3A (black line), and DNMT3B (gray line) on CGIs in mouse ESCs. CGIs were arranged from left to right by CGG/C to CGT/A ratio. (F) Scatter plots showing the distributions of length (x axis) and ratio of (CGG/C) to (CGT/A) (y axis) of pCGIs (Upper) and iCGIs (Lower). The color of each dot represents the extent of DNA methylation of the CGI in hESCs [blue (0), 0%; red (1), 100%] as shown in the color bars to the right of each panel.
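The (CGG/C):(CGT/A) ratio used to order CGIs in panels C through F can be illustrated with a minimal Python sketch. The exact counting rule is not spelled out in this section, so the sketch assumes that "CGG/C" counts CpG dinucleotides followed by G or C and that "CGT/A" counts CpGs followed by T or A, on the given strand only. The function name and the toy sequences are hypothetical.

```python
from typing import Optional


def cgg_c_to_cgt_a_ratio(seq: str) -> Optional[float]:
    """Ratio of CpG+G/C trinucleotides to CpG+T/A trinucleotides.

    Assumption (not stated in the paper text): only the given strand is
    scanned, and overlapping occurrences are each counted once.
    Returns None when no CGT/CGA trinucleotide is present.
    """
    seq = seq.upper()
    gc_after = 0  # CGG or CGC
    at_after = 0  # CGT or CGA
    for i in range(len(seq) - 2):
        if seq[i:i + 2] == "CG":
            nxt = seq[i + 2]
            if nxt in "GC":
                gc_after += 1
            elif nxt in "AT":
                at_after += 1
    if at_after == 0:
        return None
    return gc_after / at_after


# Toy sequences, not real CpG islands.
print(cgg_c_to_cgt_a_ratio("CGGCGCCGACGT"))
print(cgg_c_to_cgt_a_ratio("CGGCGGCGA"))
```

Under this counting rule, a CGI falls on the high-ratio side when CpGs are mostly followed by G or C, which is the side the text associates with H3K4me3.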

dramatically at CGG/C ratios below 1.1, whereas H3K27me3 and DNA methylation levels increased sharply (Fig. 1D).

In agreement with these modification patterns, binding of DNA methyltransferases (DNMTs) 3A and B to CGIs gradually increased as CGG/C ratios decreased, and binding of SUZ12 (a PRC2 component) dramatically increased. As DNA methylation levels increased, binding of CXXC1 (a nonmethylated CpG binding protein) to CGIs with low CGG/C ratios decreased substantially (Fig. 1E). As shown with H3K4me3, DNA methylation and the H3K27me3 modification did not occur at the same iCGIs (SI Appendix, Fig. S1G). Thus, iCGIs with low CGG/C ratios tend to be enriched for methylation. Nevertheless, a small portion of iCGIs appeared to be histone-modified (H3K4me3 and H3K27me3). Indeed, the H3K4me3 levels were substantially lower in the iCGIs of Mll2 mutant mouse ESCs than in those of the wild type (SI Appendix, Fig. S1H). Therefore, distinct binding preferences of epigenetic modifiers for iCGIs appear to influence the epigenetic modifications of the iCGIs and the expression of the associated genes. However, additional factors also appear to influence CGI modifications; iCGIs with sequence features similar to those of pCGIs were more biased toward DNA methylation than the pCGIs (Fig. 1F). Thus, the regulatory mechanism of iCGIs and the correlation between their DNA methylation levels and the expression levels of their associated genes remain puzzling.

Unmethylated iCGIs Are Connected with Bivalent Chromatin in ESCs and Are Associated with Key Developmental Regulators. iCGIs were found in about 15% of genes (SI Appendix, Fig. S2A), and ontology analysis predicted that they probably have roles in the regulation of developmental genes (SI Appendix, Table S1). iCGIs were not found in most housekeeping genes; instead, they were highly enriched in genes involved in coordinating developmental processes, such as the Hedgehog-, Notch-, and Wnt-signaling pathways (SI Appendix, Fig. S2B). In particular, iCGIs were frequently found in genes encoding human transcription factors showing strong cell type-specific expression patterns (17) (SI Appendix, Fig. S2C). Indeed, all of the major transcription factors associated with specific organ development and lineage determination had iCGIs (SI Appendix, Table S2). Therefore, iCGIs may function as cis-acting elements controlling tissue-specific transcription.

To understand the functions of iCGIs, we examined the possibility that iCGIs correspond to some previously defined type of regulatory element, namely an enhancer, silencer, or alternative promoter. However, the high levels of H3K4me3 modification or DNA methylation patterns of either type of iCGI (iCGIm or iCGIum) were quite different from the well-known signatures of enhancers (SI Appendix, Fig. S2D) (18–20). In addition, there was no overlap between the iCGIs and previously defined enhancers and silencers (SI Appendix, Fig. S2E). iCGIsum were highly enriched for H3K4me3, which is associated with promoter function, and showed weak enrichment for the enhancer marker H3K4me1 (SI Appendix, Fig. S2D). We assessed the signal enrichment for capped analysis of gene expression (CAGE) experiments to test the role of iCGIsum as alternative promoters or active enhancers. CAGE-seq analysis of the RNAs containing or lacking poly(A) tail of human ESCs identified only 227 iCGIsum (13%) that were revealed to be alternative transcription start sites (TSSs) with CAGE signals (SI Appendix, Fig. S2F). These 227 iCGIsum were marked with H3K4me3, but showed weak enrichment of H3K4me1, which is different from enhancers, which are more enriched with H3K4me1 than H3K4me3 (SI Appendix, Fig. S2F). We also performed a luciferase assay to test whether iCGIs can stimulate gene expression with enhancer activity. We fused the mouse iCGIs to luciferase with the upstream SV40 promoter and evaluated the mouse iCGI activities in transient transfection experiments in NIH/3T3 mouse fibroblast cells (SI Appendix, Fig. S2G). Neither iCGIsum nor iCGIsm showed stimulating activity compared with the SV40 control (SI Appendix, Fig. S2H). Therefore, iCGIs do not seem to act as enhancers, and a substantial number of iCGIsum do not appear to be alternative TSSs. The regulatory roles of the remaining iCGIsum enriched for H3K4me3 remain to be elucidated.

To examine a possible link between iCGIs and other regulatory elements of the genes, we scanned the effects of iCGI modifications on the epigenetic landscapes of the associated genes. To this end, we compared diverse epigenetic modification patterns along the genic structure among the genes with DNA-methylated iCGIs (iCGIsm: red in Fig. 2A; group 3 genes in SI Appendix, Fig. S2A), unmethylated iCGIs (iCGIsum: blue, group 2 genes), or without iCGIs (shaded gray, group 1 genes) (Fig. 2A). Promoters of genes lacking iCGIs were accessible and were associated with high levels of H3K4me3, RNA polymerase II (pol II), CXXC1, and MLL2. The presence of iCGIsum within gene sequences was associated with extra-accessible regions and high H3K4me3 levels along with the accumulation of pol II, CXXC1, and MLL2 centered on the iCGIum. In particular, it was associated with a dramatic increase in H3K27me3 over the entire genic area including the promoter, and this increase was accompanied by strong binding of polycomb complex components at both the promoter and iCGIum, creating a strongly repressed chromatin structure over almost the entire gene. Indeed, the histone modification patterns of pCGIs were highly correlated with those of the associated iCGIs (SI Appendix, Fig. S3A), but the presence of iCGIum appeared to induce even higher levels of H3K27me3 at the associated promoters (SI Appendix, Fig. S3B). Consequently, bivalent iCGIum-containing genes had reduced levels of the transcriptional elongation mark, H3K79me2 (Fig. 2A), and lower expression levels (SI Appendix, Fig. S2A).

In contrast to bivalently modified iCGIsum, methylated iCGIsm did not affect the expression of associated genes, except that elevated H3K36me3 levels were associated with high levels of transcription of genic areas (Fig. 2A). Unlike methylated pCGIs, iCGIsm and silenced bivalent iCGIsum were not enriched for the repressive H3K9me3 modification. In agreement, iCGIm-associated promoters displayed active H3K4me3 modification patterns (SI Appendix, Fig. S3C). These iCGI-dependent epigenetic patterns were also observed in genes containing non-CGI promoters (SI Appendix, Fig. S3D) and in mice (SI Appendix, Fig. S3E), demonstrating the epigenetic nature of iCGIs and their regulatory potential.

Although each iCGI type has distinct biochemical characteristics, ∼30% of iCGI-containing genes have more than one type of CGI, which complicates the analysis of their regulatory effects. In general, the effects of iCGIs on gene expression appeared to be additive because the level of repression by H3K27me3 and the level of activation by H3K36me3 correlated with the numbers of bivalent iCGIsum and methylated iCGIsm found within a gene, respectively (Fig. 2B). When both types of iCGIs were present within a gene, an intermediary level of active and repressive histone modifications was found (Fig. 2C). These results indicate that gene expression is influenced by the overall modification patterns of the associated iCGIs.

We examined the correlation between epigenetic modification patterns of all iCGIs and the expression levels of their associated genes. We plotted iCGI modifications in human ESCs using 3D scatter plots (X: DNA methylation; Y: H3K4me3; Z: H3K27me3) with differential coloring for the percentiles of the associated gene expression (Fig. 2D and Dataset S3). When DNA methylation levels were high (iCGIm), iCGIs were devoid of histone modifications and showed high expression levels of the associated genes (Fig. 2 D and E). Conversely, iCGIs with low DNA methylation levels (iCGIum) were either H3K4me3-modified (798 H3K4me3 monovalent iCGIsum) with active expression or contained both active (H3K4me3) and repressive (H3K27me3) modification marks (758 bivalent iCGIsum)

Fig. 2. Epigenetic modification landscapes of iCGI-containing genes and their expression levels. (A) Average signal intensity of epigenetic factors indicated at the left of each panel along the length of CGI promoter-containing genes in human H1 ESCs. The MLL2, CXXC1, and RING1B data are from mouse ESCs. Genic regions [from −1 kb upstream of TSS to transcription end site (TxEnd)] are divided into four areas: promoter (gray zone), iCGI, and the regions between iCGI and promoter or TxEnd. Boundaries of promoters and iCGIs are marked with vertical dotted lines. CGI promoter-containing genes were divided into three groups: without iCGI (gray dotted lines), with iCGIum (blue lines), and with iCGIm (red lines). The y axis indicates the RPKM values for ChIP-seq signals or DNA methylation ratios obtained from bisulfite sequencing data. (B) Average signal intensity of H3K27me3 and H3K36me3 along the length of iCGI-containing genes in human H1 ESCs. H3K27me3 levels are represented according to the number of iCGIsum (Left: gray, 1 iCGIum; green, 2 iCGIsum; blue, ≥3 iCGIsum), and H3K36me3 levels are represented according to the number of iCGIsm (Right: gray, 1 iCGIm; orange, 2 iCGIsm; red, ≥3 iCGIsm). (C) Average signal intensity of H3K27me3 and H3K36me3 along the length of genes containing only iCGIum (blue line), only iCGIm (red line), or both (purple line) in human H1 ESCs (Left: H3K27me3; Right: H3K36me3). (D) A 3D scatter plot showing the levels of three epigenetic marks (H3K27me3, H3K4me3, and DNA methylation) at iCGIs in hESCs. Each spot is color-coded according to the percentile rank of the expression of the associated gene, and the relative ranks are shown in the color bars at the right of each panel (blue: 0; red: 1). Log2(RPKM+1) values are used for H3K4me3 and H3K27me3 ChIP-seq signals, and methylation ratios (0–1) obtained from bisulfite sequencing data are shown for DNA methylation. (E) The 2D scatter plots showing the levels of H3K4me3 (x axis) and H3K27me3 (y axis) on iCGIsum (Left) and iCGIsm (Right) in hESCs. Each spot is color-coded according to the percentile rank of the associated gene.
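The quantities plotted in panels D and E, log2(RPKM+1) signals and percentile ranks of gene expression, can be sketched in Python as follows. The labeling function at the end is only an illustration of the three iCGI states described in the text: the paper groups CGIs by k-means clustering, whereas the fixed cutoffs here (`meth_cut`, `mark_cut`) are hypothetical values chosen for the example, and all function names are assumptions.

```python
import math


def log2_rpkm(rpkm: float) -> float:
    """log2(RPKM + 1), the transform quoted in the caption."""
    return math.log2(rpkm + 1.0)


def percentile_ranks(values):
    """Map each value to a rank in [0, 1]; tied values share the mean rank."""
    n = len(values)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank / (n - 1)
        i = j + 1
    return ranks


def label_icgi(meth, h3k4me3_rpkm, h3k27me3_rpkm, meth_cut=0.5, mark_cut=1.0):
    """Toy labeling with hypothetical cutoffs (not the paper's k-means)."""
    if meth >= meth_cut:
        return "iCGIm"
    has_k4 = log2_rpkm(h3k4me3_rpkm) >= mark_cut
    has_k27 = log2_rpkm(h3k27me3_rpkm) >= mark_cut
    if has_k4 and has_k27:
        return "bivalent iCGIum"
    if has_k4:
        return "H3K4me3 monovalent iCGIum"
    return "other iCGIum"


# Made-up numbers for illustration only.
print(percentile_ranks([10, 30, 20]))
print(label_icgi(0.1, 3.0, 3.0))
```

Sharing the mean rank among ties keeps the ranks symmetric around 0.5, which matters for expression data where many silenced genes have identical zero counts.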

in silenced genes (Fig. 2 D and E). These CGI modification patterns were also observed in mouse ESCs (SI Appendix, Fig. S3G), indicating that iCGIs have alternative epigenetic modification states that correlate with the expression levels of the associated genes.

Epigenetic Changes on iCGIs Correlate with Developmental Gene Expression During Differentiation. Two distinct epigenetic states of iCGIs reflecting the expression levels of the associated genes suggest that alteration of their modification state could be reflected in gene expression. To examine this possibility during ESC differentiation, we compared human ESCs (hESCs) with differentiated cells, human neural progenitor cells (hNPCs, GSE16368) and fibroblasts (IMR90, GSE16256), for cell type-specific modification of iCGIs with respect to differential expression (Fig. 3A). In general, the correlation between modification patterns and expression levels observed in the ESCs was also seen in the differentiated cells. However, dramatic changes in iCGI modifications correlated with gene expression changes induced by differentiation. First, many iCGIs (both iCGIsum and iCGIsm) in the differentiated cells were silenced by higher levels of H3K27me3. Second, a substantial proportion of the bivalently modified iCGIsum was absent from the differentiated cells. Approximately one-third of bivalent iCGIsum were hypermethylated in differentiated cells, and this was associated with gene activation (SI Appendix, Fig. S4 A and B). Scatter plot analysis of differential modification levels of iCGIs revealed that hypermethylation was associated with transcriptional activation and a reduction in both H3K4me3 and H3K27me3 marks (Fig. 3B): 11% of iCGIsum had switched to iCGIsm (>0.4% of DNA methylation rate changes, hNPC-hESC) with transcriptional activation in differentiated hNPCs (SI Appendix, Fig. S4C). The genes containing the hypermethylated iCGIs in hNPCs have key roles in transcriptional regulation and brain development (SI Appendix, Fig. S4D). This transcriptional activation associated with hypermethylation of iCGIs is also observed in genes with methylated promoters (SI Appendix, Fig. S4 E–G).

The loss of H3K4me3 marks specifically occurred at iCGIs and not at other intragenic regions, but DNA methylation increased over entire intragenic regions (SI Appendix, Fig. S4H). Hypermethylation of these intragenic regions was observed until ∼1 kb away from the TSS (SI Appendix, Fig. S4I). H3K27me3 decreased at regions between TSSs and iCGIs to the same extent as at hypermethylated iCGIs (SI Appendix, Fig. S4H). This effect was substantial when iCGIs were located within 10 kb of the TSS, and moderate when iCGIs were located more than 10 kb away from the TSS (SI Appendix, Fig. S4J). Hypomethylation was often associated with loss of transcriptional activity and increased levels of H3K27me3 (Fig. 3B and SI Appendix, Fig. S4B). A comparison between mouse ESCs (mESCs) and mouse neural progenitor cells (mNPCs) revealed an even more striking conversion of bivalently modified iCGIsum into DNA-methylated iCGIsm in the differentiated cells (SI Appendix, Fig. S5 A and B). Therefore, these results suggest an intriguing possibility that iCGIs regulate gene expression by converting bivalent histone modification to DNA methylation at poised developmental genes or by gaining H3K27me3 marks to further silence genes, the expression of which is not required.

To examine how the DNA methylation changes at iCGIs affect gene expression during ESC differentiation, the epigenetic patterns influenced by hypermethylation of iCGIs were examined for the entire genic landscape in differentiated IMR90 and induced pluripotent stem cells (iPSCs) (GSE16256) derived from fibroblasts. As shown above, hypermethylation of iCGIs was accompanied by loss of bivalent histone marks from the iCGIs. The loss of H3K27me3 was not limited to the iCGIs but was observed over the entire genic area including the promoters (Fig. 3C). Conversely, promoters and genic regions gained substantial amounts of H3K4me3 and H3K27 acetylation (H3K27ace), along with H3K36me3 at gene bodies, resulting in higher transcriptional activity, as shown by high global run-on sequencing levels. In addition, pol II that was stalled at promoters and iCGIs was released to genic regions, increasing the elongation efficiency in differentiated IMR90 cells compared with that in ESCs. Consistent with this, reverse differentiation of fibroblasts to iPSCs was accompanied by a transition of the chromatin modification pattern in the opposite direction, along with the silencing of gene expression (Fig. 3C and SI Appendix, Fig. S5C). Therefore, these observations provide evidence that reversible changes of the chromatin structure of iCGIs play an important role in regulating developmental gene expression during ESC maintenance and differentiation.

Methylation of iCGIs Disrupts Promoter Interactions. The correlation between changes at iCGIs and the activity of their associated promoters raised the possibility of physical interactions between these two elements. Because iCGI function appears to be conserved in mammals, we relied on mESC chromatin interaction data (21) to test this idea. Chromatin interaction analysis of ESCs by paired-end-tag sequencing (ChIA-PET) of pol II-associated DNA revealed hot spots of chromatin interactions. As expected, unmethylated CGIs at promoters showed a high frequency of chromatin interactions (Fig. 4A). In addition, iCGIsum, as well as bivalently silenced promoters, had high levels of interaction. Interestingly, iCGIsum interacted most frequently with promoters (Fig. 4B and Dataset S4), even with their own promoters located more than 50 kb upstream of the iCGI (Fig. 4C). Sixty-four percent of bivalent iCGIsum and 79% of H3K4me3 monovalent iCGIsum physically interacted with other genomic regions (SI Appendix, Fig. S6A). However, these interactions were not seen at methylated CGIs, even at iCGIsm displaying high transcriptional activity (Fig. 4A and SI Appendix, Fig. S6B).

To test whether interactions at iCGIs are affected by the hypermethylation that occurs during ESC differentiation to neuronal cells, interaction frequencies of hypermethylated iCGIs and of iCGIsum in mNPCs were compared with their corresponding genes in mESCs. ChIA-PET analysis of mESCs and mNPCs revealed that hypermethylation of iCGIs during differentiation resulted in decreased interaction frequencies in differentiated cells, whereas the interaction frequencies of iCGIsum were unchanged (Fig. 4D). The ability of iCGIs to provide additional binding sites for PRC complexes and to mediate physical interaction with promoter regions may enable strong enrichments of H3K27me3 marks from TSSs to iCGIs, resulting in more stable silencing of key developmental genes in ESCs (Fig. 2A and SI Appendix, Figs. S3B and S6C). iCGI hypermethylation can disrupt physical interactions with promoter regions and prevent the binding of PRC complexes to iCGIs, and this may cause the elimination of H3K27me3 marks from TSSs to iCGIs (SI Appendix, Fig. S4 H and G). H3K4me3 monovalent iCGIs also interact with promoter regions in actively expressed genes (SI Appendix, Fig. S6A); however, why transcription starts only in promoter regions and not in iCGIs (except for the 13% CAGE+ iCGIsum in SI Appendix, Fig. S2F) remains to be elucidated.

To validate these findings experimentally, we examined the effect of demethylation on iCGI chromosomal interactions by chromosome conformation capture (3C) analysis of iCGI-containing genes in mouse ES14 cells with and without treatment with a DNMT inhibitor (22). Because all of the pCGIs tested were unmethylated, there was no change in the DNA methylation level upon 5-azacytidine treatment (SI Appendix, Fig. S6E). However, the iCGIm-containing genes Igsf3, Map3k1, and Whrn were markedly demethylated upon treatment with 5-azacytidine. 3C-quantitative real-time PCR (3C-qPCR) measurements showed that the Insr iCGIum had a high interaction frequency, whereas the interaction frequencies of the iCGIsm of Igsf3, Map3k1, and Whrn with their promoters were barely detectable (Fig. 4E). This correlated with a reduction of methylation at these

Lee et al. PNAS Early Edition | 5 of 10 Downloaded by guest on September 29, 2021

Fig. 3. Epigenetic modification changes of iCGIs during ESC differentiation. (A) The 3D scatter plots showing the levels of DNA methylation, H3K4me3, and H3K27me3 at iCGIs in hESCs, hNPCs, and IMR90. Each spot is color-coded according to the percentile rank of the expression of the associated gene, as shown in the vertical color bars (blue: 0; red: 1). Each axis represents DNA methylation rate or log2(RPKM+1) values for H3K4me3 and H3K27me3. (B) DNA methylation, H3K4me3, and H3K27me3 changes at 1,369 iCGIs associated with 651 genes, which were differentially expressed (percentile rank differences >0.3) in hNPCs compared with hESCs. Difference values of DNA methylation rate and log2(RPKM+1) for H3K4me3 and H3K27me3 of each iCGI are shown in 2D scatter plots (hNPCs-hESCs). Each spot is color-coded according to the percentile rank difference of the expression of their associated gene as shown in the vertical color bars (blue: −1; red: +1). (C) Heatmap display of various epigenetic marker enrichments for the genes activated by hypermethylation of the associated iCGIs in IMR90 cells. Sixty-six genes showing a more than 40% increase of iCGI methylation are shown. The top two rows show the levels of the indicated epigenetic marks along the relative locations of CGIs indicated in the leftmost panels. Genes are arranged from top to bottom according to the intragenic locations of the CGIs in their gene body. Red gradient indicates the RPKM values of the modifications (or DNA methylation rates) in H1 ESCs or IMR90 cells. The heatmaps in the bottom two rows show differential levels of the epigenetic marks in IMR90 compared with those in hESCs (IMR90-hESC) or differential levels of the epigenetic marks in iPSCs compared with those in IMR90 (iPSC-IMR90). Color scale in heatmap shows the differences as increased (red) and decreased (green).
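The percentile-rank criterion in the Fig. 3B legend (percentile rank differences >0.3 between hNPCs and hESCs) can be illustrated with a minimal sketch. The function names, the tie handling (tied genes share the mean rank), and the toy expression values are assumptions for illustration only; the legend does not specify how ranks were computed.

```python
def percentile_ranks(expr):
    """Map gene -> expression to gene -> percentile rank in [0, 1].

    Assumption: tied expression values share the mean of their ranks.
    """
    genes = sorted(expr, key=expr.get)
    n = len(genes)
    ranks = {}
    i = 0
    while i < n:
        j = i
        # Extend j over the run of genes tied with genes[i].
        while j + 1 < n and expr[genes[j + 1]] == expr[genes[i]]:
            j += 1
        mean_rank = (i + j) / 2
        for gene in genes[i:j + 1]:
            ranks[gene] = mean_rank / (n - 1) if n > 1 else 0.0
        i = j + 1
    return ranks


def differential_genes(expr_ref, expr_test, cutoff=0.3):
    """Return gene -> (test - ref) rank difference where |difference| > cutoff."""
    shared = expr_ref.keys() & expr_test.keys()
    rank_ref = percentile_ranks({g: expr_ref[g] for g in shared})
    rank_test = percentile_ranks({g: expr_test[g] for g in shared})
    diffs = {g: rank_test[g] - rank_ref[g] for g in shared}
    return {g: d for g, d in diffs.items() if abs(d) > cutoff}


# Toy FPKM values (hypothetical, not taken from the paper).
esc = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 4.0}
npc = {"A": 4.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 0.0}
changed = differential_genes(esc, npc)
```

In this toy example only genes A and E change rank, giving differences of +1 and −1, which match the two extremes of the color scale described in the legend.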

iCGIs upon 5-azacytidine treatment (SI Appendix, Fig. S6E). However, treatment with 5-azacytidine enhanced chromatin interactions in Map3k1, Igsf3, and Whrn between the iCGI and the promoter, but had no effect on other intragenic regions (SI Appendix, Fig. S6 D and E). Therefore, DNA methylation of iCGIs appears to interrupt chromatin interactions within promoter regions.

DNA Hypermethylation of iCGIs Is Required for Active Expression of Their Associated Genes. To validate the requirement of iCGI methylation for gene regulation, we analyzed the effects of DNMT mutations on iCGI-mediated regulation in mouse ESCs (GSE28254 and GSE29413). We examined the effects of DNA demethylation on accumulation of the repressive histone modifications H3K27me3 and H3K9me3 on iCGIsum and iCGIsm. A DNMT triple-knockout (TKO) caused a complete loss of DNA methylation along with a slight depletion of H3K9me3 on iCGIsum and iCGIsm, reflecting similarities between these two repressive epigenetic markers (SI Appendix, Fig. S7 A and B). However, the loss of DNA methylation had different effects on H3K27me3 depending on the type of iCGI. iCGIsm gained H3K27me3 modifications upon the loss of DNA methylation, whereas bivalent iCGIsum lost the repressive modifications (SI Appendix, Fig. S7B). Reduced DNA methylation of iCGIs caused transcriptional repression of their associated genes, whereas genes containing hypomethylated pCGIs were activated in TKO cells (SI Appendix,

Fig. S7C). In addition, recovery of DNMT3A expression in TKO cells (GSE57577) induced hypermethylation of CGIs in both promoter and intragenic regions, but led to differential effects on gene expression depending on the locations of the CGIs. pCGIs were repressed, whereas iCGIs were de-repressed in DNMT3A-expressing cells (SI Appendix, Fig. S7D). Therefore, DNA methylation at iCGIs appears to counteract the repressive H3K27me3 modification and activate gene expression.

6 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1613300114 Lee et al. Downloaded by guest on September 29, 2021

Fig. 4. Disruption of CGI interactions by methylation of iCGIs. (A) Histogram showing the enrichment of different types of CGIs and bivalent promoters (Bivalent_P) undergoing ChIA-PET interactions with RNA polymerase II in mouse ESCs (GSE44067). The y axis indicates the RPKM values for the ChIA-PET signal. (B) Donut chart showing the relative genomic distribution of iCGIum-interacting regions. (C) Genome browser view showing the enrichment of ChIA-PET interaction signals, Pol II, H3K4me3, and H3K27me3 at the Insr and Gprin1 loci. Locations of iCGIs are highlighted in cyan. Interacting regions are indicated by red dashed lines with P values. (D) Histogram showing the enrichment of methylated or unmethylated iCGIs undergoing ChIA-PET interactions with RNA polymerase II in mESCs and NPCs. (Upper) The interaction frequencies of hypermethylated iCGIs in NPCs compared with ESCs. (Lower) The interaction frequencies of unmethylated iCGIs in both ESCs and NPCs. The y axis indicates the RPKM values for the ChIA-PET signal. (E) The 3C-qPCR analysis of interactions between promoter CGIs and iCGIs (iCGIum: Insr; iCGIm: Igsf3, Map3k1, and Whrn) within each gene in control DMSO-treated and 5-azacytidine–treated mouse ES14 cells. Two independent 3C-qPCR experiments were performed on two independent biological samples. The data were normalized to a Gapdh loading control.

Cell Type-Specific Gene Expression Is Associated with Differential Methylation of iCGIs. We examined DNA methylation changes at iCGIsum of ESCs in three different somatic cell types (GM12878: LB, IMR90, and NPC). We identified 355 iCGIs that were markedly hypermethylated (>40%) in at least one somatic cell type (Fig. 5A and Dataset S5). Many iCGIs (cluster 1) gained methylation in all of the differentiated cell types, but three distinct groups of iCGIs were specifically hypermethylated in just one cell type each, resulting in depletion of H3K4me3 and H3K27me3 marks concurrently with the activation of the associated genes in the corresponding cell type (Fig. 5 B–D and SI Appendix, Fig. S8 A–C). These cell type-specifically activated genes were enriched for homophilic cell adhesion (blood cell-specific cluster 2), morphogenic transcription factors (fibroblast-specific cluster 3), and brain development-associated genes (neuron-specific cluster 4) (SI Appendix, Table S3), and similar patterns of iCGI methylation were observed in the analysis of additional cell types (mobilized CD34+ cell, HSC; cortex-derived neural stem cell, Neuro; breast myoepiblast, Myoe) (SI Appendix, Fig. S9), confirming the importance of iCGIs in cell type-specific differentiation. Moreover, many of these differentially regulated iCGI-containing genes encoded proteins specifically expressed in the corresponding cell types (Fig. 5A). In particular, several key transcription factors including BAI1, LHX2, ZIC1, PAX6, MEIS1, and ONECUT1 were specifically activated in NPCs with hypermethylation of their iCGIs, indicating the essential roles of iCGI-mediated transcriptional control in neuronal differentiation.

Cell Type-Specific Transcription Factor-Binding Sites Are Enriched at iCGIs. Specific epigenetic changes are usually induced by cell type-specific transcription factors (23). To test whether cell type-specific methylation of iCGIs can be triggered by tissue-specific transcription factors, we analyzed transcription factor-binding sites in the differentially methylated iCGI clusters in Fig. 5A using Homer software (24). Indeed, these iCGI clusters, rather than their pCGIs, were enriched for binding sites for a distinct set of developmental transcription factors known to act in the corresponding cell types (Fig. 6A).

To examine the generality of this phenomenon, we analyzed the enrichment of transcription factor-binding sites among the entire set of promoter-associated and iCGIs. Grouping of transcription factors according to their calculated probability of binding to different CGI elements revealed four distinct groups of transcription factors specifically enriched in given CGI elements (Fig. 6B and Dataset S6). The transcription factors recognizing the CGIs of active promoters (Active pCGI) encompassed many abundant transcription factors including SP1, NFY, and YY1 (green), whereas NRSF and TBP (gray) were specifically enriched at bivalent pCGIs (Fig. 6B). Although their histone modification patterns are similar to those of bivalent promoters, iCGIsum were enriched for binding sites for unique developmental transcription factors (blue) quite different from those binding to bivalent pCGIs. These binding sites for distinct groups of transcription factors in each CGI type were evolutionarily conserved such that similar enrichment patterns were observed in mESCs (SI Appendix, Fig. S10 and Dataset S6).

These unique transcription factors recognizing bivalently modified CGIs are expressed in a cell type-specific manner and are thus expressed at much lower levels in ESCs than those recognizing active promoters or methylated iCGIsm (Fig. 6C). Due to their absence in ESCs, most of their target iCGIs might remain in an unmethylated state and only become methylated in differentiated derivatives specifically expressing the corresponding transcription factors. Intriguingly, many of these cell type-specific transcription factors also contained iCGIs and were regulated in a similar way. On the other hand, the methylated iCGIsm of ESCs contained binding sites for abundant transcription factors present in ESCs, such as and MAX (Fig. 6 B and C). These results indicate that the availability of iCGI-binding transcription factors may be the main determinant for differential modification of iCGIs. For this reason, transcription factor genes containing iCGIsum are in a silenced state waiting for a specific developmental cue and thus have substantially higher cell-type specificity scores than those containing methylated iCGIsm (Fig. 6D). These results suggest that cell type-specific expression of both transcription factors and their targets may take advantage of epigenetic modifications of iCGIs to generate differentiated lineages from ESCs. Combinatorial transcription

Fig. 5. Cell type-specific regulation of DNA methylation at iCGIs. (A) Heatmap showing the hierarchical clustering of DNA methylation differences at iCGIs between ESCs and indicated somatic cells [LB (lymphoblastoid, GM12878), IMR90 (fibroblast), or hNPC]. A total of 355 iCGIs hypermethylated (more than 40%) in at least one somatic cell type were analyzed using hierarchical clustering. Four clusters were generated (cluster 1: common; cluster 2: LB-specific; cluster 3: fibroblast-specific; cluster 4: NPC-specific). Representative transcription factors of each cluster are indicated on the left. Color scale for the methylation differences is shown. (B) Expression levels of the genes associated with each cluster in LB, IMR90, and NPC. Box-whisker plots showing the FPKM values of the genes from mRNA-seq data of each cell line. (C) H3K27me3 levels of iCGIs in LB, IMR90, and NPC. Box-whisker plots showing the RPKM values of the iCGIs from ChIP-seq data of each cell line. (D) H3K4me3 levels of iCGIs in LB, IMR90, and NPC. Box-whisker plots showing the RPKM values of the iCGIs from ChIP-seq data of each cell line.
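The selection criterion stated in the Fig. 5A legend (iCGIs gaining more than 40% methylation in at least one somatic cell type relative to ESCs) can be sketched as below. This is an illustrative filter only: the function name, the data layout (methylation rates as fractions between 0 and 1), and the toy values are assumptions, and the hierarchical clustering step used for the figure is not reproduced here.

```python
def hypermethylated_icgis(esc_rates, somatic_rates, min_gain=0.40):
    """Return iCGI -> sorted list of cell types whose methylation gain exceeds min_gain.

    esc_rates:      iCGI id -> methylation rate in ESCs (0 to 1)
    somatic_rates:  cell type -> {iCGI id -> methylation rate (0 to 1)}
    iCGIs with no cell type above the threshold are omitted.
    """
    selected = {}
    for icgi, baseline in esc_rates.items():
        hits = sorted(
            cell_type
            for cell_type, rates in somatic_rates.items()
            if icgi in rates and rates[icgi] - baseline > min_gain
        )
        if hits:
            selected[icgi] = hits
    return selected


# Toy methylation rates (hypothetical identifiers and values).
esc = {"icgi_1": 0.05, "icgi_2": 0.10, "icgi_3": 0.02}
somatic = {
    "LB": {"icgi_1": 0.90, "icgi_2": 0.20, "icgi_3": 0.10},
    "IMR90": {"icgi_1": 0.80, "icgi_2": 0.70, "icgi_3": 0.30},
    "NPC": {"icgi_1": 0.85, "icgi_2": 0.15, "icgi_3": 0.41},
}
selected = hypermethylated_icgis(esc, somatic)
```

Here `icgi_1` passes the threshold in all three cell types (analogous to the common cluster), `icgi_2` passes only in IMR90 (analogous to a cell type-specific cluster), and `icgi_3` is dropped because its largest gain is below 40%.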

factor occupancy on promoters and iCGIs may lead to diverse transcriptional outputs.

Discussion

Most work concerning the importance of epigenetic modifications has focused on promoter-associated CGIs. The only known role of DNA methylation on iCGIs is to repress alternative TSSs in a cell type-specific manner. Here we provide several lines of evidence suggesting that iCGIs have additional roles in ESC differentiation.

First, iCGI-containing genes, including most cell lineage transcription factors, are important developmental regulators. Many of the genes encoding key regulators of morphogenesis and organ development, such as BMPs, HOXs, PAXs, GATAs, Notch, and WNTs, contain iCGIs. The HOXA gene cluster contains 39 CGIs distributed between 13 HOXA genes (chromosome 7, 155-kb region) (25), and seven contain 18 iCGIs. Therefore, iCGIs appear to be a common element for development-specific gene expression.

Second, the epigenetic modifications of iCGIs are evolutionarily conserved and vary in a cell type-specific manner. In both human and mouse, the modification status of iCGIs correlated with the expression of their associated genes. Comparison of various human differentiated cell types with ESCs revealed that cell type-specific hypermethylation of iCGIs correlated with activated transcription of their associated developmentally regulated genes. For example, iCGIs of cell lineage markers (PAX5, RARA, LHX2, and MEIS1) were hypermethylated only in the corresponding cell types that typically express these transcription factors (26–29). Although there are a number of iCGIs commonly methylated in diverse cell types, including ESCs, CGIs are the major targets of controlled de novo DNA methylation during early embryogenesis following genome-wide demethylation (30). Therefore, methylation of iCGIs is highly associated with development.

Third, iCGIs may provide a common platform for bivalent chromatin and DNA methylation. Many developmental genes have been suggested to silence their expression via bivalent histone modification of pCGIs (31, 32). The establishment of bivalent modification and its progress into monovalent modification have been largely speculated to rely on transcription factor-mediated recruitment of specific histone modifiers. However, this may not be enough to provide the necessary regulatory complexity required for cellular differentiation. Additionally, the requirement for DNA methylation to regulate bivalent chromatin has been uncertain. Therefore, identification of the additional layer of regulation mediated by iCGIs expands our understanding of bivalent chromatin and suggests a new regulatory mechanism. In addition, areas encompassed by the iCGIs, which could include more than one gene, can interact with other CGIs and be coregulated by the spread of the repressive histone mark H3K27me3. These CGI interactions with interlocking repressive chromatin can be disrupted by DNA methylation of iCGIs, demonstrating another way in which DNA methylation can affect gene expression. Indeed, DNA hypomethylation in DNMT knockout cells leads to the reappearance of H3K27me3 and H3K4me3 at iCGIs, resulting in gene silencing (6, 33, 34). Therefore, iCGIs reveal a regulatory mechanism distinct from that of enhancers and silencers.

Fourth, iCGIs are enriched for binding sites for developmentally regulated transcription factors that are distinct from those of pCGIs. It is intriguing that many cell type-specific transcription factors can bind to iCGIs to affect target gene expression. A complex combination of transcription factors on promoters and iCGIs can result in subtle changes in transcriptional regulation depending on each cell type. We observed that iCGI-binding transcription factor availability correlated somewhat with differential modification of iCGIs, but we do not yet know the exact role of transcription factors in epigenetic modifications of iCGIs. However, iCGI-binding transcription factors, including glucocorticoid receptors and progesterone receptors, interact with the H3K4 demethylases LSD1, KDM5A, and KDM5B (35) and DNMTs (36), supporting the idea that iCGI-binding transcription factors can

recruit chromatin modifiers to iCGIs. LHX2, SCL, HOXD13, GATA3, progesterone , , NR5A2, and CTCF are key players inducing differentiation into neuronal (ectoderm) or hematopoietic (mesendoderm) cells, but appear to bind specifically to iCGIs rather than to pCGIs (28, 37–41). Recently, a number of transcription factors binding at promoters were shown to elicit dynamic epigenetic changes during ESC differentiation (42). However, these transcription factors are quite different from those enriched at iCGIs and appear to regulate a different set of target genes using a distinct mechanism that induces DNA methylation loss or H3K27 acetylation accumulation. Although the bivalent promoter regions of the iCGI-containing genes also gained H3K27 acetylation upon activation, they did not show DNA methylation changes. Therefore, different transcription factors appear to regulate development-specific gene activation at different genic locations by distinct mechanisms.

Finally, iCGIs have distinct sequence features optimized for methylation-mediated regulation. We showed that several sequence features (length, CGG/C ratio, and transcription factor-binding sites) of CGIs influence their methylation levels and differ between intragenic and promoter-associated CGIs. Reflecting their sequence differences, the methylation process and its functional effect on iCGIs are different from those on pCGIs. Transcriptional activation at iCGIs could facilitate methylation at H3K36 by SETD2 (SET domain-containing 2), which binds to the C-terminal domain of elongating pol II (43, 44). High levels of H3K36me3 at iCGIs may promote DNA methylation because the PWWP domain of DNMT3A has been shown to recognize the H3K36me3 mark (45). Although there is no direct evidence that SETD2 binds to methylated DNA, the loss of DNA methylation can promote the binding of H3K36 demethylases KDM2A and NO66 to iCGIs (46, 47), which could repress H3K36 accumulation. Therefore, DNA methylation and H3K36me3 appear to promote each other in actively transcribing regions. Intriguingly, the histone variant H2A.B (48) was preferentially associated with iCGIs rather than pCGIs (SI Appendix, Fig. S3F). However, its binding to iCGIs was not dependent on their methylation status; it was enriched at iCGIsum as well as at iCGIsm. The histone variant may contribute to the formation of the distinct epigenetic features and regulatory roles of iCGIs. Therefore, distinctive sequence features of iCGIs may be able to mediate both the arrest and the release of transcriptional elongation, making iCGIs efficient regulators of gene expression during differentiation (SI Appendix, Fig. S11).

These lines of evidence demonstrate regulatory functions associated with iCGIs. The epigenetic transition from bivalent histone modification to DNA methylation at iCGIs appears to be widely used in many developmental processes. Therefore, genes containing iCGIs may be inferred to be important players in developmentally regulated processes, such as cancer progression and aging. Our analyses thus provide fresh insights into the roles of epigenetic regulation in development and disease.

Fig. 6. Transcription factor-binding motif enrichment at iCGIs. (A) Enrichment of defined transcription factor-binding motifs was calculated using Homer software. Heatmaps show the hierarchical clustering of transcription factor motifs enriched in each cluster of the pCGIs (Left) and iCGIs (Right) defined in Fig. 5 (cluster 1: common; cluster 2: LB-specific; cluster 3: fibroblast-specific; cluster 4: NPC-specific). Black gradient indicates enrichment (minus log P values) of motifs in CGIs, and the motifs with cutoff minus log P value greater than 3 in at least one cluster are shown. (B) Heatmap showing the hierarchical clustering of transcription factor motif enrichments for active promoter CGIs, bivalent promoter CGIs, iCGIsum, and iCGIsm in hESCs. Enrichments (minus log P values) of 88 motifs with a cutoff minus log P value of 4 in at least one region are shown. The representative transcription factors enriched specifically in each category are indicated with different colors. Sequences for the bivalent promoter CGIs were obtained from the bivalent genes defined in Li et al. (49). Active promoter CGIs are from the genes without an iCGI. (C) Expression levels (FPKM) of the transcription factors with binding sites at active promoter CGIs, bivalent promoter CGIs, iCGIsum, and iCGIsm are obtained from the RNA-seq data of hESCs. Horizontal lines indicate the median values. (D) Density plot of iCGI-containing transcription factors with iCGIsum (black line) or iCGIsm (gray line) according to their tissue specificity scores. Tissue specificity scores were obtained from Ravasi et al. (17).

Materials and Methods

Public Datasets Used. The following are accession numbers of the datasets used: GSE11431, GSE12241, GSE16256, GSE16368, GSE17312, GSE18927, GSE19468, GSE28254, GSE29413, GSE30202, GSE40832, GSE41009, GSE43070, GSE44067, GSE48122, GSE49294, GSE52017, GSE53490, GSE57413, GSE57575, GSE64115, human ENCODE, and mouse ENCODE. Details of analyzed data are listed in Dataset S1.

Sequencing and Data Processing.

Mapping. All human sequence data were mapped to the University of California at Santa Cruz (UCSC) human reference genome (hg19, downloaded from UCSC). For consistent analysis, all mouse data were mapped to the UCSC mouse reference genome (mm9). Human and mouse data were mapped to the reference genome using Bowtie2 (version 2.1.0). The default parameters were used for the mapping, and no mismatches were allowed (default setting).

ChIP-seq data analysis. To analyze the enrichment of histone modification signals, the SAM output was converted to a BED file. Mapped reads were counted at a 50-bp resolution according to the sequence of the genome, and the reads per kilobase of counts per million reads sequenced (RPKM) at a 5-bp resolution were calculated. These RPKM values were averaged between replicate samples to minimize experimental variation.

Whole-genome bisulfite sequencing data analysis. For the bisulfite sequencing data, the raw sequences were first trimmed using Trim-galore to trim off any low-quality base calls, and the adapter sequences from the 3′ end. The sequences were then aligned to the Bismark genome built from the reference genome using Bismark with the Bowtie2 aligner. The standard alignment parameter was used with a multiseed length of 20 bp with 0 mismatches. The Bismark methylation extractor was used to extract the

methylation percentage at every called methylation base in bedGraph data format.

RNA-seq data analysis. RNA-seq data were mapped to the reference genome using Tophat2 (version 2.0.10). The aligned BAM files were then assembled using the Cufflinks package (version 2.1.1) to extract and compare their FPKM (fragments per kilobase of transcript per million mapped reads) values. National Center for Biotechnology Information mRNA reference sequence collection (RefSeq) numbers were assigned to each gene name. The expression values of genes with multiple isoforms were averaged. To minimize variation in overall expression values between samples, Cuffnorm (included in the Cufflinks package), which normalizes the read counts across samples for relative comparisons, was used.

Detailed descriptions of methods are available in SI Appendix.

ACKNOWLEDGMENTS. We thank Dr. Roger D. Kornberg for critical comments and Dr. Yoon-Young Koh for statistical analysis. This work was supported by Samsung Science and Technology Foundation Project SSTF-BA1601-13.
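The isoform-averaging step described under RNA-seq data analysis (expression values of genes with multiple isoforms were averaged) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary layout, and the placeholder accession labels are hypothetical, and the sketch does not reproduce the Cufflinks or Cuffnorm processing itself.

```python
from collections import defaultdict


def average_isoform_fpkm(isoform_fpkm, isoform_to_gene):
    """Collapse isoform-level FPKM values to one mean value per gene.

    isoform_fpkm:    isoform id -> FPKM
    isoform_to_gene: isoform id -> gene name
    Isoforms without a gene assignment are skipped.
    """
    grouped = defaultdict(list)
    for isoform, value in isoform_fpkm.items():
        gene = isoform_to_gene.get(isoform)
        if gene is not None:
            grouped[gene].append(value)
    return {gene: sum(values) / len(values) for gene, values in grouped.items()}


# Placeholder isoform labels and FPKM values (hypothetical, for illustration).
isoform_fpkm = {"iso_1": 10.0, "iso_2": 20.0, "iso_3": 4.0, "iso_4": 1.0}
isoform_to_gene = {"iso_1": "PAX6", "iso_2": "PAX6", "iso_3": "LHX2"}
gene_fpkm = average_isoform_fpkm(isoform_fpkm, isoform_to_gene)
```

With these toy values the two PAX6 isoforms are averaged to a single gene-level value, LHX2 keeps its single value, and the unassigned isoform is ignored.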

1. Chen T, Dent SY (2014) Chromatin modifiers and remodellers: Regulators of cellular differentiation. Nat Rev Genet 15(2):93–106.
2. Bernstein BE, et al. (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125(2):315–326.
3. Jones PA (2012) Functions of DNA methylation: Islands, start sites, gene bodies and beyond. Nat Rev Genet 13(7):484–492.
4. Beck S, et al. (2014) CpG island-mediated global gene regulatory modes in mouse embryonic stem cells. Nat Commun 5:5490.
5. Zhu J, He F, Hu S, Yu J (2008) On the nature of human housekeeping genes. Trends Genet 24(10):481–484.
6. Hashimoto H, Vertino PM, Cheng X (2010) Molecular coupling of DNA methylation and histone methylation. Epigenomics 2(5):657–669.
7. Mendenhall EM, et al. (2010) GC-rich sequence elements recruit PRC2 in mammalian ES cells. PLoS Genet 6(12):e1001244.
8. Hu D, et al. (2013) The Mll2 branch of the COMPASS family regulates bivalent promoters in mouse embryonic stem cells. Nat Struct Mol Biol 20(9):1093–1097.
9. Maunakea AK, et al. (2010) Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466(7303):253–257.
10. Deaton AM, Bird A (2011) CpG islands and the regulation of transcription. Genes Dev 25(10):1010–1022.
11. Sarraf SA, Stancheva I (2004) Methyl-CpG binding protein MBD1 couples histone H3 methylation at lysine 9 by SETDB1 to DNA replication and chromatin assembly. Mol Cell 15(4):595–605.
12. Hawkins RD, et al. (2010) Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell 6(5):479–491.
13. Deaton AM, et al. (2011) Cell type-specific DNA methylation at intragenic CpG islands in the immune system. Genome Res 21(7):1074–1086.
14. Lee SM, et al. (2014) HBx induces hypomethylation of distal intragenic CpG islands required for active expression of developmental regulators. Proc Natl Acad Sci USA 111(26):9555–9560.
15. Xu C, Bian C, Lam R, Dong A, Min J (2011) The structural basis for selective binding of non-methylated CpG islands by the CFP1 CXXC domain. Nat Commun 2:227.
16. Stamatoyannopoulos JA, et al.; Mouse ENCODE Consortium (2012) An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biol 13(8):418.
17. Ravasi T, et al. (2010) An atlas of combinatorial transcriptional regulation in mouse and man. Cell 140(5):744–752.
18. Hawkins RD, et al. (2011) Dynamic chromatin states in human ES cells reveal potential regulatory sequences and genes involved in pluripotency. Cell Res 21(10):1393–1409.
19. Andersson R, et al.; FANTOM Consortium (2014) An atlas of active enhancers across human cell types and tissues. Nature 507(7493):455–461.
20. Satoh J, Kawana N, Yamamoto Y (2013) ChIP-Seq data mining: Remarkable differences in NRSF/REST target genes between human ESC and ESC-derived neurons. Bioinform Biol Insights 7:357–368.
21. Zhang Y, et al. (2013) Chromatin connectivity maps reveal dynamic promoter-enhancer long-range associations. Nature 504(7479):306–310.
22. Tsuji-Takayama K, et al. (2004) Demethylating agent, 5-azacytidine, reverses differentiation of embryonic stem cells. Biochem Biophys Res Commun 323(1):86–90.
23. Iwafuchi-Doi M, Zaret KS (2014) Pioneer transcription factors in cell reprogramming. Genes Dev 28(24):2679–2692.
24. Heinz S, et al. (2010) Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38(4):576–589.
25. Rauch T, et al. (2007) Homeobox gene methylation in lung cancer studied by genome-wide analysis with a microarray-based methylated CpG island recovery assay. Proc Natl Acad Sci USA 104(13):5527–5532.
26. Medvedovic J, Ebert A, Tagoh H, Busslinger M (2011) Pax5: A master regulator of B cell development and leukemogenesis. Adv Immunol 111:179–206.
27. Laursen KB, Wong PM, Gudas LJ (2012) Epigenetic regulation by RARα maintains ligand-independent transcriptional activity. Nucleic Acids Res 40(1):102–115.
28. Morales D, Hatten ME (2006) Molecular markers of neuronal progenitors in the embryonic cerebellar anlage. J Neurosci 26(47):12226–12236.
29. Spieler D, et al. (2014) Restless legs syndrome-associated intronic common variant in Meis1 alters enhancer function in the developing telencephalon. Genome Res 24(4):592–603.
30. Guo H, et al. (2014) The DNA methylation landscape of human early embryos. Nature 511(7511):606–610.
31. Milne TA, et al. (2005) MLL associates specifically with a subset of transcriptionally active target genes. Proc Natl Acad Sci USA 102(41):14765–14770.
32. Schmitges FW, et al. (2011) Histone methylation by PRC2 is inhibited by active chromatin marks. Mol Cell 42(3):330–341.
33. Murphy PJ, et al. (2013) Single-molecule analysis of combinatorial epigenomic states in normal and tumor cells. Proc Natl Acad Sci USA 110(19):7772–7777.
34. Hagarman JA, Motley MP, Kristjansdottir K, Soloway PD (2013) Coordinate regulation of DNA methylation and H3K27me3 in mouse embryonic stem cells. PLoS One 8(1):e53880.
35. Stratmann A, Haendler B (2012) Histone demethylation and steroid receptor function in cancer. Mol Cell Endocrinol 348(1):12–20.
36. Hervouet E, Vallette FM, Cartron PF (2009) Dnmt3/transcription factor interactions as crucial players in targeted DNA methylation. Epigenetics 4(7):487–499.
37. Pandolfi PP, et al. (1995) Targeted disruption of the GATA3 gene causes severe abnormalities in the nervous system and in fetal liver haematopoiesis. Nat Genet 11(1):40–44.
38. Watson LA, et al. (2014) Dual effect of CTCF loss on neuroprogenitor differentiation and survival. J Neurosci 34(8):2860–2870.
39. Real PJ, et al. (2012) SCL/TAL1 regulates hematopoietic specification from human embryonic stem cells. Mol Ther 20(7):1443–1453.
40. Shetty AS, et al. (2013) Lhx2 regulates a cortex-specific mechanism for barrel formation. Proc Natl Acad Sci USA 110(50):E4913–E4921.
41. Hale MA, et al. (2014) The nuclear hormone receptor family member NR5A2 controls aspects of multipotent progenitor cell formation and acinar differentiation during pancreatic organogenesis. Development 141(16):3123–3133.
42. Tsankov AM, et al. (2015) Transcription factor binding dynamics during human ES cell differentiation. Nature 518(7539):344–349.
43. Sun XJ, et al. (2005) Identification and characterization of a novel human histone H3 lysine 36-specific methyltransferase. J Biol Chem 280(42):35261–35271.
44. Brown SJ, Stoilov P, Xing Y (2012) Chromatin and epigenetic regulation of pre-mRNA processing. Hum Mol Genet 21(R1):R90–R96.
45. Dhayalan A, et al. (2010) The Dnmt3a PWWP domain reads histone 3 lysine 36 trimethylation and guides DNA methylation. J Biol Chem 285(34):26114–26120.
46. Brien GL, et al. (2012) Polycomb PHF19 binds H3K36me3 and recruits PRC2 and demethylase NO66 to embryonic stem cell genes during differentiation. Nat Struct Mol Biol 19(12):1273–1281.
47. Blackledge NP, et al. (2010) CpG islands recruit a histone H3 lysine 36 demethylase. Mol Cell 38(2):179–190.
48. Chen Y, Chen Q, McEachin RC, Cavalcoli JD, Yu X (2014) H2A.B facilitates transcription elongation at methylated CpG loci. Genome Res 24(4):570–579.
49. Li Q, Lian S, Dai Z, Xiang Q, Dai X (2013) BGDB: A database of bivalent genes. Database (Oxford) 2013:bat057.
