1 Synthetic CpG islands reveal DNA sequence
2 determinants of chromatin structure
3
4 Elisabeth Wachter1, Timo Quante1, Cara Merusi1, Aleksandra Arczewska1,
5 Francis Stewart2, Shaun Webb,1 and Adrian Bird1#
6
7
8 1 The Wellcome Trust Centre for Cell Biology 9 University of Edinburgh 10 The King’s Buildings 11 Mayfield Road 12 Edinburgh 13 EH9 3JR 14 UK 15 16 2 Genomics & Biotechnology Center 17 Technische Universitaet 18 Dresden 19 Tatzberg 47 20 01307 Dresden 21 Germany 22
23 #Corresponding author: [email protected]; phone: +44-(0)131-650-5670
24
25 Running title: CpG island DNA affects chromatin structure
1 26 Abstract
27
28 The mammalian genome is punctuated by CpG islands (CGIs), which differ
29 sharply from the bulk genome by being rich in G+C and the dinucleotide CpG.
30 CGIs often include transcription initiation sites and display “active” histone
31 marks, notably histone H3 lysine 4 methylation. In embryonic stem cells (ESCs)
32 some CGIs adopt a “bivalent” chromatin state bearing simultaneous “active” and
33 “inactive” chromatin marks. To determine whether CGI chromatin is
34 developmentally programmed at specific genes or is imposed by shared features
35 of CGI DNA, we integrated artificial CGI-like DNA sequences into the ESC genome.
36 We found that bivalency is the default chromatin structure for CpG-rich, G+C-rich
37 DNA. A high CpG density alone is not sufficient for this effect, as A+T-rich
38 sequence settings invariably provoke de novo DNA methylation leading to loss of
39 CGI signature chromatin. We conclude that both CpG-richness and G+C-richness
40 are required for induction of signature chromatin structures at CGIs.
41
42 Introduction
43 CpG islands (CGIs) are stretches of atypical genomic DNA sequences that mark
44 most gene promoters (Bird, 1986; Deaton and Bird, 2011). Though individually
45 unique in nucleotide sequence, mammalian CGIs share two features that often
46 correlate but can vary independently: a G+C-rich base composition (65% versus
47 40% for the bulk genome) and high density of the dinucleotide CpG (5-10 fold
48 higher than the bulk genome). Whereas bulk genomic DNA is globally methylated,
49 CGIs associated with promoters invariably lack DNA methylation. It has been
50 proposed that CGIs function as generic platforms for gene regulation, probably
2 51 via proteins that bind to CpG and influence chromatin modification (Blackledge
52 and Klose, 2011; Deaton and Bird, 2011). This hypothesis has gained credence
53 due to accumulating evidence that enrichment or depletion of specific chromatin
54 marks at CGIs is linked to proteins that bind to unmethylated CpG. So far, all such
55 proteins possess CXXC zinc finger domains that bind specifically to the CpG dyad
56 in duplex DNA (Lee et al., 2001). Cfp1, for example, is a CXXC domain-containing
57 component of the Set1/COMPASS complex (Lee and Skalnik, 2005), which
58 generates H3K4 methylation, a signature chromatin mark at non-methylated
59 CGIs (Bernstein et al., 2006; Guenther et al., 2007). Accordingly, Cfp1 is
60 concentrated at CGIs as determined by chromatin immunoprecipitation (ChIP)
61 and its absence is associated with reduced H3K4 methylation at many CGIs
62 (Clouaire et al., 2012; Thomson et al., 2010). The H3K4 methyltransferases Mll1
63 and Mll2 are also CXXC proteins and each is found at CGIs in mouse embryonic
64 stem cells (ESCs) as determined by ChIP-Seq (Denissov et al., 2014; Hu et al.,
65 2013). Similarly the CXXC domain-containing proteins Kdm2a and Kdm2b are
66 enriched at CGIs. Kdm2a is an H3K36 demethylase that contributes to depletion
67 of H3K36me at CGI promoters (Blackledge et al., 2010), whereas Kdm2b
68 facilitates recruitment of the PRC1 complex to transcriptionally silent CGIs
69 (Farcas et al., 2012; Wu et al., 2013).
70
71 Previous studies have shown that CGI-like DNA sequences can impose an altered
72 chromatin state in ESCs. Promoter-less CGI-like DNA sequences of invertebrate
73 origin in mouse ES cells were initially shown to cause local enrichment of
74 trimethylation of lysine 4 of histone H3 (H3K4me3) and to recruit the CpG
75 binding protein Cfp1 (Thomson et al., 2010). Similar experiments using G+C-rich
3 76 DNA derived from bacteria created chromatin marked by both H3K27me3 and
77 H3K4me3 (Mendenhall et al., 2010). Whereas H3K4me3 is characteristic of
78 active promoters, H3K27me3 is associated with transcriptional repression via
79 the polycomb complex. The coincidence of these two marks in so-called
80 “bivalent” chromatin is thought to be a feature of genes that are poised to
81 become either active or silent during early development (Azuara et al., 2006;
82 Bernstein et al., 2006; Voigt et al., 2013). Many native CGIs in ESCs adopt a
83 “bivalent” chromatin structure when transcriptionally silent.
84
85 Despite evidence that CXXC proteins play a role at CGIs, the hypothesis that CpG
86 density is the critical determinant of CGI function has not been directly tested.
87 Here we vary the DNA sequence composition of artificial promoterless CGIs to
88 assess the relative importance of G+C-richness and high CpG density. Our assay
89 relies upon chromosomal integration of artificial CGI-like sequences into the
90 genome of ESCs. We show that a bivalent chromatin configuration and absence
91 of DNA methylation represent the default state of biologically inert CGI-like DNA
92 sequences. While a high CpG density is essential, it is not sufficient to guarantee
93 the bivalent chromatin structure, as CpG-rich DNA sequences with a G+C-poor
94 average base composition do not acquire this chromatin signature. In fact A+T-
95 rich, G+C poor insertions with a CpG frequency matching that of CGIs
96 consistently become DNA methylated in ESCs, although the bivalent
97 configuration can be restored if the dominant DNA methylation is removed. Our
98 findings demonstrate that CpG-richness is essential for the formation of bivalent
99 chromatin, whereas G+C-richness is required to exclude DNA methylation, which
100 when present is dominant over the other chromatin marks.
4 101
102 Results
103 Formation of a novel bivalent domain at artificial CGI-like sequences
104 Comparison between CGIs and an equivalent number of sequences from the bulk
105 genome shows that both CpG frequency and G+C richness are distinct in CGIs
106 compared with bulk genomic DNA of mouse (Figure 1A). A previous study
107 showed that bacterial DNA with CGI-like features organised bivalent chromatin
108 in ESCs (Mendenhall et al., 2010). In order to verify and extend this result we
109 introduced CGI-like DNA sequences of ~1000 nucleotide pairs (the average
110 length of native CGIs) into a bacterial artificial chromosome (BAC) containing a
111 human “gene desert” using recombineering (see Figure 1 – figure supplement 1).
112 A computer-generated CGI-like sequence (Artificial CGI 1) was designed with a
113 CpG frequency of one per ~10 base pairs and a base composition of 65% G+C
114 (Figure 1A, Figure 1-figure supplement 2 and Figure 1-source data 1 ). A second
115 CGI-like sequence (PuroGFP; Figure 1A, Figure 1-figure supplement 2 and Figure
116 1-source data 1) was of prokaryotic origin, being derived from a promoter-less
117 bacterial puromycin gene adjacent to codon-optimised green fluorescent protein
118 coding sequence (Thomson et al., 2010). All constructs lack a promoter, allowing
119 us to focus on the interaction between DNA sequence and chromatin
120 modification without the complicating involvement of transcription. The gene
121 desert regions flanking CGI-like sequences are intended to insulate against
122 effects of the genomic and chromatin environment at different BAC integrations
123 sites.
124
5 125 Three independently transfected stable ESC lines with low copy number random
126 integrations of the BAC containing Artificial CGI 1 and one cell line with the
127 PuroGFP CGI were selected for ChIP analysis (see Figure 1-figure supplement 1).
128 As controls we monitored active genes (Sox2 and Gapdh), which are marked by
129 H3K4me3, an endogenous bivalent gene (Hoxc8), which carries both H3K27me3
130 and H3K4me3, and an inter-genic region of chromosome 15 (m15), which bears
131 neither mark. The CGI-like insertions consistently generated a bivalent
132 chromatin structure marked by H3K4me3 and H3K27me3 (Figure 1B and C and
133 Figure 1-figure supplement 1). We refer to DNA sequence domains as bivalent
134 using the convention that H3K4me3 and H3K27me3 marks coincide at a single
135 integration site. It is possible that some cells in the population harbour an
136 integrant marked by only one of these marks whereas other cells possess only
137 the other mark. This configuration has not been detected at the few ESC bivalent
138 domains tested so far. More likely is that the two marks are interspersed at a
139 given bivalent sequence domain, though we did not test this experimentally [see
140 (Voigt et al., 2013)]. The levels of H3K4me3 at the artificial CGI were similar to
141 those at the endogenous bivalent gene Hoxc8, whereas H3K4me3 levels in the
142 DNA sequences flanking the CGI were unaffected by the insertion. H3K27me3
143 was less discrete, spreading variably into flanking regions of the human BAC, but
144 Suz12, a component of the PRC2 complex that deposits the H3K27me3 mark, was
145 tightly localised to the CGI-like sequence in each case. We tested an unrelated
146 artificial CGI-like sequence with similar overall sequence properties (Artificial
147 CGI 2; Figure 1-figure supplement 1), this time integrated at a recombination
148 cassette within the beta globin locus of mouse ESCs (Lienert et al., 2011).
149 Independent replicate stable transformant cell lines again showed consistent
6 150 presence of a bivalent chromatin structure (data not shown). The DNA
151 methylation status of the integrated artificial CGIs was investigated by bisulfite
152 sequencing. This showed that all the insertions reproducibly maintained a low
153 level of DNA methylation (Figure 1D & Figure 1-figure supplement 1). DNA
154 methylation levels at CGI-like sequences inserted into the gene desert (~10%)
155 were consistently somewhat higher than in the cassette exchange system (0.5%)
156 and were somewhat higher than an endogenous control CGI in the same DNA
157 samples (Dlx5; ~4-5%, data not shown). Despite this variability, it is evident that
158 all of these artificial DNA sequences maintain a largely non-methylated status.
159
160 H3K4me3 at an artificial CGI is independent of Cfp1 and RNA polymerase II
161 To test whether these promoterless artificial DNA sequences were
162 transcriptionally inert, we performed ChIP with antibodies recognising three
163 differentially phosphorylated forms of RNA polymerase II in replicate cell lines.
164 None displayed a peak over the insert, whereas control active genes were RNA
165 polymerase-positive as expected (Figure 2A and Figure 2-figure supplement 1).
166 Consistent with the absence of RNA polymerase, we did not detect significant
167 peaks of histone acetylation over the CGI-like insertions (Figure 2B). In summary,
168 data derived from three distinct promoter-less CGI-like sequences indicate that
169 bivalent chromatin is the default state for transcriptionally inert CGI-like DNA in
170 ESCs.
171
172 We next asked whether the CXXC protein Cfp1 is enriched at the CGI-like
173 sequences. To facilitate detection of Cfp1, we introduced the BAC containing the
174 artificial CGI into a transgenic cell line expressing a Cfp1-GFP fusion protein
7 175 (Denissov et al., 2014). Having verified that a bivalent domain was formed at the
176 artificial CGI in these cells (data not shown), ChIP was performed on three
177 independent cell lines using an anti-GFP antibody. We consistently observed
178 discrete enrichment of Cfp1 at the CGI-like insertion (Figure 2C). To determine
179 whether the formation of a bivalent domain at the inserted artificial CGI-like
180 sequence was dependent on Cfp1, the artificial CGI was introduced into Cfp1 -/-
181 mouse ES cells (Carlone & Skalnik, 2001) and three independent lines were
182 analysed by ChIP. H3K4me3 levels at the artificial CGI were clearly detectable in
183 the Cfp1 -/- cells, indicating that Cfp1 is not required for the formation of
184 H3K4me3 levels at the bivalent domain (Figure 2D). Depletion of Cfp1 was
185 previously reported to preferentially cause a decrease of H3K4me3 at active
186 genes without affecting non-productive genes (Clouaire et al, 2012). In
187 agreement with this finding, we observed reduced H3K4me3 at the active
188 control gene Sox2 compared to wildtype cells (compare Figures. 2D and 1C). We
189 also followed the fate of chromatin modifications during differentiation of ESCs
190 to neuronal progenitor cells (Figure 2 – figure supplement 1B) and found a
191 consistent drop in H3K4me3 accompanied by persistent or increased H3K27me3
192 (Figure 2-figure supplement 1). This transition from bivalency to H3K27me3
193 marking alone matches that at native CGI-associated genes that remain
194 transcriptionally silent during differentiation (Bernstein et al., 2006).
195
196 A high CpG frequency is necessary for the creation of bivalent domain
197 Although G+C content and CpG frequency are related features, they can be varied
198 independently (Figure 1A). To establish the importance of these features for
199 determination of bivalent chromatin, we varied CpG frequency and G+C content
8 200 in 1000 base pair long artificial DNA sequences (Figure 1-figure supplement 2
201 and Figure 1-source data 1). An artificial CGI with a base composition similar to
202 that of a normal CGI (65% G+C) but with a low density of CpGs, similar to that of
203 the bulk genome (1CpG/100bp), was designed (Low CpG / High G+C). This Low
204 CpG / High G+C sequence failed to create bivalent chromatin as neither
205 H3K4me3 nor H3K27me3 was detected in three independent ESC lines (Figure
206 3A). We note that the relative values of control and experimental data points are
207 consistent between experiments although we observe variability in the absolute
208 precipitation levels due to the use of different antibody suppliers between
209 experiments over an extended time period. Our conclusion from this data is that
210 a G+C-rich base composition alone is insufficient to recruit either H3K4me3 or
211 H3K27me3.
212
213 A+T-rich CGIs become reproducibly DNA methylated
214 This result raised the possibility that CpG frequency alone determines the
215 chromatin state, with G+C content playing no role. To test this idea, we generated
216 four different artificial DNA sequences that were CpG-rich to the same level as
217 typical CGIs (10CpGs/100bp), but relatively A+T-rich in overall base composition
218 (three of 40% and one of 50% G+C on average; Figure 1-figure supplement 2 and
219 Figure 1-source data 1). Contrary to expectation, none of these insertions
220 generated a focus of bivalent chromatin in multiple independent cell lines
221 (Figure 4A and Figure 4-figure supplement 1). A potential explanation for this
222 finding came from an analysis of DNA methylation status, which showed that in
223 replicate cell lines the CGIs had all become densely methylated at CpGs (Figure
224 4B and Figure 4-figure supplement 2). The striking contrast between the
9 225 consistent methylation-free status of three separate G+C-rich, CpG-rich
226 integrants and the reproducible dense methylation of four unrelated A+T-rich,
227 CpG-rich sequences of the same length indicates that base composition is a
228 strong determinant of DNA methylation status. A plot of G+C-content against
229 percentage CpG methylation showed a sharp transition between 50 and 60%
230 G+C (Figure 4D). Interestingly, a CGI-like insertion with a base composition of
231 55% G+C (MeCP2-eGFP) studied previously (Thomson et al., 2010) showed an
232 intermediate DNA methylation level, suggesting that it lies on the transition
233 point for triggering de novo methylation (Figure 4D).
234
235 Removal of DNA methylation restores the bivalent domain
236 The results indicate that DNA methylation is dominant over both H3K4me3 and
237 H3K27me3, as in its presence neither chromatin mark is observed at the
238 artificial CGIs. To test this hypothesis, we asked if removal of DNA methylation
239 could restore bivalent chromatin at A+T-rich CpG-rich sequences by using
240 mutant ESCs lacking the de novo DNA methyltransferases Dnmt 3a and Dnmt 3b
241 (Okano et al., 1999), which display severe DNA hypomethylation (Figure 4-figure
242 supplement 2). To ensure that histone modifying enzyme activities are not
243 disrupted in the Dnmt 3a/3b double knock out cells, we generated stable cell
244 lines with the artificial CGI-containing BAC and confirmed that a bivalent domain
245 was observed over the insertion (Figure 4-figure supplement 2). When the High
246 CpG /High A+T construct was introduced into Dnmt 3a/3b knock out cells it now
247 remained unmethylated, and importantly, bivalent chromatin was detected at
248 the inserted sequence (Figure 4C). We noticed that H3K4me3 was reproducibly
249 weaker than controls in Dnmt 3a/Dnmt 3b double mutant cells, whereas a robust
10 250 H3K27me3 signal was obtained. This may indicate that A+T-rich DNA is less able
251 to recruit H3K4me3 despite its high CpG density. The dominance of the
252 polycomb-associated H3K27me3 mark at A+T-rich CGIs was also seen when DNA
253 methylation was partially reduced by growing cells for 10days in 2i medium,
254 which enhances the pluripotent state. In line with previous reports (Ficz et al.,
255 2013; Habibi et al., 2013), we found that the average level of genomic CpG
256 methylation was reduced from ~95% to ~55% (Figure 4-figure supplement 2).
257 Whereas cells grown in serum-containing medium lacked both the tested histone
258 marks, 2i-grown cells displayed a strong increase in H3K27me3 without the
259 appearance of noticeable H3K4me3 (Figure 4-figure supplement 2). We conclude
260 that while CpG frequency is a key feature of CGIs that determines the signature
261 chromatin marks at CGIs, A+T-richness confers an intrinsic susceptibility to de
262 novo methylation that is dominant over these chromatin modifications (Figure
263 4E).
264
265 Status of endogenous DNA domains with atypical CpG or G+C composition
266 Our study assessed the role of G+C-richness versus CpG-richness on chromatin
267 structure by experimentally varying these sequence parameters individually in
268 synthetic DNA domains. The question arises: do equivalent atypical sequences
269 exist naturally in the mouse genome and if so what is their chromatin
270 modification status? We searched first for G+C-rich (≥61%), CpG deficient
271 (≤1/100bp) sequences of ≥500bp in length and identified 1,954 examples. None
272 of these coincided with bivalent chromatin in ESCs, in agreement with our
273 results using synthetic DNA. To determine if A+T-rich, CpG-rich sequences also
274 exist naturally in mice, we divided the genome into windows of 100bp and
11 275 calculated the G+C content and CpG density of each. To ensure that we selected
276 regions with profiles consistently above the thresholds (as is the case for our
277 artificial constructs) rather than short CpG dense regions flanked by AT rich
278 DNA, we subtracted windows with ≥50% GC content or <5 CpGs and searched
279 for blocks of adjoining windows. Uniformly A+T-rich and CpG-rich DNA
280 sequences of this kind were absent in the mouse genome. Even when the criteria
281 were relaxed to include sequences of only 500 base pairs in length (the
282 approximate minimum size of CGIs), we found no examples. As A+T-rich regions
283 are consistently DNA methylated, it seems likely that, in the absence of strong
284 selection, the high mutation rate of 5-methylcytosine effectively resists CpG
285 accumulation at these domains over evolutionary time.
286
287 Discussion
288
289 Epigenetic marks are often considered to be sensitive to both developmental and
290 environmental signals. Our data reinforce the complementary view that the
291 underlying genetic information, independent of nurture or developmental
292 history, plays an important role in setting up CGI chromatin structures. In ESCs
293 at least, each of the major CGI chromatin configurations is informed by DNA
294 sequence. We find that the non-methylated state of CGIs depends on G+C-rich
295 base composition and that both the signature histone marks H3K4me3 and
296 H3K27me3 depend on a high density of the non-methylated CpG dyad. The
297 importance of primary DNA sequence in determining bulk features of the
298 epigenome have been suggested elsewhere. For example, DNA sequence motifs
299 that are recognised by transcription factors strongly influence DNA methylation
12 300 patterns (Lienert et al., 2011; Stadler et al., 2011) and inter-individual variation
301 in other epigenomic features maps to sites of human DNA sequence
302 heterogeneity that are probably causal (Kasowski et al., 2013; Kilpinen et al.,
303 2013; McVicker et al., 2013). Our data show that even gross features of genomic
304 DNA, such as base composition and the frequency of the 2-base pair sequence
305 CpG, influence the epigenome.
306
307 Do our findings reflect the relationship between naturally occurring DNA
308 sequences and chromatin modification? Almost all native CGIs in mouse ESCs are
309 non-methylated at the DNA level and coincide with peaks of H3K4me3, which is
310 often seen as the signature histone mark of CGIs (Bernstein et al., 2006;
311 Thomson et al., 2010). About one third of the H3K4me3-marked CGIs in mouse
312 ES cells also carry H3K27me3 and are therefore defined as bivalent (Ku et al.,
313 2008). H3K27me3 is usually relatively dispersed compared with the discrete
314 localisation of components of the PRC2 complex, which deposit this mark. It is
315 estimated that at least 97% of peaks corresponding to the PRC2 component Ezh2
316 coincide with CGIs (Ku et al., 2008). This has led to the suggestion that polycomb
317 is targeted to G+C-rich or CpG-rich DNA (Long et al., 2013; Lynch et al., 2012;
318 Mendenhall et al., 2010; Tanay et al., 2007). It was shown previously that CpG
319 density within the non-methylated CGI fraction as a whole is proportional to the
320 H3K4me3 ChIP-seq signal in both mouse and human cells (Illingworth et al.,
321 2010). We asked whether the same relationship holds true for bivalent CGIs and
322 whether it also applies to H3K27me3. Within a set of 2,547 exclusively bivalent
323 promoters from mouse ESCs (Denissov et al., 2014; Marks et al., 2012), we found
324 that 92% coincide with CGIs. Within this bivalent group, CGI length and CpG
13 325 density both correlate positively with H3K4me3 and also H3K27me3 levels
326 (Figure 4 – figure supplement 3). In contrast to typical CGI-like sequences,
327 endogenous G+C-rich, CpG-poor sequences do not display a bivalent chromatin
328 structure. A+T-rich, CpG-rich domains of CGI-like dimensions, however, are
329 effectively absent from the mouse genome. The sum of available data argues
330 strongly that a high abundance of the dinucleotide CpG is a key precondition for
331 the formation of bivalent chromatin.
332
333 A possible mechanism for recruitment of H3K4me3 involves DNA binding by
334 H3K4 methyltransferases, each of which is associated with a CpG-binding CXXC
335 domain. In Mll1 and Mll2 the DNA binding domains are within the SET-
336 containing protein themselves (Allen et al., 2006; Cierpicki et al., 2010), whereas
337 in the case of Set1/COMPASS the CXXC domain resides in the Cfp1 protein
338 component of the multi-subunit complex (Lee and Skalnik, 2005). The H3K4me3
339 component of bivalent CGIs in ESCs depends on Mll2 (Denissov et al., 2014; Hu et
340 al., 2013). Accordingly, we have found that bivalent CGIs form normally in Cfp1-
341 /- ES cells. Since the binding of the CXXC domain to CpG is abolished by
342 methylation of the cytosine moiety (Lee et al., 2001), this may explain why DNA
343 methylation prevents the formation of H3K4me3 even in the presence of high
344 CpG densities. The mechanism responsible for recruiting PRC2, which is the
345 complex responsible for the establishment of H3K27me3, remains uncertain, as
346 no CpG binding component of PRC2 has yet been detected. (Hu et al., 2013;
347 Thomson et al., 2010; Wu et al., 2013). A recent study, however, indicates that
348 the CXXC-domain protein KDM2B can target the PRC1 complex to CGIs and
14 349 recruit PRC2 secondarily, which would accord with our findings (Blackledge et
350 al., 2014).
351
352 Bivalent chromatin has attracted attention due to the proposal that it represents
353 a poised transcriptional state in pluripotent cells (Bernstein et al., 2006; Voigt et
354 al., 2013). It is suggested that the poised state is resolved during differentiation
355 as the affected gene becomes either transcriptionally activated or repressed.
356 Recent evidence has clearly established that H3K27me and H3K4me can be
357 present on the same nucleosome, albeit on different histone tails (Voigt et al.,
358 2012). The biological significance of bivalent chromatin has recently become less
359 certain, however. Whereas silent CGI promoters in ESCs grown in serum usually
360 exhibit a bivalent chromatin structure, this chromatin configuration is
361 significantly reduced in 2i medium, which discourages differentiation and is
362 thought to induce a more pronounced pluripotent state (Marks et al., 2012). Also,
363 cells in which the histone methyltransferase Mll2 is depleted lose H3K4me3 at
364 hitherto bivalent CGIs, but this has no detectable effect on the induction kinetics
365 of the associated genes upon differentiation (Hu et al., 2013). The evidence
366 remains inconclusive, but it raises the possibility that bivalency is not an
367 essential precondition for gene activation during differentiation. Here we find
368 that bivalency is not confined to poised developmental genes but is a default
369 response to any CpG-rich DNA sequences, even when these are completely
370 artificial. It is apparent from this that the bivalent chromatin structure is not
371 reserved for developmentally important genes, but is a response to general
372 features of local DNA sequence.
373
15 374 Bulk mammalian DNA has a base composition of 40% G+C and is globally
375 methylated at CpGs to an average level of ~65%, whereas G+C-rich CGIs are
376 usually DNA methylation-free. The transfection experiments reported here
377 recapitulate this distinction as G+C-rich CGI-like DNA reproducibly resisted DNA
378 methylation, whereas A+T-rich DNA reliably became densely methylated. Our
379 findings raise the possibility that this broadly binary pattern of global DNA
380 methylation may be determined by base composition. CpG density did not affect
381 this susceptibility to methylation, which occurred even when CpG occurred at
382 densities typical of endogenous CGIs. Two simple alternative explanations are
383 possible: either A+T-rich DNA attracts de novo methylation, or G+C-richness
384 excludes it, leaving the bulk genome to be methylated by default. Both Dnmt 3L
385 and Dnmt 3A have the potential to be repelled by H3K4me3 (Jia et al., 2007; Ooi
386 et al., 2007), suggesting that recruitment of H3K4 methyltransferases with CpG-
387 binding CXXC domains protects CGIs against de novo methylation. We find,
388 however, that A+T-rich CpG-rich sequences are reproducibly DNA methylated
389 despite their ability to attract H3K4me3 when DNA methylation is absent. Since
390 these sequences were initially non-methylated when transfected into cells, it
391 may be that H3K4me3 alone is not sufficient to exclude CpG methylation from
392 these regions. This would accord with our previous observation that a
393 moderately G+C-rich artificial CGI when inserted into ES cells was extensively
394 methylated despite the presence of H3K4me3 on non-methylated copies
395 (Thomson et al., 2010). Alternatively, A+T-rich CpG-rich sequences may attract
396 H3K4me3 less robustly than G+C-rich CGIs, thereby allowing access to de novo
397 DNA methyltransferases.
398
16 399 The CGI phenomenon is conserved throughout the vertebrate lineage, but
400 interestingly the G+C-richness characteristic of mammalian and bird CGIs is not
401 seen in some vertebrate groups (Long et al., 2013). Fish for example have non-
402 methylated CGI-like sequences at promoters, but these are not markedly G+C-
403 rich compared with the bulk genome (Cross et al., 1991; Long et al., 2013). Since
404 A+T-rich sequences are susceptible to de novo DNA methylation in mammals, it
405 follows that some vertebrate groups may rely on a different set of mechanisms to
406 prevent methylation at G+C-poor CGIs. Identifying potential components that
407 discriminate between mammalian CGIs and the bulk genome is a priority for
408 future work.
409
410
411 Materials and Methods
412
413 Mouse ES cell lines
414 ES cells were grown in gelatinized dishes in Glasgow MEM (Gibco) supplemented
415 with 15% fetal bovine serum (Hyclone), 1% sodium pyruvate, 1% non-essential
416 amino acids, 0.1% β-mercaptoethanol, 100U/ml penicillin, 100 μg/ml
417 streptomycin and leukemia inhibitory factor (LIF). E14TG2a ES cells were used
418 as wild-type ES cells. Cfp1-/- cells were a gift from David Skalnik and have been
419 described previously (Carlone et al., 2005). Dnmt 3a/3b double knock out mouse
420 ESC (DKOs) were as described (Okano et al, 1999). TC1-mES cells (a gift from Dr
421 Ann Dean) with the hygromycin/thymidine kinase cassette in the β-globin locus
422 (Lienert et al., 2011) were used for recombination-mediated cassette exchange.
423 Cfp1-GFP tagged mouse ES cells have been described (Denissov et al, 2014). This
17 424 integrated BAC contains the whole Cfp1 gene with regulatory elements and GFP
425 fused to the last codon of to create a C-terminal GFP tag.
426
427 Recombineering
428 Custom Perl scripts were used to create random sequences with specific
429 frequencies of CpG, G+C content and length (https://github.com/swebb1/cpg_tools).
430 Supplementary File 1 lists the CGI-like sequences used in this study. Artificial
431 DNAs were synthesised (GeneArt) and cloned into a plasmid containing a
432 selection cassette and homology arms for recombineering. CGI-constructs were
433 introduced into human gene desert BACs by recombineering using the Red/ET
434 system by Gene Bridges. Briefly, bacteria containing the gene desert BAC of
435 interest (gene desert 1 mChr1:81,106,616-81,153,886 or gene desert 2
436 mChr18:36,042,881-36,175,341) were cultivated in LB medium plus
437 chloramphenicol at 37°C o/n. On the next day 40ng Red/ET plasmid were
438 electroporated (1350 V, 10 μF, 600 Ohms) into the cells containing the BAC and
439 incubated in LB containing chloramphenicol and tetracycline at 30° for at least
440 15h. Next day 50μl of 10% L-arabinose were added to induce the expression of
441 the Red/ET recombination proteins and samples were incubated at 37°C for 1h.
442 Cells were electroporated with 200-300ng of linearized DNA containing the CGI-
443 like sequence, a kanamycin selection cassette and homology arms. Cells were re-
444 suspended in 1ml of SOC medium and recovered for 1-2h at 37°C. Cells were
445 plated on plates containing chloramphenicol and kanamycin. Plates were
446 incubated at 37° o/n. Colonies were screened by colony PCR and control digests
447 for successful recombination. The linearized BAC containing the CGI-like
448 constructs and a selection cassette was used for transfection of mESCs.
18 449
450 ES cell transfection
451 Linearized BAC DNA (0.5 to 2μg) containing the CGI-like sequences and a
452 selection cassette flanked by Frt sites was used to transfect 60% confluent
453 mouse ESC growing in a 6-well plate using LipofectamineTM LTX PlusTM
454 (Invitrogen). DNA was made up with OptiMem (GIBCO) to 500μl, 2.5μl PLUS
455 reagent were added and incubated for 5 min at RT. Afterwards 6.25μl
456 Lipofectamine were added, incubated for 30min at RT and added to the ES cells.
457 Cells were split in a range of different ratios (10-0.1% of transfected cells) 24h
458 after transfection and plated onto 10cm2 dishes. Next day, selection medium
459 containing the appropriate antibiotic (G418 250 μg/ml or Blasticidin 3 μg/ml)
460 was added and cells were grown until colonies were ready to be picked. Clones
461 were analysed for incorporation of the constructs by PCR and 2-3 independent
462 clones with low copy number integration were selected for the excision of the
463 selection cassette. Circular plasmid (50μg) containing a eukaryotic expression
464 cassette for Flp or Dre were added to 2x107 cells and electroporated at 250V and
465 500μF using a BioRad electroporator (GenePulser Xcell). Cells were left to
466 recover for 20 min at RT and seeded at different dilutions. Next day 0.8μg/ml
467 puromycin were added for 48h and cells were cultured until colonies were big
468 enough for picking. Successful excision was confirmed by Southern blotting.
469 Artificial CGIs for insertion into the beta-Globin locus via recombination
470 mediated cassette exchange were synthesised (GeneArt) with flanking inverted
471 loxp sites and cloned into pBSIISK+ and electroporated into TC-1 ES cells
472 carrying a Hygromycin/Thymidine Kinase double selection cassette in the beta-
473 Globin locus (Lienert et al, 2011). Cells (4x106) were pre-selected with
19 474 hygromycin for 10d, electroporated with 25µg L1-artificial CGI-1L construct and
475 20µg pCAGGS-Cre and selection for positive clones with 3um Ganciclovir was
476 started 2d after electroporation and continued for 8-10d. Clones were tested for
477 successful insertion of artificial CGIs by PCR screen and Southern Blot. For
478 differentiation of mESC into neural precursors cells were plated (4x106
479 cells/dish) in 15ml EB medium (ES cell medium with 10% FBS and no LIF). After
480 4 days in EB medium trans-retinoic acid (Sigma) was added to start neuronal
481 differentiation. Medium was changed every 2 days. On day 8 EBs were disrupted,
482 trypsinized and used for formaldehyde crosslinking.
483
484 Chromatin Immunoprecipitation
485 Chemical crosslink of chromatin was performed for 10 minutes at room
486 temperature by addition of formaldehyde to a final concentration of 1%.
487 Crosslinking was stopped by addition of glycine to a final concentration of 0.125
488 mM. After 5 minutes incubation cells were washed twice with ice-cold 1x PBS. If
489 acetylation was examined, sodium butyrate was added to the PBS. Cells were
490 centrifuged for 5 minutes at 330 g at 4°C and washed once in wash buffer A
491 (0.25% Triton X-100, 10mM EDTA pH 8.0, 0.5mM EGTA pH 7.5, 10mM, HEPES
492 pH 7.5) and B (0.2M NaCl, 1mM EDTA pH 8.0, 0.5mM EGTA pH 7.5, 10mM HEPES
493 pH 7.5). Each buffer contained PMSF to a final concentration of 1μg/ml, 1x
494 Proteinase Inhibitor Complete Mix (Roche) and 10mM sodium butyrate. Cells
495 were re-suspended in lysis buffer (20% SDS, 0.5M EDTA, 1M Tris pH 8.1, 1X
496 complete protease inhibitors, 1μg/ml PMSF, 10mM sodium butyrate). Chromatin
497 was sheared to 400-600 base pair fragments by sonication of the cell lysate using
20 498 a twin Bioruptor (Diagenode) for 15 cycles, 30 sec on, 30 sec off on high setting.
499 The lysate was centrifuged for 10 minutes at 16,000 g at 4°C to collect cellular
500 debris. Chromatin concentration was measured on a Nanodrop
501 spectrophotometer. Chromatin (30μg or 100μg) was used for ChIP with
502 antibodies against histone modifications or other proteins. For the ChIP,
503 chromatin was made up to 100μl with lysis buffer and 900μl of dilution buffer
504 (20mM Tris-HCl, pH 8, 150mM NaCl, 1% Triton X-100, 1mM EDTA) were added.
505 Antibodies were added as specified and samples were incubated o/n at 4 °C on a
506 rotating wheel. The next day debris was removed by centrifugation for 5min at
507 16000 x g and 50μl dynabeads protein G (Life Technologies) were added to the
508 supernatant of the IPs and samples were incubated on a rotating wheel for 2
509 hours at 4 °C. IPs were washed 3x with 1ml ice-cold wash buffer 1 (20mM Tris-
510 HCl, pH 8, 150mM NaCl, 1% Triton X-100, 1mM EDTA, 0.1% SDS), 2x with ice-
511 cold wash buffer 2 (20mM Tris-HCl, pH 8, 500mM NaCl, 1% Triton X-100, 1mM
512 EDTA, 0.1% SDS) and 1x with ice-cold TE. 100μl or 50 μl of freshly prepared 10%
513 Chelex 100 Resin (100-200 mesh, BioRad) were added to samples or input
514 respectively. Samples were boiled for 12 min, cooled to RT. Proteinase K (2μl of
515 20mg/ml) was added and incubated at 55 °C for 30’ while shaking. After heating
516 to 100°C, samples were spun down and 60μl supernatant were transferred to
517 fresh tubes. Each sample was made up to 300μl total volume with 10mM Tris
518 pH8, 0.1 mM EDTA and assayed by q-PCR.
519
520 qPCR
521 Supplementary File 1 lists the qPCR primers used in this study. qPCR was used to
522 assess enrichment of specific regions in ChIP samples and to determine the copy
21 523 number of BAC DNA inserted into the mouse genome. qPCR reactions (10μl)
524 contained SYBR Green SensiMix (Bioline), 250nM primers and 3µl ChIP DNA or
525 100ng DNA for the determination of copy numbers. PCR was carried out using a
526 Roche Lightcycler and cycling conditions were as follows; initial denaturation at
527 94°C for 10min followed by 45 cycles of denaturation at 94°C for 10 sec, primer
528 annealing for 10 sec and primer extension at 72°C for 15 sec. Using the Roche
529 Lightcycler software, SYBR Green fluorescence measurements were plotted
530 relative to cycle number and the 2nd derivative maximum method was used to
531 determine the cycle threshold values (Ct) for each sample. Values for duplicate of
532 triplicate ChIP samples were calculated as %-input and copy number of artificial
533 CGIs was assessed relative to Sox2 qPCR signal.
534
535 Antibodies
536 α-H3K4me3 (Abcam-8580), α-H3K4me1(Abcam-8895), α-H3K27me3 (Millipore-
537 07-449), -H3K9/K14ac (Abcam 12179), -H3 (Abcam 1791), -SUZ12 (Abcam
538 12073-100), -RNA Pol II N20 (Santa-Cruz -899), -RNA Pol II S5P (Abcam
539 5131), -RNA Pol II unphosphorylated CTD (Abcam 817), -GFP (Chromotec
540 GFP-TRAP-A gta-20), -IgG (Invitrogen 10500C).
541
542 Bisulfite genomic sequencing
543 Bisulfite conversion of genomic DNA was carried out using the EpiTect Bisulfite
544 Kit from Qiagen. The converted DNA was used for PCR amplification of regions of
545 interest, Supplementary File 2 lists the CGI-like sequences used in this study. PCR
546 products were gel-purified and cloned using the Stratagene blunt end cloning kit.
547 Positive clones were sent for sequencing.
22 548
549 Acknowledgments
550 We thank Ann Dean, Dirk Schubeler and David Skalnik for sharing cell lines,
551 Sukhdeep Singh for bioinformatic insights and Martha Koerner, Matt Lyst and
552 Sabine Lagger for critical comments on the manuscript. The work was supported
553 by a Programme Grant from the Wellcome Trust. E.W. was funded by Wellcome
554 Trust 4 year PhD studentships and T.Q. holds a Marie Curie fellowship.
555
556 Author contributions
557 E.W. created and analysed cell lines and interpreted data. A.A created High CpG
558 Medium G+C cell lines and analysed and interpreted data. T.Q. created and
559 analysed cell lines in the beta-globin locus and analysed data. C.M. carried out
560 selection cassette excision, bisulfite sequencing, southern blot analysis and data
561 analysis and interpretation. S.W. provided extensive informatic support for
562 designing constructs and analysing native CGI chromatin. F.S. shared
563 unpublished cell lines and contributed ideas and analysis. E.W., T.Q. and A.B.
564 conceived the project and wrote the manuscript. All authors commented on the
565 final version of the manuscript.
566
567 Competing financial interests
568 The authors declare no competing financial interests.
569
570
23 571 References
572
573 Allen, M.D., Grummitt, C.G., Hilcenko, C., Min, S.Y., Tonkin, L.M., Johnson, C.M.,
574 Freund, S.M., Bycroft, M., and Warren, A.J. (2006). Solution structure of the
575 nonmethyl-CpG-binding CXXC domain of the leukaemia-associated MLL histone
576 methyltransferase. The EMBO journal 25, 4503-4512. 7601340 [pii]
577 10.1038/sj.emboj.7601340
578 Azuara, V., Perry, P., Sauer, S., Spivakov, M., Jorgensen, H.F., John, R.M., Gouti, M.,
579 Casanova, M., Warnes, G., Merkenschlager, M., et al. (2006). Chromatin signatures
580 of pluripotent cell lines. Nature cell biology 8, 532-538. 10.1038/ncb1403
581 Bernstein, B.E., Mikkelsen, T.S., Xie, X., Kamal, M., Huebert, D.J., Cuff, J., Fry, B.,
582 Meissner, A., Wernig, M., Plath, K., et al. (2006). A bivalent chromatin structure
583 marks key developmental genes in embryonic stem cells. Cell 125, 315-326.
584 Bird, A.P. (1986). CpG-rich islands and the function of DNA methylation. Nature
585 321, 209-213.
586 Blackledge, N.P., Farcas, A.M., Kondo, T., King, H.W., McGouran, J.F., Hanssen, L.L.,
587 Ito, S., Cooper, S., Kondo, K., Koseki, Y., et al. (2014). Variant PRC1 complex-
588 dependent H2A ubiquitylation drives PRC2 recruitment and polycomb domain
589 formation. Cell 157, 1445-1459. 10.1016/j.cell.2014.05.004
590 Blackledge, N.P., and Klose, R. (2011). CpG island chromatin: a platform for gene
591 regulation. Epigenetics 6, 147-152. 13640 [pii]
592 Blackledge, N.P., Zhou, J.C., Tolstorukov, M.Y., Farcas, A.M., Park, P.J., and Klose,
593 R.J. (2010). CpG islands recruit a histone H3 lysine 36 demethylase. Mol Cell 38,
594 179-190. 10.1016/j.molcel.2010.04.009
24 595 Carlone, D.L., Lee, J.H., Young, S.R., Dobrota, E., Butler, J.S., Ruiz, J., and Skalnik,
596 D.G. (2005). Reduced genomic cytosine methylation and defective cellular
597 differentiation in embryonic stem cells lacking CpG binding protein. Mol Cell Biol
598 25, 4881-4891. 25/12/4881 [pii]
599 10.1128/MCB.25.12.4881-4891.2005
600 Cierpicki, T., Risner, L.E., Grembecka, J., Lukasik, S.M., Popovic, R., Omonkowska,
601 M., Shultis, D.D., Zeleznik-Le, N.J., and Bushweller, J.H. (2010). Structure of the
602 MLL CXXC domain-DNA complex and its functional role in MLL-AF9 leukemia.
603 Nat Struct Mol Biol 17, 62-68. nsmb.1714 [pii]
604 10.1038/nsmb.1714
605 Clouaire, T., Webb, S., Skene, P., Illingworth, R., Kerr, A., Andrews, R., Lee, J.H.,
606 Skalnik, D., and Bird, A. (2012). Cfp1 integrates both CpG content and gene
607 activity for accurate H3K4me3 deposition in embryonic stem cells. Genes &
608 development 26, 1714-1728. 10.1101/gad.194209.112
609 Cross, S., Kovarik, P., Schmidtke, J., and Bird, A.P. (1991). Non-methylated islands
610 in fish genomes are GC-poor. Nucleic Acids Research 19, 1469-1474.
611 Deaton, A.M., and Bird, A. (2011). CpG islands and the regulation of transcription.
612 Genes & development 25, 1010-1022. 25/10/1010 [pii]
613 10.1101/gad.2037511
614 Denissov, S., Hofemeister, H., Marks, H., Kranz, A., Ciotta, G., Singh, S.,
615 Anastassiadis, K., Stunnenberg, H.G., and Stewart, A.F. (2014). Mll2 is required for
616 H3K4 trimethylation on bivalent promoters in embryonic stem cells, whereas
617 Mll1 is redundant. Development 141, 526-537. 10.1242/dev.102681
618 Farcas, A.M., Blackledge, N.P., Sudbery, I., Long, H.K., McGouran, J.F., Rose, N.R.,
619 Lee, S., Sims, D., Cerase, A., Sheahan, T.W., et al. (2012). KDM2B links the
25 620 Polycomb Repressive Complex 1 (PRC1) to recognition of CpG islands. Elife 1,
621 e00205. 10.7554/eLife.00205
622 00205 [pii]
623 Ficz, G., Hore, T.A., Santos, F., Lee, H.J., Dean, W., Arand, J., Krueger, F., Oxley, D.,
624 Paul, Y.L., Walter, J., et al. (2013). FGF signaling inhibition in ESCs drives rapid
625 genome-wide demethylation to the epigenetic ground state of pluripotency. Cell
626 Stem Cell 13, 351-359. 10.1016/j.stem.2013.06.004
627 Guenther, M.G., Levine, S.S., Boyer, L.A., Jaenisch, R., and Young, R.A. (2007). A
628 chromatin landmark and transcription initiation at most promoters in human
629 cells. Cell 130, 77-88.
630 Habibi, E., Brinkman, A.B., Arand, J., Kroeze, L.I., Kerstens, H.H., Matarese, F.,
631 Lepikhov, K., Gut, M., Brun-Heath, I., Hubner, N.C., et al. (2013). Whole-genome
632 bisulfite sequencing of two distinct interconvertible DNA methylomes of mouse
633 embryonic stem cells. Cell Stem Cell 13, 360-369. 10.1016/j.stem.2013.06.002
634 Hu, D., Garruss, A.S., Gao, X., Morgan, M.A., Cook, M., Smith, E.R., and Shilatifard, A.
635 (2013). The Mll2 branch of the COMPASS family regulates bivalent promoters in
636 mouse embryonic stem cells. Nat Struct Mol Biol 20, 1093-1097.
637 10.1038/nsmb.2653
638 Illingworth, R.S., Gruenewald-Schneider, U., Webb, S., Kerr, A.R., James, K.D.,
639 Turner, D.J., Smith, C., Harrison, D.J., Andrews, R., and Bird, A.P. (2010). Orphan
640 CpG islands identify numerous conserved promoters in the mammalian genome.
641 PLoS Genet 6. 10.1371/journal.pgen.1001134
642 Jia, D., Jurkowska, R.Z., Zhang, X., Jeltsch, A., and Cheng, X. (2007). Structure of
643 Dnmt3a bound to Dnmt3L suggests a model for de novo DNA methylation.
644 Nature 449, 248-251. 10.1038/nature06146
26 645 Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J.B.,
646 Kundaje, A., Liu, Y., Boyle, A.P., Zhang, Q.C., Zakharia, F., Spacek, D.V., et al. (2013).
647 Extensive variation in chromatin states across humans. Science 342, 750-752.
648 10.1126/science.1242510
649 Kilpinen, H., Waszak, S.M., Gschwind, A.R., Raghav, S.K., Witwicki, R.M., Orioli, A.,
650 Migliavacca, E., Wiederkehr, M., Gutierrez-Arcelus, M., Panousis, N.I., et al. (2013).
651 Coordinated effects of sequence variation on DNA binding, chromatin structure,
652 and transcription. Science 342, 744-747. 10.1126/science.1242463
653 Ku, M., Koche, R.P., Rheinbay, E., Mendenhall, E.M., Endoh, M., Mikkelsen, T.S.,
654 Presser, A., Nusbaum, C., Xie, X., Chi, A.S., et al. (2008). Genomewide analysis of
655 PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet
656 4, e1000242. 10.1371/journal.pgen.1000242
657 Lee, J.H., and Skalnik, D.G. (2005). CpG-binding protein (CXXC finger protein 1) is
658 a component of the mammalian Set1 histone H3-Lys4 methyltransferase
659 complex, the analogue of the yeast Set1/COMPASS complex. The Journal of
660 biological chemistry 280, 41725-41731.
661 Lee, J.H., Voo, K.S., and Skalnik, D.G. (2001). Identification and characterization of
662 the DNA binding domain of CpG-binding protein. The Journal of biological
663 chemistry 276, 44669-44676.
664 Lienert, F., Wirbelauer, C., Som, I., Dean, A., Mohn, F., and Schubeler, D. (2011).
665 Identification of genetic elements that autonomously determine DNA
666 methylation states. Nature genetics 43, 1091-1097. 10.1038/ng.946
667 ng.946 [pii]
668 Long, H.K., Sims, D., Heger, A., Blackledge, N.P., Kutter, C., Wright, M.L., Grutzner,
669 F., Odom, D.T., Patient, R., Ponting, C.P., et al. (2013). Epigenetic conservation at
27 670 gene regulatory elements revealed by non-methylated DNA profiling in seven
671 vertebrates. Elife 2, e00348. 10.7554/eLife.00348
672 Lynch, M.D., Smith, A.J., De Gobbi, M., Flenley, M., Hughes, J.R., Vernimmen, D.,
673 Ayyub, H., Sharpe, J.A., Sloane-Stanley, J.A., Sutherland, L., et al. (2012). An
674 interspecies analysis reveals a key role for unmethylated CpG dinucleotides in
675 vertebrate Polycomb complex recruitment. The EMBO journal 31, 317-329.
676 10.1038/emboj.2011.399
677 Marks, H., Kalkan, T., Menafra, R., Denissov, S., Jones, K., Hofemeister, H., Nichols,
678 J., Kranz, A., Stewart, A.F., Smith, A., et al. (2012). The transcriptional and
679 epigenomic foundations of ground state pluripotency. Cell 149, 590-604.
680 10.1016/j.cell.2012.03.026
681 McVicker, G., van de Geijn, B., Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A.,
682 Lewellen, N., Myrthil, M., Gilad, Y., and Pritchard, J.K. (2013). Identification of
683 genetic variants that affect histone modifications in human cells. Science 342,
684 747-749. 10.1126/science.1242429
685 Mendenhall, E.M., Koche, R.P., Truong, T., Zhou, V.W., Issac, B., Chi, A.S., Ku, M.,
686 and Bernstein, B.E. (2010). GC-rich sequence elements recruit PRC2 in
687 mammalian ES cells. PLoS Genet 6, e1001244. 10.1371/journal.pgen.1001244
688 Okano, M., Bell, D.W., Haber, D.A., and Li, E. (1999). DNA methyltransferases
689 Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian
690 development. Cell 99, 247-257.
691 Ooi, S.K., Qiu, C., Bernstein, E., Li, K., Jia, D., Yang, Z., Erdjument-Bromage, H.,
692 Tempst, P., Lin, S.P., Allis, C.D., et al. (2007). DNMT3L connects unmethylated
693 lysine 4 of histone H3 to de novo methylation of DNA. Nature 448, 714-717.
28 694 Stadler, M.B., Murr, R., Burger, L., Ivanek, R., Lienert, F., Scholer, A., van
695 Nimwegen, E., Wirbelauer, C., Oakeley, E.J., Gaidatzis, D., et al. (2011). DNA-
696 binding factors shape the mouse methylome at distal regulatory regions. Nature
697 480, 490-495. 10.1038/nature10716
698 nature10716 [pii]
699 Tanay, A., O'Donnell, A.H., Damelin, M., and Bestor, T.H. (2007). Hyperconserved
700 CpG domains underlie Polycomb-binding sites. Proc Natl Acad Sci U S A 104,
701 5521-5526. 10.1073/pnas.0609746104
702 Thomson, J.P., Skene, P.J., Selfridge, J., Clouaire, T., Guy, J., Webb, S., Kerr, A.R.,
703 Deaton, A., Andrews, R., James, K.D., et al. (2010). CpG islands influence
704 chromatin structure via the CpG-binding protein Cfp1. Nature 464, 1082-1086.
705 nature08924 [pii]
706 10.1038/nature08924
707 Voigt, P., LeRoy, G., Drury, W.J., 3rd, Zee, B.M., Son, J., Beck, D.B., Young, N.L.,
708 Garcia, B.A., and Reinberg, D. (2012). Asymmetrically modified nucleosomes. Cell
709 151, 181-193. 10.1016/j.cell.2012.09.002
710 Voigt, P., Tee, W.W., and Reinberg, D. (2013). A double take on bivalent
711 promoters. Genes & development 27, 1318-1338. 10.1101/gad.219626.113
712 Wu, X., Johansen, J.V., and Helin, K. (2013). Fbxl10/Kdm2b recruits polycomb
713 repressive complex 1 to CpG islands and regulates H2A ubiquitylation. Mol Cell
714 49, 1134-1146. 10.1016/j.molcel.2013.01.016
715
716
29 717 Figure legends
718
719 Figure 1. A novel bivalent chromatin domain is formed at promoter-less
720 artificial CGI-like sequences integrated within a gene desert in mouse ESCs.
721 (A) CpG frequency and G+C content of CGIs in the mouse genome (blue circles)
722 and an equivalent number of equal-sized (1000 base pair) random fragments of
723 bulk genomic DNA (red circles). (B) Map of human gene desert 2 (grey bars;
724 Chr1:81,106,616-81,153,886) showing the integration site of the Artificial CGI-
725 like construct (purple box). Black boxes at the ends indicate bacterial BAC
726 sequences. Black bars above indicate the position of Q-PCR amplicons (not to
727 scale). (C) Representative anti-H3K4me3 and H3K27me3 ChIP profiles
728 (normalized to H3 ChIP) and Suz12 ChIP profiles (% Input; n=3) for 3
729 independently transfected cell lines. Shaded box includes primers spanning the
730 Artificial CGI. ChIP control amplicons are derived from the TSS of the active
731 genes Sox2 (S) and GAPDH (G); the TSS of bivalent gene Hoxc8 (H) and an
732 inconspicuous negative control region on mouse chromosome 15 (-). Error bars
733 indicate the standard deviation of PCR replicates. (D) Bisulfite sequencing of the
734 3 cell lines shown in (C). In the map above, blue strokes show CpGs in the CGI-
735 like insert and the clear box indicates the bisulfite amplicon. Methylated and
736 unmethylated CpGs are depicted as filled and open circles, respectively. This
737 figure has two supplements:
738 Supplement 1 - Bivalent chromatin at artificial CGI-like sequences in mouse ESCs.
739 Supplement 2 - Synthetic DNA elements with different sequence properties.
740
741
30 742 Figure 2. H3K4me3 at a promoter-less artificial CGI forms independently of
743 Cfp1 and RNA polymerase II
744 (A) Map of gene desert 2 with integrated Artificial CGI-like construct labeled as
745 in Figure 1B. Representative ChIP with an antibody specific for the N-terminus of
746 RNA polymerase II for 3 independent cell lines (% Input over IgG; n=2). (B)
747 Representative anti-H3K9/K14 acetylation ChIP profiles for 3 independently
748 transfected cell lines normalized to H3 ChIP (n=2). (C) Mouse ES cells expressing
749 GFP-tagged Cfp1 were transfected with Artificial CGI construct and bound Cpf1
750 was assayed by ChIP with anti-GFP antibodies in 3 independent cell lines (n=2).
751 (D) Representative anti-H3K3me3 ChIP for three independent Cfp1-/- mouse ES
752 cells transfected with the Artificial CGI construct (n=2). Control ChIP amplicons
753 are as in Figure 1C. Error bars indicate standard deviation of PCR replicates. This
754 figure has one supplement:
755 Supplement 1 - H3K4me3 at an artificial CGI without Cfp1 or RNA polymerase II
756
757
758
759 Figure 3. High G+C content is not sufficient to create a bivalent chromatin
760 domain
31 761 Map of gene desert 2 showing the integration site of the Low CpG / High G+C (L-
762 CpG / H-G+C) construct labeled as in Figure 1B. Representative anti-H3K3me3
763 and H3K27me3 ChIP profiles (normalized to H3) and Suz12 ChIP profiles (%
764 Input; n=3) are shown for three independent transfected cell lines. Shaded bar
765 includes primers spanning the Low CpG / High G+C construct. Control ChIP
766 amplicons are as in Figure 1C. Error bars indicate standard deviation of PCR
767 replicates.
768
769
770 Figure 4. CpG-rich DNA sequences on an A+T-rich background fail to form
771 bivalent chromatin and reproducibly acquire DNA methylation
772 (A) Above: Map of gene desert 2 indicating the integration site of the High CpG /
773 Low G+C 1 (H-CpG / L-G+C 1) construct as in Figure 1B. Representative anti-
774 H3K4me3, H3K27me3 and Suz12 ChIPs shown (n=3) for each of two
775 independently transfected cell lines (3rd line not shown). The shaded bar
776 includes primers spanning the High CpG / Low G+C 1 construct. (B) Bisulfite
777 sequence analysis of the two cell lines shown in (A). Clear box indicates bisulfite
778 amplicon. In the map above, blue strokes show CpGs in the CGI-like insert and
779 the clear box indicates the bisulfite amplicon. Methylated and unmethylated
780 CpGs are depicted as filled and open circles, respectively. (C) The High CpG / Low
781 G+C construct was integrated into Dnmt 3a-/- Dnmt 3b-/- double mutant mouse
782 ES cells. Representative H3K4me3, H3K27me3 and Suz12 ChIPs are shown (n=3).
783 Upper right panel shows bisulfite sequence analysis of a cell line containing the
784 High CpG / Low G+C construct in Dnmt 3a/b -/- cells, presented as in panel (B).
785 (D) The relationship between G+C content of constructs analysed in this study
32 786 and their DNA methylation status. Data for Mecp2-eGFP and Nanog-PuroGFP
787 refer to cell lines reported previously (Thomson et al, 2010), but reanalyzed for
788 this study. (E) Diagrams depicting the influence of CGI sequence composition on
789 chromatin structure. Upper panel: Sequences with high CpG frequency and high
790 C+G content attract both H3K4 and H3K27 methyltransferase to establish
791 bivalent chromatin domains and they remain unmethylated. SET1A/1B and
792 MLL1/2 complexes contain CXXC domains that may target H3K4me3 to CGIs. The
793 mechanism by which the PRC2 complex is targeted is unknown. Middle panel:
794 Without CpGs, H3K4 and K27 methyltransferases are not recruited even when
795 the DNA is G+C-rich. Lower panel: A+T-rich DNA fails to form a bivalent
796 chromatin structure, even when the CpG density is high and is consistently
797 subject to de novo methylation.
798 This figure has the following supplements:
799 Supplement 1: CpG-rich, A+T-rich DNA sequences do not form bivalent
800 chromatin.
801 Supplement 2: DNA methylation at CpG-rich, A+T-rich DNA sequences blocks
802 bivalent chromatin.
803
804 Legends to Figure supplements
805
806 Figure 1 - figure supplement 1.
807 Bivalent chromatin at artificial CGI-like sequences in mouse ESCs.
808 Schematic representation of the experimental protocol for insertion of artificial
809 CGIs into the mouse ESC genome. A linearized plasmid containing the CGI like
810 sequence, a selection cassette flanked by FRT sites, a single LoxP site and 5’ and
33 811 3’ homology regions were integrated into a human gene desert within a BAC via
812 recombineering. The linearized BAC was then transfected into mouse ESCs.
813 Colonies containing the BAC were screened for clones with low copy number
814 integration and transfected with Flp to excise the selection cassette. Successful
815 excision was confirmed by PCR and Southern blotting. Selected clones were used
816 for analysis of chromatin modification and DNA methylation. (B) Diagram above
817 shows the CGI like sequence PuroGFP integrated into a human gene desert 1
818 (mChr18:36,042,881-36,175,341) and the position of primer pairs used for ChIP
819 (not to scale). Representative H3K4me3, H3K27me3 and Suz12 ChIP qPCR
820 experiments (n=2). The shaded box includes primers spanning the PuroGFP
821 construct. (C) Diagram above indicates the mouse ß-globin locus showing the
822 integration site of the Artificial CGI 2 construct. Black bars indicate position of
823 primers used for qPCR. H3K4me3, H3K27me3 and Suz12 ChIP analysis are
824 shown for a representative cell line (data for 2 other cell lines not shown).
825 Shading indicates primers spanning the Artificial CGI 2. (D) Bisulfite sequencing
826 of the Artificial CGI 2 construct. In the map above, blue strokes show CpGs in the
827 CGI-like insert and the clear box indicates the bisulfite amplicon. Methylated and
828 unmethylated CpGs are depicted as filled and open circles, respectively. ChIP
829 control amplicons as described in Figure 1C but including the TSS of the active
830 gene ß-actin. Error bars indicate standard deviation of PCR replicates.
831
832 Figure 1 - figure supplement 2.
833 Synthetic DNA elements with different sequence properties.
34 834 (A) Sequence profiles of CGI-like constructs flanked on either side by 1 kb of
835 human gene desert. Upper panels: % G+C plots; lower panels: CpGs-per-100bp
836 plots. X-axis length shows length in base pairs (bp).
837
838 Figure 1 – source data 1
839 Table showing properties of constructs used in these studies, including length,
840 base composition and CpG frequency (observed over expected CpG ratio; o/e) of
841 all tested constructs. Note that the o/e ratio takes the overall G+C content into
842 account and therefore the High CpG/ Low G+C sequences have a high o/e ratio.
843
844 Figure 2 - figure supplement 1.
845 H3K4me3 at an artificial CGI independent of Cfp1 and RNA polymerase II
846 (A) Scheme above shows gene desert 2 (mChr1:81,106,616-81,153,886) with
847 integration site of the Artificial CGI-like construct (see Figure 1B).
848 Representative ChIP with an antibody specific for the unphosphorylated C-
849 terminus of RNA Pol II and the Serine 5 phosphorylated C-terminus of RNA Pol II
850 for 3 independent cell lines (n=2). Shaded bar indicates primers spanning the
851 Artificial CGI. (B) Undifferentiated mouse ES cells were cultured for 4 days
852 without LIF and then another 4 days in the presence of retinoic acid (RA). All
853 panels photographed at 10x magnification. (C): H3K4me3 and H3K27me3 in
854 ESCs versus neural precursor cells for two independent cell lines (n=2). Error
855 bars indicate standard deviation of PCR replicates.
856
857 Figure 4 - figure supplement 1.
858 CpG-rich, A+T-rich DNA sequences do not form bivalent chromatin.s
35 859 (A & B) Above: diagrams (see Figure 1B) of gene desert 2 with (A) inserted A+T-
860 rich, CpG-rich construct #2 (H-CpG/L-G+C 2) and (B) medium-A+T, CpG-rich
861 construct (H-CpG/M-G+C). Below: representative anti-H3K3me3 and H3K27me3
862 ChIP normalized to H3 and Suz12 ChIP shown as enrichment over the negative
863 control region (n=3) for two independently transfected cell lines for each
864 construct. Shaded bars indicate primers spanning the CGI-like constructs. (C)
865 Scheme showing the mouse ß-globin locus (see Figure 1-figure supplement 1)
866 with integration site of the A+T-rich, CpG rich construct #3 (H-CpG/L-G+C 3)
867 construct. Black bars indicate position of primers used for Q-PCR. Representative
868 H3K4me3 and H3K27me3 ChIP analyses (data for cell lines 2 and 3 not shown).
869 Shaded bar indicates primers spanning the H-CpG/L-G+C 3 construct.
870
871 Figure 4 - figure supplement 2.
872 DNA methylation at CpG-rich, A+T-rich DNA sequences blocks bivalent
873 chromatin.
874 (A) Bisulfite sequence analysis of the H-CpG/L-G+C 2 and 3 and H-CpG/M-G+C
875 constructs. Clear box indicates bisulfite amplicon and blue vertical lines show
876 position of CpGs. Methylated CpGs are depicted as filled black circles,
877 unmethylated CpGs as empty white circles. (B) Bisulfite sequence analysis of IAP
878 elements in wt versus Dnmt 3a/3b knockout mouse ES cells. (C) Dnmt 3a/3b
879 knockout cells can form bivalent chromatin when transfected with the Artificial
880 CGI 1 (see Figure 1C). Representative H3K3me3 and H3K27me3 ChIP normalized
881 to H3 for 2 independent cell lines. (D) Partial loss of DNA methylation causes
882 increased H3K27me3 at the A+T-rich, CpG-rich CGI 1, but does not show elevated
883 H3K4me3. Wt mouse ESCs were transfected and grown for 10 days in either
36 884 normal medium or medium + 2i inhibitors. Representative H3K3me3 and
885 H3K27me3 ChIP normalized to H3 shown (second cell line not shown). (E)
886 Bisulfite sequencing of the H-CpG/L-G+C 1 after 10 days in medium +2i shows
887 reduced DNA methylation at the inserted DNA.
888
889 Figure 4 - figure supplement 3.
890 CpG density and CGI length at bivalent CGIs correlate positively with
891 H3K4me3 and H3K27me3 levels in mouse ESCs.
892 Bivalent CGIs identified in published ChIPseq analyses (Denissov et al., 2014;
893 Marks et al., 2012) were divided into four equal bins based on length or CpG
894 density and plotted against levels of H3K4me3 or H3K27me3 (read counts).
895
37 A B Gene desert 2 55.4 kb 1 2 3 4 8 9 10 11 1 kb
Artificial CGI 1 GC content 567 0.3 0.4 0.5 0.6 0.7
0.0 0.5 1.0 1.5 CpG o/e C Cell line 1 Cell line 2 Cell line 3 H3K4me3 ChIP
10 6 8 8 5 6 6 3 4 4 4 2 3 2 1.5 3 1.5
H3K4me3 / H3 / H3K4me3 1.0 2 1.0 H3K4me3 / H3 H3K4me3 / H H3K4me3 / 0.5 1 0.5 0.0 0 0.0 - 1 2 3 4 5 6 7 8 9 S G H - 1 2 3 4 5 6 7 8 9 S - 1 2 3 4 5 6 7 8 9 0 1 S G H 10 11 10 11 G H 1 1 Left Artificial Right Controls Left Artificial Right Controls Left Artificial Right Controls CGI 1 CGI1 CGI1
H3K27me3 ChIP
0.5 0.8 0.5
0.4 0.4 0.6
0.3 0.3 0.4 0.2 0.2 H3K27me3 / H3 H3K27me3 / H3K27me3 / H3 H3K27me3 /
H3K27me3 / H3 H3K27me3 / 0.2 0.1 0.1
0.0 0.0 0.0 1 2 3 4 5 6 7 8 9 - 1 2 3 4 5 6 7 8 9 S H - 1 2 3 4 5 6 7 8 9 S - 10 11 S G H 10 11 G 10 11 G H Left Artificial Right Controls Left Artificial Right Controls Left Artificial Right Controls CGI 1 CGI 1 CGI 1 Suz12 ChIP
8 8 6
6 6 4
4 4 % Input % Input % Input 2 2 2
0 0 0 1 2 3 4 5 6 7 8 9 - 1 2 3 4 5 6 7 8 9 S H 1 2 3 4 5 6 7 8 9 S - 10 11 S G H 10 11 G 10 11 G H Left Artificial Right Controls Left Artificial Right Controls Left Artificial Right Controls CGI 1 CGI 1 CGI 1
D Artificial CGI 1 CpGs
Cell line 1 Cell line 2 Cell line 3 9.2 % methylated 7.2 % methylated 12.4 % methylated
Wachter et al, 2014 Figure 1 A
Gene desert 2 55.4 kb 1 2 3 4 8 9 10 11 1 kb
Arti cial CGI 1 5 6 7
Cell line 1 Cell line 2 Cell line 3 RNA Pol II ChIP
5,00 0 8,00 0 1,00 0
4,00 0 6,00 0 800 G G G g 3,00 0 g
g 600 I I I / / / / 4,00 0 / / I I I I I I l l 2,00 0 l 400 o o o P P 2,00 0 P 1,00 0 200
0 0 0 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 S - 0 1 - 1 1 S G H 1 1 G H 1 2 3 4 5 6 7 8 9 1 1 S G H Left Arti cial Right Controls Left Arti cial Right Controls Left Arti cial Right Controls CGI 1 CGI 1 CGI 1
B H3ac ChIP
8 5 15 3 3 3 H H H
/ / / 4
6 c c c 10
14a 3 14a 14a K K K 4 / / / 9 9 9 2 K K K
3 5 3 3
2 H H H 1
0 0 0 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 1 S H 1 1 S H 1 1 S H Left Arti cial Right Controls Left Arti cial Right Controls Left Arti cial Right Controls CGI 1 CGI 1 CGI 1 C Cfp1 GFP - ChIP 0.5 0.2 0 0.1 5
0.4 0.1 5 t t t u u 0.1 0 u 0.3 np np np I I I
0.1 0 % % % 0.2 0.0 5 0.0 5 0.1
0.0 0.0 0 0.0 0 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 1 S G H 1 1 S G H 1 1 S G H Left Arti cial Right Controls Left Arti cial Right Controls Left Arti cial Right Controls CGI 1 CGI 1 CGI 1 D Cfp1 -/- ES cells
H3K4me3 ChIP 8 6 8 3 3 3 H H H
/ 6 / 6 / 3 3 3 4 e e e m m 4 m 4 4 4 4 K K K 3 3 3 2 H H 2 H 2
0 0 0 1 2 3 4 5 6 7 8 9 0 1 - 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 1 S G H 1 2 3 4 5 6 7 8 9 1 1 S G H 1 1 S G H Left Arti cial Right Controls Left Arti cial Right Controls Left Arti cial Right Controls CGI 1 CGI 1 CGI 1
Wachter et al, 2014 Figure 2 A Gene desert 2 55.4 kb 1 2 3 4 8 9 10 11 1 kb
Low CpG/ High G+C 5 6 7
Cell-line 1 Cell-line 2 Cell-line 3 H3K4me3 ChIP
20 50 1.4 1.2 3 3 40 3 15 H H H
1.0 / / / 30 3 3 3 e e e 0.8 10 20 m m m 4 4 4 1. 0 1.5 0.0 4 K K K 3 3 3 1.0 H H H 0. 5 0.0 2 0.5
0. 0 0.0 0.0 0 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 S - 1 1 S G H 1 1 S G H 1 1 G H Left L-CpG Right Controls Left L-CpG Right Controls Left L-CpG Right Controls H-G+C H-G+C H-G+C H3K27me3 ChIP 1.5 1.0 1.0 3 3 3 H H
H 0.8 0.8
/ / / 3 3 3 1.0 e e e 0.6 0.6 m m m 7 7 7 2 2 2 0.4
0.4 K K K 3
3 0.5 3 H H H 0.2 0.2
0.0 0.0 0.0 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 1 G S H 1 1 S G H 1 1 S G H Left L-CpG Right Controls Left L-CpG Right Controls Left L-CpG Right Controls H-G+C H-G+C H-G+C Suz12 ChIP
0.8 6 1.0
0.6 0.8 t t 4 t u u u 0.6 np np np I I I
0.4 % % % 0.4 2 0.2 0.2
0.0 0 0.0 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 S H - 1 1 G S H 1 1 G S H 1 1 G Left L-CpG Right Controls Left L-CpG Right Controls Left L-CpG Right Controls H-G+C H-G+C H-G+C
Wachter et al, 2014 Figure 3 A Gene desert 2 B 55.4 kb 1 2 3 4 8 9 10 11 High CpG / Low G+C 1 1 kb CpGs
High CpG/ Low G+C 1 Cell-line 1 Cell-line 2 93.3 % methylated 93.3 % methylated 5 6 7 Cell-line 1 Cell-line 2 H3K4me3 ChIP 8 10 3 3 8 H H
6 / / 3 3 6 e e m
m 4 4 4 K K 0.4 0.4 3 3 D H H 0.2 0.2
0.0 0.0 100 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 1 G H 1 1 G H Left H-CpG Right Controls Left H-CpG Right Controls L-G+C 1 L-G+C 1 80
H3K27me3 ChIP n
0.8 0.8 io t 60 la 3 3
y Artificial C GI 1 H H
h / t / 0.6 0.6 Artificial C GI 2 3 3 e e High C pG / Low G +C 1
Me 40
m m High C pG / Low G +C 2 7 7 0.4 0.4 % 2 2 High C pG / Low G +C 3 K K 3 3 20 High C pG / Med G +C H H 0.2 0.2 Mecp2-eGF P Nanog-P uroGF P 0.0 0.0 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 1 G H 1 1 G H 0 Left H-CpG Right Controls Left H-CpG Right Controls 30 40 50 60 70 L-G+C 1 L-G+C 1 G+C % Suz12 ChIP 0.8 2.5
2.0 E 0.6 t t u u 1.5 np np I I
0.4 SET1A/1B % % 1.0 MLL1/2 0.2 CFP1 0.5
0.0 0.0 CG AGC CG G C CG GT CG GT CG C 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 - 1 1 G H 1 1 G H ? ? Left H-CpG Right Controls Left H-CpG Right Controls JARID2
L-G+C 1 L-G+C 1 PRC2
C Dnmt 3a/3b -/- cells H3K 4me3 C hIP High CpG / Low G+C 1 10 CpGs MLL1/2 8 SET1A/1B
3 6
H CFP1
4
/ 5 % methylated
3 2 e GG ACCC AGCCC TGGGCCTGG m 4
K 0.0 4 ? 3 H 0.0 2 ? PRC2 0.0 0 1 2 3 4 5 6 7 8 9 0 1 JARID2 1 1 S G H - Left H-CpG Right Controls L-G+C 1 MLL1/2 S uz12 C hIP H3K 27me3 C hIP SET1A/1B 0.5 4 CFP1 DNMT 5
f 3A/3B 1 3
0.4 o
H m t /
r
n 3 3 e
e ? v e 0.3 o m
m CG AT TCG TATATTA CG AAT CG A h 2 7 c i 1 2 r ? 0.2 z K n
u 2 3 E
S ? H 0.1 PRC2 0.0 1 - JARID2 1 2 3 4 5 6 7 8 9 0 1 - 1 2 3 4 5 6 7 8 9 0 1 S G H 1 1 S G H 1 1 Left H-CpG Right Controls Left H-CpG Right Controls H3K4me3 Unmethylated CpG L-G+C 1 L-G+C 1 H3K27me3 Methylated CpG Wachter et al, 2014 Figure 4