bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 KDM2 constrain from CpG island promoters 2 independently of their histone demethylase activity

3 Anne H. Turberfield1, Takashi Kondo2, Manabu Nakayama3, Yoko Koseki2, Hamish W. King1, 4 Haruhiko Koseki2,4, Robert J. Klose*1

5 1 Department of Biochemistry, University of Oxford, Oxford, United Kingdom 6 2 Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences, 7 Yokohama, Japan 8 3 Department of Technology Development, Kazusa DNA Research Institute, Kisarazu, Japan 9 4 CREST, Japan Science and Technology Agency, Kawaguchi, Japan 10 *For correspondence: [email protected]

11

12 ABSTRACT

13 CpG islands (CGI) are associated with the majority of mammalian gene promoters and 14 function to recruit chromatin modifying enzymes. It has therefore been proposed that CGIs 15 regulate gene expression through chromatin-based mechanisms, however in most cases this 16 has not been directly tested. Here, we reveal that the histone H3 lysine 36 (H3K36) 17 demethylase activity of the CGI-binding KDM2 proteins contributes only modestly to the 18 H3K36me2-depleted state at CGI-associated gene promoters and is dispensable for normal 19 gene expression. Instead, we discover that KDM2 proteins play a widespread and 20 demethylase-independent role in constraining gene expression from CGI-associated gene 21 promoters. We further show that KDM2 proteins shape RNA Polymerase II occupancy but not 22 chromatin accessibility at CGI-associated promoters. Together this reveals a demethylase- 23 independent role for KDM2 proteins in transcriptional repression and uncovers a new function 24 for CGIs in constraining gene expression. 25

26 INTRODUCTION

27 The functionality of complex multicellular organisms is underpinned by the creation of diverse 28 cell types from a common genetic DNA blueprint. This is achieved during development by cells 29 acquiring and maintaining cell type-specific gene expression programmes. At the most basic 30 level, this relies on the control of RNA polymerase II (RNAPII)-mediated transcription by 31 transcription factors (Spitz and Furlong 2012). However, it has also become clear that 32 chromatin structure and its chemical modification can profoundly affect how transcription bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

33 initiates from promoters and how gene expression is controlled (Kouzarides 2007; Li et al. 34 2007). 35 36 One such chemical modification of chromatin occurs on DNA where a methyl group is added 37 to the 5 position of each cytosine in the context of CpG dinucleotides. CpG methylation is 38 pervasive in mammalian genomes and is generally associated with transcriptional repression, 39 particularly of repetitive and parasitic DNA elements (Klose and Bird 2006; Schübeler 2015). 40 However, short CpG-rich regions of the genome, called CpG islands (CGIs), remain free of 41 DNA methylation and are associated with the majority of mammalian gene promoters 42 (Saxonov et al. 2006; Illingworth and Bird 2009). CGIs have been proposed to regulate gene 43 expression (Blackledge and Klose 2011; Deaton and Bird 2011) through a family of ZF-CxxC 44 DNA binding domain-containing proteins that recognise non-methylated DNA and occupy 45 CGIs (Voo et al. 2000; Lee et al. 2001; Blackledge et al. 2010; Thomson et al. 2010). 46 Interestingly, most ZF-CxxC domain-containing proteins possess histone modifying activities 47 or are part of large chromatin modifying complexes, suggesting that these factors regulate 48 gene expression through chromatin (Long et al. 2013a). However, in most cases the 49 contribution of chromatin-based mechanisms to CGI-dependent gene regulation remains 50 untested. 51 52 The ZF-CxxC domain-containing lysine-specific demethylase 2A (KDM2A) and its 53 paralogue KDM2B bind to CGIs (Blackledge et al. 2010; Farcas et al. 2012; He et al. 2013; 54 Wu et al. 2013). KDM2 proteins encode a JmjC domain that catalyses the removal of H3K36 55 mono- and di- methylation (H3K36me1/2) (Tsukada et al. 2006; Fang et al. 2007; He et al. 56 2008; Cheng et al. 2014). H3K36me1/2 are broadly distributed throughout the mammalian 57 genome (Peters et al. 2003; Robin et al. 2007; Schotta et al. 2008) and H3K36me2 has been 58 proposed to counteract transcription initiation. For example, in yeast H3K36me2 inhibits 59 inappropriate initiation of transcription from cryptic promoters in (Carrozza et al. 2005; 60 Joshi and Struhl 2005; Keogh et al. 2005; Li et al. 2009a; McDaniel and Strahl 2017), and this 61 function may also be conserved in mammals (Xie et al. 2011; Carvalho et al. 2013). Given the 62 seemingly widespread and indiscriminate deposition of H3K36me1/2 in mammalian genomes 63 and its association with transcriptional repression, the discovery that KDM2 proteins localise 64 specifically to CGIs has led to the suggestion that the removal of H3K36me2 at these sites 65 may contribute to a widespread and transcriptionally permissive chromatin state at gene 66 promoters (Blackledge and Klose 2011; Deaton and Bird 2011). Depletion of KDM2A was 67 shown to cause an increase in H3K36me2 at a number of CGI-associated gene promoters, 68 suggesting that KDM2A plays an active role in H3K36me2 removal (Blackledge et al. 2010). 69 However, the histone demethylase activity of KDM2 proteins has also been linked to gene bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

70 repression, including of genes that have roles in cell proliferation, differentiation and 71 senescence (Frescas et al. 2007; He et al. 2008; Tzatsos et al. 2009; Tanaka et al. 2010; Du 72 et al. 2013; Yu et al. 2016). Therefore, the role that KDM2 proteins play in regulating 73 H3K36me1/2 and the effect that this has on CGI-associated gene transcription remain unclear. 74 75 KDM2 proteins may also regulate transcription through mechanisms that do not rely on their 76 demethylase activity. KDM2A has been implicated in the formation of pericentromeric 77 heterochromatin (Borgel et al. 2017), while KDM2B physically associates with polycomb 78 repressive complex 1 (PRC1) and is required for the formation of repressive polycomb 79 chromatin domains at a subset of CGI-associated gene promoters (Farcas et al. 2012; He et 80 al. 2013; Wu et al. 2013; Blackledge et al. 2014). The possibility that KDM2 proteins have 81 demethylase-independent activity is supported by the observation that the Kdm2a and Kdm2b 82 genes encode internal transcription start sites (TSS) downstream of their JmjC domain. 83 Transcription initiating from these alternative promoters gives rise to short forms of KDM2A 84 and KDM2B (KDM2A/B-SF, Figure 1A) that lack the JmjC domain and therefore cannot act as 85 histone demethylases (Tanaka et al. 2010; Long et al. 2013a). Importantly, however, they 86 retain their ZF-CxxC domain and CGI binding activity. The function of KDM2A/B-SF proteins 87 remains poorly defined, but there is evidence that the KDM2B-SF is sufficient to recruit PRC1 88 to chromatin (Blackledge et al. 2014). Following the depletion of either KDM2A or KDM2B, 89 alterations in gene expression have been reported (Blackledge et al. 2010; Farcas et al. 2012; 90 He et al. 2013; Blackledge et al. 2014; Boulard et al. 2015). However, whether these two 91 closely related paralogues function cooperatively to regulate gene expression is unknown and, 92 like many chromatin modifying enzymes, it remains largely untested whether they rely on their 93 enzymatic activity for gene regulation. Perhaps more fundamentally, whether the KDM2 94 proteins function primarily to potentiate or repress gene transcription has not been examined 95 at the genome-scale and remains a major conceptual barrier in understanding how CGIs, 96 which are associated with most vertebrate gene promoters, control gene transcription and 97 expression. 98 99 To address these fundamental questions, here we have exploited systematic conditional 100 genetic ablation strategies and detailed genome-wide analysis to dissect how KDM2 proteins 101 regulate H3K36me2 and gene expression in mouse embryonic stem cells (mESCs). 102 Remarkably, we discover that KDM2 proteins contribute only modestly to the H3K36me2- 103 depleted state at CGI-associated gene promoters and the demethylase activity of KDM2 104 proteins is largely dispensable for normal gene expression. In contrast, surgical removal of 105 the KDM2 ZF-CxxC domains, which liberates KDM2 proteins from CGIs, revealed a 106 widespread increase in gene expression. This was not limited to the function of KDM2B in bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

107 polycomb-mediated gene repression, but instead occurred broadly across CGI-associated 108 genes, revealing an unexpectedly widespread role for KDM2 proteins in constraining gene 109 expression. KDM2B plays the predominant role in gene repression, while KDM2A appears to 110 cooperate with KDM2B to counteract expression at a subset of genes. KDM2-dependent 111 effects on gene expression do not manifest through altered DNA accessibility at CGIs, but 112 instead appear to regulate RNAPII occupancy at gene promoters. Therefore, we define a new 113 demethylase-independent role for KDM2A/B in transcriptional repression, uncovering a new 114 logic whereby CGIs appear, unexpectedly, to constrain gene expression. 115

116 RESULTS

117 KDM2 proteins contribute modestly to the H3K36me2-depleted state at CGI- 118 associated gene promoters

119 KDM2A and KDM2B both catalyse H3K36me2 demethylation via their JmjC domain (Tsukada 120 et al. 2006; He et al. 2008) and localise to CGIs via their ZF-CxxC domain (Figure 1A) 121 (Blackledge et al. 2010; Farcas et al. 2012; He et al. 2013; Wu et al. 2013). However, whether 122 KDM2 proteins regulate H3K36me2 at CGI-associated gene promoters throughout the 123 genome has not been examined. Therefore, we generated a mESC system in which loxP sites 124 were inserted into the Kdm2a and Kdm2b genes flanking exons that encode the JmjC domain 125 (Kdm2a/b-JmjCfl/fl, Supplementary Figure 1A) and which also expresses a tamoxifen-inducible 126 form of CRE recombinase. Addition of tamoxifen triggers deletion of the JmjC domain- 127 containing exons, removing the long forms of KDM2A and KDM2B (KDM2-LFs) and their 128 associated demethylase activity (Figure 1B,C, Supplementary Figure 1B). Importantly, KDM2- 129 SFs, which are expressed from downstream promoters (Supplementary Figure 1A), were 130 unaffected by removal of the KDM2-LFs (Figure 1 B,C). We first investigated the contribution 131 of KDM2-LFs to global H3K36 methylation levels by western blot, and observed only minor 132 changes following removal of KDM2-LFs (Supplementary Figure 1C,D). Next, we examined 133 the genome-wide distribution of H3K36me2 using chromatin immunoprecipitation followed by 134 massively-parallel sequencing (ChIP-seq). This confirmed a local depletion of H3K36me2 at 135 CGI-associated gene promoters (Figure 1D,E) (Blackledge et al. 2010; Blackledge and Klose 136 2011; Deaton and Bird 2011). H3K36me2 depletion was not detected at non-CGI gene 137 promoters, demonstrating that this is a CGI-associated chromatin feature. Following removal 138 of the KDM2-LFs by tamoxifen treatment, there was a modest increase in H3K36me2 at the 139 TSS of CGI-associated gene promoters (Figure 1D,E). This demonstrates that KDM2A/B 140 contribute to the H3K36me2-depleted state at CGIs, in agreement with single-gene studies 141 examining the KDM2A- or KDM2B-depleted state (Blackledge et al. 2010). Interestingly, bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

142 intragenic CGIs were also depleted of H3K36me2 (Figure 1F), although this depletion was on 143 average less pronounced than at CGI promoters, likely due to their lower average CpG density 144 and size (Supplementary Figure 1E). Importantly, removal of the KDM2-LFs resulted in an 145 increase in H3K36me2 at intragenic CGIs indicating that their H3K36me2 depleted state is 146 also shaped by KDM2A/B.

147 Our ChIP-seq analysis revealed that KDM2-LFs contribute to the H3K36me2-depleted state 148 at CGI-associated promoters. However, we were curious whether the effects on H3K36me2 149 were uniformly distributed or dependent on other features of gene promoters, such as 150 transcriptional activity. Therefore, we separated genes based on expression level 151 (Supplementary Figure 1F) and examined H3K36me2 at genes and surrounding regions. This 152 revealed that CGI-associated TSSs were depleted of H3K36me2 irrespective of expression 153 level (Figure 1G). Chromatin surrounding CGI-associated TSSs was blanketed by H3K36me2, 154 consistent with this modification being pervasive in mammalian genomes. The increase in 155 H3K36me2 at the TSS following KDM2-LF removal was similar across all expression levels 156 (Figure 1H), consistent with the transcription-independent targeting of KDM2 proteins to CGI 157 promoters via their ZF-CxxC domains. Interestingly, highly transcribed genes were also 158 depleted of H3K36me2 in their gene body (Figure 1G). However, this was independent of 159 KDM2 demethylase activity and instead correlated with co-transcriptional conversion of 160 H3K36me2 to H3K36me3 (Figure 1I) (Bannister et al. 2005; Pokholok et al. 2005; Barski et al. 161 2007; Bell et al. 2007; Mikkelsen et al. 2007; Weiner et al. 2015). Together, these observations 162 reveal that KDM2 proteins remove H3K36me2 from CGIs, but unexpectedly their contribution 163 to the depletion of H3K36me2 at these sites is modest. This suggests that CGIs could be 164 inherently refractory to H3K36me2 or that additional H3K36 demethylases may also function 165 at these regions (see discussion).

166

167 KDM2 demethylase activity contributes minimally to gene regulation

168 Depletion of H3K36me2 at CGI-associated gene promoters has been proposed to contribute 169 to the generation of a transcriptionally permissive chromatin state (Blackledge et al. 2010; 170 Blackledge and Klose 2011; Deaton and Bird 2011). Although KDM2 proteins appear to 171 contribute only modestly to the H3K36me2-depleted state at CGIs (Figure 1), we were curious 172 whether this effect was nevertheless required to sustain normal chromatin accessibility and 173 transcription from CGI-associated gene promoters. To address these questions, we first 174 performed calibrated ATAC-seq (cATAC-seq) to measure chromatin DNA accessibility before 175 and after removal of KDM2-LFs. This demonstrated that CGI promoters remained accessible, 176 despite the observed increases in H3K36me2 (Figure 2A). To examine gene expression, we bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

177 performed calibrated nuclear RNA sequencing (cnRNA-seq). This revealed that the 178 expression of the vast majority of genes did not change following removal KDM2 demethylase 179 activity, with only a small number of genes being modestly perturbed (Figure 2B). Furthermore, 180 there was a poor correlation between gene expression changes and the effects on H3K36me2 181 at gene promoters (Figure 2C,D). This minimal perturbation to gene expression and chromatin 182 accessibility following loss of KDM2-LFs indicates that histone demethylase activity of KDM2 183 proteins is largely dispensable for normal CGI-associated promoter activity.

184

185 KDM2 proteins play a widespread role in gene repression

186 Given that gene expression was largely unaffected when KDM2 demethylase activity was 187 removed, we wondered whether demethylase-independent activities of KDM2 proteins may 188 play a more prominent role in gene regulation. KDM2A and KDM2B encode multiple isoforms, 189 each of which contain the ZF-CxxC DNA binding domain. Therefore, to remove all CGI- 190 targeted KDM2 proteins, we developed a conditional mESC system in which the exon 191 encoding the ZF-CxxC domain is flanked by loxP sites in both Kdm2a and Kdm2b genes 192 (Kdm2a/b-CXXCfl/fl), and which expresses tamoxifen-inducible Cre recombinase 193 (Supplementary Figure 3A). Following addition of tamoxifen the ZF-CxxC-encoding exons are 194 excised, producing KDM2A and KDM2B proteins that now lack the ZF-CxxC domain (Figure 195 3A). The effectiveness of this approach was evident from the loss of the ZF-CxxC domain and 196 an increased mobility of the KDM2 proteins in western blot analysis (Figure 3B) and from loss 197 of binding to CGIs in ChIP analysis (Supplementary Figure 3B).

198 To examine whether this loss of CGI binding had an effect on gene expression, we carried out 199 cnRNA-seq and compared gene expression between untreated and tamoxifen treated cells. 200 This revealed that KDM2 protein removal resulted in more than a fifth of all genes showing 201 significantly increased expression (Figure 3C,D). Owing to the quantitative nature of cnRNA- 202 seq it was also apparent that KDM2 protein removal led to a more general increase in gene 203 expression, even amongst genes that were not considered significantly changed by statistical 204 analysis (Figure 3E). We validated these widespread effects using highly sensitive and 205 quantitative digital droplet PCR analysis (Supplementary Figure 3C). Importantly, our capacity 206 to uncover this broad increase in gene expression was only possible due to the use of 207 calibrated nuclear RNA-seq (cnRNA-seq) as conventional normalisation based on total read 208 count fails to uncover this pervasive alteration in gene expression (Supplementary Figure 3D). 209 When we examined in more detail the transcripts with significantly increased expression these 210 were enriched for CGI-associated genes (Figure 3F), consistent with these effects being a 211 direct result of KDM2 protein removal as opposed to a global perturbation of some core bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

212 transcriptional component. In contrast, significantly downregulated genes were less numerous 213 and not enriched for CGI-associated genes, suggesting they may correspond to secondary 214 effects. Together, these observations establish an unexpected and widespread role for KDM2 215 proteins in suppressing expression from CGI-associated gene promoters.

216

217 Elevated gene expression following KDM2 protein removal is not simply a 218 consequence of polycomb target gene reactivation

219 We have previously shown that KDM2B plays an important role in recruiting the PRC1 220 complex to CGI-associated gene promoters in mESCs. It does so by interacting with the PRC1 221 complex via the specialised adaptor protein PCGF1 which links KDM2B to RING1B, the 222 catalytic core of PRC1 (Farcas et al. 2012; He et al. 2013; Wu et al. 2013; Blackledge et al. 223 2014). When we examined the genes that increased in gene expression following KDM2 224 protein removal they had stereotypical CGI-associated features (Figure 4A), but were also 225 enriched for KDM2B, RING1B and SUZ12. This raised the possibility that the observed effects 226 on gene expression following KDM2 protein removal simply resulted from loss of KDM2B- 227 dependent targeting and gene repression by the PRC1 complex. To investigate this possibility, 228 we compared the gene expression changes following KDM2 protein removal with those 229 following conditional removal of PCGF1 (Fursova 2019). This revealed that removal of PCGF1 230 caused de-repression of more than four times fewer genes than removal of KDM2 proteins 231 (Figure 4B). Furthermore, genes that significantly increased in expression following PCGF1 232 removal were more strongly enriched for polycomb target genes than those that significantly 233 increased following KDM2 removal (Figure 4C). Genes showing increased expression 234 following PCGF1 removal were a subset of those showing increased expression following 235 KDM2 protein removal (Figure 4D), and there was only a moderate positive correlation 236 between the gene expression changes in these lines (Supplementary Figure 4A). These 237 observations indicate that a small proportion of the gene de-repression events in cells where 238 KDM2 proteins are removed are related to the activity of the KDM2B-PRC1 complex.

239 Building on this important observation, we examined in more detail the gene expression 240 changes that manifest from KDM2 protein removal from CGIs. From this it was evident that 241 genes with low starting expression level more strongly increased in expression, including non- 242 polycomb target genes (Figure 4E). analysis revealed that genes that 243 significantly increased in expression were enriched for a variety of developmental terms 244 (Figure 4F), consistent with some of the effects being related to the polycomb system, but also 245 a variety of basic cellular processes that are unrelated (Figure 4G). This reflects the 246 generalised increase in gene expression that occurs following KDM2 removal. Together, these bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

247 observations reveal that KDM2 proteins play a widespread role in gene repression from CGI- 248 associated gene promoters and do so largely through mechanisms that are independent of 249 the polycomb repressive system.

250

251 KDM2B plays the predominant role in gene repression

252 Loss of both KDM2A and KDM2B from CGI chromatin simultaneously resulted in widespread 253 increases in gene expression (Figure 3). However, it was unclear from these experiments 254 whether KDM2A, KDM2B or both contribute to gene repression. To examine this question, we 255 developed a conditional mESC system in which we could remove KDM2A alone by tamoxifen- 256 induced deletion of its ZF-CxxC domain (Kdm2a-CXXCfl/fl, Figure 5A, Supplementary Figure 257 5A and see Supplementary Figure 3A). cnRNA-seq revealed that removal of KDM2A led to 258 virtually no changes in gene expression (Figure 5B), indicating that KDM2A alone is not 259 required to maintain normal gene expression in mESCs. We next investigated the contribution 260 of KDM2B to the regulation of gene expression, performing cnRNA-seq using a Kdm2b- 261 CXXCfl/fl mESC line (Blackledge et al. 2014)(Figure 5A, Supplementary Figure 5B). cnRNA- 262 seq revealed that removal of KDM2B alone was sufficient to cause increases in the expression 263 of thousands of genes (Figure 5B) and, unlike KDM2A removal, largely recapitulated the 264 widespread increases in gene expression that occurred following KDM2A/B removal (Figure 265 5C). Genes that significantly increased in expression following KDM2B removal were enriched 266 for CGI-associated genes (Figure 5D). Furthermore, gene ontology analysis revealed that 267 these significantly increasing genes were enriched for a variety of developmental terms 268 characteristic of polycomb target genes (Figure 5E) but also terms relating to basic cellular 269 processes (Figure 5F). These observations suggest that removal of KDM2B alone, like 270 removal of KDM2A/B together, leads to widespread increases in the expression of CGI- 271 associated genes.

272 A comparison of the gene expression changes following KDM2B removal alone and KDM2A/B 273 removal together revealed good overall correlation (Figure 5G), indicating that the gene 274 expression changes following KDM2B removal largely recapitulated those following removal 275 of KDM2A/B together. However, a more detailed analysis revealed 931 significantly increasing 276 genes that less strongly increased in expression following loss of KDM2B compared to 277 KDM2A/B together, and this set was enriched for genes with low expression level (Figure 5H). 278 This suggests that KDM2A plays a role in restricting the expression of these genes following 279 KDM2B removal. Together our findings demonstrate that KDM2B plays the predominant role 280 in repressing gene expression, while KDM2A may cooperate with KDM2B to counteract 281 expression at a subset of genes. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

282

283 KDM2 proteins regulate polymerase occupancy but not chromatin accessibility at CGIs

284 Gene regulatory elements and gene promoters are characterised by elevated chromatin 285 accessibility (Boyle et al. 2008; Song et al. 2011; Thurman et al. 2012), and this is thought to 286 play an important role in regulating gene expression. Accessibility at CGI-associated gene 287 promoters broadly correlates with transcriptional output, with the promoters of highly 288 transcribed CGI-associated genes being more highly accessible (King et al. 2018). Therefore 289 we wondered whether the increases in gene expression following removal of KDM2 proteins 290 resulted from increases in the accessibility at CGI-associated gene promoters in the absence 291 of KDM2A/B. To test this we carried out cATAC-seq following removal of KDM2 proteins. 292 Importantly, we did not observe any significant change in the accessibility of CGI-associated 293 gene promoters (Figure 6A, Supplementary Figure 6A), indicating that expression changes 294 must manifest through effects on transcription that are independent of chromatin accessibility.

295 To examine this possibility in more detail, we carried out cChIP-seq for RNAPII before and 296 after removal of KDM2 proteins. This revealed on average a widespread decrease in RNAPII 297 occupancy at the TSSs of CGI-associated genes (Figure 6B,C). In the gene body, alterations 298 in RNAPII occupancy appeared to be related to the level of RNAPII reduction at the gene 299 promoter. Genes that retained promoter associated RNAPII showed increased RNAPII in the 300 gene body and promoters showing reduced RNAPII levels also had moderately reduced 301 RNAPII in the gene body (Figure 6D). Importantly, these effects were restricted to CGI- 302 associated genes, in agreement with the function of KDM2A/B at CGIs. To examine in more 303 detail the nature of the defects in RNAPII function at CGI-associated gene promoters, we 304 calculated the RNAPII pausing index, which is often used a proxy for RNAPII pause-release 305 (Supplementary Figure 6B). This showed a modest but clear decrease in pausing index when 306 KDM2 proteins were removed, and importantly this effect was not observed for non-CGI- 307 associated genes (Figure 6E). This suggests that removal of KDM2 proteins may contribute 308 to an increased rate of RNAPII pause release from CGI-associated gene promoters. A 309 comparison of the changes in RNAPII occupancy with the gene expression changes following 310 KDM2 removal revealed a moderate positive correlation (Supplementary Figure 6C), such that 311 the genes that most strongly increased in expression retained RNAPII at their promoter and 312 had elevated RNAPII occupancy throughout the gene body (Supplementary Figure 6D,E). 313 These effects on RNAPII could be explained by an increase in the rate of transcription initiation 314 at genes that show large increases in expression which, when combined with an increased 315 rate of pause release, results in the accumulation of RNAPII throughout the gene body and 316 increases in transcript levels. Importantly, the change in the distribution of Ser5- bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

317 phosphorylated RNAPII, which is enriched at promoter regions, and Ser2-phosphorylated 318 RNAPII, which is enriched throughout gene bodies, resembled that of total RNAPII 319 (Supplementary Figure 6F). There was no obvious shift in the position of either Ser2- or Ser5- 320 phosphorylated RNAPII enrichment throughout CGI genes (Figure 6F), and only minor 321 changes in the relative enrichment of Ser5- or Ser2-phosphorylated RNAPII compared to total 322 RNAPII at CGI gene promoters and gene bodies, respectively (Figure 6G). Together our 323 findings suggest that KDM2 proteins play a role in regulating RNAPII activity at gene 324 promoters, potentially by limiting initiation and pause release to constrain productive 325 transcription from these regions of the genome.

326

327 DISCUSSION

328 Chromatin modifying complexes are thought to play central roles in regulating gene expression 329 through their enzymatic activities. Yet, for most of these complexes, the importance of their 330 histone modifying activities in gene regulation remains to be tested. KDM2 histone 331 demethylases have been proposed to contribute to an H3K36me2-depleted and 332 transcriptionally permissive chromatin state at CGI-associated gene promoters. Alternatively, 333 they have also been suggested to contribute to gene repression in some specific instances. 334 However, the extent to which KDM2 proteins regulate gene expression and how this is related 335 to their H3K36me2 demethylase activity has remained untested. Here, using combinatorial 336 genetic perturbation and detailed genome-wide approaches, we discover that the histone 337 demethylase activity of KDM2 proteins contributes modestly to the H3K36me2 depletion at 338 CGI-associated gene promoters (Figure 1) and has minimal effects on gene expression 339 (Figure 2). In contrast, using calibrated gene expression analysis we discover an unexpectedly 340 widespread histone demethylase-independent role for KDM2 proteins in constraining the 341 expression of CGI-associated genes (Figure 3). Importantly, repression by KDM2 proteins is 342 not limited to polycomb target genes, which are known to be regulated by the KDM2B-PRC1 343 complex (Figure 4). Nevertheless, we find that KDM2B plays the predominant role in 344 repressing expression, while KDM2A appears to contribute at a subset of genes (Figure 5). 345 Finally, the effects of KDM2 proteins on gene expression are not mediated through changes 346 in chromatin accessibility, but instead KDM2 proteins appear to play a role in constraining 347 RNAPII occupancy and possibly pause release at CGI-associated gene promoters to limit 348 transcription (Figure 6). Together, this demonstrates that KDM2 proteins regulate gene 349 expression independently of their histone demethylase activity and through mechanisms that 350 appear to regulate RNAPII function at CGI-associated gene promoters. These discoveries bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

351 reveal an interesting new chromatin modification-independent role for CGIs and the KDM2 352 proteins in constraining gene expression.

353 Our understanding of how histone modification states are specified and regulated remains 354 poorly understood. In the context of histone H3K36, NSD1-3 and ASH1L, the main 355 H3K36me1/2 methyltransferases, can associate with gene promoters and genic regions, and 356 H3K36me2 blankets most of the genome (Gregory et al. 2007; Lucio-Eterovic et al. 2010; Kuo 357 et al. 2011; Rahman et al. 2011; Ram et al. 2011; Shen et al. 2015). However, our genome- 358 wide profiling of H3K36me2 reveals that the bodies of highly transcribed genes and CGI- 359 associated gene promoters are exceptions to this, being uniquely depleted of H3K36me2. This 360 suggests that mechanisms must function to shape H3K36me2 at distinct regions of the 361 genome. The depletion of H3K36me2 in highly transcribed gene bodies is likely due to 362 conversion to H3K36me3 by the SETD2 protein, which interacts with RNAPII and functions as 363 an H3K36 trimethyltransferase in gene bodies during transcriptional elongation (Strahl et al. 364 2002; Krogan et al. 2003; Li et al. 2003; Schaft et al. 2003; Xiao et al. 2003; Kizer et al. 2005; 365 Li et al. 2005; Sun et al. 2005; Edmunds et al. 2008). We and others had previously proposed 366 that the H3K36me2-depleted state at CGI-associated gene promoters relies on KDM2 proteins 367 to actively remove H3K36me2 from these regions. Now, using a cell system where we can 368 induce the removal of KDM2 demethylase activity, we show that KDM2 enzymes contribute 369 modestly to depletion of H3K36me2 at CGI-associated gene promoters. This suggests that 370 the activity of NSD/ASH1L may be inhibited, or that other histone demethylases may 371 compensate for the loss of KDM2 enzymes, at these regions of the genome. The latter of 372 these two possibilities seems the most likely, as KDM4A-C demethylases catalyse the removal 373 of H3K36me2/3 (Cloos et al. 2006; Fodor et al. 2006; Klose et al. 2006; Whetstine et al. 2006) 374 and also associate broadly with gene promoters (Pedersen et al. 2014; Pedersen et al. 2016). 375 Therefore, determining whether the H3K36me2-depleted state at CGI-associated gene 376 promoters results from active removal of this modification and contributes to gene regulation 377 awaits combinatorial removal of KDM2 and KDM4 demethylase activity.

378 When studied in the context of individual genes, KDM2A and KDM2B have been proposed to 379 function in both gene activation and repression. However, which of these activities is most 380 prevalent and whether these paralogous proteins function together to achieve appropriate 381 gene regulation have remained unknown. Using combinatorial inducible genetic perturbation 382 strategies and calibrated RNA-seq we now reveal that KDM2 proteins function primarily as 383 repressors of gene expression and elicit their effects via a demethylase-independent 384 mechanism. Gene repression by KDM2 proteins is remarkably widespread but largely 385 restricted to CGI-associated genes, in agreement with the occupancy of KDM2 proteins at 386 these regions of the genome through their ZF-CxxC DNA binding domain. However, lowly bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

387 expressed genes were more susceptible to increases in gene expression when KDM2 proteins 388 were removed. Therefore, KDM2 proteins may function to generically constrain transcription 389 from CGI-associated gene promoters, but only function to counteract low-level activation 390 signals. In agreement with this suggestion, CGI-associated genes that are already highly 391 expressed are largely unaffected by KDM2 loss, despite the fact that KDM2 proteins occupy 392 their promoters. In the context of these observations, we propose that the repressive activity 393 of KDM2 proteins may effectively create a CGI-imposed barrier to gene activation which 394 protects against low-level or inappropriate gene activation signals. This could be particularly 395 important in the context of cellular differentiation where excessive gene expression noise or 396 precocious gene activation may have deleterious consequences for the highly orchestrated 397 cascade of gene expression events that lead to appropriate acquisition of new cell fates.

398 Gene repression by the KDM2 proteins occurs independently of their JmjC domain and histone 399 demethylase activity, raising the interesting question of how they repress gene expression. By 400 examining the binding of RNAPII and its modified forms throughout the genome following loss 401 of KDM2 proteins, we discover that there is a widespread reduction in RNAPII occupancy at 402 CGI-associated gene promoters. This suggests that one activity of KDM2 proteins at CGIs 403 may be to constrain productive transcription, perhaps through a process that directly regulates 404 RNAPII pause release. It is intriguing to note that, on average, RNAPII occupancy moderately 405 decreased in gene bodies following loss of KDM2 proteins, despite a widespread increase in 406 gene expression. This raises the possibility that there are further alterations to RNAPII 407 behaviour, such as elongation rate, following KDM2 removal. Based on these observations, 408 an important area of future work will be to examine the mechanisms by which KDM2 proteins 409 affect RNAPII activity and to determine how direct this is.

410 We speculate that one mechanism by which KDM2 proteins could potentially modulate 411 RNAPII-dependent transcription processes is through ubiquitination. This is because, in 412 addition to their JmjC domains, KDM2 proteins also encode FBOX and LRR domains. The 413 FBOX binds a protein called SKP1, and we and others have previously shown that KDM2A 414 and KDM2B both interact with SKP1 (Gearhart et al. 2006; Koyama-Nasu et al. 2007; Farcas 415 et al. 2012; Tan et al. 2013). SKP1 is a central component of SCF-type E3 ligase 416 complexes (Cardozo and Pagano 2004), while FBOX-containing proteins are thought to confer 417 substrate specificity for SCF complexes through additional domains such as the LRR domain 418 (Ho et al. 2006). This suggests that KDM2 proteins might identify target proteins for 419 ubiquitylation. Indeed, KDM2B was reported to ubiquitylate the transcription factor c-Fos, 420 leading to its degradation by the proteasome (Han et al. 2016). KDM2A has also been 421 proposed to possess E3 ligase activity, as its overexpression stimulates 53BP1 ubiquitylation 422 (Bueno et al. 2018). The specificity of these putative KDM2 E3 ubiquitin ligase complexes bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

423 remains to be investigated. However, given that KDM2 proteins act broadly to repress gene 424 expression and may regulate RNAPII activity, one might envisage that KDM2 proteins could 425 regulate a component of the core transcriptional machinery or another general modulator of 426 gene transcription. Therefore, in future work it will be interesting to explore whether KDM2 427 proteins have a role in proteostasis at CGIs and to understand whether this contributes to their 428 function in the repression of gene expression and the regulation of RNAPII activity.

429 In conclusion, we discover that KDM2 proteins are CGI-specific transcriptional repressors that 430 appear to function to constrain low-level gene activation signals. Interestingly, DNA situated 431 in CGIs is known to be highly accessible, differentiating it from much of the rest of the genome. 432 It has been proposed that this accessibility highlights the location of gene regulatory elements 433 within large and complex vertebrate genomes, and allows transcriptional regulators and the 434 transcriptional machinery to more easily access the underlying DNA and enable gene 435 expression. However, an unintended consequence of this CGI-associated accessibility may 436 be that it renders these regions susceptible to low-level and potentially inappropriate gene 437 activation signals. We speculate that, in response to this potentially deleterious side effect of 438 CGI accessibility, KDM2 proteins may have evolved to bind CGIs and constrain transcription. 439 Indeed, we show that loss of KDM2 proteins does not affect accessibly at CGIs but does 440 broadly affect gene expression and RNAPII occupancy. Therefore we propose that CGIs 441 create an appropriate balance of transcriptionally permissive and restrictive activities to help 442 control gene expression.

443

444 FIGURE LEGENDS

445 Figure 1 – KDM2 proteins contribute modestly to the H3K36me2-depleted state at CGI- 446 associated gene promoters.

447 (A) A schematic illustrating protein domain architecture for KDM2A/B long (LF) and short 448 isoforms (SF). 449 (B) A schematic of the Kdm2a/b-JmjCfl/fl system in which addition of tamoxifen (OHT) leads 450 to removal of KDM2 long isoforms. 451 (C) Western blot analysis for KDM2A and KDM2B in Kdm2a/b-JmjCfl/fl mESCs before 452 (UNT) and after 96 hours of tamoxifen (OHT) treatment. BRG1 is shown as a loading 453 control for both blots. Asterisks indicate non-specific bands. 454 (D) Heatmaps of H3K36me2 enrichment (ChIP-seq) in Kdm2a/b-JmjCfl/fl mESCs before 455 (UNT) and after addition of tamoxifen (OHT), for CGI-associated (n=14106) and non- bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

456 CGI-associated (n=6527) gene promoters. H3K36me2 signal was normalised to H3 457 ChIP-seq to control for any alterations in nucleosome density. 458 (E) A metaplot of normalised H3K36me2 ChIP-seq signal at CGI-associated or non-CGI- 459 associated gene promoters in Kdm2a/b-JmjCfl/fl mESCs, before (UNT) and after 460 tamoxifen treatment (OHT). 461 (F) A metaplot of normalised H3K36me2 ChIP-seq signal at intragenic CGIs in Kdm2a/b- 462 JmjCfl/fl mESCs, before (UNT) and after tamoxifen treatment (OHT). 463 (G) Metaplots showing normalised H3K36me2 ChIP-seq signal throughout the gene body 464 before (UNT) and after tamoxifen treatment (OHT), for CGI-associated genes 465 separated into quartiles according to their expression level in Kdm2a/b-JmjCfl/fl mESCs 466 (Q1 < Q2 < Q3 < Q4). Genes were scaled to the same length and aligned at their TSS 467 and TES. 468 (H) A boxplot showing fold change in normalised H3K36me2 ChIP-seq signal following 469 tamoxifen treatment, for the CGI-associated gene quartiles shown in (G) and for non- 470 CGI-associated genes. 471 (I) A metaplot showing H3K36me3 enrichment throughout the gene body for the gene 472 sets shown in (G) (Brookes et al. 2012). 473 474 Figure 2 – KDM2 demethylase activity contributes minimally to gene regulation. 475 (A) An MA-plot showing log2 fold change in the accessibility (cATAC-seq) of CGI- 476 associated gene promoters in Kdm2a/b-JmjCfl/fl mESCs following tamoxifen treatment. 477 No promoters significantly changed in accessibility (p-adj < 0.05 and > 1.4-fold). 478 (B) An MA-plot showing log2 fold change in gene expression (cnRNA-seq) in Kdm2a/b- 479 JmjCfl/fl mESCs following tamoxifen treatment. The number of genes with significantly 480 increased or decreased expression (p-adj < 0.05 and > 1.4-fold) is shown in red and 481 density of gene expression changes is shown on the right. 482 (C) A scatter plot comparing the log2 fold change in gene expression (cnRNA-seq) with 483 the log2 fold change in normalised H3K36me2 ChIP-seq signal for CGI-associated 484 genes following tamoxifen treatment of Kdm2a/b-JmjCfl/fl mESCs. The solid line shows 485 the linear regression, and the coefficient of determination (R2) and Spearman 486 correlation coefficient (Cor) are annotated. 487 (D) Boxplots showing the log2 fold change in cnRNA-seq signal (left) and normalised 488 H3K36me2 ChIP-seq signal (right) for CGI-associated genes grouped into quintiles 489 based on their change in expression following tamoxifen treatment of Kdm2a/b-JmjCfl/fl 490 mESCs. 491 492 Figure 3 – KDM2 proteins mediate widespread gene repression. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

493 (A) A schematic of the Kdm2a/b-CXXCfl/fl system in which addition of tamoxifen (OHT) 494 leads to the generation of KDM2 proteins that lack the ZF-CxxC domain and therefore 495 are unable to bind to chromatin. 496 (B) Western blot analysis for KDM2A and KDM2B in Kdm2a/b-CXXCfl/fl mESCs before 497 (UNT) and after 96 hours tamoxifen (OHT) treatment. BRG1 is shown as a loading 498 control for both blots. Asterisks indicate non-specific bands. 499 (C) Genomic snapshots showing gene expression (cnRNA-seq) before (UNT) and after 500 tamoxifen (OHT) treatment of Kdm2a/b-CXXCfl/fl mESCs, for representative genes 501 that moderately (Fbxl20, left) or more dramatically increased in expression (Fosl2, 502 right). BioCAP and KDM2A and KDM2B ChIP-seq signal are shown for reference 503 (Farcas et al. 2012; Long et al. 2013b; Blackledge et al. 2014) 504 (D) An MA-plot showing log2 fold change in gene expression (cnRNA-seq) in Kdm2a/b- 505 CXXCfl/fl mESCs following tamoxifen treatment. The number of genes with significantly 506 increased or decreased expression (p-adj < 0.05 and > 1.4-fold) are shown in red and 507 density of gene expression changes is shown on the right. 508 (E) A density plot showing the distribution of the log2 fold change in gene expression 509 following tamoxifen treatment of Kdm2a/b-CXXCfl/fl mESCs, for all genes. 510 (F) A bar graph showing the proportion of genes that have a CGI promoter, for all genes 511 and genes that significantly increased or decreased in expression following tamoxifen 512 treatment of Kdm2a/b-CXXCfl/fl mESCs. 513 514 Figure 4 – KDM2-mediated repression is not limited to polycomb target genes. 515 (A) Metaplots showing enrichment of BioCAP signal and KDM2A, KDM2B, RING1B and 516 SUZ12 ChIP-seq signal at the TSS of all CGI-associated genes (n=14106, green) and 517 of the subset of these genes that significantly increased in expression following 518 tamoxifen treatment of Kdm2a/b-CXXCfl/fl mESCs (n=3879, red) (Farcas et al. 2012; 519 Long et al. 2013b; Blackledge et al. 2014). 520 (B) Left: an MA-plot showing log2 fold change in gene expression (cnRNA-seq) in Pcgf1fl/fl 521 mESCs following tamoxifen treatment to induce PCGF1 knockout (Fursova 2019). The 522 number of genes with significantly increased or decreased expression (p-adj < 0.05 523 and > 1.4-fold) is shown in red and density of gene expression changes is shown on 524 the right. Right: as (B) but for Kdm2a/b-CXXCfl/fl mESCs, shown for comparison. 525 (C) A bar graph comparing the distribution of genes into three classes – non-CGI, 526 polycomb (PRC) occupied and non-PRC occupied – for all genes and for genes that 527 that significantly increased in expression following tamoxifen treatment of Pcgf1fl/fl or 528 Kdm2a/b-CXXCfl/fl mESCs. Non-CGI genes are genes that lack a CGI at their 529 promoter. Non-PRC-occupied genes have a CGI promoter that is not bound by bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

530 polycomb complexes, while PRC-occupied genes have a CGI promoter that is bound 531 by polycomb complexes. 532 (D) A Venn diagram showing the overlap between genes that significantly increased in 533 expression following tamoxifen treatment of Pcgf1fl/fl and Kdm2a/b-CXXCfl/fl mESCs. 534 (E) A box plot showing the starting expression level (log2 UNT RPKM) for genes grouped 535 into deciles based on their log2 fold change in expression following tamoxifen 536 treatment of Kdm2a/b-CXXCfl/fl mESCs. 537 (F) Gene ontology analysis of genes that significantly increased in expression following 538 tamoxifen treatment of Kdm2a/b-CXXCfl/fl mESCs. 539 (G) As (H), but for the subset of significantly increasing genes that were not classified as 540 polycomb target genes. 541 542 Figure 5 – KDM2B plays the predominant role in gene repression. 543 (A) A schematic of the Kdm2a-CXXCfl/fl and Kdm2b-CXXCfl/fl systems in which addition of 544 tamoxifen (OHT) leads to the generation of KDM2A or KDM2B proteins that lack the 545 ZF-CxxC domain, respectively, and therefore are unable to bind to chromatin. 546 (B) MA-plots showing log2 fold change in gene expression (cnRNA-seq) in Kdm2a- 547 CXXCfl/fl (left) or Kdm2b-CXXCfl/fl mESCs (right) following tamoxifen treatment. The 548 number of genes with significantly increased or decreased expression (p-adj < 0.05 549 and > 1.4-fold) are shown in red and density of gene expression changes is shown on 550 the right. 551 (C) A density plot showing the distribution of the log2 fold change in gene expression 552 following tamoxifen treatment of Kdm2a-CXXCfl/fl, Kdm2b-CXXCfl/fl or Kdm2a/b- 553 CXXCfl/fl mESCs, for all genes. 554 (D) A bar graph illustrating the distribution of genes between three gene classes (Non- 555 CGI, Non-PRC, PRC) described in Figure 4C, for all genes and for genes that 556 significantly increased or decreased in expression following tamoxifen treatment of 557 Kdm2b-CXXCfl/fl mESCs. 558 (E) Gene ontology analysis of genes that significantly increased in expression following 559 tamoxifen treatment of Kdm2b-CXXCfl/fl mESCs. 560 (F) As (E), but for the subset of significantly increasing genes that were not classified as 561 polycomb target genes. 562 (G) A scatter plot comparing the log2 fold change in gene expression (cnRNA-seq) 563 following tamoxifen treatment of Kdm2b-CXXCfl/fl and Kdm2a/b-CXXC fl/fl mESCs. The 564 solid line shows the linear regression, and the coefficient of determination (R2) and 565 Spearman correlation coefficient (Cor) are annotated. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

566 (H) A scatter plot of the log2 fold change in gene expression (cnRNA-seq) following 567 tamoxifen treatment of Kdm2a/b-CXXC fl/fl mESCs for genes that significantly increased 568 in expression, plotted against the ratio of the log2 fold change in gene expression 569 following tamoxifen treatment of Kdm2b-CXXC fl/fl versus Kdm2a/b-CXXC fl/fl mESCs. A 570 1.5-fold threshold (red dotted lines) was used to define genes which were differentially 571 regulated between the two datasets, and the number of genes with more than 1.5-fold 572 increased or decreased expression is shown in red. The solid line shows the linear 573 regression, and the coefficient of determination (R2) and Spearman correlation 574 coefficient (Cor) are annotated. 575 576 Figure 6 – KDM2 proteins regulate RNAPII occupancy but not chromatin accessibility 577 at CGI-associated gene promoters. 578 (A) An MA-plot showing log2 fold change in the accessibility (cATAC-seq) of CGI- 579 associated gene promoters in Kdm2a/b-CXXCfl/fl mESCs following tamoxifen 580 treatment. No promoters significantly changed in accessibility (p-adj < 0.05 and > 1.4- 581 fold). 582 (B) A heatmap of the fold change in RNAPII ChIP-seq signal following tamoxifen treatment 583 of Kdm2a/b-CXXCfl/fl mESCs, for CGI-associated (n=14106) and non-CGI-associated 584 (n=6527) gene promoters. 585 (C) A metaplot showing RNAPII enrichment at CGI-associated genes before (UNT) and 586 after tamoxifen treatment (OHT) Kdm2a/b-CXXCfl/fl mESCs. 587 (D) Genomic snapshots showing RNAPII occupancy before (UNT) and after tamoxifen 588 treatment (OHT) of Kdm2a/b-CXXCfl/fl mESCs, for (above) a representative gene that 589 retains RNAPII at the promoter and increases in RNAPII occupancy throughout the 590 gene body, and (below) a representative gene that decreases in RNAPII at both 591 promoter and gene body regions. BioCAP and KDM2A and KDM2B ChIP-seq signal 592 are shown for reference (Farcas et al. 2012; Long et al. 2013b; Blackledge et al. 2014). 593 (E) Empirical cumulative density function (ECDF) plots of RNAPII pausing index for CGI- 594 associated (left) or non-CGI-associated (right) genes, before (UNT) and after 595 tamoxifen treatment (OHT) of Kdm2a/b-CXXCfl/fl mESCs. 596 (F) Metaplots showing Ser5P-RNAPII (upper panel) or Ser2P-RNAPII (lower panel) 597 enrichment at CGI-associated genes before (UNT) and after tamoxifen treatment 598 (OHT) of Kdm2a/b-CXXCfl/fl mESCs. 599 (G) A boxplot showing the fold change in Ser5P-RNAPII at gene promoters (left) or Ser2P- 600 RNAPII at gene bodies (right) following tamoxifen treatment of Kdm2a/b-CXXCfl/fl 601 mESCs, normalised to RNAPII-NTD signal. The fold changes for CGI-associated and 602 non-CGI-associated genes are shown. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

603 604 Figure S1 – Related to Figure 1. 605 (A) A schematic representation of the Kdm2a/b-JmjCfl/fl mESC line, in which loxP sites 606 were inserted into the Kdm2a and Kdm2b genes flanking exons that encode the JmjC 607 domain. 608 (B) ChIP-qPCR analysis showing KDM2A (upper panel) and KDM2B (lower panel) 609 enrichment relative to input in Kdm2a/b-JmjCfl/fl mESCs before (UNT) and after 610 tamoxifen treatment (OHT). Error bars show standard error of the mean of three 611 biological replicates. 612 (C) Quantitation of western blots of histone extract from Kdm2a/b-JmjCfl/fl mESCs, before 613 (UNT) and after 96 hours of tamoxifen treatment (OHT). Signal is normalised to histone 614 H4 and is represented relative to average UNT signal. Error bars show standard 615 deviation of three biological replicates. Significance was tested using a Student’s T- 616 test (non-significant (ns) if p > 0.05). 617 (D) Representative western blots. Histone H4 is shown as a loading control. 618 (E) Boxplots showing CpG density (left) and size (right) of intragenic and promoter- 619 associated CGIs. 620 (F) A metaplot of cnRNA-seq signal in Kdm2a/b-JmjCfl/fl mESCs, for CGI-associated 621 genes separated into quartiles according to their expression level and for non-CGI- 622 associated genes. Genes were scaled to the same length and aligned at their TSS and 623 TES. 624 625 Figure S3 – Related to Figure 3. 626 (A) A schematic representation of the Kdm2a/b-CxxCfl/fl mESC line, in which loxP sites 627 were inserted into the Kdm2a and Kdm2b genes flanking exons that encode the ZF- 628 CxxC domain. sgRNA.1 and sgRNA.2 indicate the position of CRISPR-mediated loxP 629 insertion. 630 (B) ChIP-qPCR analysis showing KDM2A (left panel) and KDM2B (right panel) enrichment 631 relative to input in Kdm2a/b-CXXCfl/fl mESCs before (UNT) and after tamoxifen 632 treatment (OHT). Error bars show standard error of the mean of four biological 633 replicates. 634 (C) Digital droplet PCR analysis, showing the fold change in template cDNA concentration 635 following tamoxifen treatment (OHT) of Kdm2a/b-CXXCfl/fl mESCs (blue). Error bars 636 show standard deviation of three biological replicates. The fold change calculated by 637 cnRNA-seq is shown for comparison (green), and the significance of this change is 638 annotated below the graph. The dashed lines represent no change in expression 639 (black) and the 1.4 fold change threshold used for cnRNA-seq analysis (red). bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

640 (D) An MA-plot showing log2 fold change in gene expression in Kdm2a/b-CXXCfl/fl mESCs 641 following tamoxifen treatment (OHT), normalising nRNA-seq data to total library size. 642 The number of genes with significantly increased or decreased expression (p-adj < 643 0.05 and > 1.4-fold) is shown in red and density of gene expression changes is shown 644 on the right. 645 646 Figure S4 – Related to Figure 4. 647 (A) A scatter plot comparing the log2 fold change in gene expression (cnRNA-seq) 648 following tamoxifen treatment of Kdm2a/b-CXXCfl/fl and Pcgf1fl/fl mESCs. The solid line 649 shows the linear regression, and the coefficient of determination (R2) and Spearman 650 correlation coefficient (Cor) are annotated. 651 652 Figure S5 – Related to Figure 5. 653 (A) Western blot analysis for KDM2A and KDM2B in Kdm2a-CXXCfl/fl mESCs before 654 (UNT) and after 96 hours of tamoxifen treatment (OHT). BRG1 is shown as a loading 655 control for both blots. Asterisks indicate non-specific bands. 656 (B) As (A) but for Kdm2b-CXXCfl/fl mESCs. 657

658 Figure S6 – Related to Figure 6.

659 (A) MA-plots showing log2 fold change in the accessibility (cATAC-seq) of CGI-associated 660 gene promoters in Kdm2a-CXXCfl/fl (left) or Kdm2a-CXXCfl/fl (right) mESCs following 661 tamoxifen treatment. No promoters significantly changed in accessibility (p-adj < 0.05 662 and > 1.4-fold). 663 (B) An illustration of the pausing index, the ratio of the average read density of RNAPII- 664 NTD at the promoter and the average read density of RNAPII in the gene body. 665 (C) Scatter plots comparing the log2 fold change in gene expression (cnRNA-seq) with the 666 log2 fold change in RNAPII occupancy (ChIP-seq) following tamoxifen treatment of 667 Kdm2a/b-CXXCfl/fl mESCs, at CGI-associated gene promoters (left) or gene bodies 668 (right). The solid line shows the linear regression, and the coefficient of determination 669 (R2) and Spearman correlation coefficient (Cor) are annotated. 670 (D) Boxplots showing the fold change in RNAPII occupancy at CGI-associated gene 671 promoters (left) or gene bodies (right) following tamoxifen treatment of Kdm2a/b- 672 CXXCfl/fl mESCs, for all CGI genes or for genes that significantly increased in 673 expression separated into quartiles according to their log2 fold change in expression 674 (Q1 < Q2 < Q3 < Q4). bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

675 (E) A metaplot showing RNAPII enrichment before (UNT) and after tamoxifen treatment 676 (OHT) of Kdm2a/b-CXXCfl/fl mESCs, for the top quartile of significantly upregulated 677 genes (Q4 – as in (D)). 678 (F) Heatmap analyses of the fold change in RNAPII, Ser5-RNAPII or Ser2P-RNAPII ChIP- 679 seq signal following tamoxifen treatment of Kdm2a/b-CXXCfl/fl mESCs, for CGI- 680 associated (n=14106) and non-CGI-associated (n=6527) gene promoters.

681

682 MATERIALS AND METHODS

683 Cell culture

684 Mouse embryonic stem cells (mESCs) were cultured on gelatine-coated dishes at 37oC and

685 5% CO2, in DMEM (Life Technologies) supplemented with 15% fetal bovine serum (Labtech), 686 2mM L-glutamine (Life Technologies), 0.5 mM beta-mercaptoethanol (Life Technologies), 1x 687 non-essential amino acids (Life Technologies), 1x penicillin-streptomycin (Life Technologies), 688 and 10 ng/ml leukemia-inhibitory factor. Conditional mESC lines were treated with 800 nM 4- 689 hydroxytamoxifen (Sigma) for 96 hours to induce KDM2-LFs knockout (Kdm2a/b-JmjCfl/fl) or 690 ZF-CxxC domain deletion (Kdm2a/b-CXXCfl/fl, Kdm2a-CXXCfl/fl, Kdm2b-CXXCfl/fl).

o 691 Human HEK293T cells used for RNAPII cChIP-seq were grown at 37 C and 5% CO2 in DMEM 692 supplemented with 10% fetal bovine serum, 2 mM L-glutamine, 0.5 mM beta-mercaptoethanol 693 and 1x penicillin-streptomycin. D. melanogaster S2 (SG4) cells used for cnRNA-seq were 694 grown adhesively at 25oC in Schneider’s Drosophila Medium (Life Technologies), 695 supplemented with 1x penicillin-streptomycin and 10% heat-inactivated fetal bovine serum.

696

697 Generation of the Kdm2a/b-JmjCfl/fl mESC line

698 To generate the Kdm2a/b-JmjCfl/fl mESC line, a loxP site was inserted upstream of the critical 699 JmjC domain-encoding exon(s) in the Kdm2a and Kdm2b genes (exon 8 for Kdm2a, and 700 exons 7-8 for Kdm2b), and FRT flanked PGK-neo and a second loxP site was inserted 701 downstream of the critical exon(s). Targeting vectors were generated from bacterial artificial 702 containing the target mouse genomic regions using the Double Red 703 recombination method, as previously described (Suzuki and Nakayama 2011). Linearized 704 targeting vectors were introduced into M1 mESCs by electroporation (GenePulser, Bio-Rad). 705 mESC colonies were isolated and expanded, and the genomic DNA of each clone was 706 purified. Homozygous loxP targeting was verified by sequencing of the genomic region 707 surrounding the loxP sites. Targeted ES cells were injected into mouse blastocysts to generate bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

708 chimeric mice. The Kdm2a/b-JmjCfl/fl line was generated by removal of the PGK-neo marker 709 gene by mating the targeted mice with mice expressing FLP recombinase. These Kdm2a/b- 710 JmjCfl/fl mice were further mated with mice harboring the ROSA26-CreErt2 locus to generate 711 Kdm2a/b-JmjCfl/fl:ROSA26-CreErt2+/- mice, from which the Kdm2a/b-JmjCfl/fl mESCs used in 712 this study were derived.

713

714 Generation of Kdm2a-CXXCfl/fl and Kdm2a/b-CXXCfl/fl mESC lines

715 Conditional Kdm2a-CXXCfl/fl and Kdm2a/b-CXXCfl/fl mESC lines were generated by using 716 CRISPR-mediated genome editing to insert parallel loxP sites flanking exon 14 of the Kdm2a 717 gene in Rosa26::CreERT2 or Kdm2b-CXXCfl/fl mESCs, respectively (Blackledge et al. 2014). 718 Targeting constructs encoding the loxP sequence flanked by 150 bp homology arms and 719 carrying a mutated PAM sequence to prevent retargeting by the Cas9 enzyme were purchased 720 from GeneArt (ThermoFisher). The pSpCas9(BB)-2A-Puro(PX459)-V2.0 vector was obtained 721 from Addgene (#62988). sgRNAs were designed using the CRISPOR online tool 722 (http://crispor.tefor.net/crispor.py) and were cloned into the vector as previously described 723 (Ran et al. 2013). First, the upstream loxP site was targeted. Rosa26::CreERT2 or Kdm2b- 724 CXXCfl/fl mESCs were transiently co-transfected with 1 μg of Cas9-sgRNA plasmid and 3.5 μg 725 of targeting construct using Lipofectamine 3000 (ThermoFisher). The day after transfection, 726 cells were passaged at a range of densities and subjected to puromycin selection (1 μg/ml) 727 for 48 hours. Individual clones were isolated and PCR-screened. A correctly targeted 728 homozygous clone was then used to target the downstream loxP site using the same 729 transfection protocol and screening strategy. Correct loxP targeting was verified by 730 sequencing of the genomic region surround the loxP sites, and clones were analysed by both 731 RT-qPCR and western blot to confirm loss of the ZF-CxxC domain in response to tamoxifen 732 treatment.

733

734 Protein extracts and immunoblotting

735 For nuclear extraction, mESCs were washed with PBS then resuspended in 10 volumes of

736 Buffer A (10mM Hepes pH 7.9, 1.5mM MgCl2, 10mM KCl, 0.5mM DTT, 0.5mM PMSF, and 1x 737 PIC (Roche)) and incubated on ice for 10 min. Cells were recovered by centrifugation at 1500 738 g for 5 min, resuspended in 3 volumes of Buffer A supplemented with 0.1% NP-40 and 739 incubated on ice for 10 min. The released nuclei were recovered by centrifugation at 1500 g 740 for 5 min and resuspended in 1 pellet volume of Buffer B (5mM Hepes pH 7.9, 26% glycerol, 741 400mM NaCl, 1.5mM MgCl2, 0.2mM EDTA, 0.5mM DTT and 1x PIC). After 1 hour of rotation bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

742 at 4°C, the suspension was pelleted at 16,000 g for 20 min and the supernatant taken as 743 nuclear extract.

744 For histone extraction, mESCs were washed with RSB (10mM Tris HCl pH 7.4, 10mM NaCl,

745 3mM MgCl2 and 20mM NEM), then resuspended in RSB buffer supplemented with 0.5% NP- 746 40 and incubated on ice for 10 min to allow cell lysis. Following centrifugation at 500 g for 5

747 min, the nuclear pellet was incubated in 2.5mM MgCl2, 0.4M HCl and 20mM NEM on ice for 748 20 min. After centrifugation at 16,000 g for 20 min, histones were precipitated from the 749 supernatant on ice with 25% TCA for 30 min. Histones were recovered by centrifugation at 750 16,000 g for 15 min, and the pellet was washed twice in acetone. The histone pellet was 751 resuspended in 1x SDS loading buffer and boiled at 95°C for 5 min. Any insoluble precipitate 752 was pelleted by centrifugation at 16,000 g for 15 min and the soluble fraction retained as 753 histone extract. Histone concentrations were compared by Coomassie Blue staining following 754 SDS-PAGE. Semi-quantitative western blot analysis of histone extracts was performed using 755 LiCOR IRDye® secondary antibodies and the LiCOR Odyssey Fc system. To measure 756 changes in H3K36 methylation, the signal relative to H4 histone was determined.

757

758 Antibodies

759 The following antibodies were used in this study: anti-KDM2A (Blackledge et al. 2010), anti- 760 KDM2B (Farcas et al. 2012), anti-BRG1 (EPNCIR111A, Abcam), anti-H3, anti-H3K36me1, 761 anti-H3K36me2 (Blackledge et al. 2010), anti-H3K36me3, anti-H4 (L64C1, Cell Signalling), 762 anti-Rbp1-NTD (D8L4Y, Cell Signalling), anti-Rbp1-CTD-Ser5P (D9N5I, Cell Signalling), anti- 763 Rbp1-CTD-Ser2P (E1Z3G, Cell Signalling). Anti-H3 and anti-H3K36me antibodies were 764 prepared in-house by rabbit immunisation with synthetic peptides (PTU/BS Scottish National 765 Blood Transfusion Service), and antibodies were purified on peptide affinity columns.

766

767 Preparation of chromatin

768 For KDM2A/B ChIP, 5x107 mESCs were resuspended in PBS and crosslinked in 2 mM 769 disuccinimidyl glutarate (Thermo Scientific) for 45 min at 25oC with gentle rotation, then in 1% 770 formaldehyde for 12.5 min (methanol-free, Life Technologies). Reactions were quenched by 771 addition of 125 mM glycine, and crosslinked cells were resuspended in lysis buffer (50mM 772 HEPES-KOH pH 7.9, 140mM NaCl, 1mM EDTA, 10% glycerol, 0.5% NP40, 0.25% TritonX- 773 100 and 1x PIC) and rotated for 10 min at 4°C. The released nuclei were washed (10 mM Tris- 774 HCl pH 8.0, 200mM NaCl, 1mM EDTA, 0.5mM EGTA and 1x PIC) for 5 min at 4°C, and the bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

775 nuclear pellet resuspended in 1 ml sonication buffer (10mM Tris HCl pH 8.0, 100mM NaCl, 776 1mM EDTA, 0.5mM EGTA, 0.1% sodium deoxycholate, 0.5% N-lauroylsarcosine and 1x PIC).

777 For histone ChIP, 1x107 mESCs were crosslinked for 10 min in 1% formaldehyde. Reactions 778 were quenched by addition of 125 mM glycine. The released nuclei washed twice in PBS, then 779 resuspended in lysis buffer (1% SDS, 10mM EDTA, 50mM Tris HCl pH 8.1 and 1x PIC) and 780 incubated on ice for 30 min.

781 For RNAPII ChIP, 5x107 mESCs were resuspended in PBS and mixed with 4x106 HEK293T 782 cells. Cells were crosslinked for 10 min in 1% formaldehyde. Reactions were quenched by 783 addition of 150 mM glycine, and the crosslinked cells resuspended in FA-lysis buffer for 10 784 min (50mM HEPES pH 7.9, 150mM NaCl, 2mM EDTA, 0.5mM EGTA, 0.5% NP40, 0.1% 785 sodium deoxycholate, 0.1% SDS, 10mM NaF, 1mM AEBSF, 1x PIC).

786 Chromatin was sonicated using a BioRuptor Pico sonicator (Diagenode), shearing genomic 787 DNA to approximately 0.5 kb. Following sonication, TritonX-100 was added to chromatin used 788 for KDM2A/B ChIP to a final concentration of 1%.

789

790 Chromatin immunoprecipitation and sequencing

791 Sonicated chromatin was diluted 10-fold in ChIP dilution buffer (1% Triton-X100, 1 mM EDTA, 792 20mM TrisHCl pH 8, 150mM NaCl and 1x PIC) for KDM2A/B or histone ChIP, or in FA-lysis 793 buffer for RNAPII ChIP. Chromatin was pre-cleared for 1 hour with either protein A magnetic 794 Dynabeads (Invitrogen, for KDM2A/B ChIP) or protein A agarose beads (Repligen, for histone 795 or RNAPII ChIP) blocked with 1 mg/ml BSA and 1 mg/ml yeast tRNA. For each ChIP reaction, 796 150 μg chromatin (KDM2A/B), 300 μg chromatin (RNAPII) or chromatin corresponding to 797 1x105 cells (histone ChIP) was incubated overnight with the appropriate antibody: anti-KDM2A 798 (2.4 μl), anti-KDM2B (2 μl), anti-H3 (15 μl) anti-H3K36me2 (15 μl), anti-Rbp1-NTD (15 μl), 799 anti-Rbp1-CTD-Ser5P (12.5 μl), anti-Rbp1-CTD-Ser2P (12.5 μl).

800 Antibody-bound chromatin was isolated using blocked protein A agarose (histone/ RNAPII 801 ChIP) or magnetic beads (KDM2A/B ChIP) for 2 hours at 4°C. For histone or KDM2A/B ChIP, 802 washes were performed with low salt buffer (0.1% SDS, 1% TritonX-100, 2 mM EDTA, 20 mM 803 Tris-HCl pH 8, 150 mM NaCl), high salt buffer (0.1% SDS, 1% TritonX-100, 2 mM EDTA, 20 804 mM Tris-HCl pH 8, 500 mM NaCl), LiCl buffer (250mM LiCl, 1% NP40, 1% sodium 805 deoxycholate, 1 mM EDTA, 10 mM Tris-HCl pH 8) and two washes with TE buffer (10 mM 806 Tris-HCl pH 8, 1 mM EDTA). For RNAPII ChIP, washes were performed with FA-Lysis buffer, 807 FA-Lysis buffer containing 500mM NaCl, DOC buffer (250mM LiCl, 0.5% NP40, 0.5% sodium 808 deoxycholate, 2 mM EDTA, 10mM Tris-HCl pH 8) and two washes with TE buffer. ChIP DNA bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

809 was eluted in elution buffer (1% SDS, 100mM NaHCO3) and crosslinks reversed overnight at 810 65°C with 200 mM NaCl and 2 μl RNase A (Sigma). A matched input sample (corresponding 811 to 10% of original ChIP reaction) was treated identically. The following day, samples were 812 treated with 20 μg/ml Proteinase K (Sigma) for 2 hours at 45oC and purified using the ChIP 813 DNA Clean and Concentrator Kit (Zymo Research).

814 ChIP-seq libraries for both ChIP and input samples were prepared using the NEBNext Ultra 815 DNA Library Prep Kit for Illumina (NEB), following the manufacturer’s guidelines and using 816 NEBNext Multiplex Oligos. The average size and concentration of libraries were determined 817 using the 2100 Bioanalyzer High Sensitivity DNA Kit (Agilent) and qPCR with SensiMix SYBR 818 (Bioline) and KAPA Illumina DNA standards (Roche). Libraries were sequenced using the 819 Illumina NextSeq 500 platform in biological triplicate or quadruplicate with 40 bp paired-end 820 reads.

821

822 Calibrated nuclear RNA-sequencing and ATAC-sequencing (cnRNA-seq and 823 cATAC-seq)

824 To isolate nuclei for cnRNA-seq and cATAC-seq, 107 mESCs were mixed with 2.5x106 825 Drosophila SG4 cells in PBS. Cells were lysed in 1 ml HS lysis buffer (0.05% NP40, 50mM

826 KCl, 10 mM MgSO4.7H20, 5mM HEPES, 1mM PMSF, 3mM DTT, 1x PIC). Nuclei were 827 recovered by centrifugation at 1000 g for 5min and washed three times in 1 ml resuspension

828 buffer (10mM NaCl, 10mM Tris pH 7.4, 3mM MgCl2). Nuclear integrity was assessed using 829 0.4% trypan blue staining (ThermoFisher).

830 Nuclear RNA was prepared from 4x106 nuclei using TRIzol reagent according to the 831 manufacturer’s protocol (Invitrogen), then treated with the TURBO DNA-free kit 832 (ThermoFisher). nRNA quality was assessed using the 2100 Bioanalyzer RNA 6000 Pico kit 833 (Agilent), then nRNA was depleted of rRNA using the NEBNext rRNA depletion kit and the 834 depletion efficiency evaluated using the Bioanlayzer RNA 6000 Pico kit. RNA-seq libraries 835 were prepared using the NEBNext Ultra Directional RNA-seq kit, and library size and 836 concentration was determined as described for ChIP libraries. Libraries were sequenced using 837 the Illumina NextSeq 500 platform in biological triplicate or quadruplicate using 80 bp paired- 838 end reads.

839 Chromatin accessibility was assayed using an adaptation of the assay for transposase 840 accessible-chromatin (ATAC)-seq (Buenrostro et al. 2013) as previously described (King and 841 Klose 2017), using 5x105 nuclei from the same preparation used for the purification of nuclear 842 RNA. Genomic DNA was also purified from an aliquot of the same preparation of nuclei by bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

843 phenol-chloroform extraction and tagmented with Tn5, to control for sequence bias of the Tn5 844 transposase and to determine the exact mouse/fly mixing ratio for each individual sample.

845 ATAC-seq and input gDNA libraries were prepared by PCR amplification using custom-made 846 Illumina barcodes (Buenrostro et al. 2013) and the NEBNext High-Fidelity 2X PCR Master 847 Mix. Libraries were purified with two rounds of Agencourt AMPure XP bead cleanup 848 (Agencourt, 1.5X bead:sample ratio). Library size and concentration were determined as 849 described for ChIP libraries. Libraries were sequenced using the Illumina NextSeq 500 850 platform in biological triplicate or quadruplicate using 80 bp paired-end reads.

851

852 Digital droplet PCR

853 For digital droplet PCR (ddPCR), total RNA was prepared from 106 mESCs using the RNeasy 854 mini plus kit including gDNA eliminator columns (QIAGEN). RPT low retention tips (Starlab) 855 were used throughout the ddPCR protocol to increase pipetting accuracy. Purified RNA was 856 eluted in 30 μl elution buffer, 7 μl was diluted with 8 μl water, and 4 μl of this dilution was 857 reverse transcribed using the imProm-II system with random hexamer primers and RNasin 858 ribonuclease inhibitor (Promega). The generated cDNA was diluted with 300 ul nuclease-free 859 water. ddPCR primers were designed using Primer 3 Plus (Untergasser et al. 2012) with 860 BioRad recommended settings: 3.8mM divalent cations, 0.8mM dNTPs, 80-120 bp product, 861 60-61°C melting temperature. Their efficiency was tested using a serial dilution curve of cDNA 862 by standard SYBR qPCR. ddPCR reactions were prepared in 96-well PCR plates, and 863 contained 12.5 μl 2x QX200 ddPCR EvaGreen Supermix (BioRad), 0.32 μl each of forward 864 and reverse primers (10 μM), 7 μl diluted cDNA and 4.86 μl nuclease-free water. This 25 μl 865 reaction was mixed by pipetting, then 22 μl was transferred to a semi-skirted 96-well PCR 866 plate (Eppendorf) and used for droplet generation with an AutoDG droplet generator (BioRad). 867 Droplets were collected in a semi-skirted PCR plate, which was then sealed using a PX1 PCR 868 plate sealer (BioRad). PCR was performed using a C1000 Touch thermal cycler (BioRad) with 869 a 2°C/s ramp rate: 5 min at 95°C followed by 40 cycles of denaturation at 95°C for 30 s and 870 annealing/extension at 60°C for 60 s, then signal stabilisation at 4°C for 5 min followed by 871 90°C for 5 min. Droplets were sorted into PCR-positive and PCR-negative fractions according 872 to their fluorescence using the QX200 droplet reader (BioRad). QuantaLife software (BioRad) 873 was used to calculate the absolute concentration of template cDNA in the ddPCR reaction.

874

875 Data processing and normalisation of massively-parallel sequencing bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

876 For cATAC-seq and RNAPII cChIP-seq (including sequencing of input gDNA), reads were 877 aligned to concatenated mouse and spike-in genomes (mm10+dm6 or mm10+hg19) using 878 Bowtie 2 with the ‘-- no-mixed’ and ‘-- no-discordant’ options (Langmead and Salzberg 2012). 879 For histone ChIP-seq, reads were aligned to the mouse mm10 genome as above. Reads that 880 were mapped more than once were discarded, and PCR duplicates were removed using 881 SAMTools (Li et al. 2009b). For cATAC-seq, reads that mapped to a custom ‘blacklist’ of 882 genomic regions with artificially high counts, including mitochondrial DNA sequences, were 883 also discarded.

884 For cnRNA-seq, reads were first aligned using Bowtie 2 (with ‘--very-fast’, ‘--no-mixed’ and ‘-- 885 no-discordant’ options) to the concatenated mm10 and dm6 rRNA genomic sequence 886 (GenBank: BK000964.3 and M21017.1) to filter out reads mapping to rRNA. All unmapped 887 reads were aligned to the concatenated mm10+dm6 genome using STAR (Dobin et al. 2013). 888 To improve mapping of intronic sequences, reads that failed to map using STAR were aligned 889 using Bowtie 2, with ‘--sensitive-local’, ‘--no-mixed’ and ‘--no-discordant’ options. PCR 890 duplicates were removed using SAMtools (Li et al. 2009b).

891 To internally calibrate cnRNA-seq, cATAC-seq and cChIP-seq experiments we spiked a fixed 892 number of control cells into each sample (Drosophila SG4 cells for cnRNA-seq and cATAC- 893 seq, human HEK293T cells for RNAPII cChIP-seq). This spike-in genome was then used to 894 quantitatively compare the gene expression, chromatin accessibility or RNAPII profiles 895 between experimental conditions. For visualisation of cATAC-seq and cChIP-seq data, mm10 896 reads were randomly subsampled by a factor that reflects the total number of spike-in reads 897 in the same sample, as previously described (Bonhoure et al. 2014; Orlando et al. 2014; Hu 898 et al. 2015). To account for any variation in the exact spike-in cell: mESC mixing ratio between 899 biological replicates, the subsampling factors were additionally corrected according to the ratio 900 of dm6 (or hg19)/mm10 total read counts in the matched input sample. For visualisation of 901 cnRNA-seq data, mm10 reads were randomly subsampled by Drosophila normalised size 902 factors calculated using DESeq2 (see below). Histone ChIP-seq libraries were randomly 903 downsampled to achieve the same total number of reads for each individual replicate using 904 SAMtools (Li et al. 2009b).

905 906 Read count quantitation and analysis

907 To compare replicates, read coverage across regions of interest (gene bodies for cnRNA-seq 908 and ChIP-seq, gene promoters for cATAC-seq) was analysed using deepTools 909 multiBamSummary and plotCorrelation functions (Ramírez et al. 2016). For each condition, bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

910 biological replicates correlated well with each other (Pearson correlation coefficient > 0.95) 911 and were merged for downstream applications.

912 Genome coverage tracks were generated using the pileup function from MACS2 (Zhang et al. 913 2008) for ChIP-seq and ATAC-seq and genomeCoverageBed from BEDtools (Quinlan 2014) 914 for cnRNA-seq and visualised using the UCSC genome browser (Kent et al. 2002). Differential 915 bigwig tracks of H3K36me2 normalised to H3, normalised H3K36me2 signal in tamoxifen- 916 treated versus -untreated cells, or RNAPII signal in tamoxifen-treated versus –untreated cells, 917 were generated from merged bigwig files using the deepTools bigwigCompare function with ‘- 918 -operation ratio’ setting (Ramírez et al. 2016). Metaplot and heatmap analyses of read density 919 were performed using the computeMatrix, plotProfile and plotHeatmap deepTools functions 920 (v3.0.1). For ChIP-seq, intervals of interest were annotated with normalised read counts from 921 merged replicates with a custom Perl script using SAMtools, or from differential bigwig files 922 using deepTools computeMatrix with the ‘--outFileNameMatrix’ option. Correlation analyses 923 were performed in R using Spearman correlation and visualised with scatterplots coloured by 924 density using ‘stat_density2d’.

925

926 Differential ATAC-seq and gene expression analyses

927 DESeq2 (Love et al. 2014) was used with a custom R script to identify significant changes in 928 chromatin accessibility or gene expression. In order to calibrate to the spike-in genome, 929 Drosophila reads were first pre-normalised according to the exact dm6/mm10 spike-in ratio 930 derived from the matched input gDNA sample. Drosophila read counts were then generated 931 for a set of unique dm6 refGene genes and used to calculate DESeq2 size factors. These size 932 factors were supplied for DESeq2 normalisation of raw mm10 read counts for a custom non- 933 redundant mm10 gene set of 20633 genes. P-adj < 0.05 and fold change > 1.4 thresholds 934 were used to determine significant changes. Log2 fold changes were visualised using MA- 935 plots generated with ggplot2.

936

937 Gene annotation

938 Non-redundant mouse genes (n = 20633) were classified into non-CGI, PRC and non-PRC 939 categories based on the presence of a non-methylated CGI and RING1B and SUZ12 binding 940 at their promoters. Gene Ontology analysis was performed using DAVID (Huang da et al. 941 2009). The BP FAT setting and a FDR < 0.1 cut-off were used, and the complete non- 942 redundant mm10 gene set was used as a background. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

943

944 Accession numbers

945 The following previously published datasets were used for analysis: H3K36me3 ChIP-seq 946 (GSE34520) (Brookes et al. 2012), KDM2A ChIP-seq (GSE41267) (Farcas et al. 2012), 947 KDM2B ChIP-seq (GSE55698) (Blackledge et al. 2014), BioCAP (GSE43512) (Long et al. 948 2013b).

949

950 ACKNOWLEDGEMENTS

951 We thank Nadezda Fursova for assistance with computational analysis, Emilia Dimitrova, 952 Angelika Feldmann and Neil Blackledge for helpful discussions, and Amy Hughes and Neil 953 Blackledge for critical reading of the manuscript. We are grateful to Amanda Williams at the 954 Department of Zoology in Oxford for sequencing support on the NextSeq 500. Work in the 955 Klose laboratory is supported by the Wellcome Trust, the Lister Institute of Preventive 956 Medicine and the European Research Council. Takashi Kondo and Haruhiko Koseki are 957 supported by the AMED-CREST programme from the Japan Agency for Medical Research 958 and Development. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

959 REFERENCES

960 Bannister AJ, Schneider R, Myers FA, Thorne AW, Crane-Robinson C, Kouzarides T. 2005. 961 Spatial distribution of di- and tri-methyl lysine 36 of histone H3 at active genes. The 962 Journal of biological chemistry 280: 17732-17736. 963 Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, Wei G, Chepelev I, Zhao K. 964 2007. High-resolution profiling of histone methylations in the . Cell 129: 965 823-837. 966 Bell O, Wirbelauer C, Hild M, Scharf AND, Schwaiger M, MacAlpine DM, Zilbermann F, van 967 Leeuwen F, Bell SP, Imhof A et al. 2007. Localized H3K36 methylation states define 968 histone H4K16 acetylation during transcriptional elongation in Drosophila. The EMBO 969 Journal 26: 4974-4984. 970 Blackledge NP, Farcas AM, Kondo T, King HW, McGouran JF, Hanssen LL, Ito S, Cooper S, 971 Kondo K, Koseki Y et al. 2014. Variant PRC1 complex-dependent H2A ubiquitylation 972 drives PRC2 recruitment and polycomb domain formation. Cell 157: 1445-1459. 973 Blackledge NP, Klose R. 2011. CpG island chromatin: a platform for gene regulation. 974 Epigenetics 6: 147-152. 975 Blackledge NP, Zhou JC, Tolstorukov MY, Farcas AM, Park PJ, Klose RJ. 2010. CpG islands 976 recruit a histone H3 lysine 36 demethylase. Molecular cell 38: 179-190. 977 Bonhoure N, Bounova G, Bernasconi D, Praz V, Lammers F, Canella D, Willis IM, Herr W, 978 Hernandez N, Delorenzi M et al. 2014. Quantifying ChIP-seq data: a spiking method 979 providing an internal reference for sample-to-sample normalization. Genome research 980 24: 1157-1168. 981 Borgel J, Tyl M, Schiller K, Pusztai Z, Dooley CM, Deng W, Wooding C, White RJ, Warnecke 982 T, Leonhardt H et al. 2017. KDM2A integrates DNA and histone modification signals 983 through a CXXC/PHD module and direct interaction with HP1. Nucleic acids research 984 45: 1114-1129. 985 Boulard M, Edwards JR, Bestor TH. 2015. FBXL10 protects Polycomb-bound genes from 986 hypermethylation. Nature genetics 47: 479-485. 987 Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE. 988 2008. High-resolution mapping and characterization of open chromatin across the 989 genome. Cell 132: 311-322. 990 Brookes E, de Santiago I, Hebenstreit D, Morris KJ, Carroll T, Xie SQ, Stock JK, Heidemann 991 M, Eick D, Nozaki N et al. 2012. Polycomb associates genome-wide with a specific 992 RNA polymerase II variant, and regulates metabolic genes in ESCs. Cell stem cell 10: 993 157-170. 994 Bueno MTD, Baldascini M, Richard S, Lowndes NF. 2018. Recruitment of lysine demethylase 995 2A to DNA double strand breaks and its interaction with 53BP1 ensures genome 996 stability. Oncotarget 9: 15915-15930. 997 Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. 2013. Transposition of native 998 chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding 999 proteins and nucleosome position. Nature methods 10: 1213-1218. 1000 Cardozo T, Pagano M. 2004. The SCF ubiquitin ligase: insights into a molecular machine. 1001 Nature reviews Molecular cell biology 5: 739-751. 1002 Carrozza MJ, Li B, Florens L, Suganuma T, Swanson SK, Lee KK, Shia WJ, Anderson S, 1003 Yates J, Washburn MP et al. 2005. Histone H3 methylation by Set2 directs 1004 deacetylation of coding regions by Rpd3S to suppress spurious intragenic 1005 transcription. Cell 123: 581-592. 1006 Carvalho S, Raposo AC, Martins FB, Grosso AR, Sridhara SC, Rino J, Carmo-Fonseca M, de 1007 Almeida SF. 2013. Histone methyltransferase SETD2 coordinates FACT recruitment 1008 with nucleosome dynamics during transcription. Nucleic acids research 41: 2881-2893. 1009 Cheng Z, Cheung P, Kuo AJ, Yukl ET, Wilmot CM, Gozani O, Patel DJ. 2014. A molecular 1010 threading mechanism underlies Jumonji lysine demethylase KDM2A regulation of 1011 methylated H3K36. Genes & development 28: 1758-1771. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1012 Cloos PA, Christensen J, Agger K, Maiolica A, Rappsilber J, Antal T, Hansen KH, Helin K. 1013 2006. The putative oncogene GASC1 demethylates tri- and dimethylated lysine 9 on 1014 histone H3. Nature 442: 307-311. 1015 Deaton AM, Bird A. 2011. CpG islands and the regulation of transcription. Genes & 1016 development 25: 1010-1022. 1017 Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, 1018 Gingeras TR. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 1019 (Oxford, England) 29: 15-21. 1020 Du J, Ma Y, Ma P, Wang S, Fan Z. 2013. Demethylation of epiregulin gene by histone 1021 demethylase FBXL11 and BCL6 corepressor inhibits osteo/dentinogenic 1022 differentiation. Stem cells (Dayton, Ohio) 31: 126-136. 1023 Edmunds JW, Mahadevan LC, Clayton AL. 2008. Dynamic histone H3 methylation during 1024 gene induction: HYPB/Setd2 mediates all H3K36 trimethylation. Embo j 27: 406-420. 1025 Fang J, Hogan GJ, Liang G, Lieb JD, Zhang Y. 2007. The Saccharomyces cerevisiae histone 1026 demethylase Jhd1 fine-tunes the distribution of H3K36me2. Molecular and cellular 1027 biology 27: 5055-5065. 1028 Farcas AM, Blackledge NP, Sudbery I, Long HK, McGouran JF, Rose NR, Lee S, Sims D, 1029 Cerase A, Sheahan TW et al. 2012. KDM2B links the Polycomb Repressive Complex 1030 1 (PRC1) to recognition of CpG islands. eLife 1: e00205. 1031 Fodor BD, Kubicek S, Yonezawa M, O'Sullivan RJ, Sengupta R, Perez-Burgos L, Opravil S, 1032 Mechtler K, Schotta G, Jenuwein T. 2006. Jmjd2b antagonizes H3K9 trimethylation at 1033 pericentric heterochromatin in mammalian cells. Genes & development 20: 1557-1562. 1034 Frescas D, Guardavaccaro D, Bassermann F, Koyama-Nasu R, Pagano M. 2007. 1035 JHDM1B/FBXL10 is a nucleolar protein that represses transcription of ribosomal RNA 1036 genes. Nature 450: 309-313. 1037 Fursova NA. 2019. Unpublished data, Klose Lab. 1038 Gearhart MD, Corcoran CM, Wamstad JA, Bardwell VJ. 2006. Polycomb group and SCF 1039 ubiquitin ligases are found in a novel BCOR complex that is recruited to BCL6 targets. 1040 Molecular and cellular biology 26: 6880-6889. 1041 Gregory GD, Vakoc CR, Rozovskaia T, Zheng X, Patel S, Nakamura T, Canaani E, Blobel 1042 GA. 2007. Mammalian ASH1L is a histone methyltransferase that occupies the 1043 transcribed region of active genes. Molecular and cellular biology 27: 8466-8479. 1044 Han XR, Zha Z, Yuan HX, Feng X, Xia YK, Lei QY, Guan KL, Xiong Y. 2016. KDM2B/FBXL10 1045 targets c-Fos for ubiquitylation and degradation in response to mitogenic stimulation. 1046 Oncogene 35: 4179-4190. 1047 He J, Kallin EM, Tsukada Y-I, Zhang Y. 2008. The H3K36 demethylase Jhdm1b/Kdm2b 1048 regulates cell proliferation and senescence through p15(Ink4b). Nature structural & 1049 molecular biology 15: 1169-1175. 1050 He J, Shen L, Wan M, Taranova O, Wu H, Zhang Y. 2013. Kdm2b maintains murine embryonic 1051 stem cell status by recruiting PRC1 complex to CpG islands of developmental genes. 1052 Nature cell biology 15: 373-384. 1053 Ho MS, Tsai PI, Chien CT. 2006. F-box proteins: the key to protein degradation. Journal of 1054 biomedical science 13: 181-191. 1055 Hu B, Petela N, Kurze A, Chan K-L, Chapard C, Nasmyth K. 2015. Biological 1056 chromodynamics: a general method for measuring protein occupancy across the 1057 genome by calibrating ChIP-seq. Nucleic acids research 43: e132-e132. 1058 Huang da W, Sherman BT, Lempicki RA. 2009. Systematic and integrative analysis of large 1059 gene lists using DAVID bioinformatics resources. Nat Protoc 4: 44-57. 1060 Illingworth RS, Bird AP. 2009. CpG islands--'a rough guide'. FEBS letters 583: 1713-1720. 1061 Joshi AA, Struhl K. 2005. Eaf3 chromodomain interaction with methylated H3-K36 links 1062 histone deacetylation to Pol II elongation. Molecular cell 20: 971-978. 1063 Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. The 1064 human genome browser at UCSC. Genome research 12: 996-1006. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1065 Keogh MC, Kurdistani SK, Morris SA, Ahn SH, Podolny V, Collins SR, Schuldiner M, Chin K, 1066 Punna T, Thompson NJ et al. 2005. Cotranscriptional set2 methylation of histone H3 1067 lysine 36 recruits a repressive Rpd3 complex. Cell 123: 593-605. 1068 King HW, Fursova NA, Blackledge NP, Klose RJ. 2018. Polycomb repressive complex 1 1069 shapes the nucleosome landscape but not accessibility at target genes. Genome 1070 research 28: 1494-1507. 1071 King HW, Klose RJ. 2017. The pioneer factor OCT4 requires the chromatin remodeller BRG1 1072 to support gene regulatory element function in mouse embryonic stem cells. eLife 6. 1073 Kizer KO, Phatnani HP, Shibata Y, Hall H, Greenleaf AL, Strahl BD. 2005. A Novel Domain in 1074 Set2 Mediates RNA Polymerase II Interaction and Couples Histone H3 K36 1075 Methylation with Transcript Elongation. Molecular and cellular biology 25: 3305-3316. 1076 Klose RJ, Bird AP. 2006. Genomic DNA methylation: the mark and its mediators. Trends in 1077 biochemical sciences 31: 89-97. 1078 Klose RJ, Yamane K, Bae Y, Zhang D, Erdjument-Bromage H, Tempst P, Wong J, Zhang Y. 1079 2006. The transcriptional repressor JHDM3A demethylates trimethyl histone H3 lysine 1080 9 and lysine 36. Nature 442: 312-316. 1081 Kouzarides T. 2007. Chromatin modifications and their function. Cell 128: 693-705. 1082 Koyama-Nasu R, David G, Tanese N. 2007. The F-box protein Fbl10 is a novel transcriptional 1083 repressor of c-Jun. Nature cell biology 9: 1074-1080. 1084 Krogan NJ, Kim M, Tong A, Golshani A, Cagney G, Canadien V, Richards DP, Beattie BK, 1085 Emili A, Boone C et al. 2003. Methylation of histone H3 by Set2 in Saccharomyces 1086 cerevisiae is linked to transcriptional elongation by RNA polymerase II. Molecular and 1087 cellular biology 23: 4207-4218. 1088 Kuo AJ, Cheung P, Chen K, Zee BM, Kioi M, Lauring J, Xi Y, Park BH, Shi X, Garcia BA et al. 1089 2011. NSD2 links dimethylation of histone H3 at lysine 36 to oncogenic programming. 1090 Molecular cell 44: 609-620. 1091 Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nature methods 1092 9: 357-359. 1093 Lee JH, Voo KS, Skalnik DG. 2001. Identification and characterization of the DNA binding 1094 domain of CpG-binding protein. The Journal of biological chemistry 276: 44669-44676. 1095 Li B, Carey M, Workman JL. 2007. The role of chromatin during transcription. Cell 128: 707- 1096 719. 1097 Li B, Howe L, Anderson S, Yates JR, 3rd, Workman JL. 2003. The Set2 histone 1098 methyltransferase functions through the phosphorylated carboxyl-terminal domain of 1099 RNA polymerase II. The Journal of biological chemistry 278: 8897-8903. 1100 Li B, Jackson J, Simon MD, Fleharty B, Gogol M, Seidel C, Workman JL, Shilatifard A. 2009a. 1101 Histone H3 lysine 36 dimethylation (H3K36me2) is sufficient to recruit the Rpd3s 1102 histone deacetylase complex and to repress spurious transcription. The Journal of 1103 biological chemistry 284: 7970-7976. 1104 Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 1105 2009b. The Sequence Alignment/Map format and SAMtools. Bioinformatics (Oxford, 1106 England) 25: 2078-2079. 1107 Li M, Phatnani HP, Guan Z, Sage H, Greenleaf AL, Zhou P. 2005. Solution structure of the 1108 Set2-Rpb1 interacting domain of human Set2 and its interaction with the 1109 hyperphosphorylated C-terminal domain of Rpb1. Proceedings of the National 1110 Academy of Sciences of the United States of America 102: 17636-17641. 1111 Long HK, Blackledge NP, Klose RJ. 2013a. ZF-CxxC domain-containing proteins, CpG islands 1112 and the chromatin connection. Biochemical Society transactions 41: 727-740. 1113 Long HK, Sims D, Heger A, Blackledge NP, Kutter C, Wright ML, Grutzner F, Odom DT, 1114 Patient R, Ponting CP et al. 2013b. Epigenetic conservation at gene regulatory 1115 elements revealed by non-methylated DNA profiling in seven vertebrates. eLife 2: 1116 e00348. 1117 Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for 1118 RNA-seq data with DESeq2. Genome Biol 15: 550. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1119 Lucio-Eterovic AK, Singh MM, Gardner JE, Veerappan CS, Rice JC, Carpenter PB. 2010. 1120 Role for the nuclear receptor-binding SET domain protein 1 (NSD1) methyltransferase 1121 in coordinating lysine 36 methylation at histone 3 with RNA polymerase II function. 1122 Proceedings of the National Academy of Sciences of the United States of America 107: 1123 16952-16957. 1124 McDaniel SL, Strahl BD. 2017. Shaping the cellular landscape with Set2/SETD2 methylation. 1125 Cellular and molecular life sciences : CMLS 74: 3317-3334. 1126 Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman 1127 W, Kim TK, Koche RP et al. 2007. Genome-wide maps of chromatin state in pluripotent 1128 and lineage-committed cells. Nature 448: 553-560. 1129 Orlando DA, Chen MW, Brown VE, Solanki S, Choi YJ, Olson ER, Fritz CC, Bradner JE, 1130 Guenther MG. 2014. Quantitative ChIP-Seq normalization reveals global modulation 1131 of the epigenome. Cell reports 9: 1163-1170. 1132 Pedersen MT, Agger K, Laugesen A, Johansen JV, Cloos PA, Christensen J, Helin K. 2014. 1133 The demethylase JMJD2C localizes to H3K4me3-positive transcription start sites and 1134 is dispensable for embryonic development. Molecular and cellular biology 34: 1031- 1135 1045. 1136 Pedersen MT, Kooistra SM, Radzisheuskaya A, Laugesen A, Johansen JV, Hayward DG, 1137 Nilsson J, Agger K, Helin K. 2016. Continual removal of H3K9 promoter methylation 1138 by Jmjd2 demethylases is vital for ESC self-renewal and early development. Embo j 1139 35: 1550-1564. 1140 Peters AH, Kubicek S, Mechtler K, O'Sullivan RJ, Derijck AA, Perez-Burgos L, Kohlmaier A, 1141 Opravil S, Tachibana M, Shinkai Y et al. 2003. Partitioning and plasticity of repressive 1142 histone methylation states in mammalian chromatin. Molecular cell 12: 1577-1589. 1143 Pokholok DK, Harbison CT, Levine S, Cole M, Hannett NM, Lee TI, Bell GW, Walker K, Rolfe 1144 PA, Herbolsheimer E et al. 2005. Genome-wide map of nucleosome acetylation and 1145 methylation in yeast. Cell 122: 517-527. 1146 Quinlan AR. 2014. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Current 1147 protocols in bioinformatics 47: 11.12.11-34. 1148 Rahman S, Sowa ME, Ottinger M, Smith JA, Shi Y, Harper JW, Howley PM. 2011. The Brd4 1149 extraterminal domain confers transcription activation independent of pTEFb by 1150 recruiting multiple proteins, including NSD3. Molecular and cellular biology 31: 2641- 1151 2652. 1152 Ram O, Goren A, Amit I, Shoresh N, Yosef N, Ernst J, Kellis M, Gymrek M, Issner R, Coyne 1153 M et al. 2011. Combinatorial patterning of chromatin regulators uncovered by genome- 1154 wide location analysis in human cells. Cell 147: 1628-1639. 1155 Ramírez F, Ryan DP, Grüning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dündar F, 1156 Manke T. 2016. deepTools2: a next generation web server for deep-sequencing data 1157 analysis. Nucleic acids research 44: W160-W165. 1158 Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F. 2013. Genome engineering using 1159 the CRISPR-Cas9 system. Nature Protocols 8: 2281. 1160 Robin P, Fritsch L, Philipot O, Svinarchuk F, Ait-Si-Ali S. 2007. Post-translational modifications 1161 of histones H3 and H4 associated with the histone methyltransferases Suv39h1 and 1162 G9a. Genome biology 8: R270-R270. 1163 Saxonov S, Berg P, Brutlag DL. 2006. A genome-wide analysis of CpG dinucleotides in the 1164 human genome distinguishes two distinct classes of promoters. Proceedings of the 1165 National Academy of Sciences of the United States of America 103: 1412-1417. 1166 Schaft D, Roguev A, Kotovic KM, Shevchenko A, Sarov M, Shevchenko A, Neugebauer KM, 1167 Stewart AF. 2003. The histone 3 lysine 36 methyltransferase, SET2, is involved in 1168 transcriptional elongation. Nucleic acids research 31: 2475-2482. 1169 Schotta G, Sengupta R, Kubicek S, Malin S, Kauer M, Callén E, Celeste A, Pagani M, Opravil 1170 S, De La Rosa-Velazquez IA et al. 2008. A chromatin-wide transition to H4K20 1171 monomethylation impairs genome integrity and programmed DNA rearrangements in 1172 the mouse. Genes & development 22: 2048-2061. 1173 Schübeler D. 2015. Function and information content of DNA methylation. Nature 517: 321. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1174 Shen C, Ipsaro JJ, Shi J, Milazzo JP, Wang E, Roe JS, Suzuki Y, Pappin DJ, Joshua-Tor L, 1175 Vakoc CR. 2015. NSD3-Short Is an Adaptor Protein that Couples BRD4 to the CHD8 1176 Chromatin Remodeler. Molecular cell 60: 847-859. 1177 Song L, Zhang Z, Grasfeder LL, Boyle AP, Giresi PG, Lee BK, Sheffield NC, Graf S, Huss M, 1178 Keefe D et al. 2011. Open chromatin defined by DNaseI and FAIRE identifies 1179 regulatory elements that shape cell-type identity. Genome research 21: 1757-1767. 1180 Spitz F, Furlong EE. 2012. Transcription factors: from enhancer binding to developmental 1181 control. Nature reviews Genetics 13: 613-626. 1182 Strahl BD, Grant PA, Briggs SD, Sun ZW, Bone JR, Caldwell JA, Mollah S, Cook RG, 1183 Shabanowitz J, Hunt DF et al. 2002. Set2 is a nucleosomal histone H3-selective 1184 methyltransferase that mediates transcriptional repression. Molecular and cellular 1185 biology 22: 1298-1306. 1186 Sun XJ, Wei J, Wu XY, Hu M, Wang L, Wang HH, Zhang QH, Chen SJ, Huang QH, Chen Z. 1187 2005. Identification and characterization of a novel human histone H3 lysine 36- 1188 specific methyltransferase. The Journal of biological chemistry 280: 35261-35271. 1189 Suzuki E, Nakayama M. 2011. VCre/VloxP and SCre/SloxP: new site-specific recombination 1190 systems for genome engineering. Nucleic acids research 39: e49-e49. 1191 Tan M-KM, Lim H-J, Bennett EJ, Shi Y, Harper JW. 2013. Parallel SCF adaptor capture 1192 proteomics reveals a role for SCFFBXL17 in NRF2 activation via BACH1 repressor 1193 turnover. Molecular cell 52: 9-24. 1194 Tanaka Y, Okamoto K, Teye K, Umata T, Yamagiwa N, Suto Y, Zhang Y, Tsuneoka M. 2010. 1195 JmjC enzyme KDM2A is a regulator of rRNA transcription in response to starvation. 1196 Embo j 29: 1510-1522. 1197 Thomson JP, Skene PJ, Selfridge J, Clouaire T, Guy J, Webb S, Kerr AR, Deaton A, Andrews 1198 R, James KD et al. 2010. CpG islands influence chromatin structure via the CpG- 1199 binding protein Cfp1. Nature 464: 1082-1086. 1200 Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, 1201 Stergachis AB, Wang H, Vernot B et al. 2012. The accessible chromatin landscape of 1202 the human genome. Nature 489: 75-82. 1203 Tsukada Y, Fang J, Erdjument-Bromage H, Warren ME, Borchers CH, Tempst P, Zhang Y. 1204 2006. Histone demethylation by a family of JmjC domain-containing proteins. Nature 1205 439: 811-816. 1206 Tzatsos A, Pfau R, Kampranis SC, Tsichlis PN. 2009. Ndy1/KDM2B immortalizes mouse 1207 embryonic fibroblasts by repressing the Ink4a/Arf locus. Proceedings of the National 1208 Academy of Sciences of the United States of America 106: 2641-2646. 1209 Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG. 2012. 1210 Primer3--new capabilities and interfaces. Nucleic acids research 40: e115-e115. 1211 Voo KS, Carlone DL, Jacobsen BM, Flodin A, Skalnik DG. 2000. Cloning of a mammalian 1212 transcriptional activator that binds unmethylated CpG motifs and shares a CXXC 1213 domain with DNA methyltransferase, human trithorax, and methyl-CpG binding domain 1214 protein 1. Molecular and cellular biology 20: 2108-2121. 1215 Weiner A, Hsieh T-HS, Appleboim A, Chen HV, Rahat A, Amit I, Rando OJ, Friedman N. 2015. 1216 High-resolution chromatin dynamics during a yeast stress response. Molecular cell 58: 1217 371-386. 1218 Whetstine JR, Nottke A, Lan F, Huarte M, Smolikov S, Chen Z, Spooner E, Li E, Zhang G, 1219 Colaiacovo M et al. 2006. Reversal of histone lysine trimethylation by the JMJD2 family 1220 of histone demethylases. Cell 125: 467-481. 1221 Wu X, Johansen JV, Helin K. 2013. Fbxl10/Kdm2b recruits polycomb repressive complex 1 to 1222 CpG islands and regulates H2A ubiquitylation. Molecular cell 49: 1134-1146. 1223 Xiao T, Hall H, Kizer KO, Shibata Y, Hall MC, Borchers CH, Strahl BD. 2003. Phosphorylation 1224 of RNA polymerase II CTD regulates H3 methylation in yeast. Genes & development 1225 17: 654-663. 1226 Xie L, Pelz C, Wang W, Bashar A, Varlamova O, Shadle S, Impey S. 2011. KDM5B regulates 1227 embryonic stem cell self-renewal and represses cryptic intragenic transcription. The 1228 EMBO journal 30: 1473-1484. bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1229 Yu G, Wang J, Lin X, Diao S, Cao Y, Dong R, Wang L, Wang S, Fan Z. 2016. Demethylation 1230 of SFRP2 by histone demethylase KDM2A regulated osteo-/dentinogenic 1231 differentiation of stem cells of the apical papilla. Cell proliferation 49: 330-340. 1232 Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, 1233 Brown M, Li W et al. 2008. Model-based Analysis of ChIP-Seq (MACS). Genome 1234 Biology 9: R137. 1235 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) (B) (C) JmjC ZF-CXXC PHD F Box LRR JmjC JmjC α-KDM2A α-KDM2B KDM2A-LF 0h 96h OHT 0h 96h OHT kDa kDa LF SF LF LF Cx x C Cx x C 130 KDM2A-SF UNT Kdm2a/b-JmjCfl/fl * 130 OHT JmjC JmjC * KDM2B-LF 90 SF SF LF SF x Cx x x C α-BRG1 KDM2B-SF C C

(D) (E) (G) H3K36me2 H3K36me2 1.4 1.2 H3K36me2 1.2 1.0 Q1 UNT OHT UNT 1.0 0.8 OHT

CGI UNT 0.8 CGI OHT Normalised read density non-CGI UNT -10 kb TSS TES + 10 kb non-CGI OHT Normalised read density 0.6 1.2 -3 kb TSS 3 kb

1.0 Q2

0.8

CGI (F) Normalised read density H3K36me2 -10 kb TSS TES + 10 kb 1.4 1.2 1.2 Gene expression

1.0 Q3 1.0

0.8 0.8 intragenic CGI UNT intragenic CGI OHT Normalised read density Normalised read density -10 kb TSS TES + 10 kb 0.6 -3 kb TSS 3 kb 1.2

1.0 Q4 (H) 0.8 non- H3K36me2

CGI Normalised read density -10 kb TSS TES + 10 kb 1.4 (I) H3K36me3 12 Q1 1.0 Q2 Q3 -3 30 8 Q4 non-CGI Distance from TSS (kb) 0.6 Fold change OHT/UNT Fold 4 change 0.6 0.8 1.0 1.2 Q1 Q2 Q3 Q4 non-CGI Read density

0 -10 kb TSS TES + 10 kb

Figure 1 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) (B) (C) cATAC-seq cnRNA-seq

fl/fl fl/fl Kdm2a/b-JmjC Kdm2a/b-JmjC Density

5 2 5 R2 0.000 4 4 88 Cor -0.020

3 3 1 2 2

1 1

0 0 0

−1 −1

−2 −2 −1 −3 −3 Log2 fold change OHT/UNT Log2 fold Log2 fold change OHT/UNT Log2 fold −4 −4 90

−5 −5 change OHT/UNT H3K36me2 log2 fold −2 0 2 4 6 8 10 12 −12 −8 −4 0 4 8 12 16 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 Log2 UNT RPKM Log2 UNT RPKM cnRNA-seq log2 fold change OHT/UNT (D) cnRNA-seq H3K36me2

1.0 0.4

0.5 0.2

0.0 0.0

-0.2 -0.5

-0.4 -1.0 Log2 fold change OHT/UNT Log2 fold change OHT/UNT Q1 Q2 Q3 Q4 Q5 Q1 Q2 Q3 Q4 Q5

Figure 2 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) (B) JmjC α-KDM2A α-KDM2B JmjC 0h 96h OHT 0h 96h OHT JmjC LF SF kDa kDa LF LF 130 LF ∆CxxC LF ∆CxxC LF SF SF x x x x C C C C + OHT 130 100 * CpG island CpG island SF * 70 100 SF (C) SF ∆CxxC SF ∆CxxC Fbxl20 10 kb Fosl2 5 kb

α-BRG1 CGI 75 50 UNT 0 0 cnRNA-seq 75 50 OHT 0 0 70 70 BioCAP 0 0 40 40 (D) KDM2A fl/fl Kdm2a/b-CXXC Density 0 0 40 40 5 KDM2B 4528 0 0 4 (E) (F) 3 1.00 1.0 2 1 0.8 0 0.75 0.6 −1

0.50 old change OHT/UNT −2

Density 0.4 −3

0.25 Log2 f 0.2 −4 531

% CGI-associated genes −5 0.00 -1 0 1 0.0 all −12 −8 −4 0 4 8 12 16 Log2 fold change OHT/UNT Log2 UNT RPKM

Figure 3 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) BioCAP KDM2A KDM2B RING1B SUZ12 80 80 80 CGI 10 60 all CGI 60 60 8 60

densi ty 40 6 40 40 40

Read 4 20 20 20 20 2 0 -5 kb TSS 5 kb -5 kb TSS 5 kb -5 kb TSS 5 kb -5 kb TSS 5 kb -5 kb TSS 5 kb (B) (C) fl/fl fl/fl 1.0 Pcgf1 Density Kdm2a/b-CXXC Density 106 649 Non−CGI 5 5 6527 4528 0.8 PRC 4 783 4 1636 Non−PRC 3 3 0.6 3931 458 2 2

1 1 0.4 0 0

−1 −1 Proportion of genes 10175 2243 0.2 −2 −2 219

−3 −3 0.0 Log2 fold change OHT/UNT Log2 fold −4 −4 all 194 531 fl/fl fl/fl −5 −5 Pcgf1 −12 −8 −4 0 4 8 12 16 −12 −8 −4 0 4 8 12 16

Log2 UNT RPKM Log2 UNT RPKM Kdm2a/b-CXXC (D) (F) genes genes Fold Enrichment Kdm2a/b-CXXCfl/fl 0 1 2 3 spinal cord patterning 3.91E-05 ventral spinal cord development 4.48E-07 cell differentiation in spinal cord 2.66E-06 neuron fate commitment 4.33E-08 fl/fl Pcgf1 cardiac septum morphogenesis 4.01E-06 regulation of epithelial to mesenchymal transition 4.27E-06 176 607 3921 negative regulation of MAP kinase activity 2.40E-05 homophilic cell adhesion via plasmamembrane 6.55E-10 positive regulation of endothelial cell migration 3.56E-05 mesoder m formation 1.85E-05 mammary gland epithelium development 1.90E-05 spinal cord development 1.36E-06 dorsal/ventral pattern formation 1.90E-06 cardiac septum development 1.89E-06 nephron morphogenesis 3.75E-05 renal tubule morphogenesis 3.75E-05 epithelial to mesenchymal transition 9.74E-06 positive regulation of dendrite development 3.68E-05 positive regulation of epithelial cell migration 9.55E-06 cell fate commitment 8.04E-13 (E) (G) Non-PRC genes Non-PRC genes Fold Enrichment 0 1 2 3 positive regulation of proteasomal protein catabolic process 7.77E-06 4 positive regulation of proteasomal ubiquitin-dependent protein catabolism 3.21E-05 response to oxidative stress 8.60E-06 regulation of apoptotic signaling pathway 4.58E-05 negative regulation of intracellular signal transduction 9.90E-06 0 endomembrane system organization 1.96E-05 microtubule-based process 1.08E-05 vesicle- mediated transport 6.81E-06 −4 cytoskeleton organization 3.89E-05 single-organism organelle organization 1.86E-06 Log2 UNT RPKM establishment of protein localization 5.45E-06 −8 protein localization 4.06E-07 protein transport 2.52E-05 cellular response to stress 3.77E-05 macromolecule localization 3.17E-06 Log2 fold change OHT/UNT cellular localization 4.62E-05

Figure 4 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

JmjC (A) (C)

1.00 Kdm2a-CXXCfl/fl LF KDM2ASF KDM2B Kdm2b-CXXCfl/fl Cx Cx Cx Cx fl/fl Kdm2b-CXXC Kdm2a/b-CXXCfl/fl 0.75

+ OHT fl/fl Kdm2a-CXXC CpG island + OHT 0.50 Density

KDM2ALF KDM2BLF 0.25

KDM2BSF KDM2A SF x x C C Cx Cx 0.00 -1 0 1 Log2 fold change OHT/UNT

CpG island CpG island (B) (D) Kdm2a-CXXCfl/fl Kdm2b-CXXCfl/fl Kdm2b-CXXCfl/fl 1.0 5 5 323 Non−CGI 15 4 4 3024 ** 6527 PRC 0.8 99 3 3 1195 ** Non−PRC 2 2 0.6 3931 1 1

0 0 0.4 57 −1 −1 Proportion of genes old change OHT/UNT old change OHT/UNT −2 −2 10175 1506 0.2 −3 −3 64 Log2 f Log2 f ** −4 −4 3 220 0.0 −5 −5 all −12 −8 −4 0 4 8 12 16 −12 −8 −4 0 4 8 12 16 Log2 UNT RPKM Log2 UNT RPKM (E) (F) genes Non-PRC genes Fold Enrichment Fold Enrichment 0 1 2 3 0 1 2 3 cardiac septum morphogenesis 1.60E-05 establishment of localization by movement along microtubule 1.03E-05 mammary gland epithelium development 1.61E-05 cilium organization 4.80E-05 homophilic cell adhesion via plasma membrane 1.84E-07 response to oxidative stress 1.92E-05 regulation of cell-matrix adhesion 3.11E-05 negative regulation of intracellular signal transduction 2.37E-05 bone morphogenesis 1.72E-05 single-organism organelle organization 7.39E-07 cellular response to hypoxia 3.82E-05 cardiac chamber morphogenesis 3.51E-05 axon guidance 4.19E-07 neuron projection guidance 5.04E-07 morphogenesis of a branching epithelium 5.87E-07 (G) (H) branching morphogenesis of an epithelial tube

5.78E-06 )

cell-cell adhesion via plasma-membrane adhesion fl/fl 2.66E-06 2 1 regulation of extrinsic apoptotic signaling pathway 4.31E-05 R 0.588 7 morphogenesis of a branching structure 3.09E-07 Cor 0.705 central nervous system neuron differentiation 7.74E-06 2 morphogenesis of embryonic epithelium 3.27E-05 fl/fl 0 cell fate commitment 7.01E-07 epithelial tube morphogenesis 7.95E-09 canonical Wnt signaling pathway 9.69E-06 0 protein localization to plasma membrane 5.15E-05 −1 / Kdm2a/b-CXXC fl/fl

Kdm2a/b-CXXC −2 Log2 fold change −2

log2 fold change OHT/UNT R2 0.257 931 Cor 0.492 −4 −3 −4 −2 0 2 4 ( Kdm2b-CXXC −5 0 5 10 15 Kdm2b-CXXCfl/fl Log2 UNT RPKM log2 fold change OHT/UNT

Figure 5 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) (C) (E) Kdm2a/b-CXXCfl/fl RNAPII-NTD CGI genes non-CGI genes 1.0 1.0 5 UNT 4 OHT 0.8 0.8 60 decreased 3 Pausing Index 2 0.6 0.6 40 1 0.4 0.4 Fraction Fraction 0 old change OHT/UNT Read density f 20 −1 0.2 UNT 0.2 UNT OHT OHT −2 0.0 0.0 −3 0 -10 kb TSS TES +10 kb −4 −2 0 2 4 6 −4 −2 0 2 4 6 Log2(Pausing Index) Log2(Pausing Index) −4 −5

cATAC-seq log2 0 2 4 6 8 10 12 Log2 UNT RPKM (D) (F) (B) Sdc4 5 kb Ser5P-RNAPII OHT / UNT UNT CGI 60 NTD 280 OHT UNT 0 40 RNAPII 280 OHT 0

70 Read density 20 BioCAP 0 40 KDM2A -10 kb TSS TES +10 kb 0 40 KDM2B Ser2P-RNAPII 0 CGI UNT 60 OHT

40

Smtn12 5 kb Read density 20 CGI 240 UNT -10 kb TSS TES +10 kb 0 RNAPII 240 OHT (G) 0 70 3.0 BioCAP 0 40 KDM2A 2.0 0 non- 40 CGI KDM2B 0 1.0 Fold change relative to RNAPII-NTD OHT/UNT

0.0 -10 kb TSS TES +10 kb CGI non-CGI CGI non-CGI

Fold Ser5P-RNAPII Ser2P-RNAPII change 0.6 1.0 1.4 at promoter in gene body

Figure 6 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) (B) Kdm2a-LF 12.5 kb Kdm2a-SF KDM2A 1.0 Kdm2a-JmjCfl/fl UNT 0.8 JmjC OHT

LoxP LoxP 0.6

Kdm2b-LF 12.5 kb Kdm2b-SF % input 0.4

Kdm2b-JmjCfl/fl 0.2 JmjC 0.0 LoxP LoxP Ncoa2 SuvR Brd2 Bmp7 Pax3 Lhx9 Fgf7 NcoA2 Bmp7 promoter body body (C) CGI promoters non-CGI

1.8 KDM2B ns UNT 0.8 1.6 1.4 ns OHT ns ns 1.2 0.6 1.0 0.8 0.6 0.4 Relative Relative signal 0.4 % input 0.2 0.0 0.2 - + - + - + - + OHT H3 H3K36me1 H3K36me2 H3K36me3

0.0 Ncoa2 SuvR Brd2 Bmp7 Pax3 Lhx9 Fgf7 NcoA2 Bmp7 (D) (E) promoter body body CGI promoters non-CGI 0h 96h OHT 160 3.0 (F) H3 Gene expression 120 8 Q4 Q3 H3K36me1 2.0 6 Q2 Q1 80 4 non-CGI Size (kb) Size PK M H3K36me2 2 CpG density 1.0 40 0

H3K36me3 Log2 R -2 H4 -4 intragenic promoter intragenic promoter -10 kb TSS TES + 10 kb

Figure S1 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) (D) Kdm2a-LF 12.5 kb Kdm2a-SF fl/fl Kdm2a/b-CXXC Density

Kdm2a-CXXCfl/fl 5 1921 sgRNA.1 sgRNA.2 4 CxxC 3

2 LoxP LoxP 1 Kdm2b-LF 12.5 kb Kdm2b-SF 0 −1 Kdm2b-CXXCfl/fl −2

CxxC −3 Log2 fold change OHT/UNT Log2 fold LoxP LoxP −4 1304 −5 −12 −8 −4 0 4 8 12 16 Log2 UNT RPKM (B) KDM2B KDM2A 0.3

0.3 UNT OHT 0.2 0.2 % input % input 0.1 0.1

0.0 0.0 SuvR Bmp7 mmp17 SfiI Rn45s Nat10 Yipf2 Sec1 Upf1 Bmp7 Fgf7 SuvR Bmp7 mmp17 SfiI Rn45s Nat10 Yipf2 Sec1 Upf1 Bmp7 Fgf7 promoters enhancer body promoter promoters enhancer body promoter

CGI non-CGI CGI non-CGI (C) 10 ddPCR cnRNA-seq

8

6

Fold change OHT/UNT 2 1.4 fold no change

0 Igf2 Lamc2 Klf8 Fbxl20 Cdx5rap3 Gmpr2 Mpc2 Nsmce4a Sos1 Top2a Map2k1 Dbt Bzw2 cnRNA-seq padj < 0.05 cnRNA-seq padj > 0.05

Figure S3 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) 4 R2 0.287 Cor 0.468

2

0

−2 log2 fold change OHT/UNT fl/fl

−4 Pcgf1 −4 −2 0 2 4 Kdm2a/b-CXXCfl/fl log2 fold change OHT/UNT

Figure S4 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) (B) Kdm2a-CXXCfl/fl Kdm2b-CXXCfl/fl α-KDM2A α-KDM2B α-KDM2A α-KDM2B

0h 96h OHT 0h 96h OHT 0h 96h OHT 0h 96h OHT kDa kDa kDa LF kDa LF LF LF 130 130 LF ∆CxxC LF ∆CxxC 130 130 100 100 * * * SF * 100 SF SF 70 SF 100 SF ∆CxxC 70 * α-BRG1

SF ∆CxxC α-BRG1

Figure S5 bioRxiv preprint doi: https://doi.org/10.1101/561571; this version posted February 28, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

(A) (B) Kdm2a-CXXCfl/fl Kdm2b-CXXCfl/fl 5 5 Promoter read density (-1kb to +1kb) Pausing index = 4 4 Body read density (+1kb to TES) 3 3 2 2 1 1 0 0

−1 −1 Read density −2 −2 −3 −3 −4 −4 -1kb TSS +1kb TES −5 −5

cATAC-seq log2 fold change OHT/UNT cATAC-seq log2 fold 0 2 4 6 8 10 12 14 0 2 4 6 8 10 12 14 Log2 UNT RPKM Log2 UNT RPKM (C) (F) Promoter Body OHT / UNT R2 0.208 3 R2 0.284 NTD Ser5P Ser2P 2 Cor 0.436 Cor 0.535 2 1 1 0 0

−1

RNAPII-NTD −1

−2 −2 log2 fold change OHT/UNT

−3 −3 CGI −2 0 2 −2 0 2 cnRNA-seq log2 fold change OHT/UNT cnRNA-seq log2 fold change OHT/UNT (D) Promoter Body 2.0 2.0

1.5 1.5

1.0 1.0 RNAPII NTD 0.5 0.5 Fold change OHT/UNT

CGI Q1 Q2 Q3 Q4 CGI Q1 Q2 Q3 Q4 CGI CGI non- (E) CGI RNAPII-NTD 30 UNT OHT 25

20 -10 kb TSS TES +10 kb 0.6 1.0 1.4 15 Read density 10

5 -10 kb TSS TES +10 kb

Figure S6