
bioRxiv preprint doi: https://doi.org/10.1101/059964; this version posted July 7, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

A high density map for navigating the human Polycomb complexome

Simon Hauri1,2,7*, Federico Comoglio3,8*, Makiko Seimiya3, Moritz Gerstung3,9, Timo Glatter1,10, Klaus Hansen4, Ruedi Aebersold1,5, Renato Paro3,6, Matthias Gstaiger1,2† and Christian Beisel3†

1 Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
2 Competence Center Personalized Medicine UZH/ETH, Zürich, Switzerland
3 Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
4 Biotech Research and Innovation Centre (BRIC) and Centre for Epigenetics, University of Copenhagen, Copenhagen, Denmark
5 Faculty of Science, University of Zürich, Zürich, Switzerland
6 Faculty of Sciences, University of Basel, Basel, Switzerland
7 Present address: Department of Clinical Sciences, Lund University, Lund, Sweden
8 Present address: Department of Haematology, Cambridge Institute for Medical Research and Wellcome Trust/MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom
9 Present address: European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
10 Present address: Mass spectrometry and proteomics, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany

∗ Equal contribution † Corresponding authors. M.G: [email protected]; C.B: [email protected]

Abstract

Polycomb group (PcG) proteins are major determinants of silencing and epigenetic memory in higher eukaryotes. Here, we used a robust affinity purification mass spectrometry (AP-MS) approach to systematically map the human PcG interactome, uncovering an unprecedented breadth of PcG complexes. The resulting high-density protein interaction data identified new modes of combinatorial PcG complex formation with proteins not previously associated with the PcG system, thus providing new insights into their molecular function and their mechanisms of recruitment to target genes. Importantly, we identified two human PR-DUB deubiquitination complexes, which comprise the O-linked N-acetylglucosamine transferase OGT1 and a number of factors. By further mapping the genome-wide binding of PR-DUB components, we conclude that the human PR-DUB and PRC1 complexes bind distinct sets of target genes and affect different cellular processes in mammals.

Introduction

Cell division requires faithful replication of the genome and restoration of specific chromatin states that form the basis of epigenetic memory [1]. Polycomb group (PcG) proteins, originally identified in Drosophila melanogaster as epigenetic regulators that stably maintain the repressed state of homeotic genes throughout development, are key players in this process. Numerous studies have now established a central role for PcG proteins in the dynamic control of hundreds of target genes in metazoans, including genes associated with fundamental signaling pathways [2]. Hence, biological processes regulated by PcG proteins encompass cell differentiation, tissue regeneration and cancer cell growth [3-5].

The PcG system is organized in multimeric repressive protein complexes containing distinct chromatin-modifying activities, which affect transcriptional regulation by modulating chromatin structure. In Drosophila, five distinct PcG complexes displaying different biochemical functions have been reported. Polycomb Repressive Complex 2 (PRC2) contains Enhancer of Zeste, which trimethylates lysine 27 of histone H3 (H3K27me3) [6,7], while the PRC1 subunit Polycomb provides binding specificity to H3K27me3 through its chromo-domain [8,9]. In addition, PRC1 contains the dRing protein, which catalyzes the mono-ubiquitination of histone H2A on lysine 118 (H2AUb1), thereby blocking RNA polymerase II activity [10-12]. The Pho (Pleiohomeotic, the Drosophila homolog of mammalian YY1) repressive complex PhoRC combines DNA- and histone tail-binding specificities [13], the PRC1-related dRing-associated factors complex dRAF contains the H3K36-specific histone demethylase dKDM2 [14], and the Polycomb repressive deubiquitinase (PR-DUB) targets H2AUb1 [15].

Although the core components of Drosophila PcG complexes seem rather fixed, we and others have shown that they can be co-purified with different sets of accessory proteins, thus increasing the diversity of the PcG system [13,16-18]. Epigenomic profiling revealed that distinct PcG complexes target largely overlapping gene sets in Drosophila, and mechanistic details of PcG recruitment to target genes are beginning to emerge [15,19-22].

In contrast, the mammalian PcG system is less well defined and appears to be significantly more complex. Each Drosophila PcG subunit has up to six human homologs, which combinatorially assemble in different complexes [23-26]. The six homologs of the Drosophila PRC1 core protein Psc, PCGF1-6, purify together with RING2, the homolog of dRing, in different complexes named PRC1.1-PRC1.6, each of which associates with specific additional components [23-27]. These PRC1 complexes are further distinguished by the mutually exclusive presence of RYBP or a chromo-domain-containing CBX protein. Five different CBX proteins displaying differential affinities for lysine-methylated histone H3 tails and RNA have been linked to PRC1 [28]. In contrast, the absence of a chromo-domain within RYBP suggests that recruitment of CBX- and RYBP-containing PRC1 complexes might be mediated by H3K27 methylation or be independent of it, respectively. Indeed, recent work showed that the histone demethylase Kdm2b targets PRC1.1 via direct binding to unmethylated CpG islands [29-31]. Interestingly, incorporation of RING2 into different PCGF complexes not only leads to differential recruitment to chromatin but also differentially regulates its enzymatic activity [23,29,31-34].

Similar to PRC1, the histone methyltransferase (HMT) activity of PRC2 is potentially modulated by accessory components such as the Polycomb-like homologs PHF1, PHF19 and MTF2 [35,36]. Additional DNA-binding interaction partners such as JARD2 and AEBP2 might mediate recruitment of the complexes to chromatin [37-41]. However, whether the PRC2 core, consisting of EED, SUZ12 and EZH2, simultaneously interacts with all of these components or whether distinct complexes coexist remains unknown. Moreover, mammalian PhoRC and PR-DUB have not been identified to date.

Understanding PcG-mediated epigenetic regulation in mammals requires a detailed understanding of the dynamic assembly of PcG complexes. A necessary step towards this goal is the exhaustive definition of the composition of individual PcG complexes, including all accessory proteins, which likely convey distinct functional effects. Here we present the first systematic and comprehensive high-density map of the modular organization of the human PcG system, obtained with a sensitive double-affinity purification and mass spectrometry (AP-MS) method [42,43]. The resulting map of 1400 interactions among 490 proteins considerably refines the human PRC1 and PRC2 network topology, including their relation to the heterochromatin silencing system, and identifies several novel interaction partners. Furthermore, we determined the composition of the human PR-DUB. We found that this highly diverse complex contains MBD proteins, FOXK transcription factors and OGT1, an O-linked N-acetylglucosamine (O-GlcNAc) transferase implicated in PcG silencing in Drosophila [44]. Finally, chromatin profiling of PR-DUB components and comparison with published chromatin maps of PcG proteins indicate that, unlike in Drosophila, PRC1 and PR-DUB regulate distinct sets of genes in human cells.

Results and Discussion

Systematic mapping of the human PcG interaction proteome

To investigate the human PcG protein interaction network, we applied a systematic proteomics approach based on our previously reported AP-MS protocol in HEK293 cells [42]. The method employs Flp-In HEK293 stable cell lines expressing Strep-HA-tagged fusion proteins upon tetracycline induction (Fig. 1a). Initially, we selected 28 PcG proteins homologous to Drosophila core complex components and performed AP-MS experiments using these proteins as primary baits (Supplementary Fig. 1a). Then, based on the interaction data observed for this set, we chose 36 additional secondary bait proteins (Supplementary Fig. 1a, Supplementary Table 1). After double affinity purification, bait-associated proteins (preys) were identified by liquid chromatography tandem mass spectrometry (LC-MS/MS; Fig. 1a).

At least two biological replicates were measured for each bait protein, for a total of 174 AP-MS measurements. Proteins were identified using the X!Tandem search tool to match mass spectra to peptides and the Trans-Proteomic Pipeline (TPP) to map peptides to proteins, at a false discovery rate (FDR) of less than 1% [45,46]. The resulting raw data set contained 930 proteins exhibiting 9856 candidate interactions.
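The meaning of a 1% FDR cutoff can be illustrated with the generic target-decoy idea: spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and the number of decoy hits above a score threshold estimates the number of false target hits. The TPP itself estimates error rates with its own probabilistic models (PeptideProphet and ProteinProphet), so the Python sketch below only illustrates what an FDR threshold means and is not the pipeline used here; the function name and the toy scores are hypothetical.

```python
from itertools import groupby


def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    At a cutoff t, FDR(t) is estimated as #decoys(score >= t) / #targets(score >= t).
    Returns None if no cutoff satisfies the bound.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    # Walk from the best score downwards; tied scores are added as one block,
    # because a cutoff cannot separate matches with identical scores.
    for score, block in groupby(ranked, key=lambda p: p[0]):
        for _, is_decoy in block:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score  # keep the most permissive cutoff that still passes


    return cutoff


# Hypothetical toy data: 100 target matches scored 101..200 and two decoys.
toy = [(s, False) for s in range(101, 201)] + [(150, True), (50, True)]
print(score_cutoff_at_fdr(toy))  # prints 101 (1 decoy / 100 targets = 1% FDR)
```

Keeping the most permissive cutoff that still satisfies the bound is equivalent to accepting every match whose q-value is at most `max_fdr`.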

To efficiently discriminate biologically relevant interaction partners from contaminant proteins, we devised a stringent filtering procedure based on both the WDN-score [47] and the average enrichment over control purifications for each bait-prey pair. This filtering strategy retained 490 high-confidence interacting proteins (HCIPs) encompassing 1400 interactions (1193 unidirectional and 207 reciprocal). Our data set is characterized by an average of 21.9 HCIPs per bait protein, and 75% of the interactions have not yet been annotated in public databases (Supplementary Fig. 1b).
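The filtering logic and the way interactions are counted can be sketched as follows. The exact WDN-score and enrichment cutoffs are not stated in this section, so the thresholds, protein names and scores below are hypothetical placeholders. An interaction is counted once per protein pair; it is reciprocal when each partner retrieves the other as a bait and unidirectional otherwise, which is how the 1400 interactions split into 1193 unidirectional and 207 reciprocal ones.

```python
def filter_pairs(scored_pairs, min_wdn, min_enrichment):
    """Keep directed (bait, prey) pairs that pass BOTH cutoffs.

    scored_pairs: iterable of (bait, prey, wdn_score, enrichment_over_controls).
    The cutoff values are hypothetical; the paper's thresholds are not given here.
    """
    return {
        (bait, prey)
        for bait, prey, wdn, enrichment in scored_pairs
        if bait != prey and wdn >= min_wdn and enrichment >= min_enrichment
    }


def summarize(directed):
    """Count proteins and undirected interactions, split by reciprocity.

    The protein count here includes every protein in a retained pair (baits too).
    """
    undirected = {frozenset(pair) for pair in directed}
    reciprocal = {frozenset((a, b)) for a, b in directed if (b, a) in directed}
    proteins = {p for pair in directed for p in pair}
    return {
        "proteins": len(proteins),
        "interactions": len(undirected),
        "reciprocal": len(reciprocal),
        "unidirectional": len(undirected) - len(reciprocal),
    }


# Toy input (made-up names and scores): A and B are baits that find each other.
toy = [
    ("A", "B", 0.9, 20.0),
    ("B", "A", 0.8, 15.0),
    ("A", "C", 0.7, 8.0),
    ("A", "X", 0.1, 1.2),  # low-scoring, contaminant-like pair: removed
    ("D", "A", 0.6, 12.0),
]
print(summarize(filter_pairs(toy, min_wdn=0.5, min_enrichment=5.0)))
# {'proteins': 4, 'interactions': 3, 'reciprocal': 1, 'unidirectional': 2}
```

Requiring both cutoffs at once mirrors the text's "both WDN-score and average enrichment"; a reciprocal pair contributes a single interaction, not two.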

To evaluate the specificity and sensitivity of our AP-MS data, we considered the two bait proteins exhibiting the highest number of HCIPs, SKP1 (79 HCIPs) and WDR5 (73), and performed a cross-validation with literature-based reports. SKP1 serves as an adaptor for F-box proteins and CUL1, and confers enzymatic specificity. Of its 79 HCIPs, our SKP1 purifications identified 42 F-box proteins (Supplementary Fig. 1c). Furthermore, a previous AP-MS study investigating the interaction partners of WDR5 [48] identified a set of 21 proteins associating with this scaffold protein, which takes part in the assembly of several chromatin-regulating complexes (reviewed in Migliori et al. [49]). Notably, while we were able to recall 76% of the previously reported interaction partners, our experiments identified an additional set of 48 proteins (Supplementary Fig. 1d) co-purifying with WDR5, encompassing MLL complexes, the NSL complex, the ADA2/GCN5/ADA3 transcription activator complex, the mTORC2 components RICTOR and SIN1, and the Polycomb repressive complex PRC1.6 (Supplementary Fig. 1e).
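The WDR5 comparison amounts to a recall calculation against the literature set. This section gives only the percentage, so the count of 16 recovered partners below is inferred: 16 of 21 (about 76.2%) is the only integer count that rounds to 76% (15/21 is about 71% and 17/21 about 81%). The protein identifiers are hypothetical placeholders.

```python
def compare_to_reference(reference, observed):
    """Return (recall, number of novel partners) for one bait.

    recall = |reference & observed| / |reference|; novel partners are
    observed proteins that are absent from the reference set.
    """
    recovered = reference & observed
    novel = observed - reference
    return len(recovered) / len(reference), len(novel)


# Placeholder identifiers: 21 literature partners, 16 of them recovered,
# plus 48 partners not reported before (toy set, not the full WDR5 HCIP list).
literature = {f"lit{i}" for i in range(21)}
ours = {f"lit{i}" for i in range(16)} | {f"new{i}" for i in range(48)}
recall, n_novel = compare_to_reference(literature, ours)
print(f"recall = {recall:.1%}, novel = {n_novel}")  # recall = 76.2%, novel = 48
```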


Hierarchical clustering assigns HCIPs to PcG complexes

To determine the topology of our protein interaction network, we performed hierarchical clustering of HCIPs using a rank-based correlation dissimilarity measure (see Supplementary Methods for details). Clustering revealed a modular organization built upon the three major PcG assemblies PRC1, PRC2 and PR-DUB, and HP1-associated complexes (Fig. 1b-c).
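The Supplementary Methods are not part of this section, so the Python sketch below only illustrates one common choice consistent with the description: Spearman rank correlation between interaction profiles, turned into a dissimilarity as 1 - rho, followed by average-linkage agglomerative clustering. The profile representation (one score per bait for each protein), the linkage method and the toy values are assumptions.

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / var ** 0.5 if var > 0 else 0.0  # constant profile: treat as uncorrelated


def average_linkage(profiles, n_clusters):
    """Merge the closest clusters (mean pairwise 1 - rho) until n_clusters remain."""
    dist = {}
    for a, b in combinations(profiles, 2):
        dist[a, b] = dist[b, a] = 1.0 - spearman(profiles[a], profiles[b])
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy profiles (hypothetical scores of four proteins across four baits).
toy = {
    "P1": [10, 9, 0, 1],
    "P2": [8, 7, 1, 0],
    "P3": [0, 1, 9, 10],
    "P4": [1, 0, 7, 8],
}
print(average_linkage(toy, n_clusters=2))  # [['P1', 'P2'], ['P3', 'P4']]
```

Using ranks rather than raw scores makes the dissimilarity insensitive to monotone differences in scale between proteins, which is one reason to prefer a rank-based measure for AP-MS intensities.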

PRC1 represents the most elaborate and heterogeneous assembly, containing four groups of complexes defined by the six PCGF proteins: PRC1.1 (PCGF1), PRC1.2/PRC1.4 (PCGF2/4), PRC1.3/PRC1.5 (PCGF3/5) and PRC1.6 (PCGF6). Among these PRC1 assemblies, PRC1.6 further provides links to the heterochromatin control system via the HP1 chromobox proteins CBX1 and CBX3. Although analyses of the PRC1 topology have recently been reported in studies concentrating on specific subunits in various cellular systems [23,50-52], our systematic high-density interaction data allowed us to further refine the composition of the PRC1 module. In the following discussion we focus on these novel findings regarding PRC1 organization, which are illustrated in Fig. 1b-e and Supplementary Fig. 3, and detailed in Supplementary Table 2.

All four PRC1 assemblies share a common core encompassing the E3 ubiquitin ligases RING1 and RING2 and, with the exception of PRC1.2/PRC1.4, RYBP and YAF2. Interestingly, PCGF2/4 also interact with RYBP and YAF2. As these proteins do not share any additional interaction partner besides RING1/2 (Supplementary Fig. 3b), RYBP/YAF2-PCGF2/4-RING complexes might have limited functionality compared to other PRC1 complexes, or might correspond to transient intermediates formed before specific canonical and non-canonical PRC1 holo-complexes assemble. Furthermore, we did not detect any protein stably associating with all canonical PRC1 core members (RING1/2, PHC1-3, CBX2/4/6/7/8, PCGF2/4). However, we identified NUFP2 (nuclear fragile X mental retardation-interacting protein 2), a putative RNA-binding protein exhibiting interactions with CBX2/6/7, PHC3 and PCGF4, as a new PRC1-interacting protein (Supplementary Fig. 3b).

The PRC2 complex is separated from both PRC1 and HP1 (Fig. 1c). The two characteristic histone-binding proteins RBBP4 and RBBP7 not only belong to the PRC2 core along with SUZ12, EED and EZH1/2, but also take part in other protein complexes such as LINC, NURF, NURD and SIN3 (Supplementary Fig. 2a).

Finally, we identified the PcG complex PR-DUB, defined by ASXL1/2 and BAP1 (Fig. 1c). Our clustering analysis also revealed complexes, such as the TCP chaperonin and the proteasomal lid, that primarily consist of prey proteins (Fig. 1b). Of note, several proteins belonging to MLL complexes share interactions with PRC1.3/PRC1.5 (via CSK21/22), PRC1.6 (via WDR5) and PR-DUB (via OGT1) (Fig. 1c). In contrast, interaction modules centered on LMBL1/3/4, SUV92 and TRIPC, LCOR, ZN211, and YY1 (the homolog of Drosophila Pho) are more disconnected and tend to be sparse (Fig. 1c, Fig. 2d and Supplementary Fig. 2b-c). Although YY1 interacts with all subunits of the INO80 chromatin remodeling complex, our AP-MS data do not reveal an equivalent of the Drosophila PhoRC complex (Supplementary Fig. 2b). With the exception of PhoRC, however, we were able to reconstitute all mammalian equivalents of Drosophila PcG protein assemblies in unprecedented detail.

WD40 domain proteins DCAF7 and WDR5 are central scaffolding proteins for PRC1.3/PRC1.5 and PRC1.6

The WD40 domain protein DCAF7 has been implicated in skin development and cell proliferation through its interactions with DIAP1 and the dual-specificity tyrosine phosphorylation-regulated kinase DYR1A [53,54]. Intriguingly, DCAF7 co-purified with CBX4/6/8, RING1/2, RYBP/YAF2 and PCGF3/5/6, indicating that the protein is deeply embedded in the PRC1 module. As recent studies also reported interactions between DCAF7 and members of the canonical PRC1 complex, as well as PCGF3/5/6 [26,27,55,56], we performed DCAF7 purifications to test whether the protein is indeed a universal subunit of several different RING1/2-containing complexes.

Our DCAF7 AP-MS revealed reciprocal interactions with all bait proteins within a cluster centered on PCGF3 and PCGF5 (Fig. 1d and Supplementary Fig. 3c), with no relation to the other PCGF complexes. Moreover, we identified DYR1A/B, DIAP1, the zinc finger transcription factors (ZNFs) ZN503 and ZN703, and the ankyrin-repeat proteins SWAHA and SWAHC as an unrelated module interacting with DCAF7 (Fig. 1d). This result suggests that DCAF7 acts as a scaffold for several different protein complexes.

Like RING1/2, RYBP/YAF2 and PCGF3/5, DCAF7 interacts with the tetrameric casein kinase 2 (CSK2) and the three paralogs AUTS2, FBRS and FBSL. Therefore, to further refine the PRC1.3/PRC1.5 sub-network, we performed AP-MS experiments using the catalytic casein kinase subunits CSK21 and CSK22. Our results confirmed the topology of the PCGF3/5-DCAF7 assemblies and identified CSK2 and the three uncharacterized proteins of the AUTS2 family as part of PRC1.3/PRC1.5 (Fig. 1d and Supplementary Fig. 3c).

The protein PCGF6 was initially purified together with the transcription factors E2F6, MAX, TFDP1 and MGAP, as well as RING1/2, YAF2, LMBL2, CBX3 and the HMTs EHMT1 and EHMT2, an assembly denoted E2F6.com [57]. However, subsequent studies were unable to recover the entire (holo) E2F6.com [23,27,58,59]. Moreover, recent data suggest that PCGF6 and RING2 might interact with the WD40 domain protein WDR5 [23]. We therefore decided to revisit the topology of the PCGF6-E2F6 network and to probe WDR5 connectivity by adding MAX, TFDP1, E2F6, LMBL2, CBX3, EHMT2 and WDR5 to our bait collection. Our AP-MS experiments revealed a high-density network including reciprocal interactions between all baits within this set except EHMT2 (Fig. 1e and Supplementary Fig. 2c), thus demonstrating that the major PRC1.6 complex resembles E2F6.com. In addition, MGAP, MAX, TFDP1 and E2F6 purifications revealed a rich set of transcription factors that can heterodimerize with these proteins but are not part of PRC1.6, as they did not connect to any other component thereof (Fig. 1e).

Recently, WDR5 was also reported to be part of the Non-Specific Lethal (NSL) complex and to form a trimeric complex with RBBP5 and ASH2L, which stimulates the H3K4-specific activity of the SET1 HMT family members SET1A, SET1B and MLL1-4 [48,60-62]. Interestingly, while we recalled these interactions, we additionally detected reciprocal interactions of WDR5 with all PRC1.6 subunits, thus demonstrating that WDR5 is a component of both activating and repressing chromatin-modifying complexes.

Taken together, our results identify the WD40 domain proteins DCAF7 and WDR5 as subunits of PRC1.3/PRC1.5 and PRC1.6, respectively. Importantly, recent studies suggested that the diversity of PRC1 complexes might be specified by binding preferences of PCGF proteins, which are mediated by their C-terminal RING finger- and WD40-associated ubiquitin-like (RAWUL) domain [63,64]. For example, the PCGF1 and PCGF2/4 RAWUL domains selectively interact with BCOR/BCORL and PHC proteins, respectively [63]. Since no interaction partners of the PCGF3/5 and PCGF6 RAWUL domains have been experimentally identified to date, and since WD40 domain-containing proteins often scaffold multisubunit complexes [49], we propose that DCAF7 and WDR5 may serve as central scaffolding proteins for PRC1.3/PRC1.5 and PRC1.6.


CBX1 partitions into several distinct heterochromatin complexes, including PRC1.6

In contrast to previous studies, which reported CBX3 as the only heterochromatin protein within E2F6.com, we unexpectedly detected CBX1 in all our PRC1.6-related pull-down experiments. To corroborate this finding, we performed AP-MS experiments with CBX1, using the constitutive heterochromatin protein CBX5 as a control. Our results indicate that while CBX5 is disconnected from the PCGF6-E2F6 network, components therein interact with CBX1 (Fig. 1e and Supplementary Table 2). Furthermore, they validate interactions of EHMT2 with CBX1 and CBX3 and, to our surprise, separate EHMT2 and EHMT1 from PRC1.6, suggesting a distinct complex containing CBX1/3, EHMT1/2 and ZNF proteins, as well as the KRAB-ZNF-interacting co-repressor protein TIF1B (Supplementary Fig. 4a).

Since the PcG and heterochromatin silencing systems are functionally and molecularly related through the PcG proteins CBX2/4/6/7/8 and the HP1 proteins CBX1/3/5 (reviewed in Beisel and Paro [65]), we further explored the CBX1/3/5 core of our network in search of potential connections between these two systems. This survey led to a refined topology of CBX1/3/5-containing complexes and identified new interaction partners (Supplementary Fig. 4b-e). However, we did not detect additional connections to PcG proteins, suggesting limited direct cross-talk between the protein components of the two silencing systems.

The PRC2 core partitions into two different classes of complexes

While the functional core complex of PRC2 is composed of SUZ12, EED, RBBP4/7 and either EZH1 or EZH2, additional accessory proteins have been identified that may regulate the H3K27 HMT activity of the complex and its recruitment to chromatin [66-68]. However, how these proteins are organized within PRC2, or whether they assemble into independent PRC2 subcomplexes, remains largely unresolved. To elucidate the topological organization of PRC2 complexes, we performed AP-MS experiments using 14 reported PRC2-associated proteins (Supplementary Fig. 1a).

Hierarchical clustering analysis assigned all PRC2 baits to a single cluster exhibiting high intra-cluster correlations (Fig. 1b and 2a) and forming a high-density interaction network (Fig. 2b). However, when reciprocal interactions were taken into account, our data revealed two fundamental alternative assemblies linked to the PRC2 core: the first defined by AEBP2 and JARD2, and the second by the mutually exclusive binding of one of the three Polycomb-like homologs (PCLs) PHF1, PHF19 and MTF2 (Fig. 2b).
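The reasoning that separates the two PRC2 wings can be phrased as a simple graph query: take the partners that are reciprocally linked to a core subunit, then ask which pairs of those partners are themselves connected. The Python sketch below uses hypothetical protein names and edges; X1/X2 play the role of a co-purifying pair such as AEBP2 and JARD2, and Y1-Y3 that of the PCLs. In AP-MS data the absence of an edge only suggests exclusivity, since it can also reflect limited sensitivity.

```python
from itertools import combinations


def reciprocal_partners(edges, core):
    """Partners that retrieve `core` and are retrieved by it (bait -> prey edges)."""
    return {prey for bait, prey in edges if bait == core and (prey, core) in edges}


def classify_partner_pairs(edges, core):
    """Label each pair of reciprocal partners as linked or not linked."""
    labels = {}
    for x, y in combinations(sorted(reciprocal_partners(edges, core)), 2):
        linked = (x, y) in edges or (y, x) in edges
        labels[x, y] = "co-occurring" if linked else "candidate mutually exclusive"
    return labels


# Hypothetical directed bait -> prey edges: every partner is reciprocal with CORE.
edges = {
    edge
    for partner in ["X1", "X2", "Y1", "Y2", "Y3"]
    for edge in (("CORE", partner), (partner, "CORE"))
}
edges.add(("X1", "X2"))  # X1 retrieves X2: the two co-occur

for pair, label in classify_partner_pairs(edges, "CORE").items():
    print(pair, label)
```

In this toy graph only the X1/X2 pair is labeled co-occurring; every other pair, including X1 with any Y, is flagged as a candidate alternative assembly, mirroring the two-wing interpretation.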

Taken together, our results identify two structurally distinct classes of PRC2 complexes. We therefore propose a novel nomenclature for PRC2, in which we refer to the two PRC2 wings as PRC2.1 (mutually exclusive interaction of PHF1, MTF2 or PHF19) and PRC2.2 (simultaneous interaction of AEBP2 and JARD2). AEBP2 and JARD2 can bind DNA directly and have been implicated in the recruitment of PRC2 and the modulation of its enzymatic activity [37-39,67]. Interestingly, depletion of JARD2 has only a mild effect on global H3K27 methylation levels, suggesting that PRC2.1 might be primarily responsible for maintaining H3K27me3 patterns genome-wide.

C10ORF12 and C17ORF96 are mutually exclusive subunits of the Polycomb-like class of PRC2 complexes

Our purifications of the PRC2 core members and PCLs identified two largely uncharacterized proteins, C10ORF12/LCOR and C17ORF96, as PRC2 interactors (Fig. 2b). Both proteins have recently been shown to reciprocally interact and co-localize on chromatin with EZH2 [66], but their placement within the PRC2 topology and their functional role remained unknown.

Purifications of C17ORF96 confirmed all interactions with PCLs (Fig. 2b), and computational sequence analysis revealed that C17ORF96 is present in all vertebrate genomes. Interestingly, BLAST identified a single protein related to C17ORF96, the SKI/DAC domain containing protein 1 (SKDA1) (Supplementary Fig. 5a). SKDA1 belongs to the DACH family, which is defined by the presence of a SKI/SNO/DAC domain of about 100 amino acids and is involved in various aspects of cell proliferation and differentiation [69,70]. However, C17ORF96 lacks the SKI/SNO/DAC domain, and its homology to SKDA1 is restricted to the C-terminus (53% sequence identity within the last 60 amino acids) (Supplementary Fig. 5a-b), suggesting that this region encodes a hitherto uncharacterized domain. Interestingly, SKDA1 also interacts with EZH1 and SUZ12 (Fig. 2b), suggesting that this putative C-terminal domain mediates the interaction of both C17ORF96 and SKDA1 with the PRC2 core.
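A percent-identity figure such as "53% within the last 60 amino acids" is computed over aligned positions. The Python sketch below assumes a pairwise alignment is already available (gaps written as "-") and counts identical residues over the final alignment columns, scoring gapped columns as mismatches; other conventions, such as excluding gapped columns from the denominator, give slightly different numbers. The sequences are synthetic placeholders, not the C17ORF96 or SKDA1 sequences. If the window is exactly 60 aligned positions, 53% corresponds to 32 identical residues (32/60 is about 53.3%).

```python
def window_identity(aligned_a, aligned_b, window):
    """Percent identity over the last `window` columns of a pairwise alignment.

    Columns where either sequence carries a gap ("-") count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not 0 < window <= len(aligned_a):
        raise ValueError("window must be between 1 and the alignment length")
    columns = list(zip(aligned_a, aligned_b))[-window:]
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / window


# Synthetic alignment: divergent N-terminal part, then a 60-column window
# in which 32 positions match.
seq_a = "MKTAYIAK" + "A" * 60
seq_b = "MQS-YLGR" + "A" * 32 + "G" * 28
print(f"{window_identity(seq_a, seq_b, 60):.1f}%")  # 53.3%
```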

Initial analysis of C10ORF12, the second uncharacterized protein highly connected to the PRC2 core, identified peptides that ambiguously mapped to two distinct UniProt proteins, LCOR and C10ORF12 (Supplementary Fig. 5c-e). These two proteins are encoded by the same genomic locus. Indeed, in contrast to the UniProt database, GenBank contains the LCOR-CRA b (ligand-dependent co-repressor, isoform CRA b, EAW49962.1) entry, in which the N-terminal 111 amino acids of LCOR are fused to C10ORF12, with the two regions separated by a 200-amino-acid spacer (Fig. 2c).

LCOR is a ligand-dependent co-repressor that interacts via its N-terminal domain with nuclear hormone receptors in a complex including CTBP and a number of histone deacetylases [71,72]. While our AP-MS analysis yielded peptides from the LCOR N-terminus, C10ORF12 and the LCOR-CRA b-specific spacer (Supplementary Fig. 5c), peptides from the LCOR C-terminus were missing (Supplementary Fig. 5d), indicating that PRC2 interacts with LCOR-CRA b and potentially with the shorter isoform C10ORF12. To test this possibility, we performed additional AP-MS experiments using LCOR, C10ORF12 and LCOR-CRA b as baits. LCOR purified with its known interaction partners CTBP1 and CTBP2, while PRC2 components were absent from LCOR purifications (Fig. 2d and Supplementary Fig. 5e). In contrast, both LCOR-CRA b and C10ORF12 reciprocally interact with all subunits of the PCL wing of PRC2 (Fig. 2b, Supplementary Fig. 5d-e).

To investigate the functional relevance of this finding, we employed a heterologous reporter system based on a stably integrated, constitutively active luciferase reporter gene responsive to upstream, promoter-proximal GAL4 DNA binding sites (Fig. 2e) [73]. We engineered cell lines containing tetracycline-inducible GAL4-LCOR and GAL4-C10ORF12 expression constructs. Upon induction, both proteins accumulated in the nucleus and were recruited to the GAL4 motifs, resulting in strong repression of luciferase activity (Fig. 2f-h and Supplementary Fig. 5f). To assess whether the repressive activity of C10ORF12 is mediated by recruitment of PRC2 to the target promoter, we performed chromatin immunoprecipitation (ChIP) with an H3K27me3-specific antibody and analyzed the enrichment of luciferase promoter fragments by quantitative PCR. Upon tetracycline induction, the transcription start site (TSS) of the luciferase gene was significantly trimethylated at H3K27 in the GAL4-C10ORF12-expressing cell line (Fig. 2i). In contrast, although GAL4-LCOR was expressed at higher levels than GAL4-C10ORF12 (Fig. 2f) and exhibited a 10- to 20-fold increase in its binding to the reporter (Fig. 2h), no significant H3K27me3 enrichment was observed upon expression of this protein.
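This section does not state how the ChIP-qPCR signal was normalized. A common approach, shown here only as an assumed illustration, expresses each immunoprecipitate as percent of input after correcting the input Ct for the fraction of chromatin used as input, and then compares induced with uninduced cells. The sketch assumes 100% PCR efficiency (one doubling per cycle); the function names, the Ct values and the 1% input fraction are invented for the example.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered in the IP, assuming one doubling per PCR cycle.

    input_fraction: share of chromatin amplified as input (0.01 for a 1% input).
    The input Ct is shifted by log2(1 / input_fraction) so that it represents
    100% of the chromatin used for the IP.
    """
    ct_input_full = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_full - ct_ip)


def fold_change(induced, uninduced):
    """Ratio of percent-input values, e.g. +tetracycline over -tetracycline."""
    return induced / uninduced


# Invented Ct values for an H3K27me3 IP at a reporter TSS amplicon.
minus_tet = percent_input(ct_ip=29.0, ct_input=25.0, input_fraction=0.01)
plus_tet = percent_input(ct_ip=26.0, ct_input=25.0, input_fraction=0.01)
print(round(fold_change(plus_tet, minus_tet), 2))  # 3 cycles earlier -> 8.0-fold
```

With equal input Ct values the input correction cancels in the ratio, so a 3-cycle shift in the IP Ct alone gives 2^3 = 8-fold enrichment; real analyses usually also correct for primer efficiency.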

PCL proteins target PRC2 and positively regulate its enzymatic activity via their ability to bind methylated H3K36 [36,74,75]. However, further experimental investigation will be required to elucidate the exact mechanism by which C17ORF96 and LCOR-CRA b/C10ORF12 influence PRC2.1. An interesting possibility is that LCOR-CRA b recruits PRC2.1 to nuclear hormone binding sites upon ligand binding. Because the interaction with PRC2 is restricted to the C10ORF12 portion, the N-terminus of LCOR would remain free for ligand-responsive interaction with nuclear hormone receptors.

ASXL1 and ASXL2 define optional PR-DUB complexes containing OGT1 and FOXK transcription factors

The Drosophila PcG complex PR-DUB was identified as a heterodimer consisting of the deubiquitinase Calypso and the Asx protein [15]. However, the composition of its human counterpart remains elusive. Thus, we set out to systematically characterize this complex by performing purifications of BAP1, ASXL1 and ASXL2, the human homologs of the Drosophila PR-DUB components. Our AP-MS analysis revealed that BAP1 reciprocally interacts with both ASXL1 and ASXL2 (Fig. 3b). Interestingly, the two ASXL proteins do not interact with each other (Fig. 3b), suggesting the existence of two mutually exclusive PR-DUB complexes, which we named PR-DUB.1 and PR-DUB.2 depending on whether the ASXL partner of BAP1 is ASXL1 or ASXL2, respectively.

Both PR-DUB complexes share a similar set of accessory proteins encompassing the transcription factors FOXK1 and FOXK2, the chromatin-associated proteins MBD5 and MBD6, the transcriptional co-regulator HCFC1 and, most notably, OGT1 (Fig. 3b). A recent attempt to identify BAP1 interaction partners led to the identification of Asxl1, Asxl2, Ogt, Foxk1, Kdm1b and Hcf1 in mouse spleen tissue [76]. Our data support these results and indicate a general, cell type-independent assembly of mammalian PR-DUB complexes. Furthermore, our data clearly implicate OGT1 as a member of mammalian PR-DUB complexes, an interaction that was not identified in the Drosophila PR-DUB complex purification [15], although the Drosophila homolog Ogt was previously annotated as a bona fide PcG protein [44].

OGT1 is the only O-linked N-acetylglucosamine (O-GlcNAc) transferase in mammals. The enzyme catalyzes the addition of a single GlcNAc molecule to serine and threonine residues of many target proteins [77]. OGT1 enzymatic activity is required for mouse development and is essential for embryonic stem cell (ESC) viability [78]. In addition, the protein was found to interact with BAP1 and to localize to chromatin via its interaction with the 5-methylcytosine oxidase TET1 [76,78]. To further refine the connectivity of OGT1 within the PR-DUB network, we performed AP-MS experiments using OGT1 as bait.

This analysis validated the interaction between BAP1 and OGT1 as well as the interactions of OGT1 with TET1 and NCOAT (Fig. 3b), the O-GlcNAcase that counteracts OGT1 activity [78,79]. Moreover, our data identified a second set of OGT1-containing complexes involved in transcriptional regulation that did not co-purify with PR-DUB core subunits (Fig. 3b). These include the ZNFs ZEP1 and ZEP2 and the arginine-specific HMT CARM1. Furthermore, we identified OGT1 as a subunit of WDR5-containing complexes. Indeed, OGT1 interacts with the NSL complex and with the SET1 HMT family activating complex WDR5/RBBP5/ASH2L, which likely mediates the interaction of OGT1 with MLL1 and SET1A (Fig. 3b). Although no interaction of OGT1 with FOXK1/2 and MBD5/6 was detected, these proteins co-cluster with PR-DUB core components, and OGT1 is highly connected to the PR-DUB core (Fig. 3a-b).

These results suggest that OGT1/HCFC1 and FOXK/MBD proteins may form optional PR-DUB.1/PR-DUB.2 complexes. Alternatively, OGT1 interactions with FOXK and MBD proteins could be transient and hence difficult to pinpoint by OGT1 affinity purification.

Genomic profiling of the FOXK1-containing PR-DUB.1

A functional interaction of OGT1 with FOXK transcription factors within the same PR-DUB complex would require their colocalization at genomic target sites. To test this hypothesis, we examined the genome-wide distribution of O-GlcNAc (a proxy for catalytically active OGT1), ASXL1 and FOXK1 by performing ChIP-seq in HEK293 cells (Supplementary Fig. 6a).

By pairwise analysis of overlapping peak regions, we found that 41% and 55% of FOXK1 peaks co-localize with O-GlcNAc and ASXL1, respectively, while 69% of O-GlcNAc peaks were co-occupied by ASXL1 (Fig. 4a). In total, we identified 2703 genomic loci bound by all three features (Fig. 4a). Functional annotation of these sites to genomic compartments revealed predominant binding of PR-DUB.1 to gene promoters (Fig. 4a), with read densities sharply peaking at the TSSs of RefSeq-annotated genes (Fig. 4b).
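The pairwise overlap percentages above are directional: the fraction of FOXK1 peaks overlapping O-GlcNAc peaks is not the same quantity as the fraction of O-GlcNAc peaks overlapping FOXK1 peaks. A minimal sketch of such a computation, assuming peaks are half-open (chromosome, start, end) intervals and that a shared base pair counts as co-localization; the overlap criterion actually used is described in the Supplementary Methods and may differ. The function names and toy coordinates are hypothetical.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end) with half-open coordinates [start, end).
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks lie on the same chromosome and share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query: List[Peak], reference: List[Peak]) -> float:
    """Fraction of query peaks that overlap at least one reference peak.

    This naive O(n * m) scan is fine for illustration; genome-wide peak sets
    would normally be handled with sorted intervals or an interval tree.
    """
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


# Toy peak sets (hypothetical coordinates, not data from the study).
foxk1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr2", 900, 1000)]
oglcnac = [("chr1", 150, 250), ("chr2", 120, 130)]

print(fraction_overlapping(foxk1, oglcnac))  # 0.5
print(fraction_overlapping(oglcnac, foxk1))  # 1.0
```

Swapping the arguments changes the denominator, which is why the text reports separate percentages for FOXK1 and O-GlcNAc peaks.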

Moreover, feature enrichments within ±1 kb of TSSs were highly correlated with each other (correlation coefficients >0.8), further indicating that ASXL1, FOXK1 and OGT1 are likely subunits of the same protein complex (Fig. 4c). To identify classes of genes bound by PR-DUB.1, we subjected the set of TSSs bound by each complex member to MSigDB pathway enrichment analysis. This analysis identified highly overlapping sets of enriched pathways for each protein (Supplementary Fig. 6b). Notably, PR-DUB.1 targets are predominantly enriched for genes involved in fundamental cellular processes such as gene expression, , and protein metabolism (Fig. 4d).
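The TSS-centred correlation of feature enrichments can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: reads are summarized as one coordinate each on a single chromosome, enrichment is the read count within ±1 kb of each TSS, counts are log2(count + 1) transformed, and Pearson's correlation is used. The transform, the correlation measure and all names here are assumptions.

```python
import bisect
import math
from typing import List, Sequence


def tss_window_counts(read_positions: Sequence[int], tss_positions: Sequence[int],
                      flank: int = 1000) -> List[int]:
    """Count reads falling in [tss - flank, tss + flank] around each TSS.

    Assumes a single chromosome and one coordinate per read (e.g. fragment midpoint).
    """
    reads = sorted(read_positions)
    counts = []
    for tss in tss_positions:
        left = bisect.bisect_left(reads, tss - flank)
        right = bisect.bisect_right(reads, tss + flank)
        counts.append(right - left)
    return counts


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def tss_enrichment_correlation(reads_a, reads_b, tss, flank=1000) -> float:
    """Correlate log2(count + 1) enrichments of two ChIP-seq features at TSSs."""
    a = [math.log2(c + 1) for c in tss_window_counts(reads_a, tss, flank)]
    b = [math.log2(c + 1) for c in tss_window_counts(reads_b, tss, flank)]
    return pearson(a, b)


# Hypothetical toy data: three TSSs and read coordinates for two features.
tss = [10_000, 50_000, 90_000]
asxl1_reads = [10_000] * 5 + [50_500, 70_000]
foxk1_reads = [10_200] * 3 + [49_500]
print(tss_window_counts(asxl1_reads, tss))  # [5, 1, 0]
print(round(tss_enrichment_correlation(asxl1_reads, foxk1_reads, tss), 3))
```

Sorting the reads once and using binary search keeps each window lookup logarithmic, which matters when tens of thousands of TSSs are scored.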

PRC1 complexes and PR-DUB.1 regulate different target genes

Mutations in the Drosophila genes sxc (encoding Ogt), calypso and Asx lead to de-repression of HOX genes, and previous studies reported a strong colocalization of PR-DUB and O-GlcNAc with major PRC1-bound sites at inactive genes in Drosophila [15,44]. We sought to investigate this relationship in the human genome by comparing our PR-DUB profiles with publicly available ChIP-seq data of RING2 and RYBP [23], as well as TIF1B [80]. Our analysis therefore focused on six representatives of the three major modules within our PcG interaction network at the chromatin level: RING2 and RYBP, the central core of the PRC1 module (Fig. 1c); TIF1B, the common component of ZNF-containing CBX1/3/5 complexes (Supplementary Fig. 4a); and the three PR-DUB.1 features (ASXL1, FOXK1 and O-GlcNAc). Besides the expected high correlation between RING2 and RYBP (ρ = 0.78, Supplementary Fig. 6c), analysis of pairwise correlations of feature enrichments at promoters revealed a clear segregation between PRC1 and TIF1B on the one hand and PR-DUB.1 on the other (Fig. 4e-f and Supplementary Fig. 6c). Similarly, when comparing the genome-wide distribution of PR-DUB.1 (2703 ASXL1+O-GlcNAc+FOXK1 co-occupied regions) with that of PRC1 (6816 RING2+RYBP peaks) and TIF1B (10297 peaks), we observed only partial co-localization of these three complexes at target sites: 24% and 31% of PR-DUB.1 binding sites were co-bound by PRC1 and TIF1B, respectively, and only 336 regions were occupied by all three complexes (Fig. 4f).
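Because the co-binding percentages are expressed relative to the PR-DUB.1 site set, a three-way comparison can be sketched by flagging, for each PR-DUB.1 region, whether it overlaps any PRC1 and any TIF1B peak. This is a minimal sketch assuming half-open (chromosome, start, end) intervals and a 1 bp overlap criterion; how the study counted regions shared by all three complexes (for example, on merged regions) is not stated here, so the counting rule and all names are assumptions.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)


def overlaps_any(peak: Peak, others: List[Peak]) -> bool:
    """True if `peak` shares at least 1 bp with any peak in `others`."""
    chrom, start, end = peak
    return any(c == chrom and start < e and s < end for c, s, e in others)


def co_occupancy(anchor: List[Peak], set_b: List[Peak], set_c: List[Peak]):
    """Return (fraction of anchor regions co-bound by B,
               fraction of anchor regions co-bound by C,
               number of anchor regions co-bound by both B and C)."""
    if not anchor:
        return 0.0, 0.0, 0
    in_b = [overlaps_any(p, set_b) for p in anchor]
    in_c = [overlaps_any(p, set_c) for p in anchor]
    both = sum(1 for b, c in zip(in_b, in_c) if b and c)
    return sum(in_b) / len(anchor), sum(in_c) / len(anchor), both


# Hypothetical toy regions standing in for PR-DUB.1, PRC1 and TIF1B peaks.
prdub1 = [("chr1", 100, 200), ("chr1", 300, 400), ("chr1", 500, 600), ("chr2", 0, 100)]
prc1 = [("chr1", 150, 160), ("chr1", 550, 650)]
tif1b = [("chr1", 180, 190), ("chr1", 350, 360)]
print(co_occupancy(prdub1, prc1, tif1b))  # (0.5, 0.5, 1)
```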

In summary, our analysis uncovered the basic topology of the human PR-DUB network at both the proteomic and the genomic level. Interestingly, and in contrast to Drosophila, the human PR-DUB and PRC1 complexes bind largely distinct sets of target genes, strongly suggesting that they are involved in different cellular processes in mammals. In addition, our AP-MS experiments identified the transcription factors FOXK1 and FOXK2 as components of PR-DUB, pointing to a potential recruitment mechanism for PR-DUB complexes. We anticipate that future experiments based on our data will shed light on the function of PR-DUB complexes in gene regulation and on their relation to PRC1 and PRC2.

Conclusions

Although considerable progress has been made in determining the composition of mammalian PcG protein complexes, recent findings are primarily based on studies of isolated protein components in different cellular contexts with heterogeneous biochemical workflows, thus hampering a system-level understanding of . In this study, in contrast, we used a systematic proteomic approach to comprehensively map the PcG protein interactome in a single human cell line. Since the abundance of PcG proteins can vary between cell types and likely influences the assembly of alternative protein complexes, we chose HEK293 cells for our study, as all PcG proteins are expressed in this cell type. The result is a high-density interaction network that enabled us to dissect individual PcG complexes in unprecedented detail. By allocating newly identified interaction partners to all PcG complex families and by identifying candidate subunits responsible for targeting the complexes to chromatin, we obtained new insights into the molecular function and recruitment of the PcG silencing system. In addition to the fine mapping of the cardinal PcG complexes PRC1 and PRC2, our data reveal human PR-DUB as a multifaceted assembly comprising OGT1 along with several transcription and chromatin-binding factors. For the first time, our study testifies to the considerable diversity that exists among individual PcG complexes in a single cell line. In addition, it provides a solid framework for future systematic experiments aimed at disentangling the biochemistry of PcG protein-mediated gene regulation in mammalian cells.

Methods

Expression constructs and generation of stable cell lines

To generate expression vectors for tetracycline-induced expression of N-terminally SH-tagged bait proteins, human ORFs in pDONR223 vectors were picked from a Gateway-compatible human ORFeome collection (hORFeome v5.1, Open Biosystems) for LR recombination with the customized destination vector pcDNA5/FRT/TO/SH/GW, which was obtained by ligating the SH-tag coding sequence and the Gateway recombination cassette into the polylinker of pcDNA5/FRT/TO (Invitrogen). Genes not present in the human ORFeome collection were amplified by PCR from cDNA prepared from HEK293 cells and cloned into entry vectors by TOPO (pENTR/D-TOPO) reaction. Stable Flp-In HEK293 T-REx cell lines were generated as described in the Supplementary Methods.

Protein purification

Stable Flp-In HEK293 T-REx cell lines were grown in five 14.5 cm Greiner dishes to 80% confluency, and bait protein expression was induced by adding 1 µg/ml tetracycline to the medium 16-24 h prior to harvest in PBS containing 1 mM EDTA. The suspended cells were pelleted, drained of supernatant, shock-frozen in liquid nitrogen and stored long-term at -80 °C. The frozen cell pellets were resuspended in 5 ml TNN lysis buffer (100 mM Tris pH 8.0, 5 mM EDTA, 250 mM NaCl, 50 mM NaF, 1% Igepal CA-630 (Nonidet P-40 substitute), 1.5 mM Na3VO4, 1 mM PMSF, 1 mM DTT and 1x protease inhibitor mix (Roche)) and incubated on ice for 10 min. Insoluble material was removed by centrifugation. Cleared lysates were loaded on a pre-equilibrated spin column (Bio-Rad) containing 200 µl Strep-Tactin Sepharose (IBA Biotagnology). The Sepharose was washed four times with 1 ml TNN lysis buffer (Igepal CA-630 and DTT concentrations adjusted to 0.5% and 0.5 mM, respectively). Bound proteins were eluted with 1 ml 2 mM biotin in TNN lysis buffer (Igepal CA-630 and DTT concentrations adjusted to 0.5% and 0.5 mM, respectively), incubated for 2 h with 100 µl HA-agarose (Sigma), washed four times with TNN lysis buffer (Igepal CA-630 concentration adjusted to 0.5%, without DTT and protease inhibitors) and washed twice more in TNN buffer (100 mM Tris pH 8.0, 150 mM NaCl, 50 mM NaF).


The bound proteins were released by acidic elution with 500 µl 0.2 M pH 2.5, and the eluate was neutralized with NH4HCO3. Disulfide bonds were reduced with 5 mM TCEP for 30 min at 37 °C and alkylated with 10 mM iodoacetamide for 20 min at room temperature in the dark. Samples were digested with 1 µg trypsin (Promega) overnight at 37 °C. Bait proteins with low protein yield were processed by single-step purification, omitting the HA step. In this case, the frozen cell pellets were resuspended in 5 ml TNN lysis buffer containing 10 µg/ml avidin. The eluates were TCA-precipitated to remove biotin and resolubilized in 50 µl 10% ACN, 50 mM NH4HCO3 pH 8.8. After dilution with NH4HCO3 to 5% ACN, the samples were reduced, alkylated and digested as in the double-step protocol. The digested peptides were purified with C18 microspin columns (The Nest Group Inc.) according to the manufacturer's protocol and resolved in 0.1% formic acid, 1% acetonitrile for mass spectrometry analysis.

Mass spectrometry

LC-MS/MS analysis was performed on an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific). Peptides were separated by reverse-phase chromatography on a Proxeon EASY-nLC II liquid chromatography system (Thermo Fisher Scientific). The reverse-phase column (75 µm x 10 cm) was packed with Magic C18 AQ (3 µm) resin (WICOM International). A linear gradient from 5% to 35% solvent B (98% acetonitrile, 0.1% formic acid) in solvent A (0.1% formic acid, 2% acetonitrile) was run for 60 min at a flow rate of 300 nl/min. Data acquisition was set to obtain one high-resolution MS scan in the Orbitrap (resolution 60,000 at m/z 400) followed by six collision-induced dissociation (CID) MS/MS fragment ion spectra in the linear trap quadrupole (LTQ). Orbitrap charge state screening was enabled, and ions with unassigned or single charge states were rejected. The dynamic exclusion window was set to 15 s and limited to 300 entries. The minimal precursor ion count to trigger a CID MS/MS scan was set to 150. The ion accumulation time was set to 500 ms (MS) and 250 ms (MS/MS) using target settings of 10^6 (MS) and 10^4 (MS/MS) ions. After every biological replicate measurement, a peptide reference sample containing 200 fmol of human [Glu1]-fibrinopeptide B (Sigma-Aldrich) was analyzed to monitor overall LC-MS/MS system performance.

ChIP and preparation of ChIP-seq libraries

Chromatin fixation and immunoprecipitation were performed essentially as described [81]. Cells (3-4 x 10^8) were fixed in 200 ml of medium with 1% formaldehyde for 10 min at room temperature. Cross-linked cells were sonicated to produce chromatin fragments with an average size of 150-400 bp. Soluble chromatin was separated from insoluble material by centrifugation. The supernatant containing chromatin from 1-2 x 10^7 cells was used for immunoprecipitation. Sequencing libraries were prepared with the NEB Genomic DNA Sample Preparation Kit according to NEB's instructions. After adapter ligation, library fragments of 250-350 bp were isolated from an agarose gel. The DNA was PCR-amplified with Illumina primers for 18 cycles, purified, and loaded on an Illumina flow cell for cluster generation. Libraries were sequenced on the Genome Analyzer IIx (TruSeq cBot-GA v2 and TruSeq v5 SBS kit) and HiSeq 2000 (HiSeq Flow Cell v3 and TruSeq SBS Kit v3) following the manufacturer's protocols. For ChIP-qPCR, nuclei were prepared essentially as described in Functional Analysis of DNA and Chromatin [82]. Immunoprecipitations were performed using anti-GAL4 (sc-510, Santa Cruz Biotechnology), anti-IgG (10500C, Invitrogen) and anti-H3K27me3 (kindly provided by Thomas Jenuwein). Anti-FOXK1 (ab18196) was purchased from Abcam, anti-ASXL1 (sc85283) from Santa Cruz Biotechnology and anti-GlcNAc (HGAC85) from Novus Biologicals. Primer sets used for qPCR are listed in the Supplemental Experimental Procedures.


Data analysis

Descriptions of data processing and analysis methods are available in the Supplementary Methods.

Accession Numbers

Mass spectrometry data have been submitted to the PeptideAtlas database (http://www.peptideatlas.org/) and assigned the identifier PASS00347. Protein interactions have been submitted to the IMEx consortium (http://www.imexconsortium.org) through IntAct [83] and assigned the identifier IM-21659. Sequencing data have been submitted to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under accession no. GSE51673.

Acknowledgements

We thank I. Nissen and M. Kohler for technical support on ChIP-seq. Illumina sequencing was done in the Genomics Facility Basel at D-BSSE, ETH Zürich. Research of SH and MG is supported by the European Union 7th Framework project SYBILLA (Systems Biology of T-cell activation) and the Innovative Medicines Initiative project ULTRA-DD. Research of RA is funded by the advanced ERC grant Proteomics v3.0 (233226) and by SystemsX.ch, the Swiss initiative for systems biology. Research of RP is funded by the Swiss National Science Foundation and ETH Zürich.

References

1. Sarkies, P. & Sale, J.E. Cellular epigenetic stability and cancer. Trends in genetics 28, 118-127 (2012).

2. Ringrose, L. Polycomb comes of age: genome-wide profiling of target sites. Current opinion in cell biology 19, 290-297 (2007).

3. Jaenisch, R. & Young, R. Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming. Cell 132, 567-582 (2008).

4. Maurange, C., Lee, N. & Paro, R. Signaling meets chromatin during tissue regeneration in Drosophila. Current opinion in genetics & development 16, 485-489 (2006).

5. Sparmann, A. & van Lohuizen, M. Polycomb silencers control cell fate, development and cancer. Nature reviews. Cancer 6, 846-856 (2006).

6. Cao, R. et al. Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298, 1039-1043 (2002).

7. Muller, J. et al. Histone methyltransferase activity of a Drosophila Polycomb group repressor complex. Cell 111, 197-208 (2002).

8. Fischle, W. et al. Molecular basis for the discrimination of repressive methyl-lysine marks in histone H3 by Polycomb and HP1 chromodomains. Genes & development 17, 1870-1881 (2003).

9. Min, J., Zhang, Y. & Xu, R.M. Structural basis for specific binding of Polycomb chromodomain to histone H3 methylated at Lys 27. Genes & development 17, 1823-1828 (2003).

10. de Napoles, M. et al. Polycomb group proteins Ring1A/B link ubiquitylation of histone H2A to heritable gene silencing and X inactivation. Developmental cell 7, 663-676 (2004).

11. Stock, J.K. et al. Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nature cell biology 9, 1428-1435 (2007).

12. Wang, H. et al. Role of histone H2A ubiquitination in Polycomb silencing. Nature 431, 873-878 (2004).

13. Klymenko, T. et al. A Polycomb group protein complex with sequence-specific DNA-binding and selective methyl-lysine-binding activities. Genes & development 20, 1110-1122 (2006).

14. Lagarou, A. et al. dKDM2 couples histone H2A ubiquitylation to histone H3 demethylation during Polycomb group silencing. Genes & development 22, 2799-2810 (2008).

15. Scheuermann, J.C. et al. Histone H2A deubiquitinase activity of the Polycomb repressive complex PR-DUB. Nature 465, 243-247 (2010).

16. Furuyama, T., Banerjee, R., Breen, T.R. & Harte, P.J. SIR2 is required for polycomb silencing and is associated with an E(Z) histone methyltransferase complex. Current biology 14, 1812-1821 (2004).

17. Saurin, A.J., Shao, Z., Erdjument-Bromage, H., Tempst, P. & Kingston, R.E. A Drosophila Polycomb group complex includes Zeste and dTAFII proteins. Nature 412, 655-660 (2001).

18. Strubbe, G. et al. Polycomb purification by in vivo biotinylation tagging reveals cohesin and Trithorax group proteins as interaction partners. Proceedings of the National Academy of Sciences of the United States of America 108, 5572-5577 (2011).

19. Beisel, C. et al. Comparing active and repressed expression states of genes controlled by the Polycomb/Trithorax group proteins. Proceedings of the National Academy of Sciences of the United States of America 104, 16615-16620 (2007).

20. Enderle, D. et al. Polycomb preferentially targets stalled promoters of coding and noncoding transcripts. Genome research 21, 216-226 (2011).

21. Oktaba, K. et al. Dynamic regulation by polycomb group protein complexes controls pattern formation and the cell cycle in Drosophila. Developmental cell 15, 877-889 (2008).

22. Schuettengruber, B. et al. Functional anatomy of polycomb and trithorax chromatin landscapes in Drosophila embryos. PLoS biology 7, e13 (2009).

23. Gao, Z. et al. PCGF homologs, CBX proteins, and RYBP define functionally distinct PRC1 family complexes. Molecular cell 45, 344-356 (2012).

24. Margueron, R. et al. Ezh1 and Ezh2 maintain repressive chromatin through different mechanisms. Molecular cell 32, 503-518 (2008).

25. Shen, X. et al. EZH1 mediates methylation on histone H3 lysine 27 and complements EZH2 in maintaining stem cell identity and executing pluripotency. Molecular cell 32, 491-502 (2008).

26. Vandamme, J., Volkel, P., Rosnoblet, C., Le Faou, P. & Angrand, P.O. Interaction proteomics analysis of polycomb proteins defines distinct PRC1 complexes in mammalian cells. Molecular & cellular proteomics 10, M110 002642 (2011).

27. Sanchez, C. et al. Proteomics analysis of Ring1B/Rnf2 interactors identifies a novel complex with the Fbxl10/Jhdm1B histone demethylase and the Bcl6 interacting . Molecular & cellular proteomics 6, 820-834 (2007).

28. Bernstein, E. et al. Mouse polycomb proteins bind differentially to methylated histone H3 and RNA and are enriched in facultative heterochromatin. Molecular and cellular biology 26, 2560-2569 (2006).

29. Farcas, A.M. et al. KDM2B links the Polycomb Repressive Complex 1 (PRC1) to recognition of CpG islands. eLife 1, e00205 (2012).

30. He, J. et al. Kdm2b maintains murine embryonic stem cell status by recruiting PRC1 complex to CpG islands of developmental genes. Nature cell biology 15, 373-384 (2013).

31. Wu, X., Johansen, J.V. & Helin, K. Fbxl10/Kdm2b recruits polycomb repressive complex 1 to CpG islands and regulates H2A ubiquitylation. Molecular cell 49, 1134-1146 (2013).

32. Morey, L., Aloia, L., Cozzuto, L., Benitah, S.A. & Di Croce, L. RYBP and Cbx7 define specific biological functions of polycomb complexes in mouse embryonic stem cells. Cell reports 3, 60-69 (2013).

33. Morey, L. et al. Nonoverlapping functions of the Polycomb group Cbx family of proteins in embryonic stem cells. Cell stem cell 10, 47-62 (2012).

34. Tavares, L. et al. RYBP-PRC1 complexes mediate H2A ubiquitylation at polycomb target sites independently of PRC2 and H3K27me3. Cell 148, 664-678 (2012).

35. Hunkapiller, J. et al. Polycomb-like 3 promotes polycomb repressive complex 2 binding to CpG islands and embryonic stem cell self-renewal. PLoS genetics 8, e1002576 (2012).


36. Sarma, K., Margueron, R., Ivanov, A., Pirrotta, V. & Reinberg, D. Ezh2 requires PHF1 to efficiently catalyze H3 lysine 27 trimethylation in vivo. Molecular and cellular biology 28, 2718-2731 (2008).

37. Kim, H., Kang, K. & Kim, J. AEBP2 as a potential targeting protein for Polycomb Repression Complex PRC2. Nucleic acids research 37, 2940-2950 (2009).

38. Kim, T.G., Kraus, J.C., Chen, J. & Lee, Y. JUMONJI, a critical factor for cardiac development, functions as a transcriptional repressor. The Journal of biological chemistry 278, 42247-42255 (2003).

39. Pasini, D. et al. JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature 464, 306-310 (2010).

40. Peng, J.C. et al. Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell 139, 1290-1302 (2009).

41. Shen, X. et al. Jumonji modulates polycomb activity and self-renewal versus differentiation of stem cells. Cell 139, 1303-1314 (2009).

42. Glatter, T., Wepf, A., Aebersold, R. & Gstaiger, M. An integrated workflow for charting the human interaction proteome: insights into the PP2A system. Molecular systems biology 5, 237 (2009).

43. Varjosalo, M. et al. Interlaboratory reproducibility of large-scale human protein-complex analysis by standardized AP-MS. Nature methods 10, 307-314 (2013).

44. Gambetta, M.C., Oktaba, K. & Muller, J. Essential role of the glycosyltransferase sxc/Ogt in polycomb repression. Science 325, 93-96 (2009).

45. Craig, R. & Beavis, R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466-1467 (2004).

46. Deutsch, E.W. et al. A guided tour of the Trans-Proteomic Pipeline. Proteomics 10, 1150-1159 (2010).

47. Behrends, C., Sowa, M.E., Gygi, S.P. & Harper, J.W. Network organization of the human autophagy system. Nature 466, 68-76 (2010).

48. Cai, Y. et al. Subunit composition and substrate specificity of a MOF-containing histone acetyltransferase distinct from the male-specific lethal (MSL) complex. The Journal of biological chemistry 285, 4268-4272 (2010).

49. Migliori, V., Mapelli, M. & Guccione, E. On WD40 proteins: propelling our knowledge of transcriptional control? Epigenetics 7, 815-822 (2012).

50. Gao, Z. et al. An AUTS2-Polycomb complex activates gene expression in the CNS. Nature 516, 349-354 (2014).

51. van den Boom, V. et al. Non-canonical PRC1.1 Targets Active Genes Independent of H3K27me3 and Is Essential for Leukemogenesis. Cell reports 14, 332-346 (2016).

52. van den Boom, V. et al. Nonredundant and locus-specific gene repression functions of PRC1 paralog family members in human hematopoietic stem/progenitor cells. Blood 121, 2452-2461 (2013).

53. Miyata, Y. & Nishida, E. DYRK1A binds to an evolutionarily conserved WD40-repeat protein WDR68 and induces its nuclear translocation. Biochimica et biophysica acta 1813, 1728-1739 (2011).

54. Morita, K., Lo Celso, C., Spencer-Dene, B., Zouboulis, C.C. & Watt, F.M. HAN11 binds mDia1 and controls GLI1 transcriptional activity. Journal of dermatological science 44, 11-20 (2006).

55. Dietrich, N. et al. REST-mediated recruitment of polycomb repressor complexes in mammalian cells. PLoS genetics 8, e1002494 (2012).

56. El Messaoudi-Aubert, S. et al. Role for the MOV10 RNA helicase in polycomb-mediated repression of the INK4a tumor suppressor. Nature structural & molecular biology 17, 862-868 (2010).

57. Ogawa, H., Ishiguro, K., Gaubatz, S., Livingston, D.M. & Nakatani, Y. A complex with chromatin modifiers that occupies - and -responsive genes in G0 cells. Science 296, 1132-1136 (2002).

58. Qin, J. et al. The polycomb group protein L3mbtl2 assembles an atypical PRC1-family complex that is essential in pluripotent stem cells and early development. Cell stem cell 11, 319-332 (2012).

59. Trojer, P. et al. L3MBTL2 protein acts in concert with PcG protein-mediated monoubiquitination of H2A to establish a repressive chromatin structure. Molecular cell 42, 438-450 (2011).

60. Dou, Y. et al. Regulation of MLL1 H3K4 methyltransferase activity by its core components. Nature structural & molecular biology 13, 713-719 (2006).

61. Wysocka, J. et al. A PHD finger of NURF couples histone H3 lysine 4 trimethylation with chromatin remodelling. Nature 442, 86-90 (2006).

62. Zhang, P., Lee, H., Brunzelle, J.S. & Couture, J.F. The plasticity of WDR5 peptide-binding cleft enables the binding of the SET1 family of histone methyltransferases. Nucleic acids research 40, 4237-4246 (2012).

63. Junco, S.E. et al. Structure of the polycomb group protein PCGF1 in complex with BCOR reveals basis for binding selectivity of PCGF homologs. Structure 21, 665-671 (2013).

64. Sanchez-Pulido, L., Devos, D., Sung, Z.R. & Calonje, M. RAWUL: a new ubiquitin-like domain in PRC1 ring finger proteins that unveils putative plant and worm PRC1 orthologs. BMC genomics 9, 308 (2008).

65. Beisel, C. & Paro, R. Silencing chromatin: comparing modes and mechanisms. Nature reviews. Genetics 12, 123-135 (2011).

66. Alekseyenko, A.A., Gorchakov, A.A., Kharchenko, P.V. & Kuroda, M.I. Reciprocal interactions of human C10orf12 and C17orf96 with PRC2 revealed by BioTAP-XL cross-linking and affinity purification. Proceedings of the National Academy of Sciences of the United States of America 111, 2488-2493 (2014).

67. Kalb, R. et al. Histone H2A monoubiquitination promotes histone H3 methylation in Polycomb repression. Nature structural & molecular biology 21, 569-571 (2014).

68. Margueron, R. & Reinberg, D. The Polycomb complex PRC2 and its mark in life. Nature 469, 343-349 (2011).

69. Caubit, X. et al. Mouse Dac, a novel nuclear factor with homology to Drosophila dachshund shows a dynamic expression in the neural crest, the eye, the neocortex, and the limb bud. Developmental dynamics 214, 66-80 (1999).

70. Wu, K. et al. Cell fate determination factor DACH1 inhibits c-Jun-induced contact-independent growth. Molecular biology of the cell 18, 755-767 (2007).

71. Fernandes, I. et al. Ligand-dependent corepressor LCoR functions by histone deacetylase-dependent and -independent mechanisms. Molecular cell 11, 139-150 (2003).

72. Shi, Y. et al. Coordinated histone modifications mediated by a CtBP co-repressor complex. Nature 422, 735-738 (2003).

73. Hansen, K.H. et al. A model for transmission of the H3K27me3 epigenetic mark. Nature cell biology 10, 1291-1300 (2008).

74. Cai, L. et al. An H3K36 methylation-engaging Tudor motif of polycomb-like proteins mediates PRC2 complex targeting. Molecular cell 49, 571-582 (2013).

75. Musselman, C.A. et al. Molecular basis for H3K36me3 recognition by the Tudor domain of PHF1. Nature structural & molecular biology 19, 1266-1272 (2012).

76. Dey, A. et al. Loss of the tumor suppressor BAP1 causes myeloid transformation. Science 337, 1541-1546 (2012).

77. Hart, G.W., Slawson, C., Ramirez-Correa, G. & Lagerlof, O. Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease. Annual review of biochemistry 80, 825-858 (2011).

78. Vella, P. et al. Tet proteins connect the O-linked N-acetylglucosamine transferase Ogt to chromatin in embryonic stem cells. Molecular cell 49, 645-656 (2013).

79. Whisenhunt, T.R. et al. Disrupting the enzyme complex regulating O-GlcNAcylation blocks signaling and development. Glycobiology 16, 551-563 (2006).


80. Iyengar, S., Ivanov, A.V., Jin, V.X., Rauscher, F.J., 3rd & Farnham, P.J. Functional analysis of KAP1 genomic recruitment. Molecular and cellular biology 31, 1833-1847 (2011).

81. Orlando, V., Strutt, H. & Paro, R. Analysis of chromatin structure by in vivo formaldehyde cross-linking. Methods 11, 205-214 (1997).

82. Santoro, R. Analysis of chromatin composition of repetitive sequences: the ChIP-Chop assay. Methods Mol Biol 1094, 319-328 (2014).

83. Orchard, S. et al. The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic acids research 42, D358-363 (2014).

84. McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature biotechnology 28, 495-501 (2010).

Figure Legends

Figure 1. Systematic profiling of human Polycomb group (PcG) protein complexes. (a) Workflow for native protein complex purifications from Flp-In HEK293 T-REx cells. Open reading frames of 64 bait proteins were cloned into an expression vector containing a tetracycline-inducible CMV promoter, a Strep-HA fusion tag and FRT sites. Proteins were affinity purified from whole-cell extracts of isogenic cell lines, digested with trypsin and identified by tandem mass spectrometry on an LTQ Orbitrap XL. High-confidence interacting proteins (HCIPs) were hierarchically clustered to infer protein complex compositions. (b) Hierarchical clustering of HCIPs. Clusters of PcG and non-PcG complexes are labeled in red and blue, respectively. The inset shows the location of PRC1.1, PRC1.3/PRC1.5 and the four core proteins RYBP/YAF2 and RING1/2. Clusters defined by single baits are indicated in green. Dissimilarities based on Spearman's rank correlation coefficient are color-coded as indicated (top right). (c) Protein-protein interaction network of the clustered interaction data. Blue lines indicate interactions between proteins within the same cluster. Enlarged hexagon-shaped nodes correspond to the baits used in this study. (d-e) High-density interaction maps of PRC1.3/PRC1.5 (d) and PRC1.6 (e). New subunits are highlighted by dashed boxes. Hexagon-shaped nodes represent baits; squares, identified HCIPs not used as baits in this study. Black nodes: common core subunits; yellow nodes: DNA-binding proteins.

Figure 2. High-resolution interaction analysis unravels two structurally distinct classes of PRC2 complexes. (a) Excerpt of Figure 1b showing the PRC2 cluster. (b) Interaction map of PRC2 components. The PRC2 core is highlighted by a dashed box. Reciprocal interactions defining the two classes of PRC2 complexes are indicated by blue (PRC2.1) and red (PRC2.2) edges. Orange edges: non-reciprocal interactions. (c) Schematic representation of alternative protein isoforms of LCOR and C10ORF12. Numbers indicate amino acid positions. (d) LCOR interaction map. Orange edges, interactions defined in this study; dashed edges, published interactions. (e) Schematic representation of the luciferase reporter system used. Amplicons (TSS, +500) used for ChIP-qPCR analysis are indicated. (f) Anti-Gal4 western blot showing the expression of Gal4-C10ORF12 and Gal4-LCOR upon tetracycline induction. (g) Luciferase activity of tetracycline-induced Gal4-C10ORF12- and Gal4-LCOR-expressing cells, normalized to uninduced cells. Values are mean ± sd; p-values are from a two-sided t-test (n = 7). (h) Anti-Gal4 ChIP-qPCR analysis showing localization of C10ORF12 and LCOR to the reporter TSS. (i) Anti-H3K27me3 ChIP-qPCR analysis at the reporter TSS upon C10ORF12 and LCOR expression. Values are mean ± sd; p-values are from a two-sided t-test (n = 3).

Figure 3. Human PR-DUB complexes contain OGT1 and FOXK transcription factors. (a) Excerpt of Figure 1b showing the PR-DUB and 19S proteasome clusters. (b) Topology of PR-DUB complexes. Interactions of bait proteins with proteins localized in the PR-DUB cluster are indicated in blue. WDR5 shares many interacting proteins with OGT1 (red), predominantly MLL/SET complex-associated proteins, but does not interact with BAP1, ASXL1 or ASXL2. Hexagons, bait proteins; squares, identified HCIPs not used as baits in this study. Yellow: FOXK1 and FOXK2. Orange nodes: OGT1 interactors. Dashed line: ASXL2-MBD5 interaction, which was detected but did not pass our stringent filtering criteria.

Figure 4. PR-DUB.1 and PRC1 target largely distinct sets of genes. (a) Venn diagrams showing the genome-wide colocalization of high-confidence peaks for the PR-DUB.1 components FOXK1 (blue) and ASXL1 (orange) and for the O-GlcNAc modification (green). Empirical p-values from peak shuffling are indicated, along with the percentage of intersecting peaks for each feature. The pie chart illustrates the distribution of PR-DUB.1 peaks (n = 2703, triple intersection) with respect to TSSs. (b) Average ChIP-seq signal (normalized to total library size) of FOXK1, ASXL1 and O-GlcNAc within a 5 kb window centered on RefSeq TSSs. (c) Pairwise correlation of PR-DUB.1 feature enrichments at TSSs. Spearman's rank correlation coefficients are indicated. (d) Functional annotation of high-confidence PR-DUB.1 peaks localizing within 5 kb of annotated TSSs. The top 10 significantly enriched MSigDB pathways obtained with GREAT (ref. 84) are indicated. UCSC tracks of PR-DUB.1 ChIP-seq signals at representative promoters belonging to the top three hits are shown in order of significance (encoded by blue tones). (e) Heatmap of ChIP-seq signals (normalized to total library size) for the indicated features within 10 kb of PRC1 and PR-DUB.1 binding sites. (f) Venn diagrams showing the genome-wide colocalization of high-confidence peaks of PR-DUB.1 (red), PRC1 (brown) and TIF1B (light blue). A representative UCSC track of ChIP-seq signals at TSSs bound by all three features is shown.


Supplementary Figure Legends

Supplementary Figure 1. Comparison of AP-MS interaction data with public protein interaction databases and high-resolution interaction maps of SKP1 and WDR5. (a) Compilation of human bait proteins used for AP-MS and their relation to core subunits of Drosophila PcG complexes. (b) Number of high-confidence interacting proteins (HCIPs) for each bait. Dark blue bars represent interactions annotated in the public protein interaction database IntAct; light blue bars represent novel interactions found in this study (75% overall). (c) SKP1 interactome inferred in this study (blue and orange lines) and annotated in public literature databases (red dotted lines). The thick blue lines represent the main interactions with PcG components that cluster together with SKP1 and form PRC1.1. About half of the identified interactors (n = 42) were F-box proteins, including KDM2B, the only F-box protein associated with the PRC1.1 complex. Hexagons indicate baits, squares prey proteins. (d) WDR5-associated proteins overlapping with those reported by Cai et al. (ref. 1). UniProt protein names and numbers of proteins are indicated. (e) Interaction map of WDR5 AP-MS results. Blue lines represent interactions detected in this study with PcG-associated proteins that cluster with WDR5 and form PRC1.6. Interactions with other HCIPs (orange), together with public literature interactions (dotted red lines), could be assigned to known protein complexes (MLL, TCP, NSL, TORC2 and the ADA2/GCN5/ADA3 transcription activator complex). Hexagons indicate baits, squares prey proteins.

Supplementary Figure 2. High-resolution topology maps of selected clusters. (a) RBBP4/7 interaction proteome. In addition to interacting with PRC2 and HP1, RBBP4/7 proteins were found to bind members of the LINC, NURF, NURD and SIN3 complexes. (b) No human Pho-RC could be identified. TYY1 and SMBT1, the homologs of the Drosophila Pho-RC proteins Pho and dSFMBT, respectively, did not interact. However, TYY1 co-purified with the INO80 complex, which is indicated by a dashed circle. (c) Interaction proteome of the four human LMBL paralogs. Only LMBL2 exhibited interactions with a PcG protein assembly and is part of PRC1.6. In all panels, blue and orange lines represent interactions found in this study; red dashed lines are annotated public literature interactions. Hexagons indicate baits, squares prey proteins.

Supplementary Figure 3. Network topology of PRC1 complexes. (a) Protein-protein interactions among the central core proteins of the PRC1 complexes. Note that we detected no interactions between protein paralogs. Edges represent detected reciprocal interactions. (b) Network topology of PRC1.2 and PRC1.4 complexes. Note that PCGF2/4 can assemble into PRC1.2/PRC1.4 or can form trimeric complexes (edges in red) with RING1/2 and RYBP/YAF2 (we detected no interactions of RYBP/YAF2 with CBX or PHC proteins). Edges connecting PRC1.2/PRC1.4 core components are in blue. (c) Excerpt of Figure 2A indicating the allocation of the WD40 protein DCAF7 to PRC1.3/PRC1.5. High-density interaction maps of PRC1.3/PRC1.5 (C) and PRC1.6 (D). New subunits are highlighted by dashed boxes. Hexagon-shaped nodes represent baits; squares, identified HCIPs not used as baits in this study. Black nodes: common core subunits; yellow nodes: DNA-binding proteins. (d) The PRC1.1 complex contains PCGF1 and SKP1 and links the co-repressors BCOR and BCORL as well as the demethylase KDM2B to the PRC1.1 core. Edges connecting core components are in purple.

Supplementary Figure 4. Interaction proteomes of PRC1 and HP1 complexes. (a) CBX1/3/5 interactions with zinc finger proteins (ZNF; orange nodes) and neighboring proteins. Note that all bait proteins of this subnetwork interacted with the nuclear co-repressor TIF1B (yellow), which underscores the reported function of TIF1B in the recruitment of HP1 proteins to specific DNA sequences through interaction with zinc finger transcription factors. Hexagons indicate baits, squares prey proteins. Blue edges, interactions of potential core subunits of the corresponding network; dashed lines, interactions in public databases; orange edges, other interactions. (b) CBX1 and CBX3, but not CBX5, interact with the histone methyltransferases EHMT1 and EHMT2. The CBX1/3-EHMT1/2 complex also interacts with the zinc finger transcription factors WIZ, ZN644 and ZN462 and with the co-repressor protein TIF1B. In contrast to a previous report (ref. 2), we did not detect any interactions of EHMT1/2 with PRC1.6. (c) Interaction of CBX3 and CBX5 with SENP7. This complex may also include TIF1B, the zinc finger transcription factor AHDC1 and the histone chaperone CHAP1. (d) CBX1/3/5 complex with the histone chaperone CAF1 and RBBP4, which is potentially involved in DNA replication. (e) Centromeric DSN1/MIS12 complex with HP1 proteins, involved in mitosis.

Supplementary Figure 5. Characterization of the PRC2 core-interacting proteins C17ORF96 and C10ORF12. (a) C17ORF96 and SKDA1 are related proteins that share sequence similarity at their C-termini. BLAST search result with C17ORF96 (UniProt A6NHQ4) as query sequence. (b) CLUSTAL 2.1 alignment of the C17ORF96 (UniProt A6NHQ4) and SKDA1 (UniProt Q1XH10) amino acid sequences. Homologous C-termini, identified by BLAST search, are indicated in red. (c) All AP-MS-identified peptides that match LCOR-CRA_b. Colors indicate specific protein regions as in Fig. 2c. (d) Number of identified LCOR and C10ORF12 isoform peptides in PRC2 protein purification experiments. (e) Number of PRC2 prey peptides in LCOR and C10ORF12 isoform AP-MS experiments. (f) C10ORF12 localizes to the nucleus. Images show anti-HA in situ stainings of stable Flp-In HEK293 T-REx cell lines before and after tetracycline induction. HA-EGFP shows a dispersed, cytoplasmic signal, whereas the C10ORF12 HA-epitope fusion protein shows a nuclear signal. In situ stainings were performed as described in Glatter et al., 2009 (ref. 3).

Supplementary Figure 6. Functional annotation of PR-DUB.1 component binding sites and genome-wide analysis of PR-DUB.1, PRC1 and TIF1B enrichments at TSSs. (a) Sequence read statistics of ChIP-seq experiments. (b) Functional annotation of high-confidence MACS peaks localizing within 5 kb of annotated TSSs. Enriched MSigDB pathways were computed with GREAT (ref. 4). (c) Pairwise scatter plots of PR-DUB.1, PRC1 and TIF1B enrichments at TSSs. Spearman's rank correlation coefficients are indicated.

Supplementary Methods

Cell Line Generation

Flp-In HEK293 T-REx cells (Invitrogen), which contain a single genomic FRT site and stably express the tet repressor, were cultured in DMEM (4.5 g/l glucose, 10% FCS, 2 mM L-glutamine) containing 100 µg/ml zeocin and 15 µg/ml blasticidin. Before transfection, the medium was replaced with DMEM containing 15 µg/ml blasticidin. For cell line generation, Flp-In HEK293 T-REx cells were co-transfected with the corresponding expression plasmids and the pOG44 vector (Invitrogen) for co-expression of the Flp recombinase, using the Lipofectamine 2000 transfection reagent (Invitrogen). Two days after transfection, cells were selected in hygromycin-containing medium (100 µg/ml) for 2-3 weeks.


Protein Identification

Mass spectrometry raw data were searched with X!Tandem (ref. 5) against a human protein sequence database (Swiss-Prot canonical reviewed human proteome reference data set; http://www.uniprot.org/) that included reverse decoy sequences for all entries. The search parameters were set to include only fully tryptic peptides (KR/P, i.e. cleavage C-terminal to Lys or Arg except before Pro) with up to two missed cleavages. Carbamidomethylation of Cys (+57.021465 amu) was set as a static modification, and oxidation of Met (+15.99492 amu) and phosphorylation of Ser, Thr and Tyr (+79.966331 amu) were set as dynamic modifications. The precursor mass error tolerance was set to 25 ppm and the fragment mass error tolerance to 0.5 Da. The resulting peptide-spectrum matches were statistically evaluated with PeptideProphet, and proteins were inferred with ProteinProphet, both part of the Trans-Proteomic Pipeline (ref. 6). A minimum protein probability of 0.9 was set to achieve a false discovery rate (FDR) below 1%. The resulting pep.xml and prot.xml files were used as input for the spectral counting tool Abacus (ref. 7) to calculate spectral counts and normalized spectral abundance factor (NSAF) values (refs. 8,9).
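The NSAF values used throughout can be illustrated with a minimal Python sketch of the standard definition (refs. 8,9). Each protein's spectral count is divided by its length, and the result is normalized by the sum of these length-normalized counts over all proteins in the same run. The function name and toy numbers are hypothetical. The adjusted NSAF reported by Abacus may also differ in how spectra shared between proteins are counted, which this sketch does not attempt to reproduce.

```python
from typing import Dict


def nsaf(spectral_counts: Dict[str, float], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized spectral abundance factor (NSAF) for each protein of one AP-MS run.

    NSAF_k = (SpC_k / L_k) / sum_i (SpC_i / L_i), where SpC is the spectral count
    and L the protein length in amino acids.
    """
    # Length-normalized spectral abundance factor (SAF) per protein
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("the run contains no spectral counts")
    # Scale so that the NSAF values of one run sum to 1
    return {p: value / total for p, value in saf.items()}


# Hypothetical toy run: equal spectral counts, but PREY is twice as long as BAIT
run = nsaf({"BAIT": 40, "PREY": 40}, {"BAIT": 400, "PREY": 800})
print(run)  # BAIT ~0.667, PREY ~0.333
```

Dividing by length prevents long proteins, which yield more peptides, from appearing more abundant than they are. Normalizing within each run makes values comparable between purifications.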

Evaluation of high-confidence interacting proteins (HCIPs)

Adjusted NSAF values of co-purified proteins were compared to a control data set of 62 StrepHA-GFP and 12 StrepHA-RFP-NLS purification experiments. The abundance of each protein in the control data set was estimated by averaging its 10 highest NSAF values across all 74 control measurements. As an initial filter of the raw interaction data, proteins were required to be enriched >10-fold relative to the control data set. Adjusted NSAF values were also used to calculate WDN-scores for all interaction candidates (ref. 10). A simulated data matrix was used to determine the WD-score threshold below which 98% of the simulated data fall. From this high-confidence interaction data set (control ratio > 10; WDN-score > 1), a distance matrix was calculated with the Multiple Experiment Viewer (ref. 11; http://www.tm4.org/mev/) using an uncentered Pearson distance metric and mapped onto the unfiltered raw interactions. To relax the filtering stringency for proteins in close network proximity, interactions below the thresholds (control ratio and WDN-score) were rescued if the distance was greater than zero (n = 314 protein interactions). The resulting filtered data set contained the high-confidence interacting proteins (HCIPs) and the corresponding protein-protein interactions.
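The first filtering step, the control ratio, can be sketched in Python as follows. The sketch assumes adjusted NSAF values are arranged as one vector per bait purification and one matrix of proteins × control purifications. The control abundance of each protein is the mean of its 10 highest control values, and a prey passes if it is enriched more than 10-fold over that value. The following are assumptions for illustration: treating preys absent from all controls as passing, the function names, and the toy data. The WDN-score and the distance-based rescue are not reproduced.

```python
import numpy as np


def control_abundance(control_nsaf: np.ndarray, top_n: int = 10) -> np.ndarray:
    """Mean of the top_n highest NSAF values of each protein (row) over the control runs (columns)."""
    # Sort each row in descending order and average its first top_n entries
    top = -np.sort(-control_nsaf, axis=1)[:, :top_n]
    return top.mean(axis=1)


def passes_control_ratio(bait_nsaf: np.ndarray, control_nsaf: np.ndarray,
                         min_ratio: float = 10.0, top_n: int = 10) -> np.ndarray:
    """Boolean mask of preys enriched more than min_ratio-fold over the control data set."""
    ctrl = control_abundance(control_nsaf, top_n)
    # Preys never observed in any control get an infinite ratio (assumption)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(ctrl > 0, bait_nsaf / ctrl, np.inf)
    # A prey must also be detected in the bait purification itself
    return (ratio > min_ratio) & (bait_nsaf > 0)


# Hypothetical toy data: 3 proteins x 74 control purifications
controls = np.zeros((3, 74))
controls[0, :] = 0.01    # sticky background protein present in every control
controls[1, :5] = 0.001  # background protein seen in only 5 controls
# protein 2 never occurs in the controls
bait = np.array([0.05, 0.02, 0.003])
mask = passes_control_ratio(bait, controls)
# expected: protein 0 fails (5-fold); proteins 1 (40-fold) and 2 (absent from controls) pass
print(mask)
```

Averaging the top 10 values, instead of all 74, makes the control abundance of a protein that sticks only occasionally reflect its typical level when it does appear. This illustrates why such a protein (protein 1) is not simply averaged away.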

For comparison with literature data, all human protein interactions were extracted from the public database IntAct (ref. 12).
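The comparison with IntAct, summarized per bait in Supplementary Figure 1b, essentially amounts to a set lookup of undirected bait-prey pairs. The Python sketch below assumes both interaction sets have already been mapped to a shared identifier namespace (for example UniProt entry names). The function names and toy identifiers are illustrative, not part of the original pipeline.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def split_by_annotation(hcip_edges: Iterable[Edge],
                        reference_edges: Iterable[Edge]) -> Dict[str, Dict[str, int]]:
    """Per bait, count HCIP interactions present in a reference set ("annotated") or not ("novel")."""
    # Interactions are undirected, so compare unordered protein pairs
    reference = {frozenset(edge) for edge in reference_edges}
    counts: Dict[str, Dict[str, int]] = {}
    for bait, prey in hcip_edges:
        per_bait = counts.setdefault(bait, {"annotated": 0, "novel": 0})
        key = "annotated" if frozenset((bait, prey)) in reference else "novel"
        per_bait[key] += 1
    return counts


def novel_fraction(counts: Dict[str, Dict[str, int]]) -> float:
    """Overall fraction of HCIP interactions that are absent from the reference set."""
    novel = sum(c["novel"] for c in counts.values())
    total = sum(c["annotated"] + c["novel"] for c in counts.values())
    return novel / total if total else 0.0


# Hypothetical identifiers; both sets are assumed to use the same namespace
hcip = [("BAIT1", "PREY_A"), ("BAIT1", "PREY_B"), ("BAIT2", "PREY_C"), ("BAIT2", "PREY_D")]
intact = [("PREY_A", "BAIT1"), ("BAIT2", "PREY_C")]
counts = split_by_annotation(hcip, intact)
print(counts, novel_fraction(counts))
```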

Clustering analysis

All data analyses were performed using R (ref. 13; http://www.R-project.org). Agglomerative hierarchical clustering of HCIPs was performed using adjusted NSAF values. Different correlation-based dissimilarity measures were considered in combination with commonly adopted intergroup dissimilarity measures (single, average and complete linkage). For each pair of measures, clustering performance was evaluated using the cophenetic correlation coefficient, which measures how faithfully a dendrogram represents the structure of the input data (ref. 14). Based on this procedure, hierarchical clustering was performed using a dissimilarity based on Spearman's rank correlation coefficient (SCC) together with average linkage. The dissimilarity between preys $i$ and $j$ was therefore computed as

$$d_{ij} = \frac{1 - r(x_i, x_j)}{2},$$

where $r$ is the SCC and $x_i$ and $x_j$ are the adjusted NSAF profiles of preys $i$ and $j$, so that $d_{ij}$ ranges from 0 (perfect positive correlation) to 1 (perfect negative correlation).
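The linkage selection and the Spearman-based dissimilarity can be re-expressed in Python with SciPy as a rough sketch. The original analysis was done in R; the function names and toy profiles below are illustrative assumptions. Spearman's coefficient is computed as the Pearson correlation of within-row ranks. Each linkage function is scored by the cophenetic correlation between its dendrogram and the input dissimilarities, and the best-scoring linkage is returned. Only the Spearman-based dissimilarity is scored here. The other correlation-based measures compared in the original would be added as further loop iterations.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def spearman_dissimilarity(profiles: np.ndarray) -> np.ndarray:
    """Square matrix with d_ij = (1 - r_ij) / 2, r = Spearman correlation between rows."""
    # Spearman's rho equals the Pearson correlation of the (tie-averaged) ranks
    ranks = rankdata(profiles, axis=1)
    r = np.corrcoef(ranks)
    # Clip floating-point noise so that distances stay within [0, 1]
    d = np.clip((1.0 - r) / 2.0, 0.0, 1.0)
    np.fill_diagonal(d, 0.0)
    return d


def best_linkage(profiles: np.ndarray, methods=("single", "average", "complete")):
    """Return (method, cophenetic correlation, linkage matrix) of the best-scoring linkage."""
    condensed = squareform(spearman_dissimilarity(profiles), checks=False)
    results = {}
    for method in methods:
        z = linkage(condensed, method=method)
        # Correlation between dendrogram (cophenetic) distances and input dissimilarities
        coph_corr, _ = cophenet(z, condensed)
        results[method] = (float(coph_corr), z)
    best = max(results, key=lambda m: results[m][0])
    return best, results[best][0], results[best][1]


# Hypothetical toy profiles: 6 preys x 8 purifications in two anti-correlated groups
up = np.arange(1, 9, dtype=float)
profiles = np.vstack([
    up,
    up[[0, 1, 2, 3, 4, 5, 7, 6]],
    up[[1, 0, 2, 3, 4, 5, 6, 7]],
    up[::-1],
    up[::-1][[0, 1, 2, 3, 4, 5, 7, 6]],
    up[::-1][[1, 0, 2, 3, 4, 5, 6, 7]],
])
method, coph_corr, z = best_linkage(profiles)
print(method, round(coph_corr, 3))
```

On this toy input, all three linkages separate the two groups cleanly, so the winning method is not informative here. With real HCIP profiles the cophenetic scores differ, and the procedure described above selected average linkage.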


Network Visualization

Protein interaction data were visualized with Cytoscape 2.8.3 (ref. 15). Known bait interactions were obtained from the protein interaction network analysis platform PINA v2 (December 2012; ref. 16) using bait protein identifiers as starting nodes.

ChIP-seq data analysis

ChIP-seq profiles of RING1B, RYBP and TIF1B (GEO accession numbers GSM855007, GSM855008 and GSE27929, respectively) and the corresponding input data sets were downloaded in SRA format and converted to FASTQ using the NCBI Short Read Archive Toolkit. Reads were aligned to the human genome (hg19 assembly) using Bowtie 2.0.0 (ref. 17), allowing 1 mismatch in a 30-nt seed and reporting the best of at most 100 alignments. Overall alignment rates ranged from 83 to 94% for in-house generated data sets and from 75 to 98% for the others. Alignments were converted from SAM to BAM format using SAMtools 0.1.18 (ref. 18). Peak calling was performed using MACS 1.4.0 (ref. 19) with default parameters, and peaks were then filtered by p-value ($p < 10^{-10}$). Where replicates were available, only peak intersections were retained and denoted as high-confidence peaks. All subsequent analyses were performed using R/Bioconductor (ref. 20). Coverage tracks at single-nucleotide resolution were generated with wavClusteR (ref. 21). Overlapping peaks were determined using GenomicRanges with a minimum overlap of 1 bp. RefSeq transcript annotations were fetched from UCSC using GenomicFeatures. Unique TSSs were defined as TSSs with no other annotated TSS within their 1 kb flanking region, irrespective of strand. A total of 21,612 unique TSSs were considered further. Metaprofiles of ChIP-seq signals at TSSs (±2.5 kb) were computed using non-overlapping 50-nt windows. ChIP-seq signal heatmaps were computed with Genomation (ref. 22). Feature enrichments at unique promoters were computed as described previously (ref. 23). MSigDB pathway analysis was performed with GREAT 2.0.2 (ref. 4) by associating each genomic region with the single nearest annotated gene within 5 kb.
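The unique-TSS filter and the binned TSS metaprofile can be sketched in Python as follows. The original used R/Bioconductor, and the following assumptions are not stated in the original:

- Coordinates are 0-based.
- A TSS counts as unique if every other TSS lies more than 1 kb away on either side.
- Identical annotations shared by several transcripts are collapsed first.
- Per-chromosome coverage arrays are already normalized to library size.
- Minus-strand windows are reversed so that profiles read 5' to 3'.

With ±2.5 kb and 50-nt windows, each profile has 100 bins. Function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

import numpy as np

Tss = Tuple[str, int, str]  # (chromosome, 0-based position, strand "+" or "-")


def unique_tss(tss: List[Tss], flank: int = 1000) -> List[Tss]:
    """Keep TSSs with no other TSS (on either strand) within +/- flank bp."""
    by_chrom: Dict[str, List[Tss]] = defaultdict(list)
    for t in set(tss):  # collapse identical annotations (assumption)
        by_chrom[t[0]].append(t)
    kept: List[Tss] = []
    for items in by_chrom.values():
        items.sort(key=lambda t: t[1])
        for i, (_, pos, _) in enumerate(items):
            left_ok = i == 0 or pos - items[i - 1][1] > flank
            right_ok = i == len(items) - 1 or items[i + 1][1] - pos > flank
            if left_ok and right_ok:
                kept.append(items[i])
    return sorted(kept)


def tss_metaprofile(coverage: Dict[str, np.ndarray], tss: List[Tss],
                    half_width: int = 2500, bin_size: int = 50) -> np.ndarray:
    """Average signal in non-overlapping bins spanning TSS +/- half_width.

    2 * half_width must be divisible by bin_size.
    """
    n_bins = 2 * half_width // bin_size
    rows = []
    for chrom, pos, strand in tss:
        cov = coverage[chrom]
        # Choose the window so that the TSS sits at index half_width after orientation
        start = pos - half_width if strand == "+" else pos - half_width + 1
        end = start + 2 * half_width
        if start < 0 or end > len(cov):
            continue  # skip TSSs whose window runs off the chromosome
        window = cov[start:end]
        if strand == "-":
            window = window[::-1]  # orient minus-strand TSSs 5' -> 3'
        rows.append(window.reshape(n_bins, bin_size).mean(axis=1))
    return np.mean(rows, axis=0) if rows else np.zeros(n_bins)


# Hypothetical toy data on a single 20-kb chromosome
tss = [("chr1", 10000, "+"), ("chr1", 10500, "-"), ("chr1", 15000, "+"), ("chr1", 15000, "+")]
print(unique_tss(tss))  # [('chr1', 15000, '+')]
coverage = {"chr1": np.zeros(20000)}
coverage["chr1"][15000:15050] = 2.0  # normalized signal just downstream of the TSS
profile = tss_metaprofile(coverage, unique_tss(tss))
```

Excluding TSSs with close neighbours keeps each metaprofile from mixing signal belonging to two nearby promoters, which is presumably the motivation for the unique-TSS definition.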

Supplementary References

1. Cai, Y. et al. Subunit composition and substrate specificity of a MOF-containing histone acetyltransferase distinct from the male-specific lethal (MSL) complex. The Journal of biological chemistry 285, 4268-4272 (2010).

2. Ogawa, H., Ishiguro, K., Gaubatz, S., Livingston, D.M. & Nakatani, Y. A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells. Science 296, 1132-1136 (2002).

3. Glatter, T., Wepf, A., Aebersold, R. & Gstaiger, M. An integrated workflow for charting the human interaction proteome: insights into the PP2A system. Molecular systems biology 5, 237 (2009).

4. McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature biotechnology 28, 495-501 (2010).

5. Craig, R. & Beavis, R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466-1467 (2004).

6. Deutsch, E.W. et al. A guided tour of the Trans-Proteomic Pipeline. Proteomics 10, 1150-1159 (2010).

7. Fermin, D., Basrur, V., Yocum, A.K. & Nesvizhskii, A.I. Abacus: a computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. Proteomics 11, 1340-1345 (2011).

8. Paoletti, A.C. et al. Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proceedings of the National Academy of Sciences of the United States of America 103, 18928-18933 (2006).

9. Zybailov, B. et al. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. Journal of proteome research 5, 2339-2347 (2006).

10. Behrends, C., Sowa, M.E., Gygi, S.P. & Harper, J.W. Network organization of the human autophagy system. Nature 466, 68-76 (2010).

11. Saeed, A.I. et al. TM4: a free, open-source system for microarray data management and analysis. BioTechniques 34, 374-378 (2003).

12. Orchard, S. et al. The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic acids research 42, D358-363 (2014).

13. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2012).

14. Hastie, T., Tibshirani, R. & Friedman, J. The elements of statistical learning: data mining, inference, and prediction, 2nd edn. (Springer, New York, 2009).

15. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 13, 2498-2504 (2003).

16. Wu, J. et al. Integrated network analysis platform for protein-protein interactions. Nature methods 6, 75-77 (2009).

17. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359 (2012).

18. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).

19. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome biology 9, R137 (2008).

20. Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nature methods 12, 115-121 (2015).

21. Comoglio, F., Sievers, C. & Paro, R. Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data. BMC bioinformatics 16, 32 (2015).

22. Akalin, A. et al. Genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics 31, 1127-1129 (2014).

23. Enderle, D. et al. Polycomb preferentially targets stalled promoters of coding and noncoding transcripts. Genome research 21, 216-226 (2011).


[Figure 1. (a) AP-MS workflow: Flp-In HEK293 cells, double affinity purification, LC-MS/MS on an LTQ Orbitrap XL, identification with X!Tandem + TPP, WD-score filtering and hierarchical clustering; (b) hierarchical clustering of HCIPs; (c) protein-protein interaction network; (d) PRC1.3/PRC1.5 and (e) PRC1.6 interaction maps.]

[Figure 2. Panels a-i; (g) relative luciferase activity (%), - Tet vs. + Tet; (h, i) ChIP-qPCR signal (% of input) at TSS and +500 amplicons, Gal4 vs. IgG.]

[Figure 3. (a) PR-DUB and 19S proteasome clusters; (b) PR-DUB topology, with shared HCIPs of OGT1 and WDR5 highlighted.]

Figure 4 a b OGlcNac FOXK1 ASXL1 OGlcNac (n=16699)

4827 373 2923

FOXK1 0.02 0.10

(n=7349) 2703 Average signal (RPM) 8796 -2.5kbTSS +2.5kb TSS TSS 1350

18270 c ASXL1 (n=31119) ASXL1 -2 ρ -4 0 2 4 =0.83 Peak position w.r.t. TSSs Overlap (+/-1kb) OGlcNac Distal ρ ρ

-4 0 2 4-2 =0.89 =0.85 Upstream (<5kb) Downstream -4-2 0 2 4 -4-2 0 2 4 (<5kb) FOXK1 ASXL1 d PR-DUB.1 - MSigDB Pathway -log10(Binomial p value) 0 10 20 30 40 50 60 70 80 90 100 110 Gene Expression 116.78 Transcription 61.58 Cell Cycle, Mitotic 54.84 Spliceosom e 46.15 Metabolism of proteins 45.81 Form ation and Maturation of m RNA 44.61 HIV Infection 42.49 Influenza Life Cycle 40.18 Diabetes pathways 39.84 Translation 39.00

Chr1 6,260,000 Chr8 101,163,000 Chr1 28,969,000 28,970,000 Chr11 57,426,000 Chr3 127,318,000 40 40 70 80 80 FOXK1 FOXK1 1 1 1 1 1

109 75 120 35 70 ASXL1 ASXL1 1 1 1 1 1

65 60 50 60 30 OGlcNac OGlcNac 1 1 1 1 1

[Panels e and f: overlap of the PR-DUB.1 (n=2703), PRC1 (n=6816) and TIF1B (n=10297) sets; genome browser view of Chr1:163,291,000-163,293,000 (scale bar 2 kb; genes RGS5 and NUF2) with FOXK1, ASXL1, OGlcNac, RING1B, RYBP and TIF1B tracks, and profiles spanning -5 kb to +5 kb.]

Supplementary Figure 1

[(a, b) Number of interactions per bait (y-axis, 0-80), split into interactions annotated in IntAct and interactions novel in this study. Legend: primary baits, secondary baits, HC IPs. Bait proteins are grouped into the "PcG system" (PRC1, PRC2, PR-DUB and PhoRC, shown alongside the Drosophila proteins Pc, Sce, Ph, Psc, E(z), Esc, Su(z)12, Pcl, Calypso, Asx, Pho and dSfmbt) and "heterochromatin". (c) Network of PRC1 subunits with LCOR, BCOR, KDM2B, SKP1, F-box proteins (FBX, FBXL and FBXW families), CUL1, RBX1, SKP2, NEDD8 and proteasome subunits. (d, e) Interactions with MLL complexes, the NSL complex, the TCP complex, TORC2 and the ADA2/GCN5/ADA3 transcription activator complex, comparing this study with Cai et al, 2010.]

Supplementary Figure 2

[(a) HP1 (CBX1, CBX3, CBX5) network with PRC2, CAF1A, CAF1B, ASF1A, ASF1B and HAT1, and with members of the NuRD, SIN3, LINC and NURF complexes. (b) INO80 complex and LMBL3 network, including SMBT1, KDM1A, RCOR1, RCOR3, HDAC1 and HDAC2. (c) Network of SMBT1, LMBL2-4, RING1, RING2, RYBP, YAF2, PCGF6, E2F6, TFDP1, MAX, MGAP and WDR5, together with proteasome and TCP complex subunits, among others.]

Supplementary Figure 3

[(a, b) Networks of PRC1 subunits (CBX2, CBX4, CBX6-8, PHC1-3, PCGF1-6, RING1, RING2, RYBP, YAF2, SCML1) with additional interactors, including the 14-3-3 proteins 1433F, 1433G and 1433T, LMNB1 and LMNB2. (c, d) PRC1.1 (PCGF1, KDM2B, BCOR, BCORL, SKP1) and PRC1.3/5 (PCGF3, PCGF5, AUTS2, FBRS, FBSL, CSK21, CSK22, CSK2B, DCAF7) subnetworks.]

Supplementary Figure 4

[(a) HP1 (CBX1, CBX3, CBX5) network, including EHMT1, EHMT2, WIZ, TIF1B, SENP7 and a cluster of zinc finger proteins (e.g. ZN462, Z518A, Z518B, ZN644, ZMYM2, ZMYM3, ZMYM4, POGZ, ADNP). (b-e) Subnetworks, including EHMT1/EHMT2/WIZ, CAF1A/CAF1B/ASF1B/RBBP4/H33, and the kinetochore proteins MIS12, DSN1, NSL1, PMF1, NDC80, NUF2, SPC24, SPC25, ZWINT and CASC5.]

Supplementary Figure 5

[(a) BLAST alignment of C17ORF96 (residues 313-375) with SKDA1 (residues 763-826): Score 77.8 bits (190), Expect 2e-14, Method: compositional matrix adjust.; Identities 34/64 (53%), Positives 43/64 (67%), Gaps 1/64 (1%).]

C17ORF96  313  FSLLNCFPCPPALVVGEDGDLKPASSLRLQGDSKPP-PAHPLWRWQMGGPAVPEPPGLKFWGIN  375
               F  +  FPCPP+L++G DGDL PA SL    DS+ P  AHP+W+WQ+GG A+P PP  KF   N
SKDA1     763  FHFMANFPCPPSLIIGRDGDLWPAYSLNTTKDSQTPHKAHPIWKWQLGGSAIPLPPSHKFRKFN  826

[(b) Alignment of C17ORF96 (residues 1-379) with SKDA1 (residues 1-827), with conservation symbols (*, :, .). (c) Protein sequence with identified peptides assigned to C10orf12, LCOR-Cra, LCOR-Cra/LCOR or LCOR. (d) Peptide counts per bait (C10orf12, LCOR-Cra, EED, EZH1, EZH2, LCOR, MTF2, PHF1, PHF19; 476 peptides in total). (e) Peptide counts for PRC2 interactions (DEK, EED, EZH1, EZH2, MTF2, PHF1, PHF19, RBBP4, RBBP7, SUZ12) and LCOR interactions (CTBP1, CTBP2, GRN, LCOR, TBA1C) for the C10orf12, LCOR-Cra and LCOR baits. (f) DNA, anti-HA and merged images of HA-C10ORF12 (Tet + and Tet -) and HA-EGFP cells.]

Supplementary Figure 6

[Read alignment statistics:]

Sample     Total reads  Overall        Unique alignments  Multiple alignments
           (mio.)       alignment (%)  mio. (%)           mio. (%)
ASXL1      47.2         88.1           31.8 (67.4)        9.7 (20.7)
FOXK1      18.6         94.3           13.1 (70.3)        4.4 (24.0)
O-GlcNAc   46.2         86.4           30.2 (65.3)        9.8 (21.1)
Input62    20.2         97.4           13.4 (66.4)        6.3 (31.1)
Input63    46.8         83.8           27.7 (59.3)        11.5 (24.5)

[MSigDB pathway enrichment for ASXL1, x-axis -log10(binomial p value); top five of 20 listed terms: Gene Expression (>270); Diabetes pathways (>270); Metabolism of proteins (266.81); Cell Cycle, Mitotic (240.09); Influenza Life Cycle (208.46).]

[MSigDB pathway enrichment for FOXK1, x-axis -log10(binomial p value); top five of 20 listed terms: Gene Expression (74.68); Transcription (41.57); Cell Cycle, Mitotic (34.37); Metabolism of proteins (32.28); Translation (29.28).]

[MSigDB pathway enrichment for OGlcNac, x-axis -log10(binomial p value); top five of 20 listed terms: Gene Expression (>190); Cell Cycle, Mitotic (184.59); Diabetes pathways (169.42); Transcription (144.98); Metabolism of proteins (140.63).]

[(c) Pairwise comparison matrix of FOXK1, ASXL1, OGlcNac, RYBP, RING1B and TIF1B.]