bioRxiv preprint doi: https://doi.org/10.1101/059964; this version posted July 7, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
A high density map for navigating the human Polycomb complexome
Simon Hauri1,2,7*, Federico Comoglio3,8*, Makiko Seimiya3, Moritz Gerstung3,9, Timo Glatter1,10, Klaus Hansen4, Ruedi Aebersold1,5, Renato Paro3,6, Matthias Gstaiger1,2† and Christian Beisel3†
1 Department of Biology, Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
2 Competence Center Personalized Medicine UZH/ETH, Zürich, Switzerland
3 Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
4 Biotech Research and Innovation Centre (BRIC) and Centre for Epigenetics, University of Copenhagen, Copenhagen, Denmark
5 Faculty of Science, University of Zürich, Zürich, Switzerland
6 Faculty of Sciences, University of Basel, Basel, Switzerland
7 Present address: Department of Clinical Sciences, Lund University, Lund, Sweden
8 Present address: Department of Haematology, Cambridge Institute for Medical Research and Wellcome Trust/MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge, United Kingdom
9 Present address: European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, United Kingdom
10 Present address: Mass spectrometry and proteomics, Max Planck Institute for Terrestrial Microbiology, Marburg, Germany
∗ Equal contribution † Corresponding authors. M.G: [email protected]; C.B: [email protected]
Abstract
Polycomb group (PcG) proteins are major determinants of gene silencing and epigenetic memory in higher eukaryotes. Here, we used a robust affinity purification mass spectrometry (AP-MS) approach to systematically map the human PcG protein interactome, uncovering an unprecedented breadth of PcG complexes. The obtained high density protein interaction data identified new modes of combinatorial PcG complex formation with proteins previously not associated with the PcG system, thus providing new insights into their molecular function and recruitment mechanisms to target genes. Importantly, we identified two human PR-DUB de-ubiquitination complexes, which comprise the O-linked N-acetylglucosamine transferase OGT1 and a number of transcription factors. By further mapping chromatin binding of PR-DUB components genome-wide, we conclude that the human PR-DUB and PRC1 complexes bind distinct sets of target genes and impact on different cellular processes in mammals.
Introduction
Cell division requires faithful replication of the genome and restoration of specific chromatin states that form the basis of epigenetic memory [1]. Polycomb group (PcG) proteins - originally identified in Drosophila melanogaster as epigenetic regulators stably maintaining the repressed state of homeotic genes throughout development - are key players in this process. Numerous studies have now established a central role for PcG proteins in the dynamic control of hundreds of targets in metazoans, including genes affiliated to fundamental signaling pathways [2]. Hence, biological processes regulated by PcG proteins encompass cell differentiation, tissue regeneration and cancer cell growth [3-5].
The PcG system is organized in multimeric repressive protein complexes containing distinct chromatin modifying activities, which impact on transcriptional regulation by modulating chromatin structures. In Drosophila, five distinct PcG complexes displaying different biochemical functions have been reported. The Polycomb Repressive Complex (PRC) 2 contains Enhancer of Zeste, which trimethylates lysine 27 of histone H3 (H3K27me3) [6,7], while the PRC1 subunit Polycomb provides binding specificity to H3K27me3 through its chromo-domain [8,9]. In addition, PRC1 contains the dRing protein, which catalyzes the mono-ubiquitination of histone H2A on lysine 118 (H2AUb1), thereby blocking RNA polymerase II activity [10-12]. The Pho (Pleiohomeotic, Drosophila homolog of mammalian YY1) repressive complex PhoRC combines DNA- and histone tail binding specificities [13], the PRC1-related dRing-associated factors complex dRAF contains the H3K36-specific histone demethylase dKDM2 [14] and the Polycomb repressive deubiquitinase (PR-DUB) targets H2AUb1 [15].
Although the core components of Drosophila PcG complexes seem rather fixed, we and others have shown that they can be co-purified with different sets of accessory proteins, thus increasing the diversity of the PcG system [13,16-18]. Epigenomic profiling revealed that distinct PcG complexes target largely overlapping gene sets in Drosophila, and mechanistic details of PcG recruitment to target genes are beginning to emerge [15,19-22].
In contrast, the mammalian PcG system is less well defined and appears to be significantly more complex. Each Drosophila PcG subunit has up to six human homologs, which combinatorially assemble in different complexes [23-26]. The six homologs of the Drosophila PRC1 core protein Psc, PCGF1-6, purify together with RING2, the homolog of dRing, in different complexes named PRC1.1-PRC1.6, and each of them associates with specific additional components [23-27]. These PRC1 complexes are further distinguished by the mutually exclusive presence of RYBP or a chromo-domain containing CBX protein. Five different CBX proteins displaying differential affinities for lysine-methylated histone H3 tails and RNA [28] have been linked to PRC1. In contrast, the absence of a chromo-domain within RYBP suggests that recruitment of CBX and RYBP containing PRC1 complexes might be mediated by H3K27 methylation or be independent of it, respectively. Indeed, recent work showed that the histone demethylase Kdm2b targets PRC1.1 via direct binding to unmethylated CpG islands [29-31]. Interestingly, incorporation of RING2 in optional PCGF complexes not only leads to differential recruitment to chromatin but also differentially regulates its enzymatic activity [23,29,31-34].
Similar to PRC1, the histone methyltransferase (HMT) activity of PRC2 is potentially modulated by accessory components such as the Polycomb-like homologs PHF1, PHF19 and MTF2 [35,36]. Additional DNA binding interaction partners like JARD2 and AEBP2 might mediate recruitment of the complexes to chromatin [37-41]. However, whether the PRC2 core, consisting of EED, SUZ12 and EZH2, simultaneously interacts with all of these components or whether distinct complexes co-exist remains unknown. Moreover, mammalian PhoRC and PR-DUB have not been identified to date.
Understanding PcG-mediated epigenetic regulation in mammals requires a detailed understanding of the dynamic assembly of PcG complexes. A required step towards this goal is the exhaustive definition of the composition of individual PcG complexes including all accessory proteins, which likely convey distinct functional effects. Here we present the first systematic and comprehensive high-density map of the modular organization of the human PcG system using a sensitive double-affinity purification and mass-spectrometry (AP-MS) method [42,43]. The map of 1400 interactions and 490 proteins led to a considerable refinement of the human PRC1 and PRC2 network topology, including their relation with the heterochromatin silencing system and the identification of several novel interaction partners. Furthermore, we determined the composition of the human PR-DUB. We found that this highly diverse complex contains MBD proteins, FOXK transcription factors and OGT1, an O-linked N-acetylglucosamine (O-GlcNAc) transferase implicated in PcG silencing in Drosophila [44]. Finally, chromatin profiling of PR-DUB components and comparison with published chromatin maps of PcG proteins indicate that, as opposed to Drosophila, PRC1 and PR-DUB regulate distinct sets of genes in human cells.
Results and Discussion
Systematic mapping of the human PcG interaction proteome
To investigate the human PcG protein interaction network, we applied a systematic proteomics approach, based on our previously reported AP-MS protocol in HEK293 cells [42]. The method employs Flp-In HEK293 stable cell lines expressing Strep-HA tag fusion proteins upon tetracycline induction (Fig. 1a). Initially, we selected 28 PcG proteins homologous to Drosophila core complex components and performed AP-MS experiments using these proteins as primary baits (Supplementary Fig. 1a). Then, based on the observed interaction data from this set, we chose 36 additional secondary bait proteins (Supplementary Fig. 1a, Supplementary Table 1). After double affinity purification, bait-associated proteins (preys) were identified by liquid chromatography tandem mass spectrometry (LC-MS/MS; Fig. 1a).
At least two biological replicates were measured for each bait protein, for a total of 174 AP-MS measurements. Proteins were identified using the X!Tandem search tool to match mass spectra to peptides, and the Trans-Proteomic Pipeline (TPP) to map peptides to proteins, at a false discovery rate (FDR) of less than 1% [45,46]. The resulting raw data set contained 930 proteins exhibiting 9856 candidate interactions.
To efficiently discriminate biologically relevant interaction partners from contaminant proteins, we devised a stringent filtering procedure based on both WDN-score [47] and average enrichment over control purifications for each bait-prey pair. This filtering strategy retained 490 high confidence interacting proteins (HCIPs) encompassing 1400 (1193 unidirectional and 207 reciprocal) interactions. Our data set is characterized by an average of 21.9 HCIPs per bait protein, with 75% of interactions that have not yet been annotated in public databases (Supplementary Fig. 1b).
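The filtering and bookkeeping described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Candidate` record, the `min_score` and `min_enrichment` cut-offs and the placeholder bait and prey names are all hypothetical, and the WDN-score is treated as a precomputed number. The last lines check that the quoted figures are mutually consistent, assuming the average is simply the number of interactions divided by the number of baits.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    bait: str
    prey: str
    score: float       # precomputed WDN-score (assumed to be given)
    enrichment: float  # average enrichment over control purifications

def filter_hcips(candidates, min_score=1.0, min_enrichment=10.0):
    """Keep bait-prey pairs that pass both (hypothetical) cut-offs."""
    return {(c.bait, c.prey) for c in candidates
            if c.score >= min_score and c.enrichment >= min_enrichment}

def count_interactions(pairs):
    """Return (unidirectional, reciprocal) counts.

    A reciprocal interaction (A recovers B and B recovers A) is counted once.
    """
    reciprocal = {frozenset((a, b)) for a, b in pairs
                  if a != b and (b, a) in pairs}
    unidirectional = {(a, b) for a, b in pairs
                      if a != b and (b, a) not in pairs}
    return len(unidirectional), len(reciprocal)

# Toy candidates with placeholder names and made-up values.
candidates = [
    Candidate("baitA", "baitB", 2.4, 35.0),
    Candidate("baitB", "baitA", 2.0, 28.0),
    Candidate("baitA", "preyX", 1.6, 12.0),
    Candidate("baitA", "preyY", 0.3, 1.2),   # fails both cut-offs
    Candidate("baitB", "preyZ", 1.9, 3.0),   # good score, weak enrichment
]
hcips = filter_hcips(candidates)
toy_counts = count_interactions(hcips)

# Consistency check on the figures quoted in the text.
n_baits = 28 + 36                 # primary + secondary baits
n_interactions = 1193 + 207       # unidirectional + reciprocal
average_per_bait = n_interactions / n_baits
print(toy_counts, n_interactions, round(average_per_bait, 1))
```

Dividing the 1400 interactions by the 64 baits gives a value that rounds to the reported 21.9; whether the average was computed exactly this way is an assumption.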
To evaluate the specificity and sensitivity of our AP-MS data, we considered the two bait proteins exhibiting the highest number of HCIPs, SKP1 (79 HCIPs) and WDR5 (73), and performed a cross-validation with literature-based reports. SKP1 serves as an adaptor for F-Box proteins and CUL1, and confers enzymatic specificity. Out of 79 HCIPs, our SKP1 purifications identified 42 F-Box proteins (Supplementary Fig. 1c). Furthermore, a previous AP-MS study [48] investigating the interaction partners of WDR5 identified a set of 21 proteins associating with this scaffold protein, which takes part in the assembly of several chromatin regulating complexes (reviewed in Migliori et al. [49]). Notably, while we were able to recall 76% of previously reported interaction partners, our experiments identified an additional set of 48 proteins (Supplementary Fig. 1d) co-purifying with WDR5 and encompassing MLL complexes, the NSL complex, the ADA2/GCN5/ADA3 transcription activator complex, mTORC2 components RICTOR and SIN1, and the Polycomb repressive complex PRC1.6 (Supplementary Fig. 1e).
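The literature cross-check above amounts to a recall calculation, sketched here in Python. The `recall` helper and the toy sets use placeholder names and are not data from the study. The count of recalled WDR5 partners is inferred from the quoted 76% of 21 and is not stated in the text.

```python
def recall(observed, reported):
    """Fraction of previously reported partners that were observed again."""
    reported = set(reported)
    return len(reported & set(observed)) / len(reported)

# Toy example with placeholder names (not data from the study).
toy_reported = {"A", "B", "C", "D"}
toy_observed = {"A", "B", "C", "X", "Y"}
toy_recall = recall(toy_observed, toy_reported)  # 3 of 4 reported partners

# Figures quoted in the text.
skp1_hcips, skp1_fbox = 79, 42
wdr5_reported, wdr5_recall = 21, 0.76

fbox_percent = 100 * skp1_fbox / skp1_hcips
# Inferred count of recalled partners; an assumption, not a reported number.
wdr5_recalled = round(wdr5_recall * wdr5_reported)

print(f"F-Box share of SKP1 HCIPs: {fbox_percent:.0f}%")
print(f"WDR5 partners recalled (inferred): {wdr5_recalled} of {wdr5_reported}")
```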
Hierarchical clustering assigns HCIPs to PcG complexes
To determine the topology of our protein interaction network, we performed hierarchical clustering of HCIPs using a rank-based correlation dissimilarity measure (see Supplementary Methods for details). Clustering revealed a modular organization built upon the three major PcG assemblies PRC1, PRC2 and PR-DUB, and HP1-associated complexes (Fig. 1b-c).
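To make the idea of clustering on a rank-based correlation dissimilarity concrete, the following Python sketch uses one possible choice: 1 minus the Spearman correlation, combined with average linkage. Both choices are assumptions made for illustration, since the exact measure and linkage are specified only in the Supplementary Methods. The protein names and abundance profiles are hypothetical.

```python
from itertools import combinations

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def spearman_dissimilarity(x, y):
    """1 - Spearman correlation: 0 for identical rank order, 2 for reversed."""
    return 1.0 - pearson(ranks(x), ranks(y))

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: merge the closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(spearman_dissimilarity(profiles[a], profiles[b])
                   for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical prey abundance across five bait purifications.
profiles = {
    "P1": [9, 8, 7, 0, 1],
    "P2": [8, 9, 6, 1, 0],
    "P3": [0, 1, 2, 9, 8],
    "P4": [1, 0, 1, 8, 9],
}
modules = average_linkage(profiles, n_clusters=2)
print(modules)
```

Working on ranks rather than raw values makes the grouping insensitive to the large dynamic range typical of spectral counts or intensities, which is one plausible motivation for a rank-based measure.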
PRC1 represents the most elaborate and heterogeneous assembly, containing four groups of complexes defined by the six PCGF proteins: PRC1.1 (PCGF1), PRC1.2/PRC1.4 (PCGF2/4), PRC1.3/PRC1.5 (PCGF3/5) and PRC1.6 (PCGF6). Among these PRC1 assemblies, PRC1.6 further provides links to the heterochromatin control system via the HP1 chromobox proteins CBX1 and CBX3. Although analysis of the PRC1 topology has been recently reported in studies concentrating on specific subunits in various cellular systems [23,50-52], our systematic high-density interaction data allowed us to further refine the composition of the PRC1 module. In the following discussion we focus on these novel findings regarding PRC1 organization, which is illustrated in Fig. 1b-e and Supplementary Fig. 3, and detailed in Supplementary Table 2.
All four PRC1 assemblies share a common core encompassing the E3 ubiquitin ligases RING1 and RING2, and - with the exception of PRC1.2/PRC1.4 - RYBP and YAF2. Interestingly, PCGF2/4 also interact with RYBP and YAF2. As these proteins do not share any additional interaction partner besides RING1/2 (Supplementary Fig. 3b), RYBP/YAF2-PCGF2/4-RING complexes might have limited functionality compared to other PRC1 complexes or correspond to transient products before specific canonical and non-canonical PRC1 holo complexes assemble. Furthermore, we did not detect any protein stably associating with all canonical PRC1 core members (RING1/2, PHC1-3, CBX2/4/6/7/8, PCGF2/4). However, we identified NUFP2 (Nuclear fragile X mental retardation interacting protein 2), a putative RNA binding protein exhibiting interactions with CBX2/6/7, PHC3 and PCGF4, as a new PRC1 interacting protein (Supplementary Fig. 3b).
The PRC2 complex is separated from both PRC1 and HP1 (Fig. 1c). The two characteristic histone binding proteins RBBP4 and RBBP7 not only belong to the PRC2 core along with SUZ12, EED and EZH1/2, but also partake in other protein complexes such as LINC, NURF, NURD and SIN3 (Supplementary Fig. 2a).
Finally, we identified the PcG complex PR-DUB defined by ASXL1/2 and BAP1 (Fig. 1c). Our clustering analysis also revealed complexes such as the TCP chaperonin and the proteasomal lid, which primarily consist of prey proteins (Fig. 1b). Of note, several proteins belonging to MLL complexes share interactions between PRC1.3/PRC1.5 (CSK21/22), PRC1.6 (WDR5) and PR-DUB (OGT1) (Fig. 1c). In contrast, interaction modules centered on LMBL1/3/4, SUV92 and TRIPC, LCOR, ZN211, and YY1 (the homolog of Drosophila Pho) are more disconnected and tend to be sparse (Fig. 1c, Fig. 2d and Supplementary Fig. 2b-c). Although YY1 interacts with all subunits of the INO80 chromatin remodeling complex, our AP-MS data do not reveal an equivalent of the Drosophila PhoRC complex (Supplementary Fig. 2b). However, except for PhoRC, we were able to reconstitute all mammalian equivalents of Drosophila PcG protein assemblies with unprecedented detail.
WD40 domain proteins DCAF7 and WDR5 are central scaffolding proteins for PRC1.3/PRC1.5 and PRC1.6
The WD40 domain protein DCAF7 has been implicated in skin development and cell proliferation by interacting with DIAP1 and the dual-specificity tyrosine phosphorylation-regulated kinase DYR1A [53,54]. Intriguingly, DCAF7 co-purified with CBX4/6/8, RING1/2, RYBP/YAF2 and PCGF3/5/6, indicating that the protein is deeply embedded in the PRC1 module. As recent studies also reported interactions between DCAF7 and members of the canonical PRC1 complex, as well as PCGF3/5/6 [26,27,55,56], we performed DCAF7 purifications to test whether the protein is indeed a universal subunit of several different RING1/2-containing complexes.
Our DCAF7 AP-MS revealed reciprocal interactions with all bait proteins within a cluster centered on PCGF3 and 5 (Fig. 1d and Supplementary Fig. 3c), with no relation to the other PCGF complexes. Moreover, we identified DYR1A/B, DIAP1, the zinc finger transcription factors (ZNFs) ZN503 and ZN703, and the ankyrin-repeat proteins SWAHA and SWAHC as an unrelated module interacting with DCAF7 (Fig. 1d). This result suggests that DCAF7 acts as a scaffold for several different protein complexes.
Like RING1/2, RYBP/YAF2 and PCGF3/5, DCAF7 interacts with the tetrameric casein kinase 2 (CSK2) and the three paralogs AUTS2, FBRS and FBSL. Therefore, to further refine the PRC1.3/PRC1.5 sub-network we performed AP-MS experiments using the catalytic casein kinase subunits CSK21 and CSK22. Our results confirmed the topology of the PCGF3/5-DCAF7 assemblies, and identified CSK2 and three uncharacterized proteins within the AUTS2 family as part of PRC1.3/PRC1.5 (Fig. 1d and Supplementary Fig. 3c).
The protein PCGF6 was initially purified together with the transcription factors E2F6, MAX, TFDP1, MGAP as well as RING1/2, YAF2, LMBL2, CBX3 and the HMTs EHMT1 and EHMT2, an assembly denoted as E2F6.com [57]. However, subsequent studies were unable to recover the entire (holo) E2F6.com [23,27,58,59]. Moreover, recent data suggest that PCGF6 and RING2 might interact with the WD40 domain protein WDR5 [23]. We therefore decided to revisit the topology of the PCGF6-E2F6 network and to probe WDR5 connectivity by adding MAX, TFDP1, E2F6, LMBL2, CBX3, EHMT2 and WDR5 to our bait collection. Our AP-MS experiments revealed a high-density network including reciprocal interactions between all baits within this set but one (EHMT2) (Fig. 1e and Supplementary Fig. 2c), thus demonstrating that the major PRC1.6 complex resembles E2F6.com. In addition, MGAP, MAX, TFDP1 and E2F6 purifications revealed a rich set of transcription factors that can heterodimerize with these proteins but that are not part of PRC1.6, as they did not connect to any other component thereof (Fig. 1e).
Recently, WDR5 was also reported to be part of the Non-Specific Lethal (NSL) complex and to form a trimeric complex with RBBP5 and ASH2L, which stimulates the H3K4-specific activity of the SET1 HMT family members SET1A, SET1B and MLL1-4 [48,60-62]. Interestingly, while we recalled these interactions, we additionally detected reciprocal interactions of WDR5 with all PRC1.6 subunits, thus demonstrating that WDR5 is a universal component of activating and repressing chromatin modifying complexes.
Taken together, our results identify the WD40 domain proteins DCAF7 and WDR5 as subunits of PRC1.3/PRC1.5 and PRC1.6, respectively. Importantly, recent studies suggested that the diversity of PRC1 complexes might be specified by binding preferences of PCGF proteins, which are mediated by their RING finger- and WD40-associated Ubiquitin-Like (RAWUL) C-terminal domain [63,64]. For example, the PCGF1 and PCGF2/4 RAWUL domains selectively interact with BCOR/BCORL and PHC proteins, respectively [63]. Since no interaction partners of PCGF3/5 and PCGF6 RAWUL domains have been experimentally identified to date, and since WD40 domain-containing proteins often scaffold multisubunit complexes [49], we propose that DCAF7 and WDR5 may serve as central scaffolding proteins for PRC1.3/PRC1.5 and PRC1.6.
CBX1 partitions in several distinct heterochromatin complexes including PRC1.6
In contrast to previous studies, which reported CBX3 as the only heterochromatin protein within E2F6.com, we unexpectedly detected CBX1 in all our PRC1.6-related pull-down experiments. To corroborate this finding, we performed AP-MS experiments with CBX1, using the constitutive heterochromatin protein CBX5 as control. Our results indicate that while CBX5 is disconnected from the PCGF6-E2F6 network, components therein interact with CBX1 (Fig. 1e and Supplementary Table 2). Furthermore, they validate interactions of EHMT2 with CBX1 and CBX3 and, to our surprise, separate EHMT2 and EHMT1 from PRC1.6, suggesting a separate complex containing CBX1/3, EHMT1/2, ZNF proteins, as well as the KRAB-ZNF interacting and co-repressor protein TIF1B (Supplementary Fig. 4a).
Since the PcG and heterochromatin silencing systems are functionally and molecularly related through PcG CBX2/4/6/7/8 and HP1 CBX1/3/5 proteins (reviewed in Beisel and Paro [65]), we further explored the CBX1/3/5 core of our network, seeking potential connections between these two systems. This survey led to a refined topology of CBX1/3/5-containing complexes and identified new interacting partners (Supplementary Fig. 4b-e). However, we did not detect additional connections to PcG proteins, suggesting limited direct cross-talk between protein components of the two silencing systems.
The PRC2 core partitions into two different classes of complexes
While the functional core complex of PRC2 is composed of SUZ12, EED, RBBP4/7 and either EZH1 or EZH2, additional accessory proteins have been identified which may regulate the H3K27 HMT activity of the complex and its recruitment to chromatin [66-68]. However, how these proteins are organized within PRC2 or whether they assemble into independent PRC2 subcomplexes remains largely unresolved. To elucidate the topological organization of PRC2 complexes we performed AP-MS experiments using 14 reported PRC2-associated proteins (Supplementary Fig. 1a).
Hierarchical clustering analysis assigned all PRC2 baits to a single cluster exhibiting high intra-cluster correlations (Fig. 1b and 2a) and forming a high-density interaction network (Fig. 2b). However, when reciprocal interactions were taken into account, our data revealed two fundamental alternative assemblies linked to the PRC2 core, the first defined by AEBP2 and JARD2 and the second by the mutually exclusive binding of one of the three Polycomb-like homologs (PCLs) PHF1, PHF19 and MTF2 (Fig. 2b).
Taken together, our results identify two structurally distinct classes of PRC2 complexes. We therefore propose a novel nomenclature for PRC2, in which we refer to the two PRC2 wings as PRC2.1 (mutually exclusive interaction of PHF1, MTF2 or PHF19) and PRC2.2 (simultaneous interaction of AEBP2 and JARD2). AEBP2 and JARD2 can directly bind to DNA and have been implicated in the recruitment of PRC2 and modulation of its enzymatic activity [37-39,67]. Interestingly, depletion of JARD2 has only a mild effect on global H3K27 methylation levels, suggesting that PRC2.1 might be primarily responsible for maintaining H3K27me3 patterns genome-wide.
C10ORF12 and C17ORF96 are mutually exclusive subunits of the Polycomb-like class of PRC2 complexes
Our purifications of the PRC2 core members and PCLs identified two largely uncharacterized proteins, C10ORF12/LCOR and C17ORF96, as PRC2 interactors (Fig. 2b). Both proteins have recently been shown to reciprocally interact and co-localize on chromatin with EZH2 [66], but their placement within the PRC2 topology and their functional role remained unknown.
Purifications of C17ORF96 confirmed all interactions with PCLs (Fig. 2b) and computational sequence analysis revealed that C17ORF96 is present in all vertebrate genomes. Interestingly, BLAST identified a single protein related to C17ORF96 in the human genome, the SKI/DAC domain containing protein 1 (SKDA1) (Supplementary Fig. 5a). SKDA1 belongs to the DACH family, which is defined by the presence of a SKI/SNO/DAC domain of about 100 amino acids, and is involved in various aspects of cell proliferation and differentiation [69,70]. However, C17ORF96 lacks the SKI/SNO/DAC domain and its homology to SKDA1 is restricted to the C-terminus (53% sequence identity within the last 60 amino acids) (Supplementary Fig. 5a-b), suggesting that this region encodes a hitherto uncharacterized protein domain. Interestingly, SKDA1 also interacts with EZH1 and SUZ12 (Fig. 2b), suggesting that this putative C-terminal domain mediates the interaction of C17ORF96 and SKDA1 with the PRC2 core.
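A figure such as "53% sequence identity within the last 60 amino acids" can be computed from a pairwise alignment roughly as follows. This is a minimal Python sketch that assumes two already aligned, equal-length sequences and counts gap columns as mismatches; the function names and the short toy sequences are hypothetical and are not the C17ORF96 or SKDA1 sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns with identical residues.

    Both inputs must have the same length; gap columns ('-') count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)

def c_terminal_identity(aligned_a, aligned_b, window=60):
    """Identity restricted to the last `window` alignment columns.

    For a gap-free alignment this equals the last `window` amino acids.
    """
    return percent_identity(aligned_a[-window:], aligned_b[-window:])

# Hypothetical 10-residue toy sequences.
seq_a = "MKTAYIAKQR"
seq_b = "MKSAYLAKQK"
full_identity = percent_identity(seq_a, seq_b)
tail_identity = c_terminal_identity(seq_a, seq_b, window=5)
print(full_identity, tail_identity)

# Back-of-envelope: 53% of a 60-residue window is about 32 identical positions.
approx_identical = round(0.53 * 60)
```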
Initial analysis of C10ORF12, the second uncharacterized protein highly connected to the PRC2 core, identified peptides that ambiguously mapped to two distinct UniProt proteins, LCOR and C10ORF12 (Supplementary Fig. 5c-e). These two proteins are encoded by the same genomic locus. Indeed, in contrast to the UniProt database, GenBank contains the LCOR-CRA_b (ligand-dependent co-repressor, isoform CRA_b, EAW49962.1) entry, where the N-terminal 111 amino acids of LCOR are fused to C10ORF12 and the two regions are separated by a 200 amino acid spacer (Fig. 2c). LCOR is a ligand-dependent co-repressor interacting via its N-terminal domain with nuclear hormone receptors in a complex including CTBP and a number of histone deacetylases [71,72]. While our AP-MS analysis yielded peptides of the LCOR N-terminus, C10ORF12 and the LCOR-CRA_b specific spacer (Supplementary Fig. 5c), peptides of the LCOR C-terminus were missing (Supplementary Fig. 5d), indicating that PRC2 interacts with LCOR-CRA_b and potentially with the shorter isoform C10ORF12. To test this possibility, we performed additional AP-MS experiments using LCOR, C10ORF12 and LCOR-CRA_b as baits. LCOR purified with its known interaction partners CTBP1 and CTBP2, while PRC2 components were absent in LCOR purifications (Fig. 2d and Supplementary Fig. 5e). In contrast, both LCOR-CRA_b and C10ORF12 reciprocally interact with all subunits of the PCL wing of PRC2 (Fig. 2b, Supplementary Fig. 5d-e).
To investigate the functional relevance of this finding, we employed a heterologous reporter system based on a stably integrated, constitutively active luciferase reporter gene responsive to upstream, promoter-proximal GAL4 DNA binding sites (Fig. 2e) [73]. We engineered cell lines containing tetracycline-inducible GAL4-LCOR and GAL4-C10ORF12 expression constructs, respectively. Upon induction, both proteins accumulated in the nucleus and were recruited to the GAL4 motifs, resulting in strong repression of luciferase activity (Fig. 2f-h and Supplementary Fig. 5f). To assess whether the repressive activity of C10ORF12 is mediated by recruitment of PRC2 to the target promoter, we performed chromatin immunoprecipitation (ChIP) with an H3K27me3-specific antibody and analyzed the enrichment of luciferase promoter fragments via quantitative PCR. Upon tetracycline induction, we found that the transcription start site (TSS) of the luciferase gene was significantly trimethylated at H3K27 in the GAL4-C10ORF12 expressing cell line (Fig. 2i). In contrast, although GAL4-LCOR was expressed at higher levels than GAL4-C10ORF12 (Fig. 2f) and exhibited a 10-20 fold increase in its binding to the reporter (Fig. 2h), no significant H3K27me3 enrichment was observed upon expression of this protein.
PCL proteins target PRC2 and positively regulate its enzymatic activity via their ability to bind methylated H3K36 [36,74,75]. However, further experimental investigation will be required to elucidate the exact mechanism by which C17ORF96 and LCOR-CRA_b/C10ORF12 influence PRC2.1. An interesting possibility is that LCOR-CRA_b recruits PRC2.1 to nuclear hormone receptor binding sites upon ligand binding. This interaction, restricted to C10ORF12, leaves the N-terminus of LCOR free for ligand responsive interaction with nuclear hormone receptors.
ASXL1 and ASXL2 define optional PR-DUB complexes containing OGT1 and FOXK transcription factors
The Drosophila PcG complex PR-DUB was identified as a heterodimer consisting of the deubiquitinase Calypso and the Asx protein [15]. However, the composition of its human counterpart remains elusive. Thus, we set out to systematically characterize this complex by performing purifications of BAP1, ASXL1 and ASXL2, the human homologs of the Drosophila PR-DUB components. Our AP-MS analysis revealed that BAP1 reciprocally interacts with both ASXL1 and ASXL2 (Fig. 3b). Interestingly, the two ASXL proteins do not interact with each other (Fig. 3b), suggesting the existence of two mutually exclusive PR-DUB complexes, which we called PR-DUB.1 and PR-DUB.2 according to whether the ASXL partner of BAP1 is ASXL1 or ASXL2, respectively.
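The reasoning above (each ASXL protein reciprocally binds BAP1, yet neither recovers the other) can be expressed as a small check over bait-to-prey sets. The Python sketch below is illustrative only: the helper names are hypothetical and the `hcips` dictionary is reduced to the three core relations stated in the text, with accessory proteins omitted.

```python
from itertools import combinations

def reciprocal(hcips, a, b):
    """True if a recovers b as prey and b recovers a as prey."""
    return b in hcips.get(a, set()) and a in hcips.get(b, set())

def interacts(hcips, a, b):
    """True if an interaction is detected in at least one direction."""
    return b in hcips.get(a, set()) or a in hcips.get(b, set())

def mutually_exclusive_partners(hcips, core, candidates):
    """Pairs of candidates that each reciprocally bind `core`
    but are never detected with each other."""
    bound = [c for c in candidates if reciprocal(hcips, core, c)]
    return [(a, b) for a, b in combinations(bound, 2)
            if not interacts(hcips, a, b)]

# Bait -> preys, restricted to the core relations described in the text.
hcips = {
    "BAP1": {"ASXL1", "ASXL2"},
    "ASXL1": {"BAP1"},
    "ASXL2": {"BAP1"},
}
exclusive = mutually_exclusive_partners(hcips, "BAP1", ["ASXL1", "ASXL2"])
print(exclusive)
```

Absence of co-purification is consistent with mutual exclusivity but does not prove it, which is why the text speaks of the data "suggesting" two distinct complexes.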
Both PR-DUB core components share a similar set of accessory proteins encompassing the transcription factors FOXK1 and FOXK2, the chromatin associated proteins MBD5 and MBD6, the transcriptional co-regulator HCFC1 and most notably OGT1 (Fig. 3b). A recent attempt to identify BAP1 interaction partners led to the identification of Asxl1, Asxl2, Ogt, Foxk1, Kdm1b and Hcf1 in mouse spleen tissue [76]. Our data provide support to these results and indicate a general, cell type independent assembly of mammalian PR-DUB complexes. Furthermore, our data clearly implicate OGT1 as a member of mammalian PR-DUB complexes, an interaction which was not identified in the Drosophila PR-DUB complex purification [15] although the Drosophila homolog Ogt was previously annotated as a bona fide PcG protein [44].
OGT1 is the only O-linked N-acetylglucosamine (O-GlcNAc) transferase in mammals. The enzyme catalyzes the addition of a single GlcNAc molecule to serine and threonine residues of many target proteins [77]. OGT1 enzymatic activity is required for mouse development and is essential for embryonic stem cell (ESC) viability [78]. In addition, the protein was found to interact with BAP1 and to localize to chromatin via its interaction with the 5-methylcytosine oxidase TET1 [76,78]. To further refine the connectivity of OGT1 within the PR-DUB network, we performed AP-MS experiments using OGT1 as bait.
This analysis validated the interaction between BAP1 and OGT1 and the interactions
of OGT1 with TET1 and NCOAT (Fig. 3b), the O-GlcNAcase counteracting OGT1
activity [78,79]. Moreover, our data identified a second set of OGT1-containing complexes
involved in transcriptional regulation that did not co-purify with PR-DUB core subunits
(Fig. 3b). These include the ZNFs ZEP1 and ZEP2, and the arginine-specific HMT
CARM1. Furthermore, we identified OGT1 as a subunit of WDR5-containing complexes.
Indeed, OGT1 exhibits interactions with the NSL complex and with the SET1 HMT
family activating complex WDR5/RBBP5/ASH2L, which is likely to mediate the
interaction of OGT1 with MLL1 and SET1A (Fig. 3b). Although no interaction of OGT1
with FOXK1/2 and MBD5/6 was detected, these proteins co-cluster with PR-DUB core
components and OGT1 is highly connected to the PR-DUB core (Fig. 3a-b).
These results suggest that OGT1/HCFC1 and FOXK/MBD proteins may form
optional PR-DUB.1/PR-DUB.2 complexes. Alternatively, OGT1 interactions with FOXK
and MBD proteins could be transient and hence difficult to pinpoint by OGT1 affinity
purification.
Genomics profiling of the FOXK1-containing PR-DUB.1
A functional interaction of OGT1 with FOXK transcription factors within the same
PR-DUB complex would require their colocalization at genomic target sites. To test
this hypothesis, we examined the genome-wide distribution of O-GlcNAc (a proxy for
catalytically active OGT1), ASXL1 and FOXK1 by performing ChIP-seq in HEK293
cells (Supplementary Fig. 6a).
By pairwise analysis of overlapping peak regions we found 41% and 55% of FOXK1
peaks co-localizing with O-GlcNAc and ASXL1, respectively, while 69% of O-GlcNAc
peaks were co-occupied by ASXL1 (Fig. 4a). In total, we identified 2703 genomic loci
bound by all three features (Fig. 4a). Functional annotation of these sites to genomic
compartments revealed a predominant binding of PR-DUB.1 to gene promoters (Fig.
4a), with read densities sharply peaking at TSSs of RefSeq annotated genes (Fig. 4b).
Moreover, we found that feature enrichments within ±1 kb of TSSs are highly correlated
with each other (correlation coefficients >0.8), further indicating that ASXL1, FOXK1 and
OGT1 are likely subunits of the same protein complex (Fig. 4c). To identify classes of genes
bound by PR-DUB.1, we subjected the set of TSSs bound by each complex member to MSigDB
pathway enrichment analysis. This analysis identified highly overlapping sets of enriched
pathways for each protein (Supplementary Fig. 6b). Notably, PR-DUB.1 targets are
predominantly enriched for genes involved in fundamental cellular processes like gene
expression, cell cycle, mitosis and protein metabolism (Fig. 4d).
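The pairwise overlap fractions reported above can be illustrated with a minimal Python sketch. It assumes that peaks are given as (chromosome, start, end) tuples with half-open coordinates and that a query peak counts as co-localizing when it shares at least one base with any reference peak. The function names and this overlap criterion are illustrative assumptions, not the procedure described in the Supplementary Methods.

```python
from bisect import bisect_left
from itertools import accumulate


def build_index(peaks):
    """Group reference peaks by chromosome, sorted by start coordinate.

    For each chromosome we also keep the running maximum of peak ends,
    so that one binary search answers "does anything overlap?".
    """
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)
    return index


def overlaps(index, peak):
    """True if `peak` shares at least one base with an indexed peak."""
    chrom, start, end = peak
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    # Number of reference peaks that start before the query ends.
    i = bisect_left(starts, end)
    # One of them overlaps if the furthest-reaching end passes the query start.
    return i > 0 and max_ends[i - 1] > start


def overlap_fraction(query, reference):
    """Fraction of query peaks overlapping at least one reference peak."""
    if not query:
        return 0.0
    index = build_index(reference)
    return sum(overlaps(index, p) for p in query) / len(query)


# Toy peak sets (hypothetical coordinates, for illustration only).
foxk1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr2", 900, 1000)]
glcnac = [("chr1", 150, 250), ("chr2", 100, 120), ("chr3", 10, 20)]
print(overlap_fraction(foxk1, glcnac))
print(overlap_fraction(glcnac, foxk1))
```

Note that the fraction is not symmetric: the share of FOXK1 peaks overlapping O-GlcNAc peaks generally differs from the share of O-GlcNAc peaks overlapping FOXK1 peaks, which is why the direction of each comparison is stated in the text.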
PRC1 complexes and PR-DUB.1 regulate different target genes
Mutations in Drosophila sxc (the gene encoding Ogt), calypso and Asx genes lead to
de-repression of HOX genes and previous studies reported a strong colocalization of
PR-DUB and O-GlcNAc with major PRC1 bound sites at inactive genes in Drosophila [15,44].
We sought to investigate this relation in the human genome by comparing our PR-DUB
profiles with publicly available ChIP-seq data of RING2 and RYBP [23], as well as TIF1B [80].
Our analysis therefore focused on six representatives of the three major modules
within our PcG interaction network at the chromatin level: RING2 and RYBP, the
central core of the PRC1 module (Fig. 1c); TIF1B, the common component of
ZNF-containing CBX1/3/5 complexes (Supplementary Fig. 4a), and PR-DUB.1. Besides
the expected high correlation between RING2 and RYBP (ρ=0.78, Supplementary Fig.
6c), analysis of pairwise correlations of feature enrichments at promoters revealed a
clear segregation between PRC1 and TIF1B on the one hand, and PR-DUB.1 on the
other hand (Fig. 4e-f and Supplementary Fig. 6c). Similarly, when comparing the
genome-wide distribution of PR-DUB.1 (2703 ASXL1+GlcNAc+FOXK1 co-occupied
regions) with "PRC1" (6816 RING2+RYBP peaks) and TIF1B (10297 peaks), we
observed only a partial co-localization of these three complexes at target sites, with 24%
and 31% of PR-DUB.1 binding sites co-bound by PRC1 and TIF1B, respectively, and
only 336 regions occupied by all three complexes (Fig. 4f).
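As an illustration of how pairwise correlations of promoter enrichments can be computed, the following sketch implements a rank-based (Spearman) coefficient in plain Python. The choice of a rank correlation for this comparison, the input layout (one enrichment value per promoter and feature) and all function names are assumptions made for the example; the analysis actually used is described in the Supplementary Methods.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical enrichment values at five promoters for two features.
feature_a = [0.2, 1.5, 3.1, 0.9, 2.4]
feature_b = [0.1, 1.1, 2.8, 1.0, 2.9]
print(spearman(feature_a, feature_b))
```

Applying such a function to every pair of features yields a correlation matrix of the kind that can then be inspected for groups of features that segregate from one another.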
In summary, our analysis uncovered the basic topology of the human PR-DUB network
at both the proteomic and the genomic level. Interestingly, and in contrast to Drosophila, the
human PR-DUB and PRC1 complexes bind largely distinct sets of target genes, strongly
suggesting they are involved in different cellular processes in mammals. In addition,
our AP-MS experiments identified the transcription factors FOXK1 and FOXK2 as
components of PR-DUB, hence highlighting a potential recruitment mechanism of
PR-DUB complexes. We anticipate that future experiments based on our data will shed
light on the functionality of PR-DUB complexes in gene regulation and their relation to
PRC1 and PRC2.
Conclusions
Although considerable progress has been made in determining the composition of
mammalian PcG protein complexes, recent findings are primarily based on studies of isolated
protein components in different cellular contexts with heterogeneous biochemical
workflows, thus hampering a system-level understanding of gene silencing. In this study, in
contrast, we used a systematic proteomic approach to comprehensively map the PcG
protein interactome in a single human cell line. Since the abundance of PcG proteins
can vary between cell types and surely influences the assembly of alternative protein
complexes, we chose HEK293 cells for our study as all PcG proteins are expressed in this
cell type. The result is a high-density interaction network, which enabled us to dissect
individual PcG complexes in unprecedented detail. By allocating newly identified
interaction partners to all PcG complex families and by identifying candidate subunits
responsible for complex targeting to chromatin, we obtained new insights into molecular
function and recruitment of the PcG silencing system. In addition to the fine mapping
of the cardinal PcG complexes PRC1 and PRC2, our data reveal human PR-DUB as
a multifaceted assembly comprising OGT1 along with several transcription and chromatin
binding factors. For the first time, our study demonstrates the significant diversity that
exists among individual PcG complexes in a single cell line. In addition, it provides a solid
framework for future systematic experiments aiming at disentangling the biochemistry
of PcG protein-mediated gene regulation in mammalian cells.
Methods
Expression constructs and generation of stable cell lines
To generate expression vectors for tetracycline-induced expression of N-terminally
SH-tagged bait proteins, human ORFs within pDONR223 vectors were picked from a
Gateway-compatible human orfeome collection (horfeome v5.1, Open Biosystems) for
LR recombination with the customized destination vector pcDNA5/FRT/TO/SH/GW,
which was obtained through ligation of the SH-tag coding sequence and the Gateway
recombination cassette into the polylinker of pcDNA5/FRT/TO (Invitrogen). Genes not
in the human orfeome collection were amplified by PCR from human cDNA prepared from
HEK293 cells and cloned into entry vectors by TOPO (pENTR/D-TOPO) reaction.
Stable Flp-In HEK293 T-REx cell lines were generated as described in Supplementary
Methods.
Protein purification
Stable Flp-In HEK293 T-REx cell lines were grown in five 14.5 cm Greiner dishes
to 80% confluency and bait protein expression was induced by the addition of 1 µg/ml of
tetracycline to the medium 16-24 h prior to harvest in PBS containing 1 mM EDTA.
The suspended cells were pelleted and drained from the supernatant for subsequent
shock-freezing in liquid nitrogen and long term storage at -80 °C. The frozen cell pellets
were resuspended in 5 ml TNN lysis buffer (100 mM Tris pH 8.0, 5 mM EDTA, 250 mM
NaCl, 50 mM NaF, 1% Igepal CA-630 (Nonidet P-40 Substitute), 1.5 mM Na3VO4, 1
mM PMSF, 1 mM DTT and 1x Protease Inhibitor mix (Roche)) and rested on ice for 10
min. Insoluble material was removed by centrifugation. Cleared lysates were loaded
on a pre-equilibrated spin column (Biorad) containing 200 µl Strep-Tactin sepharose
(IBA Biotagnology). The sepharose was washed four times with 1 ml TNN lysis buffer
(Igepal CA-630 and DTT concentrations adjusted to 0.5% and 0.5 mM, respectively).
Bound proteins were eluted with 1 ml 2 mM Biotin in TNN lysis buffer (Igepal CA-630
and DTT concentrations adjusted to 0.5% and 0.5 mM, respectively), incubated for
2 h with 100 µl HA-Agarose (Sigma), washed four times with TNN lysis buffer (Igepal
CA-630 concentration adjusted to 0.5%, w/o DTT and w/o protease inhibitors) and
two additional times in TNN buffer (100 mM Tris pH 8.0, 150 mM NaCl, 50 mM NaF).
The bound proteins were released by acidic elution with 500 µl 0.2 M Glycine pH 2.5
and the eluate was pH neutralized with NH4HCO3. Cysteine bonds were reduced with
5 mM TCEP for 30 min at 37 °C and alkylated in 10 mM iodoacetamide for 20 min at
room temperature in the dark. Samples were digested with 1 µg trypsin (Promega)
overnight at 37 °C. Bait proteins with low protein yield were processed by single step
purification, omitting the HA step. The frozen cell pellets were resuspended in 5 ml of
TNN lysis buffer containing 10 µg/ml Avidin. The eluates were TCA precipitated to
remove biotin and resolubilized in 50 µl 10% ACN, 50 mM NH4HCO3 pH 8.8. After
dilution with NH4HCO3 to 5% ACN the samples were reduced, alkylated and digested
as in the double step protocol. The digested peptides were purified with C18 microspin
columns (The Nest Group Inc.) according to the protocol of the manufacturer and dissolved
in 0.1% formic acid, 1% acetonitrile for mass spectrometry analysis.
Mass spectrometry
LC-MS/MS analysis was performed on an LTQ Orbitrap XL mass spectrometer (Thermo
Fisher Scientific). Peptide separation was carried out by reverse-phase chromatography on
a Proxeon EASY-nLC II liquid chromatography system (Thermo Fisher Scientific). The reverse
phase column (75 µm x 10 cm) was packed with Magic C18 AQ (3 µm) resin (WICOM
International). A linear gradient from 5% to 35% mobile phase (98% acetonitrile, 0.1%
formic acid) was run for 60 min over a stationary phase (0.1% formic acid, 2% acetonitrile)
at a flow rate of 300 nl/min. Data acquisition was set to obtain one high resolution MS
scan in the Orbitrap (60,000 @ 400 m/z) followed by six collision-induced dissociation
(CID) MS/MS fragment ion spectra in the linear trap quadrupole (LTQ). Orbitrap
charge state screening was enabled and ions with unassigned or single charge states were
rejected. The dynamic exclusion window was set to 15 s and limited to 300 entries. The
minimal precursor ion count to trigger CID and MS/MS scan was set to 150. The ion
accumulation time was set to 500 ms (MS) and 250 ms (MS/MS) using a target setting of
10^6 (MS) and 10^4 (MS/MS) ions. After every biological replicate measurement, a peptide
reference sample containing 200 fmol of human [Glu1]-Fibrinopeptide B (Sigma-Aldrich)
was analyzed to monitor the overall LC-MS/MS system performance.
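The acquisition settings above amount to a top-six data-dependent selection rule, which can be sketched as follows. This is only a schematic of the selection logic (charge state filter, minimal ion count, dynamic exclusion with a 15 s window and 300 entries), not the instrument control software. The m/z matching tolerance MZ_TOL, the representation of a survey scan as (m/z, ion count, charge) tuples and all function and variable names are assumptions introduced for the example.

```python
from collections import OrderedDict

TOP_N = 6                  # MS/MS scans per survey scan
MIN_ION_COUNT = 150        # minimal precursor ion count to trigger MS/MS
EXCLUSION_SECONDS = 15.0   # dynamic exclusion window
EXCLUSION_CAPACITY = 300   # maximum number of exclusion list entries
MZ_TOL = 0.01              # assumed m/z bin width for exclusion matching


def select_precursors(peaks, now, exclusion):
    """Pick up to TOP_N precursors from one survey scan.

    peaks: list of (mz, ion_count, charge); charge is None if unassigned.
    now: acquisition time of the survey scan in seconds.
    exclusion: OrderedDict mapping an m/z bin to its expiry time; it is
        updated in place and carried over between survey scans.
    """
    # Drop exclusion entries whose window has elapsed.
    for key in [k for k, expiry in exclusion.items() if expiry <= now]:
        del exclusion[key]

    chosen = []
    # Consider the most intense peaks first.
    for mz, ion_count, charge in sorted(peaks, key=lambda p: -p[1]):
        if len(chosen) == TOP_N:
            break
        if charge is None or charge < 2:
            continue  # reject unassigned and singly charged ions
        if ion_count < MIN_ION_COUNT:
            continue
        key = round(mz / MZ_TOL)
        if key in exclusion:
            continue  # recently fragmented, still excluded
        chosen.append(mz)
        exclusion[key] = now + EXCLUSION_SECONDS
        # Keep the list within capacity by evicting the oldest entries.
        while len(exclusion) > EXCLUSION_CAPACITY:
            exclusion.popitem(last=False)
    return chosen


# Hypothetical survey scan, for illustration only.
scan = [(400.0, 1000, 2), (500.0, 900, 1), (600.0, 800, None),
        (700.0, 100, 2), (800.0, 700, 3)]
exclusion_list = OrderedDict()
print(select_precursors(scan, 0.0, exclusion_list))
print(select_precursors(scan, 10.0, exclusion_list))
```

In this toy scan the singly charged, unassigned and low-count peaks are skipped, and a second survey scan within the exclusion window selects nothing because both eligible precursors are still excluded.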
ChIP and preparation of ChIP-seq libraries
Chromatin fixation and immunoprecipitation were performed essentially as described [81].
Cells (3-4x10^8) were fixed in 200 ml of medium with 1% formaldehyde for 10 min at
room temperature. Cross-linked cells were sonicated to produce chromatin fragments
of an average size of 150-400 bp. Soluble chromatin was separated from insoluble
material by centrifugation. The supernatant containing chromatin of 1-2x10^7 cells
was used for immunoprecipitation. Sequencing libraries were prepared with the NEB
Genomic DNA Sample Preparation Kit according to NEB's instructions. After adapter
ligation, library fragments of 250-350 bp were isolated from an agarose gel. The DNA
was PCR amplified with Illumina primers with 18 cycles, purified, and loaded on an
Illumina flow cell for cluster generation. Libraries were sequenced on the Genome
Analyzer IIx (TruSeq cBot-GA v2 and TruSeq v5 SBS kit) and HiSeq 2000 (HiSeq
Flow Cell v3 and TruSeq SBS Kit v3) following the manufacturer's protocols. For
ChIP-qPCR, nuclei were prepared essentially as described in Functional Analysis of
DNA and Chromatin [82]. Immunoprecipitations were performed using Anti-GAL4 (sc-510,
Santa Cruz Biotechnology), Anti-IgG (10500C, Invitrogen) and Anti-H3K27me3 kindly
provided by Thomas Jenuwein. Anti-FOXK1 (ab18196) was purchased from Abcam,
Anti-ASXL1 (sc85283) from Santa Cruz Biotechnology and Anti-GlcNAc (HGAC85)
from Novus Biologicals. Primer sets used for qPCR are listed in the Supplemental
Experimental Procedures.
Data analysis
Descriptions of data processing and analysis methods are available in the Supplementary
Methods.
Accession Numbers
Mass spectrometry data have been submitted to the PeptideAtlas database
(http://www.peptideatlas.org/) and assigned the identifier PASS00347. Protein interactions
have been submitted to the IMEx (http://www.imexconsortium.org) consortium through
IntAct [83] and assigned the identifier IM-21659. Sequencing data have been submitted
to the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under
accession no. GSE51673.
Acknowledgements
We thank I. Nissen and M. Kohler for technical support on ChIP-seq. Illumina sequencing
was done in the Genomics Facility Basel at D-BSSE, ETH Zürich. Research of SH and
MG is supported by the European Union 7th Framework project SYBILLA (Systems
Biology of T-cell activation) and the Innovative Medicines Initiative project ULTRA-DD.
Research of RA is funded by advanced ERC grant Proteomics v3.0 (233226) and by
SystemsX.ch, the Swiss initiative for systems biology. Research of RP is funded by the
Swiss National Science Foundation and the ETH Zürich.
References
1. Sarkies, P. & Sale, J.E. Cellular epigenetic stability and cancer. Trends in genetics 28, 118-127 (2012).
2. Ringrose, L. Polycomb comes of age: genome-wide profiling of target sites. Current opinion in cell biology 19, 290-297 (2007).
3. Jaenisch, R. & Young, R. Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming. Cell 132, 567-582 (2008).
4. Maurange, C., Lee, N. & Paro, R. Signaling meets chromatin during tissue regeneration in Drosophila. Current opinion in genetics & development 16, 485-489 (2006).
5. Sparmann, A. & van Lohuizen, M. Polycomb silencers control cell fate, development and cancer. Nature reviews. Cancer 6, 846-856 (2006).
6. Cao, R. et al. Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298, 1039-1043 (2002).
7. Muller, J. et al. Histone methyltransferase activity of a Drosophila Polycomb group repressor complex. Cell 111, 197-208 (2002).
8. Fischle, W. et al. Molecular basis for the discrimination of repressive methyl-lysine marks in histone H3 by Polycomb and HP1 chromodomains. Genes & development 17, 1870-1881 (2003).
9. Min, J., Zhang, Y. & Xu, R.M. Structural basis for specific binding of Polycomb chromodomain to histone H3 methylated at Lys 27. Genes & development 17, 1823-1828 (2003).
10. de Napoles, M. et al. Polycomb group proteins Ring1A/B link ubiquitylation of histone H2A to heritable gene silencing and X inactivation. Developmental cell 7, 663-676 (2004).
11. Stock, J.K. et al. Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nature cell biology 9, 1428-1435 (2007).
12. Wang, H. et al. Role of histone H2A ubiquitination in Polycomb silencing. Nature 431, 873-878 (2004).
13. Klymenko, T. et al. A Polycomb group protein complex with sequence-specific DNA-binding and selective methyl-lysine-binding activities. Genes & development 20, 1110-1122 (2006).
14. Lagarou, A. et al. dKDM2 couples histone H2A ubiquitylation to histone H3 demethylation during Polycomb group silencing. Genes & development 22, 2799-2810 (2008).
15. Scheuermann, J.C. et al. Histone H2A deubiquitinase activity of the Polycomb repressive complex PR-DUB. Nature 465, 243-247 (2010).
16. Furuyama, T., Banerjee, R., Breen, T.R. & Harte, P.J. SIR2 is required for polycomb silencing and is associated with an E(Z) histone methyltransferase complex. Current biology 14, 1812-1821 (2004).
17. Saurin, A.J., Shao, Z., Erdjument-Bromage, H., Tempst, P. & Kingston, R.E. A Drosophila Polycomb group complex includes Zeste and dTAFII proteins. Nature 412, 655-660 (2001).
18. Strubbe, G. et al. Polycomb purification by in vivo biotinylation tagging reveals cohesin and Trithorax group proteins as interaction partners. Proceedings of the National Academy of Sciences of the United States of America 108, 5572-5577 (2011).
19. Beisel, C. et al. Comparing active and repressed expression states of genes controlled by the Polycomb/Trithorax group proteins. Proceedings of the National Academy of Sciences of the United States of America 104, 16615-16620 (2007).
20. Enderle, D. et al. Polycomb preferentially targets stalled promoters of coding and noncoding transcripts. Genome research 21, 216-226 (2011).
21. Oktaba, K. et al. Dynamic regulation by polycomb group protein complexes controls pattern formation and the cell cycle in Drosophila. Developmental cell 15, 877-889 (2008).
22. Schuettengruber, B. et al. Functional anatomy of polycomb and trithorax chromatin landscapes in Drosophila embryos. PLoS biology 7, e13 (2009).
23. Gao, Z. et al. PCGF homologs, CBX proteins, and RYBP define functionally distinct PRC1 family complexes. Molecular cell 45, 344-356 (2012).
24. Margueron, R. et al. Ezh1 and Ezh2 maintain repressive chromatin through different mechanisms. Molecular cell 32, 503-518 (2008).
25. Shen, X. et al. EZH1 mediates methylation on histone H3 lysine 27 and complements EZH2 in maintaining stem cell identity and executing pluripotency. Molecular cell 32, 491-502 (2008).
26. Vandamme, J., Volkel, P., Rosnoblet, C., Le Faou, P. & Angrand, P.O. Interaction proteomics analysis of polycomb proteins defines distinct PRC1 complexes in mammalian cells. Molecular & cellular proteomics 10, M110.002642 (2011).
27. Sanchez, C. et al. Proteomics analysis of Ring1B/Rnf2 interactors identifies a novel complex with the Fbxl10/Jhdm1B histone demethylase and the Bcl6 interacting corepressor. Molecular & cellular proteomics 6, 820-834 (2007).
28. Bernstein, E. et al. Mouse polycomb proteins bind differentially to methylated histone H3 and RNA and are enriched in facultative heterochromatin. Molecular and cellular biology 26, 2560-2569 (2006).
29. Farcas, A.M. et al. KDM2B links the Polycomb Repressive Complex 1 (PRC1) to recognition of CpG islands. eLife 1, e00205 (2012).
30. He, J. et al. Kdm2b maintains murine embryonic stem cell status by recruiting PRC1 complex to CpG islands of developmental genes. Nature cell biology 15, 373-384 (2013).
31. Wu, X., Johansen, J.V. & Helin, K. Fbxl10/Kdm2b recruits polycomb repressive complex 1 to CpG islands and regulates H2A ubiquitylation. Molecular cell 49, 1134-1146 (2013).
32. Morey, L., Aloia, L., Cozzuto, L., Benitah, S.A. & Di Croce, L. RYBP and Cbx7 define specific biological functions of polycomb complexes in mouse embryonic stem cells. Cell reports 3, 60-69 (2013).
33. Morey, L. et al. Nonoverlapping functions of the Polycomb group Cbx family of proteins in embryonic stem cells. Cell stem cell 10, 47-62 (2012).
34. Tavares, L. et al. RYBP-PRC1 complexes mediate H2A ubiquitylation at polycomb target sites independently of PRC2 and H3K27me3. Cell 148, 664-678 (2012).
35. Hunkapiller, J. et al. Polycomb-like 3 promotes polycomb repressive complex 2 binding to CpG islands and embryonic stem cell self-renewal. PLoS genetics 8, e1002576 (2012).
36. Sarma, K., Margueron, R., Ivanov, A., Pirrotta, V. & Reinberg, D. Ezh2 requires PHF1 to efficiently catalyze H3 lysine 27 trimethylation in vivo. Molecular and cellular biology 28, 2718-2731 (2008).
37. Kim, H., Kang, K. & Kim, J. AEBP2 as a potential targeting protein for Polycomb Repression Complex PRC2. Nucleic acids research 37, 2940-2950 (2009).
38. Kim, T.G., Kraus, J.C., Chen, J. & Lee, Y. JUMONJI, a critical factor for cardiac development, functions as a transcriptional repressor. The Journal of biological chemistry 278, 42247-42255 (2003).
39. Pasini, D. et al. JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature 464, 306-310 (2010).
40. Peng, J.C. et al. Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell 139, 1290-1302 (2009).
41. Shen, X. et al. Jumonji modulates polycomb activity and self-renewal versus differentiation of stem cells. Cell 139, 1303-1314 (2009).
42. Glatter, T., Wepf, A., Aebersold, R. & Gstaiger, M. An integrated workflow for charting the human interaction proteome: insights into the PP2A system. Molecular systems biology 5, 237 (2009).
43. Varjosalo, M. et al. Interlaboratory reproducibility of large-scale human protein-complex analysis by standardized AP-MS. Nature methods 10, 307-314 (2013).
44. Gambetta, M.C., Oktaba, K. & Muller, J. Essential role of the glycosyltransferase sxc/Ogt in polycomb repression. Science 325, 93-96 (2009).
45. Craig, R. & Beavis, R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466-1467 (2004).
46. Deutsch, E.W. et al. A guided tour of the Trans-Proteomic Pipeline. Proteomics 10, 1150-1159 (2010).
47. Behrends, C., Sowa, M.E., Gygi, S.P. & Harper, J.W. Network organization of the human autophagy system. Nature 466, 68-76 (2010).
48. Cai, Y. et al. Subunit composition and substrate specificity of a MOF-containing histone acetyltransferase distinct from the male-specific lethal (MSL) complex. The Journal of biological chemistry 285, 4268-4272 (2010).
49. Migliori, V., Mapelli, M. & Guccione, E. On WD40 proteins: propelling our knowledge of transcriptional control? Epigenetics 7, 815-822 (2012).
50. Gao, Z. et al. An AUTS2-Polycomb complex activates gene expression in the CNS. Nature 516, 349-354 (2014).
51. van den Boom, V. et al. Non-canonical PRC1.1 Targets Active Genes Independent of H3K27me3 and Is Essential for Leukemogenesis. Cell reports 14, 332-346 (2016).
52. van den Boom, V. et al. Nonredundant and locus-specific gene repression functions of PRC1 paralog family members in human hematopoietic stem/progenitor cells. Blood 121, 2452-2461 (2013).
53. Miyata, Y. & Nishida, E. DYRK1A binds to an evolutionarily conserved WD40-repeat protein WDR68 and induces its nuclear translocation. Biochimica et biophysica acta 1813, 1728-1739 (2011).
54. Morita, K., Lo Celso, C., Spencer-Dene, B., Zouboulis, C.C. & Watt, F.M. HAN11 binds mDia1 and controls GLI1 transcriptional activity. Journal of dermatological science 44, 11-20 (2006).
55. Dietrich, N. et al. REST-mediated recruitment of polycomb repressor complexes in mammalian cells. PLoS genetics 8, e1002494 (2012).
56. El Messaoudi-Aubert, S. et al. Role for the MOV10 RNA helicase in polycomb-mediated repression of the INK4a tumor suppressor. Nature structural & molecular biology 17, 862-868 (2010).
57. Ogawa, H., Ishiguro, K., Gaubatz, S., Livingston, D.M. & Nakatani, Y. A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells. Science 296, 1132-1136 (2002).
58. Qin, J. et al. The polycomb group protein L3mbtl2 assembles an atypical PRC1-family complex that is essential in pluripotent stem cells and early development. Cell stem cell 11, 319-332 (2012).
59. Trojer, P. et al. L3MBTL2 protein acts in concert with PcG protein-mediated monoubiquitination of H2A to establish a repressive chromatin structure. Molecular cell 42, 438-450 (2011).
60. Dou, Y. et al. Regulation of MLL1 H3K4 methyltransferase activity by its core components. Nature structural & molecular biology 13, 713-719 (2006).
61. Wysocka, J. et al. A PHD finger of NURF couples histone H3 lysine 4 trimethylation with chromatin remodelling. Nature 442, 86-90 (2006).
62. Zhang, P., Lee, H., Brunzelle, J.S. & Couture, J.F. The plasticity of WDR5 peptide-binding cleft enables the binding of the SET1 family of histone methyltransferases. Nucleic acids research 40, 4237-4246 (2012).
63. Junco, S.E. et al. Structure of the polycomb group protein PCGF1 in complex with BCOR reveals basis for binding selectivity of PCGF homologs. Structure 21, 665-671 (2013).
64. Sanchez-Pulido, L., Devos, D., Sung, Z.R. & Calonje, M. RAWUL: a new ubiquitin-like domain in PRC1 ring finger proteins that unveils putative plant and worm PRC1 orthologs. BMC genomics 9, 308 (2008).
65. Beisel, C. & Paro, R. Silencing chromatin: comparing modes and mechanisms. Nature reviews. Genetics 12, 123-135 (2011).
66. Alekseyenko, A.A., Gorchakov, A.A., Kharchenko, P.V. & Kuroda, M.I. Reciprocal interactions of human C10orf12 and C17orf96 with PRC2 revealed by BioTAP-XL cross-linking and affinity purification. Proceedings of the National Academy of Sciences of the United States of America 111, 2488-2493 (2014).
67. Kalb, R. et al. Histone H2A monoubiquitination promotes histone H3 methylation in Polycomb repression. Nature structural & molecular biology 21, 569-571 (2014).
68. Margueron, R. & Reinberg, D. The Polycomb complex PRC2 and its mark in life. Nature 469, 343-349 (2011).
69. Caubit, X. et al. Mouse Dac, a novel nuclear factor with homology to Drosophila dachshund shows a dynamic expression in the neural crest, the eye, the neocortex, and the limb bud. Developmental dynamics 214, 66-80 (1999).
70. Wu, K. et al. Cell fate determination factor DACH1 inhibits c-Jun-induced contact-independent growth. Molecular biology of the cell 18, 755-767 (2007).
71. Fernandes, I. et al. Ligand-dependent nuclear receptor corepressor LCoR functions by histone deacetylase-dependent and -independent mechanisms. Molecular cell 11, 139-150 (2003).
72. Shi, Y. et al. Coordinated histone modifications mediated by a CtBP co-repressor complex. Nature 422, 735-738 (2003).
73. Hansen, K.H. et al. A model for transmission of the H3K27me3 epigenetic mark. Nature cell biology 10, 1291-1300 (2008).
74. Cai, L. et al. An H3K36 methylation-engaging Tudor motif of polycomb-like proteins mediates PRC2 complex targeting. Molecular cell 49, 571-582 (2013).
75. Musselman, C.A. et al. Molecular basis for H3K36me3 recognition by the Tudor domain of PHF1. Nature structural & molecular biology 19, 1266-1272 (2012).
76. Dey, A. et al. Loss of the tumor suppressor BAP1 causes myeloid transformation. Science 337, 1541-1546 (2012).
77. Hart, G.W., Slawson, C., Ramirez-Correa, G. & Lagerlof, O. Cross talk between O-GlcNAcylation and phosphorylation: roles in signaling, transcription, and chronic disease. Annual review of biochemistry 80, 825-858 (2011).
78. Vella, P. et al. Tet proteins connect the O-linked N-acetylglucosamine transferase Ogt to chromatin in embryonic stem cells. Molecular cell 49, 645-656 (2013).
79. Whisenhunt, T.R. et al. Disrupting the enzyme complex regulating O-GlcNAcylation blocks signaling and development. Glycobiology 16, 551-563 (2006).
80. Iyengar, S., Ivanov, A.V., Jin, V.X., Rauscher, F.J., 3rd & Farnham, P.J. Functional analysis of KAP1 genomic recruitment. Molecular and cellular biology 31, 1833-1847 (2011).
81. Orlando, V., Strutt, H. & Paro, R. Analysis of chromatin structure by in vivo formaldehyde cross-linking. Methods 11, 205-214 (1997).
82. Santoro, R. Analysis of chromatin composition of repetitive sequences: the ChIP-Chop assay. Methods Mol Biol 1094, 319-328 (2014).
83. Orchard, S. et al. The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic acids research 42, D358-363 (2014).
84. McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature biotechnology 28, 495-501 (2010).
Figure Legends
Figure 1. Systematic profiling of human Polycomb group (PcG) protein complexes. (a) Workflow for native protein complex purifications from Flp-In HEK293 T-REx cells. Open reading frames of 64 bait proteins were cloned into an expression vector containing a tetracycline inducible CMV promoter, Strep-HA fusion tag, and FRT sites. Proteins were affinity purified from whole cell extracts of isogenic cell lines, trypsinized and identified by tandem mass spectrometry on an LTQ Orbitrap XL. High confidence interacting proteins (HCIPs) were hierarchically clustered to infer protein complex compositions. (b) Hierarchical clustering of HCIPs. Clusters of PcG and non-PcG complexes are labeled in red and blue, respectively. The inset shows the location of PRC1.1, PRC1.3/PRC1.5 and the four core proteins RYBP/YAF2 and RING1/2. Clusters defined by single baits are indicated in green. Spearman’s rank correlation coefficient based dissimilarities are color coded as indicated (top right). (c) Protein-protein interaction network of clustered interaction data. Blue lines indicate interactions between proteins within the same cluster. Enlarged hexagon-shaped nodes correspond to the baits used in this study. (d-e) High-density interaction maps of PRC1.3/PRC1.5 (d) and PRC1.6 (e). New subunits are highlighted by dashed boxes. Hexagon-shaped nodes represent baits; squares: identified HCIPs not used as baits in this study. Black nodes: common core subunits; yellow nodes: DNA binding proteins.
Figure 2. High-resolution interaction analysis unravels two structurally distinct classes of PRC2 complexes. (a) Excerpt of Figure 1b showing the PRC2 cluster. (b) Interaction map of PRC2 components. The PRC2 core is highlighted by a dashed box. Reciprocal interactions defining the two classes of PRC2 complexes are indicated in blue (PRC2.1) and red (PRC2.2) edges. Orange edges: non-reciprocal interactions. (c) Schematic representation of alternative protein isoforms of LCOR and C10ORF12. Numbers indicate amino acid positions. (d) LCOR interaction map. Orange edges, interactions defined in this study; dashed edges, published interactions. (e) Schematic representation of the employed luciferase reporter system. Amplicons (TSS, +500) used for ChIP-qPCR analysis are indicated. (f) Anti-Gal4 Western blot showing the expression of Gal4-C10ORF12 and Gal4-LCOR upon tetracycline induction. (g) Luciferase activity of tetracycline-induced Gal4-C10ORF12 and Gal4-LCOR expressing cells, normalized to uninduced cells. Values are mean±sd, p-values are from a two-sided t-test (n=7). (h) Anti-Gal4 ChIP-qPCR analysis showing localization of C10ORF12 and LCOR to the reporter TSS. (i) Anti-H3K27me3 ChIP-qPCR analysis at the reporter TSS upon C10ORF12 and LCOR expression. Values are mean±sd, p-values are from a two-sided t-test (n=3).
Figure 3. Human PR-DUB complexes contain OGT1 and FOXK transcription factors. (a) Excerpt of Figure 1b showing the PR-DUB and 19S proteasome clusters. (b) Topology of PR-DUB complexes. Interactions of bait proteins with proteins localized in PR-DUB cluster are indicated in blue. WDR5 shares many interacting proteins with OGT1 (indicated in red), which are predominantly MLL/SET complex associated proteins, and does not interact with BAP1, ASXL1 and 2. Hexagons: bait proteins; squares, identified HCIPs not used as baits in this study. Yellow: FOXK1 and 2. Orange nodes: OGT1 interactors. Dashed line: ASXL2-MBD5 interaction, which was detected but did not pass our stringent filtering criteria.
Figure 4. PR-DUB.1 and PRC1 target largely distinct sets of genes. (a) Venn diagrams showing the genome-wide colocalization of high-confidence peaks for the PR-DUB.1 components FOXK1 (blue), ASXL1 (orange) and the O-GlcNAc modification (green). Empirical p-values from peak shuffling are indicated, along with the percentage of intersecting peaks for each feature. The pie chart illustrates the distribution of PR-DUB.1 peaks (2703, triple intersection) with respect to TSSs. (b) Average ChIP-seq signal (normalized to total library size) of FOXK1, ASXL1, O-GlcNAc within a 5 kb window centered on RefSeq TSS. (c) Pairwise correlation of PR-DUB.1 feature enrichments at TSSs. Spearman’s rank correlation coefficients are indicated. (d) Functional annotation of high-confidence PR-DUB.1 peaks localizing within 5 kb of annotated TSSs. Top 10 significantly enriched MSigDB pathways obtained with GREAT (ref. 84) are indicated. UCSC tracks of PR-DUB.1 ChIP-seq signals at representative promoters belonging to the top three hits are shown in order of significance (encoded by blue tones). (e) Heatmap of ChIP-seq signals (normalized to total library size) for the indicated features within 10 kb of PRC1 and PR-DUB.1 binding sites. (f) Venn diagrams showing the genome-wide colocalization of high-confidence PR-DUB.1 peaks (red), PRC1 (brown) and TIF1B (light blue). A representative UCSC track of ChIP-Seq signals at TSSs bound by all three features is shown.
Supplementary Figure Legends
Supplementary Figure 1. Comparison of AP-MS interaction data with public protein interaction databases and high resolution interaction map of SKP1 and WDR5. (a) Compilation of human bait proteins used for AP-MS and their relation to core subunits of Drosophila PcG complexes. (b) Number of high confidence interacting proteins (HCIPs) for each bait. Dark blue bars represent interactions annotated in the public protein interaction database IntAct, light blue bars are novel interactions found in this study (overall 75%). (c) SKP1 interactome inferred in this study (blue and orange lines) and annotated in public literature databases (red dotted lines). The thick blue lines represent the main interactions to PcG components that cluster together with SKP1 and form PRC1.1. About half of the identified interactions (n = 42) were with F-box proteins, including KDM2B, the only F-box protein associated with the PRC1.1 complex. Hexagons indicate bait, squares prey proteins. (d) WDR5-associated proteins overlapping with those reported in ref. 1. Uniprot protein names and number of proteins are indicated. (e) Interaction map of WDR5 AP-MS results. Blue lines represent interactions detected in this study to PcG associated proteins that cluster with WDR5 and form PRC1.6. Interactions with other HCIPs (orange) together with public literature interactions (dotted red lines) could be assigned to known protein complexes (MLL, TCP, NSL, TORC2 and ADA2/GCN5/ADA3 transcription activator complex). Hexagons indicate bait, squares prey proteins.
Supplementary Figure 2. High-resolution topology maps of selected clusters. (a) RBBP4/7 interaction proteome. In addition to interacting with PRC2 and HP1, RBBP4/7 proteins were found to bind members of the LINC, NURF, NURD and SIN3 complexes. (b) No human Pho-RC could be identified. TYY1 and SMBT1, the homologs of the Drosophila Pho-RC proteins Pho and dSFMBT, respectively, did not interact. However, TYY1 co-purified with the INO80 complex, which is indicated by a dashed circle. (c) Interaction proteome of the four human LMBL paralogs. Only LMBL2 exhibited interactions with a PcG protein assembly and is part of PRC1.6. In all panels blue and orange lines represent interactions found in this study; red dashed lines are annotated public literature interactions. Hexagons indicate bait, squares prey proteins.
Supplementary Figure 3. Network topology of PRC1 complexes. (a) Protein-protein interactions among the central core proteins of the PRC1 complexes. Note that we detected no interactions between protein paralogs. Edges represent detected reciprocal interactions. (b) Network topology of PRC1.2 and PRC1.4 complexes. Note that PCGF2/4 can assemble to PRC1.2/PRC1.4 or can form trimeric complexes (edges in red) with RING1/2 and RYBP/YAF2 (we detected no interactions of RYBP/YAF2 with CBX or PHC proteins). Edges connecting PRC1.2/PRC1.4 core components in blue. (c) Excerpt of Figure 2A indicating the allocation of the WD40 protein DCAF7 to PRC1.3/PRC1.5. High-density interaction maps of PRC1.3/PRC1.5 (C) and PRC1.6 (D). New subunits are highlighted by dashed boxes. Hexagon-shaped nodes represent baits; squares: identified HCIPs not used as baits in this study. Black nodes: common core subunits; yellow nodes: DNA binding proteins. (d) PRC1.1 complex contains PCGF1 and SKP1 and links the co-repressors BCOR and BCORL as well as the demethylase KDM2B to the PRC1.1 core. Edges connecting core components are in purple.
Supplementary Figure 4. Interaction proteomes of PRC1 and HP1 complexes. (a) CBX1/3/5 interactions with zinc finger (ZNF; orange nodes) and neighboring proteins. Note that all bait proteins of this subnetwork interacted with the nuclear co-repressor TIF1B (yellow), which underscores the reported function of TIF1B in the recruitment of HP1 proteins to specific DNA sequences through interaction with zinc finger transcription factors. Hexagons indicate bait, squares prey proteins. Blue edges, interactions of potential core subunits of the corresponding network, interactions in public databases as dashed lines, others in orange. (b) CBX1 and CBX3 but not CBX5 interact with the histone methyltransferases EHMT1 and EHMT2. The CBX1/3-EHMT1/2 complex also interacts with zinc finger transcription factors WIZ, ZN644, ZN462 and the co-repressor protein TIF1B. In contrast to a previous report (ref. 2), we did not detect any potential interactions of EHMT1/2 with PRC1.6. (c) Interaction of CBX3 and CBX5 with SENP7. This complex may also include TIF1B, the zinc finger transcription factor AHDC1 and the histone chaperone CHAP1. (d) CBX1/3/5 complex with histone chaperone CAF1 and RBBP4, which is potentially involved in DNA replication. (e) Centromeric DSN1/MIS12 complex with HP1 proteins, involved in mitosis.
Supplementary Figure 5. Characterization of PRC2 core interacting proteins C17ORF96 and C10ORF12. (a) C17ORF96 and SKDA1 are related proteins, which share sequence similarity at their C-terminus. BLAST search result with C17ORF96 (Uniprot A6NHQ4) as query sequence. (b) CLUSTAL 2.1 alignment of C17ORF96 (Uniprot A6NHQ4) and SKDA1 (Uniprot Q1XH10) amino acid sequences. Homologous C-termini, identified by BLAST search, indicated in red. (c) All AP-MS identified peptides that match to LCOR-CRA_b. Colors indicate specific protein regions as in Fig. 2c. (d) Number of identified LCOR and C10ORF12 isoform peptides in PRC2 protein purification experiments. (e) Number of PRC2 prey peptides in LCOR and C10ORF12 isoform AP-MS experiments. (f) C10ORF12 localizes to the nucleus. Images show anti-HA in situ stainings of stable Flp-In HEK293 T-REx cell lines before and after tetracycline induction. HA-EGFP shows a dispersed, cytoplasmic signal whereas the C10ORF12 HA-epitope fusion protein shows a nuclear signal. In situ stainings were performed as described in Glatter et al., 2009 (ref. 3).
Supplementary Figure 6. Functional annotation of PR-DUB.1 component binding sites and genome-wide analysis of PR-DUB.1, PRC1 and TIF1B enrichments at TSSs. (a) Sequence read statistics of ChIP-seq experiments. (b) Functional annotation of high-confidence MACS peaks localizing within 5kb of annotated TSSs. Enriched MSigDB pathways were computed with GREAT (ref. 4). (c) Pairwise scatter plots of PR-DUB.1, PRC1 and TIF1B enrichments at TSSs. Spearman’s rank correlation coefficients are indicated.
Supplementary Methods
Cell Line Generation
Flp-In HEK293 T-REx cells (Invitrogen) containing a single genomic FRT site and stably expressing the tet repressor were cultured in DMEM (4.5 g/l glucose, 10% FCS, 2 mM L-glutamine) containing 100 µg/ml zeocin and 15 µg/ml blasticidin. The medium was exchanged with DMEM medium containing 15 µg/ml blasticidin before transfection. For cell line generation, Flp-In HEK293 T-REx cells were co-transfected with the corresponding expression plasmids and the pOG44 vector (Invitrogen) for co-expression of the Flp-recombinase using the Lipofectamine 2000 transfection reagent (Invitrogen). Two days after transfection, cells were selected in hygromycin-containing medium (100 µg/ml) for 2-3 weeks.
Protein Identification
Mass spectrometry raw data were searched with X!Tandem (ref. 5) against a human protein sequence database (Swiss-Prot canonical reviewed human proteome reference data set; http://www.uniprot.org/), including reverse decoy sequences for all entries. The search parameters were set to include only fully tryptic peptides (KR/P) containing up to two missed cleavages. Carbamidomethylation (+57.021465 amu) on Cys was set as a static peptide modification; oxidation (+15.99492 amu) on Met and phosphorylation (+79.966331 amu) on Ser, Thr and Tyr were set as dynamic peptide modifications. Precursor mass error tolerance was set to 25 ppm, the fragment mass error tolerance to 0.5 Da. Obtained peptide spectrum matches were statistically evaluated using PeptideProphet, and protein inference was performed with ProteinProphet, both part of the Trans Proteomic Pipeline (ref. 6). A minimum protein probability of 0.9 was set to match a false discovery rate (FDR) of <1%. The resulting pep.xml and prot.xml files were used as input for the spectral counting software tool Abacus (ref. 7) to calculate spectral counts and normalized spectral abundance factor (NSAF) values (refs. 8,9).
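To make the NSAF quantity concrete, the following minimal Python sketch computes a plain normalized spectral abundance factor for a single purification: each protein's spectral count is divided by its length, and the result is normalized by the sum of these ratios over all proteins in the run. It is only an illustration and does not reproduce the adjusted spectral counts that Abacus derives; the function name `nsaf`, the protein labels and all numbers are hypothetical.

```python
def nsaf(spectral_counts, lengths):
    """Plain NSAF for one run: (SpC / L) divided by the sum of SpC / L over all proteins.

    spectral_counts: dict protein -> spectral count in this run
    lengths:         dict protein -> protein length in amino acids
    """
    # Spectral abundance factor: spectral count normalized by protein length.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        # No spectra at all in this run: report zeros rather than dividing by zero.
        return {p: 0.0 for p in saf}
    return {p: value / total for p, value in saf.items()}


# Hypothetical toy input (not data from this study).
counts = {"protA": 30, "protB": 20, "protC": 10}
lengths = {"protA": 750, "protB": 740, "protC": 440}
values = nsaf(counts, lengths)
```

By construction the NSAF values of one run sum to 1, which is what makes them comparable across purifications of different depth.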
Evaluation of high confidence interacting proteins (HCIP)
Adjusted NSAF values of identified co-purified proteins were compared to a control data set of 62 StrepHA-GFP and 12 StrepHA-RFP-NLS purification experiments. The protein abundance in the control data set was estimated by averaging the 10 highest NSAF values per protein among all 74 measurements. Protein abundance enrichment of >10 fold compared to the control data set was used as an initial step for filtering protein interaction raw data. Adjusted NSAF values were also used to calculate WDN-scores of all the interaction candidates (ref. 10). A simulated data matrix was used to calculate the WD-score threshold below which 98% of the simulated data falls. From this high confidence interaction data set (control ratio > 10; WDN-score > 1) a distance matrix was calculated with the Multiple Experiment Viewer (ref. 11; http://www.tm4.org/mev/) using an uncentered Pearson distance metric and mapped on the unfiltered raw interactions. To relax filtering stringency in close proximity in the network, sub-threshold interactions (control ratio and WDN-score) were rescued if the distance was greater than zero (n = 314 protein interactions). The resulting filtered data set contained the high confidence interacting proteins (HCIPs) and corresponding protein-protein interactions. For comparison to literature data, all human protein interactions were extracted from the public database IntAct (ref. 12).
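The control-based enrichment filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it assumes that every control purification contributes one NSAF value per protein (zero when the protein was not detected), that a protein never seen in the controls passes the filter, and it leaves out the WDN-score and the distance-based rescue step. The names `control_abundance` and `passes_control_filter` and all numbers are hypothetical.

```python
def control_abundance(control_values, top_n=10):
    """Mean of the top_n highest NSAF values of one protein across control runs."""
    top = sorted(control_values, reverse=True)[:top_n]
    return sum(top) / len(top) if top else 0.0


def passes_control_filter(bait_nsaf, control_values, min_fold=10.0):
    """True if the NSAF in the bait purification exceeds min_fold times the control abundance."""
    background = control_abundance(control_values)
    if background == 0.0:
        # Assumption: a protein absent from all controls passes if it was detected at all.
        return bait_nsaf > 0.0
    return bait_nsaf / background > min_fold


# Hypothetical protein detected in 5 of 74 control purifications.
controls = [0.001] * 5 + [0.0] * 69
background = control_abundance(controls)
strong = passes_control_filter(0.006, controls)  # 12-fold over background
weak = passes_control_filter(0.004, controls)    # 8-fold over background
```

Averaging only the highest control values makes the background estimate conservative for proteins that appear sporadically but strongly in control purifications.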
Clustering analysis
All data analyses were performed using R (ref. 13; http://www.R-project.org). Agglomerative hierarchical clustering of HCIP was performed using adjusted NSAF values. Different correlation-based dissimilarity measures were considered in combination with commonly adopted intergroup dissimilarity measures (single, average and complete linkage functions). For each pair of measures, clustering performances were evaluated using the cophenetic correlation coefficient, which measures the ability of a dendrogram to represent the input data structure (ref. 14). As a result of this procedure, hierarchical clustering was performed by adopting a Spearman’s rank correlation coefficient based dissimilarity along with average linkage. Therefore, the dissimilarity between prey $i$ and $j$ was computed as $d_{ij} = (1 - r(x_i, x_j))/2$, where $r$ is the Spearman’s rank correlation coefficient (SCC).
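As an illustration of the dissimilarity defined above, the sketch below computes d = (1 - r)/2 for two abundance profiles, with r obtained as the Pearson correlation of tie-averaged ranks. The analysis in the study was carried out in R; this Python version is only a stand-in to show the arithmetic, it omits the average-linkage clustering and the cophenetic evaluation, and the function names and input vectors are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_dissimilarity(x, y):
    """d = (1 - r) / 2, where r is Spearman's rank correlation of x and y."""
    r = pearson(ranks(x), ranks(y))
    return (1 - r) / 2


# Hypothetical abundance profiles of three preys across four baits.
same = spearman_dissimilarity([1, 2, 3, 4], [10, 20, 30, 40])
opposite = spearman_dissimilarity([1, 2, 3, 4], [40, 30, 20, 10])
```

The rescaling maps perfectly correlated profiles to a dissimilarity of 0 and perfectly anti-correlated profiles to 1, so the values can be used directly as distances for agglomerative clustering.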
Network Visualization
Protein Interaction data were visualized with Cytoscape 2.8.3 (ref. 15). Known bait interactions were obtained from the protein interaction network analysis platform PINA v2 (December 2012; ref. 16) using bait protein identifiers as starting nodes.
ChIP-Seq data analysis
ChIP-Seq profiles of RING1B, RYBP, TIF1B (GEO accession number GSM855007, GSM855008 and GSE27929, respectively) and corresponding input data sets were downloaded in sra format and converted to fastq using the NCBI Short Read Archive Toolkit. Short reads were aligned to the human genome (hg19 assembly) using Bowtie 2.0.0 (ref. 17) allowing for 1 mismatch in a 30nt seed, reporting best out of at most 100 alignments. Overall alignment rates ranged between 83 and 94% for in-house generated data sets and between 75 and 98% for the others. Alignments were converted from SAM format to BAM using SAMtools 0.1.18 (ref. 18). Peak calling was performed using MACS 1.4.0 (ref. 19) with default parameters. Peaks were then filtered according to p-values ($p < 10^{-10}$). If replicates were available, only peak intersections were considered further and denoted as high-confidence peaks. All subsequent analyses were performed using R/Bioconductor (ref. 20). Coverage tracks at single base pair resolution were generated with wavClusteR (ref. 21). Overlapping peaks were determined using GenomicRanges using a minimum overlap of 1bp. RefSeq transcript annotations were fetched from UCSC using GenomicFeatures. Unique TSSs were defined as TSSs having no other annotated TSS within their 1kb flanking region, irrespective of the strand. A total of 21612 unique TSSs was considered further. Metaprofiles of ChIP-Seq signals at TSSs (± 2.5kb) were computed using non-overlapping windows of width 50 nt. ChIP-Seq signal heatmaps were computed with Genomation (ref. 22). Feature enrichments at unique promoters were computed as described in ref. 23. MSigDB pathway analysis was performed with GREAT 2.0.2 (ref. 4) by associating genomic regions to single nearest annotated genes within 5kb.
Supplementary References
1. Cai, Y. et al. Subunit composition and substrate specificity of a MOF-containing histone acetyltransferase distinct from the male-specific lethal (MSL) complex. The Journal of biological chemistry 285, 4268-4272 (2010).
2. Ogawa, H., Ishiguro, K., Gaubatz, S., Livingston, D.M. & Nakatani, Y. A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells. Science 296, 1132-1136 (2002).
3. Glatter, T., Wepf, A., Aebersold, R. & Gstaiger, M. An integrated workflow for charting the human interaction proteome: insights into the PP2A system. Molecular systems biology 5, 237 (2009).
4. McLean, C.Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature biotechnology 28, 495-501 (2010).
5. Craig, R. & Beavis, R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466-1467 (2004).
6. Deutsch, E.W. et al. A guided tour of the Trans-Proteomic Pipeline. Proteomics 10, 1150-1159 (2010).
7. Fermin, D., Basrur, V., Yocum, A.K. & Nesvizhskii, A.I. Abacus: a computational tool for extracting and pre-processing spectral count data for label-free quantitative proteomic analysis. Proteomics 11, 1340-1345 (2011).
8. Paoletti, A.C. et al. Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proceedings of the National Academy of Sciences of the United States of America 103, 18928-18933 (2006).
9. Zybailov, B. et al. Statistical analysis of membrane proteome expression changes in Saccharomyces cerevisiae. Journal of proteome research 5, 2339-2347 (2006).
10. Behrends, C., Sowa, M.E., Gygi, S.P. & Harper, J.W. Network organization of the human autophagy system. Nature 466, 68-76 (2010).
11. Saeed, A.I. et al. TM4: a free, open-source system for microarray data management and analysis. BioTechniques 34, 374-378 (2003).
12. Orchard, S. et al. The MIntAct project–IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Research 42, D358-63 (2014).
13. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria. (2012).
14. Hastie, T., Tibshirani, R. & Friedman, J. The elements of statistical learning: data mining, inference, and prediction, Edn. 2nd. (Springer, New York, N.Y.; 2009).
15. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research 13, 2498-2504 (2003).
16. Wu, J. et al. Integrated network analysis platform for protein-protein interactions. Nature methods 6, 75-77 (2009).
17. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nature methods 9, 357-359 (2012).
18. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).
19. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome biology 9, R137 (2008).
20. Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nature methods 12, 115-121 (2015).
21. Comoglio, F., Sievers, C. & Paro, R. Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data. BMC bioinformatics 16, 32 (2015).
22. Akalin, A. et al. Genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics 31, 1127-1129 (2014).
23. Enderle, D. et al. Polycomb preferentially targets stalled promoters of coding and noncoding transcripts. Genome research 21, 216-226 (2011).
Figure 1 [image; panels a-e. (a) Workflow: 64 baits in Flp-In HEK293 cells, double affinity purification (Streptactin, αHA), LC-MS/MS on an LTQ Orbitrap XL, identification with X!Tandem + TPP, contaminant filtering (WDN-score), hierarchical clustering and subcomplex assignment; 174 AP-MS experiments, 72 AP-MS control experiments, 490 HCIPs, 1400 interactions. (b) Hierarchical clustering of HCIPs. (c) Protein-protein interaction network. (d, e) Interaction maps of PRC1.3/PRC1.5 and PRC1.6; yellow nodes: DNA binding proteins.]
Figure 2 [image; panels a-i. (a) PRC2 cluster. (b) PRC2 interaction map. (c) Isoforms of LCOR and C10ORF12, including LCOR-Cra_b (GenBank: EAW49962.1). (d) LCOR interaction map. (e) Luciferase reporter with GAL4 binding sites and TK promoter; amplicons TSS and +500. (f) Western blot, - Tet / + Tet. (g) Axis: Relative Activity (%). (h, i) Axis: % of Input; Gal4 and IgG (h) and H3K27me3 (i) at TSS and +500.]
Figure 3 [image; panels a-b. (a) PR-DUB and 19S proteasome clusters. (b) Topology of PR-DUB complexes, with the 19S proteasome regulatory subunit and the shared HCIPs of OGT1 and WDR5 indicated.]
Figure 4 a b OGlcNac FOXK1 ASXL1 OGlcNac (n=16699)
4827 373 2923
FOXK1 0.02 0.10
(n=7349) 2703 Average signal (RPM) 8796 -2.5kbTSS +2.5kb TSS TSS 1350
18270 c ASXL1 (n=31119) ASXL1 -2 ρ -4 0 2 4 =0.83 Peak position w.r.t. TSSs Overlap (+/-1kb) OGlcNac Distal ρ ρ
-4 0 2 4-2 =0.89 =0.85 Upstream (<5kb) Downstream -4-2 0 2 4 -4-2 0 2 4 (<5kb) FOXK1 ASXL1 d PR-DUB.1 - MSigDB Pathway -log10(Binomial p value) 0 10 20 30 40 50 60 70 80 90 100 110 Gene Expression 116.78 Transcription 61.58 Cell Cycle, Mitotic 54.84 Spliceosom e 46.15 Metabolism of proteins 45.81 Form ation and Maturation of m RNA 44.61 HIV Infection 42.49 Influenza Life Cycle 40.18 Diabetes pathways 39.84 Translation 39.00
Chr1 6,260,000 Chr8 101,163,000 Chr1 28,969,000 28,970,000 Chr11 57,426,000 Chr3 127,318,000 40 40 70 80 80 FOXK1 FOXK1 1 1 1 1 1
109 75 120 35 70 ASXL1 ASXL1 1 1 1 1 1
65 60 50 60 30 OGlcNac OGlcNac 1 1 1 1 1
RPL22 POLR2K TAF12 CLP1 MCM2 e f FOXK1 ASXL1 OGlcNac RYBP RING1B PR-DUB.1 (n=2703)
312 1552 4924 PRC1 PR-DUB.1 (n=6816) 336 503 1244
8214 TIF1B (n=10297)
2 kb 163,291,000 163,292,000 163,293,000 Chr1 40 FOXK1 1
PRC1 55 ASXL1 1
145 OGlcNac 1
20 RING1B 1
20 RYBP 0-5kb+5kb 1 20 TIF1B 1 0 0.3 0 0.3 0 0.2 0 0.2 0 0.18 RGS5 NUF2 Supplementary Figure 1 a b 80 Annotated in IntAct Novel in this study 70
60 Primary baits Secondary baits 50
“PcG system” HC IPs
[Figure residue: the network diagrams and plot did not survive extraction. Recoverable labels follow.
Overview panel: human PcG complexes annotated with Drosophila names: PRC1 (Pc, Sce, Ph, Psc), PRC2 (E(z), Esc, Su(z)12, Pcl), PR-DUB (ASX, Calypso) and PHORC (Pho, dSfmbt), plus a group labelled "heterochromatin".
Plot: x-axis "Bait Proteins"; y-axis ticks 0 to 40, label truncated ("Number of ...").
Panel c: network of F-box proteins with SKP1, SKP2, CUL1, CUL7, RBX1 and NEDD8, and a group labelled "Proteasome".
Panels d and e: groups labelled "MLL complexes", "TCP complex", "NSL complex", "TORC2" and "ADA2/GCN5/ADA3 transcription activator complex"; PRC1.6 composition compared between "Cai et al, 2010" and "this study".]
Supplementary Figure 2
[Figure residue: network diagrams, panels a-c; node layout and edges did not survive extraction.
Panel a: groups labelled HP1, PRC2, NURD, NURF, SIN3 and LINC.
Panel b: group labelled INO80 (node labels include IN80B, IN80C, IN80D, IN80E, ACL6A, ARP5, ARP8, NFRKB, TFPT, UCHL5, RUVB1, RUVB2 and MCRS1), shown together with LMBL3, SMBT1, KDM1A, RCOR1, RCOR3, HDAC1 and HDAC2.
Panel c: no group labels recoverable; node labels include SMBT1, LMBL1, LMBL2, LMBL3, LMBL4, RING1, RING2, PCGF6, RYBP and YAF2, together with proteasome (PSA, PSB, PSMD, PRS) and TCP subunits.]
Supplementary Figure 3
[Figure residue: network diagrams, panels a-d; node layout and edges did not survive extraction.
Panels a and b: node labels include CBX2, CBX4, CBX6, CBX7, CBX8, PHC1, PHC2, PHC3, PCGF1 to PCGF6, SCML1, RING1, RING2, RYBP and YAF2.
Panels c and d: groups labelled PRC1.1 and PRC1.3/5; node labels include PCGF1, PCGF3, PCGF5, BCOR, BCORL, KDM2B, SKP1, AUTS2, FBRS, CSK21, CSK22, CSK2B and DCAF7.]
Supplementary Figure 4
[Figure residue: network diagrams, panels a-e; node layout and edges did not survive extraction.
Every panel contains the HP1 proteins CBX1, CBX3 and CBX5.
Panel a: legend entry "Zinc finger protein"; node labels include EHMT1, EHMT2, WIZ, ZN644, ZN462, Z518A, ZMYM4, TIF1B and SENP7.
Panels b and c: node labels include EHMT1, EHMT2, WIZ, ZN644, ZN462, TIF1B, SENP7 and RBBP4.
Panels d and e: node labels include CAF1A, CAF1B, ASF1B, H33, RBBP4, TIF1B, NDC80, DSN1, NSL1, MIS12 and PMF1.]
Supplementary Figure 5
a
Pairwise alignment of C17ORF96 and SKDA1
Score: 77.8 bits (190)
Expect: 2e-14
Method: Compositional matrix adjust.
Identities: 34/64 (53%)
Positives: 43/64 (67%)
Gaps: 1/64 (1%)
C17ORF96  313  FSLLNCFPCPPALVVGEDGDLKPASSLRLQGDSKPP-PAHPLWRWQMGGPAVPEPPGLKFWGIN  375
SKDA1     763  FHFMANFPCPPSLIIGRDGDLWPAYSLNTTKDSQTPHKAHPIWKWQLGGSAIPLPPSHKFRKFN  826
b
[Figure residue: full-length alignment of C17ORF96 (numbered to residue 379) and SKDA1 (numbered to residue 827). Gap columns and conservation marks were lost in extraction, so the alignment itself is not recoverable.]
c
MQRMIQQFAA EYTSKNSSTQ DPSQPNSTKN QSLPKASPVT TSPTAATTQN PVLSKLLMAD QDSPLDLTVR KSQSEPSEQD GVLDLSTKKS PCAGSTSLSH SPGCSSTQGN GENSTEAKAV DSNNQSKSPL EKFMVKLCTH HQKQFIRVLN DLYTESQPGT EDLQPSDSGA MDVSTCNAGC AQLSTKHKEK DALCLDMKSS ASVDLFVDSS DSHSPLHLTE QTPKKPPPEI NPVDGRENAL TVVQKDSSEL PTTKSNSINS SSVDSFTPGY LTASNCSSVN FHHIPKILEG QTTGQEQDTN VNICEDGKDH MQSSALVESL ITVKMAAENS EEGNTCIIPQ RNLFKALSEE AWNSGFMGNS SRTADKENTL QCPKTPLRQD LEANEQDARP KQENHLHSLG RNKVGYHLHP SDKGQFDHSK DGWLGPGPMP AVHKAANGHS RTKMISTSIK
TARKSKRASG LRINDYDNQC DVVYISQPIT ECHFENQKSI LSSRKTARKS TRGYFFNGDC CELPTVRTLA RNLHSQEKAS CSALASEAVF
TPKQTLTIPA PRHTVDVQLP REDNPEEPSK EITSHEEGGG DVSPRKEPQE PEVCPTKIKP NLSSSPRSEE TTASSLVWPL PAHLPEEDLP
EGGSTVSAPT ASGMSSPEHN QPPVALLDTE EMSVPQDCHL LPSTESFSGG VSEDVISRPH SPPEIVSREE SPQCSENQSS PMGLEPPMSL
GKAEDNQSIS AEVESGDTQE LNVDPLLKES STFTDENPSE TEESEAAGGI GKLEGEDGDV KCLSEKDTYD TSIDSLEENL DKKKKGKKFP
EASDRCLRSQ LSDSSSADRC LRNQSSDSSS ACLEIKVPKN PSAKRSKKEG HPGGTTPKGL LPDSFHTETL EDTEKPSVNE RPSEKDAEQE
GEGGGIITRQ TLKNMLDKEV KELRGEIFPS RDPITTAGQP LPGERLEIYV QSKMDEKNAH IPSESIACKR DPEQAKEEPG HIPTQHVEEA
VNEVDNENTQ QKDDESDAPC SSLGLSSSGS GDAARAPKSV PRPKRLTSST YNLRHAHSLG SLDASKVTSE KEAAQVNPIM PKENGASESG
DPLDEDDVDT VVDEQPKFME WCAEEENQEL IANFNAQYMK VQKGWIQLEK EGQPTPRARN KSDKLKEIWK SKKRSRKCRS SLESQKCSPV
QMLFMTNFKL SNVCKWFLET TETRSLVIVK KLNTRLPGDV PPVKHPLQKY APSSLYPSSL QAERLKKHLK KFPGATPAKN NWKMQKLWAK
FRENPDQVEP EDGSDVSPGP NSEDSIEEVK EDRNSHPPAN LPTPASTRIL RKYSNIRGKL RAQQRLIKNE KMECPDALAV ESKPSRKSVC
INPLMSPKLA LQVDADGFPV KPKSTEGMKG RKGKQVSEIL PKAEVQSKRK RTEGSSPPDS KNKGPTVKAS KEKHADGATK TPAAKRPAAR
DRSSQPPKKT SLKENKVKIP KKSAGKSCPP SRKEKENTNK RPSQSIASET LTKPAKQKGA GESSSRPQKA TNRKQSSGKT RARPSTKTPE
SSAAQRKRKL KAKLDCSHSK RRRLDAK

d
Peptide counts. Rows: baits. Columns: protein (or shared protein group) to which the peptides map.
Bait       LCOR-Cra/LCOR  LCOR-Cra  C10orf12  LCOR  Total
C10orf12               0         0       126     0    126
LCOR-Cra              18        14        85     0    117
EED                    0         1         9     0     10
EZH1                   1         2        16     0     19
EZH2                   3         1        13     0     17
LCOR                  40         0         0    95    135
MTF2                   1         1        13     0     15
PHF1                   3         1        28     0     32
PHF19                  0         1         4     0      5
Total                 66        21       294    95    476

e
Peptide counts, with column groups labelled "PRC2 interactions" and "LCOR interactions" in the figure. The table is shown transposed here for width. Rows: protein (or shared protein group) to which the peptides map. Columns: baits.
Peptides mapping to   C10orf12  LCOR-Cra  LCOR  Total
C10orf12                   126        85     0    211
LCOR-Cra                     0        14     0     14
LCOR-Cra/LCOR                0        18    40     58
DEK                          0         2     0      2
EED                         34        19     0     53
EZH1                        10         0     0     10
EZH1/EZH2                    6         3     0      9
EZH2                        43        19     0     62
MTF2                        27        17     0     44
PHF1                        17        10     0     27
PHF19                        2         0     0      2
RBBP4                        8         4     0     12
RBBP4/RBBP7                  7         5     0     12
RBBP7                        9         3     0     12
SUZ12                       54        23     0     77
CTBP1, CTBP1/CTBP2, CTBP2    0         0    32     32
GRN                          0         0     1      1
LCOR                         0         0    95     95
TBA1C                        1         2     3      6
Total                      344       224   171    739
The CTBP1, CTBP1/CTBP2 and CTBP2 counts for the LCOR bait are run together as "11516" in the extracted text. Their sum (32) follows from the row total, but the split between the three columns is not recoverable.
f DNA anti-HA Merged
HA-C10ORF12 Tet +
HA-C10ORF12 Tet -
HA-EGFP
Supplementary Figure 6
a
Sample     Total reads (mio.)  Overall alignment (%)  Unique alignments (%)  Multiple alignments (%)
ASXL1      47.2                88.1                   31.8 (67.4)            9.7 (20.7)
FOXK1      18.6                94.3                   13.1 (70.3)            4.4 (24.0)
O-GlcNAc   46.2                86.4                   30.2 (65.3)            9.8 (21.1)
Input62    20.2                97.4                   13.4 (66.4)            6.3 (31.1)
Input63    46.8                83.8                   27.7 (59.3)            11.5 (24.5)

b
[Bar charts did not survive extraction; the pathway names and values are listed below. Values are -log10(Binomial p value).]

ASXL1 - MSigDB Pathway
Gene Expression  >270
Diabetes pathways  >270
Metabolism of proteins  266.81
Cell Cycle, Mitotic  240.09
Influenza Life Cycle  208.46
Translation  206.71
Insulin Synthesis and Secretion  192.44
GTP hydrolysis and joining of the 60S ribosomal subunit  188.83
Pathways in cancer  180.89
Peptide chain elongation  180.20
Influenza Viral RNA Transcription and Replication  178.57
Formation of a pool of free 40S subunits  178.27
Regulation of beta-cell development  173.27
Regulation of gene expression in beta cells  171.40
Viral mRNA Translation  170.68
Ribosome  170.03
Transcription  169.03
Integration of energy metabolism  158.60
Regulation of Insulin Secretion  143.62
Formation and Maturation of mRNA Transcript  139.60

FOXK1 - MSigDB Pathway
Gene Expression  74.68
Transcription  41.57
Cell Cycle, Mitotic  34.37
Metabolism of proteins  32.28
Translation  29.28
Diabetes pathways  28.60
Influenza Life Cycle  28.44
GTP hydrolysis and joining of the 60S ribosomal subunit  27.58
Formation and Maturation of mRNA Transcript  25.80
RNA Polymerase I, RNA Polymerase III, and Mitochondrial Transcription  25.76
HIV Infection  25.58
Spliceosome  25.27
Mitotic M-M/G1 phases  25.23
Formation of a pool of free 40S subunits  23.24
Processing of Capped Intron-Containing Pre-mRNA  23.05
Influenza Viral RNA Transcription and Replication  22.88
Ribosome  22.09
Peptide chain elongation  21.23
Viral mRNA Translation  20.08
Regulation of beta-cell development  19.74

O-GlcNAc - MSigDB Pathway
Gene Expression  >190
Cell Cycle, Mitotic  184.59
Diabetes pathways  169.42
Transcription  144.98
Metabolism of proteins  140.63
HIV Infection  136.38
Formation and Maturation of mRNA Transcript  125.96
Processing of Capped Intron-Containing Pre-mRNA  120.31
Influenza Life Cycle  116.45
Elongation and Processing of Capped Transcripts  110.92
Translation  107.67
Mitotic M-M/G1 phases  106.42
Spliceosome  101.67
GTP hydrolysis and joining of the 60S ribosomal subunit  93.91
mRNA Splicing  92.17
Host Interactions of HIV factors  91.59
Insulin Synthesis and Secretion  91.12
RNA Polymerase I, RNA Polymerase III, and Mitochondrial Transcription  90.33
Huntington's disease  87.58
Influenza Viral RNA Transcription and Replication  85.74

c
[Figure residue: pairwise plot matrix for FOXK1, ASXL1, O-GlcNAc, RYBP, RING1B and TIF1B. The panel values, in extraction order, are 0.074, 0.83, 0.89, 0.029, 0.13, 0.20, 0.85, 0.075, 0.78, 0.46 and 0.56; they cannot be assigned to specific sample pairs from the extracted text.]