University of Rhode Island DigitalCommons@URI

Cell and Molecular Biology Faculty Publications Cell and Molecular Biology

3-2019

Topological Scoring of Interaction Networks

Mihaela E. Sardiu

Joshua M. Gilmore

Brad D. Groppe

Arnob Dutta

Laurence Florens


Follow this and additional works at: https://digitalcommons.uri.edu/cmb_facpubs

Authors
Mihaela E. Sardiu, Joshua M. Gilmore, Brad D. Groppe, Arnob Dutta, Laurence Florens, and Michael P. Washburn

https://doi.org/10.1038/s41467-019-09123-y

OPEN

Topological scoring of protein interaction networks

Mihaela E. Sardiu1, Joshua M. Gilmore1,3, Brad D. Groppe1,4, Arnob Dutta1,5, Laurence Florens1 & Michael P. Washburn1,2

It remains a significant challenge to define individual protein associations within networks where an individual protein can directly interact with other proteins and/or be part of large complexes, which contain functional modules. Here we demonstrate the topological scoring (TopS) algorithm for the analysis of quantitative proteomic datasets from affinity purifications. Data are analyzed in a parallel fashion where a prey protein is scored in an individual affinity purification by aggregating information from the entire dataset. Topological scores span a broad range of values indicating the enrichment of an individual protein in every bait protein purification. TopS is applied to interaction networks derived from human DNA repair proteins and yeast chromatin remodeling complexes. TopS highlights potential direct protein interactions and modules within complexes. TopS is a rapid method for the efficient and informative computational analysis of datasets, is complementary to existing analysis pipelines, and provides important insights into protein interaction networks.

1 Stowers Institute for Medical Research, Kansas City, MO 64110, USA. 2 Department of Pathology and Laboratory Medicine, The University of Kansas Medical Center, 3901 Rainbow Boulevard, Kansas City, KS 66160, USA. 3Present address: Boehringer Ingelheim Vetmedica, St. Joseph, MO 64506, USA. 4Present address: Thermo Fisher Scientific, Waltham, MA 02451, USA. 5Present address: Department of Cell and Molecular Biology, University of Rhode Island, 287 CBLS, 120 Flagg Road, Kingston, RI 02881, USA. Correspondence and requests for materials should be addressed to M.P.W. (email: [email protected])

NATURE COMMUNICATIONS | (2019) 10:1118 | https://doi.org/10.1038/s41467-019-09123-y | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-09123-y

Many large- and medium-scale analyses of protein interaction networks exist for the study of protein complexes1–4. These studies typically consist of affinity purifications of different bait proteins analyzed using mass spectrometry (AP-MS) and utilize statistical tools to provide confidence that a prey protein is associated with a bait protein. Approaches like CompPASS5, QSPEC6, SAINT7, and SFINX8 largely yield statistical values, like a p value, to provide confidence that two proteins are associating or are part of a protein complex. Within a protein interaction network, an individual protein may have multiple interactions and may be part of a large protein complex or complexes, which can be composed of important functional modules. For example, modularity is a hallmark of protein complexes involved in transcription and chromatin remodeling. Within this area of protein interaction networks, Mediator9, SAGA10,11, and SWI/SNF12 are just a few of the many complexes well known to have modules that carry out distinct functions. Determining these modules in these complexes has required years of study using biochemical, genetic, and proteomic methods9–12. In addition, within protein interaction networks and protein complexes, there are also direct protein interactions that are critical for biological functions. For example, in DNA repair, the formation of the Ku70-Ku80 (XRCC5-XRCC6) heterodimer is critical for recognition of DNA double strand breaks that occur during nonhomologous end joining13. Existing statistical tools struggle to gain insight into the behavior of an individual protein in a protein interaction network. Methods are needed to dig deeper into protein interaction datasets to determine direct protein interactions and to capture modularity within complexes.

We have used approaches like deletion network analyses, network perturbation, and topological data analysis to determine the modularity in protein interaction networks and protein complexes themselves14. Here we describe a topological scoring (TopS) algorithm for the analysis of protein interaction networks derived from quantitative proteomic AP-MS datasets. TopS can be used by itself or in addition to existing tools5–8 to analyze and interpret datasets. Here, we apply TopS to a protein interaction network centered on human DNA repair proteins, to a previously published human polycomb complexome dataset4, and to yeast chromatin remodeling datasets from the INO80 and SWI/SNF complexes12,14. TopS yields insights into potential direct protein interactions and modularity within these networks. TopS is a simple and powerful algorithm based on the likelihood ratio method to infer the interaction preferences of proteins within a network consisting of reciprocal and nonreciprocal purifications. The TopS algorithm generates positive or negative values across a broad range for each prey/bait combination relative to the other AP-MS analyses in a dataset. TopS can differentiate between high-confidence interactions found with large positive values and lower confidence interactions found with negative values. TopS has the advantage that the scores it generates can be easily further integrated into additional computational workflows and clustering approaches.

Results

Definition of topological score based on likelihood ratio. Determining whether a protein in a single affinity purification is significantly enriched without additional information remains a challenge. Therefore, by comparing several biologically related baits, one can determine whether a protein is truly enriched in a sample. The concept behind computing TopS is to collectively analyze parallel proteomics datasets and highlight enriched interactions in each bait relative to the other baits in a larger biological context. For each individual bait, instead of calculating a score by concentrating only on a single bait column via normalization or modeling, we aggregate information from the whole dataset where all data from all rows and all columns are used. Our topological score is based on the likelihood ratio and reflects the interaction preference of a prey protein for an affinity-purified bait. For each protein detected in a bait AP-MS, TopS calculates the likelihood ratio between the observed spectral count Qij of a protein i in a bait j, and the expected spectral count Eij in row i and column j (Fig. 1, Eq. (2)). Each prey protein from each AP-MS experiment has a distinct TopS value. TopS assigns positive or negative scores to proteins identified in each bait using spectral count information. If the actual number of spectra of a prey protein in a specific bait AP-MS exceeds that in all baits AP-MS, the presumption is that we have a positive preferential interaction. Likewise, fewer spectra in the bait AP-MS indicate a negative interaction preference.

Unlike p values or fold changes where the difference between the largest and the smallest values is relatively small, TopS generates a wide range of positive and negative scores that can easily differentiate high, medium, or low interaction preferences within the data. This is an advantage for proteomics data analysis since these scores not only reflect the interaction preference of proteins relative to others, but can now be directly integrated into further analyses such as clustering or network analysis in order to discover network organization. TopS is written in R and the platform is built with SHINY (https://shiny.rstudio.com/). It is easily implemented and includes correlations and clustering for bait/prey relationships.

Analysis of a human DNA repair network dataset. We first tested TopS on a dataset generated from HaloTag proteins involved in human DNA repair. DNA repair mechanisms are complex, independent, interdependent, and have been extensively studied15,16. To uncover the connectivity between proteins involved in these pathways, we selected 17 proteins that are part of different DNA repair mechanisms. In addition to the affinity purifications of known elements of the DNA repair pathways such as MSH2, MSH3, and MSH6 (involved in mismatch repair), RPA1, RPA2, and RPA3 (involved in nucleotide excision repair), XRCC5 and XRCC6 (involved in double strand break repair), SSBP1 and PARP117, we analyzed an additional seven proteins (WDR76, SPIN1, CBX1, CBX3, CBX5, CBX7, and CBX8) with chromatin associated functions, some of which had been associated with DNA repair18. For example, we have previously demonstrated that WDR76 is a DNA damage response protein with strong associations with members of DNA repair pathways and the CBX proteins19. It is important to note that our objective here was not to describe a human DNA repair protein interaction network. The proteins chosen for this small-scale study were of interest because of their potential relationship to the poorly characterized WDR76 protein19. Furthermore, these proteins were transiently overexpressed in HEK293 cells, and given the potential issues with transient overexpression20, the dataset would be expected to be noisy. We reasoned that using a noisy dataset would be an excellent test of the TopS method and its ability to extract meaningful biological information.

We used Halo affinity purification followed by quantitative proteomics analysis to identify proteins associated with any of the 17 baits. Three biological replicates were performed for each of the bait proteins and the distributed normalized spectral abundance factor (dNSAF)21 was used to quantify the prey proteins in each bait AP-MS. To eliminate potential nonspecific proteins, three negative controls were analyzed from cells expressing the Halo tag alone. A total of 54 purifications were completed and 4509 prey proteins identified (Supplementary


[Fig. 1 graphic: proteins detected by AP-MS (preys i in rows, baits j in columns) are filtered with QSPEC; bait spectral counts are adjusted (Equation (1)); topological scores are computed by automated analysis in a SHINY app (Equation (2)); downstream steps include modularity and linkage analysis, CRAPome filtering, a TopS > 20 cutoff, and topological data analysis (TDA). Example heat map color scale: Min = −29752, Max = 29752.]

Equation (1): $y_{ij} = \dfrac{a\,T_{i\cdot} + b\,T_{\cdot j} - T_{\cdot\cdot}}{(a-1)(b-1)}$

Equation (2): $\mathrm{TopS} = Q_{ij}\,\ln\left(Q_{ij}/E_{ij}\right)$

Fig. 1 Workflow for topological scoring of protein interaction datasets. In the first step, contaminant proteins are filtered using Z scores and FDR generated from QSPEC; however, other approaches could be used. Next, overexpressed bait proteins that have high spectral counts in the data are adjusted using Eq. (1). After prey proteins as baits are adjusted, topological scores (TopS) are then calculated, using an automated SHINY application, as described in Eq. (2), where Qij is the observed count in row i and column j and Eij is the expected count in row i and column j. Direct input of TopS values can be used in many different ways, such as data clustering to investigate the modularity and linkage in a network; additional filtering may be conducted using the CRAPome27, for example; and a topological data analysis network may be generated. FDR, false discovery rate
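The two equations in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' R/SHINY implementation. It rests on several assumptions that are not stated in the text above: the expected count Eij is taken to be the usual contingency-table expectation (row total times column total divided by the grand total), a cell with Qij = 0 is given a score of 0, and Eq. (1) is read as a missing-value estimate in which a is the number of rows, b is the number of columns, and the totals exclude the cell being adjusted. All function names and counts are hypothetical.

```python
import math


def expected_counts(counts):
    """Assumed expectation: E_ij = (row i total * column j total) / grand total."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("the count matrix contains no spectra")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def tops(counts):
    """Eq. (2): TopS_ij = Q_ij * ln(Q_ij / E_ij); 0 is assigned when Q_ij = 0."""
    expected = expected_counts(counts)
    return [
        [q * math.log(q / e) if q > 0 else 0.0 for q, e in zip(q_row, e_row)]
        for q_row, e_row in zip(counts, expected)
    ]


def adjusted_count(counts, i, j):
    """Eq. (1), read here as a missing-value estimate for cell (i, j)."""
    a, b = len(counts), len(counts[0])            # a rows (preys), b columns (baits)
    cell = counts[i][j]
    t_row = sum(counts[i]) - cell                 # T_i. without the adjusted cell
    t_col = sum(row[j] for row in counts) - cell  # T_.j without the adjusted cell
    t_all = sum(map(sum, counts)) - cell          # T_.. without the adjusted cell
    return (a * t_row + b * t_col - t_all) / ((a - 1) * (b - 1))


# Hypothetical spectral counts: rows are preys, columns are baits.
counts = [
    [50, 2, 1],
    [3, 40, 2],
    [10, 10, 10],
]
scores = tops(counts)

# Hypothetical table in which the bait's own cell (1, 1) is inflated.
inflated = [
    [1, 2, 3],
    [11, 500, 13],
    [21, 22, 23],
]
estimate = adjusted_count(inflated, 1, 1)
```

In this toy table the first prey scores positive in the first bait, where it is observed more often than expected, and negative in the second bait, where it is observed less often than expected. The adjusted value for the inflated cell falls back in line with its row and column.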

Data 1). To determine the proteins that were enriched in the samples versus negative controls, QSPEC6 was used (Supplementary Data 1). A protein was considered specific in a sample if its Z score was greater than or equal to 2 and the FDR was less than 0.01 in the bait AP-MS versus the control AP-MS. A total of 801 prey proteins passed these strict statistical criteria and they were further used in the analysis. Because of the higher number of spectra identified for overexpressed bait proteins, we adjusted the spectral counts for each bait protein according to Eq. (1) (Fig. 1). To depict the interactions in this DNA repair dataset that consisted of a matrix of 801 proteins in 17 baits, we calculated topological scores based on Eq. (2) and assigned positive or negative TopS to proteins identified in each bait AP-MS. The FDR values for these proteins using the QSPEC6 pipeline were also calculated (Supplementary Data 1 and Supplementary Figure 1). To focus on proteins with positive interaction preferences, we used a TopS cutoff of 20 (Supplementary Figure 2). A total of 617 proteins passed this filtering criterion (Supplementary Data 2).

Next, we hypothesized that proteins within the same complexes should have high preferences to the same bait purification. To test this hypothesis, we sorted proteins into known complexes using ConsensusPathDB22 and detected 118 protein complexes consisting of 230 proteins. Some of these 230 proteins were shared by multiple protein complexes. We systematically examined their TopS values and found that indeed proteins within complexes tended to associate with the same baits with high topological scores (Fig. 2 and Supplementary Data 3) even though some of the proteins were detected in most of the baits. To further investigate this observation, we hierarchically clustered all of the proteins belonging to known complexes and obtained a strong separation of the baits: bait proteins that belong to the same complexes clustered together (Fig. 2a). Thus, by using TopS values, we could illustrate the preferential interactions between protein complexes and baits in a large dataset.

Certain complexes showed significant enrichment to some of the baits. For example, in the case of the polycomb complex, we detected five members of the complex including CBX8. These five proteins had high preferential interactions to the CBX8 bait (Fig. 2b). Similar results were observed in the case of the BRAFT complex where we observed that five components of the complex, including RPA proteins, show high scores specifically


[Fig. 2 graphic. a Heat map of normalized TopS values (color scale −1 to 3) for preys in known complexes (rows) across the 17 baits (columns). b Heat maps for the Polycomb and BRAFT complexes across the baits. c Network linking baits to the known protein complexes identified in the dataset.]

with the RPA3 bait AP-MS dataset, confirming the connectivity between these proteins (Fig. 2b). The high TopS value observed with RPA3 suggests that this subunit of the RPA1/RPA2/RPA3 module brings in the other two BRAFT-specific proteins.

Finally, to further illustrate all of the connections between complexes and baits, we constructed a network using the Cytoscape platform23 (Fig. 2c). This network showed that PARP1, SPIN1, and XRCC6 baits have the most connections, and PARP1 and SPIN1 share a significant connected subnetwork (Fig. 2c).
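The idea of counting connections between baits and complexes can be illustrated with a small sketch. It is only an illustration under stated assumptions: the rule used here to draw an edge (a bait is linked to a complex when at least one member of the complex scores above a TopS cutoff in that bait) is hypothetical, the text does not specify how edges were defined for the Cytoscape network, and all names and scores below are made up.

```python
from collections import Counter


def bait_complex_edges(tops_by_bait, complexes, cutoff=20.0):
    """Link a bait to a complex when any complex member scores above the cutoff.

    This edge rule is an assumption made for illustration only.
    """
    edges = set()
    for bait, scores in tops_by_bait.items():
        for complex_name, members in complexes.items():
            if any(scores.get(member, 0.0) > cutoff for member in members):
                edges.add((bait, complex_name))
    return edges


def node_degrees(edges):
    """Count how many edges touch each node (bait or complex)."""
    return Counter(node for edge in edges for node in edge)


# Hypothetical TopS values per bait and hypothetical complex memberships.
tops_by_bait = {
    "bait1": {"p1": 300.0, "p2": 250.0, "p4": -40.0},
    "bait2": {"p4": 900.0, "p5": 120.0, "p1": -15.0},
    "bait3": {"p6": 400.0, "p4": 35.0},
}
complexes = {
    "complexA": ["p1", "p2", "p3"],
    "complexB": ["p4", "p5"],
    "complexC": ["p6"],
}

edges = bait_complex_edges(tops_by_bait, complexes)
degrees = node_degrees(edges)
```

In a network viewer, a node's degree is what typically drives its displayed size, so baits such as the hypothetical `bait3` that reach several complexes stand out.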


Fig. 2 Capture of protein complexes in a human DNA repair network. a Hierarchical biclustering on TopS values of the 17 baits and 535 preys in a human DNA repair protein interaction network. Proteins identified in complexes in the DNA repair dataset were hierarchically clustered using normalized TopS values as input using ClustVis46. Rows were centered and unit variance scaling was applied, that is, the values were divided by the standard deviation so that each row had a variance equal to one. Rows were clustered using correlation distance and average linkage. Columns were clustered using correlation distance and Ward linkage. b Bait-specific protein complex enrichment. Two complexes are shown that exhibited high association with specific baits. Red color corresponds to high TopS scores. Subunits of the polycomb complex exhibited high scores with the CBX8 bait, and components of the BRAFT complex exhibited high scores with the RPA3 bait. c Interaction network between baits and known protein complexes identified in the dataset. The network was constructed using the Cytoscape platform23. Large nodes correspond to a larger number of links. TopS, topological scoring
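The row preprocessing and distance described in this caption can be sketched as follows. This is a minimal illustration and not the ClustVis code: it assumes that unit variance scaling uses the sample standard deviation (n − 1 in the denominator) and that correlation distance means one minus the Pearson correlation. The function names and TopS profiles are hypothetical.

```python
import math


def scale_row(row):
    """Center a row and divide by its sample standard deviation."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
    if sd == 0:
        return [0.0] * n  # a constant row carries no profile information
    return [(x - mean) / sd for x in row]


def correlation_distance(x, y):
    """Assumed definition: 1 - Pearson correlation of two profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical TopS profiles of three preys across four baits.
prey_a = [120.0, 80.0, -30.0, -50.0]
prey_b = [60.0, 40.0, -15.0, -25.0]    # same shape as prey_a at half the scale
prey_c = [-120.0, -80.0, 30.0, 50.0]   # mirror image of prey_a

scaled_example = scale_row([1.0, 2.0, 3.0])
dist_ab = correlation_distance(prey_a, prey_b)
dist_ac = correlation_distance(prey_a, prey_c)
```

Because the distance depends only on the shape of a profile, two preys with very different absolute TopS values but the same bait preferences end up close together, which is consistent with the separation of baits reported for Fig. 2a.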

Topological network assembly and extreme value analysis. We have previously demonstrated that topological data analysis (TDA)24 can be used to identify clusters of proteins in protein interaction networks using normalized correlations25 and to find topological network modules in perturbed protein interaction networks using fold change ratios14. We then assembled a protein interaction network based on TDA using TopS values as input (Fig. 3a). TDA in conjunction with TopS values resulted in proteins with high topological scores clustering to the same region of the network (Supplementary Data 4). For example, proteins that are likely to interact with PARP1 were located in the same topological area on the right side of Fig. 3, whereas proteins that interact with WDR76 were on the upper left side. WDR76 localized in a node containing another 18 proteins including HELLS, GAN, and SIRT1, which have also been observed by independent studies to associate with WDR762,26. Similarly, SPIN1 was in a node with another 18 proteins, four of which (SPIN1, SPIN4, THRAP3, and BCLAF1) have been reported by others in SPIN1 purifications2.

Many additional approaches may be used to further interpret the dataset. For example, we introduced another filter to the TopS associations by adding information from the CRAPome, which is a database of proteins known to be present in 411 negative controls for human affinity purifications27. The list of proteins with high TopS values was analyzed using the CRAPome web interface. We next retained only proteins with high TopS values in our dataset that were present in at most 10 of the 411 controls (~2% of the negative controls), with a maximum spectral count of 15. Using this threshold, we created a reduced network and inspected these specific interactions (Fig. 3b). For simplicity, we focused on the interactions of the CBX proteins where we observed a high connection between a set of zinc finger proteins with KRAB domains and the CBX proteins (Fig. 3b). Most of these zinc finger proteins were not identified in any of the 411 negative controls and are likely specific to the current dataset. KRAB zinc finger proteins play an important role in the evolution of regulatory networks28. While one of these KRAB proteins, TRIM28/KAP1, has been shown to directly interact with HP1 proteins (CBX1, CBX3, and CBX5)29,30, all others are previously undescribed high-confidence interactors. The reduced subnetwork presented in Fig. 3b suggests that a large number of additional KRAB zinc finger proteins associate with CBX proteins in unexplored processes.

The known direct interactions of HP1 proteins (CBX1, 3, and 5) with TRIM2829,30 suggested we look deeper into TopS values in our dataset. We sorted the TopS values for all the prey proteins for each bait protein (Supplementary Data 2) from highest to lowest to observe the extreme negative and positive values in the dataset. The 28 prey proteins with the highest TopS values across the 17 human DNA repair bait purifications are shown in Fig. 3c. TRIM28 is the highest scoring protein with all three CBX/HP1 proteins. In every other bait, except RPA3, TRIM28 is among the ten most negative TopS values and TRIM28 is the most negative scoring protein in the CBX8, MSH6, and WDR76 baits (Supplementary Data 2). This pattern of a prey having very high TopS values in specific baits and extreme negative values elsewhere in the dataset occurred for several additional bait−prey interactions. These included XRCC5 and XRCC6, which directly interact with each other to form a heterodimer13. In the XRCC5 affinity purification, XRCC6 is the highest scoring prey protein (Supplementary Figure 3), which is the highest TopS value in the entire dataset, and in the XRCC6 affinity purification XRCC5 is the highest scoring prey protein (Fig. 3c and Supplementary Data 2). XRCC5 and XRCC6 have among the most negative TopS values in several other bait purifications including CBX5, CBX8, MSH2, MSH3, MSH6, RPA2, RPA3, SPIN1, and WDR76 (Fig. 3c and Supplementary Data 2). These results indicate that extremely positive TopS values may reflect direct protein−protein interactions.

An alternative approach to visualize positive and negative TopS scores is found in Supplementary Figure 4, where four subnetworks are shown in different bait affinity purifications. Subunits of the eIF3 complex showed the same high positive scores in the MSH3 and SPIN1 affinity purifications and negative scores in the WDR76 and SSBP1 affinity purifications (Supplementary Figure 4a). Also shown are proteins found in the CBX/HP1 interactions, proteins of the CEN complex, and MSH2 interacting proteins (Supplementary Figure 4b–d). In these cases, very high TopS values were calculated in selected baits compared to others. The similarity of TopS scores for proteins in complexes in different affinity purifications further highlights the ability of TopS scores to rapidly capture meaningful information from a dataset. In addition, this illustrates a degree of similarity for proteins in complexes in terms of interactions, suggesting coherence in TopS negative and positive sign assignments. Similar results can be observed for multiple complexes and groups of biologically related proteins (Supplementary Data 3 and Supplementary Figure 4).

As an example of how to further utilize prey proteins with high TopS values, we next sought to investigate the therapeutic aspect of these interactions since proteins involved in DNA repair pathways are known for their role in human diseases such as cancer. We used WebGestalt31 to perform drug−gene association enrichment for the proteins with high TopS scores in our dataset. The enrichment analysis resulted in nine identified drug classes (Fig. 4a). Unexpectedly, we observed that the enriched classes consisted of proteins with high topological scores for the same bait (Supplementary Data 5). For example, the association with dactinomycin was mostly enriched with proteins with high positive TopS scores for PARP1, SPIN1, and WDR76 (Fig. 4b and Supplementary Data 5), whereas proteins associated with tobramycin had high scores for SPIN1 (Supplementary Data 5). Dactinomycin is approved for use in treatment of several cancers, such as Wilms’s tumor32. These enrichment results indicate that TopS can highlight interactions that may be targeted by drugs in a dataset.

TopS analysis of yeast chromatin remodeling complexes. The fact that the highest TopS value in the human dataset was


[Fig. 3 graphic. a TDA network colored by rows per node (min 1.00, max 26.0, average 10.5), with regions labeled by bait: proteins interacting with the MSH proteins, WDR76, SPIN1, PARP1, CBX7 and CBX8, RPA, XRCC5, CBX5, and CBX1 and CBX3. b Reduced network after CRAPome filtering, with the proteins associated with CBX1, CBX3, and CBX5 shown in an inset. c Heat map of the preys with the highest TopS values across the 17 baits. Color scale: Min = −9880.2, Max = 9880.2.]

calculated for XRCC6 in the XRCC5 affinity purification, which is a known heterodimer13, led us to investigate the TopS approach in the well-characterized yeast chromatin remodeling system. We have previously utilized deletion network analyses to determine the modularity of the INO8014 and SWI/SNF12 chromatin remodeling complexes. Here, we reanalyzed two published yeast INO8014 and SWI/SNF12 protein complex datasets for which crosslinking data also exist33,34. However, we only applied TopS to the wild-type affinity purifications and did not consider the affinity purifications in genetic deletion backgrounds. We sought


Fig. 3 Topological network and linkage analysis of a human DNA repair network. a TDA in combination with the TopS scores was applied to the proteins with TopS > 20 in at least one of the baits. Norm Correlation was used as a distance metric with two filter functions: Neighborhood lens 1 and Neighborhood lens 2. Resolution 30 and gain 3 were used with the Ayasdi software24. Proteins are colored based on the rows per node. Color bar: red: high values, blue: low values. Node size is proportional to the number of proteins in the node. b A reduced network was created using information from the CRAPome27 database, and the network was generated using the Cytoscape platform23. Proteins associated with CBX1, CBX3, and CBX5 are expanded and highlighted in the inset. c Hierarchical biclustering using PermutMatrix47 of proteins with extreme positive TopS values. The 28 proteins with the highest TopS values across the dataset are shown. Rows and columns were clustered using Pearson as the distance and Ward linkage as the method. Yellow corresponds to high TopS values and gray shows negative TopS values. Proteins not present in the purifications are shown in black. TopS, topological scoring
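The two selection rules named in this caption (TopS > 20 in at least one bait for panel a, and the highest-scoring proteins across the dataset for panel c) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the nested-dictionary layout `scores[prey][bait]`, and the toy values are all assumptions made for the example.

```python
def preys_above_cutoff(scores, cutoff=20.0):
    """Keep preys whose TopS exceeds `cutoff` in at least one bait."""
    return [prey for prey, by_bait in scores.items()
            if any(value > cutoff for value in by_bait.values())]


def top_preys(scores, n):
    """Return the n preys with the largest single TopS value in any bait."""
    ranked = sorted(scores, key=lambda prey: max(scores[prey].values()),
                    reverse=True)
    return ranked[:n]


# Toy values only (hypothetical, not taken from the paper).
toy = {
    "prey_A": {"bait_1": 950.0, "bait_2": -40.0},
    "prey_B": {"bait_1": 12.0, "bait_2": 3.0},
    "prey_C": {"bait_1": -5.0, "bait_2": 25.0},
}
kept = preys_above_cutoff(toy)   # preys passed on to the network analysis
best = top_preys(toy, 2)         # preys shown in a top-n heat map
```

Ranking by the maximum value per prey is one reading of "highest TopS values across the dataset"; other readings (for example, ranking individual prey-bait pairs) are equally possible.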

[Fig. 4 graphic. Panel a, "Drug-gene association integration": bar chart of drug-gene enrichment, y-axis –log10(P_value), for netilmicin, adenosine, tobramycin, kanamycin, cefacetrile, dactinomycin, ciprofloxacin, hydroxyurea, and adenosine triphosphate. Panel b, "TopS values enriched in dactinomycin drug": heat map of TopS values on a color scale from –1222.25 to 9880.2.]

Fig. 4 Enrichment of drug targets in the human DNA repair network. a The proteins detected and scored in our human DNA repair dataset were enriched in nine gene−drug classes. The enriched classes were obtained using the WebGestalt database31. b Proteins/genes enriched in the dactinomycin set are represented. The red color represents high TopS values. TopS, topological scoring

to determine if TopS analysis applied to the wild-type INO80 and SWI/SNF data could capture modules as determined from the deletion datasets and potential direct interactions as determined from crosslinking data.

Identifying direct interactions from quantitative wild-type affinity purification datasets has been a long-standing problem. Examining protein abundances in separate samples, one cannot identify the direct interactions using standard approaches. This challenge is illustrated in Fig. 5 where the protein abundances, as estimated by dNSAF, of the INO80 complex in the ARP8 bait (Fig. 5a) and the protein abundance of the SWI/SNF complex in the SWP82 bait (Fig. 5b) are shown. For example, ACT1, ARP4, ARP8, IES4, and TAF14 are known to be part of a module based on deletion network analysis14 and crosslinking data34, yet their dNSAF abundances have no particular pattern.

To determine whether TopS values can capture the modularity and potentially direct protein interaction in these complexes, we merged a total of 24 wild-type affinity purifications for several


INO80 and SWI/SNF subunits isolated from yeast (Supplementary Data 6 and 7). First, the INO80 data14 were preprocessed by extracting nonspecific proteins, and proteins that passed the contaminant extractions in INO80 purifications (Supplementary Data 6) were also searched in the SWI/SNF dataset (Supplementary Data 7). A total of 237 proteins were shared between the two complexes. TopS analysis was applied to both complexes and FDR values for these proteins using QSPEC analysis were also obtained (Supplementary Data 9 and 10 and Supplementary Figure 5).

[Fig. 5 graphic. Bar charts of dNSAF (y-axis) per complex subunit (x-axis) for a the ARP8 bait (module: ARP8, ACT1, ARP4, IES4, TAF14) and b the SWP82 bait (module: SWP82, SNF5, and TAF14).]

Fig. 5 Failure to capture modularity in wild-type yeast complexes with a standard approach. Distributed normalized spectral abundance factors for the subunits of the INO80 and SWI/SNF complexes in APMS experiments using ARP8 and SWP82 as baits. Proteins in red are included in a the ARP8, ACT1, ARP4, IES4, and TAF14 module, and b the SWP82, SNF5, and TAF14 module. Error bars represent one standard deviation of the dNSAF values from at least three biological replicates

As with the human DNA repair network, we directly imported TopS values into a TDA-based network analysis (Fig. 6). In the INO80 complex, the ARP8 module was separated from the NHP10 and ARP5-IES6 modules (Fig. 6a, Supplementary Figure 6). Interestingly, the ARP8 module was connected via the shared protein TAF14 to a module consisting of SWI/SNF subunits (Fig. 6a). In addition, the ARP9, ARP7, and RTT102 module in SWI/SNF was also clearly separated from the other modules (Fig. 6a, Supplementary Figure 7). ARP7, ARP9, and RTT102 are part of both the SWI/SNF12 and RSC35 complexes. Additional components of RSC were therefore present in the affinity purifications of these shared subunits and localized to an SWI/SNF subunit/RSC complex partition of the network (Fig. 6a). Overall, TopS values of wild-type affinity purifications were able to identify the modules of the INO80 and SWI/SNF complexes (Fig. 6a) in a manner comparable to the results from the deletion network analyses12,14.

Next, similarly to how we processed the human DNA repair network, we sorted the TopS values for all the prey proteins for each bait protein (Supplementary Data 6 and 7) from highest to lowest to observe the extreme negative and positive values in the dataset. The 35 prey proteins with the highest TopS values across the 24 yeast chromatin remodeling wild-type bait purifications were hierarchically clustered (Fig. 6b). The highest TopS value in the entire dataset was ARP5 in the IES6 affinity purification. Furthermore, ARP5 and IES6 were the only two proteins in the IES6 affinity purification from the INO80 complex with positive TopS values; all the other proteins in the INO80 complex have negative values in the IES6 affinity purification (Fig. 6b and Supplementary Data 6 and 7). ARP5 and IES6 are well-characterized direct interactors in a subcomplex within the INO80 complex14,36. A similar result occurred in the ARP9 affinity purification where ARP7, RTT102, and ARP9 had the three highest TopS values, and all the other components of the SWI/SNF and RSC complexes had negative values in the ARP9 affinity purification (Fig. 6b and Supplementary Data 6 and 7). Again, ARP7, RTT102, and ARP9 are a known module since they are shared by both the SWI/SNF12 and RSC35 complexes.

An alternative approach to visualize positive and negative TopS scores in the INO80 and SWI/SNF datasets is found in Supplementary Figure 8, where six modules are shown in different bait affinity purifications. For example, in the INO80 dataset, the four proteins of the NHP10 module had high positive TopS scores in IES1 affinity purifications but negative values in the ARP5 and ARP4 affinity purifications (Supplementary Figure 8a). In the SWI/SNF dataset, the three proteins of the SNF5, SWP82, and TAF14 module had high TopS values in the SWP82, SNF6, and SNF2 baits, but negative values in the ARP7 bait (Supplementary Figure 8f). Similarly to the analysis of the human DNA repair dataset, modules in INO80 and SWI/SNF complexes showed similar patterns of interactions in different affinity purifications (Supplementary Figure 8), highlighting once again the accuracy of TopS positive and negative sign assignment to protein interactions.

We next evaluated the overlap of our high TopS values with reported crosslinking interactions from INO8034 and SWI/SNF33 (Fig. 6c, d and Supplementary Data 6 and 7). Our results showed a high overlap between crosslinking interactions and protein−bait pairs with high TopS values. We observed 77% and 63% overlap for the INO80 and SWI/SNF complexes, respectively. The direct interactions identified by crosslinking are mostly between subunits located within the same module and TopS identified the majority of these interactions (Fig. 6c, d). The major exceptions were for proteins that are shared between different complexes like RVB1, RVB2, ARP7, ARP9, and RTT102 (Supplementary Data 6 and 7). In these cases, TopS gives the highest scores to other complexes which share these proteins. For example, the module ARP7, ARP9, and RTT102 is shared with the RSC complex and we can see from Fig. 6b that members of the RSC complex are highly enriched in the ARP7 purifications. Overall, we found that TopS values from an analysis of wild-type INO80 and SWI/SNF AP-MS captured the known modularity of the complexes and


[Fig. 6 graphic. Panel a: TDA network with annotated regions; labels include SWI/SNF subunits, RSC complex, ARP8 module, ARP5-IES6 module, NHP10, ARP4-TRA1 interaction, SNF11 interactions, RVB1-RVB2 interactions, and RPT proteins. Panel b: heat map of TopS values on a color scale from Min = –29752 to Max = 29752. Panels c and d: interaction diagrams of the INO80 and SWI/SNF subunits.]

Fig. 6 Topological network and linkage analysis of a yeast chromatin remodeling network. a A TDA network was constructed for the yeast data by inputting TopS scores to the Ayasdi platform24. Distance Correlation was used as a metric with two filter functions: Neighborhood lens 1 and Neighborhood lens 2. Resolution 50 and gain 6x eq. were used. Proteins are colored based on the rows per node. Color bar: red: high values, blue: low values. Node size is proportional to the number of proteins in the node. The locations of protein complex subunits within the topological network are drawn on top of the TDA output. b Hierarchical biclustering using TopS values. Components of the INO80, SWI/SNF, and RSC complexes were clustered using PermutMatrix47. Rows and columns were clustered using Pearson as the distance and Ward linkage as the method. Yellow corresponds to high TopS values and gray indicates negative TopS values. Proteins that are not present in the purifications are shown in black. c, d Comparison of TopS values with crosslinking data. Each plot shows the direct interactions between subunits of c INO80 and d SWI/SNF. A red bidirectional edge represents a reported crosslinking interaction with a high TopS value. Black edges represent reported crosslinking interactions with lower TopS values. TDA, topological data analysis; TopS, topological scoring
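The comparison in panels c and d amounts to splitting reported crosslinked pairs into those with a high TopS value and those with a lower one, then reporting the fraction in the first group. The sketch below illustrates that bookkeeping under stated assumptions: the cutoff for "high" is not given in this section, so `cutoff` is a hypothetical parameter; taking the larger of the two directions (A as prey in the B purification, or B as prey in the A purification) is also an assumption; and the protein names and values are toy data.

```python
def split_crosslinks(crosslinks, scores, cutoff):
    """Split crosslinked pairs by whether either direction reaches `cutoff`.

    scores[prey][bait] holds TopS values; a missing entry is treated as 0,
    matching the convention that undetected proteins score 0.
    """
    high, lower = [], []
    for a, b in crosslinks:
        best = max(scores.get(a, {}).get(b, 0.0),
                   scores.get(b, {}).get(a, 0.0))
        (high if best >= cutoff else lower).append((a, b))
    return high, lower


# Toy values only (hypothetical, not taken from the paper).
toy_scores = {
    "P1": {"P2": 800.0},
    "P2": {"P1": 15.0},
    "P3": {"P1": -120.0},
}
toy_links = [("P1", "P2"), ("P1", "P3")]
high, lower = split_crosslinks(toy_links, toy_scores, cutoff=100.0)
overlap = len(high) / len(toy_links)  # fraction of crosslinks with high TopS
```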

high TopS values correlated with known direct interactions within these protein complexes.

Comparison of TopS to alternative analysis pipelines. Several computational pipelines exist for the analysis of protein interactions generated from quantitative proteomic analyses of affinity purifications. These include the commonly used SAINT7 and CompPASS approaches5. We applied these two pipelines to the human DNA repair and yeast INO80 and SWI/SNF datasets and compared the results to the TopS approach (Supplementary Methods, Supplementary Discussion, Supplementary Figures 9, 10, and Supplementary Data 8–10). First, as expected, we demonstrated that all three approaches effectively determine the components of protein complexes when bait purifications are compared to negative controls (Supplementary Figure 10a). Next, we analyzed the recall of potential direct protein interactions from all three approaches using human DNA repair proteins from BioGRID37 and from the crosslinking-based results of the INO80 and SWI/SNF dataset. In this case, every affinity purification was compared to all other affinity purifications in the dataset, which were considered positive controls (Supplementary Methods, Supplementary Discussion). In all three cases, TopS showed improved recall over SAINT7 and CompPASS5 (Supplementary Discussion and Supplementary Figure 10b). As described earlier, an important feature of TopS is the large range of values it generates, allowing the capture of modules and potentially the direct protein interactions in datasets.

One issue with such computational analyses is that the TopS algorithm may have been tailored to be particularly well-suited to the type of AP-MS datasets generated in our laboratory. To demonstrate the larger capabilities of the TopS algorithm, we reanalyzed an independently generated and previously published protein interaction network of the human polycomb complexome4. CompPASS5 was also used to analyze this polycomb dataset that contains at least two biological replicates for 64 unique baits, for a total of 174 AP-MS runs containing 9853 candidate interactions4. As shown in Fig. 7a, TopS and TDA separated the interactions and revealed the cross-talk between different baits. For example, BAP1 and ASXL1/ASXL2 baits were connected in a cluster on the left side of the topological area, with several interactions showing the same high scores to all three baits in agreement with the observations in the original study4. We also observed proteins that associated only with a single bait, such as SKP1 that linked to CSK interactions, while DSN1, ZN211, and TYY1 were isolated (Fig. 7a and Supplementary Data 11). Although the polycomb dataset contained spectral counts in an intermediate abundance range (Fig. 7b) when compared to our own human and yeast datasets described earlier, TopS recovered the majority of interactions (90% overlap with TopS ≥ 2 and WDN ≥ 1.5) reported in Hauri et al.4.

As described earlier with the human DNA repair and yeast chromatin remodeling datasets, an important feature of the TopS approach is capturing extreme values. A biclustering analysis of selected baits and preys with high TopS values in the polycomb complexome dataset (Fig. 7c) revealed modules within the network including the BAP1, HCFC1, and OGT1 module seen in the ASX1, ASX2, OGT, and BAP1 baits (Fig. 7c). BAP1, HCFC1, and OGT1 are known interacting proteins38–40. Other modules captured in this analysis included a CBX2, PCGF4, RING1, PHC1, and PHC2 module, an EED, EZH2, SUZ12, MTF2, and JARID2 module, which are both known modules within the polycomb system4, and a previously uncharacterized ADNP1, CHAP1, POGZ, and TIF1B module (Fig. 7c). Intriguingly, components of this last module, CHAP1 and POGZ, have been shown to interact and play a role in a rare form of syndromic intellectual disability41. Overall, our TopS analysis of the polycomb complexome dataset4 demonstrates the ability of the approach to rapidly analyze existing quantitative protein interaction network datasets, to generate distinct network visualizations, and to detect modules within these networks and potentially identify direct protein interactions.

Discussion
To predict protein interactions from affinity purifications using quantitative proteomics, we have devised a TopS algorithm for evaluating the preference of each prey protein for a bait relative to other baits. We have built on the concept of TDA24 as applied to protein interaction networks14,25 to devise the TopS approach. Here, we have combined information from row, column, and total distributed spectral counts into this score to differentiate the preference of interaction with baits. This is specifically important for cases where proteins are detected in many runs. We have illustrated the methodology and its advantages through the analysis of two AP-MS datasets: a human DNA repair protein interaction network generated using transient transfections and a yeast chromatin remodeling protein interaction network. TopS values are directly incorporated into clustering and network assembly approaches and provide important insights into three protein interaction networks.

TopS values cover a broad and meaningful negative to positive range. In the human DNA repair proteins dataset, the highest TopS value was calculated for the XRCC6 prey in the XRCC5 affinity purification and these two proteins are known to interact and form a heterodimer13. Furthermore, XRCC5 and XRCC6 are among the most negative TopS values in several other bait purifications in the human DNA repair network dataset. The highest TopS value in the entire yeast chromatin remodeling dataset is ARP5 in the IES6 affinity purification, and again these two proteins are a well-characterized, directly interacting submodule of the INO80 complex14,36. Again, ARP5 and IES6 had negative values in other INO80 bait purifications. Extreme positive TopS values in specific baits are typically reflected as extreme negative values in other bait purifications, even if these bait proteins are part of the same complex. This distinguishing feature of TopS provides important insights into proteins within a complex and suggests potential direct interactions. Furthermore, in the INO80 and SWI/SNF analysis, we were able to capture modularity from wild-type affinity purifications that we previously could only capture with the analysis of affinity purifications in deletion mutant backgrounds12,14 and our results correlated strongly with crosslinking data33,34. A second important feature of TopS is its ability to identify modules within wild-type protein complex datasets.

The TopS platform has several important features. It is easy to implement. There are no parameters or assumptions of a probability distribution in our algorithm. The number of replicates can vary for the baits, where the replicates could be averaged by the user, without severely affecting the results. In addition, TopS is parameter free, distribution free, and independent of reference knowledge. TopS values can serve as a starting point or as a scaffold for other computational methods and visualization tools, and can be directly used in clustering and network analysis approaches. Lastly, multiple datasets can be integrated by merging their TopS scores. For straightforward usage by the community, TopS is implemented as a SHINY application (https://shiny.rstudio.com/). TopS is complementary to many computational pipelines used to process quantitative AP-MS datasets5–8,42,43. TopS can be used in addition to any of these approaches to provide a different perspective on a dataset. The TopS platform can be easily implemented in addition to these approaches to further analyze such datasets. If quantitative values are provided for all the prey


[Fig. 7 graphic. Panel a: TDA network with annotated regions; labels include DSN1, ZN211, TYY1, SKP1, CSK, TRIPC, SMBT1, ZMYM4, and WDR5 interactions, the ASXL1, ASXL2 and BAP1 interactions, the CBX1, CBX3 and CBX5 region, and the INO80/TCP and SIN3, NuRD, NURF, LINC complexes; color bar: rows per node. Panel b: "Histogram of spectral counts", y-axis Frequency (0 to 8000), x-axis spectral counts (0 to 400). Panel c: heat map of TopS values on a color scale from Min = –1767.3 to Max = 1767.3.]

Fig. 7 Topological network analysis of the polycomb complexome. a A TDA network was constructed using the Ayasdi24 platform with TopS values as the input. Within the Ayasdi platform24, Norm correlation was used as the metric with two filter functions, Neighborhood lens 1 and Neighborhood lens 2; the resolution was set to 30 and a gain of 4x eq. was used. Proteins are colored based on the rows per node. Color bar: red: high values, blue: low values. All proteins (408) that have a TopS value of 20 in at least one bait were included here. b A histogram of the spectral counts for 9,853 candidate interactions was plotted. c Hierarchical biclustering on a subset of baits and preys within the dataset. Briefly, preys were included if they had a score of more than 100 in more than one bait, and baits were excluded if none of the 32 proteins that passed these criteria had a score of 50 or higher in the bait. Also, RBBP4, RBBP7, CSK21, and CSK22 were removed since proteins passing the criteria above were only in RBBP4 and RBBP7 or CSK21 and CSK22. Rows and columns were clustered using Euclidean distance and Ward linkage as the method. Yellow corresponds to high TopS values and gray indicates negative scores. TDA, topological data analysis; TopS, topological score
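The prey and bait selection described for panel c can be sketched as a two-step filter. This is an illustrative reading of the caption, not the authors' code: the function name, the `scores[prey][bait]` layout, and the toy values are assumptions, and the additional manual removal of RBBP4, RBBP7, CSK21, and CSK22 is not modeled.

```python
def select_for_biclustering(scores, prey_cutoff=100.0, bait_cutoff=50.0):
    """Apply the panel c inclusion rules to scores[prey][bait] TopS values.

    Step 1: keep preys scoring above `prey_cutoff` in more than one bait.
    Step 2: keep baits in which at least one kept prey scores
            `bait_cutoff` or higher.
    """
    preys = [prey for prey, by_bait in scores.items()
             if sum(value > prey_cutoff for value in by_bait.values()) > 1]
    all_baits = sorted({bait for by_bait in scores.values() for bait in by_bait})
    baits = [bait for bait in all_baits
             if any(scores[prey].get(bait, 0.0) >= bait_cutoff for prey in preys)]
    return preys, baits


# Toy values only (hypothetical, not taken from the paper).
toy = {
    "prey_A": {"bait_1": 150.0, "bait_2": 120.0, "bait_3": 10.0, "bait_4": 20.0},
    "prey_B": {"bait_1": 200.0, "bait_2": 5.0, "bait_3": 60.0, "bait_4": 90.0},
    "prey_C": {"bait_1": -30.0, "bait_2": 101.0, "bait_3": 300.0, "bait_4": -5.0},
}
preys, baits = select_for_biclustering(toy)
```

In the toy data, prey_B exceeds 100 in only one bait and is dropped, which in turn leaves bait_4 without any retained prey at 50 or higher.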

proteins in a given bait, TopS can be used to reanalyze the data, like the recent description of the human polycomb complexome4, for potential direct interactions and/or modules within protein complexes.

Methods
Materials. Magne HaloTag magnetic affinity beads were purchased from Promega (Madison, WI). The following clones from the Kazusa DNA Research Institute (Kisarazu, Chiba, Japan) were used: Halo-WDR76 (FHC25370), Halo-XRCC5 (FHC07775), Halo-XRCC6 (FHC01518), Halo-RPA1 (FHC01462), Halo-RPA2 (FHC11655), Halo-RPA3 (FHC06678), Halo-MSH2 (FHC07773), Halo-MSH3 (FHC12698), Halo-MSH6 (FHC08173), Halo-CBX1 (FHC07438), Halo-CBX3 (FHC02188), Halo-CBX5 (FHC10519), Halo-CBX7 (FHC10535), Halo-CBX8 (FHC01705), Halo-PARP1 (FHC01012), Halo-SPIN1 (FHC10419), and Halo-SSBP1 (FHC07926). Flip-In HEK293T cells were purchased from Thermo Fisher Scientific (Waltham, MA) and authenticated using the ATCC (Manassas, VA) cell line authentication service. Cells are tested quarterly for mycoplasma using the ATCC Mycoplasma detection kit (#30-1012K).

Affinity purification and quantitative proteomic analysis. Human proteins in the pFN21A plasmid with an N-terminal HaloTag were transiently transfected into HEK293T cells, whole cell extracts were prepared, and Halo affinity chromatography was performed on each independent whole cell lysate44. Briefly, 300 μl of whole cell extract diluted with 700 μl Tris Buffered Saline (TBS) was used for purifying Halo-tagged bait complexes using Magne HaloTag magnetic affinity beads (Promega, Madison, WI). The extracts were then incubated for 1 h at 4 °C with beads prepared from 100 μl bead slurry. The beads were then washed four times in buffer containing 50 mM Tris·HCl (pH 7.4), 137 mM NaCl, 2.7 mM KCl, and 0.05% NonidetP40. Bound proteins were eluted by incubating the beads for 2 h at 25 °C in 100 μl buffer containing 50 mM Tris·HCl (pH 8.0), 0.5 mM Ethylenediaminetetraacetic acid (EDTA), 0.005 mM Dithiothreitol (DTT), and two units of AcTEV Protease (Invitrogen, Carlsbad, CA). To remove any traces of affinity resin, the eluates were spun through Micro Bio-Spin columns (BioRad, Hercules, CA). Samples were then processed and analyzed using label-free quantitative proteomic analyses using standard methods19. Briefly, after affinity purification of proteins using Halo affinity chromatography, samples were precipitated with trichloroacetic acid and centrifuged at 21,000 × g for 30 min at 4 °C. The resulting pellet was washed twice with acetone and resuspended in buffer containing 100 mM Tris·HCl (pH 8.5) and 8 M urea. The sample was treated with Tris(2-carboxyethyl)phosphine hydrochloride to reduce disulfide bonds and with chloroacetamide (to prevent bond reformation), and digested with endoproteinase Lys-C for 6 h at 37 °C. After dilution to 2 M urea with buffer, samples were digested overnight with trypsin. Digested samples were then analyzed via multidimensional protein identification technology (MudPIT) on linear ion trap mass spectrometers (LTQ, Thermo Fisher Scientific, Waltham, MA)19. RAW files were converted to the ms2 format using RAWDistiller v. 1.0, an in-house developed software. The ms2 files were subjected to database searching using SEQUEST (version 27 (rev.9))19. Tandem mass spectra of proteins purified were compared against 29,375 nonredundant human protein sequences obtained from the National Center for Biotechnology (2012-08-27 release). Randomized versions of each nonredundant protein entry were included in the databases to estimate the false discovery rates (FDR)19. All SEQUEST searches were performed without peptide end requirements and with a static modification of +57 Da added to cysteine residues to account for carboxamidomethylation, and dynamic searches of +16 Da for oxidized methionine. Spectra/peptide matches were filtered using DTASelect/CONTRAST45. In this dataset, spectrum/peptide matches only passed filtering if they were at least seven amino acids in length and fully tryptic. The DeltCn was required to be at least 0.08, with minimum XCorr values of 1.8 for singly, 2.0 for doubly, and 3.0 for triply charged spectra, and a maximum Sp rank of 10. Proteins that were subsets of others were removed using the parsimony option in DTASelect on the proteins detected after merging all runs. Proteins that were identified by the same set of peptides (including at least one peptide unique to such protein group to distinguish between isoforms) were grouped together, and one accession number was arbitrarily considered as representative of each protein group. Quantitation was performed using label-free spectral counting. The number of spectra identified for each protein was used for calculating the distributed normalized spectral abundance factors (dNSAF)21. NSAF v7 (an in-house developed software) was used to create the final report on all nonredundant proteins detected across the different runs, estimate FDR, and calculate their respective distributed Normalized Spectral Abundance Factor (dNSAF) values.

Topological scoring. Since our approach aims to identify the enrichment of each protein in each bait relative to a collection of baits, overexpression of affinity-tagged bait proteins can diminish the interaction score. With this knowledge, we therefore selected to use a normalization method where the baits are estimated directly from the dataset. To adjust for bait enrichment, we used this approach

$$ y_{ij} = \frac{a\,T_{i\cdot} + b\,T_{\cdot j} - T_{\cdot\cdot}}{(a-1)(b-1)}, \tag{1} $$

where $a$ is the number of columns, $b$ is the number of rows, $T_{i\cdot}$ represents the bait total spectral counts, $T_{\cdot j}$ is the row total spectral counts, and $T_{\cdot\cdot}$ is the total spectral counts in the matrix. Using this approach, the estimated values of the bait proteins are now close to their average spectral counts in all the AP-MS runs of the dataset.

Next, topological scores were calculated as follows. All the data that passed criteria from the QSPEC analysis were used as an input to the TopS determinations. We used a simple model to calculate a score for each prey–bait interaction as follows:

$$ \mathrm{TopS} = Q_{ij}\,\log\!\left(\frac{Q_{ij}}{E_{ij}}\right), \tag{2} $$

where $Q_{ij}$ is the observed spectral count in row $i$ and column $j$; and

$$ E_{ij} = \frac{(\text{row sum } i)\,(\text{column sum } j)}{(\text{table sum})}. \tag{3} $$

The data are treated in the following manner: (1) For each column and row, the sum of spectral counts is calculated; (2) the total number of spectral counts in the dataset is determined; (3) a TopS value for each protein in a bait AP-MS run is determined as described in Eq. (2). If the actual number of spectra of a prey protein in a bait AP-MS run exceeds the number expected from all AP-MS results being analyzed ($Q_{ij} > E_{ij}$), the presumption is that we have a positive interaction preference. Likewise, fewer spectra than expected from the AP-MS runs indicate a negative interaction preference. Proteins that were not detected in a particular run were assigned the value 0. A high positive score suggests a potential direct interaction between bait and prey. A negative score suggests that prey protein and bait interact to form a complex elsewhere in the dataset. We applied this framework to construct signed interaction networks derived from two independent AP-MS datasets obtained for yeast chromatin remodelers and human DNA repair proteins.

Next, TopS takes a numeric data matrix as input where multiple dimensions (e.g. proteins) are measured in multiple observations (e.g. baits/samples). In our case, the numerical values are distributed spectral counts. To make data input easier for the end user, we have defined the input file formats that include row and column annotations and numeric data. Any quantitative value, not only spectral counts, can therefore be utilized. TopS includes an example dataset for testing purposes: the DNA repair dataset. TopS next generates an automatic output to make this type of analysis easier. MS Excel can be used to visualize the output and identify, for example, differences between samples or interactions between proteins and baits. A Pearson correlation map, clustering analyses on initial numeric values (in our case distributed spectral counts), and TopS values are provided to the user as a pdf format output.

Topological data analysis. The input data for TDA are represented in a bait–prey matrix, with each column corresponding to purification of a bait protein and each row corresponding to a prey protein: values are TopS values for each protein. A network of nodes with edges between them is then created using the TDA approach based on the Ayasdi platform (AYASDI Inc., Menlo Park, CA)24. Two types of parameters are needed to generate a topological analysis: First is a measurement of similarity, called metric, which measures the distance between two points in space (i.e. between rows in the data). Second are lenses, which are real valued functions on the data points. Lenses could come from statistics (mean, max, min), from geometry (centrality, curvature), and machine learning (PCA/SVD, Autoencoders, Isomap). In the next step the data are partitioned. Lenses are used to create overlapping bins in the dataset, where the bins are preimages under the lens of an interval. Overlapping families of intervals are used to create overlapping bins in the data. Metrics are used with lenses to construct the output. There are two parameters used in defining the bins. One is resolution, which determines the number of bins; higher resolution means more bins. The second is gain, which determines the degree of overlap of the intervals. Once the bins are constructed, we perform a clustering step on each bin, using single linkage clustering with a fixed heuristic for the choice of the scale parameter. This gives a family of clusters within the data, which may overlap, and we will construct a network with one node for each such cluster, and we connect two nodes if the corresponding clusters contain a data point in common. Norm Correlation was used as a distance metric with two filter functions: Neighborhood lens 1 and Neighborhood lens 2. Resolution 30 and gain 3 were used to generate Fig. 3a, and resolution 50 and gain 6x eq. were used to generate Fig. 6a. Neighborhood lens 1 and Neighborhood lens 2 with the resolution 30 and a gain 4x eq. were used to generate Fig. 7a.

Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Code availability. TopS is written using Shiny application (R package version 3.4.2) for R statistics software. TopS uses several packages, including gplots, devtools and gridExtra. TopS is freely available at https://github.com/WashburnLab/Topological-score-TopS.
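The three-step procedure given under Topological scoring (row and column sums, table sum, then Eq. (2) with the expected value of Eq. (3)) can be sketched in a few lines. This is a minimal illustration and not the released Shiny implementation: the function name, the list-of-lists input layout (rows as preys, columns as baits), and the toy matrix are assumptions, and the natural logarithm is assumed because the text writes only "log". Undetected preys (count 0) are given a score of 0, as stated in the Methods.

```python
import math


def tops_scores(counts):
    """Compute Q_ij * log(Q_ij / E_ij) for a prey-by-bait count matrix.

    counts: list of rows (preys), each a list of non-negative spectral
    counts, one per bait column. E_ij = row_sum_i * col_sum_j / table_sum.
    """
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    table_sum = sum(row_sums)
    scores = []
    for i, row in enumerate(counts):
        scored_row = []
        for j, observed in enumerate(row):
            if observed == 0:
                scored_row.append(0.0)  # prey not detected in this run
            else:
                expected = row_sums[i] * col_sums[j] / table_sum
                scored_row.append(observed * math.log(observed / expected))
        scores.append(scored_row)
    return scores


# Toy matrix (hypothetical): 2 preys x 2 baits.
toy_counts = [[10, 0],
              [5, 5]]
toy_scores = tops_scores(toy_counts)
```

In the toy matrix the second prey is split evenly between the two baits, but because the second bait column is much smaller overall, its count there is above expectation (positive score) while its count in the first bait is below expectation (negative score). This is the sign behavior the Methods describe.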


Data availability
All the DNA repair data used here are deposited at https://massive.ucsd.edu/ with MassIVE ID # MSV000081377. The data used for the analysis of the S. cerevisiae INO80 and SWI/SNF chromatin remodeling complexes are from previous studies on INO80 (MSV000079138) (ref. 14) and SWI/SNF (MSV000081417) (ref. 12). Original data underlying this manuscript can be accessed from the Stowers Original Data Repository at http://www.stowers.org/research/publications/libpb-1280.

Received: 28 February 2018 Accepted: 22 February 2019

References
1. Hein, M. Y. et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163, 712–723 (2015).
2. Huttlin, E. L. et al. Architecture of the human interactome defines protein communities and disease networks. Nature 545, 505–509 (2017).
3. Sardiu, M. E. et al. Probabilistic assembly of human protein interaction networks from label-free quantitative proteomics. Proc. Natl Acad. Sci. USA 105, 1454–1459 (2008).
4. Hauri, S. et al. A high-density map for navigating the human polycomb complexome. Cell Rep. 17, 583–595 (2016).
5. Sowa, M. E., Bennett, E. J., Gygi, S. P. & Harper, J. W. Defining the human interaction landscape. Cell 138, 389–403 (2009).
6. Choi, H., Fermin, D. & Nesvizhskii, A. I. Significance analysis of spectral count data in label-free shotgun proteomics. Mol. Cell. Proteom. 7, 2373–2385 (2008).
7. Choi, H. et al. SAINT: probabilistic scoring of affinity purification-mass spectrometry data. Nat. Methods 8, 70–73 (2011).
8. Titeca, K. et al. SFINX: straightforward filtering index for affinity purification-mass spectrometry data analysis. J. Proteome Res. 15, 332–338 (2016).
9. Jeronimo, C. & Robert, F. The mediator complex: at the nexus of RNA polymerase II transcription. Trends Cell Biol. 27, 765–783 (2017).
10. Helmlinger, D. & Tora, L. Sharing the SAGA. Trends Biochem. Sci. 42, 850–861 (2017).
11. Lee, K. K. et al. Combinatorial depletion analysis to assemble the network architecture of the SAGA and ADA chromatin remodeling complexes. Mol. Syst. Biol. 7, 503 (2011).
12. Dutta, A. et al. Composition and function of mutant Swi/Snf complexes. Cell Rep. 18, 2124–2134 (2017).
13. Rivera-Calzada, A., Spagnolo, L., Pearl, L. H. & Llorca, O. Structural model of full-length human Ku70-Ku80 heterodimer and its recognition of DNA and DNA-PKcs. EMBO Rep. 8, 56–62 (2007).
14. Sardiu, M. E., Gilmore, J. M., Groppe, B., Florens, L. & Washburn, M. P. Identification of topological network modules in perturbed protein interaction networks. Sci. Rep. 7, 43845 (2017).
15. Branzei, D. & Foiani, M. Regulation of DNA repair throughout the cell cycle. Nat. Rev. Mol. Cell Biol. 9, 297–308 (2008).
16. Jensen, N. M. et al. An update on targeted gene repair in mammalian cells: methods and mechanisms. J. Biomed. Sci. 18, 10 (2011).
17. Wood, R. D., Mitchell, M. & Lindahl, T. Human DNA repair genes. Mutat. Res. 577, 275 (2005).
18. Dabin, J., Fortuny, A. & Polo, S. E. Epigenome maintenance in response to DNA damage. Mol. Cell 62, 712–727 (2016).
19. Gilmore, J. M. et al. WDR76 co-localizes with heterochromatin related proteins and rapidly responds to DNA damage. PLoS ONE 11, e0155492 (2016).
20. Gibson, T. J., Seiler, M. & Veitia, R. A. The transience of transient overexpression. Nat. Methods 10, 715–721 (2013).
21. Zhang, Y., Wen, Z., Washburn, M. P. & Florens, L. Refinements to label free proteome quantitation: how to deal with peptides shared by multiple proteins. Anal. Chem. 82, 2272–2281 (2010).
22. Kamburov, A., Stelzl, U., Lehrach, H. & Herwig, R. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res. 41, D793–D800 (2013).
23. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
24. Lum, P. Y. et al. Extracting insights from the shape of complex data using topology. Sci. Rep. 3, 1236 (2013).
25. Sardiu, M. E. et al. Conserved abundance and topological features in chromatin-remodeling protein interaction networks. EMBO Rep. 16, 116–126 (2015).
26. Gallina, I. et al. Cmr1/WDR76 defines a nuclear genotoxic stress body linking genome integrity and protein quality control. Nat. Commun. 6, 6533 (2015).
27. Mellacheruvu, D. et al. The CRAPome: a contaminant repository for affinity purification-mass spectrometry data. Nat. Methods 10, 730–736 (2013).
28. Imbeault, M., Helleboid, P. Y. & Trono, D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 543, 550–554 (2017).
29. Moriyama, T. et al. Identification and characterization of a nuclear localization signal of TRIM28 that overlaps with the HP1 box. Biochem. Biophys. Res. Commun. 462, 201–207 (2015).
30. White, D. et al. The ATM substrate KAP1 controls DNA repair in heterochromatin: regulation by HP1 proteins and serine 473/824 phosphorylation. Mol. Cancer Res. 10, 401–414 (2012).
31. Zhang, B., Kirov, S. & Snoddy, J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res. 33, W741–W748 (2005).
32. Malogolowkin, M. et al. Treatment of Wilms tumor relapsing after initial treatment with vincristine, actinomycin D, and doxorubicin. A report from the National Wilms Tumor Study Group. Pediatr. Blood Cancer 50, 236–241 (2008).
33. Sen, P. et al. Loss of Snf5 induces formation of an aberrant SWI/SNF Complex. Cell Rep. 18, 2135–2147 (2017).
34. Tosi, A. et al. Structure and subunit topology of the INO80 chromatin remodeler and its complex. Cell 154, 1207–1219 (2013).
35. Baetz, K. K., Krogan, N. J., Emili, A., Greenblatt, J. & Hieter, P. The ctf13-30/CTF13 genomic haploinsufficiency modifier screen identifies the yeast chromatin remodeling complex RSC, which is required for the establishment of sister chromatid cohesion. Mol. Cell. Biol. 24, 1232–1244 (2004).
36. Yao, W. et al. The INO80 complex requires the Arp5-Ies6 subcomplex for chromatin remodeling and metabolic regulation. Mol. Cell. Biol. 36, 979–991 (2016).
37. Chatr-Aryamontri, A. et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 43, D470–D478 (2015).
38. Bhuiyan, T., Waridel, P., Kapuria, V., Zoete, V. & Herr, W. Distinct OGT-binding sites promote HCF-1 cleavage. PLoS ONE 10, e0136636 (2015).
39. Moon, S., Lee, Y. K., Lee, S. W. & Um, S. J. Suppressive role of OGT-mediated O-GlcNAcylation of BAP1 in retinoic acid signaling. Biochem. Biophys. Res. Commun. 492, 89–95 (2017).
40. Machida, Y. J., Machida, Y., Vashisht, A. A., Wohlschlegel, J. A. & Dutta, A. The deubiquitinating enzyme BAP1 regulates cell growth via interaction with HCF-1. J. Biol. Chem. 284, 34179–34188 (2009).
41. Isidor, B. et al. De novo truncating in the kinetochore-microtubules attachment gene CHAMP1 cause syndromic intellectual disability. Hum. Mutat. 37, 354–358 (2016).
42. Li, X. et al. Defining the protein−protein interaction network of the human protein tyrosine phosphatase family. Mol. Cell. Proteom. 15, 3030–3044 (2016).
43. Verschueren, E. et al. Scoring large-scale affinity purification mass spectrometry datasets with MiST. Curr. Protoc. Bioinforma. 49, 1–16 (2015).
44. Banks, C. A., Boanca, G., Lee, Z. T., Florens, L. & Washburn, M. P. Proteins interacting with cloning scars: a source of false positive protein-protein interactions. Sci. Rep. 5, 8530 (2015).
45. Tabb, D. L., McDonald, W. H. & Yates, J. R. 3rd DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 1, 21–26 (2002).
46. Metsalu, T. & Vilo, J. ClustVis: a web tool for visualizing clustering of multivariate data using Principal Component Analysis and heatmap. Nucleic Acids Res. 43, W566–W570 (2015).
47. Caraux, G. & Pinloche, S. PermutMatrix: a graphical environment to arrange profiles in optimal linear order. Bioinformatics 21, 1280–1281 (2005).

Acknowledgements
Research reported in this publication was supported by the Stowers Institute for Medical Research and the National Institute of General Medical Sciences of the National Institutes of Health under Award Number RO1GM112639 to M.P.W. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author contributions
M.E.S. and M.P.W. designed the experiments. J.M.G., B.D.G. and A.D. performed the experiments. M.E.S. performed computational analyses of data. M.E.S., L.F. and M.P.W. wrote the manuscript.


Additional information
Supplementary Information accompanies this paper at https://doi.org/10.1038/s41467-019-09123-y.

Competing interests: The authors declare no competing interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Journal peer review information: Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2019
