Supporting Information

Malovannaya et al. 10.1073/pnas.0912599106

SI Text

We have defined four categories of specificity filters for the immunoprecipitation followed by mass spectrometry (IP/MS) data as follows:

(i) Three filters [input (IN), loose (LP), and packed (PP) precipitates] compare the composition of an IP with that of the extract and of the protein precipitates that form during the IP procedure. For these semiquantitative composition enrichment filters, the approximate fractional contribution of each protein to either of the fractions or to an IP experiment was defined as $\mathrm{FC}_i = \mathrm{SPA}_i / \sum_{j=1}^{n} \mathrm{SPA}_j$, where $\mathrm{SPA}_i$ (spectral abundance) $= \mathrm{SPC}_i$ (spectral counts) $/ \mathrm{MW}_i$ (molecular weight), and $n$ = total number of nonredundant protein identifications in a given experiment. The enriched identifications were defined as proteins with $\mathrm{FC}_{i(\mathrm{IP})} \geq k \times \mathrm{FC}_{i(\mathrm{IN,\,LP,\,or\,PP})}$, where $k$ is a threshold multiplier. We examined several different thresholds, ranging from $k = 3$ to $k = 100$ for each of these filters, by manually evaluating the protein identifications that were "flagged" as nonspecific. For our dataset, $\mathrm{FC}_{i(\mathrm{IP})} \geq 5 \times \mathrm{FC}_{i(\mathrm{IN,\,LP,\,or\,PP})}$ ($k = 5$) is effective at pinpointing nonspecificity, particularly when used in combination with the $E_{\mathrm{cutoff}}$ filter described below.

(ii) The purpose of the SPC distribution $E_{\mathrm{cutoff}}$ filter is to differentiate background, or "noise," identifications from enriched proteins by examining the levels at which each protein appears across all IPs in our dataset. For the $E_{\mathrm{cutoff}}$ filter, a standard statistical outlier test was applied as follows: for each protein, the interquartile range (IQR) was calculated as the difference between the 25th and 75th percentile SPCs in the SPC distribution across IPs, and the extreme upper outlier threshold was defined as the cutoff for the specific identification level, $E_{\mathrm{cutoff}}(i) = (\text{75th percentile } \mathrm{SPC}_i) + (3 \times \mathrm{IQR}_i)$. By this definition, proteins that are present in <25% of our IPs are always specific; more frequent proteins are omitted from IPs only when present at background levels. We find these constraints sufficient to date.

(iii) Keratins, trypsin, and immunoglobulins are introduced during the IP/MS procedure. These identifications are deleted from the results.

(iv) For 3N analysis, a group of ribosomal, heat-shock, and cytoskeletal protein variants was omitted in addition to the dynamic composition and distribution filters described above, for two reasons: (i) results from automated searches, such as SeQuest, often contain ambiguous identifications for these categories of proteins because of high homology among a multitude of isoforms, which decreases the accuracy of gene product-based dynamic filters for this particular list; (ii) often, the same set of heat-shock chaperone proteins specifically associates with different protein complexes that otherwise do not share biological functions; inclusion of these proteins in 3N analysis causes merging of functionally unrelated protein complexes, which is not desirable. Information about specific occurrences of heat-shock binding can be retrieved from the original experimental data, separately from 3N analysis.
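The composition enrichment filter in (i) and the distribution filter in (ii) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the dictionary-based inputs, the treatment of a protein absent from a reference fraction as FC = 0, the percentile interpolation method, and the strict "greater than" comparison against the cutoff are all assumptions made here for illustration.

```python
import statistics


def fractional_contributions(spc, mw):
    """FC_i = SPA_i / sum_j SPA_j, with SPA_i = SPC_i / MW_i.

    spc and mw are hypothetical dicts keyed by GeneID for one experiment.
    """
    spa = {gene: spc[gene] / mw[gene] for gene in spc}
    total = sum(spa.values())
    return {gene: value / total for gene, value in spa.items()}


def enriched(fc_ip, fc_ref, k=5.0):
    """GeneIDs with FC(IP) >= k * FC(reference fraction: IN, LP, or PP).

    Assumption: a protein missing from the reference has FC = 0 there,
    so it always counts as enriched.
    """
    return {gene for gene, fc in fc_ip.items() if fc >= k * fc_ref.get(gene, 0.0)}


def e_cutoff(spcs_across_ips):
    """E_cutoff = 75th percentile SPC + 3 * IQR for one protein.

    spcs_across_ips holds one SPC per IP, with 0 where the protein is absent.
    Assumption: quartiles use the 'inclusive' interpolation method.
    """
    q1, _median, q3 = statistics.quantiles(spcs_across_ips, n=4, method="inclusive")
    return q3 + 3 * (q3 - q1)


def is_specific(spc, cutoff):
    """Assumption: an identification is specific when its SPC exceeds the cutoff."""
    return spc > cutoff
```

With these assumptions, a protein detected in 2 of 10 IPs has both quartiles equal to 0, so its cutoff is 0 and every detection passes, which matches the statement that proteins present in <25% of IPs are always specific.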

Malovannaya et al. www.pnas.org/cgi/doi/10.1073/pnas.0912599106 1 of 3

a) 3N

INPUT: seed protein (corresponding GeneID)

Top IP and Repeat Constraints
    in filtered IP/MS dataset, find experiments where seed protein SPA 0.05
    restrict experiment list to max of 2 repeats for each antibody
    (1) sort experiments by seed protein SPCs (high to low)
    (2) restrict experiment list to min of 5 and max of 15 IPs

Cooccurrence and Correlation Constraints
    compile matrix of all specific protein identifications (GeneIDs) in Top IPs
    restrict GeneID list by co-occurrence with seed protein: co-occurrence number of Top IPs ^(0.6)
    restrict GeneID list to interactants with <65° angle to the seed

OUTPUT: 3N list for the seed protein
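The 3N logic in panel a can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. The comparison operators for the SPA threshold and the co-occurrence constraint are not legible in the schematic, so ">=" is assumed for both. The experiment records (dicts with hypothetical keys "antibody", "spc", and "spa"), the choice to sort before limiting repeats per antibody, and the use of SPC vectors across the Top IPs to compute the angle are also assumptions.

```python
import math
from collections import defaultdict


def angle_deg(u, v):
    """Angle in degrees between two equal-length vectors (90 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 90.0
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def three_n(seed, experiments, spa_min=0.05, max_repeats=2,
            min_ips=5, max_ips=15, cooc_exp=0.6, max_angle=65.0):
    """Return a sorted 3N list of GeneIDs for `seed`, or [] if too few Top IPs."""
    # Experiments in which the seed passes the SPA threshold (">=" is assumed).
    hits = [e for e in experiments if e["spa"].get(seed, 0.0) >= spa_min]
    # Sort by seed SPC, high to low, so the strongest repeats per antibody are kept.
    hits.sort(key=lambda e: e["spc"].get(seed, 0), reverse=True)
    per_antibody = defaultdict(int)
    top = []
    for e in hits:
        if per_antibody[e["antibody"]] < max_repeats:
            per_antibody[e["antibody"]] += 1
            top.append(e)
    top = top[:max_ips]
    if len(top) < min_ips:
        return []
    # SPC matrix: one row per GeneID, one column per Top IP.
    genes = set().union(*(e["spc"].keys() for e in top))
    seed_vec = [e["spc"].get(seed, 0) for e in top]
    min_cooc = len(top) ** cooc_exp  # ">=" against this value is assumed
    result = []
    for gene in genes:
        if gene == seed:
            continue
        vec = [e["spc"].get(gene, 0) for e in top]
        cooc = sum(1 for s, v in zip(seed_vec, vec) if s > 0 and v > 0)
        if cooc >= min_cooc and angle_deg(seed_vec, vec) < max_angle:
            result.append(gene)
    return sorted(result)
```

With six Top IPs, the co-occurrence threshold under these assumptions is 6^0.6, roughly 2.9, so a candidate must appear alongside the seed in at least three of them before the angle test is applied.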

b) Core

INPUT: seed protein (corresponding GeneID)

    in a calculated set of all 3Ns, find 3N for seed protein
    find available reciprocal 3N list for each of the NNs
    sort by frequency of NN proteins in reciprocal 3N lists
    omit least frequent NN protein(s), if any
    omissions were made? YES: repeat; NO: proceed to output

OUTPUT: core complex module for the seed protein

Fig. S1. Schematic of (A) 3N and (B) core complex cluster logic. NN = near neighbor.
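One possible reading of the core complex logic in panel B is sketched below in Python. The schematic does not state what "least frequent" is measured against or where the loop re-enters, so the rule used here is an assumption: in each pass, the near neighbors with the lowest count across the available reciprocal 3N lists are dropped, and the loop stops once all remaining near neighbors are equally frequent. The `all_3n` mapping (GeneID to its 3N list), the choice to count each near neighbor as a member of its own list, and the decision to always keep the seed are likewise hypothetical.

```python
from collections import Counter


def core_module(seed, all_3n):
    """Return a sorted core module for `seed` from a dict of precomputed 3N lists."""
    # Near neighbors (NNs) of the seed; the seed itself is always kept.
    nn = set(all_3n.get(seed, [])) - {seed}
    while True:
        # Reciprocal 3N lists that are available for the current NNs.
        # Assumption: each NN also counts as a member of its own list.
        reciprocal = [set(all_3n[p]) | {p} for p in nn if p in all_3n]
        counts = Counter()
        for members in reciprocal:
            counts.update(members & nn)
        freqs = {p: counts[p] for p in nn}
        # No omissions possible: NNs are all equally frequent (or none remain).
        if len(set(freqs.values())) <= 1:
            return sorted(nn | {seed})
        lowest = min(freqs.values())
        # Omit the least frequent NN protein(s) and repeat.
        nn = {p for p in nn if freqs[p] > lowest}
```

For example, with `all_3n = {"S": ["A", "B", "C"], "A": ["S", "B"], "B": ["S", "A"], "C": ["S", "X"]}`, protein C appears in only one reciprocal list while A and B each appear in two, so C is omitted in the first pass and the loop then stops.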

[Figure S2 panels: (a) CHD4-containing cores; (b) SIN3A-containing cores; (c) KDM1-containing cores. Axes in each panel: Core Complex Seed Proteins and NN Symbol.]

Fig. S2. 3N core complex components were assigned for each of the denoted seed proteins for the three HDAC-containing modules (CHD4, SIN3A, KDM1), and core module subunits were sorted by their copresence across multiple 3N clusters, analogous to the Mediator complex analysis shown in Fig. 3B. NN = near neighbor.

Other Supporting Information Files

Table S1 (PDF)
Table S2 (PDF)
Table S3 (XLS)
Table S4 (XLS)
Table S5 (XLS)
Table S6 (XLS)
Table S7 (XLS)
