ARTICLES

Network modeling links breast cancer susceptibility and centrosome dysfunction

Miguel Angel Pujana1,2,16,17, Jing-Dong J Han1,2,16,17, Lea M Starita3,16,17, Kristen N Stevens4,17, Muneesh Tewari1,2,16, Jin Sook Ahn1,2, Gad Rennert5, Víctor Moreno6,7, Tomas Kirchhoff8, Bert Gold9, Volker Assmann10, Wael M ElShamy2, Jean-François Rual1,2, Douglas Levine8, Laura S Rozek6, Rebecca S Gelman11, Kristin C Gunsalus12, Roger A Greenberg2, Bijan Sobhian2, Nicolas Bertin1,2, Kavitha Venkatesan1,2, Nono Ayivi-Guedehoussou1,2,16, Xavier Solé7, Pilar Hernández13, Conxi Lázaro13, Katherine L Nathanson14, Barbara L Weber14, Michael E Cusick1,2, David E Hill1,2, Kenneth Offit8, David M Livingston2, Stephen B Gruber4,6,15, Jeffrey D Parvin3,16 & Marc Vidal1,2

Many cancer-associated genes remain to be identified to clarify the underlying molecular mechanisms of cancer susceptibility and progression. Better understanding is also required of how mutations in cancer genes affect their products in the context of complex cellular networks. Here we have used a network modeling strategy to identify genes potentially associated with higher risk of breast cancer. Starting with four known genes encoding tumor suppressors of breast cancer, we combined expression profiling with functional genomic and proteomic (or ‘omic’) data from various species to generate a network containing 118 genes linked by 866 potential functional associations. This network shows higher connectivity than expected by chance, suggesting that its components function in biologically related pathways. One of the components of the network is HMMR, encoding a centrosome subunit, for which we demonstrate previously unknown functional associations with the breast cancer–associated gene BRCA1. Two case-control studies of incident breast cancer indicate that the HMMR locus is associated with higher risk of breast cancer in humans. Our network modeling strategy should be useful for the discovery of additional cancer-associated genes.

Combinations of mutated and/or aberrantly expressed tumor suppressor genes and oncogenes, or ‘cancer genes’, are thought to be responsible for most steps of cancer progression. Although fundamental principles have emerged from the study of known cancer genes and their products, many questions remain unanswered. Notably, most cancer genes remain to be identified1. In addition, it is becoming increasingly clear that most genes and their products interact in complex cellular networks, the properties of which might be altered in cancer cells as compared with their unaffected counterparts2. Achieving a deeper understanding of cancer molecular mechanisms

1Center for Cancer Systems Biology (CCSB) and 2Department of Cancer Biology, Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, 44 Binney St., Boston, Massachusetts 02115, USA. 3Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, 77 Louis Pasteur Ave., Boston, Massachusetts 02115, USA. 4Department of Epidemiology, University of Michigan, 109 Zina Pitcher Pl., Ann Arbor, Michigan 48109, USA. 5CHS National Cancer Control Center, Department of Community Medicine and Epidemiology, Carmel Medical Center and Bruce Rappaport Faculty of Medicine, Technion, Haifa 34362, Israel. 6Department of Internal Medicine, University of Michigan, 109 Zina Pitcher Pl., Ann Arbor, Michigan 48109, USA. 7Department of Epidemiology and Cancer Registry, and Translational Research Laboratory, Catalan Institute of Oncology, IDIBELL, Gran Vía km 2.7, L’Hospitalet, Barcelona 08907, Spain. 8Clinical Genetics Service, Department of Medicine, Memorial Sloan-Kettering Cancer Center, 1275 York Ave., New York, New York 10021, USA. 9National Cancer Institute, Human Genetics Section, Laboratory of Genomic Diversity, Frederick, Maryland 21702, USA. 10Center for Experimental Medicine, Institute of Tumor Biology, University Hospital Hamburg–Eppendorf, Martinistrasse 52, Hamburg 20246, Germany. 11Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, 44 Binney St., Boston, Massachusetts 02115, USA. 12Center for Comparative Functional Genomics, Department of Biology, New York University, 100 Washington Square East, New York, New York 10003, USA. 13Translational Research Laboratory, Catalan Institute of Oncology, IDIBELL, Gran Vía km 2.7, L’Hospitalet, Barcelona 08907, Spain. 14Abramson Family Cancer Research Institute, University of Pennsylvania School of Medicine, 421 Curie Blvd., Philadelphia, Pennsylvania 19104, USA.
15Department of Human Genetics, University of Michigan, 109 Zina Pitcher Pl., Ann Arbor, Michigan 48109, USA. 16Present addresses: Bioinformatics and Biostatistics Unit, Translational Research Laboratory, Catalan Institute of Oncology, IDIBELL, Gran Vía km 2.7, L’Hospitalet, Barcelona 08907, Spain (M.A.P.); Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Datun Rd., Beijing 100101, China (J.-D.J.H.); Department of Genome Sciences, University of Washington, 1705 NE Pacific St., Seattle, Washington 98195, USA (L.M.S.); Human Biology Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. North, Seattle, Washington 98109, USA (M.T.); Harvard School of Public Health, Boston, Massachusetts 02115, USA (N.A.-G.); Department of Biomedical Informatics, Ohio State University Medical Center, 460 West 12th Ave., Columbus, Ohio 43210, USA (J.D.P.). 17These authors contributed equally to this work. Correspondence should be addressed to M.V. ([email protected]), J.D.P. ([email protected]) or S.B.G. ([email protected]).

Received 31 March; accepted 2 August; published online 7 October 2007; doi:10.1038/ng.2007.2

NATURE GENETICS VOLUME 39 | NUMBER 11 | NOVEMBER 2007

may therefore require global strategies aimed at modeling the functional interrelationships between genes and/or proteins (genes/proteins) as complex interdependent networks3.

Here we propose a complementary approach to systematic resequencing efforts4–6 for identifying cancer genes and/or proteins. This approach is based on global network modeling of functional associations between potential cancer genes and their products.

RESULTS

A network modeling strategy

Macromolecular networks can be modeled on the basis of both global correlations observed among transcriptional profiling compendia, protein-protein interaction or ‘interactome’ networks, and genome-wide phenotypic profiling data sets7, and comparisons of ‘interolog’ data sets from different organisms8. We combined these two strategies to generate models of macromolecular networks that are possibly perturbed in human cancer, starting from four known breast cancer–associated genes and their products: BRCA1 and BRCA2 (both identified by high-penetrance mutations)9–11 and ATM and CHEK2 (both identified by low-penetrance mutations)12,13. These are referred to hereafter as the ‘reference genes/proteins’ (Fig. 1). The strategy first integrates coexpression profiles in human tissues and then integrates functional associations derived from various functional genomic and proteomic, or ‘omic’, data sets obtained in both humans and model organisms. This integrated network modeling strategy provides a ranking system to classify potential network components from low to high likelihood; the components are then functionally and genetically tested.

Coexpression profiling

To initiate our search for genes that are likely to be functionally associated with the reference genes/proteins, we used a data set containing transcript abundance measurements for 9,214 human genes in 101 samples originating from 43 healthy human tissues or organs and three cell lines14. We determined Pearson correlation coefficient (PCC) values between each of the four reference genes and all of the genes tested on the array. To determine the likelihood of predicting functional associations by this coexpression approach, we curated published data from the scientific literature (until 1 October 2004) on protein interactions involving any of the four reference human proteins. The resulting ‘literature interaction’ (LIT-Int) network contains 103 proteins and 129 functional associations (Fig. 2a and Supplementary Table 1 online). We determined that a PCC value of > 0.4 captures 36% of the LIT-Int functional associations (Fig. 2b).
Beyond this threshold, the LIT-Int pairs are enriched between two- and tenfold (depending on the control set used) in coexpressed pairs as compared with 10,000 gene pairs generated randomly from the same data set.

To identify potential functional associations involving all four reference genes, we focused on those transcripts found in the ‘expression intersection’ (XPRSS-Int) of the four coexpression sets. The XPRSS-Int contains 164 genes, of which 15 are also present in the LIT-Int data set (Fig. 2c and Supplementary Table 2 online). To evaluate the significance of the XPRSS-Int set, we generated 200 randomly chosen sets of four genes, calculated their transcriptional PCCs with each gene in the array and measured their coexpression intersection by using PCC > 0.4 as the cutoff. This simulation showed that 88% (176/200) of the randomly generated sets do not overlap in coexpression at any level and that, at most, their intersection contains 20 genes (Fig. 2d). These results indicate that the identification of 164 genes in the XPRSS-Int set could not be expected by chance (empirical P < 0.005).

Figure 1 Outline for the generation of a BCN model. [Flowchart labels: initial human gene set (9,214); expression profiling in human tissues (similar expression profiles); XPRSS-Int (164); expression changes in breast tumors (human; upregulated or downregulated expression level, tumor types); phenotypic profiling (similar phenotype profiles), genetic interactions (suppression, synthetic lethal) and protein interactions (binary, complex memberships, biochemical) in human and model organisms; network around ATM, BRCA1, BRCA2 and CHEK2; cancer risk.]

We demonstrated that XPRSS-Int genes are functionally related to the four reference genes by examining three types of shared characteristics. First, the XPRSS-Int set showed an enrichment of Gene Ontology (GO) terms also present in the annotations for the reference genes (Supplementary Table 3 online). Second, evolutionary conservation of coexpression patterns was observed between orthologs of XPRSS-Int and reference genes, corresponding to 52 XPRSS-Int genes


(Supplementary Fig. 1 online). Third, using an expression data set for the analysis of breast tumor cell lines treated with various agonists or antagonists of mammary cell growth and differentiation15, we observed significant coexpression among 33 XPRSS-Int genes (Supplementary Table 2).

Expression changes in breast tumors

To evaluate further the functional significance of the XPRSS-Int set, we reasoned that many XPRSS-Int genes might show expression changes in breast tumors arising from mutations in one of the reference genes. We compared the expression of each XPRSS-Int gene in breast tumors from individuals with BRCA1 germline mutations (BRCA1mut) with that in ‘sporadic’ breast tumors16 (typically germline wild type (BRCA1wt)). Of the 132 XPRSS-Int single-probe genes (see Methods), 50 were upregulated and 9 were downregulated, as compared with an average of 18 upregulated and 22 downregulated genes in randomly generated sets (P < 0.01; Fig. 3, top right, and Supplementary Table 4 online).

A total of 66 XPRSS-Int genes showing expression changes in BRCA1mut tumors are shown as nodes in a coexpression network in which the length of the links, or ‘edges’, is inversely proportional to the expression correlation (PCC value) for each reference gene (Fig. 3,

[Figure 2 graphic. (a) LIT-Int network: 103 proteins (reference proteins and LIT-Int proteins) and 129 functional associations; edge types are biochemical interaction, binary protein interaction, protein complex memberships and their combinations. (b) Probability density versus PCC (–1.0 to 1.0) for BRCA1 – random, BRCA2 – random, ATM – random, CHEK2 – random, random – random and LIT-Int gene pairs. (c) Venn diagram of the four reference coexpression sets with the 164-gene XPRSS-Int at the center; LIT-Int genes in the XPRSS-Int: AURKA, BLM, CCNA2, CDC2, CSNK2A1, MRE11A, MSH2, PRKDC, RB1, RBBP4, RBBP8, RFC4, RPA1, SMC1L1 and STAT5A. (d) Number of trials versus gene set size (0–160) for random sets, with the XPRSS-Int marked at 164.]

Figure 2 Generation of the XPRSS-Int data set. (a) LIT-Int network for ATM, BRCA1, BRCA2 and CHEK2. Proteins are represented by nodes and functional associations by edges, as indicated in the insets (Supplementary Table 1 and Supplementary Methods). Multiple functional associations are indicated by blended line colors. (b) Probability density distributions of transcriptional PCC values between gene pairs for each of the four reference genes and randomly selected genes through 10,000 iterations (BRCA gene – random), for randomly selected gene pairs through 10,000 iterations (random – random), and for gene pairs from the LIT-Int data set. (c) Expression intersection (XPRSS-Int) of the four reference coexpression sets using PCC > 0.4. LIT-Int genes included in the XPRSS-Int set are shown. (d) Distribution of the coexpression intersection for randomly chosen sets of four genes and comparison with the XPRSS-Int set using PCC > 0.4.
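The coexpression intersection in panel c can be illustrated with a minimal sketch. It assumes each gene has one expression profile across the same samples, and it uses the PCC > 0.4 cutoff stated in the text. The function names, gene labels and profile values below are hypothetical and chosen only for illustration; this is not the authors' code.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / math.sqrt(vx * vy)

def coexpressed_with(ref, profiles, cutoff=0.4):
    """Genes whose PCC with one reference gene exceeds the cutoff."""
    return {gene for gene, profile in profiles.items()
            if gene != ref and pearson(profiles[ref], profile) > cutoff}

def expression_intersection(refs, profiles, cutoff=0.4):
    """Genes coexpressed with every reference gene (an XPRSS-Int-like set)."""
    sets = [coexpressed_with(ref, profiles, cutoff) for ref in refs]
    return set.intersection(*sets) - set(refs)

# Hypothetical toy data: six samples per gene, four stand-in reference genes.
profiles = {
    "REF_A": [1, 2, 3, 4, 5, 6],
    "REF_B": [2, 4, 6, 8, 10, 12],
    "REF_C": [1, 2, 3, 4, 5, 7],
    "REF_D": [0, 2, 3, 4, 5, 6],
    "GENE_X": [1.5, 2.5, 3, 4.5, 5, 6.5],  # rises with all references
    "GENE_Y": [6, 5, 4, 3, 2, 1],          # anti-correlated
    "GENE_Z": [1, -1, 1, -1, 1, -1],       # oscillating, weakly correlated
}
refs = ["REF_A", "REF_B", "REF_C", "REF_D"]
xprss_like = expression_intersection(refs, profiles)
print(sorted(xprss_like))  # ['GENE_X']
```

Only genes that pass the cutoff for all four references survive, which is why the intersection is much smaller than any single coexpression set.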


[Figure 3 graphic. Top right: probability density versus number of genes for upregulated, downregulated and unchanged genes in random sets, with the XPRSS-Int values marked. Left: coexpression network of XPRSS-Int genes around the reference genes ATM, BRCA1, BRCA2 and CHEK2; node legend: upregulated expression in BRCA1mut tumors, downregulated expression in BRCA1mut tumors, breast cancer reference gene; edge length: expression profiling similarity (1–PCC).]

Figure 3 Expression analysis of the XPRSS-Int data set in breast tumors. Top right, comparison of gene expression in BRCA1mut breast tumors relative to sporadic breast tumors both for genes in the XPRSS-Int set (vertical lines) and for 100 randomly generated equivalent sets of genes (curves; see Methods). Left, network representation of genes in the XPRSS-Int set that show expression changes in BRCA1mut breast tumors relative to sporadic breast tumors. Edge distances to the four reference genes are optimized to be inversely proportional to their average PCCs across normal tissues (in other words, shorter edges indicate higher coexpression values).
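The comparison against randomly generated gene sets in this caption follows a general resampling pattern, sketched below. The sketch assumes a one-sided empirical P with a +1 correction, which is one common convention and not necessarily the one used in the paper. The gene labels, set sizes and counts are hypothetical.

```python
import random

def empirical_p(observed, null_values):
    """One-sided empirical P: share of null values >= observed (+1 correction)."""
    hits = sum(1 for value in null_values if value >= observed)
    return (hits + 1) / (len(null_values) + 1)

def random_set_counts(all_genes, flagged, set_size, n_sets, rng):
    """Count flagged genes in each of n_sets random gene sets of equal size."""
    return [len(flagged.intersection(rng.sample(all_genes, set_size)))
            for _ in range(n_sets)]

# Hypothetical universe: 1,000 genes, of which 100 are flagged as upregulated.
rng = random.Random(0)
all_genes = [f"g{i}" for i in range(1000)]
upregulated = set(all_genes[:100])

# Hypothetical candidate set: 40 genes, 30 of them flagged.
candidate_set = set(all_genes[:30]) | set(all_genes[500:510])
observed = len(candidate_set & upregulated)

# Null distribution from 100 random sets of the same size.
null = random_set_counts(all_genes, upregulated, len(candidate_set), 100, rng)
p_value = empirical_p(observed, null)
print(observed, max(null), p_value)
```

With this convention, an observed value that exceeds all 100 null sets gives P = 1/101, just under 0.01, and one that exceeds all 200 null sets gives 1/201, just under 0.005.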

left). We observed that the XPRSS-Int genes that were upregulated in BRCA1mut breast tumors showed expression profiles more similar to that of BRCA1 in healthy tissues than did those that did not show expression level changes between BRCA1mut and sporadic tumors (two-tailed Student’s t-test, P = 0.039). This observation suggests that XPRSS-Int genes that show expression changes in BRCA1mut breast tumors are more likely to be functionally related to BRCA1.

A BRCA-centered network model

We investigated functional associations between each of the 164 XPRSS-Int genes/proteins and the reference genes/proteins. We systematically integrated functional associations found in largely non-overlapping omic data sets by using interologous functional relationships from four species (see Methods). The resulting BRCA-centered network (BCN) model consists of 118 genes/proteins (114 XPRSS-Int genes/proteins plus the four reference genes/proteins) and 866 potential functional associations (321 direct ‘one-hop’, and 545 indirect ‘two-hop’ associations), represented as nodes and edges, respectively (Fig. 4a and Supplementary Table 5 online).

Interactome network models that are currently available for model organisms show relatively high clustering coefficients and particularly high connectivity between the members of protein complexes or ‘molecular machines’ involved in specific biological processes17. Evaluation of BCN connectivity with 1,000 randomly generated networks of sets of 164 genes from the same expression data set used for the identification of the XPRSS-Int showed that there are significantly more connected nodes and more edges in the BCN (P < 0.001; Fig. 4b and Supplementary Fig. 2 online). This finding indicates that BCN components may function in biologically related pathways. Of all the potential functional BCN associations, those that are only in the literature-curated set constitute a limited and clustered portion. In addition, each omic approach generates its own characteristic functional clusters (Fig. 4c), further illustrating the need for data integration to generate higher quality network models.

To rank XPRSS-Int genes/proteins according to their possible functional association with the four reference genes/proteins, we combined the five criteria described above (enrichment in GO terms, conserved coexpression across species, coexpression in breast tumor–derived cell lines, changes in expression in BRCA1mut breast tumors, and omic functional association in human or model organisms) in a matrix format, in which the number of matched criteria is indicated for each of the XPRSS-Int genes/proteins (Fig. 4d). We color-coded each of the five criteria to indicate the strength of coexpression with BRCA1. The percentages of LIT-Int genes within each group in the ranking correlate with the number of criteria (66, 38, 21, 0, 4 and 0% for matching all five to matching none of the criteria, respectively), which suggests that the likelihood that newly defined BCN components are functionally associated with any of the reference genes/proteins increases with the number of criteria matched. Observations made available in the literature since the generation of


[Figure 4 graphic. (a) BCN (left) and a randomly generated network (right); node legend: BRCA reference gene/protein, XPRSS-Int gene/protein, gene/protein in randomly generated network; edges are direct or indirect associations from 13 data sets: S. cerevisiae Y2H binary protein interaction, S. cerevisiae protein interaction (literature curated), S. cerevisiae protein complex memberships, S. cerevisiae orthologous reference gene co-expression, S. cerevisiae genetic interaction, S. cerevisiae synthetic lethal interaction, S. cerevisiae computational interaction, C. elegans orthologous reference gene co-expression, C. elegans Y2H binary protein interaction (WI6), C. elegans gene phenoclustering, D. melanogaster Y2H binary protein interaction, H. sapiens literature protein interaction (HPRD database) and H. sapiens LIT-Int association. (b) Probability density versus number of nodes or edges in the main component, for randomly generated networks and the BCN. (c) Panels: BCN S. cerevisiae synthetic lethal interactions; BCN D. melanogaster Y2H binary protein interactions; BCN C. elegans gene phenoclustering; C. elegans tac-1 functional associations; cluster labels: DNA repair, DNA transcription, DNA replication. (d) Matrix of the XPRSS-Int genes/proteins (n = 164) against five criteria (GO terms enrichment, orthologs genes coexpression, breast tumor cell lines genes coexpression, BRCA1mut expression level change, BCN-references functional association), with a PCC-BRCA1 color scale from 0.4 to 1.0 and LIT-Int and HMMR interactome genes marked.]

Figure 4 Generation of the BCN and ranking of XPRSS-Int genes/proteins. (a) Left, 13 omic data sets were integrated into the XPRSS-Int data set to generate the BCN. Edges represent direct ‘one-hop’ (unbroken lines) or indirect ‘two-hop’ (corresponding to two XPRSS-Int genes/proteins connected through a non–XPRSS-Int gene/protein; broken lines) functional associations. Right, example of a network that was randomly generated with the algorithm in b. Y2H, yeast two-hybrid. (b) Number of nodes (genes/proteins) and edges (functional associations) included in the main giant component of networks randomly generated through 1,000 iterations (curves) and in the BCN (vertical lines). To avoid the biases of an a priori selection of references and subjective data compilation, the four human reference genes/proteins and all edges connecting them in the BCN, in addition to the C. elegans phenotypic profiling data, were excluded from this analysis. (c) Three left panels, GO term annotations reveal functional clusters contained in distinct omic data sets used to generate the BCN. Right panel, functional associations of the C. elegans tac-1 gene and TAC-1 protein with connections to BCN genes/proteins. (d) Five criteria were integrated to estimate the overall functional significance of XPRSS-Int genes/proteins relative to breast cancer reference genes/proteins. XPRSS-Int genes are clustered according to the number of criteria they match (from 5 to 0) and then ordered within clusters according to their average PCC value for BRCA1 (PCC-BRCA1). LIT-Int genes are shown in purple. Genes shown in red encode proteins that are also present in the HMMR-centered interactome (Fig. 5a).
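Panels b and d rest on two simple computations, sketched here under stated assumptions: the size of the largest connected ("main") component of an undirected network, and a ranking by number of matched criteria followed by PCC with BRCA1. The descending order within clusters is an assumption, since the caption does not state the direction. Only the HMMR values (four of five criteria, PCC 0.90) come from the text; all other names and numbers are hypothetical.

```python
from collections import defaultdict, deque

def main_component(edges):
    """Return (n_nodes, n_edges) of the largest connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from an unvisited node
            node = queue.popleft()
            for neighbor in adjacency[node] - component:
                component.add(neighbor)
                queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    # Count each undirected edge once and ignore self-loops.
    unique_edges = {frozenset((a, b)) for a, b in edges if a != b}
    n_edges = sum(1 for edge in unique_edges if edge <= best)
    return len(best), n_edges

def rank_genes(criteria_matched, pcc_brca1):
    """Sort by criteria matched (high to low), then by PCC with BRCA1."""
    return sorted(criteria_matched,
                  key=lambda gene: (-criteria_matched[gene], -pcc_brca1[gene]))

# Hypothetical toy network: one triangle and one separate pair.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
size = main_component(edges)
print(size)  # (3, 3)

# HMMR values are from the text; the other genes are placeholders.
criteria = {"HMMR": 4, "GENE_P": 5, "GENE_Q": 4, "GENE_R": 0}
pcc = {"HMMR": 0.90, "GENE_P": 0.55, "GENE_Q": 0.60, "GENE_R": 0.45}
ranking = rank_genes(criteria, pcc)
print(ranking)  # ['GENE_P', 'HMMR', 'GENE_Q', 'GENE_R']
```

Running `main_component` on many randomly generated networks, and comparing the resulting node and edge counts with those of the real network, gives the kind of null distribution shown in panel b.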


Figure 5 HMMR-centered interactome model. (a) Genes/proteins are connected by functional associations, each represented by a differently colored edge as summarized in the inset. Only functional associations between human genes/proteins are shown. Protein function assignment is based on the literature or on the known function of constituent protein domains. Protein names in red are BCN components first identified by the XPRSS-Int, and those in blue correspond to BCN bridging genes/proteins. co-IP, coimmunoprecipitation; co-AP, coaffinity purification. (b) Top, endogenous coimmunoprecipitation of CSPG6 and BRCA1, and SMC1L1 and BRCA1 in 293 cells. Molecular masses are indicated on the left. The antibodies used for each coimmunoprecipitation experiment are given in parentheses. IB, immunoblot; Wce, whole-cell extract; IgG, normal purified immunoglobulin negative controls (G, goat; M, mouse; R, rabbit). Bottom, endogenous coimmunoprecipitation of MAD1L1 and BRCA1 or BRCA2, and HMMR and BRCA1 or BRCA2 in U2OS cells. (c) Endogenous coimmunoprecipitation of HMMR and BRCA1 in non-synchronized HeLa S3 cells. (d) Cell cycle synchronization and coimmunoprecipitation assays in HeLa S3 cells.

the LIT-Int data set (1 October 2004) support the hypothesis that the BCN points to numerous functional associations with known breast cancer genes/proteins. Since we generated LIT-Int, 20 proteins have been newly described to have a functional relationship with at least one of the reference genes/proteins, and 4 of these proteins (20%; detailed in Supplementary Table 2) are present in the XPRSS-Int (~10-fold increase relative to chance); this percentage is similar to that determined above for LIT-Int proteins in the XPRSS-Int data set.

To investigate predictions based on the BCN model, we looked for new potential connections between different biological processes. The highest-ranking connection involves a TACC family member18, HMMR, which encodes the hyaluronan-mediated motility receptor19 (HMMR, also known as RHAMM). HMMR has the highest PCC coexpression value relative to BRCA1 (0.90; Fig. 3). In itself, the high PCC of HMMR with BRCA1 might be a substantial indicator of a functional association between the two genes or gene products; however, it is the integration of other types of relationships that points to HMMR beyond the possible false-positive discovery rate obtained by profiling similarity analysis. The HMMR gene product connects BRCA1 and RBBP4 (also known as RBAP48, pRb and BRCA1-interacting protein) by binary physical interactions in the BCN (Fig. 4c, right), and it matches four of the five functional significance criteria (Fig. 4d). A protein-protein physical interaction identified by yeast two-hybrid screening predicts an association between the TAC-1 and BRD-1 Caenorhabditis elegans proteins, orthologs of TACCs and BARD1, respectively (refs. 20,21). Clues to the function of TACCs also come from the phenotypic profiling of C. elegans genes17, which identifies a phenocluster of genes (Fig. 4c, right) whose inactivation is linked to microtubule-instability phenotypes, particularly centrosome dysfunction22–24. Observations in human cells also indicate that HMMR may have a potential role in centrosomal functions18,25. Considered together, these observations suggest that HMMR plays a role in centrosome function in conjunction with BRCA1.

New BRCA functional associations

Generation of an HMMR-centered interactome map through yeast two-hybrid screens identified a network of 31 proteins and interactions between them (Fig. 5a). Several of these interactions were validated through coaffinity purification of exogenously expressed fusion proteins and coimmunoprecipitation of endogenous proteins (Supplementary Figs. 3–5 online). Notably, among interactors of HMMR, CSPG6 cohesin and MAD1L1 mitotic spindle-assembly checkpoint protein are also present in the BCN, further illustrating how the BCN is a source of testable associations with breast cancer reference genes/proteins, and the endogenous CSPG6 and BRCA1 proteins form a complex in 293 cells (Fig. 5b). In addition to the direct association between MAD1L1 and HMMR, we also identified MAD1L1-BRCA1 endogenous protein complexes (Fig. 5b). The identification of several interactors of HMMR as members of BRCA1 protein complexes suggests that HMMR is a genuine component of a BRCA1 functional module involved in centrosomal function.

Consistent with the indirect evidence described above, HMMR and BRCA1 were found to associate in endogenous protein complexes in


several cell lines (293, HeLa (Fig. 5c) and U2OS cell lines) tested. Reciprocal coimmunoprecipitation experiments were hampered by unspecific binding of HMMR to secondary antibodies. Because the HMMR-MAD1L1 protein interaction was detectable most strongly during M phase, and because the HMMR-BRCA1 protein complex association seemed to be greater during the G2/M and M phases, we examined whether a MAD1L1-BRCA1 association also occurs in M phase. MAD1L1 and BRCA1 were found together in endogenous protein complexes as cells entered mitosis (Fig. 5d). Consistent with the participation of BRCA1 and BRCA2 in the regulation of mitotic progression26,27, we found that BRCA2 was also present in HMMR and MAD1L1 mitotic complexes (Fig. 5b,c). Notably, HMMR was previously identified among other proliferation-associated genes to be tightly coexpressed with BRCA2 in breast carcinomas28. We did not observe such cell cycle–dependent protein associations for CSPG6, which suggests that the HMMR-BRCA1 and CSPG6-BRCA1 complexes have different cellular outcomes. In summary, by experimentally testing BCN predictions, we identified three previously unknown protein associations with breast cancer tumor suppressors, thereby deriving a centrosomal interactome model that has several other candidates, some of which are also found in the BCN (KNTC2 (also known as HEC1), MAD2L1 and SEC31L1).

Because ubiquitination has an important role in the regulation of centrosome duplication29, both BRCA1 and HMMR have been localized to centrosomes and mitotic spindle poles18,26, and HMMR-BRCA1 protein complexes are most prominent during G2/M, we examined the possibility that HMMR might be a substrate of the BRCA1-BARD1 E3 ubiquitin ligase activity30. HMMR was found to be efficiently polyubiquitinated by the BRCA1-BARD1 heterodimer in vitro (Fig. 6a, left). In vivo ubiquitination of HMMR was then observed after co-transfection of glutathione S-transferase (GST)-conjugated HMMR and hemagglutinin (HA)-conjugated ubiquitin into 293 cells (Fig. 6a, right). These observations suggest that HMMR has a role in centrosomal function mediated by BRCA1-BARD1 polyubiquitination.

HMMR-BRCA1 and centrosome dysfunction
To understand better the functional association between HMMR and BRCA1 in vivo, we examined phenotypes mediated by short interfering RNA (siRNA). In agreement with changes observed in HMMR expression in BRCA1mut tumors relative to sporadic breast tumors, depletion of BRCA1 resulted in an upregulation of HMMR and its product in immortalized human mammary epithelial cells (HME/TERT cells; Supplementary Fig. 6 online). Centrosome hypertrophy and amplification were observed in cells depleted of either BRCA1 or HMMR (Fig. 6b and Supplementary Figs. 7 and 8 online). These effects on centrosome numbers are similar to those previously observed on functional inhibition or depletion of BRCA1 (ref. 31). The breast tumor–derived cell line Hs578T and the HME/TERT cells showed higher levels of centrosome amplification on depletion of HMMR or BRCA1 than either the HeLa or U2OS cell lines. Notably, a genetic interaction between HMMR and BRCA1 was observed in HME/TERT and Hs578T cells. In these cells, simultaneous depletion of HMMR and BRCA1 suppressed the centrosome abnormality phenotypes observed on depletion of either single transcript (Fig. 6c).

To investigate further the relationship between HMMR, BRCA1 and centrosome dysfunction, we examined the role of aurora-A kinase (AURKA, ranked third in the XPRSS-Int) because, first, its C. elegans ortholog is required for centrosomal localization of TAC-1 (refs. 22–24); second, it associates with BRCA1 in the regulation of the G2/M transition32; and third, when overexpressed in the mammary epithelium, it induces centrosome amplification and genetic instability before tumorigenesis33. Consistent with an association in mitosis, endogenous AURKA-HMMR protein complexes were identified in greater amounts in nocodazole-arrested cells (Fig. 6d). Immunofluorescence analysis of HMMR in BRCA1mut cells showed mislocalization relative to γ-tubulin (TUBG1); this mislocalization coincided with that of AURKA, as revealed by multiple foci, some showing detectable HMMR and others not (Fig. 6e and Supplementary Fig. 9 online). No mislocalization of HMMR or AURKA on depletion of BRCA1 by siRNA was observed in the cell lines studied, a finding that could be due to partial BRCA1 depletion or, alternatively, to the fact that HMMR mislocalization results from an indirect effect mediated by overexpression of AURKA. As shown above, AURKA and HMMR expression is upregulated in BRCA1mut breast tumors relative to sporadic breast tumors (Fig. 3) and AURKA and HMMR seem to be overexpressed and mislocalized in BRCA1mut cell lines (Supplementary Fig. 9). Overexpression of AURKA in HME/TERT (BRCA1wt) cells was sufficient to cause the overexpression and mislocalization of HMMR (Fig. 6e). These observations, together with the genetic interaction identified between HMMR and BRCA1, suggest that overexpression of HMMR and/or its biochemical modification resulting in centrosome amplification, which have been previously associated in myeloma25, are early somatic molecular events that contribute to breast tumorigenesis.

HMMR and breast cancer risk
The genotyping of three HMMR haplotype-tagging SNPs (htSNPs) in 923 individually matched case-control pairs from a population-based study of incident breast cancer in northern Israel identified statistically significant associations for each htSNP (Supplementary Table 6 online). All three individual htSNPs were associated with risk of breast cancer and were consistent with an additive model showing increasing risk with each additional risk-tagging allele. For example, each copy of the A allele of rs10515860 was associated with a 32% increase in risk (odds ratio (OR) = 1.32, 95% confidence interval (CI) 1.09–1.60; P = 0.004). Each copy of the A-C-A haplotype (rs7712023-rs299290-rs10515860) estimated by the expectation maximization method increased a woman’s risk by 33% (unadjusted OR = 1.33 for an additive model, 95% CI 1.08–1.63; P = 0.007). The single intronic htSNP rs10515860 essentially captured all of the variation associated with risk, which suggests that the A allele of rs10515860 is likely to be in linkage disequilibrium with a variant that confers risk of breast cancer.

Further study of this allele showed that the risk of breast cancer associated with the HMMR locus was not accounted for by the presence of BRCA1 or BRCA2 mutations (data not shown). Consistent with models of inherited susceptibility to breast cancer34, rs10515860 A allele carriers of age 40 years and over were 1.22 times as likely to develop breast cancer than were controls (OR = 1.22, 95% CI 1.02–1.46; P = 0.026), whereas carriers younger than 40 years were 2.7 times as likely to develop breast cancer (OR = 2.73, 95% CI 1.25–5.97; P = 0.012). Risk was not restricted to Ashkenazi Jews and was similar in magnitude and frequency for Sephardi Jews and Christian and Muslim Arabs, in addition to Bedouin, Druze and Cherkazi populations.

We investigated this association in an independent Ashkenazi Jewish cohort, ascertained at Memorial Hospital in New York, comprising 247 individuals with breast cancer and a family history of three or more cases of breast cancer but no identified BRCA1 or BRCA2 mutation and 298 women with no personal or family history of breast cancer35. Genotyping SNPs across the complete 5q34 region


identified associations with P values lower than 10⁻³ for the rs1458338 and rs12658549 SNPs located proximally to the HMMR locus (Supplementary Table 7 online). Association with these SNPs was also identified in the Ashkenazi subgroup of the Israeli case-control study (rs1458338: OR = 2.87, 95% CI 1.06–7.79; rs12658549: OR = 2.73, 95% CI 1.09–6.85; P = 0.030 and 0.032, respectively), which suggested that these SNPs are in linkage disequilibrium with genetic variation 5′ upstream of the HMMR coding region in a ‘gene desert’ region spanning ~1 Mb proximally. The G-C-A-T-G haplotype was also significantly associated with risk of breast cancer (Supplementary Table 8 online), which suggests that there may be two discrete risk variants that can be ascertained on different haplotypes.

To confirm the association between genetic variation in HMMR and early-onset breast cancer, we genotyped rs10515860 in a third independent sample of 997 cases and 464 controls (212 controls with age data available) derived from the New York Cancer Project35. We found a significant association of a 12-month-earlier age of onset in homozygous cases as compared with controls in this cohort (OR = 1.56, 95% CI 1.02–1.95; P = 0.009), consistent with the

[Figure 6, panels a–f; images not reproduced. Recoverable panel labels: (a) GST co-AP with GST-HMMR, HA-Ub, BRCA1, BARD1 and BRCA1 fragments (AA 1-500, AA 1-300), showing HMMR-polyUb; (b) negative control, HMMR and BRCA1 siRNAs with GFP-centrin; (c) percent cells with centrosome amplification (n > 2) in HeLa, U2OS, HME/TERT and Hs578T cells for U, C-, H, B and H+B; (d) IB: HMMR and IB: AURKA, DMSO versus nocodazole; (e) HMMR, TUBG1, AURKA, GFP-AURKA, DAPI and merge in HME/TERT (BRCA1wt) and HCC1937 (BRCA1mut) cells; (f) normalized expression (log2) for probes NM_012484 and NM_012485 by age at diagnosis (28–35, 36–40, 41–45, 46–51, 52–54).]

Figure 6 BRCA1-BARD1–mediated polyubiquitination, HMMR-BRCA1 and HMMR-AURKA interactions and centrosome dysfunction. (a) Left, HMMR in vitro polyubiquitination analysis. Right, detection of polyubiquitinated HMMR by co-transfection of GST-HMMR and HA-ubiquitin into 293 cells. (b) Immunofluorescence microscopy results for γ-tubulin (TUBG1) and GFP-centrin are shown in Hs578T cells transfected with different siRNAs. White arrowheads indicate normal centrosomes; yellow arrowheads indicate centrosome amplification or hypertrophy. (c) Percentage of cells showing centrosome amplification in HeLa, U2OS, HME/TERT and Hs578T cells. U, untreated cells; C-, negative control siRNA; H, HMMR siRNA; B, BRCA1 siRNA; H+B, HMMR plus BRCA1 siRNAs. Immunofluorescence microscopy results and evaluation of siRNA-mediated depletion are provided in Supplementary Figures 7 and 8. (d) Endogenous coimmunoprecipitation of HMMR and AURKA in U2OS cells treated with DMSO or nocodazole. (e) Immunofluorescence microscopy results for HMMR, TUBG1, AURKA, GFP-AURKA and DAPI counterstaining in HME/TERT (BRCA1wt) and HCC1937 (BRCA1mut) cells. Yellow arrowheads indicate protein cellular mislocalization. A cell overexpressing GFP-AURKA shows more endogenously expressed HMMR than neighbor cells that do not express GFP-AURKA, identified by DAPI staining. Normal colocalization of HMMR and AURKA in HME/TERT cells and analysis of additional BRCA1mut cell lines are shown in Supplementary Figure 9. (f) Early age at diagnosis is associated with higher HMMR expression in primary breast tumors of individuals without BRCA1 or BRCA2 mutations, as assessed with a published data set16 (P = 0.02 and 0.07 for microarray probes NM_012484 and NM_012485, respectively).


Table 1 Association results for rs10515860 and risk of breast cancer

Israel and New York data

            Israel                               New York                             Total data combined
Genotypes   Controls  Cases   OR (95% CI)        Controls  Cases   OR (95% CI)        Controls  Cases   OR (95% CI)        P value (a)
G/G         1,211     1,173   1.00               375       794     1.00               1,586     1,967   1.00
G/A         230       288     1.29 (1.07–1.46)   85        189     1.06 (0.80–1.41)   315       477     1.23 (1.05–1.43)
A/A         12        17      1.46 (0.70–3.08)   5         14      1.28 (0.47–3.57)   17        31      1.41 (0.78–2.50)   0.007

Combined Israel and New York data by age cohort

            Age ≤ 40                                          Age > 40
Genotypes   Controls  Cases   OR (95% CI)        P value (a)  Controls  Cases   OR (95% CI)        P value (a)
G/G         101       177     1.00                            1,276     1,790   1.00
G/A         15        50      2.01 (1.08–3.75)                257       427     1.18 (1.01–1.40)
A/A         0         3       —                  0.026        16        28      1.21 (0.65–2.24)   0.044

(a) P values for the additive model (trend test).
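As a rough check on how the entries in Table 1 relate to the genotype counts, the sketch below computes crude odds ratios against the G/G reference with a Woolf (log-scale) 95% confidence interval, and a Cochran-Armitage trend test with additive scores 0, 1 and 2. This is an illustrative stand-in rather than the authors' procedure: the Methods describe logistic regression, so the published estimates may reflect modeling choices that a crude 2 × 2 calculation does not reproduce. The function names and the choice of the Woolf interval are assumptions made for this example.

```python
import math

# Genotype counts transcribed from Table 1 as (controls, cases).
ISRAEL = {"G/G": (1211, 1173), "G/A": (230, 288), "A/A": (12, 17)}
COMBINED = {"G/G": (1586, 1967), "G/A": (315, 477), "A/A": (17, 31)}


def crude_odds_ratio(counts, genotype, reference="G/G", z=1.96):
    """Unadjusted OR versus the reference genotype, with a Woolf 95% CI."""
    ref_controls, ref_cases = counts[reference]
    controls, cases = counts[genotype]
    odds_ratio = (cases * ref_controls) / (controls * ref_cases)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / cases + 1 / controls + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def trend_test(counts, scores=(0, 1, 2)):
    """Cochran-Armitage trend test (1 d.f.) using allele-count scores."""
    order = ("G/G", "G/A", "A/A")
    cases = [counts[g][1] for g in order]
    totals = [counts[g][0] + counts[g][1] for g in order]
    n = sum(totals)
    r = sum(cases)
    sum_rx = sum(x * c for x, c in zip(scores, cases))
    sum_nx = sum(x * t for x, t in zip(scores, totals))
    sum_nx2 = sum(x * x * t for x, t in zip(scores, totals))
    chi2 = n * (n * sum_rx - r * sum_nx) ** 2 / (
        r * (n - r) * (n * sum_nx2 - sum_nx ** 2)
    )
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    for genotype in ("G/A", "A/A"):
        estimate, lower, upper = crude_odds_ratio(ISRAEL, genotype)
        print(f"Israel {genotype}: OR {estimate:.2f} ({lower:.2f}-{upper:.2f})")
    chi2, p_value = trend_test(COMBINED)
    print(f"Combined trend test: chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

The crude Israel estimates for G/A and A/A round to the tabulated 1.29 and 1.46; other cells can differ slightly from the crude calculation, which is expected if the published values come from fitted models.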

findings in the case-control study from Israel. Combined analysis for all 2,475 cases and 1,918 controls gave an OR of 1.23 (95% CI 1.05–1.43; P = 0.007) for each copy of the rs10515860 risk allele (Table 1). Combined analysis by age cohort showed greater risk for breast cancer for this htSNP in women younger than 40 years (OR = 2.01, 95% CI 1.08–3.75) as compared with older women (OR = 1.18, 95% CI 1.01–1.40), although both associations were significant (Table 1).

Given the association that extended proximally to the HMMR coding region in these three studies, we investigated differences in germline expression by using HapMap genotypes and paired expression data. Consistent with the results outlined above for risk of breast cancer, the rs10515860 SNP showed an association with HMMR overexpression under an additive model (log2 units difference = 0.35, 95% CI 0.18–0.53; P = 0.0001) for European descendents. The A-C-A haplotype showed an association with an increase of 0.32 log2 units in HMMR expression for European descendents and for all HapMap individuals included in the data set GSE6536 (Table 2). The same trend was observed for European descendents when an independent expression data set (GSE5859; log2 units difference = 0.21, 95% CI −0.03 to 0.44; P = 0.085) was used. These differences in expression for genotypes and haplotypes agree with a model in which HMMR overexpression is both an early molecular mechanism promoting breast tumorigenesis and a risk factor for breast cancer. The G-C-A-T-G haplotype that is also associated with risk of breast cancer correlates with downregulation of HMMR (Table 2), reinforcing the hypothesis that there is tight germline regulation of HMMR expression and suggesting that any alteration in HMMR protein quantities could promote tumorigenesis, as previously discussed for TACC proteins36.

To examine further this model of HMMR expression changes and risk of breast cancer, we investigated how HMMR expression in primary breast tumors from individuals without BRCA1 or BRCA2 mutations is associated with age at diagnosis. Linear regression analysis showed that higher HMMR expression is associated with early age at diagnosis (Fig. 6f), in agreement with the idea that overexpression of HMMR is a risk factor for breast tumorigenesis.
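To make the phrase "log2 units difference under an additive model" concrete, the following minimal sketch fits log2 expression on the number of risk alleles carried (0, 1 or 2) by ordinary least squares; the slope is then the expected change in log2 expression per additional allele. The values are hypothetical toy numbers chosen for the illustration and are not data from the study, and the function name is likewise an assumption. The Methods indicate that the actual analysis used the haplo.stats package.

```python
def additive_fit(dosage, log2_expr):
    """Least-squares fit of log2 expression on risk-allele count (0, 1 or 2)."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(log2_expr) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, log2_expr))
    slope = sxy / sxx  # change in log2 expression per additional risk allele
    intercept = mean_y - slope * mean_x  # expected log2 expression with zero risk alleles
    return slope, intercept


# Hypothetical toy values for six individuals; NOT data from the study.
toy_dosage = [0, 0, 1, 1, 2, 2]
toy_log2_expr = [8.0, 8.2, 8.3, 8.5, 8.6, 8.8]

slope, intercept = additive_fit(toy_dosage, toy_log2_expr)

if __name__ == "__main__":
    print(f"per-allele difference: {slope:.2f} log2 units (intercept {intercept:.2f})")
```

Coding genotypes as allele counts is what makes the model additive: heterozygotes are constrained to lie midway between the two homozygous classes on the log2 scale.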

Table 2 Haplotypes association with HMMR germline expression level

HapMap, European ancestry (n = 60) (a)

Haplotype  rs1458338  rs12658549  rs7712023  rs299290  rs10515860  Frequency  Difference (95% CI)   P value
1          A          T           T          T         G           0.525      —                     —
2          A          T           A          T         G           0.233      0.19 (−0.03/0.41)     0.090
3          A          T           A          C         G           0.133      0.04 (−0.21/0.30)     0.722
4          A          T           A          C         A           0.109      0.32 (0.02/0.62)      0.037

HapMap, all populations (n = 210) (b)

Haplotype  rs1458338  rs12658549  rs7712023  rs299290  rs10515860  Frequency  Difference (95% CI)   P value
1          A          T           T          T         G           0.277      —                     —
2          A          T           A          T         G           0.251      0.01 (−0.10/0.13)     0.822
3          A          T           A          C         G           0.206      −0.05 (−0.17/0.07)    0.399
4          G          C           A          T         G           0.067      −0.39 (−0.57/−0.21)   3 × 10⁻⁵
5          G          C           A          C         G           0.055      0.10 (−0.08/0.29)     0.271
6          A          T           A          C         A           0.053      0.32 (0.12/0.51)      2 × 10⁻³
7          G          T           A          C         G           0.039      −0.16 (−0.42/0.09)    0.217
8          A          C           A          C         G           0.018      0.06 (−0.22/0.33)     0.689
9          G          T           A          T         G           0.015      −0.29 (−1.05/0.47)    0.452

(a) Global haplotype association P = 0.04. (b) Global haplotype association P = 0.92. Rare estimated haplotypes (n = 7, cumulative frequency = 0.019) not shown.
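The differences in Table 2 are expressed in log2 units, which can be easier to read as fold changes. The short sketch below converts two of the tabulated estimates for all HapMap populations. The sign of the G-C-A-T-G estimate is taken as negative, following the text's statement that this haplotype correlates with downregulation of HMMR; treat that sign and the helper name as assumptions of this example.

```python
# Differences in log2 units relative to the reference haplotype (Table 2,
# all HapMap populations). The negative sign for G-C-A-T-G is an assumption
# based on the text's description of downregulation.
LOG2_DIFFERENCE = {
    "A-T-A-C-A": 0.32,
    "G-C-A-T-G": -0.39,
}


def fold_change(log2_difference):
    """Convert a difference in log2 units into a multiplicative fold change."""
    return 2.0 ** log2_difference


if __name__ == "__main__":
    for haplotype, difference in LOG2_DIFFERENCE.items():
        print(f"{haplotype}: {difference:+.2f} log2 units = {fold_change(difference):.2f}-fold")
```

On this reading, the haplotype carrying the rs10515860 A allele corresponds to roughly 25% higher expression than the reference haplotype, and G-C-A-T-G to roughly 24% lower expression.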


We cannot exclude the possibility that linkage disequilibrium of genetic variation independent of HMMR might potentially explain the observed associations. However, the strength of the genetic associations, coupled with the functional associations that we have found, indicates that HMMR may be a previously unknown susceptibility gene for breast cancer within diverse human populations.

DISCUSSION
We have shown that analysis of gene expression, combined with integration of omic data sets from various species, can be used to generate a network of hundreds of potential functional associations with cancer genes/proteins. Many of the hypotheses provided by the BCN have been tested and validated either here or in the literature. Additional omic data sets can be integrated as they are produced to generate a larger and more comprehensive BCN.

Among the newly defined associations predicted in the BCN, we focused on those of HMMR on the basis of a combination of different functional criteria. Network modeling also points to functional associations for other genes/proteins with properties similar to those of HMMR, such as CSPG6. HMMR and two of its interactors (CSPG6 and MAD1L1) associate in protein complexes with BRCA1 and BRCA2. We established that HMMR is an in vitro substrate for BRCA1-BARD1–mediated polyubiquitination and that BRCA1 and HMMR genetically interact to control centrosome number in breast tumor– and mammary epithelium–derived cell lines. In addition, we identified an association in breast tumorigenesis between BRCA1 and HMMR with AURKA, which is also in the BCN. Collectively, these findings describe a role for HMMR together with BRCA1 and AURKA in microtubule-based processes essential for proper chromosome segregation. Observations made in Xenopus laevis have associated the HMMR ortholog XRHAMM with the control of spindle-pole assembly mediated by the BRCA1-BARD1 heterodimer37. This observation further supports the idea that the HMMR-centered interactome network described here has a role in genomic stability and breast tumorigenesis.

Genetic analysis supports HMMR as a newly defined breast cancer susceptibility gene, thereby delineating a genetic link between risk of breast cancer and centrosome dysfunction. Notably, centrosome hypertrophy and amplification are often evident in early stage proliferative breast lesions38,39, and loss of mouse Brca1 or Brca2 leads to centrosome amplification and genomic instability40,41. Our results suggest that HMMR can act as a breast cancer susceptibility gene when expressed in relatively large amounts; such overexpression probably leads to centrosome amplification and genomic instability, similar to what has been proposed in studies of an Aurka transgenic model33 and in work involving transient Mad2 overexpression42.

Our network modeling strategy is applicable to other types of cancer; it will help to discover more cancer-associated genes and to generate a ‘wiring diagram’ of functional interactions between their products.

METHODS
Bioinformatic analyses. We obtained GO annotations from NetAffx (Affymetrix) and based enrichment calculations on the total number of distinct genes and their annotations in the U95A platform; P values were then determined by Fisher’s exact test. Only grade 3 (poorly differentiated) tumors were used to study gene expression in BRCA1mut tumors relative to sporadic breast tumors; P values for differential expression were then determined by two-tailed Student’s t-test (P < 0.10). To avoid probe discrepancies and differences in the number of data points, only single-probe genes were simulated and compared with single-probe genes from the XPRSS-Int data set, corresponding to 132 genes from a total of 164; P values were then calculated empirically by using 100 random simulations. The BRCA1mut coexpression network was generated with the Graphviz graph visualization package. Owing to constraints of the visualization program, the relative network positions of the four reference genes are not proportional to the PCCs between them. Transcriptional PCCs of XPRSS-Int genes that were upregulated and unchanged in BRCA1mut tumors were compared by a two-tailed Student’s t-test, taking into account all U95A probes for each gene. Orthologs were defined by reciprocal BLASTP best hit (P < 10⁻⁶) or from the literature.

For the BCN modeling, we integrated, first, gene expression profiling similarity above a given threshold, as determined by the PCC, from a compendium of Saccharomyces cerevisiae43 and C. elegans44 microarray profiles (6,174 and 18,451 genes, respectively); second, phenotypic similarity for 661 early embryogenesis C. elegans genes above a specific threshold17; third, genetic interactions for 1,347 S. cerevisiae genes43; and fourth, protein physical interactions (binary interactions, complex co-memberships and biochemical interactions (protein modification)) for 3,458 S. cerevisiae43, 4,588 C. elegans (WI6 data set21), 7,198 Drosophila melanogaster45,46 and 10,305 Homo sapiens proteins47. To rank XPRSS-Int genes/proteins, PCC average values for BRCA1 were used and BCN links corresponding to LIT-Int interactions were not considered. Gene and protein names used throughout the text are detailed in Supplementary Table 9 online.

Protein and biochemical analyses. Yeast two-hybrid methods were carried out as described48. We performed BRCA1-BARD1 heterodimer purification and ubiquitination reactions as described31, using whole-cell extracts of IMR90 human fibroblasts. We used the following antibodies: polyclonal IHABP and E-19 (Santa Cruz Biotechnologies) for HMMR; monoclonal SD118, MS110 and SG11 for BRCA1; monoclonal Ab-1 (Oncogene) for BRCA2; monoclonal clone-53 (BD Transduction Laboratories) for NUP62; polyclonal anti-AIK (Cell Signaling) and monoclonal anti-IAK1 (BD Transduction Laboratories) for AURKA; and polyclonal 727 and 725 for CSPG6 and SMC1L1, respectively, polyclonal Dap23 and polyclonal 81d for MAD1L1, anti-PCM1 and anti-CEP2 (see Acknowledgments).

Cell culture and immunofluorescence microscopy. The negative control siRNA was Silencer Negative Control 1 siRNA (Ambion). The HMMR 1 and 2 siRNAs were Silencer Pre-Designed siRNAs (Ambion; 5′-GGUGCUUAUGAUGUUAAAATT-3′ and 5′-GGACCAGUAUCCUUUCAGATT-3′, respectively). The BRCA1-b siRNA has been published31. Centrosome immunofluorescence counts correspond to three independent experiments, each scoring 200 cells with >80% protein reduction of HMMR and/or BRCA1.

Association studies. Individuals with breast cancer and control subjects were recruited through an institutional review board–approved study of breast cancer in northern Israel and through protocols at the Memorial Sloan-Kettering Cancer Center in New York. The case-control study in northern Israel is a population-based study of incident breast cancer identified through rapid case ascertainment between January 2000 and July 2006. Controls were identified by randomly selecting women from a comprehensive list of insurees with Israel’s largest health provider, which covers ~70% of women at risk in northern Israel. For each case, an individually matched control without breast cancer was identified by randomly sampling all female insurees meeting the matching criteria of age (within 1 year), ancestry (Jewish versus non-Jewish) and geographical clinic area. All breast cancers were pathologically confirmed, and each participant signed written, informed consent. Cases and controls were classified by religion and ancestry on the basis of self-report.

Genotyping. Genomic DNA derived from blood lymphocytes was used for all genotyping. All case and control samples were genotyped for BRCA1 and BRCA2 Jewish-derived founder mutations by using a TaqMan platform (Applied Biosystems). Genotyping for SNPs in HMMR was performed with Pyrosequencing (Biotage). Haplotype-tagging SNPs were selected for the northern Israel study using Haploview49 to determine linkage disequilibrium blocks by analyzing Centre d’Etude du Polymorphisme Humain (CEPH) SNP genotype data downloaded from the HapMap website. To be included in the analysis, SNPs had to meet the following criteria: the Hardy-Weinberg χ² goodness-of-fit test yielded a P value > 0.05; at least 75% of subjects were genotyped for the SNP; and the minor allele frequency of the SNP was at least


0.001. For the SNPs that met these criteria, Haploview was run in aggressive tagging mode using two- and three-marker haplotypes. No difference in results was seen when the pairwise tagging mode was used. For a SNP to be selected, all alleles to be captured were correlated at a value of r² > 0.8 with the marker in that set. Three SNPs were identified as htSNPs, which tagged variation within three estimated haplotype blocks in HMMR: rs7712023, rs299290 and rs10515860. We performed duplicate genotyping for all samples in the population-based study for rs10515860 with 100% concordance. Duplicate genotyping for rs299290 was done for 1,789 samples, and 1,737 (97.1%) were in agreement. Duplicate genotyping was not performed in the validation study in New York.

Statistical analysis. Conditional and unconditional logistic regressions were used to analyze data from the case-control study, as appropriate for a matched study. No important differences between matched and unmatched analyses were noted; therefore, the results are presented for the unmatched analysis to optimize the sample size for tests of interaction and for ease of presentation. Haplotypes were estimated with the EM algorithm, as implemented in the R packages haplo.stats and SNPStats50. Association studies between HapMap genotypes and haplotypes, and HMMR germline expression levels from two independent data sets (normalized values of Gene Expression Omnibus records GSE5859 and GSE6536) were performed with the haplo.stats package by fitting linear equations, with P values obtained from the F-test.

Additional methods. Detailed experimental methods are described in the Supplementary Methods online.

Note: Supplementary information is available on the Nature Genetics website.

ACKNOWLEDGMENTS
We thank members of our laboratories for discussion and comments on the manuscript; A. Merdes (Wellcome Trust Centre for Cell Biology) for anti-PCM1; B. Koch (Research Institute of Molecular Pathology) for anti-CSPG6 and anti-SMC1L1; K.-T. Jeang (National Institute of Allergy and Infectious Disease) for anti-MAD1L1 (Dap23); D.-Y. Jin (University Hong Kong) for anti-MAD1L1 (81d); E.A. Nigg (Max Planck Institute of Biochemistry) for anti-CEP2; D.R. Scoles (University California Los Angeles) for providing constructs; V. Joukov for sharing results before publication; C. McCowan, T. Clingingsmith and C. You for administrative assistance; and K. Salehi-Ashtiani, D. Szeto, R. Murray and C. Lin for characterizing the genomic structure of HMMR. L.M.S. was supported by a Department of Defense Breast Cancer Research Program fellowship and a grant from the National Cancer Institute (CA90281 to J.D.P.). M.T. was supported by an award from the National Institutes of Health (NIH; K08-AG21613). K.C.G. received support from the US Army Medical Research Acquisition Activity (W23RYX-3275-N605) and NYSTAR (C040066). This work was supported by an NIH/National Cancer Institute (NCI) R33 grant (to M.V.), an NIH/NCI U01 grant (to S. Korsmeyer, S. Orkin, G. Gilliland and M.V.), an NIH/NCI ICBP grant (to J. Nevins and M.V.), an ‘interactome mapping’ grant from the NIH/National Human Genome Research Institute and the NIH/National Institute of General Medical Sciences (to F. Roth and M.V.), an NIH/NCI P30 grant (CA046592 to the University of Michigan), a Spanish Ministry of Education and Science grant (PR2006-0474 to V.M.) and awards from the Breast Cancer Research Foundation (BCRF13740) and the Niehaus, Southworth, Weissenbach Foundation (to K.O.) and the Koodish Foundation (to T.K.). We also acknowledge the role of the New York Cancer Project, supported by the Academic Medicine Development Company of New York.

AUTHOR CONTRIBUTIONS
Experiments and data analyses were coordinated by M.A.P., J.-D.J.H., L.M.S. and K.N.S. Computational analyses were performed by J.-D.J.H., K.C.G., N.B. and K.V. Yeast two-hybrid analysis screens were performed by M.A.P., J.S.A., J.-F.R. and N.A.-G. Biochemical experiments were performed by M.A.P., L.M.S., W.M.E., R.A.G. and B.S. Cell culture and immunofluorescence experiments were performed by M.A.P. and L.M.S. The case-control study in Israel was conceived and executed by G.R. Genotyping and statistical analyses of the case-control studies were performed by K.N.S., L.S.R., G.R., V.M., T.K., B.G., D.L., K.O. and S.B.G. M.A.P., X.S. and P.H. performed the HapMap genotype-haplotype and gene expression association analysis. V.A. provided biochemical analysis support, R.S.G. provided statistical support and M.T., C.L., K.L.N., B.L.W., M.E.C., D.E.H. and D.M.L. helped with overall interpretation of the data. The manuscript was written by M.A.P., J.-D.J.H., L.M.S., D.E.H., M.E.C., S.B.G., J.D.P. and M.V. The project was conceived by M.V. and codirected by S.B.G., J.D.P. and M.V.

COMPETING INTERESTS STATEMENT
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturegenetics/.

Published online at http://www.nature.com/naturegenetics
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions

1. Futreal, P.A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).
2. Kitano, H. Cancer as a robust system: implications for anticancer therapy. Nat. Rev. Cancer 4, 227–235 (2004).
3. Khalil, I.G. & Hill, C. Systems biology for cancer. Curr. Opin. Oncol. 17, 44–48 (2005).
4. Thomas, R.K. et al. High-throughput oncogene mutation profiling in human cancer. Nat. Genet. 39, 347–351 (2007).
5. Sjoblom, T. et al. The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274 (2006).
6. Greenman, C. et al. Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158 (2007).
7. Vidal, M. A biological atlas of functional maps. Cell 104, 333–339 (2001).
8. Matthews, L.R. et al. Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or ‘interologs’. Genome Res. 11, 2120–2126 (2001).
9. Futreal, P.A. et al. BRCA1 mutations in primary breast and ovarian carcinomas. Science 266, 120–122 (1994).
10. Miki, Y. et al. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 266, 66–71 (1994).
11. Wooster, R. et al. Identification of the breast cancer susceptibility gene BRCA2. Nature 378, 789–792 (1995).
12. Renwick, A. et al. ATM mutations that cause ataxia-telangiectasia are breast cancer susceptibility alleles. Nat. Genet. 38, 873–875 (2006).
13. Meijers-Heijboer, H. et al. Low-penetrance susceptibility to breast cancer due to CHEK2(*)1100delC in noncarriers of BRCA1 or BRCA2 mutations. Nat. Genet. 31, 55–59 (2002).
14. Su, A.I. et al. Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. USA 99, 4465–4470 (2002).
15. Cunliffe, H.E. et al. The gene expression response of breast cancer to growth regulators: patterns and correlation with tumor expression profiles. Cancer Res. 63, 7158–7166 (2003).
16. van’t Veer, L.J. et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 (2002).
17. Gunsalus, K.C. et al. Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis. Nature 436, 861–865 (2005).
18. Maxwell, C.A. et al. RHAMM is a centrosomal protein that interacts with dynein and maintains spindle pole stability. Mol. Biol. Cell 14, 2262–2276 (2003).
19. Entwistle, J., Hall, C.L. & Turley, E.A. HA receptors: regulators of signalling to the cytoskeleton. J. Cell. Biochem. 61, 569–577 (1996).
20. Boulton, S.J. et al. BRCA1/BARD1 orthologs required for DNA repair in Caenorhabditis elegans. Curr. Biol. 14, 33–39 (2004).
21. Li, S. et al. A map of the interactome network of the metazoan C. elegans. Science 303, 540–543 (2004).
22. Bellanger, J.M. & Gonczy, P. TAC-1 and ZYG-9 form a complex that promotes microtubule assembly in C. elegans embryos. Curr. Biol. 13, 1488–1498 (2003).
23. Le Bot, N., Tsai, M.C., Andrews, R.K. & Ahringer, J. TAC-1, a regulator of microtubule length in the C. elegans embryo. Curr. Biol. 13, 1499–1505 (2003).
24. Srayko, M., Quintin, S., Schwager, A. & Hyman, A.A. Caenorhabditis elegans TAC-1 and ZYG-9 form a complex that is essential for long astral and spindle microtubules. Curr. Biol. 13, 1506–1511 (2003).
25. Maxwell, C.A., Keats, J.J., Belch, A.R., Pilarski, L.M. & Reiman, T. Receptor for hyaluronan-mediated motility correlates with centrosome abnormalities in multiple myeloma and maintains mitotic integrity. Cancer Res. 65, 850–860 (2005).
26. Hsu, L.C. & White, R.L. BRCA1 is associated with the centrosome during mitosis. Proc. Natl. Acad. Sci. USA 95, 12983–12988 (1998).
27. Marmorstein, L.Y. et al. A human BRCA2 complex containing a structural DNA binding component influences cell cycle progression. Cell 104, 247–257 (2001).
28. Bieche, I., Tozlu, S., Girault, I. & Lidereau, R. Identification of a three-gene expression signature of poor-prognosis breast carcinoma. Mol. Cancer 3, 37 (2004).
29. Freed, E. et al. Components of an SCF ubiquitin ligase localize to the centrosome and regulate the centrosome duplication cycle. Genes Dev. 13, 2242–2257 (1999).
30. Hashizume, R. et al. The RING heterodimer BRCA1-BARD1 is a ubiquitin ligase inactivated by a breast cancer–derived mutation. J. Biol. Chem. 276, 14537–14540 (2001).
31. Starita, L.M. et al. BRCA1-dependent ubiquitination of γ-tubulin regulates centrosome number. Mol. Cell. Biol. 24, 8457–8466 (2004).
32. Ouchi, M. et al. BRCA1 phosphorylation by aurora-A in the regulation of G2 to M transition. J. Biol. Chem. 279, 19643–19648 (2004).
33. Wang, X. et al. Overexpression of aurora kinase A in mouse mammary epithelium induces genetic instability preceding mammary tumor formation. Oncogene 25, 7148–7158 (2006).


34. Claus, E.B., Risch, N.J. & Thompson, W.D. Using age of onset to distinguish between subforms of breast cancer. Ann. Hum. Genet. 54, 169–177 (1990).
35. Mitchell, M.K., Gregersen, P.K., Johnson, S., Parsons, R. & Vlahov, D. The New York Cancer Project: rationale, organization, design, and baseline characteristics. J. Urban Health 81, 301–310 (2004).
36. Raff, J.W. Centrosomes and cancer: lessons from a TACC. Trends Cell Biol. 12, 222–225 (2002).
37. Joukov, V. et al. The BRCA1/BARD1 heterodimer modulates ran-dependent mitotic spindle assembly. Cell 127, 539–552 (2006).
38. Lingle, W.L. et al. Centrosome amplification drives chromosomal instability in breast tumor development. Proc. Natl. Acad. Sci. USA 99, 1978–1983 (2002).
39. Pihan, G.A., Wallace, J., Zhou, Y. & Doxsey, S.J. Centrosome abnormalities and chromosome instability occur together in pre-invasive carcinomas. Cancer Res. 63, 1398–1404 (2003).
40. Tutt, A. et al. Absence of Brca2 causes genome instability by chromosome breakage and loss associated with centrosome amplification. Curr. Biol. 9, 1107–1110 (1999).
41. Xu, X. et al. Centrosome amplification and a defective G2-M cell cycle checkpoint induce genetic instability in BRCA1 exon 11 isoform–deficient cells. Mol. Cell 3, 389–395 (1999).
42. Sotillo, R. et al. Mad2 overexpression promotes aneuploidy and tumorigenesis in mice. Cancer Cell 11, 9–23 (2007).
43. Han, J.D. et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88–93 (2004).
44. Kim, S.K. et al. A gene expression map for Caenorhabditis elegans. Science 293, 2087–2092 (2001).
45. Formstecher, E. et al. Protein interaction mapping: a Drosophila case study. Genome Res. 15, 376–384 (2005).
46. Giot, L. et al. A protein interaction map of Drosophila melanogaster. Science 302, 1727–1736 (2003).
47. Peri, S. et al. Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res. 32, D497–D501 (2004).
48. Walhout, A.J. & Vidal, M. High-throughput yeast two-hybrid assays for large-scale protein interaction mapping. Methods 24, 297–306 (2001).
49. Barrett, J.C., Fry, B., Maller, J. & Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
50. Solé, X., Guinó, E., Valls, J., Iniesta, R. & Moreno, V. SNPStats: a web tool for the analysis of association studies. Bioinformatics 22, 1928–1929 (2006).
