Research Article

Genome-Wide Functional Synergy between Amplified and Mutated in Human Breast Cancer

Yuri Nikolsky,1 Evgeny Sviridov,2 Jun Yao,3,5 Damir Dosymbekov,2 Vadim Ustyansky,1 Valery Kaznacheev,2 Zoltan Dezso,1 Laura Mulvey,3,6 Laura E. Macconaill,4,7 Wendy Winckler,4,7 Tatiana Serebryiskaya,2 Tatiana Nikolskaya,1,2 and Kornelia Polyak3,4,5

1GeneGo, Inc., St. Joseph, Michigan; 2Vavilov Institute for General Genetics, Moscow, Russia; 3Department of Medical Oncology, 4Center for Cancer Genome Discovery, Dana-Farber Cancer Institute 5Harvard Medical School, Boston, Massachusetts; and 6Harvard University and 7Broad Institute, Cambridge, Massachusetts

Abstract regions of copy number gains indicating the combined activation of several oncogenes (1). In human breast cancer, the most frequent A single cancer cell contains large numbers of genetic and well-characterized amplicons are located on 1q, alterations that in combination create the malignant pheno- 8p12, 8q24, 11q13, 12p13, 16p13, 17q11-q21, 17q22-q23, and 20q13 type. However, whether amplified and mutated genes form (2–7). Genes targeted by these amplification events have been functional and physical interaction networks that could identified in some cases, but emerging view suggests a selection for a explain the selection for cells with combined alterations is combination of coamplified genes rather than single amplicon unknown. To investigate this issue, we characterized copy targets (8). In fact, in breast cancer, the majority of amplified or number alterations in 191 breast tumors using dense single gained genes show increased levels of expression (9). In nucleotide polymorphism arrays and identified 1,747 genes addition to possible cooperation within amplicons, functional with copy number gain organized into 30 amplicons. interactions among genes in distinct amplicons may exist and could Amplicons were distributed unequally throughout the ge- explain the worse clinical outcome of tumors with multiple nome. Each amplicon had distinct enrichment pattern in chromosomal regions of copy number gain (10). pathways, networks, and molecular functions, but genes Cancer genomes were screened for somatic mutations and within individual amplicons did not form coherent functional genetically altered pathways in several large-scale studies using units. Genes in amplicons included all major tumorigenic unbiased genome-wide technologies (3, 10, 11). Recently, 1,142 pathways and were highly enriched in breast cancer–causative genes were identified with somatic alterations based on exon genes. In contrast, 1,188 genes with somatic mutations in resequencing of over 18,000 human genes in 11 breast cancer breast cancer were distributed randomly over the genome, did samples (12). One hundred forty of these mutated genes were not represent a functionally cohesive gene set, and were defined as ‘‘CAN’’ (candidate cancer) driver genes. A study focused relatively less enriched in breast cancer marker genes. on kinases identified 79 genes with somatic mutations in Mutated and gained genes did not show statistically signifi- breast cancer (3). Both studies concluded that the cancer genome cant overlap but were highly synergistic in populating key is extremely complex, a large number of genes are mutated in each tumorigenic pathways including transforming growth factor B, tumor, and very few genes are mutated in a high fraction of tumors. WNT, fibroblast growth factor, and PIP3 signaling. In general, This high degree of complexity raised new challenges with data mutated genes were more frequently upstream of gained genes interpretation and for the identification of key genetically altered in transcription regulation signaling than vice versa, suggest- pathways amenable to therapeutic targeting. ing that mutated genes are mainly regulators, whereas gained To help the functional interpretation of complex data sets, genes are mostly regulated. ESR1 was the major transcription several systems analysis methods have been developed and applied factor regulating amplified but not mutated genes. Our results for breast cancer including gene set ontology enrichment, support the hypothesis that multiple genetic events, including interactome, network and pathway analysis, and network modeling copy number gains and somatic mutations, are necessary for (11, 13–15). We have previously used the data analysis MetaCore establishing the malignant cell phenotype. [Cancer Res platformfor elucidating pathways specifically activated in breast 2008;68(22):9532–40] cancer stemcells (16), for the identification of pathways differentiating normal mammary luminal epithelial and myoepi- Introduction thelial cells (17), and for the comparative analysis of DNA Gene copy number gain and somatic mutations are two major methylation and gene expression profiles of four distinct normal molecular mechanisms underlying the activation of oncogenes mammary epithelial cell types (18). during tumor development. Gene amplifications occur recurrently at Here, we describe a large-scale study to characterize copy specific chromosomal locations and most tumors display multiple number alterations in primary breast carcinomas and to define functional interactions among genetically altered genes in breast cancer. Using high-density single nucleotide polymorphism (SNP) Note: Supplementary data for this article are available at Cancer Research Online arrays, we analyzed 191 breast cancer samples and identified 1,747 (http://cancerres.aacrjournals.org/). genes with copy number gain (called the breast cancer amplicome). Y. Nikolsky and E. Sviridov contributed equally to this work. Requests for reprints: Kornelia Polyak, Dana-Farber Cancer Institute, 44 Binney We compared these amplified genes to genes mutated in breast Street D740C, Boston, MA 02115. Phone: 617-632-2106; Fax: 617-582-8490; E-mail: cancer (called the breast cancer mutome) and identified inter- [email protected]. I2008 American Association for Cancer Research. actions and networks reflecting potential cooperation among genes doi:10.1158/0008-5472.CAN-08-3082 genetically altered in breast tumors.

Cancer Res 2008; 68: (22). November 15, 2008 9532 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Networks of Genetic Alterations in Breast Cancer

Materials and Methods (Supplementary Excel File S1) that was subjected to further Detailed description of the procedures is in Supplementary Materials and analysis. The amplicome covered 5.6% of the Methods. analyzed (chr1-22, 31,011 genes) and 10% of the genes present on Breast tumor samples. Breast cancer cell lines were obtained from the SNP array used in the study. The number of genes in individual American Type Culture Collection or were generous gifts of Drs. Steve Ethier amplicons varied from 4 (20p12-p11) to 336 (17q21-q25). Genes (University of Michigan, Ann Arbor, MI), Fred Miller (Karmanos Cancer composing the amplicome were distributed over the human Institute, Detroit, MI), and Arthur Pardee (Dana-Farber Cancer Institute, genome in a highly skewed manner with 0.7% to 29.8% of all Boston, MA). All cell lines were grown in medium recommended by the genes amplified per (Supplementary Table S1). provider. Fresh or frozen tumor specimens were obtained from the Brigham Chromosomes 8, 16, 17, and 20 featured the largest percentage of and Women’s Hospital, Beth-Israel Deaconess Medical Center, Massachu- amplified genes (>10%). setts General Hospital, University of Zagreb (Croatia), Duke University, and Amplicons are highly variable in encoded gene function. We National Disease Research Interchange. All human tissue was collected using protocols approved by the Institutional Review Boards. Only immunomag- analyzed relative enrichment of individual amplicons for specific netic bead purified or microdissected tumor samples were used for the protein functions including transcription factors (TF), receptors, analysis. Tissue was snap frozen on dry ice and stored at À80jC until use or secreted , kinases, phosphatases, proteases, and metabolic were processed for purification as described previously (16). enzymes. Amplicons varied greatly in composition of encoded SNP array analysis. SNP array analysis was performed by the Dana- protein functions (Supplementary Excel File S2). For example, Farber Microarray Core and by the Broad Institute using Affymetrix 250K amplicon 7p15 was entirely composed of TFs encompassing the StyI SNP arrays as described (16). The raw .CEL files were normalized, HOXA homebox gene cluster. Amplicon 20p13 showed 43% and the copy number of each SNP were generated using the GenePattern 8 enrichment in receptors, whereas amplicons 1q32, 17q21-q25, software. The procedure was optimized to exclude normal references that and 22q13 were highly enriched in kinases. Amplicons 1q32 and were contaminated with tumor tissues. Normal references (matched or 12q13-q21 had high fraction of ligands, whereas effector proteins, virtual) were used as controls to exclude copy number polymorphisms. Raw SNP copy numbers were then smoothened by a segmentation including proteases and metabolic enzymes, were concentrated on algorithmusing the DNAcopy package. 9 To facilitate detailed analysis, amplicons 16p13 and 12q24, respectively. This enrichment could copy number of each gene was estimated by averaging copy numbers from potentially be due to the evolvement of gene families through gene all SNPs found within the gene structure and flanking 100-kbp regions. duplications, which frequently results in physically linked genes encoding for the same function (21). HOXA and HOXB gene Results and Discussion clusters in amplicons 7p15 and 17q21-25, respectively, encoding for TFs and amplicon 17q11-21 enriched for chemokines and keratins Definition of the breast cancer amplicome. To define the are potential examples for this. However, other amplicons breast cancer amplicome, we analyzed copy number alterations in contained genes with unrelated sequence, but encoding for similar 191 human breast cancer samples (154 primary tumors and 37 function including the 20q12-13 amplicons enriched for TFs. breast cancer cell lines) using 250K Affymetrix SNP arrays. To Genes within individual amplicons do not form self- delineate amplicons, we used a modification of a method sustainable functional entities. Coamplified genes located in developed for the extraction of minimal common regions from one amplicon were thought to form functional interactions. The array comparative genomic hybridization data (4, 19). Briefly, genes best example for this is the coamplification of ERBB2 and GRB7 with predicted copy number above two were categorized as gained. in the 17q12 amplicon (1). However, in our data set, genes within X chromosome was not included in the analysis due to difficulties individual amplicons generally did seem to form concise with accurately predicting copy numbers for this chromosome functional entities such as pathways, cellular processes, and based on our SNP data. One hundred seventy-three of 191 samples networks. We conducted enrichment analysis of genes in all 30 contained at least one gained chromosomal region or gene (Fig. 1). amplicons in 5 functional ontologies (GO processes, GO Previous studies have shown that the number and type of molecular functions, canonical pathway maps, cellular process amplicons correlates with breast tumor subtypes (10). In our data networks, and disease biomarkers). Genes within individual set, we observed an enrichment of specific amplicons in luminal amplicons were enriched in cellular processes and pathways (16p13), basal-like (8p11-12 and 8q), and HER2+ (17q) breast tumor relevant to tumorigenesis including cell cycle, cell adhesion, DNA subtypes (Fig. 1). Interestingly, a subset of tumors lacked repair, development, immune response, and cytoskeleton (Sup- chromosomal gains and many of these were basal-like tumors plementary Excel File S3). However, in individual amplicons, and cell lines. cellular pathways were sparsely populated with amplified genes In total, 15,145 (87% of all genes with known function on the (only one or two genes per pathway) and the P values of 250K SNP array) genes were gained in at least one sample. hypergeometric distribution for individual amplicons were high in Subsequently, the data set was narrowed down to 1,747 genes comparison with the complete amplicome, suggesting that none amplified at least 5-fold (20) in at least 7 cases among 191 samples. of the processes and pathways were encoded by a single The threshold of seven was chosen as an upper 75 percentile of the amplicon. All large amplicons encoded multiple pathways and distribution frequency of gene amplification among 191 samples processes without significant dominance of individual functional based on box-plot analysis (Supplementary Fig. S1A and B). The entities. Thus, amplicons did not show distinct specialization for 1,747 genes organized into 30 amplicons distributed over 16 pathways and processes. However, individual amplicons were chromosomes and composed the breast cancer ‘‘amplicome’’ highly synergistic in encoding for genes with functional roles in tumorigenesis, as evident from the enrichment analysis of the entire amplicome. This synergy was clearly visible in some tumorigenic pathway maps assembled from genes located in 8 http://www.broad.mit.edu/cancer/software/genepattern/index.html different amplicons, including the IGF-RI pathway map composed 9 http://www.bioconductor.org of the genes derived from11 amplicons(Fig. 2). www.aacrjournals.org 9533 Cancer Res 2008; 68: (22). November 15, 2008

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Cancer Research

Figure 1. Definition of amplicons and gained genes data distribution. Heatmap depicting the relatedness of tumor samples based on the clustering of 2,791 amplified (z5 copies) genes. Red signal, genes with copy number gains. Depicted chromosome length is proportionate to the number of genes contained in each of the indicated amplicons. Colored rectangles, tumor subtype (red, basal, pink, HER2+; blue, luminal), ER, and HER2 status (red, positive; blue, negative; white, unknown). Breast tumors were classified into luminal, basal-like (triple negative), and HER2 subtypes based on the expression of ER and HER2.

Cancer Res 2008; 68: (22). November 15, 2008 9534 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Networks of Genetic Alterations in Breast Cancer

Genes within individual amplicons did not form concise 3h/a,PKC-a, MAP3K3, NDPKB, and CAAT/enhancer binding networks. One measure of functional connectivity among a set of protein (C/EBP)-h (Supplementary Fig. S3). Conversely, some genes and proteins is their ability to self-assemble into biological amplicons had significantly higher number of interactions with networks with physical protein interactions as links between them genes outside of the amplicon than the global interactome had (22). We applied direct interactions network building algorithmto (Supplementary Excel File S4). Interestingly, half of the amplicons each of the 30 amplicons. Genes in individual amplicons did not were enriched with degrees IN, but only four (8q23-24, 15q26, assemble into significant direct interaction networks. Only a few 17p11, and 20q12-13) were enriched with degrees OUT. This genes were directly linked with other genes in two of the largest indicates that amplicons in general are ‘‘regulated’’ rather than (17q21-q25 and 20q12-q13) amplicons (Supplementary Fig. S2A ‘‘regulating’’ data sets, and they were enriched for ‘‘effector’’ rather and B). than signaling pathways and processes. The density of interactions within individual amplicons was Network analysis of the breast cancer amplicome. To further lower than that in the entire human interactome. Network evaluate functional interactions among genes within amplicons, we topology analysis evaluates the connectivity among genes within analyzed signaling and metabolic networks among 1,360 amplified a set measured as the average number of interactions per network genes encoding known function using Analyze Networks (AN) object (degree) divided into incoming interactions (degree IN) and algorithms. This series of algorithms reveal and visualize the most outgoing interactions (degree OUT; ref. 23). The global human relevant multistep subnetworks of a selected size. The AN interactome of 16,000 proteins with >200,000 interactions (Meta- Transcription factors (ANTF) algorithm(24) connects all genes Core database) was used as a control for evaluating the relative via directed shortest path and prioritizes subnetworks based on the connectivity within amplicons. Only 6 of 30 amplicons (7p15, 8q12- following variables: (a) relative enrichment of the network with the q22, 8q23-q24, 17q11-q21, 17q21-q25, and 20q12-q13) displayed any seed genes (in this case, gained genes); (b) relative enrichment of sensible degree of intraamplicon connectivity. Moreover, connec- the network with canonical pathways; and (c) relative complete- tivity within these amplicons was significantly lower than that in ness of noncanonical pathways fromligands to TF targets. the global human interactome (Supplementary Excel File S4). The One of the top scored ANTF networks of the breast cancer largest hubs within individual amplicons were GRB2, c-, 14-3- amplicome included the JUN oncogene as central hub with 32

Figure 2. Development_ IGF-RI signaling canonical pathway map. Gained genes on the pathway map are marked with red thermometers. Names of corresponding amplicons are displayed in yellow boxes. www.aacrjournals.org 9535 Cancer Res 2008; 68: (22). November 15, 2008

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Cancer Research

Figure 3. ESR1 is the major TF hub for gained genes. The top scored ANR network in the amplicome. ESR1 has the highest number of connections to amplified genes. Thick cyan lines, fragments of canonical pathways on the network. Solid red circles indicate amplified genes. targets (Supplementary Fig. S4). Although JUN itself was not Cross-connections and trans-regulation among amplicons. amplified, it is regulated by the amplified TF via In an arbitrary data set, each protein has interactions with other calmodulin signaling and its targets include amplified ligands proteins within the same set (intraconnections) and with proteins (IL24, IFN-g, galanin, and endostatin) and receptors (HER2, that do not belong to the set (interconnections). Both types of GRP30, and SSTR2). We found 30 additional highly scored connectivity can be evaluated quantitatively in terms of over- ANTF networks, many of them including tumor suppressors connectivity and underconnectivity compared with the expected (TP53 and SMAD3) and oncogenes (C/EBPB and STAT3) number of links based on the size of the data set. We developed a encoding for TFs as central hubs highlighting a key role new statistical procedure (Supplementary Fig. S6A and B) and for transcriptional regulation in tumorigenesis (Supplementary applied it to define overconnectivity and underconnectivity for all Table S2). 30 amplicons in 140 pair-wise combinations. We found that the Next, we applied the AN Receptors (ANR) algorithmthat adds number of direct interactions in pairs varied substantially, selectivity of the entire pathway fromthe mostlikely ligand to demonstrating a complex regulatory pattern among amplicons one membrane as additional prioritization criteria (24). (Supplementary Excel File S5). Twenty-six and 114 amplicon pairs The highest scored ANR subnetwork was centered around the were overconnected and underconnected, respectively, relative to ESR1 nuclear (Supplementary Fig. S5; Fig. 3). the expected number of links. The largest number of interactions ESR1 was not amplified in our data set, although it was was observed between the overconnected pair 8q12-q22 and previously reported to have copy number gain in a subset of 17q11-q21 (15 links) and the underconnected pair 8q23-q24 and breast tumors (25), but it had significantly more direct targets 17q21-q25 (39 links). Amplicon 7p15 (HOXA gene cluster) had among amplified genes than expected by chance highlighting its intraconnections with the lowest P value, indicating cross- importance in breast cancer. The algorithm selected the amplified regulation among HOX genes in this amplicon. In general, IGF1R as the most likely receptor connected with ESR1 via SP1 interconnectivity among amplicons was higher than intra- signaling path between the two. Other important nodes of the connectivity within individual amplicons. Three amplicons ESR1 network included BRCA1 (downstreamof ATM and (17q21-q25, 20q12-q13, and 8q23-q24) had the highest number of upstreamof ESR1) and amplified MYC (downstreamof BRCA1 interconnections with other amplicons and nearly half of all and CDK4; Fig. 3). interconnections in the amplicome involved these three amplicons

Cancer Res 2008; 68: (22). November 15, 2008 9536 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Networks of Genetic Alterations in Breast Cancer

(Supplementary Excel File S5). Amplicon 12q13-q21 was the copy number analysis. Alternatively, some functional gene catego- most overconnected amplicon interacting with seven other ries are preferentially affected by copy number gain versus amplicons. mutation. Among 17 interaction mechanisms tested, interactions between On the other hand, the mutome was closely interlinked with amplicons were enriched in transcription regulation (TR) links, individual amplicons via transcriptional regulation interactions defined as physical binding of TFs to the transcription regulatory (Supplementary Table S3; Supplementary Fig. S7). The most regions of their target genes (Supplementary Excel File S5). Out of prominent and statistically significant (P < 0.007) relationship 65 amplicon pairs featuring TR interactions, 49 were over- was between mutome and amplicon 15q26 that included eight connected and only 16 underconnected. Two amplicons were direct interactions between TFs in the mutome and their targets particularly enriched with outgoing TR interactions: 8q23-q24 and present in this amplicon (Supplementary Excel File S8). 20q12-q13 that transcriptionally regulated 18 (102 TR links) and Importantly, TR interactions between mutome and amplicome 14 (52 TR links) amplicons, respectively. The most prominent TF were largely unidirectional. The absolute number of outgoing in amplicon 8q23-q24 was c-Myc as it had the highest number of interactions was higher in the direction from mutome to outgoing TR links. In contrast, in amplicon 20q12-q13, several TFs amplicome than visa versa (Supplementary Excel File S8). This (C/EBP-h, ZNF-217, and TFAP2C) were equally prominent. These finding suggested that the mutome was enriched for TFs, data were consistent with the above-described functional analysis whereas the amplicome was enriched for the targets of these demonstrating that amplicon 20q12-q13 was enriched in TFs genes. To further investigate this, we applied Transcription Target (Supplementary Excel File S2), whereas amplicon 8q23-q24 has Modeling (TTM) algorithm to the mutome data set and mapped the highest number of outgoing interactions (Supplementary amplified genes on it. TFM algorithm reconstructs canonical Excel File S4). pathways from membrane receptors to TFs most densely Comparative analysis of breast cancer amplicome and populated with the seed genes. The top-scored TTM network mutome. Next, we investigated functional interactions among connected cytokine IL12 with the TF ERM and IL18 with NF-kB genes genetically altered due to amplification (amplicome) or in 6 and 7 steps, correspondingly (Supplementary Fig. S7A). Both somatic mutations (mutome) in breast cancer. The mutome gene pathways were highly enriched in mutated genes. The IL12 set was defined as a nonredundant union of 1,188 somatically subnetwork resulted in amplification of IFN g, the direct target of mutated genes from two large-scale genome sequencing studies (3, ERM. The second scored network enriched with DNA exchange 12). Nine hundred sixty-five of these 1,188 genes had interactions in genes, linked the mutated ATM, ATR, BRCA1, and BRCA2 genes MetaCore database (Supplementary Excel File S6). We also with the amplified RAD9, BRIP1, and EMSY (Supplementary analyzed the set of 140 CAN (candidate cancer) genes that were Fig. S8B). Interestingly, mutated genes were always upstream of likely to play a casual role in tumorigenesis (11). the amplified ones on these networks. Contrary to amplified genes, mutated genes were randomly In total, 23 mutated TFs were overconnected with amplicome distributed in the genome (Supplementary Excel File S6). Only 94 genes with the highest overconnection scores observed for TFs genes were part of both the mutome and the amplicome, and only ATF2 and NFYC (Supplementary Table S4). Seven of CAN TFs had 12 fromthe CAN genes were amplified, which is a statistically direct targets among amplified genes: TP53 (52 targets), HNF1A significantly smaller set than expected (P = 0.048 and P = 0.456, (25 targets), GLI1 (4 targets), BRCA1 (3 targets), PLU1 (2 targets), respectively; Supplementary Excel File S7; Table 1). Enrichment RFX2, and TAF1 (1 target each; Supplementary Excel File S9). analysis of 94 mutated and gained genes did not show significantly In addition to TFs, several other CAN genes of different function enriched pathways, processes, and cellular processes (data not had a large number of downstream interactions with amplicome shown). One possible reason for the small overlap between two genes including NOTCH1 (11 interactions), HDAC4 (5 interac- data sets could be that the mutational analysis was conducted on a tions), and FLNA (4 interactions). Among amplified genes, the different and much smaller set of tumors than what we used for gene encoding for cytoskeletal actin had the highest number (5)

Table 1. Mutated and amplified genes in breast cancer. An intersection between CAN genes and amplified gene sets

Gene symbol ID Chr localization Protein functional class Protein name

ARFGEF2 10564 20q13.13 Regulator Brefeldin A-inhibited guanine nucleotide-exchange protein 2 CENTG1 116986 12q14.1 Regulator Centaurin-g 1 HOXA3 3200 7p15-p14 protein Hox-A3 NCOA6 23054 20q11 Binding protein coactivator 6 PPM1E 22843 17q22 Phosphatase Protein phosphatase 1E RIMS2 9699 8q22.3 Binding protein Regulating synaptic membrane exocytosis protein 2 SLC6A3 6531 5p15.3 Transporter Sodium-dependent dopamine transporter SULF2 55959 20q12-q13.2 Metabolic enzyme Extracellular sulfatase Sulf-2 precursor TG 7038 8q24 Binding protein Thyroglobulin precursor THBS3 7059 1q21 Binding protein Thrombospondin-3 precursor ZFP64 55734 20q13.2 Binding protein protein 64, isoforms 3 and 4 Binding protein Zinc finger protein 64, isoforms 1 and 2 ZNF569 148266 19q13.12 Binding protein Zinc finger protein 569

www.aacrjournals.org 9537 Cancer Res 2008; 68: (22). November 15, 2008

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Cancer Research

Figure 4. Functional synergy between amplicome and mutome. The canonical pathway map ‘‘Cell adhesion–Chemokines and adhesion’’ was densely populated with amplified (red rectangles) and mutated (blue boxes) genes. Note that the genes were either amplified or mutated. of upstreamlinks with mutatedgenes ( DBN1, FLNB, GSN, TLN1, Connectivity of mutome and amplicome proteins within and TARA); MYC and IGF1R each had four links (BRCA1, HDAC4, sets and with the global proteome. Overall, both the mutome NOTCH1, and TP53; and BRCA1, HNF1A, NOTCH1, and TP53), and the amplicome showed somewhat higher connectivity than respectively; and Cyclin D1 had three links (BRCA1, GLI1, and the global human interactome. The degree was 11.75 and 9.579 HDAC4). Based on these data, BRCA1 and HDAC4 were the two for the mutome and the amplicome, respectively, compared with mutated genes with the highest number of interactions with 9.133 for the entire human interactome (Supplementary Excel amplified genes. File S10). This may indicate that the mutome is slightly enriched

Cancer Res 2008; 68: (22). November 15, 2008 9538 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Networks of Genetic Alterations in Breast Cancer in hubs (highly connected proteins). Interestingly, the difference Fyn, and BUBR1; and proteases caspase-3, calpain 3, Granzyme B, between the connectivity of the mutome and the amplicome presenelin 1, and caspase-7. was higher for outgoing links than for incoming links. On Within the amplicome, C/EBPh, HOXB4, HOXB1, cKrox, and average, mutome and amplicome genes had 11.44 and 7.978 c-Myc were the top-scored self-regulatory TFs. The highest scored OUT interactions, respectively, compared with only 8.56 in the overconnected receptors included ITGB4, FGFR1, PGD2R, ADRB3, global interactome. This phenomenon was particularly striking and PIGR, whereas the most connected ligands included K12, in the case of the CAN genes that featured 22.1 OUT interactions MIP-1a, Substance P, Neuropeptide W, and IL-24. The most (2.6-fold more than expected). These data were consistent relevant intraconnected kinases were MAPKAPK2, CLK2, Fructos- with the observations above indicating that the mutome is amine-3-kinase, Casein kinase Iy, and PKC-a; proteases included generally a ‘‘regulating’’ data set. Intraconnectivity within data CAAX prenyl protease 2, cathepsin F, ADAM-33, matrix metal- sets was comparable between mutome and amplicome (2.942 loproteinase (MMP)-17, and MMP-25. (Supplementary Excel for mutome and 2.548 for amplicome; Supplementary Excel File S13). File S10). Ontology enrichment analysis of the breast cancer mutome Subsequently, we evaluated the relative distribution of genes and amplicome. To identify gene functions and pathways encoding for proteins with different molecular function within the enriched in the breast cancer mutome and amplicome, we mutome and the amplicome (Supplementary Excel File S11). The performed enrichment analysis in five functional ontologies mutome had fewer secreted ligands than expected but was (canonical pathways, process networks, disease biomarkers, GO enriched in kinases (2-fold over expected) and receptors. The processes, and GO molecular functions). Both the mutome and amplicome contained slightly fewer receptors than expected. The amplicome were enriched for genes and processes involved in fraction of TFs was similar in both data sets. tumorigenesis (Supplementary Excel File S14). The mutome was Next, we evaluated intraconnectivity and interconnectivity of enriched for cell adhesion, cell cycle, and DNA damage pathways individual proteins in the mutome and the amplicome, (this later was especially pronounced for the CAN genes), specifically searching for statistically significantly overconnected whereas the amplicome was enriched for developmental path- and underconnected proteins. We analyzed separately intercon- ways. With respect of disease biomarkers (e.g., genes linked to nections between each sets (mutome and amplicome), and the over 500 diseases based on genetic and biochemical data), the global proteome (Supplementary Excel File S12) and for intra- amplicome was highly enriched for genes associated with breast À À connections within mutome and amplicome (Supplementary cancer (P = 2.03 Â 10 28) and skin disease (3.48 Â 10 17). Excel File S13). The largest overconnected hubs in the case of Surprisingly, the top 15 disease markers enriched in the mutome interconnections were proteins outside the sets (e.g., proteins did not include cancer-associated biomarkers (with the excep- encoded by neither gained nor mutated genes). ESR1 was the tion of Leukemia), but markers for different nervous system most important transcription hub (overconnected TF) for the disorders (Supplementary Excel File S14). This suggests that amplicome. ESR1 had 102 direct targets among amplified genes activation of tumorigenic pathways by gene amplification is a (128 interactions total), which was statistically significantly more significant mechanism driving breast tumorigenesis than (P < 0.00003) higher than the expected 73; a 63% enrichment somatic mutations. (Supplementary Excel File S12; Fig. 3). In contrast, ESR1 was Synergy in ontology enrichment distributions between statistically significantly (P = 0.03) underconnected with the breast cancer mutome and amplicome. To determine if mutome (41 TR links with mutated genes compared with the mutated and amplified genes cooperatively alter certain path- expected 54). This could potentially be due to the different ways and networks, we analyzed synergy in ontology enrichment distribution of (ER)-positive and ER-negative distributions between mutome and amplicome and found high tumors within the amplicome and mutome, although ESR1 degree of functional connectivity. Synergy was defined by a lower remained a main transcriptional regulator of amplified genes P value for an ontology entity in the distribution of the even when we restricted the amplicome analysis to ER-negative nonredundant union of amplicome and mutome gene content tumors (data not shown). Interestingly, the amplicome was compared with amplicome and mutome individually. For specifically enriched in protein-encoding ESR1 target genes but example, pathway maps ‘‘Cell adhesion_Chemokines and adhe- À À not in ESR1 binding sites in general. About 11% of all ERa sion’’ scored P =3.2Â 10 12 for the union set and 3.1 Â 10 9 binding sites, determined based on ChIP-on-chip studies in for the mutome (Supplementary Excel File S14). The three orders MCF-7 cells (26), are located in the amplicome, which constitutes of magnitude difference in P values reflects the enhancement of f10% of the genome (Supplementary Table S5). The size of the mutant genes by amplified genes in this particular pathway individual amplicons correlated (R = 0.94) with the number of and indicates functional connectivity between amplicome and ERa sites they contained. mutome because a union of functionally disconnected data sets Other notable statistically significant outside (not amplified) would yield higher P values in enrichment analysis due to its overconnected hubs for the amplicome included TFs ESR2, RUNX2, larger size. PIT1, CSDA, and MBD1; ligands SPP1, LAMA4, CTGF, PDGF-A, and Enrichment synergy between amplicome and mutome was CCL24; EPHA receptors and PGE2R4, kinases PRKCD, PRKACB, evident for certain functional ontologies canonical pathway and MAP2K5, mammalian target of rapamycin, and PRKCZ and processes networks (Fig. 4). The highest synergy in canonical proteases SENP2, ABHD7, AMPL, CTRC, and CLPP (Supplementary pathways was found for ‘‘Cell adhesion_Chemokines and Excel File S12). adhesion,’’ ‘‘Cytoskeleton remodeling_TGF, WNT and cytoskeletal The top statistically significant overconnected TFs within the remodeling,’’ ‘‘Cell cycle_ATM/ATR regulation of G1-S check- mutome were HIVEP1, IRF7, NRSF, NEUROD2, and PAX8; receptors point,’’ and ‘‘Development_IGF-RI signaling’’ (three orders ROS1, FCGR1A, LILRB3, IGF1R, and E-selectin; ligands IFNA5, of magnitude). These synergies suggest close interactions IL-12a, FGF10, Jagged2, and IFN-g; kinases Chk1, LKB1, ZIP-kinase, between mutated and amplified genes in these selected www.aacrjournals.org 9539 Cancer Res 2008; 68: (22). November 15, 2008

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Cancer Research processes. In the ontology of process networks, the highest Disclosure of Potential Conflicts of Interest degree of synergy was observed for ‘‘Inflammation_IL10 anti- K. Polyak: research support and consultant, Novartis Pharmaceuticals Inc.; inflammatory response’’ and ‘‘Cell adhesion_Cadherins’’ (two consultant, GeneGo, Inc. and Aveo Pharmaceuticals, Inc. The other authors disclosed orders of magnitude enhancement compared with amplicome no potential conflicts of interest. or mutome alone). In summary, our results strongly support the conclusion that a Acknowledgments combination of multiple genetic alterations is necessary for the Received 8/11/2008; revised 8/26/2008; accepted 9/2/2008. aberrant self-sustaining activation of multiple tumorigenic signal- Grant support: National Cancer Institute (CA89393 and CA94074) grants (K. Polyak), NCI (CA134175) to Y. Nikolsky, and NCI (CA112828) to T. Nikolskaya, ing pathways in breast cancer. Systemic genetic and bioinformatics and by GeneGo, Inc. analyses of human tumors, such as our report presented here, can The costs of publication of this article were defrayed in part by the payment of page lay the groundwork for future translational studies exploring the charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. potential therapeutic targeting of key regulators of the malignant We thank Dr. Andrea Richardson for help with the acquisition of tumor samples and cellular phenotype. Drs. WilliamHahn and Matthew Meyerson for their critical reading of the manuscript.

References 9. Pollack JR, Sorlie T, Perou CM, et al. Microarray type aspecific DNA methylation patterns in nthe human analysis reveals a major direct role of DNA copy breast. Proc Natl Acad Sci U S A 2008;105:14076–81. 1. Bai T, Luoh SW. GRB-7 facilitates HER-2/Neu-mediat- number alteration in the transcriptional program of 19. Aguirre AJ, Brennan C, Bailey G, et al. High- ed signal transduction and tumor formation. Carcino- human breast tumors. Proc Natl Acad Sci U S A 2002; resolution characterization of the pancreatic adeno- genesis 2008;29:473–9. 99:12963–8. carcinoma genome. Proc Natl Acad Sci U S A 2004;101: 2. Ethier SP. Identifying and validating causal genetic 10. Chin K, DeVries S, Fridlyand J, et al. Genomic and 9067–72. alterations in human breast cancer. Breast Cancer Res transcriptional aberrations linked to breast cancer 20. Kallioniemi A, Kallioniemi OP, Piper J, et al. Detection Treat 2003;78:285–7. pathophysiologies. Cancer Cell 2006;10:529–41. 3. Greenman C, Stephens P, Smith R, et al. Patterns of and mapping of amplified DNA sequences in breast somatic mutation in human cancer genomes. Nature 11. Lin J, Gan CM, Zhang X, et al. A multidimensional cancer by comparative genomic hybridization. Proc Natl 2007;446:153–8. analysis of genes mutated in breast and colorectal Acad Sci U S A 1994;91:2156–60. 4. Yao J, Weremowicz S, Feng B, et al. Combined cDNA cancers. Genome Res 2007;17:1304–18. 21. Michalak P. Coexpression, coregulation, and cofunc- array comparative genomic hybridization and serial 12. Wood LD, Parsons DW, Jones S, et al. The genomic tionality of neighboring genes in eukaryotic genomes. analysis of gene expression analysis of breast tumor landscapes of human breast and colorectal cancers. Genomics 2008;91:243–8. progression. Cancer Res 2006;66:4065–78. Science 2007;318:1108–13. 22. Cusick ME, Klitgord N, Vidal M, Hill DE. Interactome: 5. Clark J, Edwards S, John M, et al. Identification of 13. Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network- gateway into systems biology. Hum Mol Genet 2005;14 amplified and expressed genes in breast cancer by based classification of breast cancer metastasis. Mol Spec No. 2:R171–81. comparative hybridization onto microarrays of random- Syst Biol 2007;3:140. 23. Yu H, Zhu X, GreenbaumD, Karro J, Gerstein M. ly selected cDNA clones. Genes Chromosomes Cancer 14. Pujana MA, Han JD, Starita LM, et al. Network TopNet: a tool for comparing biological sub-networks, 2002;34:104–14. modeling links breast cancer susceptibility and centro- correlating protein properties with topological statistics. 6. LathamC, Zhang A, Nalbanti A, et al. Frequent co- some dysfunction. Nat Genet 2007;39:1338–49. Nucleic Acids Res 2004;32:328–37. amplification of two different regions on 17q in 15. Hernandez P, Sole X, Valls J, et al. Integrative analysis 24. Ekins S, Nikolsky Y, BugrimA, Kirillov E, Nikolskaya aneuploid breast carcinomas. Cancer Genet Cytogenet of a cancer somatic mutome. Mol Cancer 2007;6:13. T. Pathway mapping tools for analysis of high content 2001;127:16–23. 16. Shipitsin M, Campbell LL, Argani P, et al. Molecular data. Methods Mol Biol 2007;356:319–50. 7. Sinclair CS, Rowley M, Naderi A, Couch FJ. The 17q23 definition of breast tumor heterogeneity. Cancer Cell 25. Holst F, Stahl PR, Ruiz C, et al. Estrogen receptor a amplicon and breast cancer. Breast Cancer Res Treat 2007;11:259–73. (ESR1) gene amplification is frequent in breast cancer. 2003;78:313–22. 17. Hu M, Yao J, Carroll DK, et al. Regulation of in situ to Nat Genet 2007;39:655–60. 8. Ursini-Siegel J, Schade B, Cardiff RD, Muller WJ. invasive breast carcinoma transition. Cancer Cell 2008; 26. Lupien M, Eeckhoute J, Meyer CA, et al. FoxA1 Insights from transgenic mouse models of ERBB2- 13:394–406. translates epigenetic signatures into enhancer-driven induced breast cancer. Nat Rev Cancer 2007;7:389–97. 18. Bloushtain-Qimron N, Yao J, Snyder EL, et al. Cell lineage-specific transcription. Cell 2008;132:958–70.

Cancer Res 2008; 68: (22). November 15, 2008 9540 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research. Genome-Wide Functional Synergy between Amplified and Mutated Genes in Human Breast Cancer

Yuri Nikolsky, Evgeny Sviridov, Jun Yao, et al.

Cancer Res 2008;68:9532-9540.

Updated version Access the most recent version of this article at: http://cancerres.aacrjournals.org/content/68/22/9532

Supplementary Access the most recent supplemental material at: Material http://cancerres.aacrjournals.org/content/suppl/2008/11/18/68.22.9532.DC1

Cited articles This article cites 26 articles, 8 of which you can access for free at: http://cancerres.aacrjournals.org/content/68/22/9532.full#ref-list-1

Citing articles This article has been cited by 15 HighWire-hosted articles. Access the articles at: http://cancerres.aacrjournals.org/content/68/22/9532.full#related-urls

E-mail alerts Sign up to receive free email-alerts related to this article or journal.

Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Subscriptions Department at [email protected].

Permissions To request permission to re-use all or part of this article, use this link http://cancerres.aacrjournals.org/content/68/22/9532. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site.

Downloaded from cancerres.aacrjournals.org on September 24, 2021. © 2008 American Association for Cancer Research.