Research Article

Visualizing as Transcriptome Correlation Maps: Evidence of Chromosomal Domains Containing Co-expressed —A Study of 130 Invasive Ductal Breast Carcinomas

Fabien Reyal,1,2 Nicolas Stransky,1 Isabelle Bernard-Pierrot,1 Anne Vincent-Salomon,3 Yann de Rycke,4 Paul Elvin,9 Andrew Cassidy,9 Alexander Graham,9 Carolyn Spraggon,9 Yoann De´sille,1 Alain Fourquet,5 Claude Nos,2 Pierre Pouillart,6 Henri Magdele´nat,7 Dominique Stoppa-Lyonnet,3 Je´roˆme Couturier,3 Brigitte Sigal-Zafrani,3 Bernard Asselain,4 Xavier Sastre-Garau,3 Olivier Delattre,8 Jean Paul Thiery,1,7 and Franc¸ois Radvanyi1

1Unite´ Mixte de Recherche 144, Centre National de la Recherche Scientifique; Departments of 2Surgery, 3Tumor Biology, 4Biostatistics, 5Radiotherapy, 6Medical Oncology, and 7Translational Research; and 8U509, Institut National de la Sante´ et de la Recherche Me´dicale, Institut Curie, Paris, France and 9Astra Zeneca, Alderley Park, United Kingdom

Abstract Introduction Completion of the working draft of the has Large-scale transcriptome analyses based on DNA microarrays made it possible to analyze the expression of genes have facilitated the classification of cancers into biologically according to their position on the chromosomes. Here, we distinct categories, some of which may explain the clinical used a transcriptome data analysis approach involving for behaviors of tumors (1–3). Such analyses may also help to find each the calculation of the correlation between its new prognostic and predictive markers (4, 5). Completion of the expression profile and those of its neighbors. We used the initial working draft of the human genome has made it possible U133 Affymetrix transcriptome data set for a series of 130 to interpret transcriptome data in a new way, by directly invasive ductal breast carcinomas to construct chromosomal assigning the genome-wide, high-throughput gene expression maps of gene expression correlation (transcriptome corre- profiles to the human genome sequence (6). A few recent studies lation map). This highlighted nonrandom clusters of genes have explored the relationship between the transcriptome and along the genome with correlated expression in tumors. the positions of genes on chromosomes. Using data from serial Some of the gene clusters identified by this method analysis of gene expression (SAGE) in a range of human normal probably arose because of genetic alterations, as most of and tumor tissues, Caron et al. (7) showed that highly expressed the chromosomes with the highest percentage of correlated genes are often found in clusters in specific chromosomal genes (1q, 8p, 8q, 16p, 16q, 17q, and 20q) were also the regions, called regions of increased gene expression. By most frequent sites of genomic alterations in breast cancer. expressed sequence tag collection data analysis, Zhou et al. (8) Our analysis showed that several known breast tumor have compared normal and tumor tissues in 10 different tissue amplicons (at 8p11-p12, 11q13, and 17q12) are located types and showed clusters of genes exhibiting increased within clusters of genes with correlated expression. Using expression along chromosomes in tumors. Many of these hierarchical clustering on samples and a Treeview repre- genomic regions corresponded to known amplicons. By compar- sentation of whole arms, we observed a ing serial analysis of gene expression data for normal bronchial higher-order organization of correlated genes, sometimes epithelium, adenocarcinomas, and squamous cell carcinomas of involving very large chromosomal domains that could the lung, Fujii et al. (9) identified clusters of overexpressed and extend to a whole chromosome arm. Transcription correla- underexpressed genes. They also showed that in squamous cell tion maps are a new way of visualizing transcriptome data. carcinomas of the lung, about half of these clusters were located They will help to identify new genes involved in tumor progression and new mechanisms of gene regulation in in imbalanced chromosomal regions previously identified by tumors. (Cancer Res 2005; 65(4): 1376-83) cytogenetic, comparative genomic hybridization, or loss of heterozygosity studies. Two studies have directly explored the relationship between DNA copy number alterations and gene expression in breast tumors using cDNA microarrays and found that 40% to 60% of the highly amplified genes were overex- pressed (10, 11). Therefore, clustering of overexpressed genes Note: F. Reyal, N. Stransky, and I. Bernard-Pierrot contributed equally to this work. could be attributable, at least in part, to underlying gene copy B. Sigal-Zafrani contributed on behalf of the Institut Curie Breast Cancer Group (see Acknowledgments). number alterations. The different approaches mentioned above Supplementary data for this article are available at Cancer Research Online established a relationship between the transcriptome and the (http://cancerres.aacrjournals.org/). The original figures of the article as well as the chromosomal location of the genes by analyzing the tran- supplementary data can be found at http://microarrays.curie.fr/publications/ oncologie_moleculaire/breast_TCM/. scriptome considering the expression of each gene individually. Requests for reprints: Franc¸ois Radvanyi, Unite´ Mixte de Recherche 144, Institut The genes are subsequently grouped into chromosomal domains Curie-CNRS, 26 rue dVUlm, 75248 Paris Cedex 05, France. Phone: 33-1-42-34-63-39; Fax: 33-1-42-34-63-49; E-mail: [email protected]. according to the expression data. A second approach to explore I2005 American Association for Cancer Research. the relationship between the transcriptome and the organization

Cancer Res 2005; 65: (4). February 15, 2005 1376 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research. Transcriptome Correlation Map of Breast Carcinomas of the genome was developed first on budding yeast tran- probe set was not available, it was obtained using the basic local scriptome data (12) and subsequently on Drosophila tran- alignment search tool–like alignment tool program (16) for sequence- f scriptome data (13). This transcriptome data analysis approach matching searches of probe set–specific target sequences (an 600 searches for groups of neighboring genes that show correlated nucleotide sequence from which probe set oligonucleotides are derived) against the University of California Santa Cruz Human Genome Working expression profiles. Both groups identified chromosomal Draft sequence, or by using the position of their corresponding UniGene domains of similarly expressed neighboring genes. In normal Cluster (Homo sapiens UniGene Build 164). All positions refer to the July cells, these chromosomal domains of co-regulated genes may 2003 Human Genome Working Draft. represent chromatin-regulated regions or groups of genes Calculation of Transcription Correlation Scores and Identification regulated by the same transcription factor(s). We developed a of Neighboring Genes with Correlated Expression. A similar method to similar computational method for the analysis of transcriptome those described in yeast and Drosophila (12, 13), based on transcriptome data to identify chromosomal domains containing co-expressed array data, has been developed in our laboratory (by N. Stransky) to evaluate genes in cancer and applied it to a series of 130 invasive ductal the correlation between the expression profile of each gene and those of its neighbors. For each probe set (gene), we calculated a score, which we called breast carcinomas. the transcriptome correlation score. This score is the sum of the Spearman rank order correlation values in the tumor samples between the RNA levels of this gene and the RNA levels of each of the physically nearest 2n genes (n Materials and Methods centromeric genes and n telomeric genes). Patients and Breast Tumor Samples. We analyzed the gene To determine a significance threshold (i.e., a score above which a gene is expression profiles of 130 infiltrating ductal primary breast carcinomas. considered to have a similar expression pattern to its neighbors), we created These carcinomas were obtained from 130 patients who were included 100 random data sets of the same size by randomly ordering the 16,215 in the prospective database initiated in 1981 by the Institut Curie Breast probe sets on the genome. For each random set, transcriptome correlation Cancer Group between 1989 and 1999. The flash-frozen tumor samples scores were calculated for each probe set. The significance threshold was were stored at À80jC immediately after lumpectomy or mastectomy. All the 500th quantile of the distribution (i.e., the value for which 1 of 500 probe tumor samples contained >50% cancer cells, as assessed by H&E sets in the random data sets were above this value). Probe sets with a score staining of histologic sections adjacent to the samples used for the exceeding the threshold are consequently significantly correlated with their transcriptome analysis. The clinical data for the patients and the neighbors within a number of 2n probe sets at P < 0.002 and are called histologic characteristics of the tumors are summarized in Supplemen- ‘‘correlated probe sets.’’ tary Table S1. This study was approved by the institutional review To determine the appropriate number (2n) of neighboring genes needed boards of Institut Curie. to calculate the transcriptome correlation score for each gene of our data RNA Extraction and Microarray Data Collection. RNA was extracted set, the total number of genes with a score above the threshold was from all tumor samples by the cesium chloride protocol (14, 15). The calculated as a function of 2n (13) for several values ranging from 2 to 34 concentration and the integrity/purity of each RNA sample were measured (Supplementary Fig. S1). Above 2n = 20, this number reaches a plateau. using RNA 6000 LabChip kit (Agilent Technologies, Palo Alto, CA) and the Therefore, 20 neighboring genes were used to calculate the transcriptome Agilent 2100 bioanalyzer. The DNA microarrays used in this study were the correlation score. Human Genome U133 set (HG-U133, Affymetrix, Santa Clara, CA), For each of the 16,215 probe sets used, we calculated the transcriptome consisting of two GeneChip arrays (A and B), and containing almost correlation score. For each chromosome, we obtained a diagram (the 45,000 probe sets. Each probe set consisted of 22 different oligonucleotides transcriptome correlation map) representing this score for each probe set, (11 of which are a perfect match with the target transcript and 11 of which organized according to its chromosomal position. harbor a one-nucleotide mismatch in the middle). These 22 oligonucleo- Correlation between Adjacent Groups of Correlated Genes. To tides were used to measure the level of a given transcript. Details of the RNA determine if there was a correlation between adjacent groups of correlated amplification, labeling, and hybridization steps are available from http:// genes, we analyzed the correlated probe sets by one-way unsupervised www.affymetrix.com. Chips were scanned and the intensities for each probe hierarchical clustering on samples (17) and by leaving the probe sets set were calculated using Affymetrix MAS5.0 default settings. The mean organized according to their chromosomal position. intensity of the probe sets for each array was set to a constant target value Software Used for Data Analyses. Statistical analysis and all (500) by linearly scaling the array signal intensities. calculations were done using R 1.9.0 (http://www.r-project.org). We used Selection of Probe Sets: Attribution of a Unique Probe Set to Each the HKIS software (http://isoft.free.fr/hkis/) to look up gene positions in Gene. The fact that some genes are represented by several probe sets could public databases and for data formatting. We used Java TreeView 1.0.5 introduce artifacts when looking for neighboring genes with similar (http://jtreeview.sourceforge.net/) to make representations of Eisen clusters expression patterns because the intensities of these probe sets are highly (17) obtained with Cluster 3.0 (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/ correlated. To avoid this artifact, probe sets corresponding to uncharac- software/cluster/). terized expressed sequence tags were removed from the analysis, and when several probe sets corresponded to the same gene (i.e., if different probe sets had the same title, the same GenBank ID or belonged to the same UniGene Results Cluster), a single probe set was kept for the analysis. Probe sets with an Generation of a Transcriptome Map Based on the ‘‘_at’’ extension were preferentially kept because they tend to be more Correlated Expression of Neighboring Genes (Transcriptome specific according to Affymetrix probe set design algorithms. Probe sets Correlation Map) in Infiltrating Ductal Breast Carcinoma. with an ‘‘_s_at’’ extension were the second best choice followed by all other The RNA expression data of 130 ductal invasive breast extensions. When several probe sets with the same extension were available carcinomas were obtained using the Affymetrix U133 set. Of the for one gene, the one with the highest median value was kept. From an 45,000 probes in the initial set, 16,215 probe sets, each initial list of f45,000 probe sets, we kept 16,215 ‘‘unique’’ probe sets corresponding to a unique gene, were kept to establish corresponding to a unique gene. Due to the univocal correspondence between these 16,215 probe sets and genes, we will use the terms ‘‘probe chromosomal transcriptome maps (see Materials and Methods). set’’ and ‘‘gene’’ indifferently. These maps highlight genes that show a correlated expression Chromosomal Location of the Probe Sets. Each of the 16,215 probe with their neighbors. The transcriptome correlation maps for sets represents a unique gene. Their genomic locations were obtained genes located on chromosomes 1 and 2 are shown in Fig. 1. The from the U133 annotation files from Affymetrix. When the position of the significance threshold was defined as the 500th quantile of the www.aacrjournals.org 1377 Cancer Res 2005; 65: (4). February 15, 2005

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research. Cancer Research

Figure 1. Transcriptome correlation map of chromosomes 1 and 2 in infiltrating ductal breast carcinoma. The transcriptome correlation score (TC score) for a given gene indicates the strength of the correlation between the expression of this gene and the expression of the 20 neighboring genes (see Materials and Methods). The transcriptome correlation map shows the scores for the different genes as a function of their position along the chromosome. Left, transcriptome correlation maps of and chromosome 2. Dashed line, significance threshold: Genes with scores above this threshold are considered to have a significantly correlated expression with that of their neighbors. This significant threshold was calculated using random data sets (see Materials and Methods). Right, examples of random data set for chromosome 1 and chromosome 2. distribution of the resulting 1.6 million transcriptome correlation correlation score differed significantly between the different scores in the random data sets and was equal to 3.38. Examples chromosome arms (Fig. 3). The chromosome arms with the highest of a random data set for chromosomes 1 and 2 are given in Fig. 1 percentages (higher than 30%) were 1q (243 of 750 genes), 8p (82 of (right panels). The transcriptome correlation maps of the different 207 genes), 8q (185 of 364 genes), 14q (174 of 527 genes), 16p (179 of chromosomes, including chromosomes 1 and 2, showed that 379 genes), 16q (110 of 318 genes), 17q (303 of 704 genes), and 20q genes with a transcriptome correlation score above the threshold (101 of 304 genes). Beside chromosome arm 14q, all these were not distributed uniformly along the chromosomes: groups of chromosome arms are also the most frequent locations of genomic adjacent genes with correlated expression could be seen (Fig. 1 alterations in breast cancer. We analyzed chromosome arms 1q, 8p, and Supplementary Fig. S2 and Table S2). Interestingly, the genes 8q, 16p, 16q, and 17q in more detail (Figs 4–6 and Supplementary corresponding to the three well-known amplification regions in Fig. S4). For this analysis, we considered only genes with a breast cancer [8p11-p12 (FGFR1 ), 11q13 (CCDN1 locus), and transcriptome correlation score above the threshold (Figs. 4A-6A 17p12 (ERBB2 locus)] were present in chromosomal regions and Supplementary Fig. S4A). The expression patterns of these containing genes with transcriptome correlation scores higher genes, ordered according to their chromosomal location, were than the threshold (Fig. 2). examined in the 130 breast cancer samples by using an unsupervised Large Chromosomal Domains of Co-expressed Genes. hierarchical clustering representation (one-way clustering on Overall, 20% of the genes had a significant transcriptome correlation samples). Very similar expression patterns were obtained for the score (i.e., their expression was correlated with that of their 243 genes on chromosome 1q with a significant transcriptome neighbors). The percentage of genes with a significant transcriptome correlation score (Fig. 4B). These genes extended from PEX11B

Cancer Res 2005; 65: (4). February 15, 2005 1378 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research. Transcriptome Correlation Map of Breast Carcinomas

(1q21.1) to ELYS (1q44), spanning a 100,677-kb region. A group of KIAA0924 (17q21.32) to LOC51321 (17q24.2), and a group of 88 genes with a similar expression pattern was also apparent on the genes (f ) spanning 14,219 kb from SLC16A6 (17q24.2) to MGC4368 long arm of (Fig. 5B) and extended to the (17q25.3 (17925.3) (Fig. 6). centromeric region of chromosome 8p [213 genes, spanning 108,140 kb from FLJ14299 (8p12) to KIAA0014 (8q24.3)]. The 8p11- Discussion p12 amplified region containing the FGFR1 locus was located at the edge of this domain. A very different expression pattern was Microarray technology makes it possible to monitor the observed for the genes telomeric to the FGFR1 locus [54 genes, expression of thousand of genes simultaneously. Completion of spanning 25,203 kb from LOC84549 (8p12) to DKFZp761P0423 the initial working draft of the human genome has made it (8p23.1)]. The 289 genes on chromosome 16 (Supplementary Fig. possible to analyze the expression of the genes according to S4B) with a transcriptome correlation score higher than the their position on the genome. Comparison of the expression threshold corresponded to two different sets of genes. One was a patterns of adjacent genes in different expression array data sets group of 110 genes all localized on 16q and spanning 43,127 kb in yeast (12, 18), Drosophila (13), and worms (19) has led to the from VPS35 (16q11.2) to FANCA (16q24.3) and the second was a identification of groups of physically adjacent genes that share group of 179 genes all localized on 16p, spanning 30,901 kb from similar expression profiles. In this work, we used this new BCKDK (16p11.2) to RGS11 (16p13.3). In contrast, numerous approach to analyze large-scale transcriptome data concerning different expression patterns were observed on chromosome 17q cancer. (Fig. 6B). At least six different sets of genes were found: A group of To identify systematically co-expressed adjacent genes along the 36 genes (a) spanning 4,034 kb from TNFAIP1 (17q11.2) to ZNF207 chromosomes, we calculated an expression correlation score for (17q11.2), a group of 37 genes (b) spanning 4,110 kb from FLJ22865 each given gene. This score was the sum of the correlation values (17q12) to SMARCE1 (17q21.2), a group of 64 genes (c) spanning between the RNA expression levels of this gene and the RNA 7,045 kb from KRTAP4-15 (17q21.2) to SCAP1 (17q21.32), a group of expression levels of each of the 20 neighboring genes (10 four genes (d) spanning 78 kb from HOXB2 (17q21.32) to HOXB7 centromeric genes and 10 telomeric genes). Using the U133 (17q21.32), a group of 74 genes (e) spanning 19,006 kb from Affymetrix transcriptome data for a series of 130 invasive ductal

Figure 2. Transcriptome correlation maps of regions localized on chromosomes 8, 11, and 17 containing known amplicons in infiltrating ductal breast carcinoma. The transcriptome correlation maps of chromosome 8 (region p21.2 to q11.21; left), chromosome 11 (region q11 to q14.1; middle), and (region q11.2 to q21.31; right) are shown. Vertical bars, position of the centromeres. Bold squares, transcriptome correlation scores (TC score) for FGFR1, CCND1, and ERBB2. www.aacrjournals.org 1379 Cancer Res 2005; 65: (4). February 15, 2005

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research. Cancer Research

Figure 3. Percentage of genes with a transcriptome correlation score (TC score) higher than the threshold for each chromosomal arm in infiltrating ductal breast carcinoma. For each chromosomal arm, the number of genes with a transcriptome correlation score higher than the significant correlation score divided by the total number of considered genes on the chromosome arm was calculated and expressed as a percentage.

breast carcinomas, we constructed a genome-wide map that we suggest that FGFR1 is not the driving oncogene in the 8p11-p12 called the transcriptome correlation map. This map highlights the amplicon. Several new candidate oncogenes located in this region ‘‘correlated genes’’ (i.e., genes that share a similar expression profile have been suggested to play a causal role in breast cancer with their neighbors). progression in tumors with an amplification in the 8p11-p12 We have found in this series of breast tumors that f20% of the genes showed a significant correlation with their neighbors. The transcription correlation maps of the different chromo- somes revealed regions with a high percentage of correlated genes separated by regions containing genes not significantly correlated with their neighbors. The precise physical definition of the correlated regions was not always straightforward as a continuum between the correlated regions was observed in some cases. We addressed the effect of the gene densities on the percentage of correlated genes in the transcription correlation map. Unlike for the regions of increased gene expression described by Caron et al. (7), we observed no systematic correlation between gene density and the percentage of genes that are over the threshold on the transcription correlation map (Supplementary Fig. S3). Genetic or nongenetic mechanisms could account for the correlation in expression between neighboring genes. Aneusomy will affect in the same way the expression of a series of adjacent genes not subjected to gene dosage compensation. This has been described both for DNA losses and gains in yeast (20) and in humans (10, 11, 21, 22). Nongenetic mechanisms could also affect the expression of neighboring genes. Several models or combina- tions of models have been proposed: long range effect of transcription factors, chromatin structure modification, and increased concentration of components of the transcriptional machinery due to a particular subnuclear location of chromo- somal segments (23). Genomic alterations most likely explain part of the correlation between neighboring genes in breast tumors because, except for chromosome arm 14q, the chromosome arms presenting the highest percentage of correlated genes (8q, 51%; 16p, 47%; 17q, 43%; 8p, 40%; 16q, 35%; 20q, 33%; 1q, 32%) were Figure 4. Chromosomal domains of co-expressed genes on chromosome 1q. A, also known to harbor frequent chromosome imbalance, as shown transcriptome correlation map of chromosome 1. B, unsupervised hierarchical by karyotypic, comparative genomic hybridization, or loss of cluster analysis of 130 infiltrating ductal breast carcinomas using the genes on the long arm of chromosome 1 with a significant transcriptome correlation heterozygosity studies (24–30). Two well-characterized amplicons, score (TC score). Each row corresponds to one tumor sample and each column the FGFR1 amplicon at 8p11-p12 and the ERBB2 amplicon at corresponds to one gene. The genes are arranged in cytogenetic order from the centromere to the q telomere: red, high level of expression relative to the mean 17q12, corresponded to regions presenting a high percentage of expression level in the 130 tumor samples; green, low level of gene expression. correlated genes. Very recent data on breast cancer cell lines The genes under line a correspond to the genes contained in rectangle a of A.

Cancer Res 2005; 65: (4). February 15, 2005 1380 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research. Transcriptome Correlation Map of Breast Carcinomas

correlated in these regions. This percentage is in good agreement with the results of Hyman et al. (10) and Pollack et al. (11), who showed that f50% of the genes within an amplified region are overexpressed. We found chromosomal regions harboring genes with correlated expression in regions known to contain amplicons like 8p11-p12, 11q13, and 17q12 in regions exhibiting frequent single chromosome arm gain like 1q, 8q, and 16p, and in regions exhibiting chromosomal losses like 16q. To determine the exact genetic mechanisms responsible for the correlation in expression between adjacent genes, it will be necessary to obtain, in parallel with transcriptome data, genome data like those obtained by comparative genomic hybridization arrays (35). The combination of comparative genomic hybridization array data and transcription correlation map analysis should help us to identify genes involved in tumor progression. Groups of genes with correlated expression were rarer but also present in regions not affected by genetic alterations in breast cancer like chromosome arms 2p or 9q. Our new approach for analyzing the transcriptome in tumors may pinpoint genes involved in tumor progression, the expression of which is altered by nongenetic mechanisms. Using hierarchical clustering and a Treeview representation, we were able to show that regions with genes with similar expression patterns can extend over very large chromosomal

Figure 5. Chromosomal domains of co-expressed genes on chromosome 8. A, transcriptome correlation map of chromosome 8. B, unsupervised hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using the genes of chromosome 8 with a significant transcriptome correlation score (TC score). Each row, one tumor sample; each column, one gene. The genes are arranged in cytogenetic order from the p telomere to the q telomere: red, high level of expression relative to the mean expression level in the 130 tumor samples; green, low level of gene expression. The genes under line a (or b) correspond to the genes contained in rectangle a (or b)ofA. region. Of the 15 genes with a significant transcriptome correlation score from this region, nine (HTPAP, KIAA0725, FGFR1, BAG4, LSM1, RCP, BRF2, PROSC, and FLJ14299)have already been proposed to be potential oncogenes (31). The ERBB2 region at 17q12 is of particular interest in breast cancer. Amplification and overexpression of ERBB2 is associated with poor clinical outcome and is found in 15% of infiltrating breast carcinomas. Moreover, ERBB2 gene amplification determines the response to specific antibody-based therapy (trastuzumab; ref. 32). All seven of the genes located in the minimal ERBB2 amplification region (280 kb) that presented an overexpression associated with amplification (STARD3, PNMT, TCAP, CAB2, ERBB2, MGC14832, and GRB7; ref. 33) corresponded to correlated genes in the 17q12 region. The 11q13 region is often amplified in breast cancer, and CCND1 was initially thought to be the main oncogene in this region. Although CCND1 was one of the correlated genes, many other genes in this region had similar or higher transcriptome correlation scores, possibly because Figure 6. Chromosomal domains of co-expressed genes on chromosome multiple mechanisms for increasing the expression of CCND1 17q. A, transcriptome correlation map of chromosome 17. B, unsupervised occur frequently in addition to amplification (33) and/or due hierarchical cluster analysis of 130 infiltrating ductal breast carcinomas using to the complexity of the 11p13 amplification region, which the genes on the long arm of chromosome 17 with a significant transcriptome correlation score (TC score). Each row corresponds to one tumor sample probably contains several different amplicons (34). It should be and each column corresponds to one gene. The genes are arranged noted that within the different regions rich in correlated genes, in cytogenetic order from the centromere to the q telomere: red, high level f of expression relative to the mean expression level in the 130 tumor samples; in particular 11q13 and 17q12, 50% of the genes were found green, low level of gene expression. The genes under line a (or b, c, d, e, f) to be correlated, meaning that the other 50% of genes are not correspond to the genes under line a (or b, c, d, e, f)ofA. www.aacrjournals.org 1381 Cancer Res 2005; 65: (4). February 15, 2005

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research. Cancer Research domains, sometimes involving a whole chromosome arm. This levels of the genes included in the CCND1 cluster). The phenomenon could be due to genetic or nongenetic mecha- systematic investigation of such relationship could help to nisms. Changes in whole chromosome gene expression due to identify combinations of events that occur during tumor aneuploidy have already been described in yeast and in human progression. (20–22). Modification of gene expression involving large chro- Transcriptome correlation maps are a new way of interpreting mosomal domains or even entire chromosomes could also be transcriptome data. Combined with other molecular data (i.e., due to epigenetic mechanisms (36). chromosomal alteration data obtained by comparative genomic The ‘‘transcriptome correlation map’’ approach has two strong hybridization arrays or large-scale methylation analysis; refs. points: (a) although potentially very useful, knowledge about gene 35, 37, 38), transcriptome correlation map can be applied to any expression in normal tissue is not compulsory. This is particularly tumor type and will help to identify new genes involved in useful for tumors for which the normal counterparts are difficult tumor progression and new mechanisms of gene regulation in to obtain, such as breast or ovarian carcinomas, or when the tumors. cellular origin of the tumor is unknown, like Ewing tumors, synovialosarcoma, or medulloblastomas; (b) it will be possible to compare the transcriptome correlation maps between different groups of samples even if the data for the different groups are Acknowledgments obtained on different platforms because transcriptome correlation Received 7/29/2004; revised 10/13/2004; accepted 12/9/2004. map does not compare the gene expression in different groups Grant support: Centre National de la Recherche Scientifique, Institut Curie Breast Cancer program, European FP5 IST HKIS project, and Comite´ de Paris Ligue Nationale but the correlation of expression between neighboring genes Contre le Cancer (Laboratoire associe´); Ligue Nationale Contre le Cancer fellowship within a group of samples. Additionally, subsets of genes with a (F. Reyal and I. Bernard-Pierrot) and French Ministry of Education and Research fellowship (N. Stransky). significant correlation score could be used to classify tumors into The Institut Curie Breast Cancer Group: Bernard Asselain, Alain Aurias, Emmanuel meaningful anatomoclinical groups. It should be noted that the Barillot, Franc¸ois Campana, Patricia De Cremoux, Olivier Delattre, Veronique Die´ras, relationship between regions of correlated genes can occur Jean-Marc Extra, Alain Fourquet, Henri Magdele´nat, Martine Meunier, Claude Nos, Thao Palangie´, Pierre Pouillart, Marie-France Poupon, Franc¸ois Radvanyi, Xavier not only between adjacent regions but also between regions Sastre-Garau, Brigitte Sigal-Zafrani, Dominique Stoppa-Lyonnet, Anne Tardivon, on different chromosome arms or even different chromo- Fabienne Thibault, Jean Paul Thiery, and Anne Vincent-Salomon. The costs of publication of this article were defrayed in part by the payment of page somes (e.g., higher levels of expression of the genes included charges. This article must therefore be hereby marked advertisement in accordance in the ERBB2 cluster are associated with lower expression with 18 U.S.C. Section 1734 solely to indicate this fact.

References human breast tumors. Proc Natl Acad Sci U S A 2002;99: expression profiles in a cell line model for prostate 12963–8. carcinogenesis. Cancer Res 2001;61:8143–9. 1. Perou CM, Sorlie T, Eisen MB, et al. Molecular 12. Cohen BA, Mitra RD, Hughes JD, Church GM. A 23. Oliver B, Parisi M, Clark D. Gene expression portraits of human breast tumours. Nature 2000; computational analysis of whole-genome expression neighborhoods. J Biol 2002;1:4. 406:747–52. data reveals chromosomal domains of gene expression. 24. Tirkkonen M, Tanner M, Karhu R, Kallioniemi A, 2. Sorlie T, Perou CM, Tibshirani R, et al. Gene Nat Genet 2000;26:183–6. Isola J, Kallioniemi OP. Molecular cytogenetics of expression patterns of breast carcinomas distinguish 13. Spellman PT, Rubin GM. Evidence for large domains primary breast cancer by CGH. Genes Chromosomes tumor subclasses with clinical implications. Proc Natl of similarly expressed genes in the Drosophila genome. Cancer 1998;21:177–84. Acad Sci U S A 2001;98:10869–74. J Biol 2002;1:5. 25. Forozan F, Mahlamaki EH, Monni O, et al. 3. Sorlie T, Tibshirani R, Parker J, et al. Repeated 14. Coombs LM, Pigott D, Proctor A, Eydmann M, Comparative genomic hybridization analysis of 38 observation of breast tumor subtypes in independent Denner J, Knowles MA. Simultaneous isolation of breast cancer cell lines: a basis for interpreting gene expression data sets. Proc Natl Acad Sci U S A DNA, RNA, and antigenic exhibiting kinase complementary DNA microarray data. Cancer Res 2003;100:8418–23. activity from small tumor samples using guanidine 2000;60:4519–25. 4. van’t Veer LJ, Dai H, van de Vijver MJ, et al. Gene isothiocyanate. Anal Biochem 1990;188:338–43. 26. Cingoz S, Altungoz O, Canda T, Saydam S, expression profiling predicts clinical outcome of breast 15. Chirgwin JM, Przybyla AE, MacDonald RJ, Rutter WJ. Aksakoglu G, Sakizli M. DNA copy number changes cancer. Nature 2002;415:530–6. Isolation of biologically active ribonucleic acid from detected by comparative genomic hybridization and 5. van de Vijver MJ, He YD, van’t Veer LJ, et al. A gene- sources enriched in ribonuclease. Biochemistry their association with clinicopathologic parameters in expression signature as a predictor of survival in breast 1979;18:5294–9. breast tumors. Cancer Genet Cytogenet 2003;145: cancer. N Engl J Med 2002;347:1999–2009. 16. Kent WJ. BLAT—the BLAST-like alignment tool. 108–14. 6. Collins FS, Morgan M, Patrinos A. The Human Genome Res 2002;12:656–64. 27. Janssen EA, Baak JP, Guervos MA, van Diest PJ, Jiwa Genome Project: lessons from large-scale biology. 17. Eisen MB, Spellman PT, Brown PO, Botstein D. M, Hermsen MA. In lymph node-negative invasive Science 2003;300:286–90. Cluster analysis and display of genome-wide expression breast carcinomas, specific chromosomal aberrations 7. Caron H, van Schaik B, van der Mee M, et al. The patterns. Proc Natl Acad Sci U S A 1998;95:14863–8. are strongly associated with high mitotic activity and human transcriptome map: clustering of highly 18. Kruglyak S, Tang H. Regulation of adjacent yeast predict outcome more accurately than grade, tumour expressed genes in chromosomal domains. Science genes. Trends Genet 2000;16:109–11. diameter, and oestrogen receptor. J Pathol 2003;201: 2001;291:1289–92. 19. Lercher MJ, Blumenthal T, Hurst LD. Coexpression of 555–61. 8. Zhou Y, Luoh SM, Zhang Y, et al. Genome-wide neighboring genes in Caenorhabditis elegans is mostly 28. Jong YJ, Li LH, Tsou MH, et al. Chromosomal identification of chromosomal regions of increased due to operons and duplicate genes. Genome Res comparative genomic hybridization abnormalities in tumor expression by transcriptome analysis. Cancer Res 2003;13:238–43. early- and late-onset human breast cancers: correlation 2003;63:5781–4. 20. Hughes TR, Roberts CJ, Dai H, et al. Widespread with disease progression and TP53 mutations. Cancer 9. Fujii T, Dracheva T, Player A, et al. A preliminary aneuploidy revealed by DNA microarray expression Genet Cytogenet 2004;148:55–65. transcriptome map of non-small cell lung cancer. profiling. Nat Genet 2000;25:333–7. 29. Kirchweger R, Zeillinger R, Schneeberger C, Speiser P, Cancer Res 2002;62:3340–6. 21. Virtaneva K, Wright FA, Tanner SM, et al. Expression Louason G, Theillet C. Patterns of allele losses suggest 10. Hyman E, Kauraniemi P, Hautaniemi S, et al. Impact profiling reveals fundamental biological differences in the existence of five distinct regions of LOH on of DNA amplification on gene expression patterns in acute myeloid leukemia with isolated trisomy 8 and chromosome 17 in breast cancer. Int J Cancer breast cancer. Cancer Res 2002;62:6240–5. normal cytogenetics. Proc Natl Acad Sci U S A 2001;98: 1994;56:193–9. 11. Pollack JR, Sorlie T, Perou CM, et al. Microarray 1124–9. 30. Dutrillaux B, Gerbault-Seureau M, Zafrani B. Char- analysis reveals a major direct role of DNA copy 22. PhillipsJL,HaywardSW,WangY,etal.The acterization of chromosomal anomalies in human number alteration in the transcriptional program of consequences of chromosomal aneuploidy on gene breast cancer. A comparison of 30 paradiploid cases

Cancer Res 2005; 65: (4). February 15, 2005 1382 www.aacrjournals.org

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research. Transcriptome Correlation Map of Breast Carcinomas

with few chromosome changes. Cancer Genet Cytoge- 33. Kauraniemi P, Kuukasjarvi T, Sauter G, Kallioniemi A. ative genomic hybridization to microarrays. Nat Genet net 1990;49:203–17. Amplification of a 280-kilobase core region at the 1998;20:207–11. 31. Ray ME, Yang ZQ, Albertson D, et al. Genomic and ERBB2 locus leads to activation of two hypothetical 36. Grewal SI, Moazed D. Heterochromatin and epigenet- expression analysis of the 8p11-12 amplicon in in breast cancer. Am J Pathol 2003;163: ic control of gene expression. Science 2003;301:798–802. human breast cancer cell lines. Cancer Res 2004; 1979–84. 37. Zardo G, Tiirikainen MI, Hong C, et al. Integrated 64:40–7. 34. Ormandy CJ, Musgrove EA, Hui R, Daly RJ, Suther- genomic and epigenomic analyses pinpoint biallelic 32. Nahta R, Hung MC, Esteva FJ. The HER-2-targeting land RL. Cyclin D1, EMS1 and 11q13 amplification in gene inactivation in tumors. Nat Genet 2002; 32:453–8. antibodies trastuzumab and pertuzumab synergistically breast cancer. Breast Cancer Res Treat 2003;78:323–35. 38. Huang TH, Perry MR, Laux DE. Methylation profiling inhibit the survival of breast cancer cells. Cancer Res 35. Pinkel D, Segraves R, Sudar D, et al. High resolution of CpG islands in human breast cancer cells. Hum Mol 2004;64:2343–6. analysis of DNA copy number variation using compar- Genet 1999;8:459–70.

www.aacrjournals.org 1383 Cancer Res 2005; 65: (4). February 15, 2005

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research. Visualizing Chromosomes as Transcriptome Correlation Maps: Evidence of Chromosomal Domains Containing Co-expressed Genes−−A Study of 130 Invasive Ductal Breast Carcinomas

Fabien Reyal, Nicolas Stransky, Isabelle Bernard-Pierrot, et al.

Cancer Res 2005;65:1376-1383.

Updated version Access the most recent version of this article at: http://cancerres.aacrjournals.org/content/65/4/1376

Supplementary Access the most recent supplemental material at: Material http://cancerres.aacrjournals.org/content/suppl/2005/10/12/65.4.1376.DC1

Cited articles This article cites 38 articles, 17 of which you can access for free at: http://cancerres.aacrjournals.org/content/65/4/1376.full#ref-list-1

Citing articles This article has been cited by 9 HighWire-hosted articles. Access the articles at: http://cancerres.aacrjournals.org/content/65/4/1376.full#related-urls

E-mail alerts Sign up to receive free email-alerts related to this article or journal.

Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Subscriptions Department at [email protected].

Permissions To request permission to re-use all or part of this article, use this link http://cancerres.aacrjournals.org/content/65/4/1376. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site.

Downloaded from cancerres.aacrjournals.org on September 25, 2021. © 2005 American Association for Cancer Research.