Oncogene (2004) 23, 1377–1391 & 2004 Nature Publishing Group All rights reserved 0950-9232/04 $25.00 www.nature.com/onc

Gene expression profiling of colon cancer by DNA microarrays and correlation with histoclinical parameters

Franc¸ois Bertucci1,2,3,Se´ bastien Salas1,9,Se´ verine Eysteries1,9, Vale´ ry Nasser1,9, Pascal Finetti1, Christophe Ginestier1, Emmanuelle Charafe-Jauffret1,4,Be´ atrice Loriod5, Loı¨ c Bachelart1,Je´ roˆ me Montfort5, Genevie` ve Victorero5, Fre´ de´ ric Viret2, Vincent Ollendorff6, Vincent Fert7, Marc Giovaninni2, Jean-Robert Delpero8, Catherine Nguyen5,7, Patrice Viens2,3, Genevie` ve Monges1,3,4, Daniel Birnbaum*,1,7 and Re´ mi Houlgatte5,7

1De´partement d’Oncologie Mole´culaire, Institut Paoli-Calmettes (IPC) and U119 Inserm, IFR57, Marseille, France; 2De´partement d’Oncologie Me´dicale, IPC, Marseille, France; 3Faculte´ de Me´decine, Universite´ de la Me´diterrane´e, Marseille, France; 4De´partement de Biopathologie, IPC, Marseille, France; 5Laboratoire TAGC, ERM206 Inserm, Marseille, France; 6Institut Me´diterrane´en de Recherche en Nutrition, Marseille, France; 7Ipsogen SA, Marseille, France; 8De´partement de Chirurgie, IPC, Marseille, France

Different diagnostic and prognostic groups of colorectal Introduction carcinoma (CRC) have been defined.However, accurate diagnosis and prediction of survival are sometimes difficult. Despite progresses made during the last decades, color- expression profiling might improve these classifica- ectal cancer (CRC) remains one of the most frequent tions and bring new insights into underlying molecular and deadly neoplasias in the Western countries. Current mechanisms.We profiled 50 cancerous and noncancerous prognostic models based on histoclinical parameters colon tissues using DNA microarrrays consisting of B8000 apprehend only poorly the heterogeneity of disease and spotted human cDNA.Global hierarchical clustering was are not accurate enough for prediction in the individual to some extent able to distinguish clinically relevant patient. Tumours with different genetic alterations, subgroups, normal versus cancer tissues and metastatic developing in different host backgrounds, may have versus nonmetastatic tumours.Supervised analyses im- the same clinical presentation but follow quite different proved these segregations by identifying sets of that evolutions. One goal of molecular analysis is to identify, discriminated between normal and tumour tissues, tumours among complex networks of molecules involved in associated or not with lymph node invasion or genetic progression, markers that could differentiate subgroups instability, and tumours from the right or left colon.A of tumours with distinct behaviour and prognosis similar approach identified a gene set that divided patients allowing patients to receive appropriate treatment. with significantly different 5-year survival (100% in one Molecular studies have been largely focused on group and 40% in the other group; P ¼ 0.005). Discrimi- individual candidate genes, contrasting with the mole- nator genes were associated with various cellular processes. cular complexity of disease. The multistep progression An immunohistochemical study on 382 tumour and normal of CRC requires years and possibly decades and is samples deposited onto a tissue microarray subsequently accompanied by a number of genetic alterations (KRAS, validated the upregulation of NM23 in CRC and a APC, P53 and mismatch repair (MMR) genes, WNT downregulation in poor prognosis tumours.These results and TGFb pathways) that accumulate and combine suggest that microarrays may provide means to improve their effects (Vogelstein et al., 1988; Fearon and the classification of CRC, provide new potential targets Vogelstein, 1990). Despite the huge amount of studies, against carcinogenesis and new diagnostic and/or prog- applications remain limited for CRC patients. Little is nostic markers and therapeutic targets. known about molecular alterations associated with the Oncogene (2004) 23, 1377–1391. doi:10.1038/sj.onc.1207262 heterogeneity of disease, and no molecular marker has been validated for use as a new diagnostic or prognostic Keywords: colon cancer; DNA microarray; expression parameter applicable to routine clinical practice. New profiles; microsatellite instability; prognosis models based on a deeper molecular understanding of disease are required to improve screening, diagnosis, treatment and ultimately survival. DNA microarray technology allows the measure of the mRNA expression level of thousands of genes *Correspondence: D Birnbaum, U.119 Inserm, IFR57, 27 Bd. Leı¨ simultaneously in a single assay, thus providing a Roure, 13009 Marseille, France; E-mail: [email protected] molecular definition of a sample adapted to tackle the serm.fr 9These authors have equally contributed combinatory and complex nature of cancers (Bertucci Received 24 June 2003; revised 30 September 2003; accepted 7 October et al., 2001; Mohr et al., 2002). Gene expression 2003 profiling can reveal biologically and/or clinically DNA microarray and colon cancer F Bertucci et al 1378 relevant subgroups of tumours (Alizadeh et al., 2000; microarrays containing B8000 spotted PCR products Garber et al., 2001; Beer et al., 2002; Bertucci et al., fromknown genes and expressed sequence tags (ESTs). 2002; Devilard et al., 2002; Singh et al., 2002) and Analyses of normalized expression levels were both improve our mechanistic understanding of oncogenesis. unsupervised and supervised. Gene expression profiling-based studies of CRC have Unsupervised hierarchical clustering of all samples so far compared normal and tumour tissue samples or was first applied, based on the expression profile of different stages of disease (Alon et al., 1999; Backert genes with a significant variation in expression level et al., 1999; Hegde et al., 2001; Kitahara et al., 2001; across all the experiments. Results are displayed in a Notterman et al., 2001; Agrawal et al., 2002; Birken- colour-coded matrix (Figure 1a) where samples are kamp-Demtroder et al., 2002; Lin et al., 2002; Zou et al., ordered on the horizontal axis and genes on the vertical 2002; Frederiksen et al., 2003; Tureci et al., 2003; axis on the basis of similarity of their expression profiles. Williams et al., 2003), but none has directly addressed The 50 samples were sorted into two large clusters that the issue of prognosis or MSI phenotype. Here we have extensively differed with respect to normal or cancer applied DNA microarray technology to analyse the type (Figure 1b, top): 87% were noncancerous in the left expression of B8000 genes in 50 cancer and noncancer- cluster and 87% were cancerous in the right cluster. As ous colon tissue samples. Unsupervised hierarchical expected, the CRC cell lines represented a branch of the clustering discovered new subgroups of tumours. ‘cancer’ cluster. Hierarchical clustering also allowed Supervised analysis identified several genes differentially identification of gene clusters corresponding to defined expressed between normal and cancer samples and functions or cell types, several of which are indicated by between subgroups of colon cancer defined upon coloured bars on the right of Figure 1a and are zoomed histoclinical parameters, including clinical outcome in Figure 1b. Three are overexpressed in tissue samples and MSI phenotype. Some of the potential markers overall as compared to epithelial cell lines, reflecting the were further studied using immunohistochemistry (IHC) cell heterogeneity of tissues: an ‘immune cluster’ with on sections of tissue microarrays (TMA). different subclusters including an MHC class I sub- cluster that correlated with an interferon-related sub- cluster, and an MHC class II subcluster, a ‘stromal cluster’ rich in genes expressed in stromal cells Results (COL1A1, COL1A2, COL3A1, MMP2, TIMP1, SPARC, CSPG2, PECAM, INHBA), and a ‘smooth Gene expression profiling of CRC and unsupervised muscle cluster’ (CNN1, CALD1, DES, MYH11, SMTN, classification TAGL) that was globally overexpressed in normal The mRNA expression profiles of 50 cancer and tissues as compared to cancer tissues. An ‘early-response noncancerous colon samples (45 clinical tissue samples cluster’ included immediate-early genes (JUNB, FOS, and five cell lines) (Table 1) were determined using DNA EGR1, NR4A1, DUSP1) involved in the human cellular

Table 1 Characteristics of cancer samples profiled using DNA microarrays Patient Sex Age Location Grade pT UICC pN UICC AJCC stage MSI status Treatment Outcome (months)

7650 M 74 Descending colon G pT3 pN1 4 (liver) MSI pS+pCT AWC 4 8582 F 80 Ascending colon P pT3 pN3 4 (liver) MSI pS D 1 7442 M 64 Transverse colon G pT3 pN1 4 (liver) MSS pS+pCT D 32 8208 M 40 Transverse colon M pT3 pN2 4 (liver) MSS cS+adj CT D 41 7835 F 72 Transverse colon G pT3 pN3 4 (liver) MSS pS+pCT D 17 8656 F 57 Descending colon G pT3 pN2 4 (liver) MSS cS+adj CT AWC 66 8031 F 46 Descending colon G pT3 pN2 3 MSS cS+adj CT MR 4 - D 7 6927 M 71 Descending colon G pT3 NA NA MSS cS+adj CT NED 10 9118 F 75 Ascending colon G pT3 pN1 2 MSI cS+adj CT NED 56 8904 M 80 Descending colon G pT3 pN1 2 MSI cS NED 18 6974 M 68 Ascending colon P pT3 pN1 2 MSI cS+adj CT NED 97 8646 M 74 Descending colon G pT3 pN1 2 MSS cS NED 63 8458 M 56 Descending colon G pT3 pN1 2 MSS cS+adj CT NED 69 6992 F 65 Ascending colon G pT3 pN1 2 MSS cS+adj CT NED 98 7094 F 87 Descending colon G pT3 pN1 2 MSS cS NED 64 8252 F 54 RectumG pT4 pN1 2 MSS cS+adj CT NED 74 9075 F 45 Ascending colon G pT2 pN1 1 MSI cS MR 23 - D 38 7505 M 71 Ascending colon G pT1 pN1 1 MSI cS NED 88 7043 M 70 Descending colon G pT2 pN1 1 MSS cS NED 97 6952 M 58 Descending colon G pT2 pN1 1 MSS cS NED 65 7597 F 72 RectumG pT2 pN1 1 MSS cS NED 87 7815 M 63 RectumG pT2 pN1 1 MSI cS MR 10 - D 40

Sex: M, male; F, female. Grade: G, good; M, moderate; P, poor. pT, pathological staging of primary tumour; UICC, International Union Against Cancer; pN, pathological staging of regional lymph nodes; AJCC, American Joint Committee on Cancer; NA, not available, but no metastasis; MSI, microsatellite instability. Treatment: pS, palliative surgery; cS, curative surgery; pCT, palliative chemotherapy; adj CT, adjuvant chemotherapy. Outcome: AWC, alive with cancer; D, death from cancer; MR, metastatic relapse; NED, no evidence of disease

Oncogene DNA microarray and colon cancer F Bertucci et al 1379

Figure 1 Global gene expression profiles in colorectal cancer and noncancerous samples. (a) Hierarchical clustering of 50 samples and 3631 cDNA clones based on mRNA expression levels. Each row represents a clone and each column represents a sample. Expression level of each gene in a single sample is relative to its median abundance across all samples and is depicted according to a colour scale shown at the bottom. Red and green indicate expression levels respectively above and below the median. The magnitude of deviation from the median is represented by the colour saturation. Grey indicates missing data. Dendrograms of samples (above matrix) and genes (to the left of matrix) represent overall similarities in gene expression profiles. For samples, black branches represent normal tissues (n ¼ 23), red branches represent cancer tissues (n ¼ 22) and purple branches represent cancer cell lines (n ¼ 5). Coloured bars to the right indicate the locations of seven gene clusters of interest. These clusters, except the ‘proliferation cluster’ (brown bar), are zoomed in (b). (b) Top: Dendrogram of samples: tissue samples are designated with numbers followed by N for noncancerous tissue and T for tumour tissue. Down: Expanded view of selected gene clusters named from top to bottom: ‘MHC class II’, ‘stromal’, ‘MHC class I’, ‘interferon-related’, ‘early response’ and ‘smooth muscle’. Genes are referenced by their HUGO abbreviation as used in ‘Locus Link’. (c) Dendrogram of samples representing the results of the same hierarchical clustering applied to the 22 cancer tissue samples. Two groups of samples (A and B) are defined. Sample names and branches highlighted in blue and in red are samples respectively without and with metastatic disease at diagnosis (labelled by *) or during follow-up. ‘A’ means alive at last follow-up and ‘D’ means dead fromCRC response to environmental stress. Conversely, a very ‘AJCC1–3 patients’ of group B relapsed nor died after a large cluster (the brown cluster in Figure 1a), so-called median follow-up of 69 months (range 10–98). This ‘proliferation cluster’, was generally overexpressed in suggests that tumours are at higher risk of metastasis in cell lines as compared to tissues, probably reflecting the group A than in group B. To identify particular sets of proliferation rate difference between cells in culture and genes that could better define subgroups of samples, we tumour tissues. It included PCNA that codes for a next focused on supervised analyses. proliferation marker used in clinical practice as well as many genes involved in glycolysis (such as GAPD, LDHA, ENO1), in cell cycle and mitosis (such as CDK4, Differential gene expression between normal colon and BUB3, TUBB), in metabolism (such as ALDH3A1), colon tumours cytochrome c oxidase subunits, GSTP1, and in synthesis such as genes coding for ribosomal To identify and rank genes with significant differential (data not shown). expression between cancer (22 samples) and noncancer- Remarkably, the same clustering algorithm applied ous tissues (23 samples), we applied discriminating score only to the 22 CRC clinical samples sorted two groups (DS) combined with iterative random permutation tests. of tumours (A, 12 patients and B, 10 patients) that We identified 245 cDNA clones (see Supplemental differed with respect to AJCC stage and clinical Table 1), 130 of which were overexpressed and 115 outcome (Figure 1c). Group A included a high propor- underexpressed in cancer samples. They corresponded tion of patients presenting with metastases at diagnosis to 237 unique sequences that represented 191 different (AJCC4 stage, six out of 12) as compared with group B known genes and 46 ESTs. The function of known (0 out of 10). Interestingly, three out of six ‘AJCC1–3 genes, as given in the OMIM and LocusLink databases patients’ of group A experienced metastatic relapse after (NCBI website), and their chromosomal location are a median duration of 18 months (range 4–88) from listed in Supplemental Table 1. Samples were then diagnosis and died fromCRC, while none out of 10 reclustered on the basis of these genes (Figure 2) with a

Oncogene DNA microarray and colon cancer F Bertucci et al 1380 Table 2 Characteristics of cancer samples profiled using tissue microarrays Characteristics All patients (n ¼ 191)

Sex (M/F) 99/92 Median age, years (range) 64 (29–97) Location of tumour Ascending colon 47 Transverse colon 9 Descending colon 110 Rectum21 NA 4

Grade Good 130 nil 47 Poor 14

pT UICC 116 221 3 127 427

pN UICC 188 248 354 NA 1

Vascular invasion No 115 Yes 68 Figure 2 Hierarchical classification of tissue samples using genes NA 8 discriminator between normal and cancer samples. (a) Hierarchical clustering of the 45 colon tissue samples using expression levels of AJCC stage the 245 cDNA clones differentially expressed between normal and 129 cancer samples. Dendrogram of samples is zoomed in (b). (b) 251 Dendrogram: black branches represent normal tissues (n ¼ 23) and 343 red branches represent cancer tissues (n ¼ 22) 468

Surgery 191 Curative/palliative 131/59 good resulting discrimination between normal and NA 1 cancer samples. Chemotherapy 109 Adjuvant/palliative 60/49 Differential gene expression within CRC tissue samples No chemotherapy 80 A supervised approach was applied to the 22 cancer NA 2 tissue samples by comparing tumour subgroups defined Median follow-up, months (range) 74 (2, 133) upon relevant histoclinical parameters. Metastatic evolution 95 Metastatic relapsea 27 Genes associated with visceral metastases Progressionb 68

Presence or occurrence of metastasis is the leading cause Death fromCRC 90 of death in CRC. Accurate predictors of metastasis are needed to tailor therapeutic strategies and improve M, male; F, female; NA, not available; pT, pathological staging of survival. We identified 244 cDNA clones (see Supple- primary tumour; UICC, International Union Against Cancer; pN, mental Table 2), corresponding to 235 unique sequences pathological staging of regional lymph nodes; AJCC, American Joint Committee on Cancer; CRC, colorectal cancer representing 194 characterized genes and 41 ESTs, aAJCC1–3 patients; bAJCC4 patients which best discriminated between samples associated with and without metastasis at diagnosis or during follow-up. Among these clones, 219 were downregulated survival between the two groups were statistically and 25 were upregulated in metastatic samples as significant (Figure 3c). With a median follow-up of 66 compared to nonmetastatic samples. Hierarchical clus- months (range 4–98), the 5-year metastasis-free survival tering of samples based on the expression of these (MFS) and overall survival (OS) were 100% for group 1 selected genes (Figure 3a and b) successfully classified (n ¼ 11) and respectively 18 and 30% for group 2 them according to outcome, with only two nonmeta- (n ¼ 11) (P ¼ 0.0001 and 0.001). When only considering static samples misplaced in group 2. Differences of patients without metastatic disease at the time of

Oncogene DNA microarray and colon cancer F Bertucci et al 1381

Figure 3 Hierarchical classification of CRC tissue samples using genes discriminator between metastatic and nonmetastatic samples and correlation with survival. (a) Hierarchical clustering of the 22 CRC tissue samples based on expression levels of the 244 cDNA clones identified as differentially expressed between metastatic and nonmetastatic cancer samples. (b) Dendrogram of samples: blue represents samples without metastasis and red represents samples with metastasis at diagnosis (labelled by *) or during follow-up. ‘A’ means alive at last follow-up and ‘D’ means dead from CRC. The analysis delineates two groups of tumours. (c) Kaplan–Meier plots of MFS and OS of the two groups of samples (blue, group 1 and red, group 2) for all patients (left, n ¼ 22) and ‘AJCC1–3 patients’ (right, n ¼ 16) diagnosis (AJCC1–3 stage), the 5-year MFS and OS 89% for samples with metastasis and 77% for non- were 100% for group 1 (n ¼ 11) and 40% for group 2 metastatic samples. In fact, two misclassified samples (n ¼ 5) (P ¼ 0.005 and 0.006, respectively). Finally, when (8904T and 7650T) displayed intermediate correlations, only ‘AJCC1–2 patients’ (no metastatic disease and in agreement with their intermediate expression profile node-negative tumour at the time of diagnosis) were shown in Figure 3, and probably represent a different considered, they were 100% for group 1 (n ¼ 10) and class of samples. Although the predictive gene set 50% for group 2 (n ¼ 4) (P ¼ 0.019 and 0.022, respec- generated by each of the crossvalidation loops was tively). slightly different at each iteration, on average 80% of To estimate the accuracy of prediction of our the genes of our signature were conserved. molecular signature and the validity of our procedure, we used a ‘leave-one-out’ crossvalidation testing method Genes associated with lymph node metastases (Golub et al., 1999). Iteratively, one of the 22 cancer samples was removed from the group, and a new set of Pathological lymph node involvement at diagnosis is a discriminator genes was generated from the remaining strong prognostic parameter in CRC. Its determination 21 samples. The test sample was then classified with the relies on surgical dissection. But lymph node invasion is other samples by using the predictive gene set and imprecise for determining clinical outcome. Node status according to the difference of the correlation coefficients reflects the tumour expression of many genes, and subtle between its expression profile and the median profile of molecular alterations important for clinical outcome nonmetastatic samples and between its expression may be hidden. Large-scale expression analyses provide profile and the median profile of metastatic samples. It an opportunity to identify those genes. We identified 46 was classified as low or high risk of metastasis, cDNA clones (41 known genes and five ESTs) as depending on whether that difference was positive or significantly differentially expressed between tumours negative, respectively. This process was repeated for with (n ¼ 5) and without (n ¼ 16) lymph node metas- each of the 22 samples. The prediction was considered tases. Reclustering based on these 46 genes correctly correct if it corresponded to the actual outcome. Out of separated node-positive fromnode-negative samples 22 samples, 18 (82%) were correctly assigned by the (Figure 4a). The two samples (9075T and 7442T) that, predictors (data not shown), with an accuracy rate of among all node-negative cases, had expression patterns

Oncogene DNA microarray and colon cancer F Bertucci et al 1382

Figure 4 Hierarchical classification of CRC tissue samples using discriminator genes selected by supervised analyses based on lymph node status, MSI phenotype and location of tumours. (a) Hierarchical clustering of the 21 CRC tissue samples based on expression levels of the 46 cDNA clones that were significantly differentially expressed between lymph node-positive (LN þ , red branches and names) and lymph node-negative (LNÀ, blue branches and names) cancer samples. For each gene, IMAGE cDNA clone number, HUGO abbreviation and chromosomal location are given. EST is used for clones without similarity with known gene or protein. (b) Hierarchical clustering of the 22 CRC tissue samples based on expression levels of the 58 cDNA clones that were significantly differentially expressed between MSI þ (MSI, blue branches and names) and non-MSI (red branches and names) cancer samples. (c) Hierarchical clustering of the 22 CRC tissue samples based on expression levels of the 46 cDNA clones that were significantly differentially expressed between cancer samples from right colon (R, blue branches and names) and left colon (L, red branches and names)

more closely related to node-positive samples displayed ESTs) differentially expressed between right and left metastatic disease at the time of diagnosis (7442T) and colon cancers, correctly sorted samples from the right or 23 months after surgery (9075T), partly corroborating left colon (Figure 4c). Such discrimination agreed with the predictions based on molecular signature. the existence of two distinct categories of CRC according to the location of tumour (Iacopetta, 2002). Genes associated with MSI phenotype and with location of cancer Immunohistochemistry on tissue microarrays To obtain additional insights into colorectal oncogen- We sought to measure the IHC expression levels of five esis, we next analysed differential gene expression proteins coded by discriminator genes identified in our between MSI þ (n ¼ 8) and non-MSI (n ¼ 14) tumours supervised analyses. These genes were selected because and between tumours from right colon (n ¼ 6) and left of a putative role in colorectal oncogenesis or metastatic colon (n ¼ 13). process and the commercial availability of a correspond- We identified 58 cDNA clones (representing 51 ing monoclonal antibody: RACK1, NTRK2, NM23, known genes and five ESTs) with significant differential Decorin and Prohibitin. We constructed a TMA expression between MSI þ and non-MSI tumours. containing 191 pairs of cancer samples and the Their discriminator potential was confirmed by the corresponding normal mucosa (Table 2). Two anti- rather successful hierarchical classification of samples bodies – anti-RACK1 and anti-NTRK2 – did not based on their expression levels, even if some MSI þ performsufficiently well in IHC on paraffin-embedded tumours displayed an intermediate expression profile tissues. Moreover, as previously described (Ginestier (Figure 4b). Similarly, classification of 19 samples et al., 2002), correlation between protein and mRNA (excluding transverse colon tumours), based on the levels was not found for Decorin and Prohibitin. The expression of 46 cDNA genes (35 known genes and 11 results were satisfactory for anti-NM23 antibody, which

Oncogene DNA microarray and colon cancer F Bertucci et al 1383 detects both NME1 and NME2 proteins. Examples of Discussion IHC results using this antibody are shown in Figure 5. As found with DNA microarrays, NM23 was signifi- DNA microarray-based gene expression profiling is a cantly overexpressed in cancer samples as compared to promising approach to investigate the molecular com- noncancerous samples (P ¼ 5.6 Â 10À6, Fisher’s exact plexity of cancer. To date, CRC studies (Alon et al., test). In univariate analysis, it was significantly down- 1999; Backert et al., 1999; Hegde et al., 2001; Kitahara regulated in tumours with metastasis (cutoff was the et al., 2001; Notterman et al., 2001; Agrawal et al., 2002; median value) compared to tumours without metastasis Birkenkamp-Demtroder et al., 2002; Lin et al., 2002; (P ¼ 0.04, Fisher’s exact test). The 5-year MFS was 68% Zou et al., 2002; Frederiksen et al., 2003; Tureci et al., for negative and 88% for positive samples when 2003; Williams et al., 2003) have not directly addressed considering the ‘AJCC1–3 patients’ (P ¼ 0.03, log-rank the issue of prognosis or MSI phenotype. Here, we test). The 5-year OS was 42% for negative and 63% profiled 50 cancer and noncancerous colon tissue for positive samples when considering all the patients samples and correlated expression profiles with histocli- (P ¼ 0.03, log-rank test). Multivariate analysis was nical parameters of disease, including survival. then conducted to assess the significance of these associations when we took into account the other Unsupervised analysis classical prognostic factors (AJCC stage, grade of differentiation, vascular invasion, age of patients). The global gene expression profile revealed extensive Patients with positive NM23 status displayed a lower transcriptional heterogeneity between samples, notably risk of metastasis (adjusted odds ratio 0.37, 95% CI cancer samples. It was to some extent already able to 0.13–1.12) and death fromCRC (adjusted odds ratio distinguish clinically relevant subgroups of samples: 0.66, 95% CI 0.40–1.10) than patients with negative normal versus cancer tissues as previously reported NM23 status, but the differences were not significant (Bhattacharjee et al., 2001; Welsh et al., 2001a; Chen (P ¼ 0.07 and 0.12, respectively). et al., 2002; Hippo et al., 2002; Matei et al., 2002;

Figure 5 Analysis of NM23 protein expression in colorectal tissue samples using tissue microarrays. Protein expression of NM23 was analysed using TMA consisting of 486 samples, including 191 pairs of cancer samples and corresponding normal mucosa. (a) Haematoxylin–eosin staining of a paraffin block section (25  35 mm) from the TMA. (b) In all, 5 mmsections of 0.6 mmcorebiopsies of CRC samples stained with anti-NM23 antibody. The sections are from patients without (top, strong staining) and with metastasis (bottom, low staining). Magnification is  50. (c) Kaplan–Meier plots of MFS in ‘AJCC1–3 patients’ according to NM23 protein expression levels. (d) Kaplan–Meier plots of OS in all patients according to NM23 protein expression levels

Oncogene DNA microarray and colon cancer F Bertucci et al 1384 Iacobuzio-Donahue et al., 2003), notably for CRC (Alon cell proliferation and tumorigenicity (Sagawa et al., et al., 1999; Notterman et al., 2001; Lin et al., 2002), and 2003). good versus poor prognosis tumours (Bertucci et al., The top-ranked gene overexpressed in cancer samples 2000; Takahashi et al., 2001; van ‘t Veer et al., 2002). was GNB2L1 (also named RACK1) that encodes a Such global classification is usually imperfect because of homologue of the beta subunit of G proteins involved in the excessive noise generated by a large gene set that signal transduction and activation of PKC. GNB2L1 masks the distinction of interest (such as clinical interacts with IGF1R, an important player in colorectal outcome) governed by a smaller set. Nevertheless, a oncogenesis (Furstenberger and Senn, 2002). This preliminary global approach allows identification of interaction can regulate IGF1-mediated AKT activation broad coherent expression patterns within the mass of and protection fromcell death (Kiely et al., 2002) as well data: for example, we identified several gene clusters that as IGF1-dependent integrin signalling and promote cell corresponded to cell types (stroma, smooth muscle, spreading and contact with the extracellular matrix MHC class I and II) or functions (interferon-related, (ECM) (Hermanto et al., 2002). immediate-early response and proliferation), which have Upregulated genes already reported for other types of been already reported in previous studies (Shaffer et al., cancer included the SNRPs and SOX transcription 2001; Welsh et al., 2001b; Bertucci et al., 2002). factors (SNRPC, SNRPE, SOX4, SOX9), components of ECM or molecules involved in its remodelling Supervised analyses (COL5A1, P4HA1, MMP13, LAMR1) (Notterman et al., 2001; Agrawal et al., 2002; Hippo et al., 2002; To identify smaller sets of discriminator genes that can Frederiksen et al., 2003; Williams et al., 2003). We also improve classification of samples and facilitate transla- identified BZRP that codes for the peripheral benzo- tion in clinical practice, supervised analyses were carried diazepine receptor (Maaser et al., 2002), the cell cycle out, based on predefined groups of samples. genes (CCNB2, CDK2), and SAT, involved in poly- amine metabolism (Bettuzzi et al., 2000). Unexpectedly, Comparison of normal versus cancer samples but as previously reported (Zhang et al., 1997; Tureci et al., 2003), we found overexpression in cancer samples A total of 245 discriminator cDNA clones (3%) were of SERPINB5 and NME1, encoding two potential significantly differentially expressed between normal TSGs. Upregulation of NME1 combined with under- and cancer samples. This ratio is in agreement with expression of CTCF might concur to upregulate the other reports (Zhang et al., 1997; Stremmel et al., 2002). MYC oncogene, a relevant target of the WNT/APC Comparison with lists of discriminator genes previously pathway in the development of CRC (Barker et al., identified in CRC using DNA microarrays (Alon et al., 2000). Other upregulated genes, and potential therapeu- 1999; Backert et al., 1999; Kitahara et al., 2001; tic targets, included kinases (PTK2, STK6, NTRK2), the Notterman et al., 2001; Agrawal et al., 2002; Birken- cell-surface protein CD9, a member of the tetraspanin kamp-Demtroder et al., 2002; Lin et al., 2002; Zou et al., family (Iacobuzio-Donahue et al., 2003) and three genes 2002; Frederiksen et al., 2003; Tureci et al., 2003; encoding integrins ITGA2, ITGAL and ITGB3. The Williams et al., 2003) revealed many common genes, integrin pathway was further affected with variations in further emphasizing the validity of our data. For the expression of genes encoding PTK2, TGFB1I1/ example, CA4, CHGA, CNN1, MYH11, FCGBP, HIC5 (a PTK2 interactor) and integrin-linked kinase KCNMB1 and SST were downregulated, whereas ILK. Agrawal et al. (2002) previously identified CA3, CCT4, EIF3S6 or EEF1A1, IFITM1, CSE1L, osteopontin, an integrin-binding protein, as a marker NME1 or RAN were upregulated in cancer samples. In of CRC progression. In our analysis, SPP1, which codes addition to these genes, we identified many other genes. for osteopontin, as well as CXCL1, which codes for Among the downregulated genes in cancer samples GRO1 oncogene (Notterman et al., 2001; Agrawal et al., were genes encoding cytokines (IL10RA, IL1RN, 2002; Zou et al., 2002) or CDK4 (Wang et al., 1998), IL2RB) (Lin et al., 2002), proteins involved in lipid were not in our stringent list of discriminator genes, metabolism (LPP, LIAS, LRP2, MGLL) (Birkenkamp- although overexpressed in cancer samples with a fold- Demtroder et al., 2002), signal transducers (PLCD1, change superior or equal to two. PLCG2, mTOR/FRAP1) (Lin et al., 2002), transcription Discriminator genes were associated with many factors such as RELA, and other known or putative cellular structures, processes and functions, including tumour suppressor genes (TSG) such as CTCF that general metabolism (the most abundant category), cell encodes a transcriptional repressor of MYC and is cycle, proliferation, apoptosis, adhesion, cytoskeleton, located in 16q22.1, chromosomal region frequently signal transduction, transcription, translation, RNA deleted in human tumours (Filippova et al., 1998) and and protein processing, immune system and others. IRF1, a transcriptional activator of genes induced by Up- and downregulated genes were rather equally cytokines and growth factors, which regulates apoptosis distributed with respect to these functions, except for and cell proliferation and is frequently deficient in those coding for kinases and for proteins involved in human cancers (Tamura et al., 1996). The downregula- ECM remodelling, metabolism, RNA and protein tion of GSN (gelsolin), combined with that of PRKCB1, processing (translation, ribosomal proteins and chaper- may concur to decrease the activation of PKCs in- onins), which were overexpressed in cancer samples. volved in phospholipid signalling pathways and inhibit This phenomenon, already reported (Alon et al., 1999;

Oncogene DNA microarray and colon cancer F Bertucci et al 1385 Kitahara et al., 2001; Birkenkamp-Demtroder et al., serine the favourable prognostic role of which 2002; Lin et al., 2002), is likely to be related to increased has been recently highlighted in prostate cancer (Dha- metabolism and cell proliferation in cancer cells. nasekaran et al., 2001; Luo et al., 2001; Ernst et al., Analysis of chromosomal location pointed to two 2002). Decorin is a small leucine-rich proteoglycan interesting regions. Six genes upregulated in tumours abundant in ECM that negatively controls growth of (STK6, UBE2C, PFDN4, RPS21, CSE1L, SLPI) were CRC cells (Santra et al., 1995) and angiogenesis (Grant located in 20q13, a chromosomal region often amplified et al., 2002). Low levels of mRNA have been associated in cancer; their overexpression might be a consequence with a poor prognosis in breast carcinomas (Troup et al., of gene amplification. This has already been observed by 2003). NME1 and NME2, tandemly linked potential others (Ried et al., 1996; Korn et al., 1999; Birkenkamp- metastasis-suppressor genes (Steeg, 2003), were both Demtroder et al., 2002; Platzer et al., 2002) although not downregulated. Prohibitin is a mitochondrial protein all genes of the region are affected transcriptionally thought to be a negative regulator of cell proliferation (Platzer et al., 2002). Conversely, six genes (TJP3, and may be a TSG (Wang et al., 2002). Transcription of INSR, ELAVL1, MAP2K7, CNN1, NR2F6) down- nuclear genes encoding mitochondrial proteins has been regulated in cancer samples were located in 19p13.1– shown to be decreased during progression of CRC p13.3, already known to harbour several potential TSG (Birkenkamp-Demtroder et al., 2002). This was con- such as APC2 (Nakagawa et al., 1998), STK11 (Resta firmed in our study since all discriminator genes et al., 1998) or MCC2 (Ishikawa et al., 2001). involved in mitochondrial metabolism were downregu- lated in metastatic tumours (ATP5C1, BCKDK, Expression profiles and clinical outcome CABC1, CKMT2, COX5B, COX6B, COX7A2, COX7- A2L, COX7C, HSPA9B, LRIG1, MDH1, NDUFA1, All patients, some of them presenting with metastasis at NDUFA4, NDUFA6, NDUFA9, NDUFV1, SCO1, diagnosis, had received standard treatment. It was UQCR). Surprisingly, although increased protein synth- noticeable that, according to global hierarchical cluster- esis is classically associated with oncogenic transforma- ing, most nonmetastatic tumours that clustered with tion, we found many genes coding for ribosomal metastatic cases eventually developed metastasis and proteins (RPL5, RPL6, RPL15, RPL29, RPL31, died during follow-up. Supervised analysis further RPL39) that were downregulated in metastatic tumours. improved the classification by identifying 194 known To date, we have no explanation for this observation. genes and 41 ESTs that well discriminated between The SMAD1/MADH1 gene, which codes for a trans- samples without or with metastasis at diagnosis or mitter of TGFb signalling, involved in metastatic during follow-up. To our knowledge, this is the first process, was also downregulated. report that suggests a potential prognostic role of gene The top upregulated gene in metastatic tumours was expression profiling in CRC. We compared the sig- PCSK7, which codes for the proprotein convertase nificance of the prognostic classification made by AJCC / type 7. Proprotein convertases (PCs) stage and by our discriminator gene set. Classification process latent precursor proteins into their biologically based on AJCC stage (‘AJCC1–2 tumours’, n ¼ 14, active products, including protein tyrosine phospha- versus ‘AJCC3–4 tumours’, n ¼ 8) was significant tases, growth factors and their receptors, and (P ¼ 0.001; Kaplan–Meier OS analysis, log-rank test), like matrix metalloproteases (MMPs), and thus may but less than that made by expression profiles (P ¼ 0.05 have a functional role in the tumour cell invasion and versus P ¼ 0.003, Fisher’s exact test). Moreover, the tumour progression. Other upregulated genes encoded prognostic impact of our gene set persisted when applied various signalling proteins including PRAME, an to patients without metastasis at diagnosis as well as to interactor of the cytoskeleton-regulator paxillin, IQ- patients without metastasis and lymph node invasion. GAP1, a negative regulator of the E-cadherin/catenin No multivariate analysis was made because of the complex-based cell–cell adhesion (Nabeshima et al., limited sample size of our series. Even if these results 2002), LTPB4, a structural component of connective were obtained on a small series of tumours with relative tissue microfibrils and local regulator of TGFb tissue over-representation of MSI cases, low stages and deposition and signalling (Sterner-Kock et al., 2002), proximal colon cancer when compared with the and IGF1R1. IGF1R has been recently shown to be common CRC population, they are encouraging and involved in metastases of CRC by preventing apoptosis, warrant further investigation. Validation studies with enhancing cell proliferation and inducing angiogenesis larger and more representative cohorts – now planned in (Reinmuth et al., 2002). Several genes located on the our institution – are strongly required to extrapolate long armof chromosome15 were downregulated in this promise to the common population. metastatic samples (Korn et al., 1999). The functional identities of the discriminator genes may provide insight into metastatic process and help Expression profiles and lymph node metastasis identify potential therapeutic targets. Here also various cellular processes were represented. Among the known Nodal metastasis is the best, although imperfect, genes that were downregulated in metastatic tumours predictor of survival for CRC patients. However, were DSC2, encoding desmocollin 2, a desmosomal and approximately one-third of node-negative CRCs recur, hemi-desmosomal adhesion molecule of the cadherin possibly due to understaging and inadequate pathologi- family, and HPN, coding for , a transmembrane cal examination of lymph nodes. Expression profiles

Oncogene DNA microarray and colon cancer F Bertucci et al 1386 defined on primary tumour could help predict the instability and loss of genomic material that may account presence of lymph node metastasis as recently reported for the loss of expression of specific alleles. Reliable (Hippo et al., 2001; West et al., 2001; Huang et al., distinction between MSI þ and non-MSI phenotypes, 2003). We identified 46 genes and ESTs as discriminator currently based on molecular approaches (Zhou et al., between node-positive and node-negative tumours. 1998), remains problematic. Due to the number and As expected, since lymph node status and metastatic large sequence length of the genes involved, the absence relapse are inter-related events, our list included genes of mutation hot-spots, and the possibility of epigenetic that we identified as discriminator between tumours inactivation, confirmation of MMR gene alteration is with or without metastasis. For example, OAS1 and time consuming and difficult in routine clinical practice. NTRK2 were overexpressed in node-positive tumours. Other methods are being tested such as IHC assessment NTRK2, encoding a neurotrophic tyrosine kinase, has of MSH2 and MLH1 (Perrin et al., 2001; Lindor et al., been recently found to be mutated in CRC (Bardelli 2002; Wahlberg et al., 2002). Furthermore, although the et al., 2003) and plays a role in metastastic process underlying molecular mechanisms of MSI þ and non- (Walch and Marchetti, 1999). OAS1 encodes the 20,50- MSI colorectal oncogenesis remain unclear, it is thought oligoadenylate synthetase 1. High levels of activity of that these two phenotypes represent different molecular the 2–5A systemhave been reported in individuals with entities that could translate into distinct gene expression disseminated cancer (Merritt et al., 1985), and a recent profiles usable in clinical practice as new diagnostic study found overexpression of OAS1 mRNA in node- markers and/or tests. positive breast cancers (Huang et al., 2003). Conversely, Our supervised analysis of MSI þ and non-MSI CRC MGP, PRSS8 and NME2 were downregulated in node- clinical samples showed 58 differentially expressed positive tumours. MGP encodes the matrix G1a protein, clones. It is of note that arrayed MMR genes (MSH2, the loss of expression of which has been associated with MSH3, MLH1, MLH3, PMS1 and PMS2) were not lymph node metastasis in urogenital tumours (Leveda- among these genes. As reported for cell lines (Dunican kou et al., 1992). The prostasin , encoded et al., 2002), several of deregulated genes are involved in by PRSS8, is a potential invasion suppressor the cell cycle control, mitosis, transcription and/or chroma- downregulation of which may contribute to invasiveness tin structure (RAN, PTPN21, TP53, MORF4L1, and metastatic potential (Chen and Chai, 2002). ZFP36L2, PSEN1, IGF2, ASNS, RPS4X, CCNF, Our list of 46 discriminator clones also included ZNF354A). The top downregulated gene in MSI þ additional genes, reflecting the nonperfect correlation tumours was EIF3S2, which encodes the eukaryotic between lymph node metastasis and visceral metastasis translation initiation factor 3, subunit 2b, also known as and the involvement of different underlying biological TRIP1. TRIP1 specifically associates with TGFBRII, a processes. Among genes underexpressed in node-posi- serine/threonine kinase receptor frequently inactivated tive tumours were BUB3, TPP2 and ITIH1. BUB3 codes by mutation and downregulated in MSI þ tumours for a mitotic spindle checkpoint protein that interacts (Markowitz et al., 1995). with the APC protein to regulate segrega- tion during cell division. Defects in mitotic checkpoints, Validation studies including mutations of BUB1, have been associated with CRC (Cahill et al., 1998), and BUB genes (BUB1 and The concordance between our data and those from BUB1B) are underexpressed in highly metastatic colon literature with respect to the identity of discriminator cell lines (Hegde et al., 2001). TPP2 encodes tripeptidyl genes and their function strengthens the validity of our peptidase II, a high molecular mass serine results. However, these genes do not point to an obvious that may play a functional role by degrading peptides mechanism of oncogenesis. For many genes, published involved in invasive and metastatic potential as recently data are scarce and our results may stimulate new reported for DPP4 (Kajiyama et al., 2003). ITIH1 investigations, notably through the identification of encodes a heavy chain of proteins of the ITI family, uncharacterized ESTs. Many different cell processes which inhibits the metastatic spreading of H460 M large appear affected during colorectal oncogenesis. Varia- cell lung carcinoma lines by increasing cell attachment tions of the expression of many genes involved in basic (Paris et al., 2002). mechanisms (translation, metabolism) seem a mere consequence of the hyperactivity of cancer cells. Genes Expression profiles and MSI phenotype involved in adhesion processes are affected in metastasis but this could have been expected. Genes known to be There are at least two distinct pathways of oncogenesis affected in oncogenesis, such as MMR genes, do not in sporadic CRC. A total of 15% of tumours present the discriminate tumour subgroups. More work is thus MSI phenotype, which is related to the inactivation of needed to sort out novel alterations that may intervene MMR genes, principally MSH2 and MLH1 (reviewed in in the initiation or development of the disease. Peltomaki, 2003). The genetically unstable tumour cells In contrast, DNA microarray data could prove accumulate somatic clonal mutations in their genome, rapidly useful in clinical practice and design of new which may disturb mRNA expression or degradation of therapeutic strategies. Discriminator genes represent specific transcripts. Conversely, 85% of sporadic tu- potential new diagnostic and prognostic markers and/ mours are associated with a non-MSI (or MSS) or therapeutic targets and deserve further investigation phenotype; they are characterized by chromosome in larger series of patients. We have analysed protein

Oncogene DNA microarray and colon cancer F Bertucci et al 1387 expression of some differentially expressed molecules ability of sample of the corresponding normal colonic using IHC on TMA containing 191 pairs of cancer mucosa), and three tumours and four normal specimens from samples and corresponding normal mucosa. Such an seven different patients. All tumour sections and medical approach is used to both validate the DNA microarray records were de novo reviewed prior to analysis. MSI results and extend data using methods more widely phenotype of 22 cancers was determined by PCR amplification using BAT-25 and BAT-26 oligonucleotide primers and by available in routine clinical practice. However, it IHC using anti-MSH2 and MLH1 antibodies (Perrin et al., encountered several limitations: absence of available 2001). Tumours with alterations in both BAT markers were antibodies for many proteins, antibodies that did not classified as MSI þ . No attempt was made to further classify performwell in IHC on paraffin-embeddedtissues and tumours into MSI-high and MSI-low phenotype. The patients absence of correlation between protein and mRNA were selected to search for gene expression signatures levels. TMA results confirmed the correlations between associated with different characteristics (MSI phenotype, NM23 expression level and two clinical parameters: location of tumour, lymph node status and clinical outcome). noncancerous or cancer status and survival of patients. Main characteristics of patients and tumours are listed in Expression was higher in cancer samples and low Table 1. After colonic surgery, patients were treated (delivery expression was significantly associated with a shorter of chemotherapy or not) according to standard guidelines (Markowitz et al., 2002). After completion of therapy, they survival. Such prognostic correlation has been described were evaluated at 3-month intervals for the first 2 years and at in a variety of tumours including breast (Niu et al., 6-month intervals thereafter. Search for metastatic relapse 2002), ovarian (Tas et al., 2002), lung (Katakura et al., included clinical examination and blood tests completed by 2002) or gastric (Kodera et al., 1994) cancers as well as yearly chest X-ray and liver ultrasound and/or CT scan. melanoma (McDermott et al., 2000). However, it Five samples were represented by two different sporadic remains controversial in CRC with positive (Indinnimeo colon cancer cell lines with chromosomal instability pheno- et al., 1997; Berney et al., 1999; Dursun et al., 2002) and type, Caco2 and HT29. Three samples represented Caco2 in a negative (Lindmark, 1996; Lee et al., 2001; Gunther differentiated state (named Caco2A, 2B and 2C) – i.e. at et al., 2002) reports. In our analysis, which represents confluence (C), at C þ 10 days, at C þ 21 days – and one one of the largest series of CRC samples simultaneously sample represented Caco2 undifferentiated (named Caco2D). Cell lines were obtained fromthe AmericanType Culture tested for NM23 IHC and under highly standardized Collection (http://www.atcc.org/) and grown as recommended. conditions using TMA, the adjusted odd ratios sug- For the IHC study on TMA, a consecutive series of 191 gested a favourable impact of positive NM23 status on unselected sporadic CRC patients (including the 26 cases the clinical outcome, but such association lost signifi- studied by DNA microarrays) treated between 1990 and 1998 cance in multivariate analysis. Larger series with several at the Institut Paoli-Calmettes were selected. There were 99 hundreds of patients are thus required to draw any men and 92 women. The median age of patients at diagnosis definitive conclusion. was 64 years (range 29–97). In 25% of the cases, tumours were In conclusion, our study shows that mRNA expres- located in the right colon, in 57% in the left colon, and in 11% sion profiling of CRC using DNA microarrays can in the rectum(Table 2). identify clinically relevant tumour subgroups, defined upon combined expression of dozens or hundreds of RNA extraction genes. Validation studies on larger series of samples are Total RNA was extracted fromfrozen samplesby using now required to confirmthese data, as well as studies guanidiniumisothiocyanate and caesiumchloride gradient, as investigating whether discriminator genes are, or are previously described (Theillet et al., 1993). RNA integrity was not, functionally relevant to oncogenesis and disease controlled by denaturing formaldehyde agarose gel electro- progression. By delineating these genes, understanding phoresis and 28S Northern blots. CRC development and progression will be improved and new diagnostic and/or prognostic markers, and new DNA microarrays preparation molecular targets for alternative anticancer drugs might Gene expression analyses were carried out with home-made soon be developed, leading to improvements in CRC Nylon microarrays containing 8074 spotted cDNA clones, management. representing 7874 IMAGE human cDNA clones and 200 control clones. According to the 155 Unigene release, the IMAGE clones were divided into 6664 genes and 1210 ESTs. All clones were PCR-amplified in 96-well microtitre plates Materials and methods (200 ml). Amplification products were desiccated and resus- pended in 50 ml of distilled water. They were then spotted as Colorectal cancer patients and samples previously described (Bertucci et al., 1999b) onto Hybond-N þ 2 Â 7cm2 membranes (Amersham) stuck to glass slides using a A total of 50 samples (45 tissue samples and five cell lines) were 64-pin print head on a MicroGridII microarrayer (Apogent profiled using DNA microarrays. The 45 colon tissue samples Discoveries, Cambridge, England). All membranes used in this were obtained from26 patients with sporadic colorectal study belonged to the same batch. adenocarcinoma who underwent surgery at the Institut Paoli-Calmettes (Marseille, France) between 1990 and 1998. DNA microarray hybridizations Samples were macrodissected by pathologists, and frozen within 30 min of removal in liquid nitrogen for molecular Microarrays were hybridized with 33P-labelled probes, first analyses. All tumour samples contained more than 50% of with an oligonucleotide sequence common to all spotted PCR tumour cells. These samples included 22 cancer samples and 23 products (called vector hybridization to determine precisely normal samples divided into 19 tumour–normal pairs (avail- the amount of target DNA accessible to hybridization in each

Oncogene DNA microarray and colon cancer F Bertucci et al 1388 spot), then after stripping, with complex probes made from three representative sample areas were carefully selected from 2 mg of retrotranscribed total RNA (Bertucci et al., 1999a, b). a haematoxylin–eosin-stained section of a donor block. Core Probe preparations, hybridizations and washes were carried cylinders with a diameter of 0.6 mm each were punched from out as described previously (http:/tagc.univ-mrs.fr/pub/Can- each of these areas and deposited into three separate recipient cer/). After washes, arrays were exposed to phosphor-imaging paraffin blocks using a specific arraying device (Beecher plates that were then scanned with a FUJI BAS5000 machine Instruments, Silver Spring, MD, USA). In addition to pairs (25 mmresolution). Hybridization signals were quantified using of tumour and normal mucosa, the recipient block also ArrayGauge software (Fuji Ltd, Tokyo, Japan). received control tissues (small intestine, adenomas) and cell line pellets. In all, 5 mmsections of the resulting TMA block Data analysis were made and used for IHC analysis after transfer onto glass slides. Two colon tumour cell lines (CaCo2, HT29) and one Signal intensities were normalized for the amount of spotted gastric tumour cell line (HGT1) were used as controls. DNA (Bertucci et al., 1999b) and the variability of experi- mental conditions (Bertucci et al., 1999a). Briefly, complex probe intensity of each spot (‘C’) was first corrected (‘C/V’) for Immunohistochemical analysis the amount of target DNA accessible to hybridization as Anti-NM23 rabbit polyclonal antibody was purchased from measured using vector hybridization (‘V’). When ‘V’ intensity Dako (Dako, Trappes, France) and used at 1 : 100 dilution. of a spot was too weak on a microarray, the corresponding IHC was carried out on 5 mmsections of tissue fixed in alcohol cDNA clone was not considered for this experiment (missing formalin for 24 h and included in paraffin. Sections were data). Then, to minimize experimental differences between deparaffinized in histolemon (Carlo Erba Reagenti, Rodano, different complex probe hybridizations, ‘C/V’ values from Italy) and rehydrated in graded alcohol. Antigen enhancement each hybridization were divided by the corresponding median was carried out by incubating the sections in target retrieval value of ‘C/V’. solution (Dako) as recommended. The reactions were carried Genes with expression levels present in at least 80% of the out using an automatic stainer (Dako Autostainer). Staining experiments and standard deviation greater than 0.5 were was done at roomtemperatureas follows: after washes in selected for the clustering analyses (3631 genes). Unsupervised phosphate buffer, followed by quenching of endogenous hierarchical classification investigated relationships between peroxidase activity by treatment with 3% H2O2, slides were samples and relationships between genes. It was applied to first incubated with blocking serum(Dako) for 30 minand data log-transformed and median-centred on genes using the then with the affinity-purified antibody for 1 h. After washes, Cluster and TreeView program(average linkage clustering slides were incubated with biotinylated antibody against rabbit using Pearson’s correlation as similarity metric) (Eisen et al., immunoglobilin for 20 min followed by streptadivin-conju- 1998). We also used supervised analysis to identify and rank gated peroxydase (Dako LSABR2 kit). Diaminobenzidine was genes that distinguish between two subgroups of samples used as the chromogen. Slides were counterstained with identified by histoclinical parameters. A discriminating score haematoxylin, and coverslipped using Curemount (Instrume- (DS) was calculated for each gene (Golub et al., 1999) as dics, Hackensack, USA) mounting solution. They were DS ¼ (M1ÀM2)/(S1 þ S2) where M1 and S1, respectively, evaluated under a light microscope by two pathologists represent mean and standard deviation of expression levels (GM, ECJ). The results were expressed in terms of percentage of the gene in subgroup 1, and M2 and S2 in subgroup 2. (P) and intensity (I) of positive cells as previously described Confidence levels were estimated by bootstrap resampling as (Hoos and Cordon-Cardo, 2001): results were scored by the described previously (Magrangeas et al., 2003). A ‘leave-one- quick score (Q)(Q ¼ P Â I). For the TMA, the mean of the out’ procedure (Golub et al., 1999) was applied to estimate the score of two core biopsies minimum was calculated for each accuracy of prediction of our ‘molecular metastatic signature’ case. Correlations between status of sample (noncancerous or and the validity of our supervised analysis. cancer and cancer with or without metastasis) or Kaplan– Statistical analyses were performed using the SPSS software Meier MFS curves and IHC data were investigated by using (version 10.0.5). MFS and OS were measured from diagnosis Fisher’s exact test and log-rank test in univariate analysis. The until respectively the date of the first distant metastasis and the influence of significant measurement, adjusted for other date of death fromCRC. Follow-up was measuredfromthe factors, was assessed in multivariate analysis by the Cox date of diagnosis to the date of last news for live patients. proportional hazard models. Statistical tests were two-sided at Survivals were estimated with the Kaplan–Meier method and the 5% level of significance. compared between groups with the log-rank test. Data concerning patients without metastatic relapse or death at last follow-up were censored as well as deaths fromother causes. A Acknowledgements P-value o0.05 was considered significant. We are grateful to Drs D Maraninchi, F Birg and C Mawas for encouragement. This work was supported in part by Inserm, CNRS, Institut Paoli-Calmettes, Ipsogen, Association pour la Tissue microarrays construction Recherche sur le Cancer, Ligue Nationale Contre le Cancer, TMA were prepared as described (Richter et al., 2000) with Fe´ de´ ration Nationale des Centres de Lutte Contre le Cancer. slight modifications (Ginestier et al., 2002). For each sample, CG is the recipient of a fellowship fromMinistery of Research.

References

Agrawal D, Chen T, Irby R, Quackenbush J, Chambers AF, JI, Yang L, Marti GE, Moore T, Hudson Jr J, Lu L, Szabo M, Cantor A, Coppola D and Yeatman TJ. (2002). J. Lewis DB, Tibshirani R, Sherlock G, Chan WC, Natl. Cancer Inst., 94, 513–521. Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Botstein D, Brown PO and Staudt LM. (2000). Nature, 403, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell 503–511.

Oncogene DNA microarray and colon cancer F Bertucci et al 1389 Alon U, Barkai N, Notterman DA, Gish K, Ybarra S, Mack Eisen MB, Spellman PT, Brown PO and Botstein D. (1998). D and Levine AJ. (1999). Proc. Natl. Acad. Sci. USA, 96, Proc. Natl. Acad. Sci. USA, 95, 14863–14868. 6745–6750. Ernst T, Hergenhahn M, Kenzelmann M, Cohen CD, Backert S, Gelos M, Kobalz U, Hanski ML, BohmC, Mann Bonrouhi M, Weninger A, Klaren R, Grone EF, Wiesel B, Lovin N, Gratchev A, Mansmann U, Moyer MP, M, Gudemann C, Kuster J, Schott W, Staehler G, Kretzler Riecken EO and Hanski C. (1999). Int. J. Cancer, 82, M, Hollstein M and Grone HJ. (2002). Am. J. Pathol., 160, 868–874. 2169–2180. Bardelli A, Parsons DW, Silliman N, Ptak J, Szabo S, Saha S, Fearon ER and Vogelstein B. (1990). Cell, 61, 759–767. Markowitz S, Willson JK, Parmigiani G, Kinzler KW, Filippova GN, LindblomA, Meincke LJ, Klenova EM, Vogelstein B and Velculescu VE. (2003). Science, 300, 949. Neiman PE, Collins SJ, Doggett NA and Lobanenkov VV. Barker N, Morin PJ and Clevers H. (2000). Adv. Cancer Res., (1998). Genes Cancer, 22, 26–36. 77, 1–24. Frederiksen CM, Knudsen S, Laurberg S and TF OR. (2003). Beer DG, Kardia SL, Huang CC, Giordano TJ, Levin AM, J. Cancer Res. Clin. Oncol., 15, 15. Misek DE, Lin L, Chen G, Gharib TG, Thomas DG, Furstenberger G and Senn HJ. (2002). Lancet Oncol., 3, Lizyness ML, Kuick R, Hayasaka S, Taylor JM, Iannettoni 298–302. MD, Orringer MB and Hanash S. (2002). Nat. Med., 8, Garber ME, Troyanskaya OG, Schluens K, Petersen S, 816–824. Thaesler Z, Pacyna-Gengelbach M, van de Rijn M, Rosen Berney CR, Fisher RJ, Yang J, Russell PJ and Crowe PJ. GD, Perou CM, Whyte RI, Altman RB, Brown PO, Botstein (1999). Ann. Surg., 230, 179–184. D and Petersen I. (2001). Proc. Natl. Acad. Sci. USA, 98, Bertucci F, Bernard K, Loriod B, Chang YC, Granjeaud S, 13784–13789. BirnbaumD, Nguyen C, Peck K and Jordan BR. (1999a). Ginestier C, Charaffe-Jauffret E, Bertucci F, Eisinger F, Hum. Mol. Genet., 8, 1715–1722. Geneix J, Bechlian D, Conte N, Ade´ laide J, Toiron Y, Bertucci F, Houlgatte R, Benziane A, Granjeaud S, Adelaide Nguyen C, Viens P, Moziconacci MJ, Houlgatte R, J, Tagett R, Loriod B, Jacquemier J, Viens P, Jordan B, BirnbaumD and JacquemierJ. (2002). Am. J. Pathol., BirnbaumD and Nguyen C. (2000). Hum. Mol. Genet., 9, 161, 1223–1233. 2981–2991. Golub TR, SlonimDK, TamayoP, Huard C, Gaasenbeek M, Bertucci F, Houlgatte R, Nguyen C, Viens P, Jordan BR and Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, BirnbaumD. (2001). Lancet Oncol., 2, 674–682. Bloomfield CD and Lander ES. (1999). Science, 286, Bertucci F, Nasser V, Granjeaud S, Eisinger F, Adelaide J, 531–537. Tagett R, Loriod B, Giaconia A, Benziane A, Devilard E, Grant DS, Yenisey C, Rose RW, Tootell M, Santra M and Jacquemier J, Viens P, Nguyen C, Birnbaum D and Iozzo RV. (2002). Oncogene, 21, 4765–4777. Houlgatte R. (2002). Hum. Mol. Genet., 11, 863–872. Gunther K, Dworak O, Remke S, Pfluger R, Merkel S, Bertucci F, Van Hulst S, Bernard K, Loriod B, Granjeaud S, Hohenberger W and Reymond MA. (2002). J. Surg. Res., Tagett R, Starkey M, Nguyen C, Jordan B and BirnbaumD. 103, 68–78. (1999b). Oncogene, 18, 3905–3912. Hegde P, Qi R, Gaspard R, Abernathy K, Dharap S, Earle- Bettuzzi S, Davalli P, Astancolle S, Carani C, Madeo B, Hughes J, Gay C, Nwokekeh NU, Chen T, Saeed AI, Tampieri A, Corti A, Saverio B, Pierpaola D, Serenella A, Sharov V, Lee NH, Yeatman TJ and Quackenbush J. (2001). Cesare C, Bruno M, Auro T and Arnaldo C. (2000). Cancer Cancer Res., 61, 7792–7797. Res., 60, 28–34. Hermanto U, Zong CS, Li W and Wang LH. (2002). Mol. Cell Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Biol., 22, 2345–2365. Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, Hippo Y, Taniguchi H, Tsutsumi S, Machida N, Chong JM, Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Fukayama M, Kodama T and Aburatani H. (2002). Cancer Golub TR, Sugarbaker DJ and Meyerson M. (2001). Proc. Res., 62, 233–240. Natl. Acad. Sci. USA, 98, 13790–13795. Hippo Y, Yashiro M, Ishii M, Taniguchi H, Tsutsumi S, Birkenkamp-Demtroder K, Christensen LL, Olesen SH, Hirakawa K, Kodama T and Aburatani H. (2001). Cancer Frederiksen CM, Laiho P, Aaltonen LA, Laurberg S, Res., 61, 889–895. Sorensen FB, Hagemann R and TF OR. (2002). Cancer Hoos A and Cordon-Cardo C. (2001). Lab. Invest., 81, Res., 62, 4352–4363. 1331–1338. Cahill DP, Lengauer C, Yu J, Riggins GJ, Willson JK, Huang E, Cheng SH, Dressman H, Pittman J, Tsou MH, Markowitz SD, Kinzler KW and Vogelstein B. (1998). Horng CF, Bild A, Iversen ES, Liao M, Chen CM, Nature, 392, 300–303. West M, Nevins JR and Huang AT. (2003). Lancet, 361, Chen LM and Chai KX. (2002). Int. J. Cancer, 97, 323–329. 1590–1596. Chen X, Cheung ST, So S, Fan ST, Barry C, Higgins J, Lai Iacobuzio-Donahue CA, Maitra A, Olsen M, Lowe AW, Van KM, Ji J, Dudoit S, Ng IO, Van De Rijn M, Botstein D and Heek NT, Rosty C, Walter K, Sato N, Parker A, Ashfaq R, Brown PO. (2002). Mol. Biol. Cell, 13, 1929–1939. Jaffee E, Ryu B, Jones J, Eshleman JR, Yeo CJ, Cameron Devilard E, Bertucci F, Trempat P, Bouabdallah R, Loriod B, JL, Kern SE, Hruban RH, Brown PO and Goggins M. Giaconia A, Brousset P, Granjeaud S, Nguyen C, Birnbaum (2003). Am. J. Pathol., 162, 1151–1162. D, Birg F, Houlgatte R and Xerri L. (2002). Oncogene, 21, Iacopetta B. (2002). Int. J. Cancer, 101, 403–408. 3095–3102. Indinnimeo M, Giarnieri E, Stazi A, Cicchini C, Brozzetti S, Dhanasekaran SM, Barrette TR, Ghosh D, Shah R, Varamb- Valli C, Carreca I and Vecchione A. (1997). Cancer Lett., ally S, Kurachi K, Pienta KJ, Rubin MA and Chinnaiyan 111, 1–5. AM. (2001). Nature, 412, 822–826. Ishikawa S, Kobayashi I, Hamada J, Tada M, Hirai A, Dunican DS, McWilliamP, Tighe O, Parle-McDermottA and Furuuchi K, Takahashi Y, Ba Y and Moriuchi T. (2001). Croke DT. (2002). Oncogene, 21, 3253–3257. Gene, 267, 101–110. Dursun A, Akyurek N, Gunel N and Yamac D. (2002). Kajiyama H, Kikkawa F, Khin E, Shibata K, Ino K and Pathology, 34, 427–432. Mizutani S. (2003). Cancer Res., 63, 2278–2283.

Oncogene DNA microarray and colon cancer F Bertucci et al 1390 Katakura H, Tanaka F, Oyanagi H, Miyahara R, Yanagihara Platzer P, Upender MB, Wilson K, Willis J, Lutterbaugh J, K, Otake Y and Wada H. (2002). Ann. Thorac. Surg., 73, Nosrati A, Willson JK, Mack D, Ried T and Markowitz S. 1060–1064. (2002). Cancer Res., 62, 1134–1138. Kiely PA, Sant A and O’Connor R. (2002). J. Biol. Chem., 277, Reinmuth N, Fan F, Liu W, Parikh AA, Stoeltzing O, Jung 22581–22589. YD, Bucana CD, Radinsky R, Gallick GE and Ellis LM. Kitahara O, Furukawa Y, Tanaka T, Kihara C, Ono K, (2002). Lab. Invest., 82, 1377–1389. Yanagawa R, Nita ME, Takagi T, Nakamura Y and Resta N, Simone C, Mareni C, Montera M, Gentile M, Susca Tsunoda T. (2001). Cancer Res., 61, 3544–3549. F, Gristina R, Pozzi S, Bertario L, Bufo P, Carlomagno N, Kodera Y, Isobe K, Yamauchi M, Kondoh K, Kimura N, Ingrosso M, Rossini FP, Tenconi R and Guanti G. (1998). Akiyama S, Itoh K, Nakashima I and Takagi H. (1994). Cancer Res., 58, 4799–4801. Cancer, 73, 259–265. Richter J, Wagner U, Kononen J, Fijan A, Bruderer J, Schmid Korn WM, Yasutake T, Kuo WL, Warren RS, Collins C, U, Ackermann D, Maurer R, Alund G, Knonagel H, Rist Tomita M, Gray J and Waldman FM. (1999). Genes M, Wilber K, Anabitarte M, Hering F, Hardmeier T, Chromosomes Cancer, 25, 82–90. Schonenberger A, Flury R, Jager P, Fehr JL, Schraml P, Lee JC, Lin YJ, Chow NH and Wang ST. (2001). J. Surg. Moch H, Mihatsch MJ, Gasser T, Kallioniemi OP and Oncol., 76, 58–62. Sauter G. (2000). Am. J. Pathol., 157, 787–794. Levedakou EN, Strohmeyer TG, Effert PJ and Liu ET. (1992). Ried T, Knutzen R, Steinbeck R, Blegen H, Schrock E, Int. J. Cancer, 52, 534–537. Heselmeyer K, du Manoir S and Auer G. (1996). Genes Lin YM, Furukawa Y, Tsunoda T, Yue CT, Yang KC and Chromosomes Cancer, 15, 234–245. Nakamura Y. (2002). Oncogene, 21, 4120–4128. Sagawa N, Fujita H, Banno Y, Nozawa Y, Katoh H and Lindmark G. (1996). Br. J. Cancer, 74, 1413–1418. Kuzumaki N. (2003). Br. J. Cancer, 88, 606–612. Lindor NM, Burgart LJ, Leontovich O, Goldberg RM, Santra M, Skorski T, Calabretta B, Lattime EC and Iozzo RV. CunninghamJM, Sargent DJ, Walsh-Vockley C, (1995). Proc. Natl. Acad. Sci. USA, 92, 7016–7020. Petersen GM, Walsh MD, Leggett BA, Young JP, Shaffer AL, Rosenwald A, Hurt EM, Giltnane JM, LamLT, Barker MA, Jass JR, Hopper J, Gallinger S, Bapat B, Pickeral OK and Staudt LM. (2001). Immunity, 15, 375–385. Redston M and Thibodeau SN. (2002). J. Clin. Oncol., 20, Singh D, Febbo PG, Ross K, Jackson DG, Manola J, Ladd C, 1043–1048. Tamayo P, Renshaw AA, D’Amico AV, Richie JP, Lander Luo J, Duggan DJ, Chen Y, Sauvageot J, Ewing CM, Bittner ES, Loda M, Kantoff PW, Golub TR and Sellers WR. ML, Trent JM and Isaacs WB. (2001). Cancer Res., 61, (2002). Cancer Cell, 1, 203–209. 4683–4688. Steeg PS. (2003). Nat. Rev. Cancer, 3, 55–63. Maaser K, Grabowski P, Sutter AP, Hopfner M, Foss HD, Sterner-Kock A, Thorey IS, Koli K, Wempe F, Otte J, Stein H, Berger G, Gavish M, Zeitz M and Scherubl H. Bangsow T, Kuhlmeier K, Kirchner T, Jin S, Keski-Oja J (2002). Clin. Cancer Res., 8, 3205–3209. and von Melchner H. (2002). Genes Dev., 16, 2264–2273. Magrangeas F, Nasser V, Avet-Loiseau H, Loriod B, Decaux Stremmel C, Wein A, Hohenberger W and Reingruber B. O, Granjeaud S, Bertucci F, BirnbaumD, Nguyen C, (2002). Int. J. Colorectal Dis., 17, 131–136. Harousseau JL, Bataille R, Houlgatte R and Minvielle S. Takahashi M, Rhodes DR, Furge KA, Kanayama H, Kagawa (2003). Blood, 101, 4998–5006. S, Haab BB and Teh BT. (2001). Proc. Natl. Acad. Sci. USA, Markowitz S, Wang J, Myeroff L, Parsons R, Sun L, 98, 9754–9759. Lutterbaugh J, Fan RS, Zborowska E, Kinzler KW and Tamura G, Sakata K, Nishizuka S, Maesawa C, Suzuki Y, Vogelstein B et al. (1995). Science, 268, 1336–1338. Terashima M, Eda Y and Satodate R. (1996). J. Pathol., Markowitz SD, Dawson DM, Willis J and Willson JK. (2002). 180, 371–377. Cancer Cell, 1, 233–236. Tas F, Tuzlali S, Aydiner A, Saip P, Salihoglu Y, Iplikci A and Matei D, Graeber TG, Baldwin RL, Karlan BY, Rao J and Topuz E. (2002). Am. J. Clin. Oncol., 25, 164–167. Chang DD. (2002). Oncogene, 21, 6289–6298. Theillet C, Adelaide J, Louason G, Bonnet-Dorion F, McDermott NC, Milburn C, Curran B, Kay EW, Barry Walsh Jacquemier J, Adnane J, Longy M, Katsaros D, Sismondi C and Leader MB. (2000). J. Pathol., 190, 157–162. P and Gaudray P et al. (1993). Genes Chromosomes Cancer, Merritt JA, Meltzer DM, Ball LA and Borden EC. (1985). 7, 219–226. Prog. Clin. Biol. Res., 202, 423–430. Troup S, Njue C, Kliewer EV, Parisien M, Roskelley C, Mohr S, Leikauf GD, Keith G and Rihn BH. (2002). J. Clin. Chakravarti S, Roughley PJ, Murphy LC and Watson PH. Oncol., 20, 3165–3175. (2003). Clin. Cancer Res., 9, 207–214. Nabeshima K, Shimao Y, Inoue T and Koono M. (2002). Tureci O, Ding J, Hilton H, Bian H, Ohkawa H, Braxenthaler Cancer Lett., 176, 101–109. M, Seitz G, Raddrizzani L, Friess H, Buchler M, Sahin U Nakagawa H, Murata Y, Koyama K, Fujiyama A, Miyoshi Y, and Hammer J. (2003). FASEB J., 17, 376–385. Monden M, Akiyama T and Nakamura Y. (1998). Cancer van ‘t Veer LJ, Dai H, van De Vijver MJ, He YD, Hart AA, Res., 58, 5176–5181. Mao M, Peterse HL, van Der Kooy K, Marton MJ, Niu Y, Fu X, Lv A, Fan Y and Wang Y. (2002). Int. J. Cancer, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, 98, 754–760. Linsley PS, Bernards R and Friend SH. (2002). Nature, 415, Notterman DA, Alon U, Sierk AJ and Levine AJ. (2001). 530–536. Cancer Res., 61, 3124–3130. Vogelstein B, Fearon ER, Hamilton SR, Kern SE, Preisinger Paris S, Sesboue R, Delpech B, Chauzy C, Thiberville L, AC, Leppert M, Nakamura Y, White R, Smits AM and Bos Martin JP, Frebourg T and Diarra-Mehrpour M. (2002). JL. (1988). N. Engl. J. Med., 319, 525–532. Int. J. Cancer, 97, 615–620. Wahlberg SS, Schmeits J, Thomas G, Loda M, Garber J, Peltomaki P. (2003). J. Clin. Oncol., 21, 1174–1179. Syngal S, Kolodner RD and Fox E. (2002). Cancer Res., 62, Perrin J, Gouvernet J, Parriaux D, Noguchi T, Giovannini 3485–3492. MH, Giovannini M, Delpero JR, BirnbaumD and Monges Walch ET and Marchetti D. (1999). Clin. Exp. Metastasis, 17, G. (2001). Int. J. Oncol., 19, 891–895. 307–314.

Oncogene DNA microarray and colon cancer F Bertucci et al 1391 Wang QS, Papanikolaou A, Sabourin CL and Rosenberg DW. Williams NS, Gaynor RB, Scoggin S, Verma U, Gokaslan T, (1998). Carcinogenesis, 19, 2001–2006. Simmang C, Fleming J, Tavana D, Frenkel E and Becerra C. Wang S, Fusaro G, Padmanabhan J and Chellappan SP. (2003). Clin. Cancer Res., 9, 931–946. (2002). Oncogene, 21, 8388–8396. Zhang L, Zhou W, Velculescu VE, Kern SE, Hruban RH, Welsh JB, Sapinoso LM, Su AI, Kern SG, Wang-Rodriguez J, Hamilton SR, Vogelstein B and Kinzler KW. (1997). Moskaluk CA, Frierson Jr HF and Hampton GM. (2001a). Science, 276, 1268–1272. Cancer Res., 61, 5974–5978. Zhou XP, Hoang JM, Li YJ, Seruca R, Carneiro F, Sobrinho- Welsh JB, Zarrinkar PP, Sapinoso LM, Kern SG, Simoes M, Lothe RA, Gleeson CM, Russell SE, Muzeau F, Behling CA, Monk BJ, Lockhart DJ, Burger RA and Flejou JF, Hoang-Xuan K, Lidereau R, Thomas G and Hampton GM. (2001b). Proc. Natl. Acad. Sci. USA, 98, Hamelin R. (1998). Genes Chromosomes Cancer, 21, 101–107. 1176–1181. Zou TT, Selaru FM, Xu Y, Shustova V, Yin J, Mori Y, West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang Shibata D, Sato F, Wang S, Olaru A, Deacu E, Liu TC, R, Zuzan H, Olson Jr JA, Marks JR and Nevins JR. (2001). AbrahamJM and Meltzer SJ. (2002). Oncogene, 21, Proc. Natl. Acad. Sci. USA, 18, 18. 4855–4862.

Supplementary Information accompanies the paper on Oncogene website (http://www.nature.com/onc).

Oncogene