Oncogene (2002) 21, 7749 – 7763 ª 2002 Nature Publishing Group All rights reserved 0950 – 9232/02 $25.00 www.nature.com/onc

Expression profiling of primary non-small cell lung cancer for target identification

Jim Heighway*,1, Teresa Knapp1, Lenetta Boyce1, Shelley Brennand1, John K Field1, Daniel C Betticher2, Daniel Ratschiller2, Mathias Gugger2, Michael Donovan3, Amy Lasek3 and Paula Rickert*,3

1Target Identification Group, Roy Castle International Centre for Lung Cancer Research, University of Liverpool, 200 London Road, Liverpool L3 9TA, UK; 2Institute of Medical Oncology, University of Bern, 3010 Bern, Switzerland; 3Incyte Genomics Inc., 3160 Porter Drive, Palo Alto, California, CA 94304, USA

Using a panel of cDNA microarrays comprising 47 650 Introduction transcript elements, we have carried out a dual-channel analysis of expression in 39 resected primary The lack of generally effective therapies for lung cancer non-small cell lung tumours versus normal lung leads to the high mortality rate associated with this tissue. Whilst *11 000 elements were scored as common disease. Recorded 5-year survival figures vary, differentially expressed at least twofold in at least one to some extent perhaps reflecting differences in sample, 96 transcripts were scored as over-represented ascertainment methods, but across Europe are approxi- fourfold or more in at least seven out of 39 tumours mately 10% (Bray et al., 2002). Whilst surgery is an and 30 sequences 16-fold in at least two out of 39 effective strategy for the treatment of localized lesions, tumours, including 24 transcripts in common. Tran- most patients present with overtly or covertly dissemi- scripts (178) were found under-represented fourfold in nated disease. Unfortunately, despite the many at least seven out of 39 tumours, 31 of which are under- advances in chemotherapeutic regimes, for most lung represented 16-fold in at least two out of 39 lesions. cancer patients, metastatic disease present at diagnosis The relative expression levels of representative will be fatal. We have chosen to use microarray from these lists were analysed by comparative multiplex transcript profiling to illuminate potential targets with RT – PCR and found to be broadly consistent with the application in the detection or treatment of this microarray data. Two dramatically over-represented disease. genes, previously designated as potential tumour Microarray analysis allows a reversal of the standard suppressors in breast () and lung and breast cancer genetics paradigm. Instead of attempting to (S100A2) cancers, were analysed more extensively and identify areas of the genome which are perturbed in demonstrate the effectiveness of this approach in disease and then looking for key cancer-associated identifying potential lung cancer diagnostic or ther- genes within these regions, the technology allows us to apeutic targets. Whilst it has been reported that first identify genes that are differentially expressed in a S100A2 is downregulated in NSCLC at an early stage, condition and then to look at the genomic context of our microarray, cmRT – PCR, Western and immunohis- that abnormal expression. Perhaps the most powerful tochemistry data indicate that it is strongly expressed in prioritization tool in the search for disease significance the majority of tumours. will be the coupling of changes to Oncogene (2002) 21, 7749 – 7763. doi:10.1038/ genomic damage events in the tumour DNA. sj.onc.1205979 Lung cancer is commonly divided into small cell (SCLC) and non-small cell (NSCLC) disease. SCLC, Keywords: lung; cancer; microarray; expression; target approximately 20% of cases, is invariably metastatic identification at presentation and is therefore treated primarily ONCOGENOMICS through the use of chemotherapeutic agents. In contrast, NSCLC may appear to be localised or only locally metastatic and therefore amenable to surgical resection. NSCLC is further subdivided into three main classes, squamous cell carcinoma, adenocarcino- ma and large cell carcinoma. However, many if not *Correspondence: J Heighway, Gene Function, Roy Castle Interna- most tumours are composed of multiple histological tional Centre for Lung Cancer Research, 200 London Road, types (Thurlbeck and Churg, 1995; Fraire et al., 1987). Liverpool L3 9TA, UK; E-mail: [email protected] or Such variation in a presumably clonal lesion has led Paula Rickert, Incyte Genomics, 3160 Porter Drive, Palo Alto, California, CA 94304, USA; E-mail: [email protected] to the suggestion that lung cancers arise from a multi- Received 27 May 2002; revised 8 August 2002; accepted 13 August potential stem cell component of the normal tissue, a 2002 hypothesis supported by the analysis of model systems Lung cancer target identification J Heighway et al 7750 (Sunday et al., 1999). It is very likely that transcript As additional verification, we independently assayed profiling will lead to further subdivisions within these expression profiles of two of these genes, osteopontin overall classes of disease, some which mirror pre- and PRAME, using multiplex quantitative real-time existing pathological classes and some which are novel RT – PCR (TaqManTM, Applied Biosystems) with the (Garber et al., 2001; Bhattacharjee et al., 2001). housekeeping gene b2-microglobulin as an internal Whilst such classification regimes may be very reference (Figure 1). Osteopontin was found to be important in understanding disease biology and may over-represented at least fourfold in 14 out of 14 become critical in the future with respect to assessing tumours assayed, with an average CT (cycle threshold lung cancer prognosis and directing subsequent of detectable logarithmic phase amplification) of 22 in therapeutic practice, such uses are perhaps currently lung tumours and 32 in normal tissue. PRAME was secondary to the need for improved diagnostic and measured as over-represented in 13 out of 14 tumours, therapeutic agents. with an average CT of 28 in tumours and 36 in normal Given that it is generally easier pharmacologically to tissue. Our microarray approach for identifying over- inhibit rather than to replace function and given that a represented genes in lung tumours was thus confirmed lung cancer diagnostic is likely to be most effective by two distinct methods using independent control where it measures an excess of rather than the lack of a genes. Having validated the basic experimental proto- gene product, our attention has been directed primarily col, a wider analysis was undertaken. towards genes that are over-represented in disease tissue. However, it is likely that the study of under- Microarray data analysis represented sequences will considerably enhance our understanding of the biology of the disease process and RNA was extracted from resectable NSCLC tumours may lead to the discovery of linked and more from 33 additional patients giving 39 in total, including conventional targets. We have therefore generated 13 who had undergone neo-adjuvant chemotherapy and validated gene lists of both tumour over and (Table 1). A further series of dual channel microarray under-represented sequences.

Results

Method validation Dual channel microarray hybridizations were carried out in duplicate for six patients comprising tumour against matched normal tissue across five cDNA microarrays: Human Drug Target (formerly known as GEMTM 1) and Human Founda- tion 1 – 4 (formerly known as Human Genome GEM 2 – 5) (Incyte Genomics, Inc.). Clones representing 14 distinct genes (http://www.roycastle.org/research/) with a differential expression (DE) value greater than two in at least five of the six patients were selected. The validity of these predictions was tested by an independent gene expression analysis, comparative multiplex RT – PCR (cmRT – PCR). As many of the standard control genes are known to be variable in expression between different human tissues and states (Lee et al., 2002) we selected new control genes from within the array data; sequences that showed DE values below two in the tumour versus normal hybridizations from the first six patients. Control primer sets were validated against each other in reactions with paired cDNA samples. The most robust of the tested control loci in terms of ease of

amplification and wide reaction compatibility was the TM KIAA0228 (228) gene (Acc: D86981), located on Figure 1 Illustrates TaqMan quantitative real-time PCR data for matched normal lung and tumour pairs for osteopontin (a) 17q23.2. We used cmRT – PCR and at least one and PRAME (c), confirming the over-representation of these control gene to test the 14 genes identified as genes in a majority of non-small cell lung tumours. The house- differentially expressed (83 individual hybridised keeping gene b2-microglobulin was used as an internal reference elements). The cmRT – PCR data was consistent with for normalization of sample concentrations. Relative expression values (y-axis) were determined using normal lung tissue from pa- the hybridization data predicting a gene expression tient 185 (detectably expressing both genes) as the calibrator. (b difference between the tumour and normal tissue, in and d) show cmRT – PCR (b) and RT – PCR (d) analyses of os- each case, in the same direction. teopontin and PRAME in matched samples

Oncogene Lung cancer target identification J Heighway et al 7751 Table 1 Lists the patient details and tumour histology Patient Diagnosis Sex Age at surgery T N M Stage

57 Squamous M 67 2 0 X IB 61 Large cell endocrine M 73 2 1 X IIB 64 Pulmonary carcinoid M 79 1 0 X IA 65 Adenocarcinoma F 70 1 0 X IA 67 Squamous F 68 2 0 X IB 69 Squamous M 73 1 1 X IIA 70 Adenocarcinoma M 75 2 0 X IB 74 Adenocarcinoma F 71 1 0 X IA 78 Squamous M 66 2 0 X IB 80 Squamous M 70 3 1 X IIB 83 Squamous M 73 2 1 X IIB 86 Adenocarcinoma M 67 2 0 X IB 87 Adenocarcinoma M 72 2 0 X IB 89 Squamous F 68 2 2 X IIIA 91 Adenocarcinoma M 62 2 0 X IB 111 Atypical carcinoid M 61 1 0 X IA 122 Adenocarcinoma F 66 2 1 X IIB 123 Adenocarcinoma M 54 2 2 X IIIA 125 Adenocarcinoma M 78 2 0 X IB 128 Squamous F 50 2 0 X IB 130 Squamous M 43 2 1 X IIB 132 Large cell endocrine F 54 2 0 X IB 134 Squamous F 60 2 1 X IIB 138 Squamous F 75 2 0 X IB 141 Squamous M 71 2 0 X IB 142 Neuroendocrine ca. M 53 1 0 X IA B1 Adenocarcinoma M 71 2 2 0 IIIA B2 Adenocarcinoma M 71 2 2 0 IIIA B3 Adenocarcinoma M 50 2 2 0 IIIA B4 Adenocarcinoma F 54 3 2 0 IIIA B5 Adenocarcinoma F 39 1 2 0 IIIA B6 Adenocarcinoma F 57 2 2 0 IIIA B7 Squamous M 58 2 2 0 IIIA B8 Adenocarcinoma M 62 2 2 0 IIIA B9 Adenocarcinoma F 53 2 2 0 IIIA B10 Large cell M 54 2 2 0 IIIA B11 Squamous M 62 3 2 0 IIIA B12 Adenocarcinoma M 54 2 2 0 IIIA B14 Adenocarcinoma M 62 2 2 0 IIIA

T, N and M refer to the tumour, node and metastasis status according to WHO guidelines

hybridizations was carried out for all patients compar- elements highlighted in the fourfold tumour over- ing tumour against matched normal tissue (27 patients) represented list were various immunoglobulin tran- or tumour against a pooled normal sample (six scripts, the relative expression levels of which patients). For the majority of patient samples, correlated with each other closely across the tumour duplicate hybridizations were performed across all five series (data not shown). Removing these sequences left cDNA microarrays. The data presented in this study a list of 96 genes with Genbank annotation, some of was obtained from 298 individual array analyses that which were represented in the primary list by multiple passed hybridization QC, screening 34 – 37 different array elements. These remaining 96 were ranked in patient samples across each of 47 650 clones represent- order of number of tumours in which the element ing 39 176 unique gene/EST clusters. passed the restriction criteria. Following removal of Ig- Approximately 11 000 clones were scored as signifi- related genes, 30 elements were scored as over- cantly differentially expressed (4twofold: Yue et al., represented 16-fold or more in at least two tumours 2001) in at least one sample. We were primarily and of those, 24 were common to both lists. One interested in identifying sequences dramatically differ- hundred and seventy-eight non-Ig genes with Genbank entially expressed in a wide range of disease samples, annotation were scored as under-represented fourfold or else sequences that were very dramatically differen- or more in seven or more tumours with 30 of these tially expressed in a smaller number of tumours. Gene under-represented 16-fold in two or more lesions. One lists were constructed of (a) genes that were differen- element, representing the calcium binding tially expressed fourfold or more in seven or more S100A8 (MRP8), was present in both the over and (approximately 20%) tumours and (b) genes that were under-represented lists. The expression profile of this differentially expressed 16-fold or more in two or more gene did not correlate closely with those of immuno- tumours (Tables 2 and 3). Approximately 50% of the globulin related elements.

Oncogene Lung cancer target identification J Heighway et al 7752 Table 2 Lists in ranked order four- and 16-fold tumour over-represented elements that passed the restriction criteria Genbank Acc Chr. loc. 46up in: 166up in: 26up in: Gene description (closest PD match to element sequence)

12224892 8q23.3 26 6 33 Homo sapiens mRNA; cDNA DKFZp667P184. 30057 2q32.2 22 5 30 Human mRNA for pro-alpha-1 type 3 collagen. 3413861 1p36.33 22 31 Homo sapiens mRNA for KIAA0450 protein. 14250681 12q13.13 18 11 20 Homo sapiens, keratin 6A. 35147 4q22.1 18 2 28 Human mRNA for osteopontin. 416114 Xq28 18 4 21 Human antigen (MAGE-1) gene. 2088550 6q22.2 18 3 26 Human hereditary haemochromatosis region, histone 2A-like gene. 790537 11q22.3 18 2 25 Human glutamate receptor (GluR4). 6688166 2q24.1 17 2 23 Homo sapiens partial mRNA for GalNAc-T5 (GALNT5 gene). 544761 10p15.1 17 8 20 Chlordecone reductase {clone HAKRa} [human, liver, mRNA]. 2065174 1q23.1 17 7 19 Homo sapiens S100A2 gene, exon 1-3. 3150034 7q33 17 5 18 Homo sapiens aldose reductase-like peptide. 393318 13q14.11 17 24 OSF-2p1 35569 8q22.2 16 24 Human mRNA for polyA binding protein. 31894 14q23.3 16 3 19 Human mRNA for glutathione peroxidase-like protein. 13516384 8q22.1 16 3 28 Homo sapiens genomic DNA, 8q23, clone: KB1241F12. 12653176 4q26 16 27 Homo sapiens, MAD2 (mitotic arrest deficient, yeast, homolog)-like 1. 14043321 20q13.12 15 24 Homo sapiens, carrier protein E2-C. 14198020 17p11.2 15 5 23 H.sapiens gene for cytokeratin 17. 1903383 22q11.22 15 20 Human preferentially expressed antigen of melanoma (PRAME). 452483 10p15.1 15 20 Human dihydrodiol dehydrogenase 4481752 13q12.11 14 18 Human connexin 26. 13111934 15q26.1 14 24 Human, protein regulator of cytokinesis 1. 292829 17q21.2 14 23 DNA topoisomerase II (EC 5.99.1.3) 219928 11p11.2 13 23 Human midkine gene, complete cds 12653130 4p13 13 3 20 Homo sapiens, similar to ubiquitin carboxy-terminal hydrolase L1. 11558106 6q24.1 13 24 Homo sapiens mRNA for transmembrane protein (THW gene). 10437218 11q22.2 13 3 22 Human mRNA for type I interstitial collagenase (MMP1) 13905069 4p16.1 13 4 17 Homo sapiens, S100 calcium-binding protein P. 30094 6q22.1 12 19 H.sapiens COL10A1 gene for collagen (alpha-1 type X). 3300091 Xp11.23 12 21 Homo sapiens prostate associated PAGE-1. 15079897 Xq28 12 4 16 Homo sapiens, melanoma antigen, family A, 3. 29840 10q21.2 12 25 Human cell cycle control gene CDC2. 8919392 17q21.33 12 30 Homo sapiens mRNA full length insert cDNA clone EUROIMAGE 781354. 1418927 17q21.33 12 23 H.sapiens mRNA for prepro-alpha1(I) collagen. 7677071 1q32.1 11 25 Homo sapiens ubiquitin-conjugating enzyme E2. 5834583 19q13.2 11 14 Human mRNA encoding rat C4.4-like protein 339708 17q25.3 11 22 Human thymidine kinase mRNA, complete cds 3483454 12q23.3 11 18 Homo sapiens full length insert cDNA clone ZA02A01. 394340 2q36.1 11 22 H. sapiens (D2S351) DNA segment containing (CA) repeat; AFM294yf5. 13676499 15q15.3 11 20 Macaca fascicularis brain cDNA clone:QflA-13282. 2367444 3p25.2 11 17 Myoblast city 5419854 3q24 11 23 Homo sapiens mRNA; cDNA DKFZp566N043. 34070 17q21.2 10 5 16 Human mRNA for cytokeratin 15 307505 15q14 10 24 Human thrombospondin 2 (THBS2). 13279202 2q35 10 15 Human, insulin-like growth factor binding protein 2 (36kD). 5103013 22q11.23 10 25 Homo sapiens genomic DNA, chromosome 22q11.2, clone KB1125A3. 184472 2q37.1 10 13 Human bilirubin UDP-glucuronosyltransferase isozyme 1. 4589928 20q11.21 10 21 Homo sapiens mRNA for fls353. 12275863 11q23.3 10 17 Homo sapiens tripartite motif protein TRIM29 alpha. 6318669 5q34 10 23 Homo sapiens pituitary tumor-transforming 1 (PTTG1) gene, exon 4. 2911809 22q12.2 10 27 Human PNG pseudogene, complete sequence 1185462 3q26.31 10 19 Human ARF-activated phosphatidylcholine-specific phospholipase D1a (hPLD1). 179595 7q21.3 9 19 Homo sapiens pro-alpha 2(I) collagen (COL1A2) gene, complete cds. 14042718 3q25.1 9 17 Homo sapiens:Probable G-protein coupled receptor (GPR87). 12803708 17q21.2 9 2 19 Homo sapiens, keratin 14. 14602472 1q32.1 9 18 Human ladinin (LAD) mRNA. 14279429 Xq28 9 10 Human chondrosarcoma CSAG2b mRNA sequence 7328949 10p15.1 9 14 Human 20-alpha HSD gene for 20 alph-hydroxysteroid dehydrogenase. 533527 Xq28 9 16 Homo sapiens, melanoma antigen, family A, 9, mRNA, complete cds. 13516845 4q28.3 9 13 Human hxCT mRNA for cystine/glutamate exchanger, complete cds 12052859 10q24.33 9 16 Human mRNA; cDNA DKFZp564O2071 (from clone DKFZp564O2071); complete cds 14042497 12p11.1 9 19 Human cDNA FLJ14751 fis, similar to R. norvegicus potassium channel regulator 1 10440167 4q35.1 9 18 Human cDNA: FLJ23468 fis, clone HSI11603 914203 8q23.1 9 23 blk=protein tyrosine kinase [Human, B lymphocytes, mRNA, 2608 nt]. 6729680 17q21.2 9 18 Homo sapiens keratin 19 (KRT19) gene. 15012098 NP 8 15 Human, similar to small inducible cytokine subfamily B (Cys-X-Cys), member 10 340159 10q22.2 8 21 Source: Human pro-urokinase. 6274527 15q13.3 8 16 Human gremlin mRNA Continued

Oncogene Lung cancer target identification J Heighway et al 7753 Table 2 (Continued ) Genbank Acc Chr. loc. 46up in: 166up in: 26up in: Gene description (closest PD match to element sequence)

2306914 17p12 8 22 Homo sapiens protein kinase mRNA. 188691 1q22 8 10 Human migration inhibitory factor-related protein 8 (MRP8) gene. 2546963 6p21.31 8 2 16 Human mRNA for diubiquitin 30374 3q21.1 8 13 Human radiated keratinocyte mRNA for cysteine protease inhibitor. 14286293 10q23.33 8 19 Homo sapiens, similar to RIKEN cDNA 1200008O12 gene. 12803652 17q21.2 8 3 19 Source: Homo sapiens, keratin 13. 14043731 8q24.13 8 12 Homo sapiens, clone MGC:14128 IMAGE:3610547. 929656 9q34.13 8 12 H.sapiens mRNA for neutrophil gelatinase associated lipocalin. 10439469 2q16.3 8 22 FLJ22932 fis, highly similar to Human carcinoma-associated antigen GA733-2. 14603393 1q21.3 7 14 Human, CK2 interacting protein 1; HQ0024c protein. 35322 16q22.1 7 19 H.sapiens mRNA for p cadherin. 897916 19q13.11 7 9 Human 11kd protein mRNA. 1737211 5q13.2 7 17 Human basic transcription factor 2 p44 (btf2p44) gene. 12653220 17p11.2 7 19 Human, ubiquitin B, clone MGC:8385 IMAGE:2820408. 414584 14q22.2 7 21 Human protein tyrosine phosphatase (CIP2). 14286195 17p11.2 7 13 Human, aldehyde dehydrogenase 3 family, member A1. 453368 18q21.33 7 18 Human maspin mRNA, complete cds. 12653112 11q12.3 7 21 Homo sapiens, flap structure-specific endonuclease 1. 181915 17p11.2 7 19 Human ubiquitin carrier protein (E2-EPF) mRNA, complete cds. 1172085 18q21.33 7 11 Human squamous cell carcinoma antigen (SCCA1) gene, exon 8. 338424 1q22 7 3 10 Human small proline rich protein (sprII) mRNA, clone 174N. 6688152 1q22 7 4 10 Homo sapiens mRNA for small proline-rich protein 3 (SPRR3 gene). 4929718 17q13.1 7 16 Human CGI-125 protein. 14286301 1q25.1 7 17 Homo sapiens, lactate dehydrogenase B. 5689824 15q13.1 7 22 Homo sapiens mRNA full length insert cDNA clone EUROIMAGE 295344. 36154 Xp11.3 7 23 H.sapiens RR2 mRNA for small subunit ribonucleotide reductase. 187121 3p21.31 7 17 Human lactoferrin (HLF1). 338433 1q22 5 13 Human small proline-rich protein gene. 3327379 1q22 4 9 Source: Homo sapiens small proline-rich protein 1 (SPRR1A) gene. 14599490 1q22 3 10 Human small proline-rich protein 2F (SPRR2F) gene. 12804462 12q23.2 3 4 Human, similar to achaete-scute complex (Drosophila) homolog-like 1. 13325357 18q21.32 2 2 Homo sapiens, gastrin-releasing peptide. 338414 10q11.23 2 7 Human prostate secreted seminal plasma protein.

Listed is the closest Genbank match to the Incyte cDNA element sequence, the most likely genomic position of the sequence (BLAT similarity), the number of times the element score exceeded the filter threshold, the number of times it was scored as differentially expressed (DE42) and the abbreviated Genbank annotation for the matched sequence

Table 3 Lists in ranked order four- and 16-fold tumour under-represented elements that passed the restriction criteria 46 166 26 Genbank Acc Chrom. Loc. down in: down in: down in: Gene description (closest PD match to element sequence)

190671 10q22.3 30 21 34 Human pulmonary surfactant-associated protein. 190564 10q22.3 29 11 34 Human pulmonary surfactant apoprotein (PSAP) gene, SFTPA2. 13543547 16p13.3 29 8 32 Human, hemoglobin, alpha 2. 13623624 7q31.2 28 7 33 Human, Similar to caveolin 1, caveolae protein, 22kD. 338306 1q25.3 27 12 30 Human pulmonary surfactant protein (SP5). 29436 8q24.13 27 10 30 Human messenger RNA for beta-globin. 177869 12p13.31 26 6 30 Human alpha-2-macroglobulin mRNA. 3413863 1q42.13 26 2 35 Human mRNA for KIAA0451 protein. 2352944 16p13.12 25 34 Human smooth muscle myosin heavy chain SM2 mRNA, alternatively spliced. 28428 11q13.4 25 5 29 Human mRNA for adult folate binding protein. 34766 10q23.1 25 2 32 H.sapiens mRNA for lung surfactant protein D. 1314305 7p14.3 24 32 Human channel-like integral membrane protein (AQP-1). 10439798 4q26 24 30 Human cDNA: FLJ23191 fis, clone LNG12304. 14290606 1q21.3 24 31 Homo sapiens, Similar to selenium binding protein 1. 10440200 14q13.3 23 7 29 Homo sapiens cDNA: FLJ23494 fis, clone LNG01885. 14036106 16p13.3 23 3 24 Unnamed protein product [Homo sapiens]. 12841011 NP 23 2 28 Protein: putative [Mus musculus]. 219667 5q33.1 23 32 Human mRNA for glutathione peroxidase. 1699037 16p13.3 22 2 30 Human ABC3 mRNA. 15079188 5q35.3 22 7 30 HIN-1 putative cytokine. 37207 3p21.31 22 32 Human mRNA for slow skeletal troponin C (TnC). 4585369 12q14.2 21 2 31 WIF-1; secreted protein; binds Wnt and inhibits their activities. 15209752 14q21.1 21 31 Unnamed protein product. 3882236 6p12.3 21 29 Homo sapiens mRNA for KIAA0758 protein. 2072389 19q13.13 21 3 27 H.sapiens zinc finger transcriptional regulator. Continued

Oncogene Lung cancer target identification J Heighway et al 7754 Table 3 (Continued ) 46 166 26 Genbank Acc Chrom. Loc. down in: down in: down in: Gene description (closest PD match to element sequence)

3360425 19p13.3 21 35 Homo sapiens clone 23822 mRNA sequence. 291819 17q11.2 21 3 28 Human Na+/Cl- dependent serotonin transporter. 2781438 3p21.31 20 31 Homo sapiens alpha 2 delta calcium channel subunit isoform II. 7019845 1q23.3 20 32 Human cDNA FLJ20022 fis, clone ADSE01331. 1164906 4q34.1 20 27 Human mRNA for 15-hydroxy prostaglandin dehydrogenase. 183628 4q21.1 20 5 33 Human cytokine (GRO-beta). 1408187 2q35 20 3 31 Human desmin. 7768719 21q22.11-21q22.12 20 33 Human genomic DNA, chromosome 21q, section 63/105. 7963631 15q15.3 20 2 30 Human dual oxidase mRNA. 6910977 4p15.1 20 27 Human sodium dependent phosphate transporter isoform NaPi-Iib. 12803322 15q25.1 20 2 30 Human, cathepsin H. 233233 13q22.3 20 30 ETB endothelin receptor [Human, mRNA, 1872 nt]. 180968 1p33 19 4 29 Human lung cytochrome P450 (IV subfamily) BI protein. 9624485 6p21.1 19 28 Homo sapiens triggering receptor expressed on monocytes 1 (TREM1). 178099 4q24 19 30 Human class I alcohol dehydrogenase (ADH2) beta-1 subunit. 38222 16p13.3 19 29 Chimpanzee (Pan troglodytes) alpha-globin. 182310 9q22.32 18 28 Human fructose-1,6-bisphosphatase homotetramer gene. 187209 8p21.3 18 28 Human lipoprotein lipase. 3779197 7p21.1 18 26 Source: secreted cement gland protein XAG-2 homolog [Homo sapiens]. 12841376 NP 18 29 Hypothetical protein. 13959017 11q24.2 18 32 Homo sapiens endothelial cell-selective adhesion molecule (ESAM). 13560282 NP 18 26 Homo sapiens mRNA, complete cds, clone:SMAP31-22. 188675 10p12.33 17 32 Human mannose receptor. 3002790 2q14.2 17 31 Homo sapiens macrophage receptor MARCO. 10441878 1q21.3 17 32 Homo sapiens clone PP1396. 13528965 12p12.3 17 26 Homo sapiens, matrix Gla protein. 6137107 13q14.13 17 32 Homo sapiens RGC32. 3171913 7p13 16 30 Homo sapiens mRNA encoding RAMP3. 2661210 12q13.3 16 28 Human oxidative 3 alpha hydroxysteroid dehydrogenase. 7767238 11q23.2 16 31 Human nectin-like protein 2 (NECL2) mRNA. 3006201 3q22.1 16 28 Homo sapiens prostaglandin transporter hPGT. 1082037 19q13.2 16 2 24 Human G0S3. 37408 3p21.32 16 30 Human mRNA for tetranectin. 11545317 1q24.2 16 28 Homo sapiens leucine zipper nuclear factor (BLZF1) gene. 32130 5q33.1 16 3 26 Human mRNA for HLA-DR antigens associated invariant chain (p33). 2853223 Xq26.3 15 31 Human skeletal muscle LIM-protein FHL1. 10047228 5q12.1 15 33 Human mRNA for KIAA1577 protein. 7768757 21q22.3 15 27 Human genomic DNA, chromosome 21q, section 96/105. 29720 11p13 15 27 Human kidney mRNA for catalase. 2745959 10p15.1 14 27 Human proto-oncogene Bcd orf1 and orf2. 37946 12p13.31 14 30 Human mRNA for pre-pro-von Willebrand factor. 6518912 20p13 14 35 Source: Homo sapiens Bit. 32283 6p21.32 14 2 27 Human mRNA for HLA-D class II antigen DR1 beta chain. 2398620 10q26.12/11q22.1 14 5 28 Human mRNA for DMBT1 6 kb transcript variant 1 (DMBT1 gene). 188194 6p21.31 14 26 Human MHC HLA-DQ beta. 7023111 17q25.1 14 18 Human cDNA FLJ10832 fis, similar to NG-CAM related cell adhesion precursor 29980 5q35.1 13 22 H.sapiens CL 100 mRNA for protein tyrosine phosphatase. 8570526 1q22-q23 13 26 Human genomic DNA, chromosome 1q22-q23, CD1 region, section 4/4. 927595 4q22.1 13 27 Human prepromultimerin. 5629914 13q31.1/Xq12 13 31 Homo sapiens mRNA for Z39Ig protein. 10439145 12q24.21 13 27 Homo sapiens cDNA: FLJ22667 fis, clone HSI08385. 14585855 8p21.3 13 28 Phosphatidylethanolamine binding protein. 182127 1q23.1 13 23 Human episialin variant B mRNA, 3’ end. 14250135 7p21.2 13 19 Homo sapiens, similar to RIKEN cDNA 0610006G05 gene. 4884242 4q26 13 27 Human mRNA; cDNA DKFZp564G112. 12653718 1p36.11 13 29 Homo sapiens, CDW52 antigen (CAMPATH-1 antigen). 459812 16q22.3 13 2 16 Human haptoglobin alpha(2FS)-beta. 184831 12q23.2 12 31 Human insulin-like growth factor I (IGF-I) gene, exon 1A. 28965 14q32.13 12 25 Human mRNA for alpha 1-antitrypsin. 5725507 11q25 12 28 Human METH2 protein (METH2). 6457337 12q24.23 12 24 Human E2IG1. 38517 11q23.2 12 19 H.sapiens of PLZF gene encoding kruppel-like zinc finger protein. 179715 5p12 12 22 Human complement protein component C7. 13543471 1q24.3 12 30 Homo sapiens, flavin containing monooxygenase 2. 306754 22q11.23 12 24 Human gamma-glutamyl transpeptidase. 10438470 12p13.1 12 25 Homo sapiens cDNA: FLJ22182 fis. 1439565 1p13.3 12 24 Human chitinase precursor (HUMTCHIT) exon 1a form. 178089 4q24 12 26 Human class I alcohol dehydrogenase (ADH1) alpha subunit. 532677 14q11.2 12 25 Human mRNA for ribonuclease A (RNase A). Continued

Oncogene Lung cancer target identification J Heighway et al 7755 Table 3 (Continued ) 46 166 26 Genbank Acc Chrom. Loc. down in: down in: down in: Gene description (closest PD match to element sequence)

36485 20q13.12 12 23 Human SLPI gene for secretory leukocyte protease inhibitor. 189775 17q24.2 12 29 Human platelet endothelial cell adhesion molecule (PECAM-1). 1050957 1q32.1 11 15 Human chitotriosidase precursor. 1888522 16q13 11 22 Human CX3C chemokine precursor, alternatively spliced. 13516830 3q12.3 11 24 Homo sapiens MAIL. 178201 5q32 11 27 Human beta-2-adrenergic receptor. 10439131 18q22.3 11 25 Homo sapiens cDNA:highly similar to HUMCYB5 Human cytochrome b5. 6980069 16p13.3 11 23 Human chloride channel protein 7 (CLCN7). 6691160 19p13.2 11 31 Homo sapiens low density lipoprotein receptor (LDLR) gene. 13506804 9q32 11 25 Homo sapiens thymic stromal co-transporter. 12803502 6p21.33 11 27 Human, MHC, class I, E, clone MGC:1408 IMAGE:3140835. 11493263 20q13.31-20q13.32 11 19 bA196N14.1 (novel protein) [Homo sapiens]. 7799101 19p13.3 11 28 Human mRNA full length insert cDNA clone EUROIMAGE 532078 12697928 3q25.32 10 28 Human mRNA for KIAA1692 protein. 182438 4q28 10 18 Homo sapiens map 4q28 fibrinogen (FGG) gene, alternative splice products. 2335034 9p13.3 10 19 Homo sapiens mRNA for SLC. 2343146 2q33.1 10 22 Human aldehyde oxidase (AOX1) gene, exon 28. 13097581 16p13.3 10 17 Human, mesothelin, clone MGC:10686 IMAGE:3611296. 13649389 11q12.2 10 14 Homo sapiens MS4A8B protein. 747969 Xq23 10 17 Human II type 2 receptor gene. 14133266 8q23.1 10 23 Human mRNA for KIAA0003 gene. 599833 16q22.1 10 28 H.sapiens VE-cadherin mRNA. 4071356 4p15.1 10 25 Human sodium dependent phosphate transporter isoform NaPi-3b. 12855896 8p21.1 10 21 Hypothetical protein. 240867 17q11.2 10 2 23 Monocyte chemoattractant protein-1 [Human, mRNA, 739 nt]. 7023805 12q23.2 10 25 Homo sapiens cDNA FLJ11259 fis, clone PLACE1009045. 4589551 11q12.2 10 17 Homo sapiens mRNA for KIAA0954. 458227 2p16.1 10 23 Human extracellular protein (S1-5). 1182066 16p13.3 10 22 Human tryptase. 188231 6p21.32 10 23 Human hla-dr antigen alpha-chain mrna & ivs fragments. 13529085 12q21.33/6p24.3 10 22 Homo sapiens, decorin, clone MGC:12406 IMAGE:3934022. 35183 14q32.12 9 19 Homo sapiens mRNA for ISG12 protein. 31317 1q23.2 9 25 Human mRNA for high affinity IgE receptor alpha-subunit (FcERI). 339972 8p21.1 9 22 Human TRPM-2. 4092862 6q21 9 28 Human non-p53 regulated PA26-T1 nuclear protein (PA26). 13436208 6p12.1 9 22 Human, clone IMAGE:3534745. 1237180 Xq26.3 9 19 Homo sapiens (chromosome X) glypican (GPC3). 14715076 9q34.11 9 26 Human, erythrocyte membrane protein band 7.2 (stomatin). 35896 10q23.33 9 23 Human mRNA for retinol binding protein (RBP). 32133 6p21.31 9 21 Human DS glycoprotein alpha subunit from the HLA-D region of the MHC(MHC). 1589755 1p31.2 9 25 Human leptin receptor (LEPR) gene, exon 3. 179933 3q24 9 22 Human mast cell carboxypeptidase A. 14250013 6p21.32 9 23 Homo sapiens, similar to major histocompatibility complex, class II, DR beta 5. 177830 14q32.13 9 18 Human alpha-1-antitrypsin gene (S variant). 4929690 5q23.3 9 14 Human CGI-111 protein. 12803626 20q11.22 9 21 Human, similar to myosin regulatory light chain 2, smooth muscle isoform. 183866 5q31.3 8 22 Human heparin-binding EGF-like growth factor. 188691 1q22 8 12 Human migration inhibitory factor-related protein 8 (MRP8) gene. 2655411 4q12 8 29 Homo sapiens KDR/flk-1 protein. 190934 19q13.33 8 24 Human R-ras gene, exons 2 through 6. 179501 20p12.3 8 25 Human bone morphogenetic protein 2A (BMP-2A). 3986746 1q22 8 25 Tuftelin [Bos taurus]. 396814 1p34.1 8 28 Human tie mRNA for putative receptor tyrosine kinase. 14290437 1p36.12 8 20 Human, complement component 1, q subcomponent, beta polypeptide. 13938192 5q33.2 8 25 Homo sapiens, clone MGC:14986 IMAGE:2906244. 13937881 1q31.2 8 18 Human, regulator of G-protein signalling 2, 24kD. 2474095 16p13.2 8 29 Human XMP. 3090418 1q32.2 8 29 Human pfkfb2 gene, exons 1 to 15. 12653554 3q21.3 8 14 Human, lysophospholipase-like clone. 12851480 7p14.1 8 14 hypothetical protein [Mus musculus]. 7688148 1p36.13 8 22 hypothetical protein [Homo sapiens]. 13111800 10p12.33 8 22 Homo sapiens, vimentin, clone MGC:5062 IMAGE:2985712. 7768679 21q22.12 8 22 Human genomic DNA, chromosome 21q, section 64/105 12803542 14q13.2 8 21 Human, nuclear fact. of kappa light ppeptide gene enhancer in B-cells inhib, alpha. 5262485 3q23 8 2 25 Human mRNA; cDNA DKFZp564B2062. 3236319 3p14.3 8 12 Homo sapiens DNase gamma. 2905995 19q13.12 7 21 Human DAP12 gene. 2826520 7q34 7 26 Human maltase-glucoamylase. 2565193 11q13.4 7 24 Homo sapiens folate binding protein. Continued

Oncogene Lung cancer target identification J Heighway et al 7756 Table 3 (Continued ) 46 166 26 Genbank Acc Chrom. Loc. down in: down in: down in: Gene description (closest PD match to element sequence)

2605495 3p25.1 7 21 Human ppar gamma gene for peroxisome proliferator activated-receptor gamma. 7295892 NP 7 31 CG3104 gene product [Drosophila melanogaster]. 4049586 4p15.2 7 29 Homo sapiens mRNA for Slit-2 protein. 32432 10q22.1 7 19 Human mRNA for hematopoetic proteoglycan core protein. 189084 19q13.13 7 16 Human nonspecific crossreacting antigen. 6048564 11q12.3 7 16 Homo sapiens retinoid inducible gene 1 (RIG1). 36062 6p21.32 7 21 Human RING6 mRNA for HLA class II alpha chain-like product. 10438278 4q35.1 7 27 Homo sapiens cDNA: FLJ22029 fis, clone HEP08661. 4958949 20p11.22 7 20 Human HNF-3beta mRNA for hepatocyte nuclear factor-3 beta. 10436243 6q22.31 7 20 Human cDNA FLJ13942 fis: weakly similar to myosin heavy chain. 508486 12q13.13 7 22 Homo sapiens L-glycerol-3-phosphate:NAD oxidoreductase. 32243 6p21.32 7 19 Human HLA-SB(DP) alpha gene. 5353532 9q31.3 7 22 Human zinc finger transcription factor GKLF. 12804216 12q24.12 7 23 Homo sapiens, aldehyde dehydrogenase 2, mitochondrial. 190673 2p11.2 7 28 Human pulmonary surfactant-associated protein B (SP-B).

Listed is the closest Genbank match to the Incyte cDNA element sequence, the most likely genomic position of the sequence (BLAT similarity), the number of times the element score exceeded the filter threshold, the number of times it was scored as differentially expressed (DE42) and the abbreviated Genbank annotation for the matched sequence

Validation of expression measurements in gene lists targeted BCL2 gene and approximately 150Kb prox- To confirm that the large-scale analysis had correctly imal to a second over-represented gene in the validated identified significant and widely differentially expressed set, SCCA1 (duplicated in the genome). Whilst the sequences, six genes across the range from the fourfold array analysis suggested that BCL2 was marginally upregulated list and two from the fourfold down differentially expressed in only one tumour, a third regulated list were selected. The DE value for the gene in this region, SERPINB13, located between control gene KIAA0228 did not exceed two for any maspin and SCCA1, was found to be over-represented tumour/normal pair analysed. A second control locus in some tumours. This observation was subsequently with a DE 52 in all samples was selected (phospha- validated by cmRT – PCR (Figure 2b) with tidylinositol transfer protein, PI-TPbeta, Genbank SERPINB13 transcripts detected in a range of tumours Accession D30037) and was validated across tumours but only very rarely, if at all, in normal lung tissue. Of and normal pairs against KIAA0228, within and the three genes in this region, maspin, outside the microarray analysed series. The expression transcribed in the same direction as SERPINB13 and alterations predicted were validated across a panel of SCCA2 (SERPINB3) and in the opposite direction to paired cDNA samples generated from the RNA used BCLSCCA12, was the most frequently and specifically in the microarray hybridizations (Figure 2a,b). In each over-represented in the tumour tissues analysed. A case, the RT – PCR data confirmed the direction of DE similar result was obtained with respect to expression predicted by the microarray analysis. Of the tumour of these genes in lung cancer cell lines, with maspin under-represented sequences tested; WIF1, a secreted being the most frequently expressed locus and only one inhibitor of the Wnt signalling pathway (Hsieh et al., line, LUDLU1, expressing all three . 1999), appeared to be markedly reduced in all but one tumour (T86: in which the gene appeared to be over- Western and immunohistochemical analysis of S100A2 represented in both array and cmRT – PCR analyses). levels Interestingly, g-catenin, a putative target of Wnt signalling (Kolligs et al., 2000), was also variably Our study indicated that the calcium-binding protein over-represented in the tumour series (DE 42in14 S100A2 was over-represented in the majority of non- out of 37 samples). Caveolin 1, a gene implicated as a small cell lung tumours, in contrast to the results tumour suppressor in some tissues, was also consis- shown by Feng et al. (2001), which suggested that tently and dramatically under-represented in the lung S100A2 expression is diminished in lung cancer cells tumours. One hundred and six NSCLC tumours were from an early stage. To further analyse S100A2 subsequently screened by RFLP – PCR analysis for expression and attempt to resolve this difference, we mutations within caveolin 1: codon 132, a region extended our investigations of this gene to the protein mutated in a sub-group of breast cancer patients level. Western analysis of matched protein extracts (Hayashi et al., 2001). No mutations were identified from several tumour : normal tissue pairs with an suggesting that alterations to this codon are unlikely to S100A2 specific antibody was entirely consistent with be important in lung cancer. the transcript data for those lesions in indicating a Maspin is mapped to a region of 18q21.33 dramatic over-representation of the products of this approximately 150 Kb distal to the translocation gene in the tumour samples (Figure 3). These data were

Oncogene Lung cancer target identification J Heighway et al 7757 confirmed by an immunohistochemical analysis of a that gene products which are abnormally regulated in range of tumours, where intense staining, either supporting non-malignant cells might be more easily cytoplasmic or both cytoplasmic and nuclear, was exploited given that the non-neoplastic genomes of observed in tumour cells from a range of patients. In such cells are less fluid (Folkman et al., 2000). The the samples examined, connective tissue was invariably ideal lung cancer target would be highly specific for negative for S100A2 antibody staining. Interestingly, in tumour tissue cells and should therefore be readily the normal lung from cancer patients, whilst the detected as an over-represented sequence by the majority of cells were negative, strong nuclear staining analysis undertaken. of a layer of bronchial basal epithelial cells was Of the genes identified in our over-represented lists, observed (Figure 4). matrix metalloproteinase 1 (MMP1) stands out as being significantly associated with lung cancer. Whilst therapies directed against MMPs have not yet been Gene amplification analysis shown to be clinically effective in the disease (Bonomi, For a number of validated loci, tumours were screened 2002), a promoter polymorphism, which it is argued for evidence of gene amplification (or in the case of influences basal expression levels of the gene, appears WIF1, homozygous loss) by cmPCR. Whilst in most to modify lung cancer risk (Zhu et al., 2001). cases we found no evidence for relative copy number Furthermore, Joos et al. (2002) concluded that changes in the disease tissue (maspin, MMP1, S100P, polymorphisms in the MMP1 and MMP12 genes are WIF1), clear data consistent with low-level amplifica- either causative factors in smoking-related lung injury tion was obtained for the S100A2 gene from one out of or are in linkage disequilibrium with causative 35 primary tumours, which as determined by cmRT – polymorphisms. In comparing our data to other PCR analysis of adjacent tissue sections, appeared to epithelial cancer profiles, it also is interesting to note strongly express the gene (Figure 5). In the majority of that putative therapeutic targets in our lists have been cases tested, it seems likely that the gene expression identified a number of times in other malignancies. changes observed are not driven by dramatic numerical Specifically, osteopontin has been shown to be over- alterations in tumour gene copy number. represented in ovarian and colon cancer (Kim et al., 2002; Agrawal et al., 2002) and PRAME over- represented in ovarian cancer (Welsh et al., 2001). Discussion Similarly, S100A2 has been reported to be over- expressed in ovarian cancer (Hough et al., 2001). It has been suggested that there are two inter-related However the gene lists are substantially different to applications facilitated by microarray-based gene those generated in other NSCLC studies (Garber et expression profiling: the identification of novel disease al., 2001; Bhattacharjee et al., 2001; Giordano et al., relevant targets and pathways, and tumour classifica- 2001; Nacht et al., 2001). This is likely due to the tion and stratification (Bertucci et al., 2001). There are different criteria used to build the lists in response to many recent examples of the use of disease profiles to different primary aims (classification versus target stratify tumours into clinically relevant sub-groups. identification). However, currently, there is little evidence that such a S100A2 has been implicated in lung cancer in a strategy would lead directly to an improvement in previous microarray analysis of a human cell line mortality rates for lung cancer patients, given that model system. Interestingly, it was concluded that effective curative treatment options are very limited. this sequence was frequently down-regulated early in Consequently, our primary aim was to design an lung cancer development (Feng et al., 2001). Indeed, analysis that might highlight potential new lung cancer on the basis of their data, the authors concluded targets. that lack of S100A2 expression might be a useful The relative expression of each gene element was marker of early disease and that therapeutic where possible scored by reference to the patient strategies designed to upregulate the expression of matched normal lung tissue. Given that matched this gene might be appropriate. Supporting this tissues have been through identical clinical procedures, argument is a number of expression studies that such a strategy minimises apparent variation due to have implicated S100A2 as a breast tumour unequal degradation of mRNA. Additionally, it suppressor gene (Liu et al., 2000; Wicki et al., minimises any contribution from expression poly- 1997). However, our analysis on the expression of morphisms as it compares tissues with matched this gene in lung cancer differs from the earlier genotypes. No attempt was made to microdissect a report by demonstrating a consistent and strong pure tumour or normal epithelial component for these over-representation of this gene in multiple lung analyses. Given that the cell of origin of lung tumours tumour samples. We observed comparatively high is currently unclear, the use of microdissection to levels of mRNA and strong tumour cell-localized generate a normal-cell probe could in the worst case cytoplasmic and nuclear S100A2 protein expression generate an irrelevant population. Considering the raising the possibility that the product(s) of this gene tumour tissue, whilst it is perhaps likely that tumour- may serve as a lung cancer diagnostic marker. cell-overexpressed sequences will be the most effective Whilst the majority of normal cells appeared not targets, we are acutely aware of the counter-argument to express S100A2, strong nuclear staining was

Oncogene Lung cancer target identification J Heighway et al 7758 observed in a layer of basal lung epithelial cells and functional analysis has suggested that the underlying the columnar epithelium of major protein may act as a suppressor of metastasis airways. The basal cells are multipotent cells capable (Seftor et al., 1998). Paradoxically, Maass et al. of differentiating into many of the mature epithelial (2001) reported that the sequence was strongly cell types found in normal lung, raising the expressed relative to normal tissue in the majority possibility that these particular cells may represent of primary pancreatic tumours analysed and Umeki- the precursors of the S100A2-expressing lung tumour ta et al. (2002) in a recent study of breast cancer, cells. The difference in results observed in each study suggested that strong expression of maspin is may be due to the different control material used, or correlated with poor prognosis. The seemingly due to differences between the cell model system and contradictory results of both S100A2 and maspin the primary tissues analysed and highlights the need suggest that these and other genes perhaps not only to supplement and context tissue-derived transcript behave differently in particular tumours, but also abundance data with in situ hybridization or that they may play different roles in different tissue immunohistochemical analyses of primary material. types and perhaps even within different components A second gene, previously implicated as a tumour of a tissue. suppressor, the serine protease inhibitor maspin In addition to MMP1, the generated gene list also (SERPIN B5) was strongly over-represented in the contains a pre-existing anti-cancer drug target, DNA majority of the tumours analysed. Maspin is thought topoisomerase a (Topcu, 2001; Matsumoto et al., to be down-regulated in breast and prostate cancer 2001). Also represented are a number of putative

Oncogene Lung cancer target identification J Heighway et al 7759

Figure 2 (a and b) Show the cmRT – PCR data for selected genes from the fourfold upregulated lists. Standard RT – PCR data for MMP1 and maspin are also included to show the degree of specificity in gene expression in the tumour compared to the matched normal tissue. (b) Samples analysed for MMP1 expression by standard and comparative RT – PCR are identical and in equivalent order as marked. (a) cDNA targets for S100P S100A2, maspin, SCCA1 and SERPINB13 were common and as marked. RT – PCR targets for maspin were as marked

Oncogene Lung cancer target identification J Heighway et al 7760 transmembrane signalling proteins, kinases, various other enzymes and an ion channel regulator. Such

a

Figure 3 Shows the Western analysis for S100A2 expression. Five tumours which strongly expressed S100A2 (Figure 2a and samples outside the array series) were selected on the basis of b cmRT – PCR data. Approximately equivalent quantities of ex- tracted proteins from matched normal and tumour samples were loaded. No immunoreactive protein was detected in tumour sam- ples which were scored as negative for over-representation of S100A2 (not shown). Under the conditions used, the S100A2 spe- cific antibody detected a small immunoreactive protein in the tu- mour samples only. The apparent size of this protein was smaller than predicted for S100A2 from sequence data (*11 kDa). PAGE-related discrepancies in apparent versus predicted size have Figure 5 (a and b) Show cmPCR analyses of genomic DNA previously been reported for S100 proteins (including S100A2) from tumour and normal samples. Tumour 78 shows a bias in with mass spectrometry ultimately used to confirm the true mole- both sets of reactions towards amplification of the S100A2 geno- cular weights (Ilg et al., 1996). However, we cannot rule out the mic versus control genomic products ((a) g-globin : chr 11 and (b) possibility that the apparent molecular weight of the immunoreac- SPLIT-2: chr 4) compared to both the matched normal from pa- tive protein in tumours is related to alternative processing of tient 78 and further to a range of other normal and tumour sam- either the S100A2 transcript or protein ples. Control primers used were (a) g-globin and (b) SPLIT-2

Figure 4 Shows immunohistochemistry results for normal and tumour tissue. The primary antibody was the S100A2 specific poly- clonal. Clear immunoreactivity is seen in the tumour, with stromal cells being negative. In some cases this strong staining appears to be cytoplasmic (69), in others cytoplasmic and nuclear. CmRT – PCR and Western data suggested that both 191T and 69T ex- pressed high levels of S100A2 mRNA, observation consistent with these IHC data. In normal lung, expression (strong nuclear) ap- peared to be limited to a layer of basal epithelial cells immediately beneath the columnar epithelia lining the major airways

Oncogene Lung cancer target identification J Heighway et al 7761 sequences are putative therapeutic targets that would isolated from each sample was subjected to T7 amplification fall into the class of potentially drugable proteins. The as described in Pabon et al. (2001). Reverse transcription and requirements of a lung cancer detection or diagnostic labelling reactions and hybridizations were performed as described in Yue et al. (2001). Hybridizations were performed reagent are less stringent being simply that the tumour- TM encoded protein is ideally not represented in the in duplicate across each array design. Axon GenePix scanners (Foster City, CA, USA) were used to scan the specific diagnostic compartment (sputum, serum) microarrays in both the Cy3 (535 nm) and Cy5 (625 nm) unless a disease is present or imminent. The number channels. The signals were subsequently converted to a of potential diagnostic targets in the list is therefore resolution of 16 bits per pixel and balanced using the total wider. In both cases, further pre-clinical validation Cy3 and Cy5 signals. The surrounding background was studies are warranted. subtracted from each element signal, and Cy3 : Cy5 ratios were computed as described in Yue et al. (2001). Materials and methods Data analysis Clinical samples The Incyte microarray platform has been extensively quality Approval for this study was obtained from local ethics controlled (Yue et al., 2001). On the basis of these established committees in Liverpool and Bern. Two sets of samples were performance criteria, a differential expression (DE) between used in the microarray analysis (Table 2). The first (26) the tumour and normal channel was considered significant if represented sequentially obtained resectable-staged NSCLCs it exceeded twofold. The hybridization signal from each and matched normal tissue from clinical centres in Liverpool, element was assessed with respect to (a) height of that signal UK. The second series of paired tissues (13) were collected as above background and (b) signal diameter. A DE was only part of a clinical trial of neo-adjuvant chemotherapy centred calculated if, in at least one channel, the signal to in Bern, Switzerland. These NSCLC patients were all staged background ratio exceeded 2.5, the signal intensity was above IIIA on entry into the study and followed a course of three 250 units, and the element diameter exceeded 40% of the cycles of docetaxel and cisplatin therapy (days 1, 22 and 33) mean element diameter for that array. In addition, control prior to resection (day 64) under SAKK protocol 16/96. elements including spiked-in yeast DNA fragments and Immediately following surgery, tumour tissue and matched housekeeping genes were assessed as an additional measure normal lung tissue, approximately 10 cm distant from the of hybridization quality. primary lesion, were crudely dissected and snap frozen in Before calculation of the DE a correction factor was liquid nitrogen. Both diseased and normal tissues were applied to each Cy5 probe (corresponding to the tumour therefore matched with respect to the clinical procedure and sample) element value to normalise system signal differences collection process. The tumours used in this section of the between the Cy3 and Cy5 channels arising from slight analysis, with the exception of B2, all showed a clinical differences in RT or dye labelling efficiency. The Cy5 signal response to the neo-adjuvant therapy. Additional matched value was multiplied by a balance coefficient calculated as a NSCLC tumour pairs from the Roy Castle tissue bank were ratio of the total Cy3 and total Cy5 signals. The DE values used for extended cmRT – PCR, Taqman and mutation were exported from GEMToolsTM software (Incyte Geno- analysis. mics, Inc., Palo Alto, CA, USA) into Microsoft Excel to allow averaging over duplicate hybridizations (most samples) as the median of the log(2) DE values for each element. The Clinical sample processing averaged data was subsequently imported into GeneSpring Three batches of 20, 40 mM frozen sections were cut from the (Silicon Genetics) for all further data analysis. tumour and normal samples (*1 cm diameter) and used respectively to extract RNA, protein and DNA. 5 mM RT – PCR sections flanking those used for RNA extraction and adjacent to those fronting the DNA and protein series were stained to Tumour and normal tissue cDNA was synthesized from 1 ml show tumour histology and to estimate the overall percentage aliquots of total RNA (*100 ng) in a 20 ml poly T-primed of viable overt tumour cells in the region analysed (which reaction using a Promega Reverse Transcription System (as varied between 0% in one neo-adjuvant patient to 100%: directed) and was stored at – 208C. cDNA was similarly mean value 60%). Total RNA was extracted from the tissues prepared from cell line derived mRNA. 0.5 ml of this target using a standard Trizol (Life Technologies) protocol (Heigh- cDNA was used in RT – PCR assays. A stock reaction mix way et al., 2003). Following ethanol precipitation, the RNA was prepared which included (number of tubes+1) (42 ml was resuspended in 100 ml of RNase-free water and stored at dH2O, 5 mlof106 Taq reaction buffer, 0.5 ml of each primer –708C. Yields averaged 125 mg of total RNA from the (0.25 mg), 0.5 ml of a 250 mM dNTP mix and 1.25 units of tumours and 36 mg for the normal tissues. DNA was Roche Taq polymerase). The stock mix was vortexed after extracted by a standard SDS lysis/phenol extraction Taq addition and 49 ml added to each tube. The reactions procedure (Heighway et al., 2003). were immediately cycled on Perkin Elmer 9600 machines at 948C for 2 min, 30 times (588C for 1 min, 748C for 1 min, 948C for 1 min) 588C for 2 min and 748C for 10 min. PCR Microarray analysis products were visualized by UV illumination of 2.5% TBE mRNA was isolated from the total RNA samples with one agarose gels containing 0.5 mg/ml ethidium bromide round of poly(A)+ selection as described in Yue et al. (2001). Matched normal lung mRNA was not available from Primers tumours 57, 65, 111, 132 and B9 and B14. The hybridizations for these samples were therefore run against a pooled normal Primer sequences are available on request (or see http:// sample drawn from the analysis series. To enable sufficient www.roycastle.org/research/). For RT – PCR, the primers sample for the multiple hybridizations involved, mRNA were positioned such that the product obtained was cDNA

Oncogene Lung cancer target identification J Heighway et al 7762 specific. This was verified experimentally by test amplifica- one protease inhibitor cocktail tablet (Roche) per 50 ml). tions of cDNA and genomic DNA targets. Lysis was encouraged by vigorous pipetting and/or syringing of the mix. An equal volume of 26 tracking dye (100 mM Tris-HCl pH 6.8, 16% v/v glycerol, 3% w/v SDS, 8% v/v b- Comparative multiplex RT – PCR and PCR mercaptoethanol, 0.04% w/v bromophenol blue) was added Competitive reactions were carried out using four instead of to the extract, which was subsequently boiled for 10 min. two primers per reaction to concurrently amplify a test and a Approximately equal amounts of protein derived from control locus. The reaction is therefore competitive, the ratio matched normal and tumour sample were electrophoresed of final products being related to the initial transcript ratios. on standard 15% polyacrylamide gels and electroblotted Following cDNA test amplifications containing 0.5 mgof overnight onto Immobilon-P membrane (Millipore). Follow- each primer, and assuming that a test-gene product was ing blocking for 4 h with 4% w/v milk powder in PBS actually obtained from the normal tissues, reactions were containing 0.05% v/v Tween 20, the filter was immersed for balanced such that the product intensity of the two genes 1 h in a 1/500 dilution of primary S100A2-specific rabbit under analysis was approximately equal in the normal polyclonal antibody (Dako). Following 365 min washes in sample. The balancing was carried out by sequentially PBS containing 0.05% v/v Tween 20, specifically bound reducing the concentration of the primers for the stronger primary was detected using a swine anti-rabbit secondary gene product in the initial reaction, so reducing the efficiency antibody, 45 min, 1/1000 dilution (Dako). Following of that amplification. The estimates of relative product 465 min washes, bound secondary was detected using an intensity in the disease tissue were subsequently carried out ECL+Plus enhanced chemiluminesence system (Amersham). visually where, patient-by-patient, the ratio of test to control locus in the normal tissue derived cDNA was compared to Immunohistochemical analysis of S100A2 expression that in the tumour tissue amplification. To minimise experimental variation it is important that the normal and Five mM formalin-fixed, paraffin-embedded sections were tumour cDNA from each patient are assayed in the same dewaxed in xylene for 30 min followed by rehydration way, at the same time, using the same PCR reaction master through a series of graded ethanols to water. Antigens were mix and under identical conditions. Comparative multiplex retrieved using a 15 min microwave treatment in Vector PCR was carried out in an analogous way, using control unmasking solution (10 mM sodium citrate buffer pH 6). primer sets from regions of the genome that are relatively Slides were washed in standing and running water for 15 and unperturbed in lung cancer with respect to gene amplification 5 min respectively prior to loading onto a Dako Autostainer. and homozygous deletion. Endogenous peroxidase was blocked by 15 min incubation in 3% v/v H2O2 and sections were incubated for 60 min in a 1/250 – 1/500 dilution of S100A2 rabbit polyclonal antibody TaqManTM analysis (Dako) followed by detection of specific staining with the For quantitative real-time PCR analysis, cDNA was Dako LSAB2 HRP system with diaminobenzidine substrate. synthesized from 1 mg total RNA in a 25 ml reaction with The slides were counter stained in Haematoxilin/Scotts tap 100 units M-MLV reverse transcriptase (Ambion), 0.5 mM water and then dehydrated through a series of graded dNTPs (Epicentre), and 40 ng/ml71 random hexamers (Fish- alcohols to xylene. Sections were mounted in DPX and er). Reactions were incubated at 258C for 10 min, 428C for visualized. In the negative control sections, the primary 50 min, and 708C for 15 min, diluted to 500 ml, and stored at antibody was replaced by diluent. –308C. PCR primers and probes, details on request, were designed using ABI Primer Express 1.5 software and synthesized by Applied Biosystems, Inc. The QPCR reactions were performed using an ABI Prism 7700 Sequence Detection System in 25 ml total volume with Acknowledgements 5 ml cDNA template, 16 TaqMan Universal PCR Master This study would not have been possible without clinical Mix (ABI), 100 nM each PCR primer, 200 nM probe, and 16 material.Wewouldthereforeliketothankthepatients VIC-labelled Beta-2-Microglobulin endogenous control who allowed us to use their tissues and we would like to (ABI). Reactions were incubated at 508C for 2 min, 958C thank the clinical teams in Liverpool and Bern who for 10 min, followed by 40 cycles of incubation at 958C for collected, characterized and stored that material. We would 15 s and 608C for 1 min. Results were analysed using also like to thank Helen Scott, and Melanie Hubert for Sequence Detector 1.7 software (ABI) and fold differences sectioning, staining and imaging the tissues, Suzanne were calculated using the comparative CT method (ABI User Watson for the S100A2 IHC, Naomi Bowers for bioinfor- Bulletin #2). matics support and Tricia Mitchell for the TaqMan experiments. The Roy Castle Lung Cancer Foundation funded the UK-based component of this work. DC Western analysis Betticher and D Ratschiller are funded by Swiss Cancer A standard Western protocol was used to examine protein League (Grant KFS 703-8-1998) and the clinical material levels of S100A2. 20, 40 mM sections were lysed by the from the neo-adjuvant patients was collected and analysed addition of 1 ml of PE buffer (50 mM Tris-HCl pH 7.5, byMGuggeraspartofSAKK(SwissGroupforClinical 150 mM NaCl, 0.1% w/v SDS, 0.5% v/v NP40 containing Cancer Research) study 16/96 (chaired by DC Betticher).

References

AgrawalD,ChenT,IrbyR,QuackenbushJ,ChambersAF, Bertucci F, Houlgatte R, Nguyen C, Viens P, Jordan BR and Szabo M, Cantor A, Coppola D and Yeatman TJ. (2002). Birnbaum D. (2001). Lancet Oncol., 2, 674 – 682. J. Natl. Cancer Inst., 94, 513 – 521.

Oncogene Lung cancer target identification J Heighway et al 7763 Bhattacharjee A, Richards WG, Staunton S, Li C, Monti S, Kolligs FT, Kolligs B, Hajra KM, Hu G, Tani M, Cho KR Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, Loda M, and Fearon ER. (2000). Genes Dev., 14, 1319 – 1331. Weber G, Mark EJ, Lander ES, Wong W, Johnson BE, Lee PD, Sladek R, Greenwood CM and Hudson TJ. (2002). Golub TR, Sugarbaker DJ and Meyerson M. (2001). Proc. Genome Res., 12, 292 – 297. Natl. Acad. Sci. USA, 98, 13790 – 13795. Liu D, Rudland PS, Sibson DR, Platt-Higgins A and Bonomi P. (2002). Semin. Oncol., 29, 78 – 86. Barraclough R. (2000). Br. J. Cancer, 83, 1473 – 1479. Bray F, Sankila R, Ferlay J and Parkin DM. (2002). Eur. J. MaassN,HojoT,UedingM,LuttgesJ,KloppelG,JonatW Cancer, 38, 99 – 166. and Nagasaki K. (2001). Clin. Cancer. Res., 7, 812 – 817. Ilg EC, Schafer BW and Heizmann CW. (1996). Int. J. Matsumoto Y, Takano H, Kunishio K, Nagao S and Fojo T. Cancer, 68, 325 – 332. (2001). Jpn. J. Cancer Res., 92, 1133 – 1137. Feng G, Xu X, Youssef EM and Lotan R. (2001). Cancer Nacht M, Dracheva T, Gao Y, Fujii T, Chen Y, Player A, Res., 61, 7999 – 8004. Akmaev V, Cook B, Dufault M, Zhang M, Zhang W, Guo FraireAE,RoggliVL,VollmerRT,GreenbergSD, M, Curran J, Han S, Sidransky D, Buetow K, Madden SL McGavran MH, Spjut HJ and Yesner R. (1987). Cancer, and Jen J. (2001). Proc. Natl. Acad. Sci., 98, 15203 – 15208. 60, 370 – 375. PabonC,ModrusanZ,RuvoloMV,ColemanIM,DanielS, Folkman J, Hahnfeldt P and Hlatky L. (2000). Nat. Rev. Yue H and Arnold Jr LJ. (2001). Biotechniques, 31, 874 – Mol. Cell Biol., 1, 76 – 79. 879. Garber ME, Troyanskaya OG, Schluens K, Petersen S, Seftor RE, Seftor EA, Sheng S, Pemberton PA, Sager R and Thaesler Z, Pacyna-Gengelbach M, van de Rijn M, Rosen Hendrix MJ. (1998). Cancer Res., 58, 5681 – 5685. GD, Perou CM, Whyte RI, Altman RB, Brown PO, SundayME,HaleyKJ,SikorskiK,GrahamSA,Emanuel Botstein D and Petersen I. (2001). Proc. Natl. Acad. Sci. RL, Zhang F, Mu Q, Shahsafaei A and Hatzis D. (1999). USA, 98, 13784 – 13789. Oncogene, 18, 4336 – 4347. Giordano TJ, Shedden KA, Schwartz DR, Kuick R, Taylor Thurlbeck WM and Churg AM. (eds) (1995). Pathology of JM, Lee N, Misek DE, Greenson JK, Kardia SL, Beer DG, the Lung 2nd Edn. Thieme Medical Publishers, Inc: New RennertG,ChoKR,GruberSB,FearonERandHanash York, p 445. S. (2001). Am. J. Pathol., 159, 1231 – 1238. Topcu Z. (2001). J. Clin. Pharm. Ther., 26, 405 – 416. Hayashi K, Matsuda S, Machida K, Yamamoto T, Fukuda Umekita Y, Ohi Y, Sagara Y and Yoshida H. (2002). Int. J. Y, Nimura Y, Hayakawa T and Hamaguchi M. (2001). Cancer, 100, 452 – 455. Cancer Res., 61, 2361 – 2364. Welsh JB, Zarrinkar PP, Sapinoso LM, Kern SG, Behling Heighway J, Betticher DC, Knapp T and Hoban PR. (2003). CA, Monk BJ, Lockhart DJ, Burger RA and Hampton Lung Cancer. Vol2,BDriscoll(ed.).Methods in Molecular GM. (2001). Proc. Natl. Acad. Sci. USA, 98, 1176 – 1181. Medicine Humana Press, New Jersey. Wicki R, Franz C, Scholl FA, Heizmann CW and Schafer Hough CD, Cho KR, Zonderman AB, Schwartz DR and BW. (1997). Cell Calcium, 22, 243 – 254. Morin PJ. (2001). Cancer Res., 61, 3869 – 3876. YueH,EastmanPS,WangBB,MinorJ,DoctoleroMH, Hsieh JC, Kodjabachian L, Rebbert ML, Rattner A, Nuttall RL, Stack R, Becker JW, Montgomery JR, Vainer Smallwood PM, Samos CH, Nusse R, Dawid IB and M and Johnston R. (2001). Nucleic Acids Res., 29, E41. Nathans J. (1999). Nature, 398, 431 – 436. Zhu Y, Spitz MR, Lei L, Mills GB and Wu X. (2001). Cancer Joos L, He JQ, Shepherdson MB, Connett JE, Anthonisen Res., 61, 7825 – 7829. NR, Pare PD and Sandford AJ. (2002). Hum. Mol. Genet., 11, 569 – 576. Kim JH, Skates SJ, Uede T, Wong KK, Schorge JO, Feltmate CM, Berkowitz RS, Cramer DW and Mok SC. (2002). JAMA, 287, 1671 – 1679.

Oncogene