Oncogene (2006) 25, 1821–1831
© 2006 Nature Publishing Group. All rights reserved 0950-9232/06 $30.00
www.nature.com/onc

ONCOGENOMICS

Head and neck squamous cell carcinoma transcriptome analysis by comprehensive validated differential display

A Carles1,4, R Millon2,4, A Cromer1,4,5, G Ganguli1, F Lemaire1,6, J Young1,7, C Wasylyk1, D Muller2, I Schultz2, Y Rabouel1,2, D Dembélé1, C Zhao1,8, P Marchal1, C Ducray3, L Bracco3, J Abecassis2, O Poch1 and B Wasylyk1

1Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, Illkirch Cedex, France; 2Laboratoire de Biologie Tumorale, UPRES EA 34-30, Centre Paul Strauss, Strasbourg, France and 3Exonhit Therapeutics, Paris, France

Head and neck squamous cell carcinoma (HNSCC) is common worldwide and is associated with a poor rate of survival. Identification of new markers and therapeutic targets, and understanding the complex transformation process, will require a comprehensive description of genome expression that can only be achieved by combining different methodologies. We report here the HNSCC transcriptome that was determined by exhaustive differential display (DD) analysis coupled with validation by different methods on the same patient samples. The resulting 820 nonredundant sequences were analysed by high throughput bioinformatics analysis. Human proteins were identified for 73% (596) of the DD sequences. A large proportion (>50%) of the remaining unassigned sequences match ESTs (expressed sequence tags) from human tumours. For the functionally annotated proteins, there is significant enrichment for relevant biological processes, including cell motility, protein biosynthesis, stress and immune responses, cell death, cell cycle, cell proliferation and/or maintenance and transport. Three of the novel proteins (TMEM16A, PHLDB2 and ARHGAP21) were analysed further to show that they have the potential to be developed as therapeutic targets.
Oncogene (2006) 25, 1821–1831. doi:10.1038/sj.onc.1209203; published online 31 October 2005

Keywords: macroarray; microarray; integrative genomics; bioinformatics

Correspondence: Dr B Wasylyk, Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, 1 rue Laurent Fries, BP10142, 67404 Illkirch Cedex, France.
E-mail: [email protected]
4These authors contributed equally to this work.
5Current address: Department of Cell Biology, Harvard Medical School, 240 Longwood Avenue, Boston, MA 02115, USA.
6Current address: CEA Grenoble-DRDC, Laboratoire BioPuces, 17 rue des Martyrs, 38054 Grenoble Cedex 9, France.
7Current address: MIMR, Monash University, Clayton 3168, Australia.
8Current address: The Hospital for Sick Children, Dept Genetics & Genomic Biology, 555 University Avenue, Toronto, Ontario, Canada M5G 1X8.
Received 1 August 2005; revised 13 September 2005; accepted 19 September 2005; published online 31 October 2005

Introduction

Head and neck squamous cell carcinoma (HNSCC) is common worldwide and has a poor rate of survival. It is the fifth most frequent cancer in men, with an incidence of about 780 000 new cases per year in the world. There is a need for a better understanding of HNSCC, for the development of rational targeted interventions and to define new prognostic or diagnostic markers (Hasina and Lingen, 2004). The need to comprehensively describe expression patterns has engendered the emergence of large-scale technologies. Comprehensive transcriptome analysis is widely applied in cancer research (Liotta and Petricoin, 2000; Liang and Pardee, 2003). Various methodologies have been used and each one has its advantages and limitations (Stein and Liang, 2002; Ding and Cantor, 2004). Combining different approaches is important for a complete description of cancer transcriptomes. We report here a large-scale analysis of HNSCC by differential display (DD) and compare it with macro- and micro-array (µA) studies on the same samples.

DD (Liang and Pardee, 1992) is a powerful, simple and widely used technology. It combines three common techniques: reverse transcription (RT), polymerase chain reaction (PCR) and polyacrylamide gel electrophoresis, and detects differences in gene expression patterns between two or more samples. The strength of DD is its ability to identify novel genes, whereas a limitation is the frequency of false positives requiring time-consuming validation methods. µAs generate a large amount of data quickly, and they have become a standard tool. They give numerical estimates of gene expression levels for hundreds to thousands of sequences spotted on various supports. However, the data obtained depend on the information that was used to generate the arrays, and there are problems of sensitivity, reproducibility and specificity for homologous sequences (Stein and Liang, 2002; Liang and Pardee, 2003). As both DD and µAs have limitations, combining the two methodologies is expected to give a more comprehensive description of cancer transcriptomes.

Here, we report the first extensive description of hypopharyngeal carcinoma differential gene expression using large-scale DD combined with reverse Northern and macroarray blot (MAB) validation. We compared the results with a parallel Affymetrix µA study that used the same samples (Cromer et al., 2004). In addition, innovative bioinformatics tools were used to analyse the results. We used our integrative genomics and bioinformatics platform Gscope and three-step protocol (Chalmel F, submitted) in order to assign computationally cross-validated human protein sequences to each DD nucleotide sequence. We evaluated the concordance between the gene expression profiles generated by the different technologies. The large sets of genes that were shown to be differentially expressed were functionally analysed, in order to describe biologically relevant pathways. Several novel genes were evaluated phenotypically to describe their observable characteristics (such as localization and expression levels), and shown to be candidates for further evaluation as targets to develop therapeutics for cancer.

Figure 1 Overviews outlining the DD protocol and the bioinformatics sequence analysis of the 2467 sequences. The pilot study has been reported (Lemaire et al., 2003). Flowchart steps: sample preparation (tumours identified and characterised by histopathology and patient history; RNA extracted and DNaseI treated); reverse transcription with 3′ primers (3 x HT11 A/G/C); PCR differential display with 3′ primers (3 x HT11 A/G/C) and 5′ primers (58 x HAP primers); 1750 bands isolated; 14000 clones tested; selection by reverse Northern hybridisation (RNH); validation and analysis. Pilot study: RNH using mixed probes, ~200 clones, 70 unique sequences. This study: RNH using focussed probes, 2467 clones sequenced, minus 30 non-relevant (bacteria, cloning vectors), 304 poor quality or unusable (repeats...), 133 too short (fewer than 25 bases) and 1180 redundant sequences, leaving 820 unique sequences.

Results

PCR-DD and validation by reverse Northern hybridization
A pilot study was previously reported that described 70 sequences obtained from a PCR-DD comparison of the transcriptomes of three stages of HNSCC and corresponding normal tissue (N) (Figure 1) (Lemaire et al., 2003). Sequences from differentially displayed bands were selected and validated by reverse Northern hybridization (RNH) using ‘mixed’ probes, consisting of the particular combination of HAP primers used for the DD (Figure 1). As this approach selected few clones, we investigated different methods to prepare the probes, finally resorting to ‘focused’ probes prepared with each specific HAP primer, which improved the hybridization signals and thereby allowed us to select and validate a more comprehensive set of sequences. We now report a large-scale study, in which a total of 2467 clones were selected by RNH (choosing where possible two clones from each band), and sequenced (Figure 1). The sequences were analysed by high throughput automatic analysis (see Materials and methods).

Of the 2467 sequences, 467 were immediately excluded for different reasons. A total of 30 sequences were non-human (bacteria, plasmids, etc.), 304 were not analysable because of their bad quality or the presence of repeats (e.g. ALUs, MIRs, L3, MER47A) and 133 were too short (less than 25 bases). The remaining 2000 sequences corresponded to 820 differentially expressed nonredundant (NR) genes. Most of the redundancy resulted from sequencing two or more clones from the same DD band. The DD conditions were sufficiently selective that particular transcripts were rarely represented by different bands. The final NR list contains 432 sequences that are overexpressed in tumours, 349 that are underexpressed, and 39 that have more complex profiles, including different profiles between tumour types.

MAB analysis with a larger number of patient samples
In order to validate the analysis by a different approach, and to extend it to more patients, cDNAs were spotted on MABs. The MABs were hybridized in triplicate with 50 different labelled cDNAs, prepared from 30 tumours (T) and 20 matched N tissues. The hybridization signals were low (near background) for 77% of the spots, medium for 20% and high for 3%. The high proportion of low hybridization signals is expected, as low abundance transcripts constitute a large proportion of the overall mRNA population. Differentially expressed sequences were identified by statistical analysis of all of the signals (to minimize overlooking relevant sequences). A total of 163 clones showed different expression between T and N samples by the Student’s t-test, 133 by the Wilcoxon signed rank test and 125 by significance analysis of microarrays (SAM) (Tusher et al., 2001). In all, 97 clones were selected by all three tests, corresponding to 14 NR genes that were upregulated in tumours and 35 that were downregulated (Table 1; Supplementary Table 1 and data not shown).

T/N ratios of expression were calculated for the 347 spots with medium or high hybridization signals. These ratios were compared with the DD gel profiles for the samples analysed by both techniques (Figure 2a). The MAB and DD profiles are similar for 64% of the clones. When low MAB ratios were excluded from the comparison, the correlation improved to 85% for ratios above 1.2, and finally attained 100% (Figure 2b), showing that there is a good concordance between the two techniques. However, the low sensitivity of MABs restricts the comparison to only some of the DD sequences.

Comparison between the large-scale DD and Affymetrix µAs
The patient samples used for the DD have also been analysed by Affymetrix HG-U95A µAs (Cromer et al., 2004). We compared the expression profiles of sequences that were analysed by both approaches. Using Blast, we searched for the 820 NR DD sequences among the 12 448 target sequences (probe sets) on the µA. We obtained 331 significant matches, which represents more than one-third of the DD sequences. We excluded 52 genes that gave low signals on the µA (detection call was absent or marginal) and 11 that had a complex DD profile. We found a good correlation (68.7%, 184/268) between the DD and µA expression profiles (Figure 2c). The correlation was not increased by choosing increasing cutoffs for the µA hybridization signal intensity (200, 650, 1000; 68.7, 68.1, 67.8%, respectively). However, using cutoffs for the T/N µA ratios, the correlation gradually increased to 100% (Figure 2d). µA ratios may only be significant if they are at least 2- to 3-fold (Draghici et al., 2003). With cutoffs of 2 or 3, and a DD cutoff of 1, the correlations reached 83% (49/59) and 92% (33/36 genes), respectively.

Further analysis of three selected DD clones
Three sequences from the DD analysis were chosen for further analysis because they had not been described in the literature at the time of their discovery and had characteristics that might be useful for therapeutic intervention, such as expression on the cell surface (TMEM16A) and links to pathways already used for drug discovery (PI3-kinase, PHLDB2; Rho-GTPases, ARHGAP21; see Discussion). They were analysed by quantitative RT–PCR (RT–QPCR) to reconfirm their expression pattern and to extend the analysis to more patients. They were also analysed by Northern blots to establish the size of the transcripts, and antibodies were developed to analyse expression at the protein level by immunohistochemistry (IHC). TMEM16A (Transmembrane protein 16A), a putative receptor (Katoh, 2003), has been given various names: FLJ10261 (Katoh, 2003), TAOS2 (Tumour Amplified and Overexpressed Sequence 1) (Huang et al., 2002), ORAOV2 (ORAl cancer OVerexpressed 2) (Strausberg et al., 2002) and DOG-1 (Discovered On GIST-1) (West et al., 2004). The three candidates were isolated from bands that were more intense in tumour lanes on the DD gels (Figure 3, upper panels). In the DD overall, there was one series of bands for TMEM16A, three for PHLDB2 and two for ARHGAP21. One of the bands for ARHGAP21 (indicated by an asterisk) was shown by RNH to correspond to two differentially expressed sequences with opposite profiles. Using RT–QPCR with 63 hypopharyngeal tumours for which there were 42 matched N tissues (Figure 3a, upper middle panel), TMEM16A RNA was found to be overexpressed in a very high proportion of tumours (84%), with a large average 17-fold increase in expression. On Northern blots, we detected the predicted major transcript migrating around 4.5 kb, which was more intense in the tumour samples from all four patients (lower middle panel). We produced antibodies against the protein (Materials and methods) and used them for IHC. TMEM16A was detected distinctly in the cytoplasm of neoplastic cells of HNSCC tumours, and diffusely in the basal layer of the epithelium of N tissue (uvula). These antibodies stained GIST samples strongly and homogeneously (data not shown), in agreement with a recent report (West et al., 2004).

PHLDB2 codes for a 160 kDa protein with a unique spectrin repeat and a C-terminal Pleckstrin homology domain (Paranavitane et al., 2003). Using RT–QPCR, we found that PHLDB2 is overexpressed in 10/12 HNSCC (six-fold on average, Figure 3b). On Northern blots, we detected a predominant band migrating around 6 kb that is more intense in four of the tumours compared to normal. There was no detectable signal in tumour and normal RNA from another patient. By IHC with antibodies developed against the central part of PHLDB2 (amino-acids 642–660), we detected cytoplasmic staining that was specific for the neoplastic component of tumours, whereas there was no detectable staining in normal uvula epithelium.

ARHGAP21 (previously called ARHGAP10 (see Genecards, http://www.genecards.org)) maps to chromosome 10p12.32 and codes for a member of the Rho-GAP family (Basseres et al., 2002). By RT–QPCR in a panel of 23 hypopharyngeal carcinomas and their matched N tissues (Figure 3c), we found higher levels in most tumour samples (91%) compared to the N tissue samples, with increases ranging from 1.5- to 5-fold (average 2.4). Using Northern blots, a major band was detected migrating around 7.5 kb. Differences in band intensity could not be reliably deduced from the Northern blots because of the weak signals. By IHC, using an antibody against a central part of the protein (amino-acids 383–400), we clearly detected weak cytoplasmic staining in the neoplastic cells of the tumour, and no detectable staining of the epithelium from normal uvula (Figure 3c).
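The concordance percentages reported in the Results for the comparison of DD with array data (for example 184/268 genes, then 49/59 and 33/36 with T/N ratio cutoffs of 2 and 3) come down to counting genes whose direction of change agrees between two techniques after discarding small array ratios. The following is a minimal Python sketch of that counting, not the authors' actual procedure. The function name `concordance`, the toy data and the signed fold-change convention (positive for overexpression in T, negative for underexpression, as on the axes of Figure 2) are assumptions made for illustration.

```python
def concordance(pairs, ratio_cutoff=1.0):
    """Count genes whose DD direction agrees with the sign of the array ratio.

    pairs: iterable of (dd_direction, array_fold_change), where dd_direction
    is +1 (more intense in tumour) or -1 (less intense in tumour) and
    array_fold_change is a signed T/N fold change (assumed convention).
    Genes with |array_fold_change| below ratio_cutoff are ignored.
    Returns (concordant, compared, percent).
    """
    kept = [(dd, fc) for dd, fc in pairs if abs(fc) >= ratio_cutoff]
    agree = sum(1 for dd, fc in kept if (dd > 0) == (fc > 0))
    percent = 100.0 * agree / len(kept) if kept else float("nan")
    return agree, len(kept), percent


# Hypothetical toy data, NOT values from the study.
toy_pairs = [(+1, 3.2), (+1, -1.1), (-1, -2.5), (-1, 1.3), (+1, 1.05), (-1, -4.0)]

for cutoff in (1.0, 2.0):
    agree, total, pct = concordance(toy_pairs, cutoff)
    print(f"cutoff {cutoff}: {agree}/{total} concordant ({pct:.1f}%)")

# Counts quoted in the text, converted to percentages as a sanity check.
reported = {"no ratio cutoff": (184, 268), "cutoff 2": (49, 59), "cutoff 3": (33, 36)}
for label, (agree, total) in reported.items():
    print(f"{label}: {100.0 * agree / total:.1f}%")
```

Raising `ratio_cutoff` shrinks the set of compared genes, which is why the text gives the number of genes compared alongside each percentage.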

Table 1 Overlapping genes that were selected by more than one profiling method

Acc number | Description | Function (GO)

(a) Overlaps DD–MAB–µA
BC013566 | Aquaporin 3 | GO:0005215 transporter activity
X14420 | Collagen alpha 1(III) chain precursor | GO:0005201 extracellular matrix structural constituent
Y07909 | Epithelial membrane protein-1 (EMP-1) | GO:0008283 cell proliferation
M60751 | Histone H2B.b (H2B/b) | GO:0003677 DNA-binding activity
U07643 | Lactotransferrin precursor (Lactoferrin) | GO:0004252 serine-type endopeptidase activity
AY358653 | Lysozyme (LLAL424) | GO:0003796 lysozyme activity
D87953 | NDRG1 protein (N-myc downstream regulated gene 1 protein) | GO:0003824 enzyme activity
BC058035 | Proline-rich protein 4 precursor (Lacrimal proline-rich protein) | GO:0007582 physiological process

(b) Overlaps DD–MAB
AJ003125 | ADAMTS-2 precursor (EC 3.4.24.14) | GO:0008133 collagenase activity
AF213467 | Bromodomain adjacent to zinc-finger domain protein 1A (ATP-utilizing chromatin assembly and remodeling factor 1) | GO:0003677 DNA-binding activity
X06234 | Calgranulin A (migration inhibitory factor-related protein 8) | GO:0005509 calcium ion binding activity
Y13248 | C-X-C chemokine receptor type 6 (CXC-R6) | GO:0015026 coreceptor activity
M28016 | Cytochrome b | GO:0008121 ubiquinol-cytochrome c reductase activity
AF154415 | FLASH | GO:0005123 death receptor binding
AF272149 | Hepatocellular carcinoma associated protein TB6 | GO:0004872 receptor activity
AF247704 | Homeobox protein Nkx-3.1 | GO:0003700 transcription factor activity
AY331186 | Hypothetical protein DKFZp566N1646 (NPF/calponin-like protein) | GO:0004840 ubiquitin conjugating enzyme activity
BX537945 | Hypothetical protein DKFZp686L1818 | GO:0004263 chymotrypsin activity
AK172819 | Hypothetical protein FLJ23980 (OTTHUMP00000030611) | GO:0008289 lipid binding activity
AK098486 | Hypothetical protein FLJ25620 | GO:0016021 integral to membrane
BC030814 | IGKV1-5 protein | GO:0003823 antigen binding
IGJ_HUMAN | Immunoglobulin J chain | GO:0003823 antigen binding activity
BC073786 | Immunoglobulin lambda constant 1 (Mcg marker) | GO:0004812 tRNA ligase activity
BC030984 | Immunoglobulin lambda variable 3–21 | GO:0003823 antigen binding
X02490 | Interferon-inducible mRNA (cDNA 1-8) | GO:0005057 receptor signaling protein activity
X14640 | Keratin, type I cytoskeletal 13 (Cytokeratin 13) | GO:0005198 structural molecule activity
AY358415 | KMQK697 |
BC005272 | Matrix Gla-protein precursor (MGP) | GO:0005509 calcium ion-binding activity
NU1M_HUMAN | NADH-ubiquinone oxidoreductase chain 1 (EC 1.6.5.3) | GO:0008137 NADH dehydrogenase ubiquinone activity
AB060339 | Ras and Rab interactor 2 (Ras interaction/interference protein 2) | GO:0005096 GTPase activator activity
AF480466 | Rho-GTPase activating protein 10 | GO:0005515 protein-binding activity
BC021886 | RPL27 protein | GO:0003735 structural constituent of ribosome
K03202 | Salivary acidic proline-rich phosphoprotein 1/2 precursor (PRP-1/PRP-3) | GO:0005615 extracellular space
BC017802 | Small proline-rich protein 3 (Cornifin beta) | GO:0005198 structural molecule activity
U40705 | Telomeric repeat binding factor 1 (TTAGGG repeat binding factor 1) | GO:0003677 DNA-binding activity
BC022436 | TPT1 protein (tumour protein, translationally controlled 1) | GO:0005554 molecular_function unknown
BC030807 | Tumour-related protein | GO:0005509 calcium ion binding activity

(c) Overlaps DD–µA
BC065937 | 5′ nucleotidase, ecto | GO:0017175 IMP-GMP specific 5-nucleotidase activity
BC015766 | Alpha-actinin 1 (alpha-actinin cytoskeletal isoform) | GO:0003779 actin binding activity
BC002988 | ARP2/3 complex 41 kDa subunit (P41-ARC) | GO:0003779 actin binding activity
X63629 | Cadherin-3 precursor (placental-cadherin) | GO:0005509 calcium ion binding activity
X06661 | Calbindin (vitamin D-dependent calcium-binding protein, avian-type) | GO:0005509 calcium ion binding activity
BC005269 | Cardiac phospholamban (PLB) | GO:0005246 calcium channel regulator activity
AF035408 | Cartilage intermediate layer protein | GO:0004721 protein phosphatase activity
BC027963 | Coagulation factor XIII A chain precursor (EC 2.3.2.13) | GO:0003810 protein-glutamine gamma-glutamyltransferase activity
BC063851 | Complement component 7 | GO:0005509 calcium ion binding activity
AB015628 | C-type lectin superfamily member 2 (activation-induced C-type lectin) | GO:0005529 sugar binding activity
L06797 | C-X-C chemokine receptor type 4 (CXC-R4) | GO:0015026 coreceptor activity
BC063291 | Desmocollin 2A/2B precursor (desmosomal glycoprotein II and III) (desmocollin-3) | GO:0005509 calcium ion binding activity
AF126110 | Fibulin-1 precursor | GO:0005509 calcium ion binding activity
U27328 | Galactoside 3(4)-L-fucosyltransferase (EC 2.4.1.65) | GO:0017060 3-galactosyl-N-acetylglucosaminide 4-alpha-L-fucosyltransferase activity
BC004141 | Glycine amidinotransferase, mitochondrial precursor (EC 2.1.4.1) | GO:0015068 glycine amidinotransferase activity
BC007075 | Hemoglobin beta chain | GO:0005344 oxygen transporter activity
AF492675 | Homo sapiens (lung cancer-associated Y protein) | GO:0003700 transcription factor activity
BX510904 | Hypothetical protein DKFZp451A123 | GO:0000146 microfilament motor activity
BC032839 | Interferon-induced protein with tetratricopeptide repeats 2 | GO:0006955 immune response
BC000897 | Interferon-induced transmembrane protein 1 (interferon-induced protein 17) | GO:0005057 receptor signaling protein activity
BC006794 | Interferon-induced transmembrane protein 3 (interferon-inducible protein 1-8U) | GO:0005057 receptor signaling protein activity
M17017 | Interleukin-8 precursor (IL-8) | GO:0005153 interleukin-8 receptor ligand activity
BC022068 | Kallikrein 11 precursor (EC 3.4.21.-) | GO:0004263 chymotrypsin activity
AF279865 | Kinesin-like protein KIF13B (kinesin-like protein GAKIN) | GO:0003777 microtubule motor activity
BC011704 | Midkine precursor (MK) | GO:0008083 growth factor activity
AL162079 | Monocarboxylate transporter 1 (MCT 1) | GO:0015130 mevalonate transporter activity
BC039612 | Myosin XVIIIA (myosin 18A) | GO:0003774 motor activity
BC002601 | NF-kappaB inhibitor alpha (major histocompatibility complex enhancer-binding protein MAD3) | GO:0008134 transcription factor binding activity
AF272036 | OTTHUMP00000016853 (Rag D) | GO:0003925 small monomeric GTPase activity
M16006 | Plasminogen activator inhibitor-1 precursor (PAI-1) | GO:0004867 serine protease inhibitor activity
L13385 | Platelet-activating factor acetylhydrolase IB alpha subunit (PAF acetylhydrolase 45 kDa subunit) | GO:0016787 hydrolase activity
U03891 | Phorbolin 1 | GO:0008270 zinc ion binding activity
U90441 | Prolyl 4-hydroxylase alpha-2 subunit precursor (EC 1.14.11.2) | GO:0004656 procollagen-proline 2-oxoglutarate-4-dioxygenase activity
AF290615 | Protein C1orf8 precursor (liver membrane-bound protein) | GO:0016021 integral to membrane
U88048 | Putative protein KiSS-16 |
BC006992 | Rad51-interacting protein (RAD51 associated protein 1) | GO:0003690 double-stranded DNA-binding activity
AF115926 | Secreted cement gland protein XAG-2 homolog (Hypothetical protein AGR2) |
S95936 | Serotransferrin precursor (transferrin) | GO:0008199 ferric iron-binding activity
X72755 | Small inducible cytokine B9 precursor (CXCL9) | GO:0008009 chemokine activity
U30246 | Solute carrier family 12 member 2 (bumetanide-sensitive sodium-(potassium)-chloride cotransporter 1) | GO:0005279 amino-acid-polyamine transporter activity
U25997 | Stanniocalcin 1 precursor (STC-1) | GO:0005179 hormone activity
L25444 | Transcription initiation factor TFIID subunit 6 | GO:0003677 DNA-binding activity
X01060 | Transferrin receptor protein 1 (TfR1) | GO:0004998 transferrin receptor activity
BC059385 | Ubiquitin-like protein 3 (HCG-1 protein) | GO:0006464 protein modification
X51675 | Urokinase plasminogen activator surface receptor precursor (uPAR) | GO:0030377 U-plasminogen activator receptor activity

Accession number, description and selected GO function. For further information see Supplementary Table 1.
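The three sections of Table 1 can be read as set operations on the lists of genes selected by each method. The short Python sketch below illustrates that reading and is not a reconstruction of the authors' pipeline. The three input sets are tiny illustrative subsets built from accession numbers that appear in Table 1, the function name `overlap_sections` is hypothetical, and it is an assumption that sections (b) and (c) exclude genes already listed in section (a).

```python
def overlap_sections(dd, mab, microarray):
    """Split accession numbers into the three overlap categories of Table 1.

    Assumes (b) and (c) exclude genes already found by all three methods.
    """
    return {
        "DD-MAB-microarray": dd & mab & microarray,
        "DD-MAB": (dd & mab) - microarray,
        "DD-microarray": (dd & microarray) - mab,
    }


# Illustrative subsets only; the real selections are far larger.
dd_selected = {"BC013566", "X14420", "X06234", "Y13248", "M17017", "BC011704"}
mab_selected = {"BC013566", "X14420", "X06234", "Y13248"}
microarray_selected = {"BC013566", "X14420", "M17017", "BC011704"}

sections = overlap_sections(dd_selected, mab_selected, microarray_selected)
for name, accessions in sections.items():
    print(name, sorted(accessions))
```

Keeping the categories mutually exclusive means each accession number is reported once, which matches how the table lists each gene under a single heading.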

These results, obtained for three genes that were not analysis included unambiguous assignment to complete cross-verified on MAB or mA, further confirm the DD transcripts (due to their location in the 30- untranslated analysis and the relevance of this approach. They regions of transcripts), and elimination of redundancy suggest that most of the transcripts that are over- (Broude, 2002). We used Gscope, our integrative expressed in tumours are derived from the neoplastic genomics and bioinformatics platform, and applied component, as expected from the high proportion our three-step protocol (Chalmel F, submitted), in (>70%) of transformed cells in the tumours used for order to assign a computationally cross-validated the DD analysis. human protein sequence (see Materials and methods) to each DD nucleotide sequence. We identified a human protein for 73% (596/820) of the DD sequences, Bioinformatics analysis of the sequences identified by DD of which 532 (89%) were annotated and 64 did The 820 NR sequences were analysed by high through- not have any GO, KEGG or domain annotations put bioinformatics. Complications associated with the (Supplementary Table 1). The majority (36/64) of the

Oncogene HNSCC transcriptome A Carles et al 1826 ac 8 8 6 6 4 4 2 2 0 0 -2 -2 -4 -4 -6 -6 DD differential expression T-N -8 DD differential expression T-N -8 -10 -10 -10 -6-5-4-3-2-101234 -100 -20 -10 -5 0 5 10 40 MAB differential expression T/N µA differential expression T/N b Number of clones 263 150 76 58 45 19 d Number of genes 100% 189 59 36 2011 9 100%

90% 90%

80% 80%

70% 70% A profiles concordance 60% µ 60% DD/MAB profiles concordance DD/

50% 50% 1 1.52 2.5 10 1 2 3 5 7 9 11 13 100 MAB differential expression T/N (absolute value) µA differential expression T/N (absolute value) Figure 2 Concordance between the DD expression profiles and the MAB (a, b) and mA(c, d) T/N fold changes. (a, c) The MAB (a) and mA(c) fold changes in expression (T/N) are represented on the X-axis and the DD differential expression (T/N) on the Y-axis. Positive values indicate overexpression in T, negative values underexpression. Concordant results are found in the top right and bottom left segments. There are 268 genes in common between DD and mAs. (b, d) Increasing DD–MAB (b) and DD–mA(d) concordance with increasing expression ratios (T/N).

unannotated proteins were hypothetical. Significantly, sequences that overlapped between the DD and either many of these hypothetical proteins (28/36) had the MABs (45 sequences) or the mAs (59 sequences), human ESTs from T tissues. Of the 248 sequences (5/45, Z-score ¼ 5.5 and 8/59 Z-score ¼ 7, respectively). of the total list that did not correspond to cross- validated proteins, about half had human ESTs and Discussion half do not. We analysed the 596 differentially expressed seque- Methods and strategy nces corresponding to cross-validated proteins by We report here the first exhaustive hypopharyngeal computational functional analysis, using the program carcinoma differential gene expression analysis, invol- GOAnno that searches for GO annotation enrichment ving an extensive DD analysis, coupled with RNH compared to the whole human proteome (GOAnno; validation with focused probes and revalidation with Chalmel et al., 2005). We obtained eight main enrich- MAB and Affymetrix mAs, using the same samples. The ments in biological processes (Table 2), including cell first step was a large scale analysis by DD that motility, protein biosynthesis, response to stress, im- theoretically covers 90% of expressed sequences (three mune response, cell death, cell cycle, cell proliferation 30 primers (HT11A/G/C) and 58 50 primers (HAP1–10, and/or maintenance, and transport. The same biological 33–80, (Liang, 1998)). All the clones were selected for processes were enriched in the mA analysis (Cromer differential expression between pools of T and N et al., 2004; and data not shown). These are all samples by RNH. This procedure allowed us to important biological processes that are known to be eliminate parasite sequences that comigrate during DD involved in tumorigenesis. Interestingly, there were also electrophoresis, and the use of ‘focused’ probes im- strong enrichments for the response to stress for proved the sensitivity, with higher hybridization levels

Oncogene HNSCC transcriptome A Carles et al 1827

Figure 3 Analysis of TMEM16A (a), PHLDB2 (b) and ARHGAP21 (c). Top panels: DD profiles. The overall DD differential expression value was estimated from the relative differences in intensity of the bands for each sample. In the DD overall, there was one series of bands for TMEM16A, three for PHLDB2 and two for ARHGAP21. The asterisks indicates a case where more than one differentially expressed sequence comigrating in the band was identified by RN and independently validated by virtual Northern (data not shown). Second row of panels: RT–QPCR values are calculated in relative units adjusted to the RPLP0 (Ribosomal protein, large P0) internal control and to median values of normal samples. The mean values7standard deviations are indicated on the graph. Statistical analysis between T and N samples was carried out by Student’s t-statistical tests. Third row of panels: Northern blots, the bars indicate tumour (T), invaded node (L) and normal (N) samples from the same patients. Top: specific probe, bottom: RPLP0 loading controls. Bottom panels: immunohistochemical staining of T and N samples. Arrowheads indicate cells stained with the specific antibodies. than linear probes (Trenkle et al., 1999). The collection PCR-DD maximizes the detection of ‘novel’ sequences. of 820 NR sequences that are differentially expressed We found 248 NR sequences that apparently do not between tumour and normal samples is a high quality correspond to known proteins, and about half of them and unique description of the HNSCC transcriptome. do not match ESTs. Moreover, we found that roughly There are various advantages of using PCR-DD half of these ‘unknown genes’ (131/266) are not compared to DNA arrays (Broude, 2002; Liang, 2002). represented on the latest version of Human Affymetrix

Oncogene HNSCC transcriptome A Carles et al 1828 Table 2 GO biological processes that are enriched compared to the proteome Enriched GO DD (532 proteins) DD–MAB selection (45 proteins) DD–mA FC >2 (59 proteins)

NZ-score NZ-score NZ-score

GO:0006928 cell motility 20 7.9 — — 5 7.0 GO:0006412 protein biosynthesis 42 7.4 — — — — GO:0006950 response to stress 27 5.6 5 5.5 8 7.0 GO:0006955 immune response 10 4.6 — — 13 3.2 GO:0008219 cell death 19 4.1 — — — — GO:0007049 cell cycle 24 3.0 — — — — GO:0008283 cell proliferation 34 3.6 — — 5 2.2 GO:0006810 transport 74 2.7 — — 12 2.7

N: number of genes. FC: fold change.

μA (HG-U133_Plus_2_target downloaded at http://www.affymetrix.com). Another strength of the DD approach is that it picks up genes that are not analysable by arrays, either because they are expressed at low levels or because they are not represented. Our PCR-DD analysis identified transcripts that could not be analysed with MABs and μAs. Only between 25% (347 clones) and 32% (3962 probe sets) gave high-quality hybridization signals with MABs or μAs, respectively. However, the good correlation (up to 100% with appropriate cut-offs) between the differential expression profile obtained by the DD protocol and these high throughput technologies confirms that our procedure has a high specificity. It is worth noting that there are 331 significant matches between our DD-selected sequences and the Affymetrix probe sets. This overlap represents more than one-third of the DD sequences, which is comparable to the 12 448 μA sequences representing more than one-third of the estimated human transcriptome (IHGSC, 2004).

There are other studies combining DD with array analysis that have experienced the same lack of sensitivity of arrays. Martin and collaborators (Martin et al., 2001) found that less than 10% (12/170) of the DD-isolated sequences spotted on a membrane gave useful results upon further analysis of additional patient samples. Several studies using PCR-DD and μAs as complementary approaches detected few genes (0–10) by both approaches (Outinen et al., 1998; Wells, 1999; Cirelli and Tononi, 2000; Oetting, 2000; Heilig and Sommer, 2004; Pascal et al., 2005). Our large-scale study permits a more extensive comparison of PCR-DD and arrays, and shows that the correlation is good if certain reasonable selection criteria are used.

Genes and processes
Our study provides a list of genes differentially expressed between normal and cancer cells, among which the targets of oncogenesis and potential therapeutic targets need to be identified. The analysis indicates that there are enrichments for increased metabolism, cell density and stress-associated genes, raising the possibility that they are secondary to the key events associated with oncogenesis. However, some of the stress-associated genes that we found are already validated targets, such as heat shock proteins. The stress response may be required for cells to tolerate the genetic disarray characteristic of malignant transformation (Whitesell et al., 2003). Some of the stress-associated genes are also involved in biological processes such as regulation of cell cycle (GO:0000074), anti-apoptosis (GO:0006916) and activation of MAPK (GO:0000187). Among the genes involved in metabolism, the thioredoxin-related transmembrane protein (TXNDC) is overexpressed in tumours. It belongs to the thioredoxin–thioredoxin reductase system known to be involved in oncogenesis and tumorigenesis and to be a potential target for anticancer therapy for a wide range of human tumours (Lincoln et al., 2003). Further phenotypic and functional analysis is required to define the most promising therapeutic targets.

We chose three genes (TMEM16A, PHLDB2 and ARHGAP21) for further analysis using complementary approaches on a larger number of patient samples. We confirmed that they are overexpressed in HNSCC compared to N tissue by quantitative RT–PCR, determined the size of the transcripts by Northern blotting and showed that the proteins are expressed in tumour cells by immunohistochemistry (see Results). The TMEM16A gene is located in the CCND1-EMS1 region (11q13), which is frequently amplified in HNSCC (Muller et al., 1997) and is enriched in genes that are differentially expressed in hypopharynx carcinoma (Cromer et al., 2004). TMEM16A is predicted to be associated with the cell membrane. PHLDB2 is a 160 kDa protein that has, in particular, a C-terminal Pleckstrin homology domain that binds phosphatidylinositol (3,4,5)-triphosphates (Dowler et al., 2000; Paranavitane et al., 2003). The overexpressed sequence we detected lacks exon 25 (data not shown). The PHLDB2 gene is located in the 3q13 locus, which is frequently amplified in nasopharyngeal cancer (Huang and Yao, 2004). Therapeutic strategies are being evaluated for the PI3 kinase pathway (Dancey, 2004), raising the possibility that PHLDB2's PH domain might be a good candidate for targeted strategy development. ARHGAP21 is widely expressed and codes for a Rho-GAP family protein (Basseres et al., 2002). The ARHGAP21 gene is highly expressed in the muscle and brain, which are highly differentiated tissues, in agreement with the hypothesis that ARHGAP21 is important for cell differentiation (Basseres et al., 2002). A bioinformatics

analysis has identified two isoforms that are produced by alternative splicing of the human ARHGAP21 gene (Katoh, 2004). We detected several splice variants (lacking exon 8, or both 7 and 8) in tumour RNA (data not shown), which could be important for tumorigenesis. Modulation of Rho GTPase activity by ARHGAP21 may be a target for therapeutic intervention. However, it should be stressed that other criteria could have been used to select genes for further study, such as: biological process, number of bands, high level of hybridization in MAB, chromosomal localization and selection through both DD and another method.

We identified in our pilot study two regions of the genome that are enriched for differentially expressed genes, 1q21 and 12p12–13 (Lemaire et al., 2003). This enrichment is maintained in this study, in terms of both density (1q21 (13 genes) and 12p12–13 (25 genes)) and normalized relative density (1q21 (173.6) and 12p12–13 (165.6); see Cromer et al., 2004). The same regions were identified in the Affymetrix μA study (Cromer et al., 2004).

In summary, the present study reports a unique large-scale description of genes differentially expressed between hypopharynx carcinoma tissues and N tissues. It provides a different perspective of the cancer transcriptome and highlights potential limitations of the current methods used to describe tumour transcriptomes. We also report the phenotypic validation of three novel genes with potential to be developed as therapeutic targets.

Materials and methods

Tissue samples and RNA extraction
All tumour samples and corresponding histologically N tissue were obtained, with informed consent, from 98 patients undergoing surgery for hypopharyngeal tumour resection. None of the patients presented distant metastasis, and surgery was the primary treatment. The set of samples common to DD and μA consisted of four entities including seven patients. One pool consisted of normal tissue (N) and three pools consisted of three categories of tumour tissue (T): early stage (E), no metastatic propensity (NM), with metastatic propensity (M) (Lemaire et al., 2003; Cromer et al., 2004). Complementary sample sets with the same histological characteristics (hypopharyngeal tumour and corresponding normal tissue) were used for MAB (30T: 9E, 10NM, 11M and 20N), Northern blot or RT–PCR validation. Total RNA was extracted by the RNeasy procedure (QIAGEN).

PCR-differential display
The PCR-DD protocol was performed as described previously (Lemaire et al., 2003) following original methods (Liang and Pardee, 1992; Liang, 1998).

Macroarray blots
Production of the cDNA array. MAB were prepared by spotting double dots of cDNAs onto positively charged nylon membranes (Amersham). The 1536 cDNAs were spread over two different nylon membranes. There were 1432 clones from DD corresponding to 664 NR genes, and 104 control probes corresponding to 10 ubiquitous genes and empty plasmid. Cloned DD fragments were amplified by PCR using pGEM primers (M13f and M13r), purified by ethanol precipitation in 96-well plates and checked on agarose gels. PCR products (3 ng) were spotted onto nylon membrane using a Q-Pit spotter (Genetix) and crosslinked to the nylon surface by UV irradiation.

Probe preparation. Total RNA from 50 selected samples (30 tumours and 20 matched normal tissues) was extracted according to the RNeasy procedure (QIAGEN). cDNA probes for each RNA were prepared in triplicate by reverse transcription, amplification by T7 PCR and purification by ethanol precipitation.

Hybridization, detection and analysis. Membranes were prehybridized for 4 h in hybridization buffer at 42°C before adding [α-33P]-dCTP-labelled probes (10^6 cpm, Megaprime random priming kit, Amersham). The membranes were hybridized overnight at 42°C, washed in 2× SSC/0.1% SDS followed by 0.1× SSC/0.1% SDS at 42°C, and dried at room temperature. Membranes were exposed to phosphor-imaging screens for 48 h. The screens were scanned with a Typhoon 8600 scanner (Amersham). The acquired images were analysed and the signal intensities quantified using the Image Quant software (Amersham). The triplicates were normalized by the quantile–quantile normalization method (Bolstad et al., 2003) and then, in each separate experiment, data were normalized according to the median values. To avoid overlooking significantly differentially expressed clones, all measured signals were considered for statistical analysis. Each clone was assigned a T versus N fold change, and clones were identified as statistically differentially expressed by three types of analysis. Student's t-test was performed on log values and Wilcoxon's signed rank test on intensity value ranks. Student's t-test and Wilcoxon's signed rank test were performed separately on the duplicated spots of each clone and independently on the three experiments. Each significant analysis (P<0.05) was scored 1. A differential expression score was calculated for each clone as the sum of the scores. All the clones having a score higher than 3 were selected as statistically differentially expressed. The third analysis, using SAM software (Significance Analysis of Microarray) (Tusher et al., 2001), was performed on the mean values of the three hybridization experiments.

DD sequence analysis
First, the DD sequences were searched for the presence of eukaryotic repeats using RepeatMasker (Smit, 1999) and a genome-based masking protocol (Katsanis et al., 2002). Sequences shorter than 25 nucleotides, which could not be used, were set aside. Blast searches (Altschul et al., 1997) against human sequence databanks (Genbank, Protall, Human ESTs, ) were conducted using a percent identity cutoff of 95% and a match length cutoff equal to 50% of the query length. When the homologous sequences were obvious contaminants (Escherichia coli, Fusobacterium nucleatum, cloning vector), the DD sequences were removed from the analysis. Redundancy was removed at four levels: identical DD nucleotide sequences, overlapping locations on the human genome, the same protein accession numbers and proteins sharing more than 95% identity using Blast. The longest sequence was used to represent each cluster of redundant sequences. Proteins were considered hypothetical if their description included: 'hypothetical', 'predicted', 'similar to', 'KIAA', 'DKFZp', 'FLJ', 'putative'.
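
The two filtering rules stated in the DD sequence analysis (the Blast cutoffs and the keyword test for hypothetical proteins) can be illustrated with a short Python sketch. This is a minimal illustration rather than the pipeline actually used: the function names, the inclusive treatment of the cutoffs and the case-insensitive keyword matching are all assumptions introduced here.

```python
# Illustrative sketch only. Function names, inclusive cutoffs and
# case-insensitive matching are assumptions, not the published pipeline.

HYPOTHETICAL_KEYWORDS = (
    "hypothetical", "predicted", "similar to",
    "KIAA", "DKFZp", "FLJ", "putative",
)


def passes_blast_cutoffs(percent_identity, match_length, query_length):
    """Return True if a hit has at least 95% identity and the match
    covers at least 50% of the query length."""
    return percent_identity >= 95.0 and match_length >= 0.5 * query_length


def is_hypothetical(description):
    """Return True if the protein description contains any listed keyword."""
    text = description.lower()
    return any(keyword.lower() in text for keyword in HYPOTHETICAL_KEYWORDS)


if __name__ == "__main__":
    # A 120-nt match on a 200-nt query at 97% identity is kept.
    print(passes_blast_cutoffs(97.0, 120, 200))
    # A description carrying an 'FLJ' identifier is flagged as hypothetical.
    print(is_hypothetical("hypothetical protein FLJ12345"))
```

A substring test is the simplest reading of "description included"; a stricter implementation might match whole tokens only.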

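The differential expression score used for the macroarray data (each analysis with P<0.05 scores 1, and clones with a total score higher than 3 are selected) can be sketched as follows. The sketch assumes that the P values of the individual Student's t-tests and Wilcoxon's signed rank tests have already been computed; on our reading of the protocol there would be up to 12 per clone (two tests, two spots, three experiments), but that count, like the function names, is an assumption made for illustration.

```python
# Illustrative sketch only. It takes precomputed P values as input and
# does not perform the t-tests or Wilcoxon tests themselves.

def differential_expression_score(p_values, alpha=0.05):
    """Sum of the scores: each analysis with P < alpha contributes 1."""
    return sum(1 for p in p_values if p < alpha)


def is_differentially_expressed(p_values, alpha=0.05, min_score=3):
    """Select a clone when its score is strictly higher than min_score."""
    return differential_expression_score(p_values, alpha) > min_score


if __name__ == "__main__":
    # Hypothetical P values for one clone (not data from the study).
    clone_p_values = [0.01, 0.20, 0.04, 0.50, 0.03, 0.60,
                      0.049, 0.90, 0.30, 0.70, 0.80, 0.10]
    print(differential_expression_score(clone_p_values))
    print(is_differentially_expressed(clone_p_values))
```

Keeping the threshold as a parameter makes it easy to see how the selected set changes with a more or less stringent score.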
DD–μA comparison and analysis
Using Blast N, the nucleotide sequences from the DD protocol or their corresponding validated mRNA sequences were compared to the Blast database that we built with the Affymetrix HG-U95A target sequences (the target sequences are the sequences from which probes were selected; they were downloaded from http://www.affymetrix.com). The parameters for finding true homologues in the Blast outputs were as follows: for the mRNA against the μA target sequence, an identity cutoff of 95% and a match length cutoff equal to 50% of the subject length; for the DD clone sequence against the μA target sequence, 90% identity and 50% of the query length. When several μA target sequences fulfilled these parameters, only the one having the best identity percent was kept, so that one μA sequence was assigned to one DD sequence. The μA target sequences having no detectable values (Absolute Call equal to Absent or Marginal) on the μAs in all the samples were excluded, leaving the list of analysable genes.

Correlation between DD and μA or MAB profiles
Within the context of the DD protocol, the difference of expression between T and N samples was quantified by subtracting the average intensity of the N bands from that of the T bands seen on the DD gels and coded on a scale of 0–3. Positive values indicate overexpression in T, negative values underexpression in T, and a zero difference between the averages a complex profile. Clones with a complex expression profile were excluded from the DD–MAB or DD–μA comparison. When the DD sequence had twin sequences, all the expression profiles were taken into account to calculate a consensus difference by subtracting the sum of the averages of the N samples from that of the T samples. For the μAs and the MABs, the T versus N fold change (FC) was calculated by dividing the average of the intensities of the T samples by the average of the N intensity values. If the ratio was more than 1, the expression profile was overexpressed in T; otherwise it was underexpressed in T, and in this case the fold change was given a value of −(N/T) for ease of comparison with (T/N). The DD–μA (or DD–MAB) correlation ratio was calculated by dividing the number of analysable genes having the same expression profile in DD and μA (or MAB) by the total number of analysable genes.

Real-time quantitative PCR
Reverse transcription was performed with 1 μg total RNA, random primers and the Superscript II PCR system (Life Technologies). Q-PCR reactions were performed using the Light Cycler (Roche Diagnostics, Meylan, France) with the LC FastStart DNA Master SYBR Green I reaction mixture according to the manufacturer's instructions. In all, 2 μl of 1:100 diluted RT products were used in a 20 μl reaction volume. The primers were chosen with the Primer3 software and their specificity was verified by Blast analysis on the NR database (nonredundant set of GenBank, EMBL and DDBJ databases). The primer sequences were: TMEM16A, 5′-CTCCTGGACGAGGTGGTATGG-3′ and 5′-GAACGCCACGTAAAAGATGG-3′; PHLDB2, 5′-CCTGTTGGATGTTGAAAGCA-3′ and 5′-GAGCCTGCTGAACAATGTGA-3′; ARHGAP21, 5′-AGTGATGCTGCCAAGGAAGG-3′ and 5′-GAATGACCCCGAAGGACAAC-3′; the ubiquitous gene RPLP0, 5′-GAAGGCTGTGGTGCTGATGG-3′ and 5′-CCGGATATGAGGCAGCAGTT-3′. For each gene, a standard curve was constructed using serial dilutions of standard cDNAs (equivalent to 100, 40, 20, 10, 4, 2 and 1 ng of total RNA) derived from a pool of 10 hypopharyngeal tumours. The concentrations of primers, MgCl2, probes and cDNA were optimized to obtain linear standard curves. Unknown samples were estimated relative to these standard curves. PCR reactions were run at least twice for each sample. The median value was retained whenever the standard deviation did not exceed 15%. The mean of the N samples was given an arbitrary unit of 1. All values were normalized using ribosomal phospho-protein P0 (RPLP0) as an internal control. RPLP0 (originally called 36B4) is a ubiquitously expressed gene that has been routinely used in different laboratories as an internal control to normalize for the amount of RNA. In a large study (98 cases), we confirmed by real-time quantitative PCR (RT–QPCR) that its expression level remains relatively constant between HNSCC tumours and matched normal tissues (data not shown). RPLP0 gave better results than the commonly used control GAPDH, which was more variable between samples in our experiments. Statistical comparisons between tumour and normal sample populations were carried out by Student's t-test.

Northern blot
Total RNA was extracted from tissue samples with Trizol (Life Technologies, Cergy Pontoise, France). A measure of 20 μg of RNA was subjected to agarose 6% formaldehyde gel electrophoresis, and then transferred to Hybond N membranes (Amersham). [32P]-labelled probes were generated with the Rediprime system (Amersham). Membranes were prehybridized and hybridized in 50% formamide at 42°C according to the manufacturer's specifications, washed to a stringency of 0.1× SSPE/0.1% SDS at 50°C and exposed to X-ray film (Kodak, Les Ulis, France) or to a PhosphorImager screen and analysed using the Typhoon ImageQuant software. Membranes were also probed with RPLP0 as a loading control. T, N and involved lymph node (L) samples from the same patients are present on the blot when available. The level of expression in tumour samples was analysed in comparison with the matched normal tissues after correction for loading.

Immunohistochemistry
Polyclonal antibodies were raised by injecting synthetic peptides into rabbits. The TMEM16A peptide was EKERQKDEPPCNHHNTC, located in the C-terminus of the putative protein derived from the coding sequence of the TMEM16A gene; the PHLDB2 peptide was QPQSKEHFRSLEERKKQHKC, corresponding to the PHLDB2 amino-acid sequence from 642 to 660. The ARHGAP21 antibody was raised against a peptide corresponding to amino acids 383–400 (IDWKNYKTYKEYIDNRRL) of the protein. The antibodies were tested by Western blotting of transfected proteins expressed in COS or HeLa cells. Their specificities were verified by: comparison of transfected and nontransfected cells, peptide competition, comparison with preimmune serum and size comparison with FLAG-tagged proteins detected with anti-FLAG antibodies. In addition, the anti-TMEM16A and anti-ARHGAP21 antibodies detected endogenous proteins of the expected size in FaDu and HaCaT cells, respectively (data not shown). Sections of patient samples were immunostained as previously described. Briefly, paraffin sections (3 μm) were cut, floated on Superfrost-coated glass microscope slides and dried overnight at 37°C. Sections were de-waxed with xylene and hydrated from ethanol to deionized water. Endogenous peroxidase activity was blocked by a 20-min incubation in 3% hydrogen peroxide. Antigen retrieval was performed by pressure cooking for 3 min in 0.1 M citrate buffer pH 6.0 (TMEM16A), by 15 min incubation in a pressure cooker (microwave tender cooker, Biogenex) in 0.1 M citrate buffer pH 6.0 (PHLDB2) or at 90°C for 10 min in 0.01 M citrate buffer (ARHGAP21) and then 20 min cooling. Slides were

blocked using BSA for 30 min at room temperature (PHLDB2). Excess blocking solution was removed, and the primary antibody (rabbit polyclonal anti-B14 1/500, anti-PHLDB2 1/1500 dilution, and anti-ARHGAP21 1/100) was added with overnight incubation at 4°C. Preimmune rabbit IgG were used as a negative control on serial sections (data not shown). The slides were rinsed for three 5-min cycles in TBS. Incubation with secondary antibody was carried out at 37°C for 60 min. All slides were then rinsed three times in TBS. DAB kit (Vector Laboratories) was used for revelation and allowed to develop for 2–10 min. The slides were then rinsed with TBS and counterstained with haematoxylin for 1 min, then mounted.

Acknowledgements
We thank Raymond Ripp and Frédéric Chalmel for bioinformatics, Christine Macabre for technical assistance, the Wasylyk laboratory members for support and encouragement, and the IGBMC core facilities. We thank for financial support the Ligue Nationale Française contre le Cancer (Equipe labellisée), the Ligues Départementales de Lutte contre le Cancer (Haut- and Bas-Rhin), the Association pour la Recherche sur le Cancer, BioAvenir (Aventis, Rhone-Poulenc), MNRT Reseau Genhomme, the Centre National de la Recherche Scientifique, the Institut National de la Santé et de la Recherche Médicale, and the Hôpital Universitaire de Strasbourg.

References

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W et al. (1997). Nucleic Acids Res 25: 3389–3402.
Basseres DS, Tizzei EV, Duarte AA, Costa FF, Saad ST. (2002). Biochem Biophys Res Commun 294: 579–585.
Bolstad BM, Irizarry RA, Astrand M, Speed TP. (2003). Bioinformatics 19: 185–193.
Broude NE. (2002). Expert Rev Mol Diagn 2: 209–216.
Chalmel F, Lardenois A, Thompson JD, Muller J, Sahel JA, Leveillard T et al. (2005). Bioinformatics 21: 2095–2096.
Cirelli C, Tononi G. (2000). Brain Res 885: 303–321.
Cromer A, Carles A, Millon R, Ganguli G, Chalmel F, Lemaire F et al. (2004). Oncogene 23: 2484–2498.
Dancey JE. (2004). Ann Oncol 15(Suppl 4): iv233–iv239.
Ding C, Cantor CR. (2004). J Biochem Mol Biol 37: 1–10.
Dowler S, Montalvo L, Cantrell D, Morrice N, Alessi DR. (2000). Biochem J 349: 605–610.
Draghici S, Kulaeva O, Hoff B, Petrov A, Shams S, Tainsky MA. (2003). Bioinformatics 19: 1348–1359.
Hasina R, Lingen MW. (2004). Semin Oncol 31: 718–725.
Heilig M, Sommer W. (2004). Neurotox Res 6: 363–372.
Huang X, Gollin SM, Raja S, Godfrey TE. (2002). Proc Natl Acad Sci USA 99: 11369–11374.
Huang ZX, Yao KT. (2004). Di Yi Jun Yi Da Xue Xue Bao 24: 798–801.
IHGSC (2004). Nature 431: 931–945.
Katoh M. (2003). Int J Oncol 22: 1375–1381.
Katoh M. (2004). Int J Oncol 25: 1201–1206.
Katsanis N, Worley KC, Gonzalez G, Ansley SJ, Lupski JR. (2002). Proc Natl Acad Sci USA 99: 14326–14331.
Lemaire F, Millon R, Young J, Cromer A, Wasylyk C, Schultz I et al. (2003). Br J Cancer 89: 1940–1949.
Liang P. (1998). Methods 16: 361–364.
Liang P. (2002). Biotechniques 33: 338–344, 346.
Liang P, Pardee AB. (1992). Science 257: 967–971.
Liang P, Pardee AB. (2003). Nat Rev Cancer 3: 869–876.
Lincoln DT, Ali Emadi EM, Tonissen KF, Clarke FM. (2003). Anticancer Res 23: 2425–2433.
Liotta L, Petricoin E. (2000). Nat Rev Genet 1: 48–56.
Martin KJ, Graner E, Li Y, Price LM, Kritzman BM, Fournier MV et al. (2001). Proc Natl Acad Sci USA 98: 2646–2651.
Muller D, Millon R, Velten M, Bronner G, Jung G, Engelmann A et al. (1997). Eur J Cancer 33: 2203–2210.
Oetting WS. (2000). Pigment Cell Res 13: 21–27.
Outinen PA, Sood SK, Liaw PC, Sarge KD, Maeda N, Hirsh J et al. (1998). Biochem J 332(Part 1): 213–221.
Paranavitane V, Coadwell WJ, Eguinoa A, Hawkins PT, Stephens L. (2003). J Biol Chem 278: 1328–1335.
Pascal T, Debacq-Chainiaux F, Chretien A, Bastin C, Dabee AF, Bertholet V et al. (2005). FEBS Lett 579: 3651–3659.
Smit AF. (1999). Curr Opin Genet Dev 9: 657–663.
Stein J, Liang P. (2002). Cell Mol Life Sci 59: 1235–1240.
Strausberg RL, Feingold EA, Grouse LH, Derge JG, Klausner RD, Collins FS et al. (2002). Proc Natl Acad Sci USA 99: 16899–16903.
Trenkle T, Welsh J, McClelland M. (1999). Biotechniques 27: 554–560, 562, 564.
Tusher VG, Tibshirani R, Chu G. (2001). Proc Natl Acad Sci USA 98: 5116–5121.
Wells JM. (1999). Alzheimer Dis Assoc Disord 13(Suppl 1): S78–S81.
West RB, Corless CL, Chen X, Rubin BP, Subramanian S, Montgomery K et al. (2004). Am J Pathol 165: 107–113.
Whitesell L, Bagatell R, Falsey R. (2003). Curr Cancer Drug Targets 3: 349–358.

Supplementary information accompanies the paper on the Oncogene website (http://www.nature.com/onc).
