[CANCER RESEARCH 60, 4037–4043, August 1, 2000] Advances in Brief

Cancer Gene Discovery Using Digital Differential Display1

Daniela Scheurle, Maurice Phil DeYoung, David M. Binninger, Holly Page, Mohammad Jahanzeb, and Ramaswamy Narayanan2 Center for Molecular Biology and Biotechnology and Department of Biology, Florida Atlantic University, Boca Raton, Florida 33431 [D. S., M. P. D., D. M. B., H. P., R. N.], and Eugene M. and Christine E. Lynn Clinical Research Center, Boca Raton Community Hospital Cancer Center, Boca Raton, Florida 33486 [M. J.]

Received 2/2/00; accepted 6/8/00.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
1 Supported by an institutional startup grant (to R. N.). D. S. was supported by a grant from the Boca Raton Community Hospital Foundation. Colon tumor and normal tissues were provided by the Cooperative Human Tissue Network, which is funded by the National Cancer Institute.
2 To whom requests for reprints should be addressed, at Center for Molecular Biology and Biotechnology, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431. Phone: (561) 297-2018; Fax: (561) 297-2099; E-mail: [email protected].
3 The abbreviations used are: EST, expressed sequence tag; CGAP, Cancer Gene Anatomy Project; DDD, Digital Differential Display; RT, reverse transcription; Hs., human sequence; NCI, National Cancer Institute.
4 http://www.fau.edu/cmbb/publications/cancergenes.htm.
5 http://www.cgap.gov.
6 http://www.ncbi.nlm.nih.gov/UniGene/.

Abstract

The Cancer Gene Anatomy Project database of the National Cancer Institute has thousands of expressed sequences, both known and novel, in the form of expressed sequence tags (ESTs). These ESTs, derived from diverse normal and tumor cDNA libraries, offer an attractive starting point for cancer gene discovery. Using a data-mining tool called Digital Differential Display (DDD) from the Cancer Gene Anatomy Project database, ESTs from six different solid tumor types (breast, colon, lung, ovary, pancreas, and prostate) were analyzed for differential expression. An electronic expression profile and chromosomal map position of these hits were generated from the UniGene database. The hits were categorized into major classes of genes including ribosomal proteins, enzymes, cell surface molecules, secretory proteins, adhesion molecules, and immunoglobulins and were found to be differentially expressed in these tumor-derived libraries. Genes known to be up-regulated in prostate, breast, and pancreatic carcinomas were discovered by DDD, demonstrating the utility of this technique. Two hundred known genes and 500 novel sequences were discovered to be differentially expressed in these select tumor-derived libraries. Test genes were validated for expression specificity by reverse transcription-PCR, providing a proof of concept for gene discovery by DDD. A comprehensive database of hits can be accessed at http://www.fau.edu/cmbb/publications/cancergenes.htm. This solid tumor DDD database should facilitate target identification for cancer diagnostics and therapeutics.

Introduction

With the expected completion of the human genome sequencing efforts in the next few years, over 100,000 new genes are likely to be discovered (1–3). From these vast numbers of new genes, new diagnostic and therapeutic targets for diseases like cancer are predicted to emerge (4). Only a subset of genes is expressed in a given cell, and the level of expression governs function. High-throughput gene expression technology is becoming a possibility for analyzing expression of a large number of sequences in diseased and normal tissues with the use of microarrays and gene chips (5–8). A parallel way to initiate a search for genes relevant to cancer diagnostics and therapy is to data mine the sequence database (9–13). A large number of expressed sequences from diverse organ-, species-, and disease-derived cDNA libraries are being deposited in the form of ESTs3 in different databases.

The CGAP database of the NCI is an attractive starting point for cancer-specific gene discovery (13). The Human Tumor Gene Index was initiated by the NCI in 1997 with a primary goal of identifying genes expressed during development of human tumors in five major cancer sites: (a) breast; (b) colon; (c) lung; (d) ovary; and (e) prostate. This database consists of expression information (mRNA) of thousands of known and novel genes in diverse normal and tumor tissues. By monitoring the electronic expression profile of many of these sequences, it is possible to compile a list of genes that are selectively expressed in the cancers. Data-mining tools are becoming available to extract expression information about the ESTs derived from various CGAP libraries (9, 10, 12, 14). Currently, there are 1.5 million ESTs in the CGAP database, of which 73,000 are novel sequences. These sequences are also subclassified into those derived from libraries of normal, precancerous, or cancer tissues. We chose the DDD tool of the CGAP database to identify genes (both novel and known ESTs) that are selectively up- or down-regulated in six major solid tumor types (breast, colon, lung, ovary, pancreas, and prostate). Survey sequencing of mRNA gene products can provide an indirect means of generating fingerprints for cancer cells and their normal counterparts. DDD is a computer method of comparing these fingerprints. DDD is a quantitative method that enables the user to determine the fold differences between the libraries being compared, using a statistical method to quantitate the transcript levels. ESTs present in tumor-derived libraries were compared against all other libraries or against the corresponding normal libraries by DDD, and the hits showing >10-fold differences were compiled for each of the organ types. These hits were functionally classified into major classes of proteins. Genes belonging to ribosomal proteins, enzymes, receptors, binding proteins, secretory proteins, and cell adhesion molecules were identified to be differentially expressed in these tumor types. A comprehensive database of hits was created, providing additional electronic expression data as well as novel ESTs that were thus identified. This database can be accessed on the World Wide Web.4

Materials and Methods

Data-mining of CGAP Database. The CGAP database was accessed,5 and the DDD tool was used according to the database instructions. DDD takes advantage of the UniGene database by comparing the number of times ESTs from different libraries were assigned to a particular UniGene cluster. Six different solid tumor-derived EST libraries (breast, colon, lung, ovary, pancreas, and prostate) with corresponding normal tissue-derived libraries were chosen for DDD (N = 110). To identify tumor- and organ-specific ESTs, all the other organ- and tumor-derived EST libraries (N = 327) were chosen for comparison with each of the six tumor types. The nature of the libraries (normal, pretumor, or tumor) was authenticated by comparison of the CGAP data with the UniGene database.6 Those few libraries showing discrepancies of definition between the two databases were excluded. The DDD was performed for each organ type individually. DDD was performed using ESTs from tumors (pool A) and corresponding normal organ (pool B) for the DDD2 method or

Downloaded from cancerres.aacrjournals.org on September 30, 2021. © 2000 American Association for Cancer Research. BIOINFORMATICS OF CANCER

Table 1 DDD of up-regulated genes in tumors
Hits (known ESTs) showing >10-fold differences in the indicated tumor-derived libraries in comparison with normal tissue-derived cDNA library and all other organ- and tumor-derived cDNA libraries were compiled. ESTs belonging to specific classes of genes were subclassified as indicated. UniGene number for each hit is shown. Electronic expression (E-Northern) for each of these hits was inferred from the UniGene database from the cDNA sources, and the chromosomal map position for each of these hits was inferred from the cytogenetic map.4

Name | Hs.# | Br Co Lu Ov Pa Pr

Enzymes (N = 35)
CYB561 | 153028 | 29
P450, subfamily XVII | 1363 | 26
ATP synthase, isoform 2 | 155751 | 20
Proteosome non-ATPase, 7 | 155543 | 10 18
Glu. peroxidase 2 (a) | 2702 | 17
Carbonic anhydrase I | 23118 | 18
Dopa decarboxylase | 150403 | 10
Tryptophan hydroxylase | 144563 | 27
ATPase, lysosomal | 24322 | 13
Serine protease 9 | 79361 | 28
Serine-like protease, 1 | 69423 | 25 28
Myosin IXBB | 159629 | 17
ATPase isoform 1 | 64173 | 15
ADH 8 | 87539 | 15
ATPase, 9 kD | 24322 | 13
PKR | 177574 | 10
Cathepsin E | 1355 | 80
Panc. lipase | 102876 | 47
Ser. protease 2 | 241561 | 39
Ser. protease 1 | 241395 | 34
Elastase 1 | 21 | 38
Carboxy peptidase A2 | 89717 | 34
Carboxy peptidase A1 | 2879 | 33
Elastase 3 | 181289 | 32
Chymotrypsinogen B1 | 74502 | 32
Pancreatic lipase-related 1 | 73923 | 30
Carboxy peptidase B1 | 180884 | 28
DNase II | 118243 | 14
Elastase 3B | 183864 | 13
Urokinase | 77274 | 13

Ribosomal proteins (N = 6)
S15a | 2953 | 11
S17 | 5174 | 11
S19 | 126707 | 16
L31 | 184014 | 10
L35 | 182825 | 10
L41 | 108124 | 10

Receptor/surface/membrane (N = 17)
Claudin | 25640 | 33
Tumor necrosis factor receptor, 6b | 194676 | 17
Plakophilin 3 | 26557 | 11
Lymphocyte antigen complex | 77667 | 17
Mesothelin | 155981 | 40
Chloride channel 1A | 84974 | 12
receptor 1 | 73769 | 10
CD74 | 84298 | 10
Keratin 19 | 182265 | 24
CEA | 220529 | 21
Keratin 17 | 2785 | 17
Trans membrane 4 superfamily 3 | 84072 | 13
Trans membrane 4 superfamily 4 | 11881 | 17
Non-specific cross-reacting antigen | 73848 | 61
T-cell receptor γ | 112259 | 95
IL-17 receptor | 129751 | 43
ERBB3 | 199067 | 15

Binding proteins (N = 24)
Myosin | 170482 | 236
PDEF | 79414 | 194
Ubiquitin binding, p62 | 182248 | 27
HSFBP 1 | 158675 | 25
H2A histone | 795 | 23
Ald. dehydrogenase 6 | 75746 | 11
PSA/KLK2 | 181350 | 77
PSA/KLK3 | 171995 | 53
Acid phosphatase | 1852 | 61
Bruton’s tyrosine kinase | 178391 |
Treacher Collins-Franceschetti syndrome 1 | 172727 | 18 17
ATP binding cassette, subfamily B, member 6 | 107911 | 14
Homeobox A9 | 127428 | 22
Insulinoma-associated 1 | 89584 | 48
(TITF1) | 107764 | 34
ATP-binding cassette, subfamily C, 8 | 54470 | 30
I3 protein | 75922 | 16
H1 histone | 109804 | 15
Lectin binding (galectin 6) | 79339 | 13
LIM domain | 79691 | 10
Retinoic acid binding 2 | 183650 | 10



Table 1 Continued

Name | Hs.# | Br Co Lu Ov Pa Pr

SNC73 protein | 32225 | 12
Actin-related protein 2/3 complex, subu. 1B | 11538 | 11
Lumican | 79914 | 10
Thrombospondin 2 | 108623 | 10
Ca+ integrin BP | 10803 | 10
ELK4, ETS-domain protein | 169241 | 60
Myosin BP-C | 181368 | 10
Fatty acid BP-5 | 153179 | 15

Secretory proteins (N = 16)
Trefoil factor 1 | 1406 | 22
2 | 315 | 22
Pancreatitis-associated protein | 423 | 18
H. sapiens gallbladder mucin MUC5B mRNA | 102482 | 11
Epididymis-specific, whey-acidic protein type | 2719 | 27
Lutheran blood group | 155048 | 21
Progestagen-associated protein | 82269 | 17
TNF, 13 | 54673 | 10
Glucagon | 1460 | 93
Mucin 5 | 103707 | 84
Islet-derived 1β protein | 4158 | 51
Islet-derived 1α protein | 1032 | 21
Secreted frizzled-related protein 4 | 105700 | 17
Mucin 1, transmembrane | 89603 | 10
Microseminoprotein β | 183752 | 37
Small inducible cytokine | 153423 | 18

Immunoglobulins (N = 4)
FCGRT | 160741 | 26
Ig λ gene cluster | 181125 | 10
Ig super family | 102171 | 15
Ig γ 3 | 140 | 52

Cell adhesion molecules (N = 6)
Integrin β 5 | 149846 | 11
Collagen type XIII, α 1 | 211933 | 23
Laminin α 5 | 11669 | 10
Laminin β 3 | 75517 | 24
Laminin γ 2 | 54451 | 20
Collagen type X α 1 | 179729 | 30

(a) Glu, glutathione; DOPA, aromatic L-amino acid decarboxylase; ADH, aldehyde dehydrogenase; PK, protein kinase; ser, serine; CEA, carcinoembryonic antigen; HSFBP, heat shock factor-binding protein; BP, binding protein; ald, aldehyde; PSA, prostate-specific antigen; TNF, tumor necrosis factor; FCGRT, Fc fragment of IgG, receptor; I3, brain protein I3.
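The selection rule behind Table 1 can be sketched in a few lines. According to the Methods, DDD reports for each pool the fraction of sequences that map to a UniGene cluster, takes the fold difference as the ratio of pool A to pool B, and keeps statistically significant hits (Fisher's exact test) with >10-fold differences. The sketch below assumes raw EST counts per pool are available; the function names, the 0.05 significance cutoff, and the example counts are illustrative assumptions and are not taken from the CGAP tool.

```python
from math import comb


def fold_difference(count_a, total_a, count_b, total_b):
    """Ratio of the pool A fraction to the pool B fraction for one UniGene cluster."""
    if count_b == 0:
        return float("inf") if count_a > 0 else float("nan")
    # (count_a / total_a) / (count_b / total_b), cross-multiplied to stay exact
    return (count_a * total_b) / (count_b * total_a)


def fisher_exact_two_sided(count_a, total_a, count_b, total_b):
    """Two-sided Fisher's exact test on the 2x2 table
    [[count_a, total_a - count_a], [count_b, total_b - count_b]]."""
    in_cluster = count_a + count_b
    n = total_a + total_b

    def prob(k):
        # Hypergeometric probability that k of the cluster's ESTs fall in pool A
        return comb(in_cluster, k) * comb(n - in_cluster, total_a - k) / comb(n, total_a)

    k_min = max(0, total_a - (n - in_cluster))
    k_max = min(total_a, in_cluster)
    p_observed = prob(count_a)
    # Sum every table that is at least as unlikely as the observed one
    p_value = sum(p for p in map(prob, range(k_min, k_max + 1))
                  if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


def is_hit(count_a, total_a, count_b, total_b, min_fold=10.0, alpha=0.05):
    """Assumed hit rule: fold difference above min_fold and p-value below alpha."""
    fold = fold_difference(count_a, total_a, count_b, total_b)
    p_value = fisher_exact_two_sided(count_a, total_a, count_b, total_b)
    return fold > min_fold and p_value < alpha


# Hypothetical cluster: 30 of 10,000 tumor ESTs versus 2 of 10,000 comparison ESTs
print(fold_difference(30, 10000, 2, 10000))  # 15.0
print(is_hit(30, 10000, 2, 10000))
```

A cluster seen three times in the tumor pool and never in the comparison pool has an unbounded fold difference but would not pass the significance filter in this sketch, which is one reason a fold cutoff alone is not enough when counts are small.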

tumors (pool A) and all other organ- and tumor-derived cDNA libraries including the corresponding normal (pool B) for the DDD1 method using the online tool. The output provided a numerical value in each pool denoting the fraction of sequences within the pool that mapped to the UniGene cluster, providing a dot intensity. Fold differences were calculated by using the ratio of pool A:pool B. Statistically significant hits (Fisher’s exact test) showing >10-fold differences were compiled, and a preliminary database was created. Hits were classified into major families using information generated from two web sites.7 Novel ESTs were compiled into a separate database. The UniGene database was accessed to establish an electronic expression profile (E-Northern) for each of the hits to facilitate tumor- and organ-selective gene discovery. The cytogenetic map position of the hits was also inferred from the UniGene page. A final database of ESTs that were up-regulated, down-regulated, or showed absolute differences (+/−) in the six tumor types was created.4

Validation of DDD Hits. Tumor and normal tissues were obtained from the Cooperative Human Tissue Network (Birmingham, AL). One µg of total RNA was reverse-transcribed using random hexamers and Superscript Reverse Transcriptase (Life Technologies, Inc., Gaithersburg, MD). One-fortieth of the cDNA was PCR-amplified using gene-specific primers. Primers were designed using the Primer 3 program on the web.8 Primers were chosen for the following 13 known genes that showed DDD specificity for colon tumors: (a) creatine kinase (Hs. 118843); (b) guanylate cyclase (Hs. 1085); (c) ETS-variant gene (Hs. 179214); (d) placental lactogen (Hs. 75984); (e) troponin (Hs. 73980); (f) titin (Hs. 172004); (g) fibrinogen (Hs. 90765); (h) homeobox transcription factor 1 (Hs. 1545); (i) homeobox transcription factor 2 (Hs. 7399); (j) myoglobin C (Hs. 118836); (k) cytokeratin 20 (Hs. 84905); (l) neurotensin receptor (Hs. 110642), and (m) transmembrane glycoprotein (Hs. 143133). In addition, primers were designed for 20 colon-specific novel ESTs. The primer sequences are available on request. The PCR parameters included 94°C for 7 min, followed by a 35–40-cycle amplification at 94°C for 45 s, 62°C–65°C for 45 s, and 72°C for 90 s, with a final extension at 72°C for 10 min. RT-minus controls and genomic DNA controls were routinely used to authenticate the RT-derived products. One-half of the amplified products were separated by electrophoresis on 2% agarose gel and detected by ethidium bromide staining. Internal control actin RT-PCR was done on all samples simultaneously.

7 http://www.ncbi.nlm.nih.gov/Omin/ and the GeneCards site http://bioinformatics.weizmann.ac.il/cards/.
8 http://www.genome.wi.mit.edu//cgi-bin/primer/primer3_www.cgi.

Results

Electronic Profiling of Up-Regulated Genes in Solid Tumors. The EST libraries representing each of the organ types used in the DDD protocol were chosen from the CGAP database library browser. The content of each of the libraries used was verified by comparison against the UniGene database.6 The DDD was performed by taking all the libraries representing tumors for each organ type and digitally comparing against either the corresponding normal tissue-derived library (DDD2) or all the other libraries plus the corresponding normal in the database (DDD1). Pretumor libraries were not included in all the other libraries and were analyzed separately (DDD3 and DDD4). Comparing the tumor libraries with all of the other libraries including the corresponding normal (DDD1) or with the corresponding normal only (DDD2) resulted in the identification of over 600 ESTs (data not shown). These hits were subdivided into known and novel ESTs. Approximately 10% of the hits showed varying levels of similarity (weak, moderate, and high) to alu-containing repeat sequences. An interesting pattern emerged regarding the known ESTs thus identified. The majority of the known ESTs can be classified into distinct classes of genes. These included ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, and immunoglobulins. Over 80 genes were found to be up-regulated with >10-fold differences in comparison to normal and all other organs (Table 1). Ribosomal proteins were found to be up-regulated in breast- and prostate carcinoma-derived libraries, but not in the other four solid tumor-derived libraries. Known hits (enzymes) for select organ type (for example, prostate-specific antigen and prostatic acid phosphatase for prostate, several pancreatic enzymes for pancreas, tryptophan hydroxylase and DOPA decarboxylase in the lung, and mammaglobin for breast) were identified by the DDD protocol. The DDD also identified mucin-related proteins in the colon and folate receptor in the ovary carcinoma-derived EST libraries. The majority of the up-regulated genes were identified by both DDD1 and DDD2 protocols. These results suggest the potential utility of the DDD protocol to rapidly identify tumor-selective genes. An electronic Northern (sources of cDNAs) and cytogenetic map position for each of the hits was created from the UniGene page (data not shown). The above-mentioned DDD approach also resulted in the identification of up-regulated novel ESTs (data not shown). The vast majority of the ESTs were present in an organ- and tumor type-dependent manner. The results of these hits and additional hits (>5-fold), including the novel ESTs, can be viewed at the web site.4

Table 2 DDD of down-regulated genes in tumors
Hits showing >10-fold differences in normal organ-derived cDNA library in comparison with tumor-derived and other organ-derived libraries were compiled. The hits are classified as shown in Table 1.4

Name | Hs.# | Br Co Lu Ov Pa Pr

Enzymes (N = 7)
Creatine kinase, brain | 173724 | 10
TIMP 1 (a) | 5831 | 10
Myosin light kinase | 211582 | 13
Superoxide dismutase 2 | 177781 | 20
Aldo-keto reductase | 78183 | 16
Amylase, α 2A | 75733 | 15
Elastase 3B | 183864 | 18

Receptor/surface/membrane (N = 8)
Phospholemman-like | 92323 | 15
CEA | 220529 | 14
MHC class I, E | 181392 | 21
Surfactant C | 1074 | 19
Surfactant A1 | 177582 | 34
SB class HC antigen α | 914 | 15
Surfactant B | 76305 | 14
CD44 | 169610 | 22

Secretory proteins (N = 9)
HB α 1 | 75792 | 12 140 13 30
Hb γ A | 182167 | 12 11 22
Albumin | 75442 | 29 58 10
Placental lactogen | 75984 | 39
Hb β | 155376 | 11
TGFBI | 118787 | 12
CTGF | 75511 | 17
PLAB | 116577 | 21
Interleukin 8 | 624 | 301

Ribosomal proteins (N = 6)
L9 | 157850 | 16
L17 | 82202 | 11
L21 | 184108 | 15
L23 | 234518 | 15
L37A | 184109 | 23
S19 | 126707 | 11

Binding proteins (N = 13)
Myosin | 9615 | 35
Crystallin α B | 1940 | 19
SNC73 protein | 32225 | 10 12
Galectin 4 | 5302 | 18
Myosin | 929 | 20
Pleckstrin | 77436 | 70
Actin, β | 180952 | 14
α-2-Macroglobulin | 74561 | 13
EIF4A1 | 129673 | 10
IGFBP5 | 103391 | 83
Transgelin | 75777 | 14
Karyopherin α 4 | 119500 | 11
TYR-MAP-β | 182238 | 10

Cell adhesion molecules (N = 2)
Fibronectin | 118162 | 15
Vimentin | 2064 | 12

Immunoglobulins (N = 2)
FCGRT | 160741 | 11
Ig κ | 156110 | 10

(a) TIMP, tissue inhibitor of metalloproteinase 1; CEA, carcinoembryonic antigen; HB, hemoglobin; TGFB, transforming growth factor β; PLAB, prostate differentiation factor; SNC73, immunoglobulin heavy constant α 1; EIF, eukaryotic translation initiation factor; IGFBP, insulin-like growth factor-binding protein; TYR-MAP, tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein; FCGRT, Fc fragment of IgG receptor.

Electronic Profiling of Down-Regulated Genes in Solid Tumors. Using the same DDD protocol (DDD1 and DDD2), a list of genes that are down-regulated by >10-fold was also compiled (Table 2). The majority of the down-regulated genes were discovered by DDD2


protocol (tumor versus normal). Similar to the above-mentioned results, distinct members of ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, and immunoglobulins were discovered to be selectively down-regulated in an organ- and tumor type-dependent manner (N = 34). A complete listing of all the genes that are down-regulated (>5-fold) with their E-Northern and cytogenetic map position can be accessed at the web site.4

Electronic Prediction of Tumor-selective Genes by DDD. Using the DDD1 and DDD2 protocol, a list of known genes that are predicted to be either present or absent in the tumor types (plus/minus differences) was compiled (Table 3). Fourteen genes were found to be present exclusively in the select solid tumor-derived libraries (Table 3). These included mammaglobin in breast, androgen receptor in prostate, γ-glutamyl transferase in lung, and neurotensin receptor in pancreas. Seventeen genes were found to be selectively absent in the solid tumors analyzed (Table 4). These included semenogelin in the prostate, apolipoprotein in the breast, and islet amyloid polypeptide in the pancreas.

Table 3 DDD to identify hits present in the tumor types
From the hits of Tables 2 and 3, ESTs present only in the tumor were compiled.4

Name | Hs.# | Br Co Lu Ov Pa Pr

Mammaglobin 1 | 46452 | +/−
Guanylate cyclase 2C | 1085 | +/−
Homeobox trans. factor 1 (a) | 1545 | +/−
Homeobox trans. factor 2 | 77399 | +/−
Cyto keratin 20 | 84905 | +/−
Ets variant | 179214 | +/−
EGR4 | 3052 | +/−
Dopamin rec. D2 | 73893 | +/−
γ-glutamyl-transferase 2 | 211824 | +/−
V κ 2 | 249231 | +/−
Neurotensin receptor 1 | 110642 | +/−
Transcobalamin I | 2012 | +/−
Androgen receptor | 99915 | +/−
Specific granular protein | 54431 | +/−

(a) Trans, transcription; EGR, early growth response.

Test Validation of DDD Hits by RT-PCR. The electronic prediction by DDD was tested for expression relevance using cDNAs from a matched set of normal and tumor colon tissues (Fig. 1). Twelve known genes that showed plus/minus differences from the analysis of colon DDD from Table 3 and Table 4 were analyzed by RT-PCR. Three of these 12 genes showed concordance with the DDD prediction. The results of these three genes are shown in Fig. 1. Creatine kinase (DDD2) expression was detected in the normal colon-derived but not tumor colon-derived cDNAs, whereas guanylate cyclase (DDD7) and ETS variant gene (DDD12) were expressed in the same tumor but not in the normal colon-derived cDNAs. RT-PCR analysis of the remaining nine genes did not show the DDD-predicted specificity of expression (data not shown). Furthermore, analysis of 20 novel ESTs predicted to be specific for colon tumors by DDD demonstrated two of the ESTs to be specifically expressed in the tumor, consistent with the prediction of DDD (data not shown). The authenticity of the RT-derived products was established using RT-minus reactions. These results demonstrated that it is possible to rapidly validate the electronic prediction by DDD.

Discussion

Discovery of cancer genes is a major challenge facing cancer research (9, 13, 15). It is becoming increasingly clear that multiple genetic alterations are responsible for cancer development. The task of identifying cancer genes is complex due to the fact that only a subset of genes within a cell population is expressed. Whereas in the past, cancer gene discovery followed conventional methods such as mapping the gene, loss of heterozygosity, studies, and so forth, the future of cancer gene discovery has to be able to make effective use of the large number of genes (predicted to be around 100,000 per cell) that will emerge from the human genome sequencing efforts (1–4). These sequences are being deposited in the vast sequence databases, most of which are in the public domain. For cancer-specific gene discovery, the CGAP database of the NCI provides a comprehensive collection not only of expressed sequences in the form of ESTs but also of various

Table 4 DDD to identify hits absent in the tumor types
From the hits of Tables 2 and 3, ESTs absent only in the tumor were compiled.4

Name | Hs.# | Br Co Lu Ov Pa Pr

Apolipoprot. D | 75736 | +/−
Albumin | 75442 | +/− +/−
Troponin T1 | 73980 | +/−
Titin | 172004 | +/−
Creatine kinase | 118843 | +/− +/− +/−
Actin α 1 | 1288 | +/− +/− +/−
Myoglobin | 118836 | +/− +/−
Vinculin | 75350 | +/−
Myosin, heavy | 929 | +/−
p53-response gene 2 | 118893 | +/−
Placental lactogen | 75984 | +/− +/− +/−
HB β (a) | 155376 | +/−
HB α 1 | 75792 | +/− +/−
Islet amyloid polypeptide | 142255 | +/−
Tissue factor pathway inhibitor 2 | 78045 | +/−
IGF2 | 182167 | +/−
Semenogelin I | 1968 | +/−

(a) HB, hemoglobin; IGF, insulin-like growth factor.
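The plus/minus (+/−) calls compiled in Tables 3 and 4 amount to a presence/absence comparison between the tumor pool and the comparison pool. A minimal sketch of that logic follows, assuming per-gene EST counts are available for both pools. The function names and the example counts are hypothetical and are not CGAP output.

```python
def plus_minus_call(tumor_count, comparison_count):
    """Return '+' (seen only in tumor), '-' (absent only in tumor), or None."""
    if tumor_count > 0 and comparison_count == 0:
        return "+"
    if tumor_count == 0 and comparison_count > 0:
        return "-"
    return None  # seen in both pools or in neither: not a plus/minus hit


def split_plus_minus(counts):
    """counts maps a gene name to (tumor_count, comparison_count)."""
    present, absent = [], []
    for gene, (tumor, comparison) in counts.items():
        call = plus_minus_call(tumor, comparison)
        if call == "+":
            present.append(gene)
        elif call == "-":
            absent.append(gene)
    return sorted(present), sorted(absent)


# Hypothetical EST counts (tumor pool, comparison pool); not taken from CGAP.
example = {
    "guanylate cyclase 2C": (6, 0),
    "creatine kinase": (0, 9),
    "actin, beta": (40, 55),
}
present_only, absent_only = split_plus_minus(example)
print(present_only)  # candidates for a Table 3-style list
print(absent_only)   # candidates for a Table 4-style list
```

Because a plus/minus call can rest on a handful of ESTs, such hits are predictions that still need laboratory confirmation, as the RT-PCR validation in the text illustrates.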


Fig. 1. RT-PCR validation of test hits identified by DDD. Three hits (creatine kinase MM-DDD2, guanylate cyclase 2C-DDD7, and ETS variant gene 3-DDD12) from Table 3 were chosen for expression specificity using a matched set of normal- and tumor colon-derived random primed cDNAs synthesized in the presence (RT+) or absence (RT−) of reverse transcriptase. NEG, template minus negative control.

data-mining tools to analyze the ESTs. The basis of the CGAP database is establishment of the molecular anatomy of the cancer cell by determining the repertoire of genes that are expressed in the cancer in a quantitative manner, so that a fingerprint can be established. By comparing the fingerprints with the normal cells, it should be possible to short list genes that are present in the cancer cells, which can be followed up for relevance.

High-throughput gene expression techniques (microarrays, Genechips) to identify cancer-specific genes are becoming available (5–8); however, the technology is not cost effective for average laboratories. Furthermore, this method introduces a bias in that only a limited number, usually one tumor- and normal-derived RNA, can be used for the initial analysis. In view of the pharmacogenomic gene expression profile differences that are usually seen in different patients (8, 9), the direct use of this technique may become limiting. In this context, data-mining the databases provides a parallel approach to rapidly establish transcript-based fingerprinting.

The CGAP database currently offers three different data-mining tools: (a) X-profiling; (b) SAGE analysis; and (c) DDD. The X-profiling method enables identification of mostly novel ESTs that are present or absent in a given library, but it does not permit quantitation. The SAGE and DDD options enable quantitation of transcript levels and identify both known and novel ESTs. The SAGE database was a recent addition to the CGAP database (10); currently, SAGE libraries are available for only a few organs. Hence, to establish a proof of concept of cancer gene discovery by data-mining, we chose the DDD protocol.

A database of ESTs, both known and novel, that are differentially expressed by >10-fold was created for six major solid tumor types. An electronic Northern based on cDNA sources as well as the cytogenetic map position was created for each of these hits. The vast sequence database (1.5 million ESTs) was thus reduced to approximately 200 known genes and 500 novel ESTs for these six organ-derived tumor types. When these hits were subdivided into major classes of genes, the number of known hits was significantly reduced (10–30 per organ type). The majority of the hits were classifiable as ribosomal proteins, enzymes, cell surface receptors, binding proteins, secretory proteins, cell adhesion molecules, or immunoglobulins. Known organ type-specific genes were identified, including prostate-specific antigens, prostatic acid phosphatase, and androgen receptor for prostate; pancreatic enzymes such as islet amyloid polypeptide, lipases, elastases, and carboxypeptidases for pancreas; and mammaglobin for breast. Our recent demonstration of pancreatic tumor-specific expression of neurotensin receptor (20) was also corroborated by the DDD results.

Distinct ribosomal proteins were found to be up- and down-regulated in ovary-, pancreas-, and prostate cancer-derived libraries by DDD. A recent report (17) supports the selective involvement of the ribosomal proteins in prostate and breast tumors. Subtractive hybridization of prostatic hyperplasia from prostate tumors identified distinct ribosomal proteins (L4, L5, L7a, L23a, L30, L37, S14, and S18). Furthermore, L23a and S14 levels were shown to be elevated in PC-3 prostate carcinoma cells in comparison to normal prostatic epithelial cells (17). In addition, mutant p53 expression has been shown to induce overexpression of selective ribosomal proteins (17). These results demonstrate the utility of DDD for rapid cancer gene discovery.

Electronic expression profiling to identify cancer-specific genes is likely to have false positives. However, the true lead genes can be rapidly validated by RT-PCR using appropriate cDNAs. RT-PCR validation of test genes indicated an expression profile consistent with the DDD prediction. We chose 12 known genes predicted to be either up- or down-regulated in colon tumors by DDD. Three out of these 12 genes showed the expected specificity of expression. Evidence in the literature supports these three hits in select cancer types. Creatine kinase isoforms have been shown to be lost in colon carcinomas (18), consistent with our DDD findings. Similarly, the guanylate cyclase C isoform has recently been shown to be a biomarker for advanced colon carcinomas (19). The selective up-regulation of the ETS variant gene seen in the colon tumor, if validated in a large sample size, provides a novel diagnostic and therapy target for colon carcinomas. The ETS gene belongs to the ETS oncogene family (16), which has DNA binding ability. In addition, the DDD protocol predicted the expression specificity of neurotensin receptor in pancreatic tumors. We have recently identified this gene by independently data-mining the UniGene database and showed the expression specificity in pancreatic tumors (20). Identification by the DDD protocol of several known genes that have already been shown to be of diagnostic and therapeutic value in prostate, breast, and pancreatic tumors suggests that it is possible to discover novel genes from the list of novel ESTs that we have uncovered. In support of this, our preliminary results with 20 colon-specific novel ESTs have led to the identification of 2 ESTs showing specificity of expression to the colon tumors. These novel ESTs can be subjected to additional data-mining tools [e.g., comparison with SAGE and X-profiling; contig construction to expand the sequences and motifs recognition, and electronic Northern (14)] to further reduce the numbers before laboratory validation with relevant cDNAs.

The ability to reduce the number of hits from the vast and rapidly growing amount of sequence information in the sequence database is crucial to efficient gene discovery. The results presented in this report support the starting premise that by data-mining the CGAP database, it is possible to rapidly short list both known and novel ESTs for immediate follow-up studies. The database of hits we have generated using DDD for six different solid tumor types should provide a rapid starting point for discovery of both diagnostic and therapy targets.


Acknowledgments

We thank the CGAP database for providing access and the data-mining tools used in this study. We thank Jeanine Narayanan for editorial assistance.

References

1. Robbins, R. J. Bioinformatics: essential infrastructure for global biology. J. Computat. Biol., 3: 465–478, 1996.
2. Andrade, M. A., and Sander, C. Bioinformatics: from genome data to biological sciences. Curr. Opin. Biotechnol., 8: 675–683, 1997.
3. Collins, F. S., Patrinos, A., Jordan, E., Chakravarti, A., Gesteland, R., and Walters, L. New goals for the U.S. Human Genome Project: 1998–2003. Science (Washington, DC), 282: 682–689, 1998.
4. Fannon, M. R. Gene expression in normal and disease states: identification of therapeutic targets. Trends Biotechnol., 14: 294–298, 1996.
5. Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and Brown, E. L. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol., 14: 1675–1680, 1996.
6. Heller, R. A., Schena, M., Chai, A., Shalon, D., Bedilion, T., Gilmore, J., Woolley, D. E., and Davis, R. W. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc. Natl. Acad. Sci. USA, 94: 2150–2155, 1997.
7. Khan, J., Bittner, M. L., Saal, L. H., Teichmann, U., Azorsa, D. O., Gooden, G. C., Pavan, W. J., Trent, J. M., and Meltzer, P. S. cDNA microarrays detect activation of a myogenic transcription program by the PAX3-FKHR fusion oncogene. Proc. Natl. Acad. Sci. USA, 23: 13264–13269, 1999.
8. Elek, J., Park, K. H., and Narayanan, R. Microarray-based expression profiling in prostate tumors. In Vivo, 14: 173–182, 2000.
9. Zhang, L., Zhou, W., Velculescu, V. E., Kern, S. E., Hruban, R. H., Hamilton, S. R., Vogelstein, B., and Kinzler, K. W. Gene expression profiles in normal and cancer cells. Science (Washington, DC), 276: 1268–1272, 1997.
10. Lal, A., Lash, A. E., Altschul, S. F., Velculescu, V., Zhang, L., McLendon, R. E., Marra, M. A., Prange, C., Morin, P. J., Polyak, K., Papadopoulos, N., Vogelstein, B., Kinzler, K. W., Strausberg, R. L., and Riggins, G. J. A public database for gene expression in human cancers. Cancer Res., 59: 5403–5407, 1999.
11. Nacht, M., Ferguson, A. T., Zhang, W., Petroziello, J. M., Cook, B. P., Yu, H. G., Maguire, S., Riley, D., Coppola, G., Landes, G. M., Madden, S. L., and Sukumar, S. Combining serial analysis of gene expression and array technologies to identify genes differentially expressed in breast cancer. Cancer Res., 59: 5464–5470, 1999.
12. Wheeler, D. L., Chappey, C., Lash, A. E., Leipe, D. D., Madden, T. L., Schuler, G. D., Tatusova, T. A., and Rapp, B. A. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 28: 10–14, 2000.
13. Strausberg, R. L., Dahl, C. A., and Klausner, R. D. New opportunities for uncovering the molecular basis of cancer. Nat. Genet., Spec No 17: 415–416, 1997.
14. Schmitt, A. O., Specht, T., Beckmann, G., Dahl, E., Pilarsky, C. P., Hinzmann, B., and Rosenthal, A. Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues. Nucleic Acids Res., 27: 4251–4260, 1999.
15. Szallasi, Z. Bioinformatics. Gene expression patterns and cancer. Nat. Biotechnol., 16: 1292–1293, 1998.
16. Klemsz, M., Hromas, R., Raskind, W., Bruno, E., and Hoffman, R. PE-1, a novel ETS oncogene family member, localizes to 1q21–q23. Genomics, 20: 291–294, 1994.
17. Vaarala, M. H., Porvari, K. S., Kyllonen, A. P., Mustonen, M. V., Lukkarinen, O., and Vihko, P. T. Several genes encoding ribosomal proteins are over-expressed in prostate-cancer cell lines: confirmation of L7a and L37 over-expression in prostate-cancer tissue samples. Int. J. Cancer, 78: 27–32, 1998.
18. Joseph, J., Cardesa, A., and Carreras, J. Creatine kinase activity and isoenzymes in lung, colon and liver carcinomas. Br. J. Cancer, 76: 600–605, 1997.
19. Cagir, B., Gelmann, A., Park, J., Fava, T., Tankelevitch, A., Bittner, E. W., Weaver, E. J., Palazzo, J. P., Weinberg, D., Fry, R. D., and Waldman, S. A. Guanylyl cyclase C messenger RNA is a biomarker for recurrent stage II colorectal cancer. Ann. Intern. Med., 131: 805–812, 1999.
20. Elek, J., Pinzon, W., Park, K. H., and Narayanan, R. Relevant genomics of neurotensin receptor in cancer. Anticancer Res., 20: 53–58, 2000.


Cancer Gene Discovery Using Digital Differential Display

Daniela Scheurle, Maurice Phil DeYoung, David M. Binninger, et al.

Cancer Res 2000;60:4037-4043.
