[CANCER RESEARCH 60, 4037–4043, August 1, 2000]

Advances in Brief

Cancer Gene Discovery Using Digital Differential Display¹

Daniela Scheurle, Maurice Phil DeYoung, David M. Binninger, Holly Page, Mohammad Jahanzeb, and Ramaswamy Narayanan²

Center for Molecular Biology and Biotechnology and Department of Biology, Florida Atlantic University, Boca Raton, Florida 33431 [D. S., M. P. D., D. M. B., H. P., R. N.], and Eugene M. and Christine E. Lynn Clinical Research Center, Boca Raton Community Hospital Cancer Center, Boca Raton, Florida 33486 [M. J.]

Abstract

The Cancer Gene Anatomy Project database of the National Cancer Institute has thousands of expressed sequences, both known and novel, in the form of expressed sequence tags (ESTs). These ESTs, derived from diverse normal and tumor cDNA libraries, offer an attractive starting point for cancer gene discovery. Using a data-mining tool called Digital Differential Display (DDD) from the Cancer Gene Anatomy Project database, ESTs from six different solid tumor types (breast, colon, lung, ovary, pancreas, and prostate) were analyzed for differential expression. An electronic expression profile and chromosomal map position of these hits were generated from the Unigene database. The hits were categorized into major classes of genes including ribosomal proteins, enzymes, cell surface molecules, secretory proteins, adhesion molecules, and immunoglobulins and were found to be differentially expressed in these tumor-derived libraries. Genes known to be up-regulated in prostate, breast, and pancreatic carcinomas were discovered by DDD, demonstrating the utility of this technique. Two hundred known genes and 500 novel sequences were discovered to be differentially expressed in these select tumor-derived libraries. Test genes were validated for expression specificity by reverse transcription-PCR, providing a proof of concept for gene discovery by DDD. A comprehensive database of hits can be accessed at http://www.fau.edu/cmbb/publications/cancergenes.htm. This solid tumor DDD database should facilitate target identification for cancer diagnostics and therapeutics.

Introduction

With the expected completion of the human genome sequencing efforts in the next few years, over 100,000 new genes are likely to be discovered (1–3). From these vast numbers of new genes, new diagnostic and therapeutic targets for diseases like cancer are predicted to emerge (4). Only a subset of genes is expressed in a given cell, and the level of expression governs function. High-throughput gene expression technology is becoming a possibility for analyzing expression of a large number of sequences in diseased and normal tissues with the use of microarrays and gene chips (5–8). A parallel way to initiate a search for genes relevant to cancer diagnostics and therapy is to data mine the sequence database (9–13). A large number of expressed sequences from diverse organ-, species-, and disease-derived cDNA libraries are being deposited in the form of ESTs³ in different databases.

The CGAP database of the NCI is an attractive starting point for cancer-specific gene discovery (13). The Human Tumor Gene Index was initiated by the NCI in 1997 with a primary goal of identifying genes expressed during development of human tumors in five major cancer sites: (a) breast; (b) colon; (c) lung; (d) ovary; and (e) prostate. This database consists of expression information (mRNA) of thousands of known and novel genes in diverse normal and tumor tissues. By monitoring the electronic expression profile of many of these sequences, it is possible to compile a list of genes that are selectively expressed in the cancers. Data-mining tools are becoming available to extract expression information about the ESTs derived from various CGAP libraries (9, 10, 12, 14).

Currently, there are 1.5 million ESTs in the CGAP database, of which 73,000 are novel sequences. These sequences are also subclassified into those derived from libraries of normal, precancerous, or cancer tissues. We chose the DDD at the CGAP database to identify genes (both novel and known ESTs) that are selectively up- or down-regulated in six major solid tumor types (breast, colon, lung, ovary, pancreas, and prostate). Survey sequencing of mRNA gene products can provide an indirect means of generating gene expression fingerprints for cancer cells and their normal counterparts. DDD is a computer method of comparing these fingerprints. DDD is a quantitative method that enables the user to determine the fold differences between the libraries being compared, using a statistical method to quantitate the transcript levels. ESTs present in tumor-derived libraries were compared against all other libraries or against the corresponding normal libraries by DDD, and the hits showing >10-fold differences were compiled for each of the organ types. These hits were functionally classified into major classes of proteins. Genes belonging to ribosomal proteins, enzymes, receptors, binding proteins, secretory proteins, and cell adhesion molecules were identified to be differentially expressed in these tumor types. A comprehensive database of hits was created, providing additional electronic expression data as well as novel ESTs that were thus identified. This database can be accessed on the World Wide Web.⁴

Materials and Methods

Data-mining of CGAP Database. The CGAP database was accessed,⁵ and the DDD tool was used according to the database instructions. DDD takes advantage of the UniGene database by comparing the number of times ESTs from different libraries were assigned to a particular UniGene cluster. Six different solid tumor-derived EST libraries (breast, colon, lung, ovary, pancreas, and prostate) with corresponding normal tissue-derived libraries were chosen for DDD (N = 110). To identify tumor- and organ-specific ESTs, all the other organ- and tumor-derived EST libraries (N = 327) were chosen for comparison with each of the six tumor types. The nature of the libraries (normal, pretumor, or tumor) was authenticated by comparison of the CGAP data with UniGene database.⁶ Those few libraries showing discrepancies of definition between the two databases were excluded. The DDD was performed for each organ type individually. DDD was performed using ESTs from tumors (pool A) and corresponding normal organ (pool B) for DDD2 method or

Received 2/2/00; accepted 6/8/00.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¹ Supported by an institutional startup grant (to R. N.). D. S. was supported by a grant from the Boca Raton Community Hospital Foundation. Colon tumor and normal tissues were provided by the Cooperative Human Tissue Network, which is funded by the National Cancer Institute.

² To whom requests for reprints should be addressed, at Center for Molecular Biology and Biotechnology, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431. Phone: (561) 297-2018; Fax: (561) 297-2099; E-mail: [email protected].

³ The abbreviations used are: EST, expressed sequence tag; CGAP, Cancer Gene Anatomy Project; DDD, Digital Differential Display; RT, reverse transcription; Hs., human sequence; NCI, National Cancer Institute.

⁴ http://www.fau.edu/cmbb/publications/cancergenes.htm.

⁵ http://www.cgap.gov.

⁶ http://www.ncbi.nlm.nih.gov/UniGene/.
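The cluster-count comparison described under Data-mining of CGAP Database can be illustrated with a minimal sketch. It assumes each pool is summarized as a mapping from a cluster label to the number of ESTs assigned to that cluster, and it reports clusters whose EST fraction in pool A exceeds that in pool B by more than 10-fold. The text states only that DDD uses "a statistical method"; the one-sided Fisher exact test used here, the pseudocount for clusters absent from pool B, the function names, and the toy counts are all assumptions made for illustration and are not taken from the CGAP tool.

```python
from math import comb


def fold_difference(count_a, total_a, count_b, total_b, pseudocount=0.5):
    """Fold difference of a cluster's EST fraction in pool A relative to pool B.

    The pseudocount (an assumption of this sketch) replaces a zero count in
    pool B so that the ratio stays finite.
    """
    adjusted_b = count_b if count_b > 0 else pseudocount
    return (count_a * total_b) / (adjusted_b * total_a)


def fisher_one_sided(count_a, total_a, count_b, total_b):
    """One-sided Fisher exact p-value: P(X >= count_a) under the hypergeometric null."""
    population = total_a + total_b   # all ESTs in both pools
    successes = count_a + count_b    # ESTs assigned to this cluster
    upper = min(successes, total_a)
    tail = sum(
        comb(successes, k) * comb(population - successes, total_a - k)
        for k in range(count_a, upper + 1)
    )
    return tail / comb(population, total_a)


def ddd_hits(counts_a, counts_b, min_fold=10.0):
    """Return {cluster: (fold, p_value)} for clusters enriched in pool A."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    hits = {}
    for cluster, count_a in counts_a.items():
        count_b = counts_b.get(cluster, 0)
        fold = fold_difference(count_a, total_a, count_b, total_b)
        if fold > min_fold:
            p_value = fisher_one_sided(count_a, total_a, count_b, total_b)
            hits[cluster] = (fold, p_value)
    return hits


# Toy counts with made-up cluster labels (not real UniGene identifiers).
tumor_pool = {"cluster_1": 40, "cluster_2": 10, "cluster_3": 50}
normal_pool = {"cluster_1": 2, "cluster_2": 12, "cluster_3": 86}
hits = ddd_hits(tumor_pool, normal_pool)
print(sorted(hits))
```

With these toy counts only cluster_1 passes the filter, because its fraction in the tumor pool (40 of 100) is 20 times its fraction in the normal pool (2 of 100). Swapping the two arguments would give the corresponding screen for down-regulated clusters.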
© 2000 American Association for Cancer Research.

Table 1 DDD of up-regulated genes in tumors

Hits (known ESTs) showing >10-fold differences in the indicated tumor-derived libraries in comparison with normal tissue-derived cDNA library and all other organ- and tumor-derived cDNA libraries were compiled. ESTs belonging to specific classes of genes were subclassified as indicated. UniGene number for each hit is shown. Electronic expression (E-Northern) for each of these hits was inferred from the UniGene database from the cDNA sources, and the chromosomal map position for each of these hits was inferred from the cytogenetic map.⁴

Name                                    Hs.#      Br Co Lu Ov Pa Pr

Enzymes (n = 35)
CYB561                                  153028    29
P450, subfamily XVII                    1363      26
ATP synthase, isoform 2                 155751    20
Proteosome non-ATPase, 7                155543    10 18
Glu. peroxidase 2a                      2702      17
Carbonic anhydrase I                    23118     18
Dopa decarboxylase                      150403    10
Tryptophan Hydroxylase                  144563    27
ATPase, lysosomal                       24322     13
Serine protease 9                       79361     28
Serine-like protease, 1                 69423     25 28
Myosin IXBB                             159629    17
ATPase isoform 1                        64173     15
ADH 8                                   87539     15
ATPase, 9 kD                            24322     13
PKR                                     177574    10
Cathepsin E                             1355      80
Panc. lipase                            102876    47
Ser. protease 2                         241561    39
Ser. protease 1                         241395    34
Elastase 1                              21        38
Carboxy peptidase A2                    89717     34
Carboxy peptidase A1                    2879      33
Elastase 3                              181289    32
Chymotrypsinogen B1                     74502     32
Pancreatic lipase-related protein 1     73923     30
Carboxy peptidase B1                    180884    28
DNase II                                118243    14
Elastase 3B                             183864    13
Urokinase                               77274     13

Ribosomal proteins (N = 6)
S15a                                    2953      11
S17                                     5174      11
S19                                     126707    16
L31                                     184014    10
L35                                     182825    10
L41                                     108124    10

Receptor/surface/membrane (N = 17)
Claudin                                 25640     33
Tumor necrosis factor receptor, 6b      194676    17
Plakophilin 3                           26557     11
Lymphocyte antigen complex              77667     17
Mesothelin                              155981    40
Chloride channel 1A                     84974     12
Folate receptor 1                       73769     10
CD74                                    84298     10
Keratin 19                              182265    24
CEA                                     220529    21
Keratin 17                              2785      17
Trans membrane 4 superfamily 3          84072     13
Trans membrane 4 superfamily 4          11881     17
Non-specific cross-reacting antigen     73848