Oncogene (2001) 20, 4402 ± 4408 ã 2001 Nature Publishing Group All rights reserved 0950 ± 9232/01 $15.00 www.nature.com/onc SHORT REPORTS Identi®cation of a novel , CDCP1, overexpressed in human colorectal cancer

Marwa Scherl-Mostageer1, Wolfgang Sommergruber1, Roger Abseher1, Rudolf Hauptmann1, Peter Ambros2 and Norbert Schweifer*,1

1Research and Development, Dept. of Exploratory Research, Boehringer Ingelheim Austria, Vienna 1121 Austria; 2CCRI, St. Anna Kinderspital, Vienna 1080, Austria

We report the identi®cation of a novel human tumor 2000), including serial analysis of associated gene, CDCP1 (Cub Domain Containing (SAGE) (Velculescu et al., 1995; Ishii et al., 2000; ), which was identi®ed using representational Stollberg et al., 2000; Parle-McDermott et al., 2000), di€erence analysis and cDNA chip technology. The gene di€erential display (Liang and Pardee, 1992), represen- consists of eight exons, the upstream region of which tational di€erence analysis (RDA) (Hubank and neither contains a TATA- nor a CCAAT-box. However, Schatz, 1994; Diatchenko et al., 1996; Hubank and a CpG island is located around the transcription start, Schatz, 1999), and cDNA chip technology (Brown and which is found in approximately 60% of known . Botstein, 1999; Afshari et al., 1999; Kurian et al., 1999). The CDCP1 gene was mapped to 3p21-p23 In this study we describe the use of RDA to identify by ¯uorescence in situ hybridization. For expression di€erentially regulated genes in cancer compared to pro®ling real time quantitative RT ± PCR was performed normal tissue. Using A549 (ATCC: CCL-185), a using cell lines and laser capture microdissected colon human lung cancer cell line, as tester and normal lung cancer biopsies. CDCP1 mRNA is approximately 6 kb as driver for subtraction approximately 700 clones were and highly overexpressed in human colon cancer and lung ampli®ed exponentially, subcloned into vectors, se- cancer. CDCP1 represents a putative transmembrane quenced and annotated with bioSCOUTTM (Lion protein, containing three CUB domains in the extra- Bioscience). Half of the clones coded for genes with cellular part most likely involved in cell adhesion or known function, the rest gave no signi®cant hits to interacting with the extracellular matrix. Oncogene sequences in public data bases and in an e€ort to (2001) 20, 4402 ± 4408. further investigate these ESTs all of them were arrayed on glass slides together with selected IMAGE clones of Keywords: representational di€erence analysis; cDNA known genes (Figure 1). chip technology; laser microdissection; CUB domain; It is known that a number of genes have similar CDCP1 functions in di€erent types of solid tumors. In an attempt to identify such `tumor associated genes', consistently upregulated within solid tumors, we performed cDNA chip hybridizations using a variety Cancer is a genetic disease that involves clonal of di€erent tumor types. evolution of transformed cells (Fialkow, 1976; Nowell, Interestingly DNA chip analysis demonstrated that 1976) resulting from an accumulation of mutations some ESTs are not only signi®cantly expressed in either inherited (germline) or acquired (somatic), in human lung carcinomas but are even higher expressed critical proto-oncogenes and tumor suppressor genes. in cell lines derived from colon carcinomas (Figure Each mutation may provide additional growth advan- 1a,b). EST 1, EST 2 and EST 3 gave similar tage to the transformed cell as they dominate their transcription patterns and subsequently were revealed normal counterparts (Fearon and Vogelstein, 1990; to be partially overlapping (EST 2 and EST 3 are not Sidransky et al., 1992). Changes in the transcriptional shown). The alignment resulted in a consensus level of genes in cancer tissues can lead to upregulation sequence of 843 bp and was designated CUB Domain of ; understanding these changes may provide Containing Protein 1, CDCP1. the basis to de®ne new targets for therapeutical The BLAST algorithm (Altschul et al., 1997) was interventions. Various methods are available for used to search for EST homologues in the dbest detecting and quantifying gene expression levels (To, database. Several ESTs (AA357574, AI862284, M77904, etc.) were found, which expanded the sequence information from 843 bp to approximately 1500 bp. The full-length sequence was ampli®ed as *Correspondence: N Schweifer; demonstrated in Figure 2. DNA containing a portion E-mail: [email protected] Received 9 February 2001; revised 6 April 2001; accepted 19 April of the promoter region for the human CDCP1 gene 2001 was isolated using a human Promoter Finder DNA CDCP1 gene in human colorectal cancer M Scherl-Mostageer et al 4403 human genomic DNA as well as Colo 205 cDNA (ATCC: CCL-222) using intron spanning primers. A total of eight exons were identi®ed (Figure 2). The chromosomal localization of the human CDCP1 gene was studied using a double target FISH assay on human metaphase . The two DNA probes CDCP1 and the 3pter speci®c probe B47a2 were co-hybridized on metaphases from two normal individuals. The majority of the chosen metaphases showed signals on one or both chromatids in the region p21-p23 of as visualized by the co-localization of the B47a2 (TRITC) on the same chromosomal arm (Figure 3). This is very interesting because CDCP1 shows an overexpression in an area usually associated with allele loss in other human tumors (Mitelman et al., 1997). Northern blot hybridizations were performed to determine the transcript length and to study the transcription pattern of CDCP1 in di€erent tissues and cell lines. Northern blot analysis revealed an RNA transcript of approximately 6 kb in several tissues and cell lines examined (Figure 4). Hybridization of blots with eight di€erent cancer cell lines shows a transcript of the novel gene in colon adenocarcinoma SW480, lung carcinoma cells A549 and K-562, a chronic myelogenous leukemia cell line (Figure 4a). To further characterize the transcription pattern of CDCP1, multiple tissue Northerns from normal and cancer tissues were hybridized and showed a very hetero- geneous transcription pro®le. Highest transcription level was found in samples derived from colon cancer tumors whereas RNA from normal colon tissue shows moderate expression only (Figure 4b). Furthermore analysis of normal tissues gave, with the exception of pancreas, weak positive signals in esophagus, stomach, rectum, prostate, placenta and peripheral blood lymphocytes (PBL). However, it should be considered that these ®lters contain RNA derived from whole tissue sections and therefore may contain transcrip- tional informations of in®ltrating cells such as lymphocytes, ®broblasts and endothelial cells derived Figure 1 Approximately 1300 IMAGE clones of known and from blood vessels as described by Hanahan and unknown genes together with 120 selected ESTs, derived from the Weinberg (2000). To compare the transcriptional subtracted lung cancer library, were arrayed on glass slides. These patterns of CDCP1 between di€erent human cancer cDNA chips were hybridized with Cy3 labeled cDNA either from individual normal tissues or tumor tissues or cell lines each and normal tissues as well as within di€erent cell lines, together with a Cy5 labeled cDNA derived from a mixture of 10 real time RT ± PCR was performed. Ampli®cations of normal tissues (bone marrow, heart, kidney, liver, lung, pancreas, human GAPDH, û-tubulin and/or û-actin cDNA of skeletal muscle, spleen, thymus, placenta). The Cy5 signal was comparable size were used as internal standards to used for normalizing intensities. (a) Relative expression pro®les of normalize expression values of CDCP1 mRNA (Figure EST 1 in individual lung cancer tissues and tumor cell lines as performed by cDNA chip analysis. (b) Relative expression pro®le 5a,b). The results recon®rm the expression data of EST 1 in normal colon tissue and colon tumor cell lines by obtained by Northern analysis showing highest expres- cDNA chip analysis sion of CDCP1 in colon and lung adenocarcinoma. Low expression can be observed in normal colon, lung, testis and invasive ductal carcinomas (IDC). Since the Walking Kit (Clontech). The DNA was sequenced and heterogeneous nature of tissue samples often makes bioinformatically analysed. In an e€ort to map the interpretation dicult, laser capture microdissection precise position of the transcription start site, a primer (LCM) was used to isolate homogenous populations of extension reaction was performed. cancer cells from clinical tissue specimens for real time Next we analysed the putative gene structure of quantitative RT ± PCR analysis (Bornstein et al., 1998). CDCP1. The intron/exon boundaries were determined Enriched cancer cells from tumor tissues gave clear by direct sequencing of PCR products ampli®ed from evidence that CDCP1 is highly expressed in all tumor

Oncogene CDCP1 gene in human colorectal cancer M Scherl-Mostageer et al 4404

Figure 2 Nucleotide sequence of the coding and 5' ¯anking region, and the deduced amino acid sequence of the CDCP1 gene. Using RNA isolated from the non small lung cell carcinoma cell line Calu 6 (ATCC: HTB-56), a LoneLinker cDNA library was constructed (Abe, 1992). PCR ampli®cation with a CDCP1 speci®c primer (5'-AGTCCATGTGAACAAGTTGAGG-3') and a LoneLinker speci®c primer (5'-GAGATATTAGAATTCTACTC-3'), resulted in a 6 kb fragment. DNA containing a portion of the promoter region for the human CDCP1 gene was isolated using a human Promoter Finder DNA Walking Kit (Clontech) and the gene speci®c primer 5'-AGAACCCCTAGCAGTGCGATAGAGAC-3'. To determine the precise transcription start side primer extension (Sambrook et al., 1989) was performed using 25 mg total RNA from the human cell line Colo 205 as template. A 35S- labeled sequencing ladder generated from a 363 bp PCR fragment (nt 800 ± 1162) was used as a length standard. The elongation of the 32P-labeled reverse primer (positioned from nt 209 ± 183) resulted in a fragment of 209 nucleotides in length (data not shown). The initiation site of transcription, ¯anked by a cap signal consensus (GCAGCTGC), as well as the Kozak-type start of translation (ATG) and stop codon (TAA) are marked with gray boxes. The positions of the introns are marked by black triangles. The upstream genomic sequence contains no TATA or CCAAT box, however a GC-box as well as recognition sites for the rare cutter enzyme BssHII are indicated in the upstream region and in exon 1. Together with the fact that the 5' region has a GC content over 60% indicates that CDCP1 contains a CpG island. The putative transmembrane-spanning domain is underlined and 12 potential N- glycosylation sites are indicated in bold italic

Oncogene CDCP1 gene in human colorectal cancer M Scherl-Mostageer et al 4405

Oncogene CDCP1 gene in human colorectal cancer M Scherl-Mostageer et al 4406

Figure 3 Chromosomal localization of human CDCP1. The human CDCP1 probe (nt 281 ± 2532) was nick translated with digoxigenin 711-dUTP and hybridized in situ at a ®nal concentration of 20 ng/ml together with biotin labeled B47a2 (kindly provided by Dr L Kearney) located at the sub-telomeric region of chromosome arm 3p (Knight et al., 1997) to metaphases from two normal individuals (Lichter et al., 1988). The hybridized digoxigenin probes were detected with sheep anti dig (Boehringer Mannheim, Germany) and rabbit anti sheep antibodies both labeled with FITC. The biotin labeled probes were detected with mouse anti biotin and rabbit anti mouse (TRITC), and then counter-stained with DAPI. The CDCP1 gene could be assigned to the chromosomal region 3p21-23

Figure 5 Real time quantitative RT ± PCR of CDCP1. 3 mgof total RNA isolated from di€erent tissues and cell lines were used for ®rst strand cDNA synthesis with an oligo dT primer (Promega) according to manufacturer's recommendations (Gibco ± BRL). 1/50 of the RT reaction was used as a template Figure 4 Expression analysis of CDCP1. Commercially available for cDNA ampli®cation with CDCP1 speci®c primers 5'- Northern blots containing 2 mg of mRNA from cell lines (a) and TTGAATTC-ACTGTGTGGAGCC-3' and 5'-TGCAGGCAA- human tissues (b), were hybridized with a 317 bp cDNA fragment CAGTGATGTC-3' together with the labeled probe 5'-6-carboxy- from the coding region of CDCP1 (spanning nucleotide 827 ± ¯uorescein-ATTGGCCTTCCTTAGGCTGGCTAC- 6 -carboxyte- 1144). All DNA fragments were 32P-labeled by random priming tramethyl-rhodamine-3' designed to anneal to unique sequences using the RTS Prime DNA labeling System (Gibco ± BRL). Lanes found in the 3' untranslated region of the CDCP1 mRNA. Real are (HL-60) Promyelocytic leukemia, (S3) HeLa cell, (K-562) time PCR was performed using PRISM 7700, ABI System. Chronic myelogenous leukemia, (MOLT-4) Lymphoblastic leuke- Expression pro®les were determined in di€erent cancer and mia, (Raji) Burkitt's lymphoma Raji, (SW480) Colorectal normal tissues normalized to û-actin (a) and û-tubulin (b) adenocarcinoma, (A549) Lung carcinoma and (G361) Melanoma. mRNA. Also expression pro®les from microdissected colon Arrows indicate the position of the 6 kb CDCP1-transcript. High cancer biopsies (black bars) compared to controls from normal levels of CDCP1 mRNA are observed in SW480, A549 and in colon samples (white bars) as well as from colon cancer cell lines colon cancer tissue. Low level of CDCP1 expression is observed (gray bars) normalized to GAPDH mRNA were performed (c). in normal colon, normal stomach, normal rectal and tumor tissue. CDCP1 is signi®cantly upregulated in colon adenocarcinoma and No expression is detected in other cell lines examined including moderately expressed in lung cancer and mamma IDC compared HeLa, Burkitt's lymphoma and melanoma G361 to normal tissues

Oncogene CDCP1 gene in human colorectal cancer M Scherl-Mostageer et al 4407 The ORF extends 2511 bp and encodes a protein of 836 amino acids. The deduced primary amino acid structure of the CDCP1 protein is shown in Figure 6. Statistical analysis of the amino acid sequence reveals that the CDCP1 protein exhibits a transmembrane domain (amino acid residues 666 ± 691). A signal peptide (residues 1 ± 29) was identi®ed which is characteristically for many membrane proteins Figure 6 The modular architecture of CDCP1. Low complexity (Wickner and Lodish, 1985). Twelve potential sites regions and regions of compositional bias of the protein were for N-glycosylation in the predicted CDCP1 protein identi®ed using SEG (Wootton and Federhen, 1996) and BIASDB. The transmembrane segment was identi®ed using SAPS sequence at position 39, 122, 180, 205 213, 270, 310, (Brendel et al., 1992) (statistical analysis of protein sequences, 339, 386, 477, 512 and 577 are located exclusively in part of bioSCOUTTM) as well as HMMTOP (Tusnady and the extracellular domain. Simon, 1998) and TMHMM (Sonnhammer et al., 1998). The Three potential CUB domains (Bork and Beckmann, signal peptide was identi®ed using the SignalP program (Nielsen et al., 1997). Putative N-glycosylation sites were identi®ed via a 1993; Romero et al., 1997) were identi®ed in the N- ProSearch (bioSCOUTTM) of the PROSITE database of regular terminal part (approximately residue ranges 220 ± 350, expressions. For inference of the domain structure the pattern 425 ± 535 and 545 ± 660) using sequence comparison, databases Pfam (Bateman et al., 2000) and SMART (Schultz et fold recognition and secondary structure prediction. al., 2000) were used. The SMART `schnipsel' database was CUB domains are structurally related to immunoglo- instrumental in identifying the ®rst CUB domain. Fold recogni- tion (ProCeryon software package from ProCeryon Biosciences) bulin domains. CUB domain-containing proteins are was employed for validation of the marginal components and activators of the mammalian comple- to known CUB domains in case of the second and third CUB ment system as well as proteases responsible for the domain. Secondary structure prediction was performed with the activation of trypsinogen and cleavage of collagen and DSC program (King and Sternberg, 1996). The translation shows the product of the 836 AA long open reading frame. The ®bronectin. Other members are the vertebrate bone extracellular domain is shown in gray, the transmembrane helix morphogenetic protein (BMP-1) and proteins involved (TM) in black and the intracellular region of CDCP1 protein in in cell adhesion or interacting with the extracellular white. Cross-hatched boxes indicate three CUB domains in the matrix. A majority of CUB domain-containing proteins extracellular part. Asterisks indicate the location of 12 potential is developmentally regulated (Bork and Beckmann, N-glycosylation sites, the arrow indicates a hexalysine stretch and a proline rich region is shown as a hatched box 1993). A recent publication demonstrates, that CUB domain containing proteins are the most di€erentially regulated proteins in C. elegans larval development (Gerstein and Jansen, 2000). Since genes that play key cells of colon cancer patients. In summary the data functions in embryonic development often ful®l correlate well with the results obtained from Northern analogous roles in cancer (Bhatia-Gaur et al., 1999; blots and con®rm that CDCP1 is signi®cantly Cangi et al., 2000), one could speculate that upregula- upregulated in colon cancer (Figure 5c). Bearing in tion of CDCP1 may modulate the cell substrate mind the fact that CDCP1 maps to 3p21-23, a region adhesion or the interaction with the extracellular that has been shown to be deleted in other human matrix. A hexalysine stretch (692 ± 697) may confer tumors, CDCP1 is potentially an interesting protein, capability for unspeci®c electrostatic interactions and because it could be predicted that a gene within this thus reinforce speci®c interactions possibly performed region is down regulated. However, whether CDCP1 is by the remainder of the intracellular part. Taken ampli®ed in colon cancer needs to be veri®ed by together, particularly in view of the function performed additional genomic studies. by other CUB domain containing proteins, one may The complete nucleotide sequence of the coding speculate that CDCP1 is a transmembrane protein region as well as 200 bp genomic region upstream of playing a role in communication, interaction or the transcription start site and the predicted amino acid signaling with extracellular components or ligands in sequence are given in Figure 2. The cap site motif a way that is not yet understood. Although CDCP1 is CGCAGCTGC indicates the transcription initiation upregulated in colon and lung cancer, that does not site (Bucher, 1990). The derived CDCP1 mRNA is necessarily imply its role in oncogenesis, since di€er- 5952 nucleotide in length, excluding the poly A tail. ential gene expression could simply be an e€ect and not The open reading frame (ORF) starts with the Kozak- a cause of the carcinogenic process. Further investiga- type (GTCATGG) translation initiation site (Kozak, tions have to be done to elucidate the physiological 1987) and ends at a stop codon at position 2591 ± 2593. function of this novel protein. The 200 bp upstream sequence to the proposed transcription start site of CDCP1 does not contain a TATA or a CCAAT box. However, a GC-box as well Acknowledgments as recognition sites for the rare cutter enzyme BssHII We thank H Dauwa-Stummer and A Gupta for excellent were found in the upstream region and in exon 1. The technical assistance and R Varecka for DNA sequencing. fact that the 5' region has a CG content over 60% We would like to acknowledge K Zatloukal (Dept. of indicates that CDCP1 contains a CpG island (Bird, Pathology, University of Graz) for generously providing 1986; Cross et al., 2000). the colon cancer patient samples and we want to thank

Oncogene CDCP1 gene in human colorectal cancer M Scherl-Mostageer et al 4408 L Wightman for critically reading the manuscript. Last but accession number is AY026461). We would also like to not least we want to thank KK Wilgenbus, G Adolf and M mention that while writing this publication two other Baccarini for helpful discussions. sequences, overlapping with our gene were submitted to the public database (AK026622 and AK023834). Note added in proof The novel cDNA sequence, published here has been submitted to the GenBank sequence data bank (the

References

Abe K. (1992). Mamm. Genome, 2, 252 ± 259. King RD and Sternberg MJ. (1996). Protein Sci., 5, 2298 ± Afshari CA, Nuwaysir EF and Barrett JC. (1999). Cancer 2310. Res., 59, 4759 ± 4760. Knight SJ, Horsley SW, Regan R, Lawrie NM, Maher EJ, Altschul SF, Madden TL, Scha€er AA, Zhang J, Zhang Z, Cardy DL, Flint J and Kearney L. (1997). Eur. J. Hum. Miller W and Lipman DJ. (1997). Nucleic Acids Res., 25, Genet., 5, 1±8. 3389 ± 3402. Kozak M. (1987). Nucleic Acids Res., 15, 8125 ± 8148. Bateman A, Birney E, Durbin R, Eddy SR, Howe KL and Kurian KM, Watson CJ and Wyllie AH. (1999). J. Pathol., Sonnhammer EL. (2000). Nucleic Acids Res., 28, 263 ± 266. 187, 267 ± 271. Bhatia-Gaur R, Donjacour AA, Sciavolino PJ, Kim M, Liang P and Pardee AB. (1992). Science, 257, 967 ± 971. DesaiN,YoungP,NortonCR,GridleyT,Cardi€RD, LichterP,CremerT,BordenJ,ManuelidisLandWardDC. Cunha GR, Abate-Shen C and Shen MM. (1999). Genes (1988). Hum. Genet., 80, 224 ± 234. Dev., 13, 966 ± 977. Mitelman F, Mertens F and Johansson B. (1997). Nat. Bird AP. (1986). Nature, 321, 209 ± 213. Genet., 15, 417 ± 474. Bork P and Beckmann G. (1993). J. Mol. Biol., 231, 539 ± NielsenH,EngelbrechtJ,BrunakSandvonHeijneG. 545. (1997). Protein Eng, 10, 1±6. Bornstein SR, Willenberg HS and Scherbaum WA. (1998). Nowell PC. (1976). Science, 194, 23 ± 28. Med. Klin., 93, 739 ± 743. Parle-McDermott A, McWilliam P, Tighe O, Dunican D and BrendelV,BucherP,NourbakhshIR,BlaisdellBEand Croke DT. (2000). Br. J. Cancer, 83, 725 ± 728. Karlin S. (1992). Proc. Natl. Acad. Sci. USA, 89, 2002 ± Romero A, Romao MJ, Varela PF, Kolln I, Dias JM, 2006. Carvalho AL, Sanz L, Topfer-Petersen E and Calvete JJ. Brown PO and Botstein D. (1999). Nat. Genet., 21, 33 ± 37. (1997). Nat. Struct. Biol., 4, 783 ± 788. Bucher P. (1990). J. Mol. Biol., 212, 563 ± 578. Sambrook J, Fritsch EF and Maniatis T. (1989). Molecular Cangi MG, Cukor B, Soung P, Signoretti S, Moreira Jr G, Cloning: A laboratory Manual 2nded.,ColdSpring Ranashinge M, Cady B, Pagano M and Loda M. (2000). Harbor, Laboratory Press, Cold Spring Harbor, NY, J. Clin. Invest, 106, 753 ± 761. USA. Cross SH, Clark VH, Simmen MW, Bickmore WA, Maroon Schultz J, Copley RR, Doerks T, Ponting CP and Bork P. H, Langford CF, Carter NP and Bird AP. (2000). Mamm. (2000). Nucleic Acids Res., 28, 231 ± 234. Genome, 11, 373 ± 383. Sidransky D, Mikkelsen T, Schwechheimer K, Rosenblum Diatchenko L, Lau YF, Campbell AP, Chenchik A, ML, Cavanee W and Vogelstein B. (1992). Nature, 355, Moqadam F, Huang B, Lukyanov S, Lukyanov K, 846 ± 847. Gurskaya N, Sverdlov ED and Siebert PD. (1996). Proc. Sonnhammer EL, von Heijne G and Krogh A. (1998). Ismb., Natl. Acad. Sci. USA, 93, 6025 ± 6030. 6, 175 ± 182. Fearon ER and Vogelstein B. (1990). Cell, 61, 759 ± 767. StollbergJ,UrschitzJ,UrbanZandBoydCD.(2000). Fialkow PJ. (1976). Biochim. Biophys. Acta, 458, 283 ± 321. Genome Res., 10, 1241 ± 1248. Gerstein M and Jansen R. (2000). Curr. Opin. Struct. Biol., To KY. (2000). Comb. Chem. High Throughput. Screen., 3, 10, 574 ± 584. 235 ± 241. Hanahan D and Weinberg RA. (2000). Cell, 100, 57 ± 70. Tusnady GE and Simon I. (1998). J. Mol. Biol., 283, 489 ± Hubank M and Schatz DG. (1994). Nucleic Acids Res., 22, 506. 5640 ± 5648. Velculescu VE, Zhang L, Vogelstein B and Kinzler KW. Hubank M and Schatz DG. (1999). Meth. Enzymol., 303, (1995). Science, 270, 484 ± 487. 325 ± 349. Wickner WT and Lodish HF. (1985). Science, 230, 400 ± 407. Ishii M, Hashimoto S, Tsutsumi S, Wada Y, Matsushima K, Wootton JC and Federhen S. (1996). Meth. Enzymol., 266, Kodama T and Aburatani H. (2000). Genomics, 68, 136 ± 554 ± 571. 143.

Oncogene