Oncogene (2004) 23, 3013–3023 © 2004 Nature Publishing Group All rights reserved 0950-9232/04 $25.00 www.nature.com/onc

Identification of alternatively spliced mRNA variants related to cancers by genome-wide ESTs alignment

Lijian Hui1,6, Xin Zhang2,6, Xin Wu3, Zhixin Lin4, Qingkang Wang2, Yixue Li*,5, Gengxi Hu*,1

1State Key Laboratory of Molecular Biology, Institute of Biochemistry and Cell Biology, Room 400, Cell Building, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Yueyang Road 320, Shanghai 200031, China; 2Institute of Microelectronics, Shanghai Jiaotong University, China; 3Liver Cancer Institute, Zhongshan Hospital, China; 4School of Life Science, Shanghai Jiaotong University, China; 5Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Yueyang Road 319, Shanghai 200031, China

Several databases have been published that predict alternative splicing of mRNAs by analysing exon linkage in alignments of expressed sequence tags (ESTs) to the genome sequence; however, little effort has been made to investigate the relationship between cancers and alternative splicing. We developed a program, Alternative Splicing Assembler (ASA), to look for splicing variants of human transcripts by genome-wide EST alignment. Using ASA, we constructed the biosino alternative splicing database (BASD), which predicts splicing variants for reference sequences from the reference sequence database (RefSeq) and presents them in both graph and text formats. EST clusters that differ from the reference sequence in at least one splicing site were counted as splicing variants. Of 4322 genes screened, 3490 (81%) had at least one alternative splicing variant. To discover variants associated with cancers, the tissue sources of EST sequences were extracted from the UniLib database, and ESTs from the same tissue type were counted; these counts were regarded as indicators of gene expression level. Fisher's exact test was then used to identify alternative splicing variants whose EST counts differed significantly between cancer tissues and their counterpart normal tissues. Of 26 812 variants, 2149 (or 383 after Bonferroni correction) were predicted to be tumor-associated. Reverse transcription–PCR (RT–PCR) confirmed 11 of 13 novel alternative splicing variants, and the predicted tissue specificity of eight of nine variants, in hepatocellular carcinoma and lung cancer. The possible involvement of alternative splicing in cancer is discussed.

Oncogene (2004) 23, 3013–3023; doi:10.1038/sj.onc.1207362; published online 29 March 2004

Keywords: alternative splicing prediction; database; cancer; EST alignment

*Correspondence: G Hu or Y Li
6These authors contributed equally to this paper
Received 30 May 2003; revised 25 September 2003; accepted 11 November 2003

Introduction

One of the remarkable revelations brought by the recent description and annotation of the human genome by the International Human Genome Sequencing Consortium and the private company Celera Genomics is the prediction that there are about 30 000–40 000 gene transcripts in humans (Lander et al., 2001; Venter et al., 2001), although only 60% of these were consistent across different data sources (Hogenesch et al., 2001). Discussions of how this limited number of genes produces an almost inestimably complex range of functions have focused on transcriptional regulation and alternative splicing (Black, 2000; Claverie, 2001), in addition to post-translational modifications such as phosphorylation and glycosylation, which presumably contribute to the diversified functions of proteins.

Alternative splicing is an important mechanism in higher eukaryotes for producing proteomic complexity: approximately 30–65% of genes are alternatively spliced, as estimated from genomically aligned ESTs (Mironov et al., 1999; Brett et al., 2000; Kan et al., 2002). Combining these mechanisms, the human organism could conceivably produce hundreds of thousands of different proteins from an estimated 35 000 genes.

Alternative splicing of pre-mRNA is a versatile mechanism for regulating gene function at the post-transcriptional level. A survey of mutations in splicing junctions indicated that as many as 15% of the single base-pair mutations that cause human genetic diseases result in splicing defects (Krawczak et al., 1992). The differential expression of alternatively spliced genes in cancer tissue has been well documented. For example, Bcl-x, a member of the Bcl-2 family of apoptosis regulators, has two alternative isoforms, Bcl-xL and Bcl-xS, resulting from alternative 5′ splicing of exon 2. Only Bcl-xL, believed to play a major role in carcinogenesis, is overexpressed in small cell lung carcinoma (Reeve et al., 1996) and in 60% of invasive breast carcinomas (Olopade et al., 1997). Another example is CD44, a membrane glycoprotein with nine variable exons. Elevated expression of CD44 variants containing variable exon 6 in solid tumor tissue, and of the soluble variant in serum, might serve as a diagnostic marker for bladder cancer (Goodison et al., 1999).

Alignment of ESTs to the genome is a potentially powerful means of predicting alternative splicing of gene transcripts. Wolfsberg and Landsman (1997) demonstrated that alternative mRNA forms could be discovered through the assembly of ESTs derived from a gene. Mironov et al. (1999) also studied alternative splicing of human genes using EST contigs from the TIGR Human Gene Index. More recently, a software tool, Transcript Assembly Program (TAP), which deduces gene structure from genomic EST alignments, was developed (Kan et al., 2001; Kan et al., 2002). Databases of alternatively spliced mRNAs, including AsMamDB, HASDB, PALS and SpliceNest, all of which are based on UniGene clusters, have been constructed (Ji et al., 2001; Modrek et al., 2001; Coward et al., 2002; Huang et al., 2002). The EST database also provides detailed information on cDNA libraries, which makes it possible to predict tissue expression. For example, BodyMap was established to acquire tissue distribution data through the collection of site-directed 3′ ESTs (Kawamoto et al., 2000), and Xu et al. (2002) developed an automated method for discovering tissue-specific regulation through a genome-wide analysis of ESTs. Since about 45% of human EST sequences are derived from cancer cells, it should be possible to identify cancer-associated mRNA variants by counting ESTs from cancer libraries and from their counterpart normal tissue libraries (Scheurle et al., 2000; Baranova et al., 2001; Xie et al., 2002; Wang et al., 2003).

Here we developed the Alternative Splicing Assembler (ASA) program, which searches for all putative alternative splicing variants through genomic EST alignment. Using this program, we constructed a human alternative splicing database based on the curated reference sequence database (RefSeq) and identified splicing variants related to cancer tissues. A total of 26 812 alternatively spliced variants from 4322 genes were included in the database, of which 2149 variants from 1827 genes were predicted to be cancer associated. Thirteen novel splicing variants and nine cancer-related variants were further tested by RT–PCR in clinical samples.

Results

ASA was created to identify alternative splicing of human gene transcripts. The first public version of ASA was made freely available at the biosino alternative splicing database (BASD), and a brief help file is provided on the download page. The current version has been successfully tested on the IRIX 6.5 system.

We randomly selected 52 genes from the alternative splicing database (ASDB) (Dralyuk et al., 2000), all of which have previously reported alternative splicing isoforms. Reference sequences for these genes were taken from the RefSeq database and searched for alternative splicing with ASA. ASA detected alternative splicing variants in 46 of these 52 genes (Table 1). Further analysis suggested that ASA detected no splicing variant in the remaining six genes probably because only a few ESTs were available for them. These six genes had on average only four EST hits, compared with 202 EST hits for the other 46 genes, and the few ESTs did not cover the splicing sites recorded in ASDB. Based on this sample, ASA has a false-negative rate of 12%; in other words, it correctly reports about 88% of alternatively spliced genes. In this respect, ASA is comparable with the SpliceNest, PALS and HASDB databases, which have false-negative rates of 15, 35 and 38%, respectively (Table 1).

EST sequences may be contaminated by genomic sequences, vector sequences and chimaeric cDNA clones. Such contamination, together with sequencing errors, may cause mistakes in EST assembly and give false splicing variants. To test the effect of these factors, we randomly selected 82 splicing variants from 67 genes and compared them with the nonredundant database of GenBank. For approximately 26% (21 of 82) of these variants, at least one homologous mRNA sequence had been reported in the nonredundant database (Table 2). Because the mRNA sequences in the nonredundant database were deposited by different laboratories, the existence of these variants is highly reliable. We further selected 13 variants, three with homologous sequences in the nonredundant database and 10 supported only by ESTs (Table 2). We then performed RT–PCR to test their presence in clinical samples of hepatocellular carcinoma and lung cancer. In total, 11 of the 13 (85%) variants were confirmed by RT–PCR and sequencing. Eight of these 13 variants were supported by only one EST, and six of those eight were confirmed.

Analysis of the alternatively spliced variants and their tissue distribution in BASD

BASD presents possible splicing variants of gene transcripts in both text and graph forms. Bar-link graphs show the distribution of exons (Figure 1), and the 5′ or 3′ end extension of a splicing variant is indicated. Although the 5′/3′ extensions were not confirmed by other experiments, they might provide additional useful information for full-length cDNA projects. A total of 4322 reference sequences were screened, and 3490 (81%) were predicted to be alternatively spliced, producing 26 812 splicing variants; on average, six splicing variants were observed for each reference sequence. Figure 2 shows the distribution of EST counts as the percentage of variants. Overall, 87% of variants contained fewer than 16 ESTs, and 58% were represented by EST singletons. If the singletons were excluded from the database, only 66% of genes were alternatively spliced, which is consistent
with previous reports (Mironov et al., 1999; Brett et al., 2000; Kan et al., 2002). Although the magnitude of the overestimate has not been assessed, the actual number of variants is likely to be smaller. This is owing to raw DNA sequencing errors and to contaminating ESTs derived from non-native splicing. In addition, EST clusters that align to a reference sequence but do not overlap each other might be counted as two independent variants (Figure 1), even though they could represent different regions of the same mRNA isoform.

Table 1 Variants predicted by ASA and other databases in 52 genes with known alternative splicing
Locus ID | RefSeq ID | Gene symbol | ASA | PALS | SpliceNest | HASDB
265(a) | NM_001142 | AMELX | 1 | 1 | 1 | 2
272 | NM_000480 | AMPD3 | 12 | 4 | 3 | 7
286 | NM_000037 | ANK1 | 13 | 4 | 6 | NA(b)
314 | NM_001158 | AOC2 | 4 | 1 | 2 | 2
361 | NM_001650 | AQP4 | 4 | NA | 2 | NA
550 | NM_012103 | AUP1 | 17 | 13 | 11 | 10
596 | NM_000633 | BCL2 | 2 | NA | 1 | NA
796(a) | NM_001741 | CALCA | 1 | NA | 3 | NA
799(a) | NM_001742 | CALCR | 1 | 2 | 2 | 1
846 | NM_000388 | CASR | 4 | 2 | 3 | 2
875 | NM_000071 | CBS | 9 | 4 | 10 | 4
968 | NM_001251 | CD68 | 6 | 3 | 4 | 3
1123 | NM_001822 | CHN1 | 12 | 2 | 4 | NA
1268(a) | NM_001840 | CNR1 | 1 | 1 | NA | 1
1356 | NM_000096 | CP | 16 | 4 | 8 | 1
1446 | NM_001890 | CSN1S1 | 5 | 5 | 1 | 4
1889 | NM_001397 | ECE1 | 5 | 1 | 9 | 2
1945 | NM_005227 | EFNA4 | 4 | 3 | 2 | 1
2002 | NM_005229 | ELK1 | 3 | 6 | 4 | 3
2192 | NM_001996 | FBLN1 | 10 | 2 | NA | 12
2352 | NM_000804 | FOLR3 | 9 | 3 | 4 | 1
2886 | NM_005310 | GRB7 | 15 | 8 | 10 | 5
2897 | NM_000830 | GRIK1 | 2 | 1 | 2 | NA
3054(a) | NM_005334 | HCFC1 | 1 | 1 | 1 | 1
3159 | NM_002131 | HMGA1 | 24 | 16 | 46 | 8
3300 | NM_006736 | DNAJB2 | 14 | 6 | 7 | 6
3897 | NM_000425 | L1CAM | 11 | 3 | 10 | 2
4297 | NM_005933 | MLL | 17 | 2 | 8 | NA
4647 | NM_000260 | MYO7A | 7 | 1 | 3 | 1
4665 | NM_005967 | NAB2 | 4 | 1 | 5 | 2
5327 | NM_000930 | PLAT | 10 | 6 | 7 | 3
5423 | NM_002690 | POLB | 11 | 5 | 9 | 6
5915 | NM_000965 | RARB | 2 | 1 | 1 | 2
6387 | NM_000609 | CXCL12 | 11 | 3 | 7 | NA
6426 | NM_006924 | SFRS1 | 13 | 6 | 5 | 3
6508 | NM_005070 | SLC4A3 | 5 | NA | 4 | 2
6546 | NM_021097 | SLC8A1 | 3 | 2 | 3 | NA
6557 | NM_000338 | SLC12A1 | 6 | NA | 2 | 2
6609 | NM_000543 | SMPD1 | 7 | 3 | 6 | 3
6672(a) | NM_003113 | SP100 | 1 | 4 | 11 | 9
6819 | NM_001056 | SULT1C1 | 5 | NA | 4 | 1
6869 | NM_001058 | TACR1 | 3 | NA | 1 | NA
7156 | NM_004618 | TOP3A | 8 | 1 | 6 | 1
7168 | NM_000366 | TPM1 | 28 | 8 | 20 | 8
7249 | NM_000548 | TSC2 | 21 | 8 | 10 | 7
8209 | NM_004649 | C21orf33 | 18 | 7 | 8 | 9
9626 | NM_005459 | GUCA1C | 3 | 1 | 2 | 2
9948 | NM_005112 | WDR1 | 17 | 9 | 11 | 12
10278 | NM_005864 | EFS | 8 | 4 | 4 | 2
11016 | NM_006856 | ATF7 | 6 | 3 | 2 | NA
22823 | NM_007358 | M96 | 12 | 8 | 6 | 12
27032 | NM_014382 | ATP2C1 | 5 | 2 | 6 | 5
(a) Genes for which ASA did not detect alternative splicing. (b) NA: not available.

In order to analyse the tissue specificity of the splicing variants, we took the source information from cDNA libraries in the National Center for Biotechnology Information (NCBI) EST database. Library tissue sources were extracted from this database and verified manually; 6593 libraries remained and were classified into 293 tissue types. A sample of 127 randomly chosen genes was used to analyse the tissue distribution of alternative splicing (Table 3). Brain and liver ranked top, with on average four and three variants per gene, respectively, whereas the average over all 17 tissue types analysed was 1.8. This observation is consistent with a previous report that brain and liver are tissue types with enriched alternative splicing (Stamm et al., 2000); however, more
EST information may help identify additional splicing variants. Interestingly, about 56% of splicing variants were found in only one tissue type. Given that about six splicing variants were observed per gene, a large fraction of splicing variants would be expected to be tissue specific.

Table 2 Confirmation of new alternative splicing by homologous mRNA and RT–PCR
Locus ID | RefSeq ID | Gene name | Variant ID | EST counts | Homolog mRNA
14 | NM_001087 | AAMP | 3 | 3 | -(a)
18 | NM_000663 | ABAT | 3 | 3 | -
19 | NM_005502 | ABCA1 | 4 | 1 | -
23 | NM_001090 | ABCF1 | 4 | 16 | AL832430
30 | NM_001607 | ACAA1 | 4 | 7 | AF035295; BC025780
34 | NM_000016 | ACADM | 4 | 3 | -
34 | NM_000016 | ACADM | 5 | 8 | -
35 | NM_000017 | ACADS | 4 | 2 | -
37 | NM_000018 | ACADVL | 2 | 10 | -
43 | NM_000665 | ACHE | 2 | 5 | NM_015831
43 | NM_000665 | ACHE | 3 | 11 | -
54 | NM_001611 | ACP5 | 7 | 5 | L29280
60 | NM_001101 | ACTB | 4 | 1 | -
72 | NM_001615 | ACTG2 | 6 | 2 | -
92 | NM_001616 | ACVR2 | 4 | 2 | -
94 | NM_000020 | ACVRL1 | 3 | 4 | L17075
94 | NM_000020 | ACVRL1 | 7 | 1 | -
102 | NM_001110 | ADAM10 | 2 | 3 | -
104 | NM_001112 | ADARB1 | 2 | 68 | NM_015833
104(b) | NM_001112 | ADARB1 | 5 | 1 | AF525422
109 | NM_004036 | ADCY3 | 3 | 7 | AK027859; BC002870
109 | NM_004037 | ADCY3 | 7 | 2 | -
123 | NM_001122 | ADFP | 4 | 44 | BC005127
125 | NM_000668 | ADH1B | 3 | 4 | -
126 | NM_000669 | ADH1C | 2 | 4 | -
126 | NM_000669 | ADH1C | 4 | 10 | -
126 | NM_000669 | ADH1C | 12 | 27 | M21692
127(b) | NM_000670 | ADH4 | 3 | 27 | -
128 | NM_000671 | ADH5 | 2 | 8 | -
130 | NM_000672 | ADH6 | 4 | 2 | AJ278908; AK092768
135 | NM_000675 | ADORA2A | 4 | 2 | -
143(b) | NM_006437 | ADPRTL1 | 3 | 6 | -
143 | NM_006437 | ADPRTL1 | 4 | 7 | -
158 | NM_000026 | ADSL | 2 | 13 | AF067854
163(b) | NM_001282 | AP2B1 | 2 | 9 | -
165 | NM_001129 | AEBP1 | 4 | 2 | -
166 | NM_001130 | AES | 3 | 3 | -
174 | NM_001134 | AFP | 3 | 3 | -
174 | NM_001134 | AFP | 7 | 4 | -
182 | NM_000214 | JAG1 | 6 | 4 | -
183 | NM_000029 | AGT | 3 | 1 | -
183(b) | NM_000029 | AGT | 6 | 20 | -
185 | NM_000685 | AGTR1 | 3 | 9 | NM_009585
190 | NM_000475 | NR0B1 | 2 | 15 | U31929
197 | NM_001622 | AHSG | 8 | 7 | -
197 | NM_001622 | AHSG | 12 | 15 | -
210 | NM_000031 | ALAD | 4 | 9 | BC000977; BC009172
211 | NM_000688 | ALAS1 | 2 | 4 | -
212 | NM_000032 | ALAS2 | 4 | 3 | BC030230
217 | NM_000690 | ALDH2 | 4 | 2 | -
218 | NM_000691 | ALDH3A1 | 3 | 7 | -
218 | NM_000691 | ALDH3A1 | 4 | 2 | -
219 | NM_000692 | ALDH1B1 | 3 | 10 | -
348 | NM_000041 | APOE | 7 | 1 | -
435 | NM_000048 | ASL | 7 | 1 | -
445 | NM_000050 | ASS | 3 | 1 | -
732 | NM_000066 | C8B | 6 | 1 | -
1294(b) | NM_000094 | COL7A1 | 3 | 1 | -
1545 | NM_000104 | CYP1B1 | 3 | 1 | -
1545(b) | NM_000104 | CYP1B1 | 5 | 1 | -
2038(b) | NM_000119 | EPB42 | 5 | 1 | -
2074 | NM_000124 | ERCC6 | 3 | 1 | -
2074 | NM_000124 | ERCC6 | 5 | 1 | -
2200 | NM_000138 | FBN1 | 5 | 1 | -
2580(b) | NM_005255 | GAK | 6 | 1 | BC001290; BC008668
2669 | NM_005261 | GEM | 4 | 1 | -
2965 | NM_005316 | GTF2H1 | 4 | 1 | -
3054 | NM_005334 | HCFC1 | 4 | 1 | -
3159 | NM_002131 | HMGA1 | 9 | 140 | BC008832
3280 | NM_005524 | HES1 | 4 | 1 | -
3293 | NM_000197 | HSD17B3 | 5 | 1 | -
6628(b) | NM_003091 | SNRPB | 5 | 623 | J04564
10349(b) | NM_080282 | ABCA10 | 5 | 1 | -
10595 | NM_033266 | LOC374284 | 6 | 1 | -
10755 | NM_005716 | RGS19IP1 | 4 | 1 | -
23460 | NM_080284 | ABCA6 | 2 | 1 | -
51058 | NM_015911 | LOC51058 | 8 | 1 | AK000938; BC000157
51594 | NM_015909 | NAG | 6 | 1 | AF388385; AL832774; AL833270
51594(b) | NM_015909 | NAG | 8 | 1 | -
51599 | NM_015925 | LISCH7 | 8 | 1 | -
51603 | NM_015935 | CGI-01 | 8 | 1 | -
54490(b) | NM_053039 | UGT2B28 | 6 | 1 | -
(a) -: No human homolog mRNA was found in the NCBI nonredundant database. (b) Splicing variants that were tested by RT–PCR and sequencing; variants 2038_5 and 2580_6 could not be amplified by PCR. See Table 5 for primers.

Figure 1 Alternative splicing of Locus 210 in BASD. Alternative splicing variants are displayed in bar-link graph form. The bar length is proportional to the corresponding exon length. The 5′/3′ end extension is indicated at the head of each variant graphic. EST accession numbers and cDNA library sources ('-' stands for unknown tissue types) are presented under each variant. EST counts in normal and cancer tissues and the statistical values (P) are displayed beside each cancer type. Exon locations on the genomic contig are shown by paired numbers in brackets. In this case, variants 2 and 3 could not be merged because no EST in dbEST overlaps all alternative splicing sites of these two variants.

Identification of cancer-associated splicing variants

Splicing variants were classified according to their cancer or normal tissue sources. By simply counting ESTs, we found that about 35% of splicing variants were detected exclusively in cancer tissues (that is, no EST representing the variant was identified in normal tissues) and 29% only in normal tissues (Table 3). This implies that new splicing variants might be generated during carcinogenesis. It also supports the observation that carcinogenesis dramatically alters not only the expression profile but also the splicing pattern (Kaufmann et al., 2002). The EST numbers of 16 495 variants were compared between cancer and normal tissues. The other variants were not tested because no paired cancer tissues were available for
them. Splicing variants with a P-value smaller than 0.05 were identified as cancer associated (Table 4). Because multiple comparisons were carried out, the significance level α of Fisher's exact test was Bonferroni-corrected to α′ = 0.05/16945 ≈ 3.0e-6. These comparisons assume that the number of ESTs of a splicing variant reflects its expression level. Three example variants from prostate and prostate cancer (total EST numbers 73 370 and 62 841, respectively) roughly illustrate this relationship. A variant of placental protein 6 (PL6, Locus: 11070) is an example without statistical significance: it has five and six EST counts in prostate and prostate cancer (P = 0.40). Interferon-induced transmembrane protein 2 (IFITM2, Locus: 10581) has a variant that is significant, but not after Bonferroni correction, with 22 and seven ESTs in prostate and prostate cancer (P = 0.01). A variant of semenogelin II (SEMG2, Locus: 6407) remains significant even after Bonferroni correction, with 94 and 0 ESTs in prostate and prostate cancer tissues (P < 1e-6).

To verify the statistical prediction, RT–PCR was applied to measure the expression level of nine genes selected from hepatocellular carcinoma (HCC) and lung cancer. In the four paired normal and tumor tissues of HCC and six pairs of lung cancer examined, the expression of eight of the nine genes was consistent with our prediction (Figure 3).

In addition, HCC-related genes identified with a cDNA array (Xu et al., 2001) were used to check our data. Variants covering the gene sequences represented on the array membrane were chosen, and their prediction data were combined. Of the 26 genes with an altered expression level in the array data, 25 were present in both BASD and the array. For two of these genes our prediction disagreed with the array data, and for 10 genes no statistically significant prediction could be made owing to limited ESTs. The predicted expression levels of the remaining 13 genes (52%) were consistent with the array data. This result indicates that the prediction of cancer-associated splicing variants is reliable when enough ESTs are available.

Figure 2 Distribution of EST counts plotted as the percentage of variants. The EST-count axis is scaled as a binary logarithm. Each bar gives the percentage of variants with EST counts between the two neighboring EST numbers. Blank bars represent all variants, and gray bars represent cancer-related variants.

Table 3 Tissue distribution of splicing variants in a sample of 127 genes
Tissue type | No. of variants | No. of genes | Variants/genes | Variants only in normal tissue (fraction) | Variants only in cancer tissue (fraction) | Variants in both tissues (fraction)
Brain(a) | 443 | 112 | 4.0 | - | - | -
Liver | 204 | 62 | 3.3 | 95 (0.49) | 48 (0.24) | 61 (0.30)
Placenta | 150 | 84 | 1.8 | 92 (0.61) | 18 (0.12) | 40 (0.27)
Colon | 155 | 88 | 1.8 | 4 (0.03) | 114 (0.74) | 37 (0.24)
Testis | 135 | 78 | 1.7 | 101 (0.75) | 12 (0.09) | 22 (0.16)
Head and neck | 110 | 66 | 1.7 | 13 (0.12) | 73 (0.66) | 24 (0.22)
Lung | 149 | 92 | 1.6 | 30 (0.20) | 74 (0.50) | 45 (0.30)
Breast | 132 | 82 | 1.6 | 33 (0.25) | 48 (0.36) | 51 (0.39)
Kidney | 141 | 88 | 1.6 | 46 (0.33) | 41 (0.29) | 54 (0.38)
Prostate | 116 | 74 | 1.6 | 50 (0.43) | 21 (0.18) | 45 (0.39)
Skin | 87 | 58 | 1.5 | 13 (0.15) | 45 (0.52) | 29 (0.33)
Uterus | 104 | 73 | 1.4 | 1 (0.01) | 85 (0.82) | 18 (0.17)
Nervous | 83 | 59 | 1.4 | 35 (0.42) | 19 (0.23) | 29 (0.35)
Pancreas | 99 | 71 | 1.4 | 12 (0.12) | 59 (0.60) | 28 (0.28)
Marrow | 58 | 42 | 1.4 | 3 (0.05) | 45 (0.78) | 10 (0.17)
Ovary | 69 | 56 | 1.2 | 4 (0.06) | 53 (0.77) | 12 (0.17)
Stomach | 62 | 51 | 1.2 | 2 (0.03) | 53 (0.85) | 7 (0.11)
(a) Because brain cancer is described unclearly in most libraries, no attempt was made to separate brain cancer libraries from normal brain libraries.

Table 4 Distribution of cancer-related splicing variants
Cancer type | Libraries, normal | Libraries, cancer | Total ESTs, normal | Total ESTs, cancer | Cancer-related variants (with/without Bonferroni correction)
Adrenal gland cancer | 4 | 4 | 9107 | 12 684 | 6/111
Bladder cancer | 9 | 41 | 266 | 25 287 | 0/3
Breast cancer | 328 | 710 | 60 900 | 104 657 | 21/297
Cervix carcinoma | 1 | 4 | 1152 | 22 931 | 0/96
Colon cancer | 113 | 547 | 24 098 | 159 609 | 10/100
Epididymis cancer | 2 | 37 | 1147 | 5262 | 2/43
Esophagus cancer | 2 | 5 | 103 | 3807 | 1/2
Head and neck cancer | 104 | 885 | 16 209 | 122 600 | 18/124
Kidney cancer | 8 | 68 | 70 573 | 88 416 | 107/527
Hepatocellular carcinoma | 17 | 8 | 41 170 | 21 704 | 22/211
Lung cancer | 93 | 183 | 49 796 | 170 202 | 40/239
Marrow cancer | 6 | 267 | 12 716 | 51 531 | 19/154
Nasopharynx carcinoma | 6 | 6 | 5620 | 897 | 0/47
Nervous cancer | 315 | 216 | 48 423 | 35 663 | 41/220
Ovary cancer | 5 | 90 | 5755 | 74 457 | 21/60
Pancreas cancer | 13 | 13 | 30 592 | 60 662 | 53/427
Pituitary cancer | 5 | 1 | 8495 | 1598 | 0/31
Placenta cancer | 332 | 4 | 156 786 | 43 467 | 89/488
Prostate cancer | 141 | 141 | 73 370 | 62 841 | 36/339
Retinoblastoma | 10 | 2 | 29 741 | 25 407 | 24/228
Skin cancer | 12 | 15 | 10 829 | 71 284 | 9/91
Stomach cancer | 67 | 214 | 8710 | 52 261 | 6/124
Testis cancer | 150 | 16 | 92 920 | 28 067 | 27/196
Thymus cancer | 15 | 5 | 3407 | 236 | 2/11
Thyroid cancer | 11 | 8 | 5114 | 4203 | 6/99
Tongue cancer | 1 | 3 | 719 | 1611 | 0/44
Uterus cancer | 3 | 110 | 2923 | 128 452 | 14/122
Total | 5376 (normal + cancer) | | 2 150 437 (normal + cancer) | | 383/2149(a)
(a) The total is smaller than the sum of this column because some alternative splice variants are counted in multiple tissues.

Table 5 Splicing-specific primers for RT–PCR
Variant ID | Forward primer (5′→3′) | Reverse primer (5′→3′) | Product length (bp)
2_1(a) | CCC AGC CAG CCC CAA CCT | AGC GGC TCC CTG TGT AAC TGA C | 465
104_5(b) | TGA AAT GGC CAA TGA GCA CA | TGC ATC AGG GCG TTC TTG | 412
127_1(a) | GGC CTA GCT TTC CCA GTG | AGC CAG TTG AAA ACC CAC ATC CA | 379
127_3(a,b) | GGC CTA GCT TTC CCA GTG | TAG CCA GTT GAG GCA AGA TTG | 326
143_3(b) | GTT CAT CAT GTC TGC CAC AC | TGA TTT GGT GCT TTT GGT TCA G | 434
163_2(b) | CAA TTC ATC ATG GGA GCA CTG | AGT CCA CTG CCG CCT TAG A | 436
174_1(a) | CAC TCC AGC ATC GAT CCC ACT TT | TTT CCC CAT CCT GCA GAC AAT CC | 455
177_1(a) | CTG CCC AGC CCT CTC CTC AAA TC | GCA CGC TCC TCC TCT TCC TCC TG | 373
183_6(a,b) | TAG TCG CTG CCT TCT TGG GC | AGA ACT CCT GGG GCT CGG | 591
348_1(a) | GCG CGG ATG GAG GAG ATG G | TGC AGG CTT CGG CGT TCA GTG | 270
1294_3(b) | GAT CCG ATG CCT CTT TTC CTC A | TCT CCC TTG TCA CCC TTT AGT CCT | 483
1356_1(a) | AGG CGG CAT GAA GCA AAA AT | ATA AGC CCA TGG AAT ACA AGC AGA | 520
1545_5(b) | ACG TTT TCC AGA TCC GCC ATC | ACC GTG CGC CCG AAC TCT TCG TT | 416
2038_5(b) | GAG AGT CGA GAA AGA GAA AAT GG | ACA CAA TGA GGT AAA GCA AGT AAT | 603
2580_6(b) | CCG GCG GTT GCT GAG CTG A | ATT CCA CCA GCT GCC CTT TAA T | 313
6628_5(a,b) | GCT GCT GGC AGA GGA ATC | CCT GAG GAA TCG AGC CCA CG | 424
10349_5(b) | CCC TGA GTT TGT TAA AAT ACC A | TGC ATC ATT TTT AAT ATG AGC TAC | 405
51594_8(b) | AGA ATG AAG AGA ACC GCT ACT GTC | TTT CAT CAA CCT CTT CAC TGT C | 475
54490_6(b) | CAA TGG CAT CTA TGA GGC AAT CT | TTG CTG GAA TCA ACT GAA GTA TCC | 474
(a) Variants used to confirm cancer-related splicing. (b) Variants used to confirm newly identified splicing. In each variant ID, the number before the underscore is the Locus ID of the gene in the LocusLink database, and the number after it is the variant number of this gene in our database.

A total of 2149 variants (about 8% of all 26 812 splicing variants) from 1827 genes had significantly different EST counts between cancer and normal tissues at a significance level of 0.05 (Table 4). After Bonferroni correction, only 383 variants remain significant (Table 4); this smaller group of variants has a relatively higher confidence of association with cancer. The P-value distribution of the cancer-associated variants is illustrated in Figure 4. No significant difference was observed between the P-value distributions of the cancer-associated and normal tissue-associated variants. As shown in the EST number distribution (Figure 2), the peak of the distribution of cancer-related variants is shifted to the right compared with the overall EST number distribution; about 85% of cancer-related variants had more than 16 ESTs. Based on the 127-gene sample, the frequency of cancer-specific variants varied from 3
to 20% in different cancer types, with pancreas cancer ranked at the top (20%). About 43% of cancer-specific variants were related to more than two kinds of cancer. For example, splicing variant 2 of the decorin gene (Locus: 1634) had lower EST counts in lung cancer, placenta cancer and uterus cancer. Decorin is a proteoglycan component of the cell matrix that is capable of suppressing the growth of various cancers. Decorin was observed to be downregulated to 50% of its normal level in human lung adenocarcinomas (McDoniels-Silvers et al., 2002) and to be undetectable in ovarian cancer cells (Nash et al., 2002). Our result underlines the need to study the impact of decorin in other cancers.

Catechol-O-methyltransferase (COMT, Locus: 1312) catalyses the methylation of catecholamines and plays a role in melanogenesis. A low-activity COMT allele was significantly related to regional lymph node metastasis of breast cancer (Matsui et al., 2000). In our database, variant 6 of this gene, which has an alternative 5′ untranslated region, is significantly overexpressed in skin cancer tissue, while no difference was predicted for the other variants.

Our data also imply that different variants of the same gene may be associated with different cancers. Although both variants 1 and 2 of decorin were underexpressed in lung and placenta cancers, downregulation in breast cancer and HCC was demonstrated only for variant 1. Multiple alternative splicing variants are known for this gene; however, no study has yet examined the relationship between different decorin variants and different cancers. The disabled homolog 2 (DAB2, Locus: 1601) gene is another example. DAB2 exerts negative control over cell growth and survival, and its re-expression in breast cancer cells suppresses c-fos expression and inhibits cell growth (He et al., 2001). Variant 1 of DAB2 is downregulated in most cancers, including breast cancer, lung cancer, nervous cancer and placenta cancer. By contrast, decreased expression of DAB2 variant 8, which carries a partial deletion of exon 9 relative to variant 1, was detected only in liver tissue.

Figure 3 RT–PCR was applied to validate our prediction of cancer-related variants in HCC and lung cancer. Gene symbols for Locus IDs 127, 174, 183, 1356, 2, 177, 348 and 6628 are ADH4, AFP, AGT, CP, A2M, AGER, APOE and SNRPB, respectively. Predicted expression levels in cancer and normal tissues are shown on the left side. Statistical significance is attained when P < 0.05. The expression level of variant 183_6 in HCC is not fully consistent with our prediction, although moderate upregulation in cancer tissues is observed.

Figure 4 Distribution of P-values for all splicing variants related to cancers. Gray and blank bars represent the percentage of variants that are specific to cancer and normal tissues, respectively.

Discussion

We constructed BASD to search for splicing variants of gene transcripts, especially those associated with cancer. BASD suggests a higher percentage of alternatively spliced genes (81%) than the previously reported 30–65% (Mironov et al., 1999; Brett et al., 2000; Kan et al., 2002). This is mainly due to the inclusion of EST singletons in BASD, since 58% of the splicing variants in our database are supported by a single EST. When EST singletons are excluded from BASD, the percentage of genes with alternative splicing sites (66%) becomes comparable to that in previous reports. Splicing variants represented by a singleton EST are sometimes considered noninformative (Mironov et al., 1999; Brett et al., 2000), since singleton ESTs may introduce non-native splicing events. However, we retained these variants because six of eight single-EST-supported variants (75%) were confirmed by RT–PCR in our study (Table 2), demonstrating the value of single-EST-supported variants in mining genomic information.
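Singletons also matter for the cancer-association calls. The Python sketch below recomputes the kind of test behind the prostate examples in the Results, under two stated assumptions. First, each variant is scored with Fisher's exact test on a 2 × 2 table of its own ESTs versus all other ESTs in the normal and cancer library pools. Second, the reported P-value is taken to be the smaller one-sided tail. The source does not spell out either choice. The names `variant_pvalue`, `EXAMPLES` and the singleton entry are hypothetical. The Bonferroni threshold uses the denominator printed in the Results (16945).

```python
from math import comb

# Bonferroni threshold; 16945 is the denominator printed in the Results.
ALPHA = 0.05
N_COMPARISONS = 16945
ALPHA_BONF = ALPHA / N_COMPARISONS  # about 2.95e-6


def variant_pvalue(normal_hits, cancer_hits, normal_total, cancer_total):
    """Hypothetical helper: Fisher's exact test for one splicing variant.

    Assumed 2x2 table (not spelled out in the source):
        [[normal_hits, normal_total - normal_hits],
         [cancer_hits, cancer_total - cancer_hits]]
    Returns the smaller of the two one-sided tail probabilities.
    """
    n = normal_hits + cancer_hits             # ESTs supporting the variant
    population = normal_total + cancer_total  # all ESTs in both pools
    denom = comb(population, n)

    def pmf(k):
        # Hypergeometric P(exactly k of the variant's ESTs come from normal libraries);
        # exact integers, converted to float only by the final true division.
        return comb(normal_total, k) * comb(cancer_total, n - k) / denom

    k_min = max(0, n - cancer_total)
    k_max = min(n, normal_total)
    lower = sum(pmf(k) for k in range(k_min, normal_hits + 1))  # as few or fewer in normal
    upper = sum(pmf(k) for k in range(normal_hits, k_max + 1))  # as many or more in normal
    return min(lower, upper, 1.0)


# Prostate vs prostate cancer EST totals and example counts from the Results.
PROSTATE_NORMAL, PROSTATE_CANCER = 73_370, 62_841
EXAMPLES = {
    "PL6 (Locus 11070)": (5, 6),
    "IFITM2 (Locus 10581)": (22, 7),
    "SEMG2 (Locus 6407)": (94, 0),
    "hypothetical singleton": (1, 0),
}

results = {
    name: variant_pvalue(normal, cancer, PROSTATE_NORMAL, PROSTATE_CANCER)
    for name, (normal, cancer) in EXAMPLES.items()
}

for name, p in results.items():
    print(f"{name:24s} P = {p:.3g}  "
          f"P < 0.05: {p < ALPHA}  P < alpha': {p < ALPHA_BONF}")
```

With the prostate totals, this sketch gives roughly 0.40 for the PL6 variant and about 0.01 for IFITM2, which is nominally significant but above α′. For SEMG2 it gives a value far below 1e-6. These are in line with the values quoted in the Results. With pools as balanced as the prostate ones, a variant seen in a single EST can do no better than about 0.46, the smaller pool's share of all ESTs. So singleton-supported variants, although retained in BASD, cannot reach significance for such a tissue pair. Exact integer binomial coefficients are used so that very small tails, such as the SEMG2 one, are not lost to intermediate floating-point underflow.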

A recent publication by Wang et al. also described screening for cancer-related alternative splicing using RefSeq alignment. Wang et al. searched 11 014 genes and identified 26 258 alternative splicing variants, of which 845 are cancer related, whereas we searched 4322 genes and identified 26 812 splicing variants, of which 2149 are associated with cancer. Possible reasons for the difference include the following. First, the statistical tool used by Wang et al. is a Z-test, which assumes that the frequency of alternative splicing variants follows a normal distribution, whereas we used Fisher's exact test, a nonparametric method. Second, we included genomic sequence in the alignment and thus have a better chance of identifying alternative splicing events located in the untranslated regions. Third, the E-value and gap length for BLAST in our study are 1e−30 and 7 base pairs (bp), whereas those applied by Wang et al. are 1e−10 and 10 bp.

In Wang et al.'s dataset of 845 cancer-related genes, 224 genes overlapped with the 4322 genes included in our database, and 174 (78%) of these were also identified as having cancer-associated variants in our prediction. Although we are unable to compare the complete gene lists studied in the two reports, it seems very likely that the two databases differ in their coverage of RefSeq.

We started from 17 448 reference sequences, whereas others used UniGene, which contains about 104 170 human clusters. One advantage of RefSeq is that a permanent sequence ID is used to represent each gene, whereas UniGene cluster numbers may be changed or retired from time to time (Schuler, 1997; Pruitt and Maglott, 2001). This feature makes it easy to compare the results of different batches. Another benefit of RefSeq is that it avoids the self-assembly of ESTs in UniGene, which may cause misclassification in some alternative splicing cases. A survey of 254 variants from 34 randomly chosen genes showed that about 10% of variants were assigned to irrelevant UniGene clusters (data not shown). For example, ASA grouped ESTs AI741905 and AI436103 to the gene chimaerin 1 (CHN1) as a splicing variant because these two ESTs contain part of exon 6 and the entire exon 7 of CHN1 and are located at the same site. In the UniGene database, however, these ESTs were separated from the CHN1 cluster (Hs.380138) as an independent cluster, Hs.369565. However, since RefSeq does not include all human genes, an EST cluster may be assigned as a splicing variant of a homologous RefSeq entry even though it has better homology to another gene that is not yet listed in RefSeq. This kind of error should decrease with the accumulation of additional genomic data and the growth of the RefSeq database.

Our work presents a genomic view of the relationship between cancer and alternative splicing. We observed that about one-third of alternative splicing variants were detected only in cancer tissues. Although no clear explanation exists yet, increased alternative splicing in cancer has been well documented. Alternative splicing in DNA polymerase beta occurs much more often in bladder cancer tissues and cell lines than in normal bladder tissue, although it does occur in normal bladder tissue (Thompson et al., 2002). Kaufmann et al. (2002) studied the alternative transcripts of neurofibromatosis type 1 (NF1) in three tumor tissues and found about twice as many types of spliced variants as in normal tissues. In addition, physiological conditions such as temperature and pH, which are often altered in tumor cells, increased aberrant NF1 variants in vitro (Ars et al., 2000; Kaufmann et al., 2002). Enhanced alternative splicing in cancer cells may have as significant an impact on carcinogenesis as changes in expression level.

One implication of this study could be that different variants play different roles in carcinogenesis. The expression of AIPL1, the fourth causative gene of Leber congenital amaurosis, is specific to human retina and cell lines of retinal origin (van der Spuy et al., 2002). Among the five AIPL1 variants predicted in BASD, only the expression of variant 1 is increased, nearly 10-fold, in retinoblastoma, so this variant may be a candidate tumor marker. Increased expression of fibroblast growth factors (FGFs) is observed in a substantial fraction of human prostate cancers and cell lines. Valve et al. (2001) reported that isoforms a and e of FGF-8 were overexpressed at significantly higher frequencies in prostate cancer, while isoform b remained at similar levels in prostate cancer and in control prostates. Our data also showed that only one variant of FGF5 had significantly higher EST counts in prostate cancer. These data imply that only a fraction of FGF isoforms may be related to the genesis of prostate cancer. A shift in splicing pattern might produce a dominant-negative or constitutively activated form of the wild-type protein that contributes to, or results from, carcinogenesis. In our analysis of HCC-related variants, an altered amino-acid sequence was found in about 27% of variants (data not shown). Studies of altered gene expression levels in cancer research may therefore need to be specific to splicing isoforms, as our recent work showed that only isoform B of SVH is involved in HCC carcinogenesis (Huang et al., 2003).

Owing to the limited number of ESTs for low-abundance transcripts, and to ESTs that do not completely cover some long transcripts, the number of alternative splicing variants may still be underestimated. The statistical significance of the association between splicing variants and cancers may also be affected, as seen in Table 4 (bladder, esophagus, and thymus cancers). The distribution of EST numbers also suggests that cancer-related variants usually include more ESTs (Figure 2). In addition, the predicted cancer-associated splicing variants may be affected for two reasons. First, cancer tissues may be composed of a mixture of cell types that differs from that of their normal tissue counterparts. Second, mRNA variants associated with rare cancer types may not be identified owing to an insufficient number of cDNA libraries. Despite this, EST–genomic DNA alignment still provides a helpful approach to identifying cancer-specific splicing variants. Proper mining of this database may contribute to cancer studies.
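The Discussion contrasts the Z-test of Wang et al. with the Fisher's exact test used here. Materials and methods describe testing each variant's cancer (IC) and normal (IN) EST counts against the library totals (TC, TN), with a Bonferroni threshold of 0.05/16945. The following pure-Python sketch shows one way to carry out that test. Two details are assumptions, since the paper does not spell them out. The first is the table layout [[IC, TC − IC], [IN, TN − IN]]. The second is the two-sided convention of summing all tables no more probable than the observed one, which is the convention R's fisher.test uses. The helper names are hypothetical, and in practice a library routine such as scipy.stats.fisher_exact would normally be used.

```python
from math import exp, lgamma


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell is hypergeometric; the p-value
    sums the probabilities of all tables no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = _log_comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, margins fixed.
        return exp(_log_comb(row1, x) + _log_comb(row2, col1 - x) - log_denom)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so floating-point ties count as "as extreme".
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def assess_variant(ic, i_n, tc, tn, n_tests=16945, alpha=0.05):
    """Test one variant's cancer/normal EST counts against library totals.

    ic, i_n: ESTs of the variant in cancer / normal libraries (IC, IN).
    tc, tn:  total ESTs in those cancer / normal libraries (TC, TN).
    Assumed table layout: [[IC, TC - IC], [IN, TN - IN]].
    Returns (p_value, significant_after_bonferroni).
    """
    if not (0 <= ic <= tc and 0 <= i_n <= tn):
        raise ValueError("variant EST counts must lie within library totals")
    p = fisher_exact_two_sided(ic, tc - ic, i_n, tn - i_n)
    return p, p < alpha / n_tests


# Made-up counts for illustration only.
p, significant = assess_variant(ic=12, i_n=1, tc=40000, tn=60000)
print(f"P = {p:.3g}; passes Bonferroni threshold: {significant}")
```

With these made-up counts, the variant is nominally significant (P well below 0.05). It still fails the Bonferroni threshold of 0.05/16945 (about 2.95 × 10⁻⁶), which illustrates how much the correction for roughly 17 000 simultaneous comparisons raises the bar. Working in log space with lgamma keeps the hypergeometric probabilities finite even for library totals in the tens of thousands.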

Materials and methods

Data source

The analysis employed the human EST database and the human genome database from NCBI. Sequences in RefSeq were used as the standard to create the database (BASD). The LocusLink database was used to identify the genomic location of reference sequences in RefSeq. Tissue sources of cDNA libraries were extracted from the Unified Library Database (UniLib). The human EST database (release 129.0) was downloaded from ftp://ftp.ncbi.nih.gov/genbank/. Human genomic sequences (2 August 2002) were downloaded from ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/. RefSeq (20 August 2002) and LocusLink (24 October 2002) were downloaded from ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/mRNA_Prot/ and ftp://ftp.ncbi.nih.gov/refseq/LocusLink/, respectively. UniLib (20 October 2002) information was available at ftp://ftp.ncbi.nih.gov/pub/unilib/.

Locate reference sequences and ESTs to genome sequences

Genes were located to the genomic contig according to the LocusLink database, so that multi-copy genes and pseudogenes would not interfere during the BLAST process (Pruitt and Maglott, 2001). The reference sequence of each gene was used as the standard and BLASTed against the human EST database with match/mismatch scores of 1/−3 and an E-value of 1e−30. If there was more than one sequence under a reference sequence record, the first one was selected as the standard for analysis. Homologous EST sequences from the human EST database and other sequences under the same reference sequence record were extracted and pooled together as homolog sequences. All these homolog sequences tended to locate to the same genomic region as the reference sequence.

Construction of ASA and discovery of putative exons and alternative splicing variants

Reference sequences were BLASTed against the mapped genomic contigs. Repeat sequences were masked by RepeatMasker (Smit AFA and Green P, RepeatMasker at http://ftp.genome.washington.edu/RM/RepeatMasker.html). Putative exons were determined using bl2seq with default scoring values, except for match/mismatch scores of 1/−2, a word size of 7, a cutoff score of 7 and an expect value of 10. These settings guarantee the identification of short exons (at least 7 bp) with sufficient BLAST stringency. The whole reference sequence had to be spread along the contig, with all BLAST hits in the same direction, and mismatched sequences between neighboring exons had to be shorter than 13 bp. Reference sequences not passing these criteria were discarded. The ends of putative exons were trimmed or elongated to find canonical splice sites (GT/AG and GC/AG). After the putative exons were determined, each exon was BLASTed against the pooled homolog ESTs (threshold 1e−30). The exon–EST relationships were coded in the format (query start, query end, subject start, subject end). A relationship was defined as connective if two matched exons in an EST were in the same direction along the contig and the mismatched parts of their adjacent ends were shorter than 13 bp. The longest relationship pathway could thus be found from the connective relationships. If an EST contained sequences that could not be aligned to previously determined exons, it was additionally aligned to the transcribed strand of the genomic contig. A new relationship was accepted if the genomic location of this sequence part was determined to be connective with the context exons. If a new connective relationship was introduced at the 5′ or 3′ end of an EST, there had to be at least 20 bp of homology to the genomic sequence. If the 5′ or 3′ end of an EST could not be aligned to the genomic sequence and the unaligned end was shorter than 50 bp, this unaligned part was deleted from the EST to avoid possible sequencing errors. If an internal part of an EST could not be aligned to the genomic sequence, the EST was considered unqualified and removed. Splice sites were identified through one-to-one mapping of the relationship pathways. Comparing the homolog ESTs with the query sequence produced a connective matrix. ESTs with the same connective pathway were put into one cluster, representing a unique alternative splicing variant. Two independent EST clusters were not grouped together unless an EST covered the alternative splice sites of both clusters.

Construction of BASD

Starting from reference sequences, hypothetical alternative splicing variants were identified by ASA and deposited into BASD. Each variant from the same gene transcript was named numerically, beginning with the Locus ID of the gene. Splicing variants were presented in graphic format according to their alignment positions in the genomic sequences. Exons were represented by bars proportional to their sequence length. A text description was attached under each graph, and the GenBank accession numbers of the ESTs and the cDNA library names were labeled. Under each splicing variant, the exon locations on the genome contig were given as bracketed base-pair numbers. Keywords, such as Locus ID and gene name, could be used to search the database through http://humgene.biosino.cn/BASD.html.

Identification of alternative splicing variants related to cancer tissues

From UniLib, we retrieved the tissue source of each library. To prevent inconsistencies in the classification of tissue types in UniLib, all tissue information of cDNA libraries was checked manually. Libraries from the same tissue type were grouped together. Libraries containing pooled tissues or lacking clearly described tissue sources were not assigned to any tissue type. In all, 220 libraries labeled 'subtracted' or 'normalized' were excluded to avoid bias in the tissue distribution analysis. Libraries from each cancer and its counterpart normal tissue were paired for comparison. After the splicing variants were determined, the EST numbers of each variant in cancer (IC) or normal (IN) tissue were counted according to their source cDNA library. The total library EST numbers were calculated by adding all ESTs from the same kind of cancer (TC) or normal (TN) tissue. Fisher's exact test was applied to check whether the difference in EST counts between cancer tissues and normal counterparts (IC and IN) was statistically significant relative to the total library EST numbers (TC and TN) (P < 0.05) (Scheurle et al., 2000; Baranova et al., 2001). To reduce false positives caused by performing multiple comparisons simultaneously, a Bonferroni correction (α′ = 0.05/16945) was further applied to Fisher's exact test.

RT–PCR and clinical samples

Clinical cancer samples were provided by Zhongshan Hospital. Each sample pair consisted of cancer and normal tissues from the same patient. Histological examination was carried out in the Department of Pathology, Zhongshan Hospital. Total RNA was extracted with Trizol and quantified by 18S RNA. RT was performed using M-MLV Reverse Transcriptase (Promega, Madison, WI, USA) with oligo-dT (Sangon, Shanghai, China) as primer, following the manufacturer's instructions. PCR primers specific for the splicing variants were designed with PrimerSelect (DNAStar, Madison, WI, USA) and are shown in Table 5. Briefly, for exon extension and insertion, one of the PCR primers was designed within the additional region; for exon deletion and shortened exons, a PCR primer was selected at the splicing site spanning the deleted region. Amplification was performed using LA Taq (Takara Bio, Dalian, Liaoning, China) for 19–26 cycles of 40 s at 94°C, 40 s at 56–62°C and 60 s at 72°C. β-Actin mRNA was chosen as the internal control, and a cycle gradation test was carried out to determine the optimal conditions. Each sample was amplified in duplicate. PCR products were purified and sequenced to confirm the amplified variants.

Acknowledgements

This work was supported by Ministry of Science and Technology Grant G1998051007 and Chinese High-Tech R&D Program (863)-2001AA231011 and -20022Z2002. We thank Nan Yu and Vu Nguyen for carefully proofreading this manuscript and Prof Naqing Zhao for suggestions on statistics.

References

Ars E, Serra E, de la Luna S, Estivill X and Lazaro C. (2000). Nucleic Acids Res., 28, 1307–1312.
Baranova AV, Lobashev AV, Ivanov DV, Krukovskaya LL, Yankovsky NK and Kozlov AP. (2001). FEBS Lett., 508, 143–148.
Black DL. (2000). Cell, 103, 367–370.
Brett D, Hanke J, Lehmann G, Haase S, Delbruck S, Krueger S, Reich J and Bork P. (2000). FEBS Lett., 74, 83–86.
Claverie JM. (2001). Science, 291, 1255–1257.
Coward E, Haas SA and Vingron M. (2002). Trends Genet., 18, 53–55.
Dralyuk I, Brudno M, Gelfand MS, Zorn M and Dubchak I. (2000). Nucleic Acids Res., 28, 296–297.
Goodison S, Urquidi V and Tarin D. (1999). Mol. Pathol., 52, 189–196.
He J, Smith ER and Xu XX. (2001). J. Biol. Chem., 276, 26814–26818.
Hogenesch JB, Ching KA, Batalov S, Su AI, Walker JR, Zhou Y, Kay SA, Schultz PG and Cooke MP. (2001). Cell, 106, 413–415.
Huang R, Xing Z, Luan Z, Wu T, Wu X and Hu G. (2003). Cancer Res., 63, 3775–3782.
Huang YH, Chen YT, Lai JJ, Yang ST and Yang UC. (2002). Nucleic Acids Res., 30, 186–190.
Ji H, Zhou Q, Wen F, Xia H, Lu X and Li Y. (2001). Nucleic Acids Res., 29, 260–263.
Kan Z, Rouchka EC, Gish WR and States DJ. (2001). Genome Res., 11, 889–900.
Kan Z, States D and Gish W. (2002). Genome Res., 12, 1837–1845.
Kaufmann D, Leistner W, Kruse P, Kenner O, Hoffmeyer S, Hein C, Vogel W, Messiaen L and Bartelt B. (2002). Cancer Res., 62, 1503–1509.
Kawamoto S, Yoshii J, Mizuno K, Ito K, Miyamoto Y, Ohnishi T, Matoba R, Hori N, Matsumoto Y, Okumura T, Nakao Y, Yoshii H, Arimoto J, Ohashi H, Nakanishi H, Ohno I, Hashimoto J, Shimizu K, Maeda K, Kuriyama H, Nishida K, Shimizu-Matsumoto A, Adachi W, Ito R, Kawasaki S and Chae KS. (2000). Genome Res., 10, 1817–1827.
Krawczak M, Reiss J and Cooper DN. (1992). Hum. Genet., 90, 41–54.
Lander ES, Linton LM, Birren B, Nusbaum C and Zody MC et al. (2001). Nature, 409, 806–921.
Matsui A, Ikeda T, Enomoto K, Nakashima H, Omae K, Watanabe M, Hibi T and Kitajima M. (2000). Cancer Lett., 150, 23–31.
McDoniels-Silvers AL, Nimri CF, Stoner GD, Lubet RA and You M. (2002). Clin. Cancer Res., 8, 1127–1138.
Mironov AA, Fickett JW and Gelfand MS. (1999). Genome Res., 9, 1288–1293.
Modrek B, Resch A, Grasso C and Lee C. (2001). Nucleic Acids Res., 29, 2850–2859.
Nash MA, Deavers MT and Freedman RS. (2002). Clin. Cancer Res., 8, 1754–1760.
Olopade OI, Adeyanju MO, Safa AR, Hagos F, Mick R, Thompson CB and Recant WM. (1997). Cancer J. Sci. Am., 3, 230–237.
Pruitt KD and Maglott DR. (2001). Nucleic Acids Res., 29, 137–140.
Reeve JG, Xiong J, Morgan J and Bleehen NM. (1996). Br. J. Cancer, 73, 1193–1200.
Scheurle D, DeYoung MP, Binninger DM, Page H, Jahanzeb M and Narayanan R. (2000). Cancer Res., 60, 4037–4043.
Schuler GD. (1997). J. Mol. Med., 75, 694–698.
Stamm S, Zhu J, Nakai K, Stoilov P, Stoss O and Zhang MQ. (2000). DNA Cell Biol., 19, 739–756.
Thompson TE, Rogan PK, Risinger JI and Taylor JA. (2002). Cancer Res., 62, 3251–3256.
Valve EM, Nevalainen MT, Nurmi MJ, Laato MK, Martikainen PM and Harkonen PL. (2001). Lab Invest., 81, 815–826.
van der Spuy J, Chapple JP, Clark BJ, Luthert PJ, Sethi CS and Cheetham ME. (2002). Hum. Mol. Genet., 11, 823–831.
Venter JC, Adams MD, Myers EW, Li PW and Mural RJ et al. (2001). Science, 291, 1304–1351.
Wang Z, Lo HS, Yang H, Gere S, Hu Y, Buetow KH and Lee MP. (2003). Cancer Res., 63, 655–657.
Wolfsberg TG and Landsman D. (1997). Nucleic Acids Res., 25, 1626–1632.
Xie H, Zhu WY, Wasserman A, Grebinskiy V, Olson A and Mintz L. (2002). Genomics, 80, 326–330.
Xu L, Hui L, Wang S, Gong J, Jin Y, Wang Y, Ji Y, Wu X, Han Z and Hu G. (2001). Cancer Res., 61, 3176–3181.
Xu Q, Modrek B and Lee C. (2002). Nucleic Acids Res., 30, 3754–3766.
