Identification of Alternatively Spliced mRNA Variants Related to Cancers
Oncogene (2004) 23, 3013–3023
© 2004 Nature Publishing Group All rights reserved 0950-9232/04 $25.00
www.nature.com/onc

Identification of alternatively spliced mRNA variants related to cancers by genome-wide ESTs alignment

Lijian Hui1,6, Xin Zhang2,6, Xin Wu3, Zhixin Lin4, Qingkang Wang2, Yixue Li*,5, Gengxi Hu*,1

1State Key Laboratory of Molecular Biology, Institute of Biochemistry and Cell Biology, Room 400, Cell Building, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Yueyang Road 320, Shanghai 200031, China; 2Institute of Microelectronics, Shanghai Jiaotong University, China; 3Liver Cancer Institute, Zhongshan Hospital, China; 4School of Life Science, Shanghai Jiaotong University, China; 5Bioinformation Center, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Yueyang Road 319, Shanghai 200031, China

Several databases have been published to predict alternative splicing of mRNAs by analysing the exon linkage relationship by alignment of expressed sequence tags (ESTs) to the genome sequence; however, little effort has been made to investigate the relationship between cancers and alternative splicing. We developed a program, Alternative Splicing Assembler (ASA), to look for splicing variants of human gene transcripts by genome-wide ESTs alignment. Using ASA, we constructed the biosino alternative splicing database (BASD), which predicted splicing variants for reference sequences from the reference sequence database (RefSeq) and presented them in both graph and text formats. EST clusters that differ from the reference sequences in at least one splicing site were counted as splicing variants. Of 4322 genes screened, 3490 (81%) were observed with at least one alternative splicing variant. To discover the variants associated with cancers, tissue sources of EST sequences were extracted from the UniLib database and ESTs from the same tissue type were counted. These counts were regarded as indicators of gene expression level. Using Fisher's exact test, alternative splicing variants whose EST counts were significantly different between cancer tissues and their counterpart normal tissues were identified. It was predicted that 2149 variants, or 383 variants after Bonferroni correction, of 26 812 variants were likely tumor-associated. By reverse transcription–PCR, 11 of 13 novel alternative splicing variants and eight of nine variants' tissue specificity were confirmed in hepatocellular carcinoma and in lung cancer. The possible involvement of alternative splicing in cancer is discussed.

Oncogene (2004) 23, 3013–3023; doi:10.1038/sj.onc.1207362
Published online 29 March 2004

Keywords: alternative splicing prediction; database; cancer; EST alignment

*Correspondence: G Hu or Y Li; E-mails: [email protected] or [email protected].
6These authors contributed equally to this paper
Received 30 May 2003; revised 25 September 2003; accepted 11 November 2003

Introduction

One of the remarkable revelations that the recent description and annotation of the human genome from the International Human Genome Sequencing Consortium and private company Celera Genomics has brought us is the prediction that there are about 30 000–40 000 gene transcripts in human beings (Lander et al., 2001; Venter et al., 2001), although only 60% of these genes were consistent in different data sources (Hogenesch et al., 2001). Discussions of the almost inestimably complex functions produced by this limited number of genes have focused on transcription regulation and alternative splicing (Black, 2000; Claverie, 2001), in addition to post-translation modification, such as phosphorylation and glycosylation, which presumably contributes to the diversified functions of proteins. Alternative splicing is an important mechanism in higher eukaryotes for producing proteomic complexity, since approximately 30–65% of genes are alternatively spliced as estimated by genomically aligned expressed sequence tags (ESTs) (Mironov et al., 1999; Brett et al., 2000; Kan et al., 2002). Combining these mechanisms, the human organism could conceivably produce hundreds of thousands of different proteins from the estimated 35 000 human genes.

Alternative splicing of pre-mRNA is a versatile mechanism for regulating gene function at the post-transcription level. As many as 15% of single base-pair mutations that cause human genetic diseases result in alternative splicing defects, as indicated by a survey of mutations in splicing junctions (Krawzczak et al., 1992). The differential expression of alternatively spliced genes in cancer tissue has been well documented. For example, Bcl-x, a member of the Bcl-2 family of apoptosis regulators, has two alternative isoforms, Bcl-xL and Bcl-xS, resulting from 5′ splicing of exon 2. Only Bcl-xL, believed to play a major role in carcinogenesis, is overexpressed in small cell lung carcinoma (Reeve et al., 1996) and in 60% of invasive breast carcinomas (Olopade et al., 1997). Another example is CD44, which is a membrane glycoprotein consisting of nine variable exons. Elevated expression of CD44 variants containing variable exon 6 in solid tumor tissue and the soluble variant in serum might be applied as a probable diagnostic marker for bladder cancer (Goodison et al., 1999).

Alignment of ESTs to the genome is a potentially powerful means to predict alternative splicing of gene transcripts. Wolfsberg and Landsman (1997) demonstrated that alternative mRNA formats could be discovered through the assembly of ESTs derived from the gene. Mironov et al. (1999) have also studied alternative splicing of human genes by using EST contigs from the TIGR Human Gene Index. Recently, a software tool, Transcript Assembly Program (TAP), which deduces the gene structure by genomic EST alignments, was developed (Kan et al., 2001; Kan et al., 2002). Databases for alternatively spliced mRNAs including AsMamDB, HASDB, PALS and SpliceNest, all of which are based on UniGene clusters, have been constructed (Ji et al., 2001; Modrek et al., 2001; Coward et al., 2002; Huang et al., 2002). The EST database would give detailed information on cDNA libraries and make it possible to predict tissue expression. For example, BodyMap was established to acquire tissue distribution data through collection of site-directed 3′ ESTs (Kawamoto et al., 2000). Xu et al. (2002) developed an automated method for discovering tissue-specific regulation through a genome-wide analysis of ESTs. Since about 45% of the human EST sequences are derived from cancer cells, it would be possible to identify cancer-associated mRNA variants by counting the number of ESTs from the cancer and its counterpart normal tissue libraries (Scheurle et al., 2000; Baranova et al., 2001; Xie et al., 2002; Wang et al., 2003).

We hereby developed the Alternative Splicing Assembler (ASA) program, which searched all putative alternative splicing variants through genomic EST alignment. Using this program, a human alternative splicing database was constructed based on the curated reference sequence database (RefSeq). Splicing variants related to cancer tissues were identified. A total of 26 812 alternatively spliced variants from 4322 genes were included in the database, in which 2149 variants from 1827 genes were predicted as cancer associated. A total of 13 novel splicing variants and nine cancer-related variants were further tested by reverse transcription (RT)–PCR in clinical samples.

... the RefSeq database and used for alternative splicing searching by ASA. Alternative splicing variants were observed in 46 out of 52 genes as reported (Table 1). Further study suggested that the reason why no splicing variant was detected by ASA in the remaining six genes was probably because only a few ESTs were available for these genes. Only four EST hits were available on average for these six genes, while 202 EST hits could be found for the other 46 genes. The limited number of ESTs did not cover the splicing sites present in the ASDB. Based on this sample, ASA gives a false-negative rate of 12% (six of 52 genes); that is, it might correctly report about 88% of alternatively spliced genes. In this aspect, ASA is comparable with the SpliceNest, PALS and HASDB databases, which have false-negative rates of 15, 35, and 38%, respectively (Table 1).

EST sequences might be contaminated by genomic sequences, vector sequences and chimaeric cDNA clones. The contamination along with sequencing errors may cause mistakes in EST assembly and give false splicing variants. In order to test the effect of these factors, we randomly selected 82 splicing variants from 67 genes and compared these variants with the nonredundant database of GenBank. For approximately 26% (21 out of 82) of the variants, at least one reported mRNA sequence homologous to the query sequence was found in the nonredundant database (Table 2). Since mRNA sequences deposited into the nonredundant database came from different labs, the existence of these variants would be highly reliable. We further selected 13 variants, three with homolog sequences in the nonredundant database and 10 with only homolog ESTs (Table 2), and performed RT–PCR to test their presence in clinical samples of hepatocellular carcinoma and lung cancer. In total, 11 of 13 (85%) variants were confirmed by RT–PCR and sequencing. Among these 13 variants, eight variants were supported by only one EST, and six of the eight were confirmed.

Analysis of the alternatively spliced variants and their tissue distribution in BASD

BASD presents possible splicing variants of gene transcripts in both text and graph