Identification of Latent Biomarkers in Hepatocellular Carcinoma by Ultra-Deep Whole-Transcriptome Sequencing
Oncogene (2014) 33, 4786–4794
© 2014 Macmillan Publishers Limited All rights reserved 0950-9232/14
www.nature.com/onc

ORIGINAL ARTICLE

Identification of latent biomarkers in hepatocellular carcinoma by ultra-deep whole-transcriptome sequencing

K-T Lin1,8, Y-J Shann2,8, G-Y Chau3, C-N Hsu4,5,6 and C-YF Huang1,2,7

There is an urgent need to identify biomarkers for hepatocellular carcinoma due to limited treatment options and the poor prognosis of this common lethal disease. Whole-transcriptome shotgun sequencing (RNA-Seq) provides new possibilities for biomarker identification. We sequenced ~250 million paired-end reads from a pair of adjacent normal and tumor liver samples. With the aid of bioinformatics tools, we determined the transcriptome landscape and sought novel biomarkers by further empirical validations in 55 pairs of adjacent normal and tumor liver samples with various viral statuses such as HBV(+), HCV(+) and HBV(−)HCV(−). We identified a novel gene with coding regions, termed DUNQU1, which has a tissue-specific expression pattern in tumor liver samples of HCV(+) and HBV(−)HCV(−) hepatocellular carcinomas. Overexpression of DUNQU1 in Huh7 cell lines enhances the ability to form colonies in soft agar. Also, we identified three novel differentially-expressed protein-coding genes (ALG1L, SERPINA11 and TMEM82) that lack documented expression profiles in liver cancer and showed that the level of SERPINA11 is correlated with pathology stages. Moreover, we showed that the alternative splicing event of FGFR2 is associated with virus infection, tumor size, cirrhosis and tumor recurrence. The findings indicate that these new markers of hepatocellular carcinoma may be of value in improving prognosis and could have potential as new targets for developing new treatment options.
Oncogene (2014) 33, 4786–4794; doi:10.1038/onc.2013.424; published online 21 October 2013

Keywords: hepatocellular carcinoma; DUNQU1; FGFR2; alternative splicing; RNA-Seq

1Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan; 2Institute of Biopharmaceutical Sciences, National Yang-Ming University, Taipei, Taiwan; 3Division of General Surgery, Department of Surgery, Taipei Veterans General Hospital, Taipei, Taiwan; 4Institute of Information Science, Academia Sinica, Taipei, Taiwan; 5USC/Information Sciences Institute, Marina del Rey, CA, USA; 6Division of Biomedical Informatics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA and 7Cancer Research Center and Genome Research Center, National Yang-Ming University, Taipei, Taiwan. Correspondence: Professor CYF Huang, Institute of Clinical Medicine, National Yang-Ming University, No. 155, Li-Non St, Sec. 2 Taipei 112, Taiwan. E-mail: [email protected]
8These authors contributed equally to this work.
Received 3 January 2013; revised 15 August 2013; accepted 19 August 2013; published online 21 October 2013

INTRODUCTION

Hepatocellular carcinoma (HCC) is one of the three fastest-growing cancers in the US and is the most lethal cancer in Asia. Compared to other cancers, HCC has a relatively poor prognosis and limited treatment options. Traditionally, to dissect how the functional units are deployed in different cells, gene expression microarrays are the most frequently used tools. Gene signatures derived from these microarrays are considered to be the blueprints of events taking place in the cells under particular conditions at specific time points. In previous genome-wide studies, many hypotheses were generated from the gene signatures to explain biological outcomes.

Recently, RNA-Seq, one of the applications of the second-generation sequencing techniques, was developed. Since RNA-Seq offers single-base resolution on the whole-genome scale, it provides the opportunity to greatly improve our knowledge of both the quantitative and qualitative aspects of the human transcriptome.1 It has been reported that RNA-Seq can detect at least 25% more known genes than traditional gene expression arrays,2,3 as well as many novel transcripts in intergenic regions. Also, studies using RNA-Seq have shown that the number of functional units in the human genome is much larger than previously anticipated.4 The transcription of mammalian genomes is now known to take place across almost all sections of the genome, and many alternative splicing (AS) events in the human transcriptome are very noisy, even in normal cells.5,6 It quickly became clear that the complexity of the human transcriptome cannot be explained solely by the ~2% of the human genome analyzed by gene expression microarrays.

With the aid of bioinformatics tools, many potential biomarkers that were latent variables in previous genome-wide studies can now be seen by RNA-Seq.7 In the present study, we sought out potential biomarkers and validated their expression patterns in 55 pairs of adjacent normal and tumor liver samples with diverse viral status and gender.

RESULTS

The catalog of the transcriptome landscape of HCC

In total, we sequenced ~120 million read pairs per sample from a pair of adjacent normal and tumor liver samples (Supplementary Table 1). Aligned normal reads covered ~6.52% of the human genome and ~7.59% for tumor reads (Supplementary Figure 1a and Supplementary Table 2). We also identified many novel exon junctions shown in Supplementary Figure 2. All of the above information can be either downloaded from our website (http://bioagent.iis.sinica.edu.tw/HCCT2012) or browsed on the UCSC Genome Browser for visualization and track comparison.

Novel differentially-expressed protein-coding genes whose expression profiles were missing in liver cancer

RNA-Seq can detect more differentially-expressed protein-coding genes (DE genes) than previous genome-wide arrays. We filtered out 2,576 up-regulated and 855 down-regulated genes (Supplementary Table 3) and found 17 DE gene candidates without annotations from the genome-wide arrays (U95, U133 and U133 plus 2.0) (Supplementary Table 4). Also, the 17 DE genes lacked documented expression profiles for liver cancer in Gene Expression Omnibus, Gene Expression Atlas, ArrayExpress and Oncomine (Supplementary Tables 4 and 5).8–11

By real-time PCR in 55 pairs of HCC patient samples, we showed that ALG1L, SERPINA11 and TMEM82 indeed had expected expression patterns (Figure 1b and Supplementary Table 6). In particular, the ΔΔCt values of SERPINA11 were significantly different between stage I/II and stage III/IV (P = 0.0202) and negatively correlated with pathology stages (stage I/II = 1 and stage III/IV = 2, Pearson's correlation = −0.328, and P = 0.0145) (Figure 1c). That is to say, SERPINA11 was significantly lower in the later stages, such as stages III and IV.

ALG1L is a putative glycosyltransferase. An altered mRNA expression level of glycosyltransferases might be helpful for early detection of carcinomas and tumor progression.12 SERPINA11 is a serine proteinase inhibitor that might be secreted. The down-regulation of SERPINA11 has been correlated with breast cancer initiation and progression.13 TMEM82 has a transmembrane domain. These new DE genes may improve our understanding of the carcinogenesis of HCC.

A novel gene, termed DUNQU1, has a tissue-specific expression pattern and may play a role in liver tumorigenesis

RNA-Seq can address a certain proportion of the transcribed genome beyond current gene annotations (Supplementary Figure 1b). To investigate the uncharacterized areas of the human genome, we extracted the putative mRNA sequences from the Cufflinks de novo assembly results. In total, 38155 normal and 38146 tumor transcript sequences were found. Among them, 28569 normal and 28278 tumor sequences contained coding regions longer than or equal to 50 amino acids (Supplementary Table 7). Of the predicted coding regions, 231 normal and 286 tumor peptides were located in intergenic regions based on Ensembl 65 gene annotations. These coding peptides in intergenic regions represent potentially novel, unidentified genes. In particular, 224 of the 286 tumor peptides were specific to the sequenced liver tumor. This implies the existence of tumor-specific protein-coding genes in intergenic regions. To prioritize the candidates, we ranked the tumor-specific peptides by FPKM values and manually determined whether there were multi-exon peptides with predicted functional domains. Among tumor-specific peptides whose FPKM value was greater than 0.5, we found only one multi-exon candidate that had predicted functional domains. It was a 101-amino-acid peptide encoded by a gene with 3 exons. We named the gene DUNQU1, from the Chinese for 'the latent one'.

Our analysis of the junction reads suggested that DUNQU1 comprises three exons (E1, E2 and E3) and expresses two isoforms: SP1 (E1 + E2 + E3) and SP2 (E1 + E3) (Supplementary Figure 3a). The mRNA transcript of SP1 was predicted to be 5438 bp, whereas SP2 was 5345 bp. The predicted peptide sequences for SP1 and SP2 were 101 and 94 amino acids, respectively. According to InterProScan, both isoforms have a phosphodiesterase domain (PF01663) and alkaline phosphatase-like domains (G3DSA:3.40.720.10 and SSF53649), and both have a potential signal peptide (SignalP-NN(euk)) (Supplementary Figure 4).14

Analysis of the predicted DUNQU1 peptide sequences revealed high identity between DUNQU1 and the N-terminus of ENPP7 (Supplementary Figure 5). ENPP7, also known as alkaline sphingomyelinase (alk-SMase) or NPP7, is expressed in intestine and liver

Figure 1. Flow chart of this study and novel differentially-expressed genes in liver cancer. (a) Using the total RNA extracted from a pair of adjacent normal and tumor liver samples, we performed RNA-Seq to obtain short reads and constructed the transcriptome landscape by using several bioinformatics tools such as TopHat, MapSplice, SpliceTrap and Cufflinks. Several major findings, such as novel differentially expressed protein-coding genes (DE genes), novel genes and AS events, were selected and verified in a set of 55 pairs of adjacent normal and tumor HCC patient samples. (b) ΔΔCt values of ALG1L, SERPINA11 and TMEM82 were derived from real-time PCR of 55 pairs of adjacent normal and tumor liver samples.