Gene Expression-Based Recurrence Prediction of Hepatitis B Virus-Related Human Hepatocellular Carcinoma
GSK-Sponsored Overseas Training Support Fund Research Report

Yoon Jun Kim, M.D.
Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul, Korea

ABSTRACT

Hepatocellular carcinoma (HCC) has a poor prognosis because of its high rate of recurrence even after curative resection. To identify molecular signatures associated with early recurrence, we profiled the gene expression of 65 HCC samples with hepatitis B infection using genome-scale oligonucleotide microarrays. We identified a set of 348 unique genes that reflected early recurrence (ER) of HCC; this set was enriched for genes related to GTPase signaling, transcription, immune response, cell adhesion, and motility. We also generated a signature associated with recurrence time using a Cox proportional hazards model (HR genes). Hierarchical clustering showed that the HR genes were a more accurate classifier than the ER genes. In addition, we applied a meta-analysis to integrate earlier expression data (Iizuka et al., 2003) and obtained 232 genes that were consistently expressed in both independent datasets. This signature was validated in an independent study, indicating its robustness for the prediction of HCC recurrence. In conclusion, the gene signatures retrieved by different but complementary methods may provide clues for identifying patients at increased risk of early recurrence and for identifying novel therapeutic targets for HCC.

Key Words: Hepatitis B Virus; Hepatocellular carcinoma; Recurrence; Microarray; Gene Expression Profile

Corresponding Author: Yoon Jun Kim, Department of Internal Medicine, Seoul National University Hospital, 28 Yongon-dong, Chongno-gu, Seoul 110-744, Korea.
Phone: 82-2-2072-3081, 82-2-740-8112; Fax: 82-2-743-6701; E-mail: [email protected]

* This manuscript was prepared as a research report for the Korean Association for the Study of the Liver-GSK Overseas Training Support Fund. It is based on the same study as the work presented at the American Association for the Study of Liver Diseases meeting held in Boston in 2007 and published in the April 2008 issue of Clinical Cancer Research.

2008 Autumn Meeting of the Korean Association for the Study of the Liver

Introduction

Hepatocellular carcinoma (HCC) is one of the major causes of death from malignancy throughout the world. HCC has a poor overall five-year survival rate, which is mainly attributed to the high rate of intrahepatic recurrence even after curative resection. Therefore, it is important to understand the molecular basis of the early recurrence of HCC and to develop a means of predicting the likelihood of recurrence after treatment, which may help in planning a therapeutic strategy and in improving the survival of HCC patients. DNA microarray technology offers a genome-wide view of transcription profiles that enables us to understand systematic changes in gene expression patterns. However, although many microarray studies have attempted to predict survival time and prognosis in cancer, few have focused on recurrence. In the present study, we used DNA microarrays to examine gene expression profiles in order to uncover the biological mechanisms that affect the recurrence rate of HCC, and we predicted the recurrence time of HCCs after curative resection. Several gene signatures for HCC recurrence were identified by applying different analysis methods. First, we obtained gene sets reflecting early recurrence of tumors by a traditional two-sample t-test. Second, we applied Cox regression analysis to the recurrence time and obtained a classifier gene set according to the hazard rate of each gene.
Finally, it is critical to validate that the signature can predict new cases in an independent dataset, since comparisons of different microarray datasets dealing with the prediction of patient outcome or the identification of differentially expressed genes often show poor congruence. To overcome the limited success of cross-comparison of microarray data, we applied a meta-analysis that combined our data with the previous data of Iizuka et al.,1 which addressed the same sample labels (e.g., early recurrence vs. non-early recurrence) of HCC. The meta-analysis was performed with the effect size method previously introduced by Choi et al.,2 which is known to select genes with small but consistent expression changes across independent datasets, regardless of differences in the experimental or computational methods used. By comparing and characterizing these different but complementary signatures, we sought to clarify the biological mechanisms involved in recurrence and to develop a prediction model for the recurrence of HCC.

Methods

Patients

All 65 HCC samples were obtained from hepatitis B-positive Korean patients who underwent curative resection at Seoul National University Hospital (Seoul, Korea).

Microarray experiments and data analysis

Total RNA from the HCC samples was isolated and hybridized to Affymetrix HGU133a2 chips according to the manufacturer's instructions (Affymetrix, Santa Clara, CA). The raw data for the 65 samples were normalized using the Robust Multi-array Average (RMA) method3 available at Bioconductor (www.bioconductor.org). Unsupervised and supervised hierarchical clustering of the expression profiles was performed using Cluster and TreeView software.4 Multidimensional scaling (MDS) analysis was also carried out using a Bioconductor package.
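The effect size approach to meta-analysis introduced above can be illustrated with a minimal sketch. It assumes a bias-corrected standardized mean difference (Hedges' g) per gene and per study, combined across studies with a DerSimonian-Laird random-effects model. This is a common formulation of effect size meta-analysis and may differ in detail from the procedure of Choi et al. The function names and the expression values are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Bias-corrected standardized mean difference and its variance
    for one gene in one study (group_a vs. group_b)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    g = d * (1 - 3 / (4 * df - 1))  # small-sample bias correction
    var_g = 1 / n_a + 1 / n_b + g * g / (2 * (n_a + n_b))
    return g, var_g


def combine_random_effects(effects):
    """Combine per-study (g, var_g) pairs for one gene.
    Returns the pooled effect size and its z-score."""
    k = len(effects)
    w = [1 / v for _, v in effects]
    fixed = sum(wi * g for wi, (g, _) in zip(w, effects)) / sum(w)
    # Cochran's Q measures between-study heterogeneity
    q = sum(wi * (g - fixed) ** 2 for wi, (g, _) in zip(w, effects))
    tau2 = 0.0
    if k > 1:
        c = sum(w) - sum(wi * wi for wi in w) / sum(w)
        tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (v + tau2) for _, v in effects]
    mu = sum(wi * g for wi, (g, _) in zip(w_star, effects)) / sum(w_star)
    z = mu / math.sqrt(1 / sum(w_star))
    return mu, z


# Hypothetical log2 expression values of one gene (ER vs. non-ER) in two studies
study_1 = hedges_g([8.1, 8.4, 7.9, 8.6], [7.2, 7.0, 7.5, 7.1])
study_2 = hedges_g([6.9, 7.3, 7.1], [6.2, 6.5, 6.0, 6.4])
mu, z = combine_random_effects([study_1, study_2])
print(f"combined effect size = {mu:.2f}, z = {z:.2f}")
```

Because each study contributes a unitless effect size rather than raw intensities, a gene with a modest but consistent shift in both datasets can still reach a large combined z-score.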
In addition, we performed a meta-analysis with the data of Iizuka et al., who previously proposed early recurrence-related genes of HCC.1 The dataset of ER (n=20) and non-ER (n=40) samples was downloaded from the author's website. In order to avoid the effect of sample size, we randomly selected 15 ER samples and 25 non-ER samples. The meta-analysis was performed using the effect size method previously introduced by Choi et al.2

Results

1. Unsupervised clustering

We attempted unsupervised clustering to find similarities in gene expression patterns among the 65 HCC samples. Genes were further filtered by the criterion that more than 20% of the un-logged expression values be at least 100, and unsupervised clustering was carried out on the 4,775 retained genes. Ten of the 15 ER samples were co-clustered, and the remaining five were assigned to the other cluster (accuracy rate 75%). These results suggest that the ER samples had an overall similarity of expression patterns and were readily distinguished from the non-ER samples.

2. Supervised clustering of recurrence related expression profiles

Next, we selected classifier genes from the expression profiles of 15 ER and 25 non-ER samples. Permutation P-values of significant genes were computed based on 10,000 random permutations, which yielded 348 unique ER genes (193 up-regulated and 155 down-regulated). Hierarchical clustering of the dataset showed that the ER samples were largely classified into the same cluster (accuracy rate, 89.4%) (Figure 1A). Similarly, multidimensional scaling (MDS) analysis showed that the ER samples were well separated, although the non-ER samples were not separated from the ER samples in Euclidean space. To validate the class prediction, we applied a cross-validation approach with different algorithms, including CCP, LDA, 1-NN, 3-NN, NC, and SVM, which showed prediction accuracy rates of 75 to 80%.
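The permutation step used to select the ER genes can be sketched for a single gene as follows. This is a minimal illustration only. It assumes an unequal-variance (Welch) two-sample t-statistic and a two-sided test, neither of which is specified in the text, and the function names and expression values are hypothetical.

```python
import math
import random


def _mean_var(x):
    """Sample mean and unbiased sample variance."""
    m = sum(x) / len(x)
    v = sum((xi - m) ** 2 for xi in x) / (len(x) - 1)
    return m, v


def t_statistic(a, b):
    """Two-sample t-statistic (Welch form, an assumption of this sketch)."""
    mean_a, var_a = _mean_var(a)
    mean_b, var_b = _mean_var(b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation P-value: the fraction of random relabelings
    whose |t| is at least as large as the observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(t_statistic(perm_a, perm_b)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical log2 expression values of one gene
er = [7.9, 8.3, 8.1, 8.6, 8.0]
non_er = [7.1, 7.4, 6.9, 7.2, 7.3]
print(permutation_p_value(er, non_er))
```

In a genome-wide analysis the same procedure would be repeated for every gene, and the resulting P-values would then be thresholded to obtain the signature.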
The statistical significance of the misclassification rate was determined by a leave-one-out cross-validation test (P=0.043, 0.04, 0.013, 0.019, 0.027, and 0.0012, respectively).

Figure 1. Comparison of ER and HR signatures. (A) Heatmap view of the hierarchical clustering result for the 40 ER and non-ER samples. (B) Hierarchical clustering of all 65 samples with the HR gene set. (C) Kaplan-Meier analysis of recurrence time for the subclasses classified by ER and HR genes, respectively. The color bar under the hierarchical tree shows the sample information. Red: ER samples (recurred within 1 year after curative resection). Dark blue: non-ER samples (no recurrence within 1 year, with more than 1 year of follow-up). Grey: unclassified samples, censored with less than 1 year of follow-up. The dichotomous classification of samples is indicated in dark cyan and dark red.

To identify key functional elements that govern the gene expression, we applied gene identification methods on the basis of functional enrichment analysis. Recently, several methods for analyzing gene set enrichment in the context of ontology information have been proposed, and they are thought to provide more robust and interpretable information.5,6 Therefore, we focused on the genes within the significantly enriched functional categories, rather than on genes identified by single-gene analysis. This approach may be more suitable for understanding the pattern changes and complex systems that underlie the alterations of molecular pathogenesis. Of the 348 ER genes, functional enrichment analysis revealed that genes related to transcription, immune or inflammatory response, GTPase activity, blood coagulation, and cell motility were significantly enriched (Table 1).
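As an illustration of how the enrichment of a functional category can be scored, the sketch below uses a one-sided hypergeometric (over-representation) test. This is an assumption: the text does not state which enrichment statistic was used, and the methods cited above may differ. Apart from the 348 ER genes, all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_category, n_selected, n_overlap):
    """Probability of seeing at least n_overlap category genes among
    n_selected genes drawn without replacement from n_universe genes,
    of which n_category belong to the category."""
    upper = min(n_category, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# 348 ER genes (from the text); the universe size, category size,
# and overlap below are hypothetical numbers for illustration only.
p = enrichment_p_value(n_universe=12000, n_category=150,
                       n_selected=348, n_overlap=12)
print(f"enrichment P = {p:.3g}")
```

A small P-value indicates that the category contains more signature genes than would be expected by chance. When many categories are tested, the P-values would normally need a multiple-testing correction.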
Remarkably, GTPase-related functions, including small GTPase mediated signal transduction (P=0.004) and regulation of GTPase activity (P=0.007), were enriched. A number of GTPase families have key regulatory roles in Ras signaling. Ras