A Gene-Based Risk Score Model for Predicting Recurrence-Free Survival
Total Page:16
File Type:pdf, Size:1020Kb
Wang et al. BMC Cancer (2021) 21:6 https://doi.org/10.1186/s12885-020-07692-6 RESEARCH ARTICLE Open Access A gene-based risk score model for predicting recurrence-free survival in patients with hepatocellular carcinoma Wenhua Wang1,2, Lingchen Wang1,2, Xinsheng Xie3, Yehong Yan4, Yue Li1,2 and Quqin Lu1,2* Abstract Background: Hepatocellular carcinoma (HCC) remains the most frequent liver cancer, accounting for approximately 90% of primary liver cancers worldwide. The recurrence-free survival (RFS) of HCC patients is a critical factor in devising a personal treatment plan. Thus, it is necessary to accurately forecast the prognosis of HCC patients in clinical practice. Methods: Using The Cancer Genome Atlas (TCGA) dataset, we identified genes associated with RFS. A robust likelihood-based survival modeling approach was used to select the best genes for the prognostic model. Then, the GSE76427 dataset was used to evaluate the prognostic model’s effectiveness. Results: We identified 1331 differentially expressed genes associated with RFS. Seven of these genes were selected to generate the prognostic model. The validation in both the TCGA cohort and GEO cohort demonstrated that the 7-gene prognostic model can predict the RFS of HCC patients. Meanwhile, the results of the multivariate Cox regression analysis showed that the 7-gene risk score model could function as an independent prognostic factor. In addition, according to the time-dependent ROC curve, the 7-gene risk score model performed better in predicting the RFS of the training set and the external validation dataset than the classical TNM staging and BCLC. Furthermore, these seven genes were found to be related to the occurrence and development of liver cancer by exploring three other databases. Conclusion: Our study identified a seven-gene signature for HCC RFS prediction that can be used as a novel and convenient prognostic tool. These seven genes might be potential target genes for metabolic therapy and the treatment of HCC. Keywords: TCGA, Hepatocellular carcinoma, Recurrence-free survival, Risk score, Prognostic model * Correspondence: [email protected] 1Jiangxi Provincial Key Laboratory of Preventive Medicine, Nanchang University, Nanchang 330006, Jiangxi, China 2Department of Biostatistics and Epidemiology, School of Public Health, Nanchang University, Nanchang 330006, Jiangxi, China Full list of author information is available at the end of the article © The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Wang et al. BMC Cancer (2021) 21:6 Page 2 of 15 Background prognosis of liver cancer remain poor with a 2-year re- In 2018, liver cancer remained among the top six preva- currence rate of 76.9% [6–8]. And many studies have lent carcinomas. There were 841,080 new patients, and shown that HCC is the most difficult to cure cancer, and 781,631 patients died of liver cancer according to the because of this, HCC has been described as a “chemore- Global Cancer Statistics [1, 2]. Hepatocellular carcinoma sistant” tumor [9]. Because of this, the prognosis of HCC (HCC) is the most frequent liver cancer, accounting for is poor. The recurrence-free survival (RFS) of HCC pa- approximately 90% of primary liver cancers [3]. Cur- tients is a critical factor in devising a personal treatment rently, Hepatectomy and Radiofrequency ablation are plan [10]. Thus, it is necessary to accurately forecast the main two ways to treat HCC [4, 5]. Despite the con- HCC patients’ prognosis to improve the prognosis of tinuous development of medical technology, the out- HCC. Most previous studies constructed prognostic come of many patients who receive treatment and the models using the Tumor-Node-Metastasis (TNM) Fig. 1 GO functional and KEGG pathway analyses. a Summary of the differentially expressed genes and GO pathway enrichment. Red, blue, and green bars represent the biological process, cellular component, and molecular function categories, respectively. The height of the bar represents the number of differentially expressed genes observed in each category. b The top 10 pathways of genes associated with RFS Wang et al. BMC Cancer (2021) 21:6 Page 3 of 15 staging system to assess the prognosis of HCC patients Identification of the best genes for modeling [11]. However, the TNM staging system does not predict A robust likelihood-based survival approach was used to the prognosis of HCC. Therefore, it is important to de- identify the best genes for modeling after determining velop a reliable tool for clinicians to predict the progno- the genes associated with RFS [22]. We used the sis of patients with HCC. “rbsurv” package in R to complete this modeling Given the remarkable advances in high-throughput process. technologies, the development of The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/) and the Construction and validation of the risk score system intergovernmental Gene Expression Omnibus (GEO) A multivariate Cox regression analysis and “rbsurv” ana- (https://www.ncbi.nlm.nih.gov/gds) database provides an lysis were performed to identify the genes related to RFS abundance of high-quality information regarding HCC and construct the prognostic gene signature. The “survi- [12]. Hence, it is urgent to develop methods to identify valROC” package in R was used to investigate the time- reliable therapeutic gene targets that could enable earlier dependent prognostic value. The optimal cut-off values prognostic evaluation and better therapeutic strategies based on ROC curves were obtained to classify the pa- [13]. Therefore, we considered whether we could build a tients into low-risk groups and high-risk groups. A cali- gene-based risk score model [14]. Our goal was to gen- bration curve and the concordance index (C-index) were erate simple and effective prognostic tools based on sev- used to evaluate the risk score system. eral genes and other factors that may affect RFS [13, 15]. Using the TCGA dataset, we selected 7 genes by robust likelihood-based survival modeling and built a risk score External validation of the risk score system system [16, 17]. We used an independent dataset We calculated the risk score in the GSE76427 dataset. (GSE76427) to validate the effectiveness of the risk score Then, the AUCs of the 12-month, 15-month, and 18- system and demonstrate that its clinical value in predict- month RFS and Kaplan-Meier curves were used to verify ing RFS in HCC patients is better than that of the TNM the risk score system. A calibration curve was used to staging system. validate the risk score system. In addition, the prognosis-related genes included in the risk score system were verified at the protein level by using The Human Methods Protein Atlas database. The CBioPortal for cancer gen- Data collection and survival analyses omics was used to study genetic alterations in the risk First, we downloaded gene expression profiles and clin- score system [23]. ical information from The Cancer Genome Atlas-liver hepatocellular carcinoma (TCGA-LIHC) dataset, which included 334 HCC samples [18]. We used GSE76427, Statistical analysis which contained the gene expression and clinical infor- The statistical tests were performed using R software mation of 115 HCC samples, as the validation group. and SPSS. Univariate and multivariate Cox regression The samples in TCGA-LIHC and GSE76427 that met analyses were performed using a forward stepwise pro- p the following inclusion criteria were included in this cedure. A -value less than 0.05 was considered statisti- study: all samples had mRNA sequencing data and clin- cally significant [23]. ical information related to RFS [19]. Table 1 The best genes predicting recurrence-free survival of hepatocellular carcinoma patients Identification of genes associated with RFS The raw count data were normalized with a log(a + 1) Gene symbol nloglik AIC Select transformation. Then, using the “survfit” function in the TTK 808.79 1619.59 * “survival” package, we plotted Kaplan-Meier curves for C16orf105 797.58 1599.16 * the high and low expression groups of each gene. A log PPAT 791.22 1588.43 * p rank test with a -value less than 0.05 was considered CD3EAP 788.83 1585.66 * statistically significant [20]. SLCO2A1 787.91 1585.83 * ACAT1 786.25 1584.50 * Enrichment analysis of GO functions and KEGG pathways GAS2L3 784.91 1583.83 * For the selected genes, we used WebGestalt (http:// SH2D5 784.84 1585.68 bioinfo.vanderbilt.edu/webgestalt) based on Gene Ontol- ogy (GO) functions and the Kyoto Encyclopedia of ATP8A2 784.75 1587.50 Genes and Genomes (KEGG) to understand the bio- PABPC5 784.74 1589.49 logical significance of the identified genes [21]. *Gene selected for the risk score Wang et al. BMC Cancer (2021) 21:6 Page 4 of 15 Fig. 2 Analysis of the seven-gene signature of HCC in TCGA dataset. a Risk score of each patient; b The RFS time and RFS status of the HCC patients; c the expression levels of TTK, C16orf105, PPAT, CD3EAP, SLCO2A1, ACAT1 and GAS2L3 in the signature; Kaplan-Meier analysis of the TCGA dataset; d The Kaplan-Meier curve for the risk score model in TCGA dataset Wang et al.