Identification of Four Hub Genes As Promising Biomarkers to Evaluate
Total Page:16
File Type:pdf, Size:1020Kb
Chen et al. Cancer Cell Int (2020) 20:270 https://doi.org/10.1186/s12935-020-01361-1 Cancer Cell International PRIMARY RESEARCH Open Access Identifcation of four hub genes as promising biomarkers to evaluate the prognosis of ovarian cancer in silico Jingxuan Chen1,2†, Yun Cai3†, Rui Xu1,2†, Jiadong Pan4, Jie Zhou5* and Jie Mei2,4* Abstract Background: Ovarian cancer (OvCa) is one of the most fatal cancers among females in the world. With growing numbers of individuals diagnosed with OvCa ending in deaths, it is urgent to further explore the potential mecha- nisms of OvCa oncogenesis and development and related biomarkers. Methods: The gene expression profles of GSE49997 were downloaded from the Gene Expression Omnibus (GEO) database. Weighted gene co-expression network analysis (WGCNA) was applied to explore the most potent gene modules associated with the overall survival (OS) and progression-free survival (PFS) events of OvCa patients, and the prognostic values of these genes were exhibited and validated based on data from training and validation sets. Next, protein–protein interaction (PPI) networks were built by GeneMANIA. Besides, enrichment analysis was conducted using DAVID website. Results: According to the WGCNA analysis, a total of eight modules were identifed and four hub genes (MM > 0.90) in the blue module were reserved for next analysis. Kaplan–Meier analysis exhibited that these four hub genes were signifcantly associated with worse OS and PFS in the patient cohort from GSE49997. Moreover, we validated the short-term (4-years) and long-term prognostic values based on the GSE9891 data, respectively. Last, PPI networks analysis, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed several poten- tial mechanisms of four hub genes and their co-operators participating in OvCa progression. Conclusion: Four hub genes (COL6A3, CRISPLD2, FBN1 and SERPINF1) were identifed to be associated with the prog- nosis in OvCa, which might be used as monitoring biomarkers to evaluate survival time of OvCa patients. Keywords: Ovarian cancer, WGCNA, Bioinformatic analysis, Prognosis Background Ovarian cancer (OvCa) is a common cancer which has the highest morbidity and quietly poor prognosis among gynecological malignancies worldwide. In United States, OvCa causes approximately 14 thousands death patients *Correspondence: [email protected]; [email protected] †Jingxuan Chen, Yun Cai and Rui Xu contributed equally to this in 2018 [1]. With the continuous improvement of com- manuscript prehensive therapy, patients with early stage OvCa seem 2 Cytoskeleton Research Group & First Clinical Medicine College, Nanjing to have satisfactory prognosis that the 5-year survival Medical University, No. 101 Longmian Road, Nanjing 211166, China 5 Department of Gynecology and Obstetrics, Afliated Wuxi Maternal rate reaching 93%. Nevertheless, since the majority of and Child Health Hospital of Nanjing Medical University, No.48, Huaishu patients, precisely more than 80%, would be hard to be Road, Wuxi 214023, China diagnosed until the tumor at FIGO stage III or stage IV, Full list of author information is available at the end of the article © The Author(s) 2020. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creat iveco mmons .org/publi cdoma in/ zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Chen et al. Cancer Cell Int (2020) 20:270 Page 2 of 11 leading to a considerable number of mortality [2]. Hence, increasing number of researchers focus on this horrible disease and attempt to explore novel procedures for early diagnosis and treatment. However, early diagnostic strat- egies and reliable models to guide therapy and evaluate prognosis have been lacking up to now. In the past decades, vigorously developing computer technology has largely promoted the fourish of big data applications. As an emerging biomedical auxiliary research technology, bioinformatics analysis has been widely applied to several aspects of clinical or basic medi- cal research. Weighted gene co-expression network anal- ysis (WGCNA) is a systematic biological method which describes the pattern of gene association between dif- ferent samples [3]. Researchers utilize WGCNA to iden- tify gene sets of interest with information of thousands of signifcantly altered genes or all genes, and then per- form signifcant association analyses with phenotypes. At present, several laboratories have applied this technol- ogy into their researches [4–6]. Meanwhile, in the feld of cancer research, investigators tend to take advantage of WGCNA for systematic analysis of phenotypes, espe- Fig. 1 Flow chart of the research. The gene expression profles of cially for developing novel prognostic models [7, 8]. GSE49997 were downloaded from the GEO database. WGCNA was applied to investigate potential biomarkers associated with the OS GSE49997, a microarray containing 204 OvCa samples, and PFS events. Besides, the short-term and long-term prognostic was contributed by Dietmar et al. in 2014 [9]. Dietmar value of hub genes was validated based on data from GSE9891. In et al. validated the prognostic impacts of a molecular addition, the PPI networks were constructed by GeneMANIA and subtype in OvCa on overall survival (OS) and progres- enrichment analysis was further conducted to reveal the potential sion-free survival (PFS) based on this microarray [10]. mechanisms of four hub genes and their cooperator participating in OvCa progression Given the integrated follow-up information and gene expression data in this microarray, we re-assessed the above-mentioned data and fnally appraised four poten- tial biomarkers predicting prognosis of OvCa patients pre-processed by background correction, quantile nor- through WGCNA analysis. We further conducted veri- malization and probe summarization. After matching the fcation analysis on prognostic values of four genes gene expression data and survival information, 194 sam- through another microarray, GSE9891. In conclusion, we ple from GSE49997 were retained in the current research. found out four genes (COL6A3, CRISPLD2, FBN1 and For further WGCNA analysis, the top 25% diferent SERPINF1) associated poor prognosis, suggesting these expression genes (DEGs) from GSE49997 dataset accord- genes function as potential biomarkers to evaluate the ing to analysis of variance (3,837 genes) were retained. prognosis of OvCa patients. Co‑expression network construction and identifcation Materials and methods of hub genes Acquisition of microarray data and pre‑process After pre-processing the GSE49997 microarray data, the Te workfow of our research was summarized in Fig. 1. expression profle of these 3,837 genes was sent to con- Te array profles of GSE49997 (https ://www.ncbi.nlm. struct a gene co-expression network using the WGCNA nih.gov/geo/query /acc.cgi?acc=GSE49 997) [9] were package in R language [12]. Te idea of a soft threshold is downloaded from the Gene Expression Omnibus (GEO) to continually elementize the elements in the Adjacency database. Besides, four hub gene expression data and sur- Matrix through a weight function and the choice of the vival data corresponding to GSE9891 were downloaded soft threshold β is bound to afect the result of module from the Kaplan–Meier Plotter (http://kmplo t.com/ identifcation. To create a network with a nearly scale- analy sis/) [11]. Te basic clinic-pathological features of free topology, we installed the soft threshold power of 2 two microarrays were described in Table 1 according to β = 3 (scale free R = 0.868). Adjacency matrices were original publications of GSE49997 and GSE9891 [9, 10]. calculated and transformed into the topological over- Before further analysis, array profles of GSE49997 were lap matrix (TOM). Te dynamic tree cut algorithm was Chen et al. Cancer Cell Int (2020) 20:270 Page 3 of 11 Table 1 The basic clinic-pathological features of OvCa Construction of protein–protein interaction network patients in two datasets Protein–protein interaction (PPI) network was been Characteristics n Cases (%) constructed by GeneMANIA (https ://genem ania.org/) GSE49997 GSE9891 [13], an online server that explore interconnections between proteins in term of physical interaction, co- Histologic subtype expression, predicted, co-localization, common path- Serous 435 171 (88.1%) 264 (92.6%) way, genetic interaction and shared protein domains. In Non-serous 44 23 (11.9%) 21 (7.4%) this research, GeneMANIA was used for PPI analysis of FIGO stage four hub genes at the gene level. Stage 1 24 0 (0%) 24 (8.4%) Stage 2 27 9 (4.6%) 18 (6.3%) Gene Ontology and Kyoto Encyclopedia of Genes Stage 3 371 154 (79.4%) 217 (76.1%) and Genomes analysis Stage 4 53 31 (16.0%) 22 (7.7%) Unknown 4 0 (0%) 4 (1.4%) Te Database for Annotation, Visualization and Inte- Grade grated Discovery (DAVID, https ://david .ncifc rf.gov/) Grade 1 30 11 (5.7%) 19 (6.7%) [14] was applied to perform Gene Ontology (GO) and Grade 2 136 39 (20.1%) 97 (34.0%) Kyoto Encyclopedia of Genes and Genomes (KEGG) Grade 3 307 143 (73.7%) 164 (57.5%) analyses of four hub genes and their most relevant Unknown 6 1 (0.5%) 5 (1.8%) cooperators.