Identification of Key Candidate Genes and Pathways in Hepatocellular Carcinoma by Integrated Bioinformatical Analysis
Total Page:16
File Type:pdf, Size:1020Kb
4932 EXPERIMENTAL AND THERAPEUTIC MEDICINE 15: 4932-4942, 2018 Identification of key candidate genes and pathways in hepatocellular carcinoma by integrated bioinformatical analysis TONGHAI XING1, TINGMANG YAN2 and QIANG ZHOU1 Departments of 1General Surgery and 2Urology, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200080, P.R. China Received October 30, 2017; Accepted March 13, 2018 DOI: 10.3892/etm.2018.6075 Abstract. Hepatocellular carcinoma (HCC) is one of the most high in China (2) on account of hepatitis B (HBV) and C (HCV) common malignant neoplasms worldwide, however the under- infection. However, in the western world, HCC is primarily due lying mechanisms and gene signatures of HCC are unknown. to non-alcoholic fatty liver disease (NAFLD) (3,4). The aggres- In the present study the profile datasets of four cohorts were sive nature of this malignancy combined with the limited integrated to elucidate the pathways and candidate genes therapeutic options result in poor prognosis (1,5). Unfortunately, of HCC. The expression profiles GSE25097, GSE45267, HCC has a high recurrence rate even with successful surgical GSE57957 and GSE62232 were downloaded from the Gene removal because of the insensitivity of HCC to chemotherapy Expression Omnibus database, including 436 HCC and 94 agents and radiotherapy (6). Although there have been exten- normal liver tissues. A total of 185 differentially expressed sive previous studies on the molecular mechanism of HCC genes (DEGs) were identified in HCC, including 92 upregu- formation and progression, the causes of HCC are not yet clear. lated genes and 92 downregulated genes. Gene ontology (GO) Hence, owing to the high morbidity and mortality in HCC, was performed, which revealed that the upregulated DEGs it is very important to determine the underlying molecular were primarily enriched in cell division, mitotic nuclear divi- mechanisms and pathological processes and identify suitable sion, mitotic cytokinesis and G1/S transition of the mitotic molecular biomarkers for early HCC diagnosis. cell cycle. Pathway enrichment was analyzed based on the While HCC is a highly heterogeneous tumor, with different Kyoto Encyclopedia of Genes and Genomes database to genetic compositions and multifaceted molecular pathogen- assess the functional relevance of DEGs. The most significant esis (7,8), it is also a slow process during which plenty of genomic module was selected from protein-protein interactions and alterations accumulate and change the hepatocellular pheno- 15 important hub genes were identified. The sub‑networks of type, leading to cellular intermediates and multiple monoclonal hub genes were involved in cell division, p53 signaling, and that evolve into HCC (9,10). Microarray technology has become T lymphotropic virus type I infection signaling pathways. In an indispensable tool to monitor genome wide expression levels conclusion, the present study revealed that the identified DEG of genes in a given organism. Gene Expression Omnibus (GEO) candidate genes may promote the understanding of the cause and The Cancer Genome Atlas (TCGA) are two public func- and molecular mechanisms underlying the development of tional genomics data repositories accepting data from diverse HCC and that these candidates and signal pathways may be microarray platforms (11). Nowadays, gene expression studies potential targets of clinical therapy for HCC. by microarray have been performed to uncover molecular variations in HCC (12-14). Pathway analysis has shown several Introduction important cellular signaling pathway alterations to be linked with the main pathogenic mechanism (8). A small number Hepatocellular carcinoma (HCC) is one of the most common of hub genes (proteins) were identified as key modulators in malignant neoplasms worldwide (1). Its incidence is particularly HCC metastasis by using protein-protein interaction (PPI) network. However, the most significantly dysregulated genes from previous studies are inconsistent because of the sample heterogeneity in independent studies, small sample size, and different statistical methods. Correspondence to: Dr Qiang Zhou, Department of General In the present study, four original microarray datasets Surgery, Shanghai General Hospital, School of Medicine, Shanghai Jiao Tong University, 100 Haining Road, Shanghai 200080, GSE25097, GSE45267, GSE57957, and GSE62232 P. R. Ch i na were downloaded from the NCBI-Gene Expression E-mail: [email protected] Omnibus database (NCBI-GEO) (available online: https://www.ncbi.nlm.nih.gov/geo/), and included 436 HCC and Key words: hepatocellular carcinoma, bioinformatical analysis, 94 normal liver tissues. The microarray data was normalized differentially expressed genes and preprocessed using the robust multi-array average algorithm (RMA). The differentially expressed genes (DEGs) were obtained using the limma package. Venny was applied to XING et al: IDENTIFICATION OF KEY GENES AND PATHWAYS IN HEPATOCELLULAR CARCINOMA 4933 filter the overlapping DEGs among groups. Gene ontology (GO) (available online: http://string-db.org) was employed to and pathway enrichment analysis were also performed for develop DEGs-encoded proteins and protein-protein interac- annotation and visualization with DAVID and KEGG PATHWAY tion network (PPI). Second, the Cytoscape software (17) was (Available online: http://www.genome.jp/kegg). We used the utilized to construct a protein interaction relationship network STRING online database (Available online: http://string-db.org) and analyze the interaction relationship of the candidate to annotate functional interactions of DEGs. The most significant DEGs encoding proteins in colon cancer. Third, the Network module was selected from PPI, and 15 important hub genes Analyzer plug-in was used to calculate the node degree, i.e., were identified. The sub‑networks of hub genes were involved the numbers of inter‑connections to filter the hub genes of PPI. in cell division, p53 signaling and HTLV-I infection signaling The corresponding proteins in the central nodes might be the pathways. core proteins and key candidate genes that have important physiological regulatory functions. The cBio Cancer Genomics Materials and methods Portal (http://cbioportal.org) is an open platform for exploring multidimensional cancer genomics data by encapsulating Microarray data information and data preprocessing. The molecular profiling data obtained from cancer tissue and cell microarray data from GSE25097, GSE45267, GSE57957, lines into readily understandable genetic, epigenetic, gene and GSE62232 were downloaded from the GEO database expression, and proteomic events. Complex cancer genomic (http://www.ncbi.nlm.nih.gov/geo/). The microarray data of profiles can be easily accessed using the query interface of GSE25097 was based on GPL10687 platforms (Rosetta/Merck the portal enabling researchers to explore and compare genetic Human RSTA Affymetrix 1.0 microarray, Custom CDF; alterations across samples. The underlying data thus obtained Affymetrix Inc., Santa Clara, CA, USA). The microarray data can be linked to clinical outcomes to facilitate novel discovery of GSE45267 was based on GPL570 platforms (HG-U133_ in biological systems. Through use of the portal search func- Plus_2, Affymetrix Human Genome U133 Plus 2.0 Array). tion, identified DEG candidate genes are classified as altered The microarray data of GSE57957 was based on GPL10558 or not altered. The genomics datasets are then presented using platforms (Illumina HumanHT-12 v4.0 expression beadchip). OncoPrint as heatmaps-a visually appealing display of altera- The microarray data of GSE62232 was based on GPL570 plat- tions in gene arrays across tumor samples. forms (HG-U133_Plus_2, Affymetrix Human Genome U133 Plus 2.0 Array). We chose these four datasets for integrated Results analysis, because they represented different racial popula- tions. The microarray data was preprocessed using the RMA Identification of DEGs in HCC. NCBI-GEO is a public func- with the Affy and limma package in Bioconductor (v1.46.1; tional genomics data repository accepting data from diverse http://www.bioconductor.org/). Background correction, microarray platforms, from which gene expression profiles of normalization, and calculation of expression were all included HCC and normal or adjacent liver tissue from the GSE25097, in the preprocessing stage. The microarray data probe was GSE45267, GSE57957, and GSE62232 databases were transformed to gene symbols with Bioconductor Annotation obtained. The microarray data of GSE25097 had 268 HCC Data software packages. If several probes were mapped to one tissues and 6 normal liver tissues (18); the GSE62232 data had gene symbol, the mean value was set as the final expression 81 HCC tissues and 10 normal liver tissues (12); the GSE45267 value of this gene. The DEGs in every group were analyzed data included 48 HCC tissues and 39 healthy liver tissues; and using the limma package. In Bioconductor. The DEG P-values the GSE57957 data had 39 pairs of HCC tissues and matched were calculated using the unpaired or paired Student's t-test paraneoplastic tissue (19). Using P<0.01 and [logFC]>1 as provided by the limma package. P<0.01 and [logFC]>1 were cut-off criterion, we extracted 1072, 1850, 4055, and 1876 DEGs used as cut-off criterion for DEGs. Hierarchical clustering from the expression profile datasets GSE25097, GSE45267, analysis of the DEGs was then performed and visualized GSE57957, and GSE62232, respectively.