Identification of Four Novel Prognosis Biomarkers and Potential Therapeutic Drugs for Human Colorectal Cancer by Bioinformatics Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Available online at www.jbr-pub.org.cn Open Access at PubMed Central The Journal of Biomedical Research, 2021 35(1): 21–35 Original Article Identification of four novel prognosis biomarkers and potential therapeutic drugs for human colorectal cancer by bioinformatics analysis Zhen Sun1,2, Chen Liu1, Steven Y. Cheng1,3,✉ 1Department of Medical Genetics, 2Department of Pathology and Pathophysiology, 3Jiangsu Key Lab of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing, Jiangsu 211166, China. Abstract Colorectal cancer (CRC) is one of the most deadly cancers in the world with few reliable biomarkers that have been selected into clinical guidelines for prognosis of CRC patients. In this study, mRNA microarray datasets GSE113513, GSE21510, GSE44076, and GSE32323 were obtained from the Gene Expression Omnibus (GEO) and analyzed with bioinformatics to identify hub genes in CRC development. Differentially expressed genes (DEGs) were analyzed using the GEO2R tool. Gene ontology (GO) and KEGG analyses were performed through the DAVID database. STRING database and Cytoscape software were used to construct a protein-protein interaction (PPI) network and identify key modules and hub genes. Survival analyses of the DEGs were performed on GEPIA database. The Connectivity Map database was used to screen potential drugs. A total of 865 DEGs were identified, including 374 upregulated and 491 downregulated genes. These DEGs were mainly associated with metabolic pathways, pathways in cancer, cell cycle and so on. The PPI network was identified with 863 nodes and 5817 edges. Survival analysis revealed that HMMR, PAICS, ETFDH, and SCG2 were significantly associated with overall survival of CRC patients. And blebbistatin and sulconazole were identified as candidate drugs. In conclusion, our study found four hub genes involved in CRC, which may provide novel potential biomarkers for CRC prognosis, and two potential candidate drugs for CRC. Keywords: colorectal cancer, Gene Expression Omnibus, biomarkers, bioinformatics analysis Introduction deaths were reported each year in the world[1]. With the understanding of pathophysiology of the disease, Nowadays, colorectal cancer (CRC) is one of the different treatment options to improve the survival most deadly cancers and almost 900 000 CRC-related rates of CRC patients have been developed in the ✉Corresponding author: Steven Y. Cheng, Department of Medical 25 August 2020; Published online: 22 October 2020 Genetics, Jiangsu Key Lab of Cancer Biomarkers, Prevention and CLC number: R735.3, Document code: A Treatment, Collaborative Innovation Center for Cancer The authors reported no conflict of interests. Personalized Medicine, Nanjing Medical University, 101 Longmian This is an open access article under the Creative Commons Attribu- Avenue, Nanjing, Jiangsu 211166, China. Tel/Fax: +86-25- tion (CC BY 4.0) license, which permits others to distribute, remix, 86869463, E-mail: [email protected]. adapt and build upon this work, for commercial use, provided the Received: 13 February 2020; Revised: 12 August 2020; Accepted: original work is properly cited. © 2021 by the Journal of Biomedical Research. https://doi.org/10.7555/JBR.34.20200021 22 Sun Z et al. J Biomed Res, 2021, 35(1) world. The 5-year survival rate of CRC patients was between CRC and normal tissues, we searched the >90% when the patients were diagnosed at early NCBI-GEO database to collect enough and adequate stages[2]. However, due to lacking early detection tissues. A total of 4 GEO datasets were selected, methods, many CRC patients were diagnosed at an including GSE113513, GSE21510, GSE44076, and advanced stage or in the metastasis status. And the 5- GSE32323. These mRNA profiles were based on year survival rate for those diagnosed with metastasis platform GPL15207 (GSE113513), GPL570 [3] was at approximately 12% . Recently, a new kind of (GSE21510 and GSE32323), and GPL13667 analysis method has been used to identify the (GSE44076). A total of 253 CRC samples and 203 differential expression genes between CRC and normal samples were chosen for this study, including normal tissues based on the high-throughput 14 pairs of cancer and normal samples in GSE113513, sequencing platforms, such as microarrays. This is a promising tool with extensive clinical applications, 124 CRC samples and 24 normal samples in including molecular diagnosis, prognosis prediction, GSE21510, 98 pairs of cancer and normal samples new drug targets discovery, etc[4–6]. Furthermore, plus 50 healthy donor tissues in GSE44076, and 17 microarray assay combining bioinformatics analysis pairs of cancer and normal samples in GSE32323. made it possible to analyze the gene expression on Identification of DEGs and data preprocessing mRNA level in CRC progression. For example, several studies have used this method to identify key To identify the DEGs, we used the NCBI-GEO2R genes in CRC development through comparing with online tool to analyze these datasets. Subsequently, normal samples, and showed that the key genes were adjusted P-value <0.05 and |log2(fold change)| >1 were involved in different signal pathways, biological set as the cutoff criteria to screen the significant DEGs processes, and molecular functions[7–12]. of each dataset. Finally, Venn diagrams were However, with a relatively limited degree of performed to get the overlap significant DEGs of the overlap, we still can not find reliable biomarkers or 4 datasets. drug targets. Therefore, the discovery of novel biomarkers for early detection and prognosis GO and KEGG pathway analyses of DEGs prediction of CRC is urgently required. To find the biological functional roles of DEGs, GO In the present study, we targeted to find key genes and KEGG pathway analyses were performed through to develop novel biomarkers or drug targets for CRC. DAVID database. Significant results of molecular Therefore, we chose four Gene Expression Omnibus function (MF), biological process (BP), cellular (GEO) datasets, GSE113513, GSE21510, GSE44076, component (CC), and biological pathways were and GSE32323, and used bioinformatics methods selected with P-value <0.05. to screen the significant differentially expressed genes (DEGs) between CRC tissues and normal PPI network construction and module analysis tissues. Gene ontology (GO) and KEGG Pathway The DEGs profiles were submitted to STRING analyses were used to find the biological roles of these database for exploring their potential interactions. DEGs through DAVID database. Furthermore, the PPI The interactions with a combined score >0.4 were network of DEGs was constructed and key modules or considered significant. Subsequently, the interaction hub genes were selected with Molecular Complex Detection (MCODE) plugin of Cytoscape software. files were downloaded and imported into Cytoscape And the clinical significance was validated by GEPIA software to construct the PPI network. The MCODE database. Finally, small active candidate molecules plugin was used to find key modules of the whole PPI were identified to develop new drugs through network with a degree cutoff=2, node score Connectivity Map (CMap) database. In brief, we cutoff=0.2, K-core=2, and max depth=100. The found four hub genes involved in CRC, which may hub genes were then selected with connectivity provide novel potential biomarkers for CRC degree >10. Furthermore, KEGG pathway analyses of prognosis, and two potential candidate drugs for CRC. the significant modules were performed with P-value <0.05. Materials and methods Analysis and validation of hub genes Data resources To verify the hub genes we found, we used GEPIA To explore the differential gene expression profiles database to analyze their expression and clinical Novel prognosis biomarkers and drugs for colorectal cancer 23 prognostic information in 270 CRC patients. And the edges (Fig. 3). Based on degree scores using the survival curve, stage analysis and box plot were MCODE plugin, two key modules were detected from performed to show the clinical implications of the whole PPI network complex. Module 1 contained hub genes. 61 nodes and 1648 edges, and DEGs were enriched in cell cycle, oocyte meiosis, progesterone-mediated Identification of small molecules oocyte maturation, DNA replication and p53 signaling To find potential small active molecules to develop pathway (Fig. 4A and B). Module 2 had 55 nodes and new drugs for treating CRC, we uploaded DEGs 625 edges, and these DEGs were enriched in probe profiles into the CMap database. This database chemokine signaling pathway, ribosome biogenesis in can help to predict small molecules that induce or eukaryotes, cytokine-cytokine receptor interaction, reverse gene expression signature with a score from pathways in cancer, purine metabolism, RNA −1 to 1. And the molecules which value from 0 closer polymerase, retrograde endocannabinoid signaling, TNF signaling pathway, legionellosis, regulation of to −1 were functioned as reversing the cancer cell status. lipolysis in adipocytes, NOD-like receptor signaling Results pathway, cytosolic DNA-sensing pathway and gastric acid secretion (Fig. 4C and D). Additionally, the top Identification of DEGs in CRC 20 hub genes, CDK1, CCNB1, MYC, CCNA2, MAD2L1, AURKA, TOP2A, CDC6, UBE2C, CHEK1, Analyzed with the GEO2R online tool, a total of RRM2, BUB1B, TTK, TRIP13, TPX2, BUB1, NCAPG, 1763, 4411, 2428, and 2276 DEGs were extracted KIF2C, KIF23, and MCM4 were identified with from GSE113513,