Key Genes Associated with Pancreatic Cancer and Their Association with Outcomes: a Bioinformatics Analysis
Total Page:16
File Type:pdf, Size:1020Kb
MOLECULAR MEDICINE REPORTS 20: 1343-1352, 2019 Key genes associated with pancreatic cancer and their association with outcomes: A bioinformatics analysis JIAJIA WU1*, ZEDONG LI2*, KAI ZENG1, KANGJIAN WU1, DONG XU3, JUN ZHOU2 and LIJIAN XU1 1Department of General Surgery, The Second Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu 210000; 2Department of Minimally Invasive Surgery, The Second Xiangya Hospital, Central South University, Changsha, Hunan 410011; 3Department of General Surgery, Gaochun People's Hospital, Nanjing, Jiangsu 211300, P.R. China Received October 4, 2018; Accepted April 9, 2019 DOI: 10.3892/mmr.2019.10321 Abstract. Pancreatic cancer is a highly malignant neoplastic that the expression of COL17A1 gene may be associated with disease of the digestive system. In the present study, the the occurrence and development of pancreatic cancer. dataset GSE62165 was downloaded from the Gene Expression Omnibus (GEO) database. GSE62165 contained the data of Introduction 118 pancreatic ductal adenocarcinoma samples (38 early-stage tumors, 62 lymph node metastases and 18 advanced tumors) Pancreatic cancer is a highly malignant neoplasm of the diges- and 13 control samples. Differences in the expression levels tive system that accounts for >200,000 deaths/year globally (1). of genes between normal tissues and early-stage tumors were The incidence of pancreatic cancer is low compared with that investigated. A total of 240 differentially expressed genes of lung, breast, colorectal and gastric cancers; however, it is (DEGs) were identified using R software 3.5 (137 upregulated associated with a very high mortality rate. It has been reported genes and 103 downregulated genes). Then, the differentially that the incidence of pancreatic cancer is very similar to the expressed genes were subjected to Gene Ontology and Kyoto associated mortality rate; the reported 5-year survival rate of Encyclopedia of Genes and Genomes analysis. The following patients with pancreatic cancer is <6% (2). The mortality rate 18 core genes were identified using Cytoscape, based on the of patients with pancreatic cancer ranks fourth among common protein-interaction network of DEGs determined using the cancers, and is predicted to rise to second within a decade (3). online tool STRING: EGF, ALB, COL17A1, FN1, TIMP1, A number of factors have been identified as contributing to PLAU, PLA2G1B, IGFBP3, PLAUR, VCAN, COL1A1, PNLIP, the etiopathogenesis of pancreatic cancer, including heredity, CTRL, PRSS3, COMP, CPB1, ITGA2 and CEL. The pathways smoking, high-fat diet, chronic pancreatitis and consump- of the core genes were primarily associated with pancreatic tion of nitrous acid compounds (4). Due to the latency of secretion, protein digestion and absorption, and focal adhesion. pancreatic cancer, the majority of patients are diagnosed at Finally, survival analyses of core genes in pancreatic cancer an advanced stage, when tumor tissue has already infiltrated were conducted using the UALCAN online database. It was the surrounding tissues and has formed distant metastases, revealed that PLAU and COL17A1 were significantly associated decreasing the usefulness of surgical interventions (5). As a with poor prognosis (P<0.05). The expression levels of genes in result of drug resistance, the efficacy of postoperative adjuvant primary pancreatic cancer tissues were then compared; only therapy has also been very unsatisfactory (6). Carbohydrate one gene, COL17A1, was identified to be significantly differen- antigen 19-9 (CA19-9) is the most frequently used marker for tially expressed. Finally, another dataset from GEO, GSE28735, the clinical diagnosis of cancer; the reported sensitivity and was analyzed to verify the upregulated expression of COL17A1. specificity of CA19‑9 for the diagnosis of pancreatic cancer is Taken together, the results of the present study have indicated 69‑93 and 46‑98%, respectively (7). Therefore, early diagnosis and treatment are important to improve the prognosis and survival of patients with pancreatic cancer. At present, high-throughput sequencing is employed in a variety of contexts, such as the discovery of gene mutations Correspondence to: Professor Lijian Xu, Department of General and chromosomal translocations that are closely associ- Surgery, The Second Affiliated Hospital of Nanjing Medical ated with the occurrence and development of tumors (8-10). University, 121 Jiangjiayuan Road, Gulou, Nanjing, Jiangsu 210000, P. R. Ch i na High-throughput sequencing may be useful for the diagnosis of E‑mail: [email protected] cancer and development of targeted therapies. These analyses may provide novel insights to guide subsequent research. *Contributed equally Materials and methods Key words: pancreatic cancer, bioinformatics analysis, prognosis Microarray data. The gene expression profile of GSE62165 (11) was downloaded from the GEO database (12). The data were 1344 WU et al: BIOINFORMATICS ANALYSIS OF PANCREATIC CANCER created using the GPL13667 Affymetrix® Human Genome Table I. Top 20 differentially expressed genes in early-stage U219 array (Affymetrix; Thermo Fisher Scientific, Inc.). pancreatic cancer tissues based on Log2FC. GSE62165 contained data on 118 pancreatic ductal adenocar- cinoma (PDAC) samples and 13 control samples. Data were A, Upregulated genes standardized using the robust multi-array average (RMA) algorithm using limma package (version 3.38.3) (13). In addi- Gene symbol Log2FC Adjusted P-value tion, a separate dataset, GSE28735 (14,15), was used to verify COL1A1 5.9240335 3.17x10-16 the results. The expression profiles included 45 matched pairs -12 of pancreatic tumor and adjacent non-tumor tissues from KRT17 5.5334643 4.10x10 -09 45 patients with PDAC. The Cancer Genome Atlas (TCGA; CEACAM5 5.3990511 1.66x10 -13 http://cancergenome.nih.gov/) contains genomic sequencing S100P 5.2665291 1.66x10 ‑17 data involving 33 species of cancer. COL10A1 5.2258147 1.17x10 SERPINB5 5.1642588 5.50x10-15 Identification of differentially expressed genes (DEGs). The GJB2 5.0747799 7.98x10-18 limma package (version 3.38.3) (13) was used to identify DEGs COL17A1 5.0501325 1.80x10-11 between pancreatic cancer tissue and normal pancreatic tissue CXCL5 5.0384043 1.28x10-11 samples in R software (version 3.5; https://www.R‑project.org). TMPRSS4 5.0203823 2.37x10-16 |log2 Fold Change (FC)|>3.0 and adjusted P-value <0.05 were SDR16C5 4.9961337 3.16x10-16 considered to be the threshold for differential gene identification. CTHRC1 4.9626426 7.77x10-20 COL11A1 4.9350078 6.41x10‑17 Gene Ontology (GO) and Kyoto Encyclopedia of genes SLC6A14 4.8841916 2.47x10-15 and genomes (KEGG) pathway analysis of DEGs. GO MMP11 4.8824426 3.14x10-16 (http://www.geneontology.org/) and the KEGG (https://www. SULF1 4.721966 2.96x10‑17 kegg.jp/) (16-19) were used to analyze the function of DEGs FN1 4.6424864 2.99x10-16 using the cluster Profiler R package (20). P<0.05 was consid- POSTN 4.6415794 1.33x10-16 ered to indicate a statistically significant difference in CCL18 4.5489901 1.41x10-11 functional enrichment analysis. -09 MUC4 4.5022059 1.25x10 Core genes screening from the protein‑protein interaction (PPI) network. A PPI network for the DEGs was generated B, Downregulated genes using the STRING database (https://string-db.org/). Then, Cytoscape (version 3.6.1) (21) was employed, and a plug-in Gene symbol Log2FC Adjusted P-value termed cytohubba (22) was integrated into the software. The ‑07 plug-in provides 12 types of topological analysis methods SYCN ‑6.6535946 2.74x10 -09 [Maximal Clique Centrality, Maximum Neighborhood SERPINI2 ‑6.2894352 8.73x10 -10 Component (MNC), Density of MNC, Degree, Edge Percolated AQP8 ‑6.2356139 2.11x10 -10 Component, Bottleneck, EcCentricity, Closeness, Radiality, AMY1A ‑6.1790263 4.38x10 -08 Betweenness, Stress and Clustering Coefficient). Using 12 ALB ‑6.1165814 2.16x10 analysis methods, we identified the top 18 genes as core genes. CELA2A ‑6.0845313 2.83x10-06 PNLIPRP1 ‑6.0676353 6.52x10-10 Expression levels and survival analysis of core genes CTRL ‑5.9224624 1.73x10-06 in pancreatic cancer. UALCAN (http://ualcan.path.uab. PDIA2 ‑5.9185276 4.65x10-09 edu/index.html) (23) was employed to perform survival CPA1 ‑5.804844 5.21x10-06 analysis based on the information of TCGA database. Survival TMED6 ‑5.7967792 2.37x10-10 analysis was performed via the Kaplan-Meier method using CELP ‑5.7183603 1.15x10-10 18 identified core genes, based on their core gene expression AQP12A ‑5.6598636 3.88x10-14 levels in pancreatic adenocarcinoma (PAAD). P<0.05 was CUZD1 ‑5.5766969 1.68x10-06 considered to be statistically significant. The P-value was CELA2B -5.5649112 2.23x10-05 calculated using log-rank test. The ‘scaled_estimate’ column CPA2 ‑5.55513 3.43x10-06 provided the potential transcripts produced by each gene. The -05 6 CELA3A ‑5.5508171 1.84x10 ‘scaled_estimate’ was multiplied by 10 to obtain a transcripts GP2 ‑5.5087922 1.20x10-06 per million (TPM) expression value (24). Gene expression -09 ERP27 ‑5.4765153 7.39x10 levels in tumor tissues exhibited notable inter-individual vari- CPA1 ‑5.4417426 1.81x10-05 ability. High expression indicated that the TPM value was above the upper quartile value. Low expression indicated that Log2FC, log2 fold change. the TPM value was equal or below the upper quartile value. Verification of results. The findings from the bioinformatics analyses were validated using the dataset GSE28735 from matched pairs of pancreatic tumor and adjacent non-tumor the GEO database. The expression profiles included 45 tissues from 45 patients with PDAC. The online analysis tool MOLECULAR MEDICINE REPORTS 20: 1343-1352, 2019 1345 Figure 1. Heat map of gene expression in pancreatic ductal adenocarcinoma tissues and healthy controls. The expression levels of various genes in 51 samples (38 early-stage tumor samples and 13 controls) are presented. Green indicates downregulated expression; red indicates upregulated expression; black indicates no significant difference in expression. GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used with upregulated expression is presented in Fig.