Identification and Interaction Analysis of Molecular Markers in Pancreatic
Total Page:16
File Type:pdf, Size:1020Kb
medRxiv preprint doi: https://doi.org/10.1101/2020.12.20.20248601; this version posted December 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. Identification and Interaction Analysis of Molecular Markers in Pancreatic Ductal Adenocarcinoma by Integrated Bioinformatics Analysis and Molecular Docking Experiments Basavaraj Vastrad1 , Chanabasayya Vastrad *2, Anandkumar Tengli3 1. Department of Biochemistry, Basaveshwar College of Pharmacy, Gadag, Karnataka 582103, India. 2. Biostatistics and Bioinformatics,Chanabasava Nilaya, Bharthinagar, Dharwad, Karanataka 580001, India. 3. Department of Pharmaceutical Chemistry, JSS College of Pharmacy, Mysuru and JSS Academy of Higher Education & Research, Mysuru- 570015, Karnataka, India. * Chanabasayya Vastrad [email protected] Ph: +919480073398 Chanabasava Nilaya, Bharthinagar, Dharwad 580001 , Karanataka, India NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice. medRxiv preprint doi: https://doi.org/10.1101/2020.12.20.20248601; this version posted December 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. Abstract The current investigation aimed to mine therapeutic molecular targets that play an key part in the advancement of pancreatic ductal adenocarcinoma (PDAC). The expression profiling by high throughput sequencing dataset profile GSE133684 dataset was downloaded from the Gene Expression Omnibus (GEO) database. Limma package of R was used to identify differentially expressed genes (DEGs). Functional enrichment analysis of DEGs were performed. Protein-protein interaction (PPI) networks of the DEGs were constructed. An integrated gene regulatory network was built including DEGs, microRNAs (miRNAs), and transcription factors. Furthermore, consistent hub genes were further validated. Molecular docking experiment was conducted. A total of 463 DEGs (232 up regulated and 231 down regulated genes) were identified between very early PDAC and normal control samples. The results of Functional enrichment analysis revealed that the DEGs were significantly enriched in vesicle organization, secretory vesicle, protein dimerization activity, lymphocyte activation, cell surface, transferase activity, transferring phosphorus-containing groups, hemostasis and adaptive immune system. The PPI network and gene regulatory network of up regulated genes and down regulated genes were established, and hub genes were identified. The expression of hub genes (CCNB1, FHL2, HLA-DPA1 and TUBB1) were also validated to be differentially expressed among PDAC and normal control samples. Molecular docking experiment predicted the novel inhibitory molecules for CCNB1 and FHL2. The identification of hub genes in PDAC enhances our understanding of the molecular mechanisms underlying the progression of this disease. These genes may be potential diagnostic biomarkers and/or therapeutic molecular targets in patients with PDAC. Keywords: pancreatic ductal adenocarcinoma; bioinformatics analysis; biomarker; enrichment analysis; differentially expressed genes medRxiv preprint doi: https://doi.org/10.1101/2020.12.20.20248601; this version posted December 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. Introduction Pancreatic ductal adenocarcinoma (PDAC) is one of the most prevalent cancers in the world and primary tumor of the pancreas [1]. PDAC is a global burden ranking 15th in terms of incidence and fourth in terms of mortality [2]. Despite new developments in multimodal therapy its overall 5-year survival rate remains less than 8% [3]. PDAC treatment commonly includes surgery, radiation, chemotherapy and immunotherapy [4]. However, PDAC remains common and malignant due to recurrence and metastasis, and ultimately key cause of PDAC associated death [5]. Therefore, there is a vital need to advance new diagnostic strategies and therapeutic agents to upgrade the prognosis of patients with PDAC. The molecular mechanisms of PDAC tumorigenesis and development remain imprecise. It is therefore key to identify novel genes and pathways that are linked with PDAC tumorigenesis and patient prognosis, which may not only help to illuminate the underlying molecular mechanisms involved, but also to disclose novel diagnostic markers and therapeutic targets. Oji et al [6] demonstrated that the over expression of WT1 is linked with prognosis in patients with PDAC. A previous investigation reported that phosphoinositide 3-kinase signaling pathway is linked with development of PDAC [7]. Expression profiling by high throughput sequencing can promptly uncover gene expression on a global basis and are specially useful in identifying for differentially expressed genes (DEGs) [8]. A huge amount of data has been generated through the use of microarrays and the majority of such data has been deposited and saved in public databases. Previous investigations concerning PDAC gene expression profiling have diagnosed hundreds of DEGs [9]. The aim of this investigation was to identify hub genes and pathways in PDAC using bioinformatics methods. Our investigation contributes predictable biomarkers for early detection and prognosis, as well as effective drug targets for treating PDAC. Materials and methods Sequencing data medRxiv preprint doi: https://doi.org/10.1101/2020.12.20.20248601; this version posted December 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. PDAC expression profiling by high throughput sequencing dataset in this investigation was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/gds/) [10]. The DEGs were considered by 1 independent PDAC dataset, GSE133684 [11] with 284 PDAC and 117 normal samples. The GSE133684 expression profiling by high throughput sequencing data was based on the GPL20795 HiSeq X Ten (Homo sapiens) platform.. Identification of DEGs The Limma package of R language were used to normalize and convert the raw data to expression profiles [12]. The limma package of R language was used for DEGs between PDAC and normal control samples [12]. The P-value was adjusted by the Benjamini-Hochberg method [13]. An adjusted P-value <0.05 and |log2 fold change (FC) | >1 were considered as threshold values for DEGs identification. The ggplot2 package and gplots package of R language was used to generate volcano plot and heat map. The identified DEGs were preserved for further bioinformatics analysis. GO analysis and pathway enrichment analysis of DEGs The GO repository (http://geneontology.org/) [14] consists of a massive set of annotation terms and is generally used for annotating genes and identifying the distinctive biological aspects for expression profiling by high throughput sequencing data. The REACTOME database (https://reactome.org/) [15] contains data on known genes and their biochemical functions and is used for identifying functional and metabolic pathways. By performing the GO and REACTOME enrichment analysis at the functional level, we can boost a better understanding of the roles of these DEGs in the induction and in the advancement of PDAC. The ToppGene (ToppFun) (https://toppgene.cchmc.org/enrichment.jsp) [16] is an online resource that add tools for functional annotation and bioinformatics analysis. Both GO categories and REACTOME pathway enrichment analysis were implemented using ToppGene to inform the functions of these DEGs. P<0.05 was considered to indicate a statistically significant difference. Protein-protein interaction (PPI) network construction and module analysis medRxiv preprint doi: https://doi.org/10.1101/2020.12.20.20248601; this version posted December 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission. The online database IID interactome (http://iid.ophid.utoronto.ca//) [17] was used to construct a PPI network of the proteins encoded by DEGs. Then, Cytoscape software (Version 3.8.1) [18] was applied to perform protein interaction association network analysis and analyze the interaction correlation of the candidate proteins encoded by the DEGs in PDAC. Next, the Network Analyzer Cytoscape plug-in was applied to calculate node degree [19], betweenness centrality [20], stress centrality [21], closeness centrality [22]. Finally, the PEWCC1 (http://apps.cytoscape.org/apps/PEWCC1) [23] module for Cytoscape was used to collect the significant modules in the PPI network complex. Construction of miRNA-DEG regulatory network The miRNet database (https://www.mirnet.ca/) [24] is a database, containing miRNAs involved in various diseases. The miRNAs related to PDAC were searched from miRNet database. Through getting the intersection of the miRNAs and the DEGs,