Identification of Candidate Biomarkers and Therapeutic Agents for Heart Failure by Bioinformatics Analysis
bioRxiv preprint doi: https://doi.org/10.1101/2021.01.24.428028; this version posted January 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Basavaraj Vastrad1, Anandkumar Tengli2, Chanabasayya Vastrad*3

1. Department of Biochemistry, Basaveshwar College of Pharmacy, Gadag, Karnataka 582103, India.
2. Department of Pharmaceutical Chemistry, JSS College of Pharmacy, Mysuru and JSS Academy of Higher Education & Research, Mysuru, Karnataka 570015, India.
3. Biostatistics and Bioinformatics, Chanabasava Nilaya, Bharthinagar, Dharwad 580001, Karnataka, India.

* Chanabasayya Vastrad
[email protected]
Ph: +919480073398
Chanabasava Nilaya, Bharthinagar, Dharwad 580001, Karnataka, India

Abstract

Heart failure (HF) is a heterogeneous clinical syndrome that affects millions of people worldwide. HF occurs when the heart is subjected to overload and injury. The aim of this study was to screen and verify hub genes involved in the development of HF and to explore active drug molecules. The GSE141910 dataset (expression profiling by high throughput sequencing) was downloaded from the Gene Expression Omnibus (GEO) database; it contained 366 samples, including 200 heart failure samples and 166 non heart failure samples. The raw data were processed to identify differentially expressed genes (DEGs), which were then subjected to further bioinformatics analysis.
Gene ontology (GO) and REACTOME enrichment analyses were performed via ToppGene; a protein-protein interaction (PPI) network of the DEGs was constructed based on data from the HiPPIE interactome database; module analysis was performed; a target gene - miRNA regulatory network and a target gene - TF regulatory network were constructed and analyzed; hub genes were validated; and molecular docking studies were performed. A total of 881 DEGs, including 442 up regulated genes and 439 down regulated genes, were identified. Most of the DEGs were significantly enriched in biological adhesion, extracellular matrix, signaling receptor binding, secretion, intrinsic component of plasma membrane, signaling receptor activity, extracellular matrix organization and neutrophil degranulation. The top hub genes ESR1, PYHIN1, PPP2R2B, LCK, TP63, PCLAF, CFTR, TK1, ECT2 and FKBP5 were identified from the PPI network. Module analysis revealed that HF was associated with the adaptive immune system and neutrophil degranulation. Target genes, miRNAs and TFs were identified from the target gene - miRNA regulatory network and the target gene - TF regulatory network. Furthermore, receiver operating characteristic (ROC) curve analysis and RT-PCR analysis revealed that ESR1, PYHIN1, PPP2R2B, LCK, TP63, PCLAF, CFTR, TK1, ECT2 and FKBP5 might serve as prognostic and diagnostic biomarkers and as therapeutic targets for HF. The predicted targets of these active molecules were then confirmed. The current investigation identified a series of key genes and pathways that might be involved in the progression of HF, providing a new understanding of the underlying molecular mechanisms of HF.

Keywords: heart failure; differentially expressed genes; molecular docking; enrichment analysis; prognosis

Introduction

Heart failure (HF) is a cardiovascular disease characterized by tachycardia, tachypnoea, pulmonary rales, pleural effusion, raised jugular venous pressure, peripheral oedema and hepatomegaly [1]. The morbidity and mortality linked with HF are a prevalent worldwide health problem, and HF is a leading cause of death [2]. The number of cases of HF is rising globally and it has become a key health issue. According to a survey, the prevalence of HF is expected to exceed 50% of the global population [3]. Research suggests that alterations in multiple genes and signaling pathways are involved in the progression of HF. However, the lack of investigation into the precise molecular mechanisms of HF development currently limits the efficacy of treatment. Previous studies showed that HF was related to the expression of MECP2 [4] and RBM20 [5]. The Toll-like receptor signaling pathway [6] and the activin type II receptor signaling pathway [7] have been implicated in the progression of HF. More investigations are required that focus on treatments to improve the outcome of patients with HF and on making the diagnosis of the disease more rigorous through the screening of biomarkers. Such investigations can improve the prognosis of patients by lowering the risk of HF progression and related complications. It is therefore essential to understand the mechanism and to find biomarkers with good specificity and sensitivity. High-throughput RNA sequencing data have recently been widely employed to screen for differentially expressed genes (DEGs) between normal samples and HF samples in humans, which makes it possible to further explore the molecular alterations in HF at multiple levels involving DNA, RNA, proteins, epigenetic alterations, and metabolism [8].
However, obstacles remain to applying these RNA-seq data in the clinic, because the number of DEGs found by expression profiling by high throughput sequencing is very large and the statistical analyses are complex [9-10]. In this study, first, we chose the dataset GSE141910 from the Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) [11]. Second, we applied the limma tool in R software to obtain the differentially expressed genes (DEGs) in this dataset. Third, ToppGene was used to analyze these DEGs in terms of molecular function (MF), cellular component (CC), biological process (BP) and REACTOME pathways. Fourth, we established a protein-protein interaction (PPI) network and then applied the Cytoscape plugin PEWCC1 for module analysis of the DEGs in order to identify hub genes. Fifth, we established a target gene - miRNA regulatory network and a target gene - TF regulatory network. In addition, we validated the hub genes by receiver operating characteristic (ROC) curve analysis and RT-PCR analysis. Finally, we performed molecular docking studies for the overexpressed hub genes. Results from the present investigation might provide new insight into potential prognostic and therapeutic targets for HF.

Materials and Methods

Data resource

The expression profiling by high throughput sequencing dataset with series number GSE141910, based on the GPL16791 platform (Illumina HiSeq 2500, Homo sapiens), was downloaded from the GEO database at NCBI. The dataset contained 200 heart failure samples and 166 non heart failure samples.
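As an illustration only (this is not the authors' code), the two-group layout of the dataset described above could be encoded as sample labels and a simple design matrix of the kind that a two-group differential expression comparison typically starts from. The function name `build_design` and the label strings "HF" and "non_HF" are hypothetical; only the sample counts (200 and 166) are taken from the text.

```python
def build_design(n_hf, n_non_hf):
    """Return (labels, design) for a two-group comparison.

    Hypothetical helper: each design row is [intercept, hf_indicator],
    where hf_indicator is 1.0 for heart failure samples and 0.0 otherwise.
    """
    labels = ["HF"] * n_hf + ["non_HF"] * n_non_hf
    design = [[1.0, 1.0 if label == "HF" else 0.0] for label in labels]
    return labels, design


# Sample counts as reported for GSE141910 in the text.
labels, design = build_design(200, 166)

# Sanity check: total number of samples and number of HF samples.
print(len(labels), sum(row[1] for row in design))
```

With this encoding, the coefficient attached to the second column would correspond to the HF versus non heart failure contrast; how the authors actually specified their model is not stated in the text.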
Identification of DEGs in HF

DEGs between the HF group and the non heart failure group in dataset GSE141910 were identified using the limma package in R [12]. Fold changes (FCs) in the expression of individual genes were calculated, and genes with P<0.05 and log FC > 1.158 (up regulated) or log FC < -0.83 (down regulated) were considered significant DEGs. Hierarchical clustering and visualization were performed with the Heat-map package in R.

Functional enrichment analysis

Gene Ontology (GO) analysis and REACTOME pathway analysis were performed to determine the functions of the DEGs using ToppGene (ToppFun) (https://toppgene.cchmc.org/enrichment.jsp) [13]. GO terms (http://geneontology.org/) [14] included biological processes (BP), cellular components (CC) and molecular functions (MF) of gene products. REACTOME (https://reactome.org/) [15] is used to analyze the pathways of important gene products. ToppGene is a bioinformatics resource for the functional interpretation of lists of proteins and genes. The cutoff value was set to P<0.05.

Protein–protein interaction network construction and module screening

PPI networks assemble protein coding genes into a large biological network that provides an advanced understanding of the functional organization of the proteome [16]. The HiPPIE interactome database (https://cbdm.uni-mainz.de/hippie/) [17] provides information on predicted and experimental protein interactions. In the current investigation, the DEGs were mapped onto the HiPPIE interactome database to find significant protein pairs with a combined score of >0.4. The PPI network was subsequently constructed using Cytoscape software, version 3.8.2 (www.cytoscape.org)