“Omics” Data Integration and Functional Analyses Link Enoyl-CoA Hydratase, Short Chain 1 to Drug Refractory Dilated Cardiomyopathy
Campbell et al. BMC Medical Genomics (2018) 11:110
https://doi.org/10.1186/s12920-018-0439-6

RESEARCH ARTICLE Open Access

Nzali V. Campbell1,2,6*, David A. Weitzenkamp2, Ian L. Campbell3, Ronald F. Schmidt4, Chindo Hicks5, Michael J. Morgan2, David C. Irwin2 and John J. Tentler1,2

* Correspondence: [email protected]
1Colorado Clinical and Translational Sciences Institute, Mail Stop B141, 12401 E. 17th Ave, Aurora, CO 80045, USA
2Anschutz Medical Campus, University of Colorado, 13001 E 17th Pl, Aurora, CO 80045, USA
Full list of author information is available at the end of the article

© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Background: Large-scale “omics” datasets have not been leveraged and integrated with functional analyses to discover potential drivers of cardiomyopathy. This study addresses that knowledge gap.

Methods: We coupled RNA sequence (RNA-Seq) variant detection and transcriptome profiling with pathway analysis to model drug refractory dilated cardiomyopathy (drDCM) using the BaseSpace sequencing hub and Ingenuity Pathway Analysis. We used RNA-Seq case-control datasets (n = 6 cases, n = 4 controls), exome sequence familial DCM datasets (n = 3 Italians, n = 5 Italians, n = 5 Chinese), and controls from the HapMap project (n = 5 Caucasians and n = 5 Asians) for disease modeling and putative mutation discovery. Variant replication datasets: n = 128 cases and n = 15 controls. Source of datasets: NCBI Sequence Read Archive. Statistics: We used pairwise differential expression analyses to determine differentially expressed genes and t-tests to calculate p-values. We adjusted for false discovery rates and reported q-values. We used chi-square tests to assess independence among variables, Fisher’s exact tests and overlap p-values for the pathways, and p-scores to rank networks.

Results: Data revealed that ECHS1 (enoyl-CoA hydratase, short chain 1; log2(fold change) = 1.63329) hosts a mirtron, MIR3944, expressed in drDCM (FPKM = 5.2857) and not in controls (FPKM = 0). hsa-miR3944-3p is a putative target of BAG1 (BCL2 associated athanogene 1; log2(fold change) = 1.31978) and hsa-miR3944-5p of ITGAV (integrin subunit alpha V; log2(fold change) = 1.46107) and RHOD (ras homolog family member D; log2(fold change) = 1.28851). There is an association between ECHS1:11 V/A (rs10466126) and drDCM (p = 0.02496). The interaction (p = 2.82E-07) between ECHS1:75 T/I (rs1049951) and ECHS1:rs10466126 is associated with drDCM (p < 2.2e-16). ECHS1:rs10466126 and ECHS1:rs1049951 are in linkage disequilibrium (D’ = 1). The interaction (p = 7.84E-08) between ECHS1:rs1049951 and the novel ECHS1:c.41insT variant is associated with drDCM (p < 2.2e-16). The interaction (p = 0.001096) between DBT (dihydrolipoamide branched chain transacylase E2):384 G/S (rs12021720) and ECHS1:rs10466126 is associated with drDCM (p < 2.2e-16). At the mRNA level, there is an association between ECHS1 (log2(fold change) = 1.63329; q = 0.013927) and DBT (log2(fold change) = 0.955072; q = 0.0368792) with drDCM.
ECHS1 is involved in valine (−log(p) = 3.39E00), isoleucine degradation (p = 0.00457), fatty acid β-oxidation (−log(p) = 2.83E00), and drug metabolism: cytochrome P450 (z-score = 2.07985196) pathways. The mitochondria (−log(p) = 8.73E00), oxidative phosphorylation (−log(p) = 5.35E00) and TCA-cycle II (−log(p) = 2.70E00) are dysfunctional.

Conclusions: We introduce an integrative data strategy that considers the interplay between the DNA, mRNA, and associated pathways, which represents a possible diagnostic, prognostic, biomarker, and personalized treatment discovery approach in genomically heterogeneous diseases.

Keywords: drDCM, Omics, RNA-Seq, IPA, ECHS1, DBT, MIR3944, rs10466126, BCAA, Mitochondria

Background

At present, dilated cardiomyopathy (DCM: OMIM 115200) is characterized by biventricular or left ventricular dilation and systolic dysfunction in the absence of hypertension, valve disease, or coronary artery disease sufficient to produce global systolic dysfunction. Causative factors can be genetic or non-genetic, and sometimes genetic predispositions interact with environmental factors to trigger DCM. Non-genetic causes include drugs and toxins, myocarditis, and peripartum cardiomyopathy. A combination of these factors can also cause DCM [1–4].

Dissecting the molecular mechanisms underlying the disease remains a challenge and, regrettably, the endpoint is usually end-stage drug-refractory heart failure and heart transplantation [5]. Advances in technology have expanded the availability of omics data. The goal of using these data is to build models that predict outcomes and phenotype, expose biomarkers, and give insight into the genomic basis of how complex traits are inherited. However, robust strategies to tie together and use these data to discover valid connections are lacking [6]. Approaches that integrate “omics” data bridge the gap between assessing variation at only one stage of regulation of the central dogma and understanding the underlying complexity within biological systems. Understanding such complexity requires intricate models that consider variation across various stages of biological regulation. Data coupling identifies important genomic factors and connections that elucidate or predict risk factors for disease and other biological outcomes. It provides an avenue to making sense of and appreciating the greater complexity that underlies human disease [6].

Accomplishments in revealing the genomic and genetic pathogenic architecture of complex traits have been modest, in part because few investigations consider the interplay of variation across each stage of the central dogma [6]. In their 2017 review, Hang et al. outlined how comprehensive tools and multi-omic data integration have advanced. They discuss three main classes of algorithms used to integrate multiple omics data: unsupervised, supervised, and semi-supervised data integration methods [7].

Briefly, unsupervised data integration draws inferences from input datasets whose response variable is unlabeled. It uses several methods to investigate their biological profiles and allocate objects into various clusters or subgroups. Conversely, the supervised integrative approach considers the subject phenotype (diseased or healthy) and uses omics training datasets in its analyses. This approach uses the biological information of the labeled objects to derive patterns for diverse phenotypes and allocates labels to unlabeled data by comparing the models. Semi-supervised integrative methods interface unsupervised and supervised approaches. They incorporate labeled and unlabeled samples to create learning algorithms. They mostly build object-wise similarity networks by gathering omics data and assigning labels to unknown objects through their associations to the labeled objects [7].

Our integrative approach does not use unlabeled response variables or training datasets, and it does not create learning algorithms, to draw inferences. Here, we present the conceptual and analytical models used in our integrative method, based on both the dominant and alternative paradigms [6] and on approaches coined by Vanderweele and Robins [8].

In this report, we aimed to use our methods to provide strategies and insight that can be exploited to reveal mutations, modifier genes, and pathogenic pathways, and to model complex traits. We posited that our omics integrative method would be able to model molecular processes that lead to drug refractory DCM (drDCM) and discriminate mutations from modifiers, and pathogenic pathways from those that alter the outcome. We report a novel approach for omics data integration that considers the interplay between DNA, mRNA and pathway analyses. The methods can be used to model complex traits while illuminating genomic biomarkers and targets for

to changes in gene expression. Changed gene expression would, in turn, lead to protein alterations, followed by disease. However, the model also considers the alternative possibility that many levels of molecular and environmental dissimilarities add to disease risk in a convoluted, nonlinear, and interactive fashion [6].

We defined data integration as merging genomics and transcriptome data, pathway, functional, and association analyses. The data are the predictor variables that make it possible for one to have a more systematic and complete modeling of a complex trait. The complex phenotype, here drDCM, might arise because of interactions among biological variations at many stages of genomic control. However, the disease might also occur because of one’s environment.
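As an illustration only, the definition of data integration given above (merging variant-level and transcript-level data so that both act as predictor variables) can be sketched in a few lines of Python. This is not the authors' pipeline, which used BaseSpace and Ingenuity Pathway Analysis. The record types, the `integrate` function, and the q-value cutoff of 0.05 are hypothetical choices made for this sketch. The gene symbols, rsIDs, log2 fold changes, and q-values are the ones quoted in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    """Transcript-level record (hypothetical structure for this sketch)."""
    gene: str
    log2_fold_change: float
    q_value: float


@dataclass(frozen=True)
class Variant:
    """DNA-level record (hypothetical structure for this sketch)."""
    gene: str
    rsid: str


# Values quoted in the abstract.
EXPRESSION = [
    Expression("ECHS1", 1.63329, 0.013927),
    Expression("DBT", 0.955072, 0.0368792),
]
VARIANTS = [
    Variant("ECHS1", "rs10466126"),
    Variant("ECHS1", "rs1049951"),
    Variant("DBT", "rs12021720"),
]


def integrate(expression, variants, q_cutoff=0.05):
    """Join the two data layers by gene symbol.

    Keeps genes that have q below q_cutoff (an assumed threshold)
    and at least one variant, so each returned row carries evidence
    from both the DNA and the mRNA level.
    """
    rsids_by_gene = {}
    for variant in variants:
        rsids_by_gene.setdefault(variant.gene, []).append(variant.rsid)

    merged = []
    for record in expression:
        rsids = rsids_by_gene.get(record.gene, [])
        if record.q_value < q_cutoff and rsids:
            merged.append({
                "gene": record.gene,
                # Convert log2 fold change back to a linear fold change.
                "fold_change": 2 ** record.log2_fold_change,
                "q_value": record.q_value,
                "variants": sorted(rsids),
            })
    return merged


if __name__ == "__main__":
    for row in integrate(EXPRESSION, VARIANTS):
        print(row)
```

Joining on gene symbol keeps the sketch minimal. A real analysis would also carry sample-level genotypes and phenotype labels so that association and interaction tests could be run on the merged table.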