Integrated Systems Approach Reveals Sphingolipid Metabolism Pathway Dysregulation in Association with Late-Onset Alzheimer’S Disease
Total Page:16
File Type:pdf, Size:1020Kb
biology Article Integrated Systems Approach Reveals Sphingolipid Metabolism Pathway Dysregulation in Association with Late-Onset Alzheimer’s Disease John Stephen Malamon and Andres Kriete * Bossone Research Center, School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA; [email protected] * Correspondence: [email protected] Received: 31 October 2017; Accepted: 6 February 2018; Published: 9 February 2018 Abstract: Late-onset Alzheimer’s disease (LOAD) and age are significantly correlated such that one-third of Americans beyond 85 years of age are afflicted. We have designed and implemented a pilot study that combines systems biology approaches with traditional next-generation sequencing (NGS) analysis techniques to identify relevant regulatory pathways, infer functional relationships and confirm the dysregulation of these biological pathways in LOAD. Our study design is a most comprehensive systems approach combining co-expression network modeling derived from RNA-seq data, rigorous quality control (QC) standards, functional ontology, and expression quantitative trait loci (eQTL) derived from whole exome (WES) single nucleotide variant (SNV) genotype data. Our initial results reveal several statistically significant, biologically relevant genes involved in sphingolipid metabolism. To validate these findings, we performed a gene set enrichment analysis (GSEA). The GSEA revealed the sphingolipid metabolism pathway and regulation of autophagy in association with LOAD cases. In the execution of this study, we have successfully tested an integrative approach to identify both novel and known LOAD drivers in order to develop a broader and more detailed picture of the highly complex transcriptional and regulatory landscape of age-related dementia. Keywords: Alzheimer’s disease; aging; sphingolipid metabolism; autophagy; co-expression modeling; eQTL; gene set enrichment analysis 1. Introduction Alzheimer’s disease (AD) currently afflicts as many as five million Americans and is estimated to reach 14 million by 2050 [1]. Aside from the tremendous health and personal burdens, the current financial burden of AD exceeds $240 billion and is expected to reach nearly $1 trillion by 2050 [2]. Furthermore, there is no current treatment, prevention or cure for AD. It is absolutely imperative that a more comprehensive description of the many complex, systematic dysfunctions involved in AD is aggressively developed with the explicit aim of better equipping the larger scientific community with the tools to tackle this healthcare crisis. Despite myriad confirmed loss-of-function single nucleotide variants (SNVs) that are now associated with LOAD (Late-onset Alzheimer’s disease), there is still no clear understanding of how these variants combine to contribute to the major and irreversible pathological hallmarks of LOAD, such as amyloid plaque deposits, neurofibrillary tangles, and dysfunctions in innate immunity. Whilst many studies focus on rare-variant association and differential expression analysis, here we model co-expression networks, define their function, and test genetic factors that can demonstratively disrupt the homeostasis of a regulatory system or pathway. By combining multiple ‘–omics’ platforms, such as RNA-seq and SNV genotype arrays, we can provide Biology 2018, 7, 16; doi:10.3390/biology7010016 www.mdpi.com/journal/biology Biology 2018, 7, x 2 of 13 BiologySNV genotype2018, 7, 16 arrays, we can provide a more robust, detailed description of the many mechanis2 ofms 13 that underlie LOAD to gain new insights into the larger, regulatory dysfunctions contributing to this adisease. more robust, detailed description of the many mechanisms that underlie LOAD to gain new insights into the larger, regulatory dysfunctions contributing to this disease. 2. Materials and Methods 2. Materials and Methods 2.1. Workflow Overview 2.1. Workflow Overview Our framework, outlined in Figure 1, consists of three main analytical components: co-expressionOur framework, modeling, outlined functional in Figure enrichment1, consists of and three expression main analytical quantitative components: trait co-expression loci (eQTL) modeling,analysis. To functional assess the enrichment validity of and this expression approach, quantitativewe have designed trait loci a pilot (eQTL) study analysis. in which To assess we have the validityapplied all of thissteps approach, of our workflow we have to designed the Mount a pilot Sinai study Brain in Bank’s which we(MSBB) have inferior applied temporal all steps ofgyrus our workflowdataset (N to = the58). Mount We used Sinai all Brain 58 subjects Bank’s (MSBB)in our analyses inferior temporaland did not gyrus segregate dataset (basedN = 58). on We disease used allstatus, 58 subjects as our in sample our analyses size was and didnot notadequate segregate for baseda statistically on disease meaningful status, as our segregation sample size analysis. was not adequateBefore beginning for a statistically analysis, meaningful we performed segregation a three analysis.-tiered quality Before control beginning process analysis, to normalize we performed and areduce three-tiered the RNA quality-seq controldataset. process Co-expression to normalize models and were reduce constructed the RNA-seq using dataset. the weighted Co-expression gene modelsco-expression were constructed network usinganalysis the weighted(WGCNA) gene toolkit co-expression [3]. Network network modules analysis were (WGCNA) stored toolkit in the [3]. Networktopological modules overlap were matrix stored (TOM), in the topologicalthe weighted overlap representation matrix (TOM), of the weightedEuclidian representation space of all ofnetwork the Euclidian-module spaceassociations. of all network-module Next, we performed associations. association Next, we-based performed testing association-based using clinical testingneuropathological using clinical data. neuropathological From this we identified data. From modules this we that identified correlated modules with specific that correlated clinical traits with specificor phenotypes. clinical traitsThe TOM or phenotypes. was then enriched The TOM with was functional then enriched data withprovided functional by the data Gene provided Ontology by the(GO) Gene consortium Ontology [4] (GO). consortium [4]. We incorporated functional annotation and analysis to elucidate biologically relevant pathways and drive more deeply into the mechanisms underlying LOAD (see Code S1). Next, we used eQTL analysis to examine the relationship between changes in genotypegenotype and genegene expression.expression. This step allowed usus to to perform perform an an in silicoin silico validation validation of perturbations of perturbations in our candidatein our candidate pathways. pathways. This approach This givesapproach us the gives ability us tothe demonstrate ability to demonstr how changesate how in genotypechanges in affect genotype larger regulatoryaffect larger systems, regulatory not justsystems, one gene.not just Finally, one gene. we performedFinally, we independent,performed independent, pathway validation pathway usingvalidation the Broad using Institute’sthe Broad geneInstitute’s set enrichment gene set enrichment analysis (GSEA) analysis toolkit (GSEA) [5]. toolkit Specifically, [5]. Specifically, we used GSEA we used to process GSEA to the process RNA-seq the data,RNA- incorporatingseq data, incorporating the KEGG the pathway KEGG databasepathway [ 6database–8], to discover [6–8], to statistically discover statistically meaningful meaningful pathways. Bypathways. combining By thesecombini techniques,ng these wetechniques, have examined we have how examined specific genetic how variationsspecific genetic contribute variations to the dysregulationcontribute to the of dysregulation biologically relevant of biologically pathways relevant and begin pathways to piece and together begin to apiece story together for how a thesestory perturbationsfor how these mayperturbations lead to systematic may lead failure. to systematic failure. Figure 1.1. HighHigh-level-level diagram of our integrative analysis workflow:workflow: RNA RNA-seq-seq dataset (red) was combined with clinical data (purple) for co-expressionco-expression modeling and associationassociation testing, whereas RNA-seqRNA-seq was combined with SNV genotype data (green)(green) forfor eQTLeQTL analysis.analysis. GSEA was used to independently validate candidatecandidate pathways.pathways. Biology 2018, 7, 16 3 of 13 2.2. Data Description The Mount Sinai Brain Bank (MSBB) Array Tissue Panel study provides RNA-seq, clinical neuropathology and single nucleotide variant (SNV) whole exome sequence (WES) data across 19 brain regions with approximately 60 unrelated, age, and sex-matched samples (~40 LOAD, ~20 controls). For our pilot study, we have chosen to analyse the inferior temporal gyrus (ITG) for two reasons: (1) we determined that it was the highest quality regional dataset and (2) experimental evidence has shown that ITG is associated with early AD pathologies such as memory impairment [9]. In this way, we wish to examine early disease conditions with a focus on early intervention and prevention. These subjects range from no cognitive impairment (N = 11), mild cognitive impairment (N = 18) and severe AD phenotypes (N = 29). Table1 describes the six neuropathology metrics used in this study. Reported race is 45 Caucasian, 10 African, 2 Hispanic and 1 Asian. RNA-seq data were generated using the Affymetrix_133AB