Multivariate Genome Wide Association and Network Analysis of Subcortical Imaging Phenotypes in Alzheimer's Disease
Total Page:16
File Type:pdf, Size:1020Kb
Meng et al. BMC Genomics 2020, 21(Suppl 11):896 https://doi.org/10.1186/s12864-020-07282-7 RESEARCH Open Access Multivariate genome wide association and network analysis of subcortical imaging phenotypes in Alzheimer’s disease Xianglian Meng1, Jin Li2, Qiushi Zhang3, Feng Chen2, Chenyuan Bian2, Xiaohui Yao4, Jingwen Yan5,6, Zhe Xu1, Shannon L. Risacher5, Andrew J. Saykin5, Hong Liang2* , Li Shen4* and for the Alzheimer’s Disease Neuroimaging Initiative From The International Conference on Intelligent Biology and Medicine (ICIBM) 2020 Virtual. 9-10 August 2020 Abstract Background: Genome-wide association studies (GWAS) have identified many individual genes associated with brain imaging quantitative traits (QTs) in Alzheimer’s disease (AD). However single marker level association discovery may not be able to address the underlying biological interactions with disease mechanism. Results: In this paper, we used the MGAS (Multivariate Gene-based Association test by extended Simes procedure) tool to perform multivariate GWAS on eight AD-relevant subcortical imaging measures. We conducted multiple iPINBPA (integrative Protein-Interaction-Network-Based Pathway Analysis) network analyses on MGAS findings using protein-protein interaction (PPI) data, and identified five Consensus Modules (CMs) from the PPI network. Functional annotation and network analysis were performed on the identified CMs. The MGAS yielded significant hits within APOE, TOMM40 and APOC1 genes, which were known AD risk factors, as well as a few new genes such as LAMA1, XYLB, HSD17B7P2, and NPEPL1. The identified five CMs were enriched by biological processes related to disorders such as Alzheimer’s disease, Legionellosis, Pertussis, and Serotonergic synapse. Conclusions: The statistical power of coupling MGAS with iPINBPA was higher than traditional GWAS method, and yielded new findings that were missed by GWAS. This study provides novel insights into the molecular mechanism of Alzheimer’s Disease and will be of value to novel gene discovery and functional genomic studies. Keywords: Brain imaging, Multivariate gene-based genome-wide analysis, iPINBPA network analysis, Consensus modules * Correspondence: [email protected]; [email protected] 2College of Automation, Harbin Engineering University, Harbin 150001, China 4Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA Full list of author information is available at the end of the article © The Author(s). 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Meng et al. BMC Genomics 2020, 21(Suppl 11):896 Page 2 of 12 Background Results Alzheimer’s disease (AD) is a debilitating and highly her- Participant characteristics itable disease with great complexity in its genetic con- The subjects (N = 866) consisted of 467 males (53.9%) tributors [1]. Genome-wide association studies (GWAS) and 399 females (46.1%) aged 48–91 years. Shown in of AD or AD biomarkers have been performed at the Table 1 are the demographic and clinical characteristics single-nucleotide polymorphism (SNP) level [2–4]as of these subjects stratified by five diagnostic groups. well as at the higher level (e.g., gene, pathway and/or There is no significant difference on the APOE e4 status network) [5–8]. It is widely recognized that AD has a in the five diagnostic groups. Significant differences are complicated genetic mechanism involving multiple observed in gender (p = 0.035) and education (p = 0.037). genes. Different combinations of functionally related var- Age is significantly different across the five groups (p < iants in genes and pathways may interact to produce the 0.001). Furthermore, eight neuroimaging phenotypes phenotypic outcomes in AD, single SNP-level and gene- (LAmygVol,RAmygVol,LHippVol,RHippVol, LAccum- level GWAS results are unlikely to completely reveal the Vol,RAccumVol,LPutamVol,RPutamVol; see Table 2) underlying genetic mechanism in AD. GWAS have show the significant difference across the five diagnostic greatly facilitated the identification of genetic markers groups (p< 0.001). Shown in Fig. 2 is the correlation (e.g., single nucleotide polymorphisms or SNPs) associ- matrix of these eight phenotypes. The correlation ated with brain imaging quantitative traits (QTs) in AD between LHippVol and RHippVol (r=0.83) and that be- [9, 10]. As a complex disease, it is highly likely that AD tween LPutamVol and RPutamVol (r= 0.90) are among is influenced by multiple genetic variants [11, 12]. The the highest. identified single-SNP-single-QT associations typically have small effect sizes. To bridge this gap, exploring Multivariate genome wide association study single-SNP-multi-QT associations may have the poten- In multivariate genome wide association study (MGAS) tial to increase statistical power and identify meaningful [13, 20], the top SNP hit is rs769449 from the APOE and imaging genetic associations. With this observation, we TOMM40 region (p = 1.19E-09) (Table 3). According to employ the MGAS (Multivariate Gene-based Association the hypothesis that genes are the functional units in test by extended Simes procedure) tool [13] to perform biology [15, 21], multivariate gene-level association p-values multivariate GWAS on eight AD-relevant subcortical were also obtained by MGAS which combines p-value in- imaging measures. formationinregressingunivariate phenotypes on common In addition, biological interactions may be important SNPs. Figure 3 shows the Manhattan plot of the gene- in contributing to intermediate imaging QTs and overall based MGAS results. Using Bonferroni corrected p-value of disease outcomes [14]. Network-based analysis guided 0.05 as the threshold, three genes (APOE, TOMM40, by biologically relevant connections from public APOC1) were significantly associated with the studied eight databases provides a powerful tool for improved mech- subcortical measures. Table 4 shows that the top 10 gene- anistic understanding of complex disorders [15–18]. level findings identified by MGAS, where APOE (p =2.77E- Considering that the etiology of AD might depend on 08), TOMM40 (p = 3.49E-08), and APOC1 (p =2.09E-06) functional protein-protein interaction (PPI) network, we are the well-known AD risk regions. LAMA1 (p =3.79E-05) conduct multiple iPINBPA (integrative protein- was reported to encode the laminin alpha subunit associ- interaction-network-based pathway analysis) [19] net- ated with late onset AD in the Amish [22]. HSD17B7P2 work analyses on MGAS findings using the PPI data, (p = 8.40E-05) was reported to play an important role in and identify Consensus Modules (CMs) based on the brain development [23]. The other five gene-level findings iPINBPA discoveries. Functional annotation and net- in top 10 are XYLB, NPEPL1, CYP24A1, OR5B2 and work analysis are subsequently performed on the identi- MIR7160. fied CMs. In order to enhance the ability to recognize the aggre- Consensus modules gation effect of multiple SNPs, it may be desirable to Consensus modules (CMs) were constructed based on perform association analysis at the SNP set (or gene) our previous work [24]. To search for subnetworks in level rather than at a single SNP level. This paper aims the multivariate GWAS finding, we ran iPINBPA ten to reveal the relationship between genetic markers and times by varying the random seed value from 1 to 10. multiple phenotypes, improve statistical power, and find Table 5 shows the top 5 subnetworks identified in each GWAS missing results by MGAS. Network analysis run, including the Dice’s coefficient value with the most could provide meaningful biological relationships to help similar modules in other runs. Compared with the interpret GWAS data to further study the genetic mech- standard iPINBPA method, our CM-based network anism of AD. A schematic framework of our analysis is strategy was designed to identify more reliable modules shown in Fig. 1. across multiple runs. Meng et al. BMC Genomics 2020, 21(Suppl 11):896 Page 3 of 12 Fig. 1 An overview of the proposed analysis framework. a Multivariate genome wide association analysis of eight subcortical imaging measures. b Network-based analysis of MGAS findings using the CM-based network strategy. c Functional enrichment analysis of the identified consensus modules (CMs) For the overlapping subnetworks, five unique CMs signaling pathway are the pathways significantly were identified (Fig. 4). CM1 contains eight genes, in- enriched by one or more CMs [25]. We also observe that cluding MAPK8, ATF2, TNFRSF1A, JUND, NR3C1, RB1, CM1,CM3,CM4 enriched many interesting pathways. In IKBKB, and CDK2.CM2 contains four genes, including particular, CM3 demonstrates the strongest functional IGF1R, PRKCA, INSR, and PTPRC.CM3 contains six association with AD (p = 4.94E-05). genes, including APP, APOE, CASP3, C3, PLTP, and CNTF.CM4 contains five genes, including APP, APOE, Discussion CASP3, C3, and PLTP. CM5 contains six genes, includ- In this work, we performed multivariate genome wide ing MED8, GATA6, HNF4A, MED1, THOC7, and PHYH association study (MGAS) of eight AD-relevant subcor- IP. The individual genes in the CMs might not demon- tical ROIs, using 866 samples in the ADNI database.