Multi-Tissue Neocortical Transcriptome-Wide Association Study Implicates 8 Genes Across 6 Genomic Loci in Alzheimer's Disease
Total Page:16
File Type:pdf, Size:1020Kb
Gockley et al. Genome Medicine (2021) 13:76 https://doi.org/10.1186/s13073-021-00890-2 RESEARCH Open Access Multi-tissue neocortical transcriptome-wide association study implicates 8 genes across 6 genomic loci in Alzheimer’s disease Jake Gockley1 , Kelsey S. Montgomery1, William L. Poehlman1, Jesse C. Wiley1, Yue Liu2, Ekaterina Gerasimov2, Anna K. Greenwood1, Solveig K. Sieberts1, Aliza P. Wingo3,4, Thomas S. Wingo2,5, Lara M. Mangravite1 and Benjamin A. Logsdon6* Abstract Background: Alzheimer’s disease (AD) is an incurable neurodegenerative disease currently affecting 1.75% of the US population, with projected growth to 3.46% by 2050. Identifying common genetic variants driving differences in transcript expression that confer AD risk is necessary to elucidate AD mechanism and develop therapeutic interventions. We modify the FUSION transcriptome-wide association study (TWAS) pipeline to ingest gene expression values from multiple neocortical regions. Methods: A combined dataset of 2003 genotypes clustered to 1000 Genomes individuals from Utah with Northern and Western European ancestry (CEU) was used to construct a training set of 790 genotypes paired to 888 RNASeq profiles from temporal cortex (TCX = 248), prefrontal cortex (FP = 50), inferior frontal gyrus (IFG = 41), superior temporal gyrus (STG = 34), parahippocampal cortex (PHG = 34), and dorsolateral prefrontal cortex (DLPFC = 461). Following within-tissue normalization and covariate adjustment, predictive weights to impute expression components based on a gene’ssurroundingcis-variants were trained. The FUSION pipeline was modified to support input of pre- scaled expression values and support cross validation with a repeated measure design arising from the presence of multiple transcriptome samples from the same individual across different tissues. Results: Cis-variant architecture alone was informative to train weights and impute expression for 6780 (49.67%) autosomal genes, the majority of which significantly correlated with gene expression; FDR < 5%: N = 6775 (99.92%), Bonferroni: N = 6716 (99.06%). Validation of weights in 515 matched genotype to RNASeq profiles from the CommonMind Consortium (CMC) was (72.14%) in DLPFC profiles. Association of imputed expression components from all 2003 genotype profiles yielded 8 genes significantly associated with AD (FDR < 0.05): APOC1, EED, CD2AP, CEAC AM19, CLPTM1, MTCH2, TREM2, and KNOP1. Conclusions: We provide evidence of cis-genetic variation conferring AD risk through 8 genes across six distinct genomic loci. Moreover, we provide expression weights for 6780 genes as a valuable resource to the community, which can be abstracted across the neocortex and a wide range of neuronal phenotypes. Keywords: Alzheimer’s disease, TWAS, FUSION, GWAS, Dementia, Neurodegeneration, AMP-AD * Correspondence: [email protected] 6Cajal Neuroscience, 1616 Eastlake Avenue East, Suite 208, Seattle, WA 98102, USA Full list of author information is available at the end of the article © The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Gockley et al. Genome Medicine (2021) 13:76 Page 2 of 15 Background CD2AP, CEACAM19, CLPTM1, MTCH2, TREM2, and Alzheimer’s disease (AD) is a progressive, incurable neu- KNOP1. rodegenerative disease accounting for 60–70% of all de- Expanding associated genes into gene sets using co- mentia diagnoses [1], currently affecting 5.8 million expression yielded enrichments for specific cell-type Americans and projected to grow to 13.8 million diagno- marker sets particularly microglial, oligodendrocyte, and ses by 2050 [2]. Late-onset Alzheimer’s disease (LOAD) astrocyte cell populations and cellular functions such as comprises over 95% of AD diagnoses and is composed protease binding, myeloid, and leukocyte regulation/acti- of a diverse, largely unknown set of etiologies [3]. The vation, regulation lipid/lipoprotein, RNA splicing, and mechanisms of AD remain insufficiently explained, despite steroid regulation. We identify 8 genes across six distinct clear Alpha-Beta plaque and Tau neurofibrillary tangle genomic loci associated to AD through gene expression neuropathology. Additionally,knownhighlypenetrantgen- attributable to their cis-genetic variation. Trained gene etic variants from familial-based cohorts with early-onset expression weights are a community resource which can Alzheimer’s disease (EOAD) implicate genes such as APP, be abstracted to multiple phenotypes and gain further PSEN1, and PSEN2. Despite clear pathology and known insights from large genotyped cohorts to maximize the risk factors, AD therapeutic clinical trials have consistently informativeness of invaluable and rare patient material failed [4, 5]. Elucidating the mechanisms by which AD gen- [11]. To this end, we provide a valuable resource to the etic risk loci contribute to AD disease and disease progres- community in the form of predictive gene expression sion is instrumental in the development of future impactful weights which can be leveraged across a wide range of therapeutics. neurological phenotypes. While genome-wide association studies (GWAS) iden- tified some candidate loci associated with AD risk, genes Methods targeted through cis-genetic risk factors remain unclear Ancestry analysis [6, 7]. Likewise, postmortem bulk-cell transcriptomics Ancestry analysis and clustering was performed to iden- show vast expression changes across multiple neocortical tify individuals with northern and western European an- regions; however, it remains difficult to identify which cestry from the combined ROSMAP, Mayo, and MSBB are driving AD from expression changes merely caused cohort. Briefly, phase I 1000 Genomes data [12] was fil- by the disease state of widespread cell death and tissue tered for YRI, CHB, JPT, and CEU ancestral populations. degeneration [8, 9]. Transcription-wide association stud- Genotype data was combined across MSBB, Mayo, and ies (TWAS) help provide this associative bridge and ROSMAP cohorts, filtered for 1000G overlapping SNPs, mechanistic direction of effect between genotype, and combined with 1000G data from the four reference transcript, and disease status [10]. TWAS leverages the ancestral populations. PCA was performed using Plink cis-genotype region surrounding an expressed gene to (v1.9) on SNPs passing filtering: minor allele frequency predict the cis-heritable component of a gene’s expres- > 1%, missingness < 0.1, maximum minor allele fre- sion, which in turn can be associated to disease status quency < 40%, and independent pairwise linkage filter using GWAS summary statistics. window of 50 Kb at 5 Kb steps and an r-squared thresh- We modified the FUSION pipeline [10], which deploys old of 0.2. PCA results were visualized along PC1 blup, bslmm, lasso, top1, and enet models to predict (50.7%) and PC2 (31.6%). Genotype clustering to identify gene expression from cis-variants within 1 MB of a given clustered genotype profiles was performed with the R gene to train weights and impute expression for 6780 package GemTools [13, 14]. The clustering of genotype (49.67%) autosomal genes from matched genotypes and profiles was performed on all PCs describing greater RNA-Seq profiles from dorsolateral prefrontal cortex than 1% of variation. Genotypes were recoded for Gem- (DLPFC), temporal cortex (TCX), prefrontal cortex Tools clustering in plink similar to the PCA but with the (PFC), superior temporal cortex (STG), inferior temporal added flags: --recode12, --compound-genotypes, --geno gyrus (IFG), and parahippocampal gyrus (PHG) provided 0.0000001. Ancestrally matched CEU samples were iden- by the Accelerating Medicines Partnership - Alzheimer’s tified as any sample genotypes belonging to one of the Disease (AMP-AD) consortia. Imputed gene expression three out of eight clusters to which contained 1000G ref- was validated in CommonMind Consortium (CMC) erence CEU individuals (Additional file 1: Fig. S1). DLPFC. Using our trained models, we imputed gene ex- pression into a large, recent GWAS cohort [6] to identify Defining the TWAS training set genes showing differential predicted gene expression be- Genotype data for 2360 individuals was combined across tween AD patients and controls. Following correction MSBB (N = 349), Mayo (N = 303), and ROSMAP (N = multiple testing, joint conditional probability testing 2360) cohorts (Specific details see: supplementary (JCP), and summary Mendelian randomization (SMR), methods sections 1–2). CEU 1000G reference individuals we discover eight candidate AD risk genes APOC, EED,