Gockley et al. Genome Medicine (2021) 13:76 https://doi.org/10.1186/s13073-021-00890-2

RESEARCH Open Access Multi-tissue neocortical transcriptome-wide association study implicates 8 across 6 genomic loci in Alzheimer’s disease Jake Gockley1 , Kelsey S. Montgomery1, William L. Poehlman1, Jesse C. Wiley1, Yue Liu2, Ekaterina Gerasimov2, Anna K. Greenwood1, Solveig K. Sieberts1, Aliza P. Wingo3,4, Thomas S. Wingo2,5, Lara M. Mangravite1 and Benjamin A. Logsdon6*

Abstract Background: Alzheimer’s disease (AD) is an incurable neurodegenerative disease currently affecting 1.75% of the US population, with projected growth to 3.46% by 2050. Identifying common genetic variants driving differences in transcript expression that confer AD risk is necessary to elucidate AD mechanism and develop therapeutic interventions. We modify the FUSION transcriptome-wide association study (TWAS) pipeline to ingest expression values from multiple neocortical regions. Methods: A combined dataset of 2003 genotypes clustered to 1000 Genomes individuals from Utah with Northern and Western European ancestry (CEU) was used to construct a training set of 790 genotypes paired to 888 RNASeq profiles from temporal cortex (TCX = 248), prefrontal cortex (FP = 50), inferior frontal gyrus (IFG = 41), superior temporal gyrus (STG = 34), parahippocampal cortex (PHG = 34), and dorsolateral prefrontal cortex (DLPFC = 461). Following within-tissue normalization and covariate adjustment, predictive weights to impute expression components based on a gene’ssurroundingcis-variants were trained. The FUSION pipeline was modified to support input of pre- scaled expression values and support cross validation with a repeated measure design arising from the presence of multiple transcriptome samples from the same individual across different tissues. Results: Cis-variant architecture alone was informative to train weights and impute expression for 6780 (49.67%) autosomal genes, the majority of which significantly correlated with ; FDR < 5%: N = 6775 (99.92%), Bonferroni: N = 6716 (99.06%). Validation of weights in 515 matched genotype to RNASeq profiles from the CommonMind Consortium (CMC) was (72.14%) in DLPFC profiles. Association of imputed expression components from all 2003 genotype profiles yielded 8 genes significantly associated with AD (FDR < 0.05): APOC1, EED, CD2AP, CEAC AM19, CLPTM1, MTCH2, TREM2, and KNOP1. Conclusions: We provide evidence of cis-genetic variation conferring AD risk through 8 genes across six distinct genomic loci. Moreover, we provide expression weights for 6780 genes as a valuable resource to the community, which can be abstracted across the neocortex and a wide range of neuronal phenotypes. Keywords: Alzheimer’s disease, TWAS, FUSION, GWAS, Dementia, Neurodegeneration, AMP-AD

* Correspondence: [email protected] 6Cajal Neuroscience, 1616 Eastlake Avenue East, Suite 208, Seattle, WA 98102, USA Full list of author information is available at the end of the article

© The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Gockley et al. Genome Medicine (2021) 13:76 Page 2 of 15

Background CD2AP, CEACAM19, CLPTM1, MTCH2, TREM2, and Alzheimer’s disease (AD) is a progressive, incurable neu- KNOP1. rodegenerative disease accounting for 60–70% of all de- Expanding associated genes into gene sets using co- mentia diagnoses [1], currently affecting 5.8 million expression yielded enrichments for specific cell-type Americans and projected to grow to 13.8 million diagno- marker sets particularly microglial, oligodendrocyte, and ses by 2050 [2]. Late-onset Alzheimer’s disease (LOAD) astrocyte cell populations and cellular functions such as comprises over 95% of AD diagnoses and is composed protease binding, myeloid, and leukocyte regulation/acti- of a diverse, largely unknown set of etiologies [3]. The vation, regulation lipid/lipoprotein, RNA splicing, and mechanisms of AD remain insufficiently explained, despite steroid regulation. We identify 8 genes across six distinct clear Alpha-Beta plaque and Tau neurofibrillary tangle genomic loci associated to AD through gene expression neuropathology. Additionally,knownhighlypenetrantgen- attributable to their cis-genetic variation. Trained gene etic variants from familial-based cohorts with early-onset expression weights are a community resource which can Alzheimer’s disease (EOAD) implicate genes such as APP, be abstracted to multiple phenotypes and gain further PSEN1, and PSEN2. Despite clear pathology and known insights from large genotyped cohorts to maximize the risk factors, AD therapeutic clinical trials have consistently informativeness of invaluable and rare patient material failed [4, 5]. Elucidating the mechanisms by which AD gen- [11]. To this end, we provide a valuable resource to the etic risk loci contribute to AD disease and disease progres- community in the form of predictive gene expression sion is instrumental in the development of future impactful weights which can be leveraged across a wide range of therapeutics. neurological phenotypes. While genome-wide association studies (GWAS) iden- tified some candidate loci associated with AD risk, genes Methods targeted through cis-genetic risk factors remain unclear Ancestry analysis [6, 7]. Likewise, postmortem bulk-cell transcriptomics Ancestry analysis and clustering was performed to iden- show vast expression changes across multiple neocortical tify individuals with northern and western European an- regions; however, it remains difficult to identify which cestry from the combined ROSMAP, Mayo, and MSBB are driving AD from expression changes merely caused cohort. Briefly, phase I 1000 Genomes data [12] was fil- by the disease state of widespread cell death and tissue tered for YRI, CHB, JPT, and CEU ancestral populations. degeneration [8, 9]. -wide association stud- Genotype data was combined across MSBB, Mayo, and ies (TWAS) help provide this associative bridge and ROSMAP cohorts, filtered for 1000G overlapping SNPs, mechanistic direction of effect between genotype, and combined with 1000G data from the four reference transcript, and disease status [10]. TWAS leverages the ancestral populations. PCA was performed using Plink cis-genotype region surrounding an expressed gene to (v1.9) on SNPs passing filtering: minor allele frequency predict the cis-heritable component of a gene’s expres- > 1%, missingness < 0.1, maximum minor allele fre- sion, which in turn can be associated to disease status quency < 40%, and independent pairwise linkage filter using GWAS summary statistics. window of 50 Kb at 5 Kb steps and an r-squared thresh- We modified the FUSION pipeline [10], which deploys old of 0.2. PCA results were visualized along PC1 blup, bslmm, lasso, top1, and enet models to predict (50.7%) and PC2 (31.6%). Genotype clustering to identify gene expression from cis-variants within 1 MB of a given clustered genotype profiles was performed with the R gene to train weights and impute expression for 6780 package GemTools [13, 14]. The clustering of genotype (49.67%) autosomal genes from matched genotypes and profiles was performed on all PCs describing greater RNA-Seq profiles from dorsolateral prefrontal cortex than 1% of variation. Genotypes were recoded for Gem- (DLPFC), temporal cortex (TCX), prefrontal cortex Tools clustering in plink similar to the PCA but with the (PFC), superior temporal cortex (STG), inferior temporal added flags: --recode12, --compound-genotypes, --geno gyrus (IFG), and parahippocampal gyrus (PHG) provided 0.0000001. Ancestrally matched CEU samples were iden- by the Accelerating Medicines Partnership - Alzheimer’s tified as any sample genotypes belonging to one of the Disease (AMP-AD) consortia. Imputed gene expression three out of eight clusters to which contained 1000G ref- was validated in CommonMind Consortium (CMC) erence CEU individuals (Additional file 1: Fig. S1). DLPFC. Using our trained models, we imputed gene ex- pression into a large, recent GWAS cohort [6] to identify Defining the TWAS training set genes showing differential predicted gene expression be- Genotype data for 2360 individuals was combined across tween AD patients and controls. Following correction MSBB (N = 349), Mayo (N = 303), and ROSMAP (N = multiple testing, joint conditional probability testing 2360) cohorts (Specific details see: supplementary (JCP), and summary Mendelian randomization (SMR), methods sections 1–2). CEU 1000G reference individuals we discover eight candidate AD risk genes APOC, EED, were present within the top 3 clusters (Additional file 1: Gockley et al. Genome Medicine (2021) 13:76 Page 3 of 15

Fig. S1c) which contained 2003 individuals across all co- were represented multiple times for a subset of MSBB horts (Additional file 1: Fig. S1[b,d]). Broader admixture individuals to create the 888 unique genotype-expression was observed in the MSBB cohort compared to ROS- pairs. Among the 60 ancestrally matched MSBB geno- MAP and Mayo representative of study recruitment and types, 15 were profiled in all four tissues, 22 were pro- population (Additional file 1: Fig. S1[e-h]). After filtering filed in three tissues, 10 were profiles in 2 tissues, and 13 for variants present within the LD-Reference panel, the were only profiled in one of the four tissues. The final final European ancestry filtered population of 2003 indi- training set contained 275 designated AD cases, 181 des- viduals was represented by 1,069,623 variants (Fig. 1a). ignated controls, and 432 genotype-phenotype profiles Expression profiling from high-throughput sequencing which failed to meet a diagnostic case/control status from all three studies: ROSMAP (DLPFC N = 632), from neuropath and cognitive assessments. Mayo (TCX N = 262), and MSBB (PHG N = 161, IFG N = 187, STG N = 191, FP N = 214) were normalized Training TWAS weights within study. For description of the individual expression Multiple regions per-individual were assayed, in such data and processing, see supplementary methods (Add- cases both expression profiles were paired to the individ- itional file 1: Supplementary methods sections 3–4). ual’s genotype in the training set. The resulting training Iterative normalization was performed to regress signifi- set consisted of 790 ancestrally clustered genotypes cant covariates for individual studies including, but not matched to 888 normalized, scaled RNA-Seq profiles limited to, diagnosis, age at death, sex, and post mortem with diagnosis regressed. Weights predicting gene ex- interval (Additional file 2: Table S1). Diagnosis was pression were trained on matched genotype-RNA-Seq regressed as is common practice in TWAS and eQTL profiles and then used to impute the expression compo- studies as genetic allele risk has been found to be largely nents of all 2003 genotypes in our CEU ancestrally de- condition-independent [15, 16]. Post quality control and fined cohort. The FUSION software [10] was modified normalization, 888 RNA-Seq samples from all three to accommodate the presence of multiple RNA-Seq pro- studies (ROSMAP (DLPFC N = 481), Mayo (TCX N = files across different regions for the same individual by 248), and MSBB (PHG N = 34, IFG N = 41, STG N = ensuring that all samples from a given individual were 34, FP N = 50) were matched to 790 CEU ancestrally present within a single cross-validation fold during train- matched individuals (Fig. 1b, c, Table S4-S5). Genotypes ing and model optimization. The FUSION pipeline

Fig. 1 Experimental design. a Sample cohorts from MSBB, Mayo, and ROSMAP were combined across platforms by consensus SNP variant sites. Ancestry analysis was performed, and sites within the 2003 CEU ancestrally matched populations were filtered for consensus with the LD reference panel. b RNA-Seq samples originated from 6 distinct neocortical regions. c The training set data for training TWAS weights consisted of 888 RNA-Seq samples matched to 789 individual variant profiles Gockley et al. Genome Medicine (2021) 13:76 Page 4 of 15

script was altered cross-validating cohorts of multiple- performed using eagle, Minimac, and the HRC Reference samples per-individual, this capacity also ensured that Panel [18]. Imputed variant data was filtered for SNPs each cross-fold validation sample was within 5% of the present in the LD reference panel using Plink (v1.9). CMC size every other fold and could accept pre-scaled expres- data was withheld from training gene weight models for the sion values with specification of an additional flag --scale purpose of validating gene weights in an independent co- 1. Gemma (v0.98.1) calculated the cis-heritability of hort, blinded from the training models. Expression values scaled expression using all SNPs denoted in the LD- were imputed, and Kendall correlation values were calcu- Reference panel and within 1 MB window centered on lated comparing imputed gene expression to the scaled, the gene’s TSS for all 13650 autosomal genes. Weights assayed expression values. Correlation test values were FDR were trained using all five TWAS models (blup, bslmm, corrected for the number of matched comparisons N = lasso, top1, enet) for the 6780 genes with a significant 6643. cis-heritability (p < 0.01). To support the computational requirements of all five models, the FUSION software JCP, SMR, and COLOC analysis was altered to run on 5 threads and run in 14x parallel To assess the independence of these associations within across an AWS c5.18xlarge (72 core 144 MB) EC-2 in- their respective 1 MB windows, JCP testing was per- stance. All supporting files for training weights are avail- formed [17, 19]. In order to replicate our AD associa- able on Synapse [17] as well as trained weights in RData tions, SMR [17, 20] was run on all 6780 genes with files which can be used to impute expression compo- weights and analyzed for TWAS preliminary hits. Cor- nents with a user provided genotype profile. rection for 18 multiple comparisons was applied for rep- lication of associated genes (Additional file 2: Table S2). Expression imputation and TWAS gene associations JCP analysis was run on all candidate hit regions with The heritable component of gene expression was im- FUSION.post_process.R to disentangle signal within re- puted for 6780 genes with trained weights. Despite being gions with multiple significant genes by examining the trained on only the 790 individuals with 888 matched probability that multiple associations occur simultan- RNA-Seq profiles, expression components based on cis- eously (jointly). This analysis module helps identify genotype were able to be imputed for the entire com- genes which are associated irrespective of surrounding bined Mayo, MSBB, and ROSMAP genotype cohort of genes (Marginal) from those which rely on surrounding 2003 CEU ancestry-matched individuals on an AWS loci (Conditional) [10, 19, 21–23]. COLOC analysis was r3.8xlarge (32 core 144 MB) EC-2 instance. While ex- carried out to examine the probabilities that the pression components were able to be imputed for 2003 expression-AD phenotype signals for the remaining tar- genotypes, weight training was only able to be per- gets were driven by the same underlying genomic variant formed on the matched 888 genotype to expression pro- etiology [24–26] (Additional file 2: Table S6). files. Association of AD cases versus control using Kunkle et.al GWAS summary statistics [6] was per- GWAS enrichments formed with the FUSION.assoc_test.R script [10]. AD In order to examine whether the subset of genes with case and control designation was specified with strict trained TWAS weights were enriched with variants more neuropathological diagnosis criteria cutoffs as specified likely to regulate gene expression within their centered in supplementary table 5. Only 635 out of 2003 ances- 1 MB window compared to brain expressed genes with- trally matched individuals from the combined Mayo, out trained weights or the rest of the genome, summary MSBB, and ROSMAP genotypes were designated as AD statistics for both Kunkle et al. [6] and Styrkarsdottir cases (N = 404) or controls (N = 231) (Fig 2a). et al. [6, 27] were partitioned into 3 groups: SNPs that were within 1 MB of the TSS of genes which had trained CMC DLPFC validation TWAS weights, SNPs that were within 1 MB of the TSS CMC count data was ingested and processed similar to the of Autosomal genes which did not have trained TWAS AMP-AD transcriptome data. An iterative normalization weights, and autosomal SNPs that were not within 1 MB model was deployed to identify significant covariates and of an expressed gene (Intergenic). Wilcoxon rank-sum regress them from the expression data before scaling the tests were performed comparing p value distributions data (Additional file 2: Table S2). Genotype data was pro- between SNPs near genes with trained weights versus filed with Affymetrix GeneChip Mapping 5.0 K Array and a those without, as well as within and intergenic regions custom version of the Illumina Infinium CoreExome-24 (Additional file 2: Fig. S6). v1.1 BeadChip (#WG-331-1111). Raw data was filtered to remove SNPs with: zero alternate alleles, MAF < 1%, geno- Gene set expansion and cell type analysis − typing rate < 0.95, Hardy-Weinberg p value < 1 × 10 6,and To examine functional enrichments and identify poten- individuals with genotyping rate < 0.95. Imputation was tial processes driving AD, we used coexpression to build Gockley et al. Genome Medicine (2021) 13:76 Page 5 of 15

A) APOC1

30

) Significantly Associtated Gene

Marginally Associated After JCP

Not Significant After JCP 20

CEACAM19

CLPTM1 10 PVR EED TWAS Association P−Value TWAS MTCH2

( CD2AP C1QTNF4 10

NUP160 MADD POLR2E ZNF660 TREM2 KNOP1 PVRL2 PTPRJ POLR2G −log TMEM223

0 12345678910111213141516171820 22 19 21 B) C) 35 SPI1 FAM180B

) MYBPC3 NDUFS3 LRP4−AS1 MADD MTCH2 LRP4 NR1H3 C1QTNF4 OR4C45 30 SNORD67 ACP2 KBTBD4 OR4C3 MIR5582 DDB2 PTPMT1 OR4S1 CKAP5 MIR6745 CELF1 OR4X1 25 F2 PACSIN3 PSMC3 NUP160 OR4X2

P-Value ZNF408 C11orf49 SLC39A13 FNBP4 OR4B1

( ARHGAP1 ARFGAP2 RAPSN AGBL2 PTPRJ OR4A47

10 20 -log 15 ) 8

10 alue 6 V 4 P−

5 ( 2 10 Observed:

0 0

01234 −log 47.0 47.5 48.0 48.5 Expected: -log10( P-Value ) chr 11 physical position (MB) Fig. 2 Transcriptome-wide association study results. a Log 10 TWAS association p values by gene shown by genomic location are indicated in black and grey. Features passing initial correction for multiple comparisons (above the dotted line), but marginally significant after joint conditional probability (JCP) are shown in purple. Those features which are no longer significant after JCP are shown in light blue, while genes surviving JCP are shown in yellow. b QQplot of all TWAS p values. c An example plot of a region tested for JCP. The candidate genes found to be marginally significant, NUP160 and PTPRJ, are colored blue while those found to be jointly significant, MADD and MTCH2, are colored green (upper), while individual SNP p values are colored grey before and blue after conditioning (lower) out a gene set focused on each TWAS associated gene. additional gene was also correlated to all other tran- We first wanted to consider the possibility that another scribed genes for all six tissues. The only gene that met gene within the 1 MB window was co-regulated with our this criterion and was added for further analysis was identified gene of interest and therefore could be the APOE, as it resides within the 1 MB window surround- causative gene. For all 6 tissues, Variable Bayes Spike Re- ing APOC1. For each gene or in the case of the APOC1 gression [28, 29] was used to calculate the bootstrapped locus, APOC1 and APOE, the mean partial correlations partial regression of each TWAS associated gene to all across all 6 neocortical tissues were used to enrich for other genes. If the mean correlation of a gene within the co-expressed functionally related genes. The elbow plots 1MB cis-regulatory region was greater than 0.1, the indicating the decay of included genes as the standard Gockley et al. Genome Medicine (2021) 13:76 Page 6 of 15

A) B) APOC1 Greater than 1 Interaction ( N= 1167 ) APOC1 Greater than 2 Interactions ( N= 524 ) activation of immune response cellular response to organonitrogen compound cellular response to organonitrogen compound activation of immune response immune response-regulating signaling pathway immune response-regulating signaling pathway lymphocyte activation response to peptide immune response-activating signal transduction immune response-activating signal transduction cofactor metabolic process positive regulation of kinase activity inorganic ion homeostasis positive regulation of MAPK cascade Seed Gene positive regulation of kinase activity regulation of cell activation regulation of cell activation positive regulation of kinase activity response to peptide protein kinase B signaling TREM2 positive regulation of MAPK cascade response to peptide hormone 7500 cellular ion homeostasis regulation of protein kinase B signaling Count Count regulation of vesicle-mediated transport phagocytosis CD2AP 40 regulation of leukocyte activation 60 immune response-regulating cell surface receptor signaling pathway positive regulation of cytokine production 80 cellular response to peptide 50 MTCH2 response to peptide hormone 100 positive regulation of protein kinase B signaling 60 120 peptidyl-tyrosine immune response-activating cell surface receptor signaling pathway 70 EED peptidyl-tyrosine modification 140 positive regulation of MAP kinase activity phagocytosis response to insulin 80 KNOP1 cellular response to peptide P-Adjusted Fc receptor signaling pathway immune response-regulating cell surface receptor signaling pathway 3.68e-37 phosphatidylinositol-mediated signaling P-Adjusted regulation of MAP kinase activity 1.57e-22 inositol lipid-mediated signaling 7.95e-38 APOC1 protein kinase B signaling phosphatidylinositol 3-kinase signaling 3.14e-22 8.84e-19 positive regulation of cell activation ERBB signaling pathway 5000 CEACAM19 receptor-mediated endocytosis 4.71e-22 immune response-regulating cell surface receptor sig. pathway invpl/ in phagocytosis 1.77e-18 positive regulation of leukocyte activation 6.28e-22 Fc-gamma receptor signaling pathway involved in phagocytosis 2.65e-18 CLPTM regulation of protein kinase B signaling Fc-gamma receptor signaling pathway positive regulation of protein kinase B signaling Fc receptor mediated stimulatory signaling pathway 3.54e-18 Fc receptor signaling pathway regulation of phosphatidylinositol 3-kinase signaling phosphatidylinositol 3-kinase signaling vascular endothelial growth factor receptor signaling pathway 0.06 0.08 0.10 0.12 0.06 0.08 0.10 0.12 0.14 GeneRatio GeneRatio SFPQ F2 UBB STAT1 PRKCB NFATC2 TXK CDH1 TRAF6 RUNX1 IRAK3 HSPA5 PHB CD38 HSP90B1 PKLR MAX ACTA1 ELF2 HSPD1 CFTR LRRK2 PDE4B STAT3 C1QA PRKACA PRKCB AGT JUN ELF1 ZEB1 PSMD7 MAPKAPK2 NFKB1 HRAS SOS1 AKR1B1 GCK GABPA APP RPS27A STOML2 PSEN1 SHC1 APOB C1R DRD5 PTGS2 ESRRA PDE4B MAP3K7 PSEN1 HLA DPA1 ATP1A3 SORBS1 GAB2 IRF1 S100A1 PRKDC STAT5A GCLC SLC2A4 LCP2 SHC1 HSPA1A CTSL LAT2 UBE2D3 GNAI2 ADRB2 SP1 TP53 2500 SARM1 LY 9 6 PSMA3 A2M ASS1 RELA YWHAG RELA INPP5D DRD1 FOXO4 JAK2 positive regulation of protein kinase B signaling CASP8 IRF7 PSMC2 CYBB GNG2 TAF1 BTK PDPK1 HSPA1B CD4 CFH JAK3 HLA DPB1 CFD VAMP2 GRM5 IRF3 PTPN6 IKBKE GNB1 PARP1 AQP9 ZAP70 NRAS CTSB IRAK1 KRAS UBA52 CLU CASP3 LGMN ITGA4 SLC2A8 PRKCI BTRC TEC UBE2V1 SLA2 TBK1 PRKCQ NLRC4 immune response- UBE2N C1QB regulation of protein kinase B signaling AKT2 GAB1 UBE2D2 UBC CUL1 CALM1 SERPING1 cellular response to organonitrogen compound IRAK4 AKT2 ERBB2 regulating signaling pathway PRNP AKT1 PTPN11 INPP5K Size PIK3CD UBE2D1 C1S C1QC IRS1 AKT1 FGFR4 KIT KL AIF1 FGFR1 ESR1 PTPN11 NCK1 CYFIP1 53 FGF18 ARRB2 NCKAP1 NR1H3 MYD88 ABI1 CFB NRG1 DNM2 LY N F7 BTC NCK1 PIK3CG GAB1 PPARG FGF1 C1QBP SFTPA1 LY N AKR1C2 PIK3CB FCGR1A CSK VAV2 activation of immune response ERBB4 PIK3CA SOD1 ACTB 62 CD86 PIK3CG FYN ITGB2 DRD2 EGFR BAIAP2 SRC WASF2 MAPK1 BRK1 WASL CD14 NCKAP1L FGF1 EGF PTK2 PTEN LRP1 TREM2 FGF5 PDGFRB IL18 PDGFRA PIK3CA 71 GRB2 CD247 SYK BAIAP2 ESR1 FGF10 GRB2 TGFB1 IL1B phagocytosis LEPR AKR1C3 RAC1 FGR FGF22 PDGFB CYFIP1 HCK IL18 FGFR1 VAV1 PTPRJ ARPC2 PLCG1 C2 FGFR3 RHOG FYN PIK3R1 PIP5K1A 80 MAZ IRS1 CYFIP2 SRC TGM2 GPX1 HSP90AA1 TYRO3 MAPK3 ACTB PTPRC C4B FGF23 PTK2 AKR1C3 ERBB2 WASF2 FGF9 TLR4 YES1 FCGR3A PLCG2 CDC42 ACTG1 EGF IL1B PIK3R1 C4A FGF23 FGF17 protein kinase B signaling ERBB3 TGFA MYO18A CDC42 FGF2 VAV3 MAPK9 ITGAM ACTG1 Size HCK 0 PIK3R5 BAG4 FGF20 GPX1 FGF2 NCKAP1 SOS1 FGF18 CYFIP2 ITGAV protein kinase B signaling 66 FGF3 ITGB1 EGFR PDGFB FGF20 HSP90AA1 LDLR SYK RAB7A RHOG KITLG TGFA RAC1 BTC FGF5 PIP5K1C KLB BAG4 MSR1 FGF22 ITGB1 91 VAV1 ADORA2A FGF17 ERBB4 KL RAC2 PIK3CB FCGR3A FGFR2 FGF3 CCL2 LEPR FGFR4 MAZ KLB MYH9 HBEGF TGFB1 MET ARPC2 0.0 0.5 5.0 7.5 10.0 NRG1 PTEN AIF1 PIP5K1A 115 FGFR2 PDGFRB positive regulation of FGR INPP5K MET RAC2 MERTK LRP1 PLCG1 ABI1 ICAM5 GSN BCR FGFR3 ITSN1 AXL SLC9A3R1 CD86 HBEGF FGF9 protein kinase B signaling Partially Correlated Genes Included DRD2 ITGAV BCR YES1 NR1H3 MYH9 SETX RARA FGF10 IRS2 phagocytosis 140 BRK1 PLCG2 SETX SLC9A3R1 GSN TYROBP IL15RA LDLR EPHA2 ADORA1 RRAS PTX3 MSR1 KDR CD14 CD247 PPARG DNM2 RAB7A PIP5K1C ITGA2 NCKAP1L Standard Deviation Cutoff SOX9 PIK3R3 EPHA2 RAB5A ADORA1 SYT11 TREM2 SOD1 MYO18A ADORA2A RARA SCARB1 TGM2 C) D) * APOC1 * * * ** * * * * ** * *CLPTM1 * * CLPTM1 * * MTCH2 TREM2 * ** * * KNOP1 MTCH2 * * * **CD2AP EED * * EED ** CEACAM19 * * CEACAM19 5 CD2AP 4 * ** ** * *** * * *** TREM2 2 0 KNOP1 0 * ** * *****APOC1 −2 Ast End Ex1 Ex2 Ex3a Ex3b Ex3c Ex3d Ex3e Ex5a Ex6a Ex6b Ex8 In1a In1b In1c In2 In4a In4b In6a In6b In7 Mic Oli OPC OPC_Ce P Purk1 Pu Ast_Cer Ex4 Ex5b G In3 In8 −5 −4 Ast0 Ast1 Ast3 Ex0 Ex11 Ex14 Ex2 Ex4 Ex6 Ex7 Ex8 Ex9 In1 In10 In11 In2 In4 In5 In6 In7 In9 Mic0 Mic1 Mic2 Oli0 Oli3 Oli5 Opc0 Opc2 Ast2 Ex1 Ex12 Ex3 Ex5 In0 In3 In8 Mic3 Oli1 Oli4 Opc1 er ran r k2 −6 r Fig. 3 Cell-specific and cell process enrichment analysis. a Gene-set cutoffs of partial correlation to associated gene as a function of standard deviations away from the mean partial correlation. These are the cutoffs used for cell type specific enrichments seen in c and d. b EnrichR cell process enrichments of APOC1 expanded gene set of the 50 highest partially correlated genes and then expanded to all protein interaction partners from the pathway commons database which are represented more than once (Left) and twice (Right). c Cell type specific enrichments of expanded gene sets using Lake et. al cell type specific marker gene sets. Grey denotes zero overlapping genes between gene set and cell type-marker gene set significance in a grey square infers depletion from the expected overlap by chance. d Cell type specific enrichments of expanded gene sets using Mathys et. al cell type specific marker gene sets deviation of mean partial correlations moves away from protein-protein interaction databases from Pathway the centered mean of zero. This was used to draw cut- Commons [32]. All pairwise gene-gene interactions con- offs for inclusion into the expanded gene set at values of taining a gene member with the initial gene set expan- 0.7 (TREM2), 1.3 (KNOP1, CD2AP, MTCH2, EED), and sion were extracted, and the total gene set was expanded 1.7 (APOC1) (Fig. 3a). into a broad and narrow range gene set by including any For cell-type enrichments, TWAS expanded co- gene which appears more than once (broad) or more expression gene sets were analyzed for enrichment in than twice (narrow). Ensembl gene IDs were translated cluster-specific marker genes from Lake et al. [30] and to gene symbols and filtered for brain relevance by re- Mathys et al. [31]. Odds ratios were calculated with the quiring them to be expressed in at least one brain re- percent overlap of a gene set and cell-type specific gion. These gene sets were then submitted to EnrichR marker gene list divided by the expected percentage of [33, 34] to find cell process enrichments with the back- overlap. Significance was calculated with a two-sided ground set being any gene set to the list of all genes exact binomial test, FDR < 0.05 corrected for 35 (Lake) expressed in any one of the 6 brain regions analyzed. and 41 (Mathys) comparisons. For cell process enrichments, a multiscale gene set ex- Results pansion was employed, foundationally based on small Training and validating TWAS weights scale enrichment of expanding the candidate TWAS A total of 6818 (49.95%) genes had significant cis-herit- gene to the top 50 bootstrapped partial regression inclu- ability estimates (p value < 0.01) and therefore had sion statistics averaged across all tissues. The second ex- weights trained for them; only 6780 of 13650 (49.67%) pansion was performed by leveraging pathway commons could have non-zero variance expression components Gockley et al. Genome Medicine (2021) 13:76 Page 7 of 15

imputed into the CEU ancestrally matched genotype co- of the 13650 expressed genes in the training dataset re- hort of 2003 individuals (Additional file 1: Fig. S.2). Im- gardless of whether predictive weights were able to be puted expression components based on cis-genotype trained for the gene. Variants within 1 Mb of genes contribution were correlated to actual expression values meeting the heritability threshold versus were signifi- for all 888 matched genotype to RNA-Seq profiles which cantly enriched in lower GWAS p values than those was the inclusion criteria of the training set. Kendall within 1 MB of genes below the heritability threshold (p − correlations were calculated for all 6780 imputed to ac- < 2.22 16 Wilcoxon rank-sum) as well as intergenic vari- − tual gene expression, resulted in 6775 (99.93%; FDR < ants (p < 3.70 15 Wilcoxon rank-sum). To confirm this 0.05, N = 13650) imputed gene profiles significantly result, the same analysis was performed with variants correlated with the actual expression and 6716 (99.06%) from Styrkarsdottir et al.’s[27] bone density GWAS ana- were significantly correlated after Bonferroni correction lysis (Additional file 1: Fig. S.6b). This outgroup con- (Additional file 1: Fig. S.3a). The distribution of correla- firmed enrichment for associated variants within genes tions was right-skewed towards one (X =0.24±0.09) of higher heritability versus those of lower heritability (p − (Additional file 1:Fig.S.3b).Comparingimputedex- < 2.22 16 Wilcoxon rank-sum) as well as intergenic vari- − pression components to actualgeneexpressionforfour ants (p < 2.22 16 Wilcoxon rank-sum). Despite potential representative weights (Additional file 1:Fig.S.5)con- edge cases, such as genes regulated from long distance firmed that weights were not biased by tissue type or LD, genes in MHC regions, and trans-regulatory effects, cohort, and our regression normalization and expres- this analysis suggests that our weights are enriched for sion scaling coupled with changes to the FUSION genes under a higher degree of cis-genetic modular con- trained weights specific to the continuous heritable ex- trol, although the rate at which linkage-disequilibrium pression difference across our training set (Additional affects SNP independence is unknown. The outgroup file 1: Fig. S.4[a-d]). bone density GWAS dataset confirms that this enrich- CMC DLPFC data was used as an independent valid- ment is for genetic variants controlling gene expression ation cohort to examine the extractability of the TWAS irrespective of AD phenotype, and tissue context, sup- weights to other datasets. CMC data was comprised of porting these weights as a general resource across neo- 515 individually matched genotype to expression profiles cortical regions for generalizable use across multiple with a set of 6756 expressed genes overlapping the phenotypes. trained weights, representing 99.6% of all trained weights. Kendall rank correlation values between im- puted expression components and observed expression Alzheimer’s disease TWAS were right-skewed towards one (X = 0.13 ± 0.11) (Add- Imputed gene expression components were associated itional file 1: Fig. S.5b). FDR correction for multiple with AD through implementation of the FUSION pipe- comparisons yielded 4874 (72.14%) genes with signifi- line. This analysis yielded 18 preliminarily significant as- cant, positive correlations between imputed and actual sociations across 8 regions after correction for multiple DLPFC expression (Additional file 1: Fig. S.5a). Genes comparisons (FDR < 0.05) (Fig. 2a, Table 1). JCP testing with a significant p value (FDR < 0.05) and positive cor- resulted in dropping 6 preliminary associations as a re- relation values all had correlation values greater than sult of marginal association (Fig. 2c, Fig. S7). FDR cor- 0.061 (Additional file 1: Fig. S.5c). rection of the JCP p value resulted in dropping an Given the threshold of expression heritability required additional 4 targets due to non-significance. Association to train weights for a given gene, it could be expected of all remaining genes was further supported by SMR that variants regulating gene expression would be (Additional file 2: Table S3). The remaining 8 genes enriched within 1 MB of genes with trained weights ver- comprising 6 distinct non-overlapping 1 MB genomic re- sus genes which did not meet the heritability threshold. gions and are significantly associated to AD after JCP We looked for enrichment of low p value SNPs within and SMR with FDR corrected p values are as follows: the 1 MB widow centered on genes with trained weights APOC1 (JCP = 2.22e−22, SMR = 3.41e−4), EED (JCP = to test this assumption. Variants from Kunkle et al. [6] 3.373e−5, SMR = 2.50e−4), CD2AP (JCP = 2.96e−5, were binned into three groups (Additional file 1: Fig. SMR = 2.66e−4), CEACAM19 (JCP = 3.27e−5, SMR = S.6a). The first were the autosomal variants within the 1 1.00e−2), CLPTM1 (JCP = 4.04e−3, SMR = 2.58e−3), MB window of genes which had trained weights; the sec- MTCH2 (JCP = 0.011, SMR = 3.32e−6), TREM2 (JCP = ond consisted of autosomal variants within the 1 MB 0.021, SMR = 2.64e−3), and KNOP1(JCP = 0.039, SMR window of genes which did not meet the 1% heritability = 2.50e−4). All gene associations except MTCH2 and threshold to have weights trained for them. The final KNOP1 replicated and survived JCP testing when sum- group of variants was termed intergenic and consisted of mary statistics from Jansen et al. were used instead of all variants outside of a 1 MB window centered on any Kunkle et al. summary statistics. Gockley et al. Genome Medicine (2021) 13:76 Page 8 of 15

Table 1 Heritability (h2), best performing model, before and after JCP Z values and p values for all initially significant AD-associated genes. Blue denotes those only marginally significant after JCP, while green represents independently significant genes

Cell type specificity and pathway enrichment astrocyte and microglial markers respectively. The Expanded gene sets of genes co-expressed to TWAS CD2AP gene set was enriched within endothelial, peri- nominated AD-associated genes were built empirically cytes, and oligodendrocyte expression signatures. for each region’s gene set based on membership decay KNOP1, CEACAM19, MTCH2, and EED co-expression given increasing standard deviation cutoff (Fig. 3a, see gene set was not enriched within any of the cell-type the “Gene set expansion and cell type analysis” section). specific expression sets, but showed sporadic enrich- Distinct cell type enrichments were seen comparing gene ments in neuronal cell types (Fig. 3c, d). CLPTM1 was sets to the cell type-specific marker lists observed in enriched across neuronal populations, oligodendrocytes, Lake et al. [30] (Fig. 3c) and Mathys et al. [31] (Fig. 3d). and astrocytes (Fig. 3c, d). Notably, the largest two APOC1 and TREM2 coexpression sets were enriched in single-cell marker gene sets, derived from excitatory Gockley et al. Genome Medicine (2021) 13:76 Page 9 of 15

neuronal populations, Ex6 and Ex11, were more prone practice in TWAS and eQTL studies from training set to enrichment. This could have been due to their con- expression data, our trained weights represent a valuable taining significantly more genes than the other gene sets, resource capable of giving insight into the mechanisms which contain 850 (Ex6) and 747 (Ex11). of neocortical phenotypes [15, 16]. Cell process-enriched gene sets were built with a mul- We leveraged these weights to perform a TWAS be- tiscale approach combining pairwise inclusion statistics tween Alzheimer’s cases and controls, revealing 8 candi- with protein-protein interactions. Both a wide inclusion date genes across 6 distinct regions which passed cutoff and a more stringent inclusion cutoff produced a multiple filters for significance after correction for JCP permissive and conservative gene set for each candidate and SMR replication. We used the Jansen et al. AD gene (see the “Gene set expansion and cell type analysis” GWAS study [7] to replicate our findings, confirming six section). APOC1 was enriched for multiple immune re- of the eight genes with this data set. The two genes that sponse signaling pathways, phagocytosis, and immune failed to replicate after correction for multiple compari- activation consistently across both enriched gene sets sons were MTCH2 and KNOP1, which were not identi- (Fig. 3b). CD2AP was enriched for cellular responses to fied in Jansen et al. [7], indicating that our methodology lipids, protein localization, and responses to multiple is consistent with the input GWAS statistics. Import- molecular compounds (Additional file 1: Fig. S.8). EED, antly, as imputed expression is dependent on genotype, CLPTM1, and CEACAM19 were consistently enriched gene expression is associated with AD directly through for RNA, mRNA processing, RNA splicing, and RNA underlying regulatory cis-genetic risk factors. These translation processes. In addition to high amounts of methods can have difficulty in training expression transcription relevant overlap between the three candi- weights for relevant genes which have a high variance or date genes, distinction of enrichments of viral gene ex- are regulated under trans-regulation such as miRNA pression response (EED), protein catabolic processes for mediated transcript decay. Likewise, even though we de- (CLPTM1), and myeloid/megakaryocyte differentiation tect a strong signal for TREM2, a known risk factor pre- (CEACAM19) were observed (Additional file 1: Fig. S.9, dominantly expressed in microglia, a low representation S13-14). MTCH2 was enriched for purine, deoxynucleo- cell type, there remains a possibility that our signal is tide, and ribo-deoxynucleotide metabolism (Additional biased towards cell types of greater representation. file 1: Fig. S.10). KNOP1 was the smallest gene set within While bulk RNA sequencing is potentially confounded both the permissive and conservative cutoff groups. by cell proportion, our signal supports previous work Nominal enrichments for mitotic cell phase transition and implicating microglial contribution to AD pathology and Wnt signaling point to a potential a role in cell-cycle pro- disease mechanism. As new single-cell genomics tech- gression (Additional file 1: Fig. S.11). TREM2-expanded nology increases the available datasets on human post- gene sets showed enrichment for immune response activa- mortem cell lineages in future work will be needed to tion, T cell and leukocyte activation, and cell motility and focus on genotype to expression linkage methods such phagocytosis (Additional file 1:Fig.S.12). as TWAS in this context. Larger sample sizes and a wider array of neocortical tissue types may help mitigate Discussion these difficulties; however, the vital nature of these bios- We trained predictive models to impute gene-expression pecimens makes it understandably difficult to address components attributable to cis-variation across multiple completely. Co-regulation within the cis-genetic window neocortical tissues using a well-powered training set is a possible confounder in any TWAS analysis, as a compared to the field of TWAS studies [35, 36]. This is more stably expressed co-regulated gene could possibly the first pan-cortical analysis and is broadly abstractable produce a more robust association than the true causa- throughout the neocortex, providing a valuable resource tive disease-linked gene [37]. This effect, distributed to investigate a multitude neurological conditions and across a region, can coordinately drive AD disease mech- disorders. While it is commonly accepted practice in anism through multiple genes within the locus. With the TWAS studies to combine tissues to enhance the power exception of APOE being highly coexpressed with of predictive weight models, specifically including a APOC1, no other gene within a window is appreciably range of neocortical structures relevant to AD, we coexpressed with the TWAS candidate gene, granting sought to specifically identify drivers of AD capable of confidence in our 8 candidate genes. The inclusion of a working across these diverse regions while also maximiz- diverse set of brain regions into the training set may dis- ing the sample size of valuable neuronal tissues derived rupt co-regulation based on tissue-specific expression RNA-Seq samples. Beyond AD, there are a number of and differential disease impact across brain regions neuropsychiatric conditions schizophrenia, depression, could introduce variance into the model. This is a par- and ASD to name a few, which affect the neocortex as a ticular strength of our study. However, as AD pathology whole. As AD disease status was regressed, a common affects all of our included regions, fundamental driving Gockley et al. Genome Medicine (2021) 13:76 Page 10 of 15

risk genes could be expected to be identified across our Fig. S.8). Other potential biological roles can be seen in neocortical tissue set. As more genetic, molecular, and mice, where CD2AP is implicated in blood brain barrier biological process factors are associated with AD, function [43], and in Drosophila where the CD2AP methods such as TWAS represent a vital way to connect ortholog Cindr is implicated in synaptic plasticity and lines between these tiers of evidence, building a clearer Tau linked neurodegeneration [44, 45]. CD2AP’s role in picture of AD mechanism throughout the brain. While AD biology has been predominantly examined in neu- it is important to consider the whole locus in the con- rons, while our cell type enrichments point to the pri- text of our TWAS associations, this evidence supports mary involvement of endothelial, oligodendrocyte, our associated genes. pericytes, and astrocytes. Future studies exploring the function of CD2AP in non-neuronal cells may prove CD2AP, Chr6 useful in developing a broader perspective of CD2AP JCP analysis supports CD2AP as the most likely linked function in AD pathogenesis [6, 39]. gene within this locus, an observation that is consistent with broader biological investigations implicating upreg- EED, Chr11 ulation of CD2AP in AD. Colocalization analysis re- EED was identified by Kunkle et al., but not by three sulted in a 99% probability that GWAS association other major AD GWAS from the last few years [6, 7, 38, signal and functional TWAS signal for CD2AD originate 39]. One possible explanation is that the EED locus con- from shared causal variants (Additional file 2: Table S6). tains PICALM, a known AD risk factor, and could lead All four of the largest GWAS studies performed looking EED to be overlooked by other types of studies. We do at AD genetic associations have found variants that not believe that PICALM explains the TWAS risk identi- point to CD2AP [6, 7, 38, 39]. The biological role of fied here for a number of reasons. Colocalization ana- CD2AP involves dendritic targeting of APP to the intra- lysis yielded a high probability (H4 p = 0.69) that the luminal vesicles (ILV), which functionsasthepost-synaptic GWAS signal and EED functional association arise from lysosomal complex, for degradation [40]. Targeting APP to shared causal variants. While there is a non-zero prob- the ILV leads to proteolytic clearance, decreases the shared ability (H2 p = 0.18), our signal arises only due to time spent with BACE within the endosomal complex, and GWAS association, which may be a function of EED risk results in decreased amyloid secretion. CD2AP knockdown variants being in linkage with PICALM risk variants impairs targeted APP degradation, allowing APP and BACE (Additional file 2: Table S6). PICALM is positively asso- to co-exist within the early endosomal compartment and ciated with AD as it is involved in clathrin-mediated increasing amyloid production [40]. Accordingly, CD2AP endocytosis of APP and subsequent generation of amyl- over-expression drives APP localization from Rab5+ early oid [46, 47]; however, EED’s TWAS association Z value endosomes to Rab7+ late endosomes, leading to lysosomal is − 5.85 (Table 1). This infers that overexpression of the degradation and decreased amyloid secretion [41]. Auto- loci’s regulated gene target is protective against AD, and somal dominant AD mutations, associated with EOAD, re- this means the valence of the effect runs opposite to sulted in enlargement of the early endosomal compartment PICALM’s. Given the colocalization results, it is possible and elevated levels of the BACE-cleaved APP carboxy- that different AD risk variants within this region affect terminal fragment (CTFbeta) in cortical neurons derived both EED and PICALM transcription and are affected by from IPSCs [42]. This work supports the emerging view- partial linkage. Alternatively, EED is a component of the point that perturbations in endocytosis play a fundamental polycomb repressive complex 2 (PRC2) that functions as role in AD biology. a histone methylase depositing the repressive mark Towards its role in AD, CD2AP operates in concert H3K27 [46–48]. Targeting EED and the PRC2 complex with BIN1 as a functional regulation mechanism. While plays a role in synaptic plasticity, as genetic ablation of CD2AP promotes trafficking towards ILV for degrad- EED impacts long-term potentiation, a surrogate measure ation in the dendrites, BIN1 targets BACE from the early to hippocampal memory function [49]. Additionally, EED endosome back to the cell surface within axons, prevent- promotes neurogenesis within the hippocampus, potentially ing the colocalization of APP and BACE in endosomal making the brain more resistant to age-related neurodegen- compartments and decreasing the levels of secreted erative changes [50, 51]. Interestingly, the EED-expanded amyloid. While it may be too narrow a perspective to gene set was enriched for many processes involving transla- look only at amyloid biogenesis for linkage with disease tion and RNA splicing, two biological domains that would mechanism, it provides one plausible framework for be impacted by heterochromatin regulation (Additional consideration. Expanded gene set enrichment of CD2AP file 1: Fig. S.9). This association remains interesting given identifies a range of processes implicated in both the the potential roles of both EED and PICALM in AD biol- regulation of tissue development as well as responses to ogy, and further study is needed to fully understand the lipid and organic cyclic compounds (Additional file 1: roles of each gene’s contribution to the disease risk. Gockley et al. Genome Medicine (2021) 13:76 Page 11 of 15

MTCH2, Chr11 upon AD pathogenesis are currently not fully eluci- MTCH2, or mitochondrial carrier 2, is a SLC25 family dated; additional studies will be necessary to fully member of transporters, which localizes to the inner understand the relevant biology. mitochondrial membrane. MTCH2 has been identified as an AD risk factor [6, 38]. Previous studies implicated KNOP1, Chr16 risk variants regulating SPI1 [52], a well-known micro- KNOP1 is a rich nucleolar protein lacking direct glial shown to interact with known publications or much knowledge of its biological func- AD risk genes including TREM2, and CELF1 expression, tion; downregulation is associated with AD risk (Table 1). with fine mapping potentially implicating CELF1 [53] The KNOP1 gene resides within the IQCK locus, and from the MTCH2 locus in AD. Knockdown of SPI1 in Kunkel et al. [6] found linkage between KNOP1 and AD; microglia have implicated its role in regulating of micro- however, none of the other recent GWAS studies found glial genes involved in phagocytotic activity driving AD KNOP1 associated with AD. IQCK was a novel genome- as a potential target for treatment. MTCH2 is negatively wide locus from the Kunkle et al. [6] genetic meta- associated with AD (Z value = − 5.73 Table 1); the direc- analysis. What data exists for KNOP1 suggests that it as- tion of this effect indicates that reduced expression of sociates condensed chromatin during mitosis, which is MTCH2 increases AD pathology. This reinforces that partially supported by AP2M1 a KNOP1 yeast two- MTCH2 is the associated gene over SPI1, as knockdown hybrid binding partner identified by the Human Refer- of SPI1 reduced AD pathology. Colocalization analysis of ence Interactome (HuRI) [6, 62], and binds with a large this target feature showed a high probability (H4 p = number of H2B associated [62, 63]. This pre- 0.993) that MTCH2 functional association and GWAS liminary evidence aligns with the KNOP1 expanded gene signal arose from shared causal variants. This adds confi- set enrichment which shows a strong signal for mitotic dence GWAS signal from previous AD studies are acting cell phase transition and regulation of DNA-binding through MTCH2. The biological role of MTCH2 in the transcription factor activity (Additional file 1: Fig. S.11). brain is unclear. MTCH2 is known to contribute to adi- Intriguingly, the entire locus, similar to variants in the pocyte function and regulation of lipid metabolism [52, MTCH2 locus, is implicated in obesity [64, 65]. Colocali- 54] and to be genetically associated with obesity [55]. zation analysis yielded a high probability that GWAS However, MTCH2 clearly has a role outside of adipo- and functional associations were driven by shared gen- cytes, as inhibition of MTCH2 increases products of me- etic risk factors (H4 p = 0.809) (Additional file 2: Table tabolism, such as pyruvate and pyruvate dehydrogenase S6). We hope that the finding that KNOP1 is associated [56] in both the brain and muscle. Our gene set enrich- with the AD risk will inspire future studies into its spe- ment analysis further supports MTCH2 involvement cific function. across a wide array of metabolic processes (Additional file 1: Fig. S.10). TREM2, Chr6 MTCH2 knockout mice exhibited deficits in both Identification of TREM2 in this study is consistent with metabolic processes and hippocampal dependent previous work and knowledge in AD, as TREM2 is one spatial learning tasks [54, 57, 58]. There are known of the most widely studied genes in Alzheimer’s disease, links between nutrition, specifically cholesterol con- with links to both amyloid and tau pathology. TREM2 is sumption levels, in AD [59], relevant to health risks expressed almost exclusively in the microglia, and it is a of cardiovascular function, another well-known risk sentinel gene linking neuroinflammation to AD [66]. factor for AD. Concordantly, there are associations TREM2 null mutant mice crossed onto APP-PS1 AD observed between obesity [60]andAD,suggesting transgenics exhibit deficits in microglial recruitment to that MTCH2 variants associated with AD and obesity amyloid plaques and increased spread of pathological may be acting, at least in part, through a common tau [67]. Coding variants in TREM2 are estimated to mechanism [61]. This does not account for the spatial confer a 2–4-fold increase in AD risk, higher than any learning deficits, suggesting that this is not the whole gene other than APOE [68, 69]. Interestingly, APOE answer. Yet, consistent with a non-neuronal locus of binds to TREM2, leading to activation and recruitment effect, we did observe that the MTCH2-expanded of the microglial cells, which elicits both phagocytic and gene set is enriched for microglial and oligodendro- proinflammatory responses [66]. While microglial cells cyte cell type markers (Fig. 3d). MTCH2 knockout seem to be represented in a greater proportion in AD mice exhibit elevated levels of microglia and dimin- brains, scRNA-Seq studies have found microglial popula- ished synaptic density in the basal forebrain, both of tions to be a small fraction of total cell types regardless which could be explained by perturbations in micro- of disease status [31].TREM2 also binds to amyloid dir- glial and oligodendrocyte function [58]. The mecha- ectly, with nanomolar affinity, and activates microglial nisms through which MTCH2 exerts its influence clearance of amyloid deposition [67, 70]. Consistent with Gockley et al. Genome Medicine (2021) 13:76 Page 12 of 15

TREM2 function, our expanded TREM2 gene set expres- enhanced AD risk and making the disentanglement of sion was enriched for immune myeloid cellular lineages each gene’s unique AD risk contribution that much and cell process enrichments of immune activation and more difficult. leukocyte migration (Fig. 3c, d, Fig. S12). Interestingly, While APOC1, CEACAM19, and CLPTM1 are associ- our TREM2 Z score is − 4.52 (Table 1), which appears ated with AD, both in this study and previous studies to contradict previous work. However, TWAS analysis [72–74], a causal association to AD remains unclear. associates only the genetic component of TREM2 ex- However, there is a small literature pointing to a role for pression, inferring that TREM2 genetic risk could func- CLPTM1 in the regulation of GABA receptor trafficking tion through disruption and dysregulation of TREM2 from the ER to the plasma membrane, suggesting that endogenous function. Furthermore colocalization ana- CLPTM1 could regulate inhibitory neurotransmission lysis of this target yields a high probability (H1 p = [75, 76]. Unlike APOC1 and CEACAM19, colocalization 0.924) that this signal is driven from cis-regulatory vari- yields high probabilities that this association and GWAS ants affecting Trem2 expression but not contributing to signal arise from shared causal variants (H4 p = 0.359) the GWAS signal. This is consistent with the rare dis- as well as a high probability that this signal arises only ruptive TREM2 coding variants and recessive loss of from GWAS signal (H2 p = 0.574). While this speaks to function associations with AD versus common expres- the high LD within this window and the dominating sig- sion modulating variants with lower effect sizes. nal from APOE, there remains evidence for a distinct regulatory schema for CLPTM1 (Additional file 2: Table S6). The regulation of GABA currents could be a synap- APOC1 - CEACAM19 - CLPTM1, Chr19 tic scaling factor, adjusting the responsiveness of synap- APOC1, CLPTM1, and CEACAM19 were identified tic firing. Ge et al. [75] found that increasing CLPTM1 within this study, and all three genes reside within levels decreased miniature inhibitory postsynaptic poten- 600 Kb, less than a 1 Mb distance threshold, of each tials, while reciprocally decreasing CLPTM1 levels ele- other. Due to this proximity, there appears to be vated GABA currents in the post-synaptic neuron, complicated co-regulation in this region with upregu- strongly suggesting that CLPTM1 negatively regulates lation of APOC1 and CEACAM19 associated with AD GABAergic signaling. A recent literature meta-analysis but downregulation of CLPTM1 associated with AD. looking at neurotransmitter synaptic dysregulation in Interestingly, all three of these genes were also identi- AD found decreased levels GABA in AD patients, sup- fied in recent GWAS studies [7, 38]. This locus is porting the potential dysregulation of GABAergic signal- particularly infamous for harboring the APOE genes, ing in AD [77]. the largest known LOAD risk allele. APOE was the most coexpressed gene to APOC1, and this co- Conclusions expressed pair had the highest average coexpression We have presented here a TWAS analysis of Alzheimer’s ofanygenewithinacis-locus to an associated gene, using weights trained from RNA-Seq expression values strongly supporting the co-regulation of APOC1 and derived from 6 distinct cortical regions to associate genetic APOE. Colocalization analysis for APOC1 and CEAC risk to expression differences in 404 cases and 231 con- AM19 exhibit a high probability (H4 of p = 0.996 trols. This methodology has shown its power in resolving and p = 1, respectively) that the GWAS signal from additional mechanistic insights in the impact of risk vari- this region and the functional gene association are ants on transcripts responsible for AD pathology. We pro- driven from different causal variants. This is likely a vide a resource of trained expression weights for 6818 result of LD within this region (Additional file 2: genes which is broadly abstractable across the neocortex Table S6). Little is known about the specific role of and when used in combination with summary GWAS sta- APOC1 in AD; however, as it is also a lipid carrier tistics can perform powerful associations across a broad transport protein that, like APOE, is known to recruit range of neocortical phenotypes. the innate immune system, it may also have a role in regulation of microglial activation. More specifically, Abbreviations there are isolated studies which demonstrate linkage AD: Alzheimer’s disease; CEU: Utah residents (CEPH) with Northern and between APOC1 and declining cognition and memory Western European ancestry; FDR: False discovery rate; LOAD: Late-onset in specific ethnic groups [71, 72]. Consistent with the Alzheimer’s disease; EOAD: Early-onset Alzheimer’s disease; GWAS: Genome- wide association studies; TWAS: Transcriptome-wide association study; DLPF APOC1 role in lipid transport and immune activation, C: Dorsolateral prefrontal cortex; TCX: Temporal cortex; PFC: Prefrontal cortex; our expanded cell process enrichments specifically STG: Superior temporal cortex; IFG: Inferior temporal gyrus; identified phagocytosis and activation of immune re- PHG: Parahippocampal gyrus; AMP-AD: Accelerating Medicines Partnership - Alzheimer’s Disease; CMC: CommonMind Consortium; JCP: Joint probability sponse (Fig. 3b), supporting a shared role between correlation; SMR: Summary Mendelian randomization; GPR: G-protein APOE and APOC1 as the biological contributor to coupled receptor; ILV: Intraluminal vesicles Gockley et al. Genome Medicine (2021) 13:76 Page 13 of 15

Supplementary Information 5Department of Human Genetics, Emory University School of Medicine, The online version contains supplementary material available at https://doi. Atlanta, GA 30322, USA. 6Cajal Neuroscience, 1616 Eastlake Avenue East, org/10.1186/s13073-021-00890-2. Suite 208, Seattle, WA 98102, USA.

Received: 30 June 2020 Accepted: 17 April 2021 Additional file 1. This file contains the supplementary figures (S1-S14) and the Supplementary Methods Sections 1-4. Additional file 2. This file contains the supplementary tables (S1-S6). References 1. McKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack CR Jr, Kawas CH, Acknowledgements et al. The diagnosis of dementia due to Alzheimer’s disease: First and foremost, the authors would like to acknowledge the patients and recommendations from the National Institute on Aging-Alzheimer’s their families who provided the invaluable tissue samples to support this Association workgroups on diagnostic guidelines for Alzheimer’s disease. work. Rush Alzheimer’s Disease Center, Rush University Medical Center, Alzheimers Dement. 2011;7(3):263–9. https://doi.org/10.1016/j.jalz.2011.03. Chicago, provided all ROSMAP data collected, which was funded by NIA 005. grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, 2. Association A, Alzheimer’s Association. 2019 Alzheimer’s disease facts and U01AG32984, U01AG46152, the Illinois Department of Public Health, and the figures. Alzheimer’s Dement. 2019;15:321–87. https://doi.org/10.1016/j.jalz.2 Translational Genomics Research Institute. Mayo Study data were provided 019.01.010. by the following sources: The Mayo Clinic Alzheimer’s Disease Genetic 3. Mendez MF. Early-onset alzheimer disease. Neurol Clin. 2017;35(2):263–81. Studies, led by Dr. Nilufer Ertekin Taner and Dr. Steven G. Younkin, Mayo https://doi.org/10.1016/j.ncl.2017.01.005. Clinic, Jacksonville, FL, using samples from the Mayo Clinic Study of Aging, 4. Chartier-Harlin M-C, Crawford F, Houlden H, Warren A, Hughes D, Fidani L, the Mayo Clinic Alzheimer’s Disease Research Center, and the Mayo Clinic et al. Early-onset Alzheimer’s disease caused by mutations at codon 717 of Brain Bank. Mayo Data collection was funded through NIA grants P50 the β-amyloid precursor protein gene. Nature. 1991;353:844–6. https://doi. AG016574, R01 AG032990, U01 58AG046139, R01 AG018023, U01 AG006576, org/10.1038/353844a0. U01 AG006786, R01 AG025711, R01 AG017216, R01AG003949, NINDS grant 5. Rogaev EI, Sherrington R, Rogaeva EA, Levesque G, Ikeda M, Liang Y, et al. R01 NS080820, CurePSP Foundation, and support from Mayo Foundation. Familial Alzheimer’s disease in kindreds with missense mutations in a gene on chromosome 1 related to the Alzheimer’s disease type 3 gene. Nature. Authors’ contributions 1995;376:775–8. https://doi.org/10.1038/376775a0. JG, BAL, SKS, APW, and TSW designed the study. JG, KSM, and WLP 6. Kunkle BW, Grenier-Boley B, Sims R, Bis JC, Damotte V, Naj AC, et al. Genetic generated the data. JG performed the experiments and analyzed the data. meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and JG preformed the statistical analyses and interpreted the data with input implicates Aβ, tau, immunity and lipid processing. Nat Genet. 2019;51(3): from BAL, SKS, TSW, and APW. JG and JCW wrote the manuscript with input 414–30. https://doi.org/10.1038/s41588-019-0358-2. from all co-authors. JG and JCW revised the paper with input from all co- 7. Jansen IE, Savage JE, Watanabe K, Bryois J, Williams DM, Steinberg S, et al. authors. All authors read and approved the final manuscript. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat Genet. 2019;51(3):404–13. https:// Funding doi.org/10.1038/s41588-018-0311-9. This research was supported by the National Institute on Aging under the 8. Wang M, Beckmann ND, Roussos P, Wang E, Zhou X, Wang Q, et al. The AMP-AD Data Coordination Center. Grant number: 5U24AG061340. Mount Sinai cohort of large-scale genomic, transcriptomic and proteomic data in Alzheimer’s disease. Sci Data. 2018;5. https://doi.org/10.1038/sdata.2 Availability of data and materials 018.185. Weights and analysis files are available via the AD Knowledge Portal (https:// 9. Jager PLD, De Jager PL, Ma Y, McCabe C, Xu J, Vardarajan BN, et al. A multi- adknowledgeportal.synapse.org). The AD Knowledge Portal is a platform for omic atlas of the human frontal cortex for aging and Alzheimer’s disease accessing data, analyses, and tools generated by the Accelerating Medicines research. Sci Data. 2018;5. https://doi.org/10.1038/sdata.2018.142. Partnership (AMP-AD) Target Discovery Program and other National Institute 10. Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BWJH, et al. Integrative on Aging (NIA)-supported programs to enable open-science practices and approaches for large-scale transcriptome-wide association studies. Nat accelerate translational learning. The data, analyses, and tools are shared Genet. 2016;48(3):245–52. https://doi.org/10.1038/ng.3506. early in the research cycle without a publication embargo on secondary use. 11. Mancuso N, Gayther S, Gusev A, Zheng W, Penney KL, Kote-Jarai Z, et al. Data is available for general research use according to the following requirements Large-scale transcriptome-wide association study identifies new prostate for data access and data attribution (https://adknowledgeportal.synapse.org/ cancer risk regions. Nat Commun. 2018;9(1):4079. https://doi.org/10.1038/ DataAccess/Instructions). s41467-018-06302-1. For information about data and data sources, see (https:// 12. Consortium T 1000 GP, The 1000 Genomes Project Consortium. A global adknowledgeportal.synapse.org/Explore/Studies/DetailsPage?Study=syn22313 reference for human genetic variation. Nature. 2015;526:68–74. https://doi. 785) org/10.1038/nature15393. 13. Zhao H, Xu J, Wang Y, Jiang R, Li X, Zhang L, et al. Knockdown of CEACAM19 Declarations suppresses human gastric cancer through inhibition of PI3K/Akt and NF-κB. Surg Oncol. 2018;27:495–502. https://doi.org/10.1016/j.suronc.2018.05.003. Ethics approval and consent to participate 14. Klei L, Kent BP, Melhem N, Devlin B, Roeder K. GemTools: a fast and efficient All data analyzed during the course of this study have been previously approach to estimating genetic ancestry. 2011. Available from: http://arxiv. published. All research performed during the course of this study conformed org/abs/1104.1162. [cited 2020 Jun 15] to the principles of the Helsinki Declaration. 15. Fromer M, Roussos P, Sieberts SK, Johnson JS, Kavanagh DH, Perumal TM, et al. Gene expression elucidates functional impact of polygenic risk for Consent for publication schizophrenia. Nat Neurosci. 2016;19(11):1442–53. https://doi.org/10.1038/ Not applicable. nn.4399. 16. Ng B, White CC, Klein H-U, Sieberts SK, McCabe C, Patrick E, et al. An Competing interests xQTL map integrates the genetic architecture of the human brain’s The authors have no competing interests. transcriptome and epigenome. Nat Neurosci. 2017;20:1418–26. https:// doi.org/10.1038/nn.4632. Author details 17. Bionetworks S. Synapse | Sage Bionetworks. Available from: https://www. 1Sage Bionetworks, Seattle, WA, USA. 2Department of Neurology, Emory synapse.org/#!Synapse:syn22163073. [cited 2020 Jun 25] University School of Medicine, Atlanta, GA 30322, USA. 3Division of Mental 18. Wang D, Liu S, Warrell J, Won H, Shi X, Navarro FCP, et al. Comprehensive Health, Atlanta VA Medical Center, Decatur, GA, USA. 4Department of functional genomic resource and integrative model for the human brain. Psychiatry, Emory University School of Medicine, Atlanta, GA, USA. Science. 2018 362(6420). doi: https://doi.org/10.1126/science.aat8464 Gockley et al. Genome Medicine (2021) 13:76 Page 14 of 15

19. Pickrell JK. Joint analysis of functional genomic data and genome-wide 39. Lambert JC, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, Bellenguez C, association studies of 18 human traits. Am J Hum Genet. 2014;94(4):559–73. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci https://doi.org/10.1016/j.ajhg.2014.03.004. for Alzheimer’s disease. Nat Genet. 2013;45(12):1452–8. https://doi.org/10.1 20. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of 038/ng.2802. summary data from GWAS and eQTL studies predicts complex trait gene 40. Ubelmann F, Burrinha T, Salavessa L, Gomes R, Ferreira C, Moreno N, et al. targets. Nat Genet. 2016;48(5):481–7. https://doi.org/10.1038/ng.3538. Bin1 and CD2AP polarise the endocytic generation of beta-amyloid. EMBO 21. Liao C, Laporte AD, Spiegelman D, Akçimen F, Joober R, Dion PA, et al. Rep. 2017;18(1):102–22. https://doi.org/10.15252/embr.201642738. Transcriptome-wide association study of attention deficit hyperactivity 41. Furusawa K, Takasugi T, Chiu Y-W, Hori Y, Tomita T, Fukuda M, et al. CD2- disorder identifies associated genes and phenotypes. Nat Commun. 2019; associated protein (CD2AP) overexpression accelerates amyloid precursor 10(1):1–7. protein (APP) transfer from early endosomes to the lysosomal degradation 22. Gusev A, Mancuso N, Won H, Kousi M, Finucane HK, Reshef Y, et al. pathway. J Biol Chem. 2019;294(28):10886–99. https://doi.org/10.1074/jbc. Transcriptome-wide association study of schizophrenia and chromatin RA118.005385. activity yields mechanistic disease insights. Nat Genet. 2018;50(4):538–48. 42. Kwart D, Gregg A, Scheckel C, Murphy EA, Paquet D, Duffield M, et al. A https://doi.org/10.1038/s41588-018-0092-1. large panel of isogenic APP and PSEN1 mutant human iPSC neurons reveals 23. Yang J, Ferreira T, Morris AP, Medland SE, Genetic Investigation of shared endosomal abnormalities mediated by APP β-CTFs, not Aβ. Neuron. ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication 2019;104(5):1022. https://doi.org/10.1016/j.neuron.2019.11.010. And Meta-analysis (DIAGRAM) Consortium, et al. Conditional and joint 43. Cochran JN, Rush T, Buckingham SC, Roberson ED. The Alzheimer’s disease multiple-SNP analysis of GWAS summary statistics identifies additional risk factor CD2AP maintains blood-brain barrier integrity. Hum Mol Genet. variants influencing complex traits. Nat Genet. 2012;44(4):369–75 S1-3. 2015;24(23):6667–74. https://doi.org/10.1093/hmg/ddv371. 24. Plagnol V, Smyth DJ, Todd JA, Clayton DG. Statistical independence of the 44. Shulman JM, Imboywa S, Giagtzoglou N, Powers MP, Hu Y, Devenport D, colocalized association signals for type 1 diabetes and RPS26 gene et al. Functional screening in Drosophila identifies Alzheimer’s disease expression on chromosome 12q13. Biostatistics. 2009;10(2):327–34. https:// susceptibility genes and implicates Tau-mediated mechanisms. Hum Mol doi.org/10.1093/biostatistics/kxn039. Genet. 2014;23(4):870–7. https://doi.org/10.1093/hmg/ddt478. 25. Wallace C. Statistical testing of shared genetic control for potentially related 45. Ojelade SA, Lee TV, Giagtzoglou N, Yu L, Ugur B, Li Y, et al. cindr, the traits. Genet Epidemiol. 2013;37(8):802–13. https://doi.org/10.1002/gepi.21765. Drosophila homolog of the CD2AP Alzheimer’s disease risk gene, is required 26. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace for synaptic transmission and proteostasis. Cell Rep. 2019;28(7):1799–1813.e5. C, et al. Bayesian test for colocalisation between pairs of genetic association 46. Merthan L, Haller A, Thal DR, von Einem B, von Arnim CAF. The role of PTB studies using summary statistics. Plos Genet. 2014;10(5):e1004383. https:// domain containing adaptor proteins on PICALM-mediated APP endocytosis doi.org/10.1371/journal.pgen.1004383. and localization. Biochem J. 2019;476(14):2093–109. https://doi.org/10.1042/ 27. Styrkarsdottir U, Stefansson OA, Gunnarsdottir K, Thorleifsson G, Lund SH, BCJ20180840. Stefansdottir L, et al. Publisher Correction: GWAS of bone size yields twelve 47. Zeng F-F, Liu J, He H, Gao X-P, Liao M-Q, Yu X-X, et al. Association of PICA loci that also affect height, BMD, osteoarthritis or fractures. Nat Commun. LM gene polymorphisms with Alzheimer’s disease: evidence from an 2019;10(1):2358. https://doi.org/10.1038/s41467-019-10425-4. updated meta-analysis. Curr Alzheimer Res. 2019;16(13):1196–205. https:// 28. Logsdon BA, Gentles AJ, Miller CP, Blau CA, Becker PS, Lee S-I. Sparse doi.org/10.2174/1567205016666190805165607. expression bases in cancer reveal tumor drivers. Nucleic Acids Res. 2015; 48. Kouznetsova VL, Tchekanov A, Li X, Yan X, Tsigelny IF. Polycomb repressive 43(3):1332–44. https://doi.org/10.1093/nar/gku1290. 2 complex-molecular mechanisms of function. Protein Sci. 2019;28(8):1387– 29. Logsdon BA, Carty CL, Reiner AP, Dai JY, Kooperberg C. A novel variational 99. https://doi.org/10.1002/pro.3647. Bayes multiple locus Z-statistic for genome-wide association studies with 49. Kim SY, Levenson JM, Korsmeyer S, Sweatt JD, Schumacher A. Bayesian model averaging. Bioinformatics. 2012;28(13):1738–44. https://doi. Developmental regulation of Eed complex composition governs a switch in org/10.1093/bioinformatics/bts261. global histone modification in brain. J Biol Chem. 2007;282(13):9962–72. 30. Lake BB, Chen S, Sos BC, Fan J, Kaeser GE, Yung YC, et al. Integrative single- https://doi.org/10.1074/jbc.M608722200. cell analysis of transcriptional and epigenetic states in the human adult 50. Sun B, Chang E, Gerhartl A, Szele FG. Polycomb protein Eed is required for brain. Nat Biotechnol. 2018;36(1):70–80. https://doi.org/10.1038/nbt.4038. neurogenesis and cortical injury activation in the subventricular zone. Cereb 31. Mathys H, Davila-Velderrain J, Peng Z, Gao F, Mohammadi S, Young JZ, et al. Cortex. 2018;28(4):1369–82. https://doi.org/10.1093/cercor/bhx289. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature. 2019; 51. Liu P-P, Xu Y-J, Dai S-K, Du H-Z, Wang Y-Y, Li X-G, et al. Polycomb protein 570(7761):332–7. https://doi.org/10.1038/s41586-019-1195-2. EED regulates neuronal differentiation through targeting SOX11 in 32. Rouillard AD, Gundersen GW, Fernandez NF, Wang Z, Monteiro CD, hippocampal dentate gyrus. Stem Cell Rep. 2019;13(1):115–31. https://doi. McDermott MG, et al. The harmonizome: a collection of processed datasets org/10.1016/j.stemcr.2019.05.010. gathered to serve and mine knowledge about genes and proteins. 52. Huang K-L, Marcora E, Pimenova AA, Di NarzoAF,KapoorM,JinSC,etal.Acommon Database. 2016, 2016. https://doi.org/10.1093/database/baw100. haplotype lowers PU.1 expression in myeloid cells and delays onset of Alzheimer’s 33. Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, et al. Enrichr: disease. Nat Neurosci. 2017;20(8):1052–61. https://doi.org/10.1038/nn.4587. interactive and collaborative HTML5 gene list enrichment analysis tool. BMC 53. Karch CM, Ezerskiy LA, Bertelsen S, Alzheimer’s Disease Genetics Consortium Bioinformatics. 2013;14(1):128. https://doi.org/10.1186/1471-2105-14-128. (ADGC), Goate AM. Alzheimer’s disease risk polymorphisms regulate gene 34. Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: expression in the ZCWPW1 and the CELF1 loci. Plos one. 2016;11(2):e0148717. a comprehensive gene set enrichment analysis web server 2016 update. Nucleic 54. Rottiers V, Francisco A, Platov M, Zaltsman Y, Ruggiero A, Lee SS, et al. Acids Res. 2016;44(W1):W90–7. https://doi.org/10.1093/nar/gkw377. MTCH2 is a conserved regulator of lipid homeostasis. Obesity. 2017;25(3): 35. Guo X, Lin W, Wen W, Huyghe J, Bien S, Cai Q, et al. Identifying novel 616–25. https://doi.org/10.1002/oby.21751. susceptibility genes for colorectal cancer risk from a transcriptome-wide 55. Buzaglo-Azriel L, Kuperman Y, Tsoory M, Zaltsman Y, Shachnai L, Zaidman association study of 125,478 subjects. Gastroenterology. 2020. https://doi. SL, et al. Loss of muscle MTCH2 increases whole-body energy utilization org/10.1053/j.gastro.2020.08.062. and protects from diet-induced obesity. Cell Rep. 2016;14(7):1602–10. 36. Mancuso N, Shi H, Goddard P, Kichaev G, Gusev A, Pasaniuc B. Integrating https://doi.org/10.1016/j.celrep.2016.01.046. gene expression with summary association statistics to identify genes 56. Khan DH, Mullokandov M, Wu Y, Voisin V, Gronda MV, Hurren R, et al. associated with 30 complex traits. Am J Hum Genet. 2017;100(3):473–87. Mitochondrial carrier homolog 2 (MTCH2) is necessary for AML survival. https://doi.org/10.1016/j.ajhg.2017.01.031. Blood. 2020. https://doi.org/10.1182/blood.2019000106. 37. Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, 57. Ruggiero A, Aloni E, Korkotian E, Zaltsman Y, Oni-Biton E, Kuperman Y, et al. Golan D, et al. Opportunities and challenges for transcriptome-wide Loss of forebrain MTCH2 decreases mitochondria motility and calcium association studies. Nat Genet. 2019;51(4):592–9. https://doi.org/10.1038/s41 handling and impairs hippocampal-dependent cognitive functions. Sci Rep. 588-019-0385-z. 2017;7(1):44401. https://doi.org/10.1038/srep44401. 38. Marioni RE, Harris SE, Zhang Q, McRae AF, Hagenaars SP, Hill WD, et al. 58. Aloni E, Ruggiero A, Gross A, Segal M. Learning deficits in adult GWAS on family history of Alzheimer’s disease. Transl Psychiatry. 2018;8(1): mitochondria carrier homolog 2 forebrain knockout mouse. Neuroscience. 99. https://doi.org/10.1038/s41398-018-0150-6. 2018;394:156–63. https://doi.org/10.1016/j.neuroscience.2018.10.035. Gockley et al. Genome Medicine (2021) 13:76 Page 15 of 15

59. Sullivan PM. Influence of Western diet and APOE genotype on Alzheimer’s disease risk. Neurobiol Dis. 2020;138:104790. https://doi.org/10.1016/j.nbd.2 020.104790. 60. Anstey KJ, Cherbuin N, Budge M, Young J. Body mass index in midlife and late- life as a risk factor for dementia: a meta-analysis of prospective studies. Obes Rev. 2011;12(5):e426–37. https://doi.org/10.1111/j.1467-789X.2010.00825.x. 61. Broce IJ, Tan CH, Fan CC, Jansen I, Savage JE, Witoelar A, et al. Dissecting the genetic relationship between cardiovascular risk factors and Alzheimer’s disease. Acta Neuropathol. 2019;137(2):209–26. https://doi.org/10.1007/s004 01-018-1928-6. 62. Luck K, Kim D-K, Lambourne L, Spirohn K, Begg BE, Bian W, et al. A reference map of the human binary protein interactome. Nature. 2020; 580(7803):402–8. https://doi.org/10.1038/s41586-020-2188-x. 63. Larsson M, Brundell E, Jörgensen P-M, Ståhl S, Höög C. Characterization of a novel nucleolar protein that transiently associates with the condensed in mitotic cells. Eur J Cell Biol. 1999;78:382–90. https://doi. org/10.1016/s0171-9335(99)80080-6. 64. Seshadri S, Fitzpatrick AL, Ikram MA, DeStefano AL, Gudnason V, Boada M, et al. Genome-wide analysis of genetic loci associated with Alzheimer disease. JAMA. 2010;303(18):1832–40. https://doi.org/10.1001/jama.2010.574. 65. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G, Jackson AU, et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet. 2010;42(11):937–48. https://doi.org/10.1038/ng.686. 66. Shi Y, Holtzman DM. Interplay between innate immunity and Alzheimer disease: APOE and TREM2 in the spotlight. Nat Rev Immunol. 2018;18(12): 759–72. https://doi.org/10.1038/s41577-018-0051-1. 67. Leyns CEG, Gratuze M, Narasimhan S, Jain N, Koscal LJ, Jiang H, et al. TREM2 function impedes tau seeding in neuritic plaques. Nat Neurosci. 2019;22(8): 1217–22. https://doi.org/10.1038/s41593-019-0433-0. 68. Guerreiro R, Wojtas A, Bras J, Carrasquillo M, Rogaeva E, Majounie E, et al. TREM2 variants in Alzheimer’s disease. N Engl J Med. 2013;368(2):117–27. https://doi.org/10.1056/NEJMoa1211851. 69. Gratuze M, Leyns CEG, Holtzman DM. New insights into the role of TREM2 in Alzheimer’s disease. Mol Neurodegener. 2018;13(1):66. https://doi.org/1 0.1186/s13024-018-0298-9. 70. Zhao Y, Wu X, Li X, Jiang L-L, Gui X, Liu Y, et al. TREM2 is a receptor for β- amyloid that mediates microglial function. Neuron. 2018;97(5):1023–1031.e7. 71. Chuang W-L, Hsieh Y-C, Wang C-Y, Kuo H-C, Huang C-C. Association of apolipoproteins e4 and c1 with onset age and memory: a study of sporadic Alzheimer disease in Taiwan. J Geriatr Psychiatry Neurol. 2010;23(1):42–8. https://doi.org/10.1177/0891988709351804. 72. Zhou Q, Zhao F, Lv Z-P, Zheng C-G, Zheng W-D, Sun L, et al. Association between APOC1 polymorphism and Alzheimer’s disease: a case-control study and meta-analysis. PLoS One. 2014;9(1):e87017. https://doi.org/10.13 71/journal.pone.0087017. 73. Hao S, Wang R, Zhang Y, Zhan H. Prediction of Alzheimer’s disease- associated genes by integration of GWAS summary data and expression data. Front Genet. 2019;9. https://doi.org/10.3389/fgene.2018.00653. 74. Posthuma D, Jansen I, Savage J, Watanabe K, Bryois J, Williams DM, et al. 14 genetic meta-analysis identifies 9 novel loci and functional pathways for Alzheimer’s disease risk. Eur Neuropsychopharmacol. 2019;29:S1074. https:// doi.org/10.1016/j.euroneuro.2018.08.021. 75. Ge Y, Kang Y, Cassidy RM, Moon K-M, Lewis R, ROL W, et al. Clptm1 limits forward trafficking of GABAA receptors to scale inhibitory synaptic strength. Neuron. 2018;97:596–610.e8. https://doi.org/10.1016/j.neuron.2017.12.038. 76. Noam Y, Tomita S. On the path from proteomics to function: GABAAR trafficking takes a turn. Neuron. 2018;97:479–81. https://doi.org/10.1016/j. neuron.2018.01.038. 77. Manyevitch R, Protas M, Scarpiello S, Deliso M, Bass B, Nanajian A, et al. Evaluation of metabolic and synaptic dysfunction hypotheses of Alzheimer’s disease (AD): a meta-analysis of CSF markers. Curr Alzheimer Res. 2018;15. https://doi.org/10.2174/1567205014666170921122458.

Publisher’sNote Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.