Supplemental Texts

Text S1. Details of PrediXcan’s approach of estimating cis-eQTL effect sizes

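The PrediXcan TWAS method [1] employs the Elastic-Net penalized regression method [2] to estimate the cis-eQTL effect sizes $\mathbf{w}$ from Equation (1) in the main text. The Elastic-Net method assumes a combined LASSO ($L_1$) [3] and Ridge ($L_2$) [4] penalty and estimates $\mathbf{w}$ by the following equation

$$\hat{\mathbf{w}} = \underset{\mathbf{w}}{\arg\min}\left( \|\mathbf{E}_g - \mathbf{X}\mathbf{w}\|_2^2 + \lambda\left(\alpha\|\mathbf{w}\|_1 + \frac{1}{2}(1-\alpha)\|\mathbf{w}\|_2^2\right)\right),$$

where $\mathbf{E}_g$ and $\mathbf{X}$ are the gene expression vector and genotype matrix from Equation (1), $\|\cdot\|_1$ denotes the $L_1$ norm, and $\|\cdot\|_2$ denotes the $L_2$ norm. In particular, $\alpha$ is taken as 0.5 by the PrediXcan method [1], and the penalty parameter $\lambda$ can be tuned by 5-fold cross validation.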
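The Elastic-Net estimate described above can be illustrated with a minimal coordinate-descent sketch in plain Python. It assumes the objective is the residual sum of squares plus `lam * (alpha * L1 + 0.5 * (1 - alpha) * squared L2)`; the function names, the toy genotype matrix, and the fixed iteration count are hypothetical choices for illustration, and the sketch does not reproduce the software, data scaling, or 5-fold cross validation used by PrediXcan.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Cyclic coordinate descent for
    ||y - Xw||_2^2 + lam * (alpha * ||w||_1 + 0.5 * (1 - alpha) * ||w||_2^2).
    X is a list of rows (samples x SNPs); columns are assumed non-constant."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    resid = list(y)  # residual y - Xw; equals y because w starts at zero
    for _ in range(n_iter):
        for j in range(m):
            # x_j' r_j, with SNP j's current contribution added back to the residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * w[j]
            new_wj = soft_threshold(rho, lam * alpha / 2.0) / (
                col_ss[j] + lam * (1.0 - alpha) / 2.0)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy example (hypothetical data): two centered, orthogonal SNP columns,
# expression driven only by the first SNP with effect size 2.
X = [[-1, -1], [-1, 1], [0, 0], [0, 0], [1, -1], [1, 1]]
y = [2.0 * row[0] for row in X]
w_unpenalized = elastic_net(X, y, lam=0.0)  # no penalty: least-squares fit
w_shrunk = elastic_net(X, y, lam=4.0)       # alpha = 0.5 shrinks the first effect
```

In practice the penalty parameter would be chosen by cross validation over a grid of values rather than fixed as in this toy example.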

Text S2. Details of TIGAR’s approach of estimating cis-eQTL effect sizes

TIGAR provides a more flexible approach that nonparametrically estimates the cis-eQTL effect sizes $\mathbf{w}$ from Equation (1) in the main text by a Bayesian DPR method [5]. The DPR method assumes a normal prior distribution $N(0, \sigma_w^2)$ for cis-eQTL effect sizes and a Dirichlet process prior $D$ [6] for the effect-size variance $\sigma_w^2$ as follows:

$$w_i \sim N(0, \sigma_w^2), \quad \sigma_w^2 \sim D, \quad D \sim DP(IG(a, b), \xi).$$

That is, the prior distribution $D$ of the effect-size variance is drawn from a Dirichlet process (DP) with an inverse gamma (IG) base distribution and concentration parameter $\xi$. As proposed by previous studies, a variational Bayesian algorithm [7, 8] is implemented to efficiently obtain the posterior estimates $\hat{\mathbf{w}}$.
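To make the hierarchical prior more concrete, the following sketch draws effect sizes from a truncated stick-breaking representation of a Dirichlet process with an inverse gamma base distribution. It illustrates the prior only, not the variational Bayesian fitting used by DPR; the function name, the truncation level, and the parameter values in the example are assumptions made for illustration.

```python
import random


def sample_dp_effect_sizes(m, a, b, xi, truncation=50, seed=0):
    """Draw m effect sizes from w_i ~ N(0, s2_i), s2_i ~ D, D ~ DP(IG(a, b), xi),
    using a stick-breaking construction truncated at `truncation` atoms."""
    rng = random.Random(seed)
    # Stick-breaking weights: pi_k = v_k * prod_{l<k} (1 - v_l), v_k ~ Beta(1, xi)
    weights, remaining = [], 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, xi)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    # Atoms: variances from IG(shape a, scale b), i.e. 1 / Gamma(shape a, scale 1/b)
    atoms = [1.0 / rng.gammavariate(a, 1.0 / b) for _ in range(truncation)]
    # Each SNP picks one atom with probability proportional to its weight
    variances = rng.choices(atoms, weights=weights, k=m)
    effects = [rng.gauss(0.0, s2 ** 0.5) for s2 in variances]
    return weights, variances, effects


# Hypothetical example: 100 SNPs, IG(2, 1) base distribution, concentration 1
weights, variances, effects = sample_dp_effect_sizes(m=100, a=2.0, b=1.0, xi=1.0)
```

Because draws from a Dirichlet process are discrete, many SNPs share the same variance atom, which is what lets the prior behave like a flexible mixture of normal distributions for the effect sizes.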

Text S3. Details of VC-TWAS approach with summary-level GWAS data and P-value calculation

VC-TWAS with summary-level GWAS data

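Summary-level GWAS data are generally generated by meta-analysis based on the following single variant (SNP) regression model:

$$\mathbf{Y} = \mathbf{x}_{\cdot j}\beta_j + \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim N(0, \sigma_\epsilon^2\mathbf{I}).$$

Here, the phenotype $\mathbf{Y}$ is assumed to be adjusted for other confounding covariates with mean 0, and the genotype vectors ($\mathbf{x}_{\cdot j}$, $j = 1, \dots, m$) are also assumed to be centered with mean 0. Without loss of generality, we assume GWAS summary statistics include the single variant effect size estimate $\hat{\beta}_j$ and corresponding standard error $\hat{\sigma}_j$ for the $j$th SNP, sample size $n$, and a reference LD covariance matrix $\boldsymbol{\Sigma}$ of all test SNPs.

Following the derivation provided by [9], given the marginal SNP effect size estimate

$$\hat{\beta}_j = \frac{\mathbf{x}_{\cdot j}'\mathbf{Y}}{\mathbf{x}_{\cdot j}'\mathbf{x}_{\cdot j}},$$

the denominator $\mathbf{x}_{\cdot j}'\mathbf{x}_{\cdot j}$ can be approximated by using the $j$th diagonal element of the reference LD covariance matrix, $\boldsymbol{\Sigma} \approx \mathbf{X}'\mathbf{X}/(n-1)$, with $\mathbf{x}_{\cdot j}'\mathbf{x}_{\cdot j} = (n-1)\Sigma_{j,j}$. Thus, the numerator of the score statistic for the $j$th SNP as shown in the main text (Equation (6)) can be estimated by

$$\mathbf{x}_{\cdot j}'\mathbf{Y} = (n-1)\hat{\beta}_j\Sigma_{j,j}.$$

In addition, based on the estimate for the marginal SNP effect size variance, the phenotype variance $\sigma_Y^2$ can be estimated by

$$\hat{\sigma}_Y^2 = \frac{\mathbf{Y}'\mathbf{Y}}{n-1} \approx \frac{(\mathbf{x}_{\cdot j}'\mathbf{x}_{\cdot j})(n-1)\hat{\sigma}_j^2 + (\mathbf{x}_{\cdot j}'\mathbf{x}_{\cdot j})\hat{\beta}_j^2}{n-1} = \Sigma_{j,j}\hat{\sigma}_j^2(n-1) + \Sigma_{j,j}\hat{\beta}_j^2.$$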

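Since this estimate might vary with respect to the summary GWAS data of different SNPs, we take the median of $\Sigma_{j,j}\hat{\sigma}_j^2(n-1) + \Sigma_{j,j}\hat{\beta}_j^2$ across all the SNPs as $\hat{\sigma}_Y^2$, as suggested by the previous study [9]. Then the $Q$ statistic used by VC-TWAS using only GWAS summary-level data can be approximated by

$$Q = \sum_{j=1}^{m} \hat{w}_j^2\left(\frac{\mathbf{x}_{\cdot j}'\mathbf{Y}}{\hat{\sigma}_Y^2}\right)^2.$$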
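The summary-level computation above can be sketched in a few lines of Python. The sketch assumes that the score for SNP j is the recovered numerator `(n - 1) * beta_j * Sigma_jj` divided by the median-based phenotype variance estimate, and that the statistic sums the squared weighted scores; the function and argument names are hypothetical and the numbers in the example are made up.

```python
from statistics import median


def vc_twas_summary_q(beta_hat, se, ld_diag, n, weights):
    """Approximate the variance-component statistic from GWAS summary data.
    beta_hat, se: marginal effect estimates and standard errors per SNP
    ld_diag: diagonal of the reference LD covariance matrix (Sigma_jj)
    n: GWAS sample size; weights: cis-eQTL effect size estimates per SNP"""
    # Score numerators x_j'Y recovered as (n - 1) * beta_j * Sigma_jj
    score_num = [(n - 1) * b * d for b, d in zip(beta_hat, ld_diag)]
    # Per-SNP phenotype variance: Sigma_jj * se_j^2 * (n - 1) + Sigma_jj * beta_j^2
    var_y_per_snp = [d * s ** 2 * (n - 1) + d * b ** 2
                     for b, s, d in zip(beta_hat, se, ld_diag)]
    var_y = median(var_y_per_snp)  # median across SNPs, following [9]
    # Q = sum_j w_j^2 * (x_j'Y / var_y)^2
    q = sum((w * u / var_y) ** 2 for w, u in zip(weights, score_num))
    return score_num, var_y, q


# Hypothetical two-SNP example
score_num, var_y, q = vc_twas_summary_q(
    beta_hat=[0.1, -0.2], se=[0.1, 0.1], ld_diag=[1.0, 1.0], n=101,
    weights=[2.0, 0.0])
```

A SNP with zero weight contributes nothing to the statistic, so only SNPs with nonzero cis-eQTL weights need summary statistics and LD information.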

P-value calculation for VC-TWAS

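Under the null hypothesis, the $Q$ statistic used by VC-TWAS follows a mixture of chi-square distributions $\sum_{j=1}^{m} \lambda_j\chi_{1,j}^2$ [10, 11], where $(\lambda_1, \cdots, \lambda_m)$ are the nonzero eigenvalues of $\mathbf{W}\boldsymbol{\Phi}\mathbf{W}$,

$$\mathbf{W} = \mathrm{diag}(\hat{w}_1, \dots, \hat{w}_m), \quad \boldsymbol{\Phi} = \mathbf{X}'\mathbf{P}\mathbf{X}, \quad \mathbf{P} = \mathbf{V} - \mathbf{V}\mathbf{C}(\mathbf{C}'\mathbf{V}\mathbf{C})^{-1}\mathbf{C}'\mathbf{V},$$

where $\mathbf{X}$ is the $n \times m$ genotype matrix, $\mathbf{C}$ is the matrix of covariate data, $\mathbf{V} = \hat{\sigma}_Y^{-2}\mathbf{I}$ for continuous traits with identity matrix $\mathbf{I}$, and $\mathbf{V} = \mathrm{diag}[\hat{\mu}_1(1 - \hat{\mu}_1), \dots, \hat{\mu}_n(1 - \hat{\mu}_n)]$ for dichotomous traits, with $\hat{\mu}_i$ the estimated phenotype mean of subject $i$ under the null hypothesis.

If phenotype $\mathbf{Y}$ is centered and adjusted for other covariates as assumed when using summary-level GWAS data, then $\boldsymbol{\Phi}$ can be simplified and approximated by

$$\boldsymbol{\Phi} \approx \frac{(n-1)\boldsymbol{\Sigma}}{\hat{\sigma}_Y^2}$$

with reference LD covariance matrix $\boldsymbol{\Sigma}$ [12].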

The VC-TWAS p-value can then be conveniently obtained by approximation or exact methods such as the Davies exact method [13], using either individual-level or summary-level GWAS data.
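As a rough illustration of how a p-value follows from a mixture of chi-square distributions, the sketch below estimates the tail probability by Monte Carlo simulation for a given set of eigenvalues. This is a simple stand-in chosen for readability, not the Davies method; the function name, the number of draws, and the example eigenvalues are assumptions, and the achievable resolution (about one over the number of draws) is far too coarse for genome-wide significance thresholds.

```python
import random


def mixture_chisq_pvalue(q, eigenvalues, n_draws=20000, seed=1):
    """Monte Carlo estimate of P(sum_j lambda_j * chi2_1 >= q)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        # One draw of the weighted sum of independent 1-df chi-square variables
        draw = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
        if draw >= q:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive
    return (exceed + 1) / (n_draws + 1)


# Hypothetical example with three eigenvalues
p_value = mixture_chisq_pvalue(q=12.0, eigenvalues=[2.0, 1.0, 0.5])
```

With a single eigenvalue equal to 1 the mixture reduces to a chi-square distribution with 1 degree of freedom, which gives a quick sanity check of the simulation.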

Text S4. Details of ROS/MAP data

In our applications of studying Alzheimer’s dementia (AD), we used transcriptome and individual-level GWAS data generated for samples from the Religious Orders Study (ROS) and Rush Memory and Aging Project (MAP) [14-17] cohorts. ROS recruits nuns, priests, and brothers across the United States. MAP recruits participants living in private homes, subsidized housing, and retirement facilities across the greater Chicago metropolitan area. ROS/MAP data can be requested at www.radc.rush.edu. Both studies employ harmonized data collection methods performed by the same staff for annual testing during life and for structured autopsy and collection of genomic data from blood and brain biospecimens. Harmonized data collection facilitates joint analyses of the studies’ data.

Details of the studies are described elsewhere.

We used microarray genotype data generated for 2,093 European-descent subjects from ROS/MAP [14-17], which were further imputed to the 1000 Genomes Project Phase 3 [18]. Post-mortem brain samples (gray matter of the dorsolateral prefrontal cortex) from ~30% of these ROS/MAP participants with assayed genotype data were also profiled for transcriptomic data by next-generation RNA-sequencing [19], which were used as reference data to train GReX prediction models in our application studies.

Using ROS/MAP data, we conducted TWAS for clinical diagnosis of late-onset Alzheimer’s dementia (LOAD) as well as pathology indices of AD quantified with β-antibody and PHFtau specific immunostains. Quantitative pathology phenotypes β-amyloid load and PHFtau tangle density were studied. An additional phenotype of the summary measure of the burden of AD pathology (a combination of neuritic and diffuse plaques and neurofibrillary tangles based on modified Bielschowsky silver stain) [14, 15, 17] was also studied.

The tangle density quantifies the average PHFtau tangle density within two or more 20µm sections from eight brain regions: hippocampus, entorhinal cortex, midfrontal cortex, inferior temporal, angular gyrus, calcarine cortex, anterior cingulate cortex, and superior frontal cortex. The β-amyloid load quantifies the average percent area of cortex occupied by β-amyloid in adjacent sections from the same eight brain regions. These two measures are based on immunohistochemistry. The global measure of AD pathology is based on counts of neuritic and diffuse plaques and neurofibrillary tangles (15 counts) on 6µm sections stained with modified Bielschowsky [14, 15, 17].

Supplementary Figures

[Figure S1 image: power curves in two blocks, (A) and (B), each with four panels for pcausal = 0.001, 0.01, 0.1, and 0.2. x-axis: gene expression heritability (0.05 to 0.20); y-axis: power (0 to 100). Methods: BurdenTWAS_DPR, BurdenTWAS_filtered_DPR, BurdenTWAS_PrediXcan, VCTWAS_DPR, VCTWAS_filtered_DPR, VCTWAS_PrediXcan.]

Fig S1. TWAS power comparison for VC-TWAS and Burden-TWAS with phenotypes simulated from Model I (A) and Model II (B). Various types of SNP weights were considered, including those derived from the PrediXcan method, the DPR method, and filtered DPR weights. In Model I, the combinations of causal probability and phenotype heritability are $(p_{causal}, h_p^2)$ = ((0.001, 0.2), (0.01, 0.3), (0.1, 0.4), (0.2, 0.5)). In Model II, the combinations of causal probability and phenotype heritability are $(p_{causal}, h_p^2)$ = ((0.001, 0.1), (0.01, 0.1), (0.1, 0.15), (0.2, 0.15)). Using DPR weights resulted in higher TWAS power than using PrediXcan weights across all scenarios with $p_{causal} \geq 0.01$. TWAS using filtered DPR weights had comparable performance to TWAS using complete DPR weights across all scenarios. For phenotypes simulated from Model I (panel A), Burden-TWAS had either comparable power to VC-TWAS when $p_{causal}$ = (0.001, 0.01) or slightly higher power when $p_{causal}$ = (0.1, 0.2). For phenotypes simulated from Model II (panel B), VC-TWAS had higher power than Burden-TWAS under all scenarios. VC-TWAS with DPR weights resulted in the highest power.

[Figure S2 image: power curves for (A) Model I and (B) Model II. x-axis: gene expression heritability (0.05 to 0.20); y-axis: power (0 to 100). Methods: CoMM, CoMMSS, VCTWAS_DPR, VCTWAS_DPR_f, VCTWASSS, VCTWASSS_f.]

Fig S2. TWAS power comparison for VC-TWAS and CoMM with phenotypes simulated from Model I (A) and Model II (B) using individual-level and summary-level data under the scenarios with $p_{causal} = 0.2$.

[Figure S3 image: box plots of the number of SNPs (y-axis, 0 to 30000) for complete DPR weights (median 6632) and filtered DPR weights (median 2872).]

Fig S3. Box plot of the number of test SNPs considered by VC-TWAS for all genes genome-wide in the application studies of AD, with complete DPR weights and filtered DPR weights derived from the ROS/MAP training data.

Fig S4. Q-Q plots for VC-TWAS and Burden-TWAS with DPR weights, filtered DPR weights, and PrediXcan weights under the null hypothesis, where quantitative gene expression traits were generated with $p_{causal} = 0.2$ and $h_e^2 = 0.1$.

[Figure S5 image: Manhattan plots (A) and (B). x-axis: chromosome (1 to 22); y-axis: -log10(p-values) (0 to 6). Labeled genes: ZNF234, TRAPPC6A, and TOMM40 in (A); HSPBAP1 in (B).]

Fig S5. Manhattan plots of VC-TWAS results with filtered DPR weights for studying quantitative AD pathology of β-amyloid (A) and tangles (B). Genes with FDR < 0.05 by meta VC-TWAS for studying AD clinical diagnosis are colored in red in (A), and the top significant gene for studying the tangles phenotype, with FDR = 0.058, is colored in red in (B).

[Figure S6 image: four Q-Q plots titled "VCTWAS with DPR weights on amyloid", "VCTWAS with DPR weights on tangles", "VCTWAS with DPR weights on Global AD pathology", and "VCTWAS with DPR weights on AD". x-axis: expected -log10(p-values); y-axis: observed -log10(p-values).]

Fig S6. Q-Q plots of VC-TWAS results with filtered DPR weights for studying β-amyloid, tangles, and global AD pathology with the ROS/MAP cohort, as well as meta VC-TWAS results with filtered DPR weights for studying AD clinical diagnosis with the ROS/MAP and Mayo Clinic cohorts.

[Figure S7 image: locus zoom plots (A, C) with position on chr19 (Mb) on the x-axis, -log10(p-value) and recombination rate (cM/Mb) on the y-axes, and points colored by r2; scatter plots (B, D) of the absolute cis-eQTL effect size against position on chr19 (Mb), with a legend distinguishing nonsignificant SNPs, significant SNPs, and the top significant SNP. Labeled SNPs: rs4420638 and rs429358.]

Fig S7. Locus zoom plots of GWAS results and the magnitude (i.e., absolute value) of cis-eQTL effect size estimates by the DPR method for SNPs that were considered by VC-TWAS of genes TOMM40 (A, B) and ZNF234 (C, D). Filtered test SNPs with cis-eQTL effect size magnitude > $10^{-4}$ were plotted here. SNPs passing the GWAS p-value significance threshold were colored in red in (B, D), and the top significant SNPs by GWAS in (A, C) were shown as blue triangles in (B, D).

[Figure S8 image: Manhattan plots (A) and (B). x-axis: chromosome (1 to 22); y-axis: -log10(p-values) (0 to 6).]

Fig S8. Manhattan plots of VC-TWAS results with PrediXcan weights for studying AD clinical diagnosis (A) and global AD pathology (B).

[Figure S9 image: Manhattan plots (A) and (B). x-axis: chromosome (1 to 22); y-axis: -log10(p-values) (0 to 6).]

Fig S9. Manhattan plots of VC-TWAS results with PrediXcan weights for studying quantitative AD pathology of β-amyloid (A) and tangles (B).

[Figure S10 image: four Q-Q plots titled "VCTWAS with PrediXcan weights on Amyloid", "VCTWAS with PrediXcan weights on Tangles", "VCTWAS with PrediXcan weights on Global AD pathology", and "VCTWAS with PrediXcan weights on AD". x-axis: expected -log10(p-values); y-axis: observed -log10(p-values).]

Fig S10. Q-Q plots of VC-TWAS results with PrediXcan weights for studying β-amyloid, tangles, and global AD pathology with the ROS/MAP cohort, as well as meta VC-TWAS results with PrediXcan weights for studying AD clinical diagnosis with the ROS/MAP and Mayo Clinic cohorts.

[Figure S11 image: Q-Q plots (A) and (B). x-axis: expected -log10(p); y-axis: observed -log10(p).]

Fig S11. Q-Q plots of VC-TWAS results using IGAP GWAS summary statistics with filtered cis-eQTL DPR weights and BGW weights.

Supplementary Tables

| SNP number | Burden-TWAS | Burden-TWAS-SS | VC-TWAS | VC-TWAS-SS | CoMM | CoMM-SS |
|---|---|---|---|---|---|---|
| <1000 | 0.25 | 0.0016 | 0.28 | 0.15 | 73.62 | 2.11 |
| 1000-2000 | 0.28 | 0.0039 | 4.73 | 1.42 | 2333.62 | 81.93 |
| 2000-5000 | 0.26 | 0.0124 | 20.09 | 3.71 | 38037.01 | 373.16 |
| >5000 | 0.25 | 0.0478 | 138.93 | 20.64 | 83349.44 | 5476.43 |

Table S1. Computation time (in seconds) of Burden-TWAS, VC-TWAS, and CoMM with individual-level and summary-level (SS) GWAS data, using 1 CPU core with 32GB memory, for example genes with different numbers of test SNPs.

| Gene | CHROM | β-Amyloid | Tangles | Global AD pathology |
|---|---|---|---|---|
| ZNF234 | 19 | 2.10 × 10^[…] | 1.06 × 10^[…] | 6.39 × 10^[…] |
| CLASRP | 19 | 1.39 × 10^[…] | 8.69 × 10^[…] | 3.76 × 10^[…] |
| TRAPPC6A | 19 | 4.44 × 10^[…] | 3.74 × 10^[…] | 1.91 × 10^[…] |
| TOMM40 | 19 | 9.55 × 10^[…] | 6.95 × 10^[…] | 2.08 × 10^[…] |
| CEACAM19 | 19 | 1.03 × 10^[…] | 1.21 × 10^[…] | 3.19 × 10^[…] |

Table S2. Genes with VC-TWAS p-value < 0.0013 with respect to at least one AD pathology phenotype and FDR < 0.05 by meta VC-TWAS of AD clinical diagnosis. AD risk genes identified by previous GWAS are shaded in grey.

| Gene name | CHROM | Start | End | P-value | FDR |
|---|---|---|---|---|---|
| CUTA | 6 | 33,384,218 | 33,386,094 | 1.95 × 10^[…] | 5.96 × 10^[…] |
| CLU | 8 | 27,454,433 | 27,472,548 | 2.01 × 10^[…] | 6.13 × 10^[…] |
| OSBP | 11 | 59,341,870 | 59,383,617 | 8.67 × 10^[…] | 2.84 × 10^[…] |
| STX3 | 11 | 59,480,928 | 59,573,354 | 2.82 × 10^[…] | 8.09 × 10^[…] |
| PRPF19 | 11 | 60,658,201 | 60,674,060 | 2.93 × 10^[…] | 1.00 × 10^[…] |
| TMEM109 | 11 | 60,681,345 | 60,690,915 | 3.66 × 10^[…] | 1.03 × 10^[…] |
| TMEM132A | 11 | 60,691,934 | 60,704,631 | 3.56 × 10^[…] | 1.19 × 10^[…] |
| ME3 | 11 | 86,152,149 | 86,383,678 | 1.52 × 10^[…] | 4.76 × 10^[…] |
| ZNF221 | 19 | 44,455,379 | 44,471,752 | 5.73 × 10^[…] | 1.55 × 10^[…] |
| ZNF230 | 19 | 44,507,076 | 44,518,072 | 8.75 × 10^[…] | 6.48 × 10^[…] |
| ZNF222 | 19 | 44,529,493 | 44,537,260 | 1.58 × 10^[…] | 6.16 × 10^[…] |
| ZNF284 (b) | 19 | 44,576,296 | 44,591,623 | 6.39 × 10^[…] | 2.43 × 10^[…] |
| ZNF225 (a) | 19 | 44,617,547 | 44,637,255 | 6.77 × 10^[…] | 2.51 × 10^[…] |
| ZNF234 (a, b) | 19 | 44,645,709 | 44,664,462 | 3.38 × 10^[…] | 1.19 × 10^[…] |
| ZNF226 | 19 | 44,669,214 | 44,681,836 | 7.69 × 10^[…] | 2.70 × 10^[…] |
| ZNF227 (b) | 19 | 44,716,690 | 44,741,420 | 4.83 × 10^[…] | 3.24 × 10^[…] |
| ZNF233 | 19 | 44,754,317 | 44,815,771 | 2.78 × 10^[…] | 8.09 × 10^[…] |
| ZFP112 (b) | 19 | 44,830,705 | 44,905,774 | 7.16 × 10^[…] | 3.05 × 10^[…] |
| PVR (b) | 19 | 45,147,097 | 45,169,429 | 3.28 × 10^[…] | 1.65 × 10^[…] |
| CEACAM19 (a, b) | 19 | 45,174,723 | 45,187,631 | 7.27 × 10^[…] | 7.30 × 10^[…] |
| BCL3 (b) | 19 | 45,250,961 | 45,263,301 | 3.09 × 10^[…] | 1.89 × 10^[…] |
| BCAM | 19 | 45,312,337 | 45,324,677 | 3.35 × 10^[…] | 1.47 × 10^[…] |
| PVRL2 | 19 | 45,349,392 | 45,392,485 | 7.07 × 10^[…] | 5.85 × 10^[…] |
| TOMM40 (a, b) | 19 | 45,394,476 | 45,406,935 | 1.52 × 10^[…] | 7.13 × 10^[…] |
| APOE | 19 | 45,408,955 | 45,412,650 | 2.59 × 10^[…] | 1.22 × 10^[…] |
| APOC1 | 19 | 45,417,920 | 45,422,606 | 7.83 × 10^[…] | 3.16 × 10^[…] |
| CLPTM1 (a, b) | 19 | 45,457,847 | 45,496,598 | 1.48 × 10^[…] | 1.60 × 10^[…] |
| RELB (a) | 19 | 45,504,694 | 45,541,452 | 7.48 × 10^[…] | 1.50 × 10^[…] |
| CLASRP (a, b) | 19 | 45,542,297 | 45,574,214 | 1.91 × 10^[…] | 2.69 × 10^[…] |
| ZNF296 (b) | 19 | 45,574,758 | 45,579,845 | 4.05 × 10^[…] | 2.28 × 10^[…] |
| GEMIN7 | 19 | 45,582,529 | 45,594,782 | 1.56 × 10^[…] | 9.17 × 10^[…] |
| PPP1R37 | 19 | 45,595,049 | 45,651,335 | 1.02 × 10^[…] | 6.49 × 10^[…] |
| NKPD1 | 19 | 45,653,007 | 45,663,408 | 2.61 × 10^[…] | 1.84 × 10^[…] |
| TRAPPC6A (a, b) | 19 | 45,666,186 | 45,681,485 | 5.33 × 10^[…] | 1.50 × 10^[…] |
| BLOC1S3 | 19 | 45,682,002 | 45,685,057 | 6.28 × 10^[…] | 1.67 × 10^[…] |
| MARK4 (a, b) | 19 | 45,754,549 | 45,808,541 | 3.16 × 10^[…] | 2.22 × 10^[…] |
| ERCC2 | 19 | 45,854,245 | 45,873,876 | 4.76 × 10^[…] | 1.31 × 10^[…] |
| PPP1R13L (a) | 19 | 45,882,891 | 45,909,607 | 1.25 × 10^[…] | 1.95 × 10^[…] |
| CD3EAP (b) | 19 | 45,909,466 | 45,914,024 | 1.15 × 10^[…] | 4.15 × 10^[…] |
| ERCC1 | 19 | 45,910,590 | 45,982,086 | 6.89 × 10^[…] | 3.34 × 10^[…] |
| FOSB | 19 | 45,971,252 | 45,978,414 | 4.06 × 10^[…] | 3.17 × 10^[…] |
| RTN2 (b) | 19 | 45,988,549 | 46,000,313 | 1.89 × 10^[…] | 1.78 × 10^[…] |
| PPM1N (b) | 19 | 45,992,034 | 46,005,768 | 1.30 × 10^[…] | 7.04 × 10^[…] |
| VASP | 19 | 46,010,687 | 46,030,236 | 3.11 × 10^[…] | 7.28 × 10^[…] |
| GPR4 | 19 | 46,093,024 | 46,105,466 | 8.88 × 10^[…] | 2.84 × 10^[…] |
| EML2 (a, b) | 19 | 46,112,659 | 46,148,726 | 1.83 × 10^[…] | 2.35 × 10^[…] |
| GIPR (a, b) | 19 | 46,171,501 | 46,185,704 | 4.21 × 10^[…] | 3.70 × 10^[…] |
| SNRPD2 (b) | 19 | 46,190,712 | 46,195,443 | 9.97 × 10^[…] | 5.19 × 10^[…] |
| QPCTL | 19 | 46,195,740 | 46,207,240 | 7.87 × 10^[…] | 3.16 × 10^[…] |
| FBXO46 (a, b) | 19 | 46,213,886 | 46,234,151 | 1.74 × 10^[…] | 2.45 × 10^[…] |
| DMPK | 19 | 46,272,977 | 46,285,815 | 6.79 × 10^[…] | 7.95 × 10^[…] |
| DMWD | 19 | 46,286,204 | 46,296,060 | 1.40 × 10^[…] | 3.57 × 10^[…] |
| SYMPK | 19 | 46,318,692 | 46,366,548 | 1.54 × 10^[…] | 3.87 × 10^[…] |
| IRF2BP1 (b) | 19 | 46,386,865 | 46,389,376 | 5.02 × 10^[…] | 8.83 × 10^[…] |
| MYPOP (b) | 19 | 46,393,284 | 46,405,862 | 2.80 × 10^[…] | 1.27 × 10^[…] |
| NLRP2 | 19 | 55,476,437 | 55,512,510 | 2.82 × 10^[…] | 8.09 × 10^[…] |
| HAR1A | 20 | 61,733,556 | 61,735,738 | 1.17 × 10^[…] | 3.05 × 10^[…] |

a. Genes also identified as significant by VC-TWAS with individual-level GWAS data of ROS/MAP and Mayo Clinic cohorts.
b. Genes identified as significant by both VC-TWAS and Burden-TWAS using IGAP summary statistics with filtered cis-eQTL DPR weights.

Table S3. Significant genes identified by VC-TWAS using IGAP summary statistics with filtered cis-eQTL DPR weights. Significant genes were identified with FDR < 0.05. AD risk genes identified by previous GWAS are shaded in grey.

| Gene name | CHROM | Start | End | P-value | FDR |
|---|---|---|---|---|---|
| PPIEL | 1 | 39,997,509 | 40,024,379 | 1.41 × 10^[…] | 3.01 × 10^[…] |
| ARHGAP29 | 1 | 94,614,543 | 94,740,624 | 1.14 × 10^[…] | 6.11 × 10^[…] |
| RWDD3 | 1 | 95,699,710 | 95,712,781 | 4.17 × 10^[…] | 2.94 × 10^[…] |
| GPATCH2 | 1 | 217,600,333 | 217,804,424 | 2.41 × 10^[…] | 1.17 × 10^[…] |
| RP11-211A18.2 | 1 | 227,421,969 | 227,423,056 | 4.84 × 10^[…] | 1.59 × 10^[…] |
| C2orf50 | 2 | 11,273,178 | 11,286,916 | 4.92 × 10^[…] | 1.20 × 10^[…] |
| ANKRD30BL | 2 | 132,905,163 | 133,015,542 | 1.43 × 10^[…] | 1.06 × 10^[…] |
| TTC21B | 2 | 166,713,984 | 166,810,353 | 4.86 × 10^[…] | 2.14 × 10^[…] |
| SPEG | 2 | 220,299,567 | 220,363,009 | 1.11 × 10^[…] | 1.12 × 10^[…] |
| FBXO36 | 2 | 230,787,017 | 230,877,825 | 3.14 × 10^[…] | 2.60 × 10^[…] |
| PTH1R | 3 | 46,919,235 | 46,945,287 | 1.52 × 10^[…] | 5.50 × 10^[…] |
| ZXDC | 3 | 126,156,443 | 126,194,762 | 7.11 × 10^[…] | 2.13 × 10^[…] |
| XPO5 | 6 | 43,490,071 | 43,543,812 | 3.76 × 10^[…] | 2.40 × 10^[…] |
| RP1-180E22.3 | 6 | 52,442,282 | 52,444,325 | 2.14 × 10^[…] | 4.30 × 10^[…] |
| BCLAF1 | 6 | 136,578,000 | 136,610,989 | 6.31 × 10^[…] | 2.69 × 10^[…] |
| AC017116.8 | 7 | 44,078,769 | 44,081,905 | 6.18 × 10^[…] | 3.70 × 10^[…] |
| MRPL15 | 8 | 55,047,769 | 55,060,461 | 6.21 × 10^[…] | 1.43 × 10^[…] |
| VCPIP1 | 8 | 67,540,721 | 67,579,452 | 3.52 × 10^[…] | 2.75 × 10^[…] |
| GEM | 8 | 95,261,480 | 95,274,578 | 5.51 × 10^[…] | 1.31 × 10^[…] |
| TRAF1 | 9 | 123,664,670 | 123,691,451 | 5.69 × 10^[…] | 1.00 × 10^[…] |
| PBX3 | 9 | 128,509,623 | 128,729,656 | 1.23 × 10^[…] | 4.56 × 10^[…] |
| C9orf167 | 9 | 140,172,200 | 140,177,093 | 5.66 × 10^[…] | 1.81 × 10^[…] |
| AGAP10 | 10 | 47,191,843 | 47,239,738 | 1.29 × 10^[…] | 3.64 × 10^[…] |
| HELLS | 10 | 96,305,546 | 96,373,662 | 3.51 × 10^[…] | 7.05 × 10^[…] |
| PKD2L1 | 10 | 102,047,902 | 102,090,243 | 1.03 × 10^[…] | 1.45 × 10^[…] |
| C10orf137 | 10 | 127,408,083 | 127,452,712 | 3.26 × 10^[…] | 1.09 × 10^[…] |
| SLC39A13 | 11 | 47,428,682 | 47,438,047 | 1.81 × 10^[…] | 3.76 × 10^[…] |
| LETMD1 | 12 | 51,441,744 | 51,454,207 | 6.19 × 10^[…] | 1.94 × 10^[…] |
| ZBTB39 | 12 | 57,392,617 | 57,400,230 | 8.73 × 10^[…] | 1.95 × 10^[…] |
| KITLG | 12 | 88,885,884 | 88,974,628 | 1.74 × 10^[…] | 3.66 × 10^[…] |
| NT5DC3 | 12 | 104,164,230 | 104,234,975 | 3.95 × 10^[…] | 4.28 × 10^[…] |
| SERTM1 | 13 | 37,248,048 | 37,271,976 | 1.97 × 10^[…] | 4.03 × 10^[…] |
| SEC23A | 14 | 39,501,122 | 39,578,850 | 8.85 × 10^[…] | 1.95 × 10^[…] |
| KTN1-AS1 | 14 | 55,965,995 | 56,046,828 | 1.09 × 10^[…] | 4.25 × 10^[…] |
| ACOT4 | 14 | 74,058,409 | 74,063,200 | 1.30 × 10^[…] | 6.10 × 10^[…] |
| RP13-487P22.1 | 15 | 25,590,779 | 25,592,382 | 3.82 × 10^[…] | 9.60 × 10^[…] |
| C15orf58 | 15 | 90,777,039 | 90,785,315 | 1.17 × 10^[…] | 6.11 × 10^[…] |
| IGSF6 | 16 | 21,652,608 | 21,663,981 | 1.37 × 10^[…] | 1.75 × 10^[…] |
| AC012146.7 | 17 | 5,014,762 | 5,018,299 | 5.58 × 10^[…] | 8.74 × 10^[…] |
| AC092296.1 | 19 | 36,804,643 | 36,822,602 | 2.24 × 10^[…] | 2.63 × 10^[…] |
| GNAS | 20 | 57,414,772 | 57,486,247 | 8.07 × 10^[…] | 2.27 × 10^[…] |
| FOXRED2 | 22 | 36,883,236 | 36,903,148 | 1.05 × 10^[…] | 4.24 × 10^[…] |
| H1F0 | 22 | 38,201,113 | 38,203,442 | 8.26 × 10^[…] | 1.88 × 10^[…] |
| PPPDE2 | 22 | 41,994,031 | 42,017,100 | 4.15 × 10^[…] | 3.65 × 10^[…] |
| PPP6R2 | 22 | 50,781,732 | 50,883,514 | 6.97 × 10^[…] | 4.77 × 10^[…] |

Table S4. Novel significant genes identified by VC-TWAS with BGW weights using IGAP summary statistics.

Supplementary References:

1. Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015;47(9):1091-8. doi: 10.1038/ng.3367. PubMed PMID: 26258848; PMCID: PMC4552594.
2. Zou H, Hastie T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society Series B (Statistical Methodology). 2005;67(2):301-20.
3. Tibshirani R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society Series B (Methodological). 1996;58(1):267-88.
4. Hoerl AE, Kennard RW. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics. 2000;42(1):80-6. doi: 10.2307/1271436.
5. Zeng P, Zhou X. Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models. Nat Commun. 2017;8(1):456. doi: 10.1038/s41467-017-00470-2. PubMed PMID: 28878256; PMCID: PMC5587666.
6. Muller P, Mitra R. Bayesian Nonparametric Inference - Why and How. Bayesian Anal. 2013;8(2). doi: 10.1214/13-BA811. PubMed PMID: 24368932; PMCID: PMC3870167.
7. Blei DM, Kucukelbir A, McAuliffe JD. Variational Inference: A Review for Statisticians. Journal of the American Statistical Association. 2017;112(518):859-77. doi: 10.1080/01621459.2017.1285773.
8. Carbonetto P, Stephens M. Scalable Variational Inference for Bayesian Variable Selection in Regression, and Its Accuracy in Genetic Association Studies. Bayesian Anal. 2012;7(1):73-108. doi: 10.1214/12-BA703.
9. Yang J, Ferreira T, Morris AP, Medland SE, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44(4):369-75, S1-3. doi: 10.1038/ng.2213. PubMed PMID: 22426310; PMCID: PMC3593158.
10. Liu D, Lin X, Ghosh D. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models. Biometrics. 2007;63(4):1079-88. doi: 10.1111/j.1541-0420.2007.00799.x. PubMed PMID: 18078480; PMCID: PMC2665800.
11. Liu D, Ghosh D, Lin X. Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics. 2008;9:292. doi: 10.1186/1471-2105-9-292. PubMed PMID: 18577223; PMCID: PMC2483287.
12. Lee S, Teslovich TM, Boehnke M, Lin X. General framework for meta-analysis of rare variants in sequencing association studies. Am J Hum Genet. 2013;93(1):42-53. doi: 10.1016/j.ajhg.2013.05.010. PubMed PMID: 23768515; PMCID: PMC3710762.
13. Moschopoulos PG, Canada WB. The distribution function of a linear combination of chi-squares. Computers & Mathematics with Applications. 1984;10(4):383-6. doi: 10.1016/0898-1221(84)90066-X.
14. Bennett DA, Schneider JA, Arvanitakis Z, Wilson RS. Overview and findings from the religious orders study. Curr Alzheimer Res. 2012;9(6):628-45. PubMed PMID: 22471860; PMCID: PMC3409291.
15. Bennett DA, Schneider JA, Buchman AS, Barnes LL, Boyle PA, Wilson RS. Overview and findings from the rush Memory and Aging Project. Curr Alzheimer Res. 2012;9(6):646-63. doi: 10.2174/156720512801322663. PubMed PMID: 22471867; PMCID: PMC3439198.
16. Ng B, White CC, Klein HU, Sieberts SK, McCabe C, Patrick E, et al. An xQTL map integrates the genetic architecture of the human brain's transcriptome and epigenome. Nat Neurosci. 2017;20(10):1418-26. doi: 10.1038/nn.4632. PubMed PMID: 28869584; PMCID: PMC5785926.
17. Bennett DA, Buchman AS, Boyle PA, Barnes LL, Wilson RS, Schneider JA. Religious Orders Study and Rush Memory and Aging Project. J Alzheimers Dis. 2018;64(s1):S161-S89. doi: 10.3233/JAD-179939. PubMed PMID: 29865057; PMCID: PMC6380522.
18. Buchanan CC, Torstenson ES, Bush WS, Ritchie MD. A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data. J Am Med Inform Assoc. 2012;19(2):289-94. doi: 10.1136/amiajnl-2011-000652. PubMed PMID: 22319179; PMCID: PMC3277631.
19. De Jager PL, Srivastava G, Lunnon K, Burgess J, Schalkwyk LC, Yu L, et al. Alzheimer's disease: early alterations in brain DNA methylation at ANK1, BIN1, RHBDF2 and other loci. Nat Neurosci. 2014;17(9):1156-63. doi: 10.1038/nn.3786. PubMed PMID: 25129075; PMCID: PMC4292795.