Supplemental Texts

Text S1. Details of PrediXcan’s approach of estimating cis-eQTL effect sizes

The PrediXcan TWAS method [1] employs the Elastic-Net penalized regression method [2] to estimate the cis-eQTL effect sizes $w$ from Equation (1) in the main text. The Elastic-Net method combines a LASSO ($L_1$) penalty [3] with a Ridge ($L_2$) penalty [4] and estimates $w$ by

$$\hat{w} = \underset{w}{\arg\min}\left( \|E_g - Xw\|_2^2 + \lambda\left(\alpha\|w\|_1 + \frac{1}{2}(1-\alpha)\|w\|_2^2\right)\right),$$

where $\|\cdot\|_1$ denotes the $L_1$ norm and $\|\cdot\|_2$ denotes the $L_2$ norm. PrediXcan [1] sets $\alpha = 0.5$, and the penalty parameter $\lambda$ can be tuned by 5-fold cross-validation.

Text S2. Details of TIGAR’s approach of estimating cis-eQTL effect sizes

TIGAR provides a more flexible approach that nonparametrically estimates the cis-eQTL effect sizes $w$ from Equation (1) in the main text by a Bayesian Dirichlet process regression (DPR) method [5]. The DPR method assumes a normal prior distribution $N(0, \sigma_w^2)$ for the cis-eQTL effect sizes and a Dirichlet process prior $D$ [6] for the effect-size variance $\sigma_w^2$:

$$w_i \sim N(0, \sigma_w^2), \quad \sigma_w^2 \sim D, \quad D \sim DP(IG(a, b), \xi).$$

That is, the prior distribution $D$ of the effect-size variance is drawn from a Dirichlet process (DP) with an inverse gamma (IG) base distribution and concentration parameter $\xi$. As proposed by previous studies, a variational Bayesian algorithm [7, 8] is implemented to efficiently obtain the posterior estimates $\hat{w}$.

Text S3. Details of VC-TWAS approach with summary-level GWAS data and P-value calculation

VC-TWAS with summary-level GWAS data

Summary-level GWAS data are generally generated by meta-analysis based on the following single variant (SNP) regression model:

$$Y = x_{\cdot j}\beta_j + \epsilon, \quad \epsilon \sim N(0, \sigma_\epsilon^2).$$

Here, the phenotype $Y$ is assumed to be adjusted for other confounding covariates with mean 0, and the genotype vectors ($x_{\cdot j}$, $j = 1, \dots, m$) are also assumed to be centered with mean 0.
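To make the single-variant model above concrete, the following minimal Python sketch computes the marginal effect size estimate and its standard error for one SNP from individual-level data. The helper name `marginal_summary` is hypothetical, and the use of $n - 1$ residual degrees of freedom is an assumption chosen to match the $(n-1)$ factors used in this text; actual GWAS software may use a different convention.

```python
import math


def marginal_summary(x, y):
    """Fit y = x * beta + error for a single SNP (illustrative helper).

    x and y are centered inside the function, mirroring the mean-0
    assumption in the text. Returns (beta_hat, se_hat).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]

    xtx = sum(v * v for v in xc)              # x' x
    xty = sum(a * b for a, b in zip(xc, yc))  # x' y
    yty = sum(v * v for v in yc)              # y' y

    beta_hat = xty / xtx
    # Residual sum of squares of the no-intercept fit on centered data;
    # clipped at 0 to guard against rounding error in a perfect fit.
    rss = max(yty - beta_hat * xty, 0.0)
    # Assumption: n - 1 degrees of freedom for the residual variance.
    resid_var = rss / (n - 1)
    se_hat = math.sqrt(resid_var / xtx)
    return beta_hat, se_hat
```

Under these assumptions, the returned pair plays the role of the per-SNP effect size estimate and standard error that a summary-level GWAS file would report.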
Without loss of generality, we assume that the GWAS summary statistics include the single variant effect size estimate $\hat{\beta}_j$ and the corresponding standard error $\hat{\sigma}_j$ for the $j$th SNP, the sample size $n$, and a reference LD covariance matrix $\Sigma$ of all test SNPs. Following the derivation provided by [9], the marginal SNP effect size estimate is

$$\hat{\beta}_j = \frac{x_{\cdot j}^T Y}{x_{\cdot j}^T x_{\cdot j}},$$

and the denominator $x_{\cdot j}^T x_{\cdot j}$ can be approximated by using the $j$th diagonal element of the reference LD covariance matrix, $\Sigma \approx X'X/(n-1)$, with $x_{\cdot j}^T x_{\cdot j} = (n-1)\Sigma_{j,j}$. Thus, the numerator of the score statistic for the $j$th SNP as shown in the main text (Equation (6)) can be estimated by

$$x_{\cdot j}^T Y = (n-1)\hat{\beta}_j \Sigma_{j,j}.$$

In addition, based on the estimate for the marginal SNP effect size variance, the phenotype variance $\sigma_Y^2$ can be estimated by

$$\hat{\sigma}_Y^2 = \frac{Y^T Y}{n-1} \approx \frac{(x_{\cdot j}^T x_{\cdot j})\hat{\sigma}_j^2 (n-1) + (x_{\cdot j}^T x_{\cdot j})\hat{\beta}_j^2}{n-1} = \Sigma_{j,j}\hat{\sigma}_j^2 (n-1) + \Sigma_{j,j}\hat{\beta}_j^2.$$

Since this estimate might vary with the summary GWAS data of different SNPs, we take the median of $\Sigma_{j,j}\hat{\sigma}_j^2 (n-1) + \Sigma_{j,j}\hat{\beta}_j^2$ across all SNPs as $\hat{\sigma}_Y^2$, as suggested by the previous study [9]. Then the $Q$ statistic used by VC-TWAS with only summary-level GWAS data can be approximated by

$$Q = \sum_{j=1}^{m} w_j^2 \left( \frac{x_{\cdot j}^T Y}{\hat{\sigma}_Y^2} \right)^2.$$

P-value calculation for VC-TWAS

Under the null hypothesis, the $Q$ statistic used by VC-TWAS follows a mixture of chi-square distributions $\sum_{j=1}^{m} \lambda_j \chi_{j,1}^2$ [10, 11], where $(\lambda_1, \cdots, \lambda_m)$ are the nonzero eigenvalues of $K$,

$$K = W \Phi W, \quad \Phi = X^T P X, \quad P = V^{-1} - V^{-1} C \left( C^T V^{-1} C \right)^{-1} C' V^{-1},$$

where $W = \mathrm{diag}(w_1, \dots, w_m)$ is the diagonal matrix of SNP weights, $X$ is the $n \times m$ genotype matrix, $C$ is the matrix of covariate data, $V = \hat{\sigma}_Y^2 I$ for continuous traits with identity matrix $I$, and $V = \mathrm{diag}[\hat{\mu}_1(1 - \hat{\mu}_1), \dots, \hat{\mu}_n(1 - \hat{\mu}_n)]$ for dichotomous traits. If the phenotype $Y$ is centered and adjusted for other covariates, as assumed when using summary-level GWAS data, then $\Phi$ can be simplified and approximated by $\Phi \approx (n-1)\Sigma / \hat{\sigma}_Y^2$ with the reference LD covariance matrix $\Sigma$ [12].
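The summary-level approximation above can be sketched in a few lines of Python. The function and argument names (`summary_level_q`, `mixture_chisq_pvalue_mc`, `sigma_diag`, and so on) are hypothetical and chosen only for illustration. The second function is a plain Monte Carlo approximation of the tail probability of a mixture of chi-square distributions for a given set of eigenvalues; it is a simple stand-in for illustration and is not the Davies exact method.

```python
import random
from statistics import median


def summary_level_q(beta_hat, se_hat, sigma_diag, weights, n):
    """Approximate the Q statistic from GWAS summary statistics.

    beta_hat, se_hat : per-SNP marginal effect estimates and standard errors
    sigma_diag       : diagonal entries Sigma_jj of the reference LD covariance
    weights          : per-SNP cis-eQTL weights w_j
    n                : GWAS sample size
    Returns (Q, estimated phenotype variance).
    """
    # Score numerator: x_j' Y = (n - 1) * beta_hat_j * Sigma_jj
    xty = [(n - 1) * b * s for b, s in zip(beta_hat, sigma_diag)]
    # Per-SNP phenotype variance estimates, summarized by their median
    var_y = median(
        s * se * se * (n - 1) + s * b * b
        for b, se, s in zip(beta_hat, se_hat, sigma_diag)
    )
    q = sum((w * u / var_y) ** 2 for w, u in zip(weights, xty))
    return q, var_y


def mixture_chisq_pvalue_mc(q, eigenvalues, draws=20000, seed=0):
    """Monte Carlo estimate of P(sum_j lambda_j * chi2_1 >= q).

    Illustrative substitute for an exact method; accuracy is limited
    by the number of draws.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(draws):
        stat = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
        if stat >= q:
            exceed += 1
    return exceed / draws
```

In practice the eigenvalues would come from an eigendecomposition of $K$, which is left out here to keep the sketch dependency-free; very small p-values also need far more draws than the default, which is one reason exact methods are preferred.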
The p-value of VC-TWAS can then be conveniently obtained by approximation or exact methods such as the Davies exact method [13], using either individual-level or summary-level GWAS data.

Text S4. Details of ROS/MAP data

In our application studies of Alzheimer’s dementia (AD), we used transcriptomic and individual-level GWAS data generated for samples from the Religious Orders Study (ROS) and Rush Memory and Aging Project (MAP) [14-17] cohorts. ROS recruits nuns, priests, and brothers across the United States. MAP recruits participants living in private homes, subsidized housing, and retirement facilities across the greater Chicago metropolitan area. ROS/MAP data can be requested at www.radc.rush.edu. Both studies employ harmonized data collection methods performed by the same staff for annual testing during life and for structured autopsy and collection of genomic data from blood and brain biospecimens. Harmonized data collection facilitates joint analyses of the studies’ data. Details of the studies are described elsewhere.

We used microarray genotype data generated for 2,093 European-descent subjects from ROS/MAP [14-17], which were further imputed to the 1000 Genomes Project Phase 3 [18]. Post-mortem brain samples (gray matter of the dorsolateral prefrontal cortex) from ~30% of these ROS/MAP participants with assayed genotype data were also profiled for transcriptomic data by next-generation RNA sequencing [19]; these data were used as reference data to train GReX prediction models in our application studies.

Using ROS/MAP data, we conducted TWAS for clinical diagnosis of late-onset Alzheimer’s dementia (LOAD) as well as pathology indices of AD quantified with β-antibody and PHFtau specific immunostains. The quantitative pathology phenotypes β-amyloid load and PHFtau tangle density were studied.
An additional phenotype of the summary measure of the burden of AD pathology (a combination of neuritic and diffuse plaques and neurofibrillary tangles based on modified Bielschowsky silver stain) [14, 15, 17] was also studied. The tangle density quantifies the average PHFtau tangle density within two or more 20µm sections from eight brain regions: hippocampus, entorhinal cortex, midfrontal cortex, inferior temporal, angular gyrus, calcarine cortex, anterior cingulate cortex, and superior frontal cortex. The β-amyloid load quantifies the average percent area of cortex occupied by β-amyloid protein in adjacent sections from the same eight brain regions. These two measures are based on immunohistochemistry. The global measure of AD pathology is based on counts of neuritic and diffuse plaques and neurofibrillary tangles (15 counts) on 6µm sections stained with modified Bielschowsky [14, 15, 17].

Supplementary Figures

[Figure: panels (A) and (B), each with four sub-panels for pcausal = 0.001, 0.01, 0.1, and 0.2; x-axis: gene expression heritability (0.05 to 0.20); y-axis: power (0 to 100); methods: BurdenTWAS_DPR, BurdenTWAS_PrediXcan, BurdenTWAS_filtered_DPR, VCTWAS_DPR, VCTWAS_PrediXcan, VCTWAS_filtered_DPR]

Fig S1. TWAS power comparison for VC-TWAS and Burden-TWAS with phenotypes simulated from Model I (A) and Model II (B). Various types of SNP weights were considered, including those derived from PrediXcan method, DPR method, and filtered DPR weights. In Model I, the combinations of causal probability and phenotype heritability are $(p_{causal}, h_p^2) = ((0.001, 0.2), (0.01, 0.3), (0.1, 0.4), (0.2, 0.5))$. In Model II, the combinations of causal probability and phenotype heritability are $(p_{causal}, h_p^2) = ((0.001, 0.1), (0.01, 0.1), (0.1, 0.15), (0.2, 0.15))$. Using DPR weights resulted in higher TWAS power than using PrediXcan weights across all scenarios with $p_{causal} \geq 0.01$. TWAS using filtered DPR weights had comparable performance as using complete DPR weights across all scenarios. For phenotypes simulated from Model I (panel A), Burden-TWAS had either comparable power with VC-TWAS when $p_{causal} = (0.001, 0.01)$ or slightly higher power when $p_{causal} = (0.1, 0.2)$. For phenotypes simulated from Model II (panel B), VC-TWAS had higher power than Burden-TWAS under all scenarios. VC-TWAS with DPR weights resulted in the highest power.

[Figure: panels A) Model I and B) Model II; x-axis: gene expression heritability (0.05 to 0.20); y-axis: power (0 to 100); methods: CoMM, CoMMSS, VCTWAS_DPR, VCTWAS_DPR_f, VCTWASSS, VCTWASSS_f]

Fig S2. TWAS power comparison for VC-TWAS and CoMM with phenotypes simulated from Model I (A) and Model II (B) using individual-level and summary-level data under the scenarios with $p_{causal} = 0.2$.

[Figure: box plots for complete DPR weights (median: 6632) and filtered DPR weights (median: 2872); y-axis: number of SNPs (0 to 30000)]

Fig S3. Box plot of the number of test SNPs considered by VC-TWAS of all genome-wide genes in the application studies of AD, with complete DPR weights and filtered DPR weights derived from the ROS/MAP training data.

Fig S4.
Q-Q plots for VC-TWAS and Burden-TWAS with DPR weights, filtered DPR weights, and PrediXcan weights under null hypothesis, where quantitative gene expression traits were generated with $p_{causal} = 0.2$ and $h_e^2 = 0.1$.

[Figure: panels A) and B); x-axis: chromosome (1 to 22); y-axis: log10(p values), ticks 0 to 6; labeled genes: A) ZNF234, TRAPPC6A, TOMM40; B) HSPBAP1]

Fig S5.