bioRxiv preprint doi: https://doi.org/10.1101/155481; this version posted June 26, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Title: A Bayesian Framework for Multiple Trait Colocalization from Summary
Association Statistics

Claudia Giambartolomei1, Jimmy Zhenli Liu2, Wen Zhang3, Mads Hauberg3,4, Huwenbo
Shi5, James Boocock1, Joe Pickrell2, Andrew E. Jaffe6, the CommonMind Consortium#,
Bogdan Pasaniuc*1, Panos Roussos*3,7,8

1Department of Pathology and Laboratory Medicine, University of California, Los
Angeles, Los Angeles, CA 90095, USA; Department of Human Genetics, University of
California, Los Angeles, Los Angeles, CA 90095, United States of America.
2New York Genome Center, New York, New York, United States of America
3Department of Genetics and Genomic Science and Institute for Multiscale Biology,
Icahn School of Medicine at Mount Sinai, New York, New York, 10029, United States of
America.
4The Lundbeck Foundation Initiative of Integrative Psychiatric Research (iPSYCH),
Aarhus University, Aarhus, 8000, Denmark.
5Bioinformatics Interdepartmental Program, University of California, Los Angeles, 90024
6Lieber Institute for Brain Development, Johns Hopkins Medical Campus; Departments
of Mental Health and Biostatistics, Johns Hopkins Bloomberg School of Public Health
Baltimore, MD, 21205, United States of America.
7Department of Psychiatry and Friedman Brain Institute, Icahn School of Medicine at
Mount Sinai, New York, New York, 10029, United States of America.
8Mental Illness Research Education and Clinical Center (MIRECC), James J. Peters VA
Medical Center, Bronx, New York, 10468, United States of America.

# The members of the CommonMind Consortium are listed under “Consortia”.

Correspondence:
Dr. Panos Roussos
Icahn School of Medicine at Mount Sinai
Department of Psychiatry and Department of Genetics and Genomic Science and
Institute for Multiscale Biology
One Gustave L. Levy Place,
New York, NY, 10029, USA

Dr. Claudia Giambartolomei
University of California, Los Angeles, Los Angeles
Department of Pathology and Laboratory Medicine,
Los Angeles, CA 90095, USA

ABSTRACT
Most genetic variants implicated in complex diseases by genome-wide association
studies (GWAS) are non-coding, making it challenging to understand the causative
genes involved in disease. Integrating external information such as quantitative trait
locus (QTL) mapping of molecular traits (e.g., expression, methylation) is a powerful
approach to identify the subset of GWAS signals explained by regulatory effects. In
particular, expression QTLs (eQTLs) help pinpoint the responsible gene among the
GWAS regions that harbor many genes, while methylation QTLs (mQTLs) help identify
the epigenetic mechanisms that impact gene expression, which in turn affects disease
risk. In this work we propose multiple-trait-coloc (moloc), a Bayesian statistical
framework that integrates GWAS summary data with multiple molecular QTL data to
identify regulatory effects at GWAS risk loci. We applied moloc to schizophrenia (SCZ)
and eQTL/mQTL data derived from human brain tissue and identified 56 candidate
genes that influence SCZ through methylation. Our method can be applied to any
GWAS and relevant functional data to help prioritize disease-associated genes.

INTRODUCTION
Genome-wide association studies (GWAS) have successfully identified
thousands of genetic variants associated with complex diseases1. However, since the
discovered associations point to non-coding regions, it is difficult to identify the causal
genes and the mechanism by which risk variants mediate disease susceptibility.
Advancement of high-throughput array and sequencing technology has enabled the
identification of quantitative trait loci (QTLs), genetic variants that affect molecular
phenotypes such as gene expression (expression QTL or eQTL) and DNA methylation
(methylation QTL or mQTL). Integration of molecular QTL data has the potential to
functionally characterize the GWAS results. Additionally, analyzing two datasets jointly
has been a successful strategy to identify shared genetic variants that affect different
molecular processes, in particular eQTL and GWAS integration 2–8,13,19,26,28. Integrating
methylation data20 could help identify epigenetic regulatory mechanisms that potentially
control the identified genes and contribute to disease.
To our knowledge, a statistical approach to integrate multiple QTL datasets with
GWAS is lacking. Therefore, we developed multiple-trait-coloc (moloc), a statistical
method to quantify the evidence in support of a common causal variant at a particular
risk region across multiple traits. We applied moloc to schizophrenia (SCZ), a complex
polygenic psychiatric disorder, using summary statistics from the most recent and
largest GWAS by the Psychiatric Genomics Consortium9, which reported association for
108 independent genomic loci. eQTL data were derived from the CommonMind
Consortium10, which generated the largest eQTL dataset in the dorsolateral prefrontal
cortex (DLPFC) from SCZ cases and control subjects (N=467). Finally, we leveraged
mQTL data that were previously generated in human DLPFC tissue (N=121) to
investigate epigenetic variation in SCZ11. Integration of multiple phenotypes helps better
characterize the genes predisposing to complex diseases such as SCZ.

MATERIALS AND METHODS
Method Description
We extended the model of Pickrell3 and coloc2 to analyze jointly multiple traits. For each
variant, we assume a simple linear regression model to relate the vector of phenotypes
$\vec{y}$ or a log-odds generalized linear model for the case-control dataset, and the vector of
genotypes $\vec{x}$. Under this model the expectation of the trait is:

\[ E[y_i] = \beta x_i \]

We define a genomic region containing Q variants, for example a cis region around an
expression or methylation probe. We are interested in a situation where summary
statistics (effect size estimates and standard errors) are available for all datasets in a
genomic region with Q variants.
We make two important assumptions. Firstly, that the causal variant is included in the
set of Q common variants, either directly typed or well imputed. If the causal SNP is not
present, the power to detect a common variant will be reduced depending on the LD
between other SNPs included in the model and the causal SNP (see coloc paper2).
Secondly, we assume at most one causal variant is present for each trait. Thus, for three
traits, there can be up to three causal variants and 15 possible configurations of how they
are shared among the traits. In the presence of multiple causal variants per trait, this
algorithm is not able to identify colocalization between additional association signals
independent from the primary one. Although this is a limitation, these cases are rare and
the advantage of using more complex models that require LD is still unclear (Paintor,
eCAVIAR, etc).
The algorithm estimates the evidence in support of different scenarios. In the three-trait
situation, we wish to estimate posterior probabilities for fifteen possible configurations.
We are most interested in the scenarios supporting a shared causal variant for two and
three traits. This algorithm requires the definition of prior probabilities at the SNP level.
The probability of the data under each hypothesis is computed by combining the regional
Bayes factor with the priors to assess the support for each scenario. We evaluated the
method in simulations and found that the priors with the greatest power are
$1 \times 10^{-4}$ for the prior probability of a SNP being associated with each trait,
$1 \times 10^{-7}$ for the SNP being associated with two traits, and $1 \times 10^{-7}$ for
a SNP being associated with all three traits. For the WABF, we averaged BF between three
prior variances (0.01, 0.1, 0.5).

Bayes Factor computation
Bayes factors for association measure the relative support for different models in which
the SNP is associated with 1 or more traits, compared to the null model of no association.
In the three-trait situation, for a single SNP there are eight possible models to consider:
M0: the SNP is associated with none of the traits;
M1: the SNP is associated with trait 1 (but not trait 2 nor trait 3);
M2: the SNP is associated with trait 2 (but not trait 1 nor trait 3);
M3: the SNP is associated with trait 3 (but not trait 1 nor trait 2);
M4: the SNP is associated with both trait 1 and trait 2 (but not trait 3);
M5: the SNP is associated with both trait 1 and trait 3 (but not trait 2);
M6: the SNP is associated with both trait 2 and trait 3 (but not trait 1);
M7: the SNP is associated with all traits.
We start by computing a Bayes Factor for each SNP and each of the traits (i.e., GWAS,
eQTL, mQTL). Using the Wakefield Approximate Bayes factors12 (WABF), only the
variance and effect estimates from regression analysis are needed, as previously
described2,3. Let $\hat{\beta}$ be the maximum likelihood estimator of $\beta$, and
$\sqrt{V}$ be the standard error of that estimate, in an asymptotic setting
$\hat{\beta} \to N(\beta, V)$. If we assume that the effect size $\beta = 0$ under the null,
while under the alternative $\beta \to N(0, W)$, to compute the WABF we only need Z-scores
from a standard regression output, and $\sqrt{W}$, the standard deviation of the normal
prior N(0, W) on $\beta$. We average over Bayes factors using W = 0.01, W = 0.1, and W = 0.5.

\[ WABF = \frac{1}{\sqrt{1 - r}} \times \exp\left[ -\frac{Z^2}{2} \times r \right] \]

where $Z = \hat{\beta}/\sqrt{V}$ is the usual Z statistic and the shrinkage factor r is the
ratio of the prior variance W (expected effect size under the alternative) and total variance,
r = W/(V + W). To simplify computations, following Pickrell2014 we use the reverse
regression model where the genotypes and phenotypes are swapped. The WABF are identical
as long as the shrinkage factor r remains the same. The evidence in support of one of the
models with more than 1 trait is:

\[ BF^{(m)} = \prod_{i \in m} WABF_i \]

A key assumption is that the traits do not share overlapping individuals. We could relax
this assumption as in Pickrell2014. For each SNP, we assign a prior probability according
to how many traits that SNP is associated with, and constant across SNPs.

Regional Bayes Factors and Posterior computations
We next estimate the support for possible scenarios in a given genomic region. We call
“configuration” a possible combination of n sets of binary vectors indicating whether the
variant is causal for the selected trait, where n is the number of traits considered. For
instance, if we consider three traits, there can be up to three causal variants and 15
possible configurations of how they are shared among the traits. We can compute the
probabilities given the data for each of these hypotheses by summing over the relevant
configurations. We combine the Bayes Factor for each configuration with the priors to
assess the support for each scenario.
For each configuration S and observed data D, the likelihood of configuration h relative
to the null (H0) is given by:

\[ \frac{P(H_h \mid D)}{P(H_0 \mid D)} = \sum_{S \in S_h} \frac{P(D \mid S)}{P(D \mid S_0)} \times \frac{P(S)}{P(S_0)} \qquad (1) \]

where P(D|S)/P(D|S0) is the Bayes Factor for each configuration, and P(S)/P(S0) is the
prior odds of a configuration compared with the baseline configuration S0, and the sum
is over all configurations which are consistent with a given hypothesis.
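
As a rough illustration of the per-SNP Bayes factors and the sum over configurations described above, here is a minimal Python sketch. It is not the authors' implementation (the paper states that analyses were run in R). It assumes the Bayes factor is taken in the direction that favours association, i.e. the reciprocal of Wakefield's formula as used in coloc-style code, and the function names, Z-scores, variance and prior values are all hypothetical.

```python
import math


def log_wabf(z, V, W):
    """Log Bayes factor for association versus the null, from a Z-score.

    r = W / (V + W) is the shrinkage factor (prior variance over total variance).
    Assumption: this is the reciprocal of Wakefield's null-versus-alternative form.
    """
    r = W / (V + W)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def wabf(z, V, prior_variances=(0.01, 0.1, 0.5)):
    """Average the Bayes factor over the prior variances W."""
    values = [math.exp(log_wabf(z, V, W)) for W in prior_variances]
    return sum(values) / len(values)


def rbf_single(prior, bf):
    """One trait with a causal variant: prior * sum_i BF_i."""
    return prior * sum(bf)


def rbf_shared(prior, bf_a, bf_b):
    """One causal variant shared by two traits: prior * sum_i BF_a[i] * BF_b[i]."""
    return prior * sum(a * b for a, b in zip(bf_a, bf_b))


def rbf_distinct(prior_a, prior_b, bf_a, bf_b):
    """Two traits with different causal variants: double sum over i != j."""
    total = sum(a * b
                for i, a in enumerate(bf_a)
                for j, b in enumerate(bf_b)
                if i != j)
    return prior_a * prior_b * total


# Toy region with Q = 4 SNPs; Z-scores, variance and priors are made-up values.
z_gwas = [0.5, 4.2, 1.0, -0.3]
z_eqtl = [0.1, 5.0, 0.8, 0.4]
V = 0.01
bf_g = [wabf(z, V) for z in z_gwas]
bf_e = [wabf(z, V) for z in z_eqtl]
p1, p2 = 1e-4, 1e-7   # illustrative prior values
rbf_ge = rbf_shared(p2, bf_g, bf_e)         # shared causal variant
rbf_g_e = rbf_distinct(p1, p1, bf_g, bf_e)  # two different causal variants
```

In this toy region both traits peak at the same SNP, so the shared-variant quantity comes out larger than the distinct-variant one. The double sum can also be obtained from the single-trait and shared-variant terms, which avoids the nested loop over SNP pairs.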
The Regional Bayes Factor (RBF) is the Bayes Factor for each configuration combined
with the priors to assess the support for each scenario. If priors do not vary across
SNPs under the same hypotheses, we can multiply the likelihoods by one common
prior. For example, if we let the “.” in the subscript denote scenarios supporting different
causal variants, the RBF supporting the scenario for one causal variant shared between
traits G and E (equation 1) is:

\[ RBF_{GE} = \pi^{(1,2)} \times \sum_{i=1}^{Q} WABF_i^{(1)} WABF_i^{(2)} \]

where $\pi$ is the prior probability according to how many traits the SNP is associated with.
The RBF summarizing the scenario with one causal variant for traits G and E, and
a different causal variant for trait M, is:

\[ RBF_{GE.M} = \pi^{(1,2)} \pi^{(3)} \sum_{i=1}^{Q} \sum_{j=1}^{Q} WABF_i^{(1)} WABF_i^{(2)} WABF_j^{(3)} \, I[i \neq j] \]

where I is an indicator that evaluates to 1 if i, j (or i, j, k) are different and 0 otherwise.
Notably, the equations for the model with no colocalization can be re-written in terms of
the model with colocalization. For example,

\[ RBF_{GE.M} = RBF_{GE} \times RBF_{M} - \frac{\pi^{(1,2)} \times \pi^{(3)}}{\pi^{(1,2,3)}} \times RBF_{GEM} \]

In general, the model supporting configuration h with n traits is:

\[ \frac{P(H_h \mid D)}{P(H_0 \mid D)} = \prod_{h \in H} \left( \pi^{(n)} \sum_{j=1}^{Q} WABF_j^{(n)} \right) - \frac{\prod_{h \in H} \pi^{(n)}}{\pi^{(1,2,\ldots,H)}} \times \pi^{(1,2,\ldots,H)} \sum_{j=1}^{Q} WABF_j^{(1,2,\ldots,h)} \qquad (2) \]

where n is the number of traits considered, h is the configuration of interest out of the H
possible configurations and $\pi$ is the prior probability according to how many traits the
SNP is associated with.
We set the prior probability that a SNP is the causal one for each trait to be identical
($\pi^{(1)} = \pi^{(2)} = \pi^{(3)}$) and refer to this as p1. We also set the prior probability that a SNP is
associated with two traits to be identical ($\pi^{(1,2)} = \pi^{(1,3)} = \pi^{(2,3)}$) and refer to this as p2.
We refer to the prior probability that a SNP is causal for all traits as p3.
Finally, the posterior probability supporting configuration h among H possible
configurations is computed:

\[ PP_h = P(H_h \mid D) = \frac{P(H_h \mid D)}{\sum_{i=0}^{H} P(H_i \mid D)} = \frac{P(H_h \mid D) / P(H_0 \mid D)}{1 + \sum_{i=1}^{H} P(H_i \mid D) / P(H_0 \mid D)} \qquad (3) \]

The Regional Bayes Factors for the 15 possible scenarios using three traits G, E, M, are then:

\[ RBF_{G} = \pi^{(1)} \sum_{i=1}^{Q} WABF_i^{(1)} \]
\[ RBF_{E} = \pi^{(2)} \sum_{i=1}^{Q} WABF_i^{(2)} \]
\[ RBF_{M} = \pi^{(3)} \sum_{i=1}^{Q} WABF_i^{(3)} \]
\[ RBF_{GE} = \pi^{(1,2)} \sum_{i=1}^{Q} WABF_i^{(1)} WABF_i^{(2)} \]
\[ RBF_{G.E} = \pi^{(1)} \pi^{(2)} \sum_{i=1}^{Q} \sum_{j=1}^{Q} WABF_i^{(1)} WABF_j^{(2)} \, I[i \neq j] \]
\[ RBF_{EM} = \pi^{(2,3)} \sum_{i=1}^{Q} WABF_i^{(2)} WABF_i^{(3)} \]
\[ RBF_{E.M} = \pi^{(2)} \pi^{(3)} \sum_{i=1}^{Q} \sum_{j=1}^{Q} WABF_i^{(2)} WABF_j^{(3)} \, I[i \neq j] \]
\[ RBF_{GM} = \pi^{(1,3)} \sum_{i=1}^{Q} WABF_i^{(1)} WABF_i^{(3)} \]
\[ RBF_{G.M} = \pi^{(1)} \pi^{(3)} \sum_{i=1}^{Q} \sum_{j=1}^{Q} WABF_i^{(1)} WABF_j^{(3)} \, I[i \neq j] \]
\[ RBF_{GE.M} = \pi^{(1,2)} \pi^{(3)} \sum_{i=1}^{Q} \sum_{j=1}^{Q} WABF_i^{(1)} WABF_i^{(2)} WABF_j^{(3)} \, I[i \neq j] \]
\[ RBF_{G.EM} = \pi^{(1)} \pi^{(2,3)} \sum_{i=1}^{Q} \sum_{j=1}^{Q} WABF_i^{(1)} WABF_j^{(2)} WABF_j^{(3)} \, I[i \neq j] \]
\[ RBF_{GM.E} = \pi^{(1,3)} \pi^{(2)} \sum_{i=1}^{Q} \sum_{j=1}^{Q} WABF_i^{(1)} WABF_i^{(3)} WABF_j^{(2)} \, I[i \neq j] \]
\[ RBF_{GEM} = \pi^{(1,2,3)} \sum_{i=1}^{Q} WABF_i^{(1)} WABF_i^{(2)} WABF_i^{(3)} \]
\[ RBF_{G.E.M} = \pi^{(1)} \pi^{(2)} \pi^{(3)} \sum_{i=1}^{Q} \sum_{j=1}^{Q} \sum_{k=1}^{Q} WABF_i^{(1)} WABF_j^{(2)} WABF_k^{(3)} \, I[i \neq j, i \neq k, j \neq k] \]

Using three traits G, E, M, the RBFs for the non-colocalized signals above would be:

\[ RBF_{G.E} = RBF_{G} \times RBF_{E} - \frac{\pi^{(1)} \times \pi^{(2)}}{\pi^{(1,2)}} \times RBF_{GE} \]
\[ RBF_{E.M} = RBF_{E} \times RBF_{M} - \frac{\pi^{(2)} \times \pi^{(3)}}{\pi^{(2,3)}} \times RBF_{EM} \]
\[ RBF_{G.M} = RBF_{G} \times RBF_{M} - \frac{\pi^{(1)} \times \pi^{(3)}}{\pi^{(1,3)}} \times RBF_{GM} \]
\[ RBF_{GE.M} = RBF_{GE} \times RBF_{M} - \frac{\pi^{(1,2)} \times \pi^{(3)}}{\pi^{(1,2,3)}} \times RBF_{GEM} \]
\[ RBF_{G.EM} = RBF_{G} \times RBF_{EM} - \frac{\pi^{(1)} \times \pi^{(2,3)}}{\pi^{(1,2,3)}} \times RBF_{GEM} \]
\[ RBF_{GM.E} = RBF_{GM} \times RBF_{E} - \frac{\pi^{(1,3)} \times \pi^{(2)}}{\pi^{(1,2,3)}} \times RBF_{GEM} \]
\[ RBF_{G.E.M} = RBF_{G} \times RBF_{E} \times RBF_{M} - \frac{\pi^{(1)} \times \pi^{(2)} \times \pi^{(3)}}{\pi^{(1,2,3)}} \times RBF_{GEM} \]

Correlation in the effect sizes
This is at page 5 of Pickrell’s paper:

\[ Cor(Z_1, Z_2) = E\left[ \frac{n_0}{n_1 n_2} \rho_g + \frac{N}{\sqrt{N_1 N_2}} \rho \right] \approx E\left[ \frac{N}{\sqrt{N_1 N_2}} \rho \right] \]

GWAS dataset
Summary statistics for genome-wide SNP association with Schizophrenia were
obtained from the Psychiatric Genomics Consortium-Schizophrenia Workgroup (PGC-
SCZ) primary meta-analysis (35,476 cases and 46,839 controls)9.

Expression QTL (eQTL) analysis
This analysis used RNA sequence data on individuals of European-ancestry (N =
467) from post-mortem DLPFC (Brodmann areas 9 and 46), and imputed genotypes
based on the Phase 1 reference panel from the 1,000 Genomes Project as previously
described10. MatrixEQTL14 was used to fit an additive linear model between the
expression of 15,791 genes and imputed SNP genotypes within a 1 Mb window around
the transcription start site for each gene, including covariates for ancestry, diagnosis,
and known and hidden variables detected by surrogate variable analysis, as described
elsewhere10. Overall, the model identified 2,154,331 significant cis-eQTLs (i.e., SNP–
gene pairs within 1 Mb of a gene) at a false discovery rate (FDR) ≤ 5%, for 13,137
(80%) genes.

Methylation QTL (mQTL) analysis
DNA methylation of postmortem tissue homogenates of the dorsolateral
prefrontal cortex (DLPFC, Brodmann areas 9 and 46) from non-psychiatric adult
Caucasian control donors (age > 13, N=121) was measured using the Illumina
HumanMethylation450 (“450k”) microarray (which measures CpG methylation across
473,058 probes covering 99% of RefSeq gene promoters). DNA for genotyping was
obtained from the cerebella of samples and genotyped with either the Illumina Human
Hap 650v3, 1M Duo V3, or Omni 5M BeadArrays and merged across the three platforms
following imputation to the 1000 Genomes Phase 3 reference panel as previously described11.
The mQTL analysis was then conducted using the R package MatrixEQTL14, fitting an
additive linear model up to 20kb distance between each SNP and CpG analyzed,
including covariates for ancestry and global epigenetic variation.

Moloc Analysis
Prior to running the analyses, the GWAS and eQTL datasets were filtered to remove
poorly imputed SNPs (only SNPs with Rsq > 0.3 were kept). The Major Histocompatibility Complex
(MHC) region (chr 6: 25 Mb - 35 Mb) was excluded from all co-localization analyses due
to the extensive linkage disequilibrium. We applied a gene-centric approach, defining
cis-regions as 50 kb upstream/downstream of the start/end of each gene,
since our goal is to link risk variants with changes in gene expression. We evaluated all
methylation probes overlapping the cis-region. The number of cis-region/methylation
pairs is higher than the count of genes because, on average, there is more than one
methylation site per gene. Common SNPs were evaluated in the colocalization
analysis for each gene, and each methylation probe, and GWAS. In total, 14,115 cis-
regions and 534,962 unique cis-region/methylation probe pairs were tested. Genomic
regions were analyzed only if 50 or more SNPs were in common between all the
datasets. Across all of the analyses, a posterior probability equal to, or greater than,
80% for each configuration was considered evidence of colocalization.
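
The region-selection rules in this paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the data layout (a dict from SNP id to chromosome, position and Rsq), the function names, and the treatment of a single mQTL dataset rather than one test per methylation probe are all hypothetical simplifications.

```python
# Hypothetical layout: each dataset maps SNP id -> (chromosome, position, Rsq).
MHC_CHROM, MHC_START, MHC_END = "6", 25_000_000, 35_000_000  # chr 6: 25 Mb - 35 Mb
FLANK_BP = 50_000        # 50 kb upstream/downstream of the gene
MIN_RSQ = 0.3            # imputation quality threshold (Rsq > 0.3)
MIN_SHARED_SNPS = 50     # regions need at least 50 SNPs in common


def cis_region(gene_start, gene_end):
    """Return the cis-region: the gene body plus 50 kb on each side."""
    return max(0, gene_start - FLANK_BP), gene_end + FLANK_BP


def in_mhc(chrom, start, end):
    """True if the interval overlaps the excluded MHC region."""
    return chrom == MHC_CHROM and start <= MHC_END and end >= MHC_START


def snps_in_region(dataset, chrom, start, end, min_rsq=None):
    """SNP ids inside the interval, optionally dropping poorly imputed SNPs."""
    keep = set()
    for snp, (snp_chrom, pos, rsq) in dataset.items():
        if snp_chrom != chrom or not start <= pos <= end:
            continue
        if min_rsq is not None and rsq <= min_rsq:
            continue
        keep.add(snp)
    return keep


def shared_snps(chrom, gene_start, gene_end, gwas, eqtl, mqtl):
    """Shared SNPs for one cis-region, or None if the region is not analyzed."""
    start, end = cis_region(gene_start, gene_end)
    if in_mhc(chrom, start, end):
        return None
    # The Rsq filter is applied to GWAS and eQTL only, as stated in the text.
    shared = (snps_in_region(gwas, chrom, start, end, MIN_RSQ)
              & snps_in_region(eqtl, chrom, start, end, MIN_RSQ)
              & snps_in_region(mqtl, chrom, start, end))
    return shared if len(shared) >= MIN_SHARED_SNPS else None


def colocalized(posteriors, threshold=0.8):
    """Scenarios whose posterior probability is at least 80%."""
    return [name for name, pp in posteriors.items() if pp >= threshold]
```

A region that returns `None` is skipped. Otherwise the shared SNP set would be passed on to the colocalization step, and `colocalized` applies the 80% reporting threshold to the resulting posterior probabilities.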
To compare an existing method for two-trait colocalization with the three-trait
analysis, we applied moloc using the same region definitions, but with two traits
instead of three, as well as a previously developed method (coloc2). Effect sizes and
variances were used as opposed to p-values, as this strategy achieves greater
accuracy when working with imputed data2.

Simulations
We simulated genotypes by sampling with replacement among haplotypes of
SNPs with a minor allele frequency of at least 5% found in the phased 1000 Genomes
Project within 49 genomic regions that have been associated with type 1 diabetes (T1D)
221 susceptibility loci (excluding the major histocompatibility complex (MHC) as previously
222 described15. These represent a range of region sizes and genomic topography that
223 reflect typical GWAS hits in a complex trait. For each trait, two, or three “causal
224 variants” were selected at random, and a Gaussian distributed quantitative trait for
225 which each causal variant SNP explains a specified proportion of the variance was
226 simulated. All analyses were conducted in R.
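The trait simulation step can be sketched in Python as follows. This is a simplified illustration only: it draws genotypes at a single causal SNP under Hardy-Weinberg equilibrium rather than resampling 1000 Genomes haplotypes, and the function names, seed, and parameter values are assumptions.

```python
import math
import random

def simulate_trait(n, maf, var_explained, rng):
    """Simulate genotypes at one causal SNP and a Gaussian trait with unit variance."""
    # Genotype = number of minor alleles, assuming Hardy-Weinberg equilibrium.
    genotypes = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    # Effect size chosen so that beta^2 * Var(genotype) equals var_explained.
    beta = math.sqrt(var_explained / (2.0 * maf * (1.0 - maf)))
    noise_sd = math.sqrt(1.0 - var_explained)
    trait = [beta * g + rng.gauss(0.0, noise_sd) for g in genotypes]
    return genotypes, trait

def squared_correlation(x, y):
    """Squared Pearson correlation: share of variance in y explained by x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
genotypes, trait = simulate_trait(n=5000, maf=0.3, var_explained=0.1, rng=rng)
r2 = squared_correlation(genotypes, trait)  # expected to be close to 0.1
```

The empirical squared correlation serves as a sanity check that the simulated SNP explains roughly the requested proportion of trait variance.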
227 RESULTS
228 Overview of the moloc method
229 In this study, we demonstrate the use of moloc on three traits that have been
230 measured in distinct datasets of unrelated individuals, GWAS (defined as G), eQTL
231 (defined as E) and mQTL (defined as M). If we consider three traits, there can be up to
232 three causal variants and 15 possible scenarios summarizing how the variants are
233 shared among the traits. We can compute a probability of the data under each
234 hypothesis by summing over the relevant configurations. Four examples of
235 configurations are shown in Figure 1. The “.” in the subscript denotes scenarios
236 supporting different causal variants. For instance, GE summarizes the scenario for one
237 causal variant shared between traits GWAS and eQTL (Figure 1 - Right plot top panel);
238 GE.M summarizes the scenario with one causal variant for traits GWAS and eQTL, and
239 a different causal variant for trait mQTL (Figure 1 - Left plot bottom panel). We then
240 estimate the evidence in support of different scenarios using equation (3). The algorithm
241 outputs 15 posterior probabilities. We are most interested in the scenarios supporting a
242 shared causal variant for two and three traits, involving the eQTL trait.
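The count of 15 scenarios can be checked by enumeration: each scenario is a choice of which traits have a causal variant, together with a grouping of those traits by shared variant, plus the null. The Python sketch below is an illustration under that reading; the function names and the label "zero" for the null scenario are assumptions, and "X" stands for a hypothetical fourth trait.

```python
from itertools import combinations

def set_partitions(items):
    """Yield every way to split the list `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` in its own group ...
        yield [[first]] + partition
        # ... or add it to each existing group in turn.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]

def scenarios(traits):
    """List scenario labels such as 'GE.M'; 'zero' is the null (no association)."""
    order = {t: i for i, t in enumerate(traits)}
    labels = ["zero"]
    for size in range(1, len(traits) + 1):
        for subset in combinations(traits, size):
            for partition in set_partitions(list(subset)):
                groups = sorted(partition, key=lambda g: order[g[0]])
                labels.append(".".join("".join(g) for g in groups))
    return labels

three = scenarios("GEM")   # 15 labels, e.g. 'GE', 'GE.M', 'GEM'
four = scenarios("GEMX")   # X is a hypothetical fourth trait
```

With four traits the same enumeration yields 52 scenarios, which shows how quickly the hypothesis space grows as traits are added.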
243
244 Sample size requirements
245 We explored the posterior probability under different sample sizes. Figure S1
246 illustrates the posterior probability distribution across all of the possible scenarios that
247 include three traits: GWAS, eQTL and mQTL. With a GWAS sample size of 10,000
248 and eQTL and mQTL sample sizes of 300, the method provides reliable evidence to
249 detect a shared causal variant behind the GWAS and another trait (median posterior
250 probability of any hypothesis >50%). Although in this paper we analyze GWAS, eQTL
251 and mQTL, our method can be applied to any combinations of traits, including 2 GWAS
252 traits and an eQTL dataset. We explored the minimum sample size required when
253 analyzing two GWAS datasets (GWAS1, GWAS2) and one eQTL (Figure S2). The
254 method provides reliable evidence for all hypotheses when the two GWAS sample sizes
255 are 10,000 and eQTL sample size reaches 300.
256 It is instructive to observe where evidence for other hypotheses is distributed. Figure
257 2A illustrates the accuracy of our approach under different scenarios where two or three
258 causal variants are shared. For example, under simulations of one shared variant for
259 GWAS and eQTL and a second variant for mQTL (GE.M), on average 60% of the
260 evidence points to the simulated scenario, while 12% point to GE, 12% to G.E.M and
261 7.2% to GEM.
262 We examined whether the inclusion of a third trait increases power to detect
263 colocalization in comparison to running analysis with two traits on the same data. For
264 the colocalization of three traits, we consider any scenarios where there is evidence of
265 colocalization between the GWAS and eQTL datasets, i.e. GE, GE.M, GEM. We note
266 that using only eQTL data recovers fewer colocalizations with GWAS loci when there is
267 truly one single causal variant across the datasets (Figure 2B), providing additional
268 support for increasing power by adding mQTLs. In this study we focus on the
269 colocalization of GWAS with eQTL (GEM, GE.M or GE scenarios), due to smaller
270 sample size and limited power in the mQTL dataset.
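A minimal Python sketch of this decision rule is given below, assuming the GWAS-eQTL evidence is combined by summing the posteriors of the GE, GE.M and GEM scenarios and comparing the sum with the 80% threshold. The function name and the example posteriors are hypothetical.

```python
def colocalizes_with_eqtl(ppa, threshold=0.8):
    """Return True when the combined GWAS-eQTL evidence reaches the threshold.

    ppa: mapping from scenario label to posterior probability for one
         gene/methylation-probe pair (scenarios not listed count as 0).
    """
    combined = ppa.get("GE", 0.0) + ppa.get("GE.M", 0.0) + ppa.get("GEM", 0.0)
    return combined >= threshold

# Made-up posteriors for one region, for illustration only.
example = {"GEM": 0.55, "GE": 0.20, "GE.M": 0.10, "G": 0.10, "zero": 0.05}
ge_alone = example["GE"] >= 0.8               # GE by itself does not pass
three_trait = colocalizes_with_eqtl(example)  # the combined evidence does
```

In this made-up region, no single scenario passes the threshold, but the scenarios that share a GWAS-eQTL variant jointly do.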
271
272 Choice of priors
273 The algorithm requires the definition of prior probabilities at the SNP level for the
274 association with one (p1), two (p2), or three traits (p3). We set the priors to p1 = 1 x 10^-4,
275 p2 = 1 x 10^-6, p3 = 1 x 10^-7 based on simulations and exploratory analysis of genome-wide
276 enrichment of GWAS risk variants in eQTLs and mQTLs. We set the prior
277 probability that a variant is associated with one trait as 1 x 10^-4 for GWAS, eQTL and
278 mQTL, assuming that each genetic variant is equally likely a priori to affect gene
279 expression or methylation or disease. This estimate has been suggested in the literature
280 for GWAS16 and used in similar methods6. In Figure S3, we find eQTLs and mQTLs to
281 be similarly enriched in GWAS, justifying our choice of the same prior probability of
282 association across the two traits. These values are also suggested by a crude
283 approximation of p2 and p3 from the common genome-wide significant SNPs across the
284 three datasets.
285 We varied the prior probability that a variant is associated with all three traits in
286 simulations (Table S1). We find that our choice of priors has good control of false
287 positive rates under the GEM scenario (<1%) and the smallest sum across our
288 scenarios of interest. We ran our real data analyses using different priors, and report
289 our results under the most restrictive set of priors tested (Table S4). We note that our R
290 package implementation allows users to specify a different set of priors.
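To show how SNP-level priors enter the posterior, the Python sketch below works through the simpler two-trait case in the style of the colocalization test of reference 2, using the one-trait and two-trait priors quoted above. It is not the three-trait computation of equation (3); the function names, scenario labels, and the toy log Bayes factors are assumptions, and the region must contain at least two SNPs.

```python
import math

def logsumexp(values):
    """log(sum(exp(v))) computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def logdiff(a, b):
    """log(exp(a) - exp(b)), valid for a > b."""
    return a + math.log1p(-math.exp(b - a))

def two_trait_posteriors(labf1, labf2, p_one=1e-4, p_two=1e-6):
    """Posteriors for: no association, trait 1 only, trait 2 only,
    two distinct causal variants ('1.2'), one shared variant ('12').

    labf1, labf2: per-SNP log approximate Bayes factors for each trait.
    """
    shared = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = logsumexp(labf1), logsumexp(labf2), logsumexp(shared)
    log_h = {
        "zero": 0.0,
        "1": math.log(p_one) + s1,
        "2": math.log(p_one) + s2,
        # All SNP pairs minus the pairs where both traits use the same SNP.
        "1.2": 2 * math.log(p_one) + logdiff(s1 + s2, s12),
        "12": math.log(p_two) + s12,
    }
    total = logsumexp(list(log_h.values()))
    return {k: math.exp(v - total) for k, v in log_h.items()}

# Toy region with 5 SNPs: both traits peak at SNP 3 ...
post_shared = two_trait_posteriors([0, 0, 12, 0, 0], [0, 0, 10, 0, 0])
# ... versus peaks at different SNPs.
post_distinct = two_trait_posteriors([0, 0, 12, 0, 0], [10, 0, 0, 0, 0])
```

Lowering the shared-variant prior `p_two` shifts posterior mass from the shared scenario to the distinct-variant scenario, which is why a restrictive prior helps control false positives for colocalization.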
291
292 Co-localization of eQTL, mQTL and risk for Schizophrenia
293 We applied our method to SCZ GWAS using eQTLs derived from 467 CMC
294 samples and mQTL from 121 individuals. Our aim is to identify the genes important for
295 disease through colocalization of GWAS variants with changes in gene expression and
296 DNA methylation. We analyzed associations genome-wide, and report results both
297 across previously identified GWAS loci, and across potentially novel loci. While we
298 consider all 15 possible scenarios of colocalization, here we focus on gene discovery
299 due to higher power in our eQTL dataset, by considering the combined probabilities of
300 cases where the same variant is shared across all three traits GWAS, eQTLs and
301 mQTLs (GEM > 0.8) or scenarios where SCZ risk loci are shared with eQTL only (GE >
302 0.8 or GE.M > 0.8) (Table 1). We identified 1,173 cis-regions/methylation pairs with
303 posterior probability above 0.8 that are associated with all three traits (GEM), or eQTLs
304 alone (GE or GE.M). Overall, these biologically relevant scenarios involve 97 unique
305 genes. Fifty-six out of the 97 candidate genes influence SCZ, gene expression and
306 methylation (GEM>=0.8). One possible scenario is that the variants in these genes
307 could be influencing the risk of SCZ through methylation, although other potential
308 interpretations such as pleiotropy should be considered.
309
310 Addition of a third trait increases gene discovery
311 We examined whether moloc with 3 traits enhances power for GWAS and eQTL
312 colocalization compared to using 2 traits. Colocalization analysis of only GWAS and
313 eQTL traits identified 11 genes with GE >= 0.8 and two genes within a previously
314 associated SCZ LD block (FURIN and PCCB), which indicates a ~9 fold increase in the
315 genes discovered when we consider an additional trait. The 97 genes identified with a
316 high probability of influencing SCZ (GEM, GE, GE.M>=0.8) are listed in Table S3. The
317 89 additional genes that were found by adding methylation include genes such as
318 AS3MT that would have been missed by only GWAS and eQTL colocalization.
319
320 Loci overlapping reported SCZ LD blocks
321 Psychiatric Genomics Consortium (PGC) identified 108 independent loci and
322 annotated LD blocks around these, 104 of which are within non-HLA, autosomal regions
323 of the genome9. In Table 1 we report the number of identified gene-methylation pairs
324 and unique genes under each scenario that overlap one of these previously defined
325 SCZ LD blocks. We examined associations for 79 out of the 104 SCZ LD blocks. We
326 found colocalizations in 22 (or 28%) of the SCZ LD blocks examined with an average
327 gene density per block of 2. A total of 12,856 gene-methylation pairs overlap the SCZ LD regions,
328 and Figure 3A illustrates the average distribution of the posteriors across these regions.
329 Cumulatively, 12.3% of the evidence points to shared variation with an eQTL (GE,
330 GE.M and GEM). The majority of the evidence within these regions (62.2%) did not
331 reach support for shared variation across the three traits, with 19% not reaching
332 evidence for association with any traits, and 43.2% with only one of the three traits (35%
333 with GWAS; 6.4% with eQTL, 1.8% with mQTLs). The lack of evidence in these regions
334 could be addressed with greater sample sizes. Figure 3B shows the evidence for
335 colocalization of GWAS with eQTL or mQTL across the forty-four candidate genes. We
336 provide illustrative examples of SCZ association with expression and regulatory DNA
337 region in the FURIN locus (Figure 4 and Figure S4).
338
339 Potentially novel SCZ loci
340 We found 53 unique genes in regions below genome-wide significance (novel SCZ
341 associations). All genes were far from a SCZ LD block (more than 50kb, Table S4), and
342 contained SNPs with p-values for association with SCZ ranging from 10^-4 to 10^-9. These
343 genes will likely be identified using just the GWAS signal if the sample size is increased.
344 Among these genes is KCNN3, which encodes an integral membrane protein that forms
345 a voltage-independent calcium-activated channel. It regulates neuronal excitability by
346 contributing to the slow component of synaptic afterhyperpolarization17.
347
348 Comparison with previous findings
349 Our gene discovery analysis replicates several previous results10,18–20 (Table 2).
350 One recent study20 performed mQTL analysis on 1714 individuals from three
351 independent sample cohorts, and used colocalization between mQTLs and SCZ GWAS
352 to identify genomic regions associated with both schizophrenia and methylation. From
353 their analysis, 32 methylation probes have a posterior probability of colocalizing with
354 SCZ >=0.8. We analyzed 15 out of these methylation probes, and reproduced 7 for
355 colocalization of GWAS and methylation (combined GEM, GM, GM.E >=0.8:
356 cg00585072, cg01262667, cg02951883, cg08607108, cg08772003, cg19624444, cg26732615).
357 Hannon et al.20 annotated these regions with 26 genes. Since we integrate eQTL
358 information, our analysis points to specific genes responsible for these associations.
359
360 Association of gene expression with methylation
361 DNA methylation is one of the best-studied epigenetic modifications. Methylation
362 can alter gene expression by disrupting transcription factor binding sites (with variable
363 consequences to expression depending on the TF), or by attracting methyl-binding
364 proteins that initiate chromatin compaction and gene silencing. Therefore methylation
365 can be associated with either increased or decreased gene expression21,22. Increased
366 CpG methylation in promoter regions is usually associated with silencing of gene
367 expression23. However, in genome-wide expression and methylation studies, the
368 correlation of methylation and gene expression is low or the pattern of association is
369 mixed, even for CpG methylation within promoter22. One challenge of examining DNA
370 methylation with expression is the uncertainty of linking the CpG site with a specific
371 gene, especially for CpG sites that are distal to any coding genes. To overcome this
372 challenge, we sought to explore direction of effects of methylation and expression, for
373 gene expression and DNA methylation that colocalize with posterior probability above
374 0.8 (GEM, EM, and G.EM scenarios) (Table 1). Overall, we tested 2,227 DNA
375 methylation and gene expression pairwise interactions and found a significant negative
376 correlation between the effect sizes of methylation and expression in the proximity of
377 the transcription start site (Figure 5).
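The binned correlation analysis can be sketched in Python as follows. This is an illustrative reimplementation, not the code used for Figure 5: the function names and the toy data are assumptions, and bins are formed by sorting pairs on distance from the TSS and cutting them into groups of equal size.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def rho_by_distance_bin(pairs, n_bins=10):
    """pairs: (distance_from_tss, eqtl_effect, mqtl_effect) tuples.

    Returns (min_distance, max_distance, rho) for each equal-size bin.
    """
    ordered = sorted(pairs)
    size = len(ordered) // n_bins
    results = []
    for b in range(n_bins):
        stop = (b + 1) * size if b < n_bins - 1 else len(ordered)
        chunk = ordered[b * size:stop]
        rho = spearman([e for _, e, _ in chunk], [m for _, _, m in chunk])
        results.append((chunk[0][0], chunk[-1][0], rho))
    return results

# Toy data: opposite effects upstream of the TSS, concordant effects downstream.
toy = [(-4, 1, 4), (-3, 2, 3), (-2, 3, 2), (-1, 4, 1),
       (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]
binned = rho_by_distance_bin(toy, n_bins=2)
```

A rank-based correlation is used so that a few very large effect estimates do not dominate a bin.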
378 DISCUSSION
379 In this paper, we propose a statistical method for integrating genetic data from
380 molecular quantitative trait loci (QTL) mapping into genome-wide genetic association
381 analysis of complex traits. The proposed approach requires only summary-level
382 statistics and provides evidence of colocalization of their association signals. To our
383 knowledge, a method integrating more than two traits is lacking. In contrast to other
384 methods that attempt to estimate the true genetic correlation between traits such as LD
385 score regression24 and TWAS5, moloc focuses on genes that are detectable from the
386 datasets at hand. Thus, if the studies are underpowered, most of the evidence will lie in
387 the null scenarios.
388 We present one possible application of this approach in SCZ. In this application,
389 we focus on scenarios involving eQTLs and GWAS, alone or in combination with
390 mQTLs. Other scenarios are also biologically important. For example, colocalization of
391 GWAS and mQTL excluding eQTLs (GM.E scenario) could unveil important methylation
392 mechanisms affecting disease but not directly influencing gene expression in cis. We
393 report these and other scenarios in our web resource and encourage further
394 examination of these cases in future analyses. The GEM scenario is consistent with
395 SCZ risk association being mediated through changes in DNA methylation and gene
396 expression. While our method does not detect causal relationships among the
397 associated traits, i.e. whether risk allele leads to changes in gene expression through
398 methylation changes or vice versa, there is evidence supporting the notion that risk
399 alleles might affect transcription factor binding and epigenome regulation that drives
400 downstream alterations in gene expression21,25.
401 We provide posterior probabilities supporting respective hypotheses for each
402 gene-methylation pair analyzed, and the SNP for each trait with the highest probability
403 of colocalization with any other trait. For example, the SNP with the highest posterior
404 probability of GWAS colocalization with eQTL or mQTL will be computed from PPA of
405 GE + GE.M + GM + GM.E + GEM. However, the aim of this method is not fine-mapping
406 of SNPs and we encourage researchers to further analyze the identified local
407 associations with methods better suited for fine-mapping.
408 We assign a prior probability that a SNP is associated with one trait (1 x 10^-4), with
409 two (1 x 10^-6), and with three traits (1 x 10^-7). We have shown with simulations that these
410 are reasonable choices for the particular datasets at hand. Moreover, we show that
411 eQTLs and mQTLs are similarly enriched in GWAS; however, these
412 choices are debatable. One solution is to estimate priors for the different combinations of
413 datasets. Pickrell et al.3 proposed estimation of enrichment parameters from genome-
414 wide results maximizing a posteriori estimates for two traits. For multiple traits, another
415 possibility is using deterministic approximation of posteriors26. We leave these
416 explorations to future research.
417 We note that this approach can be extended to more than three traits. However,
418 the number of possible combinations increases exponentially as the number of traits
419 increases, therefore computation time is a limiting factor and realistically it works well for
420 up to four traits. Owing to the increasing availability of summary statistics from multiple
421 datasets, the systematic application of this approach can provide clues into the
422 molecular mechanisms underlying GWAS signals and how regulatory variants influence
423 complex diseases.
424 CONFLICTS OF INTEREST
425 None reported.
426 SUPPLEMENTAL DATA
427 Supplemental Data include four figures and four tables.
428
429 CONSORTIA
430 The CommonMind Consortium includes: Menachem Fromer, Panos Roussos, Solveig K
431 Sieberts, Jessica S Johnson, Douglas M Ruderfer, Hardik R Shah, Lambertus L Klei,
432 Kristen K Dang, Thanneer M Perumal, Benjamin A Logsdon, Milind C Mahajan, Lara M
433 Mangravite, Hiroyoshi Toyoshiba, Raquel E Gur, Chang-Gyu Hahn, Eric Schadt, David
434 A Lewis, Vahram Haroutunian, Mette A Peters, Barbara K Lipska, Joseph D Buxbaum,
435 Keisuke Hirai, Enrico Domenici, Bernie Devlin, Pamela Sklar
436
437 ACKNOWLEDGMENTS
438 We would like to thank Chris Wallace at the Department of Medical Genetics, NIHR
439 Cambridge Biomedical Research Centre, Cambridge Institute for Medical Research,
440 University of Cambridge, Cambridge, UK.
441 This work was supported by the National Institutes of Health (R01AG050986 Roussos
442 and R01MH109677 Roussos), Brain Behavior Research Foundation (20540 Roussos),
443 Alzheimer's Association (NIRG-340998 Roussos) and the Veterans Affairs (Merit grant
444 BX002395 Roussos). Additionally, this work was supported in part through the
445 computational resources and staff expertise provided by Scientific Computing at the
446 Icahn School of Medicine at Mount Sinai.
447 Data were generated as part of the CommonMind Consortium supported by funding
448 from Takeda Pharmaceuticals Company Limited, F. Hoffman-La Roche Ltd and NIH
449 grants R01MH085542, R01MH093725, P50MH066392, P50MH080405,
450 R01MH097276, RO1-MH-075916, P50M096891, P50MH084053S1, R37MH057881
451 and R37MH057881S1, HHSN271201300031C, AG02219, AG05138 and MH06692.
452 Brain tissue for the study was obtained from the following brain bank collections: the
453 Mount Sinai NIH Brain and Tissue Repository, the University of Pennsylvania
454 Alzheimer’s Disease Core Center, the University of Pittsburgh NeuroBioBank and Brain
455 and Tissue Repositories and the NIMH Human Brain Collection Core. CMC Leadership:
456 Pamela Sklar, Joseph Buxbaum (Icahn School of Medicine at Mount Sinai), Bernie
457 Devlin, David Lewis (University of Pittsburgh), Raquel Gur, Chang-Gyu Hahn (University
458 of Pennsylvania), Keisuke Hirai, Hiroyoshi Toyoshiba (Takeda Pharmaceuticals
459 Company Limited), Enrico Domenici, Laurent Essioux (F. Hoffman-La Roche Ltd), Lara
460 Mangravite, Mette Peters (Sage Bionetworks), Thomas Lehner, Barbara Lipska (NIMH).
461
462 WEB RESOURCES
463 We developed a web site to visualize the colocalization results of SCZ GWAS, eQTL,
464 mQTLs under all possible scenarios (icahn.mssm.edu/moloc). The browser allows
465 searches by gene, methylation probe, and scenario of interest. The moloc method is
466 available as an R package from https://github.com/clagiamba/moloc.
467 FIGURE TITLES AND LEGENDS
[Figure 1 image: four panels (GEM, GE, GE.M, G.E.M), each plotting -log10(p) against genomic position for the GWAS, eQTL and mQTL datasets. Binary association vectors shown in the panels: GEM: G = E = M = 0 0 0 1 0 0 0 0. GE: G = E = 0 0 0 1 0 0 0 0, M = 0 0 0 0 0 0 0 0. GE.M: G = E = 0 0 0 1 0 0 0 0, M = 0 0 0 0 0 0 1 0. G.E.M: G = 0 0 0 1 0 0 0 0, E = 0 1 0 0 0 0 0 0, M = 0 0 0 0 0 0 1 0.]
469 Figure 1. Graphical representation of four possible configurations at a locus with 8
470 SNPs in common across three traits. The traits are labeled as G, E, M representing
471 GWAS (G), eQTL (E), and mQTL (M) datasets, respectively. Each plot represents one
472 possible configuration, which is a possible combination of 3 sets of binary vectors
473 indicating whether the variant is associated with the selected trait. Left plot top panel
474 (GEM scenario): points to one causal variant behind all of the associations; Right plot
475 top panel (GE scenario): represents the scenario with the same causal variant behind the
476 GE and no association or lack of power for the M association; Left plot bottom panel
477 (GE.M scenario): represents the case with two causal variants, one shared by the G
478 and E, and a different causal variant for M; Right plot bottom panel (G.E.M. scenario):
479 represents the case of three distinct causal variants behind each of the datasets
480 considered.
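[Figure 2 image: A. Distribution of posterior probabilities (y-axis, "PPA distribution", 0 to 1) for each hypothesis (PPA.zero, PPA.G, PPA.E, PPA.M, PPA.GE, PPA.GM, PPA.EM, PPA.G.E, PPA.G.M, PPA.E.M, PPA.GEM, PPA.GE.M, PPA.GM.E, PPA.G.EM, PPA.G.E.M) under each simulated scenario (x-axis, "Simulated scenarios"). B. Venn diagram of "Two Traits" versus "Three Traits" colocalizations, with region counts 584 and 303.]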
482
483 Figure 2. A. Results from simulations under colocalization/non-colocalization scenarios
484 using a sample size of 10,000 individuals for GWAS trait (denoted as G), 300 for eQTL
485 trait (denoted as E), and 300 for mQTL trait (denoted as M). X-axis shows all 15
486 simulated scenarios, e.g. G.E.M, three different causal variants for each of the three
487 traits; G.EM, 2 different causal variants, one for G and one shared between E and M;
488 GE, 1 shared causal variant for G and E; GE.M, 2 different causal variants, one shared
489 between G and E and one for M; GEM, one causal variant shared between all the three
490 traits. The y-axis shows the distribution of posterior probabilities under the simulated
491 scenario. B. Venn diagram comparing number of colocalization of two traits (coloc PPA
492 >=80%) with three traits (moloc PPA GE + GE.M + GEM) in simulations with one causal
493 variant shared between all the three traits (GEM). Results include 887 out of 1,000
494 simulations passing 80% threshold for colocalization. The proportion of trait variance
495 explained by the causal variant is 0.01 (1%) for GWAS, and 0.1 (10%) for the eQTL and mQTL.
496
497
[Figure 3 image: A. Pie chart of mean posterior probability per scenario: zero 19.5%, G 34.8%, E 6.5%, M 1.8%, G.E 15.6%, G.M 3.5%, E.M 0.5%, GE 9.0%, GM 2.1%, EM 0.3%, GE.M 0.7%, GM.E 0.7%, G.EM 1.0%, G.E.M 1.4%, GEM 2.6%. B. Heatmap (color scale 0 to 1) with columns GEM, GE.M, GE, GM, GM.E and rows labeled by gene (number of methylation probes): chr1: ENSG00000252254 (78), RERE (106), SLC45A1 (82), SDCCAG8 (32); chr2: SF3B1 (6), C2orf47 (3); chr3: CNTN4 (56), MUSTN1 (55), PCCB (9); chr4: CLCN3 (26); chr5: MAN2A1 (36), CTNNA1 (24), IK (110), NDUFA2 (74), PCDHA2 (96), PCDHA7 (83), PCDHA8 (58), WDR55 (121), ZMAT2 (82); chr8: ENSG00000253553 (24); chr10: ARL3 (55), AS3MT (50), CNNM2 (52), NT5C2 (38), WBP1L (68); chr14: ENSG00000201184 (68), KLC1 (135), PPP1R13B (60); chr15: EFTUD1P1 (10), ENSG00000225151 (5), FES (56), FURIN (66); chr16: DOC2A (57), INO80E (57), PRMT7 (42); chr17: DRG2 (33), TOM1L2 (15); chr19: GATAD2A (71), MAU2 (30), YJEFN3 (68); chr22: ENSG00000266175 (14), ENSG00000269104 (16), LINC00634 (60), WBP2NL (83).]
499 Figure 3. Summary of genes identified using three-trait colocalization within the SCZ LD
500 blocks. A. Mean posterior probability for each hypothesis computed using the cis-
501 regions overlapping the SCZ LD blocks. Sections of the pie chart represent the 15
502 scenarios representing the possible combination of the three traits. The “.” between the
503 traits denotes scenarios supporting different causal variants. The combined scenarios
504 GE, GE.M, GEM account for 12.3%. B. Heatmap displaying the maximum posterior
505 probabilities reached by the 44 regions overlapping known SCZ LD blocks (gene,
506 number of methylation probes).
[Figure 4 image: scatter plot of Z-score expression (x-axis) against Z-score methylation (y-axis); point legend: -log10 p-value (3, 6, 9).]
508 Figure 4. Illustration of one example of colocalization results with GWAS-eQTL-mQTL.
509 FURIN gene and cg24888049; Shown are Z-scores (regression coefficients/standard
510 errors) from association of expression (x-axis) and association of methylation (y-axis) at
511 the FURIN locus. The red point shows the SNP with the strongest evidence for eQTL,
512 mQTL, GWAS (rs4702).
513
[Figure 5 image: Spearman's rho (y-axis, about -0.4 to 0.2) by distance of methylation probe from TSS (x-axis), in ten bins: -99598 to -36877; -36839 to -18021; -17987 to -4473; -4449 to -539; -538 to 25; 27 to 791; 809 to 8032; 8041 to 20405; 20518 to 37352; 37359 to 99754.]
515 Figure 5. Spearman correlation of eQTL and mQTL effect estimates by distance from
516 transcription start site of the gene. Intervals of methylation probe distance from TSS
517 were estimated based on 10 equal size bins.
525 TABLE TITLES AND LEGENDS
526 Table 1. Number of genes with evidence of colocalization (PPA>=0.8) under each
527 scenario.
Scenario | Sharing of variant | Unique gene-methylation pairs (total PPA>=80%) | Unique genes (total PPA>=80%) | Unique genes overlapping SCZ LD blocks | Number of LD blocks
Null | No associations | 290,850 | 10,667 | 92 | 56
G | GWAS only | 4,445 | 222 | 149 | 63
E | eQTL only | 116,674 | 4,427 | 16 | 13
M | mQTL only | 23,662 | 6,150 | 41 | 28
G.E | GWAS not eQTL (2 causals) | 1,501 | 77 | 54 | 27
E.M | eQTL not mQTL (2 causals) | 8,713 | 2,324 | 7 | 6
G.M | GWAS not mQTL (2 causals) | 241 | 81 | 54 | 26
GE | GWAS,eQTL | 389 | 34 | 19 | 15
EM | eQTL,mQTL | 1,724 | 893 | 3 | 3
GM | GWAS,mQTL | 38 | 23 | 18 | 10
GM.E | GWAS,mQTL not eQTL (2 causals) | 21 | 12 | 8 | 5
G.EM | eQTL,mQTL not GWAS (2 causals) | 24 | 12 | 8 | 4
GE.M | GWAS,eQTL not mQTL (2 causals) | 35 | 18 | 10 | 7
G.E.M | not GWAS not eQTL not mQTL (3 causals) | 72 | 33 | 26 | 15
GEM | GWAS,eQTL,mQTL | 127 | 56 | 27 | 12
GEM or GE.M or GE | combined scenarios for GWAS,eQTL | 1,173 | 97 | 44 | 22
total | total | 534,962 | 14,115 | 291 | 78
529 Table 2. Summary of Previous Findings integrating SCZ GWAS, CMC eQTL and
530 methylation datasets.
Method used | Scenarios examined in our analysis | Validated scenarios (%) at PPA 0.8 | Genes validated
CMC10 | 22 | 13 (59%) | SF3B1, C2orf47, CNTN4, CLCN3, ENSG00000253553, PPP1R13B, EFTUD1P1, ENSG00000225151, FURIN, INO80E, TOM1L2, DRG2, MAU2, GATAD2A, WBP2NL
SMR27 | 9 | 4 (44.4%) | SF3B1, PCCB, C17ORF39, IRF3
SMR28 | GWAS+eQTL: 26 | 22 (85%) | AL022476.2, ALMS1P, CLCN3, DOC2A, DRG2, EFTUD1P1, ELAC2, EMB, FAM86B3P, FURIN, GATAD2A, GOLGA2P7, INO80E, JRK, PCCB, PCDHA7, RBBP5, RP11-45P15.4, SF3B1, SLC9B1, SLCO4C1, VPS37A
TWAS18 | GWAS+eQTL: 35; GWAS+eQTL+mQTL: 8 | GWAS+eQTL: 21 (60%); GWAS+eQTL+mQTL: 6 (75%) | GWAS+eQTL: ALMS1P, C2orf47, CPNE7, DOC2A, DRG2, ELOVL7, EMB, FURIN, GATAD2A, MAU2, MCHR1, NDUFA2, NT5C2, PCCB, PCDHA2, PRMT7, SEPT10, SF3B1, SLC45A1, TMEM81, ZMAT2; GWAS+eQTL+mQTL: SLC45A1, PCCB, NDUFA2, PCDHA2, ZMAT2, PRMT7
COLOC20 | GWAS+mQTL: 15 | 7 (46%) | cg00585072 and PCDHA8, PCDHA2, PCDHA7; cg01262667 and ENSG00000267629; cg02951883 and MAD1L1; cg08607108 and MAD1L1; cg19624444 and MAD1L1; cg08772003 and AS3MT, C10orf32; cg26732615 and GATAD2A, YJEFN3
532
533
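
Each validation percentage in Table 2 is the number of validated scenarios divided by the number of scenarios examined for that method. The following is a minimal Python sketch of that arithmetic, using counts copied from Table 2. The dictionary keys and the function name are illustrative assumptions and do not come from the original analysis code.

```python
# Counts copied from Table 2: (scenarios examined, scenarios validated at PPA 0.8).
# The keys are illustrative labels chosen for this sketch, not identifiers
# from the original analysis.
table2_counts = {
    "CMC": (22, 13),
    "SMR (ref. 27)": (9, 4),
    "SMR (ref. 28), GWAS+eQTL": (26, 22),
    "TWAS, GWAS+eQTL": (35, 21),
    "TWAS, GWAS+eQTL+mQTL": (8, 6),
    "COLOC, GWAS+mQTL": (15, 7),
}


def validated_percent(examined, validated):
    """Return the percentage of examined scenarios that were validated."""
    return 100.0 * validated / examined


percentages = {
    method: validated_percent(examined, validated)
    for method, (examined, validated) in table2_counts.items()
}

for method, pct in percentages.items():
    print(f"{method}: {pct:.1f}%")
```

Printing one decimal place makes it easy to compare each value with the whole-number percentages reported in the table.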
REFERENCES
1. Visscher, P.M., Brown, M.A., McCarthy, M.I., and Yang, J. (2012). Five Years of GWAS Discovery. Am. J. Hum. Genet. 90, 7–24.
2. Giambartolomei, C., Vukcevic, D., Schadt, E.E., Franke, L., Hingorani, A.D., Wallace, C., and Plagnol, V. (2014). Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383.
3. Pickrell, J.K., Berisa, T., Liu, J.Z., Ségurel, L., Tung, J.Y., and Hinds, D.A. (2016). Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717.
4. Pickrell, J.K. (2014). Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573.
5. Gusev, A., Ko, A., Shi, H., Bhatia, G., Chung, W., Penninx, B.W.J.H., Jansen, R., de Geus, E.J.C., Boomsma, D.I., Wright, F.A., et al. (2016). Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252.
6. Hormozdiari, F., van de Bunt, M., Segrè, A.V., Li, X., Joo, J.W.J., Bilow, M., Sul, J.H., Sankararaman, S., Pasaniuc, B., and Eskin, E. (2016). Colocalization of GWAS and eQTL Signals Detects Target Genes. Am. J. Hum. Genet. 99, 1245–1260.
7. Kichaev, G., Yang, W.-Y., Lindstrom, S., Hormozdiari, F., Eskin, E., Price, A.L., Kraft, P., and Pasaniuc, B. (2014). Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies. PLoS Genet. 10, e1004722.
8. ENCODE Project Consortium (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.
9. Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427.
10. Fromer, M., Roussos, P., Sieberts, S.K., Johnson, J.S., Kavanagh, D.H., Perumal, T.M., Ruderfer, D.M., Oh, E.C., Topol, A., Shah, H.R., et al. (2016). Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nat. Neurosci. 19, 1442–1453.
11. Jaffe, A.E., Gao, Y., Deep-Soboslay, A., Tao, R., Hyde, T.M., Weinberger, D.R., and Kleinman, J.E. (2016). Mapping DNA methylation across development, genotype and schizophrenia in the human frontal cortex. Nat. Neurosci. 19, 40–47.
12. Wakefield, J. (2009). Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol. 33, 79–86.
13. Ip, H., Jansen, R., Abdellaoui, A., Bartels, M., Boomsma, D.I., and Nivard, M.G. (2017). Stratified Linkage Disequilibrium Score Regression reveals enrichment of eQTL effects on complex traits is not tissue specific. bioRxiv.
14. Shabalin, A.A. (2012). Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358.
15. Wallace, C. (2013). Statistical testing of shared genetic control for potentially related traits. Genet. Epidemiol. 37, 802–813.
16. Stephens, M., and Balding, D.J. (2009). Bayesian statistical methods for genetic association studies. Nat. Rev. Genet. 10, 681–690.
17. Deignan, J., Luján, R., Bond, C., Riegel, A., Watanabe, M., Williams, J.T., Maylie, J., and Adelman, J.P. (2012). SK2 and SK3 expression differentially affect firing frequency and precision in dopamine neurons. Neuroscience 217, 67–76.
18. Gusev, A., Mancuso, N., Finucane, H.K., Reshef, Y., Song, L., Safi, A., Oh, E., O’Donovan, M.C., Katsanis, N., Crawford, G.E., et al. Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights.
19. Zhu, Z., Zhang, F., Hu, H., Bakshi, A., Robinson, M.R., Powell, J.E., Montgomery, G.W., Goddard, M.E., Wray, N.R., Visscher, P.M., et al. (2016). Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487.
20. Hannon, E., Dempster, E., Viana, J., Burrage, J., Smith, A.R., Macdonald, R., St Clair, D., Mustard, C., Breen, G., Therman, S., et al. (2016). An integrated genetic-epigenetic analysis of schizophrenia: evidence for co-localization of genetic associations and differential DNA methylation. Genome Biol. 17, 176.
21. Tak, Y.G., and Farnham, P.J. (2015). Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenetics Chromatin 8, 57.
22. Wagner, J.R., Busche, S., Ge, B., Kwan, T., Pastinen, T., and Blanchette, M. (2014). The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome Biol. 15, R37.
23. Du, X., Han, L., Guo, A.-Y., and Zhao, Z. (2012). Features of methylation and gene expression in the promoter-associated CpG islands using human methylome data. Comp. Funct. Genomics 2012, 598987.
24. Bulik-Sullivan, B., Finucane, H.K., Anttila, V., Gusev, A., Day, F.R., Loh, P.-R., Duncan, L., Perry, J.R.B., Patterson, N., Robinson, E.B., et al. (2015). An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241.
25. Li, Y.I., van de Geijn, B., Raj, A., Knowles, D.A., Petti, A.A., Golan, D., Gilad, Y., and Pritchard, J.K. RNA splicing is a primary link between genetic variation and disease.
26. Wen, X., Pique-Regi, R., and Luca, F. (2017). Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization. PLoS Genet. 13.
27. Zhu, Z., Zhang, F., Hu, H., Bakshi, A., Robinson, M.R., Powell, J.E., Montgomery, G.W., Goddard, M.E., Wray, N.R., Visscher, P.M., et al. (2016). Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487.
28. Hauberg, M.E., Zhang, W., Giambartolomei, C., Franzén, O., Morris, D.L., Vyse, T.J., Ruusalepp, A., Sklar, P., Schadt, E.E., Björkegren, J.L.M., et al. (2017). Large-Scale Identification of Common Trait and Disease Variants Affecting Gene Expression. Am. J. Hum. Genet. 100, 885–894.