<<

Journal of the American Statistical Association

ISSN: 0162-1459 (Print) 1537-274X (Online) Journal homepage: http://www.tandfonline.com/loi/uasa20

On the Null Distribution of Bayes Factors in

Quan Zhou & Yongtao Guan

To cite this article: Quan Zhou & Yongtao Guan (2017): On the Null Distribution of Bayes Factors in Linear Regression, Journal of the American Statistical Association, DOI: 10.1080/01621459.2017.1328361 To link to this article: https://doi.org/10.1080/01621459.2017.1328361

© 2018 The Author(s). Published with license by Taylor & Francis© Quan Zhou and Yongtao Guan

View supplementary material

Accepted author version posted online: 27 Jul 2017. Published online: 08 Jun 2018.

Submit your article to this journal

Article views: 1062

View Crossmark

Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=uasa20 JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION , VOL. , NO. , –, Theory and Methods https://doi.org/./..

On the Null Distribution of Bayes Factors in Linear Regression

Quan Zhou and Yongtao Guan Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX

ABSTRACT ARTICLE HISTORY We show that under the null, the 2 log(Bayes factor) is asymptotically distributed as a weighted sum of chi- Received September  squared random variables with a shifted . This claim holds for Bayesian multi-linear regression with Revised April  a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and the normal prior. KEYWORDS Our results have three immediate impacts. First, we can compute analytically a p-value associated with a p-Value; Scaled Bayes factor; Bayes factor without the need of permutation. We provide a software package that can evaluate the p- Weighted sum of chi-squared value associated with Bayes factor efficiently and accurately. Second, the null distribution is illuminating random variables to some intrinsic properties of Bayes factor, namely, how Bayes factor quantitatively depends on prior and the genesis of Bartlett’s paradox. Third, enlightened by the null distribution of Bayes factor, we formulate a novel scaled Bayes factor that depends less on the prior and is immune to Bartlett’s paradox. When two tests have an identical p-value, the test with a larger power tends to have a larger scaled Bayes factor, a desirable property that is missing for the (unscaled) Bayes factor. Supplementary materials for this article are available online.

1. Introduction about the prior effects, one would risk prior-misspecification, Bayesian methods have been sidelined by most practitioners which also unintentionally favors the null.) We identified the in genetic association studies. The main reason is that p-value, dominant term in Bayes factor that is affected by the prior, which although often misinterpreted, is entrenched among practition- motivates us to systematically scale Bayes factor. The scaled ers (Sellke, Bayarri, and Berger 2001; Nuzzo 2014). A Bayesian Bayes factor depends weakly on the prior and no longer suffers method that performs genetic association tests, such as that from the Barlett’s paradox. of Guan and Stephens (2008), often faces an inconvenient Our third motivation is to emphasize the necessity of taking demand to produce a p-value associated with an extraordinarily into account power to interpret p-values. The probability that large Bayes factor. Because the null distribution of Bayes factor a positive report is false depends on both the p-value and the is unknown, a previous solution has been to obtain the p-value (Wacholder et al. 2004). A plainer reiteration through permutation (Servin and Stephens 2007). In genome- of this insightful observation is that a small p-value alone can- wide association studies, however, the significance threshold for not provide a strong evidence for a true association, and it has p-valuesisexceedinglysmall,owingtotheburdenofmulti- to be interpreted in light of the statistical power (Burton et al. ple testing; thus, the number of permutations required is often 2007). A large Bayes factor, however, by itself provides a strong prohibitively large and hence impractical. This motivates us to evidence for a true association (Stephens and Balding 2009). quantify the null distribution of Bayes factors, from which we And Sawcer (2010) related Bayes factor to the ratio between the can compute a p-value associated with Bayes factor analytically, power and the p-value. It is therefore beneficial for a study to without the need of permutation. report both Bayes factors and their associated p-values. The idea Our second motivation is to understand Bayes factor itself, of computing a p-value associated with Bayes factor dates back first and foremost to understand, quantitatively, the prior- to Good (1957), as a symbol of “Bayes/non-Bayes compromise” dependence nature of Bayes factors. Such prior-dependency (Good 1992). The p-values will satisfy the practical mandate often turns away naive practitioners. The second is to investi- imposed by the research community, and the companion Bayes gate the root of inconsistency of Bayes factor, namely, Bartlett’s factors will remind one to account for power when interpreting paradox (Bartlett 1957). Bartlett discovered that a diffusive prior p-values. For example, tests may be ranked differently by their tends to favor, unintentionally, the null model. In other words, if p-values than by their Bayes factors, and two identical p-values one were uncertain about the prior effects, one would automat- may associate with different Bayes factors. Both examples high- ically favor the null. (On the other hand, if one were too certain light the necessity of taking into account the statistical power to

CONTACT Yongtao Guan [email protected] USDA/ARS Children’s Nutrition Research Center, Department of Pediatrics, and Department of Molecular and Human Genetics, Baylor College of Medicine,  Bates, Room , Houston, TX , USA. Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/JASA. Supplementary materials for this article are available online. Please go to www.tandfonline.com/r/JASA. ©  Quan Zhou and Yongtao Guan. Published with license by Taylor and Francis. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/./), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. 2 Q. ZHOU AND Y. GUAN

interpret p-values. To this end, our scaled Bayes factor becomes where V a and V b are some positive definite matrices, and the more appealing. When two tests produce identical p-values, the gamma distribution is in the shape-rate parameterization. Fol- scaled Bayes factor tends to assign a larger value to the test with lowing the standard treatment (see Servin and Stephens 2007) V −1 → 0 κ ,κ → a larger power, while the (unscaled) Bayes factor does not. to let a and 1 2 0, we can compute Bayes factor in Our main result states that, under the null, the logarithm the closed form of Bayes factor has the same distribution as a weighted sum of − / − − / BF =|V | 1 2|Xt X + V 1| 1 2 chi-squared random variables with a shifted mean. The results b b  −n/2 hold asymptotically for Bayesian multi-linear regression. For yt X(Xt X + V −1)−1Xt y , we have 2 log(Bayes factor) = λχ2 + × 1 − b , (3) 1 yt y − ytW (W tW )−1W t y log(1 − λ),whereλ is a quantity related to the prior and the χ 2 data, and 1 denotes a chi-squared of one degree of freedom (denoted by d.f.). The undesirable properties where   − of Bayes factor, namely, its over-dependence on the prior and X = I − W (W tW ) 1W t L (4) Bartlett’s paradox, find their roots in the shift term log(1 − λ). n Our scaled Bayes factor effectively substitutes this term with is the residuals of L after regressing out W,and|·|denotes the −λ ( ) = λ(χ2 − ). to achieve 2 log scaled Bayes factor 1 1 For sim- determinant. Since W is assumed to contain a column of 1, each ple linear regression, the p-value associated with a Bayes fac- column in X is therefore centered. tor is the same as the p-value of the likelihood ratio test. For Bayes factor in (3) uses the null as the base model and is multi-linear regression, computing the p-value associated with thus called the null-based Bayes factor (see Liang et al. 2008), a Bayes factor requires evaluation of the distribution function which has been widely used in genetic association studies (Bald- of a weighted sum of chi-squared random variables. Based on ing 2006;Marchinietal.2007;GuanandStephens2008;Xuand a recently published polynomial algorithm (Bausch 2013), we Guan 2014). developed a software package to evaluate the p-values analyti- V −1 → ,κ ,κ → The use of the improper prior a 0 1 2 0hastwo cally, which can efficiently achieve an arbitrary precision. merits. First, the limiting prior distributions for a and τ is equiv- The article is structured as follows. In Section 2,weformulate alenttoJeffreys’prior(IbrahimandLaud1991; O’Hagan and themodelandthepriors,andprovideourmainresultonthe Forster 2004), p(a,τ)∝ τ (q−2)/2, which is the standard choice null distribution of Bayes factors. In Section 3, we demonstrate of the noninformative prior for the nuisance parameters in the how to compute a p-value associated with a Bayes factor. In literature. Moreover, the posterior distributions for a and τ are Section 4, we introduce the scaled Bayes factor and demonstrate proper. Second, Bayes factor in (3), which can be written as the its benefits. In Section 5,weanalyzearealdatasettocompute limit of a sequence of Bayes factors with proper priors (see proof and compare Bayes factor, the scaled Bayes factor, and the in supplementary), is invariant to the shifting and scaling of y (or p-values associated with Bayes factors. In the last section, we independent of a and τ). To see this, replace y with the standard- summarize our findings and discuss relevant (future) topics. All ized random variable proofsareinthesupplementaryonline. def / z = τ 1 2(y − Wa), (5)

2. The Null Distribution of Bayes Factor and one can check (3)stillholds. b We consider the standard hypothesis testing problem in linear We assume a priori that the expectation of is zero so that regression with independent normal errors. thedirectionoftheeffecthasnoinfluenceonBayesfactor.This prior for b is commonly adopted in practice (see Jeffreys 1961, − chap. 5). For the NIG prior, we further assume independence H : y | a,τ ∼ MVN(Wa,τ 1I ), 0 n between the effects and the covariates. Henceforth when we refer y | a, b,τ ∼ (Wa+ Lb,τ−1I ), H1: MVN n (1) to the NIG prior, we mean

V = σ 2I , where MVN stands for the multivariate normal distribution, In b b p (6) is an n × n identity matrix, W is a full-rank n × q matrix rep- resenting the nuisance covariates, including a column of 1, a is unless otherwise stated. The NIG prior and the corresponding a q-vector, L is an n × p matrix representing the covariates of Bayes factor will be the primary focus of this article. interest, b is a p-vector, and τ −1 is the error . The two The second is Zellner’s g-prior (Zellner 1986; Liang et al. 2008): models H0 and H1 arenestedandthenullmodelH0 represents no effect of L. p(a,τ)∝ 1/τ, The Bayesian linear regression comes with three forms of   g (7) conjugate priors in the literature. The first is the normal-inverse- b | τ ∼ 0, (Xt X )−1 . MVN τ gamma (NIG) prior (O’Hagan and Forster 2004,chap.9), detailed below: ThiscanbeseenasaspecialcaseoftheNIGpriorwithV b = − − g(Xt X ) 1 and thus Bayes factor under the g-prior can be derived a | τ ∼ MVN(0,τ 1V ), a straightforwardly from (3). b | τ ∼ (0,τ−1V ), MVN b (2) Thethirdconjugateprior,thenormalprior,canalsobe τ ∼ Gamma(κ1/2,κ2/2), viewed as a special case of the NIG prior, when the error variance JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 3

τ −1 is assumed known: out the nuisance parameters over a reference distribution. Dif- −1 ferent Bayesian p-values can be computed using different ref- a | τ ∼ MVN(0,τ V a), (8) erence distributions (Robins, van der Vaart, and Ventura 2000, b | τ ∼ (0,τ−1V ). MVN b Table 1). Two well-known examples are the prior predictive − p-values (Box 1980)andtheposteriorpredictivep-values (Rubin Under this prior, and letting V 1 → 0, Bayes factor can also be a 1984;Meng1994),whichusethepriorandtheposteriorasthe computed analytically reference distributions, respectively. In our case, Bayes factors     =|V |−1/2Xt X + V −1−1/2 1zt X Xt X + V −1 −1Xt z , in (3)and(9) are independent of the nuisance parameters; thus, BF b b exp b 2 pB canbeviewedasaposteriorpredictivep-value. This conve- (9) nience can be viewed as a bonus from the improper prior we where X is defined in (4)andz is defined in (5). Because the error used. variance in most applications is unknown, the normal prior is Corollary 1. Denote by pF the p-value from the likelihood ratio more of theoretical interest. But, as we will see shortly, Bayes fac- test, then for simple linear regression, we have asymptotically tor with the normal prior is approximately equal to that with the pB = pF . NIGprior.Thisisnottoosurprisingbecausethedataarevery informative on the error variance. When p = 1, the right-hand side of (10)containsasingle chi-squared random variable Q ,whichisasymptoticallyequal Theorem 1. For multi-linear regression (1)withtheNIG 1 to the likelihood ratio test , and therefore the two p- prior (2), the g-prior (7), and the normal prior (8), under the values are equal. In addition, for simple linear regression (10) null Bayes factors (BF) given in (3)and(9)follow explains the linear relationship between log BF and the likeli- p hood ratio test statistic observed in Wakefield2008 ( )andGuan ( ) = λ + ( − λ ) + , 2log BF iQi log 1 i (10) and Stephens (2008). i=1 iid = (ut z)2 ∼ χ 2 z (λ , u ) 3.1. Weighted Sum of χ2 where Qi i 1 with defined in (5), and i i is 1 X(Xt X + V −1)−1Xt the ith eigenvalue-eigenvector pair of b with For multi-linear regression, the right-hand side of (10)contains X defined in (4). For the NIG prior and the g-prior,  = oP(1) aweightedsumofchi-squaredvariables.Theweightsλ1,...,λp vanishes in probability when the sample size n →∞.Forthe are functions of the prior σb and the eigenvalues of normal prior  = 0. the matrix X defined in (4). In contrast, the likelihood ratio p Theorem 1 states that under the null 2 log BF is distributed as test statistic is asymptotically equal to i=1 Qi and distributed χ 2 X a weighted sum of chi-squared random variables with a shifted as p , which does not take into account the eigenvalues of . mean, and the weights and the mean-shift can be computed. Consequently, pB no longer equals to pF in general. One excep- By evaluating the distribution function, we can obtain a p-value tion, however, is Bayes factor under the g-prior, where we have λ = /( + ) associated with Bayes factor. For simple linear regression, Q1 is i g g 1 for every i. asymptotically equal to the test statistic of the likelihood ratio To compute pB for multi-linear regression requires evaluat- test. Thus, the p-value associated with Bayes factor is the same ing the distribution function of a weighted sum of chi-squared as the p-value of the likelihood ratio test. random variables, a challenging problem. Fortunately, a recent λ ∈ , ) It is easy to see that i [0 1 . When the leading eigenvalue polynomial method by Bausch (2013) provides an efficient solu- p ( − λ ) tion. Our contribution here is its implementation. We have approaches 1, i=1 log 1 i goes to negative infinity, and so does the 2 log BF. Under two scenarios the leading eigenvalue implemented Bausch’s method in C++, which allows one to does approach 1: the sample size goes to infinity or the prior compute p-values (tail probabilities) to an arbitrary precision diffuses indefinitely. Thus, the prior-dependence nature of efficiently. Bayes factor and the Barlett’s paradox both find their roots in First, we briefly summarize Bausch’s method and then pro- p ( − λ ). vide more details of our implementation. Bausch pairs the chi- the term i=1 log 1 i Moreover, when the sample size p gets extraordinarily large, every λi approaches 1 and = λiQi squared variables to take advantage of the identity i 1     behaves like the likelihood ratio test statistic, which is dis- λ + λ λ − λ ( ) = ( λ λ )−1/2 − 1 2 2 1 , tributed as a chi-squared random variable with p degrees of fλ1Q1+λ2Q2 x 4 1 2 exp x I0 x 4λ λ 4λ λ freedom, a special case of Wilks’s 1938 theorem. 1 2 1 2 χ 2 where Q1 and Q2 are independent 1 random variables and I0 is the modified Bessel function of the first kind. I can be approx- 3. The p-Value Associated with Bayes Factor 0 imated, to an arbitrary precision, by its Taylor expansion, and Using Theorem 2.1,wecancomputeap-value associated with the series obtained can be integrated algebraically in the subse- Bayes factor given in (3), which henceforth is denoted by pB. quent convolutions. The error of this algorithm only depends on Since Bayes factor is a test statistic, pB is naturally a Frequentist the remainder terms of the Taylor expansions and thus can be p-value. We point out pB is also a Bayesian p-value. The p-values, quantified. Bausch showed that the complexity of this algorithm or tail probabilities, are frequently computed in Bayesian con- is polynomial in p. text to check whether the model provides a good fit to the data. In our implementation, the weights (λ1,...,λp)aresortedin Bayesian p-values can be computed by comparing the observed a descending order and the chi-squared variables are then paired test statistic to a predictive distribution obtained by integrating consecutively. If p is odd, the term with the smallest weight is 4 Q. ZHOU AND Y. GUAN

described in Servin and Stephens (2007). Since pB is quantified asymptotically, we are compelled to evaluate the accuracy and calibration of pB for small sample sizes. We also computed the likelihood ratio test p-value pF as a comparison because of its intrinsic connection to Bayes factor (and hence pB)shownin Theorem 1. We used a GWAS dataset to perform our simulation studies. The details of the dataset are provided in Section 5.Forgiven n and p, we randomly sampled a subset of genotypes of n indi- viduals and p SNPs. Then we simulated y under the null, that is, y ∼ MVN(0, In). For each pair of sampled genotypes and sim- ulated phenotypes, we computed pB,usingσb = 0.2, and pF .We chose n = 100, 300 and p = 1, 5, 10, 20. For every combination of n and p,werepeatedthesimulations107 times. For p = 1, we can compute the exact p-value associated with Bayes factor using the F test (see proof in supplementary materials). The compar- ison between exact values of pB and pB obtained from asymp- totic approximation is shown in the top row of Figure 2.For p > 1, true values of pB cannot be obtained analytically; we thus . compared our asymptotic results against the theoretical uniform Figure . Speed of evaluating pB The plot shows the time spent (y-axis) evaluating χ 2 =  density functions to obtain  p-values for different number of 1 compo- distribution. The two bottom rows show that for n 100, the nents (– on x-axis). The p-values were evaluated with relative error < 10−5. asymptotic results are conservative for small p-values; but for n = 300, the asymptotic results appear to be well-aligned with retained for a numerical convolution in the last step. This pairing the theoretical prediction. Overall, Figure 2 demonstrates that strategy aims to minimize the number of terms needed in Taylor pB is well calibrated, and the calibration is better than the pF expansions for a prespecified precision. After Taylor expansion, at the tail. We thus conclude that our asymptotic method for we are faced with convolving gamma density functions (up to a obtaining pB is accurate and well-calibrated when the sample normalizing constant). The order of the convolutions is deter- size is more than a few hundred. mined by a single-linkage hierarchical clustering (Murtagh and Contreras 2012) on the rate parameters of the gamma densities. Convolving two gamma densities of similar rates is computa- 4. Scaled Bayes Factors tionally more efficient. Bayes factors are sensitive to the specification of priors. Let us Ourprogramhasfourfeaturesoutstanding.First,weadopted V = σ 2I consider the NIG prior with b b p and denote the singular GNU Multi-Precision Library so that our program can produce values of X by δi for i = 1,...,p.Thenλi in (10)becomesλi = an arbitrarily small p-value without suffering underflow or δ2/(δ2 + /σ 2) i i 1 b ,andthus overflow. Second, for an even number of chi-squared vari- p ables, pB can be computed at an arbitrary precision; for an δ2   i Qi 2 2 odd number of chi-squared variables, the error introduced at 2logBF= − log 1 + δ σ . (11) δ2 + 1/σ 2 i b the last step of numerical convolution can be made arbitrarily i=1 i b small. Third, the terms of Taylor expansion are determined by a Here,weassumethesamplesizeissufficientlylargesuchthat prespecified precision and a strict error bound is provided. Last, the oP(1) error term can be safely omitted. The term λiQi is since the program is written in C++, it is fast and suitable for 2 2 bounded by Qi (because λi < 1), but the term log (1 + δ σ ) is studies that evaluate millions of tests, such as genetic association i b monotonically increasing with respect to both δi and σb.When studies. Figure 1 demonstrates the of our program, thesamplesizegoestoinfinity,δi goes to infinity; when the prior for example, when p = 10 our program can evaluate ≈ 150 becomes more diffusive, σb goes to infinity. These properties give p-values per second. The speed appears to be quadratic in p. rise to the prior-dependence nature of Bayes factor and Bartlett’s Theweightedsumofchi-squaredvariablesoccursfrequentlyin paradox.  statistical applications, such as the ridge regression, the variance E = p (λ + ( − λ )) E By (10), 0[2 log BF] i=1 i log 1 i ,where 0 is component model, and recently the association testing for rare the expectation evaluated under the null. Centering 2 log BF to variants (Wu et al. 2011; Epstein et al. 2015). We believe our obtain program has a wide of applications. The source code and p executablesofourprogramBACH(Bausch’sAlgorithmforCHi- def = − E = λ ( − ). square weighted sum) are freely available at http://haplotype.org. 2logsBF 2logBF 0[2 log BF] i Qi 1 (12) i=1 We call the quantity logBF − E [log BF] the logarithm of the 3.2. Accuracy and Calibration of p for Finite Sample Sizes 0 B scaled Bayes factor (sBF). By definition, evaluating sBF requires Using Theorem 1,wecanevaluateextremelysmallp-values, an computing E0[2 log BF]. In addition to direct computation, important advantage in applications such as genome-wide asso- E0[2 log BF] can also be evaluated by simulating y under the ciation studies (GWAS) compared to the permutation method null. A valid and convenient approach to simulating under the JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 5

Figure . Accuracy and calibration of pB. The top row is for simple linear regression. The “true values”for pB are obtained from F-tests. The y-axis is the asymptotic pB.Line = y x is marked in gray. Two bottom rows are for multi-linear regression. Red dots represent pF (from likelihood ratio tests) and blue pB. The gray region represents a 95% prediction intervals for uniformly distributed p-values. null was proposed by Kennedy (1995). The approach is to per- the scaling is 1.5 when λ1 = 0.8 and 2.0 when λ1 = 0.9. When mute yW ,theresidualsofy after regressing out covariates W. all λi → 0, the scaling approaches 1 and meanwhile sBF → 1, Since 2 log BF is a weighted sum of chi-squared random vari- as expected; when all λi → 1, although the scaling factor blows y / →∞ → χ 2 − . ables, a modest number of permutations of W provide an accu- up (sBF BF ),2logsBFisstableand2logsBF p p rate estimation of its mean under the null. The permutation Consider a multiple testing problem that tests H1, H2,... might be advantageous over the analytical computation when against H0. Each alternative model is concerned with testing a the model is misspecified. single covariate in association with the response variable, and each covariate has the same λ1.ThensBFandBFproducethe Proposition 1. ThescaledBayesfactorhasthefollowing same for all tests, because the scaling coefficient is deter- properties. mined solely by λ1. In light of the connection between BF and E = ; / = p { (−λ / )/ (a) √0[2 log sBF] 0 sBF BF i=1 exp i 2 pF (Wakefield 2008;GuanandStephens2008), we have that, ( − λ )} > . 1 i 1 asymptotically, sBF and pF produce the same ranking for all tests (b) sBFandBFhavethesame(Bayesian)p-value pB. that have the same λ1.However,whenλ1 differs, the three statis- y˜ y (y)/ (y˜) = (c) Let be a permutation of .ThenBF BF tics BF, sBF, and pF (or pB) produce different . ˜ def ˜ ˜ sBF(y)/sBF(y) = D(y), and EP[log D(y)] = log sBF(y), and sBF(y)

and (9) follow

p 2logBF= λiQi + log(1 − λi) + , i=1 (14) ∼ χ 2(ρ ), Qi 1 i

where  ∼ oP(1),Qi is a noncentral chi-squared random vari- able with d.f. = 1 and the noncentrality parameter ρi = (ut Lβ)2/ (λ , u ) i n,and i i is the ith eigenvalue–eigenvector pair of X(Xt X + V −1)−1Xt  = b . For the normal prior 0. Note in the above theorem, 2 log BF has the same form as in (10). The only difference is that under the local alternatives Q1,...,Qp become noncentral chi-squared random variables instead of central chi-squared under the null. The assumptions on L guarantee the stochastic boundedness of ρi, permitting a Taylor expansion that leads to the linear approximation. These two assumptions are usually satisfied in practice, particularly in genetic association studies, where each entry of L is the allele counts of an individual at a marker and thus bounded by two. Figure . BF and sBF as functions of σ . The plot is for simple linear regression of b Let us assume that the sample size is large enough so that various sample sizes. BF is in gray and sBF is in black. BF and sBF are computed assuming the covariate has unit variance. the error term  in Theorem 4.1 can be safely ignored. For sim- ple linear regression, we have 2 log BF = λ1Q1 + log(1 − λ1) λ = λ ( − ), ∼ χ 2(ρ ) covariate. In genetic association studies, an SNP’s 1 is deter- and 2 log sBF 1 Q1 1 where Q1 1 1 is a noncen- mined by the minor allele and the prior effect size, tral chi-squared random variable. Because E[Q1] = ρ1 + 1, we and for a fixed prior effect size, the larger the minor allele fre- have E[2 log sBF] = λ1ρ1, which is proportional to λ1 for a fixed ρ quency, the larger the λ1. 1. In other words, under the local alternatives, sBF tends to assign larger values to more informative covariates. On the other Proposition 2. Suppose two models H1 and H2 are each con- hand, E[2 log BF] = λ1(ρ1 + 1) + log(1 − λ1) is not a mono- cerned with a single but different covariate, and H1 associates tonic function of λ1 for a fixed ρ1. Thus, the (unscaled) Bayes with a larger λ1.DenoteBFj and sBF j of BF and sBF for model ( = , ) factor does not respect the informativeness of covariates under Hj j 1 2 .Wehave the alternative model.

E0[log BF1 − log BF2] < 0, E0[log sBF1 − log sBF2] = 0. (13) Proposition 3. Consider simple linear regression in the context of Theorem 2: √ = σ / τ, ∼ χ 2(ρ ) ρ = λ /( − Although it is rudimentary to prove Proposition 2, the result (a) Given b b Q1 1 1 with 1 1 1 is illuminating with respect to the difference between BF and λ1). ∼ ( ,σ2/τ ) sBF. Under the null, BF has the propensity to assign a larger (b) Let b N 0 b , the marginal distribution (over b) ( − λ )−1χ 2 value to a less informative covariate. In other words, BF penal- of Q1 is 1 1 1 , a scaled central chi-squared izes more heavily to a more informative covariate. On the distribution. other hand, sBF disregards the informativeness of a covari- ate under the null. This indifference to the informativeness of The above proposition says that under the local alternatives sBF is advantageous under the alternative model (next section), the distribution of Q1 (and hence BF and sBF) is determined because, loosely speaking, the over-penalty of BF on more infor- by λ1. In (a), the alternative is evaluated at a fixed point, while mative covariates applies not just under the null, but also under in (b) it is averaged over the prior distribution of b. Because the alternative. the power of a test is determined by the alternative distribution of Q1 (for a fixed null), Proposition 3 suggests that the statisti- λ . 4.2. BF and sBF Under the Local Alternatives cal power is positively correlated with 1 This result is simple yet profound. Suppose we are faced with two tests with equal The local alternatives are a sequence of alternatives that scale p-values that suggest the null should be rejected. Without know- down the effect size when sample size increases so that the ing the powers of the tests, we cannot decide which rejection test statistic converges for large samples (see Ferguson 1996, is more reliable or carries more evidence (Stephens and Bald- chap. 22). The following theorem quantifies BF (and hence sBF) ing 2009). Suppose two tests have different λ1’s but the same under the local alternatives. Q1’s, then the two tests have the same p-value. From 2 log sBF = Theorem 2. For multi-linear regression (1)withtheNIG λ1(Q1 − 1), the scaled Bayes factor has a propensity to assign a λ prior (2), the g-prior (7),√ and the normal prior (8), under the larger value to the test that has a larger power (or a larger 1), a local alternatives b = β/ nτ and assuming either Lt L/n con- desirable property. On the other hand, this desirable property is verges or L is entry-wise bounded, Bayes factors given in (3) missingfortheunscaledBayesfactor. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 7

5. Applying to Ocular Hypertension GWAS Datasets To illustrate how the scaled Bayes factor performs in real data analysis, we applied for access and downloaded two GWAS datasets from the database of Genotypes and Pheno- types (dbGaP). Both studies were funded by the National Eye Institute: one is the Ocular Hypertension Treatment Study (Kass et al. 2002) (henceforth OHTS, dbGaP accession number: phs000240.v1.p1), the other is National Eye Institute Human Genetics Collaboration Consortium Glaucoma Genome-Wide Association Study (Ulmer et al. 2012) (henceforth NEIGHBOR, dbGaP accession number: phs000238.v1.p1). The phenotype of interest is the intraocular pressure (IOP). The OHTS dataset only contains individuals of high IOP (> 21). The NEIGHBOR dataset is a case-control design for glaucoma (Ulmer et al. 2012; Weinreb, Aung, and Medeiros 2014), in which many samples have IOP measurements because a high IOP is considered a major risk factor and a precur- sor phenotype for glaucoma. The NEIGHBOR dataset, how- ever, contains case samples with small IOP and control sam- ples with large IOP. Since IOP and glaucoma evidently have dif- Figure . The distributions of log10 BF and log10 sBF by different bins of minor ferent genetic basis, though many are overlapping, we removed allele frequency (MAF). The bins are marked by color and the diagonal line is y = x. thosesamples.WealsoremovedsampleswhoseIOPmeasure- ments differ by more than 10 between the two eyes since such that is much worse than its rankings by BF and pB. Not surpris- a large difference is likely to be caused by a different mech- ingly, this SNP has the smallest MAF (0.023) among the 20 SNPs anism, for example, physical accidents. The average IOP of included in Table 1. This example fits the theoretical observa- the two eyes was used as the raw phenotype. We then per- tions made in Proposition 2. We permuted the phenotypes once, formed the routine for the genotypes using the and recomputed the test statistics. The permutation is to simu- same procedure described by Xu and Guan (2014). OHTS and late under the null to confirm that E0[log sBF] = 0. Apparently NEIGHBOR were genotyped on different SNP arrays, and there sBF is more invariant against permutation compared to BF in remained 301,143 autosome SNPs genotyped in both datasets the sense that log(sBF(y)/sBF(y˜)) ≈ log sBF(y). thatpassedQC.Wethenperformedprincipalcomponentanal- Our choice of the σb = 0.2 represents the prior belief of small ysis to remove the outliers and extracted 3226 subjects (740 but noticeable effect size in the context of GWAS (see Bur- from OHTS and 2486 from NEIGHBOR) that clustered around ton et al. 2007). To check how sensitive our results are with the European samples in HapMap3 (The International HapMap respect to the choice of σb = 0.2, we redid the analysis using Consortium 2010). We regressed out age, sex, and six leading σb = 0.5. As predicted by our theory (Figure 3), we observed principal components from the raw phenotypes, quantile nor- that with σb = 0.5 BF tends to be smaller and sBF tends to be malized the residuals, and used them as the phenotypes for sin- larger, and pB remains unchanged. Moreover, the rankings of the gleSNPanalysis.WecomputedBF, sBF, and pB using prior SNPs remained mostly unchanged between the two choices of σb σb = 0.2. (Table S1 in supplementary materials). We first compared BF and sBF in different minor allele fre- Lastly, although it was not our main objective, we examined quency (MAF) bins. Different MAF bins correspond to differ- thetophitsintheassociationresult.Ouranalysisreproduced ent bins of the informativeness (λ1)ofSNPs.Figure 4 shows three known genetic associations for IOP. Namely, the TMCO1 ∼ that in each bin log10 sBF log10 BFisroughlyparalleltothe gene on chromosome 1 (163.9M-164.0M), which was reported line y = x, and more importantly the larger the MAF,the further by van Koolwijk et al. (2012); a single hit rs2025751 in the = − are the points away from y x,orinotherwords,log10 sBF PKHD1 gene on chromosome 6 (Hysi et al. 2014); and a single log10 BF is larger, which agrees well with the definition of sBF hit rs12150284 in the GAS7 gene on chromosome 17 (Ozel et al. (Figure 3), and fits the theoretical predictions (Propositions 2 2014). A noticeable potentially novel finding is the gene PEX14 and 3). Another noticeable feature in Figure 4 is that the mini- on chromosome 1. Two SNPs, rs12120962 and rs12127400, have mumvalueofsBFislargerthanthatofBF,whichisconsistent modestassociationsignalsfromBFandpB, but their scaled with the Proposition 1(a). Bayes factors are noteworthy. PEX14 encodes an essential com- Next we examined the ranking of SNPs by different test statis- ponent of the peroxisomal import machinery. The protein inter- tics. Table 1 contains the top 20 SNPs in the ranking by BF. Rows acts with the cytosolic receptor for proteins containing a PTS1 were then sorted according to SNP’s chromosome and position. peroxisomal targeting signal. Incidentally, PTS1 is known to Incidentally, the top 2 hits (rs7518099 and rs4656461) are the elevate the intraocular pressure (Shepard et al. 2007). In addi- same for all the three test statistics. The rankings are largely sim- tion, a mutation in PEX14 results in one form of Zellweger syn- ilar to one another among the three test statistics: BF, sBF, and drome, and for children who suffer from Zellweger syndrome, pB, particularly so between BF and pB.Thereis,however,the congenital glaucoma is a typical neonatal-infantile presenta- noticeable exception of SNP rs7696626, with a ranking by sBF tion (Klouwer et al. 2015). 8 Q. ZHOU AND Y. GUAN

Table . Top  single SNP associations.

SNP Chr Pos MAF bf (y) bf (y˜) sbf (y) sbf (y˜) p(y) p(y˜)

rs12120962  . . . () − . . () − . . () . rs12127400 ...()− . . () − . . () . rs4656461  . . . () − . . () − . . () . rs  . . . () − . . () . . () . rs  . . . () − . . () . . () . rs7518099  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs7696626  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs2025751  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs  . . . () − . . () − . . () . rs  . . . () − . . () . . () . rs  . . . () − . . () . . () . rs12150284  . . . () − . . () − . . () . σ = . y˜ y NOTES: The SNPs chosen have top  BF values using b 0 2. The rankings by three test statistics are given in the parentheses. is obtained by permuting once. SNP IDs are in bold if they are mentioned specifically in the main text. The column names are explained as following. Pos: genomic position in megabase pair according to (y) (y) (y˜) (y˜) (y) (y) (y˜) (y˜) (y) − (y) (y˜) − (y˜) HG; bf : log10 BF ; bf : log10 BF ; sbf : log10 sBF ; sbf : log10 sBF ; p : log10 pB ;andp : log10 pB .

6. Discussion are independent, while in our situations the p-values are obvi- ously dependent. Motivated by Theorem 2.1,weproposeto In this article, we quantify the null distribution of Bayes fac-  ( k ( / )/ ) ∼ χ 2, tors in the context of multi-linear regression. We showed that combine p-values using 2 log i=1 exp Wi 2 k 1 where = −1( − ) under the null, 2 log BF is distributed as a weighted sum of chi- Wi 1 pi and is the cumulative distribution func- χ 2. squared random variables. The null distribution allows us to tion of 1 In essence, we converted each p-value to its associated compute the p-value associated with Bayes factor analytically, Bayes factor, averaged Bayes factors, and computed pB of the and we have developed a software package to do so efficiently. average Bayes factor. Since averaging over Bayes factors is always The software package can be used in a wide range of applica- valid and Theorem 1 provides the necessary connection between tions such as ridge regression, variance component model, and p-values and Bayes factors, our approach to combining the cor- genetic association studies. The null distribution of Bayes fac- related p-values appears to work well, at least for the aforemen- tioned two examples, where the existing methods surely fail. tors also allows us to study the properties of Bayes factors, and E = we identified the dominant term in Bayes factor that leads to its By definition, 0[BF] 1whichisanice property because excessive prior-dependence and Bartlett’sparadox. Weproposed it suggests that BF does not favor either the null or the alter- the scaled Bayes factor, which depends less on the prior and is native when the data are simulated under the null. A care- immune to Bartlett’s paradox. We then studied the properties of ful investigation into Proposition 2, however, revealed that this the sBF under the null and the local alternatives. Compared to seemingly nice property effectively results in a greater penalty on more informative covariates. The scaled Bayes factor, on BF, sBF respects more to the informativeness of data. E = L the other hand, satisfies 0[log sBF] 0, trading the property Very often the covariates areinferredfromastatisti- E = cal model, for example, imputed allele dosages by Guan and 0[BF] 1 of the (unscaled) Bayes factor. Immediately, this Stephens (2008) and the haplotype loading matrix in Xu and trade suggests that sBF favors the alternative over the null (by Guan (2014). One would like to take into account the uncer- Jensen’s inequality or simply Proposition 1(a)). We argue that tainty of the inferred L. In imputation-based association stud- this trade brings several benefits to sBF: it depends less on prior; ies, one may compute the posterior mean of L and then per- itbecomesimmunetoBartlett’sparadox;and,moreimportantly, sBF becomes well calibrated with respect to permutation. Sup- form the test. But in haplotype association analysis (Xu and ˜ L pose we permute the response variable y once to obtain y and Guan 2014), using the posterior mean of is impractical as y˜ one realization of L may be a column switching of the other, compute the test statistic (either BF or sBF) with ,treatingitas the test statistic under the null. Obviously, 2 log(sBF(y)/sBF(y˜)) to say the least. A natural solution is to compute a Bayes fac- (y) tor for each realization of L andusetheaveragedBayesfactor isexpectedtohavethesamemeanasthatof2logsBF (Propo- sition 1(c)). On the other hand, 2 log(BF(y)/BF(y˜)) is expected as the test statistic. Then how to evaluate the associated p-value (y). for the averaged Bayes factor? The same question also arises to have a different (shifted) mean from 2 log BF We believe after obtaining an averaged Bayes factor from multiple choices this better calibration of sBF with respect to permutation will of σ ’s. Two commonly used methods to combine p-values are make it a better test statistic for Bayesian variable selection b regression (Guan and Stephens 2011). Fisher’s 1948 method and Stouffer et al.’s 1949 method. Fisher’s k 2 In genetic association studies, one routinely performs mil- method uses −2 = log(p ) ∼ χ ; and Stouffer’s method first i 1 i 2k  √ lions of simple linear regression to test for the association obtains a Z-score for each p-value and then uses k Z / k ∼ i=1 i between each genetic variant and the phenotype. In general, N(0, 1). Both methods assume the p-values to be combined sBF and pB would produce different rankings for the variants, JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION 9

because their corresponding λ1’s differ. When a special prior, Bartlett,M.S.(1957), “A Comment on D. V. Lindley’s Statistical Paradox,” σb ∝ 1/δ1,isused,however,sBF(andBF)willproducethesame Biometrika, 44, 533–534. [1] Bausch, J. (2013), “On the Efficient Calculation of a Linear Combination of ranking as pB (Wakefield 2008;GuanandStephens2008). This λ Chi-Square Random Variables with an Application in Counting String prior, which produces the same 1 for all covariates, somewhat Vacua,” JournalofPhysicsA:MathematicalandTheoretical, 46, 505202. defeats the purpose of specifying a prior, because it practically [2,3] eliminates the effect of a variant’s variance to its test statistic. Benjamini, Y., and Hochberg, Y. (1995), “Controlling the False Discovery In multi-linear regression, such a special prior is the g-prior, Rate: A Practical and Powerful Approach to Multiple Testing,” Journal which sets every λ to g/(g + 1). At some sense ranking variants of the Royal Statistical Society, Series B, 57, 289–300. [9] i Box, G. E. P.(1980), “ and Bayes’ Inference in Scientific Modelling using sBF is more “informative” than using pB, and we would and Robustness,” JournaloftheRoyalStatisticalSociety, Series A, 143, like to control (FDR) for sBF. One approach 383–430. [3] is specifying the prior , multiplying the prior odds with Burton, P. R., Clayton, D. G., Cardon, L. R., and The Wellcome Trust Case BForsBFtoobtaintheposteriorodds,andthentheposterior Control Consortium (2007), “Genome-Wide Association Study of 14, probability of association (PPA) for each variant. But specifying 000 Cases of Seven Common Diseases and 3,000 Shared Controls,” Nature, 447, 661–678. [1,7] the prior odds is somewhat arbitrary, which unfortunately has Epstein, M. P., Duncan, R., Ware, E. B., Jhun, M. A., Bielak, L. F., Zhao, a strong influence on PPA. An alternative approach is perhaps W.,Smith,J.A.,Peyser,P.A.,Kardia,S.andSatten,G.A.(2015), “A to develop a procedure that is similar to that of Benjamini and Statistical Approach for Rare-Variant Association Testing in Affected Hochberg (1995). The Benjamini-Hochberg procedure relies on Sibships,” The American Journal of Human Genetics, 96, 543–554. the null distribution of the p-values(whichisuniform)tocon- [4] Ferguson,T.S.(1996), ACourseinLargeSampleTheory(Vol. 49), London: trol FDR, but it is noted that the p-value may not be the optimal Chapman & Hall. [6] statistic for controlling FDR (Sun and Cai 2009). Since now we Fisher,R.A.(1948), “Questions and Answers #14,” The American Statisti- know the null distribution of sBF (and BF), we can estimate the cian, 2, 30–31. [8] expected FDR for sBF (and BF). Such an FDR controlling pro- Good,I.J.(1957), “Saddle-Point Methods for the Multinomial Dis- cedure will provide an alternative solution to “calibrating” Bayes tribution,” The Annals of , 28, 861–881. [1] factors (either scaled or unscaled), and it will strengthen the ——— ( 1992), “The Bayes/Non-Bayes Compromise: A Brief Review,” Jour- “Bayes/non-Bayes compromise,” which is likely to attract more nal of the American Statistical Association, 87, 597–606. [1] practitioners to apply Bayesian methods in their studies. Guan, Y., and Stephens, M. (2011), “Bayesian Variable Selection Regression for Genome-wide Association Studies, and Other Large-Scale Prob- lems,” Annals of Applied Statistics, 5, 1780–1815. [8] Supplementary Materials Guan, Y., and Stephens, M. (2008), “Practical Issues in Imputation-Based Association Mapping,” PLoS Genetics, 4, e1000279. [1,2,3,5,8] The supplementary online includes proofs of theorems, R code Hysi,P.G.,Cheng,C.,Springelkamp,H.,etal.(2014), “Genome-Wide for computing Bayes factors, and scaled Bayes factors, and a Analysis of Multi-Ancestry Cohorts Identifies New Loci Influencing table to summarize top single SNP association results using Intraocular Pressure and Susceptibility to Glaucoma,” Nature Genetics, σ = . 46, 1126–1130. [7] b 0 5 for the IOP dataset presented in Section 5.Oursoft- Ibrahim,J.G.,andLaud,P.W.(1991), “On Bayesian Analysis of Generalized ware package to compute p-values for a weighted sum of chi- Linear Models using Jeffreys’s Prior,” Journal of the American Statistical squared random variables is freely available at https://github. Association, 86, 981–986. [2] com/haplotype/BACH and http://www.haplotype.org. Jeffreys, H. (1961), The Theory of Probability,Oxford,UK:OxfordUniver- sity Press. [2] Kass,M.A.,Heuer,D.K.,Higginbotham,E.J.,Johnson,C.A.,Keltner, Acknowledgments J.L.,Miller,J.P.,Parrish,R.K.,Wilson,M.R.andGordon,M.O. (2002), “The Ocular Hypertension Treatment Study: A Randomized The authors thank Mark Meyer and Dennis Bier at Baylor College of Trial Determines that Topical Ocular Hypotensive Medication Delays Medicine for editorial assistance. The review comments from the edi- or Prevents the Onset of Primary Open-Angle Glaucoma,” Archives of tors and two anonymous reviewers greatly improved the clarity of the Ophthalmology, 120, 701–713. [7] presentation. Kennedy, F. E. (1995), “ Tests in Econometrics,” Journal of Business & Economic Statistics, 13, 85–94. [5] Klouwer, F. C. C., Berendse, K., Ferdinandusse, S., Wanders, R. J. A., Enge- len, M. and Poll-The, B. T. (2015), “Zellweger Spectrum Disorders: Funding Clinical Overview and Management Approach,” Orphanet Journal of Rare Diseases, 10, 1. [7] This work was supported by United States Department of Agricul- Liang, F., Paulo, R., Molina, G., Clyde, M. A. and Berger, J. O. (2008), “Mix- ture/Agriculture Research Service under contract number 6250-51000-057 tures of g priors for Bayesian Variable Selection,” Journal of the Ameri- and National Institutes of Health under award number R01HG008157. can Statistical Association, 103, 410–423. [2] Marchini,J.,Howie,B.,Myers,S.,McVean,G.andDonnelly,P.(2007), “A New Multipoint Method for Genome-Wide Association Stud- ORCID ies by Imputation of Genotypes,” Nature Genetics, 39, 906–913. [2] Quan Zhou http://orcid.org/0000-0003-3190-3598 Meng,X.(1994), “Posterior Predictive p-Values,” The Annals of Statistics, Yongtao Guan http://orcid.org/0000-0002-8055-2192 22, 1142–1160. [3] Murtagh, F., and Contreras, P. (2012), “Algorithms for Hierarchical Cluster- ing: An Overview,” Wiley Interdisciplinary Reviews: Data Mining and References Knowledge Discovery, 2, 86–97. [4] Nuzzo,R.(2014), “Scientific Method: Statistical Errors,” Nature, 506, 150– Balding, D. J. (2006), “A Tutorial on Statistical Methods for Population 152. [1] Association Studies,” Nature Reviews Genetics, 7, 781–791. [2] 10 Q. ZHOU AND Y. GUAN

O’Hagan,A.,andForster,J.J.(2004), Kendall’sAdvancedTheoryofStatistics, The International HapMap Consortium (2010), “Integrating Common and Volume 2B: ,London:Arnold.[2] Rare Genetic Variation in Diverse Human Populations,” Nature, 467, Ozel,A.B.,Moroi,S.E.,Reed,D.M.,etal.(2014), “Genome-Wide 52–58. [7] Association Study and Meta-Analysis of Intraocular Pressure,” Human Ulmer,M.,Li,J.,Yaspan,B.L.,etal.(2012), “Genome-Wide Analysis of Genetics, 133, 41–57. [7] Central Corneal Thickness in Primary Open-Angle Glaucoma Cases Robins,J.M.,vanderVaart,A.,andVentura,V.(2000), “Asymptotic Distri- in the NEIGHBOR and GLAUGEN Consortia. The Effects of CCT- bution of P values in Composite Null Models,” Journal of the American Associated Variants on POAG Risk,” Investigative Ophthalmology & Statistical Association, 95, 1143–1156. [3] Visual Science, 53, 4468. [7] Rubin, D. B. (1984),“BayesianlyJustifiableandRelevantFrequencyCalcu- vanKoolwijk,L.M.E.,Ramdas,W.D.,Ikram,M.K.,etal.(2012), “Com- lations for the Applies ,” The Annals of Statistics, 12, 1151– mon Genetic Determinants of Intraocular Pressure and Primary Open- 1172. [3] Angle Glaucoma,” PLoS Genet, 8, e1002611. [7] Sawcer, S. (2010), “Bayes Factors in Complex Genetics,” European Journal Wacholder, S., Chanock, S., Garcia-Closas, M., and Rothman, N. (2004), of Human Genetics, 18, 746–750. [1] “Assessing the Probability that a Positive Report is False: An Approach Sellke,T.,Bayarri,M.J.,andBerger,J.O.(2001), “Calibration of P Values for for Molecular Studies,” Journal of the National Cancer Testing Precise Null Hypotheses,” The American Statistician, 55, 62–71. Institute, 96, 434–442. [1] [1] Wakefield,J.(2008), “Reporting and Interpretation in Genome-Wide Asso- Servin, B., and Stephens, M. (2007), “Imputation-Based Analysis of Associ- ciation Studies,” International Journal of Epidemiology, 37, 641–653. ation Studies: Candidate Regions and Quantitative Traits,” PLoS Genet- [3,5,9] ics, 3, e114. [1,2,4] Weinreb,R.N.,Aung,T.,andMedeiros,F.A.(2014), “The Pathophysiol- Shepard, A. R., Jacobson, N., Millar, J. C., Pang, I. H., Steely, H. T., Searby, ogy and Treatment of Glaucoma: A Review,” Journal of the American C.C.,Sheffield,V.C.,Stone,E.M.andClark,A.F.(2007), “Glaucoma- Medical Association, 311, 1901–1911. [7] Causing Myocilin Mutants Require the Peroxisomal Targeting Signal-1 Wilks, S. S. (1938), “The Large-Sample Distribution of the Likelihood Ratio Receptor (PTS1R) to Elevate Intraocular Pressure,” Human Molecular for Testing Composite Hypotheses,” The Annals of Mathematical Statis- Genetics, 16, 609–617. [7] tics, 9, 60–62. [3] Stephens,M.,andBalding,D.J.(2009), “Bayesian Statistical Methods for Wu,M.C.,Lee,S.,Cai,T.,Li,Y.,Boehnke,M.andLin,X.(2011), “Rare- Genetic Association Studies,” Nature Reviews Genetics, 10, 681–690. Variant Association Testing for Sequencing Data with the Sequence [1,6] Kernel Association Test,” The American Journal of Human Genetics, 89, Stouffer,S.A.,Suchman,E.A.,DeVinney,L.C.,Star,S.A.,andWilliams 82–93. [4] Jr,R.M.(1949), The American Soldier, Vol. 1: Adjustment During Army Xu, H., and Guan, Y. (2014), “Detecting Local Haplotype Sharing and Hap- Life, Princeton: Princeton University Press. [8] lotype Association,” Genetics, 197, 823–838. [2,7,8] Sun, W., and Cai, T. T. (2009), “Large-Scale Multiple Testing Under Depen- Zellner, A. (1986), “On Assessing Prior Distributions and Bayesian Regres- dence,” JournaloftheRoyalStatisticalSociety, Series B, 71, 393–424. sion Analysis with g-Prior Distributions,” Bayesian Inference and Deci- [9] sion Techniques: Essays in Honor of Bruno De Finetti, 6, 233–243. [2]