Arxiv:1803.00360V2 [Stat.CO] 2 Mar 2018 Prone to Misinterpretation [2, 3]

Computing Bayes factors to measure evidence from experiments: An extension of the BIC approximation Thomas J. Faulkenberry∗ Tarleton State University Bayesian inference affords scientists with powerful tools for testing hypotheses. One of these tools is the Bayes factor, which indexes the extent to which support for one hypothesis over another is updated after seeing the data. Part of the hesitance to adopt this approach may stem from an unfamiliarity with the computational tools necessary for computing Bayes factors. Previous work has shown that closed form approximations of Bayes factors are relatively easy to obtain for between-groups methods, such as an analysis of variance or t-test. In this paper, I extend this approximation to develop a formula for the Bayes factor that directly uses information that is typically reported for ANOVAs (e.g., the F ratio and degrees of freedom). After giving two examples of its use, I report the results of simulations which show that even with minimal input, this approximate Bayes factor produces similar results to existing software solutions. Note: to appear in Biometrical Letters. I. INTRODUCTION A. The Bayes factor Bayesian inference is a method of measurement that is Hypothesis testing is the primary tool for statistical based on the computation of P (H j D), which is called inference across much of the biological and behavioral the posterior probability of a hypothesis H, given data D. sciences. As such, most scientists are trained in classical Bayes' theorem casts this probability as null hypothesis significance testing (NHST). The scenario for testing a hypothesis is likely familiar to most readers of this journal. Suppose one wants to test a specific re- P (D j H) · P (H) P (H j D) = : (1) search hypothesis (e.g., some treatment has an effect on P (D) some outcome measure). NHST works by first assuming a null hypothesis (e.g., the treatment has no effect) and One may think of Equation 1 in the following manner: then computing some test statistic for a sample of data. before observing data D, one assigns a prior probability This sample test statistic is then compared to a hypothet- P (H) to hypothesis H. After observing data, one can ical distribution of test statistics that would arise if the then update this prior probability to a posterior proba- null hypothesis were true. If the sample's test statistic bility P (H j D) by multiplying the prior P (H) by the is in the tail of the distribution (that is, it should oc- likelihood P (D j H). This product is then rescaled to cur with low probability), the scientist decides to reject a probability distribution (i.e., total probability = 1) by the null hypothesis in favor of the alternative hypothesis. dividing by the marginal probability P (D). Further, the p-value, which indicates how surprising the Bayes' theorem provides a natural way to test hy- sample would be if the null hypothesis were true, is often potheses. Suppose we have two competing hypotheses: taken as a measure of evidence: the lower the p-value, an alternative hypothesis H1 and a null hypothesis H0. the stronger the evidence. We can directly compare the posterior probabilities of H and H by computing their ratio; that is, we can While orthodox across many disciplines, NHST does 1 0 compute the posterior odds in favor of H over H as have philosophical criticisms [13]. Also, the p-value is 1 0 P (H1 j D)=P (H0 j D). Using Bayes' theorem (Equation arXiv:1803.00360v2 [stat.CO] 2 Mar 2018 prone to misinterpretation [2, 3]. Finally, NHST is ide- 1), it is trivial to see that ally suited to providing support for the alternative hypothesis, but the procedure does not work in the case where one wants to measure support for the null hypoth- P (H1 j D) P (D j H1) P (H1) esis. That is, we can reject the null, but we cannot accept = · : (2) P (H0 j D) P (D j H0) P (H0) the null. To overcome this limitation, we can use an al- | {z } | {z } | {z } ternative method for testing hypotheses that is based on posterior odds Bayes factor prior odds Bayesian inference: the Bayes factor. This equation can also be interpreted in terms of the \updating" metaphor that was explained above. Specifi- cally, the posterior odds are equal to the prior odds multi- plied by an updating factor. This updating factor is equal to the ratio of likelihoods P (D j H1) and P (D j H0), and is called the Bayes factor [4]. Intuitively, the Bayes factor ∗ [email protected] can be interpreted as the weight of evidence provided by 2 a set of data D. For example, suppose that one assigned values are often over a continuous parameter space, this the prior odds of H1 and H0 equal to 1; that is, H1 and computation requires integration, and thus the formula H0 are a priori assumed to be equally likely. Then, sup- for the Bayes factor amounts to pose that after observing data D, the Bayes factor was computed to be 10. Now, the posterior odds (the odds of H1 over H0 after observing data) is 10:1 in favor of R H1 over H0. As such, the Bayes factor provides an easily P (D j H0; θ)π0(θ)dθ θ2Θ0 interpretable measure of the evidence in favor of H1. BF01 = R (3) P (D j H1; θ)π1(θ)dθ In order to help with interpreting Bayes factors, var- θ2Θ1 ious classification schemes have been proposed. One simple scheme is a four-way classification proposed by Raftery [9], where Bayes factors between 1 and 3 are where Θ0 and Θ1 are the parameter spaces for models considered weak evidence; between 3 and 20 constitutes H0 and H1, respectively, and π0 and π1 are the prior positive evidence; between 20 and 150 constitutes strong probability density functions of the parameters of H0 and evidence; and beyond 150 is considered very strong evi- H1, respectively. dence. Thus, in order to compute BF , one must specify the Note that in the discussion above, there was no spe- 01 priors π and π for H and H . Further, the integrals cific assumption about the order in which we addressed 0 1 0 1 usually do not have closed-form solutions, so numerical H and H . If instead we wanted to assess the weight of 1 0 integration techniques are necessary. These requirements evidence in favor of H over H , Equation 2 could sim- 0 1 lend a computation of the Bayes factor to be inaccessible ply be adjusted by taking reciprocals. As such, implied to all but the most ardent researchers who have at least direction is important when computing Bayes factors, so a more-than-modest amount of mathematical training. one must be careful to define notation when representing Bayes factors. A common convention is to define BF10 as Fortunately, there are an increasing number of solu- the Bayes factor for H1 over H0; similarly, BF01 would tions that avoid a direct encounter with computations represent the Bayes factor for H0 over H1. Note that of the above type. Recently, researchers have proposed BF10 = 1=BF01. default priors for standard experimental designs such as In summary, the Bayes factor provides an index of pref- t-tests [7, 11] and ANOVA [10]. These default priors are erence for one hypothesis over another that has some ad- implemented in software packages such as the R pack- vantages over NHST. First, the Bayes factor tells us by age BayesFactor [8], and as such, have provided a user- how much a data sample should update our belief in one friendly method for researchers to compute Bayes factors hypothesis over a competing one. Second, though NHST without the computational overhead needed in Equation does not allow one to accept a null hypothesis, doing so 3. While these software solutions work quite well for com- within a Bayesian framework makes perfect sense. Given puting Bayes factors from raw data, they are a bit lim- these advantages, it may be surprising that Bayesian in- ited in the following context. Suppose that in the course ference has not been used more often in the empirical of reading some published literature, a researcher comes sciences. One reason for the lack of more widespread across a result that is presented as “nonsignificant", with adoption may be that Bayes factors are quite difficult to associated test statistic F (1; 23) = 2:21, p = 0:15. In an compute. We tackle this issue in the next section. NHST context, this nonsignificant result does not provide evidence for the null hypothesis; rather, it just implies that we cannot reject the null. A natural question would II. COMPUTING BAYES FACTORS be what, if any, support does this result provide for the null hypothesis? Of course, a Bayes factor would be use- As an example, suppose we are interested in computing ful here, but without the raw data, we cannot use the the Bayes factor for a null hypothesis H0 over an alterna- previously mentioned software solutions. To this end, it tive hypothesis H1, given data D. Recall from Equation would be advantageous if there were some easy way to 2 that this Bayes factor (denoted BF01) is equal to compute a Bayes factor directly from the reported test statistic. P (D j H0) It turns out that this computation is indeed possible, BF01 = : at least in certain cases. In the following, I will show P (D j H1) how one particular method for computing Bayes factors While this equation may seem conceptually quite sim- [the BIC approximation; 9] can be adapted to solve this ple, it is computationally much more difficult.

Arxiv:1803.00360V2 [Stat.CO] 2 Mar 2018 Prone to Misinterpretation [2, 3]

1 Estimation and Beyond in the Bayes Universe

Hierarchical Models & Bayesian Model Selection

Bayes Factors in Practice Author(S): Robert E

Bayes Factor Consistency

Calibrated Bayes Factors for Model Selection and Model Averaging

The Bayes Factor

On the Derivation of the Bayesian Information Criterion

Bayesian Inference Chapter 5: Model Selection

Bayesian Alternatives for Common Null-Hypothesis Significance Tests

Bayesian Model Selection

A Tutorial on Conducting and Interpreting a Bayesian ANOVA in JASP

Assessing Bayes Factor Surfaces Using Interactive Visualization and Computer Surrogate Modeling