Toward Evidence-Based Medical Statistics. 2: the Bayes Factor

Toward Evidence-Based Medical Statistics. 2: The Bayes Factor Steven N. Goodman, MD, PhD Bayesian inference is usually presented as a method for peared in response to an article proposing a Bayesian determining how scientific belief should be modified by analysis of the GUSTO (Global Utilization of Strep- data. Although Bayesian methodology has been one of tokinase and tPA for Occluded Coronary Arteries) the most active areas of statistical development in the past trial (4), are typical. 20 years, medical researchers have been reluctant to em- brace what they perceive as a subjective approach to data When modern Bayesians include a “prior probability analysis. It is little understood that Bayesian methods have distribution for the belief in the truth of a hypothesis,” they are actually creating a metaphysical model of a data-based core, which can be used as a calculus of attitude change...Theresult...cannot be field-tested evidence. This core is the Bayes factor, which in its simplest for its validity, other than that it “feels” reasonable to form is also called a likelihood ratio. The minimum Bayes the consumer.... factor is objective and can be used in lieu of the P value as The real problem is that neither classical nor Bayesian a measure of the evidential strength. Unlike P values, methods are able to provide the kind of answers cli- Bayes factors have a sound theoretical foundation and an nicians want. That classical methods are flawed is un- interpretation that allows their use in both inference and deniable—I wish I had an alternative....(5) decision making. Bayes factors show that P values greatly This comment reflects the widespread mispercep- overstate the evidence against the null hypothesis. Most tion that the only utility of the Bayesian approach is important, Bayes factors require the addition of background as a belief calculus. What is not appreciated is that knowledge to be transformed into inferences—probabilities Bayesian methods can instead be viewed as an evi- that a given conclusion is right or wrong. They make the distinction clear between experimental evidence and infer- dential calculus. Bayes theorem has two compo- ential conclusions while providing a framework in which to nents—one that summarizes the data and one that combine prior with current evidence. represents belief. Here, I focus on the component related to the data: the Bayes factor, which in its This paper is also available at http://www.acponline.org. simplest form is also called a likelihood ratio. In Bayes theorem, the Bayes factor is the index through which Ann Intern Med. 1999;130:1005-1013. the data speak, and it is separate from the purely From Johns Hopkins University School of Medicine, Baltimore, subjective part of the equation. It has also been called Maryland. For the current author address, see end of text. the relative betting odds, and its logarithm is some- times referred to as the weight of the evidence (6, 7). n the first of two articles on evidence-based sta- The distinction between evidence and error is clear tistics (1), I outlined the inherent difficulties of I when it is recognized that the Bayes factor (evidence) the standard frequentist statistical approach to in- is a measure of how much the probability of truth ference: problems with using the P value as a mea- (that is, 1 2 prob(error), where prob is probability) is sure of evidence, internal inconsistencies of the com- altered by the data. The equation is as follows: bined hypothesis test–P value method, and how that method inhibits combining experimental results with Prior Odds 3 Bayes 5 Posterior Odds background information. Here, I explore, as non- of Null Hypothesis Factor of Null Hypothesis mathematically as possible, the Bayesian approach where Bayes factor 5 to measuring evidence and combining information and epistemologic uncertainties that affect all statis- Prob~Data, given the null hypothesis! tical approaches to inference. Some of this presen- Prob~Data, given the alternative hypothesis! tation may be new to clinical researchers, but most The Bayes factor is a comparison of how well of it is based on ideas that have existed at least since two hypotheses predict the data. The hypothesis the 1920s and, to some extent, centuries earlier (2). that predicts the observed data better is the one that is said to have more evidence supporting it. Unlike the P value, the Bayes factor has a sound The Bayes Factor Alternative theoretical foundation and an interpretation that Bayesian inference is often described as a method of showing how belief is altered by data. Because of this, many researchers regard it as non- scientific; that is, they want to know what the data See related article on pp 995-1004 and editorial say, not what our belief should be after observing comment on pp 1019-1021. them (3). Comments such as the following, which ap- ©1999 American College of Physicians–American Society of Internal Medicine 1005 Table 1. Final (Posterior) Probability of the Null energy. We know that energy is real, but because it Hypothesis after Observing Various Bayes Factors, as a Function of the Prior Probability of is not directly observable, we infer the meaning of a the Null Hypothesis given amount from how much it heats water, lifts a weight, lights a city, or cools a house. We begin to Strength Bayes Factor Decrease in Probability understand what “a lot” and “a little” mean through of Evidence of the Null Hypothesis its effects. So it is with the Bayes factor: It modifies From To No Less Than prior probabilities, and after seeing how much % Bayes factors of certain sizes change various prior probabilities, we begin to understand what repre- Weak 1/5 90 64* sents strong evidence, and weak evidence. 50 17 25 6 Table 1 shows us how far various Bayes factors Moderate 1/10 90 47 move prior probabilities, on the null hypothesis, of 50 9 90%, 50%, and 25%. These correspond, respective- 25 3 ly, to high initial confidence in the null hypothesis, Moderate to strong 1/20 90 31 50 5 equivocal confidence, and moderate suspicion that 25 2 the null hypothesis is not true. If one is highly con- Strong to very strong 1/100 90 8 vinced of no effect (90% prior probability of the 50 1 25 0.3 null hypothesis) before starting the experiment, a Bayes factor of 1/10 will move one to being equiv- * Calculations were performed as follows: A probability (Prob) of 90% is equivalent to an odds of 9, calculated as Prob/(1 2 Prob). ocal (47% probability on the null hypothesis), but if Posterior odds 5 Bayes factor 3 prior odds; thus, (1/5) 3 9 5 1.8. one is equivocal at the start (50% prior probability), Probability 5 odds/(1 1 odds); thus, 1.8/2.8 5 0.64. that same amount of evidence will be moderately con- vincing that the null hypothesis is not true (9% pos- allows it to be used in both inference and decision terior probability). A Bayes factor of 1/100 is strong making. It links notions of objective probability, ev- enough to move one from being 90% sure of the idence, and subjective probability into a coherent null hypothesis to being only 8% sure. package and is interpretable from all three perspec- As the strength of the evidence increases, the tives. For example, if the Bayes factor for the null data are more able to convert a skeptic into a hypothesis compared with another hypothesis is 1/2, believer or a tentative suggestion into an accepted the meaning can be expressed in three ways. truth. This means that as the experimental evidence 1. Objective probability: The observed results are gets stronger, the amount of external evidence half as probable under the null hypothesis as they needed to support a scientific claim decreases. Con- are under the alternative. versely, when there is little outside evidence sup- 2. Inductive evidence: The evidence supports the porting a claim, much stronger experimental evidence null hypothesis half as strongly as it does the alter- is required for it to be credible. This phenomenon can native. be observed empirically, in the medical community’s 3. Subjective probability: The odds of the null reluctance to accept the results of clinical trials that hypothesis relative to the alternative hypothesis af- run counter to strong prior beliefs (10, 11). ter the experiment are half what they were before the experiment. The Bayes factor differs in many ways from a Bayes Factors and Meta-Analysis P value. First, the Bayes factor is not a probability itself but a ratio of probabilities, and it can vary There are two dimensions to the “evidence-based” from zero to infinity. It requires two hypotheses, properties of Bayes factors. One is that they are a making it clear that for evidence to be against the proper measure of quantitative evidence; this issue null hypothesis, it must be for some alternative. will be further explored shortly. The other is that Second, the Bayes factor depends on the probability they allow us to combine evidence from different of the observed data alone, not including unobserved experiments in a natural and intuitive way. To under- “long run” results that are part of the P value calcu- stand this, we must understand a little more of the lation. Thus, factors unrelated to the data that affect theory underlying Bayes factors (12–14). the P value, such as why an experiment was stopped, Every hypothesis under which the observed data do not affect the Bayes factor (8, 9). are not impossible can be said to have some evi- Because we are so accustomed to thinking of dence for it.

Toward Evidence-Based Medical Statistics. 2: the Bayes Factor

Arxiv:1803.00360V2 [Stat.CO] 2 Mar 2018 Prone to Misinterpretation [2, 3]

1 Estimation and Beyond in the Bayes Universe

Hierarchical Models & Bayesian Model Selection

Bayes Factors in Practice Author(S): Robert E

Bayes Factor Consistency

Calibrated Bayes Factors for Model Selection and Model Averaging

The Bayes Factor

On the Derivation of the Bayesian Information Criterion

Bayesian Inference Chapter 5: Model Selection

Bayesian Alternatives for Common Null-Hypothesis Significance Tests

Bayesian Model Selection

A Tutorial on Conducting and Interpreting a Bayesian ANOVA in JASP