Likelihood Ratios: a Tutorial on Applications to Research in Psychology

Scott Glover
Royal Holloway University of London

RUNNING HEAD: LIKELIHOOD RATIOS

Address correspondence to:
Dr. Scott Glover
Dept. of Psychology
Royal Holloway University of London
Egham, Surrey, TW20 0EX
[email protected]

Abstract

Many in psychology view their choice of statistical approaches as being between frequentist and Bayesian. However, a third approach, the use of likelihood ratios, provides several distinct advantages over both the frequentist and Bayesian options. A brief explanation of the basic logic of likelihood ratios is provided, followed by a comparison of the likelihood-based approach to frequentist and Bayesian methods. The bulk of the paper provides examples with formulas for computing likelihood ratios based on t-scores, ANOVA outputs, chi-square statistics, and binomial data, as well as examples of using likelihood ratios to test models that make a priori predictions of effect sizes. Finally, advice on interpretation is offered.

Keywords: likelihood ratios, t-tests, ANOVA, chi-square, binomial

Introduction: What is a Likelihood Ratio?

A likelihood ratio is a statistic expressing the relative likelihood of the data given two competing models. The likelihood ratio, λ, can be written as

$$\lambda = \frac{f(X \mid \hat{\theta}_2)}{f(X \mid \hat{\theta}_1)} \qquad (1)$$

where f is the probability density, X is the vector of observations, and θ̂₁ and θ̂₂ are the vectors of parameter estimates that maximize the likelihood under the two models. Often, likelihood ratios involve comparing the likelihood of the data given a model based on the point estimate (also known as the “maximum likelihood estimate” or “MLE”) with the likelihood of the data given no effect (the null hypothesis). A “raw” likelihood ratio is simply the ratio of the densities of the data under those two models, as illustrated in Figure 1.
For example, a raw likelihood ratio of λ = 5 results when, at the same point, the density of the distribution based on the MLE is five times the density of the Ho distribution. This indicates that the data are five times as likely to occur given an effect based on the maximum likelihood estimate as given no effect (Goodman & Royall, 1988; Royall, 1997).

Figure 1. The raw likelihood ratio based on the maximum likelihood estimate (MLE). The grey curve shows the distribution based on the observations, which form the basis of the alternative hypothesis (Ha). The blank curve shows the distribution under the null hypothesis (Ho). The dotted and solid arrows show the densities of the distributions under the two hypotheses, and the raw likelihood ratio is the ratio of these two densities. In this example, the raw likelihood ratio is λ = 5.0 in favor of the alternative hypothesis over the null.

Adjusted Likelihood Ratios

In many circumstances, a raw likelihood ratio must be adjusted to reflect the different numbers of parameters in the models under consideration. In the typical case of determining whether an effect differs from zero, for example, the model based on the MLE will usually have one or more extra parameters relative to the null, and so will almost always provide a better fit to the data. Failing to adjust the likelihood ratio for unequal numbers of parameters would bias it towards the model with more parameters, a phenomenon known as “overfitting” (Burnham & Anderson, 2002). Applying a penalty to the model with more parameters yields an “adjusted” likelihood ratio, expressed as λ_adj. This tutorial includes instructions on how to calculate both raw (λ) and adjusted (λ_adj) likelihood ratios, and on when each is appropriate. For testing the null against some unspecified alternative model, the adjusted likelihood ratio is the appropriate statistic.
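The raw ratio in Equation (1) and Figure 1 can be sketched for the simplest possible case. The sketch below assumes, for illustration only (this is not one of the tutorial's own formulas), that the data are summarized by a sample mean with a known standard error, that Ha is a normal distribution centered on the MLE (the sample mean itself), and that Ho is a normal distribution with the same standard error centered on zero. It computes the raw λ only; no adjustment for the extra parameter is applied. The function names are hypothetical.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and SD, at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def raw_likelihood_ratio(observed_mean: float, standard_error: float) -> float:
    """Raw lambda = f(X | MLE) / f(X | Ho) for a normal model with known SE.

    Assumption for illustration: Ha is a normal curve centered on the MLE
    (the observed mean itself) and Ho is a normal curve centered on zero,
    both with the same standard error. Both densities are evaluated at the
    observed mean. No penalty for the extra parameter is applied.
    """
    density_ha = normal_pdf(observed_mean, observed_mean, standard_error)
    density_h0 = normal_pdf(observed_mean, 0.0, standard_error)
    return density_ha / density_h0


# An observed mean about 1.79 standard errors above zero gives a raw lambda
# of 5, matching the value used in the Figure 1 example.
observed = math.sqrt(2.0 * math.log(5.0))
print(round(raw_likelihood_ratio(observed, 1.0), 3))
```

Run as written, this prints 5.0. In this simplified setting the ratio reduces to exp(z²/2), where z is the observed mean divided by its standard error, so λ = 5 corresponds to z ≈ 1.79.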
A likelihood ratio may be used to compare the evidence for any two models, a property that gives this approach to data analysis great flexibility. For example, a likelihood ratio can be used to compare the fit of the null to a specific effect size predicted by a particular theory, or to compare two different-sized effects based on two different models' predictions, as will be described towards the end of this tutorial.

Relation to Other Approaches

Likelihoodism is one of three basic approaches to statistical analysis, the other two being frequentist and Bayesian. However, both the frequentist and Bayesian approaches are also based on likelihood, so likelihoodism shares some features with each while also differing from them in important ways. For example, a p-value is based on the probability of obtaining data at least as extreme as those observed if the null is true, and thus ignores the alternative model, whereas a likelihood ratio directly compares the relative evidence for two competing models. By treating the two models symmetrically, the likelihood ratio provides a clearer index of the strength of the evidence for or against an effect than does a p-value.

The Bayesian approach is similar to likelihoodism in that it also involves model comparison. Indeed, a Bayes factor is essentially a likelihood ratio in which each model's likelihood is averaged over a prior distribution of its parameter values. However, in contrast to a Bayesian, a likelihoodist eschews the use of a prior distribution to inform their analyses, focusing solely on the evidence provided by the data. The two philosophies thus differ in where subjectivity enters: the likelihoodist applies it at the end of the analysis.
That is, the likelihoodist decides what to believe based on the evidence in conjunction with their own intuitions about what may or may not be true, whereas the Bayesian attempts to mathematically formalize these prior beliefs in their statistical model. The objections of likelihoodists to the formalization of prior belief are detailed elsewhere (Edwards, 1972; Royall, 1997), and the interested reader is invited to consult these sources for a discussion of some of the conceptual and mathematical issues that make statistical modelling of one's prior beliefs unattractive to a likelihoodist.

As a parable comparing the three basic approaches to data analysis, imagine that three detectives are asked to investigate a murder with two possible suspects, Mr. Null and Ms. Alternative, and to report the outcome of their analysis. The first detective, a frequentist trained in null hypothesis significance testing, would examine only the evidence against Mr. Null, and if this evidence made it seem quite improbable that Mr. Null was guilty, the detective would infer that Ms. Alternative must have committed the foul deed (p < .05). A second detective, trained in the Bayesian method, would begin the investigation by assigning a prior probability to each suspect's guilt. They would do this as a matter of procedure, regardless of how much or how little information regarding the case they might have. If based on actual evidence, this prior probability might be weighted in favor of either Mr. Null or Ms. Alternative, and might under appropriate circumstances form a reasonable starting point. If based on no evidence, however (the “uninformative prior”), this prior probability might be neutral or biased, specific or vague. Regardless of how defensible their prior probability might be, the manner in which it is mathematically formalized will affect how the Bayesian detective ultimately presents the evidence.
Finally, the detective trained in likelihoodism would begin with no prior probabilities and would simply describe the evidence against both Mr. Null and Ms. Alternative, comparing the relative probability (likelihood) of each one's guilt. By examining the evidence against both suspects, without introducing any prior bias into their calculations, the likelihoodist detective would arguably give the most objective report of the three investigators regarding which suspect was more likely to be the culprit, based on the data alone. This objectivity (the fair and even appraisal of the two “suspects”) is in my view the core advantage of using likelihood ratios over the frequentist and Bayesian methods. The same objectivity applies when the evidence being appraised concerns two hypotheses or models rather than two suspects.

Mathematical relation between Likelihood Ratios and p-values

Despite using a different approach to model testing, likelihood ratios are typically closely related to p-values: a data set that gives a large likelihood ratio will also return a small p-value, and vice versa. In most prototypical hypothesis testing scenarios, an approximate transformation of a (two-tailed) p-value to an adjusted likelihood ratio is:

$$\lambda_{\mathrm{adj}} \approx \frac{1}{7.4\,p} \qquad (2)$$

As such, p = 0.05 will normally correspond to λ_adj ≈ 2.7, p = 0.01 to λ_adj ≈ 13.5, and p = 0.001 to λ_adj ≈ 135. Thus, p-values can also be viewed as describing the strength of the evidence, as noted by Fisher (1955), but they do so only indirectly, through their relation to likelihood (Dixon, 1998; Lew, 2013).

Computing Likelihood Ratios

A likelihood ratio can generally be computed from the same statistics used to compute a p-value. The remainder of this tutorial provides several examples of these calculations, including ones based on t-scores, ANOVA outputs, chi-square statistics, and binomial tests.
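Equation (2) is simple enough to check directly. The following minimal Python sketch (the function name is hypothetical, chosen for illustration) reproduces the three p-value correspondences given in the text. Like the equation itself, it is only a rough approximation for prototypical two-tailed tests, not an exact conversion.

```python
def approx_adjusted_lr_from_p(p: float) -> float:
    """Rough adjusted likelihood ratio from a two-tailed p-value.

    Implements the approximation lambda_adj ~= 1 / (7.4 * p) from
    Equation (2). It is a rule of thumb for prototypical tests only.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return 1.0 / (7.4 * p)


# Reproduce the three correspondences given in the text.
for p in (0.05, 0.01, 0.001):
    print(f"p = {p}: lambda_adj ~= {approx_adjusted_lr_from_p(p):.1f}")
```

The loop prints values of about 2.7, 13.5, and 135.1. Because the approximation is inversely proportional to p, each tenfold reduction in the p-value multiplies λ_adj by ten.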
A brief description is also provided of a model comparison application that does not rely on the maximum likelihood estimate; such comparisons would commonly be used to test two competing models that make more specific a priori predictions about the data than simply the presence or absence of an effect. Finally, I offer personal views on interpreting likelihood ratios and on the importance of methodological and statistical rigor in data collection and analysis. From here on, I recommend that interested readers experiment with likelihood ratios as they go through the tutorial, to get a feel for the statistic and how it relates to their intuitive sense of the data, as well as how it relates to other statistics they may have more experience with.
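The a priori model comparison mentioned above can be sketched in a simplified normal setting. The assumptions are illustrative, not the tutorial's own method: the data are summarized by a sample mean with a known standard error, and two hypothetical models each predict a specific effect size in advance. Because neither model estimates a parameter from the data, the two have the same number of free parameters, and the sketch reports a raw ratio with no penalty. The function names and numbers are hypothetical.

```python
import math


def normal_log_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution with the given mean and SD, at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def point_prediction_lr(observed_mean: float, standard_error: float,
                        predicted_a: float, predicted_b: float) -> float:
    """Likelihood ratio for model A's predicted effect over model B's.

    Both models fix their effect size in advance, so neither is fitted to
    the data and no penalty is applied. Working with log densities avoids
    underflow when the observed mean is far from both predictions.
    """
    log_ratio = (normal_log_pdf(observed_mean, predicted_a, standard_error)
                 - normal_log_pdf(observed_mean, predicted_b, standard_error))
    return math.exp(log_ratio)


# Hypothetical example: model A predicts an effect of 0.5, model B predicts
# 0.2, and the observed mean is 0.45 with a standard error of 0.1.
lam = point_prediction_lr(0.45, 0.1, predicted_a=0.5, predicted_b=0.2)
print(round(lam, 2))
```

With these hypothetical numbers the observed mean lies 0.5 standard errors from model A's prediction and 2.5 from model B's, so λ = exp((2.5² - 0.5²)/2) = e³ ≈ 20 in favor of model A. Setting predicted_b to 0.0 turns the same function into a comparison of a theory's predicted effect against the null.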
