Likelihood Ratios: a Simple and Flexible Statistic for Empirical Psychologists

Psychonomic Bulletin & Review 2004, 11 (5), 791-806 Likelihood ratios: A simple and flexible statistic for empirical psychologists SCOTT GLOVER Pennsylvania State University, University Park, Pennsylvania and PETER DIXON University of Alberta, Edmonton, Alberta, Canada Empirical studies in psychology typically employ null hypothesis significance testing to draw statistical inferences. We propose that likelihood ratios are a more straightforward alternative to this approach. Likelihood ratios provide a measure of the fit of two competing models; the statistic represents a direct comparison of the relative likelihood of the data, given the best fit of the two models. Likelihood ratios offer an intuitive, easily interpretable statistic that allows the researcher great flexibility in framing empirical arguments. In support of this position, we report the results of a survey of empirical articles in psychology, in which the common uses of statistics by empirical psychologists is examined. From the results of this survey, we show that likelihood ratios are able to serve all the im- portant statistical needs of researchers in empirical psychology in a format that is more straightforward and easier to interpret than traditional inferential statistics. A likelihood ratio statistic reflects the relative likeli- LIKELIHOOD RATIOS hood of the data, given two competing models. Likelihood ratios provide an intuitive approach to summarizing the A likelihood ratio can be thought of as a comparison of evidence provided by an experiment. Because they de- two statistical models of the observed data. Each model scribe evidence, rather than embody a decision, they can provides a probability density for the observations and a easily be adapted to the various goals for which inferential set of unknown parameters that can be estimated from the statistics might be used. In contrast, the logic of null hy- data. In a broad range of common situations, the density pothesis significance testing is often at odds with the goals is the multivariate normal distribution, and the param- of the researcher, and that approach, despite its common eters are the means in the various conditions, together usage, is generally ill suited for the varied purposes to with the error variance. The two models differ in terms which it is put. of the constraints on the condition means. For example, We develop our thesis in three sections. In the first sec- a model in which two condition means differ might be tion, we provide an introduction to the use of likelihood ra- compared with one in which the means are identical. The tios as model comparisons and describe how they relate to match of each model to the observations can be indexed more traditional statistics. In the second section, we report by calculating the likelihood of the data, given the best the results of a small survey in which we identify some of estimates of the model parameters: The more likely the the common goals of significance testing in empirical psy- data are, the better the match. In this case, the best pa- chology. In the third section, we describe how likelihood rameter estimates are those that maximize the likelihood ratios can be used to achieve each of these goals more di- of the data, which are termed maximum-likelihood esti- rectly than traditional significance tests. mates. The ratio of two such likelihoods is the maximum likelihood ratio; it provides an index of the relative match of the two models to the observed data. Formally, the likelihood ratio can be written as The present work was supported by the Natural Sciences and Engi- ˆ f ()X |q2 neering Research Council of Canada through a fellowship to the first l = , (1) author and a grant to the second author. The authors thank Michael Lee, f X |qˆ Geoff Loftus, and an anonymous reviewer for their insightful comments ()1 on previous versions of this article. Correspondence concerning this ar- where f is the probability density, X is the vector of ob- ticle should be addressed to S. Glover, Department of Psychology, ˆ ˆ Royal Holloway University of London, Egham, Surrey TW20 0EX, servations, andq1 andq 2 are the vectors of parameter esti- England (e-mail: [email protected]). mates that maximize the likelihood under the two models. 791 Copyright 2004 Psychonomic Society, Inc. 792 GLOVER AND DIXON Figure 1. Comparison of linear (middle panel) and quadratic (right panel) model fits of a theoretical data set. Error bars indicate the standard deviations of the observations under the best-fitting model; for clar- ity, only six are shown. Viewing statistical inference as model comparison is a described by the correlation between the predicted and the well-established perspective. Judd and McClelland (1989), observed values, and the value of R2 for each model is in- for example, organized an entire introductory textbook dicated in the figure. The standard deviation shown by the around this approach. Furthermore, the use of likelihood error bars is an index of the residual variance that is not ex- ratios in statistical inference is common (e.g., Edwards, plained by the model and is proportional to ÷(1ϪR2). The 1972; Royall, 1997), and the role of likelihood in model error bars are shown on the curve to indicate that the esti- comparison is well established (e.g., Akaike, 1973; mate depends on which model is being fit (cf. Estes, 1997). Schwartz, 1978). Furthermore, the likelihood ratio plays a It appears from Figure 1 that the data are more likely pivotal role in most approaches to hypothesis testing. In given the best-fitting quadratic model on the right than Bayesian hypothesis testing, the posterior odds of two hy- given the best-fitting linear model. This is reflected in potheses are related to the products of the likelihood ratio the smaller deviations from the predicted values and the and the prior odds (e.g., Sivia, 1996). In the decision- larger value of R2. As a consequence, 1ϪR2 and the es- theoretic approach advocated by Neyman and Pearson timate of the standard error are smaller for the quadratic (1928, 1933), any suitable decision rule can be viewed as model than for the linear model. In fact, the likelihood is a decision based on the value of the likelihood ratio. related to the inverse of the standard deviation. The ratio Fisher (1955) advocated the use of the log likelihood ratio of the likelihoods thus indexes the relative quality of the as an index of the evidence against a null hypothesis. In two fits, and with normally distributed data, one can all of these approaches, the likelihood ratio represents the write the likelihood ratio as evidence provided by the data with respect to two mod- nn n els. Although the form of the likelihood ratio varies to Ê 1 s2 ˆ 2 Ê s2 ˆ 2 Ê1- R2 ˆ 2 l = 2 = 1 = 1 , (2) some extent in these different approaches, they all have a Á 2 ˜ Á 2 ˜ Á 2 ˜ Ë 1 s ¯ Ë s ¯ Ë1- R ¯ common conceptual basis. 1 2 2 Consider the hypothetical data shown in Figure 1. The where s1 and s2 are the two estimates of the standard de- 2 2 data points in each panel represent the effects of an inde- viation, n is the number of observations, and R1 and R2 pendent variable X on a dependent variable Y. The straight describe the quality of the fits of the two models. (A line in the middle panel indicates the best fitting linear proof is provided in Appendix A.) In this example, the R2 model of the 21 observations, whereas the line in the right values are .689 and .837, and the value of the likelihood panel shows the fit of a more complicated model that in- ratio is 862.6. In simple terms, this means that the data cludes a quadratic component. The fit of the models can be are 862.6 times as likely to occur if the second model LIKELIHOOD RATIOS 793 (and its best-fitting parameter values) is true than if the searcher is indifferent a priori. However, a likelihood first model (and its best-fitting parameter values) is true. ratio of that magnitude could not be regarded as very The 1ϪR2 terms correspond to residual variation that strong evidence one way or the other and should not be is not explained by each model. Thus, an equivalent de- persuasive if there were other reasons to prefer Model 1. scription of the likelihood is A comparable but somewhat different index can be de- n rived using principles of Bayesian inference. The alter- Ê Model 1 unexplained variationˆ 2 native index is referred to as the BIC (Schwartz, 1978), l = Á ˜ . (3) Ë Model 2 unexplained variation¯ and is defined as This conceptual formula provides a straightforward ap- BIC = -2ln(lkn )+ ln( ), (7) proach to calculating likelihood ratios for many kinds of where n is the sample size. Using this criterion, Model 2 model comparisons. In particular, when the models in should be preferred over Model 1 when l ϭ Q l is question are linear, the requisite information about un- B B greater than 1, where Q is explained variation can typically be found in a factorial B analysis of variance (ANOVA) table. ln(n ) 2 QkkB = []exp()12- . (8) Correcting for Model Complexity In this case, lB can be viewed as an estimate of the Bayes- Although arguments based on this form of the likeli- ian posterior odds of the two models, assuming uninfor- hood ratio are sometimes possible, there is an obvious mative priors. problem apparent in this example: Because the quadratic Pitt, Myung, and Zhang (2002) have argued convinc- model includes an extra parameter, it will always fit the ingly that the number of parameters as captured by these data better than the linear model, regardless of what re- indices is only one aspect of model flexibility and that sults are obtained.

Load more