ON THE USE OF RELATIVE LIKELIHOOD RATIOS IN

Charles G. Boncelet*, Lisa M. Marvel**, Michael E. Picollelli*

*University of Delaware, Newark, DE USA  **US Army Research Laboratory, Aberdeen Proving Ground, MD USA

CGB can be reached at [email protected], LMM at [email protected], and MEP at [email protected].

ABSTRACT

We ask the question, "How good are parameter estimates?" and offer criticism of confidence intervals as an answer. Instead we suggest the engineering community adopt a little-known idea, that of defining plausible intervals from the relative likelihood ratio. Plausible intervals answer the question, "What range of values could plausibly have given rise to the data we have seen?" We develop a simple theorem for computing plausible intervals for a wide variety of common distributions, including the Gaussian, exponential, and Poisson, among others.

1. INTRODUCTION

We consider a basic question of statistical analysis, "How good are the parameter estimates?" Most often this question is answered with confidence intervals. We argue that confidence intervals are flawed. We propose that a little-known alternative based on relative likelihood ratios be used. We term these "plausible intervals." We present a theorem on computing plausible intervals that applies to a wide range of distributions, including the Gaussian, exponential, and Chi-square. A similar result holds for the Poisson. Lastly, we show how to compute plausible regions for the Gaussian when both location and scale parameters are estimated.

This work is part of a larger effort to understand why statistical and signal processing procedures do not work as well in practice as theory indicates they should. Furthermore, we strive to develop better and more robust procedures. In this work, we begin to understand why "statistically significant" results are, upon further evaluation, not as reliable as thought. One reason is that confidence intervals are too small: they do not accurately reflect the range of possible inputs that could have given rise to the observed data.

This work is not defense specific, though there are numerous defense applications of statistical inference and signal processing. We expect this work to influence a wide range of procedures beyond simple parameter estimation. For instance, Kalman filtering, spectral estimation, and hypothesis testing are potential applications.

Standard statistical procedures including maximum likelihood estimates and confidence intervals can be found in many textbooks, including Bickel and Doksum [1] and Kendall and Stuart [4]. Relative likelihoods have received some attention in the statistical and epidemiological literature, but little attention in the engineering literature. The best reference on relative likelihood methods is the text by Sprott [7]. One engineering reference is a recent paper by Sander and Beyerer [6].

In this paper, we adopt a "frequentist" interpretation of probability and statistical inference. Bayesian statisticians adopt a different view, one with which we have some sympathy, but that view is not explored herein. For a recent discussion of Bayesian statistics, see Jaynes [3].

2. PRELIMINARIES: LIKELIHOOD FUNCTIONS, MAXIMUM LIKELIHOOD ESTIMATES, AND CONFIDENCE INTERVALS

Consider a common estimation problem: estimating the mean of a Gaussian distribution. Let X1, X2, ..., Xn be IID (independent and identically distributed) Gaussian random variables with mean µ and variance σ², i.e., Xi ∼ N(µ, σ²). For the moment, we assume we know σ² and seek to estimate µ. The density of each Xi is

    f(x_i; µ) = (2πσ²)^{−1/2} exp(−(x_i − µ)²/(2σ²))

The likelihood function of µ is

    L(µ; x_1^n) = f(x_1; µ) f(x_2; µ) ··· f(x_n; µ)
                = (2πσ²)^{−n/2} exp(−Σ_{i=1}^n (x_i − µ)²/(2σ²))

where we use the notation x_1^n = x_1, x_2, ..., x_n.

The maximum likelihood estimate (MLE) of µ is found by setting the first derivative of L(µ; x_1^n) to 0,

    0 = (d/dµ) L(µ; x_1^n) |_{µ=µ̂}

The calculations are somewhat easier if we find the maximum of the log-likelihood function,

    0 = (d/dµ) log L(µ; x_1^n) |_{µ=µ̂}
      = (d/dµ) ( −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^n (x_i − µ)² ) |_{µ=µ̂}
      = (1/σ²) ( Σ_{i=1}^n x_i − nµ̂ )

From which we conclude the MLE of µ is the sample mean,

    µ̂ = x̄_n = (1/n) Σ_{i=1}^n x_i

Just how good is the sample mean as an estimator of µ? A commonly used measure of goodness is the confidence interval. Find u(X_1^n) and v(X_1^n) such that

    Pr( u(X_1^n) ≤ µ ≤ v(X_1^n) ) ≥ 1 − α

for some α > 0. Normally, the confidence interval is selected as the minimal range v(X_1^n) − u(X_1^n). For the Gaussian example, the confidence interval is

    u(X_1^n) = µ̂ − cσ/√n
    v(X_1^n) = µ̂ + cσ/√n

where c = Φ^{−1}(1 − α/2) and Φ(·) is the Normal distribution function. In the usual case where α = 0.05, c = 1.96.
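As a concrete illustration of the preliminaries (ours, not from the original paper; the sample size, seed, and parameter values are arbitrary choices), a minimal Python sketch that draws a Gaussian sample, computes the sample-mean MLE, and forms the confidence interval above:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    mu, sigma, n, alpha = 3.0, 2.0, 25, 0.05

    x = rng.normal(mu, sigma, size=n)    # X_1, ..., X_n ~ N(mu, sigma^2), sigma known
    mu_hat = x.mean()                    # MLE of mu: the sample mean
    c = norm.ppf(1 - alpha / 2)          # c = Phi^{-1}(1 - alpha/2) = 1.96 for alpha = 0.05
    half = c * sigma / np.sqrt(n)        # half-width c*sigma/sqrt(n)
    print(mu_hat - half, mu_hat + half)  # the confidence interval (u, v)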

3. CRITICISMS OF CONFIDENCE INTERVALS

Many criticisms of confidence intervals have been raised. Here we list a few of them.

First, there is considerable confusion as to what a confidence interval actually represents. The standard interpretation goes something like this: Before the experiment is done, we agree to compute a sample average and a confidence interval (u, v) as above. Then the probability the interval will cover µ is 1 − α. Note, however, that after the experiment is done, the Xi have values, xi. Then µ̂, u, and v are numbers. Since µ is not considered to be random, we cannot even ask the question, "What is the probability µ is in the interval (u, v)?" µ is either in the interval or not, but the question is not within the realm of probability.

Second, the confidence interval contains probability mass 1 − α about the true mean only if µ = µ̂. In general µ ≠ µ̂, and the interval contains less mass:

    Φ(√n(v − µ)/σ) − Φ(√n(u − µ)/σ) < 1 − α

Third, the confidence interval is not too helpful at predicting future values either. For instance, consider the following change to the experiment: after observing n measurements, we compute a sample average and a confidence interval. Then we make another n measurements (independent of the first) and ask, what is the probability that the second sample mean is in the confidence interval? Let the second sample mean be denoted µ̂′. Then,

    Pr(u ≤ µ̂′ ≤ v) = Pr(µ̂ − cσ/√n ≤ µ̂′ ≤ µ̂ + cσ/√n)
                   = Pr(−cσ/√n ≤ µ̂′ − µ̂ ≤ cσ/√n)

Since µ̂ and µ̂′ are independent N(µ, σ²/n) random variables, their difference is N(0, 2σ²/n) and the probability is

    Φ(c/√2) − Φ(−c/√2) = 2Φ(c/√2) − 1

For example, when α = 0.05, c = 1.96 and the probability evaluates to only 0.834. To guarantee that µ̂′ is in the confidence interval with probability 1 − α, c must increase to 1.96√2 ≈ 2.77.

We regard the latter criticism as particularly damning. The usual reason to perform statistical analysis is to determine something about future observations. After all, the current observations are already known. Parameter estimates are often useful to the extent they help inform us about future observations. Confidence intervals are misleading indicators of future values.
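A quick numerical check of the 0.834 figure (our illustration, not from the paper): simulate many pairs of independent sample means and count how often the second falls inside the first one's confidence interval.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n, trials = 0.0, 1.0, 25, 200_000
    c = 1.96                             # Phi^{-1}(0.975)

    # Both sample means are N(mu, sigma^2/n) and independent, so draw them directly.
    mu_hat1 = rng.normal(mu, sigma / np.sqrt(n), size=trials)
    mu_hat2 = rng.normal(mu, sigma / np.sqrt(n), size=trials)

    covered = np.abs(mu_hat2 - mu_hat1) <= c * sigma / np.sqrt(n)
    print(covered.mean())                # ~0.834 = 2*Phi(1.96/sqrt(2)) - 1, well below 0.95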
4. RELATIVE LIKELIHOOD RATIO INTERVALS

We hope to revive an older idea that has received little attention in the engineering literature, relative likelihood ratios. A good reference is the text by Sprott [7]. The relative likelihood ratio is the following:

    R(θ; x_1^n) = L(θ; x_1^n) / sup_θ L(θ; x_1^n) = L(θ; x_1^n) / L(θ̂; x_1^n)

where θ represents the unknown parameter or parameters and θ̂ is the MLE of θ.

The relative likelihood ratio helps answer the question, "What values of θ could plausibly have given the data x_1^n that we observed?" The relative likelihood is useful after the experiment is run, while probabilities are most useful before the experiment is run.

As an example, we consider the Gaussian example above. The unknown parameter is θ = µ and the MLE is θ̂ = µ̂. Compare the relative likelihood ratio to a threshold:

    R(θ; x_1^n) = exp(−Σ_{i=1}^n (x_i − µ)²/(2σ²)) / exp(−Σ_{i=1}^n (x_i − µ̂)²/(2σ²)) ≥ α    (1)

After taking logs and simplifying, the relation becomes

    (µ − µ̂)² ≤ (2σ²/n) log(1/α)    (2)

Solving for µ gives a relative likelihood ratio interval, which we shall refer to as a plausible interval:

    µ̂ − √(2σ² log(1/α)/n) ≤ µ ≤ µ̂ + √(2σ² log(1/α)/n)

or

    µ̂ − cσ/√n ≤ µ ≤ µ̂ + cσ/√n    (3)

where now c = √(2 log(1/α)). When α = 0.05, c = 2.45. We see the plausible interval is bigger than the confidence interval; it is a more conservative measure. To reiterate, the plausible interval gives all values of θ that could have plausibly given rise to the data observed, where "plausibly" is measured by the ratio of the likelihood at θ to the maximum likelihood.

As another example, consider estimating the parameter of an exponential distribution. Let X1, X2, ..., Xn be IID exponential with density f(x) = λe^{−λx}. Then

    L(λ; x_1^n) = λ^n exp(−λ Σ_{i=1}^n x_i)

    λ̂ = n / Σ_{i=1}^n x_i = 1/x̄_n

    R(λ; x_1^n) = (λ/λ̂)^n exp(n(1 − λ/λ̂))

If we let γ = λ/λ̂ and compare the relative likelihood to α, we obtain an interesting result:

    γ e^{1−γ} ≥ α^{1/n}    (4)

This gives a semi-graphical way of determining the plausible interval, as demonstrated in Figure 1: compute α^{1/n} and find, graphically or numerically, the upper and lower values u and v at which γe^{1−γ} crosses that level. An essential feature of the exponential inference problem is the asymmetry of the upper and lower values of the plausible interval: the upper limit is much farther from γ = 1 than is the lower limit.

[Fig. 1: Graphical representation of the plausible interval for an exponential distribution — the curve γe^{1−γ} versus γ, with threshold level α^{1/n} and crossing points u and v.]
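Equation (4) can also be solved in closed form: γe^{1−γ} = t is equivalent to −γe^{−γ} = −t/e, so the two crossings are the two real branches of the Lambert W function. A sketch (ours; the function name is an arbitrary choice, and the semi-graphical method of Figure 1 is equivalent):

    import numpy as np
    from scipy.special import lambertw

    def exponential_plausible_interval(x, alpha=0.05):
        """Plausible interval for the rate lambda of an IID exponential sample x."""
        x = np.asarray(x, dtype=float)
        n = x.size
        lam_hat = n / x.sum()                # MLE: 1 / sample mean
        t = alpha ** (1.0 / n)               # threshold level alpha^(1/n) in (4)
        u = -lambertw(-t / np.e, k=0).real   # lower crossing, 0 < u < 1
        v = -lambertw(-t / np.e, k=-1).real  # upper crossing, v > 1 (note the asymmetry)
        return u * lam_hat, v * lam_hat      # map gamma = lambda/lambda_hat back to lambda

    rng = np.random.default_rng(2)
    print(exponential_plausible_interval(rng.exponential(scale=0.5, size=30)))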
As a third example of the utility of relative likelihoods, consider estimating the parameters of a multinomial distribution. Let X1, X2, ..., Xn be n IID multinomial random variables with parameters p1, p2, ..., pk (p1 + p2 + ··· + pk = 1), and let nj denote the number of observations falling in category j. Then

    L(p1, p2, ..., pk; x_1^n) = p1^{n1} p2^{n2} ··· pk^{nk}

    p̂j = nj/n  for j = 1, 2, ..., k

    R(p1, p2, ..., pk; x_1^n) = ( (p1/p̂1)^{p̂1} (p2/p̂2)^{p̂2} ··· (pk/p̂k)^{p̂k} )^n

After taking logs, multiplying by −1, and comparing to α, the relation reduces to

    −Σ_{j=1}^k p̂j log(pj/p̂j) = KL(p̂‖p) ≤ −(log α)/n    (5)

where KL(p̂‖p) is the Kullback–Leibler divergence between p̂ and p (Kullback and Leibler [5]). In words, the set of plausible p's are those with a Kullback–Leibler divergence from p̂ less than or equal to −log(α)/n.

Below we present a theorem on the calculation of plausible intervals for a wide class of exponential-type distributions.

Theorem 1. Let X1, X2, ..., Xn be IID with common density

    f(x) = C λ^k x^{k−1} e^{−d λ^l x^l}    (6)

where k and l are shape parameters, d is a convenience constant (e.g., for the Gaussian, d = 0.5), and C is a normalizing constant, and let x1, x2, ..., xn denote the corresponding observations. Then the maximum likelihood estimate of λ satisfies

    λ̂^l = nk / (d l Σ_{i=1}^n x_i^l)    (7)

and the relative likelihood constraint reduces to

    γ e^{1−γ} ≥ α^{l/nk}    (8)

where γ = (λ/λ̂)^l and α is the threshold level.

This theorem applies to a wide variety of distributions; some are listed below in Table 1. For example, the exponential distribution has k = 1, l = 1, and d = 1, and (8) reduces to (4), since l/k = 1. The theorem is easy to apply (a code sketch follows the steps):

1. Compute the MLE of λ using (7).
2. Compute α^{l/nk}.
3. Using graphical (Figure 1) or numerical methods, solve (8) for u and v.
4. Solve for the upper and lower bounds on λ using λ = λ̂u^{1/l} and λ = λ̂v^{1/l}.
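The four steps translate directly into code. A sketch under our reading of Theorem 1 (the function name and the brentq bracketing constants are our choices); with k = l = d = 1 it reproduces the exponential interval of (4):

    import numpy as np
    from scipy.optimize import brentq

    def plausible_interval(x, k=1.0, l=1.0, d=1.0, alpha=0.05):
        """Plausible interval for lambda in f(x) = C lam^k x^(k-1) exp(-d lam^l x^l)."""
        x = np.asarray(x, dtype=float)
        n = x.size
        lam_hat = (n * k / (d * l * np.sum(x ** l))) ** (1.0 / l)  # step 1: MLE, eq. (7)
        t = alpha ** (l / (n * k))                                 # step 2: threshold
        g = lambda gam: gam * np.exp(1.0 - gam) - t                # eq. (8) as a root problem
        u = brentq(g, 1e-12, 1.0)                                  # step 3: lower crossing
        v = brentq(g, 1.0, 1e3)                                    # step 3: upper crossing
        return lam_hat * u ** (1.0 / l), lam_hat * v ** (1.0 / l)  # step 4: back to lambda

    rng = np.random.default_rng(3)
    print(plausible_interval(rng.exponential(scale=0.5, size=40)))  # exponential: k = l = d = 1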

    Distribution             k      l    d
    Exponential              1      1    1
    Weibull (shape l)        k = l  l    1
    Erlang (shape k)         k      1    1
    Gaussian (known mean)    1      2    0.5
    Chi-square (k d.o.f.)    k/2    1    0.5
    Rayleigh                 2      2    0.5

Table 1: Some distributions that meet the conditions of Theorem 1.
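Returning to the multinomial example, (5) is a one-line computational test of whether a candidate p is plausible. A small sketch (ours; it assumes p_j > 0 wherever p̂_j > 0):

    import numpy as np

    def is_plausible_multinomial(counts, p, alpha=0.05):
        """Check KL(p_hat || p) <= -log(alpha)/n, the plausibility criterion (5)."""
        counts = np.asarray(counts, dtype=float)
        p = np.asarray(p, dtype=float)
        n = counts.sum()
        p_hat = counts / n
        nz = p_hat > 0                     # empty categories contribute zero to the sum
        kl = np.sum(p_hat[nz] * np.log(p_hat[nz] / p[nz]))
        return kl <= -np.log(alpha) / n

    print(is_plausible_multinomial([18, 12, 10], [1/3, 1/3, 1/3]))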

While the Poisson does not fit the conditions of the theorem, its probability mass function is sufficiently close that the same general conclusions apply. Let X1, X2, ..., Xn be n IID Poisson random variables with parameter λ. Then,

    Pr(Xi = xi) = (λ^{xi} / xi!) e^{−λ}

    L(λ; x_1^n) = (λ^{Σ_{i=1}^n xi} / Π_{i=1}^n xi!) e^{−nλ}

    λ̂ = (1/n) Σ_{i=1}^n xi = x̄_n

    R(λ; x_1^n) = (λ/λ̂)^{nλ̂} e^{nλ̂(1 − λ/λ̂)}

Letting γ = λ/λ̂ and comparing to α results in

    γ e^{1−γ} ≥ α^{1/(nλ̂)}    (9)

The only difference for the Poisson is that α is raised to a data-dependent power, 1/(nλ̂) = 1/(x1 + x2 + ··· + xn).
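In code, the Poisson case is the same root-finding problem with a data-dependent exponent (a sketch, ours; it assumes at least one nonzero count so that λ̂ > 0):

    import numpy as np
    from scipy.optimize import brentq

    def poisson_plausible_interval(x, alpha=0.05):
        """Plausible interval for the Poisson rate lambda, from (9)."""
        x = np.asarray(x, dtype=float)
        lam_hat = x.mean()                       # MLE: the sample mean
        t = alpha ** (1.0 / x.sum())             # alpha^(1/(n*lam_hat)) = alpha^(1/sum(x))
        g = lambda gam: gam * np.exp(1.0 - gam) - t
        u = brentq(g, 1e-12, 1.0)                # lower crossing
        v = brentq(g, 1.0, 1e3)                  # upper crossing
        return u * lam_hat, v * lam_hat          # gamma = lambda/lambda_hat

    rng = np.random.default_rng(4)
    print(poisson_plausible_interval(rng.poisson(lam=3.0, size=50)))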

5. LINEAR REGRESSION WITH GAUSSIAN NOISE

In this section, we consider the linear regression problem with additive Gaussian noise of unknown variance. Let the regression problem be y = Xβ + ε where ε ∼ N(0, σ²I). The MLE of β is

    β̂ = (XᵀX)^{−1} Xᵀ y

The MLE of σ² is

    σ̂² = (y − Xβ̂)ᵀ(y − Xβ̂)/n = (1/n) ‖y − Xβ̂‖²
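In code, both MLEs are one least-squares solve (a minimal numpy sketch, ours; the dimensions and noise level are arbitrary, and lstsq is used instead of forming (XᵀX)^{−1} explicitly because it is numerically safer):

    import numpy as np

    rng = np.random.default_rng(5)
    n, p = 100, 3
    X = rng.normal(size=(n, p))
    beta_true = np.array([1.0, -2.0, 0.5])
    y = X @ beta_true + rng.normal(scale=0.7, size=n)  # eps ~ N(0, sigma^2 I)

    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # MLE of beta
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / n                     # MLE of sigma^2 (divide by n, not n-p)
    print(beta_hat, sigma2_hat)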

Define two derived quantities:

    δ² = (β − β̂)ᵀ XᵀX (β − β̂) / (n σ̂²)    (10)

    γ = σ̂² / σ²    (11)

Using γ and δ², the relative likelihood can be written as

    R(β, σ; y, X) = γ^{n/2} exp( (n/2)(1 − γ − γδ²) )

Comparing this to α and simplifying results in the following:

    γ e^{1−γ−γδ²} ≥ α^{2/n}    (12)

Since the Gaussian depends on two parameters, the relative likelihood depends on two parameters: δ² represents the error in the location estimate and γ the error in the scale estimate. When compared to α, we obtain plausible regions, a generalization to higher dimensions of plausible intervals.

The plausible region is defined by (12). When δ² = 0, (12) reduces to (8); when γ = 1, (12) reduces to (2).

The function z = γ exp(1 − γ − γδ²) is shown in Figure 2. The plausible location error δ² is large when γ < 1, i.e., when σ̂² < σ².

[Fig. 2: A contour plot of location error, δ², versus scale error, γ, for the Gaussian, with z = γ exp(1 − γ − γδ²).]

When γ = 1, the boundary of (12) gives δ² = −log(α^{2/n}). It is interesting to ask what the maximum value of δ² is when γ is allowed to vary (i.e., when the variance is also unknown). One can solve the following maximization problem:

    max over δ², γ of δ²  such that  γ e^{1−γ−γδ²} ≥ α^{2/n}

The solution is γ = α^{2/n} and δ² = α^{−2/n} − 1. That the maximum value of δ² is larger when γ is allowed to vary is a restatement of the familiar log inequality, x − 1 ≥ log(x).
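A sketch of the region test (12) and of the two worst-case values of δ² derived above (ours; the function name is an arbitrary choice):

    import numpy as np

    def in_plausible_region(beta, beta_hat, sigma2, sigma2_hat, X, alpha=0.05):
        """Test whether (beta, sigma^2) satisfies the plausible-region criterion (12)."""
        n = X.shape[0]
        diff = beta - beta_hat
        delta2 = diff @ (X.T @ X) @ diff / (n * sigma2_hat)  # location error, eq. (10)
        gamma = sigma2_hat / sigma2                          # scale error, eq. (11)
        return gamma * np.exp(1.0 - gamma - gamma * delta2) >= alpha ** (2.0 / n)

    n, alpha = 100, 0.05
    print(-np.log(alpha ** (2.0 / n)))  # max delta^2 with gamma fixed at 1
    print(alpha ** (-2.0 / n) - 1.0)    # max delta^2 with gamma free (always larger)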

6. CONCLUSION

We argue that confidence intervals are flawed and do not accurately represent the range of possible parameter values. Better, we argue, are plausible intervals based on relative likelihood ratios. Plausible intervals answer the question, "What range of parameter values could plausibly have given rise to the data we have observed?" Unlike probabilities, likelihood ratios are informative after the experiment is run.

We have presented a theorem that provides a simple sequence of steps to compute the plausible interval for a wide range of distributions, including the Gaussian, exponential, Weibull, Chi-squared, Erlang, and Poisson. We have extended the theorem to the Gaussian when both location and scale parameters are unknown. The relative likelihood for the multinomial distribution reduces to the Kullback-Leibler divergence.

This work is part of a larger effort to understand why statistical and signal processing procedures often do not work as well in practice as the theory indicates and to develop better, more robust, procedures. Specific work to follow includes more analysis of the Gaussian case, including applications such as Kalman filtering, autoregressive fitting, and spectral analysis. Also, we need to extend Theorem 1 to a wider class of distributions. We believe that plausible intervals can replace the need for techniques such as the bootstrap and the jackknife (Efron [2]). Rather than creating new data, we can use plausible intervals to explore the space of plausible inputs.

7. REFERENCES

[1] P. J. Bickel and K. A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day Series in Probability and Statistics. Holden-Day, Inc., San Francisco, 1977.

[2] B. Efron. The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia, PA, 1982.

[3] E. T. Jaynes. Probability Theory: The Logic of Science. Cambridge University Press, 2003.

[4] M. Kendall and A. Stuart. The Advanced Theory of Statistics, volume 1. Charles Griffin & Company Limited, London, 1977.

[5] S. Kullback and R. A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.

[6] J. Sander and J. Beyerer. Decreased complexity and increased problem specificity of Bayesian fusion by local approaches. In 11th International Conference on Information Fusion, pages 1036–1042, June 2008.

[7] D. A. Sprott. Statistical Inference in Science. Springer Series in Statistics. Springer, 2000.