On the Use of Relative Likelihood Ratios in Statistical Inference
ON THE USE OF RELATIVE LIKELIHOOD RATIOS IN STATISTICAL INFERENCE

Charles G. Boncelet*, Lisa M. Marvel**, Michael E. Picollelli*
*University of Delaware, Newark DE USA
**US Army Research Laboratory, APG Aberdeen MD USA

ABSTRACT

We ask the question, “How good are parameter estimates?” and offer criticism of confidence intervals as an answer. Instead, we suggest the engineering community adopt a little-known idea: defining plausible intervals from the relative likelihood ratio. Plausible intervals answer the question, “What range of values could plausibly have given rise to the data we have seen?” We develop a simple theorem for computing plausible intervals for a wide variety of common distributions, including the Gaussian, exponential, and Poisson, among others.

1. INTRODUCTION

We consider a basic question of statistical analysis: “How good are the parameter estimates?” Most often this question is answered with confidence intervals. We argue that confidence intervals are flawed, and we propose that a little-known alternative based on relative likelihood ratios be used instead. We term the resulting intervals “plausible intervals.” We present a theorem on computing plausible intervals that applies to a wide range of distributions, including the Gaussian, exponential, and chi-square. A similar result holds for the Poisson. Lastly, we show how to compute plausible regions for the Gaussian when both location and scale parameters are estimated.

This work is part of a larger effort to understand why statistical and signal processing procedures do not work as well in practice as theory indicates they should. Furthermore, we strive to develop better and more robust procedures. In this work, we begin to understand why “statistically significant” results are, upon further evaluation, not as reliable as thought. One reason is that confidence intervals are too small: they do not accurately reflect the range of possible inputs that could have given rise to the observed data.

This work is not defense specific, though there are numerous defense applications of statistical inference and signal processing. We expect this work to influence a wide range of procedures beyond simple parameter estimation. For instance, Kalman filtering, spectral estimation, and hypothesis testing are potential applications.

Standard statistical procedures, including maximum likelihood estimates and confidence intervals, can be found in many textbooks, including Bickel and Doksum [1] and Kendall and Stuart [4]. Relative likelihoods have received some attention in the statistics and epidemiological literature, but little attention in the engineering literature. The best reference on relative likelihood methods is the text by Sprott [7]. One engineering reference is a recent paper by Sander and Beyerer [6].

In this paper, we adopt a “frequentist” interpretation of probability and statistical inference. Bayesian statisticians adopt a different view, one with which we have some sympathy, but that view is not explored herein. For a recent discussion of Bayesian statistics, see Jaynes [3].

2. PRELIMINARIES: LIKELIHOOD FUNCTIONS, MAXIMUM LIKELIHOOD ESTIMATES, AND CONFIDENCE INTERVALS

Consider a common estimation problem: estimating the mean of a Gaussian distribution. Let X1, X2, ..., Xn be IID (independent and identically distributed) Gaussian random variables with mean µ and variance σ², i.e., Xi ∼ N(µ, σ²). For the moment, we assume we know σ² and seek to estimate µ. The density of each Xi is

    f(x_i; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)

The likelihood function of µ is

    L(\mu; x_1^n) = f(x_1; \mu) f(x_2; \mu) \cdots f(x_n; \mu)
                  = (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2} \right)

where we use the notation x_1^n = x_1, x_2, ..., x_n.

The maximum likelihood estimate (MLE) of µ is found by setting the first derivative of L(µ; x_1^n) with respect to µ to zero.
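As a quick numerical sanity check of the likelihood function above, one can evaluate the Gaussian log-likelihood on a grid of candidate means and locate its maximizer. The sketch below is our own illustration (the data values and function names are ours, not the paper's):

```python
import math

def gaussian_log_likelihood(xs, mu, sigma2):
    """log L(mu; x_1^n) for IID N(mu, sigma2) observations xs."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -(n / 2) * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

# Illustrative data; sigma^2 is assumed known, as in the text.
xs = [4.8, 5.1, 5.3, 4.9, 5.4]
sigma2 = 1.0

# Grid search for the maximizer of the log-likelihood over [4.0, 6.0].
grid = [4.0 + 0.001 * k for k in range(2001)]
best = max(grid, key=lambda mu: gaussian_log_likelihood(xs, mu, sigma2))

# The grid maximizer agrees with the sample mean to grid resolution,
# matching the closed-form MLE derived in the text.
sample_mean = sum(xs) / len(xs)
assert abs(best - sample_mean) <= 0.001
```

Nothing here depends on the derivative calculation that follows; it is only a check that the likelihood behaves as expected.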
Setting the derivative of the likelihood to zero at the maximizer µ̂,

    0 = \frac{d}{d\mu} L(\mu; x_1^n) \Big|_{\mu = \hat\mu}

The calculations are somewhat easier if we find the maximum of the log-likelihood function:

    0 = \frac{d}{d\mu} \log L(\mu; x_1^n) \Big|_{\mu = \hat\mu}
      = \frac{d}{d\mu} \left( -\frac{n}{2} \log 2\pi\sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right) \Big|_{\mu = \hat\mu}
      = \frac{1}{\sigma^2} \left( \sum_{i=1}^n x_i - n\hat\mu \right)

from which we conclude that the MLE of µ is the sample mean,

    \hat\mu = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i

Just how good is the sample mean as an estimator of µ? A commonly used measure of goodness is the confidence interval. Find u(X_1^n) and v(X_1^n) such that

    \Pr\left( u(X_1^n) \le \mu \le v(X_1^n) \right) \ge 1 - \alpha

for some α > 0. Normally, the confidence interval is selected as the minimal range v(X_1^n) − u(X_1^n). For the Gaussian example, the confidence interval is

    u(X_1^n) = \hat\mu - \frac{c\sigma}{\sqrt{n}}, \qquad
    v(X_1^n) = \hat\mu + \frac{c\sigma}{\sqrt{n}}

where c = Φ⁻¹(1 − α/2) and Φ(·) is the Normal distribution function. In the usual case where α = 0.05, c = 1.96.

3. CRITICISMS OF CONFIDENCE INTERVALS

Many criticisms of confidence intervals have been raised. Here we list a few of them:

There is considerable confusion as to what a confidence interval actually represents. The standard interpretation goes something like this: Before the experiment is done, we agree to compute a sample average and a confidence interval (u, v) as above. Then the probability that the interval will cover µ is 1 − α. Note, however, that after the experiment is done, the Xi have values, xi. Then µ̂, u, and v are numbers. Since µ is not considered to be random, we cannot even ask the question, “What is the probability µ is in the interval (u, v)?” µ is either in the interval or not, but the question is not within the realm of probability.

The confidence interval contains probability mass 1 − α about the true mean only if µ = µ̂. In general µ ≠ µ̂, and the interval contains less mass:

    \Phi\!\left( \frac{v - \mu}{\sigma/\sqrt{n}} \right) - \Phi\!\left( \frac{u - \mu}{\sigma/\sqrt{n}} \right) < 1 - \alpha

The confidence interval is not too helpful at predicting future values either. For instance, consider the following change to the experiment: after observing n measurements, we compute a sample average and a confidence interval. Then we make another n measurements (independent of the first) and ask: what is the probability that the second sample mean is in the confidence interval? Let the second sample mean be denoted µ̂′. Then

    \Pr\left( u \le \hat\mu' \le v \right)
      = \Pr\left( \hat\mu - \frac{c\sigma}{\sqrt{n}} \le \hat\mu' \le \hat\mu + \frac{c\sigma}{\sqrt{n}} \right)
      = \Pr\left( -\frac{c\sigma}{\sqrt{n}} \le \hat\mu' - \hat\mu \le \frac{c\sigma}{\sqrt{n}} \right)

Since µ̂ and µ̂′ are independent N(µ, σ²/n) random variables, the probability is

    \Phi(c/\sqrt{2}) - \Phi(-c/\sqrt{2}) = 2\Phi(c/\sqrt{2}) - 1

For example, when α = 0.05, c = 1.96 and the probability evaluates to only 0.834. To guarantee that µ̂′ is in the confidence interval with probability 1 − α, c must increase to 1.96√2 ≈ 2.77.

We regard the latter criticism as particularly damning. The usual reason to perform statistical analysis is to determine something about future observations. After all, the current observations are already known. Parameter estimates are often useful to the extent they help inform us about future observations. Confidence intervals are misleading indicators of future values.

4. RELATIVE LIKELIHOOD RATIO INTERVALS

We hope to revive an older idea that has received little attention in the engineering literature: relative likelihood ratios. A good reference is the text by Sprott [7]. The relative likelihood ratio is the following:

    R(\theta; x_1^n) = \frac{L(\theta; x_1^n)}{\sup_\theta L(\theta; x_1^n)} = \frac{L(\theta; x_1^n)}{L(\hat\theta; x_1^n)}

where θ represents the unknown parameter or parameters and θ̂ is the MLE of θ.

The relative likelihood ratio helps answer the question, “What values of θ could plausibly have given the data x_1^n that we observed?” The relative likelihood is useful after the experiment is run, while probabilities are most useful before the experiment is run.

As an example, we consider the Gaussian example above. The unknown parameter is θ = µ and the MLE is θ̂ = µ̂. Compare the relative likelihood ratio to a threshold:

    R(\theta; x_1^n) = \frac{\exp\left( -\sum_{i=1}^n (x_i - \mu)^2 / 2\sigma^2 \right)}{\exp\left( -\sum_{i=1}^n (x_i - \hat\mu)^2 / 2\sigma^2 \right)} \ge \alpha        (1)

After taking logs and simplifying, the relation becomes

    (\mu - \hat\mu)^2 \le \frac{2\sigma^2}{n} \log(1/\alpha)        (2)

Solving for µ gives a relative likelihood ratio interval, which we shall refer to as a plausible interval:

    \hat\mu - \sqrt{\frac{2\sigma^2 \log(1/\alpha)}{n}} \le \mu \le \hat\mu + \sqrt{\frac{2\sigma^2 \log(1/\alpha)}{n}}

or, with c = \sqrt{2 \log(1/\alpha)},

    \hat\mu - \frac{c\sigma}{\sqrt{n}} \le \mu \le \hat\mu + \frac{c\sigma}{\sqrt{n}}        (3)

When α = 0.05, c = 2.45. We see the plausible interval is bigger than the confidence interval; it is a more conservative measure. To reiterate, the plausible interval gives all values of θ that could have plausibly given rise to the data observed, where “plausibly” is measured by the ratio of the likelihood at θ to the maximum likelihood.

As another example, consider estimating the parameter in an exponential distribution.

[Fig. 1: Graphical representation of the plausible interval for an exponential distribution. The curve γe^{1−γ} is plotted against γ; the plausible interval endpoints u and v occur where the curve crosses the level α^{1/n}.]

After taking logs, multiplying by −1, and comparing to α, the relation reduces to

    -\sum_{j=1}^k \hat{p}_j \log \frac{p_j}{\hat{p}_j} = KL(\hat{p} \,\|\, p) \le -\frac{\log \alpha}{n}        (5)

(CGB can be reached at [email protected], LMM at [email protected], and MEP at [email protected].)
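The numerical constants quoted above are easy to reproduce. The sketch below (our own, not code from the paper) computes the confidence-interval constant c = Φ⁻¹(1 − α/2), the plausible-interval constant c = √(2 log(1/α)) from Eq. (3), and the probability 2Φ(c/√2) − 1 from Section 3 that a second, independent sample mean lands in the first confidence interval:

```python
from math import sqrt, log
from statistics import NormalDist

std_normal = NormalDist()            # Phi and its inverse
alpha = 0.05

# Confidence-interval constant: c = Phi^{-1}(1 - alpha/2).
c_conf = std_normal.inv_cdf(1 - alpha / 2)        # about 1.96

# Plausible-interval constant from Eq. (3): c = sqrt(2 log(1/alpha)).
c_plaus = sqrt(2 * log(1 / alpha))                # about 2.45

# The plausible interval is wider (more conservative) than the
# confidence interval.
assert c_plaus > c_conf

# Probability that a second, independent sample mean falls inside the
# first confidence interval: 2 Phi(c / sqrt(2)) - 1 (Section 3).
coverage = 2 * std_normal.cdf(c_conf / sqrt(2)) - 1   # about 0.834

# To cover the second sample mean with probability 1 - alpha,
# c must grow by a factor of sqrt(2), to about 2.77.
c_future = c_conf * sqrt(2)
```

This makes the comparison in Section 4 concrete: both intervals have the form µ̂ ± cσ/√n, and only the constant c differs.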