
8 The Likelihood Ratio Test

8.1 The likelihood ratio

We often want to test hypotheses in situations where the adopted probability model involves several unknown parameters. Thus we may denote an element of the parameter space by

\[
\theta = (\theta_1, \theta_2, \ldots, \theta_k).
\]
Some of these parameters may be nuisance parameters (e.g. testing hypotheses on the unknown mean of a normal distribution with unknown variance, where the variance is regarded as a nuisance parameter). We use the likelihood ratio, $\lambda(x)$, defined as

\[
\lambda(x) = \frac{\sup\{L(\theta; x) : \theta \in \Theta_0\}}{\sup\{L(\theta; x) : \theta \in \Theta\}}, \qquad x \in \mathbb{R}^n.
\]

The informal argument for this is as follows.

For a realisation x, determine its best chance of occurrence under H0 and also its best chance overall. The ratio of these two chances can never exceed unity, but, if small, would constitute evidence for rejection of the null hypothesis.

A likelihood ratio test for testing H0 : θ ∈ Θ0 against H1 : θ ∈ Θ1 is a test with critical region of the form

\[
C_1 = \{x : \lambda(x) \le k\},
\]
where $k$ is a real number between 0 and 1. Clearly the test will be at significance level $\alpha$ if $k$ can be chosen to satisfy

\[
\sup_{\theta \in \Theta_0} P(\lambda(X) \le k; \theta) = \alpha.
\]

If H0 is a simple hypothesis with Θ0 = {θ0}, we have the simpler form

\[
P(\lambda(X) \le k; \theta_0) = \alpha.
\]

To determine $k$, we must look at the c.d.f. of the statistic $\lambda(X)$, where the random vector $X$ has joint p.d.f. $f_X(x; \theta_0)$.

Example Exponential distribution

Let $X = (X_1, \ldots, X_n)$ be a random sample from an exponential distribution with parameter $\theta$.

Test $H_0 : \theta = \theta_0$ against $H_1 : \theta > \theta_0$.

Here $\Theta_0 = \{\theta_0\}$ and $\Theta_1 = (\theta_0, \infty)$, so $\Theta = [\theta_0, \infty)$. The likelihood is

\[
L(\theta; x) = \prod_{i=1}^{n} f(x_i; \theta) = \theta^n e^{-\theta \sum_i x_i}.
\]
The numerator of the likelihood ratio is

\[
L(\theta_0; x) = \theta_0^n e^{-n\theta_0 \bar{x}}.
\]

We need to find the supremum as $\theta$ ranges over the interval $[\theta_0, \infty)$. Now

\[
l(\theta; x) = n \log \theta - n\theta\bar{x}
\]
so that
\[
\frac{\partial l(\theta; x)}{\partial \theta} = \frac{n}{\theta} - n\bar{x},
\]
which is zero only when $\theta = 1/\bar{x}$. Since $L(\theta; x)$ is an increasing function for $\theta < 1/\bar{x}$ and decreasing for $\theta > 1/\bar{x}$,
\[
\sup\{L(\theta; x) : \theta \in \Theta\} =
\begin{cases}
\bar{x}^{-n} e^{-n}, & \text{if } 1/\bar{x} \ge \theta_0, \\
\theta_0^n e^{-n\theta_0 \bar{x}}, & \text{if } 1/\bar{x} < \theta_0.
\end{cases}
\]


[Figure: sketches of $L(\theta; x)$ against $\theta$, marking $\sup\{L(\theta; x) : \theta \in \Theta\}$ in the two cases $1/\bar{x} \ge \theta_0$ and $1/\bar{x} < \theta_0$.]
Hence
\[
\lambda(x) =
\begin{cases}
\dfrac{\theta_0^n e^{-n\theta_0\bar{x}}}{\bar{x}^{-n} e^{-n}}, & 1/\bar{x} \ge \theta_0, \\[1.5ex]
1, & 1/\bar{x} < \theta_0,
\end{cases}
\;=\;
\begin{cases}
\theta_0^n \bar{x}^n e^{-n\theta_0\bar{x}} e^{n}, & 1/\bar{x} \ge \theta_0, \\
1, & 1/\bar{x} < \theta_0.
\end{cases}
\]
Since
\[
\frac{d}{d\bar{x}}\left(\bar{x}^n e^{-n\theta_0\bar{x}}\right) = n\bar{x}^{n-1} e^{-n\theta_0\bar{x}}(1 - \theta_0\bar{x})
\]
is positive for values of $\bar{x}$ between 0 and $1/\theta_0$ (where $\theta_0 > 0$), it follows that $\lambda(x)$ is a non-decreasing function of $\bar{x}$. Therefore the critical region of the likelihood ratio test is of the form
\[
C_1 = \left\{x : \sum_{i=1}^{n} x_i \le c\right\}.
\]
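As a quick numerical illustration, here is a minimal sketch of the computation of $\lambda(x)$ in Python/NumPy (the function name and the simulated data are illustrative, not part of the notes):

```python
import numpy as np

def exp_lr(x, theta0):
    """Likelihood ratio lambda(x) for H0: theta = theta0 vs H1: theta > theta0,
    for an exponential(theta) random sample x (density theta * exp(-theta * x))."""
    n, xbar = len(x), np.mean(x)
    if 1 / xbar >= theta0:
        # unrestricted supremum attained at the m.l.e. theta-hat = 1/xbar
        return (theta0 * xbar) ** n * np.exp(n - n * theta0 * xbar)
    return 1.0  # supremum over [theta0, inf) attained at theta0 itself

# Illustration with simulated data (theta0 = 1 is an arbitrary choice)
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=20)  # scale = 1/theta
print(exp_lr(x, theta0=1.0))
```

Small values of this ratio correspond to small $\sum_i x_i$, in line with the critical region above.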

Example The one-sample t-test

The null hypothesis is $H_0 : \theta = \theta_0$ for the mean $\theta$ of a normal distribution with unknown variance $\sigma^2$.

We have
\[
\Theta = \{(\theta, \sigma^2) : \theta \in \mathbb{R},\ \sigma^2 \in \mathbb{R}^+\}, \qquad
\Theta_0 = \{(\theta, \sigma^2) : \theta = \theta_0,\ \sigma^2 \in \mathbb{R}^+\},
\]
and
\[
f(x; \theta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \theta)^2\right), \qquad x \in \mathbb{R}.
\]
The likelihood function is
\[
L(\theta, \sigma^2; x) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \theta)^2\right).
\]
Since
\[
l(\theta_0, \sigma^2; x) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \theta_0)^2
\]
and
\[
\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \theta_0)^2,
\]
which is zero when
\[
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \theta_0)^2,
\]
we conclude that

\[
\sup L(\theta_0, \sigma^2; x) = \left(\frac{2\pi}{n} \sum_{i=1}^{n} (x_i - \theta_0)^2\right)^{-n/2} e^{-n/2}.
\]

For the denominator, we already know from previous examples that the m.l.e. of $\theta$ is $\bar{x}$, so

\[
\sup L(\theta, \sigma^2; x) = \left(\frac{2\pi}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{-n/2} e^{-n/2}
\]
and
\[
\lambda(x) = \left(\frac{\sum_{i=1}^{n} (x_i - \theta_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)^{-n/2}.
\]
This may be written in a more convenient form. Note that

\[
\sum_{i=1}^{n} (x_i - \theta_0)^2 = \sum_{i=1}^{n} \left((x_i - \bar{x}) + (\bar{x} - \theta_0)\right)^2
= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \theta_0)^2,
\]

so that
\[
\lambda(x) = \left(1 + \frac{n(\bar{x} - \theta_0)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)^{-n/2}.
\]
The critical region is

\[
C_1 = \{x : \lambda(x) \le k\},
\]
so it follows that $H_0$ is to be rejected when the value of
\[
\frac{|\bar{x} - \theta_0|}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
\]
exceeds some constant. Now we have already seen that

\[
\frac{\bar{X} - \theta}{S/\sqrt{n}} \sim t(n-1), \qquad \text{where } S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.
\]
Therefore it makes sense to write the critical region in the form
\[
C_1 = \left\{x : \frac{|\bar{x} - \theta_0|}{s/\sqrt{n}} \ge c\right\},
\]
which is the standard form of the two-sided t-test for a single sample.
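The test is easy to carry out numerically. Below is a minimal sketch in Python using NumPy and SciPy (the helper name and the simulated data are illustrative):

```python
import numpy as np
from scipy import stats

def one_sample_t(x, theta0, alpha=0.05):
    """Two-sided one-sample t-test of H0: mean = theta0."""
    n = len(x)
    xbar, s = np.mean(x), np.std(x, ddof=1)   # ddof=1 gives S^2 with divisor n-1
    t_obs = abs(xbar - theta0) / (s / np.sqrt(n))
    c = stats.t.ppf(1 - alpha / 2, df=n - 1)  # critical value of t(n-1)
    return t_obs, c, t_obs >= c               # reject H0 if t_obs >= c

# Illustration on simulated data (theta0 = 0 is an arbitrary choice)
rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=25)
print(one_sample_t(x, theta0=0.0))
```

In practice `scipy.stats.ttest_1samp(x, popmean=theta0)` performs the same test and returns the statistic and p-value directly.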

8.2 The likelihood ratio statistic

Since $-2\log\lambda(x)$ is a decreasing function of $\lambda(x)$, it follows that the critical region of the likelihood ratio test can also be expressed in the form

\[
C_1 = \{x : -2\log\lambda(x) \ge c\}.
\]

Writing
\[
\Lambda(x) = -2\log\lambda(x) = 2\left(l(\hat{\theta}; x) - l(\hat{\theta}_0; x)\right),
\]
where $\hat{\theta}$ and $\hat{\theta}_0$ maximise the log-likelihood over $\Theta$ and $\Theta_0$ respectively, the critical region may be written as

\[
C_1 = \{x : \Lambda(x) \ge c\},
\]
and $\Lambda(X)$ is called the likelihood ratio statistic. We have been using the idea that values of $\theta$ close to $\hat{\theta}$ are well supported by the data so, if $\theta_0$ is a possible value of $\theta$, then it turns out that, for large samples,
\[
\Lambda(X) \xrightarrow{D} \chi^2_p,
\]
where $p = \dim(\theta)$.

Let us see why.

8.2.1 The asymptotic distribution of the likelihood ratio statistic

Write
\[
l(\theta_0) = l(\hat{\theta}) + (\theta_0 - \hat{\theta})\,l'(\hat{\theta}) + \frac{1}{2}(\theta_0 - \hat{\theta})^2\,l''(\hat{\theta}) + \cdots
\]
and, remembering that $l'(\hat{\theta}) = 0$, we have
\[
\Lambda \approx (\hat{\theta} - \theta_0)^2 \left(-l''(\hat{\theta})\right)
= (\hat{\theta} - \theta_0)^2 J(\hat{\theta})
= (\hat{\theta} - \theta_0)^2 I(\theta_0)\,\frac{J(\hat{\theta})}{I(\theta_0)}.
\]
But
\[
(\hat{\theta} - \theta_0)\,I(\theta_0)^{1/2} \xrightarrow{D} N(0, 1) \qquad \text{and} \qquad \frac{J(\hat{\theta})}{I(\theta_0)} \xrightarrow{P} 1,
\]
so
\[
(\hat{\theta} - \theta_0)^2 I(\theta_0) \xrightarrow{D} \chi^2_1
\]

and Slutsky's theorem gives
\[
\Lambda \xrightarrow{D} \chi^2_1,
\]
provided $\theta_0$ is the true value of $\theta$.
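This asymptotic result is easy to check by simulation. The sketch below (illustrative settings, not part of the notes) uses the exponential model from the earlier example, but now with the unrestricted alternative $H_1 : \theta \ne \theta_0$, and compares empirical quantiles of $\Lambda$ with those of $\chi^2_1$:

```python
import numpy as np
from scipy import stats

# Simulate Lambda = 2(l(theta_hat) - l(theta_0)) for exponential(theta) samples
# generated under H0 (theta = theta_0), and compare with chi-squared(1) quantiles.
rng = np.random.default_rng(2)
theta0, n, reps = 1.0, 200, 10_000

xbar = rng.exponential(scale=1/theta0, size=(reps, n)).mean(axis=1)
theta_hat = 1 / xbar                                  # m.l.e. for the exponential model
lam = 2 * n * (np.log(theta_hat) - theta_hat * xbar   # l(theta_hat)/n ...
               - np.log(theta0) + theta0 * xbar)      # ... minus l(theta_0)/n

for q in (0.90, 0.95, 0.99):
    print(q, np.quantile(lam, q), stats.chi2.ppf(q, df=1))
```

The empirical and theoretical quantiles should agree closely for moderate $n$.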

Example Poisson distribution

Let $X = (X_1, \ldots, X_n)$ be a random sample from a Poisson distribution with parameter $\theta$, and test $H_0 : \theta = \theta_0$ against $H_1 : \theta \ne \theta_0$ at significance level 0.05. The p.m.f. is
\[
p(x; \theta) = \frac{e^{-\theta}\theta^x}{x!}, \qquad x = 0, 1, \ldots,
\]
so that
\[
l(\theta; x) = -n\theta + \sum_{i=1}^{n} x_i \log\theta - \sum_{i=1}^{n} \log x_i!
\]
and
\[
\frac{\partial l(\theta; x)}{\partial \theta} = -n + \frac{1}{\theta} \sum_{i=1}^{n} x_i,
\]
giving $\hat{\theta} = \bar{x}$. Therefore
\[
\Lambda = 2n\left(\theta_0 - \bar{x} + \bar{x}\log\frac{\bar{x}}{\theta_0}\right).
\]
The distribution of $\Lambda$ under $H_0$ is approximately $\chi^2_1$ and $\chi^2_1(0.95) = 3.84$, so the critical region of the test is
\[
C_1 = \left\{x : 2n\left(\theta_0 - \bar{x} + \bar{x}\log\frac{\bar{x}}{\theta_0}\right) \ge 3.84\right\}.
\]
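A minimal sketch of this test in Python/NumPy (the function name and data are illustrative; note the statistic requires $\bar{x} > 0$):

```python
import numpy as np

def poisson_lr_test(x, theta0, crit=3.84):
    """Likelihood ratio test of H0: theta = theta0 for Poisson data,
    using the chi-squared(1) critical value 3.84 at level 0.05.
    Assumes the sample mean is positive (otherwise log fails)."""
    n, xbar = len(x), np.mean(x)
    lam = 2 * n * (theta0 - xbar + xbar * np.log(xbar / theta0))
    return lam, lam >= crit  # (statistic, reject H0?)

# Illustration: n = 30 observations, testing theta0 = 2 (arbitrary choices)
rng = np.random.default_rng(3)
x = rng.poisson(lam=2.5, size=30)
print(poisson_lr_test(x, theta0=2.0))
```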

8.3 Testing goodness-of-fit for discrete distributions

The data below were collected by the ecologist E.C. Pielou, who was interested in the pattern of healthy and diseased trees. The subject of her research was Armillaria root rot in a plantation of Douglas firs. She recorded the lengths of 109 runs of diseased trees and these are given below.

\[
\begin{array}{l|cccccc}
\text{Run length} & 1 & 2 & 3 & 4 & 5 & 6 \\
\text{Number of runs} & 71 & 28 & 5 & 2 & 2 & 1
\end{array}
\]

On biological grounds, Pielou proposed a geometric distribution as a proba- bility model. Is this plausible? Let’s try to answer this by first looking at the general case.

Suppose we have $k$ groups with $n_i$ in the $i$th group. Thus
\[
\begin{array}{l|cccccc}
\text{Group} & 1 & 2 & 3 & 4 & \cdots & k \\
\text{Number} & n_1 & n_2 & n_3 & n_4 & \cdots & n_k
\end{array}
\]
where $\sum_i n_i = n$. Suppose further that we have a probability model such that $\pi_i(\theta)$, $i = 1, 2, \ldots, k$, is the probability of being in the $i$th group. Clearly $\sum_i \pi_i(\theta) = 1$. The likelihood is
\[
L(\theta) = n! \prod_{i=1}^{k} \frac{\pi_i(\theta)^{n_i}}{n_i!}
\]
and the log-likelihood is

\[
l(\theta) = \sum_{i=1}^{k} n_i \log\pi_i(\theta) + \log n! - \sum_{i=1}^{k} \log n_i!
\]

Suppose $\hat{\theta}$ maximises $l(\theta)$, being the solution of $l'(\theta) = 0$.

The general alternative is to take the $\pi_i$ as unrestricted by the model and subject only to $\sum_i \pi_i = 1$. Thus we maximise
\[
l(\pi) = \sum_{i=1}^{k} n_i \log\pi_i + \log n! - \sum_{i=1}^{k} \log n_i!
\quad \text{subject to} \quad g(\pi) = \sum_i \pi_i = 1.
\]
Using a Lagrange multiplier $\gamma$ we obtain the set of $k$ equations
\[
\frac{\partial l}{\partial \pi_i} - \gamma\,\frac{\partial g}{\partial \pi_i} = 0, \qquad 1 \le i \le k,
\]
or
\[
\frac{n_i}{\pi_i} - \gamma = 0, \qquad 1 \le i \le k.
\]
Writing this as

\[
n_i - \gamma\pi_i = 0, \qquad 1 \le i \le k,
\]
and summing over $i$, we find $\gamma = n$ and
\[
\hat{\pi}_i = \frac{n_i}{n}.
\]
The likelihood ratio statistic is
\[
\Lambda = 2\left(\sum_{i=1}^{k} n_i \log\frac{n_i}{n} - \sum_{i=1}^{k} n_i \log\pi_i(\hat{\theta})\right)
= 2\sum_{i=1}^{k} n_i \log\frac{n_i}{n\pi_i(\hat{\theta})}.
\]
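This statistic is straightforward to compute. Here is a minimal helper in Python/NumPy (the function name is illustrative; it assumes the fitted probabilities $\pi_i(\hat{\theta})$ are supplied):

```python
import numpy as np

def multinomial_lr(counts, probs):
    """Goodness-of-fit likelihood ratio statistic
    Lambda = 2 * sum n_i * log(n_i / (n * pi_i(theta_hat))),
    given observed counts n_i and fitted cell probabilities pi_i(theta_hat).
    Cells with n_i = 0 contribute 0 (the 0 * log 0 = 0 convention)."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    expected = counts.sum() * probs
    nonzero = counts > 0
    return 2 * np.sum(counts[nonzero] * np.log(counts[nonzero] / expected[nonzero]))
```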

General statement of asymptotic result for the likelihood ratio statistic

Testing $H_0 : \theta \in \Theta_0 \subset \Theta$ against $H_1 : \theta \in \Theta \setminus \Theta_0$, the likelihood ratio statistic
\[
\Lambda = 2\left(\sup_{\theta \in \Theta} l(\theta) - \sup_{\theta \in \Theta_0} l(\theta)\right) \xrightarrow{D} \chi^2_p,
\]
where

\[
p = \dim\Theta - \dim\Theta_0.
\]

In the general case above, where
\[
\Lambda = 2\sum_{i=1}^{k} n_i \log\frac{n_i}{n\pi_i(\hat{\theta})},
\]

the restriction $\sum_{i=1}^{k} \pi_i = 1$ means that $\dim\Theta = k - 1$. Clearly $\dim\Theta_0 = 1$ (the single parameter $\theta$), so $p = k - 2$ and
\[
\Lambda \xrightarrow{D} \chi^2_{k-2}.
\]

Example Pielou's data

These are
\[
\begin{array}{l|cccccc}
\text{Run length} & 1 & 2 & 3 & 4 & 5 & 6 \\
\text{Number of runs} & 71 & 28 & 5 & 2 & 2 & 1
\end{array}
\]

and Pielou proposed a geometric model with p.m.f.

\[
p(x) = (1 - \theta)^{x-1}\theta, \qquad x = 1, 2, \ldots,
\]
where $x$ is the observed run length. Thus, if $x_j$, $1 \le j \le n$, are the observed run lengths, the log-likelihood for Pielou's model is

\[
l(\theta) = \sum_{j=1}^{n} (x_j - 1)\log(1 - \theta) + n\log\theta
\]
and, maximising,
\[
\frac{\partial l(\theta)}{\partial \theta} = -\frac{\sum_{j=1}^{n} x_j - n}{1 - \theta} + \frac{n}{\theta},
\]
which gives
\[
\hat{\theta} = \frac{1}{\bar{x}}.
\]
By the invariance property of m.l.e.'s,

\[
\pi_i(\hat{\theta}) = (1 - \hat{\theta})^{i-1}\hat{\theta} = \frac{(\bar{x} - 1)^{i-1}}{\bar{x}^i}.
\]

The data give $\bar{x} = 1.523$. We can therefore use the expression for $\pi_i(\hat{\theta})$ to calculate
\[
\Lambda = 2\sum_{i=1}^{k} n_i \log\frac{n_i}{n\pi_i(\hat{\theta})} = 3.547.
\]
There are six groups, so $p = 6 - 1 - 1 = 4$.

The approximate distribution of $\Lambda$ is therefore $\chi^2_4$, and $P(\Lambda \ge 3.547) = 0.471$.
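As a check, here is a short computation of $\Lambda$ for these data (a sketch in plain Python/NumPy; note that treating the final cell as the tail probability $P(X \ge 6)$, so that the fitted probabilities sum to one, reproduces the value 3.547 quoted above, which appears to be how it was obtained):

```python
import numpy as np
from scipy import stats

counts = np.array([71, 28, 5, 2, 2, 1])        # runs of length 1..5, and >= 6
n = counts.sum()                                # 109 runs
xbar = (counts * np.arange(1, 7)).sum() / n     # 166/109 = 1.523
theta = 1 / xbar                                # m.l.e. of the geometric parameter

probs = (1 - theta) ** np.arange(5) * theta     # pi_1, ..., pi_5
probs = np.append(probs, 1 - probs.sum())       # last cell: P(X >= 6)

lam = 2 * np.sum(counts * np.log(counts / (n * probs)))
print(lam)                                      # approx. 3.547
print(stats.chi2.sf(lam, df=4))                 # p-value approx. 0.471
```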

There is no evidence against Pielou's conjecture that a geometric distribution is an appropriate model.

Example Two-way contingency tables

Data are obtained by cross-classifying a fixed number of individuals according to two criteria. They are therefore displayed as $n_{ij}$ in a table with $r$ rows and $c$ columns as follows.

\[
\begin{array}{ccc|c}
n_{11} & \cdots & n_{1c} & n_{1.} \\
\vdots & & \vdots & \vdots \\
n_{r1} & \cdots & n_{rc} & n_{r.} \\
\hline
n_{.1} & \cdots & n_{.c} & n
\end{array}
\]

The aim is to investigate the independence of the two classifications. Suppose the $k$th individual goes into cell $(X_k, Y_k)$, $k = 1, 2, \ldots, n$, and that individuals are independent. Let

\[
P\left((X_k, Y_k) = (i, j)\right) = \theta_{ij}, \qquad i = 1, 2, \ldots, r; \; j = 1, 2, \ldots, c,
\]
where $\sum_{ij} \theta_{ij} = 1$. The null hypothesis of independence of classifiers can be written
\[
H_0 : \theta_{ij} = \phi_i \rho_j.
\]
This is on Problem Sheet 4, so here are a few hints. The likelihood function is
\[
L(\theta) = n! \prod_{i,j} \frac{\theta_{ij}^{n_{ij}}}{n_{ij}!},
\]
so the log-likelihood is
\[
l(\theta) = \sum_{i,j} n_{ij} \log\theta_{ij} + \log n! - \sum_{i,j} \log n_{ij}!
\]
Under $H_0$, put $\theta_{ij} = \phi_i \rho_j$ and maximise with respect to the $\phi_i$ and $\rho_j$ subject to $\sum_i \phi_i = \sum_j \rho_j = 1$. You will obtain
\[
\hat{\phi}_i = \frac{n_{i.}}{n}, \qquad \hat{\rho}_j = \frac{n_{.j}}{n}.
\]
Under $H_1$, maximise with respect to the $\theta_{ij}$ subject to $\sum_{ij} \theta_{ij} = 1$. You will obtain
\[
\hat{\theta}_{ij} = \frac{n_{ij}}{n}
\]
and, finally,
\[
\Lambda = 2\sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij} \log\frac{n_{ij}\,n}{n_{i.}\,n_{.j}}.
\]

Example An historic data set - crime and drinking

These are Pearson's 1909 data on crime and drinking.

\[
\begin{array}{l|cc}
\text{Crime} & \text{Drinker} & \text{Abstainer} \\
\hline
\text{Arson} & 50 & 43 \\
\text{Rape} & 88 & 62 \\
\text{Violence} & 155 & 110 \\
\text{Stealing} & 379 & 300 \\
\text{Coining} & 18 & 14 \\
\text{Fraud} & 63 & 144
\end{array}
\]

Is crime related to drinking? For these data, $\Lambda = 50.52$.
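This value can be reproduced directly from the table (a sketch in Python/NumPy; the variable names are illustrative):

```python
import numpy as np

# Crime and drinking counts: rows = crimes, columns = (drinker, abstainer)
n_ij = np.array([[50, 43], [88, 62], [155, 110],
                 [379, 300], [18, 14], [63, 144]], dtype=float)

n = n_ij.sum()
row_tot = n_ij.sum(axis=1, keepdims=True)   # n_i.
col_tot = n_ij.sum(axis=0, keepdims=True)   # n_.j

lam = 2 * np.sum(n_ij * np.log(n_ij * n / (row_tot * col_tot)))
print(lam)   # approx. 50.52
```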

Under $H_0$, $\Lambda \sim \chi^2_p$ approximately, where $p = \dim\Theta - \dim\Theta_0$. In the notation used earlier, there are apparently 6 values of $\phi_i$ to estimate, but in fact there are only 5 because $\sum_i \phi_i = 1$. Similarly there are $2 - 1 = 1$ values of $\rho_j$. Thus $\dim\Theta_0 = 6$. Because $\sum_{ij} \theta_{ij} = 1$, $\dim\Theta = 12 - 1 = 11$, so $p = 11 - 6 = 5$.

Testing against a $\chi^2$-distribution with 5 degrees of freedom, note that the 0.9999 quantile is 25.75, so we can reject at the 0.0001 level of significance. There is overwhelming evidence that crime and drink are related.

Degrees of freedom

It is clear from the above that, when testing contingency tables, the number of degrees of freedom of the resulting $\chi^2$-distribution is given, in general, by

\[
p = rc - 1 - (r - 1) - (c - 1) = rc - r - c + 1 = (r - 1)(c - 1).
\]

8.4 Pearson's statistic

For testing independence in contingency tables, let $O_{ij}$ be the observed number in cell $(i, j)$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, c$, and $E_{ij}$ be the expected number in cell $(i, j)$. Pearson's statistic is
\[
P = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}.
\]

The expected number $E_{ij}$ in cell $(i, j)$ is calculated under the null hypothesis of independence. If $n_{i.}$ is the total for the $i$th row and the overall total is $n$, then the probability of an observation being in the $i$th row is estimated by
\[
P(i\text{th row}) = \frac{n_{i.}}{n}.
\]
Similarly
\[
P(j\text{th column}) = \frac{n_{.j}}{n}
\]
and
\[
E_{ij} = n \times P(i\text{th row}) \times P(j\text{th column}) = \frac{n_{i.}\,n_{.j}}{n}.
\]

Example Crime and drinking

These are the data on crime and drinking with the row and column totals.
\[
\begin{array}{l|cc|c}
\text{Crime} & \text{Drinker} & \text{Abstainer} & \text{Total} \\
\hline
\text{Arson} & 50 & 43 & 93 \\
\text{Rape} & 88 & 62 & 150 \\
\text{Violence} & 155 & 110 & 265 \\
\text{Stealing} & 379 & 300 & 679 \\
\text{Coining} & 18 & 14 & 32 \\
\text{Fraud} & 63 & 144 & 207 \\
\hline
\text{Total} & 753 & 673 & 1426
\end{array}
\]

The $E_{ij}$ are easily calculated:
\[
E_{11} = \frac{93 \times 753}{1426} = 49.11, \quad \text{and so on.}
\]
Pearson's statistic turns out to be $P = 49.73$, which is tested against a $\chi^2$-distribution with $(6 - 1) \times (2 - 1) = 5$ degrees of freedom, and the conclusion is, of course, the same as before.
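A short computation reproducing these numbers (Python/NumPy sketch):

```python
import numpy as np

# Same crime and drinking table as before
o_ij = np.array([[50, 43], [88, 62], [155, 110],
                 [379, 300], [18, 14], [63, 144]], dtype=float)

n = o_ij.sum()
e_ij = o_ij.sum(axis=1, keepdims=True) * o_ij.sum(axis=0, keepdims=True) / n

pearson = np.sum((o_ij - e_ij) ** 2 / e_ij)
print(e_ij[0, 0])   # E_11 approx. 49.11
print(pearson)      # approx. 49.73
```

In practice `scipy.stats.chi2_contingency` computes the same statistic and returns the p-value, degrees of freedom, and expected counts directly.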

8.4.1 Pearson's statistic and the likelihood ratio statistic

\[
P = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\]

\[
= \sum_{i,j} \frac{\left(n_{ij} - \dfrac{n_{i.}n_{.j}}{n}\right)^2}{\dfrac{n_{i.}n_{.j}}{n}}.
\]
Consider the Taylor expansion of $x\log(x/a)$ about $x = a$:

\[
x\log\frac{x}{a} = (x - a) + \frac{(x - a)^2}{2a} - \frac{(x - a)^3}{6a^2} + \cdots
\]
Now put $x = n_{ij}$ and $a = \dfrac{n_{i.}n_{.j}}{n}$, so that

\[
n_{ij}\log\frac{n_{ij}\,n}{n_{i.}n_{.j}} = \left(n_{ij} - \frac{n_{i.}n_{.j}}{n}\right) + \frac{\left(n_{ij} - \dfrac{n_{i.}n_{.j}}{n}\right)^2}{2\,\dfrac{n_{i.}n_{.j}}{n}} + \cdots
\]
Thus
\[
\sum_{i,j} n_{ij}\log\frac{n_{ij}\,n}{n_{i.}n_{.j}}
\]

\[
= \sum_{i,j}\left(n_{ij} - \frac{n_{i.}n_{.j}}{n}\right) + \frac{1}{2}\sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} + \cdots \approx \frac{1}{2}P,
\]
since the first sum vanishes ($\sum_{i,j} n_{ij} = \sum_{i,j} n_{i.}n_{.j}/n = n$), so that
\[
\Lambda \approx P.
\]
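A quick numerical confirmation on the crime and drinking table (Python/NumPy sketch, reusing the earlier data):

```python
import numpy as np

# Compare Lambda and Pearson's P on the crime and drinking table
o = np.array([[50, 43], [88, 62], [155, 110],
              [379, 300], [18, 14], [63, 144]], dtype=float)

n = o.sum()
e = o.sum(axis=1, keepdims=True) * o.sum(axis=0, keepdims=True) / n

lam = 2 * np.sum(o * np.log(o / e))   # likelihood ratio statistic
pearson = np.sum((o - e) ** 2 / e)    # Pearson's statistic
print(lam, pearson)                   # approx. 50.52 and 49.73
```

The two statistics are close, as the expansion above predicts.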
