Likelihood Ratio Tests 1 the Simplest Case: Simple Hypotheses

Math 541: Statistical Theory II Likelihood Ratio Tests Instructor: Songfeng Zheng A very popular form of hypothesis test is the likelihood ratio test, which is a generalization of the optimal test for simple null and alternative hypotheses that was developed by Neyman and Pearson (We skipped Neyman-Pearson lemma because we are short of time). The likelihood ratio test is based on the likelihood function fn(X ¡1; ¢ ¢ ¢ ;Xnjθ), and the intuition that the likelihood function tends to be highest near the true value of θ. Indeed, this is also the foundation for maximum likelihood estimation. We will start from a very simple example. 1 The Simplest Case: Simple Hypotheses Let us ¯rst consider the simple hypotheses in which both the null hypothesis and alternative hypothesis consist one value of the parameter. Suppose X1; ¢ ¢ ¢ ;Xn is a random sample of size n from an exponential distribution 1 f(xjθ) = e¡x/θ; x > 0 θ Conduct the following simple hypothesis testing problem: H0 : θ = θ0 vs:Ha : θ = θ1; where θ1 < θ0. Suppose the signi¯cant level is ®. If we assume H0 were correct, then the likelihood function is Yn 1 X ¡Xi/θ0 ¡n fn(X1; ¢ ¢ ¢ ;Xnjθ0) = e = θ0 expf¡ Xi/θ0g: i=1 θ0 Similarly, if H1 were correct, the likelihood function is X ¡n fn(X1; ¢ ¢ ¢ ;Xnjθ1) = θ1 expf¡ Xi/θ1g: We de¯ne the likelihood ratio as follows: P Ã ! ¡n ¡n ½µ ¶ X ¾ fn(X1; ¢ ¢ ¢ ;Xnjθ0) θ0 expf¡ Xi/θ0g θ0 1 1 LR = = ¡n P = exp ¡ Xi fn(X1; ¢ ¢ ¢ ;Xnjθ1) θ1 expf¡ Xi/θ1g θ1 θ1 θ0 1 2 Intuitively, if the evidence (data) supports H1, then the likelihood function fn(X1; ¢ ¢ ¢ ;Xnjθ1) should be large, therefore the likelihood ratio is small. Thus, we reject the null hypothesis if the likelihood ratio is small, i.e. LR · k, where k is a constant such that P (LR · k) = ® under the null hypothesis (θ = θ0). To ¯nd what kind of test results from this criterion, we expand the condition 0 1 Ã !¡n ½µ ¶ ¾ θ0 1 1 X ® = P (LR · k) = P @ exp ¡ Xi · kA θ1 θ1 θ0 Ã ½µ ¶ ¾ Ã !n ! 1 1 X θ0 = P exp ¡ Xi · k θ1 θ0 θ1 Ãµ ¶ "Ã !n #! 1 1 X θ0 = P ¡ Xi · log k θ1 θ0 θ1 Ã ! X log k + n log θ ¡ n log θ = P X · 0 1 i 1 ¡ 1 Ã θ1 θ0 ! 2 X 2 log k + n log θ0 ¡ n log θ1 = P Xi · 1 1 θ0 θ0 ¡ Ã θ1 θ0 ! 2 log k + n log θ0 ¡ n log θ1 = P V · 1 1 θ0 ¡ θ1 θ0 2 P where V = Xi. From the property of exponential distribution, we know under the null θ0 2 2 hypothesis, Xi follows Â distribution, consequently, V follows a Chi square distribution θ0 2 with 2n degrees of freedom. Thus, by looking at the chi-square table, we can ¯nd the value of the chi-square statistic with 2n degrees of freedom such that the probability that V is less than that number is ®, that is, solve for c, such that P (V · c) = ®. Once you ¯nd the value of c, you can solve for k and de¯ne the test in terms of likelihood ratio. For example, suppose that H0 : θ = 2 and Ha : θ = 1, and we want to do the test at a signi¯cance level ® = 0:05 with a random sample of size n = 5 from an exponential distribution. We can look at the chi-square table under 10 degrees of freedom to ¯nd that P 3.94 is the value under which there is 0.05 area. Using this, we can obtain P ( 2 X · P 2 i 3:94) = 0:05. This implies that we should reject the null hypothesis if Xi · 3:94 in this example. To ¯nd a rejection criterion directly in terms of the likelihood function, we can solve for k by 2 log k + n log θ0 ¡ n log θ1 1 1 = 3:94; θ0 ¡ θ1 θ0 and the solution is k = 0:8034. So going back to the original likelihood ratio, we reject the null hypothesis if Ã ! ¡n ½µ ¶ ¾ µ ¶¡5 ½µ ¶ ¾ θ0 1 1 X 2 1 1 X exp ¡ Xi = exp ¡ Xi · 0:8034 θ1 θ1 θ0 1 1 2 3 2 General Likelihood Ratio Test Likelihood ratio tests are useful to test a composite null hypothesis against a composite alternative hypothesis. Suppose that the null hypothesis speci¯es that θ (may be a vector) lies in a particular set of possible values, say £0, i.e. H0 : θ 2 £0; the alternative hypothesis speci¯es that £ lies in another set of possible values £a, which does not overlap £0, i.e. Ha : θ 2 £a. Let £ = £0 [ £a. Either or both of the hypotheses H0 and Ha can be compositional. ^ Let L(£0) be the maximum (actually the supremum) of the likelihood function for all θ 2 £0. ^ ^ That is, L(£0) = maxθ2£0 L(θ). L(£0) represents the best explanation for the observed ^ data for all θ 2 £0. Similarly, L(£) = maxθ2£ L(θ) represents the best explanation for ^ ^ the observed data for all θ 2 £ = £0 [ £a. If L(£0) = L(£), then a best explanation for the observed data can be found inside £0 and we should not reject the null hypothesis ^ ^ H0 : θ 2 £0. However, if L(£0) < L(£), then the best explanation for the observed data could be found inside £a, and we should consider rejecting H0 in favor of Ha. A likelihood ^ ^ ratio test is based on the ratio L(£0)=L(£). De¯ne the likelihood ratio statistic by L(£^ ) max L(θ) ¤ = 0 = θ2£0 ; L(£)^ maxθ2£ L(θ) A likelihood ratio test of H0 : θ 2 £0 vs:Ha : θ 2 £a employs ¤ as a test statistic, and the rejection region is determined by ¤ · k. Clearly, 0 · ¤ · 1. A value of ¤ close to zero indicates that the likelihood of the sample is much smaller under H0 than it is under Ha, therefore the data suggest favoring Ha over H0. The actually value of k is chosen so that ® achieves the desired value. A lot of previously introduced testing procedure can be reformulated as likelihood ratio test, such at the example below: Example 1: Testing Hypotheses about the mean of a normal distribution with unknown variance. Suppose that X = (X1; ¢ ¢ ¢ ;Xn) is a random sample from a normal distribution with unknown mean ¹ and unknown variance σ2. We wish to test the hypotheses H0 : ¹ = ¹0 vs:Ha : ¹ 6= ¹0 2 2 Solution: In this example, the parameter is θ = (¹; σ ). Notice that £0 is the set f(¹0; σ ): 2 2 2 2 σ > 0g, and £a = f(¹; σ ): ¹ 6= ¹0; σ > 0g, and hence that £ = £0 [ £a = f(¹; σ ): ¡1 < ¹ < 1; σ2 > 0g. The value of the constant σ2 is completely unspeci¯ed. We must ^ ^ now ¯nd L(£0) and L(£). 4 For the normal distribution, we have Ã ! " # 1 n Xn (X ¡ ¹)2 2 p i L(θ) = L(¹; σ ) = exp ¡ 2 : 2¼σ i=1 2σ ^ Restricting ¹ to £0 implies that ¹ = ¹0, and we can ¯nd L(£0) if we can determine the 2 2 value of σ that maximizes L(¹; σ ) subject to the constraint that ¹ = ¹0. It is easy to see 2 2 that when ¹ = ¹0, the value of σ that maximizes L(¹0; σ ) is Xn 2 1 2 σ^0 = (Xi ¡ ¹0) : n i=1 ^ 2 2 2 Thus, L(£0) can be obtained by replacing ¹ with ¹0 and σ withσ ^0 in L(¹; σ ), which yields Ã ! " # Ã ! n Xn 2 n ^ 1 (Xi ¡ ¹0) 1 ¡n=2 L(£0) = p exp ¡ 2 = p e : 2¼σ^0 i=1 2^σ0 2¼σ^0 We now turn to ¯nding L(£).^ Let (^¹; σ^2) be the point in the set £ which maximizes the likelihood function L(¹; σ2), by the method of maximum likelihood estimation, we have Xn ¹ 2 1 2 ¹^ = X andσ ^ = (Xi ¡ ¹^) : n i=1 Then L(£)^ is obtained by replacing ¹ with¹ ^ and σ2 withσ ^2, which gives Ã ! " # Ã ! 1 n Xn (X ¡ ¹^)2 1 n ^ p i p ¡n=2 L(£) = exp ¡ 2 = e : 2¼σ^ i=1 2^σ 2¼σ^ Therefore, the likelihood ratio is calculated as ³ ´ n Ã ! " P # ^ p 1 e¡n=2 2 n=2 n ¹ 2 n=2 L(£0) 2¼σ^0 σ^ i=1(Xi ¡ X) ¤ = = ³ ´n = = P 1 2 n 2 L(£)^ p e¡n=2 σ^0 i=1(Xi ¡ ¹0) 2¼σ^ Notice that 0 < ¤ · 1 because £0 ½ £, thus when ¤ < k we would reject H0, where k < 1 is a constant. Because Xn Xn Xn 2 ¹ ¹ 2 ¹ 2 ¹ 2 (Xi ¡ ¹0) = [(Xi ¡ X) + (X ¡ ¹0)] = (Xi ¡ X) + n(X ¡ ¹0) ; i=1 i=1 i=1 the rejection region, ¤ < k, is equivalent to P n ¹ 2 (Xi ¡ X) Pi=1 2=n 0 n 2 < k = k i=1(Xi ¡ ¹0) 5 P n ¹ 2 (Xi ¡ X) P i=1 < k0 n ¹ 2 ¹ 2 i=1(Xi ¡ X) + n(X ¡ ¹0) 1 0 ¹ 2 < k : Pn(X¡¹0) 1 + n (X ¡X¹)2 i=1 i This inequality is equivalent to ¹ 2 n(X ¡ ¹0) 1 P > ¡ 1 = k00 n ¹ 2 0 i=1(Xi ¡ X) k ¹ 2 n(X ¡ ¹0) P > (n ¡ 1)k00 1 n ¹ 2 n¡1 i=1(Xi ¡ X) By de¯ning Xn 2 1 ¹ 2 S = (Xi ¡ X) ; n ¡ 1 i=1 the above rejection region is equivalent to ¯ ¯ ¯p ¹ ¯ q ¯ n(X ¡ ¹0)¯ ¯ ¯ > (n ¡ 1)k00: ¯ S ¯ p ¹ We can recognize that n(X ¡ ¹0)=S is the t statistic employed in previous sections, and the decision rule is exactly the same as previous.

Likelihood Ratio Tests 1 the Simplest Case: Simple Hypotheses

Hypothesis Testing and Likelihood Ratio Tests

Data 8 Final Stats Review

5. Completeness and Sufficiency 5.1. Complete Statistics. Definition 5.1. a Statistic T Is Called Complete If Eg(T) = 0 For

Use of Statistical Tables

8.5 Testing a Claim About a Standard Deviation Or Variance

1 One Parameter Exponential Families

Lecture 4: Sufficient Statistics 1 Sufficient Statistics

On the Meaning and Use of Kurtosis

The Probability Lifesaver: Order Statistics and the Median Theorem

Minimum Rates of Approximate Sufficient Statistics

Theoretical Statistics. Lecture 5. Peter Bartlett

The Likelihood Function - Introduction