Wald, LM (Score), and LR Tests: Asymptotic Distributions of the Three Classical Tests
Econ 620

Three Classical Tests: Wald, LM (Score), and LR Tests

Suppose that we have the density $f(y;\theta)$ of a model with a null hypothesis of the form $H_0\colon \theta=\theta_0$. Let $L(\theta)$ be the log-likelihood function of the model and $\hat\theta$ the MLE of $\theta$.

The Wald test is based on the intuitive idea that we are willing to accept the null hypothesis when $\hat\theta$ is close to $\theta_0$. The distance between $\hat\theta$ and $\theta_0$ is the basis for constructing the test statistic.

On the other hand, consider the constrained maximization problem
\[
\max_{\theta\in\Theta} L(\theta) \quad \text{s.t.} \quad \theta=\theta_0 .
\]
If the constraint is not binding (the null hypothesis is true), the Lagrange multiplier associated with the constraint is zero. We can therefore construct a test measuring how far the Lagrange multiplier is from zero; this is the LM test.

Finally, another way to check the validity of the null hypothesis is to look at the distance between the two values of the maximized log-likelihood,
\[
L(\hat\theta)-L(\theta_0)=\log\frac{f(y;\hat\theta)}{f(y;\theta_0)} .
\]
If the null hypothesis is true, this statistic should again not be far from zero; this is the LR test.

Asymptotic Distributions of the Three Tests

Assume that the observed variables can be partitioned into endogenous variables $Y$ and exogenous variables $X$. To simplify the presentation, we assume that the observations $(Y_i,X_i)$ are i.i.d., so that the conditional distribution of the endogenous variables given the exogenous variables is $f(y_i \mid x_i;\theta)$ with $\theta\in\Theta\subseteq\mathbb{R}^p$. The conditional density is known up to the unknown parameter vector $\theta$. By the i.i.d. assumption, we can write the log-likelihood function of the $n$ observations of $(Y_i,X_i)$ as
\[
L(\theta)=\sum_{i=1}^{n}\log f(y_i \mid x_i;\theta).
\]
We assume all the regularity conditions needed for existence, consistency and asymptotic normality of the MLE, and denote the MLE by $\hat\theta_n$. The hypotheses of interest are
\[
H_0\colon g(\theta_0)=0 \qquad \text{vs.} \qquad H_A\colon g(\theta_0)\neq 0,
\]
where $g(\cdot)\colon \mathbb{R}^p \to \mathbb{R}^r$ and the rank of $\partial g/\partial\theta'$ is $r$.

Wald test

Proposition 1.
\[
\xi_n^{W}=n\,g(\hat\theta_n)'\left[\frac{\partial g(\hat\theta_n)}{\partial\theta'}\,\mathcal{I}^{-1}(\hat\theta_n)\,\frac{\partial g(\hat\theta_n)'}{\partial\theta}\right]^{-1} g(\hat\theta_n)\;\sim\;\chi^2(r) \quad \text{under } H_0,
\]
where $\mathcal{I}(\theta)=-E_X E_\theta\!\left[\dfrac{\partial^2 \log f(Y\mid X;\theta)}{\partial\theta\,\partial\theta'}\right]$ and $\mathcal{I}^{-1}(\hat\theta_n)$ is the inverse of $\mathcal{I}$ evaluated at $\theta=\hat\theta_n$.

Proof. From the asymptotic properties of the MLE, we know that
\[
\sqrt{n}\,(\hat\theta_n-\theta_0)\xrightarrow{d} N\!\left(0,\;\mathcal{I}^{-1}(\theta_0)\right). \tag{1}
\]
A first-order Taylor expansion of $g(\hat\theta_n)$ around the true value $\theta_0$ gives
\[
g(\hat\theta_n)=g(\theta_0)+\frac{\partial g(\theta_0)}{\partial\theta'}(\hat\theta_n-\theta_0)+o_p\!\left(n^{-1/2}\right),
\]
so that
\[
\sqrt{n}\left[g(\hat\theta_n)-g(\theta_0)\right]=\frac{\partial g(\theta_0)}{\partial\theta'}\sqrt{n}\,(\hat\theta_n-\theta_0)+o_p(1). \tag{2}
\]
Hence, combining (1) and (2) gives
\[
\sqrt{n}\left[g(\hat\theta_n)-g(\theta_0)\right]\xrightarrow{d} N\!\left(0,\;\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right). \tag{3}
\]
Under the null hypothesis we have $g(\theta_0)=0$. Therefore
\[
\sqrt{n}\,g(\hat\theta_n)\xrightarrow{d} N\!\left(0,\;\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right). \tag{4}
\]
Forming the quadratic form of these asymptotically normal random variables, we conclude that
\[
n\,g(\hat\theta_n)'\left[\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right]^{-1} g(\hat\theta_n)\;\sim\;\chi^2(r)\quad \text{under } H_0. \tag{5}
\]
The statistic in (5) is not usable as it stands, since it depends on the unknown parameter $\theta_0$. However, we can consistently estimate the terms inside the inverse bracket by evaluating them at the MLE $\hat\theta_n$. Therefore
\[
\xi_n^{W}=n\,g(\hat\theta_n)'\left[\frac{\partial g(\hat\theta_n)}{\partial\theta'}\,\mathcal{I}^{-1}(\hat\theta_n)\,\frac{\partial g(\hat\theta_n)'}{\partial\theta}\right]^{-1} g(\hat\theta_n)\;\sim\;\chi^2(r)\quad \text{under } H_0. \qquad\blacksquare
\]
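To fix ideas, here is a minimal numerical sketch of the feasible statistic $\xi_n^W$ (an illustration, not part of the original notes). It assumes an i.i.d. $N(\mu,\sigma)$ sample and a hypothetical nonlinear restriction $g(\theta)=\mu/\sigma-1=0$, and uses the known closed-form information matrix of this model in place of an estimated one.

```python
import numpy as np
from scipy import stats

# Sketch of the Wald statistic xi_W = n g' [G I^{-1} G']^{-1} g for
# theta = (mu, sigma) and the illustrative restriction g = mu/sigma - 1.

rng = np.random.default_rng(0)
n = 500
y = rng.normal(loc=1.0, scale=1.0, size=n)   # H0 holds by construction

# Unconstrained MLE of (mu, sigma)
mu_hat = y.mean()
sig_hat = np.sqrt(((y - mu_hat) ** 2).mean())

# Per-observation Fisher information of N(mu, sigma): diag(1/s^2, 2/s^2)
I_hat = np.diag([1.0 / sig_hat**2, 2.0 / sig_hat**2])

# Restriction and its Jacobian G = dg/dtheta' at the MLE
g_hat = np.array([mu_hat / sig_hat - 1.0])
G_hat = np.array([[1.0 / sig_hat, -mu_hat / sig_hat**2]])

# Wald statistic and asymptotic chi^2(r) p-value, r = 1
V = G_hat @ np.linalg.inv(I_hat) @ G_hat.T
xi_W = n * g_hat @ np.linalg.solve(V, g_hat)
p_value = stats.chi2.sf(xi_W, df=1)
print(f"Wald statistic = {xi_W:.3f}, p-value = {p_value:.3f}")
```

With $H_0$ true by construction, $\xi_n^W$ should behave like a $\chi^2(1)$ draw; regenerating the data with, say, `loc=1.5` makes the statistic grow with $n$, which previews the consistency argument below.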
• An asymptotic test which rejects the null hypothesis with probability one when the alternative hypothesis is true is called a consistent test. That is, a consistent test has asymptotic power of 1.

• The Wald test discussed above is a consistent test. A heuristic argument: if the alternative hypothesis is true instead of the null, then $g(\hat\theta_n)\xrightarrow{p} g(\theta_0)\neq 0$, so
\[
g(\hat\theta_n)'\left[\frac{\partial g(\hat\theta_n)}{\partial\theta'}\,\mathcal{I}^{-1}(\hat\theta_n)\,\frac{\partial g(\hat\theta_n)'}{\partial\theta}\right]^{-1} g(\hat\theta_n)
\]
converges to a positive constant instead of zero. Multiplying by $n$, we get $\xi_n^{W}\to\infty$ as $n\to\infty$, which implies that we reject the null hypothesis with probability approaching one when the alternative is true.

• Another form of the Wald test statistic (caution: the two forms are easily confused) is
\[
\xi_n^{W}=g(\hat\theta_n)'\left[\frac{\partial g(\hat\theta_n)}{\partial\theta'}\,\mathcal{I}_n^{-1}(\hat\theta_n)\,\frac{\partial g(\hat\theta_n)'}{\partial\theta}\right]^{-1} g(\hat\theta_n)\;\sim\;\chi^2(r)\quad \text{under } H_0,
\]
where $\mathcal{I}_n(\theta)=-E_X E_\theta\!\left[\dfrac{\partial^2 L(\theta)}{\partial\theta\,\partial\theta'}\right]=-E_X E_\theta\!\left[\sum_{i=1}^{n}\dfrac{\partial^2 \log f(y_i\mid x_i;\theta)}{\partial\theta\,\partial\theta'}\right]$ and $\mathcal{I}_n^{-1}(\hat\theta_n)$ is the inverse of $\mathcal{I}_n$ evaluated at $\theta=\hat\theta_n$. Note that $\mathcal{I}_n=n\,\mathcal{I}$, so the factor $n$ in the first form is simply absorbed into $\mathcal{I}_n^{-1}$ here.

• A quite common form of the null hypothesis is a zero restriction on a subset of the parameters, i.e.,
\[
H_0\colon \theta_1=0 \qquad \text{vs.} \qquad H_A\colon \theta_1\neq 0,
\]
where $\theta_1$ is a $(q\times 1)$ subvector of $\theta$ with $q<p$. Then the Wald statistic is given by
\[
\xi_n^{W}=n\,\hat\theta_1'\left[\mathcal{I}^{11}(\hat\theta_n)\right]^{-1}\hat\theta_1\;\sim\;\chi^2(q)\quad \text{under } H_0,
\]
where $\mathcal{I}^{11}(\theta)$ is the upper-left block of the inverse information matrix. Partitioning
\[
\mathcal{I}(\theta)=\begin{pmatrix} \mathcal{I}_{11}(\theta) & \mathcal{I}_{12}(\theta)\\ \mathcal{I}_{21}(\theta) & \mathcal{I}_{22}(\theta)\end{pmatrix},
\]
we have $\mathcal{I}^{11}(\theta)=\left[\mathcal{I}_{11}(\theta)-\mathcal{I}_{12}(\theta)\,\mathcal{I}_{22}^{-1}(\theta)\,\mathcal{I}_{21}(\theta)\right]^{-1}$ by the formula for the partitioned inverse; $\mathcal{I}^{11}(\hat\theta_n)$ is $\mathcal{I}^{11}(\theta)$ evaluated at the MLE. A numerical check of this formula is sketched just after this list.
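The following sketch (illustrative only, not from the original notes) verifies the partitioned-inverse formula numerically: it builds an arbitrary symmetric positive definite matrix as a stand-in for $\mathcal{I}(\theta)$ and confirms that the Schur-complement expression reproduces the upper-left block of the full inverse.

```python
import numpy as np

# Check: the upper-left (q x q) block of I^{-1} equals
# [I11 - I12 I22^{-1} I21]^{-1} (partitioned inverse / Schur complement).
rng = np.random.default_rng(1)
q, p = 2, 5
A = rng.normal(size=(p, p))
I_mat = A @ A.T + p * np.eye(p)          # arbitrary SPD "information matrix"

I11, I12 = I_mat[:q, :q], I_mat[:q, q:]
I21, I22 = I_mat[q:, :q], I_mat[q:, q:]

block_direct = np.linalg.inv(I_mat)[:q, :q]                  # direct inversion
block_schur = np.linalg.inv(I11 - I12 @ np.linalg.inv(I22) @ I21)

print(np.allclose(block_direct, block_schur))                # True
```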
LM test (Score test)

If we have a priori reason or evidence to believe that the parameter vector satisfies restrictions of the form $g(\theta)=0$, incorporating this information into the maximization of the likelihood function through constrained optimization will improve the efficiency of the estimator compared to the unconstrained MLE. We solve the problem
\[
\max_{\theta} L(\theta) \quad \text{s.t.} \quad g(\theta)=0 .
\]
The first-order conditions are given by
\[
\frac{\partial L(\tilde\theta_n)}{\partial\theta}+\frac{\partial g(\tilde\theta_n)'}{\partial\theta}\,\lambda=0, \tag{6}
\]
\[
g(\tilde\theta_n)=0, \tag{7}
\]
where $\tilde\theta_n$, the solution of the constrained maximization problem, is called the constrained MLE, and $\lambda$ is the vector of Lagrange multipliers. The LM test is based on the idea that the properly scaled $\lambda$ has an asymptotically normal distribution.

Proposition 2.
\[
\xi_n^{S}=\frac{1}{n}\,\frac{\partial L(\tilde\theta_n)'}{\partial\theta}\,\mathcal{I}^{-1}(\tilde\theta_n)\,\frac{\partial L(\tilde\theta_n)}{\partial\theta}
=\frac{1}{n}\,\lambda'\,\frac{\partial g(\tilde\theta_n)}{\partial\theta'}\,\mathcal{I}^{-1}(\tilde\theta_n)\,\frac{\partial g(\tilde\theta_n)'}{\partial\theta}\,\lambda\;\sim\;\chi^2(r)\quad \text{under } H_0.
\]

Proof. First-order Taylor expansions of $g(\hat\theta_n)$ and $g(\tilde\theta_n)$ around $\theta_0$ give, ignoring $o_p(1)$ terms,
\[
\sqrt{n}\,g(\hat\theta_n)=\sqrt{n}\,g(\theta_0)+\frac{\partial g(\theta_0)}{\partial\theta'}\sqrt{n}\,(\hat\theta_n-\theta_0), \tag{8}
\]
\[
\sqrt{n}\,g(\tilde\theta_n)=\sqrt{n}\,g(\theta_0)+\frac{\partial g(\theta_0)}{\partial\theta'}\sqrt{n}\,(\tilde\theta_n-\theta_0). \tag{9}
\]
Noting that $g(\tilde\theta_n)=0$ from (7), subtracting (9) from (8) yields
\[
\sqrt{n}\,g(\hat\theta_n)=\frac{\partial g(\theta_0)}{\partial\theta'}\sqrt{n}\,(\hat\theta_n-\tilde\theta_n). \tag{10}
\]
On the other hand, first-order Taylor expansions of $\partial L(\tilde\theta_n)/\partial\theta$ and $\partial L(\hat\theta_n)/\partial\theta$ around $\theta_0$ give, again ignoring $o_p(1)$ terms,
\[
\frac{\partial L(\tilde\theta_n)}{\partial\theta}=\frac{\partial L(\theta_0)}{\partial\theta}+\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}(\tilde\theta_n-\theta_0)
\;\Longrightarrow\;
\frac{1}{\sqrt{n}}\frac{\partial L(\tilde\theta_n)}{\partial\theta}=\frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta}+\frac{1}{n}\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}\sqrt{n}\,(\tilde\theta_n-\theta_0),
\]
hence
\[
\frac{1}{\sqrt{n}}\frac{\partial L(\tilde\theta_n)}{\partial\theta}=\frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta}-\mathcal{I}(\theta_0)\sqrt{n}\,(\tilde\theta_n-\theta_0); \tag{11}
\]
note that $-\frac{1}{n}\frac{\partial^2 L(\theta_0)}{\partial\theta\,\partial\theta'}=-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2\log f(y_i\mid x_i;\theta_0)}{\partial\theta\,\partial\theta'}\xrightarrow{p}\mathcal{I}(\theta_0)$ by the law of large numbers. Similarly,
\[
\frac{1}{\sqrt{n}}\frac{\partial L(\hat\theta_n)}{\partial\theta}=\frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta}-\mathcal{I}(\theta_0)\sqrt{n}\,(\hat\theta_n-\theta_0). \tag{12}
\]
Considering the fact that $\partial L(\hat\theta_n)/\partial\theta=0$ by the first-order condition of the unconstrained maximization problem, we take the difference between (11) and (12). Then
\[
\frac{1}{\sqrt{n}}\frac{\partial L(\tilde\theta_n)}{\partial\theta}=-\mathcal{I}(\theta_0)\sqrt{n}\,(\tilde\theta_n-\hat\theta_n)=\mathcal{I}(\theta_0)\sqrt{n}\,(\hat\theta_n-\tilde\theta_n). \tag{13}
\]
Hence
\[
\sqrt{n}\,(\hat\theta_n-\tilde\theta_n)=\mathcal{I}^{-1}(\theta_0)\,\frac{1}{\sqrt{n}}\frac{\partial L(\tilde\theta_n)}{\partial\theta}. \tag{14}
\]
From (10) and (14) we obtain
\[
\sqrt{n}\,g(\hat\theta_n)=\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{1}{\sqrt{n}}\frac{\partial L(\tilde\theta_n)}{\partial\theta}.
\]
Using (6), we deduce
\[
\sqrt{n}\,g(\hat\theta_n)=-\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial g(\tilde\theta_n)'}{\partial\theta}\,\frac{\lambda}{\sqrt{n}}
\xrightarrow{p}-\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\,\frac{\lambda}{\sqrt{n}}, \tag{15}
\]
since $\tilde\theta_n\xrightarrow{p}\theta_0$ and hence $\partial g(\tilde\theta_n)'/\partial\theta\xrightarrow{p}\partial g(\theta_0)'/\partial\theta$. Therefore
\[
\frac{\lambda}{\sqrt{n}}=-\left[\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right]^{-1}\sqrt{n}\,g(\hat\theta_n). \tag{16}
\]
From (4), under the null hypothesis, $\sqrt{n}\,g(\hat\theta_n)\xrightarrow{d} N\!\left(0,\;\frac{\partial g(\theta_0)}{\partial\theta'}\mathcal{I}^{-1}(\theta_0)\frac{\partial g(\theta_0)'}{\partial\theta}\right)$. Consequently, we have
\[
\frac{\lambda}{\sqrt{n}}\xrightarrow{d} N\!\left(0,\;\left[\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\right]^{-1}\right). \tag{17}
\]
Again forming the quadratic form of the asymptotically normal random variables, we obtain
\[
\frac{1}{n}\,\lambda'\,\frac{\partial g(\theta_0)}{\partial\theta'}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial g(\theta_0)'}{\partial\theta}\,\lambda\;\sim\;\chi^2(r)\quad \text{under } H_0. \tag{18}
\]
Alternatively, using (6), another form of the test statistic is given by
\[
\frac{1}{n}\,\frac{\partial L(\tilde\theta_n)'}{\partial\theta}\,\mathcal{I}^{-1}(\theta_0)\,\frac{\partial L(\tilde\theta_n)}{\partial\theta}\;\sim\;\chi^2(r)\quad \text{under } H_0. \tag{19}
\]
Note that (18) and (19) are not usable as they stand, since they depend on the unknown parameter value $\theta_0$. We can evaluate the terms involving $\theta_0$ at the constrained MLE $\tilde\theta_n$ to get a usable statistic, which yields the two forms of $\xi_n^{S}$ in Proposition 2. $\blacksquare$

• Again, as with the Wald test, another form of the LM statistic replaces $n\,\mathcal{I}$ by $\mathcal{I}_n$:
\[
\xi_n^{S}=\frac{\partial L(\tilde\theta_n)'}{\partial\theta}\,\mathcal{I}_n^{-1}(\tilde\theta_n)\,\frac{\partial L(\tilde\theta_n)}{\partial\theta}
=\lambda'\,\frac{\partial g(\tilde\theta_n)}{\partial\theta'}\,\mathcal{I}_n^{-1}(\tilde\theta_n)\,\frac{\partial g(\tilde\theta_n)'}{\partial\theta}\,\lambda .
\]
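As with the Wald statistic, a small sketch may help fix ideas. The example below (an illustration, not part of the original notes) computes the feasible score statistic for the simplest case: an i.i.d. Poisson($\theta$) sample with the fully specifying null $H_0\colon\theta=\theta_0$, where the constrained MLE is just $\tilde\theta_n=\theta_0$ and no numerical optimization is needed.

```python
import numpy as np
from scipy import stats

# Sketch of the score (LM) statistic
# xi_S = (1/n) S(theta_tilde)' I^{-1}(theta_tilde) S(theta_tilde)
# for Poisson(theta) with the simple null H0: theta = theta0.

rng = np.random.default_rng(2)
n, theta0 = 400, 2.0
y = rng.poisson(lam=theta0, size=n)      # H0 holds by construction

# Score of L(theta) = sum_i (y_i log theta - theta) + const, at theta0
score = y.sum() / theta0 - n
# Per-observation information: I(theta) = 1/theta for the Poisson model
info = 1.0 / theta0

xi_S = (score ** 2) / (n * info)         # scalar version of (1/n) S' I^{-1} S
p_value = stats.chi2.sf(xi_S, df=1)      # chi^2(r) with r = 1
print(f"LM statistic = {xi_S:.3f}, p-value = {p_value:.3f}")
```

For a composite null that does not pin $\theta$ down completely, $\tilde\theta_n$ would instead come from a numerical constrained optimizer, with $\lambda$ recovered from the first-order condition (6).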