Wald (and Score) Tests

Vector of MLEs is Asymptotically Normal (That is, Multivariate Normal)

This yields
- Confidence intervals
- Z-tests of $H_0: \theta_j = \theta_0$
- Wald tests
- Score tests
- Indirectly, the likelihood ratio tests

Under Regularity Conditions (Thank you, Mr. Wald)
- $\widehat{\theta}_n \stackrel{a.s.}{\rightarrow} \theta$
- $\sqrt{n}\,(\widehat{\theta}_n - \theta) \stackrel{d}{\rightarrow} T \sim N_k\!\left(0, \mathcal{I}(\theta)^{-1}\right)$
- So we say that $\widehat{\theta}_n$ is asymptotically $N_k\!\left(\theta, \frac{1}{n}\mathcal{I}(\theta)^{-1}\right)$.
- $\mathcal{I}(\theta)$ is the Fisher information in one observation: a $k \times k$ matrix
  \[ \mathcal{I}(\theta) = \left[ E\!\left(-\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \log f(Y;\theta)\right) \right] \]
- The Fisher information in the whole sample is $n\,\mathcal{I}(\theta)$.

$H_0: C\theta = h$

Suppose $\theta = (\theta_1, \ldots, \theta_7)$, and the null hypothesis is
- $\theta_1 = \theta_2$
- $\theta_6 = \theta_7$
- $\frac{1}{3}(\theta_1 + \theta_2 + \theta_3) = \frac{1}{3}(\theta_4 + \theta_5 + \theta_6)$

We can write the null hypothesis in matrix form as
\[
\left( \begin{array}{rrrrrrr}
1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & -1 \\
1 & 1 & 1 & -1 & -1 & -1 & 0
\end{array} \right)
\left( \begin{array}{c} \theta_1 \\ \theta_2 \\ \theta_3 \\ \theta_4 \\ \theta_5 \\ \theta_6 \\ \theta_7 \end{array} \right)
=
\left( \begin{array}{c} 0 \\ 0 \\ 0 \end{array} \right)
\]

Suppose $H_0: C\theta = h$ is True, and $\widehat{\mathcal{I}}(\theta)_n \stackrel{p}{\rightarrow} \mathcal{I}(\theta)$

By Slutsky 6a (Continuous mapping),
\[ \sqrt{n}\,(C\widehat{\theta}_n - C\theta) = \sqrt{n}\,(C\widehat{\theta}_n - h) \stackrel{d}{\rightarrow} CT \sim N_r\!\left(0,\, C\,\mathcal{I}(\theta)^{-1}C'\right) \]
and $\widehat{\mathcal{I}}(\theta)_n^{-1} \stackrel{p}{\rightarrow} \mathcal{I}(\theta)^{-1}$. Then by Slutsky's (6c) Stack Theorem,
\[ \left( \begin{array}{c} \sqrt{n}\,(C\widehat{\theta}_n - h) \\ \widehat{\mathcal{I}}(\theta)_n^{-1} \end{array} \right)
   \stackrel{d}{\rightarrow}
   \left( \begin{array}{c} CT \\ \mathcal{I}(\theta)^{-1} \end{array} \right). \]
Finally, by Slutsky 6a again,
\[
W_n = n\,(C\widehat{\theta}_n - h)' \left( C\,\widehat{\mathcal{I}}(\theta)_n^{-1} C' \right)^{-1} (C\widehat{\theta}_n - h)
\stackrel{d}{\rightarrow} W = (CT - 0)'\left(C\,\mathcal{I}(\theta)^{-1}C'\right)^{-1}(CT - 0) \sim \chi^2(r)
\]

The Wald Test Statistic
\[ W_n = n\,(C\widehat{\theta}_n - h)' \left( C\,\widehat{\mathcal{I}}(\theta)_n^{-1} C' \right)^{-1} (C\widehat{\theta}_n - h) \]
- Again, the null hypothesis is $H_0: C\theta = h$.
- The matrix $C$ is $r \times k$, $r \leq k$, of rank $r$.
- All we need is a consistent estimator of $\mathcal{I}(\theta)$.
- $\mathcal{I}(\widehat{\theta})$ would do, but it is inconvenient: we would need to compute the partial derivatives and expected values in
  \[ \mathcal{I}(\theta) = \left[ E\!\left(-\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \log f(Y;\theta)\right) \right]. \]

Observed Fisher Information
- To find $\widehat{\theta}_n$, minimize the minus log likelihood.
- The matrix of mixed partial derivatives of the minus log likelihood is
  \[ \left[ -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, \ell(\theta, \mathbf{Y}) \right]
   = \left[ -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \sum_{m=1}^n \log f(Y_m;\theta) \right] \]
- So by the Strong Law of Large Numbers,
  \[ \mathcal{J}_n(\theta) = \frac{1}{n}\left[ -\sum_{m=1}^n \frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \log f(Y_m;\theta) \right]
   \stackrel{a.s.}{\rightarrow} \left[ E\!\left(-\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \log f(Y;\theta)\right) \right] = \mathcal{I}(\theta) \]

A Consistent Estimator of $\mathcal{I}(\theta)$

Just substitute $\widehat{\theta}_n$ for $\theta$:
\[ \mathcal{J}_n(\widehat{\theta}_n) = \frac{1}{n}\left[ -\sum_{m=1}^n \frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \log f(Y_m;\theta) \right]_{\theta=\widehat{\theta}_n}
 \stackrel{a.s.}{\rightarrow} \left[ E\!\left(-\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \log f(Y;\theta)\right) \right] = \mathcal{I}(\theta) \]
- The convergence is believable but not trivial.
- Now we have a consistent estimator that is more convenient than $\mathcal{I}(\widehat{\theta}_n)$: use $\widehat{\mathcal{I}}(\theta)_n = \mathcal{J}_n(\widehat{\theta}_n)$.

Approximate the Asymptotic Covariance Matrix
- The asymptotic covariance matrix of $\widehat{\theta}_n$ is $\frac{1}{n}\mathcal{I}(\theta)^{-1}$.
- Approximate it with
  \[ \widehat{\mathbf{V}}_n = \frac{1}{n}\,\mathcal{J}_n(\widehat{\theta}_n)^{-1}
   = \frac{1}{n}\left( \frac{1}{n}\left[ -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, \ell(\theta, \mathbf{Y}) \right]_{\theta=\widehat{\theta}_n} \right)^{-1}
   = \left( \left[ -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, \ell(\theta, \mathbf{Y}) \right]_{\theta=\widehat{\theta}_n} \right)^{-1} \]

Compare Hessian and (Estimated) Asymptotic Covariance Matrix
- $\widehat{\mathbf{V}}_n = \left( \left[ -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, \ell(\theta, \mathbf{Y}) \right]_{\theta=\widehat{\theta}_n} \right)^{-1}$
- The Hessian at the MLE is $\mathbf{H} = \left[ -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, \ell(\theta, \mathbf{Y}) \right]_{\theta=\widehat{\theta}_n}$
- So to estimate the asymptotic covariance matrix of $\widehat{\theta}_n$, just invert the Hessian.
- The Hessian is usually available as a by-product of the numerical search for the MLE.

Connection to Numerical Optimization
- Suppose we are minimizing the minus log likelihood by a direct search.
- We have reached a point where the gradient is close to zero. Is this point a minimum?
- The Hessian is a matrix of mixed partial derivatives. If all its eigenvalues are positive at a point, the function is concave up there.
- It's the multivariable second derivative test.
- The Hessian at the MLE is exactly the observed Fisher information matrix.
- Partial derivatives are often approximated by the slopes of secant lines, so there is no need to calculate them.

So to find the estimated asymptotic covariance matrix
- Minimize the minus log likelihood numerically.
- The Hessian at the place where the search stops is exactly the observed Fisher information matrix.
- Invert it to get $\widehat{\mathbf{V}}_n$.
- This is so handy that sometimes we do it even when a closed-form expression for the MLE is available.
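To make this recipe concrete, here is a minimal sketch in Python; it is not part of the original slides. It assumes i.i.d. normal data with a $(\mu, \log\sigma)$ parameterization and uses scipy.optimize.minimize with BFGS, whose reported inverse Hessian of the minus log likelihood plays the role of $\widehat{\mathbf{V}}_n$; the data, parameterization, and library choice are all illustrative assumptions.

```python
# Sketch: numerically minimize the minus log likelihood, then use the
# (approximate) inverse Hessian at the minimum as the estimated asymptotic
# covariance matrix V_hat_n. Data and model are made up for illustration.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=3.0, size=200)   # simulated sample (assumption)

def minus_loglik(theta, y):
    """Minus log likelihood of i.i.d. Normal data, theta = (mu, log sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)                  # log parameterization keeps sigma > 0
    return -np.sum(-0.5 * np.log(2.0 * np.pi) - np.log(sigma)
                   - 0.5 * ((y - mu) / sigma) ** 2)

fit = minimize(minus_loglik, x0=np.array([0.0, 0.0]), args=(y,), method="BFGS")
theta_hat = fit.x                              # the MLE (mu_hat, log sigma_hat)

# For BFGS, fit.hess_inv approximates the inverse Hessian of the objective.
# Since the objective is the minus log likelihood, this is V_hat_n directly.
V_hat = fit.hess_inv
std_errors = np.sqrt(np.diag(V_hat))           # asymptotic standard errors
print("theta_hat:", theta_hat)
print("standard errors:", std_errors)
```

Note that the BFGS inverse Hessian is only a quasi-Newton approximation; for more precision one would compute the Hessian by finite differences at $\widehat{\theta}_n$ and invert that instead.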
Estimated Asymptotic Covariance Matrix $\widehat{\mathbf{V}}_n$ is Useful
- The asymptotic standard error of $\widehat{\theta}_j$ is the square root of the $j$th diagonal element.
- Denote the asymptotic standard error of $\widehat{\theta}_j$ by $S_{\widehat{\theta}_j}$.
- Thus
  \[ Z_j = \frac{\widehat{\theta}_j - \theta_j}{S_{\widehat{\theta}_j}} \]
  is approximately standard normal.

Confidence Intervals and Z-tests

Having $Z_j = \frac{\widehat{\theta}_j - \theta_j}{S_{\widehat{\theta}_j}}$ approximately standard normal yields
- Confidence intervals: $\widehat{\theta}_j \pm S_{\widehat{\theta}_j}\, z_{\alpha/2}$
- Tests of $H_0: \theta_j = \theta_0$ using
  \[ Z = \frac{\widehat{\theta}_j - \theta_0}{S_{\widehat{\theta}_j}} \]

And Wald Tests

Recalling $\widehat{\mathbf{V}}_n = \frac{1}{n}\,\mathcal{J}_n(\widehat{\theta}_n)^{-1}$,
\[
\begin{aligned}
W_n &= n\,(C\widehat{\theta}_n - h)' \left( C\,\widehat{\mathcal{I}}(\theta)_n^{-1} C' \right)^{-1} (C\widehat{\theta}_n - h) \\
    &= n\,(C\widehat{\theta}_n - h)' \left( C\,\mathcal{J}_n(\widehat{\theta}_n)^{-1} C' \right)^{-1} (C\widehat{\theta}_n - h) \\
    &= n\,(C\widehat{\theta}_n - h)' \left( C\,(n\widehat{\mathbf{V}}_n)\, C' \right)^{-1} (C\widehat{\theta}_n - h) \\
    &= n \cdot \frac{1}{n}\,(C\widehat{\theta}_n - h)' \left( C\,\widehat{\mathbf{V}}_n\, C' \right)^{-1} (C\widehat{\theta}_n - h) \\
    &= (C\widehat{\theta}_n - h)' \left( C\,\widehat{\mathbf{V}}_n\, C' \right)^{-1} (C\widehat{\theta}_n - h)
\end{aligned}
\]
(A numerical sketch of this computation is given at the end of this section.)

Score Tests (Thank you, Mr. Rao)
- $\widehat{\theta}$ is the MLE of $\theta$, of dimension $k \times 1$.
- $\widehat{\theta}_0$ is the MLE under $H_0$, of dimension $k \times 1$.
- $u(\theta) = \left( \frac{\partial \ell}{\partial\theta_1}, \ldots, \frac{\partial \ell}{\partial\theta_k} \right)'$ is the gradient.
- $u(\widehat{\theta}) = 0$.
- If $H_0$ is true, $u(\widehat{\theta}_0)$ should also be close to zero.
- Under $H_0$, for large $n$, $u(\widehat{\theta}_0) \sim N_k\!\left(0, \mathcal{J}(\theta)\right)$ approximately, where $\mathcal{J}(\theta) = n\,\mathcal{I}(\theta)$ is the Fisher information in the whole sample.
- And
  \[ S = u(\widehat{\theta}_0)'\, \mathcal{J}(\widehat{\theta}_0)^{-1}\, u(\widehat{\theta}_0) \sim \chi^2(r), \]
  where $r$ is the number of restrictions imposed by $H_0$. (A sketch also appears at the end of this section.)

Three Big Tests
- Score tests: fit just the restricted model.
- Wald tests: fit just the unrestricted model.
- Likelihood ratio tests: fit both.

Comparing Likelihood Ratio and Wald
- Asymptotically equivalent under $H_0$, meaning $(W_n - G_n) \stackrel{p}{\rightarrow} 0$, where $G_n$ is the likelihood ratio test statistic.
- Under $H_1$,
  - Both have approximately the same distribution (non-central chi-square).
  - Both go to infinity as $n \rightarrow \infty$.
  - But the values are not necessarily close.
- The likelihood ratio test tends to get closer to the right Type I error rate for small samples.
- Wald can be more convenient when testing lots of hypotheses, because you only need to fit the model once.
- Wald can be more convenient if it's a lot of work to write the restricted likelihood.
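As promised above, here is a minimal numerical sketch of the Wald statistic $W_n = (C\widehat{\theta}_n - h)'(C\widehat{\mathbf{V}}_n C')^{-1}(C\widehat{\theta}_n - h)$, using the $3 \times 7$ matrix $C$ from the earlier example. It is not from the slides: theta_hat and V_hat are made-up placeholders standing in for the output of a fitted model, and the use of numpy/scipy is an assumption.

```python
# Sketch: Wald test of H0: C theta = h, given an estimate theta_hat and its
# estimated asymptotic covariance matrix V_hat (both invented here).
import numpy as np
from scipy.stats import chi2

C = np.array([[1, -1, 0,  0,  0,  0,  0],
              [0,  0, 0,  0,  0,  1, -1],
              [1,  1, 1, -1, -1, -1,  0]], dtype=float)
h = np.zeros(3)

theta_hat = np.array([1.1, 0.9, 2.0, 1.2, 1.4, 1.5, 1.6])   # pretend MLE
rng = np.random.default_rng(1)
A = rng.normal(size=(7, 7))
V_hat = A @ A.T / 500.0                                      # pretend V_hat_n (positive definite)

d = C @ theta_hat - h
W = d @ np.linalg.solve(C @ V_hat @ C.T, d)     # W_n = d' (C V_hat C')^{-1} d
p_value = chi2.sf(W, df=C.shape[0])             # df = r, the number of rows of C
print("W_n:", W, "p-value:", p_value)
```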
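And a minimal sketch of a score test, again not from the slides. The model (i.i.d. normal with unknown variance), the hypothesis $H_0: \mu = 0$, and the hand-derived score and information matrix are illustrative assumptions; the point is only to show the pattern $S = u(\widehat{\theta}_0)'\,\mathcal{J}(\widehat{\theta}_0)^{-1}\,u(\widehat{\theta}_0)$ with a chi-squared reference distribution.

```python
# Sketch: score test of H0: mu = 0 for i.i.d. Normal(mu, sigma^2) data with
# sigma^2 unknown. Under H0 the restricted MLE is mu = 0, sigma0^2 = mean(y^2).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
y = rng.normal(loc=0.3, scale=2.0, size=100)    # simulated sample (assumption)
n = y.size

sigma2_0 = np.mean(y ** 2)                      # restricted MLE of sigma^2 under H0

# Score (gradient of the log likelihood in (mu, sigma^2)) at the restricted MLE:
u = np.array([np.sum(y) / sigma2_0,             # d ell / d mu, evaluated at mu = 0
              0.0])                             # d ell / d sigma^2 is 0 at the restricted MLE

# Fisher information in the whole sample, n * I(theta), at the restricted MLE:
J = n * np.array([[1.0 / sigma2_0, 0.0],
                  [0.0, 1.0 / (2.0 * sigma2_0 ** 2)]])

S = u @ np.linalg.solve(J, u)                   # S = u' J^{-1} u
p_value = chi2.sf(S, df=1)                      # one restriction, so chi-squared(1)
print("S:", S, "p-value:", p_value)
```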