AAE 637 Lab 3: Hypothesis Testing*

2/11/2015

*Prepared by Travis McArthur, UW-Madison (http://www.aae.wisc.edu/tdmcarthur/teaching.asp)

The Wald test

You are familiar with the t-test. Usually the null hypothesis is that $\beta = 0$. If the null is $\beta = \beta_0$:

$$t_n(\beta_0) = \frac{\hat{\beta} - \beta_0}{s(\hat{\beta})}$$

where $s(\hat{\beta})$ is the standard error (i.e. the square root of the variance) of the estimate. For a two-sided test we reject the null hypothesis at the 5% significance level if $|t_n| \geq 1.96$.

The Wald test is a generalization of the t-test so we can conduct tests dealing with multiple parameters and/or multiple restrictions. Let’s square the t-stat:

$$(t_n(\beta_0))^2 = \frac{(\hat{\beta} - \beta_0)^2}{s^2(\hat{\beta})}$$

$\hat{\beta} - \beta_0$ is assumed to be normally distributed, so under the null $(t_n(\beta_0))^2$ is distributed $\chi^2$ (“chi-squared”) with 1 degree of freedom. With $F_q(u)$ denoting the $\chi^2_q$ cumulative distribution function, the critical value $v$ for rejection of the null hypothesis at significance level $\alpha$ satisfies $\alpha = 1 - F_q(v)$. For $\alpha = 0.05$ and $q = 1$, this critical value is 3.84, i.e. we reject the null if $(t_n(\beta_0))^2 \geq 3.84$.

[Figure: density of the $\chi^2_1$ distribution, marking the 95% of mass below the critical value and the 5% of mass above it. Horizontal axis: value of Wald statistic. Vertical axis: density.]

For the Wald statistic, we can express any set of $q$ linear restrictions on $K$ parameter values with matrix notation:

$$\underset{q \times K}{R} \; \underset{K \times 1}{\beta} = \underset{q \times 1}{c}$$

Say we have four parameters and we wish to test the joint hypothesis that $\beta_1 + \beta_2 - \beta_3 = 1$ and $\beta_1 = 10$. Let’s make the first row of $R$:

$$\beta_1 + \beta_2 - \beta_3 = \beta_1 \cdot 1 + \beta_2 \cdot 1 + \beta_3 \cdot (-1) + \beta_4 \cdot 0$$

and the second row:

$$\beta_1 = \beta_1 \cdot 1 + \beta_2 \cdot 0 + \beta_3 \cdot 0 + \beta_4 \cdot 0$$

So

$$R = \begin{bmatrix} 1 & 1 & -1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \qquad c = \begin{bmatrix} 1 \\ 10 \end{bmatrix}$$

The Wald statistic is given by:

$$W_n = \left( R\hat{\beta} - c \right)' \left( R \hat{V} R' \right)^{-1} \left( R\hat{\beta} - c \right), \qquad W_n \sim \chi^2_q$$

where $R$ is $q \times K$, $\hat{\beta}$ is $K \times 1$, $c$ is $q \times 1$, and $\hat{V}$ is $K \times K$.

$\hat{V}$ is the estimated asymptotic variance-covariance matrix of the parameters $\hat{\beta}$. You must use the correct number of degrees of freedom on $\chi^2$ to do the Wald test and/or compute the p-value. The degrees of freedom here are the number of restrictions (i.e. rows of $R$), not the number of parameters being tested.
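The Wald computation for the four-parameter example can be sketched numerically. This is a minimal illustration in plain Python (standard library only) rather than the lab's MATLAB. The matrices `R` and `c_vec` come from the text, but `beta_hat` and `V_hat` are hypothetical values invented for the illustration, and the helper functions are not part of any library. The p-value uses the closed form $\exp(-W_n/2)$, which is valid only for $q = 2$ restrictions.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def inverse_2x2(m):
    """Invert a 2x2 matrix with the adjugate formula."""
    (m11, m12), (m21, m22) = m
    det = m11 * m22 - m12 * m21
    return [[m22 / det, -m12 / det], [-m21 / det, m11 / det]]


# Restrictions from the text: beta1 + beta2 - beta3 = 1 and beta1 = 10.
R = [[1, 1, -1, 0],
     [1, 0, 0, 0]]
c_vec = [[1], [10]]

# HYPOTHETICAL estimates and covariance matrix, for illustration only.
beta_hat = [[9.0], [3.0], [10.5], [0.2]]
V_hat = [[1.0, 0.2, 0.1, 0.0],
         [0.2, 2.0, 0.3, 0.0],
         [0.1, 0.3, 1.5, 0.0],
         [0.0, 0.0, 0.0, 0.5]]

# (R beta_hat - c), a q x 1 column vector.
r_beta = matmul(R, beta_hat)
diff = [[r_beta[i][0] - c_vec[i][0]] for i in range(len(c_vec))]

# (R V_hat R')^{-1}, a q x q matrix; the off-diagonal terms of V_hat enter here.
middle = inverse_2x2(matmul(matmul(R, V_hat), transpose(R)))

# Quadratic form gives the scalar Wald statistic.
wald = matmul(matmul(transpose(diff), middle), diff)[0][0]

# Degrees of freedom = number of restrictions (rows of R), here q = 2.
q = len(R)
# For q = 2 only, the chi-squared upper-tail probability is exp(-w / 2).
p_value = math.exp(-wald / 2)

print(f"W = {wald:.4f}, q = {q}, p-value = {p_value:.4f}")
```

Setting the off-diagonal entries of `V_hat` to zero changes `wald`, which is one way to see why the covariance across parameter estimates cannot be ignored.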
The key thing here is that the covariance across the parameter estimates must be taken into account when you compare parameter estimates within the same model and/or dataset. To get the p-value in MATLAB, input something like:

    1-chi2cdf(w_test_output,size(R_mat,1))

The F test

The F statistic can be used to test multiple hypotheses as well, but it is extremely similar to the Wald stat. In fact, when the homoskedastic estimate of $V$ is used in the Wald stat, we have:

$$F_n = \frac{W_n}{q}$$

However, the F statistic is compared against the F distribution, which incorporates a “small sample” correction: $F_n \sim F[q, n - K]$, where $K$ is the number of estimated parameters and $n$ is the number of observations. As $n \to \infty$, the $F$ distribution converges to a $\chi^2_q$ distribution if multiplied by $q$. The F statistic can also be expressed in terms of the sum of squared errors of an OLS regression:

$$F_n = \frac{\left( SSE(\tilde{\beta}_{cls}) - SSE(\hat{\beta}) \right) / q}{SSE(\hat{\beta}) / (n - K)}$$

where $SSE(\tilde{\beta}_{cls})$ is the sum of squared errors for the constrained model. The F test is often used to check whether a model has any explanatory power at all, in which case $SSE(\tilde{\beta}_{cls})$ would be the total sum of squares. This type of F test was more relevant in the past when only small datasets were available.

The delta method

Say we know that the asymptotic distribution of a $K \times 1$ consistent set of estimators $\hat{\beta}$ is jointly normal, i.e. as $n \to \infty$,

$$\sqrt{n} \left( \hat{\beta} - \beta \right) \overset{d}{\to} N\left( 0, V \right)$$

where $\hat{\beta}$, $\beta$ and the zero vector are $K \times 1$ and $V$ is $K \times K$. Further suppose that what we really want to get out of our estimation is some continuously differentiable transformation $g : \mathbb{R}^K \to \mathbb{R}^L$ of our parameter estimates, not the estimates themselves. In addition to obtaining the transformed estimates, we need to know the uncertainty of the transformed estimates. The delta method gives us the following asymptotic approximation of the variance of the transformed estimates:

$$\sqrt{n} \left( g(\hat{\beta}) - g(\beta) \right) \overset{d}{\to} N\left( 0, \left[ \frac{\partial}{\partial \beta} g(\beta) \right] V \left[ \frac{\partial}{\partial \beta} g(\beta) \right]' \right)$$

where $g(\hat{\beta})$, $g(\beta)$ and the zero vector are $L \times 1$, and $\frac{\partial}{\partial \beta} g(\beta)$ is the $L \times K$ Jacobian matrix:

$$\frac{\partial}{\partial \beta} g(\beta) = \begin{bmatrix} \frac{\partial}{\partial \beta_1} g_1(\beta) & \cdots & \frac{\partial}{\partial \beta_K} g_1(\beta) \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial \beta_1} g_L(\beta) & \cdots & \frac{\partial}{\partial \beta_K} g_L(\beta) \end{bmatrix}$$

The delta method works because it rests on a first-order Taylor series approximation of $g(\hat{\beta})$ around $\beta$. Now say we can consistently estimate $V$. Then our estimator for the variance of $g(\hat{\beta})$ is simply the “plug-in” estimator:

$$\left[ \frac{\partial}{\partial \beta} g(\hat{\beta}) \right] \hat{V} \left[ \frac{\partial}{\partial \beta} g(\hat{\beta}) \right]'$$

Example:

Say

$$\hat{\beta} = \begin{bmatrix} 0.5 \\ 1.5 \end{bmatrix}, \qquad \hat{V} = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 2 \end{bmatrix}, \qquad g(\hat{\beta}) = \begin{bmatrix} \hat{\beta}_1 / \hat{\beta}_2 \\ \hat{\beta}_2^2 \end{bmatrix}$$

Then:

$$\frac{\partial}{\partial \beta} g(\hat{\beta}) = \begin{bmatrix} 1/\hat{\beta}_2 & -\hat{\beta}_1 / \hat{\beta}_2^2 \\ 0 & 2\hat{\beta}_2 \end{bmatrix}$$

This implies that the variance-covariance estimate for $g(\hat{\beta})$ is:

$$\hat{V}_{g(\beta)} = \begin{bmatrix} 1/1.5 & -0.5/(1.5)^2 \\ 0 & 2 \cdot 1.5 \end{bmatrix} \begin{bmatrix} 1 & -0.5 \\ -0.5 & 2 \end{bmatrix} \begin{bmatrix} 1/1.5 & -0.5/(1.5)^2 \\ 0 & 2 \cdot 1.5 \end{bmatrix}' = \begin{bmatrix} 0.691 & -2.\bar{3} \\ -2.\bar{3} & 18 \end{bmatrix}$$

Now say my null hypothesis is

$$g(\beta) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

Then the Wald stat is:

$$W_n = \left( \begin{bmatrix} 1/3 \\ 2.25 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right)' \hat{V}_{g(\beta)}^{-1} \left( \begin{bmatrix} 1/3 \\ 2.25 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) \approx 1.038$$

Using the $\chi^2_2$ distribution, this yields a p-value of 0.60, so we cannot reject the null hypothesis.

When the parameter restrictions being tested are nonlinear, different algebraic formulations of the restriction can affect the value of the Wald statistic. This is undesirable. Next lab we will discuss likelihood-based tests that do not suffer from this problem.
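The delta-method example can be checked numerically. The sketch below is plain Python (standard library only) rather than the lab's MATLAB, and its helper functions are illustrative, not part of any library. The first formulation uses the numbers from the example. The second formulation ($\beta_1 - \beta_2 = 0$ and $\beta_2 = \sqrt{2}$) is an assumption added here for illustration: it states the same null only if $\beta_2 > 0$, and it is included to show how the Wald statistic can change with the algebraic form of a nonlinear restriction. The p-values use $\exp(-W_n/2)$, which is valid only for 2 degrees of freedom.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def inverse_2x2(m):
    """Invert a 2x2 matrix with the adjugate formula."""
    (m11, m12), (m21, m22) = m
    det = m11 * m22 - m12 * m21
    return [[m22 / det, -m12 / det], [-m21 / det, m11 / det]]


def delta_wald(g_hat, jac, v_hat, null):
    """Return (plug-in covariance J V J', Wald statistic) for two restrictions."""
    v_g = matmul(matmul(jac, v_hat), transpose(jac))
    diff = [[g_hat[i] - null[i]] for i in range(2)]
    wald = matmul(matmul(transpose(diff), inverse_2x2(v_g)), diff)[0][0]
    return v_g, wald


# Estimates and covariance matrix from the example in the text.
b1, b2 = 0.5, 1.5
V_hat = [[1.0, -0.5],
         [-0.5, 2.0]]

# Formulation from the text: g(beta) = (beta1 / beta2, beta2^2), null (1, 2).
g_hat = [b1 / b2, b2 ** 2]
jac_g = [[1 / b2, -b1 / b2 ** 2],
         [0.0, 2 * b2]]
V_g, wald_g = delta_wald(g_hat, jac_g, V_hat, [1.0, 2.0])

# ASSUMED alternative formulation (same null if beta2 > 0):
# h(beta) = (beta1 - beta2, beta2), null (0, sqrt(2)).
h_hat = [b1 - b2, b2]
jac_h = [[1.0, -1.0],
         [0.0, 1.0]]
V_h, wald_h = delta_wald(h_hat, jac_h, V_hat, [0.0, math.sqrt(2.0)])

# Chi-squared upper-tail probability with 2 degrees of freedom.
p_g = math.exp(-wald_g / 2)
p_h = math.exp(-wald_h / 2)

print("V_g =", V_g)
print(f"text formulation:        W = {wald_g:.3f}, p = {p_g:.3f}")
print(f"alternative formulation: W = {wald_h:.3f}, p = {p_h:.3f}")
```

The two formulations give different Wald statistics from the same estimates, which is the lack of invariance described above.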
