Econ 715

Lecture 6: Testing Nonlinear Restrictions^1

The previous lectures prepared us for tests of nonlinear restrictions of the form:

H_0: h(\theta_0) = 0 \quad versus \quad H_1: h(\theta_0) \neq 0. \qquad (1)

In this lecture, we consider the Wald, Lagrange multiplier (LM), and quasi-likelihood ratio (QLR) tests. The Delta Method will be useful in constructing those tests, especially the Wald test.

1 The Delta Method

The delta method can be used to find the asymptotic distribution of h(\hat\theta_n), suitably normalized, if d_n(\hat\theta_n - \theta_0) \to_d Z.

Theorem (\Delta-method): Suppose d_n(\hat\theta_n - \theta_0) \to_d Y, where \hat\theta_n and Y are random k-vectors, \theta_0 is a non-random k-vector, and \{d_n : n \geq 1\} is a sequence of scalar constants that diverges to infinity as n \to \infty. Suppose h(\cdot): R^k \to R^\ell is differentiable at \theta_0, i.e., h(\theta) = h(\theta_0) + \frac{\partial}{\partial\theta'} h(\theta_0)(\theta - \theta_0) + o(\|\theta - \theta_0\|) as \theta \to \theta_0. Then,

d_n(h(\hat\theta_n) - h(\theta_0)) \to_d \frac{\partial}{\partial\theta'} h(\theta_0)\, Y.

If Y \sim N(0, V), then \frac{\partial}{\partial\theta'} h(\theta_0) Y \sim N\left(0, \frac{\partial}{\partial\theta'} h(\theta_0)\, V \left(\frac{\partial}{\partial\theta'} h(\theta_0)\right)'\right).

Proof: By the assumption of differentiability of h at \theta_0, we have

d_n(h(\hat\theta_n) - h(\theta_0)) = \frac{\partial}{\partial\theta'} h(\theta_0)\, d_n(\hat\theta_n - \theta_0) + d_n\, o(\|\hat\theta_n - \theta_0\|).

The first term on the right-hand side converges in distribution to \frac{\partial}{\partial\theta'} h(\theta_0) Y. So, we have the desired result provided d_n\, o(\|\hat\theta_n - \theta_0\|) = o_p(1). This holds because

d_n\, o(\|\hat\theta_n - \theta_0\|) = \|d_n(\hat\theta_n - \theta_0)\|\, o(1) = O_p(1)\, o(1) = o_p(1),

and the proof is complete. \square
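The theorem can be checked numerically. The following simulation sketch is my own illustration (the choices of V, h, and the sample sizes are arbitrary, not from the notes): it compares the simulated variance of \sqrt{n}(h(\hat\theta_n) - h(\theta_0)) with the delta-method variance H V H'.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 2000, 5000
theta0 = np.array([1.0, 2.0])
V = np.array([[1.0, 0.3], [0.3, 2.0]])   # asy. covariance of sqrt(n)(theta_hat - theta0)
L = np.linalg.cholesky(V)

# Nonlinear function h(theta) = theta_1 * theta_2, with gradient H at theta0
H = np.array([theta0[1], theta0[0]])

# Draw theta_hat ~ N(theta0, V/n), then form sqrt(n)(h(theta_hat) - h(theta0))
draws = theta0 + (rng.standard_normal((reps, 2)) @ L.T) / np.sqrt(n)
stats = np.sqrt(n) * (draws[:, 0] * draws[:, 1] - theta0[0] * theta0[1])

print(stats.var())       # simulated variance
print(float(H @ V @ H))  # delta-method variance H V H' = 7.2 here
```

The two numbers agree up to simulation noise and an O(1/n) higher-order term, which is exactly what the theorem predicts.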

^1 The notes for this lecture are largely adapted from the notes of Donald Andrews on the same topic. I am grateful for Professor Andrews' generosity and elegant exposition. All errors are mine.

Xiaoxia Shi Page: 1 Econ 715

2 Tests for Nonlinear Restrictions

The R^r-valued function h(\cdot) defining the restrictions and the matrix \Omega_0 (defined in Assumption CF) are assumed to satisfy:

Assumption R: (i) h(\theta) is continuously differentiable on a neighborhood of \theta_0.
(ii) H = \frac{\partial}{\partial\theta'} h(\theta_0) has full rank r (\leq d, where d is the dimension of \theta).
(iii) \Omega_0 is positive definite.

For example, we might have

h(\theta) = \theta_1 - \theta_2, \quad h(\theta) = \theta_1\theta_2 - \theta_3, \quad or \quad h(\theta) = \begin{pmatrix} \theta_1/(\theta_2 + \theta_3) \\ \theta_1 \end{pmatrix}. \qquad (2)

The requirement that \Omega_0 is positive definite ensures that the asymptotic covariance matrix, B_0^{-1}\Omega_0 B_0^{-1}, of \sqrt{n}(\hat\theta_n - \theta_0) is positive definite.

The Wald statistic for testing H_0 is defined to be

W_n = n\, h(\hat\theta_n)' (\hat H_n \hat B_n^{-1} \hat\Omega_n \hat B_n^{-1} \hat H_n')^{-1} h(\hat\theta_n), \quad where \quad \hat H_n = \frac{\partial}{\partial\theta'} h(\hat\theta_n) \qquad (3)

and \hat B_n and \hat\Omega_n are consistent estimators of B_0 and \Omega_0, respectively. See Lecture 5 for the choice of \hat B_n and \hat\Omega_n.

As defined, the Wald statistic is a quadratic form in the (unrestricted) estimator h(\hat\theta_n) of the restrictions h(\theta_0). The quadratic form has a positive definite (pd) weight matrix. If H_0 is true, then h(\hat\theta_n) should be close to zero and the quadratic form W_n in h(\hat\theta_n) also should be close to zero. On the other hand, if H_0 is false, h(\hat\theta_n) should be close to h(\theta_0) \neq 0 and the quadratic form W_n in h(\hat\theta_n) should be different from zero. Thus, the Wald test rejects H_0 when its value is sufficiently large. Large-sample critical values are provided by Theorem 6.1 below.

The LM and QLR statistics depend on a restricted extremum estimator of \theta_0. The restricted extremum estimator, \tilde\theta_n, is an estimator that satisfies

\tilde\theta_n \in \Theta, \quad h(\tilde\theta_n) = 0, \quad and \quad \hat Q_n(\tilde\theta_n) \leq \inf\{\hat Q_n(\theta) : \theta \in \Theta,\ h(\theta) = 0\} + o_p(n^{-1/2}). \qquad (4)
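In practice, the Wald statistic in (3) is a few lines of linear algebra once the plug-in estimates are available. The sketch below is mine, not part of the notes: the function name, the toy restriction, and the identity placeholders for \hat B_n and \hat\Omega_n are all hypothetical (in applications those come from the estimators discussed in Lecture 5).

```python
import numpy as np

def wald_stat(n, theta_hat, h, H_of, B_hat, Omega_hat):
    """Wald statistic (3): n * h' (H B^{-1} Omega B^{-1} H')^{-1} h,
    evaluated at the unrestricted estimator theta_hat."""
    hval = np.atleast_1d(h(theta_hat))
    Hn = np.atleast_2d(H_of(theta_hat))          # r x d Jacobian of h at theta_hat
    Binv = np.linalg.inv(B_hat)
    V = Hn @ Binv @ Omega_hat @ Binv @ Hn.T      # r x r inverse weight matrix
    return float(n * hval @ np.linalg.solve(V, hval))

# Toy example with restriction h(theta) = theta_1 * theta_2 - theta_3
n = 500
theta_hat = np.array([1.1, 2.0, 2.3])            # made-up point estimate
h = lambda t: t[0] * t[1] - t[2]
H_of = lambda t: np.array([[t[1], t[0], -1.0]])
B_hat, Omega_hat = np.eye(3), np.eye(3)          # identity placeholders
w = wald_stat(n, theta_hat, h, H_of, B_hat, Omega_hat)
print(w)
```

With identity B_hat and Omega_hat the statistic reduces to n\, h^2 / (H H'), which is easy to verify by hand.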


The LM statistic is de…ned by

LM_n = n\, \frac{\partial}{\partial\theta'}\hat Q_n(\tilde\theta_n)\, \tilde B_n^{-1}\tilde H_n' (\tilde H_n \tilde B_n^{-1}\tilde\Omega_n \tilde B_n^{-1}\tilde H_n')^{-1} \tilde H_n \tilde B_n^{-1}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n), \quad where \quad \tilde H_n = \frac{\partial}{\partial\theta'} h(\tilde\theta_n) \qquad (5)

and \tilde B_n and \tilde\Omega_n are consistent estimators under H_0 of B_0 and \Omega_0, respectively, that are constructed using the restricted estimator \tilde\theta_n rather than \hat\theta_n.

The LM statistic is a quadratic form in the vector of derivatives of the criterion function evaluated at the restricted estimator \tilde\theta_n. The quadratic form has a pd weight matrix. If the null hypothesis is true, then the unrestricted estimator should be close to satisfying the restrictions. In this case, the restricted and unrestricted estimators, \tilde\theta_n and \hat\theta_n, should be close. In turn, this implies that the derivatives of the criterion function at \tilde\theta_n and \hat\theta_n should be close. The latter, \frac{\partial}{\partial\theta}\hat Q_n(\hat\theta_n), equals zero by the first-order conditions for unrestricted minimization of \hat Q_n(\theta) over \Theta. (More precisely, \frac{\partial}{\partial\theta}\hat Q_n(\hat\theta_n) is only close to zero, viz., o_p(n^{-1/2}), by Assumption EE2, since \hat\theta_n is not required to exactly minimize \hat Q_n(\theta), just approximately minimize it.) Then, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) should be close to zero under H_0 and the quadratic form, LM_n, in \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) also should be close to zero. On the other hand, if H_0 is false, then \tilde\theta_n and \hat\theta_n should not be close, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) should not be close to zero, and the quadratic form, LM_n, in \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) also should not be close to zero. In consequence, the LM test rejects H_0 when the LM_n statistic is sufficiently large. Asymptotic critical values for LM_n are determined by Theorem 6.1 below.

Note that LM_n can be computed without computing \hat\theta_n. This can have computational advantages if the criterion function is simpler when the restrictions h(\theta) = 0 are imposed than when unrestricted. For example, if the unrestricted model is a nonlinear regression model and the restricted model is a linear model, then the LS estimator has a closed-form solution under the restrictions, but not otherwise.

Next, we consider the QLR statistic.
It has the proper asymptotic null distribution only when the following assumption holds:

Assumption QLR: (i) \Omega_0 = c B_0 for some scalar constant c \neq 0.
(ii) \hat c_n \to_p c for some sequence of non-zero random variables \{\hat c_n : n \geq 1\}.

For example, in the ML example with correctly specified model, the information matrix equality yields \Omega_0 = B_0, so Assumption QLR holds with c = \hat c_n = 1. In the LS example with conditionally homoskedastic errors (i.e., E(U_i|X_i) = 0 a.s. and E(U_i^2|X_i) = \sigma^2 a.s.), Assumption QLR holds with c = \sigma^2 and

\hat c_n = \frac{1}{n}\sum_{i=1}^n (Y_i - g(X_i, \hat\theta_n))^2. \qquad (6)


In the GMM, MD, and TS examples with optimal weight matrix (e.g., A_0'A_0 = V_0^{-1} for the GMM and MD estimators, see Lecture 5), Assumption QLR holds with c = \hat c_n = 1, since \Omega_0 = B_0 = \Gamma_0' V_0^{-1} \Gamma_0. The QLR statistic is defined by

QLR_n = 2n(\hat Q_n(\tilde\theta_n) - \hat Q_n(\hat\theta_n))/\hat c_n. \qquad (7)

When H_0 is true, the restricted and unrestricted estimators, \tilde\theta_n and \hat\theta_n, should be close and, hence, so should the values of the criterion function evaluated at these parameter values. Thus, under H_0, the statistic QLR_n should be close to zero. Under H_1, QLR_n should be noticeably larger than zero, since the minimization of \hat Q_n(\theta) over the restricted parameter space, which does not include the minimizer \theta_0 of Q(\theta), should leave \hat Q_n(\tilde\theta_n) noticeably larger than \hat Q_n(\hat\theta_n). In consequence, the QLR test rejects H_0 when QLR_n is sufficiently large. Asymptotic critical values are provided by Theorem 6.1 below. For the ML example, the QLR statistic equals the standard likelihood ratio statistic, viz., minus two times the logarithm of the likelihood ratio.
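To make the ML case concrete, here is a minimal sketch (mine, not from the notes) for the N(\mu, 1) model with H_0: \mu = 0. Here \hat c_n = 1 by the information matrix equality, and QLR_n reduces to the textbook likelihood ratio statistic n\bar X^2:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(200)            # data generated under H0: mu = 0
n = x.size

# ML criterion for N(mu, 1): average negative log-likelihood
# (additive constants dropped; they cancel in the QLR difference)
def Q(mu):
    return 0.5 * np.mean((x - mu) ** 2)

mu_hat = x.mean()                       # unrestricted MLE
mu_tilde = 0.0                          # restricted estimator under H0
qlr = 2 * n * (Q(mu_tilde) - Q(mu_hat)) # c_hat = 1 here (Assumption QLR)

print(qlr, n * mu_hat ** 2)             # the two agree: QLR_n = n * xbar^2
```

That the two printed numbers coincide follows from completing the square in Q(\mu); in richer models no such closed form exists and both minimizations are done numerically.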

When Assumption QLR holds, the LM_n statistic simplifies to

LM_n = n\, \frac{\partial}{\partial\theta'}\hat Q_n(\tilde\theta_n)\, \tilde B_n^{-1}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n)/\hat c_n \quad or \quad LM_n = n\, \frac{\partial}{\partial\theta'}\hat Q_n(\tilde\theta_n)\, \tilde\Omega_n^{-1}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) \qquad (8)

(since in this case one can take \tilde\Omega_n = \hat c_n \tilde B_n and \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) = \tilde H_n' \tilde\lambda_n for some vector \tilde\lambda_n of Lagrange multipliers).

For the LM and QLR tests, we use the following assumption.

Assumption REE: \tilde\theta_n \to_p \theta_0 under H_0.

This assumption can be verified using the results of Lecture 3 with the parameter space \Theta replaced by the restricted parameter space \tilde\Theta = \{\theta \in \Theta : h(\theta) = 0\}.

For the Wald and LM tests, we assume that consistent estimators under H_0 of B_0 and \Omega_0 are used in constructing the test statistics:

Assumption COV: For the Wald statistic, \hat B_n \to_p B_0 and \hat\Omega_n \to_p \Omega_0 under H_0. For the LM statistic, \tilde B_n \to_p B_0 and \tilde\Omega_n \to_p \Omega_0 under H_0.

Estimators that satisfy this assumption are discussed in Lecture 5.


Theorem 6.1: (a) Under Assumptions EE2, CF, R, and COV,

W_n \to_d \chi_r^2 \quad under H_0,

where \chi_r^2 denotes a chi-square distribution with r degrees of freedom.
(b) Under Assumptions EE2, CF, R, COV, and REE,

LM_n \to_d \chi_r^2 \quad under H_0.

(c) Under Assumptions EE2, CF, R, COV, REE, and QLR,

QLR_n \to_d \chi_r^2 \quad under H_0.

One rejects H_0 using the Wald test if W_n > k_{r,1-\alpha}, where k_{r,1-\alpha} is the 1-\alpha quantile of the \chi_r^2 distribution. The LM and QLR tests are defined analogously. These tests have significance level approximately equal to \alpha for large n.
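As an illustration of the rejection rule (my own sketch, with a made-up value of W_n): for r = 2 restrictions the \chi_2^2 distribution has the closed-form 1-\alpha quantile -2\log\alpha, so no statistical library is needed; for general r one would use a chi-square quantile routine instead.

```python
import math

# For r = 2, the chi-square CDF is 1 - exp(-x/2), so the 1 - alpha quantile
# is available in closed form as -2 * log(alpha).
alpha = 0.05
crit = -2.0 * math.log(alpha)   # about 5.991, the 95% quantile of chi^2_2
W_n = 7.3                       # hypothetical value of the Wald statistic
print(crit, W_n > crit)         # W_n exceeds the critical value: reject H0
```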

Under sequences of local alternatives to H_0: h(\theta_0) = 0, the Wald, LM, and QLR test statistics have noncentral chi-squared distributions with the same noncentrality parameter; see the next lecture. Thus, the three tests, Wald, LM, and QLR, have the same large-sample power functions. One cannot choose between these tests based on (first-order) large-sample power. The choice can be made on computational grounds. It also can be made based on the closeness of the true finite-sample size of a test to its nominal asymptotic size \alpha. The best test according to the latter criterion depends on the particular testing context. The folklore (backed up by various simulation studies), however, is that the Wald test over-rejects in many contexts, and the LM and QLR tests often are preferable in consequence.

Proof of Theorem 6.1: First we establish part (a). Element-by-element mean-value expansions of h(\hat\theta_n) about \theta_0 give

\sqrt{n}\, h(\hat\theta_n) = \sqrt{n}\, h(\theta_0) + \frac{\partial}{\partial\theta'} h(\theta_n^*)\, \sqrt{n}(\hat\theta_n - \theta_0)
= H\sqrt{n}(\hat\theta_n - \theta_0) + o_p(1)
\to_d Z_0 \sim N(0, H B_0^{-1} \Omega_0 B_0^{-1} H'), \qquad (9)

where \theta_n^* lies between \hat\theta_n and \theta_0 (and, hence, satisfies \theta_n^* \to_p \theta_0 by Assumption EE2(i)), the second equality holds because h(\theta_0) = 0 under H_0 and \frac{\partial}{\partial\theta'} h(\theta_n^*) \to_p \frac{\partial}{\partial\theta'} h(\theta_0) = H using Assumption R, and the convergence in distribution uses Theorem 5.1.


By Assumptions COV, EE2, and R,

(\hat H_n \hat B_n^{-1} \hat\Omega_n \hat B_n^{-1} \hat H_n')^{-1} \to_p (H B_0^{-1} \Omega_0 B_0^{-1} H')^{-1}. \qquad (10)

Combining (9) and (10) gives

W_n = \sqrt{n}\, h(\hat\theta_n)' (\hat H_n \hat B_n^{-1} \hat\Omega_n \hat B_n^{-1} \hat H_n')^{-1} \sqrt{n}\, h(\hat\theta_n) \to_d Z_0' (H B_0^{-1} \Omega_0 B_0^{-1} H')^{-1} Z_0, \qquad (11)

where Z = (H B_0^{-1} \Omega_0 B_0^{-1} H')^{-1/2} Z_0 \sim N(0, I_r) and Z_0'(H B_0^{-1} \Omega_0 B_0^{-1} H')^{-1} Z_0 = Z'Z \sim \chi_r^2.

Next, we establish part (b). Element-by-element mean-value expansions of \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) and h(\tilde\theta_n) about \theta_0 yield

\sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) = \sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\theta_0) + \frac{\partial^2}{\partial\theta\partial\theta'}\hat Q_n(\theta_n^*)\, \sqrt{n}(\tilde\theta_n - \theta_0) \quad and
0 = \sqrt{n}\, h(\tilde\theta_n) = \sqrt{n}\, h(\theta_0) + \frac{\partial}{\partial\theta'}h(\theta_n^+)\, \sqrt{n}(\tilde\theta_n - \theta_0) = \frac{\partial}{\partial\theta'}h(\theta_n^+)\, \sqrt{n}(\tilde\theta_n - \theta_0), \qquad (12)

where \theta_n^* \to_p \theta_0 and \theta_n^+ \to_p \theta_0. Let B_n(\theta) = \frac{\partial^2}{\partial\theta\partial\theta'}\hat Q_n(\theta) and H(\theta) = \frac{\partial}{\partial\theta'}h(\theta). Then, premultiplication of the first equation of (12) by H(\theta_n^+) B_n(\theta_n^*)^{-1} gives

H(\theta_n^+) B_n(\theta_n^*)^{-1} \sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n)
= H(\theta_n^+) B_n(\theta_n^*)^{-1} \sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\theta_0) + H(\theta_n^+)\sqrt{n}(\tilde\theta_n - \theta_0)
= H(\theta_n^+) B_n(\theta_n^*)^{-1} \sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\theta_0) \qquad (13)
\to_d Z_0 \sim N(0, H B_0^{-1} \Omega_0 B_0^{-1} H')

using the second equation of (12) and Assumptions CF(iii), CF(iv), REE, and R.

Combining (13), the fact that \sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) = O_p(1) by (17) below, H(\theta_n^+) \to_p H, \tilde H_n \to_p H, B_n(\theta_n^*) \to_p B_0, and \tilde B_n \to_p B_0 gives

\tilde H_n \tilde B_n^{-1} \sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) \to_d Z_0 \sim N(0, H B_0^{-1} \Omega_0 B_0^{-1} H'). \qquad (14)

By Assumptions COV, REE, and R,

(\tilde H_n \tilde B_n^{-1} \tilde\Omega_n \tilde B_n^{-1} \tilde H_n')^{-1} \to_p (H B_0^{-1} \Omega_0 B_0^{-1} H')^{-1}. \qquad (15)

Combining (14) and (15) gives the desired result

LM_n \to_d Z_0' (H B_0^{-1} \Omega_0 B_0^{-1} H')^{-1} Z_0 = Z'Z \sim \chi_r^2 \qquad (16)


under H_0, where Z is as above. We now show that

\sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) = O_p(1). \qquad (17)

With probability that goes to one as n \to \infty, \tilde\theta_n is in the interior of \tilde\Theta and there exists a random vector \tilde\lambda_n of Lagrange multipliers such that

\frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) + H(\tilde\theta_n)'\tilde\lambda_n = o_p(n^{-1/2}). \qquad (18)

Equations (13) and (18) combine to give

H(\theta_n^+) B_n(\theta_n^*)^{-1} H(\tilde\theta_n)'\, \sqrt{n}\,\tilde\lambda_n = O_p(1). \qquad (19)

Since H(\theta_n^+) B_n(\theta_n^*)^{-1} H(\tilde\theta_n)' \to_p H B_0^{-1} H' and H B_0^{-1} H' is nonsingular, (19) implies \sqrt{n}\,\tilde\lambda_n = O_p(1). This and (18) imply (17).

Next, we establish part (c). A two-term Taylor expansion of \hat Q_n(\tilde\theta_n) about \hat\theta_n gives

QLR_n = 2n(\hat Q_n(\tilde\theta_n) - \hat Q_n(\hat\theta_n))/\hat c_n
= 2n\, \frac{\partial}{\partial\theta'}\hat Q_n(\hat\theta_n)(\tilde\theta_n - \hat\theta_n)/\hat c_n + n(\tilde\theta_n - \hat\theta_n)'\, \frac{\partial^2}{\partial\theta\partial\theta'}\hat Q_n(\theta_n^*)(\tilde\theta_n - \hat\theta_n)/\hat c_n \qquad (20)
= o_p(1) + n(\tilde\theta_n - \hat\theta_n)'\, \frac{\partial^2}{\partial\theta\partial\theta'}\hat Q_n(\theta_n^*)(\tilde\theta_n - \hat\theta_n)/\hat c_n,

where \theta_n^* lies between \tilde\theta_n and \hat\theta_n (and, hence, satisfies \theta_n^* \to_p \theta_0) and the third equality holds because n^{1/2}\, \frac{\partial}{\partial\theta}\hat Q_n(\hat\theta_n) = o_p(1) by EE2(ii) and n^{1/2}(\tilde\theta_n - \hat\theta_n) = O_p(1) by (22) below.

Element-by-element mean-value expansions of \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) about \hat\theta_n give

\sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) = \sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\hat\theta_n) + \frac{\partial^2}{\partial\theta\partial\theta'}\hat Q_n(\theta_n^+)\, \sqrt{n}(\tilde\theta_n - \hat\theta_n)
= o_p(1) + (B_0 + o_p(1))\sqrt{n}(\tilde\theta_n - \hat\theta_n), \qquad (21)

where \theta_n^+ \to_p \theta_0. Since B_0 is nonsingular, (21) implies

\sqrt{n}(\tilde\theta_n - \hat\theta_n) = B_0^{-1}\sqrt{n}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n) + o_p(1). \qquad (22)

Substitution of (22) into (20) gives

QLR_n = o_p(1) + n\, \frac{\partial}{\partial\theta'}\hat Q_n(\tilde\theta_n)\, B_0^{-1} B_n(\theta_n^*) B_0^{-1}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n)/\hat c_n
= o_p(1) + n\, \frac{\partial}{\partial\theta'}\hat Q_n(\tilde\theta_n)\, \tilde B_n^{-1}\, \frac{\partial}{\partial\theta}\hat Q_n(\tilde\theta_n)/\hat c_n \qquad (23)
= o_p(1) + LM_n,


where the second equality holds by (17) and the fact that \tilde B_n^{-1} and B_0^{-1} B_n(\theta_n^*) B_0^{-1} both equal B_0^{-1} + o_p(1), and the third equality holds by the simplified expression for LM_n given in (8). Part (c) now holds by (23) and part (b). \square

3 One-Sided Hypotheses

Sometimes, one-sided hypotheses are of interest:

H_0: h(\theta_0) \leq 0 \quad vs. \quad H_1: h(\theta_0) > 0. \qquad (24)

We only consider one-dimensional hypotheses, i.e., those with real-valued h(\cdot). Testing multi-dimensional inequality constraints is much trickier, as explained in Wolak (1991).

We use the same assumptions as above. We consider the Wald statistic only. Deriving the asymptotic properties of the LM and QLR statistics requires deriving those for inequality-constrained extremum estimators, which is not covered by the previous lectures. The Wald statistic for testing

H_0 is defined to be

W_n = \frac{n\, [h(\hat\theta_n)]_+^2}{\hat H_n \hat B_n^{-1} \hat\Omega_n \hat B_n^{-1} \hat H_n'}, \quad where \quad [a]_+ = \max\{0, a\}. \qquad (25)

As defined, the Wald statistic is a quadratic form in the positive part of the (unrestricted) estimator h(\hat\theta_n) of the restriction h(\theta_0). The quadratic form has a positive definite (pd) weight matrix. If

H_0 is true, then [h(\hat\theta_n)]_+ should be close to zero and the quadratic form W_n in [h(\hat\theta_n)]_+ also should be close to zero. On the other hand, if H_0 is false, [h(\hat\theta_n)]_+ should be close to h(\theta_0) > 0 and the quadratic form W_n in [h(\hat\theta_n)]_+ should be different from zero. Thus, the Wald test rejects H_0 when its value is sufficiently large. Large-sample critical values are provided by Theorem 6.2 below.

Under the one-sided null hypothesis, there are two cases: h(\theta_0) < 0 and h(\theta_0) = 0. The asymptotic distributions of the Wald statistic are different in the two cases. Theorem 6.2 summarizes the results. Define the mixed chi-squared distribution

\chi_+^2 = [N(0, 1)]_+^2. \qquad (26)

Let "wpa1" denote "with probability approaching one".
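Because \chi_+^2 places probability 1/2 at zero, its critical values are smaller than those of \chi_1^2: P([N(0,1)]_+^2 > c) = P(N(0,1) > \sqrt{c}) for c > 0, so the 1-\alpha quantile is simply the squared normal quantile z_{1-\alpha}^2. A short sketch of this (my own illustration):

```python
from statistics import NormalDist

# chi^2_+ = [N(0,1)]_+^2 has P(chi^2_+ > c) = P(N > sqrt(c)) for c > 0, so for
# alpha < 1/2 the level-alpha critical value is z_{1-alpha}^2.
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha)
crit = z ** 2
print(crit)   # about 2.706, versus 3.841 for the two-sided chi^2_1 critical value
```

The one-sided test thus rejects more easily in the relevant direction than a naive two-sided \chi_1^2 test at the same nominal level.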

Theorem 6.2: (a) Under Assumptions EE2, CF, R, and COV,

W_n = 0 \text{ wpa1} \quad if h(\theta_0) < 0; \qquad W_n \to_d \chi_+^2 \quad if h(\theta_0) = 0.

Proof of Theorem 6.2: When h(\theta_0) < 0, h(\hat\theta_n) < 0 wpa1 because h(\hat\theta_n) \to_p h(\theta_0) by Assumptions R(i) and EE2(i). Therefore, W_n = 0 wpa1. When h(\theta_0) = 0, the proof is the same as that for Theorem 6.1(a) except for (11). The one-sided counterpart of (11) is:

W_n = \left[\frac{\sqrt{n}\, h(\hat\theta_n)}{(\hat H_n \hat B_n^{-1} \hat\Omega_n \hat B_n^{-1} \hat H_n')^{1/2}}\right]_+^2 \to_d \left[\frac{Z_0}{(H B_0^{-1} \Omega_0 B_0^{-1} H')^{1/2}}\right]_+^2 = [Z]_+^2 \sim \chi_+^2. \qquad (27) \quad \square
