<<

Journal of Econometrics 149 (2009) 52–64

Bootstrap validity for the score test when instruments may be weak

Marcelo J. Moreira a,b,∗, Jack R. Porter c, Gustavo A. Suarez d

a Columbia University, United States
b FGV/EPGE, Brazil
c University of Wisconsin, United States
d Federal Reserve Board, United States

Article history: Available online 5 November 2008

JEL classification: C12; C31

Keywords: Bootstrap; t-statistic; Score statistic; Identification; Non-regular case; Edgeworth expansion; Instrumental variable regression

Abstract: It is well-known that size adjustments based on bootstrapping the t-statistic perform poorly when instruments are weakly correlated with the endogenous explanatory variable. In this paper, we provide a theoretical proof that guarantees the validity of the bootstrap for the score statistic. This theory does not follow from standard results, since the score statistic is not a smooth function of sample means and some parameters are not consistently estimable when the instruments are uncorrelated with the explanatory variable.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Inference in the linear simultaneous equations model with weak instruments has recently received considerable attention in the econometrics literature. It is now well understood that standard first-order asymptotic theory breaks down when the instruments are weakly correlated with the endogenous regressor; cf., Bound et al. (1995), Dufour (1997), Nelson and Startz (1990), Staiger and Stock (1997), and Wang and Zivot (1998). It is then natural to apply the bootstrap to decrease size distortions of the Wald statistic (also known as the t-statistic), since the bootstrap is valid under some regularity conditions. However, these conditions, which rely on the statistics being smooth functions of sample moments and the parameters being consistently estimable, break down for the Wald statistic in the weak-instrument case. In fact, the bootstrap does not seem to perform well in decreasing the size distortions of the Wald statistic; cf., Horowitz (2001).

In this paper, we show that it is valid to bootstrap the score statistic even in the weak-instrument case. Although the score is well-behaved with weak instruments, showing the validity of the bootstrap in the unidentified case has several potential pitfalls. First, the bootstrap replaces parameters with inconsistent estimators. Hence, the empirical distribution function of the residuals may differ substantially from their true cumulative distribution function, which runs counter to the usual argument for bootstrap success. Second, the score statistic is not a smooth function of sample means. In many known non-regular cases1 the usual bootstrap method fails, even in the first order. Familiar cases from the statistics and econometrics literature include estimation on the boundary of the parameter space (Shao, 1994; Andrews, 2000) and estimating a non-differentiable function of the population mean.

Commonly used fixes for bootstrap failure due to nonregularity are to use the m out of n bootstrap or subsampling. However, these methods have two limitations. First, in practice they give quite different results for different choices of the bootstrap sample (or subsample) size m. Second, they do not provide asymptotic refinements in the regular case. For instance, in the non-differentiable example above, the function may be differentiable at some values of the population mean and non-differentiable at other values. Then, at the differentiable values, the statistic is typically regular and the usual bootstrap is not only valid but also provides second-order improvements.

∗ Corresponding author. Tel.: +1 212 854 3680; fax: +1 212 854 8059. E-mail address: [email protected] (M.J. Moreira).

1 A statistic is said to be regular if, when written as a function of sample moments, the first derivative of this function evaluated at the population mean exists and is different from zero.


Hence, there is a trade-off between robustness (m out of n bootstrap or subsampling) and refinements (the usual bootstrap). Lastly, subsampling does not provide a general method of controlling size uniformly in cases where the bootstrap fails (Andrews and Guggenberger, forthcoming).

In this paper, we find that weak instruments are not, in general, the cause of bootstrap failure. Although parameters are not consistently estimable when instruments are weak and the score statistic is not differentiable, we show that the re-centered residual bootstrap for the score is valid regardless of instrument strength. In light of the recent negative results on the bootstrap, it is notable that the bootstrap can still work in some non-regular cases. Still, we additionally find that the higher-order improvements provided by the bootstrapped score statistic when instruments are strong do not extend to the case of weak instruments.

The remainder of this paper is organized as follows. In Section 2, we present the model and establish some notation. In Section 3, we summarize some folk theorems showing the size improvements based on the bootstrap for the Wald and score tests under standard asymptotics. In Section 4, we present the main results. We establish the validity of the bootstrap for the score statistic, and show that the bootstrap will not in general provide second-order improvements in the unidentified case. In Section 5, we present Monte Carlo simulations that suggest that the bootstrap methods may lead to improvements, although in general they do not lead to higher-order adjustments in the weak-instrument case. Section 6 concludes. In Appendix A, we provide all proofs pertaining to the score statistic. In Appendix B, we provide some additional useful results and extensions.

2. The model

The structural equation of interest is
y1 = y2 β + u,  (1)
where y1 and y2 are n × 1 vectors of observations on two endogenous variables, u is an n × 1 unobserved disturbance vector, and β is an unknown scalar parameter. This equation is assumed to be part of a larger linear simultaneous equations model, which implies that y2 is correlated with u. The complete system contains exogenous variables that can be used as instruments for conducting inference on β. Specifically, it is assumed that the reduced form for Y = [y1, y2] can be written as
y1 = Zπβ + v1,
y2 = Zπ + v2,  (2)
where Z is an n × k matrix of exogenous variables having full column rank k with probability one (w.p.1) and π is a k × 1 vector. The n rows of Z are i.i.d., and F is the distribution of each row of Z and V = [v1, v2]. Unless otherwise stated, we consider the case where Z is independent of V. The n rows of the n × 2 matrix of reduced-form errors V are i.i.d. with mean zero and 2 × 2 nonsingular covariance matrix Ω = [ωi,j]. In what follows, Xi is the i-th observation of some random vector X. For instance, Zi denotes the column vector containing the ith row of the matrix Z. The sample mean of the first n observations of X is X̄n. The subscript n is typically omitted in what follows, unless it clarifies exposition.

Tests for the null hypothesis H0 : β = β0 play an important role in our results. Let NA = A(A′A)^(−1)A′ and MA = I − NA for any conformable matrix A, and let b0 = (1, −β0)′ and a0 = (β0, 1)′. The commonly used (two-sided) Wald test rejects H0 for large values (of the square) of the Wald statistic:
W = (β̂2SLS − β0) √(y2′NZ y2) / σ̂u,
where β̂2SLS = (y2′NZ y2)^(−1) y2′NZ y1 and σ̂u² = [1, −β̂2SLS] Ω [1, −β̂2SLS]′. It is now well understood that the Wald statistic has important size distortions when the instruments may be weak. In particular, under the weak-instrument asymptotics of Staiger and Stock (1997), the limiting distribution of the Wald statistic is not standard normal. An alternative statistic is the score (LM) used by Kleibergen (2002) and Moreira (2002):
LM = S′T / √(T′T),  (3)
where S = (Z′Z)^(−1/2) Z′Y b0 · (b0′Ωb0)^(−1/2) and T = (Z′Z)^(−1/2) Z′Y Ω^(−1)a0 · (a0′Ω^(−1)a0)^(−1/2). The (two-sided) score test rejects the null if the LM² statistic is larger than the 1 − α quantile of the chi-square-one distribution. This test is similar if the errors are normal with known variance Ω, since the LM statistic is pivotal. With unknown error distribution, the score test is no longer similar. However, unlike the Wald test, the score test is asymptotically similar under both weak-instrument and standard asymptotics.

In practice, the covariance matrix Ω is typically unknown, so we replace it with the consistent estimator Ω̃ = Y′MZ Y/n:
S̃ = (Z′Z)^(−1/2) Z′Y b0 · (b0′Ω̃b0)^(−1/2),
T̃ = (Z′Z)^(−1/2) Z′Y Ω̃^(−1)a0 · (a0′Ω̃^(−1)a0)^(−1/2),
LM̃ = S̃′T̃ / √(T̃′T̃).
For the Wald statistic, replace σ̂u² by σ̃u² = [1, −β̂2SLS] Ω̃ [1, −β̂2SLS]′ to obtain W̃. Below we present results for W̃ and LM̃, although analogous results for the known covariance case are also similarly available.
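To fix ideas, the following is a minimal simulation sketch of the model and test statistics just defined. It is not from the paper: the function names, the normal-error design, and the parametrization of π through the population first-stage F-statistic λ = π′(nIk)π/k are illustrative assumptions.

```python
import numpy as np

def simulate_data(n=100, k=4, beta=0.0, lam=1.0, rho=0.5, seed=0):
    """Draw one sample from the reduced form (2) with normal errors:
    y1 = Z pi beta + v1, y2 = Z pi + v2, corr(v1, v2) = rho."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, k))
    # Each element of pi equals sqrt(lam/n), so pi'(n I_k) pi / k = lam (population first-stage F).
    pi = np.full(k, np.sqrt(lam / n)) if lam > 0 else np.zeros(k)
    V = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    y2 = Z @ pi + V[:, 1]
    y1 = Z @ pi * beta + V[:, 0]
    return y1, y2, Z

def lm_and_wald(y1, y2, Z, beta0=0.0):
    """Score statistic LM~ (eq. (3) with Omega replaced by Omega~ = Y'M_Z Y / n)
    and the 2SLS-based Wald statistic W~."""
    n = len(y1)
    Y = np.column_stack([y1, y2])
    b0 = np.array([1.0, -beta0])
    a0 = np.array([beta0, 1.0])
    NZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)       # projection N_Z
    Omega_t = Y.T @ (Y - NZ @ Y) / n             # Omega~ = Y'M_Z Y / n
    # Any square root of (Z'Z)^{-1} can be used: LM~ is invariant to that choice.
    L_inv = np.linalg.inv(np.linalg.cholesky(Z.T @ Z))
    S = L_inv @ Z.T @ Y @ b0 / np.sqrt(b0 @ Omega_t @ b0)
    Oinv_a = np.linalg.solve(Omega_t, a0)
    T = L_inv @ Z.T @ Y @ Oinv_a / np.sqrt(a0 @ Oinv_a)
    LM = S @ T / np.sqrt(T @ T)
    beta_2sls = (y2 @ NZ @ y1) / (y2 @ NZ @ y2)
    b_hat = np.array([1.0, -beta_2sls])
    sigma_u = np.sqrt(b_hat @ Omega_t @ b_hat)   # sigma~_u from Omega~
    W = (beta_2sls - beta0) * np.sqrt(y2 @ NZ @ y2) / sigma_u
    return LM, W
```

The two-sided score test then rejects when LM̃² exceeds the 1 − α quantile of the chi-square-one distribution (3.84 at the 5% level).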

3. Preliminary results

In this section, we summarize some folk theorems for the strong-instrument case. Some of the results are already known, and those that are new follow from standard results. The results in this section provide a foundation for the weak-instrument results to be presented in Section 4.

For any symmetric matrix A, let vech(A) denote the column vector containing the column-by-column vectorization of the non-redundant elements of A. The test statistics given in the previous section can be written as functions of
R̄n = vech[(Y, Z)′(Y, Z)/n] = (f1(Y, Z), . . . , fℓ(Y, Z))′,
the sample mean of Rn = vech[(Yn′, Zn′)′(Yn′, Zn′)], where fi, i = 1, . . . , ℓ, with ℓ = (k + 2)(k + 3)/2, are the elements of the matrix (Y, Z)′(Y, Z)/n. Both the W̃ and LM̃ statistics can be written in the form
√n [H(R̄n) − H(µ)],  (4)
where µ = E(R̄n).

Let ‖·‖ be the Euclidean norm and ‖·‖∞ the supremum norm. Hereinafter, we use the following high-level assumptions:

Assumption 1. π is fixed and different from zero.

Assumption 2. E‖Rn‖^s < ∞ for some s ≥ 3.

Assumption 3. lim sup_{‖t‖→∞} |E exp(it′Rn)| < 1.

Assumption 1 is key to the standard strong-instrument asymptotics. Under Assumption 1, the gradient of H evaluated at µ differs from zero. Assumption 2 holds if E‖(Yn′, Zn′)‖^(2s) < ∞. This minimum moment assumption seems too strong at first glance, but note that the test statistics involve quadratic functions of (Yn′, Zn′). Assumption 3 is the commonly used Cramér condition. The following result by Bhattacharya (1977) provides a sufficient condition for Assumption 3.

Lemma 1 (Bhattacharya, 1977). Let (Yn′, Zn′)′ be a random vector with values in R^(k+2) whose distribution has a nonzero absolutely continuous component G (relative to the Lebesgue measure on R^(k+2)). Assume that there exists an open ball B in R^(k+2) in which the density of G is positive almost everywhere. If, in B, the functions 1, f1, . . . , fℓ are linearly independent, then Assumption 3 holds.

In the identified case in which π is fixed and different from zero, not only are the 2SLS and LIML estimators consistent for β, but also both Wald and score statistics admit second-order Edgeworth expansions under mild conditions. As a simple application of Theorem 2 of Bhattacharya and Ghosh (1978), we obtain the following result:

Theorem 2. Under Assumptions 1–3, the null distributions of W̃ and LM̃ can be uniformly approximated (in x) by Edgeworth expansions:
(a) ‖P(LM̃ ≤ x) − [Φ(x) + Σ_{i=1}^{s−2} n^(−i/2) p_LM^i(x; F, β0, π) φ(x)]‖∞ = o(n^(−(s−2)/2)),
(b) ‖P(W̃ ≤ x) − [Φ(x) + Σ_{i=1}^{s−2} n^(−i/2) p_W^i(x; F, β0, π) φ(x)]‖∞ = o(n^(−(s−2)/2)),
where p_W^i and p_LM^i, i = 1, 2, are polynomials in x with coefficients depending on moments of Rn, β0, and π.

We now turn to the bootstrap. For each bootstrap sample, a test statistic is computed, which in turn generates a simulated empirical distribution for the Wald or score statistic. This distribution can then be used to provide new critical values for the test. The bootstrap sample is generated based on an estimate of β, and likewise the null hypothesized value of β is replaced by that estimate in forming the bootstrap test statistics. Given consistent estimates β̂ and π̂, the residuals from the reduced-form equations are obtained as
v̂1 = y1 − Zπ̂β̂,
v̂2 = y2 − Zπ̂.
These residuals are re-centered to yield (ṽ1, ṽ2). Then Z∗ and (v1∗, v2∗) are drawn independently from the empirical distribution function of Z and (ṽ1, ṽ2). Next, we set
y1∗ = Z∗π̂β̂ + v1∗,
y2∗ = Z∗π̂ + v2∗.
We want to stress here that the simulation method above is exactly equivalent to simulating directly from the structural model
y1∗ = y2∗β̂ + u∗,
y2∗ = Z∗π̂ + v2∗,
where Z∗ and (u∗, v2∗) are drawn independently from the empirical distribution function of Z and (ũ, ṽ2), where ũ = ṽ1 − ṽ2β̂. Also, the probability under the empirical distribution function (conditional on the sample) will be denoted P∗ in what follows. Finally, the fact that Z∗ is randomly drawn reflects our interest in the correlated case. We do not consider the fixed Z case here, although this can be done by establishing conditions similar to those by Navidi (1989) and Qumsiyeh (1990, 1994) in the simple regression model. Of course, this entails different Edgeworth expansions and bootstrap methods.

The following result shows that the bootstrap approximates the empirical Edgeworth expansion up to the o(n^(−(s−2)/2)) order.

Theorem 3. Under Assumptions 1–3,
(a) ‖P∗(LM̃∗ ≤ x) − [Φ(x) + Σ_{i=1}^{s−2} n^(−i/2) p_LM^i(x; Fn, β̂, π̂) φ(x)]‖∞ = o(n^(−(s−2)/2)),
(b) ‖P∗(W̃∗ ≤ x) − [Φ(x) + Σ_{i=1}^{s−2} n^(−i/2) p_W^i(x; Fn, β̂, π̂) φ(x)]‖∞ = o(n^(−(s−2)/2)), a.s. as n → ∞.

The error based on the bootstrap simulation is of order n^(−1/2) due to the fact that the conditional moments of Rn∗ converge almost surely to those of Rn, and that β̂ and π̂ converge almost surely to β and π. Consequently, Theorem 3 shows that the bootstrap offers a better approximation than the standard normal approximation.

4. Main results

In the previous section, we considered the strong-instrument case in which the structural parameter β is identified. Our results there are threefold: the null distribution of the Wald and score statistics can be approximated by an Edgeworth expansion up to the n^(−(s−2)/2) order, for some integer s; the bootstrap estimate and the (s−1)-term empirical Edgeworth expansion for both statistics are asymptotically equivalent up to the n^(−(s−2)/2) order; and the error of estimation of the bootstrap is of order n^(−1/2) for one-sided versions and of order n^(−1) for two-sided versions of the Wald and score tests. However, the three results in Section 3 depend crucially on Assumption 1. In this section, we address the issues above that arise in a weak-instrument setting. Formally, we consider two alternative weak-instrument assumptions that replace Assumption 1.

Assumption 1A (Unidentified Case). π = 0.

Assumption 1B (Locally Unidentified Case). π = c/√n for some non-stochastic k-vector c.

Under Assumption 1A, β0 is replaced by an inconsistent estimator β̂ in the bootstrap estimates, and the score and Wald statistics are non-smooth functions of sample means. However, the standard proofs of bootstrap validity for statistics in the form (4) crucially depend on the assumption that the derivatives of the functions evaluated at µ = E(R̄n) are defined and different from zero (the regular case). From the known examples of bootstrap failure in nonregular cases, it is not clear whether the bootstrap actually provides valid approximations even in the first order. In fact, similar versions of Theorems 2 and 3 have been considered to fix size distortions of the Wald test in the weak-instrument case.

To see the role of the lack of differentiability in the weak-instrument results of this section, recall from Section 3 that, under the null, the score statistic can be written as a function of sample means
LM̃ = √n [H(R̄n) − H(µ)].  (5)
Smoothness in H allows an expansion and yields the strong-instrument results of Section 3. In the unidentified case, this function is non-differentiable.2 Note that E[Vi] = 0, so the expression in (5) can be simplified to
LM̃ = √n H(R̄n) = S̃′T̃ / √(T̃′T̃).  (6)

2 For some j ∈ {1, . . . , k}, let zj denote the jth column of Z (and let πj be the corresponding jth element of π). To simplify the expressions, focus here on the known covariance case LM and consider the derivative of H (for LM) with respect to the argument of H(R̄n) corresponding to y1′zj/n. For π ≠ 0, this derivative takes the form πj{(b0′Ωb0)(π′ΩZZπ)}^(−1/2). It is easy to see that this derivative is not well-defined at π = 0. Or, consider the limit of this derivative as π approaches zero. Let c ≠ 0 and consider a scalar sequence λn ↓ 0. Then the derivative evaluated at π = cλn is cj{(b0′Ωb0)(c′ΩZZc)}^(−1/2) and the derivative at π = −cλn is −cj{(b0′Ωb0)(c′ΩZZc)}^(−1/2). Since these expressions are also equal to directional derivatives at π = 0 yet are not equal to each other, H is non-differentiable at π = 0.
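A sketch of the re-centered residual bootstrap just described, reusing the illustrative helpers from the earlier sketch (simulate_data, lm_and_wald). The use of OLS for π̂ and 2SLS for β̂, and the number of bootstrap replications, are assumptions for illustration; the paper only requires that π̂ and π̂β̂ be strongly consistent.

```python
def bootstrap_lm_critical_value(y1, y2, Z, B=999, alpha=0.05, seed=0):
    """Re-centered residual bootstrap for the two-sided score test:
    returns the bootstrap (1 - alpha) critical value for LM~^2."""
    rng = np.random.default_rng(seed)
    n = len(y1)
    NZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ y2)      # reduced-form OLS estimate of pi
    beta_hat = (y2 @ NZ @ y1) / (y2 @ NZ @ y2)       # 2SLS; inconsistent if pi = 0
    # Reduced-form residuals v^_1 = y1 - Z pi^ beta^, v^_2 = y2 - Z pi^, then re-center them.
    v1_t = y1 - Z @ pi_hat * beta_hat
    v2_t = y2 - Z @ pi_hat
    v1_t = v1_t - v1_t.mean()
    v2_t = v2_t - v2_t.mean()
    lm2_star = np.empty(B)
    for b in range(B):
        rows_z = rng.integers(n, size=n)             # Z* drawn independently of (v1*, v2*)
        rows_v = rng.integers(n, size=n)
        Zs = Z[rows_z]
        y1s = Zs @ pi_hat * beta_hat + v1_t[rows_v]  # bootstrap reduced form
        y2s = Zs @ pi_hat + v2_t[rows_v]
        lm_star, _ = lm_and_wald(y1s, y2s, Zs, beta0=beta_hat)   # null value replaced by beta^
        lm2_star[b] = lm_star ** 2
    return np.quantile(lm2_star, 1.0 - alpha)
```

Theorem 6 below states that, conditional on the data, LM̃∗ is asymptotically standard normal whether the instruments are strong, weak, or entirely irrelevant, which is what makes this critical value first-order valid.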
b b functions evaluated at µ = E (Rn) are defined and different from ∗ zero (regular case). From the known examples of bootstrap failure These residuals are re-centered to yield (ev1,ev2). Then Z and ∗ ∗ in nonregular cases, it is not clear whether the bootstrap actually v1 , v2 are drawn independently from the empirical distribution provides valid approximations even in the first-order. In fact, similar function of Z and (ev1,ev2). Next, we set ∗ ∗ ∗ versions of Theorems 2 and 3 have been considered to fix size y = Z πβ + v 1 bb 1 distortions of the Wald test in the weak-instrument case. ∗ ∗ ∗ y = Z + 2 bπ v2 . To see the role of the lack of differentiability in the weak- We want to stress here that the simulation method above is exactly instrument results of this section, recall from Section 3 that, under equivalent to simulating directly from the structural model the null, the score statistic can be written as a function of sample means ∗ = ∗ + ∗ y1 y2bβ u √ ∗ ∗ ∗ LM = n H R  − H  (5) y = Z + f n (µ) . 2 bπ v2 , Smoothness in H allows an expansion and yields the strong where Z∗ and u∗, v∗ are drawn independently from the empirical 2 instrument results of Section 3. In the unidentified case, this distribution function of Z and (u, v ), where u = v −v β. Also, the 2 e e2 e e1 e2b function is non-differentiable. Note that E [Vi] = 0, so the probability under the empirical distribution function (conditional expression in (5) can be simplified to on the sample) will be denoted P∗ in what follows. Finally, the fact ∗ √ √ that Z is randomly drawn reflects our interest in the correlated   0 0  LMf = nH Rn = S T / T T . (6) case. We do not consider the fixed Z case here, although this can be done by establishing conditions similar to those by Navidi (1989) and Qumsiyeh (1990, 1994) in the simple regression model. Of course, this entails different Edgeworth expansions and bootstrap 2 For some j ∈ {1,..., k}, let zj denote the jth column of Z (and let πj be the methods. corresponding jth element of π). To simplify the expressions, focus here on the The following result shows that the bootstrap approximates the known covariance case LM and consider the derivative of H (for LM) with respect −(s−2)/2 0 empirical Edgeworth expansion up to the o n order. to the argument of H( Rn) corresponding to y1 zj/n. For π 6= 0, this derivative 0 −1/2 takes the form πj{(b0Ωb0)(π ΩZZ π)} . It is easy to see that this derivative is not well-defined at π = 0. Or, consider the limit of this derivative as π approaches Theorem 3. Under Assumptions 1–3, zero. Let c 6= 0 and consider a scalar sequence λn ↓ 0. Then the derivative  ∗  evaluated at π = cλ is c {(b Ωb )(c0Ω c)}−1/2 and the derivative at π = −cλ ∗ Ps−2 −i/2 i n j 0 0 ZZ n (a) P LM ≤ x − [Φ(x) + n p (x; F , β, π)φ(x)] − { 0 }−1/2 f i=1 LM n b b is cj (b0Ωb0)(c ΩZZ c) . Since these expressions are also equal to directional ∞ = = −(s−2)/2 derivatives at π 0 yet are not equal to each other, H is non-differentiable at o n , π = 0. M.J. Moreira et al. / Journal of Econometrics 149 (2009) 52–64 55   Expressions (5) and (6) suggest two possible bootstrap meth-  ∗ (iii) lim inf inf Qn Π − Qn (Π) ≥ 0 a.s. n→∞ ∈Bc ods. The first one would use Rn and Rn, the sample means and boot- Π strap sample means based on residuals (without re-centering). The a.s. a.s. Then π → π and πβ → πβ.3 bootstrap statistic would be b bb The assumptions of Proposition 4 are typical high-level con- ∗ √   ∗  = −  ditions, as in Andrews (1987), Newey and McFadden (1994), LMf n H Rn H Rn . 
(7) and Potscher and Prucha (1997). These assumptions are standard  for showing that the unrestricted M-estimator of Π is strongly con- Because H Rn is not necessarily zero here, we would need to expand (7). Since H (·) is not smooth, this bootstrap would be sistent. The conclusion of the proposition goes a step further. We find that the conditions for strong consistency of the unrestricted problematic. The second, more commonly-used bootstrap method ∗ estimator of Π are also sufficient for strong consistency of the M- would be based on eRn and eRn, respectively, the sample means estimator that imposes the reduced-form restrictions, Π(θ). This and bootstrap sample means using re-centered residuals. Then consistency is obtained despite the fact that θ itself is unidentified the bootstrapped score statistic, as defined in Section 4.1, can be under weak instruments. written as In Appendix B, we verify that the regularity conditions of Proposition 4 hold in maximum likelihood estimation. Given these ∗ √   ∗   = − findings, we assume that π and πβ are strongly consistent in the LMf n H eRn H eRn (8) b bb bootstrap validity results of the next subsection. √  ∗  √  = = ∗0 ∗ ∗0 ∗ nH eRn S T / T T , (9) 4.1.2. First-order validity   The derivation of bootstrap validity for the score statistic under where H R = 0, due to the re-centered residuals. Eq. (9) pro- en weak instruments is divided into two steps. The score statistic is vides intuition for the first-order validity of the bootstrap based a function of the statistics eS and eT , so we first obtain the limiting ∗ ∗ on re-centered residuals, shown formally below in Section 4.1. distribution for eS and (re-centered) eT under weak instruments. Re-centering the residuals allows us to just rely on the contin- Then, the continuous mapping theorem leads to the limiting ∗ uous mapping theorem. On the other hand, the lack of differ- distribution for LMf . entiability in (8) means that the standard expansion arguments These asymptotic results are obtained despite the fact that the in Bhattacharya and Ghosh (1978) break down in the uniden- estimator bβ is not a consistent estimator under Assumption 1A or ∗ tified case, which foreshadows our higher-order conclusions in 1B and replaces the null hypothesized value of β = β0 in eS and ∗ Section 4.2. eT . Therefore, we have ∗ ∗0 ∗ −1/2 ∗0 ∗ ˆ ˆ0 ∗ ˆ −1/2 eS = (Z Z ) Z Y b · (b Ωe b) , 4.1. Bootstrap ∗ ∗0 ∗ −1/2 ∗0 ∗ ∗−1 0 ∗−1 −1/2 eT = (Z Z ) Z Y Ωe aˆ · (aˆ Ωe aˆ) , 0 0 where aˆ = (bβ, 1) and bˆ = (1, −bβ) . 4.1.1. Strong consistency ∗ ∗ To derive the asymptotic distributions of eS and eT , we re- ∗ The usual intuition for the bootstrap requires that the empirical center eT by subtracting the term distribution, from which the bootstrap sample is drawn, to be √  Z0Z 1/2 p close to the distribution of the data under the null. For the ∗ = ˆ0 ∗−1 ˆ tn n π a Ωe a. model given in Eqs. (1) and (2), the empirical distribution used n b in bootstrap sampling depends on the residuals from these We can then consider the joint limiting distribution of S∗ T ∗ −t∗ , equations. These reduced-form residuals depend on the parameter (e ,e n ) where estimates through bπ and bπbβ. Despite the inconsistency of bβ when  " ∗0 ∗ 1/2 0 1/2# instruments are weak, the estimated residuals vˆ1, vˆ2 are close to √  Z Z   Z Z  p ∗ − ∗ = − ˆ0 ∗−1 ˆ eT tn n π a Ωe a (v1, v2) in the reduced-form model if n n b

a.s. a.s.  ∗0 ∗ −1/2 ∗0 ∗ π → π and πbβ → πβ. (10) Z Z Z V ∗−1 ˆ b b √ n n Ωe a + n The argument for bootstrap validity in Section 4.2 will rely on p . aˆ0Ω∗−1aˆ Eq. (10) holding. In this subsection we show that the strong e consistency in (10) can be considered the norm even with weak To describe this limiting distribution, we require some additional instruments. notation, by Liapunov’s Central Limit Theorem and the Delta 0 0 0 k+1 0 0 2k method, Let θ = (β, π ) ∈ R and Π = Π(θ) = π β, π ∈ R . 0 0 √ = ∗0 ∗ 1/2 0 1/2 d Definebθ (bβ, bπ ) to be the M-estimator that minimizes a sample n[(Z Z /n) − E(Z Z/n) ]π → N(0, Σ), criterion Qn (Π (θ)). Though θ is not identified at π = 0, the where Σ depends directly on π. In particular, define Σ = following result shows that the restricted estimator Π(bθ) is still √ 0 when π = 0. For π = 0, nπ is bounded in strongly consistent. 0 b probability and (Z∗ Z∗/n)1/2 − (Z0Z/n)1/2 has a zero conditional

2k Proposition 4. Let B be some set in R ,Q (·) be a deterministic 2k function, and δ be the distance induced by the Frobenius norm in R , and suppose the following hold: 3 We can relax the notion of convergence in assumptions (i) and (ii) to convergence in probability. Following Potscher and Prucha (1997), we can (i) ∀ > 0, inf Q Π > Q (Π); then obtain convergence in probability using standard subsequence arguments. Π∈B;δ( Π,Π)> However, this result is rather trivial for 2SLS and LIML estimation. If π 6= 0, then it →p →p = →p =   a.s. is well known that bπ π and bβ β. If π 0, then bπ 0 and bβ Op (1). As a (ii) sup Qn Π − Q Π → 0; p p result, π → π and πβ → πβ for any value of π. Π∈B b bb 56 M.J. Moreira et al. / Journal of Econometrics 149 (2009) 52–64
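Proposition 4 above concerns strong consistency of the restricted reduced-form estimates: even when π = 0, so that β̂ does not converge, the products π̂ and π̂β̂ entering the bootstrap residuals still converge to their population values (here both zero). A small simulation sketch of that phenomenon, again using the illustrative helpers and the OLS/2SLS estimator choices from the earlier sketches:

```python
for n in (100, 1000, 10000):
    # Completely unidentified design: lam = 0 means pi = 0.
    y1, y2, Z = simulate_data(n=n, k=4, beta=0.0, lam=0.0, rho=0.5, seed=n)
    NZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ y2)
    beta_hat = (y2 @ NZ @ y1) / (y2 @ NZ @ y2)
    print(n, np.linalg.norm(pi_hat), np.linalg.norm(pi_hat * beta_hat), beta_hat)
# ||pi_hat|| and ||pi_hat * beta_hat|| shrink like n^(-1/2), while beta_hat stays Op(1)
# and does not settle down across sample sizes.
```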

∗ − ∗ probability limit almost surely. Hence, the first term of eT tn is 2. Two alternative bootstrap methods could also be pursued. The asymptotically negligible, and the second term has a joint normal first alternative amounts to not replacing β0 with bβ. This avoids ∗ limit distribution witheS . the problem of replacing the structural parameter with the Lemmas in Appendix A show asymptotic normality of the inconsistent estimator bβ, yet it possibly entails power losses following expression under various assumptions including strong- (recall that the e.d.f. of the residuals will not be close to their and weak-instrument cases: c.d.f. when the true β is different from the hypothesized value  ∗0 ∗   β0). The second alternative amounts to doing OLS regressions Z V bb in the reduced-form model ignoring the nonlinear constraints √ p  n 0 ∗  of the reduced-form coefficients. However, the interpretation  bb Ωe bb   ∗0 ∗  ∗−1  from bootstrapping from the structural form residuals is no  Z V Ωe ba   √ p  longer valid in the over-identified model.  n a0Ω∗−1a  3. Proposition 4 shows that the assumption of almost sure  0 b e b √  W ∗ ı W 0ı   convergence of πˆ and πˆ βˆ is typical even in the unidentified n − n n case. However, we note that the proof of Theorem 6 also works for the case where πˆ and πˆ βˆ converge in probability. Then = 0 ∈ k(k+1)/2 = 0 ∗ = where wi vech ZiZi R , W (w1, . . . , wn) , wi the weak convergence in the conclusion of the theorem occurs ∗ ∗0 ∗ = ∗ ∗ 0 × with probability approaching one rather than almost surely. vech Zi Zi , W (w1 , . . . , wn ) , and ı denotes an n 1 vector of ∗ ∗ ∗ Both almost-sure and in-probability conclusions correspond ones. Since (eS ,eT − t ) is a function of the expression above, the n to modes of convergence that have been proposed for the next result on the asymptotic distribution of these bootstrapped bootstrap; cf. Efron (1979) and Bickel and Freedman (1981). statistics follows.

4+δ 4+δ Lemma 5. Suppose that, for some δ > 0,EkZik , EkVik < ∞. 4.2. Edgeworth expansions Let π and bβ be estimators satisfying either: b Given the robustness of validity for the bootstrap score test →a.s. − →a.s. (i) Assumption 1, bβ β, bπ π 0; or in Theorem 6, it is natural to wonder whether the higher-order a.s. −1/2 improvements under strong instruments in Theorem 3(a) carry (ii) Assumption 1B (or 1A), πbβ − πnβ → 0, π − πn = Op(n ), b b over to the weak instrument case. Below we will show that, β = O (1). b p in fact, the bootstrap typically does not deliver higher-order Then, the following result holds: improvements (in the usual sense) for the score statistic. The  ∗     reason is twofold. First, the higher-order terms typically depend S d Ik 0 e → N 0, a.s. , on bβ separately from the term πbβ. Second, the higher-order terms ∗ − ∗ Xn + 0 −1 b eT tn 0 Ik Σa Ω a are not necessarily continuous functions of the parameters in the unidentified case. where = { Y 0 Z0 Y 0 Z0 } and a = 1 . Xn ( 1, 1), . . . , ( n, n) (β, ) We have noted the lack of differentiability in (8) in the We are now ready to show the first-order validity of the score unidentified case. Because standard expansion arguments rely test on smoothness, higher-order improvement results for empirical ∗0 ∗ Edgeworth expansions (or the bootstrap) may not hold here. For ∗ S T = e e example, consider the problem of finding second-order Edgeworth LMf p . eT ∗0eT ∗ expansions for the LMf statistic when Ω is unknown but the errors are normal.5 We can compute the higher-order terms 4 This result holds regardless of the strength of the instruments. using standard methods. Alternatively, we can adapt the results in Cavanagh (1983) and Rothenberg (1988) to compute the 4+δ 2+δ Theorem 6. Suppose that, for some δ > 0,EkZik , EkVik < ∞ second-order Edgeworth distribution for LMf based on a stochastic . Let bπ and bβ be estimators satisfying either expansion: →a.s. − →a.s. = + −1/2 + −1 + −3/2 (i) Assumption 1, bβ β, bπ πn 0; or LMf LM n Pn n Qn Op n , − →a.s. − = −1/2 (ii) Assumption 1B (or 1A), bπbβ πnβ 0, bπ πn Op(n ), where P and Q are stochastically bounded with conditional moments bβ = Op(1). p x = E P |LM = x Then, the following result holds: n ( ) ( n ) , qn (x) = E (Qn|LM = x) , ∗| →d LMf Xn N(0, 1) a.s. , vn (x) = V (Pn|LM = x) . where X = {(Y 0 , Z0 ), . . . , (Y 0 , Z0 )}. n 1 1 n n Proposition 7. If the errors are jointly normally distributed, and LM admits a second-order Edgeworth expansion, P LM ≤ x can be Comments. 1. Validity of the bootstrap in the (locally) uniden- f f approximated by tified case is the main result in this paper. For completeness,  −1/2 −1 we also show the first-order validity under the same moment Φ x − n pn (x) + 0.5 · n 0 0 conditions for the strong instrument case. Of course, second- × 2p (x) p (x) − 2q (x) + v (x) − xv (x) order improvements with strong instruments are available un- n n n n n der stronger assumptions, see Theorem 3(a). up to an o n−1 term.6

5 4 In addition, we can use Lemma 5 to (i) show that the bootstrap is valid for Although the stated results are for tests designed for the known covariance the Anderson–Rubin, see Appendix B; (ii) give a formal proof that the bootstrap matrix case, analogous results hold when we replace Ω with its consistent does not provide a first-order approximation to the Wald and LR statistics with estimator Ωe. In particular, the LMf and We statistics also admit Edgeworth weak instruments; and (iii) note that the fixed-T bootstrap provides an alternative expansions, but with different polynomials in the higher-order terms (see method to compute critical values for the CLR test of Moreira (2003), see Moreira Appendix A). et al. (2004). 6 This proposition is proved in Rothenberg (1988), Appendix A. M.J. Moreira et al. / Journal of Econometrics 149 (2009) 52–64 57

Comments. 1. The terms pn(x), qn(x), and vn(x) can be approximated by functions such that the terms in the higher-order expansion are expressed exactly as powers of n^(−1/2); see Rothenberg (1988).

2. Recall that under normality the LM statistic is N(0, 1) under H0, but the LM̃ statistic is not. Therefore, Proposition 7 provides a second-order correction for the LM̃ statistic using conditional moments of the LM statistic. In FGLS examples, Edgeworth expansions are known to correct for skewness and kurtosis due to an estimated error covariance matrix; cf. Horowitz (2001) and Rothenberg (1988). We find that this behavior carries over to the IV setting as well.

The higher-order terms for the score statistic typically depend on π and Ω. In practice, we do not know π and Ω, and need to replace them with consistent estimators in the higher-order terms. As long as the higher-order polynomials are continuous functions of the parameters, empirical Edgeworth expansions (or the bootstrap) lead to higher-order improvements. However, in the next result, we find that the non-differentiability of the score statistic in the unidentified case leads to a discontinuity in the second-order term of the Edgeworth expansion at π = 0. Recall that Theorem 2 guarantees that the score statistic admits a second-order Edgeworth expansion at any π ≠ 0. Here, we actually compute the second-order terms.

Corollary 8. Under Assumptions 1, 2 (with s = 3), and 3, the null distribution of LM̃ can be uniformly approximated (in x) by
Pr(LM̃ ≤ x) = Φ(x) + n^(−1/2) [α2 + (α1 − α2)x²] φ(x) + o(n^(−1/2)),
where
α1 = (1/2) E[(Zi′π)(Vi′b0)³] / [(b0′Ωb0)^(3/2) (π′µZZ π)^(1/2)],
α2 = (1/6) E[(Zi′π)³(Vi′b0)³] / [(b0′Ωb0)^(3/2) (π′µZZ π)^(3/2)], and
µZZ = E(Zi Zi′).

This higher-order term in general cannot be extended to be continuous at π = 0 (take π = c · n^(−1/2) → 0 for different vectors c). Thus, the empirical Edgeworth expansion and bootstrap7 approaches typically do not provide a n^(−1/2) correction and can perform poorly in the unidentified case.

This result is not a weak-instrument problem, but rather a non-differentiability problem of the score test. For example, in Appendix B we show that the Anderson–Rubin statistic admits an Edgeworth expansion even when π = 0.

5. Monte Carlo simulations

In this section, we use simulation to examine the size performance of the bootstrap for the Wald and score test statistics. The basic simulation model is described by Eqs. (1) and (2). The n rows of [u, v2] are i.i.d. with mean zero, unit variance, and correlation ρ. The correlation coefficient, ρ, represents the degree of endogeneity of y2, and the distribution for these disturbances will vary depending on the design, as described below. We take the first column of the matrix of instruments, Z, to be a vector of ones; the remaining k − 1 columns are distributed N(0, Ik−1), independently from [u, v2].8

To examine the performance of the tests under various degrees of identification, we consider three different values of the population first-stage F-statistic, π′(nIk)π/k.9,10 In particular, we consider the completely unidentified case (π′(nIk)π/k = 0); a weak-instruments case (π′(nIk)π/k = 1); and a strong-instruments case (π′(nIk)π/k = 10).

The Monte Carlo simulations in this section compute actual rejection rates for the two-sided version of the score and Wald tests for the hypothesis that β = 0.11 For each specification, 1000 pseudo-data sets are generated under the null hypothesis. For each pseudo-data set, we compute rejections using two different critical values: (a) the 5% critical value of the asymptotic distribution of the given test (i.e., chi-square one); (b) the bootstrapped 5% critical value computed with 1000 replications for each pseudo-data set.

The simulations are designed to consider various cases of disturbance distributions. The first set of simulations restricts attention to disturbances following conventional distributions with small sample sizes. For design I, we take ui and v2i to be jointly normally distributed. For design II, we consider Wishart distributed disturbances. In particular, we take ui = (ξ1i² − 1)/√2 and v2i = (ξ2i² − 1)/√2, where ξ1i and ξ2i are standard normal random variables with correlation √ρ.

The next set of simulations considers nonnormal disturbances with varying numbers of instruments and large sample sizes. In these simulations we follow Kotz et al. (2000) to generate random draws of ui and v2i with zero mean, unit variance, skewness of κ3, kurtosis of κ4, and correlation ρ. In particular, we generate random draws for εi = [ε1i, ε2i]′, where the εi's are standard normal variables, and the correlation between ε1i and ε2i is ρ. Then, we set
ui = a + bε1i + cε1i² + dε1i³, and
v2i = a + bε2i + cε2i² + dε2i³.
Fleishman (1978) describes the simultaneous equations system that a, b, c, d, and ρ should satisfy to yield the desired skewness (κ3), kurtosis (κ4), and correlation (ρ) for (ui, v2i). This family of distributions is helpful in checking the sensitivity of the rejection rates to variation in the degree of nonnormality as specified by the kurtosis and skewness of the distributions.

Tables 1 and 2 report null rejection probabilities for the score and Wald tests with sample sizes of 20 and 80 observations. Bootstrapping the score test instead of using the first-order asymptotic approximation takes actual rejection rates closer to the nominal size, sometimes even in the unidentified case.12 For the smaller sample size results, the bootstrapped score is closest to the nominal 5% level when the instruments are strongest.

The performance of the Wald test exhibits even more sensitivity to instrument strength. When instruments are not strong, the size distortions for both asymptotic and bootstrap critical values can be dramatic. On the other hand, when instruments are strong, rejection rates for the Wald test are much closer to the nominal size, and bootstrapping the Wald test offers improvements over

7 In addition, the bootstrap has the problem of replacing β0 with the inconsistent estimator β̂.
8 There is a slight difference between Moreira's (2002, 2003) design and ours. In analogy with our results, our design takes Z as being random whereas Moreira's (2002, 2003) design takes Z as being fixed.
9 The first-stage F-statistic corresponds to the concentration parameter λ′λ/k in the notation of Staiger and Stock (1997).
10 Note that E(Z′Z) = nIk.
11 Further simulations for other values of β0 have revealed similar results.
12 We have also performed simulations using the empirical Edgeworth expansion for the one-sided score test. Results not reported here indicate that this approximation method is outperformed by the bootstrap.
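The Fleishman-type cubic transformation above can be implemented by solving numerically for (a, b, c, d) so that the first four moments of a + bε + cε² + dε³, with ε ~ N(0, 1), match the targets. The sketch below is only an illustration of that idea (the exact system and the correlation adjustment are given in Fleishman (1978) and Kotz et al. (2000)); the function names and the numerical solver are assumptions, and not every (skewness, kurtosis) pair is attainable by a cubic.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import fsolve

# E[eps^k] for eps ~ N(0, 1), k = 0, ..., 12 (enough for the 4th power of a cubic).
NORMAL_MOMENTS = np.array([1, 0, 1, 0, 3, 0, 15, 0, 105, 0, 945, 0, 10395], dtype=float)

def cubic_raw_moments(coef):
    """Raw moments E[Y^p], p = 1..4, of Y = a + b*eps + c*eps^2 + d*eps^3."""
    out = []
    for p in range(1, 5):
        poly = P.polypow(coef, p)                 # coefficients of the p-th power of the cubic
        out.append(poly @ NORMAL_MOMENTS[:len(poly)])
    return np.array(out)

def fleishman_coefficients(skew, kurt):
    """Solve for (a, b, c, d) giving mean 0, variance 1, and the target
    skewness and (non-excess) kurtosis."""
    def eqs(coef):
        m1, m2, m3, m4 = cubic_raw_moments(coef)
        mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                      # third central moment
        mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4   # fourth central moment
        return [m1, (m2 - m1 ** 2) - 1.0, mu3 - skew, mu4 - kurt]
    return fsolve(eqs, x0=[0.0, 0.9, 0.1, 0.0])

def nonnormal_disturbances(n, skew, kurt, rho, rng):
    """Correlated nonnormal (u_i, v_2i): the same cubic applied to correlated
    standard normals. The achieved corr(u, v2) is close to, but not exactly,
    the rho of the underlying normals (Fleishman gives the exact mapping)."""
    a, b, c, d = fleishman_coefficients(skew, kurt)
    eps = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = a + b * eps[:, 0] + c * eps[:, 0] ** 2 + d * eps[:, 0] ** 3
    v2 = a + b * eps[:, 1] + c * eps[:, 1] ** 2 + d * eps[:, 1] ** 3
    return u, v2
```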

Table 1
Null rejection probabilities, nominal 5%; n = 20, k = 4 (replications = 1000).

ρ      π′(nIk)π/k    Normal                        Wishart
                     LM            Wald            LM            Wald
                     BS     3.84   BS     3.84     BS     3.84   BS     3.84
0      0             4.8    8.0    0.0    0.5      7.9    11.9   0.9    2.0
0      1             4.1    7.4    1.3    2.4      6.2    9.5    1.7    4.0
0      10            4.5    6.5    3.4    4.9      5.8    9.5    5.7    8.7
0.5    0             5.8    9.1    12.0   15.4     7.4    11.3   14.5   2.1
0.5    1             4.2    6.4    13.0   14.1     6.9    10.4   9.3    14.6
0.5    10            4.6    6.6    5.7    7.4      6.5    9.7    6.3    8.8
0.75   0             6.1    7.6    42.7   48.7     7.5    12.8   39.0   50.7
0.75   1             4.3    6.5    27.9   32.6     6.9    9.7    22.7   29.2
0.75   10            4.9    6.3    7.6    10.6     7.0    10.5   8.2    12.3
0.99   0             5.9    7.6    95.2   99.1     9.0    13.3   93.7   98.3
0.99   1             4.5    6.5    35.4   57.2     7.0    10.3   31.7   51.3
0.99   10            5.1    6.5    9.1    14.2     7.0    10.6   9.0    15.2

• BS: Bootstrapped critical value results.
• 3.84: First-order asymptotic critical value results.

Table 2
Null rejection probabilities, nominal 5%; n = 80, k = 4 (replications = 1000).

ρ      π′(nIk)π/k    Normal                        Wishart
                     LM            Wald            LM            Wald
                     BS     3.84   BS     3.84     BS     3.84   BS     3.84
0      0             5.8    6.3    0.0    0.0      5.7    6.6    0.2    0.3
0      1             5.5    6.1    0.1    1.3      5.6    6.0    0.3    1.4
0      10            5.2    5.8    4.3    4.6      5.1    5.6    4.7    5.0
0.5    0             6.4    7.1    12.8   15.9     5.3    6.0    10.8   14.0
0.5    1             5.6    5.9    16.0   13.8     5.3    6.0    11.2   12.9
0.5    10            5.5    6.0    6.9    6.9      5.5    6.2    5.6    6.7
0.75   0             6.0    6.8    46.3   47.9     5.8    6.4    44.2   49.2
0.75   1             4.8    5.4    29.5   31.4     5.8    6.1    26.1   28.5
0.75   10            6.4    6.4    7.7    9.1      4.8    6.0    5.9    9.0
0.99   0             5.5    5.9    95.2   98.9     6.2    6.7    95.4   98.8
0.99   1             4.9    5.2    29.3   54.3     7.2    7.7    28.6   56.9
0.99   10            5.4    5.3    7.7    12.2     7.2    8.0    7.6    12.9

• BS: Bootstrapped critical value results.
• 3.84: First-order asymptotic critical value results.
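The entries in Tables 1–4 come from the Monte Carlo procedure of Section 5: 1000 pseudo-data sets per design, each compared against the asymptotic chi-square-one critical value and against a bootstrap critical value computed from 1000 replications. A sketch of that loop for the score test, using the illustrative helpers defined in the earlier sketches (design details such as the constant instrument column are omitted):

```python
from scipy.stats import chi2

def score_rejection_rates(n, k, lam, rho, n_datasets=1000, n_boot=1000, alpha=0.05):
    """Null rejection rates of the two-sided score test under bootstrapped
    and asymptotic critical values (the 'BS' and '3.84' columns)."""
    crit_asy = chi2.ppf(1 - alpha, df=1)          # 3.84 when alpha = 0.05
    rej_bs = rej_asy = 0
    for m in range(n_datasets):
        y1, y2, Z = simulate_data(n=n, k=k, beta=0.0, lam=lam, rho=rho, seed=m)
        lm, _ = lm_and_wald(y1, y2, Z, beta0=0.0)          # test of the true null beta = 0
        crit_bs = bootstrap_lm_critical_value(y1, y2, Z, B=n_boot, seed=m)
        rej_asy += lm ** 2 > crit_asy
        rej_bs += lm ** 2 > crit_bs
    return rej_bs / n_datasets, rej_asy / n_datasets
```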

Table 3 LM test: null rejection probabilities, nominal 5% (1000 replications). 0 π (nIk)π ======ρ k n 100, k 4 n 250, k 4 n 1000, k 4

κ3 = 0 κ3 = 6 κ3 = 6 κ3 = 0 κ3 = 6 κ3 = 6 κ3 = 0 κ3 = 6 κ3 = 6

κ4 = 6 κ4 = 0 κ4 = 6 κ4 = 6 κ4 = 0 κ4 = 6 κ4 = 6 κ4 = 0 κ4 = 6 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84

0 0 6.8 7.3 6.2 6.9 4.7 5.3 5.3 5.1 4.9 5.0 4.9 5.2 4.4 4.5 5.2 5.2 4.3 4.2 0 1 5.1 5.4 5.4 6.0 5.8 5.8 4.6 4.5 3.8 4.2 4.6 5.2 6.3 6.5 5.6 5.5 4.0 4.0 0 10 4.8 4.8 4.3 5.1 5.0 6.1 4.2 4.3 3.8 3.5 3.8 3.8 6.1 6.1 6.0 5.7 5.1 5.1 0.5 0 5.7 6.0 6.2 6.4 5.5 6.5 5.1 4.6 5.5 5.7 6.4 6.2 3.8 3.6 5.2 5.3 4.5 4.5 0.5 1 5.6 6.0 4.9 5.6 5.6 6.1 5.4 5.4 4.4 4.5 4.5 4.9 5.2 4.9 5.8 5.6 4.4 4.2 0.5 10 5.3 5.5 5.0 5.4 5.3 5.5 5.0 5.3 4.0 4.1 4.3 4.4 5.1 5.3 5.9 5.9 5.2 5.5 0.75 0 6.4 6.9 5.5 6.2 5.6 6.2 5.2 5.0 5.5 5.7 5.2 5.6 4.8 4.5 4.3 4.2 4.6 4.2 0.75 1 4.5 4.5 4.7 5.5 5.1 5.9 5.3 5.1 4.4 4.5 4.9 5.0 4.8 5.2 5.0 4.8 4.6 4.1 0.75 10 4.7 5.5 4.7 5.4 4.9 5.7 5.2 5.2 4.0 4.1 4.2 4.1 4.5 5.3 4.7 4.9 4.7 4.8 0.99 0 5.0 5.2 5.5 6.2 4.5 5.4 6.1 6.0 5.5 5.8 5.6 5.9 4.3 4.5 4.3 4.2 4.6 4.3 0.99 1 4.3 4.5 4.8 5.5 4.1 4.2 5.9 5.7 4.4 4.5 5.1 5.2 4.9 5.1 5.0 4.8 4.8 4.5 0.99 10 4.5 4.4 4.6 5.4 4.2 4.8 4.7 5.1 5.9 6.0 5.8 5.8 4.9 5.2 4.8 4.9 5.0 5.0 • BS: Bootstrapped critical value results. • 3.84: First-order asymptotic critical value results. first-order asymptotics. The poor behavior of the bootstrap for the designs. In the high skewness, high kurtosis designs, the bootstrap Wald test with weak instruments is explained, as previously, by performs especially well. It is also notable that some of the largest its dependence on π. For the remaining designs, we focus on the bootstrap gains occur in the unidentified case. With larger sample behavior of the score test. sizes, the rejection rates close in on the nominal 5% values, and the Table 3 gives the results for the score test with sample sizes bootstrap and asymptotic rejection rates mirror each other. n = 100, 250, 1000 and varying skewness and kurtosis values. Table 4 considers designs with a large number of instruments The number of instruments is fixed at four for this table. For the (k = 50). Otherwise, Table 4 designs are identical to Table 3 smallest sample sizes, the bootstrapped critical values generally designs. With a large number of instruments, the bootstrap gains provide some improvements over the asymptotic values, with the over the asymptotic critical value are dramatic at the smallest exception of the high correlation (ρ = .99), strong-instrument sample size. Even with the middle-range sample size (n = 250), M.J. Moreira et al. / Journal of Econometrics 149 (2009) 52–64 59

Table 4 LM test: null rejection probabilities, nominal 5% (1000 replications). 0 π (nIk)π ======ρ k n 100, k 50 n 250, k 50 n 1000, k 50

κ3 = 0 κ3 = 6 κ3 = 6 κ3 = 0 κ3 = 6 κ3 = 6 κ3 = 0 κ3 = 6 κ3 = 6

κ4 = 6 κ4 = 0 κ4 = 6 κ4 = 6 κ4 = 0 κ4 = 6 κ4 = 6 κ4 = 0 κ4 = 6 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84 BS 3.84

0 0 9.8 16.4 10.1 16.2 9.1 15.2 6.7 7.2 7.0 8.3 6.9 8.4 5.6 6.6 5.2 5.5 5.8 5.9 0 1 8.9 13.3 6.3 7.9 7.6 11.2 6.4 7.2 5.1 5.3 5.0 5.9 5.8 6.2 5.1 5.3 5.0 4.6 0 10 6.1 7.7 5.5 5.9 6.9 8.5 5.8 5.6 5.9 6.0 4.6 5.0 6.2 6.2 4.9 5.1 5.1 5.3 0.5 0 10.5 16.9 9.7 16.0 9.3 17.6 6.7 8.1 7.1 8.7 6.1 8.3 5.5 6.1 5.0 5.2 5.7 5.4 0.5 1 7.3 10.3 6.6 7.8 6.6 12.0 6.6 7.6 5.2 5.4 5.1 5.9 5.8 6.4 5.0 5.5 4.8 4.8 0.5 10 4.7 5.5 5.3 6.1 5.9 6.9 6.0 6.5 5.4 5.7 5.6 6.1 6.1 6.3 4.9 4.7 5.0 5.2 0.75 0 8.3 15.9 8.5 16.9 8.7 16.9 6.9 7.7 6.2 8.5 6.3 8.6 6.1 6.1 5.7 5.8 5.7 5.8 0.75 1 4.6 7.9 5.9 7.8 8.0 10.9 6.3 7.1 5.2 6.0 4.5 5.4 5.7 5.6 5.3 5.0 5.9 5.9 0.75 10 3.7 4.6 5.5 5.9 5.3 6.8 5.6 5.7 5.9 6.0 5.5 5.7 5.5 5.6 4.9 4.6 5.3 5.1 0.99 0 10.3 16.9 8.5 16.9 8.9 16.6 6.9 6.9 6.2 8.6 6.3 8.4 5.5 6.1 5.7 5.9 5.7 5.9 0.99 1 3.1 4.3 5.9 7.8 8.2 11.2 4.5 5.2 5.3 6.0 4.6 5.5 5.8 6.4 5.4 5.0 5.8 5.8 0.99 10 3.6 4.4 5.5 5.9 5.5 6.8 4.7 5.1 5.9 6.0 5.8 5.8 6.1 6.3 4.9 4.6 5.2 5.5 • BS: Bootstrapped critical value results. • 3.84: First-order asymptotic critical value results. the bootstrap consistently outperforms the asymptotic critical Guggenberger and Smith (2005), Otsu (2006), Stock and Wright values for the various designs. For the largest sample size (n = (2000), and Brown and Newey (forthcoming). Inoue (2006), and 1000), the rejection rates for both methods are converging to the Kleibergen (2006) present some simulations and results which nominal value of 5%. Still, even with n = 1000, the bootstrap indicate that the bootstrap can lead to size improvements for the provides some improvement for 25 of the 36 designs and has unidentified case also in the GMM context. Our theoretical results equivalent performance to asymptotics for another six of the should be adaptable to those cases by analyzing GMM and GEL designs. versions of the two sufficient statistics for the simple simultaneous For the strong instrument cases, the simulation results in equations model analyzed here. these tables are consistent with the higher-order improvements expected of the bootstrap from the theoretical findings in Acknowledgement Sections 3 and 4. It is also notable that some of the largest gains for the bootstrap occur in the unidentified designs. In Discussions with Tom Rothenberg were very important for this Theorem 6, the first-order validity of the score test is established, paper, and we are indebted for his insights and generosity. We but from Section 4.2, we know that these bootstrap gains cannot be also thank the participants at Boston College, Boston University, explained by traditional Edgeworth expansion arguments. These Cornell, Harvard-MIT, Maryland, Montreal, North Carolina State, simulations, then, suggest an interesting path for future research.13 Rice, Texas A&M, Texas Austin, and USC–UCLA seminars, and at the NSF Weak Instrument Conference at MIT and the Semiparametrics 6. Conclusion in Rio conference organized by FGV/EPGE. Moreira and Porter gratefully acknowledge the research support of the National It is well-known that in the strong-instrument case, the Wald Science Foundation via grant numbers SES-0418268, SES-0819761, statistic (and the score statistic) are smooth functions of sample and SES-0438123. Suarez acknowledges support from the Los- means and the bootstrap provides higher-order improvements. In Andes-Harvard Fund and Banco de la Republica. 
This paper the unidentified case, the statistics are, in general, non-regular, and represents the views of the authors and does not necessarily the standard proofs for the validity of the bootstrap break down. represent those of the Federal Reserve System or members of its Despite the known fragility of the bootstrap in non-regular cases staff. (Shao, 1994; Andrews, 2000), this paper provides a positive result that bootstrapping the score statistic is, in fact, valid. Appendix A. Proofs As a negative result, the bootstrap for the score does not, in general, provide standard improvements in the weak-instrument Proof of Theorem 2. First, we prove part (a). Under H , case. This is due to the structural parameter not being consistently 0 estimable and the higher-order polynomials of the Edgeworth −1 q b0 Y 0Z/n Z0Z/n Z0Y /n Ω−1a / b0 Ωb expansions not being necessarily continuous in the unidentified √ 0 e 0 0 e 0 LMf = n case. Nevertheless, this discontinuity due to non-differentiability q − a0 Ω−1 (Y 0Z/n)(Z0Z/n) 1 (Z0Y /n) Ω−1a of the score statistic can be quite interesting, given that little 0 e e 0 is known about expansions when the statistic is not smooth. In with the words of Wallace (1958): ‘‘The assumption H0 (µ) 6= 0 and 0 0  0 −1 0  its equivalent for functions of several moments rule out many Ωe = Y Y /n − Y Z/n Z Z/n Z Y /n . interesting functions for which no general theory of asymptotic expansions is known.’’ The expression for LMf can be re-written as √ Our approach can, in principle, be extend to obtain results   LMf = n H Rn − H (µ) , for the unidentified case in the GEL and GMM contexts; cf. ` where H is a real-valued Borel measurable function on R such that H (µ) = 0. All the derivatives of H of order s and less are continuous in the neighborhood of µ. Using Assumptions 2 and 3, 13 There has been some recent work that supports bootstrap improvements beyond standard Edgeworth expansion arguments, e.g., Goncalves and Meddahi the result follows Theorem 2 of Bhattacharya and Ghosh (1978). (forthcoming) on realized volatility. Corollary 8 gives the expression for the second term. 60 M.J. Moreira et al. / Journal of Econometrics 149 (2009) 52–64

The proof for part (b) is analogous to the proof for part (a). The Proof of Proposition 4. The function ρ (θ1, θ2) = δ (Π (θ1) , Π Wald statistic equals (θ2)) is nonnegative, symmetric, and satisfies the triangle inequal- −1 −1 ity. Therefore, ρ (θ , θ ) is a pseudometric. √ 0 0  0 −1/2 0 0  0 − 1 2 (y2Z/n Z Z/n Z y2/n) y2Z/n Z Z/n Z (y1 y2β0) /n W = n , The pseudometric ρ (θ1, θ2) induces an equivalence relation — e p 0 [1, −bβ2SLS ]Ωe[1, −bβ2SLS ] the metric identification — that converts the pseudometric space into a metric space. Define θ1 ∼ θ2 if ρ (θ1, θ2) = 0, and let where ∗ ∗ ∗ ∗ Θ = Θ/ ∼ and ρ ([θ1] , [θ2]) = ρ (θ1, θ2). Then (Θ , ρ ) is a = 0 0 −1 0 −1 0 0 −1 0 bβ2SLS (y2Z/n Z Z/n Z y2/n) y2Z/n Z Z/n Z y1/n. metric space. ∗ ∗ Define Qn ([θ]) = Qn (Π (θ)) and Q ([θ]) = Q (Π (θ)). Like the score statistic, the Wald statistic can be written as ∗ −1  √ Define B = {[θ] : θ ∈ Θ, Π (θ)} = {[θ] : θ ∈ Θ, θ ∈ Π (B)} .   From Assumption (i), we obtain that ∀ 0, We = n H Rn − H (µ)  > ∗   ∗ under H0, where H is a real-valued Borel measurable function such inf Q θ > Q ([θ]) . (11) that H (µ) = 0. All the derivatives of H of order s and less are [θ]∈B∗;ρ∗([θ],[θ])> continuous in the neighborhood of µ. The result then follows by From Assumptions (ii) and (iii), we obtain Theorem 2 of Bhattacharya and Ghosh (1978).  ∗   − ∗   →a.s. Proof of Theorem 3. Let F be the distribution of sup Qn θ Q θ 0 and (12) [θ]∈B∗ = 0 0 0 0 0  Rn vech (Yn, Zn) (Yn, Zn) " # ∗   − ∗ ≥ lim inf inf Qn θ Qn ([θ]) 0 a.s. and let Fn be the distribution of →∞ ∗ c n [θ]∈(B ) ∗ ∗0 ∗0 0 ∗0 ∗0  eR = vech (eY , Z ) (eY , Z ) n n n n n ∗ [ ] →a.s. 0 0 0 0 ∗ From (11) and (12), it follows that ρ ( bθn , [θ]) 0. By conditional on X = {(Y , Z ), . . . , (Y , Z )}. Here, Z has a.s. n 1 1 n n n definition of ρ∗, we obtain that ρ(θ , θ) → 0. Because ρ is not a probability 1/n in taking the values Z , and Y ∗ has probability 1/n bn n n →a.s. in taking the values metric, this does not imply that bθn θ. However, by construction of the pseudometric ρ, we have Y = Z πa + V = Z π(β, 1) + V . en nbb en nb b en a.s. δ(Π(bθn), Π (θ)) ≡ ρ(bθn, θ) → 0. The re-sampling mechanism for eYn and Zn and the re-centering a.s. procedure for bV of subtracting samples means reflect the fact that Because δ is a metric, we obtain the desired result: Π(bθn) → Π (θ). Z and V are independent. If Z and V were uncorrelated, it would  entail different drawing mechanisms and re-centering procedures. a.s. 2+δ But the essence of the proofs for the bootstrap presented here Lemma A1. Suppose πbβ − πnβ → 0. If, for some δ > 0,EkZik < 2+δ b would remain the same. ∞,Ekvik < ∞, then for j = 1,..., k and m = 1, 2, ∗[| ∗ ∗ |2+δ] LetbFn be the Fourier transform of Fn and E Zj,ivm,i is bounded a.s. 0 0 0 0 0 R = vech (Y , Z ) (Y , Z ) . ∗[| ∗ ∗ |2+δ] = ∗[| ∗ |2+δ] en en n en n Proof. By independence, E Zj,ivm,i E Zj,i ∗[| ∗ |2+δ] = Following Lemma 2 of Babu and Singh (1984), there exists for each E vm,i . For j 1,..., k, d > 0 positive numbers  and δ such that 1 n ∗[| ∗ |2+δ] = X | |2+δ →a.s. [| |2+δ] E Zj,i Zj,i E Zj,i . lim sup sup bFn (t) ≤ 1 −  a.s. n i=1 n→∞ d≤ktk≤enδ = 1 Pn = = 1 Pn ∗ Let vm i=1 vm,i, m 1, 2, and Z i=1 Zi. Using Since the rows eR are i.i.d. 
(conditionally given Xn) with common n n n Minkowski and Cauchy–Schwartz inequalities, we obtain distribution Fn, one can proceed as in Bhattacharya (1987) to show that n ∗ ∗ + − X + E [|v |2 δ] = n 1 |˜v |2 δ 1,i 1,i √ ∗ = ∗     i 1 sup P n R − Rn ∈ A n en e 0 A∈A = −1 X | − − −  − |2+δ n v1,i v1 Zi Z (bπbβ πnβ) Z " s−2 # i=1 X − − + i/2 − : ( n n ) 1 n Pi ( D Fn) φV (x) dx − X + − X 0 + ≤ C n 1 |v − v |2 δ + n 1 | Z − Z (πβ − π β)|2 δ A i=1 1 1,i 1 i bb n i=1 i=1 −1  → ∞ ` ( n n ) is o n a.s. as n for every class A of Borel subsets of R + ≤ −1 X | − |2+δ + − 2+δ −1 X − 2 δ satisfying, for some ϑ > 0, C2 n v1,i v1 bπbβ πnβ n Zi Z i=1 i=1 ε ϑ  sup ΦV ((∂A) ) = O ε as ε ↓ 0. A∈A for a large enough constants C1 and C2. An analogous result holds ∗[| ∗ |2+δ] for E v2,i :  ∗  ∗ Reduction of the expansion of n1/2 R − R to LM follows as in en en f ( n Bhattacharya and Ghosh (1978) once we realize that ∗[| ∗ |2+δ] ≤ −1 X | − |2+δ E v2,i C2 n v2,i v2 = ∗ √   ∗   i 1 = − LMf n H eRn H eRn n ) 2+δ −1 X 2+δ + kπ − π k n Z − Z .   b n i i=1 with H eRn = 0 (due to re-centered residuals).  M.J. Moreira et al. / Journal of Econometrics 149 (2009) 52–64 61

  2+δ 2+δ  k −1 δ  Using the Minkowski inequality again, we get − X c1,j c2,j(Ωe a)1 ∗  ∗ ∗ 2+δ  ≤ C n 2 · + b E |Z v | 5  p p  j,i 1,i ( ) 0 0 −1 n n  j=1 bb Ωebb ba Ωe ba −1 X 2+δ −1 X 2+δ 2+δ n Zi − Z ≤ C1 n kZik + Z  2+δ 2+δ   −1 c1,j(−bβ) c2,j(Ωe a)2 ∗ ∗ ∗ + i=1 i=1 + + b E |Z |2 δ   p p  j,iv2,i  0 0 −1 a.s.   2+δ 2+δ bb Ωebb ba Ωe ba → C1 E kZik + kE [Zi]k , +  (k+1)k/2 ! 2 δ 1 n  a.s. 2+δ 1/(2+δ) X ∗ ∗ X → k k ≤ k k ≤ k k + dlE w − wj,i using Z E [Zi] E Zi (E Zi ) by Jensen’s l,i n l=1 j=1  inequality. Similarly, using the Minkowski inequality again, we

obtain for large enough constants C3, C4, and C5. n ( n ) − + The vectors a and b both have one as an element, and Ω and Ω 1 −1 X | − |2+δ ≤ −1 X 2 δ + | |2+δ b b e e n vm,i vm C1 n vm,i vm converge almost surely to positive definite limits. So, regardless of i=1 i=1 the value of πn or bβ, the terms a.s. n 2+δo → C1 E vm,i , −1 −1 1 (Ωe a)1 −bβ (Ωe a)2 , b , , and b (13) a.s. a.s. −1 P 2+δ p p p p as v → 0. Since πβ − π β → 0, the term n |˜v | is 0 0 −1 0 0 −1 m bb n i m,i bb Ωebb ba Ωe ba bb Ωebb ba Ωe ba bounded a.s.  are almost always well-defined. These terms are also bounded by Recall the following notation = ech Z Z 0 ∈ k(k+1)/2 wi v i i R κ1(Ωe), where ∗ ∗ ∗0 and W = (w1, . . . , wn). Similarly, let w = vech Z Z and ∗ ∗ ∗ i i i (s s ) W = (w , . . . , w ). Also let Ωww = Var(wi) and let ı be an n × 1 σ11 σ22 1 n κ (Ω) = max e , e , (14) vector of ones. 1 e − 2 − 2 eσ11eσ22 eσ12 eσ11eσ22 eσ12 4+δ 2+δ Lemma A2. If, for some δ > 0,EkZik < ∞,EkVik < ∞, then where σij is the (i, j)-th entry of Ωe. 0 e   Z∗ V ∗  b  This bound follows from the fact that √ b p 0 − 0 − −  n 0  a Ω 1a = a Ω 1ΩΩ 1a,  bb Ωebb  b e b b e e e b  ∗0 ∗  −1   Z V Ωe ba  and the following claim (which holds regardless of the value of π).  √  Xn n p 0 −  a Ωe 1a  Let √  ∗0 b 0 b   W ı W ı  k k  τ  n − K = 11 12 and τ = 1 , n n k12 k22 τ2   0  d I ⊗ E(Z Z ) 0 → N 0, 2 i i a.s. where K is a symmetric positive definite matrix. Then, the 0 Ωww following holds: s 0 0 0 = 0 0 0 ∈ 2k 0 0  s Proof. Let (c , d ) be a nonzero vector with c (c1, c2) R τ1 τ e1e1 τ k22 + √ ≤ sup = . ∈ k(k 1)/2 0 2 and d R . Define 0 τ τ Kτ k k − k √ τ Kτ 11 22 12 0 0 ∗ ∗ 0 ∗ X = c ( J ⊗ I ) V ⊗ Z  + d w − w / n, n,i b k i i i Given the bound in (14), the conclusion of Lemma A1, and the fact 4+δ ∗ ∗ −1 n ∗ ∗ ∗ 0 that EkZ k < ∞ is sufficient to bound E w − (n P where V = v , v  is the i-th bootstrap draw of the (re- i li j=1 i 1,i 2,i 2+δ = −1 Pn wli)| almost surely, the final condition of the Liapunov Central centered) reduced-form residuals, w n i=1 wi, and Limit Theorem now follows because Ωbww converges a.s. to its " # b Ω−1a positive definite limits.  = b e b bJ p , p . b0 b a0 −1a 4+δ 2+δ bΩeb bΩe b Lemma A3. If, for some δ > 0,EkZik < ∞,EkVik < ∞, then We use the Cramér–Wald device and verify the conditions of the   ∗0 ∗   Z V bb Liapunov Central Limit Theorem. √ p  n 0 ∗  (a) E∗ X  = 0 follows from independence and E∗ V ∗ = 0.  bb Ωe bb  n,i i  ∗0 ∗  ∗−1  (b) By independence,  Z V Ωe ba   √  Xn 0 n p 0 ∗−    Z Z    a Ωe 1a  ∗  2  = −1 0 ⊗ + 0   ∗0 b 0 b  E Xn,i n c I2 c d Ωewwd ,  √ W ı W ı  n n − n n = −1 Pn − − 0 is finite a.s., where Ωeww n i=1 (wi w)(wi w) .   0  n d ⊗ P ∗[| |2+δ] = I2 E(ZiZi ) 0 (c) Finally, we need to show that limn→∞ i=1 E Xn,i 0 → N 0, a.s. 0 Ωww a.s. Let (p)j denote the j-th entry of a vector p and cm,j denote the j-th entry of the vector cm, m = 1, 2; that is, cm,j = (cm)j. ∗ = [ ∗ ∗] We have Proof. Noting that V v1 v2 , we can rewrite the first two terms n of the expression above: X ∗  2+δ  E |Xn,i|  ∗0 ∗  0 i=1  Z V  b  ∗ ∗   b Z v1 n √ √ − δ −1 X ∗ 0 0 ∗ ∗ 2+δ 0 ∗ 2+δ  p  ≤ 2 | ⊗ | + | −  |  n 0 ∗ ∗0  n  C3n n E c (bJ Vi Zi ) d wi w  bb Ω bb  = ⊗ ∗0 ∗ ∗−e1 (bJ Ik)  ∗0 ∗   , i=1  Z V  Ω a  Z v  √ e b   √ 2  − δ ∗  0 ∗ 0 ∗ 2+δ 0 ∗ 0 ∗ 2+δ 0 ∗  2+δ  p ≤ C4n 2 E |c Z (bJ V )1| + |c Z (bJ V )2| + |d w − w | n 0 ∗−1 n 1 i i 2 i i i ba Ωe ba 62 M.J. Moreira et al. / Journal of Econometrics 149 (2009) 52–64

√ " ∗0 ∗ 1/2  0 1/2# q where ∗ ∗ Z Z Z Z T˜ − t = n − a0 ∗−1a n bπ bΩe b " ∗−1 # n n ∗ bb Ωe ba J = , . ∗0 ∗ −1/2 ∗0 ∗ b p p  Z Z  Z V ∗−1 0 ∗ 0 ∗−1 √ Ω a bb Ω bb a Ω a √ n n e b e b e b + n p . Since 0 ∗−1 ba Ωe ba 0  Z∗ ∗   v1 We shall prove (i) first. By Lemma A3, √  n  d 0  " ∗0 ∗ 1/2  0 1/2# q Xn → N 0, Ω ⊗ E(ZiZ ) a.s. √ Z Z Z Z  ∗0 ∗   i 0 ∗−1  Z v2  n − π a Ωe a √ n n b b b n ∗0 ∗ 0 is uncorrelated with the term Z √V . Using the fact that Z∗ Z∗/n|X by the Liapunov CLT by an argument similar to the proof of n n a.s. √ ∗ a.s. a.s. p a.s. Lemma A2, it will suffice to show thatbJ −bJ|Xn → 0 a.s. 0 0 ∗−1 0 −1 0 → E(ZiZ ) a.s., π → π and a Ωb a − a Ω a|Xn → 0 a.s., Notice that E∗[Z∗ Z∗/n] = Z0Z/n. So by the Markov Law of Large i b b b ˜∗0 ˜ ∗ − ∗ 0 0| →d + 0 −1 Numbers, (S ,(T tn ) ) Xn N(0, I2k Σa Ω a) a.s. For case (ii) in which Assumption 1B (or 1A) holds, the term ∗0 ∗ 0 Z Z Z Z a.s. " # − Xn → 0 a.s. √  ∗0 ∗ 1/2  0 1/2 q n n Z Z Z Z 0 ∗− n − π a Ωb 1a n n b b b 0 →a.s. 0 Moreover, Z Z/n E(ZiZi ) which is positive definite, and so √ 0 −1 a.s. 0 a.s. Z∗ Z∗/n |X → E(Z Z0)−1 a.s. Similarly, Z∗ V ∗/n|X → is conditionally asymptotically negligible since nπ, bβ and n i i n p b ∗ ∗0 ∗ ∗ ∗0 ∗ 0 0 ∗−1 E [Z V /n] = 0 a.s. Also, E [V V /n] = eV eV /n, and ba Ωb ba are bounded in probability. It then follows that ∗0 ∗ − 0 | →a.s. 0 →a.s. ˜∗0 ˜ ∗ − ∗ 0 0| →d V V /n eV eV /n Xn 0 a.s. By standard arguments eV eV /n Ω (S ,(T tn ) ) Xn N(0, I2k) a.s.  a.s. a.s. and → . Hence ∗ − | → 0 a.s. Ωe Ω Ωe Ωe Xn Proof of Theorem 6. Consider part (i) first. We write 0 ∗ −1/2 0 −1/2 ∗ Consider the terms (bb Ωe bb) and (bb Ωebb) frombJ andbJ. √ ∗0 ∗ Let Ω denote a generic value of the covariance matrix. As argued ∗ S˜ T˜ / n 0 −1/2 0 −1/2 LM = . in the proof of Lemma A2, |(bb Ωbb) | and |bβ(bb Ωbb) | can f q T˜ ∗0T˜ ∗/n be bounded by κ1( Ω) for all bb. For any c > 1, there exists a neighborhood of Ω such that for Ω in the neighborhood, κ ( Ω) < 1 √ a.s. 1/2 ∂ 0 −1/2 0 −3/2 ˜ ∗ | → 0  0 −1 1/2 cκ1(Ω). Note that (b Ωb) = −(b Ωb) /2. So, for large Under (i), we have T / n Xn E(ZiZi ) π(a Ω a) a.s. By ∂σ¯11 b b b b enough n, Lemma A3, ˜∗| →d ∂ 0 −1/2 1 3 3 S Xn N(0, Ik) a.s. (bb Ωbb) ≤ κ1( Ω) < 2κ1(Ω) a.s. ∂σ¯11 2 ∗ d As a result, LMf |Xn → N(0, Ik) a.s. ∗ for Ω = λΩe + (1 − λ)Ωe with λ ∈ [0, 1]. The same bound applies For part (ii), we write √ taking the partial derivative with respect to the other terms of Ω. ∗0 ∗ ∗ ∗ ∗ S˜ (t + T˜ − t )/ n It follows by the mean value theorem that for large enough n, = n n LMf q . (t∗ + T˜ ∗ − t∗)0(t∗ + T˜ ∗ − t∗)/n 1 1 n n n n − ≤ 3k ∗ − k p p 8κ1(Ω) Ωe Ωe a.s. 0 ∗ 0 ∗ d bb Ωe bb bb Ωebb By Lemma 5(ii), LMf |Xn → N(0, 1) a.s.  0 ∗ −1/2 0 −1/2 The same bound applies to | − bβ(bb Ωe bb) − (−bβ)(bb Ωebb) |. Proof of Corollary 8. Write the score LMf as a function H of R¯ = A similar argument can be used to bound the terms of Y 0Y /n, Z0Z/n, Z0Y /n. Let L(yy, zz, zy) = (yy − zy0(zz)−1zy), ∗−1 −1 − 0 − √Ωe ba − Ωe ba . Let = max{ ¯ ¯ }. Then, for M(yy, zz, zy) = L(yy, zz, zy) 1zy (zz) 1zy and define the function 0 ∗−1 q κ2( Ω) σ11, σ22, κ1( Ω) ba Ωe ba 0 −1 ba Ωe ba H(yy, zz, zy) to be the following expression: j = 1, 2, a0 M(yy, zz, zy)b 0 0 ∗−1 −1 0 − 0 . (Ωe a)j (Ωe a)j [ 1 ]1/2[ ]1/2 b − b a0M(yy, zz, zy)L(yy, zz, zy) a0 b0L(yy, zz, zy)b0 p p √ a0Ω∗−1a a0Ω−1a b e b b e b Then LMf = nH(R¯). 3 ∗−1 −1 0 0 ≤ 8(κ2(Ω) + κ2(Ω))kΩe − Ωe k a.s. Let r = (yy, zz, zy), µZZ = E(Z Z/n), µYY = E(Y Y /n) = 0 0 0 0 a0π µZZ πa + Ω, µZY = E(Z Y /n) = µZZ πa , and µ = ∗ − | →a.s. 
Proof of Lemma 5. The result is a direct application of the Delta Method and the limiting distribution given in Lemma A3 (noting the zero covariances between the three components in the normal limiting distribution). We have
$$\tilde S^*=\left(\frac{Z^{*\prime}Z^*}{n}\right)^{-1/2}\frac{Z^{*\prime}V^*}{\sqrt n}\frac{\hat b}{\sqrt{\hat b'\tilde\Omega^*\hat b}}$$
and
$$\tilde T^*-t_n^*=\sqrt n\left[\left(\frac{Z^{*\prime}Z^*}{n}\right)^{1/2}-\left(\frac{Z'Z}{n}\right)^{1/2}\right]\hat\pi\sqrt{\hat a'\tilde\Omega^{*-1}\hat a}+\sqrt n\left(\frac{Z^{*\prime}Z^*}{n}\right)^{-1/2}\frac{Z^{*\prime}V^*}{n}\frac{\tilde\Omega^{*-1}\hat a}{\sqrt{\hat a'\tilde\Omega^{*-1}\hat a}}.$$
We shall prove (i) first. By Lemma A3,
$$\sqrt n\left[\left(\frac{Z^{*\prime}Z^*}{n}\right)^{1/2}-\left(\frac{Z'Z}{n}\right)^{1/2}\right]\hat\pi\sqrt{\hat a'\tilde\Omega^{*-1}\hat a}$$
is uncorrelated with the term $Z^{*\prime}V^*/\sqrt n$. Using the fact that $Z^{*\prime}Z^*/n\,|\,X_n\xrightarrow{a.s.}E(Z_iZ_i')$ a.s., $\hat\pi\xrightarrow{a.s.}\pi$, and $\hat a'\tilde\Omega^{*-1}\hat a-a'\Omega^{-1}a\,|\,X_n\xrightarrow{a.s.}0$ a.s.,
$$\left(\tilde S^{*\prime},(\tilde T^*-t_n^*)'\right)'\,\big|\,X_n\xrightarrow{d}N\!\left(0,\ I_{2k}+\Sigma\,a'\Omega^{-1}a\right)\ \text{a.s.}$$
For case (ii), in which Assumption 1B (or 1A) holds, the term
$$\sqrt n\left[\left(\frac{Z^{*\prime}Z^*}{n}\right)^{1/2}-\left(\frac{Z'Z}{n}\right)^{1/2}\right]\hat\pi\sqrt{\hat a'\tilde\Omega^{*-1}\hat a}$$
is conditionally asymptotically negligible, since $\sqrt n\,\hat\pi$, $\hat\beta$, and $\hat a'\tilde\Omega^{*-1}\hat a$ are bounded in probability. It then follows that $\left(\tilde S^{*\prime},(\tilde T^*-t_n^*)'\right)'\,|\,X_n\xrightarrow{d}N(0,I_{2k})$ a.s. □

Proof of Theorem 6. Consider part (i) first. We write
$$\widetilde{LM}^*=\frac{\tilde S^{*\prime}\tilde T^*/\sqrt n}{\sqrt{\tilde T^{*\prime}\tilde T^*/n}}.$$
Under (i), we have $\tilde T^*/\sqrt n\,|\,X_n\xrightarrow{a.s.}E(Z_iZ_i')^{1/2}\pi(a'\Omega^{-1}a)^{1/2}$ a.s. By Lemma A3, $\tilde S^*\,|\,X_n\xrightarrow{d}N(0,I_k)$ a.s. As a result, $\widetilde{LM}^*\,|\,X_n\xrightarrow{d}N(0,1)$ a.s.

For part (ii), we write
$$\widetilde{LM}^*=\frac{\tilde S^{*\prime}\left(t_n^*+\tilde T^*-t_n^*\right)/\sqrt n}{\sqrt{\left(t_n^*+\tilde T^*-t_n^*\right)'\left(t_n^*+\tilde T^*-t_n^*\right)/n}}.$$
By Lemma 5(ii), $\widetilde{LM}^*\,|\,X_n\xrightarrow{d}N(0,1)$ a.s. □
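To make the content of Theorem 6 concrete, the sketch below simulates a weak-instrument design, computes the score statistic, and obtains a bootstrap critical value from replications of the statistic. The helper name, the data-generating design, and the resampling scheme (instruments and re-centered residuals drawn independently, with the null $\beta_0$ imposed through the first-stage estimate $\hat\pi$) are illustrative assumptions for this sketch and are not presented as the paper's exact algorithm.

```python
# Minimal sketch of a residual-based bootstrap for the score (LM) statistic.
import numpy as np

def lm_stat(Y, Z, beta0):
    # Score statistic for H0: beta = beta0, built from reduced-form quantities
    # (a sketch of the usual S and T ingredients, with Omega estimated by OLS).
    n = Z.shape[0]
    a0 = np.array([beta0, 1.0])
    b0 = np.array([1.0, -beta0])
    A = np.linalg.inv(np.linalg.cholesky(Z.T @ Z))     # A'A = (Z'Z)^{-1}
    Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
    Omega = (Y - Z @ Pi_hat).T @ (Y - Z @ Pi_hat) / n
    S = A @ (Z.T @ Y @ b0) / np.sqrt(b0 @ Omega @ b0)
    T = A @ (Z.T @ Y @ np.linalg.solve(Omega, a0)) / np.sqrt(a0 @ np.linalg.solve(Omega, a0))
    return (S @ T) / np.sqrt(T @ T)

rng = np.random.default_rng(2)
n, k, beta0 = 400, 4, 0.0
Z = rng.standard_normal((n, k))
pi = 0.05 * np.ones(k)                                 # weak instruments
V = rng.standard_normal((n, 2))
Y = np.column_stack([Z @ pi * beta0 + V[:, 0], Z @ pi + V[:, 1]])

lm = lm_stat(Y, Z, beta0)

# Bootstrap world: first-stage estimate and re-centered residuals under H0.
pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y[:, 1])
resid = Y - np.column_stack([Z @ pi_hat * beta0, Z @ pi_hat])
resid -= resid.mean(axis=0)

lm_star = []
for _ in range(999):
    Zs = Z[rng.integers(0, n, n)]
    Vs = resid[rng.integers(0, n, n)]                  # drawn independently of Zs
    Ys = np.column_stack([Zs @ pi_hat * beta0, Zs @ pi_hat]) + Vs
    lm_star.append(lm_stat(Ys, Zs, beta0))

print("LM:", lm, " bootstrap two-sided 5% critical value:",
      np.quantile(np.abs(lm_star), 0.95))
```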

Proof of Corollary 8. Write the score statistic $\widetilde{LM}$ as a function $H$ of $\bar R=\left(Y'Y/n,\ Z'Z/n,\ Z'Y/n\right)$. Let $L(yy,zz,zy)=yy-zy'(zz)^{-1}zy$ and $M(yy,zz,zy)=L(yy,zz,zy)^{-1}zy'(zz)^{-1}zy$, and define the function $H(yy,zz,zy)$ to be the following expression:
$$H(yy,zz,zy)=\frac{a_0'M(yy,zz,zy)b_0}{\left[a_0'M(yy,zz,zy)L(yy,zz,zy)^{-1}a_0\right]^{1/2}\left[b_0'L(yy,zz,zy)b_0\right]^{1/2}}.$$
Then $\widetilde{LM}=\sqrt n\,H(\bar R)$.
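For concreteness, this representation can be transcribed directly into code. The sketch below evaluates $H$ at the sample moments of a simulated design and returns $\sqrt n\,H(\bar R)$; the helper name score_from_moments and the simulated design are hypothetical, while the formula is the one just displayed.

```python
# LM~ = sqrt(n) H(R_bar) with R_bar = (Y'Y/n, Z'Z/n, Z'Y/n),
# L = yy - zy'(zz)^{-1}zy and M = L^{-1} zy'(zz)^{-1}zy.
import numpy as np

def score_from_moments(yy, zz, zy, beta0, n):
    a0 = np.array([beta0, 1.0])
    b0 = np.array([1.0, -beta0])
    G = zy.T @ np.linalg.solve(zz, zy)       # zy'(zz)^{-1} zy, a 2 x 2 matrix
    L = yy - G
    M = np.linalg.solve(L, G)                # L^{-1} zy'(zz)^{-1} zy
    H = (a0 @ M @ b0) / np.sqrt((a0 @ M @ np.linalg.solve(L, a0)) * (b0 @ L @ b0))
    return np.sqrt(n) * H

rng = np.random.default_rng(3)
n, k, beta0 = 500, 3, 0.5
Z = rng.standard_normal((n, k))
pi = 0.1 * np.ones(k)
V = rng.standard_normal((n, 2))
Y = np.column_stack([Z @ pi * beta0 + V[:, 0], Z @ pi + V[:, 1]])

lm = score_from_moments(Y.T @ Y / n, Z.T @ Z / n, Z.T @ Y / n, beta0, n)
print(lm)   # approximately N(0, 1) under H0: beta = beta0
```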

Let $r=(yy,zz,zy)$, $\mu_{ZZ}=E(Z'Z/n)$, $\mu_{YY}=E(Y'Y/n)=a_0\pi'\mu_{ZZ}\pi a_0'+\Omega$, $\mu_{ZY}=E(Z'Y/n)=\mu_{ZZ}\pi a_0'$, and $\mu=(\mu_{YY},\mu_{ZZ},\mu_{ZY})$. Elements of the matrices in $r$ are denoted $yy_{ij}$, $zz_{lm}$, $zy_{rs}$, where $i,j,s=1,2$ and $l,m,r=1,\ldots,k$. Let
$$h_{yy_{ij}}=\frac{\partial}{\partial yy_{ij}}H(r)\Big|_{r=\mu},\qquad h_{yy_{ij}zz_{lm}}=\frac{\partial^2}{\partial yy_{ij}\partial zz_{lm}}H(r)\Big|_{r=\mu}$$
for $i,j=1,2$ and $l,m=1,\ldots,k$. Similarly, define for the appropriate index ranges $h_{zz_{lm}}$, $h_{zy_{rs}}$, $h_{yy_{ij}yy_{lm}}$, $h_{yy_{ij}zy_{rs}}$, $h_{zz_{lm}zz_{rs}}$, $h_{zz_{lm}zy_{rs}}$, and $h_{zy_{lm}zy_{rs}}$.

Also, let $Y_t$, $Z_t$, and $V_t$ denote the $t$-th observation (row) from the matrices $Y$, $Z$, and $V$. The $i$-th element of each row of observations is denoted $Y_{i,t}$ and $Z_{i,t}$. Define
$$\sigma_{yy_{ij}zz_{lm}}=E\left[(Y_{i,t}Y_{j,t}-\mu_{YY_{ij}})(Z_{l,t}Z_{m,t}-\mu_{ZZ_{lm}})\right],$$
$$\sigma_{yy_{ij}zz_{lm}zy_{rs}}=E\left[(Y_{i,t}Y_{j,t}-\mu_{YY_{ij}})(Z_{l,t}Z_{m,t}-\mu_{ZZ_{lm}})(Z_{r,t}Y_{s,t}-\mu_{ZY_{rs}})\right],$$
and similarly for the other $\sigma$ notation. From Hall (1992, Theorem 2.2), $\Pr\left(\sqrt n\,H(\bar R)/\sigma\le c\right)=\Phi(c)+n^{-1/2}p_1(c)\phi(c)+o(n^{-1/2})$, where
$$p_1(c)=-A_1\sigma^{-1}-\frac{1}{6}A_2\sigma^{-3}(c^2-1),$$
$\sigma$ is the asymptotic variance of $\sqrt n\,H$, and expressions for $A_1$ and $A_2$ are derived below.

First it should be noted that, since $\widetilde{LM}$ is already standardized, it is straightforward to show (using the expression in Hall (1992)) that $\sigma=1$. Then $p_1(\cdot)$ yields the desired expansion. The following expression for $A_1$ is directly from Hall (1992):
$$A_1=\frac{1}{2}\Bigg[\sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{l=1}^{2}\sum_{m=1}^{2}h_{yy_{ij}yy_{lm}}\sigma_{yy_{ij}yy_{lm}}+\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{l=1}^{k}\sum_{m=1}^{k}h_{zz_{ij}zz_{lm}}\sigma_{zz_{ij}zz_{lm}}+\sum_{i=1}^{k}\sum_{j=1}^{2}\sum_{l=1}^{k}\sum_{m=1}^{2}h_{zy_{ij}zy_{lm}}\sigma_{zy_{ij}zy_{lm}}$$
$$\qquad+2\sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{l=1}^{k}\sum_{m=1}^{k}h_{yy_{ij}zz_{lm}}\sigma_{yy_{ij}zz_{lm}}+2\sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{l=1}^{k}\sum_{m=1}^{2}h_{yy_{ij}zy_{lm}}\sigma_{yy_{ij}zy_{lm}}+2\sum_{i=1}^{k}\sum_{j=1}^{k}\sum_{l=1}^{k}\sum_{m=1}^{2}h_{zz_{ij}zy_{lm}}\sigma_{zz_{ij}zy_{lm}}\Bigg].$$
Note that the last term in the numerator of $H(r)$ is $zy\cdot b_0$ and $\mu_{ZY}b_0=\mu_{ZZ}\pi a_0'b_0=0$. It follows that $h_{yy_{ij}}=0$, $h_{zz_{ij}}=0$, $h_{yy_{ij}yy_{lm}}=0$, $h_{zz_{ij}zz_{lm}}=0$, and $h_{yy_{ij}zz_{lm}}=0$. The other terms in the formula for $A_1$ can be obtained directly, yielding
$$A_1=-\frac{1}{2}\,\frac{E\left[(Z_h'\pi)(V_h'b_0)^3\right]}{(b_0'\Omega b_0)^{3/2}(\pi'\mu_{ZZ}\pi)^{1/2}}.$$
The formula for $A_2$ is given in Hall (1992) and involves considerably more terms than $A_1$. Given the length of that formula, we will take advantage of the simplifications $h_{yy_{ij}}=0$, $h_{zz_{ij}}=0$, $h_{yy_{ij}yy_{lm}}=0$, $h_{zz_{ij}zz_{lm}}=0$, and $h_{yy_{ij}zz_{lm}}=0$ to state $A_2$ more compactly:
$$A_2=A_{21}+A_{22}+A_{23}+A_{24},$$
where
$$A_{21}=\sum_{i=1}^{k}\sum_{j=1}^{2}\sum_{l=1}^{k}\sum_{m=1}^{2}\sum_{r=1}^{k}\sum_{s=1}^{2}h_{zy_{ij}}h_{zy_{lm}}h_{zy_{rs}}\sigma_{zy_{ij}zy_{lm}zy_{rs}}=\frac{E\left[(Z_h'\pi)^3(V_h'b_0)^3\right]}{(b_0'\Omega b_0)^{3/2}(\pi'\mu_{ZZ}\pi)^{3/2}},$$
$$A_{22}=6\sum_{i=1}^{k}\sum_{j=1}^{2}\sum_{l=1}^{k}\sum_{m=1}^{2}\sum_{p=1}^{2}\sum_{q=1}^{2}\sum_{r=1}^{k}\sum_{s=1}^{2}h_{zy_{ij}}h_{zy_{lm}}h_{yy_{pq}zy_{rs}}\sigma_{yy_{pq}zy_{ij}}\sigma_{zy_{lm}zy_{rs}}=-\frac{3E\left[(Z_h'\pi)(V_h'b_0)^3\right]}{(b_0'\Omega b_0)^{3/2}(\pi'\mu_{ZZ}\pi)^{1/2}},$$
$$A_{23}=6\sum_{i=1}^{k}\sum_{j=1}^{2}\sum_{l=1}^{k}\sum_{m=1}^{2}\sum_{p=1}^{k}\sum_{q=1}^{k}\sum_{r=1}^{k}\sum_{s=1}^{2}h_{zy_{ij}}h_{zy_{lm}}h_{zz_{pq}zy_{rs}}\sigma_{zz_{pq}zy_{ij}}\sigma_{zy_{lm}zy_{rs}}=0,$$
$$A_{24}=3\sum_{i=1}^{k}\sum_{j=1}^{2}\sum_{l=1}^{k}\sum_{m=1}^{2}\sum_{p=1}^{k}\sum_{q=1}^{2}\sum_{r=1}^{k}\sum_{s=1}^{2}h_{zy_{ij}}h_{zy_{lm}}h_{zy_{pq}zy_{rs}}\sigma_{zy_{pq}zy_{ij}}\sigma_{zy_{lm}zy_{rs}}=0.$$
Hence,
$$A_2=\frac{E\left[(Z_h'\pi)^3(V_h'b_0)^3\right]}{(b_0'\Omega b_0)^{3/2}(\pi'\mu_{ZZ}\pi)^{3/2}}-\frac{3E\left[(Z_h'\pi)(V_h'b_0)^3\right]}{(b_0'\Omega b_0)^{3/2}(\pi'\mu_{ZZ}\pi)^{1/2}}.\ \square$$
Appendix B. Additional results

In this section, we first provide a verification of the regularity conditions from Proposition 4 for maximum likelihood estimators. The second subsection contains an Edgeworth expansion result for the Anderson–Rubin statistic regardless of instrument strength.

B.1. Strong consistency of maximum likelihood

We give an example here in which Assumptions (i)–(iii) of Proposition 4 hold for any closed (bounded) sphere $B$ containing $\Pi$. Consider the objective function for maximum likelihood estimation with normal errors:
$$Q_n\left(\bar\Pi\right)=\left|\frac{\left(Y-Z\bar\Pi\right)'\left(Y-Z\bar\Pi\right)}{n}\right|.$$
This objective function converges a.s. to the continuous function
$$Q\left(\bar\Pi\right)=\left|\Omega+\left(\bar\Pi-\Pi\right)'\Omega_{ZZ}\left(\bar\Pi-\Pi\right)\right|,$$
where $\Omega_{ZZ}=E(Z_iZ_i')$. Because $\Omega_{ZZ}$ is positive definite, $Q(\bar\Pi)$ is minimized at $\bar\Pi=\Pi$. Hence, Assumption (i) holds for any compact set $B$ (additional algebra shows that $\Pi$ is in fact uniquely identified on $\mathbb{R}^{k\times2}$).

The second-order derivative of $Q_n(\pi_1,\pi_2)\equiv Q_n(\Pi)$ is
$$\nabla^2Q_n(\pi_1,\pi_2)=\begin{pmatrix}\dfrac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_1\partial\pi_1'}&\dfrac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_2\partial\pi_1'}\\[8pt]\dfrac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_1\partial\pi_2'}&\dfrac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_2\partial\pi_2'}\end{pmatrix},$$
where the partial derivatives are given by
$$\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_1\partial\pi_1'}=2Z'Z\,(y_2-Z\pi_2)'(y_2-Z\pi_2)-2Z'(y_2-Z\pi_2)(y_2-Z\pi_2)'Z,$$
$$\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_1\partial\pi_2'}=4Z'(y_1-Z\pi_1)(y_2-Z\pi_2)'Z-4Z'(y_2-Z\pi_2)(y_1-Z\pi_1)'Z,$$
$$\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_2\partial\pi_2'}=2Z'Z\,(y_1-Z\pi_1)'(y_1-Z\pi_1)-2Z'(y_1-Z\pi_1)(y_1-Z\pi_1)'Z.$$
The function $Q_n(\pi_1,\pi_2)$ is convex (a.s.) because, for any nonzero vector $c=(c_1',c_2')'\in\mathbb{R}^{2k}$, $c'\nabla^2Q_n(\pi_1,\pi_2)c>0$ (a.s.):
$$c'\nabla^2Q_n(\pi_1,\pi_2)c=c_1'\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_1\partial\pi_1'}c_1+c_2'\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_2\partial\pi_2'}c_2+c_1'\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_1\partial\pi_2'}c_2+c_2'\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_2\partial\pi_1'}c_1$$
$$=c_1'\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_1\partial\pi_1'}c_1+c_2'\frac{\partial^2Q_n(\pi_1,\pi_2)}{\partial\pi_2\partial\pi_2'}c_2>0,$$
using the Cauchy–Schwarz inequality. By Theorem 10.8 of Rockafellar (1970), pointwise convergence of convex functions implies uniform convergence on any compact set $B$. Therefore, Assumption (ii) holds.

To show Assumption (iii), we follow the proof of Theorem 2.7 of Newey and McFadden (1994). Consider the maximum likelihood estimator $\hat\Pi$ on the compact set $B$. This estimator is consistent for $\Pi$. Then the event that $\delta(\hat\Pi,\Pi)<\epsilon$, so that $Q_n(\hat\Pi)\le Q_n(\bar\Pi)$ for any $\bar\Pi\in B$, has probability one. In this event, for any $\Pi^{\dagger}$ outside $B$, there is a linear convex combination $\lambda\hat\Pi+(1-\lambda)\Pi^{\dagger}$ that lies in $B$. By convexity, we obtain $Q_n(\hat\Pi)\le Q_n(\Pi^{\dagger})$. The result now follows from $Q_n(\hat\Pi)\xrightarrow{a.s.}Q(\Pi)$.
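A quick numerical illustration of the argument above, assuming the determinant form of $Q_n$ just given (the simulated design and variable names are arbitrary): the reduced-form OLS estimator minimizes $Q_n$, the minimized value is close to $|\Omega|$, and local perturbations only increase the objective.

```python
# Q_n(Pi) = |(Y - Z Pi)'(Y - Z Pi)/n| is minimized at the reduced-form OLS
# estimator; here Omega = I_2, so the minimized value is near |I_2| = 1.
import numpy as np

rng = np.random.default_rng(4)
n, k = 5_000, 3
Z = rng.standard_normal((n, k))
Pi_true = rng.standard_normal((k, 2))
Y = Z @ Pi_true + rng.standard_normal((n, 2))

def Q_n(Pi):
    R = Y - Z @ Pi
    return np.linalg.det(R.T @ R / n)

Pi_ols = np.linalg.solve(Z.T @ Z, Z.T @ Y)        # the minimizer of Q_n
print(Q_n(Pi_ols), Q_n(Pi_true))                  # Q_n(Pi_ols) <= Q_n(Pi_true), both near 1

# Perturbations away from Pi_ols never decrease the objective.
for _ in range(5):
    D = 0.1 * rng.standard_normal((k, 2))
    assert Q_n(Pi_ols + D) >= Q_n(Pi_ols)
print("local checks passed")
```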
B.2. Edgeworth expansion for the Anderson–Rubin statistic

Lemma 9. Let $\mathcal{B}$ be any class of Borel sets satisfying
$$\sup_{B\in\mathcal{B}}\int_{(\partial B)^{\varepsilon}}\phi_A(v)\,dv=O(\varepsilon)\quad\text{as }\varepsilon\downarrow0,\qquad(15)$$
where $\phi_A$ is the pdf of a mean-zero normal distribution with variance $A$, $\partial B$ is the boundary of $B$, and $(\partial B)^{\varepsilon}$ is the $\varepsilon$-neighborhood of $\partial B$. If Assumptions 1 or 1A, 2, and 3 hold, then
$$\sup_{B\in\mathcal{B}}\left|P\left(S_n\in B\right)-\int_B\psi_{s,n}(v)\,dv\right|=o\left(n^{-(s-2)/2}\right)$$
under $H_0:\beta=\beta_0$, where $\psi_{s,n}$ is a formal Edgeworth expansion of order $s-2$ for $S_n$.

Proof. Under $H_0$, the statistic $S$ can be written as
$$S=\sqrt n\left(Z'Z/n\right)^{-1/2}\left(Z'V/n\right)b_0\cdot\left(b_0'\Omega b_0\right)^{-1/2}=\sqrt n\left[H\left(R_n\right)-H\left(\mu\right)\right]$$
for a measurable mapping $H$ from $\mathbb{R}^{\ell}$ onto $\mathbb{R}^{k}$ with derivatives of order $s$ and lower being continuous in a neighborhood of $\mu$. The result follows from Bhattacharya and Ghosh (1978, p. 437). An analogous result holds for $\tilde S$, albeit the Edgeworth expansion would have different polynomials for the higher-order terms. □

Theorem 10. Let $G_k(x)$ and $g_k(x)$ be, respectively, the cdf and pdf of a chi-square-$k$ distribution. Under Assumptions 1 or 1A, 2, and 3, the null distribution of $AR$ can be uniformly approximated (in $x$) by
$$P\left(AR\le x\right)=G_k(x)+\sum_{i=1}^{r}n^{-i}p_{AR}^{i}\left(x;F,\beta_0,\pi\right)g_k(x)+o\left(n^{-r}\right).$$

Proof. We want to approximate
$$P\left(AR\le x\right)=P\left(S_n'S_n\le x\right)$$
uniformly in $x$. This expression can be written as $P\left(S_n\in C_x\right)$, for the convex sets
$$C_x=\left\{s\in\mathbb{R}^{k};\ s's\le x\right\}.$$
Using Corollary 3.2 of Bhattacharya and Rao (1976), we can show that
$$\sup_{x\in\mathbb{R}}\Phi\left(\left(\partial C_x\right)^{\varepsilon}\right)\le d(k)\,\varepsilon,$$
where $d(k)$ is a function of only $k$ and $\varepsilon>0$. Hence, Lemma 9 holds. Finally, the result follows from integration and the fact that the odd terms of $\psi_{s,n}$ are even functions. □
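The leading term in Theorem 10 is easy to see in simulation. The sketch below (illustrative; the design, the plug-in estimate of $\Omega$, and the reference chi-square draws are arbitrary choices) computes $AR=S_n'S_n$ under the null with very weak instruments and compares its quantiles with those of a chi-square-$k$ sample.

```python
# Under H0: beta = beta0 the statistic AR = S'S is close to chi-square with k
# degrees of freedom, no matter how weak the instruments are.
import numpy as np

rng = np.random.default_rng(5)
n, k, beta0, reps = 300, 4, 0.0, 5_000
pi = 0.02 * np.ones(k)                              # very weak instruments
b0 = np.array([1.0, -beta0])

ar = np.empty(reps)
for r in range(reps):
    Z = rng.standard_normal((n, k))
    V = rng.standard_normal((n, 2))
    Y = np.column_stack([Z @ pi * beta0 + V[:, 0], Z @ pi + V[:, 1]])
    Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
    Omega = (Y - Z @ Pi_hat).T @ (Y - Z @ Pi_hat) / n
    u = Y @ b0                                      # equals V b0 under H0
    S = np.linalg.inv(np.linalg.cholesky(Z.T @ Z)) @ (Z.T @ u) / np.sqrt(b0 @ Omega @ b0)
    ar[r] = S @ S

chi2_k = rng.chisquare(k, size=reps)                # reference chi-square-k draws
for q in (0.5, 0.9, 0.95, 0.99):
    print(q, np.quantile(ar, q), np.quantile(chi2_k, q))
```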
References

Andrews, D.W.K., 1987. Consistency in nonlinear econometric models: A generic uniform law of large numbers. Econometrica 55, 1465–1471.
Andrews, D.W.K., 2000. Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica 68, 399–405.
Andrews, D.W.K., Guggenberger, P., 2008. Asymptotic size and a problem with subsampling and the m out of n bootstrap. Econometric Theory (forthcoming).
Babu, J., Singh, K., 1984. On one term Edgeworth correction by Efron's bootstrap. Sankhyā, Series A 46, 219–232.
Bhattacharya, R.N., 1977. Refinements of the multidimensional central limit theorem and applications. Annals of Probability 5, 1–27.
Bhattacharya, R.N., 1987. Some aspects of Edgeworth expansions in statistics and probability. In: Puri, M.L., Vilaplana, J.P., Wertz, W. (Eds.), New Perspectives in Theoretical and Applied Statistics. John Wiley and Sons, New York, pp. 157–170.
Bhattacharya, R.N., Ghosh, J., 1978. On the validity of the formal Edgeworth expansion. The Annals of Statistics 6, 434–451.
Bhattacharya, R.N., Rao, R., 1976. Normal Approximation and Asymptotic Expansions. Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons, New York.
Bickel, P., Freedman, D., 1981. Some asymptotic theory for the bootstrap. Annals of Statistics 9, 1196–1217.
Bound, J., Jaeger, D., Baker, R., 1995. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. Journal of the American Statistical Association 90, 443–450.
Brown, B., Newey, W., 2004. GMM, efficient bootstrapping, and improved inference. Journal of Business and Economic Statistics (forthcoming).
Cavanagh, C., 1983. Hypothesis Testing in Models with Discrete Dependent Variables. Ph.D. Thesis, UC Berkeley.
Dufour, J.-M., 1997. Some impossibility theorems in econometrics with applications to structural and dynamic models. Econometrica 65, 1365–1388.
Efron, B., 1979. Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7, 1–26.
Fleishman, A., 1978. A method for simulating non-normal distributions. Psychometrika 43, 521–532.
Goncalves, S., Meddahi, N., 2008. Bootstrapping realized volatility. Econometrica (forthcoming).
Guggenberger, P., Smith, R., 2005. Generalized empirical likelihood estimators and tests under partial, weak and strong identification. Econometric Theory 21, 667–709.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York.
Horowitz, J., 2001. The bootstrap. In: Heckman, J.J., Leamer, E. (Eds.), Handbook of Econometrics. North-Holland, New York, pp. 3159–3228.
Inoue, A., 2006. A bootstrap approach to moment selection. Econometrics Journal 9, 48–75.
Kleibergen, F., 2002. Pivotal statistics for testing structural parameters in instrumental variables regression. Econometrica 70, 1781–1803.
Kleibergen, F., 2006. Expansions of GMM statistics and the bootstrap. Brown University, Unpublished Manuscript.
Kotz, S., Balakrishnan, N., Johnson, N., 2000. Continuous Multivariate Distributions, 2nd ed. John Wiley and Sons, New York.
Moreira, M.J., 2002. Tests with Correct Size in the Simultaneous Equations Model. Ph.D. Thesis, UC Berkeley.
Moreira, M.J., 2003. A conditional likelihood ratio test for structural models. Econometrica 71, 1027–1048.
Moreira, M.J., Porter, J., Suarez, G., 2004. Bootstrap and higher order expansion when the instruments may be weak. NBER Working Paper t0302.
Navidi, W., 1989. Edgeworth expansions for bootstrapping regression models. The Annals of Statistics 17, 1472–1478.
Nelson, C., Startz, R., 1990. Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58, 967–976.
Newey, W., McFadden, D., 1994. Large sample estimation and hypothesis testing. In: Engle, R.F., McFadden, D.L. (Eds.), Handbook of Econometrics. Elsevier Science, Amsterdam, pp. 2111–2245.
Otsu, T., 2006. Generalized empirical likelihood inference for nonlinear and time series models under weak identification. Econometric Theory 22, 513–527.
Potscher, B., Prucha, I., 1997. Dynamic Nonlinear Econometric Models. Springer-Verlag.
Qumsiyeh, M., 1990. Edgeworth expansion in regression models. Journal of Multivariate Analysis 35, 86–101.
Qumsiyeh, M., 1994. Bootstrapping and empirical Edgeworth expansions in multiple linear regression models. Communications in Statistics - Theory and Methods 23, 3227–3239.
Rockafellar, T., 1970. Convex Analysis. Princeton University Press, Princeton.
Rothenberg, T.J., 1988. Approximate power functions for some robust tests of regression coefficients. Econometrica 56, 997–1019.
Shao, J., 1994. Bootstrap sample size in nonregular cases. Proceedings of the American Mathematical Society 122, 1251–1262.
Staiger, D., Stock, J.H., 1997. Instrumental variables regression with weak instruments. Econometrica 65, 557–586.
Stock, J.H., Wright, J., 2000. GMM with weak identification. Econometrica 68, 1055–1096.
Wallace, D., 1958. Asymptotic approximations to distributions. The Annals of Mathematical Statistics 29, 635–654.
Wang, J., Zivot, E., 1998. Inference on a structural parameter in instrumental variables regression with weak instruments. Econometrica 66, 1389–1404.