Journal of Econometrics 152 (2009) 28–36
Choosing instrumental variables in conditional moment restriction models
Stephen G. Donald a, Guido W. Imbens b, Whitney K. Newey c,∗
a Department of Economics, University of Texas, United States
b Department of Economics, Harvard University, United States
c Department of Economics, MIT, United States

Article info
Article history: Received 1 January 2002; Accepted 1 October 2008; Available online 5 March 2009
JEL classification: C13; C30
Keywords: Conditional moment restrictions; Generalized method of moments; Generalized empirical likelihood; Mean squared error

Abstract
Properties of GMM estimators are sensitive to the choice of instrument. Using many instruments leads to high asymptotic efficiency but can cause high bias and/or variance in small samples. In this paper we develop and implement asymptotic mean square error (MSE) based criteria for instrument selection in estimation of conditional moment restriction models. The models we consider include various nonlinear simultaneous equations models with unknown heteroskedasticity. We develop moment selection criteria for the familiar two-step optimal GMM estimator (GMM), a bias corrected version, and generalized empirical likelihood estimators (GEL), that include the continuous updating estimator (CUE) as a special case. We also find that the CUE has lower higher-order variance than the bias-corrected GMM estimator, and that the higher-order efficiency of other GEL estimators depends on conditional kurtosis of the moments.
© 2009 Elsevier B.V. All rights reserved.
∗ Corresponding address: Department of Economics, Massachusetts Institute of Technology, Room E52-262D, 50 Memorial Drive, Cambridge, MA 02142, United States. Tel.: +1 617 253 6420; fax: +1 617 253 1330. E-mail address: [email protected] (W.K. Newey).

1. Introduction

It is important to choose carefully the instrumental variables for estimating conditional moment restriction models. Adding instruments increases asymptotic efficiency but also increases small sample bias and/or variance. We account for this trade-off by using a higher-order asymptotic mean-square error (MSE) of the estimator to choose the instrument set. We derive the higher-order MSE for GMM, a bias corrected version of GMM (BGMM), and generalized empirical likelihood (GEL). For simplicity we impose a conditional symmetry assumption, that third conditional moments of disturbances are zero, and use a large-number-of-instruments approximation. We also consider the effect of allowing identification to shrink with the sample size $n$ at a rate slower than $1/\sqrt{n}$. The resulting MSE expressions are quite simple and straightforward to apply in practice to choose the instrument set.

The MSE criteria given here also provide higher-order efficiency comparisons. We find that the continuously updated GMM estimator (CUE) is higher-order efficient relative to BGMM. We also find that the higher-order efficiency of the GEL estimators depends on conditional kurtosis, with all GEL estimators having the same higher-order variance when disturbances are Gaussian. With Gaussian disturbances and homoskedasticity, Rothenberg (1996) showed that empirical likelihood (EL) is higher-order efficient relative to BGMM. Our efficiency comparisons generalize those of Rothenberg (1996) to other GEL estimators and to heteroskedastic, non-Gaussian disturbances. These efficiency results are different from the higher-order efficiency result for EL from Newey and Smith (2004), where all estimators are bias corrected, the number of moments is fixed, and symmetry is not imposed.

Donald and Newey (2001) give analogous results for linear instrumental variable estimators with homoskedasticity. This paper focuses on GMM estimators with heteroskedasticity, leading to a heteroskedasticity-robust MSE that includes terms from estimation of the weight matrix. Our MSE criterion is like that of Nagar (1959), being the MSE of leading terms in a stochastic expansion of the estimator. This approach is well known to give the same answer as the MSE of leading terms in an Edgeworth expansion, under suitable regularity conditions (e.g. Rothenberg (1984)).

The many-instrument simplification seems appropriate for many applications where there is a large number of potential instrumental variables. We also assume symmetry, in the sense that conditional third moments of the disturbances are zero. This symmetry assumption greatly simplifies calculations. Also, relaxing it may not change the results much, e.g. because the bias from asymmetry tends to be smaller than other bias sources for large numbers of moment conditions; see Newey and Smith (2004).

Choosing moments to minimize MSE may help reduce misleading inferences that can occur with many moments. For GMM, the MSE explicitly accounts for an important bias term (e.g. see Hansen et al. (1996), Newey and Smith (2004)), so choosing moments to
minimize MSE avoids cases where asymptotic inferences are poor due to the bias being large relative to the standard deviation. For GEL, the MSE explicitly accounts for higher-order variance terms, so that choosing instruments to minimize MSE helps avoid underestimated variances. However, the criteria we consider do not generally provide the most accurate inference, as recently pointed out by Sun et al. (2007) in another context.

The problem addressed in this paper is different from that considered by Andrews (1999), where selection of the largest set of valid moments was considered. Here the problem is how to choose among moments known to be valid. Choosing among valid moments is important when there are many thought to be equally valid. Examples include various natural experiment studies, where multiple instruments are often available, as well as intertemporal optimization models, where all lags may serve as instruments.

In Section 2 we describe the estimators we consider and present the criteria we develop for choosing the moments. We also compare the criteria for different estimators, which corresponds to the MSE comparison for the estimators, finding that the CUE has smaller MSE than bias corrected GMM. In Section 3 we give the regularity conditions used to develop the approximate MSE, give the formal results, and consider higher-order efficiency comparisons. A small scale Monte Carlo experiment is conducted in Section 4. Concluding remarks are offered in Section 5.

0304-4076/$ – see front matter. doi:10.1016/j.jeconom.2008.10.013

2. The model and estimators

We consider a model of conditional moment restrictions like Chamberlain (1987). To describe the model let $z$ denote a single observation from an i.i.d. sequence $(z_1, z_2, \ldots)$, $\beta$ a $p \times 1$ parameter vector, and $\rho(z, \beta)$ a scalar that can often be thought of as a residual.1 The model specifies a subvector $x$ of $z$, acting as conditioning variables, such that for a value $\beta_0$ of the parameters
\[ E[\rho(z, \beta_0) \mid x] = 0, \]
where $E[\cdot]$ denotes the expectation taken with respect to the distribution of $z_i$.

1 The extension to the vector of residuals case is straightforward.

To form GMM estimators we construct unconditional moment restrictions using a vector of $K$ functions of $x$ given by $q^K(x) = (q_{1K}(x), \ldots, q_{KK}(x))'$. Let $g(z, \beta) = \rho(z, \beta) q^K(x)$. Then the unconditional moment restrictions
\[ E[g(z, \beta_0)] = 0 \]
are satisfied. Let $g_i(\beta) \equiv g(z_i, \beta)$, $\bar g_n(\beta) \equiv n^{-1} \sum_{i=1}^n g_i(\beta)$, and $\hat\Upsilon(\beta) \equiv n^{-1} \sum_{i=1}^n g_i(\beta) g_i(\beta)'$. A two-step GMM estimator is one that satisfies, for some preliminary consistent estimator $\tilde\beta$ for $\beta_0$,
\[ \hat\beta^H = \arg\min_{\beta \in B} \bar g_n(\beta)' \hat\Upsilon(\tilde\beta)^{-1} \bar g_n(\beta), \qquad (2.1) \]
where $B$ denotes the parameter space. For our purposes $\tilde\beta$ could be some other GMM estimator, obtained as the solution to an analogous minimization problem with $\hat\Upsilon(\tilde\beta)^{-1}$ replaced by a different weighting matrix, such as $\tilde W_0 = [\sum_{i=1}^n q^K(x_i) q^K(x_i)'/n]^{-1}$.

The MSE of the estimators will depend not only on the number of instruments but also on their form. In particular, instrumental variables that better predict the optimal instruments will help to lower the asymptotic variance of the estimator for a given $K$. Thus, for each $K$ it is good to choose $q^K(x)$ that are the best predictors. Often it will be evident in an application how to choose the instruments in this way. For instance, lower order approximating functions (e.g. linear and quadratic) often provide the most information, and so should be used first. Also, the main terms may often be more important than interactions.

The instruments need not form a nested sequence. Letting $q_{kK}(x)$ depend on $K$ allows different groups of instrumental variables to be used for different values of $K$. Indeed, $K$ fills a double role here, as the index of the instrument set as well as the number of instruments. We could separate these roles by having a separate index for the instrument set. Instead, we allow $K$ to range over only a subset of the integers, and let $K$ fulfill both roles. This restricts the sets of instruments to each have a different number of instruments, but this is often true in practice. Also, this double role for $K$ restricts the number of instrument sets we can select among, as seems important for the asymptotic theory.

As demonstrated by Newey and Smith (2004), the correlation of the residual with the derivative of the moment function leads to an asymptotic bias that increases linearly with $K$. They suggested an approach that removes this bias (as well as other sources of bias that we will ignore for the moment). This estimator can be obtained by subtracting an estimate of the bias from the GMM estimator and gives rise to what we refer to as the bias adjusted GMM estimator (BGMM). To describe it, let $q_i = q^K(x_i)$, $\rho_i(\beta) = \rho(z_i, \beta)$, and
\[ \hat\rho_i = \rho_i(\hat\beta^H), \quad \hat y_i = [\partial\rho_i(\hat\beta^H)/\partial\beta], \quad \hat y = [\hat y_1, \ldots, \hat y_n]', \quad \hat\Gamma = \sum_{i=1}^n q_i \hat y_i'/n, \]
\[ \hat\Sigma = \hat\Upsilon(\hat\beta^H)^{-1} - \hat\Upsilon(\hat\beta^H)^{-1} \hat\Gamma (\hat\Gamma' \hat\Upsilon(\hat\beta^H)^{-1} \hat\Gamma)^{-1} \hat\Gamma' \hat\Upsilon(\hat\beta^H)^{-1}. \]
The BGMM estimator is
\[ \hat\beta^B = \hat\beta^H + (\hat\Gamma' \hat\Upsilon(\hat\beta^H)^{-1} \hat\Gamma)^{-1} \sum_{i=1}^n \hat y_i \hat\rho_i q_i' \hat\Sigma q_i. \]

Also as shown in Newey and Smith (2004), GEL estimators have less bias than GMM when $K$ is large. We follow the description of these estimators given in that paper. Let $s(v)$ be a concave function with domain that is an open interval $V$ containing 0, $s_j(v) = \partial^j s(v)/\partial v^j$, and $s_j = s_j(0)$. We impose the normalizations $s_1 = s_2 = -1$. Define the GEL estimator as
\[ \hat\beta^{GEL} = \arg\min_{\beta \in B} \max_{\lambda \in \hat\Lambda_n(\beta)} \sum_{i=1}^n s(\lambda' g_i(\beta)), \]
where $\hat\Lambda_n(\beta) = \{\lambda : \lambda' g_i(\beta) \in V, \ i = 1, \ldots, n\}$. This estimator includes as special cases: empirical likelihood (EL, Qin and Lawless (1994), Owen (1988)), where $s(v) = \ln(1 - v)$; exponential tilting (ET, Imbens et al. (1998), Kitamura and Stutzer (1997)), where $s(v) = -\exp(v)$; and the continuous updating estimator (CUE, Hansen et al. (1996)), where $s(v) = -(1 + v)^2/2$. As we will see, the MSE comparisons between these estimators depend on $s_3$, the third derivative of the $s$ function, where
\[ \text{CUE}: s_3 = 0, \qquad \text{ET}: s_3 = -1, \qquad \text{EL}: s_3 = -2. \]

2.1. Instrument selection criteria

The instrument selection is based on minimizing the approximate mean squared error (MSE) of a linear combination $\hat t' \hat\beta$ of a GMM estimator or GEL estimator $\hat\beta$, where $\hat t$ is some vector of (estimated) linear combination coefficients. To describe the criteria, some additional notation is required. Let $\tilde\beta$ be some preliminary estimator, $\tilde\rho_i = \rho_i(\tilde\beta)$, $\tilde y_i = \partial\rho_i(\tilde\beta)/\partial\beta$, and
\[ \hat\Upsilon = \sum_{i=1}^n \tilde\rho_i^2 q_i q_i'/n, \quad \hat\Gamma = \sum_{i=1}^n q_i \tilde y_i'/n, \quad \hat\Omega = \hat\Gamma' \hat\Upsilon^{-1} \hat\Gamma, \quad \hat\tau = \hat\Omega^{-1} \hat t. \]
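As a rough illustration of the notation just introduced, the quantities $\hat\Upsilon$, $\hat\Gamma$, $\hat\Omega$ and $\hat\tau$ can be computed directly from data. The sketch below is not from the paper; it assumes a linear residual $\rho_i(\beta) = y_i - w_i'\beta$ (so that $\tilde y_i = -w_i$), simulated data, a preliminary estimator based on the weighting matrix $[\sum_i q_i q_i'/n]^{-1}$ (i.e. 2SLS), and a hypothetical helper name `gmm_building_blocks`.

```python
import numpy as np

def gmm_building_blocks(y, W, Q, beta_tilde, t_hat):
    """Upsilon-hat, Gamma-hat, Omega-hat, tau-hat for a linear residual.

    Assumption (not from the paper): rho_i(beta) = y_i - W_i' beta,
    hence y-tilde_i = d rho_i / d beta = -W_i.
    """
    n = y.shape[0]
    rho = y - W @ beta_tilde                       # rho-tilde_i, shape (n,)
    y_tilde = -W                                   # rows are y-tilde_i', shape (n, p)
    Upsilon = (Q * rho[:, None] ** 2).T @ Q / n    # sum_i rho_i^2 q_i q_i' / n
    Gamma = Q.T @ y_tilde / n                      # sum_i q_i y-tilde_i' / n
    Omega = Gamma.T @ np.linalg.solve(Upsilon, Gamma)  # Gamma' Upsilon^{-1} Gamma
    tau = np.linalg.solve(Omega, t_hat)            # Omega^{-1} t-hat
    return Upsilon, Gamma, Omega, tau

# Simulated data (illustrative only): one endogenous regressor w, instruments in x.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
v = rng.normal(size=n)
w = x + 0.5 * x**2 + v
y = 1.0 * w + 0.5 * v + rng.normal(size=n)
W = w[:, None]
Q = np.column_stack([np.ones(n), x, x**2])         # q^K(x_i) with K = 3

# Preliminary estimator: GMM with weight matrix (Q'Q/n)^{-1}, which is 2SLS.
PW = Q @ np.linalg.solve(Q.T @ Q, Q.T @ W)
beta_tilde = np.linalg.solve(PW.T @ W, PW.T @ y)

t_hat = np.array([1.0])
Upsilon, Gamma, Omega, tau = gmm_building_blocks(y, W, Q, beta_tilde, t_hat)
```

With a single parameter, `Omega` is a 1 x 1 matrix and `tau` is simply `t_hat` divided by it; the same code applies unchanged when `W` has several columns.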
Further, let
\[ \tilde d_i = \hat\Gamma' \Big( \sum_{j=1}^n q_j q_j'/n \Big)^{-1} q_i, \quad \tilde\eta_i = \tilde y_i - \tilde d_i, \quad \hat\xi_{ij} = q_i' \hat\Upsilon^{-1} q_j/n, \quad \hat D_i^* = \hat\Gamma' \hat\Upsilon^{-1} q_i, \]
\[ \hat\Lambda(K) = \sum_{i=1}^n \hat\xi_{ii} (\hat\tau' \tilde\rho_{\beta i})^2, \quad \hat\Pi(K) = \sum_{i=1}^n \hat\xi_{ii} \tilde\rho_i (\hat\tau' \tilde\eta_i), \]
\[ \hat\Phi(K) = \sum_{i=1}^n \hat\xi_{ii} \{ \hat\tau' (\hat D_i^* \tilde\rho_i^2 - \tilde\rho_{\beta i}) \}^2 - \hat\tau' \hat\Gamma' \hat\Upsilon^{-1} \hat\Gamma \hat\tau, \]
where $\tilde\rho_{\beta i} = \partial\rho_i(\tilde\beta)/\partial\beta = \tilde y_i$. The criterion for the GMM estimator, without a bias correction, is
\[ S_{GMM}(K) = \hat\Pi(K)^2/n + \hat\Phi(K). \]
Also, let
\[ \hat\Pi_B(K) = \sum_{i,j=1}^n \tilde\rho_i \tilde\rho_j (\hat\tau' \tilde\eta_i)(\hat\tau' \tilde\eta_j) \hat\xi_{ij}^2 = \mathrm{tr}(\tilde Q \hat\Upsilon^{-1} \tilde Q \hat\Upsilon^{-1}), \]
\[ \hat\Xi(K) = \sum_{i=1}^n \{ 5 (\hat\tau' \tilde d_i)^2 - \tilde\rho_i^4 (\hat\tau' \hat D_i^*)^2 \} \hat\xi_{ii}, \quad \hat\Xi_{GEL}(K) = \sum_{i=1}^n \{ 3 (\hat\tau' \tilde d_i)^2 - \tilde\rho_i^4 (\hat\tau' \hat D_i^*)^2 \} \hat\xi_{ii}, \]
where $\tilde Q = \sum_{i=1}^n \tilde\rho_i (\hat\tau' \tilde\eta_i) q_i q_i'$. The criteria for the BGMM and GEL estimators are
\[ S_{BGMM}(K) = [\hat\Lambda(K) + \hat\Pi_B(K) + \hat\Xi(K)]/n + \hat\Phi(K), \]
\[ S_{GEL}(K) = [\hat\Lambda(K) - \hat\Pi_B(K) + \hat\Xi(K) + s_3 \hat\Xi_{GEL}(K)]/n + \hat\Phi(K). \]
For each of the estimators, our proposed instrument selection procedure is to choose $K$ to minimize $S(K)$. As we will show, this corresponds to choosing $K$ to minimize the higher-order MSE of the estimator.

Each of the terms in the criteria has an interpretation. For GMM, $\hat\Pi(K)^2/n$ is an estimate of a squared bias term from Newey and Smith (2004). Because $\hat\xi_{ii}$ is of order $K$, this squared bias term has order $K^2/n$. The $\hat\Phi(K)$ term in the GMM criterion is an asymptotic variance term. Its size is related to the asymptotic efficiency of a GMM estimator with instruments $q^K(x)$. As $K$ grows these terms will tend to shrink, reflecting the reduction in asymptotic variance that accompanies using more instruments. The form of $\hat\Phi(K)$ is analogous to a Mallows criterion, in that it is a variance estimator plus a term that removes bias in the variance estimator.

The terms that appear in $S(K)$ for BGMM and GEL are all variance terms. No bias terms are present because, as discussed in Newey and Smith (2004), under symmetry GEL removes the GMM bias that grows with $K$. As with GMM, the $\hat\Phi(K)$ term accounts for the reduction in asymptotic variance that occurs from adding instruments. The other terms are higher-order variance terms that will be of order $K/n$, because $\hat\xi_{ii}$ is of order $K$. The sum of these terms will generally increase with $K$, although this need not happen if $\hat\Xi(K)$ is too large relative to the other terms. Here $\hat\Xi(K)$ is an estimator of
\[ \Xi(K) = \sum_{i=1}^n \xi_{ii} (\tau' d_i)^2 \{ 5 - E(\rho_i^4 \mid x_i)/\sigma_i^4 \}, \]
where $\rho_i = \rho(z_i, \beta_0)$ and $\sigma_i^2 = E[\rho_i^2 \mid x_i]$. As a result, if the kurtosis of $\rho_i$ is too high the higher-order variance of the BGMM and GEL estimators would actually decrease as $K$ increases. This phenomenon is similar to that noted by Koenker et al. (1994) for the exogenous linear case. In this case the criteria could fail to be useful as a means of choosing the number of moment conditions, because they would monotonically decrease with $K$.

The terms $\hat\Xi(K)$ and $\hat\Xi_{GEL}(K)$ arise from heteroskedasticity consistent estimation of the optimal weight matrix for GMM. When they are positive, as they will be if the kurtosis is not too large, they represent a penalty for using a weight matrix that is optimal under heteroskedasticity. It has been noted in simulations (including those below) that using a heteroskedasticity consistent weight matrix when it is not needed tends to degrade the performance of GMM estimators. The presence of these extra terms provides a theoretical counterpart to this feature of GMM.

It is also interesting to note that $\hat\Xi(K)$ and $\hat\Xi_{GEL}(K)$ will be small relative to the other terms when identification shrinks, meaning that $d_i$ goes to zero as the sample size grows. Thus, under shrinking identification the estimation of the weight matrix does not need to be accounted for in the MSE. This fact simplifies considerably the instrument choice criteria and was also noted in Newey and Windmeijer (2009).

It is interesting to compare the size of the criteria for different estimators, a comparison that parallels that of the MSE. As previously noted, the squared bias term for GMM, which is $\hat\Pi(K)^2/n$, has the same order as $K^2/n$. In contrast, the higher-order variance terms in the BGMM and GEL estimators generally have order $K/n$, because that is the order of $\xi_{ii}$. Consequently, for large $K$ the MSE criterion for GMM will be larger than the MSE criteria for BGMM and GEL, meaning the BGMM and GEL estimators are preferred over GMM. This comparison parallels that in Newey and Smith (2004) and in Imbens and Spady (2005).

One interesting result is that for the CUE, where $s_3 = 0$, the MSE criterion is smaller than it is for BGMM, because $\hat\Pi_B(K)$ is positive. Thus we find that the CUE dominates the BGMM estimator in terms of higher-order MSE, i.e. the CUE is higher-order efficient relative to BGMM. This result is analogous to the higher-order efficiency of the limited information maximum likelihood estimator relative to the bias corrected two-stage least squares estimator that was found by Rothenberg (1983).

The comparison of the higher-order MSE for the CUE and the other GEL estimators depends on the kurtosis of the residual. For conditionally normal $\rho_i$ we have $E[\rho_i^4 \mid x_i] = 3\sigma_i^4$ and consequently $\hat\Xi_{GEL}(K)$ will converge to zero for each $K$, and all the GEL estimators have the same higher-order MSE. When there is excess kurtosis, with $E[\rho_i^4 \mid x_i] > 3\sigma_i^4$, ET will have larger MSE than the CUE, and EL will have larger MSE than ET, with these rankings being reversed when $E[\rho_i^4 \mid x_i] < 3\sigma_i^4$. These comparisons parallel those of Newey and Smith (2004) for a heteroskedastic linear model with exogeneity.

The case with no endogeneity has some independent interest. In this setting the GMM estimator can often be interpreted as using "extra" moment conditions to improve efficiency in the presence of heteroskedasticity of unknown functional form. Here the MSE criteria will give a method for choosing the number of moments used for this purpose. Dropping the bias terms, which are not present in exogenous cases, leads to criteria of the form
\[ S_{GMM}(K) = \hat\Xi(K)/n + \hat\Phi(K), \]
\[ S_{GEL}(K) = [\hat\Xi(K) + s_3 \hat\Xi_{GEL}(K)]/n + \hat\Phi(K). \]
Here GMM and the CUE have the same higher-order variance, as was found by Newey and Smith (2004). Also, as in the general case, these criteria can fail to be useful if there is so much kurtosis that the higher-order variance terms shrink as $K$ grows.

3. Assumptions and MSE results

As in Donald and Newey (2001), the MSE approximations are based on a decomposition of the form
\[ n\, t'(\hat\beta - \beta_0)(\hat\beta - \beta_0)' t = \hat Q(K) + \hat R(K), \qquad (3.2) \]
\[ E(\hat Q(K) \mid X) = t' \Omega^{*-1} t + S(K) + T(K), \]
\[ [\hat R(K) + T(K)]/S(K) = o_p(1), \quad K \to \infty, \ n \to \infty. \]
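To make the selection rule of Section 2.1 concrete, the following sketch evaluates $S_{GMM}(K) = \hat\Pi(K)^2/n + \hat\Phi(K)$ on a few candidate instrument sets and picks the minimizer. It transcribes the displayed formulas for $\tilde d_i$, $\tilde\eta_i$, $\hat\xi_{ii}$, $\hat D_i^*$, $\hat\Pi(K)$ and $\hat\Phi(K)$ and is only an illustration under assumptions that are not in the paper: a linear residual $\rho_i(\beta) = y_i - w_i'\beta$ with one endogenous regressor, power-series instruments in a scalar $x$, a 2SLS preliminary estimator, simulated data, and the hypothetical helper name `s_gmm`.

```python
import numpy as np

def s_gmm(y, W, Q, t_hat):
    """Estimated GMM criterion Pi(K)^2 / n + Phi(K) for one instrument matrix Q.

    Assumption (not from the paper): rho_i(beta) = y_i - W_i' beta,
    so y-tilde_i = rho-tilde_{beta i} = -W_i.
    """
    n = y.shape[0]
    # Preliminary estimator: GMM with weight (Q'Q/n)^{-1}, i.e. 2SLS.
    PW = Q @ np.linalg.solve(Q.T @ Q, Q.T @ W)
    beta = np.linalg.solve(PW.T @ W, PW.T @ y)
    rho = y - W @ beta                                   # rho-tilde_i
    Y = -W                                               # rows y-tilde_i'
    Ups = (Q * rho[:, None] ** 2).T @ Q / n              # Upsilon-hat
    Gam = Q.T @ Y / n                                    # Gamma-hat
    UiG = np.linalg.solve(Ups, Gam)                      # Upsilon^{-1} Gamma
    Omega = Gam.T @ UiG                                  # Omega-hat
    tau = np.linalg.solve(Omega, t_hat)                  # tau-hat
    d = Q @ np.linalg.solve(Q.T @ Q / n, Gam)            # rows d-tilde_i'
    eta = Y - d                                          # rows eta-tilde_i'
    xi = np.einsum("ik,ki->i", Q, np.linalg.solve(Ups, Q.T)) / n  # xi-hat_ii
    Dstar = Q @ UiG                                      # rows D*-hat_i'
    Pi = np.sum(xi * rho * (eta @ tau))
    inner = (Dstar * rho[:, None] ** 2 - Y) @ tau
    Phi = np.sum(xi * inner ** 2) - tau @ Omega @ tau
    return Pi ** 2 / n + Phi

# Simulated heteroskedastic design (illustrative only).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
v = rng.normal(size=n)
w = x + 0.5 * x**2 + v
y = 1.0 * w + 0.5 * v + (1.0 + 0.5 * np.abs(x)) * rng.normal(size=n)
W = w[:, None]
t_hat = np.array([1.0])

# Candidate sets: q^K(x) = (1, x, ..., x^{K-1})' for K = 2, ..., 5.
criteria = {K: s_gmm(y, W, np.vander(x, K, increasing=True), t_hat)
            for K in range(2, 6)}
K_hat = min(criteria, key=criteria.get)                  # selected K
```

Here the candidate sets happen to be nested, but nothing in `s_gmm` requires that; any collection of instrument matrices with distinct column counts could be compared in the same way. The BGMM and GEL criteria would add the $\hat\Lambda$, $\hat\Pi_B$, $\hat\Xi$ and $\hat\Xi_{GEL}$ terms to the same skeleton.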
In this decomposition, $X = [x_1, \ldots, x_n]'$, $t = \mathrm{plim}(\hat t)$, $\Omega^* = \sum_{i=1}^n \sigma_i^{-2} d_i d_i'/n$, and $d_i = E[\partial\rho_i(\beta_0)/\partial\beta \mid x_i]$. Here $S(K)$ is the part of the conditional MSE of $\hat Q$ that depends on $K$, and $\hat R(K)$ and $T(K)$ are remainder terms that go to zero faster than $S(K)$. Thus, $S(K)$ is the MSE of the dominant terms for the estimator. All calculations are done assuming that $K$ increases with $n$. The largest terms increasing and decreasing with $K$ are retained. Compared to Donald and Newey (2001) we have the additional complication that none of our estimators has a closed form solution. Thus, we use the first-order condition that defines the estimator to develop approximations to the difference $\sqrt{n}\, t'(\hat\beta - \beta_0)$, where remainders are controlled using the smoothness of the relevant functions and the fact that under our assumptions the estimators are all root-n consistent.

To describe the results, let
\[ \rho_i = \rho(z_i, \beta_0), \quad \rho_{\beta i} = \partial\rho_i(\beta_0)/\partial\beta, \quad \eta_i = \rho_{\beta i} - d_i, \quad q_i = q^K(x_i), \quad \kappa_i = E[\rho_i^4 \mid x_i]/\sigma_i^4, \]
\[ \Upsilon = \sum_{i=1}^n \sigma_i^2 q_i q_i'/n, \quad \Gamma = \sum_{i=1}^n q_i d_i'/n, \quad \tau = \Omega^{*-1} t, \]
\[ \xi_{ij} = q_i' \Upsilon^{-1} q_j/n, \quad E[\tau' \eta_i \rho_i \mid x_i] = \sigma_i^{\rho\eta}, \]
\[ \Pi = \sum_{i=1}^n \xi_{ii} \sigma_i^{\rho\eta}, \quad \Pi_B = \sum_{i,j=1}^n \sigma_i^{\rho\eta} \sigma_j^{\rho\eta} \xi_{ij}^2, \quad \Lambda = \sum_{i=1}^n \xi_{ii} E[(\tau' \eta_i)^2 \mid x_i], \]
\[ \Xi = \sum_{i=1}^n \xi_{ii} (\tau' d_i)^2 (5 - \kappa_i), \quad \Xi_{GEL} = \sum_{i=1}^n \xi_{ii} (\tau' d_i)^2 (3 - \kappa_i), \]
where we suppress the $K$ argument for notational convenience. The terms involving fourth moments of the residuals are due to estimation of the weight matrix $\Upsilon^{-1}$ for the optimal GMM estimator. This feature did not arise in the homoskedastic case considered in Donald and Newey (2001), where an optimal weight matrix depends only on the instruments.

We impose the following fundamental condition on the data, the approximating functions $q^K(x)$ and the distribution of $x$:

Assumption 1 (Moments). Assume that $z_i$ are i.i.d., and
(i) $\beta_0$ is the unique value of $\beta$ in $B$ (a compact subset of $R^p$) satisfying $E[\rho(z_i, \beta) \mid x_i] = 0$;
(ii) $\sum_{i=1}^n \sigma_i^{-2} d_i d_i'/n$ is uniformly positive definite and finite (w.p.1);
(iii) $\sigma_i^2$ is bounded and bounded away from zero;
(iv) $E(\eta_{ji}^{\iota_1} \rho_i^{\iota_2} \mid x_i) = 0$ for any non-negative integers $\iota_1$ and $\iota_2$ such that $\iota_1 + \iota_2 = 3$;
(v) $E(\|\eta_i\|^{\iota} + |\rho_i|^{\iota} \mid x_i)$ is bounded for $\iota = 6$ for GMM and BGMM and $\iota = 8$ for GEL.

For identification, this condition only requires that $E[\rho(z_i, \beta) \mid x_i] = 0$ has a unique solution at $\beta = \beta_0$. Estimators will be consistent under this condition because $K$ is allowed to grow with $n$, as in Donald et al. (2003). Consistency could also be achieved by using the approach of Dominguez and Lobato (2004) or Lavergne and Patilea (2008). Part of this assumption is a restriction that the third moments are zero. This greatly simplifies the MSE calculations. The last condition is a restriction on the moments that are used to control the remainder terms in the MSE expansion. The condition is more restrictive for GEL, which has a more complicated expansion involving more terms and higher moments. The next assumption concerns the properties of the derivatives of the moment functions. Specifically, in order to control the remainder terms we will require certain smoothness conditions so that Taylor series expansions can be used and so that we can bound the remainder terms in such expansions.

Assumption 2 (Expansion). Assume that $\rho(z, \beta)$ is at least five times continuously differentiable in a neighborhood $N$ of $\beta_0$, with derivatives that are all dominated in absolute value by the random variable $b_i$ with $E(b_i^2) < \infty$ for GMM and BGMM and $E(b_i^5) < \infty$ for GEL.

This assumption is used to control remainder terms and has as an implication that, for instance,
\[ \sup_{\beta \in N} \| (\partial/\partial\beta') \rho(z, \beta) \| < b_i. \]
It should be noted that in the linear case only the first derivative needs to be bounded since all other derivatives would be zero. It is also interesting to note that although we allow for nonlinearities in the MSE calculations, they do not have an impact on the dominant terms in the MSE. The condition is stronger for GEL, reflecting the more complicated remainder term. Our next assumption concerns the "instruments" represented by the vector $q^K(x_i)$.

Assumption 3 (Approximation). (i) There is $\zeta(K)$ such that for each $K$ there is a nonsingular constant matrix $B$ such that $\tilde q^K(x) = B q^K(x)$ for all $x$ in the support $\mathcal{X}$ of $x_i$, $\sup_{x \in \mathcal{X}} \|\tilde q^K(x)\| \le \zeta(K)$, $E[\tilde q^K(x) \tilde q^K(x)']$ has smallest eigenvalue that is bounded away from zero, and $\sqrt{K} \le \zeta(K) \le CK$ for some finite constant $C$. (ii) For each $K$ there exists a sequence of constant vectors $\pi_K$ and $\pi_K^*$ such that $E(\|d_i - q_i' \pi_K\|^2) \to 0$ and $\zeta(K)^2 E(\|d_i/\sigma_i^2 - q_i' \pi_K^*\|^2) \to 0$ as $K \to \infty$.

The first part of the assumption gives a bound on the norm of the basis functions, and is used extensively in the MSE derivations to bound remainder terms. The second part of the assumption implies that $d_i$ and $d_i/\sigma_i^2$ can be approximated by linear combinations of $q_i$. Because $\sigma_i^2$ is bounded and bounded away from zero, it is easily seen that for the same coefficients $\pi_K^*$, $\|d_i/\sigma_i - \sigma_i q_i' \pi_K^*\|^2 \le \sigma_i^2 \|d_i/\sigma_i^2 - q_i' \pi_K^*\|^2$, so that $d_i/\sigma_i$ can be approximated by a linear combination of $\sigma_i q_i$. Indeed the variance part of the MSE measures the mean squared error in the fit of this regression. Since $\zeta(K) \to \infty$, the approximation condition for $d_i/\sigma_i^2$ is slightly stronger than for $d_i$. This is to control various remainder terms where $d_i/\sigma_i$ needs to be approximated in a uniform manner. Since in many cases one can show that the expectations in (ii) are bounded by $K^{-2\alpha}$, where $\alpha$ depends on the smoothness of the function $d_i/\sigma_i^2$, the condition can be met by assuming that $d_i/\sigma_i^2$ is a sufficiently smooth function of $x_i$.

We will assume that the preliminary estimator $\tilde\beta$ used to construct the weight matrix is itself a GMM estimator with a weighting matrix that may not be optimal. We do not require either optimal weighting or that the number of moments increases, though to get consistency under Assumption 1(i) $K$ would need to grow. In other words we let $\tilde\beta$ solve
\[ \min_\beta \tilde g_n(\beta)' \tilde W_0 \tilde g_n(\beta), \qquad \tilde g_n(\beta) = (1/n) \sum_{i=1}^n \tilde q(x_i) \rho_i(\beta), \]
for some $\tilde K$ vector of functions $\tilde q(x_i)$ and some $\tilde K \times \tilde K$ matrix $\tilde W_0$, which potentially could be $I_{\tilde K}$ or could be random, as would be the case if more than one iteration were used to obtain the GMM estimator. We make the following assumption regarding this preliminary estimator.

Assumption 4 (Preliminary Estimator). Assume (i) $\tilde\beta \to_p \beta_0$; (ii) there exists some non-stochastic matrix $W_0$ such that $\tilde W_0 - W_0 \to_p 0$ and we can write $\tilde\beta = \beta_0 + \frac{1}{n} \sum_{i=1}^n \tilde\phi_i \rho_i + o_p(n^{-1/2})$, $\tilde\phi_i = -(\tilde\Gamma' W_0 \tilde\Gamma)^{-1} \tilde\Gamma' W_0 \tilde q_i$ with $\tilde\Gamma = \sum_{i=1}^n \tilde q(x_i) d_i'/n$ and $E[\rho_i^2 \tilde\phi_i \tilde\phi_i'] < \infty$.

Note that the assumption only requires that we use some root-n consistent and asymptotically normally distributed estimator. The asymptotic variance of the preliminary estimator will be,