Research in Economics 70 (2016) 238–261

Contents lists available at ScienceDirect

Research in Economics

journal homepage: www.elsevier.com/locate/rie

Modelling optimal instrumental variables for dynamic panel data models$

Manuel Arellano

CEMFI, Madrid, Spain article info abstract

Article history: Received 15 October 2015 This paper considers a new two-stage instrumental variable of panel data models with Accepted 15 November 2015 predetermined or endogenous explanatory variables. The instruments are fitted values from Available online 22 November 2015 period-specific first-stage equations based on all available lags, which are similar to those in Keywords: standard GMM estimation. The difference is that first-stage fitted values are not unrestricted but Panel data are chosen to satisfy the constraints implied by a VAR process with random effects. As a result Optimal instruments the number of free first-stage parameters is dramatically reduced, while retaining predictive Predetermined variables power from all lags. The are asymptotically efficient when the VAR restrictions hold, Double asymptotics but remain consistent if they do not. Since the instruments are parameterized using a fixed VAR with random effects fi Country growth equations number of coef cients for any value of T, the properties of the resulting estimators are not fundamentally affected by the relative dimensions of T and N, contrary to standard panel GMM. Empirical illustrations are reported using firm- and country-level panel data. & 2015 University of Venice. Published by Elsevier Ltd. All rights reserved.

1. Introduction

This paper considers a new instrumental variable method for estimating panel data models with general predetermined or endogenous explanatory variables. The instruments are fitted values from period-specific first-stage equations based on all available lags. First-stage coefficients are chosen to satisfy the constraints implied by a stable multivariate process. This is proposed as a framework for modelling optimal instruments in panel data analysis. A popular method in dynamic panel data estimation is GMM, which is consistent in short panels, robust, has general applicability, and provides a well-defined notion of optimality (Holtz-Eakin et al., 1988; Arellano and Bond, 1991). However, in practice the application of GMM often entails too many moment conditions for acceptable sampling properties in either finite orlargesampleswhenthetimeseriesdimensionisnotfixed (Alvarez and Arellano, 2003). For autoregressive models there are also available likelihood-based methods which exhibit better finite sample prop- erties than GMM but can be seriously biased if certain auxiliary assumptions are violated. Moreover, these methods cannot be readily extended to cover models with endogenous or general predetermined variables.1 There is therefore an acute robustness-efficiency trade-off in the choice among existing techniques. In addition, some methods are designed for short panels of large cross-sections, while others target long panels of small cross-sections, but there is a vacuum in between. The

☆ An earlier version of this paper was presented as an Invited Lecture at the European Meeting of the Econometric Society in Venice, August 2002. 1 Alvarez and Arellano (2004) and Moral-Benito (2013) developed likelihood-based estimators of autoregressive models and models with general predetermined variables, respectively, that are robust in the sense that remain fixed-T large-N consistent under the same assumptions as standard panel GMM techniques. http://dx.doi.org/10.1016/j.rie.2015.11.003 1090-9443/& 2015 University of Venice. Published by Elsevier Ltd. All rights reserved. M. Arellano / Research in Economics 70 (2016) 238–261 239 literature does not seem to have much to offer to researchers interested in “small N, small T” panels or other panels that are not easily classifiable. GMM estimators can be regarded as providing implicit models for the optimal instruments. The problem is that in the panel context these models are often overparameterized, leading to poor properties in finite samples and double asymptotics. This paper provides a framework for parsimonious modelling of optimal instruments in panel data, which is, first, coherent with the fixed T,large N perspective; second, has good theoretical properties in a double asymptotic setup, and thirdly provides estimators with the same robustness features as popular GMM methods. More research is needed on panel data methodology from a time series perspective, and this paper is intended as a contribution towards a marriage of the cross-sectional (fixed T) and time-series (long T) perspectives. Recent results by Newey and Smith (2004) on the higher order properties of empirical likelihood (EL) and GMM indicate that panel EL estimators may exhibit better finite sample properties than their GMM counterparts. However, while the double asymptotic properties of panel EL estimation remain to be explored, the fact that a panel EL estimator will be inconsistent for fixed N,largeT (as long as it is based on an increasing number of moment conditions) suggests that EL estimation will be less robust to double asymptotic plans than the instrumental variable methods considered in this paper. The paper is organized as follows. Section 2 presents the model, links GMM with the parametric optimal-instrument per- spective, and introduces projection-restricted simple IV estimation (SIV). Section 3 discusses the asymptotic biases of one-step GMM when both T and N tend to infinity. It is shown that the order of magnitude of the bias depends on whether the explanatory variables are predetermined or endogenous, and that in the latter situation GMM is inconsistent. In Section 4 we present an auxiliary random effects VAR model for the vector of instruments, and obtain sequential linearprojectionsoftheeffects.Section 5 describes the form of optimal instruments and provides several examples. Section 6 considers pseudo maximum likelihood estimation (PML) of the auxiliary VAR model. Section 7 discusses the properties of feasible projection-restricted IV estimators with and without strictly endogenous variables, and the calculation of asymptotic standard errors. Section 8 contains empirical illus- trations and Monte Carlo simulations. Section 8.1 reports estimates of autoregressive employment and wage equations from firm panel data; Section 8.2 presents the results of a simulation exercise calibrated to the previous firm panel, and Section 8.3 reports estimates of country growth convergence rates using a panel of 92 countries observed at five-year intervals. Finally, Section 9 ends with some concluding remarks and plans for future work. All proofs and technical details are contained in Appendices.

2. Model and optimal instruments: overview

2.1. A sequential conditional mean model

Let us consider a fixed effects panel data model of the form 0 β η ε ; …; ; ; …; ; yit ¼ xit þ i þ it ðt ¼ 1 T i ¼ 1 NÞ ð1Þ together with the conditional mean assumption ε t ε Eð itjzi ÞEtðÞ¼it 0 ð2Þ t 0 ; …; 0 0 T ; T ; T ; η η where zi ¼ðzi1 zitÞ and yi xi zi i are iid random variables; i represents an unobservable individual effect and zit is a vector of instrumental variables. The following remarks about the nature of the explanatory variables and the instruments are relevant. First, if an explanatory ε variable xkit is predetermined for itðÞþ j ,thenxkiðÞ t j is a component of zit. Second, an xkit may also be strictly endogenous in the sense of not being predetermined for any lead of ε.Finally,zit may contain external instruments that are not part of xit or its lags. An example of this type of model is an equation from a VAR with individual effects. Other examples are partial adjustment equations with predetermined regressors, or a structural relationship between endogenous variables. As illustrations of the latter we consider below cross-country growth and household consumption Euler equations.

2.2. Information bound and optimal instruments

η ∣ T β Since the distribution of i zi is unrestricted, all information about is in the conditional moments for the errors in differences or forward orthogonal deviations 0β ε ; …; Et yit xit ¼ Et it ¼ 0 ðÞt ¼ 1 T 1 ð3Þ where starred variables denote orthogonal deviations:

= T t 1 2 1 ε ¼ ε ε þ⋯þε : ð4Þ it T t þ1 it ðT tÞ iðt þ 1Þ iT ε ε The advantage of orthogonal deviations is that if it is homoskedastic and serially uncorrelated so is it (Arellano and Bover, 1995). Two-wave panel:IfT¼2 there is just one equation in deviations (which coincides with first-differences): 0 β∣ 1 ε Eyi1 xi1 zi ¼ E1 i1 ¼ 0 ð5Þ 240 M. Arellano / Research in Economics 70 (2016) 238–261 = ε2 and the optimal instrument is hi1 ¼ E1 xi1 E1 i1 , in the sense that the unfeasible IV estimator ! XN 1 XN β~ 0 1 ¼ hi1xi1 hi1yi1 ð6Þ i ¼ 1 i ¼ 1 attains the variance bound for this problem (Chamberlain, 1987). ε2 A parametric approach to feasible estimation is to specify functional forms for E1 xi1 and E1 i1 and substitute suitable 1 estimates in the IV formula (6). 2SLS is an example of this approach that uses the sample linear projection of xi1 on zi as an ε2 2 estimate of the optimal instrument. That is, 2SLS attains the variance bound when E1 xi1 is linear and E1 i1 is constant. Thus, in a parametric approach to feasible IV estimation there are two levels of assumptions: the substantive conditional moment restrictions used in estimation, and the auxiliary assumptions used in estimating the optimal instruments. The former are related to consistency and the latter to asymptotic efficiency. This distinction between substantive and auxiliary assumptions is central to the perspective adopted in this paper. Multi-wave panel:IfT 42 the form of the optimal instrument is complicated by the fact that neither conditional het- ε eroskedasticity or autocorrelation in it is ruled out. The unfeasible optimal IV estimator in the general case solves XN TX 1 ~ ~ 0 β~ hit yit xit ¼ 0 ð7Þ i ¼ 1 t ¼ 1 ~ = ε~ 2 where hit ¼ Et ðÞxit Et it and ε~ ε iTðÞ 1 ¼ iTðÞ 1 ð8Þ

ε~ ε τ ε~ ⋯ τ ε~ ; …; it ¼ it t1 itðÞþ 1 tTðÞ t 1 iTðÞ 1 ðÞt ¼ T 2 1 ð9Þ τ ε ε~ = ε~ 2 with tj ¼ Et þ j it itðÞþ j Et þ j itðÞþ j . ~ ~ ~ ~ The εit are forward filtered errors such that EtðÞ¼εit 0 and Et þ j εitεitðÞþ j ¼ 0 so that the bound for T 1 periods is the sum of the bounds for each period (Chamberlain, 1992): 0 1 TX 1 E ðÞx~ E ðÞx~ 0 J ¼ E@ t itt it A: ð10Þ T 1 ε~ 2 t ¼ 1 Et it

ε ε~ ε If the it's are conditionally homoskedastic and serially uncorrelated it ¼ it. The optimal instrument is hit t ht zi ¼ EtðxitÞ, and the unfeasible IVE ! ! XN TX 1 1 XN TX 1 β~ 0 ¼ hitxit hityit ð11Þ i ¼ 1 t ¼ 1 i ¼ 1 t ¼ 1 attains the fixed-T, large-N variance bound. Generalized method of moments: A One-step GMM estimator (GMM1) is an example of the parametric approach to t feasible IVE that uses the cross-sectional sample linear projections of xit on zi as an estimate of the optimal instruments. GMM1 is based on the specification t 0 π ⋯ 0 π t0π ; ht zi ¼ zi1 t1 þ þzit tt zi t ð12Þ together with assumptions of homoskedasticity and lack of serial correlation. In a time series context (large T, fixed N), these projections cannot be consistently estimated without further restrictions. - fi t0πb But as N 1 for xed T, the IVE that uses the sample projection zi t has the same asymptotic distribution as the unfeasible t0π IVE based on the population projection zi t. However, when both T and N tend to infinity at the same rate, the feasible and unfeasible estimators differ as shown for autoregressive models in Alvarez and Arellano (2003), who found that GMM1 had an asymptotic bias of order 1=N. In the next Section we obtain the asymptotic bias of GMM1 for models (1)and(2)whenbothT and N tend to infinity. We show that the order of magnitude of the bias depends on whether the explanatory variables are endogenous or predetermined. In the predetermined casethebiasisoforder1=N—as in Alvarez and Arellano (2003)—but in the endogenous case the bias is of order T=N, and hence a potentially more serious problem. These results provide theoretical support for the approach developed in this paper, because the proposed estimators are immune to asymptotic bias in a double asymptotics. t Under heteroskedasticity or autocorrelation, it is possible to obtain a GMM estimator based on the moments Ezi vit ¼ 0 fi βb that is more ef cient than GMM1 by using as weight matrix the inverse of a robust estimate of the variance matrix of the conditions. These are the standard two-step robust GMM2 estimators (Arellano and Bond, 1991). A GMM2 estimator, however, will not attain the efficiency bound in general, although it may attain it under more general conditions

2 Incorporating these assumptions in estimation may reduce the variance bound for β. A trade-off between robustness and efficiency arises, since estimates of β that exploit the extra restrictions may be inconsistent if they are false. M. Arellano / Research in Economics 70 (2016) 238–261 241 than GMM1. Specifically, GMM2 will attain the bound under serial correlation and time series heteroskedasticity of the form 2 t 2 σ2 t þ j σ 4 3 Eðvitjzi Þ¼EðvitÞ¼ t and Eðvitviðt þ jÞjzi Þ¼Eðvitviðt þ jÞÞ¼ tðt þ jÞ for j 0. Semiparametric asymptotically efficient estimation for fixed T: It is also possible to devise estimators that attain the fixed-T efficiency bound under more general assumptions than GMM1 or GMM2. This could be achieved by considering a semiparametric estimator that replaces the unknown functions in (11) or (7) by nonparametric estimates. The former would be asymptotically efficient when t fi the conditional expectations Eðxit jzi Þ are nonlinear but the errors are classical. The latter might be ef cient in the general case. Estimators of this type have been considered by Hahn (1997). Nevertheless, since these estimators use even more flexible estimates of t = fi Eðxit jzi Þ than GMM1, they would not be expected to perform better than GMM1 unless T N is suf ciently close to zero.

2.3. The crude time-series parametric approach

The conventional time-series parametric approach specifies Xq t 0 γ ht zi ¼ zitðÞ j j ð13Þ j ¼ 0

4 for some q. Eq. (13) has constant coefficients γj that can be consistently estimated time series-wise. It is based on assumptions about the stability and degree of dependence in the zit process. Panel data examples of the time-series approach are the stacked IV estimators for autoregressive models due to Anderson and Hsiao (1982). From the point of view of fixed-T optimal IV estimation, however, the crude time-series approach has two undesirable features. First, (13) cannot be calculated for periods without sufficient observations of the initial lags, hence requiring trimming of the time series and loss of some cross-sections. — — A second less obvious problem is that if the zit process contains heterogeneous intercepts as may be expected given model (1) t ht zi will depend on all lags, even if at the individual level z is characterized by stable low-order dependence of the kind that supports (13) in a time series context. The reason is that all lags are predictors of the effects (as described in Section 4). Since the predictions are updated each period, the coefficients of the predictor change over time. This is precisely the motivation behind (12).

2.4. Projection-restricted IV estimation

t0π fi fi The problem with zi t is that it often contains too many coef cients for good nite sample inference. If we have an individual-effects, stable process for z characterized by a parameter vector γ, the πt are functions of γ. Therefore, instead of an unrestricted linear projection we may consider a restricted one: t t0π γ : ht zi ¼ zi t ð14Þ These restrictions are similar to the low order, stable dependence assumptions implicit in the time-series approach. The difference is that we are using them in a fashion consistent with the fixed-T panel perspective. For a long-T panel the outcome is essentially the same as in the crude time-series approach. For a very short-T panel the outcome is essentially the same as in the standard GMM approach. But for other panels the number of first-stage coefficients is kept constant, while using all waves and the predictive ability due to unobserved heterogeneity.

The suggested strategy is to specify a VAR for zit with individual effects and unrestricted initial conditions. The status of this assumption is similar to that of (13) in the time-series approach. Then use as instruments the estimated restricted t0π γ fi projections zi t , whose evaluation involves, as we shall see, a straightforward recursive Kalman- lter calculation. Unrestricted πt in GMM are estimated by cross-sectional OLS. Restricted πt γ will be estimated using a modified WG estimator, which is a multivariate generalization of the random effects PML in Alvarez and Arellano (2003). We then obtain ! XX 1XX βb b 0 b : ¼ hitxit hityit ð15Þ i t i t b t0πb πb b t0π γb γb GMM sets hit ¼ zi t where t is an OLS estimate, and the projection-restricted IVE sets hit ¼ zi t where is a PML estimate. For fixed T and large N both estimators have similar robustness properties in the sense of being consistent under the fi fi same assumptions, but when T is not xed the latter is immune to asymptotic biases because the number of rst-stage coefficients does not increase with T. If the auxiliary assumptions πt ¼ πt γ are violated, projection-restricted IV remains a consistent estimate of β. The goal is, therefore, to use the time-series parametric approach in such a way that the resulting estimator can still achieve the fixed T efficiency bound under certain assumptions, in the same way as standard GMM only achieves the fixed T bound under certain auxiliary assumptions.

3 A more restrictive two-step GMM estimator based on a weight matrix that only depends on the data second-order moments (Arellano and Bond, 1991, footnote 2) will attain the bound under the same conditions as the standard GMM2. 4 t In this case it is better to think of ht zi as Et xit xitðÞþ 1 because xit involves a different number of terms for each t. 242 M. Arellano / Research in Economics 70 (2016) 238–261

2.5. Related approaches

Brundy and Jorgenson (1971) is a classic precedent in the simultaneous equations literature of the perspective adopted in this paper. The Brundy–Jorgenson estimator relied on preliminary structural form parameters to construct feasible optimal instru- ments, which avoided estimating first-stage reduced form equations with a large number of variables included in them. There is also a parallel with the asymptotic framework and estimation approach in Sargan (1975). Sargan (1975) argued that conventional asymptotic theory, in which N-1 and the model remains constant, was inadequate for large models where the total number of variables was large relative to the sample size. He considered instead an asymptotics in which the number of instruments and the number of equations increased with sample size. He focused on estimation of a fixed number of parameters occurring in a subsystem of equations and studied the asymptotic properties of a Brundy–Jorgenson iterated IV estimator that used the over- identifying restrictions in forming the instruments.5 Similar to Sargan (1975), the panel data models in this paper can be regarded as simultaneous systems in which the number of equations and the number of instruments tend to infinity as T increases. Feasible parametric implementations of optimal instruments have been considered by West and Wilcox (1996) and West et al. (2009) in the estimation of linear time series models with moving average disturbances; their notion of optimality is as introduced in Hansen (1985) for conditional mean models with dependent observations. In common with our panel methods, these time series estimators will not attain the efficiency bound if the auxiliary parametric specification is incorrect, but retain consistency and some finite sample advantages relative to conventional GMM, due to the smaller number of parameters involved in the construction of the instruments.

3. Double-asymptotic biases

In this section we obtain the asymptotic bias of the one-step GMM estimator when both T and N tend to infinity. We show that the order of magnitude of the bias depends on whether the explanatory variables are endogenous or predetermined. It is well known that if the number of first-stage coefficients is large relative to the sample size, standard asymptotic approximations may be a poor guide to the finite sample properties of IV estimates, specially when the instruments are weak. In the panel context, double-asymptotic results provide formal approximations to the impact of nonnegligible T (and hence of many instruments) relative to N on the properties of GMM estimates. They also provide a theoretical motivation for the strategy for reducing the number of first-stage coefficients pursued in this paper. The form of the GMM estimator is

! TX 1 1 TX 1 βb 0 0 ; ¼ Xt Mt Xt Xt Mtyt ð16Þ t ¼ 1 t ¼ 1 0 1 0 ; …; 0 t ; …; t 0 ; …; 0 where Mt ¼ Zt ZtZt Zt is N N, Xt ¼ x1t xNt is N k, Zt ¼ z1 zN is N mt, and yt ¼ y1t yNt is N 1, or in a more compact notation b 0 1 0 β ¼ X MX X My: ð17Þ

3.1. Assumptions fi ε ; 0 ; 0 0 Let us de ne the vector of variables wit ¼ it xit zit . Depending on the model, x and z may contain elements in com- mon. Also, some of the xs may be lags of y, and some of the zs may be lags of y and/or x. We make the following assumption:

Assumption 1. wit can be represented as a vector MA(1) of the form μ ζ Ψ ζ Ψ ζ ⋯ wit ¼ i þ it þ 1 itðÞ 1 þ 2 itðÞ 2 þ ð18Þ Ψ 1 ζ 1 fi where j j ¼ 0 is absolutely summable, and it t ¼1 is an i.i.d. sequence independent of the individual-speci c mean vector μi and with finite fourth-order moments. Moreover, the first element of μi is zero, and

X1 γ o jjs s ¼ B1 1 ð19Þ s ¼1 γ ε where s ¼ Exit itðÞ s . GMM estimators are typically motivated under less restrictive conditions than Assumption 1, which is only made for simplicity, since our purpose is to exhibit some consequences of double-asymptotics in a leading situation, rather than providing an alter- native basis for inference with GMM estimates. The results that follow are expected to hold under more general conditions.

5 Sargan's work on instrumental variables estimation is reviewed in Arellano (2002). M. Arellano / Research in Economics 70 (2016) 238–261 243 Predeterminedness and endogeneity:Ifxit is predetermined for εit, then xit ‘ εit; εitðÞþ 1 ; εitðÞþ 2 ; … but xit e εitðÞ 1 ; ε ; … γ γ ⋯ γ a 4 itðÞ 2 Þ, so that 0 ¼ 1 ¼ ¼ 0 but s 0 for s 0. If on the other hand xit is strictly endogenous xit e …; εitðÞ 2 ; εitðÞ 1 ; εit; εitðÞþ 1 ; εitðÞþ 2 ; … ; γ a ε ; ε ; … ε in which case s 0 for all s. Finally, if xit ‘ itðÞþ j itðÞþ j þ 1 we say that xit is endogenous but predetermined for itðÞþ j , and the corresponding γs vanish.

3.2. The order of the estimation error

Let us write the standardized estimation error as

! TX 1 1 TX 1 b 1 1 β β ¼ X0M X X0M ε A 1b : ð20Þ NT t t t NT t t t NT NT t ¼ 1 t ¼ 1 A key result is given in the following theorem. P 1 T 1 0 ε ε Theorem 1. Letting bNT ¼ ðÞNT t ¼ 1 Xt Mt t , under Assumption 1 if xit is predetermined, in the sense that E xit itðÞþ j ¼ 0 for jZ0, then m EbðÞ¼O : ð21Þ NT N

If xit is endogenous, in the sense that EðÞ xitεit a0, then mT EbðÞ¼O : ð22Þ NT N

Intuitively, in the predetermined case the “endogeneity bias” vanishes as T-1 because the within-group OLS (WG) estimator is large-T consistent. However, WG is not T-consistent in the endogenous explanatory variable case. These results also highlight the impact of m—the dimension of the instrument vector zit—on the order of magnitude of the bias.

3.3. The asymptotic bias of GMM

The previous setup can be used to extend the results in Alvarez and Arellano (2003) to analyzing the asymptotic properties of GMM estimators with general predetermined or endogenous explanatory variables. Here we provide an – informal discussion of consistency by drawing parallels with the corresponding Alvarez Arellano results. 0 Let us first consider the probability limit of ðÞNT 1 X MX . It is useful to introduce at this point the time series individual-specific linear projection of xit on zit; zitðÞ 1 ; zitðÞ 2 ; … : X1 ∣ ; ; … ϕ Φ : pit E xit zit zitðÞ 1 ¼ i þ jzitðÞ j ð23Þ j ¼ 0

Letting ξit be the corresponding projection error, we have ξ : xit ¼ pit þ it ð24Þ ξ If xit is predetermined then xit ¼ pit and the errors it are identically zero for all i and t. In such a case, using similar arguments as in Alvarez and Arellano (2003), when T=N-co1 we can obtain 1 0 μ μ 0 plim X MX ¼ Exit xi xit xi ð25Þ T-1;N-1NT μ μ ; μ0 ; μ0 0 where xi corresponds to the partition i ¼ 0 xi zi . However, when xit is endogenous we have

1 0 1 0 1 0 1 0 1 0 X MX ¼ P MP þ Ξ MP þ P MΞ þ Ξ MΞ ð26Þ NT NT NT NT NT Ξ ξ – where X ¼ P þ denotes the matrix and orthogonal deviation counterpart to xit ¼ pit þ it. By analogy with the Alvarez Arellano results, when T=N-co1 we have 1 0 p 0 P MP-Epϕ p ϕ ð27Þ NT it i it i 1 0 m P MΞ ¼ O ð28Þ NT N 244 M. Arellano / Research in Economics 70 (2016) 238–261 1 0 mT Ξ MΞ ¼ O : ð29Þ NT N 0 Decomposition (24) can also be used to examine the probability limit of ðÞNT 1 X Mε .Wehave

1 0 1 0 1 0 plim X Mε ¼ plim P Mε þ plim Ξ Mε: ð30Þ T-1;N-1NT T-1;N-1NT T-1;N-1NT

Extending the Alvarez–Arellano results (Lemma 2), it turns out that the first term on the right hand side of (30) vanishes, but 0 ðÞNT 1 Ξ Mε is of order OmT=N .

Thus, when xit is endogenous and T=N-co1, the form of the asymptotic bias is given by () 1 βb β ϕ ϕ 0 1 Ξ0 Ξ 1 Ξ0 ε plim ¼ Epit i pit i þ plim M plim M ð31Þ T-1;N-1 T-1;N-1NT T-1;N-1NT or

no 1 βb β ϕ ϕ 0 ξ ξ0 mc γ mc: plim ¼ Epit i pit i þE it it 0 ð32Þ T-1;N-1 2 2 γ ξ o Note that when x is predetermined 0 ¼ 0 and it 0, so that the bias vanishes as long as c 1. The conclusion is that GMM is inconsistent in the endogenous explanatory variable situation. In the predetermined case it is consistent, but it can be expected to exhibit a bias in the asymptotic distribution of order Om=N , similar to those reported in Alvarez and Arellano (2003) for autoregressive models. The intuition for these results is that in the predetermined case the endogeneity bias tends to disappear as T increases, whereas in the endogenous case it does not. So having an increasing number of moment conditions results in a larger bias when the explanatory variables are endogenous. The previous analysis is informative about the bias term. Okui (2009) discussed a bias-variance trade-off in the number of moment conditions used for GMM estimation of an autoregressive panel model. He proposed a method for selecting the number of instruments that minimizes a Nagar approximation to the estimator's mean square error when N and T tend to infinity. It would be interesting to extend these results to models with predetermined or endogenous explanatory variables.

4. Auxiliary VAR with individual effects

Let us consider a stable VAR(1) process for an m 1 vector zit: μ ; …; zit ¼ Aziðt 1Þ þðI AÞ i þvit ðt ¼ 1 TÞ ð33Þ t 1 Evit∣z ; μ ¼ 0 ð34Þ i i 0 t 1 0 ; …; 0 Ω where zi ¼ zi0 ziðt 1Þ , and VarðÞ¼ vit t. For notational convenience we assume here that zi0 is also observed. It is appropriate to conduct our discussion at the multivariate level because typically the conditioning variable will be a vector rather than a scalar, even if we are dealing with a single-equation model. However, we restrict attention to a first- order process for simplicity. Generalization to higher-order processes is cumbersome but straightforward. If the process for individual i started in the infinite past:

X1 μ j : zi0 ¼ i þ A viðjÞ ð35Þ j ¼ 0

Since we wish to allow for the possibility that the process started in any given period, and that different individuals ; μ started at different times, we treat zi0 i as realizations of some arbitrary cross-sectional joint distribution. μ μ μ Ω μ Let E i ¼ , Var i ¼ μ, and let the linear projection of zi0 on i be † zi0 ¼ τ0 þΥ 1μ þv ð36Þ i i0 † † ¼ ¼ Γ τ Υ Γ where Evi0 0 and Var vi0 0. In general, 0, 1, and 0 are free parameters, but if theP process is mean stationary τ Υ Ω Ω Γ 1 jΩ j0 μ 0 ¼ 0, and 1 ¼ Im. If the process is also stationary in variance, t ¼ for all t, and 0 ¼ j ¼ 0 A A . In any event, i denotes the mean of the steady state distribution of the process for individual i, and μ is the cross-sectional mean of μi. VAR forecasts:Fors40wehave μ s μ ⋯ s 1 : zitðÞþ s ¼ i þA zit i þvitðÞþ s þAvitðÞþ s 1 þ þA vitðÞþ 1 ð37Þ Therefore, μ s μ Et zitðÞþ s ¼ Et i þA zit Et i ð38Þ M. Arellano / Research in Economics 70 (2016) 238–261 245 and "# 1 TX t E z ¼ c z E z ð39Þ t it t it T t t itðÞþ s s ¼ 1 so that ! 1 TX t E z ¼ c I As z E μ : ð40Þ t it t T t it t i s ¼ 1 μ fi Notice that Et zit only depends on zitgiven Et i . This is so because we are dealing with a rst-order VAR. However, since μ Et i depends on all lags, so does Et zit . Similarly, 1 E z ¼ c z z þ⋯þz þE z þ⋯þz ð41Þ t itðÞ j t itðÞ j T t itðÞ j þ 1 it t itðÞþ 1 iTðÞ j where ! T X t j ⋯ μ s μ : Et zitðÞþ 1 þ þziTðÞ j ¼ ðÞT t j Et i þ A zit Et i s ¼ 1 : : μ The same is true if conditional expectations EtðÞare replaced by linear projections Et ðÞ. The form of Et i is given in the following Theorem.

Theorem 2 (Sequential linear projections of the effects). For the VAR(1) model presented above, the linear projection of the μ t Z vector of individual effects i on zi is given by the following recursive updating formula for t 0: μ H 1δ Et i ¼ t it ð42Þ where H Ω Υ 0 Γ 1Υ 0 ¼ I þ μ 1 0 1 ð43Þ

δ μ Ω Υ 0 Γ 1 τ i0 ¼ þ μ 1 0 ðÞzi0 0 ð44Þ and for t Z1: H H Ω 0Ω 1 t ¼ t 1 þ μðÞI A t ðÞIA ð45Þ δ δ Ω 0Ω 1 : it ¼ itðÞ 1 þ μðÞI A t zit AzitðÞ 1 ð46Þ

For the stationary case, by inserting τ0 ¼ 0, Υ 1 ¼ Im, and Ωt ¼ Ω for all t in the result given in the Theorem, we get: "# Xt μ Λ Λ 1 μ Λ Λ : Et i ¼ I þ 0 þt 1ðI AÞ þ 0zi0 þ 1 zis Aziðs 1Þ ð47Þ s ¼ 1 Λ Ω Γ 1 Λ Ω 0Ω 1 where 0 ¼ μ 0 and 1 ¼ μðI AÞ (further details on the stationary case are in the Appendix). Note that the linear projection of the vector of individual effects in a VAR(1) model with mean stationarity and constant Ω but no covariance stationarity is of the same form as the result in (47), but treating Γ0 as an unrestricted covariance matrix. Ω Ω Another intermediate possibility is one in which t ¼ for all t, but initial conditions are left fully unrestricted, so that τ Υ Γ μ fi 0, 1, and 0 are treated as free parameters. This is a case of special interest because Et i depends on a xed number of parameters which does not increase with t.

5. The form of the optimal instruments

An expression for the optimal instruments is given by6 ⋯ † Et xit ¼ B0Et zit þB1Et zitðÞ 1 þ þBqEt zitðÞ q ¼ B Et zitðÞ q ð48Þ 0 † ; …; 0 ; …; 0 where B ¼ B0 Bq and zitðÞ q ¼ zit zitðÞ q .

6 Expression (48) is saying that ⋯ ∣ t Exit B0zit B1zitðÞ 1 BqzitðÞ q zi ¼ 0 246 M. Arellano / Research in Economics 70 (2016) 238–261 ε † If all x's are predetermined for some lead of , the form of Et xit and B is known given knowledge of the parameters of the VAR for zit, so that estimation of β can be based on the sample moments XX b 0β Et xit yit xit ð49Þ i t b β where Et xit is an estimate of Et xit . Since the optimal instrument has the same dimension as no further weighting of the moments is required. † If on the other hand some of the x's are strictly endogenous, then q and some of the elements of B are unknown. Thus, we consider GMM estimates of β based on the moments XX b 0β : Et zitðÞ q yit xit ð50Þ i t In this case the instruments may have a larger dimension than β so that weighting of the moments may be required. † Indeed, the choice of weight matrix here is equivalent to choosing an estimation method for B . A suggested weighting is discussed below.7 Next, we consider some examples. Example 1: An equation from a VAR.Wehave 0 η ε w1it ¼ a1witðÞ 1 þ i þ it ð51Þ ε ∣ t 1 : E it wi ¼ 0 ð52Þ

In this case zit ¼ witðÞ 1 and xit ¼ zit, so that B0 ¼ I, and Bj ¼ 0 for j40. When the model of interest is a VAR, the auxiliary and substantive models coincide, except for the fact that the auxiliary model is based on stronger assumptions than the substantive model. In contrast, in conditional models the auxiliary assumptions give a parametric form to the feedback processes, which in the substantive model will typically remain unspecified.8 Example 2: Partial adjustment regression.Wehave α β β η ε yit ¼ yitðÞ 1 þ 0wit þ 1witðÞ 1 þ i þ it ð53Þ E ε ∣yt 1; wt ¼ 0: ð54Þ it i i 0 0 ; ; ; In this case zit ¼ yitðÞ 1 wit , xit ¼ yitðÞ 1 wit witðÞ 1 , 0 1 0 1 0 1 yitðÞ 1 10 00 B C B C B C xit ¼ @ wit A ¼ @ 01Azit þ@ 00AzitðÞ 1 B0zit þB1zitðÞ 1 : ð55Þ witðÞ 1 00 01

Example 3: Cross-country growth: The next example is an augmented Solow model of the determinants of growth as in Caselli et al. (1996). The equation is α γ 0 δ η ε yit ¼ yitðÞ 1 þsitðÞ 1 þf itðÞ 1 þ i þ it ð56Þ ε ∣ t 1; t 1; t 2 : E it yi si f i ¼ 0 ð57Þ fl The time interval is 5 years, yit is log per-capita GDP, f itðÞ 1 is a vector of ow variables containing the rates of investment and population growth, and s is a stock variable measuring the secondary school enrollment rate. itðÞ 1 0 ; ; 0 In this case zit ¼ yitðÞ 1 sitðÞ 1 f itðÞ 2 , and 0 1 0 10 1 0 1 yitðÞ 1 100 yitðÞ 1 0 B C B CB C B C @ sitðÞ 1 A ¼ @ 010A@ sitðÞ 1 Aþ@ 0 A ð58Þ

f itðÞ 1 A31 A32 A33 f itðÞ 2 ufiðÞ t 1 where the last equation coincides with the equation for f itðÞ 2 in the VAR model of zit. Thus, : Et xit ¼ B0Et zit ð59Þ

Since B0 is in general a squared, nonsingular matrix, it is irrelevant in the construction of optimal instruments, and the

(footnote continued) fi t fi fi or that the time-series individual-speci c projection of xit on zi only depends on the rst q lags of zit and satis es ⋯ ξ ∣ t Exit B0zit B1zitðÞ 1 BqzitðÞ q i zi ¼ 0. † 7 Note that estimation of an extended VAR for x0 ; z0 subject to exclusion restrictions (of lagged x's) is not warranted. The reason is that the B are it it t meant to be the coefficients of a linear projection of xit on zi and individual effects, and there is no reason why the errors in this projection should be serially uncorrelated. 8 Optimal instruments for scalar autoregressive models are discussed in Appendix C. M. Arellano / Research in Economics 70 (2016) 238–261 247 ∣ t optimal instrument in the growth example can be simply taken to be Ezit zi . So in both the VAR and growth examples we have B0 ¼ I and Bj ¼ 0 for j40. Example 4: Euler equation for household consumption: The last example is an Euler equation of the type considered by Zeldes (1989) and others: αΔ β0 η ε lnðÞ¼ 1þrit ln cit þ wit þ i þ it ð60Þ where wit is a vector of changes in family size variables, rit is the rate of return on a riskless asset, and cit is consumption; ηi captures heterogeneity in discount rates, and εit unobservable changes in tastes and expectation errors. The vector of instruments zit contains wit, lagged income, and marginal taxes, so that both returns and consumption growth are treated as strictly endogenous variables. 0 ; 0 0 In this case, letting zit ¼ wit z2it ,wehave Xq Δ π0 ζ ξ ln cit ¼ jzitðÞ j þ i þ it ð61Þ j ¼ 0

wit ¼ ðÞI; 0 zit ð62Þ or ! ! 0 0 Δln cit π π ¼ 0 þ⋯þ q : ð Þ Et ; Et zit Et zitðÞ J 63 wit I 0 0 Thus, there is a nontrivial specification of the B0…Bq due to the presence of strictly endogenous variables and external instruments.

6. PML estimation of the VAR model

To obtain feasible estimators, the coefficients parameterizing the optimal instruments must be replaced by sample estimates. The suggested estimates are Gaussian pseudo-maximum likelihood of the VAR auxiliary model. The estimates are obtained under the assumptions that the data are normally distributed and the error variance matrices are homoskedastic, but the multivariate linear projection of the initial observations on the effects is left unrestricted. Estimates of this type were considered by Blundell and Smith (1991), and Alvarez and Arellano (2003) for a scalar autoregressive model.9 Alvarez and Arellano obtained a useful concentrated likelihood that only depended on the autoregressive para- meter, and found that the maximizer of this criterion behaved very well in simulations.10 This section reports a multivariate generalization of the PML results of Alvarez and Arellano, which will be required in practice for obtaining feasible optimal instruments.11

As shown in the Appendix, the log-likelihood given zi0 (under homoskedasticity and a joint normal distribution for μi and zi0 with unrestricted mean and covariance matrix) can be written as:

TX 1 ðT 1Þ 1 1 1 0 ln fzðÞ¼; …; z ∣z ln det Ω u0Ω 1u ln det Θ u ϕ Φ z Θ 1 u ϕ Φ z : ð64Þ i1 iT i0 2 2 it it 2 0 2 i 0 1 i0 0 i 0 1 i0 t ¼ 1

where uit ¼ zit Aziðt 1Þ, ui ¼ zi Azið1Þ and ∣ N ϕ Φ ; Θ ui zi0 0 þ 1zi0 0 ð65Þ

Concentrating ϕ0, Φ1, Θ0 and Ω, the PMLE of A solves ~ 0 0 0 1 0 0 0 A ¼ arg min ln det Z AZ Z Z A þ ln det Z AZ S Z Z A ð66Þ 1 1 T 1 1 0 1 0 1 0 0 0 where S0 ¼ I FFF F and F is the N ðmþ1Þ matrix of constants and initial observations f ¼ 1; z . ~ i0 i0 PML estimates of the remaining parameters: Given A, they are given by

XN TX 1 ~ 1 ~ ~ 0 Ω ¼ z Az z Az ð67Þ NðT 1Þ it iðt 1Þ it iðt 1Þ i ¼ 1 t ¼ 1

9 Blundell and Smith (1991) considered a generalized least squares estimator of the same model, which has been further discussed by Blundell and Bond (1998). 10 This is the PML estimate that does not restrict the individual effect variance to be non-negative. Alvarez and Arellano also discussed an alternative PML that enforced non-negativity, but in such a case a boundary solution may occur. 11 One problem with this method as an estimator of substantive VAR parameters is that it is not robust to lack of time series homoskedasticity. PML estimation of panel AR(p) models with time series heteroskedasticity is developed in Alvarez and Arellano (2004). 248 M. Arellano / Research in Economics 70 (2016) 238–261

Φ~ ϕ~ ; Φ~ 0 ~ 0 0 1 ¼ 0 1 ¼ Z AZ 1 FFF ð68Þ ~ ~ and letting ui ¼ zi Azið1Þ:

XN ~ 1 ~ ~ ~ ~ ~ ~ 0 Θ ¼ u ϕ Φ z Þ u ϕ Φ z Þ : ð69Þ 0 N i 0 1 i0 i 0 1 i0 i ¼ 1 η μ The estimated mean and the variance matrix of i ¼ðI AÞ i can be obtained as: Ω~ Θ~ Φ~ Σ~ Φ~ 0 Ω~ = η ¼ 0 þ 1 0 1 T ð70Þ

η~ ϕ~ Φ~ ¼ 0 þ 1z0 ð71Þ ~ where w0 and Σ 0 are the sample mean and the variance of wi0. Finally, estimates of Υ1 and τ0 are given by Υ~ Σ~ Φ~ 0 Ω~ 1 ~ 1 ¼ 0 1 η I A ð72Þ

τ~ Σ~ Φ~ 0 Ω~ 1η~ : 0 ¼ z0 0 1 η ð73Þ

7. Inference with feasible SIV estimators

An unfeasible estimator for a parameterization of the instruments takes the form

! ! XN TX 1 1 XN TX 1 βb t; γ 0 t; γ ; UF ¼ htðzi Þxit ht ðzi Þyit ð74Þ i ¼ 1 t ¼ 1 i ¼ 1 t ¼ 1 where γ is a pseudo true value that in practice will be defined by the probability limit of some sample statistic. Under standard regularity conditions, for fixed T and large N, pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi βb β -d N ; NTðÞ1 ð UF Þ ðÞ0 V UF ð75Þ with asymptotic variance matrix given by 0 1 0 0 0 1 V UF ¼ ðÞT 1 EHX EHε ε H EX H ð76Þ i i i i i i i i 0 0 ; …; 0 t; γ ; …; ε ε ; …; ε where Hi ¼ hi1 hiðT 1Þ , hit ¼ htðzi Þ, Xi ¼ xi1 xiðT 1Þ , and i ¼ i1 iðT 1Þ . ε2 σ2 ε ε 4 Provided Etð itÞ¼ and Et þ jð it iðt þ jÞÞ¼0 for j 0 it will be the case that σ2 0 1 0 0 1: V UF ¼ ðÞT 1 EHiXi EHiHi EXi Hi ð77Þ If in addition E ðx Þ¼h ðzt; γÞ then V coincides with the variance bound. t it t bi UF b β β γ γb - γb ¼ γ pffiffiffiffiA feasible estimator F is of the same form as UF but is replaced by an estimate . Provided, plimN 1 and NðγbγÞ is bounded in probability, the feasible and unfeasible estimators are asymptotically equivalent. To see this first note that

! ffiffiffiffi XN TX 1 1 XN TX 1 p b 1 1 N β β ¼ h zt; γ x0 pffiffiffiffi h zt; γb v þo ðÞ1 : F N t i it t i it p i ¼ 1 t ¼ 1 N i ¼ 1 t ¼ 1 Moreover, using the mean value theorem ! 1 XN TX 1 1 XN TX 1 1 XN TX 1 ∂h ðzt; γÞ pffiffiffiffi pffiffiffiffi h zt; γb v ¼ pffiffiffiffi h zt; γ v þ t i v N γbγ þo ðÞ1 : t i it t i it N ∂γ0 it p N i ¼ 1 t ¼ 1 N i ¼ 1 t ¼ 1 i ¼ 1 t ¼ 1

P P ∣ t ¼ 1 N T 1 ∂ ð t; γÞ=∂γ0 ¼ Since Evit zi 0, we have that plimN-1N i ¼ 1 pt ¼ffiffiffiffi1 vit ht zi 0. Therefore, the second term on the rhs of the βb β -d N ; previous expression is opð1Þ. From this it follows that Nð F Þ ðÞ0 V UF . Similar arguments can be used for the cases where T or both N and T tend to infinity.

Consistent estimation of the asymptotic variance matrix: A natural estimate of the “Hessian” component of VUF is

XN TX 1 b 1 b Ψ ¼ h x0; ð78Þ NT NTðÞ1 it it i ¼ 1 t ¼ 1 M. Arellano / Research in Economics 70 (2016) 238–261 249 b t b b 1 0 where h ¼ htðz ; γÞ. Ψ NT is an N-consistent estimator of ðÞT 1 EHX , but also an N and T consistent estimator of it b i i i plimT-1;N-1Ψ NT . The conventional robust fixed-T estimate of the “outer-product” component of VUF is

XN ~ 1 b 0 0 b Υ ¼ H εb εb H ð79Þ NT NTðÞ1 i i i i i ¼ 1 εb 0βb b b where it ¼ yit xit F and H i contains the estimated instruments hit. It can be shown that TX 2 ~ b ℓ b b 0 Υ ¼ Ω þ 1 Ωℓ þΩ ð80Þ NT 0 T 1 ℓ ℓ ¼ 1 where

XN TX 1 b 1 b b0 Ωℓ ¼ εb εb h h ð81Þ NTðÞ1ℓ it itðÞ ℓ it itðÞ ℓ i ¼ 1 t ¼ ℓ þ 1 for ℓ ¼ 0; 1; …; T 2.12 From a non-fixed T perspective, we consider the following estimator of the outer-product component: Xr b b ℓ b b 0 Υ ¼ Ω þ 1 Ωℓ þΩ ; ð82Þ NT 0 r þ1 ℓ ℓ ¼ 1 where r rT 2. b For r ¼ T 2, Υ NT particularizes to the standard robust fixed-T formula (Arellano, 1987). In a fixed T context, it is natural ~ to use Υ NT , since the number of Ωℓ terms is fixed, and large N ensures consistent estimation of all of them, including ΩðÞT 2 , which is a time series average with a single observation. However, from a non-fixed T perspective, the bound r on the b number of autocovariances used to form Υ NT should be chosen as a suitable function of T to ensure consistent estimation for a given asymptotic arrangement in N and T. Formula (82) is a panel data version of the estimator proposed in Newey and West (1987) for time series data. It has the fi 13 attractive feature of providing a positive semide nite estimator for any value of r. In the time series context, the term 1ℓ=ðÞr þ1 appearing in (82) is motivated as a damping factor for reducing the sampling error induced by higher-order sample covariances in the truncated estimate (called modified Bartlett weights in Anderson, 1971, pp. 511–513). It is interesting that these weights appear naturally as a feature of the sample covariance formula (80) from a cross-sectional perspective. In the fixed-T panel data context, formula (79) is often used with instrument matrices whose number of columns is of 2 ~ order T or T , so that the dimension of Υ NT itself increases with T . Here, however, we are considering optimal instruments hit with a fixed dimension for any value of T. pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiThe proposal is, therefore, to base inference on the following estimate of the asymptotic covariance matrix of βb β NTðÞ1 ð F Þ for a chosen value of r: "# Xr b † b 1 b ℓ b b 0 b 01 V ¼ Ψ Ω þ 1 Ωℓ þΩ Ψ : ð83Þ UF NT 0 r þ1 ℓ NT ℓ ¼ 1 Optimal instruments with strictly endogenous explanatory variables: a two-step GMM method: Now we can pursue the discussion on estimation when part of the x's are strictly endogenous and the instruments have a larger dimension than β. b We suggest using the inverse of Υ NT as the weight matrix. Thus, we consider estimators of the form "# ! ! 1 XX 0 1 XX β~ b Υb b 0 ¼ xithit NT hitxit t t i ! i ! XX 0 1 XX b Υb b xithit NT hityit ð84Þ i t i t b where hit is an estimate of Et zitðÞ q as introduced in (50).

12 Note that

TX 1 TX 1 0 TX 2 b 0εbεb0 b εb εb0 b b Ω ℓ Ω Ω0 H i i i H i ¼ it is hit his ¼ ðÞT 1 i0 þ ðÞT 1 iℓ þ iℓ t ¼ 1 s ¼ 1 ℓ ¼ 1 P 0 P Ω ℓ 1 T 1 εb εb b b Ωb 1 N Ω ℓ ; ; …; where iℓ ¼ ðÞT 1 t ¼ ℓ þ 1 it itðÞ ℓ hit hitðÞ ℓ and ℓ ¼ N i ¼ 1 iℓ for ¼ 0 1 T 2. 13 The estimator in (82) relies on cross-sectional independence. A related formula for large T, fixed N within-group standard errors that allow for arbitrary cross-sectional dependence is proposed in Arellano (2003). 250 M. Arellano / Research in Economics 70 (2016) 238–261

~ Note that β can be regarded as the SIV estimator that uses the following moments that have the same dimension of β: XX b† 0β~ xit yit xit ¼ 0 ð85Þ i t P P 0 1 b† b Υb b where xit ¼ i txithit NT hit.

8. Empirical illustrations

8.1. VAR for firm panel data

We estimate autoregressive employment and wage equations from firm panel data. We first consider the dataset used by Alonso-Borrego and Arellano (1999). This is a balanced panel of 738 Spanish manufacturing companies, for which there are available annual observations for 1983–1990. Secondly we consider a longer panel of 385 firms from the same source (Bank of Spain, Central de Balances) on which 14 years of data are available, also starting in 1983. The average size of the firms in the second panel is more than twice as large as that of those in the first one.

We estimate a first-order VAR model for the logs of employment and wages, denoted nit and wit respectively. Individual and time effects are included in both equations. Time effects are removed prior to estimation by taking data in deviations from period-specific cross-sectional means. Within-groups (WG), GMM, PML, and projection-restricted simple IV (SIV) estimates are reported. In this illustration PML appears in two different roles. First, it is an input in calculating feasible optimal instruments for SIV estimation. Second, since the auxiliary and substantive models coincide in this illustration, PML can be also regarded as an estimator of the parameters of interest that imposes time-series homoskedasticity.14 Table 1 shows the results. Focusing on the leading coefficient in the employment equation, first notice the larger size of the within-group estimates from the longer panel, which is to be expected since WG is known to have a small-T downward bias (Nickell, 1981). Next, the discrepancy between GMM and PML estimates is as noticeable as that between WG and GMM. Finally, SIV estimates are in between GMM and PML. The difference between GMM and PML may be due to finite-sample bias in GMM, heteroskedasticity bias in PML, or a combination of both. SIV estimates are as robust as GMM under a fixed-T asymptotics, but less prone to bias in a double asymptotics. In particular, SIV is robust to time-series heteroskedasticity in short panels whereas PML is not. Reported SIV estimates are half way between GMM and PML.

8.2. Monte Carlo simulations

Next we performed a simulation exercise loosely calibrated to the previous firm panel dataset. The design was chosen from the partial adjustment representation as follows: : : : η yit ¼ 1þ0 8yiðt 1Þ 0 5xit þ0 3xiðt 1Þ þ i þvit ð86Þ

: : ξ ε ; xit ¼ 0 5þ0 3xiðt 1Þ þ i þ it ð87Þ σ2 σ2 : ; ε where all unobservables are iid normally distributed (over both i and t) with zero mean and v ¼ ε ¼ 0 01, CorrðÞ¼ vit it 0, σ2 σ2 : η ; ξ : fi η ¼ ξ ¼ 0 09, and Corr i i ¼ 0 6. Therefore, the variance of the xed effects is 9 times that of the random errors. There is no feedback from lagged y into x, and the long run effect of x on y is unity. Initial observations are generated from the stationary distribution of the process. The corresponding VAR is ! ! ! yit 0:75 0:80:15 yiðt 1Þ ci þeit ¼ þ þ ξ ε ð88Þ xit 0:5 00:3 xiðt 1Þ i þ it : ε η : ξ where eit ¼ vit 0 5 it, ci ¼ i 0 5 i, ! eit 0:0125 0:005 Var ¼ Ω ¼ ð89Þ εit 0:005 0:01 and ! : : ci Ω 0 0585 0 009 Var ξ ¼ η ¼ : ð90Þ i 0:009 0:09 ; ε : ; ξ : The implied correlations are CorrðÞ¼ eit it 0 447 and Corr ci i ¼ 0 124. In Table 2 we report medians and median absolute errors of the WG, GMM, PML, and SIV estimators for {N¼738, T¼8} and {N¼385, T¼14}.

14 The PML estimates that we report use unrestricted initial conditions. According to estimates of Υ1 for the two panels (reported in Table 1) there is some evidence of nonstationary initial conditions. M. Arellano / Research in Economics 70 (2016) 238–261 251

Table 1 Employment and wage VAR model: panel data of Spanish firms.

Variables WG GMM PML SIV

N ¼ 738; T ¼ 8 Employment equation

niðt 1Þ 0.71 0.86 1.00 0.93 ðÞ0:03 ðÞ0:06 ðÞ0:07

wiðt 1Þ 0.08 0.12 0.08 0.14 ðÞ0:03 ðÞ0:07 ðÞ0:08

Wage equation

niðt 1Þ 0.06 0.03 0.01 0.02 ðÞ0:02 ðÞ0:08 ðÞ0:08

wiðt 1Þ 0.44 0.29 0.68 0.32 ðÞ0:03 ðÞ0:10 ðÞ0:10

N ¼ 385; T ¼ 14 Employment equation

niðt 1Þ 0.86 0.83 0.995 0.90 ðÞ0:05 ðÞ0:05 ðÞ0:05

wiðt 1Þ 0.26 0.29 0.28 0.30 ðÞ0:09 ðÞ0:13 ðÞ0:15

Wage equation

niðt 1Þ 0.01 0.09 0.03 0.15 ðÞ0:02 ðÞ0:04 ðÞ0:05

wiðt 1Þ 0.45 0.34 0.62 0.32 ðÞ0:07 ðÞ0:10 ðÞ0:12

All data in deviations from period-specific cross-sectional means.

Estimates of Υ1 using SIV estimates of A: ðN ¼ 738; T ¼ 8Þ sample: 0:863 0:155 Υb ¼ 1ð738Þ 0:008 0:849

ðN ¼ 385; T ¼ 14Þ sample: 0:983 0:441 Υb ¼ : 1ð385Þ 0:004 0:905

In another experiment with N¼738, T¼8 we generated data with trending variances. The specification in this case was σ2 σ2 : : ; vt ¼ εt ¼ 0 01þ0 001t ð91Þ with t¼0 selected in such a way that the resulting sequence of variances ranged from 0.005 to 0.012. This is also given in Table 2. Finally, in Table 3 we report further simulations for {N¼200, T¼8}, {N¼100, T¼6}, and {N¼50, T¼15}. For all cases we conducted 1000 replications.

Focusing again on the leading coefficient in the first equation (a11 with true value of 0.8), Table 2 shows that GMM is downward biased but PML and SIV are virtually median unbiased. The robustness of SIV comes at the cost of a larger median absolute error than PML. The third panel of Table 2 for the experiment with trending variance shows that PML is upward biased, which goes in the direction of the empirical findings. Table 3 reports results for other sample sizes. Those in the first two experiments are similar to the Arellano–Bond firm-level data, and the cross-country panels used in growth studies, respectively. The last one illustrates the situation in a smaller panel that would be difficult to classify as either long T or large N, small T. In all cases SIV is unbiased and has reasonable median absolute errors, which is in contrast to some spectacular GMM biases.

8.3. Estimating country growth convergence rates

Using panel GMM, Caselli et al. (1996) found a surprisingly large estimate of the convergence rate of about 10 percent. This was in sharp contrast with earlier cross-sectional estimates of Barro and Sala-i-Martin, who found convergence rates of 2–3%.15 Caselli et al. claimed that earlier estimates were biased due to lack of proper control of country effects and predeterminedness. The worry is that their estimates have finite sample downward biases in the GDP autoregressive coefficient that translate into upward biases in estimated convergence rates, as noted by Bond et al. (2001).

15 See Barro and Sala-i-Martin (1995) for details and further references. 252 M. Arellano / Research in Economics 70 (2016) 238–261

Table 2 Monte Carlo simulations for the VAR model.

Variables WG GMM PML SIV

Median mae Median mae Median mae Median mae

N ¼ 738; T ¼ 8

a11 0.48 0.32 0.72 0.08 0.80 0.02 0.80 0.05

a12 0.09 0.06 0.13 0.03 0.15 0.02 0.15 0.03

a21 0.03 0.03 0.01 0.04 0.00 0.01 0.01 0.05

a22 0.12 0.18 0.29 0.03 0.30 0.01 0.30 0.03

N ¼ 385; T ¼ 14

a11 0.63 0.17 0.74 0.06 0.80 0.01 0.80 0.03

a12 0.12 0.03 0.13 0.02 0.15 0.01 0.15 0.02

a21 0.02 0.02 0.01 0.02 0.00 0.01 0.00 0.02

a22 0.21 0.09 0.29 0.02 0.30 0.01 0.30 0.02 N ¼ 738; T ¼ 8 Data with trend in variance

a11 0.50 0.30 0.70 0.11 0.86 0.06 0.81 0.07

a12 0.11 0.04 0.12 0.04 0.17 0.02 0.16 0.04

a21 0.03 0.03 0.01 0.05 0.01 0.01 0.01 0.06

a22 0.11 0.19 0.29 0.03 0.31 0.02 0.30 0.03 a11 ¼ 0:8; a12 ¼ 0:15; a21 ¼ 0; a22 ¼ 0:3. 1000 replications. mae is the median absolute error.

Table 3 Monte Carlo simulations for the VAR model.

Variables WG GMM PML SIV

Median mae Median mae Median mae Median mae

N ¼ 200; T ¼ 8

a11 0.48 0.32 0.59 0.21 0.80 0.05 0.80 0.09

a12 0.09 0.06 0.08 0.07 0.15 0.03 0.16 0.06

a21 0.03 0.03 0.03 0.07 0.00 0.02 0.00 0.09

a22 0.12 0.18 0.27 0.05 0.30 0.02 0.30 0.05 N ¼ 100; T ¼ 6

a11 0.35 0.45 0.40 0.40 0.80 0.10 0.80 0.22

a12 0.08 0.07 0.04 0.13 0.15 0.06 0.15 0.12

a21 0.04 0.05 0.04 0.14 0.00 0.05 0.01 0.19

a22 0.04 0.26 0.22 0.10 0.30 0.05 0.29 0.11 N ¼ 50; T ¼ 15

a11 0.64 0.16 0.62 0.18 0.80 0.04 0.80 0.06

a12 0.12 0.04 0.10 0.06 0.15 0.04 0.15 0.05

a21 0.02 0.03 0.02 0.04 0.00 0.02 0.00 0.06

a22 0.22 0.09 0.24 0.06 0.30 0.03 0.30 0.04 a11 ¼ 0:8; a12 ¼ 0:15; a21 ¼ 0; a22 ¼ 0:3. 1000 replications. mae is the median absolute error.

We obtained the Caselli et al. data and re-estimated an augmented Solow model of the form given in (56) with the new projection-restricted IV estimator. We have found a smaller convergence rate of about 4%, but imprecisely estimated (Table 4).16 From a substantive point of view, however, explaining the uncovered heterogeneity in steady state income levels seems at least as crucial as finding good estimates of the convergence coefficient. The standard errors reported in Table 4 were calculated from formula (83) with r ¼ T 2. Using smaller values of r made very little difference. Further exploration of the effects on the estimates and standard errors of using optimal instruments based on higher-order VAR models would be of some interest.

16 We use similar data as Caselli et al. and Bond et al. Our sample is the same as the one in Table 2 of Bond et al., except for the exclusion of 9 countries and 17 observations in order to have a balanced panel. M. Arellano / Research in Economics 70 (2016) 238–261 253

Table 4 Augmented Solow model N ¼ 92; T ¼ 5.

Variables OLS WG GMM SIV

ðÞ1þβ 0.947 0.680 0.698 0.808 (s.e.)ðÞ 0:017 ðÞ0:057 ðÞ0:107 ðÞ0:248

lnðÞenrt 0.035 0.049 0.140 0.114 (s.e)ðÞ 0:014 ðÞ0:029 ðÞ0:066 ðÞ0:229

lnðÞst 0.081 0.138 0.144 0.090 (s.e)ðÞ 0:017 ðÞ0:039 ðÞ0:055 ðÞ0:075

lnðÞnt þg þd 0.094 0.033 0.230 0.227 (s.e)ðÞ 0:053 ðÞ0:152 ðÞ0:339 ðÞ1:384

Implied λ 0.011 0.077 0.072 0.043 (s.e)ðÞ 0:004 ðÞ0:017 ðÞ0:031 ðÞ0:062

All data in deviations from period-specific cross-sectional means. Data for 5-year intervals 1960–1985. s.e. robust to heterosk. & autocorrelation, m ¼ T 2. n¼population growth rate; g¼rate of technical change; d¼rate of depreciation of physical capital (gþd¼0.05); s¼saving rate; enr¼secondary-school enrollment rate.

9. Concluding remarks

We developed a new methodology for instrumental variable estimation of panel data models with general pre- determined or endogenous explanatory variables. The suggested instruments are linear forecasts of the explanatory vari- ables constructed under the assumption that the vector of conditioning variables follows a panel VAR process. These are linear combinations of all available lags as in ordinary GMM, but with the crucial difference that the number of first-stage coefficients is kept constant regardless of the value of T. We show analytically and through Monte Carlo simulations that this fundamentally alters the properties of the estimators in double asymptotics and finite samples. The new estimators elim- inate double-asymptotic biases while retaining similar robustness and optimality properties as GMM in fixed T environments. Specific discussions on unbalanced panels and higher-order auxiliary models are clearly of practical importance, but they are left for future work. GMM is not well suited to cope with the complications derived from typical patterns of unba- lancedness or rotation in firm and household panels, which magnify the number of first stage coefficients. In the GMM context this has been avoided at the expense of introducing ad-hoc restrictions in the form of the cross-sectional projections across sub-panels—see Arellano and Bond (1991) for discussion. In contrast, projection-restricted IV estimators offer the possibility of a coherent specification of optimal instruments across sub-panels based on a small number of common coefficients.

Acknowledgments

I wish to thank Stephen Bond, Francesco Caselli, and Jonathan Temple for kindly making their data available; Pedro Albarran for outstanding research assistance; Javier Alvarez, Olympia Bover, Jan Kiviet, Charles Manski, Whitney Newey, and Enrique Sentana for helpful comments; and David Hendry for posing the question that originated this work.

Appendix A. Double-asymptotic results

Proof of Theorem 1. We shall use the following implication of Assumption 1. For all r; sZ0wehave Et xitðÞþ r εitðÞþ s ¼ ExitðÞþ r εitðÞþ s : ð92Þ To see this note that X1 X1 X1 X1 0 ∣ t; μ μ μ0 Ψ ζ ζ0 ∣ζt; μ Ψ 0 μ μ0 Ψ ζ ζ0 Ψ 0 : EwitðÞþ r witðÞþ s wi i ¼ i i þ jE itðÞþ r j itðÞþ s k i i k ¼ i i þ jE itðÞþ r j itðÞþ s k k j ¼ 0 k ¼ 0 j ¼ 0 k ¼ 0 ð93Þ By the law of iterated expectations hi 0 ∣ t 0 ∣ t; μ ∣ t 0 μ μ0 μ μ0 : EwitðÞþ r witðÞþ s zi ¼ EE witðÞþ r witðÞþ s wi i zi ¼ EwitðÞþ r witðÞþ s þEt i i E i i ð94Þ

Eq. (92) holds because εit does not have an individual effect (i.e the first component of μi is zero). 254 M. Arellano / Research in Economics 70 (2016) 238–261 0 Let us examine the form of EX Mε .Wehave

TX 1 TX 1 0 ε 0 ε ε : EXM ¼ EXt Mt t ¼ m tE xit it ð95Þ t ¼ 1 t ¼ 1

To see this note that 0 ε 0 ε ε 0 ε : EXt Mt t ¼ ExℓtMt t ¼ Etr MtEt t xℓt ¼ mtE itxit ð96Þ

The last equality comes from (92),trðÞ¼Mt mt, and the fact that due to cross-sectional independence: 0 1 0 1 ε … ε ε … ε Et 1txℓ1t Et 1txℓNt Et 1txℓ1t Et 1t Et xℓNt B C B C ε 0 @ ⋮⋱⋮A @ ⋮⋱⋮A ε : Et t xℓt ¼ ¼ ¼ Et itxℓit IN ð97Þ ε … ε ε … ε Et Ntxℓ1t Et NtxℓNt Et Nt Et xℓ1t Et Nt xℓNt

Next, consider T t 1 1 Ex ε ¼ EðÞε x E ε x þ⋯þx E ε þ⋯þε x it it T t þ1 it it ðÞT t it itðÞþ 1 iT ðÞT t itðÞþ 1 iT it ) 1 þ E εitð þ 1 þ⋯þεiT xitðÞþ 1 þ⋯þxiT : ð98Þ ðÞT t 2

Collecting terms we obtain no 1 Ex ε ¼ γ ðÞT t γ þ⋯þγ þγ þ⋯þðÞT t γ ð99Þ it it 0 ðÞT t ðÞT t þ1 ðÞT t 1 1 ðÞT t

Thus ! TX 1 TX 1 TX 1 0 t EX Mε ¼ m tE x ε ¼ m γ t B ð100Þ it it 0 ðÞT t ðÞT t þ1 ðÞT t t ¼ 1 t ¼ 1 t ¼ 1 where γ ⋯ γ γ ⋯ γ : BðÞT t ¼ ðÞT t ðÞT t þ þ 1 þ 1 þ þðÞT t ðÞT t ð101Þ

We now use the following facts:

TX 1 1 t ¼ ðÞT 1 T ð102Þ 2 t ¼ 1

TX 1 t XT 1 ¼ T ð103Þ ðÞT t ðÞT t þ1 t t ¼ 1 t ¼ 1

BðÞT t ojjB1 ð104Þ γ a 0 ε 2 Hence, when xit is endogenous ( 0 0) we have EXM ¼ OmT or ! 0 X Mε mT E ¼ O : ð105Þ NT N 0 In contrast, when xit is predetermined we have EXMε ¼ OmTðÞor ! 0 X Mε m E ¼ O :□ ð106Þ NT N M. Arellano / Research in Economics 70 (2016) 238–261 255

Appendix B. Auxiliary VAR with individual effects

B.1. Sequential linear projections of the effects

Proof of Theorem 2. I show that for the VAR(1) model presented in the main text, the linear projection of the vector of μ t 0 ; …; 0 0 individual effects i on zi ¼ zi0 zit is given by "# ! Xt 1 μ ∣ t Ω 1 Υ 0 Γ 1Υ 0 Ω 1 E i zi ¼ μ þ 1 0 1 þðIAÞ s ðIAÞ "#s ¼ 1 Xt Ω 1μ Υ 0 Γ 1τ 0 Ω 1 Υ 0 Γ 1 μ 1 0 0 þðI AÞ s zis Aziðs 1Þ þ 1 0 zi0 s ¼ 1 "#"# Xt 1 Xt ¼ I þΛ0Υ 1 þ Λ1 sðÞI A μþΛ0ðÞzi0 τ0 þ Λ1 s zis Aziðs 1Þ ð107Þ s ¼ 1 s ¼ 1 for t Z1, and for t¼0: hihi 1 μ ∣ t Ω 1 Υ 0 Γ 1Υ Ω 1μ Υ 0 Γ 1 τ Λ Υ 1 μ Λ τ E i zi ¼ μ þ 1 0 1 μ þ 1 0 ðÞzi0 0 ¼ I þ 0 1 þ 0ðÞzi0 0 ð108Þ Λ Ω Υ 0 Γ 1 Λ Ω 0Ω 1 where 0 ¼ μ 1 0 and 1 s ¼ μðIAÞ s . The VAR model implies that t μ ⋯ t 1 t : zit ¼ðI A Þ i þ vit þAviðt 1Þ þ þA vi1 þA zi0 ð109Þ

Substituting (36) in (109) we obtain hi t Υ μ tτ zit ¼ I þA 1 I i þA 0 þzit ð110Þ where ⋯ t 1 t † : zit ¼ vit þAviðt 1Þ þ þA vi1 þA vi0 ð111Þ 0 0 20 t0 Let G denote the ðÞt þ1 m m matrix G ¼ Im; A ; A ; …; A . Then t μ τ t zi ¼ F i þG 0 þzi ð112Þ where F ¼ ðÞþι I G Υ 1 I ; ð113Þ 0 ι is a ðÞt þ1 1 vector of ones, and zt ¼ z0 ; …; z0 . Therefore, i i0 it t μ τ Ezi ¼ F þG 0 ð114Þ t Ω 0 Var zi ¼ F μF þV ð115Þ where V ¼ Var zt . Moreover, note that i t; μ Ω : Cov zi i ¼ F μ ð116Þ

The linear projection is given by μ ∣ t ψ Π0 t E i zi ¼ t þ tzi ð117Þ where ψ μ Π0 t t ¼ tEzi ð118Þ and Π t 1 t ; μ : t ¼ Var zi Cov zi i ð119Þ

Letting Ωμ ¼ PP0 and M¼FP, and using the matrix inversion lemma 1 t 1 0 1 1 1 0 1 0 1: Var zi ¼ MM þV ¼ V V MIþM V M M V 256 M. Arellano / Research in Economics 70 (2016) 238–261

Moreover, letting V 1 ¼ B0B where B is a block-lower triangular matrix hi 1 Π0 t Ω 0 1 1 0 1 0 1 t 0 0 0 0 0 1 0 0 t tzi ¼ μF V V MIþM V M M V zi ¼ PM B BB BM I þM B BM M B B zi hi 0 0 0 0 1 0 0 t 0 0 1 0 0 t : ¼ PI M B ðÞBM IþM B BM M B Bzi ¼ PIþ M B ðÞBM M B Bzi ð120Þ

t t Since V ¼ Var z , the matrix B has to be such that Var Bzi ¼ I. Therefore, since B must satisfy 0 i 1 = † Γ 1 2v B 0 i0 C B = C B Ω 1 2v C t ¼ B 1 i1 C; ð Þ Bzi B C 121 @ ⋮ A Ω 1=2 t vit it is given by 0 1 = Γ 1 2 00… 00 B 0 C B = = C B Ω 1 2A Ω 1 2 000C B 1 1 C B = = C B 0 Ω 1 2A Ω 1 2 00C B 2 2 C: B ¼ B C ð122Þ B ⋮⋱⋮C B = C B 000Ω 1 2 0 C @ t 1 A … Ω 1=2 Ω 1=2 000 t A t

Direct multiplication gives 0 1 Γ 1=2Υ B 0 1 C B = C B Ω 1 2ðÞI A C ι Υ B 1 C BM ¼ B ðÞþ I G 1 I P ¼ B CP ð123Þ @ ⋮ A Ω 1=2 t ðÞI A and 0 1 = Γ 1 2z B 0 i0 C B = C B Ω 1 2ðÞz Az C Bzt ¼ B 1 i1 i0 C: ð124Þ i B ⋮ C @ A Ω 1=2 t zit Aziðt 1Þ

Hence "# ! Xt 0 0 0 Υ 0 Γ 1Υ 0 Ω 1 M B ðÞ¼BM P 1 0 1 þðÞI A s ðÞI A P ð125Þ s ¼ 1 and "# Xt 0 0 t 0 Υ 0 Γ 1 0 Ω 1 : M B Bzi ¼ P 1 0 zi0 þðÞIA s zis Aziðs 1Þ ð126Þ s ¼ 1

Inserting these terms in the expression given above "# ! Xt 1 Π0 t 10 1 Υ 0 Γ 1Υ 0 Ω 1 tzi ¼ P P þ 1 0 1 þðÞI A s ðÞIA "#s ¼ 1 Xt Υ 0 Γ 1 0 Ω 1 : 1 0 zi0 þðÞIA s zis Aziðs 1Þ ð127Þ s ¼ 1

Finally, to obtain the vector of intercepts, note that since EzðÞ¼i0 τ0 þΥ 1μ and ! Xt Xt Ω 1 Ω 1 μ; E s zis Aziðs 1Þ ¼ s ðÞI A ð128Þ s ¼ 1 s ¼ 1 M. Arellano / Research in Economics 70 (2016) 238–261 257 we have "# ! Xt 1 Π0 t Ω 1 Υ 0 Γ 1Υ 0 Ω 1 tEzi ¼ μ þ 1 0 1 þðÞI A s ðÞI A "# s ¼ 1 ! Xt Υ 0 Γ 1τ Υ 0 Γ 1Υ μ 0 Ω 1 μ 1 0 0 þ 1 0 1 þðÞIA s ðÞIA ð129Þ s ¼ 1 Collecting terms we obtain "# ! Xt 1hi ψ μ Π0 t Ω 1 Υ 0 Γ 1Υ 0 Ω 1 Ω 1μ Υ 0 Γ 1τ t ¼ tEzi ¼ μ þ 1 0 1 þðÞIA s ðÞIA μ 1 0 0 ð130Þ s ¼ 1 from which the result follows. τ Υ Stationary VAR(1) processP: The stationary case is obtained as a specialization of the initial VAR model to 0 ¼ 0, 1 ¼ Im, Ω Ω Γ 1 jΩ j0 Γ fi Γ Γ 0 Ω t ¼ for all t, and 0 ¼ j ¼ 0 A A , so that 0 satis es 0 ¼ A 0A þ . In such case we have t ι μ Ewi ¼ ð131Þ t ιι0 Ω Var wi ¼ μ þV ð132Þ t; μ ι Ω : Cov wi i ¼ μ ð133Þ Moreover, V is given by 0 1 Γ Γ0 … Γ0 B 0 1 t C B Γ Γ Γ0 C B 1 0 t 1 C V ¼ B C: ð134Þ @ ⋮⋱A

Γt Γt 1 … Γ0

j where Γj ¼ A Γ0. Nevertheless, the matrix B in the decomposition of the inverse of V is of the same form as in the general case but with constant Ω. There are two special cases of equation (47) that are of some interest:

1. Uncorrelated multivariate error components (A¼0): In this case Γ0 ¼ Ω and Λ1 ¼ Λ0, so that μ Et zit ¼ ct zit Et i ð135Þ and "# Xt μ Λ 1 μ Λ : Et i ¼ I þðÞ1þt 0 þ 0 zis ð136Þ s ¼ 0 2. Homogeneous VAR(1) process (Ωμ ¼ 0): In this case δit ¼ δi0 ¼ μ and Ht ¼ H0 ¼ I, so that Et μ ¼ μ and ! i 1 Xt E z ¼ c I As z μ :□ ð137Þ t it t T t it s ¼ 1

B.2. VAR log-likelihood given initial observations

The VAR model can be written as 0 10 1 0 1 0 1 I 0 … 00 zi1 Im ui1 B CB C B C B C B CB z C B C B u C B AI 00CB i2 C B 0 C B i2 C B ⋮⋱ ⋮CB ⋮ C B ⋮ C B ⋮ C; B CB C ¼ B CAzi0 þB C ð138Þ B CB C B C B C @ 00 I 0 A@ ziðT 1Þ A @ 0 A @ uiðT 1Þ A

00… AI ziT 0 uiT or using a compact notation

Bzi ¼ DAzi0 þui ð139Þ ι η ui ¼ i þvi ð140Þ η η μ ι 0 ; …; 0 0 where uit ¼ i þvit, i ¼ðI AÞ i, is a vector of ones of order T and vi ¼ vi1 viT . Thus, under time series homo- skedasticity

EuðÞ¼i ι η ð141Þ 258 M. Arellano / Research in Economics 70 (2016) 238–261 and 0 VarðÞ¼ ui ιι Ωη þðÞIT Ω ð142Þ where η ¼ðIAÞμ and Ωη ¼ðIAÞΩμðIAÞ0.

The conditional density of zi given zi0 is related to that of ui by

fzðÞ¼i∣zi0 fuðÞi∣zi0 detðÞB ð143Þ but detðÞB ¼ 1 because B is a triangular matrix. Moreover ∣ ; ∣ fuðÞ¼i zi0 f ui ui zi0 detðÞH Im ð144Þ ι= ; A0 0 A where H ¼ T is a T T transformation matrix, and is the ðT 1ÞT forward orthogonal deviations operator, that 0; 0 0 fi m=2 produces ðÞH Im ui ¼ ui ui . The determinant of the transformation satis es detðÞH Im ¼ T , which is an irrelevant constant. Note that η EH½ðÞ I u ¼ Hι η ¼ ð145Þ m i 0 ! Ω 1Ω 0 0 0 0 0 η þT 0 Var½ðÞ H Im ui ¼ ðÞH Im ιι Ωη þðÞIT Ω H Im ¼ Hιι H Ωη þ H H Ω ¼ : 0 IT 1 Ω ð146Þ

In order to obtain the mean and variance matrix of fðÞH Im ui∣zi0g it is convenient to introduce some additional notation. Σ Υ Ω Υ 0 Γ η Let 0 ¼ VarðÞ¼ zi0 1 μ 1 þ 0 and let the linear projection of i on zi0 be ! 1 η ∣ ϕ Φ ϕ ; Φ Φ E i zi0 ¼ 0 þ 1zi0 ¼ 0 1 f i0 ð147Þ zi0

Φ Ω Υ 0 Σ 1 17 η ; ; ; …; so that 1 ¼ðIAÞ μ 1 0 . Thus under joint normality of i zi0 zi1 ziT : ! ϕ Φ 0 þ 1zi0 EH½ðÞ Im u ∣z ¼ ð148Þ i i0 0 and ! ! ϕ Φ Θ 0 þ 1zi0 0 0 Var½ðÞ H Im ui∣zi0 ¼ Var½ðÞ H Im ui Var ¼ ð149Þ 0 0 IT 1 Ω Θ Ω Φ Σ Φ0 1Ω 18 where 0 ¼ η 1 0 1 þT . Then, under normality ðT 1Þ 1 1 1 0 ln fzðÞ¼; …; z ∣z ln det Ω u0 I Ω 1 u ln det Θ u ϕ Φ z Θ 1 u ϕ Φ z : i1 iT i0 2 2 i T 1 i 2 0 2 i 0 1 i0 0 i 0 1 i0 ð150Þ

Appendix C. Estimating scalar autoregressive models

Let us consider first a scalar AR(1) model with individual effects of the form α α μ ∣α∣o yit ¼ yiðt 1Þ þðÞ1 i þvit 1 ð151Þ

∣ ; ; …; Eðvit yi0 yi1 yiðt 1ÞÞ¼0 ð152Þ so that in the notation of (1) and (2) xit and zit are both scalar variables and xit ¼ zit ¼ yiðt 1Þ. For convenience we assume that yi0 is observed. In this case hi 1 ∣ t 1 1 ∣ t 1 ⋯ ∣ t 1 : Exit yi ¼ yiðt 1Þ Eyit yi þ þEyiðT 1Þ yi ð153Þ ct T t

17 Note that

Φ η ; Σ 1 μ ; Σ 1 1 ¼ Covð i wi0Þ 0 ¼ðIAÞCovð i wi0Þ 0 Ω Ω 1 μ ; Σ 1 Ω Υ 0 Σ 1: ¼ðIAÞ μ μ Covð i wi0Þ 0 ¼ðIAÞ μ 1 0 18 η ∣ Ω Φ Σ Φ0 Note that Var i wi0 ¼ η 1 0 1 . M. Arellano / Research in Economics 70 (2016) 238–261 259

Note that since hi ∣ t 1 αs þ 1 α ⋯ αs α μ ∣ t 1 μ ∣ t 1 αs þ 1 μ ∣ t 1 ; …; ; Eyiðt þ sÞ yi ¼ yiðt 1Þ þð1þ þ þ ÞE ðÞ1 i yi ¼ E i yi þ yiðt 1Þ E i yi ðÞs ¼ 0 T 1t ∣ t 1 μ ∣ t 1 the instrument Ex y is just a function of yiðt 1Þ and E i y given by it i ! i hi 1 XT t Ex ∣yt 1 ¼ c 1 αs y E μ ∣yt 1 : ð154Þ it i t T t iðt 1Þ i i s ¼ 1

μ ∣ T 1 T 1 In general E i yi can be a nonlinear function of yi , but in the auxiliary models we shall assume that it coincides with the linear projection μ ∣ T 1 ψ π0 t 1: E i yi ¼ T 1 þ T 1yi ð155Þ In view of the results of the previous section, it turns out that the coefficients of this projection are unrestricted when the σ2 fi unconditional variances EvðÞ¼it t are allowed to change with t in an unspeci ed way. But even in this case note that due to the law of iterated projections μ ∣ t 1 μ ∣ t ∣ t 1 ; E i yi ¼ E E i yi yi ∣ t 1 α ψ γ all the instruments Exit yi can be written as functions of ,T 1, and T 1. This substantially reduces the number of ∣ t 1 parameters relative to the unrestricted linear projections for Exit yi used implicitly by standard GMM, despite being based on the same auxiliary model. Nevertheless, the number of coefficients still increases with T. A strictly stationary auxiliary model: As an auxiliary model we may consider a strictly stationary AR(1) process, in which case, using the results from Appendix B, we obtain hiP μ ϕ α t 1 α2 þ ðÞ1 ¼ uis þ 1 yi0 μ ∣ t 1 t 1; θ his 1 E i yi ¼ mt 1 yi ð156Þ 1þϕ ðt 1ÞðÞ1α 2 þ1α2

ϕ σ2 =σ2 θ α; ϕ; μ 0 α where ¼ μ , ¼ð Þ , and uis ¼ yis yiðs 1Þ. 2 2 2 Letting λ ¼ ση=σ ¼ð1αÞ ϕ, an equivalent expression is given by "# μ λ Xt 1 α μ ∣ t 1 E i yi ¼ þ yis þ yi0 þyiðt 1Þ ð157Þ λ 1 þ α λ 1 þ α 1α 1þ t 1þ ðÞ1 α 1þ t 1þ ðÞ1 α s ¼ 0 P - μ ∣ t 1 1 t 1 μ Note that as t 1 E y converges to the limit of t yis (which is given by i), but for small values of t the i i P s ¼ 0 μ ∣ t 1 μ ∣ t 1 approximation of E i yi by E i s ¼ 0 yis may be poor, even if yit is a strictly stationary process (it would only be α 19 appropriate if ¼ 0). μ ∣ t 1 Using as the auxiliary model the assumptionsP of strict stationarity together with the linearity of E i yi , the ∣ t 1 t 1 fi α ϕ μ instrument Exit yi becomes a linear function of s ¼ 0 uis and yi0 with coef cients that depend exclusively on , and . Since these assumptions need not be true, the previous coefficients should be understood as pseudo true values for which we use the notation c ¼ða; f ; mÞ0. Thus our parameterization of the instrument is given by hi a 1aT t h yt 1; c ¼ c 1 y m yt 1; c : ð158Þ t i t 1a T t iðt 1Þ t 1 i An auxiliary model with unrestricted initial conditions: An alternative, more general auxiliary model that retains a fixed number of parameters is one in which the assumption of stationarity of initial observations is removed. This adds three extra parameters to the instrument function.

The previous auxiliary model assumed that the linear projection of yi0 on μi τ τ μ yi0 ¼ 0 þ 1 i þvi0 ð159Þ τ τ γ2 σ2= α2 τ τ γ2 was such that 0 ¼ 0, 1 ¼ 1 and Varðvi0Þ¼ 0 ¼ ð1 Þ. In the alternative model 0, 1 and 0 are free parameters. Using again the results from Appendix B we obtain P t 1 † μr τ τ þϕðÞ1α u þr τ y E μ ∣yt 1 ¼ m yt 1; θ 0 1 0 s ¼ 1 is 0 1 i0 ð160Þ i i t 1 i ϕ α 2 τ2 1þ ðt 1ÞðÞ1 þr0 1 σ2 =γ2 θ α; ϕ; μ; τ ; τ ; 0 ϕ α2 where r0 ¼ μ 0 and ¼ 0 1 r0 . Note that under stationarity r0 ¼ ð1 Þ.

19 With T¼2, we only need to consider t¼1, and there is just one instrument given by

2 = ðÞ1α Ex ∣y ¼ 2 1 2 y μ : i1 i0 ðÞþ1α λð1þαÞ i0 260 M. Arellano / Research in Economics 70 (2016) 238–261

τ τ γ2 For large t the difference between (156) and (160) will be small regardless of the values of 0, 1 and 0, but in short panels the difference may be important if the steady state distribution of the process is not a good approximation to the distribution of initial observations. In such a case a better choice of instrument will be T t hi † a 1a † h yt 1; c ¼ c 1 y m yt 1; c : ð161Þ t i t 1a T t iðt 1Þ t 1 i AR(p) processes: The previous discussion can be extended to a stable autoregressive process of order p: α ⋯ α η yit ¼ 1yiðt 1Þ þ þ pyiðt pÞ þ i þvit ð162Þ

∣ ; ; …; ; Eðvit yi0 yi1 yiðt 1ÞÞ¼0 ð163Þ ; …; 0 so that in terms of the notation of model (1) and (2), we have xit ¼ðyiðt 1Þ yiðt pÞÞ and zit ¼ yiðt 1Þ. For convenience we ; ; …; assume that yi0 yið1Þ yiðp þ 1Þ are observed (i.e. the time series dimension of the panel is T þp), and write the model in companion form Π η xiðt þ 1Þ ¼ xit þd1 i þvit ð164Þ 0 where d1 ¼ð1; 0; …; 0Þ of order p, and Π is the p p matrix ! α1 … αp Π ¼ : ð165Þ Ip 1 ⋮ 0

Therefore, 1 ∣ t 1 1 ∣ t 1 ⋯ ∣ t 1 : Exit yi ¼ xit Exiðt þ 1Þ yi þ þExiT yi ð166Þ ct T t Moreover, since ∣ t 1 Πs Π ⋯ Πs 1 η ∣ t 1 ; …; ; Exiðt þ sÞ yi ¼ xit þ Ip þ þ þ d1E i yi ðs ¼ 1 T tÞ ð167Þ ∣ t 1 η ∣ t 1 the vector of instruments Exit yi is a function of xit and E i yi given by 1 Ex ∣yt 1 ¼ c I Π I ΠT t ðÞIΠ 1 x ι E μ ∣yt 1 ð168Þ it i t T t it p i i η α ⋯ α μ 20 where i ¼ð1 1 pÞ i. μ ∣ t 1 Under the strictly stationary auxiliary model, the linear projection E i yi is still of the form μ ϕ μ ∣ t 1 μ ϕι0 ϕιι0 1 t 1 μι ι0 1 t 1; E i yi ¼ þ þV yi ¼ þ V yi ð169Þ 1þϕι0V 1ι 1þϕι0V 1ι but now V ¼ Vðα1; …; αpÞ corresponds to the autoregressive covariance matrix of order p.

References

Alonso-Borrego, C., Arellano, M., 1999. Symmetrically normalized instrumental-variable estimation using panel data. J. Bus. Econ. Stat. 17, 36–49. Alvarez, J., Arellano, M., 2003. The time series and cross-section asymptotics of dynamic panel data estimators. Econometrica 71, 1121–1159. Alvarez, J., Arellano, M., 2004. Robust Likelihood Estimation of Dynamic Panel Data Models. CEMFI Working Paper 0421. Anderson, T.W., 1971. The Statistical Analysis of Time Series. John Wiley & Sons, New York. Anderson, T.W., Hsiao, C., 1982. Formulation and estimation of dynamic models using panel data. J. Economet. 18, 47–82. Arellano, M., 1987. Computing robust standard errors for within-group estimators. Oxford Bull. Econ. Stat. 49, 431–434. Arellano, M., 2002. Sargan's instrumental variables estimation and the generalized method of moments. J. Bus. Econ. Stat. 20, 450–459. Arellano, M., 2003. Panel Data . Oxford University Press, Oxford. Arellano, M., Bond, S.R., 1991. Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev. Econ. Stud. 58, 277–297. Arellano, M., Bover, O., 1995. Another look at the instrumental-variable estimation of error-components models. J. Economet. 68, 29–51. Barro, R.J., Sala-i-Martin, X., 1995. Economic Growth. McGraw-Hill, New York. Blundell, R., Smith, R., 1991. Initial conditions and efficient estimation in dynamic panel data models. Ann. Econ. Stat. 20/21, 109–123. Blundell, R., Bond, S., 1998. Initial conditions and moment restrictions in dynamic panel data models. J. Economet. 87, 115–143. Bond, S.R., Hoeffler, A., Temple, J., 2001. GMM Estimation of Empirical Growth Models. CEPR Discussion Paper 3048, London. Brundy, J.M., Jorgenson, D.W., 1971. Efficient estimation of simultaneous equations by instrumental variables. Rev. Econ. Stat. 53 (3), 207–224. Caselli, F., Esquivel, G., Lefort, F., 1996. Reopening the convergence debate: a new look at cross-country growth empirics. J. Econ. Growth 1, 363–389. Chamberlain, G., 1987. Asymptotic efficiency in estimation with conditional moment restrictions. J. Economet. 34, 305–334. Chamberlain, 1992. Comment: sequential moment restrictions in panel data. J. Bus. Econ. Stat. 10, 20–26.

20 Note that Π 1 η ∣ t 1 ι μ ∣ t 1 ðÞI d1E i yi ¼ pE i yi since ð1α1 ⋯αpÞd1 ¼ ðÞI Π ιp. M. Arellano / Research in Economics 70 (2016) 238–261 261

Hahn, J., 1997. Efficient estimation of panel data models with sequential moment restrictions. J. Economet. 79, 1–21. Hansen, L.P., 1985. A method of calculating bounds on the asymptotic covariance matrices of generalized method of moments estimators. J. Economet. 30, 203–238. Holtz-Eakin, D., Newey, W., Rosen, H., 1988. Estimating vector autoregressions with panel data. Econometrica 56, 1371–1395. Moral-Benito, E., 2013. Likelihood-based estimation of dynamic panels with predetermined regressors. J. Bus. Econ. Stat. 31 (4), 451–472. Newey, W.K., West, K.D., 1987. A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708. Newey, W.K., Smith, R.J., 2004. Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 72, 219–255. Nickell, S., 1981. Biases in dynamic models with fixed effects. Econometrica 49, 1417–1426. Okui, R., 2009. The optimal choice of moments in dynamic panel data models. J. Economet. 151, 1–16. Sargan, J.D., 1975. Asymptotic theory and large models. Int. Econ. Rev. 16, 75–91. West, K.D., Wilcox, D.W., 1996. A comparison of alternative instrumental variables estimators of a dynamic linear model. J. Bus. Econ. Stat. 14 (3), 281–293. West, K.D., Wong, Ka-fu, Anatolyev, S., 2009. Instrumental variables estimation of heteroskedastic linear models using all lags of instruments. Economet. Rev. 28 (5), 441–467. Zeldes, S.P., 1989. Consumption and liquidity constraints: an empirical investigation. J. Polit. Econ. 97, 305–346.