GMM Estimation and Testing

Whitney Newey

October 2007

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Idea: Estimate parameters by setting sample moments to be close to population counterpart.

Definitions:

β : p 1 parameter vector, with true value β . × 0 g (β)=g (w ,β):m 1 vector of functions of ith data observation w and paramet i i × i Model (or moment restriction):

E[gi(β0)] = 0. Definitions: n def gˆ(β) = gi(β)/n : Sample averages. iX=1 Aˆ : m m positive semi-definite matrix. × GMM : ˆ β =argmingˆ(β)0Aˆgˆ(β). β

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. GMM ESTIMATOR: ˆ β =argmingˆ(β)0Aˆgˆ(β). β

Interpretation:

Choosing βˆ so sample moments are close to zero.

For g = g Agˆ , same as minimizing gˆ(β) 0 . k kAˆ 0 k − kAˆ q When m = p,theβ ˆ with gˆ(βˆ)=0will be the GMM estimator for any Aˆ

When m>p then Aˆ matters.

Methodo f MomentsisSpecial Case:

j Moments : E[y ]=hj(β0), (1 j p), ≤ ≤p Specify moment functions : g (β)=(y h1(β),...,y hp(β)) , i i − i − 0 Estimator :ˆg(βˆ)=0same as yj = h (βˆ), (1 j p). j ≤ ≤

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Two-stage Least Squares as GMM:

Model : yi = Xi0β + εi,E[Ziεi]=0, (i =1,...,n).

Specify moment functions : g (β)=Z (y X β). i i i − i0 n Sample moments are :ˆg(β)= Z (y X β)/n = Z (y Xβ)/n i i − i0 0 − iX=1 Z =[Z1,...,Zn]0,X =[X1,...,Xn]0,y =(y1,...,yn)0.

1 Specify distance (weighting) matrix : Aˆ =(Z0Z/n)− .

ˆ 1 GMM estimator is 2SLS : β =argmin[(y Xβ)0Z/n](Z0Z/n)− Z0(y Xβ)/n β − − 1 =argmin(y Xβ)0Z(Z0Z)− Z0(y Xβ). β − −

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Intertemporal CAPM

Nonlinear in parameters. ci consumption at time i, Ri is asset return between i and i+1, α0 is time discount factor, u(c, γ0) utility function;

Ii is variables observed at time i;

Agent maximizes

∞ j E[ α0− u(ci+j,γ0)] jX=0 subject to intertemporal budget constraint.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. First-order conditions are

E[R α0 uc(c ,γ )/uc(c ,γ ) I ]=1. i · · i+1 0 i 0 | i

Marginal rate of substitution between i and i +1 equals rate of return (in expected value).

Let Z denote a m 1 vector of variables observable at time i (like lagged con- i × sumption, lagged returns, and nonlinear functions of these).

Moment function is

g (β)=Z R α uc(c ,γ )/uc(c ,γ ) 1 . i i{ i · · i+1 0 i 0 − }

Here GMM is nonlinear instrumental variables.

Empirical Example: Hansen and Singleton (1982, Econometrica).

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Dynamic Panel Data

It is a simple model that is important starting point for microeconomic (e.g. firm investment) and macroeconomic (e.g. cross-country growth) applications is

E[yit yi,t 1,yi,t 2,...,yi0,αi]=β0yi,t 1 + αi, | − − − αi is unobserved individual effect.

Microeconomic application is firm investment (with additional covariates).

Macroeconomic is cross-country growth equations.

Let ηit = yit E[yit yi,t 1,...,yi0,αi]. − | − E[yi,t jηit]=0, (1 j t, t =1, ..., T ), − ≤ ≤ E[αiηit]=0, (t =1,...,T) .

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. ∆ denote first difference, i.e. ∆yit = yit yi,t 1, so ∆yit = β0∆yi,t 1 + ∆ηit. − − − Then

E[yi,t j(∆yit β0∆yi,t 1)] = 0, (2 j t, t =1,...,T) . − − − ≤ ≤ IV moment conditions. Twice (and more) lagged levels of yit canbeusedas instruments for differenced equations.

Different instruments for different residuals.

Additional moment conditions from of αi and ηit.Theya re

E[(yiT β0yi,T 1)(∆yit β0∆yi,t 1)] = 0, (t =2, ..., T 1). − − − − − These are nonlinear.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Combine moment conditions by stacking. Let

yi0 t . gi (β)=⎛ . ⎞ (∆yit β∆yi,t 1), (t =2,...,T) , − − yi,t 2 ⎜ ⎟ ⎝ − ⎠ ∆yi2 β∆yi1 α −. gi (β)=⎛ . ⎞ (yiT βyi,T 1). − − ∆yi,T 1 β∆yi,T 2 ⎜ − − − ⎟ These moment functions⎝ can be combined as ⎠

2 T α gi(β)=(gi (β)0, ..., g i (β)0,gi (β)0)0. Here there are T (T 1)/2+(T 2) moment restrictions. − −

Ahn and Schmidt (1995, Journal of ) show that the addition of the α nonlinear moment condition gi (β) to the IV ones often gives substantial asymptotic efficiency improvements.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Arellano and Bond approach also better is small samples:

∆yi,1 t ∆yi,2 gi(β)=⎛ . ⎞ (yit βyi,t 1) . − − ⎜ ⎟ ⎜ ∆yi,t 1 ⎟ ⎜ − ⎟ Assumes that have representation⎝ ⎠

∞ ∞ yit = atjyi,t j + btjηi,t j + Cαi. − − jX=1 jX=1 Hahn, Hausman, Kuersteiner approach: Long differences

yi0 yi2 βyi1 gi(β)=⎛ −. ⎞ (yiT yi1 β(yi,T 1 yi0)] . − − − − ⎜ ⎟ ⎜ y βy ⎟ ⎜ i,T 1 i,T 2 ⎟ ⎝ − − − ⎠

Has better small sample properties by getting most of the information with fewer moment conditions.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. IDENTIFICATION:

Identification precedes estimation.

If a parameter is not identified then no consistent estimator exists.

Identification from moment functions: Let g¯(β)=E[gi(β)].

Identification condition is

β0 IS THE ONLY SOLUTION TO g¯(β)=0

Necessary condition for identification is that

m p. ≥ When m

In IV need at least as many instrumental variables as right-hand side variables.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Rank condition for identification:

Let G = E[∂gi(β0)/∂β]. Rank condition is rank(G)=p.

Necessary and sufficient for identification when gi(β) is linear in β. 0=g¯(β)=g¯(β )+G(β β )=G(β β ) 0 − 0 − 0 has a unique solution at β0 if and only if the rank of G equals the number of columns of G.

For IV G = E[Z X ] so that rank(G)=p is usual rank condition. − i i0

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. In the general nonlinear case it is difficult to specify conditions for uniqueness of the solution to g¯(β)=0. rank(G)=p will be sufficient for local identification.

There exists a neighborhood of β0 such that β0 is the unique solution to g¯(β) for all β in that neighborhood.

Exact identification is m = p.

For IV as many instruments as right-hand side variables.

Here the GMM estimator will satisfy gˆ(βˆ)=0asymptotically; see notes.

Overidentification is m>p.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. TWO STEP OPTIMAL GMM ESTIMATOR:

When m>pGMM estimator depends on weighting matrix Aˆ.

Optimal Aˆ minimizes asymptotic variance of βˆ.

n Let Ω be asymptotic variance of √ngˆ(β0)= i=1 gi(β0)/√n,i.e. d P √ngˆ(β ) N(0, Ω). 0 −→ In general, central limit theorem for time series gives Ω = lim E[ngˆ(β )ˆg(β ) ]. n 0 0 0 −→ ∞ Optimal Aˆ satisfies p Aˆ Ω 1 . −→ − Take ˆ 1 Aˆ = Ω− for Ωˆ a consistent estimator of Ω.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Estimating Ω.

Let β˜ be preliminary GMM estimator (known Aˆ).

Let g˜ = g (β˜) gˆ(β˜) i i −

If E[gi(β0)gi+(β0)0]=0thenf orsomepre n ˆ Ω = g˜ig˜i0/n. iX=1

Example is CAPM model above.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. If have autocorrelation in moment conditions then L ˆ ˆ ˆ ˆ Ω = Λ0 + wL(Λ + Λ0 ), X=1 n ˆ − Λ = g˜ig˜i0+/n. iX=1

ˆ Weights wL make Ω positive semi-definite.

Bartlett weights w =1 /(L +1), as in Newey and West (1987). L −

Choice of L.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Two step optimal GMM: ˆ ˆ 1 β =argmingˆ(β)0Ω− gˆ(β). β

Two step optimal instrumental variables: For gˆ(β)=Z (y Xβ)/n, 0 − ˆ ˆ 1 1 ˆ 1 β =(X0ZΩ− Z0X)− X0ZΩ− Z0y.

ˆ 1 Two-stage least squares versus optimal GMM: Using Ω− improves asymptotic efficiency but may be worse in small samples due to higher variability of Ωˆ, that estimates fourth moments.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. CONSISTENT ASYMPTOTIC VARIANCE ESTIMATION

Optimal two step GMM estimator has d √n(βˆ β ) N(0,V),V =(G Ω 1 G). − 0 −→ 0 −

To form asymptotic t- and confidence intervals need a consistent estimator Vˆ of V .

Let Gˆ = ∂gˆ(βˆ)/∂β. A consistent estimator of V is ˆ 1 1 Vˆ =(Gˆ0Ω− Gˆ)− .

Could also update Ωˆ.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. ASYMPTOTIC THEORY FOR GMM:

Consistency for i.i.d. data.

If the data are i.i.d. and i) E[gi(β)] = 0 if and only if β = β0 (identification); ii) the GMM minimization takes place over a compact set B containing β0; iii) gi(β) is continuous at each β with probability one and E[supβ gi(β) ] is finite; iv) p p ∈B k k Aˆ A positive definite; then βˆ β . → → 0 Proof in Newey and McFadden (1994).

p Idea: gˆ( β ) 0 by law of large numbers. 0 → p Therefore gˆ(β ) Agˆ (β ) 0. 0 0 0 → Also gˆ(βˆ) Aˆgˆ(βˆ) gˆ(β ) Agˆ (β ), so 0 ≤ 0 0 0 p gˆ(βˆ) Agˆ (βˆ) 0. 0 → p Theonlyway this canhappenisifβ ˆ β . → 0

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Asymptotic normality:

p If the data are i.i.d., βˆ β and i) β is in the interior of the parameter set over → 0 0 which minimization occurs; ii) gi(β) is continuously differentiable on a neighbor- ˆ p hood N of β0 iii) E[supβ ∂gi(β)/∂β ] is finite; iv) A A and G0AG is ∈N k k → nonsingular, for G = E[∂gi(β0)/∂β];v) Ω = E[gi(β0)gi(β0)0] exists, then d √n(βˆ β ) N(0,V ),V =(G AG) 1 G AΩAG(G AG) 1 . − 0 −→ 0 − 0 0 −

See Newey and McFadden (1994) for the proof.

Can give derivation:

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Let Gˆ = ∂gˆ(βˆ)/∂β.

First order conditions are ˆ 0=Gˆ0Aˆgˆ(β)

ˆ Expand gˆ(β) around β0 to obtain gˆ(βˆ)=gˆ(β )+G¯(βˆ β ), 0 − 0

¯ ¯ ¯ ˆ where G = ∂gˆ(β)/∂β and β lies on the line joining β and β0, and actually differs from row to row of G¯.

Substitute this back in first order conditions to get

0=Gˆ Aˆgˆ(β )+Gˆ AˆG¯(βˆ β ), 0 0 0 − 0

Solve for βˆ β and mulitply through by √n to get − 0 1 √n(βˆ β )= Gˆ AˆG¯ − Gˆ Aˆ√ngˆ(β ). − 0 − 0 0 0 ³ ´

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Recall 1 √n(βˆ β )= Gˆ AˆG¯ − Gˆ Aˆ√ngˆ(β ). − 0 − 0 0 0 ³ ´

By central limit theorem d √ngˆ(β ) N(0, Ω) 0 −→

p p p Also we have Aˆ A, Gˆ G, G¯ G, so by the continuous mapping −→ −→ −→ theorem, 1 p 1 Gˆ AˆG¯ − Gˆ Aˆ G AG − G A 0 0 −→ 0 0 ³ ´ ³ ´

Then by the Slutzky lemma.

d 1 √n(βˆ β ) G AG − G AN(0, Ω)=N (0,V ). − 0 −→ − 0 0 ³ ´

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 1 The fact that A = Ω− minimizes the asymptotic varince follows from the Gauss Markov Theorem.

Consider a linear model.

E[Y ]=Gδ, V ar(Y )=Ω.

1 1 (G0Ω− G)− is the variance of generalized least squares (GLS) in this model.

1 1 ˆ 1 V =(G0AG)− G0AΩAG(G0AG)− is the variance of δ =(G0AG)− G0AY .

GLS has smallest variance by Gauss-Markov theorem,

V (G Ω 1G) 1 is p.s.d. − 0 − −

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. TESTING IN GMM:

Overidentification test statistic is ˆ ˆ 1 ˆ ngˆ(β)0Ω− gˆ(β).

Limiting distribution is chi-squared with m p degrees of freedom. −

Test only of overidentifying restrictions.

What would value be if m = p.

"Use up" p restrictions in estimating β.

Will have no power against some misspecification which biases βˆ.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. 1 2 Testing subsets of moment conditions. Partition gi(β)=(gi (β)0,gi (β)0)0 and Ω 1 conformably. H0 : E[gi (β)] = 0. One simple test is ˆ 1 2 ˆ 1 2 Tˆ1 =minn gˆ(β)0Ω− gˆ(β) min ngˆ (β)0Ω− gˆ (β). β − β 22 2 1 Asymptotic distribution of this is χ (m1) where m1 is the dimension of gˆ (β).

Another version is 1 ˜ 1 1 ng˜ 0Σ− g˜ , 1 1 1 2 1 where g˜ =ˆg (βˆ) Ωˆ 12Ωˆ − gˆ (βˆ) and Σ˜ is estimator of asymptotic variance − 22 − of √ng˜1.

Hausman test: If m1 p then, except in degenerate cases, Hausman test based ≤ on the difference of the optimal two-step GMM estimator using all the moment conditions and using just gˆ2(β) will be asymptotically equivalent to this test, for any m1 parameters.

See Newey (1985, GMM Specification Testing, Journal of Econometrics.)

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Tests of H0 : s(β)=0.

˜ ˆ 1 Let β =argmins(β)=0 ngˆ(β)0Ω− gˆ(β) be restricted estimator.

Wald test statistic: ˆ ˆ ˆ 1 1 ˆ 1 ˆ W = ns(β)0[∂s(β)/∂β(Gˆ0Ω− Gˆ)− ∂s(β)/∂β]− s(β).

Likelihood ratio (sum of squared residuals test statistic).

LR = ngˆ(β˜) Ωˆ 1gˆ(β˜) ngˆ(βˆ) Ωˆ 1gˆ(βˆ). 0 − − 0 −

Lagrange mulitplier test statistic: For G˜ = ∂gˆ(β˜)/∂β,

˜ ˆ 1 ˆ 1 1 ˆ 1 ˜ LM = ngˆ(β)0Ω− G˜(G˜0Ω− G˜)− G˜0Ω− gˆ(β).

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. ADDING MOMENT CONDITIONS:

Improves asymptotic efficiency but can lead to bias in two step GMM estimator. More on bias next time.

Efficiency improvment occurs because optimal weighting matrix for fewer moment conditions is not optimal for all the moment conditions.

1 2 Suppose that gi(β)=(gi (β)0,gi (β)0)0. Then the optimal GMM estimator for just 1 the first set of moment conditions gi (β) corresponds to a weighting matrix for all the moment conditions of the form

(Ωˆ ) 1 0 Aˆ = 11 − , Ã 0 0 ! ˆ 1 Not generally optimal for the entire moment function vector gi(β),whereΩ − is optimal.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Example: model E[y X ]=X β . i| i i0 0 Least squares estimator is GMM with g1(β)=X (y X β). Let ρ (β)=y i i i − i0 i i − Xi0β0.

More efficient estimator obtained by adding g2(β)=a(X )(y X β) for some i i i − i0 (m p) 1 vector of functions a(X).GMMf or − × Xi gi(β)= (yi Xi0β) Ã a(Xi) ! − is more efficient than least squares with heteroskedasticity.

Example: Randomly missing data. Linear regression again. Some variables missing at random. Wi always observed.

Let ∆i =1denote a complete data indicator, equal to 1 if all variables observed, equal to zero otherwise. Complete data moment functions: g1(β)=∆ X (y X β). i i i i − i0 Add g2(η)=(∆ η)a(W ). i i − i

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. No reduction in asymptotic variance when additional moment conditions are exactly identified.

Adding moment conditions lowers asymptotic variance but:

1) Increases small sample bias (with endogeneity present).

2) Increases small sample variance.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. CONDITIONAL MOMENT RESTRICTIONS:

ρ (β)=ρ(w ,β) a r 1 residual vector. Instruments z such that i i × i E[ρ (β ) z ]=0 i 0 | i Let F (z ) be an m r matrix of functions of z (instrumental variables). Let i × i gi(β)=F (zi)ρi(β). E[g (β )] = E[F (z )E[ρ (β ) z ]] = 0. i 0 i i 0 | βi Can form GMM estimator using gi(β)=F (zi)ρi(β). Leads to nonlinear IV esti- mator. Sargan (1958, 1959).

Optimal choice of F (z):Let D(z)=E[∂ρ (β )/∂β z = z], Σ(z)=E[ρ (β )ρ (β ) z = z]. i 0 | i i 0 i 0 0| i Asymptotic variance minimizing F (z) is 1 F ∗(z)=D(z)0Σ(z)− . Minimizes variance over all F (z ) a weighting matrices A. Note F (z) is p r, so i ∗ × g (β)=F (z )ρ (β) is p 1 (exact identification). i ∗ i i ×

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Examples:

Heteroskedastic Linear Regression: E[y X ]=X β and let ρ (β)=y X β. i| i i0 0 i i − i0 GMM estimator has g (β)=F (X )(y X β). i i i − i0 Here z = X and ∂ρ (β)/∂β = X ,sothat i i − i D(z )= E[X z ]= X , Σ(z )=E[ρ (β )2 X ]=Var(y X )=σ2 . i − i| i − i i i 0 | i i| i i

Optimal instruments: Xi F ∗(zi)=− 2 . σi

GMM estimator solves Xi 0= (yi X β). σ2 − i0 Xi i This is ______.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Homoskedastic Linear Simultaneous Equation:

Here again ρ (β)=y X β but now z is not X ; z are instruments. i i − i0 i i i

Assume E[ρ (β )2 z ]=σ2 is constant. i 0 | i

Here D(z )= E[X z ] is the reduced form. i − i| i

The optimal instruments in this example are D(z ) F = − i = σ 2E[X z ]. σ2 − − i| i

Here the reduced form may be nonlinear.

In general optimal instruments combine heteroskedasticity weighting with nonlinear reduced form.

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Optimality proof:

Let Fi = F (zi) be instruments for some GMM estimator with moment function gi(β)=F (zi)ρi(β), Fi∗ = F ∗(zi) be optimal instruments, and ρi = ρi(β0).

Iterated expectations gives

G = E[Fi∂ρi(β0)/∂β]=E[FiD(zi)] = E[FiΣ(zi)Fi∗0]=E[Fiρiρi0Fi∗0].

For any weighting matrix A define hi = G0AFiρi.Leth i∗ = Fi∗ρi.Then

G0AG = G0AE[Fiρihi∗ 0]=E[hihi∗0],G0AΩAG = E[hihi0].

For Fi = Fi∗ we have G = Ω = E[hi∗hi∗0], so asymptotic variance of estimator 1 with F (z)=F ∗ is (E[hi∗hi∗0])− .

Variance of GMM estimator with F and A minus variance of estimator with F ∗ is 1 (G AG) 1G AΩAG(G AG) 1 E[h h ] − 0 − 0 0 − − i∗ i∗ 0 1 ³ ´ 1 1 = E[h h ] − E[h h ] E[h h ] E[h h ] − E[h h ] E[h h ] − . i i∗0 { i i0 − i i∗ 0 i∗ i∗0 i∗ i0 } i∗ i0 ³ ´ ³ ´ ³ ´

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY]. Approximating the optimal moment functions

1 For given F (z) the GMM estimator with optimal weight matrix A = Ω− is approximation to the optimal estimator. For simplicity consider one residual, one parameter case, r = p =1.

Let gi = Fiρi, as above, G = E[gihi∗ 0],sothat 1 1 G0Ω− = E[hi∗gi0](E[gigi0])− . 1 G0Ω− are the coefficients of the population regression of hi∗ on gi. First-order conditions for GMM n ˆ ˆ 1 ˆ ˆ ˆ 1 ˆ 0=G 0Ω− gˆ(β)=G 0Ω− F (zi)ρi(β)/n, i=1 Lowest mean-square error approximation to X n

0= Fi∗ρi(β)/n. iX=1

Thus, if F (z)=(a1m(z), ...., amm(z))0 such that linear combinations 2 2 min E[ hi∗ π0gi ]= min E[Σ(zi) Fi∗ π0F (zi) ] 0, π1,...,πm { − } π1,...,πm { − } −→ as m then get variance of GMM approaches lower bound. An important −→ ∞ issueforpracticeist hechoiceofm .

Cite as: Whitney Newey, course materials for 14.385 Nonlinear Econometric Analysis, Fall 2007. MIT OpenCourseWare (http://ocw.mit.edu), Massachusetts Institute of Technology. Downloaded on [DD Month YYYY].