Nonparametric Two-Step Sieve M Estimation and Inference
Jinyong Hahn (UCLA)  Zhipeng Liao (UCLA)  Geert Ridder (USC)
First Version: April 2012; This version: March 2016
Abstract
This paper studies two-step sieve M estimation of general semi/nonparametric models, where the second step involves sieve estimation of unknown functions that may use the nonparametric estimates from the first step as inputs, and the parameters of interest are functionals of the unknown functions estimated in both steps. We establish the asymptotic normality of the plug-in two-step sieve M estimate of a functional that could be root-n estimable. The asymptotic variance may not have a closed form expression, but it can be approximated by a sieve variance that characterizes the effect of the first-step estimation on the second-step estimates. We provide a simple consistent estimate of the sieve variance and hence a Wald type inference based on the Gaussian approximation. The finite sample performance of the two-step estimator and the proposed inference procedure is investigated in a simulation study.
JEL Classification: C14, C31, C32
Keywords: Two-Step Sieve Estimation; Nonparametric Generated Regressors; Asymptotic Normality; Sieve Variance Estimation
1 Introduction
Many recently introduced empirical methodologies adopt semiparametric two-step estimation approaches, where certain functions are estimated nonparametrically in the first step, and some Euclidean parameters are estimated parametrically in the second step using the nonparametric estimates from the first stage as
We gratefully acknowledge insightful comments from Xiaohong Chen, who was a co-author of the initial version. We appreciate useful suggestions from Liangjun Su, the coeditor and three anonymous referees. All errors are the responsibility of the authors.
Department of Economics, UCLA, Los Angeles, CA 90095-1477 USA. Email: [email protected]
Department of Economics, UCLA, Los Angeles, CA 90095-1477 USA. Email: [email protected]
Department of Economics, University of Southern California, Los Angeles, CA 90089. Email: [email protected].
inputs. Large sample properties such as the root-n asymptotic normality of the second step parametric estimators are well established in the literature. See, e.g., Newey (1994), Chen, Linton and van Keilegom (2003), and Ichimura and Lee (2010). Despite the mature nature of the literature, the mathematical framework employed by the existing general theory papers is not easily applicable to situations where some of the regressors need to be estimated. Many estimators that utilize so-called control variables may be incorrectly analyzed if this mathematical framework is adopted without proper modification, because the control variables need to be estimated in many applications. See Hahn and Ridder (2013). Estimators using control variables are often based on the intuition that certain endogeneity problems may be overcome by conditioning on such variables. The control variables need to be estimated in practice, often by semiparametric or nonparametric methods. The second step estimation usually takes the form of a semiparametric or nonparametric estimation as well. See, e.g., Olley and Pakes (1996), Newey, Powell and Vella (1999), Lee (2007), and Newey (2009). Such estimators are becoming increasingly important in the literature, yet their asymptotic properties have been established only on a case-by-case basis. In this paper, we establish statistical properties of the two-step estimators in an extended mathematical framework that can address the generated regressor problem.

We present a mathematical framework general enough to nest both the previous two-step estimation literature and the literature on control variables. This goal is achieved by investigating statistical properties of nonparametric two-step sieve M estimation, where the second step may involve estimation of some infinite dimensional parameters, and the nonparametric components in the second step may use some estimated value from the first step as input.
The parameters of interest are functionals of unknown functions estimated in both steps. The estimation procedure considered in the paper formally consists of the following steps. In the first step, we maximize $\sum_{i=1}^{n}\varphi(Z_{1,i};h)$ with respect to a (possibly infinite dimensional) parameter $h$. Letting $\hat{h}_n$ denote the maximizer in the first step, we then maximize $\sum_{i=1}^{n}\psi(Z_{2,i};g,\hat{h}_n)$ with respect to yet another (possibly infinite dimensional) parameter $g$ and get the second-step estimator $\hat{g}_n$. Let $\phi(h_o,g_o)$ denote the parameter of interest, where the functional $\phi(\cdot)$ is known. We finally estimate $\phi(h_o,g_o)$ by the simple plug-in two-step sieve M estimator $\phi(\hat{h}_n,\hat{g}_n)$. We allow the first step function $h$ to enter into the second step function $g$ as an argument, as is common in many procedures using estimated control variables. By establishing statistical properties of such sieve two-step estimators, we contribute toward a general theory of semi/nonparametric two-step estimation.

Our primary concern is the practical aspects of inference. We show that the numerical equivalence results in the literature, i.e., Newey (1994, Section 6) and Ackerberg, Chen and Hahn (2012), continue to hold in the general framework that nests estimators that use estimated control variables. Newey (1994,
Section 6) and Ackerberg, Chen and Hahn (2012) consider a framework that rules out the estimated control variable problem, and show that in the context of sieve/series estimation, practitioners may assume that their model is parametric when a consistent estimator of the asymptotic variance is desired. In other words, practitioners can ignore the infinite dimensional nature of the problem and proceed with the standard parametric procedure proposed by Newey (1984) or Murphy and Topel (1985). We show that practitioners may continue to adopt such a convenient (yet incorrectly specified) parametric model even for more general estimators that require dealing with estimated control variables. To be more
precise, our consistent estimator of the asymptotic variance of $\phi(\hat{h}_n,\hat{g}_n)$ turns out to be identical to a consistent estimator of the asymptotic variance derived under such a parametric interpretation. Because the asymptotic variance of $\phi(\hat{h}_n,\hat{g}_n)$ often takes a very complicated form, such numerical equivalence is expected to facilitate inference in practice.

The rest of this paper is organized as follows. Section 2 introduces the nonparametric two-step sieve
M estimation and presents several examples. Section 3 establishes the asymptotic normality of $\phi(\hat{h}_n,\hat{g}_n)$.
Section 4 proposes a consistent sieve estimator of the asymptotic variance of $\phi(\hat{h}_n,\hat{g}_n)$. Section 5 studies a nonparametric two-step regression example to illustrate the high-level conditions in Sections 3 and 4. Section 6 presents simple numerically equivalent ways of calculating the sieve variance estimator, which is the main result of the paper. Section 7 reports the results of a simulation study of the finite sample performance of the two-step nonparametric estimator and the proposed inference procedure. Section 8 concludes. A simple cross-validation method for selecting the tuning parameters in the two-step nonparametric regression is provided in Appendix A. Appendix B contains the proof of the normality theorem. The proofs of the results in Sections 4 and 6 are gathered in Appendix C. Sufficient conditions for the main theorem of the two-step nonparametric regression example are available in Appendix D. This theorem is proved using several lemmas which are included in the Supplemental Appendix of the paper. The consistency and the convergence rate of the second-step sieve M estimator $\hat{g}_n$, which are not the main focus of this paper, are also presented in the Supplemental Appendix.
2 Two-step Sieve M Estimation
In this section we introduce some notation for the two-step nonparametric sieve M estimation and present several examples which are covered by our general setup. In the first step, we use the random vectors $Z_{1,i}$, $i=1,\ldots,n$, to identify and estimate a potentially infinite dimensional parameter $h_o$. In the second step, we use the random vectors $Z_{2,i}$, $i=1,\ldots,n$, to identify and estimate a potentially infinite dimensional parameter $g_o$. Let $Z_i$ be the union of the distinct elements of $Z_{1,i}$ and $Z_{2,i}$, and assume that each $Z_i$ has the same distribution $F_Z$. We will assume that $\{Z_i\}_{i=1}^{n}$ is a sequence of strictly stationary weakly dependent data.
The two-step estimation is based on the identifying assumption that $h_o \in \mathcal{H}$ is the unique solution to $\sup_{h\in\mathcal{H}} E[\varphi(Z_1;h)]$ and $g_o \in \mathcal{G}$ is the unique solution to $\sup_{g\in\mathcal{G}} E[\psi(Z_2;g,h_o)]$, where $(\mathcal{H},\|\cdot\|_{\mathcal{H}})$ and $(\mathcal{G},\|\cdot\|_{\mathcal{G}})$ are some infinite dimensional separable complete metric spaces, and $\|\cdot\|_{\mathcal{H}}$ and $\|\cdot\|_{\mathcal{G}}$ could be the sup-norm, the $L_2(dF_Z)$-norm, the Sobolev norm, or the Hölder norm. The two-step (approximate) sieve M estimation utilizes the sample analog of the identifying assumption.
tion. We restrict our estimators of ho and go to be in some …nite dimensional sieve spaces n and n H G respectively, where the dimension of each space grows to in…nity as a function of the sample size n. In
the …rst step, we estimate ho by hn n de…ned as 2 H 2 H n b n 1 1 2 ' Z1;i; hn sup ' (Z1;i; h) Op(" ); (1) n n 1;n i=1 h n i=1 X 2H X b in the second step, we estimate go by gn n de…ned as 2 G 2 G n n 1 b 1 2 Z2;i; gn; hn sup Z2;i; g; hn Op(" ); (2) n n 2;n i=1 g n i=1 X 2G X b b b where n K(n) and n L(n) are the sieve spaces that become dense in ( ; ) and ( ; ) H H G G H kkH G kkG as L(n) dim( n) and K(n) dim( n) respectively. The magnitudes of the optimization H ! 1 G ! 1 errors "2 and "2 are small positive numbers that go to zero as n . See, e.g., Chen (2007) for many 1;n 2;n ! 1 examples of sieve spaces and criterion functions for sieve M estimation. Below are a few examples in the literature that adopt such a two-step sieve M estimation strategy.
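As a concrete illustration of steps (1)-(2), the sketch below simulates a two-step sieve M estimation in which both criterion functions are quadratic, so each sup is attained exactly by a least squares projection (the optimization errors $\varepsilon_{1,n}$ and $\varepsilon_{2,n}$ are zero). The data generating process, the power-series sieves, and the functional $\phi(h_o,g_o)=g_o(0)$ are our own assumptions for illustration, not part of the general theory.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L_n, K_n = 2000, 4, 4                     # sample size and sieve dimensions L(n), K(n)

x = rng.uniform(-1, 1, n)
s = np.sin(x) + rng.normal(0, 0.3, n)        # s = h_o(x) + eps, with h_o = sin
y = np.cos(s - np.sin(x)) + rng.normal(0, 0.3, n)   # y = g_o(eps) + u, with g_o = cos

def power_basis(v, dim):
    """Power-series sieve basis [1, v, v^2, ...] of dimension dim."""
    return np.vander(v, dim, increasing=True)

# Step 1: h_n maximizes -(1/n) sum (s_i - h(x_i))^2 over H_n  (quadratic => OLS)
Rn = power_basis(x, L_n)
pi_hat = np.linalg.lstsq(Rn, s, rcond=None)[0]
h_hat = lambda v: power_basis(v, L_n) @ pi_hat

# Step 2: g_n maximizes -(1/n) sum (y_i - g(s_i - h_n(x_i)))^2 over G_n,
# using the first-step estimate h_n as an input (generated regressor)
eps_hat = s - h_hat(x)
Pn = power_basis(eps_hat, K_n)
beta_hat = np.linalg.lstsq(Pn, y, rcond=None)[0]
g_hat = lambda v: power_basis(v, K_n) @ beta_hat

# Plug-in estimate of the (assumed) functional phi(h_o, g_o) = g_o(0) = 1
phi_hat = g_hat(np.array([0.0]))[0]
```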
Example 2.1 (Nonparametric Triangular Simultaneous Equation Model) Newey, Powell and Vella (1999) study a nonparametric triangular simultaneous equation model. Let $\{(y_i,x_i,z_{1,i})\}_{i=1}^{n}$ be a random sample generated according to the model
$$y = m_o(x,z_2) + \epsilon, \quad E[\epsilon\mid u,z_1] = E[\epsilon\mid u],$$
$$x = h_o(z_1) + u, \quad E[u\mid z_1] = 0,$$
where $h_o \in \mathcal{H}$ and $m_o \in \mathcal{M}$, $z_2 \subseteq z_1$. This yields the identifying relation $E[y\mid x,z_1] = m_o(x,z_2) + \lambda_o(u)$, where $\lambda_o(u) \equiv E[\epsilon\mid u] \in \Lambda = \{\lambda: E[\lambda(u)^2] < \infty,\ \lambda(\bar{u}) = \bar{\lambda}\}$; here $\lambda(\bar{u}) = \bar{\lambda}$ (for known constants $\bar{u}$, $\bar{\lambda}$) is a location normalization. Their model specification implies that $h_o$ is identified as a solution to the infinite dimensional least squares problem
$$\sup_{h\in\mathcal{H}} E\left[-(x-h(z_1))^2\right].$$
Given $h_o$, the $g_o = (m_o,\lambda_o)$ is a solution to another infinite dimensional least squares problem
$$\sup_{m\in\mathcal{M},\,\lambda\in\Lambda} E\left[-(y-m(x,z_2)-\lambda(x-h_o(z_1)))^2\right].$$
Newey, Powell and Vella (1999) estimate $h_o \in \mathcal{H}$ and $g_o = (m_o,\lambda_o) \in \mathcal{G}$ by a two-step series least squares regression. In the first step, they compute
$$\hat{h}_n = \arg\max_{h\in\mathcal{H}_n} -\frac{1}{n}\sum_{i=1}^{n}(x_i-h(z_{1,i}))^2;$$
in the second step, they compute
$$\hat{g}_n = (\hat{m}_n,\hat{\lambda}_n) = \arg\max_{m\in\mathcal{M}_n,\,\lambda\in\Lambda_n} -\frac{1}{n}\sum_{i=1}^{n}\left(y_i-m(x_i,z_{2,i})-\lambda(x_i-\hat{h}_n(z_{1,i}))\right)^2,$$
where $\mathcal{H}_n$ and $\mathcal{G}_n = \mathcal{M}_n \times \Lambda_n$ are finite dimensional linear sieves (or series). Let $\phi(\cdot)$ be a linear functional of $g$. Newey, Powell and Vella (1999) establish the asymptotic normality of the plug-in sieve estimate $\phi(\hat{g}_n)$ of their parameter of interest $\phi(g_o)$.
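The following sketch mimics this two-step series least squares procedure on simulated data: the second-step design matrix stacks series terms in $x$ next to series terms in the estimated control variable $\hat{u}_i = x_i - \hat{h}_n(z_{1,i})$. The data generating process, the power-series bases, and the handling of the location normalization (dropping the constant from the $\lambda$ block) are our own simplifying assumptions, not the original authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, L_n, K_m, K_lam = 2000, 4, 4, 4

z1 = rng.uniform(-1, 1, n)
u = rng.normal(0, 0.5, n)
x = z1 + 0.5 * z1**2 + u                     # x = h_o(z1) + u
e = 0.5 * u + rng.normal(0, 0.3, n)          # E[e | u, z1] = E[e | u] = 0.5 u
y = x**2 + e                                 # y = m_o(x) + e, with m_o(x) = x^2

def basis(v, dim):
    return np.vander(v, dim, increasing=True)

# Step 1: series LS of x on z1 gives h_n and the control variable u_hat
h_coef = np.linalg.lstsq(basis(z1, L_n), x, rcond=None)[0]
u_hat = x - basis(z1, L_n) @ h_coef

# Step 2: series LS of y on [series in x, series in u_hat]; dropping the
# constant from the lambda block is a crude location normalization
design = np.hstack([basis(x, K_m), basis(u_hat, K_lam)[:, 1:]])
g_coef = np.linalg.lstsq(design, y, rcond=None)[0]

# m_n evaluated at x = 1 (true value m_o(1) = 1, up to the normalization)
m_hat_at_1 = basis(np.array([1.0]), K_m)[0] @ g_coef[:K_m]
```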
Example 2.2 (Nonparametric sample selection with endogeneity) Das, Newey and Vella (2003) study a nonparametric sample selection model with endogeneity. Let $\{(y_i,d_i,x_i,z_{1,i})\}_{i=1}^{n}$ be a random sample generated according to the model
$$y = d\cdot y^*, \quad y^* = m_o(x,z_2) + \epsilon,$$
$$x = h_{o1}(z_1) + u, \quad E[u\mid z_1] = 0, \quad \Pr(d=1\mid z_1) = h_{o2}(z_1),$$
where $h_{o1}\in\mathcal{H}_1$, $h_{o2}\in\mathcal{H}_2$ and $m_o\in\mathcal{M}$, $z_2\subseteq z_1$, and $y^*$ is not directly observable. Das, Newey and Vella (2003) impose the restriction $E[\epsilon\mid u,z_1,d=1] = \lambda_o(h_{o2},u)$ for some $\lambda_o\in\Lambda$. This gives us the identifying relation $E[y\mid x,z_1,d=1] = m_o(x,z_2) + \lambda_o(h_{o2},u) \in \mathcal{G} = \mathcal{M} + \Lambda$. Their model specification implies that
$h_o = (h_{o1},h_{o2})$ is identified as a solution to
$$\sup_{h_1\in\mathcal{H}_1} E\left[-(x-h_1(z_1))^2\right] \quad\text{and}\quad \sup_{h_2\in\mathcal{H}_2} E\left[-(d-h_2(z_1))^2\right].$$
Given $h_o = (h_{o1},h_{o2})$, the $g_o = m_o + \lambda_o$ is a solution to the infinite dimensional least squares problem
$$\sup_{g=m+\lambda\in\mathcal{G}} E\left[-(y-m(x,z_2)-\lambda(h_{o2}(z_1),x-h_{o1}(z_1)))^2\right].$$
Das, Newey and Vella (2003) estimate $h_o = (h_{o1},h_{o2})$ and $g_o = m_o + \lambda_o$ by a two-step series least squares regression, which exactly fits into our two-step sieve M estimation procedure. Let $\phi(\cdot)$ be a linear functional of $g$. Das, Newey and Vella (2003) establish the asymptotic normality of the plug-in sieve estimate $\phi(\hat{g}_n)$ of their parameter of interest $\phi(g_o)$.
Example 2.3 (Production Function) Olley and Pakes (1996) consider estimation of the Cobb-Douglas production function
$$y_{it} = \beta_0 + \beta_k k_{it} + \beta_l l_{it} + \omega_{it} + \eta_{it}, \quad t=1,2,$$
where $y_{it}$, $k_{it}$, $l_{it}$ denote the (logs of) output, capital and labor inputs, respectively. The $\omega_{it}$ denotes an unobserved productivity index that follows a first-order Markov process, and $\eta_{it}$ can be viewed as a measurement error. Assuming away the potential invertibility problem of the investment demand function, they show that $\beta_l$ is identified as the solution to the semiparametric problem
$$\min_{\beta_l,\varphi_1} E\left[(y_{i1}-\beta_l l_{i1}-\varphi_1(i_{i1},k_{i1}))^2\right],$$
where $\varphi_1$ is an infinite dimensional parameter. Given $(\beta_l,\varphi_1)$, they show that $\beta_k$ is identified as the solution to the semiparametric problem
$$\min_{\beta_k,\mu} E\left[(y_{i2}-\beta_l l_{i2}-\beta_k k_{i2}-\mu(\varphi_1(i_{i1},k_{i1})-\beta_k k_{i1}))^2\right],$$
where $\mu$ is an infinite dimensional parameter.
3 Asymptotic Normality
In this section, we establish the asymptotic normality of the plug-in two-step sieve M estimator. We characterize the asymptotic variance of the estimator, and in the next section we propose an estimator of the asymptotic variance that is its sample analog. As is typical in the literature on sieve estimation, our asymptotic normality result is predicated on certain consistency and rate of convergence results.
Let $\hat{\alpha}_n = (\hat{h}_n,\hat{g}_n)$ and $\alpha_o = (h_o,g_o)$. We assume that the convergence rates of $\hat{h}_n$ and $\hat{g}_n$ are $\delta_{1,n}$ and $\delta_{2,n}$ under $\|\cdot\|_{\mathcal{H}}$ and $\|\cdot\|_{\mathcal{G}}$ respectively, where $\{\delta_{j,n}\}$ is a positive sequence such that $\delta_{j,n} = o(1)$ for $j=1,2$. Let $\delta_{j,n}^* = \delta_{j,n}\log(\log(n))$ $(j=1,2)$. Then $\hat{\alpha}_n$ belongs to the shrinking neighborhood $\mathcal{N}_n = \{(h,g): h\in\mathcal{N}_{h,n} \text{ and } g\in\mathcal{N}_{g,n}\}$ with probability approaching 1 (wpa1), where $\mathcal{N}_{h,n} = \{h\in\mathcal{H}_n: \|h-h_o\|_{\mathcal{H}} \leq \delta_{1,n}^*\}$ and $\mathcal{N}_{g,n} = \{g\in\mathcal{G}_n: \|g-g_o\|_{\mathcal{G}} \leq \delta_{2,n}^*\}$.

We suppose that for all $h\in\mathcal{N}_{h,n}$, $\varphi(Z_1,h) - \varphi(Z_1,h_o)$ can be approximated by $\Delta\varphi(Z_1,h_o)[h-h_o]$ such that $\Delta\varphi(Z_1,h_o)[h-h_o]$ is linear in $h-h_o$. As $h_o$ is the unique maximizer of $E[\varphi(Z_1,h)]$ on $\mathcal{H}$, we can let
$$\|h-h_o\|_{\varphi}^2 \equiv -\frac{\partial E[\Delta\varphi(Z_1,h_o+\tau(h-h_o))[h-h_o]]}{\partial\tau}\bigg|_{\tau=0},$$
which defines a norm on $\mathcal{N}_{h,n}$. Let $\mathcal{V}_1$ be the closed linear span of $\mathcal{N}_{h,n} - \{h_o\}$ under $\|\cdot\|_{\varphi}$, which is a Hilbert space under $\|\cdot\|_{\varphi}$, with the corresponding inner product $\langle\cdot,\cdot\rangle_{\varphi}$ defined as
$$\langle v_{h_1},v_{h_2}\rangle_{\varphi} = -\frac{\partial E[\Delta\varphi(Z_1,h_o+\tau v_{h_2})[v_{h_1}]]}{\partial\tau}\bigg|_{\tau=0} \quad (3)$$
for any $v_{h_1},v_{h_2}\in\mathcal{V}_1$. In many examples, we have
$$\Delta\varphi(Z_1,h_o)[v_{h_1}] = \frac{\partial\varphi(Z_1,h_o+\tau v_{h_1})}{\partial\tau}\bigg|_{\tau=0} \quad\text{and}\quad \langle v_{h_1},v_{h_2}\rangle_{\varphi} = -E\left[\frac{\partial\Delta\varphi(Z_1,h_o+\tau v_{h_2})[v_{h_1}]}{\partial\tau}\bigg|_{\tau=0}\right],$$
if the derivative exists and we can interchange differentiation and expectation. We assume that there is a linear functional $\partial_h\phi(\alpha_o)[\cdot]: \mathcal{V}_1 \to \mathbb{R}$ such that
$$\partial_h\phi(\alpha_o)[v] = \frac{\partial\phi(h_o+\tau v,g_o)}{\partial\tau}\bigg|_{\tau=0} \quad\text{for all } v\in\mathcal{V}_1.$$
Let $h_{o,n}$ denote the projection of $h_o$ on $\mathcal{H}_n$ under the norm $\|\cdot\|_{\varphi}$. Let $\mathcal{V}_{1,n}$ denote the Hilbert space generated by $\mathcal{N}_{h,n} - \{h_{o,n}\}$. Then $\dim(\mathcal{V}_{1,n}) = L(n) < \infty$. By the Riesz representation theorem, there is a sieve Riesz representer $v_{h_n}^* \in \mathcal{V}_{1,n}$ such that
$$\partial_h\phi(\alpha_o)[v] = \langle v_{h_n}^*,v\rangle_{\varphi} \ \text{ for all } v\in\mathcal{V}_{1,n}, \quad\text{and}\quad \|v_{h_n}^*\|_{\varphi}^2 = \sup_{0\neq v\in\mathcal{V}_{1,n}}\frac{|\partial_h\phi(\alpha_o)[v]|^2}{\|v\|_{\varphi}^2}. \quad (4)$$
Moreover, $\partial_h\phi(\alpha_o)[\cdot]: \mathcal{V}_1 \to \mathbb{R}$ is a bounded (or regular) functional if and only if $\lim_{L(n)\to\infty}\|v_{h_n}^*\|_{\varphi} < \infty$.
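When $\langle\cdot,\cdot\rangle_{\varphi}$ reduces to an $L_2$ inner product (as in least squares first steps) and $\mathcal{V}_{1,n}$ is spanned by a basis $R(\cdot)$, the representer in (4) is available in closed form: writing $v = R(\cdot)'c$, the coefficients $c^*$ of $v_{h_n}^*$ solve $Qc^* = F$ with $Q = E[R(x)R(x)']$ and $F_j = \partial_h\phi(\alpha_o)[r_j]$, and $\|v_{h_n}^*\|_{\varphi}^2 = F'Q^{-1}F$. The sketch below checks the sup characterization in (4) numerically; the uniform design, the power basis, and the evaluation functional are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200_000)          # large sample to approximate E[.]
L_n = 4
R = np.vander(x, L_n, increasing=True)   # R(x) = [1, x, x^2, x^3]'
Q = R.T @ R / x.size                     # Gram matrix approximating E[R R']

# Example functional (our assumption): d_h phi(alpha_o)[v_h] = v_h(0.5)
F = np.vander(np.array([0.5]), L_n, increasing=True)[0]

c_star = np.linalg.solve(Q, F)           # representer coefficients Q^{-1} F
norm_sq = F @ c_star                     # ||v*||_phi^2 = F' Q^{-1} F

# Check the sup characterization in (4) over random directions v in V_{1,n}:
# |d_h phi[v]|^2 / ||v||_phi^2 <= ||v*||_phi^2, with equality at v = v*
for c in rng.normal(size=(100, L_n)):
    ratio = (F @ c) ** 2 / (c @ Q @ c)
    assert ratio <= norm_sq + 1e-8
```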
Similarly, we use $\Delta\psi(Z_2,g_o,h_o)[g-g_o]$ to denote the linear approximation of $\psi(Z_2,g,h_o) - \psi(Z_2,g_o,h_o)$ for any $g\in\mathcal{N}_{g,n}$. As $g_o$ is the unique maximizer of $E[\psi(Z_2,g,h_o)]$ on $\mathcal{G}$, we can let
$$\|g-g_o\|_{\psi}^2 \equiv -\frac{\partial E[\Delta\psi(Z_2,g_o+\tau(g-g_o),h_o)[g-g_o]]}{\partial\tau}\bigg|_{\tau=0},$$
which defines a norm on $\mathcal{N}_{g,n}$. Let $\mathcal{V}_2$ be the closed linear span of $\mathcal{N}_{g,n} - \{g_o\}$ under $\|\cdot\|_{\psi}$, which is a Hilbert space under $\|\cdot\|_{\psi}$, with the corresponding inner product $\langle\cdot,\cdot\rangle_{\psi}$ defined as
$$\langle v_{g_1},v_{g_2}\rangle_{\psi} = -\frac{\partial E[\Delta\psi(Z_2,g_o+\tau v_{g_2},h_o)[v_{g_1}]]}{\partial\tau}\bigg|_{\tau=0} \quad (5)$$
for any $v_{g_1},v_{g_2}\in\mathcal{V}_2$. In many cases, we have
$$\Delta\psi(Z_2,\alpha_o)[v_{g_1}] = \frac{\partial\psi(Z_2,g_o+\tau v_{g_1},h_o)}{\partial\tau}\bigg|_{\tau=0} \quad\text{and}\quad \langle v_{g_1},v_{g_2}\rangle_{\psi} = -E\left[\frac{\partial\Delta\psi(Z_2,g_o+\tau v_{g_2},h_o)[v_{g_1}]}{\partial\tau}\bigg|_{\tau=0}\right].$$
We assume that there is a linear functional $\partial_g\phi(\alpha_o)[\cdot]: \mathcal{V}_2 \to \mathbb{R}$ such that
$$\partial_g\phi(\alpha_o)[v] = \frac{\partial\phi(h_o,g_o+\tau v)}{\partial\tau}\bigg|_{\tau=0} \quad\text{for all } v\in\mathcal{V}_2. \quad (6)$$
Let $g_{o,n}$ denote the projection of $g_o$ on $\mathcal{G}_n$ under the norm $\|\cdot\|_{\psi}$. Let $\mathcal{V}_{2,n}$ denote the Hilbert space generated by $\mathcal{N}_{g,n} - \{g_{o,n}\}$. Then $\dim(\mathcal{V}_{2,n}) = K(n) < \infty$. By the Riesz representation theorem, there is a sieve Riesz representer $v_{g_n}^* \in \mathcal{V}_{2,n}$ such that
$$\partial_g\phi(\alpha_o)[v] = \langle v_{g_n}^*,v\rangle_{\psi} \ \text{ for all } v\in\mathcal{V}_{2,n}, \quad\text{and}\quad \|v_{g_n}^*\|_{\psi}^2 = \sup_{0\neq v\in\mathcal{V}_{2,n}}\frac{|\partial_g\phi(\alpha_o)[v]|^2}{\|v\|_{\psi}^2}. \quad (7)$$
Moreover, $\partial_g\phi(\alpha_o)[\cdot]: \mathcal{V}_2 \to \mathbb{R}$ is a bounded functional if and only if $\lim_{K(n)\to\infty}\|v_{g_n}^*\|_{\psi} < \infty$.

Let $\mathcal{V} = \mathcal{V}_1 \times \mathcal{V}_2$. For any $v = (v_h,v_g) \in \mathcal{V}$ we denote
$$\partial\phi(\alpha_o)[v] = \partial_h\phi(\alpha_o)[v_h] + \partial_g\phi(\alpha_o)[v_g].$$
Then $\partial\phi(\alpha_o)[\cdot]: \mathcal{V} \to \mathbb{R}$ is a bounded functional if and only if $\lim_{L(n)\to\infty}\|v_{h_n}^*\|_{\varphi} < \infty$ and $\lim_{K(n)\to\infty}\|v_{g_n}^*\|_{\psi} < \infty$.

To evaluate the effect of the first-step estimation on the asymptotic variance of the second-step sieve
M estimator, we define
$$\Gamma_g(\alpha_o)[v_g] = \frac{\partial E[\psi(Z_2,g_o+\tau v_g,h_o)]}{\partial\tau}\bigg|_{\tau=0} \quad\text{for any } v_g\in\mathcal{V}_2$$
and
$$\Gamma(\alpha_o)[v_h,v_g] = \frac{\partial\Gamma_g(g_o,h_o+\tau v_h)[v_g]}{\partial\tau}\bigg|_{\tau=0} \quad\text{for any } v_h\in\mathcal{V}_1.$$
We assume that $\Gamma(\alpha_o)[\cdot,\cdot]$ is a bilinear functional on $\mathcal{V}$. Given the Riesz representer $v_{g_n}^*$ in (7), we define $v_n^* \in \mathcal{V}_{1,n}$ as
$$\Gamma(\alpha_o)[v_h,v_{g_n}^*] = \langle v_h,v_n^*\rangle_{\varphi} \quad\text{for any } v_h\in\mathcal{V}_{1,n}. \quad (8)$$
Using the sieve Riesz representers $v_{h_n}^*$, $v_{g_n}^*$ and $v_n^*$, we define
$$\|v_n^*\|_{sd}^2 = \mathrm{Var}\left(n^{-\frac{1}{2}}\sum_{i=1}^{n}\left\{\Delta\varphi(Z_{1,i},h_o)[v_{h_n}^*] + \Delta\varphi(Z_{1,i},h_o)[v_n^*] + \Delta\psi(Z_{2,i},\alpha_o)[v_{g_n}^*]\right\}\right). \quad (9)$$
As we can see from (9), the sieve variance of the plug-in estimator $\phi(\hat{\alpha}_n)$ is determined by three components. To explain the sources of these components, we assume that the criterion function in the second-step estimation is smooth. The first component $n^{-\frac{1}{2}}\sum_{i=1}^{n}\Delta\varphi(Z_{1,i},h_o)[v_{h_n}^*]$ is from the estimation error of $h_o$; it enters $\|v_n^*\|_{sd}^2$ because $\phi(\alpha_o)$ depends on $h_o$. When $\phi(\alpha_o)$ only depends on $g_o$, we have $v_{h_n}^* = 0$ and this component does not show up in $\|v_n^*\|_{sd}^2$. The second component $n^{-\frac{1}{2}}\sum_{i=1}^{n}\Delta\varphi(Z_{1,i},h_o)[v_n^*]$ is also from the estimation error of $h_o$; it enters $\|v_n^*\|_{sd}^2$ because the first derivative of the criterion function in the second-step estimation depends on the first-step M estimator $\hat{h}_n$. The last component $n^{-\frac{1}{2}}\sum_{i=1}^{n}\Delta\psi(Z_{2,i},\alpha_o)[v_{g_n}^*]$ is from the estimation error of $g_o$ in the second-step estimation assuming that the unknown parameter $h_o$ is known; it enters $\|v_n^*\|_{sd}^2$ because $\phi(\alpha_o)$ depends on $g_o$. When $\phi(\alpha_o)$ only depends on $g_o$ and the parameter $h_o$ is known, $\|v_n^*\|_{sd}^2$ is identical to the sieve variance of the one-step plug-in sieve M estimator found in Chen, Liao and Sun (2014) and Chen and Liao (2014).

In this paper, we restrict our attention to the class of functionals $\phi(\cdot)$ such that $\|v_n^*\|_{sd} \leq C < \infty$ for any $n$. We next list the assumptions needed for showing the asymptotic normality of $\phi(\hat{\alpha}_n)$.
Assumption 3.1 (i) $\liminf_{n\to\infty}\|v_n^*\|_{sd} > 0$; (ii) the functional $\phi(\cdot)$ satisfies
$$\sup_{\alpha\in\mathcal{N}_n}\frac{\left|\phi(\alpha)-\phi(\alpha_o)-\partial_h\phi(\alpha_o)[h-h_o]-\partial_g\phi(\alpha_o)[g-g_o]\right|}{\|v_n^*\|_{sd}} = o(n^{-\frac{1}{2}});$$
(iii) there exists $g_n\in\mathcal{G}_n$ such that $\|g_n-g_o\|_{\mathcal{G}} = O(\delta_{2,n})$, and for any $v_h\in\mathcal{V}_1$ and $v_g\in\mathcal{V}_2$, $\|v_h\|_{\varphi} \leq c_{\varphi}\|v_h\|_{\mathcal{H}}$ and $\|v_g\|_{\psi} \leq c_{\psi}\|v_g\|_{\mathcal{G}}$, where $c_{\varphi}$ and $c_{\psi}$ are some generic finite positive constants; (iv)
$$\frac{\max\left\{\left|\partial_h\phi(\alpha_o)[h_{o,n}-h_o]\right|,\ \left|\partial_g\phi(\alpha_o)[g_{o,n}-g_o]\right|\right\}}{\|v_n^*\|_{sd}} = o(n^{-\frac{1}{2}}).$$

Assumption 3.1.(i) ensures that the sieve variance is asymptotically nonzero. When $\phi(\cdot)$ is a nonlinear functional in $\alpha$, Assumption 3.1.(ii) implies that there is a linear approximation $\partial_h\phi(\alpha_o)[h-h_o] + \partial_g\phi(\alpha_o)[g-g_o]$ for $\phi(\alpha)$ uniformly over $\alpha\in\mathcal{N}_n$ with approximation error $o(\|v_n^*\|_{sd}\,n^{-\frac{1}{2}})$. Assumption 3.1.(iii) implies that $\|\cdot\|_{\varphi}$ and $\|\cdot\|_{\psi}$ may be weaker than the pseudo-metrics $\|\cdot\|_{\mathcal{H}}$ and $\|\cdot\|_{\mathcal{G}}$ respectively. By Assumption 3.1.(iii) and the definition of $g_{o,n}$, we have
$$\|g_o-g_{o,n}\|_{\psi} \leq \|g_o-g_n\|_{\psi} \leq c_{\psi}\|g_o-g_n\|_{\mathcal{G}} = O(\delta_{2,n}),$$
which indicates that $g_{o,n}\in\mathcal{N}_{g,n}$. Similarly we have $h_{o,n}\in\mathcal{N}_{h,n}$, which together with the former result implies that $(h_{o,n},g_{o,n})\in\mathcal{N}_n$. Assumption 3.1.(iv) also requires that the sieve approximation error converges to zero at a rate faster than $n^{-\frac{1}{2}}\|v_n^*\|_{sd}$, which is an under-smoothing condition used to derive the zero mean asymptotic normality of the sieve plug-in estimator $\phi(\hat{\alpha}_n)$.

Define $(u_{h_n}^*,u_{g_n}^*,u_n^*) = \|v_n^*\|_{sd}^{-1}(v_{h_n}^*,v_{g_n}^*,v_n^*)$ and $g^* = g + \epsilon_n u_{g_n}^*$ for any $g\in\mathcal{N}_{g,n}$, where $\epsilon_n = o(n^{-\frac{1}{2}})$ is some positive sequence. For any function $f$, $\mu_n(f) = n^{-1}\sum_{i=1}^{n}\{f(Z_i)-E[f(Z_i)]\}$ denotes the empirical process indexed by $f$.
Assumption 3.2 (i) The following stochastic equicontinuity conditions hold:
$$\sup_{\alpha\in\mathcal{N}_n}\left|\mu_n\left(\psi(Z_2,g^*,h)-\psi(Z_2,g,h)-\epsilon_n\Delta\psi(Z_2,g,h)[u_{g_n}^*]\right)\right| = O_p(\epsilon_n^2) \quad (10)$$
and
$$\sup_{\alpha\in\mathcal{N}_n}\left|\mu_n\left(\Delta\psi(Z_2,g,h)[u_{g_n}^*]-\Delta\psi(Z_2,g_o,h_o)[u_{g_n}^*]\right)\right| = O_p(\epsilon_n); \quad (11)$$
(ii) let $K(g,h) \equiv E[\psi(Z_2,g,h)-\psi(Z_2,g_o,h_o)]$; then
$$K(g^*,h)-K(g,h) = \epsilon_n\Gamma(\alpha_o)[h-h_o,u_{g_n}^*] - \frac{\|g^*-g_o\|_{\psi}^2-\|g-g_o\|_{\psi}^2}{2} + O(\epsilon_n^2) \quad (12)$$
uniformly over $(h,g)\in\mathcal{N}_n$.
The stochastic equicontinuity conditions are standard assumptions in the sieve literature; see, e.g., Shen (1997), Chen and Shen (1998) and Chen, Liao and Sun (2014). Assumption 3.2.(ii) implies that the Kullback-Leibler type distance has a local quadratic approximation uniformly over the shrinking neighborhood $\mathcal{N}_n$. When there is no first-step estimate $\hat{h}_n$, i.e. $h = h_o$ in (12), Assumption 3.2.(ii) is reduced to
$$\sup_{g\in\mathcal{N}_{g,n}}\left|K(g^*,h_o)-K(g,h_o)+\frac{\|g^*-g_o\|_{\psi}^2-\|g-g_o\|_{\psi}^2}{2}\right| = O(\epsilon_n^2),$$
which is the condition used in Chen, Liao and Sun (2014) to derive the asymptotic normality of the one-step sieve plug-in estimator. As a result, we can view the extra term in (12) as the effect of the first-step estimator $\hat{h}_n$ on the asymptotic distribution of the second-step sieve M estimator $\hat{g}_n$.
Assumption 3.3 (i) The first-step sieve M estimator $\hat{h}_n$ satisfies
$$\langle\hat{h}_n-h_o,\,u_{h_n}^*+u_n^*\rangle_{\varphi} - \mu_n\left(\Delta\varphi(Z_1,h_o)[u_{h_n}^*+u_n^*]\right) = O_p(\epsilon_n); \quad (13)$$
(ii) the following central limit theorem (CLT) holds:
$$n^{-\frac{1}{2}}\sum_{i=1}^{n}\left\{\Delta\varphi(Z_{1,i},h_o)[u_{h_n}^*+u_n^*] + \Delta\psi(Z_{2,i},g_o,h_o)[u_{g_n}^*]\right\} \to_d N(0,1);$$
(iii) $\varepsilon_{2,n} = O(\epsilon_n)$, $\epsilon_n\delta_{2,n}^{-1} = o(1)$ and $\|u_{g_n}^*\|_{\psi} = O(1)$.

Assumption 3.3.(i) is a high level condition, which is established in Chen, Liao and Sun (2014) under a set of sufficient conditions. Assumption 3.3.(ii) is implied by triangular array CLTs. Assumption 3.3.(iii) implies that the optimization error $\varepsilon_{2,n}$ in the second-step sieve M estimation is of no larger order than $\epsilon_n$. As $\delta_{2,n}$ is the convergence rate of the second-step sieve M estimator $\hat{g}_n$, under the stationary data assumption it is reasonable to assume that $\delta_{2,n}$ converges to zero at a rate not faster than root-n, which explains the assumption $\epsilon_n\delta_{2,n}^{-1} = o(1)$.
Theorem 3.1 Suppose that Assumptions 3.2 and 3.3 hold. Then under Assumptions 3.1.(i)-(iii), the sieve plug-in estimator $\phi(\hat{h}_n,\hat{g}_n)$ satisfies
$$\frac{\sqrt{n}\left[\phi(\hat{h}_n,\hat{g}_n)-\phi(h_{o,n},g_{o,n})\right]}{\|v_n^*\|_{sd}} \to_d N(0,1), \quad (14)$$
where $\|v_n^*\|_{sd}$ is defined in (9). Furthermore, if Assumption 3.1.(iv) is also satisfied, then
$$\frac{\sqrt{n}\left[\phi(\hat{h}_n,\hat{g}_n)-\phi(h_o,g_o)\right]}{\|v_n^*\|_{sd}} \to_d N(0,1). \quad (15)$$
Proof. In Appendix B.
4 Consistent Variance Estimation
The Gaussian approximation in Theorem 3.1 suggests a natural method of inference as long as a consistent estimator of $\|v_n^*\|_{sd}$ is available. In this section, we provide such a consistent estimator. An alternative is to consider bootstrap based confidence sets, whose asymptotic validity could be established but which are computationally more time-consuming. For simplicity of notation, we will assume in this and the next two sections that the data are i.i.d., and that the criterion functions $\varphi(Z_1,h)$ and $\psi(Z_2,g,h)$ are respectively twice continuously pathwise differentiable with respect to $h$ and $(g,h)$ in a local neighborhood $\mathcal{N}_n$.

Our estimator is a natural sample analog of $\|v_n^*\|_{sd}$, but for the sake of completeness, we provide the details below. We define
$$\Delta\varphi(Z_1,h)[v_{h_1}] = \frac{\partial\varphi(Z_1,h+\tau v_{h_1})}{\partial\tau}\bigg|_{\tau=0} \quad\text{and}\quad r_{\varphi}(Z_1,h)[v_{h_1},v_{h_2}] = \frac{\partial\Delta\varphi(Z_1,h+\tau v_{h_2})[v_{h_1}]}{\partial\tau}\bigg|_{\tau=0}$$
for any $v_{h_1},v_{h_2}\in\mathcal{V}_{1,n}$. Similarly, we define $\Delta\psi(Z_2,g,h)[v_{g_1}]$ and $r_{\psi}(Z_2,g,h)[v_{g_1},v_{g_2}]$ for any $v_{g_1},v_{g_2}\in\mathcal{V}_{2,n}$. We define the empirical Riesz representer $\hat{v}_{h_n}^*$ by
$$\partial_h\phi(\hat{\alpha}_n)[v_h] = \langle v_h,\hat{v}_{h_n}^*\rangle_{n,\varphi} \quad\text{for any } v_h\in\mathcal{V}_{1,n}, \quad (16)$$
where $\langle v_{h_1},v_{h_2}\rangle_{n,\varphi} = -n^{-1}\sum_{i=1}^{n}r_{\varphi}(Z_{1,i},\hat{h}_n)[v_{h_1},v_{h_2}]$. Similarly, we define $\hat{v}_{g_n}^*$ as
$$\partial_g\phi(\hat{\alpha}_n)[v_g] = \langle v_g,\hat{v}_{g_n}^*\rangle_{n,\psi} \quad\text{for any } v_g\in\mathcal{V}_{2,n}, \quad (17)$$
where $\langle v_{g_1},v_{g_2}\rangle_{n,\psi} = -n^{-1}\sum_{i=1}^{n}r_{\psi}(Z_{2,i},\hat{\alpha}_n)[v_{g_1},v_{g_2}]$; and $\hat{v}_n^*$ as
$$\hat{\Gamma}_n(\hat{\alpha}_n)[v_h,\hat{v}_{g_n}^*] = \langle v_h,\hat{v}_n^*\rangle_{n,\varphi} \quad\text{for any } v_h\in\mathcal{V}_{1,n}, \quad (18)$$
where
$$\hat{\Gamma}_n(\hat{\alpha}_n)[v_h,\hat{v}_{g_n}^*] = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial\Delta\psi(Z_{2,i},\hat{g}_n,\hat{h}_n+\tau v_h)[\hat{v}_{g_n}^*]}{\partial\tau}\bigg|_{\tau=0}.$$
Based on these ingredients, we propose to estimate $\|v_n^*\|_{sd}^2$ by
$$\|\hat{v}_n^*\|_{n,sd}^2 = \frac{1}{n}\sum_{i=1}^{n}\left\{\Delta\varphi(Z_{1,i},\hat{h}_n)[\hat{v}_{h_n}^*+\hat{v}_n^*] + \Delta\psi(Z_{2,i},\hat{g}_n,\hat{h}_n)[\hat{v}_{g_n}^*]\right\}^2. \quad (19)$$
Our estimator $\|\hat{v}_n^*\|_{n,sd}^2$ is nothing but a natural sample analog of $\|v_n^*\|_{sd}^2$.
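To make (16)-(19) concrete, the sketch below computes the variance estimator in the two-step nonparametric regression model of Section 5 ($s = h_o(x) + \varepsilon$, $y = g_o(\varepsilon) + u$) with power-series sieves and the evaluation functional $\phi(g) = g(0)$, which depends only on $g$ so that $\hat{v}_{h_n}^* = 0$. For $\psi = -\tfrac12(y-g(s-h(x)))^2$ a direct calculation gives the integrand of (18) as $v_h(x)\{\hat{g}_n'(\hat\varepsilon)\hat{v}_{g_n}^*(\hat\varepsilon) - (y-\hat{g}_n(\hat\varepsilon))\hat{v}_{g_n}^{*\prime}(\hat\varepsilon)\}$. The data generating process and all tuning choices here are our own assumptions; this is an illustrative sketch, not the paper's software.

```python
import numpy as np

rng = np.random.default_rng(3)
n, L_n, K_n = 2000, 4, 4

def basis(v, dim):               # power-series sieve [1, v, ..., v^{dim-1}]
    return np.vander(v, dim, increasing=True)

def p_eval(c, v):                # evaluate a polynomial with coefficients c
    return basis(v, len(c)) @ c

def p_deriv(c):                  # coefficients of its derivative
    return c[1:] * np.arange(1, len(c))

# Simulated two-step regression model (our assumption):
x = rng.uniform(-1, 1, n)
s = np.sin(x) + rng.normal(0, 0.3, n)              # s = h_o(x) + eps
y = np.cos(s - np.sin(x)) + rng.normal(0, 0.3, n)  # y = g_o(eps) + u

# Two-step series least squares:
R = basis(x, L_n)
pi_hat = np.linalg.lstsq(R, s, rcond=None)[0]
eps_hat = s - R @ pi_hat                           # generated regressor
P = basis(eps_hat, K_n)
beta_hat = np.linalg.lstsq(P, y, rcond=None)[0]
phi_hat = p_eval(beta_hat, np.array([0.0]))[0]     # phi(g) = g(0)

# Empirical inner products <.,.>_{n,phi}, <.,.>_{n,psi}: Gram matrices
Q_R = R.T @ R / n
Q_P = P.T @ P / n

# (17): representer of d_g phi[p_j] = p_j(0)
F_g = basis(np.array([0.0]), K_n)[0]
c_g = np.linalg.solve(Q_P, F_g)                    # coefficients of v*_{g_n}
vg = P @ c_g                                       # v*_{g_n}(eps_hat_i)

# (18): correction representer v*_n via the directional derivative in h
g_prime = p_eval(p_deriv(beta_hat), eps_hat)       # g_n'(eps_hat)
vg_prime = p_eval(p_deriv(c_g), eps_hat)           # v*_{g_n}'(eps_hat)
res_y = y - P @ beta_hat
w = g_prime * vg - res_y * vg_prime
Gamma = R.T @ w / n                                # Gamma_n[r_k, v*_{g_n}]
c_n = np.linalg.solve(Q_R, Gamma)                  # coefficients of v*_n

# (19): sample second moment of the score terms (v*_{h_n} = 0 here)
res_s = s - R @ pi_hat
score = res_s * (R @ c_n) + res_y * vg
sigma2 = np.mean(score ** 2)
se = np.sqrt(sigma2 / n)                           # std. error of phi_hat
```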
Let $\mathcal{W}_{1,n} \equiv \{h\in\mathcal{V}_{1,n}: \|h\|_{\varphi} \leq 1\}$ and $\mathcal{W}_{2,n} \equiv \{g\in\mathcal{V}_{2,n}: \|g\|_{\psi} \leq 1\}$, and let $\langle v_{g_1},v_{g_2}\rangle_{\psi} = -E\{r_{\psi}(Z_2,\alpha_o)[v_{g_1},v_{g_2}]\}$ for any $v_{g_1},v_{g_2}\in\mathcal{V}_2$. We next list the assumptions needed for showing the consistency of the sieve variance estimator.
Assumption 4.1 The following conditions hold:
(i) $\sup_{\alpha\in\mathcal{N}_n,\,v_{g_1},v_{g_2}\in\mathcal{W}_{2,n}}\left|E\{r_{\psi}(Z_2,\alpha)[v_{g_1},v_{g_2}] - r_{\psi}(Z_2,\alpha_o)[v_{g_1},v_{g_2}]\}\right| = o(1)$;
(ii) $\sup_{\alpha\in\mathcal{N}_n,\,v_{g_1},v_{g_2}\in\mathcal{W}_{2,n}}\left|\mu_n\{r_{\psi}(Z_2,\alpha)[v_{g_1},v_{g_2}]\}\right| = o_p(1)$;
(iii) $\sup_{\alpha\in\mathcal{N}_n,\,v_g\in\mathcal{W}_{2,n}}\left|\partial_g\phi(\alpha)[v_g] - \partial_g\phi(\alpha_o)[v_g]\right| = o(1)$.

Assumptions 4.1.(i)-(ii) are useful to show that the empirical inner product $\langle v_{g_1},v_{g_2}\rangle_{n,\psi}$ and the population inner product $\langle v_{g_1},v_{g_2}\rangle_{\psi}$ are asymptotically equivalent uniformly over $v_{g_1},v_{g_2}\in\mathcal{W}_{2,n}$. Assumption 4.1.(iii) implies that the empirical pathwise derivative $\partial_g\phi(\hat{\alpha}_n)[v_g]$ is close to the theoretical pathwise derivative $\partial_g\phi(\alpha_o)[v_g]$ uniformly over $v_g\in\mathcal{W}_{2,n}$. When the functional $\phi(\cdot)$ is linear in $g$, Assumption 4.1.(iii) is trivially satisfied.

Let $\mathcal{B}_{2,n} \equiv \{v\in\mathcal{V}_{2,n}: \|v-v_{g_n}^*\|_{\psi} \leq \|v_{g_n}^*\|_{\psi}\,\delta_{v_g,n}\}$, where $\delta_{v_g,n} = o(1)$ is some positive sequence such that $\hat{v}_{g_n}^*\in\mathcal{B}_{2,n}$ wpa1, and let $\langle v_{h_1},v_{h_2}\rangle_{\varphi} = -E[r_{\varphi}(Z_1,h_o)[v_{h_1},v_{h_2}]]$ for any $v_{h_1},v_{h_2}\in\mathcal{V}_1$.
Assumption 4.2 The following conditions hold:
(i) $\sup_{h\in\mathcal{N}_{h,n},\,v_{h_1},v_{h_2}\in\mathcal{W}_{1,n}}\left|E[r_{\varphi}(Z_1,h)[v_{h_1},v_{h_2}] - r_{\varphi}(Z_1,h_o)[v_{h_1},v_{h_2}]]\right| = o(1)$;
(ii) $\sup_{h\in\mathcal{N}_{h,n},\,v_{h_1},v_{h_2}\in\mathcal{W}_{1,n}}\left|\mu_n\{r_{\varphi}(Z_1,h)[v_{h_1},v_{h_2}]\}\right| = o_p(1)$;
(iii) $\sup_{\alpha\in\mathcal{N}_n,\,v_h\in\mathcal{W}_{1,n}}\left|\partial_h\phi(\alpha)[v_h] - \partial_h\phi(\alpha_o)[v_h]\right| = o(1)$;
(iv) $\sup_{\alpha\in\mathcal{N}_n,\,v_g\in\mathcal{B}_{2,n},\,v_h\in\mathcal{W}_{1,n}}\left|\hat{\Gamma}_n(\alpha)[v_h,v_g] - \Gamma(\alpha_o)[v_h,v_{g_n}^*]\right| = o_p(1)$.

Assumptions 4.2.(i)-(ii) are useful to show that the empirical inner product $\langle v_{h_1},v_{h_2}\rangle_{n,\varphi}$ and the population inner product $\langle v_{h_1},v_{h_2}\rangle_{\varphi}$ are asymptotically equivalent uniformly over $v_{h_1},v_{h_2}\in\mathcal{W}_{1,n}$. Assumption 4.2.(iii) implies that the empirical pathwise derivative $\partial_h\phi(\hat{\alpha}_n)[v_h]$ is close to the population pathwise derivative $\partial_h\phi(\alpha_o)[v_h]$ uniformly over $v_h\in\mathcal{W}_{1,n}$; it is trivially satisfied if the functional $\phi(\cdot)$ is linear in $h$. Assumption 4.2.(iv) is needed to show that the functional $\Gamma(\alpha_o)[v_h,v_{g_n}^*]$ is consistently estimated by its empirical counterpart $\hat{\Gamma}_n(\hat{\alpha}_n)[v_h,\hat{v}_{g_n}^*]$ uniformly over $v_h\in\mathcal{W}_{1,n}$.
Assumption 4.3 Let $\|\cdot\|_2$ denote the $L_2(dF_Z)$-norm. Then: (i) the functional $\Delta\varphi(Z_1,h)[v_h]$ satisfies
$$\sup_{h\in\mathcal{N}_{h,n},\,v_h\in\mathcal{W}_{1,n}}\left\|\Delta\varphi(Z_1,h)[v_h] - \Delta\varphi(Z_1,h_o)[v_h]\right\|_2 = o(1) \quad (20)$$
and
$$\sup_{h\in\mathcal{N}_{h,n},\,v_h\in\mathcal{W}_{1,n}}\left|\mu_n\left\{(\Delta\varphi(Z_1,h)[v_h])^2\right\}\right| = o_p(1); \quad (21)$$
(ii) the functional $\Delta\psi(Z_2,\alpha)[v_g]$ satisfies
$$\sup_{\alpha\in\mathcal{N}_n,\,v_g\in\mathcal{W}_{2,n}}\left\|\Delta\psi(Z_2,\alpha)[v_g] - \Delta\psi(Z_2,\alpha_o)[v_g]\right\|_2 = o(1) \quad (22)$$
and
$$\sup_{\alpha\in\mathcal{N}_n,\,v_g\in\mathcal{W}_{2,n}}\left|\mu_n\left\{(\Delta\psi(Z_2,\alpha)[v_g])^2\right\}\right| = o_p(1); \quad (23)$$
(iii) the following condition holds:
$$\sup_{\alpha\in\mathcal{N}_n,\,v_g\in\mathcal{W}_{2,n},\,v_h\in\mathcal{W}_{1,n}}\left|\mu_n\left\{\Delta\varphi(Z_1,h)[v_h]\,\Delta\psi(Z_2,\alpha)[v_g]\right\}\right| = o_p(1); \quad (24)$$
(iv) $\sup_{v_h\in\mathcal{W}_{1,n}}\|\Delta\varphi(Z_1,h_o)[v_h]\|_2 = O(1)$ and $\sup_{v_g\in\mathcal{W}_{2,n}}\|\Delta\psi(Z_2,\alpha_o)[v_g]\|_2 = O(1)$; moreover,
$$\frac{\|v_{h_n}^*\|_{\varphi} + \|v_n^*\|_{\varphi} + \|v_{g_n}^*\|_{\psi}}{\left\|\Delta\varphi(Z_1,h_o)[v_{h_n}^*+v_n^*] + \Delta\psi(Z_2,\alpha_o)[v_{g_n}^*]\right\|_2} = O(1). \quad (25)$$
Conditions (20) and (22) require some local smoothness of $\Delta\varphi(Z_1,h)[\cdot]$ in $h\in\mathcal{N}_{h,n}$ and of $\Delta\psi(Z_2,\alpha)[\cdot]$ in $\alpha\in\mathcal{N}_n$ respectively. Conditions (21), (23) and (24) can be verified by applying uniform laws of large numbers from empirical process theory. To illustrate Assumption 4.3.(iv), suppose that $v_h(\cdot) = R(\cdot)'\pi$ for some $\pi\in\mathbb{R}^{L(n)}$ and $v_g(\cdot) = P(\cdot)'\beta$ for some $\beta\in\mathbb{R}^{K(n)}$, where $R(\cdot) = [r_1(\cdot),\ldots,r_{L(n)}(\cdot)]'$ and $P(\cdot) = [p_1(\cdot),\ldots,p_{K(n)}(\cdot)]'$ are two vectors of basis functions. Let
$$\Delta\varphi(Z_1,h_o)[R] = \left[\Delta\varphi(Z_1,h_o)[r_1],\ldots,\Delta\varphi(Z_1,h_o)[r_{L(n)}]\right]'$$
and
$$\Delta\psi(Z_2,\alpha_o)[P] = \left[\Delta\psi(Z_2,\alpha_o)[p_1],\ldots,\Delta\psi(Z_2,\alpha_o)[p_{K(n)}]\right]'.$$
In most cases, Assumption 4.3.(iv) is implied by regularity conditions on the eigenvalues of the matrices $E[\Delta\varphi(Z_1,h_o)[R]\,\Delta\varphi(Z_1,h_o)[R]']$ and $E[\Delta\psi(Z_2,\alpha_o)[P]\,\Delta\psi(Z_2,\alpha_o)[P]']$.
Theorem 4.1 Suppose that the data are i.i.d. and the conditions in Theorem 3.1 are satisfied. Then under Assumptions 4.1, 4.2 and 4.3, we have
$$\left|\frac{\|\hat{v}_n^*\|_{n,sd}}{\|v_n^*\|_{sd}} - 1\right| = o_p(1).$$
Proof. In Appendix C.

Based on Theorem 3.1 and Theorem 4.1, we obtain
$$\frac{\sqrt{n}\left[\phi(\hat{h}_n,\hat{g}_n)-\phi(h_o,g_o)\right]}{\|\hat{v}_n^*\|_{n,sd}} \to_d N(0,1),$$
which can be used to construct confidence intervals for $\phi(h_o,g_o)$.
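In practice this display is used to form Wald confidence intervals: a nominal 95% interval for $\phi(h_o,g_o)$ is $\phi(\hat{h}_n,\hat{g}_n) \pm 1.96\,\|\hat{v}_n^*\|_{n,sd}/\sqrt{n}$. A minimal sketch, with made-up numbers standing in for the estimates:

```python
import math

# Hypothetical values standing in for the plug-in estimate and the sieve
# standard deviation ||v_n||_{n,sd} computed from (19):
phi_hat, sd_hat, n = 1.37, 2.40, 500

half_width = 1.96 * sd_hat / math.sqrt(n)
ci = (phi_hat - half_width, phi_hat + half_width)   # nominal 95% Wald interval
```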
5 An Application
In this section, we illustrate the high level conditions and the main results established in the previous sections in a two-step nonparametric regression example. Suppose we have i.i.d. data $\{(y_i,x_i,s_i)\}_{i=1}^{n}$ from the following model:
$$y_i = g_o(\varepsilon_i) + u_i,$$
$$s_i = h_o(x_i) + \varepsilon_i, \quad (26)$$
where $E[\varepsilon_i\mid x_i] = 0$, $E[u_i\mid x_i,\varepsilon_i] = 0$, and $h_o$ and $g_o$ are unknown functions. The parameter of interest is $\phi(g_o)$, where the functional $\phi(\cdot)$ is known. For ease of notation, we assume that $y_i$, $x_i$ and $s_i$ are univariate random variables. The first-step M estimator is
$$\hat{h}_n = \arg\max_{h\in\mathcal{H}_n} -\frac{1}{2n}\sum_{i=1}^{n}(s_i-h(x_i))^2, \quad (27)$$
where $\mathcal{H}_n = \{h: h(\cdot) = R(\cdot)'\pi,\ \pi\in\mathbb{R}^{L(n)}\}$. Let $R(x) = [r_1(x),\ldots,r_{L(n)}(x)]'$ and $R_n = [R(x_1),\ldots,R(x_n)]$. The first step M estimator $\hat{h}_n$ has the closed form expression
$$\hat{h}_n(\cdot) = R(\cdot)'(R_nR_n')^{-1}R_nS_n = R(\cdot)'\hat{\pi}_n, \quad (28)$$
where $S_n = [s_1,\ldots,s_n]'$. From the first step estimator, we calculate $\hat{\varepsilon}_i = s_i - \hat{h}_n(x_i)$ for $i=1,\ldots,n$. Let
$P(\varepsilon) = [p_1(\varepsilon),\ldots,p_{K(n)}(\varepsilon)]'$ and $\hat{P}_n = [P(\hat{\varepsilon}_1),\ldots,P(\hat{\varepsilon}_n)]'$. The second-step M estimator is
$$\hat{g}_n = \arg\max_{g\in\mathcal{G}_n} -\frac{1}{2n}\sum_{i=1}^{n}(y_i-g(\hat{\varepsilon}_i))^2, \quad (29)$$
where $\mathcal{G}_n = \{g: g(\cdot) = P(\cdot)'\beta,\ \beta\in\mathbb{R}^{K(n)}\}$. The second step M estimator $\hat{g}_n$ also has a closed form expression
$$\hat{g}_n(\varepsilon) = P(\varepsilon)'(\hat{P}_n'\hat{P}_n)^{-1}\hat{P}_n'Y_n = P(\varepsilon)'\hat{\beta}_n, \quad (30)$$
where $Y_n = [y_1,\ldots,y_n]'$.
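The closed forms (28) and (30) can be checked directly against a generic least squares routine. The sketch below does so on simulated data; the design, the power-series bases, and the true functions are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, L_n, K_n = 500, 4, 4
x = rng.uniform(-1, 1, n)
s = np.sin(x) + rng.normal(0, 0.3, n)              # s = h_o(x) + eps
y = np.cos(s - np.sin(x)) + rng.normal(0, 0.3, n)  # y = g_o(eps) + u

Rn = np.vander(x, L_n, increasing=True).T          # L(n) x n, columns R(x_i)
pi_hat = np.linalg.solve(Rn @ Rn.T, Rn @ s)        # (R_n R_n')^{-1} R_n S_n, as in (28)
h_hat = Rn.T @ pi_hat                              # h_n(x_i) = R(x_i)' pi_hat
eps_hat = s - h_hat                                # generated regressor eps_hat_i

Pn = np.vander(eps_hat, K_n, increasing=True)      # n x K(n), rows P(eps_hat_i)'
beta_hat = np.linalg.solve(Pn.T @ Pn, Pn.T @ y)    # (P'P)^{-1} P' Y_n, as in (30)

# The same fits via a generic least squares routine agree:
pi_ls = np.linalg.lstsq(Rn.T, s, rcond=None)[0]
beta_ls = np.linalg.lstsq(Pn, y, rcond=None)[0]
```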
Using the specific forms of the criterion functions in the M estimations, we have $\langle v_{h_1},v_{h_2}\rangle_{\varphi} = E[v_{h_1}(x)v_{h_2}(x)]$ for any $v_{h_1},v_{h_2}\in\mathcal{V}_1$, and $\langle v_{g_1},v_{g_2}\rangle_{\psi} = E[v_{g_1}(\varepsilon)v_{g_2}(\varepsilon)]$ for any $v_{g_1},v_{g_2}\in\mathcal{V}_2$. As $\phi(\cdot)$ only depends on $g_o$, we use $\partial\phi(g_o)[\cdot]$ to denote the linear functional defined in (6). As $\mathcal{V}_{2,n}$ is the linear space spanned by the basis functions $P(\varepsilon)$, the Riesz representer $v_{g_n}^*$ of the functional $\partial\phi(g_o)[\cdot]$ has the closed form expression
$$v_{g_n}^*(\varepsilon) = \partial\phi(g_o)[P]'Q_{K(n)}^{-1}P(\varepsilon), \quad (31)$$
where $\partial\phi(g_o)[P]' = [\partial\phi(g_o)[p_1],\ldots,\partial\phi(g_o)[p_{K(n)}]]$ and $Q_{K(n)} = E[P(\varepsilon)P(\varepsilon)']$. By definition,