arXiv:1610.01697v3 [stat.ME] 26 Feb 2020

Central Limit Theory for Combined Cross-Section and Time Series

Jinyong Hahn∗ (UCLA), Guido Kuersteiner† (University of Maryland), Maurizio Mazzocco‡ (UCLA)

February 27, 2020

Abstract

Combining cross-section and time series data is a long and well established practice in empirical economics. We develop a central limit theory that explicitly accounts for possible dependence between the two data sets. We focus on common factors as the mechanism behind this dependence. Using our central limit theorem (CLT) we establish the asymptotic properties of parameter estimates of a general class of models based on a combination of cross-sectional and time series data, recognizing the interdependence between the two data sources in the presence of aggregate shocks. Despite the complicated nature of the analysis required to formulate the joint CLT, it is straightforward to implement the resulting parameter limiting distributions due to a formal similarity of our approximations with the standard Murphy and Topel's (1985) formula.

∗ UCLA, Department of Economics, 8283 Bunche Hall, Mail Stop: 147703, Los Angeles, CA 90095, [email protected]
† University of Maryland, Department of Economics, Tydings Hall 3145, College Park, MD, 20742, [email protected]
‡ UCLA, Department of Economics, 8283 Bunche Hall, Mail Stop: 147703, Los Angeles, CA 90095, [email protected]

1 Introduction
There is a long tradition in empirical economics of relying on information from a variety of data sources to estimate model parameters. In this paper we focus on a situation where cross-sectional and time-series data are combined. This may be done for a variety of reasons. Some parameters may not be identified in the cross section or time series alone. Alternatively, parameters estimated from one data source may be used as first-step inputs in the estimation of a second set of parameters based on a different data source. This may be done to reduce the dimension of the parameter space or more generally for computational reasons. Data combination generates theoretical challenges, even when only cross-sectional data sets or time-series data sets are combined. See Ridder and Moffitt (2007) for example. We focus on dependence between cross-sectional and time-series data produced by common aggregate factors. Andrews (2005) demonstrates that even randomly sampled cross-sections lead to independently distributed samples only conditionally on common factors, since the factors introduce a possible correlation. This correlation extends inevitably to a time series sample that depends on the same underlying factors. Andrews (2005) only considers cross-sections and panels with fixed T with sufficient regularity conditions for consistent estimation. Our work in this and in our companion papers Hahn, Kuersteiner and Mazzocco (2016, 2019a, 2019b) extends the analysis in Andrews to situations where consistent estimation is only possible by utilizing an additional time series data set. Examples in our companion papers draw on literatures for the structural estimation of rational expectations models as well as the program evaluation literature. Examples for the former include cross-sectional studies of the consumption based asset pricing model in Runkle (1991), Shea (1995) and Vissing-Jorgensen (2002) and a recent example of the latter is Rosenzweig and Udry (2019). 
The first contribution of this paper is to develop a central limit theory that explicitly accounts for the dependence between the cross-sectional and time-series data by using the notion of stable convergence. The second contribution is to use the corresponding central limit theorem to derive the asymptotic distribution of parameter estimators obtained from combining the two data sources. Our analysis is inspired by a number of applied papers and in particular by the discussion in Lee and Wolpin (2006, 2010). Econometric estimation based on the combination of cross-sectional and time-series data is an idea that dates back at least to Tobin (1950). More recently, Heckman and
Sedlacek (1985) and Heckman, Lochner, and Taber (1998) proposed to deal with the estimation of equilibrium models by exploiting such data combination. It is, however, Lee and Wolpin (2006, 2010) who develop the most extensive equilibrium model and estimate it using similar intuition and panel data. To derive the new central limit theorem and the asymptotic distribution of parameter estimates, we extend the model developed in Lee and Wolpin (2006, 2010) to a general setting that involves two submodels. The first submodel includes all the cross-sectional features, whereas the second submodel is composed of all the time-series aspects. The two submodels are linked by a vector of aggregate shocks and by the parameters that govern their dynamics. Given the interplay between the two submodels, the aggregate shocks have complicated effects on the estimation of the parameters of interest. With the objective of creating a framework to perform inference in our general model, we first derive a joint functional stable central limit theorem for cross-sectional and time-series data. The central limit theorem explicitly accounts for the factor-induced dependence between the two samples even when the cross-sectional sample is obtained by random sampling, a special case covered by our theory. We derive the central limit theorem under the condition that the dimension of the cross-sectional data n as well as the dimension of the time series data τ go to infinity. Using our central limit theorem we then derive the asymptotic distribution of the parameter estimators that characterize our general model. To our knowledge, this is the first paper that derives an asymptotic theory that combines cross-sectional and time series data. In order to deal with parameters estimated using two data sets of completely different nature, we adopt the notion of stable convergence.
Stable convergence dates back to Renyi (1963) and was recently used in Kuersteiner and Prucha (2013) in a panel data context to establish joint limiting distributions. Using this concept, we show that the asymptotic distributions of the parameter estimators are a combination of asymptotic distributions from cross-sectional analysis and time-series analysis. Stable convergence has found wide applications in many areas of statistics. In econometrics it has been applied most prominently in the area of high frequency financial econometrics$^{1}$ while its use in panel and cross-sectional econometrics is relatively recent and less common. Important references to the high frequency literature include Barndorff-Nielsen, Hansen, Lunde and Shephard (2008), Jacod, Podolskij and Vetter (2010) and Li and Xiu (2016). Monographs treating the probability theoretic foundation for this line of research are Jacod and Shiryaev (2003, henceforth JS), and Jacod and Protter (2012). In line with the literature on high frequency financial econometrics, we base our definition of stable convergence in function spaces on JS. In the introduction to their book, JS outline two different strategies of proving central limit theorems. One is what they term the 'martingale method' which dates back at least to Stroock and Varadhan (1979). The other is the approach followed by Billingsley (1968) which consists of establishing tightness, finite dimensional convergence and identification of the finite dimensional limiting distribution. In this paper we follow the second approach while JS and the cited papers in the high frequency literature rely on the martingale method. The reason for the difference in approach lies in the fact that the primitive parameters central to the martingale method, essentially a set of conditional moments called the triplets of characteristics of the approximating and limiting process in the language of JS, have no analog in the models we study. The data generating processes (DGP) producing our samples are not embedded in limiting diffusion processes. Even if our data generating process could be represented by or approximated with such diffusions, the limits we take assume that the DGP remains fixed. This assumption precludes any notion of convergence of characteristics that is required for the martingale method. Our limit theory has a second connection to the high frequency literature. Barndorff-Nielsen et al. (2008, Proposition 5) and Li and Xiu (2016, Lemma A3) develop methods to deduce joint stable convergence of two random sequences from stable convergence of the first random sequence and conditional convergence in law of the second random sequence.

$^{1}$ We thank one of the referees for bringing this literature to our attention.

The assumptions made in these papers, namely that the noise terms are mean zero conditional on the entire time series, are stronger than the assumptions we make in our paper. It is often unnatural to condition on the entire time series of common shocks, which we avoid in this paper.

While the formal derivation of the asymptotic distribution may appear complicated, the asymptotic formulae that we produce are straightforward to implement and very similar to the standard Murphy and Topel's (1985) formula.

We also derive a novel result related to the unit root literature. We show that, when the time-series data are characterized by unit roots, the asymptotic distribution is a combination of a normal distribution and the distribution found in the unit root literature. Therefore, the asymptotic distribution exhibits mathematical similarities to the inferential problem in predictive regressions, as discussed by Campbell and Yogo (2006). However, the similarity is superficial in that Campbell and Yogo's (2006) result is about an estimator based on a single data source. But, similarly to Campbell and Yogo's analysis, we need to address problems of uniform inference. Phillips (2014) proposes a method of uniform inference for predictive regressions, which we adopt and modify to our own estimation problem in the unit root case.

Our results should be of interest to both applied microeconomists and macroeconomists. Data combination is common practice in the macro calibration literature where typically a subset of parameters is determined based on cross-sectional studies. It is also common in structural microeconomics where the focus is more directly on identification issues that cannot be resolved in the cross-section alone. In a companion paper, Hahn, Kuersteiner, and Mazzocco (2016, 2019a), we discuss in detail specific examples from the micro literature. In the companion paper, we also provide a more intuitive analysis of the joint use of cross-sectional and time-series data when aggregate factors are present, whereas in this paper the analysis is more technical and abstract.

The remainder of the paper is organized as follows. In Section 2, we introduce the general statistical model. In Section 3, we present the intuition underlying our main result, which is presented in Section 4.
2 Model
We start this section with a stylized economic example that motivates our econometric model. Consider a farmer who chooses how much labor to supply to the cultivation of a plot in a way that maximizes profits. The production function of the plot depends on labor, $L_{it}$, an aggregate shock, $\nu_t$, and takes the following form:
$$q_{it} = \nu_t \exp(\epsilon_{it}) L_{it}^{\alpha/\sigma_\nu^2},$$
where
$$E[\exp(\epsilon_{it})] = 1, \quad \nu_t = \rho \nu_{t-1} + u_t, \quad \text{and} \quad \sigma_\nu^2 = \mathrm{Var}(\nu_t) = \frac{\sigma_u^2}{1-\rho^2}.$$
The idea behind this specification is that labor is less productive in places with higher variance of the aggregate shocks: in places that have cold winters but also hurricane seasons in the summer, the incentives to invest in human capital and productivity-improving practices are lower. The gross domestic product (or aggregate state) of the economy $y$ in which the farmer operates depends on aggregate investment $x$ and the aggregate shock $\nu$ in the following way:
$$\log y_t = \beta_1 + \beta_2 \log x_t + \nu_t,$$
where the time-series of $y$ and $x$ are observed. Our objective is to estimate the parameters of the farmer's production function. Let output produced by the farmer be $q$ and its price $p$. The farmer maximizes the expected profit, i.e., he solves:
$$\max_L \pi(L) = p_t E[q_t] - w_{it} L \quad \text{s.t.} \quad E[q_t] = E\left[\nu_t \exp(\epsilon_{it}) L^{\alpha/\sigma_\nu^2}\right] = \nu_t L^{\alpha/\sigma_\nu^2}.$$
Note that the expectation $E[q_t] = E\left[\nu_t \exp(\epsilon_{it}) L^{\alpha/\sigma_\nu^2}\right]$ is with respect to the distribution of $\epsilon_{it}$. Therefore, the farmer's problem is $\max_L \; p_t \nu_t L^{\alpha/\sigma_\nu^2} - w_{it} L$. Let $\phi = \alpha/\sigma_\nu^2$. Then the FOC can be written as $p_t \nu_t \phi L^{\phi-1} - w_{it} = 0$. Solving for $L$, we have
$$L_t = \left(\frac{w_{it}}{p_t \nu_t \phi}\right)^{\frac{1}{\phi-1}}.$$
The quantity produced by the farmer is therefore
$$q_{it} = \nu_t^{-\frac{1}{\phi-1}} \exp(\epsilon_{it}) \left(\frac{w_{it}}{p_t \phi}\right)^{\frac{\phi}{\phi-1}}.$$
By taking logs, we have
$$\log q_{it} = -\frac{\phi}{\phi-1}\log\phi - \frac{1}{\phi-1}\log\nu_t + \frac{\phi}{\phi-1}\log\frac{w_{it}}{p_t} + \epsilon_{it}.$$
Now, suppose that the econometrician has panel data consisting of quantity and real wages. Also, suppose that time series data consisting of observations for $z_s = (\log y_s, \log x_s)$ is available where we assume that $E[\nu_s \log x_s] = 0$ so that $\nu_s$ can be understood to be the error term in the OLS regression of $y_s$ on $x_s$. By regressing $\log q_t$ on a constant, time dummies, and log real wages using the panel data, and then using the functional relationship among parameters, one can estimate
$\phi$ as well as $\nu_t$. Then by regressing $\log y_s$ on $\log x_s$ and a constant using the time series data, we can estimate $\beta_1$, $\beta_2$ and $\sigma_\nu^2$. We can therefore estimate $\sigma_\nu$ and $\alpha$. Combining these estimates, we can estimate the parameters of the production function. This stylized example contains the gist of our motivation, i.e., that there may be economic models whose parameters can be consistently estimated only by combining two sources of data. Below, we present a more formal description of our econometric model.$^{2}$

We assume that our cross-sectional data consist of $\{y_{i,t},\ i = 1,\ldots,n,\ t = 1,\ldots,T\}$,$^{3}$ where the start time of the cross-section or panel, $t = 1$, is an arbitrary normalization of time. Pure cross-sections are handled by allowing for $T = 1$. Note that $T$ is fixed and finite throughout our discussion while our asymptotic approximations are based on $n$ tending to infinity. Our time series data consist of $\{z_s,\ s = \tau_0+1,\ldots,\tau_0+\tau\}$ where the time series sample size $\tau$ tends to infinity jointly with $n$. The start point of the time series sample is either fixed at an arbitrary time $\tau_0 \in (-\infty, \infty)$ or depends on $\tau$ such that $\tau_0 = -\upsilon\tau + \tau_{0,f} + T$ for $\upsilon \in [0,1]$ and $\tau_{0,f}$ a fixed constant. We refer to the case where $\upsilon = 1$ such that $\tau_0 \to -\infty$ when the time series sample size $\tau$ tends to $\infty$ as backwards asymptotics. In this case the end of the time series sample is fixed at $\tau_{0,f} + T$. A hybrid case arises when $\upsilon \in (0,1)$ such that the start point $\tau_0$ of the time series sample extends back in time simultaneously with the last observation in the sample $\tau_0 + \tau$ tending to infinity. For simplicity we refer to both scenarios $\upsilon \in (0,1)$ and $\upsilon = 1$ as backwards asymptotics. Backwards asymptotics may be more realistic in cases where recent panel data is augmented with long historical records of time series data. We show that under a mild additional regularity condition both forward and backward asymptotics lead to the same limiting distribution.
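The timing of the two samples can be made concrete with a small sketch. The helper below is hypothetical (its name and the truncation of $\upsilon\tau$ to an integer are assumptions made for illustration, not part of the paper); it returns the first and last period of the time series sample implied by $\tau_0 = -\upsilon\tau + \tau_{0,f} + T$.

```python
def time_series_window(tau, upsilon, tau_0f, T):
    """First and last period of the time series sample {tau_0+1, ..., tau_0+tau}.

    Hypothetical helper: tau_0 = -upsilon*tau + tau_0f + T, where upsilon*tau
    is truncated to an integer (an assumption, so that periods are integers).
    """
    if not 0.0 <= upsilon <= 1.0:
        raise ValueError("upsilon must lie in [0, 1]")
    tau_0 = -int(upsilon * tau) + tau_0f + T
    return tau_0 + 1, tau_0 + tau


if __name__ == "__main__":
    # Panel observed in periods 1..T with T = 2; tau_0f set to 0 for illustration.
    for upsilon in (0.0, 0.5, 1.0):
        for tau in (100, 200):
            print(upsilon, tau, time_series_window(tau, upsilon, 0, 2))
```

With $\upsilon = 1$ the last period stays at $\tau_{0,f} + T$ as $\tau$ grows, so the sample only extends into the past; with $\upsilon = 0$ the start point does not move and the sample only extends forward.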
Conventional asymptotics are covered by setting $\upsilon = 0$. The vector $y_{i,t}$ includes all information related to the cross-sectional submodel, where $i$ is an index for individuals, households or firms, and $t$ denotes the time period when the cross-sectional unit is observed. The second vector $z_s$ contains aggregate data.

$^{2}$ The example presented here is a stylized version of models discussed in detail in Hahn, Kuersteiner and Mazzocco (2019a). In that paper we provide explicit formulas and calculations that demonstrate how the results of this paper can be applied to the types of models we cover. Similar, but simpler calculations can be carried out for the example presented here.
$^{3}$ We do not consider models with estimated fixed effects in this paper because we assume that the parameter space is finite dimensional.

The technical assumptions for our CLT, detailed in Section 4, do not directly restrict the data, nor do they impose restrictions on how the data were sampled. For example, we do not assume that the cross-sectional sample was obtained by randomized sampling, although this is a special case that is covered by our assumptions. Rather than imposing restrictions directly on the data we postulate that there are two parametrized models that implicitly restrict the data. The function $f(y_{i,t}|\beta, \nu_t, \rho)$ is used to model $y_{i,t}$ as a function of cross-sectional parameters $\beta$ and common shocks $\nu = [\nu_1, \ldots, \nu_T]$ which are treated as parameters to be estimated and time series parameters $\rho$. In the same way the function $g(z_s|\beta, \rho)$ restricts the behavior of some time series variables $z_s$.$^{4}$ Depending on the exact form of the underlying economic model, the functions $f$ and $g$ may have different interpretations. They could be the log-likelihoods of $y_{i,t}$, conditional on $\nu_t$, and $z_s$ respectively. In a likelihood setting, $f$ and $g$ impose restrictions on $y_{i,t}$ and $z_s$ because of the implied martingale properties of the score process. More generally, the functions $f$ and $g$ may be the basis for method of moments (the exactly identified case) or GMM (the overidentified case) estimation. In these situations parameters are identified from the conditions $E_{\mathcal{C}}[f(y_{i,t}|\beta, \nu_t, \rho)] = 0$ given the shock $\nu_t$ and $E_\tau[g(z_s|\beta, \rho)] = 0$. The first expectation, $E_{\mathcal{C}}$, is understood as being over the cross-section population distribution holding $\nu = (\nu_1, \ldots, \nu_T)$ fixed, while the second, $E_\tau$, is over the stationary distribution of the time-series data generating process. The moment conditions follow from martingale assumptions we directly impose on $f$ and $g$. In our companion paper we discuss examples of economic models that rationalize these assumptions. Whether we are dealing with likelihoods or moment functions, the central limit theorem is directly formulated for the estimating functions that define the parameters.
We use the notation $F_n(\beta,\nu,\rho)$ and $G_\tau(\beta,\rho)$ to denote the criterion function based on the cross-section and time series respectively. When the model specifies a log-likelihood these functions are defined as
$$F_n(\beta,\nu,\rho) = \frac{1}{n}\sum_{t=1}^{T}\sum_{i=1}^{n} f(y_{i,t}|\beta,\nu_t,\rho) \quad \text{and} \quad G_\tau(\beta,\rho) = \frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} g(z_s|\beta,\rho).$$
When the model specifies moment conditions we let
$$h_n(\beta,\nu,\rho) = \frac{1}{n}\sum_{t=1}^{T}\sum_{i=1}^{n} f(y_{i,t}|\beta,\nu_t,\rho) \quad \text{and} \quad k_\tau(\beta,\rho) = \frac{1}{\tau}\sum_{s=\tau_0+1}^{\tau_0+\tau} g(z_s|\beta,\rho).$$
The GMM or moment based criterion functions are then given by $F_n(\beta,\nu,\rho) = -h_n(\beta,\nu,\rho)' W_n^{\mathcal{C}} h_n(\beta,\nu,\rho)$ and $G_\tau(\beta,\rho) = -k_\tau(\beta,\rho)' W_\tau^{\tau} k_\tau(\beta,\rho)$ with $W_n^{\mathcal{C}}$ and $W_\tau^{\tau}$ two almost surely positive definite weight matrices. The use of two separate objective functions is helpful in our context because it enables us to discuss which issues arise if only cross-sectional variables or only time-series variables are used in the estimation.$^{5}$

$^{4}$ The function $g$ may naturally arise if the $\nu_t$ is an unobserved component that can be estimated from the aggregate time series once the parameters $\beta$ and $\rho$ are known, i.e., if $\nu_t \equiv \nu_t(\beta,\rho)$ is a function of $(z_t,\beta,\rho)$ and the behavior of $\nu_t$ is expressed in terms of $\rho$. Later, we allow for the possibility that $g$ in fact is derived from the conditional density of $\nu_t$ given $\nu_{t-1}$, i.e., the possibility that $g$ may depend on both the current and lagged values of $z_t$. For notational simplicity, we simply write $g(z_s|\beta,\rho)$ here for now.

We formally justify the use of two data sets by imposing restrictions on the identifiability of parameters through the cross-section and time series criterion functions alone. Let $\Theta$ be a compact set constructed from the product $\Theta = \Theta_\beta \times \Theta_\nu \times \Theta_\rho$ where $\Theta_\beta$ is a compact set that contains the true parameter value $\beta_0$, $\Theta_\nu$ is a compact set that contains the true parameter value $\nu_0$ and $\Theta_\rho$ is a compact set that contains the true parameter value $\rho_0$. We denote the probability limit of the objective functions by $F(\beta,\nu,\rho)$ and $G(\beta,\rho)$, in other words,
$$F(\beta,\nu,\rho) = \operatorname*{plim}_{n\to\infty} F_n(\beta,\nu,\rho),$$
$$G(\beta,\rho) = \operatorname*{plim}_{\tau\to\infty} G_\tau(\beta,\rho).$$
The true or pseudo true parameters are defined as the maximizers of these probability limits
$$(\beta(\rho), \nu(\rho)) \equiv \operatorname*{argmax}_{\beta,\nu \in \Theta_\beta\times\Theta_\nu} F(\beta,\nu,\rho), \tag{1}$$
$$\rho(\beta) \equiv \operatorname*{argmax}_{\rho \in \Theta_\rho} G(\beta,\rho), \tag{2}$$
and we denote with $\beta_0$ and $\rho_0$ the solutions to (1) and (2). The idea that neither $F$ nor $G$ alone are sufficient to identify both parameters is formalized as follows. If the function $F$ is constant in $\rho$ at the parameter values $\beta$ and $\nu$ that maximize it then $\rho$ is not identified by the criterion $F$ alone. Formally we state that
$$\max_{\beta,\nu \in \Theta_\beta\times\Theta_\nu} F(\beta,\nu,\rho) = \max_{\beta,\nu \in \Theta_\beta\times\Theta_\nu} F(\beta,\nu,\rho_0) \quad \text{for all } \rho \in \Theta_\rho. \tag{3}$$
It is easy to see that (3) is not a sufficient condition to restrict identification in a desirable way. For example (3) is satisfied in a setting where $F$ does not depend at all on $\rho$. In that case the maximizers in (1) also do not depend on $\rho$ and by definition coincide with $\beta_0$ and $\nu_0$. To rule out this case we require that $\rho_0$ is needed to identify $\beta_0$ and $\nu_0$. Formally, we impose the condition that
$$(\beta(\rho), \nu(\rho)) \neq (\beta_0, \nu_0) \quad \text{for all } \rho \neq \rho_0. \tag{4}$$

$^{5}$ Note that our framework covers the case where the joint distribution of $(y_{it}, z_t)$ is modeled. Considering the two components separately adds flexibility in that data is not required for all variables in the same period.

Similarly, we impose restrictions on the time series criterion functions that ensure that the parameters $\beta$ and $\rho$ cannot be identified solely as the maximizers of $G$. Formally, we require that
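$$\max_{\rho \in \Theta_\rho} G(\beta,\rho) = \max_{\rho \in \Theta_\rho} G(\beta_0,\rho) \quad \text{for all } \beta \in \Theta_\beta, \tag{5}$$
$$\rho(\beta) \neq \rho_0 \quad \text{for all } \beta \neq \beta_0.$$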
To ensure that the parameters can be identified from a combined cross-sectional and time-series data set we impose the following condition. Define $\theta \equiv (\beta', \nu')'$ and assume that (i) there exists a unique solution to the system of equations:
$$\left(\frac{\partial F(\beta,\nu,\rho)}{\partial\theta'}, \frac{\partial G(\beta,\rho)}{\partial\rho'}\right) = 0, \tag{6}$$
and (ii) the solution is given by the true value of the parameters. In summary, our model is characterized by the high level assumptions in (3), (4), (5), and by the assumption that (6) only has one solution at the true parameter values.
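As a rough illustration of how the two data sources complement each other, the stylized farmer example from the beginning of this section can be simulated. The sketch below is not the authors' procedure; the function name, parameter values, and distributions of wages and investment are hypothetical. One departure from the text is labeled in the code: the aggregate shock is simulated around a positive mean so that $\log\nu_t$ is defined, which leaves $\sigma_\nu^2$ unchanged and is absorbed by the intercept of the time series regression. The panel identifies $\phi$ from the wage coefficient $\phi/(\phi-1)$, the time series identifies $\sigma_\nu^2$, and $\alpha = \phi\sigma_\nu^2$ requires both.

```python
import numpy as np


def simulate_and_estimate(n=5000, T=3, tau=5000, alpha=0.02, rho=0.5,
                          sigma_u2=0.03, beta1=1.0, beta2=0.5, seed=0):
    """Simulate the stylized farmer example and estimate (phi, sigma_nu^2, alpha)."""
    rng = np.random.default_rng(seed)
    sigma_nu2 = sigma_u2 / (1.0 - rho ** 2)      # Var(nu_t) of the AR(1) shock
    phi = alpha / sigma_nu2

    # Aggregate shock: AR(1) deviations around a positive mean of 1.0.
    # ASSUMPTION of this sketch (not in the text): the mean shift keeps nu_t > 0.
    a = np.zeros(tau)
    a[0] = rng.normal(0.0, np.sqrt(sigma_nu2))
    u = rng.normal(0.0, np.sqrt(sigma_u2), size=tau)
    for s in range(1, tau):
        a[s] = rho * a[s - 1] + u[s]
    nu = 1.0 + a

    # Time series step: OLS of log y on a constant and log x; residual variance.
    log_x = rng.normal(size=tau)
    log_y = beta1 + beta2 * log_x + nu
    X = np.column_stack([np.ones(tau), log_x])
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    sigma_nu2_hat = (log_y - X @ coef).var()

    # Panel step: the last T periods of the time series are the panel periods.
    nu_panel = nu[-T:]
    if np.any(nu_panel <= 0):
        raise ValueError("simulated shock is not positive; change the seed")
    s_eps = 0.3
    eps = rng.normal(-0.5 * s_eps ** 2, s_eps, size=(n, T))   # E[exp(eps)] = 1
    log_rw = rng.normal(size=(n, T))                          # log real wage
    b = phi / (phi - 1.0)                                     # wage coefficient
    time_effect = -b * np.log(phi) - np.log(nu_panel) / (phi - 1.0)
    log_q = time_effect[None, :] + b * log_rw + eps

    # Time dummies are handled by demeaning within each period.
    xd = log_rw - log_rw.mean(axis=0)
    yd = log_q - log_q.mean(axis=0)
    b_hat = (xd * yd).sum() / (xd ** 2).sum()
    phi_hat = b_hat / (b_hat - 1.0)              # invert b = phi / (phi - 1)

    return {"phi_hat": phi_hat, "sigma_nu2_hat": sigma_nu2_hat,
            "alpha_hat": phi_hat * sigma_nu2_hat, "beta2_hat": coef[1]}


if __name__ == "__main__":
    print(simulate_and_estimate())
```

Neither sample alone delivers $\alpha$ in this design: the panel is silent about $\sigma_\nu^2$ because $T$ is small, and the time series contains no information about $\phi$.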
3 Asymptotic Inference
Our asymptotic framework is such that standard textbook level analysis suffices for the discussion of consistency of the estimators. In standard analysis with a single data source, one typically restricts the moment equation to ensure identification, and imposes further restrictions such that the sample analog of the moment function converges uniformly to the population counterpart. Because these arguments are well known we simply impose as a high-level assumption that our estimators are consistent. The purpose of this section is to provide an overview of our results while a rigorous technical discussion is relegated to Section 4 which may be skipped by a less technically oriented reader.
For expositional purposes, suppose that the time series $z_t$ is such that the log of its conditional probability density function given $z_{t-1}$ is $g(z_t|z_{t-1},\rho)$. To simplify the exposition in this section we assume that the time series model does not depend on the micro parameter $\beta$. To distinguish this scenario, where the time series parameter $\rho$ can be estimated in a first step using only time series data, from the general case, which requires joint estimation of $\beta$, $\nu$ and $\rho$, we denote the consistent first stage estimator of $\rho$ by $\tilde\rho$. The notation $\hat\rho$ is reserved for the more general joint estimator.

We assume that the dimension of the time series data is $\tau$, and that $\tilde\rho$ is a regular estimator with an influence function $\varphi_s$ such that
$$\sqrt{\tau}(\tilde\rho - \rho) = \frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s + o_p(1) \tag{7}$$
with $E[\varphi_s] = 0$. Here, $\tau_0 + 1$ denotes the beginning of the time series data, which is allowed to differ from the beginning of the panel data. Using $\tilde\rho$ from the time series data, we can then consider maximizing the criterion $F_n(\beta,\nu,\tilde\rho)$ with respect to $\theta = (\beta, \nu_1, \ldots, \nu_T)$. Implicit in this representation is the idea that we are given a short panel for estimation of $\theta = (\beta, \nu_1, \ldots, \nu_T)$, where $T$ denotes the time series dimension of the panel data. In order to emphasize that $T$ is small, we use the term 'cross-section' for the short panel data set, and adopt asymptotics where $T$ is fixed. The moment equation then is
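$$\frac{\partial F_n(\hat\theta, \tilde\rho)}{\partial\theta} = 0$$
and the asymptotic distribution of $\hat\theta$ is characterized by
$$\sqrt{n}(\hat\theta - \theta) = -\left(\frac{\partial^2 F(\theta,\tilde\rho)}{\partial\theta\partial\theta'}\right)^{-1}\sqrt{n}\,\frac{\partial F_n(\theta,\tilde\rho)}{\partial\theta} + o_p(1).$$
Because $\sqrt{n}\left(\partial F_n(\theta,\tilde\rho)/\partial\theta - \partial F_n(\theta,\rho)/\partial\theta\right) \approx \left(\partial^2 F(\theta,\rho)/\partial\theta\partial\rho'\right)\frac{\sqrt{n}}{\sqrt{\tau}}\sqrt{\tau}(\tilde\rho - \rho)$ we obtain
$$\sqrt{n}(\hat\theta - \theta) = -A^{-1}\sqrt{n}\,\frac{\partial F_n(\theta,\rho)}{\partial\theta} - A^{-1}B\,\frac{\sqrt{n}}{\sqrt{\tau}}\left(\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s\right) + o_p(1) \tag{8}$$
with
$$A \equiv \frac{\partial^2 F(\theta,\rho)}{\partial\theta\partial\theta'}, \qquad B \equiv \frac{\partial^2 F(\theta,\rho)}{\partial\theta\partial\rho'}.$$
We adopt asymptotics where $n, \tau \to \infty$ at the same rate, but $T$ is fixed. We stress that a technical difficulty arises because we are conditioning on the factors $(\nu_1, \ldots, \nu_T)$. This is accounted for in the limit theory we develop through a convergence concept by Renyi (1963) called stable convergence, essentially a notion of joint convergence. It can be thought of as convergence conditional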
on a specified $\sigma$-field, in our case the $\sigma$-field $\mathcal{C}$ generated by $(\nu_1, \ldots, \nu_T)$. In simple special cases, and because $T$ is fixed, the asymptotic distribution of $\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$ conditional on $(\nu_1, \ldots, \nu_T)$ may be equal to the unconditional asymptotic distribution. However, as we show in Section 4, this is not always the case, even when the model is stationary.

Renyi (1963) and Aldous and Eagleson (1978) show that the concepts of convergence of the distribution conditional on any positive probability event in $\mathcal{C}$ and the concept of stable convergence are equivalent. Eagleson (1975) proves a stable CLT by establishing that the conditional characteristic functions converge almost surely. Hall and Heyde's (1980) proof of stable convergence on the other hand is based on demonstrating that the characteristic function converges weakly in $L_1$. As pointed out in Kuersteiner and Prucha (2013), the Hall and Heyde (1980) approach lends itself to proving the martingale CLT under slightly weaker conditions than what Eagleson (1975) requires. While both approaches can be used to demonstrate very similar stable and thus conditional limit laws, neither simplifies to conventional marginal weak convergence except in trivial cases. A related idea might be to derive conditional (on $\mathcal{C}$) limiting results separately for the cross-section and time series dimension of our estimators. As noted before, such a result in fact amounts to demonstrating stable convergence, in this case for each dimension separately. Irrespective, this approach is flawed because it does not deliver joint convergence of the two components. It is evident from (8) that the continuous mapping theorem needs to be applied to derive the asymptotic distribution of $\hat\theta$. Because both $A$ and $B$ are $\mathcal{C}$-measurable random variables in the limit the continuous mapping theorem can only be applied if joint convergence of $\sqrt{n}\,\partial F_n(\theta,\rho)/\partial\theta$, $\tau^{-1/2}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$ and any $\mathcal{C}$-measurable random variable is established. Joint stable convergence of both components delivers exactly that. Finally, we point out that it is perfectly possible to consistently estimate parameters, in our case $(\nu_1, \ldots, \nu_T)$, that remain random in the limit. For related results, see the recent work of Kuersteiner and Prucha (2015).

Here, for the purpose of illustration we consider the simple case where the dependence of the time series component on the factors $(\nu_1, \ldots, \nu_T)$ vanishes asymptotically. Let's say that the unconditional distribution is such that
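$$\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s \to_d N(0, \Omega_\nu)$$
where $\Omega_\nu$ is a fixed constant that does not depend on $(\nu_1, \ldots, \nu_T)$. Let's also assume that
$$\sqrt{n}\,\frac{\partial F_n(\theta,\rho)}{\partial\theta} \to_d N(0, \Omega_y)$$
conditional on $(\nu_1, \ldots, \nu_T)$. Unlike in the case of the time series sample, $\Omega_y$ generally does depend on $(\nu_1, \ldots, \nu_T)$ through the parameter $\theta$.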
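A toy simulation may help to see what it means for the cross-sectional variance to depend on the common shock. The design below is hypothetical and much simpler than the models covered by the paper: the score is $\nu e_i$ with $e_i$ standard normal and a single shock $\nu$ drawn from two values, so that the variance conditional on $\nu$ is $\nu^2$. Conditional on $\nu$ the scaled sample mean is normal, while across draws of $\nu$ it is a scale mixture of normals with excess kurtosis. Dividing by the within-sample standard deviation, which estimates the conditional scale, removes the dependence on $\nu$.

```python
import numpy as np


def simulate_mixed_normal(n=200, reps=10000, seed=0):
    """Draw sqrt(n) * sample means of scores nu * e_i across replications.

    Hypothetical toy design: nu is a common shock taking the values 0.5 or 2.0
    with equal probability, and e_i are i.i.d. standard normal.
    """
    rng = np.random.default_rng(seed)
    nu = rng.choice([0.5, 2.0], size=reps)            # one common shock per replication
    e = rng.standard_normal((reps, n))
    score = nu[:, None] * e                           # Var(score | nu) = nu^2
    raw = np.sqrt(n) * score.mean(axis=1)             # N(0, nu^2) conditional on nu
    studentized = raw / score.std(axis=1, ddof=1)     # scaled by the estimated sd
    return raw, studentized


def kurtosis(x):
    """Sample kurtosis (equal to 3 for a normal distribution)."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2


if __name__ == "__main__":
    raw, studentized = simulate_mixed_normal()
    print("kurtosis, unscaled statistic:   ", kurtosis(raw))
    print("kurtosis, studentized statistic:", kurtosis(studentized))
```

The unscaled statistic has kurtosis well above 3 because its variance is random, whereas the studentized statistic is close to standard normal in every replication.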
We note that $\partial F(\theta,\rho)/\partial\theta$ is a function of $(\nu_1, \ldots, \nu_T)$. If there is overlap between $(1, \ldots, T)$ and $(\tau_0+1, \ldots, \tau_0+\tau)$, we need to worry about the asymptotic distribution of $\frac{1}{\sqrt{\tau}}\sum_{s=\tau_0+1}^{\tau_0+\tau}\varphi_s$ conditional on $(\nu_1, \ldots, \nu_T)$. However, because in this example the only connection between $y$ and $\varphi$ is assumed to be through $\theta$ and because $T$ is assumed fixed, the two terms $\sqrt{n}\,\partial F_n(\theta,\rho)/\partial\theta$ and $\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+\tau}\varphi_t$ are expected to be asymptotically independent in the trend stationary case and when $\Omega_\nu$ does not depend on $(\nu_1, \ldots, \nu_T)$. Even in this simple setting, independence between the two samples does not hold, and asymptotic conditional or unconditional independence as well as joint convergence with $\mathcal{C}$-measurable random variables needs to be established formally. This is achieved by establishing $\mathcal{C}$-stable convergence in Section 4.2.

It follows that
$$\sqrt{n}(\hat\theta - \theta) \to N\left(0,\; A^{-1}\Omega_y A^{-1} + \kappa A^{-1}B\Omega_\nu B'A^{-1}\right), \tag{9}$$
where $0 < \kappa \equiv \lim n/\tau < \infty$. This means that a practitioner would use the square root of
$$\frac{1}{n}\left(A^{-1}\Omega_y A^{-1} + \frac{n}{\tau}A^{-1}B\Omega_\nu B'A^{-1}\right) = \frac{1}{n}A^{-1}\Omega_y A^{-1} + \frac{1}{\tau}A^{-1}B\Omega_\nu B'A^{-1}$$
as the standard error. This result looks similar to Murphy and Topel's (1985) formula, except that we need to make an adjustment to the second component to address the differences in sample sizes.

The assumption that $0 < \kappa < \infty$ is used as a technical device to obtain an asymptotic approximation that accounts for estimation errors stemming both from the cross-section and time series samples. Simulation results in Hahn et al. (2016, 2019a) for data and sample sizes calibrated to actual macro data show that our approximation provides good control for estimator bias and test size. The knife edge case $\kappa = 0$ corresponds to situations where the estimation of time series parameters can be neglected for inference about $\theta$, and where now $\sqrt{n}(\hat\theta - \theta) \to N(0, A^{-1}\Omega_y A^{-1})$. The expansion in (8) also shows that the case $\kappa = \infty$ leads to a scenario where uncertainty from the time series dominates such that the rate of convergence of $\hat\theta$ now is $\sqrt{\tau}$ rather than $\sqrt{n}$ and where $\sqrt{\tau}(\hat\theta - \theta) \to N(0, A^{-1}B\Omega_\nu B'A^{-1})$. However, in what follows we focus on the case most relevant in practice where $0 < \kappa < \infty$.

The asymptotic variance formula is such that the noise of the time series estimator $\tilde\rho$ can make quite a difference if $\kappa$ is large, i.e., if the time series size $\tau$ is small relative to the cross section size $n$. Obviously, this calls for long time series for accurate estimation of the micro parameter $\beta$. We also note that time series estimation asymptotically has no impact on micro estimation if $B = 0$. One scenario where $B = 0$ is the case where the model is additively separable in $\theta$ and $\rho$ such that $F(\theta,\rho) = F_1(\theta) + F_2(\rho)$. This confirms the intuition that if $\rho$ does not appear as part of the micro moment $f$, which is the case in Heckman and Sedlacek (1985), and Heckman, Lochner, and Taber (1998), cross section estimation can be considered separate from time series estimation.
In the more general setting of Section 4, $\Omega_\nu$ may depend on $(\nu_1, \ldots, \nu_T)$. In this case the limiting distribution of the time series component is mixed Gaussian and dependent upon the limiting distribution of the cross-sectional component. This dependence does not vanish asymptotically even in stationary settings. As we show in Section 4, standard inference based on asymptotically pivotal statistics is available even though the limiting distribution of $\hat\theta$ is no longer a sum of two independent components.
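For the simple case behind (9), the variance expression used for standard errors can be written out in a few lines. The sketch below is a minimal illustration, assuming that estimates of $A$, $B$, $\Omega_y$ and $\Omega_\nu$ are already available as arrays; the function name and argument names are hypothetical.

```python
import numpy as np


def combined_standard_errors(A, B, Omega_y, Omega_nu, n, tau):
    """Variance and standard errors of theta-hat from the two-sample formula.

    Computes (1/n) A^{-1} Omega_y A^{-1} + (1/tau) A^{-1} B Omega_nu B' A^{-1}.
    A is k_theta x k_theta, B is k_theta x k_rho, Omega_y is k_theta x k_theta
    and Omega_nu is k_rho x k_rho. A is treated as symmetric (it is a Hessian),
    so A_inv.T equals A_inv.
    """
    A_inv = np.linalg.inv(A)
    cross_section_part = A_inv @ Omega_y @ A_inv.T / n
    time_series_part = A_inv @ B @ Omega_nu @ B.T @ A_inv.T / tau
    V = cross_section_part + time_series_part
    return V, np.sqrt(np.diag(V))


if __name__ == "__main__":
    # Scalar example with made-up inputs: n = 100 and tau = 25.
    V, se = combined_standard_errors(
        np.array([[2.0]]), np.array([[1.0]]),
        np.array([[4.0]]), np.array([[9.0]]), n=100, tau=25)
    print(V, se)
```

In the scalar example the cross-section term contributes 0.01 and the time series term 0.09, so most of the sampling variance comes from the short time series. Setting $B = 0$ removes the second term.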
4 Joint Panel-Time Series Limit Theory
In this section we first establish a generic joint limiting result for a combined panel-time series process and then specialize it to the limiting distributions of parameter estimates under stationarity and non-stationarity. Let $(\Omega', \mathcal{G}', P')$ be a probability space with random sequences $\{z_t\}_{t=-\infty}^{\infty}$ and $\{y_{i1}, \dots, y_{iT}\}_{i=1}^{\infty}$. The observed sample $\{z_t\}_{t=\tau_0+1}^{\tau_0+\tau}$ and $\{y_{i1}, \dots, y_{iT}\}_{i=1}^{n}$ is a subset of these random sequences. The process we analyze consists of a triangular array of panel data $\psi^y_{n,it}$ where $\psi^y_{n,it}$ typically is a function of $y_{it}$ and parameters, observed for $i = 1, \dots, n$ and $t = 1, \dots, T$. We let $n \to \infty$ while $T$ is fixed and $t = 1$ is an arbitrary normalization of time at the beginning of the cross-sectional sample. It also consists of a separate triangular array of time series $\psi^\nu_{\tau,t}$, where $\psi^\nu_{\tau,t}$ is a function of $z_t$ and parameters for $t = \tau_0 + 1, \dots, \tau_0 + \tau$. We allow for $\tau_0$ to be either fixed with $-\infty < -K \le \tau_0 \le K < \infty$ for some bounded $K$ and $\tau \to \infty$, or to vary as a function of the time series sample size $\tau$. In the latter case we write $\tau_0 = \tau_0(\tau)$ and use the shorthand notation $\tau_0$ when no confusion arises. The fixed $\tau_0$ scenario corresponds to a situation where a (hypothetical) time series sample is observed into the infinite future. For the case where $\tau_0 = \tau_0(\tau)$, we restrict $\tau_0(\tau)$ to $\tau_0(\tau) = -\upsilon\tau + \tau_{0,f} + T$ for some $\upsilon \in [0, 1]$ and where $\tau_{0,f}$ is a fixed constant. This covers the case $\tau_0 = -\tau + T$. The case where $\tau_0$ varies with the time series sample size in the prescribed way can be used to model situations where the asymptotics are carried out 'backwards' in time (when $\upsilon = 1$) or where the panel data are located at a fixed fraction of the time series sample as the time series sample size tends to infinity ($0 < \upsilon < 1$).
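As a rough illustration of the sampling schemes just described, the following sketch computes the first and last dates of the time series sample, $t = \tau_0 + 1, \dots, \tau_0 + \tau$, from $\tau_0(\tau) = -\upsilon\tau + \tau_{0,f} + T$. The function name `time_series_window` and the rounding of $\upsilon\tau$ to an integer are assumptions made for the illustration only.

```python
def time_series_window(tau, T, upsilon=0.0, tau_0f=0):
    """Return (first, last) dates of the time series sample.

    Uses tau_0(tau) = -upsilon * tau + tau_0f + T, with upsilon in [0, 1].
    Rounding upsilon * tau to an integer is an assumption of this sketch.
    """
    if not 0.0 <= upsilon <= 1.0:
        raise ValueError("upsilon must lie in [0, 1]")
    tau_0 = int(round(-upsilon * tau)) + tau_0f + T
    return tau_0 + 1, tau_0 + tau


# Backward asymptotics (upsilon = 1): the window ends at T and grows into the past.
print(time_series_window(tau=100, T=3, upsilon=1.0))
# Panel located at a fixed fraction of the time series sample (0 < upsilon < 1).
print(time_series_window(tau=100, T=3, upsilon=0.5))
# Fixed tau_0 (upsilon = 0): the window grows into the future.
print(time_series_window(tau=100, T=3, upsilon=0.0, tau_0f=-3))
```

With $\upsilon = 1$ and $\tau_{0,f} = 0$ the last time series date coincides with $T$, the end of the panel, for every $\tau$, which is the sense in which the asymptotics run backwards.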
We develop asymptotic theory for the sums of some generic random vectors $\psi^y_{n,it}$ and $\psi^\nu_{\tau,t}$. Typically, $\psi^y_{n,it}$ and $\psi^\nu_{\tau,t}$ are the scores or moment functions of a cross-section and time series criterion function based on observed data $y_{it}$ and $z_t$. Below we introduce general regularity conditions for these generic random vectors $\psi^y_{n,it}$ and $\psi^\nu_{\tau,t}$. Let $k_\theta$ be the dimension of the parameter $\theta$ and $k_\rho$ be the dimension of the parameter $\rho$. With some abuse of notation we also denote by $k_\theta$ the number of moment conditions used to identify $\theta$ when GMM estimators are used, with a similar convention applying to $k_\rho$. With this notation, $\psi^y_{n,it}$ takes values in $\mathbb{R}^{k_\theta}$ and $\psi^\nu_{\tau,t}$ takes values in $\mathbb{R}^{k_\rho}$. We assume that $T \le \tau_0 + \tau$. Throughout we assume that $(\psi^y_{n,it}, \psi^\nu_{\tau,t})$ is a type of a vector mixingale sequence relative to a filtration to be specified below. The concept of mixingales was introduced by Gordin (1969, 1973) and McLeish (1975). We derive the joint limiting distribution and a related functional central limit theorem for $\frac{1}{\sqrt{n}} \sum_{t=1}^{T} \sum_{i=1}^{n} \psi^y_{n,it}$ and $\frac{1}{\sqrt{\tau}} \sum_{t=\tau_0+1}^{\tau_0+\tau} \psi^\nu_{\tau,t}$.
We now form the triangular array of filtrations similarly to Kuersteiner and Prucha (2013). The filtrations are a theoretical construct defined on the probability space in such a way that the observed sample is a strict subset of the random variables that generate the filtrations. We proceed by first collecting information about all of the common shocks, $(\nu_1, \dots, \nu_T)$, then we pick the initial time-series realization $z_{\min(1,\tau_0)}$ in such a way that it predates or coincides with the initial period of the panel data set. We then sequentially add the cross-sectional units for the same time period,
starting from $i = 1$ to $i = n$.$^6$ Subsequently the same procedure is repeated by shifting the time index ahead by one period. The process ends once the time index reaches $\max(T, \tau)$. If $\tau > T$, which eventually happens in forward asymptotics, but may also arise in backward asymptotics, we enlarge the filtration by adding random variables $y_{it}$ even when $t > T$. These random variables are generated from the same distribution that produces the observed cross-sectional sample but are not actually included in the cross-sectional sample. This enlargement is used mostly for notational convenience.
Formally, the filtrations are defined as follows. We use the binary operator $\vee$ to denote the smallest $\sigma$-field that contains the union of two $\sigma$-fields. Setting $\mathcal{C} = \sigma(\nu_1, \dots, \nu_T)$ we define
$$\begin{aligned}
\mathcal{G}_{\tau n,0} &= \mathcal{C} \\
\mathcal{G}_{\tau n,i} &= \sigma\left(z_{\min(1,\tau_0)}, \{y_{j,\min(1,\tau_0)}\}_{j=1}^{i}\right) \vee \mathcal{C} \\
&\;\;\vdots \\
\mathcal{G}_{\tau n,n+i} &= \sigma\left(\{y_{j,\min(1,\tau_0)}\}_{j=1}^{n}, z_{\min(1,\tau_0)+1}, z_{\min(1,\tau_0)}, \{y_{j,\min(1,\tau_0)+1}\}_{j=1}^{i}\right) \vee \mathcal{C} \\
&\;\;\vdots \\
\mathcal{G}_{\tau n,(t-\min(1,\tau_0))n+i} &= \sigma\left(\{y_{j,t-1}, y_{j,t-2}, \dots, y_{j,\min(1,\tau_0)}\}_{j=1}^{n}, z_t, z_{t-1}, \dots, z_{\min(1,\tau_0)}, \{y_{j,t}\}_{j=1}^{i}\right) \vee \mathcal{C}.
\end{aligned} \tag{10}$$
As noted before, the filtration $\mathcal{G}_{\tau n,tn+i}$ is generated by a set of random variables that contain the observed sample as a strict subset. More specifically, we note that the cross-section $y_{j,t}$ is only observed in the sample for a finite number of time periods while the filtrations range over the entire expanding time series sample period. We use the convention that $\mathcal{G}_{\tau n,(t-\min(1,\tau_0))n} = \mathcal{G}_{\tau n,(t-\min(1,\tau_0)-1)n+n}$. This implies that $z_t$ and $y_{1t}$ are added simultaneously to the filtration $\mathcal{G}_{\tau n,(t-\min(1,\tau_0))n+1}$. Also note that $\mathcal{G}_{\tau n,i}$ predates the time series sample by at least one period. To simplify notation define the function $q_n(t, i) = (t - \min(1, \tau_0))n + i$ that maps the two-dimensional index $(t, i)$ into the integers and note that for $q = q_n(t, i)$ it follows that $q \in \{0, \dots, \max(T, \tau)n\}$.
$^6$ The filtrations are constructed based on the ordering of the cross-sectional sample. However, at the cost of slightly stronger moment conditions, the $\sigma$-fields can be constructed in a way that is invariant to reordering of the cross-sectional sample. We refer the interested reader to Kuersteiner and Prucha (2013), in particular Definition 1 and the related discussion for details. Note that the stronger moment conditions needed for the invariance property hold in situations where $y_{it}$ is cross-sectionally independent conditional on $\{z_t, z_{t-1}, \dots\} \vee \mathcal{C}$. The latter is a leading case in our examples.
The filtrations $\mathcal{G}_{\tau n,q}$ are increasing in the sense that $\mathcal{G}_{\tau n,q} \subset \mathcal{G}_{\tau n,q+1}$ for all $q$, $\tau$ and $n$. However, they are not nested in the sense of Hall and Heyde's (1980) Condition (3.21) for two reasons. One is the fact that we are considering what essentially amounts to a panel structure generating the filtration. Kuersteiner and Prucha (2013) provide a detailed discussion of this aspect. A second reason has to do with the possible 'backwards' asymptotics adopted in this paper. Even in a pure time series setting, i.e. omitting $y_{i,t}$ from the definitions (10), backwards asymptotics lead to a violation of Hall and Heyde's Condition (3.21) because the definition of $\mathcal{G}_{\tau n,q}$ changes as $q$ is held fixed but $\tau$ increases. The consequence of this is that unlike in Hall and Heyde (1980) stable convergence cannot be established for the entire probability space, but rather is limited to the invariant $\sigma$-field $\mathcal{C}$. This limitation is the same as in Kuersteiner and Prucha (2013) and also appears in Eagleson (1975), albeit due to very different technical reasons. The fact that the definition of $\mathcal{G}_{\tau n,q}$ changes for fixed $q$ does not pose any problems for the proofs that follow, because the underlying proof strategy explicitly accounts for triangular arrays and does not use Hall and Heyde's Condition (3.21).
The central limit theorem we develop needs to establish joint convergence for terms involving
both $\psi^y_{n,it}$ and $\psi^\nu_{\tau,t}$ with both the time series and the cross-sectional dimension becoming large simultaneously. Let $[a]$ be the largest integer less than or equal to $a$. Joint convergence is achieved by stacking both moment vectors into a single sum that extends over both $t$ and $i$. Let $r \in [0, 1]$ and define
$$\tilde{\psi}^\nu_{it}(r) \equiv \frac{\psi^\nu_{\tau,t}}{\sqrt{\tau}} 1\{\tau_0 + 1 \le t \le \tau_0 + [\tau r]\} 1\{i = 1\}, \tag{11}$$
which depends on $r$ in a non-trivial way. This dependence will be of particular interest when we specialize our models to the near unit root case. For cross-sectional data define
$$\tilde{\psi}^y_{it}(r) \equiv \tilde{\psi}^y_{it} \equiv \frac{\psi^y_{n,it}}{\sqrt{n}} 1\{1 \le t \le T\} \tag{12}$$
where $\tilde{\psi}^y_{it}(r) = \tilde{\psi}^y_{it}$ is constant as a function of $r \in [0, 1]$. In turn, this implies that functional convergence of the component (12) is the same as the finite dimensional limit. It also means that the limiting process is degenerate (i.e. constant) when viewed as a function of $r$. However, this does not matter in our applications as we are only interested in the sample averages
$$\frac{1}{\sqrt{n}} \sum_{t=1}^{T} \sum_{i=1}^{n} \psi^y_{n,it} = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)} \sum_{i=1}^{n} \tilde{\psi}^y_{it} \equiv X^y_{n\tau}.$$
Define the stacked vector $\tilde{\psi}_{it}(r) = \left(\tilde{\psi}^y_{it}(r)', \tilde{\psi}^\nu_{it}(r)'\right)' \in \mathbb{R}^{k_\phi}$ where $\phi = (\theta, \rho)$ and $k_\phi$ is the dimension of $\Theta$. Consider the stochastic process
$$X_{n\tau}(r) = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)} \sum_{i=1}^{n} \tilde{\psi}_{it}(r), \qquad X_{n\tau}(0) = \left(X^{y\prime}_{n\tau}, 0\right)'. \tag{13}$$
We derive a functional central limit theorem which establishes joint convergence between the panel and time series portions of the process $X_{n\tau}(r)$. The result is useful in analyzing both trend stationary and unit root settings. In the latter, we specialize the model to a linear time series setting. The functional CLT is then used to establish proper joint convergence between stochastic integrals and the cross-sectional component of our model.
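The index map $q_n(t, i)$ and the stacked sum in (11)-(13) can be sketched for scalar moments as follows. This is only an illustration of the bookkeeping, not part of the formal argument; the dictionary containers `psi_y` and `psi_nu` and the function names are hypothetical.

```python
import math


def q_index(t, i, n, tau_0):
    """Single index q_n(t, i) = (t - min(1, tau_0)) * n + i."""
    return (t - min(1, tau_0)) * n + i


def stacked_sum(psi_y, psi_nu, tau_0, r):
    """Scalar version of X_{n tau}(r) in (13).

    psi_y[(i, t)] holds cross-section moments for i = 1..n, t = 1..T.
    psi_nu[t] holds time series moments for t = tau_0+1..tau_0+tau.
    Returns (cross-section component, time series component).
    """
    n = max(i for i, _ in psi_y)
    T = max(t for _, t in psi_y)
    tau = len(psi_nu)
    cutoff = tau_0 + math.floor(tau * r)  # [tau r] is the integer part
    x_y = x_nu = 0.0
    for t in range(min(1, tau_0 + 1), max(T, tau_0 + tau) + 1):
        for i in range(1, n + 1):
            if 1 <= t <= T:  # indicator in (12)
                x_y += psi_y[(i, t)] / math.sqrt(n)
            if i == 1 and tau_0 + 1 <= t <= cutoff:  # indicators in (11)
                x_nu += psi_nu[t] / math.sqrt(tau)
    return x_y, x_nu


psi_y = {(1, 1): 1.0, (2, 1): 2.0, (1, 2): 3.0, (2, 2): 4.0}  # n = 2, T = 2
psi_nu = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}                     # tau_0 = -1, tau = 4
for r in (0.0, 0.5, 1.0):
    print(r, stacked_sum(psi_y, psi_nu, tau_0=-1, r=r))
```

The cross-section component is the same for every $r$, while the time series component is a partial sum that grows with $r$, matching the remark that (12) is constant in $r$ and (11) is not.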
For the stationary case we are mostly interested in $X_{n\tau}(1)$ where in particular
$$\frac{1}{\sqrt{\tau}} \sum_{t=\tau_0+1}^{\tau_0+\tau} \psi^\nu_{\tau,t} = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)} \sum_{i=1}^{n} \tilde{\psi}^\nu_{it}(1)$$
and we used the fact that $\sum_{i=1}^{n} \tilde{\psi}^\nu_{it}(1) = \tilde{\psi}^\nu_{1t}(1) = \frac{\psi^\nu_{\tau,t}}{\sqrt{\tau}} 1\{\tau_0 + 1 \le t \le \tau_0 + \tau\}$ by (11). The limiting distribution of $X_{n\tau}(1)$ is a simple corollary of the functional CLT for $X_{n\tau}(r)$. We note that our treatment differs from Phillips and Moon (1999), who develop functional CLT's for the time series dimension of the panel data set. In our case, since $T$ is fixed and finite, a similar treatment is not applicable.
We introduce the following general regularity conditions for generic random vectors $\psi^y_{n,it}$ and $\psi^\nu_{\tau,t}$. Similarly, the central limit theorem established in this section is for generic random vectors and empirical processes satisfying the regularity conditions. In later sections, these conditions will be specialized to the particular models considered there. To apply the general theory in this section to specific models we will evaluate the score or moment function at the true parameter value. In those instances $\psi^y_{n,it}$ and $\psi^\nu_{\tau,t}$ will be interpreted as the score or moment function evaluated at the true parameter value. We use $\|\cdot\|$ to denote the Euclidean norm.
Condition 1 Assume that
i) $\psi^y_{n,it}$ is measurable with respect to $\mathcal{G}_{\tau n,(t-\min(1,\tau_0))n+i}$.
ii) $\psi^\nu_{\tau,t}$ is measurable with respect to $\mathcal{G}_{\tau n,(t-\min(1,\tau_0))n+i}$ for all $i = 1, \dots, n$.
iii) for some $\delta > 0$ and $C < \infty$, $\sup_{it} E\left[\|\psi^y_{n,it}\|^{2+\delta}\right] \le C$ for all $n \ge 1$.
iv) for some $\delta > 0$ and $C < \infty$, $\sup_{t} E\left[\|\psi^\nu_{\tau,t}\|^{2+\delta}\right] \le C$ for all $\tau \ge 1$.
v) $E\left[\psi^y_{n,it} \mid \mathcal{G}_{\tau n,(t-\min(1,\tau_0))n+i-1}\right] = 0$.
vi) $E\left[\psi^\nu_{\tau,t} \mid \mathcal{G}_{\tau n,(t-\min(1,\tau_0)-1)n+i}\right] = 0$ for $t > T$ and all $i = 1, \dots, n$.
vii) $\left\| E\left[\psi^\nu_{\tau,t} \mid \mathcal{G}_{\tau n,(t-\min(1,\tau_0)-1)n+i}\right] \right\|_2 \le \vartheta_t$ for $t < 0$ and all $i = 1, \dots, n$ where
$$\vartheta_t \le C \left(|t|^{1+\delta}\right)^{-1/2}$$
for the same $\delta$ as in (iv) and some bounded constant $C$ and where for a vector of random variables $x = (x_1, \dots, x_d)$, $\|x\|_2 = \left(\sum_{j=1}^{d} E|x_j|^2\right)^{1/2}$ is the $L_2$ norm.
Remark 1 Conditions 1(i), (iii) and (v) can be justified in a variety of ways. One is the subordinated process theory employed in Andrews (2005) which arises when $y_{it}$ are random draws from a population of outcomes $y$. A sufficient condition for Condition 1(v) to hold is that $E[\psi(y \mid \theta, \rho, \nu_t) \mid \mathcal{C}] = 0$ holds in the population. This would be the case, for example, if $\psi$ were the correctly specified score for the population distribution of $y_{i,t}$ given $\nu_t$. See Andrews (2005, pp. 1573-1574).
Conditions 1(i), (iii) and (v) impose a martingale property for $\psi^y_{n,it}$ both in the time as well as in the cross-section dimension. More specifically, $E\left[\psi^y_{n,it} \mid \psi^y_{n,i_1 t}, \psi^y_{n,i_2 t_2}, \dots, \psi^y_{n,i_k t_k}, \mathcal{C}\right] = 0$ for any collection $(i_1, t), \dots, (i_k, t_k)$ with $i_r < i$, $t_2, \dots, t_k < t$ and $i_r \in \{1, \dots, n\}$. The central limit theorem is established by letting the sums over the sample increase sequentially in line with the information contained in $\mathcal{G}_{\tau n,q}$ and thus mapping the sums over $t$ and $i$ into a single sum over an index set on the real line. This approach preserves the information structure in $\mathcal{G}_{\tau n,q}$ and in particular takes into account that in general $E\left[\psi^y_{n,it} \mid \psi^y_{n,js}, \mathcal{C}\right] \neq 0$ for $s > t$ and all $j \in \{1, \dots, n\}$. The score of a correctly specified likelihood is a leading example where the information structure takes the form implied by our conditions.
The construction in this paper is in contrast to some of the joint limit theory for panels that is based on mds assumptions in the time direction only and weak convergence of time series averages of cross-sectional aggregates $\breve{\psi}^y_{n,t} = n^{-1/2} \sum_{i=1}^{n} \psi^y_{n,it}$, driven by $T$ going to infinity. Our theory on the other hand depends on the cross-section sample size $n$ tending to infinity, while $T$ is kept fixed. The pure cross-section case is covered by allowing for $T = 1$. Other examples include cases where the cross-section is sampled at random conditional on $\mathcal{C}$ and $\psi^y_{n,it}$ is a conditional moment function in a rational agent model.
Conditions 1(ii), (iv) and (vi) impose a martingale property for $\psi^\nu_{\tau,t}$ in the time dimension. In addition the condition also implies that $E\left[\psi^\nu_{\tau,t} \mid \psi^y_{n,is}, \mathcal{C}\right] = 0$ for any $i \in \{1, \dots, n\}$ and $s < t$ and $t > T$. We note that this condition is weaker than assuming independence between $\psi^\nu_{\tau,t}$ and $\psi^y_{n,is}$, even conditionally on $\mathcal{C}$. While the examples in Hahn, Kuersteiner and Mazzocco (2019) do satisfy such a conditional independence restriction, it is not required for the CLT developed in this paper.
We note that $E\left[\psi^\nu_{\tau,t} \mid \mathcal{G}_{\tau n,q_n(t-1,i)}\right] = 0$ for all $i$ only holds for $t > T$ because we condition not only on $z_{t-1}, z_{t-2}, \dots$ but also on $\nu_1, \dots, \nu_T$, where the latter may have non-trivial overlap with the former. When $\tau_0$ is fixed the number of time periods $t$ where $E\left[\psi^\nu_{\tau,t} \mid \mathcal{G}_{\tau n,q_n(t-1,i)}\right] \neq 0$ is finite and thus can be neglected asymptotically. On the other hand, when $\tau_0$ varies with $\tau$ there is an asymptotically non-negligible number of time periods where the condition may not hold. To handle this latter case we impose an additional mixingale type condition that is satisfied for typical time series models.
Condition 1(vii) is an additional restriction needed to handle situations where $\tau_0$, the starting point of the time series sample, is allowed to diverge to $-\infty$. We call this situation backward asymptotics. Since it generally is the case that $E\left[\psi^\nu_{\tau,t} \mid \mathcal{C}\right] \neq 0$ for $t < 0$ because $\mathcal{C}$ contains information about future realizations of $\psi^\nu_{\tau,t}$, we need a condition that limits this dependence as $t \to -\infty$. The following example illustrates that the condition naturally holds in linear time series models.
Example 1 Assume $u_s$ is iid $N(0, 1)$ and $z_s = \sum_{j=0}^{\infty} \rho^j u_{s-j}$ with $|\rho| < 1$ is the stationary solution to $z_{s+1} = \rho z_s + u_{s+1}$. Use the convention that $\nu_s = z_s$. Then the score of the Gaussian likelihood is $\psi^\nu_{\tau,s} = z_s u_{s+1}$. Assuming that $\tau_0 = -\tau + 1$, $T = 1$ and that $z_s$ is independent of $y_{i,t}$ conditional on $\mathcal{C} = \sigma(\nu_1)$, it is sufficient for this example to define $\mathcal{G}_{\tau n,(t-\min(1,\tau_0))n+i} = \sigma(\{z_t, z_{t-1}, \dots, z_{\tau_0}\}) \vee \sigma(\nu_1)$. Then$^7$
$$\left\| E\left[\psi^\nu_{\tau,s} \mid \mathcal{G}_{\tau n,(s-\min(1,\tau_0)-1)n+i}\right] \right\|_2 = O\left(|\rho|^{|s|/2}\right) = o\left(|s|^{-(1+\delta)/2}\right).$$
$^7$ Detailed derivations are in Section A.1.
The following conditions, Condition 2 for the time series sample and Condition 3 below for the cross-section sample, are put in place to ensure that, in combination with Condition 1, the variance of $X_{n\tau}(r)$ converges as $n$ and $\tau$ tend to $\infty$ jointly. Conditions 2 and 3 correspond to Condition 3.19 in Hall and Heyde (1980, Theorem 3.2). We note in particular that the martingale structure in conjunction with the uniform moment bounds imposed in Condition 1 are sufficient to guarantee that cross-covariance terms over different time periods converge to zero.
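Condition 2 Assume that:
i) for any $r \in [0, 1]$,
$$\frac{1}{\tau} \sum_{t=\tau_0+1}^{\tau_0+[\tau r]} \psi^\nu_{\tau,t} \psi^{\nu\prime}_{\tau,t} \xrightarrow{p} \Omega_\nu(r) \quad \text{as } \tau \to \infty$$
where $\Omega_\nu(r)$ is positive definite a.s. and measurable with respect to $\sigma(\nu_1, \dots, \nu_T)$ for all $r \in (0, 1]$.
ii) The elements of $\Omega_\nu(r)$ are bounded continuously differentiable functions of $r > s \in [0, 1]$. The derivatives $\dot{\Omega}_\nu(r) = \partial \Omega_\nu(r) / \partial r$ are positive definite almost surely.
iii) There is a fixed constant $M < \infty$ such that $\sup_{\|\lambda_\nu\| = 1, \lambda_\nu \in \mathbb{R}^{k_\rho}} \sup_t \lambda_\nu' \dot{\Omega}_\nu(t) \lambda_\nu \le M$ a.s.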
Remark 2 Note that by construction Ων (0) = 0.
Condition 2 is weaker than the conditions of Billingsley's (1968, Theorem 23.1) functional CLT for strictly stationary martingale difference sequences (mds) because we neither assume strict stationarity nor homoskedasticity. We do not assume that $E\left[\psi^\nu_{\tau,t} \psi^{\nu\prime}_{\tau,t}\right]$ is constant. Brown (1971) allows for time varying variances, but uses stopping times to achieve a standard Brownian limit. Even more general treatments with random stopping times are possible, see Gaenssler and Haeussler (1979). Here, on the other hand, we pursue convergence to a Gaussian process (not a standard Wiener process) with the same methodology as in Billingsley (i.e. establishing convergence of finite dimensional distributions and tightness), but without assuming homoskedasticity. Related results with heteroskedastic errors in the high frequency and stochastic process literature can be found for example in Jacod and Shiryaev (2003, Theorem IX 7.28).
Heteroskedastic errors are explicitly used in Section 5 where $\psi^\nu_{\tau,t} = \exp((t - s)\gamma/\tau)\, \eta_s$. Even if $\eta_s$ is iid$(0, \sigma^2)$ it follows that $\psi^\nu_{\tau,t}$ is a heteroskedastic triangular array that depends on $\tau$. It can be shown that the variance kernel $\Omega_\nu(r)$ is $\Omega_\nu(r) = \sigma^2 (1 - \exp(-2r\gamma)) / 2\gamma$ in this case. See equation (61).
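The closed-form kernel quoted above can be checked against the requirements of Condition 2 in the scalar case. The sketch below is a numerical sanity check under a stylized assumption: the per-period variance weight is taken to be $\sigma^2 \exp(-2\gamma t/\tau)$, which is simply the derivative $\dot{\Omega}_\nu(r) = \sigma^2 \exp(-2r\gamma)$ of the stated kernel evaluated on the grid $r = t/\tau$. The exact form of the array is given in Section 5 and is not reproduced here; the function names are hypothetical.

```python
import math


def omega_nu(r, gamma, sigma2=1.0):
    """Closed-form kernel sigma^2 (1 - exp(-2 r gamma)) / (2 gamma)."""
    if gamma == 0.0:
        return sigma2 * r  # limit of the expression as gamma -> 0
    return sigma2 * (1.0 - math.exp(-2.0 * r * gamma)) / (2.0 * gamma)


def omega_nu_riemann(r, gamma, tau, sigma2=1.0):
    """(1/tau) * sum over t = 1..[tau r] of sigma^2 exp(-2 gamma t / tau).

    Stylized assumption: these weights stand in for the variances of the
    heteroskedastic array; they are the derivative of omega_nu on a grid.
    """
    upper = math.floor(tau * r)
    total = sum(sigma2 * math.exp(-2.0 * gamma * t / tau) for t in range(1, upper + 1))
    return total / tau


for r in (0.0, 0.25, 0.5, 1.0):
    print(r, omega_nu(r, gamma=1.0), omega_nu_riemann(r, gamma=1.0, tau=10000))
```

The kernel vanishes at $r = 0$, is increasing in $r$, and has a derivative that is strictly positive and bounded on $[0, 1]$ for any finite $\gamma$, in line with Remark 2 and Conditions 2(ii) and 2(iii).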
Condition 3 Assume that
$$\frac{1}{n} \sum_{i=1}^{n} \psi^y_{n,it} \psi^{y\prime}_{n,it} \xrightarrow{p} \Omega_{ty}$$
where $\Omega_{ty}$ is positive definite a.s. and measurable with respect to $\sigma(\nu_1, \dots, \nu_T)$.
Condition 2 holds under a variety of conditions that imply some form of weak dependence of the process $\psi^\nu_{\tau,t}$. These include, in addition to Condition 1(ii) and (iv), mixing or near epoch dependence assumptions on the temporal dependence properties of the process $\psi^\nu_{\tau,t}$. Condition 3 holds under appropriate moment bounds and random sampling in the cross-section even if the underlying population distribution is not independent (see Andrews, 2005, for a detailed treatment).
4.1 Stable Functional CLT
This section details the probabilistic setting we use to accommodate the results that Jacod and Shiryaev (2002) (shorthand notation JS) develop for general Polish spaces. Let $(\Omega', \mathcal{G}', P')$ be a probability space with increasing filtrations $\mathcal{G}_{k_n,t} \subset \mathcal{G}'$ and $\mathcal{G}_{k_n,q} \subset \mathcal{G}_{k_n,q+1}$ for any $q = 1, \dots, k_n$ and an increasing sequence $k_n \to \infty$ as $n \to \infty$.$^8$ Let $D_{\mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}}[0, 1]$ be the space of functions $[0, 1] \to \mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}$ that are right continuous and have left limits (see Billingsley (1968, p.109)) with discontinuities synchronized across all elements in the vectors. Let $\mathcal{C}$ be a sub-sigma field of $\mathcal{G}'$. Let $(\zeta, Z^n(\omega, t)) : \Omega' \times [0, 1] \to \mathbb{R} \times \mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}$ be random variables or random elements in $\mathbb{R}$ and $D_{\mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}}[0, 1]$, respectively, defined on the common probability space $(\Omega', \mathcal{G}', P')$ and assume that $\zeta$ is bounded and measurable with respect to $\mathcal{C}$.
$^8$ In our case $k_n = \max(T, \tau)n$ where both $n \to \infty$ and $\tau \to \infty$ such that clearly $k_n \to \infty$.
As in JS, p.512, let $Z(\omega', x) = x$ be the canonical element on $D_{\mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}}[0, 1]$ and let $Q(\omega', dx)$ be a version of the distribution of $Z$ conditional on $\mathcal{C}$. Similarly, let $Q^n(\omega', dx)$ be a version of the conditional (on $\mathcal{C}$) distribution of $Z^n$. Following JS (Definition VI1.1 and Theorem VI1.14) we define the $\sigma$-field generated by all coordinate projections on $D_{\mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}}[0, 1]$ as $\mathcal{D}_{\mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}}$. Then define the joint probability space $(\Omega, \mathcal{G}, P)$ with $\Omega = \Omega' \times D_{\mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}}[0, 1]$, $\mathcal{G} = \mathcal{G}' \otimes \mathcal{D}_{\mathbb{R}^{k_\theta} \times \mathbb{R}^{k_\rho}}$ and $P(d\omega', dx) = P'(d\omega')\, Q(\omega', dx)$. Following JS (p.512, Definition 5.28) we say that $Z^n$ converges $\mathcal{C}$-stably to $Z$ if for all bounded, $\mathcal{C}$-measurable $\zeta$,
$$E\left[\zeta f(Z^n)\right] \to E\left[\zeta Q[f(Z)]\right] \tag{14}$$
where $Q[f(Z)]$ is the expectation of $f(Z)$ conditional on $\mathcal{C}$. More specifically, if $W(r)$ is standard Brownian motion, we say that $Z^n \Rightarrow W(r)$ $\mathcal{C}$-stably where the notation means that (14) holds when $Q$ is Wiener measure (for a definition and existence proof see Billingsley (1968, Chapter 2, Section 9)). By JS Proposition VIII, 5.33 it follows that $Z^n$ converges $\mathcal{C}$-stably iff $Z^n$ is tight and for all $A \in \mathcal{C}$, $E\left[1_A f(Z^n)\right]$ converges.
The concept of stable convergence was introduced by Renyi (1963) and has found wide application in probability and statistics. Most relevant to the discussion here are the stable central limit theorem of Hall and Heyde (1980, Theorem 3.2) and Kuersteiner and Prucha (2013) who extend the result in Hall and Heyde (1980, Theorem 3.2) to panel data with fixed $T$. Dedecker and Merlevede (2002) established a related stable functional CLT for strictly stationary martingale differences.
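Theorem 1 Assume that Conditions 1, 2 and 3 hold. Then it follows that for $\tilde{\psi}_{it}$ defined in (13), and as $\tau, n \to \infty$, and $T$ fixed,
$$X_{n\tau}(r) \Rightarrow \begin{bmatrix} B_y(1) \\ B_\nu(r) \end{bmatrix} \quad (\mathcal{C}\text{-stably})$$
where $B_y(r) = \Omega_y^{1/2} W_y(r)$, $B_\nu(r) = \int_0^r \dot{\Omega}_\nu(s)^{1/2}\, dW_\nu(s)$ and $\Omega(r) = \mathrm{diag}(\Omega_y, \Omega_\nu(r))$ is $\mathcal{C}$-measurable, $\dot{\Omega}_\nu(s) = \partial \Omega_\nu(s) / \partial s$ and $(W_y(r), W_\nu(r))$ is a vector of standard $k_\phi$-dimensional, mutually independent, Brownian processes independent of $\Omega$.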
Proof. In Appendix B.
Remark 3 Note that $W_y(r) = W_y(1)$ for each $r \in [0, 1]$ by construction. Thus, $W_y(1)$ is simply a vector of standard Gaussian random variables, independent both of $W_\nu(r)$ and any random variable measurable w.r.t. $\mathcal{C}$.
The limiting random variables $B_y(r)$ and $B_\nu(r)$ both depend on $\mathcal{C}$ and are thus mutually dependent. However, conditional on $\mathcal{C}$, the limiting random variables are independent because of the mutual independence of $W_y(r)$ and $W_\nu(r)$. The representation $B_y(1) = \Omega_y^{1/2} W_y(1)$, where a stable limit is represented as the product of an independent Gaussian random variable and a scale factor that depends on $\mathcal{C}$, is common in the literature on stable convergence. Results similar to the one for $B_\nu(r)$ were obtained by Phillips (1987, 1988) for cases where $\dot{\Omega}_\nu(s)$ is non-stochastic and has an explicit functional form, notably for near unit root processes and when convergence is marginal rather than stable. Rootzen (1983) establishes stable convergence but gives a representation of the limiting process in terms of standard Brownian motion obtained by a stopping time transformation. The representation of $B_\nu(r)$ in terms of a stochastic integral over the random scale process $\dot{\Omega}_\nu(s)$ is obtained by utilizing a technique mentioned in Rootzen (1983, p. 10) but not utilized there, namely establishing finite dimensional convergence using a stable martingale CLT. This technique combined with a tightness argument establishes the characteristic function of the limiting process. The representation for $B_\nu(r)$ is then obtained by utilizing isometry properties of the stochastic integral. Rootzen (1983, p.13) similarly utilizes characteristic functions to identify the limiting distribution in the case of standard Brownian motion. Similar representations have been obtained in the high frequency time series literature, see Jacod et al. (2010), Jacod and Protter (2012). Finally, the results of Dedecker and Merlevede (2002) differ from ours in that they only consider asymptotically homoskedastic and strictly stationary processes. In our case, heteroskedasticity is explicitly allowed because of $\dot{\Omega}_\nu(s)$.
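The product representation $B_y(1) = \Omega_y^{1/2} W_y(1)$ can be illustrated with a small simulation. The sketch below is a toy example rather than a model from this paper: the scalar variance `omega` takes the values 1 and 9 with equal probability (a stand-in for a $\mathcal{C}$-measurable $\Omega_y$) and is drawn independently of the Gaussian variable. The function names and the two-point distribution are assumptions made for the illustration.

```python
import random
import statistics


def simulate_mixed_normal(draws=50000, seed=0):
    """Draw B = omega^{1/2} * W with a random scale, and its studentized version."""
    rng = random.Random(seed)
    raw, studentized = [], []
    for _ in range(draws):
        omega = rng.choice([1.0, 9.0])  # random variance (toy stand-in for Omega_y)
        w = rng.gauss(0.0, 1.0)         # standard Gaussian, independent of omega
        b = omega ** 0.5 * w            # mixed normal draw
        raw.append(b)
        studentized.append(b / omega ** 0.5)  # rescaling by the random scale
    return raw, studentized


def kurtosis(xs):
    """Fourth standardized moment; equals 3 for a normal distribution."""
    m = statistics.fmean(xs)
    v = statistics.fmean([(x - m) ** 2 for x in xs])
    return statistics.fmean([(x - m) ** 4 for x in xs]) / v ** 2


raw, studentized = simulate_mixed_normal()
print("kurtosis of mixed normal draws:", kurtosis(raw))
print("kurtosis after studentizing:   ", kurtosis(studentized))
```

In this toy design the unconditional distribution of the draws has fat tails (population kurtosis $3 \cdot 41/25 = 4.92$), while rescaling by the random scale recovers a standard normal. The rescaling is exact here by construction; with estimated variances the same logic applies only asymptotically.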
An important special case of Theorem 1 is the near unit root model discussed in more detail in Section 5. More importantly, our results innovate over the literature by establishing joint convergence between cross-sectional and time series averages that are generally not independent and whose limiting distributions are not independent. This result is obtained by a novel construction that embeds both data sets in a random field. A careful construction of information filtrations $\mathcal{G}_{\tau n,n+i}$ allows us to map the field into a martingale array. Similar techniques were used in Kuersteiner and Prucha (2013) for panels with fixed $T$. In this paper we extend their approach to handle an additional and distinct time series data set and to allow for both $n$ and $\tau$ to tend to infinity jointly. In addition to the more complicated data structure, we extend Kuersteiner and Prucha (2013) by considering functional central limit theorems. The following corollary is useful for possibly non-linear but trend stationary models.
Corollary 1 Assume that Conditions 1, 2 and 3 hold. Then it follows that for $\tilde{\psi}_{it}$ defined in (13), and as $\tau, n \to \infty$ and $T$ fixed,
$$X_{n\tau}(1) \xrightarrow{d} B := \Omega^{1/2} W \quad (\mathcal{C}\text{-stably})$$
where $\Omega = \mathrm{diag}(\Omega_y, \Omega_\nu(1))$ is $\mathcal{C}$-measurable and $W = (W_y(1), W_\nu(1))$ is a vector of standard $d$-dimensional Gaussian random variables independent of $\Omega$. The variables $\Omega_y$, $\Omega_\nu(\cdot)$, $W_y(\cdot)$ and $W_\nu(\cdot)$ are as defined in Theorem 1.
Proof. In Appendix B.
The result of Corollary 1 is equivalent to the statement that $X_{n\tau}(1) \xrightarrow{d} N(0, \Omega)$ conditional on positive probability events in $\mathcal{C}$. As noted earlier, no simplification of the technical arguments is possible by conditioning on $\mathcal{C}$ except in the trivial case where $\Omega$ is a fixed constant. Eagleson (1975, Corollary 3), see also Hall and Heyde (1980, p. 59), establishes a simpler result where $X_{n\tau}(1) \xrightarrow{d} B$ weakly but not $\mathcal{C}$-stably. Such results could in principle be obtained here as well, but they would not be useful for the analysis in Sections 4.2 and 5 because the limiting distributions of our estimators not only depend on $B$ but also on other $\mathcal{C}$-measurable scaling matrices. Since the continuous mapping theorem requires joint convergence, a weak limit for $B$ alone is not sufficient to establish the results we obtain below.
Theorem 1 establishes what Phillips and Moon (1999) call diagonal convergence, a special form of joint convergence.$^9$ To see that sequential convergence where first $n$ or $\tau$ go to infinity, followed by the other index, is generally not useful in our set up, consider the following example.
Assume that $d = k_\phi$ is the dimension of the vector $\tilde{\psi}_{it}$. This would hold for just identified moment estimators and likelihood based procedures. Consider the double indexed process
$$X_{n\tau}(1) = \sum_{t=\min(1,\tau_0+1)}^{\max(T,\tau_0+\tau)} \sum_{i=1}^{n} \tilde{\psi}_{it}(1). \tag{15}$$
For each $\tau$ fixed, convergence in distribution of $X_{n\tau}$ as $n \to \infty$ follows from the central limit theorem in Kuersteiner and Prucha (2013). Let $X_\tau$ denote the "large $n$, fixed $\tau$" limit. For each $n$ fixed, convergence in distribution of $X_{n\tau}$ as $\tau \to \infty$ follows from a standard martingale central limit theorem for Markov processes. Let $X_n$ be the "large $\tau$, fixed $n$" limit. It is worth pointing out that the distributions of both $X_n$ and $X_\tau$ are unknown because the limits are trivial in one direction. For example, when $\tau$ is fixed and $n$ tends to infinity, the component $\tau^{-1/2} \sum_{t=\tau_0+1}^{\tau_0+\tau} \psi^\nu_{\tau,t}$ trivially converges in distribution (it does not change with $n$) but the distribution of $\tau^{-1/2} \sum_{t=\tau_0+1}^{\tau_0+\tau} \psi^\nu_{\tau,t}$ is generally unknown. More importantly, application of a conventional CLT for the cross-section alone will fail to account for the dependence between the time series and cross-sectional components. Sequential convergence arguments thus are not recommended even as heuristic justifications of limiting distributions in our setting.
$^9$ The discussion assumes that $0 < \kappa < \infty$. The cases where $\kappa = 0$ or $\kappa = \infty$ allow for a simpler treatment where either the time series or cross-section sample can be ignored. In those situations considerations of joint convergence play only a minor role.
4.2 Trend Stationary Models
Let $\theta = (\beta, \nu_1, \dots, \nu_T)$ and define the shorthand notation $f_{it}(\theta, \rho) = f(y_{it} \mid \theta, \rho)$, $g_t(\beta, \rho) = g(\nu_t \mid \nu_{t-1}, \beta, \rho)$, $f_{\theta,it}(\theta, \rho) = \partial f_{it}(\theta, \rho) / \partial \theta$ and $g_{\rho,t}(\beta, \rho) = \partial g_t(\beta, \rho) / \partial \rho$. Also let $f_{it} = f_{it}(\theta_0, \rho_0)$, $f_{\theta,it} = f_{\theta,it}(\theta_0, \rho_0)$, $g_t = g_t(\beta_0, \rho_0)$ and $g_{\rho,t} = g_{\rho,t}(\beta_0, \rho_0)$. Depending on whether the estimator under consideration is maximum likelihood or moment based we assume that either $(f_{\theta,it}, g_{\rho,t})$ or $(f_{it}, g_t)$ satisfy the same assumptions as $(\psi^y_{it}, \psi^\nu_{\tau,t})$ in Condition 1. We recall that $\nu_t(\beta, \rho)$ is a function of $(z_t, \beta, \rho)$, where $z_t$ are observable macro variables. For the CLT, the process $\nu_t = \nu_t(\rho_0, \beta_0)$ is evaluated at the true parameter values and treated as observed. In applications, $\nu_t$ will be replaced by an estimate which potentially affects the limiting distribution of $\rho$. This dependence is analyzed in a step separate from the CLT.
The next step is to use Corollary 1 to derive the joint limiting distribution of estimators for $\phi = (\theta', \rho')'$. Define $s^\nu_{ML}(\beta, \rho) = \tau^{-1/2} \sum_{t=\tau_0+1}^{\tau_0+\tau} \partial g(\nu_t(\beta, \rho) \mid \nu_{t-1}(\beta, \rho), \beta, \rho) / \partial \rho$ and $s^y_{ML}(\theta, \rho) = n^{-1/2} \sum_{t=1}^{T} \sum_{i=1}^{n} \partial f(y_{it} \mid \theta, \rho) / \partial \theta$ for maximum likelihood, and
$$s^\nu_M(\beta, \rho) = -\left(\partial k_\tau(\beta, \rho) / \partial \rho\right)' W^\tau_\tau\, \tau^{-1/2} \sum_{t=\tau_0+1}^{\tau_0+\tau} g(\nu_t(\beta, \rho) \mid \nu_{t-1}(\beta, \rho), \beta, \rho)$$
and $s^y_M(\theta, \rho) = -\left(\partial h_n(\theta, \rho) / \partial \theta\right)' W^C_n\, n^{-1/2} \sum_{t=1}^{T} \sum_{i=1}^{n} f(y_{it} \mid \theta, \rho)$ for moment based estimators. We use $s^\nu(\beta, \rho)$ and $s^y(\theta, \rho)$ generically for arguments that apply to both maximum likelihood and moment based estimators. The estimator $\hat{\phi}$ jointly satisfies the moment restrictions using time series data
$$s^\nu(\hat{\beta}, \hat{\rho}) = 0 \tag{16}$$
and cross-sectional data
$$s^y(\hat{\theta}, \hat{\rho}) = 0. \tag{17}$$
Defining $s(\phi) = \left(s^y(\phi)', s^\nu(\phi)'\right)'$ the estimator $\hat{\phi}$ satisfies $s(\hat{\phi}) = 0$. A first order Taylor series expansion around $\phi_0$ is used to obtain the limiting distribution for $\hat{\phi}$. We impose the following additional assumption.
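Condition 4 Let $\phi = (\theta', \rho')' \in \mathbb{R}^{k_\phi}$, $\theta \in \mathbb{R}^{k_\theta}$, and $\rho \in \mathbb{R}^{k_\rho}$. Define $D_{n\tau} = \mathrm{diag}\left(n^{-1/2} I_y, \tau^{-1/2} I_\nu\right)$, where $I_y$ is an identity matrix of dimension $k_\theta$ and $I_\nu$ is an identity matrix of dimension $k_\rho$. Assume that for some $\varepsilon > 0$,
$$\sup_{\phi : \|\phi - \phi_0\| \le \varepsilon} \left\| \frac{\partial s(\phi)}{\partial \phi'} D_{n\tau} - A(\phi) \right\| = o_p(1)$$
where $A(\phi)$ is $\mathcal{C}$-measurable and $A = A(\phi_0)$ is full rank almost surely. Let $\kappa = \lim n/\tau$,
$$A = \begin{bmatrix} A_{y,\theta} & \sqrt{\kappa} A_{y,\rho} \\ \frac{1}{\sqrt{\kappa}} A_{\nu,\theta} & A_{\nu,\rho} \end{bmatrix}$$
with $A_{y,\theta} = \operatorname{plim} n^{-1/2} \partial s^y(\phi_0) / \partial \theta'$, $A_{y,\rho} = \operatorname{plim} n^{-1/2} \partial s^y(\phi_0) / \partial \rho'$, $A_{\nu,\theta} = \operatorname{plim} \tau^{-1/2} \partial s^\nu(\phi_0) / \partial \theta'$ and $A_{\nu,\rho} = \operatorname{plim} \tau^{-1/2} \partial s^\nu(\phi_0) / \partial \rho'$.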
Condition 5 For maximum likelihood criteria the following holds:
i) for any $r \in [0, 1]$, $\frac{1}{\tau} \sum_{t=\tau_0+1}^{\tau_0+[\tau r]} g_{\rho,t} g_{\rho,t}' \xrightarrow{p} \Omega_\nu(r)$ as $\tau \to \infty$ and where $\Omega_\nu(r)$ satisfies the same regularity conditions as in Condition 2(ii).
ii) $\frac{1}{n} \sum_{i=1}^{n} f_{\theta,it} f_{\theta,it}' \xrightarrow{p} \Omega_{ty}$ for all $t \in [1, \dots, T]$ and where $\Omega_{ty}$ is positive definite a.s. and measurable with respect to $\sigma(\nu_1, \dots, \nu_T)$. Let $\Omega_y = \sum_{t=1}^{T} \Omega_{ty}$.
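Condition 6 Let $W^C = \operatorname{plim}_n W^C_n$ and $W^\tau = \operatorname{plim}_\tau W^\tau_\tau$ and assume the limits to be positive definite and $\mathcal{C}$-measurable. Define $h(\theta, \rho) = \operatorname{plim}_n h_n(\beta, \nu_t, \rho)$ and $k(\beta, \rho) = \operatorname{plim}_\tau k_\tau(\beta, \rho)$. For moment based criteria the following holds:
i) $\frac{1}{\tau} \sum_{t=\tau_0+1}^{\tau_0+[\tau r]} g_t g_t' \xrightarrow{p} \Omega_g(r)$ as $\tau \to \infty$ and where $\Omega_g(r)$ satisfies the same regularity conditions as in Condition 2(ii).
ii) $\frac{1}{n} \sum_{i=1}^{n} f_{it} f_{it}' \xrightarrow{p} \Omega_{t,f}$ for all $t \in [1, \dots, T]$. Let $\Omega_f = \sum_{t=1}^{T} \Omega_{t,f}$. Assume that $\Omega_f$ is positive definite a.s. and measurable with respect to $\sigma(\nu_1, \dots, \nu_T)$.
Assume that for some $\varepsilon > 0$,
iii) $\sup_{\phi : \|\phi - \phi_0\| \le \varepsilon} \left\| \left(\partial k_\tau(\beta, \rho) / \partial \rho\right)' W^\tau_\tau - \left(\partial k(\beta, \rho) / \partial \rho\right)' W^\tau \right\| = o_p(1)$,
iv) $\sup_{\phi : \|\phi - \phi_0\| \le \varepsilon} \left\| \left(\partial h_n(\theta, \rho) / \partial \theta\right)' W^C_n - \left(\partial h(\theta, \rho) / \partial \theta\right)' W^C \right\| = o_p(1)$.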
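The following result establishes the joint limiting distribution of $\hat{\phi}$.
Theorem 2 Assume that Conditions 1, 4, and either 5 with $(\psi^y_{it}, \psi^\nu_{\tau,t}) = (f_{\theta,it}, g_{\rho,t})$ in the case of likelihood based estimators or 6 with $(\psi^y_{it}, \psi^\nu_{\tau,t}) = (f_{it}, g_t)$ in the case of moment based estimators hold. Assume that $\hat{\phi} - \phi_0 = o_p(1)$ and that (16) and (17) hold. Then,
$$D_{n\tau}^{-1}\left(\hat{\phi} - \phi_0\right) \xrightarrow{d} -A^{-1} \Omega^{1/2} W \quad (\mathcal{C}\text{-stably})$$
where $A$ is full rank almost surely, $\mathcal{C}$-measurable and is defined in Condition 4. The distribution of $\Omega^{1/2} W$ is given in Corollary 1. In particular, $\Omega = \mathrm{diag}(\Omega_y, \Omega_\nu(1))$. When the criterion is maximum likelihood, $\Omega_y$ and $\Omega_\nu(1)$ are given in Condition 5. When the criterion is moment based, $\Omega_y = \frac{\partial h(\theta_0, \rho_0)'}{\partial \theta} W^C \Omega_f W^{C\prime} \frac{\partial h(\theta_0, \rho_0)}{\partial \theta}$ and $\Omega_\nu(1) = \frac{\partial k(\beta_0, \rho_0)'}{\partial \rho} W^\tau \Omega_g(1) W^{\tau\prime} \frac{\partial k(\beta_0, \rho_0)}{\partial \rho}$ with $\Omega_f$ and $\Omega_g(1)$ defined in Condition 6.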
Proof. In Appendix B.
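Corollary 2 Under the same conditions as in Theorem 2 it follows that
$$\sqrt{n}\left(\hat{\theta} - \theta_0\right) \xrightarrow{d} -A^{y,\theta} \Omega_y^{1/2} W_y(1) - \sqrt{\kappa} A^{y,\rho} \Omega_\nu^{1/2}(1) W_\nu(1) \quad (\mathcal{C}\text{-stably}) \tag{18}$$
where
$$\begin{aligned}
A^{y,\theta} &= A_{y,\theta}^{-1} + A_{y,\theta}^{-1} A_{y,\rho} \left(A_{\nu,\rho} - A_{\nu,\theta} A_{y,\theta}^{-1} A_{y,\rho}\right)^{-1} A_{\nu,\theta} A_{y,\theta}^{-1} \\
A^{y,\rho} &= -A_{y,\theta}^{-1} A_{y,\rho} \left(A_{\nu,\rho} - A_{\nu,\theta} A_{y,\theta}^{-1} A_{y,\rho}\right)^{-1}.
\end{aligned}$$
For
$$\Omega_\theta = A^{y,\theta} \Omega_y A^{y,\theta\prime} + \kappa A^{y,\rho} \Omega_\nu(1) A^{y,\rho\prime}$$
it follows that
$$\sqrt{n}\, \Omega_\theta^{-1/2} \left(\hat{\theta} - \theta_0\right) \xrightarrow{d} N(0, I) \quad (\mathcal{C}\text{-stably}). \tag{19}$$
Note that $\Omega_\theta$, the asymptotic variance of $\sqrt{n}(\hat{\theta} - \theta_0)$ conditional on $\mathcal{C}$, in general is a random variable, and the asymptotic distribution of $\hat{\theta}$ is mixed normal. However, as in Andrews (2005), the result in (19) can be used to construct an asymptotically pivotal test statistic. For a consistent estimator $\hat{\Omega}_\theta$ the statistic $\sqrt{n}\, \hat{\Omega}_\theta^{-1/2} \left(R\hat{\theta} - r\right)$ is asymptotically distribution free under the null hypothesis $R\theta - r = 0$ where $R$ is a conforming matrix of dimension $q \times k_\theta$ and $r$ a $q \times 1$ vector.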
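The expressions for $A^{y,\theta}$ and $A^{y,\rho}$ are the blocks of the first block row of $A^{-1}$, with the factor $\sqrt{\kappa}$ pulled out of the off-diagonal block. A quick numerical check in the scalar case ($k_\theta = k_\rho = 1$) is sketched below; the numbers are arbitrary and the function names are hypothetical.

```python
import math


def theta_blocks(a_yt, a_yr, a_nt, a_nr):
    """Scalar versions of A^{y,theta} and A^{y,rho} from Corollary 2."""
    schur = a_nr - a_nt * a_yr / a_yt  # A_{nu,rho} - A_{nu,theta} A_{y,theta}^{-1} A_{y,rho}
    a_sup_yt = 1.0 / a_yt + (a_yr / a_yt) * (1.0 / schur) * (a_nt / a_yt)
    a_sup_yr = -(a_yr / a_yt) / schur
    return a_sup_yt, a_sup_yr


def first_row_of_inverse(a_yt, a_yr, a_nt, a_nr, kappa):
    """First row of the inverse of the 2x2 matrix A in Condition 4."""
    a11, a12 = a_yt, math.sqrt(kappa) * a_yr
    a21, a22 = a_nt / math.sqrt(kappa), a_nr
    det = a11 * a22 - a12 * a21
    return a22 / det, -a12 / det


def omega_theta(a_yt, a_yr, a_nt, a_nr, kappa, omega_y, omega_nu1):
    """Scalar version of Omega_theta defined above equation (19)."""
    a_sup_yt, a_sup_yr = theta_blocks(a_yt, a_yr, a_nt, a_nr)
    return a_sup_yt ** 2 * omega_y + kappa * a_sup_yr ** 2 * omega_nu1


args = (2.0, 0.5, 0.3, 1.5)  # arbitrary values of A_{y,theta}, A_{y,rho}, A_{nu,theta}, A_{nu,rho}
kappa = 4.0
print(theta_blocks(*args))
print(first_row_of_inverse(*args, kappa))  # second entry equals sqrt(kappa) * A^{y,rho}
print(omega_theta(*args, kappa, omega_y=1.0, omega_nu1=1.0))
```

When $A_{y,\rho} = 0$ the second term of $\Omega_\theta$ drops out and the variance reduces to $A_{y,\theta}^{-1} \Omega_y A_{y,\theta}^{-1\prime}$, so the time series sample has no first-order effect on $\hat{\theta}$.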
5 Unit Root Time Series Models
5.1 Unit Root Problems
When the simple trend stationary paradigm does not apply, the limiting distribution of our estimators may be more complicated. A general treatment is beyond the scope of this paper and likely requires a case-by-case analysis. In this subsection we consider a simple unit root model where initial conditions can be neglected. We use it to exemplify additional inferential difficulties that arise even in this relatively simple setting. In Section 5.2 we consider a slightly more complex version of the unit root model where initial conditions cannot be ignored. We show that more complicated dependencies between the asymptotic distributions of the cross-section and time series samples manifest. The result is a cautionary tale of the difficulties that may present themselves when nonstationary time series data are combined with cross-sections. We leave the development of inferential methods for this case to future work.
We again consider the model in the previous section, except with the twist that (i) $\rho$ is the AR(1) coefficient in the time series regression of $z_t$ on $z_{t-1}$ with independent error; and (ii) $\rho$ is at unity. In the same way that led to (8), we obtain
\[
\sqrt{n}\,(\hat\theta-\theta) \approx -A^{-1}\sqrt{n}\,\frac{\partial F_n(\theta,\rho)}{\partial\theta} - A^{-1}B\,\frac{\sqrt{n}}{\tau}\,\tau\,(\tilde\rho-\rho).
\]
For simplicity, again assume that the two terms on the right are asymptotically independent. The first term converges in distribution to a normal distribution $N(0, A^{-1}\Omega_y A^{-1})$, but with $\rho=1$ and i.i.d. AR(1) errors the second term converges to
\[
\xi A^{-1}B\,\frac{\frac{1}{2}\left(W(1)^2-1\right)}{\int_0^1 W(r)^2\,dr},
\]
where $\xi\equiv\lim\sqrt{n}/\tau$ and $W(\cdot)$ is the standard Wiener process. In contrast to the result in (9) when $\rho$ is away from unity, $\sqrt{n}/\tau$ rather than $n/\tau$ is assumed to converge to a constant. Because $\tilde\rho$ is superconsistent under the unit root scenario, from a theoretical point of view, the correction term is relevant only in cases where $n$ is much larger than $\tau$ such that $\xi>0$ in the limit. The result is formalized in Section 5.2.
The fact that the limiting distribution of $\hat\theta$ is no longer Gaussian complicates inference. This discontinuity is mathematically similar to Campbell and Yogo's (2006) observation, which leads to a question of how uniform inference could be conducted. In principle, the problem here can be analyzed by modifying the proposal in Phillips (2014, Section 4.3). First, construct the $1-\alpha_1$ confidence interval for $\rho$ using Mikusheva (2007). Call it $[\rho_L,\rho_U]$. Second, compute $\hat\theta(\rho)\equiv\arg\max_\theta F_n(\theta,\rho)$ for $\rho\in[\rho_L,\rho_U]$. Assuming that $\rho$ is fixed, characterize the asymptotic variance $\Sigma(\rho)$, say, of $\sqrt{n}\,(\hat\theta(\rho)-\theta(\rho))$, which is asymptotically normal in general. Third, construct the $1-\alpha_2$ confidence region, say $CI(\alpha_2;\rho)$, using asymptotic normality and $\Sigma(\rho)$. Our confidence interval for $\theta$ is then given by $\bigcup_{\rho\in[\rho_L,\rho_U]} CI(\alpha_2;\rho)$. By Bonferroni, its asymptotic coverage rate is expected to be at least $1-\alpha_1-\alpha_2$.
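The three-step Bonferroni construction just described can be sketched for a scalar $\theta$ as follows. Everything here is illustrative: the first-stage interval $[\rho_L,\rho_U]$ is taken as given (the sketch does not implement Mikusheva's procedure), `theta_hat_fn` and `sigma_fn` are hypothetical callables standing in for $\hat\theta(\rho)$ and $\Sigma(\rho)$, and the union over $\rho$ is approximated on a finite grid and reported as the smallest interval containing it.

```python
from statistics import NormalDist

def bonferroni_union_ci(rho_lo, rho_hi, theta_hat_fn, sigma_fn, n, alpha2,
                        grid_size=201):
    """Hull of the union of (1 - alpha2) normal intervals over a rho grid.

    sigma_fn(rho) is the asymptotic variance of sqrt(n) * (theta_hat(rho) - theta(rho)),
    so the standard error at a given rho is sqrt(sigma_fn(rho) / n).
    grid_size must be at least 2.
    """
    z = NormalDist().inv_cdf(1.0 - alpha2 / 2.0)
    lower, upper = float("inf"), float("-inf")
    for k in range(grid_size):
        rho = rho_lo + (rho_hi - rho_lo) * k / (grid_size - 1)
        center = theta_hat_fn(rho)
        half_width = z * (sigma_fn(rho) / n) ** 0.5
        lower = min(lower, center - half_width)
        upper = max(upper, center + half_width)
    return lower, upper

# Toy example: a hypothetical first-stage interval [0.9, 1.0] for rho,
# theta_hat(rho) = 2 * rho and a constant asymptotic variance of 1.
ci = bonferroni_union_ci(0.9, 1.0, lambda rho: 2.0 * rho, lambda rho: 1.0,
                         n=100, alpha2=0.05)
print(ci)
```

Reporting the hull is conservative when the union is not connected; with a continuous $\hat\theta(\rho)$ and $\Sigma(\rho)$ and a fine grid the two should essentially coincide.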
5.2 Unit Root Limit Theory
In this section we consider the special case where $\nu_t$ follows an autoregressive process of the form $\nu_{t+1}=\rho\nu_t+\eta_t$. As in Hansen (1992), Phillips (1987, 1988, 2014) we allow for nearly integrated processes where $\rho=\exp(\gamma/\tau)$ is a scalar parameter localized to unity such that
\[
\nu_{\tau,t+1} = \exp(\gamma/\tau)\,\nu_{\tau,t} + \eta_{t+1} \tag{20}
\]
and the notation $\nu_{\tau,t}$ emphasizes that $\nu_{\tau,t}$ is a sequence of processes indexed by $\tau$. We assume that $\tau_0=0$ is fixed and
\[
\tau^{-1/2}\,\nu_{\tau,\min(1,\tau_0)} = V(0) \quad \text{a.s.},
\]
where $V(0)$ is a potentially nondegenerate random variable. In other words, the initial condition for (20) is $\nu_{\tau,\min(1,\tau_0)}=\tau^{1/2}V(0)$. We explicitly allow for the case where $V(0)=0$, to model a situation where the initial condition can be ignored. This assumption is similar to, although more parametric than, the specification considered in Kurtz and Protter (1991). We limit our analysis to the case of maximum likelihood criterion functions. Results for moment based estimators can be developed along the same lines as in Section 4.2 but for ease of exposition we omit the details.
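To make the role of the initial condition concrete, the recursion (20) can be simulated as in the sketch below. It assumes $\tau_0=0$, so that the path starts at $\nu_{\tau,0}=\tau^{1/2}V(0)$, and it draws Gaussian innovations with standard deviation `sigma`; both the function name and the Gaussian choice are assumptions made only for illustration.

```python
import numpy as np

def simulate_near_unit_root(tau, gamma, sigma, v0, rng):
    """Simulate nu_{tau,0}, ..., nu_{tau,tau} from (20) with rho = exp(gamma / tau)."""
    rho = np.exp(gamma / tau)
    eta = sigma * rng.standard_normal(tau)   # eta[t] plays the role of eta_{t+1}
    nu = np.empty(tau + 1)
    nu[0] = np.sqrt(tau) * v0                # initial condition tau^{1/2} V(0)
    for t in range(tau):
        nu[t + 1] = rho * nu[t] + eta[t]
    return nu, eta

rng = np.random.default_rng(0)
tau = 500
nu, eta = simulate_near_unit_root(tau, gamma=-2.0, sigma=1.0, v0=0.5, rng=rng)

# Rescaled path tau^{-1/2} nu_{tau,[tau r]} evaluated at r = 0, 1/tau, ..., 1.
V_path = nu / np.sqrt(tau)
print(V_path[0], V_path[-1])
```

With `sigma=0` the rescaled endpoint equals $e^{\gamma}V(0)$ exactly, which illustrates why the contribution of $V(0)$ does not vanish as $\tau$ grows under this scaling.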
For the unit root version of our model we assume that $\nu_t$ is observed in the data and that the only parameter to be estimated from the time series data is $\rho$. Further assuming a Gaussian quasi-likelihood function we note that the score function now is
\[
g_{\rho,t}(\beta,\rho) = \nu_{\tau,t-1}\left(\nu_{\tau,t}-\nu_{\tau,t-1}\rho\right). \tag{21}
\]
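Because the score in (21) is linear in $\rho$, the sample moment condition has a closed-form root. A minimal sketch, using a hypothetical simulated random walk as the observed series and illustrative function names, computes that root and confirms that the summed score vanishes there:

```python
import numpy as np

def ols_rho(nu):
    """Root of the summed score: sum(nu_{t-1} nu_t) / sum(nu_{t-1}^2)."""
    lagged, current = nu[:-1], nu[1:]
    return float(lagged @ current / (lagged @ lagged))

def score_sum(nu, rho):
    """Sum over t of nu_{t-1} * (nu_t - nu_{t-1} * rho), as in (21)."""
    lagged, current = nu[:-1], nu[1:]
    return float(np.sum(lagged * (current - lagged * rho)))

rng = np.random.default_rng(0)
nu_path = np.cumsum(rng.standard_normal(300))   # hypothetical random-walk data
rho_hat = ols_rho(nu_path)
print(rho_hat, score_sum(nu_path, rho_hat))     # second value is zero up to rounding
```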
The estimator $\hat\rho$ solving sample moment conditions based on (21) is the conventional OLS estimator given by
\[
\hat\rho = \frac{\sum_{t=\tau_0+1}^{\tau}\nu_{\tau,t-1}\,\nu_{\tau,t}}{\sum_{t=\tau_0+1}^{\tau}\nu_{\tau,t-1}^{2}}.
\]
We continue to use the definition for $f_{\theta,it}(\theta,\rho)$ in Section 4.2 but now consider the simplified case where $\theta_0=(\beta,V(0))$. We note that in this section, $V(0)$ rather than $\nu_{\tau,\min(1,\tau_0)}$ is the common shock used in the cross-sectional model. The implicit scaling of $\nu_{\tau,\min(1,\tau_0)}$ by $\tau^{-1/2}$ is necessary in the cross-sectional specification to maintain a well defined model even as $\tau\to\infty$.
Consider the joint process $(V_{\tau n}(r), s^{y}_{ML})$ where $V_{\tau n}(r)\equiv\tau^{-1/2}\nu_{\tau,[\tau r]}$, and
\[
s^{y}_{ML} \equiv s^{y}_{ML}(\theta_0,\rho_0) \equiv \sum_{t=1}^{T}\sum_{i=1}^{n}\frac{f_{\theta,it}}{\sqrt{n}}.
\]
Note that
\[
\int_0^r V_{\tau n}(u)\,dW_{\tau n}(u) = \tau^{-1}\sum_{t=\tau_0+1}^{\tau_0+[\tau r]}\nu_{\tau,t-1}\,\eta_t
\]
with $W_{\tau n}(r)\equiv\tau^{-1/2}\sum_{t=\tau_0+1}^{\tau_0+[\tau r]}\eta_t$. We define the limiting process for $V_{\tau n}(r)$ as
\[
V_{\gamma,V(0)}(r) = e^{\gamma r}V(0) + \int_0^r \sigma e^{\gamma(r-s)}\,dW_\nu(s) \tag{22}
\]
where $W_\nu$ is defined in Theorem 1. When $V(0)=0$, Theorem 1 directly implies that $e^{-\gamma[r\tau]/\tau}V_{\tau n}(r)\Rightarrow\int_0^r\sigma e^{-s\gamma}\,dW_\nu(s)$ $\mathcal{C}$-stably, noting that in this case $\Omega_\nu(s)=\sigma^2(1-\exp(-2s\gamma))/2\gamma$ and $\dot\Omega_\nu(s)^{1/2}=\sigma e^{-s\gamma}$. The familiar result (cf. Phillips 1987) that $V_{\tau n}(r)\Rightarrow\int_0^r\sigma e^{\gamma(r-s)}\,dW_\nu(s)$ then is a consequence of the continuous mapping theorem. The case in (22) where $V(0)$ is a $\mathcal{C}$-measurable random variable now follows from $\mathcal{C}$-stable convergence of $V_{\tau n}(r)$. In this section we establish joint $\mathcal{C}$-stable convergence of the triple $\left(V_{\tau n}(r),\, s^{y}_{ML},\, \int_0^r V_{\tau n}(u)\,dW_{\tau n}(u)\right)$.
Let $\phi=(\theta',\rho)'\in\mathbb{R}^{k_\phi}$, $\theta\in\mathbb{R}^{k_\theta}$, and $\rho\in\mathbb{R}$. The true parameters are denoted by $\theta_0$ and $\rho_{\tau 0}=\exp(\gamma_0/\tau)$ with $\gamma_0\in\mathbb{R}$ and both $\theta_0$ and $\gamma_0$ bounded. We impose the following modified assumptions to account for the specific features of the unit root model.
Condition 7 Define $\mathcal{C}=\sigma(V(0))$. Define the $\sigma$-fields $\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}$ in the same way as in (10) except that here $\tau=\kappa n$ such that dependence on $\tau$ is suppressed and that $\nu_t$ is replaced with $\eta_t$ as in
\[
\mathcal{G}_{n,(t-\min(1,\tau_0))n+i} = \sigma\left(\left\{y_{j,t-1},y_{j,t-2},\ldots,y_{j,\min(1,\tau_0)}\right\}_{j=1}^{n},\ \eta_t,\eta_{t-1},\ldots,\eta_{\min(1,\tau_0)},\ \left(y_{j,t}\right)_{j=1}^{i}\right)\vee\mathcal{C}.
\]
Assume that
i) $f_{\theta,it}$ is measurable with respect to $\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}$.
ii) $\eta_t$ is measurable with respect to $\mathcal{G}_{n,(t-\min(1,\tau_0))n+i}$ for all $i=1,\ldots,n$.
iii) for some $\delta>0$ and $C<\infty$, $\sup_{it}E\left[\|f_{\theta,it}\|^{2+\delta}\right]\le C$.
iv) for some $\delta>0$ and $C<\infty$, $\sup_{t}E\left[\|\eta_t\|^{2+\delta}\right]\le C$.
v) $E\left[f_{\theta,it}\mid\mathcal{G}_{n,(t-\min(1,\tau_0))n+i-1}\right]=0$.
vi) $E\left[\eta_t\mid\mathcal{G}_{n,(t-\min(1,\tau_0)-1)n+i}\right]=0$ for $t>T$ and all $i\in\{1,\ldots,n\}$.
vii) For any $1>r>s\ge 0$ fixed let $\Omega^{r,s}_{\tau,\eta}=\tau^{-1}\sum_{t=\min(1,\tau_0)+[\tau s]+1}^{\tau_0+[\tau r]}E\left[\eta_t^2\mid\mathcal{G}_{n,(t-\min(1,\tau_0))n}\right]$. Then, $\Omega^{r,s}_{\tau,\eta}\overset{p}{\to}(r-s)\sigma^2$.
viii) Assume that $\frac{1}{n}\sum_{i=1}^{n}f_{\theta,it}f_{\theta,it}'\overset{p}{\to}\Omega_{ty}$ where $\Omega_{ty}$ is positive definite a.s. and measurable with respect to $\mathcal{C}$. Let $\Omega_y=\sum_{t=1}^{T}\Omega_{ty}$.
Conditions 7(i)-(vi) are the same as Conditions 1(i)-(vi) adapted to the unit root model. Condition 7(vii) replaces Condition 2. It is slightly more primitive in the sense that if $\eta_t^2$ is homoskedastic, Condition 7(vii) holds automatically and convergence of $\tau^{-1}\sum_{t=\min(1,\tau_0)+[\tau s]+1}^{\tau_0+[\tau r]}\eta_t^2\to(r-s)\sigma^2$ follows from an argument given in the proofs rather than being assumed. On the other hand, Condition 7(vii) is somewhat more restrictive than Condition 2 in the sense that it limits heteroskedasticity to be of a form that does not affect the limiting distribution. In other words, we essentially assume $\tau^{-1}\sum_{t=\min(1,\tau_0)+[\tau s]+1}^{\tau_0+[\tau r]}\eta_t^2$ to be proportional to $r-s$ asymptotically. This assumption is stronger than needed but helps to compare the results with the existing unit root literature.
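The proportionality to $r-s$ in the homoskedastic case is easy to see numerically. The sketch below takes $\tau_0=0$, uses i.i.d. Gaussian innovations with variance $\sigma^2$ (an assumption for illustration, as is the function name), and evaluates the realized window sum $\tau^{-1}\sum_{t=[\tau s]+1}^{[\tau r]}\eta_t^2$ rather than the conditional expectation that appears in Condition 7(vii).

```python
import numpy as np

def windowed_second_moment(eta, r, s):
    """tau^{-1} * sum of eta_t^2 over t = [tau*s]+1, ..., [tau*r] (tau_0 = 0).

    eta[k] stores eta_{k+1}, so the window corresponds to eta[[tau*s] : [tau*r]].
    """
    tau = len(eta)
    lo, hi = int(np.floor(tau * s)), int(np.floor(tau * r))
    return float(np.sum(eta[lo:hi] ** 2) / tau)

sigma = 2.0
rng = np.random.default_rng(0)
eta = sigma * rng.standard_normal(200_000)

r, s = 0.75, 0.25
approx = windowed_second_moment(eta, r, s)
target = (r - s) * sigma ** 2
print(approx, target)   # the two values should be close for large tau
```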
For Condition 7(viii) we note that typically $\Omega_{ty}(\phi)=E\left[f_{\theta,it}f_{\theta,it}'\right]$ and $\Omega_{ty}=\Omega_{ty}(\phi_0)$ where $\phi_0=(\beta_0',V_0(0),\rho_{\tau 0})$. Thus, even if $\Omega_{ty}(\cdot)$ is non-stochastic, it follows that $\Omega_{ty}$ is random and measurable with respect to $\mathcal{C}$ because it depends on $V(0)$ which is a random variable measurable w.r.t. $\mathcal{C}$.
The following results are established by modifying arguments in Phillips (1987) and Chan and Wei (1987) to account for $\mathcal{C}$-stable convergence and by applying Theorem 1.
Theorem 3 Assume that Condition 7 holds. With $\tau_0=0$ and as $\tau,n\to\infty$ and $T$ fixed with $\tau=\kappa n$ for some $\kappa\in(0,\infty)$ it follows that
\[
\left(V_{\tau n}(r),\ s^{y}_{ML},\ \int_0^s V_{\tau n}(u)\,dW_{\tau n}(u)\right) \Rightarrow \left(V_{\gamma,V(0)}(r),\ \Omega_y^{1/2}W_y(1),\ \int_0^s \sigma V_{\gamma,V(0)}(u)\,dW_\nu(u)\right) \quad (\mathcal{C}\text{-stably})
\]
in the Skorohod topology on $D_{\mathbb{R}^d}[0,1]$.
Proof. In Appendix B.
We now employ Theorem 3 to analyze the limiting behavior of $\hat\theta$ when the common factors are generated from a linear unit root process. To derive a limiting distribution for $\hat\phi$ we impose the following additional assumptions.
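Condition 8 Let $\hat\theta=\arg\max_\theta\sum_{t=1}^{T}\sum_{i=1}^{n}f(y_{it}\mid\theta,\hat\rho)$. Assume that $\hat\theta-\theta_0=O_p\left(n^{-1/2}\right)$.
Condition 9 Let $\kappa=\lim n/\tau^2$. Let
\[
A_{y,\theta}(\phi)=\sum_{t=1}^{T}E\left[\partial f_{\theta,it}/\partial\theta'\right],\qquad
A_{y,\rho}(\phi)=\sum_{t=1}^{T}E\left[\partial f_{\theta,it}/\partial\rho\right]
\]
and define $A^{y}(\phi)=\left[\,A_{y,\theta}(\phi)\ \ \sqrt{\kappa}\,A_{y,\rho}(\phi)\,\right]$ where $A^{y}(\phi)$ is a $k_\theta\times k_\phi$ dimensional matrix of non-random functions $\phi\to\mathbb{R}$. Assume that $A_{y,\theta}(\phi_0)$ is full rank almost surely. Assume that for some $\varepsilon>0$,
\[
\sup_{\phi:\|\phi-\phi_0\|\le\varepsilon}\left\|\frac{\partial\tilde s^{y}(\phi)}{\partial\phi'}D_{n\tau}-A^{y}(\phi)\right\|=o_p(1).
\]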
We make the possibly simplifying assumption that $A^{y}(\phi_0)$ only depends on the factors through the parameter $\theta$.
Theorem 4 Assume that Conditions 7, 8 and 9 hold. It follows that
\[
\sqrt{n}\,(\hat\theta-\theta_0) \overset{d}{\to} -A_{y,\theta}^{-1}\Omega_y^{1/2}W_y(1) - \sqrt{\kappa}\,A_{y,\theta}^{-1}A_{y,\rho}\left(\int_0^1 V_{\gamma,V(0)}(r)^2\,dr\right)^{-1}\int_0^1 \sigma V_{\gamma,V(0)}(r)\,dW_\nu(r) \quad (\mathcal{C}\text{-stably}).
\]
Proof. In Appendix B.
Note that the term $\left(\int_0^1 V_{\gamma,V(0)}(r)^2\,dr\right)^{-1}$ corresponds to $A_{\nu,\rho}^{-1}$ in the stationary case when the time series model does not depend on cross-sectional parameters. The result in Theorem 4 is an example that shows how common factors affecting both time series and cross-section data can lead to non-standard limiting distributions. In this case, the initial condition of the unit root process in the time series dimension causes dependence between the components of the asymptotic distribution of $\hat\theta$ because both $\Omega_y$ and $V_{\gamma,V(0)}$ in general depend on $V(0)$. Thus, the situation encountered here is generally more difficult than the one considered in Campbell and Yogo (2006) and Phillips (2014). In addition, because the limiting distribution of $\hat\theta$ is not mixed asymptotically normal, simple pivotal test statistics as in Andrews (2005) are not readily available, contrary to the stationary case.
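As a sanity check on Theorem 4, it may help to work out the simplest special case. The sketch below sets $\gamma=0$ and $V(0)=0$, so that (22) reduces to $V_{0,0}(r)=\sigma W_\nu(r)$, and applies Itô's formula $\int_0^1 W\,dW=\frac{1}{2}(W(1)^2-1)$; the choice of special case is ours and is meant only as an illustration.

```latex
\[
\left(\int_0^1 V_{0,0}(r)^2\,dr\right)^{-1}\int_0^1 \sigma V_{0,0}(r)\,dW_\nu(r)
 = \frac{\sigma^2\int_0^1 W_\nu(r)\,dW_\nu(r)}{\sigma^2\int_0^1 W_\nu(r)^2\,dr}
 = \frac{\tfrac{1}{2}\left(W_\nu(1)^2-1\right)}{\int_0^1 W_\nu(r)^2\,dr}.
\]
```

In this case $\sigma$ cancels and the time series component is the familiar unit root functional of Section 5.1, with $\sqrt{\kappa}$ playing the role of $\xi=\lim\sqrt{n}/\tau$. For $V(0)\neq 0$ no such cancellation occurs, since $V(0)$ enters both the numerator and the denominator.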
6 Summary
We develop a new limit theory for combined cross-sectional and time-series data sets. We focus on situations where the two data sets are interdependent because of common factors that affect both. The concept of stable convergence is used to handle this dependence when proving a joint Central Limit Theorem. Our analysis is cast in a generic framework of cross-section and time- series based criterion functions that jointly, but not individually, identify the parameters. Within this framework, we show how our limit theory can be used to derive asymptotic approximations to the sampling distribution of estimators that are based on data from both samples. We explicitly consider the unit root case as an example where particularly difficult to handle limiting expressions arise. Our results are expected to be helpful for the econometric analysis of rational expectation models involving individual decision making as well as general equilibrium settings. We investigate these topics, and related implementation issues, in a companion paper.
References
[1] Andrews, D.W.K. (1988): “Laws of Large Numbers for Dependent Non-Identically Distributed Random Variables,” Econometric Theory, Vol. 4, pp. 458-467.
[2] Andrews, D.W.K. (2005): “Cross-Section Regression with Common Shocks,” Econometrica 73, pp. 1551-1585.
[3] Aldous, D.J. and G.K. Eagleson (1978): “On mixing and stability of limit theorems,” The Annals of Probability 6, 325–331.
[4] Atchade, Y.F. (2009): “A strong law of large numbers for martingale arrays,” manuscript.
[5] Barndorff-Nielsen, O.E., P.R. Hansen, A. Lunde and N. Shephard (2008): “Designing realized kernels to measure the ex post variation of equity prices in the presence of noise,” Econometrica 76, pp. 1481-1536.
[6] Billingsley, P. (1968): “Convergence of Probability Measures,” John Wiley and Sons, New York.
[7] Brown, B.M. (1971): “Martingale Central Limit Theorems,” Annals of Mathematical Statistics 42, pp. 59-66.
[8] Campbell, J.Y., and M. Yogo (2006): “Efficient Tests of Stock Return Predictability,” Journal of Financial Economics 81, pp. 27–60.
[9] Chan, N.H. and C.Z. Wei (1987): “Asymptotic Inference for Nearly Nonstationary AR(1) Processes,” Annals of Statistics 15, pp. 1050-1063.
[10] de Jong, R.M. (1996): “A strong law of large numbers for triangular mixingale arrays,” Statistics & Probability Letters, Vol 27, pp. 1-9.
[11] Dedecker, J. and F. Merlevede (2002): “Necessary and Sufficient Conditions for the Conditional Central Limit Theorem,” Annals of Probability 30, pp. 1044-1081.
[12] Durrett, R. (1996): “Stochastic Calculus,” CRC Press, Boca Raton, London, New York.
[13] Eagleson, G.K. (1975): “Martingale convergence to mixtures of infinitely divisible laws,” The Annals of Probability 3, 557–562.
[14] Feigin, P. D. (1985): “Stable convergence of Semimartingales,” Stochastic Processes and their Applications 19, pp. 125-134.
[15] Gordin, M.I. (1969): “The central limit theorem for stationary processes,” Dokl. Akad. Nauk SSSR 188, 739–741.
[16] Gordin, M.I. (1973): Abstracts of Communication, T.1: A-K, International Conference on Probability Theory, Vilnius.
[17] Hahn, J., and G. Kuersteiner (2002): “Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects When Both n and T are Large,” Econometrica 70, pp. 1639–57.
[18] Hahn, J., G. Kuersteiner and M. Mazzocco (2016): “Estimation with Aggregate Shocks,” arXiv:1507.04415.
[19] Hahn, J., G. Kuersteiner and M. Mazzocco (2019a): “Estimation with Aggregate Shocks,” forthcoming, The Review of Economic Studies.
[20] Hahn, J., G. Kuersteiner and M. Mazzocco (2019b): “Joint Time Series and Cross-Section Limit Theory under Mixingale Assumptions,” arXiv: 1903.04655.
[21] Hall, P., and C. Heyde (1980): “Martingale Limit Theory and its Applications,” Academic Press, New York.
[22] Hansen, B.E. (1992): “Convergence to Stochastic Integrals for Dependent Heterogeneous Processes,” Econometric Theory 8, pp. 489-500.
[23] Heckman, J.J., L. Lochner, and C. Taber (1998): “Explaining Rising Wage Inequality: Explorations with a Dynamic General Equilibrium Model of Labor Earnings with Heterogeneous Agents,” Review of Economic Dynamics 1, pp. 1-58.
[24] Heckman, J.J., and G. Sedlacek (1985): “Heterogeneity, Aggregation, and Market Wage Functions: An Empirical Model of Self-Selection in the Labor Market,” Journal of Political Economy, 93, pp. 1077-1125.
[25] Hill, J.B. (2010): “A New Moment Bound and Weak Laws for Mixingale Arrays without Memory or Heterogeneity Restrictions, with Applications to Tail Trimmed Arrays,” manuscript.
[26] Jacod, J., M. Podolskij and M. Vetter (2010): “Limit Theorems for Moving Averages of Discretized Processes Plus Noise,” The Annals of Statistics, 38, pp. 1478-1545.
[27] Jacod, J. and A.N. Shiryaev (2002): “Limit Theorems for stochastic processes. Second Edition,” Springer Verlag, Berlin.
[28] Jacod, J. and P. Protter (2012): “Discretization of Processes,” Springer Verlag, Berlin.
[29] Kallenberg, O. (1997): “Foundations of Modern Probability,” Springer Verlag, Berlin.
[30] Kanaya, S. (2017): “Convergence Rates of Sums of α-Mixing Triangular Arrays: With an Application to Nonparametric Drift Function Estimation of Continuous-Time Processes,” Econometric Theory, Vol. 33, pp. 1121-1153.
[31] Kuersteiner, G.M., and I.R. Prucha (2013): “Limit Theory for Panel Data Models with Cross Sectional Dependence and Sequential Exogeneity,” Journal of Econometrics 174, pp. 107-126.
[32] Kuersteiner, G.M. and I.R. Prucha (2015): “Dynamic Spatial Panel Models: Networks, Common Shocks, and Sequential Exogeneity,” CESifo Working Paper No. 5445.
[33] Kurtz, T.G. and P. Protter (1991): “Weak Limit Theorems for Stochastic Integrals and Stochastic Differential Equations,” Annals of Probability 19, pp. 1035-1070.
[34] Lee, D., and K.I. Wolpin (2006): “Intersectoral Labor Mobility and the Growth of the Service Sector,” Econometrica 74, pp. 1-46.
[35] Lee, D., and K.I. Wolpin (2010): “Accounting for Wage and Employment Changes in the U.S. from 1968-2000: A Dynamic Model of Labor Market Equilibrium,” Journal of Econometrics 156, pp. 68–85.
[36] Li, J. and D. Xiu (2016): “Generalized Method of Integrated Moments for High-Frequency Data,” Econometrica 84, 1613-1633.
[37] McLeish, D.L. (1975): “A maximal inequality and dependent strong laws,” Annals of Probability 3, 829–839.
[38] Mikusheva, A. (2007): “Uniform Inference in Autoregressive Models,” Econometrica 75, pp. 1411–1452.
[39] Murphy, K. M. and R. H. Topel (1985): “Estimation and Inference in Two-Step Econometric Models,” Journal of Business and Economic Statistics 3, pp. 370 – 379.
[40] Phillips, P.C.B. (1987): “Towards a unified asymptotic theory for autoregression,” Biometrika 74, pp. 535-547.
[41] Phillips, P.C.B. (1988): “Regression theory for near integrated time series,” Econometrica 56, pp. 1021-1044.
[42] Phillips, P.C.B. and S.N. Durlauf (1986): “Multiple Time Series Regression with Integrated Processes,” Review of Economic Studies 53, pp. 473-495.
[43] Phillips, P.C.B. and H.R. Moon (1999): “Linear Regression Limit Theory for Nonstationary Panel Data,” Econometrica 67, pp. 1057-1111.
[44] Phillips, P.C.B. (2014): “On Confidence Intervals for Autoregressive Roots and Predictive Regression,” Econometrica 82, pp. 1177–1195.
[45] Renyi, A. (1963): “On stable sequences of events,” Sankhya Ser. A, 25, 293-302.
[46] Ridder, G., and R. Moffitt (2007): “The Econometrics of Data Combination,” in Heckman, J.J. and E.E. Leamer, eds. Handbook of Econometrics, Vol 6, Part B. Elsevier B.V.
[47] Rootzen, H. (1983): “Central limit theory for martingales via random change of time,” Essays in honour of Carl Gustav Essen, Eds. Holst & Gut, Uppsala University, 154-190.
[48] Rosenzweig, M.R. and C. Udry (2019): “External Validity in a Stochastic World: Evidence from Low-Income Countries,” forthcoming Review of Economic Studies.
[49] Runkle, D.E. (1991): “Liquidity Constraints and the Permanent-Income Hypothesis: Evidence from Panel Data,” Journal of Monetary Economics 27, pp. 73–98.
[50] Shea, J. (1995): “Union Contracts and the Life-Cycle/Permanent-Income Hypothesis,” American Economic Review 85, pp. 186–200.
[51] Stroock, D.W. and S.R.S. Varadhan (1979): “Multidimensional diffusion processes,” Springer Verlag.
[52] Tobin, J. (1950): “A Statistical Demand Function for Food in the U.S.A.,” Journal of the Royal Statistical Society, Series A. Vol. 113, pp. 113-149.
[53] Wooldridge, J.M. and H. White (1988): “Some invariance principles and central limit theorems for dependent heterogeneous processes,” Econometric Theory 4, pp. 210-230.
Appendix
A Details for Examples
A.1 Calculations for Example 1
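First note that it follows that
\[
E\left[z_s u_{s+1}\mid\mathcal{G}_{\tau n,(s-\min(1,\tau_0))n+i}\right] = z_s\,E\left[u_{s+1}\mid\mathcal{C}\right],
\]
where the equality uses the fact that $z_s$ is measurable with respect to $\mathcal{G}_{\tau n,(s-\min(1,\tau_0))n+i}$ and that $u_{s+1}$ is independent of $\sigma\left(\{z_t,z_{t-1},\ldots,z_{\tau_0}\}\right)$. To evaluate $E\left[u_{s+1}\mid\mathcal{C}\right]$ consider the joint distribution of $u_{s+1}$ for $s<0$ and $\nu_1=z_1$, which is Gaussian,
\[
N\left(\begin{pmatrix}0\\0\end{pmatrix},\ \begin{pmatrix}1 & \rho^{|s|}\\ \rho^{|s|} & \dfrac{1}{1-\rho^2}\end{pmatrix}\right).
\]
This implies that $E\left[u_{s+1}\mid\mathcal{C}\right]=\rho^{|s|}(1-\rho^2)\,z_1$. Evaluating the $L_2$ norm of Condition 1(vii) leads to
\begin{align*}
\left\|E\left[\psi^{\nu}_{\tau,s}\mid\mathcal{G}_{\tau n,(s-\min(1,\tau_0)-1)n+i}\right]\right\|_2^2
&\le |\rho|^{|s|}\left(1-\rho^2\right)\left(E\left(z_s z_1-E[z_s z_1]\right)^2+\left(E[z_s z_1]\right)^2\right)\\
&= |\rho|^{|s|}\left(1-\rho^2\right)E\left(z_s z_1-E[z_s z_1]\right)^2+|\rho|^{|s|}\left(1-\rho^2\right)\left(\frac{\rho^{1-s}}{1-\rho^2}\right)^2\\
&= |\rho|^{|s|}\left(1-\rho^2\right)\frac{\rho^{2-2s}+1}{\left(1-\rho^2\right)^2}+|\rho|^{|s|}\frac{\rho^{2-2s}}{1-\rho^2}\\
&= |\rho|^{|s|}\frac{\rho^{2+2|s|}+1}{1-\rho^2}+|\rho|^{|s|}\frac{\rho^{2+2|s|}}{1-\rho^2}\\
&= O\left(|\rho|^{|s|}\right)=o\left(|s|^{-(1+\delta)}\right)
\end{align*}
such that
\[
\left\|E\left[\psi^{\nu}_{\tau,s}\mid\mathcal{G}_{\tau n,(s-\min(1,\tau_0)-1)n+i}\right]\right\|_2 = O\left(|\rho|^{|s|/2}\right)=o\left(|s|^{-(1+\delta)/2}\right).
\]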
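Two intermediate steps in the chain above may be worth spelling out. The sketch below reads $\mathcal{C}$ as generated by $z_1$ and assumes, as the displayed covariance matrix suggests, that $(u_{s+1},z_1,z_s)$ are jointly Gaussian with mean zero, $\mathrm{Var}(z_1)=\mathrm{Var}(z_s)=1/(1-\rho^2)$ and $E[z_s z_1]=\rho^{1-s}/(1-\rho^2)$ for $s<0$. The second step uses the standard product-moment formula for zero-mean Gaussian pairs and is not specific to this paper.

```latex
% Conditional mean of a bivariate Gaussian: (covariance / variance) times z_1
\[
E\left[u_{s+1}\mid z_1\right]
 = \frac{\mathrm{Cov}(u_{s+1},z_1)}{\mathrm{Var}(z_1)}\,z_1
 = \rho^{|s|}\left(1-\rho^2\right)z_1 .
\]
% Variance of a product of zero-mean jointly Gaussian variables
\[
E\left(z_s z_1-E[z_s z_1]\right)^2
 = E[z_s^2]\,E[z_1^2]+\left(E[z_s z_1]\right)^2
 = \frac{1+\rho^{2-2s}}{\left(1-\rho^2\right)^2}.
\]
```

For $s<0$ the exponent $2-2s$ equals $2+2|s|$, which is the form used in the final lines of the display.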
B Proofs for Section 4
B.1 Proof of Theorem 1
To prove the functional central limit theorem we follow Billingsley (1968) and Dedecker and Merlevede (2002). Before stating the details of the proof we lay out the general proof strategy. JS (p. 512, Definition 5.28) define stable convergence for sequences $Z_n$ defined on a Polish space. By Billingsley (1968), Theorem 15.5 the uniform metric can be used to establish tightness for certain processes that are continuous in the limit. We adapt their definition to our setting, noting that by JS (p. 328, Theorem 1.14), $D_{\mathbb{R}^{k_\theta}\times\mathbb{R}^{k_\rho}}[0,1]$ equipped with the Skorohod topology is a Polish space. Following Billingsley (1968, p. 120) let r