sii139.tex; 2/03/2011; 8:48 p. 1
Statistics and Its Interface Volume 0 (2011) 1–20

Asymptotic theory for stationary processes

Wei Biao Wu

We present a systematic asymptotic theory for statistics of stationary time series. In particular, we consider properties of sample means, sample covariance functions, covariance matrix estimates, periodograms, spectral density estimates, U-statistics, kernel density and regression estimates of linear and nonlinear processes. The asymptotic theory is built upon physical and predictive dependence measures, new measures of dependence based on nonlinear system theory. Our dependence measures are particularly useful for dealing with complicated statistics of time series such as eigenvalues of sample covariance matrices and maximum deviations of nonparametric curve estimates.

Keywords and phrases: Dependence, Covariance function, Covariance matrix estimation, Periodogram, Spectral density estimation, U-statistics, Kernel estimation, Invariance principle, Nonlinear time series.

1. INTRODUCTION

The exact probability distributions of statistics of time series can be too complicated to be useful, and they are known only in very special situations. It can be impossible to derive closed forms for exact finite-sample distributions of statistics of time series. Therefore it is necessary to resort to large-sample theory. Asymptotics of linear time series have been discussed in many classical time series books; see for example Anderson (1971), Hannan (1970), Brillinger (1981), Brockwell and Davis (1991) and Hannan and Deistler (1988), among others. Since the pioneering work of Howell Tong on threshold processes, various nonlinear time series models have been proposed. It is more challenging to develop an asymptotic theory for such processes since one no longer assumes linearity.

This paper presents a systematic asymptotic theory for stationary processes of the form

(1) $X_i = H(\ldots, \varepsilon_{i-1}, \varepsilon_i),$

where $\varepsilon_i$, $i \in \mathbb{Z}$, are independent and identically distributed (iid) random variables and $H$ is a measurable function such that $X_i$ is well-defined. In (1), $(X_i)$ is causal in the sense that $X_i$ does not depend on the future innovations $\varepsilon_j$, $j > i$. Causality is a reasonable assumption in the study of time series. As argued in Section 2, (1) provides a very general framework for stationary ergodic processes. Sections 3 and 4 present examples of linear and nonlinear processes that are of form (1).

In the past half century, following the influential work of Rosenblatt (1956b), there has been a substantial body of results on limit theory for processes that are strong mixing of various types, such as $\alpha$-, $\beta$-, $\rho$- and $\phi$-mixing and related concepts. See Ibragimov and Linnik (1971), the monograph edited by Eberlein and Taqqu (1986), Doukhan (1994) and Bradley (2007). Recently Doukhan and Louhichi (1999) and Dedecker and Prieur (2005) have proposed some new types of dependence measures which to a certain degree overcome some drawbacks of strong mixing conditions. In many cases it is not easy to compute strong mixing coefficients and verify strong mixing conditions.

In this paper we shall present a large-sample theory for statistics of stationary time series of form (1). In particular we shall discuss asymptotic properties of sample means, sample auto-covariances, covariance matrix estimates, periodograms, spectral density estimates, U-statistics, and kernel density and regression estimates. Instead of using strong mixing conditions and their variants, we adopt the physical and predictive dependence measures (Wu, 2005b) for our asymptotic theory. The framework, tools and results presented here can be useful for other time series asymptotic problems.

The rest of the paper is organized as follows. In Section 2 we shall review two types of representation theory for stationary processes: the Wold representation and (1), functionals of iid random variables. We argue that the latter representation is actually quite general. It can be viewed as a nonlinear analogue of the Wold representation. Based on (1), Section 3 defines physical and predictive dependence measures which in many situations are easy to work with. Examples of linear and nonlinear processes are given in Sections 3 and 4, respectively. Based on the physical and predictive dependence measures, we survey in Sections 5–12 asymptotic results for various statistics. Section 13 concludes the paper.

Our dependence measures are particularly useful for dealing with complicated statistics of time series such as eigenvalues of sample covariance matrices, maxima of periodograms and maximum deviations of nonparametric curve estimates. In such problems it is difficult to apply the traditional strong mixing type of conditions. It would not be possible to include in this paper proofs of all surveyed results. We only present a few proofs so that readers can get a feel for the techniques used. Nonetheless we shall provide detailed background information and references where proofs can be found.
2. REPRESENTATION THEORY OF STATIONARY PROCESSES

In 1938 Herman Wold proved a fundamental result which asserts that any weakly stationary process can be decomposed into a regular process (a moving average sum of white noises) and a singular process (a linearly deterministic component). This result, called the Wold representation or decomposition theorem, reveals deep insights into the structure of weakly stationary processes. On the other hand, one cannot apply the Wold representation theorem to obtain asymptotic distributions of statistics of time series, since the white noises in the moving average process do not have properties other than being uncorrelated. The joint distributions of the white noises can be too complicated to be useful. Recently Volný, Woodroofe and Zhao (2011) proved that stationary processes can be represented as super-linear processes of martingale differences. Their useful and interesting decomposition reveals a finer structure than the one in the Wold decomposition.

Here we shall adopt a different framework. It is based on the quantile transformation. For a random vector $(X_1, \ldots, X_n)$, let $\mathbf{X}_m = (X_1, \ldots, X_m)$ and define $G_n(\mathbf{x}, u) = \inf\{y \in \mathbb{R} : F_{X_n | \mathbf{X}_{n-1}}(y | \mathbf{x}) \ge u\}$, $\mathbf{x} \in \mathbb{R}^{n-1}$, $u \in (0, 1)$. Here $F_{X_n | \mathbf{X}_{n-1}}(\cdot | \cdot)$ is the conditional distribution function of $X_n$ given $\mathbf{X}_{n-1}$, so $G_n$ is the conditional quantile function of $X_n$ given $\mathbf{X}_{n-1}$. In the theory of risk management, $G_n(\mathbf{X}_{n-1}, u)$ is the value-at-risk (VaR) at level $u$ [cf. J. P. Morgan (1996)]. Then we have the distributional equality

(2) $\mathbf{X}_n =_{\mathcal{D}} (\mathbf{X}_{n-1}, G_n(\mathbf{X}_{n-1}, U_n)),$

where $U_n \sim \mathrm{uniform}(0, 1)$ and $U_n$ is independent of $\mathbf{X}_{n-1}$. Let $\mathbf{U}_j = (U_1, \ldots, U_j)$. Iterating (2), we can find measurable functions $H_1, \ldots, H_n$ such that

(3) $\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} =_{\mathcal{D}} \begin{pmatrix} X_1 \\ G_2(\mathbf{X}_1, U_2) \\ \vdots \\ G_n(\mathbf{X}_{n-1}, U_n) \end{pmatrix} =_{\mathcal{D}} \begin{pmatrix} H_1(\mathbf{U}_1) \\ H_2(\mathbf{U}_2) \\ \vdots \\ H_n(\mathbf{U}_n) \end{pmatrix}.$

In other words, we have the important and useful fact that any finite-dimensional random vector can be expressed in distribution as functions of iid uniforms. The above construction has been known for a long time; see for example Rosenblatt (1952), Wiener (1958) and Arjas and Lehtonen (1978). It can be used to simulate multivariate distributions (see e.g. Deák (1990), Chapter 5, and Arjas and Lehtonen (1978)). For more background see Wu and Mielniczuk (2010). They also discussed connections of their dependence concept with experimental design, reliability theory and risk measures.

If $(X_i)_{i \in \mathbb{Z}}$ is a stationary ergodic process, one may expect that there exist a function $H$ and iid standard uniform random variables $U_i$ such that (1) holds. In Wiener (1958) this is called the coding problem. The latter claim, however, is generally not true; see Rosenblatt (1959, 2009), Ornstein (1973) and Kalikow (1982). Nonetheless the above construction suggests that the class of processes that (1) represents can be very wide. For a more comprehensive account of representing stationary processes as functions of iid random variables see Wiener (1958), Kallianpur (1981), Priestley (1988), Tong (1990, p. 204), Borkar (1993) and Wu (2005b).

With the representation (1), together with the dependence measures that will be introduced in Section 3, we can establish a systematic asymptotic distributional theory for statistics of stationary time series. Such a theory would not be possible if one just applied the Wold representation theorem. On the other hand we note that the Wold decomposition only needs weak stationarity, while here we require strict stationarity.

3. DEPENDENCE MEASURES

To facilitate an asymptotic theory for processes of form (1), we need to introduce appropriate dependence measures. Here, based on nonlinear system theory, we shall adopt dependence measures which quantify the degree of dependence of outputs on inputs in physical systems. Define the shift process

(4) $\mathcal{F}_i = (\ldots, \varepsilon_{i-1}, \varepsilon_i).$

Let $(\varepsilon'_i)_{i \in \mathbb{Z}}$ be an iid copy of $(\varepsilon_i)_{i \in \mathbb{Z}}$. Hence $\varepsilon'_i, \varepsilon_j$, $i, j \in \mathbb{Z}$, are iid. For a random variable $X$, we say $X \in \mathcal{L}^p$ ($p > 0$) if $\|X\|_p := (E|X|^p)^{1/p} < \infty$. Write the $\mathcal{L}^2$ norm $\|X\| = \|X\|_2$.

Definition 1 (Functional or physical dependence measure). Let $X_i \in \mathcal{L}^p$, $p > 0$. For $j \ge 0$ define the physical dependence measure

(5) $\delta_p(j) = \|X_j - X_j^*\|_p,$

where $X_j^*$ is a coupled version of $X_j$ with $\varepsilon_0$ in the latter being replaced by $\varepsilon'_0$:

$X_j^* = H(\mathcal{F}_j^*), \quad \mathcal{F}_j^* = (\ldots, \varepsilon_{-1}, \varepsilon'_0, \varepsilon_1, \ldots, \varepsilon_{j-1}, \varepsilon_j).$

Definition 2 (Predictive dependence measure). For $j \in \mathbb{Z}$, define the projection operator

(6) $\mathcal{P}_j \cdot = E(\cdot | \mathcal{F}_j) - E(\cdot | \mathcal{F}_{j-1}).$

Let $X_i \in \mathcal{L}^p$, $p \ge 1$. Define the predictive dependence measure $\theta_p(i) = \|\mathcal{P}_0 X_i\|_p$.

Lemma 1 (Wu, 2005). For $(X_i)_{i \in \mathbb{Z}}$ given in (1), assume $X_i \in \mathcal{L}^p$, $p \ge 1$. For $j \ge 0$ let $g_j$ be a Borel function on $\mathbb{R} \times \mathbb{R} \times \cdots \to \mathbb{R}$ such that $g_j(\mathcal{F}_0) = E(X_j | \mathcal{F}_0)$. Let

(7) $\omega_p(j) = \|g_j(\mathcal{F}_0) - g_j(\mathcal{F}_0^*)\|_p.$

Then $\theta_p(i) \le \omega_p(i) \le 2\theta_p(i)$.
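The coupling in Definition 1 can be illustrated numerically. The following is a minimal Monte Carlo sketch, not taken from the paper, under several assumptions: the process is generated by a recursion $X_i = \mathrm{step}(X_{i-1}, \varepsilon_i)$ (a special case of form (1)), the innovations are standard normal, and the infinite past is approximated by a finite burn-in started at zero. The names `physical_dependence`, `step`, `n_rep` and `burn_in` are hypothetical. For the AR(1) recursion $X_i = a X_{i-1} + \varepsilon_i$ one has $X_j - X_j^* = a^j(\varepsilon_0 - \varepsilon'_0)$, so that $\delta_2(j) = |a|^j \sqrt{2}$, which gives a closed form to compare against.

```python
import numpy as np


def physical_dependence(step, j, p=2.0, n_rep=50_000, burn_in=30, seed=0):
    """Monte Carlo estimate of delta_p(j) = ||X_j - X_j^*||_p.

    Assumes X_i = step(X_{i-1}, eps_i) with iid N(0, 1) innovations; the
    infinite past is approximated by `burn_in` steps started from zero.
    """
    rng = np.random.default_rng(seed)
    n_steps = burn_in + j + 1                    # times -burn_in, ..., 0, ..., j
    eps = rng.standard_normal((n_rep, n_steps))  # column burn_in holds eps_0
    eps_star = eps.copy()
    eps_star[:, burn_in] = rng.standard_normal(n_rep)  # eps_0 -> iid copy eps_0'

    x = np.zeros(n_rep)       # original process, one path per replication
    x_star = np.zeros(n_rep)  # coupled process driven by eps_star
    for t in range(n_steps):
        x = step(x, eps[:, t])
        x_star = step(x_star, eps_star[:, t])
    # After the loop x holds X_j and x_star holds X_j^*.
    return float(np.mean(np.abs(x - x_star) ** p) ** (1.0 / p))


if __name__ == "__main__":
    a = 0.5
    for lag in range(4):
        est = physical_dependence(lambda x, e: a * x + e, j=lag)
        print(lag, est, abs(a) ** lag * np.sqrt(2.0))  # estimate vs. closed form
```

The same function accepts a nonlinear `step`, for example a threshold recursion such as `lambda x, e: np.where(x > 0, 0.5 * x, -0.3 * x) + e`, for which no closed form is used here; in that case the burn-in truncation is an approximation rather than exact.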
Definition 3 (Stability and weak stability). We say that the process $(X_i)$ is $p$-stable if

(8) $\Delta_p := \sum_{j=0}^{\infty} \delta_p(j) < \infty.$

We say that it is weakly $p$-stable if $\Omega_p := \sum_{i=0}^{\infty} \theta_p(i) < \infty$.

In Definition 1 the pair $(X_j, X_j^*)$ is exchangeable. Namely, $(X_j, X_j^*)$ and $(X_j^*, X_j)$ have the same distribution. This interesting property is useful in applying our dependence measures. In Definition 2, the projection operators $\mathcal{P}_j$, $j \in \mathbb{Z}$, naturally lead to martingale differences. The function $g_j(\mathcal{F}_0)$ in Lemma 1 can be viewed as a nonlinear analogue of Kolmogorov's (1941) linear predictor which results from tail terms in the Wold decomposition. When $p = 2$, we write $\delta(j) = \delta_2(j)$, $\omega(j) = \omega_2(j)$ and $\theta(i) = \theta_2(i)$. Weak stability with $p = 2$ guarantees an invariance principle for the partial sum process $S_n = \sum_{i=1}^{n} X_i$; see Theorem 3 in Section 5.

Remark 1. The above dependence measures are defined for the one-sided process $X_i$ given in (1). Clearly similar definitions can be given for the two-sided process

(9) $X_i = H(\ldots, \varepsilon_{i-1}, \varepsilon_i, \varepsilon_{i+1}, \ldots)$

as well. We can show that, with non-essential modifications, the majority of the results in the following sections remain valid. Since many processes encountered in practice are causal, we decide to use the one-sided representation. Note that (9) can be naturally generalized to the spatial process $X_{\mathbf{i}} = H(\varepsilon_{\mathbf{i}-\mathbf{j}}, \mathbf{j} \in \mathbb{Z}^d)$, $\mathbf{i} \in \mathbb{Z}^d$, $d \ge 2$. Hallin, Lu and Tran (2001, 2004) considered kernel density estimation of such linear and non-linear random fields. Surgailis (1982) dealt with long-memory linear fields. El Machkouri, Volný and Wu (2010) established a very general central limit theorem for random fields of this type.

Remark 2. In Ibragimov (1962), Billingsley (1968), Bierens (1983), Andrews (1995) and Lu (2001), the following type of stationary processes has been considered: $X_i = H(V_{i-j}, j \in \mathbb{Z})$ or $X_i = H(\ldots, V_{i-1}, V_i)$, where $V_i$ is another stationary process which can be $\alpha$- or $\phi$-mixing, and near-epoch dependence conditions are imposed. This framework and ours have different ranges of application. On one hand, our (1) does not seem to lose too much generality in view of (3) and Wiener's (1958) construction. On the other hand, the property that the $\varepsilon_i$ are independent greatly facilitates asymptotic studies of time series. For example, in Section 11, we review Liu and Wu's (2010a) asymptotic distributional theory for maximum deviations of nonparametric curve estimates for time series which can possibly be long-memory. It can be very difficult to establish results of this type by using the framework of functions of strong mixing processes under near-epoch dependence. In nonparametric inference it is important to have such an asymptotic distributional theory since one can use it to construct simultaneous, instead of point-wise, confidence bands. Simultaneous confidence bands are useful for assessing the overall variability of the estimated curves. Recently Lu and Linton (2007) and Li, Lu and Linton (2010) obtained asymptotic normality and uniform bounds for local linear estimates under near-epoch dependence. It does not seem easy to apply their framework to establish the Gumbel type of convergence for maximum deviations of local linear estimates.

We interpret (1) as a physical system with $\mathcal{F}_i$ and $X_i$ being the input and output, respectively, and $H$ being a transform. With this interpretation, $\delta_p(j)$ quantifies the dependence of $X_j = H(\mathcal{F}_j)$ on $\varepsilon_0$ by measuring the distance between $X_j$ and its coupled process $X_j^* = H(\mathcal{F}_j^*)$. The stability condition $\sum_{j=0}^{\infty} \delta_p(j) < \infty$ indicates that $\Delta_p$, the cumulative impact of $\varepsilon_0$ on the future values $(X_i)_{i \ge 0}$, is finite. Hence it can be interpreted as a short-range dependence condition. For the predictive dependence measure $\omega_p(j)$, since $g_j(\mathcal{F}_0) = E(X_j | \mathcal{F}_0)$ is the $j$-step-ahead predicted mean, $\omega_p(j)$ measures the contribution of $\varepsilon_0$ in predicting $X_j$. Recently Escanciano and Hualde (2009) established a link between the persistence measure proposed by Granger (1995), the nonlinear impulse response (Koop et al. (1996)), and our predictive dependence measures.

Physical and predictive dependence measures provide a convenient basis for a large-sample theory for stationary processes, and they are directly related to the underlying data-generating mechanism $H$. The results obtained from those dependence measures are often optimal or nearly optimal. The results in this paper extend many previous theorems in classical textbooks, which are mostly for the special case of linear processes.

In the rest of this section we present examples of linear processes and Volterra processes, a polynomial-type nonlinear process. We shall compute their physical and predictive dependence measures. Section 4 deals with nonlinear time series.

Example 1 (Linear Processes). Let $\varepsilon_i$ be iid random variables with $\varepsilon_i \in \mathcal{L}^p$, $p > 0$; let $(a_i)$ be real coefficients such that

(10) $\sum_{i=0}^{\infty} |a_i|^{\min(2,p)} < \infty.$

By Kolmogorov's Three Series Theorem (Chow and Teicher, 1988), the linear process

(11) $X_t = \sum_{i=0}^{\infty} a_i \varepsilon_{t-i}$

exists and is well-defined. Then (11) is of form (1) with a linear functional $H$. We can view the linear process $(X_t)$ in (11) as the output from a linear filter and the input $(\ldots, \varepsilon_{t-1}, \varepsilon_t)$ is a series of shocks that drive the system (Box, Jenkins and