Asymptotic Theory for Stationary Processes 60 5 61 6 Wei Biao Wu 62 7 63 8 64 9 Present Examples of Linear and Nonlinear Processes That Are 65 10 of Form (1)

sii139.tex; 2/03/2011; 8:48 p. 1

1 Statistics and Its Interface Volume 0 (2011) 1–20 57 2 58 3 59 4 Asymptotic theory for stationary processes 60 5 61 6 Wei Biao Wu 62 7 63 8 64 9 present examples of linear and nonlinear processes that are 65 10 of form (1). 66 We present a systematic asymptotic theory for statistics 11 In the past half century, following the influential work of 67 of stationary time series. In particular, we consider proper- 12 Rosenblatt (1956b), there have been a substantial amount 68 ties of sample means, sample covariance functions, covari- 13 of results on limit theory for processes which are a strong 69 ance matrix estimates, periodograms, spectral density esti- 14 mixing of various types, such as α−, β−, ρ−, φ−mixing 70 mates, U-statistics, kernel density and regression estimates 15 and related concepts. See Ibragimov and Linnik (1971), the 71 16 of linear and nonlinear processes. The asymptotic theory monograph edited by Eberlein and Taqqu (1986), Doukhan 72 17 is built upon physical and predictive dependence measures, (1994) and Bradley (2007). Recently Doukhan and Louhichi 73 18 a new measure of dependence which is based on nonlinear (1999) and Dedecker and Prieur (2005) have proposed some 74 19 system theory. Our dependence measures are particularly new types of dependence measures which in a certain de- 75 20 useful for dealing with complicated statistics of time series gree overcome some drawbacks of strong mixing conditions. 76 21 such as eigenvalues of sample covariance matrices and max- In many cases it is not easy to compute strong mixing coef- 77 22 imum deviations of nonparametric curve estimates. ficients and verify strong mixing conditions. 78 23 Keywords and phrases: Dependence, Covariance func- In this paper we shall present a large-sample theory for 79 24 tion, Covariance matrix estimation, Periodogram, Spectral statistics of stationary time series of form (1). In particu- 80 25 density estimation, U-statistics, Kernel estimation, Invari- lar we shall discuss asymptotic properties of sample means, 81 sample auto-covariances, covariance matrix estimates, peri- 26 ance principle, Nonlinear time series. 82 27 odograms, spectral density estimates, U-statistics and ker- 83 28 nel density and regression estimates. Instead of using strong 84 mixing conditions and their variants, we adopt physical and 29 1. INTRODUCTION 85 30 predictive dependence measure (Wu, 2005b) for our asymp- 86 31 The exact probability distributions of statistics of time se- totic theory. The framework, tools and results presented 87 32 ries can be too complicated to be useful and they are known here can be useful for other time series asymptotic prob- 88 33 only in very special situations. It can be impossible to derive lems. 89 34 close forms for exact finite-sample distributions of statistics The rest of the paper is organized as follows. In Section 2 90 35 of time series. Therefore it is necessary to resort to large we shall review two types of representation theory for sta- 91 tionary processes: the Wold representation and (1), func- 36 sample theory. Asymptotics of linear time series have been 92 tionals of iid random variables. We argue that the latter 37 discussed in many classical time series books; see for ex- 93 representation is actually quite general. It can be viewed as a 38 ample Anderson (1971), Hannan (1970), Brillinger (1981), 94 nonlinear analogue of the Wold representation. Based on (1), 39 Brockwell and Davis (1991) and Hannan and Deistler (1988) 95 Section 3 defines physical and predictive dependence mea- 40 among others. Since the pioneering work of Howell Tong 96 on threshold processes, various nonlinear time series mod- sures which in many situations are easy to work with. Exam- 41 97 els have been proposed. It is more challenging to develop ples of linear and nonlinear processes are given in Sections 3 42 98 an asymptotic theory for such processes since one no longer and 4, respectively. Based on the physical and predictive de- 43 99 assumes linearity. pendence measures, we survey in Sections 5–12 asymptotic 44 100 This paper presents a systematic asymptotic theory for results for various statistics. Section 13 concludes the paper. 45 101 stationary processes of the form Our dependence measures are particularly useful for dealing 46 with complicated statistics of time series such as eigenvalues 102 47 103 (1) Xi = H(...,εi−1,εi), of sample covariance matrices, maxima of periodograms and 48 maximum deviations of nonparametric curve estimates. In 104 49 105 where εi, i ∈ Z, are independent and identically distributed such problems it is difficult to apply the traditional strong 50 (iid) random variables and H is a measurable function such mixing type of conditions. It would not be possible to in- 106 51 107 that Xi is well-defined. In (1), (Xi) is causal in the sense that clude in this paper proofs of all surveyed results. We only 52 108 Xi does not depend on the future innovations εj ,j >i.The present a few proofs so that readers can get a feeling of 53 causality is a reasonable assumption in the study of time the techniques used. Nonetheless we shall provide detailed 109 54 series. As argued in Section 2,(1) provides a very general background information and references where proofs can be 110 55 framework for stationary ergodic processes. Sections 3 and 4 found. 111 56 112 sii139.tex; 2/03/2011; 8:48 p. 2

1 2. REPRESENTATION THEORY OF (1973) and Kalikow (1982). Nonetheless the above construc- 57 2 STATIONARY PROCESSES tion suggests that the class of processes that (1) represents 58 3 can be very wide. For a more comprehensive account for 59 4 In 1938 Herman Wold proved a fundamental result which representing stationary processes as functions of iid ran- 60 5 asserts that any weakly stationary process can be decom- dom variables see Wiener (1958), Kallianpur (1981), Priest- 61 6 posed into a regular process (a moving average sum of white ley (1988), Tong (1990, p. 204), Borkar (1993) and Wu 62 7 noises) and a singular process (a linearly deterministic com- (2005b). 63 8 ponent). The latter result, called Wold representation or de- With the representation (1), together with the depen- 64 9 composition theorem, reveals deep insights into structures dence measures that will be introduced in Section 3,wecan 65 10 of weakly stationary processes. On the other hand, however, establish a systematic asymptotic distributional theory for 66 11 one cannot apply the Wold representation theorem to ob- statistics of stationary time series. Such a theory would not 67 12 tain asymptotic distributions of statistics of time series since be possible if one just applies the Wold representation theo- 68 13 the white noises in the moving average process do not have rem. On the other hand we note that in Wold decomposition 69 14 properties other than being uncorrelated. The joint distri- one only needs weak stationarity while here we require strict 70 15 butions of the white noises can be too complicated to be stationarity. 71 16 useful. Recently Volný, Woodroofe and Zhao (2011) proved 72 17 that stationary processes can be represented as super-linear 3. DEPENDENCE MEASURES 73 18 processes of martingale differences. Their useful and inter- 74 19 esting decomposition reveals a finer structure than the one To facilitate an asymptotic theory for processes of form 75 20 in Wold decomposition. (1), we need to introduce appropriate dependence measures. 76 21 Here we shall adopt a different framework. It is based on Here, based on the nonlinear system theory, we shall adopt 77 22 78 quantile transformation. For a random vector (X1,...,Xn), dependence measures which quantify the degree of depen- 23 79 let Xm =(X1,...,Xm) and define Gn(x,u)=inf{y ∈ dence of outputs on inputs in physical systems. Let the shift 24 R | ≥ } ∈ Rn−1 ∈ process 80 : FXn|Xn−1 (y x) u , x , u (0, 1). Here 25 ·|· 81 FXn|Xn−1 ( ) is the conditional distribution function of Xn 26 (4) F =(...,ε − ,ε ). 82 given Xn−1.SoGn is the conditional quantile function of Xn i i 1 i 27 given X − . In the theory of risk management, G (X − ,u) 83 n 1 n n 1 ∈ Z 28 is the value-at-risk (VaR) at level u [cf. J. P. Morgan (1996)]. Let (εi)i∈Z be an iid copy of (εi)i∈Z. Hence εi,εj,i,j , 84 ∈Lp 29 Then we have the distributional equality are iid. For a random variable X,wesayX (p>0) if 85 p 1/p 2 30 Xp := (E|X| ) < ∞. Write the L norm X = X2. 86 31 (2) Xn =D (Xn−1,Gn(Xn−1,Un)), Definition 1 (Functional or physical dependence measure). 87 32 p 88 Let Xi ∈L, p>0. For j ≥ 0 define the physical depen- where U ∼uniform(0, 1) and U is independent of X − . 33 n n n 1 dence measure 89 34 Let Uj =(U1,...,Uj). Iterating (2), we can find measurable 90 35 functions H1,...,Hn such that − ∗ 91 (5) δp(j)= Xj Xj p, 36 ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 92 X1 X1 H1(U1) 37 ∗ 93 ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ where Xj is a coupled version of Xj with ε0 in the latter ⎜ X2 ⎟ ⎜ G2(X1,U2) ⎟ ⎜ H2(U2) ⎟ 38 D D 94 (3) ⎝ ··· ⎠ = ⎝ ··· ⎠ = ⎝ ··· ⎠ . being replaced by ε0: 39 95 Xn Gn(Xn−1,Un) Hn(Un) ∗ F ∗ F ∗ 40 Xj = H( j ), j =(...,ε−1,ε0,ε1,...,εj−1,εj ). 96 41 In other words, we have the important and useful fact that 97 ∈ Z 42 any finite dimensional random vector can be expressed in Definition 2 (Predictive dependence measure). For j , 98 43 distribution as functions of iid uniforms. The above con- define the projection operator 99 44 struction was known for a long time; see for example Rosen- 100 (6) P · = E(·|F ) − E(·|F − ). 45 blatt (1952), Wiener (1958) and Arjas and Lehtonen (1978). j j j 1 101 46 102 It can be used to simulate multivariate distributions (see e.g. ∈Lp ≥ 47 Let Xi , p 1. Define the predictive dependence mea- 103 Deák (1990), chapter 5) and Arjas and Lehtonen (1978). sure θ (i)=P X . 48 For more background see Wu and Mielniczuk (2010). They p 0 i p 104 49 105 also discussed connections of their dependence concept with Lemma 1 (Wu, 2005). For (Xi)i∈Z given in (1), assume 50 p 106 experimental design, reliability theory and risk measures. Xi ∈L, p ≥ 1.Forj ≥ 0 let gj be a Borel function on 51 107 If (Xi)i∈Z is a stationary ergodic process, one may expect R × R ×···→ R such that gj(F0)=E(Xj|F0).Let 52 that there exists a function H and iid standard uniform 108 ∗ 53 F − F 109 random variables Ui such that (1) holds. In Wiener (1958) (7) ωp(j)= gj( 0) gj( 0 ) p. 54 it is called coding problem. The latter claim, however, is 110 ≤ ≤ 55 generally not true; see Rosenblatt (1959, 2009), Ornstein Then θp(i) ωp(i) 2θp(i). 111 56 112 2 W. B. Wu sii139.tex; 2/03/2011; 8:48 p. 3

1 Definition 3 (Stability and weak stability). We say that since one can use that to construct simultaneous, instead of 57 2 the process (Xi)isp-stable if point-wise, confidence bands. The simultaneous confidence 58 3 bands are useful for assessing the overall variability of the 59 ∞ 4 estimated curves. Recently Lu and Linton (2007) and Li, 60 (8) Δ := δ (j) < ∞. 5 p p Lu and Linton (2010) obtained asymptotic normality and 61 j=0 6 uniform bounds for local linear estimates under near-epoch 62 7 ∞ ∞ dependence. It seems not easy to apply their framework to 63 We say that it is weakly p-stable if Ωp := j=0 θp(i) < . 8 64 ∗ establish the Gumbel type of convergence for maximum de- 9 In Definition 1 the pair (Xj,Xj ) is exchangeable. Namely 65 ∗ ∗ viations of local linear estimates. 10 (Xj,Xj )and(Xj ,Xj) have the same distribution. This in- 66 We interpret (1) as a physical system with F and X 11 teresting property is useful in applying our dependence mea- i i 67 P ∈ Z being the input and output, respectively, and H being a 12 sures. In Definition 2, the projection operators j, j , 68 F transform. With this interpretation, δ (j) quantifies the de- 13 naturally lead to martingale differences. The function gj ( 0) p 69 pendence of X = H(F )onε by measuring the distance 14 in Lemma 1 can be viewed as a nonlinear analogue of Kol- j j 0 70 between X and its coupled process X∗ = H(F ∗). The sta- 15 mogorov’s (1941) linear predictor which results from tail j j j 71 ∞ ∞ 16 terms in the Wold decomposition. When p =2,wewrite bility condition j=0 δp(j) < indicates that Δp, the cu- 72 17 δ(j)=δ2(j), ω(j)=ω2(j)andθ(i)=θ2(i). The weak sta- mulative impact of ε0 on the future values (Xi)i≥0, is finite. 73 18 bility with p = 2 guarantees an invariance principle for the Hence it can be interpreted as a short-range dependence con- 74 n 19 partial sum process Sn = i=1 Xi; see Theorem 3 in Sec- dition. For the predictive dependence measure ωp(j), since 75 F E |F 20 tion 5. gj( 0)= (Xj 0)isthejth step ahead predicted mean, 76 ω (j) measures the contribution of ε in predicting X .Re- 21 Remark 1. The above dependence measures are defined p 0 j 77 cently Escanciano and Hualde (2009) established a link be- 22 for the one-sided process X given in (1). Clearly similar 78 i tween the persistence measure proposed by Granger (1995), 23 definitions can be given for the two-sided process 79 24 the nonlinear impulse response (Koop et al. (1996)), and our 80 predictive dependence measures. 25 (9) Xi = H(...,εi−1,εi,εi+1,...) 81 26 Physical and predictive dependence measures provide a 82 27 as well. We can show that with non-essential modifications, convenient way for a large-sample theory for stationary pro- 83 28 the majority of the results in the following sections re- cesses and they are directly related to the underlying data- 84 29 main valid. Since many processes encountered in practice generating mechanism H. The obtained results based on 85 30 are causal, we decide to use the one-sided representation. those dependence measures are often optimal or nearly op- 86 31 Note that (9) can be naturally generalized to the spatial timal. The results in this paper extend to many previous 87 d d 32 process Xi = H(εi−j, j ∈ Z ), i ∈ Z , d ≥ 2. Hallin, Lu theorems in classical textbooks which are mostly for the 88 33 and Tran (2001, 2004) considered kernel density estimation special case of linear processes. 89 34 of such linear and non-linear random fields. Surgailis (1982) In the rest of this section we present examples of linear 90 35 dealt with long-memory linear fields. El Machkouri, Volný processes and Volterra processes, a polynomial-type nonlin- 91 36 and Wu (2010) established a very general central limit the- ear process. We shall compute their physical and predictive 92 37 orem for random fields of this type. dependence measures. Section 4 deals with nonlinear time 93 series. 38 Remark 2. In Ibragimov (1962), Billingsley (1968), Bierens 94 39 95 (1983), Andrews (1995) and Lu (2001), the following type of Example 1 (Linear Processes). Let εi be iid random vari- 40 ∈Lp 96 stationary processes has been considered: Xi = H(Vi−j,j ∈ ables with εi , p>0; let (ai) be real coefficients such 41 97 Z)orXi = H(...,Vi−1,Vi), where Vi is another stationary that 42 − − 98 process which can be α or φ mixing, and near-epoch de- ∞ 43 pendence conditions are imposed. This framework and ours min(2,p) 99 (10) |ai| < ∞. 44 100 have different ranges of application. On one hand, our (1) i=0 45 does not seem to lose too much generality in view of (3)and 101 46 Wiener’s (1958) construction. On the other hand, the prop- By Kolmogorov’s Three Series Theorem (Chow and Teicher, 102 47 103 erty that εi are independent greatly facilitates asymptotic 1988), the linear process 48 104 studies of time series. For example, in Section 11,were- ∞ 49 view Liu and Wu’s (2010a) asymptotic distributional theory 105 (11) Xt = aiεt−i 50 for maximum deviations of nonparametric curve estimates 106 i=0 51 for time series which can be possibly long-memory. It can 107 52 be very difficult to establish results of such type by using exists and is well-defined. Then (11)isofform(1) with a lin- 108 53 109 the framework of functions of strong mixing processes un- ear functional H. We can view the linear process (Xt)in(11) 54 110 der near-epoch dependence. In nonparametric inference it is as the output from a linear filter and the input (...,εt−1,εt) 55 important to have such an asymptotic distributional theory is a series of shocks that drive the system (Box, Jenkins and 111 56 112 Asymptotic theory for stationary processes 3 sii139.tex; 2/03/2011; 8:48 p. 4

∞ 1 Reinsel (1994), p. 8–9). Clearly ω (n)=δ (n)=|a |c , 57 p p n 0 θ2(n)= g2(u ,...,u ) 2 − ∞ k 1 k 58 where c0 = ε0 ε0 p < .Letp =2.If k=1 min(u ,...u )=n 3 1 k 59 ∞ ∞ 4 ∞ 60 = k g2(n, u ,...,u ) 5 (12) |ai| < ∞, k 2 k 61 k=1 u ,...u =n+1 6 i=0 2 k 62 7 and the physical dependence measure 63 8 then the filter is said to be stable (Box, Jenkins and Rein- 64 sel, 1994) and the preceding inequality implies short-range ∞ ∞ 9 δ2(n) 65 dependence since the covariances are absolutely summable. = k g2(n, u ,...,u ). 10 2 k 2 k 66 11 In this sense Definition 3 extends the notion of stability to k=1 u2,...uk=0 67 12 nonlinear processes. 68 13 4. NONLINEAR TIME SERIES 69 Example 2 (Autoregressive Moving Average Process, 14 70 ARMA). An important special class of linear process (11) A wide class of nonlinear time series can be expressed as 15 71 is the ARMA model which is of the form 16 72 (15) Xi = G(Xi−1,ξi)=Gξi (Xi−1), 17 p q 73 ∈ Z 18 (13) X − ϕ X − = ε + θ ε − , where ξ,ξi, i , are iid random variables taking values 74 t j t j t l t l X× →X 19 j=1 l=1 in Ξ with distribution μ and G : Ξ is a measur- 75 20 able function. Here (X ,ρ) is a complete and separable metric 76 21 p q space. We can view (15) as an iterated random function. The 77 where (ϕj)j=1 (resp. (θl)l=1) are autoregressive (resp. mov- 22 i problem of existence of stationary distributions of iterated 78 ing average) parameters. Note that ai is the coefficient of z 23 of the infinite series (1 + q θ zl)/(1 − p ϕ zj ). In the random functions and the related convergence issues has 79 l=1 l j=1 j been extensively studied (Barnsley and Elton (1988), Elton 24 special case in which q = 0, we call (13) an AR (autoregres- 80 25 (1990), Duflo (1997), Arnold (1998), Diaconis and Freedman 81 sive) process. Let λ1,...,λp be the roots of the equation 26 p − p p−j ∗ | | (1999), Steinsaltz (1999), Alsmeyer and Fuh (2001), Jarner 82 λ j=1 ϕj λ =0. Assume λ =maxm≤p λm < 1. 27 i ∗ and Tweedie (2001), Wu and Shao (2004)). Here we shall 83 Then |ai| = O(r ) for all r ∈ (λ , 1) and (10) holds. 28 present a sufficient condition for (15) so that the represen- 84 29 Example 3 (Volterra Series). Intuitively, if we perform tation (1) holds. 85 30 first-order Taylor expansion of H in (1), then the corre- Define the forward iteration function 86 31 87 sponding linear process can viewed as a first-order approx- ◦ ◦···◦ 32 (16) Xn(x)=Gξn Gξn−1 Gξ1 (x), 88 imation of Xi. To model nonlinearity, we can apply higher- 33 89 order Taylor expansions. Suppose that H is sufficiently well- where n ∈ N, and the backward iteration function 34 behaved so that it has the stationary and causal represen- 90 35 ◦ ◦···◦ 91 tation (17) Zn(x)=Gξ1 Gξ2 Gξn (x). 36 92 37 D 93 (14) H(...,εn−1,εn) Observe that, for all x ∈X, by independence of ξi, Xn(x) = 38 94 ∞ ∞ Zn(x). Note that the joint distributions (Xn(x))n≥1 and 39 95 = gk(u1,...,uk)εn−u ...εn−u , (Zn(x))n≥1 are not the same. If Zn(x) converges almost 40 1 k 96 k=1 u1,...,uk=0 surely to a random variable Z∞ (say), then Xn(x)converges 41 97 in distribution to Z∞. 42 where functions g are called the Volterra kernel. The right- 98 k ∈X 43 hand side of (14) is called the Volterra expansion and it plays Condition 1. There exist y0 and α>0 such that 99 44 an important role in the nonlinear system theory (Schetzen 100 45 (18) 101 1980, Rugh 1981, Casti 1985, Priestley 1988, Bendat 1990, 46 E{ α } α ∞ 102 Mathews 2000). Assume that εt are iid with mean 0, vari- I(α, y0):= ρ [y0,Gξ(y0)] = ρ [y0,Gθ(y0)]μ(dθ) < . 47 Ξ 103 ance 1 and gk(u1,...,uk) is symmetric in u1,...,uk and it 48 104 equals zero if ui = uj for some 1 ≤ i 0andr(α) ∈ (0, 1) 49 0 105 such that, for all x ∈X, 50 ∞ ∞ 106 51 2 ∞ α α 107 gk(u1,...,uk) < . (19) E{ρ [X1(x),X1(x0)]}≤r(α)ρ (x, x0). 52 108 k=1 u1,...,uk=0 53 Theorem 1 (Wu and Shao, 2004). Suppose that Conditions 109 54 2 110 Then Xn exists and Xn ∈L. Wu (2005) computed the 1 and 2 hold. Then there exists a random variable Z∞ such 55 111 predictive dependence measure that for all x ∈X, Zn(x) → Z∞ almost surely. The limit 56 112 4 W. B. Wu sii139.tex; 2/03/2011; 8:48 p. 5

1 57 Z∞ is σ(ξ1,ξ2,...)-measurable and does not depend on x. holds for all y =(x1,...,xp) and y =(x1,...,xp).Then 2 Moreover, for every n ∈ N, [i] (23) admits a stationary solution of the form (1) and [ii] 58 3 Xn satisfies GMC(α). In particular, if there exist functions 59 α n p 4 (20) E{ρ [Zn(x),Z∞]}≤Cr (α), H such that |R(y; ε) − R(y ; ε)|≤ H (ε)|x − x | for 60 j j=1 j j j 5 p α 61 all y and y and j=1 Hj(ε) α < 1, then we can let aj = 6 where C>0 depends only on x, x0,y0, α and r(α) ∈ 62 H (ε)α . 7 (0, 1). In addition, we have the geometric-moment contract- j α 63 8 ing (GMC) property: Duflo (1997) assumed α ≥ 1 and called (24)Lipschitz 64 9 mixing condition. Here we allow α<1. Similar conditions 65 E{ α }≤ n 10 (21) ρ [Zn(X0),Z∞] Cr (α), are given in Götze and Hipp (1994). 66 11 Doukhan and Wintenberger (2008) considered the 67 where X ∼ π is independent of ξ ,ξ ,.... 12 0 1 2 AR(∞) or chain with infinite memory model 68 13 Remark 3. In applying Theorem 1, a useful sufficient con- 69 14 dition for (19)is (25) Xk+1 = R(Xk,Xk−1,...; εk+1), 70 15 71 16 where εk are iid innovations. Assume that there exists a 72 (22) E(Kα)= Kαμ(dθ) < 1, 17 θ θ non-negative sequence (w ) ≥ such that, for some α ≥ 1, 73 Ξ j j 1 18 74 ρ[Gθ(x ),Gθ(x)] − 19 where Kθ =sup . (26) R(x−1,x−2,...; ε0) R(x−1,x−2,...; ε0) α 75 x = x ρ(x ,x) ∞ 20 76 ≤ w |x− − x |. 21 To see this, by Fatou’s lemma, we have (19) with r(α)= j j −j 77 22 E α j=1 78 (Kθ )inviewof 23 79 Under suitable conditions on (ω ) ≥ , iterations of (25) lead 24 ρα[G (x),G (x)] j j 1 80 E α θ θ { } to a stationary solution X of form (1). We now compute its 25 1 > (Kθ )= sup μ dθ k 81 α ∗ Θ x = x ρ (x ,x) physical dependence measure. Let δ (k)=X − H(F ) . 26 α k k α 82 ρα[G (x),G (x)] For k ≥ 0, by (25)and(26), we have 27 ≥ sup θ θ μ{dθ}. 83 α 28 x = x Θ ρ (x ,x) 84 k+1 29 85 (27) δα(k +1)≤ wiδα(k +1− i). 30 Remark 4. Assume that Kθ has an algebraic tail. If there 86 E i=1 31 exists an α such that (19) holds, then (log Kθ) < 0. The 87 converse is also true. The latter is a key condition in Dia- 32 Define recursively the sequence (ak)k≥0 by a0 = δα(0) and 88 33 conis and Freedman (1999). Our Theorem 1 is an improved 89 k+1 34 version of Theorem 1 in Diaconis and Freedman (1999) in 90 35 that it states stronger results under weaker conditions. (28) ak+1 = wiak+1−i. 91 i=1 36 ≥ 92 The GMC property (21) asserts that Xi,i 0, forgets 37 ∞ ∞ 93 the history F =(...,ε− ,ε ) geometrically quickly. It is k i | |≤ 0 1 0 Let A(s)= k=0 aks and W (s)= i=1 wis , s 1. By 38 94 equivalent to the following: the physical dependence mea- (28), we have A(s)=a0 + A(s)W (s). Hence A(s)=a0(1 − 39 n − 95 sure δα(n)=O(r (α)). W (s)) 1. Assume that, as s ↑ 1, 1 − W (s) ∼ (1 − s)d with 40 d−1 96 Theorem 1 can be generalized to nonlinear AR(p) models d ∈ (0, 1/2). Then δα(k) ≤ ak ∼ a0k /Γ(d), where Γ(·)is 41 d 97 (Shao and Wu, 2007). Let ε, εn be iid, p, d ≥ 1; let Xn ∈ R the Gamma function. The latter is the fractional integration 42 d 98 be recursively defined by model (1 − B) Xk+1 = εk+1. For a nonlinear functional R, 43 99 (25) generates a nonlinear long-memory process. 44 (23) X = R(X ,...,X − ; ε ), ∞ 100 n+1 n n p+1 n+1 Note that in our setting W (1) = wj = 1, while 45 j=1 101 W (1) < 1 is required in Doukhan and Wintenberger (2008). 46 102 where R is a measurable function. Suitable conditions on R Hence we can allow stronger dependence. If, as in Doukhan 47 implies GMC. k 103 and Wintenberger (2008), W (1) < 1, then ak = O(r )for 48 104 some r ∈ (0, 1). This is analogous to Theorem 2 which en- 49 Theorem 2 (Shao and Wu, 2007). Let α>0 and α = 105 ∈Lα sures the GMC property. 50 min(1,α). Assume that R(y0; ε) for somey0 and that 106 there exist constants a ,...,a ≥ 0 such that p a < 1 51 1 p j=1 j Example 4 (Amplitude-dependent Exponential Autore- 107 and 52 gressive (EXPAR) Model). Jones (1976) studied the follow- 108 α 53 p ing EXPAR model: let εj ∈L be iid and recursively define 109 54 − α ≤ | − |α 110 (24) R(y; ε) R(y ; ε) α aj xj xj − 2 55 j=1 Xn =[α + β exp( aXn−1)]Xn−1 + εn, 111 56 112 Asymptotic theory for stationary processes 5 sii139.tex; 2/03/2011; 8:48 p. 6

P 1 | | ≤ ≤ 57 where α, β, a > 0 are real parameters. Then H1(ε)= α + l=0 blkXt+j−k−l)εt+j−k if 1 + r j s. Pham (1985, 2 |β|.ByTheorem1 (cf Remark 3), Xn is GMC(α)if|α| + 1993) discovered the representation 58 3 |β| < 1. 59 4 (31) 60 Example 5 (Nonlinear AR Process Based on the Clayton 2 5 Xt = HZt−1 + εt,Zt =(A + Bεt)Zt−1 + cεt + dεt . 61 Copula). Let α>0andUi,i ∈ Z, be iid uniform(0, 1). 6 62 Consider the model ≥ ∈L2α E | 7 By (31), Xt is GMC(α), α 1ifε1 and ( A + 63 |α 8 −α/(1+α) − Bε1 ) < 1. By (39), Zt admits a causal representation and 64 Yi =(Ui 1)Yi−1 +1. 9 so does Xt. 65 10 −1/α 66 Then Yi has the stationary distribution with Y ∼ Example 7 (Threshold AR model, TAR (Tong, 1990)). For i − 11 uniform(0, 1). The above Markov process is generated by x ∈ R let x+ =max(x, 0) and x =min(x, 0). Tong (1990) 67 12 the Clayton copula (Chen and Fan, 2006) which is used to considered the threshold autoregressive model (TAR) 68 13 model tail dependence behavior of time series. 69 + − 14 (32) X = θ X + θ X + ε , 70 ∈ Z i 1 i−1 2 i−1 i 15 Example 6 (Bilinear time series). Let ε, εi,i , be iid 71 ∈ Z 16 and consider the recursion where θ1,θ2 are real parameters and ε, εi,i , are iid. The 72 17 above model suggests the regime switching phenomenon: if 73 (29) Xi =(a + bεi)Xi−1 + cεi, 18 Xi−1 > 0, then (32) becomes Xi = θ1Xi−1 + εi, while if 74 − 19 Xi 1 < 0, then Xi follows a different AR(1) process Xi = 75 where a, b and c are real parameters. When b =0,then(29) α θ X − +ε .ByTheorem1,ifmax(|θ |, |θ |) < 1andε ∈L , 20 reduces to an AR(1) process. The bilinear time series was 2 i 1 i 1 2 76 α>0, then (32) admits a stationary solution. 21 first proposed by Tong (1981) to model sudden jumps in 77 22 time series. Quinn (1982) derived the moment stability. By Example 8 (Autoregressive Conditional Heteroscedastic 78 α α 23 Theorem 1,ifε ∈L , α>0, and E(|a + bε| ) < 1, then Models, ARCH (Engle, 1982)). Let ε, εi,i ∈ Z, be iid. The 79 24 (29) admits a stationary solution. Consider the subdiagonal ARCH with order 1 is defined by the recursion 80 25 bilinear model [Granger and Anderson (1978), Subba Rao 81 26 and Gabr (1984)]: 2 2 2 82 (33) Xi = εi a + b Xi−1, 27 83 28 (30) 2 84 where a and b are real parameters. If Eεi =0andEε =1, p q Q i 29 P 2 85 then the conditional variance of Xi given Xi−1 is a + 30 Xt = ajXt−j + cjεt−j + bjkXt−j−kεt−k. 2 2 86 b Xi−1, which depends on Xi−1 and hence suggesting het- 31 j=1 j=0 j=0 k=1 eroscedasticity. The latter property is useful for modeling 87 32 financial time series that exhibit time-varying volatility clus- 88 Let s =max(p, P +q, P +Q), r = s−max(q, Q)andap+j = 33 E | | 89 0=cq+j = bP +k,Q+j =0,k, j ≥ 1; let H be a 1 × s tering. A sufficient condition for stationarity is log bε < 0. 34 α 90 vector with the (r + 1)-th element 1 and all others 0, c be If there exists α>0 such that E(|bε| ) < 1, then Xi has a 35 an s × 1 vector with the first r − 1 elements 0 followed by stationary solution with αth moment. 91 36 1,a + c ,...,a − + c − ,andd be an s × 1 vector with 92 1 1 s r s r Example 9 (Generalized Autoregressive Conditional Het- 37 the first r elements 0 followed by b01,...,b0,s−r. Define the 93 eroskedastic models, GARCH (Bollerslev, 1986)). Let ε ,t∈ 38 s × s matrices t 94 ⎛ ⎞ Z, be iid random variables with mean 0 and variance 1; let 39 95 01 0 0 40 ⎜ . ⎟ 96 ⎜ .. ⎟ (34) Xt = htεt, 41 ⎜ 0 ⎟ 97 ⎜ 010⎟ 42 ⎜ ⎟ where the conditional variance function follows the ARMA 98 A = ⎜ . ⎟ , 43 ⎜ .. ⎟ 99 ⎜ 00 a1 0 ⎟ model 44 ⎝ . ⎠ 100 45 .1 (35) 101 as ··· ··· as−r 0 2 ··· 2 ··· 46 ⎛ ⎞ ht = α0 + α1Xt−1 + + αqXt−q + β1ht−1 + + βpht−p, 102 47 0 ··· 00··· 0 103 ⎜ ⎟ where α > 0, α ≥ 0for1≤ j ≤ q and β ≥ 0for 48 ⎜ ...... ⎟ 0 j i 104 ⎜ ...... ⎟ 1 ≤ i ≤ p.Here(X ) is called the generalized autore- 49 ⎜ ··· ··· ⎟ t 105 ⎜ 0 00 0 ⎟ gressive conditional heteroscedastic model GARCH(p, q). A 50 B = ⎜ ··· ··· ⎟ . 106 ⎜ br1 b01 0 0 ⎟ sufficient condition for (X ) being stationary is (Bollerslev, 51 ...... t 107 ⎝ ...... ⎠ 1986): 52 ...... 108 b − ··· b − 0 ··· 0 53 r,s r 0,s r q p 109 54 110 Let Zt be an s × 1 vector with the j-th entry Xt−r+j (36) αj + βi < 1. − 55 ≤ ≤ r s r j=1 i=1 111 if 1 j r and k=j akXt+j−k + k=j (ck + 56 112 6 W. B. Wu sii139.tex; 2/03/2011; 8:48 p. 7

1 The existence of moments for GARCH models has been Example 11 (Nonlinear Heteroskedastic AR Models). Let 57 2 widely studied; see Chen and An (1998), He and Teräsvirta μ(·)andσ(·) ≥ 0 be real valued functions; let ε, εi,i∈ Z,be 58 α 3 (1999), Ling (1999), and Ling and McAleer (2002) among iid random variables with εi ∈L , α>0. Consider 59 4 2 2 T 60 others. Let Yt =(Xt ,...,Xt−q+1,ht,...,ht−p+1) , bt = 5 2 T (40) Xi = μ(Xi−1)+σ(Xi−1)εi 61 (α0t , 0,...,0,α0, 0,...,0) and θ =(α1,...,αq,β1,..., 6 β )T ;lete =(0,...,0, 1, 0,...,0)T be the unit column vec- 62 p i If σ(·) is not a constant function, then (40) defines a het- 7 tor with ith element being 1, 1 ≤ i ≤ p + q.Then(34)ad- 63 eroskedastic process. If ε is Gaussian, then we can view (40) 8 mits the following autoregressive representation (Bougerol i 64 as a discretized version of the stochastic diffusion model 9 and Picard, 1992): 65 10 66 (41) dY = μ(Y )dt + σ(Y )dIB(t) 11 (37) t t t 67 12 Yt = MtYt−1 + bt, where IB is the standard Brownian motion. Many well- 68 13 2 69 where Mt =(θt ,e1, ..., eq−1,θ,eq+1, ..., ep+q−1) . known financial models are special cases of (41); see Fan 14 (2005) and references therein. For (40), assume that 70 15 For a square matrix M let ρ(M) be its largest eigenvalue 71 16 T 1/2 72 of (M M) .Let⊗ be the usual Kronecker product; let (42) sup μ (x)+σ (x)εα < 1, 17 | | E 4 x 73 Y be the Euclidean length of a vector Y . Assume (εt ) < 18 ∞ E ⊗2 74 . Ling (1999) shows that if ρ[ (Mt )] < 1, then (Xt) then by Theorem 1 it has a stationary solution. 19 E 4 ∞ 75 has a stationary distribution and (Xt ) < .Lingand 20 E ⊗2 76 McAleer (2002) argue that the condition ρ[ (Mt )] < 1 5. CENTRAL LIMIT THEORY 21 is also necessary for the finiteness of the fourth moment. 77 22 Our Proposition 1 asserts that the same condition actually This section presents a central limit theorem for the pro- 78 E 23 implies (21) as well. cess (1). Let the mean (Xi)=0and γk =cov(X0,Xk) 79 n 24 80 the covariance function. Let Sn = i=1 Xi and define the 25 Proposition 1 (Wu and Min, 2005). For the GARCH process 81 26 model (34), assume that εt are iid with mean 0 and variance 82 E 4 ∞ E ⊗2 E | − |4 ≤ 27 1, (εt ) < and ρ[ (Mt )] < 1.Then ( Xn Xn ) (43) St = S t +(t − t)X t +1,t≥ 0, 83 n ∞ ∈ 28 Cr for some C< and r (0, 1). Therefore (21) holds. 84 where the floor function t =max{k ∈ Z : k ≤ t}.Note 29 Shao and Wu (2007) showed that (21) holds for the asym- 85 that S is continuous in t. We shall show that, under suitable 30 metric GARCH processes of Ding, Granger and Engle (1993) t 86 weak dependence conditions, the central limit theorem 31 and Ling and McAleer (2002). 87 32 88 × Sn 2 33 Example 10 (Random Coefficients Model). Let Ak be p p (44) √ ⇒ N(0,σ ) 89 × ∈ N n 34 random matrices and Bk be p 1 random vectors, p .Let 90 (A ,B ), k ∈ Z, be iid. The generalized random coefficient 35 k k holds for some σ2 < ∞. Here ⇒ denotes weak convergence 91 autoregressive process (Xi) is defined by 36 (Billingsley, 1968). Central limit theorems of type (44)has 92 37 93 (38) Xi = AiXi−1 + Bi,i∈ Z. a substantial history. The classical Lindeberg-Feller (cf Sec- 38 tion 9.1 in Chow and Teicher (1988)) concerns independent 94 39 Bilinear and GARCH models fall within the framework of random variables. Hoeffding and Robbins (1948) proved 95 40 (38). The stationarity, geometric ergodicity and β-mixing a central limit theorem under m-dependence. Rosenblatt 96 41 properties have been studied by Pham (1986), Mokkadem (1956) introduced strong mixing processes, while Gänssler 97 42 (1990) and Carrasco and Chen (2002). Their results require and Häeusler (1979) and Hall and Heyde (1980) consid- 98 43 that innovations have a density, which is not needed in our ered martingales. For central limit theorems for station- 99 44 setting. ary processes see Ibragimov (1962), Gordin (1969), Ibrag- 100 45 × | | | | | | ≥ 101 For a p p matrix A,let A α =supz=0 Az α/ z α,α 1 imov and Linnik (1971), Gordin and Lifsic (1978), Peligrad 46 | | 102 be the matrix norm induced by the vector norm z α = (1996), Doukhan (1999), Maxwell and Woodroofe (2000), 47 p | |α 1/α ≥ E | | 103 ( j=1 zj ) .ThenXi is GMC(α), α 1if ( A0 α) < Rio (2000), Peligrad and Utev (2005), Dedecker et al (2007) 48 104 1andE(|B0|α) < ∞. By Jensen’s inequality, we have and Bradley (2007). 49 105 E(log |A0|α) < 0. By Theorem 1.1 of Bougerol and Picard Here we shall use the predictive dependence measure. It 50 (1992), turns out that under a weak stability condition, one can 106 51 107 ∞ actually have an invariance principle concerning the weak 52 108 convergence of the re-scaled process of {Snu, 0 ≤ u ≤ 1} (39) X = A A − ...A − B − 53 n n n 1 n k+1 n k to a Brownian motion {IB(u), 0 ≤ u ≤ 1}. The latter auto- 109 k=0 54 matically entails (44). Recall (6) for the projection operator 110 55 111 converges almost surely. Pi. 56 112 Asymptotic theory for stationary processes 7 sii139.tex; 2/03/2011; 8:48 p. 8

1 Theorem 3. Let θp(i)=P0Xip, p>1. Assume EXi =0 Note that Pi−lXi, i =1,...,n, are stationary martingale 57 2 and diﬀerences. If p>2, by Theorem 2.1 in Rio’s (2009), we 58 3 ∞ have 59 4 (45) Θ := θ (i) < ∞. 60 p p n 2 5 i=0 2 61 (50) P − X ≤ (p − 1)nP X . 6 i l i 0 l p 62 i=1 7 Then (i) we have the moment inequality p 63 8 ≤ 64 (p − 1)1/2n1/2Θ ,p>2, If 1

statistical inference of time series 101 where z − is the up (α/2)th quantile of the standard 46 1 α/2 since Brownian motions have many attractive properties. In 102 Gaussian distribution. The estimation of σ2 will be discussed 47 Wu and Zhao (2007) we applied Wu’s (2007) Gaussian ap- 103 in Section 10. 48 proximation (see Theorem 5 below) to perform statistical 104 49 5.1 Proof of Theorem 3 inference of trends in time series. 105 50 The celebrated strong invariance principle by Koml´os, 106 By the triangle inequality, since X = P − X ,we 51 i l∈Z i l i Major and Tusnady (1975, 1976) gives an optimal rate; see 107 have 52 (53). The rate in (55) is optimal up to a multiplicative log- 108 53 n n arithmic factor. Theorem 2.1 in Liu and Lin’s (2009a) leads 109 P ≤ P 54 (49) Sn p = i−lXi i−lXi . to Theorem 6 which provides a strong invariance principle 110 55 i=1 l∈Z p l∈Z i=1 p for vector-valued processes. 111 56 112 8 W. B. Wu sii139.tex; 2/03/2011; 8:48 p. 9

1 Theorem 4 (Komlós, Major and Tusnady, 1975, 1976). As- 7. SAMPLE COVARIANCE FUNCTIONS 57 p 2 sume that Xi,i∈ Z, are iid with mean 0 and Xi ∈L , p>2. 58 3 Covariance functions characterize second order properties 59 Let σ = Xi. Then on a richer probability space there exists of stochastic processes and they play a fundamental role in 4 a Brownian motion {IB(u),u ≥ 0} and a process (X ) ∈Z 60 i i 5 D n the theory of time series. They are critical quantities that 61 such that (Xi)i∈Z =(Xi )i∈Z and, for Sn = Xi ,we 6 i=1 are needed in various inference problems including param- 62 have 7 eter estimation and hypothesis testing. Asymptotic proper- 63 ties of sample covariances have been studied in many classi- 8 | − | 1/p 64 (53) max Su σIB(u) = oa.s.(n ). 9 0≤u≤n cal time series textbooks; see for example Priestley (1981), 65 10 Brockwell and Davis (1991), Hannan (1970) and Anderson 66 11 (1971). For other contributions see Hall and Heyde (1980), 67 Theorem 5 (Wu, 2007). Let (Xi)i∈Z be of the form (1) 12 p Hannan (1976), Hosking (1996), Phillips and Solo (1992) 68 with mean 0 and Xi ∈L , 2 0 can be very small. See for example (61) 35 2/p−1 91 Philipp and Stout (1975) and Eberlein (1986). As pointed 3p − 3 4n X1pΔp 36 γˆ − (1 − k/n)γ ≤ Θ2 + . 92 k k p/2 p − 37 out in Wu and Zhao (2007), in nonparametric simultaneous n p 2 93 38 inference of trends of time series, such error bounds are too 94 (ii) Assume X ∈L4 and (60) holds with p =4.Thenas 39 crude to be useful. i 95 n →∞, 40 96 Theorem 6 (Liu and Lin, 2009a). Let (Xi)i∈Z be a d- 41 √ 97 dimensional stationary vector process of the form (1) with (62) n(ˆγ − γ , γˆ − γ ,...,γˆ − γ ) ⇒ N[0, E(D DT )] 42 0 0 1 1 k k 0 0 98 H taking values in Rd, d ≥ 2.Let2 0, where D = P (X Y ) ∈L2 and P is the projection 44 0 i=0 0 i i 0 100 operator defined by (6). 45 ∞ 101 46 −(p−2)/(8−2p)−τ n − 102 (57) Δp(m)= δp(j)=O(m ) Proof of Theorem 7. Write Tn = i=1 XiXi+j nγj.First 47 103 j=m we show that for all j ∈ Z, 48 104 49 ∞ 2/p 105 as m →∞.LetD = P X . Further assume that 4n X1pΔp k i=k k i (63) T ≤ . 50 E T n p/2 − 106 the covariance matrix Γ= (DkDk ) is positive definite. p 2 51 Then on a richer probability space, there exists an Rd valued 107 52 Let q = p/2 and assume j ≥ 0. Recall that X = g(ξ) 108 Brownian motion IBd(t) such that i i 53 E | 109 and, for i<0, we have Xi = Xi and (XiXi+j ξ−1)= 54 1/2 1/p E(X X |ξ− )=E(X X |ξ ). By Jensen’s and Schwarz’s 110 (58) max |Su − Γ IBd(u)| = oa.s.(n ). i i+j 1 i i+j 0 55 0≤u≤n inequalities, 111 56 112 Asymptotic theory for stationary processes 9 sii139.tex; 2/03/2011; 8:48 p. 10

1 (64) P0(XiXi+j)q and assume (60). Then we have (i) 57 2 E − | 58 = (XiXi+j XiXi+j ξ0) q n 3 √1 − E ⇒ 59 ≤X X − X X (67) [XiZi−kn (Xkn Z0)] N(0, Σh), 4 i i+j i i+j q n 60 i=1 5 ≤ − − 61 Xi(Xi+j Xi+j) q + (Xi Xi)Xi+j q 6 ≤ where Σh is an h × h matrix with entries 62 Xi pδp(i + j)+δp(i) Xi+j p. 7 63 (68) 8 By the triangle inequality, 64 9 σab = γj+aγj+b = γjγj+b−a =: σ0,a−b, 1 ≤ a, b ≤ h, 65 10 ∈Z ∈Z 66 (65) j j 11 n n 67 → 12 P ≤ P and (ii) if additionally kn/n 0, then 68 Tn q = i−lXiXi+j i−lXiXi+j . 13 i=1 ∈Z ∈Z i=1 69 l q l q (69) 14 √ 70 T T n[(ˆγ ,...,γˆ − ) − (γ ,...,γ − ) ] ⇒ N(0, Σ ). 15 Note that Pi−lXiXi+j, i =1,...,n, form stationary martin- kn kn h+1 kn kn h+1 h 71 16 gale differences. By Burkholder’s (1988) moment inequality 72 Theorem 8 can be extended to long-memory linear pro- 17 for martingale differences, we have 73 cesses. Wu, Huang and Zheng (2010) proved central and 18 74 n q noncentral limit theorems for sample covariances of long- 19 75 P memory heavy-tailed linear processes with bounded as well 20 (66) i−lXiXi+j 76 i=1 as unbounded lags. They showed that the limiting distribu- 21 q 77 n 2 q/2 q tion depends in an interesting way on the strength of de- 22 E{[ (P − X X ) ] } nP X X 78 ≤ i=1 i l i i+j ≤ 0 l l+j q pendence, the heavy-tailedness of the innovations, and the 23 − q − q 79 (q 1) (q 1) magnitude of the lags. 24 80 25 since q/2 ≤ 1. By (64)and(65), since δp(i)=0ifi<0, we Remark 5. Bartlett (1946) derived approximate expres- 81 26 have (63). Write sions of covariances of estimated covariances: for fixed 82 27 k, l ≥ 0, 83 n 28 n − k 1 ∞ 84 γˆ − γ = (X X − − γ ) 29 n n k n i i k k ∼ 85 i=k+1 (70) ncov(ˆγk, γˆk+l) (γmγm+l + γm+k+lγm−k). 30 −∞ 86 ¯ n − m= 31 Xn n k 2 87 − (Xi−k + Xi+k)+ X¯ 32 n n n If k →∞, then the above quantity converges to 88 i=k+1 ∞ 33 m=−∞ γmγm+l = σ0,l.Theorem√8 provides an asymptotic 89 − E 34 ¯ n ≤ distributional result. For large k, n(ˆγk γˆk) behaves as 90 By Theorem 3(i), the inequality Xn i=k+1 Xi−k q 35 n j∈Z γjηk−j,whereηj are iid standard normal random vari- 91 X¯np Xi−kp and (63), (61) follows via elemen- 36 i=k+1 ables. 92 tary manipulations. 37 93 By Theorem 3, (ii) follows from the Crámer-Wold device Remark 6. Theorem 8 suggests that the sample covariance 38 94 and (64) with p =4. γˆk is not a good estimate of γk if k is large, a folklore result 39 →∞ 95 √ in time series analysis.√ For example, if k = kn with 40 − → → 96 Theorem 7 provides a CLT for n(ˆγk γk) with bounded kn/n 0 satisfies nγkn 0. The mean squared error 41 E − 2 ∼ 97 k. It turns out that, for unbounded k, the asymptotic be- (MSE) (ˆγkn γkn ) σ00/n. However for such kn the 42 estimateγ ô ≡ 0 has a smaller MSE γ2 = o(n−1). The 98 havior is quite different in that the asymptotic distribution kn kn 43 →∞ 99 does not depend on the speed of kn ; see (67). By estimateγ ˆkn is too noisy. The shrinkage estimateγ ˆk1|γˆ |≥c 44 k n 100 Theorem 3.1 in Keenan (1997), one can have a CLT for with a carefully chosen threshold cn → 0canhaveabetter 45 101 strong mixing processes with kn = o(log n). An open prob- performance in the sense that it can reduce the asymptotic 46 102 lem was posed in the latter paper on whether the severe re- MSE. 47 103 striction kn = o(log n) can be relaxed. The latter restriction 48 104 excludes many important applications. Harris, McCabe and 8. ESTIMATION OF COVARIANCE 49 105 Leybourne (2003) considered linear processes with larger 50 MATRICES 106 ranges of k .Theorem8(ii) gives a CLT for short-range de- 51 n Theorems 7 and 8 provide asymptotic normality for sam- 107 pendent nonlinear processes under a natural and mild con- 52 ple covariances. This section deals with the estimation of 108 dition on k : k →∞and k /n → 0. 53 n n n the covariance matrix 109 54 110 Theorem 8 (Wu, 2008). Let Zi =(Xi,Xi−1,..., 55 T (71) Σn =(γi−j)1≤i,j≤n 111 Xi−h+1) ,whereh ∈ N is fixed. Let kn →∞, E(Xi)=0 56 112 10 W. B. Wu sii139.tex; 2/03/2011; 8:48 p. 11

1 based on the observations X1,...,Xn. Estimation of covari- 1 and (iii) w(u)=0if|u| > 1. Note that Σˆ n is non- 57 2 ance matrices or their inverses is important in the study negative definite. If Wn is also non-negative definite, then 58 3 of prediction and various inference problems in time series. by the Schur Product Theorem in matrix theory (Horn 59 4 ˜ 60 The entry-wise convergence results of Theorems 7 and 8 do and Johnson, 1990), their Schur product Σn,ln is also non- 5 not automatically lead to matrix convergence properties of negative definite. The truncated or rectangular window with 61 6 estimates of Σn. w(u)=1|u|≤1 is, unfortunately, not non-negative definite. 62 7 For an n×n matrix A with real entries the operator norm The Bartlett or triangular window wB(u)=max(0, 1 −|u|) 63 8 ρ(A) is defined by leads to a positive definite weight matrix Wn in view of 64 9 65 10 (72) ρ(A)= max |Ax|, 66 x∈Rn: |x|=1 (76) wB(u)= w(x)w(x + u)dx, 11 R 67 12 68 where, for an n-dimensional real vector x =(x1,...,xn) , where w is the rectangular window. To see this, let ci,ui ∈ 13 | | n 2 1/2 2 69 x =( xi ) . Hence ρ (A) is the largest eigenvalue of R,i=1,...,n.By(76), 14 i=1 70 A A,where denotes matrix transpose. 15 71 Wu and Pourahmadi (2009) studied convergence of co- n 2 16 − − ≥ 72 variance matrix estimates. Theorem 9 shows that, under the ciwB(ui uj)cj = ciw(v ui) dv 0. 17 R 73 operator norm ρ(·), the sample covariance matrix estimate 1≤i,j≤n i=1 18 √ 74 19 · · 75 (73) Σˆ n =(ˆγi−j)1≤i,j≤n Replacing w( )in(76)by 3wB( ), we obtain the Parzen 20 window: 76 21 77 is not a consistent estimate of Σn; see Theorem 9(i). Case 22 78 (ii) asserts that ρ(Σˆ n − Σn) has order log n. We conjec- (77) wP (u)= wB(x)wB(x + u)dx 23 R 79 ture that, with proper centering and scaling, ρ(Σˆ n − Σn) 24 converges to Gumbel distribution. Geman (1980) and Yin, 1 − 6u2 +6|u|3, |u| < 1/2, 80 = 25 Bai and Krishnaiah (1988) considered the convergence prob- max[0, 2(1 −|u|)3], |u|≥1/2. 81 26 lem of largest eigenvalues of sample covariance matrices of 82 27 iid random vectors which has independent entries; see also Theorem 9. (i) (Wu and Pourahmadi (2009)) Assume that 83 28 84 Johnstone (2001), El Karoui (2007) and Bai and Silverstein the process (Xt) in (1) is weakly stable, namely (45) holds 29 ∞ P ˆ − → 85 (2010). Their techniques are not applicable here since, in with p =2.If i=0 0Xi > 0, then, ρ(Σn Σn) 0 30 time series analysis, we have only one realization with de- in probability. (ii) (Xiao and Wu (2010b)) Let conditions in 86 31 pendent observations, while they require multiple iid copies Theorem 13 be satisfied. Then there exists a constant c>0 87 32 of vectors with independent entries. such that 88 33 89 The inconsistency of Σˆ n is due to the fact thatγ ˆk is not 34 −1 90 (78) lim P[c log n ≤ ρ(Σˆ n − Σn) ≤ c log n]=1. a good estimate of γk if k is large; see Remark 6. Hence, n→∞ 35 to obtain a consistent covariance matrix estimate, we shall 91 36 E 92 use the truncation technique by shrinking the unreliable es- Theorem 10.Assume that (Xt) in (1) satisfies Xi =0. 37 −1 n | | 93 timatesγ ˆk to 0. Namely we can use the banded covariance Let γˆk = n i=|k|+1 XiXi−|k|, k 4,and 107 −α λ 52 where is the Hadamard (or Schur) product, which is Θp(m)=O(m ), α>0.Letln n ,whereλ ∈ (0, 1) 108 53 formed by element-wise multiplication of elements of matri- satisfies λwindow function satisfying (i) w(·)is 110 ˜ −1 1/2 55 | |≤ (81) ρ(Σn,l − Σn)=O(bn)+OP[(n ln log ln) ]. 111 even and piecewise continuous; (ii) w(0) = 1, supu w(u) 56 112 Asymptotic theory for stationary processes 11 sii139.tex; 2/03/2011; 8:48 p. 12

√ 1 ∈Lp − − 57 Additionally assume that X0 , p>max(4, 2/(1 λ)), Example 12. In (76)ifweletw(x)= 30x(1 x)1|x|≤1, ∞ − 2 T1 − − 58 t=0 min(δt,p, Ψn+1,p)=O(n ) with T1 > max[1/2 (p then the window −T2 3 4)/(2pλ), 2λ/p] and Θn,p = O(n ), T2 > max[0, 1 − (p − 59 4 4)/(2pλ)]. Then there exists a constant c>0 such that (85) 60 5 w(x)w(x + u)dx =(1−|u|)3(1 + 3|u| + u2), |u|≤1, 61 6 (82) R 62 −1 −1 1/2 7 lim P[c (n ln log ln) − 2bn ≤ ρ(Σ˜ n,l − Σn)] = 1. 63 n→∞ 8 also leads to a positive-definite weight matrix. 64 9 Proof. (i) We shall use the argument in Wu and Pourahmadi As an application of our covariance matrix estimates, 65 10 66 (2009). Since Σ˜ n,l −Σn is a symmetric Toeplitz matrix, from we can apply the bound (79) to the celebrated problem 11 Golub and Van Loan (1989), we have of prediction and filtering of stationary time series. Kol- 67 12 mogorov (1939) and Wiener (1949) considered the funda- 68 13 ˜ − 69 ρ(Σn,l Σn) mental problem of predicting unknown future values of a 14 n time series based on past observations. Their theory is one 70 15 ≤ | − | 71 max γî−jw|i−j| γi−j of the great achievements in time series analysis. For a de- 1≤j≤n 16 i=1 tailed account see Doob (1953), Whittle (1963), Priestley 72 17 n−1 l n (1981) and Pourahmadi (2001) among others. In many of 73 18 ≤ |γˆ w − γ |≤2 |γˆ w − γ | +2 |γ |. 74 i i i i i i i such works, it is assumed that the covariances γk are known. 19 i=1−n i=0 i=1+l 75 For example, to predict Xn based on past observations, Kol- 20 n−1 76 |E − |≤ mogorov and Wiener assumed that the whole past (Xi)i=−∞ 21 By Theorem 7(i), we have (79) since the bias γî γi 77 i|γ |/n. (ii) Here we shall apply Theorem 3 in Liu and Wu is known and in this case by the ergodic theorem γk can 22 i be accurately estimated. In practice, however, one has only 78 23 (2010b). For details see Xiao and Wu (2010b). 79 finitely many past observations, and thus γk should be re- 24 The bound in (79) is non-asymptotic in that it holds for placed by its estimates. Then the question naturally ap- 80 25 81 all l

1 E 2 ∞ ∈ R 57 Definition 5 (Spectral density function). Let (Xk)beasta- Theorem 11. Assume Xk < .(i)Foralmostallϑ 2 tionary process with covariance function γk =cov(X0,Xk). (Lebesgue), we have 58 3 We say that F is a spectral distribution function if it is right- 59 4 continuous, non-decreasing and bounded on [0, 2π] such that Sn(ϑ) 60 (90) √ ⇒ N[0,πf(ϑ)Id2] 5 2π ıkφ n 61 γk = 0 e dF (φ). If F is absolutely continuous, then its 6 derivative f = F is called the spectral density. 62 7 and consequently In(ϑ)/(2πf(ϑ)) ⇒ Exp(1), the standard 63 8 Note that the process (1) is regular in the sense that exponential distribution with scale parameter 1. (ii)√ More- 64 E |F E F 9 (Xj −∞)= (Xj) since the sigma algebra σ( −∞)= over, for√ almost all pairs (ϑ, ϕ) (Lebesgue), Sn(ϑ)/ n and 65 10 ∩i∈Zσ(Fi)={∅, Ω} is trivial. Theorem 1 in Peligrad and Sn(ϕ)/ n are asymptotically independent. 66 11 Wu (2010) asserts that, for a regular process, its spectral 67 Proposition 2. Assume that 12 density function exists almost surely over φ ∈ [0, 2π] with 68 13 respect to the Lebesgue measure. If ∞ 69 14 (91) P0Xi −P0Xi+1 < ∞. 70 15 (88) |γk| < ∞, i=0 71 16 k∈Z 72 17 Then (90) holds for all 0 <ϑ<2π. A sufficient condition 73 then spectral density function has the form 18 for (91)is(45). 74 19 75 1 ıkφ 1 By the celebrated Fast Fourier Transform algorithm, one 20 (89) f(φ)= γke = γk cos(kφ), 76 can compute Sn(ϑj ), j =0,...,n − 1, in a very efficient 2π ∈Z 2π ∈Z 21 k k way with computational complexity O(n log n) and memory 77 22 78 which exists at all φ ∈ R and is continuous. The spectral complexity O(n). Historically this computational advantage 23 79 density function is even and has period 2π.Itscontinuity fuels the development of spectral analysis. Theorem 12 con- 24 80 cerns asymptotic distribution of Sn(ϑj ). In the special case property is related to the decay rate of the covariances γk. 25 ∞ 81 If kp|γ | < ∞, p>0, then f ∈Cp(R). If the former in which Xi are iid standard Gaussian random variables, 26 k=1 k − 82 holds for all p>0, for example if γ → 0 geometrically I(ϑj )/2, j =1,..., (n 1)/2 , are iid standard exponen- 27 k 83 quickly, then f is an analytic function. tials. 28 84 Let (X ) be a stationary second order process with mean 29 k Theorem 12. Assume that (Xi) defined in (1) satisfies (45) 85 30 0; let In(φ) be the periodogram of X1,...,Xn. Assume (88). and minϑ f(ϑ) > 0.Letq ∈ N, m = (n − 1)/2 and let Yk, 86 →∞ 31 Then as n , elementary manipulations show that 1 ≤ k ≤ 2q, be iid standard normals. Then 87 32 n−1 88 33 (92) 89 EIn(φ)= (1 −|k|/n)γk cos(kφ) → 2πf(φ). 34 S (ϑ ) 90 k=1−n n lj , 1 ≤ j ≤ q ⇒{Y2j−1 + ıY2j , 1 ≤ j ≤ q} 35 91 nπf(ϑlj ) 36 Hence In(φ) is an asymptotically unbiased estimate of 92 37 2πf(φ). However, by Theorem 11 or Proposition 2, In(φ) for integers 1 ≤ l1

∞ 1 2 ∞ 57 Theorem 13 (Lin and Liu, 2009b). Assume that (Xi) de- κ := −∞ K (x)dx < , K is continuous at all but a fi- 2 fined in (1) satisfies min f(ϑ) > 0, E(X )=0, X ∈Lp, 2 → 58 ϑ i i nite number of points and sup02 and, as j →∞, as c →∞. Then for any fixed 0 ≤ θ<2π, 59 4 60 ∞ 5 n { − E }⇒ 2 61 (94) δp(i)=o(1/ log j). (97) fn(θ) [fn(θ)] N[0,s (θ)], 6 Bn 62 i=j 7 where s2(θ)=(θ)f 2(θ)κ. 63 8 64 Recall Theorem 12 for Iñ(θ) and m = (n − 1)/2.Then 9 In Theorem 14, the short-range dependence condition 65 10 ∞ 66 − Δ4 < is natural, since otherwise the process (Xj)may −e u 11 (95) P max Iñ(θl) − log m ≤ u = e ,u∈ R. 67 1≤l≤m be long-range dependent and the spectral density function 12 68 may not be well-defined. The bandwidth condition Bn →∞ 13 B o n 69 10. ESTIMATION OF SPECTRAL and n = ( ) is also natural. 14 A particularly interesting special case of Theorem 14 is 70 15 DENSITIES θ =0.Inthiscase2πf(0) = σ2 is the long-run variance. 71 16 72 A fundamental problem in spectral analysis of time se- Estimation of long-run variance is needed in the inference 17 73 ries is the estimation of spectral density functions. Section 9 of means of stationary processes; see Theorems 3 and 5.By 18 (97), we have 74 19 demonstrates that In(ϑ) is an asymptotically unbiased, but 75 inconsistent estimate of f(θ). To obtain a consistent esti- 20 (98) 76 21 mate, one can introduce a taper, data window or convergence 77 n { − }⇒ 2 2 2 22 factor K and propose fn(0) f(0) N(0,s ), where s =2f (0)κ, 78 Bn 23 79 n−1 24 1 ıkθ if the bandwidth b =1/B satisfies 80 (96) fn(θ)= K(k/Bn)ˆγke , n n 25 2π 81 k=1−n 26 2π{E[fn(0)] − f(0)} 82 27 − ∞ 83 where Bn satisfies Bn →∞and Bn/n → 0, and the function n1 −1/2 28 K is symmetric, bounded, K(0) = 1 and K is continuous = K(kbn)(1 −|k|/n)γk − γk = O((nbn) ). 84 29 − −∞ 85 at 0. If K has bounded support, since Bn/n → 0, the sum- k=1 n k= 30 86 mands for large k in (96) are zero. Here fn is called the lag 31 window estimate. If K is the rectangle kernel K(u)=1|u|≤1, then the above 87 condition is reduced to 32 Properties of spectral density estimates have been dis- 88 33 89 cussed in many classical textbooks on time series; see Ander- Bn ∞ 34 1 −1/2 90 son (1971), Brillinger (1975), Brockwell and Davis (1991), kγk + γk = O((nbn) ). 35 n 91 Grenander and Rosenblatt (1984), Priestley (1981) and k=1 k=1+Bn 36 Rosenblatt (1985) among others. A classical problem in 92 37 spectral analysis is to develop an asymptotic distributional Hence, taking a logarithmic transformation of (98), we can 93 38 stabilize the variance via 94 theory for the spectral density estimate fn(θ). With the 39 95 latter results one can perform statistical inference such as n 40 { − }⇒ 96 hypothesis testing and construction of confidence intervals. (99) log fn(0) log f(0) N(0, 4). Bn 41 However, it turns out that the central limit problem for 97 42 98 fn(θ) is highly nontrivial. Many of the previous results re- Therefore the (1 − α)th, 0 <α<1, confidence interval for 43 99 quire that the underlying processes are linear or strong mix- log f(0) can be constructed by 44 100 ing, or satisfy stringent cumulant summability conditions 45 101 that are not easily verifiable. 2z1−α/2 46 log fn(0) ± √ , 102 Here we shall present a central limit theorem for fn(λ) nbn 47 103 under very mild and natural conditions, thus substantially 48 − 104 extending the applicability of spectral analysis to nonlinear where z1−α/2 is the (1 α/2)th quantile of the standard 49 105 and/or non-strong mixing processes. Let (u)=2ifu/π ∈ normal distribution. 50 106 Z and (u)=1ifu/π ∈ Z. The spectral density estimate (96) is non-recursive in the 51 sense that it cannot be updated within O(1) computation 107 52 108 Theorem 14 (Liu and Wu, 2010b). Assume E(Xk)=0, once a new observation arrives. Xiao and Wu (2010a) pro- 53 E 4 ∞ ∞ 109 (Xk ) < and the 4-stability condition Δ4 < .Let posed a recursive or single-pass algorithm which is compu- 54 110 Bn →∞and Bn = o(n) as n →∞. Further assume tationally fast in that the update can be performed within 55 111 that K is symmetric, bounded, limu→0 K(u)=K(0) = 1, O(1) computation, and the required memory complexity 56 112 14 W. B. Wu sii139.tex; 2/03/2011; 8:48 p. 15

1 is also only O(1). The computational advantage becomes where fn is Rosenblatt’s (1956) kernel density estimate 57 2 highly attractive for efficient and fast processing for extra 58 3 long time series. Xiao and Wu (2010a) proved a central limit (106) 59 n n 4 theorem for their recursive estimates by using physical de- 1 x − X 1 60 f (x )= K( 0 t )= K (x − X ). 5 pendence measures. n 0 nb b n bn 0 t 61 n t=1 n t=1 6 62 7 63 11. KERNEL ESTIMATION OF TIME For i ∈ Z,l∈ N,letFl(x|Fi)=P(Xi+l ≤ x|Fi)bethe 8 64 SERIES l-step ahead conditional distribution function of Xi+l given 9 F |F d |F 65 i and fl(x i)= dx Fl(x i) be the conditional density. 10 Kernel method is an important nonparametric approach 66 11 in the inference of the data-generating mechanisms of time Theorem 15 (Wu (2005), Wu, Huang and Huang (2010)). 67 12 series. It is useful in situations in which the functional or Assume that exists a constant c0 < ∞ such that 68 |F ≤ 13 parametric forms are unknown. Asymptotic properties for supx∈R f1(x 0) c0 almost surely, and 69 kernel estimates of iid observations have been studied in 14 ∞ 70 15 Silverman (1986), Devroye and Györfi (1985), Wand and 71 (107) sup P0f1(x|Fi) < ∞. 16 Jones (1995), Prakasa Rao (1983), Nadaraya (1989) and x 72 i=1 17 Eubank (1999) among others, and for strong mixing pro- 73 cesses in Robinson (1983), Singh and Ullah (1985), Castel- 2 18 Let κ =√R K (u)du. Assume (103). (i) The central limit 74 19 lana and Leadbetter (1986), Györfi et al (1989) and Bosq 75 theorem nbn[fn(x0) − Efn(x0)] ⇒ N(0,f(x0)κ) holds. (ii) 20 (1996), Yu (1993), Neumann (1998), Kreiss and Neumann p 2 2 76 Let Vp(x)=E[|G(x, ηn)| ] and σ (x)=V2(x) − g (x).If 21 (1998), Härdle et al (1997), Tjostheim (1994) and Fan and 77 f(x0) > 0, V2, g ∈C(R) and that Vp(x) is bounded on a 22 Yao (2003). Wu and Mielniczuk (2002) and Ho and Hsing 78 neighborhood of x0, then 23 (1996) considered long-memory processes. 79 24 Here we shall present an asymptotic theory for kernel (108) 80 estimates with predictive dependence measures. Consider E 25 − Tn(x0) ⇒ 2 81 the model nbn gn(x0) N[0,σ (x0)κ/f(x0)]. 26 Efn(x0) 82 27 83 − 28 (100) Yi = G(Xi,ηi),Xi = H(...,εi 1,εi), Using the Crámer-Wold device, we can have a multi- 84 29 variate version of (108). Liu and Wu (2010a) developed an 85 where ηi,i∈ Z, are also iid and ηi is independent of Fi−1 = 30 asymptotic distributional theory for the maximum deviation 86 (...,εi−2,εi−1). An important special example of (100)is 31 √ 87 the autoregressive model 32 nb 88 (109) Δn := sup |fn(x) − Efn(x)|, 33 l≤x≤u κf(x) 89 (101) Xi+1 = R(Xi,εi+1) 34 90 where l and u are fixed bounds. Similar asymptotic distri- 35 by letting η = ε and Y = X . Given the data (X ,Y ), 91 i i+1 i i+1 i i butions hold for maximum deviations of the regression es- 36 0 ≤ i ≤ n,let 92 37 timates as well. Such results can be used to construct uni- 93 n form or simultaneous confidence bands for unknown density 38 1 − 94 (102) Tn(x)= YtKbn (x Xt), and regression functions. Liu and Wu’s theorem substan- 39 n 95 t=1 40 tially generalize earlier results which were obtained under 96 independence (Bickel and Rosenblatt, 1973) or restrictive 41 where Kb(x)=K(x/b)/b,thekernel K is symmetric and 97 beta mixing assumptions (Neumann, 1998). The problem 42 bounded on R:sup ∈R |K(u)|≤K , K(u)du =1andK 98 u 0 R of generalizing Bickel and Rosenblatt’s theorem to station- 43 has bounded support; namely, K(x)=0if|x|≥c for some 99 ary processes is very challenging and it has been open for a 44 c>0, and b = bn is a sequence of bandwidths satisfying the 100 45 natural condition long time. Fan and Yao (2003, p. 208) conjectured that sim- 101 46 ilar results hold for stationary processes under certain mix- 102 47 (103) bn → 0andnbn →∞. ing conditions. Using physical dependence measure, Liu and 103 48 Wu solved this open problem and established an asymptotic 104 49 The Nadaraya-Watson estimator of the regression function theory for both short- and long-range dependent processes. 105 50 106 (104) g(x )=E(Y |X = x )=E[G(x ,η )] Theorem 16 (Liu and Wu (2010a)). Assume Xn = a0εn + 51 0 n n 0 0 0 p 107 g(...,εn−2,εn−1) ∈L for some p>0,whereg is a mea- 52 108 has the form surable function, a0 =0 , and the density function fε of ε1 is 53 | | | | ∞ 109 positive and supx∈R[fε(x)+ fε(x) + fε (x) ] < . For the 54 Tn(x0) bandwidth b , assume that there exists 0 <δ ≤ δ < 1 such 110 (105) gn(x0)= , n 2 1 55 −δ1 −δ2 111 fn(x0) that n = O(bn) and bn = O(n ).Letp =min(p, 2) and 56 112 Asymptotic theory for stationary processes 15 sii139.tex; 2/03/2011; 8:48 p. 16

√ n − 1 p /2 γ 2 ∞ −E ⇒ 57 Θn = i=0 δp (i) . Assume Ψn,p = O(n ) for some Then there exists σ < such that (Un Un)/ n N(0, 2 n 2 − σ ). (ii) (Non-summable weights) Let W (i)= w − 58 γ>δ1/(1 δ1) and n j=1 i j 3 n 2 1/2 ∞ | | ∞ 59 ∞ and Wn =[ i=1 Wn (i)/n] . Assume i=1 wi = , 4 n 2 2 n 60 2 −1 (n−k)w = o(nW ), lim infn→∞ Wn/( |wi|) > 0 (110) (Θn+k − Θk) = o(b n log n). k=0 k n i=0 5 n and 61 k=−n 6 62 ∞ 7 63 Let the kernel K ∈C1[−1, 1] with K(±1) = 0;letl =0and (115) sup K(X ,X ) − K(X˜ , X˜ ) < ∞, 8 0 j 0 j 64 j∈Z u =1.Then =0 9 65 where X˜ = E(X |ε − ,...,ε ). 10 (111) j j j j 66 11 −1 1/2 −1 1/2 −2e−z √ 67 P (2 log b ) Δn − 2 log b − log K ≤ z → e Then there exists σ2 < ∞ such that (U −EU )/(W n) ⇒ 12 3 U n n n 68 N(0,σ2 ). 13 U 69 1 2 14 holds for every z ∈ R,whereK3 = (K (t)) dt/ 70 −1 Hsing and Wu (2004) applied Theorem 17(ii) with wi ≡ 1 1 15 2 2 71 (4π −1 K (t)dt). and derived a central limit theorem for the correlation in- 16 n 72 tegral U = i,j=1 1|Xi−Xj |≤b, which measures the number 17 For the short-range dependent linear process Xn = 73 ∞ 2 of pairs (Xi,Xj) such that their distance is less than b>0. a ε − with Eε =0andEε =1,(110) is satis- 18 j=0 j n j 1 1 Correlation integral is of critical importance in the study 74 ∞ | | ∞ ∞ 2 −γ 19 fied if j=0 aj < and j=n aj = O(n )forsome of dynamical systems (Grassberger and Procaccia (1983a, 75 − 20 γ>2δ1/(1 δ1). The latter condition can be weaker than 1983b), Wolff (1990), Serinko (1994), Denker and Keller 76 ∞ | | ∞ 21 j=0 aj < if δ1 < 1/3. Interestingly, (110) also holds (1986), Borovkova et al (1999)). The central limit theorem 77 −β 22 for some long-range dependent processes. Let aj = j (j), is useful for the related statistical inference. A non-central 78 23 1/2 <β<1, where (·) is a slowly varying function. If limit theorem is also developed in Hsing and Wu (2004) for 79 1/2 1−β −1/2 24 δ1/(1 − δ1) <β− 1/2andbn n (n)=o(log n). long memory linear processes. 80 25 1/2 1/2 1−β 81 then (111) holds. If log n = o(bn n (n)), Liu and Wu 26 82 showed that the limiting distribution of Δn is no longer 13. CONCLUSION 27 Gumbel. 83 28 Physical and predictive dependence measures shed new 84 light on the asymptotic theory of time series. They are di- 29 12. U-STATISTICS 85 30 rectly related to the underlying physical mechanisms of the 86 processes and have the attractive input-output interpreta- 31 Given a sample X1,...,Xn, consider the weighted U- 87 32 statistic tion. In many cases they are easy to compute and results 88 33 built upon them are often optimal and nearly optimal. They 89 34 (112) Un = wi−jK(Xi,Xj), are particularly useful for dealing with complicated statistics 90 35 1≤i,j≤n of time series such as eigenvalues of sample covariance ma- 91 36 trices and maxima of periodograms, where it is difficult to 92 37 where wi are weights with wi = w−i and K is a symmet- apply the traditional strong mixing type of conditions. We 93 38 ric measurable function. Many statistics can be expressed in expect that our framework, tools and results can be useful 94 39 the form of Un. Hoeffding (1961), O’Neil and Redner (1993), for other asymptotic problems in the study of stationary 95 40 Major (1994) and Rifi and Utzet (2000) considered proper- time series. 96 41 ties of Un for iid observations. Yoshihara (1976), Denker and 97 42 Keller (1983, 1986), Borovkova, Burton and Dehling (1999, ACKNOWLEDGEMENTS 98 43 2001, 2002) and Dehling, Wendler (2010) dealt with strong 99 This work was supported in part from DMS-0906073 and 44 mixing processes. Hsing and Wu (2004) developed general 100 DMS-0448704. I am grateful to Mohamed El Machkouri, 45 results for processes satisfying (1) for both summable and 101 Martin Wendler, Jan Mielniczuk and a referee for their many 46 non-summable weights. In the context of U-statistics, it is 102 helpful comments. 47 natural to define the predictive dependence measure 103 48 104 P Received 10 August 2010 49 (113) θi,j = 0K(Xi,Xj) . 105 50 106 Theorem 17 (Hsing and Wu, 2004). (i) (Summable 51 REFERENCES 107 weights) Assume that 52 Alsmeyer, G. and Fuh, C. D. (2001). Limit theorems for iterated ran- 108 53 ∞ ∞ dom functions by regenerative methods. Stochastic Processes and 109 their Applications 96 123–142. MR1856683 54 | | ∞ 110 (114) wk θi,i−k < . Anderson,T.W.(1971). The Statistical Analysis of Time Series. 55 k=0 i=0 Wiley, New York. MR0283939 111 56 112 16 W. B. Wu sii139.tex; 2/03/2011; 8:48 p. 17

1 Anderson, T. W. and Walker, A. M. (1964). On the asymptotic dis- Carrasco, M. and Chen, X. (2002). Mixing and moment properties 57 2 tribution of the autocorrelations of a sample from a linear stochas- of various GARCH and stochastic volatility models. Econometric 58 3 tic process. The Annals of Mathematical Statistics 35 1296–1303. Theory 18 17–39. MR1885348 59 MR0165602 Castellana, J. V. and Leadbetter, M. R. (1986). On smoothed 4 Andrews,D.W.K.(1995). Nonparametric kernel estimation for semi- probability density estimation. Stochastic Process. Appl. 21 179– 60 5 parametric models. Econometric Theory 11 560–596. MR1349935 193. MR0833950 61 6 Arjas, E. and Lehtonen, T. (1978). Approximating many server Chan, K. S. and Tong, H. (2001). Chaos: A Statistical Perspective. 62 queues by means of single server queues. Math. Operation Research Springer, New York. MR1851668 7 63 3 205–223. MR0506659 Chen, M. and An. H. (1998). A note on the stationarity and existence 8 Arnold, L. (1998). Random Dynamical Systems. Springer, Berlin. of moments of the GARCH models. Statistica Sinica 8 505–510. 64 9 MR1723992 MR1624371 65 10 Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Chen, X. and Fan, Y. (2006). Estimation and model selection of 66 Dimensional Random Matrices, 2nd ed. Springer, New York. semiparametric copula-based multivariate dynamic models under 11 67 MR2567175 copula misspecification. J. Econometrics 135, no. 1–2, 125–154. 12 Barnsley, M. F. and Elton, J. H. (1988). A new class of Markov pro- MR2328398 68 13 cesses for image encoding. Adv. Appl. Probab. 20 14–32. MR0932532 Chow,Y.S.and Teicher, H. (1988). Probability Theory,2nded. 69 Bartlett, M. S. 14 (1946). On the theoretical specification and sampling Springer, New York. MR0953964 70 properties of autocorrelated time-series. Suppl.J.Roy.Statist.Soc. Deak,´ I. (1990). Random Numbers Generators and Simulation. 15 71 8 27–41. MR0018393 Akadémiai Kiadó, Budapest. MR1080965 16 Bhansali, R. J. (1974). Asymptotic properties of the Wiener- Dedecker, J. P., Doukhan, G., Lang, J. R., Leon R., Louhichi, 72 17 Kolmogorov predictor. I. Journal of the Royal Statistical Society. S. and Prieur, C. (2007). Weak Dependence: With Examples and 73 18 Series B 36 61–73. MR0368365 Applications. Springer, New York. MR2338725 74 Bhansali, R. J. (1977). Asymptotic properties of the Wiener- Dedecker, J. and Merlevede,` F. (2002). Necessary and sufficient 19 Kolmogorov predictor. II. Journal of the Royal Statistical Society. conditions for the conditional central limit theorem. Ann. Probab. 75 20 Series B 39 66–72. MR0445748 30 1044–1081. MR1920101 76 21 Bickel,P.J.and Rosenblatt, M. (1973). On some global measures Dedecker, J. and Merlevede,` F., (2003). The conditional central 77 22 of the deviations of density function estimates. Ann. Statist. 1 1071– limit theorem in Hilbert spaces. Stochastic Process. Appl. 108 229– 78 1095. MR0348906 262. MR2019054 23 Bickel, P. J. and Levina, E. (2008). Regularized estimation of large Dedecker, J., Merlevede,` F. and Volny,´ D. (2007). On the weak 79 24 covariance matrices. Ann. Statist. 36 199–227. MR2387969 invariance principle for non-adapted sequences under projective cri- 80 25 Bierens, H. (1983). Uniform consistency of kernel estimators of a re- teria. J. Theoret. Probab. 20 971–1004. MR2359065 81 gression function under generalized conditions. J. Amer. Statist. Dedecker, J. and Prieur, C. (2005). New dependence coefficients. 26 82 Assoc. 78 699–707. MR0721221 Examples and applications to statistics. Probability Theory and Re- 27 Billingsley, P. (1968). Convergence of Probability Measures. Wiley, lated Fields 132 203–236. MR2199291 83 28 New York. MR0233396 Dehling, H. and Wendler, M. (2010). Central limit theorem and the 84 29 Bollerslev, T. (1986). Generalized autoregressive conditional het- bootstrap for U-statistics of strongly mixing data. Journal Multi- 85 eroskedasticity. Journal of Econometrics 31 307–327. MR0853051 variate Anal. 101 126–137. MR2557623 30 86 Borovkova, S., Burton, R. M. and Dehling, H. (1999). Consistency Denker, M. and Keller, G. (1983). On U-statistics and von Mises’ 31 of the Takens estimator for the correlation dimension. Annals of statistic for weakly dependent processes. Z. Wahrsch. verw. Gebiete 87 32 Applied Probability 9 376–390. MR1687339 64 505–552. MR0717756 88 Borovkova, S., Burton, R. M. Dehling, H. Denker, M. Keller, G. 33 and (2001). Limit the- and (1986). Rigorous statistical procedures 89 orems for functionals of mixing processes with applications to U for data from dynamical systems. Journal of Statistical Physics 44 34 90 -statistics and dimension estimation. Transactions of the American 67–93. MR0854400 35 Mathematical Society 353 4261–4318. MR1851171 Devroye, L. and Gyorfi,¨ L. (1984). Nonparametric Density Estima- 91 36 Borovkova, S., Burton, R. M. and Dehling, H. (2002). From dition: The L1 View. Wiley, New York. MR0780746 92 Diaconis, P. Freedman, D. 37 mension estimation to asymptotics of dependent U-processes. In: and (1999). Iterated random functions. 93 Limit Theorems in Probability and Statistics I (I. Berkes, E. Csaki SIAM Rev. 41 45–76. MR1669737 38 and M. Csorgo, eds.) Budapest 2002, 201–234. MR1979966 Ding, Z., Granger, C. and Engle, R. (1993). A long memory prop- 94 39 Borkar,V.S.(1993). White-noise representations in stochastic real- erty of stock market returns and a new model. J. Empirical Finance 95 40 ization theory. SIAM J. Control Optim. 31 1093–1102. MR1233993 1 83–106. 96 Bosq, D. Doob, J. L. 41 (1996). Nonparametric Statistics for Stochastic Processes. (1953). Stochastic Processes. Wiley, New York. 97 Estimation and Prediction. Lecture Notes in Statist. 110. Springer, MR0058896 42 New York. MR1441072 Doukhan, P. and Louhichi, S. (1999). A new weak dependence con- 98 43 Bougerol, P. and Picard, N. (1992). Stationarity of GARCH pro- dition and applications to moment inequalities. Stochastic Process. 99 44 cesses and of some nonnegative time series. J. Econometrics 52 Appl. 84 313–342. MR1719345 100 115–127. MR1165646 Doukhan, P. (1994). Mixing: Properties and Examples. Springer, New 45 101 Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1994). York. MR1312160 46 Time Series Analysis: Forecasting and Control. Prentice-Hall, NJ. Doukhan, P. and Wintenberger, O. (2008). Weakly dependent 102 47 MR1312604 chains with infinite memory. Stochastic Process. Appl. 118 1997– 103 48 Bradley, R. C. (2007). Introduction to Strong Mixing Conditions. 2013. MR2462284 104 Kendrick Press, Utah. Duflo, M. (1997). Random Iterative Models. Springer-Verlag, Heidel- 49 105 Brillinger, D. R. (1981). Time Series: Data Analysis and Theory, berg Germany. MR1485774 50 2nd ed. Holden-Day, San Francisco. MR0595684 Eberlein, E. (1986). On strong invariance principles under depen- 106 51 Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and dence assumptions. Ann. Probab. 14 260–270. MR0815969 107 Eberlein, E. Taqqu, M. 52 Methods, 2nd ed., Springer, New York. MR1093459 and (ed.) (1986). Dependence in Probabil- 108 Burkholder, D. L. (1988). Sharp inequalities for martingales and ity and Statistics: A Survey of Recent Results. Birkhauser, Boston. 53 109 stochastic integrals. In Colloque Paul Lévy sur les Processus MR0899982 54 Stochastiques (Palaiseau, 1987). Astérisque No. 157–158, pp. 75– 110 55 94. MR0976214 111 56 112 Asymptotic theory for stationary processes 17 sii139.tex; 2/03/2011; 8:48 p. 18

1 El Karoui, N. (2007). Tracy-Widom limit for the largest eigenvalue of Hannan, E. J. and Deistler, M. (1988). The Statistical Theory of 57 2 a large class of complex sample covariance matrices. Ann. Probab. Linear Systems. Wiley, New York. MR0940698 58 Hardle,¨ W., Lutkepohl, H. Chen, R. 3 35 663–714. MR2308592 and (1997). A review of non- 59 El Machkouri, M., Volny,´ D. and Wu, W. B. (2010). Central limit parametric time series analysis. Int. Stat. Rev. 65 49–72. 4 theorems for random fields. Preprint. Harris, D., McCabe, B. and Leybourne, S. (2003). Some limit the- 60 5 Elton, J. H. (1990). A multiplicative ergodic theorem for Lipschitz ory for autocovariances whose order depends on sample size. Econo- 61 6 maps. Stoc. Proc. Appl. 34 39–47. MR1039561 metric Theory 19 829–864. MR2002575 62 Engle, R. F. (1982). Autoregressive conditional heteroscedasticity He, C. and Terasvirta,¨ T. (1999). Fourth moment structure of the 7 63 with estimates of the variance of United Kingdom inflation. Econo- GARCH(p,q) process. Econometric Theory 15 824–846. 8 metrica 50 987–1007. MR0666121 Ho,H.C.and Hsing, T. (1996). On the asymptotic expansion of the 64 9 Escanciano, J. C. and Javier, H. (2009). Persistence in Nonlinear empirical process of long-memory moving averages. Ann. Statist. 24 65 10 Time Series: A Nonparametric Approach. CAEPR Working Paper 992–1024. MR1401834 66 No. 2009-003. Hoeffding, W. (1961). The strong law of large numbers for U- 11 67 Eubank, R. (1999). Nonparametric Regression and Spline Smoothing, statistics. Mimeograph Series No. 302, Department of Statistics, 12 2nd ed. Marcel Dekker, New York. MR1680784 University of North Carolina at Chapel Hill. 68 13 Fan, J. (2005). A selective overview of nonparametric methods in finan- Hoeffding, W. and Robbins, H. (1948). The central limit theo- 69 14 cial econometrics (with discussion). Statistical Science 20 317–357. rem for dependent random variables. Duke Math. J. 15 773–780. 70 MR2210224 MR0026771 15 71 Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric Horn, R. A. and Johnson, C. R. (1990). Matrix Analysis. Corrected 16 and Parametric Methods. Springer, New York. MR1964455 reprint of the 1985 original. Cambridge University Press, Cam- 72 17 Ganssler,¨ P. and Hausler,¨ E. (1979). Remarks on the functional bridge, UK. MR1084815 73 Hosking, J. R. M. 18 central limit theorem for martingales. Z. Wahrsch. Verw. Gebiete (1996). Asymptotic distributions of the sample 74 50 237–243. MR0554543 mean, autocovariances, and autocorrelations of long-memory time 19 Geman, S. (1980). A limit theorem for the norm of random matrices. series. J. Econom. 73 261–284. MR1410007 75 20 Ann. Probab. 8 252–261. MR0566592 Hsing, T. and Wu,W.B.(2004). On weighted U-statistics for sta- 76 21 Golub, G. H. and Van Loan, C. F. (1989). Matrix Computations, tionary processes. Annals of Probability 32 1600–1631. MR2060311 77 Ibragimov, I. A. 22 2nd ed. The Johns Hopkins University Press, Baltimore, Maryland. (1962). Some limit theorems for stationary processes. 78 MR1002570 Theory of Probability and its Applications 7 349–382. MR0148125 23 Gordin, M. I. (1969). The central limit theorem for stationary pro- Ibragimov, I. A. and Linnik, Yu. V. (1971). Independent and station- 79 24 cesses. (Russian) Dokl. Akad. Nauk SSSR 188 739–741. MR0251785 ary sequences of random variables. Groningen, Wolters-Noordhoff. 80 ˇ 25 Gordin, M. I. and Lifsic, B. A. (1978). Central limit theorem for MR0322926 81 stationary Markov processes. (Russian) Dokl. Akad. Nauk SSSR Jones, D. A. (1976). Non-linear autoregressive processes. Unpublished 26 82 239 766–767. MR0501277 Ph.D. Thesis, University of London. 27 Gotze,¨ F. and Hipp, C. (1994). Asymptotic distribution of statistics Jarner, S. and Tweedie, R. (2001). Locally contracting iterated ran- 83 28 in time series. Ann. Statist. 22 2062–2088. MR1329183 dom functions and stability of Markov chains. J. Appl. Probab. 38 84 29 Granger, C. W. J. (1995). Modelling nonlinear relationships between 494–507. MR1834756 85 extended-memory variables. Econometrica 63 265–279. MR1323523 Johnstone, I. M. (2001). On the distribution of the largest eigen- 30 86 Granger, C. W. J. and Anderson, A. P. (1978). An Introduction to value in principal components analysis. Ann. Statist. 29 295–327. 31 Bilinear Time Series Models. Vandenhoek and Ruprecht, Gottinger. MR1863961 87 32 MR0483231 Jones, R. H. (1964). Spectral analysis and linear prediction of mete- 88 Grassberger, P. Procaccia, I. 33 and (1983a). Measuring the orological time series. J. Appl. Meteor. 3 45–52. 89 Strangeness of Strange Attractors. Physica D 9 189–208. Kalikow, S. A. (1982). T, T−1 transformation is not loosely Bernoulli. 34 90 MR0732572 Ann. Math. 115 393–409. MR0647812 35 Grassberger, P. and Procaccia, I. (1983b). Characterization of Kallianpur, G. (1981). Some ramifications of Wiener’s ideas on non- 91 36 Strange Attractors. Phys. Rev. Let. 50 346–349. MR0689681 linear prediction. In: Norbert Wiener, Collected Works with Com- 92 Grenander, U. Rosenblatt, M. 37 and (1984). Statistical Analysis of mentaries. MIT Press, Mass., 402–424. 93 Stationary Time Series, 2nd ed. Chelsea Publishing Co., New York. Keenan, D. M. (1997). A central limit theorem for m(n) autocovari- 38 MR0890514 ances. J. Time Ser. Anal. 18 61–78. MR1437742 94 39 Grenander, U. and Szego,¨ G. (1958). Toeplitz Forms and Their Ap- Kolmogorov, A. (1941). Interpolation und Extrapolation von sta- 95 40 plications. Cambridge University Press, London. MR0094840 tionären zufälligen Folgen. Bull. Acad. Sci. URSS Sér. Math. [Izves- 96 Gyorfi,¨ L., Hardle,¨ W., Sarda, P. Vieu, P. 41 and (1989). Non- tia Akad. Nauk. SSSR] 5 3–14. MR0004416 97 parametric Curve Estimation From Time Series. Lecture Notes in Komlos,´ J., Major, P. and Tusnady,´ G. (1975). An approximation of 42 Statist. 60, Springer, Berlin. MR1027837 partial sums of independent RV’s and the sample DF. I. Z. Wahrsch. 98 43 Hall, P and C.C. Heyde (1980). Martingale Limit Theorem and its Verw. Gebiete 32 111–131; II. (1976) 34 33–58. 99 44 Application. Academic Press, New York. MR0624435 Koop, G., Pesaran, M. H. and Potter, S. M. (1996). Impulse re- 100 Hallin, M., Lu, Z. and Tran, L. T. (2001). Density estimation for sponse analysis in nonlinear multivariate models. Journal of Econo- 45 101 spatial linear processes. Bernoulli 7 657–668. MR1849373 metrics 74 119–147. MR1409037 46 Hallin, M., Lu, Z. and Tran, L. T. (2004). Kernel Density Estima- Li, D., Lu, Z. and Linton, O. (2010). Local Linear Fitting under 102 47 tion for Spatial Processes: The L1 Theory. Journal of Multivariate Near Epoch Dependence: Uniform Consistency with Convergence 103 48 Analysis 88 61–75. MR2021860 Rates. Discussion paper, London School of Economics. Available at: 104 Hannan, E. J. (1970). Multiple Time Series. Wiley, New York. http://personal.lse.ac.uk/lintono/downloads/Li-Lu-Linton-4.pdf. 49 105 MR0279952 Ling, S. (1999). On the stationarity and the existence of moments 50 Hannan, E. J. (1973). Central limit theorems for time series regression. of conditional heteroskedastic ARMA models. Statistica Sinica 9 106 51 Z. Wahrsch. Verw. Gebiete 26 157–170. MR0331683 1119–1130. MR1744828 107 Hannan, E. J. Ling, S. McAleer, M. 52 (1976). The asymptotic distribution of serial covari- and (2002). Necessary and sufficient moment 108 ances. Ann. Statist. 4 396–399. MR0398029 conditions for the GARCH (r, s) and asymmetric power GARCH(r, 53 109 Hannan, E. J. (1979). The central limit theorem for time series re- s)models.Econometric Theory 18 722–729. MR1906332 54 gression. Stochastic Process. Appl. 9 281–289. MR0562049 110 55 111 56 112 18 W. B. Wu sii139.tex; 2/03/2011; 8:48 p. 19

1 Liu, W. and Lin, Z. (2009a). Strong approximation for a class of sta- Rio, E. (2000). Théorie asymptotique des processus aléatoires faible- 57 2 tionary processes. Stochastic Processes and their Applications 119 ment dépendants. Mathématiques & Applications 31. Springer- 58 3 249–280. doi:10.1016/j.spa.2008.01.012. MR2485027 Verlag, Berlin. MR2117923 59 Liu, W. and Lin, Z. (2009b). On maxima of periodograms of stationary Rio, E. (2009). Moment inequalities for sums of dependent random 4 processes. Ann. Statist. 37 5B, 2676–2695. MR2541443 variables under projective conditions. Journal of Theoretical Prob- 60 5 Liu, W. and Wu, W. B. (2010a). Simultaneous nonparametric infer- ability 22 146–163. MR2472010 61 6 ence of time series. Ann. Statist. 38 2388–2421. MR2676893 Robinson,P.M.(1983). Nonparametric estimators for time series. J. 62 Liu, W. and Wu, W. B. (2010b). Asymptotics of spectral density Time Ser. Anal. 4 185–207. MR0732897 7 63 estimates. Econometric Theory 26 1218–1245. Rootzen,´ H. (1976). Gordin’s theorem and the periodogram. J. Appl. 8 Lu, Z. (2001). Asymptotic normality of kernel density estimators under Probab. 13 365–370. MR0410876 64 9 dependence. Annals of the Institute of Statistical Mathematics 53 Rosenblatt, M. (1961). Independence and dependence, In Proc. 4th 65 10 447–468. MR1868884 Berkeley Sympos. Math. Statist. and Prob. vol II, Univ. California 66 Lu, Z. and Linton, O. (2007). Local linear fitting under near epoch Press, Berkeley, CA, pp. 431–443. MR0133863 11 67 dependence. Econometric Theory 23 37–70. MR2338951 Rosenblatt, M. (1952). Remarks on a multivariate transformation. 12 Major, P. (1994). Asymptotic distributions for weighted U-statistics. Ann. Math. Statist. 23 470–472. MR0049525 68 13 Ann. Probab. 22, 1514–1535. MR1303652 Rosenblatt, M. (1956a). A central limit theorem and a strong mixing 69 Maxwell, M. Woodroofe, M. 14 and (2000). Central limit theorems condition. Proc. Nat. Acad. Sci. U. S. A. 42 43–47. MR0074711 70 for additive functionals of Markov chains. Ann. Probab. 28 713– Rosenblatt, M. (1956b). Remarks on some nonparametric estimates 15 71 724. MR1782272 of a density function. Ann. Math. Statist. 27 832–837. MR0079873 16 Morgan, J. P. (1996). RiskMetrics. Technical Document. New York. Rosenblatt, M. (1985). Stationary Sequences and Random Fields. 72 17 Nadaraya,E.A.(1989). Nonparametric Estimation of Probabil- Birkhäuser, Boston. MR0885090 73 Rosenblatt, M. 18 ity Densities and Regression Curves. Kluwer Academic Pub. (2009). A comment on a conjecture of N. Wiener. 74 MR1093466 Statist. Probab. Letters 79 347–348. MR2493017 19 Neumann, M.H. (1998). Strong approximation of density estimators Serinko, R. J. (1994). A consistent approach to least squares estima- 75 20 from weakly dependent observations by density estimators from in- tion of correlation dimension in weak Bernoulli dynamical systems. 76 21 dependent observations. Ann. Statist. 26 2014–2048. MR1673288 Annals of Appl. Probab. 4 1234–1254. MR1304784 77 Olshen, R. A. Shao, X. Wu, W. B. 22 (1967). Asymptotic properties of the periodogram of a and (2007). Asymptotic spectral theory for non- 78 discrete stationary process. J. Appl. Probab. 4 508–528. MR0228059 linear time series. Ann. Statist. 35 1773–1801. MR2351105 23 O’Neil,K.A.and Redner, R. A. (1993). Asymptotic Distributions of Silverman, B. W. (1986). Density Estimation for Statistics and Data 79 24 Weighted U-Statistics of Degree 2. Annals of Probability 21 1159– Analysis. Chapman and Hall, London. MR0848134 80 25 1169. MR1217584 Singh, R. S. and Ullah, A. (1985). Nonparametric time-series esti- 81 Ornstein, D. S. (1973). An example of a Kolmogorov automor- mation of joint DGP, conditional DGP, and vector autoregression. 26 82 phism that is not a Bernoulli shift. Advances in Math. 10 49–62. Econometric Theory 1 27–52. 27 MR0316682 Steinsaltz, D. (1999). Locally contractive iterated function systems. 83 28 Peligrad, M. (1996). On the asymptotic normality of sequences of Ann. Probab. 27 1952–1979. MR1742896 84 29 weak dependent random variables. J. Theoret. Probab. 9 703–715. Subba Rao, T. and Gabr,M.M.(1984). An Introduction to Bispec- 85 MR1400595 tral Analysis and Bilinear Time Series Models. Lecture Notes in 30 86 Peligrad, M. and Utev, S. (2005). A new maximal inequality and Statistics, 24. Springer-Verlag, New York. MR0757536 31 invariance principle for stationary sequences. Ann. Probab. 33 798– Surgailis, D. (1982). Zones of attraction of self-similar multiple inte- 87 32 815. MR2123210 grals. Lithuanian Mathematical Journal 22 327–340. MR0684472 88 Peligrad, M. Wu,W.B. Terrin, N. Hurvich, C. M. 33 and (2010). Central limit theorem for and (1994). An asymptotic Wiener-Ito 89 Fourier transform of stationary processes. Annals of Probability 38 representation for the low frequency ordinates of the periodogram 34 90 2009–2022. MR2722793 of a long memory time series. Stochastic Process. Appl. 54 297–307. 35 Pham, D. T. (1985). Bilinear Markovian representation and bilinear MR1307342 91 36 models. Stochastic Process. Appl. 20 295–306. MR0808163 Tjøstheim, D. (1994). Nonlinear time series, a selective review. Scand. 92 Pham, D. T. 37 (1986). The mixing property of bilinear and generalised J. Statist. 21 97–130. 93 random coefficient autoregressive models. Stochastic Process. Appl. Tong, H. (1981). A note on a Markov bilinear stochastic process in 38 23 291–300. MR0876051 discrete time. J. Time Ser. Anal. 2 279–284. MR0648732 94 39 Pham, D. T. (1993). Bilinear time series models. In Dimension Es- Tong, H. (1983). Threshold Models in Non-linear Time Series Anal- 95 40 timation and Models (H. Tong, ed.). World Scientific, Singapore. ysis. Springer-Verlag. MR0717388 96 Tong, H. 41 MR1307660 (1990). Non-linear Time Series: A Dynamic System Ap- 97 Philipp, W. and Stout, W. (1975). Almost sure invariance principles proach. Oxford University Press, Oxford. MR1079320 42 for partial sums of weakly dependent random variables. Mem. Am. Volny,´ D., Woodroofe, M. and Zhao, O. (2011). Central limit theo- 98 43 Math. Soc. 161 1–140. MR0433597 rems for superlinear processes. Stochastics and Dynamics 11 71–80 99 ´ 44 Phillips,P.C.B.and Solo, V. (1992). Asymptotics for linear pro- Volny, D. (1993). Approximating martingales and the central limit 100 cesses. Ann. Statist. 20 971–1001. MR1165602 theorem for strictly stationary processes. Stochastic Process. Appl. 45 101 Pourahmadi, M. (2001). Foundations of Time Series Analysis and 44 41–74. MR1198662 46 Prediction Theory. Wiley, New York. MR1849562 Walker, A. M. (1965). Some asymptotic results for the periodogram 102 47 Prakasa Rao, B. L. S. (1983). Nonparametric Functional Estimation. of a stationary time series. J. Austral. Math. Soc. 5 107–128. 103 48 Academic Press, New York. MR0740865 MR0177457 104 Priestley, M. B. (1981). Spectral Analysis and Time Series 1.Aca- Walker, A. M. (2000). Some results concerning the asymptotic dis- 49 105 demic Press. MR0628735 tribution of sample Fourier transforms and periodograms for a 50 Priestley, M. B. (1988). Nonlinear and Nonstationary Time Series discrete-time stationary process with a continuous spectrum. J. 106 51 Analysis. Academic Press. MR0991969 Time Ser. Anal. 21 95–109. MR1766176 107 Quinn, B. G. Wand, M. P. Jones, M. C. 52 (1982). A note on the existence of strictly station- and (1995). Kernel Smoothing. Chapman 108 ary solutions to bilinear equations. J. Time Ser. Anal. 3 249–252. and Hall, London. MR1319818 53 109 MR0703088 Whittle, P. (1963). Prediction and Regulation by Linear Least 54 Rifi, M. and Utzet, F. (2000). On the asymptotic behavior of Squares Methods. Van Nostrand, Princeton. MR0157416 110 55 weighted U-statistics. J. Theor. Probab. 13 141–167. MR1744988 111 56 112 Asymptotic theory for stationary processes 19 sii139.tex; 2/03/2011; 8:48 p. 20

1 Wiener, N. (1949). Extrapolation, Interpolation and Smoothing of Wu, W. B. and Pourahmadi, M. (2003). Nonparametric estimation 57 2 Stationary Time Series. Wiley, New York. of large covariance matrices of longitudinal data. Biometrika 90 58 3 Wiener, N. (1958). Nonlinear Problems in Random Theory.MIT 831–844. MR2024760 MR2024760 59 Press, Cambridge, MA. MR0100912 Wu,W.B.and Pourahmadi, M. (2009). Banding sample covari- 4 60 Wolff, R. C. L. (1990). A note on the behaviour of the correlation ance matrices of stationary processes. Statist. Sinica 19 1755–1768. 5 integral in the presence of a time series. Biometrika 77 689–697. MR2589209 61 6 MR1086682 Wu,W.B.and Shao, X. (2004). Limit theorems for iterated random 62 functions. J. Appl. Probab. 41 425–436. MR2052582 7 Woodroofe, M. (1992). A central limit theorem for functions of a 63 Markov chain with applications to shifts. Stochastic Processes and Wu, W. B. and Zhao, Z. (2007). Inference of trends in time se- 8 64 Their Applications 41 33–44. MR1162717 ries. Journal of the Royal Statistical Society, Series B 69 391–410. 9 Wu,W.B.(2005a). Fourier transforms of stationary processes. Proc. MR2323759 65 10 Amer. Math. Soc. 133 285–293. MR2086221 Xiao, H. and Wu, W. B. (2010a). A single-pass algorithm for spectrum 66 Wu,W.B. estimation with fast convergence. IEEE Transactions on Informa- 11 (2005b). Nonlinear system theory: Another look at de- 67 pendence. Proc. Nat. Journal Acad. Sci. USA 102 14150–14154. tion Theory, To Appear. 12 68 MR2172215 Xiao, H. and Wu,W.B.(2010b). Covariance matrix estimation for 13 Wu, W. B. (2007). Strong invariance principles for dependent random stationary time series. Preprint. 69 Yajima, Y. 14 variables. Ann. Probab. 35 2294–2320. MR2353389 (1989). A central limit theorem of Fourier transforms of 70 Wu, W. B. strongly dependent stationary processes. J. Time Ser. Anal. 10 375– 15 (2008). An asymptotic theory for sample covariances of 71 Bernoulli shifts. Stochastic Processes and their Applications 119 383. MR1038470 16 453–467. MR2493999 Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1988). On the limit 72 17 Wu, W. B., Huang, Y. and Huang, Y. (2010). Kernel estimation for of the largest eigenvalue of the large-dimensional sample covariance 73 18 time series: An asymptotic theory. Stochastic Processes and their matrix. Probab. Theory Related Fields 78 509–521. MR0950344 74 Yoshihara, K. (1976). Limiting behavior of U-statistics for stationary, 19 Applications 120 2412–2431. 75 Wu, W. B., Huang, Y. and Zheng, W. (2010). Covariances estima- absolutely regular processes. Z. Wahrsch. verw. Gebiete 35 237– 20 252. MR0418179 76 tion for long-memory processes. Adv. in Appl. Probab. 42 137–157. ∞ 21 MR2666922 Yu, B. (1993). Density estimation in the L norm for dependent data 77 22 Wu,W.B.and Mielniczuk, J. (2002). Kernel density estimation for with applications to the Gibbs sampler. Ann. Statist. 21 711–735. 78 linear processes. Annals of Statistics 30 1441–1459. MR1936325 MR1232514 23 79 Wu,W.B.and Mielniczuk, J. (2010) A new look at measuring de- 24 pendence. In Dependence in Probability and Statistics. pp. 123–142 Wei Biao Wu 80 25 (Ed. P. Doukhan, G. Lang, G. Teyssi`ere, and D. Surgailis) (Lecture Department of Statistics 81 26 Notes in Statistics) Springer, New York. The University of Chicago 82 Wu,W.B.and Min, W. (2005). On linear processes with dependent 27 USA 83 innovations. Stochastic Process. Appl. 115 939–958. MR2138809 28 E-mail address: [email protected] 84 29 85 30 86 31 87 32 88 33 89 34 90 35 91 36 92 37 93 38 94 39 95 40 96 41 97 42 98 43 99 44 100 45 101 46 102 47 103 48 104 49 105 50 106 51 107 52 108 53 109 54 110 55 111 56 112 20 W. B. Wu