Statistica Sinica 18(2008), 313-333

EMPIRICAL PROCESSES OF STATIONARY SEQUENCES

Wei Biao Wu

University of Chicago

Abstract: The paper considers empirical distribution functions of stationary causal processes. Weak convergence of normalized empirical distribution functions to Gaussian processes is established and sample path properties are discussed. The Chibisov-O’Reilly Theorem is generalized to dependent random variables. The proposed dependence structure is related to the sensitivity measure, a quantity appearing in the prediction theory of stochastic processes.

Key words and phrases: Empirical process, , Hardy inequality, linear process, martingale, maximal inequality, nonlinear , prediction, short-range dependence, tightness, weak convergence.

1. Introduction Let ε , k Z, be independent and identically distributed (i.i.d.) random k ∈ variables and define Xn = J(...,εn−1, εn), (1) where J is a measurable function such that Xn is a proper random variable. The framework (1) is general enough to include many interesting and important examples. Prominent ones are linear processes and nonlinear time series aris- ing from iterated random functions. Given the sample X , 1 i n, we are i ≤ ≤ interested in the empirical distribution function

n 1 F (x)= 1 . (2) n n Xi≤x Xi=1 When the Xi are i.i.d., the weak convergence of Fn and its sample path prop- erties have been extensively studied (Shorack and Wellner (1986)). Various gen- eralizations have been made to dependent random variables. It is a challeng- ing problem to develop a weak convergence theory for the associated empir- ical processes without the independence assumption. One way out is to im- pose strong mixing conditions to ensure asymptotic independence; see Billingsley (1968), Gastwirth and Rubin (1975), Withers (1975), Mehra and Rao (1975), Doukhan, Massart and Rio (1995), Andrews and Pollard (1994), Shao and Yu (1996) and Rio (2000), among others. Other special processes that have been 314 WEI BIAO WU studied include linear processes and Gaussian processes; see Dehling and Taqqu (1989), Cs¨org˝oand Mielniczuk (1996), Ho and Hsing (1996) and Wu (2003). A collection of recent results is edited by Dehling, Mikosch and Sørensen (2002). We now introduce some notation. Let the triple (Ω, , P) be the underlying F probability space. Let F (x ξ )= P(X x ξ ) be the conditional distribution ε | n n+1 ≤ | n function of Xn+1 given the sigma algebra generated by ξn = (...,εn−1, εn); let F (x) = P(X x) and R (s) = √n[F (s) F (s)]. Assume throughout the pa- 1 ≤ n n − per that the conditional density f (x ξ ) = (∂/∂x)F (x ξ ) exists almost surely. ε | n ε | n Define the weighted measure w (du) = (1+ u )λ(du). For a random variable λ | | Z write Z q, q > 0, if Z := [E( Z q)]1/q < . Write the 2 norm as ∈ L k kq | | ∞ L Z = Z . Define projections Z = E(Z ξ ) E(Z ξ ), k Z. Denote by k k k k2 Pk | k − | k−1 ∈ Cq (resp. Cγ, Cµ, etc) generic positive constants which only depend on q (resp. γ, µ, etc). Their values may vary from line to line. The rest of paper is structured as follows. Section 2 concerns sample path properties and weak convergence of Rn. In particular, in Section 2.1, the Chibisov-O’Reilly Theorem, which concerns the weak convergence of weighted empirical processes, is generalized to dependent random variables. Section 2.2 considers the weighted modulus of continuity of Rn. Applications to linear pro- cesses and nonlinear time series are made in Section 3. Section 4 presents some useful inequalities that may be of independent interest. The inequalities are applied in Section 5, where proofs of the main results are given.

2. Main results It is certainly necessary to impose appropriate dependence structures on the process (Xi). We start by proposing a particular dependence condition which is quite different from the classical strong mixing assumptions. Let m be a measure on the 1-dimensional Borel space (R, ). For θ R, let T (θ)= n h(θ,ξ ) B ∈ n i=1 i − nE[h(θ,ξ )], where h is a measurable function such that h(θ,ξ ) < for almost 1 k 1 k P∞ all θ (m). Denote by h (θ,ξ )= E[h(θ,ξ ) ξ ] the j-step-ahead predicted mean, j 0 j | 0 j 0. Let (ε′ ) be an i.i.d. copy of (ε ), ξ∗ = (...,ε , ε′ , ε ,...,ε ), k 0, and ≥ i i k −1 0 1 k ≥ define ∞ 1 ∗ 2 2 σ(h, m)= hj(θ,ξ0) hj(θ,ξ0 ) m(dθ) . (3) R k − k Xj=0 h Z i Let f ′(θ ξ ) = (∂/∂θ)f (θ ξ ). In the case that h(θ,ξ )= F (θ ξ ) [resp. f (θ ξ ) ε | k ε | k k ε | k ε | k or f ′(θ ξ )], we write σ(F ,m) [resp. σ(f ,m) or σ(f ′,m)] for σ(h, m). ε | k ε ε ε Our dependence conditions are expressed as σ(F ,m) < , σ(f ,m) < ε ∞ ε and σ(f ′,m) < , where m(dt) = (1+ t )δdt, δ R. These condi- ∞ ε ∞ | | ∈ tions are interestingly connected with the prediction theory of stochastic pro- cesses (Remark 1 and Section 2.3). The quantity σ(h, m) can be interpreted EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 315 as a cumulative weighted prediction measure. It is worthwhile to note that ∗ hj(θ,ξ0) hj(θ,ξ0) has the same order of magnitude as 0h(θ,ξj) . To k − k ∗ kP ∗ k ∗ see this, let g(...,e−1, e0) = h(θ, (...,e−1, e0)). So g(ξj ) = h(θ,ξj ), gj(ξ0 ) = E[g(ξ∗) ξ∗]= h (θ,ξ∗), E[g(ξ ) ξ ]= E[g(ξ∗) ξ ], and j | 0 j 0 j | −1 j | 0 g(ξ ) = E[g (ξ ) g (ξ∗) ξ ] g (ξ ) g (ξ∗) kP0 j k k j 0 − j 0 | 0 k ≤ k j 0 − j 0 k g (ξ ) g (ξ ) + g (ξ ) g (ξ∗) = 2 g(ξ ) . (4) ≤ k j 0 − j+1 −1 k k j+1 −1 − j 0 k kP0 j k Note also that g(ξ ) g(ξ ) g(ξ∗) . Therefore we have kP0 j k ≤ k j − j k ∞ 1 1 2 2 σ(h, m) σ˜(h, m) := 0h(θ,ξj) m(dθ) σ(h, m) 2 ≤ R kP k ≤ Xj=0 h Z i ∞ 1 ∗ 2 2 2 h(θ,ξj) h(θ,ξj ) m(dθ) =: 2ˆσ(h, m). (5) ≤ R k − k Xj=0 h Z i All results in the paper with the condition σ(h, m) < can be re-stated in terms ∞ ofσ ˜(h, m) < , or the stronger versionσ ˆ(h, m) < . ∞ ∞ 2.1. Weak convergence The classical Donsker theorem asserts that, if X are i.i.d., then R (s), s k { n ∈ R converges in distribution to an F - process. The Donsker } theorem has many applications in . To understand the behavior at the two extremes s = , we need to consider the weighted version R (s)W (s), s ±∞ { n ∈ R , where W (s) as s . Clearly, if W is bounded, then by the } → ∞ → ±∞ Continuous Mapping Theorem, the weak convergence of the weighted empirical processes follows from that of R . By allowing W (s) as s , one n → ∞ → ±∞ can have the weak convergence of functionals of empirical processes, t(Fn), for a wider class of functionals. The Chibisov-O’Reilly Theorem concerns weighted empirical processes of i.i.d. random variables; see Shorack and Wellner (1986, Sec. 11.5). The case of dependent random variables has been far less studied. For strongly mixing processes see Mehra and Rao (1975), Shao and Yu (1996) and Cs¨org˝oand Yu (1996). Theorem 1. Let γ′ 0, q > 2, and γ = γ′q/2. Assume E[ X γ +log(1+ X )] < ≥ | 1| | 1| and ∞ q E 2 q [fε (u ξ0)]wγ−1+ (du) < . (6) R | 2 ∞ Z In addition assume that there exists 0 µ 1 and ν > 0 such that ≤ ≤ ′ σ(F , w ′ )+ σ(f , w ′ )+ σ(f , w ) < . (7) ε γ −µ ε γ +µ ε −ν ∞ 316 WEI BIAO WU

E 2 γ′ Then (i) [sups∈R Rn(s) (1 + s ) ] = (1), and (ii) the process Rn(s)(1 + ′ | | | | O { s )γ /2, s R converges weakly to a tight Gaussian process. In particular, if | | ∈ } X γ+q/2−1 and 1 ∈L

sup fε(u ξ0) C (8) u | ≤ holds almost surely for some constant C < , then (6) holds. ∞ An important issue in applying Theorem 1 is to verify (7), which is basically a short-range dependence condition (cf. Remark 1). For many important models including linear processes and Markov chains, (7) is easily verifiable (Section 3). If (Xn) is a , then σ(h, m) is related to the sensitivity measure (Fan and Yao (2003, p. 466)) appearing in nonlinear prediction theory (Section 2.3). If (Xn) is a linear process, then (7) reduces to the classical definition of the short-range dependence of linear processes.

Remark 1. If h(θ,ξk) = fε(θ ξk), then hk(θ,ξ0) = E[fε(θ ξk) ξ0] = fk+1(θ ξ0), | ∗ | | ′ | the conditional density of Xk+1 at θ given ξ0. Note that ξ0 = (ξ−1, ε0) is a coupled version of ξ with ε replaced by ε′ . So h (θ,ξ ) h (θ,ξ∗) = f (θ ξ ) 0 0 0 k 0 − k 0 k+1 | 0 − f (θ ξ∗) measures the change in the (k + 1)-step-ahead predictive distribution k+1 | 0 if ξ is changed to its coupled version ξ∗. In other words, h (θ,ξ ) h (θ,ξ∗) 0 0 k 0 − k 0 can be viewed as the contribution of ε0 in predicting Xk+1. So, in this sense, the condition σ(h, m) < means that the cumulative contribution of ε in ∞ 0 predicting future values X , k 1, is finite. It is then not unnatural to interpret k ≥ σ(h, m) as a cumulative weighted prediction measure. This interpretation seems in line with the connotation of short-range dependence. We now compare Theorem 1 with the Chibisov-O’Reilly Theorem that con- cerns the weak convergence of weighted empirical processes for i.i.d. random vari- ables. Note that (γ+q/2 1) γ′ as q 2. The moment condition X γ+q/2−1 − ↓ ↓ 1 ∈L in Theorem 1 is almost necessary in the sense that it cannot be replaced by the weaker ′ E X γ log−1(2 + X )[log log(10 + X )]−λ < (9) {| 1| | 1| | 1| } ∞ for some λ > 0. To see this let Xk be i.i.d. symmetric random variables with continuous, strictly increasing distribution function F ; let F # be the quantile ′ function and m(u) = [1+ F #(u) ]−γ /2. Then we have the distributional equality | | ′ # γ Rn(F (u)) R (s)(1 + s ) 2 , s R = , u (0, 1) . { n | | ∈ } D m(u) ∈ n o ′ Assume that F (s)(1 + s )γ is increasing on ( , G) for some G < 0. Then | | −∞ m(u)/√u is decreasing on (0, F (G)). By the Chibisov-O’Reilly Theorem, EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 317

R (F #(u))/m(u), u (0, 1) is tight if and only if lim m(t)/ t log log(t−1)= { n ∈ } t↓0 , namely ∞ ′ p lim F (u)(1 + u )γ log log u = 0. (10) u→−∞ | | | | ′ Condition (10) controls the heaviness of the tail of X . Let F (u) = u −γ 1 | | (log log u )−1 for u 10. Then (9) holds while (10) is violated. It is unclear | | ≤ − ′ ′ whether stronger versions of (9), such as E( X γ ) < or E[ X γ log−1(2 + | 1| ∞ | 1| X )] < , are sufficient. | 1| ∞ 2.2. Modulus of continuity Theorem 2 concerns the weighted modulus of continuity of R ( ). Sample n · path properties of empirical distribution functions of i.i.d. random variables have also been extensively explored; see for example Cs¨org˝oet al. (1986), Shorack and Wellner (1986), and Einmahl and Mason (1988), among others. It is far less studied for the dependent case. ′ ′ Theorem 2. Let γ > 0, 2

2.3. Sensitivity measures and dependence Our basic dependence condition is that σ(h, m) < . Here we present ∞ its connection with prediction sensitivity measures (Fan and Yao (2003, p.466)) special structure. Assume that (Xn) is a Markov chain expressed in the form of an iterated random function (Elton (1990) and Diaconis and Freedman (1999)):

Xn = M(Xn−1, εn), (13) where ε , k Z, are i.i.d. random variables and M( , ) is a bivariate measurable k ∈ · · function. For k 1 let f ( x) be the conditional (transition) density of X given ≥ k ·| k X = x. Then f (θ ξ )= f (θ X ) is the conditional density of X at θ given 0 ε | k 1 | k k+1 X , and, for k 0, E[f (θ ξ ) ξ ]= f (θ X ). Fan and Yao (2003) argue that k ≥ ε | k | 0 k+1 | 0 2 Dk(x, δ) := [fk(θ x + δ) fk(θ x)] dθ (14) R | − | Z 318 WEI BIAO WU is a natural way to measure the deviation of the conditional distribution of Xk given X0 = x. In words, Dk quantifies the sensitivity to initial values and it measures the error in the k-step-ahead predictive distribution due to a drift in the initial value. Under certain regularity conditions,

Dk(x, δ) ∂fk(θ x) 2 lim = | dθ =: Ik(x). δ→0 δ2 R ∂x Z h i Here Ik is called prediction sensitivity measure. It is a useful quantity in the prediction theory of nonlinear dynamical systems. Estimation of Ik is discussed in Fan and Yao (2003, p. 468). Proposition 1 shows the relation between σ(h, m) and Ik. Since it can be proved in the same way as (i) of Theorem 3, we omit the details of its proof. Proposition 1. Let k 0. For the process (13), we have ≥

∗ 2 ∗ 2 hk(θ,ξ0) hk(θ,ξ0) m(dθ) 4 τk(X0, X0 ) , R k − k ≤ k k Z where

b 1 ∂hk(θ,x) 2 2 τk(a, b)= [Ihk (x,m)] dx and Ihk (x,m)= m(dθ). a R ∂x Z Z h i Consequently, σ(h, m) < holds if ∞ τ (X , X∗) < . ∞ k=1 k k 0 0 k ∞ In the special case h(θ,ξk) = fεP(θ ξk) = f1(θ Xk), hk(θ,ξ0) = E[h(θ,ξk) ξ0] | | | = f (θ X ) and I (x,m) reduces to Fan and Yao’s sensitivity measure I (x) k+1 | 0 hk k+1 provided m(dθ)= dθ is Lebesgue measure. So it is natural to view Ihk (x,m) as a weighted sensitivity measure. Since the k-step-ahead conditional density f (θ x), k | may have an intractable and complicated form, it is generally not very easy to apply Proposition 1. This is especially so in nonlinear time series where it is often quite difficult to derive explicit forms of f (θ x). To circumvent such a difficulty, k | our Theorem 3 provides sufficient conditions which only involve 1-step-ahead conditional densities. For processes that are not necessarily in the form (13), we assume that there exists an σ(ξn)-measurable random variable Yn such that

P(X x ξ )= P(X x Y ) := F (x Y ). (15) n+1 ≤ | n n+1 ≤ | n | n Then there exists a similar bound as the one given in Proposition 1. Write ∗ ∗ Yn = I(ξn), Yn = I(ξn) and h(θ,ξn) = h(θ,Yn). For Markov chains, (15) is ∞ satisfied with Yn = Xn. Let Xn = i=0 aiεn−i be a linear process. Then (15) is also satisfied with Y = ∞ a ε . Let f(θ y) be the conditional density of n i=1 i n+1P−i | P EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 319

′ Xn+1 given Yn = y and f (θ y) = (∂/∂θ)f(θ y). Then σ(h, m) is also related to | ∗ | a weighted distance between Yn and Yn . Define

b 1 2 2 ∂ ρh,m(a, b)= Hh,m(y)dy, where Hh,m(y)= h(θ,y) m(dθ). (16) a R ∂y Z Z

Theorem 3. (i) Let Ξ = ∞ ρ (Y ,Y ∗) . Then σ(h, m ) 2Ξ and h,m n=0 k h,m n n k ≤ h,m P ∗ 2 ∗ 2 hn(θ,ξ0) hn(θ,ξ0 ) m(dθ) 4 ρh,m(Yn,Yn ) . (17) R k − k ≤ k k Z q−2 (ii) Assume there exist C > 0 and q R such that Hh,m(y) C(1 + y ) ∗∈ ∗ min(1,q/≤2) | | holds for all y R. Then ρ (Y ,Y ) = O[ Y Y q ] if q > 0, and ∈ k h,m n n k k n − n k ρ (Y ,Y ∗) = O[ min(1, Y Y ∗ ) ] if q < 0. k h,m n n k k | n − n | k Let h(θ,Y ) = f(θ Y ). Then H (y) can be interpreted as a measure n | n h,m of ”local dependence” of Xn+1 on Yn at y. As does Dk(x, δ) in (14), Hh,m(y) measures the distance between the conditional densities of [X Y = y] and n+1| n [X Y = y + δ]. In many situations Y Y ∗ is easy to work with since it n+1| n k n − n kq is directly related to the data generating mechanisms.

3. Applications 3.1. Iterated random functions Consider the process (13). The existence of stationary distributions has been widely studied and there are many versions of sufficient conditions; see Diaconis and Freedman (1999), Meyn and Tweedie (1993), Jarner and Tweedie (2001), Wu and Shao (2004), among others. Here we adopt the conditions given by Diaconis and Freedman (1999). The recursion (13) has a unique stationary distribution if there exist α> 0 and x0 such that ′ α E M(x, ε) M(x , ε) Lε0 + M(x0, ε0) and [log(Lε0 )]<0, where Lε = sup | − |. | |∈L ′ x x′ x6=x | − | (18) Condition (18) also implies the geometric-moment contraction (GMC(β), see Wu and Shao (2004)) property: there exist β > 0, r (0, 1), and C < such ∈ ∞ that

E[ J(...,ε , ε , ε ,...,ε ) J(...,ε′ , ε′ , ε ,...,ε ) β] Crn (19) | −1 0 1 n − −1 0 1 n | ≤ holds for all n N, where (ε′ ) is an i.i.d. copy of (ε ). Hsing and Wu (2004) ∈ k k argued that (19) is a convenient condition for limit theorems. Doukhan (2003) gave a brief review of weak convergence of Rn under various dependence structures including strong mixing conditions. Meyn and Tweedie 320 WEI BIAO WU

(1993) discussed alpha and beta-mixing properties of Markov chains. Doukhan and Louhichi (1999) introduced an interesting weak dependence condition based on the decay rate of the covariances between the past and the future values of the process, and obtained weak convergence of Rn. Wu and Shao (2004, pp.431-432) applied Doukhan and Louhichi’s results and proved that, under (19), Rn converges weakly to a tight Gaussian process if, for some κ > 5/2, sup F (x + δ) F (x) = O[log−κ( δ −1)] as δ 0. The latter condition holds if x | − | | | → Xk has a bounded density. Note that results based on Doukhan and Louhichi’s dependence coefficients do not require that F has a density. In comparison, our Theorem 1 concerns weighted empirical processes under the assumption that conditional densities exist. They can be applied in different situations. In Section 3.1.1, we consider weak convergence of weighted empirical processes for autoregressive conditionally heteroskedastic (ARCH, Engle (1982))sequences. Berkes and Horv´ath (2001) discussed strong approximation for empirical pro- cesses of generalized ARCH sequences.

3.1.1. ARCH models Consider the model

2 2 2 Xn = εn a + b Xn−1, (20) where a and b are real parameters forq which ab = 0 and ε are i.i.d. innovations 6 i with density f . Let M(x, ε) = ε√a2 + b2x2. Then L = sup ∂M(x, ε)/∂x ε ε x | | ≤ bε . Assume that there exists β > 0 such that r := E( bε β) < 1. Then (18) | | 0 | 0| holds with α = β and M(x, ε ) M(x′, ε ) β r x x′ β. Iterations of the k n − n kβ ≤ 0| − | latter inequality imply (19) with r = r0. For more details see Wu and Shao (2004). Corollary 1. Assume E( bε β) < 1, β > 0. Let φ (0, β) and ν > 0 satisfy | 0| ∈ 2 ′ 2 ′′ 2 fε(u) w1+φ(du)+ fε(u) w3+φ(du)+ fε (u) w2−ν(du) < . (21) R | | R | | R | | ∞ Z Z Z Then R (s)(1 + s )φ/2, s R converges weakly to a tight Gaussian process. { n | | ∈ } Proof. Let t = t = a2 + b2y2. Then F (θ y) = F (θ/t), f(θ y) = f (θ/t)/t, y | ε | ε and f ′(θ y)= f ′(θ/t)/t2. By (24) of Lemma 1 and (21), we have sup f (x) < | ε p x ε ∞ and (8). Choose q > 2 such that τ = φq/2+ q/2 1 < β. Then X τ . − 1 ∈ L By Theorem 1, it remains to verify (7). Let µ = 1. Recall (16) for Hh,m(y). By considering the cases φ 1 and φ < 1 separately, we have, by (21) and the ≥ inequality ut 1+ ut (1 + u )(1 + t ), that | |≤ | |≤ | | | | 2 ∂Fε(θ/t) −1 2 2 φ−1 φ−2 wφ−1(dθ)= t fε (u)u (1 + ut ) du Ct R ∂t R | | ≤ Z Z

EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 321 holds for some constant C (0, ). Since dt/dy b , ∂F (θ y)/∂y 2w ∈ ∞ | | ≤ | | R | | | φ−1 (dθ) C(1+ y )φ−2. Similar but lengthy calculations show that ∂f(θ y)/∂y 2 ≤ | | R R | | | w (dθ) C(1 + y )φ−2 and ∂f ′(θ y)/∂y 2w (dθ) C(1 + y )−5−ν . By φ+1 ≤ | | R | | | −2−ν ≤ R | | (19), Y Y ∗ = O(rn) for some r (0, 1). By Theorem 3, simple calculations k n − n kφ R ∈ show that (7) holds. Corollary 1 allows heavy-tailed ARCH processes. Tsay (2005) argued that in certain applications it is more appropriate to assume that εk has heavy tails. Let εk have a standard Student-t distribution with degrees of freedom ν, with density 2 −(1+ν)/2 fε(u)=(1+ u /ν) cν , where cν = Γ((ν + 1)/2)/[Γ(ν/2)√νπ]. Then (21) holds if φ < 2ν. Note that ε φ if φ<ν, and consequently X φ if k ∈ L k ∈ L E( bε φ) < 1. | 0| 3.2. Linear processes ∞ Let Xt = i=0 aiεt−i, where the εk are i.i.d. random variables with mean ∞ 2 0 and finite and positive variance, and the coefficients ai satisfy i=0 ai < . P ′ ∞ Assume without loss of generality that a0 = 1. Let Fε and fε = F be the P ε distribution and density functions of εk. Then the conditional density of Xn+1 given ξ is f (x Y ), where Y = X ε (cf. (15)). n ε − n n n+1 − n+1 Corollary 2. Let γ 0. Assume ε 2+γ , sup f (u) < , and ≥ k ∈L u ε ∞ ∞ a < , (22) | n| ∞ n=1 X ′ 2 ′′ 2 fε(u) wγ (du)+ fε (u) w−γ (du) < . (23) R | | R | | ∞ Z Z Then R (s)(1 + s )γ/2, s R converges weakly to a tight Gaussian process. { n | | ∈ } Proof. Let q = 2+ γ. Since ε q, we have Y q and Y Y ∗ = 1 ∈ L 1 ∈ L k n − n kq a (ε ε′ ) = O( a ). Note that f(x y)= f (x y). Since f (θ)w (dθ) k n+1 0 − 0 kq | n+1| | ε − R ε γ = E[(1 + ε )γ ] < and sup f (u) < , we have f (u)2w (du) < . Since | 1| ∞ u ε ∞ R ε γ R ∞ 1+ v + y (1 + v )(1 + y ), | |≤ | | | | R 2 ′ 2 γ 2 ′ 2 [ fε(θ y) + fε(θ y) ]wγ (dθ) (1 + y ) [ fε(θ) + fε(θ) ]wγ (dθ) R | − | | − | ≤ | | R | | | | Z Z C(1 + y )γ . ≤ | | On the other hand,

′′ 2 −γ ′′ 2 −γ γ fε (θ y) (1 + θ ) dθ fε (θ y) (1 + θ y ) (1 + y ) dθ R | − | | | ≤ R | − | | − | | | Z Z γ ′′ 2 =(1+ y ) fε (u) w−γ (du). | | R | | Z By Theorems 1 and 3, the corollary follows. 322 WEI BIAO WU

Condition (22), together with ε 2, implies that the covariances of (X ) 1 ∈ L t are absolutely summable, a well-known condition for a linear process being short- range dependent. If (22) is violated, then we say that the linear process (Xn) is long-range dependent. In this case properties of Fn have been discussed by Ho and Hsing (1996) and Wu (2003). Remark 2. Strong mixing properties of linear processes have been discussed by Withers (1981) and Pham and Tran (1985), among others. It seems that for linear processes strong mixing conditions do not lead to results with best possible 2 ∞ 2 conditions. Let εi have a density and An = i=n ai . Withers (1981) ∈ L ∞ 1/3 showed that (X ) is strongly mixing if max(An , A log A ) < . In i n=1 P n| n| ∞ the case that a = n−δ, the latter condition requires δ > 2. In comparison, our n P p condition (22) only requires that δ > 1. More stringent conditions are required for β-mixing.

Remark 3. Let φ(t) = E exp(√ 1uε0); let fk (resp. φk) be the density (resp. k − characteristic) function of i=0 aiε−i. Doukhan (2003) mentioned the following result (see also Doukhan and Surgailis (1998)): R converges weakly if for some P n 0 < α 1 and s,C,δ > 0 such that sδ > 2α, ε s, ∞ a α < ≤ k ∈ L i=0 | i| ∞ and φ(t) C/(1 + u δ). (Doukhan noted that the proof of the latter result | | ≤ | | P is unpublished.) These results allow heavy-tailed εk. Our Corollary 2 deals with weighted empirical processes, so there are different ranges of applications. Consider the special case of Corollary 2 in which γ = 0. A careful check of ′ the proofs of Theorems 1 and 3 suggests that Corollary 2 still holds if fε in (23) is replaced by fk for some fixed k (see also Wu (2003)). Note that, by Parseval’s identity, if γ = 0, then 2π f ′′(u) 2du = tφ (t) 2dt, which is R | k | R | k | finite if φ (t) C/(1 + t2). The latter condition holds if φ(t) C/(1 + u δ) | k | ≤ R R | | ≤ | | and # i : a = 0 = . { i 6 } ∞ Remark 4. Let g be a Lipschitz continuous function such that, for some r< 1, g(x) g(x′) r x x′ holds for all x, x′ R. Consider the nonlinear model | − | ≤ | − | ∈ Xn+1 = g(Xn)+ εn+1, where εn satisfy conditions in Corollary 2. Then Xn is GMC(q) and consequently Yn = g(Xn) is also GMC(q), q =2+ γ. It is easily seen that the argument in Corollary 2 also applies to this model and that the same conclusion holds. Special cases include the threshold (Tong (1990)) X = a max(X , 0) + b min(X , 0) + ε with max( a , b ) < 1, n+1 n n n+1 | | | | and the exponential autoregressive model (Haggan and Ozaki (1981)) Xn+1 = [a + b exp( cX2)]X + ε , where a + b < 1 and c> 0. − n n n+1 | | | | 4. Inequalities The inequalities presented in this section are of independent interest and EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 323 may have wider applicability. They are used in the proofs of the results in other sections. Lemma 1. Let H be absolutely continuous. (i) If µ 1 and γ R, then ≤ ∈ ∞ ∞ sup [H2(x)(1 + x )γ ] C H2(u)w (du)+ C [H′(u)]2w (du) | | ≤ γ,µ γ−µ γ,µ γ+µ x≥M ZM ZM (24) holds for all M 0, where C is a positive constant. The above inequality also ≥ γ,µ holds if M = and sup is replaced by sup R. (ii) If γ > 0, µ = 1, and −∞ x≥M x∈ H(0) = 0, then

2 −γ 1 ′ 2 sup[H (x)(1 + x ) ] [H (u)] w−γ+1(du), (25) R | | ≤ γ R x∈ Z

2 4 ′ 2 H (u)w−γ−1(du) [H (u)] w−γ+1(du). (26) R ≤ γ2 R Z Z 2 γ −1 ′ 2 (iii) If γ > 0 and H( ) = 0, then sup R[H (x)(1 + x ) ] γ [H (u)] ±∞ x∈ | | ≤ R w (du) and H2(u)w (du) 4γ−2 [H′(u)]2w (du). γ+1 R γ−1 ≤ R γ+1 R Proof. (i) ByR Lemma 4 in Wu (2003), forR t R and δ > 0 we have ∈ 2 t+δ t+δ sup H2(s) H2(u)du + 2δ [H′(u)]2du. (27) ≤ δ t≤s≤t+δ Zt Zt We first consider the case µ< 1. Let α = 1/(1 µ). In (27) let t = t = nα and − n δ = (n + 1)α nα, n N, and I = [t ,t ]. Since lim δ /(αnα−1) = 1, n − ∈ n n n+1 n→∞ n 2 γ γ −1 2 ′ 2 sup[H (x)(1 + x) ] 2 sup(1 + x) δn H (u)du + δn [H (u)] du x∈In ≤ x∈In In In h Z Z i C H2(u)w (du)+ C [H′(u)]2w (du). (28) ≤ γ−µ γ+µ ZIn ZIn It is easily seen by (27) that (28) also holds for n = 0 by choosing a suitable C. By summing (28) over n = 0, 1,..., we obtain (24) with M = 0. The case M > 0 α can be similarly dealt with by letting tn = n + M. If µ = 1, we let t = 2n, δ = t t = t and I = [t ,t ], n = 0, 1,.... n n n+1 − n n n n n+1 The argument above yields the desired inequality. (ii) Let s 0. Since H(s)= s H′(u)du, by the Cauchy-Schwarz Inequality, ≥ 0 s R s H2(s) H′(u) 2(1 + u)1−γ du (1 + u)γ−1du ≤ 0 | | × 0 Z Z γ ′ 2 (1 + s) 1 [H (u)] w−γ+1(du) − . ≤ R × γ Z 324 WEI BIAO WU

So (25) follows. Applying Theorem 1.14 in Opic and Kufner (1990, p. 13) with p = q = 2, the Hardy-type inequality (26) easily follows. The proof of (iii) is similar to that of (ii).

Lemma 2. Let m be a measure on R, A R be a measurable set, and Tn(θ)= n ⊂ i=1 h(θ,ξi), where h is a measurable function. Then P ∞ 2 2 Tn(θ) E[Tn(θ)] m(dθ) √n 0h(θ,ξj) m(dθ). (29) s A k − k ≤ s A kP k Z Xj=0 Z Proof. For j = 0, 1,... let T (θ)= n E[h(θ,ξ ) ξ ] and λ2 = h(θ,ξ ) 2 n,j i=1 i | i−j j AkP0 j k m(dθ), λ 0. By the orthogonality of E[h(θ,ξ ) ξ ] E[h(θ,ξ ) ξ ], i = j ≥ P i | i−j − Ri | i−j−1 1, 2,...,n,

T (θ) T (θ) 2m(dθ)=n E[h(θ,ξ ) ξ ] E[h(θ,ξ ) ξ ] 2m(dθ) k n,j − n,j−1 k k 1 | 1−j − 1 | −j k ZA ZA =n h(θ,ξ ) 2m(dθ)= nλ2. kP1−j 1 k j ZA ∞ Note that Tn(θ)= Tn,0(θ). Let∆= j=0 λj. By the Cauchy-Schwarz Inequality,

P ∞ 2 2 E Tn(θ) E[Tn(θ)] m(dθ)= E [Tn,j(θ) Tn,j+1(θ)] m(dθ) A | − | A − Z Z  Xj=0  ∞ [T (θ) T (θ)]2 ∆ E n,j − n,j+1 m(dθ)=n∆2, ≤ A λj Z  Xj=0  and (29) follows. Lemma 3. Let D be Lq (q > 1) martingale differences and C = 18q3/2(q i q − 1)−1/2. Then

n D + + D r Cr D r, where r = min(q, 2). (30) k 1 · · · nkq ≤ q k ikq Xi=1 Proof. Let M = n D2. By Burkholder’s inequality, n D C M . i=1 i k i=1 ikq ≤ qk kq/2 Then (30) easily follows by considering the cases q > 2 and q 2 separately. P P ≤ Lemma 4. (Wu (2005)) Let q > 1 and Z , 1 i 2d, be random variables in i ≤ ≤ q, where d is a positive integer. Let S = Z + + Z and S∗ = max S . L n 1 · · · n n i≤n | i| Then d 2d−r 1 q ∗ q S d S r S r . (31) k 2 kq ≤ k 2 m − 2 (m−1)kq r=0 m=1 X  X  EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 325

5. Proofs of Theorems 1 and 2 To illustrate the idea behind our approach, let F˜ (x)= n−1 n F (x ξ ) n i=1 ε | i−1 be the conditional empirical distribution function and write P F (x) F (x) = [F (x) F˜ (x)] + [F˜ (x) F (x)]. (32) n − n − n n − The decomposition (32) has two important and useful properties. First, n[F (x) n − F˜n(x)] is a martingale with stationary, ergodic and bounded martingale differ- ences. Second, F˜ F is differentiable with derivative f˜ (x) f(x), where n − n − f˜ (x) = (∂/∂x)F˜ (x) = n−1 n f (x ξ ). The property of differentiability is n n i=1 ε | i useful in establishing tightness. The idea was applied in Wu and Mielniczuk P (2002). Following (32), let G (s) = n1/2[F (x) F˜ (x)] and Q (s) = n1/2[F˜ (x) n n − n n n − F (x)]. Then Rn(s)= Gn(s)+ Qn(s). Sections 5.1 and 5.2 deal with Gn and Qn, respectively. Theorems 1 and 2 are proved in Sections 5.3 and 5.4.

5.1. Analysis of Gn The main result is this section is Lemma 7 which concerns the weak conver- gence of Gn. Lemma 5. Let q > 2 and α = max(1, q/4) q/2. Then there is a constant C − q such that

q y q q α −1 2 G (y) G (x) C n [F (y) F (x)]+C (y x) 2 E[fε (u ξ )]du (33) k n − n kq ≤ q − q − | 0 Zx holds for all n N and all x

q G (y) G (x) q = n− 2 E( d + + d q) k n − n kq | 1 · · · n| Cq q Cq q′ q′ E 2 2 2 q [(d1 + + dn) ] q ( Kn q′ + Ln q′ ). (35) ≤ n 2 · · · ≤ n 2 k k k k By Lemma 3,

q′ K ′ ′ ′ ′ ′ ′ ′ n q q q −1 2 q 2 q q 2 q k k ′ ′ E ′ ′ q′ Cq D1 q Cq2 [ d1 q + (d1 ξ0) q ] Cq2 d1 q , (36) nmax(1, 2 ) ≤ k k ≤ k k k | k ≤ k k 326 WEI BIAO WU

′ ′ 2 q 2 q where we have applied Jensen’s inequality in E(d ξ ) ′ d ′ . Notice that k 1| 0 kq ≤ k 1kq d 1, | 1|≤ ′ ′ ′ ′ 2 q q q′−1 q q q′ d ′ d ′ 2 [ 1 ′ + E(1 ξ ) ′ ] 2 [F (y) F (x)]. k 1kq ≤ k 1kq ≤ k x≤Xi≤ykq k x≤Xi≤y| 0 kq ≤ − (37) Since E(d2 ξ ) E(1 ξ ) and 1/p′ + 1/q′ = 1, we have by H¨older’s In- 1| 0 ≤ x≤Xi≤y| 0 equality that

′ ′ ′ y q q q′ E 2 q q′ E Ln q′ n (d1 ξ0) q′ n fε(u ξ0)du k k ≤ k | k ≤ x | y nh Z i o q′ E q′/p′ q′ n (y x) fε (u ξ0)du . (38) ≤ − x | h Z i Combining (35), (36), (37) and (38), we have (33). To show (34), in (35) we let d = d (x)= 1 E(1 ξ ). Then i i Xi≤x − Xi≤x| i−1 q q q E( d + + d q) C n 2 d q C n 2 d 2 C n 2 F (x)[1 F (x)] | 1 · · · n| ≤ q k 1kq ≤ q k 1k ≤ q − completes the proof. Lemma 6. Let q > 2 and α = max(1, q/4) q/2. Then there exists a constant − C < such that, for all b> 0, a R and n, d N, q ∞ ∈ ∈ q E sup Gn(a + s) Gn(a) 0≤s

a+b q/2 where V = E[fε (u ξ )]du. By Lemma 4, a | 0 R d d 1 q 1 ∗ α q r 2 −1 q S d C n [F (a + b) F (a)] + C (2 h) V k 2 kq ≤ { q − } { q } r=0 r=0 X 1 X q 1 d C nα[F (a + b) F (a)] q + C (2d+1h) 2 −1V q . (41) ≤ { q − } { q } Let B = √n[F˜ (a + jh) F˜ (a + (j 1)h)]. Recall F˜ (x)= n−1 n F (x ξ ) j n − n − n i=1 ε | i−1 and f˜ (x) = F˜′ (x). Since q′ = q/2 > 1, f˜ (x) ′ f (x ξ ) ′ . Note that n n k n kq ≤ k ε | 0Pkq 0 F˜ 1, so by H¨older’s Inequality, ≤ ε ≤ 2d 2d E q E q q′ ˜ ˜ q max Bj (Bj )= n Fn(a + jh) Fn(a + (j 1)h) q j≤2d ≤ k − − k h i Xj=1 Xj=1 2d ′ q′ q n F˜ (a + jh) F˜ (a + (j 1)h) ′ ≤ k n − n − kq Xj=1 d 2 a+jh q′ q′−1 q′ q′ q′−1 n h E[ f˜n(x) ]du n h V. (42) ≤ a+(j−1)h | | ≤ Xj=1 Z Observe that s s Gn(a + h ) max Bj Gn(a + s) Gn(a + h + 1 ) +max Bj. ⌊h⌋ − j≤2d ≤ ≤ ⌊h ⌋ j≤2d

Hence (39) follows from (41), (42) and, since S = G (a + jh) G (a), j n − n s sup Gn(a + s) Gn(a) sup Gn(a + h + 1 ) Gn(a) 0≤s 2. Assume E[ X1 + log(1 + X1 )] < , and ≥ q γ | | | | ∞ (6). Then (i) E[sup R G (s) (1+ s ) ]= (1), and (ii) the process G (s)(1+ s∈ | n | | | O { n s )γ/q, s R is tight. | | ∈ } Remark 5. In Lemma 7, the term log(1 + X ) is not needed if γ > 0. | 1| Proof. (i) Without loss of generality we show that E[sup G (s) q(1+ s )γ ]= s≥0 | n | | | (1), since the case of s < 0 follows similarly. Let α = (log n)qnmax(1,q/4)−q/2. O n 328 WEI BIAO WU

By (6) and (40) of Lemma 6, with a = b = 2k,

∞ kγ k q 2 E sup Gn(s) Gn(2 ) 2k≤s<2k+1 | − | k=1 h i X ∞ ∞ k+1 q 2 q kγ k+1 k kγ k −1 2 Cq 2 αn[F (2 ) F (2 )] + Cq 2 (2 ) 2 E[fε (u ξ0)]du ≤ − 2k | Xk=1 Xk=1 Z ∞ ∞ q q γ γ −1 2 C α f(u)(1 + u) du + C (1 + u) u 2 E[fε (u ξ )]du ≤ γ,q n γ,q | 0 Z2 Z2 C α + C = (1). (43) ≤ γ,q n γ,q O ∞ kγ γ Observe that the function ℓ(x)= k=1 2 12k≤x, x> 0, is bounded by Cγ[x + log(1 + x)], where C is a constant. Then by (34) of Lemma 5, we have γ P ∞ ∞ kγ k q kγ γ 2 G (2 ) 2 CE(1 k ) CE[ X + log(1+ X )] < . (44) k n kq ≤ 2 ≤X1 ≤ | 1| | 1| ∞ Xk=1 Xk=1 Simple calculations show that (i) follows from (43), (44) and (40), with a = 0 and b = 2. (ii) It is easily seen that the argument in (i) entails

q γ lim lim sup E sup Gn(s) (1 + s ) = 0. (45) r→∞ n→∞ |s|>r | | | | h i Let δ (0, 1). For s,t [ r, r] with 0 s t δ, we have ∈ ∈ − ≤ − ≤ γ γ Gn(s)(1 + s ) q Gn(t)(1 + t ) q | | | γ − | | | γ γ (1 + s ) q [Gn(s) Gn(t)] + Gn(t)[(1 + s ) q (1 + t ) q ] ≤ | | γ| − | | | | − | | | (1 + r) q Gn(s) Gn(t) + Cr,γ,qδ sup Gn(u) . (46) ≤ | − | u∈[−r,r] | |

By (i), sup R G (u) = (1). Let I = I (δ) = [kδ, (k + 1)δ]. By Lemma 6, k u∈ | n |k O k k r ⌊ δ ⌋+1 P sup Gn(s) Gn(kδ) >ǫ s∈I | − | k=−⌊ r ⌋−1 k Xδ h i r ⌊ δ ⌋+1 q q −q −1 2 ǫ C α P(X I )+ C δ 2 E[fε (u ξ )]du ≤ q n 1 ∈ k q | 0 k=−⌊ r ⌋−1 Ik Xδ n Z o q q −q −q −1 2 ǫ Cqαn + ǫ Cqδ 2 E[fε (u ξ0)]du. ≤ R | Z EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 329

q/2 By (6), E[fε (u ξ )]du < . Hence R | 0 ∞ R −q q −1 lim sup P sup Gn(s) Gn(t) > 2ǫ ǫ Cqδ 2 , n→∞ s,t∈[−r,r], 0≤s−t≤δ | − | ≤ h i which implies the tightness of G (s), s r for fixed r. So (45) and (46) { n | | ≤ } entail (ii).

5.2. Analysis of Qn

It is relatively easier to handle Qn since it is a differentiable function. The Hardy-type inequalities (cf Lemma 1) are applicable. ′ E 2 γ′ Lemma 8. Let γ 0 and assume (7). Then (i) [sups∈R Qn(s) (1 + s ) ]= ≥ ′ | | | | (1), and (ii) the process Q (s)(1 + s )γ /2, s R is tight. O { n | | ∈ } Proof. Let r 0 and recall 0 µ 1. (i) By Lemma 1, ≥ ≤ ≤

2 γ′ 2 ′ 2 Λ := sup[Q (s)(1 + s ) ] C Q (s)w ′ (ds)+C [Q (s)] w ′ (ds) r n | | ≤ n γ −µ n γ +µ |s|≥r Z|s|≥r Z|s|≥r holds for some constant C = Cγ′,µ. By Lemma 1,

1 2 ∞ ∞ Λr 2 2 k k 0Fε(θ ξj) wγ′−µ(dθ)+ 0fε(θ ξj) wγ′+µ(dθ). (47) √C ≤ s |s|≥kPr | k s |s|≥Pr k | k Xj=0 Z Xj=0 Z So (i) follows by letting r = 0 in (47). (ii) The argument in (ii) of Lemma 7 in applicable. Let 0 <δ< 1. Then

′ ′ γ γ Ψn,r(δ):= sup Qn(s)(1 + s ) 2 Qn(t)(1 + t ) 2 s,t∈[−r,r], 0≤s−t≤δ | | | − | | | ′ γ sup (1 + s ) 2 [Qn(s) Qn(t)] ≤ s,t∈[−r,r], 0≤s−t≤δ | | | − | ′ ′ γ γ + sup Qn(t)[(1 + s ) 2 (1 + t ) 2 ] s,t∈[−r,r], 0≤s−t≤δ | | | − | | |

′ ′ ′ Cr,γ δ sup Qn(u) + Cr,γ δ sup Qn(u) . ≤ u∈[−r,r] | | u∈[−r,r] | |

By (27), Lemma 2, and (7), there exists a constant C = C(r, γ′,µ,ν) such that

r r ′ 2 ′ 2 ′′ 2 E sup Q (s) C Q (s) w ′ (ds)+ C Q (s) w (ds) | n | ≤ k n k γ +µ k n k −ν |s|≤r Z−r Z−r h i 2 2 ′ C ′ σ (f , w ′ )+ C ′ σ (f , w )= (1). ≤ γ ε γ +µ γ ε −ν O 330 WEI BIAO WU

By (i), there exists C < such that for all n N, E[Ψ2 (δ)] δ2C . Notice 1 ∞ ∈ n,r ≤ 1 that the upper bound in (47) goes to 0 as r . Hence (ii) obtains. →∞ 5.3. Proof of Theorem 1. We need to verify finite-dimensional convergence and tightness. Let j 0. ≥ Observe that (∂/∂θ) F (θ ξ ) = f (θ ξ ) and F (θ ξ ) = 1 . P0 ε | j P0 ε | j kP0 ε | j k kP0 Xj+1≤θk By Lemma 1(i)

2 2 sup 0Fε(θ ξj) C 0Fε(θ,ξj) w−µ(dθ)+C 0fε(θ,ξj) wµ(dθ) R kP | k≤ R kP k R kP k θ∈ sZ sZ which, by (7) and (5), implies ∞ 1 < . Hence by Theorem 1(i) in i=0 kP0 Xi≤θk ∞ Hannan (1973), Rn(θ) is asymptotically normal. The finite-dimensional conver- P gence easily follows. Since Rn(s) = Gn(s)+ Qn(s), the tightness and (i) follow from Lemmas 7 and 8. Since E[f (u ξ )] = f(u), (8) and the moment condition X γ+q/2−1 ε | 0 1 ∈ L imply (6).

5.4. Proof of Theorem 2. Let Θ (a, δ) = sup G (a + s) G (a) and α = max(1, q/4) q/2. n 0≤s<δ | n − n | − q α q/2−1 Note that (log n) n = (δn ). By (40) of Lemma 6, we have, uniformly in O a, that a+δ q −1 q n E q q α 2 2 −1 Θn(a, δn) Cq(log n) n [F (a + δn) F (a)] + Cqδn τ f(u)du ≤ − a h i q Z 2 −1 Cδn [F (a + δ ) F (a)]. ≤ n − γ Here the constant C only depends on τ, γ, q and E( X1 ). Hence q| | γ q γ 2 −1 (1 + kδ ) E Θ (kδ , δ ) (1 + kδ ) Cδn [F (kδ + δ ) F (kδ )] | n| n n n ≤ | n| n n − n k∈Z k∈Z X h i Xq 2 −1 γ Cδn E[(1 + X ) ]. ≤ | 1| Let I (δ) = [kδ, (1+k)δ] and c = sup [(1+ t )/(1+ u )]. Then c 2 since k δ |u−t|≤δ | | | | δ ≤ δ < 1/2. By the inequality G (a) G (c) G (a) G (b) + G (b) G (c) , | n − n | ≤ | n − n | | n − n | sup Gn(t + s) Gn(t) 2 sup Gn(kδn + u) Gn(kδn) . t∈Ik(δn), 0≤s<δn | − |≤ 0≤u<2δn | − | Therefore, E γ q E γ q sup(1 + t ) Θn(t, δn) sup (1 + t ) Θn(t, δn) R t∈ | | ≤ Z t∈Ik(δn) | | h i Xk∈ h i q −1 2qcγ (1 + kδ )γ E Θq (kδ , 2δ ) Cδ 2 . δn n n n n n ≤ Z | | ≤ Xk∈ h i EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 331

Note that Rn(s)= Gn(s)+ Qn(s). Then (12) follows if it holds with Rn replaced by Gn and Qn, respectively. The former is an easy consequence of the preceding inequality and Jensen’s Inequality. To show that (12) holds with Rn replaced by ′ Qn, recall γ = 2γ/q. By (24) of Lemma 1 and Lemma 2,

′ E γ ′ 2 ′ 2 ′ ′′ 2 ′ sup(1 + x ) Qn(x) C Qn(x) wγ (dx)+ C Qn(x) wγ (dx) R | | | | ≤ R k k R k k x∈ Z Z h i 2 2 ′ Cσ (f , w ′ )+ Cσ (f , w ′ ) < , ≤ ε γ ε γ ∞ which completes the proof in view of the fact that

γ′ 2 2 γ′ ′ 2 (1 + t ) sup Qn(t + s) Qn(t) δn(1 + t ) sup Qn(t + s) | | |s|≤δn | − | ≤ | | |s|≤δn | | ′ ′ cγ δ2 sup [(1 + u )γ Q′ (u) 2]. δn n n ≤ u:|u−t|≤δn | | | |

Remark 6. It is worthwhile to note that the modulus of continuity of Gn has 1−2/q the order δn , while that of Qn has the higher order δn.

6. Proof of Theorem 3.

(i) If (17) holds, then σ(h, m) 2Ξh,m. To prove (17), let Zn(θ)= hn(θ,ξ0) ≤ ∗ − h (θ,ξ∗), h(θ,Y )= h(θ,ξ ), and V (θ)= h(θ,Y ∗) h(θ,Y )= Yn ∂ h(θ,y)dy. n 0 n n n n n Yn ∂y ∗ − Let λ(y) = [H (y)]1/2 and U = Yn λ(y)dy. By the Cauchy-Schwarz Inequality, h,m Yn R

∗ ∗ Yn R 2 Yn 2 1 ∂ 2 Vn (θ)m(dθ) h(θ,y) dy λ(y)dy m(dθ)= U . R ≤ R Yn λ(y) ∂y × Yn Z Z h Z Z i

Note that E[h(θ,Y ) ξ ] = E[h(θ,Y ∗) ξ ]. By (4), Z (θ) 2 h(θ,Y ) n −1 n 0 n ∗ 0 n | | k kY ≤n kP k ≤ 2 Vn(θ) . So we have (17). (ii) Let δ = q/2 1 and W = wδ(dy). If q < 0, k k − Yn then w (dy) = 4/q and W min( Y Y ∗ , 4/q). If δ > 0, by H¨older’s R δ − | | ≤ | n − n | − R Inequality, R W [(1 + Y )δ +(1+ Y ∗ )δ](Y Y ∗) k k ≤ k | n| | n | n − n k (1 + Y )δ +(1+ Y ∗ )δ δ Y Y ∗ = O( Y Y ∗ ). ≤ k | n| | n | kqk n − n kq k n − n kq For the case 1 < δ 0, we need to prove the inequality u w (dy) 2−δ u − ≤ | v δ |≤ | − v 1+δ/(1 + δ). For the latter, it suffices to consider cases (a) u v 0, and (b) | R ≥ ≥ u 0 v. For (a), ≥ ≥ u (1 + u)1+δ (1 + v)1+δ (u v)1+δ w (dy)= − − . δ 1+ δ ≤ 1+ δ Zv 332 WEI BIAO WU

For (b), let t = (u v)/2. Then − u (1 + u)1+δ 1+(1+ v )1+δ 1 2(1 + t)1+δ 2 2t1+δ w (dy)= − | | − − . δ 1+ δ ≤ 1+ δ ≤ 1+ δ Zv ∗ δ+1 ∗ q/2 Therefore W = O( Y Y )= O( Y Y q ). k k k| n − n | k k n − n k Acknowledgements The author is grateful to three referees for their valuable comments. The author also thanks Professors S´andor Cs¨org˝oand Jan Mielniczuk for useful sug- gestions. The work was supported in part by NSF grant DMS-0478704.

References

Andrews, D. W. K. and Pollard, D. (1994). An introduction to functional central limit theorems for dependent stochastic processes. Internat. Statist. Rev. 62, 119-132. Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York. Berkes, I. and Horv´ath, L. (2001). Strong approximation of the empirical process of GARCH sequences. Ann. Appl. Probab. 11, 789-809. Chow, Y. S. and Teicher, H. (1988). . 2nd edition. Springer, New York. Cs¨org˝o, M., Cs¨org˝o, S., Horv´ath, L. and Mason, D. M. (1986). Weighted empirical and quantile processes. Ann. Probab. 14, 31-85. Cs¨org˝o, M. and Yu, H. (1996). Weak approximations for quantile processes of stationary se- quences. Canad. J. Statist. 24, 403-430. Cs¨org˝o, S. and Mielniczuk, J. (1996). The empirical process of a short-range dependent station- ary sequence under Gaussian subordination. Probab. Theory Related Fields 104, 15-25. Dehling, H., Mikosch, T. and Sørensen, M. (eds), (2002). Empirical Process Techniques for Dependent Data. Birkh¨auser, Boston. Dehling, H. and Taqqu, M. S. (1989). The empirical process of some long-range dependent sequences with an application to U-statistics. Ann. Statist. 17, 1767-1786. Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Rev. 41, 41-76. Doukhan, P. (2003). Models, inequalities, and limit theorems for stationary sequences. In Theory and applications of long-range dependence (Edited by P. Doukhan, G. Oppenheim and M. S. Taqqu), 43-100, Birkh¨auser, Boston. Doukhan, P. and Louhichi, S. (1999). A new weak dependence condition and applications to moment inequalities. . Appl. 84, 313-342. Doukhan, P., Massart, P. and Rio, E. (1995). Invariance principles for absolutely regular em- pirical processes. Ann. Inst. H. Poincar´eProbab. Statist. 31, 393-427. Doukhan, P. and Surgailis, D. (1998). Functional for the empirical process of short memory linear processes. C. R. Acad. Sci. Paris Ser. I Math. 326, 87-92. Einmahl, J. H. J. and Mason, D. M. (1988). Strong limit theorems for weighted quantile pro- cesses. Ann. Probab. 16, 1623-1643. Elton, J. H. (1990). A multiplicative ergodic theorem for Lipschitz maps. Stochastic Process. Appl. 34, 39-47. Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987-1007. EMPIRICAL PROCESSES OF STATIONARY SEQUENCES 333

Fan, J. and Yao, Q. (2003). Nonlinear Time Series: Nonparametric and Parametric Methods. Springer, New York. Gastwirth, J. L. and Rubin, H. (1975). The asymptotic distribution theory of the empiric cdf for mixing stochastic processes. Ann. Statist. 3, 809-824. Haggan, V. and Ozaki, T. (1981). Modelling nonlinear random vibrations using an amplitude- dependent autoregressive time series model. Biometrika 68, 189-196. Hannan, E. J. (1973). Central limit theorems for time series regression. Z. Wahrsch. Verw. Gebiete 26, 157-170. Ho, H. C. and Hsing, T. (1996). On the asymptotic expansion of the empirical process of long- memory moving averages. Ann. Statist. 24, 992-1024. Hsing, T. and Wu, W. B. (2004). On weighted U-statistics for stationary processes. Ann. Probab. 32, 1600-1631. Jarner, S. and Tweedie, R. (2001). Locally contracting iterated random functions and stability of Markov chains. J. Appl. Probab. 38, 494-507. Mehra, K. L. and Rao, M. S. (1975). Weak convergence of generalized empirical processes relative to dq under strong mixing. Ann. Probab. 3, 979-991. Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer, London. Opic, B. and Kufner, A. (1990). Hardy-type Inequalities. Longman Scientific & Technical. Wiley, New York. Pham, T. D. and Tran, L. T. (1985). Some mixing properties of time series models. Stochastic Process. Appl. 19, 297-303. Rio, E. (2000). Theorie Asymptotique des Processus Aleatoires Faiblement Dependants. Math´ematiques et Applications 31. Springer, Berlin. Shao, Q. M. and Yu, H. (1996). Weak convergence for weighted empirical processes of dependent sequences. Ann. Probab. 24, 2098-2127. Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York. Tong, H. (1990). Non-linear Time Series: A Dynamical System Approach. Oxford University Press, Oxford. Tsay, R. S. (2005). Analysis of Financial Time Series. Wiley, New York. Withers, C. S. (1975). Convergence of empirical processes of mixing rv’s on [0, 1]. Ann. Statist. 3, 1101-1108. Withers, C. S. (1981). Conditions for linear process to be strongly mixing. Z. Wahrsch. Verw. Gebiete 57, 477-480. Wu, W. B. (2003). Empirical processes of long-memory sequences. Bernoulli 9, 809-831. Wu, W. B. (2005). A strong convergence theory for stationary processes. Preprint. Wu, W. B. and Mielniczuk, J. (2002). Kernel density estimation for linear processes. Ann. Statist. 30, 1441-1459. Wu, W. B. and Shao, X. (2004). Limit theorems for iterated random functions. J. Appl. Probab. 41, 425-436.

Department of Statistics, The University of Chicago, 5734 S. University Avenue, Chicago, IL 60637, U.S.A. E-mail: [email protected]

(Received September 2005; accepted May 2006)