
Mixing properties of ARCH and time-varying ARCH processes (technical report)

Piotr Fryzlewicz and Suhasini Subba Rao

Abstract

There exist very few results on mixing for nonstationary processes. However, mixing is often required in statistical inference for nonstationary processes, such as time-varying ARCH (tvARCH) models. In this paper, bounds for the mixing rates of a stochastic process are derived in terms of the conditional densities of the process. These bounds are used to obtain the $\alpha$-mixing, 2-mixing and $\beta$-mixing rates of the nonstationary time-varying ARCH($p$) process and the ARCH($\infty$) process. It is shown that the mixing rate of the time-varying ARCH($p$) process is geometric, whereas the bound on the mixing rate of the ARCH($\infty$) process depends on the rate of decay of the ARCH($\infty$) parameters. We mention that the methodology given in this paper is applicable to other processes.

Key words: Absolutely regular ($\beta$-mixing), ARCH($\infty$), conditional densities, time-varying ARCH, strong mixing ($\alpha$-mixing), 2-mixing.

2000 Subject Classification: 62M10, 60G99, 60K99.

1 Introduction

Mixing is a measure of dependence between elements of a random sequence that has a wide range of theoretical applications (see Bradley (2007) and below). One of the most popular mixing measures is $\alpha$-mixing (also called strong mixing), where the $\alpha$-mixing rate of the nonstationary stochastic process $\{X_t\}$ is defined as a sequence of coefficients $\alpha(k)$ such that

$$\alpha(k) = \sup_{t\in\mathbb{Z}}\ \sup_{\substack{H\in\sigma(X_t,X_{t-1},\ldots)\\ G\in\sigma(X_{t+k},X_{t+k+1},\ldots)}} |P(G\cap H) - P(G)P(H)|. \qquad (1)$$

$\{X_t\}$ is called $\alpha$-mixing if $\alpha(k)\to 0$ as $k\to\infty$. If $\{\alpha(k)\}$ decays sufficiently fast to zero as $k\to\infty$, then, amongst other results, it is possible to show asymptotic normality of sums of $\{X_k\}$ (cf. Davidson (1994), Chapter 24), as well as exponential inequalities for such sums (cf. Bosq (1998)). The notion of 2-mixing is related to strong mixing, but is a weaker condition, as it measures the dependence between two random variables and not the entire tails. 2-mixing is often used in statistical inference, for example in deriving rates in nonparametric regression (see Bosq (1998)). The 2-mixing rate can be used to derive bounds for the covariance between functions of random variables, say $\mathrm{cov}(g(X_t), g(X_{t+k}))$ (see Ibragimov (1962)), which is usually not possible when only the correlation structure of $\{X_k\}$ is known. The 2-mixing rate of $\{X_k\}$ is defined as a sequence $\tilde\alpha(k)$ which satisfies

$$\tilde\alpha(k) = \sup_{t\in\mathbb{Z}}\ \sup_{\substack{H\in\sigma(X_t)\\ G\in\sigma(X_{t+k})}} |P(G\cap H) - P(G)P(H)|. \qquad (2)$$

It is clear that $\tilde\alpha(k) \le \alpha(k)$. A closely related mixing measure, introduced in Volkonskii and Rozanov (1959), is $\beta$-mixing (also called absolute regularity). The $\beta$-mixing rate of the stochastic process $\{X_t\}$ is defined as a sequence of coefficients $\beta(k)$ such that

$$\beta(k) = \sup_{t\in\mathbb{Z}}\ \sup_{\substack{\{H_j\}\subset\sigma(X_t,X_{t-1},\ldots)\\ \{G_i\}\subset\sigma(X_{t+k},X_{t+k+1},\ldots)}} \sum_i\sum_j |P(G_i\cap H_j) - P(G_i)P(H_j)|, \qquad (3)$$

where $\{G_i\}$ and $\{H_j\}$ are finite partitions of the sample space $\Omega$. $\{X_t\}$ is called $\beta$-mixing if $\beta(k)\to 0$ as $k\to\infty$. It can be seen that this measure is slightly stronger than $\alpha$-mixing (since an upper bound for $\beta(k)$ immediately gives a bound for $\alpha(k)$; $\alpha(k)\le\beta(k)$). Despite the versatility of mixing, its main drawback is that in general it is difficult to derive bounds for $\alpha(k)$, $\tilde\alpha(k)$ and $\beta(k)$. However, the mixing bounds of some processes are known. Chanda (1974), Gorodetskii (1977), Athreya and Pantula (1986) and Pham and Tran (1985) show strong mixing of the MA($\infty$) process. Feigin and Tweedie (1985) and Pham (1986) have shown geometric ergodicity of bilinear processes (we note that stationary geometrically ergodic Markov chains are geometrically $\alpha$-mixing, 2-mixing and $\beta$-mixing; see, for example, Francq and Zakoïan (2006)). More recently, Tjostheim (1990) and Mokkadem (1990) have shown geometric ergodicity for a general class of Markovian processes. The results in Mokkadem (1990) have been applied in Bousamma (1998) to show geometric ergodicity of stationary ARCH($p$) and GARCH($p,q$) processes, where $p$ and $q$ are finite integers. Related results on mixing for GARCH($p,q$) processes

can be found in Carrasco and Chen (2002), Liebscher (2005), Sorokin (2006), Lindner (2008) (for an excellent review), Francq and Zakoïan (2006) and Meitz and Saikkonen (2008) (where mixing of 'nonlinear' GARCH($p,q$) processes is also considered). Most of these results are proven by verifying the Meyn-Tweedie conditions (see Feigin and Tweedie (1985) and Meyn and Tweedie (1993)) and, as mentioned above, are derived under the premise that the process is stationary (or asymptotically stationary) and Markovian. Clearly, if a process is nonstationary, then the aforementioned results do not hold. Therefore, for nonstationary processes, an alternative method to prove mixing is required.
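None of the suprema in (1)-(3) is computable in closed form for ARCH-type processes, but for a toy two-state stationary Markov chain the 2-mixing coefficient (2) can be evaluated exactly by brute force over the finitely many events in each single-coordinate sigma-algebra. The following sketch is our own illustration (arbitrary transition probabilities, not from the paper) of the kind of geometric decay that later sections establish for ARCH-type processes:

```python
from itertools import product

# Two-state stationary Markov chain: transition matrix P, stationary pi.
p, q = 0.2, 0.3                      # P(0 -> 1) and P(1 -> 0); illustrative values
P = [[1 - p, p], [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]      # stationary distribution

def k_step(P, k):
    """k-step transition matrix P^k (2x2, by repeated multiplication)."""
    M = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        M = [[sum(M[i][l] * P[l][j] for l in range(2)) for j in range(2)]
             for i in range(2)]
    return M

def two_mixing(k):
    """2-mixing coefficient over sigma(X_0) and sigma(X_k):
    brute force over all events (subsets of {0,1}) for each coordinate."""
    Pk = k_step(P, k)
    joint = [[pi[i] * Pk[i][j] for j in range(2)] for i in range(2)]
    best = 0.0
    for H, G in product([[], [0], [1], [0, 1]], repeat=2):
        pH = sum(pi[i] for i in H)
        pG = sum(pi[0] * Pk[0][j] + pi[1] * Pk[1][j] for j in G)
        pGH = sum(joint[i][j] for i in H for j in G)
        best = max(best, abs(pGH - pH * pG))
    return best

lam = 1 - p - q                      # second eigenvalue of P, drives the decay
vals = [two_mixing(k) for k in (1, 2, 3, 4)]
assert all(v1 > v2 for v1, v2 in zip(vals, vals[1:]))   # geometric decay in k
```

For a two-state chain the optimal events are singletons, and the coefficient equals $\pi_0\pi_1\lambda^k$ with $\lambda$ the second eigenvalue, so the decay is exactly geometric.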

The main aim of this paper is to derive a bound for (1), (2) and (3) in terms of the densities of the process, plus an additional term, which is an extremal probability. These bounds can be applied to various processes. In this paper, we will focus on ARCH-type processes and use the bounds to derive mixing rates for time-varying ARCH($p$) (tvARCH) and ARCH($\infty$) processes. The ARCH family of processes is widely used in finance to model the evolution of returns on financial instruments; we refer the reader to the review article of Giraitis et al. (2005) for a comprehensive overview of the mathematical properties of ARCH processes and a list of further references. It is worth mentioning that Hörmann (2008) and Berkes et al. (2008) have considered a different type of dependence, namely a version of the $m$-dependence moment measure, for ARCH-type processes. The stationary GARCH($p,q$) model tends to be the benchmark financial model. However, in certain situations it may not be the most appropriate model; for example, it cannot adequately explain the long memory seen in the data or adapt to shifts in the world economy. Therefore, attention has recently been paid to tvARCH models (see, for example, Mikosch and Starica (2003), Dahlhaus and Subba Rao (2006), Fryzlewicz et al. (2008) and Fryzlewicz and Subba Rao (2008)) and ARCH($\infty$) models (see Robinson (1991), Giraitis et al. (2000), Giraitis and Robinson (2001) and Subba Rao (2006)). The derivations of the sampling properties in some of the above-mentioned papers rely on quite sophisticated assumptions on the dependence structure, in particular the mixing properties of the process.

We will show that, due to the $p$-Markovian nature of the time-varying ARCH($p$) process, the $\alpha$-mixing, 2-mixing and $\beta$-mixing bounds have the same geometric rate. The story is different for ARCH($\infty$) processes, where the mixing rates can differ and vary according to the rate of decay of the parameters. An advantage of the approach advocated in this paper is that these methods can readily be used to establish mixing rates of several time series models. This is especially useful in time series analysis, for example in change point detection schemes for nonlinear

time series, where strong mixing of the underlying process is often required. The price we pay for the flexibility of our approach is that the assumptions under which we work are slightly stronger than the standard assumptions required to prove geometric mixing of the stationary GARCH process. However, the conditions do not rely on proving irreducibility of the underlying process (which is usually required when showing geometric ergodicity), and which can be difficult to verify.

In Section 2 we derive a bound on the mixing rate of a general stochastic process in terms of differences of conditional densities. In Section 3 we derive mixing bounds for time-varying ARCH($p$) processes (where $p$ is finite). In Section 4 we derive mixing bounds for ARCH($\infty$) processes. Proofs which are not in the main body of the paper can be found in the appendix.

2 Some mixing inequalities for general processes

2.1 Notation

For $k > 0$, let $X_t^{t-k} = (X_t, \ldots, X_{t-k})$; if $k \le 0$, then $X_t^{t-k} = 0$. Let $y_s = (y_0, \ldots, y_s)$. Let $\|\cdot\|_1$ denote the $\ell_1$-norm. Let $\Omega$ denote the sample space. The sigma-algebra generated by $X_t, \ldots, X_{t+r}$ is denoted by $\mathcal{F}_{t+r}^{t} = \sigma(X_t, \ldots, X_{t+r})$.

2.2 Some mixing inequalities

Let us suppose that $\{X_t\}$ is an arbitrary stochastic process. In this section we derive some bounds for $\alpha(k)$, $\tilde\alpha(k)$ and $\beta(k)$. To do this we will consider bounds for

$$\sup_{H\in\mathcal{F}_t^{t-r_1},\, G\in\mathcal{F}_{t+k+r_2}^{t+k}} |P(G\cap H) - P(G)P(H)| \quad\text{and}\quad \sup_{\{H_j\}\subset\mathcal{F}_t^{t-r_1},\, \{G_i\}\subset\mathcal{F}_{t+k+r_2}^{t+k}}\ \sum_{i,j} |P(G_i\cap H_j) - P(G_i)P(H_j)|,$$

where $r_1, r_2 \ge 0$ and $\{G_i\}$ and $\{H_j\}$ are partitions of $\Omega$. In the proposition below, we give a bound for the mixing rate in terms of conditional densities. Similar bounds for linear processes have been derived in Chanda (1974) and Gorodetskii (1977) (see also Davidson (1994), Chapter 14). However, the bounds in Proposition 2.1 apply to any stochastic process, and it is this generality that allows us to use the result in later sections, where we derive mixing rates for ARCH-type processes.

Proposition 2.1 Let us suppose that the conditional density of $X_{t+k+r_2}^{t+k}$ given $X_t^{t-r_1}$ exists, and denote it by $f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}$. For $\delta = (\delta_0, \ldots, \delta_{r_1}) \in (\mathbb{R}^+)^{r_1+1}$, define the set

$$E = \{\omega :\ X_t^{t-r_1}(\omega) \in \mathcal{E}\}, \quad\text{where } \mathcal{E} = \{\eta = (\eta_0, \ldots, \eta_{r_1}) :\ |\eta_j| \le \delta_j \text{ for all } j\}. \qquad (4)$$

Then for all $r_1, r_2 \ge 0$ and $\delta$ we have

$$\sup_{H\in\mathcal{F}_t^{t-r_1},\, G\in\mathcal{F}_{t+k+r_2}^{t+k}} |P(G\cap H) - P(G)P(H)| \le 2\sup_{x\in\mathcal{E}} \int_{\mathbb{R}^{r_2+1}} \big| f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x) - f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0) \big|\, dy + 4P(E^c), \qquad (5)$$

and

$$\sup_{\{H_j\}\subset\mathcal{F}_t^{t-r_1},\, \{G_i\}\subset\mathcal{F}_{t+k+r_2}^{t+k}}\ \sum_{i,j} |P(G_i\cap H_j) - P(G_i)P(H_j)| \le 2\int_{\mathbb{R}^{r_2+1}} \sup_{x\in\mathcal{E}} \big| f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x) - f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0) \big|\, dy + 4P(E^c), \qquad (6)$$

where $\{G_i\}$ and $\{H_j\}$ are finite partitions of $\Omega$. Let $W_{t+k-1}^{t+1}$ be a random vector that is independent of $X_t^{t-r_1}$, and let $f_{W_{t+k-1}^{t+1}}$ denote the density of $W_{t+k-1}^{t+1}$; then we have

$$\sup_{H\in\mathcal{F}_t^{t-r_1},\, G\in\mathcal{F}_{t+k+r_2}^{t+k}} |P(G\cap H) - P(G)P(H)| \le 2\sum_{s=0}^{r_2} \sup_{x\in\mathcal{E}} \int f_W(w) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s} \mathcal{D}_{s,k,t}(y_s|y_{s-1},w,x)\, dy_s \Big\}\, dw + 4P(E^c) \qquad (7)$$

and

$$\sup_{\{H_j\}\subset\mathcal{F}_t^{t-r_1},\, \{G_i\}\subset\mathcal{F}_{t+k+r_2}^{t+k}}\ \sum_{i,j} |P(G_i\cap H_j) - P(G_i)P(H_j)| \le 2\sum_{s=0}^{r_2} \int f_W(w) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s}\sup_{x\in\mathcal{E}} \mathcal{D}_{s,k,t}(y_s|y_{s-1},w,x)\, dy_s \Big\}\, dw + 4P(E^c), \qquad (8)$$

where $\mathcal{D}_{0,k,t}(y_0|y_{-1},w,x) = \big| f_{0,k,t}(y_0|w,x) - f_{0,k,t}(y_0|w,0) \big|$ and, for $s \ge 1$,

$$\mathcal{D}_{s,k,t}(y_s|y_{s-1},w,x) = \big| f_{s,k,t}(y_s|y_{s-1},w,x) - f_{s,k,t}(y_s|y_{s-1},w,0) \big|, \qquad (9)$$

with the conditional density of $X_{t+k}$ given $(W_{t+k-1}^{t+1}, X_t^{t-r_1})$ denoted by $f_{0,k,t}$ and the conditional density of $X_{t+k+s}$ given $(X_{t+k+s-1}^{t+k}, W_{t+k-1}^{t+1}, X_t^{t-r_1})$ denoted by $f_{s,k,t}$, where $x = (x_0, \ldots, x_{r_1})$ and $w = (w_{k-1}, \ldots, w_1)$.

PROOF. In Appendix A.1. □

Since the above bounds hold for all vectors $\delta \in (\mathbb{R}^+)^{r_1+1}$ (note that $\delta$ defines the set $E$; see (4)), by choosing the $\delta$ which balances the integral and $P(E^c)$ we obtain an upper bound for the mixing rate.

The main application of the inequality in (7) is to processes which are 'driven' by the innovations (for example, linear and ARCH-type processes). If $W_{t+k-1}^{t+1}$ is the innovation process, it can often be shown that the conditional density of $X_{t+k+s}$ given $(X_{t+k+s-1}^{t+k}, W_{t+k-1}^{t+1}, X_t^{t-r_1})$ can be written as a function of the innovation density. Deriving the density of $X_{t+k+s}$ given $(X_{t+k+s-1}^{t+k}, W_{t+k-1}^{t+1}, X_t^{t-r_1})$ is not a trivial task, but it is often possible. In the subsequent sections we apply Proposition 2.1 to obtain bounds for the mixing rates.

The proof of Proposition 2.1 can be found in the appendix, but we give a flavour of it here. Let

$$H = \{\omega :\ X_t^{t-r_1}(\omega) \in \mathcal{H}\}, \qquad G = \{\omega :\ X_{t+k+r_2}^{t+k}(\omega) \in \mathcal{G}\}. \qquad (10)$$

It is straightforward to show that $|P(G\cap H) - P(G)P(H)| \le |P(G\cap H\cap E) - P(G\cap E)P(H)| + 2P(E^c)$. The advantage of this decomposition is that when we restrict $X_t^{t-r_1}$ to the set $\mathcal{E}$ (i.e. excluding large values of $X_t^{t-r_1}$), we can obtain a bound for $|P(G\cap H\cap E) - P(G\cap E)P(H)|$. More precisely, by using the inequality

$$\inf_{x\in\mathcal{E}} P\big(G \,\big|\, X_t^{t-r_1} = x\big)\, P(H\cap E) \ \le\ P(G\cap H\cap E) \ \le\ \sup_{x\in\mathcal{E}} P\big(G \,\big|\, X_t^{t-r_1} = x\big)\, P(H\cap E),$$

we can derive upper and lower bounds for $P(G\cap H\cap E) - P(G\cap E)P(H)$ which depend only on $\mathcal{E}$ and not on $H$ and $G$, and thus obtain the bounds in Proposition 2.1.

It is worth mentioning that, by using (7), one can establish mixing rates for time-varying linear processes (such as the tvMA($\infty$) process considered in Dahlhaus and Polonik (2006)). Using (7) and techniques similar to those used in Section 4, mixing bounds can be obtained for the tvMA($\infty$) process. In the following sections we derive the mixing rates for ARCH-type processes, where one of the challenging aspects of the proof is establishing a bound for the integral difference in (9).

3 Mixing for the time-varying ARCH(p) process

3.1 The tvARCH process

In Fryzlewicz et al. (2008) we show that the tvARCH process can be used to explain the commonly observed stylised facts in financial time series (such as the empirical long memory). A sequence of random variables $\{X_t\}$ is said to come from a time-varying ARCH($p$) process if it satisfies the representation

$$X_t = Z_t \Big( a_0(t) + \sum_{j=1}^{p} a_j(t) X_{t-j} \Big), \qquad (11)$$

where $\{Z_t\}$ are independent, identically distributed (iid) positive random variables with $E(Z_t) = 1$, and the $a_j(\cdot)$ are positive parameters. It is worth comparing (11) with the tvARCH process used in the statistical literature. Unlike the tvARCH process considered in, for example, Dahlhaus and Subba Rao (2006) and Fryzlewicz et al. (2008), we have not placed any smoothness conditions on the time-varying parameters $\{a_j(\cdot)\}$. The smoothness conditions assumed in Dahlhaus and Subba Rao (2006) and Fryzlewicz et al. (2008) are used in order to do parameter estimation. However, in this paper we are dealing with mixing of the process, which does not require such strong assumptions. The assumptions that we require are stated below.

Assumption 3.1 (i) For some $\eta > 0$, $\sup_{t\in\mathbb{Z}} \sum_{j=1}^{p} a_j(t) \le 1 - \eta$.

(ii) $\inf_{t\in\mathbb{Z}} a_0(t) > 0$ and $\sup_{t\in\mathbb{Z}} a_0(t) < \infty$.

(iii) Let $f_Z$ denote the density of $Z_t$. For all $a > 0$ we have $\int |f_Z(u) - f_Z(u[1+a])|\, du \le K a$, for some finite $K$ independent of $a$.

(iv) Let $f_Z$ denote the density of $Z_t$. For all $a > 0$ we have $\int \sup_{0\le\nu\le a} |f_Z(u) - f_Z(u[1+\nu])|\, du \le K a$, for some finite $K$ independent of $a$.
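For intuition, the tvARCH recursion (11) is straightforward to simulate. The sketch below generates a tvARCH(1) path using squared standard normal innovations (so that $E(Z_t) = 1$ and $Z_t > 0$) and illustrative parameter functions chosen to satisfy Assumption 3.1(i,ii); none of the numerical choices come from the paper.

```python
import math
import random

def simulate_tvarch1(n, seed=0):
    """Simulate a tvARCH(1) path X_t = Z_t * (a0(t) + a1(t) * X_{t-1}).

    Z_t are iid squared standard normals (positive, E(Z_t) = 1). The
    slowly varying parameter functions a0, a1 are illustrative choices:
    a0 is bounded away from 0 and infinity (Assumption 3.1(ii)) and
    sup_t a1(t) = 0.4 <= 1 - eta (Assumption 3.1(i))."""
    rng = random.Random(seed)
    a0 = lambda t: 0.5 + 0.2 * math.sin(2 * math.pi * t / n)
    a1 = lambda t: 0.3 + 0.1 * math.cos(2 * math.pi * t / n)
    x_prev, path = a0(0), []
    for t in range(n):
        z = rng.gauss(0.0, 1.0) ** 2        # Z_t > 0 with E(Z_t) = 1
        x = z * (a0(t) + a1(t) * x_prev)
        path.append(x)
        x_prev = x
    return path

path = simulate_tvarch1(1000)
assert all(v >= 0 for v in path)            # the process is nonnegative
```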

We note that Assumption 3.1(i,ii) guarantees that the ARCH process has a Volterra expansion as a solution (see Dahlhaus and Subba Rao (2006), Section 5). Assumption 3.1(iii,iv) is a type of Lipschitz condition on the density function and is satisfied by various well-known distributions, including the chi-squared distributions. We now consider a class of densities which satisfy Assumption 3.1(iii,iv). Suppose $f_Z : \mathbb{R} \to \mathbb{R}$ is a density function whose first derivative is bounded and such that, after some finite point $m$, the derivative $f_Z'$ declines monotonically to zero and satisfies $\int |y f_Z'(y)|\, dy < \infty$. In this case

$$\int_0^\infty \sup_{0\le\nu\le a} |f_Z(u) - f_Z(u[1+\nu])|\, du \le \int_0^m \sup_{0\le\nu\le a} |f_Z(u) - f_Z(u[1+\nu])|\, du + \int_m^\infty \sup_{0\le\nu\le a} |f_Z(u) - f_Z(u[1+\nu])|\, du \le a\Big( m^2 \sup_{u\in\mathbb{R}} |f_Z'(u)| + \int_m^\infty |u f_Z'(u)|\, du \Big) \le K a,$$

for some finite $K$ independent of $a$; hence Assumption 3.1(iii,iv) is satisfied.
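As a concrete check of Assumption 3.1(iii), take the exponential density $f_Z(u) = e^{-u}$ (positive support, mean one). A direct calculation gives $\int_0^\infty |f_Z(u) - f_Z(u[1+a])|\, du = 1 - 1/(1+a) = a/(1+a) \le a$, so the assumption holds with $K = 1$ for this density; the sketch below confirms the closed form numerically.

```python
import math

def lipschitz_integral(a, upper=60.0, n=200000):
    """Midpoint-rule approximation of int_0^inf |f_Z(u) - f_Z(u(1+a))| du
    for the exponential density f_Z(u) = exp(-u). The closed form is
    a / (1 + a), which is <= a, i.e. Assumption 3.1(iii) with K = 1."""
    h = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += abs(math.exp(-u) - math.exp(-u * (1.0 + a))) * h
    return total

for a in (0.01, 0.1, 1.0):
    approx = lipschitz_integral(a)
    assert abs(approx - a / (1.0 + a)) < 1e-3   # matches the closed form
    assert approx <= a                          # the Lipschitz bound with K = 1
```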

We use Assumption 3.1(i,ii,iii) to obtain the strong mixing rate (2-mixing and $\alpha$-mixing) of the tvARCH($p$) process, and the slightly stronger conditions in Assumption 3.1(i,ii,iv) to obtain the $\beta$-mixing rate of the tvARCH($p$) process. We mention that, in the case that $\{X_t\}$ is a stationary, ergodic time series, Francq and Zakoïan (2006) have shown geometric ergodicity, which they show implies $\beta$-mixing, under the weaker condition that the distribution function of $\{Z_t\}$ can have some discontinuities.

3.2 The tvARCH(p) process and the Volterra series expansion

In this section we derive a Volterra series expansion of the tvARCH process (see also Giraitis et al. (2000)). These results allow us to apply Proposition 2.1 to the tvARCH process. We first note that the innovations $Z_{t+k-1}^{t+1}$ and $X_t^{t-p+1}$ are independent random vectors. Hence, comparing with Proposition 2.1, we are interested in obtaining the conditional density of $X_{t+k}$ given $Z_{t+k-1}^{t+1}$ and $X_t^{t-p+1}$ (denoted $f_{0,k,t}$), and the conditional density of $X_{t+k+s}$ given $X_{t+k+s-1}^{t+k}$, $Z_{t+k-1}^{t+1}$ and $X_t^{t-p+1}$ (denoted $f_{s,k,t}$). We use these expressions to obtain a bound for $\mathcal{D}_{s,k,t}$ (defined in (9)), which we use to derive a bound for the mixing rate. We now represent $\{X_t\}$ in terms of $\{Z_t\}$. To do this we define

$$A_t(z) = \begin{pmatrix} a_1(t)z_t & a_2(t)z_t & \cdots & a_{p-1}(t)z_t & a_p(t)z_t \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \qquad A_t = A_t(1) = \begin{pmatrix} a_1(t) & a_2(t) & \cdots & a_{p-1}(t) & a_p(t) \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix},$$

$$b_t(z) = (a_0(t)z_t, 0, \ldots, 0)' \quad\text{and}\quad \underline{X}_t^{t-p+1} = (X_t, X_{t-1}, \ldots, X_{t-p+1})'. \qquad (12)$$

Using this notation we have the relation $\underline{X}_{t+k}^{t+k-p+1} = A_{t+k}(Z)\underline{X}_{t+k-1}^{t+k-p} + b_{t+k}(Z)$. We mention that the vector representation of ARCH and GARCH processes has been used in Bougerol and Picard (1992), Basrak et al. (2002) and Straumann and Mikosch (2006) in order to obtain some probabilistic properties of ARCH-type processes. Now iterating the relation $k$ times (to get $\underline{X}_{t+k}^{t+k-p+1}$ in terms of $\underline{X}_t^{t-p+1}$) we have

$$\underline{X}_{t+k}^{t+k-p+1} = b_{t+k}(Z) + \sum_{r=0}^{k-2} \Big[\prod_{i=0}^{r} A_{t+k-i}(Z)\Big] b_{t+k-r-1}(Z) + \Big[\prod_{i=0}^{k-1} A_{t+k-i}(Z)\Big] \underline{X}_t^{t-p+1}, \qquad (13)$$

where we set $\big[\prod_{i=0}^{-1} A_{t+k-i}(Z)\big] = I_p$ ($I_p$ denotes the $p \times p$ identity matrix). We use this expansion below.
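The iterated expansion (13) can be verified numerically by comparing the one-step companion-matrix recursion with the accumulated sum of products. The sketch below does this for an illustrative tvARCH(2) with made-up coefficients and innovations (all numerical choices are ours):

```python
import random

def mat_vec(A, v):
    """2x2 matrix times 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]

def mat_mul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][l] * B[l][j] for l in range(2)) for j in range(2)]
            for i in range(2)]

rng = random.Random(1)
k = 6
# illustrative tvARCH(2) coefficients with a1(t) + a2(t) <= 0.8 < 1
a0 = [0.5 + 0.1 * rng.random() for _ in range(k + 1)]
a1 = [0.4 + 0.1 * rng.random() for _ in range(k + 1)]
a2 = [0.2 + 0.1 * rng.random() for _ in range(k + 1)]
z  = [rng.gauss(0, 1) ** 2 for _ in range(k + 1)]      # iid positive innovations

A = [[[a1[t] * z[t], a2[t] * z[t]], [1.0, 0.0]] for t in range(k + 1)]  # A_t(z)
b = [[a0[t] * z[t], 0.0] for t in range(k + 1)]                          # b_t(z)

x0 = [0.7, 0.9]                        # arbitrary initial vector
x = x0[:]                              # direct recursion X_t = A_t X_{t-1} + b_t
for t in range(1, k + 1):
    xv = mat_vec(A[t], x)
    x = [xv[0] + b[t][0], xv[1] + b[t][1]]

# expansion (13): b_k + sum_r (A_k ... A_{k-r}) b_{k-r-1} + (A_k ... A_1) x0
expansion = b[k][:]
prod = A[k]
for r in range(k - 1):
    pv = mat_vec(prod, b[k - r - 1])
    expansion = [expansion[0] + pv[0], expansion[1] + pv[1]]
    prod = mat_mul(prod, A[k - r - 1])
pv = mat_vec(prod, x0)
expansion = [expansion[0] + pv[0], expansion[1] + pv[1]]

assert all(abs(u - v) < 1e-9 for u, v in zip(x, expansion))
```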

Lemma 3.1 Let us suppose that Assumption 3.1(i) is satisfied. Then for $s \ge 0$ we have

$$X_{t+k+s} = Z_{t+k+s}\,\{\mathcal{P}_{s,k,t}(Z) + \mathcal{Q}_{s,k,t}(Z,X)\}, \qquad (14)$$

where for $s = 0$ we have

$$\mathcal{P}_{0,k,t}(Z) = a_0(t+k) + \sum_{r=0}^{k-2} \Big[ A_{t+k} \prod_{i=1}^{r} A_{t+k-i}(Z)\; b_{t+k-r-1}(Z) \Big]_1, \qquad \mathcal{Q}_{0,k,t}(Z,X) = \Big[ A_{t+k} \prod_{i=1}^{k-1} A_{t+k-i}(Z)\; \underline{X}_t^{t-p+1} \Big]_1$$

($[\cdot]_1$ denotes the first element of a vector, and empty products equal the identity); for $1 \le s \le p-1$,

$$\mathcal{P}_{s,k,t}(Z) = a_0(t+k+s) + \sum_{i=1}^{s} a_i(t+k+s) X_{t+k+s-i} + \sum_{i=s+1}^{p} a_i(t+k+s)\, Z_{t+k+s-i}\, \mathcal{P}_{0,k+s-i,t}(Z), \qquad (15)$$

$$\mathcal{Q}_{s,k,t}(Z,X) = \sum_{i=s+1}^{p} a_i(t+k+s)\, Z_{t+k+s-i}\, \mathcal{Q}_{0,k+s-i,t}(Z,X),$$

and for $s \ge p$ we have $\mathcal{P}_{s,k,t}(Z) = a_0(t+k+s) + \sum_{i=1}^{p} a_i(t+k+s) X_{t+k+s-i}$ and $\mathcal{Q}_{s,k,t}(Z,X) \equiv 0$. We note that $\mathcal{P}_{s,k,t}$ and $\mathcal{Q}_{s,k,t}$ are positive random variables and that, for $s \ge 1$, $\mathcal{P}_{s,k,t}$ is a function of $X_{t+k+s-1}^{t+k}$ (this dependence is suppressed in the notation).

9 PROOF. In Appendix A.2.

By using (14) we now show that the conditional density of $X_{t+k+s}$ given $X_{t+k+s-1}^{t+k}$, $Z_{t+k-1}^{t+1}$ and $X_t^{t-p+1}$ is a function of the density of $Z_{t+k+s}$. It is clear from (14) that $Z_{t+k+s}$ can be expressed as $Z_{t+k+s} = X_{t+k+s}/\{\mathcal{P}_{s,k,t}(Z) + \mathcal{Q}_{s,k,t}(Z,X)\}$. Therefore, it is straightforward to show that

$$f_{s,k,t}(y_s|y_{s-1},z,x) = \frac{1}{\mathcal{P}_{s,k,t}(z) + \mathcal{Q}_{s,k,t}(z,x)}\, f_Z\Big( \frac{y_s}{\mathcal{P}_{s,k,t}(z) + \mathcal{Q}_{s,k,t}(z,x)} \Big). \qquad (16)$$
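Equation (16) is the standard scaling change of variables: conditionally on the past, $X_{t+k+s} = cZ_{t+k+s}$ with the scale $c = \mathcal{P}_{s,k,t} + \mathcal{Q}_{s,k,t}$ fixed, so the conditional density is $(1/c)f_Z(y/c)$. A quick Monte Carlo sanity check of this identity, with exponential innovations and an arbitrary scale $c$ (both illustrative choices, not from the paper):

```python
import math
import random

rng = random.Random(42)
c = 1.7            # stands in for P_{s,k,t}(z) + Q_{s,k,t}(z, x), fixed by conditioning
n = 200000
samples = [c * rng.expovariate(1.0) for _ in range(n)]   # X = c * Z, Z ~ Exp(1), E(Z) = 1

q = 2.0
empirical = sum(1 for v in samples if v <= q) / n
# CDF implied by the density (1/c) f_Z(y/c): integrating gives 1 - exp(-q/c)
implied = 1.0 - math.exp(-q / c)
assert abs(empirical - implied) < 0.01
```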

3.3 Strong mixing of the tvARCH(p) process

The aim of this section is to prove geometric mixing of the tvARCH($p$) process without appealing to geometric ergodicity. Naturally, the results in this section also apply to stationary ARCH($p$) processes.

In the following lemma we use Proposition 2.1 to obtain bounds for the mixing rates. It is worth mentioning that the techniques used in the proof below can be applied to other Markov processes.

Lemma 3.2 Suppose $\{X_t\}$ is a tvARCH process which satisfies (11). Then for any $\delta = (\delta_0, \ldots, \delta_{p-1}) \in (\mathbb{R}^+)^p$ we have

$$\sup_{\substack{G\in\sigma(X_{t+k},X_{t+k+1},\ldots)\\ H\in\sigma(X_t,X_{t-1},\ldots)}} |P(G\cap H) - P(G)P(H)| \le 2\sum_{s=0}^{p-1} \sup_{x\in\mathcal{E}} \int \prod_{i=1}^{k-1} f_Z(z_i) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s} \mathcal{D}_{s,k,t}(y_s|y_{s-1},z,x)\, dy_s \Big\}\, dz + 4\sum_{j=0}^{p-1} P(|X_{t-j}| \ge \delta_j), \qquad (17)$$

and

$$\sup_{\substack{\{H_j\}\subset\sigma(X_t,X_{t-1},\ldots)\\ \{G_i\}\subset\sigma(X_{t+k},X_{t+k+1},\ldots)}} \sum_{i,j} |P(G_i\cap H_j) - P(G_i)P(H_j)| \le 2\sum_{s=0}^{p-1} \int \prod_{i=1}^{k-1} f_Z(z_i) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s}\sup_{x\in\mathcal{E}} \mathcal{D}_{s,k,t}(y_s|y_{s-1},z,x)\, dy_s \Big\}\, dz + 4\sum_{j=0}^{p-1} P(|X_{t-j}| \ge \delta_j), \qquad (18)$$

where $z = (z_1, \ldots, z_{k-1})$, $\mathcal{E}$ is the set defined by $\delta$ in (4), and $\{G_i\}$ and $\{H_j\}$ are finite partitions of $\Omega$.

PROOF. In Appendix A.2. □

To obtain a mixing rate for the tvARCH($p$) process we need to bound the integral in (17) and then choose the set $\mathcal{E}$ which minimises (17). We will start by bounding $\mathcal{D}_{s,k,t}$, which, we recall, is based on the conditional density $f_{s,k,t}$ (defined in (16)).

Lemma 3.3 Let $\mathcal{D}_{s,k,t}$ and $\mathcal{Q}_{s,k,t}$ be defined as in (9) and (15), respectively.

(i) Suppose Assumption 3.1(i,ii,iii) holds; then for all $x \in (\mathbb{R}^+)^p$ we have

$$\sum_{s=0}^{p-1} \int \prod_{i=1}^{k-1} f_Z(z_i) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s} \mathcal{D}_{s,k,t}(y_s|y_{s-1},z,x)\, dy_s \Big\}\, dz \le K \sum_{s=0}^{p-1} \frac{E[\mathcal{Q}_{s,k,t}(Z,x)]}{\inf_{t\in\mathbb{Z}} a_0(t)} \le K(1-\tilde\eta)^k \|x\|_1, \qquad (19)$$

where $K$ is a finite constant and $\tilde\eta$ is any constant with $0 < \tilde\eta < \eta$ ($\eta$ is defined in Assumption 3.1(i)).

(ii) Suppose Assumption 3.1(i,ii,iv) holds; then for any set $\mathcal{E}$ (defined as in (4)) we have

$$\sum_{s=0}^{p-1} \int \prod_{i=1}^{k-1} f_Z(z_i) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s}\sup_{x\in\mathcal{E}} \mathcal{D}_{s,k,t}(y_s|y_{s-1},z,x)\, dy_s \Big\}\, dz \le K(1-\tilde\eta)^k \sup_{x\in\mathcal{E}} \|x\|_1. \qquad (20)$$

PROOF. In Appendix A.2. □

We now use the lemmas above to show geometric mixing of the tvARCH process.

Theorem 3.1 (i) Suppose Assumption 3.1(i,ii,iii) holds; then

$$\sup_{\substack{G\in\sigma(X_{t+k},X_{t+k+1},\ldots)\\ H\in\sigma(X_t,X_{t-1},\ldots)}} |P(G\cap H) - P(G)P(H)| \le K\rho^k.$$

(ii) Suppose Assumption 3.1(i,ii,iv) holds; then

$$\sup_{\substack{\{H_j\}\subset\sigma(X_t,X_{t-1},\ldots)\\ \{G_i\}\subset\sigma(X_{t+k},X_{t+k+1},\ldots)}} \sum_i\sum_j |P(G_i\cap H_j) - P(G_i)P(H_j)| \le K\rho^k,$$

for any $\sqrt{1-\eta} < \rho < 1$, where $K$ is a finite constant independent of $t$ and $k$.

PROOF. We use (17) to prove (i). (19) gives a bound for the integral difference in (17); therefore all that remains is to bound the probabilities in (17). To do this we first use Markov's inequality, which gives $\sum_{j=0}^{p-1} P(|X_{t-j}| \ge \delta_j) \le \sum_{j=0}^{p-1} E|X_{t-j}|\,\delta_j^{-1}$. By using the Volterra expansion of $X_t$ (see Dahlhaus and Subba Rao (2006), Section 5) it can be shown that $\sup_{t\in\mathbb{Z}} E|X_t| \le (\sup_{t\in\mathbb{Z}} a_0(t))/(1 - \sup_{t\in\mathbb{Z}} \sum_{j=1}^{p} a_j(t))$. Using these bounds and substituting (19) into (17) gives, for every $\delta \in (\mathbb{R}^+)^p$, the bound

$$\sup_{\substack{G\in\sigma(X_{t+k},X_{t+k+1},\ldots)\\ H\in\sigma(X_t,X_{t-1},\ldots)}} |P(G\cap H) - P(G)P(H)| \le 2K(1-\tilde\eta)^k \sum_{j=0}^{p-1} \delta_j + 4K \sum_{j=0}^{p-1} \frac{1}{\delta_j}.$$

We observe that the right-hand side of the above is minimised by setting $\delta_j = (1-\tilde\eta)^{-k/2}$ (for $0 \le j \le p-1$), which gives the bound

$$\sup_{\substack{H\in\sigma(X_t,X_{t-1},\ldots)\\ G\in\sigma(X_{t+k},X_{t+k+1},\ldots)}} |P(G\cap H) - P(G)P(H)| \le K\sqrt{(1-\tilde\eta)^{k}}.$$

Since the above is true for any $0 < \tilde\eta < \eta$, part (i) holds for any $\rho$ which satisfies $\sqrt{1-\eta} < \rho < 1$, thus giving the result.

To prove (ii) we use an identical argument, but with the bound in (20) instead of (19); we omit the details. □
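The balancing step used in the proof above is the elementary scalar minimisation of a term growing in $\delta$ against one decaying in $\delta$:

```latex
\min_{\delta > 0}\Big( A\delta + \frac{B}{\delta} \Big) = 2\sqrt{AB},
\qquad \text{attained at } \delta = \sqrt{B/A}.
```

With $A \propto (1-\tilde\eta)^k$ and $B$ constant in $k$, the minimiser is $\delta \propto (1-\tilde\eta)^{-k/2}$ and the minimum is of order $(1-\tilde\eta)^{k/2}$, which is exactly the geometric rate appearing in the theorem.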

Remark 3.1 We observe that $K$ and $\rho$ defined in the above theorem are independent of $t$; therefore, under Assumption 3.1(i,ii,iii) we have $\alpha(k) \le K\rho^k$ ($\alpha$-mixing, defined in (1)) and, under Assumption 3.1(i,ii,iv), $\beta(k) \le K\rho^k$ ($\beta$-mixing, defined in (3)), for all $\sqrt{1-\eta} < \rho < 1$. Moreover, since $\sigma(X_{t+k}) \subset \sigma(X_{t+k}, X_{t+k+1}, \ldots)$ and $\sigma(X_t) \subset \sigma(X_t, X_{t-1}, \ldots)$, the 2-mixing rate is also geometric, with $\tilde\alpha(k) \le K\rho^k$ ($\tilde\alpha(k)$ defined in (2)).

4 Mixing for ARCH(∞) processes

In this section we derive mixing rates for the ARCH($\infty$) process; we first define the process and state the assumptions that we will use.

4.1 The ARCH(∞) process

The ARCH($\infty$) process has many interesting features, which are useful in several applications. For example, under certain conditions on the coefficients, the ARCH($\infty$) process can exhibit 'near long memory' behaviour (see Giraitis et al. (2000)). The ARCH($\infty$) process satisfies the representation

$$X_t = Z_t \Big( a_0 + \sum_{j=1}^{\infty} a_j X_{t-j} \Big), \qquad (21)$$

where $\{Z_t\}$ are iid positive random variables with $E(Z_t) = 1$ and the $a_j$ are positive parameters. The GARCH($p,q$) model also has an ARCH($\infty$) representation, in which the $a_j$ decay geometrically with $j$. Giraitis and Robinson (2001), Robinson and Zaffaroni (2006) and Subba Rao (2006) consider parameter estimation for the ARCH($\infty$) process. We will use Assumption 3.1 and the assumptions below.

Assumption 4.1 (i) We have $\sum_{j=1}^{\infty} a_j < 1$ and $a_0 > 0$.

(ii) For some $\zeta > 1$, $E|X_t|^{\zeta} < \infty$ (we note that this is fulfilled if $[E|Z_0|^{\zeta}]^{1/\zeta} \sum_{j=1}^{\infty} a_j < 1$).

Giraitis et al. (2000) have shown that under Assumption 4.1(i) the ARCH($\infty$) process has a stationary solution with a finite mean (that is, $\sup_{t\in\mathbb{Z}} E(X_t) < \infty$). It is worth mentioning that, since the ARCH($\infty$) process has a stationary solution, the shift $t$ plays no role when obtaining mixing bounds, i.e. $\sup_{G\in\sigma(X_{k+t}),\, H\in\sigma(X_t)} |P(G\cap H) - P(G)P(H)| = \sup_{G\in\sigma(X_k),\, H\in\sigma(X_0)} |P(G\cap H) - P(G)P(H)|$. Furthermore, the conditional density of $X_{t+k}$ given $Z_{t+k-1}^{t+1}$ and $X_t^{-\infty}$ is not a function of $t$; hence, in the section below we let $f_{0,k}$ denote the conditional density of $X_{t+k}$ given $(Z_{t+k-1}^{t+1}, X_t^{-\infty})$ and, for $s \ge 1$, we let $f_{s,k}$ denote the conditional density of $X_{t+k+s}$ given $(X_{t+k+s-1}^{t+k}, Z_{t+k-1}^{t+1}, X_t^{-\infty})$.
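To get a feel for the model (21), the following sketch simulates an ARCH($\infty$) path with hyperbolically decaying coefficients scaled so that $\sum_j a_j < 1$ (Assumption 4.1(i)). The infinite lag must be truncated in practice; the truncation point and all other numerical choices are ours, purely for illustration.

```python
import random

def simulate_arch_inf(n, lag_trunc=200, a0=0.4, decay=0.5, seed=3):
    """Simulate an ARCH(infinity) path with coefficients a_j proportional
    to j**(-(1 + decay)), truncated at lag_trunc lags (an approximation;
    the model itself has infinitely many lags). The coefficients are
    rescaled so that sum_j a_j = 0.9 < 1, as Assumption 4.1(i) requires."""
    raw = [j ** (-(1.0 + decay)) for j in range(1, lag_trunc + 1)]
    scale = 0.9 / sum(raw)
    a = [scale * r for r in raw]            # a[0] is a_1, a[1] is a_2, ...
    rng = random.Random(seed)
    burn = lag_trunc                        # burn-in to wash out the zero start
    x = []
    for _ in range(n + burn):
        past = x[-lag_trunc:][::-1]         # X_{t-1}, X_{t-2}, ... (shorter early on)
        sigma2 = a0 + sum(aj * xj for aj, xj in zip(a, past))
        x.append(rng.gauss(0.0, 1.0) ** 2 * sigma2)   # Z_t iid positive, E(Z_t) = 1
    return x[burn:]

path = simulate_arch_inf(2000)
assert all(v >= 0 for v in path)
```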

4.2 The ARCH(∞) process and the Volterra series expansion

We now write $X_k$ in terms of $Z_{k-1}^{1}$ and $X_0^{-\infty} = (X_0, X_{-1}, \ldots)$, and use this to derive the conditional densities $f_{0,k}$ and $f_{s,k}$. It can be seen from the result below (equation (22)) that in general the ARCH($\infty$) process is not Markovian.

Lemma 4.1 Suppose $\{X_t\}$ satisfies (21). Then

$$X_k = \mathcal{P}_{0,k}(Z)Z_k + \mathcal{Q}_{0,k}(Z,X)Z_k, \qquad (22)$$

where

$$\mathcal{P}_{0,k}(Z) = a_0\Big[ \sum_{m=1}^{k}\ \sum_{k=j_m > \cdots > j_1 > 0} \Big( \prod_{i=1}^{m-1} a_{j_{i+1}-j_i} \Big)\Big( \prod_{i=1}^{m-1} Z_{j_i} \Big) \Big],$$

$$\mathcal{Q}_{0,k}(Z,X) = \sum_{r=1}^{k} \Big\{ \sum_{m=1}^{k}\ \sum_{k=j_m > \cdots > j_1 = r} \Big( \prod_{i=1}^{m-1} a_{j_{i+1}-j_i} \Big)\Big( \prod_{i=1}^{m-1} Z_{j_i} \Big) \Big\}\, d_r(X), \qquad (23)$$

where for $m = 1$ the empty products are taken to equal one. Furthermore, setting $\mathcal{Q}_{0,0} = 0$, for $k \ge 1$ we have that $\mathcal{Q}_{0,k}(Z,X)$ satisfies the recursion $\mathcal{Q}_{0,k}(Z,X) = \sum_{j=1}^{k} a_j \mathcal{Q}_{0,k-j}(Z,X) Z_{k-j} + d_k(X)$, where $d_k(X) = \sum_{j=0}^{\infty} a_{k+j} X_{-j}$ (for $k \ge 1$).

PROOF. In Appendix A.3 of the Technical report. □
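The decomposition in Lemma 4.1 can be checked numerically: run the ARCH recursion directly and, in parallel, build $\mathcal{P}_{0,k}$ and $\mathcal{Q}_{0,k}$ through the recursions $\mathcal{P}_{0,k} = a_0 + \sum_{j=1}^{k-1} a_j Z_{k-j}\mathcal{P}_{0,k-j}$ and $\mathcal{Q}_{0,k} = \sum_{j=1}^{k-1} a_j Z_{k-j}\mathcal{Q}_{0,k-j} + d_k$; the two must agree through $X_k = Z_k(\mathcal{P}_{0,k} + \mathcal{Q}_{0,k})$. The sketch below uses a finite coefficient list and made-up inputs, and the $\mathcal{P}$-recursion is our restatement of the lemma, not a formula quoted from the paper.

```python
import random

rng = random.Random(7)
J = 12                                     # truncation: a_j = 0 for j > J (illustrative)
a0 = 0.6
a = [0.0] + [0.5 * 0.6 ** j for j in range(1, J + 1)]   # a_1..a_J, sum < 1
Kmax = 10
z = [0.0] + [rng.gauss(0, 1) ** 2 for _ in range(Kmax)]  # Z_1..Z_Kmax, iid positive
xinit = [rng.random() for _ in range(J)]                  # xinit[i] plays X_{-i}

def d(k):
    """d_k = sum_j a_{k+j} X_{-j}, truncated (a_m = 0 for m > J)."""
    return sum(a[k + j] * xinit[j] for j in range(J) if k + j <= J)

x = {}                                     # X_1..X_Kmax by the direct recursion (21)
for t in range(1, Kmax + 1):
    s = a0
    for j in range(1, J + 1):
        s += a[j] * (x[t - j] if t - j >= 1 else xinit[j - t])
    x[t] = z[t] * s

P = {1: a0}                                # P_{0,1} = a_0, Q_{0,1} = d_1
Q = {1: d(1)}
for k in range(2, Kmax + 1):
    P[k] = a0 + sum(a[j] * z[k - j] * P[k - j] for j in range(1, k))
    Q[k] = sum(a[j] * z[k - j] * Q[k - j] for j in range(1, k)) + d(k)

for k in range(1, Kmax + 1):
    assert abs(x[k] - z[k] * (P[k] + Q[k])) < 1e-9   # X_k = Z_k (P_{0,k} + Q_{0,k})
```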

We will use the result above to derive the 2-mixing rate. To derive the $\alpha$- and $\beta$-mixing rates we require the density of $X_{k+s}$ given $X_{k+s-1}^{k}$, $Z_{k-1}^{1}$ and $X_0^{-\infty}$, which uses the following lemma.

Lemma 4.2 Suppose $\{X_t\}$ satisfies (21). Then for $s \ge 1$ we have

$$X_{k+s} = Z_{k+s}\,\{\mathcal{P}_{s,k}(Z) + \mathcal{Q}_{s,k}(Z,X)\}, \qquad (24)$$

where

$$\mathcal{P}_{s,k}(Z) = a_0 + \sum_{j=1}^{s} a_j X_{k+s-j} + \sum_{j=s+1}^{\infty} a_j Z_{k+s-j}\, \mathcal{P}_{0,k+s-j}(Z),$$

$$\mathcal{Q}_{s,k}(Z,X) = \sum_{j=s+1}^{k+s} a_j Z_{k+s-j}\, \mathcal{Q}_{0,k+s-j}(Z,X) + d_{k+s}(X), \qquad (25)$$

where we set $\mathcal{P}_{0,m} = \mathcal{Q}_{0,m} = 0$ for $m \le 0$, so that only the terms with $j \le k+s-1$ contribute to the sums.

PROOF. In Appendix A.3 of the Technical report.

Using (22) and (24), for all $s \ge 0$ we have $Z_{k+s} = X_{k+s}/\{\mathcal{P}_{s,k}(Z) + \mathcal{Q}_{s,k}(Z,X)\}$, which leads to the conditional densities

$$f_{s,k}(y_s|y_{s-1},z,x) = \frac{1}{\mathcal{P}_{s,k}(z) + \mathcal{Q}_{s,k}(z,x)}\, f_Z\Big( \frac{y_s}{\mathcal{P}_{s,k}(z) + \mathcal{Q}_{s,k}(z,x)} \Big). \qquad (26)$$

In the proofs below $\mathcal{Q}_{0,k}(1_{k-1}, x)$ plays a prominent role. By using the recursion in Lemma 4.1 and (25), setting $x = X_0^{-\infty}$ and noting that $E(\mathcal{Q}_{s,k}(Z,x)) = \mathcal{Q}_{s,k}(1_{k-1}, x)$, we obtain the recursion $\mathcal{Q}_{0,k}(1_{k-1}, x) = \sum_{j=1}^{k} a_j \mathcal{Q}_{0,k-j}(1_{k-j-1}, x) + d_k(x)$. We use this to obtain a solution for $\mathcal{Q}_{0,k}(1_{k-1}, x)$ in terms of $\{d_k(x)\}_k$ in the lemma below.

Lemma 4.3 Suppose $\{X_t\}$ satisfies (21) and Assumption 4.1 is fulfilled. Then there exists a sequence $\{\psi_j\}$ such that for all $|z| \le 1$ we have $(1 - \sum_{j=1}^{\infty} a_j z^j)^{-1} = \sum_{j=0}^{\infty} \psi_j z^j$. Furthermore, if $\sum_j j a_j < \infty$, then Hannan and Kavalieris (1986) have shown that $\sum_j j|\psi_j| < \infty$. For $k \le 0$, set $d_k(x) = 0$ and $\mathcal{Q}_{0,k}(1_{k-1}, x) = 0$; then for $k \ge 1$, $\mathcal{Q}_{0,k}(1_{k-1}, x)$ has the solution

$$\mathcal{Q}_{0,k}(1_{k-1}, x) = \sum_{j=0}^{\infty} \psi_j d_{k-j}(x) = \sum_{j=0}^{k-1} \psi_j d_{k-j}(x) = \sum_{j=0}^{k-1} \psi_j \Big\{ \sum_{i=0}^{\infty} a_{k-j+i} x_i \Big\}, \qquad (27)$$

where $x = (x_0, x_1, \ldots)$ (the component $x_i$ standing for the value of $X_{-i}$).

PROOF. In Appendix A.3 of the Technical report. □
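The solution (27) is the usual power-series inversion of a convolution recursion. It can be confirmed numerically by computing the $\psi_j$ from $\psi_0 = 1$, $\psi_n = \sum_{j=1}^{n} a_j \psi_{n-j}$ and comparing the recursion for $\mathcal{Q}_{0,k}(1_{k-1}, x)$ with the explicit $\psi$-weighted sum. The sketch below uses a finite coefficient list and arbitrary positive $d_k$ (our own illustration):

```python
# a_j: finite list of ARCH coefficients with sum < 1 (a_j = 0 beyond lag 6)
J, Kmax = 6, 15
a = [0.0, 0.3, 0.2, 0.1, 0.05, 0.03, 0.02]           # a_1..a_6, sum = 0.7 < 1

# psi_j: coefficients of (1 - sum_j a_j z^j)^(-1), via the convolution recursion
psi = [1.0]
for n in range(1, Kmax + 1):
    psi.append(sum(a[j] * psi[n - j] for j in range(1, min(n, J) + 1)))

d = [0.0] + [1.0 / (k + 1) for k in range(1, Kmax + 1)]  # arbitrary positive d_k

# recursion Q_k = sum_{j=1}^k a_j Q_{k-j} + d_k (with Q_m = 0 for m <= 0)
Q = [0.0]
for k in range(1, Kmax + 1):
    Q.append(sum(a[j] * Q[k - j] for j in range(1, min(k, J) + 1)) + d[k])

# explicit solution (27): Q_k = sum_{j=0}^{k-1} psi_j d_{k-j}
for k in range(1, Kmax + 1):
    explicit = sum(psi[j] * d[k - j] for j in range(k))
    assert abs(Q[k] - explicit) < 1e-12
```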

4.3 Mixing for ARCH(∞)-type processes

In this section we show that the mixing rates are not necessarily geometric and depend on the rate of decay of the coefficients $\{a_j\}$ (we illustrate this in the following example). Furthermore, for ARCH($\infty$) processes the strong mixing rate and the 2-mixing rate can be different.

Example 4.1 Let us consider the ARCH($\infty$) process $\{X_t\}$ defined in (21). Giraitis et al. (2000) have shown that if $a_j \le C j^{-(1+\gamma)}$ (for some $\gamma > 0$) and $[E(Z_t^2)]^{1/2} \sum_{j=1}^{\infty} a_j < 1$, then $|\mathrm{cov}(X_0, X_k)| \le C k^{-(1+\gamma)}$. That is, the absolute sum of the covariances is finite, but 'only just' if $\gamma$ is small. If $Z_t$ is bounded, so that $X_t$ is a bounded random variable, then by using Ibragimov's inequality (see Hall and Heyde (1980)) we have

$$|\mathrm{cov}(X_0, X_k)| \le C \sup_{A\in\sigma(X_0),\, B\in\sigma(X_k)} |P(A\cap B) - P(A)P(B)|,$$

for some $C < \infty$. Noting that $|\mathrm{cov}(X_0, X_k)| = O(k^{-(1+\gamma)})$, this gives a lower bound of $O(k^{-(1+\gamma)})$ on the 2-mixing rate. □

To obtain the mixing rates we will use Proposition 2.1; this result requires bounds on $\mathcal{D}_{s,k} = |f_{s,k}(y_s|y_{s-1},z,x) - f_{s,k}(y_s|y_{s-1},z,0)|$ and its integral.

Lemma 4.4 Suppose $\{X_t\}$ satisfies (21), let $f_Z$ denote the density of $Z_t$, and let $\mathcal{D}_{s,k}$ and $\mathcal{Q}_{0,k}(\cdot)$ be defined as in (9) and (23). Suppose Assumptions 3.1(iii) and 4.1 are fulfilled; then

$$\int \prod_{i=1}^{k-1} f_Z(z_i) \Big\{ \int \big| f_{0,k}(y|z,x) - f_{0,k}(y|z,0) \big|\, dy \Big\}\, dz \le \frac{\mathcal{Q}_{0,k}(1_{k-1},x)}{a_0} = \frac{1}{a_0} \sum_{j=0}^{k-1} |\psi_j| \Big\{ \sum_{i=0}^{\infty} a_{k-j+i} x_i \Big\} \qquad (28)$$

and for $s \ge 1$

$$\int \prod_{i=1}^{k-1} f_Z(z_i) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s} \mathcal{D}_{s,k}(y_s|y_{s-1},z,x)\, dy_s \Big\}\, dz \le \frac{1}{a_0} \Big\{ \sum_{j=s+1}^{k+s} a_j \sum_{l=0}^{k+s-j} |\psi_l| \sum_{i=0}^{\infty} a_{k+s-j-l+i} x_i + \sum_{i=0}^{\infty} a_{k+s+i} x_i \Big\}. \qquad (29)$$

Suppose Assumptions 3.1(iv) and 4.1 are fullled, and is dened as in (4) then E

k1 f (z ) sup sup (y y , z, x)dy dz Z i s,k s s1 s y ∈Rs x∈E D | i=1 s1 Z Y Z ©k+sj ª 1 k+s ∞ ∞ aj l ak+sjl+ii + ak+s+ii , (30) a0 | | j=s+1 l=0 i=0 i=0 © X X X X ª where x = (x0, x1, . . .) is a positive vector.

PROOF. In Appendix A.3 of the Technical report. □

We require the following simple lemma to prove the theorem below.

Lemma 4.5 Let us suppose that $\{c_i\}$, $\{d_i\}$ and $\{\delta_i\}$ are positive sequences; then

$$\inf_{\{\delta_i\}} \sum_{i=0}^{\infty} \big( c_i\delta_i + d_i\delta_i^{-\zeta} \big) = \big( \zeta^{\frac{1}{\zeta+1}} + \zeta^{-\frac{\zeta}{\zeta+1}} \big) \sum_{i=0}^{\infty} c_i^{\frac{\zeta}{\zeta+1}} d_i^{\frac{1}{\zeta+1}}. \qquad (31)$$

PROOF. In Appendix A.3 of the Technical report. □
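Since the infimum in (31) decouples across $i$, the identity can be checked term by term: each summand $c\delta + d\delta^{-\zeta}$ is minimised at $\delta = (\zeta d/c)^{1/(\zeta+1)}$. The sketch below verifies the closed form numerically for arbitrary positive sequences (our own illustration):

```python
def lhs_min(c, d, zeta):
    """Sum of the per-term minima of c_i * e + d_i * e**(-zeta):
    the minimiser is e = (zeta * d_i / c_i)**(1 / (zeta + 1))."""
    total = 0.0
    for ci, di in zip(c, d):
        e = (zeta * di / ci) ** (1.0 / (zeta + 1.0))
        total += ci * e + di * e ** (-zeta)
    return total

def rhs(c, d, zeta):
    """Right-hand side of (31)."""
    factor = zeta ** (1.0 / (zeta + 1.0)) + zeta ** (-zeta / (zeta + 1.0))
    return factor * sum(ci ** (zeta / (zeta + 1.0)) * di ** (1.0 / (zeta + 1.0))
                        for ci, di in zip(c, d))

c = [0.5, 1.2, 3.0, 0.07]
d = [2.0, 0.3, 1.1, 4.0]
for zeta in (1.5, 2.0, 5.0):
    assert abs(lhs_min(c, d, zeta) - rhs(c, d, zeta)) < 1e-9
```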

In the following theorem we obtain $\alpha$-mixing and $\beta$-mixing bounds for the ARCH($\infty$) process.

Theorem 4.1 Suppose $\{X_t\}$ satisfies (21).

(a) Suppose Assumptions 3.1(iii) and 4.1 hold. Then we have

$$\sup_{G\in\mathcal{F}_{\infty}^{k},\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)| \le K(\zeta) \sum_{i=0}^{\infty} \Big[ \frac{1}{a_0} \sum_{s=0}^{\infty} \sum_{j=s+1}^{k+s} a_j \sum_{l=0}^{k+s-j} |\psi_l|\, a_{k+s-j-l+i} + \frac{1}{a_0} \sum_{s=0}^{\infty} a_{k+s+i} \Big]^{\frac{\zeta}{\zeta+1}} \big[ E|X_0|^{\zeta} \big]^{\frac{1}{\zeta+1}}, \qquad (32)$$

where $K(\zeta) = 3\big( \zeta^{\frac{1}{\zeta+1}} + \zeta^{-\frac{\zeta}{\zeta+1}} \big)$.

(i) If the parameters of the ARCH($\infty$) process satisfy $a_j \le C j^{-\gamma}$ and $|\psi_j| \le C j^{-\gamma}$ (with $\{\psi_j\}$ defined in Lemma 4.3), then we have

$$\sup_{G\in\mathcal{F}_{\infty}^{k},\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)| \le K\Big( k^{-\tilde\gamma}(k+1)^{3} + (k+1)^{-\tilde\gamma+2} \Big),$$

where $\tilde\gamma = \zeta\gamma/(\zeta+1)$.

(ii) If the parameters of the ARCH($\infty$) process satisfy $a_j \le C\rho^j$ and $|\psi_j| \le C\rho^j$, where $0 < \rho < 1$ (an example is the GARCH($p,q$) process), then we have

$$\sup_{G\in\mathcal{F}_{\infty}^{k},\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)| \le C k \rho^{k/2},$$

where C is a nite constant.

(b) Suppose Assumptions 3.1(iv) and 4.1 hold. Then we have

$$\sup_{\{G_i\}\subset\mathcal{F}_{\infty}^{k},\, \{H_j\}\subset\mathcal{F}_{0}^{-\infty}}\ \sum_i\sum_j |P(G_i\cap H_j) - P(G_i)P(H_j)| \le K(\zeta) \sum_{i=0}^{\infty} \Big[ \frac{1}{a_0} \sum_{s=0}^{\infty} \sum_{j=s+1}^{k+s} a_j \sum_{l=0}^{k+s-j} |\psi_l|\, a_{k+s-j-l+i} + \frac{1}{a_0} \sum_{s=0}^{\infty} a_{k+s+i} \Big]^{\frac{\zeta}{\zeta+1}} \big[ E|X_0|^{\zeta} \big]^{\frac{1}{\zeta+1}}, \qquad (33)$$

where $\{G_i\}$ and $\{H_j\}$ are finite partitions of $\Omega$. We mention that the bounds on the $\alpha$-mixing rate for the different orders of $\{a_j\}$ and $\{\psi_j\}$ derived in (i) and (ii) also hold for the $\beta$-mixing rate.

PROOF. We first prove (a). We use the fact that

$$\sup_{G\in\mathcal{F}_{\infty}^{k},\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)| = \lim_{n\to\infty}\ \sup_{G\in\mathcal{F}_{k+n}^{k},\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)|,$$

and find a bound for each $n$. By using (7) to bound $\sup_{G\in\mathcal{F}_{k+n}^{k},\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)|$, we see that for all sets $\mathcal{E}$ (as defined in (4)) we have

$$\sup_{G\in\mathcal{F}_{k+n}^{k},\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)| \le 2\sum_{s=0}^{n} \sup_{x\in\mathcal{E}} \int_{\mathbb{R}^{k}} \prod_{i=1}^{k-1} f_Z(z_i) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s} \mathcal{D}_{s,k}(y_s|y_{s-1},z,x)\, dy_s \Big\}\, dz + 4P(X_0 > \delta_0 \text{ or } \ldots \text{ or } X_n > \delta_n). \qquad (34)$$

To bound the integral in (34) we use (29) to obtain

$$\sum_{s=0}^{n} \sup_{x\in\mathcal{E}} \int_{\mathbb{R}^{k}} \prod_{i=1}^{k-1} f_Z(z_i) \Big\{ \int \sup_{y_{s-1}\in\mathbb{R}^s} \mathcal{D}_{s,k}(y_s|y_{s-1},z,x)\, dy_s \Big\}\, dz \le \frac{1}{a_0} \sum_{s=0}^{n} \Big\{ \sum_{j=s+1}^{k+s} a_j \sum_{l=0}^{k+s-j} |\psi_l| \sum_{i=0}^{\infty} a_{k+s-j-l+i}\delta_i + \sum_{i=0}^{\infty} a_{k+s+i}\delta_i \Big\}. \qquad (35)$$

Now by using Markov's inequality we have $P(X_0 > \delta_0 \text{ or } \ldots \text{ or } X_n > \delta_n) \le \sum_{i=0}^{n} E(|X_i|^{\zeta})\,\delta_i^{-\zeta}$. Substituting (35) and the above into (34) and letting $n \to \infty$ gives

$$\sup_{G\in\mathcal{F}_{\infty}^{k},\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)| \le \inf_{\{\delta_i\}} \Big[ \frac{2}{a_0} \sum_{s=0}^{\infty} \Big\{ \sum_{j=s+1}^{k+s} a_j \sum_{l=0}^{k+s-j} |\psi_l| \sum_{i=0}^{\infty} a_{k+s-j-l+i}\delta_i + \sum_{i=0}^{\infty} a_{k+s+i}\delta_i \Big\} + 4E|X_0|^{\zeta} \sum_{i=0}^{\infty} \delta_i^{-\zeta} \Big], \qquad (36)$$

where $\delta = (\delta_0, \delta_1, \ldots)$.

Now we use (31) to minimise (36), which gives us (32). The proof of (i) can be found in the appendix. It is straightforward to prove (ii) using (31).

The proof of (b) is very similar to the proof of (a), but uses (30) rather than (29); we omit the details. □

Remark 4.1 Under the assumptions in Theorem 4.1(a) we have a bound for the $\alpha$-mixing rate, that is, $\alpha(k) \le \lambda(k)$, where

$$\lambda(k) = K(\zeta) \sum_{i=0}^{\infty} \Big[ \frac{1}{a_0} \sum_{s=0}^{\infty} \sum_{j=s+1}^{k+s} a_j \sum_{l=0}^{k+s-j} |\psi_l|\, a_{k+s-j-l+i} + \frac{1}{a_0} \sum_{s=0}^{\infty} a_{k+s+i} \Big]^{\frac{\zeta}{\zeta+1}} \big[ E|X_0|^{\zeta} \big]^{\frac{1}{\zeta+1}}.$$

Under the assumptions in Theorem 4.1(b), the $\beta$-mixing coefficient is bounded by $\beta(k) \le \lambda(k)$.

In the following theorem we consider a bound for the 2-mixing rate of an ARCH($\infty$) process.

Theorem 4.2 Suppose $\{X_t\}$ satisfies (21) and Assumptions 3.1(iii) and 4.1 hold. Then we have

$$\sup_{G\in\sigma(X_k),\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)| \le K(\zeta) \sum_{i=0}^{\infty} \Big[ \frac{1}{a_0} \sum_{j=0}^{k-1} |\psi_j|\, a_{k-j+i} \Big]^{\frac{\zeta}{\zeta+1}} \big[ E|X_0|^{\zeta} \big]^{\frac{1}{\zeta+1}}, \qquad (37)$$

where $K(\zeta) = 3\big( \zeta^{\frac{1}{\zeta+1}} + \zeta^{-\frac{\zeta}{\zeta+1}} \big)$.

If the parameters of the ARCH($\infty$) process satisfy $a_j \le C j^{-\gamma}$ and $|\psi_j| \le C j^{-\gamma}$ ($\{\psi_j\}$ defined in Lemma 4.3), then we have

$$\sup_{G\in\sigma(X_k),\, H\in\mathcal{F}_{0}^{-\infty}} |P(G\cap H) - P(G)P(H)| \le K k^{-\tilde\gamma}(k+1), \qquad (38)$$

where $\tilde\gamma = \zeta\gamma/(\zeta+1)$.

PROOF. We use a similar argument to the proof of Theorem 4.1. The integral difference is replaced with the bound in (28), and again we use Markov's inequality; together they give the bound

$$\sup_{G\in\sigma(X_k),\,H\in\sigma(X_0,X_{-1},\dots)}|P(G\cap H)-P(G)P(H)| \;\le\; \inf_{\bar\gamma}\Big[\frac{2}{a_0}\sum_{j=0}^{k-1}|\psi_j|\Big\{\sum_{i=0}^{\infty}a_{k-j+i}\,\gamma_i\Big\}+4E|X_0|^{\delta}\sum_{i=0}^{\infty}\gamma_i^{-\delta}\Big]. \qquad(39)$$

Finally, to obtain (37) and (38) we use (39) and an argument similar to the proof of Theorem 4.1(i); we omit the details. ¤

Remark 4.2 Comparing (38) and Theorem 4.1(i) we see that the 2-mixing bound is of a smaller order than the strong mixing bound.

In fact, it could well be that the true 2-mixing rate is of a smaller order than the bound in Theorem 4.2(i). This is because Theorem 4.2(i) bounds $\sup_{G\in\sigma(X_k),H\in\sigma(X_0,X_{-1},\dots)}|P(G\cap H)-P(G)P(H)|$, whereas 2-mixing restricts the sigma-algebra of the left tail to $\sigma(X_0)$. However, we have not been able to show this, and the problem requires further consideration.

Acknowledgments

We would like to thank Piotr Kokoszka, Mika Meitz and Joseph Tadjuidje for several useful discussions. We also thank the Associate Editor and two anonymous referees for suggestions and comments which greatly improved many aspects of the paper. SSR is partially supported by an NSF grant under DMS-0806096 and the Deutsche Forschungsgemeinschaft under DA 187/15-1.

A Proofs

A.1 Proof of Proposition 2.1

We will use the following three lemmas to prove Proposition 2.1.

Lemma A.1 Let $G\in\mathcal F_{t+k}^{t+k+r_2}=\sigma(X_{t+k+r_2}^{t+k})$ and $H,E\in\mathcal F_{t-r_1}^{t}=\sigma(X_t^{t-r_1})$ (where $E$ is defined as in (4)), and use the notation in Proposition 2.1. Then we have

$$|P(G\cap H\cap E)-P(G\cap E)P(H)| \;\le\; 2P(H)\sup_{x\in E}\big|P(G|X_t^{t-r_1}=x)-P(G|X_t^{t-r_1}=0)\big|+\inf_{x\in E}P(G|X_t^{t-r_1}=x)\big\{P(H)P(E^c)+P(H\cap E^c)\big\}. \qquad(40)$$

PROOF. To prove the result we first observe that

$$P(G\cap H\cap E)=P\big(X_{t+k+r_2}^{t+k}\in\mathcal G,\ X_t^{t-r_1}\in\mathcal H\cap\mathcal E\big)=\int_{\mathcal H\cap\mathcal E}\Big\{\int_{\mathcal G}dP\big(X_{t+k+r_2}^{t+k}\le y\,\big|\,X_t^{t-r_1}=x\big)\Big\}\,dP\big(X_t^{t-r_1}\le x\big)=\int_{\mathcal H\cap\mathcal E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)\,dP\big(X_t^{t-r_1}\le x\big). \qquad(41)$$

Therefore, by using the above and that $P(H\cap E)\le P(H)$, we obtain the following inequalities:

$$\inf_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(H\cap E) \;\le\; P(G\cap H\cap E) \;\le\; \sup_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(H) \qquad(42)$$

$$\inf_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(E) \;\le\; P(G\cap E) \;\le\; \sup_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(E). \qquad(43)$$

Combining (42) and (43), and using $P(H\cap E)=P(H)-P(H\cap E^c)$ and $P(E)=1-P(E^c)$, gives the inequalities

$$P(G\cap H\cap E)-P(G\cap E)P(H) \;\le\; \sup_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(H)-\inf_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(H)+\inf_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(E^c)P(H) \qquad(44)$$

$$P(G\cap H\cap E)-P(G\cap E)P(H) \;\ge\; \inf_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(H)-\sup_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(H)-\inf_{x\in E}P\big(X_{t+k+r_2}^{t+k}\in\mathcal G\,\big|\,X_t^{t-r_1}=x\big)P(E^c\cap H). \qquad(45)$$

Combining (44) and (45) we obtain

$$|P(G\cap H\cap E)-P(G\cap E)P(H)| \;\le\; P(H)\Big|\sup_{x\in E}P(G|X_t^{t-r_1}=x)-\inf_{x\in E}P(G|X_t^{t-r_1}=x)\Big|+\inf_{x\in E}P(G|X_t^{t-r_1}=x)\big\{P(H)P(E^c)+P(H\cap E^c)\big\}. \qquad(46)$$

Using the triangle inequality we have

$$\Big|\sup_{x\in E}P(G|X_t^{t-r_1}=x)-\inf_{x\in E}P(G|X_t^{t-r_1}=x)\Big| \;\le\; 2\sup_{x\in E}\big|P(G|X_t^{t-r_1}=x)-P(G|X_t^{t-r_1}=0)\big|.$$

Substituting the above into (46) gives (40), as required. ¤

We now obtain a bound for the first term on the right-hand side of (40).

Lemma A.2 Let $f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}$ denote the density of $X_{t+k+r_2}^{t+k}$ given $X_t^{t-r_1}$, and let $\mathcal G$ and $\mathcal H$ be defined as in (10). Then

$$\big|P(G|X_t^{t-r_1}=x)-P(G|X_t^{t-r_1}=0)\big| \;\le\; \int_{\mathcal G}\mathcal D_{0,k,t}(y|x)\,dy. \qquad(47)$$

Let $W_{t+k-1}^{t+1}$ be a random vector which is independent of $X_t^{t-r_1}$, and let $f_W$ denote the density of $W_{t+k-1}^{t+1}$. If $G\in\sigma(X_{t+k})$ then

$$\int_{\mathcal G}\big|f_{X_{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k}|X_t^{t-r_1}}(y|0)\big|\,dy \;\le\; \int_{\mathbb R^{k-1}}f_W(w)\Big\{\int_{\mathcal G}\mathcal D_{0,k,t}(y|w,x)\,dy\Big\}\,dw \qquad(48)$$

and if $G\in\sigma(X_{t+k+r_2}^{t+k})$ then

$$\int_{\mathcal G}\big|f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0)\big|\,dy \;\le\; \sum_{s=0}^{r_2}\int_{\mathbb R^{k-1}}f_W(w)\Big\{\sup_{\bar y_{s-1}}\int_{\mathcal G_s}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},w,x)\,dy_s\Big\}\,dw, \qquad(49)$$

where $\mathcal G=\mathcal G_1\times\dots\times\mathcal G_n$ and $\mathcal G_j\subseteq\mathbb R$.

PROOF. The proof of (47) is clear, hence we omit the details.

To prove (48) we first note that, by the independence of $W_{t+k-1}^{t+1}$ and $X_t^{t-r_1}$, we have $f_{W|X_t^{t-r_1}}(w|x)=f_W(w)$, where $f_{W|X_t^{t-r_1}}$ is the conditional density of $W_{t+k-1}^{t+1}$ given $X_t^{t-r_1}$. Therefore we have

$$f_{X_{t+k}|X_t^{t-r_1}}(y|x)=\int_{\mathbb R^{k-1}}f_{X_{t+k}|W,X_t^{t-r_1}}(y|w,x)f_W(w)\,dw=\int_{\mathbb R^{k-1}}f_{0,k,t}(y|w,x)f_W(w)\,dw.$$

Now substituting the above into $\int_{\mathcal G}\big|f_{X_{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k}|X_t^{t-r_1}}(y|0)\big|\,dy$ gives (48).

To prove (49) we note that, by the same argument used to prove (48), we have

$$f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x)=\int_{\mathbb R^{k-1}}f_W(w)\prod_{s=0}^{r_2}f_{s,k,t}(y_s|\bar y_{s-1},w,x)\,dw. \qquad(50)$$

Repeatedly subtracting and adding the factors $f_{s,k,t}(\cdot|\cdot,w,0)$ in the above gives the telescoping sum

$$f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0)=\sum_{s=0}^{r_2}\int_{\mathbb R^{k-1}}f_W(w)\Big\{\prod_{a=0}^{s-1}f_{a,k,t}(y_a|\bar y_{a-1},w,x)\Big\}\Big\{\prod_{b=s+1}^{r_2}f_{b,k,t}(y_b|\bar y_{b-1},w,0)\Big\}\Big\{f_{s,k,t}(y_s|\bar y_{s-1},w,x)-f_{s,k,t}(y_s|\bar y_{s-1},w,0)\Big\}\,dw. \qquad(51)$$

Therefore, taking the integral of the above over $\mathcal G$ gives

$$\int_{\mathcal G}\big|f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0)\big|\,dy \;\le\; \sum_{s=0}^{r_2}\int_{\mathbb R^{k-1}}f_W(w)\Big\{\Big[\prod_{a=0}^{s-1}\int_{\mathcal G_a}f_{a,k,t}(y_a|\bar y_{a-1},w,x)\,dy_a\Big]\Big[\prod_{b=s+1}^{r_2}\int_{\mathcal G_b}f_{b,k,t}(y_b|\bar y_{b-1},w,0)\,dy_b\Big]\sup_{\bar y_{s-1}}\int_{\mathcal G_s}\big|f_{s,k,t}(y_s|\bar y_{s-1},w,x)-f_{s,k,t}(y_s|\bar y_{s-1},w,0)\big|\,dy_s\Big\}\,dw. \qquad(52)$$

To obtain (49) we observe that, since $\mathcal G_j\subseteq\mathbb R$ and $\int_{\mathbb R}f_{s,k,t}(y_s|\bar y_{s-1},w,x)\,dy_s=1$, we have $\big(\prod_{a=0}^{s-1}\int_{\mathcal G_a}f_{a,k,t}(y_a|\bar y_{a-1},w,x)\,dy_a\big)\big(\prod_{b=s+1}^{r_2}\int_{\mathcal G_b}f_{b,k,t}(y_b|\bar y_{b-1},w,0)\,dy_b\big)\le 1$. Finally, substituting this upper bound into (52) gives (49). ¤

The following lemma, which builds on the lemmas above, will be used to prove the $\beta$-mixing bound.

Lemma A.3 Suppose that $\{G_i\}\in\mathcal F_{t+k}^{t+k+r_2}$ and $\{H_j\}\in\mathcal F_{t-r_1}^{t}$, where $\{G_i\}$ and $\{H_j\}$ are partitions of $\Omega$. Then we have

$$\sum_{i,j}|P(G_i\cap H_j\cap E)-P(G_i\cap E)P(H_j)| \;\le\; 2\sum_i\sup_{x\in E}\big|P(G_i|X_t^{t-r_1}=x)-P(G_i|X_t^{t-r_1}=0)\big|+2P(E^c)$$
and
$$\sum_{i,j}|P(G_i\cap H_j\cap E^c)-P(G_i\cap E^c)P(H_j)| \;\le\; 2P(E^c). \qquad(53)$$

PROOF. Substituting the inequality in (40) into $\sum_{i,j}|P(G_i\cap H_j\cap E)-P(G_i\cap E)P(H_j)|$ gives

$$\sum_{i,j}|P(G_i\cap H_j\cap E)-P(G_i\cap E)P(H_j)| \;\le\; 2\sum_jP(H_j)\sum_i\sup_{x\in E}\big|P(G_i|X_t^{t-r_1}=x)-P(G_i|X_t^{t-r_1}=0)\big|+\sum_{i,j}\inf_{x\in E}P(G_i|X_t^{t-r_1}=x)\big\{P(H_j)P(E^c)+P(H_j\cap E^c)\big\}. \qquad(54)$$

The sets $\{H_j\}$ form a partition of $\Omega$, hence $\sum_jP(H_j)=1$ and $\sum_jP(H_j\cap E^c)=P(E^c)$; moreover $\sum_i\inf_{x\in E}P(G_i|X_t^{t-r_1}=x)\le 1$. Using these observations together with (54) gives the first inequality in (53). The second inequality in (53) follows immediately from the fact that the $\{H_j\}$ and the $\{G_i\}$ are each collections of disjoint sets. ¤

Using the above three lemmas we can now prove Proposition 2.1.

PROOF of Proposition 2.1, equation (5). It is straightforward to show that

$$|P(G\cap H)-P(G)P(H)| \;\le\; |P(G\cap H\cap E)-P(G\cap E)P(H)|+|P(G\cap H\cap E^c)-P(G\cap E^c)P(H)|.$$

Now substituting (47) into Lemma A.1 and using the above gives

$$|P(G\cap H)-P(G)P(H)| \;\le\; 2\sup_{x\in E}\int_{\mathcal G}\big|f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0)\big|\,dy+\inf_{x\in E}P(G|X_t^{t-r_1}=x)\big\{P(H)P(E^c)+P(H\cap E^c)\big\}+P(G\cap H\cap E^c)+P(G\cap E^c)P(H).$$

Finally, by using that $\mathcal G\subseteq\mathbb R^{r_2+1}$, $P(G\cap H\cap E^c)\le P(E^c)$, $P(G\cap E^c)P(H)\le P(E^c)$ and $\inf_{x\in E}P(G|X_t^{t-r_1}=x)\le 1$, we obtain (5). ¤

PROOF of Proposition 2.1, equation (6). It is worth noting that the proof of (6) is similar to the proof of (5). Using (53) and the same arguments as in the proof of (5) we have

$$\sum_{i,j}|P(G_i\cap H_j)-P(G_i)P(H_j)| \;\le\; 2\sum_i\sup_{x\in E}\int_{\mathcal G_i}\big|f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0)\big|\,dy+4P(E^c) \;\le\; 2\int_{\mathbb R^{r_2+1}}\sup_{x\in E}\big|f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0)\big|\,dy+4P(E^c), \qquad(55)$$

where $H_j=\{\omega:\ X_t^{t-r_1}(\omega)\in\mathcal H_j\}$ and $G_i=\{\omega:\ X_{t+k+r_2}^{t+k}(\omega)\in\mathcal G_i\}$, which gives (6). ¤

PROOF of Proposition 2.1, equation (7). To prove (7) we note that $\int_{\mathcal G_s}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},w,x)\,dy_s \le \int_{\mathbb R}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},w,x)\,dy_s$. Substituting this inequality into (49), and the result into (5), gives (7). ¤

PROOF of Proposition 2.1, equation (8). To prove (8) we use (49) and the fact that, for any positive function $f$ and disjoint sets $\mathcal G_{s,i}\subseteq\mathbb R$, $\sum_i\int_{\mathcal G_{s,i}}f(u)\,du\le\int_{\mathbb R}f(u)\,du$. This gives

$$\sum_i\sup_{x\in E}\int_{\mathcal G_i}\big|f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|x)-f_{X_{t+k+r_2}^{t+k}|X_t^{t-r_1}}(y|0)\big|\,dy \;\le\; \sum_{s=0}^{r_2}\int_{\mathbb R^{k-1}}f_W(w)\Big\{\sup_{\bar y_{s-1}\in(\mathbb R^+)^s}\sup_{x\in E}\sum_i\int_{\mathcal G_{s,i}}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},w,x)\,dy_s\Big\}\,dw \;\le\; \sum_{s=0}^{r_2}\int_{\mathbb R^{k-1}}f_W(w)\Big\{\sup_{\bar y_{s-1}\in(\mathbb R^+)^s}\sup_{x\in E}\int_{\mathbb R}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},w,x)\,dy_s\Big\}\,dw,$$

where $\mathcal G_s=\mathcal G_{s,1}\times\dots\times\mathcal G_{s,n}$ and $\mathcal G_{s,i}\subseteq\mathbb R$. Finally, substituting the above into the right-hand side of (6) gives (8). ¤

A.2 Proofs in Section 3

PROOF of Lemma 3.2. We first note that since $\{X_t\}$ satisfies a tvARCH(p) representation ($p<\infty$) it is $p$-Markovian; hence for any $r_2>p$ the sigma-algebra generated by $X_{t+k+r_2}^{t+k}$ is the same as that generated by $(Z_{t+k+r_2}^{t+k+p},X_{t+k+p-1}^{t+k})$. Moreover, by using that $Z_\tau$ is independent of $X_t$ for all $\tau>t$, we have

$$\sup_{G\in\mathcal F_{t+k}^{\infty},\,H\in\mathcal F_{-\infty}^{t}}|P(G\cap H)-P(G)P(H)| = \sup_{G\in\mathcal F_{t+k}^{t+k+p-1},\,H\in\mathcal F_{t-p+1}^{t}}|P(G\cap H)-P(G)P(H)|. \qquad(56)$$

Now by using the above, Proposition 2.1, equation (7), and the independence of $Z_{t+k-1}^{t+1}$ and $X_t^{t-p+1}$, for any set $E$ (defined as in (4)) we have

$$\sup_{G\in\mathcal F_{t+k}^{t+k+p-1},\,H\in\mathcal F_{t-p+1}^{t}}|P(G\cap H)-P(G)P(H)| \;\le\; 2\sum_{s=0}^{p-1}\sup_{x\in E}\int_{\mathbb R^{k}}\prod_{i=1}^{k-1}f_Z(z_i)\Big\{\int_{\mathbb R}\sup_{\bar y_{s-1}\in\mathbb R^s}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},z,x)\,dy_s\Big\}\,dz + 4P(X_t>\gamma_0\text{ or }\dots\text{ or }X_{t-p+1}>\gamma_{p-1}). \qquad(57)$$

Finally, using that $P(X_t>\gamma_0\text{ or }X_{t-1}>\gamma_1\text{ or }\dots\text{ or }X_{t-p+1}>\gamma_{p-1})\le\sum_{j=0}^{p-1}P(X_{t-j}>\gamma_j)$ gives (17). The proof of (18) uses a similar argument, but with (8) in place of (7); we omit the details. ¤
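The union-plus-Markov step used above is easy to illustrate numerically. In the sketch below a hypothetical stationary ARCH(1)-type recursion (coefficients $0.5$, $0.4$) stands in for the tvARCH process, and the thresholds $\gamma_j$ and moment exponent $\delta$ are arbitrary illustrative values; both sides of $P(X_t>\gamma_0\text{ or }\dots)\le\sum_j E|X_{t-j}|^{\delta}/\gamma_j^{\delta}$ are estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stationary ARCH(1)-type process; coefficients, delta and
# thresholds g are illustrative only (not from the paper).
n, p, delta = 200_000, 3, 0.5
g = np.array([5.0, 6.0, 7.0])

Z = rng.standard_normal((n, p)) ** 2          # squared innovations, E[Z] = 1
X = np.empty((n, p))
X[:, 0] = Z[:, 0] * 0.5 / (1 - 0.4)           # crude start at the stationary mean level
for j in range(1, p):
    X[:, j] = Z[:, j] * (0.5 + 0.4 * X[:, j - 1])

# Left side: probability that any of the p consecutive values exceeds its threshold.
lhs = np.mean((X > g).any(axis=1))
# Right side: the union bound combined with Markov's inequality.
rhs = sum(np.mean(X[:, j] ** delta) / g[j] ** delta for j in range(p))
assert lhs <= rhs
```

In this setting the Markov bound is quite loose (the estimated exceedance probability is far below the bound), which is why the paper optimises over the thresholds $\gamma_i$ via Lemma 4.5.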

PROOF of Lemma 3.1. We first prove (14) for $s=0$. Suppose $k\ge 1$; focusing on the first element of $X_{t+k}^{t+k-p+1}$ in (13) and factoring out $Z_{t+k}$ gives

$$X_{t+k}=Z_{t+k}\Big\{a_0(t+k)+\sum_{r=0}^{k-2}\Big[\prod_{i=1}^{r}A_{t+k-i}(Z)\,b_{t+k-r-1}(Z)\Big]_1+\Big[\prod_{i=1}^{k-1}A_{t+k-i}(Z)\,X_t^{t-p+1}\Big]_1\Big\},$$

which is (14) (with $s=0$). To prove (14) for $1\le s\le p$, we note that using the tvARCH(p) representation in (11) together with (14) for $s=0$ gives

$$X_{t+k+s}=Z_{t+k+s}\Big\{a_0(t+k+s)+\sum_{i=1}^{s-1}a_i(t+k+s)X_{t+k+s-i}+\sum_{i=s}^{p}a_i(t+k+s)X_{t+k+s-i}\Big\}=Z_{t+k+s}\{\mathcal P_{s,k,t}(Z)+\mathcal Q_{s,k,t}(Z,X)\},$$

where $\mathcal P_{s,k,t}$ and $\mathcal Q_{s,k,t}$ are defined in (15). Hence this gives (14). Since the $a_j(\cdot)$ and $Z_t$ are positive, it is clear that $\mathcal P_{s,k,t}$ and $\mathcal Q_{s,k,t}$ are positive random variables. ¤

We require the following simple lemma to prove Lemmas 3.3 and 4.4.

Lemma A.4 Suppose that Assumption 3.1(iii) is satisfied. Then for any positive $A$ and $B$ we have

$$\int_{\mathbb R}\Big|\frac{1}{A+B}f_Z\Big(\frac{y}{A+B}\Big)-\frac{1}{A}f_Z\Big(\frac{y}{A}\Big)\Big|\,dy \;\le\; K\Big(\frac{B}{A}+\frac{B}{A+B}\Big). \qquad(58)$$

Suppose that Assumption 3.1(iv) is satisfied. Then for any positive $A$, positive $B:\mathbb R^{r_2+1}\to\mathbb R$ and set $E$ (defined as in (4)) we have

$$\int_{\mathbb R}\sup_{x\in E}\Big|\frac{1}{A+B(x)}f_Z\Big(\frac{y}{A+B(x)}\Big)-\frac{1}{A}f_Z\Big(\frac{y}{A}\Big)\Big|\,dy \;\le\; K\sup_{x\in E}\Big(\frac{B(x)}{A}+\frac{B(x)}{A+B(x)}\Big). \qquad(59)$$

PROOF. To prove (58) we observe that

$$\int_{\mathbb R}\Big|\frac{1}{A+B}f_Z\Big(\frac{y}{A+B}\Big)-\frac{1}{A}f_Z\Big(\frac{y}{A}\Big)\Big|\,dy \;\le\; I+II,$$

where $I=\int_{\mathbb R}\frac{1}{A+B}\big|f_Z\big(\frac{y}{A+B}\big)-f_Z\big(\frac{y}{A}\big)\big|\,dy$ and $II=\int_{\mathbb R}\big|\frac{1}{A+B}-\frac{1}{A}\big|f_Z\big(\frac{y}{A}\big)\,dy$.

To bound $I$, we note that by changing variables with $u=y/(A+B)$, under Assumption 3.1(iii) we get

$$I \le \int_{\mathbb R}\Big|f_Z(u)-f_Z\Big(u\Big(1+\frac{B}{A}\Big)\Big)\Big|\,du \le K\frac{B}{A}.$$

It is straightforward to show $II\le\frac{B}{A+B}$. Hence the bounds for $I$ and $II$ give (58). The proof of (59) is the same as above, but uses Assumption 3.1(iv) instead of Assumption 3.1(iii); we omit the details. ¤

PROOF of Lemma 3.3. We first show that

$$\int_{\mathbb R}\sup_{\bar y_{s-1}\in\mathbb R^s}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},z,x)\,dy_s \;\le\; \frac{K}{\inf_{t\in\mathbb Z}a_0(t)}\,\mathcal Q_{s,k,t}(z,x) \qquad(60)$$

and use this to prove (19). We note that when $x=0$, $\mathcal Q_{s,k,t}(z,0)=0$ and $f_{s,k,t}(y_s|\bar y_{s-1},z,0)=\frac{1}{\mathcal P_{s,k,t}(z)}f_Z\big(\frac{y_s}{\mathcal P_{s,k,t}(z)}\big)$. Therefore using (16) gives

$$\mathcal D_{s,k,t}(y_s|\bar y_{s-1},z,x)=\Big|\frac{1}{\mathcal P_{s,k,t}(z)+\mathcal Q_{s,k,t}(z,x)}f_Z\Big(\frac{y_s}{\mathcal P_{s,k,t}(z)+\mathcal Q_{s,k,t}(z,x)}\Big)-\frac{1}{\mathcal P_{s,k,t}(z)}f_Z\Big(\frac{y_s}{\mathcal P_{s,k,t}(z)}\Big)\Big|.$$

Now recalling that $\mathcal P_{s,k,t}$ and $\mathcal Q_{s,k,t}$ are both positive, setting $A=\mathcal P_{s,k,t}(z)$ and $B=\mathcal Q_{s,k,t}(z,x)$ and using (58), we have

$$\int_{\mathbb R}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},z,x)\,dy_s \;\le\; K\Big(\frac{\mathcal Q_{s,k,t}(z,x)}{\mathcal P_{s,k,t}(z)}+\frac{\mathcal Q_{s,k,t}(z,x)}{\mathcal P_{s,k,t}(z)+\mathcal Q_{s,k,t}(z,x)}\Big).$$

Finally, since $\mathcal P_{s,k,t}(z)\ge\inf_{t\in\mathbb Z}a_0(t)$, we have $\int_{\mathbb R}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},z,x)\,dy_s\le K\frac{\mathcal Q_{s,k,t}(z,x)}{\inf_{t\in\mathbb Z}a_0(t)}$, thus giving (60).

We now use (60) to prove (19). Substituting (60) into the integral on the left-hand side of (17), and using that $E[\mathcal Q_{s,k,t}(Z,x)]=\mathcal Q_{s,k,t}(1_{k-1},x)$, gives

$$\int_{\mathbb R^{k}}\prod_{i=1}^{k-1}f_Z(z_i)\Big\{\int_{\mathbb R}\sup_{\bar y_{s-1}\in\mathbb R^s}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},z,x)\,dy_s\Big\}\,dz \;\le\; K\frac{E[\mathcal Q_{s,k,t}(Z,x)]}{\inf_{t\in\mathbb Z}a_0(t)} = K\frac{\mathcal Q_{s,k,t}(1_{k-1},x)}{\inf_{t\in\mathbb Z}a_0(t)}. \qquad(61)$$

We now find a bound for $\mathcal Q_{s,k,t}$. By the definition of $\mathcal Q_{s,k,t}$ in (15) and using the matrix norm

inequality $[Ax]_1\le K\|A\|_{\rm spec}\|x\|$ (where $\|\cdot\|_{\rm spec}$ denotes the spectral norm), we have

$$\mathcal Q_{s,k,t}(1_{k-1},x)=\sum_{i=s+1}^{p}a_i(t+k+s)\Big[\prod_{d=0}^{k+s-i}A_{t+k+s-i-d}\;x\Big]_1 \;\le\; \frac{K}{\inf_{t\in\mathbb Z}a_0(t)}\sum_{i=s+1}^{p}a_i(t+k+s)\Big\|\prod_{d=0}^{k+s-i}A_{t+k+s-i-d}\Big\|_{\rm spec}\|x\|.$$

To bound the above, we note that by Assumption 3.1(i), $\sup_{t\in\mathbb Z}\sum_{j=1}^{p}a_j(t)\le 1-\eta$; therefore there exists an $\tilde\eta$, with $0<\tilde\eta<\eta<1$, such that for all $t$ we have $\big\|\prod_{d=0}^{k+s-i}A_{t+k+s-i-d}\big\|_{\rm spec}\le K(1-\tilde\eta)^{k+s-i}$, for some finite $K$. Altogether this gives

$$\mathcal Q_{s,k,t}(1_{k-1},x) \;\le\; \frac{K}{\inf_{t\in\mathbb Z}a_0(t)}\sum_{i=s+1}^{p}a_i(t+k+s)\,(1-\tilde\eta)^{k+s-i}\|x\|. \qquad(62)$$

Substituting the above into (61) gives (19).

We now prove (20). We use the same argument as in the proof of (60), but apply (59) instead of (58), to obtain

$$\int_{\mathbb R}\sup_{\bar y_{s-1}\in\mathbb R^s}\sup_{x\in E}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},z,x)\,dy_s \;\le\; \frac{K}{\inf_{t\in\mathbb Z}a_0(t)}\sup_{x\in E}\mathcal Q_{s,k,t}(z,x).$$

By substituting the above into (18) and using the same argument as in the proof of (19), we obtain

$$\sum_{s=0}^{p-1}\int_{\mathbb R^{k}}\prod_{i=1}^{k-1}f_Z(z_i)\Big\{\int_{\mathbb R}\sup_{\bar y_{s-1}\in\mathbb R^s}\sup_{x\in E}\mathcal D_{s,k,t}(y_s|\bar y_{s-1},z,x)\,dy_s\Big\}\,dz \;\le\; K\sum_{s=0}^{p-1}\frac{E[\sup_{x\in E}\mathcal Q_{s,k,t}(Z,x)]}{\inf_{t\in\mathbb Z}a_0(t)}. \qquad(63)$$

Since $\mathcal Q_{s,k,t}(Z,x)$ is positive and coordinatewise increasing in $x$, $\sup_{x\in E}\mathcal Q_{s,k,t}(Z,x)=\mathcal Q_{s,k,t}(Z,\gamma)$, and therefore $E[\sup_{x\in E}\mathcal Q_{s,k,t}(Z,x)]=\sup_{x\in E}E[\mathcal Q_{s,k,t}(Z,x)]=\sup_{x\in E}\mathcal Q_{s,k,t}(1_{k-1},x)$; hence, by using (62), we have

$$\frac{E[\sup_{x\in E}\mathcal Q_{s,k,t}(Z,x)]}{\inf_{t\in\mathbb Z}a_0(t)} \;\le\; \frac{K(1-\tilde\eta)^{k}\sup_{x\in E}\|x\|}{\inf_{t\in\mathbb Z}a_0(t)}.$$

Substituting the above into (63) gives (20). ¤
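The geometric decay of the norms of the companion-matrix products, which drives the $(1-\tilde\eta)^{k}$ factor in the proof above, can be observed numerically. A minimal sketch, with hypothetical time-varying coefficients satisfying $\sum_j a_j(t)=1-\eta$ (the specific values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
p, eta = 3, 0.2

def companion(c):
    # Companion-type matrix of a tvARCH(p): coefficients in the first row,
    # shifted identity below (mirroring the matrices A_t used in the proof).
    A = np.zeros((p, p))
    A[0, :] = c
    A[1:, :-1] = np.eye(p - 1)
    return A

def random_coeffs():
    # Hypothetical coefficients a_1(t), ..., a_p(t) with sum equal to 1 - eta.
    w = rng.uniform(0.1, 1.0, size=p)
    return (1 - eta) * w / w.sum()

P = np.eye(p)
norms = []
for _ in range(60):
    P = companion(random_coeffs()) @ P
    norms.append(np.linalg.norm(P, 2))   # spectral norm of the k-fold product

# Each block of p steps contracts the max-norm by a factor (1 - eta), so the
# spectral norm of the 60-fold product is at most of order (1 - eta)^{60/p}.
assert norms[-1] < 0.1 < norms[0]
```

Note that individual matrices have spectral norm exceeding one (they contain an identity block); the contraction only emerges over blocks of $p$ consecutive factors, which is why the bound in the proof is geometric in $k$ rather than in single steps.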

A.3 Proofs in Section 4

PROOF of Lemma 4.1. We first observe from (21) and the definition of $d_k$ that

$$X_k=Z_k\Big[a_0+\sum_{j=1}^{k-1}a_jX_{k-j}+d_k(X)\Big]. \qquad(64)$$

Therefore $X_1=Z_1[a_0+d_1(X)]$, which we use to obtain $X_2=Z_2[a_0+a_1X_1+d_2(X)]=Z_2[a_0+a_0a_1Z_1+a_1Z_1d_1(X)+d_2(X)]$. Continuing the iteration we get the general expansion in (22). To prove the recursion

$$\mathcal Q_{0,k}(Z,X)=\sum_{j=1}^{k}a_j\,\mathcal Q_{0,k-j}(Z,X)Z_{k-j}+d_k(X), \qquad(65)$$

we substitute the representation $X_k=\mathcal P_{0,k}(Z)Z_k+\mathcal Q_{0,k}(Z,X)Z_k$ into (64). By using induction it is straightforward to show that $\mathcal Q_{0,k}(Z,X)$ satisfies the recursion. ¤

PROOF of Lemma 4.2. Using (21) and (22) we have

$$X_{k+s}=Z_{k+s}\Big[a_0+\sum_{j=1}^{s}a_jX_{k+s-j}+\sum_{j=s+1}^{k+s-1}a_jX_{k+s-j}+d_{k+s}(X)\Big]$$
$$=Z_{k+s}\Big\{a_0+\sum_{j=1}^{s}a_jX_{k+s-j}+\sum_{j=s+1}^{k+s-1}a_jZ_{k+s-j}\,\mathcal P_{0,k+s-j}(Z)+\sum_{j=s+1}^{k+s-1}a_jZ_{k+s-j}\,\mathcal Q_{0,k+s-j}(Z,X)+d_{k+s}(X)\Big\}$$
$$=Z_{k+s}\{\mathcal P_{s,k}(Z)+\mathcal Q_{s,k}(Z,X)\},$$

where $\mathcal P_{s,k}$ and $\mathcal Q_{s,k}$ are defined in (25), thus giving the desired result. ¤

PROOF of Lemma 4.3. We first prove that

$$\Big(1-\sum_{j=1}^{\infty}a_jz^j\Big)^{-1}=\sum_{j=0}^{\infty}\psi_jz^j. \qquad(66)$$

Let $a(z)=1-\sum_{j=1}^{\infty}a_jz^j$; since the coefficients of $a(z)$ are absolutely summable, $a(z)$ is analytic in $|z|<1$ and continuous on $|z|\le 1$. Because $\sum_{j=1}^{\infty}a_j<1$, for all $|z|\le 1$ we have $\big|\sum_{j=1}^{\infty}a_jz^j\big|<1$, and thus $|a(z)|\ge 1-\big|\sum_{j=1}^{\infty}a_jz^j\big|>0$, which means that $a(z)$ has no zeros in the closed unit disc. Hence for $|z|\le 1$, $a(z)$ has a reciprocal and there exist coefficients $\{\psi_j\}$ such that (66) is true. Furthermore, Hannan and Kavalieris (1986) show that if $\sum_jj|a_j|<\infty$, then $\sum_jj|\psi_j|<\infty$. We now prove (27) by using the recursion

$$\mathcal Q_{s,k}(1_{k-1},x)=\sum_{j=1}^{k}a_{j+s}\,\mathcal Q_{0,k-j}(1_{k-j-1},x)+d_{k+s}(x),\qquad s\ge 0. \qquad(67)$$

We observe that, by using the above and setting $\mathcal Q_{0,k}(1_{k-1},x)=0$ for $k\le 0$, the sequence $\mathcal Q_{0,k}(1_{k-1},x)$ satisfies

$$\mathcal Q_{0,k}(1_{k-1},x)=\sum_{j=1}^{\infty}a_j\,\mathcal Q_{0,k-j}(1_{k-j-1},x)+d_k(x),\qquad\text{for }k>0.$$

Rewriting the above using backshift notation we have $a(B)\,\mathcal Q_{0,k}(1_{k-1},x)=d_k(x)$, where $B$ is the backshift operator. Since the reciprocal of $a(z)$ is well defined for $|z|\le 1$, we have $\mathcal Q_{0,k}(1_{k-1},x)=a(B)^{-1}d_k(x)=\sum_{j=0}^{\infty}\psi_jd_{k-j}(x)=\sum_{j=0}^{k-1}\psi_jd_{k-j}(x)$, and thus we obtain the desired result. ¤

PROOF of Lemma 4.4. Using the density $f_{0,k}$ derived in (26) and the same method used to prove (60), we have

$$\int_{\mathbb R^{k-1}}\prod_{i=1}^{k-1}f_Z(z_i)\Big\{\int_{\mathbb R}\big|f_{0,k}(y|z,x)-f_{0,k}(y|z,0)\big|\,dy\Big\}\,dz \;\le\; \frac{K}{a_0}\int_{\mathbb R^{k-1}}\prod_{i=1}^{k-1}f_Z(z_i)\,\mathcal Q_{0,k}(z,x)\,dz = \frac{K}{a_0}E[\mathcal Q_{0,k}(Z,x)].$$

By noting that $E[\mathcal Q_{0,k}(Z,x)]=\mathcal Q_{0,k}(1_{k-1},x)$ and using (27), we obtain (28).

To prove (29) we use (26) and the same method used to prove (60); this gives

$$\int_{\mathbb R^{k-1}}\prod_{i=1}^{k-1}f_Z(z_i)\Big\{\int_{\mathbb R}\sup_{\bar y_{s-1}\in\mathbb R^s}\mathcal D_{s,k}(y_s|\bar y_{s-1},z,x)\,dy_s\Big\}\,dz \;\le\; \frac{K}{a_0}\int_{\mathbb R^{k-1}}\prod_{i=1}^{k-1}f_Z(z_i)\,\mathcal Q_{s,k}(z,x)\,dz = \frac{K}{a_0}E[\mathcal Q_{s,k}(Z_1^{k-1},x)] = \frac{K}{a_0}\mathcal Q_{s,k}(1_{k-1},x).$$

By substituting (67) and (27) into the above we have

$$\int_{\mathbb R^{k-1}}\prod_{i=1}^{k-1}f_Z(z_i)\Big\{\int_{\mathbb R}\sup_{\bar y_{s-1}\in\mathbb R^s}\mathcal D_{s,k}(y_s|\bar y_{s-1},z,x)\,dy_s\Big\}\,dz \;\le\; \frac{K}{a_0}\Big\{\sum_{j=s+1}^{k+s}a_j\,\mathcal Q_{0,k+s-j}(1_{k+s-j-1},x)+d_{k+s}(x)\Big\} \;\le\; \frac{K}{a_0}\Big\{\sum_{j=s+1}^{k+s}a_j\sum_{l=0}^{k+s-j-1}|\psi_l|\,d_{k+s-j-l}(x)+d_{k+s}(x)\Big\} = \frac{K}{a_0}\Big\{\sum_{j=s+1}^{k+s}a_j\sum_{l=0}^{k+s-j-1}|\psi_l|\sum_{i=0}^{\infty}a_{k+s-j-l+i}\,x_i+\sum_{i=0}^{\infty}a_{k+s+i}\,x_i\Big\},$$

which gives (29).

The proof of (30) is very similar to the proof of (29), but uses Assumption 3.1(iv) rather than Assumption 3.1(iii); we omit the details. ¤

PROOF of Lemma 4.5. First let us consider the function $h(\gamma)=c\gamma+d\gamma^{-\delta}$; it is clear that $h(\cdot)$ is a convex function with a unique minimum at $\gamma^*=\big(\frac{d\delta}{c}\big)^{1/(1+\delta)}$, where $h(\gamma^*)=\big(\delta^{1/(1+\delta)}+\delta^{-\delta/(1+\delta)}\big)c^{\delta/(1+\delta)}d^{1/(1+\delta)}$. It is straightforward to extend this argument to the function $\bar\gamma\mapsto\sum_{i=0}^{\infty}\big(c_i\gamma_i+d_i\gamma_i^{-\delta}\big)$, which gives the required result. ¤
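The closed-form minimiser in the proof above is easy to verify numerically. A sketch with hypothetical constants $c$, $d$, $\delta$ (any positive values work):

```python
import numpy as np

# Hypothetical constants (not from the paper) for h(gamma) = c*gamma + d*gamma^{-delta}.
c, d, delta = 2.0, 5.0, 0.5

def h(gamma):
    return c * gamma + d * gamma ** (-delta)

# Closed-form minimiser and minimum value from the proof of Lemma 4.5.
gamma_star = (d * delta / c) ** (1.0 / (1.0 + delta))
h_star = (delta ** (1.0 / (1 + delta)) + delta ** (-delta / (1 + delta))) \
    * c ** (delta / (1 + delta)) * d ** (1.0 / (1 + delta))

# A fine grid search confirms both the minimiser and the minimum value.
grid = np.linspace(0.01, 20.0, 200_001)
assert abs(h(gamma_star) - h_star) < 1e-10
assert abs(grid[np.argmin(h(grid))] - gamma_star) < 1e-3
assert h(grid).min() >= h_star - 1e-6
```

This is exactly the trade-off optimised in (36) and (39): $c$ plays the role of the coefficient-decay term multiplying $\gamma_i$, and $d$ the role of $E|X_0|^{\delta}$ multiplying $\gamma_i^{-\delta}$.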

PROOF of Theorem 4.1(i). By substituting $a_j\le Cj^{-q}$ and $|\psi_j|\le Cj^{-q}$ into (33), using, for $0<\mu\le 1$, that $\big(\sum_ig_i\big)^{\mu}\le\sum_ig_i^{\mu}$, and setting $\mu=\frac{\delta}{1+\delta}$, we have

$$\sup_{G\in\sigma(X_k,X_{k+1},\dots),\,H\in\sigma(X_0,X_{-1},\dots)}|P(G\cap H)-P(G)P(H)|$$
$$\le\; K\Big[\sum_{s=0}^{\infty}\sum_{j=s+1}^{k+s}(j+1)^{-\tilde q}\sum_{l=0}^{k+s-j-1}(l+1)^{-\tilde q}\sum_{i=0}^{\infty}(k+s-j-l+i+1)^{-\tilde q}+\sum_{s=0}^{\infty}\sum_{i=0}^{\infty}(k+s+i+1)^{-\tilde q}\Big]$$
$$\le\; K\Big[\sum_{s=0}^{\infty}\sum_{j=s+1}^{k+s}\big[(j+1)(k+s-j+1)\big]^{-\tilde q+2}+(k+1)^{-\tilde q+2}\Big]$$

where $K$ is a finite constant which depends on $E|X_t|^{\delta}$ and $\delta$, and $\tilde q=\frac{q\delta}{1+\delta}$. We note that $f(j)=(j+1)(k+s-j+1)$ is a concave quadratic in $j$, which attains its minimum over $s+1\le j\le k+s$ at one of the boundaries, where $f(s+1)=(s+2)k$ and $f(k+s)=k+s+1$. Using this and $(s+2)k\ge k+s+1$ (which holds for $k\ge 1$), we have

$$\sup_{G\in\sigma(X_k,X_{k+1},\dots),\,H\in\sigma(X_0,X_{-1},\dots)}|P(G\cap H)-P(G)P(H)| \;\le\; K\Big[\sum_{s=0}^{\infty}k(k+s+1)^{-\tilde q+2}+(k+1)^{-\tilde q+2}\Big] \;\le\; K\big[k(k+1)^{-\tilde q+3}+(k+1)^{-\tilde q+2}\big],$$

which proves Theorem 4.1(i). ¤

References

K. B. Athreya and S. G. Pantula. Mixing properties of Harris chains and autoregressive processes. J. Appl. Probab., 23:880–892, 1986.

B. Basrak, R. A. Davis, and T. Mikosch. Regular variation of GARCH processes. Stochastic Processes and their Applications, 99:95–115, 2002.

I. Berkes, S. Hörmann, and J. Schauer. Asymptotic results for the empirical process of stationary sequences. Stochastic Processes and their Applications, 2008. Forthcoming.

D. Bosq. Nonparametric Statistics for Stochastic Processes. Springer, New York, 1998.

P. Bougerol and N. Picard. Stationarity of GARCH processes and some nonnegative time series. J. Econometrics, 52:115–127, 1992.

F. Boussama. Ergodicité, mélange et estimation dans les modèles GARCH. PhD thesis, Université Paris 7, 1998.

R. C. Bradley. Introduction to Strong Mixing Conditions, Volumes 1, 2 and 3. Kendrick Press, 2007.

M. Carrasco and X. Chen. Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory, 18:17–39, 2002.

K. C. Chanda. Strong mixing properties of linear stochastic processes. J. Appl. Probab., 11:401–408, 1974.

R. Dahlhaus and W. Polonik. Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist., 34:2790–2842, 2006.

R. Dahlhaus and S. Subba Rao. Statistical inference for time-varying ARCH processes. Ann. Statist., 34:1074–1114, 2006.

J. Davidson. Stochastic Limit Theory. Oxford University Press, Oxford, 1994.

P. D. Feigin and R. L. Tweedie. Random coefficient autoregressive processes: a Markov chain analysis of stationarity and finiteness of moments. Journal of Time Series Analysis, 6:1–14, 1985.

C. Francq and J.-M. Zakoïan. Mixing properties of a general class of GARCH(1,1) models without moment assumptions. Econometric Theory, 22:815–834, 2006.

P. Fryzlewicz, T. Sapatinas, and S. Subba Rao. Normalised least squares estimation in time-varying ARCH models. Ann. Statist., 36:742–786, 2008.

P. Fryzlewicz and S. Subba Rao. BaSTA: consistent multiscale multiple change-point detection for piecewise-stationary ARCH processes. 2008.

L. Giraitis, P. Kokoszka, and R. Leipus. Stationary ARCH models: dependence structure and central limit theorem. Econometric Theory, 16:3–22, 2000.

L. Giraitis, R. Leipus, and D. Surgailis. Recent advances in ARCH modelling. In A. Kirman and G. Teyssière, editors, Long Memory in Economics, pages 3–39. Berlin, 2005.

L. Giraitis and P. M. Robinson. Whittle estimation of ARCH models. Econometric Theory, 17:608–631, 2001.

V. V. Gorodetskii. On the strong mixing property for linear sequences. Theory of Probability and its Applications, 22:411–413, 1977.

P. Hall and C. C. Heyde. Martingale Limit Theory and its Application. Academic Press, New York, 1980.

E. J. Hannan and L. Kavalieris. Regression, autoregression models. J. Time Series Anal., 7:27–49, 1986.

S. Hörmann. Augmented GARCH sequences: dependence structure and asymptotics. Bernoulli, 14:543–561, 2008.

I. A. Ibragimov. Some limit theorems for stationary processes. Theory of Probability and its Applications, 7:349–382, 1962.

E. Liebscher. Towards a unified approach for proving geometric ergodicity and mixing properties of nonlinear autoregressive processes. J. Time Series Analysis, 26:669–689, 2005.

A. Lindner. Handbook of Financial Time Series, chapter Stationarity, Mixing, Distributional Properties and Moments of GARCH(p,q) Processes. Springer Verlag, Berlin, 2008.

M. Meitz and P. Saikkonen. Ergodicity, mixing, and existence of moments of a class of Markov models with applications to GARCH and ACD models. Econometric Theory, 24:1291–1320, 2008.

S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, 1993.

T. Mikosch and C. Stărică. Long-range dependence effects and ARCH modelling. In P. Doukhan, G. Oppenheim, and M. S. Taqqu, editors, Theory and Applications of Long-Range Dependence, pages 439–459. Birkhäuser, Boston, 2003.

A. Mokkadem. Propriétés de mélange des processus autorégressifs polynomiaux. Annales de l'I. H. P., 26:219–260, 1990.

D. T. Pham. The mixing property of bilinear and generalised random coefficient autoregressive models. Stochastic Processes and their Applications, 23:291–300, 1986.

D. T. Pham and T. T. Tran. Some mixing properties of time series models. Stochastic Processes and their Applications, 19:297–303, 1985.

P. M. Robinson and P. Zaffaroni. Pseudo-maximum likelihood estimation of ARCH(∞) models. Ann. Statist., 34:1049–1074, 2006.

P. M. Robinson. Testing for strong serial correlation and dynamic conditional heteroskedasticity in multiple regression. Journal of Econometrics, 47:67–78, 1991.

A. A. Sorokin. Uniform bound for strong mixing coefficient and maximum of residual empirical process of ARCH sequences (in Russian). ArXiv (0610747), 2006.

D. Straumann and T. Mikosch. Quasi-maximum likelihood estimation in conditionally heteroscedastic time series: a stochastic recurrence equation approach. Ann. Statist., 34:2449–2495, 2006.

S. Subba Rao. A note on uniform convergence of an ARCH(∞) estimator. Sankhyā, 68:600–620, 2006.

D. Tjøstheim. Non-linear time series and Markov chains. Adv. in Appl. Probab., 22:587–611, 1990.

V. A. Volkonskii and Yu. A. Rozanov. Some limit theorems for random functions I. Theory Probab. Appl., 4:178–197, 1959.