
Ruprecht-Karls-Universität Heidelberg

Fakultät für Mathematik und Informatik

Bachelor's thesis

for the attainment of the academic degree

Bachelor of Science (B. Sc.)

GARCH(1,1) models

submitted by

Brandon Williams

15 July 2011

Supervisor: Prof. Dr. Rainer Dahlhaus

Abstrakt

In dieser Bachelorarbeit werden GARCH(1,1)-Modelle zur Analyse finanzieller Zeitreihen untersucht. Dabei werden zuerst hinreichende und notwendige Bedingungen dafür gegeben, dass solche Prozesse überhaupt stationär werden können. Danach werden asymptotische Ergebnisse über relevante Schätzer hergeleitet und parametrische Tests entwickelt. Die Methoden werden am Ende durch ein Datenbeispiel illustriert.

Abstract

In this thesis, GARCH(1,1) models for the analysis of financial time series are investigated. First, sufficient and necessary conditions will be given for the process to have a stationary solution. Then, asymptotic results for relevant estimators will be derived and used to develop parametric tests. Finally, the methods will be illustrated with an empirical example.

Contents

1 Introduction
2 Stationarity
3 A central limit theorem
4 Parameter estimation
5 Tests
6 Variants of the GARCH(1,1) model
7 GARCH(1,1) in continuous time
8 Example with MATLAB
9 Discussion

1 Introduction

Modelling financial time series is a major application and area of research in econometrics and mathematical finance. One of the challenges particular to this field is the presence of heteroskedastic effects, meaning that the volatility of the considered process is generally not constant. Here the volatility is the square root of the conditional variance of the log return process given its previous values. That is, if $P_t$ is the time series evaluated at time $t$, one defines the log returns

\[ X_t = \log P_{t+1} - \log P_t \]
and the volatility $\sigma_t$, where
\[ \sigma_t^2 = \mathrm{Var}[X_t \mid \mathcal{F}_{t-1}] \]
and $\mathcal{F}_{t-1}$ is the $\sigma$-algebra generated by $X_0, \dots, X_{t-1}$. Heuristically, it makes sense that the volatility of such processes should change over time, due to any number of economic and political factors, and this is one of the well known "stylized facts" of mathematical finance.
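As a quick illustration, the log returns can be computed directly from a price series; the prices below are hypothetical placeholder values, not real data.

```python
import numpy as np

# Hypothetical closing prices P_t (placeholder values, not real data)
P = np.array([100.0, 101.5, 99.8, 102.3, 103.0])

# Log returns X_t = log P_{t+1} - log P_t
X = np.diff(np.log(P))

print(X)
```

Note that the log returns telescope: their sum equals the log ratio of the last and first prices.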

The presence of heteroskedasticity is ignored in some financial models such as the Black-Scholes model, which is widely used to determine the fair pricing of European-style options. While this leads to an elegant closed-form formula, it makes assumptions about the distribution and stationarity of the underlying process which are unrealistic in general. Another commonly used homoskedastic model is the Ornstein-Uhlenbeck process, which is used in finance to model interest rates and credit markets. This application is known as the Vasicek model and suffers from the homoskedastic assumption as well.

ARCH (autoregressive conditional heteroskedasticity) models were introduced by Robert Engle in a 1982 paper to account for this behavior. Here the conditional variance process is given an autoregressive structure and the log returns are modelled as a white noise multiplied by the volatility:

\[ X_t = e_t\sigma_t \]
\[ \sigma_t^2 = \omega + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2, \]
where the $e_t$ (the 'innovations') are i.i.d. with expectation 0 and variance 1 and are assumed independent from $\sigma_k$ for all $k \le t$. The lag length $p \ge 0$ is part of the model specification and may be determined using the Box-Pierce or similar tests for significance, where the case $p = 0$ corresponds to a white noise process. To ensure that $\sigma_t^2$ remains positive, $\omega, \alpha_i \ge 0$ for all $i$ is required.

Tim Bollerslev (1986) extended the ARCH model to allow $\sigma_t^2$ to have an additional autoregressive structure within itself. The GARCH(p,q) (generalized ARCH) model is given by

\[ X_t = e_t\sigma_t \]
\[ \sigma_t^2 = \omega + \alpha_1 X_{t-1}^2 + \dots + \alpha_p X_{t-p}^2 + \beta_1\sigma_{t-1}^2 + \dots + \beta_q\sigma_{t-q}^2. \]
This model, in particular the simpler GARCH(1,1) model, has become widely used in financial time series modelling and is implemented in most statistics and econometrics software packages. GARCH(1,1) models are favored over other stochastic volatility models by many economists due

to their relatively simple implementation: since they are given by stochastic difference equations in discrete time, the estimation is easier to handle than for continuous-time models, and financial data is in any case generally gathered at discrete intervals.
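The GARCH(1,1) recursion is straightforward to simulate. The following is a minimal sketch assuming standard normal innovations and illustrative parameter values; `simulate_garch11` is a hypothetical helper, not part of any library.

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n, seed=None, sigma2_0=None):
    """Simulate a GARCH(1,1) path X_t = e_t * sigma_t with N(0,1) innovations.

    sigma2_0 defaults to the stationary variance omega / (1 - alpha - beta),
    which requires alpha + beta < 1.
    """
    rng = np.random.default_rng(seed)
    if sigma2_0 is None:
        sigma2_0 = omega / (1.0 - alpha - beta)
    e = rng.standard_normal(n)
    sigma2 = np.empty(n)
    X = np.empty(n)
    s2 = sigma2_0
    for t in range(n):
        sigma2[t] = s2
        X[t] = e[t] * np.sqrt(s2)
        s2 = omega + alpha * X[t] ** 2 + beta * s2   # conditional-variance recursion
    return X, sigma2

X, s2 = simulate_garch11(omega=0.1, alpha=0.1, beta=0.85, n=5000, seed=0)
print(X.mean(), X.var())
```

With $\alpha + \beta < 1$ the sample variance of a long path should be close to $\omega/(1-\alpha-\beta)$, the stationary value derived in section 2.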

However, there are also improvements to be made on the standard GARCH model. A notable problem is the model's inability to react differently to positive and negative innovations, whereas in reality volatility tends to increase more after a large negative shock than after an equally large positive shock. This is known as the leverage effect, and possible solutions to this problem are discussed further in section 6.

Without loss of generality, the time t will be assumed in the following sections to take values in either N0 or in Z.

2 Stationarity

The first task is to determine suitable parameter sets for the model. In the introduction, we considered that $\omega, \alpha, \beta \ge 0$ is necessary to ensure that the conditional variance $\sigma_t^2$ remains non-negative at all times $t$. It is also important to find parameters $\omega, \alpha, \beta$ which ensure that $\sigma_t^2$ has finite variance or higher moments. Another consideration which will be important when studying the asymptotic properties of GARCH models is whether $\sigma_t^2$ converges to a stationary distribution. Unfortunately, we will see that these conditions translate to rather severe restrictions on the choice of parameters.

Definition 1. A process $X_t$ is called stationary (strictly stationary) if for all times $t_1, \dots, t_n$ and all $h \in \mathbb{Z}$:
\[ F_X(x_{t_1+h}, \dots, x_{t_n+h}) = F_X(x_{t_1}, \dots, x_{t_n}), \]
where $F_X(x_{t_1}, \dots, x_{t_n})$ is the joint cumulative distribution function of $X_{t_1}, \dots, X_{t_n}$.

Theorem 2. Let $\omega > 0$ and $\alpha, \beta \ge 0$. Then the GARCH(1,1) equations have a stationary solution if and only if $E[\log(\alpha e_t^2+\beta)] < 0$. In this case the solution is uniquely given by
\[
\sigma_t^2 = \omega\Big(1+\sum_{j=1}^{\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big).
\]

Proof. With the equation $\sigma_t^2 = \omega + (\alpha e_{t-1}^2+\beta)\sigma_{t-1}^2$, by repeatedly substituting for $\sigma_{t-1}^2$, etc., we arrive at the equation
\[
\sigma_t^2 = \omega\Big(1+\sum_{j=1}^{k}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big) + \Big(\prod_{i=1}^{k+1}(\alpha e_{t-i}^2+\beta)\Big)\sigma_{t-k-1}^2,
\]
which is valid for all $k \in \mathbb{N}$. In particular,
\[
\sigma_t^2 \ge \omega\Big(1+\sum_{j=1}^{k}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big),
\]

since $\alpha, \beta \ge 0$. Assume that $\sigma_t^2$ is a stationary solution and that $E[\log(\alpha e_t^2+\beta)] \ge 0$. We have
\[
\log E\Big[\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big] \ge E\Big[\log\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big] = \sum_{i=1}^{j}E[\log(\alpha e_{t-i}^2+\beta)],
\]
and therefore, if $E[\log(\alpha e_t^2+\beta)] > 0$, then the product $\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)$ diverges a.s. by the strong law of large numbers. In the case that $E[\log(\alpha e_t^2+\beta)] = 0$, the partial sums $\sum_{i=1}^{j}\log(\alpha e_{t-i}^2+\beta)$ form a random walk, so that
\[
\limsup_{j\to\infty}\sum_{i=1}^{j}\log(\alpha e_{t-i}^2+\beta) = \infty \quad a.s.,
\]
so that in both cases we have
\[
\limsup_{j\to\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta) = \infty \quad a.s.
\]

Since all terms are nonnegative, we then have

\[
\sigma_t^2 \ge \limsup_{j\to\infty}\,\omega\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta) = \infty \quad a.s.,
\]
which is impossible; therefore, $E[\log(\alpha e_t^2+\beta)] < 0$ is necessary for the existence of a stationary solution. On the other hand, let $E[\log(\alpha e_t^2+\beta)] < 0$. Then there exists a $\xi > 1$ with $\log\xi + E[\log(\alpha e_t^2+\beta)] < 0$. For this $\xi$ we have by the strong law of large numbers:
\[
\log\xi + \frac{1}{n}\sum_{i=1}^{n}\log(\alpha e_{t-i}^2+\beta) \xrightarrow{a.s.} \log\xi + E[\log(\alpha e_t^2+\beta)] < 0,
\]
so
\[
\log\Big(\xi^n\prod_{i=1}^{n}(\alpha e_{t-i}^2+\beta)\Big) = n\Big(\log\xi + \frac{1}{n}\sum_{i=1}^{n}\log(\alpha e_{t-i}^2+\beta)\Big) \xrightarrow{a.s.} -\infty,
\]
and
\[
\xi^n\prod_{i=1}^{n}(\alpha e_{t-i}^2+\beta) \xrightarrow{a.s.} 0.
\]
Therefore, the series
\[
\omega\Big(1+\sum_{j=1}^{\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big)
\]
converges a.s. To show uniqueness, assume that $\sigma_t$ and $\hat\sigma_t$ are both stationary solutions: then
\[
|\sigma_t^2 - \hat\sigma_t^2| = (\alpha e_{t-1}^2+\beta)|\sigma_{t-1}^2 - \hat\sigma_{t-1}^2| = \Big(\xi^n\prod_{i=1}^{n}(\alpha e_{t-i}^2+\beta)\Big)\,\xi^{-n}\,|\sigma_{t-n}^2 - \hat\sigma_{t-n}^2| \xrightarrow{P} 0.
\]

This implies that $P(|\sigma_t - \hat\sigma_t| > \epsilon) = 0$ for all $\epsilon > 0$, so $\sigma_t = \hat\sigma_t$ a.s.

Corollary 3. The GARCH(1,1) equations with $\omega > 0$ and $\alpha, \beta \ge 0$ have a stationary solution with finite expected value if and only if $\alpha + \beta < 1$, and in this case $E[\sigma_t^2] = \frac{\omega}{1-\alpha-\beta}$.

Proof. Since $E[\log(\alpha e_t^2+\beta)] \le \log(E[\alpha e_t^2+\beta]) = \log(\alpha+\beta) < 0$, the conditions of Theorem 2 are fulfilled. We have
\[
E[\sigma_t^2] = E\Big[\omega\Big(1+\sum_{j=1}^{\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big)\Big] = \omega\Big(1+\sum_{j=1}^{\infty}E\Big[\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big]\Big) = \omega\Big(1+\sum_{j=1}^{\infty}(\alpha+\beta)^j\Big) = \frac{\omega}{1-\alpha-\beta}
\]
if this series converges, that is, if $\alpha + \beta < 1$, and $= \infty$ otherwise.
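The stationarity condition of Theorem 2 is easy to check by Monte Carlo for standard normal innovations; the parameter values below are illustrative only.

```python
import numpy as np

# Monte Carlo check of the stationarity condition E[log(alpha*e_t^2 + beta)] < 0
# for standard normal innovations (illustrative parameter values).
rng = np.random.default_rng(1)
e2 = rng.standard_normal(500_000) ** 2

alpha, beta = 0.1, 0.85
lyap = np.log(alpha * e2 + beta).mean()   # estimate of E[log(alpha*e_t^2 + beta)]

# Jensen: E[log(alpha*e^2 + beta)] <= log(alpha + beta), so alpha + beta < 1 suffices.
print(lyap, np.log(alpha + beta))
```

The Monte Carlo estimate should lie strictly below $\log(\alpha+\beta)$, illustrating that the log-condition is weaker than $\alpha + \beta < 1$.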

Remark 4. This theorem shows that strictly stationary IGARCH(1,1) processes (those where $\alpha+\beta = 1$) exist. For example, if $e_t$ is standard normally distributed and $\alpha = 1$, $\beta = 0$, then
\[ E[\log(\alpha e_t^2+\beta)] = E[\log e_t^2] = -(\gamma + \log 2) < 0, \]
where $\gamma \approx 0.577$ is the Euler-Mascheroni constant. Therefore, the equations $X_t = e_t\sigma_t$, $\sigma_t^2 = X_{t-1}^2$, or equivalently $X_t = e_tX_{t-1}$, define a strictly stationary process which has infinite variance at every $t$. On the other hand, $\sigma_t^2 = \sigma_{t-1}^2$ has no stationary solution.

In some applications, we may require that the GARCH process have finite higher-order moments; for example, when studying its tail behavior it is useful to study its excess kurtosis, which requires the fourth moment to exist and be finite. This leads to further restrictions on the coefficients $\alpha$ and $\beta$. For a stationary GARCH process,

\[
E[X_t^4] = E[e_t^4]E[\sigma_t^4] = \omega^2 E[e_t^4]\,E\Big[\Big(1+\sum_{j=1}^{\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big)^2\Big]
= \omega^2 E[e_t^4]\,E\Big[1 + 2\sum_{j=1}^{\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta) + \sum_{k=1}^{\infty}\sum_{l=1}^{\infty}\prod_{i=1}^{k}(\alpha e_{t-i}^2+\beta)\prod_{j=1}^{l}(\alpha e_{t-j}^2+\beta)\Big]
\]
\[
= \omega^2 E[e_t^4]\Big(1 + 2\sum_{j=1}^{\infty}(\alpha+\beta)^j + \sum_{k=1}^{\infty}\sum_{l=1}^{\infty}E[(\alpha e_t^2+\beta)^2]^{\,k\wedge l}\,(\alpha+\beta)^{\,k\vee l - k\wedge l}\Big),
\]

which is finite if and only if $\rho^2 := E[(\alpha e_t^2+\beta)^2] < 1$. In this case, using the recursion $\sigma_t^2 = \omega + \sigma_{t-1}^2(\alpha e_{t-1}^2+\beta)$,

\[
E[\sigma_t^4] = \omega^2 + 2\omega E[\alpha e_{t-1}^2+\beta]E[\sigma_{t-1}^2] + E[(\alpha e_{t-1}^2+\beta)^2]E[\sigma_{t-1}^4] = \omega^2 + 2\omega^2\frac{\alpha+\beta}{1-\alpha-\beta} + \rho^2 E[\sigma_t^4],
\]
so
\[
E[X_t^4] = E[\sigma_t^4]E[e_t^4] = \omega^2 E[e_t^4]\,\frac{1+\alpha+\beta}{(1-\rho^2)(1-\alpha-\beta)}.
\]
In the case of normally distributed innovations $e_t$, the condition $\rho^2 < 1$ means

\[
E[(\alpha e_t^2+\beta)^2] = \alpha^2 E[e_t^4] + 2\alpha\beta E[e_t^2] + \beta^2 = 3\alpha^2 + 2\alpha\beta + \beta^2 = (\alpha+\beta)^2 + 2\alpha^2 < 1.
\]

The excess kurtosis of Xt with normally distributed innovations is then

\[
\frac{E[X_t^4]}{\mathrm{Var}[X_t]^2} - 3 = \frac{3(1+\alpha+\beta)\,\omega^2}{(1-\alpha-\beta)(1-2\alpha^2-(\alpha+\beta)^2)\big(\frac{\omega}{1-\alpha-\beta}\big)^2} - 3 = 3\,\frac{1-(\alpha+\beta)^2}{1-2\alpha^2-(\alpha+\beta)^2} - 3 = \frac{6\alpha^2}{1-2\alpha^2-(\alpha+\beta)^2} > 0,
\]

which means that $X_t$ is leptokurtic, or heavy-tailed. This implies that outliers in the GARCH model should occur more frequently than they would with a process of i.i.d. normally distributed variables, which is consistent with empirical studies of financial processes. More generally, for $X_t$ to have a finite $2n$-th moment ($n \in \mathbb{N}$), a necessary and sufficient condition is that $E[(\alpha e_t^2+\beta)^n] < 1$.

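A numerical sanity check of this leptokurtosis (a sketch with illustrative parameters and standard normal innovations; for Gaussian innovations the closed-form excess kurtosis is $6\alpha^2/(1-2\alpha^2-(\alpha+\beta)^2)$):

```python
import numpy as np

def excess_kurtosis_garch11(alpha, beta):
    """Excess kurtosis of a GARCH(1,1) process with N(0,1) innovations;
    requires 2*alpha**2 + (alpha + beta)**2 < 1 for the 4th moment to exist."""
    d = 1.0 - 2.0 * alpha**2 - (alpha + beta)**2
    if d <= 0:
        raise ValueError("fourth moment does not exist")
    return 6.0 * alpha**2 / d

# Compare with the sample excess kurtosis of a simulated path.
rng = np.random.default_rng(5)
omega, alpha, beta = 0.1, 0.1, 0.8
n = 200_000
e = rng.standard_normal(n)
x = np.empty(n)
s2 = omega / (1 - alpha - beta)        # start at the stationary variance
for t in range(n):
    x[t] = e[t] * np.sqrt(s2)
    s2 = omega + alpha * x[t] ** 2 + beta * s2

sample_ex = (x**4).mean() / (x**2).mean() ** 2 - 3.0
print(excess_kurtosis_garch11(alpha, beta), sample_ex)
```

The sample excess kurtosis converges slowly (it needs the 8th moment for its own CLT), so only rough agreement should be expected at this path length.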
Another interesting feature of GARCH processes is the extent to which an innovation $e_t$ at time $t$ persists in the conditional variance at a later time, $\sigma_{t+h}^2$. To consider this mathematically we will use the following definition. For the GARCH(1,1) process $X = (X_t)$, define
\[
\pi_X(t,n) = \sigma_0^2\prod_{i=1}^{t+n}(\alpha e_{t+n-i}^2+\beta) + \omega\sum_{k=n}^{t+n-1}\prod_{j=1}^{k}(\alpha e_{t+n-j}^2+\beta).
\]

Definition 5. The innovation $e_t$ does not persist in $X$ in $L^1$ iff
\[ E[\pi_X(t,n)] \to 0 \quad (n\to\infty), \]
and does not persist almost surely (a.s.) iff
\[ \pi_X(t,n) \xrightarrow{a.s.} 0 \quad (n\to\infty). \]

If every innovation $e_t$ persists in $X$, then we call $X$ persistent. To see how this definition reflects the heuristic meaning of a shock innovation persisting in the conditional variance, consider that for a GARCH time series with finite variance,
\[
E[\pi_X(t,n)] = E\Big[\big(E[\sigma_{t+n}^2] - E[\sigma_{t+n}^2 \mid e_t]\big)\,\frac{1-\alpha-\beta}{\omega(\alpha+\beta)^{n-1}}\prod_{i=1}^{n-1}(\alpha e_{t+n-i}^2+\beta)\Big] = \big(E[\sigma_{t+n}^2] - E[\sigma_{t+n}^2 \mid e_t]\big)\,\frac{1-\alpha-\beta}{\omega},
\]
which tends to zero if and only if $E[\sigma_{t+n}^2] - E[\sigma_{t+n}^2 \mid e_t]$ tends to zero as well.

Theorem 6. (i) If $e_t$ persists in $X$ in $L^1$ for any $t$, then $e_t$ persists in $X$ in $L^1$ for all $t$. This is the case if and only if $\alpha + \beta \ge 1$.
(ii) If $e_t$ persists in $X$ a.s. for any $t$, then $e_t$ persists in $X$ a.s. for all $t$. This is the case if and only if $E[\log(\alpha e_t^2+\beta)] \ge 0$.

Proof. (i) First,
\[
E[\pi_X(t,n)] = \sigma_0^2\prod_{i=1}^{t+n}E[\alpha e_{t+n-i}^2+\beta] + \omega\sum_{k=n}^{t+n-1}\prod_{j=1}^{k}E[\alpha e_{t+n-j}^2+\beta].
\]
For this value to converge to zero (that is, for $e_t$ not to persist), we need $E[\sigma_0^2]$ to be finite, which means $\alpha + \beta < 1$. On the other hand, let $\alpha + \beta < 1$. Then we have
\[
E[\pi_X(t,n)] = \frac{\omega}{1-\alpha-\beta}(\alpha+\beta)^{t+n} + \omega\sum_{k=n}^{t+n-1}(\alpha+\beta)^k = \frac{\omega(\alpha+\beta)^n}{1-\alpha-\beta} \to 0 \quad (n\to\infty),
\]

so $e_t$ does not persist.
(ii) Let $E[\log(\alpha e_t^2+\beta)] < 0$. By the strong law of large numbers,
\[
\frac{1}{n}\sum_{i=1}^{n}\log(\alpha e_{t-i}^2+\beta) \xrightarrow{a.s.} E[\log(\alpha e_t^2+\beta)] < 0,
\]
so
\[
\log\Big(\prod_{i=1}^{n}(\alpha e_{t-i}^2+\beta)\Big) = n\Big(\frac{1}{n}\sum_{i=1}^{n}\log(\alpha e_{t-i}^2+\beta)\Big) \xrightarrow{a.s.} -\infty
\]
and therefore
\[
\prod_{i=1}^{n}(\alpha e_{t-i}^2+\beta) \xrightarrow{a.s.} 0.
\]
This means that we have
\[
\pi_X(t,n) = \underbrace{\sigma_0^2\prod_{i=1}^{t+n}(\alpha e_{t+n-i}^2+\beta)}_{\to 0} + \underbrace{\omega\sum_{k=n}^{t+n-1}\prod_{j=1}^{k}(\alpha e_{t+n-j}^2+\beta)}_{\to 0} \xrightarrow{a.s.} 0.
\]
On the other hand, let $E[\log(\alpha e_t^2+\beta)] \ge 0$. Then by the argument in the proof of Theorem 2, we have
\[
\limsup_{j\to\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta) = \infty \quad a.s.,
\]
so that $\pi_X(t,n)$ cannot converge to zero.

It is a peculiar property of GARCH(1,1) models that the concept of persistence depends strongly on the type of convergence used in the definition. Persistence in $L^1$ is the more intuitive sense, since it excludes pathological volatility processes such as $\sigma_t^2 = 3X_{t-1}^2$, which is strictly stationary since $E[\log(3e_t^2 + 0)] = -(\gamma + \log 2) + \log 3 < 0$.

Definition 7. We call $\pi := \alpha + \beta$ the persistence of a GARCH(1,1) model with parameters $\omega, \alpha, \beta$.

As we have seen earlier, the persistence of the model limits the kurtosis the process can take. Since the estimated best-fit GARCH process to a time series often has persistence close to 1, this severely limits the value of $\alpha$ consistent with existence of the fourth moment. From the representation of $\sigma_t^2$ in Theorem 2, we immediately have:

Theorem 8. The autocorrelation function (ACF) of $\{X_n^2\}$ decays exponentially to zero with rate $\pi$ if $\pi < 1$.
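Theorem 8 can be illustrated on simulated data (illustrative parameters, standard normal innovations): the sample autocorrelations of $X_t^2$ should be positive and shrink as the lag grows.

```python
import numpy as np

# Empirical check that the ACF of X_t^2 decays in the lag for a simulated
# GARCH(1,1) path with persistence pi = alpha + beta = 0.9.
rng = np.random.default_rng(6)
omega, alpha, beta = 0.1, 0.1, 0.8
n = 200_000
e = rng.standard_normal(n)
x = np.empty(n)
s2 = omega / (1 - alpha - beta)       # start at the stationary variance
for t in range(n):
    x[t] = e[t] * np.sqrt(s2)
    s2 = omega + alpha * x[t] ** 2 + beta * s2

y = x**2 - (x**2).mean()              # centered squares

def acf(y, h):
    # Sample autocorrelation of y at lag h
    return (y[:-h] * y[h:]).mean() / (y * y).mean()

print(acf(y, 1), acf(y, 20))
```

The theoretical ratio between successive lags is $\pi$; sampling noise is substantial because the squares are heavy-tailed, so only the qualitative decay should be read off.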

3 A central limit theorem

Having derived the admissible parameter space, we consider the task of estimating the parameters and predicting the values of $X_t$ at future times $t$. Since $X_t$ is centered at every $t$, a natural estimator for its variance is the average of the squares, $\frac{1}{n}\sum_{t=1}^{n}X_t^2$. The following theorem will show that, under a stationarity and moment assumption, this is a consistent and asymptotically normal estimator.

Theorem 9. For a wide-sense stationary GARCH(1,1) process $X_t$ with $\mathrm{Var}[X_t^2] < \infty$, $E[e_t^4] < \infty$ and parameters $\omega, \alpha, \beta$, the following holds:
\[
\frac{1}{\sqrt n}\sum_{t=1}^{n}\big(X_t^2 - E[X_t^2]\big) \xrightarrow{D} \mathcal{N}\Big(0,\ \omega^2\,\frac{1+\alpha+\beta}{(1-\alpha-\beta)^2}\Big(\frac{E[e_t^4](1+\alpha-\beta)+2\beta}{1-\rho^2} - \frac{1}{1-\alpha-\beta}\Big)\Big),
\]
where $\rho^2 := E[(\alpha e_t^2+\beta)^2]$ as in section 2.

Proof. By Corollary 3 the condition $E[X_t^2] < \infty$ implies that $\alpha + \beta < 1$ and we have
\[ E[X_t^2] = E[\sigma_t^2] = \frac{\omega}{1-\alpha-\beta}. \]

Similarly, we have seen that $E[X_t^4]$ is finite if and only if $\rho^2 < 1$ and in this case one has
\[ E[X_t^4] = E[\sigma_t^4]E[e_t^4] = \omega^2 E[e_t^4]\,\frac{1+\alpha+\beta}{(1-\rho^2)(1-\alpha-\beta)}. \]

Define $Y_t := X_t^2 - E[X_t^2]$. Then the variables $\{Y_1, Y_2, \dots\}$ are weakly dependent in the following sense:

Lemma 10. Let $s_1 < s_2 < \dots < s_u < s_u + r = t$ and let $f: \mathbb{R}^u \to \mathbb{R}$ be quadratically integrable and measurable. Then
\[ |\mathrm{Cov}[f(Y_{s_1},\dots,Y_{s_u}),\,Y_t]| \le C\,\sqrt{E[f^2(Y_{s_1},\dots,Y_{s_u})]}\;\rho^r \]
for a constant $C$ which is independent of $s_1, \dots, s_u, r$.

Proof. Let $w = \omega - E[X_t^2]$. Then we have
\[ Y_t = w e_t^2\Big(1 + \sum_{j=1}^{\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big). \]
We define the helper variable
\[ \tilde Y_t = w e_t^2\Big(1 + \sum_{j=1}^{r-1}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big). \]
Then $\tilde Y_t$ is independent of $(Y_{s_1},\dots,Y_{s_u})$ and by the Cauchy-Schwarz inequality:
\[ |\mathrm{Cov}[f(Y_{s_1},\dots,Y_{s_u}),\,Y_t]| = |\mathrm{Cov}[f(Y_{s_1},\dots,Y_{s_u}),\,Y_t - \tilde Y_t]| \le \sqrt{E[f^2(Y_{s_1},\dots,Y_{s_u})]}\,\sqrt{E[(Y_t-\tilde Y_t)^2]}. \]

However, we have
\[
E[(Y_t - \tilde Y_t)^2] = w^2 E\Big[e_t^4\Big(\sum_{j=r}^{\infty}\prod_{i=1}^{j}(\alpha e_{t-i}^2+\beta)\Big)^2\Big]
= w^2 E[e_t^4]\,E\Big[\Big(\prod_{k=1}^{r}(\alpha e_{t-k}^2+\beta)\Big)^2\Big]\,E\Big[\Big(1+\sum_{j=r+1}^{\infty}\prod_{i=r+1}^{j}(\alpha e_{t-i}^2+\beta)\Big)^2\Big]
= \frac{w^2}{\omega^2}\,\rho^{2r}E[e_t^4]E[\sigma_t^4] = \rho^{2r}E[Y_t^2],
\]
and therefore
\[
|\mathrm{Cov}[f(Y_{s_1},\dots,Y_{s_u}),\,Y_t]| \le \underbrace{\sqrt{E[Y_t^2]}}_{C}\,\sqrt{E[f^2(Y_{s_1},\dots,Y_{s_u})]}\;\rho^r.
\]

Similarly, we will need an additional inequality:

Lemma 11. Let $s_1 < s_2 < \dots < s_u < s_u + r = t$ and let $f: \mathbb{R}^u \to \mathbb{R}$ be bounded, measurable and integrable. Then
\[ |\mathrm{Cov}[f(Y_{s_1},\dots,Y_{s_u}),\,Y_tY_{t+h}]| \le C\|f\|_\infty\rho^r \]
for any $h > 0$.

Proof. Define $\tilde Y_t$ as earlier. Then by Hölder's inequality,
\[ |\mathrm{Cov}[f(Y_{s_1},\dots,Y_{s_u}),\,Y_tY_{t+h}]| = |\mathrm{Cov}[f(Y_{s_1},\dots,Y_{s_u}),\,Y_tY_{t+h} - \tilde Y_t\tilde Y_{t+h}]| \le 2\|f\|_\infty E[|Y_tY_{t+h} - \tilde Y_t\tilde Y_{t+h}|]. \]

Using the triangle and Cauchy-Schwarz inequalities, we have
\[
E[|Y_tY_{t+h} - \tilde Y_t\tilde Y_{t+h}|] \le E[|Y_t - \tilde Y_t|\,|Y_{t+h}|] + E[|Y_{t+h} - \tilde Y_{t+h}|\,|\tilde Y_t|]
\le \rho^r\sqrt{E[Y_t^2]}\sqrt{E[Y_{t+h}^2]} + \rho^r\sqrt{E[Y_{t+h}^2]}\sqrt{E[\tilde Y_t^2]},
\]
so that
\[
|\mathrm{Cov}[f(Y_{s_1},\dots,Y_{s_u}),\,Y_tY_{t+h}]| \le \underbrace{2\big(E[Y_t^2] + \sqrt{E[Y_t^2]}\sqrt{E[\tilde Y_t^2]}\big)}_{C}\,\|f\|_\infty\rho^r.
\]

The theorem to be proved is now
\[ \frac{1}{\sqrt n}\sum_{t=1}^{n}Y_t \xrightarrow{D} \mathcal{N}\Big(0,\ \omega^2\,\frac{1+\alpha+\beta}{(1-\alpha-\beta)^2}\Big(\frac{E[e_t^4](1+\alpha-\beta)+2\beta}{1-\rho^2} - \frac{1}{1-\alpha-\beta}\Big)\Big). \]
Define
\[ \sigma^2 := \sum_{k=-\infty}^{\infty}E[Y_0Y_k] = E[Y_0^2] + 2\sum_{k=1}^{\infty}E[Y_0Y_k] \]
and
\[ \tilde\sigma_n^2 := \mathrm{Var}\Big[\frac{1}{\sqrt n}\sum_{t=1}^{n}Y_t\Big]. \]
Then we have
\[ \sigma^2 = E[X_0^4] - E[X_0^2]^2 + 2\sum_{k=1}^{\infty}\big(E[X_0^2X_k^2] - E[X_0^2]E[X_k^2]\big). \]
However,

\[
E[X_0^2X_k^2] = E[e_0^2\sigma_0^2\sigma_k^2]
= E\Big[e_0^2\sigma_0^2\Big(\omega\Big(1+\sum_{j=1}^{k-1}\prod_{i=1}^{j}(\alpha e_{k-i}^2+\beta)\Big) + \sigma_0^2\prod_{i=1}^{k}(\alpha e_{k-i}^2+\beta)\Big)\Big]
= \big(\alpha E[e_t^4]+\beta\big)(\alpha+\beta)^{k-1}E[\sigma_0^4] + \omega E[\sigma_0^2]\,\frac{1-(\alpha+\beta)^k}{1-\alpha-\beta},
\]
so
\[
E[X_0^2X_k^2] - E[X_0^2]E[X_k^2] = (\alpha+\beta)^{k-1}\Big(\big(\alpha E[e_t^4]+\beta\big)E[\sigma_0^4] - (\alpha+\beta)E[\sigma_0^2]^2\Big)
\]
and therefore
\[
\sigma^2 = E[e_t^4]E[\sigma_0^4] - E[\sigma_0^2]^2 + \frac{2}{1-\alpha-\beta}\Big(\big(\alpha E[e_t^4]+\beta\big)E[\sigma_0^4] - (\alpha+\beta)E[\sigma_0^2]^2\Big)
= \omega^2\,\frac{1+\alpha+\beta}{(1-\alpha-\beta)^2}\Big(\frac{E[e_t^4](1+\alpha-\beta)+2\beta}{1-\rho^2} - \frac{1}{1-\alpha-\beta}\Big).
\]

Since $Y_t$ is centered for every $t$, we have
\[
|\tilde\sigma_n^2 - \sigma^2| = \Big|\frac{1}{n}\sum_{s,t=1}^{n}E[Y_sY_t] - \sigma^2\Big| = \Big|E[Y_0^2] + 2\sum_{k=1}^{n-1}\Big(1-\frac{k}{n}\Big)E[Y_0Y_k] - \sigma^2\Big|
\le 2\sum_{k=n}^{\infty}|E[Y_0Y_k]| + 2\sum_{k=1}^{n-1}\frac{k}{n}|E[Y_0Y_k]|
\le \underbrace{2C\sqrt{E[Y_0^2]}\sum_{k=n}^{\infty}\rho^k}_{\to 0} + \underbrace{2C\sqrt{E[Y_0^2]}\sum_{k=1}^{n-1}\frac{k\rho^k}{n}}_{\to 0} \longrightarrow 0 \quad (n\to\infty),
\]
using the weak dependence of the variables $\{Y_k\}$, where the limit of the second sum is due to Kronecker's lemma since $\sum_{k=1}^{\infty}\rho^k < \infty$. This means that $\tilde\sigma_n^2 \to \sigma^2$ for $n\to\infty$. We now define
\[ W_{n,t} := \frac{Y_t}{\tilde\sigma_n\sqrt n} = \frac{X_t^2 - E[X_t^2]}{\tilde\sigma_n\sqrt n} \quad (t \le n). \]
By Slutsky's theorem, it is enough to show that $\sum_{t=1}^{n}W_{n,t} \xrightarrow{D} \mathcal{N}(0,1)$. For $k \le n$ define

\[ v_{n,k} := \mathrm{Var}\Big[\sum_{t=1}^{k}W_{n,t}\Big] - \mathrm{Var}\Big[\sum_{t=1}^{k-1}W_{n,t}\Big]. \]
Then we have
\[
v_{n,k} = \mathrm{Var}[W_{n,k}] + 2\sum_{t=1}^{k-1}\mathrm{Cov}[W_{n,t},W_{n,k}] = \frac{1}{n\tilde\sigma_n^2}\Big(E[Y_0^2] + 2\sum_{t=1}^{k-1}E[Y_0Y_t]\Big) = \frac{1}{n\tilde\sigma_n^2}\Big(\sigma^2 - \underbrace{2\sum_{t=k}^{\infty}E[Y_0Y_t]}_{\to 0}\Big),
\]
so $nv_{n,k}$ tends to 1 as $n\to\infty$ for all $k$. In particular we have $\max_{k\le n}v_{n,k} \to 0$ $(n\to\infty)$ and that $v_{n,k}$ is positive for all $n \ge k > N_0$ for a large enough $N_0 \in \mathbb{N}$. Without loss of generality, we may assume that $v_{n,k} > 0$ holds for all $n, k$. For every $n \in \mathbb{N}$ let $Z_{n,1}, Z_{n,2}, \dots, Z_{n,n}$ be independent random variables, also independent from $W_{n,k}$ for every $k$, and such that $Z_{n,k} \sim \mathcal{N}(0, v_{n,k})$. Since

\[ Z_{n,1} + \dots + Z_{n,n} \sim \mathcal{N}\Big(0, \sum\nolimits_k v_{n,k}\Big) \xrightarrow{D} \mathcal{N}(0,1), \]

it is enough to show that

E[f(Wn,1 + ... + Wn,n) − f(Zn,1 + ... + Zn,n)] → 0 (n → ∞)

for any function $f \in C^3(\mathbb{R})$. Let $f$ be any such function. For $1 \le k \le n$ we define the partial sums

\[ S_{n,k} := W_{n,1} + \dots + W_{n,k-1}, \qquad T_{n,k} := Z_{n,k+1} + \dots + Z_{n,n} \]

(where Sn,1 = Tn,n = 0), as well as

\[ \Delta_{n,k} := E[f(S_{n,k} + W_{n,k} + T_{n,k}) - f(S_{n,k} + Z_{n,k} + T_{n,k})]. \]
Clearly,
\[ E[f(W_{n,1} + \dots + W_{n,n}) - f(Z_{n,1} + \dots + Z_{n,n})] = \sum_{k=1}^{n}\Delta_{n,k}. \]
We now split $\Delta_{n,k} = \Delta_{n,k}^{(1)} - \Delta_{n,k}^{(2)}$ by defining
\[ \Delta_{n,k}^{(1)} := E[f(S_{n,k} + W_{n,k} + T_{n,k})] - E[f(S_{n,k} + T_{n,k})] - \frac{v_{n,k}}{2}E[f''(S_{n,k} + T_{n,k})], \]
\[ \Delta_{n,k}^{(2)} := E[f(S_{n,k} + Z_{n,k} + T_{n,k})] - E[f(S_{n,k} + T_{n,k})] - \frac{v_{n,k}}{2}E[f''(S_{n,k} + T_{n,k})] \]
and consider each term separately. By applying Taylor's theorem to $f$ as a function of $Z_{n,k}$ around 0, there exists a $\xi_{n,k} \in (0,1)$ with

\[ \Delta_{n,k}^{(2)} = \underbrace{E[Z_{n,k}f'(S_{n,k} + T_{n,k})]}_{=0} + E\Big[\frac{Z_{n,k}^2}{2}f''(S_{n,k} + \xi_{n,k}Z_{n,k} + T_{n,k})\Big] - \frac{v_{n,k}}{2}E[f''(S_{n,k} + T_{n,k})], \]
where the first term is zero because $Z_{n,k}$ is independent of $S_{n,k}, T_{n,k}$. By the mean value theorem,

\[ \Big|\sum_{k=1}^{n}\Delta_{n,k}^{(2)}\Big| \le C\sum_{k=1}^{n}E[|Z_{n,k}|^3] = C\sum_{k=1}^{n}v_{n,k}^{3/2} \le C\max_{1\le k\le n}v_{n,k}^{1/2}\,\underbrace{\sum_{k=1}^{n}v_{n,k}}_{=1} \to 0 \quad (n\to\infty). \]

Showing this for $\Delta_{n,k}^{(1)}$ is somewhat more difficult. Similarly to above, we find by Taylor's theorem a random variable $\tau_{n,k} \in (0,1)$ so that
\[ \Delta_{n,k}^{(1)} = E[W_{n,k}f'(S_{n,k} + T_{n,k})] + E\Big[\frac{W_{n,k}^2}{2}f''(S_{n,k} + \tau_{n,k}W_{n,k} + T_{n,k})\Big] - \frac{v_{n,k}}{2}E[f''(S_{n,k} + T_{n,k})], \]
where
\[
E[W_{n,k}f'(S_{n,k} + T_{n,k})] = \sum_{j=1}^{k-1}E[W_{n,k}(f'(S_{n,j+1} + T_{n,k}) - f'(S_{n,j} + T_{n,k}))] = \sum_{j=1}^{k-1}E[W_{n,k}W_{n,j}f''(S_{n,j} + \xi_{n,k,j}W_{n,j} + T_{n,k})]
\]

for a random variable $\xi_{n,k,j} \in (0,1)$, again by Taylor's theorem. Since $v_{n,k} = E[W_{n,k}^2] + 2\sum_{j=1}^{k-1}E[W_{n,k}W_{n,j}]$, we then have
\[
\Delta_{n,k}^{(1)} = \underbrace{\sum_{j=1}^{k-1}E[W_{n,k}W_{n,j}(f''(S_{n,j} + \xi_{n,k,j}W_{n,j} + T_{n,k}) - E[f''(S_{n,k} + T_{n,k})])]}_{=\Delta_{n,k}^{(1,1)}} + \underbrace{\frac{1}{2}E[W_{n,k}^2(f''(S_{n,k} + \tau_{n,k}W_{n,k} + T_{n,k}) - E[f''(S_{n,k} + T_{n,k})])]}_{=\Delta_{n,k}^{(1,2)}}.
\]

For every $d \in \mathbb{N}$ we split $\Delta_{n,k}^{(1,2)}$ as follows:
\[ \Delta_{n,k}^{(1,2)} = \Delta_{n,k,d}^{(1,2,1)} + \Delta_{n,k,d}^{(1,2,2)} + \Delta_{n,k,d}^{(1,2,3)}, \]
where
\[ \Delta_{n,k,d}^{(1,2,1)} = E[W_{n,k}^2(f''(S_{n,k} + \tau_{n,k}W_{n,k} + T_{n,k}) - f''(S_{n,k-d} + T_{n,k}))], \]
\[ \Delta_{n,k,d}^{(1,2,2)} = E[W_{n,k}^2(f''(S_{n,k-d} + T_{n,k}) - E[f''(S_{n,k-d} + T_{n,k})])], \]
\[ \Delta_{n,k,d}^{(1,2,3)} = E[W_{n,k}^2(E[f''(S_{n,k-d} + T_{n,k})] - E[f''(S_{n,k} + T_{n,k})])]. \]
From the second weak dependence lemma (Lemma 11), it follows that

\[
|\Delta_{n,k,d}^{(1,2,2)}| = \frac{1}{n\tilde\sigma_n^2}\big|E[Y_k^2(f''(S_{n,k-d} + T_{n,k}) - E[f''(S_{n,k-d} + T_{n,k})])]\big| = \frac{1}{n\tilde\sigma_n^2}\big|\mathrm{Cov}[f''(S_{n,k-d} + T_{n,k}),\,Y_k^2]\big| \le \frac{1}{n\tilde\sigma_n^2}\|f''\|_\infty\rho^d,
\]
and therefore for every $\delta > 0$ there exists a $D_0 \in \mathbb{N}$ so that for all $d > D_0$:
\[ \sum_{k=1}^{n}\big|\Delta_{n,k,d}^{(1,2,2)}\big| \le \frac{1}{\tilde\sigma_n^2}\|f''\|_\infty\rho^d \le \delta. \]

For any such $d$, using the Cauchy-Schwarz inequality and the fact that $f''$ is bounded, we have for any $\epsilon > 0$:

\[
\sum_{k=1}^{n}\big|\Delta_{n,k,d}^{(1,2,1)}\big| \le C\Big(\sum_{k=1}^{n}E\big[W_{n,k}^2\mathbb{1}_{\{|Y_k|\ge\epsilon\sqrt n\}}\big] + \sum_{k=1}^{n}\sum_{j=k-d}^{k}E\big[W_{n,k}^2\mathbb{1}_{\{|Y_k|\le\epsilon\sqrt n\}}|W_{n,j}|\big]\Big) \le \frac{C}{\tilde\sigma_n^2}E\big[Y_1^2\mathbb{1}_{\{|Y_1|\ge\epsilon\sqrt n\}}\big] + \frac{C\epsilon}{\tilde\sigma_n}\sum_{k=1}^{n}\sum_{j=k-d}^{k}E\big[|W_{n,k}||W_{n,j}|\big]
\]

for an appropriate bound $C > 0$. Using the Cauchy-Schwarz inequality on every term in the above sum, we have
\[
\sum_{k=1}^{n}\big|\Delta_{n,k,d}^{(1,2,1)}\big| \le \frac{C}{\tilde\sigma_n^2}E\big[Y_1^2\mathbb{1}_{\{|Y_1|\ge\epsilon\sqrt n\}}\big] + \frac{C\epsilon}{\tilde\sigma_n}\sum_{k=1}^{n}\sum_{j=k-d}^{k}\sqrt{E[W_{n,k}^2]}\sqrt{E[W_{n,j}^2]} \le \frac{C}{\tilde\sigma_n^2}E\big[Y_1^2\mathbb{1}_{\{|Y_1|\ge\epsilon\sqrt n\}}\big] + \frac{C(d+1)\epsilon}{\tilde\sigma_n^3}E[Y_1^2] \le \delta
\]
for large enough $n$ and small enough $\epsilon$.

In addition, we have
\[
|\Delta_{n,k,d}^{(1,2,3)}| = |E[W_{n,k}^2]|\,\big|E[f''(S_{n,k-d} + T_{n,k})] - E[f''(S_{n,k} + T_{n,k})]\big| \le C|E[W_{n,k}^2]|\sum_{j=k-d}^{k-1}E[|W_{n,j}|] \le \frac{Cd}{n^{3/2}\tilde\sigma_n^3}E[Y_1^2]E[|Y_1|],
\]
which, summed over $k \le n$, is of order $n^{-1/2}$ and hence tends to zero. With $\delta \to 0$ we have
\[ \sum_{k=1}^{n}\big|\Delta_{n,k}^{(1,2)}\big| \to 0 \quad (n\to\infty). \]

For the term $\Delta_{n,k}^{(1,1)}$ we use the triangle inequality and have for any $d$ between 1 and $k$:
\[
|\Delta_{n,k}^{(1,1)}| \le \Big|\sum_{j=1}^{k-d}E[W_{n,k}W_{n,j}(f''(S_{n,j} + \xi_{n,k,j}W_{n,j} + T_{n,k}) - E[f''(S_{n,k} + T_{n,k})])]\Big| + \Big|\sum_{j=k-d+1}^{k-1}E[W_{n,k}W_{n,j}(f''(S_{n,j} + \xi_{n,k,j}W_{n,j} + T_{n,k}) - E[f''(S_{n,k} + T_{n,k})])]\Big|.
\]

The first term tends to zero: for any $\delta > 0$,
\[
\Big|\sum_{j=1}^{k-d}E[W_{n,k}W_{n,j}(f''(S_{n,j} + \xi_{n,k,j}W_{n,j} + T_{n,k}) - E[f''(S_{n,k} + T_{n,k})])]\Big| \le \frac{\|f''\|_\infty}{n\tilde\sigma_n^2}\sum_{j=1}^{k-d}\rho^{k-j} \le \frac{\|f''\|_\infty\,\rho^d}{n(1-\rho)\tilde\sigma_n^2} \le \frac{\delta}{n}
\]
for large enough $d$ (and $n \ge d$). The other term is split into three parts:

\[ \Delta_{n,k,d}^{(1,1,1)} := \sum_{j=k-d+1}^{k-1}E[W_{n,k}W_{n,j}(f''(S_{n,j} + \xi_{n,k,j}W_{n,j} + T_{n,k}) - f''(S_{n,j-d} + T_{n,k}))], \]
\[ \Delta_{n,k,d}^{(1,1,2)} := \sum_{j=k-d+1}^{k-1}E[W_{n,k}W_{n,j}(f''(S_{n,j-d} + T_{n,k}) - E[f''(S_{n,j-d} + T_{n,k})])], \]
\[ \Delta_{n,k,d}^{(1,1,3)} := \sum_{j=k-d+1}^{k-1}E[W_{n,k}W_{n,j}(E[f''(S_{n,j-d} + T_{n,k})] - E[f''(S_{n,j} + T_{n,k})])]. \]
Using the weak dependence shown in Lemma 11, for large enough $d$:
\[
|\Delta_{n,k,d}^{(1,1,2)}| \le \frac{1}{n\tilde\sigma_n^2}\sum_{j=k-d+1}^{k-1}\big|\mathrm{Cov}[f''(S_{n,j-d} + T_{n,k}),\,Y_jY_k]\big| \le \frac{d\rho^d}{n\tilde\sigma_n^2}\|f''\|_\infty \le \frac{\delta}{n}.
\]
Additionally, for every $\epsilon > 0$,

\[
\sum_{k=1}^{n}\big|\Delta_{n,k,d}^{(1,1,1)}\big| \le C\sum_{k=1}^{n}\sum_{j=k-d+1}^{k-1}E\big[|W_{n,k}W_{n,j}|\,\mathbb{1}_{\{|Y_kY_j|\ge n\epsilon^2\}}\big] + C\sum_{k=1}^{n}\sum_{j=k-d+1}^{k-1}\sum_{i=j-d}^{k}E\big[|W_{n,k}W_{n,j}|\,\mathbb{1}_{\{|Y_kY_j|< n\epsilon^2\}}\,|W_{n,i}|\big] \le \delta
\]

for large enough $n$ with an appropriate $\epsilon = \epsilon(n)$. Finally,
\[
|\Delta_{n,k,d}^{(1,1,3)}| \le \sum_{j=k-d+1}^{k-1}\big|E[W_{n,k}W_{n,j}]\big|\,\big|E[f''(S_{n,j-d} + T_{n,k}) - f''(S_{n,j} + T_{n,k})]\big|
\le \frac{C}{n\tilde\sigma_n^2}\sum_{j=k-d+1}^{k-1}\sqrt{E[Y_j^2]}\,\rho^{k-j}\sum_{i=j-d}^{j-1}E[|W_{n,i}|]
\le \frac{Cd\sqrt{E[Y_1^2]}}{n^{3/2}\tilde\sigma_n^3}\sum_{j=k-d+1}^{k-1}\rho^{k-j}
\le \frac{Cd\sqrt{E[Y_1^2]}\,(1-\rho^d)}{n^{3/2}\tilde\sigma_n^3(1-\rho)}.
\]
Altogether, we have

\[
\sum_{k=1}^{n}\big|\Delta_{n,k,d}^{(1,1)}\big| \le \delta + \frac{d\rho^d\|f''\|_\infty}{\tilde\sigma_n^2} + \frac{Cd\,E[Y_1^2]^{3/4}(1-\rho^{d/2})}{\sqrt n\,\tilde\sigma_n^2(1-\rho)} + \frac{Cd^2\sqrt{E[Y_1^2]}\,(1-\rho^d)}{\sqrt n\,\tilde\sigma_n^3(1-\rho)} \le 3\delta + \frac{Cd\sqrt{E[Y_1^2]}\,(1-\rho^d)}{\sqrt n\,\tilde\sigma_n^2(1-\rho)} \to 0,
\]
letting $n$ tend to $\infty$ and $\delta$ to zero. Altogether, we now have
\[ \Big|\sum_{k=1}^{n}\Delta_{n,k}^{(1)}\Big| \to 0 \quad (n\to\infty) \]
and the theorem is proved.

4 Parameter estimation

Before attempting to estimate the parameters ω, α, β of a GARCH(1,1) process, we first have to show that they are unique - that they are the only parameters capable of defining the process.

Lemma 12. Let $X_t$ be a strictly stationary GARCH(1,1) process with $\alpha + \beta < 1$. Then
\[
\sigma_t^2 = \frac{\omega}{1-\beta} + \sum_{k=1}^{\infty}\alpha\beta^{k-1}X_{t-k}^2.
\]
Proof. Since $\beta < 1$ it is clear that the series converges with probability 1. Let $Z_t := \frac{\omega}{1-\beta} + \sum_{k=1}^{\infty}\alpha\beta^{k-1}X_{t-k}^2$. Then one has
\[
\omega + \alpha X_{t-1}^2 + \beta Z_{t-1} = \omega + \alpha X_{t-1}^2 + (-\omega) + \frac{\omega}{1-\beta} + \sum_{k=1}^{\infty}\alpha\beta^{k}X_{t-(k+1)}^2 = \frac{\omega}{1-\beta} + \sum_{k=0}^{\infty}\alpha\beta^{k}X_{t-(k+1)}^2 = Z_t,
\]

so that $Z_t$ fulfills the same recursive equation as $\sigma_t^2$. However, by Theorem 2 the strictly stationary solution is unique, so we must have $Z_t = \sigma_t^2$ for all $t$.

Theorem 13. Let $X_t$ be a GARCH(1,1) model with $\alpha + \beta < 1$, $\omega > 0$ and where $e_t^2$ is not a.s. constant. Then the parameters $\omega, \alpha, \beta$ are unique.

Proof. Assume that $X_t$ has two representations as a GARCH(1,1) process, with parameters $\omega, \alpha, \beta$ and $\hat\omega, \hat\alpha, \hat\beta$. By the above lemma, we can write
\[
\sigma_t^2 = \frac{\omega}{1-\beta} + \sum_{k=1}^{\infty}\alpha\beta^{k-1}X_{t-k}^2 = \frac{\hat\omega}{1-\hat\beta} + \sum_{k=1}^{\infty}\hat\alpha\hat\beta^{k-1}X_{t-k}^2.
\]

This means that $\alpha\beta^{k-1} = \hat\alpha\hat\beta^{k-1}$ for all $k$; assuming otherwise, let $k_0$ be the smallest $k$ with $\alpha\beta^{k_0-1} \ne \hat\alpha\hat\beta^{k_0-1}$. Then
\[
e_{t-k_0}^2 = \frac{\frac{\omega}{1-\beta} - \frac{\hat\omega}{1-\hat\beta} + \sum_{j=k_0+1}^{\infty}(\alpha\beta^{j-1}-\hat\alpha\hat\beta^{j-1})X_{t-j}^2}{\sigma_{t-k_0}^2\,(\hat\alpha\hat\beta^{k_0-1}-\alpha\beta^{k_0-1})}.
\]
However, since $e_{t-k_0}^2$ is also stochastically independent from the right side of this equation, it must be a.s. constant, which contradicts our assumption. Since $\alpha\beta^{k-1} = \hat\alpha\hat\beta^{k-1}$ for all $k$, we have

\[
\frac{\alpha z}{1-\beta z} = \sum_{k=1}^{\infty}\alpha\beta^{k-1}z^k = \sum_{k=1}^{\infty}\hat\alpha\hat\beta^{k-1}z^k = \frac{\hat\alpha z}{1-\hat\beta z} \quad \forall\,|z| \le 1,
\]
and by analytic extension we have
\[ \alpha(1-\hat\beta z) = \hat\alpha(1-\beta z) \quad \forall z. \]
Letting $z = 0$ we have $\alpha = \hat\alpha$, which immediately leads to $\beta = \hat\beta$ and $\omega = \hat\omega$.
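The ARCH($\infty$) representation of Lemma 12 can be verified numerically: truncating the series at a lag $K$ where $\beta^K$ is negligible should reproduce the conditional variance computed from the recursion (illustrative parameters, standard normal innovations).

```python
import numpy as np

# Numerical check of Lemma 12:
# sigma_t^2 = omega/(1-beta) + sum_{k>=1} alpha * beta**(k-1) * X_{t-k}^2.
rng = np.random.default_rng(3)
omega, alpha, beta = 0.1, 0.1, 0.8
n = 2000

e = rng.standard_normal(n)
sigma2 = np.empty(n)
x = np.empty(n)
sigma2[0] = omega / (1 - alpha - beta)      # start at the stationary variance
x[0] = e[0] * np.sqrt(sigma2[0])
for t in range(1, n):
    sigma2[t] = omega + alpha * x[t-1]**2 + beta * sigma2[t-1]
    x[t] = e[t] * np.sqrt(sigma2[t])

# Truncate the infinite series at K lags; beta**K is negligible here
t, K = n - 1, 200
series = omega / (1 - beta) + sum(alpha * beta**(k-1) * x[t-k]**2 for k in range(1, K+1))
print(sigma2[t], series)
```

The truncation error is of order $\beta^K$, so the two numbers should agree to many decimal places.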

Due to the complexity of the maximum likelihood function for GARCH(1,1) models, one generally uses an approximating function. Let $\theta = (\omega, \alpha, \beta)$ and let $(\mathcal{F}_t)$ be the filtration on $\Omega$ generated by $\{X_s : s \le t\}$. Then one has
\[ L(\theta \mid x_0, \dots, x_n) = f(x_0, \dots, x_n \mid \theta) = f(x_n \mid \mathcal{F}_{n-1})f(x_{n-1} \mid \mathcal{F}_{n-2})\cdots f(x_1 \mid \mathcal{F}_0)f(x_0 \mid \theta), \]
so
\[ -\log L(\theta \mid x_0, \dots, x_n) = -\log f(x_0 \mid \theta) - \sum_{k=1}^{n}\log f(x_k \mid \mathcal{F}_{k-1}), \]
where $f(x_0, \dots, x_n \mid \theta)$ is the joint density of $\{X_0, \dots, X_n\}$ in a GARCH model with parameters $\theta$. With a large enough sample size, the contribution from $f(x_0 \mid \theta)$ is commonly assumed to be relatively small and is dropped, resulting in the quasi maximum likelihood function

\[ QL(\theta \mid x_0, \dots, x_n) = -\sum_{k=1}^{n}\log f(x_k \mid \mathcal{F}_{k-1}), \]
which is to be minimized. In the case of normally distributed innovations $e_t$, we have
\[ f(x_k \mid \mathcal{F}_{k-1}) = \frac{1}{\sqrt{2\pi\sigma_k^2}}\exp\Big(-\frac{x_k^2}{2\sigma_k^2}\Big), \]
so
\[ QL(\theta \mid x_0, \dots, x_n) = \frac{n}{2}\log 2\pi + \frac{1}{2}\sum_{k=1}^{n}\Big(\log\sigma_k^2(\theta) + \frac{x_k^2}{\sigma_k^2(\theta)}\Big). \]

The parameter $\hat\theta_n = (\hat\omega_n, \hat\alpha_n, \hat\beta_n)$ which minimizes this function given observations $X_0 = x_0, \dots, X_n = x_n$, or equivalently which minimizes $l_n(\theta) = \frac{1}{n}\sum_{k=1}^{n}\big(\log\sigma_k^2(\theta) + \frac{X_k^2}{\sigma_k^2(\theta)}\big)$, is called the quasi-maximum-likelihood (QML) estimator.
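The quasi-likelihood is easy to evaluate directly. The sketch below (the helper `qml_objective` is hypothetical, parameters are illustrative, innovations are standard normal) simulates a path from known parameters and confirms that the true parameter vector yields a smaller value of $l_n(\theta)$ than a misspecified one; a numerical minimizer over $\theta$ would turn this objective into the QML estimator.

```python
import numpy as np

def qml_objective(theta, x):
    """l_n(theta) = (1/n) * sum_k [ log sigma_k^2(theta) + x_k^2 / sigma_k^2(theta) ],
    with the volatility recursion initialized at omega / (1 - alpha - beta)."""
    omega, alpha, beta = theta
    s2 = omega / (1.0 - alpha - beta)
    total = 0.0
    for xk in x:
        total += np.log(s2) + xk ** 2 / s2
        s2 = omega + alpha * xk ** 2 + beta * s2   # sigma_{k+1}^2(theta)
    return total / len(x)

# Simulate data from known parameters, then compare objective values.
rng = np.random.default_rng(4)
omega0, alpha0, beta0 = 0.1, 0.1, 0.8
n = 20_000
e = rng.standard_normal(n)
x = np.empty(n)
s2 = omega0 / (1 - alpha0 - beta0)
for t in range(n):
    x[t] = e[t] * np.sqrt(s2)
    s2 = omega0 + alpha0 * x[t] ** 2 + beta0 * s2

print(qml_objective((omega0, alpha0, beta0), x))   # near the minimum
print(qml_objective((0.3, 0.3, 0.3), x))           # a misspecified theta
```

By the consistency argument of Theorem 14, the population objective is uniquely minimized at the true parameters, so the gap between the two values grows stable as $n$ increases.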

Theorem 14 (QMLE, strong consistency). Let $X_t$ be a GARCH time series with parameters $\theta_0 = (\omega_0, \alpha_0, \beta_0)$. Under the conditions
(i) $\theta_0$ lies in a compact set $K \subset (0,\infty)\times[0,\infty)\times[0,1)$,
(ii) $e_t^2$ is not a.s. constant,
(iii) $\beta_0 < 1$, $\alpha_0 \ne \beta_0$, $\alpha_0 + \beta_0 < 1$, $\omega_0 > 0$,
the QML estimator is consistent: $\hat\theta_n \xrightarrow{a.s.} \theta_0$ for $n \to \infty$.

Proof. By Theorem 13, $\theta_0$ is identifiable. For each parameter $\theta$, we define recursively $\sigma_0^2(\theta) = \frac{\omega}{1-\alpha-\beta}$ and

\[ \sigma_k^2(\theta) = \omega + \alpha X_{k-1}^2 + \beta\sigma_{k-1}^2(\theta) \]

for $k = 1, \dots, n$. Clearly, $\sigma_k^2(\theta_0)$ coincides with the true volatility. Since $\alpha_0 + \beta_0 < 1$, we know that $E_{\theta_0}[X_k^2] = E_{\theta_0}[\sigma_k^2(\theta_0)]$ are finite for all $k$ and therefore
\[ E_{\theta_0}[|l_n(\theta_0)|] \le 1 + \frac{1}{n}\sum_{k=1}^{n}E_{\theta_0}\big[|\log\sigma_k^2(\theta_0)|\big] \]

is also finite. Using the inequality $\log x \le x - 1$ (for $x > 0$),

\[
E_{\theta_0}[l_n(\theta)] - E_{\theta_0}[l_n(\theta_0)] = \frac{1}{n}\sum_{k=1}^{n}E_{\theta_0}\Big[\log(\sigma_k^2(\theta)) - \log(\sigma_k^2(\theta_0)) + \frac{\sigma_k^2(\theta_0)e_k^2}{\sigma_k^2(\theta)} - e_k^2\Big] = \frac{1}{n}\sum_{k=1}^{n}E_{\theta_0}\Big[-\log\frac{\sigma_k^2(\theta_0)}{\sigma_k^2(\theta)} + \frac{\sigma_k^2(\theta_0)}{\sigma_k^2(\theta)} - 1\Big] \ge 0,
\]

with equality only when $\sigma_k^2(\theta_0)/\sigma_k^2(\theta) = 1$ $P_{\theta_0}$-a.s. for all $k$. Since $\theta_0$ is identifiable, this happens only when $\theta = \theta_0$. Since the parameter set $K$ is compact, we have $\sup_{\theta\in K}\beta < 1$. Define

\[ \tilde l_n(\theta) = \frac{1}{n}\sum_{k=1}^{n}\Big(\log\sigma_k^2(\theta) + \frac{X_k^2}{\sigma_k^2(\theta)}\Big) = \frac{1}{n}\sum_{k=1}^{n}\Big(\log\sigma_k^2(\theta) + \frac{e_k^2\sigma_k^2(\theta_0)}{\sigma_k^2(\theta)}\Big). \]
Then
\[ \sup_{\theta\in K}\big|\sigma_t^2(\theta_0) - \sigma_t^2(\theta)\big| = \sup_{\theta\in K}\sigma_0^2\Big|\prod_i(\alpha_0 e_{t-i}^2+\beta_0) - \prod_i(\alpha e_{t-i}^2+\beta)\Big| \le C\sup_{\theta\in K}\beta^t.
\]
Using the inequality $|\log(x/y)| \le \frac{|x-y|}{\min(x,y)}$ for $x, y > 0$, we have

\[
\sup_{\theta\in K}\big|l_n(\theta) - \tilde l_n(\theta)\big| \le \frac{1}{n}\sum_{k=1}^{n}\sup_{\theta\in K}\Big(\frac{|\sigma_k^2(\theta_0)-\sigma_k^2(\theta)|}{\sigma_k^2(\theta_0)\sigma_k^2(\theta)}X_k^2 + \Big|\log\frac{\sigma_k^2(\theta_0)}{\sigma_k^2(\theta)}\Big|\Big) \le \frac{C}{\omega^2}\,\frac{1}{n}\sum_{k=1}^{n}\sup_{\theta\in K}\beta^k X_k^2 + \frac{C}{\omega}\,\frac{1}{n}\sum_{k=1}^{n}\sup_{\theta\in K}\beta^k,
\]

where $\sup_{\theta\in K}\beta^k X_k^2 \xrightarrow{a.s.} 0$ since $\sup_{\theta\in K}\beta < 1$ and $X_k^2$ has a finite moment of order greater than 0. Using Kronecker's lemma, we see that

\[ \sup_{\theta\in K}\big|l_n(\theta) - \tilde l_n(\theta)\big| \xrightarrow{a.s.} 0. \]

Finally, for every $\theta \in K$ and $r > 0$ let $B_r(\theta)$ be the open ball with center $\theta$ and radius $r$. Then we have

\[
\liminf_{n\to\infty}\inf_{\hat\theta\in B_r(\theta)\cap K}\tilde l_n(\hat\theta) \ge \liminf_{n\to\infty}\inf_{\hat\theta\in B_r(\theta)\cap K} l_n(\hat\theta) - \limsup_{n\to\infty}\sup_{\hat\theta\in K}\big|l_n(\hat\theta) - \tilde l_n(\hat\theta)\big|
\ge \liminf_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\inf_{\hat\theta\in B_r(\theta)\cap K}\Big(\log\sigma_k^2(\hat\theta) + \frac{e_k^2\sigma_k^2(\theta_0)}{\sigma_k^2(\hat\theta)}\Big)
= E_{\theta_0}\Big[\inf_{\hat\theta\in B_r(\theta)\cap K}\Big(\log\sigma_1^2(\hat\theta) + \frac{e_1^2\sigma_1^2(\theta_0)}{\sigma_1^2(\hat\theta)}\Big)\Big]
\]
by Birkhoff's ergodic theorem, using the fact that $X_t$ is ergodic by condition (iii). By the monotone convergence theorem (Beppo Levi), we have
\[
E_{\theta_0}\Big[\inf_{\hat\theta\in B_r(\theta)\cap K}\Big(\log\sigma_1^2(\hat\theta) + \frac{e_1^2\sigma_1^2(\theta_0)}{\sigma_1^2(\hat\theta)}\Big)\Big] \to E_{\theta_0}\Big[\log\sigma_1^2(\theta_0) + \frac{e_1^2\sigma_1^2(\theta_0)}{\sigma_1^2(\theta_0)}\Big]
\]

as $r \to 0$. This means that for every $\theta \ne \theta_0$ in $K$, there is an open neighborhood $U(\theta)$ of $\theta$ in $K$ such that
\[ \liminf_{n\to\infty}\inf_{\hat\theta\in U(\theta)}\tilde l_n(\hat\theta) > E_{\theta_0}\Big[\log\sigma_1^2(\theta_0) + \frac{e_1^2\sigma_1^2(\theta_0)}{\sigma_1^2(\theta_0)}\Big]. \]
Since $K$ is compact, by the Heine-Borel theorem $K$ is covered by a finite set of these open neighborhoods $U(\theta_1), \dots, U(\theta_k)$ together with $B_r(\theta_0)$ for any $r > 0$. Since

\[
\limsup_{n\to\infty}\inf_{\hat\theta\in B_r(\theta_0)}\tilde l_n(\hat\theta) \le \lim_{n\to\infty}\tilde l_n(\theta_0) = \lim_{n\to\infty} l_n(\theta_0) = E_{\theta_0}\Big[\log\sigma_1^2(\theta_0) + \frac{e_1^2\sigma_1^2(\theta_0)}{\sigma_1^2(\theta_0)}\Big]
\]
for any $r > 0$, $\hat\theta_n$ must lie in $B_r(\theta_0)$ for large enough $n$, and the theorem is proved.

Theorem 15 (QMLE, asymptotic distribution). Let $X_t$ be a GARCH time series with parameters $\theta_0 = (\omega_0, \alpha_0, \beta_0)$. Define $M = \mathbb{R}\times M_\alpha\times M_\beta \subset \mathbb{R}^3$, where $M_\alpha = [0,\infty)$ if $\alpha_0 = 0$ and $\mathbb{R}$ otherwise, and $M_\beta$ analogously. Under the conditions
(i) $\theta_0$ lies in a compact set $K \subset (0,\infty)\times[0,\infty)\times[0,1)$;
(ii) $e_t^2$ is not a.s. constant;
(iii) $\beta_0 < 1$, $\alpha_0 \ne \beta_0$, $\alpha_0 + \beta_0 < 1$, $\omega_0 > 0$;
(iv) $\kappa := E[e_t^4] < \infty$;
(v) $E[X_t^6] < \infty$;
we have $\sqrt n(\hat\theta_n - \theta_0) \xrightarrow{D} \xi$, where

\[ \xi = \arg\inf_{t\in M}\,(t - Z)^T J(\theta_0)(t - Z), \qquad Z \sim \mathcal{N}\big(0, (\kappa-1)J(\theta_0)^{-1}\big), \]
and
\[ J(\theta_0) = E_{\theta_0}\Big[\frac{1}{\sigma_1^4(\theta_0)}\,\frac{\partial\sigma_1^2(\theta_0)}{\partial\theta}\Big(\frac{\partial\sigma_1^2(\theta_0)}{\partial\theta}\Big)^T\Big] \]
is a positive definite matrix. The proof of this theorem can be found in [4], Chapter 8. However, it is useful to see the following equality. Let $f_t(\theta) = \log\sigma_t^2(\theta) + \frac{X_t^2}{\sigma_t^2(\theta)}$, so that $l_n(\theta) = \frac{1}{n}\sum_{k=1}^{n}f_k(\theta)$. Since $e_t^2 = \frac{X_t^2}{\sigma_t^2(\theta_0)}$ is independent of $\sigma_t^2(\theta_0)$ and of all derivatives of $\sigma_t^2$ at $\theta_0$, we have
\[ E_{\theta_0}\Big[\frac{\partial f_t(\theta_0)}{\partial\theta}\Big] = E_{\theta_0}[1 - e_t^2]\cdot E_{\theta_0}\Big[\frac{1}{\sigma_t^2(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\Big] = 0, \]
where the derivatives are understood as one-sided in case $\theta_0$ lies on the boundary of $(0,\infty)\times[0,\infty)\times[0,1)$. Additionally,
\[
E_{\theta_0}\Big[\frac{\partial^2}{\partial\theta^2}f_t(\theta_0)\Big] = E_{\theta_0}[1 - e_t^2]\cdot E_{\theta_0}\Big[\frac{1}{\sigma_t^2(\theta_0)}\frac{\partial^2\sigma_t^2(\theta_0)}{\partial\theta^2}\Big] + E_{\theta_0}[2e_t^2 - 1]\cdot E_{\theta_0}\Big[\frac{1}{\sigma_t^4(\theta_0)}\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\Big(\frac{\partial\sigma_t^2(\theta_0)}{\partial\theta}\Big)^T\Big] = J(\theta_0),
\]
so we may equivalently write
\[ J(\theta_0) = E_{\theta_0}\Big[\frac{\partial^2}{\partial\theta^2}f_t(\theta_0)\Big]. \]
Note that when $\theta_0$ lies in the interior of $K$, $M = \mathbb{R}^3$ and $\xi = Z$ is normally distributed.

5 Tests

We can use the QML estimator from section 4 to construct tests for the significance of the coefficients in the model; for example, to test between the hypothesis and alternative

\[ H_0: \alpha_0 = 0 \qquad H_1: \alpha_0 > 0. \]

However, for technical reasons, it is difficult to test α0 = β0 = 0 simultaneously.

The LM test

Let $l_n(\theta) := \sum_{k=1}^{n}\big(\log\sigma_k^2(\theta) + \frac{X_k^2}{\sigma_k^2(\theta)}\big)$ be the QML objective as in section 4 (removing the factor $\frac{1}{n}$ for convenience in the equations below). We construct $\tilde\theta_n$ as the QML estimator under the constraint that $\alpha = 0$. First, we assume that the innovations $e_t$ actually are standard normally distributed, so that minimizing $l_n$ is true maximum likelihood estimation. Under the assumptions that $\theta_0$ is identifiable and $X_t$ has finite moments up to 6th order, one has the convergence
\[ \frac{1}{\sqrt n}\frac{\partial l_n(\theta_0)}{\partial\theta} \xrightarrow{D} \mathcal{N}(0, I) \]
and
\[ \sqrt n(\hat\theta_n - \theta_0) \xrightarrow{D} \mathcal{N}(0, I^{-1}), \]

2 1 ∂ ln(θ0) where I = limn→∞ n ∂θ2 converges almost surely. The constrained estimator is then derived through the method of Lagrangian multipliers: define Λ(θ, λ) = ln(θ) − λα. Then (θ˜n, λ˜) = arg sup Λ(θ, λ). We have ∇Λ(θ˜n, λ˜) = 0 and therefore

α̃ = 0,  ∂l_n(θ̃_n)/∂θ = λ̃ (0, 1, 0)^T.

Let R = (0 1 0). Then we have

√n R(θ̂_n − θ̃_n) = √n R(θ̂_n − θ_0) →^D N(0, R I^{-1} R^T).

We have

0 = (1/√n) ∂l_n(θ̂_n)/∂θ = (1/√n) ∂l_n(θ_0)/∂θ − I √n(θ̂_n − θ_0) + o(1)

and

(1/√n) ∂l_n(θ̃_n)/∂θ = (1/√n) ∂l_n(θ_0)/∂θ − I √n(θ̃_n − θ_0) + o(1),

so that

√n(θ̂_n − θ̃_n) = I^{-1} (1/√n) ∂l_n(θ̃_n)/∂θ = I^{-1} (1/√n) R^T λ̃

and therefore

λ̃/√n = (R I^{-1} R^T)^{-1} √n R(θ̂_n − θ_0) + o(1) →^D N(0, (R I^{-1} R^T)^{-1}).

This means that

(1/√n) ∂l_n(θ̃_n)/∂θ = R^T λ̃/√n →^D N(0, R^T (R I^{-1} R^T)^{-1} R).

Let Î_n = (1/n) ∂^2 l_n(θ̃_n)/∂θ^2, so that Î_n is a strongly consistent estimator for I. Then the LM statistic

LM(n) = (1/n) (∂l_n(θ̃_n)/∂θ)^T Î_n^{-1} ∂l_n(θ̃_n)/∂θ

is asymptotically χ^2-distributed with Rank(R) = 1 degree of freedom.

In the case where the e_t are not normally distributed, one has in general

J := lim_{n→∞} Var_{θ_0}[(1/√n) ∂l_n(θ_0)/∂θ] ≠ I

and it can be shown that the following limits hold:

(1/√n) ∂l_n(θ_0)/∂θ →^D N(0, J)

and

√n(θ̂_n − θ_0) →^D N(0, I^{-1} J I^{-1}),

and therefore one uses the LM statistic

LM_n = (1/n) (∂l_n(θ̃_n)/∂θ)^T Î_n^{-1} R^T (R Î_n^{-1} Ĵ_n Î_n^{-1} R^T)^{-1} R Î_n^{-1} ∂l_n(θ̃_n)/∂θ.

It can be shown that this statistic is also asymptotically χ^2-distributed with Rank(R) = 1 degree of freedom.

First, we illustrate the problem that arises when attempting to test α_0 = β_0 = 0 simultaneously. Applying the previous results to this case, we have θ_0 = (ω_0, 0, 0) under the hypothesis H_0 and therefore

(1/√n) ∂l_n(θ_0)/∂θ = (1/(2√n)) Σ_{k=1}^n [(X_k^2 − σ_k^2(θ_0))/σ_k^4(θ_0)] ∂σ_k^2(θ_0)/∂θ
 = (1/(2√n)) Σ_{k=1}^n [(e_k^2 − 1)/ω_0] (1, X_{k−1}^2, ω_0)^T →^D N(0, J),

where, letting κ = E[e_k^4],

J = (1/(4ω_0^2)) E[(e_k^2 − 1)^2 (1, X_{k−1}^2, ω_0)^T (1, X_{k−1}^2, ω_0)] = ((κ − 1)/(4ω_0^2)) [ 1, ω_0, ω_0 ; ω_0, κω_0^2, ω_0^2 ; ω_0, ω_0^2, ω_0^2 ].

Since J is singular, no LM test can immediately be constructed.

Testing α_0 = 0, we have θ_0 = (ω_0, 0, β_0) under H_0 and therefore E[σ_k^2(θ_0)] = ω_0/(1 − β_0). We then have

(1/√n) ∂l_n(θ_0)/∂θ = (1/(2√n)) Σ_{k=1}^n [(e_k^2 − 1)/σ_k^2(θ_0)] (1, X_{k−1}^2, σ_{k−1}^2(θ_0))^T →^D N(0, J),

where σ_k^2(θ_0) = ω_0 (1 − β_0^k)/(1 − β_0), so that σ_{k−1}^2(θ_0)/σ_k^2(θ_0) = (1 − β_0^{k−1})/(1 − β_0^k) =: γ_k, and therefore

J = lim_{n→∞} Var[(1/√n) ∂l_n(θ_0)/∂θ] = lim_{n→∞} ((κ − 1)/(4n)) Σ_{k=1}^n [ γ_k, γ_k, γ_k ; γ_k, κγ_k σ_{k−1}^2(θ_0), γ_k σ_{k−1}^2(θ_0) ; γ_k, γ_k σ_{k−1}^2(θ_0), γ_k σ_{k−1}^2(θ_0) ],

and we have, with g_n = Σ_{k=1}^n γ_k and s_n = Σ_{k=1}^n γ_k σ_{k−1}^2(θ_0),

J^{-1} = lim_{n→∞} (4n/(κ − 1)) [ s_n/(g_n(s_n − g_n)), 0, −1/(s_n − g_n) ; 0, 1/(s_n(κ − 1)), −1/(s_n(κ − 1)) ; −1/(s_n − g_n), −1/(s_n(κ − 1)), (s_nκ − g_n)/(s_n(s_n − g_n)(κ − 1)) ],

where the limit in each matrix entry exists. To see this, consider for example that with 0 < β_0 < 1,

g_n = Σ_{k=1}^n (1 − β_0^{k−1})/(1 − β_0^k) = ((1 − β_0)/(β_0 log β_0)) ψ_{β_0}(n) + n + O(1),

where ψ_{β_0} is the β_0-digamma function and

lim_{z→∞} ψ_{β_0}(z)/z = log β_0 · lim_{z→∞} Σ_{n=0}^∞ β_0^{n+z}/(z(1 − β_0^{n+z})) = 0

by uniform convergence, so that lim_{n→∞} n/g_n = 1. Additionally, we have

s_n = Σ_{k=1}^n ω_0(1 − β_0^{k−1})^2/((1 − β_0)(1 − β_0^k)) = (β_0^2 − β_0^n(n − 1) + β_0(n − 2))/((β_0 − 1)^2 β_0) + ((β_0 − 1) ψ_{β_0}(n + 1))/(β_0 log β_0) + O(1),

and n/s_n → 1 − β_0 follows. Thus the limit J is indeed invertible, with inverse

 1 0 −1  1 1−β0 J −1 =  0 1−β0 − 1−β0  .  κ−1 κ−1  κ − 1 1−β0 1−β0 −1 − κ−1 κ−1

On the other hand,

I_n := (1/n) ∂^2 l_n(θ_0)/∂θ^2
 = (1/n) Σ_{k=1}^n ∂/∂θ [ ((X_k^2 − σ_k^2(θ_0))/(2σ_k^4(θ_0))) ∂σ_k^2(θ_0)/∂θ ]
 = (1/n) Σ_{k=1}^n ((2X_k^2 − σ_k^2(θ_0))/(2σ_k^6(θ_0))) (∂σ_k^2(θ_0)/∂θ)(∂σ_k^2(θ_0)/∂θ)^T
 = (1/n) Σ_{k=1}^n ((2e_k^2 − 1)/(2σ_k^4(θ_0))) [ 1, X_{k−1}^2, σ_{k−1}^2(θ_0) ; X_{k−1}^2, X_{k−1}^4, X_{k−1}^2 σ_{k−1}^2(θ_0) ; σ_{k−1}^2(θ_0), X_{k−1}^2 σ_{k−1}^2(θ_0), σ_{k−1}^4(θ_0) ],

which converges almost surely to (2/(κ − 1)) J. Defining Ĵ_n = ((κ − 1)/2) Î_n, where

Î_n = (1/n) Σ_{k=1}^n ((2X_k^2 − σ_k^2(θ̃_n))/(2σ_k^6(θ̃_n))) (1, X_{k−1}^2, σ_{k−1}^2(θ̃_n))^T (1, X_{k−1}^2, σ_{k−1}^2(θ̃_n)),

we have

R Î_n^{-1} Ĵ_n Î_n^{-1} R^T = ((κ − 1)/2) R Î_n^{-1} R^T = 2n/(s_n(κ − 1)),

and considering that

∂l_n(θ̃_n)/∂θ = (1/2) Σ_{k=1}^n ((X_k^2 − σ_k^2(θ̃_n))/σ_k^4(θ̃_n)) (1, X_{k−1}^2, σ_{k−1}^2(θ̃_n))^T,

the LM statistic can be calculated.

Testing for stationarity

The GARCH(1,1) process is covariance stationary if and only if α_0 + β_0 < 1. The problem of determining whether the process has a stationary solution may therefore be addressed with a parametric test. Let

H_0: α_0 + β_0 ≥ 1  vs.  H_1: α_0 + β_0 < 1.

Here, θ_0 lies in the interior of a compact parameter space K, so that the QML estimator is asymptotically normal:

√n(θ̂_n − θ_0) →^D N(0, (κ − 1) J^{-1}),

and with R = (0 1 1), if Rθ_0 = 1, we have

√n(Rθ̂_n − 1) →^D N(0, (κ − 1) R J^{-1} R^T).

This leads to the asymptotic normality of the Wald statistic:

W_n = √n(α̂_n + β̂_n − 1) / √((κ − 1) R Ĵ_n^{-1} R^T) →^D N(0, 1),

and H_0 is rejected if W_n < u_α, where u_α is the α-quantile of the standard normal distribution.
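As a small numerical illustration, the decision rule above can be sketched in Python. The function below is a hypothetical helper: the estimates α̂_n, β̂_n, the sample size, and an estimate of the asymptotic variance (κ − 1) R Ĵ_n^{-1} R^T are assumed to be given, and the numbers in the example are made up.

```python
from math import sqrt
from statistics import NormalDist

def wald_stationarity(alpha_hat, beta_hat, avar, n, level=0.05):
    """Wald test of H0: alpha0 + beta0 >= 1 against H1: alpha0 + beta0 < 1.
    avar is a given estimate of the asymptotic variance (kappa-1) R J^-1 R^T."""
    w = sqrt(n) * (alpha_hat + beta_hat - 1.0) / sqrt(avar)
    u = NormalDist().inv_cdf(level)      # the level-quantile of N(0, 1)
    return w, w < u                      # reject H0 if W_n < u_level

# hypothetical inputs: estimates close to the boundary, avar = 0.5, n = 2400
w, reject = wald_stationarity(0.0248, 0.9744, 0.5, 2400)
```

With these made-up inputs W_n is close to zero, so H_0 is not rejected at the 5% level.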

6 Variants of the GARCH(1,1) model

While the standard GARCH(1,1) and related GARCH(p,q) models are useful tools in financial time series analysis, they are unable to describe certain features often found in financial data. An important weakness is their inability to react differently to positive and negative innovations: the conditional variance depends only on the squares of the innovations. However, many datasets display a leverage effect, where negative returns correspond to larger increases in volatility than positive returns of the same size. Another problem is the lack of clarity with regard to stationarity and persistence: in the GARCH model, shocks may persist in one norm but not in another. The existence of almost surely stationary GARCH(1,1) processes with infinite variance at every time t is inconvenient.

A notable variant of the GARCH model which addresses these problems is the Exponential ARCH (EARCH) model due to Nelson (1989). It has the additional advantage of greater flexibility in the parameters, since the autoregressive relationship is imposed on log σ_t^2, which may take negative values. The general form of the EARCH(1) model is

log σ_t^2 = ω + β(|e_{t−1}| − E[|e_{t−1}|]) + γ e_{t−1}.

It can also be shown that, unlike for the GARCH(1,1) model, the conditions for strict (almost sure) stationarity and for covariance stationarity coincide; a necessary and sufficient condition for both is β < 1. However, the asymptotic properties of QML estimation for EARCH models are not as well understood as in the GARCH case.

Another possible extension of the GARCH(1,1) model that allows for asymmetry is the QGARCH(1,1) model of Sentana (1995):

σ_t^2 = ω + α X_{t−1}^2 + β σ_{t−1}^2 + γ X_{t−1},

where appropriate restrictions are necessary to ensure that σ_t^2 remains positive. These are given by ω, α, β > 0 and |γ| ≤ 2√(αω). Many of the properties of the GARCH(1,1) model carry over immediately to QGARCH. For example, the QGARCH model has unconditional variance ω/(1 − α − β) if α + β < 1 and undefined otherwise, and α + β < 1 is also necessary for covariance stationarity.
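The role of the restriction |γ| ≤ 2√(αω) can be seen by completing the square: ω + αx^2 + γx = α(x + γ/(2α))^2 + ω − γ^2/(4α), which is non-negative exactly when γ^2 ≤ 4αω. A quick numerical check, with arbitrary illustrative values at the boundary case:

```python
from math import sqrt

# boundary case |gamma| = 2*sqrt(alpha*omega): the quadratic
# omega + alpha*x^2 + gamma*x = alpha*(x + gamma/(2*alpha))^2 stays >= 0
omega, alpha = 0.2, 0.1
gamma = -2.0 * sqrt(alpha * omega)
grid = [i / 100.0 - 50.0 for i in range(10001)]     # x in [-50, 50]
smallest = min(omega + alpha * x * x + gamma * x for x in grid)
```

The grid minimum sits near the vertex x = −γ/(2α), where the quadratic touches zero.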

The AGARCH(1,1) (asymmetric GARCH) model developed by Engle and Ng (1993) is another approach to letting the GARCH model react asymmetrically. It is defined by

X_t = e_t σ_t,  σ_t^2 = ω + α(X_{t−1} + γ)^2 + β σ_{t−1}^2,

where γ is the noncentrality parameter.

Another interesting model is the TGARCH(1,1) (threshold GARCH) model developed by Jean-Michel Zakoian. Here, the autoregressive specification is given for the conditional standard deviation instead of the variance:

X_t = e_t σ_t,
σ_t = ω + α^+ X_{t−1}^+ + α^− X_{t−1}^− + β σ_{t−1},

where X_t^+ = max{0, X_t} is the positive part of X_t and X_t^− = min{0, X_t} the negative part. This is another model developed to account for asymmetric reactions to shocks; it has the advantage of being closer to the original GARCH formulation, but it also requires some non-negativity assumptions for the parameters.
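The asymmetric reaction of the TGARCH recursion is easy to simulate. The following Python sketch uses hypothetical parameter values; the coefficient on the negative part is chosen negative here, so that its contribution α^− X_{t−1}^− to σ_t is non-negative (recall X_{t−1}^− ≤ 0) and σ_t stays positive.

```python
import random
from math import isfinite

def simulate_tgarch(omega, a_plus, a_minus, beta, n, seed=0):
    """Simulate TGARCH(1,1): the recursion acts on the conditional standard
    deviation, sigma_t = omega + a_plus*X+ + a_minus*X- + beta*sigma_{t-1}."""
    rng = random.Random(seed)
    sigma = omega / (1.0 - beta)          # crude starting value
    xs, sigmas = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0) * sigma
        xs.append(x)
        sigmas.append(sigma)
        sigma = (omega + a_plus * max(0.0, x)
                 + a_minus * min(0.0, x) + beta * sigma)
    return xs, sigmas

# hypothetical parameters: negative returns raise volatility more strongly
xs, sigmas = simulate_tgarch(0.01, 0.05, -0.10, 0.90, 5000)
```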

7 GARCH(1,1) in continuous time

The diffusion limit

In this section we consider, heuristically, whether increasingly frequent observations of the time series lead to a better model. Although this may seem intuitive, it is easy to see that it should not be the case in general: even in a non-stochastic setting, increasingly fine interpolation can lead to large errors without the assumption of differentiability. However, we will show that highly frequent observations do lead to a more accurate model in certain cases, where the precise meaning of "leading to a better model" will be the weak convergence of processes.

Nelson (1990) has studied the relationship between GARCH(1,1) and similar models on the one hand and stochastic differential equations in continuous time on the other. His main result is that in a sequence of GARCH(1,1) models fitted to increasingly small time intervals, under certain assumptions on the parameters, the conditional variance process converges in distribution to the solution of a stochastic differential equation with an inverse gamma stationary distribution. This means that for sufficiently short time intervals the GARCH log returns can be approximately modelled with a Student's t distribution. Since a GARCH(1,1) process is Markovian, it is enough to consider convergence of Markov chains.

The following theorem gives conditions under which a sequence of Markov processes X^{(h)}_{hk} (k ∈ ℕ) converges weakly to an Itô process as h tends to zero. Let D be the Skorokhod space of càdlàg mappings from [0, ∞) into ℝ^n, endowed with the Skorokhod metric. For every h > 0 let F^{(h)}_{kh} be the σ-algebra generated by X^{(h)}_0, ..., X^{(h)}_{kh}, let P^{(h)} be a probability measure on B^n, the Borel σ-algebra over ℝ^n, and for every h > 0 and k ∈ ℕ_0 let K^{(h)}_{kh} be a transition function on ℝ^n; that means:

(i) for every x ∈ ℝ^n, K^{(h)}_{kh}(x, ·) is a probability measure on B^n, and

(ii) for every A ∈ B^n, K^{(h)}_{kh}(·, A) is a measurable function,

such that P(X^{(h)}_{(k+1)h} ∈ A | F^{(h)}_{kh}) = K^{(h)}_{kh}(X^{(h)}_{kh}, A) P^{(h)}-a.s. for all A ∈ B^n, as well as P(X^{(h)}_t = X^{(h)}_{kh}, kh ≤ t < (k + 1)h) = 1. That is, X^{(h)}_t is the extension of X^{(h)}_{kh} into continuous time as a step function with discontinuities at the points kh.

For h > 0 and ε > 0 define

a_h(x, t) = (1/h) ∫_{ℝ^n} (y − x)(y − x)^T K^{(h)}_{h⌊t/h⌋}(x, dy),

b_h(x, t) = (1/h) ∫_{ℝ^n} (y − x) K^{(h)}_{h⌊t/h⌋}(x, dy),

and for each i = 1, ..., n define

c_{h,i,ε}(x, t) = (1/h) ∫_{ℝ^n} |(y − x)_i|^{2+ε} K^{(h)}_{h⌊t/h⌋}(x, dy),

where a_h and b_h are finite whenever c_{h,i,ε} is finite for all i and some ε > 0.

Theorem 16.
Let the following assumptions be fulfilled:

(i) There is an ε > 0 so that for every R > 0, T > 0 and i = 1, ..., n:

lim_{h→0} sup_{‖x‖≤R, 0≤t≤T} c_{h,i,ε}(x, t) = 0

and there are continuous functions a: ℝ^n × [0, ∞) → ℝ^{n×n}, b: ℝ^n × [0, ∞) → ℝ^n so that

lim_{h→0} sup_{‖x‖≤R, 0≤t≤T} ‖a_h(x, t) − a(x, t)‖_F = 0

lim_{h→0} sup_{‖x‖≤R, 0≤t≤T} ‖b_h(x, t) − b(x, t)‖ = 0,

where ‖·‖_F denotes the Frobenius norm.

(ii) There exists a continuous function σ: ℝ^n × [0, ∞) → ℝ^{n×n} so that

a(x, t) = σ(x, t)σ(x, t)^T for all x ∈ ℝ^n, t ≥ 0.

(iii) There exists a random variable X_0 with distribution P_0 on (ℝ^n, B^n) so that X^{(h)}_0 →^D X_0.

(iv) a, b, and P_0 uniquely determine the distribution of a diffusion process X_t with starting distribution P_0, diffusion matrix a(x, t) and drift b(x, t).

Then X^{(h)}_t ⇒ X_t, where X_t is given by the stochastic differential equation

X_t = X_0 + ∫_0^t b(X_s, s) ds + ∫_0^t σ(X_s, s) dB_{n,s},

where B_{n,t} is an n-dimensional Brownian motion independent of X_0. X_t does not depend on the choice of the matrix square root σ. In addition, for every T > 0, we have

P(sup_{0≤t≤T} ‖X_t‖ < ∞) = 1.

This theorem can be applied to the GARCH(1,1) model with normally distributed innovations by allowing the parameters α, β, ω to depend on h and making the variance of the innovations proportional to h. Let Y_t = Σ_{s≤t} X_s denote the cumulative log-return process and consider the system

Y^{(h)}_{hk} = Y^{(h)}_{h(k−1)} + σ^{(h)}_{hk} e^{(h)}_{hk},

(σ^{(h)}_{hk})^2 = ω_h + (σ^{(h)}_{h(k−1)})^2 (β_h + α_h h^{-1} (e^{(h)}_{h(k−1)})^2),

with i.i.d. random variables e^{(h)}_{hk} ∼ N(0, h) (k = 0, 1, ...). For B ∈ B^2 let ν_h(B) = P((Y^{(h)}_0, σ^{(h)}_0) ∈ B), where we may assume that ν_h satisfies condition (iii) of the theorem. Let F^{(h)}_{hk} be the σ-algebra generated by Y^{(h)}_0, ..., Y^{(h)}_{h(k−1)}, σ^{(h)}_0, ..., σ^{(h)}_{hk}. Then

E[h^{-1}(Y^{(h)}_{hk} − Y^{(h)}_{h(k−1)}) | F^{(h)}_{hk}] = h^{-1} σ^{(h)}_{hk} E[e^{(h)}_{hk}] = 0

and

E[h^{-1}((σ^{(h)}_{h(k+1)})^2 − (σ^{(h)}_{hk})^2) | F^{(h)}_{hk}] = (1/h)(ω_h + (β_h + α_h − 1)(σ^{(h)}_{hk})^2),

so the limit condition in (i) can be fulfilled only if the limits

lim_{h→0} ω_h/h = ω ≥ 0,  lim_{h→0} (1 − α_h − β_h)/h = θ

exist. In addition, assuming that lim_{h→0} α_h^2/h = α^2/2 exists and is finite,

E[h^{-1}{(σ^{(h)}_{h(k+1)})^2 − (σ^{(h)}_{hk})^2}^2 | F^{(h)}_{hk}]
 = h^{-1}[ω_h^2 − 2ω_h(1 − α_h − β_h)(σ^{(h)}_{hk})^2 + (1 − α_h − β_h)^2(σ^{(h)}_{hk})^4 + 2α_h^2(σ^{(h)}_{hk})^4]
 = α^2(σ^{(h)}_{hk})^4 + o(1),

and

E[h^{-1}(Y^{(h)}_{hk} − Y^{(h)}_{h(k−1)})^2 | F^{(h)}_{hk}] = (σ^{(h)}_{hk})^2,

and

E[h^{-1}(Y^{(h)}_{hk} − Y^{(h)}_{h(k−1)})((σ^{(h)}_{h(k+1)})^2 − (σ^{(h)}_{hk})^2) | F^{(h)}_{hk}] = 0,

for a term o(1) which tends to zero uniformly on compacta. Since

E[h^{-1}(Y^{(h)}_{hk} − Y^{(h)}_{h(k−1)})^4 | F^{(h)}_{hk}] = 3h(σ^{(h)}_{hk})^4 → 0

and

E[h^{-1}{(σ^{(h)}_{h(k+1)})^2 − (σ^{(h)}_{hk})^2}^4 | F^{(h)}_{hk}] → 0,

the condition on c_{h,i,ε} is fulfilled with ε = 2; and setting

a(Y, σ) = [ σ^2, 0 ; 0, α^2σ^4 ],  b(Y, σ) = (0, ω − θσ^2)^T,

and assuming the limit conditions on α_h, β_h, ω_h, condition (i) is satisfied. Letting

τ(Y, σ) = [ σ, 0 ; 0, ασ^2 ]

fulfills condition (ii). Under the assumption of distributional uniqueness, by Theorem 16 we have the diffusion limit

dY_t = σ_t dB_t,
dσ_t^2 = (ω − θσ_t^2) dt + ασ_t^2 dW_t,

for independent Brownian motions B_t, W_t. Since the drift and diffusion coefficients are Lipschitz continuous, there exists a unique strong solution and therefore a unique distributional solution. The corresponding Fokker-Planck equation for σ_t^2 is

∂f(x, t)/∂t = −(∂/∂x)[(ω − θx) f] + (1/2)(∂^2/∂x^2)(α^2x^2f),

where f(x, t) is the probability density of σ_t^2 given σ_0^2. A stationary distribution with p.d.f. g for σ_t^2 must satisfy g(x) = lim_{t→∞} f(x, t) and therefore

d/dx (α^2x^2 g(x)) = 2(ω − θx) g(x),

so

g′(x) = (2(ω − θx − α^2x)/(α^2x^2)) g(x),

with solution

g(x) = C x^{−2θ/α^2 − 2} exp(−2ω/(α^2x))

for a normalizing constant C. This is, up to the constant factor, the p.d.f. of an inverse gamma distribution, and therefore

C = (2ω/α^2)^{2θ/α^2 + 1} / Γ(2θ/α^2 + 1),

so under the assumption that 2θ/α^2 + 1 > 0,

σ_t^2 →^D Γ^{-1}(2θ/α^2 + 1, 2ω/α^2).
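Both conclusions can be checked numerically. The Python sketch below uses the hypothetical parameters ω = 0.1, θ = 1, α = 0.5 (so λ = 2θ/α^2 + 1 = 9 and μ = 2ω/α^2 = 0.8): it verifies that the constant C makes g a probability density, and that an Euler-Maruyama simulation of dσ_t^2 = (ω − θσ_t^2)dt + ασ_t^2 dW_t has a time average close to the inverse gamma mean μ/(λ − 1) = ω/θ.

```python
import random
from math import exp, gamma, sqrt

# hypothetical parameters: lam = 2*theta/alpha^2 + 1 = 9, mu = 2*omega/alpha^2 = 0.8
omega, theta, alpha = 0.1, 1.0, 0.5
lam, mu = 2 * theta / alpha**2 + 1, 2 * omega / alpha**2

def g(x):
    """Stationary density C * x^(-2*theta/alpha^2 - 2) * exp(-2*omega/(alpha^2 * x))."""
    return mu**lam / gamma(lam) * x**(-lam - 1.0) * exp(-mu / x)

# trapezoid rule: g should integrate to one over (0, infinity)
h = (2.0 - 1e-3) / 200_000
xs = [1e-3 + i * h for i in range(200_001)]
total = h * (sum(g(x) for x in xs) - 0.5 * (g(xs[0]) + g(xs[-1])))

# Euler-Maruyama for d(sigma^2) = (omega - theta*sigma^2) dt + alpha*sigma^2 dW;
# the time average should approach the stationary mean mu/(lam-1) = omega/theta
rng = random.Random(3)
dt, v, acc, steps = 0.01, omega / theta, 0.0, 200_000
for _ in range(steps):
    v += (omega - theta * v) * dt + alpha * v * rng.gauss(0.0, sqrt(dt))
    acc += v
avg = acc / steps
```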

Theorem 17. Assume that 2θ/α^2 + 1 > 0, with θ and α defined as above, and that

(σ^{(h)}_0)^2 →^D Γ^{-1}(2θ/α^2 + 1, 2ω/α^2)  (h → 0).

Then

(σ^{(h)}_{hk})^2 →^D Γ^{-1}(2θ/α^2 + 1, 2ω/α^2)

and

√((2θ + α^2)/(2ωh)) e^{(h)}_{hk} σ^{(h)}_{hk} →^D t(2 + 4θ/α^2)

as h → 0 while kh remains constant. Here, t(2 + 4θ/α^2) is the Student's t-distribution with 2 + 4θ/α^2 degrees of freedom.

Proof. The first statement follows immediately from the above considerations, since (σ^{(h)}_{hk})^2 converges in distribution to σ_t^2 for h → 0 and k = t/h. For the second statement, consider that h^{-1/2} e^{(h)}_{hk} is standard normally distributed and independent of σ^{(h)}_{hk} for every h, k. Therefore we assume without loss of generality that the process is stationary, that is,

(σ^{(h)}_{hk})^2 ∼ Γ^{-1}(λ, μ)

for every h, k, defining λ = 2θ/α^2 + 1 and μ = 2ω/α^2. Since the density of (σ^{(h)}_{hk})^2 is

g(x) = C x^{−λ−1} exp(−μ/x)

with C defined as earlier, the density of σ^{(h)}_{hk} is

f_σ(x) = g(x^2) |d(x^2)/dx| = 2C x^{−2λ−1} exp(−μ/x^2),

so that the density of the product h^{-1/2} e^{(h)}_{hk} σ^{(h)}_{hk} is

f(y) = (1/√(2π)) ∫_0^∞ (1/x) e^{−x^2/2} f_σ(y/x) dx
 = (2C/√(2π)) y^{−2λ−1} ∫_0^∞ x^{2λ} e^{−x^2(1/2 + μ/y^2)} dx
 = (2C/√(2π)) y^{−2λ−1} · (1/2)(1/2 + μ/y^2)^{−λ−1/2} Γ(λ + 1/2)
 = μ^λ (μ + y^2/2)^{−λ−1/2} Γ(λ + 1/2)/(√(2π) Γ(λ)),

so that the density of the product √((2θ + α^2)/(2ωh)) e^{(h)}_{hk} σ^{(h)}_{hk} = √(λ/(μh)) e^{(h)}_{hk} σ^{(h)}_{hk} is

f(y) = √(μ/λ) μ^λ (μ + μy^2/(2λ))^{−λ−1/2} Γ(λ + 1/2)/(√(2π) Γ(λ))
 = (1 + y^2/(2λ))^{−(2λ+1)/2} Γ((2λ+1)/2)/(√(2λπ) Γ(2λ/2)),

which is the density of a t-distributed random variable with 2λ = 2 + 4θ/α^2 degrees of freedom. The theorem is proved.
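The resulting t-distribution can be confirmed by Monte Carlo: with Z ∼ N(0,1) independent of σ^2 ∼ Γ^{-1}(λ, μ), the product √(λ/μ)·Z·σ should be t-distributed with 2λ degrees of freedom; for the illustrative choice λ = 3, μ = 2 its variance should therefore be 2λ/(2λ − 2) = 1.5.

```python
import random
from math import sqrt

def scaled_product_sample(lam, mu, n, seed=1):
    """Draw sqrt(lam/mu)*Z*sigma with Z ~ N(0,1) and sigma^2 ~ InvGamma(lam, mu);
    an inverse gamma draw is mu divided by a Gamma(lam, 1) draw."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        sigma2 = mu / rng.gammavariate(lam, 1.0)
        out.append(sqrt(lam / mu) * rng.gauss(0.0, 1.0) * sqrt(sigma2))
    return out

ys = scaled_product_sample(3.0, 2.0, 200_000)
var = sum(y * y for y in ys) / len(ys)   # should be near Var t(6) = 1.5
```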

The COGARCH(1,1) model

Recall that for the GARCH(1,1) process with ω > 0, α, β ≥ 0 (see Theorem 3.1), we have the representation

σ_t^2 = ω Σ_{j=0}^{t−1} Π_{i=1}^{j} (α e_{t−i}^2 + β) + σ_0^2 Π_{k=0}^{t−1} (α e_{t−k}^2 + β)
 = ω Σ_{j=0}^{t−1} exp(Σ_{i=1}^{j} log(α e_{t−i}^2 + β)) + σ_0^2 exp(Σ_{k=0}^{t−1} log(α e_{t−k}^2 + β)).

This motivates the following continuous-time GARCH(1,1) variant due to Klüppelberg, Lindner and Maller:

Definition 18. Let (L_t)_{t≥0} be a Lévy process adapted to a filtered probability space (Ω, F, P, (F_t)) satisfying the usual conditions: F_t contains every P-null set for every t, and F_t = ∩_{s>t} F_s. Define the càdlàg process

X_t = −t log β − Σ_{s≤t} log(1 + (α/β)(ΔL_s)^2)  (t ≥ 0)

and, with a random variable σ_0 independent of (L_t)_{t≥0}, the càglàd volatility process

σ_t^2 = (ω ∫_0^t e^{X_s} ds + σ_0^2) e^{−X_{t−}}  (t ≥ 0)

and the càdlàg (integrated) COGARCH(1,1) process

G_t = ∫_0^t σ_s dL_s  (t > 0),  G_0 = 0.

G_t plays the same role as the cumulative log-returns Y_t = X_0 + ... + X_t from the GARCH model described in section 1. The simplest example of a COGARCH(1,1) process is driven by a Brownian motion: L_t = B_t. Since B_t is almost surely continuous, we have X_t = −t log β, so

σ_t^2 = (ω ∫_0^t β^{−s} ds + σ_0^2) β^t = ω(β^t − 1)/log β + σ_0^2 β^t.

In this case, σ_t^2 is deterministic. This is not surprising: in general, the jumps ΔL_s are the analogues of the innovations e_n in discrete time.
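The deterministic Brownian case is convenient for a numerical sanity check; the Python sketch below (with arbitrary parameter values) compares the closed form for σ_t^2 with a midpoint-rule evaluation of the defining integral.

```python
from math import log

def sigma2_closed(omega, beta, sigma0_sq, t):
    """Closed form sigma_t^2 = omega*(beta^t - 1)/log(beta) + sigma0^2 * beta^t."""
    return omega * (beta**t - 1.0) / log(beta) + sigma0_sq * beta**t

def sigma2_numeric(omega, beta, sigma0_sq, t, n=100_000):
    """Direct evaluation of (omega * int_0^t e^{X_s} ds + sigma0^2) e^{-X_t}
    with X_s = -s*log(beta), i.e. e^{X_s} = beta^(-s)."""
    h = t / n
    integral = sum(beta ** (-(i + 0.5) * h) for i in range(n)) * h  # midpoint rule
    return (omega * integral + sigma0_sq) * beta**t

a = sigma2_closed(0.1, 0.9, 0.25, 3.0)
b = sigma2_numeric(0.1, 0.9, 0.25, 3.0)
```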

Lemma 19. Let X_t and σ_t^2 be given as above. Then σ_t^2 solves the stochastic differential equation

dσ_{t+}^2 = ω dt + σ_t^2 e^{X_{t−}} d(e^{−X_t})

and therefore

σ_t^2 = ωt + log β ∫_0^t σ_s^2 ds + (α/β) Σ_{0<s≤t} σ_s^2 (ΔL_s)^2 + σ_0^2.

by integration by parts and the chain rule, and since

the quadratic covariation [e^{−X}, ∫_0^· e^{X_s} ds]_t = 0 (the second factor is continuous and of finite variation), we have

e^{−X_t} ∫_0^t e^{X_s} ds = t + ∫_0^t (∫_0^s e^{X_r} dr) d(e^{−X_s}),

so

dσ_t^2 = ω d(e^{−X_t} ∫_0^t e^{X_s} ds) + σ_0^2 d(e^{−X_t})
 = ω dt + ω (∫_0^t e^{X_s} ds) d(e^{−X_t}) + σ_0^2 d(e^{−X_t})
 = ω dt + σ_t^2 e^{X_{t−}} d(e^{−X_t}).

8 Example with MATLAB

In this section, we use the GARCH methodology to analyze the exchange rate between the U.S. dollar and the euro since its full introduction in 2002.

The above graph shows the daily average value of USD in euros from January 1st, 2002 until July 9, 2011. First, the data is transformed into its log returns. With the original time series named "data", the MATLAB code is simple:

for i = 1:length(data)-1
    temp(i) = log(data(i+1) / data(i));
end
global log_returns;
log_returns = temp;
plot(1:length(log_returns), log_returns);

where the log returns are saved as a global variable so that they can be accessed easily in other functions later on.

34 This returns the graph below:

Alternating periods of volatility and relative quiet are visible, as well as a period of intense volatility in late 2008 (around 2500 days from the start) which likely corresponds to the subprime mortgage crisis. At first glance, the data appear to show heteroskedastic effects. We now estimate the parameters using MATLAB's optimization toolbox. The Gaussian quasi-maximum likelihood function is implemented and stored separately in a file called QMLE.m:

function y = QMLE(param)
    global log_returns;
    sigma2(1) = param(1)/(1 - (param(2) + param(3)));
    y = log(sigma2(1)) + (log_returns(1)^2)/sigma2(1);
    for i = 2:length(log_returns)
        sigma2(i) = param(1) + param(2)*log_returns(i-1)^2 ...
            + param(3)*sigma2(i-1);
        assert(sigma2(i) > 0);
        y = y + log(sigma2(i)) + (log_returns(i)^2)/sigma2(i);
    end

where param is a vector (ω, α, β), sigma2(i) is σ_i^2 and log_returns(i) is X_i.
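For reference, the same criterion can be written as a short Python function (a hypothetical port of QMLE above, with param = (ω, α, β) and the same stationary start value); minimizing it over the same box constraints with any constrained optimizer gives the QML estimate.

```python
from math import log

def qmle(param, log_returns):
    """Gaussian quasi-log-likelihood criterion for GARCH(1,1), up to constants:
    sum over k of log(sigma2_k) + X_k^2 / sigma2_k."""
    omega, alpha, beta = param
    sigma2 = omega / (1.0 - (alpha + beta))   # stationary start value sigma2(1)
    y = log(sigma2) + log_returns[0] ** 2 / sigma2
    for x_prev, x in zip(log_returns, log_returns[1:]):
        sigma2 = omega + alpha * x_prev ** 2 + beta * sigma2
        assert sigma2 > 0
        y += log(sigma2) + x ** 2 / sigma2
    return y
```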

The function QMLE is now minimized over the compact set

[10^{−15}, 5] × [10^{−15}, 1] × [10^{−15}, 1]

under the additional constraint that α + β ≤ 1 − 10^{−15}. The reason for this is that the constraints must be given in the form Ax ≤ b instead of Ax < b. This optimization is done with

[param, fval] = fmincon(@QMLE, [0.0000002, 0.03, 0.96], ...
    [0, 1, 1], 1 - 10^-15, [], [], lb, ub, [], options)

where lb = 10^{−5}·[1; 1; 1] is the lower bound and ub = [5; 1; 1] the upper bound for the parameters. MATLAB generates the output

param = 0.0000  0.0248  0.9744
fval = -3.4017e+004

and we find (after increasing the displayed precision) our QML estimate

ω̂ = 2.27·10^{−8},  α̂ = 0.0248,  β̂ = 0.9744.

The function also generates the reconstructed volatility process σ_t^2(θ̂_n) described in section 4:

where the dotted line represents the stationary volatility.

We can now simulate the process for the following year. Since the parameters were estimated by Gaussian quasi-maximum likelihood, it is appropriate that normally distributed innovations be used for the simulation. Normally distributed pseudorandom numbers are generated in MATLAB with randn. We define the function

function [sim, sigma2] = simul(param, last_sigma2, last_observ, t)
    assert(length(param) == 3);
    assert(last_sigma2 > 0);
    assert(all(param > 0));
    innov = randn([1, t]);
    sigma2(1) = param(1) + param(2)*last_observ^2 ...
        + param(3)*last_sigma2;
    sim(1) = innov(1)*sqrt(sigma2(1));
    for i = 2:t
        sigma2(i) = param(1) + param(2)*sim(i-1)^2 ...
            + param(3)*sigma2(i-1);
        sim(i) = innov(i)*sqrt(sigma2(i));
    end

Four simulated volatilities are shown below:

which correspond to the following predicted exchange rates:

9 Discussion

In this thesis, we have considered the strengths and weaknesses of the GARCH(1,1) model in mathematical finance, as well as the practical questions of parameter estimation and implementation. GARCH models and their variants have become ubiquitous in the theory of economic time series since their introduction roughly 25 years ago. This is due to the relative simplicity of the model and the wide range of processes it can approximate. Related models such as the EARCH and EGARCH models of Nelson (1989) provide the ability to account for empirical phenomena such as the leverage effect, at the cost of a more complex asymptotic theory for the typical estimators.

The investigation of multivariate GARCH models remains an active area of research. The need for such models arises when one considers a set of time series with significant interdependence; an example is the stock price of a manufacturing firm together with the commodity prices for the resources it requires. The most general model replaces the GARCH specification with matrix-valued coefficients, a log-returns vector X_t and a volatility matrix σ_t (that is, such that σ_t^2 is the conditional covariance matrix of X_t). This is known as the Vec model. However, it can be very difficult to work with, as necessary and sufficient conditions ensuring that σ_t^2 is positive definite are difficult or impossible to derive. Therefore, the Vec model is often restricted to subclasses such as the BEKK model. The theory of these models is beyond the scope of this thesis.

Another area of further research is the connection between GARCH models and stochastic volatility models. Two examples of continuous-time processes related to the discrete GARCH equations are mentioned in section 7; these are the only such classes of processes known at this time. The stationary distribution of the diffusion limit may also contain information about the stationary distribution of the GARCH model and thus about its long-term behavior. In addition, the convergence to the diffusion limit is weak, and does not necessarily imply that the GARCH model and the diffusion limit are asymptotically equivalent.

References

[1] Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics, 31, 307-327

[2] Engle, R.F., 1982, Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation, Econometrica, 50, 987-1008

[3] Engle, R.F. and Ng, V.K., 1993, Measuring and testing the impact of news on volatility, Journal of Finance, 48, 1749-1778

[4] Francq, C. and Zakoïan, J-M., Modèles GARCH: structure, inférence statistique et applications financières, Economica: Paris, 2009

[5] Kreiss, J-P. and Neuhaus, G., Einführung in die Zeitreihenanalyse, Berlin, Heidelberg: Springer-Verlag, 2006

[6] Klüppelberg, C., Lindner, A. and Maller, R., 2004, A continuous time GARCH(1,1) process driven by a Lévy process: stationarity and second order behaviour, Journal of Applied Probability, 41, 1-22

[7] MATLAB version 7.11.0. Natick, Massachusetts: The MathWorks Inc., 2010.

[8] Nelson, D.B., 1988, Stationarity and persistence in the GARCH(1,1) model, Econometric The- ory, 6, 318-334

[9] Nelson, D.B., 1989, Conditional heteroskedasticity in asset returns: A new approach, Econo- metrica, 59, 347-370

[10] Nelson, D.B., 1990, ARCH models as diffusion approximations, Journal of Econometrics, 45, 7-28

[11] Sentana, E., 1995, Quadratic ARCH models, Review of Economic Studies, 62, 639-661

[12] Stroock, D.W. and Varadhan, S.R.S., Multidimensional Diffusion Processes, Berlin: Springer- Verlag, 2006

[13] Zako¨ıan,J-M., 1994, Threshold heteroskedastic models, Journal of Economic Dynamics and Control, 18, 931-955
