SIMULATION AND IDENTIFICATION OF THE FRACTIONAL BROWNIAN MOTION: A BIBLIOGRAPHICAL AND COMPARATIVE STUDY
COEURJOLLY Jean-François 1
Date: June 7, 2000
Abstract: We present a non-exhaustive bibliographical and comparative study of the problem of simulation and identification of the fractional Brownian motion. The implementation discussed here is realized within the software S-plus 3.4. A few simulations illustrate this work. Furthermore, we propose a test based on the asymptotic behavior of an estimator of the self-similarity parameter to assess the quality of different generators. This easily computable procedure allows us to single out an efficient simulation method. The Appendix describes the S-plus scripts related to the simulation and identification methods of the fBm.
Keywords: Fractional Brownian motion, simulation, identification of a parametric model, quality of a generator, S-plus.
1 LMC-IMAG, BP 53 - 38041 Grenoble cedex 9 - FRANCE, [email protected]. To whom correspondence should be addressed.

1 Introduction
Many naturally occurring phenomena can be effectively modeled using self-similar processes. For such processes, observations that are far apart (in time or space) are too strongly correlated, indicating the presence of long-range dependence. As a result, self-similar processes have been used to successfully model data exhibiting long-range dependence and arising in a variety of scientific fields, including hydrology, see e.g. [6], geophysics [12], biology [8], telecommunication networks [28] and economics [24]. The empirical presence of long memory in such series is found in a local version of the power spectrum, which behaves as $|\lambda|^{1-2H}$ as $\lambda \to 0$, where $H \in\, ]1/2,1[$ is the long-memory parameter. Among the simplest models that display long-range dependence, one can consider the fractional Brownian motion, introduced by Kolmogorov in a theoretical context [19], and by Mandelbrot and his co-workers [21] for its statistical applications. The fractional Brownian motion (in short fBm), denoted by $\{B_{H,C}(t),\ t \ge 0\}$, with parameters $(H,C) \in\, ]0,1[ \times \mathbb{R}_+^*$, is the process defined as the fractional integration of a Gaussian pure white noise, or equivalently by the stochastic integral

$$B_H(t) = C\, V_H^{1/2} \int_{\mathbb{R}} f_t(s)\, dB(s) \qquad (1)$$

with

$$f_t(s) = \frac{1}{\Gamma(H+1/2)} \left\{ |t-s|^{H-1/2}\, \mathbf{1}_{]-\infty,t]}(s) - |s|^{H-1/2}\, \mathbf{1}_{]-\infty,0]}(s) \right\},$$

with $B_H(0)=0$ and $V_H = \Gamma(2H+1)\sin(\pi H)$. Due to the non-stationarity of the fBm and the presence of long memory, simulation and identification of a fBm are delicate tasks. A vast literature has been published on these subjects. A good survey can be found in Beran [6], where historical and statistical aspects are considered. We refer to Adler et al. [4] or Taqqu et al. [27] for an empirical study of estimation methods. Through a bibliographical study, we intend to draw up a non-exhaustive list of methods for simulating a fBm and for estimating the self-similarity parameter H.
Firstly, in Section 2, we recall fundamental properties of the fBm: covariance and autocovariance functions, spectral density, Hausdorff dimension. We then describe five simulation methods in Section 3: the method of Mandelbrot et al. [21], that of Sellan et al. [2], the Choleski method, the Levinson method [24] and finally the method of Wood and Chan [29]. Section 4 discusses several methods for estimating the self-similarity parameter: spectral methods, maximum likelihood, time-scale methods and temporal methods. Section 5 presents a few simulation results with boxplots, thus illustrating numerically the notions of Sections 3 and 4. Section 6 explores the quality of pseudo-random generators of fBm. A similar study has been undertaken by Jennane et al. [17]: three testing procedures defined independently of the model's identification were considered there. Our approach is slightly different: we provide a theoretical test based on the asymptotic behavior of a parametric estimator of H which allows us to point out a good simulation method for the fBm. Finally, we display in the Appendix the S-plus scripts implementing the methods considered in Sections 3 and 4.
2 Properties of fractional Brownian motion
As an alternative to the representation (1), the fBm can be defined as the unique mean-zero Gaussian process, null at the origin, with stationary and self-similar increments, such that
$$\mathbb{E}\left( B_{H,C}(t) - B_{H,C}(s) \right)^2 = C^2\, |t-s|^{2H}, \quad \forall\, s,t \in \mathbb{R}_+ . \qquad (2)$$

Hereafter, we shall call standard fractional Brownian motion the fBm with scale parameter C ≡ 1. Let us briefly review some fundamental results about the fBm. From the self-similarity property, we deduce the covariance and autocovariance functions, given by

$$\Gamma(t,s) = \mathrm{Cov}\big(B_H(t),\, B_H(s)\big) = \frac{C^2}{2}\left( |t|^{2H} + |s|^{2H} - |t-s|^{2H} \right) \qquad (3)$$

$$\gamma(t-s) = \mathrm{Cov}\big(B_H(t+1)-B_H(t),\, B_H(s+1)-B_H(s)\big) = \frac{C^2}{2}\left( |t-s-1|^{2H} - 2\,|t-s|^{2H} + |t-s+1|^{2H} \right) . \qquad (4)$$

In the particular case H = 1/2, the fBm is identical to the Brownian motion; consequently γ(k) = 0 for |k| ≥ 1. When H ≠ 1/2, an asymptotic expansion as |k| → +∞ exhibits a hyperbolic decrease of γ(·):
$$\gamma(k) \sim C^2\, H(2H-1)\, |k|^{2H-2}, \quad \text{as } |k| \to +\infty . \qquad (5)$$
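For numerical work, eq. (4) and the asymptotic relation (5) are easy to check. A minimal Python sketch (the paper's own scripts are in S-plus, so this translation is purely illustrative):

```python
import math

def fgn_autocov(k, H, C=1.0):
    """Autocovariance gamma(k) of the fractional Gaussian noise, eq. (4)."""
    k = abs(k)
    return 0.5 * C**2 * (abs(k - 1)**(2*H) - 2 * k**(2*H) + (k + 1)**(2*H))
```

For H = 1/2 this vanishes at every nonzero lag, and for H ≠ 1/2 the values at large lags match the hyperbolic rate of (5).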
This asymptotic behavior clearly shows that a path deviating from its mean will tend to deviate further when H > 1/2, and to return closer to the mean when H < 1/2. The increments process of the fBm is called the fractional Gaussian noise (in short fGn). The fGn constitutes a stationary time series and admits a spectral density, defined as the Fourier transform of γ(·), explicitly given by
$$f(\lambda) = 2\, c_\lambda\, (1-\cos\lambda) \sum_{j\in\mathbb{Z}} |2\pi j + \lambda|^{-1-2H}, \quad \forall\, \lambda \in [0,2\pi] , \qquad (6)$$

with $c_\lambda = \frac{C^2}{2\pi}\, \sin(\pi H)\, \Gamma(2H+1)$. A Taylor expansion of f near 0 shows that the spectral signature of the fGn is $|\lambda|^{1-2H}$, indicating a pole at zero for H > 1/2, a characteristic fact of long-memory processes. Concerning the regularity of the paths, the fBm, similarly to the Brownian motion, has continuous and almost surely non-differentiable sample paths. The fractal approach refines the difference. Indeed, the Hausdorff dimension of a fBm with parameter $H \in\, ]0,1[$ is almost surely equal to 2 − H, which implies that for H < 1/2 the paths are more irregular than those of the Brownian motion, and conversely for H > 1/2. Figure 1 illustrates this remark. Finally, let us mention a property of continuity in H, in the sense of Kolmogorov, for the fBm, proved by Peltier and Lévy-Véhel [22]:
$$\forall\, [a,b] \subset\, ]0,1[ \ \text{and}\ K > 0, \quad \lim_{h\to 0}\ \sup_{\substack{a \le H, H' \le b \\ |H-H'| \le h}}\ \sup_{t\in[0,K]} \big| B_{H,C}(t) - B_{H',C}(t) \big| = 0 . \qquad (7)$$

Figure 1: Samples of a fractional Brownian motion on [0,1], simulated by Wood-Chan's method for values H = 0.3, 0.5, 0.8 from top to bottom.

3 Simulating the fractional Brownian motion

3.1 Statement of the problem

In this Section, our aim is to describe a few methods for simulating a fBm. We adopt the following framework: simulation of a sample of a standard fractional Brownian motion (C = 1), of length N, at times i/N, i = 0, ..., N − 1. Two approaches are distinguished. The first one consists in using only the properties of the fBm. It gives rise to three methods: the first one is based on a stochastic representation of the fBm [21], the second one consists in extracting the square root of the covariance matrix of the fBm, and the last one consists in the fractional integration of a Gaussian white noise decomposed on a multiresolution analysis, and relies upon a method of Sellan et al. [2]. The second approach is in fact an indirect way: the idea is to generate a fGn, $\tilde X$, at times i/N, for i = 0, ..., N − 1, and then to define a sample of a fBm via the cumulated sums of $\tilde X$, that is to say to define $\tilde B_H(i/N) = \sum_{k=0}^{i} \tilde X(k/N)$ for i = 1, ..., N − 1 and $\tilde B_H(0) = 0$. The interest of using the increments process rests in its stationarity. The covariance matrix of the fGn at times i/N, i = 0, ..., N − 1, is a Toeplitz matrix. The two methods presented (method of Levinson [24], method of Wood-Chan [29]) consist in computing in an exact way the square root of a Toeplitz matrix.

3.2 Stochastic representation of fBm

By considering the fBm's representation of Mandelbrot and Van Ness [21], the first natural idea to simulate a fBm consists in discretizing the stochastic integral (1).
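As an illustration, such a truncated Riemann-sum discretization of (1) can be sketched in Python as follows. The truncation bound and the treatment of the singular endpoints are our own choices, not a transcription of the paper's S-plus code, and the approach is of historical interest only:

```python
import math
import numpy as np

def fbm_mvn(N, H, a_N=None, seed=0):
    """Approximate fBm at times t/N, t = 0..N-1, by a truncated Riemann sum
    of the Mandelbrot-Van Ness integral (1); a rough sketch only."""
    rng = np.random.default_rng(seed)
    if a_N is None:
        a_N = int(N**1.5)               # truncation bound used in the text
    V_H = math.gamma(2*H + 1) * math.sin(math.pi * H)
    c = math.sqrt(V_H) / (math.gamma(H + 0.5) * N**H)
    B1 = rng.standard_normal(a_N)       # noise on k = -a_N..-1 (the "past")
    B2 = rng.standard_normal(N)         # noise on k = 0..N-1
    ks = np.arange(-a_N, 0).astype(float)
    out = np.zeros(N)
    for t in range(1, N):
        past = ((t - ks)**(H - 0.5) - (-ks)**(H - 0.5)) @ B1
        recent = (np.arange(t, 0, -1.0)**(H - 0.5)) @ B2[:t]
        out[t] = c * (past + recent)
    return out
```

The double loop over t and the long past-sum make the cost grow quickly with N, one more reason this method is not used in practice.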
We have to approximate the integral (1) by a Riemann sum truncated at a bound $-a_N \to -\infty$. For t = 1, ..., N − 1, one gets

$$\tilde B_H\!\left(\frac{t}{N}\right) = \frac{V_H^{1/2}\, N^{-H}}{\Gamma(H+1/2)} \left\{ \sum_{k=-a_N}^{-1} \left[ (t-k)^{H-1/2} - (-k)^{H-1/2} \right] B_1(k) + \sum_{k=0}^{t} (t-k)^{H-1/2}\, B_2(k) \right\},$$

where $B_1$ (resp. $B_2$) is a vector of $a_N+1$ (resp. N) independent zero-mean standard Gaussian variables, $B_1$ being independent of $B_2$. The choice of $a_N$ results from a compromise between the desired precision and the number of temporal points. In practice, and for the illustration that follows, we have chosen $a_N = N^{1.5}$. This approach is purely historic and, owing to several approximations, is not a good way to generate a fBm.

3.3 Method of Sellan, Meyer and Abry

This method has been established by Sellan, Meyer and Abry [2]. Since the fBm is derived by fractional integration of a Gaussian pure white noise, the idea is to start from the decomposition of a pure white noise onto a multiresolution analysis (in short MRA), see e.g. Daubechies [9] for generalities on wavelets and MRA:

$$w(t) = \sum_{k\in\mathbb{Z}} \lambda(k)\, \phi_0(t-k) + \sum_{j\ge 0} \sum_{k\in\mathbb{Z}} \gamma_j(k)\, \psi_{j,k}(t), \qquad (8)$$

where the λ(k) and $\gamma_j(k)$ are standard independent Gaussian variables, $\phi_0$ is the scaling function and $(\psi_{j,k})_{j\ge 0,\, k\in\mathbb{Z}}$ the wavelets associated to the MRA. Applying the operator of fractional integration, denoted by $D^{-s}$ (with s = H + 1/2), to (8) leads to

$$B_H(t) = \sum_{k\in\mathbb{Z}} \lambda(k)\, (D^{-s}\phi_0)(t-k) + \sum_{j\ge 0} \sum_{k\in\mathbb{Z}} \gamma_j(k)\, (D^{-s}\psi_{j,k})(t). \qquad (9)$$

The following result, due to Sellan [26], describes explicitly how to fractionally integrate a MRA, and the necessity to introduce biorthogonal wavelets.

Theorem 1 Let $V_0(\phi_0)$ be an orthogonal MRA of $L^2(\mathbb{R})$ with regularity $r \in \mathbb{N}^*$, $\phi_0$ and $\psi_0$ representing respectively the scale function and the mother wavelet deduced from this analysis.
Assume that $s \in\, ]1/2, 3/2[$. Then

$$V_0^{(s)} = \overline{\{ f \in L^2(\mathbb{R}),\ D^s f \in V_0 \}} \quad \text{and} \quad V_0^{(-s)} = \overline{\{ f \in L^2(\mathbb{R}),\ D^{-s} f \in V_0 \}}$$

define two biorthogonal MRAs, admitting for scale functions

$$\phi_0^{(s)} = U_s(\phi_0) \ \text{for } V_0^{(s)}, \quad \text{and} \quad \phi_0^{(-s)} = U_{-s}(\phi_0) \ \text{for } V_0^{(-s)},$$

where $g = U_s(f)$ has for Fourier transform $(i2\pi\nu)^{-s}\,(1-\exp(i2\pi\nu))^{s}\, \hat f(\nu)$, and for mother wavelets

$$\psi_0^{(s)} = 4^{s}\, D^{-s}(\psi_0), \quad \text{and} \quad \psi_0^{(-s)} = 4^{-s}\, \bar D^{s}(\psi_0).$$

Here $\bar D^{-s}$ denotes the adjoint operator of $D^{-s}$, and $\bar E$, for a set E, is the closure of E. Under the conditions of Theorem 1, Sellan proves that there exists a Gaussian white noise with variance $\sigma^2$, allowing to construct an ARIMA(0, s, 0) process, denoted by $b_H$, and, for $j \in \mathbb{Z}_+$, a Gaussian discrete white noise with variance $2^{j}\sigma^2$, denoted by $(\gamma_j(k))_{k\in\mathbb{Z}}$, such that the restriction of $B_H$ to the interval ]0, T], T > 0, admits the following decomposition:

$$\forall\, t \in\, ]0,T], \quad B_H(t) - b_0 = \sum_{k\in\mathbb{Z}} b_H(k)\, \phi_0^{(s)}(t-k) + 4^{-s} \sum_{j\ge 0} \sum_{k\in\mathbb{Z}} 2^{-js}\, \gamma_j(k)\, \psi_{j,k}^{(s)}(t). \qquad (10)$$

The computing implementation is then realized in three steps:

1. Estimation of the filters related to $\phi_0^{(s)}$ and $\psi_0^{(s)}$: let u(k) be the filter of the initial MRA related to $\phi_0$, and v(k) the associated quadrature mirror filter. Denoting by $u^{(s)}$ and $v^{(s)}$ (resp. $u^{(-s)}$ and $v^{(-s)}$) the filters associated to $\phi_0^{(s)}$ and $\psi_0^{(s)}$ (resp. $\phi_0^{(-s)}$ and $\psi_0^{(-s)}$), the above cited authors show the following relations:

$$u^{(s)} = f^{(s)} * u, \quad F^{(s)}(z) = 2^{-s}(1+z^{-1})^{s},$$
$$v^{(s)} = g^{(s)} * v, \quad G^{(s)}(z) = 2^{s}(1-z^{-1})^{-s},$$
$$u^{(-s)} = f^{(-s)} * u, \quad F^{(-s)}(z) = 2^{s}(1+z)^{-s},$$
$$v^{(-s)} = g^{(-s)} * v, \quad G^{(-s)}(z) = 2^{-s}(1-z)^{s},$$

where $*$ denotes the convolution product and where $F^{(\epsilon s)}$ (resp. $G^{(\epsilon s)}$) denotes the z-transform of $f^{(\epsilon s)}$ (resp. $g^{(\epsilon s)}$), $\epsilon = \pm 1$. However, a numerical problem appears: for s = H + 1/2, the functions $f^{(\epsilon s)}$ and $g^{(\epsilon s)}$ have, in general, an infinite support and the series (10) diverges.
To avoid this problem, the following approximations are proposed:

$$u^{(s)} = u * f^{(1)} * tf^{(d)}, \qquad u^{(-s)} = -\delta_{-1} * (\widetilde{v^{(s)}})^{\vee},$$
$$v^{(s)} = v * g^{(1)} * tg^{(d)}, \qquad v^{(-s)} = \delta_{1} * (\widetilde{u^{(s)}})^{\vee},$$

where d = s − 1 and where $tf^{(d)}$ and $tg^{(d)}$ are versions of $f^{(d)}$ and $g^{(d)}$ truncated up to an order chosen a priori.

2. Simulation of $b_H$: the ARIMA(0, s, 0) process results from the convolution of a Gaussian white noise and a filter with impulse response defined by

$$\alpha_k^{(s)} = \sum_{p=0}^{k} (-1)^p \binom{-s}{p} \quad \text{with} \quad \binom{-s}{p} = \frac{\Gamma(-s+1)}{\Gamma(p+1)\,\Gamma(-s-p+1)} .$$

Since the same numerical difficulty noted previously appears, the following approximation is proposed: $b_H = \gamma_j * \alpha^{(1)} * t\alpha^{(d)}$, where $t\alpha^{(d)}$ is a version of $\alpha^{(d)}$ truncated up to an order chosen a priori.

3. One then truncates the series (10) at some resolution J to get

$$B_H^J(t) = \sum_{k\in\mathbb{Z}} b_H(k)\, \phi_{-J,k}^{(s)}(t) + \sum_{j=0}^{J} \sum_{k\in\mathbb{Z}} 4^{-s}\, 2^{-js}\, \gamma_j(k)\, \psi_{j,k}^{(s)}(t). \qquad (11)$$

One can now use the pyramidal algorithm of Mallat adapted to biorthogonal wavelets, see Daubechies [9]: decomposition with the help of the filters $u^{(-s)}$ and $v^{(-s)}$, and synthesis with the help of $u^{(s)}$ and $v^{(s)}$. The diagram in Fig. 2 illustrates these operations. To remain coherent, getting a sample of length N at resolution J requires generating a fBm on a duration $T = 2^{-J} N$; the self-similarity property is then used to get a sample at times i/N, for i = 0, ..., N − 1. In practice we have, as Abry and Sellan recommend, chosen a resolution of 6 or 7, and used the filters of a Daubechies wavelet of order 20, for its regularity properties.

Caption:
- $[\uparrow 2]$: dilation operator defined by $[\uparrow 2]x_k = x_{2k}$.
- $b_H$: simulated ARIMA(0, s, 0).
- $g_0, \ldots, g_{J-1}$: simulated standard Gaussian variables.

Figure 2: Diagram for the simulation of a fBm by wavelet synthesis.

3.4 Method of Choleski

Let $\Gamma$ be the covariance matrix of a standard fBm, discretized at times i/N, for i = 0, ..., N − 1.
From (3),

$$(\Gamma)_{i,j} = \Gamma\!\left(\frac{i}{N}, \frac{j}{N}\right), \quad \text{for } i,j = 0, \ldots, N-1.$$

Define $\Gamma'$ as the matrix $\Gamma$ deprived of its first row and its first column. Since $\Gamma'$ is a symmetric positive definite matrix, it admits a Choleski decomposition $\Gamma' = LL^t$, where L is a lower triangular matrix. Thus simulating a sample of a fBm at times i/N for i = 1, ..., N − 1 amounts to generating a vector Z of (N − 1) standard independent Gaussian variables and applying the product LZ. Indeed, LZ is a centered Gaussian vector and $\mathbb{E}\big((LZ)(LZ)^t\big) = \Gamma'$. Define $\tilde B = (0, (LZ)^t)^t$; then $\tilde B$ is a sample of a fBm at times i/N for i = 0, ..., N − 1. This method is the only one exact in theory, but due to a computational complexity of order $O(N^3)$ and to the fact that $\Gamma'$ is extremely ill-conditioned, it is of interest to derive methods that are less computationally demanding.

3.5 Method of Levinson

Let G be the autocovariance matrix of a standard fGn, discretized at times i/N, for i = 0, ..., N − 1. From (4), $(G)_{i,j} = \gamma((j-i)/N)$, for i, j = 0, ..., N − 1. To avoid the computation of the Choleski decomposition of G (which would lead to a method identical to the one presented in Section 3.4), it suffices to remark that G is a Toeplitz matrix: its first row suffices to reconstruct it, which leads to fast algorithms to extract the square root of G. The following one can be found in [24]. At step 1, define:

$$k_1 = -\gamma\!\left(\tfrac{1}{N}\right), \quad \sigma_1^2 = 1,$$
$$L^1 = \left(1, \gamma\!\left(\tfrac{1}{N}\right), \ldots, \gamma\!\left(\tfrac{N-1}{N}\right)\right)^t \quad \text{and} \quad \hat L^1 = \left(0, \gamma\!\left(\tfrac{1}{N}\right), \ldots, \gamma\!\left(\tfrac{N-1}{N}\right)\right)^t.$$

Then define the vectors

$$L^j = \sigma_j^2 \left(0, \ldots, 0, 1, \ell_2^j, \ldots, \ell_{N-j+1}^j\right)^t \quad \text{and} \quad \hat L^j = \sigma_j^2 \left(0, \ldots, 0, 0, \hat\ell_2^j, \ldots, \hat\ell_{N-j+1}^j\right)^t.$$

At step j + 1, one has:

$$k_{j+1} = \frac{-\hat\ell_2^{\,j}}{\sigma_j^2}, \quad \sigma_{j+1}^2 = \sigma_j^2 \left(1 - k_{j+1}^2\right),$$

$$\begin{pmatrix} \hat L^{j+1} \\ L^{j+1} \end{pmatrix} = \begin{pmatrix} I_N & k_{j+1} Z \\ k_{j+1} I_N & Z \end{pmatrix} \begin{pmatrix} \hat L^{j} \\ L^{j} \end{pmatrix}, \quad \text{where } Z = \begin{pmatrix} 0 & & & \\ 1 & 0 & & \\ & \ddots & \ddots & \\ & & 1 & 0 \end{pmatrix}.$$

Let D denote the matrix $D = \mathrm{diag}(\sigma_1, \ldots, \sigma_N)$ and let $L = (L^1, \ldots, L^N)\, D^{-1}$. One can check that $LL^t = G$.
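The recursion above is one way to exploit the Toeplitz structure. A closely related variant, sketched here in Python, is the Durbin-Levinson sampling scheme, which draws each X_n from its exact conditional law given the past; this is an illustrative substitute for the paper's S-plus implementation, not a transcription of it:

```python
import numpy as np

def fgn_durbin_levinson(N, H, seed=0):
    """Sequential fGn sample at times i/N via the Durbin-Levinson recursion
    on the Toeplitz autocovariance; cost O(N^2), like the method above."""
    rng = np.random.default_rng(seed)
    k = np.arange(N)
    # autocovariance of increments at spacing 1/N: N^(-2H) * gamma(k), eq. (4)
    r = 0.5 * N**(-2*H) * (np.abs(k - 1)**(2*H) - 2 * k**(2*H) + (k + 1)**(2*H))
    X = np.empty(N)
    phi = np.zeros(N)                  # phi[1:n] holds the prediction weights
    v = r[0]                           # one-step prediction variance
    X[0] = np.sqrt(v) * rng.standard_normal()
    for n in range(1, N):
        kappa = (r[n] - phi[1:n] @ r[1:n][::-1]) / v   # partial correlation
        phi_new = phi.copy()
        phi_new[n] = kappa
        phi_new[1:n] = phi[1:n] - kappa * phi[1:n][::-1]
        phi = phi_new
        v = v * (1 - kappa**2)
        X[n] = phi[1:n+1] @ X[:n][::-1] + np.sqrt(v) * rng.standard_normal()
    return X
```

Cumulated sums of the returned vector, with the convention $\tilde B_H(0) = 0$, then yield a fBm sample.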
Denoting by Z a vector of N standard independent Gaussian variables, the vector LZ defines a sample of a fGn at times i/N, for i = 0, ..., N − 1. The cumulated sums of this sample define a sample of a fBm, $\tilde B_H$, at the desired times (setting moreover $\tilde B_H(0) = 0$). This method generates a fGn exactly, with a computation cost of $O(N^2 \log N)$, but still remains particularly slow within S-plus as soon as N > 1000.

3.6 Method of Wood-Chan

Initially proposed by Davis and Harte [10], this method, available for any stationary Gaussian process, has been recently improved by Wood and Chan [29]. In order to extract the square root of the autocovariance matrix G, the idea is to embed G in a circulant matrix C of size $m = 2^g$, $g \in \mathbb{N}^*$, then to generate a vector $Y = (Y_0, \ldots, Y_{m-1})^t \sim N(0, C)$ and, thanks to an appropriate construction of C, to obtain $(Y_0, \ldots, Y_{N-1})^t \sim N(0, G)$. Let C be the matrix defined by

$$C = \begin{pmatrix} c_0 & c_1 & \cdots & c_{m-1} \\ c_{m-1} & c_0 & \cdots & c_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ c_1 & c_2 & \cdots & c_0 \end{pmatrix}, \quad \text{where } c_j = \begin{cases} \gamma(j/N) & \text{if } 0 \le j < \frac{m}{2}, \\ \gamma((m-j)/N) & \text{if } \frac{m}{2} \le j \le m-1. \end{cases}$$

By construction, C is symmetric and circulant. For questions of algorithmic rapidity, one chooses for m the first power of two with $m \ge 2(N-1)$ such that C is positive definite; the authors suggest an approximation when this condition cannot be fulfilled. For a fBm, this condition is satisfied for the value $m = 2 \times 2^{\nu}$, where $2^{\nu}$ is the first power of two greater than N. Then, in order to diagonalize C, one uses a result of Brockwell and Davis: C can be decomposed as $C = Q \Lambda Q^*$, where $\Lambda$ is the diagonal matrix of eigenvalues of C, and Q is the unitary matrix defined by

$$(Q)_{j,k} = m^{-1/2} \exp\left(-2i\pi \frac{jk}{m}\right), \quad \text{for } j,k = 0, \ldots, m-1 .$$

Because Q is unitary, if $Y = Q \Lambda^{1/2} Q^* Z$ with $Z \sim N(0, I_m)$, one has $Y \sim N(0, C)$. Thus the simulation procedure of a fBm sample at times i/N, for i = 0, ..., N − 1, reduces to the three following steps:

1.
Estimation of the eigenvalues of C: a matrix calculation shows that

$$\lambda_k = \sum_{j=0}^{m-1} c_j \exp\left(-2i\pi \frac{jk}{m}\right) = \sum_{j=0}^{m-1} c_j \exp\left(2i\pi \frac{jk}{m}\right), \quad \text{for } k = 0, \ldots, m-1 .$$

This computation may be done using the Fast Fourier Transform (direct or inverse).

2. Fast simulation of $Q^* Z$: by decomposing $Q^* Z$ into real and imaginary parts, the simulation of $W = Q^* Z$ amounts to the two following substeps:
- generate U, V, two independent N(0,1) variables, and write $W_0 = U$ and $W_{m/2} = V$;
- for $1 \le j < m/2$, generate $U_j$, $V_j$, two independent N(0,1) variables, and write

$$W_j = \frac{1}{\sqrt 2}\,(U_j + i V_j), \qquad W_{m-j} = \frac{1}{\sqrt 2}\,(U_j - i V_j).$$

3. Reconstruction of X: the last step consists in calculating

$$X\!\left(\frac{k}{N}\right) = \frac{1}{\sqrt m} \sum_{j=0}^{m-1} \sqrt{\lambda_j}\, W_j \exp\left(-2i\pi \frac{jk}{m}\right), \quad \text{for } k = 0, \ldots, m-1 ,$$

using again the FFT. We get a sample of a fBm, denoted $\tilde B_H$, by evaluating the cumulated sums of the vector $(X(0), X(1/N), \ldots, X((N-1)/N))$ and setting moreover $\tilde B_H(0) = 0$. The method of Wood and Chan is exact for simulating a fGn, has a complexity of $O(N \log N)$ and, from a practical point of view, is fast even for large values of N.

3.7 About the approximation of a fBm via the cumulated sums of a fGn

Let $\tilde B_H$ denote the vector defined by $\tilde B_H(0) = 0$ and

$$\tilde B_H(i/N) = \sum_{k=0}^{i} \tilde X(k/N), \quad i = 1, \ldots, N-1 ,$$

where $\tilde X$ denotes a sample of a fGn generated at times i/N, i = 0, ..., N − 1. Recall that

$$\mathbb{E}\big(\tilde X(i/N)\, \tilde X(j/N)\big) = \gamma\big((j-i)/N\big), \quad i,j = 0, \ldots, N-1 .$$

So it is easy to see that

$$\mathbb{E}\big(\tilde B_H(i/N)\, \tilde B_H(j/N)\big) = \sum_{k=0}^{i} \sum_{l=0}^{j} \gamma\big((l-k)/N\big), \quad i,j = 1, \ldots, N-1 .$$

Thus, one can discuss the approximation of a sample of a fBm via the cumulated sums of a sample of a fGn by estimating the relative error committed on the second-order structure, via the function E defined on $\{0, \ldots, N-1\}^2$ by

$$E(i,j) = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\[4pt] \left| \Gamma\!\left(\frac{i}{N}, \frac{j}{N}\right) - \sum_{k=0}^{i} \sum_{l=0}^{j} \gamma\big((l-k)/N\big) \right| \Big/ \Gamma\!\left(\frac{i}{N}, \frac{j}{N}\right) & \text{otherwise.} \end{cases}$$

Figure 3 displays the function E computed for different values of H. It is clear that such an approximation involves negligible errors on the covariance structure.
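A compact Python sketch of the three steps above (an illustrative translation, not the paper's S-plus code; positive semi-definiteness of the circulant embedding is assumed, as discussed in the text, and it holds in practice for the fGn):

```python
import numpy as np

def fgn_wood_chan(N, H, seed=0):
    """fGn sample at times i/N by circulant embedding (Wood-Chan)."""
    rng = np.random.default_rng(seed)

    def gamma(k):                       # unit-grid autocovariance, eq. (4)
        return 0.5 * (abs(k - 1)**(2*H) - 2 * k**(2*H) + (k + 1)**(2*H))

    m = 2 * 2**int(np.ceil(np.log2(N)))            # first power of two >= 2N
    c = np.array([gamma(j if j <= m // 2 else m - j) for j in range(m)])
    # step 1: eigenvalues of C by one FFT (tiny negatives clipped)
    lam = np.maximum(np.fft.fft(c).real, 0.0)
    # step 2: build W = Q* Z from independent N(0,1) draws
    U = rng.standard_normal(m // 2 + 1)
    V = rng.standard_normal(m // 2 + 1)
    W = np.empty(m, dtype=complex)
    W[0], W[m // 2] = U[0], V[0]
    j = np.arange(1, m // 2)
    W[j] = (U[j] + 1j * V[j]) / np.sqrt(2)
    W[m - j] = (U[j] - 1j * V[j]) / np.sqrt(2)
    # step 3: X(k/N) via one more FFT, then rescale to the grid i/N
    X = np.fft.fft(np.sqrt(lam) * W) / np.sqrt(m)
    return X.real[:N] * N**(-H)
```

Cumulated sums of the returned vector, with $\tilde B_H(0) = 0$, then yield the fBm sample; the two FFTs account for the $O(N \log N)$ cost noted in the text.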
Figure 3: Relative error on the covariance structure of a fBm approximated via the cumulated sums of a fGn, for different values of H.

4 Identification of the fractional Brownian motion

4.1 Statement of the problem

The analysis of the irregularity of data modelled by a fBm, the study of its spectral behavior, and any forecasting problem based on the fBm imply the necessity to estimate the Hurst parameter. In this Section, we briefly describe the main parametric methods to estimate the self-similarity parameter. We distinguish four approaches:

1. Spectral methods: log-periodogram, a variant of Lobato and Robinson's method.
2. Maximum likelihood: Whittle's estimator.
3. Time-scale methods: wavelet decomposition of the fBm.
4. Temporal methods: number of level crossings, discrete variations.

This list is not exhaustive (the R/S method or the correlogram approach has not been considered here, see e.g. Beran [6]) but presents the different ways of tackling the identification problem that have been discussed in the literature recently.
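To fix ideas before detailing each approach, here is a hedged Python sketch of the first one, the log-periodogram regression of Section 4.2.1; the bandwidth choice m = sqrt(N) and the function name are ours, not the paper's S-plus implementation:

```python
import numpy as np

def hurst_log_periodogram(x, m=None):
    """Estimate H from a fGn sample x by least-squares regression of
    log I_N(lambda_k) on log lambda_k over the first m Fourier frequencies:
    near zero the spectral density behaves as |lambda|^(1-2H), so the
    regression slope is close to 1 - 2H and H_hat = (1 - slope)/2."""
    N = len(x)
    if m is None:
        m = int(N**0.5)                 # bandwidth: an illustrative choice
    lam = 2 * np.pi * np.arange(1, m + 1) / N
    I = np.abs(np.fft.fft(x)[1:m + 1])**2 / (2 * np.pi * N)
    slope = np.polyfit(np.log(lam), np.log(I), 1)[0]
    return (1.0 - slope) / 2.0
```

Applied to Gaussian white noise (a fGn with H = 1/2), the estimates scatter around 0.5, with a variance that shrinks as the bandwidth m grows.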
In the sequel, we address this problem when observing a sample $(B_{H,C})$ of a non-standard fBm of length N at times i/N for i = 0, ..., N − 1; (X) will denote the sample of the increments of $(B_{H,C})$. Unless otherwise stated, the scale coefficient C is supposed to be unknown.

4.2 Spectral methods

4.2.1 Log-periodogram

This approach consists in exploiting, on the one hand, the spectral signature of the fGn, $f(\lambda) \sim c_f |\lambda|^{1-2H}$ as $|\lambda| \to 0$, and on the other hand the fact that the periodogram, defined by

$$I_N(\lambda) = \frac{1}{2\pi N} \left| \sum_{t=0}^{N-1} X(t)\, e^{-it\lambda} \right|^2, \quad \text{for } \lambda = \lambda_{k,N} = \frac{2\pi k}{N} ,$$

is an asymptotically unbiased estimator of the spectral density. One immediately notices that