Gaussian Processes
1. Basic Notions

Let $T$ be a set, and $X := \{X_t\}_{t \in T}$ a stochastic process, defined on a suitable probability space $(\Omega, \mathcal{F}, \mathrm{P})$, that is indexed by $T$.

Definition 1.1. We say that $X$ is a Gaussian process indexed by $T$ when $(X_{t_1}, \ldots, X_{t_k})$ is a Gaussian random vector for every $t_1, \ldots, t_k \in T$ and $k \geq 1$. The distribution of $X$ (that is, the Borel measure $A \mapsto \mu(A) := \mathrm{P}\{X \in A\}$) is called a Gaussian measure.

Lemma 1.2. Suppose $X := (X_1, \ldots, X_k)$ is a Gaussian random vector. If we set $T := \{1, \ldots, k\}$, then the stochastic process $\{X_t\}_{t \in T}$ is a Gaussian process. Conversely, if $\{X_t\}_{t=1}^k$ is a Gaussian process, then $(X_1, \ldots, X_k)$ is a Gaussian random vector.
The proof is left as an exercise.

Definition 1.3. If $X$ is a Gaussian process indexed by $T$, then we define $\mu(t) := \mathrm{E}(X_t)$ and $C(s, t) := \operatorname{Cov}(X_s, X_t)$ for all $s, t \in T$. The functions $\mu$ and $C$ are called the mean and covariance functions of $X$ respectively.

Lemma 1.4. A symmetric $k \times k$ real matrix $C$ is the covariance of some Gaussian random vector if and only if $C$ is positive semidefinite. The latter property means that
$$\sum_{i=1}^k \sum_{j=1}^k z_i C_{i,j} z_j \geq 0 \quad \text{for all } z_1, \ldots, z_k \in \mathbf{R}.$$
Proof. Consult any textbook on multivariate normal distributions. $\square$
Corollary 1.5. A function $C : T \times T \to \mathbf{R}$ is the covariance function of some $T$-indexed Gaussian process if and only if $(C(t_i, t_j))_{1 \leq i,j \leq k}$ is a positive semidefinite matrix for all $t_1, \ldots, t_k \in T$.

Definition 1.6. From now on we will say that a function $C : T \times T \to \mathbf{R}$ is positive semidefinite when $(C(t_i, t_j))_{1 \leq i,j \leq k}$ is a positive semidefinite matrix for all $t_1, \ldots, t_k \in T$.
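As a quick numerical illustration of Corollary 1.5 (not part of the text's argument), the following NumPy sketch checks positive semidefiniteness of a candidate covariance matrix via its eigenvalues, and samples a mean-zero Gaussian vector with a given covariance; the function names are mine.

```python
import numpy as np

def is_positive_semidefinite(C, tol=1e-10):
    """Check that the symmetric matrix C has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(C).min() >= -tol)

def sample_gaussian_vector(C, rng):
    """Draw one mean-zero Gaussian vector with covariance C.

    Writes C = Q diag(w) Q' and returns Q diag(sqrt(w)) Z, where Z is a
    vector of i.i.d. standard normals.
    """
    w, Q = np.linalg.eigh(C)
    w = np.clip(w, 0.0, None)  # guard against tiny negative round-off
    return Q @ (np.sqrt(w) * rng.standard_normal(len(w)))

# A valid covariance matrix, and a symmetric matrix that is NOT one:
C_good = np.array([[2.0, 1.0], [1.0, 2.0]])
C_bad = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3 and -1
```

By Lemma 1.4 only `C_good` can arise as a covariance; attempting a square root of `C_bad` would fail precisely because of its negative eigenvalue.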
Note that we understand the structure of every Gaussian process by looking only at finitely-many Gaussian random variables at a time. As a result, the theory of Gaussian processes does not depend a priori on the topological structure of the indexing set $T$. In this sense, the theory of Gaussian processes is quite different from that of Markov processes, martingales, etc. In those theories, it is essential that $T$ is a totally-ordered set [such as $\mathbf{R}_+$ or $\mathbf{R}$], for example. Here, $T$ can in principle be any set. Still, it can happen that $X$ has particularly-nice structure when $T$ is Euclidean, or more generally, has some nice group structure. We anticipate this possibility and introduce the following.
Definition 1.7. Suppose $T$ is an abelian group and $\{X_t\}_{t \in T}$ a Gaussian process indexed by $T$. Then we use the additive notation for $T$, and say that $X$ is stationary when $(X_{t_1+\tau}, \ldots, X_{t_k+\tau})$ and $(X_{t_1}, \ldots, X_{t_k})$ have the same law for all $t_1, \ldots, t_k, \tau \in T$.
Lemma 1.8. Let $T$ be an abelian group and let $X := \{X_t\}_{t \in T}$ denote a $T$-indexed Gaussian process with mean function $\mu$ and covariance function $C$. Then $X$ is stationary if and only if $\mu$ and $C$ are "translation invariant." That means that
$$\mu(t + \tau) = \mu(t) \quad \text{and} \quad C(t_1, t_2) = C(t_1 + \tau, t_2 + \tau) \quad \text{for all } t, t_1, t_2, \tau \in T.$$
The proof is left as an exercise.
2. Examples of Gaussian Processes

§1. Brownian Motion. By Brownian motion $X$, we mean a Gaussian process, indexed by $\mathbf{R}_+ := [0, \infty)$, with mean function 0 and covariance function
$$C(s, t) := \min(s, t) \quad [s, t \geq 0].$$
In order to justify this definition, it suffices to prove that $C$ is a positive semidefinite function on $T \times T = \mathbf{R}_+^2$. Suppose $t_1, \ldots, t_k \geq 0$ and $z_1, \ldots, z_k \in \mathbf{R}$. Then,
$$\begin{aligned}
\sum_{i=1}^k \sum_{j=1}^k z_i z_j C(t_i, t_j) &= \sum_{i=1}^k \sum_{j=1}^k z_i z_j \int_0^\infty \mathbf{1}_{[0,t_i]}(x)\, \mathbf{1}_{[0,t_j]}(x)\, \mathrm{d}x \\
&= \int_0^\infty \left( \sum_{i=1}^k z_i \mathbf{1}_{[0,t_i]}(x) \right) \left( \sum_{j=1}^k z_j \mathbf{1}_{[0,t_j]}(x) \right) \mathrm{d}x \\
&= \int_0^\infty \left( \sum_{i=1}^k z_i \mathbf{1}_{[0,t_i]}(x) \right)^2 \mathrm{d}x \geq 0.
\end{aligned}$$
Therefore, Brownian motion exists.
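The displayed computation can be checked numerically (a sanity check of mine, not part of the proof): the quadratic form built from the kernel $\min(s,t)$ agrees with a Riemann-sum approximation of $\int_0^\infty (\sum_i z_i \mathbf{1}_{[0,t_i]}(x))^2\, \mathrm{d}x$, and the Gram matrix is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.array([0.3, 0.7, 1.1, 2.5])   # arbitrary time points t_1, ..., t_k
z = rng.standard_normal(t.size)      # arbitrary real weights z_1, ..., z_k

# Quadratic form sum_{i,j} z_i z_j min(t_i, t_j):
C = np.minimum.outer(t, t)
quad = z @ C @ z

# Same quantity as a Riemann sum for the integral of
# (sum_i z_i 1_{[0, t_i]}(x))^2 over x in [0, max t]:
dx = 1e-4
x = np.arange(0.0, t.max() + dx, dx)
f = ((x[None, :] <= t[:, None]) * z[:, None]).sum(axis=0)
integral = (f ** 2).sum() * dx

# Positive semidefiniteness of the Gram matrix (min eigenvalue >= 0
# up to round-off):
psd = bool(np.linalg.eigvalsh(C).min() >= -1e-12)
```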
§2. The Brownian Bridge. A Brownian bridge is a mean-zero Gaussian process, indexed by $[0, 1]$, and with covariance
$$C(s, t) = \min(s, t) - st \quad [0 \leq s, t \leq 1]. \tag{6.1}$$
The most elegant proof of existence, that I am aware of, is due to J. L. Doob: Let $B$ be a Brownian motion, and define
$$X_t := B_t - tB_1 \quad [0 \leq t \leq 1].$$
Then, $X := \{X_t\}_{t \in [0,1]}$ is a mean-zero Gaussian process that is indexed by $[0, 1]$ and has the covariance function of (6.1).

§3. The Ornstein–Uhlenbeck Process. An Ornstein–Uhlenbeck process is a stationary Gaussian process $X$ indexed by $\mathbf{R}_+$ with mean function 0 and covariance function
$$C(s, t) = \mathrm{e}^{-|s-t|} \quad [s, t \geq 0].$$
It remains to prove that $C$ is a positive semidefinite function. The proof rests on the following well-known formula:¹
$$\mathrm{e}^{-|t|} = \frac{1}{\pi} \int_{-\infty}^\infty \frac{\mathrm{e}^{ixt}}{1 + x^2}\, \mathrm{d}x \quad [t \in \mathbf{R}]. \tag{6.2}$$
Thanks to (6.2),
$$\begin{aligned}
\sum_{i=1}^k \sum_{j=1}^k z_i z_j C(t_i, t_j) &= \frac{1}{\pi} \int_{-\infty}^\infty \sum_{i=1}^k \sum_{j=1}^k z_i z_j\, \mathrm{e}^{ix(t_i - t_j)}\, \frac{\mathrm{d}x}{1 + x^2} \\
&= \frac{1}{\pi} \int_{-\infty}^\infty \left| \sum_{j=1}^k z_j\, \mathrm{e}^{ixt_j} \right|^2 \frac{\mathrm{d}x}{1 + x^2} \geq 0.
\end{aligned}$$

¹In other words, if $Y$ has a standard Cauchy distribution on the line, then its characteristic function is $\mathrm{E}\exp(itY) = \exp(-|t|)$.
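Formula (6.2) is easy to sanity-check numerically. The sketch below (mine; the truncation length, grid size, and tolerance are ad hoc choices) approximates the real part of the integral by a trapezoidal sum over $[-L, L]$ and compares it with $\mathrm{e}^{-|t|}$; truncating the domain costs at most $2/(\pi L)$.

```python
import numpy as np

def cauchy_fourier(t, L=400.0, n=800_001):
    """Approximate (1/pi) * integral of cos(x t)/(1+x^2) over [-L, L].

    By (6.2) this should be close to exp(-|t|); truncation to [-L, L]
    introduces an error of at most 2/(pi*L).
    """
    x = np.linspace(-L, L, n)
    y = np.cos(x * t) / (1.0 + x ** 2)
    dx = x[1] - x[0]
    # trapezoidal rule on the uniform grid:
    return (y.sum() - 0.5 * (y[0] + y[-1])) * dx / np.pi

ts = (0.0, 0.5, 1.0, 2.0)
approx = [cauchy_fourier(t) for t in ts]
exact = [float(np.exp(-abs(t))) for t in ts]
```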
§4. Brownian Sheet. An $N$-parameter Brownian sheet $X$ is a Gaussian process, indexed by $\mathbf{R}_+^N := [0, \infty)^N$, whose mean function is zero and covariance function is
$$C(s, t) = \prod_{\ell=1}^N \min(s_\ell, t_\ell) \quad [s := (s_1, \ldots, s_N),\ t := (t_1, \ldots, t_N) \in \mathbf{R}_+^N].$$
Clearly, a 1-parameter Brownian sheet is Brownian motion; in that case, the existence problem has been addressed. In general, we may argue as follows: For all $z_1, \ldots, z_k \in \mathbf{R}$ and $t^1, \ldots, t^k \in \mathbf{R}_+^N$,
$$\begin{aligned}
\sum_{i=1}^k \sum_{j=1}^k z_i z_j \prod_{\ell=1}^N \min\left(t_\ell^i, t_\ell^j\right) &= \sum_{i=1}^k \sum_{j=1}^k z_i z_j \prod_{\ell=1}^N \int_0^\infty \mathbf{1}_{[0,t_\ell^i]}(x)\, \mathbf{1}_{[0,t_\ell^j]}(x)\, \mathrm{d}x \\
&= \sum_{i=1}^k \sum_{j=1}^k z_i z_j \int_{\mathbf{R}_+^N} \prod_{\ell=1}^N \mathbf{1}_{[0,t_\ell^i]}(x_\ell)\, \mathbf{1}_{[0,t_\ell^j]}(x_\ell)\, \mathrm{d}x.
\end{aligned}$$
It is harmless to take the complex conjugate of $z_j$ since $z_j$ is real valued. But now $z_j^{1/N}$ is in general complex-valued, and we may write
$$\begin{aligned}
\sum_{i=1}^k \sum_{j=1}^k z_i z_j \prod_{\ell=1}^N \min\left(t_\ell^i, t_\ell^j\right) &= \int_{\mathbf{R}_+^N} \sum_{i=1}^k \sum_{j=1}^k \prod_{\ell=1}^N z_i^{1/N}\, \overline{z_j^{1/N}}\, \mathbf{1}_{[0,t_\ell^i]}(x_\ell)\, \mathbf{1}_{[0,t_\ell^j]}(x_\ell)\, \mathrm{d}x \\
&= \int_{\mathbf{R}_+^N} \left| \sum_{i=1}^k \prod_{\ell=1}^N z_i^{1/N}\, \mathbf{1}_{[0,t_\ell^i]}(x_\ell) \right|^2 \mathrm{d}x \geq 0.
\end{aligned}$$
This proves that the Brownian sheet exists.

§5. Fractional Brownian Motion. A fractional Brownian motion [or fBm] is a Gaussian process indexed by $\mathbf{R}_+$ that has mean function 0, $X_0 := 0$, and covariance function given by
$$\mathrm{E}\left(|X_t - X_s|^2\right) = |t - s|^{2\alpha} \quad [s, t \geq 0] \tag{6.3}$$
for some constant $\alpha > 0$. The constant $\alpha$ is called the Hurst parameter of $X$. Note that (6.3) indeed yields the covariance function of $X$: Since $\operatorname{Var}(X_t) = \mathrm{E}(|X_t - X_0|^2) = t^{2\alpha}$,
$$|t - s|^{2\alpha} = \mathrm{E}\left(X_t^2\right) + \mathrm{E}\left(X_s^2\right) - 2\,\mathrm{E}(X_t X_s) = t^{2\alpha} + s^{2\alpha} - 2\operatorname{Cov}(X_s, X_t).$$
Therefore,
$$\operatorname{Cov}(X_s, X_t) = \frac{t^{2\alpha} + s^{2\alpha} - |t - s|^{2\alpha}}{2} \quad [s, t \geq 0]. \tag{6.4}$$
Direct inspection shows that (6.4) does not define a positive-definite function $C$ when $\alpha \leq 0$. This is why we have limited ourselves to the case that $\alpha > 0$.

Note that an fBm with Hurst index $\alpha = 1/2$ is a Brownian motion. The reason is the following elementary identity:
$$\frac{t + s - |t - s|}{2} = \min(s, t) \quad [s, t \geq 0],$$
which can be verified by considering the cases $s \leq t$ and $s > t$ separately. The more interesting "if" portion of the following is due to Mandelbrot and Van Ness (1968).

Theorem 2.1. An fBm with Hurst index $\alpha$ exists if and only if $\alpha \leq 1$.
Fractional Brownian motion with Hurst index $\alpha = 1$ is a trivial process in the following sense: Let $N$ be a standard normal random variable, and define $X_t := tN$. Then, $X := \{X_t\}_{t \geq 0}$ is fBm with index $\alpha = 1$. For this reason, many experts do not refer to the $\alpha = 1$ case as fractional Brownian motion, and reserve the terminology fBm for the case that $\alpha \in (0, 1)$. Also, fractional Brownian motion with Hurst index $\alpha = 1/2$ is Brownian motion.
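Both remarks, and the "only if" half of Theorem 2.1, can be seen in finite samples. The sketch below (a numerical illustration of mine) evaluates the candidate covariance (6.4): for $\alpha = 1/2$ it reproduces $\min(s,t)$ exactly, while for $\alpha = 3/2 > 1$ the resulting Gram matrix already has a negative eigenvalue on a four-point grid, so no such Gaussian process can exist.

```python
import numpy as np

def fbm_cov(s, t, alpha):
    """Candidate fBm covariance (6.4) with Hurst parameter alpha."""
    return 0.5 * (t ** (2 * alpha) + s ** (2 * alpha) - abs(t - s) ** (2 * alpha))

grid = np.array([0.5, 1.0, 2.0, 3.0])

# alpha = 1/2 recovers Brownian motion's covariance min(s, t):
C_half = np.array([[fbm_cov(s, t, 0.5) for t in grid] for s in grid])
C_min = np.minimum.outer(grid, grid)

# alpha = 3/2 > 1: the candidate covariance matrix is NOT positive
# semidefinite, consistent with Theorem 2.1:
C_big = np.array([[fbm_cov(s, t, 1.5) for t in grid] for s in grid])
min_eig = float(np.linalg.eigvalsh(C_big).min())
```

For instance, the principal $2 \times 2$ submatrix of `C_big` at the points $1, 2$ is $\begin{pmatrix} 1 & 4 \\ 4 & 8 \end{pmatrix}$, whose determinant is $-8 < 0$.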
Proof. First we examine the case that $\alpha < 1$. Our goal is to prove that
$$C(s, t) := \frac{t^{2\alpha} + s^{2\alpha} - |t - s|^{2\alpha}}{2}$$
is a covariance function. Consider the function
$$\Phi_t(x) := (t - x)_+^{\alpha - (1/2)} - (-x)_+^{\alpha - (1/2)}, \tag{6.5}$$
defined for all $t \geq 0$ and $x \in \mathbf{R}$, where $w_+ := \max(w, 0)$ for all $w \in \mathbf{R}$. Direct inspection yields that $\int_{-\infty}^\infty [\Phi_t(x)]^2\, \mathrm{d}x < \infty$, since $\alpha < 1$, and in fact a second computation on the side yields
$$\int_{-\infty}^\infty \Phi_s(x)\, \Phi_t(x)\, \mathrm{d}x = \kappa\, C(s, t) \quad \text{for all } s, t \geq 0, \tag{6.6}$$
where $\kappa$ is a positive and finite constant that depends only on $\alpha$. In particular,
$$\begin{aligned}
\sum_{i=1}^k \sum_{j=1}^k z_i z_j C(t_i, t_j) &= \frac{1}{\kappa} \sum_{i=1}^k \sum_{j=1}^k z_i z_j \int_{-\infty}^\infty \Phi_{t_i}(x)\, \Phi_{t_j}(x)\, \mathrm{d}x \\
&= \frac{1}{\kappa} \int_{-\infty}^\infty \left( \sum_{i=1}^k z_i \Phi_{t_i}(x) \right)^2 \mathrm{d}x \geq 0.
\end{aligned}$$
This proves the theorem in the case that $\alpha < 1$. We have seen already that the theorem holds [easily] when $\alpha = 1$. Therefore, we now consider $\alpha > 1$, and strive to prove that fBm does not exist in this case. The proof hinges on a technical fact which we state without proof; this and much more will be proved later on in Theorem 2.3 on page 97. Recall that $\bar{Y}$ is a modification of $Y$ when $\mathrm{P}\{Y_t = \bar{Y}_t\} = 1$ for all $t$.

Proposition 2.2. Let $Y := \{Y_t\}_{t \in [0,\tau]}$ denote a Gaussian process indexed by $T := [0, \tau]$, where $\tau > 0$ is a fixed constant. Suppose there exist a finite constant $C$ and a constant $\eta > 0$ such that
$$\mathrm{E}\left(|Y_t - Y_s|^2\right) \leq C|t - s|^\eta \quad \text{for all } 0 \leq s, t \leq \tau.$$
Then $Y$ has a Hölder-continuous modification $\bar{Y}$. Moreover, for every non-random constant $\rho \in (0, \eta/2)$,
$$\sup_{0 \leq s \neq t \leq \tau} \frac{|\bar{Y}_t - \bar{Y}_s|}{|t - s|^\rho} < \infty \quad \text{almost surely}. \tag{6.7}$$
We use Proposition 2.2 in the following way: Suppose to the contrary that there existed an fBm $X$ with Hurst parameter $\alpha > 1$. By Proposition 2.2, $X$ would have a continuous modification $\bar{X}$ such that for all $\rho \in (0, \alpha)$ and $\tau > 0$,
$$V_\rho(\tau) := \sup_{0 \leq s \neq t \leq \tau} \frac{|\bar{X}_t - \bar{X}_s|}{|t - s|^\rho} < \infty \quad \text{almost surely}.$$
Choose $\rho \in (1, \alpha)$ and observe that
$$\left|\bar{X}_t - \bar{X}_s\right| \leq V_\rho(\tau)\, |t - s|^\rho \quad \text{for all } s, t \in [0, \tau],$$
almost surely for all $\tau > 0$. Divide both sides by $|t - s|$ and let $t \to s$ in order to see that $\bar{X}$ is differentiable and its derivative is zero everywhere, a.s. Since $\bar{X}_0 = X_0 = 0$ a.s., it then follows that $\bar{X}_t = 0$ a.s. for all $t \geq 0$. In particular, $\mathrm{P}\{X_t = 0\} = 1$ for all $t \geq 0$. Since the variance of $X_t$ is supposed to be $t^{2\alpha}$, we are led to a contradiction. $\square$

§6. White Noise and Wiener Integrals. Let $H$ be a real Hilbert space with norm $\|\cdot\|_H$ and corresponding inner product $\langle \cdot\,, \cdot \rangle_H$.

Definition 2.3. A white noise indexed by $T = H$ is a Gaussian process $\{\xi(h)\}_{h \in H}$, indexed by $H$, with mean function 0 and covariance function
$$C(h_1, h_2) = \langle h_1, h_2 \rangle_H \quad [h_1, h_2 \in H].$$

The proof of existence is fairly elementary: For all $z_1, \ldots, z_k \in \mathbf{R}$ and $h_1, \ldots, h_k \in H$,
$$\sum_{i=1}^k \sum_{j=1}^k z_i z_j C(h_i, h_j) = \sum_{i=1}^k \sum_{j=1}^k z_i z_j \langle h_i, h_j \rangle_H = \left\| \sum_{i=1}^k z_i h_i \right\|_H^2,$$
which is clearly $\geq 0$.
The following simple result is one of the centerpieces of this section, and plays an important role in the sequel.

Lemma 2.4. For every $z_1, \ldots, z_k \in \mathbf{R}$ and $h_1, \ldots, h_k \in H$,
$$\xi\left( \sum_{i=1}^k z_i h_i \right) = \sum_{i=1}^k z_i\, \xi(h_i) \quad \text{a.s.}$$
Proof. We plan to prove that: (a) For all $z \in \mathbf{R}$ and $h \in H$,
$$\xi(zh) = z\,\xi(h) \quad \text{a.s.}; \tag{6.8}$$
and (b) for all $h_1, h_2 \in H$,
$$\xi(h_1 + h_2) = \xi(h_1) + \xi(h_2) \quad \text{a.s.} \tag{6.9}$$
Together, (6.8) and (6.9) imply the lemma with $k = 2$; the general case follows from this case, after we apply induction. Let us prove (6.8) then:
$$\begin{aligned}
\mathrm{E}\left(|\xi(zh) - z\,\xi(h)|^2\right) &= \mathrm{E}\left(|\xi(zh)|^2\right) + z^2\, \mathrm{E}\left(|\xi(h)|^2\right) - 2z \operatorname{Cov}(\xi(zh), \xi(h)) \\
&= \|zh\|_H^2 + z^2 \|h\|_H^2 - 2z \langle zh, h \rangle_H = 0.
\end{aligned}$$
This proves (6.8). As regards (6.9), we note that
$$\begin{aligned}
&\mathrm{E}\left(|\xi(h_1 + h_2) - \xi(h_1) - \xi(h_2)|^2\right) \\
&\quad= \mathrm{E}\left(|\xi(h_1 + h_2)|^2\right) + \mathrm{E}\left(|\xi(h_1) + \xi(h_2)|^2\right) - 2 \operatorname{Cov}\left(\xi(h_1 + h_2),\, \xi(h_1) + \xi(h_2)\right) \\
&\quad= \|h_1 + h_2\|_H^2 + \|h_1\|_H^2 + \|h_2\|_H^2 + 2\langle h_1, h_2 \rangle_H - 2\left[ \langle h_1 + h_2, h_1 \rangle_H + \langle h_1 + h_2, h_2 \rangle_H \right] \\
&\quad= \|h_1\|_H^2 + \|h_2\|_H^2 + 2\langle h_1, h_2 \rangle_H - \|h_1 + h_2\|_H^2,
\end{aligned}$$
which is zero, thanks to the Pythagorean rule on $H$. This proves (6.9) and hence the lemma. $\square$
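In a finite-dimensional setting the linearity of Lemma 2.4 is an exact linear-algebra identity, not merely an almost-sure one. The sketch below (an illustration of mine) realizes white noise on $H = \mathbf{R}^n$ as $\xi(h) := \langle h, X \rangle$ for a fixed standard normal vector $X$ (the realization that appears later in Example 2.11), and verifies (6.8) and (6.9) directly.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
X = rng.standard_normal(n)  # i.i.d. standard normals, fixed once

def xi(h):
    """Finite-dimensional white noise on H = R^n: xi(h) = <h, X>."""
    return float(np.dot(h, X))

h1, h2 = rng.standard_normal(n), rng.standard_normal(n)
z = 2.75

# (6.8): xi(z h) = z xi(h), and (6.9): xi(h1 + h2) = xi(h1) + xi(h2):
lhs_scaling, rhs_scaling = xi(z * h1), z * xi(h1)
lhs_additivity, rhs_additivity = xi(h1 + h2), xi(h1) + xi(h2)
```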
Lemma 2.4 can be rewritten in the following essentially-equivalent form.
Theorem 2.5 (Wiener). The map $\xi : H \to L^2(\Omega) := L^2(\Omega, \mathcal{F}, \mathrm{P})$ is a linear Hilbert-space isometry.
Because of its isometry property, white noise is also referred to as the iso-normal or iso-gaussian process.

Very often, the Hilbert space $H$ is an $L^2$-space itself; say, $H = L^2(\mu) := L^2(A, \mathcal{A}, \mu)$. Then, we can think of $\xi(h)$ as an $L^2(\mathrm{P})$-valued integral of $h \in H$. In such a case, we sometimes adopt an integral notation; namely,
$$\int h(x)\, \xi(\mathrm{d}x) := \int h\, \mathrm{d}\xi := \xi(h).$$
This operation has all but one of the properties of integrals: The triangle inequality does not hold.²

Definition 2.6. The random variable $\int h\, \mathrm{d}\xi$ is called the Wiener integral of $h \in H = L^2(\mu)$. One also defines definite Wiener integrals as follows: For all $h \in L^2(\mu)$ and $E \in \mathcal{A}$,
$$\int_E h(x)\, \xi(\mathrm{d}x) := \int_E h\, \mathrm{d}\xi := \xi\left(h \mathbf{1}_E\right).$$
This is a rational definition since $\|h \mathbf{1}_E\|_{L^2(\mu)} \leq \|h\|_{L^2(\mu)} < \infty$.

An important property of white noise is that, since it is a Hilbert-space isometry, it maps orthogonal elements of $H$ to orthogonal elements of $L^2(\mathrm{P})$. In other words:
$$\mathrm{E}\left[\xi(h_1)\, \xi(h_2)\right] = 0 \quad \text{if and only if} \quad \langle h_1, h_2 \rangle_H = 0.$$
Because $(\xi(h_1), \xi(h_2))$ is a Gaussian random vector of uncorrelated coordinates, we find that
$$\xi(h_1) \text{ and } \xi(h_2) \text{ are independent if and only if } \langle h_1, h_2 \rangle_H = 0.$$

The following is a ready consequence of this rationale.

Proposition 2.7. If $H_1, H_2, \ldots$ are orthogonal subspaces of $H$, then $\{\xi(h)\}_{h \in H_1}, \{\xi(h)\}_{h \in H_2}, \ldots$ are independent Gaussian processes.

The following highlights the strength of the preceding result.

Proposition 2.8. Let $\{\psi_n\}_{n=1}^\infty$ be a complete orthonormal basis for $H$. Then, we can find a sequence of i.i.d. standard normal random variables $X_1, X_2, \ldots$ such that
$$\xi(h) = \sum_{n=1}^\infty h_n X_n,$$
²In fact, $|\xi(h)| \geq 0$ a.s., whereas $\xi(|h|)$ is negative with probability $1/2$.
where $h_n := \langle h, \psi_n \rangle_H$ and the sum converges in $L^2(\mathrm{P})$.
Remark 2.9. Proposition 2.8 yields a 1-1 identification of the white noise $\xi$ with the i.i.d. sequence $\{X_n\}_{n=1}^\infty$. Therefore, in the setting of Proposition 2.8, some people refer to a sequence of i.i.d. standard normal random variables as white noise.
Proof. Thanks to Proposition 2.7, $X_n := \xi(\psi_n)$ defines an i.i.d. sequence of standard normal random variables. According to the Riesz–Fischer theorem,
$$h = \sum_{n=1}^\infty h_n \psi_n \quad \text{for every } h \in H,$$
where the sum converges in $H$. Therefore, Theorem 2.5 ensures that
$$\xi(h) = \sum_{n=1}^\infty h_n\, \xi(\psi_n) = \sum_{n=1}^\infty h_n X_n \quad \text{for every } h \in H,$$
where the sum converges in $L^2(\mathrm{P})$. We have implicitly used the following ready consequence of Wiener's isometry [Theorem 2.5]: If $g_n \to g$ in $H$, then $\xi(g_n) \to \xi(g)$ in $L^2(\mathrm{P})$. It might help to recall that the reason is simply that $\|\xi(g_n - g)\|_{L^2(\mathrm{P})} = \|g_n - g\|_H$. $\square$
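When $H$ is finite-dimensional, the expansion of Proposition 2.8 is a finite sum and can be verified exactly. The sketch below (mine) realizes white noise on $H = \mathbf{R}^n$ as $\xi(h) := \langle h, X \rangle$, builds an arbitrary orthonormal basis from a QR factorization, and checks that $\sum_n \langle h, \psi_n \rangle\, \xi(\psi_n) = \xi(h)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
X = rng.standard_normal(n)  # i.i.d. standard normals

def xi(h):
    """White noise on H = R^n, realized as xi(h) = <h, X>."""
    return float(np.dot(h, X))

# Any orthonormal basis works; here, the columns of the Q-factor of a
# random matrix. The values xi(psi_j) play the role of the X_n of
# Proposition 2.8.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
psi = [Q[:, j] for j in range(n)]

h = rng.standard_normal(n)
coeffs = [float(np.dot(h, p)) for p in psi]  # h_j = <h, psi_j>
expansion = sum(c * xi(p) for c, p in zip(coeffs, psi))
direct = xi(h)
```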
Next we work out a few examples of Hilbert spaces that arise in the literature.

Example 2.10 (Zero-Dimensional Hilbert Spaces). We can identify $H = \{0\}$ with a Hilbert space in a canonical way. In this case, white noise indexed by $H$ is just a normal random variable with mean zero and variance 0 [i.e., $\xi(0) := 0$].

Example 2.11 (Finite-Dimensional Hilbert Spaces). Choose and fix an integer $n \geq 1$. The space $H := \mathbf{R}^n$ is a real Hilbert space with inner product $\langle x, y \rangle_H := \sum_{i=1}^n x_i y_i$ and norm $\|x\|_H := (\sum_{i=1}^n x_i^2)^{1/2}$. Let $\xi$ denote white noise indexed by $H = \mathbf{R}^n$ and define a random vector $X := (X_1, \ldots, X_n)$ via
$$X_i := \xi(e_i) \quad [1 \leq i \leq n],$$
where $e_1 := (1, 0, \ldots, 0), \ldots, e_n := (0, \ldots, 0, 1)$ denote the usual orthonormal basis elements of $\mathbf{R}^n$. According to Proposition 2.8 and its proof, $X_1, \ldots, X_n$ are i.i.d. standard normal random variables, and for every $n$-vector $z := (z_1, \ldots, z_n)$,
$$\xi(z) = \sum_{i=1}^n z_i X_i = z \cdot X. \tag{6.10}$$
Now consider points $t^1, \ldots, t^k \in \mathbf{R}^n$, write $t^i_j$ for the $j$th coordinate of $t^i$, and define
$$Y := \begin{pmatrix} \xi(t^1) \\ \vdots \\ \xi(t^k) \end{pmatrix}.$$
Then $Y$ is a mean-zero Gaussian random vector with covariance matrix $A'A$, where $A$ is an $n \times k$ matrix whose $j$th column is $t^j$. Then we can apply (6.10) to see that $Y = A'X$. In other words, every multivariate normal random vector with mean vector 0 and covariance matrix $A'A$ can be written as a linear combination $A'X$ of i.i.d. standard normals.

Example 2.12 (Lebesgue Spaces). Consider the usual Lebesgue space $H := L^2(\mathbf{R}_+)$. Since $\mathbf{1}_{[0,t]} \in L^2(\mathbf{R}_+)$ for all $t \geq 0$, we can define a mean-zero Gaussian process $B := \{B_t\}_{t \geq 0}$ by setting
$$B_t := \xi\left(\mathbf{1}_{[0,t]}\right) = \int_0^t \mathrm{d}\xi. \tag{6.11}$$
Then, $B$ is a Brownian motion because
$$\mathrm{E}[B_s B_t] = \left\langle \mathbf{1}_{[0,s]}, \mathbf{1}_{[0,t]} \right\rangle_{L^2(\mathbf{R}_+)} = \min(s, t).$$
Since $\mathrm{E}(|B_t - B_s|^2) = |t - s|$, Kolmogorov's continuity theorem [Proposition 2.2] shows that $B$ has a continuous modification $\bar{B}$. Of course, $\bar{B}$ is also a Brownian motion, but it has continuous trajectories [Wiener's Brownian motion].

Some authors interpret (6.11) somewhat loosely and present white noise as the derivative of Brownian motion. This viewpoint can be made rigorous in the following way: White noise is the weak derivative of Brownian motion, in the sense of distribution theory. We will not delve into this matter further though. I will close this example by mentioning, to those that know Wiener and Itô's theories of stochastic integration against Brownian motion, that the Wiener integral $\int_0^\infty h\, \mathrm{d}B$ of a non-random function $h \in L^2(\mathbf{R}_+)$ is the same object as $\int_0^\infty h\, \mathrm{d}\xi = \xi(h)$ here. Indeed, it suffices to prove this assertion when $h = \mathbf{1}_{[0,t]}$ for some fixed number $t > 0$. But then the assertion is just our definition (6.11) of the Brownian motion $B$.

Example 2.13 (Lebesgue Spaces, Continued). Here is a fairly general recipe for constructing mean-zero Gaussian processes from white noise: Suppose we could write
$$C(s, t) = \int K(x, s)\, K(x, t)\, \mu(\mathrm{d}x) \quad [s, t \in T],$$
where $\mu$ is a locally-finite measure on some measure space $(A, \mathcal{A})$, and $K : A \times T \to \mathbf{R}$ is a function such that $K(\bullet, t) \in L^2(\mu)$ for all $t \in T$. Then,
the recipe is this: Let $\xi$ be white noise on $H := L^2(\mu)$, and define
$$X_t := \int K(x, t)\, \xi(\mathrm{d}x) \quad [t \in T].$$
Then, $X := \{X_t\}_{t \in T}$ defines a mean-zero $T$-indexed Gaussian process with covariance function $C$. Here are some examples of how we can use this idea to build mean-zero Gaussian processes from white noise.

(1) Let $A := \mathbf{R}_+$, $\mu :=$ Lebesgue measure, and $K(x, t) := \mathbf{1}_{[0,t]}(x)$. These choices lead us to the same white-noise construction of Brownian
motion as the previous example.

(2) Given a number $\alpha \in (0, 1)$, let $\xi$ be a white noise on $H := L^2(\mathbf{R})$. Because of (6.6) and our general discussion, earlier in this example, we find that
$$X_t := \frac{1}{\sqrt{\kappa}} \int \left[ (t - x)_+^{\alpha - (1/2)} - (-x)_+^{\alpha - (1/2)} \right] \xi(\mathrm{d}x) \quad [t \geq 0]$$
defines an fBm with Hurst index $\alpha$.

(3) For a more interesting example, consider the Ornstein–Uhlenbeck process, whose covariance function is, we recall,
$$C(s, t) = \mathrm{e}^{-|s-t|} \quad [s, t \geq 0].$$
Define
$$\mu(\mathrm{d}x) := \frac{1}{\pi} \cdot \frac{\mathrm{d}x}{1 + x^2} \quad [-\infty < x < \infty].$$
According to (6.2), and thanks to symmetry,
$$\begin{aligned}
C(s, t) &= \int \mathrm{e}^{ix(s-t)}\, \mu(\mathrm{d}x) = \int \cos(x(s - t))\, \mu(\mathrm{d}x) \\
&= \int \cos(xs) \cos(xt)\, \mu(\mathrm{d}x) + \int \sin(xs) \sin(xt)\, \mu(\mathrm{d}x).
\end{aligned}$$
Now we follow our general discussion, let $\xi$ and $\xi'$ be two independent white noises on $L^2(\mu)$, and then define
$$X_t := \int \cos(xt)\, \xi(\mathrm{d}x) - \int \sin(xt)\, \xi'(\mathrm{d}x) \quad [t \geq 0].$$
Then, $X := \{X_t\}_{t \geq 0}$ is an Ornstein–Uhlenbeck process.³
³One could just as easily put a plus sign in place of the minus sign here. The rationale for this particular way of writing $X$ is that if we study the "complex-valued white noise" $\zeta := \xi + i\xi'$, where $\xi'$ is an independent copy of $\xi$, then
$$X_t = \operatorname{Re} \int \exp(ixt)\, \zeta(\mathrm{d}x).$$
A fully-rigorous discussion requires facts about "complex-valued" Gaussian processes, which I will not develop here.
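The decomposition used in item (3) can be sanity-checked numerically (a check of mine; the truncation length, grid, and tolerance are ad hoc): discretizing $\mu$ on a grid turns $K(\cdot, t) = (\cos(xt), \sin(xt))$ into a finite "feature vector," and the $L^2(\mu)$ inner product of the features at $s$ and $t$ should reproduce $\mathrm{e}^{-|s-t|}$.

```python
import numpy as np

# Discretize mu(dx) = dx / (pi (1 + x^2)) on a truncated symmetric grid:
dx = 2e-3
x = np.arange(-500.0, 500.0 + dx, dx)
w = dx / (np.pi * (1.0 + x ** 2))  # quadrature weights approximating mu

def ou_cov_approx(s, t):
    """<K(., s), K(., t)>_{L^2(mu)} with K = (cos(xt), sin(xt)),
    approximated on the grid; should be close to exp(-|s - t|)."""
    integrand = np.cos(x * s) * np.cos(x * t) + np.sin(x * s) * np.sin(x * t)
    return float(np.sum(integrand * w))

pairs = [(0.0, 0.0), (0.0, 1.0), (0.5, 2.0)]
approx = [ou_cov_approx(s, t) for s, t in pairs]
exact = [float(np.exp(-abs(s - t))) for s, t in pairs]
```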