
STOCHASTIC PROCESSES AND APPLICATIONS

G.A. Pavliotis

Imperial College London

February 2, 2014

Contents

1 Introduction
  1.1 Historical Overview
  1.2 The One-Dimensional Random Walk
  1.3 Why Randomness
  1.4 Discussion and Bibliography
  1.5 Exercises

2 Elements of Probability Theory
  2.1 Basic Definitions from Probability Theory
    2.1.1 Conditional Probability
  2.2 Random Variables
    2.2.1 Expectation of Random Variables
  2.3 Conditional Expectation
  2.4 The Characteristic Function
  2.5 Gaussian Random Variables
  2.6 Types of Convergence and Limit Theorems
  2.7 Discussion and Bibliography
  2.8 Exercises

3 Basics of the Theory of Stochastic Processes
  3.1 Definition of a Stochastic Process
  3.2 Stationary Processes
    3.2.1 Strictly Stationary Processes
    3.2.2 Second Order Stationary Processes
    3.2.3 Ergodic Properties of Second-Order Stationary Processes
  3.3 Brownian Motion
  3.4 Other Examples of Stochastic Processes


  3.5 The Karhunen-Loève Expansion
  3.6 Discussion and Bibliography
  3.7 Exercises

Chapter 1

Introduction

In this chapter we introduce some of the concepts and techniques that we will study in this book. In Section 1.1 we present a brief historical overview of the development of the theory of stochastic processes in the twentieth century. In Section 1.2 we introduce the one-dimensional random walk, and we use this example to introduce several concepts, such as the Markov property and the passage to a continuum (Brownian) limit. Some comments on the role of probabilistic modeling in the physical sciences are offered in Section 1.3. Discussion and bibliographical comments are presented in Section 1.4. Exercises are included in Section 1.5.

1.1 Historical Overview

The theory of stochastic processes, at least in terms of its application to physics, started with Einstein's work on the theory of Brownian motion: Concerning the motion, as required by the molecular-kinetic theory of heat, of particles suspended in liquids at rest (1905), and in a series of additional papers that were published in the period 1905–1906. In these fundamental works, Einstein presented an explanation of Brown's observation (1827) that small pollen grains, when suspended in water, are found to be in a very animated and irregular state of motion. In developing his theory Einstein introduced several concepts that still play a fundamental role in the study of stochastic processes and that we will study in this book. Using modern terminology, Einstein introduced a probabilistic model for the motion of the particle (the pollen grain). Furthermore, he introduced the idea that it makes more sense to talk about the probability of finding the particle at position x at time t, rather than about individual trajectories.


In his work many of the main aspects of the modern theory of stochastic processes can be found:

• The assumption of Markovianity (no memory), expressed through the Chapman–Kolmogorov equation.

• The Fokker–Planck equation (in this case, the diffusion equation).

• The derivation of the Fokker–Planck equation from the master (Chapman–Kolmogorov) equation through a Kramers–Moyal expansion.

• The calculation of a transport coefficient (the diffusion coefficient) using macroscopic (kinetic theory-based) considerations:
$$D = \frac{k_B T}{6 \pi \eta a},$$
where $k_B$ is Boltzmann's constant, $T$ is the temperature, $\eta$ is the viscosity of the fluid and $a$ is the diameter of the particle.

Einstein's theory is based on an equation for the probability distribution function, the Fokker–Planck equation. Langevin (1908) developed a theory based on a stochastic differential equation. The equation of motion for a Brownian particle is
$$m \frac{d^2 x}{dt^2} = -6 \pi \eta a \frac{dx}{dt} + \xi,$$
where $\xi$ is a random force. It can be shown that there is complete agreement between Einstein's theory and Langevin's theory. The theory of Brownian motion was developed independently by Smoluchowski, who also performed several experiments.

The approaches of Langevin and Einstein represent the two main approaches to modelling physical systems using the theory of stochastic processes and, in particular, diffusion processes: either we study individual trajectories of Brownian particles, whose evolution is governed by a stochastic differential equation
$$\frac{dX}{dt} = F(X) + \Sigma(X) \xi(t),$$
where $\xi(t)$ is a random force, or we study the probability $\rho(x,t)$ of finding a particle at position $x$ at time $t$. This probability distribution satisfies the Fokker–Planck equation
$$\frac{\partial \rho}{\partial t} = -\nabla \cdot \big( F(x) \rho \big) + \frac{1}{2} D^2 : \big( A(x) \rho \big),$$
where $A(x) = \Sigma(x)\Sigma(x)^T$. The theory of stochastic processes was developed during the 20th century by several mathematicians and physicists, including Smoluchowski, Planck, Kramers, Chandrasekhar, Wiener, Kolmogorov, Itô and Doob.

1.2 The One-Dimensional Random Walk

We let time be discrete, i.e. $t = 0, 1, \dots$. Consider the following stochastic process $S_n$: $S_0 = 0$; at each time step it moves to $\pm 1$ with equal probability $\frac{1}{2}$. In other words, at each time step we flip a fair coin. If the outcome is heads, we move one unit to the right. If the outcome is tails, we move one unit to the left. Alternatively, we can think of the random walk as a sum of independent random variables:
$$S_n = \sum_{j=1}^{n} X_j,$$
where $X_j \in \{-1, 1\}$ with $P(X_j = \pm 1) = \frac{1}{2}$. We can simulate the random walk on a computer:

• We need a (pseudo)random number generator to generate $n$ independent random variables which are uniformly distributed in the interval $[0, 1]$.

• If the value of the random variable is $> \frac{1}{2}$, then the particle moves to the left; otherwise it moves to the right.

• We then take the sum of all these random moves.

• The sequence $\{S_n\}_{n=1}^{N}$ indexed by the discrete time $T = \{1, 2, \dots, N\}$ is the path of the random walk. We use a linear interpolation (i.e. we connect the points $\{n, S_n\}$ by straight lines) to generate a continuous path.

Every path of the random walk is different: it depends on the outcome of a sequence of independent random experiments. We can compute statistics by generating a large number of paths and computing averages. For example, $E(S_n) = 0$ and $E(S_n^2) = n$. The paths of the random walk (without the linear interpolation) are not continuous: the random walk has a jump of size $1$ at each time step. This is an example of a discrete time, discrete space stochastic process. The random walk is a time-homogeneous Markov process. If we take a large number of steps, the random walk starts looking like a continuous time process with continuous paths.
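The recipe above can be turned into a short simulation. The original figures were produced with Matlab; the sketch below uses Python instead, and checks the statistics $E(S_n) = 0$ and $E(S_n^2) = n$ quoted in the text.

```python
import random

def random_walk(n_steps, seed=0):
    """One path of the random walk: S_0 = 0, steps of +-1 with probability 1/2."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n_steps):
        u = rng.random()                 # uniform on [0, 1]
        step = -1 if u > 0.5 else 1      # left if u > 1/2, otherwise right
        path.append(path[-1] + step)     # running sum of the moves
    return path

# E(S_n) = 0 and E(S_n^2) = n, estimated over many independent paths.
n, n_paths = 50, 20000
finals = [random_walk(n, seed=k)[-1] for k in range(n_paths)]
mean = sum(finals) / n_paths
second_moment = sum(s * s for s in finals) / n_paths
print(mean, second_moment)   # close to 0 and to n = 50
```

Connecting the points of one such path by straight lines gives the continuous interpolated paths shown in the figures.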

[Plot omitted: 50-step random walk]

Figure 1.1: Three paths of the random walk of length N = 50.

[Plot omitted: 1000-step random walk]

Figure 1.2: Three paths of the random walk of length N = 1000.

[Plot omitted: 1000 paths and 5 individual paths, U(t) against t]

Figure 1.3: Sample Brownian paths.

We can quantify this observation by introducing an appropriately rescaled process and by taking an appropriate limit. Consider the sequence of continuous time stochastic processes
$$Z_t^n := \frac{1}{\sqrt{n}} S_{nt}.$$
In the limit as $n \to \infty$, the sequence $\{Z_t^n\}$ converges (in some appropriate sense that will be made precise in later chapters) to a Brownian motion with diffusion coefficient $D = \frac{\Delta x^2}{2 \Delta t} = \frac{1}{2}$. Brownian motion $W(t)$ is a continuous time stochastic process with continuous paths that starts at $0$ ($W(0) = 0$) and has independent, normally distributed (Gaussian) increments. We can simulate Brownian motion on a computer using a random number generator that generates normally distributed, independent random variables. We can write an equation for the evolution of the paths of a Brownian motion $X_t$ with diffusion coefficient $D$ starting at $x$:

$$dX_t = \sqrt{2D} \, dW_t, \qquad X_0 = x.$$

This is the simplest example of a stochastic differential equation. The probability of finding $X_t$ at $y$ at time $t$, given that it was at $x$ at time $t = 0$, i.e. the transition probability density $\rho(y, t)$, satisfies the PDE

$$\frac{\partial \rho}{\partial t} = D \frac{\partial^2 \rho}{\partial y^2}, \qquad \rho(y, 0) = \delta(y - x).$$

This is the simplest example of the Fokker–Planck equation. The connection be- tween Brownian motion and the diffusion equation was made by Einstein in 1905.
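A minimal simulation sketch (again in Python rather than the Matlab used for the figures): since the increments of $X_t$ are exactly Gaussian, sampling them directly produces paths of $dX_t = \sqrt{2D}\,dW_t$, and the sample variance at time $T$ should match $2DT$, the variance of the Gaussian solution of the diffusion equation above.

```python
import math, random

def brownian_endpoint(D, T, n_steps, rng, x0=0.0):
    """Sample X_T for dX_t = sqrt(2 D) dW_t, X_0 = x0 (exact Gaussian increments)."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        x += math.sqrt(2 * D * dt) * rng.gauss(0.0, 1.0)
    return x

D, T = 0.5, 1.0
rng = random.Random(1)
samples = [brownian_endpoint(D, T, 100, rng) for _ in range(5000)]
var = sum(x * x for x in samples) / len(samples)
print(var)   # close to 2 D T = 1, the variance of the heat-kernel density
```

Here $D = \frac{1}{2}$ matches the diffusion coefficient obtained from the rescaled random walk.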

1.3 Why Randomness

Why introduce randomness in the description of physical systems?

• To describe the outcomes of a repeated set of experiments. Think of tossing a coin repeatedly or of throwing a die.

• To describe a system for which we have incomplete information: we have imprecise knowledge of initial and boundary conditions or of model parameters.

– ODEs with random initial conditions are equivalent to stochastic processes that can be described using stochastic differential equations.

• To describe systems for which we are not confident about the validity of our mathematical model.

• To describe a system exhibiting very complicated behavior (chaotic dynamical systems): determinism versus predictability.

• To describe a high dimensional deterministic system using a simpler, low dimensional stochastic system. Think of the physical model for Brownian motion (a heavy particle colliding with many small particles).

• To describe a system that is inherently random. Think of quantum mechanics.

Stochastic modeling is currently used in many different areas, ranging from climate modeling to economics.

1.4 Discussion and Bibliography

The fundamental papers of Einstein on the theory of Brownian motion have been reprinted by Dover [7]. The readers of this book are strongly encouraged to study these papers. Other fundamental papers from the early period of the development of the theory of stochastic processes include the papers by Langevin, Ornstein and Uhlenbeck [25], Doob [5], Kramers [13] and Chandrasekhar's famous review article [3]. Many of these early papers on the theory of stochastic processes have been reprinted in [6]. Many of the early papers on the theory of Brownian motion are available from http://www.physik.uni-augsburg.de/theo1/hanggi/History/BM-History.html. Very useful historical comments can be found in the books by Nelson [19] and Mazo [18].

The figures in this chapter were generated using matlab programs from http://www-math.bgsu.edu/z/sde/matlab/index.html.

1.5 Exercises

1. Read the papers by Einstein, Ornstein-Uhlenbeck, Doob etc.

2. Write a computer program for generating the random walk in one and two dimensions. Study numerically the Brownian limit and compute the statistics of the random walk.

Chapter 2

Elements of Probability Theory

In this chapter we put together some basic definitions and results from probability theory that will be used later on. In Section 2.1 we give some basic definitions from the theory of probability. In Section 2.2 we present some properties of random variables. In Section 2.3 we introduce the concept of conditional expectation and in Section 2.4 we define the characteristic function, one of the most useful tools in the study of (sums of) random variables. Some explicit calculations for the multivariate Gaussian distribution are presented in Section 2.5. Different types of convergence and the basic limit theorems of the theory of probability are discussed in Section 2.6. Discussion and bibliographical comments are presented in Section 2.7. Exercises are included in Section 2.8.

2.1 Basic Definitions from Probability Theory

In Chapter 1 we defined a stochastic process as a dynamical system whose law of evolution is probabilistic. In order to study stochastic processes we need to be able to describe the outcome of a random experiment and to calculate functions of this outcome. First we need to describe the set of all possible outcomes.

Definition 2.1. The set of all possible outcomes of an experiment is called the sample space and is denoted by Ω.

Example 2.2. • The possible outcomes of the experiment of tossing a coin are H and T. The sample space is Ω = {H, T}.

• The possible outcomes of the experiment of throwing a die are 1, 2, 3, 4, 5 and 6. The sample space is Ω = {1, 2, 3, 4, 5, 6}.

We define events to be subsets of the sample space. Of course, we would like the unions, intersections and complements of events to also be events. When the sample space Ω is uncountable, technical difficulties arise; in particular, not all subsets of the sample space need to be events. A definition of the collection of subsets of events which is appropriate for finitely additive probability is the following.

Definition 2.3. A collection $\mathcal{F}$ of subsets of Ω is called a field on Ω if

i. $\emptyset \in \mathcal{F}$;

ii. if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$;

iii. if $A, B \in \mathcal{F}$ then $A \cup B \in \mathcal{F}$.

From the definition of a field we immediately deduce that $\mathcal{F}$ is closed under finite unions and finite intersections:

$$A_1, \dots, A_n \in \mathcal{F} \;\Rightarrow\; \cup_{i=1}^{n} A_i \in \mathcal{F}, \quad \cap_{i=1}^{n} A_i \in \mathcal{F}.$$
When Ω is infinite the above definition is not appropriate, since we need to consider countable unions of events.

Definition 2.4 (σ-algebra). A collection $\mathcal{F}$ of subsets of Ω is called a σ-field or σ-algebra on Ω if

i. $\emptyset \in \mathcal{F}$;

ii. if $A \in \mathcal{F}$ then $A^c \in \mathcal{F}$;

iii. if $A_1, A_2, \dots \in \mathcal{F}$ then $\cup_{i=1}^{\infty} A_i \in \mathcal{F}$.

A σ-algebra is closed under the operation of taking countable intersections.

Example 2.5. • $\mathcal{F} = \{\emptyset, \Omega\}$.

• $\mathcal{F} = \{\emptyset, A, A^c, \Omega\}$, where $A$ is a subset of Ω.

• The power set of Ω, denoted by $\{0, 1\}^{\Omega}$, which contains all subsets of Ω.

Let $\mathcal{F}$ be a collection of subsets of Ω. It can be extended to a σ-algebra (take for example the power set of Ω). Consider all the σ-algebras that contain $\mathcal{F}$ and take their intersection, denoted by $\sigma(\mathcal{F})$; i.e. $A \subset \Omega$ belongs to $\sigma(\mathcal{F})$ if and only if it is in every σ-algebra containing $\mathcal{F}$. $\sigma(\mathcal{F})$ is a σ-algebra (see Exercise 1). It is the smallest σ-algebra containing $\mathcal{F}$ and it is called the σ-algebra generated by $\mathcal{F}$.
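On a finite sample space the generated σ-algebra can be computed literally, by closing the collection under complements and unions (on a finite Ω only finite unions are needed, so the closure below is exactly $\sigma(\mathcal{F})$). A small illustrative sketch:

```python
from itertools import combinations

def generated_sigma_algebra(omega, collection):
    """Smallest family containing `collection` that is closed under
    complements and unions; on a finite Omega this is sigma(F)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(a) for a in collection}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for a in current:
            if omega - a not in sigma:          # closure under complements
                sigma.add(omega - a)
                changed = True
        for a, b in combinations(current, 2):   # closure under unions
            if a | b not in sigma:
                sigma.add(a | b)
                changed = True
    return sigma

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}])
print(len(sigma))   # 4: the sets are {}, {1}, {2,3,4} and Omega
```

This reproduces the second item of Example 2.5: the σ-algebra generated by a single set $A$ is $\{\emptyset, A, A^c, \Omega\}$.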

Example 2.6. Let $\Omega = \mathbb{R}^n$. The σ-algebra generated by the open subsets of $\mathbb{R}^n$ (or, equivalently, by the open balls of $\mathbb{R}^n$) is called the Borel σ-algebra of $\mathbb{R}^n$ and is denoted by $\mathcal{B}(\mathbb{R}^n)$.

Let $X$ be a closed subset of $\mathbb{R}^n$. Similarly, we can define the Borel σ-algebra of $X$, denoted by $\mathcal{B}(X)$.

A sub-σ-algebra is a collection of subsets of a σ-algebra which satisfies the axioms of a σ-algebra.

The σ-field $\mathcal{F}$ of a sample space Ω contains all possible outcomes of the experiment that we want to study. Intuitively, the σ-field contains all the information about the random experiment that is available to us.

Now we want to assign probabilities to the possible outcomes of an experiment.

Definition 2.7 (Probability measure). A probability measure $P$ on the measurable space $(\Omega, \mathcal{F})$ is a function $P : \mathcal{F} \to [0, 1]$ satisfying

i. $P(\emptyset) = 0$, $P(\Omega) = 1$;

ii. for $A_1, A_2, \dots$ with $A_i \cap A_j = \emptyset$, $i \neq j$,
$$P\left( \cup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} P(A_i).$$

Definition 2.8. The triple $(\Omega, \mathcal{F}, P)$, comprising a set Ω, a σ-algebra $\mathcal{F}$ of subsets of Ω and a probability measure $P$ on $(\Omega, \mathcal{F})$, is called a probability space.

Example 2.9. A biased coin is tossed once: $\Omega = \{H, T\}$, $\mathcal{F} = \{\emptyset, H, T, \Omega\} = \{0, 1\}^{\Omega}$, $P : \mathcal{F} \to [0, 1]$ such that $P(\emptyset) = 0$, $P(H) = p \in [0, 1]$, $P(T) = 1 - p$, $P(\Omega) = 1$.

Example 2.10. Take $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}([0, 1])$, $P = \mathrm{Leb}([0, 1])$. Then $(\Omega, \mathcal{F}, P)$ is a probability space.

2.1.1 Conditional Probability

One of the most important concepts in probability is that of the dependence between events.

Definition 2.11. A family $\{A_i : i \in I\}$ of events is called independent if
$$P\left( \cap_{j \in J} A_j \right) = \prod_{j \in J} P(A_j)$$
for all finite subsets $J$ of $I$.

When two events $A, B$ are dependent it is important to know the probability that the event $A$ will occur, given that $B$ has already happened. We define this to be the conditional probability, denoted by $P(A|B)$. We know from elementary probability that
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
A very useful result is the law of total probability.
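The defining relation for $P(A|B)$ can be checked exactly on a finite sample space. A small sketch with one roll of a fair die (the events $A$ and $B$ below are arbitrary illustrative choices; exact arithmetic via `fractions`):

```python
from fractions import Fraction

# One roll of a fair die: uniform measure on a finite sample space.
omega = {1, 2, 3, 4, 5, 6}
def P(event):
    return Fraction(len(event & omega), len(omega))

A = {4, 5, 6}    # "the outcome is at least 4"
B = {2, 4, 6}    # "the outcome is even"

cond = P(A & B) / P(B)     # P(A | B) = P(A n B) / P(B)
print(cond, P(A))          # 2/3 versus P(A) = 1/2: the events are dependent
```

Here knowing that $B$ occurred raises the probability of $A$ from $\frac{1}{2}$ to $\frac{2}{3}$.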

Definition 2.12. A family of events B : i I is called a partition of Ω if { i ∈ }

$$B_i \cap B_j = \emptyset, \quad i \neq j, \qquad \text{and} \qquad \cup_{i \in I} B_i = \Omega.$$

Proposition 2.13 (Law of total probability). For any event $A$ and any partition $\{B_i : i \in I\}$ we have
$$P(A) = \sum_{i \in I} P(A|B_i) P(B_i).$$
The proof of this result is left as an exercise. In many cases the calculation of the probability of an event is simplified by choosing an appropriate partition of Ω and using the law of total probability.

Let $(\Omega, \mathcal{F}, P)$ be a probability space and fix $B \in \mathcal{F}$. Then $P(\cdot|B)$ defines a probability measure on $\mathcal{F}$. Indeed, we have that
$$P(\emptyset|B) = 0, \qquad P(\Omega|B) = 1$$
and (since $A_i \cap A_j = \emptyset$ implies that $(A_i \cap B) \cap (A_j \cap B) = \emptyset$)

$$P\left( \cup_{j=1}^{\infty} A_j \,\big|\, B \right) = \sum_{j=1}^{\infty} P(A_j|B)$$
for a countable family of pairwise disjoint sets $\{A_j\}_{j=1}^{+\infty}$. Consequently, $(\Omega, \mathcal{F}, P(\cdot|B))$ is a probability space for every $B \in \mathcal{F}$.

2.2 Random Variables

We are usually interested in the consequences of the outcome of an experiment, rather than the experiment itself. A function of the outcome of an experiment is a random variable, that is, a measurable function from Ω to $\mathbb{R}$.

Definition 2.14. A sample space Ω equipped with a σ-field $\mathcal{F}$ of subsets is called a measurable space.

Definition 2.15. Let $(\Omega, \mathcal{F})$ and $(E, \mathcal{G})$ be two measurable spaces. A function $X : \Omega \to E$ such that the event
$$\{\omega \in \Omega : X(\omega) \in A\} =: \{X \in A\} \qquad (2.1)$$
belongs to $\mathcal{F}$ for arbitrary $A \in \mathcal{G}$ is called a measurable function or random variable.

When $E$ is $\mathbb{R}$ equipped with its Borel σ-algebra, (2.1) can be replaced with
$$\{X \leqslant x\} \in \mathcal{F} \quad \forall x \in \mathbb{R}.$$
Let $X$ be a random variable (measurable function) from $(\Omega, \mathcal{F}, \mu)$ to $(E, \mathcal{G})$. If $E$ is a Banach space, then we may define the expectation with respect to the measure $\mu$ by

$$E[X] = \int_{\Omega} X(\omega) \, d\mu(\omega).$$
More generally, let $f : E \to \mathbb{R}$ be $\mathcal{G}$-measurable. Then
$$E[f(X)] = \int_{\Omega} f(X(\omega)) \, d\mu(\omega).$$
Let $U$ be a topological space. We will use the notation $\mathcal{B}(U)$ to denote the Borel σ-algebra of $U$: the smallest σ-algebra containing all open sets of $U$. Every random variable from a probability space $(\Omega, \mathcal{F}, \mu)$ to a measurable space $(E, \mathcal{B}(E))$ induces a probability measure on $E$:

$$\mu_X(B) = P X^{-1}(B) = \mu(\omega \in \Omega ; X(\omega) \in B), \qquad B \in \mathcal{B}(E). \qquad (2.2)$$

The measure µX is called the distribution (or sometimes the law) of X.

Example 2.16. Let $\mathcal{I}$ denote a subset of the positive integers. A vector $\rho_0 = \{\rho_{0,i}, i \in \mathcal{I}\}$ is a distribution on $\mathcal{I}$ if it has nonnegative entries and its total mass equals 1: $\sum_{i \in \mathcal{I}} \rho_{0,i} = 1$.

Consider the case where $E = \mathbb{R}$ equipped with the Borel σ-algebra. In this case a random variable is defined to be a function $X : \Omega \to \mathbb{R}$ such that
$$\{\omega \in \Omega : X(\omega) \leqslant x\} \subset \mathcal{F} \quad \forall x \in \mathbb{R}.$$

We can now define the probability distribution function of $X$, $F_X : \mathbb{R} \to [0, 1]$, as
$$F_X(x) = P\big( \{\omega \in \Omega \,|\, X(\omega) \leqslant x\} \big) =: P(X \leqslant x). \qquad (2.3)$$
In this case, $(\mathbb{R}, \mathcal{B}(\mathbb{R}), F_X)$ becomes a probability space.

The distribution function $F_X(x)$ of a random variable has the properties that $\lim_{x \to -\infty} F_X(x) = 0$, $\lim_{x \to +\infty} F_X(x) = 1$, and it is right continuous.

Definition 2.17. A random variable $X$ with values on $\mathbb{R}$ is called discrete if it takes values in some countable subset $\{x_0, x_1, x_2, \dots\}$ of $\mathbb{R}$, i.e. $P(X = x) \neq 0$ only for $x = x_0, x_1, \dots$.

With a discrete random variable we can associate the probability mass function $p_k = P(X = x_k)$. We will consider nonnegative integer valued discrete random variables. In this case $p_k = P(X = k)$, $k = 0, 1, 2, \dots$.

Example 2.18. The Poisson random variable is the nonnegative integer valued random variable with probability mass function

$$p_k = P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \dots,$$
where $\lambda > 0$.
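A quick numerical check of this probability mass function: it sums to 1, and (a standard fact, not proved here) the Poisson distribution has mean and variance both equal to $\lambda$.

```python
import math

def poisson_pmf(k, lam):
    """p_k = e^{-lam} lam^k / k!"""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0
ks = range(60)                    # the truncated tail is negligible for lam = 3
total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(total, mean, var)           # close to 1, lam and lam respectively
```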

Example 2.19. The binomial random variable is the nonnegative integer valued random variable with probability mass function

$$p_k = P(X = k) = \frac{N!}{k! (N-k)!} p^k q^{N-k}, \qquad k = 0, 1, 2, \dots, N,$$
where $p \in (0, 1)$, $q = 1 - p$.

Definition 2.20. A random variable $X$ with values on $\mathbb{R}$ is called continuous if $P(X = x) = 0$ for all $x \in \mathbb{R}$.

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X : \Omega \to \mathbb{R}$ be a random variable with distribution $F_X$. This is a probability measure on $\mathcal{B}(\mathbb{R})$. We will assume that it is absolutely continuous with respect to the Lebesgue measure with density $\rho_X$: $F_X(dx) = \rho(x) \, dx$. We will call the density $\rho(x)$ the probability density function (PDF) of the random variable $X$.

Example 2.21. i. The exponential random variable has PDF
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0, \\ 0, & x < 0, \end{cases}$$
with $\lambda > 0$.

ii. The uniform random variable has PDF
$$f(x) = \begin{cases} \frac{1}{b-a}, & a < x < b, \\ 0, & x \notin (a, b), \end{cases}$$
with $a < b$.

Definition 2.22. Two random variables $X$ and $Y$ are independent if the events $\{\omega \in \Omega \,|\, X(\omega) \leqslant x\}$ and $\{\omega \in \Omega \,|\, Y(\omega) \leqslant y\}$ are independent for all $x, y \in \mathbb{R}$.

Let $X, Y$ be two continuous random variables. We can view them as a random vector, i.e. a random variable from Ω to $\mathbb{R}^2$. We can then define the joint distribution function
$$F(x, y) = P(X \leqslant x, Y \leqslant y).$$
The mixed derivative of the distribution function, $f_{X,Y}(x,y) := \frac{\partial^2 F}{\partial x \partial y}(x,y)$, if it exists, is called the joint PDF of the random vector $\{X, Y\}$:
$$F_{X,Y}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(x, y) \, dx dy.$$
If the random variables $X$ and $Y$ are independent, then

$$F_{X,Y}(x, y) = F_X(x) F_Y(y)$$
and

$$f_{X,Y}(x, y) = f_X(x) f_Y(y).$$
The joint distribution function has the properties

$$F_{X,Y}(x, y) = F_{Y,X}(y, x), \qquad F_{X,Y}(+\infty, y) = F_Y(y), \qquad f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y) \, dx.$$
We can extend the above definitions to random vectors of arbitrary finite dimension. Let $X$ be a random variable from $(\Omega, \mathcal{F}, \mu)$ to $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. The (joint) distribution function $F_X : \mathbb{R}^d \to [0, 1]$ is defined as
$$F_X(x) = P(X \leqslant x).$$
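The last property, $f_Y(y) = \int f_{X,Y}(x,y)\,dx$, can be checked numerically. A sketch for the joint density of two independent exponentials (the rates $a$ and $b$ below are arbitrary choices), where by independence $f_{X,Y} = f_X f_Y$:

```python
import math

# Joint density of two independent exponentials: f_{X,Y}(x, y) = f_X(x) f_Y(y).
a, b = 1.0, 2.0
f_X = lambda x: a * math.exp(-a * x)
f_Y = lambda y: b * math.exp(-b * y)
f_XY = lambda x, y: f_X(x) * f_Y(y)

def marginal_Y(y, upper=50.0, n=20000):
    """Midpoint-rule approximation of f_Y(y) = int f_{X,Y}(x, y) dx."""
    h = upper / n
    return sum(f_XY((i + 0.5) * h, y) for i in range(n)) * h

y = 0.7
print(marginal_Y(y), f_Y(y))   # integrating out x recovers the marginal of Y
```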

Let $X$ be a random variable in $\mathbb{R}^d$ with distribution function $f(x_N)$, where $x_N = \{x_1, \dots, x_N\}$. We define the marginal or reduced distribution function $f^{N-1}(x_{N-1})$ by
$$f^{N-1}(x_{N-1}) = \int_{\mathbb{R}} f^N(x_N) \, dx_N.$$
We can define other reduced distribution functions:

$$f^{N-2}(x_{N-2}) = \int_{\mathbb{R}} f^{N-1}(x_{N-1}) \, dx_{N-1} = \int_{\mathbb{R}} \int_{\mathbb{R}} f(x_N) \, dx_{N-1} dx_N.$$

2.2.1 Expectation of Random Variables

We can use the distribution of a random variable to compute expectations and probabilities:

$$E[f(X)] = \int_{\mathbb{R}} f(x) \, dF_X(x) \qquad (2.4)$$
and
$$P[X \in G] = \int_{G} dF_X(x), \qquad G \in \mathcal{B}(E). \qquad (2.5)$$
The above formulas apply to both discrete and continuous random variables, provided that we define the integrals in (2.4) and (2.5) appropriately. When $E = \mathbb{R}^d$ and a PDF exists, $dF_X(x) = f_X(x) \, dx$, we have

$$F_X(x) := P(X \leqslant x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_d} f_X(x) \, dx.$$
When $E = \mathbb{R}^d$, then by $L^p(\Omega; \mathbb{R}^d)$, or sometimes $L^p(\Omega; \mu)$ or even simply $L^p(\mu)$, we mean the Banach space of measurable functions on Ω with norm

$$\|X\|_{L^p} = \left( E|X|^p \right)^{1/p}.$$
Let $X$ be a nonnegative integer valued random variable with probability mass function $p_k$. We can compute the expectation of an arbitrary function of $X$ using the formula
$$E(f(X)) = \sum_{k=0}^{\infty} f(k) p_k.$$
Let $X, Y$ be random variables. We want to know whether they are correlated and, if they are, to calculate how correlated they are. We define the covariance of the two random variables as

$$\mathrm{cov}(X, Y) = E\big[ (X - EX)(Y - EY) \big] = E(XY) - EX \, EY.$$

The correlation coefficient is
$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)} \sqrt{\mathrm{var}(Y)}}. \qquad (2.6)$$
The Cauchy–Schwarz inequality yields that $\rho(X, Y) \in [-1, 1]$. We will say that two random variables $X$ and $Y$ are uncorrelated provided that $\rho(X, Y) = 0$. It is not true in general that two uncorrelated random variables are independent. This is true, however, for Gaussian random variables (see Exercise 5).

Example 2.23. • Consider the random variable $X : \Omega \to \mathbb{R}$ with pdf
$$\gamma_{\sigma,b}(x) := (2\pi\sigma)^{-\frac{1}{2}} \exp\left( -\frac{(x-b)^2}{2\sigma} \right).$$
Such an $X$ is termed a Gaussian or normal random variable. The mean is

$$EX = \int_{\mathbb{R}} x \gamma_{\sigma,b}(x) \, dx = b$$
and the variance is

$$E(X - b)^2 = \int_{\mathbb{R}} (x - b)^2 \gamma_{\sigma,b}(x) \, dx = \sigma.$$

• Let $b \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ be symmetric and positive definite. The random variable $X : \Omega \to \mathbb{R}^d$ with pdf
$$\gamma_{\Sigma,b}(x) := \left( (2\pi)^d \det \Sigma \right)^{-\frac{1}{2}} \exp\left( -\frac{1}{2} \langle \Sigma^{-1}(x - b), (x - b) \rangle \right)$$
is termed a multivariate Gaussian or normal random variable. The mean is

E(X)= b (2.7)

and the covariance is

$$E\big( (X - b) \otimes (X - b) \big) = \Sigma. \qquad (2.8)$$
Since the mean and variance specify completely a Gaussian random variable on $\mathbb{R}$, the Gaussian is commonly denoted by $\mathcal{N}(m, \sigma)$. The standard normal random variable is $\mathcal{N}(0, 1)$. Similarly, since the mean and covariance matrix completely specify a Gaussian random variable on $\mathbb{R}^d$, the Gaussian is commonly denoted by $\mathcal{N}(m, \Sigma)$.

Some analytical calculations for Gaussian random variables will be presented in Section 2.5.
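A multivariate Gaussian can be sampled by transforming standard normals: $X = b + CY$ with $CC^T = \Sigma$ (this factorisation is justified in Section 2.5). A hand-rolled sketch for $d = 2$ — in practice one would use a library such as numpy; the particular $b$ and $C$ are arbitrary choices:

```python
import math, random

# X = b + C Y with Y a pair of iid standard normals, so E X = b, Cov X = C C^T.
b = (1.0, -2.0)
C = ((1.0, 0.0),
     (0.5, math.sqrt(0.75)))     # C C^T = [[1, 0.5], [0.5, 1]]
rng = random.Random(0)

def sample():
    y = (rng.gauss(0, 1), rng.gauss(0, 1))
    return (b[0] + C[0][0] * y[0] + C[0][1] * y[1],
            b[1] + C[1][0] * y[0] + C[1][1] * y[1])

n = 50000
xs = [sample() for _ in range(n)]
m0 = sum(x[0] for x in xs) / n
m1 = sum(x[1] for x in xs) / n
cov01 = sum((x[0] - m0) * (x[1] - m1) for x in xs) / n
print(m0, m1, cov01)   # close to 1, -2 and 0.5, i.e. to b and Sigma_{12}
```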

2.3 Conditional Expectation

Assume that $X \in L^1(\Omega, \mathcal{F}, \mu)$ and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$. The conditional expectation of $X$ with respect to $\mathcal{G}$ is defined to be the function (random variable) $E[X|\mathcal{G}] : \Omega \to E$ which is $\mathcal{G}$-measurable and satisfies
$$\int_{G} E[X|\mathcal{G}] \, d\mu = \int_{G} X \, d\mu \quad \forall G \in \mathcal{G}.$$
We can define $E[f(X)|\mathcal{G}]$ and the conditional probability $P[X \in F | \mathcal{G}] = E[I_F(X)|\mathcal{G}]$, where $I_F$ is the indicator function of $F$, in a similar manner. We list some of the most important properties of conditional expectation.

Theorem 2.24 (Properties of conditional expectation). Let $(\Omega, \mathcal{F}, \mu)$ be a probability space and let $\mathcal{G}$ be a sub-σ-algebra of $\mathcal{F}$.

(a) If $X$ is $\mathcal{G}$-measurable and integrable then $E(X|\mathcal{G}) = X$.

(b) (Linearity) If X1, X2 are integrable and c1, c2 constants, then

$$E(c_1 X_1 + c_2 X_2 | \mathcal{G}) = c_1 E(X_1|\mathcal{G}) + c_2 E(X_2|\mathcal{G}).$$

(c) (Order) If $X_1, X_2$ are integrable and $X_1 \leqslant X_2$ a.s., then $E(X_1|\mathcal{G}) \leqslant E(X_2|\mathcal{G})$ a.s.

(d) If $Y$ and $XY$ are integrable, and $X$ is $\mathcal{G}$-measurable, then $E(XY|\mathcal{G}) = X E(Y|\mathcal{G})$.

(e) (Successive smoothing) If $\mathcal{D}$ is a sub-σ-algebra of $\mathcal{F}$, $\mathcal{D} \subset \mathcal{G}$, and $X$ is integrable, then $E(X|\mathcal{D}) = E[E(X|\mathcal{G})|\mathcal{D}] = E[E(X|\mathcal{D})|\mathcal{G}]$.

(f) (Convergence) Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables such that, for all $n$, $|X_n| \leqslant Z$ where $Z$ is integrable. If $X_n \to X$ a.s., then $E(X_n|\mathcal{G}) \to E(X|\mathcal{G})$ a.s. and in $L^1$.

Proof. See Exercise 10.
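On a finite probability space the conditional expectation with respect to the σ-algebra generated by a partition is simply the cell average, and the smoothing property (e) (with $\mathcal{D} = \{\emptyset, \Omega\}$, so that $E[\,\cdot\,|\mathcal{D}] = E[\,\cdot\,]$) can be verified directly. A small sketch with one die roll:

```python
from fractions import Fraction

# Uniform measure on Omega = {1,...,6}; X(w) = w; G generated by {odd, even}.
omega = [1, 2, 3, 4, 5, 6]
partition = [{1, 3, 5}, {2, 4, 6}]
X = lambda w: w

def cell_average(x, cell):
    return Fraction(sum(x(w) for w in cell), len(cell))

# E[X|G] is constant on each cell of the partition, equal to the cell average.
EX_given_G = {w: cell_average(X, cell) for cell in partition for w in cell}

lhs = sum(EX_given_G[w] for w in omega) / 6   # E[ E[X|G] ]
rhs = Fraction(sum(omega), 6)                 # E[X]
print(lhs, rhs)   # 7/2 7/2 -- the smoothing property E[E[X|G]] = E[X]
```

Note that $E[X|\mathcal{G}]$ here is a genuine random variable: it takes the value $3$ on the odd outcomes and $4$ on the even ones.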

2.4 The Characteristic Function

Many of the properties of (sums of) random variables can be studied using the Fourier transform of the distribution function. Let $F(\lambda)$ be the distribution function of a (discrete or continuous) random variable $X$. The characteristic function of $X$ is defined to be the Fourier transform of the distribution function:

$$\phi(t) = \int_{\mathbb{R}} e^{it\lambda} \, dF(\lambda) = E(e^{itX}). \qquad (2.9)$$
For a continuous random variable for which the distribution function $F$ has a density, $dF(\lambda) = p(\lambda) \, d\lambda$, (2.9) gives

$$\phi(t) = \int_{\mathbb{R}} e^{it\lambda} p(\lambda) \, d\lambda.$$

For a discrete random variable for which P(X = λk)= αk, (2.9) gives

$$\phi(t) = \sum_{k=0}^{\infty} e^{it\lambda_k} \alpha_k.$$
From the properties of the Fourier transform we conclude that the characteristic function determines uniquely the distribution function of the random variable, in the sense that there is a one-to-one correspondence between $F(\lambda)$ and $\phi(t)$. Furthermore, in the exercises at the end of the chapter the reader is asked to prove the following two results.

Lemma 2.25. Let $\{X_1, X_2, \dots, X_n\}$ be independent random variables with characteristic functions $\phi_j(t)$, $j = 1, \dots, n$, and let $Y = \sum_{j=1}^{n} X_j$ with characteristic function $\phi_Y(t)$. Then
$$\phi_Y(t) = \prod_{j=1}^{n} \phi_j(t).$$

Lemma 2.26. Let X be a random variable with characteristic function φ(t) and assume that it has finite moments. Then

$$E(X^k) = \frac{1}{i^k} \phi^{(k)}(0).$$
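Lemma 2.26 can be checked numerically for a concrete distribution. For the exponential random variable with rate $\lambda$, the characteristic function is $\phi(t) = \lambda/(\lambda - it)$ (a standard computation, not derived in this chapter), and finite differences at $t = 0$ recover the first two moments $E X = 1/\lambda$ and $E X^2 = 2/\lambda^2$:

```python
# Exponential(lam) has characteristic function phi(t) = lam / (lam - i t);
# we recover E X and E X^2 from numerical derivatives of phi at t = 0.
lam = 2.0
phi = lambda t: lam / (lam - 1j * t)

h = 1e-4
d1 = (phi(h) - phi(-h)) / (2 * h)               # central difference for phi'(0)
d2 = (phi(h) - 2 * phi(0) + phi(-h)) / h ** 2   # central difference for phi''(0)

EX = (d1 / 1j).real         # E X   = phi'(0)  / i   = 1/lam
EX2 = (d2 / 1j ** 2).real   # E X^2 = phi''(0) / i^2 = 2/lam^2
print(EX, EX2)              # close to 0.5 and 0.5
```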

2.5 Gaussian Random Variables

In this section we present some useful calculations for Gaussian random variables. In particular, we calculate the normalization constant, the mean and variance and the characteristic function of multidimensional Gaussian random variables.

Theorem 2.27. Let $b \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$ a symmetric and positive definite matrix. Let $X$ be the multivariate Gaussian random variable with probability density function
$$\gamma_{\Sigma,b}(x) = \frac{1}{Z} \exp\left( -\frac{1}{2} \langle \Sigma^{-1}(x - b), x - b \rangle \right).$$
Then

i. The normalization constant is

$$Z = (2\pi)^{d/2} \sqrt{\det(\Sigma)}.$$

ii. The mean vector and covariance matrix of $X$ are given by

EX = b

and
$$E\big( (X - EX) \otimes (X - EX) \big) = \Sigma.$$

iii. The characteristic function of $X$ is

$$\phi(t) = e^{i \langle b, t \rangle - \frac{1}{2} \langle t, \Sigma t \rangle}.$$

Proof. i. From the spectral theorem for symmetric positive definite matrices we have that there exists a diagonal matrix Λ with positive entries and an orthogonal matrix B such that

$$\Sigma^{-1} = B^T \Lambda^{-1} B.$$

Let $z = x - b$ and $y = Bz$. We have
$$\langle \Sigma^{-1} z, z \rangle = \langle B^T \Lambda^{-1} B z, z \rangle = \langle \Lambda^{-1} B z, B z \rangle = \langle \Lambda^{-1} y, y \rangle = \sum_{i=1}^{d} \lambda_i^{-1} y_i^2.$$
Furthermore, we have that $\det(\Sigma^{-1}) = \prod_{i=1}^{d} \lambda_i^{-1}$, that $\det(\Sigma) = \prod_{i=1}^{d} \lambda_i$ and that the Jacobian of an orthogonal transformation is $J = \det(B) = 1$.

Hence,

$$\int_{\mathbb{R}^d} \exp\left( -\frac{1}{2} \langle \Sigma^{-1}(x - b), x - b \rangle \right) dx = \int_{\mathbb{R}^d} \exp\left( -\frac{1}{2} \langle \Sigma^{-1} z, z \rangle \right) dz$$
$$= \int_{\mathbb{R}^d} \exp\left( -\frac{1}{2} \sum_{i=1}^{d} \lambda_i^{-1} y_i^2 \right) |J| \, dy = \prod_{i=1}^{d} \int_{\mathbb{R}} \exp\left( -\frac{1}{2} \lambda_i^{-1} y_i^2 \right) dy_i$$
$$= (2\pi)^{d/2} \prod_{i=1}^{d} \lambda_i^{1/2} = (2\pi)^{d/2} \sqrt{\det(\Sigma)},$$
from which we get that

$$Z = (2\pi)^{d/2} \sqrt{\det(\Sigma)}.$$
In the above calculation we have used the elementary identity

$$\int_{\mathbb{R}} e^{-\frac{\alpha x^2}{2}} \, dx = \sqrt{\frac{2\pi}{\alpha}}.$$

ii. From the above calculation we have that

$$\gamma_{\Sigma,b}(x) \, dx = \gamma_{\Sigma,b}(B^T y + b) \, dy = \frac{1}{(2\pi)^{d/2} \sqrt{\det(\Sigma)}} \prod_{i=1}^{d} \exp\left( -\frac{1}{2} \lambda_i^{-1} y_i^2 \right) dy_i.$$
Consequently

$$EX = \int_{\mathbb{R}^d} x \gamma_{\Sigma,b}(x) \, dx = \int_{\mathbb{R}^d} (B^T y + b) \gamma_{\Sigma,b}(B^T y + b) \, dy = b \int_{\mathbb{R}^d} \gamma_{\Sigma,b}(B^T y + b) \, dy = b.$$

We note that, since $\Sigma^{-1} = B^T \Lambda^{-1} B$, we have that $\Sigma = B^T \Lambda B$. Furthermore, $z = B^T y$. We calculate
$$E\big( (X_i - b_i)(X_j - b_j) \big) = \int_{\mathbb{R}^d} z_i z_j \gamma_{\Sigma,b}(z + b) \, dz$$
$$= \frac{1}{(2\pi)^{d/2} \sqrt{\det(\Sigma)}} \int_{\mathbb{R}^d} \sum_{k} B_{ki} y_k \sum_{m} B_{mj} y_m \exp\left( -\frac{1}{2} \sum_{\ell} \lambda_{\ell}^{-1} y_{\ell}^2 \right) dy$$
$$= \sum_{k,m} B_{ki} B_{mj} \lambda_k \delta_{km} = \Sigma_{ij}.$$

iii. Let $Y$ be a multivariate Gaussian random variable with mean $0$ and covariance $I$, and set $C = B^T \sqrt{\Lambda}$, so that $\Sigma = C C^T$. We claim that $X = CY + b$. To see this, we first note that $X$ is Gaussian, since it is given through a linear transformation of a Gaussian random variable. Furthermore,

$$EX = b \qquad \text{and} \qquad E\big( (X_i - b_i)(X_j - b_j) \big) = \Sigma_{ij}.$$
Now we have:

$$\phi(t) = E e^{i \langle X, t \rangle} = e^{i \langle b, t \rangle} E e^{i \langle CY, t \rangle} = e^{i \langle b, t \rangle} E e^{i \langle Y, C^T t \rangle}$$
$$= e^{i \langle b, t \rangle} E e^{i \sum_j \left( \sum_k C_{kj} t_k \right) y_j} = e^{i \langle b, t \rangle} e^{-\frac{1}{2} \sum_j \left| \sum_k C_{kj} t_k \right|^2}$$
$$= e^{i \langle b, t \rangle} e^{-\frac{1}{2} \langle C^T t, C^T t \rangle} = e^{i \langle b, t \rangle} e^{-\frac{1}{2} \langle t, C C^T t \rangle} = e^{i \langle b, t \rangle} e^{-\frac{1}{2} \langle t, \Sigma t \rangle}.$$

Consequently,

$$\phi(t) = e^{i \langle b, t \rangle - \frac{1}{2} \langle t, \Sigma t \rangle}.$$
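Part iii can be verified by Monte Carlo for a concrete two-dimensional example: sample $X = b + CY$ with $\Sigma = CC^T$ and compare the empirical value of $E e^{i\langle X, t\rangle}$ with the closed form just derived (the particular $b$, $C$ and $t$ below are arbitrary choices):

```python
import cmath, random

# Monte Carlo check of phi(t) = exp(i<b,t> - <t, Sigma t>/2) for X = b + C Y.
rng = random.Random(0)
b = (0.5, -0.5)
C = ((1.0, 0.0), (0.3, 1.0))
Sigma = ((1.0, 0.3), (0.3, 1.09))      # Sigma = C C^T
t = (0.7, -0.2)

n = 50000
acc = 0j
for _ in range(n):
    y = (rng.gauss(0, 1), rng.gauss(0, 1))
    x = (b[0] + C[0][0] * y[0] + C[0][1] * y[1],
         b[1] + C[1][0] * y[0] + C[1][1] * y[1])
    acc += cmath.exp(1j * (t[0] * x[0] + t[1] * x[1]))
empirical = acc / n

bt = b[0] * t[0] + b[1] * t[1]
tSt = sum(t[i] * Sigma[i][j] * t[j] for i in range(2) for j in range(2))
exact = cmath.exp(1j * bt - tSt / 2)
print(abs(empirical - exact))   # small: the Monte Carlo error is O(n^{-1/2})
```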

2.6 Types of Convergence and Limit Theorems

One of the most important aspects of the theory of random variables is the study of limit theorems for sums of random variables. The most well known limit theorems in probability theory are the law of large numbers and the central limit theorem. There are various different types of convergence for sequences of random variables. We list the most important types of convergence below.

Definition 2.28. Let $\{Z_n\}_{n=1}^{\infty}$ be a sequence of random variables. We will say that

(a) $Z_n$ converges to $Z$ with probability one if

$$P\left( \lim_{n \to +\infty} Z_n = Z \right) = 1.$$

(b) $Z_n$ converges to $Z$ in probability if for every $\varepsilon > 0$

$$\lim_{n \to +\infty} P\big( |Z_n - Z| > \varepsilon \big) = 0.$$

(c) $Z_n$ converges to $Z$ in $L^p$ if
$$\lim_{n \to +\infty} E\big[ |Z_n - Z|^p \big] = 0.$$

(d) Let $F_n(\lambda)$, $n = 1, \dots, +\infty$, and $F(\lambda)$ be the distribution functions of $Z_n$, $n = 1, \dots, +\infty$, and $Z$, respectively. Then $Z_n$ converges to $Z$ in distribution if
$$\lim_{n \to +\infty} F_n(\lambda) = F(\lambda)$$
for all $\lambda \in \mathbb{R}$ at which $F$ is continuous.

Recall that the distribution function $F_X$ of a random variable from a probability space $(\Omega, \mathcal{F}, P)$ to $\mathbb{R}$ induces a probability measure on $\mathbb{R}$, and that $(\mathbb{R}, \mathcal{B}(\mathbb{R}), F_X)$ is a probability space. We can show that convergence in distribution is equivalent to weak convergence of the probability measures induced by the distribution functions.

Definition 2.29. Let $(E, d)$ be a metric space, $\mathcal{B}(E)$ the σ-algebra of its Borel sets, $P_n$ a sequence of probability measures on $(E, \mathcal{B}(E))$ and let $C_b(E)$ denote the space of bounded continuous functions on $E$. We will say that the sequence $P_n$ converges weakly to the probability measure $P$ if, for each $f \in C_b(E)$,

$$\lim_{n \to +\infty} \int_{E} f(x) \, dP_n(x) = \int_{E} f(x) \, dP(x).$$

Theorem 2.30. Let $F_n(\lambda)$, $n = 1, \dots, +\infty$, and $F(\lambda)$ be the distribution functions of $Z_n$, $n = 1, \dots, +\infty$, and $Z$, respectively. Then $Z_n$ converges to $Z$ in distribution if and only if, for all $g \in C_b(\mathbb{R})$,

$$\lim_{n \to +\infty} \int_{\mathbb{R}} g(x) \, dF_n(x) = \int_{\mathbb{R}} g(x) \, dF(x). \qquad (2.10)$$
Notice that (2.10) is equivalent to

$$\lim_{n \to +\infty} E_n g(X_n) = E g(X),$$
where $E_n$ and $E$ denote the expectations with respect to $F_n$ and $F$, respectively.

When the sequence of random variables whose convergence we are interested in takes values in $\mathbb{R}^d$ or, more generally, a metric space $(E, d)$, we can use weak convergence of the sequence of probability measures induced by the sequence of random variables to define convergence in distribution.

Definition 2.31. A sequence of random variables $X_n$, defined on probability spaces $(\Omega_n, \mathcal{F}_n, P_n)$ and taking values on a metric space $(E, d)$, is said to converge in distribution if the induced measures $F_n(B) = P_n(X_n \in B)$, for $B \in \mathcal{B}(E)$, converge weakly to a probability measure $P$.

Let $\{X_n\}_{n=1}^{\infty}$ be iid random variables with $E X_n = V$. Then the strong law of large numbers states that the average of the sum of the iid random variables converges to $V$ with probability one:
$$P\left( \lim_{N \to +\infty} \frac{1}{N} \sum_{n=1}^{N} X_n = V \right) = 1.$$
The strong law of large numbers provides us with information about the behavior of a sum of random variables (or, a large number of repetitions of the same experiment) on average. We can also study fluctuations around the average behavior. Indeed, let $E(X_n - V)^2 = \sigma^2$. Define the centered iid random variables $Y_n = X_n - V$. Then the sequence of random variables $\frac{1}{\sigma \sqrt{N}} \sum_{n=1}^{N} Y_n$ converges in distribution to a $\mathcal{N}(0, 1)$ random variable:
$$\lim_{N \to +\infty} P\left( \frac{1}{\sigma \sqrt{N}} \sum_{n=1}^{N} Y_n \leqslant a \right) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx.$$
This is the central limit theorem.
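The central limit theorem can be illustrated numerically: the empirical distribution function of the normalised sums should approach the standard normal CDF. A sketch with uniform summands on $[-1, 1]$ (so $V = 0$ and $\sigma^2 = \frac{1}{3}$, a standard fact about the uniform distribution):

```python
import math, random

# CLT: for centered iid Y_n with variance sigma^2, (1/(sigma sqrt(N))) sum Y_n
# is approximately N(0, 1); here Y_n is uniform on [-1, 1], sigma^2 = 1/3.
rng = random.Random(0)
N, n_paths = 50, 20000
sigma = math.sqrt(1.0 / 3.0)

def normalised_sum():
    s = sum(rng.uniform(-1, 1) for _ in range(N))
    return s / (sigma * math.sqrt(N))

samples = [normalised_sum() for _ in range(n_paths)]
a = 1.0
empirical = sum(1 for s in samples if s <= a) / n_paths
Phi = 0.5 * (1 + math.erf(a / math.sqrt(2)))   # standard normal CDF at a
print(empirical, Phi)   # both close to 0.8413
```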

2.7 Discussion and Bibliography

The material of this chapter is very standard and can be found in many books on probability theory. Well known textbooks on probability theory are [2, 8, 9, 16, 17, 12, 23]. The connection between conditional expectation and orthogonal projections is discussed in [4]. The reduced distribution functions defined in Section 2.2 are used extensively in . A different normalization is usually used in physics text- books. See for instance [1, Sec. 4.2]. The calculations presented in Section 2.5 are essentially an exercise in . See [15, Sec. 10.2]. Random variables and probability measures can also be defined in infinite di- mensions. More information can be found in [20, Ch. 2]. The study of limit theorems is one of the cornerstones of probability theory and of the theory of stochastic processes. A comprehensive study of limit theorems can be found in [10].

2.8 Exercises

1. Show that the intersection of a family of σ-algebras is a σ-algebra.

2. Prove the law of total probability, Proposition 2.13.

3. Calculate the mean, variance and characteristic function of the following prob- ability density functions.

(a) The with density

\[
f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0, \\ 0, & x < 0, \end{cases}
\]
with $\lambda > 0$.

(b) The uniform distribution with density

\[
f(x) = \begin{cases} \frac{1}{b - a}, & a < x < b, \\ 0, & x \notin (a, b), \end{cases}
\]
with $a < b$.

(c) The with density

\[
f(x) = \begin{cases} \frac{\lambda}{\Gamma(\alpha)} (\lambda x)^{\alpha - 1} e^{-\lambda x}, & x > 0, \\ 0, & x < 0, \end{cases}
\]
with $\lambda > 0$, $\alpha > 0$, where $\Gamma(\alpha)$ is the Gamma function

\[
\Gamma(\alpha) = \int_0^{\infty} \xi^{\alpha - 1} e^{-\xi}\, d\xi, \qquad \alpha > 0.
\]

4. Let $X$ and $Y$ be independent random variables with distribution functions $F_X$ and $F_Y$. Show that the distribution function of the sum $Z = X + Y$ is the convolution of $F_X$ and $F_Y$:

\[
F_Z(x) = \int F_X(x - y)\, dF_Y(y).
\]

5. Let $X$ and $Y$ be jointly Gaussian random variables. Show that they are uncorrelated if and only if they are independent.
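The convolution identity of Exercise 4 can be checked numerically. In the sketch below (an illustration, assuming Python with NumPy), $X$ and $Y$ are iid Exp(1), so $Z = X + Y$ is Gamma(2, 1) with CDF $1 - e^{-x}(1 + x)$:

```python
import numpy as np

# Evaluate F_Z(x) = \int F_X(x - y) dF_Y(y) by the trapezoid rule
# for X, Y iid Exp(1) and compare with the Gamma(2,1) CDF.
F_X = lambda t: 1.0 - np.exp(-np.maximum(t, 0.0))   # Exp(1) CDF
f_Y = lambda y: np.exp(-y)                          # Exp(1) density

y = np.linspace(0.0, 50.0, 200_001)
dy = y[1] - y[0]
x = 2.0
g = F_X(x - y) * f_Y(y)
F_Z = np.sum((g[1:] + g[:-1]) / 2) * dy             # trapezoid rule
print(F_Z, 1 - np.exp(-x) * (1 + x))                # the two values agree
```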

6. (a) Let $X$ be a continuous random variable with characteristic function $\phi(t)$. Show that
\[
\mathbb{E}X^k = \frac{1}{i^k} \phi^{(k)}(0),
\]
where $\phi^{(k)}(t)$ denotes the $k$-th derivative of $\phi$ evaluated at $t$.

(b) Let $X$ be a nonnegative random variable with distribution function $F(x)$. Show that
\[
\mathbb{E}(X) = \int_0^{+\infty} (1 - F(x))\, dx.
\]

(c) Let $X$ be a continuous random variable with probability density function $f(x)$ and characteristic function $\phi(t)$. Find the probability density and characteristic function of the random variable $Y = aX + b$ with $a, b \in \mathbb{R}$.

(d) Let $X$ be a random variable with uniform distribution on $[0, 2\pi]$. Find the probability density of the random variable $Y = \sin(X)$.

7. Let $X$ be a discrete random variable taking values on the set of nonnegative integers with probability mass function $p_k = \mathbb{P}(X = k)$ with $p_k \geqslant 0$, $\sum_{k=0}^{+\infty} p_k = 1$. The generating function is defined as
\[
g(s) = \mathbb{E}(s^X) = \sum_{k=0}^{+\infty} p_k s^k.
\]

(a) Show that

\[
\mathbb{E}X = g'(1) \quad \text{and} \quad \mathbb{E}X^2 = g''(1) + g'(1),
\]

where the prime denotes differentiation. (b) Calculate the generating function of the Poisson random variable with

\[
p_k = \mathbb{P}(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 0, 1, 2, \dots, \quad \lambda > 0.
\]

(c) Prove that the generating function of a sum of independent nonnegative integer valued random variables is the product of their generating functions.

8. Write a computer program for studying the law of large numbers and the central limit theorem. Investigate numerically the rate of convergence of these two theorems.

9. Study the properties of Gaussian measures on separable Hilbert spaces from [20, Ch. 2].

10. Prove Theorem 2.24.

Chapter 3

Basics of the Theory of Stochastic Processes

In this chapter we present some basic results from the theory of stochastic processes and investigate the properties of some of the standard stochastic processes in continuous time. In Section 3.1 we give the definition of a stochastic process. In Section 3.2 we present some properties of stationary stochastic processes. In Section 3.3 we introduce Brownian motion and study some of its properties. Various examples of stochastic processes in continuous time are presented in Section 3.4. The Karhunen-Loève expansion, one of the most useful tools for representing stochastic processes and random fields, is presented in Section 3.5. Further discussion and bibliographical comments are presented in Section 3.6. Section 3.7 contains exercises.

3.1 Definition of a Stochastic Process

Stochastic processes describe dynamical systems whose evolution law is of proba- bilistic nature. The precise definition is given below.

Definition 3.1 (stochastic process). Let $T$ be an ordered set, $(\Omega, \mathcal{F}, \mathbb{P})$ a probability space and $(E, \mathcal{G})$ a measurable space. A stochastic process is a collection of random variables $X = \{X_t;\ t \in T\}$ where, for each fixed $t \in T$, $X_t$ is a random variable from $(\Omega, \mathcal{F}, \mathbb{P})$ to $(E, \mathcal{G})$. $\Omega$ is called the sample space, and $E$ is the state space of the stochastic process $X_t$.

The set T can be either discrete, for example the set of positive integers Z+, or

continuous, $T = [0, +\infty)$. The state space $E$ will usually be $\mathbb{R}^d$ equipped with the $\sigma$-algebra of Borel sets. A stochastic process $X$ may be viewed as a function of both $t \in T$ and $\omega \in \Omega$. We will sometimes write $X(t)$, $X(t, \omega)$ or $X_t(\omega)$ instead of $X_t$. For a fixed sample point $\omega \in \Omega$, the function $X_t(\omega) : T \mapsto E$ is called a sample path (realization, trajectory) of the process $X$.

Definition 3.2 (finite dimensional distributions). The finite dimensional distributions (fdd) of a stochastic process are the distributions of the $E^k$-valued random variables $(X(t_1), X(t_2), \dots, X(t_k))$ for arbitrary positive integer $k$ and arbitrary times $t_i \in T$, $i \in \{1, \dots, k\}$:

\[
F(x) = \mathbb{P}(X(t_i) \leqslant x_i,\ i = 1, \dots, k) \quad \text{with} \quad x = (x_1, \dots, x_k).
\]

From experiments or numerical simulations we can only obtain information about the finite dimensional distributions of a process. A natural question arises: are the finite dimensional distributions of a stochastic process sufficient to determine a stochastic process uniquely? This is true for processes with continuous paths.$^1$ This is the class of stochastic processes that we will study in these notes.

Definition 3.3. We will say that two processes $X_t$ and $Y_t$ are equivalent if they have the same finite dimensional distributions.

Definition 3.4. A one dimensional continuous time Gaussian process is a stochastic process for which $E = \mathbb{R}$ and all the finite dimensional distributions are Gaussian, i.e. every finite dimensional vector $(X_{t_1}, X_{t_2}, \dots, X_{t_k})$ is a $\mathcal{N}(\mu_k, K_k)$ random variable for some vector $\mu_k$ and a symmetric nonnegative definite matrix $K_k$, for all $k = 1, 2, \dots$ and for all $t_1, t_2, \dots, t_k$.

From the above definition we conclude that the finite dimensional distributions of a Gaussian continuous time stochastic process are Gaussian with probability distribution function

\[
\gamma_{\mu_k, K_k}(x) = (2\pi)^{-k/2} (\det K_k)^{-1/2} \exp\Big( -\frac{1}{2} \big\langle K_k^{-1}(x - \mu_k),\, x - \mu_k \big\rangle \Big),
\]
where $x = (x_1, x_2, \dots, x_k)$.

$^1$In fact, what we need is for the stochastic process to be separable. See the discussion in Section 3.6.

It is straightforward to extend the above definition to arbitrary dimensions. A Gaussian process $x(t)$ is characterized by its mean $m(t) := \mathbb{E}x(t)$ and the covariance (or autocorrelation) matrix

\[
C(t, s) = \mathbb{E}\Big( \big( x(t) - m(t) \big) \otimes \big( x(s) - m(s) \big) \Big).
\]
Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process.

3.2 Stationary Processes

3.2.1 Strictly Stationary Processes

The statistics of many stochastic processes that appear in applications remain invariant under time translations. Such stochastic processes are called stationary. It is possible to develop a quite general theory for stochastic processes that enjoy this symmetry property.

Definition 3.5. A stochastic process is called (strictly) stationary if all finite dimensional distributions are invariant under time translation: for any integer $k$ and times $t_i \in T$, the distribution of $(X(t_1), X(t_2), \dots, X(t_k))$ is equal to that of $(X(s + t_1), X(s + t_2), \dots, X(s + t_k))$ for any $s$ such that $s + t_i \in T$ for all $i \in \{1, \dots, k\}$. In other words,
\[
\mathbb{P}(X_{t_1+t} \in A_1, X_{t_2+t} \in A_2, \dots, X_{t_k+t} \in A_k) = \mathbb{P}(X_{t_1} \in A_1, X_{t_2} \in A_2, \dots, X_{t_k} \in A_k), \quad \forall t \in T.
\]

Example 3.6. Let $Y_0, Y_1, \dots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$. Then $X_n$ is a strictly stationary process (see Exercise 1). Assume furthermore that $\mathbb{E}Y_0 = \mu < +\infty$. Then, by the strong law of large numbers, we have that
\[
\frac{1}{N} \sum_{j=0}^{N-1} X_j = \frac{1}{N} \sum_{j=0}^{N-1} Y_j \to \mathbb{E}Y_0 = \mu,
\]
almost surely. In fact, the Birkhoff ergodic theorem states that, for any function $f$ such that $\mathbb{E}f(Y_0) < +\infty$, we have that
\[
\lim_{N\to+\infty} \frac{1}{N} \sum_{j=0}^{N-1} f(X_j) = \mathbb{E}f(Y_0), \qquad (3.1)
\]
almost surely. The sequence of iid random variables is an example of an ergodic strictly stationary process.

Ergodic strictly stationary processes satisfy (3.1). Hence, we can calculate the statistics of the stochastic process $X_n$ using a single sample path, provided that it is long enough ($N \gg 1$).

Example 3.7. Let Z be a random variable and define the stochastic process Xn = Z, n = 0, 1, 2,... . Then Xn is a strictly stationary process (see Exercise 2). We can calculate the long time average of this stochastic process:

\[
\frac{1}{N} \sum_{j=0}^{N-1} X_j = \frac{1}{N} \sum_{j=0}^{N-1} Z = Z,
\]
which is independent of $N$ and does not converge to the mean of the stochastic process $\mathbb{E}X_n = \mathbb{E}Z$ (assuming that it is finite), or to any other deterministic number. This is an example of a non-ergodic process.
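The contrast between Examples 3.6 and 3.7 can be illustrated numerically. The following sketch (assuming Python with NumPy; the Gaussian choice for $Y_n$ and $Z$ is arbitrary) compares the time average of one long sample path in the two cases:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50_000

# Ergodic case (Example 3.6): X_n = Y_n iid N(0,1); the time average
# over one long sample path converges to E[Y_0] = 0.
iid_path = rng.standard_normal(N)
print(iid_path.mean())          # close to 0

# Non-ergodic case (Example 3.7): X_n = Z for a single draw Z; the
# time average of any path equals Z itself, not E[Z] = 0.
Z = rng.standard_normal()
constant_path = np.full(N, Z)
print(constant_path.mean(), Z)  # identical, independent of N
```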

3.2.2 Second Order Stationary Processes

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $X_t$, $t \in T$ (with $T = \mathbb{R}$ or $\mathbb{Z}$) be a real-valued random process on this probability space with finite second moment, $\mathbb{E}|X_t|^2 < +\infty$ (i.e. $X_t \in L^2(\Omega, \mathbb{P})$ for all $t \in T$). Assume that it is strictly stationary. Then,
\[
\mathbb{E}(X_{t+s}) = \mathbb{E}X_t, \qquad s \in T, \qquad (3.2)
\]
from which we conclude that $\mathbb{E}X_t$ is constant, and

\[
\mathbb{E}\big( (X_{t_1+s} - \mu)(X_{t_2+s} - \mu) \big) = \mathbb{E}\big( (X_{t_1} - \mu)(X_{t_2} - \mu) \big), \qquad s \in T, \qquad (3.3)
\]
from which we conclude that the covariance (or autocorrelation) function $C(t, s) = \mathbb{E}((X_t - \mu)(X_s - \mu))$ depends on the difference between the two times, $t$ and $s$, i.e. $C(t, s) = C(t - s)$. This motivates the following definition.

Definition 3.8. A stochastic process $X_t \in L^2$ is called second-order stationary or wide-sense stationary or weakly stationary if the first moment $\mathbb{E}X_t$ is a constant and the covariance function $\mathbb{E}(X_t - \mu)(X_s - \mu)$ depends only on the difference $t - s$:
\[
\mathbb{E}X_t = \mu, \qquad \mathbb{E}\big( (X_t - \mu)(X_s - \mu) \big) = C(t - s).
\]

The constant $\mu$ is the expectation of the process $X_t$. Without loss of generality, we can set $\mu = 0$, since if $\mathbb{E}X_t = \mu$ then the process $Y_t = X_t - \mu$ is mean zero. A mean zero process will be called a centered process. The function $C(t)$ is the covariance (sometimes also called autocovariance) or the autocorrelation function of $X_t$. Notice that $C(t) = \mathbb{E}(X_t X_0)$, whereas $C(0) = \mathbb{E}(X_t^2)$, which is finite, by assumption. Since we have assumed that $X_t$ is a real valued process, we have that $C(t) = C(-t)$, $t \in \mathbb{R}$.

Remark 3.9. Let $X_t$ be a strictly stationary stochastic process with finite second moment (i.e. $X_t \in L^2$). The definition of strict stationarity implies that $\mathbb{E}X_t = \mu$, a constant, and $\mathbb{E}((X_t - \mu)(X_s - \mu)) = C(t - s)$. Hence, a strictly stationary process with finite second moment is also stationary in the wide sense. The converse is not true.

Example 3.10. Let $Y_0, Y_1, \dots$ be a sequence of independent, identically distributed random variables and consider the stochastic process $X_n = Y_n$. From Example 3.6 we know that this is a strictly stationary process, irrespective of whether $Y_0$ is such that $\mathbb{E}Y_0^2 < +\infty$. Assume now that $\mathbb{E}Y_0 = 0$ and $\mathbb{E}Y_0^2 = \sigma^2 < +\infty$. Then $X_n$ is a second order stationary process with mean zero and correlation function $R(k) = \sigma^2 \delta_{k0}$. Notice that in this case we have no correlation between the values of the stochastic process at different times $n$ and $k$.

Example 3.11. Let $Z$ be a single random variable and consider the stochastic process $X_n = Z$, $n = 0, 1, 2, \dots$. From Example 3.7 we know that this is a strictly stationary process irrespective of whether $\mathbb{E}|Z|^2 < +\infty$ or not. Assume now that $\mathbb{E}Z = 0$, $\mathbb{E}Z^2 = \sigma^2$. Then $X_n$ becomes a second order stationary process with $R(k) = \sigma^2$. Notice that in this case the values of our stochastic process at different times are strongly correlated.

We will see in Section 3.2.3 that for second order stationary processes, ergod- icity is related to fast decay of correlations. In the first of the examples above, there was no correlation between our stochastic processes at different times and the stochastic process is ergodic. On the contrary, in our second example there is very strong correlation between the stochastic process at different times and this process is not ergodic.

Remark 3.12. The first two moments of a Gaussian process are sufficient for a complete characterization of the process. Consequently, a Gaussian stochastic process is strictly stationary if and only if it is weakly stationary.

Continuity properties of the covariance function are equivalent to continuity properties of the paths of $X_t$ in the $L^2$ sense, i.e.

\[
\lim_{h\to 0} \mathbb{E}|X_{t+h} - X_t|^2 = 0.
\]

Lemma 3.13. Assume that the covariance function $C(t)$ of a second order stationary process is continuous at $t = 0$. Then it is continuous for all $t \in \mathbb{R}$. Furthermore, the continuity of $C(t)$ is equivalent to the continuity of the process $X_t$ in the $L^2$-sense.

Proof. Fix $t \in \mathbb{R}$ and (without loss of generality) set $\mathbb{E}X_t = 0$. We calculate:
\begin{align*}
|C(t+h) - C(t)|^2 &= |\mathbb{E}(X_{t+h} X_0) - \mathbb{E}(X_t X_0)|^2 = |\mathbb{E}\big( (X_{t+h} - X_t) X_0 \big)|^2 \\
&\leqslant \mathbb{E}(X_0)^2\, \mathbb{E}(X_{t+h} - X_t)^2 \\
&= C(0) \big( \mathbb{E}X_{t+h}^2 + \mathbb{E}X_t^2 - 2 \mathbb{E}X_t X_{t+h} \big) \\
&= 2 C(0) \big( C(0) - C(h) \big) \to 0,
\end{align*}
as $h \to 0$. Thus, continuity of $C(\cdot)$ at $0$ implies continuity for all $t$.

Assume now that $C(t)$ is continuous. From the above calculation we have

\[
\mathbb{E}|X_{t+h} - X_t|^2 = 2\big( C(0) - C(h) \big), \qquad (3.4)
\]
which converges to $0$ as $h \to 0$. Conversely, assume that $X_t$ is $L^2$-continuous. Then, from the above equation we get $\lim_{h\to 0} C(h) = C(0)$.

Notice that from (3.4) we immediately conclude that $C(0) \geqslant C(h)$, $h \in \mathbb{R}$.

The Fourier transform of the covariance function of a second order stationary process always exists. This enables us to study second order stationary processes using tools from Fourier analysis. To make the link between second order stationary processes and Fourier analysis we will use Bochner's theorem, which applies to all nonnegative definite functions.

Definition 3.14. A function $f(x) : \mathbb{R} \mapsto \mathbb{R}$ is called nonnegative definite if
\[
\sum_{i,j=1}^{n} f(t_i - t_j)\, c_i \bar{c}_j \geqslant 0 \qquad (3.5)
\]
for all $n \in \mathbb{N}$, $t_1, \dots, t_n \in \mathbb{R}$, $c_1, \dots, c_n \in \mathbb{C}$.

Lemma 3.15. The covariance function of a second order stationary process is a nonnegative definite function.

Proof. We will use the notation $X_t^c := \sum_{i=1}^{n} X_{t_i} c_i$. We have
\begin{align*}
\sum_{i,j=1}^{n} C(t_i - t_j) c_i \bar{c}_j &= \sum_{i,j=1}^{n} \mathbb{E}X_{t_i} X_{t_j} c_i \bar{c}_j \\
&= \mathbb{E}\Big( \sum_{i=1}^{n} X_{t_i} c_i \sum_{j=1}^{n} X_{t_j} \bar{c}_j \Big) = \mathbb{E}\big( X_t^c \bar{X}_t^c \big) \\
&= \mathbb{E}|X_t^c|^2 \geqslant 0.
\end{align*}

Theorem 3.16 (Bochner). Let $C(t)$ be a continuous nonnegative definite function. Then there exists a unique nonnegative measure $\rho$ on $\mathbb{R}$ such that $\rho(\mathbb{R}) = C(0)$ and
\[
C(t) = \int_{\mathbb{R}} e^{i\omega t}\, \rho(d\omega) \qquad \forall t \in \mathbb{R}. \qquad (3.6)
\]

Definition 3.17. Let $X_t$ be a second order stationary process with autocorrelation function $C(t)$ whose Fourier transform is the measure $\rho(d\omega)$. The measure $\rho(d\omega)$ is called the spectral measure of the process $X_t$.

In the following we will assume that the spectral measure is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$ with density $S(\omega)$, i.e. $\rho(d\omega) = S(\omega)\, d\omega$. The Fourier transform $S(\omega)$ of the covariance function is called the spectral density of the process:

\[
S(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-it\omega} C(t)\, dt. \qquad (3.7)
\]
From (3.6) it follows that the autocorrelation function of a mean zero, second order stationary process is given by the inverse Fourier transform of the spectral density:
\[
C(t) = \int_{-\infty}^{\infty} e^{it\omega} S(\omega)\, d\omega. \qquad (3.8)
\]
There are various cases where the experimentally measured quantity is the spectral density (or power spectrum) of a stationary stochastic process. Conversely, from a time series of observations of a stationary process we can calculate the autocorrelation function and, using (3.8), the spectral density.

The autocorrelation function of a second order stationary process enables us to associate a time scale to $X_t$, the correlation time $\tau_{cor}$:
\[
\tau_{cor} = \frac{1}{C(0)} \int_0^{\infty} C(\tau)\, d\tau = \int_0^{\infty} \mathbb{E}(X_\tau X_0) / \mathbb{E}(X_0^2)\, d\tau.
\]
The slower the decay of the correlation function, the larger the correlation time is. Notice that when the correlations do not decay sufficiently fast, so that $C(t)$ is not integrable, the correlation time is infinite.

Example 3.18. Consider a mean zero, second order stationary process with correlation function
\[
R(t) = R(0) e^{-\alpha |t|} \qquad (3.9)
\]

where $\alpha > 0$. We will write $R(0) = \frac{D}{\alpha}$, where $D > 0$. The spectral density of this process is:

\begin{align*}
S(\omega) &= \frac{1}{2\pi} \frac{D}{\alpha} \int_{-\infty}^{+\infty} e^{-i\omega t} e^{-\alpha |t|}\, dt \\
&= \frac{1}{2\pi} \frac{D}{\alpha} \left( \int_{-\infty}^{0} e^{-i\omega t} e^{\alpha t}\, dt + \int_{0}^{+\infty} e^{-i\omega t} e^{-\alpha t}\, dt \right) \\
&= \frac{1}{2\pi} \frac{D}{\alpha} \left( \frac{1}{-i\omega + \alpha} + \frac{1}{i\omega + \alpha} \right) \\
&= \frac{D}{\pi} \frac{1}{\omega^2 + \alpha^2}.
\end{align*}
This function is called the Cauchy or the Lorentz distribution. The correlation time is (we have that $R(0) = D/\alpha$)

\[
\tau_{cor} = \int_0^{\infty} e^{-\alpha t}\, dt = \alpha^{-1}.
\]
A Gaussian process with an exponential correlation function is of particular importance in the theory and applications of stochastic processes.
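The Lorentzian form of the spectral density can be checked directly from definition (3.7). In the illustrative sketch below (Python with NumPy assumed; the values of $\alpha$, $D$ and the grid are arbitrary choices), the Fourier integral of $R(t) = (D/\alpha)e^{-\alpha|t|}$ is evaluated numerically:

```python
import numpy as np

# Numerical check of (3.7) for R(t) = (D/alpha) * exp(-alpha |t|):
# the spectral density should be S(w) = D / (pi * (w^2 + alpha^2)).
alpha, D = 1.5, 2.0
t = np.linspace(-60, 60, 1_200_001)
dt = t[1] - t[0]
R = (D / alpha) * np.exp(-alpha * np.abs(t))

for w in (0.0, 0.7, 2.3):
    S_num = np.sum(np.exp(-1j * w * t) * R).real * dt / (2 * np.pi)
    S_exact = D / (np.pi * (w**2 + alpha**2))
    print(w, S_num, S_exact)    # the two columns agree
```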

Definition 3.19. A real-valued Gaussian stationary process defined on $\mathbb{R}$ with correlation function given by (3.9) is called the (stationary) Ornstein-Uhlenbeck process.

The Ornstein-Uhlenbeck process is used as a model for the velocity of a Brownian particle. It is of interest to calculate the statistics of the position of the Brownian particle, i.e. of the integral

\[
X(t) = \int_0^t Y(s)\, ds, \qquad (3.10)
\]
where $Y(t)$ denotes the stationary OU process.

Lemma 3.20. Let $Y(t)$ denote the stationary OU process with covariance function (3.9) and set $\alpha = D = 1$. Then the position process (3.10) is a mean zero Gaussian process with covariance function

\[
\mathbb{E}(X(t) X(s)) = 2 \min(t, s) + e^{-\min(t,s)} + e^{-\max(t,s)} - e^{-|t-s|} - 1. \qquad (3.11)
\]
Proof. See Exercise 8.

3.2.3 Ergodic Properties of Second-Order Stationary Processes

Second order stationary processes have nice ergodic properties, provided that the correlation between values of the process at different times decays sufficiently fast. In this case, it is possible to show that we can calculate expectations by calculating time averages. An example of such a result is the following.

Theorem 3.21. Let $\{X_t\}_{t \geqslant 0}$ be a second order stationary process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with mean $\mu$ and covariance $R(t)$, and assume that $R(t) \in L^1(0, +\infty)$. Then
\[
\lim_{T\to+\infty} \mathbb{E}\left| \frac{1}{T} \int_0^T X(s)\, ds - \mu \right|^2 = 0. \qquad (3.12)
\]

For the proof of this result we will first need an elementary lemma.

Lemma 3.22. Let R(t) be an integrable symmetric function. Then

\[
\int_0^T \!\! \int_0^T R(t - s)\, dt\, ds = 2 \int_0^T (T - s) R(s)\, ds. \qquad (3.13)
\]
Proof. We make the change of variables $u = t - s$, $v = t + s$. The domain of integration in the $t, s$ variables is $[0, T] \times [0, T]$. In the $u, v$ variables it becomes $[-T, T] \times [0, 2(T - |u|)]$. The Jacobian of the transformation is
\[
J = \frac{\partial(t, s)}{\partial(u, v)} = \frac{1}{2}.
\]

The integral becomes

\begin{align*}
\int_0^T \!\! \int_0^T R(t - s)\, dt\, ds &= \int_{-T}^{T} \int_0^{2(T - |u|)} R(u)\, J\, dv\, du \\
&= \int_{-T}^{T} (T - |u|) R(u)\, du \\
&= 2 \int_0^{T} (T - u) R(u)\, du,
\end{align*}
where the symmetry of the function $R(u)$ was used in the last step.
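Identity (3.13) is easy to test numerically for a concrete kernel. The sketch below (illustrative Python with NumPy; the choice $R(u) = e^{-|u|}$ and the grid resolution are assumptions of the sketch) compares the double integral with the single-integral form:

```python
import numpy as np

# Check  \int_0^T \int_0^T R(t - s) dt ds = 2 \int_0^T (T - u) R(u) du
# for the symmetric, integrable test function R(u) = exp(-|u|).
T, n = 3.0, 1_000
t = np.linspace(0, T, n + 1)
h = T / n
tt, ss = np.meshgrid(t, t, indexing="ij")

lhs = np.sum(np.exp(-np.abs(tt - ss))) * h * h        # double Riemann sum
rhs = 2 * np.sum((T - t) * np.exp(-t)) * h            # single Riemann sum
print(lhs, rhs)   # the two values agree
```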

Proof of Theorem 3.21. We use Lemma 3.22 to calculate:

\begin{align*}
\mathbb{E}\left| \frac{1}{T} \int_0^T X_s\, ds - \mu \right|^2 &= \frac{1}{T^2} \mathbb{E}\left| \int_0^T (X_s - \mu)\, ds \right|^2 \\
&= \frac{1}{T^2} \mathbb{E} \int_0^T \!\! \int_0^T (X(t) - \mu)(X(s) - \mu)\, dt\, ds \\
&= \frac{1}{T^2} \int_0^T \!\! \int_0^T R(t - s)\, dt\, ds \\
&= \frac{2}{T^2} \int_0^T (T - u) R(u)\, du \\
&\leqslant \frac{2}{T} \int_0^{+\infty} \Big( 1 - \frac{u}{T} \Big) R(u)\, du \leqslant \frac{2}{T} \int_0^{+\infty} R(u)\, du \to 0,
\end{align*}
using the dominated convergence theorem and the assumption $R(\cdot) \in L^1$.

Assume that $\mu = 0$ and define

\[
D = \int_0^{+\infty} R(t)\, dt, \qquad (3.14)
\]
which, from our assumption on $R(t)$, is a finite quantity.$^2$ The above calculation suggests that, for $T \gg 1$, we have that
\[
\mathbb{E}\left( \int_0^T X(t)\, dt \right)^2 \approx 2DT.
\]
This implies that, at sufficiently long times, the mean square displacement of the integral of the ergodic second order stationary process $X_t$ scales linearly in time, with proportionality coefficient $2D$.
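The $2DT$ scaling can be tested by Monte Carlo simulation. The sketch below (illustrative Python with NumPy, not from the text) uses the stationary OU process with $R(t) = e^{-|t|}$, so $D = \int_0^\infty e^{-t}\, dt = 1$; the exact AR(1) discretization of the OU process is an assumption of the sketch:

```python
import numpy as np

# Monte Carlo check of E[( \int_0^T X_t dt )^2] ~ 2 D T for the
# stationary OU process with R(t) = exp(-|t|), hence D = 1.
rng = np.random.default_rng(2)
alpha, T, dt, M = 1.0, 50.0, 0.01, 4_000
n = int(T / dt)

a = np.exp(-alpha * dt)            # one-step decay factor
noise_std = np.sqrt(1.0 - a**2)    # keeps the stationary variance at 1
Y = rng.standard_normal(M)         # stationary initial condition
integral = np.zeros(M)
for _ in range(n):
    integral += Y * dt
    Y = a * Y + noise_std * rng.standard_normal(M)

print(integral.var() / (2 * T))    # close to D = 1 for T >> 1
```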

$^2$Notice, however, that we do not know whether it is nonzero. This requires a separate argument.

Assume that $X_t$ is the velocity of a (Brownian) particle. In this case, the integral of $X_t$,
\[
Z_t = \int_0^t X_s\, ds,
\]
represents the particle position. From our calculation above we conclude that

\[
\mathbb{E}Z_t^2 = 2Dt,
\]
where
\[
D = \int_0^{\infty} R(t)\, dt = \int_0^{\infty} \mathbb{E}(X_t X_0)\, dt \qquad (3.15)
\]
is the diffusion coefficient. Thus, one expects that at sufficiently long times and under appropriate assumptions on the correlation function, the time integral of a stationary process will approximate a Brownian motion with diffusion coefficient $D$. The diffusion coefficient is an example of a transport coefficient and (3.15) is an example of the Green-Kubo formula: a transport coefficient can be calculated in terms of the time integral of an appropriate autocorrelation function. In the case of the diffusion coefficient we need to calculate the integral of the velocity autocorrelation function.

Example 3.23. Consider the stochastic process with an exponential correlation function from Example 3.18, and assume that this stochastic process describes the velocity of a Brownian particle. Since $R(t) \in L^1(0, +\infty)$, Theorem 3.21 applies. Furthermore, the diffusion coefficient of the Brownian particle is given by

\[
\int_0^{+\infty} R(t)\, dt = R(0)\, \tau_c = \frac{D}{\alpha^2}.
\]

3.3 Brownian Motion

The most important continuous time stochastic process is Brownian motion. Brownian motion is a mean zero, continuous (i.e. it has continuous sample paths: for a.e. $\omega \in \Omega$ the function $X_t$ is a continuous function of time) process with independent Gaussian increments. A process $X_t$ has independent increments if for every sequence $t_0 < t_1 < \dots < t_n$ the random variables

\[
X_{t_1} - X_{t_0},\ X_{t_2} - X_{t_1},\ \dots,\ X_{t_n} - X_{t_{n-1}}
\]
are independent. If, furthermore, for any $t_1, t_2, s \in T$ and $B \subset \mathbb{R}$,
\[
\mathbb{P}(X_{t_2+s} - X_{t_1+s} \in B) = \mathbb{P}(X_{t_2} - X_{t_1} \in B),
\]
then the process $X_t$ has stationary independent increments.

Definition 3.24.

• A one dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}$ is a real valued stochastic process such that

i. $W(0) = 0$.

ii. $W(t)$ has independent increments.

iii. For every $t > s \geqslant 0$, $W(t) - W(s)$ has a Gaussian distribution with mean $0$ and variance $t - s$. That is, the density of the random variable $W(t) - W(s)$ is
\[
g(x; t, s) = \big( 2\pi (t - s) \big)^{-1/2} \exp\left( -\frac{x^2}{2(t - s)} \right); \qquad (3.16)
\]

• A $d$-dimensional standard Brownian motion $W(t) : \mathbb{R}^+ \to \mathbb{R}^d$ is a collection of $d$ independent one dimensional Brownian motions:

W (t) = (W1(t),...,Wd(t)),

where $W_i(t)$, $i = 1, \dots, d$ are independent one dimensional Brownian motions. The density of the Gaussian random vector $W(t) - W(s)$ is thus
\[
g(x; t, s) = \big( 2\pi (t - s) \big)^{-d/2} \exp\left( -\frac{\|x\|^2}{2(t - s)} \right).
\]
Brownian motion is sometimes referred to as the Wiener process. Brownian motion has continuous paths. More precisely, it has a continuous modification.

Definition 3.25. Let $X_t$ and $Y_t$, $t \in T$, be two stochastic processes defined on the same probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The process $Y_t$ is said to be a modification of $X_t$ if $\mathbb{P}(X_t = Y_t) = 1$ for all $t \in T$.

Lemma 3.26. There is a continuous modification of Brownian motion.

This follows from a theorem due to Kolmogorov.

[Figure 3.1: Brownian sample paths. The figure shows five individual paths of $U(t)$ for $t \in [0, 1]$, together with the mean of 1000 paths.]

Theorem 3.27 (Kolmogorov). Let $X_t$, $t \in [0, \infty)$ be a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Suppose that there are positive constants $\alpha$ and $\beta$, and for each $T \geqslant 0$ there is a constant $C(T)$ such that

\[
\mathbb{E}|X_t - X_s|^\alpha \leqslant C(T) |t - s|^{1+\beta}, \qquad 0 \leqslant s, t \leqslant T. \qquad (3.17)
\]

Then there exists a continuous modification Yt of the process Xt.

The proof of Lemma 3.26 is left as an exercise.

Remark 3.28. Equivalently, we could have defined the one dimensional standard Brownian motion as a stochastic process on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with continuous paths for almost all $\omega \in \Omega$, and Gaussian finite dimensional distributions with zero mean and covariance $\mathbb{E}(W_{t_i} W_{t_j}) = \min(t_i, t_j)$. One can then show that Definition 3.24 follows from the above definition.

It is possible to prove rigorously the existence of the Wiener process (Brownian motion):

Theorem 3.29 (Wiener). There exists an almost surely continuous process $W_t$ with independent increments and $W_0 = 0$, such that for each $t \geqslant 0$ the random variable $W_t$ is $\mathcal{N}(0, t)$. Furthermore, $W_t$ is almost surely locally Hölder continuous with exponent $\alpha$ for any $\alpha \in (0, \frac{1}{2})$.

Notice that Brownian paths are not differentiable.

We can also construct Brownian motion as the limit of an appropriately rescaled random walk: let $X_1, X_2, \dots$ be iid random variables on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with mean $0$ and variance $1$. Define the discrete time stochastic process $S_n$ with $S_0 = 0$, $S_n = \sum_{j=1}^{n} X_j$, $n \geqslant 1$. Define now a continuous time stochastic process with continuous paths as the linearly interpolated, appropriately rescaled random walk:
\[
W_t^n = \frac{1}{\sqrt{n}} S_{[nt]} + (nt - [nt]) \frac{1}{\sqrt{n}} X_{[nt]+1},
\]
where $[\cdot]$ denotes the integer part of a number. Then $W_t^n$ converges weakly, as $n \to +\infty$, to a one dimensional standard Brownian motion.

Brownian motion is a Gaussian process. For the $d$-dimensional Brownian motion, and for $I$ the $d \times d$ identity matrix, we have (see (2.7) and (2.8))
\[
\mathbb{E}W(t) = 0 \quad \forall t \geqslant 0
\]
and
\[
\mathbb{E}\big( (W(t) - W(s)) \otimes (W(t) - W(s)) \big) = (t - s) I. \qquad (3.18)
\]
Moreover,
\[
\mathbb{E}\big( W(t) \otimes W(s) \big) = \min(t, s)\, I. \qquad (3.19)
\]
From the formula for the Gaussian density $g(x, t - s)$, eqn. (3.16), we immediately conclude that $W(t) - W(s)$ and $W(t + u) - W(s + u)$ have the same pdf. Consequently, Brownian motion has stationary increments. Notice, however, that Brownian motion itself is not a stationary process. Since $W(t) = W(t) - W(0)$, the pdf of $W(t)$ is
\[
g(x, t) = \frac{1}{\sqrt{2\pi t}} e^{-x^2/2t}.
\]
We can easily calculate all moments of the Brownian motion:
\[
\mathbb{E}(W^n(t)) = \frac{1}{\sqrt{2\pi t}} \int_{-\infty}^{+\infty} x^n e^{-x^2/2t}\, dx = \begin{cases} 1 \cdot 3 \cdots (n-1)\, t^{n/2}, & n \text{ even}, \\ 0, & n \text{ odd}. \end{cases}
\]
Brownian motion is invariant under various transformations in time.
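Before turning to these invariance properties, both the random walk construction and the moment formula can be checked by simulation. The following sketch (illustrative Python with NumPy; the $\pm 1$ steps are one admissible choice of mean-zero, variance-one iid variables) samples $W_1^n$ and tests $\operatorname{Var} W(1) = 1$ and $\mathbb{E}W(1)^4 = 3$:

```python
import numpy as np

# Rescaled random walk at t = 1: W_1^n = S_n / sqrt(n), where S_n is
# a sum of n fair +-1 steps (written via a binomial draw).
rng = np.random.default_rng(3)
n, M = 1_000, 100_000                          # steps per path, paths

S_n = 2.0 * rng.binomial(n, 0.5, size=M) - n   # sum of n fair +-1 steps
W1 = S_n / np.sqrt(n)

print(W1.var())            # close to t = 1
print(np.mean(W1**4))      # close to 1 * 3 * t^2 = 3
```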

+ n 1 ∞ n x2/2t E(x (t)) = x e− dx √2πt Z−∞ 1.3 . . . (n 1)tn/2, n even, = − 0, n odd. n Brownian motion is invariant under various transformations in time. 3.3. BROWNIAN MOTION 43

Theorem 3.30. Let Wt denote a standard Brownian motion in R. Then, Wt has the following properties:

i. (Rescaling). For each $c > 0$ define $X_t = \frac{1}{\sqrt{c}} W(ct)$. Then $(X_t, t \geqslant 0) = (W_t, t \geqslant 0)$ in law.

ii. (Shifting). For each $c > 0$, $W_{c+t} - W_c$, $t \geqslant 0$, is a Brownian motion which is independent of $W_u$, $u \in [0, c]$.

iii. (Time reversal). Define $X_t = W_{1-t} - W_1$, $t \in [0, 1]$. Then $(X_t, t \in [0, 1]) = (W_t, t \in [0, 1])$ in law.

iv. (Inversion). Let $X_t$, $t \geqslant 0$, be defined by $X_0 = 0$, $X_t = t W(1/t)$ for $t > 0$. Then $(X_t, t \geqslant 0) = (W_t, t \geqslant 0)$ in law.

We emphasize that the equivalence in the above theorem holds in law and not in a pathwise sense.

Proof. See Exercise 13.

We can also add a drift and change the diffusion coefficient of the Brownian motion: we will define a Brownian motion with drift µ and variance σ2 as the process

Xt = µt + σWt.

The mean and variance of Xt are

\[
\mathbb{E}X_t = \mu t, \qquad \mathbb{E}(X_t - \mathbb{E}X_t)^2 = \sigma^2 t.
\]

Notice that Xt satisfies the equation

dXt = µ dt + σ dWt.

This is the simplest example of a stochastic differential equation. We can define the OU process through the Brownian motion via a time change.

Lemma 3.31. Let W (t) be a standard Brownian motion and consider the process

\[
V(t) = e^{-t} W(e^{2t}).
\]

Then V (t) is a Gaussian stationary process with mean 0 and correlation function

\[
R(t) = e^{-|t|}. \qquad (3.20)
\]

For the proof of this result we first need to show that time-changed Gaussian processes are also Gaussian.

Lemma 3.32. Let $X(t)$ be a Gaussian stochastic process and let $Y(t) = X(f(t))$, where $f(t)$ is a strictly increasing function. Then $Y(t)$ is also a Gaussian process.

Proof. We need to show that, for all positive integers $N$ and all sequences of times $\{t_1, t_2, \dots, t_N\}$, the random vector
\[
\{Y(t_1), Y(t_2), \dots, Y(t_N)\} \qquad (3.21)
\]
is a multivariate Gaussian random variable. Since $f(t)$ is strictly increasing, it is invertible and hence there exist $s_i$, $i = 1, \dots, N$, such that $s_i = f^{-1}(t_i)$. Thus, the random vector (3.21) can be rewritten as

\[
\{X(s_1), X(s_2), \dots, X(s_N)\},
\]
which is Gaussian for all $N$ and all choices of times $s_1, s_2, \dots, s_N$. Hence $Y(t)$ is also Gaussian.

Proof of Lemma 3.31. The fact that V (t) is mean zero follows immediately from the fact that W (t) is mean zero. To show that the correlation function of V (t) is given by (3.20), we calculate

\begin{align*}
\mathbb{E}(V(t) V(s)) &= e^{-t-s}\, \mathbb{E}\big( W(e^{2t}) W(e^{2s}) \big) = e^{-t-s} \min(e^{2t}, e^{2s}) \\
&= e^{-|t-s|}.
\end{align*}

The Gaussianity of the process $V(t)$ follows from Lemma 3.32 (notice that the transformation that gives $V(t)$ in terms of $W(t)$ is invertible and we can write $W(s) = s^{1/2} V\big( \frac{1}{2} \ln(s) \big)$).
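The correlation function (3.20) of the time-changed process can also be verified by simulation. In the illustrative sketch below (Python with NumPy assumed; the times $t, s$ are arbitrary), $W$ is sampled exactly at the two required times using the independence of Brownian increments:

```python
import numpy as np

# Check that V(t) = e^{-t} W(e^{2t}) satisfies
# E[V(t) V(s)] = exp(-|t - s|).
rng = np.random.default_rng(4)
t, s, M = 0.8, 0.3, 200_000
a, b = np.exp(2 * s), np.exp(2 * t)          # a < b since s < t

Wa = np.sqrt(a) * rng.standard_normal(M)     # W(a) ~ N(0, a)
Wb = Wa + np.sqrt(b - a) * rng.standard_normal(M)   # independent increment
Vt, Vs = np.exp(-t) * Wb, np.exp(-s) * Wa

print(np.mean(Vt * Vs), np.exp(-abs(t - s)))   # the two values agree
```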

3.4 Other Examples of Stochastic Processes

Brownian Bridge

Let $W(t)$ be a standard one dimensional Brownian motion. We define the Brownian bridge (from $0$ to $0$) to be the process

\[
B_t = W_t - t W_1, \qquad t \in [0, 1]. \qquad (3.22)
\]
Notice that $B_0 = B_1 = 0$. Equivalently, we can define the Brownian bridge to be the continuous Gaussian process $\{B_t : 0 \leqslant t \leqslant 1\}$ such that
\[
\mathbb{E}B_t = 0, \qquad \mathbb{E}(B_t B_s) = \min(s, t) - st, \qquad s, t \in [0, 1]. \qquad (3.23)
\]
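The equivalence of (3.22) and (3.23) can be checked empirically. The sketch below (illustrative Python with NumPy; the times $t, s$ are arbitrary) builds $B_t = W_t - t W_1$ from exact Brownian increments and compares the sample covariance with $\min(s,t) - st$:

```python
import numpy as np

# Check the Brownian bridge covariance E[B_t B_s] = min(s,t) - st.
rng = np.random.default_rng(5)
t, s, M = 0.7, 0.4, 200_000

Ws = np.sqrt(s) * rng.standard_normal(M)              # W(s)
Wt = Ws + np.sqrt(t - s) * rng.standard_normal(M)     # W(t), t > s
W1 = Wt + np.sqrt(1 - t) * rng.standard_normal(M)     # W(1)
Bt, Bs = Wt - t * W1, Ws - s * W1

print(np.mean(Bt * Bs), min(s, t) - s * t)   # the two values agree
```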

Another, equivalent definition of the Brownian bridge is through an appropriate time change of the Brownian motion:
\[
B_t = (1 - t)\, W\Big( \frac{t}{1 - t} \Big), \qquad t \in [0, 1). \qquad (3.24)
\]
Conversely, we can write the Brownian motion as a time change of the Brownian bridge:
\[
W_t = (t + 1)\, B\Big( \frac{t}{1 + t} \Big), \qquad t \geqslant 0.
\]

Fractional Brownian Motion

Definition 3.33. A (normalized) fractional Brownian motion $W_t^H$, $t \geqslant 0$, with Hurst parameter $H \in (0, 1)$ is a centered Gaussian process with continuous sample paths whose covariance is given by
\[
\mathbb{E}(W_t^H W_s^H) = \frac{1}{2} \big( s^{2H} + t^{2H} - |t - s|^{2H} \big). \qquad (3.25)
\]

Proposition 3.34. Fractional Brownian motion has the following properties.

i. When $H = \frac{1}{2}$, $W_t^{1/2}$ becomes the standard Brownian motion.

ii. $W_0^H = 0$, $\mathbb{E}W_t^H = 0$, $\mathbb{E}(W_t^H)^2 = |t|^{2H}$, $t \geqslant 0$.

iii. It has stationary increments, $\mathbb{E}(W_t^H - W_s^H)^2 = |t - s|^{2H}$.

iv. It has the following self similarity property

\[
(W_{\alpha t}^H, t \geqslant 0) = (\alpha^H W_t^H, t \geqslant 0), \qquad \alpha > 0, \qquad (3.26)
\]
where the equivalence is in law.

Proof. See Exercise 19.
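Since fractional Brownian motion is specified entirely by the covariance (3.25), it can be sampled on a grid by factorizing the covariance matrix. The sketch below (illustrative Python with NumPy; the grid, $H$, and sample sizes are arbitrary choices) does this with a Cholesky factorization and tests the stationary-increment property iii:

```python
import numpy as np

# Sample fBM on a grid via Cholesky factorization of (3.25), then
# check E(W_t^H - W_s^H)^2 = |t - s|^{2H} empirically.
rng = np.random.default_rng(6)
H, n, M = 0.75, 50, 20_000
t = np.linspace(0.02, 1.0, n)                 # grid excluding t = 0

tt, ss = np.meshgrid(t, t, indexing="ij")
C = 0.5 * (tt**(2 * H) + ss**(2 * H) - np.abs(tt - ss)**(2 * H))
L = np.linalg.cholesky(C)                     # C = L L^T
paths = rng.standard_normal((M, n)) @ L.T     # each row ~ N(0, C)

i, j = 10, 30
inc_var = np.var(paths[:, j] - paths[:, i])
print(inc_var, abs(t[j] - t[i])**(2 * H))     # the two values agree
```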

The Poisson Process

Definition 3.35. The Poisson process with intensity $\lambda$, denoted by $N(t)$, is an integer-valued, continuous time, stochastic process with independent increments satisfying

\[
\mathbb{P}\big[ (N(t) - N(s)) = k \big] = \frac{e^{-\lambda(t-s)} \big( \lambda(t-s) \big)^k}{k!}, \qquad t > s \geqslant 0,\ k \in \mathbb{N}.
\]
The Poisson process does not have a continuous modification. See Exercise 20.
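A standard construction (not given in the text) simulates the Poisson process from iid exponential inter-arrival times. The sketch below (illustrative Python with NumPy; parameters are arbitrary) compares the empirical law of $N(t)$ with the pmf of Definition 3.35 taking $s = 0$:

```python
import math
import numpy as np

# Simulate N(t) via iid Exp(1/lam) inter-arrival times and compare
# P(N(t) = k) with the Poisson pmf e^{-lam t} (lam t)^k / k!.
rng = np.random.default_rng(7)
lam, t, M = 2.0, 3.0, 100_000

gaps = rng.exponential(1.0 / lam, size=(M, 40))  # inter-arrival times
arrivals = np.cumsum(gaps, axis=1)               # arrival times
N = (arrivals <= t).sum(axis=1)                  # N(t) for each run

k = 4
emp = np.mean(N == k)
exact = math.exp(-lam * t) * (lam * t)**k / math.factorial(k)
print(emp, exact)    # the two values agree
```

(The 40 reserved inter-arrival slots per run are far more than the mean number of arrivals $\lambda t = 6$, so truncation is negligible.)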

3.5 The Karhunen-Loeve´ Expansion

Let $f \in L^2(\Omega)$ where $\Omega$ is a subset of $\mathbb{R}^d$ and let $\{e_n\}_{n=1}^{\infty}$ be an orthonormal basis in $L^2(\Omega)$. Then, it is well known that $f$ can be written as a series expansion:

\[
f = \sum_{n=1}^{\infty} f_n e_n,
\]

\[
f_n = \int_\Omega f(x) e_n(x)\, dx.
\]

\[
\lim_{N\to\infty} \Big\| f(x) - \sum_{n=1}^{N} f_n e_n(x) \Big\|_{L^2(\Omega)} = 0.
\]

It turns out that we can obtain a similar expansion for an $L^2$ mean zero process which is continuous in the $L^2$ sense:
\[
\mathbb{E}X_t^2 < +\infty, \qquad \mathbb{E}X_t = 0, \qquad \lim_{h\to 0} \mathbb{E}|X_{t+h} - X_t|^2 = 0. \qquad (3.27)
\]
For simplicity we will take $T = [0, 1]$. Let $R(t, s) = \mathbb{E}(X_t X_s)$ be the autocorrelation function. Notice that from (3.27) it follows that $R(t, s)$ is continuous in both $t$ and $s$ (Exercise 21). Let us assume an expansion of the form

\[
X_t(\omega) = \sum_{n=1}^{\infty} \xi_n(\omega) e_n(t), \qquad t \in [0, 1], \qquad (3.28)
\]
where $\{e_n\}_{n=1}^{\infty}$ is an orthonormal basis in $L^2(0, 1)$. The random variables $\xi_n$ are calculated as

\begin{align*}
\int_0^1 X_t e_k(t)\, dt &= \int_0^1 \sum_{n=1}^{\infty} \xi_n e_n(t) e_k(t)\, dt \\
&= \sum_{n=1}^{\infty} \xi_n \delta_{nk} = \xi_k,
\end{align*}
where we assumed that we can interchange the summation and integration. We will assume that these random variables are orthogonal:

\[
\mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm},
\]
where $\{\lambda_n\}_{n=1}^{\infty}$ are positive numbers that will be determined later.

Assuming that an expansion of the form (3.28) exists, we can calculate

\begin{align*}
R(t, s) = \mathbb{E}(X_t X_s) &= \mathbb{E}\Big( \sum_{k=1}^{\infty} \sum_{\ell=1}^{\infty} \xi_k e_k(t) \xi_\ell e_\ell(s) \Big) \\
&= \sum_{k=1}^{\infty} \sum_{\ell=1}^{\infty} \mathbb{E}(\xi_k \xi_\ell)\, e_k(t) e_\ell(s) \\
&= \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s).
\end{align*}
Consequently, in order for the expansion (3.28) to be valid we need

\[
R(t, s) = \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s). \qquad (3.29)
\]
From equation (3.29) it follows that

\begin{align*}
\int_0^1 R(t, s) e_n(s)\, ds &= \int_0^1 \sum_{k=1}^{\infty} \lambda_k e_k(t) e_k(s) e_n(s)\, ds \\
&= \sum_{k=1}^{\infty} \lambda_k e_k(t) \int_0^1 e_k(s) e_n(s)\, ds \\
&= \sum_{k=1}^{\infty} \lambda_k e_k(t) \delta_{kn} \\
&= \lambda_n e_n(t).
\end{align*}

Hence, in order for the expansion (3.28) to be valid, $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ have to be the eigenvalues and eigenfunctions of the integral operator whose kernel is the correlation function of $X_t$:

\[
\int_0^1 R(t,s) e_n(s) \, ds = \lambda_n e_n(t). \tag{3.30}
\]

Hence, in order to prove the expansion (3.28) we need to study the eigenvalue problem for the integral operator $\mathcal{R} : L^2[0,1] \to L^2[0,1]$. It is easy to check that this operator is self-adjoint ($(\mathcal{R}f, h) = (f, \mathcal{R}h)$ for all $f, h \in L^2(0,1)$) and nonnegative ($(\mathcal{R}f, f) \geq 0$ for all $f \in L^2(0,1)$). Hence, all its eigenvalues are real and nonnegative. Furthermore, it is a compact operator (if $\{\phi_n\}_{n=1}^{\infty}$ is a bounded sequence in $L^2(0,1)$, then $\{\mathcal{R}\phi_n\}_{n=1}^{\infty}$ has a convergent subsequence). The spectral theorem for compact, self-adjoint operators implies that $\mathcal{R}$ has a countable sequence of eigenvalues tending to $0$. Furthermore, for every $f \in L^2(0,1)$ we can write

\[
f = f_0 + \sum_{n=1}^{\infty} f_n e_n(t),
\]

where $\mathcal{R}f_0 = 0$, $\{e_n(t)\}$ are the eigenfunctions of $\mathcal{R}$ corresponding to nonzero eigenvalues and the convergence is in $L^2$. Finally, Mercer's Theorem states that for $R(t,s)$ continuous on $[0,1] \times [0,1]$, the expansion (3.29) is valid, where the series converges absolutely and uniformly. Now we are ready to prove (3.28).
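Numerically, the eigenvalue problem (3.30) can be approximated by discretizing the integral with a quadrature rule (the Nyström method). The sketch below (NumPy assumed; the grid size is an arbitrary choice) does this for the Brownian-motion kernel $R(t,s) = \min(t,s)$ and recovers the eigenvalues $\lambda_n = (2/((2n-1)\pi))^2$ computed in Example 3.38:

```python
import numpy as np

# Nyström discretization of (Rf)(t) = ∫₀¹ R(t,s) f(s) ds with
# R(t,s) = min(t,s), the covariance kernel of Brownian motion.
M = 1000
t = (np.arange(M) + 0.5) / M           # midpoint quadrature nodes
K = np.minimum.outer(t, t) / M         # kernel matrix times quadrature weight

# The matrix is symmetric, so its eigenvalues are real and (here)
# nonnegative, as the spectral theorem predicts for R.
eigvals = np.linalg.eigvalsh(K)[::-1]  # sort in descending order

# Compare the largest eigenvalues with λ_n = (2 / ((2n-1)π))².
n = np.arange(1, 6)
exact = (2.0 / ((2 * n - 1) * np.pi)) ** 2
print(eigvals[:5])
print(exact)
```

The discrete eigenvalues converge to the exact ones as the grid is refined; the same recipe works for any continuous covariance kernel for which no closed-form solution of (3.30) is available.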

Theorem 3.36 (Karhunen-Loève). Let $\{X_t, \, t \in [0,1]\}$ be an $L^2$ process with zero mean and continuous correlation function $R(t,s)$. Let $\{\lambda_n, e_n(t)\}_{n=1}^{\infty}$ be the eigenvalues and eigenfunctions of the operator $\mathcal{R}$ defined in (3.36). Then

\[
X_t = \sum_{n=1}^{\infty} \xi_n e_n(t), \qquad t \in [0,1], \tag{3.31}
\]

where

\[
\xi_n = \int_0^1 X_t e_n(t) \, dt, \qquad \mathbb{E}\xi_n = 0, \qquad \mathbb{E}(\xi_n \xi_m) = \lambda_n \delta_{nm}. \tag{3.32}
\]

The series converges in $L^2$ to $X_t$, uniformly in $t$.

Proof. The fact that $\mathbb{E}\xi_n = 0$ follows from the fact that $X_t$ is mean zero. The orthogonality of the random variables $\{\xi_n\}_{n=1}^{\infty}$ follows from the orthogonality of the eigenfunctions of $\mathcal{R}$:

\begin{align*}
\mathbb{E}(\xi_n \xi_m) &= \mathbb{E} \int_0^1 \int_0^1 X_t X_s e_n(t) e_m(s) \, dt \, ds \\
&= \int_0^1 \int_0^1 R(t,s) e_n(t) e_m(s) \, ds \, dt \\
&= \lambda_n \int_0^1 e_n(s) e_m(s) \, ds = \lambda_n \delta_{nm}.
\end{align*}

Consider now the partial sum $S_N = \sum_{n=1}^{N} \xi_n e_n(t)$.

\begin{align*}
\mathbb{E}|X_t - S_N|^2 &= \mathbb{E}X_t^2 + \mathbb{E}S_N^2 - 2 \mathbb{E}(X_t S_N) \\
&= R(t,t) + \mathbb{E} \sum_{k,\ell=1}^{N} \xi_k \xi_\ell e_k(t) e_\ell(t) - 2 \mathbb{E}\Big( X_t \sum_{n=1}^{N} \xi_n e_n(t) \Big) \\
&= R(t,t) + \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 - 2 \sum_{k=1}^{N} \mathbb{E} \int_0^1 X_t X_s e_k(s) e_k(t) \, ds \\
&= R(t,t) - \sum_{k=1}^{N} \lambda_k |e_k(t)|^2 \to 0,
\end{align*}

by Mercer's theorem.

Remark 3.37. Let $X_t$ be a Gaussian second order process with continuous covariance $R(t,s)$. Then the random variables $\{\xi_k\}_{k=1}^{\infty}$ are Gaussian, since they are defined through the time integral of a Gaussian process. Furthermore, since they are Gaussian and orthogonal, they are also independent. Hence, for Gaussian processes the Karhunen-Loève expansion becomes

\[
X_t = \sum_{k=1}^{+\infty} \sqrt{\lambda_k} \, \xi_k e_k(t), \tag{3.33}
\]

where $\{\xi_k\}_{k=1}^{\infty}$ are independent $\mathcal{N}(0,1)$ random variables.

Example 3.38 (The Karhunen-Loève expansion for Brownian motion). The correlation function of Brownian motion is $R(t,s) = \min(t,s)$. The eigenvalue problem $\mathcal{R}\psi_n = \lambda_n \psi_n$ becomes

\[
\int_0^1 \min(t,s) \psi_n(s) \, ds = \lambda_n \psi_n(t).
\]

Let us assume that $\lambda_n > 0$ (it is easy to check that $0$ is not an eigenvalue). Upon setting $t = 0$ we obtain $\psi_n(0) = 0$. The eigenvalue problem can be rewritten in the form

\[
\int_0^t s \psi_n(s) \, ds + t \int_t^1 \psi_n(s) \, ds = \lambda_n \psi_n(t).
\]

We differentiate this equation once:

\[
\int_t^1 \psi_n(s) \, ds = \lambda_n \psi_n'(t).
\]

We set $t = 1$ in this equation to obtain the second boundary condition $\psi_n'(1) = 0$. A second differentiation yields:

\[
-\psi_n(t) = \lambda_n \psi_n''(t),
\]

where primes denote differentiation with respect to $t$. Thus, in order to calculate the eigenvalues and eigenfunctions of the integral operator whose kernel is the covariance function of Brownian motion, we need to solve the Sturm-Liouville problem

\[
-\psi_n(t) = \lambda_n \psi_n''(t), \qquad \psi_n(0) = \psi_n'(1) = 0.
\]

It is easy to check that the eigenvalues and (normalized) eigenfunctions are

\[
\psi_n(t) = \sqrt{2} \sin\Big( \frac{1}{2}(2n-1)\pi t \Big), \qquad \lambda_n = \Big( \frac{2}{(2n-1)\pi} \Big)^2.
\]

Thus, the Karhunen-Loève expansion of Brownian motion on $[0,1]$ is

\[
W_t = \sqrt{2} \sum_{n=1}^{\infty} \xi_n \frac{2}{(2n-1)\pi} \sin\Big( \frac{1}{2}(2n-1)\pi t \Big). \tag{3.34}
\]

We can use the KL expansion in order to study the $L^2$-regularity of stochastic processes. First, let $\mathcal{R}$ be a compact, symmetric positive definite operator on $L^2(0,1)$ with eigenvalues and normalized eigenfunctions $\{\lambda_k, e_k(x)\}_{k=1}^{+\infty}$ and consider a function $f \in L^2(0,1)$ with $\int_0^1 f(s) \, ds = 0$. We can define the one-parameter family of Hilbert spaces $H^\alpha$ through the norm

\[
\|f\|_\alpha^2 = \|\mathcal{R}^{-\alpha/2} f\|_{L^2}^2 = \sum_k |f_k|^2 \lambda_k^{-\alpha}.
\]

The inner product can be obtained through polarization. This norm enables us to measure the regularity of the function $f(t)$.³ Let $X_t$ be a mean zero second order (i.e. with finite second moment) process with continuous autocorrelation function. Define the space $\mathcal{H}^\alpha := L^2((\Omega, P), H^\alpha(0,1))$ with (semi)norm

\[
\|X_t\|_\alpha^2 = \mathbb{E} \|X_t\|_{H^\alpha}^2 = \sum_k |\lambda_k|^{1-\alpha}. \tag{3.35}
\]

Notice that the regularity of the stochastic process $X_t$ depends on the decay of the eigenvalues of the integral operator $\mathcal{R}\,\cdot := \int_0^1 R(t,s) \, \cdot \, ds$.

³Think of $\mathcal{R}$ as being the inverse of the Laplacian with periodic boundary conditions. In this case $H^\alpha$ coincides with the standard fractional Sobolev space.
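Formula (3.34) gives a direct way to sample Brownian paths: truncate the series, draw independent $\mathcal{N}(0,1)$ coefficients, and sum. A minimal sketch (NumPy assumed; the truncation level and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_terms, n_paths = 200, 5000
t = np.linspace(0.0, 1.0, 101)

# Scaled eigenfunctions √2 · √λ_n · sin((2n-1)πt/2), shape (n_terms, len(t)).
n = np.arange(1, n_terms + 1)[:, None]
basis = np.sqrt(2) * (2 / ((2 * n - 1) * np.pi)) \
        * np.sin(0.5 * (2 * n - 1) * np.pi * t)

xi = rng.standard_normal((n_paths, n_terms))   # independent N(0,1) coefficients
W = xi @ basis                                 # truncated sum (3.34), one path per row

# Sanity check: Var(W_t) should approximate t (exact as n_terms → ∞).
print(W.var(axis=0)[[25, 50, 100]])            # approximately [0.25, 0.5, 1.0]
```

The same code samples any Gaussian process via (3.33) once the eigenpairs of its covariance operator are known, e.g. from the Nyström discretization sketched earlier; this is the content of Exercise 29.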

As an example, consider the $L^2$-regularity of Brownian motion. From Example 3.38 we know that $\lambda_k \sim k^{-2}$. Consequently, from (3.35) we get that, in order for $W_t$ to be an element of the space $\mathcal{H}^\alpha$, we need

\[
\sum_k |\lambda_k|^{1-\alpha} \sim \sum_k k^{-2(1-\alpha)} < +\infty,
\]

from which we obtain that $\alpha < 1/2$. This is consistent with the Hölder continuity of Brownian motion from Theorem 3.29.⁴
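This criterion can be checked with partial sums (a sketch using the exact Brownian eigenvalues; NumPy assumed, and the truncation at $10^6$ terms is an arbitrary choice): for $\alpha < 1/2$ the partial sums settle down, while for $\alpha > 1/2$ they keep growing.

```python
import numpy as np

n = np.arange(1, 10**6 + 1)
lam = (2.0 / ((2 * n - 1) * np.pi)) ** 2   # KL eigenvalues of Brownian motion

def partial_sum(alpha, terms):
    """Partial sum of Σ λ_k^(1-α), finite exactly when W is in H^α (α < 1/2)."""
    return np.sum(lam[:terms] ** (1 - alpha))

# α = 0.25: exponent 2(1-α) = 1.5 > 1, so the series converges.
print(partial_sum(0.25, 10**5), partial_sum(0.25, 10**6))
# α = 0.60: exponent 2(1-α) = 0.8 < 1, so the partial sums keep growing.
print(partial_sum(0.60, 10**5), partial_sum(0.60, 10**6))
```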

3.6 Discussion and Bibliography

The Ornstein-Uhlenbeck process was introduced by Ornstein and Uhlenbeck in 1930 as a model for the velocity of a Brownian particle [25]. The kind of analysis presented in Section 3.2.3 was initiated by G.I. Taylor in [24]. The proof of Bochner's theorem 3.16 can be found in [14], where additional material on stationary processes can also be found. See also [11]. The spectral theorem for compact, self-adjoint operators, which was needed in the proof of the Karhunen-Loève theorem, can be found in [21]. The Karhunen-Loève expansion is also valid for random fields; see [22] and the references therein.

3.7 Exercises

1. Let Y0, Y1,... be a sequence of independent, identically distributed random variables and consider the stochastic process Xn = Yn.

(a) Show that $X_n$ is a strictly stationary process.

(b) Assume that $\mathbb{E}Y_0 = \mu < +\infty$ and $\mathbb{E}Y_0^2 = \sigma^2 < +\infty$. Show that

\[
\lim_{N \to +\infty} \mathbb{E} \Big| \frac{1}{N} \sum_{j=0}^{N-1} X_j - \mu \Big| = 0.
\]

(c) Let $f$ be such that $\mathbb{E}f^2(Y_0) < +\infty$. Show that

\[
\lim_{N \to +\infty} \mathbb{E} \Big| \frac{1}{N} \sum_{j=0}^{N-1} f(X_j) - \mathbb{E}f(Y_0) \Big| = 0.
\]

⁴Notice, however, that Wiener's theorem refers to a.s. Hölder continuity, whereas the calculation presented in this section is about $L^2$-continuity.

2. Let Z be a random variable and define the stochastic process Xn = Z, n = 0, 1, 2,... . Show that Xn is a strictly stationary process.

3. Let $A_0, A_1, \dots, A_m$ and $B_0, B_1, \dots, B_m$ be uncorrelated random variables with mean zero and $\mathbb{E}A_i^2 = \sigma_i^2$, $\mathbb{E}B_i^2 = \sigma_i^2$, $i = 1, \dots, m$. Let $\omega_0, \omega_1, \dots, \omega_m \in [0, \pi]$ be distinct frequencies and define, for $n = 0, \pm 1, \pm 2, \dots$, the stochastic process

\[
X_n = \sum_{k=0}^{m} \big( A_k \cos(n \omega_k) + B_k \sin(n \omega_k) \big).
\]

Calculate the mean and the covariance of $X_n$. Show that it is a weakly stationary process.

4. Let $\{\xi_n : n = 0, \pm 1, \pm 2, \dots\}$ be uncorrelated random variables with $\mathbb{E}\xi_n = \mu$, $\mathbb{E}(\xi_n - \mu)^2 = \sigma^2$, $n = 0, \pm 1, \pm 2, \dots$. Let $a_1, a_2, \dots$ be arbitrary real numbers and consider the stochastic process

\[
X_n = a_1 \xi_n + a_2 \xi_{n-1} + \dots + a_m \xi_{n-m+1}.
\]

(a) Calculate the mean, variance and the covariance function of Xn. Show that it is a weakly stationary process.

(b) Set $a_k = 1/\sqrt{m}$ for $k = 1, \dots, m$. Calculate the covariance function and study the cases $m = 1$ and $m \to +\infty$.

5. Let $W(t)$ be a standard one-dimensional Brownian motion. Calculate the following expectations.

(a) $\mathbb{E}e^{iW(t)}$.

(b) $\mathbb{E}e^{i(W(t) + W(s))}$, $t, s \in (0, +\infty)$.

(c) $\mathbb{E}\big( \sum_{i=1}^{n} c_i W(t_i) \big)^2$, where $c_i \in \mathbb{R}$ and $t_i \in (0, +\infty)$, $i = 1, \dots, n$.

(d) $\mathbb{E}e^{\sum_{i=1}^{n} c_i W(t_i)}$, where $c_i \in \mathbb{R}$ and $t_i \in (0, +\infty)$, $i = 1, \dots, n$.

6. Let $W_t$ be a standard one-dimensional Brownian motion and define

\[
B_t = W_t - t W_1, \qquad t \in [0,1].
\]

(a) Show that Bt is a Gaussian process with

\[
\mathbb{E}B_t = 0, \qquad \mathbb{E}(B_t B_s) = \min(t,s) - ts.
\]

(b) Show that, for $t \in [0,1)$, an equivalent definition of $B_t$ is through the formula

\[
B_t = (1-t) W\Big( \frac{t}{1-t} \Big).
\]

(c) Calculate the distribution function of $B_t$.

7. Let $X_t$ be a mean-zero second order stationary process with autocorrelation function

\[
R(t) = \sum_{j=1}^{N} \frac{\lambda_j^2}{\alpha_j} e^{-\alpha_j |t|},
\]

where $\{\alpha_j, \lambda_j\}_{j=1}^{N}$ are positive real numbers.

(a) Calculate the spectral density and the correlation time of this process.

(b) Show that the assumptions of Theorem 3.21 are satisfied and use the argument presented in Section 3.2.3 (i.e. the Green-Kubo formula) to calculate the diffusion coefficient of the process $Z_t = \int_0^t X_s \, ds$.

(c) Under what assumptions on the coefficients $\{\alpha_j, \lambda_j\}_{j=1}^{N}$ can you study the above questions in the limit $N \to +\infty$?

8. Prove Lemma 3.20.

9. Let $a_1, \dots, a_n$ and $s_1, \dots, s_n$ be positive real numbers. Calculate the mean and variance of the random variable

\[
X = \sum_{i=1}^{n} a_i W(s_i).
\]

10. Let $W(t)$ be the standard one-dimensional Brownian motion and let $\sigma, s_1, s_2 > 0$. Calculate

(a) $\mathbb{E}e^{\sigma W(t)}$.

(b) $\mathbb{E}\big( \sin(\sigma W(s_1)) \sin(\sigma W(s_2)) \big)$.

11. Let $W_t$ be a one-dimensional Brownian motion, let $\mu, \sigma > 0$, and define

\[
S_t = e^{t\mu + \sigma W_t}.
\]

(a) Calculate the mean and the variance of St.

(b) Calculate the probability density function of St.

12. Use Theorem 3.27 to prove Lemma 3.26.

13. Prove Theorem 3.30.

14. Use Lemma 3.31 to calculate the distribution function of the stationary Ornstein-Uhlenbeck process.

15. Calculate the mean and the correlation function of the integral of a standard Brownian motion

\[
Y_t = \int_0^t W_s \, ds.
\]

16. Show that the process

\[
Y_t = \int_t^{t+1} (W_s - W_t) \, ds, \qquad t \in \mathbb{R},
\]

is second order stationary.

17. Let $V_t = e^{-t} W(e^{2t})$ be the stationary Ornstein-Uhlenbeck process. Give the definition and study the main properties of the Ornstein-Uhlenbeck bridge.

18. The autocorrelation function of the velocity $Y(t)$ of a Brownian particle moving in a harmonic potential $V(x) = \frac{1}{2} \omega_0^2 x^2$ is

\[
R(t) = e^{-\gamma |t|} \Big( \cos(\delta |t|) - \frac{1}{\delta} \sin(\delta |t|) \Big),
\]

where $\gamma$ is the friction coefficient and $\delta = \sqrt{\omega_0^2 - \gamma^2}$.

(a) Calculate the spectral density of $Y(t)$.

(b) Calculate the mean square displacement $\mathbb{E}(X(t))^2$ of the position of the Brownian particle $X(t) = \int_0^t Y(s) \, ds$. Study the limit $t \to +\infty$.

19. Show the scaling property (3.26) of the fractional Brownian motion.

20. Use Theorem 3.27 to show that there does not exist a continuous modification of the Poisson process.

21. Show that the correlation function of a process Xt satisfying (3.27) is continu- ous in both t and s.

22. Let $X_t$ be a stochastic process satisfying (3.27) and $R(t,s)$ its correlation function. Show that the integral operator $\mathcal{R} : L^2[0,1] \to L^2[0,1]$,

\[
\mathcal{R}f := \int_0^1 R(t,s) f(s) \, ds, \tag{3.36}
\]

is self-adjoint and nonnegative. Show that all of its eigenvalues are real and nonnegative. Show that eigenfunctions corresponding to different eigenvalues are orthogonal.

23. Let $H$ be a Hilbert space. An operator $\mathcal{R} : H \to H$ is said to be Hilbert-Schmidt if there exists a complete orthonormal sequence $\{\phi_n\}_{n=1}^{\infty}$ in $H$ such that

\[
\sum_{n=1}^{\infty} \| \mathcal{R} \phi_n \|^2 < \infty.
\]

Let $\mathcal{R} : L^2[0,1] \to L^2[0,1]$ be the operator defined in (3.36) with $R(t,s)$ being continuous both in $t$ and $s$. Show that it is a Hilbert-Schmidt operator.

24. Let $X_t$ be a mean zero second order stationary process defined in the interval $[0,T]$ with continuous covariance $R(t)$ and let $\{\lambda_n\}_{n=1}^{+\infty}$ be the eigenvalues of the corresponding integral operator. Show that

\[
\sum_{n=1}^{\infty} \lambda_n = T \, R(0).
\]

25. Calculate the Karhunen-Loève expansion for a second order stochastic process with correlation function $R(t,s) = ts$.

26. Calculate the Karhunen-Loève expansion of the Brownian bridge on $[0,1]$.

27. Let $X_t$, $t \in [0,T]$ be a second order process with continuous covariance and Karhunen-Loève expansion

\[
X_t = \sum_{k=1}^{\infty} \xi_k e_k(t).
\]

Define the process

\[
Y(t) = f(t) X_{\tau(t)}, \qquad t \in [0,S],
\]

where $f(t)$ is a continuous function and $\tau(t)$ a continuous, nondecreasing function with $\tau(0) = 0$, $\tau(S) = T$. Find the Karhunen-Loève expansion of $Y(t)$, in an appropriate weighted $L^2$ space, in terms of the KL expansion of $X_t$. Use this in order to calculate the KL expansion of the Ornstein-Uhlenbeck process.

28. Calculate the Karhunen-Loève expansion of a centered Gaussian stochastic process with covariance function $R(s,t) = \cos(2\pi(t-s))$.

29. Use the Karhunen-Loève expansion to generate paths of the

(a) Brownian motion on $[0,1]$.

(b) Brownian bridge on $[0,1]$.

(c) Ornstein-Uhlenbeck process on $[0,1]$.

Study computationally the convergence of the KL expansion for these processes. How many terms do you need to keep in the KL expansion in order to calculate accurate statistics of these processes?


Bibliography

[1] R. Balescu. Statistical dynamics. Matter out of equilibrium. Imperial College Press, London, 1997.

[2] L. Breiman. Probability, volume 7 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992. Corrected reprint of the 1968 original.

[3] S. Chandrasekhar. Stochastic problems in physics and astronomy. Rev. Mod. Phys., 15(1):1–89, Jan 1943.

[4] A.J. Chorin and O.H. Hald. Stochastic tools in mathematics and science, volume 1 of Surveys and Tutorials in the Applied Mathematical Sciences. Springer, New York, 2006.

[5] J. L. Doob. The Brownian movement and stochastic equations. Ann. of Math. (2), 43:351–369, 1942.

[6] N. Wax (editor). Selected Papers on and Stochastic Processes. Dover, New York, 1954.

[7] A. Einstein. Investigations on the theory of the Brownian movement. Dover Publications Inc., New York, 1956. Edited with notes by R. Fürth, translated by A. D. Cowper.

[8] W. Feller. An introduction to probability theory and its applications. Vol. I. Third edition. John Wiley & Sons Inc., New York, 1968.

[9] W. Feller. An introduction to probability theory and its applications. Vol. II. Second edition. John Wiley & Sons Inc., New York, 1971.

59 60 BIBLIOGRAPHY

[10] J. Jacod and A.N. Shiryaev. Limit theorems for stochastic processes, vol- ume 288 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2003.

[11] S. Karlin and H.M. Taylor. A first course in stochastic processes. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York- London, 1975.

[12] L. B. Koralov and Y. G. Sinai. Theory of probability and random processes. Universitext. Springer, Berlin, second edition, 2007.

[13] H. A. Kramers. Brownian motion in a field of force and the diffusion model of chemical reactions. Physica, 7:284–304, 1940.

[14] N. V. Krylov. Introduction to the theory of diffusion processes, volume 142 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1995.

[15] P. D. Lax. Linear algebra and its applications. Pure and Applied Mathematics (Hoboken). Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second edition, 2007.

[16] M. Loève. Probability theory. I. Springer-Verlag, New York, fourth edition, 1977. Graduate Texts in Mathematics, Vol. 45.

[17] M. Loève. Probability theory. II. Springer-Verlag, New York, fourth edition, 1978. Graduate Texts in Mathematics, Vol. 46.

[18] R.M. Mazo. Brownian motion, volume 112 of International Series of Mono- graphs on Physics. Oxford University Press, New York, 2002.

[19] E. Nelson. Dynamical of Brownian motion. Princeton University Press, Princeton, N.J., 1967.

[20] G. Da Prato and J. Zabczyk. Stochastic Equations in Infinite Dimensions, volume 44 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1992.

[21] Frigyes Riesz and Béla Sz.-Nagy. Functional analysis. Dover Publications Inc., New York, 1990. Translated from the second French edition by Leo F. Boron, reprint of the 1955 original.

[22] C. Schwab and R.A. Todor. Karhunen-Loève approximation of random fields by generalized fast multipole methods. J. Comput. Phys., 217(1):100–122, 2006.

[23] D.W. Stroock. Probability theory, an analytic view. Cambridge University Press, Cambridge, 1993.

[24] G. I. Taylor. Diffusion by continuous movements. Proc. London Math. Soc., 20:196, 1921.

[25] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the Brownian motion. Phys. Rev., 36(5):823–841, Sep 1930.