
Advanced Probability, Part III of the Mathematical Tripos, Michaelmas Term 2006

Grégory Miermont¹

¹CNRS & Laboratoire de Mathématique, Équipe Probabilités, Statistique et Modélisation, Bât. 425, Université Paris-Sud, 91405 Orsay, France

Contents

1 Conditional expectation
  1.1 The discrete case
  1.2 Conditioning with respect to a σ-algebra
    1.2.1 The L2 case
    1.2.2 General case
    1.2.3 Non-negative case
  1.3 Specific properties of conditional expectation
  1.4 Computing a conditional expectation
    1.4.1 Conditional density functions
    1.4.2 The Gaussian case

2 Discrete-time martingales
  2.1 Basic notions
    2.1.1 Stochastic processes, filtrations
    2.1.2 Martingales
    2.1.3 Doob's stopping times
  2.2 Optional stopping
  2.3 The convergence theorem
  2.4 Lp convergence, p > 1
    2.4.1 A maximal inequality
  2.5 L1 convergence
  2.6 Optional stopping in the UI case
  2.7 Backwards martingales

3 Examples of applications of discrete-time martingales
  3.1 Kolmogorov's 0 − 1 law, law of large numbers
  3.2 Branching processes
  3.3 The Radon-Nikodym theorem
  3.4 Product martingales
    3.4.1 Example: consistency of the likelihood ratio

4 Continuous-parameter processes
  4.1 Theoretical problems
  4.2 Finite marginal distributions, versions
  4.3 The martingale regularization theorem
  4.4 Convergence theorems for continuous-time martingales
  4.5 Kolmogorov's continuity criterion

5 Weak convergence
  5.1 Definition and characterizations
  5.2 Convergence in distribution
  5.3 Tightness
  5.4 Lévy's convergence theorem

6 Brownian motion
  6.1 Wiener's theorem
  6.2 First properties
  6.3 The strong Markov property
  6.4 Martingales and Brownian motion
  6.5 Recurrence and transience properties
  6.6 The Dirichlet problem
  6.7 Donsker's invariance principle

7 Poisson random measures and processes
  7.1 Poisson random measures
  7.2 Integrals with respect to a Poisson measure
  7.3 Poisson point processes
    7.3.1 Example: the Poisson process
    7.3.2 Example: compound Poisson processes

8 ID laws and Lévy processes
  8.1 The Lévy-Khintchine formula
  8.2 Lévy processes

9 Exercises
  9.1 Conditional expectation
  9.2 Discrete-time martingales
  9.3 Continuous-time processes
  9.4 Weak convergence
  9.5 Brownian motion
  9.6 Poisson measures, ID laws and Lévy processes

Chapter 1

Conditional expectation

1.1 The discrete case

Let (Ω, F, P) be a probability space. If A, B ∈ F are two events such that P(B) > 0, we define the conditional probability of A given B by the formula

P(A|B) = P(A ∩ B)/P(B).

We interpret this quantity as the probability of the event A given the fact that B is realized. The fact that

P(A|B) = P(B|A)P(A)/P(B)

is called Bayes' rule. More generally, if X ∈ L1(Ω, F, P) is an integrable random variable, we define

E[X|B] = E[X 1_B]/P(B),

the conditional expectation of X given B.

Example. Toss a fair die (probability space Ω = {1, 2, 3, 4, 5, 6} with P({ω}) = 1/6 for ω ∈ Ω) and let A = {the result is even}, B = {the result is less than or equal to 2}. Then P(A|B) = 1/2, P(B|A) = 1/3. If X = ω is the result, then E[X|A] = 4, E[X|B] = 3/2.

Let (B_i, i ∈ I) be a countable collection of disjoint events such that Ω = ⋃_{i∈I} B_i, and let G = σ{B_i, i ∈ I}. If X ∈ L1(Ω, F, P), we define a random variable

X′ = Σ_{i∈I} E[X|B_i] 1_{B_i},

with the convention that E[X|B_i] = 0 if P(B_i) = 0. The random variable X′ is integrable, since

E[|X′|] = Σ_{i∈I} P(B_i) |E[X|B_i]| = Σ_{i∈I} P(B_i) |E[X 1_{B_i}]| / P(B_i) ≤ E[|X|].

Moreover, it is straightforward to check:


1. X′ is G-measurable, and

2. for every B ∈ G, E[1_B X′] = E[1_B X].

Example. If X ∈ L1(Ω, F, P) and Y is a random variable with values in a countable set E, the above construction, applied to the events B_y = {Y = y}, y ∈ E, which form a countable partition of Ω generating σ(Y), gives the random variable

E[X|Y] = Σ_{y∈E} E[X|Y = y] 1_{Y=y}.

Notice that the value taken by E[X|Y = y] when P (Y = y) = 0, which we have fixed to 0, is actually irrelevant to define E[X|Y ], since a random variable is always defined up to a set of zero measure. It is important to keep in mind that conditional expectations are always a priori only defined up to a zero-measure set.
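The discrete construction is easy to check by direct computation. The following sketch (an illustration, not part of the notes) redoes the die example with exact rational arithmetic, and builds E[X|Y] for the indicator Y = 1_B:

```python
from fractions import Fraction

# Fair die: Ω = {1,...,6}, P({ω}) = 1/6.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}

def cond_exp(X, B):
    """E[X | B] = E[X 1_B] / P(B), for an event B given as a set of outcomes."""
    pB = sum(p[w] for w in B)
    return sum(X(w) * p[w] for w in B) / pB

A = {w for w in omega if w % 2 == 0}   # the result is even
B = {w for w in omega if w <= 2}       # the result is at most 2
X = lambda w: w                        # the result itself

# E[X|Y] for Y = 1_B: a random variable constant on each cell of the partition {B, B^c}.
Bc = set(omega) - B
E_X_given_Y = {w: cond_exp(X, B if w in B else Bc) for w in omega}
```

One recovers E[X|A] = 4, E[X|B] = 3/2, and E[E[X|Y]] = E[X] = 7/2, matching property 2. with B = Ω.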

1.2 Conditioning with respect to a σ-algebra

We are now going to define the conditional expectation given a sub-σ-algebra of our probability space, by using the properties 1. and 2. of the previous paragraph. The definition is due to Kolmogorov.

Theorem 1.2.1 Let G ⊂ F be a sub-σ-algebra, and X ∈ L1(Ω, F, P). Then there exists a random variable X′ with E[|X′|] < ∞ such that the following two characteristic properties are satisfied:

1. X′ is G-measurable

2. for every B ∈ G, E[1_B X′] = E[1_B X].

Moreover, if X″ is another such random variable, then X′ = X″ a.s. We denote by E[X|G] ∈ L1(Ω, G, P) the class of the random variable X′. It is called the conditional expectation of X given G.

Otherwise said, E[X|G] is the unique element of L1(Ω, G, P) such that E[1_B X] = E[1_B E[X|G]] for every B ∈ G. Equivalently, an approximation argument allows one to replace 2. in the statement by:

2'. For every bounded G-measurable random variable Z, E[ZX′] = E[ZX].

Proof of the uniqueness. Suppose X′ and X″ satisfy the two conditions of the statement. Then B = {X′ > X″} ∈ G, and therefore

0 = E[1_B(X′ − X)] = E[1_B(X′ − X″)],

which shows X′ ≤ X″ a.s.; the reverse inequality is obtained by symmetry. □

The existence will need two intermediate steps.

1.2.1 The L2 case

We first consider L2 variables. Suppose that X ∈ L2(Ω, F, P) and let G ⊂ F be a sub-σ-algebra. Notice that L2(Ω, G, P) is a closed vector subspace of the Hilbert space L2(Ω, F, P). Therefore, there exists a unique random variable X′ ∈ L2(Ω, G, P) such that E[Z(X − X′)] = 0 for every Z ∈ L2(Ω, G, P); namely, X′ is the orthogonal projection of X onto L2(Ω, G, P). This shows the previous theorem in the case X ∈ L2, and in fact E[·|G] : L2 → L2 is the orthogonal projector onto L2(Ω, G, P), and hence is linear.

It follows from the uniqueness statement that the conditional expectation has the following nice interpretation in the L2 case: E[X|G] is the G-measurable random variable that best approximates X. It is useful to keep this intuitive idea even in the general L1 case, although the word "approximates" becomes more fuzzy.

Notice that X′ := E[X|G] ≥ 0 a.s. whenever X ≥ 0, since (notice {X′ < 0} ∈ G)

E[X 1_{X′<0}] = E[X′ 1_{X′<0}],

and the left-hand side is non-negative while the right-hand side is non-positive, entailing P(X′ < 0) = 0. Moreover, it holds that E[E[X|G]] = E[X], because it is the scalar product of X against the constant function 1 ∈ L2(Ω, G, P).
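On a finite probability space the projection can be computed by hand: conditioning on a σ-algebra generated by a partition amounts to averaging over each cell, and the defining orthogonality E[Z(X − X′)] = 0 can be verified exactly. A small sketch with a made-up partition and weights:

```python
from fractions import Fraction

# Ω = {0,...,5} with the uniform measure; G generated by the partition {0,1,2}, {3,4,5}.
p = Fraction(1, 6)
cells = [{0, 1, 2}, {3, 4, 5}]

def project(X):
    """Orthogonal projection of X onto L2(Ω, G, P): average X over each cell."""
    out = {}
    for cell in cells:
        avg = sum(X[w] for w in cell) / Fraction(len(cell))
        for w in cell:
            out[w] = avg
    return out

X = {w: Fraction(w * w) for w in range(6)}
Xp = project(X)        # the candidate E[X|G]

# Any G-measurable Z is constant on cells; check E[Z(X - X')] = 0 exactly.
Z = {w: Fraction(7) if w < 3 else Fraction(-3) for w in range(6)}
inner = sum(p * Z[w] * (X[w] - Xp[w]) for w in range(6))
```

The exact arithmetic confirms both the orthogonality and E[E[X|G]] = E[X].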

1.2.2 General case

Now let X ≥ 0 be any non-negative random variable (not necessarily integrable). Then X ∧ n is in L2 for every n ∈ N, and X ∧ n increases to X pointwise. Therefore, (E[X ∧ n|G], n ∈ N) is an (a.s.) increasing sequence, because X ∧ n − X ∧ (n − 1) ≥ 0 and E[·|G] is linear and positive on L2. It therefore increases a.s. to a limit, which we denote by E[X|G]. Notice that E[E[X ∧ n|G]] = E[X ∧ n], so that by the monotone convergence theorem, E[E[X|G]] = E[X]. In particular, if X is integrable, then so is E[X|G].

Proof of existence in Theorem 1.2.1. Let X ∈ L1, and write X = X⁺ − X⁻ (where X⁺ = X ∨ 0 and X⁻ = (−X) ∨ 0). Then X⁺, X⁻ are non-negative integrable random variables, so E[X⁺|G] and E[X⁻|G] are finite a.s. and we may define

E[X|G] = E[X+|G] − E[X−|G].

Now, let B ∈ G. Then E[(X⁺ ∧ n)1_B] = E[E[X⁺ ∧ n|G]1_B] by definition. The monotone convergence theorem allows us to pass to the limit (all the integrated random variables are non-negative), and we obtain E[X⁺ 1_B] = E[E[X⁺|G]1_B]. The same holds for X⁻, and by subtracting we see that E[X|G] indeed satisfies the characteristic properties 1., 2. □

The following properties are immediate consequences of the previous theorem and its proof.

Proposition 1.2.1 Let G ⊂ F be a σ-algebra and X,Y ∈ L1(Ω, F,P ). Then

1. E[E[X|G]] = E[X]

2. If X is G-measurable, then E[X|G] = X.

3. If X is independent of G, then E[X|G] = E[X].

4. If a, b ∈ R, then E[aX + bY |G] = aE[X|G] + bE[Y |G] (linearity).

5. If X ≥ 0, then E[X|G] ≥ 0 (positivity).

6. |E[X|G]| ≤ E[|X| |G], so that E[|E[X|G]|] ≤ E[|X|].

Important remark. Notice that all statements concerning conditional expectation are about L1 variables, which are only defined up to a set of zero probability, and hence are a.s. statements. This is of crucial importance, and recalls a fact we encountered before: E[X|Y = y] can be assigned an arbitrary value whenever P(Y = y) = 0.

1.2.3 Non-negative case

In the course of proving the last theorem, we actually built an object E[X|G], as the a.s. increasing limit of E[X ∧ n|G], for any non-negative random variable X, not necessarily integrable. This random variable enjoys properties similar to those of the L1 case, and we state them in a form parallel to Theorem 1.2.1.

Theorem 1.2.2 Let G ⊂ F be a sub-σ-algebra, and X ≥ 0 a non-negative random variable. Then there exists a random variable X0 ≥ 0 such that

1. X′ is G-measurable, and

2. for every non-negative G-measurable random variable Z, E[ZX′] = E[ZX].

Moreover, if X″ is another such r.v., X′ = X″ a.s. We denote by E[X|G] the class of X′ up to a.s. equality.

Proof. Any r.v. in the class of E[X|G] = lim sup_n E[X ∧ n|G] trivially satisfies 1. It also satisfies 2., since if Z is a non-negative G-measurable random variable, we have, by passing to the (increasing) limit in

E[(X ∧ n)(Z ∧ n)] = E[E[X ∧ n|G](Z ∧ n)],

that E[XZ] = E[E[X|G]Z].

Uniqueness. If X′, X″ are non-negative and satisfy the properties 1. and 2., then for any a < b ∈ Q+, letting B = {X′ ≤ a < b ≤ X″} ∈ G, we obtain

bP(B) ≤ E[X″ 1_B] = E[X 1_B] = E[X′ 1_B] ≤ aP(B),

which entails P(B) = 0, so that P(X′ < X″) = 0 by taking the countable union over a < b ∈ Q+. Similarly, P(X′ > X″) = 0. □

The reader is invited to formulate and prove analogs of the properties of Proposition 1.2.1 for positive variables, and in particular, that if 0 ≤ X ≤ Y then 0 ≤ E[X|G] ≤ E[Y |G] a.s. The conditional expectation enjoys the following properties, which match those of the classical expectation.

Proposition 1.2.2 Let G ⊂ F be a σ-algebra.

1. If (Xn, n ≥ 0) is an increasing sequence of non-negative random variables with limit X, then (conditional monotone convergence theorem)

E[Xn|G] ↑ E[X|G] a.s. as n → ∞.

2. If (Xn, n ≥ 0) is a sequence of non-negative random variables, then (conditional Fatou theorem)

E[lim inf_{n→∞} Xn |G] ≤ lim inf_{n→∞} E[Xn|G] a.s.

3. If (Xn, n ≥ 0) is a sequence of random variables a.s. converging to X, and if there exists Y ∈ L1(Ω, F, P) such that sup_n |Xn| ≤ Y a.s., then (conditional dominated convergence theorem)

lim_{n→∞} E[Xn|G] = E[X|G], a.s. and in L1.

4. If ϕ : R → (−∞, ∞] is a convex function and X ∈ L1(Ω, F,P ), and either ϕ is non-negative or ϕ(X) ∈ L1(Ω, F,P ), then (conditional Jensen inequality) E[ϕ(X)|G] ≥ ϕ(E[X|G]) a.s.

5. If 1 ≤ p < ∞ and X ∈ Lp(Ω, F,P ),

‖E[X|G]‖_p ≤ ‖X‖_p. In particular, the linear operator X ↦ E[X|G] from Lp(Ω, F, P) to Lp(Ω, G, P) is continuous.

Proof. 1. Let X′ be the increasing limit of E[Xn|G]. Let Z be a non-negative G-measurable random variable; then E[Z E[Xn|G]] = E[ZXn], which by taking an increasing limit gives E[ZX′] = E[ZX], so X′ = E[X|G].

2. We have E[infk≥n Xk|G] ≤ infk≥n E[Xk|G] for every n by monotonicity of the conditional expectation, and the result is obtained by passing to the limit and using 1.

3. Applying 2. to the non-negative random variables Y − Xn and Y + Xn, we get that E[Y − X|G] ≤ E[Y |G] − lim sup E[Xn|G] and that E[Y + X|G] ≤ E[Y |G] + lim inf E[Xn|G], giving the a.s. result. The L1 result is a consequence of the dominated convergence theorem, since |E[Xn|G]| ≤ E[|Xn| |G] ≤ E[Y |G] a.s., and E[Y |G] ∈ L1.

4. A convex function ϕ is the superior envelope of its affine minorants, i.e.

ϕ(x) = sup_{a,b∈R : ∀y, ay+b≤ϕ(y)} (ax + b) = sup_{a,b∈Q : ∀y, ay+b≤ϕ(y)} (ax + b).

The result is then a consequence of the linearity of the conditional expectation and of the fact that Q is countable (this last fact is needed because conditional expectation is defined only a.s.).

5. One deduces from 4. and the previous proposition that

‖E[X|G]‖_p^p = E[|E[X|G]|^p] ≤ E[E[|X|^p |G]] = E[|X|^p] = ‖X‖_p^p,

if 1 ≤ p < ∞ and X ∈ Lp(Ω, F, P). Thus

‖E[X|G]‖_p ≤ ‖X‖_p. □
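The Lp contraction in item 5. is easy to see numerically on a finite space, where conditioning on a partition-generated σ-algebra replaces X by its cell averages. A sketch with illustrative (made-up) weights and values:

```python
# L^p contraction of conditional expectation on a 5-point space.
# G is generated by the partition {0,1} / {2,3,4}; conditioning = cell averaging.
weights = [0.1, 0.2, 0.3, 0.25, 0.15]   # probabilities of the five atoms
cells = [[0, 1], [2, 3, 4]]             # partition generating G
X = [4.0, -2.0, 1.0, 3.0, -5.0]
p = 3                                   # any exponent 1 <= p < infinity works

cond = [0.0] * len(X)                   # will hold E[X|G]
for cell in cells:
    mass = sum(weights[w] for w in cell)
    avg = sum(weights[w] * X[w] for w in cell) / mass
    for w in cell:
        cond[w] = avg

def norm_p(v):
    """The L^p norm with respect to the given weights."""
    return sum(weights[w] * abs(v[w]) ** p for w in range(len(v))) ** (1 / p)
```

By conditional Jensen, norm_p(cond) ≤ norm_p(X), while E[E[X|G]] = E[X] holds exactly.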

1.3 Specific properties of conditional expectation

The "information contained in G" can be factored out of the conditional expectation:

Proposition 1.3.1 Let G ⊂ F be a σ-algebra, and let X, Y be real random variables such that either X, Y are non-negative, or X, XY ∈ L1(Ω, F, P). Then, if Y is G-measurable, we have

E[Y X|G] = Y E[X|G].

Proof. Let Z be a non-negative G-measurable random variable. If X, Y are non-negative, then E[ZY X] = E[ZY E[X|G]] since ZY is non-negative, and the result follows by uniqueness. If X, XY are integrable, the same result follows by writing X = X⁺ − X⁻, Y = Y⁺ − Y⁻. □

One has the tower property (restricting the information):

Proposition 1.3.2 Let G1 ⊂ G2 ⊂ F be σ-algebras. Then for every random variable X which is positive or integrable,

E[E[X|G2]|G1] = E[X|G1].

Proof. For every non-negative bounded G1-measurable Z, Z is also G2-measurable, so that

E[Z E[E[X|G2]|G1]] = E[Z E[X|G2]] = E[ZX] = E[Z E[X|G1]],

hence the result. □

Proposition 1.3.3 Let G1, G2 be two sub-σ-algebras of F, and let X be a positive or integrable random variable. Then, if G2 is independent of σ(X, G1), E[X|G1 ∨ G2] = E[X|G1].

Proof. Let A ∈ G1,B ∈ G2, then

E[1_{A∩B} E[X|G1 ∨ G2]] = E[1_A 1_B X] = E[1_B E[X 1_A|G2]] = P(B) E[X 1_A]
= P(B) E[1_A E[X|G1]] = E[1_{A∩B} E[X|G1]],

where we have used the independence property at the third and last steps. The proof is then completed by the monotone class theorem. □

Proposition 1.3.4 Let X, Y be random variables and G be a sub-σ-algebra of F, such that Y is G-measurable and X is independent of G. Then for any non-negative measurable function f,

E[f(X, Y)|G] = ∫ P(X ∈ dx) f(x, Y),

where P(X ∈ dx) is the law of X.

Proof. For any non-negative G-measurable random variable Z, we have that X is independent of (Y, Z), so that the law P((X, Y, Z) ∈ dx dy dz) is equal to the product P(X ∈ dx) P((Y, Z) ∈ dy dz) of the law of X by the law of (Y, Z). Hence,

E[Z f(X, Y)] = ∫ z f(x, y) P(X ∈ dx) P((Y, Z) ∈ dy dz)
= ∫ P(X ∈ dx) E[Z f(x, Y)] = E[ Z ∫ P(X ∈ dx) f(x, Y) ],

where we used Fubini's theorem in two places. This shows the result. □

1.4 Computing a conditional expectation

We give two concrete and important examples of computation of conditional expectations.

1.4.1 Conditional density functions

Suppose X, Y have values in Rm and Rn respectively, and that the law of (X, Y) has a density: P((X, Y) ∈ dx dy) = f_{X,Y}(x, y) dx dy. Let f_Y(y) = ∫_{Rm} f_{X,Y}(x, y) dx, y ∈ Rn, be the density of Y. Then for every non-negative measurable h : Rm → R and g : Rn → R, we have

E[h(X)g(Y)] = ∫_{Rm×Rn} h(x) g(y) f_{X,Y}(x, y) dx dy
= ∫_{Rn} g(y) f_Y(y) dy ∫_{Rm} h(x) (f_{X,Y}(x, y)/f_Y(y)) 1_{f_Y(y)>0} dx
= E[ϕ(Y) g(Y)],

so E[h(X)|Y] = ϕ(Y), where

ϕ(y) = (1/f_Y(y)) ∫_{Rm} h(x) f_{X,Y}(x, y) dx if f_Y(y) > 0,

and 0 otherwise. We interpret this result by saying that

E[h(X)|Y] = ∫_{Rm} h(x) ν(Y, dx),

where ν(y, dx) = f_Y(y)^{−1} f_{X,Y}(x, y) 1_{f_Y(y)>0} dx = f_{X|Y}(x|y) dx. The measure ν(y, dx) is called the conditional distribution given Y = y, and f_{X|Y}(x|y) is the conditional density function of X given Y = y. Notice that this function of x, y is defined only up to a zero-measure set.
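For a concrete check, take the joint density f_{X,Y}(x, y) = x + y on [0, 1]² (an illustrative choice, not an example from the notes); then f_Y(y) = y + 1/2 and, for h(x) = x, ϕ(y) = (1/3 + y/2)/(y + 1/2). A numerical sketch using the midpoint rule:

```python
# Conditional density computations for the assumed joint density f(x,y) = x + y
# on the unit square. The grid size N is an arbitrary accuracy parameter.
def f_xy(x, y):
    return x + y

N = 10_000
dx = 1.0 / N
xs = [(i + 0.5) * dx for i in range(N)]   # midpoint rule on [0,1]

def f_y(y):
    """Marginal density f_Y(y) = integral of f(x,y) dx over [0,1]."""
    return sum(f_xy(x, y) for x in xs) * dx

def phi(y):
    """phi(y) = E[h(X)|Y=y] with h(x) = x, i.e. (1/f_Y(y)) * integral of x f(x,y) dx."""
    return sum(x * f_xy(x, y) for x in xs) * dx / f_y(y)
```

The numerics agree with the closed forms f_Y(1/4) = 3/4 and ϕ(1/4) = (1/3 + 1/8)/(3/4) to quadrature accuracy.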

1.4.2 The Gaussian case

Let (X, Y) be a Gaussian vector in R2, with Var Y > 0. Take X′ = aY + b with a, b such that Cov(X, Y) = a Var Y and aE[Y] + b = E[X]. In this case, Cov(Y, X − X′) = 0, hence X − X′ is independent of σ(Y) by properties of Gaussian vectors. Moreover, X − X′ is centered, so for every B ∈ σ(Y), one has E[1_B X] = E[1_B X′], hence X′ = E[X|Y].
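The coefficients a and b can be computed directly from the moments. In this sketch the moment values are illustrative choices (any pair with Var Y > 0 and a positive-definite covariance matrix would do); the two defining identities, zero residual covariance and zero residual mean, then hold by construction:

```python
# Gaussian conditioning: with a = Cov(X,Y)/Var(Y) and b = E[X] - a E[Y],
# the residual X - (aY + b) is uncorrelated with Y, hence independent of Y
# in the Gaussian case. Illustrative moments (requires Var Y > 0):
mx, my = 1.0, 2.0            # E[X], E[Y]
vx, vy, cxy = 4.0, 3.0, 1.5  # Var X, Var Y, Cov(X, Y)

a = cxy / vy
b = mx - a * my

# Cov(Y, X - aY - b) = Cov(X, Y) - a Var(Y), and the residual is centered:
resid_cov = cxy - a * vy
resid_mean = mx - (a * my + b)
```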

Chapter 2

Discrete-time martingales

Before we entirely focus on discrete-time martingales, we start with a general discussion on stochastic processes, which includes both discrete and continuous-time processes.

2.1 Basic notions

2.1.1 Stochastic processes, filtrations

Let (Ω, F, P) be a probability space. For a measurable state space (E, E) and a subset I ⊂ R of "times", or "epochs", an E-valued stochastic process indexed by I is a collection (Xt, t ∈ I) of random variables. Most of the processes we will consider take values in R, Rd, or C, endowed with their Borel σ-algebras. A filtration is a collection (Ft, t ∈ I) of sub-σ-algebras of F which is increasing (s ≤ t =⇒ Fs ⊆ Ft). Once a filtration is given, we call (Ω, F, (Ft)t∈I, P) a filtered probability space. A process (Xt, t ∈ I) is adapted to the filtration (Ft, t ∈ I) if Xt is Ft-measurable for every t.

The intuitive idea is that Ft is the quantity of information available up to time t (the present). To give an informal example, if we are interested in the evolution of the stock market, we can take Ft to be the past history of the stock prices (or only some of them) up to time t. We will let F∞ = ⋁_{t∈I} Ft ⊆ F be the information at the end of times.

Example. For every process (Xt, t ∈ I), one associates the natural filtration

F_t^X = σ({Xs, s ≤ t}), t ∈ I.

Every process is adapted to its natural filtration, and F^X is the smallest filtration to which X is adapted: F_t^X contains all the measurable events depending on (Xs, s ≤ t).

Last, a real-valued process (Xt, t ∈ I) is said to be integrable if E[|Xt|] < ∞ for all t ∈ I.


2.1.2 Martingales

Definition 2.1.1 Let (Ω, F, (Ft)t∈I ,P ) be a filtered probability space. An R-valued adapted integrable process (Xt, t ∈ I) is:

• a martingale if for every s ≤ t, E[Xt|Fs] = Xs.

• a supermartingale if for every s ≤ t, E[Xt|Fs] ≤ Xs.

• a submartingale if for every s ≤ t, E[Xt|Fs] ≥ Xs.

Notice that a martingale (resp. super-, submartingale) remains a martingale (resp. super-, submartingale) with respect to its natural filtration, by the tower property of conditional expectation.

2.1.3 Doob’s stopping times

Definition 2.1.2 Let (Ω, F, (Ft)t∈I, P) be a filtered probability space. A stopping time (with respect to this space) is a random variable T : Ω → I ⊔ {∞} such that {T ≤ t} ∈ Ft for every t ∈ I.

For example, constant times are (trivial) stopping times. If I = Z+, the random variable n1_A + ∞1_{A^c} is a stopping time if A ∈ Fn (with the convention 0 · ∞ = 0). The intuitive idea behind this definition is that T is a time when a decision can be taken (given the information we have). For example, for a meteorologist having the weather information up to the present time, the "first day of 2006 when the temperature is above 23°C" is a stopping time, but not the "last day of 2006 when the temperature is above 23°C".

Example. If I ⊂ Z+, the definition can be replaced by {T = n} ∈ Fn for all n ∈ I. When I is a subset of the integers, we will denote the time by letters n, m, k rather than t, s, r (so n ≥ 0 means n ∈ Z+). Particularly important instances of stopping times in this case are the first entrance times. Let (Xn, n ≥ 0) be an adapted process and let A ∈ E. The first entrance time in A is

TA = inf{n ∈ Z+ : Xn ∈ A} ∈ Z+ ⊔ {∞}.

It is a stopping time, since

{TA ≤ n} = ⋃_{0≤m≤n} X_m^{−1}(A) ∈ Fn.

By contrast, the last exit time before some fixed N,

LA = sup{n ∈ {0, 1, ..., N} : Xn ∈ A} ∈ Z+ ⊔ {∞},

is in general not a stopping time. As an immediate consequence of the definition, one gets:

Proposition 2.1.1 Let S, T, (Tn, n ∈ N) be stopping times (with respect to some filtered probability space). Then S ∧ T, S ∨ T, inf_n Tn, sup_n Tn, lim inf_n Tn, lim sup_n Tn are stopping times.

Definition 2.1.3 Let T be a stopping time with respect to some filtered probability space (Ω, F, (Ft)t∈I ,P ). We define FT , the σ-algebra of events before time T

FT = {A ∈ F∞ : A ∩ {T ≤ t} ∈ Ft for every t ∈ I}.

The reader is invited to check that it defines indeed a σ-algebra, which is interpreted as the events that are measurable with respect to the information available at time T : “4 days before the first day (T ) in 2005 when the temperature is above 23◦C, the temperature ◦ was below 10 C” is in FT . If S,T are stopping times, one checks that

S ≤ T =⇒ FS ⊆ FT . (2.1)

Now suppose that I is countable. If (Xt, t ∈ I) is adapted and T is a stopping time, we let X_T 1_{T<∞} be the random variable equal to X_{T(ω)}(ω) if T(ω) < ∞, and to 0 otherwise. It is indeed a random variable, as the composition of (ω, t) ↦ Xt(ω) and ω ↦ (ω, T(ω)), which are measurable (why?). We also let X^T = (X_{T∧t}, t ∈ I), and we call it the process X stopped at T.

Proposition 2.1.2 Under these hypotheses,

1. XT 1{T <∞} is FT -measurable,

2. the process XT is adapted,

3. if moreover I = Z+ and X is integrable, then X^T is integrable.

Proof. 1. Let A ∈ E. Then {X_T ∈ A} ∩ {T ≤ t} = ⋃_{s∈I, s≤t} {Xs ∈ A} ∩ {T = s}. Then notice that {T = s} = {T ≤ s} \ ⋃_{u∈I, u<s} {T ≤ u} ∈ Fs, so that each term of the union is in Fs ⊆ Ft, and 1. follows.

2. For t ∈ I, T ∧ t is a stopping time, so by 1., X_{T∧t} is F_{T∧t}-measurable, and F_{T∧t} ⊆ Ft by (2.1).

3. If I = Z+, then |X_{T∧n}| ≤ Σ_{0≤k≤n} |Xk|, which is integrable. □
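For a discrete-time path, the first entrance time and the stopped path are straightforward to compute; the following sketch (illustrative, with a made-up path) mirrors the definitions above:

```python
# First entrance time T_A = inf{n : X_n in A}, and the stopped path X^T = (X_{T ∧ n})_n.
INF = float("inf")   # stands in for T_A = +infinity

def first_entrance(path, A):
    """T_A: the first index n with path[n] in A, or INF if there is none."""
    for n, x in enumerate(path):
        if x in A:
            return n
    return INF

def stopped(path, T):
    """The stopped path (X_{T ∧ n})_n: frozen at its value at time T."""
    return [path[int(min(n, T))] for n in range(len(path))]

path = [0, 1, 0, -1, -2, -1, 0, 1, 2]
T = first_entrance(path, {-2})       # first visit to -2
```

Note that first_entrance only inspects the path up to the current index; this is exactly why T_A is a stopping time, while a last exit time would need to look into the future.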

From now on until the end of the section (except in the paragraph on backwards martingales), we will suppose that E = R and I = Z+ (discrete-time processes).

2.2 Discrete-time martingales: optional stopping

We consider a filtered probability space (Ω, F, (Fn), P). All the above terminology (stopping times, adapted processes and so on) will be with respect to this space. We first introduce the so-called 'martingale transform', which is sometimes called the 'discrete stochastic integral' with respect to a (super-, sub-)martingale X. We say that a process (Cn, n ≥ 1) is previsible if Cn is Fn−1-measurable for every n ≥ 1. A previsible process can be interpreted as a strategy: one bets at time n only with all the accumulated knowledge up to time n − 1.

If (Xn, n ≥ 0) is adapted and (Cn, n ≥ 1) is previsible, we define an adapted process C · X by

(C · X)_n = Σ_{k=1}^n C_k (X_k − X_{k−1}).

We can interpret this new process as follows: if Xn is a certain amount of money at time n and if Cn is the bet of a player at time n, then (C · X)n is the total winnings of the player at time n.
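The transform is a one-line recursion; the following sketch (with a made-up path and bets) computes it and checks the sanity property that a constant bet of 1 just reproduces the increments of X:

```python
# Martingale transform (C·X)_n = sum_{k=1}^n C_k (X_k - X_{k-1}).
# C[k-1] plays the role of the previsible bet C_k.
def transform(C, X):
    total, out = 0, [0]                     # (C·X)_0 = 0
    for k in range(1, len(X)):
        total += C[k - 1] * (X[k] - X[k - 1])
        out.append(total)
    return out

X = [0, 1, 2, 1, 2, 3]    # a sample path of the "price" process
C = [1, 0, 2, 1, 1]       # bets C_1, ..., C_5
winnings = transform(C, X)
```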

Proposition 2.2.1 In this setting, if X is a martingale, and C is bounded, then C · X is a martingale. If X is a supermartingale (resp. submartingale) and Cn ≥ 0 for every n ≥ 1, then C · X is a supermartingale (resp. submartingale).

Proof. Suppose X is a martingale. Since C is bounded, the process C · X is trivially integrable. Since Cn+1 is Fn-measurable,

E[(C · X)n+1 − (C · X)n|Fn] = E[Cn+1(Xn+1 − Xn)|Fn] = Cn+1E[Xn+1 − Xn|Fn] = 0.

The (super-, sub-)martingale cases are similar. 

Theorem 2.2.1 (Optional stopping) Let (Xn, n ≥ 0) be a martingale (resp. super-, sub-martingale).

(i) If T is a stopping time, then X^T is also a martingale (resp. super-, sub-martingale).

(ii) If S ≤ T are bounded stopping times, then E[XT |FS] = XS (resp. E[XT |FS] ≤ XS, E[XT |FS] ≥ XS).

(iii) If S ≤ T are bounded stopping times, then E[XT ] = E[XS] (resp. E[XT ] ≤ E[XS], E[XT ] ≥ E[XS]).

Proof. (i) Let Cn = 1{n≤T }, then C is a previsible non-negative bounded process, and it is immediate that C · X = XT . The first result follows from Proposition 2.2.1.

(ii) If now S, T are bounded stopping times with S ≤ T, and A ∈ FS, we define Cn = 1_A 1_{S<n≤T}, which is previsible since A ∩ {S < n ≤ T} = (A ∩ {S ≤ n − 1}) ∩ {T ≤ n − 1}^c ∈ F_{n−1}. Then (C · X)_n = 1_A (X_{T∧n} − X_{S∧n}), and taking n larger than a bound for T, Proposition 2.2.1 gives E[1_A(X_T − X_S)] = 0 (resp. ≤ 0, ≥ 0). Since this holds for every A ∈ FS and X_S is FS-measurable, we obtain E[X_T|FS] = X_S (resp. ≤, ≥).

(iii) follows from (ii) by taking expectations.

Notice that the last two statements are not true in general for unbounded stopping times. For example, if (Yn, n ≥ 1) are independent random variables which take values ±1 with probability 1/2, then Xn = Σ_{1≤i≤n} Yi is a martingale. If T = inf{n ≥ 0 : Xn = 1}, then it is classical that T < ∞ a.s., but of course E[X_T] = 1 > 0 = E[X_0]. However, for non-negative supermartingales, Fatou's lemma entails:

Proposition 2.2.2 Suppose X is a non-negative supermartingale. Then for any a.s. finite stopping time T, we have E[X_T] ≤ E[X_0].

Beware that this ≤ sign should not in general be turned into an = sign, even if X is a martingale! The very same proposition is actually true without the assumption that P(T < ∞) = 1, by the martingale convergence theorem (Theorem 2.3.1) below.
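The ±1 walk example can be checked exhaustively for a fixed horizon: for the bounded stopping time T ∧ N, optional stopping forces E[X_{T∧N}] = 0 exactly, even though X_T = 1 on {T < ∞}. This sketch enumerates all 2^N equally likely sign sequences (the horizon N = 12 is an arbitrary choice):

```python
from itertools import product

# T = inf{n : X_n = 1} for the simple ±1 walk. For the bounded stopping time
# T ∧ N, E[X_{T∧N}] = 0 exactly; we verify this by summing over all 2^N paths.
N = 12
total = 0    # 2^N times E[X_{T∧N}]
hits = 0     # number of paths with T <= N
for signs in product((-1, 1), repeat=N):
    x = 0
    for s in signs:
        x += s
        if x == 1:           # the walk is stopped at level 1
            hits += 1
            break
    total += x               # x == 1 if stopped, else the value at time N

p_hit = hits / 2 ** N        # P(T <= N) < 1, but it tends to 1 as N grows
```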

2.3 Discrete-time martingales: the convergence theorem

The martingale convergence theorem is the most important result in this chapter.

Theorem 2.3.1 (Martingale convergence theorem) If X is a supermartingale which is bounded in L1(Ω, F, P), i.e. such that sup_n E[|Xn|] < ∞, then Xn converges a.s. towards an a.s. finite limit X∞.

An easy and important corollary is:

Corollary 2.3.1 A non-negative supermartingale converges a.s. towards an a.s. finite limit.

Indeed, for a non-negative supermartingale, E[|Xn|] = E[Xn] ≤ E[X0] < ∞. The proof of Theorem 2.3.1 relies on an estimate of the number of upcrossings of a supermartingale between two levels a < b. If (xn, n ≥ 0) is a real sequence and a < b are two real numbers, we define two integer-valued sequences Sk(x), Tk(x), k ≥ 1, recursively as follows. Let T0(x) = 0 and, for k ≥ 0, let

S_{k+1}(x) = inf{n ≥ T_k(x) : x_n < a},  T_{k+1}(x) = inf{n ≥ S_{k+1}(x) : x_n > b},

with the usual convention inf ∅ = ∞. The number Nn([a, b], x) = sup{k > 0 : Tk(x) ≤ n} is the number of upcrossings of x between a and b before time n; it increases as n → ∞ to the total number of upcrossings N([a, b], x) = sup{k > 0 : Tk(x) < ∞}. The key is the following simple analytic lemma:

Lemma 2.3.1 A real sequence x converges (in R) if and only if N([a, b], x) < ∞ for every rationals a < b.

Proof. If there exist a < b rationals such that N([a, b], x) = ∞, then lim infn xn ≤ a < b ≤ lim supn xn so that x does not converge. If x does not converge, then lim infn xn < lim supn xn, so by taking two rationals a < b in between, we get the converse statement. 

Theorem 2.3.2 (Doob’s upcrossing lemma) Let X be a supermartingale, and a < b two reals. Then for every n ≥ 0,

(b − a) E[Nn([a, b], X)] ≤ E[(Xn − a)⁻].

Proof. It is immediate by induction that Sk = Sk(X), Tk = Tk(X), defined as above, are stopping times. Define a previsible process C, taking only the values 0 and 1, by

Cn = Σ_{k≥1} 1_{S_k < n ≤ T_k}.

It is indeed previsible, since {S_k < n ≤ T_k} = {S_k ≤ n − 1} ∩ {T_k ≤ n − 1}^c ∈ F_{n−1}. Now, letting Nn = Nn([a, b], X), we have

(C · X)_n = Σ_{i=1}^{Nn} (X_{T_i} − X_{S_i}) + (X_n − X_{S_{Nn+1}}) 1_{S_{Nn+1} ≤ n}
≥ (b − a) Nn + (X_n − a) 1_{S_{Nn+1} ≤ n}
≥ (b − a) Nn − (X_n − a)⁻.

Since C is a non-negative bounded previsible process, C · X is a supermartingale, so finally

(b − a) E[Nn] − E[(Xn − a)⁻] ≤ E[(C · X)_n] ≤ 0,

hence the result. □
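The S_k/T_k scheme translates directly into a two-state scan of the path: wait until it goes strictly below a, then until it goes strictly above b, and count. A small sketch on a made-up path:

```python
# Count upcrossings of [a, b] by a finite path, following the S_k / T_k
# definition: S_{k+1} = first time after T_k strictly below a,
# T_{k+1} = first time after S_{k+1} strictly above b.
def upcrossings(path, a, b):
    count, below = 0, False
    for x in path:
        if not below and x < a:
            below = True          # an S_k has occurred
        elif below and x > b:
            below = False         # the matching T_k: one more upcrossing
            count += 1
    return count

path = [0, -1, 2, -2, 3, 0, -1, 1, 5]
```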

Proof of Theorem 2.3.1. Since (x + y)⁻ ≤ |x| + |y|, we get from Theorem 2.3.2 that E[Nn] ≤ (b − a)^{−1} E[|Xn| + |a|], and since Nn increases to N = N([a, b], X), we get by monotone convergence E[N] ≤ (b − a)^{−1} (sup_n E[|Xn|] + |a|). In particular, N([a, b], X) < ∞ a.s. for every a < b ∈ Q, so

P( ⋂_{a<b∈Q} {N([a, b], X) < ∞} ) = 1.

Hence the a.s. convergence to some X∞, possibly infinite.

Now Fatou’s lemma gives E[|X∞|] ≤ lim infn E[|Xn|] < ∞ by hypothesis, hence |X∞| < ∞ a.s. 

Exercise. In fact, from Theorem 2.3.2 it is clear that sup_n E[Xn⁻] < ∞ is sufficient; prove that this actually implies boundedness in L1 (provided X is a supermartingale, of course).

2.4 Doob’s inequalities and Lp convergence, p > 1

2.4.1 A maximal inequality

Proposition 2.4.1 Let X be a submartingale. Then, letting X̃n = sup_{0≤k≤n} Xk, for every c > 0 and n ≥ 0,

c P(X̃n ≥ c) ≤ E[Xn 1_{X̃n ≥ c}] ≤ E[Xn⁺].

Proof. Letting T = inf{k ≥ 0 : Xk ≥ c}, we obtain by optional stopping that

E[Xn] ≥ E[X_{T∧n}] = E[Xn 1_{T>n}] + E[X_T 1_{T≤n}] ≥ E[Xn 1_{T>n}] + c P(T ≤ n).

Since {T ≤ n} = {X̃n ≥ c}, the conclusion follows. □
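The maximal inequality can be verified exactly for the non-negative submartingale |S|, with S a simple random walk, by enumerating all ±1 paths of a fixed length (the parameters n and c below are arbitrary illustrative choices):

```python
from itertools import product

# Exact check of c P(max_k X_k >= c) <= E[X_n 1_{max >= c}] <= E[X_n^+]
# for X_k = |S_k|, S a simple random walk, by summing over all 2^n paths.
n, c = 10, 3
w = 1.0 / 2 ** n            # probability of each sign sequence

lhs = mid = rhs = 0.0
for signs in product((-1, 1), repeat=n):
    s, running_max = 0, 0
    for step in signs:
        s += step
        running_max = max(running_max, abs(s))
    xn = abs(s)
    if running_max >= c:
        lhs += w * c        # contributes to c P(max >= c)
        mid += w * xn       # contributes to E[X_n 1_{max >= c}]
    rhs += w * xn           # X_n^+ = X_n here, since X_n = |S_n| >= 0
```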

Theorem 2.4.1 (Doob's Lp inequality) Let p > 1, and let X be a martingale. Then, letting Xn* = sup_{0≤k≤n} |Xk|, we have

‖Xn*‖_p ≤ (p/(p − 1)) ‖Xn‖_p.

Proof. Since x ↦ |x| is convex, the process (|Xn|, n ≥ 0) is a non-negative submartingale. Applying Proposition 2.4.1 and Hölder's inequality shows that

E[(Xn*)^p] = ∫_0^∞ p x^{p−1} P(Xn* ≥ x) dx
≤ ∫_0^∞ p x^{p−2} E[|Xn| 1_{Xn* ≥ x}] dx
= p E[ |Xn| ∫_0^{Xn*} x^{p−2} dx ]
= (p/(p − 1)) E[|Xn| (Xn*)^{p−1}] ≤ (p/(p − 1)) ‖Xn‖_p ‖Xn*‖_p^{p−1},

which yields the result. □

Theorem 2.4.2 Let X be a martingale and p > 1, then the following statements are equivalent:

1. X is bounded in Lp(Ω, F, P): sup_{n≥0} ‖Xn‖_p < ∞

2. X converges a.s. and in Lp to a random variable X∞

3. There exists some Z ∈ Lp(Ω, F, P) such that

Xn = E[Z|Fn].

Proof. 1. =⇒ 2. Suppose X is bounded in Lp; then in particular it is bounded in L1, so it converges a.s. to some finite X∞ by Theorem 2.3.1. Moreover, X∞ ∈ Lp by an easy application of Fatou's theorem. Next, Doob's inequality ‖Xn*‖_p ≤ (p/(p − 1)) ‖Xn‖_p ≤ C < ∞ entails ‖X∞*‖_p < ∞ by monotone convergence, where X∞* = sup_{n≥0} |Xn| is the monotone limit of Xn*. Since |Xn − X∞| ≤ 2X∞* ∈ Lp, dominated convergence entails that Xn converges to X∞ in Lp.

2. =⇒ 3. Since conditional expectation is continuous as a linear operator on Lp spaces (Proposition 1.2.2), if Xn → X∞ in Lp, then for fixed n, Xn = E[Xm|Fn] →_{m→∞} E[X∞|Fn], so that Z = X∞ is a suitable choice.

3. =⇒ 1. This is immediate by the conditional Jensen inequality. □

A martingale which has the form in 3. is said to be closed (in Lp). Notice that in this case, X∞ = E[Z|F∞], where F∞ = ⋁_{n≥0} Fn. Indeed, ⋃_{n≥0} Fn is a π-system that generates F∞, and moreover, if B ∈ FN is an element of this π-system,

E[1_B E[Z|F∞]] = E[1_B Z] = E[1_B XN] = E[1_B Xm] for every m ≥ N,

which converges to E[1_B X∞] as m → ∞. Since X∞ = lim sup_n Xn is F∞-measurable, this gives the result.

Therefore, for p > 1, the map Z ∈ Lp(Ω, F∞, P) ↦ (E[Z|Fn], n ≥ 0) is a bijection between Lp(Ω, F∞, P) and the set of martingales that are bounded in Lp.

2.5 Uniform integrability and convergence in L1

The case of L1 convergence is a little different from that of Lp for p > 1, as one needs to suppose uniform integrability rather than mere boundedness in L1. Notice that uniform integrability follows from boundedness in Lp.

Theorem 2.5.1 Let X be a martingale. The following statements are equivalent:

1. (Xn, n ≥ 0) is uniformly integrable

2. Xn converges a.s. and in L1(Ω, F, P) to a limit X∞

3. There exists Z ∈ L1(Ω, F, P) so that Xn = E[Z|Fn], n ≥ 0.

Proof. 1. =⇒ 2. Suppose X is uniformly integrable; then it is bounded in L1, so by Theorem 2.3.1 it converges a.s. By properties of uniform integrability, it then converges in L1.

2. =⇒ 3. This follows the same proof as above: Z = X∞ is a suitable choice.

3. =⇒ 1. This is a straightforward consequence of the fact that {E[X|G] : G is a sub-σ-algebra of F} is U.I.; see example sheet 1. □

As above, we then have E[Z|F∞] = X∞, and this theorem says that there is a one-to-one correspondence between U.I. martingales and L1(Ω, F∞, P).

Exercise. Show that if X is a U.I. supermartingale (resp. submartingale), then Xn 1 converges a.s. and in L to a limit X∞, so that E[X∞|Fn] ≤ Xn (resp. ≥) for every n.

2.6 Optional stopping in the case of U.I. martingales

We give an improved version of the optional stopping theorem, in which the boundedness condition on the stopping time is lifted, and replaced by a uniform integrability condition on the martingale. Since U.I. martingales have a well defined limit X∞, we unambiguously let XT = XT 1{T <∞} + X∞1{T =∞} for any stopping time T . Theorem 2.6.1 Let X be a U.I. martingale, and S,T be two stopping times with S ≤ T . Then E[XT |FS] = XS.

Proof. We check that X_T ∈ L1: indeed, since |Xn| ≤ E[|X∞| |Fn],

E[|X_T|] = Σ_{n=0}^∞ E[|Xn| 1_{T=n}] + E[|X∞| 1_{T=∞}] ≤ Σ_{n∈Z+⊔{∞}} E[|X∞| 1_{T=n}] = E[|X∞|].

Next, if B ∈ F_T,

E[1_B X∞] = Σ_{n∈Z+⊔{∞}} E[1_B 1_{T=n} X∞] = Σ_{n∈Z+⊔{∞}} E[1_B 1_{T=n} Xn] = E[1_B X_T],

so that X_T = E[X∞|F_T]. Finally, E[X_T|F_S] = E[E[X∞|F_T]|F_S] = E[X∞|F_S] = X_S, by the tower property. □

2.7 Backwards martingales

Backwards martingales are martingales whose time-set is Z−. More precisely, given a filtration ... ⊆ G−2 ⊆ G−1 ⊆ G0, a process (Xn, n ≤ 0) is a backwards martingale if E[Xn+1|Gn] = Xn for every n ≤ −1, as in the usual definition. They are somewhat nicer than forward martingales, as they are automatically U.I.: indeed X0 ∈ L1 and E[X0|Gn] = Xn for every n ≤ 0. Adapting Doob's upcrossing theorem is a simple exercise: if Nm([a, b], X) is the number of upcrossings of a backwards martingale from a to b between times −m and 0, one has, considering the (forward) supermartingale (X−m+k, 0 ≤ k ≤ m), that

(b − a) E[Nm([a, b], X)] ≤ E[(X0 − a)^−].

As m → ∞, Nm([a, b], X) increases to the total number of upcrossings of X from a to b, and this allows us to conclude that Xn converges a.s. as n → −∞ to a G−∞-measurable random variable X−∞, where G−∞ = ∩_{n≤0} Gn. We have proved:

Theorem 2.7.1 Let X be a backwards martingale. Then Xn converges a.s. and in L1 as n → −∞ to the random variable X−∞ = E[X0|G−∞]. Moreover, if X0 ∈ Lp for some p > 1, then X is bounded in Lp and converges in Lp as n → −∞.

Chapter 3

Examples of applications of discrete-time martingales

3.1 Kolmogorov’s 0 − 1 law, law of large numbers

Let (Yn, n ≥ 1) be a sequence of independent random variables.

Theorem 3.1.1 (Kolmogorov's 0 − 1 law) The tail σ-algebra G∞ = ∩_{n≥0} Gn, where Gn = σ{Ym, m ≥ n}, is trivial: every A ∈ G∞ has probability 0 or 1.

Proof. Let Fn = σ{Y1, . . . , Yn}, n ≥ 1, and let A ∈ G∞. Then E[1A|Fn] = P(A), since Fn is independent of Gn+1, hence of G∞. The martingale convergence theorem therefore gives E[1A|Fn] → E[1A|F∞] = 1A a.s., since G∞ ⊂ F∞. Hence 1A = P(A) a.s., so P(A) ∈ {0, 1}. □

Suppose now that the Yi are real-valued i.i.d. random variables in L1, and let Sn = Σ_{k=1}^n Yk, n ≥ 0, be the associated random walk.

Theorem 3.1.2 (LLN) A.s. as n → ∞,

Sn/n −→ E[Y1] as n → ∞.

Proof. Let Hn = σ{Sn, Sn+1, . . .} = σ{Sn, Yn+1, Yn+2, . . .}. We have E[Sn|Hn+1] = Sn+1 − E[Yn+1|Hn+1] = Sn+1 − E[Yn+1|Sn+1]. Now, by symmetry, E[Yn+1|Sn+1] = E[Yk|Sn+1] for every 1 ≤ k ≤ n + 1, so that it equals (n + 1)^{−1} E[Sn+1|Sn+1] = Sn+1/(n + 1). Finally, E[Sn/n | Hn+1] = Sn+1/(n + 1), so that (S−n/(−n), n ≤ −1) is a backwards martingale with respect to the filtration (H−n, n ≤ −1). Therefore, Sn/n converges a.s. and in L1 to a limit which is a.s. constant by Kolmogorov's 0 − 1 law, so it must be equal to its mean value: E[S1|H∞] = E[S1] = E[Y1]. □
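A quick simulation sketch of Theorem 3.1.2 (the exponential law with mean 1 is our arbitrary choice of i.i.d. distribution):

```python
# Sanity check (illustration only): for i.i.d. exponential variables with
# mean 1, the running average S_n / n should approach E[Y_1] = 1.
import random

random.seed(0)
n = 200000
s = 0.0
averages = []
for k in range(1, n + 1):
    s += random.expovariate(1.0)  # Y_k with E[Y_k] = 1
    if k in (100, 10000, n):
        averages.append(s / k)  # snapshots of S_k / k
```

The standard error at n = 200000 is about 1/√n ≈ 0.002, so the final average should sit very close to 1.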

3.2 Branching processes

Let µ be a probability distribution on Z+, and consider a Markov process (Zn, n ≥ 0) in Z+ whose one-step transitions are determined by the following rule: given Zn = z, take z


independent random variables Y1, . . . , Yz with law µ, and let Zn+1 have the same distribution as Y1 + . . . + Yz. In particular, 0 is an absorbing state for this process. This can be interpreted as follows: Zn is the number of individuals present in a population at generation n, and at each time step every individual dies after giving birth to a µ-distributed number of children, independently of the others. Notice that E[Zn+1|Fn] = E[Zn+1|Zn] = mZn, where (Fn, n ≥ 0) is the natural filtration and m = Σ_z z µ({z}) is the mean of µ. Therefore, supposing m ∈ (0, ∞):

Proposition 3.2.1 The process (m^{−n} Zn, n ≥ 0) is a non-negative martingale.

Notice that the fact that the martingale converges a.s. to a finite value immediately implies that when m < 1, there exists some n such that Zn = 0, i.e. the population becomes extinct in finite time. One also expects that when m > 1, Zn should be of order m^n, so that the population should grow explosively, at least with positive probability. It is standard to show the following.

Exercise. Let ϕ(s) = Σ_{z∈Z+} µ({z}) s^z be the generating function of µ, and suppose µ({1}) < 1. Show that if Z0 = 1, then the generating function of Zn is the n-fold composition of ϕ with itself. Show that the probability q of eventual extinction of the population satisfies ϕ(q) = q, and that q < 1 ⇐⇒ m > 1. As a hint, ϕ is a convex function such that ϕ′(1) = m.
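The fixed-point characterization in the exercise can be sketched numerically. The offspring law below is our choice for illustration: the binary law µ({0}) = 1 − p, µ({2}) = p, for which ϕ(s) = (1 − p) + p s² and m = 2p; the extinction probability is obtained by iterating ϕ from 0, since P(Zn = 0) = ϕ^n(0) increases to q.

```python
# Sketch (assumed offspring law: mu({0}) = 1-p, mu({2}) = p, so that
# phi(s) = (1-p) + p*s^2 and m = 2p).  Iterating phi from 0 gives the
# smallest fixed point, i.e. the extinction probability q.

def phi(s, p):
    return (1 - p) + p * s * s

def extinction_probability(p, n_iter=200):
    q = 0.0
    for _ in range(n_iter):
        q = phi(q, p)  # q_n = P(Z_n = 0), nondecreasing in n
    return q

q_super = extinction_probability(0.75)  # m = 1.5 > 1: fixed points 1/3 and 1, q = 1/3
q_sub = extinction_probability(0.4)     # m = 0.8 < 1: q = 1
```

This matches the dichotomy of the exercise: q < 1 exactly in the supercritical case m > 1.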

Notice that, still supposing Z0 = 1, the martingale (Mn = Zn/m^n, n ≥ 0) cannot be U.I. when m ≤ 1, since it then converges to 0 a.s., so that E[M∞] < E[M0]. This leaves open the question whether P(M∞ > 0) > 0 in the case m > 1. We address this problem in a particular case:

Proposition 3.2.2 Suppose m > 1, Z0 = 1 and σ^2 = Var(µ) < ∞. Then the martingale M is bounded in L2, and hence converges a.s. and in L2 to a variable M∞ with E[M∞] = 1; in particular, P(M∞ > 0) > 0.

Proof. We compute E[Z_{n+1}^2 | Fn] = Zn^2 m^2 + Zn σ^2. Since E[Zn] = m^n, this shows that E[M_{n+1}^2] = E[Mn^2] + σ^2 m^{−n−2}, and therefore, since (m^{−n}, n ≥ 0) is summable, M is bounded in L2 (this last statement is actually equivalent to m > 1). □

Exercise. Show that under these hypotheses, the events {M∞ > 0} and {limn Zn = ∞} coincide up to a null event.

3.3 A martingale approach to the Radon-Nikodym theorem

We begin with the following general remark. Let (Ω, F, (Fn), P) be a filtered probability space with F∞ = F, and let Q be a finite non-negative measure on (Ω, F). Let Pn and Qn denote the restrictions of P and Q to the measurable space (Ω, Fn). Suppose that for every n, Qn has a density Mn with respect to Pn, namely Qn(dω) = Mn(ω) Pn(dω), where Mn is an Fn-measurable non-negative function. We also sometimes write Mn = dQn/dPn.

Then it is immediate that (Mn, n ≥ 0) is a martingale with respect to the filtered space (Ω, F, (Fn),P ). Indeed, E[Mn] = Q(Ω) < ∞, and for A ∈ Fn,

E^P[Mn+1 1A] = E^{Pn+1}[Mn+1 1A] = Qn+1(A) = Qn(A) = E^{Pn}[Mn 1A] = E^P[Mn 1A],

where E^P, E^{Pn} denote expectations with respect to the measures P, Pn. A natural question is whether the identity Qn = Mn Pn passes to the limit Q = M∞ P as n → ∞, where M∞ is the a.s. limit of the non-negative martingale M.

Proposition 3.3.1 Under these hypotheses, there exists a non-negative random variable X := dQ/dP such that Q = X · P if and only if (Mn, n ≥ 0) is U.I.

Proof. If M is U.I., then we can pass to the limit in E[Mm 1A] = Q(A) for A ∈ Fn and m → ∞, to obtain E[M∞ 1A] = Q(A) for every A ∈ ∪_{n≥0} Fn. Since this last set is a π-system that generates F∞ = F, we obtain M∞ · P = Q by the uniqueness theorem for measures.

Conversely, if Q = X · P , then for A ∈ Fn, we have Q(A) = E[Mn1A] = E[X1A] so that Mn = E[X|Fn], which shows that M is U.I. 

The Radon-Nikodym theorem (in a particular case) reads as follows.

Theorem 3.3.1 (Radon-Nikodym) Let (Ω, F) be a measurable space such that F is separable, i.e. generated by a countable set of events Fk, k ≥ 1. Let P be a probability measure on (Ω, F) and Q be a finite non-negative measure on (Ω, F). Then the following statements are equivalent.

(i) Q is absolutely continuous with respect to P, namely

∀ A ∈ F,P (A) = 0 =⇒ Q(A) = 0.

(ii) ∀ ε > 0, ∃ δ > 0, ∀ A ∈ F,P (A) ≤ δ =⇒ Q(A) ≤ ε. (iii) There exists a non-negative random variable X such that Q = X · P .

The separability condition on F can actually be lifted, see Williams’ book for the proof in the general case.

Proof. That (iii) implies (i) is straightforward.

If (ii) is not satisfied, then we can find a sequence (Bn) of events and an ε > 0 such that P(Bn) < 2^{−n} but Q(Bn) ≥ ε. By the Borel-Cantelli lemma, P(lim sup Bn) = 0, while Q(lim sup Bn), as the decreasing limit of Q(∪_{k≥n} Bk) as n → ∞, must be ≥ lim sup_n Q(Bn) ≥ ε. Hence (i) does not hold for the set A = lim sup Bn. So (i) implies (ii).

Let us now assume (ii). Let (Fn) be the filtration in which Fn is the σ-algebra generated by the events F1, . . . , Fn. Notice that any event of Fn is a disjoint union of non-empty "atoms" of the form

∩_{1≤i≤n} Gi,

where either Gi = Fi or Gi = Fi^c, its complement. We let An be the set of atoms of Fn. Let

Mn(ω) = Σ_{A∈An} (Q(A)/P(A)) 1A(ω),

with the convention that 0/0 = 0. It is then easy to check that Mn is a density for Qn with respect to Pn, where Pn, Qn denote restrictions to Fn as above. Indeed, if A ∈ An,

Qn(A) = (Q(A)/P(A)) P(A) = E^{Pn}[Mn 1A]

(when P(A) = 0, (ii) forces Q(A) = 0, consistently with the convention).

Therefore, (Mn, n ≥ 0) is a non-negative (Fn, n ≥ 0)-martingale, and Mn(ω) converges a.s. towards a limit M∞(ω). Moreover, the last proposition tells us that it suffices to show that (Mn) is U.I. to conclude the proof. But note that we have E[Mn 1{Mn≥a}] = Q(Mn ≥ a). So for ε > 0 fixed, with δ given by (ii), we have P(Mn ≥ a) ≤ E[Mn]/a = Q(Ω)/a < δ for all n, as soon as a is large enough, and this entails Q(Mn ≥ a) ≤ ε for every n. Hence the result. □

Example. Let Ω = [0, 1) be endowed with its Borel σ-field, which is generated by the collection {Ik,j = [j 2^{−k}, (j + 1) 2^{−k}), k ≥ 0, 0 ≤ j ≤ 2^k − 1}. The intervals Ik,j, 0 ≤ j ≤ 2^k − 1, are called the dyadic intervals of depth k; they generate a σ-algebra which we call Fk. We let λ(dω) be the Lebesgue measure on [0, 1). Let ν be a finite non-negative measure on [0, 1), and

Mn(ω) = 2^n Σ_{j=0}^{2^n − 1} 1_{In,j}(ω) ν(In,j).

We then obtain from the previous theorem that if ν is absolutely continuous with respect to λ, then ν = f · λ for some non-negative measurable f. We also see that a.s., if Ik(x) = [2^{−k}⌊2^k x⌋, 2^{−k}(⌊2^k x⌋ + 1)) denotes the dyadic interval of depth k containing x,

2^k ∫_{Ik(x)} f(y) λ(dy) → f(x) as k → ∞.

This is a particular case of the Lebesgue differentiation theorem.
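A minimal sketch of the dyadic construction above (the density f(x) = 2x and the point x0 are our choices for illustration): Mn(x) = 2^n ν(In(x)) should converge to f(x).

```python
# Illustration of the dyadic density approximation for nu = f.lambda with
# the (hypothetical) density f(x) = 2x on [0,1): here nu([a,b)) = b^2 - a^2,
# and M_n(x) = 2^n nu(I_n(x)) converges to f(x) as n grows.

def nu(a, b):
    # nu([a,b)) = integral of 2x over [a,b)
    return b * b - a * a

def M_n(n, x):
    k = int(x * 2**n)
    a, b = k / 2**n, (k + 1) / 2**n
    return 2**n * nu(a, b)  # equals a + b, the average of 2x over [a, b)

x0 = 0.3
approx = [M_n(n, x0) for n in (1, 4, 12)]  # should approach f(x0) = 0.6
```

Since Mn(x) = a + b for the dyadic interval [a, b) of depth n containing x, the error is at most 2^{−n}.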

3.4 Product martingales and likelihood ratio tests

Theorem 3.4.1 (Kakutani's theorem) Let (Yn, n ≥ 1) be a sequence of independent non-negative random variables with mean 1. Let Fn = σ(Y1, . . . , Yn). Then Xn = ∏_{1≤k≤n} Yk, n ≥ 0, is an (Fn, n ≥ 0)-martingale, which converges a.s. to some X∞ ≥ 0. Letting an = E[√Yn], the following statements are equivalent:

1. X is U.I.

2. E[X∞] = 1

3. P(X∞ > 0) > 0

4. ∏_n an > 0.

Proof. The fact that X is a (non-negative) martingale follows from E[Xn+1|Fn] = Xn E[Yn+1|Fn] = Xn E[Yn+1] = Xn. For the same reason, the process

Mn = ∏_{k=1}^n (√Yk / ak), n ≥ 0,

is a non-negative martingale with mean E[Mn] = 1, and E[Mn^2] = ∏_{k=1}^n ak^{−2}. Thus, M is bounded in L2 if and only if ∏_n an > 0 (notice that an ∈ (0, 1], e.g. by the Cauchy-Schwarz inequality E[1 · √Yn] ≤ √E[Yn] = 1). Now, with the standard notation Xn* = sup_{0≤k≤n} Xk, and noting that Xn = Mn^2 ∏_{k=1}^n ak^2 ≤ Mn^2, Doob's L2 inequality gives

E[Xn*] ≤ E[(Mn*)^2] ≤ 4 E[Mn^2],

which shows that if M is bounded in L2, then X∞* is integrable, hence X is U.I. since it is dominated by X∞*. We thus have obtained 4. =⇒ 1. =⇒ 2. =⇒ 3., where the second implication comes from the optional stopping theorem for U.I. martingales, and the implication 2. =⇒ 3. is trivial. On the other hand, if ∏_n an = 0, then since Mn converges a.s. to some M∞ ≥ 0, Xn = Mn^2 ∏_{1≤k≤n} ak^2 converges to 0 a.s., so that 3. does not hold. So 3. =⇒ 4., hence the result. □

Note that, if Yn > 0 a.s. for every n, the event {X∞ = 0} is a tail event, so that 3. above is equivalent to P(X∞ > 0) = 1 by Kolmogorov's 0 − 1 law. As an example of application of this theorem, consider a σ-finite measured space (E, E, λ) and let Ω = E^N, F = E^{⊗N} be the product measurable space. We let Xn(ω) = ωn, n ≥ 1, and Fn = σ({X1, . . . , Xn}). One says that X is the canonical (E-valued) process.

Now suppose given two families of probability measures (µn, n ≥ 1) and (νn, n ≥ 1) that admit densities dµn = fn dλ, dνn = gn dλ with respect to λ. We suppose that fn(x) gn(x) > 0 for every n, x. Let P = ⊗_{n≥1} µn, resp. Q = ⊗_{n≥1} νn, denote the measures on (Ω, F) under which (Xn, n ≥ 1) is a sequence of independent random variables with respective laws µn (resp. νn). In particular, if A = ∏_{i=1}^n Ai × E^N is a measurable rectangle in Fn,

Q(A) = ∫_{A1×···×An} ∏_{i=1}^n (gi(xi)/fi(xi)) ∏_{i=1}^n fi(xi) λ(dxi) = E^P[Mn 1A],

where E^P denotes expectation with respect to P, and

Mn = ∏_{i=1}^n gi(Xi)/fi(Xi).

Since the measurable rectangles of Fn form a π-system that generates Fn, the measure Q|Fn is absolutely continuous with respect to P|Fn, with density Mn, so that (Mn, n ≥ 1) is a non-negative martingale with respect to the filtered space (Ω, F, (Fn, n ≥ 0), P). Kakutani's theorem then shows that M converges a.s. and in L1 to its limit M∞ if and only if

∏_{n≥1} ∫_E √(fn(x) gn(x)) λ(dx) > 0 ⇐⇒ Σ_{n≥1} ∫_E (√fn(x) − √gn(x))^2 λ(dx) < ∞.

In this case, one has Q(A) = E^P[M∞ 1A] for every measurable rectangle of F, and Q is absolutely continuous with respect to P with density M∞. In the opposite case, M∞ = 0 a.s., and Proposition 3.3.1 shows that Q and P are carried by two disjoint measurable sets.
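The degenerate case can be checked by simulation: if fn = f and gn = g with f ≠ g, then by the law of large numbers n^{−1} log Mn → −KL(f‖g) < 0 under P, so Mn → 0 a.s. A sketch, where the pair of densities is our choice for illustration (f standard normal, g standard Cauchy):

```python
# Simulation sketch (densities chosen for illustration: f standard normal,
# g standard Cauchy).  Sampling under P, the per-sample log-likelihood ratio
# (1/n) log M_n converges a.s. to -KL(f||g) < 0, so M_n -> 0.
import math
import random

random.seed(1)

def log_f(x):  # standard normal log-density
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def log_g(x):  # standard Cauchy log-density
    return -math.log(math.pi * (1 + x * x))

n = 50000
log_M = 0.0
for _ in range(n):
    x = random.gauss(0.0, 1.0)  # X_i sampled under P
    log_M += log_g(x) - log_f(x)

per_sample = log_M / n  # close to -KL(normal || Cauchy), a negative constant
```

A strictly negative per-sample average means log Mn drifts to −∞ linearly, i.e. Mn vanishes geometrically fast.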

3.4.1 Example: consistency of the likelihood ratio test

In the case where µn = µ, νn = ν for every n, we see that M∞ = 0 P-a.s. if (and only if) µ ≠ ν. This is called the consistency of the likelihood ratio test in statistics. Let us recall the background for the application of this test. Suppose given an i.i.d. sample X1, X2, . . . , Xn with an unknown common distribution, and suppose one wants to test the hypothesis (H0) that this distribution is P against the hypothesis (H1) that it is Q, where P and Q have everywhere positive densities f, g with respect to some common σ-finite measure λ (for example, a normal distribution and a Cauchy distribution). Letting Mn = ∏_{1≤i≤n} g(Xi)/f(Xi), we accept H0 against H1 on the event {Mn ≤ 1}. Supposing H0, we have M∞ = 0 a.s., so the probability of rejection P(Mn > 1) converges to 0. Similarly, supposing H1, we have Mn → +∞ a.s., so the probability of rejection goes to 1.

Chapter 4

Continuous-parameter stochastic processes

In this chapter, we consider processes indexed by a real interval I ⊂ R with non-empty interior; in many cases I will be R+. This makes the whole study more involved, as we now explain. In all that follows, the state space E is assumed to be a metric space, usually E = R or E = Rd endowed with the Euclidean norm.

4.1 Theoretical problems when dealing with continuous-time processes

Although the definitions of filtrations, adapted processes, stopping times, martingales, and super- and sub-martingales are unchanged compared to the discrete case (see the beginning of Section 2), the use of continuous time raises important measurability problems. Indeed, there is no reason why an adapted process (ω, t) ↦ Xt(ω) should be a measurable map on Ω × I, or even why the sample path t ↦ Xt(ω) should be measurable for a fixed ω. In particular, stopped processes like XT 1{T<∞} for a stopping time T have no reason to be random variables. Even worse, there are in general "very few" stopping times: for example, first entrance times inf{t ≥ 0 : Xt ∈ A} for measurable (or even open or closed) subsets A of the state space E need not be stopping times. This is the reason why we impose a priori regularity requirements on the random processes under consideration. A quite natural requirement is that they are continuous processes, i.e. that t ↦ Xt(ω) is continuous for a.e. ω, because a continuous function is determined by its values on a countable dense subset of I. More generally, we will consider processes that are right-continuous and admit left limits everywhere, a.s. Such processes are called càdlàg, and are also determined by the values they take on a countable dense subset of I (the term càdlàg stands for the French 'continu à droite, limites à gauche'). We let C(I, E), D(I, E) denote the spaces of continuous and càdlàg functions from I to E; we consider these sets as measurable spaces by endowing them with the product σ-algebra that makes the projections πt : X ↦ Xt measurable for every t ∈ I. Usually, we will consider processes with values in R, or sometimes Rd for some d ≥ 1 in the chapter

on Brownian motion. The following proposition holds, of which (ii) is an analogue of points 1. and 2. in Proposition 2.1.2.

Proposition 4.1.1 Let (Ω, F, (Ft, t ∈ I),P ) be a filtered probability space, and let (Xt, t ∈ I) be an adapted process with values in E.

(i) Suppose X is continuous (i.e. (Xt(ω), t ∈ I) ∈ C(I,E) for every ω). If A is a closed set, and inf I > −∞, then the random time

TA = inf{t ∈ I : Xt ∈ A}

is a stopping time.

(ii) Let T be a stopping time, and suppose X is càdlàg. Then XT 1{T<∞} : ω ↦ XT(ω)(ω) 1{T(ω)<∞} is an FT-measurable random variable. Moreover, the stopped process X^T = (X_{T∧t}, t ≥ 0) is adapted.

Proof. For (i), notice that if A is closed and X is continuous, then for every t ∈ I,

{TA ≤ t} = { inf_{s∈I∩Q, s≤t} d(Xs, A) = 0 },

where d(x, A) = inf_{y∈A} d(x, y) is the distance from x to the set A. Indeed, if Xs ∈ A for some s ≤ t, then for qn converging to s in Q ∩ I ∩ (−∞, t], Xqn converges to Xs, so that d(Xqn, A) converges to 0. Conversely, if there exist qn ∈ Q ∩ I ∩ (−∞, t] such that d(Xqn, A) converges to 0, then since inf I > −∞ we may extract a subsequence and assume that qn converges to some s ∈ I ∩ (−∞, t], and this s has to satisfy d(Xs, A) = 0 by continuity of X. Since A is closed, this implies Xs ∈ A, so that TA ≤ t.

For (ii), first note that a random variable Z is FT-measurable if Z 1{T≤t} is Ft-measurable for every t ∈ I (the converse holds as well, as one sees by approximating Z by finite sums of the form Σ αi 1Ai with Ai ∈ FT). Notice also that if T is a stopping time then, denoting by ⌈x⌉ the smallest n ∈ Z+ with n ≥ x, Tn = 2^{−n}⌈2^n T⌉ is also a stopping time with Tn ≥ T, which decreases to T as n → ∞ (with Tn = ∞ if T = ∞). Indeed, {Tn ≤ t} = {T ≤ 2^{−n}⌊2^n t⌋} ∈ Ft (notice that ⌈x⌉ ≤ y if and only if x ≤ ⌊y⌋, where ⌊y⌋ is the largest n ∈ Z+ with n ≤ y). Moreover, Tn takes values in the set Dn* = {k 2^{−n}, k ∈ Z+} ∪ {∞} of dyadic numbers of level n (together with ∞). Therefore,

XT 1{T<∞} 1{T≤t} = Xt 1{T=t} + XT 1{T<t},

and XT 1{T<t} = lim_n X_{Tn∧t} 1{T<t} by right-continuity of the paths, while

X_{Tn∧t} = Σ_{d∈Dn*, d≤t} Xd 1{Tn=d} + Xt 1{t<Tn}

is Ft-measurable, since {Tn = d} ∈ Fd ⊆ Ft for d ≤ t. Hence XT 1{T<∞} 1{T≤t} is Ft-measurable for every t, which gives the first claim. Applying this to the bounded stopping times T ∧ t shows that X_{T∧t} is F_{T∧t}-measurable, hence Ft-measurable, so the stopped process X^T is adapted. □
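The dyadic approximation Tn used in the proof is easy to visualize (a sketch; the value of T is arbitrary):

```python
# The dyadic approximation of a stopping time: T_n = 2^{-n} * ceil(2^n * T)
# takes values in the dyadic grid of level n, dominates T, and decreases to T.
import math

def T_n(T, n):
    return math.ceil(T * 2**n) / 2**n

T = 0.3
approximations = [T_n(T, n) for n in range(1, 12)]
```

Because the dyadic grids are nested, the sequence is nonincreasing, and T_n − T ≤ 2^{−n}.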

It turns out that (i) does not hold in general for càdlàg processes, although it is a very subtle problem to find counterexamples. See Rogers and Williams' book, Chapters II.74

and II.75. In particular, Lemma 75.1 therein shows that TA is a stopping time whenever A is compact, X is an adapted càdlàg process, and the filtration (Ft, t ∈ I) satisfies the so-called "usual conditions" (see Section 4.3 for the definition of these conditions).

You may check as an exercise that the times TA for open sets A, associated with càdlàg processes, are stopping times with respect to the filtration (Ft+, t ∈ I), where

Ft+ = ∩_{s>t} Fs.

In some sense, the filtration (Ft+) foresees what will happen 'just after' t.

4.2 Finite marginal distributions, versions

We now discuss the notion of the law of a process. If (Xt, t ∈ I) is a stochastic process, we can consider it as a random variable with values in the set E^I of maps f : I → E, where this last space is endowed with the product σ-algebra (the smallest σ-algebra that makes the projections f ∈ E^I ↦ f(t) measurable for every t ∈ I). It is then natural to define the law of X as the image measure µ of the probability P under the process X. However, this measure is awkward to manipulate, and the quantities of true interest are the following simpler objects.

Definition 4.2.1 Let (Xt, t ∈ I) be a process. For every finite J ⊂ I, the finite marginal distribution of X indexed by J is the law µJ of the E^J-valued random variable (Xt, t ∈ J).

It is a nice fact that the finite marginal distributions {µJ : J ⊂ I, #J < ∞} uniquely characterize the law µ of the process (Xt, t ∈ I) as defined above. Indeed, if X and Y are processes having the same finite marginal laws, then their distributions agree on the π-system of finite "rectangles" of the form ∏_{t∈J} At × ∏_{t∈I\J} E for finite J ⊂ I, which generates the product σ-algebra; hence the distributions under consideration are equal. Notice that this uniqueness result does not imply the existence of a process with given marginal distributions. The problem with (finite marginal) laws of processes is that they are powerless in dealing with properties of processes that involve more than countably many times, such as continuity or the càdlàg property. For example, if X is a continuous process, there are (many!) non-continuous processes that have the same finite marginal distributions as X: the finite marginal distributions just do not 'see' the sample path properties of the process. This motivates the following definition.

Definition 4.2.2 If X and X′ are two processes defined on a common probability space (Ω, F, P), we say that X′ is a version of X if for every t, Xt(ω) = X′t(ω) a.s.

In particular, two versions X and X′ of the same process share the same finite-dimensional distributions; however, this does not say that a.s. Xt(ω) = X′t(ω) for every t simultaneously. This becomes true if both X and X′ are a priori known to be càdlàg, for instance.

Example. To explain these very abstract notions, suppose we want to find a process (Xt, 0 ≤ t ≤ 1) whose finite marginal laws are Dirac masses at 0, namely

µJ({(0, 0, . . . , 0)}) = P(Xs = 0, s ∈ J) = 1

for every finite J ⊂ [0, 1]. Of course, the process Xt = 0, 0 ≤ t ≤ 1, satisfies this. However, the process X′t = 1{U}(t), 0 ≤ t ≤ 1, where U is a uniform random variable on [0, 1], is a version of X, and therefore has the same law as X. But of course, it is not continuous, and P(X′t = 0 ∀ t ∈ [0, 1]) = 0. We thus want to consider it as a 'bad' version of the zero process. This example motivates the following way of dealing with processes: when considering a process whose finite marginal distributions are known, we first try to find the most regular version of the process that we can before studying it.
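The 'bad version' is easy to make concrete (a sketch; the fixed times below are arbitrary): at each fixed t the path value is 0 with probability one, yet every sample path hits 1 somewhere.

```python
# The bad version X'_t = 1_{U}(t): for each fixed t it equals 0 a.s.
# (P(U = t) = 0), yet the sample path takes the value 1 at t = U.
import random

random.seed(2)
U = random.random()  # uniform random time in [0, 1)

def X_prime(t):
    return 1.0 if t == U else 0.0

fixed_times = [0.1, 0.25, 0.5, 0.9]
values_at_fixed_times = [X_prime(t) for t in fixed_times]
path_supremum = X_prime(U)  # the path is not identically zero
```

This is exactly why finite marginal distributions cannot detect sample path properties.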

We will discuss two ‘regularization theorems’ in this course, the martingale regular- ization theorem and Kolmogorov’s continuity criterion, which are instances of situations when there exists a regular (continuous or c`adl`ag)version of the stochastic process under consideration.

4.3 The martingale regularization theorem

We consider here a martingale (Xt, t ≥ 0) on some filtered probability space (Ω, F, (Ft, t ≥ 0), P). We let N be the set of events in F with probability 0, and set

Ft+ = ∩_{s>t} Fs, t ≥ 0, and F̃t = σ(Ft+ ∪ N).

Theorem 4.3.1 Let (Xt, t ≥ 0) be a martingale. Then there exists a càdlàg process X̃ which is a martingale with respect to the filtered probability space (Ω, F, (F̃t, t ≥ 0), P), such that for every t ≥ 0, Xt = E[X̃t|Ft] a.s. If F̃t = Ft for every t ≥ 0, X̃ is therefore a càdlàg version of X.

We say that (Ft, t ≥ 0) satisfies the usual conditions if F̃t = Ft for every t, that is, N ⊆ F0 and Ft+ = Ft (a filtration satisfying this last condition for every t ≥ 0 is called right-continuous; notice that (Ft+, t ≥ 0) is right-continuous for every filtration (Ft, t ≥ 0)). As a corollary of Theorem 4.3.1, when the filtration satisfies the usual conditions, a martingale admits a càdlàg version, so there is "little to lose" in considering that martingales are càdlàg.

Lemma 4.3.1 A function f : Q+ → R admits a finite left limit and a finite right limit at every t ∈ R+ if and only if, for every bounded I ⊂ Q+ and all rationals a < b, f is bounded on I and the number

N([a, b], I, f) = sup{ n ≥ 0 : ∃ 0 ≤ s1 < t1 < . . . < sn < tn, all in I, f(si) < a, f(ti) > b, 1 ≤ i ≤ n }

of upcrossings of f from a to b along I is finite.

Proof of Theorem 4.3.1. We first show that X is a.s. bounded on bounded subsets of Q+. Indeed, if I is such a subset and J = {a1, . . . , ak} is a finite subset of I with a1 < . . . < ak, then (Ml = X_{al}, 1 ≤ l ≤ k) is a martingale. Doob's maximal inequality applied to the submartingale |M| then shows that

c P(Mk* > c) = c P( max_{1≤l≤k} |X_{al}| > c ) ≤ E[|X_{ak}|] ≤ E[|XK|]

for any K > sup I. Therefore, taking a monotone limit over finite J ⊂ I increasing to I, we have

c P( sup_{t∈I} |Xt| > c ) ≤ E[|XK|].

This shows that P(sup_{t∈I} |Xt| < ∞) = 1, by letting c → ∞. Now let I still be a bounded subset of Q+, and let a < b be rationals. By definition, we have N([a, b], I, X) = sup_{J⊂I finite} N([a, b], J, X). So let J ⊂ I be a finite subset of the form

{a1, a2, . . . , ak} as above, and again let Ml = X_{al}, 1 ≤ l ≤ k. Doob's upcrossing lemma for this martingale gives

(b − a) E[N([a, b], J, X)] ≤ E[(X_{ak} − a)^−] ≤ E[(XK − a)^−],

for any K ≥ sup I, because ((Xt − a)^−, t ≥ 0) is a submartingale, due to the convexity of x ↦ (x − a)^−. Taking the supremum over J shows that N([a, b], I, X) is a.s. finite, because E[|XK|] < ∞. Letting K → ∞ along the integers (it suffices to consider I = Q+ ∩ [0, K)), we get that N([a, b], I, X) is finite for every bounded subset I of Q+ and all rationals a < b, for every ω in an event Ω0 of probability 1. Therefore, we can define

X̃t(ω) = lim_{s∈Q+, s↓t} Xs(ω), ω ∈ Ω0,

and X̃t(ω) = 0 for every t when ω ∉ Ω0 (the limit exists for ω ∈ Ω0 by Lemma 4.3.1). The process X̃ thus obtained is adapted to the filtration (F̃t, t ≥ 0). It remains to show that X̃ is an (F̃t)-martingale, satisfies E[X̃t|Ft] = Xt, and is càdlàg.

First, note that X remains a martingale with respect to the filtration (σ(Ft ∪ N), t ≥ 0), because E[Z|σ(G ∪ N)] = E[Z|G] a.s. for any integrable random variable Z and any sub-σ-algebra G ⊆ F. Thus, we may suppose that N ⊆ Ft for every t. Let s < t in R+, and let (sn, n ≥ 0) be a strictly decreasing sequence of rationals converging to s, with s0 < t. Then X̃s = lim Xsn = lim E[Xt|Fsn] by definition, for ω ∈ Ω0. Now, the process (Mn = X_{s−n}, n ≤ 0) is a backwards martingale with respect to the filtration (Gn = F_{s−n}, n ≤ 0), so the backwards martingale convergence theorem shows that X̃s = E[Xt|Fs+]. Moreover, taking a rational sequence (tn) decreasing to t and using the backwards martingale convergence theorem again, (Xtn) converges to X̃t in L1, which yields Xt = E[X̃t|Ft], and then X̃s = E[Xt|Fs+] = E[X̃t|F̃s] for every s ≤ t.

The only thing that remains to prove is the càdlàg property. Fix ω ∈ Ω0 and t ∈ R+. If X̃s(ω) did not converge to X̃t(ω) as s ↓ t, then we would have |X̃t − X̃s| > ε for some ε > 0 and infinitely many s > t, so that |X̃t − Xu| > ε/2 for infinitely many rationals u > t, contradicting the existence of the limit defining X̃t(ω). The argument showing that X̃ has left limits is similar. □

From now on, when considering martingales in continuous time, we will always take their càdlàg version, provided the underlying filtration satisfies the usual hypotheses.

4.4 Doob’s inequalities and convergence theorems for martingales in continuous time

Considering càdlàg martingales makes it straightforward to generalize the inequalities of Section 2.4 to the continuous case, by density arguments. We leave it to the reader to prove the following results, which are analogous to the discrete-time case.

Proposition 4.4.1 (A.s. convergence) Let (Xt, t ≥ 0) be a càdlàg martingale which is bounded in L1. Then Xt converges a.s. as t → ∞ to an (a.s.) finite limit X∞.

To prove this, notice that convergence of Xt as t → ∞ to a (possibly infinite) limit is equivalent to the fact that the number of upcrossings of X from below a to above b over the time interval R+ is finite for every pair of rationals a < b. By the càdlàg property, it suffices to restrict attention to the countable time set Q+ rather than R+. Indeed, for each upcrossing of X from a to b between times s < t, say, we can find rationals s′ > s, t′ > t as close to s, t as we want such that X accomplishes an upcrossing from a to b between times s′, t′, and this implies that N(X, R+, [a, b]) = N(X, Q+, [a, b]) (possibly infinite). Then use arguments similar to those of the first part of the proof of Theorem 4.3.1.

Proposition 4.4.2 (Doob's inequalities) If (Xt, t ≥ 0) is a càdlàg martingale and Xt* = sup_{0≤s≤t} |Xs|, then for every c > 0, t ≥ 0,

c P(Xt* ≥ c) ≤ E[|Xt|].

Moreover, if p > 1, then

‖Xt*‖p ≤ (p/(p − 1)) ‖Xt‖p.

To prove this, notice that Xt* = sup_{s∈{t}∪([0,t]∩Q)} |Xs| by the càdlàg property.

Proposition 4.4.3 (Lp convergence) (i) If X is a càdlàg martingale and p > 1, then sup_{t≥0} ‖Xt‖p < ∞ if and only if X converges a.s. and in Lp to its limit X∞, and this holds if and only if X is closed in Lp, i.e. there exists Z ∈ Lp such that E[Z|Ft] = Xt for every t, a.s. (one can then take Z = X∞).
(ii) If X is a càdlàg martingale, then X is U.I. if and only if X converges a.s. and in L1 to its limit X∞, and this holds if and only if X is closed (in L1).

Proposition 4.4.4 (Optional stopping) Let X be a càdlàg U.I. martingale. Then for any two stopping times S ≤ T, one has E[XT|FS] = XS a.s.

Proof. Let Tn be the stopping time 2^{−n}⌈2^n T⌉, as defined in the proof of Proposition 4.1.1. The right-continuity of the paths of X shows that XTn converges to XT a.s. Moreover, Tn takes values in the countable set Dn* of dyadic rationals of level n (together with ∞), so that

E[X∞|FTn] = Σ_{d∈Dn*} E[1{Tn=d} X∞ | FTn] = Σ_{d∈Dn*} 1{Tn=d} E[X∞|Fd]

(you should check this carefully). Now, since Xt converges to X∞ in L1, we have Xd = E[Xt|Fd] = E[X∞|Fd] a.s. for t ≥ d, and therefore E[X∞|FTn] = XTn. Passing to the limit as n → ∞ and using the backwards martingale convergence theorem, we obtain E[X∞|F′T] = XT, where F′T = ∩_{n≥1} FTn, and therefore E[X∞|FT] = XT by the tower property, since XT is FT-measurable. The theorem then follows as in Theorem 2.6.1. □

4.5 Kolmogorov’s continuity criterion

Theorem 4.5.1 (Kolmogorov's continuity criterion) Let (Xt, 0 ≤ t ≤ 1) be a real-valued stochastic process. Suppose there exist p > 0, c > 0, ε > 0 such that for every s, t ∈ [0, 1],

E[|Xt − Xs|^p] ≤ c |t − s|^{1+ε}.

Then there exists a version X̃ of X which is a.s. continuous (and even α-Hölder continuous for any α ∈ (0, ε/p)).

Proof. Let Dn = {k 2^{−n}, 0 ≤ k ≤ 2^n} denote the set of dyadic numbers of [0, 1] of level n, so that Dn increases with n. Letting α ∈ (0, ε/p), Markov's inequality gives, for 0 ≤ k < 2^n,

−nα npα p npα −n−nε −n −(ε−pα)n P (|Xk2−n −X(k+1)2−n | > 2 ) ≤ 2 E[|Xk2−n −X(k+1)2−n | ] ≤ 2 2 ≤ 2 2 .

Summing over the 2^n pairs of consecutive points of Dn, we obtain

P( sup_{0≤k<2^n} |X_{k2^{−n}} − X_{(k+1)2^{−n}}| > 2^{−nα} ) ≤ c 2^{−n(ε−pα)},

which is summable. Therefore, the Borel-Cantelli lemma shows that for a.a. ω, there exists Nω such that for n ≥ Nω, the supremum under consideration is ≤ 2^{−nα}. Otherwise said, a.s.,

sup_{n≥0} sup_{0≤k<2^n} |X_{k2^{−n}} − X_{(k+1)2^{−n}}| / 2^{−nα} ≤ M(ω) < ∞.

We claim that this implies that for every s, t ∈ D = ∪_{n≥0} Dn, |Xs − Xt| ≤ M′(ω)|t − s|^α, for some M′(ω) < ∞ a.s. Indeed, if s, t ∈ D with s < t, and if r is the least integer such that t − s > 2^{−r−1}, we can write [s, t) as a disjoint union of intervals of the form [u, u + 2^{−n}) with u ∈ Dn and n > r, in such a way that for every n > r, at most two of these intervals have length 2^{−n}. This entails that

|Xs − Xt| ≤ Σ_{n≥r+1} 2 M(ω) 2^{−nα} ≤ 2 (1 − 2^{−α})^{−1} M(ω) 2^{−(r+1)α} ≤ M′(ω) |t − s|^α,

where M′(ω) < ∞ a.s. Therefore, the process (Xt, t ∈ D) is a.s. uniformly continuous (and even α-Hölder continuous). Since D is everywhere dense in [0, 1], the latter process a.s. admits a unique continuous extension X̃ to [0, 1], which is also α-Hölder continuous (it is consistently defined by X̃t = limn Xtn, where (tn, n ≥ 0) is any D-valued sequence converging to t). On the exceptional set where (Xd, d ∈ D) is not uniformly

continuous, we let X̃t = 0, 0 ≤ t ≤ 1, so that X̃ is continuous. It remains to show that X̃ is a version of X. To this end, we estimate by Fatou's lemma

E[|Xt − X̃t|^p] ≤ lim inf_n E[|Xt − Xtn|^p],

where (tn, n ≥ 0) is any D-valued sequence converging to t. But since E[|Xt − Xtn|^p] ≤ c|t − tn|^{1+ε}, this converges to 0 as n → ∞. Therefore, Xt = X̃t a.s. for every t. □
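As a sanity check of the hypothesis in a standard example (our choice, not part of the notes): for Brownian-type increments Xt − Xs ~ N(0, |t − s|), one has E[|Xt − Xs|^4] = 3|t − s|^2, i.e. p = 4 and ε = 1, so the criterion yields α-Hölder paths for every α < 1/4. A Monte Carlo estimate of the fourth moment:

```python
# Monte Carlo check (illustration): for a centered Gaussian increment with
# variance h, the fourth moment is 3 h^2, matching the Kolmogorov-criterion
# bound E|X_t - X_s|^4 <= c |t - s|^{1+eps} with c = 3, eps = 1.
import random

random.seed(3)
h = 0.01      # the time increment |t - s|
n = 200000    # number of simulated increments
fourth_moment = sum(random.gauss(0.0, h**0.5)**4 for _ in range(n)) / n
ratio = fourth_moment / h**2  # should be close to 3
```

(Brownian motion is in fact α-Hölder for every α < 1/2; taking larger p in the criterion recovers any exponent below 1/2.)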

The nice thing about this criterion is that it depends only on a control of the two-dimensional marginal distributions of the stochastic process. In fact, the very same proof gives the following alternative statement.

Corollary 4.5.1 Let (Xd, d ∈ D) be a stochastic process indexed by the set D of dyadic numbers in [0, 1]. Assume that there exist c, p, ε > 0 so that for every s, t ∈ D,

E[|Xs − Xt|^p] ≤ c |s − t|^{1+ε}.

Then, almost surely, the process (Xd, d ∈ D) admits an extension (X̃t, t ∈ [0, 1]) that is continuous, and even α-Hölder continuous for any α ∈ (0, ε/p).

Chapter 5

Weak convergence

5.1 Definition and characterizations

Let $(M, d)$ be a metric space, endowed with its Borel $\sigma$-algebra. All measures in this chapter will be measures on such a measurable space. Let $(\mu_n, n \ge 0)$ be a sequence of probability measures on $M$. We say that $\mu_n$ converges weakly to the non-negative measure $\mu$ if for every continuous bounded function $f : M \to \mathbb R$, one has $\mu_n(f) \to \mu(f)$. Notice that in this case, $\mu$ is automatically a probability measure since $\mu(1) = \lim_n \mu_n(1) = 1$, and the definition actually still makes sense if we suppose that the $\mu_n$ (resp. $\mu$) are just finite non-negative measures on $M$.

Examples. Let $(x_n, n \ge 0)$ be an $M$-valued sequence that converges to $x$. Then $\delta_{x_n}$ converges weakly to $\delta_x$, where $\delta_a$ is the Dirac mass at $a$. This is just saying that $f(x_n) \to f(x)$ for continuous functions $f$.

Let $M = [0,1]$ and $\mu_n = n^{-1} \sum_{0 \le k \le n-1} \delta_{k/n}$. Then $\mu_n(f)$ is the Riemann sum $n^{-1} \sum_{0 \le k \le n-1} f(k/n)$, which converges to $\int_0^1 f(x)\,dx$ if $f$ is continuous; this shows that $\mu_n$ converges weakly to Lebesgue measure on $[0,1]$.
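As a quick numerical illustration of the second example (a sketch added here, not part of the original argument; the helper name `mu_n` is ad hoc), one can check the Riemann-sum convergence $\mu_n(f) \to \int_0^1 f(x)\,dx$ with NumPy:

```python
import numpy as np

def mu_n(f, n):
    """Integrate f against the measure mu_n = (1/n) * sum_{0 <= k < n} delta_{k/n}."""
    k = np.arange(n)
    return float(f(k / n).mean())

f = lambda x: x ** 2       # a continuous bounded test function on [0, 1]
approx = mu_n(f, 1000)
print(approx)              # close to the Lebesgue integral 1/3
```

Taking larger $n$ brings $\mu_n(f)$ as close to $1/3$ as desired, for any continuous $f$.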

In these two cases, notice that it is not true that $\mu_n(A)$ converges to $\mu(A)$ for every Borel set $A$. This ‘pointwise convergence’ is stronger, but much more rigid, than weak convergence. For example, $\delta_{x_n}$ does not converge in that sense to $\delta_x$ unless $x_n = x$ eventually. See e.g. Chapter III in Stroock's book for a discussion of the various existing notions of convergence for measures.

Theorem 5.1.1 Let (µn, n ≥ 0) be a sequence of probability distributions. The following assertions are equivalent:

1. µn converges weakly to µ

2. For every open subset G of M, lim infn µn(G) ≥ µ(G) (‘open sets can lose mass’)

3. For every closed subset F of M, lim supn µn(F ) ≤ µ(F ) (‘closed sets can gain mass’)

4. For every Borel subset A in M with µ(∂A) = 0, limn µn(A) = µ(A). (‘mass is lost or gained through the boundary’)


Proof. 1. $\Longrightarrow$ 2. Let $G$ be an open subset with nonempty complement $G^c$. The distance function $x \mapsto d(x, G^c)$ is continuous, and positive if and only if $x \in G$. Let $f_M(x) = 1 \wedge (M\,d(x, G^c))$. Then $f_M$ increases to $\mathbf 1_G$ as $M \uparrow \infty$. Now, $\mu_n(f_M) \le \mu_n(G)$ and $\mu_n(f_M)$ converges to $\mu(f_M)$, so that $\liminf_n \mu_n(G) \ge \mu(f_M)$ for every $M$, and by monotone convergence, letting $M \uparrow \infty$, one gets the result.

2. $\Longleftrightarrow$ 3. is obvious by taking complements.

2., 3. $\Longrightarrow$ 4. Let $A^\circ$ and $\bar A$ respectively denote the interior and the closure of $A$. Since $\mu(\partial A) = \mu(\bar A \setminus A^\circ) = 0$, we obtain $\mu(A^\circ) = \mu(\bar A) = \mu(A)$. By 2. and 3.,

\[
\limsup_n \mu_n(\bar A) \le \mu(A) \le \liminf_n \mu_n(A^\circ),
\]
and since $A^\circ \subset A \subset \bar A$, this gives the result.

4. $\Longrightarrow$ 1. Let $f : M \to \mathbb R_+$ be a continuous bounded non-negative function. Then, using Fubini's theorem,

\[
\int_M f(x)\,\mu_n(dx) = \int_M \mu_n(dx) \int_0^\infty \mathbf 1_{\{t \le f(x)\}}\,dt = \int_0^K \mu_n(\{f \ge t\})\,dt,
\]
where $K$ is any upper bound for $f$. Now $\{f \ge t\} := \{x : f(x) \ge t\}$ is a closed subset of $M$, whose boundary is included in $\{f = t\}$, because $\{f > t\}$ is open and included in $\{f \ge t\}$, and their difference is $\{f = t\}$. However, there can be at most a countable set of numbers $t$ such that $\mu(\{f = t\}) > 0$, because
\[
\{t : \mu(\{f = t\}) > 0\} = \bigcup_{n \ge 1} \{t : \mu(\{f = t\}) \ge n^{-1}\},
\]
and the $n$-th set on the right-hand side has at most $n$ elements. Therefore, for Lebesgue-almost all $t$, $\mu(\partial\{f \ge t\}) = 0$, and therefore 4. and dominated convergence over the finite interval $[0, K]$, where the integrated quantities are bounded by $1$, show that $\mu_n(f)$ converges to $\int_0^K \mu(\{f \ge t\})\,dt = \mu(f)$. The case of functions taking values of both signs is immediate. $\square$

As a consequence, one obtains the following important criterion for weak convergence of measures on $\mathbb R$. Recall that the distribution function of a non-negative finite measure $\mu$ on $\mathbb R$ is the càdlàg function defined by $F_\mu(x) = \mu((-\infty, x])$, $x \in \mathbb R$.

Proposition 5.1.1 Let µn, n ≥ 0, µ be probability measures on R. Then the following statements are equivalent:

1. µn converges weakly to µ

2. for every $x \in \mathbb R$ such that $F_\mu$ is continuous at $x$, $F_{\mu_n}(x)$ converges to $F_\mu(x)$ as $n \to \infty$.

Proof. The continuity of $F_\mu$ at $x$ exactly says that $\mu(\partial A_x) = 0$, where $A_x = (-\infty, x]$, so 1. $\Longrightarrow$ 2. is immediate by Theorem 5.1.1.

Conversely, let $G$ be an open subset of $\mathbb R$, which we write as a countable union $\bigcup_k (a_k, b_k)$ of disjoint open intervals. Then
\[
\mu_n(G) = \sum_k \mu_n((a_k, b_k)), \tag{5.1}
\]

while for every $k$ and $a_k < a' < b' < b_k$,

\[
\mu_n((a_k, b_k)) = F_{\mu_n}(b_k-) - F_{\mu_n}(a_k) \ge F_{\mu_n}(b') - F_{\mu_n}(a').
\]

If we take $a', b'$ to be continuity points of $F_\mu$, we then obtain $\liminf_n \mu_n((a_k, b_k)) \ge F_\mu(b') - F_\mu(a')$. Letting $a' \downarrow a_k$, $b' \uparrow b_k$ along continuity points of $F_\mu$ (such points always form a dense set in $\mathbb R$) gives $\liminf_n \mu_n((a_k, b_k)) \ge \mu((a_k, b_k))$. On the other hand, applying Fatou's lemma to (5.1) yields $\liminf_n \mu_n(G) \ge \sum_k \liminf_n \mu_n((a_k, b_k))$, whence $\liminf_n \mu_n(G) \ge \mu(G)$. $\square$
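The role of continuity points in this criterion is visible on the simplest example $\mu_n = \delta_{1/n} \to \delta_0$. The following small Python sketch (an illustration added to the notes; the helper `F_dirac` is ad hoc) evaluates the distribution functions at a continuity point and at the discontinuity point $x = 0$ of $F_{\delta_0}$:

```python
def F_dirac(atom, x):
    """Distribution function of the Dirac mass at `atom`: F(x) = 1_{atom <= x}."""
    return 1.0 if atom <= x else 0.0

# mu_n = delta_{1/n} converges weakly to mu = delta_0.
# At x = 0.5, a continuity point of F_mu, the distribution functions converge to F_mu(0.5) = 1:
vals_cont = [F_dirac(1.0 / n, 0.5) for n in (1, 10, 100)]
# At x = 0, the discontinuity point of F_mu, they do not converge to F_mu(0) = 1:
vals_disc = [F_dirac(1.0 / n, 0.0) for n in (1, 10, 100)]
print(vals_cont, vals_disc, F_dirac(0.0, 0.0))
```

So $F_{\mu_n}(0) = 0$ for every $n$ while $F_\mu(0) = 1$, which does not contradict Proposition 5.1.1 since $0$ is not a continuity point of $F_\mu$.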

5.2 Convergence in distribution for random variables

If $(X_n, n \ge 0)$ is a sequence of random variables with values in a metric space $(M, d)$, defined on possibly different probability spaces $(\Omega_n, \mathcal F_n, P_n)$, we say that $X_n$ converges in distribution to a random variable $X$ on $(\Omega, \mathcal F, P)$ if the law of $X_n$ converges weakly to that of $X$. Otherwise said, $X_n$ converges in distribution to $X$ if for every continuous bounded function $f$, $E[f(X_n)]$ converges to $E[f(X)]$. The following two examples are the probabilistic counterparts of the examples discussed at the beginning of the previous section.

Examples. If $(x_n)$ is a sequence in $M$ that converges to $x$, then $x_n$ converges to $x$ in distribution as $n \to \infty$, if the $x_n$, $n \ge 0$, and $x$ are considered as (constant) random variables!

If $U$ is a uniform random variable on $[0, 1)$ and $U_n = n^{-1} \lfloor nU \rfloor$, we see that $U_n$ has the law $\mu_n = n^{-1} \sum_{0 \le k \le n-1} \delta_{k/n}$ of the previous section and converges in distribution to $U$.

In the two cases we just discussed, the variables under consideration even converge a.s., which directly entails convergence in distribution, see the example sheets.

The notion of convergence in distribution is related to the other notions of convergence for random variables as follows. See the Example sheet 3 for the proof.

Proposition 5.2.1 1. If (Xn, n ≥ 1) is a sequence of random variables that converges in probability to some random variable X, then Xn converges in distribution to X.

2. If (Xn, n ≥ 1) is a sequence of random variables that converges in distribution to some constant random variable c, then Xn converges to c in probability.

Using Proposition 5.1.1, we can discuss the following

Example: the central limit theorem. The central limit theorem says that if $(X_n, n \ge 1)$ is a sequence of i.i.d. random variables in $L^2$ with $m = E[X_1]$ and $\sigma^2 = \mathrm{Var}(X_1)$, then for every $a < b$ in $\mathbb R$,
\[
P\Big(a \le \frac{S_n - mn}{\sigma \sqrt n} \le b\Big) \xrightarrow[n \to \infty]{} \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx,
\]
where $S_n = X_1 + \dots + X_n$. This is exactly saying that $(S_n - mn)/(\sigma \sqrt n)$ converges in distribution as $n \to \infty$ to a Gaussian $\mathcal N(0,1)$ random variable.
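A Monte Carlo sketch of this statement (illustrative only, not part of the notes; the uniform step distribution and the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 400, 40000
# iid Uniform(0,1) samples: m = 1/2, sigma^2 = 1/12
S = rng.random((trials, n)).sum(axis=1)
Z = (S - n * 0.5) / np.sqrt(n / 12.0)      # (S_n - mn) / (sigma sqrt(n))
# empirical P(-1 <= Z <= 1) versus the Gaussian value ~ 0.6827
p_hat = float(np.mean((Z >= -1.0) & (Z <= 1.0)))
print(p_hat)
```

The empirical probability matches $\frac{1}{\sqrt{2\pi}}\int_{-1}^{1} e^{-x^2/2}\,dx \approx 0.6827$ up to Monte Carlo error.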

5.3 Tightness

Definition 5.3.1 Let {µi, i ∈ I} be a family of probability measures on M. This family is said to be tight if for every ε > 0, there exists a compact subset K ⊂ M such that

\[
\sup_{i \in I} \mu_i(M \setminus K) \le \varepsilon,
\]
i.e. most of the mass of the $\mu_i$ is contained in $K$, uniformly in $i \in I$.

Proposition 5.3.1 (Prokhorov's theorem) Suppose that the sequence of probability measures $(\mu_n, n \ge 0)$ on $M$ is tight. Then there exists a subsequence $(\mu_{n_k}, k \ge 0)$ along which $\mu_n$ converges weakly to some limiting probability measure $\mu$.

The proof is considerably eased when $M = \mathbb R$, which we will suppose. For the general case, see Billingsley's book Convergence of Probability Measures. Notice in particular that if $(\mu_n, n \ge 0)$ is a sequence of probability measures on a compact space, then there exists a subsequence $\mu_{n_k}$ weakly converging to some $\mu$.

Proof. Let $F_n$ be the distribution function of $\mu_n$. Then it is easy, by a diagonal extraction argument, to find an extraction $(n_k, k \ge 0)$ and a non-decreasing function $F : \mathbb Q \to [0, 1]$ such that $F_{n_k}(r) \to F(r)$ as $k \to \infty$ for every rational $r$. The function $F$ is extended to $\mathbb R$ as a càdlàg non-decreasing function by the formula $F(x) = \lim_{r \downarrow x, r \in \mathbb Q} F(r)$. It is then elementary, by a monotonicity argument, to show that $F_{n_k}(x) \to F(x)$ for every $x$ which is a continuity point of $F$. To conclude, we must check that $F$ is the distribution function of some measure $\mu$. But tightness shows that for every $\varepsilon > 0$, there exists $A > 0$ such that $F_n(A) \ge 1 - \varepsilon$ and $F_n(-A) \le \varepsilon$ for every $n$. By further choosing $A$ so that $F$ is continuous at $A$ and $-A$, we see that $F(A) \ge 1 - \varepsilon$ and $F(-A) \le \varepsilon$, whence $F$ has limits $0$ and $1$ at $-\infty$ and $+\infty$. By a standard corollary of Carathéodory's theorem, there exists a probability measure $\mu$ having $F$ as its distribution function. $\square$

Remark. The fact that $F_n$ converges up to extraction to a function $F$, which need not be a probability distribution function unless the tightness hypothesis is verified, is a particular case of Helly's theorem, and says that up to extraction, a family of probability laws $\mu_n$ converges vaguely to a possibly defective measure $\mu$ (i.e. of mass $\le 1$), meaning that $\mu_n(f) \to \mu(f)$ for every continuous $f$ with compact support. The problem that can appear is that some of the mass of the $\mu_n$'s ‘goes to infinity’: for example, $\delta_n$ converges vaguely to the zero measure as $n \to \infty$, and does not converge weakly. This phenomenon of mass ‘going away’ is exactly what the tightness hypothesis in Prokhorov's theorem prevents from happening.
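The escape of mass in the example $\delta_n$ can be made concrete with a two-line computation; the snippet below (an added illustration, with ad hoc names) integrates $\delta_n$ against a compactly supported function and against a bounded one:

```python
def integrate_against_delta_n(f, n):
    """mu_n = delta_n (Dirac mass at the integer n), so mu_n(f) = f(n)."""
    return f(float(n))

f_compact = lambda x: max(0.0, 1.0 - abs(x))   # continuous with compact support [-1, 1]
f_bounded = lambda x: 1.0                      # continuous and bounded, no compact support

print([integrate_against_delta_n(f_compact, n) for n in (1, 5, 50)])  # tends to 0
print([integrate_against_delta_n(f_bounded, n) for n in (1, 5, 50)])  # stays equal to 1
```

Against compactly supported test functions the integrals vanish (vague convergence to the zero measure), but $\mu_n(1) = 1$ for all $n$, so there is no weak convergence.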

In many situations, showing that a sequence of random variables $X_n$ converges in distribution to a limiting $X$ with law $\mu$ is done in two steps. One first shows that the sequence $(\mu_n, n \ge 0)$ of laws of the $X_n$ forms a tight sequence. Then, one shows that the limit of $\mu_n$ along any subsequence cannot be other than $\mu$. This will be illustrated in the next section.

5.4 Lévy's convergence theorem

In this section, we let $d$ be a positive integer and consider only random variables with values in the state space $\mathbb R^d$. Recall that the characteristic function of an $\mathbb R^d$-valued random variable $X$ is the function $\Phi_X : \mathbb R^d \to \mathbb C$ defined by $\Phi_X(\lambda) = E[\exp(i\langle \lambda, X\rangle)]$. It is a continuous function on $\mathbb R^d$ such that $\Phi_X(0) = 1$. Moreover, it induces an injective mapping from (distributions of) random variables to complex-valued functions defined on $\mathbb R^d$, in the sense that two random variables with distinct distributions have distinct characteristic functions. The following theorem is extremely useful in practice.

Theorem 5.4.1 (Lévy's convergence theorem) Let $(X_n, n \ge 0)$ be a sequence of random variables.

(i) If $X_n$ converges in distribution to a random variable $X$, then $\Phi_{X_n}(\lambda)$ converges to $\Phi_X(\lambda)$ for every $\lambda \in \mathbb R^d$.

(ii) If $\Phi_{X_n}(\lambda)$ converges to $\Psi(\lambda)$ for every $\lambda \in \mathbb R^d$, where $\Psi$ is a function which is continuous at $0$, then $\Psi$ is a characteristic function, i.e. there exists a random variable $X$ such that $\Psi = \Phi_X$, and moreover, $X_n$ converges in distribution to $X$.

Corollary 5.4.1 If $(X_n, n \ge 0)$ and $X$ are random variables in $\mathbb R^d$, then $X_n$ converges in distribution to $X$ as $n \to \infty$ if and only if $\Phi_{X_n}$ converges to $\Phi_X$ pointwise.
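The corollary can be observed numerically: the empirical characteristic functions of CLT-normalized sums approach the Gaussian characteristic function $e^{-\lambda^2/2}$ pointwise. The following sketch is an added illustration (the helper `empirical_cf`, the sample sizes, and the choice $\lambda = 1.3$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_cf(samples, lam):
    """Monte Carlo estimate of Phi_X(lam) = E[exp(i lam X)] from samples of X."""
    return np.exp(1j * lam * samples).mean()

lam, trials = 1.3, 50000
target = np.exp(-lam ** 2 / 2.0)     # characteristic function of N(0, 1) at lam
errors = {}
for n in (1, 10, 100):
    S = rng.random((trials, n)).sum(axis=1)      # sums of iid uniforms
    Z = (S - n / 2.0) / np.sqrt(n / 12.0)        # CLT normalization
    errors[n] = abs(empirical_cf(Z, lam) - target)
print(errors)    # small for large n: Phi_{X_n}(lam) -> Phi_X(lam)
```

For $n = 1$ the normalized uniform is far from Gaussian and the gap is visible; for large $n$ the gap is within Monte Carlo error, consistent with convergence in distribution.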

The proof of (i) in Lévy's theorem is immediate since the function $x \mapsto \exp(i\langle \lambda, x\rangle)$ is continuous and bounded from $\mathbb R^d$ to $\mathbb C$. For the proof of (ii), we will need to show that the hypotheses imply the tightness of the sequence of laws of the $X_n$, $n \ge 0$. To this end, the following bound is very useful.

Lemma 5.4.1 Let $X$ be a random variable with values in $\mathbb R^d$. Then for any norm $\|\cdot\|$ on $\mathbb R^d$ there exists a constant $C > 0$ (depending on $d$ and on the choice of the norm) such that
\[
P(\|X\| \ge K) \le C K^d \int_{[-K^{-1}, K^{-1}]^d} (1 - \Re\,\Phi_X(u))\,du.
\]

Proof. Let µ be the distribution of X. Using Fubini’s theorem and a simple recursion, it is easy to check that

\[
\frac{1}{\lambda^d} \int_{[-\lambda, \lambda]^d} (1 - \Re\,\Phi_X(u))\,du = 2^d \int_{\mathbb R^d} \mu(dx) \Big(1 - \prod_{i=1}^d \frac{\sin(\lambda x_i)}{\lambda x_i}\Big).
\]

Now, the continuous function $\mathrm{sinc} : t \in \mathbb R \mapsto t^{-1} \sin t$ is such that there exists $0 < c < 1$ with $|\mathrm{sinc}\, t| \le c$ for every $|t| \ge 1$, so that $f : u \in \mathbb R^d \mapsto \prod_{i=1}^d \sin u_i / u_i$ satisfies $|f(u)| \le c$ as soon as $\|u\|_\infty \ge 1$. Therefore, $1 - f$ is a non-negative continuous function which is $\ge 1 - c$ when $\|u\|_\infty \ge 1$. Letting $C = 2^d (1 - c)^{-1}$ entails that $C(1 - f(u)) \ge \mathbf 1_{\{\|u\|_\infty \ge 1\}}$. Putting things together, one gets the result for the norm $\|\cdot\|_\infty$, and the general result follows from the equivalence of norms in finite-dimensional vector spaces. $\square$
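The bound of Lemma 5.4.1 can be checked numerically in dimension $d = 1$ for a standard Gaussian, where both sides are computable. The snippet below is an added illustration (the constant $C = 2(1 - \sin 1)^{-1}$ is read off from the proof for $d = 1$, and the integral is evaluated by a crude Riemann sum):

```python
import numpy as np
from math import erfc, sin, sqrt

# d = 1, X ~ N(0, 1): Phi_X(u) = exp(-u^2 / 2), and P(|X| >= K) = erfc(K / sqrt(2)).
c = sin(1.0)              # sup_{|t| >= 1} |sin(t) / t| = sin(1) < 1
C = 2.0 / (1.0 - c)       # the constant 2^d (1 - c)^{-1} of the lemma, for d = 1

def rhs(K, m=200001):
    u = np.linspace(-1.0 / K, 1.0 / K, m)
    integrand = 1.0 - np.exp(-u ** 2 / 2.0)     # 1 - Re Phi_X(u)
    return C * K * float(np.sum(integrand) * (u[1] - u[0]))

for K in (1.0, 2.0, 4.0):
    tail = erfc(K / sqrt(2.0))
    assert tail <= rhs(K)                       # the lemma's bound holds
    print(K, tail, rhs(K))
```

The bound is far from sharp, but it only needs to control the tail uniformly, which is what the tightness argument below uses.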

Proof of Lévy's theorem. Suppose $\Phi_{X_n}$ converges pointwise to a limit $\Psi$ that is continuous at $0$. Then, $|1 - \Re\,\Phi_{X_n}|$ being bounded above by $2$, fixing $\varepsilon > 0$, the dominated convergence theorem shows that for any $K > 0$,
\[
\lim_n K^d \int_{[-K^{-1}, K^{-1}]^d} (1 - \Re\,\Phi_{X_n}(u))\,du = K^d \int_{[-K^{-1}, K^{-1}]^d} (1 - \Re\,\Psi(u))\,du.
\]

By taking $K$ large enough, we can make this limiting value $< \varepsilon/(2C)$, where $C$ is the constant of Lemma 5.4.1, because $\Psi$ is continuous at $0$; it follows by the lemma that for every $n$ large enough, $P(|X_n| \ge K) \le \varepsilon$. Up to increasing $K$, this then holds for every $n$, showing tightness of the family of laws of the $X_n$. Therefore, up to extracting a subsequence, we see from Prokhorov's theorem that $X_n$ converges in distribution to a limiting $X$, so that $\Phi_{X_n}$ converges pointwise to $\Phi_X$ along this subsequence (by part (i)). This is possible only if $\Phi_X = \Psi$, showing that $\Psi$ is a characteristic function. Moreover, this shows that the law of $X$ is the only possible probability measure which is the weak limit of the laws of the $X_n$ along some subsequence, so $X_n$ must converge to $X$ in distribution.

More precisely, if Xn did not converge in distribution to X, we could find a continuous bounded f, some ε > 0 and a subsequence Xnk such that for all k,

|E[f(Xnk )] − E[f(X)]| > ε (5.2)

But since the laws of $(X_{n_k}, k \ge 0)$ are tight, we could find a further subsequence along which $X_{n_k}$ converges in distribution to some $X'$, which by (i) would satisfy $\Phi_{X'} = \Psi = \Phi_X$ and thus have the same distribution as $X$, contradicting (5.2). $\square$

Chapter 6

Introduction to Brownian motion

6.1 History up to Wiener’s theorem

This chapter is devoted to the construction and some properties of one of probability theory's most fundamental objects. Brownian motion earned its name after R. Brown, who observed around 1827 that tiny particles of pollen in water have an extremely erratic motion. Physicists observed that this is due to the large number of random shocks that the particles receive from the (much smaller) water molecules in motion in the liquid. A. Einstein established in 1905 the first mathematical basis of Brownian motion, by showing that it must be an isotropic Gaussian process. The first rigorous mathematical construction of Brownian motion is due to N. Wiener in 1923, using Fourier theory. In order to motivate the introduction of this object, we first begin with a “microscopic” depiction of Brownian motion. Suppose $(X_n, n \ge 0)$ is a sequence of $\mathbb R^d$-valued random variables with mean $0$ and covariance matrix $\sigma^2 I_d$, where $I_d$ is the identity matrix in $d$ dimensions, for some $\sigma^2 > 0$. Namely, if $X_1 = (X_1^1, \dots, X_1^d)$,

\[
E[X_1^i] = 0, \qquad E[X_1^i X_1^j] = \sigma^2 \delta_{ij}, \qquad 1 \le i, j \le d.
\]

We interpret $X_n$ as the spatial displacement resulting from the shocks due to water molecules during the $n$-th time interval; the fact that the covariance matrix is scalar stands for an isotropy assumption (no direction of space is privileged).

From this, we let Sn = X1 + ... + Xn and we embed this discrete-time process into continuous time by letting

\[
B^{(n)}_t = n^{-1/2} S_{[nt]}, \qquad t \ge 0.
\]

Let $|\cdot|$ be the Euclidean norm on $\mathbb R^d$ and for $t > 0$ and $x \in \mathbb R^d$, define
\[
p_t(x) = \frac{1}{(2\pi t)^{d/2}} \exp\Big(-\frac{|x|^2}{2t}\Big),
\]
which is the density of the Gaussian distribution $\mathcal N(0, t I_d)$ with mean $0$ and covariance matrix $t I_d$. By convention, the Gaussian law $\mathcal N(m, 0)$ is the Dirac mass at $m$.


Proposition 6.1.1 Let $0 \le t_1 < t_2 < \dots < t_k$. Then the finite marginal distributions of $B^{(n)}$ with respect to the times $t_1, \dots, t_k$ converge weakly as $n \to \infty$. More precisely, if $F$ is a bounded continuous function, and letting $x_0 = 0$, $t_0 = 0$,
\[
E\big[F(B^{(n)}_{t_1}, \dots, B^{(n)}_{t_k})\big] \xrightarrow[n \to \infty]{} \int_{(\mathbb R^d)^k} F(x_1, \dots, x_k) \prod_{1 \le i \le k} p_{\sigma^2(t_i - t_{i-1})}(x_i - x_{i-1})\,dx_i.
\]

Otherwise said, $(B^{(n)}_{t_1}, \dots, B^{(n)}_{t_k})$ converges in distribution to $(G_1, G_2, \dots, G_k)$, which is a random vector whose law is characterized by the fact that $(G_1, G_2 - G_1, \dots, G_k - G_{k-1})$ are independent centered Gaussian random variables with respective covariance matrices $\sigma^2(t_i - t_{i-1}) I_d$.

Proof. With the notations of the proposition, we first check that $(B^{(n)}_{t_1}, B^{(n)}_{t_2} - B^{(n)}_{t_1}, \dots, B^{(n)}_{t_k} - B^{(n)}_{t_{k-1}})$ is a sequence of independent random variables. Indeed, one has, for $1 \le i \le k$,

\[
B^{(n)}_{t_i} - B^{(n)}_{t_{i-1}} = \frac{1}{\sqrt n} \sum_{j = [nt_{i-1}] + 1}^{[nt_i]} X_j,
\]
and the independence follows from the fact that $(X_j, j \ge 1)$ is an i.i.d. family. Even better, we have the identity in distribution for the $i$-th increment

\[
B^{(n)}_{t_i} - B^{(n)}_{t_{i-1}} \stackrel{d}{=} \frac{\sqrt{[nt_i] - [nt_{i-1}]}}{\sqrt n} \cdot \frac{1}{\sqrt{[nt_i] - [nt_{i-1}]}} \sum_{j=1}^{[nt_i] - [nt_{i-1}]} X_j,
\]
and the central limit theorem shows that this converges in distribution to a Gaussian law $\mathcal N(0, \sigma^2(t_i - t_{i-1}) I_d)$. Summing up our study, and introducing characteristic functions, we have shown that for every $\xi = (\xi_j, 1 \le j \le k)$,

\[
E\Big[\exp\Big(i \sum_{j=1}^k \langle \xi_j, B^{(n)}_{t_j} - B^{(n)}_{t_{j-1}}\rangle\Big)\Big] = \prod_{j=1}^k E\big[\exp\big(i \langle \xi_j, B^{(n)}_{t_j} - B^{(n)}_{t_{j-1}}\rangle\big)\big] \xrightarrow[n \to \infty]{} \prod_{j=1}^k E\big[\exp\big(i \langle \xi_j, G_j - G_{j-1}\rangle\big)\big] = E\Big[\exp\Big(i \sum_{j=1}^k \langle \xi_j, G_j - G_{j-1}\rangle\Big)\Big],
\]
where $(G_1, \dots, G_k)$ is distributed as in the statement of the proposition. By Lévy's convergence theorem we deduce that the increments of $B^{(n)}$ between the times $t_i$ converge in distribution to the increments of the sequence $(G_i)$, which is easily seen to be equivalent to the statement. $\square$

This gives the clue that $B^{(n)}$ should converge to a process $B$ whose increments are independent and Gaussian, with covariances dictated by the above formula. This will be set in a rigorous way at the end of this chapter, with Donsker's invariance theorem.
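The convergence of the marginals of $B^{(n)}$ can be checked by simulation. The sketch below is an added illustration (a $\pm 1$ step distribution is chosen, so $\sigma^2 = 1$; the function name and sample sizes are arbitrary); it estimates the mean and variance of an increment over $(1/4, 1/2]$:

```python
import numpy as np

rng = np.random.default_rng(2)

def marginals(n, times, trials):
    """Simulate B^{(n)}_t = n^{-1/2} S_{[nt]} for a +-1 random walk (mean 0, variance 1)."""
    steps = rng.choice([-1.0, 1.0], size=(trials, n))
    S = np.hstack([np.zeros((trials, 1)), np.cumsum(steps, axis=1)])   # S_0 = 0
    idx = (n * np.asarray(times)).astype(int)
    return S[:, idx] / np.sqrt(n)

paths = marginals(1000, [0.25, 0.5, 1.0], 20000)
inc = paths[:, 1] - paths[:, 0]                 # increment over (1/4, 1/2]
print(float(inc.mean()), float(inc.var()))      # approx 0 and 0.25 = t_2 - t_1
```

The empirical variance matches $\sigma^2(t_2 - t_1) = 1/4$, as Proposition 6.1.1 predicts for the limiting Gaussian increment.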

Definition 6.1.1 An $\mathbb R^d$-valued stochastic process $(B_t, t \ge 0)$ is called a standard Brownian motion if it is a continuous process that satisfies the following conditions:

(i) B0 = 0 a.s.,

(ii) for every $0 = t_0 \le t_1 \le t_2 \le \dots \le t_k$, the increments $(B_{t_1} - B_{t_0}, B_{t_2} - B_{t_1}, \dots, B_{t_k} - B_{t_{k-1}})$ are independent, and

(iii) for every $t, s \ge 0$, the law of $B_{t+s} - B_t$ is Gaussian with mean $0$ and covariance matrix $sI_d$.

The term “standard” refers to the fact that $B_1$ is normalized to have covariance matrix $I_d$, and to the choice $B_0 = 0$. The characteristic properties (i), (ii), (iii) exactly amount to saying that the finite-dimensional marginals of a Brownian motion are given by the formula of Proposition 6.1.1 (with $\sigma^2 = 1$). Therefore the law of Brownian motion is uniquely determined. We now show Wiener's theorem that Brownian motion exists!

Theorem 6.1.1 (Wiener) There exists a Brownian motion on some probability space.

Proof. We will first prove the theorem in dimension $d = 1$ and construct a process $(B_t, 0 \le t \le 1)$ satisfying the properties of a Brownian motion. Let $D_0 = \{0, 1\}$, $D_n = \{k2^{-n}, 0 \le k \le 2^n\}$ for $n \ge 1$, and let $D = \bigcup_{n \ge 0} D_n$ be the set of dyadic rational numbers in $[0, 1]$. On some probability space $(\Omega, \mathcal F, P)$, let $(Z_d, d \in D)$ be a collection of independent random variables, all having the Gaussian distribution $\mathcal N(0, 1)$ with mean $0$ and variance $1$. We are first going to construct the process $(B_d, d \in D)$ in such a way that $B_d$ is a linear combination of the $Z_{d'}$'s for every $d$.

It is a well-known and important fact that if random variables $X_1, X_2, \dots$ are linear combinations of independent centered Gaussian random variables, then $X_1, X_2, \dots$ are independent if and only if they are pairwise uncorrelated, namely $\mathrm{Cov}(X_i, X_j) = E[X_i X_j] = 0$ for every $i \ne j$.

We set $B_0 = 0$ and $B_1 = Z_1$. Inductively, given $(B_d, d \in D_{n-1})$, we build $(B_d, d \in D_n)$ in such a way that

• (Bd, d ∈ Dn) satisfies (i), (ii), (iii) in the definition of the Brownian motion (where the instants under consideration are taken in Dn).

• the random variables (Zd, d ∈ D \ Dn) are independent of (Bd, d ∈ Dn).

To this end, take $d \in D_n \setminus D_{n-1}$, and let $d_- = d - 2^{-n}$ and $d_+ = d + 2^{-n}$, so that $d_-$, $d_+$ are consecutive dyadic numbers in $D_{n-1}$. Then write
\[
B_d = \frac{B_{d_-} + B_{d_+}}{2} + \frac{Z_d}{2^{(n+1)/2}}.
\]

Then $B_d - B_{d_-} = (B_{d_+} - B_{d_-})/2 + Z_d/2^{(n+1)/2}$ and $B_{d_+} - B_d = (B_{d_+} - B_{d_-})/2 - Z_d/2^{(n+1)/2}$. Now notice that $N_d := (B_{d_+} - B_{d_-})/2$ and $N'_d := Z_d/2^{(n+1)/2}$ are, by the induction hypothesis, two independent centered Gaussian random variables with variance $2^{-n-1}$. From this, one deduces $\mathrm{Cov}(N_d + N'_d, N_d - N'_d) = \mathrm{Var}(N_d) - \mathrm{Var}(N'_d) = 0$, so that

the increments $B_d - B_{d_-}$ and $B_{d_+} - B_d$ are independent with variance $2^{-n}$, as they should be. Moreover, these increments are independent of the increments $B_{d' + 2^{-(n-1)}} - B_{d'}$ for $d' \in D_{n-1}$, $d' \ne d_-$, and of the $Z_{d'}$, $d' \in D_n \setminus D_{n-1}$, $d' \ne d$, so they are independent of the increments $B_{d'' + 2^{-n}} - B_{d''}$ for $d'' \in D_n$, $d'' \notin \{d_-, d\}$. This allows the induction argument to proceed one step further.

Thus, we have a process $(B_d, d \in D)$ satisfying the properties of Brownian motion. Let $s \le t \in D$, and notice that for every $p > 0$, since $B_t - B_s$ has the same law as $\sqrt{t - s}\,N$, where $N$ is a standard Gaussian random variable,

\[
E[|B_t - B_s|^p] = |t - s|^{p/2} E[|N|^p].
\]

Since a Gaussian random variable admits moments of all orders, it follows from Corollary 4.5.1 that $(B_d, d \in D)$ a.s. admits a continuous extension $(B_t, 0 \le t \le 1)$. Up to modifying $B$ on the exceptional event where such an extension does not exist (replacing it by the $0$ function, for instance), we see that $B$ can be supposed to be continuous for every $\omega$.

We now check that $(B_t, t \in [0, 1])$ thus constructed has the properties of Brownian motion. Let $0 = t_0 < t_1 < \dots < t_k$, and let $0 = t_0^n < t_1^n < \dots < t_k^n$ be dyadic numbers such that $t_i^n$ converges to $t_i$ as $n \to \infty$. Then by continuity, $(B_{t_1^n}, \dots, B_{t_k^n})$ converges a.s. to $(B_{t_1}, \dots, B_{t_k})$ as $n \to \infty$, while on the other hand, $(B_{t_j^n} - B_{t_{j-1}^n}, 1 \le j \le k)$ are independent Gaussian random variables with variances $(t_j^n - t_{j-1}^n, 1 \le j \le k)$, so it is not difficult, using Lévy's theorem, to see that this converges in distribution to independent Gaussian random variables with respective variances $t_j - t_{j-1}$, which thus is the distribution of

$(B_{t_j} - B_{t_{j-1}}, 1 \le j \le k)$, as wanted. It is now easy to construct a Brownian motion indexed by $\mathbb R_+$: simply take independent standard Brownian motions $(B^i_t, 0 \le t \le 1)$, $i \ge 0$, as we just constructed, and let

\[
B_t = \sum_{i=0}^{\lfloor t \rfloor - 1} B^i_1 + B^{\lfloor t \rfloor}_{t - \lfloor t \rfloor}, \qquad t \ge 0.
\]
It is easy to check that this has the wanted properties. Finally, it is straightforward to build a Brownian motion in $\mathbb R^d$, by taking $d$ independent copies $B^1, \dots, B^d$ of $B$ and checking that $((B^1_t, \dots, B^d_t), t \ge 0)$ is a Brownian motion in $\mathbb R^d$. $\square$
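The dyadic induction in the proof translates directly into an algorithm (often called Lévy's, or the Lévy-Ciesielski, midpoint construction). The following Python sketch is an added illustration of the recursion $B_d = (B_{d_-} + B_{d_+})/2 + Z_d/2^{(n+1)/2}$, together with a statistical sanity check:

```python
import numpy as np

rng = np.random.default_rng(3)

def brownian_dyadic(levels):
    """Lévy's midpoint construction of (B_d, d dyadic in [0, 1]) down to mesh 2^{-levels}."""
    B = {0.0: 0.0, 1.0: float(rng.standard_normal())}     # B_0 = 0, B_1 = Z_1
    for n in range(1, levels + 1):
        step = 2.0 ** (-n)
        for k in range(1, 2 ** n, 2):                     # d = k 2^{-n} in D_n \ D_{n-1}
            d = k * step
            Z = float(rng.standard_normal())
            B[d] = (B[d - step] + B[d + step]) / 2.0 + Z / 2.0 ** ((n + 1) / 2.0)
    return B

# sanity check: over many independent runs, Var(B_{1/2}) should be close to 1/2
samples = np.array([brownian_dyadic(1)[0.5] for _ in range(20000)])
print(float(samples.var()))
```

The dyadic points are exact binary floats, so the dictionary lookups `B[d - step]` and `B[d + step]` hit the values built at the previous level.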

Let $\Omega_W = C(\mathbb R_+, \mathbb R^d)$ be the ‘Wiener space’ of continuous functions, endowed with the product $\sigma$-algebra $\mathcal W$ (or the Borel $\sigma$-algebra associated with the compact-open topology). Let $X_t(w) = w(t)$, $t \ge 0$, denote the canonical process ($w \in \Omega_W$).

Proposition 6.1.2 (Wiener’s measure) There exists a unique measure W0(dw) on (ΩW , W), such that (Xt, t ≥ 0) is a standard Brownian motion on (ΩW , W,W0(dw)).

Proof. Let $(B_t, t \ge 0)$ be a standard Brownian motion defined on some probability space $(\Omega, \mathcal F, P)$. The distribution of $B$, i.e. the image measure of $P$ under the random variable

$B : \Omega \to \Omega_W$, is a measure $W_0(dw)$ satisfying the conditions of the statement. Uniqueness is obvious because such a measure is determined by the finite-dimensional marginals of Brownian motion. $\square$

For $x \in \mathbb R^d$ we also let $W_x(dw)$ be the image measure of $W_0$ under $(w_t, t \ge 0) \mapsto (x + w_t, t \ge 0)$. A (continuous) process with law $W_x(dw)$ is called a Brownian motion started at $x$. We let $(\mathcal F^B_t, t \ge 0)$ be the natural filtration of $(B_t, t \ge 0)$. Notice that Kolmogorov's continuity criterion shows that a standard Brownian motion is also a.s. locally Hölder continuous of any exponent $\alpha < 1/2$, since it is of any exponent $\alpha < 1/2 - 1/p$, for every integer $p$.

6.2 First properties

The following basic (and fundamental) invariance properties of Brownian motion are left as an exercise.

Proposition 6.2.1 Let B be a standard Brownian motion in Rd.

1. If $U \in O(d)$ is an orthogonal matrix, then $UB = (UB_t, t \ge 0)$ is again a Brownian motion. In particular, $-B$ is a Brownian motion.

−1/2 2. If λ > 0 then (λ Bλt, t ≥ 0) is a standard Brownian motion (scaling property)

3. For every $t \ge 0$, the shifted process $(B_{t+s} - B_t, s \ge 0)$ is a Brownian motion independent of $\mathcal F^B_t$ (simple Markov property).
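Property 2 can be checked by simulation; the following added sketch compares the variance of $\lambda^{-1/2} B_{\lambda t}$ at $t = 1$ with $\mathrm{Var}(B_1) = 1$ (the Gaussian-increment discretization and the sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def bm_at(t, n_steps, trials):
    """Approximate samples of B_t via a Gaussian random walk with n_steps steps."""
    dt = t / n_steps
    return np.cumsum(rng.standard_normal((trials, n_steps)) * np.sqrt(dt), axis=1)[:, -1]

lam = 4.0
# scaling property: lambda^{-1/2} B_{lambda t} has the law of B_t; compare variances at t = 1
scaled = bm_at(lam * 1.0, 1000, 5000) / np.sqrt(lam)
print(float(scaled.var()))    # approx 1 = Var(B_1)
```

Up to Monte Carlo error, the rescaled variable has unit variance, as the scaling property demands.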

We now turn to less trivial path properties of Brownian motion. We begin with

Theorem 6.2.1 (Blumenthal's $0-1$ law) Let $B$ be a standard Brownian motion. The $\sigma$-algebra $\mathcal F^B_{0+} = \bigcap_{\varepsilon > 0} \mathcal F^B_\varepsilon$ is trivial, i.e. constituted of events of probability $0$ or $1$.

Proof. Let $0 < t_1 < t_2 < \dots < t_k$ and $A \in \mathcal F^B_{0+}$. Then if $F$ is a continuous bounded function $(\mathbb R^d)^k \to \mathbb R$, we have, by continuity of $B$ and the dominated convergence theorem,
\[
E[\mathbf 1_A F(B_{t_1}, \dots, B_{t_k})] = \lim_{\varepsilon \downarrow 0} E[\mathbf 1_A F(B^{(\varepsilon)}_{t_1 - \varepsilon}, \dots, B^{(\varepsilon)}_{t_k - \varepsilon})],
\]

where $B^{(\varepsilon)} = (B_{t + \varepsilon} - B_\varepsilon, t \ge 0)$. On the other hand, since $A$ is $\mathcal F^B_\varepsilon$-measurable for any $\varepsilon > 0$, the simple Markov property shows that this is equal to

\[
P(A) \lim_{\varepsilon \downarrow 0} E[F(B_{t_1 - \varepsilon}, \dots, B_{t_k - \varepsilon})],
\]

which is $P(A) E[F(B_{t_1}, \dots, B_{t_k})]$, using again dominated convergence and the continuity of $B$ and $F$. This entails that $\mathcal F^B_{0+}$ is independent of $\sigma(B_s, s \ge 0) = \mathcal F^B_\infty$. However, $\mathcal F^B_\infty$ contains $\mathcal F^B_{0+}$, so that the latter $\sigma$-algebra is independent of itself, and $P(A) = P(A \cap A) = P(A)^2$, entailing the result. $\square$

Proposition 6.2.2 (i) For $d = 1$ and $t \ge 0$, let $S_t = \sup_{0 \le s \le t} B_s$ and $I_t = \inf_{0 \le s \le t} B_s$ (these are random variables because $B$ is continuous). Then almost surely, for every $\varepsilon > 0$, one has $S_\varepsilon > 0$ and $I_\varepsilon < 0$. In particular, there exists a zero of $B$ in any interval of the form $(0, \varepsilon)$, $\varepsilon > 0$.

(ii) A.s., $\sup_{t \ge 0} B_t = +\infty$ and $\inf_{t \ge 0} B_t = -\infty$.

(iii) Let $C$ be an open cone in $\mathbb R^d$ with non-empty interior and origin at $0$, i.e. a set of the form $\{tu : t > 0, u \in A\}$, where $A$ is a non-empty open subset of the unit sphere of $\mathbb R^d$. If $H_C = \inf\{t > 0 : B_t \in C\}$ is the first hitting time of $C$, then $H_C = 0$ a.s.

Proof. (i) The probability that $B_t > 0$ is $1/2$ for every $t > 0$, so $P(S_t > 0) \ge 1/2$, and therefore if $(t_n, n \ge 0)$ is any sequence decreasing to $0$, $P(\limsup_n \{B_{t_n} > 0\}) \ge \limsup_n P(B_{t_n} > 0) = 1/2$. Since the event $\limsup_n \{B_{t_n} > 0\}$ is in $\mathcal F^B_{0+}$, Blumenthal's law shows that its probability must be $1$, and on this event $S_\varepsilon > 0$ for every $\varepsilon > 0$. The same is true for the infimum by considering the Brownian motion $-B$.

(ii) Let $S_\infty = \sup_{t \ge 0} B_t$. By scaling invariance, for every $\lambda > 0$, $\lambda S_\infty = \sup_{t \ge 0} \lambda B_t$ has the same law as $\sup_{t \ge 0} B_{\lambda^2 t} = S_\infty$. This is possible only if $S_\infty \in \{0, \infty\}$ a.s., and it cannot be $0$ by (i).

(iii) The cone C is invariant by multiplication by a positive scalar, so that P (Bt ∈ C) is the same as P (B1 ∈ C) for every t by the scaling invariance of Brownian motion. Now, if C has nonempty interior, it is straightforward to check that P (B1 ∈ C) > 0, and one concludes similarly as above. Details are left to the reader. 

6.3 The strong Markov property

We now want to prove an important analogue of the simple Markov property, where deterministic times are replaced by stopping times. To begin with, we extend the definition of Brownian motion a little, by allowing it to start from a random location, and by working with filtrations that are larger than the natural filtration of standard Brownian motions.

We say that $B$ is a Brownian motion (started at $B_0$) if $(B_t - B_0, t \ge 0)$ is a standard Brownian motion which is independent of $B_0$. Otherwise said, it is the same as the definition of a standard Brownian motion, except that we do not require $B_0 = 0$. If we want to express this on the Wiener space with the Wiener measure, we have, for every measurable functional $F : \Omega_W \to \mathbb R_+$,

\[
E[F(B_t, t \ge 0)] = E[F(B_t - B_0 + B_0, t \ge 0)],
\]
and since $(B_t - B_0, t \ge 0)$ has law $W_0$ and is independent of $B_0$, this is

\[
\int_{\mathbb R^d} P(B_0 \in dx) \int_{\Omega_W} W_0(dw) F(x + w(t), t \ge 0) = \int_{\mathbb R^d} P(B_0 \in dx)\, W_x(F) = E[W_{B_0}(F)],
\]

where, as above, $W_x$ is the image of $W_0$ under the translation $w \mapsto x + w$, and $W_{B_0}(F)$ is the random variable $\omega \mapsto W_{B_0(\omega)}(F)$. Using Proposition 1.3.4 actually shows that

\[
E[F(B)\,|\,B_0] = W_{B_0}(F).
\]

Let $(\mathcal F_t, t \ge 0)$ be a filtration. We say that a Brownian motion $B$ is an $(\mathcal F_t)$-Brownian motion if $B$ is adapted to $(\mathcal F_t)$, and if $B^{(t)} = (B_{t+s} - B_t, s \ge 0)$ is independent of $\mathcal F_t$ for every $t \ge 0$. For instance, if $(\mathcal F_t)$ is the natural filtration of a $2$-dimensional Brownian motion $((B^1_t, B^2_t), t \ge 0)$, then $(B^1_t, t \ge 0)$ is an $(\mathcal F_t)$-Brownian motion. If $B'$ is a standard Brownian motion and $X$ is a random variable independent of $B'$, then $B = (X + B'_t, t \ge 0)$ is a Brownian motion (started at $B_0 = X$), and it is an $(\mathcal F^B_t) = (\sigma(X) \vee \mathcal F^{B'}_t)$-Brownian motion. A Brownian motion is always an $(\mathcal F^B_t)$-Brownian motion. If $B$ is a standard Brownian motion, then the completed filtration $\mathcal F_t = \mathcal F^B_t \vee \mathcal N$ ($\mathcal N$ being the set of events of probability $0$) can be shown to be right-continuous, i.e. $\mathcal F_{t+} = \mathcal F_t$ for every $t \ge 0$, and $B$ is an $(\mathcal F_t)$-Brownian motion. Let $(B_t, t \ge 0)$ be an $(\mathcal F_t)$-Brownian motion in $\mathbb R^d$ and let $T$ be an $(\mathcal F_t)$-stopping time. We let $B^{(T)}_t = B_{T+t} - B_T$ for every $t \ge 0$ on the event $\{T < \infty\}$, and $B^{(T)}_t = 0$ otherwise. Then

Theorem 6.3.1 (Strong Markov property) Conditionally on $\{T < \infty\}$, the process $B^{(T)}$ is a standard Brownian motion, which is independent of $\mathcal F_T$. Otherwise said, conditionally given $\mathcal F_T$ and $\{T < \infty\}$, the process $(B_{T+t}, t \ge 0)$ is an $(\mathcal F_{T+t})$-Brownian motion started at $B_T$.

Proof. Suppose first that $T < \infty$ a.s. Let $A \in \mathcal F_T$, and consider times $t_1 < t_2 < \dots < t_k$. We want to show that for every bounded continuous function $F$ on $(\mathbb R^d)^k$,
\[
E[\mathbf 1_A F(B^{(T)}_{t_1}, \dots, B^{(T)}_{t_k})] = P(A) E[F(B_{t_1}, \dots, B_{t_k})]. \tag{6.1}
\]

Indeed, taking $A = \Omega$ entails that $B^{(T)}$ is a Brownian motion, while letting $A$ vary in $\mathcal F_T$ entails the independence of $(B^{(T)}_{t_1}, \dots, B^{(T)}_{t_k})$ and $\mathcal F_T$ for every $t_1, \dots, t_k$, hence of $B^{(T)}$ and $\mathcal F_T$. Now, suppose first that $T$ takes its values in a countable subset $E$ of $\mathbb R_+$. Then
\[
E[\mathbf 1_A F(B^{(T)}_{t_1}, \dots, B^{(T)}_{t_k})] = \sum_{s \in E} E[\mathbf 1_{A \cap \{T = s\}} F(B^{(s)}_{t_1}, \dots, B^{(s)}_{t_k})] = \sum_{s \in E} P(A \cap \{T = s\}) E[F(B_{t_1}, \dots, B_{t_k})],
\]
where we used the simple Markov property and the fact that $A \cap \{T = s\} \in \mathcal F_s$ by definition. Back to the general case, we can apply this result to the stopping time $T_n = 2^{-n} \lceil 2^n T \rceil$. Since $T_n \ge T$, it holds that $\mathcal F_T \subset \mathcal F_{T_n}$, so that we obtain, for $A \in \mathcal F_T$,

\[
E[\mathbf 1_A F(B^{(T_n)}_{t_1}, \dots, B^{(T_n)}_{t_k})] = P(A) E[F(B_{t_1}, \dots, B_{t_k})]. \tag{6.2}
\]

Now, by a.s. continuity of $B$, it holds that $B^{(T_n)}_t$ converges a.s. to $B^{(T)}_t$ as $n \to \infty$, for every $t \ge 0$. Since $F$ is bounded, the dominated convergence theorem allows us to pass to the limit in (6.2), obtaining (6.1). Finally, if $P(T = \infty) > 0$, check that (6.1) remains true when replacing $A$ by $A \cap \{T < \infty\}$, and divide by $P(\{T < \infty\})$. $\square$

An important example of application of the strong Markov property is the so-called reflection principle. Recall that St = sup0≤s≤t Bs.

Theorem 6.3.2 (Reflection principle) Let (Bt, t ≥ 0) be an (Ft)-Brownian motion started at 0, and T be an (Ft)-stopping time. Then, the process

\[
\widetilde B_t = B_t \mathbf 1_{\{t \le T\}} + (2B_T - B_t) \mathbf 1_{\{t > T\}}, \qquad t \ge 0,
\]
is also an $(\mathcal F_t)$-Brownian motion started at $0$.

Proof. By the strong Markov property, the processes $(B_t, 0 \le t \le T)$ and $B^{(T)}$ are independent. Moreover, $B^{(T)}$ is a standard Brownian motion, and hence has the same law as $-B^{(T)}$. Therefore, the pair $((B_t, 0 \le t \le T), B^{(T)})$ has the same law as $((B_t, 0 \le t \le T), -B^{(T)})$. On the other hand, the trajectory of $B$ is a measurable function $G((B_t, 0 \le t \le T), B^{(T)})$, where $G(X, Y)$ is the concatenation of the paths $X, Y$. The conclusion follows from the fact that $G((B_t, 0 \le t \le T), -B^{(T)}) = \widetilde B$. $\square$

Corollary 6.3.1 (Sometimes also called the reflection principle) Let $0 < b$ and $a \le b$. Then for every $t \ge 0$,

\[
P(S_t \ge b, B_t \le a) = P(B_t \ge 2b - a).
\]

Proof. Let $T_x = \inf\{t \ge 0 : B_t \ge x\}$ be the first entrance time of $B$ in $[x, \infty)$, for $x > 0$. Then $T_x$ is an $(\mathcal F^B_t)$-stopping time for every $x$ by (i), Proposition 4.1.1. Notice that $T_x < \infty$ a.s. since $S_\infty = \infty$ a.s., where $S_\infty = \lim_{t \to \infty} S_t$.

Now by continuity of $B$, $B_{T_x} = x$ for every $x$. By the reflection principle applied to $T = T_b$, we obtain (with $\widetilde B$ as defined in the statement of the reflection principle)

\[
P(S_t \ge b, B_t \le a) = P(T_b \le t, 2b - B_t \ge 2b - a) = P(T_b \le t, \widetilde B_t \ge 2b - a),
\]
since $2b - B_t = \widetilde B_t$ as soon as $t \ge T_b$. On the other hand, the event $\{\widetilde B_t \ge 2b - a\}$ is contained in $\{T_b \le t\}$ since $2b - a \ge b$. Therefore, we obtain $P(S_t \ge b, B_t \le a) = P(\widetilde B_t \ge 2b - a)$, and the result follows since $\widetilde B$ is a Brownian motion. $\square$

Notice also that the probability under consideration is equal to P (St > b, Bt < a) = P (Bt > 2b − a), i.e. the inequalities can be strict or not. Indeed, for the right-hand side, this is due to the fact that the distribution of Bt is non-atomic, and for the left-hand side, this boils down to showing that for every x,

Tx = inf{t ≥ 0 : Bt > x} a.s., which is a straightforward consequence of the strong Markov property at time Tx, combined with Proposition 6.2.2.

Corollary 6.3.2 The random variable St has the same law as |Bt|, for every fixed t ≥ 0. Moreover, for every x > 0, the random time Tx has the same law as (x/B1)².

Proof. As a ↑ b, the probability P (St ≥ b, Bt ≤ a) converges to P (St ≥ b, Bt ≤ b), and this is equal to P (Bt ≥ b) by Corollary 6.3.1. Therefore,

P(St ≥ b) = P(St ≥ b, Bt ≤ b) + P(St ≥ b, Bt > b) = P(Bt ≥ b) + P(Bt > b) = 2P(Bt ≥ b) = P(|Bt| ≥ b),

because {Bt > b} ⊂ {St ≥ b} and the law of Bt is non-atomic, and this gives the result. We leave the computation of the distribution of Tx as an exercise. □
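Corollary 6.3.2 lends itself to a quick numerical sanity check. The sketch below is ours, not part of the notes; it assumes NumPy, approximates Brownian motion on [0, 1] by a fine Gaussian random walk, and compares the empirical tails of S1 and |B1|.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 4000, 1600
dt = 1.0 / n_steps

# Discrete approximation of Brownian motion on [0, 1].
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

S1 = np.maximum(paths.max(axis=1), 0.0)   # running supremum at time 1 (S_0 = 0)
B1 = paths[:, -1]                         # terminal value

b = 1.0
p_sup = (S1 >= b).mean()                  # estimate of P(S_1 >= b)
p_abs = (np.abs(B1) >= b).mean()          # estimate of P(|B_1| >= b)
```

The two estimates agree up to Monte Carlo and discretization error; the exact common value is P(|B1| ≥ 1) = 2(1 − Φ(1)) ≈ 0.317.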

6.4 Some martingales associated to Brownian motion

One of the nice features of Brownian motion is that a tremendous number of martingales are associated with it.

Proposition 6.4.1 Let (Bt, t ≥ 0) be an (Ft)-Brownian motion.
(i) If d = 1 and B0 ∈ L¹, the process (Bt, t ≥ 0) is an (Ft)-martingale.
(ii) If d = 1 and B0 ∈ L², the process (Bt² − t, t ≥ 0) is an (Ft)-martingale.
(iii) In any dimension, let u = (u1, . . . , ud) ∈ C^d. If E[exp(⟨u, B0⟩)] < ∞, the process M = (exp(⟨u, Bt⟩ − tu²/2), t ≥ 0) is an (Ft)-martingale, where u² is a notation for u1² + · · · + ud².

Notice that in (iii), we are dealing with C-valued processes. For a random variable X ∈ L¹(C), the conditional expectation is defined by E[X|G] = E[Re X|G] + iE[Im X|G].

Proof. (i) If s ≤ t, E[Bt − Bs|Fs] = E[B^(s)_{t−s}] = 0, where B^(s)_u = B_{u+s} − Bs has mean 0 and is independent of Fs, by the simple Markov property. The integrability of the process is obvious by the hypothesis on B0.

(ii) Integrability is an easy exercise, using that Bt − B0 is independent of B0. We have, for s ≤ t, Bt² = (Bt − Bs)² + 2Bs(Bt − Bs) + Bs². Taking conditional expectation given Fs and using the simple Markov property gives E[Bt²|Fs] = (t − s) + Bs², hence the result.
(iii) Integrability comes from the fact that E[exp(λBt)] = exp(tλ²/2) whenever B is a standard Brownian motion, and from the fact that

E[exp(⟨u, Bt⟩)] = E[exp(⟨u, Bt − B0 + B0⟩)] = E[exp(⟨u, Bt − B0⟩)] E[exp(⟨u, B0⟩)] < ∞.

For s ≤ t, Mt = exp(i⟨u, Bt − Bs⟩ + i⟨u, Bs⟩ + t|u|²/2). We use the Markov property again, and the fact that E[exp(i⟨u, Bt − Bs⟩)] = exp(−(t − s)|u|²/2), the characteristic function of a Gaussian law with mean 0 and covariance (t − s)Id, evaluated at u. □

From this, one can show that

Proposition 6.4.2 Let (Bt, t ≥ 0) be a standard Brownian motion and Tx = inf{t ≥ 0 : Bt = x}. Then for x, y > 0, one has

P(T−y < Tx) = x/(x + y),   E[Tx ∧ T−y] = xy.
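Both identities have exact analogues for the simple symmetric random walk (the same optional stopping argument applies), and those can be checked by solving the corresponding discrete Dirichlet problems. A sketch, assuming NumPy; the function names are ours:

```python
import numpy as np

def ruin_probability(x, y):
    """P(hit -y before x), starting from 0, for simple symmetric random walk.

    Solves p(k) = (p(k-1) + p(k+1)) / 2 on -y < k < x with p(-y) = 1, p(x) = 0.
    """
    n = x + y - 1                        # interior states -y+1, ..., x-1
    A = np.eye(n)
    b = np.zeros(n)
    for i in range(n):
        if i > 0:
            A[i, i - 1] = -0.5
        else:
            b[i] += 0.5                  # left neighbour is the boundary -y (value 1)
        if i < n - 1:
            A[i, i + 1] = -0.5
    return np.linalg.solve(A, b)[y - 1]  # state 0 sits at index y - 1

def expected_exit_time(x, y):
    """E[number of steps to leave (-y, x)], starting from 0.

    Solves m(k) = 1 + (m(k-1) + m(k+1)) / 2 with m(-y) = m(x) = 0.
    """
    n = x + y - 1
    A = np.eye(n)
    for i in range(n):
        if i > 0:
            A[i, i - 1] = -0.5
        if i < n - 1:
            A[i, i + 1] = -0.5
    return np.linalg.solve(A, np.ones(n))[y - 1]
```

For instance, `ruin_probability(3, 2)` returns 3/5 = x/(x + y), and `expected_exit_time(3, 2)` returns 6 = xy, matching the proposition.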

Proposition 6.4.3 Let (Bt, t ≥ 0) be an (Ft)-Brownian motion. Let f(t, x) : R+ × R^d → C be continuously differentiable in the variable t and twice continuously differentiable in x, and suppose that f and its derivatives of all orders are bounded. Then

Mt = f(t, Bt) − f(0, B0) − ∫_0^t ds (∂/∂t + Δ/2) f(s, Bs),   t ≥ 0,

is an (Ft)-martingale, where Δ = Σ_{i=1}^d ∂²/∂xᵢ² is the Laplacian operator acting on the spatial coordinate of f.

This is the first symptom of the famous Itô formula, which says what this martingale actually is.

Proof. Integrability is trivial from the boundedness of f, as is adaptedness, since Mt is a function of (Bs, 0 ≤ s ≤ t). Let s, t ≥ 0. We estimate

E[Mt+s|Ft] = Mt + E[ f(t + s, Bt+s) − f(t, Bt) − ∫_t^{t+s} du (∂/∂t + Δ/2) f(u, Bu) | Ft ].

On the one hand, E[f(t+s, Bt+s) − f(t, Bt)|Ft] = E[f(t+s, Bt+s − Bt + Bt)|Ft] − f(t, Bt), and since Bt+s − Bt is independent of Ft with law N(0, s), using Proposition 1.3.4, this is equal to

∫_{R^d} f(t + s, Bt + x) p(s, x) dx − f(t, Bt),   (6.3)

where p(s, x) = (2πs)^{−d/2} exp(−|x|²/(2s)) is the probability density function of N(0, s). On the other hand, if we let L = ∂/∂t + Δ/2,

E[ ∫_t^{t+s} du Lf(u, Bu) | Ft ] = E[ ∫_0^s du Lf(u + t, Bt+u − Bt + Bt) | Ft ].

This expression is of the form E[F((B^(t), Bt))|Ft], where F is measurable and B^(t)_s = Bt+s − Bt, s ≥ 0, is independent of Ft by the simple Markov property, and has law W0(dw), the Wiener measure. If (Xt, t ≥ 0) is the canonical process Xt(w) = wt, then this last expression rewrites, by Proposition 1.3.4,

E[ ∫_t^{t+s} Lf(u, Bu) du | Ft ] = ∫_{ΩW} W0(dw) ∫_0^s du Lf(u + t, Xu(w) + Bt)
= ∫_0^s du ∫_{ΩW} W0(dw) Lf̃(u, Xu + Bt)
= ∫_0^s du ∫_{R^d} dx p(u, x) Lf̃(u, x + Bt),

where f̃(s, x) = f(s + t, x), and we made use of Fubini's theorem. Next, the boundedness of Lf̃ entails that this is equal to

lim_{ε↓0} ∫_ε^s du ∫_{R^d} dx p(u, x) Lf̃(u, x + Bt).

From the expression for L, we can split this into two parts. Using integration by parts in the time variable,

∫_{R^d} dx ∫_ε^s du p(u, x) (∂f̃/∂t)(u, x + Bt)
= ∫_{R^d} dx p(s, x) f̃(s, x + Bt) − ∫_{R^d} dx p(ε, x) f̃(ε, x + Bt) − ∫_{R^d} dx ∫_ε^s du (∂p/∂t)(u, x) f̃(u, x + Bt).

Similarly, integrating by parts twice in the space variable yields

∫_ε^s du ∫_{R^d} dx p(u, x) (Δ/2) f̃(u, x + Bt) = ∫_ε^s du ∫_{R^d} dx (Δ/2) p(u, x) f̃(u, x + Bt).

Now, p(t, x) satisfies the heat equation (∂t − ∆/2)p = 0. Therefore, the integral terms cancel each other, and it remains

E[ ∫_t^{t+s} du Lf(u, Bu) | Ft ] = ∫_{R^d} dx p(s, x) f̃(s, x + Bt) − lim_{ε↓0} ∫_{R^d} dx p(ε, x) f̃(ε, x + Bt),

which by dominated convergence is exactly (6.3) (the limit equals f̃(0, Bt) = f(t, Bt)). This shows that E[Mt+s − Mt|Ft] = 0. □

6.5 Recurrence and transience properties of Brownian motion

From this section on, we are going to introduce a bit of extra notation. We will suppose that the reference measurable space (Ω, F) on which (Bt, t ≥ 0) is defined is endowed with probability measures Px, x ∈ R^d, such that under Px, (Bt − x, t ≥ 0) is a standard Brownian motion. A possibility is to choose the Wiener space and endow it with the measures Wx, so that the canonical process (Xt, t ≥ 0) is a Brownian motion started at x under Wx. We let Ex be the expectation associated with Px. In the sequel, B(x, r) and B̄(x, r) will respectively denote the open and closed Euclidean balls with center x and radius r, in R^d for some d ≥ 1.

Theorem 6.5.1 (i) If d = 1, Brownian motion is point-recurrent in the sense that under P0 (or any Py, y ∈ R),

a.s., {t ≥ 0 : Bt = x} is unbounded for every x ∈ R.

(ii) If d = 2, Brownian motion is neighborhood-recurrent, in the sense that for every x ∈ R², under Px,

a.s., {t ≥ 0 : |Bt| ≤ ε} is unbounded for every ε > 0.

However, points are polar, in the sense that for every x ∈ R²,

P0(Hx = ∞) = 1 , where Hx = inf{t > 0 : Bt = x} is the hitting time of x.

(iii) If d ≥ 3, Brownian motion is transient, in the sense that a.s. under P0, |Bt| → ∞ as t → ∞.

Proof. (i) is a consequence of (ii) in Proposition 6.2.2.
For (ii), let 0 < ε < R be real numbers and f be a C^∞ function which is bounded together with all its derivatives, and which coincides with x ↦ log |x| on Dε,R = {x ∈ R² : ε ≤ |x| ≤ R}.

Then one can check that Δf = 0 on the interior of Dε,R, and therefore, if we let S = inf{t ≥ 0 : |Bt| = ε} and T = inf{t ≥ 0 : |Bt| = R}, then S, T, H = S ∧ T are stopping times, and from Proposition 6.4.3, the stopped process (log |Bt∧H|, t ≥ 0) is a (bounded) martingale. If ε < |x| < R, we thus obtain Ex[log |BH|] = log |x|. Since H ≤ T < ∞ a.s. (Brownian motion is unbounded a.s.), and since |BS| = ε, |BT| = R on the events {S < ∞}, {T < ∞}, the left-hand side is (log ε)Px(S < T) + (log R)Px(S > T). Therefore,

Px(S < T) = (log R − log |x|) / (log R − log ε).   (6.4)

Letting ε → 0 shows that the probability of hitting 0 before hitting the boundary of the ball with radius R is 0, and therefore, letting R → ∞, the probability of hitting 0 (starting from x ≠ 0) is 0. The announced result (for x ≠ 0) is then obtained by translation. We thus have P0(Hx < ∞) = 0 for every x ≠ 0. Next, we have

P0(∃t ≥ a : Bt = 0) = P0(∃s ≥ 0 : Bs+a − Ba + Ba = 0), and the Markov property at time a shows that this is

P0(∃t ≥ a : Bt = 0) = ∫_{R²} P0(Ba ∈ dy) P0(∃s ≥ 0 : Bs + y = 0)
= ∫_{R²} P0(Ba ∈ dy) Py(∃s ≥ 0 : Bs = 0) = 0,

because the law of Ba under P0 is a Gaussian law that does not charge the point 0 (we have been using the notation P(X ∈ dx) for the law of the random variable X). On the other hand, letting R → ∞ first in (6.4), we get that the probability of hitting the ball with center 0 and radius ε is 1 for every ε, starting from any point: Px(∃t ≥ 0 : |Bt| ≤ ε) = 1. Thus, for every n ∈ Z+, a similar application of the Markov property at time n gives

Px(∃t ≥ n : |Bt| ≤ ε) = ∫_{R²} Px(Bn ∈ dy) Py(∃t ≥ 0 : |Bt| ≤ ε) = 1.

Hence the result.
For (iii), since the first three components of a Brownian motion in R^d form a Brownian motion, it is clearly sufficient to treat the case d = 3. So assume d = 3. Let f be a C^∞ function, bounded together with all its derivatives, which coincides with x ↦ 1/|x| on Dε,R, defined as previously but for d = 3. Then Δf = 0 on the interior of Dε,R, and the same argument as above shows that for x ∈ Dε,R, defining S, T as above,

Px(S < T) = (|x|^{−1} − R^{−1}) / (ε^{−1} − R^{−1}).

This converges to ε/|x| as R → ∞, which is thus the probability of ever visiting B̄(0, ε) when starting from x (with |x| ≥ ε). Define two sequences of stopping times Sk, Tk, k ≥ 1 by S1 = inf{t ≥ 0 : |Bt| ≤ ε}, and

Tk = inf{t ≥ Sk : |Bt| ≥ 2ε},   Sk+1 = inf{t ≥ Tk : |Bt| ≤ ε}.

If Sk is finite, we get that Tk is also finite, because Brownian motion is an a.s. unbounded process, so {Sk < ∞} = {Tk < ∞} up to a zero-probability event. The strong Markov property at time Tk gives

Px(Sk+1 < ∞ | Sk < ∞) = Px(Sk+1 < ∞ | Tk < ∞)
= Px(∃s ≥ Tk : |Bs − BTk + BTk| ≤ ε | Tk < ∞)
= ∫_{R³} Px(BTk ∈ dy | Tk < ∞) Py(∃s ≥ 0 : |Bs| ≤ ε),

where Px(BTk ∈ dy | Tk < ∞) is the law of BTk under the probability measure Px(· | Tk < ∞). Since |BTk| = 2ε on the event {Tk < ∞}, the last probability is ε/|y| = 1/2. Finally, we obtain by induction that Px(Sk < ∞) ≤ Px(S1 < ∞) 2^{−k+1}, and the Borel–Cantelli lemma entails that a.s., Sk = ∞ for some k. Therefore, Brownian motion in dimension 3 a.s. eventually leaves the ball of radius ε for good, and letting ε = n → ∞ along Z+ gives the result. □

Remark. If B(x, ε) is the Euclidean ball with center x and radius ε, notice that the property of (ii) implies that {t ≥ 0 : Bt ∈ B(x, ε)} is unbounded for every x ∈ R² and every ε > 0, almost surely (indeed, one can cover R² by a countable union of balls of a fixed radius). In particular, the trajectory of a 2-dimensional Brownian motion is everywhere dense. On the other hand, it will a.s. never hit a fixed countable family of points (except maybe at time 0), like the points with rational coordinates!
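Formula (6.4) can be checked by direct simulation. The Euler sketch below is ours, not part of the notes (NumPy assumed): planar Brownian motion started at |x| = 2 in the annulus with ε = 1 and R = 4, where (6.4) predicts exit through the inner circle with probability exactly 1/2.

```python
import numpy as np

rng = np.random.default_rng(1)
eps, R = 1.0, 4.0
x0 = np.array([2.0, 0.0])                 # starting point, |x0| = 2
n_paths, dt = 2000, 1e-3

pos = np.tile(x0, (n_paths, 1))
hit_inner = np.zeros(n_paths, dtype=bool)
active = np.ones(n_paths, dtype=bool)
while active.any():
    pos[active] += rng.normal(0.0, np.sqrt(dt), size=(active.sum(), 2))
    r = np.linalg.norm(pos, axis=1)
    hit_inner |= active & (r <= eps)      # reached the inner circle first
    active &= (r > eps) & (r < R)

p_hat = hit_inner.mean()
p_exact = (np.log(R) - np.log(2.0)) / (np.log(R) - np.log(eps))  # = 1/2
```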

6.6 Brownian motion and the Dirichlet problem

Let D be a connected open subset of R^d for some d ≥ 2. We will say that D is a domain. Let ∂D be the boundary of D. We denote by Δ the Laplacian on R^d. Suppose given a measurable function g : ∂D → R. A solution of the Dirichlet problem with boundary condition g on D is a function u : D̄ → R of class C²(D) ∩ C(D̄), such that

Δu = 0 on D,   u|∂D = g.   (6.5)

A solution of the Dirichlet problem is the mathematical counterpart of the following physical problem: given an object made of homogeneous material, such that the temperature g(y) is imposed at the point y of its boundary, the solution u(x) of the Dirichlet problem gives the temperature at the point x in the object when equilibrium is attained. As we will see, it is possible to give a probabilistic resolution of the Dirichlet problem with the help of Brownian motion. This is essentially due to Kakutani. Recall that Px is the law of Brownian motion in R^d started at x, and Ex the associated expectation. In the remainder of this section, let T = inf{t ≥ 0 : Bt ∉ D} be the first exit time from D. It is a stopping time, as it is the first entrance time in the closed set D^c. We will assume that the domain D is such that Px(T < ∞) = 1 for every x, to avoid complications. Hence BT is a well-defined random variable. In the sequel, | · | is the Euclidean norm on R^d. The goal of this section is to prove the following

Theorem 6.6.1 Suppose that g ∈ C(∂D, R) is bounded, and assume that D satisfies a local exterior cone condition (l.e.c.c.), i.e. for every y ∈ ∂D, there exists a nonempty open convex cone C with origin at y such that C ∩ B(y, r) ⊂ D^c for some r > 0. Then the function u : x ↦ Ex[g(BT)] is the unique bounded solution of the Dirichlet problem (6.5). In particular, if D is bounded and satisfies the l.e.c.c., then u is the unique solution of the Dirichlet problem.
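The representation u(x) = Ex[g(BT)] already gives a Monte Carlo method for the Dirichlet problem. The sketch below is ours, not part of the notes (NumPy assumed): on the unit disk with boundary data g(y) = y1, the harmonic extension is u(x) = x1, so the estimate at x = (0.5, 0) should be close to 0.5.

```python
import numpy as np

rng = np.random.default_rng(2)

def dirichlet_mc(x, g, n_paths=3000, dt=1e-3):
    """Estimate u(x) = E_x[g(B_T)] for the unit disk by Euler simulation."""
    pos = np.tile(np.asarray(x, dtype=float), (n_paths, 1))
    active = np.ones(n_paths, dtype=bool)
    while active.any():
        pos[active] += rng.normal(0.0, np.sqrt(dt), size=(active.sum(), 2))
        active &= np.linalg.norm(pos, axis=1) < 1.0
    # Project the (slightly overshot) exit points back onto the circle.
    exit_pts = pos / np.linalg.norm(pos, axis=1, keepdims=True)
    return g(exit_pts).mean()

u_hat = dirichlet_mc([0.5, 0.0], lambda y: y[:, 0])   # boundary data g(y) = y_1
```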

We start with a uniqueness statement.

Proposition 6.6.1 Let g be a bounded function in C(∂D, R). Set

u(x) = Ex [g(BT )] . If v is a bounded solution of the Dirichlet problem, then v = u.

In particular, we obtain uniqueness when D is bounded. Notice that we do not make any assumption on the regularity of D here, besides the fact that T < ∞ a.s.

Proof. Let v be a bounded solution of the Dirichlet problem. For every N ≥ 1, introduce the reduced set DN = {x ∈ D : |x| < N and d(x, ∂D) > 1/N}. Notice that it is an open set, but that it need not be connected. We let TN be the first exit time of DN. By Proposition 6.4.3, the process

Mt = ṽN(Bt) − ṽN(B0) − ∫_0^t ½ ΔṽN(Bs) ds,   t ≥ 0,

is a martingale, where ṽN is a C² function that coincides with v on DN and is bounded together with all its partial derivatives (this may look innocent, but the fact that such a function exists is highly non-trivial; its use could be avoided by a stopped analog of Proposition 6.4.3). Moreover, the martingale stopped at TN is

Mt∧TN = v(Bt∧TN ) − v(B0), because ∆v = 0 inside D, and it is bounded (because v is bounded on DN ), hence uniformly integrable. By optional stopping at TN , we get that for every x ∈ DN ,

0 = Ex[MTN] = Ex[v(BTN)] − v(x).   (6.6)

Now, as N → ∞, BTN converges to BT a.s. by continuity of paths and the fact that T < ∞ a.s. Since v is bounded, we can use dominated convergence as N → ∞, and get that for every x ∈ D, v(x) = Ex[v(BT )] = Ex[g(BT )], hence the result. 

For every x ∈ R^d and r > 0, let σx,r be the uniform probability measure on the sphere Sx,r = {y ∈ R^d : |y − x| = r}. It is the unique probability measure on Sx,r that is invariant under the isometries of Sx,r. We say that a locally bounded measurable function h : D → R is harmonic on D if for every x ∈ D and every r > 0 such that the closed ball B̄(x, r) with center x and radius r is contained in D,

h(x) = ∫_{Sx,r} h(y) σx,r(dy).

Proposition 6.6.2 Let h be harmonic on a domain D. Then h ∈ C∞(D, R), and ∆h = 0 on D.

Proof. Let x ∈ D and ε > 0 be such that B̄(x, ε) ⊂ D. Then let ϕ ∈ C^∞(R+, R) be non-negative with non-empty compact support in [0, ε). We have, for 0 < r < ε,

h(x) = ∫_{S0,r} h(x + y) σ0,r(dy).

Multiplying by ϕ(r) r^{d−1} and integrating in r gives

c h(x) = ∫_{R^d} ϕ(|z|) h(x + z) dz,

where c > 0 is some constant, and where we have used the fact that

∫_{R^d} f(x) dx = C ∫_{R+} r^{d−1} dr ∫_{S0,1} f(ry) σ0,1(dy)

for some C > 0. Therefore, c h(x) = ∫_{R^d} ϕ(|z − x|) h(z) dz, and by differentiation under the integral sign, we easily get that h is C^∞. Next, by translation we may suppose that 0 ∈ D and show only that Δh(0) = 0. We may apply Taylor's formula to h, obtaining, as x → 0,

h(x) = h(0) + ⟨∇h(0), x⟩ + ½ Σ_{i=1}^d xᵢ² (∂²h/∂xᵢ²)(0) + ½ Σ_{i≠j} xᵢxⱼ (∂²h/∂xᵢ∂xⱼ)(0) + o(|x|²).

Now, integration over S0,r for r small enough yields

∫_{S0,r} h(x) σ0,r(dx) = h(0) + Cr Δh(0) + o(r²),

where Cr = ½ ∫_{S0,r} x₁² σ0,r(dx) > 0, as the reader may check that all the other integrals up to the second order vanish, by symmetry. Since the left-hand side is h(0), we obtain Δh(0) = 0. □

Therefore, harmonic functions are solutions of certain Dirichlet problems.
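The mean-value property is easy to test for an explicit harmonic function. A minimal check, ours and not part of the notes (NumPy assumed), averages h(x, y) = x² − y², whose Laplacian vanishes, over a discretized circle and compares with the value at the center:

```python
import numpy as np

# Discretize sigma_{x,r} on the circle of center (x0, y0) and radius r.
theta = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
x0, y0, r = 0.7, -0.2, 0.5
xs = x0 + r * np.cos(theta)
ys = y0 + r * np.sin(theta)

sphere_avg = np.mean(xs**2 - ys**2)   # average of h over the circle
center_val = x0**2 - y0**2            # mean-value property predicts equality
```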

Proposition 6.6.3 Let g be a bounded measurable function on ∂D, and let T = inf{t ≥ 0 : Bt ∈/ D}. Then the function h : x ∈ D 7→ Ex[g(BT )] is harmonic on D, and hence ∆h = 0 on D.

Proof. For all Borel subsets A1, . . . , Ak of R^d and times t1 < . . . < tk, the map

x ↦ Px(Bt1 ∈ A1, . . . , Btk ∈ Ak)

is measurable by Fubini's theorem, once one has written the explicit formula for this probability. Therefore, by the monotone class theorem, x ↦ Ex[F] is measurable for every integrable random variable F which is measurable with respect to the product σ-algebra on C(R+, R^d). Moreover, h is bounded by assumption. Now, let S = inf{t ≥ 0 : |Bt − x| ≥ r} be the first exit time of B from the ball with center x and radius r. Then by (ii), Proposition 6.2.2, S < ∞ a.s. By the strong Markov property, B̃ = (BS+t, t ≥ 0) is an (FS+t)-Brownian motion started at BS. Moreover, the first hitting time of ∂D for B̃ is T̃ = T − S, and B̃T̃ = BT, so that

Ex[g(BT)] = Ex[g(B̃T̃)] = ∫_{R^d} Px(BS ∈ dy) Ey[g(BT) 1{T<∞}],

and we recognize ∫_{R^d} Px(BS ∈ dy) h(y) in the last expression.

Since B starts from x under Px, the rotation invariance of Brownian motion shows that BS − x has a distribution on the sphere with center 0 and radius r which is invariant under the orthogonal group, so we conclude that the distribution of BS is the uniform measure on the sphere with center x and radius r, and therefore that h is harmonic on D. □

It remains to understand whether the function u of Theorem 6.6.1 is actually a solution of the Dirichlet problem. Indeed, it is not the case in general that u(x) has limit g(y) as x ∈ D, x → y, and the reason is that some points of ∂D may be 'invisible' to Brownian motion. The reader can convince himself, for example, that if D = B(0, 1) \ {0} is the open ball of R² with center 0 and radius 1, whose origin has been removed, and if g = 1{0}, then no solution of the Dirichlet problem with boundary constraint g exists. The probabilistic reason for this is that Brownian motion does not see the boundary point 0. This is the reason why we have to make regularity assumptions on D in the following theorem.

Proof of Theorem 6.6.1. It remains to prove that under the l.e.c.c., u is continuous on D̄, i.e. u(x) converges to g(y) as x ∈ D converges to y ∈ ∂D. In order to do that, we need a preliminary lemma. Recall that T is the first exit time of D for the Brownian path.

Lemma 6.6.1 Let D be a domain satisfying the l.e.c.c., and let y ∈ ∂D. Then for every η > 0, Px(T < η) → 1 as x ∈ D → y.

Proof. Let Cy = y + C be a nonempty open convex cone with origin at y such that Cy ⊂ D^c (we leave as an exercise the case when only a neighborhood of y in this cone is contained in D^c). Then it is an elementary geometrical fact that for every η > 0 small enough, there exist δ > 0 and a nonempty open convex cone C′ with origin at 0, such that x + (C′ \ B(0, η)) ⊆ Cy for every x ∈ B(y, δ). Now by (iii) in Proposition 6.2.2, if H^ε_{C′} = inf{t > 0 : Bt ∈ C′ \ B(0, ε)}, then P0(H^ε_{C′} < η) ↑ P0(HC′ < η) = 1 as ε ↓ 0. Since hitting x + (C′ \ B(0, η)) implies hitting Cy and therefore leaving D, we obtain, after translating by x, that for every η, ε′ > 0, Px(T < η) can be made ≥ 1 − ε′ for x belonging to a sufficiently small δ-neighborhood of y in D. □

We can now finish the proof of Theorem 6.6.1. Let y ∈ ∂D. We want to estimate the quantity Ex[g(BT)] − g(y). For η, δ > 0, let

Aη,δ = { sup_{0≤t≤η} |Bt − x| ≥ δ/2 }.

This event decreases to ∅ as η ↓ 0 because B has continuous paths. Now, for any δ, η > 0,

Ex[|g(BT) − g(y)|] = Ex[|g(BT) − g(y)| ; {T ≤ η} ∩ A^c_{η,δ}]
+ Ex[|g(BT) − g(y)| ; {T ≤ η} ∩ Aη,δ]
+ Ex[|g(BT) − g(y)| ; {T ≥ η}].

Fix ε > 0. We are going to show that each of these three quantities can be made < ε/3 for x close enough to y. Since g is continuous at y, for some δ > 0, |y − z| < δ with y, z ∈ ∂D implies |g(y) − g(z)| < ε/3. Moreover, on the event {T ≤ η} ∩ A^c_{η,δ}, we know that |BT − x| < δ/2, and thus |BT − y| ≤ δ as soon as |x − y| ≤ δ/2. Therefore, for every η > 0, the first quantity is less than ε/3 for x ∈ B(y, δ/2).

Next, if M is an upper bound for |g|, the second quantity is bounded by 2M P(Aη,δ). Hence, by now choosing η small enough, this is < ε/3.

Finally, with δ, η fixed as above, the third quantity is bounded by 2MPx(T ≥ η). By the previous lemma, this is < ε/3 as soon as x ∈ B(y, α) ∩ D for some α > 0. Therefore, for any x ∈ B(y, α ∧ δ/2) ∩ D, |u(x) − g(y)| < ε. This entails the result. 

Corollary 6.6.1 A function u : D → R is harmonic in D if and only if it is in C2(D, R), and satisfies ∆u = 0.

Proof. Let u be of class C²(D) with zero Laplacian, and let x ∈ D. Let ε > 0 be such that B̄(x, ε) ⊆ D, and notice that u|B̄(x,ε) is a solution of the Dirichlet problem on B(x, ε) with boundary values u|∂B(x,ε). Then B(x, ε) satisfies the l.e.c.c., so that u|B̄(x,ε) is the unique such solution, which is also given by the harmonic function of Theorem 6.6.1. Therefore, u is harmonic on D. □

6.7 Donsker’s invariance principle

The following theorem completes the description of Brownian motion as a 'limit' of centered random walks, as depicted at the beginning of the chapter, and strengthens the convergence of finite-dimensional marginals to convergence in distribution. We endow C([0, 1], R) with the supremum norm, and recall (see the exercises on continuous-time processes) that the product σ-algebra associated with it coincides with the Borel σ-algebra associated with this norm. We say that a function F : C([0, 1]) → R is continuous if it is continuous with respect to this norm.

Theorem 6.7.1 (Donsker’s invariance principle) Let (Xn, n ≥ 1) be a sequence of R-valued integrable independent random variables with common law µ, such that Z Z xµ(dx) = 0 and x2µ(dx) = σ2 ∈ (0, ∞).

Let S0 = 0 and Sn = X1 + ... + Xn, and define a continuous process that interpolates linearly between values of S, namely

St = (1 − {t}) S[t] + {t} S[t]+1,   t ≥ 0,

where [t] denotes the integer part of t and {t} = t − [t]. Then S^[N] := ((σ²N)^{−1/2} SNt, 0 ≤ t ≤ 1) converges in distribution to a standard Brownian motion between times 0 and 1, i.e. for every bounded continuous function F : C([0, 1]) → R,

E[F(S^[N])] → E0[F(B)] as N → ∞.

Notice that this is much stronger than what Proposition 6.1.1 says. Despite the slight difference of framework between these two results (one uses the càdlàg continuous-time version of the random walk, and the other uses an interpolated continuous version), Donsker's invariance principle is stronger. For instance, one can infer from this theorem that the random variable (σ²N)^{−1/2} sup_{0≤n≤N} Sn converges to sup_{0≤t≤1} Bt in distribution, because f ↦ sup f is a continuous operation on C([0, 1], R). Proposition 6.1.1 would be powerless to address this issue.

The proof we give here is an elegant demonstration that makes use of a coupling of the random walk with Brownian motion, called the Skorokhod embedding theorem. It is however specific to dimension d = 1. Suppose we are given a Brownian motion (Bt, t ≥ 0) on some probability space (Ω, F, P).
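The sup functional mentioned above can be illustrated numerically. The sketch below is ours, not part of the notes (NumPy assumed); with ±1 steps one has σ² = 1, and Corollary 6.3.2 gives sup_{0≤t≤1} Bt the law of |B1|, so P(N^{−1/2} max_{0≤n≤N} Sn ≤ 1) should approach 2Φ(1) − 1 ≈ 0.683.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_walks = 2500, 3000

# Random walk steps with P(X = 1) = P(X = -1) = 1/2 (centered, sigma^2 = 1).
steps = 2 * rng.integers(0, 2, size=(n_walks, N), dtype=np.int64) - 1
S = steps.cumsum(axis=1)

# N^{-1/2} max_{0 <= n <= N} S_n; the max includes S_0 = 0.
max_scaled = np.maximum(S.max(axis=1), 0) / np.sqrt(N)
p_hat = (max_scaled <= 1.0).mean()    # target: 2*Phi(1) - 1
```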

Let µ+(dx) = P(X1 ∈ dx) 1{x≥0} and µ−(dy) = P(−X1 ∈ dy) 1{y≥0} define two non-negative measures. Assume that (Ω, F, P) is a rich enough probability space, so that we can further define on it, independently of (Bt, t ≥ 0), a sequence of independent identically distributed R²-valued random variables ((Yn, Zn), n ≥ 1) with distribution

P((Yn, Zn) ∈ dx dy) = C (x + y) µ+(dx) µ−(dy),

where C > 0 is the appropriate normalizing constant that makes this expression a probability measure.
Next, let F0 = σ{(Yn, Zn), n ≥ 1} and Ft = F0 ∨ Ft^B, so that (Ft, t ≥ 0) is a filtration such that B is an (Ft)-Brownian motion. We define a sequence of random times by T0 = 0, T1 = inf{t ≥ 0 : Bt ∈ {Y1, −Z1}}, and recursively,

Tn = inf{t ≥ Tn−1 : Bt − BTn−1 ∈ {Yn, −Zn}}.

By (ii) in Proposition 6.2.2, these times are a.s. finite, and they are stopping times with respect to the filtration (Ft). We claim that

Lemma 6.7.1 The sequence (BTn, n ≥ 0) has the same law as (Sn, n ≥ 0). Moreover, the inter-times (Tn − Tn−1, n ≥ 1) form an independent sequence of random variables with the same distribution, and expectation E[T1] = σ².

Proof. By repeated application of the strong Markov property at the times Tn, n ≥ 1, and the fact that the (Yn, Zn), n ≥ 1, are independent with the same distribution, we obtain that the processes (Bt+Tn−1 − BTn−1, 0 ≤ t ≤ Tn − Tn−1) are independent with the same distribution.

The fact that the differences BTn − BTn−1, n ≥ 1, and Tn − Tn−1, n ≥ 1, form sequences of independent and identically distributed random variables follows from this observation. It therefore remains to check that BT1 has the same law as X1 and that E[T1] = σ². Remember from Proposition 6.4.2 that given Y1, Z1, the probability that BT1 = Y1 is Z1/(Y1 + Z1), as follows from the optional stopping theorem. Therefore, for every non-negative measurable function f, by first conditioning on (Y1, Z1), we get

E[f(BT1)] = E[ (Z1/(Y1 + Z1)) f(Y1) + (Y1/(Y1 + Z1)) f(−Z1) ]
= ∫_{R+×R+} C (x + y) µ+(dx) µ−(dy) ( (y/(x + y)) f(x) + (x/(x + y)) f(−y) )
= C′ ∫_{R+} ( f(x) µ+(dx) + f(−x) µ−(dx) ) = C′ E[f(X1)],

for C′ = C ∫_{R+} x µ+(dx), which can only be equal to 1 (take f = 1). Here, we have used the fact that ∫ x µ+(dx) = ∫ x µ−(dx), which amounts to saying that X1 is centered.

For E[T1], recall from Proposition 6.4.2 that E[inf{t ≥ 0 : Bt ∈ {x, −y}}] = xy, so by a conditioning argument similar to the one above,

E[T1] = ∫_{R+×R+} C (x + y) x y µ+(dx) µ−(dy) = σ²,

where we again used that C ∫ x µ+(dx) = 1. □
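For a concrete illustration of the embedding, take X1 = 2 with probability 1/3 and X1 = −1 with probability 2/3, which is centered with σ² = 2. The measure C(x + y)µ+(dx)µ−(dy) is then carried by the single point (2, 1), so (Y1, Z1) = (2, 1) a.s. and T1 is the exit time of (−1, 2). The Euler sketch below is ours, not part of the notes (NumPy assumed); it checks P(BT1 = 2) = 1/3 and E[T1] = 2.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, dt = 4000, 1e-3

# Run Brownian paths until they leave the interval (-1, 2).
B = np.zeros(n_paths)
T = np.zeros(n_paths)
active = np.ones(n_paths, dtype=bool)
while active.any():
    B[active] += rng.normal(0.0, np.sqrt(dt), size=active.sum())
    T[active] += dt
    active &= (B > -1.0) & (B < 2.0)

p_up = (B >= 2.0).mean()   # Proposition 6.4.2 predicts Z/(Y + Z) = 1/3
mean_T1 = T.mean()         # Lemma 6.7.1 predicts E[T1] = sigma^2 = 2
```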

Proof of Donsker’s invariance principle. We suppose given a Brownian motion (N) 1/2 B. For N ≥ 1, define Bt = N BN −1t, t ≥ 0, which is a Brownian motion by scaling invariance. Perform the Skorokhod embedding construction on B(N) to obtain variables (N) (N) (N) (N) Tn , n ≥ 0. Then, let Sn = B (N) . Then by Lemma 6.7.1, Sn , n ≥ 0 is a random Tn walk with same law as Sn, n ≥ 0. We interpolate linearly between integers to obtain a (N) (N) 2 −1/2 (N) (N) continuous process St , t ≥ 0. Finally, let Set = (σ N) SNt , t ≥ 0 and Ten = −1 (N) N Tn . (N) We are going to show that the supremum norm of Bt − Set , 0 ≤ t ≤ 1 converges to 0 in probability. 2 By the law of large numbers, Tn/n converges a.s. to σ as n → ∞. Thus, by a −1 2 monotonicity argument, N sup0≤n≤N |Tn − σ n| converges to 0 a.s. as N → ∞. As a consequence, this supremum converges to 0 in probability, meaning that for every δ > 0,   (N) P sup |Ten − n/N| ≥ δ → 0. 0≤n≤N N→∞

On the other hand, for every t ∈ [n/N, (n + 1)/N], there exists some u ∈ [T̃^(N)_n, T̃^(N)_{n+1}] with Bu = S̃^(N)_t, because S̃^(N)_{n/N} = B_{T̃^(N)_n} for every n and by the intermediate value theorem, S̃^(N) and B being continuous. Therefore, the event {sup_{0≤t≤1} |S̃^(N)_t − Bt| > ε} is contained in the union K^N ∪ L^N, where

K^N = { sup_{0≤n≤N} |T̃^(N)_n − n/N| > δ }

and

L^N = { ∃ t ∈ [0, 1], ∃ u ∈ [t − δ − 1/N, t + δ + 1/N] : |Bt − Bu| > ε }.

We already know that P(K^N) → 0 as N → ∞. For L^N, since B is a.s. uniformly continuous on [0, 1], by taking δ small enough and then N large enough, we can make P(L^N) as small as wanted. Therefore, we have shown that

P( ‖S̃^(N) − B‖∞ > ε ) → 0 as N → ∞.

Therefore, (S̃^(N)_t, 0 ≤ t ≤ 1) converges in probability for the uniform norm to (Bt, 0 ≤ t ≤ 1), which entails convergence in distribution by Proposition 5.2.1. This concludes the proof. □

Chapter 7

Poisson random measures and Poisson processes

7.1 Poisson random measures

Let (E, E) be a measurable space, and let µ be a non-negative σ-finite measure on (E, E). We denote by E* the set of σ-finite atomic measures on E, i.e. the set of σ-finite measures taking values in Z+ ∪ {∞} (in fact, we will only consider measures that can be put in the form Σ_{i∈I} δ_{xi} with I countable and xi ∈ E, i ∈ I). The set E* is endowed with the σ-algebra E* = σ(XA, A ∈ E), where XA(m) = m(A) for m ∈ E* and A ∈ E. Otherwise said, for every A ∈ E, the mapping m ↦ m(A) from E* to Z+ ∪ {∞} is measurable with respect to E*. For λ > 0 we denote by P(λ) the Poisson distribution with parameter λ, which assigns mass e^{−λ} λ^n/n! to the integer n ≥ 0.

Definition 7.1.1 A Poisson random measure on (E, E) with intensity µ is a random variable M with values in E*, such that if (Ak, k ≥ 1) is a sequence of disjoint sets in E, with µ(Ak) < ∞ for every k,

(i) the random variables M(Ak), k ≥ 1 are independent, and

(ii) the law of M(Ak) is P(µ(Ak)) for k ≥ 1.

Notice that properties (i) and (ii) completely characterize the law of the random variable M. Indeed, notice that the events which are either empty or of the form

{m ∈ E* : m(A1) = i1, . . . , m(Ak) = ik},

with pairwise disjoint A1, . . . , Ak ∈ E, µ(Aj) < ∞, 1 ≤ j ≤ k, and where (i1, . . . , ik) are integers, form a π-system that generates E*. If now M is a Poisson random measure with intensity µ, on some probability space (Ω, F, P), then

P(M(A1) = i1, . . . , M(Ak) = ik) = Π_{j=1}^k e^{−µ(Aj)} µ(Aj)^{ij} / ij!.

Hence the uniqueness of the law of a random measure satisfying (i), (ii). Existence is stated in the next proposition.


Proposition 7.1.1 For every σ-finite non-negative measure µ on (E, E), there exists a Poisson random measure on (E, E) with intensity µ.

Proof. Suppose first that λ = µ(E) < ∞. We let N be a Poisson random variable with parameter λ, and X1, X2, . . . be independent random variables with law µ/µ(E), independent of N. Finally, we let Mω = Σ_{i=1}^{N(ω)} δ_{Xi(ω)}.

Now, if N is Poisson with parameter λ and (Yi, i ≥ 1) are independent and independent of N, with P(Yi = j) = pj, 1 ≤ j ≤ k, it holds that Σ_{i=1}^N 1{Yi=j}, 1 ≤ j ≤ k, are independent with respective laws P(pjλ), 1 ≤ j ≤ k. It follows that M is a Poisson random measure with intensity µ: for disjoint A1, . . . , Ak in E with finite µ-measures, we let Yi = j whenever Xi ∈ Aj, defining independent random variables in {1, . . . , k} with P(Yi = j) = µ(Aj)/µ(E), so that M(Aj), 1 ≤ j ≤ k are independent random variables with laws P(µ(E) · µ(Aj)/µ(E)) = P(µ(Aj)).

In the general case, since µ is σ-finite, there is a partition of E into measurable sets Ek, k ≥ 1, that are disjoint and have finite µ-measure. We can construct independent Poisson random measures Mk on Ek with intensity µ(· ∩ Ek), for k ≥ 1. We claim that

M(A) = Σ_{k≥1} Mk(A ∩ Ek),   A ∈ E,

defines a Poisson random measure with intensity µ. This is an easy consequence of the property that if Z1, Z2, . . . are independent Poisson variables with respective parameters λ1, λ2, . . ., then the sum Z1 + Z2 + · · · is Poisson with parameter λ1 + λ2 + · · · (with the convention that P(∞) is a Dirac mass at ∞). □
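The finite-mass construction in the proof is straightforward to implement. The sketch below is ours, not part of the notes (NumPy assumed; the function name is ours as well): for the intensity µ = 5 × Lebesgue on E = [0, 1], the count M([0, 0.3)) should be P(1.5)-distributed, hence with mean and variance 1.5.

```python
import numpy as np

rng = np.random.default_rng(5)

def poisson_random_measure(mu_total, sample_point, n_repl):
    """Atoms of n_repl independent Poisson random measures (mu(E) finite).

    mu_total     -- total mass mu(E)
    sample_point -- draws one point of E with law mu / mu(E)
    """
    measures = []
    for _ in range(n_repl):
        n = rng.poisson(mu_total)                        # number of atoms
        measures.append([sample_point() for _ in range(n)])
    return measures

# Intensity mu = 5 * Lebesgue on E = [0, 1].
repls = poisson_random_measure(5.0, lambda: rng.uniform(0.0, 1.0), 4000)
counts_A = np.array([sum(1 for x in m if x < 0.3) for m in repls])  # M([0, 0.3))
```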

From the construction, we obtain the following important property of Poisson random measures:

Proposition 7.1.2 Let M be a Poisson random measure on E with intensity µ, and let A ∈ E be such that µ(A) < ∞. Then M(A) has law P(µ(A)), and given M(A) = k, the restriction M|A has the same law as Σ_{i=1}^k δ_{Xi}, where (X1, X2, . . . , Xk) are independent with law µ(· ∩ A)/µ(A). Moreover, if A, B ∈ E are disjoint, then the restrictions M|A, M|B are independent. Last, any Poisson random measure can be written in the form M(dx) = Σ_{i∈I} δ_{xi}(dx), where I is a countable index set and the xi, i ∈ I, are random variables.

7.2 Integrals with respect to a Poisson measure

Proposition 7.2.1 Let M be a Poisson random measure on E, with intensity µ. Then for every measurable f : E → R+, the quantity

M(f) := ∫_E f(x) M(dx)

defines a random variable, and

E[exp(−M(f))] = exp( −∫_E µ(dx) (1 − exp(−f(x))) ).

Moreover, if f : E → R is measurable and in L¹(µ), then f ∈ L¹(M) a.s., ∫_E f(x) M(dx) defines a random variable, and

E[exp(iM(f))] = exp( ∫_E µ(dx) (exp(if(x)) − 1) ).

The first formula is sometimes called the Laplace functional formula, or Campbell formula. Notice that by replacing f by af, differentiating the formula with respect to a and letting a ↓ 0, one gets the first moment formula

E[M(f)] = ∫_E f(x) µ(dx),

whenever f ≥ 0, or f is integrable w.r.t. µ (in this case, consider first f+, f−). Similarly,

Var M(f) = ∫_E f(x)² µ(dx)

(for this, first notice that the restrictions of M to {f ≥ 0} and {f < 0} are independent).

Proof. Let $E_n$, n ≥ 0, be a measurable partition of E into sets with finite µ-measure. First assume that f = $1_A$ for A ∈ $\mathcal{E}$ with µ(A) < ∞. Then M(A) is a random variable by definition of M, and this extends to any A ∈ $\mathcal{E}$ by considering A ∩ $E_n$, n ≥ 0, and summing. Since any measurable non-negative function is the increasing limit of finite linear combinations of such indicator functions, we obtain that M(f) is a random variable, as a limit of random variables. Moreover, a similar argument shows that $M(f 1_{E_n})$, n ≥ 0, are independent random variables.

Next, assume f ≥ 0. The number Nn of atoms of M that fall in En has law P(µ(En)) and given Nn = k, the atoms can be supposed to be independent random variables with law µ(· ∩ En)/µ(En). Therefore,

$$E[\exp(-M(f 1_{E_n}))] = \sum_{k=0}^{∞} e^{-µ(E_n)} \frac{µ(E_n)^k}{k!} \Big(\int_{E_n} \frac{µ(dx)}{µ(E_n)}\,e^{-f(x)}\Big)^k = \exp\Big(-\int_{E_n} µ(dx)\,(1 - \exp(-f(x)))\Big).$$
From the independence of the variables $M(f 1_{E_n})$, we can then take products over n ≥ 0 (i.e. apply monotone convergence) and obtain the desired formula. From this, we obtain the first moment formula for functions f ≥ 0. If f is a measurable function from E to $\mathbb{R}$, applying the result to |f| shows that if f ∈ $L^1(µ)$, then M(|f|) < ∞ a.s., so M(f) is well-defined for almost every ω and defines a random variable, as it is equal to $M(f^+) - M(f^-)$. The last formula of the proposition, in the case where f ∈ $L^1(µ)$, follows by the same kind of arguments: first, we establish the formula with $f 1_{A_n}$ in place of f, where $A_n = E_0 ∪ \cdots ∪ E_n$. Then, to obtain the result, we must show that $\int_{A_n} µ(dx)\,(e^{if(x)} - 1)$ converges to $\int_E µ(dx)\,(e^{if(x)} - 1)$. But $|e^{if(x)} - 1| ≤ |f(x)|$, whence the function under consideration is integrable with respect to µ, giving the result ($|\int_{E \setminus A_n} g(x)\,µ(dx)| ≤ \int_{E \setminus A_n} |g(x)|\,µ(dx)$ decreases to 0 whenever g is integrable). □

7.3 Poisson point processes

We now show how Poisson random measures can be used to define certain stochastic processes. Let (E, E) be a measurable space, and consider a σ-finite measure G on (E, E). Let µ be the product measure dt ⊗ G(dx) on R+ × E, where dt is the Lebesgue measure on (R+, B(R+)). Otherwise said, µ is the unique measure such that µ([0, t] × A) = tG(A) for t ≥ 0 and A ∈ E.

A Lévy process $(X_t, t ≥ 0)$ (with values in $\mathbb{R}$) is a process with independent and stationary increments, i.e. such that for every $0 = t_0 ≤ t_1 ≤ \ldots ≤ t_k$, the random variables $X_{t_i} - X_{t_{i-1}}$, 1 ≤ i ≤ k, are independent, with respective laws those of $X_{t_i - t_{i-1}}$, 1 ≤ i ≤ k. Equivalently, X is a Lévy process if and only if $X^{(t)} = (X_{t+s} - X_t, s ≥ 0)$ has the same law as X and is independent of $\mathcal{F}^X_t = σ(X_s, 0 ≤ s ≤ t)$ for every t ≥ 0 (simple Markov property).

Proposition 7.3.1 A Poisson random measure M whose intensity µ is of the above form is called a Poisson point process. If f is a measurable G-integrable function on E, then the process
$$N^f_t = \int_{[0,t] \times E} f(x)\,M(ds, dx), \quad t ≥ 0,$$
is a Lévy process. Moreover, the process
$$M^f_t = \int_{[0,t] \times E} f(x)\,M(ds, dx) - t \int_E f(x)\,G(dx), \quad t ≥ 0,$$
is a martingale with respect to the filtration $\mathcal{F}_t = σ(M([0, s] × A), s ≤ t, A ∈ \mathcal{E})$, t ≥ 0. If moreover f ∈ $L^2(µ)$, the process
$$(M^f_t)^2 - t \int_E f(x)^2\,G(dx), \quad t ≥ 0,$$
is an $(\mathcal{F}_t)$-martingale.

Proof. For s ≤ t, we have $N^f_t - N^f_s = \int_{(s,t] \times E} f(x)\,M(du, dx)$. Moreover, it is easy to check that $M(du, dx) 1_{\{u ∈ (s,t]\}}$ has the same law as the image of $M(du, dx) 1_{\{u ∈ (0,t-s]\}}$ under the map $(u, x) \mapsto (s + u, x)$ from $\mathbb{R}_+ × E$ to itself, and is independent of $M(du, dx) 1_{\{u ∈ [0,s]\}}$. We obtain that $N^f$ has stationary and independent increments. The fact that $M^f$ is a martingale is a straightforward consequence of the first moment formula and the simple Markov property. The last statement comes from writing $(M^f_t)^2 = (M^f_t - M^f_s + M^f_s)^2$ and expanding, then using the variance formula and the simple Markov property. □

7.3.1 Example: the Poisson process

Let $X_1, X_2, \ldots$ be a sequence of independent exponential random variables with parameter θ, and define $0 = T_0 ≤ T_1 ≤ \ldots$ by $T_n = X_1 + \ldots + X_n$. We let
$$N^θ_t = \sum_{n=1}^{∞} 1_{\{T_n ≤ t\}}, \quad t ≥ 0,$$

be the càdlàg process that counts the number of times $T_n$ that are ≤ t. The process $(N^θ_t, t ≥ 0)$ is called the (homogeneous) Poisson process with intensity θ. This is the so-called Markovian description of the Poisson process, which is a jump-hold Markov process. The following alternative description makes use of Poisson random measures. We give the statement without proof; it can be found in textbooks, or makes a good exercise (first notice that with both definitions, $N^θ$ is a process with stationary and independent increments).
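The jump-hold description translates directly into a simulation. A minimal sketch (illustrative names, not from the notes): the jump times are cumulative sums of i.i.d. Exp(θ) holding times, and the count of jumps in [0, t] should then be Poisson(θt).

```python
import math
import random

def poisson_jump_times(theta, horizon, rng):
    """Jump times T_1 < T_2 < ... <= horizon of a Poisson process with
    intensity theta, built from i.i.d. Exp(theta) inter-arrival times."""
    times, t = [], 0.0
    while True:
        t += -math.log(rng.random()) / theta  # exponential holding time
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(7)
jumps = poisson_jump_times(3.0, 10.0, rng)
n_10 = len(jumps)  # N_10 should be Poisson(3 * 10), mean 30
```

Averaging `len(jumps)` over many paths recovers the mean θ·t of the Poisson(θt) law.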

Proposition 7.3.2 Let θ > 0, and let M be a Poisson random measure with intensity θ dt on $\mathbb{R}_+$. Then the process
$$N^θ_t = M([0, t]), \quad t ≥ 0,$$
is a Poisson process with intensity θ.

The set of atoms of the measure M itself is sometimes also called a Poisson (point) process with intensity θ.

7.3.2 Example: compound Poisson processes

A compound Poisson process with intensity ν is a process of the form
$$N^ν_t = \int_{[0,t] \times \mathbb{R}} x\,M(ds, dx), \quad t ≥ 0,$$
where M is a Poisson random measure with intensity dt ⊗ ν(dx) and ν is a finite measure on $\mathbb{R}$. Alternatively, if we write M in the form $\sum_{i ∈ I} \delta_{(t_i, x_i)}$, for every t ≥ 0 we can set $X_t = x_i$ whenever $t = t_i$, and $X_t = 0$ otherwise. As an exercise, one can prove that this is a.s. well-defined, i.e. that a.s., for every t ≥ 0, the set $\{i ∈ I : t_i = t\}$ has at most one element. With this notation, we can write

$$N^ν_t = \sum_{0 ≤ s ≤ t} X_s, \quad t ≥ 0$$
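This sum-of-jumps representation is easy to simulate via the jump-hold description given just below: jump times from a rate-ν(R) Poisson process, i.i.d. jump sizes with the normalized law ν/ν(R). A minimal sketch (illustrative names, not from the notes), taking ν = 2·N(1, 1), so the first moment formula predicts $E[N^ν_1] = \int x\,ν(dx) = 2$:

```python
import math
import random

def compound_poisson(theta, jump_sampler, t, rng):
    """Value at time t of a compound Poisson process: sum of i.i.d. jumps Y_n
    over the jump times T_n <= t of a rate-theta Poisson process, where
    theta = nu(R) and the jumps have law nu / theta."""
    s, total = 0.0, 0.0
    while True:
        s += -math.log(rng.random()) / theta  # Exp(theta) inter-arrival time
        if s > t:
            return total
        total += jump_sampler(rng)

rng = random.Random(3)
# nu = 2 * N(1, 1): total mass theta = 2, normalized jump law N(1, 1)
vals = [compound_poisson(2.0, lambda r: r.gauss(1.0, 1.0), 1.0, rng)
        for _ in range(20000)]
mean = sum(vals) / len(vals)  # first moment formula: t * int x nu(dx) = 2
```

The variance formula likewise predicts $\operatorname{Var} N^ν_1 = \int x^2\,ν(dx) = 2 \cdot (1 + 1) = 4$, which the sample variance should approximate.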

(notice that there is a.s. a finite set of times s ∈ [0, t] such that $X_s \neq 0$, so the sum is meaningful). There is a Markov jump-hold description for these processes as well: if $N^θ$ is a Poisson process with parameter θ = ν($\mathbb{R}$) and jump times $0 < T_1 < T_2 < \ldots$, and if $Y_1, Y_2, \ldots$ is a sequence of i.i.d. random variables, independent of $N^θ$ and with law ν/θ, then the process

$$\sum_{n ≥ 1} Y_n 1_{\{T_n ≤ t\}}, \quad t ≥ 0,$$
is a compound Poisson process with intensity ν. This comes from the following marking property of Poisson measures: suppose we have a description of a Poisson random measure M(dx) with intensity µ as $\sum_{i ∈ I} \delta_{X_i}(dx)$, where $(X_i, i ∈ I)$ is a countable family of random variables. If $(Y_i, i ∈ I)$ is a family of i.i.d. random variables with law ν, independent

of M, then $M' = \sum_{i ∈ I} \delta_{(X_i, Y_i)}$ is a Poisson random measure with intensity the product measure µ ⊗ ν.

We let CP(ν) be the law of $N^ν_1$; it is called the compound Poisson distribution with intensity ν. It can be written in the form

$$\mathrm{CP}(ν) = e^{-ν(\mathbb{R})} \sum_{n ≥ 0} \frac{ν^{*n}}{n!},$$
where $ν^{*n}$ is the n-fold convolution of the measure ν. Recall that the convolution of two finite measures µ, ν on $\mathbb{R}$ is the unique measure µ ∗ ν characterized by
$$µ ∗ ν(A) = \iint 1_A(x + y)\,µ(dx)\,ν(dy), \quad A ∈ \mathcal{B}_{\mathbb{R}},$$
and that if µ, ν are probability measures, then µ ∗ ν is the law of the sum of two independent random variables with respective laws µ, ν. The characteristic function of CP(ν) is given by

$$\Phi_{\mathrm{CP}(ν)}(u) = \exp(-ν(\mathbb{R})(1 - \Phi_{ν/ν(\mathbb{R})}(u))),$$
where $\Phi_{ν/ν(\mathbb{R})}$ is the characteristic function of ν/ν($\mathbb{R}$).

Chapter 8

Infinitely divisible laws and Lévy processes

In this chapter, we consider only random variables and processes with values in R.

8.1 Infinitely divisible laws and Lévy-Khintchine formula

Definition 8.1.1 Let µ be a probability measure on (R, BR). We say that µ is infinitely divisible (ID) if for every n ≥ 1, there exists a probability distribution µn such that if X1,...,Xn are independent with law µn, then their sum X1 + ... + Xn has law µ.
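In terms of characteristic functions, the definition says that Φ must be an exact n-th power of a characteristic function for every n. For the Poisson law this can be checked exactly, since its characteristic function $\exp(λ(e^{iu} - 1))$ is standard; a minimal sketch (not from the notes):

```python
import cmath

def cf_poisson(lam, u):
    """Characteristic function of the Poisson(lam) law: exp(lam*(e^{iu}-1))."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1.0))

# P(lam) is infinitely divisible: its characteristic function equals the
# n-th power of the characteristic function of P(lam / n), for every n.
lam, n = 2.5, 7
for u in (0.3, 1.3, -2.0):
    lhs = cf_poisson(lam, u)
    rhs = cf_poisson(lam / n, u) ** n
    assert abs(lhs - rhs) < 1e-12
```

The identity holds exactly (up to floating-point error) because $λ(e^{iu} - 1) = n \cdot \frac{λ}{n}(e^{iu} - 1)$.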

Otherwise said, for every n there exists $µ_n$ such that $µ_n^{*n} = µ$, where ∗ stands for the convolution operation on measures. Yet otherwise said, the characteristic function Φ of µ is such that for every n ≥ 1, there exists another characteristic function $Φ_n$ with $Φ_n^n = Φ$. We stress that it is not the existence of a function whose n-th power is Φ which is problematic, but really that this function is a characteristic function.

To start with, let us mention examples of ID laws. Constant random variables are ID. The Gaussian N(m, σ²) is the convolution of n laws N(m/n, σ²/n), so it is ID. The Poisson law $\mathcal{P}(λ)$ is also ID, as the convolution of n laws $\mathcal{P}(λ/n)$. More generally, a compound Poisson law CP(ν) is ID, as the n-fold convolution of CP(ν/n). It is a bit harder to see, but nevertheless true, that exponential and geometric distributions are ID. However, the uniform distribution on [0, 1] and the Bernoulli distribution with parameter p ∈ (0, 1) are not ID. Suppose indeed that an ID law µ has support bounded in absolute value by some M > 0. Then the support of $µ_n$ is bounded by M/n, so its variance is ≤ M²/n², which shows that the variance of µ is ≤ M²/n for every n; hence µ is a Dirac mass.

The main goal of this chapter is to give a structural theorem for ID laws, the Lévy-Khintchine formula. Say that a triple (a, q, Π) is a Lévy triple if

• a ∈ $\mathbb{R}$,
• q ≥ 0,


• Π is a σ-finite measure on $\mathbb{R}$ such that Π({0}) = 0 and $\int_{\mathbb{R}} (x^2 \wedge 1)\,Π(dx) < ∞$.

In particular, $Π(\{|x| > ε\}) < ∞$ for every ε > 0.

Theorem 8.1.1 (Lévy-Khintchine formula) Let µ be an ID law. Then there exists a unique Lévy triple (a, q, Π) such that if Φ is the characteristic function of µ, then $Φ(u) = e^{ψ(u)}$, where ψ is the characteristic exponent given by

$$ψ(u) = iau - \frac{q}{2}u^2 + \int_{\mathbb{R}} \big(e^{iux} - 1 - iux 1_{\{|x|<1\}}\big)\,Π(dx).$$

We recover the constant laws for q = 0 and Π = 0, the normal laws for Π = 0, and the compound Poisson laws for a = q = 0 and Π(dx) = ν(dx) a finite measure.

Lemma 8.1.1 The characteristic function Φ of an ID law never vanishes, and therefore, the characteristic exponent ψ with ψ(0) = 0 is well-defined and unique.

Proof. If µ is ID, then $Φ = Φ_n^n$ for all n, where $Φ_n$ is the characteristic function of some law $µ_n$. Therefore $|Φ_n| = |Φ|^{1/n}$, and as n → ∞, we see that $Φ_n$ converges pointwise to $1_{\{Φ \neq 0\}}$. However, Φ is continuous and takes the value 1 at 0, so it is non-zero in a neighborhood of 0, and $1_{\{Φ \neq 0\}}$ equals 1 (hence is continuous) in a neighborhood of 0. By Lévy's convergence theorem, this shows that $µ_n$ converges weakly to some distribution, which has no choice but to be $δ_0$. In particular, Φ never vanishes. To conclude, it is a standard topology exercise that a continuous function f : $\mathbb{R}$ → $\mathbb{C}$ that never vanishes and such that f(0) = 1 can be uniquely lifted into a continuous function g : $\mathbb{R}$ → $\mathbb{C}$ with g(0) = 0 such that $e^g = f$. □

As a corollary, notice that $Φ_n$, the n-th 'root' of Φ, can itself be written in the form $e^{ψ_n}$ for a unique continuous $ψ_n$ satisfying $ψ_n(0) = 0$, so that $ψ_n = ψ/n$. It also entails the uniqueness of $µ_n$ such that $µ_n^{*n} = µ$.

Lemma 8.1.2 An ID law is the weak limit of compound Poisson laws.

Proof. Let $Φ_n$ be the characteristic function of $µ_n$, as defined above. Then since $(1 - (1 - Φ_n))^n = Φ$, and $Φ_n → 1$ pointwise, we obtain that $-n(1 - Φ_n) → ψ$ pointwise, taking the complex logarithm in a neighborhood of 1. In fact, this convergence even holds uniformly on compact neighborhoods of 0, a fact that we will need later on. Exponentiating gives $\exp(-n(1 - Φ_n)) → Φ$. However, on the left-hand side we recognize the characteristic function of a compound Poisson law with intensity $nµ_n$. □

Proof of the Lévy-Khintchine formula. We must now prove that the limit ψ of $-n(1 - Φ_n)$ has the form given in the statement of the theorem. First of all, we make a technical modification of the statement, replacing the indicator $1_{\{|x|<1\}}$ by a continuous function h such that $1_{\{|x|<1\}} ≤ h ≤ 1_{\{|x|≤2\}}$. This just modifies the value of a in the statement.

Let $η_n(dx) = (1 \wedge x^2)\,nµ_n(dx)$, a sequence of measures with finite total mass. Suppose we know that the sequence $(η_n, n ≥ 1)$ is tight and $(η_n(\mathbb{R}), n ≥ 1)$ is bounded, and let η be the limit of $η_n$ along some subsequence $(n_k)$. Then
$$\int_{\mathbb{R}} (e^{iux} - 1)\,nµ_n(dx) = \int_{\mathbb{R}} (e^{iux} - 1)\,\frac{η_n(dx)}{x^2 \wedge 1} = \int_{\mathbb{R}} \frac{e^{iux} - 1 - iuxh(x)}{x^2 \wedge 1}\,η_n(dx) + iu \int_{\mathbb{R}} \frac{xh(x)}{x^2 \wedge 1}\,η_n(dx) = \int_{\mathbb{R}} Θ(u, x)\,η_n(dx) + iua_n, \quad (8.1)$$
where
$$Θ(u, x) = \begin{cases} (e^{iux} - 1 - iuxh(x))/(x^2 \wedge 1) & \text{if } x \neq 0, \\ -u^2/2 & \text{if } x = 0, \end{cases}$$
and $a_n = \int_{\mathbb{R}} \frac{xh(x)}{x^2 \wedge 1}\,η_n(dx)$. Now, for each fixed u, $Θ(u, \cdot)$ is a continuous bounded function, and therefore, along the subsequence $(n_k)$, $\int_{\mathbb{R}} Θ(u, x)\,η_n(dx)$ converges to $\int_{\mathbb{R}} Θ(u, x)\,η(dx)$. Since the left-hand side of (8.1) converges to ψ(u), this implies that $a_{n_k}$ converges to some a ∈ $\mathbb{R}$. Therefore, if q = η({0}), we obtain that
$$ψ(u) = iua - \frac{q}{2}u^2 + \int_{\mathbb{R}} \big(e^{iux} - 1 - iuxh(x)\big)\,Π(dx),$$
where $Π(dx) = 1_{\{x \neq 0\}}(x^2 \wedge 1)^{-1}\,η(dx)$ is a measure that is σ-finite, integrates $x^2 \wedge 1$, and does not charge 0. Hence the result.

So, let us prove that $(η_n, n ≥ 1)$ is tight and that the total masses are bounded. First, $x^2 1_{\{|x| ≤ 1\}} ≤ C(1 - \cos x)$ for some C > 0, so
$$η_n(\{|x| ≤ 1\}) = \int_{\mathbb{R}} x^2 1_{\{|x| ≤ 1\}}\,nµ_n(dx) ≤ C \int_{\mathbb{R}} (1 - \cos x)\,nµ_n(dx),$$
which converges to $-C\,\Re ψ(1)$ as n → ∞. Second, adapting Lemma 5.4.1, since $η_n 1_{\{|x| ≥ 1\}} = nµ_n 1_{\{|x| ≥ 1\}}$, for some C > 0 and every K ≥ 1,
$$η_n(\{|x| ≥ K\}) ≤ CK \int_{\{|x| ≤ K^{-1}\}} n(1 - \Re Φ_n(x))\,dx \underset{n \to ∞}{\longrightarrow} -CK \int_{-K^{-1}}^{K^{-1}} \Re ψ(x)\,dx,$$
where the limit can be taken because the convergence of the integrand is uniform on compact neighborhoods of 0, as stressed in the proof of Lemma 8.1.2. Now the limit can be made as small as wanted by taking K large enough, because ψ is continuous and ψ(0) = 0. This entails the result. The uniqueness statement will be proved in the next section. □

8.2 Lévy processes

In this section, all the Lévy processes under consideration start at $X_0 = 0$. Lévy processes are closely related to ID laws: indeed, if X is a Lévy process, then the random variable

$X_1$ can be written as a sum of i.i.d. variables
$$X_1 = \sum_{k=1}^{n} (X_{k/n} - X_{(k-1)/n}),$$
hence is ID. In fact, (laws of) càdlàg Lévy processes are in one-to-one correspondence with ID laws, as we show in this section. The first point we establish is that the mapping $X \mapsto X_1$ is injective from the set of (laws of) càdlàg Lévy processes to the set of ID laws.

Proposition 8.2.1 Let µ be an ID law. Then there exists at most one (law of a) càdlàg Lévy process $(X_t, t ≥ 0)$ such that $X_1$ has law µ. Moreover, if µ has Lévy triple (a, q, Π) with associated characteristic exponent
$$ψ(u) = iau - \frac{q}{2}u^2 + \int_{\mathbb{R}} \big(e^{iux} - 1 - iux 1_{\{|x|<1\}}\big)\,Π(dx),$$
then the law of such a process X is entirely characterized by the formula
$$E[\exp(iuX_t)] = \exp(tψ(u)).$$

Proof. If X is as in the statement, then for n ≥ 1, $X_{1/n}$ must have ψ/n as characteristic exponent, by uniqueness of the characteristic exponent of the n-th root of an ID law. From this we easily deduce that $E[\exp(iuX_t)] = \exp(tψ(u))$ for every $t ∈ \mathbb{Q}_+$ and u ∈ $\mathbb{R}$. Since X is càdlàg, we deduce the result for every $t ∈ \mathbb{R}_+$ by approximating t by $2^{-n}\lceil 2^n t \rceil$. Therefore, the one-dimensional marginal distributions of X are uniquely determined by µ. It is then easy to check that the finite-dimensional marginal distributions of Lévy processes are in turn determined by their one-dimensional marginals, because the increments $X_{t_j} - X_{t_{j-1}}$, 1 ≤ j ≤ k, for any $0 = t_0 ≤ t_1 ≤ \ldots ≤ t_k$, are independent with respective laws those of $X_{t_j - t_{j-1}}$, 1 ≤ j ≤ k. Hence the result. □

The next theorem is a kind of converse to this proposition, and gives an explicit construction of 'the' càdlàg Lévy process whose law at time 1 is a given ID law µ. Let (a, q, Π) be a Lévy triple associated to an ID law µ. Consider a Poisson random measure M on $\mathbb{R}_+ × \mathbb{R}$ with intensity dt ⊗ Π(dx), and set $∆_t = x$ if M has an atom of the form (t, x), and $∆_t = 0$ otherwise. For any n ≥ 1, consider the martingale
$$Y^n_t = \int_{[0,t] \times \mathbb{R}} 1_{\{n^{-1} ≤ |y| < 1\}}\,y\,M(ds, dy) - t \int_{\mathbb{R}} y\,1_{\{n^{-1} ≤ |y| < 1\}}\,Π(dy), \quad t ≥ 0,$$
associated by Proposition 7.3.1 with the Poisson measure $M(dt, dx) 1_{\{n^{-1} ≤ |x| < 1\}}$. Notice also that this last measure a.s. has a finite number of atoms in $[0, t] × \mathbb{R}$ for every t, because $Π(dx) 1_{\{|x| > n^{-1}\}}$ is a finite measure by assumption on Π, so that
$$Y^n_t = \sum_{0 ≤ s ≤ t} ∆_s 1_{\{n^{-1} ≤ |∆_s| < 1\}} - t \int_{\mathbb{R}} y\,1_{\{n^{-1} ≤ |y| < 1\}}\,Π(dy), \quad t ≥ 0. \quad (8.2)$$

Independently of M, let B be a standard Brownian motion. Finally, notice that
$$Y^0_t = \sum_{0 ≤ s ≤ t} ∆_s 1_{\{|∆_s| ≥ 1\}}, \quad t ≥ 0, \quad (8.3)$$
is a compound Poisson process with intensity $Π(dx) 1_{\{|x| ≥ 1\}}$. We let $\mathcal{F}_t$ be the σ-algebra generated by $\{B_s, Y^0_s, Y^n_s, n ≥ 1;\ 0 ≤ s ≤ t\}$.

Theorem 8.2.1 (Lévy-Itō's theorem) Let µ be an ID law with Lévy triple (a, q, Π), and let $B, Y^0, Y^n$, n ≥ 1, denote the processes associated with this triple as explained above. Then there exists a càdlàg square-integrable $(\mathcal{F}_t)$-martingale $Y^∞$ such that for every t ≥ 0,
$$E\Big[\sup_{0 ≤ s ≤ t} |Y^n_s - Y^∞_s|^2\Big] \underset{n \to ∞}{\longrightarrow} 0.$$
Moreover, the process
$$X_t = at + \sqrt{q}\,B_t + Y^0_t + Y^∞_t, \quad t ≥ 0,$$
is a Lévy process such that $X_1$ has distribution µ.

This theorem, which is extremely useful in practice, gives an explicit construction of any càdlàg Lévy process out of independent ingredients: a deterministic drift, a Brownian motion, a compound Poisson process of 'large' jumps, and a compensated $L^2$ càdlàg martingale of 'small' jumps. The compensation by a drift in the formula defining $Y^n$ is crucial, because the identity function is in general not in $L^1(Π)$, so that $\int_{[0,t] \times [-1,1]} x\,M(ds, dx)$ is in general ill-defined.

Proof. For every n > m > 0, the process $Y^n - Y^m$ is a càdlàg martingale, and Doob's $L^2$ inequality gives
$$E\Big[\sup_{0 ≤ s ≤ t} |Y^n_s - Y^m_s|^2\Big] ≤ 4E[|Y^n_t - Y^m_t|^2] = 4t \int_{\mathbb{R}} y^2\,1_{\{n^{-1} ≤ |y| < m^{-1}\}}\,Π(dy),$$
which goes to 0 as m, n → ∞ since Π integrates $y^2 \wedge 1$; hence $(Y^n)$ converges, uniformly on compacts in $L^2$, to a càdlàg square-integrable martingale $Y^∞$.

By passing to the limit as n → ∞, we obtain that $X_1$ has the characteristic function associated to µ. □
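The Lévy-Itō decomposition can be illustrated numerically. Below is a minimal sketch (illustrative names, not from the notes) for a triple whose jump measure Π is supported on {|x| ≥ 1}, so the compensation term vanishes and $X_1$ is just drift + Gaussian + compound Poisson; the empirical characteristic function is compared with the Lévy-Khintchine prediction $\exp(ψ(u))$.

```python
import cmath
import math
import random

def sample_X1(a, q, theta, jump_sampler, rng):
    """One sample of X_1 for a Levy triple (a, q, Pi) with Pi supported on
    {|x| >= 1} (total mass theta), so no compensation is needed:
    X_1 = a + sqrt(q) * B_1 + (compound Poisson with intensity Pi at time 1)."""
    x = a + math.sqrt(q) * rng.gauss(0.0, 1.0)
    s = 0.0
    while True:
        s += -math.log(rng.random()) / theta  # jump times of a rate-theta process
        if s > 1.0:
            return x
        x += jump_sampler(rng)

rng = random.Random(11)
a, q, theta, jump = 0.3, 0.5, 1.0, 2.0  # Pi = delta_2 (all jumps equal 2)
u, n = 0.7, 20000
emp = sum(cmath.exp(1j * u * sample_X1(a, q, theta, lambda r: jump, rng))
          for _ in range(n)) / n
# Levy-Khintchine exponent for this triple (no compensation since |jump| >= 1)
psi = 1j * u * a - q * u * u / 2 + theta * (cmath.exp(1j * u * jump) - 1.0)
```

The empirical average `emp` should be close to `cmath.exp(psi)`, in line with $E[\exp(iuX_1)] = \exp(ψ(u))$.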

Proof of the uniqueness in Theorem 8.1.1. Let µ be an ID law with Lévy triple (a, q, Π), and let X be the unique càdlàg Lévy process such that $X_1$ has law µ, given by Proposition 8.2.1. Then Theorem 8.2.1 shows that the jumps of X between times 0 and t, i.e. the process $(∆_s, 0 ≤ s ≤ t)$ defined by $∆_s = X_s - X_{s-}$, are the atoms of a Poisson random measure M with intensity tΠ on $\mathbb{R}$. This intensity is determined by the law of M through the first moment formula $tΠ(A) = E[M(A)]$ of Proposition 7.2.1. Then, defining $Y^0$ and $Y^n$ by the formulas (8.2), (8.3), and letting $Y^∞ = \lim_n Y^n$ along a subsequence for which the limit is almost sure, uniformly on compacts, we obtain that $X - Y^0 - Y^∞$ is a (scaled) Brownian motion with drift, with the same law as $\widetilde{B} = (at + \sqrt{q}\,B_t, t ≥ 0)$. We can recover a as the expectation of $\widetilde{B}_1$, and q as its variance. Finally, we see that µ uniquely determines its Lévy triple. □

Chapter 9

Exercises

Warmup

The exercises of this section are designed to remind you of basic concepts of probability theory (random variables, expectation, classical probability distributions, Borel-Cantelli lemmas). The last one is a longer exercise that contains the basic results on uniform integrability that are needed in this course.

Exercise 9.0.1 Remind yourself what the following classical discrete distributions are: Bernoulli with parameter p ∈ [0, 1], binomial with parameters (n, p) ∈ $\mathbb{N}$ × [0, 1], geometric with parameter p ∈ [0, 1], Poisson with parameter λ ≥ 0. Do the same for the following classical distributions on $\mathbb{R}$: uniform on [a, b], exponential with mean $θ^{-1}$, gamma with (positive) parameters (a, θ) (mean a/θ, variance a/θ²), beta with (positive) parameters (a, b), Gaussian with mean m and variance σ², Cauchy with parameter a.

Exercise 9.0.2 Compute the distribution of $1/N^2$, where N is a standard Gaussian N(0, 1) random variable. What is the distribution of $N/N'$, where N, N′ are two independent such random variables?

Exercise 9.0.3 Show that for any countable set I and any I-indexed family $(X_i, i ∈ I)$ of non-negative random variables, $\sup_{i ∈ I} E[X_i] ≤ E[\sup_{i ∈ I} X_i]$. Show that these two quantities are equal if for every i, j ∈ I there exists some k ∈ I such that $X_i ∨ X_j ≤ X_k$.

Exercise 9.0.4 Fix α > 0, and let $(Z_n, n ≥ 1)$ be a sequence of independent random variables with values in {0, 1}, whose laws are characterized by
$$P(Z_n = 1) = \frac{1}{n^α} = 1 - P(Z_n = 0).$$
Show that $Z_n$ converges to 0 in $L^1$. Show that $\limsup_n Z_n$ is 0 a.s. if α > 1 and 1 a.s. if α ≤ 1.


Exercise 9.0.5 Let $(X_n, n ≥ 1)$ be a sequence of independent exponential random variables with mean 1. Show that $\limsup_n (\log n)^{-1} X_n = 1$ a.s.

Exercise 9.0.6 Let N be an N(0, 1) random variable. Show that
$$P(N > x) ≤ \frac{1}{x\sqrt{2π}}\exp(-x^2/2).$$
Show in fact that as x → ∞,
$$P(N > x) = \frac{1}{x\sqrt{2π}}\exp(-x^2/2)\,(1 + o(1)).$$

Let $(Y_n, n ≥ 1)$ be a sequence of independent such Gaussian variables. Show that $\limsup_n (2\log n)^{-1/2} Y_n = 1$ a.s.
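The tail bound and its asymptotic sharpness can be seen numerically. A minimal sketch (not from the notes; `gaussian_tail` is an illustrative name) computing P(N > x) by direct midpoint-rule integration of the standard Gaussian density:

```python
import math

def gaussian_tail(x, n=200000, width=12.0):
    """P(N > x) for N ~ N(0,1), by midpoint-rule integration of the density
    over [x, x + width]; the mass beyond x + width is negligible here."""
    h = width / n
    s = 0.0
    for i in range(n):
        t = x + (i + 0.5) * h
        s += math.exp(-t * t / 2.0)
    return s * h / math.sqrt(2.0 * math.pi)

x = 3.0
bound = math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))
tail = gaussian_tail(x)
ratio = tail / bound  # below 1, and approaches 1 as x grows
```

At x = 3 the ratio is already above 0.9, and it increases toward 1 for larger x, matching the (1 + o(1)) statement.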

Exercise 9.0.7 The basics of uniform integrability
Let (E, $\mathcal{A}$, µ) be a measured space with µ(E) < ∞. If f is a measurable non-negative function, we let µ(f) be shorthand for $\int_E f\,dµ$. A family of $\mathbb{R}$-valued functions $(f_i, i ∈ I)$ in $L^1(E, \mathcal{A}, µ)$ is said to be uniformly integrable (U.I. for short) if the following holds:
$$\sup_{i ∈ I} µ\big(|f_i| 1_{\{|f_i| > a\}}\big) \underset{a \to ∞}{\longrightarrow} 0.$$

You may think of (E, $\mathcal{A}$, µ) and the $f_i$ as being a probability space and random variables.
1. Show that a U.I. family is bounded in $L^1(E, \mathcal{A}, µ)$. Show that the converse is not true.
2. Show that a finite family of integrable functions is U.I.
3. Let G : $\mathbb{R}_+$ → $\mathbb{R}_+$ be a measurable function such that $\lim_{x \to ∞} x^{-1}G(x) = +∞$. Show that for every C > 0, the family

$$\{f ∈ L^1(E, \mathcal{A}, µ) : µ(G(|f|)) ≤ C\}$$
is U.I. Deduce that a family of measurable functions that is bounded in $L^p(E, \mathcal{A}, µ)$ for some p > 1 is U.I.

4. (Harder) Show that the converse is true: if $(f_i, i ∈ I)$ is a U.I. family, then there exists a function G as in 3. such that $(f_i, i ∈ I)$ is included in a set of the form of the previous displayed expression. (Hint: consider an increasing positive sequence $(a_n, n ≥ 0)$ such that $\sup_{i ∈ I} µ(|f_i| 1_{\{f_i ≥ a_n\}}) ≤ 2^{-n}$ for every n.)
5. Let $(f_i, i ∈ I)$ be a family that is bounded in $L^1(E, \mathcal{A}, µ)$. Show that (i) and (ii) below are equivalent:

(i) $(f_i, i ∈ I)$ is U.I.
(ii) ∀ε > 0, ∃δ > 0 s.t. ∀A ∈ $\mathcal{A}$, µ(A) < δ ⟹ $\sup_{i ∈ I} µ(|f_i| 1_A) < ε$.

6. Show that if $(f_i, i ∈ I)$ and $(g_j, j ∈ J)$ are two U.I. families, then $(f_i + g_j, i ∈ I, j ∈ J)$ is also U.I.

7. Let $(f_n, n ≥ 0)$ be a sequence of $L^1$ functions that converges in measure to a measurable function f, i.e. for every ε > 0,

$$µ(\{|f - f_n| > ε\}) \underset{n \to ∞}{\longrightarrow} 0.$$

Show that $(f_n, n ≥ 0)$ converges in $L^1$ to f if and only if $(f_n, n ≥ 0)$ is U.I. Hint: for the necessary condition, you might find it useful to consider sets such as $\{|f - f_n| > 1\}$, $\{ε < |f - f_n| ≤ 1\}$ and $\{|f - f_n| ≤ ε\}$.

Remark. This shows that a sequence of random variables converging in probability (or a.s.) to some other random variable has an ’upgraded’ L1 convergence if and only if it is uniformly integrable.

9.1 Conditional expectation

Exercise 9.1.1 Let X,Y be two random variables in L1 so that

E[X|Y ] = Y and E[Y |X] = X.

Show that X = Y a.s. As a hint, you may want to consider quantities like E[(X − Y )1{X>c,Y ≤c}] + E[(X − Y )1{X≤c,Y ≤c}].

Exercise 9.1.2 Let X,Y be two independent Bernoulli random variables with parameter p ∈ (0, 1). Let Z = 1{X+Y =0}. Compute E[X|Z],E[Y |Z].

Exercise 9.1.3 Let X ≥ 0 be a random variable on a probability space (Ω, F,P ), and let G ⊆ F be a sub-σ-algebra. Show that X > 0 implies that E[X|G] > 0, up to an event of zero probability. Show that {E[X|G] > 0} is actually the smallest G-measurable event that contains the event {X > 0}, up to zero probability events.

Exercise 9.1.4 Check that the sum Z of two independent exponential random variables X, Y with parameter θ > 0 (mean 1/θ) has a gamma distribution with parameters (2, θ), whose density with respect to Lebesgue measure is $θ^2 x \exp(-θx) 1_{\{x ≥ 0\}}$. Show that for every non-negative measurable h,
$$E[h(X)|Z] = \frac{1}{Z}\int_0^Z h(u)\,du.$$
Conversely, let Z be a random variable with a Γ(2, θ) distribution, and suppose X is a random variable whose conditional distribution given Z is uniform on [0, Z]; namely, for every Borel non-negative function h, $E[h(X)|Z] = Z^{-1}\int_0^Z h(x)\,dx$ a.s. Show that X and Z − X are independent, with exponential law.

Exercise 9.1.5 Suppose given a, b > 0, and let X, Y be two random variables with values in $\mathbb{Z}_+$ and $\mathbb{R}_+$ respectively, whose joint distribution is characterized by the formula
$$P(X = n, Y ≤ t) = \int_0^t \frac{(ay)^n}{n!}\,b\exp(-(a + b)y)\,dy.$$

Let n ∈ Z+ and h : R+ → R+ be a measurable function, compute E[h(Y )|X = n]. Then compute E[Y/(X + 1)], E[1{X=n}|Y ] and E[X|Y ].

Exercise 9.1.6 Let $(X, Y_1, \ldots, Y_n)$ be a random vector with components in $L^2$. Show that the best approximation of X in the $L^2$ norm by an affine combination of the $(Y_i, 1 ≤ i ≤ n)$, say of the form $λ_0 + \sum_{i=1}^n λ_i(Y_i - E[Y_i])$, is given by $λ_0 = E[X]$ and any solution $(λ_1, \ldots, λ_n)$ of the linear system
$$\operatorname{Cov}(X, Y_j) = \sum_{i=1}^n λ_i \operatorname{Cov}(Y_i, Y_j), \quad 1 ≤ j ≤ n.$$

This affine combination is called the linear regression of X with respect to (Y1,...,Yn).

If (X,Y1,...,Yn) is a Gaussian random vector, show that E[X|Y1,...,Yn] equals the linear regression of X with respect to (Y1,...,Yn).

Exercise 9.1.7 Let X ∈ L1(Ω, F,P ). Show that the family {E[X|G]: G is a sub-σ-algebra of F} is uniformly integrable.

Exercise 9.1.8 Conditional independence Let G ⊆ F be a sub-σ-algebra. Two random variables X,Y are said to be independent conditionally on G if for every non-negative measurable f, g, E[f(X)g(Y )|G] = E[f(X)|G] E[g(Y )|G].

What does it mean for two random variables to be independent conditionally on {∅, Ω}? On $\mathcal{F}$?
1. Show that X, Y are independent conditionally on $\mathcal{G}$ if and only if for every non-negative $\mathcal{G}$-measurable random variable Z and all non-negative measurable functions f, g,
$$E[f(X)g(Y)Z] = E\big[f(X)Z\,E[g(Y)|\mathcal{G}]\big],$$
and this holds if and only if for every measurable non-negative g, $E[g(Y)|\mathcal{G} ∨ σ(X)] = E[g(Y)|\mathcal{G}]$.

Comment on the case $\mathcal{G}$ = {∅, Ω}.
2. Suppose given three random variables X, Y, Z with a positive joint density p(x, y, z). Suppose X, Y are independent conditionally on σ(Z). Show that there exist measurable positive functions r, s such that p(x, y, z) = q(z)r(x, z)s(y, z), where q is the density of Z, and conversely.

9.2 Discrete-time martingales

Exercise 9.2.1 Let (Xn, n ≥ 0) be an integrable process with values in a countable subset E ⊂ R. Show that X is a martingale with respect to its natural filtration if and only if for every n and every i0, . . . , in ∈ E, E[Xn+1|X0 = i0,...,Xn = in] = in.

Exercise 9.2.2 Let $(X_n, n ≥ 1)$ be a sequence of independent random variables with respective laws given by
$$P(X_n = -n^2) = \frac{1}{n^2}, \qquad P\Big(X_n = \frac{n^2}{n^2 - 1}\Big) = 1 - \frac{1}{n^2}.$$

Let Sn = X1 +...+Xn. Show that Sn/n → 1 a.s. as n → ∞, and deduce that (Sn, n ≥ 0) is a martingale which converges to +∞.

Exercise 9.2.3 Let (Ω, $\mathcal{F}$, $(\mathcal{F}_n)$, P) be a filtered probability space. Let A ∈ $\mathcal{F}_n$ for some n, and let m, m′ ≥ n. Show that $m 1_A + m' 1_{A^c}$ is a stopping time.

Show that an adapted process (Xn, n ≥ 0) with respect to some filtered probability space is a martingale if and only if it is integrable, and for every bounded stopping time T , E[XT ] = E[X0].

Exercise 9.2.4 Let X be a martingale (resp. supermartingale) on some filtered probability space, and let T be an a.s. finite stopping time. Prove that E[XT ] = E[X0] (resp. E[XT ] ≤ E[X0]) if either one of the following conditions holds:

1. X is bounded (∃M > 0 : ∀n ≥ 0, |Xn| ≤ M a.s.).

2. X has bounded increments (∃M > 0 : ∀n ≥ 0, |Xn+1 − Xn| ≤ M a.s.) and E[T ] < ∞.

Exercise 9.2.5 Let $(X_n, n ≥ 0)$ be a non-negative supermartingale. Show the maximal inequality: for a > 0,
$$a\,P\Big(\max_{0 ≤ k ≤ n} X_k ≥ a\Big) ≤ E[X_0].$$

Exercise 9.2.6 Let T be an (Fn, n ≥ 0)-stopping time such that for some integer N > 0 and ε > 0,

P (T ≤ N + n|Fn) ≥ ε , for every n ≥ 0.

Show that E[T] < ∞. Hint: find bounds for P(T > kN).

Exercise 9.2.7 Your winnings per unit stake on game n are εn, where (εn, n ≥ 0) is a sequence of independent random variables with

P (εn = 1) = p , P (εn = −1) = 1 − p = q, where p ∈ (1/2, 1). Your stake Cn on game n must lie between 0 and Zn−1, where Zn−1 is your fortune at time n − 1. Your object is to maximize the expected ’interest rate’ E[log(ZN /Z0)] where N is a given integer representing the length of the game, and Z0, your fortune at time 0, is a given constant. Let Fn = σ{ε1, . . . , εn}. Show that if C is any previsible strategy, that is if Cn is Fn−1-measurable for all n, then log Zn − nα is a supermartingale, where α denotes the entropy

α = p log p + q log q + log 2, so that E[log(ZN /Z0)] ≤ Nα, but that, for a certain strategy, log Zn −nα is a martingale. What is the best strategy?

Exercise 9.2.8 Pólya's urn
Consider an urn that initially contains two balls, one black, one white. One picks one of the balls uniformly at random, checks its color, replaces the ball in the urn and adds another ball of the same color. Then the procedure is repeated. After step n, there are n + 2 balls in the urn, of which $B_n + 1$ are black and $n + 1 - B_n$ are white.
1. Show that $((n + 2)^{-1}(B_n + 1), n ≥ 0)$ is a martingale with respect to a filtration you should specify. Show that it converges a.s. and in $L^p$ for all p ≥ 1 to a [0, 1]-valued random variable $X_∞$.
2. Show that for every k, the process
$$\frac{(B_n + 1)(B_n + 2) \cdots (B_n + k)}{(n + 2)(n + 3) \cdots (n + k + 1)}, \quad n ≥ 1,$$

is a martingale. Deduce the value of $E[X_∞^k]$, and finally the law of $X_∞$.
3. Re-obtain this result by directly showing that $P(B_n = k) = (n + 1)^{-1}$ for every n ≥ 1, 0 ≤ k ≤ n. As a hint, let $Y_i$ be the indicator that the i-th picked ball is black, and compute $P(Y_i = a_i, 1 ≤ i ≤ n)$ for any $(a_i, 1 ≤ i ≤ n) ∈ \{0, 1\}^n$.
4. Show that for 0 < θ < 1, $(N^θ_n, n ≥ 0)$ is a martingale, where

$$N^θ_n = \frac{(n + 1)!}{B_n!\,(n - B_n)!}\,θ^{B_n}(1 - θ)^{n - B_n}.$$
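The urn is straightforward to simulate, and the simulation anticipates the answer to questions 2-3: the limit $X_∞$ of the fraction of black balls is uniform on [0, 1]. A minimal sketch (illustrative names, not from the notes):

```python
import random

def polya_fraction(n_steps, rng):
    """Run Polya's urn for n_steps and return (B_n + 1) / (n + 2), the
    fraction of black balls (the martingale of question 1)."""
    black, total = 1, 2  # one black, one white initially
    for _ in range(n_steps):
        if rng.random() * total < black:  # a black ball was picked
            black += 1
        total += 1
    return black / total

rng = random.Random(5)
finals = [polya_fraction(500, rng) for _ in range(5000)]
mean = sum(finals) / len(finals)               # E[X_inf] = 1/2 for uniform
m2 = sum(f * f for f in finals) / len(finals)  # E[X_inf^2] = 1/3 for uniform
```

The first two empirical moments should approximate 1/2 and 1/3, the moments of the uniform law on [0, 1].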

Exercise 9.2.9 Bayes' urn
Let U be a uniform random variable on [0, 1], and conditionally on U, let $X_1, X_2, \ldots$ be independent Bernoulli random variables with parameter U. Let $B_n = \sum_{i=1}^n X_i$. Show that for every n, $(B_1, \ldots, B_n)$ has the same law as the sequence $(B_1, \ldots, B_n)$ in the previous exercise. Show that $N^θ_n$ is a conditional density function of U given $B_1, \ldots, B_n$.

Exercise 9.2.10 Monkey typing ABRACADABRA
A monkey types a text at random on a keyboard, so that each new letter is picked

uniformly at random among the 26 letters of the roman alphabet. Let Xn be the n-th letter of the monkey’s masterpiece, and let T be the first time when the monkey has typed the exact word ABRACADABRA

T = inf{n ≥ 0 : (Xn−10,Xn−9,...,Xn) = (A, B, R, A, C, A, D, A, B, R, A)}.

Show that E[T] < ∞. The goal is to compute the exact value of E[T]. For this, suppose that just before each time n, a player $P_n$ comes and bets 1 gold coin (GC) that $X_n$ will be A. If he loses, he leaves the game; if he wins, he earns 26 GC, which he entirely stakes on $X_{n+1}$ being B. If he loses, he leaves; otherwise he earns $26^2$ GC, which he bets on $X_{n+2}$ being R, and so on. Show that $E[T] = 26^{11} + 26^4 + 26$. (Hint: use Exercise 9.2.4.) Why is this larger than the expected first time at which the monkey has typed ABRACADABRI?
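The betting argument gives a general recipe: each suffix of the word that is also a prefix contributes (alphabet size)^k to E[T]. A minimal sketch (illustrative names, not from the notes), with a small-alphabet simulation as a sanity check, since simulating the 26-letter case directly is hopeless:

```python
import random

def expected_hitting_time(word, alphabet_size):
    """E[T] from the betting argument: every suffix of `word` of length k
    that is also a prefix contributes alphabet_size**k gold coins at time T."""
    n = len(word)
    return sum(alphabet_size ** k
               for k in range(1, n + 1) if word[n - k:] == word[:k])

assert expected_hitting_time("ABRACADABRA", 26) == 26 ** 11 + 26 ** 4 + 26
assert expected_hitting_time("ABRACADABRI", 26) == 26 ** 11  # no self-overlap

def simulated_hitting_time(word, alphabet, rng):
    buf, t = "", 0
    while not buf.endswith(word):
        buf += rng.choice(alphabet)
        t += 1
    return t

rng = random.Random(2)
# sanity check on a two-letter alphabet: for "AA", E[T] = 2^2 + 2 = 6
avg = sum(simulated_hitting_time("AA", "AB", rng) for _ in range(20000)) / 20000
```

The comparison of ABRACADABRA with ABRACADABRI makes the 'why larger' question concrete: the self-overlaps A and ABRA add the extra $26 + 26^4$ coins.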

Exercise 9.2.11 Let $(X_n, n ≥ 0)$ be a sequence of [0, 1]-valued random variables satisfying the following properties. First, $X_0 = a$ a.s. for some a ∈ (0, 1), and for n ≥ 0,
$$P\Big(X_{n+1} = \frac{X_n}{2}\,\Big|\,\mathcal{F}_n\Big) = 1 - X_n, \qquad P\Big(X_{n+1} = \frac{1 + X_n}{2}\,\Big|\,\mathcal{F}_n\Big) = X_n,$$
where $\mathcal{F}_n = σ\{X_k, 0 ≤ k ≤ n\}$. Here we have denoted $P(A|\mathcal{G}) = E[1_A|\mathcal{G}]$.
1. Prove that $(X_n, n ≥ 0)$ is a martingale that converges in $L^p$ for every p ≥ 1.
2. Check that $E[(X_{n+1} - X_n)^2] = E[X_n(1 - X_n)]/4$. Then determine $E[X_∞(1 - X_∞)]$ and deduce the law of $X_∞$.

Exercise 9.2.12 Let $(X_n, n ≥ 0)$ be a martingale in $L^2$. Show that its increments $(X_{n+1} - X_n, n ≥ 0)$ are pairwise orthogonal. Conclude that X is bounded in $L^2$ if and only if

$$\sum_{n ≥ 0} E[(X_{n+1} - X_n)^2] < ∞,$$

and that $X_n$ converges in $L^2$ in this case, without using the $L^2$ convergence theorem for martingales.

Exercise 9.2.13 Wald's identity
Let $(X_n, n ≥ 1)$ be a sequence of independent and identically distributed real integrable random variables, which are not a.s. 0. We let $S_n = X_1 + \ldots + X_n$ be the associated random walk, and recall that $(S_n - nE[X_1], n ≥ 0)$ is a martingale. Let T be an $(\mathcal{F}_n)$-stopping time, where $\mathcal{F}_n = σ(X_1, \ldots, X_n)$.
1. Show that
$$E[|S_{T ∧ n} - S_T|] ≤ \sum_{k=n+1}^{∞} E\big[|X_k| 1_{\{T ≥ k\}}\big] ≤ E[|X_1|]\,E[T 1_{\{T ≥ n+1\}}].$$

Deduce that if E[T] < ∞, then $S_{T ∧ n}$ converges to $S_T$ in $L^1$, and consequently that $E[S_T] = E[X_1]E[T]$.

2. Suppose $E[X_1] = 0$ and let $T_a = \inf\{n \ge 0 : S_n > a\}$ for some $a > 0$. Show that $E[T_a] = \infty$.

3. Let now $a < 0 < b$ and $T_{a,b} = \inf\{n \ge 0 : S_n < a \text{ or } S_n > b\}$. Assume that $E[X_1] \ne 0$. By treating separately the cases where $X_1$ is bounded or unbounded, prove that $E[T_{a,b}] < \infty$ and that $E[S_{T_{a,b}}] = E[X_1]\,E[T_{a,b}]$.

4. Assume that $E[X_1] = 0$. Show that $E[T_{a,b}] < \infty$. Hint: again consider separately the cases where $X_1$ is bounded and unbounded. In the bounded case, think about how far $(S_n^2, n \ge 0)$ is from being a martingale.

Exercise 9.2.14 The gambler's ruin Let $0 < K < N$ be integers. Consider a sequence of independent random variables $(X_n, n \ge 1)$ with $P(X_n = 1) = p = 1 - P(X_n = -1)$, where $p \in (0,1/2) \cup (1/2,1)$. Let $S_n = K + X_1 + \dots + X_n$ and define
$$T_0 = \inf\{n \ge 1 : S_n = 0\}\,, \qquad T_N = \inf\{n \ge 1 : S_n = N\}.$$

Show that $T := T_0 \wedge T_N$ is a.s. finite (and in fact has finite expectation). Then show that, letting $q = 1 - p$,
$$M_n = \Big(\frac{q}{p}\Big)^{S_n}\,, \qquad N_n = S_n - (p-q)n\,, \qquad n \ge 0,$$
defines two martingales with respect to the natural filtration of $(S_n, n \ge 1)$. Compute $P(T_0 < T_N)$, $E[S_T]$ and $E[T]$. What happens in this exercise if $p = 1/2$?
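Assuming the walk starts at $S_0 = K$ (as the gambler's-ruin setup suggests), optional stopping applied to $M$ yields a closed formula for the ruin probability, which the sketch below compares with a Monte Carlo estimate.

```python
import random

def ruin_prob_exact(p, K, N):
    # Optional stopping for M_n = (q/p)^{S_n} gives
    # (q/p)^K = P(T_0 < T_N) * 1 + (1 - P(T_0 < T_N)) * (q/p)^N
    r = (1 - p) / p
    return (r**K - r**N) / (1 - r**N)

def ruin_prob_mc(p, K, N, trials, rng):
    hits = 0
    for _ in range(trials):
        s = K
        while 0 < s < N:
            s += 1 if rng.random() < p else -1
        hits += (s == 0)
    return hits / trials

rng = random.Random(2)
p, K, N = 0.45, 5, 10
print(ruin_prob_exact(p, K, N), ruin_prob_mc(p, K, N, 20_000, rng))
```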

Exercise 9.2.15 Azuma-Hoeffding inequality
1. Let $Y$ be a random variable taking values in $[-c,c]$ for some $c > 0$, and such that $E[Y] = 0$. Show that for every $\theta \in \mathbb R$,
$$E[e^{\theta Y}] \le \cosh(\theta c) \le \exp\Big(\frac{\theta^2 c^2}{2}\Big).$$
As a hint, the convexity of $z \mapsto e^{\theta z}$ entails that
$$e^{\theta y} \le \frac{y+c}{2c}\,e^{\theta c} + \frac{c-y}{2c}\,e^{-\theta c}.$$
Also, state and prove a conditional version of this fact.
2. Let $M$ be a martingale with $M_0 = 0$, such that there exists a sequence $(c_n, n \ge 1)$ of positive real numbers with $|M_n - M_{n-1}| \le c_n$ for every $n \ge 1$. Show that for $x \ge 0$,
$$P\Big(\sup_{0 \le k \le n} M_k \ge x\Big) \le \exp\Big(-\frac{x^2}{2\sum_{k=1}^n c_k^2}\Big).$$
As a hint, notice that $(e^{\theta M_n}, n \ge 0)$ is a submartingale, and optimize over $\theta$.
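For simple $\pm 1$ increments (so $c_k = 1$) the bound can be compared with simulation; a sketch:

```python
import math, random

rng = random.Random(3)
n, x, trials = 100, 25, 20_000
exceed = 0
for _ in range(trials):
    m, peak = 0, 0
    for _ in range(n):
        m += 1 if rng.random() < 0.5 else -1
        peak = max(peak, m)
    exceed += peak >= x
emp = exceed / trials
bound = math.exp(-x**2 / (2 * n))  # Azuma-Hoeffding with c_k = 1
print(emp, "<=", bound)
```

The empirical tail probability is a few times smaller than the bound, which is typical: the inequality trades sharpness for generality.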

Exercise 9.2.16 A discrete Girsanov theorem

Let $\Omega$ be the space of real-valued sequences $(\omega_n, n \ge 0)$ such that $\limsup_n \omega_n = +\infty$ and $\liminf_n \omega_n = -\infty$. We say that such sequences oscillate. Let $\mathcal F_n = \sigma\{X_k, 0 \le k \le n\}$, where $X_k(\omega) = \omega_k$ is the $k$-th projection, and $\mathcal F = \mathcal F_\infty$. Show that $p = 1/2$ is the only real in $(0,1)$ for which there exists a probability measure $P_p$ on $(\Omega, \mathcal F)$ making $(X_n, n \ge 0)$ a simple random walk with step distribution

$$P_p(X_1 = 1) = p = 1 - P_p(X_1 = -1).$$

Let $P_{p,n}$ be the unique probability measure on $(\Omega, \mathcal F_n)$ that makes $(X_k, 0 \le k \le n)$ a simple random walk with these step distributions. If $p \in (0,1) \setminus \{1/2\}$, identify the martingale
$$M_n = \frac{dP_{p,n}}{dP_{1/2,n}}.$$

Find a finite stopping time $T$ such that $E_{1/2}[M_T] < 1$.

Exercise 9.2.17 Let $f : [0,1] \to \mathbb R$ be a Lipschitz function, i.e. $|f(x) - f(y)| \le K|x-y|$ for some $K > 0$ and every $x, y$. Let $f_n$ be the function obtained by linear interpolation of the values of $f$ at the dyadic points $k2^{-n}$, $0 \le k \le 2^n$, and let $M_n = f_n'$ (the a.e. derivative of $f_n$).

1. Show that $M_n$ is a martingale in some filtration.
2. Deduce that there exists an integrable function $g : [0,1] \to \mathbb R$ such that $f(x) = f(0) + \int_0^x g(y)\,dy$ for almost every $0 \le x \le 1$.

Exercise 9.2.18 Doob's decomposition of submartingales Let $(X_n, n \ge 0)$ be a submartingale.

1. Show that there exist a unique martingale $(M_n)$ and a unique previsible process $(A_n, n \ge 0)$ such that $A_0 = 0$, $A$ is increasing and $X = M + A$.
2. Show that $M, A$ are bounded in $L^1$ if and only if $X$ is, and that in this case $A_\infty < \infty$ a.s. (and even $E[A_\infty] < \infty$), where $A_\infty$ is the increasing limit of $A_n$ as $n \to \infty$.

Exercise 9.2.19 Let $(X_n, n \ge 0)$ be a U.I. submartingale.
1. Show that if $X = M + A$ is the Doob decomposition of $X$, then $M$ is U.I.
2. Show that for every pair of stopping times $S, T$ with $S \le T$,
$$E[X_T\,|\,\mathcal F_S] \ge X_S.$$

Exercise 9.2.20 Quadratic variation Let $(X_n, n \ge 0)$ be a square-integrable martingale.
1. Show that there exists a unique increasing previsible process starting at 0, which we denote by $(\langle X\rangle_n, n \ge 0)$, such that $(X_n^2 - \langle X\rangle_n, n \ge 0)$ is a martingale.
2. Let $C$ be a bounded previsible process. Compute $\langle C \cdot X\rangle$.
3. Let $T$ be a stopping time; show that $\langle X^T\rangle = \langle X\rangle^T$.
4. (Harder) Show that $\langle X\rangle_\infty < \infty$ implies that $X_n$ converges as $n \to \infty$, almost surely. Is the converse true? Show that it is when $\sup_{n \ge 0} |X_{n+1} - X_n| \le K$ a.s. for some $K > 0$.

9.3 Continuous-time processes

Exercise 9.3.1 Gaussian processes A real-valued process $(X_t, t \ge 0)$ is called a Gaussian process if for every $t_1 < t_2 < \dots < t_k$, the random vector $(X_{t_1}, \dots, X_{t_k})$ is Gaussian. Show that the law of a Gaussian process is uniquely characterized by the numbers $E[X_t]$, $t \ge 0$, and $\mathrm{Cov}(X_s, X_t)$, $s, t \ge 0$.

Exercise 9.3.2 Let $T$ be an exponential random variable with parameter $\lambda > 0$. Define
$$Z_t = \begin{cases} 0 & \text{if } t < T\\ 1 & \text{if } t \ge T\end{cases}\,, \qquad \mathcal F_t = \sigma\{Z_s, 0 \le s \le t\}\,, \qquad M_t = \begin{cases} 1 - e^{\lambda t} & \text{if } t < T\\ 1 & \text{if } t \ge T.\end{cases}$$

Show that $E[|M_t|] < \infty$ for every $t \ge 0$, and that $E[M_t 1_{\{T > r\}}] = E[M_s 1_{\{T > r\}}]$ for every $r \le s \le t$. Deduce that $(M_t, t \ge 0)$ is a càdlàg $(\mathcal F_t)$-martingale.
Is $M$ bounded in $L^1$? Is it uniformly integrable? Is $M_{T-}$ in $L^1$?

Exercise 9.3.3 Hazard function Let $T$ be a random variable in $(0,\infty)$ that admits a strictly positive continuous density $f$ on $(0,\infty)$. Let $F(t) = P(T \le t)$, and let
$$A_t = \int_0^t \frac{f(s)\,ds}{1 - F(s)}\,, \qquad t \ge 0,$$
be the hazard function of $T$. Show that $A_T$ has the law of an exponential random variable with parameter 1. As a hint, consider the distribution function $P(A_T \le t)$, $t \ge 0$, and write it in terms of the inverse function $A^{-1}$.

By letting $Z_t = 1_{\{t \ge T\}}$, $t \ge 0$, and $\mathcal F_t = \sigma\{Z_s, 0 \le s \le t\}$, prove that $(Z_t - A_{T \wedge t}, t \ge 0)$ is a càdlàg martingale with respect to $(\mathcal F_t, t \ge 0)$.

The next exercises are designed to (hopefully) give those of you who want it a better insight into the nature of filtrations and events related to continuous-time processes.

Exercise 9.3.4 Let $\mathcal C_1$ be the product σ-algebra on $\Omega = C([0,1], \mathbb R)$, i.e. the smallest σ-algebra making the maps $X_t : \omega \mapsto \omega(t)$, $t \in [0,1]$, measurable. Let $\mathcal C_2$ be the (more natural?) Borel σ-algebra on $C([0,1], \mathbb R)$ endowed with the uniform norm and the associated topology.

Show that $\mathcal C_1 = \mathcal C_2$.

Exercise 9.3.5 Let $I$ be a nonempty real interval. Let $\Omega = \mathbb R^I$ be the set of all functions defined on $I$, endowed with the product σ-algebra $\mathcal F$, i.e. the smallest σ-algebra with respect to which $X_t : \omega \mapsto \omega(t)$ is measurable for every $t$. Show that
$$\mathcal G = \bigcup_{J \prec I} \sigma(X_s, s \in J)$$
is a σ-algebra, where $J \prec I$ stands for $J \subset I$ and $J$ countable. Deduce that $\mathcal G = \mathcal F$. Show that the set
$$\{\omega \in \Omega : s \mapsto X_s(\omega) \text{ is continuous}\}$$
is not measurable with respect to $\mathcal F$.

9.4 Weak convergence

Exercise 9.4.1 Let (Xn, n ≥ 1) be a sequence of independent random variables with uniform distribution on [0, 1]. Let Mn = max(X1,...,Xn). Show that n(1 − Mn) converges in distribution as n → ∞, and determine the limit law.
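A simulation sketch for checking the answer: the tail $P(n(1-M_n) > x) = (1 - x/n)^n$ is computable by hand, and its limit should match what the code suggests.

```python
import math, random

rng = random.Random(4)
n, trials = 1000, 10_000
samples = [n * (1 - max(rng.random() for _ in range(n))) for _ in range(trials)]
for x in (0.5, 1.0, 2.0):
    emp = sum(1 for s in samples if s > x) / trials
    print(x, emp, math.exp(-x))  # empirical tail vs. e^{-x}
```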

Exercise 9.4.2 Let $(X_n, n \in \mathbb N \cup \{\infty\})$ be random variables defined on some probability space $(\Omega, \mathcal F, P)$, with values in a metric space $(M, d)$.

1. Suppose that Xn → X∞ a.s. as n → ∞. Show that Xn converges to X∞ in distribution.

2. Suppose that Xn converges in probability to X∞. Show that Xn converges in distribution to X∞.

Hint: use the fact that (Xn, n ≥ 0) converges in probability to X∞ if and only if for every subsequence extracted from (Xn, n ≥ 0), there exists a further subsequence converging a.s. to X∞.

3. Show that if $X_n$ converges in distribution to a constant $X_\infty = c$, then $X_n$ converges in probability to $c$.

Exercise 9.4.3 Suppose given sequences $(X_n, n \ge 0)$, $(Y_n, n \ge 0)$ of real-valued random variables, and two further random variables $X, Y$, such that $X_n, Y_n$ converge in distribution to $X, Y$ respectively. Is it true that $(X_n, Y_n)$ converges in distribution to $(X, Y)$? Show that this is true in the following cases:

1. For every $n$, $X_n$ and $Y_n$ are independent, and so are $X$ and $Y$.
2. $Y$ is a.s. constant (Hint: use 3. in the previous exercise).

Exercise 9.4.4 Let $m$ be a probability measure on $\mathbb R$. Define, for every $n \ge 0$,
$$m_n(dx) = \sum_{k \in \mathbb Z} m\big([k2^{-n}, (k+1)2^{-n})\big)\,\delta_{k2^{-n}}(dx),$$
where $\delta_z(dx)$ denotes the Dirac mass at $z$. Show that $m_n$ converges weakly to $m$.
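A sketch with $m = \mathcal N(0,1)$ and the 1-Lipschitz test function $f = \cos$: flooring a sample of $m$ to the grid $2^{-n}\mathbb Z$ produces a sample of $m_n$, and $|m_n(f) - m(f)| \le 2^{-n}$ since the flooring moves each point by at most $2^{-n}$.

```python
import math, random

rng = random.Random(5)
xs = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # samples from m = N(0,1)
f = math.cos
m_f = sum(map(f, xs)) / len(xs)
for n in (0, 2, 6):
    # floor(x * 2^n) / 2^n has law m_n when x has law m
    mn_f = sum(f(math.floor(x * 2**n) / 2**n) for x in xs) / len(xs)
    print(n, abs(mn_f - m_f))  # bounded by 2^{-n}
```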

Exercise 9.4.5 1. Let $(X_n, n \ge 1)$ be independent exponential random variables with mean 1. Define $S_n = X_1 + \dots + X_n$, and determine without computation the limit of $P(S_n \le n)$ as $n \to \infty$ (Hint: which theorem could be useful here?).
2. Determine, also without computation, the limit of $e^{-n}\sum_{k=0}^{n} n^k/k!$.
Hint: recall that the Poisson law with parameter $\lambda > 0$ is the probability distribution on $\mathbb Z_+$ that puts mass $e^{-\lambda}\lambda^n/n!$ on the integer $n$. Moreover, if $X, Y$ are independent random variables with Poisson laws of parameters $\lambda, \mu$, then $X + Y$ has the Poisson law with parameter $\lambda + \mu$. Use this to make the formula look like question 1.
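Question 2 can be checked numerically; a sketch, computing the sum in log space to avoid the underflow of $e^{-n}$ for large $n$:

```python
import math

def partial_poisson_sum(n):
    # e^{-n} * sum_{k=0}^{n} n^k / k!, with each term computed via logs
    return sum(math.exp(k * math.log(n) - n - math.lgamma(k + 1))
               for k in range(n + 1))

for n in (10, 100, 1000, 10000):
    print(n, partial_poisson_sum(n))  # slowly approaches 1/2
```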

Exercise 9.4.6 Let $(Y_n, n \ge 0)$ be a sequence of random variables such that $Y_n$ has the Gaussian $\mathcal N(m_n, \sigma_n^2)$ law, and suppose that $Y_n$ converges weakly to some $Y$ as $n \to \infty$. Show that there exist $m \in \mathbb R$ and $\sigma^2 \ge 0$ such that $m_n \to m$ and $\sigma_n^2 \to \sigma^2$, and that $Y$ is Gaussian $\mathcal N(m, \sigma^2)$. Hint: use characteristic functions, and first show that the variances converge.

Exercise 9.4.7 Let $d \ge 1$.
1. Show that a finite family of probability measures on $\mathbb R^d$ is tight.
2. Assuming Prokhorov's theorem for probability measures on $\mathbb R^d$, show that if $(\mu_n, n \ge 0)$ is a sequence of non-negative measures on $\mathbb R^d$ which is tight (for every $\varepsilon > 0$ there is a compact $K \subset \mathbb R^d$ such that $\sup_{n \ge 0} \mu_n(\mathbb R^d \setminus K) < \varepsilon$) and satisfies
$$\sup_{n \ge 0} \mu_n(\mathbb R^d) < \infty,$$
then there exists a subsequence $(n_k)$ along which $\mu_{n_k}$ converges weakly to a limit $\mu$ (i.e. $\mu_{n_k}(f)$ converges to $\mu(f)$ for every bounded continuous $f$).

9.5 Brownian motion

Exercise 9.5.1 Recall that a Gaussian process $(X_t, t \ge 0)$ in $\mathbb R^d$ is a process such that for every $t_1 < t_2 < \dots < t_k \in \mathbb R_+$, the vector $(X_{t_1}, \dots, X_{t_k})$ is Gaussian. Show that the (standard) Brownian motion in $\mathbb R^d$ is the unique Gaussian process $(B_t, t \ge 0)$ with $E[B_t] = 0$ for every $t \ge 0$ and $\mathrm{Cov}(B_s, B_t) = (s \wedge t)\,\mathrm{Id}$ for every $s, t \ge 0$.

Exercise 9.5.2 Let $B$ be a standard real-valued Brownian motion.
1. Show that a.s.,
$$\limsup_{t \downarrow 0} \frac{B_t}{\sqrt t} = \infty\,, \qquad \liminf_{t \downarrow 0} \frac{B_t}{\sqrt t} = -\infty.$$

2. Show that $B_n/n \to 0$ a.s. as $n \to \infty$. Then show that a.s., for $n$ large enough, $\sup_{t \in [n,n+1]} |B_t - B_n| \le \sqrt n$, and conclude that $B_t/t \to 0$ a.s. as $t \to \infty$.

3. Show that the process
$$B'_t = \begin{cases} tB_{1/t} & \text{if } t > 0\\ 0 & \text{if } t = 0\end{cases}\,, \qquad t \ge 0,$$
is a standard Brownian motion (Hint: use Exercise 9.5.1).
4. Use this to show that
$$\limsup_{t \to \infty} \frac{B_t}{\sqrt t} = \infty\,, \qquad \liminf_{t \to \infty} \frac{B_t}{\sqrt t} = -\infty.$$

Exercise 9.5.3 Around hitting times Let $(B_t, t \ge 0)$ be a standard real-valued Brownian motion.
1. Let $T_x = \inf\{t \ge 0 : B_t = x\}$ for $x \in \mathbb R$. Prove that $T_x$ has the same distribution as $(x/B_1)^2$, and compute its probability distribution function.
2. For $x, y > 0$, show that
$$P(T_{-y} < T_x) = \frac{x}{x+y}\,, \qquad E[T_x \wedge T_{-y}] = xy.$$

3. Show that if 0 < x < y, the random variable Ty − Tx has same law as Ty−x, and is independent of FTx (where (Ft, t ≥ 0) is the natural filtration of B). Hint: the three questions are independent.

Exercise 9.5.4 Let (Bt, t ≥ 0) be a standard real-valued Brownian motion. Compute the joint distribution of (Bt, sup0≤s≤t Bs) for t ≥ 0.

Exercise 9.5.5 Let $(B_t, t \ge 0)$ be a standard Brownian motion, and let $0 \le a < b$.
1. Compute the mean and variance of
$$X_n := \sum_{k=1}^{2^n} \big(B_{a + k(b-a)2^{-n}} - B_{a + (k-1)(b-a)2^{-n}}\big)^2.$$

2. Show that $X_n$ converges a.s. and give its limit.
3. Deduce that a.s. there exists no interval $[a,b]$ with $a < b$ such that $B$ is Hölder continuous with exponent $\alpha > 1/2$ on $[a,b]$, i.e. such that $\sup_{a \le s < t \le b} |B_t - B_s|/|t-s|^\alpha < \infty$.
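The concentration of $X_n$ around $b-a$ is easy to see numerically. The sketch below samples fresh independent Gaussian increments at each level $n$ (the a.s. statement of the exercise concerns refinements of a single path, which would require a shared binary tree of increments).

```python
import math, random

rng = random.Random(6)
a, b = 0.0, 1.0

def dyadic_quad_var(n):
    # sum of squared Gaussian increments over 2^n equal subintervals of [a, b]
    dt = (b - a) / 2**n
    return sum(rng.gauss(0.0, math.sqrt(dt))**2 for _ in range(2**n))

for n in (4, 8, 12):
    print(n, dyadic_quad_var(n))  # mean b - a = 1, variance 2(b-a)^2 / 2^n
```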

Exercise 9.5.6 Let (Bt, t ≥ 0) be a standard Brownian motion. Define G1 = sup{t ≤ 1 : Bt = 0} and D1 = inf{t ≥ 1 : Bt = 0}.

1. Are these random variables stopping times? Show that $G_1$ has the same distribution as $D_1^{-1}$.

2. By applying the Markov property at time 1, compute the law of $D_1$. Deduce that of $G_1$ (it is called the arcsine law).
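The arcsine law can be previewed with a random-walk sketch: the last zero of a length-$N$ simple random walk, rescaled by $N$, approximately follows the law with distribution function $(2/\pi)\arcsin\sqrt t$.

```python
import math, random

rng = random.Random(7)
N, trials = 1000, 5000
samples = []
for _ in range(trials):
    s, last = 0, 0
    for k in range(1, N + 1):
        s += 1 if rng.random() < 0.5 else -1
        if s == 0:
            last = k          # most recent return to 0
    samples.append(last / N)  # rescaled last zero before "time 1"
for t in (0.25, 0.5, 0.75):
    emp = sum(1 for u in samples if u <= t) / trials
    print(t, emp, 2 / math.pi * math.asin(math.sqrt(t)))
```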

Exercise 9.5.7 Let (Bt, t ≥ 0) be a standard Brownian motion, and let (Ft, t ≥ 0) be its natural filtration. Determine all the polynomials f(t, x) of degree less than or equal to 3 in x such that (f(t, Bt), t ≥ 0) is a martingale.

Exercise 9.5.8 Let $(B_t, t \ge 0)$ be a standard Brownian motion in $\mathbb R^3$. We let $R_t = 1/|B_t|$.
1. Show that $(R_t, t \ge 1)$ is bounded in $L^2$.

2. Show that E[Rt] → 0 as t → ∞.

3. Show that $(R_t, t \ge 1)$ is a supermartingale. Deduce that $|B_t| \to \infty$ as $t \to \infty$, a.s.

Exercise 9.5.9 Zeros of Brownian motion Let (Bt, t ≥ 0) be a standard real-valued Brownian motion. Let Z = {t ≥ 0 : Bt = 0} be the set of zeros of B. 1. Show that it is closed, unbounded and has zero Lebesgue measure a.s.

2. By using the stopping times Dq = inf{t ≥ q : Bt = 0} for q ∈ Q+, show that Z has no isolated point a.s.

Exercise 9.5.10 Let $W_0(dw)$ denote Wiener's measure on $\Omega_0 = \{w \in C([0,1]) : w(0) = 0\}$, and define a new probability measure $W_0^{(a)}$ on $\Omega_0$ by
$$\frac{dW_0^{(a)}}{dW_0}(w) = \exp(aw(1) - a^2/2).$$

1. Show that under $W_0^{(a)}$, the canonical process $X_t : w \mapsto w(t)$ remains Gaussian, and give its distribution.

2. Show that $W_0(\{f \in \Omega_0 : \|f\|_\infty < \varepsilon\}) > 0$ for every $\varepsilon > 0$, where $\|f\|_\infty = \sup_{0 \le t \le 1} |f(t)|$.

3. Show that for every non-empty open set $U \subset \Omega_0$, one has $W_0(U) > 0$. Hint: first note that any such $U$ contains the $\varepsilon$-neighborhood of some piecewise linear function $f$, for some $\varepsilon > 0$.

Exercise 9.5.11 Brownian bridge Let $(B_t, 0 \le t \le 1)$ be a standard Brownian motion. For $y \in \mathbb R$, let $Z_t^y = yt + (B_t - tB_1)$, $0 \le t \le 1$, and call $(Z_t^y, 0 \le t \le 1)$ the Brownian bridge from 0 to $y$. Let $W_0^y$ be its law on $C([0,1])$. Show that for any non-negative measurable function $F : C([0,1]) \to \mathbb R_+$, setting $f(y) = W_0^y(F)$, we have
$$E[F(B)\,|\,B_1] = f(B_1)\,, \quad \text{a.s.}$$

Hint: find a simple argument showing that $B_1$ is independent of the process $(B_t - tB_1, 0 \le t \le 1)$.
Explain why we can interpret $W_0^y$ as the law of a Brownian motion 'conditioned to hit $y$ at time 1'.

Exercise 9.5.12 Show that the Dirichlet problem on $D = B(0,1) \setminus \{0\}$ in $\mathbb R^d$, with boundary conditions $g(x) = 0$ for $|x| = 1$ and $g(x) = 1$ for $x = 0$, has no solution for $d \ge 2$.

Exercise 9.5.13 Dirichlet problem in the upper half-plane Let $H = \{(x,y) \in \mathbb R^2 : y > 0\}$. Let $(B_t, t \ge 0)$ be a Brownian motion started from $x$ under the probability measure $P_x$, and let $T = \inf\{t \ge 0 : B_t \notin H\}$.
1. Determine the law of $B_T$ under $P_x$ for $x \in H$.
2. Show that if $u$ is a bounded continuous function on $\bar H$ which is harmonic on $H$, then
$$u(x,y) = \frac{1}{\pi}\int_{\mathbb R} \frac{y}{(x-z)^2 + y^2}\,u(z,0)\,dz.$$

9.6 Poisson measures, ID laws and Lévy processes

Exercise 9.6.1 Prove that the Poisson law with parameter $\lambda > 0$ is the weak limit of the binomial law with parameters $(n, \lambda/n)$ as $n \to \infty$. A factory makes 500,000 light bulbs in a day. On average, 4 of these are defective. Estimate the probability that on some given day, 2 of the produced light bulbs were defective.
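Reading the question as asking for exactly 2 defective bulbs, the binomial model is $\mathrm{Bin}(500000,\,4/500000)$ and the Poisson(4) estimate is $e^{-4}4^2/2! \approx 0.147$; a sketch comparing the two:

```python
import math

n, lam = 500_000, 4.0
p = lam / n
binom = math.comb(n, 2) * p**2 * (1 - p)**(n - 2)   # exact binomial value
poisson = math.exp(-lam) * lam**2 / 2               # Poisson(4) estimate of P(X = 2)
print(binom, poisson)  # both about 0.1465
```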

Exercise 9.6.2 The bus paradox Why do we always feel we wait a very long time for buses to arrive? This exercise gives an indication of why... at least if buses arrive according to a Poisson process.
1. Suppose buses have been circulating in the city day and night forever, the counterpart being that drivers do not follow a timetable. Rather, the times of arrival of buses at a given bus stop are the atoms of a Poisson measure on $\mathbb R$ with intensity $\theta\,dt$, where $dt$ is Lebesgue measure on $\mathbb R$. A customer arrives at a fixed time $t$ at the bus stop. Let $S, T$ be the two consecutive atoms of the Poisson measure satisfying $S < t < T$. Show that the average time $E[T - S]$ that elapses between the arrival of the last bus before time $t$ and that of the first bus after time $t$ is $2/\theta$. Explain why this is twice the average time between consecutive buses. Can you see why this is so?
2. Suppose now that buses start circulating at time 0, so that arrivals of buses at the stop are the jump times of a Poisson process with intensity $\theta$ on $\mathbb R_+$. If the customer arrives at time $t$, show that the average elapsed time between the bus before his arrival (at time $S$) and the one after it (at time $T$) is $\theta^{-1}(2 - e^{-\theta t})$ (with the convention $S = 0$ if no atom falls in $[0,t]$).
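A simulation sketch for part 1 (the stationary case). The window $(-50, \infty)$ stands in for the whole line; at rate $\theta = 1$ the chance that no atom falls before the customer's arrival at $t = 0$ is negligible.

```python
import random

rng = random.Random(8)
theta, trials = 1.0, 20_000
total = 0.0
for _ in range(trials):
    s, prev = -50.0, -50.0
    while s <= 0.0:                    # customer arrives at time t = 0
        prev = s                       # prev ends as the last atom S before t
        s += rng.expovariate(theta)    # s ends as the first atom T after t
    total += s - prev
print(total / trials)  # close to 2/theta = 2, twice the mean inter-bus gap
```

The length-biased interval straddling $t$ is, in the limit, the sum of two independent Exp($\theta$) variables, which is the heuristic behind the factor 2.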

Exercise 9.6.3 Prove Proposition 7.3.2.

Exercise 9.6.4 Check the marking property of Poisson random measures: if $M(dx) = \sum_{i \in I} \delta_{x_i}(dx)$ is a Poisson random measure on $(E, \mathcal E)$ with intensity $\mu$, and if $(y_i, i \in I)$ are i.i.d. random variables with law $\nu$ on some measurable space $(F, \mathcal F)$, independent of $M$, then $\sum_{i \in I} \delta_{(x_i, y_i)}(dx\,dy)$ is a Poisson random measure on $E \times F$ with intensity $\mu \otimes \nu$.

Exercise 9.6.5 Brownian motion and the Cauchy process Let $(B_t = (B_t^1, B_t^2), t \ge 0)$ be a standard Brownian motion in $\mathbb R^2$ (i.e. $B_0 = 0$). Recall that the Cauchy law with parameter $a > 0$ has density $a/(\pi(a^2 + x^2))$, $x \in \mathbb R$. We let
$$C_a = \inf\{t \ge 0 : B_t^2 = -a\}\,, \qquad a \ge 0.$$
Prove that the process $(B^1_{C_a}, a \ge 0)$ is a Lévy process such that $B^1_{C_a}$ has the Cauchy law with parameter $a$ for every $a > 0$. Does this remind you of a previous exercise?

Index

Blumenthal's 0-1 law, 47
Branching process, 23
Brownian motion, 45
  (Ft)-Brownian motion, 48
  finite marginal distributions, 45
  standard, 45
càdlàg, 29
Central limit theorem, 40
Characteristic exponent, 70
Compound Poisson distribution, 68
Compound Poisson process, 67
Conditional convergence theorems, 9
Conditional density functions, 11
Conditional expectation
  discrete case, 5
  for L1 random variables, 6
  for non-negative random variables, 8
Conditional Jensen inequality, 9
Convergence in distribution, 39
Dirichlet problem, 55
Donsker's invariance principle, 59
Doob's Lp inequality, 19, 34
Doob's maximal inequality, 18
Doob's upcrossing lemma, 17
Exterior cone condition, 55
Filtration, 13
  filtered space, 13
  natural filtration, 13
Finite marginal distributions, 31
First entrance time, 14, 30
First hitting times for Brownian motion, 50, 51
Harmonic function, 56
Infinitely divisible distribution, 69
Intensity measure, 63
Kakutani's product-martingales theorem, 26
Kolmogorov's 0-1 law, 23
Kolmogorov's continuity criterion, 35
Laplace functional, 65
Last exit time, 14
Law of large numbers, 23
Lévy process, 66, 71
Lévy triple, 69
Lévy's convergence theorem, 41
Lévy-Itô theorem, 73
Lévy-Khintchine formula, 70
Likelihood ratio test, 28
Martingale, 14
  backwards, 21
  closed, 19
  complex-valued, 51
  convergence theorem
    almost-sure, 17, 34
    for backwards martingales, 21
    in L1, 20, 34
    in Lp, p > 1, 19, 34
  regularization, 32
  uniformly integrable, 20
Martingale transform, 15
Optional stopping
  for discrete-time martingales, 16
  for uniformly integrable martingales, 20, 34
Poisson point process, 66
Poisson random measure, 63
Prokhorov's theorem, 40
Radon-Nikodym theorem, 25
Recurrence, 53
Reflection principle, 49
Scaling property of Brownian motion, 47
Separable σ-algebra, 25
Simple Markov property
  for Brownian motion, 47
  for Lévy processes, 66
Skorokhod's embedding, 60
Stochastic process, 13
  adapted, 13
  continuous-time, 29
  discrete-time, 15
  integrable, 13
  previsible, 15
  stopped process, 15
Stopping time
  definition, 14
  measurable events before T, 15
Strong Markov property, 49
Submartingale, 14
Supermartingale, 14
Taking out 'what is known', 10
Tightness, 40
Tower property, 10
Transience, 53
Versions of a process, 31
Weak convergence, 37
Wiener space, 46
Wiener's measure, 46, 48
Wiener's theorem, 45