Stochastic Processes in Discrete Time - Stochastik II

Lecture notes

Prof. Dirk Becherer

Institut für Mathematik, Humboldt-Universität zu Berlin

February 12, 2018

Contents

1 Conditional expectations
  1.1 The Hilbert space L² and orthogonal projections
  1.2 Conditional expectations: Definition and construction
  1.3 Properties of conditional expectations
  1.4 Regular conditional probability distributions
  1.5 Disintegration and semi-direct products

2 Martingales
  2.1 Definition and interpretation
  2.2 Martingale convergence theorems for a.s. convergence
  2.3 Uniform integrability and martingale convergence in L¹
  2.4 Martingale inequalities
  2.5 The Radon-Nikodym theorem and differentiation of measures
  2.6 A brief excursion on Markov chains
    2.6.1 First-entry times of Markov chains
    2.6.2 Further properties of Markov chains
    2.6.3 Branching process

3 Construction of stochastic processes
  3.1 General construction of stochastic processes
  3.2 Gaussian processes
  3.3 Brownian motion

4 Weak convergence and tightness of probability measures
  4.1 Weak convergence
  4.2 Tightness and Prohorov's theorem
  4.3 Weak convergence on S = C[0,1]

5 Donsker's invariance principle

Chapter 1

Conditional expectations

To define conditional expectation, we will need the following theory.

1.1 The Hilbert space L2 and orthogonal projections

Let $(\Omega, \mathcal{F}, P)$ be a probability space. $L^2 = L^2(\Omega, \mathcal{F}, P)$ is the space of (equivalence classes of) square-integrable real-valued random variables with the scalar product $\langle X, Y\rangle := E[XY]$, which induces the norm $\|X\| := \sqrt{\langle X, X\rangle} = \sqrt{E[X^2]}$. From measure and integration theory we know that $L^2$ is a complete normed space (hence a Banach space) with the scalar product $\langle\cdot,\cdot\rangle$, i.e. it is even a Hilbert space.

Theorem 1.1. Let $K$ be a closed subspace of $L^2$. Then for any $Y \in L^2$ there exists an element $\hat{Y} \in K$ such that

(i) $\|Y - \hat{Y}\| = d(Y, K) := \inf\{\|Y - X\| \mid X \in K\}$,

(ii) $Y - \hat{Y} \perp X$ for all $X \in K$, i.e. $Y - \hat{Y} \in K^{\perp}$.

Properties (i) and (ii) are equivalent, and $\hat{Y}$ is uniquely determined by (i) or (ii).

Remark 1.1. The result holds as well for general Hilbert spaces: in the proof we only use the inner-product structure, not the specific form of this inner product.

Definition 1.2 ("best forecast in the $L^2$ sense"). $\hat{Y}$ from Theorem 1.1 is called the orthogonal projection of $Y$ onto $K$ (in the Hilbert space $L^2$).

1.2 Conditional expectations: Definition and construction

Let us fix a probability space $(\Omega, \mathcal{F}, P)$. Conditional expectations will be essential for the definition of martingales. A conditional expectation can be seen as the best approximation, in a mean-squared-error sense, of a real-valued random variable given some partial information (modelled as a sub-σ-field $\mathcal{G}$ of $\mathcal{F}$). More precisely and more generally, we define

Definition 1.3 (Conditional expectation). Let $\mathcal{G}$ be a sub-σ-field of $\mathcal{F}$. Let also $X$ be a random variable with $X \ge 0$ or $X \in L^1$. Then a random variable $Y$ with the properties

(i) $Y$ is $\mathcal{G}$-measurable,

(ii) $E[Y 1_G] = E[X 1_G]$ for each $G \in \mathcal{G}$,

is called (a version of) the conditional expectation of $X$ with respect to $\mathcal{G}$. We write $Y = E[X \mid \mathcal{G}]$.

Remark 1.2. • Intuitively, (i) means "based on the information $\mathcal{G}$", while (ii) means "best forecast".

• Using (i) and (ii) only, it is straightforward to check that

$X \ge 0 \;\Longrightarrow\; Y \ge 0$,

$X$ integrable $\;\Longrightarrow\;$ $Y$ integrable.

(By (i), note that $\{Y \ge 0\}, \{Y < 0\} \in \mathcal{G}$; use (ii) in Definition 1.3.)

Exercise 1.1. Let Y be a G-measurable integrable random variable for a σ-field G. Then the following are equivalent

(a) $Y$ satisfies (ii) in Definition 1.3 for every $G \in \mathcal{G}$, i.e.

$E[X 1_G] = E[Y 1_G] \quad \forall G \in \mathcal{G}$;

(b) $Y$ satisfies

$E[X 1_G] = E[Y 1_G] \quad \forall G \in \mathcal{E}$

for a class $\mathcal{E} \subseteq 2^{\Omega}$ with $\Omega \in \mathcal{E}$, $\mathcal{G} = \sigma(\mathcal{E})$ and $\mathcal{E}$ closed under intersections;

(c) $Y$ satisfies $E[XZ] = E[YZ]$ for all $\mathcal{G}$-measurable, bounded (and, optionally, non-negative) random variables $Z$.

Sidenote: An analogous statement holds with $Y$ being non-negative instead of integrable, if one requires $Z \ge 0$ (instead of boundedness) in part (c).

Theorem 1.4. Let $X$ be a random variable with $X \ge 0$ (or $X \in L^1$) and let $\mathcal{G}$ be a sub-σ-field of $\mathcal{F}$. Then:

(1) The conditional expectation $E[X \mid \mathcal{G}]$ exists and satisfies $E[X \mid \mathcal{G}] \ge 0$ (or $E[X \mid \mathcal{G}] \in L^1$, respectively).

(2) The conditional expectation is $P$-a.s. unique, i.e. if $Y$ and $Y'$ are both versions of the conditional expectation, then $Y = Y'$ $P$-a.s.

Remark 1.3. In part (b) of the proof above we constructed $E[X \mid \mathcal{G}]$ for $X \in L^2$ as the orthogonal projection onto the subspace $L^2_{\mathcal{G}}$, i.e. the conditional expectation minimizes $E[(X - Y)^2] = \|X - Y\|^2_{L^2}$ over all $Y \in L^2_{\mathcal{G}}$, which justifies describing the conditional expectation as "the best" projection (for $X \in L^2$).

In simple cases conditional expectations can be constructed explicitly.

Example 1.1. Let $\mathcal{G}$ be a σ-field that is generated by a finite number of sets, i.e. $\mathcal{G} = \sigma(A_1, \dots, A_N)$ for finitely many sets $A_i \in \mathcal{F}$. For such a $\mathcal{G}$ we can find a disjoint partition $\Omega = \biguplus_{i=1}^{n} B_i$ with $B_i \cap B_j = \emptyset$ for $i \ne j$ and $\mathcal{G} = \sigma(B_1, \dots, B_n)$ ($\uplus$ = disjoint union). Then every $\mathcal{G}$-measurable real-valued random variable $Y$ has the form

$Y = \sum_{i=1}^{n} y_i 1_{B_i}, \qquad y_i \in \mathbb{R}$

(Exercise: "$\Leftarrow$" is clear, "$\Rightarrow$" follows by contradiction). Let $X \in L^1$ or $X \ge 0$. Then we have the following explicit representation for the conditional expectation of $X$ given $\mathcal{G}$:

$E[X \mid \mathcal{G}] = \sum_{i=1}^{n} \frac{E[X 1_{B_i}]}{P[B_i]}\, 1_{B_i} \qquad$ (with the convention $0/0 = 0$).

There is an analogous formula if $\mathcal{G}$ is generated by a countable (possibly infinite) partition. For example, if $X = f(U)$ with $U$ uniformly distributed on $[0, 1]$ and $(B_k)$ some partition of $[0, 1]$, then the formula above shows that $E[X \mid \mathcal{G}]$ is piecewise constant, with value on each interval $B_k$ equal to the mean of $f(U)$ on that interval.

Intuitive interpretation of conditional expectations via conditional probabilities: for $P[B_i] > 0$ the elementary conditional probability is defined by $P[\,\cdot \mid B_i] := P[\,\cdot \cap B_i] / P[B_i]$, and the conditional expectation of $X$ given $B_i$ can be defined by

$E[X \mid B_i] = \int X \, dP[\,\cdot \mid B_i] = \frac{E[X 1_{B_i}]}{P[B_i]},$

such that $E[X \mid \mathcal{G}] = \sum_{i=1}^{n} E[X \mid B_i] \cdot 1_{B_i}$.

Remark 1.4. A similar formula holds when $\mathcal{G}$ is generated by a countably infinite partition. Check that (i) and (ii) in Definition 1.3 are fulfilled (exercise).
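The explicit formula above translates directly into a small numerical experiment. The following sketch is illustrative only (the choice of $f$, the uniform $U$, and the equally long cells $B_i$ are assumptions of the example, not part of the notes): it estimates $E[X \mid \mathcal{G}]$ for $X = f(U)$ by averaging $X$ over each cell of a finite partition, exactly as in the representation $\sum_i \frac{E[X 1_{B_i}]}{P[B_i]} 1_{B_i}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_exp_finite_partition(x, labels, n_cells):
    """Monte Carlo version of E[X | G] when G = sigma(B_1, ..., B_n):
    on each cell B_i the value is E[X 1_{B_i}] / P[B_i] (convention 0/0 = 0)."""
    values = np.zeros(n_cells)
    for i in range(n_cells):
        in_cell = labels == i
        if in_cell.any():
            values[i] = x[in_cell].mean()
    return values

f = lambda u: np.sin(2 * np.pi * u) + u**2        # illustrative integrable f
n_cells, n_samples = 8, 200_000
u = rng.uniform(size=n_samples)                   # U uniform on [0, 1]
labels = np.minimum((u * n_cells).astype(int), n_cells - 1)
values = cond_exp_finite_partition(f(u), labels, n_cells)

print("E[X | B_i] per cell:", np.round(values, 3))
# Since all cells here have equal probability 1/n, E[E[X|G]] reduces to the plain mean:
print("E[E[X | G]] vs E[X]:", values.mean().round(4), f(u).mean().round(4))
```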

Remark 1.5. If $\mathcal{G} = \sigma(Z)$ for a random variable $Z$, we define $E[X \mid Z] := E[X \mid \sigma(Z)]$. Then $E[X \mid Z]$ is of the form $h(Z)$ for a measurable function $h$, since it is $\sigma(Z)$-measurable. We often write $h(z) = E[X \mid Z = z]$; more precise would be $h(z) = E[X \mid Z]\big|_{Z = z}$.

1.3 Properties of conditional expectations

Throughout this section, we fix a probability space $(\Omega, \mathcal{F}, P)$ and a sub-σ-field $\mathcal{G} \subseteq \mathcal{F}$.

Theorem 1.5. Let $X \ge 0$ or $X \in L^1$. Then

(1) $E[E[X \mid \mathcal{G}]] = E[X]$;

(2) $X$ $\mathcal{G}$-measurable $\implies E[X \mid \mathcal{G}] = X$ a.s.;

(3) Linearity: $X_1, X_2 \in L^1$, $a, b \in \mathbb{R}$ $\implies E[aX_1 + bX_2 \mid \mathcal{G}] = a E[X_1 \mid \mathcal{G}] + b E[X_2 \mid \mathcal{G}]$;

(4) Monotonicity: $X_1, X_2 \ge 0$ or in $L^1$ with $X_1 \le X_2$ a.s. $\implies E[X_1 \mid \mathcal{G}] \le E[X_2 \mid \mathcal{G}]$ a.s.;

(5) Projectivity, "tower property": for σ-fields $\mathcal{H} \subseteq \mathcal{G} \subseteq \mathcal{F}$,

$E[E[X \mid \mathcal{G}] \mid \mathcal{H}] = E[X \mid \mathcal{H}]$ a.s.;

(6) "Taking out what is known": if $Z$ is $\mathcal{G}$-measurable and $Z \cdot X$ and $X$ are $\ge 0$ or in $L^1$, then

$E[ZX \mid \mathcal{G}] = Z \, E[X \mid \mathcal{G}]$ a.s.;

(7) If $X$ is independent of $\mathcal{G}$, then $E[X \mid \mathcal{G}] = E[X]$ a.s.

Example 1.2 (Application of conditional expectations: martingales). Let $(\mathcal{F}_k)_{k \in \mathbb{N}_0}$ with $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}$ be an increasing family of σ-fields. Such a sequence $(\mathcal{F}_k)$ is called a filtration. A process $(M_k)_{k \in \mathbb{N}_0}$ with the properties

(1) $M_k \in L^1$ for every $k \in \mathbb{N}_0$,

(2) every $M_k$, $k \in \mathbb{N}_0$, is $\mathcal{F}_k$-measurable (we say: $(M_k)$ is adapted to $(\mathcal{F}_k)$),

(3) $E[M_k \mid \mathcal{F}_l] = M_l$ a.s. for all $l \le k$,

is called a martingale.

(a) Random walk: let $Y_1, \dots, Y_n$ be i.i.d. random variables with $Y_k \in L^1$, $E[Y_i] = 0$. Then $M_k := \sum_{i=1}^{k} Y_i$ is a martingale with respect to the filtration $\mathcal{F}_k = \sigma(Y_i : i \le k)$ (Exercise).

(b) Let $X \in L^1$ and let $(\mathcal{F}_k)_{k \in \mathbb{N}_0}$ be a filtration. Then $M_k := E[X \mid \mathcal{F}_k]$, $k \in \mathbb{N}_0$, is a martingale.

(c) "Multiplicative martingale": let $Y_1, Y_2, \dots$ be i.i.d. random variables that are bounded and satisfy $Y_i \ne 0$ for all $i$. Let $\mathcal{F}_k = \sigma(Y_i : i \le k)$. Then $M_k := \prod_{i=1}^{k} Y_i$, $k = 1, 2, \dots$, is a martingale if $E[Y_i] = 1$ (with $M_0 = 1$).

Theorem 1.6 (Limiting theorems and inequalities for conditional expectations).

(1) Monotone convergence: let $0 \le X_n \nearrow X$ a.s. Then

$E[X_n \mid \mathcal{G}] \nearrow E[X \mid \mathcal{G}]$ a.s.

(2) Fatou's lemma: let $X_n \ge 0$ for all $n$. Then

$E[\liminf_{n \to \infty} X_n \mid \mathcal{G}] \le \liminf_{n \to \infty} E[X_n \mid \mathcal{G}]$ a.s.

(3) Dominated convergence (theorem of Lebesgue): if $X_n \to X$ a.s. and there exists $Z \in L^1$ such that $|X_n| \le Z$, then

$E[X_n \mid \mathcal{G}] \to E[X \mid \mathcal{G}]$ a.s.

(4) Jensen's inequality: let $X \in L^1$ and $u : \mathbb{R} \to \mathbb{R}$ be convex with $u(X) \in L^1$. Then

$E[u(X) \mid \mathcal{G}] \ge u(E[X \mid \mathcal{G}])$ a.s.

A simple corollary of Jensen's inequality gives us

Corollary 1.7. For $p \ge 1$, the conditional expectation maps $L^p$ into itself, i.e. for $X \in L^p$ we have $E[X \mid \mathcal{G}] \in L^p$, and

$\|E[X \mid \mathcal{G}]\|_p \le \|X\|_p.$

Example 1.3. (1) Conditional expectation when a joint density exists: let $X$ and $Y$ be real-valued random variables with joint density $f(x, y)$. Let $g$ be a bounded (or non-negative) measurable function. Then

$E[g(X) \mid Y] = E[g(X) \mid \sigma(Y)] = h(Y)$

for

$h(y) := \frac{\int g(x) f(x, y)\, dx}{\int f(x, y)\, dx} = \int g(x)\, \frac{f(x, y)}{\int f(u, y)\, du}\, dx = \int g(x)\, \underbrace{f_{X \mid Y}(x \mid y)}_{\text{cond. density}}\, dx,$

where the right-hand side is defined to be finite, i.e. $h = 0$ when the denominator in the formula is $0$. (Exercise)
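For a concrete check of this formula, $h(y)$ can be evaluated by numerical integration. The sketch below is only an illustration (the bivariate normal density, the correlation $\rho$, the grid, and $g(x) = x$ are all assumptions made for the example); for this particular choice the closed form $E[X \mid Y = y] = \rho y$ is available for comparison.

```python
import numpy as np

rho = 0.6                                     # illustrative correlation
xs = np.linspace(-8.0, 8.0, 4001)             # integration grid in x

def f_joint(x, y):
    """Joint density f(x, y) of a standard bivariate normal with correlation rho."""
    c = 1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho**2))
    q = (x**2 - 2.0 * rho * x * y + y**2) / (2.0 * (1.0 - rho**2))
    return c * np.exp(-q)

def h(y, g=lambda x: x):
    """h(y) = (integral of g(x) f(x,y) dx) / (integral of f(x,y) dx); h := 0 if the denominator is 0."""
    fy = f_joint(xs, y)
    denom = np.trapz(fy, xs)
    return np.trapz(g(xs) * fy, xs) / denom if denom > 0 else 0.0

for y in (-1.0, 0.0, 2.0):
    print(f"y = {y:+.1f}:  h(y) = {h(y):+.4f}   (closed form rho*y = {rho * y:+.4f})")
```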

(2) Bayes' formula: Let $(\Omega, \mathcal{F}, P)$ be a probability space and consider the probability measure $Q$ with density $Z$ with respect to $P$, i.e. $Q[A] = E_P[Z 1_A]$ for every $A \in \mathcal{F}$ (in particular $E[Z] = 1$ and $Z \ge 0$ $P$-a.s.). Then for any random variable $X \ge 0$ (or in $L^1$) and sub-σ-field $\mathcal{G} \subseteq \mathcal{F}$ we have

$E_Q[X \mid \mathcal{G}] = \frac{E_P[X \cdot Z \mid \mathcal{G}]}{Z_{\mathcal{G}}},$

where $Z_{\mathcal{G}} := E_P[Z \mid \mathcal{G}]$. Note that the right-hand side is defined $Q$-a.s.; on the $Q$-nullset where the (version of the) denominator is $0$, one could set the conditional expectation to $0$.

Remark: $Z_{\mathcal{G}}$ is the density of $Q$ with respect to $P$ on $\mathcal{G}$, in the sense that $Q[G] = E_P[Z_{\mathcal{G}} 1_G]$ for every $G \in \mathcal{G}$.

(3) Wald's identities:

Let $X_1, X_2, \dots$ be i.i.d. square-integrable random variables and let $N$ be an $\mathbb{N}$-valued random variable that is independent of $(X_i)_{i \in \mathbb{N}}$. Consider the random variable

$S := \sum_{i=1}^{N} X_i$ (interpretation: "total claim size of an insurance company").

We are interested in computing some statistics of $S$, e.g. moments: mean and variance. We do so by using Theorem 1.8 with $\mathcal{G} := \sigma(N)$:

$E[S] = E\Big[E\Big[\sum_{i=1}^{N} X_i \,\Big|\, \mathcal{G}\Big]\Big] = E\Big[E\Big[\sum_{i=1}^{n} X_i\Big]\Big|_{n=N}\Big] = E\big[\, n E[X_1] \big|_{n=N}\big] = E[N] \cdot E[X_1]$ (Wald's first identity).

$E[S^2] = E\Big[E\Big[\Big(\sum_{i=1}^{n} X_i\Big)^2\Big]\Big|_{n=N}\Big] = E\Big[\mathrm{Var}\Big(\sum_{i=1}^{n} X_i\Big) + \Big(E\Big[\sum_{i=1}^{n} X_i\Big]\Big)^2 \Big|_{n=N}\Big] = E\big[\, n \mathrm{Var}(X_1) + n^2 (E[X_1])^2 \big|_{n=N}\big] = E[N] \cdot \mathrm{Var}(X_1) + E[N^2] \cdot E[X_1]^2,$

and thus

$\mathrm{Var}(S) = E[N] \cdot \mathrm{Var}(X_1) + E[X_1]^2 \cdot \big(E[N^2] - E[N]^2\big) = E[N] \cdot \mathrm{Var}(X_1) + E[X_1]^2 \cdot \mathrm{Var}(N)$ (Wald's second identity).

(4) Let $X_1, \dots, X_n$ be i.i.d., $X_1 \in L^1$. Set $S_n := \sum_{i=1}^{n} X_i$. Then we have

$E[X_i \mid S_n] = \frac{S_n}{n}.$
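Both identities are easy to confirm by simulation. In the sketch below (purely illustrative: the Poisson claim number, the exponential claim sizes and all parameter values are assumptions of the example) the empirical mean and variance of $S$ are compared with $E[N]E[X_1]$ and $E[N]\mathrm{Var}(X_1) + E[X_1]^2 \mathrm{Var}(N)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_runs = 100_000
mean_claims, mean_size = 5.0, 2.0            # illustrative parameters

N = rng.poisson(mean_claims, size=n_runs)    # claim number, independent of the claim sizes
S = np.array([rng.exponential(mean_size, size=n).sum() for n in N])   # S = X_1 + ... + X_N

EX, VarX = mean_size, mean_size**2           # Exp with mean 2.0 has variance 4.0
EN, VarN = mean_claims, mean_claims          # Poisson: mean = variance

print("E[S]:   simulated", round(S.mean(), 3), "  Wald:", EN * EX)
print("Var[S]: simulated", round(S.var(), 3), "  Wald:", EN * VarX + EX**2 * VarN)
```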

Now, consider random variables $X$ and $Y$ defined on the probability space $(\Omega, \mathcal{F}, P)$ and taking values in the measurable spaces $(S_X, \mathcal{S}_X)$ and $(S_Y, \mathcal{S}_Y)$ respectively.

Theorem 1.8. Let $X$ and $Y$ be independent and let $g(x, y)$ be a measurable real-valued function taking non-negative values (or with $g(X, Y) \in L^1$). Then

$E[g(X, Y) \mid Y] = E[g(X, y)]\big|_{y = Y}.$

More precisely, $h(y) := E[g(X, y)]$ is a measurable function and the random variable $h(Y)$ is a version of $E[g(X, Y) \mid Y]$.

1.4 Regular conditional probability distributions

For this section, fix a probability space $(\Omega, \mathcal{F}, P)$, a sub-σ-field $\mathcal{G} \subseteq \mathcal{F}$, a measurable space $(S, \mathcal{S})$, and a random variable $X : (\Omega, \mathcal{F}) \to (S, \mathcal{S})$.

Motivation: for any $B \in \mathcal{S}$, $1_B(X) = 1_{\{X \in B\}}$ is a bounded random variable, hence (a version of) the conditional expectation $E[1_B(X) \mid \mathcal{G}] = P[X \in B \mid \mathcal{G}]$ exists; it is $\mathcal{G}$-measurable and valued in $[0, 1]$ a.s. This leads to a mapping

$\Omega \times \mathcal{S} \to [0, 1], \qquad (\omega, B) \mapsto P[X \in B \mid \mathcal{G}](\omega)$

for any chosen versions (for each $B$) of the conditional expectations. However, the definition of such a mapping would depend on the choice of versions.

Question: Can we choose versions of the conditional expectations such that $B \mapsto P[X \in B \mid \mathcal{G}]$ is a probability measure for every $\omega \in \Omega$ (or a.e.)?

Problem: In general there are uncountably many $B$'s.

Definition 1.9. (a) A stochastic kernel (or Markov kernel) from a measurable space $(S_1, \mathcal{S}_1)$ into another one $(S_2, \mathcal{S}_2)$ is a mapping $K : S_1 \times \mathcal{S}_2 \to [0, 1]$ such that

(i) for all $x_1 \in S_1$, $B \mapsto K(x_1, B)$ is a probability measure on $\mathcal{S}_2$,

(ii) for all $A_2 \in \mathcal{S}_2$, $x_1 \mapsto K(x_1, A_2)$ is an $\mathcal{S}_1$-$\mathcal{B}([0, 1])$-measurable mapping.

(b) A regular conditional probability distribution of a random variable $X$ taking values in $(S, \mathcal{S})$ given the σ-field $\mathcal{G}$ is a stochastic kernel $K$ from $(\Omega, \mathcal{G})$ to $(S, \mathcal{S})$ such that for every $B \in \mathcal{S}$, $K(\cdot, B)$ is a version of $E[1_B(X) \mid \mathcal{G}] = P[X \in B \mid \mathcal{G}]$.

The existence results below will make use of the so-called "Monotone Class Theorem" (MCT). That is why we explain it first.

Definition 1.10. Let $(\Omega, \mathcal{F})$ be a measurable space. Let $\mathcal{D} \subseteq 2^{\Omega}$ be a class (family) of sets which is closed with respect to differences and increasing limits, that means

• $A, B \in \mathcal{D}$ with $B \subseteq A$ implies $A \setminus B \in \mathcal{D}$,

• $A_k \in \mathcal{D}$, $k \in \mathbb{N}$, with $A_1 \subseteq A_2 \subseteq \cdots$, implies $A := \bigcup_{k=1}^{\infty} A_k \in \mathcal{D}$.

We may call such a $\mathcal{D}$ an "m-class"². If additionally $\Omega \in \mathcal{D}$, then one calls $\mathcal{D}$ a Dynkin system. Note that a Dynkin system contains any monotone limit of its elements, these being sets.

Theorem 1.11 (Monotone class theorem for classes of sets, cf. [Klenke06] (Thm. 1.19) or [JacPro03] (ch. 6)). Let $\mathcal{E} \subseteq 2^{\Omega}$ be closed under finite intersections (and $\Omega \in \mathcal{E}$). Then the smallest Dynkin system $\delta(\mathcal{E})$ (or m-class when $\Omega \in \mathcal{E}$) containing $\mathcal{E}$ is equal to the σ-field $\sigma(\mathcal{E})$ generated by $\mathcal{E}$, i.e. $\delta(\mathcal{E}) = \sigma(\mathcal{E})$. Note that "$\subseteq$" is clear; "$\supseteq$" is the main statement.

Corollary 1.12 (a simple MCT for functions). Let $\mathcal{E}$ be as in Theorem 1.11 with $\Omega \in \mathcal{E}$. Let $\mathcal{H}$ be a vector space of functions $\Omega \to \mathbb{R}$ such that

• $\mathcal{H}$ contains all functions $1_E$, $E \in \mathcal{E}$,

• if $f_n \in \mathcal{H}$, $n \in \mathbb{N}$, $0 \le f_1 \le f_2 \le \cdots$, and $f := \lim_{n \to \infty} f_n$ is bounded, then $f \in \mathcal{H}$.

Then $\mathcal{H}$ contains all bounded $\sigma(\mathcal{E})$-measurable functions.

Theorem 1.13. Let $S = \mathbb{R}$ with the σ-field $\mathcal{S} = \mathcal{B}(\mathbb{R})$, and let $X$ be an $(S, \mathcal{S})$-valued random variable. For every sub-σ-field $\mathcal{G} \subseteq \mathcal{F}$ there exists a regular conditional probability of $X$ given $\mathcal{G}$.

² This terminology is not standard in the literature.

Remark 1.6. The important property of $\mathbb{R}$ which we use is that there exists a countable dense subset (e.g. $\mathbb{Q}$), i.e. that $\mathbb{R}$ is separable.

Remark 1.7. If $(S, \mathcal{S})$ as a measurable space is "similar" (isomorphic) to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, then one can construct a regular conditional probability on it.

Definition 1.14. A measurable space $(S, \mathcal{S})$ is called a Borel space if there exist $A \in \mathcal{B}(\mathbb{R})$ and a bijection $\Psi : S \to A$ such that $\Psi$ and $\Psi^{-1}$ are measurable with respect to $(S, \mathcal{S})$ and $(A, \mathcal{B}(A))$.

Remark 1.8. Such a $\Psi$ induces a bijection between $\mathcal{S}$ and $\mathcal{B}(A)$ by³ $B \mapsto \Psi(B) = (\Psi^{-1})^{-1}(B) \mapsto \Psi^{-1}(\Psi(B)) = B$. That means the σ-fields are isomorphic. One could write briefly $\Psi(\mathcal{S}) = (\Psi^{-1})^{-1}(\mathcal{S}) = \mathcal{B}(A)$ and $\Psi^{-1}(\mathcal{B}(A)) = \mathcal{S}$.

Theorem 1.15. Let $(S, \mathcal{S})$ be a Borel space and let $X$ be a random variable valued in $(S, \mathcal{S})$ and defined on a probability space $(\Omega, \mathcal{F}, P)$. Let also $\mathcal{G} \subseteq \mathcal{F}$ be a sub-σ-field. Then there exists a regular conditional probability for $X$ given $\mathcal{G}$.

Definition 1.16. A Polish space is a complete, separable and metrizable topological space.

Theorem 1.17. Let $S$ be a Polish space with $\mathcal{S} = \mathcal{B}(S)$ (the σ-algebra generated by the topology). Then $(S, \mathcal{S})$ is a Borel space.

Example 1.4 (Examples of Polish spaces).

• $(\mathbb{R}^d, \mathcal{B}^d)$;

• $(\mathbb{Z}^d, 2^{\mathbb{Z}^d})$;

• closed subsets $A$ of $\mathbb{R}^d$ with $\mathcal{B}(A)$;

• $C([0, T], \mathbb{R})$ with $\|\cdot\|_{\infty}$.

Lemma 1.18 ("Lifting" of a kernel). Let $X$ be a random variable taking values in $(S_1, \mathcal{S}_1)$, on a probability space $(\Omega, \mathcal{F}, P)$; let $\sigma(X) =: \mathcal{G} \subseteq \mathcal{F}$, and let $K$ be a stochastic kernel from $(\Omega, \sigma(X))$ to $(S_2, \mathcal{S}_2)$. Then there exists a unique stochastic kernel $\tilde{K}$ from $(X(\Omega), X(\Omega) \cap \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$ such that $\tilde{K}(X(\omega), B) = K(\omega, B)$ for all $\omega \in \Omega$ and $B \in \mathcal{S}_2$.

Terminology: If $K$ is the regular conditional probability of a random variable $Y$ given $\sigma(X)$, then one calls $\tilde{K}$ the regular conditional probability of $Y$ given $X$.

[Figure 1.4.1: "Lifting" of a kernel $K$ from $(\Omega, \mathcal{G} = \sigma(X))$ to a kernel $\tilde{K}$ from $(S_1, \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$ via the random variable $X$.]

³ Note that each preimage here can be read as an inverse image.

Example 1.5 (Application example). Let $K$ be a regular conditional probability of a random variable $Y$ (taking values in $(S_Y, \mathcal{S}_Y)$) given $\mathcal{G} = \sigma(X)$ for some random variable $X$. Then for every measurable function $g \ge 0$ (or $g$ bounded, or $g(Y) \in L^1$) we have

$E[g(Y) \mid X] = \int g(y)\, K(x, dy)\Big|_{x = X}$ a.s. (1.4.1)

Remark 1.9. What if $g(Y) \in L^1$ but $g$ is not bounded? Then (1.4.1) holds for $g^+$ and $g^-$. The decomposition $g = g^+ - g^-$ yields a.s. finiteness and the equality (1.4.1); thus $P[\,\text{"}\infty - \infty\text{"}\,] = P\big[\int g^+\, dK = \int g^-\, dK = \infty\big] = 0$, i.e. the right-hand side above is well-defined for $P \circ X^{-1}$-almost all values of $x = X(\omega)$. Hence the claim holds true; one could define the integral on the right-hand side (for all values $x$) to be $-\infty$ for those $x$ where it is (as a measure integral) not well-defined, so that the right-hand side is defined for all $x$ with the a.s. equality holding true.

1.5 Disintegration and semi-direct products

Setting to consider: Given are a stochastic kernel $K$ from $(S_1, \mathcal{S}_1)$ to $(S_2, \mathcal{S}_2)$, a probability measure $P_1$ on $(S_1, \mathcal{S}_1)$, $\Omega = S_1 \times S_2$, $\mathcal{F} = \mathcal{S}_1 \otimes \mathcal{S}_2$, and the projections $X_i : \Omega \to S_i$, $\omega = (x_1, x_2) \mapsto x_i$. The aim is to construct a probability measure $P$ on $\mathcal{F}$ such that $P \circ X_1^{-1} = P_1$ and $K$ is the regular conditional probability distribution of $X_2$ given $X_1$.

Example 1.6. Let $S_i$ be countable sets, $\mathcal{S}_i = 2^{S_i}$ ($i = 1, 2$), $\mathcal{F} = 2^{\Omega}$. Any probability measure $P$ on $\mathcal{F}$ is determined by the probability weights

$P[\{(x_1, x_2)\}] = P[X_1 = x_1, X_2 = x_2] = P[X_1 = x_1] \cdot P[X_2 = x_2 \mid X_1 = x_1] = P_1[x_1] \cdot K(x_1, x_2).$

Hence, $P_1$ and $K$ determine $P$ on $\mathcal{F}$. Furthermore, for any measurable $f \ge 0$ on $(\Omega, \mathcal{F})$ the following formula holds:

$E[f(X_1, X_2)] = \int_{\Omega} f \, dP = \sum_{x_1 \in S_1} \sum_{x_2 \in S_2} f(x_1, x_2)\, K(x_1, x_2)\, P_1(x_1) = \int_{S_1} \int_{S_2} f(x_1, x_2)\, K(x_1, dx_2)\, P_1(dx_1).$

In other words, the probability measure $P$ and expectations with respect to it are determined by $P_1$ and $K$ via a representation similar to the one from Fubini's theorem.

Theorem 1.19 (Disintegration). In the setting under consideration, there exists a probability measure $P$ on $(\Omega, \mathcal{F})$ such that

(a) $\displaystyle\int_{\Omega} f \, dP = \int_{S_1} \Big( \int_{S_2} f(x_1, x_2)\, K(x_1, dx_2) \Big) P_1(dx_1)$ for all measurable $f \ge 0$.

In particular, for all $A \in \mathcal{F}$ the sections $A_{x_1} := \{x_2 \in S_2 \mid (x_1, x_2) \in A\}$ are in $\mathcal{S}_2$, and

(b) $P[A] = \int_{S_1} K(x_1, A_{x_1})\, P_1(dx_1)$,

(c) $P[A_1 \times A_2] = \int_{A_1} K(x_1, A_2)\, P_1(dx_1)$ for all $A_i \in \mathcal{S}_i$, $i = 1, 2$.

The probability measure $P$ is uniquely characterized by (c).

Notation: For this unique measure $P$ we write

$P = P_1 \otimes K.$

One calls $P$ the (semi-direct) product of $P_1$ with $K$.

Remark 1.10. Theorem 1.19 implies the standard statement of Fubini's theorem for product measures $P = P_1 \otimes P_2$ (take $K(x_1, \cdot) = P_2(\cdot)$), cf. [Klenke06, Theorems 14.14 and 14.16].

Terminology: The marginal distributions of $P$ on $\mathcal{S}_1 \otimes \mathcal{S}_2$ are $P_{X_1} := P \circ X_1^{-1}$ and $P_{X_2} := P \circ X_2^{-1}$. Then $K(x_1, \cdot)$ is the (regular) conditional probability distribution of $X_2$ given $X_1$.
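The semi-direct product has an operational reading: a sample of $P = P_1 \otimes K$ is obtained by first drawing $x_1 \sim P_1$ and then $x_2 \sim K(x_1, \cdot)$. The sketch below is an illustration under assumed choices ($P_1$ standard normal, $K(x_1, \cdot) = \mathcal{N}(x_1/2, 1)$, test function $f(x_1, x_2) = x_1 x_2$); it checks the Fubini-type formula from Theorem 1.19(a) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_P1():
    """P1: illustratively, the standard normal distribution on S1 = R."""
    return rng.normal()

def sample_K(x1):
    """K(x1, .): illustratively, a normal distribution centred at x1 / 2."""
    return rng.normal(loc=0.5 * x1, scale=1.0)

# Sampling from P = P1 (x) K: first X1 ~ P1, then X2 ~ K(X1, .).
samples = []
for _ in range(100_000):
    x1 = sample_P1()
    samples.append((x1, sample_K(x1)))
x1, x2 = np.array(samples).T

# Check the disintegration formula for f(x1, x2) = x1 * x2:
# the inner integral over K(x1, .) is x1 * (x1 / 2), so the double integral equals E[X1^2] / 2 = 0.5.
print("E[X1 * X2] simulated:", round((x1 * x2).mean(), 4), "  exact value: 0.5")
```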

Example 1.7. If X1,X2 are discrete random variables, we get using elementary condi- tional probabilities that rt P u X t  us r P |  s  P X2 A2 X1 x1 P X2 A2 X1 x1 rt  us , P1 X1 x1 p q  r P |  s rt P u X t  us  and hence K x1,A2 P X2 A2 X1 x1 because P X2 A2 X1 x1 rt  us ¤ p q P1 X1 x1 K x1,A2 by (c) in Theorem 1.19. Remark 1.11. We can obtain a canonical generalization from 2 to n factors (finite number of factors) in Theorem 1.19 by induction. For a countably infinite number of factors the existence of a measure P will be proven later in the lecture. p q ¤ ¤ Now, let Si, Si be measurable spaces for‘ 1 i Ân, P1 be a probability measure on p q p i¡1 i¡1 q p q ¤ ¤ S1, S1 and‘ Ki be stochastic kernels from k1 Sk, k1 Sk to Si, Si for 2 i n.  n  n Set Ω : 1 Si and F : 1 Si.

Corollary 1.20. In the aforementioned setting, there exists a probability measure P on pΩ, Fq such that for every measurable f ¥ 0

(a) » » » »  ¤ ¤ ¤ p ¤ ¤ ¤ q pp ¤ ¤ ¤ q q ¤ ¤ ¤ p q p q fdP f x1, , xn Kn x1, , xn¡1 , dxn K2 x1, dx2 P1 dx1 , Ω S1 S2 Sn

11 (b) and, in particular, for any Ai P Si   » » » ¡n  ¤ ¤ ¤ pp q q pp ¤ ¤ ¤ q q P Ai Kn x1, . . . , xn¡1 ,An Kn¡1 x1, , xn¡2 , dxn¡1 1 A1 A2 An¡1 ¤ ¤ ¤ p q p q K2 x1, dx2 P1 dx1 ; with probability measure P being uniquely defined in this way. Â  n Remark 1.12. Remark for product measures P 1 Pi: 1. The construction above works for general measures (instead of just probability mea- sures); the uniqueness argument in the Fubini’s thm. needed σ-finiteness of P. 2. One can consider f ¥ 0, or bounded or f P L1pPq in the theorem of Fubini for product measures; in the L1-case, the inner integrals are finitely defined a.e. (wrt. the respective finite dim. distribution given by the outer intergrals); integral wrt to kernels could be defined (in generalization of the ususal measure integral) pointwise everywhere, defining their value as the usual difference of the integral of the positiv and negative integrand where that is defined, and letting it be ¡8 elsewhere. p q Theorem 1.21. Let X1,X2 be random variables defined on a probability space Ω, F, P p q p q and taking values in the measurable spaces S1, S1 and S2, S2 respectively. Let P1 be a measure on S1 and K be a stochastic kernel from S1 to S2. Suppose that the joint p q p ¢ b q b b distribution X1,X2 on S1 S2, S1 S2 is P1 K. Then for any S1 S2-measurable real-valued function F ¥ 0, the following holds » r p q| sp q  p p q q p p q q E F X1,X2 X1 ω F X1 ω , x2 K X1 ω , dx2 S2

 hpX1pωqq, ³ for hpx1q : F px1, x2qKpx1, dx2q. S2 Remark 1.13. This statement generalizes a previous result: see Example 1.5. Remark 1.14. A simple corollary of Theorem 1.21 is Theorem 1.8. Indeed, this follows b  b when P1 K P1 P2 (i.e. independent case, without dependence of K from its first argument).

Theorem 1.22. Let pSi, Siq be measurable spaces for i  1, 2, and pS2, S2q be a Polish  p q p ¢ b q  space with S2 B S2 . Then every probability distribution Q on S1 S2, S1 S2 : p q  b r Ω, F is of the form Q Q1 K for a probability measure Q1 on S1 and a stochastic r kernel K from pS1, S1q to pS2, S2q. An analogous statement holds for measures on product spaces with finitely many factors pSi, Siq, i  1, ¤ ¤ ¤ , n,. Proof by induction. Example 1.8 (Markov process in discrete finite time with a general state space). Let p q S, S be a measurable space, P0 a probability measure on S (starting initial distribution for the process), and K be a stochastic kernel from pS, Sq to pS, Sq. An S-valued process p q p q Xn n¥0 on a probability space Ω, F, P is called (time-homogeneous) Markov process with initial distribution P0 and transition kernel K if

12  ¥ ¡1  (i) PX0 P X0 P0, (ii) for every n ¥ 0 and B P S holds

r P | ¤ ¤ ¤ s  r P | s  p q P Xi B Xi¡1,Xi¡2, ,X0 P Xi B Xi¡1 K Xi¡1,B a.s..

p q For a given P0 and K, there exists a Markov process Xk k0,1,...,N for any given finite P p ¤ ¤ ¤ q b b ¤ ¤ ¤ b time horizon N N, and the joint law of X0,X1, ,XN is given by P0 loooooomoooooonK K. N times (For infinite time horizon, we‘ will construct such a process only later in the lecture).  N  N  b b ¤ ¤ ¤ b Existence: Take Ω 0 Si, F 0 Si, P : P0 KloooooomoooooonK with kernels N times Kippx0, . . . , xi¡1q, dxiq  Kpxi¡1, dxiq as for Cor. 1.20 but with the conditional distribution of the next future state xi depending on the ’past’ px0, . . . , xi¡1q only through the ’present state’ xi¡1 (Markov property). Provided that the Kernels K do not depend on i one speaks of a time-homogeneous Markov process. (If one considers more generally kernels Kipxi¡1, dxiq with Ki not being the same for all i one speaks of a time-inhomogeneous Markov process and can do a likewise construction.) Then it is easy to check that (i) and (ii) hold for the canonical projections X0, ¤ ¤ ¤ ,XN on the product space:  • Clearly PX0 P0 by construction. • By the generalized Fubini’s theorem (Theorem 1.19, extension for more kernels) we have for all Bk P S » » n¹¡1 r1 p q ¤ 1 p qs  ¤ ¤ ¤ p q p q ¤ ¤ ¤ p q E Bn Xn Bk Xk K xn¡1,Bn K xn¡2, dxn¡1 P0 dx0 0 B0 Bn¡1 n¹¡1  r p q ¤ 1 p qs E K Xn¡1,Bn Bk Xk . 0 ‘ r P | ¤ ¤ ¤ s  p q n¡1t P u Hence, P Xn B Xn¡1, ,X0 K Xn¡1,B since 0 Xk Bk forms an intersection stable generator of σpX0, ¤ ¤ ¤ ,Xn¡1q (cf. Exercise 1.1). Analogously,   ¤ ¤ ¤ ¡ r P | s  p q taking Bk S for k 0, , n 2 we get P Xn B Xn¡1 K Xn¡1,B . Uniqueness of the finite dimensional distribution: follows by uniqueness of the generalized Fubini theorem (Cor.1.20), noting that (ii) implies (with notations as from  b  the proof there) that Pk 1 Pk K for k 1, . . . n is the joint distribution of X0,...,Xk.
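The construction $P_0 \otimes K \otimes \cdots \otimes K$ corresponds one-to-one to the obvious simulation scheme: draw $X_0 \sim P_0$, then repeatedly $X_i \sim K(X_{i-1}, \cdot)$, so the conditional law of the next state depends on the past only through the present state. A minimal sketch with assumed concrete choices ($P_0 = \mathcal{N}(0, 1)$ and the Gaussian AR(1)-type kernel $K(x, \cdot) = \mathcal{N}(0.8x, 1)$), given only as an illustration of the kernel mechanism:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_P0():
    """Initial distribution P0; here, illustratively, N(0, 1)."""
    return rng.normal()

def sample_K(x):
    """Transition kernel K(x, .); here, illustratively, N(0.8 * x, 1)."""
    return rng.normal(loc=0.8 * x, scale=1.0)

def sample_path(N):
    """One path (X_0, ..., X_N) with joint law P0 (x) K (x) ... (x) K (time-homogeneous)."""
    path = [sample_P0()]
    for _ in range(N):
        path.append(sample_K(path[-1]))      # depends on the past only through the present
    return path

paths = np.array([sample_path(10) for _ in range(50_000)])
# The Markov structure shows up e.g. in E[X_{n+1} | X_n] = 0.8 * X_n, hence
# Cov(X_n, X_{n+1}) = 0.8 * Var(X_n):
print("Cov(X_9, X_10):", round(np.cov(paths[:, 9], paths[:, 10])[0, 1], 3),
      "  0.8 * Var(X_9):", round(0.8 * paths[:, 9].var(), 3))
```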

Chapter 2

Martingales

2.1 Definition and interpretation

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Consider discrete time $n = 0, 1, \dots$, with $n \in \{0, 1, \dots, T\}$ (finite horizon) or $n \in \mathbb{N}_0$ (infinite horizon).

Definition 2.1. (a) A filtration $(\mathcal{F}_n)_{n \ge 0}$ on $\mathcal{F}$ is an increasing family of σ-fields $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}$. Set $\mathcal{F}_{\infty} := \sigma\big(\bigcup_{n \ge 0} \mathcal{F}_n\big) \subseteq \mathcal{F}$.

(b) A stochastic process pXnqn¥0 is adapted to the filtration pFnqn¥0 if Xn is Fn-measurable for each n ¥ 0.

(c) The natural filtration of a process $(X_n)_{n \ge 0}$ is the smallest filtration with respect to which $(X_n)_{n \ge 0}$ is an adapted process, i.e. $(\mathcal{G}_n)_{n \ge 0}$ for $\mathcal{G}_n = \sigma(X_k : k \le n)$.

Remark 2.1. Filtrations are models for information flow, Fn represents information available at time n.

Definition 2.2. A stochastic process $(X_n)_{n \ge 0}$ is called a martingale with respect to the filtration $(\mathcal{F}_n)_{n \ge 0}$ and a probability measure $P$ if

(i) $(X_n)_{n \ge 0}$ is adapted,

(ii) $(X_n)_{n \ge 0}$ is integrable, i.e. $X_n \in L^1(P)$ for all $n \ge 0$,

(iii) $E[X_n \mid \mathcal{F}_{n-1}] = X_{n-1}$ $P$-a.s. for all $n \ge 1$.

The process $(X_n)_{n \ge 0}$ is a super-/sub-martingale if (iii) only holds with "$\le$"/"$\ge$".

Remark 2.2. • Note that (iii) is equivalent to $E[X_m \mid \mathcal{F}_n] = X_n$ $P$-a.s. for all $m \ge n$.

• A super-(sub-)martingale is an adapted process that decreases (increases) in conditional mean.

Example 2.1. (1) Additive martingales: let $Y_k \in L^1$ be i.i.d. with $E[Y_1] = 0$, and consider $X_n := \sum_{k=1}^{n} Y_k$. Then $(X_n)_{n \ge 1}$ is a martingale with respect to $\mathcal{F}_n = \sigma(Y_k : k \le n)$.

(2) Multiplicative martingales: let $Y_k$ be i.i.d. with $Y_k \in L^1$ and $E[Y_k] = 1$. Consider $X_n := \prod_{k=1}^{n} Y_k$. Then $(X_n)_{n \ge 1}$ is a martingale with respect to $\mathcal{F}_n = \sigma(Y_k : k \le n)$.

(3) For $X \in L^1(P)$, let $X_n := E[X \mid \mathcal{F}_n]$, $n \ge 0$, for some filtration $(\mathcal{F}_n)_{n \ge 0}$. Then $(X_n)_{n \ge 0}$ is a martingale w.r.t. $(\mathcal{F}_n)_{n \ge 0}$.

Definition 2.3. A process $(H_n)_{n \ge 1}$ is called predictable if $H_n$ is $\mathcal{F}_{n-1}$-measurable for all $n \ge 1$ (sometimes also called "previsible").

For a stochastic process $(X_n)$ and a predictable process $(H_n)_{n \ge 1}$, we define by $H \cdot X$ the process

$Y_n = (H \cdot X)_n = \sum_{k=1}^{n} H_k \Delta X_k, \quad n \ge 0 \quad (Y_0 = 0),$

where $\Delta X_k := X_k - X_{k-1}$. Interpretation: "cumulative gain/loss from betting according to the strategy $H$". $Y$ "$= \int H \, dX$" is a discrete-time version of the stochastic integral of $H$ with respect to $X$. If $X$ is a martingale, then $H \cdot X$ is also called a martingale transform.

Theorem 2.4. (a) Let $H$ be predictable and bounded (i.e. there exists $K < \infty$ such that $|H_n| \le K$ for all $n$), and let $X$ be a martingale. Then $Y = H \cdot X$ is a martingale.

(b) Let $H$ be predictable, bounded and non-negative, and let $X$ be a supermartingale. Then $Y = H \cdot X$ is a supermartingale.

(c) Instead of boundedness of $H$ in (a) or (b), it suffices that $H \in L^2$ when $X \in L^2$.

Definition 2.5. A map $\tau : \Omega \to \overline{\mathbb{N}}_0 = \{0, 1, \dots, +\infty\}$ is called a stopping time with respect to a filtration $(\mathcal{F}_n)_{n \ge 0}$ if $\{\tau \le n\} \in \mathcal{F}_n$ for all $n \ge 0$.

Remark 2.3. $\tau : \Omega \to \overline{\mathbb{N}}_0$ is a stopping time iff $\{\tau = n\} \in \mathcal{F}_n$ for all $n$, since ($\uplus$ denoting disjoint unions) $\{\tau \le n\} = \biguplus_{k=0}^{n} \{\tau = k\}$ and $\mathcal{F}_k \subseteq \mathcal{F}_n$ for $k \le n$.

Example 2.2. • Let $(X_n)_{n \in \mathbb{N}}$ be an adapted process taking values in the measurable space $(S, \mathcal{S})$ and let $B \in \mathcal{S}$. Then "the first entry time of $X$ in $B$",

$\tau := \inf\{n \in \mathbb{N}_0 \mid X_n \in B\} \qquad (\inf \emptyset := +\infty),$

is a stopping time. Indeed, $\{\tau \le n\} = \bigcup_{k=0}^{n} \underbrace{\{X_k \in B\}}_{\in \mathcal{F}_k \subseteq \mathcal{F}_n} \in \mathcal{F}_n$.

• Typically, a "last entry time" $\tau := \sup\{n \in \mathbb{N}_0 \mid X_n \in B\}$ would not be a stopping time, as $\{\tau \le n\}$ would depend on foresight (future knowledge).

• Let $\tau, \tau'$ be stopping times. Then $\tau \wedge \tau'$, $\tau \vee \tau'$ and $\tau + \tau'$ are also stopping times.

• Every deterministic time $n$ is a stopping time. In particular, $\tau \wedge n$ is a stopping time for each stopping time $\tau$.

Remark 2.4. Intuitively, Definition 2.5 says that it is known at any time $n$ whether stopping has occurred up to that time. Note that a finite horizon process $(X_k)_{k = 0, \dots, N}$ can be naturally embedded into the infinite horizon setting by letting $X_k := X_N$ and $\mathcal{F}_k := \mathcal{F}_N$ for $k > N$.
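As a small numerical companion to the martingale transform and to predictability: the sketch below (illustrative only; the simple symmetric random walk and the particular bounded strategy are assumptions of the example) computes $(H \cdot X)_n = \sum_{k \le n} H_k (X_k - X_{k-1})$ for the predictable strategy $H_k = 1_{\{X_{k-1} < 0\}}$ and checks that its expectation stays at $0$, as Theorem 2.4(a) asserts for bounded predictable $H$ and a martingale $X$.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps = 100_000, 30

# X: simple symmetric random walk started at 0 (a martingale); increments are +/-1.
dX = rng.choice([-1, 1], size=(n_paths, n_steps))
X = np.concatenate([np.zeros((n_paths, 1)), dX.cumsum(axis=1)], axis=1)

# Predictable, bounded strategy: bet 1 at time k if and only if X_{k-1} < 0.
H = (X[:, :-1] < 0).astype(float)

# Martingale transform (H.X)_n = sum_{k<=n} H_k (X_k - X_{k-1}).
HX = (H * dX).cumsum(axis=1)

print("E[(H.X)_n], n = 1..5:", np.round(HX[:, :5].mean(axis=0), 4))
print("E[X_n],     n = 1..5:", np.round(X[:, 1:6].mean(axis=0), 4))
```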

15 p p q q Now, consider a filtered probability space Ω, F, Fn , P .

Definition 2.6. Let $X = (X_n)$ be an adapted process and $\tau$ be a stopping time. Then $X^{\tau} = (X^{\tau}_n)$ with $X^{\tau}_n := X_{n \wedge \tau}$ (i.e. $X^{\tau}_n(\omega) = X_{n \wedge \tau(\omega)}(\omega)$ for $\omega \in \Omega$) is called the process $X$ stopped at time $\tau$.

Remark 2.5. Note the difference:

• $X^{\tau}$ is a process, while

• $X_{\tau}$ is a random variable: $X_{\tau} : \omega \mapsto X_{\tau(\omega)}(\omega)$ (which requires $\tau$ to be finite or, otherwise, $X_{\infty}$ to be defined).

Exercise 2.1. (a) If X is an adapted process and τ is a finite stopping time, then Xτ is a random variable.

(b) If X is an adapted process and τ is a stopping time, then Xτ is an adapted process.

(c) If $\tau$ and $\tau'$ are stopping times, then $\tau \wedge \tau'$, $\tau \vee \tau'$ and $\tau + \tau'$ are stopping times.

Now, let $\tau$ be a stopping time and let $H^{(\tau)}_n := 1_{\{n \le \tau\}}$. Note that $\{n \le \tau\} = \Omega \setminus \{\tau \le n - 1\} \in \mathcal{F}_{n-1}$, and thus $(H^{(\tau)}_n)$ is a previsible process. For a process $(X_n)$ we have

$(H^{(\tau)} \cdot X)_n = \sum_{k=1}^{n} 1_{\{k \le \tau\}} \cdot (X_k - X_{k-1}) = X_{n \wedge \tau} - X_0 = X^{\tau}_n - X_0.$

By Theorem 2.4 we thus obtain

Theorem 2.7 (Stopping theorem, Doob, version I).

(a) Let $(X_n)$ be a supermartingale and $\tau$ a stopping time. Then $X^{\tau}$ is a supermartingale. In particular, $E[X^{\tau}_n \mid \mathcal{F}_k] \le X^{\tau}_k$ and $E[X^{\tau}_n] \le E[X_0]$ for all $k \le n$.

(b) Let $(X_n)$ be a martingale and $\tau$ a stopping time. Then $X^{\tau}$ is a martingale. In particular, $E[X^{\tau}_n \mid \mathcal{F}_k] = X^{\tau}_k$ and $E[X^{\tau}_n] = E[X_0]$ for all $k \le n$.

Remark 2.6. • If $\tau$ is a bounded stopping time, i.e. if there exists a deterministic constant $N < \infty$ such that $\tau \le N$, then Theorem 2.7 gives that $E[X_{\tau}] \le E[X_0]$ (with "=" in case (b)). Indeed, simply choose $n \ge N$.

• A stopped supermartingale remains a supermartingale.

The next aim will be to obtain statements as in Theorem 2.7 for σ-fields associated with stopping times. For that purpose, we need to define the stopped σ-field.

Definition 2.8. Let $\tau$ be a stopping time with respect to the filtration $(\mathcal{F}_n)_{n \ge 0}$ in $\mathcal{F}$. Then

$\mathcal{F}_{\tau} := \{A \in \mathcal{F} \mid A \cap \{\tau \le n\} \in \mathcal{F}_n \ \forall n \ge 0\}$

is called the σ-field of the events observable until time $\tau$.

Exercise 2.2. 1. For deterministic stopping times $\tau = k$ we have $\mathcal{F}_{\tau} = \mathcal{F}_k$; hence the notation is non-ambiguous.

2. Prove that $\mathcal{F}_{\tau}$ is a σ-field.

3. If $X$ is an adapted process and $\tau$ is a finite stopping time, then $X_{\tau}$ is $\mathcal{F}_{\tau}$-measurable.

Lemma 2.9. Let $\tau$ and $\tau'$ be stopping times with $\tau \le \tau'$. Then $\mathcal{F}_{\tau} \subseteq \mathcal{F}_{\tau'}$.

Corollary 2.10 (Corollary of Theorem 2.7). Let $X = (X_n)_{n \ge 0}$ be a martingale (or supermartingale) and let $\sigma, \tau$ be stopping times with $\sigma \le \tau$ a.s.

(a) Then

$E[X_{\tau \wedge n} \mid \mathcal{F}_{\sigma \wedge n}] = X_{\sigma \wedge n}$ (resp. "$\le$") a.s.

(b) If, moreover, $\tau$ and $\sigma$ are bounded, then

$E[X_{\tau} \mid \mathcal{F}_{\sigma}] = X_{\sigma}$ (resp. "$\le$") a.s.

Corollary 2.11. 1. Let $(X_n)$ be a martingale. Then for any bounded stopping time $\tau$ it holds that $E[X_{\tau}] = E[X_0]$.

2. Let $(X_n)$ be an adapted and integrable process (i.e. $X_n \in L^1$ for all $n$) such that for each bounded stopping time $\tau$ it holds that $E[X_{\tau}] = E[X_0]$. Then $(X_n)$ is a martingale.

2.2 Martingale convergence theorems for a.s. convergence

Let $(\Omega, \mathcal{F}, P)$ be a probability space with a filtration $(\mathcal{F}_n)_{n \ge 0}$. Let also $(X_n)_{n \ge 0}$ be a real-valued stochastic process. For $a, b \in \mathbb{R}$ with $a < b$ define

$U^N_{a,b} :=$ number of times the process $X$ crosses $[a, b]$ upwards during the time interval $[0, N]$, $N \in \mathbb{N}$.

In other words, $U^N_{a,b} = \sup\{k \mid T_k \le N\}$, where $T_k$ is defined as follows:

$S_0 = T_0 = 0, \quad S_k := \inf\{n \mid n \ge T_{k-1},\ X_n \le a\}, \quad T_k := \inf\{n \mid n \ge S_k,\ X_n \ge b\}, \quad k \ge 1.$

For $X$ adapted, $S_k$ and $T_k$ are stopping times.

Lemma 2.12 (Upcrossing lemma). Let $(X_n)_{n \ge 0}$ be a supermartingale. Then

$E[U^N_{a,b}] \le \frac{1}{b - a}\, E[(X_N - a)^-]. \qquad (2.2.1)$

Remark 2.7. • With $U_{a,b} := \lim_{N \to \infty} \uparrow U^N_{a,b} =$ number of all upcrossings, the monotone convergence theorem gives

$E[U_{a,b}] \le \frac{1}{b - a} \sup_{N \in \mathbb{N}} E[(X_N - a)^-]. \qquad (2.2.2)$

• The right-hand side in (2.2.2) is finite if and only if $\sup_{N \in \mathbb{N}} E[X_N^-]$ is finite.

• This condition is satisfied if $X_N \ge 0$ for all $N$ (i.e. $X \ge 0$) or if $\sup_{N \in \mathbb{N}} E[|X_N|] < \infty$ (i.e. $X$ is $L^1$-bounded). Indeed, the latter condition suffices since $E[X_N^-] \le E[|X_N|]$.
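The definition of $U^N_{a,b}$ via the alternating times $S_k$, $T_k$ can be translated literally into a counting routine, which also allows one to check the upcrossing inequality (2.2.1) by simulation. In the sketch below (illustrative; the symmetric random walk, which is a martingale and hence in particular a supermartingale, and the values of $a$, $b$, $N$ are assumed choices), the empirical mean of $U^N_{a,b}$ is compared with $\frac{1}{b-a} E[(X_N - a)^-]$.

```python
import numpy as np

def upcrossings(x, a, b):
    """Count completed upcrossings of [a, b] by the path x[0..N]:
    wait until the path is <= a (time S_k), then until it is >= b (time T_k)."""
    count, waiting_for_low = 0, True
    for value in x:
        if waiting_for_low and value <= a:
            waiting_for_low = False
        elif not waiting_for_low and value >= b:
            count += 1
            waiting_for_low = True
    return count

rng = np.random.default_rng(5)
a, b, N, n_paths = -2.0, 2.0, 200, 20_000
walks = np.concatenate([np.zeros((n_paths, 1)),
                        rng.choice([-1, 1], size=(n_paths, N)).cumsum(axis=1)], axis=1)

U = np.array([upcrossings(w, a, b) for w in walks])
bound = np.maximum(a - walks[:, -1], 0.0).mean() / (b - a)   # (1/(b-a)) * E[(X_N - a)^-]
print("E[U^N_{a,b}] simulated:", round(U.mean(), 3), "   upcrossing bound:", round(bound, 3))
```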

Theorem 2.13 (Doob's martingale convergence theorem I). Let $(X_n)_{n \ge 0}$ be a supermartingale with $\sup_{N \ge 0} E[X_N^-] < \infty$. Then the a.s. limit $X_{\infty} = \lim_{n \to \infty} X_n$ exists and is in $L^1$. In particular, $X_{\infty}$ is finite a.s., and every non-negative supermartingale converges almost surely to a finite and integrable limit.

Example 2.3. (1) Let $Z_i$ be i.i.d. with $Z_i \sim \mathcal{N}(0, 1)$ and let $a \ge \sigma^2$ for a fixed $\sigma > 0$. Consider the random variables

$Y_i = \exp\Big(\sigma Z_i - \frac{1}{2} a\Big).$

Then $E[Y_i] \le 1$ because $a \ge \sigma^2$. Consider the non-negative process $X_n := \prod_{i=1}^{n} Y_i$. It is a non-negative supermartingale (cf. Example 2.1, (2)). Hence, Theorem 2.13 implies that $X_n$ converges a.s. Question: What is the limit? (Exercise)

(2) Consider a random walk $S_n = \sum_{i=1}^{n} Y_i$, where the $Y_i$ are i.i.d. and $p = P[Y_i = 1] = 1 - P[Y_i = -1]$.

Symmetric case: $p = 1/2$. In this case, $S_n$ is a martingale (cf. Example 2.1, (1)). Consider the stopping time $T_c := \inf\{n \mid S_n = c\}$ for some $c \in \mathbb{Z}$. Then Theorem 2.7 gives that $(S^{T_c}_n)$ is a martingale. It is bounded from above if $c > 0$ and from below if $c < 0$. By the martingale convergence theorem, $\lim_{n \to \infty} S^{T_c}_n$ exists a.s., and therefore $P[T_c < \infty] = 1$ for every $c \in \mathbb{Z}$. Thus,

$P\Big[\bigcap_{c \in \mathbb{Z}} \{T_c < \infty\}\Big] = P\big[\limsup S_n = +\infty,\ \liminf S_n = -\infty\big] = 1.$

Asymmetric case: $p \ne 1/2$. Then one can check that $\big(\big(\tfrac{1-p}{p}\big)^{S_n}\big)$ is a (non-trivial) non-negative martingale (exercise). Hence, Theorem 2.13 implies that it converges a.s. The limit is $0$ because the law of large numbers gives

$\frac{1}{n} S_n \to 2p - 1 = E[Y_1],$

and hence $S_n \to +\infty$ if $p > 1/2$ and $S_n \to -\infty$ if $p < 1/2$. In both cases, $\big(\tfrac{1-p}{p}\big)^{S_n} \to 0$. Note that $\big(\tfrac{1-p}{p}\big)^{S_n}$ does NOT converge to $0$ in $L^1$ (the expectations being $1$ for all $n$).
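The contrast between a.s. convergence and the failure of $L^1$-convergence in the asymmetric case can be seen numerically. The sketch below (an illustration with the arbitrary choice $p = 0.6$) simulates the non-negative martingale $M_n = \big(\tfrac{1-p}{p}\big)^{S_n}$: the median collapses towards $0$ while $E[M_n] = 1$ for every $n$; for larger $n$ even the naive sample mean drifts below $1$, because the expectation is carried by ever rarer paths — precisely the lack of uniform integrability discussed in the next section.

```python
import numpy as np

rng = np.random.default_rng(6)
p, n_paths = 0.6, 200_000                     # p > 1/2, an arbitrary illustrative choice

steps = np.where(rng.uniform(size=(n_paths, 60)) < p, 1, -1)
S = steps.cumsum(axis=1)
M = ((1 - p) / p) ** S                        # non-negative martingale, E[M_n] = 1 for all n

for n in (10, 30, 60):
    Mn = M[:, n - 1]
    print(f"n = {n:2d}:  sample mean = {Mn.mean():.3f}   median = {np.median(Mn):.4f}")
# The median tends to 0 (a.s. convergence to 0) although E[M_n] is exactly 1 for every n;
# for large n the naive sample mean underestimates 1 because the expectation sits on
# exponentially rare paths -- the failure of L^1-convergence / uniform integrability.
```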

2.3 Uniform integrability and martingale convergence in L1

Definition 2.14. A family $\mathcal{X}$ of random variables on a probability space $(\Omega, \mathcal{F}, P)$ is called uniformly integrable if

$\lim_{c \to \infty} \sup_{X \in \mathcal{X}} E\big[|X|\, 1_{\{|X| \ge c\}}\big] = 0.$

Notation: $E[|X|;\ |X| \ge c] := E[|X|\, 1_{\{|X| \ge c\}}]$, or $E[Y; B] = E[Y 1_B]$ (common notation, as e.g. in [Will91]). Writing $\mathcal{X}$ "is UI" is an abbreviation for "is uniformly integrable".

Example 2.4. Let Y P L1. Then the family X  tErY |Gs| G is a sub-σ-field of Fu is uniformly integrable (exercise, or [Will91, Chapter 13.1], or example 2.5 )

Lemma 2.15. Each of the following is a sufficient condition for a family X to be uni- formly integrable.

r| |ps 8 ¡ (a) supXPX E X for some p 1. (b) There exists Y P L1, such that |X| ¤ Y a.s. for all X P X .

Theorem 2.16. Let X be a family of random variables. The following are equivalent

(1) X is uniformly integrable;

1 r| |s 8 @ ¡ D ¡ (2) X is L -bounded (i.e. supXPX E X ) and ε 0 δ 0 such that r| |1 s @ P r s sup E X A ε A F with P A δ; XPX

(3) (De La Vall´eePoussin criterion:) There exists a function G : r0, 8q Ñ r0, 8q with Gpxq  8 r p| |qs 8 limxÑ8 x and supXPX E G X ; moreover, G can be chosen as in- creasing and convex.

Example 2.5 (for application of the de La Vall´eePoussin criterion). (a) If X is Lp-bounded for some p ¡ 1, then X is UI (choose Gpxq  |x|p for dLVP in part (3)).

(b) Let Z P L1. Then X : tErZ|Gs | G is a sub-σ-field of Fu is uniformly integrable. p q 1 Theorem 2.17. For a random variable X8 and a sequence Xn nPN in L , the following are equivalent:

1 1 (1) pXnq converges to X8 in L and X8 P L ;

(2) pXnq is UI and converges to X8 in probability. Remark 2.8. • This generalizes the dominated convergence theorem.

Remark 2.9. On the connection of uniform integrability to compactness in the weak topology σpL1,L8q on L1:

• The weak topology σpL1,L8q is the coarsest topology on L1 with respect to which p q  r s P 8 p q 1 all mappings fZ X : E ZX , Z L , are continuous. A sequence Xn in L 1 converges to Xw P L weakly if

8 lim rZXns  rZXws @Z P L . n E E

• A set K € L1 is weakly (relatively) compact if K (resp. its closure) is compact with respect to the topology σpL1,L8q.

19 Theorem 2.18 (Dunford, Pettis). For a set K € L1 the following are equivalent

(1) K is UI,

(2) K is weakly relatively compact,

(3) K is weakly relatively sequentially compact, i.e. any sequence in K has a subsequence that converges weakly (with limit in the closure of K).

Remark 2.10. • “(2) ô (3)” holds on every Banach space (result by Eberlein and Smulian).

• Weak convergence of probability measures (will come later in the lecture) is just the weak-¦-convergence of probability measures as subset of the dual space of continuous and bounded functionals.

1 p q Theorem 2.19 (Martingale convergence theorem II, L -convergence). Let Xn nPN0 be a martingale. Then the following are equivalent.

P 1  r | s (1) There exists a random variable Y L such that Xn E Y Fn a.s. p q 1 (2) Xn nPN0 converges in L against an F8-measurable random variable X8.

1 (3) There exists a random variable X8 P L pF8q such that pX q is a martingale (on n nPN0 P n N0). p q (4) Xn nPN0 is UI. p P p 8q p q Exercise 2.3. (L -convergence of martingales): Let p 1, . Assume that Xn nPN0 is martingale that is bounded in Lp. Then the convergence statement (2) of Theorem 2.19 p p holds even with convergence in L ; i.p. X8 is in L and the other statements of that p theorem hold true too (with Y  X8 P L ).

Remark 2.11 (On the relation of Y and X8). In general,

r 1 s  r 1 s  r 1 s @ P @ P E X8 A E Xn A E Y A A Fn, n N0.

Therefore, ¤ r 1 s  r 1 s @ P E X8 A E Y A A Fn. nPN0 Hence, by a monotone class / Dynkin-type of argument it follows that £ ¤ r 1 s  r 1 s @ P  E X8 A E Y A A σ Fn F8, nPN0 i.e. X8  ErY |F8s. P 1  r | s p q Corollary 2.20. Let Y L and Xn : E Y Fn . Then Xn nPN0 is a uniformly integrable Ñ  r | s 1 martingale and Xn X8 : E Y F8 , where the convergence is a.s. and in L .

Corollary 2.21 (Kolmogorov's 0-1-law). Let $Y_0, Y_1, \dots$ be independent random variables. Consider the σ-fields $\mathcal{A}_n := \sigma(Y_{n+1}, Y_{n+2}, \dots)$ and $\mathcal{A} = \bigcap_{n \in \mathbb{N}} \mathcal{A}_n$. Then for every $A \in \mathcal{A}$ we have either $P[A] = 0$ or $P[A] = 1$.

Example 2.6 (Approximation of functions). Consider $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}(\Omega)$, $P$ the Lebesgue measure on $\Omega$. Consider the filtration

$\mathcal{F}_n := \sigma\Big(\Big[\frac{j}{2^n}, \frac{j+1}{2^n}\Big) \ \Big|\ j \le 2^n - 1\Big), \quad n \in \mathbb{N}_0.$

For a function $f \in L^1$, let $f_n := E[f \mid \mathcal{F}_n]$. Then $f_n \to f$ in $L^1$ and almost everywhere. To be able to apply the results, one needs to check that $f$ is $\mathcal{F}_{\infty}$-measurable, i.e. that $\mathcal{B}([0, 1]) = \mathcal{F}_{\infty}$. But this is true since for every $[a, b]$ with $a, b \in [0, 1]$ we can take dyadic approximations $a_n \uparrow a$ and $b_n \downarrow b$ such that $[a_n, b_n) \in \mathcal{F}_n$; then $[a, b] = \bigcap_n [a_n, b_n) \in \mathcal{F}_{\infty}$.
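Concretely, $f_n = E[f \mid \mathcal{F}_n]$ is the function that is constant on each dyadic interval $[j 2^{-n}, (j+1) 2^{-n})$, with value equal to the average of $f$ over that interval. The following sketch (illustrative; the particular $f$ and the use of a fine grid of $2^{16}$ points to approximate the interval averages are assumptions of the example) shows the $L^1$-distance $\|f - f_n\|_1$ shrinking as $n$ grows.

```python
import numpy as np

f = lambda x: np.sqrt(x) * np.sin(8.0 * x)      # illustrative f in L^1([0, 1])
K = 16                                          # resolution: 2^K grid points
grid = (np.arange(2**K) + 0.5) / 2**K
fx = f(grid)

def dyadic_conditional_expectation(n):
    """f_n = E[f | F_n]: the average of f over each dyadic interval [j/2^n, (j+1)/2^n)."""
    cell_means = fx.reshape(2**n, -1).mean(axis=1)
    return np.repeat(cell_means, 2**(K - n))    # piecewise-constant function on the grid

for n in (1, 3, 6, 10):
    fn = dyadic_conditional_expectation(n)
    print(f"n = {n:2d}:   ||f - f_n||_1  ~  {np.abs(fx - fn).mean():.5f}")
```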

2.4 Martingale inequalities p q p q Let Ω, F, P be a probability space and Fn nPN0 be a filtration.  p q ¥ Lemma 2.22. Let X Xn nPN0 be a supermartingale with X 0. Then for any N0– valued stopping time τ holds

r s ¥ r s ¥ r 1 s E X0 E Xτ E Xτ tτ 8u .

(in particular, Xτ is also defined on the set tτ  8u a.s.) Theorem 2.23 (Doob’s maximal inequality for probabilities).  p q ¥ ¡ (1) Let X Xn nPN0 be a supermartingale with X 0. Then for every c 0   r s ¥ ¤ E X0 P sup Xn c . nPN0 c

 p q ¥ ¡ P (2) Let X Xn nPN0 be a submartingale with X 0. Then for every c 0 and N N0   ¥ ¤ 1 r 1 s P max Xn c E XN tmaxn0,1,...,N Xn¥cu n0,1,...,N c 1 ¤ rX s. c E N

 p q Corollary 2.24. Let X Xn nPN0 be a martingale. Then   | | ¥ ¤ 1 r| |s P max Xn c E XN . n0,1,...,N c

Example 2.7 (Ruin probabilities in insurance mathematics). Consider the following in- surance balance process  ¡ P Xn Xn¡1 cn Yn n N,

21  P where X0 x0 R denotes the starting capital, cn is deterministic and denotes the premiums at time n, and Yn is stochastic and denotes the claims at time n. Assume that p q p q  p ¤ q  t P | ¤ u Xn is adapted to Fn (e.g. Fn σ Yk : l n ). Let τ : inf n N Xn 0 ; this is the time of technical ruin of the insurance company. A main question in Ruin theory is to estimate Prτ 8s (ruin probability). Note that t 8u € t ¤ u  t p¡ q ¥ u τ infn Xn 0 supn Xn 0 . The idea is to look for a decreasing Ñ p q function u : R R such that u Xn is a supermartingale. This way we can apply the Doob’s inequality (Theorem 2.23, (1)) to estimate p q p q r 8s ¤ r p p qq ¥ p qs ¤ u X0  u x0 P τ P sup u Xn u 0 . n up0q up0q r p¡ p ¡ qq| s ¤ P ¡ A specific example is when E exp λ cn Yn Fn¡1 1 for all n N, where λ 0 is fixed. In this case upXnq : expp¡λXnq is a supermartingale, and thus

¡λx Prτ 8s ¤ e 0 .

 r λY1 s ¤ λc For instance, if Yn are i.i.d. and cn c, then the moment estimate E e e for some λ ¡ 0 sufficies. Remark:

¡λx0 • exponential decay of ruin probability e in initial capital x0 ¡ 0;

• but assuming exponential moments for claim sizes Yn is restrictive. The next statement is a preparation for the Doob’s martingale inequalities for Lp- moments. ¥ ¤ r ¥ s ¤ r 1 s Lemma 2.25. Let U, V 0 be random variables with c P V c E U tV ¥³ cu for every ¡ r 8q Ñ r 8q p q  v p q c 0. Then for any measurable function ψ : 0, 0, with Ψ v : 0 ψ z dz we have  »  V 1 ErΨpV qs ¤ E U ψpcq dc . 0 c p p q Theorem 2.26 (Doob’s inequality for L -moments). Let Xn nPN0 be a non-negative submartingale (or a general martingale). Set X¦ : sup |X |. Then for every p ¡ 1 8 nPN0 n there exist constants cp,Cp ¡ 0, depending only on p but not on X, such that

p q p q a ¦ b cp ¤ sup }Xn}Lp ¤ }X8}Lp ¤ Cp ¤ sup }Xn}Lp . nPN0 nPN0 Remark 2.12. • For p ¡ 1 and a non-negative submartingale X we have: X is p ¦ p bounded in L ðñ X8 is in L . • As we saw in the proof of Theorem 2.26, the inequality (a) holds even for p  1, but in general (b) does not necessarily hold for p  1.

• Note that, by stopping, the inequality holds up to any (finite) time horizon.

22 2.5 The Radon-Nikodym theorem and differentiation of measures

Let P and Q be finite measures on pΩ, Fq.

Definition 2.27. (a) Q is absolutely continuous w.r.t P on F, denoted by Q ! P, if the following holds @A P F : PrAs  0 ñ QrAs  0.

(b) Q is equivalent to P on F, denoted by Q  Q, if Q ! P and P ! Q on F, i.e. the null sets of P and Q are the same.

(c) Q is singular to P on F, denoted by Q K P, if there exists A P F with PrAs  0 and QrAcs  0, i.e. Q is supported on a P-null set and vice versa.

Example 2.8. (1) Let P be a continuous distribution on R and Q be a discrete distri- bution on R. Then P K Q.

(2) Let P and Q be continuous distributions on Rd with density functions f and g (w.r.t Lebesgue measure λ). Then

Q ! P ðñ tg  0u tf  0u λ-a.e., Q  P ðñ tg  0u  tf  0u λ-a.e..

(3) Let X1,X2,..., be i.i.d Bernouli random variables, t0, 1u-valued, with success prob- ability p and q under P and šQ respectively. P and Q are measures on Ω  t0, 1uN,  p q   p | P q  Fn : σ X1,...,Xn , F : nP Fn σ Xn n N . Then P Q on Fn for ev- P P p q N P r s ¡ ery n N if p, q 0, 1 : for every non-empty A Fn we have that P A 0 and QrAs ¡ 0. However, for p  °q we have that Q K P on°F. Indeed, by the Law of 1 n Ñ 1 n Ñ large numbers we° have that n( k1 Xk p P-a.s., and n k1 Xk q Q-a.s., i.e. for  1 n  P r s  r s  A : limnÑ8 n k1 Xk q F we have that P A 0 and Q A 1. Lemma 2.28. Let P be a probability measure on pΩ, Fq, X ¥ 0 be a random variable in 1p q r s 8 L P , i.e. EP X . Then » r s  r 1 s  P Q A : EP X A X dP,A F, A defines a finite measure on pΩ, Fq with Q ! P. ¥ r s  r s Remark 2.13. • For Y 0, F-measurable, we have EQ Y EP XY (by standard approximation).

• The Radon-Nikodym theorem will show that Q ! P is sufficient for having a repre- sentation as in Lemma 2.28.

More generally, we will next study the Lebesgue decomposition.

23 Definition 2.29. Lebesgue decomposition of a probability measure Q with respect to a probability measure P on F is a decomposition of the form r s  r 1 s r X t  8us @ P Q A EP X A Q A X A F , (2.5.1) where X P L1pPq is F-measurable and non-negative. Such a random variable is called generalized Radon-Nikodym derivative of Q w.r.t P on F, denoted by § d § X  Q§ . dP F Remark 2.14. A Lebesgue decomposition of a measure Q w.r.t P is a decomposition of p q  Q into an absolutely continuous part and a singular part. More precisely, let Qa A : r 1 s r s  r X t  8us P  ! EP X A and Qs A : Q A X for A F. Then Q Qa Qs; also, Qa P on K r  8s  P 1p q F by Lemma 2.28 and Qs P because P X 0 since X L P .

Lemma 2.30. Let Q have a Lebesgue decomposition w.r.t P on F with a generalized Radon-Nikodym derivative X. Then

(1) X ¡ 0 Q-a.s.

(2) P has a Lebesgue decomposition w.r.t Q on F with a generalized Radon-Nikodym derivative given by § d § 1 P §  . dQ F X —  8 Example 2.9. Let F be generated by a countable partition of Ω: Ω i1 Bi for P  p P q (disjoint) Bi F and F σ Bi : i N . Set ¢ ¸8 r s Q Bi X  1 ¤ 1t r s u 8 ¤ 1t r s u Bi rB s P Bi 0 P Bi 0 "i1 P i r s{ r s P r s   Q Bi P Bi if ω Bi with P Bi 0 8 P r s  . if ω Bi with P Bi 0 ° Then rX  8s  0 and rXs  rBis (being 8 because is a probability P EP i:PrBis0 Q— Q P  „ measure). Each set A F is of the form A iPJ Bi with J N. Hence ¸ ¸ rB s ¸ rAs  rB s  Q i rB s rB s Q Q j rB s P i Q i jPJ jPJ P i jPJ r s r s P Bj 0 P Bj 0  r 1 s p X t  8uq EP X A Q A X , § i.e. X  dQ § . dP F Remark 2.15. The example showed that the Lebesgue decomposition for any two mea- sures Q and P on a σ-field generated by a countable partition always exists!

24 Let us fix the set-up for the statements that follow. Let P and Qšbe probability p q p q  measures on a measurable space Ω, F with a filtration Fn . Set F8 : n Fn. Assume that Lebesgue decompositions of Q with respect to P on any Fn exist, providing random ¥ P 1p q variables Xn 0, Xn L P, Fn such that r s  r 1 s r X t  8us @ P Q A EloooomoooonP Xn A loooooooooomoooooooooonQ A Xn A Fn

:QarAs :QsrAs ! r s   Remark 2.16. • If Q P on Fn, then Qs A 0 and Q Qa on Fn. K    • If Q P (on F8) with gen.RN-derivative X8, then Qs Q, Qa 0, and X8 0 P-a.s. Theorem 2.31. In the set-up defined above, the following holds true (1) pX q is a -supermartingale and p 1 q is a -supermartingale. n P Xn Q

(2) X8 : a.s- limnÑ8 Xn exists and " P r0, 8q P-a.s. X8  . P p0, 8s Q-a.s.

(3) The Lebesgue decomposition of Q w.r.t P on F8 is given by § dQ§ §  X8. dP F8

loc (4) Q is locally absolutely continuous w.r.t P (“Q ! P”), i.e. ! @ Q P on Fn n, p q r s  if and only if Xn is a P-martingale with EP Xn 1. ! loc! p q r  (5) Q P on F8 if and only if Q P and Xn is P-UI, or (equivalently) if Q X8 8s  0. Definition 2.32. A σ-field is called separable if it is generated by a countable family p q Ai iPN of sets. Example 2.10. (1) BpRdq is separable (generated by rectangles with rational vertices). (2) Borel σ-field of a Polish space is separable. Remark: P • The sets Ai, i N, do not need to be disjoint. But for any n one can find finitely 1 1 p q  p 1 1 q many disjoint sets A 1,...,Am such that σ A1,...,An σ A 1,...,Am . For in- c stance, consider the finite partition generated by all intersections of sets Ai or Ai , for i ¤ n. In this sense, each Fn  σpA1,...,Anq may be assumed to be generated by a finite number of disjoint sets, or by a finite partition of Ω.

25 š • Thus, every separable σ-field can be represented as F as in Thm.2.31 with nPN n each Fn being generated by a finite partition.

• Recall that for every σ-field Fn generated by a countable partition the Lebesgue decomposition of Q with respect to P exists for any probability measures P and Q (see Example ).

Theorem 2.33 (Radon-Nikodym). Let Q and P be finite measures on pΩ, Fq with Q ! P. Then there exists X P L1pP, Fq, X ¥ 0, such that » r s  r 1 s  @ P Q A EP X A X dP A F. A § We write X  dQ § and call it the Radon-Nikodym derivative (or density) of with dP F Q respect to P on F. The same statement holds more generally for P being only σ-finite. For both P, Q being only σ-finite, the statement holds too except that the density X ¥ 0 does not need to be in L1pPq. Note that the density is unique up to equality P -almost everywhere.. p q Example 2.11 (Likelihood-ratio test). Let Yi iPN be i.i.d. under P and Q real-valued random variables with Yi having distribution µ under P and ν under Q. p q Question: are successive observations of Yi being drawn from P or Q?  N p q   p ¤ q Set-up:š Ω R with Yi ω ωi (coordinate mappings), Fn σ Yk : k n , b b F  F8  F ,  µ N and  ν N. nPN n P Q Note that knowing the whole sequence, the decision between Q and P is easy: for every B P BpRq we have " ¸n µpBq P-a.s., lim 1BpYiq  nÑ8 νpBq -a.s. i1 Q

So take B P BpRq such that µpBq  νpBq and observe the limit, or rather the sequence, above. What to conclude based on only finitely many observations? ! dν  Suppose that ν µ with dµ h on R. For instance, µ and ν have densities f and g w.r.t the Lebesgue measure λ on R, in which case one needs tg  0u tf  0u λ a.s., and h  g{f: » » p q   g  r 1 s @ P p q ν B g dλ f dλ Eµ h B B B R . B B f

loc § ± ‘ dQ § n n ¡1 Then ! and  Xn : hpYiq : Indeed, let A  Y pBiq for Bi P Bp q, Q P dP Fn 1 1 i R

26 then ¹n r s  r P s p q Q A Q Yi Bi Yi are Q-i.i.d. 1 » » ¹n ¹n g  g dλ  f dλ f 1 Bi 1 Bi loomoon h ¹n  r p q1 p qs EP h Yi Bi Yi 1 £  ¹n ‘  hpY q 1 n pY ,...,Y q pY are -i.i.d.q EP i 1 Bi 1 n i P 1  r 1 s EP Xn A , § n dQ § and since such sets form X-stable generator of Bp q, we conclude that  Xn (using R dP Fn standard arguments/monotone class theorem). However, if µ  ν, then Q K P on F8 (take B such that µpBq  νpBq, then use the Law of large numbers to construct a suitable set). Also, by Theorem 2.31 and Remark    8 2.16 we have that limnÑ8 Xn X8 0 P-a.s. and limnÑ8 Xn Q-a.s. Decision between P and Q can be made by observing the sequence of the likelihood ratios Xn. More precisely, the law of large numbers gives " ¸n 1 1 rlog hpY qs -a.s., log X  log hpY q Ñ EQ 1 Q n i rlog hpY qs -a.s., n n 1 EP 1 P

p q P 1p q P 1p q assuming that log h Y1 L Q or L P respectively. Set » » dν rlog hpY qs  log h dν  log dν : Hpν|µq, EQ 1 dµ the relative entropy of ν with respect to µ, and we have respectively when also µ ! ν, i.e. µ  ν, » » dµ rlog hpY qs  log h dµ  ¡ log dµ  ¡Hpµ|νq. EP 1 dν Exercise: Hpν|µq ¥ 0 with equality iff µ  ν. If both entropies are finite, we have asymptotically " exppn ¤ Hpν|µqq Q-a.s., Xn  expp¡n ¤ Hpµ|νqq P-a.s.

The relative entropy intuitively gives the rate of convergence of pXnq towards 8 or 0.  p q  Example 2.12 (“Exponential tilting”). (a) Let Yi Exp 1 i.i.d. under P (with µ : Expp1q). Can we construct a change to an absolutely continuous probability measure p q  Q such that Y1,...,Yn are i.i.d. with distribution Exp λ : ν under Q for a given parameter λ ¡ 0?

27 p q  dν ¤ dx p q  λe¡λx ¥ r s  Ansatz: For h x dx dµ x e¡x , note that h 0 and Eµ h 1. So if we define p q  r1 s @ P p q ν B : Eµ Bh B B R , we have that ν  Exppλq and with

¹n bn  p q  dν Xn : h Yi bn 1 dµ  one can describe a probability distribution dQ XndP (on any Fn): Indeed, for all P p q Bi B R ¹n r P s  r p q1 p qs Q Y B EP h Yi B Y1,...,Yn 1 ¹n  r p q1 p qs EP h Yi Bi Yi (Yi are i.i.d.) i1 » » ¹n ¹n  hpxqe¡x dx  λ expp¡λxq dx. i1 Bi i1 Bi p q Thus, Y1,...,Yn are i.i.d. and Exp λ -distributed under Q.  bN P p q However, the measure Q ν , under which all Yi, i N, are i.i.d. Exp λ -  bN  p P q bn  bn distributed, is singular to P µ on F8 σ Yi : i N , although ν µ ,   p q i.e. Q P on Fn σ Y1,...,Yn . p q Note that Xn is a P-martingale but not P-UI.  p q  p q P p q (b) Let Yi N 0, 1 be i.i.d. under¨ P (µ ¡: N 0, 1© ). For m R, define ν on B R p q  ¡ 1 2  ¡ px¡mq2 { p¡ 2{ q through h x exp mx 2 m exp 2 exp x 2 by » p q  r 1 s  ν B Eµ h B h dµ » B ϕm,1  ϕ0,1 dx »B ϕ0,1

 ϕm,1 dx, B

where ϕa,b is the density of N pa, bq w.r.t dx. Thus, ν  N pm, 1q distribution. Now set bn ¹n dν  p q  bn h Yi : Xn. dµ 1 p q bN K bN Xn is a P-martingale but not P-UI because µ ν on the canonical probability p N p Nqq  bn  bn P ¡ space R , B R since µ ν although µ ν for every n N (because Xn 0).

28 2.6 A brief excursion on Markov chains

We do a brief excusion on Markov chains, postponing question about construction of the respective stochastic model to the next chapter. Indeed, the construction is a direct application (using Markov kernels) of the general contruction for stochastic processes in discrete time provided there. The idea of Markov transition kernels also provides an intuitive example that motivates the more general framework (more general kernels) in the next chapter.

: Markov process in discrete time with countable (discrete) state space.

• Setting: E is a countable state space, i P E are the states, N  t0, 1,...u is the time index set.

p q p q Definition 2.34 (Markov property). A process Xn nPN on some probability space Ω, F, P @ @ P has the Markov property (w.r.t. P) if n i0, i1, . . . , in 1 E  §   §   §    §  P Xn 1 in 1 Xr0,ns ir0,ns P Xn 1 in 1 Xn in  §   §   §   § (whereever defined), i.e. if P Xn 1 in 1 Xr0,ns P Xn 1 in 1 Xn a.e..

t¡s 1 • Notational abbreviation used: irs,ts  pis, . . . , itq P E , Xrs,ts  pXs,...,Xtq. • On E we consider the discrete topology, BpEq  2E.

• Stochastic kernel K : E ¢ BpEq Ñ r0, 1s from E to E is determined by a stochastic matrix P  ppijqi,jPE with pij  Kpi, tjuq. • A stochastic kernel K is also called Markov kernel, and P is called transition prob- ability matrix, as K (resp. P) describes the dynamics of transitions for a Markov chain.

p q Definition 2.35 (Markov chain). Xn nPN is called (time-homogeneous) Markov chain (under P) with initial (starting) distribution ν on E and stochastic kernel K represented @ P @ P by a stochastic matrix P, if n N i0, . . . , in 1 E  §   §    p t uq P Xn 1 in 1 Xr0,ns ir0,ns pinin 1 K in, in 1 (2.6.1) (whereever conditional probability is defined), i.e. if  §  §  §  §  p t uq Xn 1 in 1 Xr0,ns pi i K Xn, in 1 a.e.. P n n 1 inXn p q Notation: given a transition matrix/kernel, the distribution of the process Xn nPN is determined by ν and one often writes Pν for P to indicate the initial distribution, or Pi  for Pν when ν δi is the point mass at i. In the sequel we will work on the canonical probability space of paths for a Markov  N  p P q  p P q  p qN  chain: Ω E , F σ πn : n N σ Xn : n N B E , Xn πn for the canonical p q  P projections πn ω ωn. This permits to define a family Pi, i E, of probability measures simultaneously on the same space pΩ, Fq with the canonical mapping X : N ¢ Ω Ñ E,

29 p q   p q Xn ω ωn, such that X Xn nPN is a Markov chain with initial distribution δi (starting P p q at i E) under Pi for any i. By Th. 2.36(a) one can also define Pν on Ω, F for any initial probability distribution ν on E. Any question that depends only the distributional properties of the Markov process (e.g. about probabilities or expectations depending on the distribution of the process N bN pXnq) can be addressed on the canonical space, since the distribution on pE , BpEq q of a  ¥ ¡1 Markov chain X (defined on some probability space) with initial distribution ν : P X0 and given transition matrix is identical to the distribution of the canonical process π on the canonical space under Pν. Indeed part (a) of Theorem 2.36 shows that the distribution of Xr0,ns under P is P Y uniquely determined on Fn for any n N. As nFn is an intersection-stable generator of F, the distributions of the processes X and π on BpEqbN are the same (uniqueness) - if they exists. Existence we postpone to the next chapter - it will be obtained by Kolmogorov’s consistency theorem. p q Theorem 2.36. Let Xn nPN be E-valued stochastic process. p q (a) Xn nPN is a Markov chain (under P ) with transition kernel (matrix) K (P) and ¥ ¡1  @ P @ P initial distribution P X0 ν if and only if n N i0, . . . , in E     p q ¤ ¤ ¤ P Xr0,ns ir0,ns ν i0 pi0i1 pi1i2 pin¡1in .

p q @ P € n € m¡n (b) If Xn nPN is a Markov chain, then n m, in E, A E ,B E  §   §  § § Xr s P B Xr ¡ s P A, X  i  Xr s P B X  i P n 1,m 0,n 1 n n P  n 1,m n n  P Pin Xr1,m¡ns B (2.6.2) (whereever defined), i.e.  §   §  P §  P § P Xrn 1,ms B X0,...,Xn P Xrn 1,ms B Xn , meaning that future evolution of Markov chains only depends on the history up to today through the present state. Theorem 2.37 (Conditional independence of past and future given the present). Let p q @ P € n € m¡n Xn nPN be a Markov chain. Then n m, in E, A E ,B E we have  §   §   §  P P §   P §  ¤ P §  P Xr0,n¡1s A, Xrn 1,ms B Xn in P Xr0,n¡1s A Xn in P Xrn 1,ms B Xn in (whereever defined), i.e.  §   §   §  P P §  P § ¤ P § P Xr0,n¡1s A, Xrn 1,ms B Xn P Xr0,n¡1s A Xn P Xrn 1,ms B Xn a.e.

2.6.1 First-entry times of Markov chains

We consider the natural filtration Fn  σpXk : k ¤ nq generated by the process pXnq. With respect to pFnq, the first-entry times of a set A € E

TA : inftn ¥ 1 : Xn P Au,

HA : inftn ¥ 0 : Xn P Au,

30 8 r  s  P are stopping times, possibly taking value . Note that Pi HA 0 1 for i A and r  s  R p q  r 8s Pi HA TA 1 for i A. Define hA i : Pi HA as the probability of reaching p q  r s A eventually when starting at i, and define kA i : Ei HA as the expected time for reaching A from i (expectation is taken under Pi).

Theorem 2.38. Let $A\subset E$ be non-empty. Then $h_A:E\to[0,1]$ is the smallest function $E\to[0,\infty)$ solving
\[
h_A(i)=\sum_{j\in E}p_{ij}\,h_A(j)\qquad\forall i\in E\setminus A
\]
with boundary condition $h_A(i)=1$ for $i\in A$.

For the proof of the theorem we will need the following lemma.

Lemma 2.39. For every $i\in E\setminus A$, $n\in\mathbb{N}$, $A\neq\emptyset$ we have
\[
P_i[T_A\le n+1]=\sum_{j\in E}p_{ij}\,P_j[H_A\le n].
\]

Theorem 2.40. $k_A$ is the minimal function $E\to[0,\infty)$ solving
\[
k_A(i)=1+\sum_{j\notin A}p_{ij}\,k_A(j)\qquad\forall i\notin A
\]
with boundary condition $k_A(i)=0$ for $i\in A$.
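For a finite chain the characterizations in Theorems 2.38 and 2.40 become small linear systems. The following Python sketch solves them for a simple random walk with absorbing barriers on $\{0,\dots,4\}$; the chain, the up-probability $p=0.4$ and the target sets are illustrative choices, not prescribed by the lecture.

    import numpy as np

    # Simple random walk on E = {0,...,4} with absorbing barriers 0 and 4 and
    # up-probability p (illustrative parameters).
    p, N = 0.4, 4
    P = np.zeros((N + 1, N + 1))
    P[0, 0] = P[N, N] = 1.0
    for i in range(1, N):
        P[i, i + 1], P[i, i - 1] = p, 1 - p

    # Theorem 2.38: h_A is the *smallest* nonnegative solution of
    #   h(i) = sum_j p_ij h(j) on E\A,  h = 1 on A.
    # Monotone fixed-point iteration started at 0 converges to this minimal solution.
    A = {0}
    h = np.zeros(N + 1)
    for _ in range(10_000):
        h_new = P @ h
        h_new[list(A)] = 1.0
        h = h_new
    print("h_{0}   =", np.round(h, 4))   # ruin probabilities; h(4) = 0

    # Theorem 2.40: for A = {0, 4} the expected absorption time k_A solves
    #   k(i) = 1 + sum_{j not in A} p_ij k(j) on E\A,  k = 0 on A.
    A2 = [0, N]
    interior = [i for i in range(N + 1) if i not in A2]
    Q = P[np.ix_(interior, interior)]
    k = np.zeros(N + 1)
    k[interior] = np.linalg.solve(np.eye(len(interior)) - Q, np.ones(len(interior)))
    print("k_{0,4} =", np.round(k, 4))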

2.6.2 Further properties of Markov chains

For $n\in\mathbb{N}$ define the shift operator $\theta_n:\Omega\to\Omega$, $\omega=(\omega_j)_{j\in\mathbb{N}}\mapsto(\omega_{n+j})_{j\in\mathbb{N}}$.

Theorem 2.41. Let $g:(\Omega,\mathcal{F})\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ be measurable and nonnegative (or bounded). Then for any $n\ge 0$ we have
\[
E_\nu\big[g\circ\theta_n\,\big|\,\mathcal{F}_n\big]=E_i[g]\big|_{i=X_n}=:E_{X_n}[g].
\]
Instead of just 1-step transition probabilities, one can also consider multi-step transition probabilities
\[
p_{ij}^{(n)}:=P[X_{k+n}=j\mid X_k=i]=P_i[X_n=j],
\]
and these are related by

Theorem 2.42 (Chapman-Kolmogorov). For $m,n\in\mathbb{N}$, $i,j\in E$ we have
\[
P_i[X_{n+m}=j]=\sum_{y\in E}P_i[X_n=y]\,P_y[X_m=j].
\]
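In matrix notation the Chapman-Kolmogorov equations read $P^{(n+m)}=P^{(n)}P^{(m)}$, i.e. the $n$-step transition matrices are the matrix powers of $P$. A minimal numerical sketch (the $3$-state matrix is an illustrative choice):

    import numpy as np

    # An illustrative 3-state stochastic matrix (rows sum to 1).
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.4, 0.4, 0.2]])

    n, m = 2, 3
    # Chapman-Kolmogorov in matrix form: P^(n+m) = P^(n) P^(m).
    lhs = np.linalg.matrix_power(P, n + m)
    rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, m)
    assert np.allclose(lhs, rhs)
    print(np.round(lhs, 4))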

For Markov processes in general the analogue of the next property would be stronger than the Markov property itself, but for Markov chains both are equivalent:

Theorem 2.43. Let $(X_n)_{n\in\mathbb{N}}$ be a Markov chain and $T<\infty$ be a finite stopping time with respect to $(\mathcal{F}_n)_{n\in\mathbb{N}}$. Then
\[
P\big[X_{[T+1,T+k]}\in B\,\big|\,X_T=i,\ A\big]=P_i\big[X_{[1,k]}\in B\big] \tag{2.6.3}
\]
for $k\in\mathbb{N}$, $A\in\mathcal{F}_T$, $B\subset E^{k}$, $i\in E$ (wherever defined), such that
\[
P\big[X_{[T+1,T+k]}\in B\,\big|\,\mathcal{F}_T\big]=P_i\big[X_{[1,k]}\in B\big]\big|_{i=X_T}=:P_{X_T}\big[X_{[1,k]}\in B\big]. \tag{2.6.4}
\]

Remark 2.17. For $T=n$, $m=n+k$, and $A=\{X_{[0,n]}\in\tilde{A}\}$ for some $\tilde{A}\subset E^{n+1}$, (2.6.3) yields
\[
P\big[X_{[n+1,m]}\in B\,\big|\,X_n=i,\ X_{[0,n]}\in\tilde{A}\big]=P_i\big[X_{[1,m-n]}\in B\big],
\]
i.e. the ordinary Markov property.

2.6.3 Branching Process

We consider the branching process (Galton-Watson process) as a simple Markovian model for a growing population.

Let $(\xi_i^n)_{n,i\ge 1}$ be an array of i.i.d., $\mathbb{N}=\mathbb{N}_0$-valued random variables. Define the process $(Z_n)_{n\in\mathbb{N}}$ by $Z_0:=1$ and
\[
Z_{n+1}:=\sum_{i=1}^{Z_n}\xi_i^{n+1}.
\]
Then $(Z_n)$ is a Markov chain, taking values in $E=\mathbb{N}$, with respect to the filtration $\mathcal{F}_n:=\sigma(\xi_i^k:k\le n,\ i\in\mathbb{N})$, with transition matrix $(p_{ij})$ and kernel $K$ given by
\[
P[Z_{n+1}=j\mid Z_n=i]=P\big[\xi_1^{n+1}+\dots+\xi_i^{n+1}=j\big]=:p_{ij}=:K(i,\{j\});
\]
indeed we have
\[
P[Z_{n+1}=j\mid\mathcal{F}_n]=P\Big[\sum_{i=1}^{Z_n}\xi_i^{n+1}=j\,\Big|\,\mathcal{F}_n\Big]=P[Z_{n+1}=j\mid Z_n],\qquad j\in E,
\]
implying (by monotone convergence)
\[
P[Z_{n+1}\in B\mid\mathcal{F}_n]=P[Z_{n+1}\in B\mid Z_n],\qquad B\subset E.
\]
Let $m:=E[\xi_i^n]$ and $\sigma^2:=\operatorname{Var}(\xi_i^n)$. Assume $m\in(0,\infty)$. (The case $m=0$ is trivial.)

Theorem 2.44. Assume $\sigma^2\in(0,\infty)$ and let $W_n:=Z_n/m^n$. Then $(W_n)$ is a martingale, the almost-sure limit $\lim_n W_n=:W_\infty$ exists, and
\[
m>1\iff E[W_\infty]=1\iff E[W_\infty]>0.
\]
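A hedged simulation sketch of the Galton-Watson chain and of the martingale $W_n=Z_n/m^n$ from Theorem 2.44; the Poisson offspring law with mean $m=1.3$ and the horizon are illustrative choices, not part of the lecture. For $m>1$ the sample average of $W_n$ should stay close to $1=E[W_n]$, while on the extinction event $W_\infty=0$.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n_gen, n_paths = 1.3, 25, 10_000   # illustrative choices

    # Z_0 = 1, Z_{n+1} = sum_{i=1}^{Z_n} xi_i^{n+1} with Poisson(m) offspring
    # (so E[xi] = m, Var(xi) = m < infinity). A sum of Z i.i.d. Poisson(m)
    # variables is Poisson(m * Z), which we use to vectorize over paths.
    Z = np.ones(n_paths, dtype=np.int64)
    for _ in range(n_gen):
        Z = rng.poisson(m * Z)

    W = Z / m**n_gen                       # W_n = Z_n / m^n
    print("mean of W_n      ~", W.mean())  # close to 1 (martingale property)
    print("P(extinct by n)  ~", (Z == 0).mean())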

Chapter 3

Construction of stochastic processes

3.1 General construction of stochastic processes

The aim of this section is to explain general concepts for the construction of a mathematical model (a probability space) carrying general stochastic processes. Let $(S,\mathcal{S})$ be a measurable space; $S$ is the space of observations, and $(X_i)_{i\in I}$ is some family of random variables defined on a probability space $(\Omega,\mathcal{F},P)$ and taking values in $(S,\mathcal{S})$, with some general index set $I$.

Definition 3.1 (and Lemma). Let the index set $I$ be non-empty, $(S,\mathcal{S})$ be a measurable space and $(X_i)_{i\in I}$ be a stochastic process (a family of random variables valued in $(S,\mathcal{S})$). For $J\subset I$, $J$ finite, the mapping $(X_i)_{i\in J}:\Omega\to S^J$ induces a measure $Q^{(J)}$ on $\mathcal{S}^J:=\bigotimes_{i\in J}\mathcal{S}$ by
\[
Q^{(J)}=P\circ\big((X_i)_{i\in J}\big)^{-1},
\]
the "image measure" (distribution) of $(X_i)_{i\in J}$. The family $\{Q^{(J)}\mid J\subset I,\ J\text{ finite}\}$ is called the family of finite-dimensional marginal distributions of $(X_i)_{i\in I}$ under $P$. This family is consistent in the sense that for any finite $J'\subset J$ we have
\[
Q^{(J')}=Q^{(J)}\circ\big(\pi^J_{J'}\big)^{-1},
\]
where $\pi^J_{J'}:S^J\to S^{J'}$ are the canonical projections, $\pi^J_{J'}\big((s_i)_{i\in J}\big)=(s_i)_{i\in J'}$.

On $S^I=\prod_{i\in I}S$ we define the coordinate mappings $\pi_j=\pi_{\{j\}}:\omega=(s_i)_{i\in I}\mapsto s_j$. $\mathcal{S}^I=\bigotimes_{i\in I}\mathcal{S}$ is the product $\sigma$-field given by $\sigma(\pi_i:i\in I)$, i.e. the smallest $\sigma$-field on $S^I$ such that all $\pi_i$ are measurable. An intersection-stable generator of $\mathcal{S}^I$ is given by the system $\mathcal{Z}$ of all cylinder sets of the form $Z=\{\omega\in S^I\mid\omega_i\in A_i\text{ for }i\in J\}$ with $A_i\in\mathcal{S}$ and $J\subset I$, $J$ finite. A process $(X_i)_{i\in I}$ also induces a distribution on $S^I$ as the image measure of $P$ under the mapping $(X_i)_{i\in I}:\Omega\to S^I$. Studying the probabilistic properties of $(X_i)_{i\in I}$ requires knowing this distribution $Q=P\circ((X_i)_{i\in I})^{-1}$ of the process $X$.

Remark 3.1.
• More generally, one can consider a family of different measurable spaces $(S_i,\mathcal{S}_i)$, $i\in I$; in this case $(S^I,\mathcal{S}^I):=\big(\prod_{i\in I}S_i,\bigotimes_{i\in I}\mathcal{S}_i\big)$, where $\bigotimes_{i\in I}\mathcal{S}_i=\sigma(\pi_i:i\in I)$.
• $\mathcal{S}^I=\bigotimes_{i\in I}\mathcal{S}_i$ in general is not the same as $\sigma\big(\{\prod_{i\in I}A_i\mid A_i\in\mathcal{S}_i\}\big)$ if $I$ is uncountable. If $I$ is countable, then the two $\sigma$-fields coincide (exercise).

Question: how to construct a model $(\Omega,\mathcal{F},P)$ and $(X_i)_{i\in I}$ such that $X$ has a given "stochastic structure"?

Canonical model: choose $\Omega=S^I$, $\mathcal{F}=\mathcal{S}^I$, and $X_i=\pi_i$ as the canonical coordinate projections. Hence the question reduces to the construction of a measure $P$ on $(S^I,\mathcal{S}^I)$ with some "structure" (on the canonical probability space, $Q=P$ in the above notation). There are alternative conceptual ways to impose "stochastic structure" in general: (1) specify a consistent family of finite-dimensional distributions; (2) specify some initial distribution for one (first) coordinate and a "transition rule" on how to sample further coordinates from given ones.

Lemma 3.2. A probability measure $P$ on $(\Omega,\mathcal{F})=(S^I,\mathcal{S}^I)$ is uniquely determined by its finite-dimensional marginal distributions, i.e. for any consistent family of finite-dimensional distributions $\{Q^{(J)}\mid J\subset I,\ J\text{ finite}\}$ there exists at most one $P$ on $(\Omega,\mathcal{F})$ such that $Q^{(J)}=P\circ(\pi^I_J)^{-1}$ for every finite $J\subset I$.

Let $(S,\mathcal{S})$ be a measurable space, $P_0$ be a probability measure on $(S,\mathcal{S})$, and $I=\mathbb{N}:=\{0,1,\dots\}$. For any $n\ge 1$, let $K_n$ be a stochastic kernel from $(S^n,\mathcal{S}^n)$ to $(S,\mathcal{S})$. Consider the following iterative construction of measures (as semidirect products of measures and kernels):
\[
P^{(0)}=P_0,\quad P^{(1)}=P_0\otimes K_1,\ \dots,\ P^{(n)}=P^{(n-1)}\otimes K_n,\ \dots
\]
For every $\mathcal{S}^{n+1}$-measurable function $f\ge 0$ we have
\[
\int_{S^{n+1}}f\,dP^{(n)}=\int_S P_0(dx_0)\int_S K_1(x_0,dx_1)\cdots\int_S K_n\big((x_0,\dots,x_{n-1}),dx_n\big)\,f(x_0,\dots,x_n).
\]

Let $\Omega=S^{\mathbb{N}}=\{\omega=(x_0,x_1,\dots)\mid x_i\in S,\ i\in\mathbb{N}\}$; this is the space of all trajectories. Let $X_n(\omega)=\pi_n(\omega)=x_n$ for $n\in\mathbb{N}$; this process describes the state at time $n$. Set $\mathcal{F}_n:=\sigma(X_0,\dots,X_n)$ for all $n\in\mathbb{N}$, describing the information up to time $n$, and let $\mathcal{F}:=\sigma(X_0,X_1,\dots)=\sigma\big(\bigcup_{n\in\mathbb{N}}\mathcal{F}_n\big)=\mathcal{S}^{\mathbb{N}}$. Note that $\mathcal{F}$ is a $\sigma$-field on $S^{\mathbb{N}}$ and each $A\in\mathcal{F}_n$ has the form $A_n\times S\times S\times\cdots$ for some $A_n\in\mathcal{S}^{n+1}$.

Remark 3.2 (Consistency of $P^{(0)},P^{(1)},\dots$). For $A=A_n\times S\times S\times\cdots\in\mathcal{F}_n$ with $A_n\in\mathcal{S}^{n+1}$ we have $A=\underbrace{(A_n\times S)}_{=:A_{n+1}\in\mathcal{S}^{n+2}}\times S\times\cdots=A_{n+1}\times S\times\cdots$, and
\[
P^{(n+1)}[A_n\times S]=\big(P^{(n)}\otimes K_{n+1}\big)[A_n\times S]=P^{(n)}[A_n].
\]
Similarly, $P^{(m)}\big[A_n\times\underbrace{S\times\cdots\times S}_{m-n}\big]=P^{(n)}[A_n]$ for every $m>n$ and $A_n\in\mathcal{S}^{n+1}$.

Theorem 3.3 (Ionescu Tulcea). In the discrete-time situation $I=\mathbb{N}$ there exists a unique probability measure $P$ on $S^{\mathbb{N}}$ satisfying
\[
P[A_n\times S\times S\times\cdots]=P^{(n)}[A_n]\qquad\forall A_n\in\mathcal{S}^{n+1}, \tag{3.1.1}
\]
or, equivalently, for all $n$ and all $\mathcal{S}^{n+1}$-measurable $f\ge 0$,
\[
\int_\Omega f(X_0,\dots,X_n)\,dP=\int_S P_0(dx_0)\int_S K_1(x_0,dx_1)\cdots\int_S K_n\big((x_0,\dots,x_{n-1}),dx_n\big)\,f(x_0,\dots,x_n). \tag{3.1.2}
\]
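The iterative construction behind Theorem 3.3 has a direct algorithmic counterpart: sample $x_0\sim P_0$, then $x_n\sim K_n((x_0,\dots,x_{n-1}),\cdot)$ for $n=1,2,\dots$. A minimal Python sketch with illustrative choices (a Gaussian $P_0$ and a kernel depending on the running average of the past; none of this is prescribed by the lecture):

    import numpy as np

    rng = np.random.default_rng(1)

    def sample_P0():
        # illustrative initial law P_0 = N(0, 1)
        return rng.normal(0.0, 1.0)

    def sample_Kn(past):
        # illustrative kernel K_n((x_0,...,x_{n-1}), .) = N(mean of the past, 1);
        # any measurable dependence on the whole past is allowed.
        return rng.normal(np.mean(past), 1.0)

    def sample_path(n):
        """Draw (X_0, ..., X_n); its law is the marginal P^(n) of Theorem 3.3."""
        path = [sample_P0()]
        for _ in range(n):
            path.append(sample_Kn(path))
        return np.array(path)

    print(sample_path(10))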

Notation and example: if in Theorem 3.3 we have $K_n((x_0,\dots,x_{n-1}),dx_n)=P_n(dx_n)$ for some probability measure $P_n$ on $S$ for all $n$, i.e. the kernel does not depend on $x_0,\dots,x_{n-1}$, then the unique measure $P$ (from Theorem 3.3) is called the product measure of the $P_n$'s, and we write
\[
P=\bigotimes_{n=0}^{\infty}P_n.
\]
In the general case we write
\[
P=P_0\otimes\Big(\bigotimes_{n=1}^{\infty}K_n\Big),
\]
using the notation of semidirect products. For $P=\bigotimes_{n=0}^{\infty}P_n$ we have that $P_n$ is the marginal distribution of the $n$-th coordinate and $P^{(n)}=\bigotimes_{k=0}^{n}P_k$ for all $n\ge 0$; in particular, the coordinate mappings are independent.

Corollary 3.4. The probability measure $P$ from Theorem 3.3 is the product of its one-dimensional marginal distributions $P_n:=P\circ\pi_n^{-1}=P\circ X_n^{-1}$ if and only if the random variables $(X_n)_{n\ge 0}$ are independent under $P$.

Example 3.1. If $K_n((x_0,\dots,x_{n-1}),\cdot)=K_n(x_{n-1},\cdot)$ does not depend on $x_0,\dots,x_{n-2}$ but only on $x_{n-1}$ for all $n$, then the coordinate process $(X_n)_{n\ge 0}$ is a (time-inhomogeneous) discrete-time Markov process. If $K_n=K$ (does not depend on $n$), then $(X_n)_{n\ge 0}$ is a time-homogeneous Markov process.

The next goal is to construct stochastic processes on an arbitrary (possibly uncountable) index set $I$ with state space $(S,\mathcal{S})$. Let $\Omega=S^I$, $\mathcal{F}=\mathcal{S}^I$, and $X_i:\Omega\to S$ for $i\in I$ be the coordinate mappings.

Theorem 3.5 (Kolmogorov's consistency theorem). Let $\{Q^{(J)}\mid J\subset I,\ J\text{ finite}\}$ be a consistent family of finite-dimensional distributions $Q^{(J)}$ on $(S^J,\mathcal{S}^J)$ and let $S$ be a Polish space with Borel $\sigma$-field $\mathcal{S}=\mathcal{B}(S)$. Then there exists a unique probability measure $P$ on $(\Omega,\mathcal{F})=(S^I,\mathcal{S}^I)$ which has the $Q^{(J)}$'s as its finite-dimensional marginal distributions.

Lemma 3.6. Let $I$ be an arbitrary index set and $\mathcal{S}^I=\bigotimes_{i\in I}\mathcal{S}=\sigma(X_i:i\in I)$ be the product $\sigma$-field. Then we have

(a)
\[
\mathcal{S}^I=\bigcup_{\substack{J\subset I\\ J\text{ countable}}}\sigma(X_j:j\in J).
\]

(b)
\[
\mathcal{S}^I=\sigma\Big(\Big\{\prod_{j\in J}A_j\times\prod_{i\in I\setminus J}S\ \Big|\ A_j\in\mathcal{S}\ \forall j\in J,\ J\subset I,\ J\text{ countable}\Big\}\Big).
\]

3.2 Gaussian Processes

Let $I$ be an arbitrary index set and $(X_i)_{i\in I}$ be a stochastic process defined on the probability space $(\Omega,\mathcal{F},P)$ and taking values in $(S,\mathcal{S})=(\mathbb{R},\mathcal{B}(\mathbb{R}))$.

Definition 3.7. (a) $X=(X_i)_{i\in I}$ is called a Gaussian process if all finite-dimensional marginal distributions are multivariate normal ("Gaussian"), i.e. for all $n\in\mathbb{N}$, $i_1,\dots,i_n\in I$ and $a\in\mathbb{R}^n$ the sum $\sum_{k=1}^{n}a_kX_{i_k}$ is a univariate normal random variable (possibly degenerate, i.e. with zero variance).

(b) $X$ is called a centered Gaussian process if it is a Gaussian process with $E[X_i]=0$ for every $i\in I$.

(c) A family $(X^k_i)_{i\in I_k}$, $k\in K$, of stochastic processes on $(\Omega,\mathcal{F},P)$ is called a Gaussian process if $\{X^k_i\mid k\in K,\ i\in I_k\}=(X^k_i)_{(k,i)\in\tilde{I}}$ with $\tilde{I}:=\{(k,i)\mid k\in K,\ i\in I_k\}$ is a Gaussian process.

Theorem 3.8. (a) Let $X=(X_i)_{i\in I}$ be a Gaussian process. Then its distribution is uniquely determined by $E[X_i]$, $i\in I$, and $\operatorname{Cov}(X_i,X_j)$, $i,j\in I$.

(b) The process $X=(X_i)_{i\in I}$ is a Gaussian process with mean values $E[X_i]$, $i\in I$, if and only if the process $\tilde{X}=(X_i-E[X_i])_{i\in I}$ is a centered Gaussian process (with the covariance function of both processes being the same).

(c) Existence: let $\Gamma:I\times I\to\mathbb{R}$ be such that for every finite $J\subset I$ the matrix $(\Gamma(i,j))_{i,j\in J}$ is symmetric and positive semi-definite. Then there exists a centered Gaussian process $(X_i)_{i\in I}$ on some probability space with $\operatorname{Cov}(X_i,X_j)=\Gamma(i,j)$ for all $i,j\in I$.

Example 3.2. Consider the space $L^2(\mathbb{R}_+,dx)$ with the scalar product $\langle f,g\rangle=\int_{\mathbb{R}_+}fg\,dx$. This space is a separable Hilbert space. Let $(\Omega,\mathcal{F},P)$ be a probability space carrying a sequence $(\varepsilon_i)_{i\in\mathbb{N}}$ of i.i.d. $\mathcal{N}(0,1)$ random variables. The subspace of $L^2(\Omega,\mathcal{F},P)$ spanned by $(\varepsilon_i)_{i\in\mathbb{N}}$ is also a separable Hilbert space with the scalar product $\langle X,Y\rangle=E[XY]=\int_\Omega XY\,dP$. Now an isometry between these two Hilbert spaces is given, after choosing an ONB $(b_i)_{i\in\mathbb{N}}$ of $L^2(\mathbb{R}_+,dx)$, by $b_i\leftrightarrow\varepsilon_i$, $i\in\mathbb{N}$, inducing
\[
L^2(\mathbb{R}_+,dx)\ni h=\sum_i\alpha_ib_i\ \longleftrightarrow\ \sum_i\alpha_i\varepsilon_i=:X_h\in L^2(\Omega,\mathcal{F},P),
\]
where $\alpha_i=\langle h,b_i\rangle$ for $i\in\mathbb{N}$. Since the $\varepsilon_i$ are Gaussian, one can prove that $(X_h)_{h\in L^2(\mathbb{R}_+,dx)}$ is a centered Gaussian process with covariance
\[
\operatorname{Cov}(X_h,X_g)=\langle h,g\rangle.
\]
Take $h=\mathbf{1}_{[0,t]}$ and set $W_t:=X_{\mathbf{1}_{[0,t]}}$ for every $t\in\mathbb{R}_+$. The process $(W_t)_{t\in\mathbb{R}_+}$ is a centered Gaussian process, since $\{\mathbf{1}_{[0,t]}\mid t\in\mathbb{R}_+\}\subset L^2(\mathbb{R}_+,dx)$, and its covariance function is given by
\[
\operatorname{Cov}(W_s,W_t)=\operatorname{Cov}\big(X_{\mathbf{1}_{[0,s]}},X_{\mathbf{1}_{[0,t]}}\big)=\langle\mathbf{1}_{[0,s]},\mathbf{1}_{[0,t]}\rangle\ \text{(isometry)}\ =\int_0^\infty\mathbf{1}_{[0,s]}\cdot\mathbf{1}_{[0,t]}\,dx=s\wedge t.
\]
This will be the covariance structure of Brownian motion, also known as the Wiener process (after Norbert Wiener, 1894-1964).
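By Theorem 3.8(c), the finite-dimensional distributions with covariance $\Gamma(s,t)=s\wedge t$ can be realized directly as multivariate normal vectors. A minimal numerical sketch (the time grid and the sample size are illustrative choices):

    import numpy as np

    rng = np.random.default_rng(2)
    t = np.linspace(0.1, 1.0, 10)            # illustrative finite grid J
    Gamma = np.minimum.outer(t, t)           # Gamma(s, t) = min(s, t)

    # Sample (W_{t_1}, ..., W_{t_n}) ~ N(0, Gamma); Gamma is symmetric positive
    # semi-definite, so this multivariate normal law exists (Theorem 3.8(c)).
    W = rng.multivariate_normal(np.zeros(len(t)), Gamma, size=50_000)

    emp_cov = np.cov(W, rowvar=False)        # empirical covariance matrix
    print(np.max(np.abs(emp_cov - Gamma)))   # small sampling error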

Example 3.3. In this example we construct a Gaussian process with the same covariance structure as in the previous example, using a different approach. Let $P_0=\delta_0$ and
\[
K_{s,t}(x,dy)=\frac{1}{\sqrt{2\pi(t-s)}}\exp\Big(-\frac{|y-x|^2}{2|t-s|}\Big)\,dy,\qquad t>s\ge 0.
\]
For every finite index set $J=\{t_1,\dots,t_n\}$, $n\in\mathbb{N}$, define the finite-dimensional distributions
\[
Q^{(J)}\Big(\prod_{i=1}^{n}A_{t_i}\Big)=\int_{\mathbb{R}}P_0(dx_0)\int_{A_{t_1}}K_{0,t_1}(x_0,dx_{t_1})\cdots\int_{A_{t_n}}K_{t_{n-1},t_n}(x_{t_{n-1}},dx_{t_n})\qquad\forall A_{t_i}\in\mathcal{B}(\mathbb{R}).
\]

This family of measures is consistent. Indeed, $\delta_x\otimes K_{s,t}(x,dy)=x+\mathcal{N}(0,t-s)$ and $\mathcal{N}(0,t-s)\ast\mathcal{N}(0,u-t)=\mathcal{N}(0,u-s)$ for all $u>t>s\ge 0$. Hence, by Kolmogorov's consistency theorem (Theorem 3.5), there exists a measure $P$ on $(\mathbb{R}^{[0,\infty)},\mathcal{B}(\mathbb{R})^{[0,\infty)})$ such that the canonical process $X=(X_t)_{t\ge 0}$ has $\{Q^{(J)}\mid J\text{ finite}\}$ as finite-dimensional marginal distributions, which are multivariate normal. Therefore $X$ is a Gaussian process under $P$, centered, and with covariances: for $s<t$,
\[
\operatorname{Cov}(X_s,X_t)=E[X_sX_t]=E\big[X_s\,(X_s+\underbrace{X_t-X_s}_{\perp X_s})\big]=E[X_s^2]=s=s\wedge t.
\]

Remark 3.3. Note that $(X_t)$ and $(W_t)$ (from the previous example) have the same distribution on $(\mathbb{R}^{[0,\infty)},\mathcal{B}(\mathbb{R})^{[0,\infty)})$. The process $X$ starts at $0$, $X_t-X_s$ is $\mathcal{N}(0,t-s)$-distributed, and $X_t-X_s$ is independent of the past of the process up to time $s$ for $t>s\ge 0$. This Gaussian process has all the properties defining Brownian motion except the continuity of paths; it is sometimes called pre-Brownian motion.

3.3 Brownian motion

Definition 3.9. Processes $(X_t)$, $(Y_t)$, $t\in I$, defined on the probability space $(\Omega,\mathcal{F},P)$, are called

(a) versions or modifications of each other if
\[
\forall t\in I\ \exists N_t\in\mathcal{F}:\ P[N_t]=0\ \text{ and }\ X_t(\omega)=Y_t(\omega)\ \forall\omega\notin N_t,
\]
i.e. $P[X_t=Y_t]=1$ for all $t\in I$;

(b) indistinguishable if there exists a null set $N$ such that
\[
X_t(\omega)=Y_t(\omega)\qquad\forall t\in I,\ \forall\omega\notin N,
\]
i.e. $P[X_t=Y_t\ \text{for all }t\in I]=1$.

Theorem 3.10 (Continuity criterion of Kolmogorov and Chentsov). Let $(X_t)_{t\in[0,1]}$ be an $\mathbb{R}^d$-valued process with constants $\alpha,\beta,C>0$ such that
\[
E\big[|X_t-X_s|^\alpha\big]\le C\,|t-s|^{1+\beta}\qquad\forall s,t\in[0,1].
\]
Then there exists a version $(Y_t)_{t\in[0,1]}$ of $(X_t)_{t\in[0,1]}$ with (globally) Hölder-continuous paths of order $\lambda\in\big(0,\tfrac{\beta}{\alpha}\big)$.

Remark 3.4.
• A function $g:[0,1]\to\mathbb{R}^d$ is (globally) Hölder-continuous of order $\lambda$ if
\[
\sup_{\substack{s,t\in[0,1]\\ s\neq t}}\frac{\|g(s)-g(t)\|}{|s-t|^\lambda}<\infty.
\]
• If $X=(X_t)_{t\in[0,1]}$ is a process such that $X_t-X_s\sim\mathcal{N}(0,t-s)$ for every $s,t\in[0,1]$ with $t>s$, then $E[|X_t-X_s|^4]=3|t-s|^2$ (since $E[Z^4]=3\sigma^4$ for $Z\sim\mathcal{N}(0,\sigma^2)$). Hence $(X_t)$ has a Hölder-continuous version, with $\alpha=4$, $\beta=1$ and $C=3$ in the previous theorem.
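A quick Monte Carlo sanity check of the moment identity $E|X_t-X_s|^4=3|t-s|^2$ used above; this is only a sketch, and the time points and sample size are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(3)
    s, t, n = 0.2, 0.7, 1_000_000            # illustrative values

    # For a (pre-)Brownian motion, X_t - X_s ~ N(0, t - s).
    incr = rng.normal(0.0, np.sqrt(t - s), size=n)
    print(np.mean(incr**4))                  # Monte Carlo estimate of E|X_t - X_s|^4
    print(3 * (t - s)**2)                    # theoretical value 3 |t - s|^2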

Definition 3.11 (Brownian motion). An $\mathbb{R}^d$-valued stochastic process $(B_t)_{t\in\mathbb{R}_+}$ defined on some probability space $(\Omega,\mathcal{F},P)$ is called

(1) [$d=1$] a (1-dimensional) Brownian motion if

(a) $B_0=0$,
(b) for all $n\in\mathbb{N}$ and $0\le t_0\le t_1\le\dots\le t_n$ the increments $\Delta B_{t_k}:=B_{t_k}-B_{t_{k-1}}$, $k=1,\dots,n$, are $\mathcal{N}(0,t_k-t_{k-1})$-distributed and independent (increments over disjoint intervals are independent),
(c) $P$-a.a. paths are continuous;

(2) [$d\ge 1$] a ($d$-dimensional) Brownian motion if all coordinate processes $B^j$, $j=1,\dots,d$, of $B=(B^j)_{j=1,\dots,d}$ are independent 1-dimensional Brownian motions;

(3) a Brownian motion with respect to a given filtration $(\mathcal{F}_t)_{t\in\mathbb{R}_+}$ on $\mathcal{F}$ if it is a $d$-dimensional Brownian motion with increments $B_t-B_s$ independent of $\mathcal{F}_s$ for all $t\ge s\ge 0$.

Theorem 3.12 (Existence and Hölder continuity of Brownian motion). Let $X=(X_t)_{t\in\mathbb{R}_+}$ be a centered real-valued Gaussian process with covariance function $\operatorname{Cov}(X_s,X_t)=s\wedge t$ (which exists by the previous results). Then there exists a version $B$ of $X$ such that $B$ has continuous paths that are Hölder-continuous of any order $\lambda<1/2$ on any compact interval $[0,n]\subset\mathbb{R}_+$. In particular, a ($d$-dimensional) Brownian motion exists.

Remark 3.5. Let $(X_t)_{t\in\mathbb{R}_+}$ and $(Y_t)_{t\in\mathbb{R}_+}$ be right-continuous processes. If $X$ is a version of $Y$, then $X$ and $Y$ are indistinguishable. The reason is that a right-continuous function $\mathbb{R}_+\to\mathbb{R}^d$ is determined by its values on a countable dense subset of $\mathbb{R}_+$.

Definition 3.13. Let $X=(X_t)_{t\in\mathbb{R}_+}$ be a stochastic process on some probability space $(\Omega,\mathcal{F},P)$.

(a) The natural filtration $(\mathcal{F}_t)_{t\ge 0}$ generated by $X$ is the filtration $\mathcal{F}_t=\sigma(X_s:s\le t)$, i.e. the smallest filtration with respect to which $X$ is adapted.

(b)
• A filtration $(\mathcal{F}_t)_{t\ge 0}$ is complete (w.r.t. the measure $P$) if each $\mathcal{F}_t$ is complete w.r.t. $P$ (i.e. contains all subsets of $P$-null sets).
• A filtration $(\mathcal{F}_t)_{t\ge 0}$ is called right-continuous if
\[
\mathcal{F}_t=\mathcal{F}_{t+}:=\bigcap_{\varepsilon>0}\mathcal{F}_{t+\varepsilon}\qquad\forall t\in\mathbb{R}_+.
\]
• A filtration $(\mathcal{F}_t)_{t\ge 0}$ satisfies the usual conditions if it is complete (w.r.t. $P$) and right-continuous.

(c) The usual filtration $(\mathcal{F}^X_t)$ generated by a process $X$ is
\[
\mathcal{F}^X_t:=\bigcap_{\varepsilon>0}\big(\mathcal{F}_{t+\varepsilon}\big)^P\qquad\forall t\in\mathbb{R}_+,
\]
where $(\cdot)^P$ denotes the completion with respect to $P$.

Lemma 3.14. If a filtration $(\mathcal{F}_t)$ is right-continuous, then its completion $(\mathcal{F}^P_t)$ is also right-continuous, hence it satisfies the usual conditions.

Theorem 3.15. Let $(\mathcal{F}_t)_{t\ge 0}$ denote the completion of the natural filtration of the Brownian motion $(B_t)_{t\ge 0}$. Then $(\mathcal{F}_t)_{t\ge 0}$ is right-continuous, i.e. it satisfies the usual conditions.

Remark 3.6. A Brownian motion is an $(\mathcal{F}_t)$-Brownian motion with respect to its usual filtration $(\mathcal{F}_t)$.

Corollary 3.16. For $(\mathcal{F}_t)_{t\ge 0}$ from Theorem 3.15, $\mathcal{F}_0=(\mathcal{F}_{0+})^P$ is trivial, i.e. for every $A\in\mathcal{F}_0$ we have $P[A]\in\{0,1\}$.

Example 3.4. If $B$ is a Brownian motion, then
\[
A=\big\{\exists\,\varepsilon>0:\ B_t=0\ \text{on }[0,\varepsilon]\big\}\in\mathcal{F}_0
\]
and hence $P[A]\in\{0,1\}$. However, $P[A]\neq 1$ (e.g. $P[B_{\varepsilon/3}>0,\ B_{2\varepsilon/3}>0]>0$ for each $\varepsilon>0$), so $P[A]=0$. Similarly, $P[C]>0$ for $C:=\{\forall\varepsilon>0\ \exists\,t^+,t^-\in(0,\varepsilon):\ B_{t^+}>0\ \text{and}\ B_{t^-}<0\}$, hence $P[C]=1$ by Corollary 3.16.

Chapter 4

Weak convergence and tightness of probability measures

4.1 Weak convergence

Setting: $(S,d)$ is a metric space with $\mathcal{S}=\mathcal{B}(S)$ (the Borel $\sigma$-field); $\mu$ and $\mu_n$, $n\in\mathbb{N}$, are probability measures on $(S,\mathcal{S})$.

Remark 4.1. We know from the course Stochastics I that convergence notions like "$\mu_n(A)\to\mu(A)$ for all $A\in\mathcal{S}$" or "$\|\mu_n-\mu\|:=\sup_{A\in\mathcal{S}}|\mu_n(A)-\mu(A)|\to 0$" would be too restrictive, e.g. for the central limit theorem.

Definition 4.1. (a) $(\mu_n)_{n\in\mathbb{N}}$ converges weakly to $\mu$, written $\mu_n\Rightarrow\mu$, if
\[
\int_S h\,d\mu_n\ \longrightarrow\ \int_S h\,d\mu\qquad\forall h\in C_b(S),
\]
where $C_b(S)$ denotes the space of all bounded continuous functions on $S$.

(b) A sequence of $(S,\mathcal{S})$-valued random variables $X_n$, defined on some probability spaces $(\Omega_n,\mathcal{F}_n,P_n)$, converges in distribution to an $(S,\mathcal{S})$-valued random variable $X$ defined on $(\Omega,\mathcal{F},P)$, written $X_n\xrightarrow{\mathcal{D}}X$, if $\mu_n:=P_n\circ X_n^{-1}\Rightarrow\mu:=P\circ X^{-1}$.
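A quick numerical illustration of Definition 4.1 (a sketch; the choice of $\mu_n$ as the law of a normalized Binomial and of the test function $h=\cos$ is purely illustrative): by the central limit theorem $\mu_n\Rightarrow\mathcal{N}(0,1)$, so $\int h\,d\mu_n$ approaches $\int h\,d\mu=e^{-1/2}$ for this bounded continuous $h$, even though each $\mu_n$ is discrete.

    import numpy as np

    rng = np.random.default_rng(4)
    h = np.cos                          # a bounded continuous test function

    # mu_n = law of (Bin(n, 1/2) - n/2) / sqrt(n/4); mu = N(0, 1)
    def integral_mu_n(n, samples=200_000):
        x = (rng.binomial(n, 0.5, samples) - n / 2) / np.sqrt(n / 4)
        return h(x).mean()              # Monte Carlo estimate of int h d(mu_n)

    z = rng.normal(size=200_000)
    print([round(integral_mu_n(n), 4) for n in (5, 50, 500)])
    print(round(h(z).mean(), 4))        # estimate of int h d(mu) = exp(-1/2)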

Remark 4.2.
• It is essential in (b) that $X_n$, $X$ are $(S,\mathcal{S})$-valued (the same space).
• Notations in the literature for this type of convergence: $X_n\xrightarrow{\mathcal{D}}X$, $X_n\xrightarrow{\mathcal{L}}X$ (convergence in law), $\mathcal{L}(X_n\mid P_n)\to\mathcal{L}(X\mid P)$, $\mu_n\xrightarrow{w}\mu$.

Theorem 4.2 (Portemanteau theorem). The following are equivalent:

(1) $\mu_n\Rightarrow\mu$ as $n\to\infty$.

(2) $\int h\,d\mu_n\to\int h\,d\mu$ for every $h\in C_b(S)$ that is uniformly continuous.

(3) $\limsup_n\mu_n(F)\le\mu(F)$ for every closed set $F\subset S$.

(4) $\liminf_n\mu_n(G)\ge\mu(G)$ for every open set $G\subset S$.

(5) $\lim_{n\to\infty}\mu_n(A)=\mu(A)$ for every $\mu$-boundaryless $A\in\mathcal{S}$, i.e. $A\in\mathcal{S}$ with $\mu(\bar{A}\setminus A^\circ)=0$.

Sometimes it is very useful that it suffices to have $\mu_n(A)\to\mu(A)$ only for a subfamily of sets $A$.

Corollary 4.3. Let $\mathcal{E}\subset\mathcal{S}$ be $\cap$-stable such that

(1) any open $G\subset S$ can be represented as a countable union of sets in $\mathcal{E}$,

(2) $\mu_n(A)\to\mu(A)$ as $n\to\infty$ for every $A\in\mathcal{E}$.

Then $\mu_n\Rightarrow\mu$.

Remark 4.3. For separable $(S,d)$ one can construct a countable set $\mathcal{E}$ as in Corollary 4.3: take the balls of rational radii centered at a countable dense subset of $S$, and consider all finite intersections of such balls.

4.2 Tightness and Prohorov’s theorem

Let $(S,d)$ be a metric space and $\mathcal{S}=\mathcal{B}(S)$ the Borel $\sigma$-field. Denote by $M_1(S)$ the set of all probability distributions on $(S,\mathcal{S})$. By "$\mu_n\Rightarrow\mu$" we denote weak convergence.

Remark 4.4.
• The topology of weak convergence is described by the neighborhood basis (which is also a basis of the topology)
\[
U_{\varepsilon,h_1,\dots,h_n}(\mu):=\Big\{\nu\in M_1(S)\ \Big|\ \Big|\int h_i\,d\nu-\int h_i\,d\mu\Big|<\varepsilon,\ i=1,\dots,n\Big\},\qquad n\in\mathbb{N},\ \varepsilon>0,\ h_i\in C_b(S),\ \mu\in M_1(S).
\]
Open sets of $M_1(S)$ are the sets that contain, for each of their elements, a neighborhood of this form. Note that the basis sets above are themselves open, and any open set is an (arbitrary) union of such basis sets.

• Recall (cf. [Alt02, Chapter 2.5]): for a set $M$ in a topological space,

– $M$ is compact $:\Leftrightarrow$ any open cover has a finite subcover.
– $M$ is sequentially compact (relatively sequentially compact) $:\Leftrightarrow$ any sequence in $M$ has a subsequence converging in $M$ (in $\bar{M}$, respectively). Note that compact is not necessarily equivalent to sequentially compact.
– $M$ is precompact (totally bounded) in a metric space $:\Leftrightarrow$ for any $\varepsilon>0$ it has a finite covering with $\varepsilon$-balls.
– For a set $M$ in a metric space,
\[
\text{compact}\iff\text{sequentially compact}\iff\text{precompact and complete}\ \Longrightarrow\ M\text{ is separable}.
\]
– A subset $M$ of a complete metric space is precompact iff it is relatively compact.

• A natural question: is the topology of weak convergence on $M_1(S)$ metrizable? Answer: yes, if $S$ is separable ("Prohorov metric").

Example 4.1. Consider $S=\mathbb{R}$ with the usual metric, $(x_n)$ a sequence in $\mathbb{R}$ and $\mu_n:=\delta_{x_n}$. If $\lim_{n\to\infty}|x_n|=\infty$, there exists no convergent subsequence in general. Consider for example $x_n=n$. Then $\int h\,d\mu_n=h(n)$, and this does not necessarily converge, i.e. $\mu_n$ does not converge weakly to any measure. However, if $(x_n)$ is bounded, then there exists a subsequence $(x_{n_k})$ converging to some $x\in\mathbb{R}$. In this case it is easy to see that $\mu_{n_k}\Rightarrow\delta_x$.

Definition 4.4. A set $M\subset M_1(S)$ is called tight if for every $\varepsilon>0$ there exists a compact set $K\subset S$ such that $\mu(K)\ge 1-\varepsilon$ for every $\mu\in M$.

Example 4.2. The following is a sufficient condition for tightness: there exists a function $h:S\to\mathbb{R}$ such that $\{|h|\le c\}$ is compact for every $c\ge 0$ and
\[
\sup_{\mu\in M}\int h\,d\mu<\infty.
\]
For instance, $S=\mathbb{R}$, $h(x)=|x|^p$ for some $p>0$. Then any set of measures with uniformly bounded $p$-th moments is tight.
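A one-line justification of the sufficient condition, as a sketch assuming $h\ge 0$ (as in the example above): by Markov's inequality,
\[
\sup_{\mu\in M}\mu\big(S\setminus\{h\le c\}\big)=\sup_{\mu\in M}\mu\big(\{h>c\}\big)\ \le\ \frac{1}{c}\,\sup_{\mu\in M}\int h\,d\mu\ \xrightarrow[c\to\infty]{}\ 0,
\]
so for given $\varepsilon>0$ one may take the compact set $K=\{h\le c\}$ with $c$ large enough in Definition 4.4.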

Theorem 4.5 (Prohorov). Let $M\subset M_1(S)$.

(1) If $M$ is tight, then $M$ is relatively sequentially compact.

(2) Let $S$ be complete and separable. If $M$ is relatively sequentially compact, then $M$ is tight.

Example 4.3. Let $(\mu_n)$ be a tight sequence of probability measures. Assume that any weakly convergent subsequence has the weak limit $\mu$. Then the sequence converges: $\mu_n\Rightarrow\mu$.

Proposition 4.6. If $S$ is compact, then $M_1(S)$ is compact and sequentially compact.

For probability measures $\mu$ and $\mu_n$, $n\in\mathbb{N}$, on $(\mathbb{R}^d,\mathcal{B}^d)$, let $\varphi_{\mu_n}$, $\varphi_\mu$ denote the characteristic functions of $\mu_n$, $\mu$, and let $X_n$, $X$ denote random variables with distributions $\mu_n=P_n\circ X_n^{-1}$, $\mu=P\circ X^{-1}$, respectively. We recall Lévy's continuity theorem for characteristic functions. It states that the following are equivalent:

(1) There exists a probability measure $\mu$ on $\mathcal{B}^d$ such that $\mu_n\Rightarrow\mu$.

(2) There exists a function $\varphi:\mathbb{R}^d\to\mathbb{C}$ such that $\varphi_{\mu_n}(u)\to\varphi(u)$ for all $u\in\mathbb{R}^d$ and $\varphi$ is (partially) continuous at $0$ (in which case $\varphi=\varphi_\mu$).

The proof for $d=1$ is usually given in the lecture Stochastics I. The proofs of Lévy's continuity theorem for $d\ge 1$, and likewise the proof of the Cramér-Wold theorem (cf. below), make use of Prohorov's Theorem 4.5 (cf. [Klenke06, Thms 15.23 and 15.55]; elaborate the details), for part (b) by using part (a) and the already known result for dimension $d=1$:

Proposition 4.7. With the previous notations, we have

(a) (Cramér-Wold theorem)
\[
\mu_n\Rightarrow\mu\iff\langle y,X_n\rangle\xrightarrow{\mathcal{D}}\langle y,X\rangle\qquad\forall y\in\mathbb{R}^d.
\]

(b) Lévy's continuity theorem (cf. above) on characteristic functions holds in multiple dimensions on $\mathbb{R}^d$, $d\in\mathbb{N}$.

Remark 4.5 (On $M_1(S)$ being metrizable and the Prohorov metric).

(1)
\[
\rho(\mu,\nu):=\inf\big\{\varepsilon>0\ \big|\ \mu(A)\le\varepsilon+\nu(U_\varepsilon(A))\ \text{and}\ \nu(A)\le\varepsilon+\mu(U_\varepsilon(A))\quad\forall A\in\mathcal{S}\big\}
\]
defines a metric on $M_1(S)$, called the Prohorov metric (here $U_\varepsilon(A)$ denotes the open $\varepsilon$-neighborhood of $A$).

(2) $\rho(\mu_n,\mu)\to 0$ implies $\mu_n\Rightarrow\mu$.

(3) If $S$ is separable, then the converse is true: $\mu_n\Rightarrow\mu$ implies $\rho(\mu_n,\mu)\to 0$. Hence, for separable $S$, $M_1(S)$ with the topology of weak convergence is metrizable by the Prohorov metric. For reference, cf. [Klenke06].

4.3 Weak convergence on $S=C[0,1]$

We consider the space $S=C[0,1]$ of continuous real-valued functions on $[0,1]$ with the sup-norm $\|x\|=\sup_{t\in[0,1]}|x(t)|$ (inducing the metric $d(x,y)=\|x-y\|$). With this topology, $S$ is a separable Banach space.

Lemma 4.8. (a) The Borel $\sigma$-field $\mathcal{S}=\mathcal{B}(S)$ is generated by the cylinder sets of the form
\[
Z=\{x\in S\mid x(t_i)\in A_i,\ i=1,\dots,n\}
\]
with $0\le t_1<\dots<t_n\le 1$, $A_i\in\mathcal{B}(\mathbb{R})$ for $i=1,\dots,n$, $n\in\mathbb{N}$, i.e. $\sigma(\mathcal{Z})=\mathcal{B}(S)=\mathcal{S}$ for the system $\mathcal{Z}$ of cylinder sets.

(b) $\mathcal{B}(S)=\sigma(\pi_t\mid t\in[0,1])$ with $\pi_t:x\mapsto x(t)$ the coordinate projections.

(c) The set $\mathcal{Z}$ of all cylinder sets is $\cap$-stable.

Any measure $\mu$ on $C[0,1]$ is determined by its finite-dimensional marginal distributions $\mu_J=\mu\circ\pi_J^{-1}$ with $J\subset[0,1]$ finite, where $\pi_J=\pi^{[0,1]}_J$, $x\mapsto(x(t_j))_{t_j\in J}$, are the canonical projections, since the cylinder sets form an $\cap$-stable generator of $\mathcal{B}(C[0,1])$ by Lemma 4.8.

Lemma 4.9. (a) If $\mu_n\Rightarrow\mu$ for some $\mu,\mu_n\in M_1(C[0,1])$, then all finite-dimensional marginal distributions of $\mu_n$ converge weakly to the finite-dimensional marginal distributions of $\mu$, i.e. $\mu_{n,J}\Rightarrow\mu_J$ for all finite $J\subset[0,1]$.

(b) The converse implication is not true.

Theorem 4.10. For probability measures $\mu_n$, $\mu$ on $(C[0,1],\mathcal{B}(C[0,1]))$ the following are equivalent:

(a) $\mu_n\Rightarrow\mu$;

(b) all finite-dimensional marginal distributions of $\mu_n$ converge weakly to those of $\mu$, i.e. $\mu_{n,J}=\mu_n\circ\pi_J^{-1}\Rightarrow\mu\circ\pi_J^{-1}=\mu_J$ for all finite $J\subset[0,1]$, and the sequence $(\mu_n)_{n\in\mathbb{N}}$ is tight.

For $x\in C[0,1]$ we define
\[
W_\delta(x):=\sup\big\{|x(t)-x(s)|\ :\ s,t\in[0,1],\ |t-s|\le\delta\big\},\qquad\delta>0.
\]
Note that for any $x\in C[0,1]$ we have $\lim_{\delta\downarrow 0}W_\delta(x)=0$, since continuity on $[0,1]$ implies uniform continuity.
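For piecewise linear paths sampled on a grid (as in Donsker's construction in the next chapter), $W_\delta$ can be evaluated numerically. A small hedged sketch; the path here is a rescaled random walk and all parameters are illustrative, and the grid-restricted quantity only approximates (and lower-bounds) $W_\delta$.

    import numpy as np

    rng = np.random.default_rng(5)

    def grid_modulus(x, delta):
        # modulus of continuity over the grid t_k = k / (len(x) - 1);
        # for a piecewise linear path this approximates W_delta(x) from below.
        n = len(x) - 1
        k = max(int(np.floor(delta * n)), 1)   # number of grid steps within delta
        return max(np.max(np.abs(x[j:] - x[:-j])) for j in range(1, k + 1))

    # Illustrative path: rescaled random walk on [0, 1] with n = 1000 steps.
    n = 1000
    x = np.concatenate(([0.0], np.cumsum(rng.choice([-1.0, 1.0], n)))) / np.sqrt(n)
    for delta in (0.2, 0.05, 0.01):
        print(delta, round(grid_modulus(x, delta), 3))   # decreases as delta decreases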

Proposition 4.11 (Arzelà-Ascoli). A set $A\subset C[0,1]$ is relatively compact in $C[0,1]$ if and only if the following two conditions are satisfied:

(1) $A$ is uniformly bounded, i.e. $\sup_{x\in A}\|x\|<\infty$;

(2) $A$ is equicontinuous, i.e. $\lim_{\delta\downarrow 0}\sup_{x\in A}W_\delta(x)=0$.

Given (2) in Proposition 4.11, one can replace (1) by

(1') $A$ is uniformly bounded at $0$, i.e. $\sup_{x\in A}|x(0)|<\infty$.

Theorem 4.12. Let $M\subset M_1(C[0,1])$ and define $\mu^0:=\mu\circ\pi_{\{0\}}^{-1}=\mu(\{x(0)\in\cdot\,\})$ for $\mu\in M$. Then $M$ is tight if and only if $\{\mu^0\mid\mu\in M\}$ is tight on $\mathbb{R}$ and
\[
\lim_{\delta\downarrow 0}\ \sup_{\mu\in M}\ \mu\big(\{W_\delta(x)\ge\varepsilon\}\big)=0\qquad\forall\varepsilon>0. \tag{4.3.1}
\]

Remark 4.6. (a) In Theorem 4.12 the set $\{\mu^0\mid\mu\in M\}$ is trivially tight if, for example, $\mu(\{x\mid x(0)=0\})=1$ for all $\mu\in M$.

(b) If $M=\{\mu_n\mid n\in\mathbb{N}\}$, then (4.3.1) is equivalent to
\[
\lim_{\delta\downarrow 0}\ \limsup_{n\to\infty}\ \mu_n\big(W_\delta(x)\ge\varepsilon\big)=0\qquad\forall\varepsilon>0. \tag{4.3.2}
\]

Chapter 5

Donsker’s invariance principle

If $(B_t)_{t\in[0,1]}$ is a Brownian motion defined on a probability space $(\Omega,\mathcal{F},P)$, then the mapping $B:\Omega\to C[0,1]$, $\omega\mapsto(t\mapsto B_t(\omega))$, induces a probability measure on $(C[0,1],\mathcal{B}(C[0,1]))$, namely the distribution $\mu=P\circ(B_{[0,1]})^{-1}$ of $B_{[0,1]}$ under $P$.

Definition 5.1. The measure $\mu$ on $(C[0,1],\mathcal{B}(C[0,1]))$ whose finite-dimensional distributions are centered multivariate Gaussian with covariance function $s\wedge t$ ($s,t\in[0,1]$) (i.e. the distribution of Brownian motion) is called the Wiener measure (for the time interval $[0,1]$).

Remark 5.1. (a) Existence of the Wiener measure is equivalent to existence of Brownian motion (though its definition does not require existence of the latter). Indeed, taking $\Omega=C[0,1]$, $\mathcal{F}=\mathcal{B}(C[0,1])$, $X_t(\omega)=\omega(t)$ as the coordinate process and $P=\mu$ the Wiener measure, $(X_t)_{t\in[0,1]}$ is a Brownian motion (on $[0,1]$) with respect to the natural filtration $\mathcal{F}_t=\sigma(X_s:s\le t)$.

(b) We already know the existence of Brownian motion as a continuous centered Gaussian process $X$ with covariance function $\operatorname{Cov}(X_t,X_s)=t\wedge s$ (Kolmogorov's consistency theorem together with the Kolmogorov-Chentsov continuity theorem for a continuous modification). Hence the Wiener measure exists.

(c) The Wiener measure is defined analogously for any time interval, e.g. for $[0,\infty)$.

Setting now: $(\Omega,\mathcal{F},P)$ is a probability space carrying a sequence $(Y_k)_{k\in\mathbb{N}}$ of real-valued i.i.d. random variables with $E[Y_1]=0$, $\operatorname{Var}(Y_1)=1$. Define $S_n:=\sum_{k=1}^{n}Y_k$, $S_0:=0$ - a random walk with i.i.d. increments. Consider the rescaled random variables $\frac{1}{\sqrt{n}}S_l$, $l=0,\dots,n$, on the equidistant time grid on $[0,1]$ with linear interpolation:

\[
X^n_t=\frac{1}{\sqrt{n}}\,S_{\lfloor nt\rfloor}+\frac{1}{\sqrt{n}}\,Y_{\lfloor nt\rfloor+1}\,\big(nt-\lfloor nt\rfloor\big),\qquad t\in[0,1].
\]

$X^n$ is a continuous stochastic process, and $X^n_\cdot:\Omega\to C[0,1]$ is $\mathcal{F}$-$\mathcal{B}(C[0,1])$-measurable: all $X^n_t=\pi_t\circ X^n_\cdot$ are $\mathcal{F}$-$\mathcal{B}(\mathbb{R})$-measurable and $\sigma(\pi_t:t\in[0,1])=\mathcal{B}(C[0,1])$ by Lemma 4.8. Denote by $\mu_n=P\circ(X^n)^{-1}\in M_1(C[0,1])$ the distribution of $X^n$ under $P$.
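A hedged simulation sketch of the rescaled, linearly interpolated random walk $X^n$; the choice of Rademacher increments $Y_k=\pm 1$ and the value of $n$ are illustrative, since any centered i.i.d. increments with unit variance are admissible.

    import numpy as np

    rng = np.random.default_rng(6)

    def donsker_path(n, grid):
        """Evaluate X^n_t at the times t in `grid` (subset of [0,1]) for one sample path."""
        Y = rng.choice([-1.0, 1.0], size=n)            # E[Y] = 0, Var(Y) = 1
        S = np.concatenate(([0.0], np.cumsum(Y)))       # S_0, ..., S_n
        k = np.floor(n * grid).astype(int)              # floor(nt)
        frac = n * grid - k                              # nt - floor(nt)
        nxt = np.minimum(k + 1, n)                       # index of Y_{floor(nt)+1}, clipped at t = 1
        return (S[k] + (S[nxt] - S[k]) * frac) / np.sqrt(n)

    t = np.linspace(0.0, 1.0, 501)
    x = donsker_path(10_000, t)          # one approximate Brownian path
    print(round(x[-1], 3))               # X^n_1 = S_n / sqrt(n), approximately N(0,1)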

Theorem 5.2 (Donsker). Under the above assumptions, the sequence $(\mu_n)$ converges weakly and the limit is the Wiener measure $\mu$, i.e.
\[
\mu_n\Rightarrow\mu\quad(=\text{the Wiener measure on }C[0,1]);
\]
this means that $X^n_\cdot\xrightarrow{\mathcal{D}}B_\cdot$ for a Brownian motion $B$ (one-dimensional, on $[0,1]$).

Remark 5.2. (1) Existence and construction of $\mu$.

(2) Invariance principle: the only requirement on $(Y_k)_{k\in\mathbb{N}}$ is that they are i.i.d. with finite second moments (centered and normalized to unit variance); the limit does not depend on the particular distribution of the increments.

(3) Functional version of the CLT: not only the convergence $\frac{1}{\sqrt{n}}S_n=X^n_1\xrightarrow{\mathcal{D}}\mathcal{N}(0,1)$, but convergence of the whole processes.

(4) Instead of a single sequence $(Y_k)_{k\in\mathbb{N}}$, the result also holds for "triangular schemes", where each $X^n$ is built from its own increments $(Y^n_k)_{k\in\mathbb{N}}$ ...

(5) For continuous functions $F:C[0,1]\to S$ we get that $F(X^n)\xrightarrow{\mathcal{D}}F(B)$, where $B$ is a Brownian motion. This is used, for instance, in financial mathematics to show that the CRR binomial model converges to the Black-Scholes model.

Theorem 5.3 (Multivariate CLT). Let $(X_k)_{k\in\mathbb{N}}=\big((X^1_k,\dots,X^d_k)\big)_{k\in\mathbb{N}}$ be $\mathbb{R}^d$-valued i.i.d. random variables with $E[X_1]=b\in\mathbb{R}^d$ and $\operatorname{Cov}(X^j_1,X^l_1)=\Sigma_{jl}$, $j,l\in\{1,\dots,d\}$. Let also $S^*_n=\frac{1}{\sqrt{n}}\sum_{k=1}^{n}(X_k-b)$. Then $S^*_n\xrightarrow{\mathcal{D}}\mathcal{N}(0,\Sigma)$ as $n\to\infty$.

Lemma 5.4. Let $(U_n)_{n\in\mathbb{N}}$, $U$ be random variables on some probability space $(\Omega,\mathcal{F},P)$ taking values in a normed space $(S,\|\cdot\|)$.

(1) Let $(V_n)$ be further $S$-valued random variables with $V_n\xrightarrow{P}0$. If $S$ is separable and $U_n\xrightarrow{\mathcal{D}}U$, then $U_n+V_n\xrightarrow{\mathcal{D}}U$.

(2) Let $(c_n)$ be a sequence in $\mathbb{R}$ with $c_n\to c\in\mathbb{R}$ as $n\to\infty$. If $U_n\xrightarrow{\mathcal{D}}U$, then $c_nU_n\xrightarrow{\mathcal{D}}cU$.

Proposition 5.5. With the set-up from above, the finite-dimensional marginal distributions of $(\mu_n)_{n\in\mathbb{N}}$ converge weakly to the finite-dimensional marginal distributions of $\mu$, i.e.
\[
\underbrace{\mu_n\circ\big(\pi^{[0,1]}_J\big)^{-1}}_{=:\,\mu_n^J}\ \Longrightarrow\ \underbrace{\mu\circ\big(\pi^{[0,1]}_J\big)^{-1}}_{=:\,\mu^J}\qquad\forall\text{ finite }J\subset[0,1],
\]
where $\pi^{[0,1]}_J$ denotes the projection $x\mapsto(x(j))_{j\in J}$.

Lemma 5.6 (Maximal inequality, Ottaviani). Let $U_1,\dots,U_n$ be independent random variables with $E[U_i]=0$, $\operatorname{Var}(U_i)=E[U_i^2]=1$. Set $Z_k:=\frac{1}{\sqrt{n}}\sum_{i=1}^{k}U_i$. Then we have
\[
P\Big[\max_{k=1,\dots,n}|Z_k|>2\alpha\Big]\ \le\ \frac{1}{1-\frac{1}{\alpha^2}}\,P\big[|Z_n|>\alpha\big]\qquad\forall\alpha>1.
\]

Proposition 5.7. In the set-up from above, the sequence $(\mu_n)_{n\in\mathbb{N}}$ is tight.
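A hedged numerical check of Ottaviani's maximal inequality (Lemma 5.6); the Rademacher increments and all parameters are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(7)
    n, alpha, n_paths = 200, 1.5, 20_000             # illustrative choices

    U = rng.choice([-1.0, 1.0], size=(n_paths, n))    # E[U] = 0, Var(U) = 1
    Z = np.cumsum(U, axis=1) / np.sqrt(n)             # Z_k = (U_1 + ... + U_k) / sqrt(n)

    lhs = np.mean(np.max(np.abs(Z), axis=1) > 2 * alpha)
    rhs = np.mean(np.abs(Z[:, -1]) > alpha) / (1 - 1 / alpha**2)
    print(lhs, "<=", rhs)                             # empirical check of the bound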

Bibliography

[Aleks01] Aleksandrov P.S., Lehrbuch der Mengenlehre, Harri Deutsch (Verlag), 2001.

[Alt02] Alt H. W., Lineare Funktionalanalysis: Eine anwendungsorientierte Einführung, Springer, 2002.

[Bauer01] Bauer H., Measure and Integration Theory, Walter de Gruyter, 2001.

[Dieu60] Dieudonné J., Foundations of Modern Analysis, Academic Press, New York, 1960.

[DellMey79] Dellacherie C., Meyer P.-A., Probabilities and Potential, North-Holland Mathematics Studies, 1979.

[JacPro03] Jacod J., Protter P., Probability Essentials, Universitext, Springer, 2003.

[Klenke06] Klenke A., Probability Theory: A Comprehensive Course, Universitext, Springer, 2006.

[Kall02] Kallenberg O., Foundations of Modern Probability, Springer, 2002.

[Will91] Williams D., Probability with Martingales, Cambridge University Press, 1991.

[a.s. wordpress] http://almostsure.wordpress.com/2009/12/06/upcrossings- downcrossings-and-martingale-convergence/
