theory I

Prof. Dr. Alexander Drewitz Universit¨at zu K¨oln

Preliminary version of July 16, 2019

If you spot any typos or mistakes, please drop an email to [email protected]. 2 Contents

1 functions 5 1.1 Systems of sets ...... 5 1.1.1 Semirings, rings, and algebras ...... 5 1.1.2 σ-algebras and Dynkin systems ...... 11 1.2 Set functions ...... 17 1.2.1 Properties of set functions ...... 17 1.3 Carath´eodory’s extension theorem (‘Maßerweiterungssatz’) ...... 25 1.3.1 Lebesgue ...... 29 1.3.2 Lebesgue-Stieltjes measure ...... 31 1.4 Measurable functions, random variables ...... 32 1.5 Image measures, distributions ...... 38

2 The Lebesgue integral 41 2.0.1 Integrals of simple functions ...... 41 2.0.2 Lebesgue integral for measurable functions ...... 43 2.0.3 Lebesgue vs. Riemann integral ...... 46 2.1 Convergence theorems ...... 47 2.1.1 Dominated and monotone convergence ...... 47 2.2 Measures with densities, absolute continuity ...... 50 2.2.1 Almost sure / almost everywhere properties ...... 50 2.2.2 Hahn-Jordan decomposition ...... 53 2.2.3 Lebesgue’s decomposition theorem, Radon-Nikodym derivative ...... 55 2.2.4 Integration with respect to image measures ...... 55 2.3 Product spaces ...... 56 2.4 Product measures ...... 58 2.5 The theorems of Fubini and Tonelli ...... 60 2.6 Fourier transform / characteristic functions ...... 63

3 Classical and basic results in probability theory 65 3.1 Specific distributions ...... 65 3.1.1 Discrete distributions ...... 65 3.1.2 Distributions with densities ...... 68 3.2 Independence ...... 69 3.3 Covariance, variance ...... 72 3.4 Lp spaces and some fundamental inequalities ...... 77 3.5 Convergence of random variables ...... 80 3.5.1 Almost sure convergence ...... 80 3.5.2 Convergence in Lp ...... 81 3.5.3 Convergence in probability ...... 82 3.5.4 Convergence in distribution ...... 82 3.5.5 Some fundamental tools ...... 85

3 4 CONTENTS

3.5.6 Interdependence of types of convergence of random variables ...... 86 3.6 Laws of large numbers ...... 87 3.6.1 Weak law of large numbers ...... 87 3.6.2 Strong law of large numbers ...... 89 3.7 Convolution of measures ...... 90 3.8 Central limit theorem ...... 91

4 A primer on stochastic processes 103 4.1 Stochastic processes ...... 103 4.2 Kolmogorov’s existence and uniqueness theorem ...... 105 Chapter 1

Set functions

In the introductory course ‘Introduction to probability and statistics’ (see [Dre18]),1 a frequent motivation was the investigation of (finitely many) dice tosses or coin flips. We have seen that those experiments were suitably described by discrete probability spaces, see [Dre18, Section 1.2]. Also in the nice setting of probability distributions with a density we have learned how to deal with that setting by means of the Riemann integral to some extent. However, quite quickly we had reached the limit of that approach, since for example pointwise limits of Riemann integrable functions were not necessarily integrable anymore, see e.g. Section [Dre18, Section 1.8.3]. As promised in the introductory , in this course we will provide a rigorous and self-contained introduction to the concept of Lebesgue integration, which will in particular comprise and generalize most of the content of [Dre18] regarding random variables and expectations. The principal goals of the first two chapters are twofold:

• Investigate measures and in particular construct the Lebesgue measure on Rd using a general extension theorem for elementary notions of volume;

• Introduce the Lebesgue integral for suitable functions defined on arbitrary measure spaces;

Recommended references to accompany this Chapter are [Els05], [Kle14], [Kal02] and [G08¨ ], as well as [Bau92]. Further sources on measure theory are [Bil95], [Coh13], [Doo94], [Rao04], [Hal50].

1.1 Systems of sets

1.1.1 Semirings, rings, and algebras One of the main goals will be to measure of some a priori abstract set Ω. In general, it will not be possible to do so in an appropriate manner for all subsets of Ω (the so-called ‘Maßproblem’ and ‘Inhaltsproblem’ (see [Els05]) as well as the Banach-Tarski paradox show the kind of problems that can arise when trying to do so; due to time constraints, we will not go into details here). d Maßproblem: We want to construct a function µ : 2R 0, (supposed to measure subsets Ñ r 8s of Rd) with the following properties:

d (a) For any sequence An n N of pairwise disjoint subsets An R , we have p q P Ă

µ An µ An (σ-additivity); n N “ n N p q ´ ďP ¯ ÿP 1This course is not a prerequisite, however it might help intuition to have attended that course.

5 6 CHAPTER 1. SET FUNCTIONS

(b) For any isometry (‘Bewegung’) T of Rd and any A Rd one has Ă µ A µ T A (invariance under isometries); p q“ p p qq

(c) µ 0, 1 d 1 (normalization); pr q q“ Theorem 1.1.1 (Satz v. Vitali (1905), Italian mathematician (1875–1932)). Das Maßproblem ist unl¨osbar f¨ur d 1. “ Theorem 1.1.2 (Satz v. Banach-Tarski (1924), Polish mathematicians (Stefan Banach, 1892 – 1945, Alfred Tarski, 1901 – 1983)). For d 1, let A, B Rd be arbitrary sets with non-empty ě Ă d interior. Then there exists a sequence Cn of subsets Cn R and isometries (Bewegungen) p q P Tn n N such that p q P 9 9 A Cn and B Tn Cn . n N n N “ P “ P p q ď ď d Therefore, a key role will be played by certain subsets of the 2R of Rd which are nicely behaved. In fact, it will turn out that with little additional effort we will be able to develop a theory of measures (and subsequently integration) not only on suitable subsets of Rd, but also of more general spaces Ω which will proves useful in many occasions. In the following we introduce several such systems of subsets which play an integral part in the construction of those functions (so-called measures, introduced in Definition 1.2.3 below) which will be measuring the corresponding subsets (which will form so-called σ-algebras, see Definition 1.1.18). The first definition is just a shorthand for systems of subsets of Ω which are closed under finite intersection.

Definition 1.1.3. Let Ω be a non- and let S 2Ω. S is called a π-system (‘π-System’) Ă if it is closed under (finite) intersections:

A, B S A B S. (1.1.1) P ùñ X P Remark 1.1.4. Using induction (‘vollst¨andige Induktion’) it is not hard to show that if S is a π-system, n N, and A ,...,An S, then P 1 P n Ai S. P i 1 č“ (see exercise classes).

To us, this property will prove particularly valuable in combination with further properties, as will be seen in the set systems introduced below. A standard way to construct functions which are supposed to measure many subsets of a set Ω is to first specify how to measure certain ‘simple’ subsets of Ω. Such simple subsets oftentimes form a semiring as introduced in the following definition.

Definition 1.1.5. Let Ω be a non-empty set. A S of 2Ω is called a semiring (‘Halbring’, ‘Semiring’) over Ω if the following properties are fulfilled:

(a) S; (1.1.2) HP (b) S is a π-system; 1.1. SYSTEMS OF SETS 7

(c) for any A, B S, there exist pairwise disjoint C ,...,Cn S such that P 1 P 9 n A B Ci. i 1 z “ “ ď One of the examples most relevant to us will be the following.

Example 1.1.6. Denote by I : a, b : a, b R, a b “ p s P ď the set of left-open right-closed intervals in R. Then I is a semiring( over R.

Proof. See exercise classes.

The set of Cartesian products of elements of two semirings is a semiring again, as is stated in the following result.

Lemma 1.1.7. Let S1 be a semiring over Ω1 and S2 be a semiring over Ω2. Then

S S : A A : A S , A S 1 ˚ 2 “ 1 ˆ 2 1 P 1 2 P 2 is a semiring over Ω Ω . ( 1 ˆ 2 Proof. We have to establish the properties of Definition 1.1.5. Since S and S , we get S S . HP 1 HP 2 HP 1 ˚ 2 Furthermore, to prove the second item let A A A ,B B B S S . Thus, “ 1 ˆ 2 “ 1 ˆ 2 P 1 ˚ 2 A B A B A B S S , X “ p 1 X 1q ˆ p 2 X 2q P 1 ˚ 2 S1 S2 P P looomooon looomooon since S1 and S2 are stable under intersections. To prove the last item, again let A A A ,B B B S S . Then A B can be “ 1 ˆ 2 “ 1 ˆ 2 P 1 ˚ 2 z partitioned via 9 A B A B A A B A B . (1.1.3) z “ pp 1z 1qˆ 2q pp 1 X 1q ˆ p 2z 2qq ď Now by assumption there exist C ,...,Cn S pairwise disjoint with 1 P 1 9 n A1 B1 Ci i 1 z “ “ ď and D ,...,Dm S pairwise disjoint with 1 P 2 9 m A2 B2 Di. i 1 z “ “ ď As a result, the right-hand side of (1.1.3) can be written as a pairwise disjoint union of elements of S S , which finishes the proof. 1 ˚ 2 d We will use the standard convention that for a, b R we write a b if ai bi for all P ď ď i 1,...,d , and similarly for other types of (in)equalities. In addition, for a, b Rd with P t u d d P a b we will use the notation a, b : ai, bi x R : a x b for the Cartesian ď p q “ i 1p q “ t P ă ă u product of (one-dimensional) intervals, and“ analogously for other types of intervals. Ś Corollary 1.1.8. For any d N, the set P Id : a, b : a, b Rd, a b (1.1.4) “ p s P ď of hyperrectangles (‘Hyperquader’) is a semiring over Rd. ( 8 CHAPTER 1. SET FUNCTIONS

Proof. We proceed by induction (‘vollst¨andige Induktion’) over the dimension d. The case d 1 “ is Example 1.1.6. Assume the statement holds for arbitrary d N. Then Id 1 Id I, and hence Id 1 is a P ` “ ˚ ` semiring due to Lemma 1.1.7 and the induction hypothesis.

Definition 1.1.9. Let Ω be a non-empty set. A subset R of 2Ω is called a ring over Ω if the following properties are fulfilled:

(a) R; (1.1.5) HP (b) A, B R implies A B R; (1.1.6) P Y P (c) for any A, B R, one has A B R. (1.1.7) P z P Example 1.1.10. Let Ω be an arbitrary non-empty set. Then the set R of countable (we use countable in the sense of ‘at most’ countable, i.e., a set is countable if it is finite or has a bijection with N) subsets of Ω is a ring.

Proof. Since is countable we have R. Furthermore, if A, B R, then A and B are both H HP P countable and hence so is A B. In this case, also A B is countable, and hence R is a ring. z Y Example 1.1.11. Let S be a semiring. Then the set n R : Si : n N, Si S i 1,...,n “ i 1 P P @ P t u ! ď“ ) is a ring. It is also referred to as the ring generated by S, and it is the smallest ring containing S.

Proof. Since S, we immediately get R. HP m HP n For A, B R we have A i 1 Ai, B i 1 Bi, with Ai,Bi S, and we get at once that A B RP. “ “ “ “ P Y P Ť Ť The ‘hard’ part is to show the last property. For that purpose, assume again that A, B R and m n P hence that A i 1 Ai, B i 1 Bi, with Ai,Bi S. Then “ “ “ “ P Ť Ť m n m n A B Ai Bj Ai Bj. z “ i 1 z j 1 “ i 1 j 1 z ď“ ď“ ď“ č“ Since S is a semiring, we deduce that

ni,j i,j Ai Bj C , z “ k k 1 ď“ i,j some ni,j N and C S for all k 1,...,ni,j . Hence, P k P P t u m n ni,j m n i,j i,j A B Ck Cf j , z “ “ n p q i 1 j 1 k 1 i 1 f 1,...,ni,j j 1 ď“ č“ ď“ ď“ P j“1ďt u č“ Ś S P which shows that R is a ring. looomooon Furthermore, since rings are stable under finite unions, any ring containing S must also contain R. Thus, R is the smallest ring containing S. 1.1. SYSTEMS OF SETS 9

Lemma 1.1.12. For a non-empty set Ω, a subset R 2Ω is a ring if and only if one of the Ă following two conditions holds true:

(a) R, HP

for any A, B R one has A∆B R, and P P

for any A, B R one has A B R; P Y P (b) R, HP

for any A, B R one has A∆B R, and P P

for any A, B R one has A B R; P X P Proof. Let R be a ring over Ω. Since

A∆B A B B A , “ p z q Y p z q we deduce that (a) holds true. Now assume that R fulfills (a). Then, since

A B A B ∆ A∆B X “ p Y q p q we deduce that (b) holds true. Now assume (b) to hold true. Then since

A B A∆ A B , z “ p X q (1.1.7) follows. Furthermore, since A B A∆ B A , we also deduce (1.1.6). Hence, R is a Y “ p z q ring.

Corollary 1.1.13. If R is a ring over a non-empty set Ω, then R is also a semiring over Ω.

Proof. This is an immediate consequence of Lemma 1.1.12 and the very definition of a semiring and a ring.

An even stronger (cf. Lemma 1.1.16) concept than that of a ring is given in the following definition.

Definition 1.1.14. Let Ω be a non-empty set. A subset A of 2Ω is called an algebra2 over Ω (‘Algebra ¨uber Ω’) if the following properties are fulfilled:

(a) Ω A; (1.1.8) P (b) A A implies Ac A; (1.1.9) P P 2Some authors use the term field instead of algebra, see e.g. [Bil95]. 10 CHAPTER 1. SET FUNCTIONS

(c) A, B A implies A B A. (1.1.10) P Y P Lemma 1.1.15. Let Ω be a non-empty set. A subset A 2Ω is an algebra over Ω if and only Ă if the following conditions are fulfilled:

(a) Ω A; (1.1.11) P (b) A A Ac A; (1.1.12) P ùñ P (c) A, B A A B A. (1.1.13) P ùñ X P Proof. Let A be an algebra. It only remains to show (1.1.13). But De Morgan’s laws imply A B Ac Bc c, hence if A, B A, then (1.1.10) and (1.1.9) yield that A B A. X “ p Y q P X P Conversely assume that the system A fulfills properties (1.1.11) to (1.1.13). It remains to show (1.1.10). But again by De Morgan’s laws we obtain A B Ac Bc c, and hence (1.1.12) Y “ p X q and (1.1.13) imply that A B A. Hence, A is an algebra over Ω. Y P Lemma 1.1.16. Let Ω be a non-empty set. A ring R over Ω is an algebra over Ω if and only if Ω R. P Proof. Let R be a ring over Ω with Ω R. Then (1.1.8) immediately holds true. Choosing P A : Ω R in (1.1.7), we obtain that for arbitrary B R we have Bc Ω B Ω B R, “ P P “ z “ z P which implies that (1.1.9) holds true as well. In addition, (1.1.10) immediately follows from (1.1.6). Hence, R is an algebra. Conversely, if A is an algebra, then Ωc A due to (1.1.9) and (1.1.8). Thus, (1.1.5) holds H“ P true. Furthermore, for A, B A we have A B A Bc, hence (1.1.7) is a consequence of P z “ X (1.1.13) and (1.1.9). Lastly, (1.1.6) is implied by (1.1.10).

As an application, we have seen in Corollary 1.1.8 that Id is a semiring over Rd, and we can generate a ring from it as in Example 1.1.11. Still Rd is not contained in this ring, so due to the previous lemma this is still not an algebra. We now go for a short detour to explain the motivation of the terms ‘ring’ and ‘algebra’ in the above context. For this purpose we recall that in (linear) algebra, a ring (‘Ring’) had been defined as a set R endowed with operations : R R R (addition) and : R R R ` ˆ Ñ ¨ ˆ Ñ (multiplication) such that R, is an additive commutative group, and such that p `q (a) a, b, c R : a b c a b c (associativity) @ P ¨ p ¨ q “ p ¨ q ¨ (b)

a, b, c R : a b c a b a c, a b c a c b c (distributivity) @ P ¨ p ` q“ ¨ ` ¨ p ` q ¨ “ ¨ ` ¨ Furthermore, in linear algebra, one way to introduce the concept of an ‘algebra’ is to demand it to be a ring R which in addition is a vector space over a field3 (‘K¨orper’) K such that in addition, for all u, v R and α K one has P P α uv αu v u αv . p q “ p q “ p q 3Adding insult to injury, there is again a possible overlap with terms here, since as we have seen before, some (although seemingly not too many) authors use the term ‘field’ for an algebra (in the set-theoretic sense). It seems that for notations in German there is slightly less confusion. 1.1. SYSTEMS OF SETS 11

Lemma 1.1.17. Let Ω be a non-empty set.

(a) Endow the power set 2Ω with ∆ (symmetric difference) as addition and (intersection) X as multiplication. Then 2Ω, ∆, is a commutative ring with a zero (‘Nullelement’) p Xq H and a one (‘Einselement’) Ω.

(b) A subset R 2Ω is a ring in the sense of Definition 1.1.9 if and only if R, ∆, is a Ă p Xq ring.

(c) Let A 2Ω be a ring or an algebra in the sense of Definition 1.1.14 or 1.1.9, respectively. Ă Then A, ∆, is an algebra over the field 0, 1 .4 Here we define 0 A : and 1 A : A p Xq t u ¨ “H ¨ “ for A A. P Proof. See exercise classes.

1.1.2 σ-algebras and Dynkin systems The systems of sets introduced above, i.e., semirings, rings, and algebras, only involved stability under finite operations (such as intersections and unions, for example). It turns out, however, that we want to be able to measure not only finite but countable intersections (or unions) of ‘nice’ sets.5 As a consequence, we will introduce systems of sets that are stable under such kinds of operations.

Definition 1.1.18. Let Ω be a non-empty set. A subset F of 2Ω is called a σ-algebra over Ω6 (‘σ-Algebra ¨uber Ω’) if the following properties are fulfilled:

(a) Ω F; (1.1.14) P (b) A F implies Ac F; (1.1.15) P P (c)

A1, A2,... F implies An F. (1.1.16) P n N P ďP Exercise 1.1.19. If F is a σ-algebra over Ω and F Ω, then Ă

FF : F F : F G : G F “ X “ X P is a σ-algebra over F (it is called the trace σ-algebra of F in F().

Exercise 1.1.20. Let Ω be a non-empty set.

(a) Any σ-algebra F also is an algebra, i.e., it is stable under finite unions.

(b) F 2Ω is a σ-algebra if and only if (1.1.14), (1.1.15), and Ă

A1, A2,... F implies An F. (1.1.17) P n N P čP 4Where 0 is the neutral element of addition, and 1 the neutral element of multiplication. In particular, 0 ` 1 “ 1, and 1 ` 1 “ 0. 5In ‘real life experiments’ you might e.g. want to ask whether certain properties are fulfilled by an infinite sequence of coin tosses or dice rolls. 6Again, some authors use the term σ-field instead, see [Bil95]. 12 CHAPTER 1. SET FUNCTIONS

While σ-algebras play an important role not only in probability theory but also e.g. analysis, the following concept of a Dynkins system has primarily been employed in probability theory. Definition 1.1.21. Let Ω be a non-empty set. A subset D of 2Ω is called a Dynkin (‘Dynkin- System’)7 system (or also λ-system (‘λ-System’)) over Ω if the following properties are fulfilled:

(a) Ω D; (1.1.18) P (b) A D implies Ac D; (1.1.19) P P (c)

If A1, A2,... D is a sequence of pairwise disjoint sets, then An D. (1.1.20) P n N P ďP Exercise 1.1.22. Find an example of a Dynkin system that is not a semiring. Lemma 1.1.23. Property (1.1.19) in Definition 1.1.21 can be substituted by the following: For any A, B D with A B one has B A D. P Ă z P Proof. Indeed, if this property holds, then for any A D we obtain, setting B : Ω F, that P “ P Ac Ω A D. “ z P Conversely, if D is a Dynkin system and A, B D with A B, then we have P Ă B A A 9 Bc c, z “ p Y q and the latter is contained in D since Dynkin systems are closed under disjoint unions and complements.

We will investigate the relations between λ-systems and σ-algebras in more detail in Theorems 1.1.33 and 1.1.32 below.

Proposition 1.1.24. Let Λ be an arbitrary non-empty set, and let Aλ λ Λ be a family of p q P σ-algebras (or rings, or algebras, or λ-systems) over the same set Ω. Then

Aλ λ Λ čP is a σ-algebra (or ring, or algebra, or λ-system) over Ω again. Proof. We only give the proof for σ-algebras, the remaining cases are proven in a similar way. Since Ω Aλ for all λ Λ, we immediately get P P Ω Aλ. P λ Λ čP c Furthermore, assume A Aλ. Then A Aλ for all λ Λ, therefore A Aλ for all λ Λ, P λ Λ P P P P and hence P Ş c A Aλ. P λ Λ čP Lastly, assume that An n N is a sequence of sets such that An λ Λ Aλ for all n N. Thus, p q P P P P for each λ Λ, we have An Aλ for all n N, and hence n N An Aλ. As a consequence, P P P P ŞP An Aλ Ť P n N λ Λ ďP čP for each λ Λ, too, which finishes the proof. P 7In honor of Eugene Dynkin (1924–2014) 1.1. SYSTEMS OF SETS 13

Exercise 1.1.25. Show that Proposition 1.1.24 does not hold true anymore if one replaces ‘σ-algebra’ by ‘semiring’. The following is a generalization of the very specific Example 1.1.11. Definition 1.1.26. Let Ω be a non-empty set, and let E 2Ω. Proposition 1.1.24 implies that Ă δ E : D p q “ D is a λ-system over Ω DčE Ą is a λ-system again. It is called the λ-system generated by E. Similarly, σ E : F p q “ F is a σ-algebra over Ω FčE Ą is a σ-algebra again. It is called the σ-algebra generated by E. In an analogous way, one can define rings and algebras generated by subsets of 2Ω. Note that it follows the previous definition that the σ-algebra σ E is at the same time the p q smallest σ-algebra containing E (and similarly for the remaining set systems). Remark 1.1.27. For simpler set systems such as algebras and rings it is possible to explicitly represent their elements using by applying finitely many combinations of elementary set func- tions to elements of their generators (see Example 1.1.11, problem 1 on the first homework sheet, or also [Els05, Ch.1, §4]). For σ-algebras this is generally not possible anymore, see the Section ‘Constructing σ-fields’ in [Bil95] pp. 30. However, for many purposes it is actually sufficient to consider the generator of σ-algebras or Dynkin systems instead of the entire set system. An prominent role will be played by the Borel-σ-algebra (French mathematician and politician Emile´ Borel (1871–1956)). Definition 1.1.28. Let Ω be a non-empty set. A set τ 2Ω is called a topology (‘Topologie’) Ă if the following hold true: (a) , Ω τ; H P

(b) if Oλ λ Λ is an arbitrary Oλ τ, then p q P P

Oλ τ; P λ Λ ďP

(c) if O1, O2 τ, then P O O τ. 1 X 2 P The pair Ω,τ is called a topological space. The elements of τ are called open sets, and the p q elements of C Ω : Cc τ Ă P are called closed sets. ( This definition can be motivated by having a closer look at metric spaces. (E.g., consider Rd d 2 d endowed with the Euclidean metric defined via d x,y : i 1 xi yi , for x,y R .) In p q “ “ p ´ q P fact, for a metric space X, d a subset O X had beenb defined to be open if for each x O p q Ă ř P there exists r 0, such that P p 8q

Br x : y X : d x,y r O. p q “ P p qă Ă ( 14 CHAPTER 1. SET FUNCTIONS

It can be shown without too much effort that defining τ as the set of all open sets of the metric space X, d , it fulfills the properties of a topology as defined in Definition 1.1.28. p q We will mostly be dealing with metric spaces (and, in fact, most of the times with Rd endowed with the Euclidean metric), but the following definition does not come at any additional cost in its fully-fledged generality. In the following, instead of τ for topology we write O to denote the topology (inspired by the wording ‘open’ sets). Definition 1.1.29. For an arbitrary topological space X, O , the σ-algebra p q B X : σ O F p q “ p q“ F is a σ-algebra over Ω FčO Ą generated by the open subsets of X is called the Borel-σ-algebra (‘Borel-σ-Algebra’) on Ω. The sets B B X are called Borel sets or Borel-measurable sets. P p q Example 1.1.30. (a) Most often we will be interested in the case X Rd, d N, and O the “ P set of open subsets of Rd (in the topology induced by the Euclidean metric), i.e., in the σ-algebra B Rd . p q (b) It will also turn out useful to consider the Borel-σ-algebra B R of the two point com- p q pactification R : R , of R. The open sets of R can be described as the largest “ Y t´8 8u topology (i.e., a system of open sets) on R such that the map

ϕ : 1, 1 R r´ sÑ , if x 1, x ˘8 “˘ ÞÑ tan πx 2 , otherwise, " p { q is continuous, where we recall that the map ϕ is continuous if and only if all preimages ϕ 1 O , O R open, is open in 1, 1 . Here, 1, 1 is endowed with the usual (trace-) ´ p q Ă r´ s r´ s topology of R, i.e., the open subsets of 1, 1 are just the ones of the form 1, 1 O, r´ s r´ sX for any O R open in the topology induced by the Euclidean metric. Ă In other words, the open sets of R are given by (unions of) the sets of the form

• V R open; Ă • , a , a R; r´8 q P • a, , a R; p 8s P In particular, we see that and are not open in this topology of R. t8u t´8u (c) in fact, it is not easy to construct sets in Rd which are not in B Rd . An example of such p q sets are the so-called Vitali sets; however, apart from possibly peculiar counterexamples, all subsets of Rd that we will encounter in this course are actually contained in B Rd . p q d Lemma 1.1.31. Each of the following subsets of 2R is a generator of the Borel-σ-algebra B Rd , and each of them is a π-system: p q (a) E : a, b : a, b Qd with a b ; 1 “ p q P ă Y tHu (b) E : a, b : a, b Qd with a b( ; 2 “ p s P ă Y tHu (c) E : a, b : a, b Qd with a b( ; 3 “ r q P ă Y tHu (d) E : a, b : a, b Qd with a b( ; 4 “ r s P ď Y tHu (e) E : , a : a Qd ; ( 5 “ p´8 s P ( 1.1. SYSTEMS OF SETS 15

(f) E : , a : a Qd ; 6 “ p´8 q P (g) E : a, : a Qd ;( 7 “ r 8q P (h) E : a, : a Qd(; 8 “ p 8q P (i) E : A Rd : A is compact( ; 9 “ Ă (j) E : A Rd : A is closed .( 10 “ Ă We can replace any Q in the above( by R and the statements still hold true. In addition, in the case d 1, and considered as subsets of 2R, any of the above systems of “ sets is a generator of B R also (where in the cases (e) to (h) we have to include and , p q 8 ´8 respectively, i.e., exchange the corresponding or by or , respectively). 1p1 1q1 1r1 1s1 Proof. We prove only part of the result (the remaining cases can be proved in similar ways). d Any open subset of R can be written as a countable union of elements of E1 (or of E2, E3, or E4 for that matter). Hence, all open subsets of Rn are contained in any σ-algebra containing any of the Ej, 1 j 4, and thus we deduce ď ď n B R σ Ej , for all 1 j 4. (1.1.21) p qĂ p q ď ď Conversely, since any element of E is open, we immediately get σ E B Rn , which proves 1 p 1q Ă p q the desired equality σ E B Rn . p 1q“ p q Furthermore, the complement of any element of E is open, E B Rd , and hence σ E 4 4 Ă p q p 4q Ă B Rd ; thus, in combination with (1.1.21) we have p q σ E B Rd . (1.1.22) p 4q“ p q

To continue, any element of E2 (or of E3) can be written as a countable union of elements of E4; indeed, we have for a, b Qd with a b that P ă a, b r, b , p s“ r s r Qd a ďPr b ă ă and similarly in the case of E . As a consequence, E , E E and thus we obtain σ E ,σ E 3 2 3 Ă 4 p 2q p 3qĂ σ E , which in combination with (1.1.21) and (1.1.22) supplies us with p 4q σ E σ E B Rd . p 2q“ p 3q“ p q Furthermore, E , E B Rd , hence 9 10 Ă p q σ E ,σ E B Rd . (1.1.23) p 9q p 10qĂ p q Conversely all open subsets of Rd are contained in σ E , hence p 10q σ E B Rd . (1.1.24) p 10q“ p q

In addition, E4 E9 and hence in combination with (1.1.23) and (1.1.22) we deduce σ E9 B Rd . Ă p q “ p q For the last point, alternatively we could argue by exhaustion as follows: For C E we have P 10 C n,n d C , “ n N r s X ďP ` compact ˘ looooomooooon and in combination with (1.1.24) we infer σ E B Rd . p 9q“ p q The remaining parts are left as an exercise. 16 CHAPTER 1. SET FUNCTIONS

Theorem 1.1.32. Let D be a λ-system over a non-empty set Ω. Then D is a π-system if and only if it is a σ-algebra.

Proof. Let D be a λ-system. Any σ-algebra is obviously a π-system (see Exercise 1.1.20 (b), where all but finitely many An can be chosen to be ). H Conversely, assume that D is a π-system. Obviously, (1.1.19) and (1.1.18) imply that (1.1.15) and (1.1.14) hold true, so it remains to show that (1.1.16) is fulfilled as well. For this purpose, we start with observing that D is -closed. In fact, if A, B D, then z P A B A Bc D. z “ X P

Now let a sequence An n N of sets with An D for all n N be given. Define B1 : A1, and p q P P P “ for n 2 set ě n 1 n 1 ´ 9 ´ Bn : An Ai An Bi. “ z “ z i 1 i 1 “ ď“ ď By definition the Bn form a sequence of pairwise disjoint sets, and inductively we obtain that Bn D for all n N. As a consequence, P P

8 8 An Bn D. n 1 “ n 1 P ď“ ď“ This implies (1.1.16).

The following result is fundamental in probability theory, and in particular it will be helpful in order to show Theorem 1.2.17 below.

Theorem 1.1.33 (Dynkin’s π-λ-Theorem). Let Ω be a non-empty set and let A 2Ω be a Ă π-system. Then δ A σ A . (1.1.25) p q“ p q Proof. Since any σ-algebra is a Dynkin system also, it is clear that ‘ ’ holds in (1.1.25). For the Ă converse inclusion, we observe that it is sufficient to show that δ A is a π-system, which due p q to Theorem 1.1.32 would imply that δ A is a σ-algebra (containing A), so ‘ ’ would follow. p q Ą For this purpose we define for arbitrary A δ A the set P p q

δA A : B δ A : A B δ A . p q “ P p q X P p q ( We claim that for any A δ A , the set system δA A is a Dynkin system. Indeed, we have P p q p q the following:

• Ω δA A , since A Ω A δA A by choice of A; P p q X “ P p q c • if B δA A , then also B A δ A ; hence, Lemma 1.1.23 supplies us with A B P p q X cP p q X “ A B A δ A , and thus B δA A also; zp X q P p q P p q

• if Bn is a sequence of pairwise disjoint sets such that Bn δA A for all n N, then we p q P p q P have Bn A δ A for all n N, and furthermore those sets are pairwise disjoint. As a X P p q P consequence, 9 8 9 8 Bn A Bn A δ A , n 1 n 1 “ X “ “ p X q P p q ´ď ¯ ď 9 8 i.e., n 1Bn δA A . “ P p q Ť 1.2. SET FUNCTIONS 17

Hence, δA A forms a Dynkin system. p q Next, we observe that since A is a π-system, we have A δA A for all A A. Since δA A is Ă p q P p q a Dynkin systems for A A due to the above, we deduce that δ A δA A for any A A. P p q Ă p q P This, however, implies that for all B δ A and A A we have A B δ A , hence A δA A P p q P X P p q Ă p q for all A δ A now, and thus δ A δA A for all A δ A . In particular, this implies that P p q p qĂ p q P p q δ A is a π-system. In combination with Theorem 1.1.32 this establishes the fact that δ A is p q p q a σ-algebra, hence δ A σ A , which finishes the proof. p qĄ p q Corollary 1.1.34. In the notation of Lemma 1.1.31 we have for each j 1, 2,..., 10 that P t u d B R δ Ej . p q“ p q Proof. This is a direct consequence of Lemma 1.1.31 in combination with Theorem 1.1.33.

1.2 Set functions

1.2.1 Properties of set functions Definition 1.2.1. For Ω a non-empty set and E 2Ω, a function µ : E R is called a set Ă Ñ function. The set function µ is called

• monotone if µ A µ B for all A, B E with A B; p qď p q P Ă • additive if n n µ Ai µ Ai “ p q i 1 i 1 ´ ď“ ¯ ÿ“ n for all pairwise disjoint A1,...,An E with i 1 Ai E; P “ P • subadditive if Ť n µ A µ Ai p qď p q i 1 ÿ“ n for all A, A1,...,An E with A i 1 Ai; P Ă “ • σ-additive if Ť 8 8 µ Ai µ Ai (1.2.1) “ p q i 1 i 1 ´ ď“ ¯ ÿ“ for any sequence of pairwise disjoint sets A , A ,... E with 8 Ai E such that the 1 2 P i 1 P right-hand side of the previous equation is well-defined; “ Ť • σ-subadditive if 8 µ A µ Ai p qď i 1 p q ÿ“ for any A, A , A ,... E with A 8 Ai such that the right-hand side of the previous 1 2 P Ă i 1 equation is well-defined; “ Ť Remark 1.2.2. In the context of Definition 1.2.1, as for ‘usual’ functions, we will commonly write µ ν for two set functions on E if for all A E we have ď P µ A ν A . p qď p q 18 CHAPTER 1. SET FUNCTIONS

Apart from a short excursion (see Definition 1.2.11 below which will prove useful later on) we will only be interested in non-negative set functions. Some types of set functions will occur frequently, hence we introduce the following terminology.

Definition 1.2.3. Let S be a semiring over Ω, and let µ : S 0, be a set function with Ñ r 8s µ 0. Then µ is called a pHq “ (a) content (‘Inhalt’) if µ is additive;

(b) pre-measure (‘Pr¨amaß’) if µ is σ-additive;

(c) measure (‘Maß’) if µ is a pre-measure and S is a σ-algebra;

(d) probability measure (‘Wahrscheinlichkeitsmaß’) if µ is a measure with µ Ω 1. p q“ Remark 1.2.4. Show that the items in Definition 1.2.3 become more and more restrictive. I.e., if (d) is satisfied, then (c) is also satisfied; if (c) is satisfied, then (b) is also satisfied, and so on. On the other hand, one can find examples of a content which is no pre-measure, of pre-measures which are no measures, and so on (exercise).

Definition 1.2.5. Let S be a semiring over Ω and let µ : S 0, be a content. Ñ r 8s (a) Then µ is called finite if µ S for all S S. Furthermore, µ is called σ-finite if there p qă8 P exists a sequence Sn of sets Sn S with p q P • µ Sn n N, p qă8 @ P and • 8 Sn Ω. n 1 “ ď“ (b) A set N S is called a (µ-)null set (‘Nullmenge’) if µ N 0.8 P p q“ Definition 1.2.6. Let Ω , F a σ-algebra over Ω, and µ : F 0, a measure, Then the ‰H Ñ r 8s triplet Ω, F,µ is called a measure space. p q A measure space Ω, F,µ is called a σ-finite measure space if µ is σ-finite. p q If µ is a probability measure, then a measure space Ω, F,µ is called a probability space. p q Example 1.2.7 (Contents, pre-measures). (a) On the semiring I of left-open right-closed intervals introduced in Example 1.1.6 we can define a σ-finite content as follows. Let f : R 0, be continuous and set Ñ r 8q x F x : f r dr, x R, p q “ p q P ż0 and where the integral is to be interpreted in the Riemann sense. Then F is non-decreasing and hence µ I : F b F a , p q “ p q´ p q for I a, b I defines a content on I. Indeed, we have µ F a F a (for any “ p s P pHq “ p q´ p q a R), and for ai, bi I, 1 i n, with P p s P ď ď 9 n ai, bi I, i 1 “ p s P 8Some authors consider any B Ă Ω for which weď find N P S with B Ă N and µpNq“ 0 a null set. 1.2. SET FUNCTIONS 19

we can, after possibly permuting the indices, write 9 n a, b ai, bi i 1 p s“ “ p s ď with a a1 b1 a2 b2 a3 . . . bn 1 an bn b. Hence, since “ ď “ ď “ ď ´ “ ď “ n n µ a, b F b F a F bi F ai µ ai, bi , pp sq “ p q´ p q“ i 1 p q´ p q“ i 1 pp sq ÿ“ ÿ“ we deduce that µ is additive and hence a content indeed.

For Sn : n,n S we obtain that Sn R and µ Sn F n F n , “ p´ s P n N “ p q “ p q´ p´ qă8 hence µ is σ-finite. P Ť Using problem 3 of Homework 3 (see also Example 1.1.11) we can uniquely extend µ to a content on not only the semiring S, but also on the ring R generated by S.

(b) For our purposes, the most important content / pre-measure arguably is the d-dimensional Lebesgue content / pre-measure λd. It is defined on the set of d-dimensional hyper-cuboids Id introduced in (1.1.4) as follows. For a, b Id we have a, b Rd with a b, and its p s P P ď d-dimensional Lebesgue pre-measure is defined as

λd : Id 0, , Ñ r 8q d (1.2.2) a, b bi ai . p s ÞÑ i 1p ´ q ź“ It is not too hard to verify that λd defines a content on Id; indeed, for the case d 1 this “ is a consequence of Part (a) of this example, where we choose f 1. For the case d 2 ” ě we refer to the proof of Proposition 1.2.8 (alternatively, see [Bau92, Satz 4.3]). d d d and again choosing the sets Sn : n,n I we get that R Sn and µ Sn “ p´ s P “ n N p q “ 2n d, thus λ is σ-finite. P p q Ť In fact, λd even defines a pre-measure:

Proposition 1.2.8. The function λd defined in (1.2.2) defines a pre-measure on Id.

Proof. The proof is slightly more technical than what we did in Part (a) of this example for the case d 1; however, the key ideas are present here already. We refer to [Els05, Satz “ II.3.1] for a proof (in this source, several strategies for proving the result are given – the one following [Els05, Satz II.3.8 b)] is closest to Part (a) of this example).

Example 1.2.9 (Measures). (a) Let Ω be an arbitrary set and define the counting measure (‘Z¨ahlmaß’)

µ : 2Ω 0, Ñ r 8s A A , ÞÑ | | where for A Ω we denote by A the number of elements of A if A is finite, and Ă | | 8 otherwise.

Exercise 1.2.10. Show that µ indeed defines a measure on 2Ω. Is it σ-finite? 20 CHAPTER 1. SET FUNCTIONS

(b) Let X be uncountable and define F to be the σ-algebra over X that contains all sets A X Ă for which either A or Ac is countable. (Check that this defines a σ-algebra indeed!) For A F define the measure µ via P 1, if A uncountable, µ A : (1.2.3) p q “ 0, if A countable. "

Check that µ defines a measure on X, F . We have µ 0, and for a sequence An p q pHq “ p q of pairwise disjoint sets with An F for all n N, we have that n N An is countable if P P P and only if An is countable for each n N. Thus, P Ť

1, if some An is uncountable, µ An “ 0, if all An are countable. n N " ´ ďP ¯ In addition, if one An˚ is uncountable, then, since the An are pairwise disjoint and in F, we get that An is countable for all n N with n n . Thus, P ‰ ˚

1, if some An is uncountable, µ An p q“ 0, if all An are countable. n N " ÿP This shows that µ is σ-additive and hence a measure. Also, observe that there are many µ-null sets, not just . H Furthermore, µ is σ-finite; indeed, µ is a finite (even a probability) measure since µ X p qă , and any finite measure is σ-finite as we can choose Sn : X for all n N. 8 “ P On the other hand, if we replaced 1 by in (1.2.3), then µ would still be a measure 8 (check!) but it would not be σ-finite anymore. Indeed, if we had a sequence Sn with p q Sn F and µ Sn , then this would imply that Sn is countable for all n N, and in P p qă8 P particular we would infer that Sn n N ďP is countable again. Since we assumed X to be uncountable, a fortiori we would deduce n N Sn X, hence µ cannot be σ-finite. P ‰ The aboveŤ were relatively simple examples of set functions, which could be defined explicitly for all sets we were interested in. If, however, we would try to directly define a measure on B Rd which is consistent with the content of d-dimensional volume for the hyperrectangles p q Id B Rd , we would run into troubles: The reason is just that we do not have an explicit Ă p q hold on elements of B Rd (as we had e.g. for generated algebras, cf. Homework 2 on sheet 1). p q Hence, one of our goals will be to give at least an abstract machinery of extending such simple notions of volume on basic sets to bigger systems of sets, see Section 1.3 below. We now introduce the concept of a signed measure here for the sake of completeness. This is not very essential in probability theory, but on the one hand it will turn out that it does not really make things more complicated, and on the other hand it makes our short introduction to measure theory a bit more complete.

Definition 1.2.11. For a σ-algebra F over Ω, we call a set function µ : F , with Ñ r´8 8s µ 0 a signed measure if µ is σ-additive.9 pHq “ Exercise 1.2.12. (a) Show that if µ is a signed measure, then its range is either a subset of , or of , . r´8 8q p´8 8s 9In particular, we require all sums occurring on the right-hand side of (1.3.4) to be well-defined. Alternatively we could also demand the restriction of the ranges established in Exercise 1.2.12 in the definition already, if that makes you feel more comfortable. 1.2. SET FUNCTIONS 21

(b) If µ is a signed measure and A, B F with A B and µ B , , then also µ A , . P Ă p q P p´8 8q p q P p´8 8q Although in probability theory we will mostly be interested in probability measures, it will turn out useful to be able to cover the case of ‘nicely’ behaved infinite contents (and measures) also. Lemma 1.2.13. Let S be a semiring and let µ be a content on S. Then: (a) µ is monotone;

(b) if S is a ring, then for all A, B S we have P µ A B µ A B µ A µ B ; (1.2.4) p Y q` p X q“ p q` p q (c) µ is subadditive, and if µ is σ-additive, then µ is also σ-subadditive;

n (d) if S is a ring and A1,...,An S with µ i 1 Ai , then P p “ qă8 n n Ť k 1 µ Ak 1 ´ µ Ai . . . Ai (inclusion-exclusion formula). “ p´ q p 1 X X k q ´ k 1 ¯ k 1 1 i1 ... ik n ď“ ÿ“ ď ăÿă ď (1.2.5) 9 n Proof. (a) Let A, B S with A B. Since S is a semiring we can write B A j 1Cj some P Ă z “ “ pairwise disjoint Cj S, 1 j n, some n N. Thus, A, Cj, 1 j n is a family of P ď ď P ď ď Ť pairwise disjoint elements of S such that 9 n B A 9 Cj. j 1 “ Y “ ď Hence, the additivity of the content µ gives

n µ B µ A µ Cj µ A , p q“ p q` j 1 p qě p q ÿ“ which implies the monotonicity, since µ 0. ě (b) We have A A B 9 A B “ p X qYp z q S S P P and hence the additivity of µ supplies usloomoon with loomoon

µ A µ B µ A B µ A B µ B µ A B µ A B , p q` p q“ p X q` p z q` p q“ p X q` p Y q where we also took advantage of A B 9 B A B. z Y “ Y n (c) Let A, A1,...,An S with A i 1 Ai. Similar to the proof of Theorem 1.1.32 we write P Ă “ B1 : A1 and for k 2, “ ě Ť k 1 k 1 ´ ´ Bk : Ak Ai Ak Ai Ak, “ z i 1 “ i 1 z Ă ď“ č“ so in particular n n Ak Bk. (1.2.6) “ k 1 k 1 ď“ ď“ By definition of a semiring, Ak Ai is the finite disjoint union of elements of S. Since S is z a π-system, it is not hard to show that the same applies to Bk, 22 CHAPTER 1. SET FUNCTIONS

i.e., nk 9 k k k Bk C , some pairwise disjoint C ,...,C S. i 1 i 1 nk “ “ P Using this representationď for Bk, in a similar fashion we get that

mk 9 k k k Ak Bk D some pairwise disjoint D ,...,D S. i 1 i 1 mk z “ “ P ď In combination with (1.2.6) we therefore get

n n n n nk k 9 9 9 k additivity k µ A µ Bk A µ C A µ C A p q“ k 1 X “ k 1 i 1 i X “ p i X q “ “ “ k 1 i 1 ´ď ¯ ´ď ď ¯ ÿ“ ÿ“ n nk mk n k k additivity µ C µ D µ Ak , ď p i q` p i q “ p q k 1 i 1 i 1 k 1 ÿ“ ´ ÿ“ ÿ“ ¯ ÿ“ which shows the subadditivity, and where we also took advantage of the monotonicity of µ in order to obtain the inequality. If µ is σ-additive, then the σ-subadditivity follows in essentially the same way, replacing n in the above by . 8 (d) Using Lemma 1.1.12, the proof proceeds in the same way as that of [Dre18, Lemma 1.3.10] (that proof was for µ a probability measure on a σ-algebra, but it proceeds in the same way for µ a content on a ring).

The following definition will allow us to introduce the notion of continuity for functions defined on sets also.

Definition 1.2.14. We write An A as n , p q Ò Ñ8 if An n N is a sequence of sets such that An An 1 for all n N, and A n N An. p q P Ă ` P “ Similarly, we write P Ť An A as n , p q Ó Ñ8 if An n N is a sequence of sets such that An 1 An for all n N, and A n N An. p q P ` Ă P “ P Definition 1.2.15. Let S be a semiring and let µ be a content defined on S.Ş

(a) We say that µ is continuous from below (‘stetig von unten’) if for any sequence of sets An with An A as well as An, A S for all n N, one has p q Ò P P

lim µ An µ A . n Ñ8 p q“ p q

(b) We say that µ is continuous from above (‘stetig von oben’) if for any sequence of sets An with An A as well as An, A S and µ An for all n N, one has p q Ó P p qă8 P

lim µ An µ A . n Ñ8 p q“ p q

(c) We say that µ is continuous in (‘stetig in ’) if for any sequence of sets An with H H p q An as well as An S and µ An for all n N, one has ÓH P p qă8 P

lim µ An 0. n Ñ8 p q“ 1.2. SET FUNCTIONS 23

As the name of this section suggests, we will mostly be interested in measures, i.e., in particular the property of a set function being σ-additive will play a crucial role. Thus, it turns out useful to have other characterizations (and related properties) of σ-additivity available.

Proposition 1.2.16. Let µ be a content on a ring R. Consider the following properties:

(a) µ is σ-additive (i.e., it is a pre-measure);

(b) µ is continuous from below;

(c) µ is continuous from above;

(d) µ is continuous in ; H We have the following implications:

a b c d . p q ðñ p q ùñ p q ðñ p q If in addition µ is finite, then in the previous display can be replaced by . ùñ ðñ Proof. a b : 1p q ùñ p q1 Let An and A be as in the assumptions of (b). As done several times before already, we define p q B1 : A1 as well as “ n 1 ´ Bn : An Ai “ z i 1 ď“ R P loomoonR P for n 2. Then the Bn form a sequence of pairwiselooooomooooon disjoint sets with Bn R, and such that n ě n p q P Bi Ai for all n N. As a consequence, and since µ is σ-additive by assumption, i 1 “ i 1 P we“ get “ Ť Ť n 8 µ σ-additive 8 µ A µ Bi µ Bi lim µ Bi lim µ An , n n p q“ i 1 “ i 1 p q“ Ñ8 i 1 p q“ Ñ8 p q ´ ď“ ¯ ÿ“ ÿ“ which shows that µ is continuous from below. b a : 1p q ùñ p q1 Let An be a sequence of pairwise disjoint sets with An R for all n N as well as n N An R. p q 9 n P P P P Defining Bn : i 1Ai, we have a sequence Bn with Bn R and such that Bn B : “ “ p q P Ť Ò “ i N Bi R as n . Henceforth, the continuity from below supplies us with P P ŤÑ8 n Ť 8 µ continuous from below additivity 8 µ Ai µ B lim µ Bn lim µ Ai µ Ai , n n i 1 “ p q “ Ñ8 p q “ Ñ8 i 1 p q“ i 1 p q ´ ď“ ¯ ÿ“ ÿ“ which proves the σ-additivity of µ. b c : 1p q ùñ p q1 Let An and A be as in the assumptions of (c). Since µ An and An An 1 for all n N, p q p qă8 Ą ` P the additivity of µ implies that

µ A An µ A µ An . (1.2.7) p 1z q“ p 1q´ p q

Now because An A as n , we get that A An A A as n , and therefore the Ó Ñ 8 1z Ò 1z Ñ 8 validity of (b) implies the existence of the left-hand side of

(1.2.7) µ A1 A lim µ A1 An lim µ A1 µ An µ A1 lim µ An , n n n p z q“ Ñ8 p z q “ Ñ8 p q´ p q“ p q´ Ñ8 p q 24 CHAPTER 1. SET FUNCTIONS exists. Hence, using µ A , p 1qă8 lim µ An µ A1 µ A1 A µ A , n Ñ8 p q“ p q´ p z q“ p q where in the last equality we took advantage of the additivity of µ. c d : This is obvious. 1p q ùñ p q1 d c : 1p q ùñ p q1 Let An and A be as in (c). Then Bn : An A as n . Furthermore, the monotonicity p q “ z ÓH Ñ8 of µ in combination with µ An implies that µ Bn for all n N. As a consequence, p qă8 p qă8 P we deduce from the continuity of µ in that H µ An µ Bn µ A µ A as n , p q“ p q` p qÑ p q Ñ8 which implies the continuity of µ from above. It remains to prove b c in the case of µ being finite. For this purpose, let An and A 1p q ðù p q1 p q be as in (b). Then A An as n and µ An for all n N. Hence, the continuity of z ÓH Ñ8 p qă8 P µ in in combination with the finiteness of µ supply us with H µ A µ An µ A An 0 as n , p q´ p q“ p z qÑ Ñ8 and hence limn µ An µ A , which shows the continuity from below. Ñ8 p q“ p q

We now give a first application of Dynkin’s π-λ-Theorem (Theorem 1.1.33) that will prove useful later on.

Theorem 1.2.17. Let E be a π-system over Ω. Assume there exists a sequence of sets En p q with En E for all n N and En Ω. Furthermore, let µ ,µ denote two measures on P P n N “ 1 2 Ω,σ E such that P p p qq Ť (a) µ E µ E E E, (1.2.8) 1p q“ 2p q @ P (b) µ En µ En n N. (1.2.9) 1p q“ 2p qă8 @ P Then the measures µ1 and µ2 coincide and are σ-finite.

Proof. The σ-finiteness is a direct consequence of (1.2.9). Hence, it remains to show that µ1 and µ coincide. For E E with µ E , define 2 P 1p qă8 DE : D σ E : µ D E µ D E . “ P p q 1p X q“ 2p X q Claim 1.2.18. DE is a Dynkin system. (

Proof. We have Ω DE since the middle equality of µ Ω E µ E µ E µ Ω E P 1p X q“ 1p q“ 2p q“ 2p X q follows from E E. P Also, for D DE we have P µ Dc E µ E µ D E µ E µ D E µ Dc E , 1p X q“ 1p q´ 1p X q“ 2p q´ 2p X q“ 2p X q where the second equality follows from D DE and the fact that E E. Hence, we deduce c P P D DE. P It remains to show that DE is stable unter countable unions of pairwise disjoint sets. Thus, for a sequence Dn of pairwise disjoint sets with Dn DE for all n N, we deduce p q P P

µ1 Dn E µ1 Dn E µ2 Dn E µ2 Dn E , n N X “ n N p X q“ n N p X q“ n N X ´ ďP ¯ ÿP ÿP ´ ďP ¯ where the first and last equality exploit that the Dn E are pairwise disjoint, and the middle X equality takes advantage of the fact that Dn DE for all n N. P P 1.3. CARATHEODORY’S´ EXTENSION THEOREM (‘MASSERWEITERUNGSSATZ’) 25

Due to the above claim, DE is a Dynkin system with E DE. Since E is a π-system, Theorem Ă 1.1.33 implies that σ E DE. In particular, this implies that for each E σ E and n N, p q“ P p q P

µ E En µ E En . (1.2.10) 1p X q“ 2p X q n 1 Setting Fn : En i ´1 Ei we have the disjoint union “ z “ Ť 9 Fn Ω. n N P “ ď Thus, using that for E σ E we have Fn E σ E , we obtain in combination with (1.2.10) P p q X P p q that µ Fn E µ En Fn E µ En Fn E µ Fn E . 1p X q“ 1p X p X qq “ 2p X p X qq “ 2p X q Summing these identities over n N we obtain µ E µ E , which finishes the proof, since P 1p q “ 2p q E σ E had been chosen arbitrarily. P p q

Corollary 1.2.19. If µ and µ are measures on Ω,σ E with µ Ω µ Ω and such 1 2 p p qq 1p q“ 2p qă8 that (1.2.8) holds, then µ µ . 1 “ 2 Proof. We define E : E Ω . Then the assumptions of Theorem 1.2.17 are fulfilled with E “ Y t u replaced by E and En : Ω for all n N. Since σ E σ E , the result follows. “ P p q“ p q r 1.3 Carath´eodory’sr extension theoremr (‘Maßerweiterungssatz’)

In this section we will see how to extend contents to measures. As it turns out, in pursuing this endeavor it will be useful to make a detour via so-called ‘outer measures’ introduced below: Contents will give rise to ‘nice’ outer measures (see Theorem 1.3.9 below), which themselves (by restriction to ‘well-behaved’ sets) give rise to measures (see Theorem 1.3.5 below). We start with investigating how to go from outer measures to measures, and start with the definition of the former. Definition 1.3.1. For Ω a non-empty set, a set function µ : 2Ω 0, is called an outer ˚ Ñ r 8s measure (‘¨außeres Maß’), if (a) µ 0; ˚pHq “ (b) µ˚ is monotone;

(c) µ˚ is σ-subadditive.

Remark 1.3.2. Since an outer measure µ˚ is σ-subadditive by definition, we deduce that for Ω Ω A ,...,An 2 , we have, setting Am : 2 for m n, that 1 P “HP ą n n 8 σ-subadditivity 8 Definition 1.3.1 (a) µ˚ Ai µ˚ Ai µ˚ Ai µ˚ Ai . “ ď p q “ p q i 1 i 1 i 1 i 1 ´ ď“ ¯ ´ ď“ ¯ ÿ“ ÿ“ Thus, an outer measure is also subadditive. We will now introduce the concept of sets which are measurable with respect to an outer measure. Such sets will be the ‘well-behaved’ sets alluded to above; they form a σ-algebra on which (the restriction of) the outer measure induces a measure (see Theorem 1.3.5 below). Definition 1.3.3. Let Ω be a non-empty set and assume an outer measure µ : 2Ω 0, to ˚ Ñ r 8s be given. Then A 2Ω is called µ -measurable (‘µ -messbar’), if for all B 2Ω, P ˚ ˚ P c µ˚ B µ˚ B A µ˚ B A . (1.3.1) p q“ p X q` p X q 26 CHAPTER 1. SET FUNCTIONS

Remark 1.3.4. Since an outer measure is subadditive due to Remark 1.3.2, we infer that c µ˚ B µ˚ B A µ˚ B A p qď p X q` p X q holds true for all A, B 2Ω. Therefore, (1.3.1) is equivalent to P c µ˚ B µ˚ B A µ˚ B A . p qě p X q` p X q The next results is fundamental in our construction of measures and provides us with a recipe for how to obtain a measure from an outer measure. Theorem 1.3.5 (Carath´eodory). For an outer measure µ : 2Ω 0, , the set ˚ Ñ r 8s Mµ˚ : A Ω : A is µ˚-measurable “ Ă is a σ-algebra over Ω, and the restriction (

µ˚ M (1.3.2) | µ˚ of µ˚ to Mµ˚ is a measure. Proof. We start with proving the following claim.

Claim 1.3.6. Mµ˚ is an algebra over Ω. Proof. By Definition 1.3.1 (a) we have µ 0, and hence (1.3.1) trivially holds true for ˚pHq “ A Ω. “ c Furthermore, since (1.3.1) is symmetric in A and A we immediately deduce that Mµ˚ is stable under complements. It remains to show Property (c) of Definition 1.1.14. For this purpose, let A , A Mµ˚ , and 1 2 P let B Ω be arbitrary. Then Ă A1 M ˚ P µ c µ˚ B µ˚ B A µ˚ B A p q “ p X 1q` p X 1q M A2 µ˚ c c c P µ˚ B A µ˚ B A A µ˚ B A A “ p X 1q` p X 1 X 2q` p X 1 X 2q µ˚ subadditive 9 c c µ˚ B A1 B A1 A2 µ˚ B A1 A2 ě p X qYp X X q c` p X p Y q q µ˚ B A A µ˚ B A A . “ p X p 1 Y` 2qq ` p X p 1 Y 2q˘ q This shows that A B Mµ˚ , and therefore Mµ˚ is an algebra. Y P In order to show that Mµ˚ is a σ-algebra, due to Theorem 1.1.32 it is sufficient to show that it is a λ-system; Since Mµ˚ is an algebra due to Claim 1.3.6, we immediately get Ω Mµ˚ and we also know P that Mµ˚ is stable under complements. Thus, it remains to show the union of a countable family of pairwise disjoint elements of Mµ˚ is contained in Mµ˚ again. For this purpose, let An be a sequence of pairwise disjoint sets with An Mµ˚ for all n N. p q P P We start with inductively proving that for all n N, P n n µ˚ B Ai µ˚ B Ai B Ω. (1.3.3) X i 1 “ i 1 p X q @ Ă ´ ď“ ¯ ÿ“ Indeed, for n 1 this boils down to a tautology, so assume (1.3.3) holds for arbitrary n N. “ n P Then, using that the Ai are pairwise disjoint and that i 1 Ai Mµ˚ , “ P n 1 n 1 n n 1 n ` ` Ť ` c µ˚ B Ai µ˚ B Ai Ai µ˚ B Ai Ai X i 1 “ X i 1 X i 1 ` X i 1 X i 1 ´ ď“ ¯ n ´´ ď“ ¯ ď“ ¯ ´´ ď“ ¯ ´ ď“ ¯ ¯ µ˚ B Ai µ˚ B An 1 , “ p X q` p X ` q i 1 ÿ“ 1.3. CARATHEODORY’S´ EXTENSION THEOREM (‘MASSERWEITERUNGSSATZ’) 27 where we used the induction assumption in the last step. This finishes the induction step and therefore establishes (1.3.3). We infer that n n n c 8 c µ˚ B µ˚ B Ai µ˚ B Ai µ˚ B Ai µ˚ B Ai , p qě X i 1 ` X i 1 ě i 1 p X q` X i 1 ´ ď“ ¯ ´ ´ ď“ ¯ ¯ ÿ“ ´ ´ ď“ ¯ ¯ where we used the fact that Mµ˚ is an algebra to get the inequality, and the inequality is a consequence of display (1.3.3) in combination with the monotonicity of µ . Taking n in ˚ Ñ 8 the previous display and using that µ˚ is σ-subadditive, we arrive at

8 8 c µ˚ B µ˚ B Ai µ˚ B Ai p qě p X q` X i 1 i 1 ÿ“ ´ ´ ď“ ¯ ¯ (1.3.4) 8 8 c µ˚ B Ai µ˚ B Ai , ě X i 1 ` X i 1 ´ ď“ ¯ ´ ´ ď“ ¯ ¯ which shows that i8 1 Ai Mµ˚ . Therefore, Mµ˚ is a λ-system, which in combination with “ P Claim 1.3.6 and Theorem 1.1.32 implies that Mµ˚ is a σ-algebra also. Ť Last but not least, we have to show (1.3.2). Since µ is an outer measure, we have µ 0 by ˚ ˚pHq “ definition. Also, for a sequence An of pairwise disjoint sets with An Mµ˚ we have, choosing p q P B : n8 1 An, that the first inequality in (1.3.4) supplies us with “ “ Ť 8 8 µ˚ An µ˚ An . n 1 ě n 1 p q ´ ď“ ¯ ÿ“ In combination with the σ-subadditivity of µ this concludes the proof that µ M is a measure. ˚ ˚| µ˚

Definition 1.3.7. A measure µ on a measurable space Ω, F (or the measure space Ω, F,µ p q p q for that matter) is called complete, if for all M F with µ M 0, and all N M, one has P p q “ Ă that N F as well. P Exercise 1.3.8. Show that the measure µM˚ on Ω, Mµ˚ from Theorem 1.3.5 is complete µ˚ p q in the sense of Definition 1.3.7. In what follows, we will assume the standard convention that

inf and sup . H“8 H“´8

Theorem 1.3.9. Let A 2Ω with A, and let µ : A 0, be a set function with Ă HP Ñ r 8s µ 0. For A Ω define the set function µ : 2Ω 0, via pHq “ Ă ˚ Ñ r 8s 8 8 µ˚ A : inf µ Ai : A1, A2,... A, and A Ai , A Ω. (1.3.5) p q “ i 1 p q P Ă i 1 Ă ! ÿ“ ď“ ) Then,

(a) µ˚ defines an outer measure;

(b) if A is a semiring and µ is a content, then A Mµ˚ ; Ă (c) if A is a semiring and if µ is not only a content but also σ-subadditive, then

µ˚ A µ. | “ 28 CHAPTER 1. SET FUNCTIONS

Proof. (a) Setting An : A, the definition in (1.3.5) of µ immediately yields µ 0. “HP ˚ ˚pHq “ The monotonicity of µ˚ also is a direct consequence of (1.3.5).

It remains to show the σ-subadditivity. For this purpose, let An be a sequence with p q An Ω for all n N, and assume without loss of generality that µ An for all Ă P ˚p qă8 n N. Then for ε 0 arbitrary, choose for each n N a covering An,i i N of An with P ą P p q P An,i A for all i N and such that P P

8 n µ An,i µ˚ An 2´ ε. (1.3.6) p qď p q` i 1 ÿ“ Then An An,i, n N Ă i,n N ďP ďP and by the definition of µ˚ in combination with (1.3.6), we deduce

µ˚ An µ An,i µ˚ An ε. n N ď n N i N p q ď n N p q` ´ ďP ¯ ÿP ÿP ÿP (1.3.6) ˚ ´n µ An 2 ε ďlooooomooooonp q` Since ε 0 had been chosen arbitrarily, we deduce ą

µ˚ An µ˚ An , n N ď n N p q ´ ďP ¯ ÿP which shows the σ-subadditivity of µ˚.

(b) For A A arbitrary we want to show that for all E Ω we have P Ă c µ˚ E A µ˚ E A µ˚ E , p X q` p X q“ p q and without loss of generality we can assume µ E . Then for ε 0 arbitrary but ˚p qă8 ą fixed, we find a sequence An such that p q • An A for all n N; P P • E n8 1 An; Ă “ • Ť 8 µ An µ˚ E ε. (1.3.7) n 1 p qď p q` ÿ“ Now since A is a semiring, for any n N, we find mn N and An,i, 1 i m, such that P P ď ď

mn c 9 An A An A An,i. i 1 X “ z “ “ ď As a consequence, we get

mn c 8 8 9 µ˚ E A µ˚ E A µ An A µ˚ An,i i 1 p X q` p X qď n 1 p X q` n 1 “ “ “ ÿ ÿmn `ď ˘ 8 8 (1.3.7) µ An A µ An,i µ An µ˚ E ε. ď n 1 p X q` i 1 p q “ n 1 p q ď p q` ÿ“ ´ ÿ“ ¯ ÿ“ Since ε 0 was chosen arbitrarily and using Remark 1.3.4, this finishes the proof. ą 1.3. CARATHEODORY’S´ EXTENSION THEOREM (‘MASSERWEITERUNGSSATZ’) 29

(c) Let A A. On the one hand, by definition of µ we have µ A µ A . On the other P ˚ ˚p q ď p q hand, for any covering A1, A2,... A with A i8 1 Ai we have P Ă “ Ť 8 8 µ A µ A Ai µ Ai A , (1.3.8) p q“ p X q ď p X q i 1 n 1 ´ ď“ ¯ ÿ“ where we used that µ is σ-subadditive in combination with the fact that Ai A A for all X P i N. Since display (1.3.8) holds for any suitable covering, this shows that µ A µ A , P p qď ˚p q which finishes the proof.

We can summarize Theorems 1.3.9 and 1.3.5 to obtain the following corollary.

Corollary 1.3.10. Let S be a semiring and let µ be a content on S which is σ-subadditive. By µ˚ we denote the corresponding outer measure as defined in (1.3.5). Then there exists a measure µ : Mµ˚ 0, suchr that µ and µ coincide on S. Ñ r 8s Furthermore,r if µ is σ-finite, then so is µ, and in this case the restriction of µ to σ S Mµ˚ p qĂ is the unique extension of µ to a measurer on σ S . r p q r r Proof. The existence of a measure µ on M ˚ extending µ follows from Theorems 1.3.9 and r µ 1.3.5. r If µ is σ-finite, then there exists a sequence Sn such that Snr S, n N Sn Ω, and µ Sn p q P “ p qă8 for all n N. In particular, this implies that µ is σ-finite. P P Ť Furthermore,r using Theorem 1.2.17 we deduce that there exists at most one measure on σ S p q which extends µ, so the extension is unique.

1.3.1 Lebesguer measure Our principal goal in this section is to extend the elementary content λd that we had defined on hyperrectangles in (1.2.2) to a measure on the σ-algebra generated by the hyperrectangles; due to Lemma 1.1.31, this σ-algebra coincides with B Rd . p q Theorem 1.3.11 (d-dimensional Lebesgue measure). There exists a uniquely determined σ- finite measure λd on Rn, B Rd such that p p qq d d d λ a, b bi ai a, b R with a b. (1.3.9) pp sq “ p ´ q @ P ă i 1 ź“ λd is called the d-dimensional (Borel-) Lebesgue measure.

Proof. In order to distinguish Lebesgue content and Lebesgue measure, let us write λd for the Lebesgue content. We will apply Corollary 1.3.10 to the case S Id and µ λd. In combination withr the fact “ “ that σ Id B Rd (cf. Lemma 1.1.31), the measure whose existence is implied by Corollary p q “ p q 1.3.10 will be the unique one defined on B Rd and satisfyingr (1.3.9). p q r We have to show that λd is σ-subadditive on Id. (1.3.10) For this purpose let a, b Rd with a b be given, and let a n , b n , where a n , b n Rd P r ď p p q p qs p q p q P with a n b n for all n N, be a sequence of hypercubes (i.e., elements of Id) such that p qď p q P

8 a, b a n , b n . p sĂ n 1p p q p qs ď“ 30 CHAPTER 1. SET FUNCTIONS

It is sufficient to show that for the content λd on Id we have

r8 λd a, b λd a n , b n . (1.3.11) pp sq ď n 1 pp p q p qsq ÿ“ r r We will show this inequality by exploiting a continuity property of the content λd in order to apply a compactness argument which then reduces the above to a finite setting. To be precise, for ε 0 arbitrary choose for each n N an element bε n Rd such that bε nr b n and ą P p q P p q ą p q such that d ε d n λ a n , b n λ a n , b n 2´ ε. pp p q p qsq ď pp p q p qsq ` In addition, choose aε Rd such that aε a and such that P r ą r λd aε, b λd a, b ε. (1.3.12) pp sq ě pp sq ´ Now r r 8 aε, b a, b a n , bε n . r s Ă p sĂ n 1p p q p qq ď“ Since the left-hand side is a compact set, there exists N N such that 0 P

N0 aε, b a n , bε n . r sĂ n 1p p q p qq ď“ Hence, using the fact that the content λ is (finitely) subadditive (cf. Lemma 1.2.13), we deduce

(1.3.12) λd a, b λd aε, b ε pp sq ď pp sq ` N0 N0 r λrd a n , bε n ε λd a n , b n 2ε. ď n 1 pp p q p qsq ` ď n 1 pp p q p qsq ` ÿ“ ÿ“ r r In particular, this implies

8 λd a, b λd a n , b n 2ε. pp sq ď n 1 pp p q p qsq ` ÿ“ r r Since ε 0 was chosen arbitrarily, this establishes (1.3.10) and hence (1.3.9). ą The σ-finiteness follows immediately from the σ-finiteness of the Lebesgue content and the fact that Lebesgue content and Lebesgue measure coincide on Id.

Remark 1.3.12. For simplicity write λ˚ for the outer measure induced by the d-dimensional Lebesgue content as defined in (1.3.5). Theorem 1.3.5 actually shows that the measure λd can be d defined on the σ-algebra L R : Mλ˚ of the so-called Lebesgue-measurable sets (in a unique p q “ way, and L Rd is the completion of B Rd , i.e., the smallest complete σ-algebra containing p q p q B Rd ; see [Els05, Corollary II.6.5]), where by λ we denote the outer measure induced by the p q ˚ content λ and (1.3.5). From Theorem 1.3.9 we infer that

r B Rd L Rd 2Ω, p qĂ p qĂ and one can show that both of these inclusions are strict (see [Els05, Example II.4.6]). 1.3. CARATHEODORY’S´ EXTENSION THEOREM (‘MASSERWEITERUNGSSATZ’) 31

1.3.2 Lebesgue-Stieltjes measure Theorem 1.3.13 (Lebesgue-Stieltjes measures). Let a non-decreasing function and right- continuous function F : R R be given. Recall the semiring I of left-open right-closed intervals Ñ in R which had been introduced in Example 1.1.6. Then

µF : I 0, , Ñ r 8q a, b F b F a , rp s ÞÑ p q´ p q defines a content, and there exists a unique σ-finite measure µF on R, B R such that p p qq

µF I µF I for all I I. p q“ p q P µF is called the Lebesgue-Stieltjes measurer of F.

Proof. We have µF 0, and for pairwise disjoint intervals a 1 , b 1 ,..., a n , b n with n pHq “ p p q p qs p p q p qs a, b i 1 a i , b i some a, b R with a b we have p s“ “ p p q p qs P ă r Ť n µF a i , b i F max b i F min a i µF a, b , p p q p qs “ 1 i n p q ´ p1 i n p qq “ pp sq i 1 ď ď ď ď ÿ“ ` ˘ ` ˘ r r since a min1 i n a i and b max1 i n b i . This means that µF is additive also, and hence “ ď ď p q “ ď ď p q a content on I. Since I is a semiring, according to Corollary 1.3.10, the only thingr that is left to show is that µF is σ-subadditive and σ-finite. For the first, we proceed similarly to the proof of Theorem 1.3.11. Indeed, let a, b as well as a n , b n n N with a, b, a n , b n R be given such that p s pp p q p qsq P p q p q P a b and a n b n for all n N, and such that r ď p qď p q P

8 a, b a n , b n . p sĂ n 1p p q p qs ď“ For ε 0 given we take advantage of the right continuity of F in order to deduce the existence ą of aε a such that ą µF aε, b µF a, b ε. pp sq ě pp sq ´ In addition, using the right continuity of F again, for each n N we find bε n R with r r P p q P bε n b n and p qą p q n µF a n , b n µF a n , bε n ε2´ . pp p q p qsq ě pp p q p qsq ´ As in the proof of Theorem 1.3.11 we can then deduce that r r

µF a, b µF a n , b n 2ε. pp sq ď n N pp p q p qsq ` ÿP Since ε 0 had been choosenr arbitrarily, wer infer ą

µF a, b µF a n , b n , pp sq ď n N pp p q p qsq ÿP which implies the desired σ-subadditivity.r r Regarding the σ-finiteness, we observe that µF is σ-finite since µF n,n for all n N, pp´ sqă8 P and hence so is µF . r r Example 1.3.14. (a) Show that for F : R R with F x x the Lebesgue-Stieltjes measure Ñ p q“ 1 µF from Theorem 1.3.13 coincides the one-dimensional Lebesgue measure λ . 32 CHAPTER 1. SET FUNCTIONS

(b) Let f : R 0, be continuous and set Ñ r 8q x F x : f r dr, x R, p q “ p q P ż0 with the right-hand side interpreted as Riemann integral. Then F is non-decreasing and continuous (e.g. due to the Fundamental Theorem of Calculus), and hence we can use Theorem 1.3.13 to deduce the existence of a measure µF on σ I B R . p q“ p q Remark 1.3.15. If we have a closer look at the proof of Theorem 1.3.13 again, we discover that it did not hinge on the special nature of the Lebesgue content, but rather we only needed some continuity properties of the content. In particular, if F1,...,Fn are non-decreasing and right- continuous functions from R to R just as in Theorem 1.3.13, then these give rise to measures µ ,...,µn on R, B R just in the same vein as in Theorem 1.3.13. We can then perform the 1 p p qq proof of Theorem 1.3.11 in essentially the same way as before and obtain that there exists a unique σ-finite measure µ on Rn, B Rn (recall that σ In B Rn due to Lemma 1.1.31) p p qq p q “ p q such that n n µ a, b µi ai, bi , a, b R with a b. pp sq “ pp sq @ P ă i 1 “ n ź We also write µi : µ and call µ the product measure of µ ,...,µn. bi 1 “ 1 In Section 2.4 below“ we will see how to construct product measures not only as products of measures on R, B R but on arbitrary measurable spaces, and also how to construct infinite p p qq products of measures. In what follows, most results and definitions concerning probability spaces can be generalized to finite measure spaces Ω, F,µ , where µ is a finite measure. There is, however, oftentimes p q a more significant difference between the case of µ being finite or infinite, so we have to be a bit more careful when trying to transfer results we have for finite measure spaces to infinite measure spaces.

1.4 Measurable functions, random variables

The Riemann integral had been introduced by partitioning the domain of definition of the integrand (oftentimes intervals in R or hypercubes in Rd) into finer and finer pieces, and then consider the upper and lower Riemann sums. If they converged to the same limit as the partitions got finer and finer, this limit had been defined as the corresponding Riemann integral. The notion of integral we will be introducing later on,10 on the other hand, will be defined for real or complex functions defined on a measure space Ω, F,µ . It can essentially be defined by p q partitioning the range of the integrand into finer and finer pieces. In particular, for an integrand f this will require that preimages of intervals, i.e., sets of the form f 1 a, b , can be measured ´ pr sq by the measure underlying the domain of definition of the function. This means we want expressions of the type µ f 1 a, b to be well-defined, which is equivalent to f 1 a, b F. p ´ pr sqq ´ pr sq P As a consequence, such functions will play a special role. (In the case of Ω, F,µ being a p q probability space, the definition of random variables in [Dre18, Def.1.7.1] had been general enough to serve our purposes.) Definition 1.4.1. Let Ω, F and E, E be measurable spaces. A function f :Ω E is called p q p q Ñ a measurable function (‘messbare Funktion’) if for all A E its preimage under f is contained P in F, i.e., if

1 f ´ A : ω Ω : f ω A F, A E. p q “ t P p q P u P @ P 10It is called the Lebesgue integral, although it not only refers to integrals with respect to the standard Lebesgue measure on pRd, BpRdqq, but with respect to arbitrary measures. 1.4. MEASURABLE FUNCTIONS, RANDOM VARIABLES 33

In this case, f is also said to be F E-measurable (‘F E-messbar’). ´ ´ If not only a measurable space Ω, F but a probability space Ω, F,µ is given, then f as above is p q p q called a (‘Zufallsvariable’) If furthermore E, E R, B R , then f is referred p q “ p p qq to as a real random variable (‘reelle Zufallsvariable’, ‘Zufallsgr¨oße’), and if E, E R, B R , p q “ p p qq we say that f is an extended real random variable (‘erweiterte reelle Zufallsvariable’). Even if we have given only a measurable space Ω, F , we denote the space of extended real p q functions by M, and the subset of non-negative extended real-valued functions by M`. For a random variable X the values X ω , ω Ω, are also called realizations of the random variable p q P X. For the sake of simplicity, functions f : Rd Rk which are B Rd B Rk -measurable are just Ñ p q´ p q called Borel-measurable.

Example 1.4.2. (a) Any constant function from Ω, F to E, E is F E-measurable. p q p q ´ (b) For A Ω the indicator function of A is defined as Ă

1A :Ω 0, 1 Ñ t u 1, if ω A, ω P ÞÑ 0, if ω A. " R

Ω Show that if Ω, F is a measurable space, then for A 2 the indicator function 1A is p q Ă F B R -measurable if and only if A F. ´ p q P (c) Let Ω be a non-empty set and let F, G be two σ-algebras on Ω with F G. The identity Ĺ function id from Ω to Ω, where id ω : ω, is G F-measurable. However, it is not Ω Ωp q “ ´ F G-measurable. ´ (d) If either F 2Ω or E E, , then any function from Ω, F to E, E is F E- “ “ t Hu p q p q ´ measurable.

Remark 1.4.3. Let Ω, F and E, E be measurable spaces. Let furthermore f :Ω E be an p q p q Ñ F E-measurable function. Recall the definition of the trace-σ-algebra EG from Exercise 1.1.19 ´ for G E, and assume that f Ω G. Then, if G E, the function f can also be interpreted Ă p qĂ P as a F EG-measurable function from Ω to G. This is not necessarily true if G E. ´ R To us, random variables will be important to describe the outcomes of experiments that we consider to be random (prime examples being dice rolls or coin tosses). For more intuition on random variables we refer to [Dre18], in particular Section 1.7. Note, however, that in [Dre18] essentially all functions which occurred were measurable by definition anyway since we only investigated nice and simple settings. In general, the measurability has to be established when investigating arbitrary functions between measurable spaces. We will provide a useful tool for this in Theorem 1.4.7 below, but before we will establish some further general properties for measurable functions.

Theorem 1.4.4 (Compositions of measurable functions). Let Ωi, Fi , i 1, 2, 3 , be measur- p q P t u able spaces, and let fi :Ωi Ωi 1 be Fi Fi 1-measurable maps, for i 1, 2 . Ñ ` ´ ` P t u Then the composition

f f :Ω Ω 2 ˝ 1 1 Ñ 3 ω f f ω 1 ÞÑ 2p 1p 1qq is a F F -measurable map from Ω to Ω . 1 ´ 3 1 3 34 CHAPTER 1. SET FUNCTIONS

Proof. For arbitrary F F we have 3 P 3 1 1 1 f f ´ F f ´ f ´ F F , p 2 ˝ 1q p 3q“ 1 p 2 p 3qq P 1 where the latter takes advantage of the fact that

1 f ´ F F 2 p 3q P 2 since f is F F -measurable. This concludes the proof. 2 2 ´ 3 Since σ-algebras are often large, it is practically not feasible to check the measurability condition of Definition 1.4.1. As a remedy, Theorem 1.4.7 below states that it is sufficient to check it on a generator, which is often easier to achieve. We are now going to prepare its proof via a couple of auxiliary results.

Claim 1.4.5 (‘Operationstreue’). Given an arbitrary map f from a non-empty space X to a non-empty space Y , consider the preimage map

1 Y X f ´ : 2 2 Ñ 1 A f ´ A . ÞÑ p q Then the following properties hold:

Y (a) For an arbitrary family Bλ , λ Λ, with Bλ 2 for all λ Λ, p q P P P 1 1 f ´ Bλ f ´ Bλ , “ p q λ Λ λ Λ ´ ďP ¯ ďP and 1 1 f ´ Bλ f ´ Bλ ; “ p q λ Λ λ Λ ´ čP ¯ čP (b) for each B 2Y , P 1 c 1 c f ´ B f ´ B . p q “ p p qq Proof. Exercise.

The previous claim helps in deriving the following lemma.

Lemma 1.4.6. Let f :Ω E be an arbitrary mapping, and let H be an arbitrary subset of Ñ 2E. Then 1 1 σ f ´ H f ´ σ H . p p qq “ p p qq In particular, if H is a σ-algebra over E, then f 1 H is a σ-algebra over Ω. ´ p q Proof. We have 1 1 f ´ H f ´ σ H , p qĂ p p qq and using Claim 1.4.5, we infer that the right-hand side of the previous display is a σ-algebra, whence 1 1 σ f ´ H f ´ σ H . p p qq Ă p p qq We now prove the converse inclusion using the good sets principle. Denote by G those subsets of E the preimages of which under f which are contained in σ f 1 H : p ´ p qq 1 1 G : G E : f ´ G σ f ´ H . (1.4.1) “ Ă p q P p p qq ( 1.4. MEASURABLE FUNCTIONS, RANDOM VARIABLES 35

Then we have E G, and furthermore, using Claim 1.4.5 we deduce that G is stable under P countable unions and complements. Hence, G is a σ-algebra. In addition, as a consequence of (1.4.1) we get H G. Thus, in particular we deduce that Ă 1 1 1 f ´ σ H f ´ G σ f ´ H , p p qq Ă p qĂ p p qq which finishes the proof.

We are now ready to prove the tool announced above, which shows that in order to prove measurability of a map, it is sufficient to consider the generator of the corresponding σ-algebra in the image space.

Theorem 1.4.7. Let f be a mapping from the measurable space Ω, F to the measurable space p q E, E . Furthermore, let G 2E be any generator of E. p q Ă Then f is F E-measurable if and only if f 1 G F. ´ ´ p qĂ Proof. If f is F E-measurable, then since G E we deduce from the definition that f 1 G F. ´ Ă ´ p qĂ To prove the converse inclusion, assume that f 1 G F. Then ´ p qĂ 1 σ f ´ G F, p p qq Ă and the left-hand side of this display coincides with f 1 σ G f 1 E due to Lemma 1.4.6, ´ p p qq “ ´ p q which finishes the proof.

We will now derive a corollary of the previous result for which we introduce the following notation.

Definition 1.4.8. Let Λ be a non-empty set, and let Xλ, λ Λ be mappings from Ω to sets P Eλ. Furthermore, let Eλ be σ-algebras on Eλ, λ Λ. We denote by σ Xλ : λ Λ the smallest P p P q σ-algebra on Ω such that each Xλ is σ Xλ : λ Λ Eλ-measurable. σ Xλ : λ Λ is also p P q´ p P q called the σ-algebra generated by the Xλ, λ Λ. P Corollary 1.4.9. Let Λ and measurable spaces Ω, F , Ω, F , as well as Ωλ, Fλ , λ Λ, ‰H p q p q p q P be given. Furthermore, assume maps Yλ : Ω Ωλ, λ Λ, to be given such that F σ Yλ : Ñ P “ p λ Λ . r r P q Then a map X :Ω Ω is F F-measurabler if and only if the compositions Yλ Xr :Ω Ωλ Ñ ´ ˝ Ñ are F Fλ-measurable for all λ Λ. ´ P r r Proof. If X is measurable, then all the compositions are measurable due to Theorem 1.4.4. If, on the other hand, all the Yλ X are F Fλ-measurable, then we start with observing that ˝ ´ by definition of F, 1 G : Yλ´ F : λ Λ, F Fλ r “ p q P P 1 is a generator of F. But since all Yλ are measurable by assumption,( we have X G F, and r ´ p qĂ hence Theorem 1.4.7 implies that X has the desired measurability properties. r r The following result is interesting in its own right, but it will also play an important role in proving the central Proposition 1.4.13 below. We recall the facts we had learned about topologies in Definition 1.1.28 and below. Furthermore, we remind ourselves that a function from a topological space O , O to a topological space O , O was defined to be continuous p 1 1q p 2 2q if and only if f 1 O O for all O O (this definition coincided with the definition for ´ p q P 1 P 2 continuity in the case of metric spaces or Rd).

Theorem 1.4.10. Let O , O and O , O be topological spaces. Then any continuous map p 1 1q p 2 2q from O , O to O , O is B O B O -measurable. p 1 1q p 2 2q p 1q´ p 2q 36 CHAPTER 1. SET FUNCTIONS

Proof. Since f is continuous, by definition f 1 O O , so Theorem 1.4.7 implies ´ p 2q Ă 1 f 1 B O B O , which finishes the proof. ´ p p 2qq Ă p 1q Proposition 1.4.11. Let f : Ω, F Rd, B Rd . For i 1,...,d denote by p q Ñ p p qq P t u d πi : R R, Ñ x ,...,xd xi, p 1 q ÞÑ the projection on the i-th coordinate. d Then the function the function f is F B R -measurable if and only if πi f is F B R - ´ p q ˝ ´ p q measurable for all admissible choices of i.

d Proof. Once we show B R σ πi, 1 i d , the result follows from Corollary 1.4.9. d p q“ p ď ď q d Since the πi : R R are continuous, Theorem 1.4.10 implies their B R B R -measurability, Ñ d p q´ p q and we infer that σ πi, 1 i d B R . p ď ď qĂ d p q1 d On the other hand, due to a, b i 1πi´ ai, bi , a b R and the fact that such hyper- d r s“X “ pr sq ď P d rectangles generate B R due to Lemma 1.1.31, we infer B R σ πi, 1 i d , which p q p q Ă p ď ď q finishes the proof.

Example 1.4.12. (a) Let f : Ω, F Rd, B Rd be a F B Rd -measurable function. p q Ñ p p qq ´ p q Then the function

f : Ω, F R, B R , } }2 p q Ñ p p qq ω f ω , ÞÑ } p q}2

n d 2 is F B R -measurable, where for x R we write x 2 : i 1 xi for the so-called ´ p q P } } “ “ 2-norm on Rd. In fact, as a composition of continuous functions,b the function Rd x ř Q ÞÑ x 0, is continuous again, and hence F B R -measurable due to Theorem and } }2 P r 8q ´ p q 1.4.10. Thus, due to Theorem 1.4.4 the function f is F B Rd -measurable. ´ p q (b) Let f, g : Ω, F R, B R be F B R -measurable functions. Then the functions p q Ñ p p qq ´ p q

f g, f g f `,f ´, and f _ ^ | | are F B R -measurable. Here, we use the standard notation that for x,y R ´ p q P

x y : max x,y , x y : min x,y , as well as x` : x 0 and x´ : x 0 0. _ “ p q ^ “ p q “ _ “ ´p ^ qě In the following result we will summarize a couple of important compositions of functions which supply us with measurable functions again.

Proposition 1.4.13. Let f be a F B R -measurable function from Ω to R, and let g, h :Ω ´ p q Ñ Rd be F B Rd -measurable functions. ´ p q Then g h, g h, f g, g f are also measurable functions. ` ´ ¨ { Remark 1.4.14. Here we must pay attention to how define g f in the case that f (or f and g) { are 0. For the setting of this result and the following proof, we set x 0 : 0 (which might seem { “ awkward in the case x 0, but more natural for x 0; either way, scrutinizing the proof below ‰ “ we will see that any convention x 0 : c, some c R, would work and still leave us with g f { “ P { measurable). 1.4. MEASURABLE FUNCTIONS, RANDOM VARIABLES 37

Proof. For the case g h, we observe that the function Ω ω g ω , h ω Rd Rd ` Q ÞÑ p p q p qq P p ˆ q is a F B R2d -measurable function due to Proposition 1.4.11. Furthermore, the function ´ p q Rd Rd x,y x y Rd is continuous, so in combination with Theorem 1.4.4 we infer ˆ Q p q ÞÑ ` P that the composition g h :Ω ω g ω h ω Rd is F B Rd -measurable. ` Q ÞÑ p q` p q P ´ p q The remaining cases follows in a similar fashion, and the only point where we do have to pay some extra attention is g f in the case that the denominator vanishes. Since we can write { g f g 1 , and we know that f, g are measurable and that the product of two measurable { “ ¨ f functions is measurable again, due to Theorem 1.4.4 the only thing that remains to show is that i : R x 1 x R is B R B R -measurable. For this purpose, observe that for U R open Q ÞÑ { P p q´ p q Ă we have 1 1 i´ U i´ U 0 U 0 B R . p q“ p zt u q Yp X t uq P p q R 0 open Ă zt u R 0 open, sinceloomooni continuous on R 0 Ă zt u zt u Hence, the result follows in combinationloooooooomoooooooon with Theorem 1.4.7.

Proposition 1.4.15. Let fn be a sequence of functions in M. Then the functions p q sup fn, inf fn, lim sup fn and lim inf fn, n N n N n n P P Ñ8 Ñ8 are all in M again. Proof. For any a R we have P 1 sup fn a, fn´ a, n N P p 8s “ n N pp 8sq P ( ďP

Thus, using Lemma 1.1.31 and Theorem 1.4.7 yields that supn N fn has the desired measurability P property. The case of infn N fn can be shown similarly or otherwise by using the identity P infn N fn supn N fn and invoking Proposition 1.4.13 twice. P “´ P p´ q With regard to lim supn fn, we observe that Ñ8 lim sup fn inf sup fm, n “ n N m n Ñ8 P ě and use the first part of this Proposition to first conclude that supm n fm has the desired ě measurability properties, and then to conclude that the same holds true for infn N supm n fm, which finishes this part. P ě Again, for the case lim infn fn we can follow one of the two alternative routes outlined in the Ñ8 proof of the first part.

Corollary 1.4.16. Let fn be a sequence of functions in M such that p q f ω : lim fn ω n p q “ Ñ8 p q exists for all ω Ω. P Then f M. P Another result that comes in handy is the following. Lemma 1.4.17 (factorization lemma (‘Maßtheoretischer Dreisatz’)). Let Ω, F and Ω , F p q p 1 1q be two measurable spaces and let T :Ω Ω and f :Ω R be two mappings. Then f is Ñ 1 Ñ T 1 F B-measurable if and only if there exists a F B-measurable mapping ϕ :Ω R ´ p 1q´ 1 ´ 1 Ñ such that f ϕ T. “ ˝ Proof. Exercise. 38 CHAPTER 1. SET FUNCTIONS

1.5 Image measures, distributions

As introduced in [Dre18, Definition 1.8.12], for real-valued random variables the concept of its (cumulative) distribution function plays a prominent role.

Definition 1.5.1. Let X be a random variable defined on a probability space Ω, F, P and p q mapping to Rd, B Rd . Then the function p p qq d FX : R 0, 1 , Ñ r s d t P X t P X ,ti , ÞÑ p ď q“ P p´8 s i 1 ´ ą“ ¯ is called the (cumulative) distribution function (or cdf) of X (‘Verteilungsfunktion von X’). Similarly, if µ is a probability measure on Rd, B Rd , then the function p p qq d Fµ : R 0, 1 , Ñ r s d t µ ,ti , ÞÑ p´8 s i 1 ´ ą“ ¯ is called the distribution function of µ (‘Verteilungsfunktion von µ’).

Of particular importance to us will be the case d 1, and in [Dre18, Thm. 1.8.16] we had found “ the following properties of distribution functions.

Theorem 1.5.2. If F is the distribution function of a real random variable or of a probability measure on R, B R , then p p qq (a) F is non-decreasing;

(b) lim F t 0, lim F t 1; t t Ñ´8 p q“ Ñ8 p q“

(c) F is right-continuous (i.e., for all t0 R one has F t0 limt t0 F t ; P p q“ Ó p qq We had also introduced the following definition and theorem as [Dre18, Def. 1.8.17, Thm.1.8.18].

Definition 1.5.3. Any function F : R 0, 1 that satisfies the three properties given in Ñ r s Theorem 1.5.2 is called a distribution function (‘Verteilungsfunktion’).

The following result complements Theorem 1.5.2, and combined they establish that there is a correspondence between random variables and distribution functions.

Theorem 1.5.4. If F is any distribution function, then there exists a unique probability measure µF on R, B R such that Fµ F. p p qq F “ However, at that point we were not able to prove this result. Since we are now in the position to do so, we give the proof.

Proof of Thm.1.5.4. See homework problems.

Observing that the distribution function FX of a real random variable X depends only on the probability measure P X , and combining Theorems 1.5.2 and 1.5.4, we deduce that there p P ¨q exists a one-to-one correspondence between probability measures on R, B R and distribution p p qq functions F : R 0, 1 . Ñ r s 1.5. IMAGE MEASURES, DISTRIBUTIONS 39

Corollary 1.5.5. The mapping induced by Definition 1.5.1, which maps a probability measure µ on R, B R to its distribution function Fµ defines a bijection from the space of probability p p qq measures on R, B R to the space of distributions functions on R. p p qq Theorem 1.5.6. Let Ω, F,ν be a measure space and let E, E be a measurable space. Fur- p q p q thermore, assume a F E-measurable mapping f to be given. Then the set function ´ 1 ν f ´ : E 0, , ˝ Ñ r 8s 1 F ν f ´ F , ÞÑ p p qq defines a measure on E, E . We will also sometimes denote it by νf for simplicity of notation. p q If ν is a probability measure (so f is a random variable), νf is also called the distribution of f. Proof. The proof proceeds in the same way as the one of [Dre18, Theorem 1.7.6], which is for probability measures.

Definition 1.5.7. The measure ν f 1 introduced in Theorem 1.5.6 is called the image measure ˝ ´ (or pushforward) of ν under f. In the case of f being a random variable (i.e., ν is a probability measure), the image measure ν f 1 is called the distribution (or law) of f. ˝ ´ As we will see, in probability theory, oftentimes a random variable X itself will not be of too much importance to us; rather, what we will be interested in usually is its law.

Example 1.5.8. Arguably the most prominent example of a distribution is the so-called Normal distribution or Gaussian distribution (‘Normalverteilung’, ‘Gaußverteilung’). We say that for µ R, σ 0, , a random variable X defined on a probability space Ω, F, P is N µ,σ2 P P p 8q p q p q distributed if s 2 1 px´µq P X s e´ 2σ2 dx, s R. p ď q“ ?2πσ2 @ P ż8 This really defines a probability measure, i.e.,

2 1 8 px´µq e´ 2σ2 dx 1. (1.5.1) ?2πσ2 “ ż8 Indeed, we have

2 8 x2 2 8 8 x2 2 y2 2 8 r2 2 r2 2 e´ { dx e´ { e´ { dx dy 2πre´ { dr 2πe´ { 8 2π, “ “ “´ r 0 “ ż ż ż ż0 “ ´ ´8 ¯ ´8 ´ ´8 ¯ ˇ where we took advantage of Polar coordinates in standard Riemann integration inˇ the second equality. In particular, (1.5.1) follows. In higher dimensions d 2 we can still define the normal distribution and say that X : ě Ω, F, P Rd, B Rd is N µ, Σ -distributed, where µ Rd and Σ is a symmetric positive p q Ñ p pd dqq p q P definite matrix in R ˆ , if

s1 sd 1 1 t ´1 2 x µ Σ x µ d P X s d . . . e´ p ´ q ¨p p ´ qq dx1 . . . dx1, s R , p ď q“ 2π 2 det Σ @ P p q | p q| ż8 ż8 i.e., its density with respecta to λd is given by

1 1 t ´1 2 x µ Σ x µ d e´ p ´ q ¨p p ´ qq. 2π 2 det Σ p q | p q| a 40 CHAPTER 1. SET FUNCTIONS Chapter 2

The Lebesgue integral

We recall here that the Riemann integral had been constructed by directly partitioning the domain of the integrand into finer and finer partitions (see Section 2.0.3 below for a short reminder). As already outlined at the beginning of Section 1.4, the Lebesgue integral will essentially be constructed by first partitioning the image of the integrand (into finer and finer partitions), and then use these partitions of the image in order to obtain a partition of the domain and hence define the integral. This procedure is most easily performed for simple functions as follows.

2.0.1 Integrals of simple functions Definition 2.0.1. Let Ω, F,µ be a measure space, and let f : Ω, F,µ R, B R be a p q p q Ñ p p qq F B R -measurable function such that f Ω is a finite set. Then f is called a simple function ´ p q p q (‘einfache Funktion’, ‘Treppenfunktion’). The set (in fact a vector space, see Lemma 2.0.4 below) of all simple functions will be denoted by T , and by T ` we denote the set of all non- negative simple functions.

Lemma 2.0.2. A function f : Ω, F R, B R is simple if and only if there exist pairwise p q Ñ p p qq disjoint sets F ,...,Fn F and numbers α ,...,αn R such that 1 P 1 P n 1 f αi Fi . (2.0.1) “ i 1 ÿ“ Remark 2.0.3. As in the literature, we will call a representation of f as in Lemma 2.0.2 a normal representation ‘Normaldarstellung’ of f; note that some authors also demand that n n i 1 Fi Ω for (2.0.1) to be called a normal representation. In fact, if i 1 Fi Ω, then n“ 1 1“ n “ nĹ1 i `1 αi Fi with αn 1 0 and Fn 1 Ω i 1 Fi is a normal representation with i `1 Fi Ω. Ť “ ` “ ` “ z “ Ť “ “ Proof.ř If f is of the form (2.0.1), then obviouslyŤ f T . On the other hand, ifŤf T , then P P f Ω α ,...,αn R, and we have p q “ t 1 uĂ n

1 ´1 f αi f αi , “ pt uq i 1 ÿ“ 1 so f is of the form (2.0.1) with pairwise disjoint Fi, since f αi F due to the F B R - ´ pt uq P ´ p q measurability of f.

Lemma 2.0.4. The product of two simple functions is simple again, and so is the linear com- bination of finitely many simple functions. In particular, T is a vector space.

Proof. Exercise.

41 42 CHAPTER 2. THE LEBESGUE INTEGRAL

From now on onwards we will use the convention that 0 0. ¨8“ Lemma 2.0.5. For a measure space Ω, F,µ , let f T be a simple function with normal p q P ` representations n m f αi1F βj1G , “ i “ j i 1 j 1 ÿ“ ÿ“ for m,n N, αi, βj R, and Fi, Gj F. Then P P P n m αiµ Fi βjµ Gj . p q“ p q i 1 j 1 ÿ“ ÿ“

Proof. For Fi, Gj with Fi Gj we get that f ω αi βj for ω Fi Gj, and as a X ‰ H p q “ “ P X consequence

n n m m n m αiµ Fi αiµ Fi Gj βiµ Fi Gj βiµ Gj , i 1 p q“ i 1 j 1 p X q“ j 1 i 1 p X q“ j 1 p q ÿ“ ÿ“ ÿ“ ÿ“ ÿ“ ÿ“ where in the first and third equality we took advantage of the facts that if αi 0, then m n ‰ Fi j 1 Gj, and if βj 0, then Gj i 1 Fi. Ă “ ‰ Ă “ n 1 DefinitionŤ 2.0.6. Let f T ` with normalŤ representation f i 1 αi Fi with αi 0 and P “ “ ě Fi F for all i 1,...,n . Then the (µ-)integral of f is defined as P P t u ř n f dµ : f ω µ dω : αiµ Fi 0, . Ω “ Ω p q p q “ i 1 p q P r 8s ż ż ÿ“ Remark 2.0.7. Note that due to Lemma 2.0.5, the previous Definition 2.0.6 is well-defined.

We collect some important properties of integrals of non-negative simple functions

Lemma 2.0.8. (a) Let F F. Then P

1F dµ µ F . “ p q żΩ (b) For f, g T and c 0, P ` ě

cf g dµ c f dµ g dµ (linearity). (2.0.2) p ` q “ ` żΩ żΩ żΩ (c) For f, g T with f g P ` ď we have f dµ g dµ. ď żΩ żΩ Proof. (a) A normal representation of f is given by 1 1F , the integral of which is µ F by ¨ p q definition.

(b) We start with noting that due to Lemma 2.0.4, cf g is a simple function again, and ` since it is non-negative, we have cf g T and can therefore consider its integral. ` P ` n 1 m 1 Furthermore, if i 1 αi Fi and j 1 βj Gj are normal representations of f and g, respec- m n “1 “ tively, then k ¨ 1 γk Hk is a normal representation for cf g, where “ř ř `

ř γk cαi βj “ ` 43

and

Hk Fi Gj “ X with k i 1 m j 1,...,mn , i 1,...,n , j 1,...,m . (2.0.2) then follows “ p ´ q ` P t u P t u P t u immediately from the definition of the integral.

(c) This follows from the linearity and non-negativity of the integral in combination with the fact that g, g f T . ´ P `

n Exercise 2.0.9. Show that if f αi1F T with Fi F and αi 0 for all 1 i n is “ i 1 i P ` P ě ď ď not necessarily a normal representation,“ one still has ř n f dµ αiµ Fi . Ω “ i 1 p q ż ÿ“ 2.0.2 Lebesgue integral for measurable functions As is often done in mathematics, we are now to introduce the integral of suitable measurable functions by reducing it to something simpler, i.e., to the integral of simple functions. In order to pull through this procedure, the following approximation result is fundamental.

Lemma 2.0.10. Let Ω, F be a measurable space. Then f : Ω, F 0, , B 0, is p q p q Ñ pr 8s pr 8sq F B 0, -measurable if and only if there exists a non-decreasing sequence fn of functions ´ pr 8sq p q fn T such that P ` fn f as n . Ñ Ñ8 Proof. Proposition 1.4.15 immediately supplies us with the fact that if fn is a sequence of p q functions as in the assumptions, limn fn supn N fn M`. Ñ8 “ P P On the other hand, assume f M to be given. For n N and i 0, 1,...,n2n 1 we define P ` P P t ´ u the function n2n n1 fn : i2´ An,i , “ i 0 ÿ“ n 1 n n where for n N and i 0, 1,...,n2 1 we set An,i : f ´ i2´ , i 1 2´ F, as well as P1 P t ´ u “ pr p ` q qq P An,n n : f n, . It is apparent from the definition that fn T , that fn is non-decreasing, 2 “ ´ pr 8sq P ` and that fn f pointwise, which finishes the proof. Ñ The obvious procedure would now be to define the integral of a non-negative measurable (ex- tended) real-valued function f as the monotone limit of a sequence of integrals of simple func- tions fn approximating it monotonically. I.e., p q

f dµ : lim fn dµ, “ n żΩ Ñ8 żΩ where one can invoke Lemma 2.0.8 to ensure that the integrals on the right-hand side are non- decreasing in n, and hence the limit exists. However, we have to make sure that that the limit of those integrals does not depend on the very choice of the approximating sequence of simple functions. For this purpose, we prove the following lemma.

Lemma 2.0.11. Let fn , gn be two non-decreasing sequences of functions in T such that p q p q ` limn fn limn gn. Then Ñ8 “ Ñ8

lim fn dµ lim gn dµ. (2.0.3) n “ n Ñ8 żΩ Ñ8 żΩ 44 CHAPTER 2. THE LEBESGUE INTEGRAL

For the proof of Lemma 2.0.11 we take advantage of the following claim.

Claim 2.0.12. Let f T and let fn be an increasing sequence with fn T for all n N P ` p q P ` P such that f lim fn. (2.0.4) n ď Ñ8 Then

f dµ lim fn dµ. ď n żΩ Ñ8 żΩ m Proof. Write f αi1F for a normal representation of f. Then for ε 0, 1 we consider “ i 1 i P p q the set “ ř Mn : fn 1 ε f F. “ t ě p ´ q u P From (2.0.4) we deduce that Mn Ω, hence the continuity of the measure µ from below implies Ò m m f dµ αiµ Fi lim αiµ Fi Mn lim f 1M dµ “ p q“ n p X q“ n ¨ n Ω i 1 Ñ8 i 1 Ñ8 Ω ż ÿ“ ÿ“ ż 1 1 lim fn dµ lim fn dµ. ď n 1 ε “ n 1 ε Ñ8 żΩ ´ Ñ8 ´ żΩ Since ε 0, 1 was chosen arbitrarily, taking ε 0 it follows that f dµ limn fn dµ, P p q Ó Ω ď Ñ8 Ω which finishes the proof. ş ş

Proof of Lemma 2.0.11. From Claim 2.0.12 we deduce that for all m N, P

gm dµ lim fn dµ. ď n ż Ñ8 ż Hence, taking m we infer Ñ8

lim gm dµ lim fn dµ. m ď n Ñ8 ż Ñ8 ż Exchanging the roles of fn and gm we therefore obtain (2.0.3). We can now introduce the integral for measurable extended real-valued functions, which by Lemma 2.0.11 is well-defined.

Definition 2.0.13. Let f M and let fn be any sequence as in Lemma 2.0.10. Then P ` p q

f dµ : f ω µ dω : lim fn dµ “ p q p q “ n żΩ żΩ Ñ8 żΩ is called the (µ-)integral of f. As in the case of integrals of non-negative simple functions we derive the following basic prop- erties. Lemma 2.0.14. (a) For f, g M and c 0, , P ` P r 8s

cf g dµ c f dµ g dµ (linearity). p ` q “ ` żΩ żΩ żΩ (b) For f, g M with f g P ` ď we have f dµ g dµ. ď żΩ żΩ 45

The lemma can be proven taking advantage of its validity in the case of integrands in T ` (see Lemma 2.0.8) and then decomposing into positive and negative parts as well as taking limits. We omit the details.

Corollary 2.0.15. An alternative way (which sometimes comes handy) to reduce the integral of f as in Definition 2.0.13 to integrals of non-negative simple functions is via

f dµ sup g dµ. Ω “ g T ` Ω ż gP f ż ď Proof. Exercise.

Having introduced the integral for non-negative measurable functions, we would like to extend it to a suitable class of measurable functions that can take positive and negative values at once. It will turn out, however, that we do not only need the notion of Lebesgue integral for extended real-valued functions, but also for complex valued functions. Therefore, for a complex number z x yi C with x,y R, we denote its real part x by Re z and its imaginary part y by “ ` P P p q Im z , so z Re z Im z i. p q “ p q` p q ¨ Exercise 2.0.16. Show that a function f : Ω, F,µ C, B C is F B C -measurable if p q Ñ p p qq ´ p q and only if the functions Re f and Im f are F B R -measurable. ´ p q Proof. We note that the functions C z Re z R and C z Re z R are continuous. Q ÞÑ p q P Q ÞÑ p q P Therefore, if f is measurable, so are the functions Re f and Im f due to Theorems 1.4.10 and 1.4.4. If, on the other hand, the functions Re f and Im f are F B R -measurable, the so is the ´ p q function i Im f, and hence f Re f i Im f is measurable as well due to Proposition 1.4.13 ¨ “ ` ¨ (where we identify C with R2).

Definition 2.0.17. Let f : Ω, F,µ C, B C be an F B C -measurable function. Then p q Ñ p p qq ´ p q f is called Lebesgue-integrable or (µ-)integrable if the integrals

Re f ˘ dµ, Im f ˘ dµ p q p q żΩ żΩ are all finite. If this is the case, the quantity

f dµ : Re f ` dµ Re f ´ dµ i Im f ` dµ i Im f ´ dµ “ p q ´ p q ` p q ´ p q żΩ żΩ żΩ żΩ żΩ is called the Lebesgue integral of f (or also µ-integral of f). Furthermore, for A F we introduce the notation P

f dµ : 1A f dµ, (2.0.5) “ ¨ żA żΩ if the function 1A f is Lebesgue-integrable. ¨ Above all, in the case of real-valued integrands it will turn out useful to be slightly less demand- ing in the above definition.

Definition 2.0.18. Let f M. We define P

f dµ : f ` dµ f ´ dµ “ ´ żΩ żΩ żΩ 46 CHAPTER 2. THE LEBESGUE INTEGRAL as long as one of the two terms on the right-hand side is finite.1 In this case, we call f quasi- integrable. We write L1 or L1 Ω, F,µ for the set f M : f dµ R , the (vector) space of p q t P Ω P u integrable functions. ş Similarly, for a random variable X : Ω, F, P R, B R we define its expectation p q Ñ p p qq E X : X dP, r s “ żΩ whenever X is quasi-integrable with respect to P. The Reader may convince herself that the notion of integral introduced in Definition 2.0.18 coincides with that of Definition 2.0.17 for measurable f that takes values in R only. Also, the property of linearity as given for non-negative functions in Lemma 2.0.14 directly transfers to complex-valued or R-valued functions if the resulting sum is well-defined in R (i.e., we don’t have expressions like appearing). 8´8 2.0.3 Lebesgue vs. Riemann integral Recall that the Riemann integral has been defined for real-valued functions which are defined on (subsets of) Rd instead of on more general sets (as is the case for the Lebesgue integral, which essentially can be applied to real-valued functions defined on arbitrary measure spaces). Generalizing Definition 2.0.18 to the case of functions that are defined only on a part of the underlying space, for f : A R which is L A B R -measurable, where A B Rd and d Ñ p q´ p q P p q L A : L R A the trace σ-algebra, we say that f is Lebesgue integrable if p q “ p q f dλd : f dλd “ d żA żR (where f : Rd R is defined to coincide with f on A,r and as 0 on Ac) is well-defined and finite. Ñ We will have a closer look at the case of A being an interval in R and recall that the Riemann integralr had been defined as follows. f : a, b R was called Riemann integrable if for any r s Ñ sequence In with p q n n n In : a tp q tp q . . . tp q b “ 0 ă 1 ă ă mn “ such that n n max tip q tpi q1 0 as n , 1 i mn ´ Ñ Ñ8 ď ď ´ and for any sequence of points n n n ξip q tip q1,tpi q , n N, 1 i mn, P r ´ s P ď ď we have that the limit mn n n n lim f ξp q tp q tp q n i i i 1 Ñ8 i 1 p qp ´ ´ q ÿ“ n exists, is finite, and is independent of the very choice of In and ξp q . In this case we write p q p i q b f x dx for the corresponding limit. a p q Exerciseş 2.0.19. (a) Show that f as above is Lebesgue integrable if and only if f dλd . | | ă8 (b) Show that the function ş f : 0, 1 0, 1 r s Ñ t u 1, if x Q, x P ÞÑ 0, if x Q, " R is Lebesgue integrable but not Riemann integrable.

1 You might sometimes also encounter the notation µpfq for the Lebesgue integral Ω f dµ. ş 2.1. CONVERGENCE THEOREMS 47

(c) Show that the function g : 1, R r 8q Ñ sin x x p q ÞÑ x n is (improperly (‘uneigentlich’)) Riemann integrable (i.e., the limit limn g x dx ex- Ñ8 1 p q ists) but not Lebesgue integrable. ş Theorem 2.0.20. If f : a, b R is Riemann integrable, then f is bounded and r s Ñ L a, b , B R -measurable. In particular, f is Lebesgue integrable and the Riemann integral p pr sq p qq of f coincides with the Lebesgue integral of f. Since this result is not central to our further exposition, we refer to the exercise classes for a proof. As a consequence of the previous result, we will henceforth also write dx instead of λ dx if p q there is not danger of confusion.

2.1 Convergence theorems

In contrast to the Riemann integral, the Lebesgue integral is pretty robust when it comes to exchanging limits and integration. The following subsection collects the convergence theorems that are most relevant to us in what is to come.

2.1.1 Dominated and monotone convergence Theorem 2.1.1 (Monotone convergence theorem (MCT) (B. Levi (1875 – 1961, Italian poly- math)). Let fn be a non-decreasing sequence of measurable functions fn M . Then p q P `

lim fn dµ lim fn dµ. n “ n żΩ Ñ8 Ñ8 żΩ Proof. We first of all notice that limn fn M` due to Proposition 1.4.15, and hence its Ñ8 P integral is well-defined (recall Def. 2.0.13). The monotonicity of the integral (Lemma 2.0.14) immediately implies that

lim fn dµ fm dµ n ě żΩ Ñ8 żΩ for each m N, and that the right-hand side is monotone in m, whence we conclude that P

lim fn dµ lim fm dµ. n ě m żΩ Ñ8 Ñ8 żΩ To show the reverse inequality, we will take advantage of Corollary 2.0.15. For this purpose, consider arbitrary g T ` with g limn fn. Then for ε 0 arbitrary we have that P ď Ñ8 ą 1 fn 1 ε g g g, t ěp ´ q u ¨ Ò and as a consequence 1 lim fn dµ 1 ε lim fn 1 ε g g dµ 1 ε g dµ, n n t ěp ´ q u Ñ8 Ω ě p ´ q Ñ8 Ω ¨ ě p ´ q Ω ż ż T ` ż P where the limits exist due to monotonicity andloooooooomoooooooon Lemma 2.0.14, and the second inequality follows from Claim 2.0.12. Taking ε 0 to 0 we obtain ą

lim fn dµ g dµ, n ě Ñ8 żΩ żΩ for any g T with g f. Thus, the desired inequality is a consequence of Corollary 2.0.15. P ` ď 48 CHAPTER 2. THE LEBESGUE INTEGRAL

Having this result at our disposal, we can immediately prove the following result which is of importance on its own.

Lemma 2.1.2 (Lemma of Fatou (1878 – 1929, French mathematician)). Let fn be a sequence p q with fn M . Then P ` lim inf fn dµ lim inf fn dµ. n ě n Ñ8 ż ż Ñ8 Proof. Since for each n N we have that for m n, P ě

fm inf fk, ě k n ě the monotonicity of the integral supplies us with

inf fk dµ inf fk dµ. k n ě k n ě ż ż ě Taking limits on both sides we get

Thm. 2.1.1 lim inf fn dµ lim inf fk dµ lim inf fn dµ. n ě n k n “ n Ñ8 ż Ñ8 ż ě ż Ñ8

Exercise 2.1.3. Show that the conclusion of Lemma 2.1.2 does not hold in general if we dispose of the assumption fn 0. ě

Definition 2.1.4. Let Ω, F, P be a probability space and assume given a sequence An of p q p q subsets An Ω. Then the ‘limes superior’ of the sequence An is defined as Ă p q

8 8 lim sup An : Ak. n “ Ñ8 n 1 k n č“ ď“

The ‘limes inferior’ of the sequence An is defined as p q

8 8 lim inf An : Ak n Ñ8 “ n 1 k n ď“ č“

Note that if An F for all n N in the previous definition, then also lim supn An F and P P Ñ8 P lim infn An F. Ñ8 P Exercise 2.1.5. Show the following identities:

• lim sup An ω Ω : ω An for infinitely many n ; n “ P P Ñ8 ( • lim inf An ω Ω : ω An such that n0 N with ω An n n0 ; n Ñ8 “ P P D P P @ ě ( Corollary 2.1.6. Let a measure space Ω, F,µ and a sequence An with An F be given. p q p q P Then µ lim inf An lim inf µ An , (2.1.1) n n Ñ8 ď Ñ8 p q and if µ is finite, then also ` ˘ µ lim sup An lim sup µ An , (2.1.2) n ě n p q Ñ8 Ñ8 ` ˘ 2.1. CONVERGENCE THEOREMS 49

Proof. (2.1.1) is a direct consequence of Fatou’s lemma with fn : 1A , since “ n

µ lim inf An lim inf 1A dµ lim inf 1A dµ lim inf µ An . p q“ n n ď n n “ n p q ż Ñ8 Ñ8 ż Ñ8 c To obtain (2.1.2), observe that (2.1.1) with An replaced by An reads

µ lim inf Ac lim inf µ Ac . n n n ď Ñ8 p q ` ˘ c c Using this in combination with the identity lim supn An lim infn An, the finiteness c Ñ8 “ Ñ8 of µ, and the fact that µ Ω lim infn µ An lim supn µ An , we therefore deduce p q´ Ñ8 p` q“ Ñ8˘ p q

µ lim sup An lim sup µ An . n ě n p q Ñ8 Ñ8 ` ˘

Another benefit of Fatou’s lemma is that it serves in proving the dominated convergence theorem below, which (besides the monotone convergence theorem) is one of the principal results allowing the interchange of integration and limits.

Theorem 2.1.7 (Lebesgue’s dominated convergence theorem (DCT)). Let fn be a sequence p q in M Ω, F,µ , and assume there exists g M Ω, F,µ with fn g for all n N, as well p q P `p q | | ď P as g dµ . Furthermore, assume that fn converges µ-almost surely to some f M (see ă 8 P Definition 2.2.1 below for almost sure convergence). ş Then f dµ , and | | ă8 ş lim fn dµ f dµ. n “ Ñ8 ż ż Proof. Using Fatou’s lemma we obtain

g dµ lim inf fn dµ lim inf g fn dµ g f dµ g dµ f dµ. (2.1.3) ` n ˘ “ n p ˘ q ě p ˘ q “ ˘ ż Ñ8 ´ ż ¯ Ñ8 ż ż ż ż Subtracting g dµ 0, on both sides supplies us with P r 8q ş f dµ lim inf fn dµ lim sup fn dµ lim inf fn dµ f dµ, ď n ď n “´ n ´ ď ż Ñ8 ż Ñ8 ż Ñ8 ż ż and hence finishes the proof.

Example 2.1.8. Let f M . Then P `

f dµ µ f t λ dt . “ 0, p ą q p q ż żr 8q

Indeed, let fn be any sequence as in Lemma 2.0.10. Then, using MCT, p q

f dµ lim fn dµ lim ν fn t λ dt ν f t λ dt . Ω “ n Ω “ n 0, p ą q p q“ 0, p ą q p q ż Ñ8 ż Ñ8 żr 8q żr 8q where the first equality is the definition of the integral for measurable non-negative functions (or else by MCT), the second equality is easy to check for simple functions in normal form, and the third equality takes advantage of the MCT as well as the convergence fn t f t in t ą u Ò t ą u combination with the continuity from below of the measure µ. 50 CHAPTER 2. THE LEBESGUE INTEGRAL

2.2 Measures with densities, absolute continuity

2.2.1 Almost sure / almost everywhere properties Definition 2.2.1. Let Ω, F,µ be a measure space and let P be a property such that for each p q ω Ω it can be decided whether the property P holds true or not. We then say that the property P P holds (µ-)almost everywhere / (µ-)a.e. (‘(µ-)fast ¨uberall’/ ‘(µ-)f.¨u.’)) if there exists a (µ-)null set N F such that P holds for all ω N c. P P If µ is a probability measure, we also say that the property P holds for (µ-)almost all / (µ-)a.a. (‘µ-fast alle’/ ‘µ-f.a.’) ω instead.

Example 2.2.2. Consider the Cantor function F : 0, 1 0, 1 . Then F is differentiable r s Ñ r s λ 0,1 -almost everywhere with F 1 x 0 for all such x (see problem 2 e) on homework sheet 7). |r s p q“ Indeed, in the homework we have seen that F x 0 for all x 0, 1 C, with C denoting the 1p q“ P r sz Cantor set. You have furthermore shown that λ C 0, so in particular this means that F is p q “ differentiable λ 0,1 -a.e. |r s Lemma 2.2.3. Let f M Ω, F,µ . Then P `p q

f dµ 0 if and only if f 0 µ a.e. “ “ ´ ż Proof. The statement is obvious for simple functions. For general f M , choose a monotone P ` approximating sequence of non-negative simple functions fn with fn f. Then f 0 µ-a.e. if Ò “ and only if for all n N, we have fn 0 µ-a.e. By the observed validity of the statement for P “ simple functions, the latter is equivalent to fn dµ 0 for all n N, which due to the definition “ P of the integral via ş f dµ lim fn dµ “ n ż Ñ8 ż is equivalent to f dµ 0. This proves the result. “ Lemma 2.2.3 itş interesting in its own right, but it also proves useful in deriving the following result.

Proposition 2.2.4. Let

(a) f, g M Ω, F,µ , or let P `p q (b) f, g M Ω, F,µ and let f or g be µ-integrable. P p q Then, if f g µ-a.e., we have ď f dµ g dµ. ď żΩ żΩ Proof. Exercise.

The MCT Theorem 2.1.1 gives rise to a class of measures that is of particular importance in probability theory, as is explained in the following corollary to it.

Corollary 2.2.5. Let f M be defined on a measure space Ω, F,µ . Then the set function P ` p q ν : F 0, , Ñ r 8s (2.2.1) A f dµ, ÞÑ żA defines a measure on Ω, F . p q 2.2. MEASURES WITH DENSITIES, ABSOLUTE CONTINUITY 51

Proof. Since f 0 we have ν 0, and furthermore ν 0. To show the σ-additivity, let ě ě pHq “ An be a sequence of pairwise disjoint sets with An F for all n N. Then p q P P 9 1 Thm. 2.1.1 1 ν An f dµ An f dµ f An dµ ν An . n N “ 9 “ ¨ “ ¨ “ p q P nPNAn Ω n N n N Ω n N ´ď ¯ ż ż ´ ÿP ¯ ÿP ż ÿP Ť

Remark 2.2.6. In fact, if instead of f M we only assume that f M is µ-quasi-integrable, P ` P we still get that (2.2.1) defines a signed measure.

For those who have attended ‘Introduction to stochastics’, we add here the remark that the result of Corollary 2.2.5 is very good news for us. Indeed, recall from [Dre18, Section 1.8.3] that we did run into severe troubles trying to define measures as in (2.2.1) using the Riemann integral. It turns out that by use of the Lebesgue integral everything we’re after works out smoothly.

Definition 2.2.7. In the context of Corollary 2.2.5, we say that f is a density of ν with respect to µ. We write ν f µ, or “ ¨ dν f . “ dµ

Example 2.2.8. Recall the Normal distribution on Rd, B Rd introduced in Example 1.5.8. p p qq We observe that a N µ, Σ distributed random variable (µ Rd, Σ a symmetric positive definite p q P matrix) has a distribution (recall Def. 1.5.7) which has density

1 1 x µ Σ´1 x µ d e´ 2 p ´ q¨p p ´ qq, x R , 2π d det Σ P p q | p q| with respect to λd. a

The following result explains us how to integrate with respect to measures that have a density.

Proposition 2.2.9. Let g M such that g 0 µ-a.s. or g dµ . Furthermore, assume P ě | | ă 8 ν f µ with f M . Then “ ¨ P ` ş

g dν if and only if g f dµ , | | ă8 | | ¨ ă8 żΩ żΩ and if the integrals are finite, then

g dν g f dµ R. “ ¨ P żΩ żΩ

Proof. Assume g M`. Then, due to Lemma 2.0.10, there exists a non-decreasing sequence P mn n g of functions in T such that lim g g. Writing g α 1 n , for a normal n ` n n n i 1 i Ai p q n nÑ8 “ “ “ representation of gn with A F and α 0 we have i P i ě ř mn mn n n n g ν α ν A α f 1 n µ g f µ, n d i i i Ai d n d Ω “ i 1 p q“ i 1 Ω ¨ “ Ω ¨ ż ÿ“ ÿ“ ż ż n where in the second equality we took advantage of the fact that ν Ai An f dµ by definition p q“ i of the measure ν. Using the MCT, the result follows. ş If, on the other hand, g M, then we decompose g g g , and proceed similarly to the P “ ` ´ ´ above for both, g` and g´. 52 CHAPTER 2. THE LEBESGUE INTEGRAL

Proposition 2.2.10. Let ν and µ be measures on Ω, F , and assume that ν is σ-finite. Fur- p q thermore, assume that f and g are densities of ν with respect to µ. Then we have

f g µ a.e. “ ´

Proof. Since ν is σ-finite we find Sn F such that ν Sn and Ω n N Sn. Define the set P p qă8 “ P Ln : f g Sn of those points in Sn where f takes values larger than g. We deduce “ t ą uX Ť 1 0 ν Ln ν Ln f dµ g dµ f g f g Sn dµ. “ p q´ p q“ ´ “ p ´ q t ą uX żLn żLn żΩ 1 Since f g f g 0, this implies µ Ln 0. Therefore, p ´ q t ą u ą p q“

µ f g µ Ln µ Ln 0. p ą q“ n N ď n N p q“ ´ ďP ¯ ÿP In a similar manner we obtain µ f g 0, p ă q“ so µ f g 0 and therefore f g µ-a.e. p ‰ q“ “ More generally, we introduce the following concepts relating two measures on the same measur- able space. Definition 2.2.11. Let µ,ν be two measures on a measurable space Ω, F . We say that p q (a) ν is absolutely continuous (‘absolutstetig’) with respect to µ (and write ν µ), if for ! each F F with µ F 0 we also have ν F 0; P p q“ p q“ (b) µ and ν are equivalent if µ ν and ν µ; ! ! (c) ν is singular (‘singul¨ar’) with respect to µ (and write ν µ), if there exists F F such K P that µ F 0 and ν F c 0. p q“ p q“ In the case of finite measures, there is an obvious justification for Part (a) of this terminology given in the following lemma. Lemma 2.2.12. Let µ and ν be measures on a measurable space Ω, F such that ν is finite. p q Then ν is absolutely continuous with respect to µ if and only if for each ε 0 there exists δ 0 ą ą such that for any F F, P µ F δ implies ν F ε. p qď p qď Proof. If the ε-δ-condition holds, then for any set F F with µ F 0 we have ν F 0, so P p q “ p q“ ν is absolutely continuous with respect to µ. On the other hand, assume that the condition does not hold true. Then we find ε 0 and a n ą sequence Fn with Fn F such that µ Fn 2 and ν Fn ε. Setting p q P p qď ´ p qą

F : lim sup Fn Fm “ n “ Ñ8 n N m n čP ďě we deduce for each n N that P n 1 µ F µ Fm µ Fm 2´ ` , p qď m n ď m n p qď ´ ďě ¯ ÿě so µ F 0. On the other hand, using Corollary 2.1.6 and the fact that µ is finite, p q“ ν F lim sup ν Fn ε 0. p qě n p qě ą Ñ8 Thus, ν is not absolutely continuous with respect to µ. 2.2. MEASURES WITH DENSITIES, ABSOLUTE CONTINUITY 53

2.2.2 Hahn-Jordan decomposition The following result on the Radon-Nikodym derivative (Theorem 2.2.16) is not only of impor- tance in the context of probability theory, but is also of significance to functional analysis. Before, however, we introduce a key result for proving it, which is interesting in its own right. We begin with giving an intuitive definition.

Definition 2.2.13. Let µ be a signed measure on Ω, F . A set A F is called positive (nega- p q P tive), if for every F F we have that µ A F 0 (µ A F 0). P p X qě p X qď Theorem 2.2.14 (Hahn-Jordan decomposition (Hans Hahn (1879–1934), Camille Jordan (1838–1922))). Let µ be a signed measure on Ω, F . Then there exist Ω , Ω F such that p q ` ´ P Ω Ω 9 Ω and the following hold: “ `Y ´ (a) Setting µ : µ Ω as well as µ : µ Ω , both µ and µ define non-negative ` “ p¨X `q ´ “´ p¨X ´q ` ´ measures (note that in particular we have µ Ω 0 as well as µ Ω 0). In this ´p `q “ `p ´q “ case, we say that Ω` and Ω´ form a so-called Hahn decomposition of Ω with respect to µ.

(b) µ µ` µ´ (Jordan decomposition). “ ´

In addition, the Hahn decomposition of Ω is unique up to null sets with respect to the measure µ µ . ` ` ´ Proof. According to Exercise 1.2.12 we can assume w.l.o.g. (without loss of generality, ‘o.B.d.A.’, ‘ohne Beschr¨ankung der Allgemeinheit’) that µ F , for all F F. p q P p´8 8s P We define c : inf µ F . (2.2.2) “ F F p q F negativeP

We can find a sequence Fn of negative subsets of F such that limn µ Fn c, the countable p q Ñ8 p q“ union of negative sets is negative again, we infer that

8 Ω´ : Fn “ n 1 ď“ is negative and that µ Ω´ c. Indeed, since µ Ω´ is a ‘common’ non-negative measure, the p q“ ´ | latter equality follows from

c lim µ Fn µ Ω´ inf µ F c, “ n p qě p qě F F p qě Ñ8 F negativeP where the second limit exists since the sequence is monotone. We claim that Ω : Ω Ω is a positive set. Indeed, assume it was not. Then there would ` “ z ´ exist G F with G Ω such that µ G Ω 0. Now G cannot be a negative set since 0 P 0 Ă ` p 0 X `qă 0 in that case G0 9 Ω´ would be a negative set with µ G0 Ω´ c, a contradiction to (2.2.2). Y p Y q ă 1 Thus, there exists a minimal k1 N such that G0 contains a set G1 F with µ G1 . Then P P p qě k1 1 µ G0 G1 µ G0 µ G1 ; p z q“ p q´ p qď´k1 since the RHS is negative, the same reasoning applied to G0 before can now be applied to 1 G0 G1 in order to infer that G0 G1 contains a set G2 F with µ G2 , and such that k2 is z z P p qě k2 54 CHAPTER 2. THE LEBESGUE INTEGRAL minimized among all such admissible subsets. Due to µ G , , Exercise 1.2.12 would p 0q P p´8 8q in particular yield that 1 lim 0, n k Ñ8 n “ and as a consequence, for every G F subset of P

8 G˚ : G0 Gn “ z n 1 ď“ we would have µ G 0. In particular, G F would be negative. But then p qď ˚ P

8 µ G˚ µ G0 µ Gn µ G0 0, p q“ p q´ n 1 p qď p qă ÿ“ which as before contradicts the assumption that µ Ω is minimal among all negative sets. p ´q Therefore, we must have that Ω` is a positive set and we have proved the claim and can now conclude the proof:

(a) It immediately follows that µ´ Ω` µ` Ω´ 0. (2.2.3) p q“ p q“ Similarly, µ 0 since Ω is a positive set and µ 0 since Ω is a negative set. ` ě ` ´ ě ´ (b) This is a consequence of the additivity of signed measures.

Uniqueness: Exercise.

Lemma 2.2.15. For a signed measure µ on Ω, F , we have with the notation of Theorem p q 2.2.14:

(a) µ` A sup µ F , A F, p q“ F F p q @ P F P A Ă and µ` is also called the positive variation of µ.

(b) µ´ A inf µ F , A F, p q“´ F F p q @ P F P A Ă and µ´ is also called the negative variation of µ.

Furthermore, we call the measure µ : µ` µ´ | | “ ` the total variation of µ.

Proof. (a) It follows from Theorem 2.2.14 that µ Ω is non-positive and µ Ω is p¨ X ´q p¨ X `q non-negative, so

sup µ F sup µ F µ` A . F F p q“ F F p q“ p q F P A F AP Ω` Ă Ă X The remaining part follows in a similar manner. 2.2. MEASURES WITH DENSITIES, ABSOLUTE CONTINUITY 55

2.2.3 Lebesgue’s decomposition theorem, Radon-Nikodym derivative Theorem 2.2.16 (Lebesgue’s decomposition theorem, Radon-Nikodym theorem). Let µ and ν be σ-finite measures on a measurable space Ω, F . p q Then there exist unique measures ν and ν on Ω, F such that the following hold: ac s p q ν ν ν , and ν µ, ν µ. “ ac ` s ac ! s K Furthermore, there exists f M with P ` ν f µ ac “ ¨ and f is µ-a.s. uniquely determined.

Corollary 2.2.17. Under the same assumptions as in Theorem 2.2.16,

ν µ if and only if ν has a density with respect to µ. ! You are asked to prove most of Theorem 2.2.16 in homework problem 2 on sheet 8. Here, we will only address the uniqueness part. For this purpose take another decomposition of µ with νs1 and ν denoting the corresponding singular and absolutely continuous parts. Let A F such ac1 P that µ A 0 and ν Ac 0. Then ν A 0 since µ A 0, and hence p q“ sp q“ ac1 p q“ p q“

ν F ν F A ν1 F A ν1 F F F. sp q“ p X q“ sp X qď sp q @ P In particular, we infer ν ν and thus ν ν . But then ν ν ν ν is a measure s ď s1 ac1 ď ac s1 ´ s “ ac ´ ac1 which at the same times is singular and absolutely continuous w.r.t. µ so it must vanish. Hence, ν ν and ν ν , which proves the uniqueness of the Lebesgue decomposition. s “ s1 ac1 “ ac Now that we know that νac is unique, the µ-a.e. uniqueness of f follows from the fact that (see homework sheet) ν : f µ for some f M and Proposition 2.2.10. ac “ ¨ P ` Remark 2.2.18. (a) Using Theorem 2.2.14, Theorem 2.2.16 can be extended to signed mea- sures (which we won’t do here).

(b) Above we had introduced all the machinery we needed in order to give a self-contained proof of Theorem 2.2.16. If you do have a basic knowledge of functional analysis, you might also want to have a look at another proof of the cited results that takes advantage of Riesz’ representation theorem (see the Proof of [Kle14, Theorem 7.33], for example).

2.2.4 Integration with respect to image measures The following is a generalization of [Dre18, Proposition 1.9.10].

Theorem 2.2.19 (Change of variable formula (‘Transformationssatz’)). Assume a measure space Ω, F,µ as well as a measurable space Ω, F and a F F-measurable map ϕ :Ω Ω p q p q ´ Ñ be given. Denote the image measure µ ϕ 1 by µ. Then for f M Ω, F the integral ˝ ´ P p q r r r r f rϕ dµ r r ˝ żΩ exists if and only if the integral f dµ żΩ exists in R (i.e., in the sense of quasi-integrability).r r In this case both integrals coincide. The proof is contained in Homework 8.1. 56 CHAPTER 2. THE LEBESGUE INTEGRAL

2.3 Product spaces

As outlined in [Dre18, Section 1.14.3] of [Dre18] already, it will be crucial for us to be able to construct infinite sequences of random variables defined on the same probability space. One way to ensure that the underlying probability space is ‘rich enough’ to accommodate for this wealth and structure of random variables will be to consider infinite product spaces, which are products of measure spaces and play a crucial role in probability theory.2

Definition 2.3.1. Let Λ be an arbitrary non-empty index set, and let Ωλ, λ Λ, be a family P of non-empty sets. Then we define the product space (‘Produktraum’) (or also the product of the Ωλ, λ Λ) P Ωλ λ Λ ąP to be the set of all maps f : Λ λ Λ Ωλ such that f λ Ωλ for all λ Λ. Ñ P p q P P Λ In the case that the Ωλ are equal to some set Ω for all λ Λ, we also write Ω for the corre- Ť P sponding product space.

Example 2.3.2. Without going into detail, it might be suggesting itself that if we want to model an infinite sequence of coin tosses, the space 0, 1 N might be a candidate for outcomes of such t u an experiment to lie in, where 0 can be identified with tails and 1 with heads. Then ω 0, 1 N P t u is nothing else than an infinite sequence ω n n N of elements in ω n 0, 1 for all n N. p p qq P p q P t u P Definition 2.3.3. For I J Λ we introduce the projections Ă Ă J πI : Ωλ Ωλ λ J Ñ λ I ąP ąP ω ω I . ÞÑ | J I In particular, if J Λ we will write πI, and if I λ , we will use the notation πλ . For π λ “ “ t u t u we will also just write π λ . t u We are now going to introduce the notion of a product-σ-algebra. In case you have seen this before, it will be very much in the spirit of the definition of the product topology, where instead of measurability one asks for continuity of the projection maps. We recall the definition here fore completeness.

Definition 2.3.4. Let a family Ωλ,τλ , λ Λ, of topological spaces be given. The corresponding p q P product topology τ is defined as the smallest topology on λ Λ Ωλ such that for each λ1 Λ, P P the coordinate maps πλ1 : λ Λ Ωλ Ωλ1 are continuous with respect to the topologies τ and P Ñ Ś τλ. Ś With this in mind, we can now proceed to the definition of the product-σ-algebra.

Definition 2.3.5. Let a family Ωλ, Fλ , λ Λ, of measurable spaces be given. The correspond- p q P ing product-σ-algebra λ Λ Fλ is defined as the smallest σ-algebra on λ Λ Ωλ such that for P P each λ1 Λ, the coordinate maps πλ1 : λ Λ Ωλ Ωλ1 are λ Λ Fλ Fλ1 -measurable. P  P Ñ P ´ Ś Λ As before, if F0 Fλ for all λ Λ, then we also abbreviate λ Λ Fλ by F0 b . “ P Ś  P p q Lemma 2.3.6. In the setting of Definition 2.3.5, for anyÂI J Λ, the mapping πJ is Ă Ă I λ J Fλ λ I Fλ-measurable. P ´ P Proof. This is a direct consequence of Corollary 1.4.9.

2In spirit, this will be very much related to the concept of products of topological spaces in case you have seen this before 2.3. PRODUCT SPACES 57

Definition 2.3.7. Let a family Ωλ, Fλ , λ Λ, of measurable spaces be given. Then any set p q P of the form 1 πJ ´ F Fλ, F Fλ, J Λ finite, (2.3.1) p q p q P λ Λ P λ J Ă âP âP is called a cylinder set.

Lemma 2.3.6 implies that cylinder sets are F Fλ-measurable. “ λ Λ The principal reason that cylinder sets will play anP important role in what follows is that it  is generally easier to assign to them than to arbitrary measurable subsets of an infinite space. Indeed, imagine the setting of infinitely many coin tosses again. As long as you only want to understand events involving the outcome of finitely many of these coin tosses, you are easily able to assign them a probability under the assumption that the coin tosses are fair and independent. This becomes much more intricate in case you consider events that depend on the outcome of infinitely many coin tosses. Crucially, as will turn out below, specifying a probability measure on cylinder sets already uniquely characterizes the measure (see Theorem 4.2.1 below), so there is no need to specify probabilities for an even bigger subclass of measurable sets.

Exercise 2.3.8. Assume the setting of Definition 2.3.7. Show that cylinder sets form an algebra, but not a σ-algebra.

We have seen before that the Borel-σ-algebra plays a prominent role in our studies; the following result sheds some light on its behaviour under taking products.

Theorem 2.3.9. Let Λ be an at most countable set and assume that for a family Ωλ,τλ , 3 p q λ Λ, each Ωλ,τλ is a Polish space (complete separable metric spaces, if you prefer). Then, P p q setting Ω : Ωλ and denoting by τ the product topology of Ω, we get that Ω,τ is a Polish “ λ Λ p q space (or a completeP separable metric spaces for that matter) again, and Ś

σ τ B τλ . (2.3.2) p q“ λ Λ p q âP Proof. For simplicity of notation we assume w.l.o.g. that Λ N (or a finite subset of N with “ the respective modifications in the notation below) for the first part of the proof. Denote by dn a metric on Ωn that induces the topology τn, and with respect to which Ωn is complete. We define on Ω a new metric

n dn ω n ,ω1 n d ω,ω1 : 2´ p p q p qq , 1 d ω n ,ω n p q “ n N n 1 ÿP ` p p q p qq and it is left as an exercise to check that d induces the product topology on Ω, and that Ω is complete with respect to d. Furthermore, Ω is separable, as will follow from an argument below. We now prove (2.3.2). By definition, for each λ Λ, the projections πλ :Ω Ωλ, λ Λ, P Ñ P are continuous maps from the topological space Ω,τ to the topological space Ωλ,τλ . As p q p q a consequence of Theorem 1.4.7, the σ-algebra λ ΛB τλ is generated by all sets of the form 1 b P p q πλ´ O , O τλ, λ Λ, and in combination with the aforementioned continuity of the projections p q P 1 P we get that π´ O τ, whence B τλ σ τ . λ p q P λ Λ p qĂ p q To prove the converse inclusion, weP first of all observe that the (at most) countable product  of separable metric spaces is separable again. Indeed, for λ Λ we denote by Dλ a countable P dense subset of Ωλ. For each λ Λ choose and fix ωλ Dλ arbitrarily and set P P

D : ω Dλ : ωλ1 ωλ1 for finitely many λ1 Λ . “ P ‰ r P λ Λ ąP ( 3Recall that a Polish space was defined as a separabler topological space, for which there exists a complete metric that induces its topology. 58 CHAPTER 2. THE LEBESGUE INTEGRAL

Then D is countable and dense in Ω, so Ω is separable. Now let Oλ : B 1 ω : ω Dλ,n N , “ n p q P P !1 ) with B 1 ω ω Ωλ : dλ ω, ω n , then Oλ is a countable basis of the topology of Ωλ n p q “ t P p q ă u (i.e., any open set in Ωλ can be written as a (a priori possibly uncountable) union of elements of Oλ). Thus, r r n 8 1 π´ Bλ : Bλ Oλ , λ1, . . . , λn Λ . (2.3.3) λi p i q i P i t uĂ n 1 i 1 ď“ ! č“ ) is a basis for the topology τ. Now since Ω is separable, for an arbitrary basis B of the topology τ, any open set in τ can be written as the countable union of elements in B. Thus, since the elements of (2.3.3) are contained in λ ΛB τλ , this finishes the proof of τ λ ΛB τλ and thus also σ τ λ ΛB τλ . b P p q Ăb P p q p qĂb P p q

We immediately obtain the following important corollary.

Corollary 2.3.10. For each d N, we have P d d B R B R b . p q“ p q 2.4 Product measures

Oftentimes, such as e.g. in the case of Rd, we do know how to integrate with respect to the ‘one-dimensional measures’ (such as with respect to the Lebesgue measure, where the funda- mental theorem of calculus provides us with a powerful tool to actually compute integrals), but integration with respect to the product measure seems to be harder when it comes to actual computations (recall Theorem 1.3.11 as well as Remark 1.3.15). The Theorems 2.5.1 and 2.5.2 below provide a useful technique to reduce the integral with respect to the product measure to integrals with respect to the marginals. In order to be able to rigorously formulate them, we first have to introduce the concept of a product measure in a more general setting than that of Remark 1.3.15.

Definition 2.4.1. Assume measurable spaces Ω , F , Ω , F and E, E be given and write p 1 1q p 2 2q p q Ω : Ω Ω . For arbitrary A Ω as well as ωj Ωj, 1 j 2, we call “ 1 ˆ 2 Ă P ď ď A : ω Ω : ω ,ω A ω1 “ 2 P r2 p 1 2q P ( the ω1-section of A (‘ω1-Schnitt vonr A’), and similarlyr we call Aω2 : ω Ω : ω , ω A r r “ 1 P 1 p 1 2q P r ( the ω2-section of A (‘ω2-Schnitt von A’). r If f :Ω E, then we call Ñ r r fω :Ω2 E, ω2 f ω1,ω2 1 Ñ ÞÑ p q the ω1-section of f (‘ω1-Schnitt von f’), and similarly r r f ω2 :Ω E, ω f ω , ω r r 1 Ñ 1 ÞÑ p 1 2q r the ω2-section of f (‘ω2-Schnitt von f’). r Lemma 2.4.2. Assume the setting of Definition 2.4.1. r r 2.4. PRODUCT MEASURES 59

(a) If A F1 F2, then P b ω2 Aω F , A F . 1 P 2 P 1 r ω2 (b) If f is F F E-measurable, thenr fω is F E-measurable, and similarly f is 1 b 2 ´ 1 2 ´ F1 E-measurable. r ´ r Proof. (a) Fix ω Ω and consider the system of sets 1 P 1 A : A F F : A F . r “ P 1 b 2 ω1 P 2 r ( We claim that A is a σ-algebra. Indeed, Ω A since Ωω1 Ω2 F2. Furthermore, since c P c “ P A Ω A , we get that A A implies A A. Lastly, for An with An A for p qω1 “ 2zp ω1 q P P p q P all n N, we deduce r rP r An An ω , ω 1 n N 1 “ n Np q ´ P ¯ P ď ď r and hence we deduce n N An A, too, andr A is a σ-algebra. Furthermore, for F1 F1, P P P F2 F2 we have F1 F2 F1 F2, since P ˆŤ P b

F2, if ω1 F1, F1 F2 ω P p ˆ q 1 “ , otherwise. " H r r Since F F (recall the notation of Lemma 1.1.7) generates the σ-algebra F F , we 1 ˚ 2 1 b 2 deduce that A F F , which finishes the proof. “ 1 b 2 Analogously, one can show Aω2 F . P 1 (b) Fix ω Ω . For F E we haver 1 P 1 P 1 1 f ´ F f ´ F , r p ω1 q p q “ p p qqω1 and the claim follows from the factr that f 1 F F r F in combination with (a). ´ p q P 1 b 2 Similarly for ω Ω and f ω2 . 2 P 2 r r Proposition 2.4.3. For j 1, 2 , let µj be a measure on a measurable space Ωj, Fj . P t u p q (a) If µ is σ-finite, then for any F F F , the function 2 P 1 b 2

Ω ω µ Fω (2.4.1) 1 Q 1 ÞÑ 2p 1 q is F B R -measurable, and the function 1 ´ p q

µ : F F µ Fω µ dω (2.4.2) 1 b 2 ÞÑ 2p 1 q 1p 1q żΩ1 defines a measure such that for any A F , B F , P 1 P 2 µ A B µ A µ B . (2.4.3) p ˆ q“ 1p q ¨ 2p q (b) If both, µ and µ are σ-finite, then there is exactly one measure µ µ on F F such 1 2 1 b 2 1 b 2 that (2.4.3) holds with µ replaced by µ µ . In this case, 1 b 2

ω2 µ µ F µ Fω µ dω µ F µ dω F F F . (2.4.4) 1 b 2p q“ 2p 1 q 1p 1q“ 1p q 2p 2q @ P 1 b 2 żΩ1 żΩ2 µ µ is called the product measure of µ and µ , and it is σ-finite. 1 b 2 1 2 60 CHAPTER 2. THE LEBESGUE INTEGRAL

Proof. (a) As before, by the usual exhaustion procedure we can assume that µ2 is finite. Then desired measurability of the function in (2.4.1) follows again by a good sets principle and we will omit the details here. We can rewrite the function in (2.4.2) as

µ F 1F ω ,ω µ dω µ dω , p q“ p 1 2q 2p 2q 1p 1q żΩ1 żΩ2 from which one can observe by applying the MCT that this expression defines a measure on F F . By setting F : A B we immediately obtain (2.4.3). 1 b 2 “ ˆ (b) By (a) and symmetry, the middle and right-hand side expressions of (2.4.4) both define measures satisfying (2.4.3). Since both, µ1 and µ2 are σ-finite, we can use Theorem 1.2.17 to deduce the desired uniqueness, so the two measures coincide and µ µ is well-defined 1 b 2 and uniquely determined by (2.4.4).

Remark 2.4.4. σ-finiteness of µ2 is needed Part (a) of Proposition 2.4.3, since otherwise the function ω µ Fω is not necessarily measurable anymore, see [Beh87, p.96]. 1 ÞÑ 2p 1 q The above can immediately be generalized to the product of finitely many measures, which in particular is a generalization of the observation of Remark 1.3.15.

Theorem 2.4.5. Let Ωi, Fi,µi , 1 i n, be σ-finite measure spaces. Then there exists a p q ď ď unique σ-finite measure on the product-σ-algebra F : 1 i nFi such that “b ď ď n µ F1 . . . Fn µi Fi , p ˆ ˆ q“ i 1 p q ź“ for all Fi Fi, 1 i n. n P ď ď i 1µi : µ1 . . . µn : µ is called the product measure of the µi, 1 i n, and in the case b “ “ b b “ n ď ď that all Ωi, Fi,µi are equal, we write µb for the product measure. p q 1 Proof. We will not give the proof here since we will prove a more general result in Theorem 4.2.1 below. See [Els05, Satz V.1.12] for a proof.

2.5 The theorems of Fubini and Tonelli

Theorem 2.5.1 (Tonelli’s theorem (Italian mathematician (1885–1946))). Let Ω , F ,µ and p 1 1 1q Ω , F ,µ be σ-finite measure spaces, and let f M Ω Ω , F F . Then the function p 2 2 2q P `p 1 ˆ 2 1 b 2q

ω f ω ,ω dµ ω is in M` Ω , F , (2.5.1) 1 ÞÑ p 1 2q 2p 2q p 1 1q żΩ2 the function

ω f ω ,ω dµ ω is in M` Ω , F , (2.5.2) 2 ÞÑ p 1 2q 1p 1q p 2 2q żΩ1 and the equality

f dµ1 µ2 f ω1,ω2 µ2 dω2 µ1 dω1 f ω1,ω2 µ1 dω1 µ2 dω2 Ω1 Ω2 b “ Ω1 Ω2 p q p q p q“ Ω2 Ω1 p q p q p q ż ˆ ż ż ż ż ´ ¯ ´ ¯ (2.5.3) holds true. 2.5. THE THEOREMS OF FUBINI AND TONELLI 61

Proof. We see that (2.5.1) to (2.5.3) hold true for f 1F , F F F , due to Proposition 2.4.3. “ P 1 b 2 Using the linearity of the integral for non-negative functions as in Lemma 2.0.14, we obtain that the validity extends to simple non-negative functions which are F F B R -measurable. 1 b 2 ´ p q Choosing a non-decreasing sequence fn of non-negative simple functions as in Lemma 2.0.13 p q with limn fn f, we use Proposition 1.4.15 and the MCT to deduce the measurability stated Ñ8 “ in (2.5.1) and (2.5.2) for f M from their validity for simple functions: P `

f ω1,ω2 dµ2 ω2 lim fn ω1,ω2 dµ2 ω2 lim fn ω1,ω2 dµ2 ω2 . p q p q“ n p q p q“ n p q p q żΩ2 żΩ2 Ñ8 Ñ8 żΩ2

Similarly for the roles of ω1 and ω2 exchanged. The equalities in (2.5.3) then follow by the MCT and their validity for simple functions:

f dµ1 µ2 lim fn dµ1 µ2 lim fn dµ1 µ2 n n Ω1 Ω2 b “ Ω1 Ω2 b “ Ω1 Ω2 b ż ˆ ż ˆ Ñ8 Ñ8 ż ˆ lim fn ω1,ω2 µ2 dω2 µ1 dω1 f ω1,ω2 µ2 dω2 µ1 dω1 , “ n Ω Ω p q p q p q“ Ω Ω p q p q p q Ñ8 ż 1 ´ ż 2 ¯ ż 1 ´ ż 2 ¯ and similarly for the second equality in (2.5.3).

It turns out that we are still allowed to integrate coordinatewise even if f is not necessarily non-negative. However, we have to demand integrability with respect to the product measure to replace non-negativity. This is the content of Fubini’s theorem.

Theorem 2.5.2 (Fubini’s theorem (Italian mathematician (1879–1943))). Assume the setting of Theorem 2.5.1, except that instead of f M Ω Ω , F F we only require f g ih, P `p 1 ˆ 2 1 b 2q “ ` with g, h M Ω Ω , F F . Then, if f is µ µ -integrable, P p 1 ˆ 2 1 b 2q 1 b 2 (a) Ac : ω Ω : f ω , is not µ -integrable F (2.5.4) 1 “ 1 P 1 p 1 ¨q 2 P 1 is a µ1-null set; ( (b) Ac : ω Ω : f ,ω is not µ -integrable F (2.5.5) 2 “ 2 P 2 p¨ 2q 1 P 2 is a µ2-null set; ( (c)

ω1 f ω1,ω2 dµ2 ω2 is in M A1, F1 A1 , (2.5.6) ÞÑ p q p q p | q żΩ2 and

ω2 f ω1,ω2 dµ1 ω1 is in M A2, F2 A2 , (2.5.7) ÞÑ p q p q p | q żΩ1 and

f dµ1 µ2 f ω1,ω2 µ2 dω2 µ1 dω1 Ω1 Ω2 b “ A1 Ω2 p q p q p q ż ˆ ż ´ ż ¯ (2.5.8) f ω1,ω2 µ1 dω1 µ2 dω2 . “ A Ω p q p q p q ż 2 ´ ż 1 ¯ Proof. From the µ µ -integrability of f we deduce that f is integrable with respect to µ µ 1 b 2 | | 1 b 2 as well. Thus, Tonelli’s theorem implies

f ω1,ω2 µ2 dω2 µ1 dω1 f dµ1 µ2 . (2.5.9) Ω1 Ω2 | p q| p q p q“ Ω1 Ω2 | | b ă8 ż ´ ż ¯ ż ˆ 62 CHAPTER 2. THE LEBESGUE INTEGRAL

The inner integral on the left-hand side is a measurable function due to (2.5.1) and so the finiteness of the outer integral on the left-hand side implies

f ω ,ω µ dω for µ a.a. ω Ω , | p 1 2q| 2p 2qă8 1 ´ 1 P 1 żΩ2 i.e., c A1 ω1 Ω1 : f ω1,ω2 µ2 dω2 “ P Ω | p q| p q“8 ! ż 2 ) is in F1 due to (2.5.1) and a µ1-null set, which establishes (2.5.4). In particular, for all ω A , the function f ω , is µ -integrable. (2.5.10) 1 P 1 | p 1 ¨q| 2 Thus we deduce from the linearity of the integral that for ω A we have 1 P 1

f ω ,ω µ dω Ref ` ω ,ω µ dω p 1 2q 2p 2q“ p q p 1 2q 2p 2q żΩ2 żΩ2 Ref ´ ω ,ω µ dω ´ p q p 1 2q 2p 2q żΩ2 (2.5.11) i Imf ` ω ,ω µ dω ` p q p 1 2q 2p 2q żΩ2 i Imf ´ ω ,ω µ dω , ´ p q p 1 2q 2p 2q żΩ2 where all integrals on the right-hand side exist in 0, due to (2.5.10) and r 8q Ref `, Ref ´, Imf `, Imf ´ 0, f . (2.5.12) p q p q p q p q P r | |s Thus, (2.5.6) follows since restricted to A1, all integrals on the right-hand side of (2.5.11) are finite and in M` A1, F1 A1 . The bounds (2.5.12) in combination with (2.5.9) imply that p | q linearity of the integral supplies us with

f ω1,ω2 µ2 dω2 µ1 dω1 Ref ` ω1,ω2 µ2 dω2 µ1 dω1 A Ω p q p q p q“ A Ω p q p q p q p q ż 1 ´ ż 2 ¯ ż 1 ´ ż 2 ¯ Ref ´ ω1,ω2 µ2 dω2 µ1 dω1 ´ A Ω p q p q p q p q ż 1 ´ ż 2 ¯ i Imf ` ω1,ω2 µ2 dω2 µ1 dω1 ` A Ω p q p q p q p q ż 1 ´ ż 2 ¯ i Imf ´ ω1,ω2 µ2 dω2 µ1 dω1 , ´ A Ω p q p q p q p q ż 1 ´ ż 2 ¯ where the last equality follows from the fact that all integrands of the outer integrals are non- negative. We may now apply Tonelli’s theorem to each summand on the right-hand side of the last display to deduce that it equals

Ref ` dµ1 µ2 Ω1 Ω2 p q b ż ˆ Ref ´ dµ1 µ2 ´ Ω1 Ω2 p q b ż ˆ Imf ` dµ1 µ2 ` Ω1 Ω2 p q b ż ˆ Imf ´ dµ1 µ2 ´ Ω1 Ω2 p q b ż ˆ f dµ1 µ2, “ Ω1 Ω2 b ż ˆ 2.6. FOURIER TRANSFORM / CHARACTERISTIC FUNCTIONS 63 where the last equality follows from the finiteness of the integrals on the left-hand side and the linearity of the integral. This implies the first equality in (2.5.8). The remaining statements are obtained by exchanging the roles of Ω , F ,µ and Ω , F ,µ . p 1 1 1q p 2 2 2q

2.6 Fourier transform / characteristic functions

Recall the notation for image measures introduced in Theorem 1.5.6 and Definition 1.5.7.

Definition 2.6.1. Let µ be a finite measure on Rd, B Rd . Its characteristic function is defined p p qq via the Fourier transform

it x ϕµ t : e ¨ µ dx cos t x µ dx i sin t x µ dx . p q “ d p q“ d p ¨ q p q` d p ¨ q p q żR żR żR The characteristic function of a random variable X taking values in Rd, B Rd is defined as p p qq the characteristic function of its distribution:

it x Thm. 2.2.19 it X ϕX t : ϕP X´1 t e ¨ PX dx E e ¨ p q “ ˝ p q“ d p q “ r s żR For the next example we need the following result which is interesting and useful in its own right.

Proposition 2.6.2 (Interchange of integration and differentiation). Let I be an interval con- taining more than one point, t I, let Ω, F,µ be a measure space, and let f : I X C 0 P p q ˆ Ñ with the following properties:

(a) for all t I, the function f t, is integrable; P p ¨q f (b) B t t0,ω exists for all ω Ω; B p q P (c) there exists a neighborhood U of t as well as g M Ω, F integrable such that for all 0 P `p q t U I with t t , one has for µ-a.a. ω Ω that P X ‰ 0 P f t,ω f t ,ω p q´ p 0 q g ω . t t0 ď p q ˇ ´ ˇ ˇ ˇ ˇ ˇ Then the function F : I t Ω f t,ω µ dω is (at least one-sidedly) differentiable in t0, the f Q ÞÑ p q p q function B t t0, is integrable, and B p ¨q ş f F 1 t B t ,ω µ dω . p 0q“ t p 0 q p q żΩ B Proof. See exercise 3 on homework sheet 10.

Remark 2.6.3. Under the appropriate assumptions, important computational tools that we got to know for real-valued functions remain valid for complex-valued functions also. This applies to the Fundamental Theorem of Calculus, integration by parts, etc., where we can essentially prove the respective results by decomposing a complex-valued function into its real and imaginary parts, and then perform the proof for each of these parts. Exemplifying we go through the example of the Fundamental Theorem of Calculus here: Let f : a, b C a continuous complex-valued function with f g ih and g, h real-valued r s Ñ “ ` functions, and F G iH an antiderivative to F with G and H antiderivatives to g and h. “ ` Then f dλ g dλ i h dλ G b G a i H b H a F b F a , where a,b “ a,b ` a,b “ p q´ p q` p p q´ p qq “ p q´ p q the penultimater s equalityr s followsr froms the Fundamental Theorem of Calculus. ş ş ş 64 CHAPTER 2. THE LEBESGUE INTEGRAL

2 Example 2.6.4. Let µ R, σ 0, , and let X N µ,σ . In order to compute ϕX we P P p 8q „ p q first of all note that using the previous result on integration by measures with densities (see Proposition 2.2.9) and using substitution,

px´µq2 2 1 8 itx 1 8 x it σx µ itµ ϕX t e´ 2σ2 e dx e´ 2 e p ` q dx e ϕY σt (2.6.1) p q“ ?2πσ2 “ ?2π “ p q ż´8 ż´8 where Y N 0, 1 . Using Proposition 2.6.2 we can then differentiate ϕY t 1 x2„2 itx p q p q “ 8 e e dx to get that ?2π ´ { ´8 ş 1 8 x2 2 itx ϕ1 t e´ { x i e dx. Y p q“ ?2π p´ q p´ q ż´8 :u1 x :v x “ p q “ p q looooomooooon looomooon We can continue using integration by parts to obtain

1 x2 2 itx 8 x2 2 itx 8 ϕY1 t e´ { i e x e´ { te dx tϕY t (2.6.2) p q“ ?2π p´ q “´8 ´ “´ p q ż´8 ´ 0 ˇ ¯ “ ˇ and note that looooooooooomooooooooooon 0 ϕY 0 E e 1. (2.6.3) p q“ r s“ From the theory of ODEs (i.e., Analysis II in your case, cf. Existence and Uniqueness Theorem of Picard-Lindel¨of, [Wal93, Theorem II.6.I]) we know that the initial value problem given by (2.6.2) and (2.6.3) has a unique solution which is given by

t2 2 ϕY t e´ { . p q“ Plugging this into (2.6.1), we obtain

iµt σt 2 2 ϕX t e ´p q { . p q“ The above considerations generalize to X a d-dimensional µ, Σ -distributed random variable p q (µ Rd, Σ Rd d positive definite) to obtain P P ˆ iµ t t Σ t 2 d ϕX t e ¨ ´ ¨p ¨ q{ , t R . p q“ P Theorem 2.6.5. Any finite measure on Rd, B Rd is uniquely characterized by its character- p p qq istic function.

The proof of this result will be given later on (see page 96), when we have a better probabilistic understanding of its tools. Chapter 3

Classical and basic results in probability theory

A large chunk of this chapter will be based on the lecture notes [Dre18] accompanying the course ‘Introduction to Stochastics’ which can be found here. As a consequence, we will not repeat proofs of results that are proven in the same way as in [Dre18] but rather refer to that source instead. In particular, you might want to have a look at [Dre18, Section 1.2] for motivating the concept of a probability. As regards to other sources, [Kle14], [Kal02] and [Dur10] make particularly good reads for foundations of probability theory. All three sources cover significantly more than what we can hope for in this course.

3.1 Specific distributions

When putting our previous setting of measure theory and integration into a probabilistic con- text, we will usually consider some probability space Ω, F, P to be given. Random experiments p q will then be described via random variables X : Ω, F, P E, E as defined in Definition 1.4.1. p q Ñ p q Furthermore, elements of the form X A : ω Ω : X ω A F , A E, will be called t P u “ t P p q P upP q P events, and they will constitute those outcomes of (random) experiments that we will be able to assign a probability to. See also [Dre18, Example 1.2.1]. In fact, this example (as well as Remark 3.1.1 below) also exemplifies that we will oftentimes and without loss of generality choose Ω, F : E, E ; the latter is usually naturally given by the model of the experiment, p q “ p q whereas the former may as well be some abstract space lurking in the background. What is important to us, however, is the image measures P X 1. ˝ ´ The following observation will prove useful in the next sections when we introduce various different distributions. This is largely taken from the corresponding section [Dre18, Section 1.8]. We refer to that source for further examples and motivation. Remark 3.1.1. Given any distribution µ (i.e., a probability measure on E, E ) one can con- p q struct a random variable X with law µ as follows. Take E, E,µ as the underlying probability p q space and choose X : E ω ω E to be the identity on E. Then X defines a random Q ÞÑ P variable from E, E,µ to E, E with law µ. p q p q In particular, as a consequence of this remark, if we want to describe an random experiment whose outcome is a value in E, then we can choose E, E,µ as the underlying probability space p q for suitable E and µ.

3.1.1 Discrete distributions We recall the notion of a distribution introduced in Definition 1.5.7, and we also repeat the definition [Dre18, Definition 1.3.8] of the δ- or Dirac-measure.

65 66 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

Definition 3.1.2. Let E, E be a measurable space. For x E, the Dirac measure / Dirac p q P distribution / delta measure in x is defined via

δx : E 0, 1 . Ñ r s F 1F x . ÞÑ p q We will call any distribution on a measurable space E, E which is of the form p q

αnδxn , n N ÿP where xn E and αn 0 with αn 1 a discrete distribution. Similarly, we call any P ě n N “ random variable which has a discreteP distribution a discrete random variable. ř In the remaining part of this section, if not mentioned otherwise explicitly, we will always assume the underlying probability space to be Ω, F, P , and random variables map to R, B R . p q p p qq Example 3.1.3. A random variable X is called Bernoulli distributed with parameter p 0, 1 P r s (named after the Swiss mathematician Jacob Bernoulli (1655–1705)) if

P X 1 p, and P X 0 1 p. p “ q“ p “ q“ ´ 1 In this case one writes X Berp and the law / distribution P X is referred to as the „ ˝ ´ Bernoulli distribution Berp which, using Definition 3.1.2, can be written as

Berp pδ 1 p δ . “ 1 ` p ´ q 0 A random variable that is Bernoulli distributed describes a coin flip (biased if p 1 2), for ‰ { example. Assume w.l.o.g. that the coin shows heads with probability p and tails with probability 1 p. ´ Example 3.1.4. A random variable X is called Binomially distributed with parameters n N P and p 0, 1 , if P p q

n k n k for each k 0, 1,...,n one has P X k p 1 p ´ . (3.1.1) P t u p “ q“ k p ´ q ˆ ˙

In this case, one writes X Binn,p and its distribution is referred to as the Binomial distribution „ Binn,p, which can be written as

n n k n k Binn,p p 1 p ´ δk. “ k p ´ q k 0 ˆ ˙ ÿ“ Example 3.1.5. A random variable X is called geometrically distributed with success param- eter p 0, 1 , if P p q k 1 for all k N one has P X k p 1 p ´ . (3.1.2) P p “ q“ p ´ q In this case we write X Geop, and its distribution is referred to as the Geometric distribution „ Geop, which can be written as

8 k 1 Geop p 1 p ´ δk. “ p ´ q k 1 ÿ“ Remark 3.1.6. Some authors call X geometrically distributed if instead of (3.1.2),

for all k N one has P X k p 1 p k. P 0 p “ q“ p ´ q 3.1. SPECIFIC DISTRIBUTIONS 67

Example 3.1.7. A random variable X is called Poisson distributed with parameter ν 0 if ą X :Ω N0 and Ñ k ν ν P X k e´ k N . p “ q“ k! @ P 0 In this case we write X Poiν, and its distribution is referred to as the Poisson distribution „ Poiν (named after the French mathematician Sim´eon Denis Poisson (1781 – 1840)), which can be written as k ν 8 ν Poiν e´ δk. “ k! k 0 ÿ“ Poisson distributed random variables are e.g. used to describe the number of customers that have called a customer service center in a certain time interval. The reason for such a description being feasible is given by Theorem 3.1.8 below.

Theorem 3.1.8 (Poisson limit theorem). Let pn be a sequence of numbers from 0, 1 such p q r s that the limit ν : limn npn exists in 0, . Then for each k N0, “ Ñ8 p 8q P lim Binn,p k Poiν k . n n Ñ8 p q“ p q Proof. For k N fixed we have P 0 k n k k n k n k n! pnn pnn ´ n ν ν Binn,p k p 1 pn ´ p q 1 Ñ8 e´ Poiν k . n p q“ k np ´ q “ k! n k ! nk ´ n ÝÑ k! “ p q ˆ ˙ p ´ q ´ ¯

This result explains the fact that the Poisson distribution is used for modeling e.g. the number of customers that contact a call center during a certain time interval: We partition the time interval into n subintervals of equal width, and as we take n to infinity, it is reasonable to assume that in any of the subintervals either zero or one customers are calling. Due to symmetry, it furthermore seems reasonable to assume that the probability of a customer calling in a subinterval has a probability decaying like p n some p 0, , and that the fact that a customer has called { P p 8q during one subinterval does not influence the probabilities that a customer is calling during another time interval.1 Thus, the probability of k customers calling during the original time interval should be approximated by Binn,p n k if n is large. The above Theorem 3.1.8 now { p q shows that the Binomial distribution is the right candidate for this. Example 3.1.9. Let N N, and K,n 0, 1,...,N . A random variable X is called hyper- P P t u geometrically distributed with parameters K,N,n if X : 0, 1,...,N 0, 1,...,N with t u Ñ t u K N K k n´k P X k ´ for k 0 n K N,...,n K , (3.1.3) p “ q“ N P t _ ` ´ ^ u ` ˘`n ˘ and P X k 0 otherwise. p “ q“ ` ˘ In this case we write X Hyp N,K,n , and its distribution is referred to as the Hypergeometric „ p q distribution HypN,K,n with parameters N, K, and n.

Example 3.1.10. Let X Geop. Then the distribution function of X is given by „

0, if t 1, ttu ă FX t ttu j 1 1 1 p ttu p q“ # j 1 p 1 p ´ p 1´p 1´ pq 1 1 p , if t 1. “ p ´ q “ ´p ´ q “ ´ p ´ q ě Exercise 3.1.11. If Xřis a discrete real random variable, then FX has jumps exactly at the points in x X Ω : P X 1 x 0 and is constant otherwise. t P p q p ´ t uq ą u 1These are slightly delicate issues; in fact, if the customer center in question is e.g. that of an energy retailer and there is a power outage during some part of the time interval we consider, then these assumptions will generally not be met. However, they seem reasonable to assume during normal operation. 68 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

3.1.2 Distributions with densities Definition 3.1.12. A function f M Ω, F,µ with the property that P `p q f dµ 1 “ żR is called a probability density (‘Wahrscheinlichkeitsdichte’). For the time being we will mostly be interested in the case Ω, F,µ Rd, B Rd , λd . p q “ p p q q We will call any distribution on Ω, F which is absolutely continuous with respect to λd a con- p q tinuous distribution. Similarly, we call any random variable which has a continuous distribution a continuous random variable. Also, we remark in passing that the distinction between discrete and continuous distributions is not as essential anymore as it used to be in the introductory lecture. This is because we now have one comprising framework for discrete and continuous distributions, since both of them can be considered as probability measures on Ω, F now. p q Example 3.1.13. (a) For a, b R with a b the uniform distribution (‘Gleichverteilung’) P ă on the interval a, b has the density r s 1 1 R x a,b x . Q ÞÑ b a r sp q ´ We write Uni a, b for the uniform distribution on the interval a, b , and the correspond- pr sq r s ing distribution function is given by

0, if t a, t a ď F t ´ , if t a, b , p q“ $ b a P p q ´1, if t b. & ě (b) Let κ 0, . The exponential distribution% (‘Exponentialverteilung’) with parameter κ P p 8q has density κe κx, if x 0, R x ´ ě Q ÞÑ 0, otherwise. " We write X Exp κ if X is a random variable that is exponentially distributed with „ p q parameter κ 0. ą (c) The normal or Gaussian distribution (‘Normalverteilung’ or ‘Gaußverteilung’, named after the German mathematician Carl Friedrich Gauss (1777–1855)) with parameters µ R and P σ2 0, (seen in Example 1.5.8 already) has the density P p 8q 1 px´µq2 R x e´ 2σ2 , Q ÞÑ ?2πσ2 and we had agreed to write X N µ,σ2 if X is a random variable that is normally „ p q distributed with parameters µ and σ2. It should also be noted here that the cumulative distribution function of the standard Nor- mal distribution N 0, 1 is usually denoted by p q t 1 x2 Φ t : e´ 2 dx, (3.1.4) p q “ ?2π ż´8 and that there is no closed expression for general values of t for the right-hand side. There are, however, tables to look up those values for a variety of different values for t. We will get back to those distributions after having introduced the concept of expectation. 3.2. INDEPENDENCE 69

3.2 Independence

A key concept in probability theory is the notion of ‘independence’, which in the setting of events has been introduced as follows in [Dre18, Def. 1.6.1], see here. In fact, there are even people saying that the concept of independence is the principal distinction of probability from measure theory. The motivation for the definition of independence has been the following definition of the con- ditional probability.

Definition 3.2.1. Let F, G F be such that P G 0. Then we define the conditional proba- P p qą bility of F given G as P F G P F G : p X q. (3.2.1) p | q “ P G p q In terms of the interpretation of relative frequencies given in [Dre18, Section 1.2], this means that if P F G P F (i.e., if P F G P F P G ), then the (limiting) relative frequency p | q “ p q p X q “ p q p q of F is not changed if we restrict to those experiments for which G occurs. This gave rise to the following definition.

Definition 3.2.2. Given a probability space Ω, F, P , events A, B F are called independent p q P if P A B P A P B . p X q“ p q ¨ p q As it turns out, we will need a more general concept of independence as introduced in the following definition.

Definition 3.2.3. A family Eλ , λ Λ, with Eλ F is called independent if for any J Λ p q P Ă Ă finite and any choice of Fj Ej for j J, one has P P

P Fj P Fj . (3.2.2) j J “ j J p q ` čP ˘ źP

An important special case is when Eλ Fλ for all λ Λ, and with Fλ F. In this case we “ t u P P say that the family of events Fλ , λ Λ, is independent. p q P

Remark 3.2.4. The family Eλ , λ Λ, is independent if and only if for any J Λ finite, the p q P Ă family Eλ , λ J, is independent. p q P

Proposition 3.2.5. Let Eλ , λ Λ be an independent family with Eλ F. Then the family p q P Ă δ Eλ , λ Λ, of Dynkin systems is also independent. p p qq P Proof. From Remark 3.2.4 we deduce that we can assume Λ to be finite. For λ Λ arbitrary but fixed define Dλ1 to be the set of all F F such that if we replace Eλ1 by 1 P P F , then the resulting family Eλ , λ Λ, is still independent. Then Dλ1 is a Dynkin system t u p q P (exercise). Now we have Eλ1 Dλ1 , and hence also δ Eλ1 Dλ1 . Thus, by definition of the independence Ă p q Ă property, we deduce that the family we obtain when replacing Eλ1 by δ Eλ1 is still independent. p q Repeating this step for each remaining λ Λ λ , the result follows. P zt 1u

Corollary 3.2.6. If Eλ , λ Λ is an independent family such that each Eλ is a π-system, then p q P σ Eλ , λ Λ also is an independent family. p p qq P Proof. This follows from Proposition 3.2.5 in combination with the π-λ-Theorem 1.1.32. 70 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

Proposition 3.2.7. Let Eλ , λ Λ, be an independent family such that each Eλ is a π-system. p q P Consider a partition of Λ into subsets Ji, i I, and denote P

Fi : σ Ej . “ j Ji ´ ďP ¯

Then Fi , i I, is an independent family. p q P Proof. For i I denote P

Ei : Ej . . . Ej : n N, j ,...,jn Ji, and Ej Ej k 1,...,n . “ 1 X X n P t 1 uĂ k P k @ P t u ! ) r Since each Eλ is a π-system, so is each Ei and hence σ Ei δ Ei . Since the Ei , i I, still p q “ p q p q P form an independent family, the result follows with Corollary 3.2.6. r r r r Definition 3.2.8. A family of random variables Xλ , λ Λ, is called independent if the family p q P of σ-algebras σ Xλ , λ Λ, is independent. p q P We refer to [Dre18] on more background for the concept of independence, in particular see [Dre18, Remark 1.7.15]. In [Dre18, Claim 1.8.4] the following claim had been derived, which you might want to try your hands at (without looking it up) if you haven’t seen it before.

Claim 3.2.9. The sum n Sn : Xj “ j 1 ÿ“ of independent random variables X1,...,Xn, each distributed according to Berp, is distributed according to Binn,p.

The concept of a family of random variables that are independent and all have the same distri- bution is so important that it has its own name.

Definition 3.2.10. A family of random variables Xλ , λ Λ, is called independent identically p q P distributed (i.i.d.) (‘unabh¨angig identisch verteilt’ (u.i.v.)), if

(a) the family Xλ , λ Λ, is an independent family of random variables, and p q P

(b) if the Xλ, λ Λ, all have the same distribution. P

The Borel-Cantelli lemmas In order to prove this theorem we need some further results, which are important and of interest on their own.

Lemma 3.2.11 (Borel-Cantelli lemma (Italian mathematician Francesco Paolo Cantelli (1875–1966)). Let Ω, F, P be a probability space and assume given a sequence An of events p q p q An F. P (a) If

P An , (3.2.3) n N p qă8 ÿP we have P lim sup An 0. n “ Ñ8 ` ˘ 3.2. INDEPENDENCE 71

(b) If P An , n N p q“8 ÿP and if in addition the An are independent, then p q P lim sup An 1. n “ Ñ8 ` ˘ The proof is that of [Dre18, Lemma 1.12.6]. Remark 3.2.12. It is important to note here that the independence assumption in part (b) of Lemma 3.2.11 cannot be dropped. To see this, consider for example a single fair coin toss modeled on a probability space Ω, F, P , and denote for all n N by An the event that the coin p q P shows tails. Then P A 1 for all n N, so P A , but P lim sup A n 2 n N0 n n n 1 p q “ P P p q“8 p Ñ8 q “ P An 1. p q“ 2 ‰ ř Example 3.2.13. (a) A popular application is the so-called ‘infinite monkey theorem’. It states that a monkey which is randomly hitting keys (in an i.i.d. fashion, and such that any key, lower and upper case, has a positive probability of being hit) of a computer keyboard will almost surely type any given text, such as e.g. Tolstoy’s ‘War and Peace’. It is left to the reader to make this statement more precise. k

(b) Consider a sequence Xn of independent random variables such that p q 1 1 P Xn n P Xn n p “ q“ p “´ q“ 2 n ln n 1 p ` q and 1 1 P Xn 0 1 . p “ q“ ´ 2 n ln n 1 p ` q Xn Then, setting An : | | 1 , we obtain “ t n ě u 8 8 1 P A , n n ln n 1 n 1 p q“ n 1 “8 ÿ“ ÿ“ p ` q where the latter equality can be shown by Cauchy’s condensation test (‘Cauchy’sches Verdichtungskriterium’). Therefore, the sequence An fulfills the condition of the sec- p q ond part of the Borel-Cantelli lemma. We will come back to this example in the context of the law of large numbers.

Kolmogorov’s 0 1-law ´ Definition 3.2.14. Let Fn be a sequence of σ-algebras with Fn F. We define the corre- p q Ă sponding tail-σ-algebra (‘terminale σ-Algebra’) as

8 T : T Fn : σ Fm . “ p q “ n 1 m n ` ˘ č“ ´ ďě ¯ The intuition is the following: An event A F is contained in the tail-σ-algebra if in order to P decide whether or not it occurs (i.e., whether or not ω A) we can discard the ‘information’ P from finitely many of the Fn. It becomes more clear in the context of an example.

Example 3.2.15. Let Xn be a sequence of random variables and consider the corresponding p q tail-σ-algebra T : T σ Xn . “ p p qq n Define Sn : i Xi. ` ˘ “ 1 Then we do for“ example have that (check!) ř 72 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

lim Sn exists T and n Ñ8 P ( • S lim n 0 T . n n Ñ8 “ P ( Theorem 3.2.16 (Kolmogorov’s 0 1-law). Let Fn be a sequence which is an independent ´ p q family of σ-algebras with Fn F. Then the tail-σ-algebra T is P-trivial, i.e., Ă P A 0, 1 A T . p q P t u @ P Proof. We are going to show that for all A T , P P A P A 2, (3.2.4) p q“ p q which will imply the result. For this purpose, for arbitrary fixed A T define P D : D F : P A D P A P D . “ P p X q“ p q p q Our strategy is to show that D is a Dynkin system with (

T D, (3.2.5) Ă which in particular will imply (3.2.4). The fact that D is a Dynkin system is shown along the by now standard lines and we will not go into further detail. In order to establish (3.2.5), we observe that due to Proposition 3.2.7 and the fact that A T we have P n σn : σ Fk D, “ Ă k 1 ´ ď“ ¯ so we also obtain σ : σn D, 8 “ n N Ă ďP and consequently δ σ D. (3.2.6) p 8qĂ But since the σn are non-decreasing in n, we deduce that σ is a π-system, so by the π-λ- 8 Theorem we infer δ σ σ σ . (3.2.7) p 8q“ p 8q On the other hand, n NFn σ σ , and therefore in particular also T σ σ . Combining Y P Ă p 8q Ă p 8q this with (3.2.7) and (3.2.6), we infer (3.2.5) which finishes the proof.

3.3 Covariance, variance

Definition 3.3.1. For X L1, the expression on the right-hand side of P Var X : E X E X 2 0, p q “ p ´ r sq P r 8s is called the variance of X. “ ‰ 3.3. COVARIANCE, VARIANCE 73

From the expression on the right-hand side it is clear that the variance is always non-negative, since the random variable in the expectation on the right-hand side is non-negative. Further- more, this expression shows that the variance gauges the expected quadratic deviation of X from its expectation E X . It is a simple measure for how strongly the random variable X r s fluctuates around its mean. Using the linearity of expectation, we can rewrite the variance as

Var X E X E X 2 E X2 2E X E X E X 2 E X2 E X 2, p q“ p ´ r sq “ r s´ r s r s` r s “ r s´ r s which holds true in“ the case E X2‰ as well. Thus, we immediately obtain the following r s“8 corollary.

Corollary 3.3.2. For X L1, we have Var X if and only if E X2 . P p qă8 r să8 Definition 3.3.3. The covariance of two random variables X and Y is defined as

Cov X,Y E X E X Y E Y E XY 2E X E Y E X E Y E XY E X E Y p q“ p ´ r sqp ´ r sq “ r s´ r s r s` r s r s“ r s´ r s r s (3.3.1) “ ‰ if the right-hand side is well-defined in , . r´8 8s The two random variables are called uncorrelated if Cov X,Y 0. p q“ Again we note that variance and covariance only depend on the random variables involved through their corresponding distributions. In some sense the covariance Cov X,Y tells us how strongly X and Y are correlated, i.e., p q how strongly they tend to ‘change together’. If both X and Y tend to take values above their expectation on the same subset of Ω, and also tend to take values below their expectations on similar sets, then according to (3.3.1) this should imply that their covariance is positive; on the other hand, if X tends to take values above its expectation on subsets of Ω where Y tends to take values below its expectation, and vice versa, then this would suggest that their covariance is negative. Therefore, if X and Y are independent one might possibly guess that Cov X,Y vanishes. This is indeed the case as Theorem 3.3.6 below shows. Note, however, p q that the converse is not generally true as will be asked to show in Exercise 3.3.8. We now collect some properties of covariances and variances in the following result.

Proposition 3.3.4. Let X and Y be random variables with E X2 , E Y 2 , and let a, b, c, d r s r să8 P R. Then

(a) Cov aX b, cY d ac Cov X,Y ; p ` ` q“ p q in particular, Var a X b a2 Var X ; (3.3.2) p p ` qq “ p q (b) Cov X,Y Var X Var Y ; | p q| ď p q p q Proof. • Using the linearity of expectation wea get

Cov aX b, cY d E aX b E aX b cY d E cY d p ` ` q“ p ` ´ r ` sqp ` ´ r ` sq acE X E X Y E Y ac Cov X,Y . “ “ p ´ r sqp ´ r sq “ p q ‰ “ ‰ •

1 1 Cov X,Y E X E X Y E Y E X E X 2 2 E Y E Y 2 2 | p q| ď | ´ r s| ¨ | ´ r s| ď p ´ r sq p ´ r sq “Var X Var Y , ‰ “ ‰ “ ‰ “ p q p q a 74 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

where the inequality is a consequence of the Cauchy-Schwarz inequality.2

Before continuing, we bring a small result which is easy to prove but nevertheless oftentimes important and helpful. Claim 3.3.5. Let X : Ω, F, P E , E and Y : Ω, F, P E , E be two independent p q Ñ p 1 1q p q Ñ p 2 2q random variables. Then P X,Y PX PY . (3.3.3) p q “ b Proof. Due to the independence assumption on X and Y , on the -stable generator of rectangles X of E1 E2 the two probability measures P X,Y and PX PY coincide. Thus, Corollary 1.2.19 b p q b yields (3.3.3).

Theorem 3.3.6. Let X,Y L1 be independent random variables. Then XY L1 and P P E XY E X E Y . (3.3.4) r s“ r s r s In particular, independent random variables are uncorrelated. Proof. We recall the statement of Claim 3.3.5. Therefore, the change of variable formula The- orem 2.2.19 in combination with Tonelli’s theorem implies that

E XY z P XY dz xy P X,Y d x,y xy PX PY d x,y r| |s “ | |p q“ 2 | | p qp p qq “ 2 | | b p p qq żR żR żR xy PX dx PY dy x PX dx y PY dy E X E Y . “ 2 | | p q p q“ | | p q | | p q“ r| |s r| |s żR żR żR Now if X,Y L1, then the right-hand side (and therefore all expressions appearing) are finite. P In particular, in this case we can remove the absolute value signs and obtain (reading the previous display from right to left, and replacing Tonelli by Fubini) that XY L1 as well as P (3.3.4).

Remark 3.3.7. Iterating the above we obtain the following generalization of Theorem 3.3.6: 1 Let X1,...,Xn be a family of independent random variables which are either all in L or all non-negative. Then n n E Xj E Xj . j 1 “ j 1 r s ” ź“ ı ź“ Exercise 3.3.8. Find an example of real random variables X,Y which are uncorrelated but not independent. We now compute some variances of distributions we got to know earlier in this course. Example 3.3.9. (a) Let X N µ,σ2 with µ R and σ2 0, . It is not hard to compute „ p q P P p 8q E X µ (see e.g. [Dre18, Example 1.9.6]). Then we get using Proposition 2.2.9 that r s“ px´µq2 2 8 2 1 Var X E X E X x µ e´ 2σ2 dx p q“ p ´ r sq “ p ´ q ?2πσ2 ż´8 “ ‰ 2 2 2 2 x σx µ 1 8 2 x σ x 8 8 x 2 ÞÑ ` σx e´ 2 dx xe´ 2 e´ 2 dx σ , “ ?2π p q “ ?2π ´ x ` “ ż´8 ´ ˇ “´8 ż´8 ¯ 0ˇ “ ˇ ?2π looooooomooooooon “ 2Here, the Cauchy-Schwarz inequality is applied to the symmetric bilinear form definedloooooomoooooon via L2 ˆ L2 Qpf,gq ÞÑ f ¨ g dµ P R – which is not necessarily an inner product since since we can have pf, fq “ 0 even if f ‰ 0 (and only µpf ‰ 0q “ 0) – however, the (standard) proof of the Cauchy-Schwarz for inner products does not depend ş on the missing implication pf, fq“ 0 ùñ f “ 0. 3.3. COVARIANCE, VARIANCE 75

where we used integration by parts for the penultimate equality. Hence, we observe that the second parameter in N µ,σ2 denotes the variance of the random variable. In particular, p q this means that the normal distribution is completely distributed by its expectation and its variance. Furthermore, we deduce that the standard normal distribution from Example 3.1.13 (c) has mean 0 and variance 1.

(b) Let X Geo p for p 0, 1 . We first compute E X and for this purpose we take „ p q P p q r s advantage of the following useful trick. For q 1, 1 , the formula for the geometric P p´ q series supplies us with 8 q qj . “ 1 q j 1 ÿ“ ´ Since the left-hand side defines a power series that is absolutely convergent on 1, 1 , we p´ q know from Analysis I that its derivative can be computed term by term. Thus, differenti- ating both sides of the equation gives

8 j 1 1 q q 1 1 jq ´ p ´ q´ p´ q . (3.3.5) “ 1 q 2 “ 1 q 2 j 1 ÿ“ p ´ q p ´ q Using this identity for q 1 p we can compute “ ´

8 8 j 1 p 1 E X jP X j jp 1 p ´ . (3.3.6) r s“ p “ q“ p ´ q “ p2 “ p j 1 j 1 ÿ“ ÿ“ We now have to compute E X2 . For this purpose we differentiate (3.3.5) once again (and r s again, the left-hand side can be differentiated term by term on 1, 1 due to its absolute p´ q convergence) to obtain

8 j 2 2 j j 1 q ´ . (3.3.7) 1 q 3 j 1 p ´ q “ ÿ“ p ´ q Thus, we get using the change of variable formula that

2 8 2 8 2 j 1 E X j P X j j p 1 p ´ r s“ j 1 p “ q“ j 1 p ´ q ÿ“ ÿ“ 8 j 2 8 j 1 2 1 p 1 2 p p 1 p j j 1 1 p ´ p j 1 p ´ p ´ q ´ , p2 p p2 “ p ´ q j 1 p ´ qp ´ q ` j 1 p ´ q “ ` “ ÿ“ ÿ“ where we took advantage of (3.3.6) and (3.3.7) to get the third equality. Thus, we can compute 2 p 1 1 p Var X E X2 E X 2 ´ ´ . p q“ r s´ r s “ p2 ´ p2 “ p2

If we want to compute the variance of the sum of random variables, the following result turns out to be useful by decomposing it into a sum of variances and corresponding covariances.

2 Proposition 3.3.10. Let X1,...,Xn be random variables in L . Then

n n Var Xj Var Xj Cov Xi, Xj . j 1 “ j 1 p q` 1 i,j n,i j p q ´ ÿ“ ¯ ÿ“ ď ÿď ‰ 76 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

Proof. Due to Proposition 3.3.4 (a), without loss of generality, we can assume E Xi 0 for all r s“ 1 i n. Using the linearity of expectation we get ď ď n n 2 n 2 n Var Xj E Xj E Xj E XiXj j 1 “ j 1 ´ j 1 “ i,j 1 r s ´ ÿ“ ¯ ”´ ÿ“ ¯ ı ´ ” ÿ“ ı¯ ÿ“ 0 by assumption “ n looooooomooooooon Var Xi Cov Xi, Xj . “ p q` p q i 1 1 i,j n,i j ÿ“ ď ÿď ‰

Note that 1 1 2 2 E XiXj E X 2 E X 2 r| |s ď r i s r j s ă8 due to Cauchy-Schwarz’ inequality, hence` all expectations˘ ` ˘ in the above equations are well- defined, and so are all the sums.

If the random variables in the above result turn out to be uncorrelated, all covariances in the above result vanish and the computation of the variance becomes significantly simpler. The corresponding result is used so often that it deserves its own name. Corollary 3.3.11 (Bienaym´eformula (Ir´en´ee-Jules Bienaym´e(1796–1878), French probabilist 2 and statistician)). Let X1,...,Xn be (pairwise) uncorrelated random variables in L . Then n n Var Xj Var Xj . “ p q j 1 j 1 ´ ÿ“ ¯ ÿ“ Example 3.3.12. (a) Let X N µ ,σ2 and Y N µ ,σ2 be independent random vari- „ p 1 1q „ p 2 2q ables. Then X Y N µ µ ,σ2 σ2 . ` „ p 1 ` 2 1 ` 2q Now you may have a look at [Dre18, Example 3.3.12] to convince yourself that it took a little bit of not so nice calculus do prove this. Using Theorem 2.6.5 and Example 2.6.4 we are in the position to derive this result in a significantly neater way. Indeed, using the independence of X and Y in combination with Example 2.6.4 we compute

2 2 it X Y Thm. 3.3.6 itX itY iµ1t σ1t 2 iµ2t σ2t 2 ϕX Y t E e p ` q E e E e e ´p q { e ´p q { ` p q“ r s 2 “2 2 r s r s“ i µ1 µ2 t σ σ t 2 e p ` q ´pp 1 ` 2 q { . “ In combination with Theorem 2.6.5 and Example 2.6.4 it therefore follows that X Y ` „ N µ µ ,σ2 σ2 . p 1 ` 2 1 ` 2q (b) Let X Binn,p for some n N and p 0, 1 . In Claim 3.2.9 we had seen that X has the „ n P P r s same distribution as j 1 Yj, where the Yj are independent random variables distributed “ 2 according to Berp. Now Var Yj is easy to compute since E Yj p and E Y p. Thus, ř p q r s“ r j s“ Var Yj p 1 p . Now since Var X depends on X only through its distribution, we get p q“ p ´ q p q the first equality of n n Var X Var Yj Var Yj np 1 p , p q“ “ p q“ p ´ q j 1 j 1 ´ ÿ“ ¯ ÿ“ where in the second equality we used Corollary 3.3.11. The following lemma is interesting in its own right, but a generalization of it will play an important role when we introduce the concept of conditional expectations (which heurstically will amount to averaging over partial information of F only) in Section ?? below. It can be interpreted in the sense that the best approximation to a random variable X by a constant c is its expectation c E X (if distance is measured in terms of the second moment of X c). “ r s ´ 3.4. LP SPACES AND SOME FUNDAMENTAL INEQUALITIES 77

Lemma 3.3.13. Let X L2 be a random variable. Then the function P R s E X s 2 Q ÞÑ rp ´ q s is minimized at s E X . In particular, we have E X s 2 Var X for all s R. “ r s rp ´ q sě p q P Proof. We compute using the linearity of expectation that

E X s 2 E X2 2sE X s2 E X2 E X 2 E X s 2. rp ´ q s“ r s´ r s` “ p r s´ r s q ` p r s´ q From this it is obvious that the function attains its minimum for s E X , in which case it “ r s equals Var X . This finishes the proof. p q 3.4 Lp spaces and some fundamental inequalities

Definition 3.4.1. Let f M. We define its essential supremum as P ess sup f : inf M R : µ f M 0 , “ t P p ě q“ u with the standard convention inf . H“8 Similarly, its essential infimum is defined as

ess inf f : sup m R : µ f m 0 , “ t P p ď q“ u with the standard convention sup . H“´8 Exercise 3.4.2. Show that the essential supremum could be equivalently defined as

ess sup f : inf M R : µ f M 0 , “ t P p ą q“ u and similarly that ess inf f sup m R : µ f m 0 , “ t P p ă q“ u Definition 3.4.3. Let p 0, . For f M we define P p 8q P 1 p p f p : f dµ 0, . } } “ Ω | | P r 8s ´ ż ¯ In addition, set f : ess sup f 0, . } }8 “ | | P r 8s For p 0, we then set P p 8s p p L : L Ω, F,µ : f M : f p , “ p q “ P } } ă8 which is consistent with the notation from Definition 2.0.18. Motivated( by this definition one also uses the notation f Lp Ω,F,µ : f p. } } p q “} } p By p : L f f p we denote the mapping that maps functions to their respective norms. }¨} Q ÞÑ } } Proposition 3.4.4. For p 1, the mapping p introduced in Definition 3.4.3 is a semi- P r 8s }¨} norm on Lp. For the proof of the triangle inequality we need another result which is important on its own. Theorem 3.4.5 (Minkowski’s inequality). Let f, g M such that f g is well-defined (in the P ` sense that ’ ’ does not occur). Then for any p 1, , 8´8 P r 8s f g p f p g p. (3.4.1) } ` } ď} } `} } 78 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

The proof of Theorem 3.4.5 will take advantage of yet another inequality that we have not studied so far.

p Proof of Proposition 3.4.4. p maps from L to 0, , so we only have to show that it is }¨} r 8q absolutely homogeneous and fulfills the triangle inequality. Absolute homogeneity ( cf p p } } “ c f p for all c R and f L ) follows from the linearity of the integral. The validity of the | |} } P P triangle inequality is a consequence of Minkowski’s inequality.

While for p 1, the above result in combination with the fact that Lp is a vector space P r 8s provides us with the fact that Lp is actually a semi-normed vector space, it is not hard to p p observe that p does not in general define a norm on L . Indeed, for any f L we can choose p }¨} P some g L such that f g and µ f g 0 and get f g p 0. P ‰ p ‰ q“ } ´ } “ An elegant way out of this quandary is to consider an appropriate quotient space. To be precise,

let N denote the set of all f Lp such that µ f 0 0. (3.4.2) P p ‰ q“ Applying Lemma 2.2.3 we can deduce that

p N f L : f p 0 . (3.4.3) “ P } } “ It is not hard to show that N forms a subspace of Lp, and thus( we can define the quotient space

Lp : Lp N f : f N : f Lp . “ { “ t “ ` P u p p p Hence, elements of L are equivalence classesr of functions in L , and f, g L are in the same P equivalence class (usually written f g) if and only if f g N , i.e., according to (3.4.3), „ ´ P

f g p dµ 0. | ´ | “ ż Thus, in combination with Proposition 3.4.4

f p f g p g p g p, } } ď} ´ } `} } “} } and similarly we get g p f p, } } ď} } so f p g p. As a consequence, we obtain the following result. } } “} } p Corollary 3.4.6. For p 1, , the space L is a normed vector space with norm p. P r 8s }¨} Proof. Using Proposition 3.4.4 in combination with the fact that N as introduced in (3.4.2) is a subspace of Lp, we obtain that Lp is a semi-normed vector space. Since for f Lp we have p P f p 0 if and only if f N , we deduce that p is definite on L , and hence the latter } } “ P }¨} endowed with p is a normed vector space. }¨} In a slight abuse of nomenclature, one usually also refers to elements of Lp as functions, although, strictly speaking, they are equivalence classes of functions. One reason for this is that in probability theory (and also functional analysis, where Lp spaces play an important role) people are most often mainly interested in the almost sure behaviour (recall Section 2.2.1), i.e., in properties that do not change if the random variable is modified on a set of measure zero; in particular, this implies that for f Lp, any representative f Lp of the equivalence class of P P f would have the same (almost everywhere) properties. An example that you have gotten to know already is the distribution of a random variable (recalrl Definition 1.5.1), which did not change if we modified a random variable on a set of measure zero. p In fact, one even has that L endowed with p is complete. }¨} 3.4. LP SPACES AND SOME FUNDAMENTAL INEQUALITIES 79

p Theorem 3.4.7. For p 0, , the vector space L endowed with p is a Banach space. P r 8s }¨} Since this result is not central to this class, we refer to the proof of [Els05, Korollar 2.6] for a proof. We now give some further properties of the above spaces, most of which have been derived in [Dre18] already. Since for 0 p q we have x p 1 x q for all x R we immediately ă ď | | ď ` | | P obtain the inclusion Lq µ Lp µ , (3.4.4) p qĂ p q if µ is a finite measure.

Example 3.4.8. Consider the measurable function

1 1 f x 1, x , x R. p q“ r 8qp qx P Then Theorem 2.1.1 (MCT) implies that

f x p λ dx lim f x p λ dx . R | p q| p q“ n 1,n | p q| p q ż Ñ8 żr s Then we can use Theorem 2.0.20 to deduce

p 1 p 1 f x λ dx lim n´ ` 1 R | p q| p q“ n p 1 ´ ` ż Ñ8 ´ ´ ¯ Thus, we see that if p 1, then ą 1 f x p λ dx , | p q| p q“ p 1 ă8 żR ´ whereas for p 0, 1 , P p q f x p λ dx | p q| p q“8 żR (and the same applies for p 1). “ In particular, the right-hand side is infinite for p 0, 1 and finite for p 1, . Thus, f Lp P p s P p 8q P for p 1, but f Lp for p 0, 1 . P p 8q R P p s In order to prove the fundamental H¨older inequality below we will need the following auxiliary result.

Lemma 3.4.9 (Young’s inequality (English mathematician William Henry Young (1863–1942))). Let a, b 0, and p,q 1, such that P r 8q P p 8q 1 1 1. (3.4.5) p ` q “ Then ap bq ab . (3.4.6) ď p ` q Proof. See the proof of [Dre18, Lemma 1.9.14].

Theorem 3.4.10 (H¨older inequality (German mathematician Otto Ludwig H¨older (1859–1937))). Let p,q 1 such that 1 1 1. Then, for f, g M Ω, F,µ one has ą p ` q “ P p q

1 1 f g dµ f p dµ p g q dµ q . (3.4.7) Ω | ¨ | ď Ω | | Ω | | ż ´ ż ¯ ´ ż ¯ 80 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

Proof. This is the proof of [Dre18, Thm. 1.9.16], so we omit it here.

Remark 3.4.11. (a) In particular, if f Lp and g Lq, then fg L1. P P P (b) The special case of p q 1 2 gives a special case of the Cauchy-Schwarz (Augustin- “ “ { Louis Cauchy (1789–1857), Hermann Schwarz (1843–1921) inequality you might know from linear algebra (or might get to know in functional analysis) for inner products. (c) H¨older’s inequality not only holds for expectations (which will be interpreted as integration against probability measures in ‘Probability Theory I’) but also for more general integrals in. We now have all the tools to prove Theorem 3.4.5, which will be part of the last homework sheet.

3.5 Convergence of random variables

Since this section introduces some core notions of probability theory, and in less generality this has been treated in the corresponding part [Dre18, Section 1.11], which can be found here. As in analysis, asymptotic investigation play a fundamental role in probability theory, in par- ticular when it comes to the fundamental limit theorems that we will be investigating below. As a first step to build a theoretical base for this we will introduce the fundamental types of convergence that we will encounter in probability theory and give their dependencies. In what follows, if not mentioned otherwise S, d is a separable metric space. p q 3.5.1 Almost sure convergence This is one of the strongest types of convergence that we will consider, and we will introduce it for random variables taking values in separable metric spaces. We will need an auxiliary result before giving the precise definition. Lemma 3.5.1. Let S, d be a metric space. If X,Y : Ω, F, P S, d are two random p q p q Ñ p q variables, then the mapping Ω ω d X ω ,Y ω defines an F B R -measurable real Q ÞÑ p p q p qq ´ p q valued random variable. Proof. We first of all note that the mapping ϕ :Ω ω X ω ,Y ω S S is F B S 1 Q ÞÑ p p q p qq P ˆ ´ p p qb B S -measurable by definition of the product-σ-algebra (and the assumption that the X and p qq Y are random variables). Furthermore, since d is a metric, the function ϕ : S S x,y d x,y 0, (3.5.1) 2 ˆ Q p q ÞÑ p q P r 8q is continuous. Therefore, as a consequence of Theorem 1.4.10, ϕ is B S S B R -measurable. 2 p ˆ q´ p q Since S, d is separable, Theorem 2.3.9 implies that B S S B S B S and hence ϕ is p q p ˆ q“ p qb p q 2 B S S B R -measurable also. Therefore, due Theorem 1.4.4, it follows that the composition p ˆ q´ p q ϕ ϕ , which equals d X,Y , is measurable and hence a random variable. 2 ˝ 1 p q Definition 3.5.2. Let Xn be a sequence of random variables defined on Ω, F, P and mapping p q p q into a metric space S, d , and let X be another such random variable. We say that Xn converges p q (P-)almost surely (or a.s.) (‘fast sicher’ (or else ‘f.s.’) to X, and we write a.s. Xn X as n , ÝÑ Ñ8 or lim Xn X P-a.s., n Ñ8 “ if

P lim d Xn, X 0 P ω Ω : lim Xn ω X ω 1. (3.5.2) n n Ñ8 p q“ “ P Ñ8 p q“ p q “ ` ˘ ´! )¯ 3.5. CONVERGENCE OF RANDOM VARIABLES 81

Remark 3.5.3. (a) Note that from Lemma 3.5.1 in combination with Proposition 1.4.15 we infer that the probabilities in (3.5.2) are well-defined.

(b) In particular, note that if Xn converges to X pointwise, then we have almost sure con- vergence as well. The reason that pointwise convergence is not so important to us is that modifications that only effect null sets cannot be noticed from a point of view of the probability measure.

(c) Property (3.5.2) can be rephrased as

P lim sup d Xn, X 0 0. n p qą “ Ñ8 ` ˘ In the setting of a general measure space Ω, F,µ , where µ does not necessarily have mass p q 1, if for functions fn and f one has µ lim supn d fn,f 0 0, or equivalently p q c Ñ8 p q ą “ µ limn d fn,f 0 0, then fn is said to ‘converge µ-almost everywhere (or t Ñ8 p q “ u “ p `q ˘ µ-a.e.) to f’. ` ˘ 3.5.2 Convergence in Lp This is yet another fairly strong type of convergence which in a slightly more general form plays an important role in (functional) analysis, too. Here, we will focus on the case of real-valued random variables.

Definition 3.5.4. Let p 0, let Xn be a sequence of (equivalence classes of) random variables ą p q in Lp Ω, F, P , and let X Lp Ω, F, P as well. Then we say that Xn converges to X in p q P p q Lp Ω, F, P , and write p q Lp Xn X ÝÑ if Xn X p 0 as n . } ´ } Ñ Ñ8 As long as we do not impose any further assumptions (which we don’t do for the time being), none of the above two types of convergence is actually stronger than the other. Example 3.5.5. Let P denote the uniform distribution on 0, 1 r q (a) Consider for n 1 and k 0, 1,..., 2n 1 the random variables ě P t ´ u 1 Xn,k : k2´n, k 1 2´n “ r p ` q q

and define Y : X , , Y : X , , Y : X , , Y : X , , ... (this is the ‘lexicographic 1 “ 1 0 2 “ 1 1 3 “ 2 0 4 “ 2 1 ordering’). Then lim supn Yn 1 and lim infn Yn 0, and in particular Yn does not Ñ8 “ Ñ8 “ converge almost surely. On the other hand, for p 0, any n N, and k 0,..., 2n 1 ą P P t ´ u we have p n n E Xn,k 0 P 0, 2´ 2´ , r| ´ | s“ pr qq “ Lp and the right-hand side converges to 0 as n . Therefore, Yn 0 as n . Ñ8 ÝÑ Ñ8 This example shows that convergence in Lp does not imply almost sure convergence.

1 p 1 (b) Fix p 0 and consider the random variables Xn : n 0,1 n . Then for any ω 0, 1 ą “ r { s P p q fixed we have 1 p 1 Xn ω n 0,1 n ω , p q“ r { sp q and the right-hand side converges to 0 as n . Therefore, Ñ8

lim Xn 0 0, 1 , n Ñ8 “ “ p q ( 82 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

and since P 0, 1 1 this implies that limn Xn 0 almost surely. pp qq “ Ñ8 “ On the other hand, a moment’s thought reveals that since Xn X holds P-a.s. as n , Ñ Ñ8 the only possible limit in Lp would be an almost surely constant random variable X 0. “ Now for all n N one has P p E Xn 0 1, r| ´ | s“ p and therefore Xn does not converge to 0 in L . This example shows that almost sure convergence does not imply convergence in Lp.

3.5.3 Convergence in probability

Definition 3.5.6. Let Xn be a sequence of random variables defined on Ω, F, P mapping p q p q into a separable metric space S, d , and let X be another such random variable. We say that p q Xn converges in probability (‘konvergiert in Wahrscheinlichkeit’ oder ‘konvergiert stochastisch’) to X if for all ε 0, ą P d Xn, X ε 0 as n . (3.5.3) p p qě qÑ Ñ8 In this case we write P Xn X as n . ÝÑ Ñ8 Again, as a consequence of Lemma 3.5.1, the probability appearing in (3.5.3) is well-defined.

3.5.4 Convergence in distribution In a slight abuse of notation, we will say that µ is a measure on a metric space S, d if, in fact, p q it is a measure on the measurable space S, B S , where as before the Borel-σ-algebra on S is p p qq defined as the σ-algebra generated by the open sets of S (which again are induced by the metric d).

Definition 3.5.7. Let µn be a sequence of finite measures on a separable metric space S, d p q p q and let µ be yet another finite measure on S, d . We say that µn converges in weakly (‘kon- p q p q vergiert schwach’) to µ if for all continuous bounded functions f Cb S from S to R we P p q have

f dµn f dµ as n . Ñ Ñ8 żS żS In this case we write w µn µ as n , ÝÑ Ñ8 where w stands for ‘weakly’. In addition, given S, d -valued random variables Xn and X defined on possibly different proba- p q bility space Ωn, Fn, Pn and Ω, F, P , we say that Xn converges to X in distribution as n , p q p q Ñ8 if 1 w 1 Pn X´ P X´ as n . ˝ n ÝÑ ˝ Ñ8 In this case we write L Xn X as n , ÝÑ Ñ8 or also D Xn X as n . ÝÑ Ñ8 Here, L and D stand for ‘law’ and ‘distribution’, respectively. Yet another very common notation is

Xn X as n . ùñ Ñ8 3.5. CONVERGENCE OF RANDOM VARIABLES 83

Theorem 3.5.8. If S, d is a metric space and µ, ν are two finite measures on S, d with p q p q

f dµ f dν f Cb S with f 0, “ @ P p q ě ż ż then µ ν. “ Proof. According to Theorem 1.2.17 it is sufficient to show that µ and ν coincide on the π-system of open sets (which is generating B S ). p q c For this purpose, let U S be open and for x S define d x, U : infy U c d x,y . Then for Ă c P p q “ P p q any n N, the function fn x : 1 nd x, U is in Cb S and we have fn 1U . Therefore, by P p q “ ^ p q p q Ò assumption and the MCT we infer that

µ U lim fn dµ lim fn dν ν U . p q“ n “ n “ p q Ñ8 ż Ñ8 ż

The following result gives a powerful characterization of weak convergence. As we will see in the proof of Corollary 3.5.10 already, it will turn out very useful to have different characterizations of weak convergence available.

Theorem 3.5.9 (Portmanteau theorem). For a sequence µn of probability measures on the p q metric space S, d and µ another probability measure, the following conditions are equivalent: p q (a) µn µ weakly; Ñ (b)

lim f dµn f dµ n “ Ñ8 ż ż for all f Cb S which are uniformly continuous; P p q (c) lim sup µn F µ F n p qď p q Ñ8 for all F S closed; Ă (d) lim inf µn O µ O n Ñ8 p qě p q for all O S open; Ă (e) lim µn A µ A n Ñ8 p q“ p q for all A B S with µ A 0 (such a set A is also called a µ-continuity set). P p q pB q“ Proof. ‘ a b ’: This is immediate from the definition. p q ùñ p q ‘ b c ’: Similarly to the proof of Theorem 3.5.8, setting p q ùñ p q

fm x : 1 md x, F `, p q “ p ´ p qq we get, since each fm is bounded and uniformly continuous, and since 1F fm 1F , that ď ď m

1 lim sup µn F lim sup fm dµn fm dµ µ F m , n p qď n “ ď p q Ñ8 Ñ8 ż ż 84 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY where for ε 0 we define F ε : x S : d x, F ε . ą “ t P p qă u If F is closed, then by the continuity of µ from above we obtain taking m that (c) holds. Ñ8 ‘ c d ’: (d) follows from (c) by taking complements. p q ùñ p q ‘ c & d e ’: We obtain p q p q ùñ p q c d o o µ A p q lim sup µn A lim sup µn A lim inf µn A lim inf µn A p q µ A . n n p q ě p qě p qě Ñ8 p qě Ñ8 p q ě p q For A a µ-continuity set the left-hand side and the right-hand side of the previous display coincide, which yields (e). ‘ e a ’: we choose f Cb S arbitrary, but by linearity of the integral, we assume without p q ùñ p q P p q loss of generality that f S 0, 1 . Then using Example 2.1.8 we obtain that p q Ă r s 1 f dµ 8 µ f t dt µ f t dt. “ p ą q “ p ą q żS ż0 ż0 Now since f is continuous, we deduce that f t f t . But we know that we can have Bt ą u Ă t “ u µ f t 0 for at most countably many t 0, 1 , so (e) implies that for λ-almost all t 0, 1 , p “ qą P r s P r s we have that µn f t µ f t as n , which in combination with the DCT implies p ą qÑ p ą q Ñ8 1 1 f dµn µn f t dt µ f t dt f dµ, “ p ą q Ñ p ą q “ ż ż0 ż0 ż which implies (a).

Oftentimes we will be dealing with real random variables, and the following equivalent criterion for convergence in distribution of real random variables will come handy (which we had proven separately in the introductory lecture).

Corollary 3.5.10. Let Xn be a sequence of real random variables and let X also be a real p q random variable. Denote the corresponding distribution functions by Fn and F, respectively. Then the following are equivalent:

(a)

Xn X; ùñ (b) For all points t of continuity of F , one has

Fn t F t as n . (3.5.4) p qÑ p q Ñ8

Proof. We only prove ‘ a b here. The key point is to observe that t is a point of continuity p q ùñ p q of F if and only if ,t is a P X 1-continuity set. Indeed, F is right-continuous due to p´8 s ˝ ´ Theorem 1.5.2, so we have that F is continuous at t if and only if

0 F t lim F t h , “ p q´ h 0 p ´ q Ó P X´1 ,t due to Prop. 1.2.16 “ ˝ pp´8 qq loooooomoooooon and the right-hand side of the last display equals P X 1 t , i.e., P X 1 ,t , which ˝ ´ pt uq ˝ ´ pBp´8 sq establishes the claim. The last equivalence of the Portmanteau theorem now immediately supplies us with (3.5.4). 3.5. CONVERGENCE OF RANDOM VARIABLES 85

3.5.5 Some fundamental tools Markov’s and Chebyshev’s inequalities We will introduce some fundamental inequalities. These play a central role in probability and are some of the standard tools one has to feel comfortable to apply.

Proposition 3.5.11 (Markov’s inequality (Andrey Andreyevich Markov (1856–1922))). Let X be a real random variable and let ε 0. Then, for any increasing function ϕ : 0, 0, ą r 8q Ñr 8q with ϕ ε 0 one has p qą E ϕ X P X ε r p| |qs . (3.5.5) p| |ě qď ϕ ε p q The proof is contained in [Dre18], but since it is short we reproduce it here.

Proof. Since ϕ is monotone increasing we have the inequality

1 ϕ X X εϕ ε , p| |q ě | |ě p q and taking expectations on both sides supplies us with

E ϕ X P X ε ϕ ε , r p| |qs ě p| |ě q p q which implies (3.5.5).

Corollary 3.5.12 (Chebyshev’s inequality (Pafnuty Chebyshev (1821–1894))). Let X be in L1 Ω, F, P . Then p q E X E X 2 Var X P X E X ε rp ´ r sq s p q. (3.5.6) p| ´ r s| ě qď ε2 “ ε2 Proof. This follows from Proposition 3.5.11 by choosing the random variable in (3.5.5) as X ´ E X and ϕ x : x2. r s p q “ Remark 3.5.13. Inequalities of the type (3.5.6) which bound the probability that X deviates from a certain quantity, such as its expectation, are also referred to as ‘concentration inequali- ties’.

Theorem 3.5.14 (Jensen’s inequality (Danish mathematician Johan Jensen (1859 – 1925))). Let X be a real random variable in L1 and let ϕ : R R be a convex function (if X is a non- Ñ negative random variable, then it is sufficient for ϕ to be a convex function defined on 0, ). r 8q Then ϕ E X E ϕ X , . (3.5.7) p r sq ď r p qs P p´8 8s The proof is that of [Dre18, Thm. 1.12.9].

Remark 3.5.15. (a) Using Theorem 1.4.4 in combination with the fact that convex functions from R (or 0, ) to R are R, B R R, B R measurable (either exercise, or: for r 8q p p qq ´ p p qq affine functions this is clear, and otherwise it follows from the proof of Theorem ?? below) we deduce that ϕ X is a random variable again, and hence at least we do have the ˝ measurability assumptions to speak of the expectation of ϕ X. ˝ (b) If ϕ is a concave function on R, then ϕ is a convex function, hence Theorem 3.5.14 ´ yields r ϕ E X r E ϕ X p r sq ě r p qs for X L1. P r r 86 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

(c) This immediately supplies us with another proof for the inclusion Lq Lp for q,p 0, Ă P p p8q with q p which we had derived in (3.4.4). Indeed, since the function ϕ x : x q is ą p q “ concave on 0, and since X is non-negative, we get for X Lq that r 8q | | P ϕ E X q E ϕ X q E X p . r 8ą p r| | sq ě r p| | qs “ r| | s Thus, E X p which implies X Lp. r| | să8 r P r Example 3.5.16. (a) The absolute value function ϕ x : x yields p q “ | | E X E X . | r s| ď r| |s (b) Choosing the convex function ϕ x : x2, Jensen’s inequality supplies us with p q “ E X 2 E X2 . r| |s ď r s 3.5.6 Interdependence of types of convergence of random variables Having introduced all the above types of convergence, it is natural to try to order them in terms of strength. As we have seen in Example 3.5.5, there is no general implications between convergence in Lp Ω, F, P and P-almost sure convergence. However, for the remaining ones we p q do have the following hierarchy.

Theorem 3.5.17. Let Xn, X be real random variables on Ω, F, P and let p 0. p q ą p p L (a) If either limn Xn X almost surely, or if X, Xn L and Xn X, then Ñ8 “ P ÝÑ P Xn X. ÝÑ P (b) If Xn X, then ÝÑ Xn X. (3.5.8) ùñ q p q L L (c) If 0 p q and if Xn and X are in L such that Xn X, then Xn X as ă ă ă 8 p q ÝÑ ÝÑ well. (d) If 8 for all ε 0 one has P Xn X ε , (3.5.9) ą n 1 p| ´ |ě qă8 “ 3 ÿ then limn Xn X P-a.s. Ñ8 “ P In particular, if Xn X, then there exists a subsequence Xn of Xn such that ÝÑ p k q p q Xn X P a.s. k ÝÑ ´ The proof is exactly that of [Dre18, Thm. 1.13.1], so we omit it here. Remark 3.5.18. (a) Show that the converses of the convergence implications given in The- orem 3.5.17 (a) to (c) do not hold true in general. (b) Also note that a substantial part of the above implications might break down if instead of P we consider an infinite measure on Ω, F . p q Theorem 3.5.19 (Egorov’s theorem). Let Xn, X M Ω, F, P be real random variables such P p q that P-a.s., Xn X. Ñ Then for every ε 0 there exists A F such that P A ε and such that Xn converges to X ą P p q ă uniformly on Ac. Exercise 3.5.20. Let Ω, F, P be a discrete probability space. Show that in this setting, if P p q Xn X already implies that limn Xn X holds P-almost surely. ÝÑ Ñ8 “ 3 If (3.5.9) holds true one says that Xn converges fast or almost completely to X. 3.6. LAWS OF LARGE NUMBERS 87

3.6 Laws of large numbers

One central topic in probability theory is the asymptotic analysis of random systems and one of the simplest and more or less realistic situations to imagine is arguably a very long (or, possibly slightly less realistic, an infinite) sequence of independent coin tosses or dice rolls. For the sake of simplicity let’s have a look at the situation of independent fair coin tosses, and define for n N a random variable Xn on Ω, F, P that takes the value 1 if the coin of the n-th toss shows P p q 4 heads, whereas it takes the value 1 if the coin shows tails. Now we know that E Xn 0, ´ r s “ and also for the sum n Sn : Xj (3.6.1) “ j 1 ÿ“ we have E Sn 0 by the linearity of expectation. r s“ Definition 3.6.1. The sequence Sn as defined in (3.6.1) is also called simple random walk (SRW) (‘einfache Irrfahrt’). For x Z we will sometimes write Px Sn : P Sn x to denote the law of simple P p P ¨q “ p ` P ¨q random walk started in x. If you have attended the introductory class, it might be worthwhile to notice that simple random walk is a very basic example of a Markov chain. Oftentimes, instead of investigating the expectation, one is interested e.g. in realizationwise statements, or statements concerning probabilities of certain events. In our current setting for example, one might want to ask what values Sn ω ‘typically’ takes. Now, although E Sn 0 p q r s“ for all n N, it is obvious that Sn ω 0 can only hold true if n is even. In fact, even when P p q “ n is even, 0 is not the typical value for Sn to take, in the sense that it is realised with a high probability or at least with a probability that is bounded away from 0 for n . Indeed, for Ñ 8 n 2k even we get with Stirling’s formula that “ n 2k 2k 1 2k e ?2π 2k 2k 1 P Sn 0 p { q ¨ 2´ , (3.6.2) p “ q“ k 2 „ k e k?2πk 2 “ ?kπ ˆ ˙´ ¯ p { q where for sequences an and bn of positive real` numbers we˘ write an bn if limn an bn 1. p q p q „ Ñ8 { “ Exercise 3.6.2. Using an explicit computation as in (3.6.2), show that although P Sn 0 0 p “ qÑ due to (3.6.2), for n 2k the function Z m P Sn m is maximised for m 0. “ Q ÞÑ p “ q “ 1 Thus (3.6.2) tells us that P Sn 0 goes to zero at the order of n´ 2 . One might therefore p “ q be tempted to guess that if instead of just considering 0, we were replacing it by intervals of the type c?n, c?n , then we would obtain a non-trivial limiting probability for Sn to take r´ s values in such intervals. This is indeed the case (and not only if the Xn describe coin tosses, but for far more general distributions of X) as will be established in the central limit theorem (see Theorem 3.8.1 below). For the time being, however, we start with having a look at a simpler result at cruder scales.

3.6.1 Weak law of large numbers We will start with investigating the so-called empirical mean.

d Definition 3.6.3. Given a realization X ω ,...,Xn ω of R -valued random variables, its 1p q p q empirical mean is defined as 1 1 n S ω X ω . (3.6.3) n n n j p q“ j 1 p q “ 4 ÿ The corresponding distribution P ˝ Xn is also called Rademacher distribution, named after the German- American mathematician Hans Rademacher. 88 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

In order to be able to prove something meaningful about the empirical mean, we will take advantage of Chebyshev’s inequality introduced in Corollary 3.5.12 above. As suggested by (3.6.2) and the heuristics developed subsequently, we might guess that the empirical mean defined in (3.6.3) will converge to 0 under suitable assumptions on the sequence Xn . p q In order to be able to treat the d-dimensional case at once, we generalize our definition of expectation to random variable X : Ω, F, P Rd, B Rd we define its expectation as p q Ñ p p qq E π X r 1p qs . ¨ . ˛ (3.6.4) E πd X ˚ r p qs ‹ ˝ ‚ where we recall that the measurability of the coordinate functions πi X comes as a consequence p q of Proposition 1.4.11. It will then be left as an easy exercise to generalize the laws of large numbers below to Rd-valued random variables whose coordinate functions fulfil the assumptions of those results. 1 Definition 3.6.4. A sequence Xn of elements of L Ω, F, P satisfies a weak law of large p q p q numbers if n 1 P Xj E Xj 0 as n . (3.6.5) n ´ r s ÝÑ Ñ8 j 1 ´ ÿ“ ¯ Historically, a weak law of large numbers had first been rigorously derived by Jakob Bernoulli in [Ber13]. Nevertheless, the intuition for such a statement must have been around at that time already since in a correspondence Jakob Bernoulli writes to Gottfried Wilhelm Leibniz in October 1703 [vdWB75, pp. 509–513]: ‘Obwohl aber seltsamerweise durch einen sonderbaren Naturinstinkt auch jeder D¨ummste ohne irgend eine vorherige Unter- weisung weiss, dass je mehr Beobachtungen gemacht werden, umso weniger die Gefahr besteht, dass man das Ziel verfehlt, ist es doch ganz und gar nicht Sache einer Laienuntersuchung, dieses genau und geometrisch zu beweisen.’

Theorem 3.6.5 (Weak law of large numbers). Let Xn be a sequence of pairwise uncorrelated 2 p q random variables in L Ω, F, P and let αn be a sequence of real numbers such that p q p q n j 1 Var Xj “ p q 2 0. (3.6.6) ř αn Ñ Then for all ε 0, ą n n j 1 Xj E Xj j 1 Var Xj P “ p ´ r sq “ p q ε 2 2 0 as n . (3.6.7) αn ě ď αnε Ñ Ñ8 ´ˇř ˇ ¯ ř In particular, if theˇ sequence Xn is evenˇ i.i.d., then it satisfies a weak law of large numbers. ˇ p q ˇ The proof is a consequence of Chebychev’s inequality (Corollary 3.5.12) and Bienayme´e’s for- mular (Corollary 3.3.11). We omit the details and refer to [Dre18, Thm. 1.14.6] for a proof.

Example 3.6.6. Let a sequence Xn as in Definition 3.6.1 of simple random walk be given. p q Then the sequence Xn satisfies a weak law of large numbers. p q Indeed, by assumption the Xj are independent and hence in particular pairwise uncorrelated. p q In addition, we have 2 2 Var Xj E X E X 1 0 1. p q“ r s´ r s “ ´ “ 2 Thus, in particular Xj L , and the assumption of Theorem 3.6.5 are satisfied for any sequence P αn of positive reals with αn ?n as n , which supplies us with p q { Ñ8 Ñ8 n 1 P X 0 α j n j 1 ÝÑ ÿ“ 3.6. LAWS OF LARGE NUMBERS 89 and in particular n 1 P Xj 0. n ÝÑ j 1 ÿ“ In particular the sequence Xn satisfies a weak law of large numbers. p q It occurs quite frequently in probability theory that triangular arrays Xn,k , 1 k n, of p q ď ď random variables play an important role. In this setting we get the following generalization of Theorem 3.6.5.

Theorem 3.6.7. Let Xn,k , 1 k n, n N be a triangular array of random variables 2 p q ď ď P in L Ω, F, P such that for each n N, the random variables Xn, ,...,Xn,n are pairwise p q P 1 uncorrelated. Furthermore, let αn be a sequence of real numbers such that setting p q n Sn : Xn,j, “ j 1 ÿ“ we have that Var Sn p2 q 0. (3.6.8) αn Ñ Then Sn E Sn P ´ r s 0 as n . αn ÝÑ Ñ8 The proof is exactly the same as that of [Dre18, Theorem 1.14.9].

3.6.2 Strong law of large numbers 1 Definition 3.6.8. A sequence Xn of elements of L Ω, F, P satisfies the strong law of large p q p q numbers if 1 n P lim sup Xj E Xj 0 1, n n ´ r s “ “ ´ Ñ8 ˇ j 1 ˇ ¯ ˇ ÿ“ ` ˘ˇ which is the same as saying that ˇ ˇ

1 n lim Xj E Xj 0 P a.s. n n ´ r s “ ´ Ñ8 j 1 ÿ“ ` ˘ Theorem 3.6.9 (Strong law of large numbers). Let Xn be a sequence of independent iden- 4 p q tically distributed random variables in L Ω, F, P . Then Xn satisfies a strong law of large p q p q numbers.

Proof. Possibly replacing Xi by Xi E Xi we can assume without loss of generality that n ´ r s E Xi 0. Setting Sn : Xi, according to Theorem 3.5.17 (d) it is sufficient to show that r s“ “ i 1 for all ε 0 we have “ ą ř 8 1 P n´ Sn ε . (3.6.9) n 0 p| |ě qă8 ÿ“ For this purpose, we apply Markov’s inequality with the function ϕ x x4, which entails p q“ 4 4 1 E n´ Sn P n´ Sn ε r s. (3.6.10) p| |ě qď ε4 Now

4 E S E XiXjXkXl . r ns“ r s 1 i,j,k,l n ď ÿ ď 90 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

Using that the Xn are independent we deduce that E XiXjXkXl can be non-zero only if each p q r s of the indices i, j, k, l appears at least twice among i, j, k, l. We can therefore continue the above equality to get n n E S4 E X4 C E X2X2 nE X4 Cn2E X2 2, r nsď r i s` r i j sď r 1 s` r 1 s i 1 i,j 1 ÿ“ iÿ“j ‰ with C a finite constant. Plugging this into (3.6.10) we get

4 2 2 2 1 nE X1 Cn E X1 P n´ Sn ε r s` r s , p| |ě qď n4ε4 which is summable over n N since E X2 , E X4 . Therefore, (3.6.9) follows which finishes P r 1 s r 1 să8 the proof.

Remark 3.6.10. (a) The implications of Theorem 3.6.9 also hold if we replace the condition X L4 Ω, F, P by X L1 Ω, F, P . This has been proven P p q P p q by Etemadi [Ete81]; the proof is elementary and you should feel encouraged to read it (the article is available online through the university network at http: // link.springer.com/article/10. 1007%2FBF01013465 )

(b) As the name suggests, if Xn satisfies a strong law of large numbers it also satisfies a p q weak law of large numbers. This is a direct consequence of Theorem 3.5.17 (a) applied to 1 n the sequence n Xi of random variables and where the limiting random variable p ´ i 1 q in Theorem 3.5.17 (a)“ is given by the constant 0. ř 3.7 Convolution of measures

As outlined above, the scaling (i.e., division by n) in the law of large numbers does not look like the most accurate information one might be able to obtain on a sequence of i.i.d. variables under nice assumptions. In order to prepare for the Central Limit Theorem, we will therefore introduce some tools that will prove helpful in its derivation. Definition 3.7.1. Let µ,ν be two finite (possibly signed) measures on Rd, B Rd . Then their p p qq convolution is defined as

µ ν B : ν B x µ dx , B B Rd , (3.7.1) p ˚ qp q “ d p ´ q p q P p q żR where B x : y Rd : y x B . ´ “ t P ` P u Alternatively, if f, g L1 Rd, B Rd , λd , then their convolution is defined as the function P p p q q x y x f g y : f y x g x dx ÞÑ ´ g y x f x dx g f y . p ˚ qp q “ d p ´ q p q “ d p ´ q p q “ p ˚ qp q żR żR Note that due to

ν B x 1B x y ν dy , (3.7.2) p ´ q“ d p ` q p q żR the right-hand side of (3.7.1) is well-defined. Furthermore, plugging (3.7.2) into (3.7.1) and applying Tonelli’s theorem we also infer that µ ν ν µ (commutativity of convolution). ˚ “ ˚ Also, using Tonelli’s theorem it can be shown that f g is well-defined and in L1 once f, g L1 ˚ P (exercise). The following result is the main reason the convolution plays an important role in probability theory. 3.8. CENTRAL LIMIT THEOREM 91

Theorem 3.7.2. Let X,Y : Ω, F, P Rd, B Rd be independent random variables. Then p q Ñ p p qq PX Y PX PY . ` “ ˚ d d d 1 Proof. Writing σ : R R x,y x y R and noting that PX Y PX PY σ´ (c.f. ˆ Q p q ÞÑ ` P ` “ p b q ˝ (3.3.3)), we obtain for B B Rd that P p q Thm. 2.2.19 PX Y B 1B x y PX PY d x,y PY B x PX dx ` p q “ Rd Rd p ` q p b qp p qq “ Rd p ´ q p q ż ˆ ż PX PY B , “ p ˚ qp q where in the penultimate equality we took advantage of Tonelli’s theorem.

Lemma 3.7.3. Let X1,...,Xn be independent real random variables whose distributions have densities ϕ1,...,ϕn with respect to the Lebesgue measure λ. Then the distribution of the random n variable i 1 Xi is absolutely continuous with respect to λ with density “ ř ϕ ϕ . . . ϕn, 1 ˚ 2 ˚ ˚ which is well-defined due to the associativity of convolution. Proof. Exercise.

3.8 Central limit theorem

As the name suggests, the central limit theorem is one of the main result in probability theory. On the one hand, it gives us a somewhat more precise result of the fluctuations of the sum of well-behaved independent identically distributed random variables than the results we know from the laws of large numbers. On the other hand, it plays an important role in statistics since it justifies using the normal distribution in many models. To motivate the central limit theorem, let us get back to (3.6.2) where we had shown that for simple random walk Sn, 1 P S2k 0 . p “ q„ ?2kπ In fact, in this setting it is not hard to show that not only the probability of finding simple random walk in 0 at time 2k has a square root decay in k, but also the probabilities of finding simple random walk at a distance of order ?k at time 2k (we restrict ourselves to even times for simplicity), see [Dre18, Section 1.15] for further details. As a consequence, if we look for a rescaling of Sn by some scale function ϕ n such that Sn ϕ n p q { p q converges in distribution to a non-trivial limiting distribution, then the above suggests that ?n is the only possible order of ϕ n – and, as it will turn out below, the desired convergence does p q indeed take place. Yet another motivation for the central limit theorem can be derived from the laws of large numbers: From those we know that under suitable assumptions on a sequence of i.i.d. random variables we have 1 lim Sn E X1 0. n n Ñ8 ´ r s “ To obtain information on a finer scale´ than in the central¯ limit theorem we can now ask if β 1 there exists an exponent β 0, such that the sequence n Sn E X might hopefully P p 8q p n ´ r 1sq converge to a non-trivial limiting random variable instead of 0. The first motivational thread via the investigation of simple random walk then suggests that β 1 2. Indeed, this always has “ { to be the case as long as the Xn are assumed to have finite variance since due to Bienaym´e’s formula we have β 1 2β 1 2β 1 Var n n´ Sn E X n n n ´ , p ´ r 1sq “ n2 “ ´ ¯ 92 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

which can only converge to a non-trivial limit if β 1 . “ 2 While the central limit theorem will not give us any information on probabilities of finding e.g. simple random walk at single points, it does indeed imply that the right scale for rescaling is ?n; and not only does it do so for simple random walk, but for a very general class of distributions.

Theorem 3.8.1 (Central limit theorem). Let a sequence Xn of independent identically dis- d pd q d tributed random variables with Xn : Ω, F, P R , B R such that E X µ R and p q Ñ p p qq r 1s “ P E X µ 2 0, be given. Then the sequence of random variables defined via r| 1 ´ | s P p 8q n i 1 Xi µ Yn : “ p ´ q, n N, (3.8.1) “ ?n P ř converges in distribution to a N 0, Σ distributed random variable where Σi,j : p q “ Cov πi X ,πj X , 1 i, j d, is the (positive semi-definite) covariance matrix. p p 1q p 1qq ď ď

Remark 3.8.2. (a) The Yn are shifted in such a way that E Yn 0 and r s “ Cov πi Yn ,πj Yn Σi,j (the latter being a consequence of Bienaym´e’s formula, see p p q p qq “ Cor. 3.3.11), so expectation and covariance structure already coincide with those of a N 0, Σ -distributed variable. p q

(b) It is surprising that, as long as the Xn have finite second moments the limiting distribu- tion is the normal distribution, independent of the specific distribution of the Xis. This phenomenon is also called universality (of the normal distribution). The fact that the normal distribution appears in this context is due to the fact that if the Xn are i.i.d. N 0, Σ distributed, then p q 1 n X N 0, Σ , (3.8.2) ?n i i 1 „ p q ÿ“ i.e., the Yn as defined in (3.8.1) are again N 0, Σ distributed for all n N. p q P (c) There is a plethora of other, more general conditions which imply the validity (3.8.1). In particular, similarly to the case of the weak law of large numbers Theorem 3.6.7, there is a version of the central limit theorem for triangular arrays as well.

(d) The finiteness of the second moment is in fact essential in Theorem 3.8.1. If it is not assumed, however, then one can still obtain other types of convergence results to non-trivial distributions (so-called α-stable distributions) for different rescalings than the division by ?n in (3.8.1).

(e) One can ask whether the sequence Yn might even converge in probability to some random p q variable Z. In fact, in this case we would have that Y n Yn would converge to 0 in 2 ´ probability due to

P Y n Yn ε P Y n Z ε 2 P Yn Z ε 2 0, as n , p| 2 ´ |ě qď p| 2 ´ |ě { q` p| ´ |ě { qÑ Ñ8 and using Theorem 3.5.17 we would deduce that

Y n Yn 0 as n . (3.8.3) 2 ´ ùñ Ñ8

However, assuming d 1, µ 0 and σ2 1 for simplicity of notation, we rewrite “ “ “ 2n n 1 i n 1 Xi 1 i 1 Xi Y n Yn “ ` 1 “ , 2 ´ “ ? ?n ´ p ´ ? q ?n 2 ř 2 ř 3.8. CENTRAL LIMIT THEOREM 93

2n X n i“n`1 i i“1 Xi and observe that due to the CLT, both ?n and ?n converge in distribution to ř a N 0, 1 -variable, both of which are independent. Usingř Part (a) of Example 3.3.12 we p q 1 2 1 2 can therefore infer that Y n Yn must converge to a N 0, 1 -distributed 2 ´ p p ?2 q ` p ´ ?2 q q random variable. In particular, this contradicts (3.8.3), hence Y n Yn cannot converge 2 ´ to 0 in probability, so there cannot exist a random variable Z as postulated above.

There are at least two essentially different strategies to prove the central limit theorem. The first one works well in the case d 1 and is a more or less self-contained and direct proof “ along the lines of the proof of [Geo09, Theorem 5.28]. The second one uses the technique of characteristic functions. It has the disadvantage that it is less self-contained; it is, however, more robust under variations of the very setting given in Theorem 3.8.1 and can be extended without too much effort to more general situations, such as higher dimensions or dependencies between the random variables Xn. We will follow the second approach and need a couple of general and auxiliary result which will also prove to be beneficial later on and in the lecture Probability II when establishing so-called ‘functional Central limit theorems’. There are a couple of important properties of characteristic functions which are not hard to prove.

Lemma 3.8.3. Let X and Y be random variables mapping to Rd, B Rd , and let a R as p p qq P well as b, t Rd be arbitrary. Then P (a) ϕX 0 1; p q“ (b) ib t ϕaX b t e ¨ ϕX at ; ` p q“ p q (c) d ϕX t 1 t R ; (3.8.4) | p q| ď @ P (d) If X and Y are independent, then

ϕX Y t ϕX t ϕY t ; ` p q“ p q p q

d (e) The function R t ϕX t is uniformly continuous. Q ÞÑ p q Proof. (a) Obvious, since e0 1. “ (b) We have it aX b ib t ϕaX b t E e ¨p ` q e ¨ ϕX at . ` p q“ r s“ p q it X (c) If e ¨ was a real random variable, the statement would follow immediately with Jensens inequality applied to the convex function R x x . Since eitX is not real, however, Q ÞÑ | | we can approximate it by simple C, B C -valued random variables Xn (as we did with p p qq it X real-valued functions) such that Xn 1 and Xn e . For simple random variables, | | ď Ñ ¨ (3.8.4) is a simple consequence of the fact that the unit ball around 0 in C is a convex set. Taking the limit and using Fubini’s theorem then implies the result.

(d) If X and Y are independent random variables, then so are eit X and eit Y , for each t Rd. ¨ ¨ P Therefore, it X Y it X it Y ϕX Y t E e ¨p ` q E e ¨ E e ¨ ϕX t ϕY t . ` p q“ r s“ r s r s“ p q p q (e) Exercise. 94 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

We recall the following result that we had mentioned before already, and we will actually prove it here. Theorem (Theorem 2.6.5). Any finite measure on Rd, B Rd is uniquely characterized by its p p qq characteristic function. For µ a finite measure as above, denote for σ 0 by ą σ 2 µp q : µ N 0,σ Id , (3.8.5) “ ˚ p q i.e., the convolution of µ with a d-dimensional Normal distribution with mean 0 and covariance 2 d d matrix σ Id, where Id is the identity matrix in R ˆ (recall Example 1.5.8). The following is a common paradigm in Fourier analysis: If we put in an arbitrarily ’rough’ measure µ, then convoluting it with something ’smooth’ (in our case a measure which is abso- lutely continuous with respect to λ) supplies us with something smooth as well. Here and in the lemma below, the ’smooth’ measure N 0,σ2Id plays the role of a ’mollifier’, just in case p q you’ve seen this concept in functional analysis.

d d σ Lemma 3.8.4. Let µ be a finite measure on R , B R . Then the convolution µp q has a σ d p p qq density f p q with respect to λ which is given by

2 σ 1 σ t t f p q x d ϕµ t exp ix t p ¨ q dt, (3.8.6) p q“ 2π Rd p q ´ ¨ ´ 2 p q ż ! ) where we recall that ϕµ denotes the characteristic function of µ. Proof. We write σ 1 x¨x d 2σ2 hp q x : d e´ , x R , (3.8.7) p q “ 2πσ2 2 P p q for the density of an Rd-valued N 0,σ2Id -distributed random variable Z. p q We start with observing that

σ 1 y¨y d 1 2σ2 µp q B PZ B x µ dx d B x y e´ λ dy µ dx p q“ Rd p ´ q p q“ 2πσ2 2 Rd Rd p ` q p q p q ż p q ż ż y y x σ d σ d ÞÑ ´ 1B y hp q y x λ dy µ dx hp q y x µ dy λ dy , “ Rd Rd p q p ´ q p q p q“ B Rd p ´ q p q p q ż ż ż ´ ż ¯ where the penultimate equality follows from the change of variable formula Theorem 2.2.19 and σ the last equality is due to Tonelli’s theorem. Thus, µp q has density

σ σ f p q x hp q y x µ dy (3.8.8) p q“ d p ´ q p q żR with respect to λd. Then, generalizing Example 2.6.4 to the d-dimensional case, we obtain that

σ2t¨t ϕN 0,σ2Id t e´ 2 . p qp q“ and hence the identity

2 1 t x σ x¨x def t¨t d σ i 2 2σ2 2 2 d e ¨ e´ ϕN 1 t e´ 2πσ hp q t . 0, 2 Id 2π σ2 2 Rd “ p σ qp q“ “ p q p q p { q ż Plugging this into the right-hand side of (3.8.8), we deduce that

2 σ σ 1 σ t t f p q x hp q y x µ dy d exp i y x t ¨ dtµ dy . p q“ Rd p ´ q p q“ Rd 2π Rd p ´ q ¨ ´ 2 p q ż ż p q ż ! ) 3.8. CENTRAL LIMIT THEOREM 95

Using Fubini’s theorem the latter equals

2 1 iy t σ t t d e ¨ µ dy exp ix t ¨ dt 2π Rd Rd p q ´ ¨ ´ 2 p q ż ´ ż ¯ ! ) 1 σ2t t d ϕµ t exp ix t ¨ dt, “ 2π Rd p q ´ ¨ ´ 2 p q ż ! ) which establishes (3.8.6) and hence finishes the proof.

σ Lemma 3.8.5. For µ and µp q as above we have that

σ w µp q µ, as σ 0. (3.8.9) ÝÑ Ó (So far we’ve only been concerned with convergence of sequences, not families of measures. We can either retreat to considering µ 1 n instead of µ σ and then take n , or otherwise the p { q p q Ñ 8 real meaning of the convergence in (3.8.9) is that the stated convergence takes place along any subsequence σn n with σn 0 and limn σn 0.) p q ą Ñ8 “ Proof. µ and µ σ have the same mass µ Rd 0, , so w.l.o.g. we can assume that it equals p q p q P p 8q one. Then choose independent Rd-valued random variables X and Y on the same probability space such that P X 1 µ and P Y 1 N 0, Id . Then X σY has law µ σ , and P-a.s. ˝ ´ “ ˝ ´ “ p q ` p q X σY X. ` Ñ σ As a consequence, Theorem 3.5.17 implies that µp q converges weakly to µ, which finishes the proof.

The following result is not explicitly needed for proving Theorem 2.6.5, but we nevertheless give it here since it is important and interesting on its own.

Corollary 3.8.6 (Fourier inversion formula). Let µ f λd for some probability density f d 1 d d d “ ¨ d d defined on R . Then, if ϕµ L R , B R , λ , we have for λ -almost all x R that P p p q q P

1 ix t f x d ϕµ t e´ ¨ dt. p q“ 2π d p q p q żR ix t Proof. We have that the integrand on the right-hand side of (3.8.6) converges to ϕµ t e as p q ´ ¨ σ 0, and furthermore, for each σ 0, its absolute value is upper bounded by ϕµ , which by Ó ą | | assumption is in L1. Therefore, using the DCT and taking σ 0 in (3.8.6), Ó

σ 1 ix t lim f p q x d ϕµ t e´ ¨ dt : h x . σ 0 p q“ 2π d p q “ p q Ó p q żR In combination with the fact that µ σ f σ λd according to Lemma 3.8.4 and using our result p q “ p q ¨ on integration with respect to measures with densities (Theorem 2.2.19), this implies (due to the DCT and the fact that

σ 1 sup f p q x d ϕµ t dt : C 0, , x Rd p qď 2π Rd | p q| “ P p 8q P p q ż that for any continuous function v 0 with compact support we have that ě σ d C sup v x vf p q y y R , x Rd p q ě | p q| @ P P 96 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY and hence the left-hand side of the previous display is an integrable majorising function, so the DCT gives σ σ d d v dµp q vf p q dλ vh dλ , as σ 0. “ Ñ Ó ż ż ż On the other hand, Lemma 3.8.5 implies that

σ d v dµp q v dµ vf dλ , as σ 0, Ñ “ Ó ż ż ż so the right-hand sides of the last two displays coincide for all such v, i.e., we have

vh dλd vf dλd (3.8.10) “ ż ż d for all v 0 in Cb R with compact support. ě p qd Now if only v Cb R instead of v having compact support as well, then we can choose a P p q sequence a monotone increasing sequence vn of functions with p q d • vn 0, vn Cb R with compact support; ě P p q

• vn x 0, maxx Rd v x 0, ; p q P r P | p q|s Ă r 8q d • For all x R we have vn x v x as n . P p qÑ p q Ñ8 As a consequence (3.8.10) implies

d d (3.8.10) d d vh dλ lim vnh dλ lim vnf dλ vf dλ , “ n “ n “ ż Ñ8 ż Ñ8 ż ż where the first and third equality are due to the MCT. Therefore, Theorem 3.5.8 implies that

f λd h λd, ¨ “ ¨ so f h holds λd-almost everywhere according to Proposition 2.2.10. “

Proof of Theorem 2.6.5. Assume that ϕµ ϕν . Then, by Lemma 3.8.4 we get that “ σ σ µp q νp q, “ and due to Lemma 3.8.5 the left-hand side converges weakly to µ, whereas the right-hand side converges weakly to ν. I.e.,

σ σ v dµ lim v dµp q lim v dνp q v dν d “ σ 0 d “ σ 0 d “ d żR Ó żR Ó żR żR d for all v Cb R . Therefore, by Theorem 3.5.8, µ ν, which finishes the proof. P p q “ We will give a couple of important implications of Theorem 2.6.5. We start with generalizing Example 3.3.12 (a) and giving a small hit parade of characteristic functions. Proposition 3.8.7. The following distributions have the given characteristic functions:

(a) δx with x R: P ϕ t eitx. p q“

(b) Berp with p 0, 1 : P p q ϕ t peit 1 p . p q“ ` p ´ q 3.8. CENTRAL LIMIT THEOREM 97

(c) Binn,p with p 0, 1 , n N : P p q P ϕ t peit 1 p n. p q “ p ` p ´ qq

(d) Poiν with ν 0, : P p 8q ν eit 1 ϕ t e p ´ q p q“

(e) U 0,a with a 0, : r s P p 8q eiat 1 ϕ t ´ p q“ iat (f) Exp with κ 0, : κ P p 8q κ ϕ t . p q“ κ it ´ (g) N µ,σ2 with µ R, σ2 0, : p q P P p 8q 2 2 iµt σ t ϕ t e ´ 2 . p q“ Proof. For the normal distribution this has been shown in Exercise 2.6.4, the remaining parts are left as an exercise.

Combining Proposition 3.8.7 and Theorem 2.6.5, we directly obtain the following important result which (see Theorem 3.7.2) tells us how some sums of two independent random variables with the same type of distributions is distributed. Corollary 3.8.8. (a) For µ ,µ R, σ2,σ2 0, , we have 1 2 P 1 2 P p 8q

N 2 N 2 N 2 2 µ1,σ µ2,σ µ1 µ2,σ σ ; 1 ˚ 2 “ ` 1 ` 2 (b) For ν ,ν 0, , we have 1 2 P p 8q Poiν1 Poiν2 Poiν1 ν2 ; ˚ “ ` (c) For p 0, 1 and m,n N we have P p q P Binn,p Binm,p Binn m,p. ˚ “ ` We now proceed with our preparations to proving the Central Limit theorem and start with the following auxiliary result, which we use in the proof of Theorem 3.8.10, but it is also of general interest. Lemma 3.8.9. Let µ,ν be probability measures on Rd, B Rd , and assume that µ f λd for p p qq “ ¨ some density f. Then µ ν has density ˚ g x : f x y ν dy p q “ d p ´ q p q żR with respect to λd. Proof. For B B Rd arbitrary we get P p q

µ ν B µ B y ν dy 1B x y µ dx ν dy p ˚ qp q“ d p ´ q p q“ d d p ` q p q p q żR żR żR 1B x y f x dxν dy f x y ν dy dx, “ d d p ` q p q p q“ d p ´ q p q żR żR żB żR which proves the result, and where to obtain the last equality we substituted x x y and ÞÑ ´ used Tonelli’s theorem. 98 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

The following is actually a weak version of L´evy’s continuity theorem. It is, however, significantly easier to prove than the general version and still sufficient for our purposes for the time being.

Theorem 3.8.10. Assume probability measures µ,µ ,µ ,... on Rd, B Rd to be given, and 1 2 p p qq denote the corresponding characteristic functions by ϕ, ϕ1, ϕ2,.... If d lim ϕn t ϕ t t R , n Ñ8 p q“ p q @ P then w µn µ as n . (3.8.11) Ñ Ñ8

Proof. Let g Cb S have compact support. In particular, this implies that g is uniformly P p q continuous. Then using the notation from (3.8.5) for µ σ for any σ 0, the triangle inequality p q ą implies

σ g dµn g dµ g dµn g dµp q ´ ď ´ n ˇ ż ż ˇ ˇ ż ż ˇ (3.8.12) ˇ ˇ ˇ σ σˇ σ ˇ ˇ ˇ g dµnp q g dµp ˇq g dµp q g dµ ` ´ ` ´ ˇ ż ż ˇ ˇ ż ż ˇ ˇ ˇ ˇ ˇ Recalling that µ σ was definedˇ as the convolution µ ˇ N ˇ0,σ2Id , using theˇ notation h σ from p q ˚ p q p q (3.8.7) for denoting the mollifier, and using Lemma 3.8.9, we can write that

σ σ d σ g dµp q g x hp q x y µn dy λ dx g hp q dµn, n “ p q p ´ q p q p q“ ˚ ż ż ´ ż ¯ ż where in the last equality we used Fubini’s theorem (since the function x,y g x h σ x y p q ÞÑ p q p qp ´ q is µ λd-integrable due to b

σ d σ d g x hp q x y µn λ dx, dy g x hp q x y µn dy λ dx | p q p ´ q| b p q“ | p q| p ´ q p q p q ż ż ´ ż ¯ σ g dµp q , “ | | n ă8 ż where in the first equality we used Tonelli’s theorem, in the second we used Lemma 3.8.9, and σ the inequality comes from the fact that µnp q is a finite measure and g is bounded.) Therefore, and taking advantage of Lemma 3.8.9 and Tonelli’s theorem, we can upper bound

σ σ g dµn g dµp q g x g hp q x µn dx , ´ n ď p q ´ p ˚ qp q p q ˇ ż ż ˇ ż ˇ ˇ ˇ ˇ and due to the uniformˇ continuity andˇ the boundednessˇ of g weˇ get that g h σ converges ˚ p q uniformly to g, i.e., for any ε 0, for all σ 0 small enough we have ą ą σ sup g x g hp q x ε. x Rd p q ´ p ˚ qp q ă P ˇ ˇ Thus, for any ε 0 we have that forˇ all σ 0 small enoughˇ and all n N, ą ą P

σ g dµn g dµp q ε. (3.8.13) ´ n ď ˇ ż ż ˇ ˇ ˇ The exact same argument works forˇ the last term on theˇ RHS to yield

σ g dµp q g dµ ε. (3.8.14) ´ ď ˇ ż ż ˇ ˇ ˇ ˇ ˇ 3.8. CENTRAL LIMIT THEOREM 99

Now in order to digest the second summand on the RHS of (3.8.12), note that Lemma 3.8.4 σ σ implies that µp q and µnp q have densities 1 σ2 t t x d ϕµ t exp ix t p ¨ q dt ÞÑ 2π Rd p q ´ ¨ ´ 2 p q ż ! ) and 1 σ2 t t x d ϕµn t exp ix t p ¨ q dt ÞÑ 2π Rd p q ´ ¨ ´ 2 p q ż ! ) with respect to λd. Thus, we get 2 σ σ σ t t g dµp q g dµp q g x ϕµ t ϕµ t exp p ¨ q dt dx (3.8.15) n ´ ď | p q|| p q´ n p q| ´ 2 ˇ ż ż ˇ ż ż ! ) ˇ ˇ and theˇ integrand on the right-handˇ side can be upper bounded by σ2 t t Rd Rd x,y 2 g x exp p ¨ q ˆ Q p q ÞÑ | p q| ´ 2 ! ) which is integrable with respect to λd λd since g is continuous with compact support. Therefore, b the assumptions of the DCT are fulfilled and hence for any σ 0 the RHS of (3.8.15) converges ą to 0 as n . Ñ8 In combination with (3.8.13), (3.8.14) and(3.8.12), this proves that

g dµn g dµ 0 (3.8.16) ´ Ñ ˇ ż ż ˇ ˇ ˇ for g as above. The Portmanteau Theoremˇ (Theorem 3.5.9ˇ ), however, demands the convergence of bounded continuous functions with non-compact support also. In order to derive the required d convergence, let g Cb R not necessarily with bounded support, and choose a sequence of P p q continuous functions hm with hm 0, 1 and compact support (in particular, hm is uniformly p q P r s continuous) such that hm 1. In particular, the reasoning above implies Ò

hm dµn hm dµ as n , (3.8.17) Ñ Ñ8 ż ż and

ghm dµn ghm dµ as n , Ñ Ñ8 ż ż since hm and ghm are bounded and continuous with compact support. Furthermore,

ghm dµn g dµn sup g x 1 hm dµn (3.8.18) ´ ď | p q| p ´ q ˇ ż ż ˇ ż ˇ ˇ and ˇ ˇ ghm dµ g dµ sup g x 1 hm dµ, (3.8.19) ´ ď | p q| p ´ q ˇ ż ż ˇ ż From (3.8.17) and the factˇ that the µ andˇµ are probability measures we infer that ˇ n ˇ

1 hm dµn 1 hm dµ as n . p ´ q Ñ p ´ q Ñ8 ż ż In particular, for ε 0 we can choose m N such that for all n large enough, the right-hand ą P sides of (3.8.18) and (3.8.19) are upper bounded by ε. All in all, putting things together we obtain for g Cb S and hm as above that for any ε 0 there exist m,n N such that P p q ą 0 P

g dµn g dµ g dµn ghm dµn ghm dµn ghm dµ ghm dµ g dµ ´ ď ´ ` ´ ` ´ ˇ ż ż ˇ ˇ ż ż ˇ ˇ ż ż ˇ ˇ ż ż ˇ ˇ ˇ ˇ (3.8.18) ˇ ˇ (3.8.16) ˇ ˇ (3.8.19) ˇ ˇ ˇ ˇ ε ˇ ˇ ε ˇ ˇ ε ˇ looooooooooooomooooooooooooonď looooooooooooooomooooooooooooooonď loooooooooooomoooooooooooonď for all n n . This shows the desired weak convergence (3.8.11) and hence finishes the proof. ě 0 100 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY

Theorem 3.8.11. Let X be an Rd, B Rd -valued random variable with E X m for p p qq r} }2 să8 some m N. Then for t Rd fixed we obtain for h R with h 0 that P P P ‰ m k k i E ht X m ϕX ht rp ¨ q s o h , as h 0. (3.8.20) p q“ k! ` p| | q Ñ k 0 ÿ“ Proof. For h R set P ψ h : ϕX ht . p q “ p q For h R as well as k 0,...,m we obtain P P t u k k iht X ψp q h E it X e ¨ . (3.8.21) p q“ p ¨ q Indeed, we use induction. The equality is clear“ for k 0, and‰ for k 1,...,m 1 we obtain “ P t ´ u k 1 k 1 k 1 i h α t X k 1 iht X k ψp ´ q h α ψp ´ q h E it X ´ e p ` q ¨ E it X ´ e ¨ ψp q h lim p ` q´ p q lim p ¨ q ´ p ¨ q p q“ α 0 α “ α 0 α Ñ Ñ “ ‰ “ ‰ it X k 1eiht X eiαt X 1 lim E p ¨ q ´ ¨ p ¨ ´ q , “ α 0 α Ñ ” ı where in the second equality we took advantage of the induction assumption. Due to the fact that eih 1 h for h R arbitrary, we obtain that | ´ | ď | | P it X k 1eiht X eiαt X 1 p ¨ q ´ ¨ p ¨ ´ q t X k, α R, α 0. α ď | ¨ | @ P ‰ ˇ ˇ According to our assumptions,ˇ the right-hand sideˇ is in Lk Ω, F, P , and furthermore we have ˇ ˇ p q k 1 iht X iαt X it X ´ e ¨ e ¨ 1 k iht X lim p ¨ q p ´ q it X e ¨ . α 0 α Ñ “ p ¨ q Therefore, the dominated convergence theorem (see Theorem 2.1.7) implies that (3.8.21) holds. Therefore, (3.8.20) follows in combination with Taylor’s formula.

Proof of Theorem 3.8.1. Without loss of generality we can assume that the Xn are centered for all n N. Denote the characteristic function of Xn by ϕX and let ϕn denote the characteristic n P i“1 Xi d d function of Yn . Then for any t R fixed, using that E t Xn tiE πi Xn 0 “ ?n P r ¨ s“ i 1 r p qs “ as well as that ř “ 2 T ř E t Xn titj E πi Xn πj Xn t Σt, rp ¨ q s“ 1 i,j n r p q p qs “ ďÿď Σi,j we obtain looooooooomooooooooon

n T n T Lemma 3.8.3 Thm. 3.8.11 for m 2 t Σt 1 t Σt ϕn t ϕX t ?n “ 1 o n´ e´ 2 , as n . p q “ p { q “ ´ 2n ` p q Ñ Ñ8 ´ ¯ ´ ¯ (3.8.22)

The result now follows using Theorem 3.8.10 in combination with the fact that the right-hand side of (3.8.22) is the characteristic function of a N 0, Σ -distributed random variable. p q Exercise 3.8.12. For a sequence of random variables Xn as in the assumption of Theorem p q 3.8.1, the central limit theorem implies the validity of a weak law of large numbers for Xn . p q Indeed, since the distribution function Φ of the standard normal distribution (see (3.1.4)) is continuous, Theorem 3.8.1 implies that for arbitrary M 0 we have ą n i 1 Xi µ P “ p ´ q M, M Φ M 1 Φ M . (3.8.23) ?σ2n R p´ s Ñ p´ q ` p ´ p qq ´ ř ¯ An,M loooooooooooooooomoooooooooooooooon 3.8. CENTRAL LIMIT THEOREM 101

Now for any M 0, and ε 0 there exists N N such that for all n N one has P p 8q ą P ě n i 1 Xi µ Bn,ε : “ p ´ q ε An,M . “ n ą Ă !ˇř ˇ ) ˇ ˇ As a consequence, we obtain for anyˇ such M and ε,ˇ in combination with (3.8.23), that

lim sup P Bn,ε Φ M 1 Φ M . n p qď p´ q ` p ´ p qq Ñ8

Since M was arbitrary and limM Φ M 1 Φ M 0, this implies Ñ8 p´ q ` p ´ p qq “

lim P Bn,ε 0. n Ñ8 p q“

As in addition ε 0 was arbitrary, this implies the desired weak law of large numbers for Xn . ą p q Example 3.8.13. (a) Using the strong law of large numbers, for a random walk with drift n (i.e., Sn Xj where the Xj are i.i.d. with P X 1 p, P X 1 1 p, “ j 1 p 1 “ q “ p 1 “ ´ q “ ´ and p 1 2, 1 “) one has that for all ε 0, P p { ř q ą

P Sn n 2p 1 nε 0. p| ´ p ´ q| ě qÑ

Therefore, the first order (i.e. linear in n) term of the position of Sn at time n will asymptotically be given by 2p 1. In order to obtain a better understanding, it is of course ´ tempting to ask for the lower order corrections. For this purpose we apply the central limit theorem; using that the variance of Xn is given by

2 2 2 2 2 Var Xn E X E Xn 1 2p 1 1 4p 4p 1 4p 1 p : σ p q“ r ns´ r s “ ´ p ´ q “ ´ ` ´ “ p ´ q “ we obtain Sn n 2p 1 L ´ p ´ q N 0, 1 . ?σ2n ÝÑ p q

In particular, this implies that the ‘typical’ fluctuations of Sn around its expected value n 2p 1 are of the order ?n. p ´ q 102 CHAPTER 3. CLASSICAL AND BASIC RESULTS IN PROBABILITY THEORY Chapter 4

A primer on stochastic processes

4.1 Stochastic processes

Definition 4.1.1. A is a family Xt , t T, of random variables mapping p q P from some probability space Ω, F, P into a measurable space E, E . Here, T is an arbitrary p q p q non-empty set. Example 4.1.2. In the setting of the law of large numbers and the central limit theorem, the sequences Xn , n N, n p q P i 1 Xi E Xi “ p ´ r sq, n N, n P ř and n i 1 Xi “ , n N, ?n P ř are all stochastic processes (with T N or T N , respectively, and E, E Rd, B Rd .). “ “ 0 p q “ p p qq We will primarily consider the setting that the Xt are real random variables, and in this case 1 we also refer to Xt as a real stochastic process. p q Most of the times we will actually interpret t as ‘time’, and hence natural choices are T N “ 0 or also T 0, . The above definition, however, is more general. “ r 8q We have seen in the theory of random variables that the distribution of random variables has played a very important role. In fact, the very structure of the probability space Ω, F, P p q underlying a random variable X was often irrelevant, and what was more crucial to us was the law PX of the random variable. In a similar way, in the theory of stochastic processes, a key role is played by the distribution of a process. By definition, the distribution of a real process Xt ,t T, would be a probability p q P law on RT endowed with a suitable σ-algebra T on RT which makes the mapping Ω, F RT , T p q Ñ p q Ω ω Xt ω t T Q ÞÑ p p qq P itself a F T -measurable random variable. Since by definition of a stochastic process, the ´ only ‘regularity’ assumption we made was that the Xt were random variables (i.e., they are F B R -measurable functions from Ω, F, P to R, B R ), the natural σ-algebra to choose ´ p q p q p p q for RT is the product-σ-algebra (recall Definition 2.3.5). We recall that the product σ-algebra had been generated by the coordinate projections πt, t T , and equivalently the product σ- P algebra is generated by the cylinder sets (recall Definition 2.3.7). Thus, those subsets of RT for which finitely many coordinates are contained in certain measurable subsets of R will play an important role; this then leads to the following concept. 1In fact, if not mentioned otherwise, we will assume all the stochastic processes to be real in the following.

103 104 CHAPTER 4. A PRIMER ON STOCHASTIC PROCESSES

Definition 4.1.3. Let a stochastic process Xt , t T, be given. The finite dimensional distri- p q P butions of the process are given by the probability measures

S µS B : P Xs s S B , B B R , p q “ pp q P P q P p q on RS, B RS , where S T finite. p p qq Ă In particular in the case of say continuous time (as opposed to discrete such as T N ), i.e. “ 0 if e.g. T R, then the finite dimensional distributions do not contain all the mathematically “ interesting information. However, this issue will only play a role and be adressed in more advanced classes. For the time being, it is worthwhile to notice that for any given stochastic process Xt , t T, p q P the family of finite dimensional distributions satisfies the following consistency condition:

J 1 I P Xt t J πI ´ B P Xt t I B I J T, J finite, B B R , (4.1.1) pp q P P p q p qq “ pp q P P q @ Ă Ă P p q or, which is the same,

J 1 P Xt t J πI ´ P Xt t I I J T, J finite, (4.1.2) pp q P q ˝ p q “ pp q P q @ Ă Ă where the projection J J I πI : R xj j J xi i I R Q p q P ÞÑ p q P P had been introduced in Definition 2.3.3.

Remark 4.1.4. In fact, some authors index the family of finite dimensional distributions by ordered tuples t ,...,tn with n N, ti T for all i 1,...,n . This, however, turns out to be p 1 q P P P t u more complicated since one has to impose a condition on how permuting acts on the elements of the family, i.e., how µt1,...,tn and µtπp1q,...,tπpnq are related to each other for an arbitrary per- mutation π of 1,...,n (i.e., π Sn with Sn denoting the symmetric group). Indeed, if the t u P family of finite dimensional distributions is to be generated by a stochastic process in the sense that

µt ,...,t B . . . Bn P Xt ,...,Xt B . . . Bn , 1 n p 1 ˆ ˆ q“ pp 1 n q P 1 ˆ ˆ q then one obviously must have the condition that

µt1,...,tn B1 . . . Bn µtπp1q,...,tπp1q Bπ 1 . . . Bπ 1 . (4.1.3) p ˆ ˆ q“ p p q ˆ ˆ p qq

Therefore, once one has specified µt ,...,t for some t ,...,tn T and an arbitrary permu- πp1q πp1q 1 P tation π Sn, then (4.1.3) already characterizes µt ,...,t for any π Sn. Therefore, it is P πp1q πp1q P sufficient and more convenient to index the finite dimensional distributions just by finite subsets r r of T (i.e., unordered tuples) instead of by ordered tuples t ,...,tn with ti T. 1 r P Motivated the observation in (4.1.2), we introduce the notion of a consistent family of probability measures. For this purpose, recall the definition of the projection operators given in Definition 2.3.3.

Definition 4.1.5. Let T be an arbitrary non-empty set. If PI , I T finite, is a family of I Ă I probability measures such that PI is a probability measure on R , B R , then we call the p p qb q family consistent, if we have

J 1 PJ π ´ PI I J T, J finite. (4.1.4) ˝ p I q “ @ Ă Ă 4.2. KOLMOGOROV’S EXISTENCE AND UNIQUENESS THEOREM 105

4.2 Kolmogorov’s existence and uniqueness theorem

The following result has first appeared (in slightly weaker form) in [Kol33, III § IV]

Theorem 4.2.1 (Kolmogorov’s existence and uniqueness theorem). Let Eλ , λ Λ, be an p q P arbitrary family of Polish spaces, and let Bλ , λ Λ, be the corresponding family of Borel p q P σ-algebras. Furthermore, let PI , I Λ finite, be a consistent family of probability measures PI on Ă λ I Eλ, λ I Bλ . pˆ P b P q Then there exists a unique probability measure P on λ Λ Eλ, λ Λ Bλ such that p P P q 1 Ś Â P πJ ´ PJ J Λ finite, (4.2.1) ˝ p q “ @ Ă where πJ : Eλ ωλ λ Λ ωλ λ J Eλ Q p q P ÞÑ p q P P λ Λ λ J ąP ąP is again the projection. P is also called the projective limit (‘projektiver Limes’) of the family PJ , J Λ finite. p q Ă For the proof, we will need the following notation and auxiliary result.

Definition 4.2.2. Let E be a Hausdorff topological space or else a metric space, and let µ be a measure defined on the Borel σ-algebra B E . p q (a) µ is called a Borel measure (‘Borelmaß’), if

µ K K E compact. p qă8 @ Ă

(b) µ is called inner regular (‘regul¨ar von innen’) if

µ B sup µ K : K B compact . p q“ p q Ă ( (c) µ is called outer regular (‘regul¨ar von außen’) if

µ B inf µ O : O B open . p q“ p q Ą ( (d) µ is called regular (‘regul¨ar’) if it is inner regular and outer regular.

It should be noted that there is a variety of different definitions of the term ‘Borel measure’. We will stick to the one above.

Lemma 4.2.3. Let µ be either a finite Borel measure on a Polish space E, or let µ be a measure on Rd, Bd such that µ A for any bounded A B E . p q p qă8 P p q Then µ is regular.

Remark 4.2.4. In the Polish space setting the above result is sometimes referred to as Ulam’s theorem.

Proof. Since our main emphasis is on E Rd, we will restrict ourselves to giving the proof in “ this simpler case (as a treat, we may on the other hand discard with the finiteness of µ and only require µ A for any A B Rd bounded – in particular, we include the Lebesgue measure p qă8 P p q this way). For a proof of the general version, see [Bau92, Lemma 26.2] for instance. We start with showing the following claim. 106 CHAPTER 4. A PRIMER ON STOCHASTIC PROCESSES

Claim 4.2.5. For all A B Rd , P p q µ A inf µ O : O A open sup µ F : F A closed . (4.2.2) p q“ p q Ą “ p q Ă Proof. We start with proving the first equality.( Recalling the notation Id for( the semi-ring of d rectangles in R , according to Theorem 1.3.9 we can extend µ Id to an outer measure µ˚ on Rd | 2 . Now Theorem 1.3.5 in combination with Theorem 1.3.9 imply that µ˚ B Rd is a measure, | p q and due to Theorem 1.2.17, we obtain that

µ˚ B Rd µ. | p q “ Using (1.3.5) we therefore get

8 d 8 d µ A inf µ Ai : A1, A2,... I , and A Ai , A B R . (4.2.3) p q“ i 1 p q P Ă i 1 P p q ! ÿ“ ď“ ) Using furthermore that µ is continuous from above, for every ε 0 and any sequence of Ai ą p q on the right-hand side of (4.2.3) we deduce the existence of a sequence Oi of open rectangles p q with

• Ai Oi, and Ă i • µ Oi µ Ai 2 ε p qă p q` ´ for all i N. In combination with (4.2.3) this yields the outer regularity of µ. P Passing to complements we deduce

µ A sup µ F : F A closed p q“ t p q Ă u and thus recover the second equality in (4.2.2).

Since measures are continuous from below (see Prop. 1.2.16), for any F A closed we get that Ă µ F supn N µ Fn , where Fn : F B 0,n is compact, with B 0,n denoting the closed p q “ P p q “ X p q p q unit ball of radius n in Rd. In combination with (4.2.2) this implies the desired regularity.

Proof of Thm. 4.2.1. We know from Exercise 2.3.8 that the cylinder sets form an algebra over

λ Λ Eλ, which we will denote by A for simplicity. P From Definition 2.3.5 we infer that A is a generator of Bλ, and being an algebra, we deduce Ś λ Λ that A is a π-system. Hence, we may readily check thatP the assumptions of the newly added  Corollary 1.2.19 are fulfilled and we deduce uniqueness, i.e., there is at most one probability measure on Eλ, Eλ satisfying (4.2.1). p λ Λ λ Λ q It remains to showP the existenceP of such a probability measure,2 and we will take advantage of Ś  Carath´eodory’s existence theorem, or rather its Corollary 1.3.10. For that purpose, we have to show that

the right-hand side of (4.2.1) defines a σ-subadditive content P on the semiring A (4.2.4) via

1 P Z : PI B , all Z A, i.e., Z πI ´ B , where B Bλ,I Λ finite. (4.2.5) p q “ p q P “ p q p q P λ I Ă âP We start with showing that P as in (4.2.5) is actually well-defined. For this purpose assume that 1 there are B1 λ I Bλ and B2 λ J Bλ, where I, J Λ finite such that Z πI ´ B1 1 P P P P Ă “ p q p q“ πJ ´ B2 . Then there exists B3 λ I J Bλ such that p q p q  P P Y 1 ÂZ πI J ´ B3 . “ p Y q p q 2 A In fact, if any such measure exists, it must be a probability measure since λPΛ Eλ P , and we know that e.g. P p Eλq“ PλpEλq“ 1 for arbitrary λ P Λ, since the Pλ are probability measures. λPΛ Ś Ś 4.2. KOLMOGOROV’S EXISTENCE AND UNIQUENESS THEOREM 107

I J From the consistency condition (4.2.1) and the fact that B x Eλ : π x B , 3 “ t P λ I J IY p q P 1u we deduce that P Y Ś PI J B3 PI B1 , Y p q“ p q and similarly we obtain PI J B3 PJ B2 , Y p q“ p q so P Z as introduced in (4.2.5) does not depend on the specific representation of Z, and hence p q P is well-defined. Next, we have to show (4.2.4). It is obvious that P 0 and that P 0. To show additivity, pHq “ ě let Z ,Z A be two disjoint sets. Then there exist I Λ finite as well as B ,B Bλ 1 2 P Ă 1 2 P λ I disjoint such that P 1 1 Â Z π´ B and Z π´ B . (4.2.6) 1 “ I p 1q 2 “ I p 2q Since Z1 and Z2 are disjoint, we get that B1 and B2 must be disjoint as well, and thus we deduce, using (4.2.6), the fact that

1 1 1 π´ B B π´ B π´ B , I p 1 Y 2q“ I p 1qY I p 2q as well as that PI is a measure on λ I Bλ, that P P Z 9 Z PI B Â9 B PI B PI B P Z P Z , p 1Y 2q“ p 1Y 2q“ p 1q` p 2q“ p 1q` p 2q hence P is additive and a content on A. In order to show that P is σ-subadditive, due to Proposition 1.2.16 it is sufficient to check that P is continuous in . H We will prove this by contradiction. Indeed, assume otherwise that there is a decreasing sequence Zn , n N, of sets Zn A with limn Zn and such that lim supn P Zn 0. By p q P P Ñ8 “ H Ñ8 p q ą possibly passing to a subsequence, w.l.o.g. assume that P Zn δ for some δ 0 and all n N. p qě ą P Furthermore, we can assume Z π 1 A some A B for all n N, where I Λ n I´n n n λ In λ n “ p q P P P Ă finite (w.l.o.g., the In can and will be chosen to be increasing sets in n N). Now recall that E Â P λ In λ is Polish (due to Theorem 2.3.9). Using Lemma 4.2.3, we therefore deduce that there P exists a sequence Kn , n N, of compact sets Kn Bλ, and such that Kn An for all Ś p q P P λ In Ă n N, and P P n 1 Â PI Kn δ 1 2´p ` q for all n N. (4.2.7) n p qě p ´ q P 1 We set Yn : π´ Kn and furthermore “ p In qp q n Yn : Yi A. “ i 1 P č“ r Then Yn is a non-increasing sequence of elements of λ Λ Eλ with p q P

Yn , n NÂ. (4.2.8) r ‰H @ P Indeed, since Yn Zn, we have Ă r n r 8 P Zn P Yn P Zn Yn P Zi Yi P Zi Yi p q´ p q“ p z qď i 1p z q ď i 1 p z q ´ ď“ ¯ ÿ“ (4.2.7) r 8 r 8 i 1 δ P A K δ 2´p ` q , Ii i i 2 “ i 1 p z q ď i 1 ď ÿ“ ÿ“ where the first inequality we took advantage of the fact that

n Zn Yn Zn Yi z “ i 1 z ď“ r 108 CHAPTER 4. A PRIMER ON STOCHASTIC PROCESSES as well as the fact that the Zn are monotone decreasing. Therefore P Yn 0 since P Zn δ, p qą p qě so in particular Yn . ‰H Our goal now is to construct from this an element z limn Zn, whichr would finish our proof P Ñ8 by contradiction.r For this purpose, due to (4.2.8), we can and do choose a sequence yn , n N, p q P such that yn Yn, and the monotonicity of Yn implies that P p q

yn Yk n k. r P r @ ě

Hence, choosing t in some In, n N, and projecting both sides of the previous display on the P r t-th coordinate, we deduce that

In πt ym Yn π Kn m n. (4.2.9) p q P Ă t p q @ ě

Since n N In limn In is a countable subset of Λ, we can order its elements as t1,t2,.... “ Ñ8 r We now applyP a diagonal argument in combination with (4.2.9) to deduce that there exists a Ť subsequence ynk , k N, such that for each j N the sequence πtj ynk converges in each Im p q P P In p p qq of the π Km , m N, as k (mind that the π Kn are compact subsets of Et since tj p q P Ñ 8 tj p q j πIn K E the projections tj are continuous and the n are compact in λ In λ.) Hence, for each P t n N In, the limit P P Im Ś lim πt ynk : y t πt Km πt Zm (4.2.10) Ť k p q“ p q P p qĂ p q Ñ8 (for all m N) exists. P r For some x λ Λ Eλ we now define z n N Yn via P P P P

Ś y Şt , if t In, y t : p q r P n N p q “ x t , otherwiseP. " p q Ť r Then y Zn due to (4.2.10), so the intersection is non-empty which was all that remained P n N to finish theP proof. Ş

Remark 4.2.6. One can also prove Kolmogorov’s extension theorem by first establishing it for the case that T is countable (using the so-called Ionescu-Tulcea theorem) and then generalize it to uncountable T ; See [Kle14, Section 14.3] for this approach. In particular, if one is only interested in the result for T N , then there are easier proofs “ 0 available than the one we gave (and the consistency condition is also easier to formulate).

Example 4.2.7. (a) We are now in the position to show that i.i.d. sequences of random variables, such as e.g. postulated in the strong law of large numbers, actually do exist! For this purpose, all we have to show is the existence of an i.i.d. sequence of random variables Xn on some probability space Ω, F, P such that PX µ for all n N, where µ is some p q p q n “ P given probability measure on Rd, B Rd . p p qq d We now apply Kolmogorov’s extension theorem to the case Λ : N and En : R , En : d “ “ “ B R , for all n Λ, and we define the measures Pn : µ on En, En . For I N finite p q P “ p q Ă we consider the finite product measure

PI Pn µ nPI “ n I “ Â âP on n I En, n I En , as defined in Theorem 2.4.5. p P b P q It is not hard to show that the PI , I Λ finite, form a consistent family of probability Ś Ă measures. Indeed, for I J N with J finite, we have for Ă Ă

B Bn En “ n I P n I ąP âP 4.2. KOLMOGOROV’S EXISTENCE AND UNIQUENESS THEOREM 109

(i.e., Bn En for each n I), that P P J 1 d J I πI ´ B Bn R z n J En. P p q p q“ n I ˆ p q Ăˆ ´ ąP ¯ ` ˘ So, according to the definition of the product measure,

J 1 d J I PJ π ´ B PJ Bn R z Pn Bn Pn En pp I q p qq “ ˆ p q “ p q p q n I n I n J I ´´ ąP ¯ ` ˘¯ źP źP z Pn Bn PI B . “ n I p q“ p q źP Using Theorem 1.2.17 we thus deduce that

J 1 PJ π ´ PI, ˝ p I q “ and as a conseqeunce, Theorem 4.2.1 implies that there exists a probability measure P on

d N d N R , B R b p q p q ` 1 ˘ such that for any I N finite, P πI´ PI . In particular, defining we can define a Ă ˝ “ d N d N sequence of random variables Xn on the probability space R , B R , P via p q pp q p qb q d N d Xn : R ω ω n R p q Q ÞÑ p q P with the desired properties as required above.

(b) Bernoulli percolation on Zd d p T T Fix p 0, 1 . For T Z finite define the probability measure PT on 0, 1 , B R P r s T Ă pt u p qq “ 0, 1 T , 2 0,1 by setting pt u t u q p f x 1 f x PT f : p p q 1 p ´ p q (4.2.11) pt uq “ x T p ´ q źP 0 d (with the convention that 0 : 0). Then the family of probability measures PT , T Z “ Ă finite, is consistent. Indeed, for S T Zd finite we have for f 2S that Ă Ă P p T 1 g x 1 g x P π ´ f p p q 1 p ´ p q T pp J q pt uqq “ p ´ q g πT ´1 f x T Pp Sÿq pt uq źP g x 1 g x g x 1 g x p p q 1 p ´ p q p p q 1 p ´ p q “ p ´ q p ´ q g πT ´1 f x S x T S Pp Sÿq pt uq źP Pźz f x 1 f x p p p q 1 p ´ p q PS f ; “ x S p ´ q “ pt uq źP Since 0, 1 T is a discrete space, this shows Pp πT 1 Pp and hence the consistency. t u T ˝ p J q´ “ S Therefore, Theorem 4.2.1 supplies us with the existence of a probability measure Pp on Zd Zd Zd Zd 0, 1 , B 0, 1 b 0, 1 , B 0, 1 with the property that its projections / pt u pt uq qq “ pt T u pt u qq pushforwards on 0, 1 T , 2 0,1 are given by the expression in (4.2.11). pt u t u q In fact, the existence of this measure can also be derived ‘by foot’, showing that the corre- sponding content is σ-additive, see [Kle14, Thm. 14.36].

Remark 4.2.8. This will not be of utmost importance to us in this class, but we should note that in some sense from a certain point of view the σ-algebra RT is not very suitable. Indeed, in 110 CHAPTER 4. A PRIMER ON STOCHASTIC PROCESSES the case 0, it is natural to ask whether or not a stochastic process is a continuous functions, r 8q or the probability of this being the case. I.e., one might be interested in probabilities of the type

P the function 0, t Xt is continuous . r 8q Q ÞÑ q In particular, for this to make` sense we would need that the set

ω Ω : 0, t Xt ω is continuous P r 8q Q ÞÑ p q is contained in B R T . This, however, is generally not the case. This( will be investigated in p qb detail in more advanced probability classes. 4.2. KOLMOGOROV’S EXISTENCE AND UNIQUENESS THEOREM 111

Acknowledgment: I would like to thank Cedric Neumann for pointing out various typos. 112 CHAPTER 4. A PRIMER ON STOCHASTIC PROCESSES Bibliography

[AT07] Robert J. Adler and Jonathan E. Taylor. Random fields and geometry. Springer Monographs in Mathematics. Springer, New York, 2007. [Bau92] Heinz Bauer. Maß- und Integrationstheorie. de Gruyter Lehrbuch. [de Gruyter Textbook]. Walter de Gruyter & Co., Berlin, second edition, 1992. [Bau02] Heinz Bauer. Wahrscheinlichkeitstheorie. de Gruyter Lehrbuch. [de Gruyter Text- book]. Walter de Gruyter & Co., Berlin, fifth edition, 2002. [Beh87] E. Behrends. Maß- und Integrationstheorie. Hochschultext. [University Textbooks]. Springer-Verlag, Berlin, 1987. [Ber13] J. Bernoulli. Ars conjectandi. Landmarks of science. Impensis Thurnisiorum, fratrum, 1713. [Bil95] Patrick Billingsley. Probability and measure. Wiley Series in Probability and Math- ematical Statistics. John Wiley & Sons, Inc., New York, third edition, 1995. A Wiley-Interscience Publication. [Boc33] S. Bochner. Monotone Funktionen, Stieltjessche Integrale und harmonische Analyse. Math. Ann., 108(1):378–410, 1933. [Coh13] Donald L. Cohn. Measure theory. Birkh¨auser Advanced Texts: Basler Lehrb¨ucher. [Birkh¨auser Advanced Texts: Basel Textbooks]. Birkh¨auser/Springer, New York, second edition, 2013. [Dir68] P. G. Lejeune Dirichlet. Vorlesungen ¨uber Zahlentheorie. Herausgegeben und mit Zus¨atzen versehen von R. Dedekind. Vierte, umgearbeitete und vermehrte Auflage. Chelsea Publishing Co., New York, 1968. [Doo94] J. L. Doob. Measure theory, volume 143 of Graduate Texts in Mathematics. Springer- Verlag, New York, 1994. [Dre17] Alexander Drewitz. Probability theory II (lecture notes), 2017. [Dre18] Alexander Drewitz. Introduction to probability and statistics (lecture notes), 2018. [Dud02] R. M. Dudley. Real analysis and probability, volume 74 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2002. Revised reprint of the 1989 original. [Dur10] Rick Durrett. Probability: theory and examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, fourth edition, 2010. [Els05] J¨urgen Elstrodt. Maß- und Integrationstheorie. Springer-Lehrbuch. [Springer Text- book]. Springer-Verlag, Berlin, fourth edition, 2005. Grundwissen Mathematik. [Ba- sic Knowledge in Mathematics].

113 114 BIBLIOGRAPHY

[Ete81] N. Etemadi. An elementary proof of the strong law of large numbers. Z. Wahrsch. Verw. Gebiete, 55(1):119–122, 1981.

[G08]¨ J. G¨artner. Skript Maß- und Integrationstheorie. 2008.

[Geo09] Hans-Otto Georgii. Stochastik. de Gruyter Lehrbuch. [de Gruyter Textbook]. Walter de Gruyter & Co., Berlin, expanded edition, 2009. Einf¨uhrung in die Wahrschein- lichkeitstheorie und Statistik. [Introduction to probability and statistics].

[Hal50] Paul R. Halmos. Measure Theory. D. Van Nostrand Company, Inc., New York, N. Y., 1950.

[Kal02] Olav Kallenberg. Foundations of modern probability. Probability and its Applications (New York). Springer-Verlag, New York, second edition, 2002.

[Kle14] Achim Klenke. Probability theory. Universitext. Springer, London, second edition, 2014. A comprehensive course.

[Kol33] A. N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin, 1933.

[LL10] Gregory F. Lawler and Vlada Limic. Random walk: a modern introduction, volume 123 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.

[Rao04] M. M. Rao. Measure theory and integration, volume 265 of Monographs and Textbooks in Pure and Applied Mathematics. Marcel Dekker, Inc., New York, second edition, 2004.

[Rud99] Walter Rudin. Reelle und komplexe Analysis. R. Oldenbourg Verlag, Munich, 1999. Translated from the third English (1987) edition by Uwe Krieg.

[Sie75] Wac law Sierpi´nski. Oeuvres choisies. PWN–Editions´ Scientifiques de Pologne, War- saw, 1975.

[vdWB75] B.L. van der Waerden and J. Bernoulli. Die Werke von Jakob Bernoulli: Bd. 3: Wahrscheinlichkeitsrechnung. Die Werke von Jakob Bernoulli. Birkh¨auser Basel, 1975.

[Wal93] Wolfgang Walter. Gew¨ohnliche Differentialgleichungen. Springer-Lehrbuch. [Springer Textbook]. Springer-Verlag, Berlin, fifth edition, 1993. Eine Einf¨uhrung. [An introduction].