<<

and Integration

Xue-Mei Li with assistance from Henri Elad Altman Imperial College London

March 6, 2021

Contents

1 Introduction (cf. Introductory Lecture) 9

1.1 Module Information ...... 9

1.2 Course Structure ...... 9

1.3 Prologue ...... 10

1.4 The mass transport problem ...... 10

1.5 Coin tossing ...... 11

1.6 Extending Riemann’s theory of integration ...... 12

1.6.1 Inadequacy of Riemann Integrals ...... 12

1.6.2 Idea of Lebesgue’s integral: allow for more sets as “building blocks” ...... 13

1.7 Observables of Random Variables ...... 14

2 Measurable sets (1st Lecture) 15

2.1 Set Algebra ...... 15

2.2 Measurable Space ...... 16

2.2.1 Countability of σ-algebras ...... 18

2.2.2 Fundamental Examples and Exercises ...... 18

2.3 Borel σ-algebras ...... 20

2.3.1 Definition ...... 20

2.3.2 The case of R ...... 20

2.3.3 Borel σ-algbera on a subset of a metric space ...... 21

3 CONTENTS 4

2.4 Product σ-algebras ...... 21

2.4.1 Product of a countable family of σ-algebras * ...... 22

2.5 A warning ...... 23

2.6 Background material* ...... 23

3 Measures (Lectures 2-6) 25

3.1 Basics of Measures (2nd Lecture) ...... 25

3.2 Uniqueness of Measures (3rd Lecture) ...... 28

3.2.1 π − λ theorem ...... 28

3.3 Measure determining sets ...... 29

3.4 Construction of Measures (Lectures 4,5,6) ...... 30

3.4.1 Negligible sets and complete σ-algebras ...... 31

3.4.2 A completion theorem* ...... 31

3.4.3 Outer Measure ...... 32

3.4.4 Extension of pre-measures ...... 34

3.4.5 Proof for Caratheodory’s Theorem ...... 34

3.4.6 Construction of Lebesgue-Stieltjes / ...... 38

3.4.7 Example of a non-Lebesgue measurable set* ...... 43

3.4.8 Examples ...... 44

3.5 Lebesgue Measure on Rn (6th Lecture) ...... 45

4 Measurable maps (Lectures 7,8) 47

4.1 Definition ...... 47

4.2 Examples ...... 47

4.3 Properties of measurable functions ...... 48

4.4 Measurability with respect to the σ-algebra generated by a map ...... 49

4.5 Measurable maps with values in a subset ...... 49

4.6 Real-valued measurable functions ...... 50 CONTENTS 5

4.7 Simple functions ...... 53

4.8 Factorisation Lemma ...... 56

4.9 Pushed-forward measure ...... 56

4.10 Appendix* ...... 57

4.10.1 Littlewood’s three principles ...... 57

4.10.2 Product σ-algebras ...... 57

4.10.3 The Borel σ-algebra on the Wiener Space ...... 57

5 Integration: Lectures 9-12 59

5.1 Integration ...... 59

5.2 Outline of the construction ...... 60

5.3 Integral of simple functions ...... 60

5.4 Integration of non-negative functions ...... 64

5.5 Integrable functions ...... 69

5.6 Further limit theorems ...... 71

5.7 Integrals depending on a parameter ...... 73

5.8 Examples and exercises ...... 75

5.9 The Monotone Class Theorem for Functions * ...... 77

5.9.1 ...... 77

5.10 The pushed forward measure ...... 78

5.11 Appendix: Riemann integrals ...... 79

5.12 Appendix: Riemann-Stieltjes integrals* ...... 80

5.13 Appendix: Functions with * ...... 82

6 Lp spaces: Lectures 13,14 85

6.1 Holder’s inequality ...... 85

6.2 Definition and basic properties of the Lp spaces ...... 86

6.3 Density results ...... 89 CONTENTS 6

6.4 The case of finite measures ...... 91

6.4.1 Mode of convergence ...... 93

7 Product measures: Lectures 15-17 95

7.1 Product measures ...... 96

7.1.1 Sections of subsets of product spaces ...... 100

7.2 Fubini’s Theorem ...... 101

7.3 Product measures and independence ...... 103

8 Radon-Nikodym Theorem: Lectures 18-20 105

8.1 Singular and absolutely continuity ...... 105

8.2 Signed measure ...... 107

8.3 Radon-Nikodym Theorem ...... 110

9 Conditional Expectations: Lectures 21-23 115

9.1 Conditional expectation and probabilities ...... 115

9.1.1 Conditioning on a random variable ...... 117

9.2 Properties ...... 118

9.3 Conditional expectation of a non-negative random variable ...... 119

9.4 Some further properties ...... 120

9.5 Convergence Theorems ...... 122

9.6 Conditional expectation of square-integrable random variables ...... 123

9.6.1 Conditional Jensen inequality ...... 123

9.6.2 Conditional expectation as orthogonal projection ...... 124

9.7 Useful Inequalities and exercises* ...... 125

9.7.1 Exercises ...... 126

9.8 Conditional probability ...... 127

9.8.1 Finite σ-algebras ...... 127 CONTENTS 7

10 Mastery Material: Ergodic Theory 129

10.1 Basic Concepts ...... 129

10.1.1 Example: circle rotation ...... 130

11 Exercises from the problem sheets 131

11.1 Problem Sheet 1: Measurable sets ...... 131

11.2 Problem Sheet 2: Measures ...... 133

11.3 Problem Sheet 3: measurable functions, integrals ...... 136

11.4 Problem Sheet 4: limit theorems, computation of integrals ...... 139

11.5 Problem Sheet 5: Fubini’s Theorems ...... 142

11.6 Problem Sheet 6: the Radon-Nikodym Theorems ...... 144

11.7 Problem Sheet 7: Conditional expectations ...... 146

Chapter 1

Introduction

1.1 Module Information

Prerequisite Real Analysis (M2PM1), Metric Spaces and Topology (M2PM5), and basic Probability. Measure and Integration is a foundational course, underlies analysis modules. It brings together many concepts previously taught separately, for example integration and taking expectation, reconciling dis- crete random variables with continuous random variables.

Contents. Measurable sets, sigma-algebras, Measurable functions, Measures. Integration with re- spect to measures, Lp spaces, Modes of convergences, basic important convergence theorems (Domi- nated convergence theorem, Fatou’s lemma etc), useful inequalities, properties of integrals.

This module is related to: Probability theory, Applied probability, Markov processes, Fourier analysis and theory of distributions, Functional analysis, ODE’s, Ergodic theory, Probability theory, Stochastic Filtering, Applied Stochastic Processes.

1.2 Course Structure

Assessment

• CW 1. (5%) Thursday 5 November, Deadline: Thursday 19 November

• CW 2. (5%) Thursday 3 December, Deadline: Thursday 17 December

• May Exam (90%): 2 hour exam (year 3), 2.5-hour exam (year 4/Msc).

Problem sheets: not assessed, they are the best material for preparing for the exam.

9 1.3. PROLOGUE 10

Questions and Answers sessions: Wednesdays 11-12 am

Problem classes: Tuesdays Weeks 2-4-6-8-10

References:

• Real Analysis by H. L. Royden, Third edition

• Real Analysis, Modern Techniques and their applications, by G. B. Folland

• Measure Theory by P. R. Halmos

• Measures, Integrals and Martingales, by Rene´ L. Schilling, Cambridge UP, 2005. 176-89. Web.

How to use these notes? A ‘remark’ is material which is helpful with our understanding, which may or may not be covered in the lectures. An ‘exercise’ may appear in the weekly problem sheet, and is in the notes for one or more of the following reasons: (1) the conclusion is interesting, (2) the proof is representative, (3) proving it improves our understanding.

If a section /paragraph is given a ∗ sign, it is not covered in class (or not ‘officially’ covered in class). They are in the notes for the background or considered to be useful to further our understanding.

1.3 Prologue

Measure theory is not only a fundamental tool to measure/compute, it also provides the basis for several fundamental concepts in mathematics.

1.4 The mass transport problem

Monge was concerned with the cost of moving material from a mine to a construction site, Monge’s problem (1781) is: What is the optimal way to transport a pile of sand? 1.5. COIN TOSSING 11

Let us think of µ as the original sand mass distribution, a measure µ. We also think of the target mass as a measure ν. A map T is a transport map if it pushes µ to ν. The optimal transport problem from µ to ν is then formulated as follows: find a map x 7→ T (x) such that the cost (distance) Z |x − T (x)|µ(dx) R is minimised among all transport maps. We may also use other cost functions, e.g. kinetic energy or take into consideration the geometry of the space.

Example 1.1 Shift of a block of mass of height one of uniform density on [0, n] to [1, n + 1]. There are more than one transport maps: T1(x) = x + 1 for any x ∈ [0, n]; and ( n + 1 − x, x ∈ [0, 1) T (x) = , 2 x, x ∈ [1, n] are two different examples.

Example 1.2 Can one find a map shifting 1 unit of mass at location 0 to location 1 and −1 of equal mass?

1 1 In this case, the measures involved are µ = δ0, ν = 2 δ1 + 2 δ−1 and no transport map exists. Kantorovich’s formulation (1940’s). Find a measure on Ω × Ω that minimises Z |x − y|µ˜(dx, dy) and has marginals µ, ν respectively, i.e. µ˜(Ω × B) = ν(B), µ˜(A × Ω) = µ(A). This is called a transport plan.

Find a transport plan for Example 2. (Hint: this is a measure on {(0, 0), (1, 0), (0, 1), (1, 1)}.

Optimize the transport cost involved in distribution of pastries from bakeries to coffee shops. E.g. Bakery 1 delivers 50 pastries to Cafe 1, 30 each to Cafe shops 2 and 3, Bakery 2 delivers 20 to Cafe 2, 30 to cafe 3.

1.5 Coin tossing

We will need to understand what is meant by taking limits of functions and measures.

Example 1.3 Toss a fair coin 1000 times. Denote by Xi the result of the ith toss: Xi = 1 if we get head and Xi = −1 if we get tail. Mathematically, Xi can be represented by a random variable. Assuming that 1.6. EXTENDING RIEMANN’S THEORY OF INTEGRATION 12

the coin is fair and that all tosses are independent, then the random variables Xi, i ≥ 1 are independent 1 1 and identically distributed (i.i.d.), with probability distribution 2 δ−1 + 2 δ1, i.e. 1 P({X = 1}) = P({X = −1}) = . i i 2

The Law of Large Numbers (LLN) is a theorem ensuring that the empirical mean of a large number of tosses converges to the expectation of a single toss n 1 X Xi −→ E(X1) = 0 N N→∞ k=1 almost-surely. Thus, if you toss a coin 1000 times, you expect the empirical mean to be very close to 0. This is not suprising in practice. However the formulation and proof of the LLN (not covered in this module) is challenging, and requires sophisticated tools of measure theory and probability.

The Central Limit Theorem (CLT) describes how the empirical mean fluctuates. Namely, we have the convergence in law N 1 X √ Xi −→ N(0, 1) N→∞ N i=1 where N(0, 1) denotes a standard Gaussian distribution distributions. Thus, the deviation of the empirical mean away from 0 will be distributed , for large N, like a Gaussian curve.

The (LLN) and (CLT) are two fundamental results of probability theory, with far-reaching conse- quences. These results will not be proved in this module, but the key notions behind will be introduced.

1.6 Extending Riemann’s theory of integration

1.6.1 Inadequacy of Riemann Integrals

R b Let f :[a, b] → R be a bounded function. Riemann’s theory of integration allows to define a f(x) dx provided that f is Riemann-integrable. Recall the following definition:

Definition 1.6.1 For any partition P = {a = t0 < t1 < . . . < tn} of [a, b], we set n n X X L(f, P ) = (ti − ti−1) inf f(t),U(f, P ) = (ti − ti−1) sup f(t), t∈[t ,t ] i=1 i−1 i i=1 t∈[ti−1,ti] and define Z Z f = sup L(f, P ), f = inf U(f, P ), P P where the infimum and supremum are taken over all partitions of [0, 1]. Then f is Riemann-integrable if R R R b f and f coincide. In this case we define the Riemann integral a f(x) dx to be this common value. 1.6. EXTENDING RIEMANN’S THEORY OF INTEGRATION 13

Heuristically, f being Riemann-integrable means that one can approximate the area under the graph of f by rectangles [ti−1, ti] × [0, f(si)], where P = {a = t0 < t1 < . . . < tn} is a partition of [a, b], and si is any point in [ti−1, ti]. Intervals are thus the “building blocks” of Riemann’s theory. Unfortunately, in many interesting cases, f is not Riemann-integrable.

Example 1.4 An example of non-Riemann-integrable function is the Dirichlet function on [0, 1], denoted by 1Q: 1Q(x) = 1 for x ∈ Q and 1Q(x) = 0, for x ∈ [0, 1] \ Q. Indeed one has U(1Q,P ) = 1 and L(1Q,P ) = 0 along any partition P of [0, 1], so that the lower and upper integrals are distinct: Z Z 1Q := sup L(1Q,P ) = 0, 1Q := inf U(1Q,P ) = 1. P P

Same with the function 1[0,1]\Q defined on [0, 1] by 1[0,1]\Q(0) = 1 for x ∈ Q and 1[0,1]\Q(x) = 1 for x ∈ [0, 1] \ Q.

R 1 There are more irrational points than rational points, perhaps 0 f1(x)dx can be defined to be 1? Can we quantify how big are the sets of rational and irrational number, to see that the latter is larger than the former?

One inconvenient feature of the Riemann integral is its lack of stability under taking limits. Indeed, one can construct a sequence of bounded Riemann integrable functions fn such that fn(x) → f(x) for every x ∈ [0, 1], but f is not Riemann integrable, as the following exmple shows:

Example 1.5 Let Q = {q1, q2,... } be an enumeration of rational numbers. Define fn(x) = 1 if R x ∈ {q1, q2, . . . , qn} and set fn(x) = 0 otherwsie. Then fn is Riemann interagble and fn(x)dx = 1, but f is the Dirichlet function which is not Riemann integrable.

1.6.2 Idea of Lebesgue’s integral: allow for more sets as “building blocks”

Although the Dirichlet is not Riemann integrable, seen from another perspective, it is a simple function, simply the indicator of the set Q. The idea behind Lebesgue’s integral is to extend the family of “building blocks” beyond intervals, allowing for more general sets such as Q.

The question then arises: what kind of subsets of the plane or the can be measured? Clas- sically, the basic sets are: intervals on R; triangles, squares, polygons, squares on R2. For instance, relating the area of a circle to its length is an old mathematical problem, accomplished by ‘exhaustions’ (by the ancient Greeks 5th BC, the Achimedes of Syracus : 2-3 BC), which corresponds to taking limit in the modern language.

To extend our theory, we therefore begin with simple sets we can measure, and gradually build up more measurable sets by taking complements, unions, and approximations. This translates into the definition of σ-algebras. The new building blocks of our theory will now be measurable sets, i.e. elements 1.7. OBSERVABLES OF RANDOM VARIABLES 14 of the Lebesgue σ-algebra. The way in which we assign a measure/weight to these sets will correspond to the notion of Lebesgue measure.

1.7 Observables of Random Variables

Besides allowing to extend Riemann’s theory of integration, measure theory allows to define integrals in a much more general setting: on any given measure space (to be defined below), one can build up the associated measure theory. One important example corresponds to probability spaces.

Let (Ω, F, P) be a probability space, let X be a real-valued random variable, and let f :Ω → R be a P Borel measurable function. If P({X = i}) = pi for all i ≥ 1, with i pi = 1, the expectation of X is given by X Ef(X) = pif(i) i On the other hand, if X is a random variable with standard Gaussian normal distribution, then

Z 1 2 EX = √ e−x /2f(x)dx. 2π

Althought they look different, both expectations are integrals R f(y) dµ(y) where µ is probability distri- R bution of X. Both are also given by Ω X(ω)dP(ω). These concepts use integration theory with respect to a measure. Chapter 2

Measurable sets

Given a set X , the collection of its subsets is its power set. We denote it by 2X . What kind of subsets can be measured? What axioms should they satisfy? If a subset A can be measured, this ought to lead to some understanding of its complement Ac = X \ A. Moreover, if A, B can be measured, it is natural to postulate that A ∪ B can be measured. Actually, we further require limits of measurable sets to be S measurable, so that if (An)n≥1 is a sequence of measurable functions, n≥1 An should be measurable as well.

2.1 Set Algebra

Denote by Ac the complement of a subset A.

de Morgan’s identities: c c (∩A∈ΛAα) = (∪A ∈ Λα) c c (∪A∈ΛAα) = (∩A ∈ Λα) This holds for any number of sets, not necessarily countable number of.

Distribution Law

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

Distribution law holds for also for infinite number of sets. For example [ [ [ ( Ai) ∩ ( Bj) = (Ai ∩ Bj). i i i,j

Limits If An is an increasing sequence of subsets, we set

∞ lim An = ∪ An. n→∞ n=1

15 2.2. MEASURABLE SPACE 16

Similarly if An is a decreasing sequence of subsets, we define

∞ lim An = ∩ An. n→∞ n=1

Let f : X → Y be a map, define

f −1(A) = {x ∈ X : f(x) ∈ A}.

Taking pre-image respects set operations:

Lemma 2.1.1 −1 [  [ −1 f Bi = f (Bi),

−1 \  \ −1 f Bi = f (Bi),

f −1(A \ B) = f −1(A) \ f −1(B).

2.2 Measurable Space

Let X be a non-empty set. We study collections of subsets of X . Let φ denote the empty set.

Definition 2.2.1 A collection of subsets of X, denoted by A, is a (Boolean) algebra if the following property holds.

• (Non-emptiness ) φ ∈ A.

• (Closedness under complements) If B ∈ A then Bc ∈ A;

• (Closedness under finite unions) If B ∈ A, C ∈ A, then B ∪ C ∈ A.

The assumption φ ∈ A can be equally replaced by the condition that A is not empty.

We would like to be able to take limits.

Definition 2.2.2 A collection F of subsets of X is a σ-algebra if:

• non-emptiness: φ ∈ F.

• closedness under complements: if B ∈ F then Bc ∈ F.

• closedness under countable unions: if we have a sequence (Bn)n≥1 with Bn ∈ F for all n ≥ 1, S∞ then n=1 Bn ∈ F. 2.2. MEASURABLE SPACE 17

Elements of F are called measurable sets and (X , F) is said to be a measurable space.

Remark 2.2.1 Let F be a σ-algebra over X . Then it follows from the definition that:

•X∈F

n • If A1,...,An is a finite collection of elements of F, then ∪i=1Ai ∈ F

∞ • by de Morgan’s identities, if Bn ∈ F for all n ≥ 1, then ∩n=1Bn ∈ F.

• If A, B ∈ F then so does A \ B.

Examples:

1. The smallest σ-algebra over X is {φ, X}, the coarse σ-algebra

2. The largest σ-algebra over X is 2X , the discrete σ-algebra

3. If A ⊂ X , then F = {φ, X , A, Ac} is a σ-algebra.

4. The collection F of subsets A of X such that either A or Ac is countable, is a σ-algebra.

Exercise 2.2.1 Assume that X is infinite, and let F := {A ∈ 2X : A is finite or Ac is finite}. Show that F is an algebra but not a σ-algebra.

Proposition 2.2.2 The intersection of any number of σ-algebras is a σ-algebra. If C is any collection of subsets of a set X , then there always exists a smallest σ-algebra containing C.

Proof The first point is straightforward to prove and left as an exercise. For the second point, it suffices to consider \ A = B. B σ−algebra C⊂B In virtue of the first point, A is a σ-algebra. Moreover, A contains C, and by contsruction A ⊂ B for any other σ-algebra B containing C, as claimed. 

The intersection of σ-algebras is often denoted by ∧. For example, F1 ∧ F2 denotes the intersection of F1 and F2.

Definition 2.2.3 If C is a collection of subsets of X, we denote by σ(C) the smallest σ-algebra contain- ing C. We call σ(C) the σ-algebra generated by C. 2.2. MEASURABLE SPACE 18

Example 2.1 Let X with a finite partition A1,...,An, by which we mean that A1,...,An are disjoint n and X = ∪k=1Ak. Let C = {A1,...,An}. Then σ(C) consists of all possible unions of the sets Ai: ( ) [ σ(C) = Ai,I ⊂ {1, . . . n} . i∈I

There are 2n such choices: we can include or not include a particular set and every element of F will come from such a union. In this particular case, the σ-algebra consists of a finite number of elements.

Exercise 2.2.2 Assume that X is infinite, and let F be as in Exercise 2.2.1. Find the σ-algebra generated by F.

2.2.1 Countability of σ-algebras

We will see below that a σ-algebra with an infinite number of elements is uncountable. So a σ-algebra is either generated by a partition as in Example 2.1 above, or has an uncountable number of elements.

Proposition 2.2.3 A σ-algebra F is either finite or uncountable.

Proof will be released later in November.

2.2.2 Fundamental Examples and Exercises

Proposition 2.2.4 If f :Ω → X is a map, and (X , B) a measurable space. Then f pulls back the σ-algebra G to a σ-algebra on Ω. This is the collection of pre-images

σ(f) := {f −1(A): A ∈ B} is a σ-algebra. This is called the σ-algebra generated by f , and also called the pre-image σ-algebra.

Proof. Exercise.

Example 2.2 Let R be endowed with the Borel σ-algebra B(R). If f : X → R is of the form

n X f(x) = aj1Aj (x), x ∈ X j=1

n where aj ∈ R are distinct and the Aj form a partition of X (i.e. ∪j=1Aj = X and the Aj are pairwise disjoint), then σ(f) is generated by the finite collection of sets {A1,...,An}. 2.2. MEASURABLE SPACE 19

Example 2.3 To see the role played by the assumption that Aj are disjoint, we give an example where Aj are not disjoint. Set for example

f(x) = 1[1,2] + 31[1,3] + 41(−2,1].

Then  4, x ∈ (−2, 1) ∪ (1, 2]  3, x ∈ (2, 3] f(x) = 8, x = 1   0, otherwise . The σ-algebra generated by f is given by the partition:

{{1}, (−2, 1) ∪ (1, 2], (2, 3], (−∞, −2] ∪ (3, ∞)}.

If f(x) = 1[1,2] + 31[1,3], then σ(f) is generated by the collection of subsets:

{[1, 2], (2, 3], (−∞, 1) ∪ (3, ∞), φ, R}.

Example 2.4 If F is a σ-algebra and A ⊂ X, then {A ∩ B : B ∈ F} defines a σ-algebra on A, called the trace of the σ-algebra F.

Exercise 2.2.3 Given two σ-algebras F1 and F2, we denote by F1∨F2 the smallest σ-algebra containing both F1 and F2. Show that F1 ∨ F2 can equivalently be characterised by the expressions:

•F 1 ∨ F2 = σ{A ∪ B | A ∈ F1 and B ∈ F2},

•F 1 ∨ F2 = σ{A ∩ B | A ∈ F1 and B ∈ F2}.

Definition 2.2.4 A collection of subsets E is an elementary family if

• φ ∈ E;

• If A, B ∈ E, then A ∩ B ∈ E;

• if A ∈ E then Ac is the finite union of disjoint sets from E.

Exercise 2.2.4 Show that if E is an elementary family, then the collection A of finite disjoint unions of members of E is an algebra.

c n c n Proof If A, B ∈ E, then B = ∪l=1Bl where Bl ∈ E are disjoint. So A\B = A∩B = ∪l=1(A∩Bl) ∈ n A. Also, A ∪ B = (A \ B) ∪ B ∈ A. By induction if Ai ∈ E then ∪i=1Ai ∈ A. This means that the finite union of sets from the elementary family is in A and A is closed under taking unions. 2.3. BOREL σ-ALGEBRAS 20

n We show A is closed under complement. Let A = ∪j=1Aj ∈ A where Aj ∈ E are disjoint. Then c Pnj m m c n c c Aj = m=1 Aj where Aj ∈ E. Since A = ∩j=1(Aj), by associativity, so A is a finite union of intersections of sets from E, therefore in A. 

We call the following sets of the following form half open intervals: (a, b], φ or (a, ∞), (−∞, a) and φ, where a, b are real numbers.

Example 2.5 The collection of unions of a finite number of disjoint half open intervals is an algebra.

This follows from taking E to be the collection of half open intervals and use the earlier exercise.

2.3 Borel σ-algebras

2.3.1 Definition

Definition 2.3.1 If X is a complete separable metric space , the smallest σ-algebra generated by the open sets is called Borel σ-algebra, it is denoted by B(X ).

If X is a separable metric space, the metric topology satisfies the second axiom of countability, i.e. there exists a countable base. This countable base generates B(X).

2.3.2 The case of R

Example 2.6 We know the following about B(R), the Borel σ-algebra over R.

∞ 1 1 1. If a ∈ R, {a} = ∩n=1(a − n , a + n ) ∈ B(R). 2. Any countable subset of R is a .

3. Any intervals of any form is a Borel set.

4. B(R) is generated by the set of open intervals. This follows from the proposition below.

Proposition 2.3.1 Every of R is a countable union of disjoint open sets. This decomposition into disjoint open sets is unique up to ordering.

Proof Denote by A the collection of open intervals. Let O be an open set. Let x ∈ O. Then x is contained in an open , the interval is a subset of O. The set of the left end points of such interval has an infimum. Let us call it ax. Let bx denote the supremum of the right end of the intervals. Then Ix = (ax, bx) is the maximal interval such that x ∈ Ix ⊂ O Given x, y ∈ O either they are disjoint or 2.4. PRODUCT σ-ALGEBRAS 21

they overlap in which case they are the same. Pick up a rational number qx from Ix. Then the rational numbers are all distinct. So {Ix : x ∈ O} contains only a countable number of distinct open intervals. 

2.3.3 Borel σ-algbera on a subset of a metric space

Let (X , d) be a metric space, and Y ⊂ X be a subset of X . Then Y (endowed with the restriction of the metric d to Y) is still a metric space, its open sets are the subsets of the form U ∩ Y, where U is an open subset of X . One may consider the corresponding Borel σ-algebra B(Y). It turns out that the following holds

Proposition 2.3.2 B(Y) coincides with the trace on Y of B(X ), that is

B(Y) = {A ∩ Y,A ∈ B(X )},

(see Example 2.4 above).

Proof By Example 2.4, we know that A := {A ∩ Y,A ∈ B(X )} is a σ-algebra on Y. Any open subset of Y is of the form U ∩ Y, where U is an open (hence Borel) subset of X , hence B(Y) ⊂ A. Conversely, the collection {A ∈ B(X ): A ∩ Y} is shown at once to be a σ-algebra on X , and it contains open subsets of X . Therefore it contains, hence coincides with, B(X ). So A ⊂ B(Y), and the claim follows. 

Remark 2.3.3 In the special case where Y is itself a Borel measurable subset of X , then for any A ∈ B(X ), we also have A ∩ Y ∈ B(X ), hence it follows from the above proposition that

B(Y) = {B ∈ B(X ): B ⊂ Y}.

In particular, in this case, B(Y) ⊂ B(X ).

2.4 Product σ-algebras

Let (X , F) and (Y, G) be two measurable spaces. In general, subsets of X × Y which are product sets, i.e. of the form A × B with A ∈ F and B ∈ G, do not constitute a σ-algebra. Rather, we introduce the following definition.

Definition 2.4.1 We define the product σ-algebra (also called tensor σ-algebra) on the product space X × Y to be that generated by product sets

F ⊗ G = σ ({A × B : A ∈ F,B ∈ G}) . 2.4. PRODUCT σ-ALGEBRAS 22

2.4.1 Product of a countable family of σ-algebras *

The above definition of product extends to any number of measurable spaces. We give the definition for the product of a countable number of measurable spaces.

Definition 2.4.2 Let (En, Fn) be measurable spaces. The tensor or product σ-algebra on the product ∞ space Πn=1En is   σ Πn≥1An, : An ∈ Fn .

Remark 2.4.1 By stability of a σ-algebra under countable intersections, the above σ-algebra is also n generated by measurable cylinders, i.e. subsets of the form Πk=1Ak ×Πk>nEk for some n ≥ 1 and with A1 ∈ F1,...,An ∈ Fn.

Let (En, dn), n ≥ 1, be metric spaces. The product topology on E = Πn≥1En is the coarsest topology such that the projections pn : E → En are continuous, it is generated by sets of the form −1 pn (Un) where n ≥ 1 and Un is an open set of En. The product topology is induced by the distance d on E defined by ∞ X dn(xn, yn) ∧ 1 d(x, y) = , 2n n=1 for any two elements x = (xn)n≥1 and y = (yn)n≥1 of E. We may consider the Borel σ-algebra B(E) of the product metric space E, and investigate how it relates with the product σ-algebra ⊗n≥1B(En).

Lemma 2.4.2 ⊗n≥1B(En) ⊂ B(E).

Proof By Remark 2.4.1 above, it suffices to prove that, for all n ≥ 1 and for all A1 ∈ B(E1),...,An ∈ B(En), we have A := A1 × ... × An × Πk>nEk ∈ B(E). Note that the claim is true if A1,...,An are all open subsets, as then A is an open, hence Borel subset of E. Thus, for all A2,...An open subsets of E2,...,En, respectively, the collection

A1 := {A1 ∈ B(E1): A1 × ... × An × Πk>nEk ∈ B(E)} contains all open subsets of E1. It is also a σ-algebra. Hence A1 = B(E1), proving that

A1 × ... × An × Πk>nEk ∈ B(E) whenever A1 ∈ B(E1), and A2,...An are open subsets of E2,...,En, respectively. Porceeding by in- duction over i = 1, . . . , n, we similarly show that the claim still holds if A1,...,Ai are Borel measurable and Ai+1,...,An are open. Taking i = n yields the claim. 

Remark 2.4.3 An alternative reasoning is as follows. The coordinate mappings pn : E → En being continuous, by Proposition 4.3.3 (see below), they are therefore measurable with respect to the Borel σ-algebra on the product space. Hence, for all n ≥ 1, and all A1 ∈ B(E1),...,An ∈ B(En), n −1 A1 × ... × An × Πk>nEk = ∩i=1Pi (Ai) ∈ B(E) 2.5. A WARNING 23 which yields the claim.

We now make the additional assumption that the En are all separable.

Remark 2.4.4 If X is a separable metric space, the metric topology satisfies the second axiom of count- ability, i.e. there exists a countable base. It suffices e.g. to consider the (countable) collection of open balls B(x, r) = {y ∈ X : d(x, y) < r}, for r > 0 rational, and for x ∈ D, where D is a countable dense subset of X. This countable base generates B(X).

∞ Theorem 2.4.5 Let (En) be separable metric spaces and set E = Πn=1En. Then

∞ B(E) = ⊗n=1B(En).

∞ Proof In view of the above lemma, there only remains to prove that B(E) ⊂ ⊗n=1B(En). For all n ≥ 1, let Dn be a countable base for the topology of En. Then subsets of E of the form U1×...×Un×Πk>nEk, for n ≥ 1, and U1 ∈ D1,...,Un ∈ Dn form a countable base for the product topology on E. That is, any open subset W of E can be written in the form W = ∪j≥1Wj, with Wj = U1 ×...×Un(j) ×Πk>n(j)Ek, where, for all k = 1, . . . , n(j), Uk is an element of Dk (in particular it is open, hence Borel). In particular, ∞ ∞ for all j ≥ 1, Wj ∈ ⊗n=1B(En). Therefore W ∈ ⊗n=1B(En), and this being true for any open subset W of E, the claim follows. 

2.5 A warning

We end this chapter with a warning:

Remark 2.5.1 if C is a sollection of subsets of a set X and F = σ(C), this does not mean in general ∞ that every measurable set is of the form ∪i=1Ci with Ci ∈ C. Taking for example C to be the collection ∞ {(−∞, a], where a ∈ R}, then σ(C) = B(R), however if Ci = (−∞, ai] are elements of C, then ∪i=1Ci is an interval of the form (−∞, a) or (−∞, a] with a = sup{ai, i ≥ 0}, and the Borel set {−1, 1}, for instance, is not of this form. More generally, as soon as C is an infinite collection of sets, one cannot hope to represent a generic element of σ(C) as some countable combination of elements of C.

2.6 Background material*

A (X , T ) consists of a set X equipped with a topology T , i.e. a subset T ⊂ 2X such that: 2.6. BACKGROUND MATERIAL* 24

• φ ∈ T and X ∈ T .

TN • If {A0,A1,...,AN } ⊂ T , then n=0 An ∈ T . S • If A ⊂ T , then A∈A A ∈ T .

In other words, T is closed under arbitrary unions and finite intersections. Elements of T are called open sets. A function between topological spaces is continuous if the pre-images of open sets are open sets.

Given a topological space (X , T ), we define B(X ) to be the smallest σ-algebra on X containing T . This particular σ-algebra is called the Borel σ-algebra of X . In other words, the Borel σ-algebra is the smallest σ-algebra such that all open sets are measurable. We denote by Bb(X ) the (Banach) space of all Borel-measurable and bounded functions from X to R equipped with the norm

kϕk∞ = sup |ϕ(x)| . (2.1) x∈X

We denote by Cb(X ) the (Banach) space of all continuous and bounded functions from X to R equipped with the same norm as in (2.1).

The open sets of a metric space gives a topology on the metric space. Borel σ-algebras on a complete separable metric space is specially nice.

When discussing Borel σ-algebras, we will always assume that X is a complete separable metric space.

A metric space is compact if any covering of it by open sets has a sub-covering of finite open sets. A discrete metric space (whose subsets are all open sets ) is compact if and only if it is finite. (e.g. Z and N with the usual distance d(x, y) = |x − y| is not compact.) A subset of a metric space is compact if it is compact as a metric space with the induced metric. It is relatively compact if its closure is compact. A metric space is sequentially compact if every sequence of its elements has a convergent sub-sequence (with limit in the metric space of course). It is totally bounded if for any  > 0 it has a finite covering by open balls of side . A metric space is complete if every Cauchy sequence converges.

It is a theorem that a metric space is compact if and only if it is complete and totally bounded. A metric space is compact if and only if it is sequentially compact.

A subset of a metric space is relatively compact if it is sequentially compact (the limit may not need to belong to the subset). Chapter 3

Measures

3.1 Basics of Measures

A measure assigns a number to every measurable set.

Definition 3.1.1 A measure µ on the measurable space (X , F) is a map µ: F → [0, ∞] with the fol- lowing properties;

• µ(φ) = 0.

• (σ-additive Propertices) If {An}n>0 is a countable collection of elements of F that are all disjoint, then one has [ X µ( An) = µ(An). n>0 n>0

The triple (X , F, µ) is called a measure space.

Definition 3.1.2 We say that:

• µ is a finite measure if µ(X ) < ∞.

• µ is a probability measure if µ(X ) = 1.

∞ • µ is σ-finite if there exists A1 ⊂ A2 ⊂ ... such that µ(Ai) < ∞ and X = ∪i=1Aj. This is equivalent to the statement that there exists an increasing sequence of measurable sets Xn with ∞ µ(Xn) < ∞ such that X = ∪n=1Xn. The sequence is called an exhausting sequence.

Convention. From now on we study only σ-finite measures.

25 3.1. BASICS OF MEASURES (2ND LECTURE) 26

X Example 3.1 If X = {1, 2, . . . , n}, let F = 2 . Then (X , F) is a σ-algebra. If we set µ(i) = ai, this defines a measure.

Proposition 3.1.1 Let (Ω, F) be a measurable space. Then

1. If A, B ∈ F and A ⊂ B then µ(B \ A) = µ(B) − µ(A).

2. Monotonicity. If A ⊂ B then µ(A) ≤ µ(B).

3. (Finite) additivity. If A1,A2 ∈ F and A1 ∩ A2 = φ then µ(A1 ∪ A2) = µ(A1) + µ(A2).

∞ 4. Continuity from below. If An ∈ F, An increases, then µ(∪n=1An) = limn→∞ µ(An).

5. Continuity from above. If An ∈ F, An decreases and µ(A1) < ∞, then

∞ µ(∩ An) = lim µ(An). n=1 n→∞

Proof Claim 1 follows from that B = A ∪ (B \ A) is a disjoint union whenever A ⊂ B. Hence µ(B) = µ(A) + µ(B \ A).

Claim 2 follows by claim 1, claim 3 follows from taking Ai = φ for i ≥ 3.

∞ We prove claim 4. If A1 ⊂ A2 ⊂ A3 ⊂ ..., set A0 = φ, A = ∪i=1Ai Also set

B1 = A1,B2 = A2 \ A1,B3 = A3 \ A2,...,

∞ ∞ and so on. Then {Bi} are pairwise disjoint and ∪i=1Bi = ∪i=1Ai = A. Then, ∞ n n X X X µ(A) = µ(Bi) = lim µ(Bi) = lim [µ(Ai) − µ(Ai−1)] = lim µ(An). n→∞ n→∞ n→∞ i=1 i=1 i=1

Prove claim 5. We assume that µ(A1) < ∞ and proceed as for claim 3. 

Example 3.2 1. Let X be a set endowed with the discrete σ-algebra 2X . Set

µ(A) = |A|,A ⊂ X ,

where |A| denotes the cardinality of A. Then µ is a measure, called the counting measure on (X , 2X ).

2. Let (X , G) be a measurable space and x ∈ X . The δ measure at x, defined by ( 1, if x ∈ A, δ (A) = x 0, if x 6∈ A,

is a probability measure. 3.1. BASICS OF MEASURES (2ND LECTURE) 27

3. A discrete measure P on a countable space Ω = {ω1, ω2,... }, is uniquely determined by the sequence of numbers (pi)i≥1, where

pi = P ({ωi}), i ≥ 1.

Then, if A ⊂ Ω, we have ∞ ∞ X X P (A) = piδωi (A) = pi1A(ωi). i=1 i=1 P∞ In particular P is a probability measure if and only if i=1 pi = 1.

The last example above contains many examples of measures on countable sets you already know:

• Let Y be a random variable on a two state space X = {0, 1} with Bernoulli distribution of parame- ter p: P(Y = 0) = p and P(Y = 1) = 1 − p. Then the probability distribution of Y is the measure on X given by: PY := p δ0 + (1 − p) δ1.

• Let X be a binomial random variable of parameter p on {0, . . . , n}. The probability distribution of X is the measure PX on {0, . . . , n} given by n X PX = k=0

• Let X be a Poisson random variable of parameter λ on {0, 1, 2,...}. The probability distribution of X is the measure PX on {0, 1, 2 ..., } given by ∞ X λk P (A) = e−λ δ (A). X k! k k=0

Not all measures can be written as linear combinations of Dirac measures as above: here are two examples of more complicated measures.

Example 3.3 The interval [0, 1] equipped with its Borel σ-algebra admits a unique probability measure λ such that λ([a, b]) = b−a, for all a, b ∈ [0, 1], a ≤ b. The existence and uniqueness of such a measure, called the Lebesgue measure on [0, 1], will be shown below.

Example 3.4 The measure Z P(A) = e−x dx, A to be defined below, is a probability measrue on the half-line R+ equipped with B(R+). In such a situation, where the measure has a density with respect to Lebesgue measure, we will also use the short- hand notation P(dx) = e−x dx. 3.2. UNIQUENESS OF MEASURES (3RD LECTURE) 28

Exercise 3.1.1 If µ : F → [0, ∞] satisfies µ(φ) = 0, is finitely additive, and continuous from below, then it is a measure.

Exercise 3.1.2 Let N be endowed with the discrete σ-algebra 2N. Let µ : 2N → [0, ∞] be defined by P −n µ(A) = n∈A 2 if A is finite, and µ(A) = +∞ otherwise.

1. Is µ additive?

2. Is µ a measure on (N, 2N)?

3.2 Uniqueness statements

We often know that a property holds for a sub-collection of a σ-algebra, and want to show it holds for every set in the σ-algebra. The difficulty with this is that σ-algebras are in generally complicated. If we have a finite partition of X , the σ-algebra contains a finite number of sets. Otherwise it has uncountably many elements. See §2.2.1.

There are a number of powerful techniques, one of them is the π − λ theorem.

3.2.1 π − λ theorem

Definition 3.2.1 A non-empty collection of sub-sets C is a π-system if A, B ∈ C implies that A∩B ∈ C.

Definition 3.2.2 A collection C of sub-sets is called a λ-system if

1. Ω ∈ C

2. If A, B ∈ C and A ⊂ B then B \ A ∈ C.

3. If (An)n≥1 is a non-decreasing sequence of elements of C, i.e. An ∈ C and An ⊂ An+1 for all n ≥ 1, then ∪n≥1An ∈ C.

Note that a σ-algebra is both a π-system and a λ system.

Proposition 3.2.1 (not given in class) 1. Show that the intersection of any number of λ-systems is a λ-systems. Therefore given any collections of subsets there exists a smallest λ-system containing it.

2. If a λ system A is also a π-system, then it is a σ-algebra. 3.3. MEASURE DETERMINING SETS 29

Proof Part (1) is trivial. To see a set which is simultaneously a π-system and a λ-system is a σ- algebra, remark that taking unions is obtained from taking intersections and complements. In turn, taking countable unions is obtained from taking finite unions and monotone limits, whence the claim. 

Theorem 3.2.2 If C is a π-system, then the smallest λ-system generated by C, λ(C), is a σ-algebra, and λ(C) = σ(C).

Proof Firstly

F1 : = {A : A ∩ C ⊂ λ(C)} = {A ∈ λ(C): A ∩ F ∈ λ(C), ∀F ∈ C} is a λ-system containing C, which implies F1 = λ(C). Indeed, let F ∈ C, then (1) Ω ∩ F = F ∈ C, and so Ω ∈ F1; (2) If A ⊂ B, with A, B ∈ F1,

(B \ A) ∩ F = B ∩ F \ (A ∩ F ) ∈ F1

S S (3).If An ∈ F1, An ∩ F ∈ λ(C), so ( An) ∩ F = n(An ∩ F ) ∈ λ(C). Similarly,

F2 : = {A : A ∩ λ(C) ⊂ λ(C)} = {A ∈ λ(C): A ∩ E ∈ λ(C), ∀E ∈ λ(C)} is a λ-system containing C, and therefore F2 = λ(C). Thus λ(C) is both a λ-system and a π-system, and is therefore a σ-algebra. Since it contains C, we obtain σ(C) ⊂ λ(C). But, since any σ-algebra is a λ-system, we also have λ(C) ⊂ σ(C), and the equality follows. 

We present now a very useful result, which follows at once from Theorem 3.2.2.

Corollary 3.2.3 [π − λ Theorem.] If a λ-system contains a π-system C, then it contains the σ-algebra generated by C.

3.3 Measure determining sets

Theorem 3.3.1 Suppose that (Ω, F) is a measurable space, and C is a π-system generating F. Let µ and ν be two measures which agree on C.

1. If µ(Ω) = ν(Ω) < ∞, then µ = ν

2. More generally, if there exists an increasing sequence of subsets Ωk ∈ C, such that Ω = ∪k≥1Ωk and µ(Ωk) = ν(Ωk) < ∞ for all k ≥ 1, then µ = ν. 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 30

Proof (1) First assume that µ(Ω) = ν(Ω) < ∞. Let

G = {A ∈ F : µ(A) = ν(A)}.

Then Ω ∈ G by assumption. Moreover, if An is a non-decreasing sequence of measurable sets in G,

∞ ∞ [ [ µ( An) = lim µ(An) = lim ν(An) = ν( An) n→∞ n→∞ n=1 n=1

Thus, G is closed under taking lower limit. Let A ⊂ B, A, B ∈ G, then by additive property,

µ(B \ A) = µ(B) − µ(A) = ν(B) − ν(A) = ν(B \ A).

This means B \ A ∈ G, and therefore G is a λ-system containing a π-system generating F. By the π − λ-Theorem, G ⊃ F and µ = ν on F, which proves the first point.

(2) For the second point, let Fk = {A ∩ Ωk : A ∈ F} denote the trace σ-algebras on Ωk, and denote by µk and νk the restrictions to Ωk of the measures µ and ν:

∀A ∈ F, µk(A) = µ(A ∩ Ωk), νk(A) = ν(A ∩ Ωk).

Applying the first point to µk and νk, we deduce that µk = νk. Therefore, by lower contiunity of measures, we obtain, for all A ∈ F,

µ(A) = lim µ(A ∩ Ωk) = lim ν(A ∩ Ωk) = ν(A), k→∞ k→∞ completing the proof. 

Example 3.5 Let Ω = {1, 2, 3, 4}. Let G = {{1, 2}, {1, 3}, {3, 4}}, which generates the discrete σ- algebra F ( the power set), but not a π-system. The measure µ and ν agree on G, not on F.

µ(1) = 1/6, µ(2) = 2/6, µ(3) = 1/6, µ(4) = 2/6,

ν(1) = 2/6, ν(2) = 1/6, ν(3) = 0, ν(4) = 3/6.

3.4 Construction of measures from pre-measures

If (X , F) is a measurable space, constructing a non-trivial measure µ on (X , F), is in general, a very complicated task. One of the main goals we are aiming at is the construction of the Lebesgue measure on (R, B(R)). This will require introducing further set-theoretical notions. We start with some definitons. 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 31

3.4.1 Negligible sets and complete σ-algebras

Definition 3.4.1 A measurable set A on a measure space (M, F, µ) is said to be a ‘’ if µ(A) = 0. A subset N of M is negligible if there exists a null set A ∈ F such that N ⊂ A.

Remark 3.4.1 A negligible set N is not necessarily measurable. Therefore we may not write that µ(N) = 0, simply because µ(N) is not defined.

Definition 3.4.2 Given a measure space (M, F, µ), F is said to be complete with respect to µ if it contains every negligible null set. The completion of F with respect to µ, denoted by F¯, is the σ- algebra given by F¯ = A ∨ N , where N is the collection of negligible sets.

3.4.2 A completion theorem*

Let Ai be two collections of subsets, %i : Ai → [0, ∞], i = 1, 2. In the sequel we say %2 is an extension of %1 if A2 contains A1 and %1(A) = %2(A) whenever A ∈ A1. Let (M, F, µ) be a measure space.

Proposition 3.4.2 The completion of F with respect to µ can be expressed as

F¯ = {A ∪ N,A ∈ F,N ∈ N }

= {A ⊂ M : ∃B1,B2 ∈ F, µ(B2 \ B1) = 0,B1 ⊂ A ⊂ B2}.

Moreover, we can extend uniquely µ into a measure µ¯ on F¯ by setting µ¯(A ∪ N) = µ(A) for A ∈ F and N ∈ N , and F¯ is complete w.r.t. µ¯. Furthermore, F¯ is the smallest σ-algebra containing F and to which µ can be extended in a way that F¯ is complete with respect to µ.

Proof The proof of the first two assumptions is similar to the proof of Theorem B, Section 13 (Chapter III) in Halmos’s book. We show the last claim: assume that G is a sigma-algebra containing F and on which µ can be extended into a measure ν in a way that G is complete with respect to ν. By definition of N , any N ∈ N satisfies N ⊂ A for some A ∈ F, with µ(A) = 0. Since F ⊂ G and ν is an extension of µ to G, we also have ν(A) = 0. Hence A is negligible with repect to ν. Since G is complete w.r.t. ν, we deduce that N ∈ G. Thus F ∪ N ⊂ G, so since G is a σ-algebra, we have F¯ = σ(F ∪ N ) ⊂ G. Moreover, for all A ∈ F and N ∈ N , we have ν(A) ≤ ν(A ∪ N) ≤ ν(A) + ν(N) = ν(A), so ν(A ∪ N) = µ(A) =µ ¯(A ∪ N). Thus ν is an extension of µ¯ to G, which yields the claim. 

Remark 3.4.3 Ususally the extension µ¯ of µ to F¯ will still be denoted by µ. 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 32

3.4.3 Outer Measure

Definition 3.4.3 A function µ∗ : 2X → [0, ∞] is said to be an outer measure over X if we have:

(1) µ∗(φ) = 0,

(2) Monotonicity. If A ⊂ B then µ∗(A) ≤ µ∗(B).

(3) countable sub-additivity. ∞ ∗ ∞ X ∗ µ (∪n=1An) ≤ µ (An). n=1

Exercise 3.4.1 • Let µ∗(φ) = 0 and µ∗(A) = ∞ if A ⊂ X is not empty. Show that µ∗ is an outer-measure.

• If an outer measure µ∗ satisfies that µ∗(A ∪ B) = µ∗(A) + µ∗(B) for any A, B ∈ F where F is a σ-algebra, what can you say about µ∗?

Definition 3.4.4 Let A be a collection of subsets containing φ, X . A map A → [0, ∞) is a pre-measure if

• %(φ) = 0, and

∞ • for any An ∈ A pairwise disjoint with ∪n=1An ∈ A,

∞ ∞ X %(∪n=1An) = %(An). n=1

Outer measures can be obtained from any suitable function on a given collection of subsets, by cov- ering a set with sets from this collection. A collection of set {Aα, α ∈ Λ} is said to be a cover of a set E if E ⊂ ∪α∈ΛAα. We are interested in covers consisting of a countable number of elements.

Proposition 3.4.4 Let A be a non-empty collection of subsets with φ ∈ A, X ∈ A. Let % : A → [0, ∞] be a function such that %(φ) = 0. We define a function on 2X as below. For any E ⊂ X ,

 ∞  ∗ X ∞  µ (E) = inf %(Aj): E ⊂ ∪j=1Aj,Aj ∈ A . (3.1) j=1 

Then µ∗ is an outer measure.

Proof (1) Since every set is covered by {X }, the definition makes sense, and µ∗(φ) = 0. (2) If A ⊂ B, then every cover of B is a cover A and so µ∗(B) ≥ µ∗(A). 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 33

(3) To show countable sub-additivity, take any sequence of subsets Aj. For any  > 0 there exist k k Bj ∈ A with Aj ⊂ ∪kBj such that

∞ X  µ∗(A ) ≥ %(Bk) − . j j 2j k=1 Summing over j, ∞ ∞ ∞ X ∗ X X k µ (Aj) ≥ %(Bj ) − . j=1 j=1 k=1 k ∞ Since {Bj } is a cover of ∪j=1Aj,

∞ ∞ ∗ ∞ X X k µ (∪j=1Aj) ≤ %(Bj ). j=1 k=1 Hence, X ∗ ∗ ∞ µ (Aj) ≥ µ (∪j=1Aj) −  j ∗ ∞ P ∗ for any  > 0. Take  → 0 to see µ (∪j=1Aj) ≤ j µ (Aj), thus completing the proof. 

How do we obtain additive from sub-additive? Let us single out sets that are strongly additive. Take a set A, we then have two collections of subsets: subsets of A on the one hand, and subsets of Ac on the other hand. If we take one from each collection and take their union, we want to prove that µ∗ is additive along this decomposition.

Definition 3.4.5 A subset A of X is said to be µ∗-measurable (in the sense of Caratheodory) if for any set B ⊂ X , we have µ∗(B) = µ∗(B ∩ A) + µ∗(B ∩ Ac). (3.2)

We have already sub-additivity, the non-trivial part of (3.2) is

µ∗(B) ≥ µ∗(B ∩ A) + µ∗(B ∩ Ac). (3.3)

Theorem 3.4.5 [Caratheodory Theorem] If µ∗ is an outer measure on X and G the collection of µ∗- measurable sets, then

(1) G is a σ-algebra.

(2) The restriction of µ∗ to G is a measure.

(3) G is complete w.r.t. µ∗.

Proof The proof is deferred to §3.4.5.  3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 34

3.4.4 Extension of pre-measures

Proposition 3.4.6 If A is an algebra, and % is a pre-measure on A, we define µ∗ by (3.1). Then the following statements hold.

1. Every set in A is µ∗-measurable.

2. µ∗ and % agree on A.

In particular, µ∗ is a measure on σ(A).

Proof The proof is deferred to §3.4.5. 

Proposition 3.4.7 * If % is a pre-measure on an algebra A, we define µ∗ by (3.1) and % is σ-finite, then µ∗ is its unique extension to G and therefore to σ(A).

Proof The proof is deferred to §3.4.5. 

3.4.5 Proof for Caratheodory’s Theorem

We return to prove some of the theorems and propositions given earlier.,

Theorem 3.4.5 (Caratheodory’s Theorem) If µ∗ is an outer measure on X , then the collection G of µ∗-measurable sets is a σ-algebra, the restriction of µ∗ to G is a measure, and G is complete w.r.t. µ∗.

Proof Throughout the proof let us write µ = µ∗ for simplicity. (1) φ ∈ G and µ∗(φ) = 0. (2) Since the roles played by A and Ac are symmetric, A ∈ G implies that Ac ∈ G.

(3) Suppose A, F ∈ G, we show A ∪ F ∈ G.

Take any B ∈ X . We apply (3.2) first with A ∈ G and then with F ∈ G:

µ(B) = µ(B ∩ A) + µ(B ∩ Ac) = µ(B ∩ A ∩ F ) + µ(B ∩ A ∩ F c) + µ(B ∩ Ac ∩ F ) + µ(B ∩ Ac ∩ F c).

We use A∪F = (A∩F )∪(A∩F c)∪(Ac ∩F ), as well as (A∪F )c = Ac ∩F c, and apply sub-additivity

µ(B) = µ(B ∩ A ∩ F ) + µ(B ∩ A ∩ F )c + µ(B ∩ Ac ∩ F ) + µ(B ∩ Ac ∩ F c) ≥ µ(B ∩ (A ∪ F )) + µ(B ∩ (A ∪ F )c).

This shows that A ∪ F ∈ G. Also, if A, F ∈ G are disjoint,

µ(A ∪ F ) = µ((A ∪ F ) ∩ F ) + µ((A ∪ F ) ∩ F c) = µ(F ) + µ(A). 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 35

∞ n (4) Let An ∈ G, we show A = ∪n=1An ∈ G. Set Fn = ∪j=1Aj. Since G is stable under finite unions and under taking the complement, we may assume that the An are disjoint. Then for any B ∈ X , using Aj ∈ G, c µ(B ∩ Fn) = µ(B ∩ Fn ∩ An) + µ(B ∩ Fn ∩ (An) )

= µ(B ∩ An) + µ(B ∩ Fn−1) c = µ(B ∩ An) + µ(B ∩ Fn−1 ∩ An−1) + µ(B ∩ Fn−1 ∩ An−1)

= µ(B ∩ An) + µ(B ∩ An−1) + µ(B ∩ Fn−2) = ... n X = µ(B ∩ Aj). j=1

Thus using Fn ∈ G, n c X c µ(B) = µ(B ∩ Fn) + µ(B ∩ (Fn) ) = µ(B ∩ Aj) + µ(B ∩ (Fn) ) j=1 n X c ≥ µ(B ∩ Aj) + µ(B ∩ A ). j=1

Taking n → ∞, then using ∪(B ∩ Aj) = B ∩ A and sub-additivity, ∞ X c µ(B) ≥ µ(B ∩ Aj) + µ(B ∩ A ) j=1 ≥ µ(B ∩ A) + µ(B ∩ Ac) ≥ µ(B). P∞ Hence A ∈ G and j=1 µ(B ∩ Aj) = µ(B ∩ A) for any B ∈ G. Take B = A to conclude the σ-additivity. We proved that G is a σ-algebra and µ is a measure on it.

(5) Finally if µ(A) = 0 and F ⊂ A, then for any B ⊂ X ,

µ(B) ≤ µ(B ∩ F ) + µ(B ∩ (F )c) ≤ 0 + µ(B).

Hence all the inequalities above are equalities,

µ(B) = µ(B ∩ F ) + µ(B ∩ (F )c) and F ∈ G. Therefore G contains all negligible subsets F for µ and µ(F ) = 0. 

In the sequel, we shall assume A to be an algebra. In that case, we can use the following, convenient lemma:

Lemma 3.4.8 Let A be an algebra and % : A → [0, ∞] an additive function. In words, for all A, B ∈ A disjoint, we have: %(A ∪ B) = %(A) + %(B). Then % is 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 36

1. monotone: for any A, B ∈ A with A ⊂ B, %(A) ≤ %(B)

Sn Pn 2. sub-additive: for any A1,...,An ∈ A, %( i=1 Ai) ≤ i=1 %(Ai)

Proof By additivity for all A, B ∈ A with A ⊂ B, we have

%(B) = %(A) + %(B \ A) ≥ %(A), which yields monotonicity. Now, for A1,...,An ∈ A, we set

i−1  [ Bi = Ai \  Aj ∈ A, i = 1, . . . , n, j=1 so that the Bi are disjoint. Hence, using successively the additivity and monotonicity of %,

n ! n ! n n [ [ X X % Ai = % Bi = % (Bi) ≤ % (Ai) , i=1 i=1 i=1 i=1 which yields the requested sub-additivity. 

Proposition3.4.6. If % is a pre-measure on an algebra A, we define µ∗ by (3.1). Then:

1. µ∗ = % on A.

2. every set in A is µ∗-measurable.

In particular, µ∗ is a measure on σ(A), the completion of σ(A) w.r.t µ∗.

Proof (1) Let B ∈ A, then B is a cover for itself and µ∗(B) ≤ %(B). As before write µ = µ∗ for simplicity. We now prove the reverse inequality. Suppose that B ⊂ ∪nAn where An ∈ A. Then B = ∪nA˜n, where A˜n = (B ∩ A) \ (∪i

∞ ∞ X X %(B) = %(A˜n) ≤ %(An). n=1 n=1 Take infimum over all covers, ∞ X %(B) ≤ inf %(An) = µ(B). n=1 We have showed that % and µ agree on A.

(2) We now show that any A ∈ A is µ∗-measurable. Take any B ⊂ X, we want to show

µ∗(B) ≥ µ∗(A ∩ B) + µ∗(Ac ∩ B), (3.4) 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 37

For any  > 0 choose a cover {Bj} of B from A with

∞ ∗ X µ (B) ≥ %(Bj) − . j=1

We first use A ∈ A ⊂ G, % = µ∗ on A and σ- sub-additivity of µ, and

∞ ∞ ∞ X X X c %(Bj) = %(A ∩ Bj) + %(A ∩ Bj) j=1 j=1 j=1 ∗ ∗ c ≥ µ (A ∩ ∪jBj) + µ (A ∩ ∪jBj) ≥ µ∗(A ∩ B) + µ∗(Ac ∩ B).

In the last step we use ∪jBj ⊃ B and the monotonicity of the outer measure. Therefore

µ∗(B) ≥ µ∗(A ∩ B) + µ∗(Ac ∩ B) − .

Taking  → 0, we obtain µ∗(B) ≥ µ∗(A ∩ B) + µ∗(Ac ∩ B), so that A is µ∗-measurable (the reverse inequality follows by sub-additivity), which proves the second claim. For the final statement, note that by 3.4.5, µ∗ defines a measure on G, and that G is complete w.r.t. ∗ µ . Since A ⊂ G, therefore σ(A) ⊂ G, and the claim follows. 

Proposition 3.4.9 * Let % be a σ-finite pre-measure on an algebra A, and define µ∗ by (3.1). Then µ∗ is the unique measure extending % to G, and therefore the unique measure on σ(A), extending %.

Proof Step 1. We first show that if ν is another measure on the µ∗-measurable set G extending %, then ∗ ν ≤ µ . Indeed if E ∈ G, E ⊂ ∪Aj where Aj ∈ A, then by sub-additivity, X X ν(E) ≤ ν(∪Aj) ≤ ν(Aj) = %(Aj). j j

This holds for any covering of E by sets from A, hence

ν(E) ≤ µ∗(E).

Step 2. Suppose that E ∈ G is such that µ∗(E) < ∞, we show that µ∗(E) = ν(E). We can find a P ∗ ∗ cover Aj of E from A with j µ (Aj) ≤ µ (E) + . We may and will assume that the cover is disjoint. ∞ Set A = ∪j=1Aj. Since E ⊂ A,

∗ ∗ ∗ X ∗ ∗ µ (A \ E) = µ (A) − µ (E) ≤ µ (Aj) − µ (E) < . (3.5) j 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 38

Also, since E ⊂ A, ∗ ∗ X ∗ µ (E) ≤ µ (A)≤ µ (Aj) j agree on A X = ν(Aj) = ν(A) j = ν(E) + ν(A \ E) ν≤µ∗ ≤ ν(E) + µ∗(A \ E) ≤ ν(E) + , where we have used (3.5) in the last inequality. Taking  → 0, we see that µ∗(E) = ν(E).

Step 3. Let % be σ-finite. By the assumption there exists an increasing sequence of sets Xn from A S such that %(Xn) < ∞ and n≥1 Xn = X . Then using step 2 we can complete the proof (exercise). 

3.4.6 Construction of Lebesgue-Stieltjes / Lebesgue measure

We want to construct a measure λ on (R, B(R)), such that the measure of a half open interval is its length, and such that it is translation invariant, i.e. λ(A + t) = λ(A) for any t ∈ R and any Borel set A, where A + t = {a + t : a ∈ A} is the translate of A by the number t. We will show that there exists a unique such measure on (R, B(R)) called the Lebesgue measure. This will be one specific example of Lebesgue-Stieltjes measure.

Definition 3.4.6 A collection of subsets E is an elementary family if

• φ ∈ E;

• If A, B ∈ E, then A ∩ B ∈ E;

• if A ∈ E then Ac is a finite union of disjoint sets from E.

Exercise 3.4.2 Show that if E is an elementary family, then the collection A of finite disjoint unions of members of E is an algebra.

In this section, by half open interval we mean sets of the form (a, b], (c, ∞), (−∞, b], or φ, where a, b, c are finite numbers. For simplicity we write a generic half interval as (c, d], where c ∈ R ∪ {−∞}, and d is a real number or ∞. In the latter case by (c, ∞], in this chapter, we really mean (c, ∞] ∩ R.

Definition 3.4.7 Henceforth, in this section, let E denote the collection of half open intervals and let A be the collection of finite unions of disjoint half open intervals.

We have seen that 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 39

Proposition 3.4.10 A is an algebra.

Remark 3.4.11 Every element of A can be written as a finite disjoint union of maximal intervals (this means the interval cannot be extended within A): n [ A = (ak, bk], (3.6) k=1 this is unique if we order them by ak < bk < ak+1. Let us call this the canonical representation. If A is not bounded from above, bn = ∞, by (an, bn] we mean (an, bn] ∩ R.

Let F : R → R be an increasing and right continuous function. Then it has at most a countable number of discontinuities, all of which are of jump type (i.e. left and right limits exist). We define

F (∞) = lim F (x),F (−∞) = lim F (x). (3.7) x→∞ x→−∞ Sn Definition 3.4.8 For A = j=1(aj, bj] ∈ A, written in its canonical representation , we set n X %(A) = (F (bj) − F (aj)). (3.8) j=1 We also set %(φ) = 0.

Remark 3.4.12 This definition is independent of the expression for A. Indeed, suppose that A = (c, d] Sn is written also as A = j=1(aj, bj], then up to a re-ordering,

c = a1 ≤ b1 = a2 ≤ b2 = a3 ≤ · · · ≤ bn = d. This gives a telescopic sum, n X (F (bj) − F (aj)) = F (bn) − F (a1) = F (d) − F (c). j=1 So when A is a single interval, there is no ambiguity in the definition of %. Pp If A = k=1(ck, dk] is written in its canonical representation, so each interval is a maximal interval, then there is partition {Λk, k = 1, . . . , p} of {1, . . . , n} such that [ (aj, bj] = (ck, dk].

j∈Λk (recall that a partition of a set I is a collection of disjoint subsets of I whose union is I). In fact, it suffices to consider Λk := {j = 1, . . . , n :(aj, bj] ⊂ (ck, dk]}. Hence n p p X X X X %((aj, bj]) = %((aj, bj]) = %((ck, bk]).

j=1 k=1 j∈Λk k=1 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 40

Proposition 3.4.13 If F : R → R is a non-decreasing and right continuous function, then % defined by (3.8) is a pre-measure on A.

Proof We have %(φ) = 0, so it remains to prove σ-additivity. We start by showing that % is additive. Let A, B ∈ A be disjoint. Then there exists Ej ∈ E such that

n m [ [ A = Ej,B = Ej. j=1 j=n+1

Then m n m X X X %(A ∪ B) = %(Ej) = %(Ej) + %(Ej) = %(A) + %(B). j=1 j=1 j=n+1 Hence % is indeed additive. By Lemma 3.4.8, we in particular deduce that % is monotone and sub-additive.

Step 2. Let now (Aj)j≥1 be a sequence of disjoint elements of A such that [ A := Aj j≥1 is in A. Using monotonicity followed by finite additivity,

n n [ X %(A) ≥ %( Aj) = %(Aj). j=1 j=1

Since this holds for every n, ∞ X %(A) ≥ %(Aj). j=1

Step 3. We now prove the reverse inequality. Since A ∈ A,

n [ A = (ak, bk]. (3.9) k=1 P We show that %(A) ≤ j=1 %(Aj). As argued before, in Remark 3.4.12, by additivity of % we reduce this to the case A is one single half open interval.

Step 4. We first assume A = (c, d] is bounded. We may also assumme that c < d, otherwise %((c, d]) = %(φ) = 0 and the requested inequality is trivially satisfied. Since, for all j ≥ 1, Aj is a finite union of bounded disjoint half open intervals, the whole collection of these intervals is countable and we write it {(ak, bk], k ≥ 1}. In particular, we have

∞ ∞ ∞ ∞ [ [ X X Aj = (ak, bk], and %(Aj) = %((ak, bk]). (3.10) j=1 k=1 j=1 k=1 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 41

The idea is to approximate (c, d] by a compact interval [c + δ, d] and (ak, bk] by a larger open interval (ak, bk + δk), we can then choose a finite cover and use finite additivity. Let  > 0. By right-continuity of F , we may choose δ > 0 such that δ < d − c and

F (c + δ) − F (c) < .

Similarly, for all k ≥ 1, we choose a δk so that

−k F (bk + δk) − F (bk) ≤ 2 .

The compact set [c+δ, d] is covered by the sets (ak, bk] and therefore by the larger open sets (ak, bk +δk). We can choose a finite covering, so that for some N,

N [c + δ, d] ⊂ ∪k=1(ak, bk + δk).

In particular, we have the inclusion

N (c + δ, d] ⊂ ∪k=1(ak, bk + δk], between subsets that are in A. By monotonicity and sub-additivity of %,

%((c, d]) = F (d) − F (c) = F (d) − F (c + δ) + F (c + δ) − F (c) ≤ %((c + δ, d]) +  N  ≤ % ∪k=1(ak, bk + δk] +  N N X X  ≤ %((a , b ]) + +  k k 2k k=1 k=1 ∞ X ≤ %(Aj) + 2, j=1 where the last inequality follows from (3.10). Taking  → 0, we conclude that X %(A) = %((c, d]) ≤ %(Aj) j as requested. Together with step 2, we have proved X %(A) = %(Aj) j for A a bounded half open interval.

Step 5. Now assume A = (−∞, b]. Then, for every n with −n < b,

%(A) = F (b) − F (−∞) = %((−n, b]) + F (−n) − F (−∞). 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 42

Note that ∞ ∞ [ [ (−n, b] = (−n, b] ∩ Aj = (−n, b] ∩ Aj. j=1 j=1

Since (−n, b] ∩ Aj ∈ A for all j ≥ 1, we use step 4 to conclude that

∞ X %((−n, b]) = %((−n, b] ∩ Aj). j=1 Hence, invoking monotonicity of %, we obtain

∞ ∞ X X %(A) = %((−n, b] ∩ Aj) + F (−n) − F (−∞) ≤ %(Aj) + F (−n) − F (−∞). j=1 j=1

Taking n → ∞, since F (−n) → F (−∞) we have proved the requested inequality. The same can be shown for A = (a, ∞), a ∈ R, or for A = R = (−∞, +∞]. We have completed the proof. 

As in (3.1), we define

 ∞ ∞  ∗ X [  µ (E) = inf %(Aj): E ⊂ Aj,Aj ∈ A . j=1 j=1 

Since each A ∈ A is a finite disjoint union of members in E, by re-arranging we have

 ∞ ∞  ∗ X [  µ (E) = inf %(Ej): E ⊂ Ej,Ej ∈ E j=1 j=1 

 ∞ ∞  X [  = inf %((aj, bj]) : E ⊂ (aj, bj] . j=1 j=1 

Theorem 3.4.14 There exists a measure µF on the Borel σ-algebra B(R) such that µF = % on A. It is the unique measure on B(R) that agrees with % on A.

Proof Existence of µF follows from Caratheodory’s Theorem. The uniqueness statement follows from Theorem 3.3.1. 

Proposition 3.4.15 µF extends to a measure (still denoted by µF ) on the completion B¯(R) of B(R).

Proof Let G be the collection of µ∗ measurable subsets of R, and N the collection of µF -negligible ∗ ∗ subsets. Since G is complete w.r.t. µ , and since µ coincides with µF on B(R), we deduce that N ⊂ G. But we also have B(R) ⊂ G, and hence B¯(R) ⊂ G. So the restriction of µ∗ to B¯(R) provides the requested extension.  3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 43

Remark 3.4.16 * Note that, by Proposition 3.4.2, such an extension is unique.

∞ P∞ Remark 3.4.17 If A = ∪j=1(aj, bj] is a disjoint union, then µF (A) = j=1 %((aj, bj]). This follows from that the union is a Borel set, and σ-additive property fo the measure. The subscript F will be omitted if there is no risk of confusion.

Definition 3.4.9 • The measure µF constructed in the theorem above is called Lebesgue-Stieltjes measure associated to F .

• If F is the identity map, this is called the Lebesgue measure on R and will be denoted by λ.

• The completion B¯(R) of the Borel σ-algebra with respect to λ is called the Lebesgue σ-algebra, its elements are called Lebesgue-measurable sets.

Theorem 3.4.18 If B is a µ∗-measurable set, then

µF (B) = inf{µF (O): B ⊂ O, O is open}

= sup{µF (D): D ⊂ B, D is closed}. This holds in particular if B is a Borel measurable set.

Proof This is left as exercise. 

Recall that B∆A = (B \ A) S(A \ B).

Remark 3.4.19 If B is µ∗-measurable set with finite measure, then for every  > 0 there exists a set A which is a finite union of open intervals, such that µF (B∆A) < .

3.4.7 Example of a non-Lebesgue measurable set*

Proposition 3.4.20 B¯(R) 6= 2R, i.e. there exists a subset of R that is not Lebesgue measurable.

Proof Let us define an equivalent relation: x ∼ y if and only if x − y ∈ Q. Using Axiom of choice we can choose the representative of the equivalent relations to be in (0, 1]. This quotient set V = R/ ∼ as a subset of (0, 1] is called the Vitali set. If q1 6= q2 are two rational numbers, the translates V + q1 and V + q2 are disjoint. Let A = Q ∩ (−1, 1], then

(0, 1] ⊂ ∪q∈A(V + q) ⊂ [−1, 2]. (3.11)

The second inequality follows from the inclusions V ⊂ (0, 1] and A ⊂ (−1, 1]. To prove the first one, take y ∈ (0, 1], then there exists q ∈ (−1, 1] ∩ Q = A such that y = q + [y], where [y] is the representative of y in (0, 1]. Thus y ∈ ∪q∈A(V + q) and the left hand side inclusion is proved. Let us assume by contradiction that V is Lebesgue measurable. What can its Lebesgue measure be? By 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 44 translation invariance, λ(V + q) = λ(V ) for all q ∈ A. Moreover, the sets V + q, q ∈ A, are pairwise disjoint. Therefore X λ(∪q∈A(V + Q)) = λ(V ), q∈A which equals either 0 (if λ(V ) = 0) or +∞ (if λ(V ) > 0). But in view of (3.11), we also have 1 ≤ λ(∪q∈A(V + Q)) ≤ 3, which is a contradiction. Therefore V is not Lebesgue measurable. 

Exercise 3.4.3 Show that there does not exist a translation invariant countably additive measure on (R, 2R) which assigns an interval its length.

3.4.8 Examples

Example 3.6 Let a, b ∈ R with a < b. then, since µF ({b}) ≤ µF ((b − 1, b]) = F (b) − F (b − 1) < ∞,

µF ((a, b)) = µF ((a, b]) − µF ({b}).

Observe that µF ({b}) 6= 0 precisely when b is a discontinuity of F . Indeed, since µF ((b − 1, b]) < ∞, by continuity of µF from above,

∞ 1 1 1 µF ({b}) = µF (∩ {(b − , b]}) = lim µF ((b − , b]) = F (b) − lim F (b − ) = F (b) − F (b−). n=1 n n→∞ n n→∞ n

Example 3.7 Since F (x) = x is continuous, any countable subset of R has Lebesgue measure zero.

Example 3.8 Let α ∈ R be fixed, and let ( α, x < 0, F (x) = α + 3, x ≥ 0.

Then µF = 3δ0. Indeed, for any interval (a, b], ( 3, if 0 ∈ (a, b], µ ((a, b]) = F 0, if 0 6∈ (a, b], so that µF and 3δ0 coincide on E. To see that they are equal, note that E is a π-system generating B(R) and that µF (R) = 3 = 3δ0(R) < ∞.

Example 3.9 If Y is a random variable on a probability space (Ω, F, P), set F (x) = P(Y ≤ x). Then F is right continuous and increasing, it determines a Lebesgue-Stieltjes measure µF . Since F (−∞) = 0, Also,

µF ((a, b]) = F (b) − F (a) = P(a < Y ≤ b). 3.5. LEBESGUE MEASURE ON RN (6TH LECTURE) 45

3.5 Lebesgue Measure on Rn

By a similar construction, we obtain a Lebesgue measure on Rn. We do not get into details.

n Definition 3.5.1 There is a unique measure λ on B(R ) such that, for all a1, . . . , an, b1, . . . , bn ∈ R with ai ≤ bi, n λ((a1, b1] × · · · × (an, bn]) = Πi=1(bi − ai). This is called the Lebesgue measure on Rn, it extends to the completion of B(Rn).

Remark 3.5.1 The Lebesgue measure is invariant under translations and rotations.

For A ⊂ R and t ∈ R, let us denote by t + A the shifted set

t + A = {t + a, a ∈ A}.

Theorem 3.5.2 Any shift of a Borel set is a Borel set, i.e. if A ∈ B(R), then for all t ∈ R, t + A ∈ B(R). Moreover, the Lebesgue measure is translation invariant on the Borel σ-algebra, i.e.

∀A ∈ B(R), λ(A) = λ(t + A).

Proof Let C = {A ∈ B(R): ∀t ∈ R, t + A ∈ B(R)}. C is a σ-algebra. Moreover, if U is an open set, then for all t ∈ R, t + U is an open set as well, hence C contains all open sets. Therefore C contains the σ-algebra genereated by open sets, i.e. C = B(R). Now we show the second point. Let

A = {A ∈ B(R): ∀t ∈ R, λ(t + A) = λ(A)}.

Then A contains the π-system of finite unions of disjoint half intervals. Now Ω ∈ A. If A ⊂ B with A, B ∈ A, t + B \ A = (t + B) \ (t + A), t + A ⊂ t + B, hence

µ(t + B \ A) = µ(t + B) − µ(t + A) = µ(B) − µ(A) = µ(B \ A).

Thus B \ A ∈ A. S S If An ∈ A is an increasing sequence, so is t + An, and n(t + An) = t + n An, ∞ [ [ µ(t + An) = lim µ(t + An) = lim µ(An) = µ( An). n→∞ n→∞ n=1 n

Thus A is a λ system, concluding the proof.  3.5. LEBESGUE MEASURE ON RN (6TH LECTURE) 46

Proposition 3.5.3 Two non-trivial translation-invariant Borel measures on R which assign a finite mea- sure to bounded intervals are constant multiples of each other.

Proof Let µ and ν be translation invariant measures. Then, for all integers p, q ≥ 1, 1 1 µ((0, 1]) = qµ((0, ]), µ([0, p/q]) = pµ((0, ]) = (p/q)µ((0, 1]). q q

p Similarly, ν([0, q )) = (p/q) ν((0, 1]). The quanitities ν((0, 1]) and ν((0, 1]) are non-zero, for other- wise the measures would be trivial. Set c = µ([0, 1)]/ν([0, 1). Then µ([0, p/q]) = (p/q)µ([0, 1)] = cν([0, p/q)). Hence, µ(I) = cν(I) for all I ∈ C, where C denotes the collection of all half-open intervals with finite rational length. Now, C is a π-system and σ(C) = B(R). Moreover, we have ∪k≥1(−k, k] = R, and µ and cν assign the same finite mass to (−k, k] ∈ C for all k ≥ 1. By Theorem 3.3.1, we deduce that µ = cν.  Chapter 4

Measurable maps

4.1 Definition

Let (Ω, F) and (X , G) be two measurable spaces.

Definition 4.1.1 A map f :Ω → X is said to be measurable from (Ω, F) to (X , G) if, for all A ∈ G, f −1(A) ∈ F.

When it is clear which σ-algebras are being considered, in the situation above, we will often just say that the map f is measurable.

Definition 4.1.2 Assume Ω and X are topological spaces endowed with their respective Borel σ-algebras B(Ω) and B(C). We call a function f :Ω → X (Borel) measurable if it is measurable from (Ω, B(Ω)) to (X , B(X )).

Definition 4.1.3 Let (Ω, F, P) be a probability space and (X , G) be a measurable space. If X is a measurable map from (Ω, F) to (X , G), we call X a random variable.

Exercise 4.1.1 Let (X , B) be a measurable space and Ω a set. A function f :Ω → X is always measurable when Ω is endowed with the pre-image σ-algebra σ(f). Moreover, σ(f) is the smallest σ-algebra on Ω such that f :Ω → X is measurable.

4.2 Examples

Example 4.1 If X is a discrete space, let the power set be its σ-algebra. Let Y be another space with a σ-algebra. Then any function f : X → Y is measurable.

47 4.3. PROPERTIES OF MEASURABLE FUNCTIONS 48

If f is identically 1 on a set A and zero everywhere else, it is called the indicator function of A. We denote it by 1A: ( 1, if ω ∈ A 1 (ω) = A 0, if ω 6∈ A.

The indicator function 1A of a set is a measurable function if and only if A is a measurable set.

Example 4.2 If f : E → R takes only a finite number of values {a1, . . . , ak}. Then, f is measurable if and only if, for all i, {f = ai} is measurable. Also, f can be written in the following form: f = Pk i=1 ai1Aj . (Just take Aj = {x : f(x) = aj}).

4.3 Properties of measurable functions

Proposition 4.3.1 If f :(X1, F1) → (X2, F2) and g :(X2, F2) → (X3, F3) are measurable functions then so f ◦ g : X1 → X3 is measurable.

−1 Proof Let A ∈ F3 then f (A) ∈ F2. Furthermore,

−1 −1 (f ◦ g) (A) = {x ∈ X1 : f ◦ g ∈ A} = {x ∈ X1 : g ∈ f (A)} ∈ F1 using g is measurable. 

Proposition 4.3.2 Let (Ω, F) and (X , B) be measurable spaces. Suppose that B is generated by C and f :Ω → X . If f −1(A) ∈ F whenever A ∈ C, then f is measurable.

Proof B0 = {A ∈ B : f −1(A) ∈ F} ⊃ C is a σ-algebra, check with Lemma 2.1.1. It must therefore agree with B = σ(C). 

Here are some important consequences of the above proposition.

Proposition 4.3.3 If X,Y are metric spaces and f : X → Y is a continuous map, then f is Borel measurable.

Proof Continuity means that the pre-image of an open set is an open set. Hence, f −1(U) ∈ B(X) for all open subset U of Y . Since B(Y ) is generated by open sets, we deduce that f is Borel measurable. 

Proposition 4.3.4 Letf1 :(X , F) → (X1, F1) and f2 :(X , F) → (X2, F2) be measurable functions. Then

ψ = (f1, f2): X → X1 × X2 is measurable, when the target space is endowed by the product σ-algebra F1 ⊗ F2. 4.4. MEASURABILITY WITH RESPECT TO THE σ-ALGEBRA GENERATED BY A MAP 49

−1 Proof Apply Proposition 4.3.2, it is sufficient to know that h (A × B), where A ∈ F1 and B ∈ F2, is measurable. But, −1 −1 −1 ψ (A × B) = f1 (A) ∩ f2 (B) ∈ F, completing the proof. 

4.4 Measurability with respect to the σ-algebra generated by a map

An enhanced version of the composition rule for f ◦ g can be proved in the case where A2, the σ-algebra on the second set X2, corresponds to the σ-algebra σ(f) generated by f.

Let (X1, A1) and (X3, A3) be measurable spaces, and X2 be a set. Let f : X2 → X3 be a map, and let σ(f) be the σ-algebra on X2 generated by f.

Proposition 4.4.1 A map g : X1 → X2 is measurable from (X1, A1) to (X2, σ(f)) if and only if f ◦ g is measurable from (X1, A1) to (X3, A3).

Proof Since f is measurable from (X2, σ(f)) to (X , A3), by the composition rule, if g is measurable from (X1, A1) to (X2, σ(f)), then f ◦ g is measurable from (X1, A1) to (X3, A3). Conversely, assume that f ◦ g is measurable from (X1, A1) to (X3, A3), then for any C ∈ A3, we have

−1 −1 −1 g (f (C)) = (f ◦ g) (C) ∈ A1.

−1 Since σ(f) consists of all subsets of the form f (C), for C ∈ X3, we deduce that g is measurable from (X1, A1) to (X2, σ(f)). 

4.5 Measurable maps with values in a subset

Let (Ω, F) and (X , G) be measurable spaces, and let Y ⊂ X . Let us denote by i : Y → X the inclusion map. Recall that Y can be endowed with the trace H of the σ-algebra G, see Example2.4 above. Let f :Ω → Y be a map, one can then also consider i ◦ f, which is a map from Ω to X , therefore there are two possible definitions for the measurability of f. The following Proposition states that these coincide.

Proposition 4.5.1 Let f :Ω → Y. Then f is measurable from (Ω, F) to (Y, H) if and only if i ◦ f is measurable from (Ω, F) to (X , G).

Proof Note that σ(i) = {i−1(A),A ∈ G} = {A ∩ Y,A ∈ G} = H.

Hence the result follows from Proposition 4.4.1 above.  4.6. REAL-VALUED MEASURABLE FUNCTIONS 50

In particular, one obtains the following consequence for the case when X is a metric space.

Corollary 4.5.2 Let X be a metric space and let Y be a subset of X . Then f :Ω → Y is measurable from (Ω, F) to (Y, B(Y)) if and only if it is measurable from (Ω, F) to (X , B(X )).

Proof This follows from the above Proposition, noting that B(Y) is the trace of B(X ) on Y, see Propo- sition 2.3.2. 

4.6 Real-valued measurable functions

The Borel σ-algebra of R is generated by the set of open intervals, equally they are generated by the set of all closed intervals. We state some of these below, they follow straightforwardly from Proposition 4.3.2.

Proposition 4.6.1 Let (X, B) be a measurable space. A function f : X → R is Borel measurable if one of the following conditions hold:

1. for each real number a, {x : f(x) > a} is a measurable set;

2. for each real number a, {x : f(x) < a} is a measurable set;

3. for each real number a, {x : f(x) ≥ a} is a measurable set;

4. for each real number a, {x : f(x) ≤ a} is a measurable set;

For the first just note that sets of the form (a, ∞) generates the Borel σ-algebra, apply Proposition 4.3.2 to. conclude. Similarly the other statements chan be shown.

Proposition 4.6.2 Let (X, F) be a measurable space. If f, g : X → R are measurable functions, then so are the following functions.

1. f + g

2. fg,

3. max(f, g)

4. f + = max(f, 0), f − = max(−f, 0)

5. |f| = f + + f −. 4.6. REAL-VALUED MEASURABLE FUNCTIONS 51

Proof Firstly (f, g): X → R2 is Borel measurable. Set ψ : X → R2 by

ψ(x) = (f(x), g(x)) and define ϕ : R2 → R by ϕ(x, y) = x + y, a continuous function. Then f + g = ϕ ◦ ψ is a composition of measurable function, so measurable. Choosing ϕ(x, y) = xy, for the analogous proof for fg. Now for any a > 0, {max(f, g) < a} = {f < a} ∩ {g < a} ∈ F, so max(f, g) is measurable. Part 4 follows from part 3, and 5 follows part 1 and part 4. 

(f+g)+|f−g| Remark 4.6.3 Note the relation max(f, g) = 2 .

Definition 4.6.1 Let R = R ∪ {−∞, +∞}. One can endow R with the distance

d(x, y) = |F (x) − F (y)|, x, y ∈ R, with F (x) = arctan(x), x ∈ R, and F (±∞) = limu→±∞ arctan(u) = ±π/2. We then denote by B(R) the associated Borel σ-algebra.

Exercise 4.6.1 1. Show that open subsets of R are of the form i) U, where U is an open subset of R, ii) [−∞, a) where a ∈ R, iii) (b, +∞], where b ∈ R, or any union of i), ii) and iii).

2. Using Remark 2.3.3, deduce that B(R) ⊂ B(R) and that B([0, +∞]) ⊂ B(R). Show also that B(R) is generated by the collection of intervals (t, +∞] ⊂ R, for t ∈ R.

3. Show that f is measurable from (X , F) to (R, B(R)) iff {f = ω} ∈ F for ω ∈ {−∞, +∞} and f −1(A) ∈ F for any A ∈ B(R).

Remark 4.6.4 By Proposition 4.5.1, if (X , A) is a measurable space, then a function f : X → R is measurable from (X , A) to (R, B(R)) if and only if i ◦ f is measurable from (X , A) to (R, B(R), where i : R → R is the inclusion map.

Proposition 4.6.5 Let (X, B) be a measurable space. If fn : X → R are measurable functions, so are the following:

1. sup{n≥1} fn,

2. inf{n≥1} fn;

3. lim sup fn,

4. lim inf fn. 4.6. REAL-VALUED MEASURABLE FUNCTIONS 52

Proof Firstly {sup fn > a} = ∪n{fn ≥ a}, n belongs to F, as each {fn ≥ a} does (the above is {x : infn≥1 fn(x) < a} = ∪n{x : fn(x) < a}), so sup{n≥1} fn is measurable. Also,

{ inf fn < a} = ∩n{fn < a} ∈ F, n≥1 showing that inf{n≥1} fn is measurable. Since

lim sup fn = inf sup fk, lim inf fn. = sup sup fk, n→∞ n k≥n n k≥n they are both measurable. 

Proposition 4.6.6 If fn is a sequence of Borel measurable functions with f = limn→∞ fn(x) exists at every x, then f is a Borel measurable function.

Example 4.3 If (X , F) and (Y, G) are measurable spaces. Suppose that µ is a measure. Let f : X → Y and g : X → Y. Let A = {f 6= g}.

1. Suppose both f and g are measurable, then f − g is a measurable function, consequently, A = {f − g 6= 0} is a measurable set.

2. Suppose that F is complete with respect to µ. Suppose A is contained in a null measurable set. Since F is complete, then A is measurable and µ(A) = 0. Then f is measurable implies that g is measurable. Indeed, for any B ∈ G, [ {g ∈ B} = ({f ∈ B} ∩ Ac) ({g ∈ B} ∩ A) ∈ F.

We use that ({g ∈ B} ∩ A) is contained in a null set and therefore measurable, Ac and {f ∈ B} are measurable.

Definition 4.6.2 Let (X , F, µ) be a measure space and (Y, G) be a measurable space. Let f, g two measurable maps from (X , F) to (Y, G). If f = g on a set of full measure, i.e. µ({f 6= g}) = 0, we say that f = g almost-everywhere. This is abbreviated by f = g a.e.

Let (Ω, F, P) be a probability space, (Y, G) a measurable space, and X,Y two random variables from (Ω, F) to (Y, G). We say that X = Y almost-surely if X = Y on an event of full probaility, i.e. P({X 6= Y }) = 0. This is abbreviated by X = Y a.s.

Example 4.4 (This is not intended to be delivered in class) If fn is a sequence of Borel measurable functions. Set   lim fn(x), if the limit exists at x, f(x) = n→∞  0, otherwise Then f is measurable. 4.7. SIMPLE FUNCTIONS 53

Proof Set

U := {x : lim fn(x) exists} = {x : lim sup fn(x) = lim inf fn(x)} n→∞ n→∞ n→∞

Since lim supn→∞ fn(x) and lim infn→∞ fn(x) are both measurable, U is a measurable set. Define

Z := (lim inf fn, lim sup fn).

( Let ∆ = {(b, b): b ∈ R} denote the the diagonal subset of R2, then U = Z−1(∆).)

Note that f(x) = 0 on U c. Then for any a < 0,

{x : lim fn(x) < a} = {x : lim sup fn(x) < a}, n→∞ n→∞ which is measurable since lim supn→∞ fn is measurable. Let a ≥ 0, then

{x : lim fn(x) < a} = {x : lim sup fn(x) < a} ∪ U, n→∞ n→∞ which is also a measurable set. 

In analysis we frequently encounter Lebesgue measurable sets and functions, this refers to the σ- algebra

4.7 Simple functions

A simple functions is a real-valued measurable function taking on a finite number of distinct values. It has representation as linear combinations of indicator functions of measurable sets. Representations are not necessarily unique. For example take the domain space to be R, let A1 = [1, 2], A2 = (2, 6], and

A3 = (−∞, 1) ∪ (4, ∞). Then, f = 21A1 + 31A2 + 01A3 is a simple function. We often do not include the last term with zero value, but sometimes it is included for the convenience that A1 ∪ A2 ∪ A3 = R. The representation is not unique,

c c f = 21A1∩Q + 31A2∩Q + 21A1∩Q + 31A2∩Q , by we also have,

f = 21([1,3] + 1(2,3] + 31(3,6]. Let us now give one popular definition for simple functions.

PN Definition 4.7.1 Let (X , F) be a measurable space. A function of the form f(x) = i=1 ai1Ai (x) where ai ∈ R and Ai ∈ Fi is said to be a simple function.

P∞ Proposition 4.7.1 If Ai are measurable sets, then f = i=1 ai1Ai is measurable. 4.7. SIMPLE FUNCTIONS 54

Proof If B is a Borel set, −1 f (A) = ∪{i:ai∈B}Ai. This is a countable union and belongs to F if every Ai ∈ F. 

P∞ Remark 4.7.2 Suppose that f = i=1 ai1Ai is measurable. If the numbers ai are distinct, and {Ai} −1 are disjoint sets, then Ai is necessarily measurable. Indeed, f ({ai}) = Ai. So for f to be measurable, Ai must belong to F.

−1 If a1 = a2, we can conclude f ({a1}) = A1 ∪ A2 is measurable, but we may not have any further information to conclude the measurability of the sets A1 and A2.

Definition 4.7.2 Suppose that a simple function f takes the following values {ai, i = 1,...,N}. Then n X −1 f(x) = ai1f (ai) i=1 −1 is said to be the canonical representation for f. Set Ai = f (ai), then Ai are pairwise disjoint mea- SN surable sets and i=1 Ai = X . P∞ Exercise 4.7.1 Write the following function in the form i=1 ai1Ai where {Ai} are pairwise disjoint. 2, x ∈ A = (−2π, −π]  1  3, x ∈ A = [1, 4) ∪ (7, 8)  2 1 f(x) = , x ∈ A = (4, 6] 2 3   5, x ∈ A4 = (9, ∞) ∩ Q  0, otherwise

Solution: The canonical representation is 1 f(x) = 21 + 31 + 1 + 51 . A1 A2 2 A3 A4 4 c We see that {A1,A2,A3,A4, (∪i=1Ai) } is a partition of X .

Example 4.5 A step function f : R → R is a piecewise constant function. Steps functions with a finite number of steps are simple functions.

Theorem 4.7.3 Let (X , G) be a measurable space. Then,

• Every measurable function f : X → R is the pointwise limit of simple functions:

f(x) = lim fn(x), n→∞

where fn ∈ E(G). Furthermore we can arrange so that |fn| ≤ |f|. ( If f is bounded, the convergence is uniform.) 4.7. SIMPLE FUNCTIONS 55

• If f ≥ 0 we may choose fn to be non-negative and the sequence (fn)n≥1 non- decreasing so

f = sup{n≥1} fn.

Proof Step 1. We prove the second claim first. Let us consider dyadic partitions of the interval [0, n]. Let us cut each length 1 interval into 2n pieces of equal length (this is called a dyadic partition). The dyadic partitions have an advantage: any partition points from current level contains all partition points from the previous levels.

Let us define the measurable sets:  j j + 1 An = x : < f(x) ≤ , 0 ≤ j ≤ n(2n − 1). j 2n 2n

j n We approximate f with the value 2n on Aj and define

n(2n−1) X j gn = 1An . 2n j j=0

1 Then on {x : 0 ≤ f(x) < n}, |fn(x) − f(x)| ≤ 2n , so

lim fn(x) = f(x). n→∞

Also fn increases with n and fn(x) ≤ f(x) for every x. Step 2. For the first claim, we write f = f + − f − where

f + = max(f, 0) f − = max(−f, 0).

+ + − Let (f )n be an non-decreasing sequence of simple functions with limit f . Similarly, Let (f )n be an non-decreasing sequence of simple functions with limit f −. Set

+ − gn = (f )n − (f )n. Then + − + − |gn(x)| = (f )n(x) + (f )n(x) ≤ f (x) + f (x) = |f(x)|. For every x ∈ f −1([−n, n]),

+ − |f(x) − gn(x)| = |(f(x) − (f )n + (f )n)(x)| + + − − ≤ |f (x) − (f )n(x)| + |f (x) − (f )n(x)| 2 ≤ . 2n

This means that for every x, limn→∞ gn(x) → f(x). If |f| ≤ M ≤ n0 is bounded, then for n ≥ n0, 2 |f(x) − g (x)| ≤ n 2n for all x ∈ X , showing the uniform convergence.  4.8. FACTORISATION LEMMA 56

4.8 Factorisation Lemma

Lemma 4.8.1 (The factorisation Lemma) Let (X , G) be a measurable space and Ω a set. Let Y :Ω → X be a measurable function and consider the measurable space (Ω, σ(Y )). Then X :Ω → R is σ(Y )- measurable if and only if there exists a measurable function f : X → R such that X = f ◦ Y . Moreover, if X is non-negative, f may be chosen to be non-negative.

Proof Recall that σ(Y ) = {Y −1(B): B ∈ G). Suppose that f : X → R is measurable, since Y is measurable with respect to σ(Y )/G, then the composition of the measurable functions f ◦ Y is measurable.

For the converse, suppose that X is σ(Y )-measurable. We first assume X = 1A where A ∈ σ(Y ). Then A = {Y −1(B)} for some B ∈ G and

1B ◦ Y = 1{ω:Y (ω)∈B} = X.

So X is the composition of the measurable function 1B with Y .

−1 Suppose that X takes only a countable number of values (an) and write An = X ({an}). Since X −1 is σ(Y )-measurable, there exist sets Bn ∈ G such that Y (Bn) = An. Define now the sets [ Cn = Bn \ Bp. p

−1 These sets are disjoint and one has again Y (Cn) = An. Setting f(x) = an for x ∈ Cn and f(x) = 0 S for x ∈ X \ n Cn, we see that f has the required property.

Suppose that X ≥ 0, we may approximate it with an increasing sequence of simple functions Xn. By the paragraph above, there exists gn : X → R+ simple functions such that Xn = gn(Y ). In fact, gn is an increasing sequence. Let g = supn gn ≥ 0. Since g is a pointwise limit of measurable functions, g is also measurable. We note g(Y (ω)) = supn gn(Y (ω)) = supn Xn(ω)) = X(ω). Finally, in the general case where X takes values in R we complete the proof by exploiting the decomposition X = X+ − X−. 

4.9 Pushed-forward measure

Definition 4.9.1 Let (X1, B1) and (X2, B2) be measurable spaces, and µ1 a measure on (X1, B1). Let f : X1 → X2 be a measurable function. We define a measure µ2 on B2 by setting

−1 µ2(A) = µ1(f (A)) = µ1({x : f(x) ∈ A}).

−1 This is called the pushed-forward measure and denoted by f∗(µ1) or sometimes as µ◦ . 4.10. APPENDIX* 57

Exercise 4.9.1 Show that f∗(µ1) is a measure.

Definition 4.9.2 If (Ω, F, P) is a probability space and X :Ω → R is a random variable, X∗P is called the probability distribution of the random variable.

Example 4.6 If X :Ω → {0, 1}, then X∗P({0}) = P(X = 0) and X∗P = P(X = 0)δ0 + P(X = 1)δ1.

4.10 Appendix*

4.10.1 Littlewood’s three principles

Every Borel measurable set on R is nearly a finite union of intervals. Every (measurable) function is nearly continuous, every convergent sequence of measurable functions is nearly uniform convergent.

4.10.2 Product σ-algebras

Let I be an arbitrary index and let (Eα, Fα), α ∈ I be a family of measurable spaces. The tensor σ-algebra, also called the product σ-algebra, on E = Πα∈I Eα is defined to be

−1 ⊗α∈I Fα = σ{πα (Aα): Aα ∈ Fα, α ∈ I}.

Here πα : E = Πα∈I Eα → Eα is the projection (also called the coordinate map). If x = (xα, α ∈ I) is an element of E, πα(x) = xα. The tensor σ-algebra is the smallest one such that for all α ∈ I, the mapping

πα :(E, ⊗α∈I Fα) → (Eα, Fα) is measurable.

n Let πi : R → R be the projection to the ith component. The tensor σ-algebra ⊗nB(R) is

−1 ⊗nB(R) = σ{πi (A): A ∈ B(R)}.

−1 For example π1 (A) = A × R × · · · × R.

4.10.3 The Borel σ-algebra on the Wiener Space

The set of all maps from [0, 1] to Rd can be denoted as the product space (Rd)[0,1]. The tensor σ-algebra d ⊗[0,1]B(R ) is the smallest space such that the projections πt where πt(σ) = σ(t) is measurable. The product topology is the smallest topology such that the projections are continuous. Hence the Borel σ-algebra of the product topology is larger than the tensor σ-algebra. 4.10. APPENDIX* 58

Let W d = C([0,T ], Rd) be the space of continuous functions from [0, 1] to Rn. It is a Banach space with the supremum norm: kgkW := sup kg(t)k. t∈[0,1] Any measurable set in the tensor σ-algebra is determined by the a countable number of projections. The action to determine whether a function is continuous or not cannot be determined by a countable number of operations. Hence W d is not a measurable set in the tensor σ-algebra. (once we know a function is continuous, the function can be determined by its values on a countable dense set.)

Example 4.7 A sample continuous real valued stochastic process (Xt, 0 ≤ t ≤ 1) on a probability space (Ω, F,P ) can be considered as a function, denoted by X, from (Ω, F) to W :

X(ω)(t) = Xt(ω).

n For each time t, and a ∈ R , {ω : |Xt(ω) − a| < } belongs to F as Xt is measurable. We show that the function X :Ω → W is Borel measurable. Take f ∈ W and the open ball in the Banach space W centred at f with radius ε > 0. Its pre-image by X is:

{ω : sup |Xt(ω) − fti | < } = {ω : sup |Xti (ω) − fti | < } 0≤t≤1 0≤ti≤1,ti∈Q

= ∩0≤ti≤1,ti∈Q{ω : |Xti (ω) − fti | < }.

Since for each i, {ω : |Xti (ω) − fti | < } ∈ F, the intersection of the countable number of such sets belongs to F. Chapter 5

Integration

5.1 Integration

Throughout this chapter (X , F, µ) is a measure space with a σ-finite measure µ. (Previously I assume the measure is complete, this assumption is not needed for defining integration.)

By the integral of a Borel measurable function f : X → R, we mean a value assigned to this function. Let us call this value I(f). So I : H → R is a function on a set of functions H. For I to qualify to be an integral it should have some properties which we describe below.

Property 1 (IP) Let H be a class of Borel measurable functions f : X → R. Suppose that to every f ∈ H, we assign a number or ∞ which we denote by I(f). We say I satisfies the IP if the following holds:

1. (Linearity) If f, g ∈ H and a, b ∈ R, we have

I(af + bg) = aI(f) + bI(g).

2. (Monotonicity) If f ≤ g then I(f) ≤ I(g)

Notation. We use R f for the integral of f. To emphasize the measure used we also use R fdµ for R the same integral. Sometime we emphasize the domain of integration as follows X fdµ. Sometimes we explicitly express the argument of integration by a dummy variable: R f(x)µ(dx), this is also written as R R R f(x)dµ(x). Also if A is a measurable set, we set A f = f1A etc.

59 5.2. OUTLINE OF THE CONSTRUCTION 60

5.2 Outline of the construction

The procedure for defining the integral of a function f : X → R w.r.t. a measure µ is as follows.

PN 1. If f = i=1 ai1Ai is a simple function (the assumption that Ai are measurable is included in the definition), we define N Z X f dµ = ai µ(Ai). X i=1 2. If f is a positive function we define Z Z f dµ = sup g dµ X g where the supremum is taken over simple functions g with 0 ≤ g ≤ f. We say f is integrable if R f dµ < ∞.

3. Let f : X → R be a measurable function. Let f + = max(f(x), 0) and f −(x) = max(−f(x), 0) be the positive and negative parts of f, then f = f + − f −. If both f + and f − have finite integrals we say f is integrable and define

Z Z Z f dµ = f + dµ − f − dµ. X X X The set of integrable functions is denoted by L1. Observe that in this case Z Z Z |f|dµ = f +dµ + f −dµ.

5.3 Integral of simple functions

Let us denote by S the set of measurable functions defined as follows:

( n ) X S = ci1Ai : Ai ∈ F, ci ∈ R, n = 1, 2,... . i=1 Elements of S are called simple functions. We also denote by S+ the set of positive simple functions:

( n ) + X S = ci1Ai : Ai ∈ F, ci > 0, n = 1, 2,... . i=1

A simple function can have more than one representations. Take for example ( 1, x ∈ [0, 1] f(x) = 1, x ∈ (2, 3] 5.3. INTEGRAL OF SIMPLE FUNCTIONS 61

Then

f(x) = 1[0,1](x) + 1[2,3](x) = 1[0,1)(x) + 1{1}(x) + 1[2,3](x). There exists a unique canonical representation, up to re-arranging the orders of the sets. By the canonical form we mean n X −1 f = ai1{f (ai)}, i=1 where ai are the set of non-zero values of f (they are of course distinct numbers), this is unique up to re-ordering.

Exercise 5.3.1 A simple functions is a measurable function taking only a finite number of values, a measurable function taking only a finite number of values is a simple function.

Proof If f is a measurable function taking only a finite number of non-zero values {c1, . . . , cn}, then it n −1 P −1 can be expressed in the form of i=1 ci1f (ci) and f (ci) is measurable. Pn We show the converse: f = i=1 ai1Ai only takes a finite number of values.

Remark 5.3.1 An easier way to show that f takes only finitely many values is as below: f is a linear combination of n indicator functions. Since an indicator function takes only value 0 or 1, such a function thus can take at most 2n values.

We next show this in at hard way, the proof itself has some value. First suppose that

f = a11A1 + a21A2 .

Set

B1 = A1 \ A2,B2 = A2 \ A1,B3 = A1 ∩ A2. Then

f = a11B1 + a21B2 + (a1 + a2)1B3 , so f takes at most the following values: 0, a1, a2, a1 + a2. Pn We now prove by induction to show that every f of the form i=1 ai1Ai can be written in the form

m X f = bj1Bj , i=j where Bj are disjoint. We have shown this holds for n ≤ 2. Suppose this holds for k ≤ n. Then,

n+1 m X X f = ai1Ai = bj1Bj + an+11An+1 , i=1 i=j 5.3. INTEGRAL OF SIMPLE FUNCTIONS 62

where Bi are disjoint. Set Cj = Bj \ An+1, j = 1, . . . , m, n [ Cm+j = Bj ∩ An+1,C2m+1 = An+1 \ ( Bm). j=1

Then Cj are disjojnt sets and

m m X X f = bj1Cj + an+11C2m+1 + (bk + an+1)1Bk∩An+1 . j=1 k=1 Thus f can be written as a finite linear combination of indicator functions of disjoint sets, among other things this also shows that f takes a finite number of values. 

Exercise 5.3.2 1. If f, g are simple functions, so are f + g and fg. In particular S is a vector space.

2. If f is a simple function, f +, f − are both simple functions.

Proof Let us denote by ai the distinct values of f and bj the distinct values for g. We write

n m X X −1 −1 f = ai1{f (ai)}, g = bj1{f (bj )}, i=1 j=1 Set −1 −1 Ei = f (ai),Fj = g (bj).

Then (Ei ∩ Fj) ∩ (Ek ∩ Fl) = φ if (i, j) 6= (k, l). It is clear that X f + g = (ai + bj)1{Ei∩Fj }. 1≤i≤n,1≤j≤m showing f + g is a simple function. The proof for fg is similar. It is clear that f + = max(f, 0), − f = min(f, 0) are also simple functions. They are obtained by flipping ai to max(ai, 0) or max(−ai, 0). 

Pn Definition 5.3.1 Let f = i=1 ai1Ai , in the canonical representation. Suppose that µ(Ai) < ∞ for every i, we then set n X I(f) = aiµ(Ai). i=1 + If in addition f ∈ S , then ai > 0, we may remove the assumption µ(Ai) = ∞, I(f) is still defined non-ambiguously.

Of course I(f) takes the value ∞ if and only if one of the Ai’s has infinite measure. Observe that if f = 0 almost-everywhere then I(f) = 0. 5.3. INTEGRAL OF SIMPLE FUNCTIONS 63

Lemma 5.3.2 If f ∈ S is represented as

M X f = dj1Bj j=1 where Bi are pairwise disjoint, then M X I(f) = diµ(Bi). i=1

Proof Observe that [ Bj = Ai.

{j:dj =ai} Also µ(A ) = P µ(B ) by additive property of measures. Hence i {j:dj =ai} j

n n X X X I(f) = aiµ(Ai) = ai µ(Bj)

i=1 i=1 {j:dj =ai} M X = djµ(Bj), j=1 completing the proof. (It is clear if µ(Ai) = ∞ then one of the µ(Bj) = ∞.) 

We show next that I has the IP.

Proposition 5.3.3 Let f, g ∈ S, vanishing outside of a set of finite measure. Then

1. for any a, b ∈ R, I(af + bg) = aI(f) + bI(g).

2. If f ≤ g a.e., then I(f) ≤ I(g).

Proof Let {Ai} ( respectively {Bi}) be the sets in the canonical representation for f (respectively for −1 −1 g). Let A0 = f (0) and B0 = f (0). Then, Ai ∩ Bj forms a finite collection of disjoint measurable sets, denote this collection by {Ei : i ≤ N}. Then

N N X X f = ai1Ei , g = bi1Ei i=1 i=1 and N X af + bg = (a ai + b bi)1Ei . i=1 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 64

Use Lemma 5.3.2, we see that

N X I(af + bg) = (a ai + b bi)µ(Ei) = aI(f) + bI(g). i=1

Next if f ≤ g a.e. then {f > g} is a measurable set (by comparing their values on Ei as above), and

I(f) − I(g) = I(f − g) ≥ 0.

This proved the monotonicity. 

Example 5.1 The Dirichlet function f can be written as f = 1Qc . Its integral w.r.t. the Lebesgue measure on [0, 1] is λ(Qc ∩ [0, 1]) = 1.

Example 5.2 Let a, b be real numbers. Consider the measure space ([a, b], B([a, b]). Let µ be the the n Lebesgue measure on [a, b]. It is determined by its value on sets of the form A = ∪i=1(ci, di) (union of disjoint intervals), n X µ(A) = (di − ci). i

Consider a = t0 < t1 < ··· < tn = b. Let

n−1 X f(x) = f01{0}(x) + aj1(tj ,tj+1](x). j=0

Since µ({0}) = 0, n−1 Z b X f(x)dx = aj(tj+1 − tj). a j=0

5.4 Integration of non-negative functions

We will allow a non-negative function to take values in the extended real line R. In the sequel, a non- negative (potentially infinite) function f : X → [0, +∞] will be said to be measurable if it is measurable from (X , F) to ([0, ∞], B([0, ∞])). Note that, by Proposition 4.5.1 above, since B([0, ∞]) ⊂ B(R) (see Exercise 4.6.1) this is equivalent to saying that f is measurable from (X , F) to (R, B(R)). Recall that f is measurable from (X , F) to ([0, ∞], B([0, ∞])) iff {f < ∞} is measurable and f −1(A) ∈ F for any A ∈ B(R).

The principle for a measurable function f : X → [0, ∞] is as follows: if f = ∞ on a set of non-zero measure, we should assign ∞ to its integral; if f < ∞ almost everywhere, we ignore those values that are ∞. This fact is encapsulated in the following, comprehensive, definition. 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 65

Definition 5.4.1 If f : X → [0, ∞] is a measurable function, we define the integral of f to be Z fdµ = sup{I(g) : 0 ≤ g ≤ f, g ∈ S}.

We say f is integrable if R fdµ < ∞.

Remark 5.4.1 If µ is a finite measure, then a bounded measurable function is integrable. A special case is when µ is the Lebesgue measure restricted to any bounded interval.

Definition 5.4.2 If A is a measurable set and f1A is integrable, we define Z Z f dµ = f 1A dµ. A

Proposition 5.4.2 1. If f ∈ S+ then I(f) = R fdµ.

2. the integral satisfies monotonicity.

Proof Firstly, let f ∈ S+. We may take g = f ∈ S+, so

sup{I(g): g ≤ f, g ∈ S+} ≥ I(f).

Next, by monotonicity for I, if g ∈ S and g ≤ f, then I(g) ≤ I(f). Hence

sup{I(g): g ≤ f, g ∈ S+} ≤ I(f).

R + This proves that I(f) = f dµ for f ∈ S . Now if f1 ≤ f2 are non-negative and measurable, then Z Z + + f1 dµ = sup{I(g): g ≤ f1, g ∈ S } ≤ sup{I(g): g ≤ f2, g ∈ S } = f2 dµ proving the monotonicity. 

We cannot directly prove the linearity and try to obtain this from that of simple functions by taking limits.

Definition 5.4.3 We say fn → f a.e. if there exists a null set A such that fn → f everywhere outside of the null set A.

Theorem 5.4.3 (Fatou’s lemma) Let fn be a sequence of non-negative measurable functions that con- verges almost-everywhere to a function f, then Z Z fdµ ≤ lim inf fndµ. n→∞ 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 66

Proof Let E = {x : fn(x) −→ f(x)}. At the expanse of replacing fn by 1E fn and f by 1E f, we n→∞ may assume without loss of generality that fn → f pointwise. Making this assumption it is sufficient to show that for any g ∈ S+ with g ≤ f, Z Z g dµ ≤ lim inf fn dµ. n→∞

Step 1. Suppose that R g < ∞, set

X0 = {x : g(x) > 0}.

Since g takes only a finite number of values µ(X0) < ∞ and M = max g < ∞ ( We only need to restrict to X as R g = R g.) 0 X0 X

Fix  > 0. Since g ≤ f = limn→∞ fn, for any x there exists N(x, ) s.t. for n > N(x, ),

fn(x) ≥ (1 − )g(x).

Let An = {x ∈ X0 : fk(x) > (1 − )g(x), ∀k ≥ n}.

Then An is an increasing sequence and ∪nAn = X0. By the continuity property of measures,

lim µ(X0 \ An) = 0. n→∞ Then for k ≥ n, Z Z Z (1 − ) g ≤ fk ≤ fk. An An Also, Z Z Z (1 − ) g = (1 − ) g − (1 − ) g. c An An Hence Z Z Z (1 − ) g − (1 − ) g ≤ fk. c An An Since g ≤ M, we may take  to 0, Z Z Z g − g ≤ lim inf fk, c k→∞ An R R for every n. Take n → ∞ to conclude g ≤ lim infn→∞ fn. Step 2. Suppose that R g = ∞, since g take only a finite number of values, and g ≥ 0, it is necessary that there exists a set of infinite measure on which g takes a positive value a > 0. Denote this set by A. Set 1 A = {x : f (x) ≥ a, ∀n ≥ k}. k n 2 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 67

Then An is increasing. Since limn→∞ fn ≥ g, we have ∪nAn = A and limn→∞ µ(An) = µ(A) = ∞. For all n ≥ k, Z Z 1 1 f ≥ a1 = aµ(A ). n 2 Ak 2 k Thus, Z 1 lim inf fn ≥ aµ(Ak). n→∞ 2 Take k → ∞ lim infn→∞ fn ≥ ∞, this completes the proof. 

Theorem 5.4.4 (Monotone convergence theorem) If fn is a sequence of non-negative measurable func- tions converging almost everywhere to f. Suppose that fn ≤ f for all n. Then Z Z lim fn = lim fn. n n

R R Proof If 0 ≤ g ≤ fn then 0 ≤ g ≤ f, hence for every n, fn ≤ f. Z Z lim sup fn ≤ f. n By Fatou’s lemma, Z Z Z lim inf fn ≥ lim fn = f, n completing the proof. 

The definition of integrals for non-negative functions using the value sup{I(g) : 0 ≤ g ≤ f, g ∈ S} is not easy to use. The monotone convergence theorem allow us to use a sequence. Choose fn ∈ S with fn ↑ f (this exists by Theorem 4.7.3), then Z Z fdµ = lim fn. n→∞

The name ‘Monotone convergence theorem’ comes from the following Corollary.

Corollary 5.4.5 If 0 ≤ f1 ≤ f2 ≤ ... is a sequence of increasing real valued measurable functions, R and let f = supn fn. Then f is integrable if and only if supn fn < ∞, and then Z Z sup fn = sup fn. n n

This means in particular that

Corollary 5.4.6 If fn is a sequence of increasing simple functions with limit f, then Z Z f = lim fn. n→∞ 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 68

Proposition 5.4.7 If f, g are non-negative and measurable, then the following holds: (1)[Linearity] for any a, b ≥ 0, Z Z Z (af + bg) = a f + b g.

(2) [Monotonicity] If f ≥ g then R f ≥ R g. (3) R f = 0 if and only if f = 0 a.e.

Proof (1) Note that linearity holds on S. Now we approximate f, g with positive simple increasing + + sequence of functions. Take fn ∈ S , gn ∈ S such that

fn ↑ f, gn ↑ g. with fn ≤ f and gn ≤ g. Then Z Z Z (afn + bgn)dµ = fndµ + bgndµ,

We apply the monotone convergence theorem to take the limit inside the integrals, proving the linearity.

(2) It is sufficient to show f ≥ 0 a.e. then R f ≥ 0. This is true as the simple functions approximating f is non-negative.

(3) Let f ≥ 0 a.e. If f = 0 a.e. it is clear that R fdµ = 0 (e.g. use monotonicity.)

R 1 ∞ Suppose that f = 0. Let An = {f ≥ n }. Then {f > 0} = ∪n=1An. Z Z Z

µ(An) = 1An dµ ≤ 1An nfdµ ≤ n fdµ = 0.

Thus ∞ X µ({f > 0} ≤ µ(An) = 0, n=1 concluding the proof. 

These shows that we can exchange summation (over a finite number of functions) with integration. For a sum of non-negative functions, this holds also with infinitely many summands, as to be explained below.

Corollary 5.4.8 (Exchange of order of summation and integration) Suppose that fn is a sequence of non-negative measurable functions, then

∞ ∞ Z X X Z fn = fn. n=1 n=1 5.5. INTEGRABLE FUNCTIONS 69

Proof By Proposition 5.4.7, finite sum can be taken out of the integral. Fatou’s lemma allow us to take the limit. Z ∞ Z N Z N X X Monotone X fn = lim fn = lim fn N→∞ N→∞ n=1 n=1 n=1 N ∞ linearity X Z X Z = lim fn = fn, N→∞ n=1 n=1 concluding the proof. 

5.5 Integrable functions

Definition 5.5.1 Let f : X → R be measurable. If both f + and f − are integrable we say that f is integrable, in which case we define its integral to be: Z Z Z f = f + − f −.

Note that f is integrable if and only if |f| is integrable. A bounded measurable function is integrable with respect to a finite measure.

Definition 5.5.2 We denote by the class of integrable functions by L1. The notation L1(X ) is used to emphasize the space. Also L1(µ), L1(X ; µ) are used to emphasize the measure.

Proposition 5.5.1 If f, g are integrable functions,t then

(1) (linearity) For any a, b ∈ R, Z Z Z (af + bg) = a f + b g.

(2) (Monotonicity) If g ≤ f a.e., then R g ≤ R f.

(3) R |f| = 0 if and only if f = 0 almost-everywhere.

(4) | R f dµ| ≤ R |f| dµ.

Proof Part one follows from splitting f = f + − f − and g = g+ − g− and the same property for non-negative functions. If g ≤ f then g − f ≥ 0, so R (g − f) ≥ 0, monotonicity now follows from the linearity. The third property follows from Proposition 5.4.7 and the property |f| ≥ 0. Finally, for (4), since −|f| ≤ f ≤ |f|, by monotonicity we deduce that Z Z Z − |f| dµ ≤ f dµ ≤ |f| dµ, thus yielding the claim.  5.5. INTEGRABLE FUNCTIONS 70

Exercise 5.5.1 Assume that f, g are non-negative measurable functions and h is an integrable function on X sastisfying that f ≥ g + h µ-a.e. Show that Z Z Z f dµ ≥ g dµ + h dµ.

Exercise 5.5.2 If f, g are real valued integrable functions, show the following statements hold:

R 1. If µ(A) = 0 then A f dµ = 0. R 2. If A f = 0 for every measurable set A then f = 0 µ almost-everywhere.

Hint: For part (2), use Proposition 5.4.7 and choose a suitable A.

Exercise 5.5.3 Let f, g : X → [0, ∞] be two non-negative measurable functions. We assume that R R A f dµ = A g dµ for every measurable set A.

1. Show that if µ is also assumed to be finite, then necessarily f = g µ-a.e. 1

2. Show that if µ is σ-finite, then the conclusion f = g µ-a.e. remains true.

Example 5.3 Recall that for x0 ∈ X , the Dirac measure at x0 is defined by: ( 0, if x0 ∈ A δx0 (A) = = 1A(x0). 1, if x0 6∈ A R Let f : X → R be measurable. Then f is integrable and X f dδx0 = f(x0).

Example 5.4 Let p : X → [0, ∞) be integrable. Set Z µp(A) := p(x)µ(dx). A

Then µp defines a measure.

R Proof Firstly µp(φ) = φ p µ = 0. Next if A ∩ B = φ, 1A∩B = 1A + 1B, So Z µp(A ∪ B) = (1A + 1B)pdµ = µp(A) + µp(B).

Finally, if An is an increasing sequence of measurable sets with union A,

lim 1A = 1∪∞ A = 1A, n→∞ n n=1 n

1Hint: Beware in this case f and g may not be integrable, so we may not use the result from the previous exercise. One may rather consider, for all a, b ∈ Q such that 0 ≤ a < b, the measurable subset A = {f ≤ a < b ≤ g}, and show that µ(A) = 0. 5.6. FURTHER LIMIT THEOREMS 71 showing that Z Z Z µp(A) = 1Ap dµ = lim 1A pdµ = lim 1A p dµ = lim µp(An). n→∞ n n→∞ n n→∞

Hence the σ-additive property holds. 

Questions. R Let x0 ∈ R. Can one find a function p : R → R such that, for all A ∈ B(R), δx0 (A) = A p dλ? R Can one find a function p : R → R such that, for all A ∈ B(R), λ(A) = A p dδx0 ?

Exercise 5.5.4 The answer are no and no. Prove it.

Two measures µ and ν on Ω are mutually singular if there are disjoint sets A1 and A2 such that S Ω = A1 A2 with µ(A1) = 0 and ν(A2) = 0. This is denoted by µ ⊥ ν. The Lebesgue measure and any Dirac measures δx are singular.

Definition 5.5.3 We say that µ is absolutely continuous with respect to ν if whenever ν(A) = 0, we have µ(A) = 0. We write µ  ν.

Later we will see that if µ and ν are two finite measures such that µ  ν then we can found a a density function p such that µ is given by integration of p with respect to ν. This is called the Radon-Nikodym Theorem.

5.6 Further limit theorems

Theorem 5.6.1 (Fatou’s Lemma) Suppose that fn is a sequence of non-negative measurable functions, then Z Z lim inf fn dµ ≤ lim inf fn dµ. n→∞ n→∞

Proof Firstly,

lim inf fn = lim inf fm. n→∞ k→∞ m≥k

Observe that infm≥k fm is an increasing sequence and infm≥k fm ≤ fk. So Z Z lim inf fn = lim inf fm n→∞ k→∞ m≥k Z Z = lim inf inf fm ≤ lim inf fk. k→∞ m≥k k→∞

This completes the proof.  5.6. FURTHER LIMIT THEOREMS 72

Theorem 5.6.2 (Dominated convergence theorem) Suppose that fn is a sequence of measurable func- tions converging to a function f almost everywhere. Suppose that there exists an integrable function g such that |fn| ≤ |g|, then fn and f are integrable w.r.t. µ, and

(1) Z Z lim fn dµ = lim fn dµ. n→∞ n→∞

(2) Z lim |fn − f| dµ = 0. n→∞

Proof Since g is integrable and |fn| ≤ |g| µ a.e., the fn are integrable as well. Moreover as fn → f a.e., we deduce that |f| ≤ g a.e., so f is also µ-integrable. Then statement (1) follows from statement (2), as Z Z Z

fn dµ − f dµ ≤ |fn − f| dµ.

Let us prove (2). By the triangle inequality,

|fn − f| ≤ |fn| + |f| ≤ 2|g| a.e., hence the measurable function 2|g| − |fn − f| is non-negative and converges to 2|g| a.e. Therefore, by Fatou’s lemma Z Z Z Z 2 g dµ ≤ lim inf (2|g| − |fn − f|) dµ = 2 g dµ − lim sup |fn − f| dµ. n→∞ n→∞ R R Since g dµ < ∞, we deduce that lim sup |fn − f| dµ ≤ 0, and the claim follows.  n→∞

Exercise 5.6.1 Prove part (2) of the dominated convergence theorem.

Definition 5.6.1 A sequence of measurable functions fn is said to converge to f in L1 if Z lim |fn − f| dµ = 0. n→∞

Exercise 5.6.2 Show that if fn → f in L1 then, Z Z lim fn dµ = f dµ. n→∞

R p Definition 5.6.2 1. A measurable functions with the property |f| < ∞ are said to be in Lp where 1 ≤ p < ∞.

R p 2. If there exists a measurable function f such that |fn − f| dµ → 0, then we say fn converges to f is Lp, in which case f is also in Lp. 5.7. INTEGRALS DEPENDING ON A PARAMETER 73

5.7 Integrals depending on a parameter

As before, let (X , F, µ) be a measure space, and let E be a metric space. In many cases we are integrating a function f : E × X → R 3 (t, x) → f(t, x) with respect to the variable x, and are interested in the R behaviour of X f(t, x) dµ(x) when the value of t changes.

Proposition 5.7.1 Let f : E × X → R be a function, and let t0 ∈ E. We assume that:

1. For all t ∈ E, the function x 7→ f(t, x) is measurable from (X , F) to (R, B(R)).

2. µ(dx)-a.e., the map t 7→ f(t, x) is continuous at the point t0.

3. there exists g : X 7→ [0, ∞] integrable such that, for all t ∈ E, |f(t, x)| ≤ g(x) µ(dx)-a.e.

R Let F : E 7→ R given, for t ∈ E, by F (t) = X f(t, x) µ(dx). Then the function F is well-defined and continuous at t0.

Proof Since for all t ∈ E, |f(t, x)| ≤ g(x) µ-dx a.e. where g is integrable, we deduce that x 7→ f(t, x) is integrable, hence F (t) is well-defined. Moreover, if (tn)n≥1 is a sequence in E converging to t0, then, in view of asssumption 2., f(tn, x) → f(t0, x) µ(dx)-a.e. Moreover, by Assumption 3, for all n ≥ 1, R |f(tn, x)| ≤ g(x). Since g is integrable, by the dominated convergence theorem, f(tn, x)µ(dx) −→ n→∞ R f(t0, x)µ(dx), i.e. F (tn) −→ F (t0). The continuity property for F follows. n→∞ 

Assume now that E = I is an open interval R. The next proposition provides sufficient conditions R for the function t 7→ X f(t, x)dµ(x) to be differentiable at a point t0 ∈ I.

Proposition 5.7.2 Let f : I × X → R be a function, and let t0 ∈ I. We assume that:

1. For all t ∈ E, the function x 7→ f(t, x) is integrable with respect to µ.

∂f 2. µ(dx)-a.e., the map t 7→ f(t, x) admits a derivative at the point t0, which we denote by ∂t (t0, x).

3. there exists an integrable function g : X 7→ [0, ∞] such that, for all t ∈ E, we have

|f(t, x) − f(t0, x)| ≤ g(x) |t − t0|, µ(dx) − a.e. (5.1)

R Then the function F : I → R given, for t ∈ I, by F (t) = X f(t, x) µ(dx), is differentiable at t0, and Z 0 ∂f F (t0) = (t0, x) dµ(x). X ∂t 5.7. INTEGRALS DEPENDING ON A PARAMETER 74

Remark 5.7.3 Under the assumptions of the above Proposition, we thus have Z  Z ∂ ∂f f(t, x) dµ(x) = (t0, x) dµ(x), ∂t ∂t X t=t0 X i.e., we may interchange the derivation and the integration.

Proof By Assumption 1., F (t) is well-defined for all t ∈ I. Moreover, by Assumption 2., for all sequence of points tn ∈ I converging to t0, we have

f(tn, x) − f(t0, x) ∂f −→ (t0, x), µ(dx) − a.e. t − t0 n→∞ ∂t Moreover, Assumption 3. yields the domination property

f(tn, x) − f(t0, x) ≤ g(x), µ(dx) − a.e. t − t0 holding for all n ≥ 1. Since g is integrable, by the Dominated Convergence Theorem Z Z F (tn) − F (t0) f(tn, x) − f(t0, x) ∂f = dµ(x) −→ (t0, x) dµ(x), tn − t0 X t − t0 n→∞ X ∂t which yields the claim. 

Often the following, weaker version of the previous theorem proves convenient. It is less general, but its assumptions are quicker to check and are very often satisfied in most applications.

Proposition 5.7.4 Let f : I × X → R be a function. We assume that:

1. For all t ∈ E, the function x 7→ f(t, x) is integrable with respect to µ.

∂f 2’. µ(dx)-a.e., the map t 7→ f(t, x) is differentiable on I, with derivative at t denoted by ∂t (t, x). 3’. there exists an integrable function g : X 7→ [0, ∞] such that, µ(dx)-a.e, we have

∂f ∀t ∈ I, (t, x) ≤ g(x). ∂t

R Then the function F : I → R given, for t ∈ I, by F (t) = X f(t, x) µ(dx), is differentiable on I, and Z ∂f F 0(t) = (t, x) dµ(x), t ∈ I. X ∂t

Proof By Assumptions 2’. and 3’. and the mean value theorem, for any given t0 ∈ I, the inequality (5.1) is satisfied at t0. Hence the result follows by Proposition 5.7.2. . 

Here are two important examples of applications of the above three propositions 5.8. EXAMPLES AND EXERCISES 75

Example 5.5 () Let ϕ : R → R be a Lebesgue-integrable function, and let % : R → R be a bounded continuous function. Then the convolution % ∗ ϕ : R → R , defined by Z % ∗ ϕ(t) = %(t − x) ϕ(x) λ(dx) is continuous on R. Moreover, if % is C1 and %0 is bounded, then % ∗ ϕ is C1 and (% ∗ ϕ)0 = %0 ∗ ϕ.

Example 5.6 () Let ϕ : R → R be a Lebesgue-integrable function. Its Fourier transform ϕˆ : R → C is defined by Z ϕˆ(ξ) = eiξxϕ(x) λ(dx), ξ ∈ R. R Then ϕˆ is continuous. Moreover, if Z |x| |ϕ(x)| λ(dx) < ∞ R 1 0 R iξx then ϕˆ is C , and for all ξ ∈ R, ϕˆ (ξ) = R ixe ϕ(x) λ(dx).

Remark 5.7.5 The above three propositions give criteria for regularity (continuity and differentiability) R of the function F : t 7→ X f(t, x) µ(dx). One may also ask for criteria of integrability: this question will be tackled in Chapter 7 below with Fubini’s Theorems.

5.8 Examples and exercises

Exercise 5.8.1 If a bounded function f :[a, b] → R is Riemann integrable, then it is Lebesgue measur- able and Lebesgue integrable. Furthermore, the integrals have the common value: Z Z b f(x)dµ = f(x)dx. [a,b] a

Hint. Check that the integrals agree on functions of the form:

n X αi1(ci,di) i=1 where (ci, di) are disjoint intervals. (b−a) Let ∆ be the dyadic partition of [a, b]: tj = 2n . Consider X fn(t) = sup{f(x): x ∈ [ti, ti+1]}1(ti,ti+1](t). j

Then fn is measurable and converges to f a.e. 5.8. EXAMPLES AND EXERCISES 76

Exercise 5.8.2 1. Show that if for two measures µ and ν, R fdµ = R fdν for all f ∈ S, then R R L1(µ) = L1(ν), and fdµ = fdν for all integrable functions.

2. Let E be a measurable set and µE the restriction of µ to the trace σ-algebra FE = {E∩A : A ∈ F}. Show that Z Z fdµ = fdµE. E X Also f ∈ L1(µE) if and only if f1E ∈ L1(µ).

(1) By the construction of integrals, R fdµ = R fdν for all non-negative functions. Thus R |f|dµ = R |f|dν, which are either both finite or both infinite. This implies that L1(µ) = L1(ν) and the integrals are the same for they are defined by the integrals of their positive and negative parts. R R (2) If g is elementary, so is g1E and g1Edµ = gdµE for all elementary g ∈ S. Let fn be an increasing sequence of simple functions converging to f, then fn1E ↑ f1E. Hence Z Z Z Z fdµ = lim fn1E dµ = lim fn dµE = fdµE. E n→∞ n→∞ Exercise 5.8.3 Suppose that p :[a, b] → R is a non-negative continuous function. Set Z x F (x) = F (a) + p(t)dt. a

Let f :[a, b] → R be a continuous function. Describe the Lebesgue-Stilejes measure µF and the integral Z f dµF in terms of Riemann integral.

Sketch. Let A ⊂ B([a, b]). Then, Z Z d Z 1A dµF = µF (A) = p(t)dt = 1A p(t) dt. c [a,b] By linearity for any simple function f, Z Z b fdµF = f(t)p(t)dt. a By Fatou’s lemma we can pass this to non-negative functions, and then by linearity to integrable func- tions: Z Z b fdµF = f(t)p(t)dt. [a,b] a

Example 5.7 Let fn = n1 1 Then, fn → 0 everywhere except at 0. [0, n ] Z Z n fndλ = dx = 1. 0 Hence we cannot exchange the order of taking limit with taking integration in this case. 5.9. THE MONOTONE CLASS THEOREM FOR FUNCTIONS * 77

5.9 The Monotone Class Theorem for Functions *

Proposition 5.9.1 (The Monotone Class Theorem for Functions) Let (Ω, F) be a measure space. Let C be a π system with σ(C) = F. Let H be a vector space of functions from Ω to R, with the following property:

1. 1 ∈ H, and 1A ∈ H for every A ∈ C.

2. (Monotone class property) If fn ∈ H is an increasing sequence of non-negative functions with

f(x) = supn f(x) finite for every x (resp. with f = supn f bounded), then f ∈ H.

Then H contains the set of all real valued (resp. all bounded ) F-measurable functions.

Proof Let J = {B ∈ F : 1B ∈ H}.

By assumption, 1 = 1Ω ∈ H, and J ⊃ C ∪ {Ω}. If A ⊂ B, A, B ∈ J then 1B\A = 1B − 1A ∈ H by the vector space property of H. Thus B \ A ∈ J. Let An ∈ J be an increasing sequence of sets then by condition 2, 1∪∞ A = lim 1A ∈ H. n=1 n n→∞ n ∞ Hence ∪n=1An ∈ J and J is a λ-system. It therefore contains F. This means all indictor functions are in H and simple functions are in J.

Suppose H is closed under monotone limit of functions with bounded limit. If f is bounded positive and measurable there is an sequence of positive simple functions fn converging to f, thus f ∈ H. If f is not positive let f = f + − f − to conclude.

Suppose H is closed under monotone limit of functions with finite limit. If f is positive and measur- able there is an sequence of positive simple functions fn converging to f, thus f ∈ H. Again, if f is not + − positive let f = f − f to conclude. 

Remark 5.9.2 From this we see a meta theorem (means the theorem is likely to hold, of course one needs to give a proof): If a property concerning integration of functions holds for all indicator functions, then it holds for all measurable functions (use Fatou’s lemma to obtain the monotone property) or it holds for all bounded measurable functions ( use dominated convergence theorem to obtain the monotone property).

5.9.1 Lebesgue Integration

In this section let X = R, F the completion of the Borel σ algebra, i.e. consists of Lebesgue measurable sets, and µ the Lebesgue measure λ. The restriction of λ to any measurable subset is also denoted by the same letter. 5.10. THE PUSHED FORWARD MEASURE 78

A function f : R → R is said to be Lebesgue measurable if it is measurable with respect to F/B(R). Borel measurable functions are of course Lebesgue measurable. In this section by a measurable function we refer to a Lebesgue measurable function.

Definition 5.9.1 Integrals with respect to a Lebesgue measure, are called Lebesgue integrals.

Exercise 5.9.1 Let I be a Lebesgue measurable set. Suppose that f : I → R is bounded.

1. If f is Lebesgue measurable, show that Z  Z  inf h dλ, h ≥ f, h ∈ S. = sup g dλ : g ≤ f, g ∈ S (5.2)

2. Show that if (5.2) holds, then f is Lebesgue measurable.

5.10 The pushed forward measure

Suppose we are given two measurable spaces (X , A) and (Y, B). Let µ be a measure on X and f : X → Y be a measurable function. Denote by f∗µ the pushed forward (induced) measure on (Y, B), i.e.

f∗(µ)(B) = µ{x : f(x) ∈ B}.

The right hand side µ(f −1(B)), so we given B the measure of its pre-image.

Proposition 5.10.1 Let ϕ : Y → R a measurable function, we have Z Z ϕ ◦ f dµ = ϕ d(f∗µ). X Y

This is in the sense that ϕ is integrable with respect to f∗µ if and only if ϕ ◦ f is integrable with respect to µ.

Proof This holds for indicator functions of measurable sets by the definition of pushed forward measures. Apply monotone convergence theorem on both sides to see that those functions with the desired property is a monotone class. Apply the monotone class theorem to conclude. 

Let (Ω, F, P) be a probability measure. Let (S, G) be a measurable space. A measurable function from X :Ω → S is called a random variable, S is called its state space. If ϕ : S → R and X :Ω → S are measurable, then Z Z E[ϕ(X)] := ϕ ◦ X dP = ϕ dX∗(P). Ω S

0 R R 0 Exercise 5.10.1 If both Y , Y are real valued integrable functions on (Ω, F, P) with A Y = A Y for every measurable set A, then Y = Y 0 almost surely. 5.11. APPENDIX: RIEMANN INTEGRALS 79

Proof Let A = {Y > Y 0}. By symmetry it is sufficient to proves that P(A) = 0. Thus we only need R to prove the following statement: for any Y ≥ 0, A Y = 0 for any A ∈ F, then Y = 0 almost surely. For this just note that if {Y 6= 0} = {Y > 0} has positive measure then for some a > 0, {Y > a} has 1 positive measure (for otherwise P(Y > 0) = limn→∞ P(Y > n ) = 0), and then Z Y dP ≥ aP(Y > a) > 0, A this contradicts the assumption. Hence µ({Y 6= 0}) = 0 ,completting the proof. 

Example 5.8 Let f : X → Y. Given a measure on Y, can we use f and µ to define a measure on X in a meaningful way? The answer is no in general. There is no good sensible way for this construction, for otherwise you should be able to pull back a direct measure, what that would be if f : {1, 2} → R f(1) = 0 and f(2) = 0? If f : R → R is injective, one can define a measure from a measure on the target space, it would be the pushed forward measure for the inverse map.

Definition 5.10.1 If f is a measurable map from X to X , we say µ is invariant by f if f∗(µ) = µ.

n n n Example 5.9 Let δx be the Dirac measure on R . Then for any transformation T : R → R , T∗(δx) = δT x. The δ-measure is not invariant under rotations (unless x = 0), nor by translation.

5.11 Appendix: Riemann integrals

If a function f :[a, b] → R is Riemann integrable, it is Lebesque integrable and the two integrals agree. In this and the next section we review Riemann integrals and Riemann-Stieltjes integrals.

I do not intend to cover this in class. However you might find this helpful for comparing several notions of integration theory.

We know how to compute the areas of a triangle and a polygon. We then define the integral underneath a continuous curve by rectangle approximation. If y = f(x), x ≥ 0 is the curve, the approximation leads R t R b to 0 f(s)ds, this is the Riemann integral a f(x)dx. (Riemann integrals are covered in M2PM1: Real Analysis)

The Riemann integral of f is defined as follows. Take a partition ∆ : a = x0 < a1 < ··· < an = b. Let Mi and mi be respectively the supremum and infimum values of f on the interval [ai−1, ai]. Define

n n X X U(∆, f) = Mi(ai − ai−1),L(∆, g) = mi(ai − ai−1). i=1 i=1

U(f) = inf U(∆, f),L(f) = sup U(∆, f). ∆ ∆ 5.12. APPENDIX: RIEMANN-STIELTJES INTEGRALS* 80

Definition 5.11.1 A bounded function f :[a, b] → R is Riemann integrable, if U(f) = L(f) and the common value is the Riemann integral of f on [a, b].

A continuous function is Riemann integrable.

Theorem 5.11.1 A bounded function f is Riemann integrable if and only if for every  > 0 there exists a partition such that U(f, ∆) − L(f, ∆) < .

Observe that the oscillation of f on [ai−1, ai] is Osc(f) = Mi − mi, X U(f, ∆) − L(f, ∆) = Osc(f, [ai−1, ai])(ai − ai−1).

Define the Riemann sum: n X ∗ R(f, ∆) = f(xi )(ai − ai−1). i=1

Theorem 5.11.2 Suppose a bounded function f :[a, b] → R is Riemann integrable. If ∆n is a sequence R b of partitions with mesh goes to zero, then their Riemann sum R(f, ∆n) converges to a f(x)dx.

This can be proved using Theorem 5.11.1.

Remark 5.11.3 If f : [0, ∞) → R is measurable and Riemann integrable on every interval [0, n] where n ∈ N, then f is Lebesgue integrable on [0, ∞) if and only if Z N lim |f(x)|dx < ∞. N→∞ 0 R ∞ R If so, 0 f(x)dx = [0,∞) fdλ. The concept ‘Lebesgue integrable’ does not allow cancellation, while improper Riemann integration sin(x) allows it. x is improper Riemann integrable, bot not Lebesgue integrable.

5.12 Appendix: Riemann-Stieltjes integrals*

Riemann-Stieltjes integrals aree defined in parallel to Riemann integrals. Suppose : f[a, b] → R is a bounded function and g a monotone increasing function on [a, b], for example g(x) = x in which we case we are back to Riemann integrals.

Write ∆gi = g(ai) − g(ai−1). Define n n X X U(∆, f, g) = Mi∆gi,L(∆, f, g) = mi∆gi. i=1 i=1 5.12. APPENDIX: RIEMANN-STIELTJES INTEGRALS* 81

U(f, g) = inf U(∆, f, g),L(f, g) = sup U(∆, f, g). ∆ ∆

Definition 5.12.1 If U(f, g) = L(f, g), we say f is Riemann-Stieltjes integrable (with respect to g), we R b denote f ∈ D(g). The common value is denoted by a fdg.

Theorem 5.12.1 (Integrability Criterion) Suppose f is a bounded function and g a monotone increas- ing function on [a, b]. Then f is Riemann-Stieltjes integrable if and only if for any  > 0, there exists a partition ∆ of [a, b] such that U(∆, f, g) − L(∆, f, g) < .

Corollary 5.12.2 Suppose f is a bounded function with only a finite number of discontinuity points, and g a monotone increasing function on [a, b], continuous at every point where f is discontinuous. Then f is Riemann-Stieltjes integrable.

Proposition 5.12.3 Suppose that f, f1, f2 :[a, b] → R are bounded, Riemann-Stieltjes integrable with respect to a monotone increasing function. g :[a, b] → R. Let c ∈ R.

R b R b 1. If f ∈ D(g), then for any c ∈ R, cf ∈ D(g)and a (cf)dg = c a fdg. R b R b R b 2. f1 + f2 ∈ D(g), and a (f1 + f2)dg = a f1dg + a dg. R b R b 3. If f1 ≤ f2 then a f1dg ≤ a f2dg. R b R b 4. If f ∈ D(g) and c > 0 is a constant, then f ∈ D(cg) and a fd(cg) = c a fdg. 0 0 5. If f ∈ D(g) ∩ D(g ), then f ∈ D(g + g ), Z b Z b Z b fd(g + g0) = fdg + ddg0. a a a

Definition 5.12.2 If g = g1 − g2, where monotone increasing functions we define Z Z Z fd(g1 − g2) = fdg1 − fdg2.

Riemann-Stieljes integrals do not exists if the integrator and the integrand have the same point of R b R c R c discontinuity. That a fdF, b fdF exist in Riemann-Stieljes sense do not necessarily imply that a fdF is Riemann-Stieltjes integrable.

Proposition 5.12.4 If g :[a, b] → R is continuous and F :[a, b] → R has bounded variation then the R b R b Riemann-Stieltjes integral a gdF and a F dg exist. If F is furthermore absolutely continuous and the integral equals to the corresponding Lebesgue integral: Z b Z b gdF = gF 0dx. a a 5.13. APPENDIX: FUNCTIONS WITH BOUNDED VARIATION* 82

5.13 Appendix: Functions with bounded variation*

The most commonly used integrators are functions of bounded variations. Functions of bounded varia- tions are differences of increasing functions.

Given an interval [a, b], define P ([a, b]) to be the set of all partitions of [a, b]:

P ([a, b]) = {t = (t0, t1, . . . , tn): a = t0 < t1 < ··· < tn−1 < tn = b, n ∈ N}.

Let ∆t = max0≤i≤n−1(ti+1 − ti) be the modulus of the partition. Write

n X X |f(u) − f(v)| = |f(tj) − f(tj−1)|. [u,v]⊂∆ j=1

Definition 5.13.1 The total variation of a function f on [a, b] is defined by X |f|TV ([a, b]) = sup |f(u) − f(v)|. ∆∈P ([a,b]) [u,v]⊂∆

If this number is finite we say that f has bounded total variation over [a, b].

1 The function x sin( x ) does not have bounded variation over any interval containing 0. Note that f could have bounded total variation over [a, b] without having a bounded variation over R (e.f. f(x) = sin x).

Theorem 5.13.1 A. function is of bounded variation on [a, b] if and only if f is the difference of two monotone real-valued functions on [a, b].

Proof Let f = f1 − f2, where f, g are increasing functions, then X |f|TV ([a, b]) = sup |f1(u) − f1(v) − (f2(u) − f2(v))|. ∆∈P ([a,b]) [u,v]⊂∆



For f :[a, b] → R, define a function:

|f|TV (x) = |f|TV ([a, x]).

Example 5.10 1. If f : R → R is an increasing function, |f|TV (x) = f(x) − f(−∞).

2. If f is locally Lipschitz continuous, then f is of bounded variation on any finite time interval.

3. The set of bounded variation function form a vector space. 5.13. APPENDIX: FUNCTIONS WITH BOUNDED VARIATION* 83

If f : R → R is an increasing then it has left and right derivatives at every point and there are only a countable number of points of discontinuity. Furthermore f is differentiable almost everywhere.

Theorem 5.13.2 Let F : R → R+ be a function of bounded variation, right continuous with F (−∞) = 0. Then

1. F determines a µ on R.

2. The Borel measure µ, with distribution function F , is absolutely continuous with respect to the Lebesgue measure dx if and only if Z x F (x) = F 0(t)dt. 0

3. It is singular with respect to dx if and only if F 0 = 0.

Chapter 6

Lp spaces

Lebesgue spaces, also known as Lp spaces, are a family of Banach spaces that play a fundamental role in functional analysis. Their construction, which relies heavily on Lebesgue’s theory of integration, is outlined here. Throughout this chapter, we consider a fixed measure space (X , B, µ).

6.1 Holder’s inequality

Let f : X → R measurable. For all p ∈ [1, ∞), we set

1 Z  p p kfkp := |f| dµ ∈ [0, ∞].

Moreover, for p = ∞, we set

kfk∞ := inf{M ∈ [0, ∞]: |f| ≤ M µ − a.e.} ∈ [0, ∞].

Remark 6.1.1 Note that |f| ≤ kfk∞ µ-a.e. Indeed, by definition of kfk∞, for all n ≥ 1  1  µ |f| ≥ kfk + = 0 ∞ n so that   [  1  µ({|f| > M}) = µ |f| ≥ kfk + = 0.  ∞ n  n≥1

So indeed |f| ≤ kfk∞ µ-a.e, and kfk∞ is thus the smallest constant that bounds |f| µ-a.e. We also call it the essential supremum of f.

Exercise 6.1.1 Show that, if µ is a probability measure, then as p ↑ ∞, kfkp → kfk∞.

85 6.2. DEFINITION AND BASIC PROPERTIES OF THE LP SPACES 86

Remark 6.1.2 For all p ∈ [1, ∞], k · kp is homogeneous, i.e. for all f : X → R measurable and all a ∈ R, kafkp = |a|kfkp.

Theorem 6.1.3 (Holder’s¨ inequality (Cauchy-Schwarz if p = q = 2)) Let 1 ≤ p, q ≤ ∞ such that 1 1 1 d p + q = 1 (with the convention ∞ = 0). Then, for all f, g : X → R measurable, Z |f · g|dµ ≤ kfkp kgkq.

Proof If p = 1 and q = ∞, then |fg| ≤ |f| kgk∞ µ-a.e., hence Z Z |fg|dµ ≤ |f| kgk∞dµ = kfk1kgk∞, so the claim follows. The case where p = ∞ and q = 1 is symmetric, so we now focus on the case p, q ∈ (1, ∞).

R p If kfkp = 0, then |f| dµ = 0, whence f = 0 µ -a.e.. Hence we also have fg = 0 µ-a.e., so R |fg| dµ = 0. Similarly, the inequality holds if kgkq = 0, hence we may assume kfkp, kgkq > 0. By homogeneity, at the expanse of considering f g f˜ = , g˜ = kfkp kgkp we may even assume kfkp = kgkq = 1. We now claim the following statement, known as Young’s inequality: xp yq ∀x, y ≥ 0, xy ≤ + . p q To prove that claim, note that the case where x or y vanishes is straightforward, so we may assume x, y > 0. Then, setting u = xp and v = yq, by convexity of the exponential, we have u v u1/p v1/q ≤ + , p q which yields the claim. Now, by Young’s inequality, we have

Z Z |f|p |g|q  kfkp kgkq 1 1 |f| |g| dµ ≤ + dµ = p + q = + = 1 = kfk = kgk , p q p q p q p q which yields the requested inequality. 

6.2 Definition and basic properties of the Lp spaces

For p ∈ [1, ∞], we set d Lp = {f : X → R : measurable, kfkp < ∞}. 6.2. DEFINITION AND BASIC PROPERTIES OF THE LP SPACES 87

p p p That Lp is a vector space is clear from the elementary inequality |a + b| ≤ Cp|a| + Cp|b| , for all p ∈ [1, ∞).

Question: Does kfkp define a norm on Lp? As we saw it is homogeneous, and it also satisfies the triangle inequality which follows from the following:

Theorem 6.2.1 (Minkowski’ inequality) Let f, g ∈ Lp where 1 ≤ p ≤ ∞. Then, f + g ∈ Lp and

kf + gkp ≤ kfkp + kgkp.

Proof If p = 1 or p = ∞, the inequality follows at once from the triangle inequality, so we restrict our attention to the case p ∈ (1, ∞). We may also assume WLOG that kf + gkp > 0. Let q ∈ (1, ∞) such 1 1 p that p + q = 1, then p = 1 + q . Hence

Z Z p p p 1+ q kf + gkp = |f + g| dµ = |f + g| dµ

Z p = |f + g| |f + g| q dµ

Z p Z p ≤ |f| |f + g| q dµ + |g| |f + g| q dµ where we used the triangle inequality in the last line. Using Holder’s¨ inequality, noting that

1 1 Z p  q Z  q p ·q p q |f + g| q dµ = |f + g| dµ = kf + gkp ,

we therefore obtain p p p q q kf + gkp ≤ kfkpkf + gkp + kgkpkf + gkp , p(1− 1 ) q 1 1 whence kf + gkp ≤ kfkp + kgkp. Since p(1 − q ) = p p = 1, the claim follows. 

Unfortunately, k · kp falls short of defining a norm on Lp because of the following observation: a function f ∈ Lp satisfies kfkp = 0 iff f = 0, µ a.e. In general, a vast collection of non-zero functions may vanish µ a.e.

Example 6.1 If (X , F) = (R, B(R)) and µ = δ0, then f = 0 µ-a.e. iff f(0) = 0.

To fix this degeneracy, we introduce an equivalence relation on functions as follows:

Definition 6.2.1 Two functions f, g : X → R are considered to be in the same equivalent class, and we write f ∼ g, if f = g µ-a.e.

Note that this equivalence relation is compatible with the normed vector space structure of Lp. More precisely, if f ∼ f 0 and g ∼ g0, then for all a, b ∈ R, af + bg ∼ af 0 + bg0. Moreover, if f ∼ f 0, then 0 p kfkp = kf kp. This allows us to introduce a normed vector space L as follows: 6.2. DEFINITION AND BASIC PROPERTIES OF THE LP SPACES 88

Definition 6.2.2 For 1 ≤ p ≤ ∞, the space Lp(X , F, µ) (often written Lp(µ), or just Lp) is the space of equivalence classes of measurable functions f such that f ∈ Lp.

Loosely speaking, p L = {f : f : X → R measurable, kfkp < ∞}, but keeping in mind that f = g in Lp as soon as f = g µ-a.e.

p Proposition 6.2.2 The L spaces are vector spaces, they are Banach spaces under the norm k · kLp .

p That L is a Banach space means that it is complete for the norm k · kLp , i.e. for every Cauchy p p sequence fn ∈ L , there exists f ∈ L such that

lim kfn − fkp = 0. n→∞

Proof We have already seen that k·kp is homogeneous and verifies the triangle inequality (by Minkowski’sinequality). p Moreover, note that kfkp = 0 if and only if |f| = 0 µ a.e., i.e. if and only if f = 0 in L . Hence k · kp indeed defines a norm on Lp.

We recall a useful fact: a normed vector space is a Banach space if and only if any absolutely conver- p P∞ gent sequence is convergent. Let fn ∈ L with n=1 kfnkp = M < ∞. Our aim is to show that there p Pn p exists f ∈ L such that fk −→ f in L . We first assume that p < ∞. We define k=1 n→∞

n X gn(x) = |fk(x)|. k=1

Pn Then, by Minkowski’s inequality, for all n ≥ 1, kgnkp ≤ k=1 kfkkp ≤ M. Hence, invoking the monotone convergence theorem, we obtain

Z ∞ !p Z Z X p p p |fk(x)| dµ = lim |gn| dµ = lim |gn| dµ ≤ M < ∞ n→∞ n→∞ k=1

P∞ P∞ Hence the first integral above is finite, so k=1 |fk(x)| < ∞ µ-a.e., i.e. the series k=1 |fk(x)| con- P∞ verges absolutely in R µ(dx)-a.e Then we may define f(x) = k=1 fk(x) whenever the series con- Pn verges absolutely, and f(x) = 0 otherwise. Then f : X → R is measurable. Set hn(x) = k=1 fk(x). P∞ p p Then |hn(x)| ≤ k=1 |fk(x)| ∈ L . By the dominated convergence theorem (since |fn − f| ≤ p P∞ p 1 p 2 ( k=1 |fk(x)|) ∈ L ), |fn − f| converges in L1, hence fn → f in Lp. This completes the proof when p < ∞. When p = ∞, we have, µ-a.e.,

∞ ∞ X X |fk(x)| ≤ kfkk < ∞, k=1 k=1 6.3. DENSITY RESULTS 89

P∞ so again the series k=1 fk(x) converges absolutely in R µ(dx)-a.e. Then we may define f(x) = P∞ k=1 fk(x) whenever the series converges absolutely, and f(x) = 0 otherwise. We have the bound P∞ ∞ |f| ≤ k=1 kfkk∞ < ∞ µ-a.e., so f ∈ L . Moreover, µ(dx)-a.e., n ∞ ∞ X X X | fk(x) − f(x)| ≤ |fk(x)| ≤ |fk|∞, k=1 k=n+1 k=n+1 Pn P∞ Pn ∞ so that | fk − f|∞ ≤ |fk|∞ −→ 0, i.e. fk converges to f in L . k=1 k=n+1 n→∞ k=1 

Pn Remark 6.2.3 In the proof above we actually established a stronger fact: if the series k=1 fk converges p p Pn p absolutely in L , there exists f ∈ L such that fk −→ f µ-a.e. and in L . In particular, we k=1 n→∞ deduce the following, useful result.

p Corollary 6.2.4 Let p ∈ [1, ∞). If a sequence (fn)n≥1 converges to f in L , we can extract a subse- quence (fϕ(n))n≥1 converging to f µ-a.e.

−n Proof Since kfn −fkp −→ 0, we can extract a subsequence (fϕ(n))n≥1 such that kfϕ(n) −fkp ≤ 2 , n→∞ Pn p so that the series k=1(fϕ(k) − fϕ(k−1)) converges absolutely in L (where we set fϕ(0) := 0). Thus, Pn p by the previous remark, fϕ(n) = k=1(fϕ(k) − fϕ(k−1)) converges µ-a.e. to its limit in L , which is f. 

∞ Exercise 6.2.1 If p = +∞, and if a sequence (fn)n≥1 converges to f in L , then (fn)n≥1 converges to f µ-a.e. (without need to extract a subsequence).

6.3 Density results

When we need to prove a certain property for all functions in Lp, where p ∈ [1, ∞], it is often convenient to prove the property on a smaller set of functions, and argue by a density argument that the property remains true on the whole space Lp. We state here some fundamental density results.

p Theorem 6.3.1 Let p ∈ [1, ∞]. If f ∈ L (X ) then there exists a sequence of simple functions fn such p p that limn→∞ kfn − fkp = 0. In words, S ∩ L (X ) is dense in L (X ).

Proof First assume that p < ∞. We may assume that f ≥ 0 (standard methods passes to f = f + −f −). Let fn ≤ f be a sequence of non-negative increasing simple functions converging to f. Define

p p gn = f − (f − fn) .

p Then gn → f , and gn is non-negative and increasing , by the monotone convergence theorem, Z Z  p p p lim f − (f − fn) dµ = f dµ n→∞ 6.3. DENSITY RESULTS 90 i.e. Z p lim (f − fn) dµ = 0, n→∞ ˜ ˜ and the claim follows. If p = ∞, then |f| ≤ kfk∞ < ∞ µ-a.e. Setting f = 1{|f|≤kfk∞}, then f = f µ-a.e. Moreover f˜is bounded, so there exists a sequence simple functions fn converging to f uniformly. Therefore kfn − fk∞ = kfn − f˜k∞ −→ 0, n→∞ thus concluding the proof. 

Corollary 6.3.2 Let 1 ≤ p < ∞, then Lp(Rd) is a separable metric space.

Pn d Proof The subset D of simple functions i=1 ai1Ai where Ai = Πj=1(ci,j, di,j) with ai, ci,j, di,j ∈ Q is a countable subset of Lp(Rd). We also claim that D is dense in Lp(Rd). Let us prove this claim. For convenience of notation, let us assume that d = 1 (the general case is similar). We know that the subspace S ∩ Lp(R) is dense in Lp(R). Note that S ∩ Lp(R) is the subspace of functions of the form Pn i=1 ai 1Ai , where n ≥ 1, ai ∈ R \{0}, and Ai ∈ B(R) with λ(Ai) < ∞ . Since Q is a dense subset of R, the claim will follow upon showing that if A ∈ B(R) satisfies λ(A) < ∞, then

∀ > 0, ∃f ∈ D, k1A − fkp ≤ . (6.1) Note moreover that for all A ∈ B(R) such that λ(A) < ∞, by the Dominated Convergence Theorem, p 1A∩(−N,N) −→ 1A in L (R), so it is enough to prove (6.1) holds for all subset A ∈ B((−N,N)), for N→∞ N ≥ 1.

Given N ≥ 1, let us therefore consider A, the collection of subsets A ∈ B((−N,N)) satisfying (6.1). A is a λ-system, indeed:

• 1(−N,N) ∈ D, hence (−N,N) ∈ A

• if A, B ∈ A and A ⊂ B, then for all  > 0 there exist f, g ∈ D such that k1A − fkp < /2, and k1B − gkp < /2. Since 1B\A = 1B − 1A, it follows by the Minkowski inequality that k1B\A − (g − f)kp < ; since g − f ∈ D, this shows that B \ A ∈ A.

• if (An)n≥1 is a non-decreasing sequence of elements of A, and A = ∪n≥1An, then 1A −→ 1A n n→∞ in Lp by the Dominated Convergence Theorem. Hence, for all  > 0, there exists a n ≥ 1 such

that k1An − 1Akp < /2. Since An ∈ A, there exists f ∈ D such that k1An − fkp < /2. By Minkowski’s inequality, we deduce that k1A − fkp < , so A ∈ A.

Thus A is a λ-system, and it contains C := {(c, d): c, d ∈ Q ∩ [−N,N], c < d}. Since C is a π-system and σ(C) = B((−N,N)), by the π − λ Theorem, we deduce that A = B((−N,N)). Hence D is indeed a dense countable subset Lp(Rd), which is therefore separable.

 6.4. THE CASE OF FINITE MEASURES 91

Remark 6.3.3 Beware that L∞(Rd) is not separable.

∞ d ∞ d Theorem 6.3.4 Let 1 ≤ p < ∞. The space Cc (R ) of C functions on R with compact is a dense subspace of Lp(Rd).

∞ d p d ∞ d Proof First note that Cc (R ) ⊂ L (R ). Indeed, if f ∈ Cc (R ), then there exists N ≥ 1 such that f is supported in [−N,N]d. Since moreover f is continuous, it is therefore bounded. Thus Z Z p p p d |f| dλ = |f| 1[−N,N]d dλ ≤ kfk∞ λ([−N,N] ) < ∞,

p d ∞ d p d so f ∈ L (R ). We now prove that Cc (R ) is dense in L (R ). By the proof of Corollary 6.3.2, the Pn d subset D of simple functions i=1 ai1Ai where Ai = Πj=1(ci,j, di,j) with ai, ci,j, di,j ∈ Q is dense in p d d L (R ). Hence it suffices to show that for any subset A of the form Πj=1(cj, dj), with cj, dj ∈ Q, there ∞ d p exists a sequence of functions fn ∈ C (R ) such that fn −→ 1A in L . We now recall the following c n→∞ ∞ useful fact: there exists a non-decreasing sequence of functions ϕn ∈ Cc (R) such that

• 0 ≤ ϕn ≤ 1(0,1) for all n ≥ 1

• ϕn −→ 1(0,1) pointwise from below. n→∞

d ∞ d Given A = Πj=1(cj, dj), with cj, dj ∈ Q, and cj < dj, we now define fn ∈ Cc (R ) by setting   d xj − cj d fn(x) = Πj=1 ϕn , x = (x1, . . . , xd) ∈ R . dj − cj

Then we have

• 0 ≤ fn ≤ 1A for all n ≥ 1

• fn −→ 1A pointwise from below. n→∞

p By the Dominated Convergence Theorem, we deduce that fn −→ 1A in L , as requested. n→∞ 

6.4 The case of finite measures

Here we state some fundamental properties specific to the case when µ is a finite measure.

Proposition 6.4.1 If µ is a finite measure, then for all r, r0 ∈ [1, ∞] with r < r0, Lr0 (µ) ⊂ Lr(µ). 6.4. THE CASE OF FINITE MEASURES 92

0 ∞ Proof If r = ∞, and f ∈ L (µ), then f ≤ kfk∞ µ a.e., hence for all r ∈ [1, ∞), Z r r |f| dµ ≤ kfk∞ µ(X ) < ∞

r 0 r0 r0 so f ∈ L (µ), thus proving the claim. Let us now assume r < ∞. Let p = r and q = r0−r , so that p, q 0 are two conjugate numbers in (1, ∞). If f ∈ Lr (µ), by Holder’s¨ inequality

1/r 1/pr 1/qr Z  Z  Z  r0−r r pr 0 kfkr = |f| 1 dµ ≤ |f| dµ 1 dµ = kfkr0 µ(X ) rr < ∞,

r so f ∈ L (µ), thus proving the claim. 

Remark 6.4.2 In the special case where µ is a probability measure, the proof above shows that we 0 r0 furthermore have kfkr ≤ kfkr0 for all r > r ≥ 1 and all f ∈ L (µ). In probabilistic notations, 0 0 0 E(|X|r)1/r ≤ E(|X|r )1/r for any random variable X ∈ Lr (µ).

Terminology: If X is a random variable on a probability space (Ω, F, P), for all p ∈ [1, ∞), E(|X|p) is called the p-th moment of X.

We will now state an inequality that plays an important role in probability theory: Jensen’s inequality. We first recall the definition of a convex function.

Definition 6.4.1 A function ϕ : Rd → R is convex if

ϕ(tx + (1 − t)y) ≤ tϕ(x) + (1 − t)ϕ(y) for any x, y ∈ Rd and for any t ∈ [0, 1].

The following are convex functions on Rd: ϕ(x) = |x|, |x|p, p > 1. If d = 1, ex is convex, and so are 00 ϕ(x) = xp for p > 1. Moreover, if ϕ is twice differentiable, ϕ is convex if and only if ϕ ≥ 0.

Theorem 6.4.3 (Jensen’s Inequality) If ϕ : R → R+ is a convex function and µ is a probability mea- sure, then for all f ∈ L1(X , F, µ), Z  Z ϕ fdµ ≤ ϕ(f) dµ.

Proof It is a fact that if ϕ is convex then there exists a constant c ∈ R such that Z   Z  ∀y ∈ R, ϕ(y) ≥ ϕ fdµ + c y − fdµ .

Take y = f(x), Z   Z  ϕ(f(x)) ≥ ϕ fdµ + c f(x) − fdµ .

Integrate this over with respect to the probability measure µ (see Exercise 5.5.1) to conclude.  6.4. THE CASE OF FINITE MEASURES 93

Pn Example 6.2 Let x1, . . . xn be points in R and suppose that µ = i=1 λiδxi is a probability measure on R. Then apply Jensen to f(x) = x and any convex ϕ:

n n X X ϕ( λixi) ≤ λiϕ(xi). i=1 i=1

6.4.1 Mode of convergence

To wrap up we note the commonly used notions of convergence. Let (X , F, µ) be a measure space with σ-finite measure, and fn, f : X → R be Borel measurable functions.

Definition 6.4.2 We say fn converges to f in Lp if Z p |fn − f| dµ → 0.

Definition 6.4.3 We say fn converges to f in measure if for any  > 0,

lim µ(x : |fn(x) − f(x)| > ) = 0. n→∞

Definition 6.4.4 We say fn converges to f a.s. if there exists a null set N and for any x 6∈ N,

lim fn(x) = f(x). n→∞

Remark 6.4.4 If fn → f in Lp it converges in measure.

Chapter 7

Product measures and Fubini’s Theorem

Throughout this section let (X , F, µ) and (Y, G, ν) be two measure spaces. A (measurable) product set is of the form A × B where A ∈ F and B ∈ G. Denote by C the collection of products measurable set.

C = {A × B : A ∈ F,B ∈ G}.

The product σ-algebra is generated by such sets.

F ⊗ G = σ(C).

Can we assign a measure to F ⊗ G that is consistent to µ and ν?

If X = {a1, . . . , an}, we use the power set σ-algebra. If we assign any non-negative values to ai, X say pi, this determines a measure on 2 (no relations is needed on {pi}, as intersections of singletons are emptysets, the union of two singletons is no longer a singleton). The power σ-algebra has precisely 2n elements. Every measure on the finite space X , with the power σ-algebra, is determined uniquely by their values of the singleton sub-sets.

Let Y = {b1, . . . , bm}, then X × Y = {(ai, bj) : 1 ≤ i ≤ n, 1 ≤ j ≤ m}, and every element of X Y X Y X ×Y X × Y is contained in 2 ⊗ 2 . Thus 2 ⊗ 2 = 2 . Hence by assigning a value to every {(ai, bj)}, this is a product set, this determines a measure on the product σ-algebra. Given µ on X and ν on Y, we define the by

µ × ν({(ai, bj)}) = µ({ai})ν({bj}).

Similarly if X is given by a partition {A1,...,An}, and Y is given a partition {B1,...,Bm}, and if n m we set F = σ({A1,...,An}) and G = σ({B1,...,Bm}). Then there are 2 and 2 elements in F and G respectively. If we assign values µ(Ai), this determine a measure on X (because the intersections of the Ai are emptyset, no relation between these numbers are neeeded for extending these to every element of F). Every measure on F is obtained in this way. Also F ⊗ G = σ({Ai × Bj}). It is clear that

95 7.1. PRODUCT MEASURES 96

Ai × Bj is a partition of X × Y. Thus this is totally analogous to the finite state spaces discussed earlier. Furthermore given µ, ν on F and G respectively, the product measure on F ⊗ G will be defined by

µ × G(Ai × Bj) = µ(Ai)ν(Bj).

If F and G are not finite σ-algebras, they are uncountable, the construction for the product measure becomes much more involved.

7.1 Product measures

Recall the definition of elementary family in Definition 2.2.4.

Lemma 7.1.1 The collection C of product sets is an elementary family. The collection A of a finite number of disjoint unions of product sets is an algebra.

Proof The empty set is in C. Also,

(A × B) ∩ (A0 × B0) = (A ∩ A0) × (B ∩ B0), (A × B)c = (A × Bc) ∪ (Ac × B) ∪ (Ac × Bc).

Thus, C is closed under taking complement, and the complement of a set is the disjoint union of a finite number of product sets. By exercise 2.2.4, the collection of finite number of disjoint union of sets from C is an algebra. 

n Let us define a function µ × ν : A → [0, ∞] by the following rules: If E = ∪j=1(Ai × Bj) where Ai ∈ F and Bi ∈ G, then we want to define

n X µ × ν(E) = µ(Ai)ν(Bi). (7.1) j=1

We must show that this is independent of the expression of E into disjoint unions of product sets.

∞ Lemma 7.1.2 Let E = A × B ∈ C. Suppose that E = ∪j=1(Aj × Bj) where Ai ∈ F and Bi ∈ G, where {Aj × Bj} are disjoint sets. then

∞ X µ(A)ν(B) = µ(Aj)ν(Bi). j=1

Proof Observe that 1A×B(x, y) = 1A(x)1B(y) for x ∈ X ,B ∈ Y. Also,

∞ ∞ X X 1A×B(x, y) = 1Ai×Bi (x, y) = 1Ai (x)1Bi (y). i=1 i=1 7.1. PRODUCT MEASURES 97

P∞ Holding y fixed, we may integrate 1A(x)1B(y) = i=1 1Ai (x)1Bi (y) with respect to µ. ∞ Z Z X 1A(x) 1B(y) dµ(x) = 1Ai (x)1Bi (y)dµ(y). X X i=1 Use Corollary 5.4.8 for exchanging the summation and integration,

∞ X µ(A)1B = µ(Ai)1Bi . i=1 We use the convention 0 · ∞ = 0. ( So if µ(A) = ∞, then the left hand side is ∞ if y ∈ B and 0 otherwise. The right hand side is interpreted in the same way. Observe that Bi ⊂ B and Ai ⊂ A.) P∞ Integrating both sides from the above line w.r.t. ν we obtain µ(A)ν(B) = i=1 µ(Ai)ν(Bi), com- pleting the proof. 

S∞ S∞ Lemma 7.1.3 Suppose that i=1(Ai × Bi), j=1(Cj × Dj) are disjoint unions and

∞ ∞ [ [ (Ai × Bi) = (Cj × Dj), i=1 j=1 then ∞ ∞ X X µ(Ai)ν(Bi) = µ(Cj)ν(Dj). i=1 j=1

S∞ Proof Let E = j=1(Ai × Bi). Then,

∞ ∞ [ [ (Ai × Bi) = ( (Ai × Bi)) ∩ E i=1 j=1 ∞ ∞ [ [ = (Ai × Bi) ∩ (Cj × Dj) i=1 j=1 ∞ ∞ [ [ = (Ai ∩ Cj) × (Bi ∩ Dj). i=1 j=1 The last is also a union of disjoint sets.

For each i, we apply the previous lemma to

∞ [ Ei = Ai × Bi = (Ai ∩ Cj) × (Bi ∩ Dj) j=1 7.1. PRODUCT MEASURES 98 to see ∞ X µ(Ai × Bi) = µ(Ai ∩ Cj)ν(Bi ∩ Dj) j=1 So, ∞ ∞ ∞ X X X µ(Ai × Bi) = µ(Ai ∩ Cj)ν(Bi ∩ Dj). i=1 i=1 j=1 Similarly ∞ ∞ ∞ X X X µ(Cj × Dj) = µ(Ai ∩ Cj)ν(Bi ∩ Dj). j=1 i=1 j=1 This finishes the proof. 

The map (7.1) defines a pre-measure on A. Firstly µ × ν(φ) = 0, secondly it has finite additive S∞ property on A. Furthermore, suppose that E = i=1 Ei, where Ei ∈ A are disjoint, with the property P∞ that E ∈ A, we show that µ(E) = i=1 µ(Ei). On one hand, each Ei is a finite disjoint union of product sets, we pull these product sets , that composes of Ei, together: they are all disjoint. Relabel S∞ P these disjoint product sets we see E = i=1 Ai × Bi. Re-arrange it, we see that µ(E) = µi(Ei). With this we define an outer measure µ∗, by (3.1):

 ∞ ∞  ∗ X [  µ (E) = inf µ × ν(Ej): E ⊂ Ej,Ej ∈ A . j=1 j=1 

By the Caratheodory theorem, the outer measure µ∗ is a measure on F ⊗ G and agrees with µ × ν on A. This measure on F ⊗ G will be called the product measure. If µ and ν are σ-finite, there exists a unique measure such that the measure of product sets is the product of the measures on each factor.

If µ, ν are finite, so is µ×ν(X ×Y) = µ(X )ν(Y) < ∞. If they are both σ-finite, so is µ×ν. Indeed, there exist An ↑ X and Bn ↑ Y such that µ(An) < ∞, ν(Bn) < ∞ for every n. Now {Ai × bj} is an increasing sequence converging to X × Y with finite measure, hence µ × ν is σ-finite.

Remark: Since a countable union of measures of A is a countable union of product sets,

 ∞ ∞  ∗ X [  µ (E) = inf µ(Aj)ν(Aj): E ⊂ Aj × Bj,Aj ∈ F,Bj ∈ F ∈ A . j=1 j=1 

Remark 7.1.4 Similar constructions can be used to obtain a product measure on a finite number of copies of product spaces and their product σ-algebras.

Let us take for example three measure spaces (Xi, Fi, µi). Recall that

3  3 ⊗i=1Fi = σ( Πi=1Ai : Ai ∈ Fi, i = 1, 2, 3 ). 7.1. PRODUCT MEASURES 99

We define µ1 × µ2 × µ3 by

3 µ1 × µ2 × µ3(A1 × A2 × A3) = Πi=1µi(Ai).

Proposition 7.1.5 Let (Xi, Fi, µi) be measurable spaces. We have,

3 (F1 ⊗ F2) ⊗ F3 = ⊗i=1Fi.

If µi are σ-finite, then

(µ1 × µ2) × µ3 = µ1 × µ2 × µ3.

 3 Proof It is clear that, Πi=1Ai : Ai ∈ Fi, i = 1, 2, 3 ⊂ {E × C : E ∈ F1 ⊗ F2,C ∈ F3}. Hence

3 ⊗i=1Fi ⊂ (F1 ⊗ F2) ⊗ F3.

On the other hand, for any C ∈ F3,

3 {E × C : E ∈ F1 ⊗ F2} ⊂ ⊗i=1Fi. so 2 3 {E × C : E ∈ F1 ⊗ F2,C ∈ F3} ⊂ ∪C∈F3 {E × C : E ∈ ⊗i=1Fi} ⊂ ⊗i=1Fi.

Thus 3 (F1 ⊗ F2) ⊗ F3 = σ({E × C : E ∈ F1 ⊗ F2,C ∈ F3}) ⊂ ⊗i=1Fi.

Now define µ1 × µ2 on F1 ⊗ F2. Then, (µ1 × µ2) × µ3 and µ1 × µ2 × µ3 agree on the product sets. These product set determines σ-finite measures. 

Example 7.1 Let λ be the Lebesgue measure, the completion of the product measure on R × R is called the Lebesgue measure on R2. This is denoted by the same letter or by λ2. Our question is: given f : R2 → R measurable, what is Z fdλ2? R2

Remark 7.1.6 If each factor (X , F, µ) and (Y, G, ν) are complete, (i.e. the σ-algebra is complete with respect to the given measure), the product σ-algebra is often NOT complete with respect to the product measure. We can of course complete the product σ-algebra under the product measure.

Exercise 7.1.1 Give an example of a Borel measure on R2 (i.e. on the Borel σ-algebra) that is not a product measure. 7.1. PRODUCT MEASURES 100

7.1.1 Sections of subsets of product spaces

If E ⊂ X × Y, for x ∈ X , we define the section

Ex = {y ∈ Y :(x, y) ∈ E}.

Similarly for y ∈ Y we define Ey = {x ∈ X :(x, y) ∈ E}.

Proposition 7.1.7 If E ∈ F ⊗ G, then

1. Ex ∈ G for every x ∈ X and ν(Ex) is measurable.

2. Ey ∈ F for every y ∈ Y and µ(Ey) is measurable.

3. Also, Z Z y µ × ν(E) = ν(Ex)µ(dx) = µ(E )ν(dy). X Y

Proof (1) We first assume that the measures are finite. Let

D = {E ∈ F ⊗ G : conclusion holds}.

Observe that ( B, x ∈ A, (A × B) = x φ, x 6∈ A. Take µ measure of the above, ( ν(B), x ∈ A, ν((A × B) ) = x 0, x 6∈ A.

Integrate w.r.t. x, we see

ν((A × B)x)µ(dx) = ν(B)µ(A) = µ × ν(A × B).

A similar conclusion holds for the y-sections. This means D contains the set of products sets, which is a π-system.

If A ⊂ B, A, B ∈ D then

y y y (A \ B)x = Ax \ Bx ∈ G (A \ B) = A \ B ∈ F. Z Z Z ν((A \ B)x)µ(dx) = ν(Ax)dµ(x) − ν(Bx)dµ(y), 7.2. FUBINI’S THEOREM 101

A similar conclusion for the y-sections. Hence A \ B ∈ D. If An ∈ D is an increasing sequence, [ [ ( An)x = (An)x ∈ G. n

Thus D is a λ system and thus equals F ⊗ G, as it contains a generating set which is a π-system. This concludes the finite measure case.

(2) Suppose that the measures are σ-finite, we take Ai ×Bi increasing with finite measure, then apply the proposition to E ∩ (Ai × Bi), we see that ( E ∩ B , if x ∈ A (E ∩ (A × B )) = x i i i i x φ, otherwise, is measurable for every x, then so is Ex = ∪i(E ∩ (Ai × Bi))x. Similarly conclusions for the y-sections. Since, Z Z y µ × ν(E ∩ (Ai × Bi)) = ν((E ∩ (Ai × Bi))x)µ(dx) = µ((E ∩ (Ai × Bi))x)ν(dy), X Y we may take i → ∞ and use the monotone convergence theorem to conclude. 

Proposition 7.1.8 If f : X × Y → R is measurable, then so are the following functions:

y fx = f(x, ·): Y → R, f = f(·, y): X → R are measurable for every x ∈ X and for every y ∈ Y.

Proof For any B ∈ B(R),

(f y)−1(B) = {x : f(x, y) ∈ B} = (f −1(B))y ∈ F, −1 −1 (fx) (B) = {y : f(x, y) ∈ B} = (f (B))x ∈ G.

This conclusion follows from Proposition 7.1.7. 

7.2 Fubini’s Theorem

Theorem 7.2.1 (The Fubini-Tonelli Theorem) Let µ and ν be σ-finite measures.

R R y (1) Suppose f : X ×Y → [0, ∞] is measurable. Then g(x) = fx(y)dν(y) and h(y) = f (x)dµ(x) are both non-negative and measurable. Furthermore, Z Z Z  Z Z  fd(µ × ν) = f(x, y)dν(y) dµ(x) = f(x, y)dµ(x) dν(y). (7.2) X Y Y X 7.2. FUBINI’S THEOREM 102

(2) Suppose f ∈ L1(µ × ν), then

(a) fx ∈ L1(ν) for a.e. x y (b) f ∈ L1(µ) for a.e. y ∈ Y. R (c) The almost everywhere defined functions g1(x) = fx(y)dν(y) ∈ L1(µ) and g2(y) = R y f (x)dµ(x) ∈ L1(ν) are both integrable. (d) Furthermore, (7.2) holds.

Proof This theorem holds for characteristic functions, and therefore holds for simple functions by non- linearity. We prove (1) first. Let f n ∈ S+ such that f n ↑ f. Then Z Z n gn(x) = f (x, y)dν(y) ↑ g(x) = f(x, y)dν(y), Z Z n hn(y) = f (x, y)dµ(x) ↑ h(y) = f(x, y)dµ(x).

So g, h are measurable and Z Z Z  Z Z  fnd(µ × ν) = fn(x, y)dν(y) dµ(x) = fn(x, y)dµ(x) dν(y). X Y Y X Apply the monotone convergence theorem to conclude (1). R R R Also, fd(µ × ν) < ∞ implies that Y fn(x, y)dν(y) is finite for a.e. x and X fn(x, y)dµ(x) is finite for a.e. y.

For a general integrable function f, apply this to f + and f −, and use the statement above to conclude. 

Remark 7.2.2 For simplicity we remove the brackets and write Z Z Z Z  f(x, y)dν(y)dµ(x) = f(x, y)dν(y) dµ(x), X Y X Y Z Z Z Z  f(x, y)dν(y)dµ(x) = f(x, y)dν(y) dµ(x). X Y Y X Corollary 7.2.3 Fubini’s theorem holds if we replace µ × ν by its completion, provided that µ, ν are complete.

To see this we note that if f is measurable with respect to the completion of F ⊗ G, then there exists an F ⊗ G -measurable function fˆ such that f = fˆ except on a null set E, which by enlarging we may y assume to be F × G measurable. Then µ(Ex) = 0 for a.e. x, ν(E ) = 0 for a.e. y. Since µ, ν are complete, they are measurable. Thus, we may apply Fubini’s theorem to fˆ, Z Z ZZ ZZ f dµ × ν = fˆ dµ × ν = fˆ(x, y) dµ(x)dν(y) = fˆ(x, y) dν(y)dµ(x). X ×Y X ×Y 7.3. PRODUCT MEASURES AND INDEPENDENCE 103

We know that fˆ(x, ·) is measurable for almost every x and f(x, ·) differs at most on Ex, a null set. Similarly we conclude f(·, y) is measurable a..e. y, and also they are in L1, the conclusion follows.

Exercise 7.2.1 Let X = {1,...,N} be a finite state space. Write down the the Fubini-Tonelli Theorem specific to this case.

Given a measure µ on X × Y. We say µi are the marginals of µ if

µ(X × B) = µ2(B), µ(A × Y) = µ1(A), ∀A ∈ F,B ∈ G.

We also say that µ is a coupling of µ1 and µ2.

2 Exercise 7.2.2 Given an example of a Borel measure µ on R , two Borel measures µi on R such that µ is not the product measure µ1 ×µ2, but µi are marginals of µ. Given another set of examples of measures such that µ = µ1 × µ2 but µi are not the Lebesgue measure.

R Note that if f is a non-negative measurable function, µˆ(A) := A f dµ is a measure.

Exercise 7.2.3 f : R → R f(x) = x2 + 1 λ For given by √ and the Lebesgue measure, compute the R x−1 pushed forward measure. f∗λ and compute [1,2] e d(f∗λ).

(x2−y2) Exercise 7.2.4 Let f(x, y) = (x2+y2)2 for x ∈ [0, 1] and y ∈ (0, 1], also f(0, 0) = 0. Show that f 6∈ L1([0, 1] × [0, 1]).

7.3 Product measures and independence

The notion of product measure admits an important interpretation in probability theory with the notion of independence. We first recall the following definitions, for which we assume that we are given a probability space (Ω, F, P). We will say that A is a sub-σ-algebra of F if it is a σ-algbera contained in F.

Definition 7.3.1 1. We say that two sub-σ-algebras A and B of F are independent if any A ∈ A and any B ∈ B are independent, i.e. they satisfy P(A ∩ B) = P(A)P(B).

2. Two random variables X and Y on (Ω, F, P) are independent if σ(X) is independent of σ(Y ).

3. A random variable Y is said to be independent of a sub-σ-algebra A of F, if σ(Y ) and A are independent.

Thus, if (X , A) and (Y, B) are two measurable spaces, and if X :Ω → X and Y :Ω → Y are two random variables, then X and Y are independent if and only if, for all A ∈ A and B ∈ B,

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B). (7.3) 7.3. PRODUCT MEASURES AND INDEPENDENCE 104

Let us denote by PX (resp. PY ) the probability distribution of X (resp. Y ): this is a probability measure on (X , A) (resp. (Y, B)). Let us further denote by P(X,Y ) the probability distribution of the random variable (X,Y ):Ω → X × Y: this is a probability measure on (X × Y, A ⊗ B).

Theorem 7.3.1 The random variables X and Y are independent if and only if P(X,Y ) = PX × PY .

Proof X and Y are independent if and only if, for all A ∈ A and B ∈ B, the equality (7.3) holds, and this equality can be written P(X,Y )(A × B) = PX (A) PY (B). Thus X and Y are independent if and only if P(X,Y ) satisfies the characteristic property of the product measure PX × PY , whence the claim. 

Corollary 7.3.2 If X and Y are independent, then for all measurable functions f : X → [0, ∞] and g : Y → [0, ∞], we have E(f(X) g(Y )) = E(f(X)) E(g(Y )).

Proof By the transfer lemma, we have Z E(f(X) g(Y )) = f(x) g(y)dP(X,Y )(x, y). X ×Y

But since X and Y are independent, P(X,Y ) = PX × PY , so by the Fubini-Tonnelli Theorem we obtain Z Z E(f(X) g(Y )) = f(x) dPX (x) g(y)dPY (y) X Y = E(f(X)) E(g(Y )) where we used the transfer lemma to obtain the last equality. 

Exercise 7.3.1 Assume that f : X → R and g : Y → R are measurable functions such that the random variables f(X) and g(Y ) are integrable. Show that if X and Y are independent, then the random variable f(X) g(Y ) is integrable and

E(f(X) g(Y )) = E(f(X)) E(g(Y )). Chapter 8

Radon-Nikodym Theorem

In this chapter we compare two measure, discuss the concepts of singular and absolutely continuity.

8.1 Singular and absolutely continuity

Let (X , F) be a measurable space.

Definition 8.1.1 We say that a measure µ is absolutely continuous with respect to ν if whenever ν(A) = 0, we have µ(A) = 0.

We write µ  ν. Given any two finite measures µ1, µ2 we can find a reference measure (e.g. µ = µ1 + µ2, such that µ1  µ and µ2  µ).

Definition 8.1.2 Two measures µ and ν on (X , F) are mutually singular if there are disjoint sets A1 and A2 such that X = A1 ∪ A2 such that µ(A1) = 0 and ν(A2) = 0. This is denoted by µ ⊥ ν.

Remark 8.1.1 Note that the above property is equivalent to the existence of a measurable subset B satisfying µ(Bc) = ν(B) = 0.

The Lebesgue measure and any Dirac measures δx are singular.

Exercise 8.1.1 If ν is a measure on (X , G) and p : X → [0, ∞) is an integrable function. Show that Z µ(A) = p(y)ν(dy) A defines a measure and µ  ν. We write µ = p ν, or dµ = p dν.

105 8.1. SINGULAR AND ABSOLUTELY CONTINUITY 106

Example 8.1 The N(0, 1) on R1 is absolutely continuous w.r.t. the Lebesgue mea- sure. Z 2 1 − |x| N(0, 1)(A) = √ e 2 dx. A 2π

R 2 Example 8.2 Let X = [0, 1] and P the Lebesgue measure . Define Q(A) = 2 A x dx so

dQ (x) = 2x2. dP

Define Q1 by dQ1 = 21[0, 1 ]. dP 2

Then Q1  P and P is not absolutely continuous with respect to Q1. Define Q2 by

dQ2 = 21[ 1 ,1]. dP 2

The two measures Q1 and Q2 are singular.

The following lemma will be useful in the sequel:

Lemma 8.1.2 Let µ, µ˜ and ν three measures on a measurable space (X , F) such that µ ⊥ ν and µ˜ ⊥ ν. Then there exists a measurable subset B such µ(Bc) =µ ˜(Bc) while ν(B) = 0.

Proof Since µ ⊥ ν, there exist measurable disjoint susbets A1 and A2 such that A1 ∪ A2 = X and µ(A1) = ν(A2) = 0. Since moreover µ˜ ⊥ ν, there also exist measurable disjoint susbets A˜1 and A˜2 c such that A˜1 ∪ A˜2 = X and µ˜(A˜1) = ν(A˜2) = 0. Setting B = A2 ∪ A˜2, it follows that µ(B ) = ˜ c ˜ µ(A1 ∩ A1) ≤ µ(A1) = 0, and similarly µ˜(B ) = 0, while ν(B) ≤ ν(A1) + ν(A1) = 0. 

Exercise 8.1.2 Let ν be a finite measure. Sow that ν  µ if and only if the following holds: for any  > 0 there exists δ > 0 such that whenever µ(E) < δ then ν(E) < .

Exercise 8.1.3 Show that if f ∈ L1(µ) is a non-negative function, then for every  > 0 there exists δ > 0 such that whenever µ(A) < δ we have Z fdµ < . E

R Note that ν(A) = A dµ defines a measure with ν  µ.

Exercise 8.1.4 Give examples of pairs of measure µ, ν with µ  ν; then give examples such that µ ⊥ ν. 8.2. SIGNED MEASURE 107

8.2 Signed measure

Definition 8.2.1 Let (X , F) be a measurable space. A signed measure on (X , F) is a map µ : F → R satisfying the following properties:

1. µ(φ) = 0, P 2. if (Aj)j≥1 is a sequence of disjoint measurable sets, then the series j≥1 µ(Aj) is absolutely convergent in R and ∞ ∞ X µ(∪j=1Aj) = µ(Aj). j=1

Remark 8.2.1 We stress that, in the definition of a signed measure µ, we impose µ to have finite real values (this is sometimes refered to as a finite signed measure). We thus allow for A ∈ F such that µ(A) < 0, but not for µ(A) ∈ {±∞}.

Terminology: When talking about genuine measures (as introduced in Chapter 3), we will often use the expression “positive measures” to avoid confusion with signed measures.

1 R Example 8.3 If ν is a positive measure, and f ∈ L (X , F, ν), then µ(A) := A f dν defines a signed measure.

Definition 8.2.2 A measurable set P is said to be positive for µ if for any measurable F ⊂ P , we have µ(F ) ≥ 0. In other words µ restricts to a (positive) measure on P . A measurable set N is said to be negative for µ if for any F ⊂ N, we have µ(F ) ≤ 0; in other words (−µ) restricts to a (positive) measure on N.

∞ Exercise 8.2.1 If Pn is a sequence of positive sets for µ, then so is ∪n=1Pn.

Sn−1 Note that each Pn \ k=1 Pn is positive for µ.

Lemma 8.2.2 If A ∈ F satisfies µ(A) < 0, then there exists a negative subset N ⊂ A such that µ(N) < 0.

Proof Let δ1 := sup{µ(B): B ∈ F,B ⊂ A}. Note that φ ⊂ A, so that the collection of sets over which we are taking the sup is non-empty, and δ1 ≥ µ(φ) = 0. Thus δ1 ∈ [0, ∞]. By definition of δ1, δ1 we can find B1 ∈ F such that B1 ⊂ A and µ(B1) ≥ min( 2 , 1). Continuing in this way we define, by induction over n, a sequence of non-negative numbers δn and a sequence of disjoint measurable subsets Bn by setting ( n−1 ) [ δn := sup µ(B): B ∈ F,B ⊂ A \ Bi i=1 8.2. SIGNED MEASURE 108

n−1  δn and by taking a measurable subset Bn ⊂ A\ ∪i=1 Bi such that µ(Bn) ≥ min( 2 , 1). Note in particular that µ(Bn) ≥ 0. We define ∞ ! [ N := A \ Bi . i=1 ∞ P∞ Note that, since the Bi are disjoint, µ(∪i=1Bi) = i=1 µ(Bi) ≥ 0. Hence ∞ µ(N) = µ(A) − µ(∪i=1Bi) ≤ µ(A) < 0. We finally show that N is a negative set. Let B be a measurable subset of N. Then, for all n ≥ 1, n−1  B ⊂ N ⊂ A \ ∪i=1 Bi so, by the definition of δn, µ(B) ≤ δn. But δn → 0 as n → ∞: indeed, since P the Bn are disjoint, the series n≥1 µ(Bn) is absolutely convergent, so in particular µ(Bn) → 0 when δn n → ∞; since 0 ≤ min( 2 , 1) ≤ µ(Bn), the claim follows. Hence µ(B) ≤ 0. Thus N is a negative set. 

Theorem 8.2.3 (The Hahn decomposition theorem) If µ is a signed measure on (X , F), then there exists a partition of X : X = P ∪ N and P ∩ N = φ, such that P is positive and N is negative for µ.

If X = P 0 ∪ N 0 is another such partition then µ(P ∆P 0) = 0 and µ(N∆N 0) = 0.

Proof Let c := inf{µ(N): N ∈ F negative set}. Note that φ is a negative set, so the collection of subsets over which we are taking the inf is non-empty, and c ≤ µ(φ) = 0. Hence c ∈ [−∞, 0]. Let (Nn) be a sequence of negative sets such that µ(Nn) → c as n → ∞, and let [ N := Nn. n≥1

Note that N is a negative set as a countable union of negative sets. Moreover, for all n ≥ 1, N \Nn ⊂ N, so that µ(N) − µ(Nn) = µ(N \ Nn) ≤ 0. Thus c ≤ µ(N) ≤ µ(Nn) and, sending n → ∞, we deduce that µ(N) = c. In particular, we deduce that c > −∞. Let P := N c. We show that P is a positive set. If not, there exists A ⊂ P with µ(A) < 0. By Lemma 8.2.2, one can then find a negative set A˜ ⊂ A such that µ(A˜) < 0. Then the set N˜ := N ∪ A˜ is negative, and, since N and A˜ are disjoint, µ(N˜) = µ(N) + µ(A˜) < µ(N) = c, which contradicts the definition of c. Therefore P is indeed a positive set, and (N,P ) gives the a partition of X into a negative and a positive subset.

If X = P 0 ∪ N 0 is another such partition, then P ∆P 0 = (P ∩ (P 0)c) ∪ (P c ∩ P 0) = (P ∩ N 0) ∪ (N ∩ P 0) = N∆N 0. In particular P ∆P 0 ⊂ P ∪ P 0 and, since P ∪ P 0 is a positive subset, we deduce that µ(P ∆P 0) ≥ 0. But we also have P ∆P 0 ⊂ N ∪ N 0 which, since N ∪ N 0 is a negative subset, implies µ(P ∆P 0) ≤ 0. So 0 0 0 µ(P ∆P ) = 0, and therefore µ(N∆N ) = µ(P ∆P ) = 0.  8.2. SIGNED MEASURE 109

Theorem 8.2.4 (Jordan decomposition theorem) If µ is a signed measure on (X , F), then there exist uniquely finite positive measures µ+ and µ− such that

µ = µ+ − µ−, µ+ ⊥ µ−.

Proof Take µ+ = µ(P ∩ ·), µ− = −µ(N ∩ ·) where X = P ∪ N as in Hahn decomposition theorem. To see it is unique suppose µ = Q1 − Q2 where Q1 and Q2 are two finite positive measures such that Q1 ⊥ Q2. Then X = E1 ∪ E2, with disjoint subsets E1,E2 such that Q1(E2) = 0 and Q2(E1) = 0. Then E1 and E2 provide another Hahn decompostion, by its uniqueness we conclude. 

We can talk about a signed measure being absolutely continuous with respect to a (positive ) measure.

Definition 8.2.3 If µ is a signed measure on (X , F), we set |µ| = µ+ + µ−. |µ| is a finite positive measure, is called the total variation of µ.

Definition 8.2.4 If ν is a (positive) measure, we say µ  ν if µ(A) = 0 whenever ν(A) = 0 where A ∈ F. On the other hand, we say that µ ⊥ ν if there are two disjoint sets A1,A2 ∈ F such that X = A1 ∪ A2, and ν(A2) = 0 while µ(B ∩ A1) = 0 for all measurable subset B.

Exercise 8.2.2 1. Show that µ  ν if and only if |µ|  ν. Further show that this holds if and only if µ+  ν and µ−  ν.

2. Show that µ ⊥ ν if and only if |µ| ⊥ ν. Further show that this holds if and only if µ+ ⊥ ν and µ− ⊥ ν.

+ − Definition 8.2.5 If µ is a signed measure we set L1(µ) = L1(µ ) ∩ L1(µ ), and for f ∈ L1(µ), we set Z Z Z fdµ = fdµ+ − fdµ−.

Lemma 8.2.5 Suppose that µ and ν are finite (positive) measures. Then either µ ⊥ ν or there exists  > 0 and a measurable set E such that ν(E) > 0 and E is a positive set for µ − ν.

Proof By Hahn’s decomposition theorem, there exists a partition X = Pk ∪Nk such that Pk is a positive 1 set and Nk a negative set for the signed measure µ − k ν. S 1 Let P = k Pk and N = ∩kNk. Then N, a subset of Nk, is negative for every µ − k ν, in particular 1 µ(N) − ν(N) ≤ 0. k Taking k → ∞, we see that µ(N) ≤ lim 1 ν(N) = 0, so that µ(N) = 0. If µ, ν are not mutually k→∞ k singular, ν must charge the complement of N which is P , i.e. we must have ν(P ) > 0. Then, since P is a countable union of Pk’s we have ν(Pk) > 0 for some k. We take  = 1/k and E = Pk which is 1 positive for µ − k ν.  8.3. RADON-NIKODYM THEOREM 110

8.3 Radon-Nikodym Theorem

Use with caution the following (for it might be confused with , I used this in proofs.) We use µ ≤ ν to indicate that µ(A) ≤ ν(A) for every measurable set A.

Theorem 8.3.1 (Lebesgue-Radon-Nikodym Theorem) Let µ and ν be two σ-finite positive measures on (X , F). Then

(1) [Lebesgue decomposition] there exist, uniquely, σ-finite positive measures µ1 and µ2 with the property that µ = µ1 + µ2, µ1 ⊥ ν, and µ2  ν.

(2) [Radon-Nikodym Theorem] There exists a measurable function D : X → [0, ∞] such that µ2(A) = R A D dν for all A ∈ A. Any two such functions agree ν almost-everywhere. (3) If µ is finite, then D is integrable with respect to ν.

dµ Definition 8.3.1 We denote D by dν and call it the Radon-Nikodym derivative of µ with respect to ν.

Proof Step 1. We first assume that µ and ν are finite positive measures. Let us define  Z  H = f : X → [0, ∞]: fdν ≤ µ(A) for all A ∈ F . A We choose the ‘largest f’ from H as follows. Let Z  M = sup fdν : f ∈ H . X This is possible since H is not empty: f ≡ 0 ∈ H. Also any f ∈ H has finite integral R fdν ≤ µ(X ) so M < ∞.

Next we see, If f, g ∈ H then max(f, g) ∈ H. Let E = {f < g}. Z Z Z max(f, g)dν = gdν + fdν ≤ µ(A ∩ E) + µ(A ∩ Ec) = µ(A). A E∩A Ec∩A

There exists a sequence of functions fn ∈ H such that Z lim fndµ = M. n

Set gn = max{f1, . . . , fn} ∈ H. This is an increasing sequence and Z Z fn ≤ gn ≤ M. X X 8.3. RADON-NIKODYM THEOREM 111

R R By the monotone convergence theorem, limn ↑ gndµ = limn gndµ = M. Set D = lim ↑ gn, n Then D ∈ H. Indeed, it is measurable and for any A ∈ F, Z Z Z Ddν = 1ADdν = lim 1A gndν ≤ µ(A). A n We show that µ−Ddν is singular w.r.t. ν. If not there exists  > 0 and E measurable such that ν(E) > 0 with µ − Ddν − 1Edν > 0 on E, i.e. for every A ∈ F, Z (D + 1E)dν ≤ µ(A ∩ E). A∩E Then Z Z Z c (D + 1E)dν = (D + 1E)dν + Ddν ≤ µ(A ∩ E) + µ(A ∩ E ) = µ(A) A A∩E A∩Ec and (f + 1E) ∈ H and its integral is larger than M, which contradicts with M being the largest integral among all f ∈ H.

Uniqueness of the Lebesgue decomposition: if (˜µ1, µ˜2) is another Lebesgue decomposition of µ wrt ν, then µ1+µ2 =µ ˜1+˜µ2, so we have the following equality between signed measures µ1−µ˜1 =µ ˜2−µ2. c c Since µ1 ⊥ ν and µ˜1 ⊥ ν, by Lemma 8.1.2 there exists B ∈ F such that µ1(B ) =µ ˜1(B ) = 0 while ν(B) = 0. Hence, for all A ∈ F,

(µ1 − µ˜1)(A) = (µ1 − µ˜1)(A ∩ B) = (˜µ2 − µ2)(A ∩ B) = 0, where, to obtain the last equality, we used that 0 ≤ ν(A ∩ B) ≤ ν(B) = 0 which implies ν(A ∩ B) = 0 and hence µ2(A ∩ B) =µ ˜2(A ∩ B) = 0 as µ˜2 and µ2 are absolutely continuous w.r.t. ν. So µ1(A) = µ˜1(A). Since A was arbitrary, µ1 =µ ˜1, and therefore µ2 =µ ˜2 as well.

Step 2. Suppose µ and ν are σ-finite (positive) measures. Then there exists Ej disjoint with X = j j ∪jEj, µ(Ei) < ∞ and ν(Ej) < ∞. Let µ = µ(Ej ∩ ·) and ν = ν(Ej ∩ ·). By the previous j j argument, for all j ≥ 1, there exist two finite positive measures µ1 and µ2, and a measurable function j j j j j j Dj : X → [0, ∞] such that µ = µ1 + µ2, µ1 ⊥ ν and dµ2 = D dν. Then µ = µ1 + µ2 where P∞ j P∞ j P∞ µ1 = j=1 µ1 and µ2 = j=1 µ2. Moreover, setting : D = j=1 1Ej Dj; we have dµ2 = D dν: indeed, recalling that the Dj are non-negative measurable functions, for all A ∈ F we have   ∞ Z ∞ Z Z ∞ Z X j X X µ2(A) = Dj dν = 1Ej Dj dν =  1Ej Dj d ν = D d ν. j=1 A j=1 A A j=1 A

j j j c Finally, µ1 ⊥ ν. Indeed, for all j ≥ 1, since µ1 ⊥ ν , there exists Aj ∈ F such that µ1(Aj) = 0 while j ν (Aj) = 0. Let A := ∪j≥1(Ej ∩ Aj). Note that, since Ej form a partition of X , this is a disjoint union, 8.3. RADON-NIKODYM THEOREM 112 so ∞ ∞ X X j ν(A) = ν(Ej ∩ Aj) = ν (Aj) = 0. j=1 j=1 c c Moreover, A = ∪j≥1(Ej ∩ Aj), so that ∞ ∞ ∞ c X c X j c X j c µ1(A ) = µ1(Ej ∩ Aj) = µ1(Ej ∩ Aj) ≤ µ1(Aj) = 0, j=1 j=1 j=1

k k where, for the second equality, we used that µ1(Ej) ≤ µ (Ej) = 0 for all k 6= j. Therefore µ1 ⊥ ν as claimed.

Uniqueness of the Lebesgue decomposition: If (˜µ1, µ˜2) is another Lebesgue decomposition for µ with respect to ν, then for all j ≥ 1, the measures µ˜1(Ej ∩·) and µ˜2(Ej ∩·) provide a Lebesgue decomposition j j j of µ with respect to ν . By uniqueness in the finite case, we deduce that µ˜1(Ej ∩ ·) = µ1 and µ˜2(Ej ∩ j ·) = µ2. Hence, for i = 1, 2 X X j µ˜i = µ˜i(Ej ∩ ·) = µi = µi, j≥1 j≥1 and uniqueness follows.

Step 3. There remains to show that if D and D˜ are two non-negative measurable functions on X such ˜ ˜ R R ˜ that dµ2 = D dν and dµ2 = D dν, then D = D ν-a.e. But in that case we have A D dν = A D dν for all A ∈ F, and the result follows from Exercise 5.5.3.



1 Exercise 8.3.1 (1) In Step 1 of the proof, why did we not take the density to be D(x) = supf∈H f(x)? (2) If we have a finite state space and discrete σ-algebra, construct the function D directly.

(3) If one has a finite σ-algebra, construct this function D directly.

Terminology: We use ‘essentially unique’ to say that if D1 and D2 are two possible choices for the density of µ with respect to ν, then D1 = D2 ν-a.e. We now state a version of Lebesgue-Radon-Nikodym Theorem corresponding to the case of a signed measure.

Theorem 8.3.2 (Lebesgue-Radon-Nikodym Theorem for a signed measure) Let ν be a σ-finite posi- tive measures, and µ be a signed measure on (X , F). Then,

1Hint: Let A be a non-measurable set, let us consider the collections of functions indexed by A as follows: ( 1, x = y f (x) = = 1 (x). y 0, x 6= y {y}

Is supy∈A f(y) measurable? 8.3. RADON-NIKODYM THEOREM 113

(1) [Lebesgue decomposition] there exist, uniquely, two signed measures µ1 and µ2 with the property that µ = µ1 + µ2, µ1 ⊥ ν, µ2  ν.

(2) [Radon-Nikodym Theorem] There exists an integrable function D : X → R such that µ2(A) = R A D dν for all A ∈ F. Any two such functions agree ν almost-everywhere.

(3) If µ is a positive measure then D ≥ 0 ν-a.e.

Proof The existence of µ1, µ2 and D follows by applying the previous Lebesgue-Radon-Nikodym Theorem to the positive measures µ+ and µ−. To prove uniqueness of the Lebesgue Decomposition, assume that we have µ = µ1 + µ2 =µ ˜1 +µ ˜2 with µ1 ⊥ ν and µ˜1 ⊥ ν, while µ2  ν and µ˜2  ν. In particular |µ1| ⊥ ν and |µ˜1| ⊥ ν, hence by Lemma 8.1.2 there exists a measurable susbet B such that c c |µ1|(B ) = |µ˜1|(B ) = 0, while ν(B) = 0. Then, for all measurable subset A,

c µ1(A) = µ1(A ∩ B) + µ1(A ∩ B ) = µ1(A ∩ B),

c c c where we used that |µ1(A ∩ B )| ≤ |µ1|(A ∩ B ) ≤ |µ1|(B ) = 0 to obtain the second equality. But since ν(A ∩ B) = 0 and µ2  ν, we also have µ2(A ∩ B) = 0. Hence the above equality yields µ1(A) = µ(A ∩ B). By symmetry we also have µ˜1(A) = µ(A ∩ B), so µ1 =µ ˜1, and therefore we also obtain µ2 = µ − µ1 = µ − µ˜1 =µ ˜2. Finally, the uniqueness ν-a.e. of the function D follows from Exercise 5.5.3, while statement (3) follows from the uniqueness of the density D in the Radon-Nikodym Theorem for positive measures. 

dµ Definition 8.3.2 Also in this case, we denote D by dν and call it the Radon-Nikodym derivative of µ with respect to ν. Note that in this case, D corresponds to a unique element of L1(X , F, ν).

R Exercise 8.3.2 Define a Borel measure on R by µ(A) = A p(x)dx + 1A(2π) where p is a continuous function vanishing everywhere on the complement of [−1, 1]. Is µ absolutely continuous with respect to the Lebesgue measure dx? Write down its Lebesgue decomposition w.r.t. dx

Proposition 8.3.3 Let µ1, µ2, µ3 be σ -finite measures, positive measures.

1. Suppose µ  µ . If g ∈ L (µ ) then g dµ1 ∈ L (µ ) and 1 2 1 1 dµ2 1 2 Z Z dµ1 gdµ1 = g dµ2. dµ2

2. If µ1  µ2, µ2  µ3 then µ1  µ3, and

dµ1 dµ1 dµ2 = , µ3 − almost-everywhere. dµ3 dµ2 dµ3 8.3. RADON-NIKODYM THEOREM 114

Proof (1) If g = 1A and µ1(A) < ∞, we know by the definition, Z dµ1 µ1(A) = dµ2. A dµ2 So the claim holds. This extends to simple functions by linearity, to non-negative functions by the monotone convergence theorem and to any L1 functions by linearity again. (2) The transitivity of the absolute continuity is clear(check!). Use part (1), first suppose that dµ1 ∈ dµ2 L1(µ2), we see Z g= dµ1 Z dµ1 dµ2 dµ1 dµ2 µ1(A) = dµ2 = dµ3. A dµ2 A dµ2 dµ3

Then we assume µ1 is a positive σ-finite measure. break down X = ∪Ej where µ1(Ej) < ∞ and Ej are disjoint measurable sets. We apply the argument, on each Ej, then sum them up. For a signed measure + − we break it down to m1 = µ − µ and to each we apply the above conclusion. 

Definition 8.3.3 If µ  ν and ν  µ we say that they are equivalent. In this case they have the same set of measure zero’s.

dµ 1 Proposition 8.3.4 If µ and ν are equivalent then dν = dν a.e. dµ Chapter 9

Conditional Expectations

Let us take a probability space (Ω, F, P). The σ-algebra F represents the set of all observation and measurements one can make on a physical system. From observations ( representing partial information) we would like to make predictions on the value of an observables, an observable is a measurable function ( a random variable). The observed information is represented by a sub-σ-algebra G. Suppose G = σ(Y ) where Y is a random variable (this represents information obtained by for example an experiment). If a random variable X :Ω → R is furthermore measurable w.r.t. G, then there exists a Borel measurable function ϕ : R → R, such that X = ϕ(Y ), thus the observable X is determined by the information in G (i.e. by Y ). If X is not measurable, can we predict its value by G, what is the best approximation for it?

We denote by EX or E(X) the integral of X, called the (mathematical) expectation of X: Z EX = XdP. Ω

This is the best predicted value for X given F. If X is real valued, EX minimizes, among all constants, 2 the value minc E(X − c) .

9.1 Conditional expectation and probabilities

Definition 9.1.1 Let X be a real-valued random variable on some probability space (Ω, F, P) such that E|X| < ∞ and let G be a sub σ-algebra of F. Then the conditional expectation of X with respect to G is an integrable, G-measurable random variable X0 such that Z Z X(ω) P(dω) = X0(ω) P(dω) , (9.1) A A for every A ∈ G. We denote this by X0 = E(X | G).

115 9.1. CONDITIONAL EXPECTATION AND PROBABILITIES 116

Proposition 9.1.1 With the notations as above, there exists a unique element conditional expectation X0 = E(X | G) exists and is unique as an element of L1(Ω, G, P). If moreover X ≥ 0 a.s., then E(X|G) ≥ 0 a.s.

Proof Denote by P also the restriction of P to G and define the signed measure Q on (Ω, G) by Z Q(A) = X(ω) P(dω), ∀A ∈ G. A Then Q is a signed measure on (Ω, G) which is absolutely continuous with respect to P . Its density with respect to P , given by the Radon-Nikodym theorem, is then a G-measurable functions and is the required conditional expectation. The uniqueness follows from the uniqueness statement in the Radon- Nikodym theorem. Finally, if X ≥ 0 a.s., then Q is non-negative measure, so E(X|G) ≥ 0 a.s. by the Radon-Nikodym theorem for positive measures. 

Using the Monotone Class Theorem we obtain:

Proposition 9.1.2 For all bounded G-measurable function g. Z Z gX dP = gE(X|G) dP. (9.2) Ω Ω

Example 9.1 • If G = {φ, Ω}, verify that E(X|G) = E(X). Remember that X0 being {φ, Ω}- measurable means that the pre-image under X0 of an arbitrary Borel set is either φ or Ω, so the conditional expectation is a constant.

• If X is G-measurable, then E(X|G) = X a.e.

Example 9.2 If the only information we know is whether a certain event B happened or not, then it should by now be intuitively clear that the conditional expectation of a random variable X with respect to this information is given by E(X|B), if B happened and by E(X|Bc) if B didn’t happen. It is a straightforward exercise that the conditional expectation of X with respect to the σ-algebra FB = {φ, Ω,B,Bc} is indeed given by ( E(X|B) if ω ∈ B E(X | F )(ω) = B E(X|Bc) otherwise.

Proof. Denote by Y the right hand side. Then, E(1BY ) = E (E(X|B)1B) = E(X1B). Similarly E(1Bc Y ) = E(1Bc X), and hence also E(X) = E(Y ).

If Y is a measurable function, we denote

E(X|Y ) = E(X|σ(Y )). (9.3) 9.1. CONDITIONAL EXPECTATION AND PROBABILITIES 117

Exercise 9.1.1 Suppose that {A1,...,An} is a partition of Ω by measurable subsets: the Ai are disjoint and their union is Ω. We further assume that P(Ai) > 0 for every i. Let G be generated by them. Then n X E(X|G) = E(X|Ai)1Ai . i=1

If Y a.s. takes only countably many values si, each of which are taken with positive probability, then {Y = si} is a partition of Ω and X E(X|Y ) = E(X|Y = si)1Y =si . i

9.1.1 Conditioning on a random variable

In many situations, we will describe F 0 as the σ-algebra generated by another random variable Y : Ω → X . By the factorization lemma there exists ϕ : X → R such that E(X|Y ) = ϕ(Y ) a.s. If two functions ϕ and ϕ0 satisfy this relation, then ϕ = ϕ0 on a set of B(X ) of full measure (measured by the law of Y which is denoted by PY ). To see this just note that if A = {ω : ϕ(Y (ω) =ϕ ˜(Y (ω)} and −1 −1 B = {x : ϕ(x) =ϕ ˜(x)}, then A = Y (B) and PY (B) = P(Y (B)). And we do know that by the   uniqueness fo the conditional expectation, P {ω : ϕ(Y (ω)) =ϕ ˜(Y (ω)} = 1.

Notation. We denote by E(X|Y = y) the measurable function ϕ : X → R such that E(X|Y ) = ϕ(Y ).

E(X|Y ) = ψ(Y ) a.s. If two functions ϕ and ϕ˜ satisfy this relation, by the uniqueness fo the condi- tional expectation,   P {ω : ϕ(Y (ω)) =ϕ ˜(Y (ω)} = 1.

Write Y∗µ for the pushed forward measure on C. Then

0 (Y∗µ)({ϕ = ϕ }) = 1.

(Just note that if A = {ω : ϕ(Y (ω)) =ϕ ˜(Y (ω))} and B = {x : ϕ(x) =ϕ ˜(x)}, then A = Y −1(B) and −1 (Y∗µ)(B) = P(Y (B)) by the definition).

Definition 9.1.2 We denote by E(X|Y = y) the measurable function ϕ : X → R s.t. E(X|Y ) = ϕ(Y ).

Example 9.3 Take a measurable function Y :Ω → E = {y1, y2, . . . , yn}. Define

−1 Ai = Y ({yi}) = {ω : Y (ω) = yi}.

Then σ(Y ) = σ{Ai} and Ai ∩ Aj = φ if i 6= j. Let X ∈ L1(Ω, F,P ). Define ( E(1Ai X) P (A ) , if P (Ai) 6= 0 ϕ(yi) = i . 0, if P (Ai) = 0 9.2. PROPERTIES 118

Define E(X1Ai ) E(X|Ai) = , if P (Ai) 6= 0, P (Ai) otherwise define it to be zero. Then n n X X ϕ(y) = E(X|Ai)1Ai (y), ϕ(Y (ω)) = E(X|Ai) 1{Y =yi}. i=1 i=1

Then E(X|Y )(ω) = ϕ(Y (ω)). Indeed for any Ak ∈ σ(Y ),

n ! Z Z X Z Z ϕ(Y )dP = E(X|Ai) 1Ai dP = E(X|Ak) dP = XdP. Ak Ak i=1 Ak Ak

We could also define the function E(X|Y = y) as follows.

Definition 9.1.3 Define PY (A) = P(Y ∈ A), the probability distribution of Y . Let X be a real-valued random variable which is integrable or non-negative. We denote by E(X | Y = y) any random function such that for any B Borel measurable, Z Z E(X | Y = y) PY (dy) = XdP. B {ω:Y (ω)∈B}

When conditioning on a number of variables Y1,...,Yn, the following notation is also used:

E(X|Y1,...,Yn) = E(X|σ(Y1) ∨ · · · ∨ σ(Yn)). (9.4)

9.2 Properties

Conditional expectation has very nice properties, they behave almost like integration with respect to a (family of ) measures.

Proposition 9.2.1 • (linearity) If X,Y are integrable and a, b are numbers, then

E(aX + bY |G) = aE(X|G) + bE(Y |G), a.s.

•| E(X|G)| ≤ E(|X||G) a.s, and as a consequence E(|E(X|G)|) ≤ E(|X|).

• (monotonicity) If X and Y are integrable and X ≤ Y a.s., then E(X|G) ≤ E(Y |G) a.s.

Proof The linearity follows from uniqueness of conditional expectations. (ii) is a direct consequence of the Radon-Nikodyme theorem.  9.3. CONDITIONAL EXPECTATION OF A NON-NEGATIVE RANDOM VARIABLE 119

Lemma 9.2.2 If Xn is a sequence of integrable r.v. such that E|Xn − X| → 0 for some integrable r.v. X, then E|E(Xn|G) − E(X|G)| → 0.

Proof We have |E(Xn − X G)| ≤ E(|Xn − X| G), and the latter converges in L1 as

E(E(|Xn − X| G)) = E|Xn − X| → 0, completing the proof. 

9.3 Conditional expectation of a non-negative random variable

So far, we have only defined conditional expectations of integrable random variables. In analogy with integrals/expectations, it would be natural to have a corresponing notion for non-negative random vari- ables.

Let X :Ω → [0, ∞] be a non-negative r.v., and let G ⊂ F be a sub-σ-algebra.

Theorem 9.3.1 There exists a G-measurable r.v. Y with values in [0, ∞] s.t.

∀A ∈ G, E(1A X) = E(1A Y ).

Such a r.v. is unique up to a P-null set.

We denote the r.v. provided by the Theorem by E(X|G), and call it the conditional expectation of X given G.

Proof For all n ≥ 1, the truncated r.v. min(X, n) is integrable, hence E(min(X, n)|G) is well- defined. Moreover, by montonicity of the conditional expectation, almost-surely E(min(X, n)|G) is non-decreasing in n, hence the r.v. Y := lim ↑ E(min(X, n)|G) is well-defined on an event of prob- n→∞ ability 1 (we can extend it arbitrarily by 0 outside of this event). Then, for all A ∈ G, invoking the Monotone Convergence Theorem, we have

E(1A Y ) = lim E(1A E(min(X, n)|G)) = lim E(1A min(X, n)) = E(1A X). n→∞ n→∞

Thus Y has the requested property. Now, if Y˜ is another non-negative G-measurable r.v. such that E(1A X) = E(1A Y˜ ) for all A ∈ G, then we have E(1A Y ) = E(1A Y˜ ) for all A ∈ G. By Exercise ˜ 5.5.3, we deduce that Y = Y a.s. 

Remark 9.3.2 If a non-negative random variable X is also integrable, the uniqueness statement of the above Theorem ensures the two notions for E(X|G) actually coincide a.s. 9.4. SOME FURTHER PROPERTIES 120

Proposition 9.3.3E (X|G) is, up to a P-null set, the only non-negative r.v. such that, for all non-negative G-measurable r.v. Z, E(ZX) = E(Z E(X|G)).

Proof The statement is true for Z = 1A for A ∈ G, and extends to any non-negative r.v. Z by linearity and by the Monotone Convergence Theorem (thorugh approximating Z by non-negative simple functions). The uniqueness statement follows at once by the uniqueness part of Theorem 9.3.1. 

Example 9.4 1. If G = {φ, Ω}, then E(X|G) = E(X) a.s. (note that E(X) ∈ [0, ∞] in this non- negative case).

2. If X is G-measurable, then E(X|G) = X a.s.

Proposition 9.3.4 Let X,Y :Ω → [0, ∞] be two non-negative r.v., and let G ⊂ F be a sub-σ-algebra.

1. (Linearity) For all a, b ≥ 0, E(aX + bY |G) = aE(X|G) + bE(Y |G) a.s.

2. E(E(X|G)) = E(X)

3. (Monotonicity) If X ≤ Y a.s., then E(X|G) ≤ E(Y |G) a.s.

Proof The first two properties follow at once from the characteristic property of the conditional ex- pectation of a non-negative r.v., so we focus on the third one. If X ≤ Y a.s., then for all n ≥ 1, min(X, n) ≤ min(Y, n) a.s. Hence, by linearity of the conditional expectation of integrable random variables, E(min(X, n)|G) ≤ E(min(Y, n)|G) a.s. Sending n → ∞ yields the result. 

Exercise 9.3.1 Assume that X,Y are non-negative random variables, Z is an integrable random variable, and X ≤ Y + Z a.s. Show that

E(X|G) ≤ E(Y |G) + E(Z|G) a.s.

9.4 Some further properties

We now state some further properties that are specific to conditional expectations. We first recall some definitions.

Definition 9.4.1 1. We say B is independent of G if P(A ∩ B) = P(A)P(B) for every A ∈ G.

2. We say two σ-algebras F and G are independent is any A ∈ F is independent of any B ∈ G.

3. Two random variables are independent if σ(X) is independent of σ(Y ).

4. A random variable is said to be independent of a σ-algebra G, if σ(Y ) and G are independent. 9.4. SOME FURTHER PROPERTIES 121

Proposition 9.4.1 Let X ∈ L1(Ω, F, P) and F 0 be a sub-σ-algebra of F. Then the following hold

1. E(X) = E(E(X|F 0)).

2. If X is F 0 measurable, then E(X|F 0) = X.

3. (Tower property) If G1 ⊂ G2,

E(X|G1) = E(E(X|G1)|G2)) = E(E(X|G2)|G1)).

4. (Without any information, the best estimate is expectation) If X is independent of G,

E(X|G) = E(X), a.e.

Proof (1) To see this set A = Ω in equation (9.1.1). (2) follows from uniqueness.

(3) If G1 ⊂ G2, E(X|G1) is G2-measurable, so E(X|G1) = E(E(X|G1)|G2). One can also show by the definition that E(X|G1) = E(E(X|G2)|G1)). R R (4) For any A ∈ G, A E(X|G)dP = E(X1A) = E(X)P (A) = A E(X) dP . Since E(X) is a constant, hence G-measurable r.v., the conclusion follows. 

The following result is a generalisation of Property 4. of the previous Proposition.

Proposition 9.4.2 Suppose that σ(X) ∨ G is independent of A, then E(X|A ∨ G) = E(X|G).

Proof Let A ∈ A,B ∈ G, Z XdP = E(X1B)P (A), A∩B Z Z E{X | G}dP = E{X1B|G}dP = P (A)E(X1B). A∩B A Since {A ∩ B} forms a π-system, and we can prove that Z Z C = {D ∈ G ∨ A : XdP = E{X|G}dP } D D is a λ system. Hence C = G ∨ A. 

Exercise 9.4.1 Let X,Y ∈ L1(Ω, F,P ) and G a sub-σ-algebra of F. If X is G measurable, XY ∈ L1 then E(XY |G) = XE(Y |G). This means we take out ‘what is known’. 9.5. CONVERGENCE THEOREMS 122

9.5 Convergence Theorems

The important convergence theorems of integration admit an analog statement for conditional expecta- tions, which we now state. Let henceforth G be a sub-σ-algebra of F.

Theorem 9.5.1 (Conditional Monotone Convergence Theorem) If (Xn)n≥1 is a sequence of non-negative random variables such that Xn ↑ X, then E(XnG) ↑ E(X|G) a.s.

Proof (1) Note that a.s. E(Xn|G) is non-negative and non-decreasing in n, hence it admits a limit Y , which is G-measurable. Then, by the Monotone Convergence Theorem, for all A ∈ G,

E(1A Xn) ↑ E(1AX), E(1A E(Xn|G)) ↑ E(1A Y ).

Since E(1A Xn) = E(1AE(Xn|G)) for all n ≥ 1, sending n → ∞ we deduce that E(1AX) = E(1AY ). This being true for all A ∈ G, and since Y is G-measurable, we deduce that E(X|G) = Y = limn→∞ E(Xn|G) a.s. 

Lemma 9.5.2 (Conditional Fatou Lemma) If (Xn)n≥1 is a sequence of non-negative r.v., then

E(lim inf Xn|G) ≤ lim inf E(Xn|G) a.s. n→∞ n→∞

Proof By the Conditional Monotone Convergence Theorem

E( inf Xn|G) −→ E(lim inf Xn|G) a.s. n≥m n→∞ n→∞

But, for all m ≥ 1 and p ≥ m, by monotonicty of the conditional expectation

E( inf Xn|G) ≤ E(Xp|G) a.s. n≥m whence E( inf Xn|G) ≤ inf E(Xp|G) a.s. n≥m p≥m Sending n → ∞ in the above inequality yields the result. 

Theorem 9.5.3 (Conditional Dominated Convergence Theorem) If (Xn)n≥1 is a sequence of random variables such that

• Xn converges a.s. to a random variable X

• there exists an integrable non-negative r.v. Y such that, for all n ≥ 1, |Xn| ≤ Y a.s.

1 Then E(Xn|G) −→ E(X|G) a.s. and in L . n→∞ 9.6. CONDITIONAL EXPECTATION OF SQUARE-INTEGRABLE RANDOM VARIABLES 123

Proof First note that the assumptions guaranteee that the r.v. Xn and X are all integrable, so their conditional expectations are well-defined. At the expanse of modifying Xn, X and Y on a null-set, we may assume that Xn −→ X pointwise, and that |Xn| ≤ Y for all n ≥ 1. Then for all n ≥ 1, n→∞ Zn := 2Y − |Xn − X| is a non-negative r.v. By the conditional Fatou Lemma applied to Zn,

2E(Y |G) ≤ lim inf (2E(Y |G) − E(|Xn − X||G)) a.s. n whence lim sup E(|Xn − X||G) ≤ 0 a.s., i.e. E(|Xn − X||G) −→ 0 a.s. Therefore n n→∞

|E(Xn|G) − E(X|G)| ≤ E(|Xn − X|G) −→ 0 a.s. n→∞ which entails the a.s. convergence statement. For the convergence in L1, note that

E(|E(Xn|G) − E(X|G)|) ≤ E(E(|Xn − X|G)) = E(|Xn − X|) and E(|Xn − X|) −→ 0 by the (usual) Dominated Convergence Theorem. n→∞ 

9.6 Conditional expectation of square-integrable random variables

In this section we will see an interesting interpretation of the conditional expectation of a square- integrable r.v. in terms of an orthogonal projection. We first need a conditional version of Jensen’s inequality.

9.6.1 Conditional Jensen inequality

Proposition 9.6.1 (Conditional Jensen Inequality) Let ϕ : R → R+ be a convex function. Then

ϕ (E(X|G)) ≤ E(ϕ(X)|G), a.s.

Proof Since ϕ is convex, for all x ∈ R, we have

0 ∀y ∈ R, ϕ(y) − ϕ(x) ≥ ϕ+(x)(y − x), (9.5) where 0 ϕ(x + h) − ϕ(x) ϕ+(x) := lim ∈ R h→0 h h>0 is the right-hand derivative of ϕ at x. In particular, we have ϕ0 (x) := lim ϕ(x+1/n)−ϕ(x) : since ϕ is + n ∞ 1/n 0 a convex, hence continuous function on R, this shows that x 7→ ϕ+(x) is a pointwise limit of Borel 9.6. CONDITIONAL EXPECTATION OF SQUARE-INTEGRABLE RANDOM VARIABLES 124 measurable functions, hence it is Borel measurable. Applying for each ω ∈ Ω the inequality (9.5) with x = E(X|G)(ω) and y = X(ω) yields the inequality

ϕ(X) − ϕ(E(X|G)) ≥ Z (X − E(X|G)), (9.6)

0 0 where Z is the random variable ϕ+(E(X|G)). Note that, since E(X|G) is G measurable and x 7→ ϕ+(x) is Borel measurable, Z is G-measurable by the composition rule. At this stage it would be tempting to take E(·|G) in both sides of (9.6) to conclude, however a caveat is that the random variable in the right-hand side is neither non-negative nor integrable in general.

To circumvent this difficulty, we proceed with a truncation argument. By (9.6), for all n ≥ 1, we have

ϕ(X) ≥ ϕ(E(X|G))1{|Z|≤n} + Z1{|Z|≤n} (X − E(X|G)), where now the random variable Z1{|Z|≤n} (X − E(X|G)) is bounded by n(|X| + |E(X|G)|), so it is integrable as X and E(X|G) are. By Exercise 9.3.1, we therefore obtain

E(ϕ(X)|G) ≥ ϕ(E(X|G))1{|Z|≤n} + E(Z1{|Z|≤n} (X − E(X|G))|G) a.s., where we used the fact that the first term in the right-hand side is G-measurable. Now, since Z1{|Z|≤n} is G-measurable, by Exercise (9.4.1), the second term in the right-hand side equals

Z1{|Z|≤n} E(X − E(X|G)|G) = 0 a.s.

Thus for all n ≥ 1, we have E(ϕ(X)|G) ≥ ϕ(E(X|G))1{|Z|≤n} a.s. Since 1{|Z|≤n} −→ 1, by taking n→∞ n → ∞ we obtain the claim. 

p p Corollary 9.6.2 For p ≥ 1, kE(X|G)kp ≤ kXkp. In particular, if X ∈ L , then E(X|G) ∈ L .

p Proof It suffices to apply the conditional Jensen inequality with ϕ(x) = |x| , x ∈ R. 

9.6.2 Conditional expectation as orthogonal projection

We have seen above that Lp, the space of Lp integrable and F-measurable random variables, is complete under the norm Lp, and simple functions are dense in Lp. Furthermore L2 is a Hilbert space, we denote it by L2(Ω, F,P ). If G is a sub-σ algebra, the sub-space of L2(Ω, F,P ) that are measurable with respect to G, which we denote by L2(Ω, G,P ) is a closed sub-space of L2.

Since L2(Ω, F,P ) is a Hilbert space and L2(Ω, G,P ) is a closed subspace of L2, let π denote the orthogonal projection defined by the projection theorem ( §II.2 Functional Analysis [?]),

π : L2(Ω, F,P ) → L2(Ω, G,P ). 9.7. USEFUL INEQUALITIES AND EXERCISES* 125

Theorem 9.6.3 If X ∈ L2(Ω, F,P ) then E(X|G) = π(X).

Proof * Let f ∈ L2(Ω, F,P ). Then for any h ∈ L2(Ω, G,P ),

hf − πf, hiL2(Ω,F,P ) = 0. R R This is, Ω fhdP = Ω πfhdP .

f f − πf

πf Let A ∈ G and take h = 1A to see that πf = E(f|G). 

Corollary 9.6.4E (X) is the constant that minimizes E(X − c)2 and E(X|G) minimizes E(X − Y )2 where Y is a G-measurable random variable.

Remark 9.6.5 The projection πX is the unique element of L2(Ω, G,P ) such that

E|X − πX|2 = min E|X − Y |2. Y ∈L2(Ω,G,P )

Remark 9.6.6 The conditional expectation defines on L2 can be extended to L1 to give the conditional expectation, as defined earlier. Indeed, simple functions and bounded functions are in L2. Let f ∈ L1 with f ≥ 0. Let fn be a sequence of bounded functions (increasing with n) converging to f pointwise. Then πfn is well defined and Z Z fndP = πfndP, A ∈ G. A A

Since fn increases with n and is positive; so does πfn and limn→∞ πfn exists. We may exchange limits and integration: Z Z Z fdP = lim fndP = lim π(fn)dP. A n→∞ A A n→∞ Thus we define E(f|G) = lim π(fn) n→∞ + − + − For f ∈ L1 let f = f − f and define E(f|G) = E(f |G) − E(f |G).

9.7 Useful Inequalities and exercises*

Exercise 9.7.1 (Chebychev’s inequality/ Markov inequality) If X is an Lp random variable then for a ≥ 0, 1 P (|X| ≥ a) ≤ E|X|p. a 9.7. USEFUL INEQUALITIES AND EXERCISES* 126

9.7.1 Exercises

Exercise 9.7.2 Suppose that Y1,Y2 be random variables on a probability space (Ω, A, P) taking values in two measurable spaces (X1, F1) and (X2, F2) respectively. Let X be an integrable random variable on (Ω, A, P). Show that the statement that a σ(Y1)∨σ(Y2)-measurable random variable Z is the conditional expectation of X with respect to σ(Y1) ∨ σ(Y2) is equivalent to one of the following statements

1. For any A ∈ σ(Y1) and B ∈ σ(Y2), Z Z Z dP = X dP, A∩B A∩B

2. For any gi : Xi → R Borel measurable and bounded,

E (g1(Y1)g2(Y2)Z) = E (g1(Y1)g2(Y2)X) ,

Exercise 9.7.3 Let h : E × E → R be a function, where (E, F) is some measurable space. Let X,Y be random variables on a probability space (Ω, A, P) with state space E such that h(X,Y ) ∈ L1. We assume that X and Y are independent. Let H(y) = E (h(X, y)), for y ∈ E. Show that

E(h(X,Y )|σ(Y )) = H(Y ).

0 Exercise 9.7.4 Let X :Ω → X1 be F ⊂ F measurable and let Y :Ω → X2 be independent of 0 F . Let Φ: X1 × X2 → R be measurable and such that E|Φ(X,Y )| < ∞. For x ∈ X1 define R   g(x) = E[Φ(x, Y )] = Ω Φ x, Y (ω) P(dω). Then   0 E Φ(X,Y ) F = g(X).

Proof Let gi : Xi → R be bounded measurable. Let Φ(x, y) = g1(x)g2(y).

0  0 E g1(X)E(g2(Y )|F ) = g1(X)E g2(Y )kF = g1(X)Eg2(Y ) = Φ(˜ X). n o def 0 ˜ So g1(X)g2(Y ) ∈ H. where H = Φ: E Φ(X,Y ) F = Φ(X) . The earlier statement shows that if A × B ∈ C, where C = {A × B : A ∈ B(X1),B ∈ B(X2)}. That C is a π system follows from A × B ∩ C × D = (A ∩ C) × (B ∩ D). Finally, the π-system C generates B(X1) × B(X2) by the π − λ theorem, and then we conclude the statement holds for any bounded measurable functions Φ, by approximating Φ with simple functions, and thus H contains all bounded Borel measurable functions from X1 × X2 → R. 

Exercise 9.7.5 Prove that if X is an integrable random variable such that (X,Y ) has a joint density p(x, y), then for any Borel measurable set A and x ∈ R, R A p(x, y)dx P (X ∈ A|Y = y) = R ∞ . −∞ p(x, y)dx 9.8. CONDITIONAL PROBABILITY 127

9.8 Conditional probability

We define similarly the concept of conditional probability.

Definition 9.8.1 Let B ∈ F, define

0 0 P(B|F ) := E(1B|F ).

This is called the conditional probability of B given F 0.

0 0 Remark 9.8.1 ** For every B ∈ F, this identity P(B | F ) = E(1B | F ) holds outside of a null set. If Bi is a sequence of disjoint sets, there is a common set of null sets for every Bi, and so

∞ [ 0  X 0 X 0 P( Bi | F ) = E 1Bi | F = P(B|F ) i=1 i i holds a.e. To make P(· | F 0) a probability measure, the σ-additive property must hold for all sequences of disjoint unions, that is an unaccountable number of sequences in general, and one has to be able to choose a common set of null sets for all of them. If this can be done we have a family of probability measures P(·|F 0)(ω), called regular conditional probabilities, and then Z E(X|F 0)(ω) = X(ω0)P(dω0|F 0)(ω), a.e.. Ω See Thm 7.1 in ‘Probability measures on metric spaces’ by K.R. Parthasarthy: regular conditional prob- abilities exist for complete separable metric spaces and their Borel σ-algebras.

9.8.1 Finite σ-algebras

For finite σ-algebras, we can construct a family of probability measures P ω, so the conditional expecta- tions is integration w.r.t. this family, by which we mean E(X|G)(ω) = R XdP ω.

Let A, B ∈ F. If P (B) > 0. define

P (A ∩ B) P (A|B) = . P (B)

Let X :Ω → R, define, E(X1 ) E(X|B) = B . P (B)

If {B1,...,Bn} is a partition of Ω with P (Bi) > 0 for all i, we define G = σ{Bi, i = 1, . . . , n}. Then for ω ∈ Bi, E(X|G)(ω) = E(X|Bi)(ω). 9.8. CONDITIONAL PROBABILITY 128

Fix ω. For A ∈ F, define n X P (A|G)(ω) = P (A|Bi)1Bi (ω). i=1 This is a probability measure, denoted by P (dω0|G)(ω). And Z E(X|G)(ω) = X(ω0)P (dω0|G)(ω).

For a large class of general σ-algebra, we can indeed construct a family of probability measures, called conditional probabilities. Chapter 10

Mastery Material: Ergodic Theory

The mastery material are basics of ergodic theory. These are: definitions of measure preserving trans- formations, invariant sets, ergodic measures, Poincare’s´ Recurrence Theorem and Birkhoff’s ergodic Theorem (and their proofs). These can be found in many textbooks.

• An Introduction to Infinite Ergodic Theory, by Jon Aaronson, published by the Americam Mathe- matical Society. In library, 517.518.1AAR

• Ergodic Theory and Dynamical systems by Yves Coudene (Chapter 2)

https://link.springer.com/content/pdf/10.1007

• Introduction to ergodic theory, by Peter Walters

• Ergodic theory and information by Billingsley.

For your convenience, some basic concepts are given below, this is not the full scope of the reading material. Please refer to the references for a comprehensive study.

10.1 Basic Concepts

Let (X , F, µ) be a measure space with the measure σ-finite.

Definition 10.1.1 A map T : X → X is a measure preserving transformation (map) if the pushed forward measure by T is the same as µ, i.e.

T∗µ = T.

129 10.1. BASIC CONCEPTS 130

In other words µ(T −1(A)) = µ(A) for every measurable set A. We also say T preserves µ, also µ is invariant under T , and also µ is an invariant measure for T .

A Dirac measure δa is invariant under T if T leaves the point a fixed.

Definition 10.1.2 1. The invariant σ-algebra of a transformation T is

I = {A ∈ F : T −1(A) = A}.

2. A function f : X → R is invariant under T if f(T x) = f(x) for every x ∈ X .

It is easy to see that I is a σ-algebra.

Theorem 10.1.1 (Poincare´ Recurrence Theorem) Let (X , F, µ) be a finite measure space and let T : X → X be a measure preserving transformation. Let A ∈ F. Then for almost every point in A, there exists some n ≥ 1 (and hence an infinitely many n) such that T n(x) ∈ A.

If µ is a probability measure the celebrated Birkhoff’s ergodic theorem states that the time average equals to spatial average.

Theorem 10.1.2 (Birkhoff’s ergodic theorem) Let (X , F, µ) be a probability space and T preserves measure. Then for any f ∈ L1,

n 1 X lim f(T jx) = E(f|I)(x), a.e. n→∞ n j=1

A transformation is said to be non-singular if T∗µ  µ. Let T : X → X preserves a probability measure µ. We say T is ergodic (we also say µ is ergodic) if whenever A ∈ I then µ(A) = 0 or µ(Ac) = 0.

10.1.1 Example: circle rotation

The Lebesgue measure on the unit circle S1 is defined to be the pushed forward measure on the Lebesgue measure on [0, 1) to S1 by the map α 7→ e2παi. A rotation is x 7→ xe2πθi where θ ∈ R.

We are going to actually identify the unit circle S1 with the quotient space R/Z, so x ∼ y if x = n+y, where n is an integer. So S1 is [0, 1] with 0 and 1 glued together. From now on this is what we use.

1 1 1 Definition 10.1.3 Let θ ∈ S define Tθ : S → S by Tθ(α) = θ + α mod 1. This is a called a rotation of the circle.

The Lebesgue measure is invariant under any circle rotation. Chapter 11

Exercises from the problem sheets

11.1 Problem Sheet 1: Measurable sets

Exercise 11.1.1 Let (X , A) be a measurable space, and let (An)n≥1 be a sequence of measurable sets. Show that the sets lim inf An and lim sup An are measurable, where [ \ \ [ lim inf An = An, lim sup An = An. N≥1 n≥N N≥1 n≥N

Exercise 11.1.2 If f :Ω → X is a map and B is a σ-algebra over X , prove that the collection of pre-image sets σ(f) := {f −1(A): A ∈ B} is a σ-algebra.

Exercise 11.1.3 Let Ω be a set. Below we endow R with the Borel σ-algebra B(R).

1. If X :Ω → R is a constant function, what is σ(X)?

2. If X :Ω → {0, 1}, and {0, 1} is endowed with the discrete σ-algebra, what is σ(X)?

3. We say that X :Ω → R is an elementary function if it takes a finite number of values. If X is an elementary function, what is σ(X)?

Exercise 11.1.4 Give an example of an algebra which is not a σ-algebra.

Exercise 11.1.5 If F is a σ-algebra over a set X, and A ⊂ X, show that {A∩B : B ∈ F} is a σ-algebra over A.

Exercise 11.1.6 Show that the intersection of an arbitrary family of σ-algebras is a σ-algebra. If C is any collection of subsets of a set X , show that there always exists a smallest σ-algebra containing C.

131 11.1. PROBLEM SHEET 1: MEASURABLE SETS 132

Exercise 11.1.7 Provide counter-examples to show the following:

1. If A and B are two σ-algebras on X , A ∪ B is in general not a σ-algebra on X .

2. If A (resp. B) is a σ-algebras on X (resp. Y), the collection of subsets

{A × B : A ∈ A,B ∈ B}

is in general not a σ-algebra on X × Y.

3. If (X , A) is a measurable space, Y is a set, and f : X → Y is a map, the collection of subsets {f(A): A ∈ A} is in general not a σ-algebra on Y.

Exercise 11.1.8 Let F1 and F2 be two σ-algebras over a set X . Show that F1 ∨ F2, that is the σ-algebra generated by F1 ∪ F2, can equivalently be characterised by the expressions:

•F 1 ∨ F2 = σ{A ∪ B | A ∈ F1 and B ∈ F2},

•F 1 ∨ F2 = σ{A ∩ B | A ∈ F1 and B ∈ F2}.

Exercise 11.1.9 Show that:

1. the Borel σ-algebra over R is generated by closed intervals.

2. the set of irrational numbers is Borel measurable.

3. K is Borel measurable, where K ⊂ [0, 1] is the standard Cantor set defined as follows. One first n defines recursively a sequence Kn, n ≥ 1 of subsets of [0, 1] that are unions of 2 disjoint closed intervals: we set K0 = [0, 1] and, for n ≥ 0, we define Kn+1 by removing the central third of each interval composing Kn. Thus  1 2  K = 0, ∪ , 1 , 1 3 3

 1 2 1 2 7 8  K = 0, ∪ , ∪ , ∪ , 1 , 2 9 9 3 3 9 9 T etc. Then K is defined by setting K := n≥0 Kn. 11.2. PROBLEM SHEET 2: MEASURES 133

Exercise 11.1.10 A collection of subsets E is an elementary family if

• φ ∈ E;

• If A, B ∈ E, then A ∩ B ∈ E;

• if A ∈ E then Ac is a finite union of disjoint sets from E.

Show that if E is an elementary family, then the collection A of finite disjoint unions of members of E is an algebra.

* Exercise 11.1.11 The goal is to show that B(R2) = B(R) ⊗ B(R).

1. Let A be the collection of subsets defined by

A := {A ∈ B(R): A × V ∈ B(R2) for all open subset V ⊂ R}.

Show that A = B(R).

2. Let B be the collection of subsets defined by

B := {B ∈ B(R): A × B ∈ B(R2) for all A ∈ B(R)}.

Show that B = B(R).

3. Deduce that B(R) ⊗ B(R) ⊂ B(R2).

4. We recall that any open subset of R2 is a countable union of subsets of the form U × V , where U, V are open subsets of R. Deduce that B(R2) ⊂ B(R) ⊗ B(R) and conclude.

Exercise 11.1.12 Let X be a set and let C be a non-empty collection of subsets of X . Lucy claims: ”For any A ∈ σ(C), there must exist a countable sub-collection D ⊂ C such that A ∈ σ(D)”. Do you agree with Lucy? Prove your claim or give a counter example.

11.2 Problem Sheet 2: Measures

Exercise 11.2.1 Show that the countable additive property of a measure is equivalent to additive and continuous from below, i.e. if (An)n≥1 is a monotone increasing sequence of sets then

∞ [ lim µ(An) = µ( An). n→∞ n=1

Exercise 11.2.2 We say that a measure µ on (Z, 2Z) is invariant under translation if, for any A ⊂ Z and x ∈ Z, µ(x + A) = µ(A). Find all finite measures on (Z, 2Z) which are invariant under translation. 11.2. PROBLEM SHEET 2: MEASURES 134

Exercise 11.2.3 Consider the measure space (R, B(R), λ), with λ the Lebesgue measure on R.

1. Show that, for any x ∈ R, we have λ({x}) = 0

2. John claims : “One can write R as the disjoint union, over all x ∈ R, of the singleton {x}, and therefore: X X λ(R) = λ({x}) = 0 = 0.” x∈R x∈R What is wrong with John’s argument?

Exercise 11.2.4 Show that any countable set is Borel measurable and has Lebesgue measure zero.

Exercise 11.2.5 Let (X , F, µ) be a probability space and E ∈ F.

1. Show that the set of B ∈ F with the properties µ(E ∩ B) = µ(E)µ(B) is a λ -system.

2. We assume that there exists a π-system C such that σ(C) = F and that, for all B ∈ C, µ(E ∩ B) = µ(E)µ(B). Show that the same equality holds for all B ∈ F.

Exercise 11.2.6 Let X and Y be two sets. We assume that X = A1 ∪ A2 and Y = B1 ∪ B2, where both unions are disjoint. We endow X and Y with the σ-algebras A = σ({A1,A2}) and B = σ({B1,B2}) respectively, and we endow X ×Y with the product σ-algebra A⊗B which is defined as σ({A×B,A ∈ A,B ∈ B}).

1. Show that A ⊗ B is given by the collection of sets written as a disjoint union [ Ai × Bj, (i,j)∈K

where K ⊂ {1, 2} × {1, 2}. Deduce that a measure m on A ⊗ B is uniquely determined by its values on the product sets Ai × Bj, for i, j = 1, 2.

2. Let a1, a2 and b1, b2 be positive numbers. We define a measure µ on X by µ(Ai) = ai, i = 1, 2, as well as a measure ν on Y by ν(Bj) = bj, j = 1, 2. Under which conditions on a1, a2, b1, b2 does there exist a measure m on the product σ-algebra A × B with respective marginals µ and ν, i.e. such that m(A × Y) = µ(A), m(X × B) = ν(B), for all A ∈ A and B ∈ B? When these conditions are fulfilled, describe all such measures.

Exercise 11.2.7 Let (X , A) be a measurable space such that {x} ∈ A for all x ∈ X . We say that x ∈ X is a point atom for µ if µ({x}) > 0. Furthermore, µ is said to be diffuse if it does not have any point atom, while it is said to be discrete if µ(S) = 0 for any set S which does not contain any of its point atoms. Give a few examples of diffuse measures and discrete measures. 11.2. PROBLEM SHEET 2: MEASURES 135

Exercise 11.2.8 Let µ be a Borel measure on Rn. We set

S := {x ∈ Rn, µ(B(r, x)) > 0 for all r > 0}.

Show that:

1. S is a closed subset of Rn,

2. µ(Sc) = 0,

3. any strict closed subset F of S satisfies µ(S \ F ) > 0.

S is called the support of the measure µ.

Exercise 11.2.9 Let λ be the Lebesgue measure on (R, B(R)). Show that λ is regular, in the sense that, for all A ∈ B(R), λ(A) = inf{λ(U): U open set,A ⊂ U} = sup{λ(K): K compact set,K ⊂ A}.

Exercise 11.2.10 Let A be a σ-algebra over a set X . We say that two elements x and y of X are equivalent if ∀A ∈ A, x ∈ A ⇐⇒ y ∈ A.

1. Show that this is indeed an equivalence relation on X .

2. Show that, for all x ∈ X , the equivalence class [x] of x is given by T A. A∈A: x∈A 3. (a) Give a description of the equivalence classes when (X , A) = (R, B(R)), where B(R) is the Borel σ-algebra over R. (b) Give an example of (X , A) for which the equivalence classes are not all singletons.

4. * Show that there does not exist an infinitely countable σ-algebra, i.e. that any σ algebra is either finite or uncountably infinite (Hint: show that if A is countable, then [x] ∈ A for all x ∈ X , and S that for each A ∈ A, we have A = x∈A[x]; conclude by a contradiction argument that A cannot be countably infinite).

Exercise 11.2.11 Let T :Ω → Ω by a bijection and let I = {A ⊂ Ω: T −1(A) = A}.

1. Show that I is a σ-algebra. This is the invariant σ-algebra for T .

2. Show that if X :Ω → R is a function satisfying X = X ◦ T , then X is measurable w.r.t. I.

Pn 3. Let n ∈ N, let X(ω) = i=1 bi1Ai where bi are real numbers and Ai ∈ I. Show that X = X ◦ T . 11.3. PROBLEM SHEET 3: MEASURABLE FUNCTIONS, INTEGRALS 136

4. If X :Ω → R is I-measurable, show that X = X ◦ T . 1

11.3 Problem Sheet 3: measurable functions, integrals

Exercise 11.3.1 Are the following statements true or false? Give a proof or provide a counter-example.

1. If two non-decreasing, right-continuous functions on R only differ by a constant, then they have the same Lebesgue-Stieltjes measure. What about the converse?

2. Let µ and ν be two measures on (R, B(R)), and let C be a π-system such that σ(C) = B(R). If µ and ν agree on all elements of C, then µ = ν.

3. Let (X1, B1) and (X2, B2) be measurable spaces, and f : X1 → X2 a measurable map. If µ is a probability measure on B1, then f∗(µ) is a probability measure on B2.

4. Any continuous function from R to R is Borel measurable.

5. Any non-decreasing function from R to R is Borel measurable.

6. Let f be a differentiable function from (0, 1) to R. Prove that f 0 is Borel measurable.

Exercise 11.3.2 Let (X1, F1) and (X2, F2) be measurable spaces.

1. Show that if F1 is given by the power set of X1, then any map from X1 to X2 is measurable.

2. We assume that F2 contains all singletons {y}, for y ∈ X2. Show that if F1 = {φ, X1}, then a measurable map from X1 to X2 is necessarily constant.

* Exercise 11.3.3 Let (X , A) be a measurable space and Ω a set. Let Y :Ω → X be a measurable function and consider the measurable space (Ω, σ(Y )). Show that X :Ω → R is σ(Y )-measurable if and only if there exists a measurable function f : X → R such that X = f ◦ Y .(Hint: recall that one can approximate pointwise any measurable function by a sequence of simple functions).

Exercise 11.3.4 Let Y be a set and (Yi, Bi)i∈I be a family of measurable spaces. For all i ∈ I we are given a map fi : Y → Yi.

1. We endow Y with σ(fi, i ∈ I), the smallest σ-algebra for which all the fi are measurable. Show that σ(fi, i ∈ I) coincides with σ (∪i∈I σ(fi)).

1Hint: show that if ∞ X j Xn(ω) = n 1 ω:X(ω)∈[ j , j+1 ) (ω). 2 { 2n 2n } j=−∞

then Xn = Xn ◦ T . Show that Xn is an increasing sequence with Xn ↑ X pointwise (by this we mean Xn(ω) ↑ X(ω) for every ω. 11.3. PROBLEM SHEET 3: MEASURABLE FUNCTIONS, INTEGRALS 137

2. Let (X , A) be a measurable space and g : X → Y. Show that g is measurable from (X , A) to (Y, σ(fi, i ∈ I)) if an only if, for all i ∈ I, fi ◦ g is measurable from X to Yi.

Exercise 11.3.5 We recall that if (X , A) and (Y, B) are two σ-algebras, the product σ-algebra A ⊗ B is defined as the σ-algebra over X × Y generated by sets of the form A × B, with A ∈ A and B ∈ B. Moreover, we denote by pX : X × Y → X and pY : X × Y → Y the canonical projections on X and Y .

1. Show that A ⊗ B coincides with σ(pX , pY ), the smallest σ-algebra for which both pX and pY measurable.

2. Let (Z, C) be a measurable space, and let f : Z → X × Y. Show that f is measurable from (Z, C) to (X ×Y, A⊗B) if and only if the maps pX ◦f :(Z, C) → (X , A) and pY ◦f :(Z, C) → (Y, B) are measurable.

3. * Show that B(R2) = B(R) ⊗ B(R) (Hint: recall that any open subset of R2 is of the form ∪n≥1 In × Jn, where In and Jn are open intervals of R).

* Exercise 11.3.6 Let (X , A) be a measurable space. We consider a sequence of Borel measurable maps fn : X → R, n ≥ 1.

1. Show that the set {x ∈ X :(fn(x))n≥1 converges in R} is measurable.

2. Show that if (fn)n≥1 converges pointwise, that is, for all x ∈ R, (fn(x))n≥1 converges in R, then the map limn→∞ fn is Borel measurable from (X , A) to R.

3. Let a ∈ R. Prove the Borel measurability of the map g : X → R defined by ( limn→∞ fn(x) if (fn(x))n≥1 converges in R g(x) := a otherwise.

Exercise 11.3.7 Let [a, b] be an interval. A function f :[a, b] → R is called Lebesgue measurable if it is measurable from ([a, b], B([a, b])) to (R, B(R)). Suppose that I assigns a real number to every bounded Lebesgue measurable function f :[a, b] → R, which we denote by I(f). We assume that I has the following properties:

1. (Linearity) If f, g are bounded Lebesgue measurable and c, d ∈ R, we have I(cf + dg) = cI(f) + dI(g).

2. (Monotonicity) If f ≤ g then I(f) ≤ I(g) 11.3. PROBLEM SHEET 3: MEASURABLE FUNCTIONS, INTEGRALS 138

3. For any Borel measurable subset A of [a, b], I(1A) = λ(A) where λ is the Lebesgue measure.

R b Show that if f is bounded and Riemann integrable, then I(f) = a f(x)dx, where the latter is the Riemann integral of f. 2

Exercise 11.3.8 For any non-decreasing, right-continuous function F : R → R, we shall denote by µF the Lebesgue-Stieltjes measure associated with µF .

1. Show that if µ is a finite Borel measure on R, there exists a unique non-decreasing, right-continuous, bounded function F satisfying F (−∞) = 0, and such that µ = µF . Give an expression for F in terms of µ.

2. Let F : R → R be a non-constant, non-decreasing, right-continuous, bounded function satisfying F (−∞) = 0. Let a := F (+∞) ∈ (0, +∞). We define G : (0, a) → R by

G(u) := inf{x ∈ R : F (x) ≥ u}, u ∈ (0, a).

(a) Show that G is a well-defined, non-decreasing function on (0, a). Deduce in particular that it is Borel measurable. (b) Show that, for all x ∈ R and u ∈ (0, a),

G(u) ≤ x ⇐⇒ F (x) ≥ u.

(c) Let λ denote the Lebesgue measure on (0, a), and let G?(λ) denote the pushed forward measure through G. Show that, for any x ∈ R, G∗(λ)((−∞, x]) = F (x). Deduce that G∗(λ) = µF .

Exercise 11.3.9 Let (X , A, µ) be a measure space with a non-zero measure µ, and let f : E → R be a Borel measurable function. Show that, for all  > 0, there exists A ∈ A with µ(A) > 0 such that, for all x, y ∈ A, |f(x) − f(y)| < .

Exercise 11.3.10 (Egoroff’s Theorem) Let (X , F) be a measurable space endowed with a finite mea- sure µ, and let (fn)n≥1 be a sequence of measurable functions from X to R. We assume that there exists a measurable function f : X → R such that fn −→ f pointwise on X . n→∞

1. Let k ≥ 1. Sow that   \ [  1  µ x ∈ X : |f (x) − f(x)| ≥ = 0.  n k  N≥1 n≥N

2 Pn Hint: Given a partition of [a, b], let mi = inf{f(x): x ∈ [xi−1, xi)}, for which simple function g, I(g) = i=1 mi(xi − xi−1). Review the definition for f to be Riemann integrable. 11.4. PROBLEM SHEET 4: LIMIT THEOREMS, COMPUTATION OF INTEGRALS 139

Deduce that, for all δ > 0, there exists a N ≥ 1 such that   [  1  µ x ∈ X : |f (x) − f(x)| ≥ ≤ δ.  n k  n≥N

2. * Deduce that, for all  > 0, there exists a A ∈ F such that µ(X\ A) ≤  and fn converges to f uniformly on f.

3. * Can one choose such a A with full measure, i.e. such that µ(X\ A) = 0? Prove your statement or give a counter-example.

11.4 Problem Sheet 4: limit theorems, computation of integrals

Exercise 11.4.1 Are the following statements true or false? When appropriate, give a proof or provide a counter-example.

1. Any continuous function on R is integrable with respect to the Lebesgue measure.

2. Any continuous function on [0, 1] is integrable with respect to the Lebesgue measure. R 3. If a Borel measurable function f : R → R is such that R f dλ = 0, then f = 0 almost-everywhere.

4. If fn, f are measurable real-valued functions on a measure space (X , A, µ), and fn ↑ f as n → ∞, R R then fndµ ↑ fdµ as n → ∞.

5. If (fn)n≥1 is a sequence of nonnegative measurable functions on a measure space (X , A, µ) such R R that sup fndµ < ∞, and if fn −→ f pointwise, then fdµ < ∞. n≥1 n→∞

Exercise 11.4.2 If f, g are real valued integrable functions on a measure space (X , µ), show the follow- ing statements hold:

R 1. If µ(A) = 0 then A f dµ = 0. R 2. If A f dµ = 0 for every measurable set A then f = 0 µ almost-everywhere. 3. | R f dµ| ≤ R |f| dµ

Exercise 11.4.3 (Markov’s inequality) Let (X , A, µ) be a measure space and let f be a nonnegative, R R measurable function on X . For all M > 0, show that f dµ ≥ f 1{f≥M} dµ, and deduce that

R f dµ µ({f ≥ M}) ≤ . M 11.4. PROBLEM SHEET 4: LIMIT THEOREMS, COMPUTATION OF INTEGRALS 140

Exercise 11.4.4 Let (X , A, µ) be a measure space.

1 1. Show that if fn → f in L (X , A, µ), then Z Z lim fn dµ = f dµ. n→∞

2. Let (fn)n≥1 be a sequence of nonnegative integrable functions converging µ-a.e. to an integrable function f. We assume that Z Z lim fn dµ = f dµ. n→∞ 1 R + Show that fn → f in L (X , A, µ) (Hint: first show that limn→∞ (f − fn) dµ = 0).

Exercise 11.4.5 Let f be a Borel measurable function on R, and let y ∈ R. Let λ be the Lebesgue measure on R.

1. Show that if f is non-negative, then Z Z f(x + y) dλ(x) = f(x) dλ(x).

2. In general, show that f is integrable if and only if x → f(x + y) is integrable on R, and if so their integrals coincide.

Exercise 11.4.6 Let [a, b] be an interval of R.

1. If ϕ :[a, b] → R is continuous, show that the function F :[a, b] → R defined by Z F (x) = ϕ(x) dλ(x), x ∈ [a, b] [a,x]

is differentiable on [a, b], and F 0 = ϕ.

2. * If f :[a, b] → R is a differentiable function with bounded derivative, show that Z f 0(x)dλ(x) = f(b) − f(a). [a,b]

Exercise 11.4.7 (Uniform continuity of integrals) Let (X , A, µ) be a measure space, and let f be a non-negative integrable function on X . Let  > 0 be fixed.

1. Show that there exists a M > 0 such that Z  f 1 dµ ≤ {f≥M} 2 11.4. PROBLEM SHEET 4: LIMIT THEOREMS, COMPUTATION OF INTEGRALS 141

2. Deduce that there exists a δ > 0 such that, for all A ∈ A with µ(A) ≤ δ, Z f dµ ≤ . A

3. Let f : R → R be a Borel measurable function. If f is integrable with respect to the Lebesgue measure, show that the function F : R → R defined by Z F (x) = f dλ, x ∈ R (−∞,x]

is uniformly continuous on R.

Exercise 11.4.8 (Convergence in measure) Let (X , A, µ) be a measure space with µ(X ) < ∞, and let fn, n ≥ 1, and f be measurable functions from (X , A) to (R, B(R)). We say that fn converges to f in measure if, for all  > 0,

µ({|fn − f| > }) −→ 0. n→∞

R 1. Using Markov’s inequality (Ex.3), show that if |fn − f| dµ −→ 0, then fn converges to f in n→∞ measure. Show, with a counter-example, that the converse is wrong.

2. Show that if fn converges to f µ-a.e., then fn converges to f in measure. Show, with a counter- example, that the converse is wrong.

3. * Assume that fn converges to f in measure, and that there exists a non-negative integrable function g : X → R such that |fn| ≤ g µ-a.e., for all n ≥ 1.

(a) Show that |f| ≤ g µ-a.e.. (b) Using Exercise 7.2, show that Z |fn − f| dµ −→ 0. n→∞

Exercise 11.4.9 Let I be a bounded Lebesgue measurable set. Suppose that f : I → R is bounded.

1. Show that if f is Lebesgue measurable, then Z  Z  inf h dλ, h ≥ f, h ∈ S = sup g dλ : g ≤ f, g ∈ S (11.1)

2. Show that if (11.1) holds, f is Lebesgue measurable. 3

3Hint: extract a non-decreasing sequence of simple functions whose supremum is f. 11.5. PROBLEM SHEET 5: FUBINI’S THEOREMS 142

11.5 Problem Sheet 5: Fubini’s Theorems

Exercise 11.5.1 Let X = {1,...,N} be a finite state space. We consider the measure space (X , A, m), X PN where A = 2 is the discrete σ-algebra and m = i=1 δi is the counting measure on (X , A).

1. What can you say about the sigma-algebra A ⊗ A on X × X ? And about m ⊗ m?

2. Write down the Fubini-Tonelli Theorem for the specific case of a measurable function on the mea- sure space (X × X , A ⊗ A, m ⊗ m).

Exercise 11.5.2 Given a measure µ on X × Y. We say µi are the marginals of µ if

µ(X × B) = µ2(B), µ(A × Y) = µ1(A), ∀A ∈ F,B ∈ G.

We also say that µ is a coupling of µ1 and µ2.

2 Give an example of a Borel measure µ on R , two Borel measures µi on R such that µ is not the product measure µ1 × µ2, but µi are marginals of µ. Give another set of examples of measures such that µ = µ1 × µ2 but µi are not the Lebesgue measure.

Exercise 11.5.3 Consider the measure space (R2, B(R2), λ), where λ is the Lebesgue measure on R2. Let f be the function on R2 given by ( q 1−y if 0 ≤ y < x and 0 < x < 1 f(x, y) = x−y 0 otherwise.

Show that f is integrable and compute R f dλ.

Exercise 11.5.4 Consider the measure space ([0, 1]2, B([0, 1]2), λ), where λ is the Lebesgue measure on [0, 1]2. Let α ∈ R. and, for all (x, y) ∈ [0, 1]2, let ( 1 if x 6= y f(x, y) = |x−y|α 0 otherwise R Compute [0,1]×[0,1] fdλ. For which values of α is f integrable?

Exercise 11.5.5 Let f(x, y) = e−xy − 2e−2xy, for (x, y) ∈ [0, 1] × [1, +∞). Show that the integrals R R R R [0,1] [1,+∞) f(x, y) dλ(y) dλ(x) and [1,+∞) [0,1] f(x, y) dλ(x) dλ(y) exist but do not coincide. De- duce therefrom that f is not integrable on [0, 1] × [1, +∞).

Exercise 11.5.6 Let f :(−1, 1)2 → R be the function defined by   xy if (x, y) ∈ (−1, 1)2 \{(0, 0)} f(x, y) = (x2+y2)2 0 if (x, y) = (0, 0). 11.5. PROBLEM SHEET 5: FUBINI’S THEOREMS 143

Show that the iterated integrals RR f(x, y) dλ(x) dλ(y) and RR f(x, y) dλ(y) dλ(x) exist and coincide, but that f is not integrable on (−1, 1)2.

Exercise 11.5.7 We consider the measurable space ([0, 1], B([0, 1])). Let µ be the counting measure on [0, 1], which assigns to each A ∈ B([0, 1]) its number of elements #A ∈ {0, 1,..., +∞}. Let also λ be Lebesgue measure on [0, 1]. Let ∆ = {(x, x), x ∈ [0, 1]}.

1. Show that ∆ ∈ B([0, 1]2).

2 2. John claims: “ The function 1∆ is a non-negative Borel measurable function on [0, 1] , therefore by Fubini’s Theorem we have Z Z Z Z 1∆(x, y) dλ(y) dµ(x) = 1∆(x, y) dµ(x) dλ(y)”. [0,1] [0,1] [0,1] [0,1] Is John’s statement correct? If not, what is wrong with his argument?

Exercise 11.5.8 Let f : R → R+ be a measurable, non-negative function. Show that Z Z λ({x : f(x) ≥ y}) dλ(y) = f dλ. R+ Deduce that, for all g : R → R measurable and all p ≥ 1, Z Z p λ({x : |g(x)| ≥ a}) ap−1dλ(a) = |g|p dλ. R+ Exercise 11.5.9 (A nice application of Fubini) If f : R → R is a measurable function, show that λ({x ∈ R : f(x) = y}) = 0 for λ a.e. y ∈ R.

sin(x) Exercise 11.5.10 1. Show that the function x 7→ x is Borel measurable from (0, ∞) to R, but not integrable with respect to λ.

2. * Show that, for all x > 0, Z 1 e−tx dt = , (0,∞) x and deduce that Z sin(x) π lim = . n→∞ (0,n) x 2

Exercise 11.5.11 (Volume of the unit ball in Rd) For all d ≥ 1 and r ≥ 0, let

d d X 2 2 Bd(r) = {x ∈ R : xi ≤ r } i=1

d d d be the closed ball of radius r in R centered at 0, and let Vd(r) = λ (Bd(r)), where λ is the Lebesgue measure on Rd. 11.6. PROBLEM SHEET 6: THE RADON-NIKODYM THEOREMS 144

d 1. Prove that, for all r ≥ 0, Vd(r) = r Vd(1).

R 2 2 d−2 2. * Show that, for all d ≥ 3. V (1) = V (1) (1 − x − y ) 2 dλ(x) dλ(y). Using a polar d d−2 B2(1) 2π change coordinates, deduce that Vd(1) = Vd−2(1) d .

3. We denote by Γ the Gamma function defined for all x > 0 by Γ(x) = R tx−1e−t dλ(t), and √ (0,∞) recall that Γ(x + 1) = xΓ(x) for all x > 0, and that Γ(1/2) = π. Show that for all d ≥ 1,

πd/2 Vd(1) = d . Γ( 2 + 1)

11.6 Problem Sheet 6: the Radon-Nikodym Theorems

Exercise 11.6.1 In the following let (X , A) be a fixed measurable space. Are the following statements true or false? Give a proof of provide a counter-example

1. If µ and ν are two measures on (X , A) such that µ ≤ Cν for some C > 0, then µ  ν. How about the converse?

2. If µ and ν are two measures on (X , A), there always exists a measure ξ such that µ  ξ and ν  ξ.

3. If m is the counting measure on X , then every measure µ on (X , A) is absolutely continuous with respect to m.

4. On (R, B(R)), the measure 1[0,1]λ is absolutely continuous with respect to λ.

5. On (R, B(R)), the measure λ is absolutely continuous with respect to 1[0,1]λ.

Exercise 11.6.2 Let µ and ν be two finite measures on a measurable space (X , A), and let D be a non-negative measurable function on X . Show that the following conditions are equivalent:

dµ 1. µ  ν and dν = D ν-a.e.

2. For all non-negative measurable function f on X , R f dµ = R fD dν

3. For all bounded measurable function f on X , R f dµ = R fD dν.

Exercise 11.6.3 Let µ and ν be two σ- finite measures on a measurable space (X , A) such that µ  ν. dµ dν On which condition on dν do we also have ν  µ? In that case, give an expression for dµ . 11.6. PROBLEM SHEET 6: THE RADON-NIKODYM THEOREMS 145

Exercise 11.6.4 1. Let µ be a Borel measure on R2 admitting a density h with respect to the Lebesgue measure, where h(x, y) = f(x) g(y), (x, y) ∈ R2,

where f and g are two non-negative Borel measurable functions on R. Show that µ is the product of two measures on R. In what case is µ a finite measure, resp. a probability measure?

2. Let X and Y be two real-valued random variables admitting a joint density f with respect to the Lebesgue measure λ2 on R2:

Z ∀A ∈ B(R2), P((X,Y ) ∈ A) = f dλ2. A

(a) Show that X and Y both admit a density, denoted by fX and fY respectively, with respect to the Lebesgue measure λ on R, that is, for all A ∈ B(R)

Z Z P(X ∈ A) = fX dλ, P(Y ∈ A) = fY dλ. A A

Give an expression for fX and fY in terms of f. (b) Under which condition on f are X and Y independent?

Exercise 11.6.5 For all h ∈ R, let N(h, 1) be the probability measure on (R, B(R)) given by

Z 2 1 − (x−h) N(h, 1)(A) = √ e 2 dx. A 2π

Show that the measures N(h, 1) and N(0, 1) are equivalent, and compute the corresponding Radon- Nikodym derivatives.

Exercise 11.6.6 Let ν be a finite measure. Show that ν  µ if and only if the following holds: for any  > 0 there exists δ > 0 such that whenever µ(E) < δ then ν(E) < .

Exercise 11.6.7 On (R, B(R)) we consider the Lebesgue measure λ and the counting measure µ which assigns, to every set A ∈ B(R), the number #A ∈ {0, 1,..., +∞} of elements of A.

1. Show that λ  µ

2. Show that there does not exist a measurable function f : R → R+ such that, for all A ∈ B(R), R λ(A) = A f dµ. Does this contradict the Radon-Nikodym theorem? 11.7. PROBLEM SHEET 7: CONDITIONAL EXPECTATIONS 146

Exercise 11.6.8 Recall that x ∈ R is a point atom of a measure µ on (R, B(R)) if µ({x}) > 0. Recall also that µ is said to be discrete if µ(Ac) = 0, where A is the set of point atoms of µ, while µ is said to be diffuse if it does not have any point atom.

Let µ be a finite measure on (R, B(R)). Show that there exists a unique decomposition µ = µa + µd + µs where:

• µd is a discrete measure R • µa(A) = A f dλ for all A ∈ B(R), for some non-negative integrable function f

• µs is diffuse and singular with respect to λ.

Exercise 11.6.9 (The devil’s staircase) We construct recursively a sequence (fn)n≥0 of non-decreasing, piecewise linear functions on [0, 1] such that f(0) = 0 and f(1) = 1 as follows. We set f0(x) = x for x ∈ [0, 1]. For n ≥ 0, given the piecewise linear function fn, we construct fn+1 by replacing fn, on each maximal subinterval [u, v] where it is not constant, by the piecewise linear function such that 2u+v u+2v fn+1(u) = fn(u), fn+1(v) = fn(v), and fn+1 is identically equal to (fn(u)+fn(v))/2 on [ 3 , 3 ]. Thus, for n = 1,  (3x)/2 if x ∈ [0, 1/3]  f1(x) = 1/2 if x ∈ [1/3, 2/3]  (3x)/2 − 1/3 if x ∈ [2/3, 1].

−n 1. Show that |fn+1(x) − fn(x)| ≤ 2 for all n ≥ 0 and x ∈ [0, 1]. Deduce therefrom that fn converges uniformly on [0, 1] to a continuous, non-decreasing function f.

2. Let µ be the Lebesgue-Stieltjes measure on [0, 1] associated with f. Show that µ is a probability measure, and that it is diffuse (i.e. µ({x}) = 0 for all x ∈ [0, 1]).

3. * Show that µ is singular with respect to the Lebesgue measure on [0, 1]. 4

11.7 Problem Sheet 7: Conditional expectations

Exercise 11.7.1 Let (Ω, F, P) be a probability space and X an integrable real-valued random variable. Suppose that Y takes distinct values {s1, . . . , sn} with positive probability. Give an explicit formula for E(X|Y ).

Exercise 11.7.2 We fix a probability space (Ω, F, P). Let (X , A) be a measurable space, and Z a random variable with values in X . Show that for every non-negative real-valued random variable X, there exists a measurable map ϕ :(X , A) → R+ such that, for all measurable function h : X → R+, we have E(h(Z)X) = E(h(Z)ϕ(Z)). 4Hint: Show that µ(K) = 1 while λ(K) = 0, where K is the Cantor set. 11.7. PROBLEM SHEET 7: CONDITIONAL EXPECTATIONS 147

Exercise 11.7.3 Let X,Y ∈ L1(Ω, F, P) and G a sub-σ-algebra of F. Show that if X is G measurable and XY ∈ L1 then E(XY |G) = XE(Y |G).

This means we take out ‘what is known’.

Exercise 11.7.4 Let X,Y be two real-valued random variables defined on a probability space (Ω, F, P). For all A ∈ B(R), we define

P(X ∈ A|Y = y) := E(1{X∈A}|Y = y), y ∈ R.

Prove that if (X,Y ) has joint density p(x, y) w.r.t. the Lebesgue measure on R2, then for any A ∈ B(R) and a.e. y ∈ R, R A p(x, y)dx P(X ∈ A|Y = y) = R ∞ . −∞ p(x, y)dx

Exercise 11.7.5 Let X,Y be integrable real-valued random variables such that E(X) = E(Y ). Let

A = {A ∈ F : E(X1A) = E(Y 1A)}.

Show that A is a λ-system.

Exercise 11.7.6 Show that if Y and Y 0 are integrable real-valued random variables on (Ω, F, P) with 0 0 E(Y ) = E(Y ), and E(1AY ) = E(1AY ) for all element A of a π-system C such that σ(C) = F, then Y = Y 0 almost-surely.

n Exercise 11.7.7 Let (Ω, F, P) be a probability space, Fi sub-σ-algebras of F and denote by ∨i=1Fi the n σ-algebra generated by ∪i=1Fi. n Let C = {∩i=1Ai : A1 ∈ F1,...,An ∈ Fn}.

n (1) Show that ∨i=1Fi = σ(C).

n (2) Let X be an integrable real-valued random variable. If Y is ∨i=1Fi-measurable, E(Y ) = E(X), n and E(1AX) = E(1AY ) for all A ∈ C, show that Y = E(X| ∨i=1 Fi).

Exercise 11.7.8 (Conditional variance) Let X and Y be two real-valued random variables such that Y is square-integrable. We call conditional variance of Y given X, and denote by Var(Y |X), the random variable E((Y − E(Y |X))2|X).

1. Provide an alternative expression for Var(Y |X) in terms of E(Y 2|X) and E(Y |X).

2. In which case do we have Var(Y |X) = 0 a.s.? What is Var(Y |X) if X and Y are independent? 11.7. PROBLEM SHEET 7: CONDITIONAL EXPECTATIONS 148

3. Show that, for all f : R → R Borel measurable such that f(X) is square-integrable

E((Y − f(X))2) = E(Var(Y |X)) + E((E(Y |X) − f(X))2). (∗)

Which functions f minimise E((Y − f(X))2)?

4. By choosing an appropriate function f in (∗), prove that

Var(Y ) = E(Var(Y |X)) + Var(E(Y |X)).

Exercise 11.7.9 ** Show by a counter-example that E(E(X | F2) | F1) = E(E(X | F1) | F2) a.s. is not true in general. 5

5Example: divide a square, our Ω, into 4 squares by drawing a horizontal line and a vertical line through it. We denote S S by A1,A2,A3,A4 the sub-squares. Let F1 = σ({A1 A2,A3 A4}) be generated by the horizontal partition and F2 = S S σ({A1 A3,A2 A4}) the vertical partition. Let X be a random variable taking values in ai on Ai respectively for i =

1, 2, 3, 4. Compute E(E(X|F1)|F2).