Measure and Integration
Xue-Mei Li with assistance from Henri Elad Altman Imperial College London
March 6, 2021
Contents
1 Introduction (cf. Introductory Lecture) 9
1.1 Module Information ...... 9
1.2 Course Structure ...... 9
1.3 Prologue ...... 10
1.4 The mass transport problem ...... 10
1.5 Coin tossing ...... 11
1.6 Extending Riemann’s theory of integration ...... 12
1.6.1 Inadequacy of Riemann Integrals ...... 12
1.6.2 Idea of Lebesgue’s integral: allow for more sets as “building blocks” ...... 13
1.7 Observables of Random Variables ...... 14
2 Measurable sets (1st Lecture) 15
2.1 Set Algebra ...... 15
2.2 Measurable Space ...... 16
2.2.1 Countability of σ-algebras ...... 18
2.2.2 Fundamental Examples and Exercises ...... 18
2.3 Borel σ-algebras ...... 20
2.3.1 Definition ...... 20
2.3.2 The case of R ...... 20
2.3.3 Borel σ-algbera on a subset of a metric space ...... 21
3 CONTENTS 4
2.4 Product σ-algebras ...... 21
2.4.1 Product of a countable family of σ-algebras * ...... 22
2.5 A warning ...... 23
2.6 Background material* ...... 23
3 Measures (Lectures 2-6) 25
3.1 Basics of Measures (2nd Lecture) ...... 25
3.2 Uniqueness of Measures (3rd Lecture) ...... 28
3.2.1 π − λ theorem ...... 28
3.3 Measure determining sets ...... 29
3.4 Construction of Measures (Lectures 4,5,6) ...... 30
3.4.1 Negligible sets and complete σ-algebras ...... 31
3.4.2 A completion theorem* ...... 31
3.4.3 Outer Measure ...... 32
3.4.4 Extension of pre-measures ...... 34
3.4.5 Proof for Caratheodory’s Theorem ...... 34
3.4.6 Construction of Lebesgue-Stieltjes / Lebesgue measure ...... 38
3.4.7 Example of a non-Lebesgue measurable set* ...... 43
3.4.8 Examples ...... 44
3.5 Lebesgue Measure on Rn (6th Lecture) ...... 45
4 Measurable maps (Lectures 7,8) 47
4.1 Definition ...... 47
4.2 Examples ...... 47
4.3 Properties of measurable functions ...... 48
4.4 Measurability with respect to the σ-algebra generated by a map ...... 49
4.5 Measurable maps with values in a subset ...... 49
4.6 Real-valued measurable functions ...... 50 CONTENTS 5
4.7 Simple functions ...... 53
4.8 Factorisation Lemma ...... 56
4.9 Pushed-forward measure ...... 56
4.10 Appendix* ...... 57
4.10.1 Littlewood’s three principles ...... 57
4.10.2 Product σ-algebras ...... 57
4.10.3 The Borel σ-algebra on the Wiener Space ...... 57
5 Integration: Lectures 9-12 59
5.1 Integration ...... 59
5.2 Outline of the construction ...... 60
5.3 Integral of simple functions ...... 60
5.4 Integration of non-negative functions ...... 64
5.5 Integrable functions ...... 69
5.6 Further limit theorems ...... 71
5.7 Integrals depending on a parameter ...... 73
5.8 Examples and exercises ...... 75
5.9 The Monotone Class Theorem for Functions * ...... 77
5.9.1 Lebesgue Integration ...... 77
5.10 The pushed forward measure ...... 78
5.11 Appendix: Riemann integrals ...... 79
5.12 Appendix: Riemann-Stieltjes integrals* ...... 80
5.13 Appendix: Functions with bounded variation* ...... 82
6 Lp spaces: Lectures 13,14 85
6.1 Holder’s inequality ...... 85
6.2 Definition and basic properties of the Lp spaces ...... 86
6.3 Density results ...... 89 CONTENTS 6
6.4 The case of finite measures ...... 91
6.4.1 Mode of convergence ...... 93
7 Product measures: Lectures 15-17 95
7.1 Product measures ...... 96
7.1.1 Sections of subsets of product spaces ...... 100
7.2 Fubini’s Theorem ...... 101
7.3 Product measures and independence ...... 103
8 Radon-Nikodym Theorem: Lectures 18-20 105
8.1 Singular and absolutely continuity ...... 105
8.2 Signed measure ...... 107
8.3 Radon-Nikodym Theorem ...... 110
9 Conditional Expectations: Lectures 21-23 115
9.1 Conditional expectation and probabilities ...... 115
9.1.1 Conditioning on a random variable ...... 117
9.2 Properties ...... 118
9.3 Conditional expectation of a non-negative random variable ...... 119
9.4 Some further properties ...... 120
9.5 Convergence Theorems ...... 122
9.6 Conditional expectation of square-integrable random variables ...... 123
9.6.1 Conditional Jensen inequality ...... 123
9.6.2 Conditional expectation as orthogonal projection ...... 124
9.7 Useful Inequalities and exercises* ...... 125
9.7.1 Exercises ...... 126
9.8 Conditional probability ...... 127
9.8.1 Finite σ-algebras ...... 127 CONTENTS 7
10 Mastery Material: Ergodic Theory 129
10.1 Basic Concepts ...... 129
10.1.1 Example: circle rotation ...... 130
11 Exercises from the problem sheets 131
11.1 Problem Sheet 1: Measurable sets ...... 131
11.2 Problem Sheet 2: Measures ...... 133
11.3 Problem Sheet 3: measurable functions, integrals ...... 136
11.4 Problem Sheet 4: limit theorems, computation of integrals ...... 139
11.5 Problem Sheet 5: Fubini’s Theorems ...... 142
11.6 Problem Sheet 6: the Radon-Nikodym Theorems ...... 144
11.7 Problem Sheet 7: Conditional expectations ...... 146
Chapter 1
Introduction
1.1 Module Information
Prerequisite Real Analysis (M2PM1), Metric Spaces and Topology (M2PM5), and basic Probability. Measure and Integration is a foundational course, underlies analysis modules. It brings together many concepts previously taught separately, for example integration and taking expectation, reconciling dis- crete random variables with continuous random variables.
Contents. Measurable sets, sigma-algebras, Measurable functions, Measures. Integration with re- spect to measures, Lp spaces, Modes of convergences, basic important convergence theorems (Domi- nated convergence theorem, Fatou’s lemma etc), useful inequalities, properties of integrals.
This module is related to: Probability theory, Applied probability, Markov processes, Fourier analysis and theory of distributions, Functional analysis, ODE’s, Ergodic theory, Probability theory, Stochastic Filtering, Applied Stochastic Processes.
1.2 Course Structure
Assessment
• CW 1. (5%) Thursday 5 November, Deadline: Thursday 19 November
• CW 2. (5%) Thursday 3 December, Deadline: Thursday 17 December
• May Exam (90%): 2 hour exam (year 3), 2.5-hour exam (year 4/Msc).
Problem sheets: not assessed, they are the best material for preparing for the exam.
9 1.3. PROLOGUE 10
Questions and Answers sessions: Wednesdays 11-12 am
Problem classes: Tuesdays Weeks 2-4-6-8-10
References:
• Real Analysis by H. L. Royden, Third edition
• Real Analysis, Modern Techniques and their applications, by G. B. Folland
• Measure Theory by P. R. Halmos
• Measures, Integrals and Martingales, by Rene´ L. Schilling, Cambridge UP, 2005. 176-89. Web.
How to use these notes? A ‘remark’ is material which is helpful with our understanding, which may or may not be covered in the lectures. An ‘exercise’ may appear in the weekly problem sheet, and is in the notes for one or more of the following reasons: (1) the conclusion is interesting, (2) the proof is representative, (3) proving it improves our understanding.
If a section /paragraph is given a ∗ sign, it is not covered in class (or not ‘officially’ covered in class). They are in the notes for the background or considered to be useful to further our understanding.
1.3 Prologue
Measure theory is not only a fundamental tool to measure/compute, it also provides the basis for several fundamental concepts in mathematics.
1.4 The mass transport problem
Monge was concerned with the cost of moving material from a mine to a construction site, Monge’s problem (1781) is: What is the optimal way to transport a pile of sand? 1.5. COIN TOSSING 11
Let us think of µ as the original sand mass distribution, a measure µ. We also think of the target mass as a measure ν. A map T is a transport map if it pushes µ to ν. The optimal transport problem from µ to ν is then formulated as follows: find a map x 7→ T (x) such that the cost (distance) Z |x − T (x)|µ(dx) R is minimised among all transport maps. We may also use other cost functions, e.g. kinetic energy or take into consideration the geometry of the space.
Example 1.1 Shift of a block of mass of height one of uniform density on [0, n] to [1, n + 1]. There are more than one transport maps: T1(x) = x + 1 for any x ∈ [0, n]; and ( n + 1 − x, x ∈ [0, 1) T (x) = , 2 x, x ∈ [1, n] are two different examples.
Example 1.2 Can one find a map shifting 1 unit of mass at location 0 to location 1 and −1 of equal mass?
1 1 In this case, the measures involved are µ = δ0, ν = 2 δ1 + 2 δ−1 and no transport map exists. Kantorovich’s formulation (1940’s). Find a measure on Ω × Ω that minimises Z |x − y|µ˜(dx, dy) and has marginals µ, ν respectively, i.e. µ˜(Ω × B) = ν(B), µ˜(A × Ω) = µ(A). This is called a transport plan.
Find a transport plan for Example 2. (Hint: this is a measure on {(0, 0), (1, 0), (0, 1), (1, 1)}.
Optimize the transport cost involved in distribution of pastries from bakeries to coffee shops. E.g. Bakery 1 delivers 50 pastries to Cafe 1, 30 each to Cafe shops 2 and 3, Bakery 2 delivers 20 to Cafe 2, 30 to cafe 3.
1.5 Coin tossing
We will need to understand what is meant by taking limits of functions and measures.
Example 1.3 Toss a fair coin 1000 times. Denote by Xi the result of the ith toss: Xi = 1 if we get head and Xi = −1 if we get tail. Mathematically, Xi can be represented by a random variable. Assuming that 1.6. EXTENDING RIEMANN’S THEORY OF INTEGRATION 12
the coin is fair and that all tosses are independent, then the random variables Xi, i ≥ 1 are independent 1 1 and identically distributed (i.i.d.), with probability distribution 2 δ−1 + 2 δ1, i.e. 1 P({X = 1}) = P({X = −1}) = . i i 2
The Law of Large Numbers (LLN) is a theorem ensuring that the empirical mean of a large number of tosses converges to the expectation of a single toss n 1 X Xi −→ E(X1) = 0 N N→∞ k=1 almost-surely. Thus, if you toss a coin 1000 times, you expect the empirical mean to be very close to 0. This is not suprising in practice. However the formulation and proof of the LLN (not covered in this module) is challenging, and requires sophisticated tools of measure theory and probability.
The Central Limit Theorem (CLT) describes how the empirical mean fluctuates. Namely, we have the convergence in law N 1 X √ Xi −→ N(0, 1) N→∞ N i=1 where N(0, 1) denotes a standard Gaussian distribution distributions. Thus, the deviation of the empirical mean away from 0 will be distributed , for large N, like a Gaussian curve.
The (LLN) and (CLT) are two fundamental results of probability theory, with far-reaching conse- quences. These results will not be proved in this module, but the key notions behind will be introduced.
1.6 Extending Riemann’s theory of integration
1.6.1 Inadequacy of Riemann Integrals
R b Let f :[a, b] → R be a bounded function. Riemann’s theory of integration allows to define a f(x) dx provided that f is Riemann-integrable. Recall the following definition:
Definition 1.6.1 For any partition P = {a = t0 < t1 < . . . < tn} of [a, b], we set n n X X L(f, P ) = (ti − ti−1) inf f(t),U(f, P ) = (ti − ti−1) sup f(t), t∈[t ,t ] i=1 i−1 i i=1 t∈[ti−1,ti] and define Z Z f = sup L(f, P ), f = inf U(f, P ), P P where the infimum and supremum are taken over all partitions of [0, 1]. Then f is Riemann-integrable if R R R b f and f coincide. In this case we define the Riemann integral a f(x) dx to be this common value. 1.6. EXTENDING RIEMANN’S THEORY OF INTEGRATION 13
Heuristically, f being Riemann-integrable means that one can approximate the area under the graph of f by rectangles [ti−1, ti] × [0, f(si)], where P = {a = t0 < t1 < . . . < tn} is a partition of [a, b], and si is any point in [ti−1, ti]. Intervals are thus the “building blocks” of Riemann’s theory. Unfortunately, in many interesting cases, f is not Riemann-integrable.
Example 1.4 An example of non-Riemann-integrable function is the Dirichlet function on [0, 1], denoted by 1Q: 1Q(x) = 1 for x ∈ Q and 1Q(x) = 0, for x ∈ [0, 1] \ Q. Indeed one has U(1Q,P ) = 1 and L(1Q,P ) = 0 along any partition P of [0, 1], so that the lower and upper integrals are distinct: Z Z 1Q := sup L(1Q,P ) = 0, 1Q := inf U(1Q,P ) = 1. P P
Same with the function 1[0,1]\Q defined on [0, 1] by 1[0,1]\Q(0) = 1 for x ∈ Q and 1[0,1]\Q(x) = 1 for x ∈ [0, 1] \ Q.
R 1 There are more irrational points than rational points, perhaps 0 f1(x)dx can be defined to be 1? Can we quantify how big are the sets of rational and irrational number, to see that the latter is larger than the former?
One inconvenient feature of the Riemann integral is its lack of stability under taking limits. Indeed, one can construct a sequence of bounded Riemann integrable functions fn such that fn(x) → f(x) for every x ∈ [0, 1], but f is not Riemann integrable, as the following exmple shows:
Example 1.5 Let Q = {q1, q2,... } be an enumeration of rational numbers. Define fn(x) = 1 if R x ∈ {q1, q2, . . . , qn} and set fn(x) = 0 otherwsie. Then fn is Riemann interagble and fn(x)dx = 1, but f is the Dirichlet function which is not Riemann integrable.
1.6.2 Idea of Lebesgue’s integral: allow for more sets as “building blocks”
Although the Dirichlet is not Riemann integrable, seen from another perspective, it is a simple function, simply the indicator of the set Q. The idea behind Lebesgue’s integral is to extend the family of “building blocks” beyond intervals, allowing for more general sets such as Q.
The question then arises: what kind of subsets of the plane or the real line can be measured? Clas- sically, the basic sets are: intervals on R; triangles, squares, polygons, squares on R2. For instance, relating the area of a circle to its length is an old mathematical problem, accomplished by ‘exhaustions’ (by the ancient Greeks 5th BC, the Achimedes of Syracus : 2-3 BC), which corresponds to taking limit in the modern language.
To extend our theory, we therefore begin with simple sets we can measure, and gradually build up more measurable sets by taking complements, unions, and approximations. This translates into the definition of σ-algebras. The new building blocks of our theory will now be measurable sets, i.e. elements 1.7. OBSERVABLES OF RANDOM VARIABLES 14 of the Lebesgue σ-algebra. The way in which we assign a measure/weight to these sets will correspond to the notion of Lebesgue measure.
1.7 Observables of Random Variables
Besides allowing to extend Riemann’s theory of integration, measure theory allows to define integrals in a much more general setting: on any given measure space (to be defined below), one can build up the associated measure theory. One important example corresponds to probability spaces.
Let (Ω, F, P) be a probability space, let X be a real-valued random variable, and let f :Ω → R be a P Borel measurable function. If P({X = i}) = pi for all i ≥ 1, with i pi = 1, the expectation of X is given by X Ef(X) = pif(i) i On the other hand, if X is a random variable with standard Gaussian normal distribution, then
Z 1 2 EX = √ e−x /2f(x)dx. 2π
Althought they look different, both expectations are integrals R f(y) dµ(y) where µ is probability distri- R bution of X. Both are also given by Ω X(ω)dP(ω). These concepts use integration theory with respect to a measure. Chapter 2
Measurable sets
Given a set X , the collection of its subsets is its power set. We denote it by 2X . What kind of subsets can be measured? What axioms should they satisfy? If a subset A can be measured, this ought to lead to some understanding of its complement Ac = X \ A. Moreover, if A, B can be measured, it is natural to postulate that A ∪ B can be measured. Actually, we further require limits of measurable sets to be S measurable, so that if (An)n≥1 is a sequence of measurable functions, n≥1 An should be measurable as well.
2.1 Set Algebra
Denote by Ac the complement of a subset A.
de Morgan’s identities: c c (∩A∈ΛAα) = (∪A ∈ Λα) c c (∪A∈ΛAα) = (∩A ∈ Λα) This holds for any number of sets, not necessarily countable number of.
Distribution Law
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
Distribution law holds for also for infinite number of sets. For example [ [ [ ( Ai) ∩ ( Bj) = (Ai ∩ Bj). i i i,j
Limits If An is an increasing sequence of subsets, we set
∞ lim An = ∪ An. n→∞ n=1
15 2.2. MEASURABLE SPACE 16
Similarly if An is a decreasing sequence of subsets, we define
∞ lim An = ∩ An. n→∞ n=1
Let f : X → Y be a map, define
f −1(A) = {x ∈ X : f(x) ∈ A}.
Taking pre-image respects set operations:
Lemma 2.1.1 −1 [ [ −1 f Bi = f (Bi),
−1 \ \ −1 f Bi = f (Bi),
f −1(A \ B) = f −1(A) \ f −1(B).
2.2 Measurable Space
Let X be a non-empty set. We study collections of subsets of X . Let φ denote the empty set.
Definition 2.2.1 A collection of subsets of X, denoted by A, is a (Boolean) algebra if the following property holds.
• (Non-emptiness ) φ ∈ A.
• (Closedness under complements) If B ∈ A then Bc ∈ A;
• (Closedness under finite unions) If B ∈ A, C ∈ A, then B ∪ C ∈ A.
The assumption φ ∈ A can be equally replaced by the condition that A is not empty.
We would like to be able to take limits.
Definition 2.2.2 A collection F of subsets of X is a σ-algebra if:
• non-emptiness: φ ∈ F.
• closedness under complements: if B ∈ F then Bc ∈ F.
• closedness under countable unions: if we have a sequence (Bn)n≥1 with Bn ∈ F for all n ≥ 1, S∞ then n=1 Bn ∈ F. 2.2. MEASURABLE SPACE 17
Elements of F are called measurable sets and (X , F) is said to be a measurable space.
Remark 2.2.1 Let F be a σ-algebra over X . Then it follows from the definition that:
•X∈F
n • If A1,...,An is a finite collection of elements of F, then ∪i=1Ai ∈ F
∞ • by de Morgan’s identities, if Bn ∈ F for all n ≥ 1, then ∩n=1Bn ∈ F.
• If A, B ∈ F then so does A \ B.
Examples:
1. The smallest σ-algebra over X is {φ, X}, the coarse σ-algebra
2. The largest σ-algebra over X is 2X , the discrete σ-algebra
3. If A ⊂ X , then F = {φ, X , A, Ac} is a σ-algebra.
4. The collection F of subsets A of X such that either A or Ac is countable, is a σ-algebra.
Exercise 2.2.1 Assume that X is infinite, and let F := {A ∈ 2X : A is finite or Ac is finite}. Show that F is an algebra but not a σ-algebra.
Proposition 2.2.2 The intersection of any number of σ-algebras is a σ-algebra. If C is any collection of subsets of a set X , then there always exists a smallest σ-algebra containing C.
Proof The first point is straightforward to prove and left as an exercise. For the second point, it suffices to consider \ A = B. B σ−algebra C⊂B In virtue of the first point, A is a σ-algebra. Moreover, A contains C, and by contsruction A ⊂ B for any other σ-algebra B containing C, as claimed.
The intersection of σ-algebras is often denoted by ∧. For example, F1 ∧ F2 denotes the intersection of F1 and F2.
Definition 2.2.3 If C is a collection of subsets of X, we denote by σ(C) the smallest σ-algebra contain- ing C. We call σ(C) the σ-algebra generated by C. 2.2. MEASURABLE SPACE 18
Example 2.1 Let X with a finite partition A1,...,An, by which we mean that A1,...,An are disjoint n and X = ∪k=1Ak. Let C = {A1,...,An}. Then σ(C) consists of all possible unions of the sets Ai: ( ) [ σ(C) = Ai,I ⊂ {1, . . . n} . i∈I
There are 2n such choices: we can include or not include a particular set and every element of F will come from such a union. In this particular case, the σ-algebra consists of a finite number of elements.
Exercise 2.2.2 Assume that X is infinite, and let F be as in Exercise 2.2.1. Find the σ-algebra generated by F.
2.2.1 Countability of σ-algebras
We will see below that a σ-algebra with an infinite number of elements is uncountable. So a σ-algebra is either generated by a partition as in Example 2.1 above, or has an uncountable number of elements.
Proposition 2.2.3 A σ-algebra F is either finite or uncountable.
Proof will be released later in November.
2.2.2 Fundamental Examples and Exercises
Proposition 2.2.4 If f :Ω → X is a map, and (X , B) a measurable space. Then f pulls back the σ-algebra G to a σ-algebra on Ω. This is the collection of pre-images
σ(f) := {f −1(A): A ∈ B} is a σ-algebra. This is called the σ-algebra generated by f , and also called the pre-image σ-algebra.
Proof. Exercise.
Example 2.2 Let R be endowed with the Borel σ-algebra B(R). If f : X → R is of the form
n X f(x) = aj1Aj (x), x ∈ X j=1
n where aj ∈ R are distinct and the Aj form a partition of X (i.e. ∪j=1Aj = X and the Aj are pairwise disjoint), then σ(f) is generated by the finite collection of sets {A1,...,An}. 2.2. MEASURABLE SPACE 19
Example 2.3 To see the role played by the assumption that Aj are disjoint, we give an example where Aj are not disjoint. Set for example
f(x) = 1[1,2] + 31[1,3] + 41(−2,1].
Then 4, x ∈ (−2, 1) ∪ (1, 2] 3, x ∈ (2, 3] f(x) = 8, x = 1 0, otherwise . The σ-algebra generated by f is given by the partition:
{{1}, (−2, 1) ∪ (1, 2], (2, 3], (−∞, −2] ∪ (3, ∞)}.
If f(x) = 1[1,2] + 31[1,3], then σ(f) is generated by the collection of subsets:
{[1, 2], (2, 3], (−∞, 1) ∪ (3, ∞), φ, R}.
Example 2.4 If F is a σ-algebra and A ⊂ X, then {A ∩ B : B ∈ F} defines a σ-algebra on A, called the trace of the σ-algebra F.
Exercise 2.2.3 Given two σ-algebras F1 and F2, we denote by F1∨F2 the smallest σ-algebra containing both F1 and F2. Show that F1 ∨ F2 can equivalently be characterised by the expressions:
•F 1 ∨ F2 = σ{A ∪ B | A ∈ F1 and B ∈ F2},
•F 1 ∨ F2 = σ{A ∩ B | A ∈ F1 and B ∈ F2}.
Definition 2.2.4 A collection of subsets E is an elementary family if
• φ ∈ E;
• If A, B ∈ E, then A ∩ B ∈ E;
• if A ∈ E then Ac is the finite union of disjoint sets from E.
Exercise 2.2.4 Show that if E is an elementary family, then the collection A of finite disjoint unions of members of E is an algebra.
c n c n Proof If A, B ∈ E, then B = ∪l=1Bl where Bl ∈ E are disjoint. So A\B = A∩B = ∪l=1(A∩Bl) ∈ n A. Also, A ∪ B = (A \ B) ∪ B ∈ A. By induction if Ai ∈ E then ∪i=1Ai ∈ A. This means that the finite union of sets from the elementary family is in A and A is closed under taking unions. 2.3. BOREL σ-ALGEBRAS 20
n We show A is closed under complement. Let A = ∪j=1Aj ∈ A where Aj ∈ E are disjoint. Then c Pnj m m c n c c Aj = m=1 Aj where Aj ∈ E. Since A = ∩j=1(Aj), by associativity, so A is a finite union of intersections of sets from E, therefore in A.
We call the following sets of the following form half open intervals: (a, b], φ or (a, ∞), (−∞, a) and φ, where a, b are real numbers.
Example 2.5 The collection of unions of a finite number of disjoint half open intervals is an algebra.
This follows from taking E to be the collection of half open intervals and use the earlier exercise.
2.3 Borel σ-algebras
2.3.1 Definition
Definition 2.3.1 If X is a complete separable metric space , the smallest σ-algebra generated by the open sets is called Borel σ-algebra, it is denoted by B(X ).
If X is a separable metric space, the metric topology satisfies the second axiom of countability, i.e. there exists a countable base. This countable base generates B(X).
2.3.2 The case of R
Example 2.6 We know the following about B(R), the Borel σ-algebra over R.
∞ 1 1 1. If a ∈ R, {a} = ∩n=1(a − n , a + n ) ∈ B(R). 2. Any countable subset of R is a Borel set.
3. Any intervals of any form is a Borel set.
4. B(R) is generated by the set of open intervals. This follows from the proposition below.
Proposition 2.3.1 Every open set of R is a countable union of disjoint open sets. This decomposition into disjoint open sets is unique up to ordering.
Proof Denote by A the collection of open intervals. Let O be an open set. Let x ∈ O. Then x is contained in an open interval, the interval is a subset of O. The set of the left end points of such interval has an infimum. Let us call it ax. Let bx denote the supremum of the right end of the intervals. Then Ix = (ax, bx) is the maximal interval such that x ∈ Ix ⊂ O Given x, y ∈ O either they are disjoint or 2.4. PRODUCT σ-ALGEBRAS 21
they overlap in which case they are the same. Pick up a rational number qx from Ix. Then the rational numbers are all distinct. So {Ix : x ∈ O} contains only a countable number of distinct open intervals.
2.3.3 Borel σ-algbera on a subset of a metric space
Let (X , d) be a metric space, and Y ⊂ X be a subset of X . Then Y (endowed with the restriction of the metric d to Y) is still a metric space, its open sets are the subsets of the form U ∩ Y, where U is an open subset of X . One may consider the corresponding Borel σ-algebra B(Y). It turns out that the following holds
Proposition 2.3.2 B(Y) coincides with the trace on Y of B(X ), that is
B(Y) = {A ∩ Y,A ∈ B(X )},
(see Example 2.4 above).
Proof By Example 2.4, we know that A := {A ∩ Y,A ∈ B(X )} is a σ-algebra on Y. Any open subset of Y is of the form U ∩ Y, where U is an open (hence Borel) subset of X , hence B(Y) ⊂ A. Conversely, the collection {A ∈ B(X ): A ∩ Y} is shown at once to be a σ-algebra on X , and it contains open subsets of X . Therefore it contains, hence coincides with, B(X ). So A ⊂ B(Y), and the claim follows.
Remark 2.3.3 In the special case where Y is itself a Borel measurable subset of X , then for any A ∈ B(X ), we also have A ∩ Y ∈ B(X ), hence it follows from the above proposition that
B(Y) = {B ∈ B(X ): B ⊂ Y}.
In particular, in this case, B(Y) ⊂ B(X ).
2.4 Product σ-algebras
Let (X , F) and (Y, G) be two measurable spaces. In general, subsets of X × Y which are product sets, i.e. of the form A × B with A ∈ F and B ∈ G, do not constitute a σ-algebra. Rather, we introduce the following definition.
Definition 2.4.1 We define the product σ-algebra (also called tensor σ-algebra) on the product space X × Y to be that generated by product sets
F ⊗ G = σ ({A × B : A ∈ F,B ∈ G}) . 2.4. PRODUCT σ-ALGEBRAS 22
2.4.1 Product of a countable family of σ-algebras *
The above definition of product extends to any number of measurable spaces. We give the definition for the product of a countable number of measurable spaces.
Definition 2.4.2 Let (En, Fn) be measurable spaces. The tensor or product σ-algebra on the product ∞ space Πn=1En is σ Πn≥1An, : An ∈ Fn .
Remark 2.4.1 By stability of a σ-algebra under countable intersections, the above σ-algebra is also n generated by measurable cylinders, i.e. subsets of the form Πk=1Ak ×Πk>nEk for some n ≥ 1 and with A1 ∈ F1,...,An ∈ Fn.
Let (En, dn), n ≥ 1, be metric spaces. The product topology on E = Πn≥1En is the coarsest topology such that the projections pn : E → En are continuous, it is generated by sets of the form −1 pn (Un) where n ≥ 1 and Un is an open set of En. The product topology is induced by the distance d on E defined by ∞ X dn(xn, yn) ∧ 1 d(x, y) = , 2n n=1 for any two elements x = (xn)n≥1 and y = (yn)n≥1 of E. We may consider the Borel σ-algebra B(E) of the product metric space E, and investigate how it relates with the product σ-algebra ⊗n≥1B(En).
Lemma 2.4.2 ⊗n≥1B(En) ⊂ B(E).
Proof By Remark 2.4.1 above, it suffices to prove that, for all n ≥ 1 and for all A1 ∈ B(E1),...,An ∈ B(En), we have A := A1 × ... × An × Πk>nEk ∈ B(E). Note that the claim is true if A1,...,An are all open subsets, as then A is an open, hence Borel subset of E. Thus, for all A2,...An open subsets of E2,...,En, respectively, the collection
A1 := {A1 ∈ B(E1): A1 × ... × An × Πk>nEk ∈ B(E)} contains all open subsets of E1. It is also a σ-algebra. Hence A1 = B(E1), proving that
A1 × ... × An × Πk>nEk ∈ B(E) whenever A1 ∈ B(E1), and A2,...An are open subsets of E2,...,En, respectively. Porceeding by in- duction over i = 1, . . . , n, we similarly show that the claim still holds if A1,...,Ai are Borel measurable and Ai+1,...,An are open. Taking i = n yields the claim.
Remark 2.4.3 An alternative reasoning is as follows. The coordinate mappings pn : E → En being continuous, by Proposition 4.3.3 (see below), they are therefore measurable with respect to the Borel σ-algebra on the product space. Hence, for all n ≥ 1, and all A1 ∈ B(E1),...,An ∈ B(En), n −1 A1 × ... × An × Πk>nEk = ∩i=1Pi (Ai) ∈ B(E) 2.5. A WARNING 23 which yields the claim.
We now make the additional assumption that the En are all separable.
Remark 2.4.4 If X is a separable metric space, the metric topology satisfies the second axiom of count- ability, i.e. there exists a countable base. It suffices e.g. to consider the (countable) collection of open balls B(x, r) = {y ∈ X : d(x, y) < r}, for r > 0 rational, and for x ∈ D, where D is a countable dense subset of X. This countable base generates B(X).
∞ Theorem 2.4.5 Let (En) be separable metric spaces and set E = Πn=1En. Then
∞ B(E) = ⊗n=1B(En).
∞ Proof In view of the above lemma, there only remains to prove that B(E) ⊂ ⊗n=1B(En). For all n ≥ 1, let Dn be a countable base for the topology of En. Then subsets of E of the form U1×...×Un×Πk>nEk, for n ≥ 1, and U1 ∈ D1,...,Un ∈ Dn form a countable base for the product topology on E. That is, any open subset W of E can be written in the form W = ∪j≥1Wj, with Wj = U1 ×...×Un(j) ×Πk>n(j)Ek, where, for all k = 1, . . . , n(j), Uk is an element of Dk (in particular it is open, hence Borel). In particular, ∞ ∞ for all j ≥ 1, Wj ∈ ⊗n=1B(En). Therefore W ∈ ⊗n=1B(En), and this being true for any open subset W of E, the claim follows.
2.5 A warning
We end this chapter with a warning:
Remark 2.5.1 if C is a sollection of subsets of a set X and F = σ(C), this does not mean in general ∞ that every measurable set is of the form ∪i=1Ci with Ci ∈ C. Taking for example C to be the collection ∞ {(−∞, a], where a ∈ R}, then σ(C) = B(R), however if Ci = (−∞, ai] are elements of C, then ∪i=1Ci is an interval of the form (−∞, a) or (−∞, a] with a = sup{ai, i ≥ 0}, and the Borel set {−1, 1}, for instance, is not of this form. More generally, as soon as C is an infinite collection of sets, one cannot hope to represent a generic element of σ(C) as some countable combination of elements of C.
2.6 Background material*
A topological space (X , T ) consists of a set X equipped with a topology T , i.e. a subset T ⊂ 2X such that: 2.6. BACKGROUND MATERIAL* 24
• φ ∈ T and X ∈ T .
TN • If {A0,A1,...,AN } ⊂ T , then n=0 An ∈ T . S • If A ⊂ T , then A∈A A ∈ T .
In other words, T is closed under arbitrary unions and finite intersections. Elements of T are called open sets. A function between topological spaces is continuous if the pre-images of open sets are open sets.
Given a topological space (X , T ), we define B(X ) to be the smallest σ-algebra on X containing T . This particular σ-algebra is called the Borel σ-algebra of X . In other words, the Borel σ-algebra is the smallest σ-algebra such that all open sets are measurable. We denote by Bb(X ) the (Banach) space of all Borel-measurable and bounded functions from X to R equipped with the norm
kϕk∞ = sup |ϕ(x)| . (2.1) x∈X
We denote by Cb(X ) the (Banach) space of all continuous and bounded functions from X to R equipped with the same norm as in (2.1).
The open sets of a metric space gives a topology on the metric space. Borel σ-algebras on a complete separable metric space is specially nice.
When discussing Borel σ-algebras, we will always assume that X is a complete separable metric space.
A metric space is compact if any covering of it by open sets has a sub-covering of finite open sets. A discrete metric space (whose subsets are all open sets ) is compact if and only if it is finite. (e.g. Z and N with the usual distance d(x, y) = |x − y| is not compact.) A subset of a metric space is compact if it is compact as a metric space with the induced metric. It is relatively compact if its closure is compact. A metric space is sequentially compact if every sequence of its elements has a convergent sub-sequence (with limit in the metric space of course). It is totally bounded if for any > 0 it has a finite covering by open balls of side . A metric space is complete if every Cauchy sequence converges.
It is a theorem that a metric space is compact if and only if it is complete and totally bounded. A metric space is compact if and only if it is sequentially compact.
A subset of a metric space is relatively compact if it is sequentially compact (the limit may not need to belong to the subset). Chapter 3
Measures
3.1 Basics of Measures
A measure assigns a number to every measurable set.
Definition 3.1.1 A measure µ on the measurable space (X , F) is a map µ: F → [0, ∞] with the fol- lowing properties;
• µ(φ) = 0.
• (σ-additive Propertices) If {An}n>0 is a countable collection of elements of F that are all disjoint, then one has [ X µ( An) = µ(An). n>0 n>0
The triple (X , F, µ) is called a measure space.
Definition 3.1.2 We say that:
• µ is a finite measure if µ(X ) < ∞.
• µ is a probability measure if µ(X ) = 1.
∞ • µ is σ-finite if there exists A1 ⊂ A2 ⊂ ... such that µ(Ai) < ∞ and X = ∪i=1Aj. This is equivalent to the statement that there exists an increasing sequence of measurable sets Xn with ∞ µ(Xn) < ∞ such that X = ∪n=1Xn. The sequence is called an exhausting sequence.
Convention. From now on we study only σ-finite measures.
25 3.1. BASICS OF MEASURES (2ND LECTURE) 26
X Example 3.1 If X = {1, 2, . . . , n}, let F = 2 . Then (X , F) is a σ-algebra. If we set µ(i) = ai, this defines a measure.
Proposition 3.1.1 Let (Ω, F) be a measurable space. Then
1. If A, B ∈ F and A ⊂ B then µ(B \ A) = µ(B) − µ(A).
2. Monotonicity. If A ⊂ B then µ(A) ≤ µ(B).
3. (Finite) additivity. If A1,A2 ∈ F and A1 ∩ A2 = φ then µ(A1 ∪ A2) = µ(A1) + µ(A2).
∞ 4. Continuity from below. If An ∈ F, An increases, then µ(∪n=1An) = limn→∞ µ(An).
5. Continuity from above. If An ∈ F, An decreases and µ(A1) < ∞, then
∞ µ(∩ An) = lim µ(An). n=1 n→∞
Proof Claim 1 follows from that B = A ∪ (B \ A) is a disjoint union whenever A ⊂ B. Hence µ(B) = µ(A) + µ(B \ A).
Claim 2 follows by claim 1, claim 3 follows from taking Ai = φ for i ≥ 3.
∞ We prove claim 4. If A1 ⊂ A2 ⊂ A3 ⊂ ..., set A0 = φ, A = ∪i=1Ai Also set
B1 = A1,B2 = A2 \ A1,B3 = A3 \ A2,...,
∞ ∞ and so on. Then {Bi} are pairwise disjoint and ∪i=1Bi = ∪i=1Ai = A. Then, ∞ n n X X X µ(A) = µ(Bi) = lim µ(Bi) = lim [µ(Ai) − µ(Ai−1)] = lim µ(An). n→∞ n→∞ n→∞ i=1 i=1 i=1
Prove claim 5. We assume that µ(A1) < ∞ and proceed as for claim 3.
Example 3.2 1. Let X be a set endowed with the discrete σ-algebra 2X . Set
µ(A) = |A|,A ⊂ X ,
where |A| denotes the cardinality of A. Then µ is a measure, called the counting measure on (X , 2X ).
2. Let (X , G) be a measurable space and x ∈ X . The δ measure at x, defined by ( 1, if x ∈ A, δ (A) = x 0, if x 6∈ A,
is a probability measure. 3.1. BASICS OF MEASURES (2ND LECTURE) 27
3. A discrete measure P on a countable space Ω = {ω1, ω2,... }, is uniquely determined by the sequence of numbers (pi)i≥1, where
pi = P ({ωi}), i ≥ 1.
Then, if A ⊂ Ω, we have ∞ ∞ X X P (A) = piδωi (A) = pi1A(ωi). i=1 i=1 P∞ In particular P is a probability measure if and only if i=1 pi = 1.
The last example above contains many examples of measures on countable sets you already know:
• Let Y be a random variable on a two state space X = {0, 1} with Bernoulli distribution of parame- ter p: P(Y = 0) = p and P(Y = 1) = 1 − p. Then the probability distribution of Y is the measure on X given by: PY := p δ0 + (1 − p) δ1.
• Let X be a binomial random variable of parameter p on {0, . . . , n}. The probability distribution of X is the measure PX on {0, . . . , n} given by n X PX = k=0
• Let X be a Poisson random variable of parameter λ on {0, 1, 2,...}. The probability distribution of X is the measure PX on {0, 1, 2 ..., } given by ∞ X λk P (A) = e−λ δ (A). X k! k k=0
Not all measures can be written as linear combinations of Dirac measures as above: here are two examples of more complicated measures.
Example 3.3 The interval [0, 1] equipped with its Borel σ-algebra admits a unique probability measure λ such that λ([a, b]) = b−a, for all a, b ∈ [0, 1], a ≤ b. The existence and uniqueness of such a measure, called the Lebesgue measure on [0, 1], will be shown below.
Example 3.4 The measure Z P(A) = e−x dx, A to be defined below, is a probability measrue on the half-line R+ equipped with B(R+). In such a situation, where the measure has a density with respect to Lebesgue measure, we will also use the short- hand notation P(dx) = e−x dx. 3.2. UNIQUENESS OF MEASURES (3RD LECTURE) 28
Exercise 3.1.1 If µ : F → [0, ∞] satisfies µ(φ) = 0, is finitely additive, and continuous from below, then it is a measure.
Exercise 3.1.2 Let N be endowed with the discrete σ-algebra 2N. Let µ : 2N → [0, ∞] be defined by P −n µ(A) = n∈A 2 if A is finite, and µ(A) = +∞ otherwise.
1. Is µ additive?
2. Is µ a measure on (N, 2N)?
3.2 Uniqueness statements
We often know that a property holds for a sub-collection of a σ-algebra, and want to show it holds for every set in the σ-algebra. The difficulty with this is that σ-algebras are in generally complicated. If we have a finite partition of X , the σ-algebra contains a finite number of sets. Otherwise it has uncountably many elements. See §2.2.1.
There are a number of powerful techniques, one of them is the π − λ theorem.
3.2.1 π − λ theorem
Definition 3.2.1 A non-empty collection of sub-sets C is a π-system if A, B ∈ C implies that A∩B ∈ C.
Definition 3.2.2 A collection C of sub-sets is called a λ-system if
1. Ω ∈ C
2. If A, B ∈ C and A ⊂ B then B \ A ∈ C.
3. If (An)n≥1 is a non-decreasing sequence of elements of C, i.e. An ∈ C and An ⊂ An+1 for all n ≥ 1, then ∪n≥1An ∈ C.
Note that a σ-algebra is both a π-system and a λ system.
Proposition 3.2.1 (not given in class) 1. Show that the intersection of any number of λ-systems is a λ-systems. Therefore given any collections of subsets there exists a smallest λ-system containing it.
2. If a λ system A is also a π-system, then it is a σ-algebra. 3.3. MEASURE DETERMINING SETS 29
Proof Part (1) is trivial. To see a set which is simultaneously a π-system and a λ-system is a σ- algebra, remark that taking unions is obtained from taking intersections and complements. In turn, taking countable unions is obtained from taking finite unions and monotone limits, whence the claim.
Theorem 3.2.2 If C is a π-system, then the smallest λ-system generated by C, λ(C), is a σ-algebra, and λ(C) = σ(C).
Proof Firstly
F1 : = {A : A ∩ C ⊂ λ(C)} = {A ∈ λ(C): A ∩ F ∈ λ(C), ∀F ∈ C} is a λ-system containing C, which implies F1 = λ(C). Indeed, let F ∈ C, then (1) Ω ∩ F = F ∈ C, and so Ω ∈ F1; (2) If A ⊂ B, with A, B ∈ F1,
(B \ A) ∩ F = B ∩ F \ (A ∩ F ) ∈ F1
S S (3).If An ∈ F1, An ∩ F ∈ λ(C), so ( An) ∩ F = n(An ∩ F ) ∈ λ(C). Similarly,
F2 : = {A : A ∩ λ(C) ⊂ λ(C)} = {A ∈ λ(C): A ∩ E ∈ λ(C), ∀E ∈ λ(C)} is a λ-system containing C, and therefore F2 = λ(C). Thus λ(C) is both a λ-system and a π-system, and is therefore a σ-algebra. Since it contains C, we obtain σ(C) ⊂ λ(C). But, since any σ-algebra is a λ-system, we also have λ(C) ⊂ σ(C), and the equality follows.
We present now a very useful result, which follows at once from Theorem 3.2.2.
Corollary 3.2.3 [π − λ Theorem.] If a λ-system contains a π-system C, then it contains the σ-algebra generated by C.
3.3 Measure determining sets
Theorem 3.3.1 Suppose that (Ω, F) is a measurable space, and C is a π-system generating F. Let µ and ν be two measures which agree on C.
1. If µ(Ω) = ν(Ω) < ∞, then µ = ν
2. More generally, if there exists an increasing sequence of subsets Ωk ∈ C, such that Ω = ∪k≥1Ωk and µ(Ωk) = ν(Ωk) < ∞ for all k ≥ 1, then µ = ν. 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 30
Proof (1) First assume that µ(Ω) = ν(Ω) < ∞. Let
G = {A ∈ F : µ(A) = ν(A)}.
Then Ω ∈ G by assumption. Moreover, if An is a non-decreasing sequence of measurable sets in G,
∞ ∞ [ [ µ( An) = lim µ(An) = lim ν(An) = ν( An) n→∞ n→∞ n=1 n=1
Thus, G is closed under taking lower limit. Let A ⊂ B, A, B ∈ G, then by additive property,
µ(B \ A) = µ(B) − µ(A) = ν(B) − ν(A) = ν(B \ A).
This means B \ A ∈ G, and therefore G is a λ-system containing a π-system generating F. By the π − λ-Theorem, G ⊃ F and µ = ν on F, which proves the first point.
(2) For the second point, let Fk = {A ∩ Ωk : A ∈ F} denote the trace σ-algebras on Ωk, and denote by µk and νk the restrictions to Ωk of the measures µ and ν:
∀A ∈ F, µk(A) = µ(A ∩ Ωk), νk(A) = ν(A ∩ Ωk).
Applying the first point to µk and νk, we deduce that µk = νk. Therefore, by lower contiunity of measures, we obtain, for all A ∈ F,
µ(A) = lim µ(A ∩ Ωk) = lim ν(A ∩ Ωk) = ν(A), k→∞ k→∞ completing the proof.
Example 3.5 Let Ω = {1, 2, 3, 4}. Let G = {{1, 2}, {1, 3}, {3, 4}}, which generates the discrete σ- algebra F ( the power set), but not a π-system. The measure µ and ν agree on G, not on F.
µ(1) = 1/6, µ(2) = 2/6, µ(3) = 1/6, µ(4) = 2/6,
ν(1) = 2/6, ν(2) = 1/6, ν(3) = 0, ν(4) = 3/6.
3.4 Construction of measures from pre-measures
If (X , F) is a measurable space, constructing a non-trivial measure µ on (X , F), is in general, a very complicated task. One of the main goals we are aiming at is the construction of the Lebesgue measure on (R, B(R)). This will require introducing further set-theoretical notions. We start with some definitons. 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 31
3.4.1 Negligible sets and complete σ-algebras
Definition 3.4.1 A measurable set A on a measure space (M, F, µ) is said to be a ‘null set’ if µ(A) = 0. A subset N of M is negligible if there exists a null set A ∈ F such that N ⊂ A.
Remark 3.4.1 A negligible set N is not necessarily measurable. Therefore we may not write that µ(N) = 0, simply because µ(N) is not defined.
Definition 3.4.2 Given a measure space (M, F, µ), F is said to be complete with respect to µ if it contains every negligible null set. The completion of F with respect to µ, denoted by F¯, is the σ- algebra given by F¯ = A ∨ N , where N is the collection of negligible sets.
3.4.2 A completion theorem*
Let Ai be two collections of subsets, %i : Ai → [0, ∞], i = 1, 2. In the sequel we say %2 is an extension of %1 if A2 contains A1 and %1(A) = %2(A) whenever A ∈ A1. Let (M, F, µ) be a measure space.
Proposition 3.4.2 The completion of F with respect to µ can be expressed as
F¯ = {A ∪ N,A ∈ F,N ∈ N }
= {A ⊂ M : ∃B1,B2 ∈ F, µ(B2 \ B1) = 0,B1 ⊂ A ⊂ B2}.
Moreover, we can extend uniquely µ into a measure µ¯ on F¯ by setting µ¯(A ∪ N) = µ(A) for A ∈ F and N ∈ N , and F¯ is complete w.r.t. µ¯. Furthermore, F¯ is the smallest σ-algebra containing F and to which µ can be extended in a way that F¯ is complete with respect to µ.
Proof The proof of the first two assumptions is similar to the proof of Theorem B, Section 13 (Chapter III) in Halmos’s book. We show the last claim: assume that G is a sigma-algebra containing F and on which µ can be extended into a measure ν in a way that G is complete with respect to ν. By definition of N , any N ∈ N satisfies N ⊂ A for some A ∈ F, with µ(A) = 0. Since F ⊂ G and ν is an extension of µ to G, we also have ν(A) = 0. Hence A is negligible with repect to ν. Since G is complete w.r.t. ν, we deduce that N ∈ G. Thus F ∪ N ⊂ G, so since G is a σ-algebra, we have F¯ = σ(F ∪ N ) ⊂ G. Moreover, for all A ∈ F and N ∈ N , we have ν(A) ≤ ν(A ∪ N) ≤ ν(A) + ν(N) = ν(A), so ν(A ∪ N) = µ(A) =µ ¯(A ∪ N). Thus ν is an extension of µ¯ to G, which yields the claim.
Remark 3.4.3 Ususally the extension µ¯ of µ to F¯ will still be denoted by µ. 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 32
3.4.3 Outer Measure
Definition 3.4.3 A function µ∗ : 2X → [0, ∞] is said to be an outer measure over X if we have:
(1) µ∗(φ) = 0,
(2) Monotonicity. If A ⊂ B then µ∗(A) ≤ µ∗(B).
(3) countable sub-additivity. ∞ ∗ ∞ X ∗ µ (∪n=1An) ≤ µ (An). n=1
Exercise 3.4.1 • Let µ∗(φ) = 0 and µ∗(A) = ∞ if A ⊂ X is not empty. Show that µ∗ is an outer-measure.
• If an outer measure µ∗ satisfies that µ∗(A ∪ B) = µ∗(A) + µ∗(B) for any A, B ∈ F where F is a σ-algebra, what can you say about µ∗?
Definition 3.4.4 Let A be a collection of subsets containing φ, X . A map A → [0, ∞) is a pre-measure if
• %(φ) = 0, and
∞ • for any An ∈ A pairwise disjoint with ∪n=1An ∈ A,
∞ ∞ X %(∪n=1An) = %(An). n=1
Outer measures can be obtained from any suitable function on a given collection of subsets, by cov- ering a set with sets from this collection. A collection of set {Aα, α ∈ Λ} is said to be a cover of a set E if E ⊂ ∪α∈ΛAα. We are interested in covers consisting of a countable number of elements.
Proposition 3.4.4 Let A be a non-empty collection of subsets with φ ∈ A, X ∈ A. Let % : A → [0, ∞] be a function such that %(φ) = 0. We define a function on 2X as below. For any E ⊂ X ,
∞ ∗ X ∞ µ (E) = inf %(Aj): E ⊂ ∪j=1Aj,Aj ∈ A . (3.1) j=1
Then µ∗ is an outer measure.
Proof (1) Since every set is covered by {X }, the definition makes sense, and µ∗(φ) = 0. (2) If A ⊂ B, then every cover of B is a cover A and so µ∗(B) ≥ µ∗(A). 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 33
(3) To show countable sub-additivity, take any sequence of subsets Aj. For any > 0 there exist k k Bj ∈ A with Aj ⊂ ∪kBj such that
∞ X µ∗(A ) ≥ %(Bk) − . j j 2j k=1 Summing over j, ∞ ∞ ∞ X ∗ X X k µ (Aj) ≥ %(Bj ) − . j=1 j=1 k=1 k ∞ Since {Bj } is a cover of ∪j=1Aj,
∞ ∞ ∗ ∞ X X k µ (∪j=1Aj) ≤ %(Bj ). j=1 k=1 Hence, X ∗ ∗ ∞ µ (Aj) ≥ µ (∪j=1Aj) − j ∗ ∞ P ∗ for any > 0. Take → 0 to see µ (∪j=1Aj) ≤ j µ (Aj), thus completing the proof.
How do we obtain additive from sub-additive? Let us single out sets that are strongly additive. Take a set A, we then have two collections of subsets: subsets of A on the one hand, and subsets of Ac on the other hand. If we take one from each collection and take their union, we want to prove that µ∗ is additive along this decomposition.
Definition 3.4.5 A subset A of X is said to be µ∗-measurable (in the sense of Caratheodory) if for any set B ⊂ X , we have µ∗(B) = µ∗(B ∩ A) + µ∗(B ∩ Ac). (3.2)
We have already sub-additivity, the non-trivial part of (3.2) is
µ∗(B) ≥ µ∗(B ∩ A) + µ∗(B ∩ Ac). (3.3)
Theorem 3.4.5 [Caratheodory Theorem] If µ∗ is an outer measure on X and G the collection of µ∗- measurable sets, then
(1) G is a σ-algebra.
(2) The restriction of µ∗ to G is a measure.
(3) G is complete w.r.t. µ∗.
Proof The proof is deferred to §3.4.5. 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 34
3.4.4 Extension of pre-measures
Proposition 3.4.6 If A is an algebra, and % is a pre-measure on A, we define µ∗ by (3.1). Then the following statements hold.
1. Every set in A is µ∗-measurable.
2. µ∗ and % agree on A.
In particular, µ∗ is a measure on σ(A).
Proof The proof is deferred to §3.4.5.
Proposition 3.4.7 * If % is a pre-measure on an algebra A, we define µ∗ by (3.1) and % is σ-finite, then µ∗ is its unique extension to G and therefore to σ(A).
Proof The proof is deferred to §3.4.5.
3.4.5 Proof for Caratheodory’s Theorem
We return to prove some of the theorems and propositions given earlier.,
Theorem 3.4.5 (Caratheodory’s Theorem) If µ∗ is an outer measure on X , then the collection G of µ∗-measurable sets is a σ-algebra, the restriction of µ∗ to G is a measure, and G is complete w.r.t. µ∗.
Proof Throughout the proof let us write µ = µ∗ for simplicity. (1) φ ∈ G and µ∗(φ) = 0. (2) Since the roles played by A and Ac are symmetric, A ∈ G implies that Ac ∈ G.
(3) Suppose A, F ∈ G, we show A ∪ F ∈ G.
Take any B ∈ X . We apply (3.2) first with A ∈ G and then with F ∈ G:
µ(B) = µ(B ∩ A) + µ(B ∩ Ac) = µ(B ∩ A ∩ F ) + µ(B ∩ A ∩ F c) + µ(B ∩ Ac ∩ F ) + µ(B ∩ Ac ∩ F c).
We use A∪F = (A∩F )∪(A∩F c)∪(Ac ∩F ), as well as (A∪F )c = Ac ∩F c, and apply sub-additivity
µ(B) = µ(B ∩ A ∩ F ) + µ(B ∩ A ∩ F )c + µ(B ∩ Ac ∩ F ) + µ(B ∩ Ac ∩ F c) ≥ µ(B ∩ (A ∪ F )) + µ(B ∩ (A ∪ F )c).
This shows that A ∪ F ∈ G. Also, if A, F ∈ G are disjoint,
µ(A ∪ F ) = µ((A ∪ F ) ∩ F ) + µ((A ∪ F ) ∩ F c) = µ(F ) + µ(A). 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 35
∞ n (4) Let An ∈ G, we show A = ∪n=1An ∈ G. Set Fn = ∪j=1Aj. Since G is stable under finite unions and under taking the complement, we may assume that the An are disjoint. Then for any B ∈ X , using Aj ∈ G, c µ(B ∩ Fn) = µ(B ∩ Fn ∩ An) + µ(B ∩ Fn ∩ (An) )
= µ(B ∩ An) + µ(B ∩ Fn−1) c = µ(B ∩ An) + µ(B ∩ Fn−1 ∩ An−1) + µ(B ∩ Fn−1 ∩ An−1)
= µ(B ∩ An) + µ(B ∩ An−1) + µ(B ∩ Fn−2) = ... n X = µ(B ∩ Aj). j=1
Thus using Fn ∈ G, n c X c µ(B) = µ(B ∩ Fn) + µ(B ∩ (Fn) ) = µ(B ∩ Aj) + µ(B ∩ (Fn) ) j=1 n X c ≥ µ(B ∩ Aj) + µ(B ∩ A ). j=1
Taking n → ∞, then using ∪(B ∩ Aj) = B ∩ A and sub-additivity, ∞ X c µ(B) ≥ µ(B ∩ Aj) + µ(B ∩ A ) j=1 ≥ µ(B ∩ A) + µ(B ∩ Ac) ≥ µ(B). P∞ Hence A ∈ G and j=1 µ(B ∩ Aj) = µ(B ∩ A) for any B ∈ G. Take B = A to conclude the σ-additivity. We proved that G is a σ-algebra and µ is a measure on it.
(5) Finally if µ(A) = 0 and F ⊂ A, then for any B ⊂ X ,
µ(B) ≤ µ(B ∩ F ) + µ(B ∩ (F )c) ≤ 0 + µ(B).
Hence all the inequalities above are equalities,
µ(B) = µ(B ∩ F ) + µ(B ∩ (F )c) and F ∈ G. Therefore G contains all negligible subsets F for µ and µ(F ) = 0.
In the sequel, we shall assume A to be an algebra. In that case, we can use the following, convenient lemma:
Lemma 3.4.8 Let A be an algebra and % : A → [0, ∞] an additive function. In words, for all A, B ∈ A disjoint, we have: %(A ∪ B) = %(A) + %(B). Then % is 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 36
1. monotone: for any A, B ∈ A with A ⊂ B, %(A) ≤ %(B)
Sn Pn 2. sub-additive: for any A1,...,An ∈ A, %( i=1 Ai) ≤ i=1 %(Ai)
Proof By additivity for all A, B ∈ A with A ⊂ B, we have
%(B) = %(A) + %(B \ A) ≥ %(A), which yields monotonicity. Now, for A1,...,An ∈ A, we set
i−1 [ Bi = Ai \ Aj ∈ A, i = 1, . . . , n, j=1 so that the Bi are disjoint. Hence, using successively the additivity and monotonicity of %,
n ! n ! n n [ [ X X % Ai = % Bi = % (Bi) ≤ % (Ai) , i=1 i=1 i=1 i=1 which yields the requested sub-additivity.
Proposition3.4.6. If % is a pre-measure on an algebra A, we define µ∗ by (3.1). Then:
1. µ∗ = % on A.
2. every set in A is µ∗-measurable.
In particular, µ∗ is a measure on σ(A), the completion of σ(A) w.r.t µ∗.
Proof (1) Let B ∈ A, then B is a cover for itself and µ∗(B) ≤ %(B). As before write µ = µ∗ for simplicity. We now prove the reverse inequality. Suppose that B ⊂ ∪nAn where An ∈ A. Then B = ∪nA˜n, where A˜n = (B ∩ A) \ (∪i ∞ ∞ X X %(B) = %(A˜n) ≤ %(An). n=1 n=1 Take infimum over all covers, ∞ X %(B) ≤ inf %(An) = µ(B). n=1 We have showed that % and µ agree on A. (2) We now show that any A ∈ A is µ∗-measurable. Take any B ⊂ X, we want to show µ∗(B) ≥ µ∗(A ∩ B) + µ∗(Ac ∩ B), (3.4) 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 37 For any > 0 choose a cover {Bj} of B from A with ∞ ∗ X µ (B) ≥ %(Bj) − . j=1 We first use A ∈ A ⊂ G, % = µ∗ on A and σ- sub-additivity of µ, and ∞ ∞ ∞ X X X c %(Bj) = %(A ∩ Bj) + %(A ∩ Bj) j=1 j=1 j=1 ∗ ∗ c ≥ µ (A ∩ ∪jBj) + µ (A ∩ ∪jBj) ≥ µ∗(A ∩ B) + µ∗(Ac ∩ B). In the last step we use ∪jBj ⊃ B and the monotonicity of the outer measure. Therefore µ∗(B) ≥ µ∗(A ∩ B) + µ∗(Ac ∩ B) − . Taking → 0, we obtain µ∗(B) ≥ µ∗(A ∩ B) + µ∗(Ac ∩ B), so that A is µ∗-measurable (the reverse inequality follows by sub-additivity), which proves the second claim. For the final statement, note that by 3.4.5, µ∗ defines a measure on G, and that G is complete w.r.t. ∗ µ . Since A ⊂ G, therefore σ(A) ⊂ G, and the claim follows. Proposition 3.4.9 * Let % be a σ-finite pre-measure on an algebra A, and define µ∗ by (3.1). Then µ∗ is the unique measure extending % to G, and therefore the unique measure on σ(A), extending %. Proof Step 1. We first show that if ν is another measure on the µ∗-measurable set G extending %, then ∗ ν ≤ µ . Indeed if E ∈ G, E ⊂ ∪Aj where Aj ∈ A, then by sub-additivity, X X ν(E) ≤ ν(∪Aj) ≤ ν(Aj) = %(Aj). j j This holds for any covering of E by sets from A, hence ν(E) ≤ µ∗(E). Step 2. Suppose that E ∈ G is such that µ∗(E) < ∞, we show that µ∗(E) = ν(E). We can find a P ∗ ∗ cover Aj of E from A with j µ (Aj) ≤ µ (E) + . We may and will assume that the cover is disjoint. ∞ Set A = ∪j=1Aj. Since E ⊂ A, ∗ ∗ ∗ X ∗ ∗ µ (A \ E) = µ (A) − µ (E) ≤ µ (Aj) − µ (E) < . (3.5) j 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 38 Also, since E ⊂ A, ∗ ∗ X ∗ µ (E) ≤ µ (A)≤ µ (Aj) j agree on A X = ν(Aj) = ν(A) j = ν(E) + ν(A \ E) ν≤µ∗ ≤ ν(E) + µ∗(A \ E) ≤ ν(E) + , where we have used (3.5) in the last inequality. Taking → 0, we see that µ∗(E) = ν(E). Step 3. Let % be σ-finite. By the assumption there exists an increasing sequence of sets Xn from A S such that %(Xn) < ∞ and n≥1 Xn = X . Then using step 2 we can complete the proof (exercise). 3.4.6 Construction of Lebesgue-Stieltjes / Lebesgue measure We want to construct a measure λ on (R, B(R)), such that the measure of a half open interval is its length, and such that it is translation invariant, i.e. λ(A + t) = λ(A) for any t ∈ R and any Borel set A, where A + t = {a + t : a ∈ A} is the translate of A by the number t. We will show that there exists a unique such measure on (R, B(R)) called the Lebesgue measure. This will be one specific example of Lebesgue-Stieltjes measure. Definition 3.4.6 A collection of subsets E is an elementary family if • φ ∈ E; • If A, B ∈ E, then A ∩ B ∈ E; • if A ∈ E then Ac is a finite union of disjoint sets from E. Exercise 3.4.2 Show that if E is an elementary family, then the collection A of finite disjoint unions of members of E is an algebra. In this section, by half open interval we mean sets of the form (a, b], (c, ∞), (−∞, b], or φ, where a, b, c are finite numbers. For simplicity we write a generic half interval as (c, d], where c ∈ R ∪ {−∞}, and d is a real number or ∞. In the latter case by (c, ∞], in this chapter, we really mean (c, ∞] ∩ R. Definition 3.4.7 Henceforth, in this section, let E denote the collection of half open intervals and let A be the collection of finite unions of disjoint half open intervals. We have seen that 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 39 Proposition 3.4.10 A is an algebra. Remark 3.4.11 Every element of A can be written as a finite disjoint union of maximal intervals (this means the interval cannot be extended within A): n [ A = (ak, bk], (3.6) k=1 this is unique if we order them by ak < bk < ak+1. Let us call this the canonical representation. If A is not bounded from above, bn = ∞, by (an, bn] we mean (an, bn] ∩ R. Let F : R → R be an increasing and right continuous function. Then it has at most a countable number of discontinuities, all of which are of jump type (i.e. left and right limits exist). We define F (∞) = lim F (x),F (−∞) = lim F (x). (3.7) x→∞ x→−∞ Sn Definition 3.4.8 For A = j=1(aj, bj] ∈ A, written in its canonical representation , we set n X %(A) = (F (bj) − F (aj)). (3.8) j=1 We also set %(φ) = 0. Remark 3.4.12 This definition is independent of the expression for A. Indeed, suppose that A = (c, d] Sn is written also as A = j=1(aj, bj], then up to a re-ordering, c = a1 ≤ b1 = a2 ≤ b2 = a3 ≤ · · · ≤ bn = d. This gives a telescopic sum, n X (F (bj) − F (aj)) = F (bn) − F (a1) = F (d) − F (c). j=1 So when A is a single interval, there is no ambiguity in the definition of %. Pp If A = k=1(ck, dk] is written in its canonical representation, so each interval is a maximal interval, then there is partition {Λk, k = 1, . . . , p} of {1, . . . , n} such that [ (aj, bj] = (ck, dk]. j∈Λk (recall that a partition of a set I is a collection of disjoint subsets of I whose union is I). In fact, it suffices to consider Λk := {j = 1, . . . , n :(aj, bj] ⊂ (ck, dk]}. Hence n p p X X X X %((aj, bj]) = %((aj, bj]) = %((ck, bk]). j=1 k=1 j∈Λk k=1 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 40 Proposition 3.4.13 If F : R → R is a non-decreasing and right continuous function, then % defined by (3.8) is a pre-measure on A. Proof We have %(φ) = 0, so it remains to prove σ-additivity. We start by showing that % is additive. Let A, B ∈ A be disjoint. Then there exists Ej ∈ E such that n m [ [ A = Ej,B = Ej. j=1 j=n+1 Then m n m X X X %(A ∪ B) = %(Ej) = %(Ej) + %(Ej) = %(A) + %(B). j=1 j=1 j=n+1 Hence % is indeed additive. By Lemma 3.4.8, we in particular deduce that % is monotone and sub-additive. Step 2. Let now (Aj)j≥1 be a sequence of disjoint elements of A such that [ A := Aj j≥1 is in A. Using monotonicity followed by finite additivity, n n [ X %(A) ≥ %( Aj) = %(Aj). j=1 j=1 Since this holds for every n, ∞ X %(A) ≥ %(Aj). j=1 Step 3. We now prove the reverse inequality. Since A ∈ A, n [ A = (ak, bk]. (3.9) k=1 P We show that %(A) ≤ j=1 %(Aj). As argued before, in Remark 3.4.12, by additivity of % we reduce this to the case A is one single half open interval. Step 4. We first assume A = (c, d] is bounded. We may also assumme that c < d, otherwise %((c, d]) = %(φ) = 0 and the requested inequality is trivially satisfied. Since, for all j ≥ 1, Aj is a finite union of bounded disjoint half open intervals, the whole collection of these intervals is countable and we write it {(ak, bk], k ≥ 1}. In particular, we have ∞ ∞ ∞ ∞ [ [ X X Aj = (ak, bk], and %(Aj) = %((ak, bk]). (3.10) j=1 k=1 j=1 k=1 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 41 The idea is to approximate (c, d] by a compact interval [c + δ, d] and (ak, bk] by a larger open interval (ak, bk + δk), we can then choose a finite cover and use finite additivity. Let > 0. By right-continuity of F , we may choose δ > 0 such that δ < d − c and F (c + δ) − F (c) < . Similarly, for all k ≥ 1, we choose a δk so that −k F (bk + δk) − F (bk) ≤ 2 . The compact set [c+δ, d] is covered by the sets (ak, bk] and therefore by the larger open sets (ak, bk +δk). We can choose a finite covering, so that for some N, N [c + δ, d] ⊂ ∪k=1(ak, bk + δk). In particular, we have the inclusion N (c + δ, d] ⊂ ∪k=1(ak, bk + δk], between subsets that are in A. By monotonicity and sub-additivity of %, %((c, d]) = F (d) − F (c) = F (d) − F (c + δ) + F (c + δ) − F (c) ≤ %((c + δ, d]) + N ≤ % ∪k=1(ak, bk + δk] + N N X X ≤ %((a , b ]) + + k k 2k k=1 k=1 ∞ X ≤ %(Aj) + 2, j=1 where the last inequality follows from (3.10). Taking → 0, we conclude that X %(A) = %((c, d]) ≤ %(Aj) j as requested. Together with step 2, we have proved X %(A) = %(Aj) j for A a bounded half open interval. Step 5. Now assume A = (−∞, b]. Then, for every n with −n < b, %(A) = F (b) − F (−∞) = %((−n, b]) + F (−n) − F (−∞). 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 42 Note that ∞ ∞ [ [ (−n, b] = (−n, b] ∩ Aj = (−n, b] ∩ Aj. j=1 j=1 Since (−n, b] ∩ Aj ∈ A for all j ≥ 1, we use step 4 to conclude that ∞ X %((−n, b]) = %((−n, b] ∩ Aj). j=1 Hence, invoking monotonicity of %, we obtain ∞ ∞ X X %(A) = %((−n, b] ∩ Aj) + F (−n) − F (−∞) ≤ %(Aj) + F (−n) − F (−∞). j=1 j=1 Taking n → ∞, since F (−n) → F (−∞) we have proved the requested inequality. The same can be shown for A = (a, ∞), a ∈ R, or for A = R = (−∞, +∞]. We have completed the proof. As in (3.1), we define ∞ ∞ ∗ X [ µ (E) = inf %(Aj): E ⊂ Aj,Aj ∈ A . j=1 j=1 Since each A ∈ A is a finite disjoint union of members in E, by re-arranging we have ∞ ∞ ∗ X [ µ (E) = inf %(Ej): E ⊂ Ej,Ej ∈ E j=1 j=1 ∞ ∞ X [ = inf %((aj, bj]) : E ⊂ (aj, bj] . j=1 j=1 Theorem 3.4.14 There exists a measure µF on the Borel σ-algebra B(R) such that µF = % on A. It is the unique measure on B(R) that agrees with % on A. Proof Existence of µF follows from Caratheodory’s Theorem. The uniqueness statement follows from Theorem 3.3.1. Proposition 3.4.15 µF extends to a measure (still denoted by µF ) on the completion B¯(R) of B(R). Proof Let G be the collection of µ∗ measurable subsets of R, and N the collection of µF -negligible ∗ ∗ subsets. Since G is complete w.r.t. µ , and since µ coincides with µF on B(R), we deduce that N ⊂ G. But we also have B(R) ⊂ G, and hence B¯(R) ⊂ G. So the restriction of µ∗ to B¯(R) provides the requested extension. 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 43 Remark 3.4.16 * Note that, by Proposition 3.4.2, such an extension is unique. ∞ P∞ Remark 3.4.17 If A = ∪j=1(aj, bj] is a disjoint union, then µF (A) = j=1 %((aj, bj]). This follows from that the union is a Borel set, and σ-additive property fo the measure. The subscript F will be omitted if there is no risk of confusion. Definition 3.4.9 • The measure µF constructed in the theorem above is called Lebesgue-Stieltjes measure associated to F . • If F is the identity map, this is called the Lebesgue measure on R and will be denoted by λ. • The completion B¯(R) of the Borel σ-algebra with respect to λ is called the Lebesgue σ-algebra, its elements are called Lebesgue-measurable sets. Theorem 3.4.18 If B is a µ∗-measurable set, then µF (B) = inf{µF (O): B ⊂ O, O is open} = sup{µF (D): D ⊂ B, D is closed}. This holds in particular if B is a Borel measurable set. Proof This is left as exercise. Recall that B∆A = (B \ A) S(A \ B). Remark 3.4.19 If B is µ∗-measurable set with finite measure, then for every > 0 there exists a set A which is a finite union of open intervals, such that µF (B∆A) < . 3.4.7 Example of a non-Lebesgue measurable set* Proposition 3.4.20 B¯(R) 6= 2R, i.e. there exists a subset of R that is not Lebesgue measurable. Proof Let us define an equivalent relation: x ∼ y if and only if x − y ∈ Q. Using Axiom of choice we can choose the representative of the equivalent relations to be in (0, 1]. This quotient set V = R/ ∼ as a subset of (0, 1] is called the Vitali set. If q1 6= q2 are two rational numbers, the translates V + q1 and V + q2 are disjoint. Let A = Q ∩ (−1, 1], then (0, 1] ⊂ ∪q∈A(V + q) ⊂ [−1, 2]. (3.11) The second inequality follows from the inclusions V ⊂ (0, 1] and A ⊂ (−1, 1]. To prove the first one, take y ∈ (0, 1], then there exists q ∈ (−1, 1] ∩ Q = A such that y = q + [y], where [y] is the representative of y in (0, 1]. Thus y ∈ ∪q∈A(V + q) and the left hand side inclusion is proved. Let us assume by contradiction that V is Lebesgue measurable. What can its Lebesgue measure be? By 3.4. CONSTRUCTION OF MEASURES (LECTURES 4,5,6) 44 translation invariance, λ(V + q) = λ(V ) for all q ∈ A. Moreover, the sets V + q, q ∈ A, are pairwise disjoint. Therefore X λ(∪q∈A(V + Q)) = λ(V ), q∈A which equals either 0 (if λ(V ) = 0) or +∞ (if λ(V ) > 0). But in view of (3.11), we also have 1 ≤ λ(∪q∈A(V + Q)) ≤ 3, which is a contradiction. Therefore V is not Lebesgue measurable. Exercise 3.4.3 Show that there does not exist a translation invariant countably additive measure on (R, 2R) which assigns an interval its length. 3.4.8 Examples Example 3.6 Let a, b ∈ R with a < b. then, since µF ({b}) ≤ µF ((b − 1, b]) = F (b) − F (b − 1) < ∞, µF ((a, b)) = µF ((a, b]) − µF ({b}). Observe that µF ({b}) 6= 0 precisely when b is a discontinuity of F . Indeed, since µF ((b − 1, b]) < ∞, by continuity of µF from above, ∞ 1 1 1 µF ({b}) = µF (∩ {(b − , b]}) = lim µF ((b − , b]) = F (b) − lim F (b − ) = F (b) − F (b−). n=1 n n→∞ n n→∞ n Example 3.7 Since F (x) = x is continuous, any countable subset of R has Lebesgue measure zero. Example 3.8 Let α ∈ R be fixed, and let ( α, x < 0, F (x) = α + 3, x ≥ 0. Then µF = 3δ0. Indeed, for any interval (a, b], ( 3, if 0 ∈ (a, b], µ ((a, b]) = F 0, if 0 6∈ (a, b], so that µF and 3δ0 coincide on E. To see that they are equal, note that E is a π-system generating B(R) and that µF (R) = 3 = 3δ0(R) < ∞. Example 3.9 If Y is a random variable on a probability space (Ω, F, P), set F (x) = P(Y ≤ x). Then F is right continuous and increasing, it determines a Lebesgue-Stieltjes measure µF . Since F (−∞) = 0, Also, µF ((a, b]) = F (b) − F (a) = P(a < Y ≤ b). 3.5. LEBESGUE MEASURE ON RN (6TH LECTURE) 45 3.5 Lebesgue Measure on Rn By a similar construction, we obtain a Lebesgue measure on Rn. We do not get into details. n Definition 3.5.1 There is a unique measure λ on B(R ) such that, for all a1, . . . , an, b1, . . . , bn ∈ R with ai ≤ bi, n λ((a1, b1] × · · · × (an, bn]) = Πi=1(bi − ai). This is called the Lebesgue measure on Rn, it extends to the completion of B(Rn). Remark 3.5.1 The Lebesgue measure is invariant under translations and rotations. For A ⊂ R and t ∈ R, let us denote by t + A the shifted set t + A = {t + a, a ∈ A}. Theorem 3.5.2 Any shift of a Borel set is a Borel set, i.e. if A ∈ B(R), then for all t ∈ R, t + A ∈ B(R). Moreover, the Lebesgue measure is translation invariant on the Borel σ-algebra, i.e. ∀A ∈ B(R), λ(A) = λ(t + A). Proof Let C = {A ∈ B(R): ∀t ∈ R, t + A ∈ B(R)}. C is a σ-algebra. Moreover, if U is an open set, then for all t ∈ R, t + U is an open set as well, hence C contains all open sets. Therefore C contains the σ-algebra genereated by open sets, i.e. C = B(R). Now we show the second point. Let A = {A ∈ B(R): ∀t ∈ R, λ(t + A) = λ(A)}. Then A contains the π-system of finite unions of disjoint half intervals. Now Ω ∈ A. If A ⊂ B with A, B ∈ A, t + B \ A = (t + B) \ (t + A), t + A ⊂ t + B, hence µ(t + B \ A) = µ(t + B) − µ(t + A) = µ(B) − µ(A) = µ(B \ A). Thus B \ A ∈ A. S S If An ∈ A is an increasing sequence, so is t + An, and n(t + An) = t + n An, ∞ [ [ µ(t + An) = lim µ(t + An) = lim µ(An) = µ( An). n→∞ n→∞ n=1 n Thus A is a λ system, concluding the proof. 3.5. LEBESGUE MEASURE ON RN (6TH LECTURE) 46 Proposition 3.5.3 Two non-trivial translation-invariant Borel measures on R which assign a finite mea- sure to bounded intervals are constant multiples of each other. Proof Let µ and ν be translation invariant measures. Then, for all integers p, q ≥ 1, 1 1 µ((0, 1]) = qµ((0, ]), µ([0, p/q]) = pµ((0, ]) = (p/q)µ((0, 1]). q q p Similarly, ν([0, q )) = (p/q) ν((0, 1]). The quanitities ν((0, 1]) and ν((0, 1]) are non-zero, for other- wise the measures would be trivial. Set c = µ([0, 1)]/ν([0, 1). Then µ([0, p/q]) = (p/q)µ([0, 1)] = cν([0, p/q)). Hence, µ(I) = cν(I) for all I ∈ C, where C denotes the collection of all half-open intervals with finite rational length. Now, C is a π-system and σ(C) = B(R). Moreover, we have ∪k≥1(−k, k] = R, and µ and cν assign the same finite mass to (−k, k] ∈ C for all k ≥ 1. By Theorem 3.3.1, we deduce that µ = cν. Chapter 4 Measurable maps 4.1 Definition Let (Ω, F) and (X , G) be two measurable spaces. Definition 4.1.1 A map f :Ω → X is said to be measurable from (Ω, F) to (X , G) if, for all A ∈ G, f −1(A) ∈ F. When it is clear which σ-algebras are being considered, in the situation above, we will often just say that the map f is measurable. Definition 4.1.2 Assume Ω and X are topological spaces endowed with their respective Borel σ-algebras B(Ω) and B(C). We call a function f :Ω → X (Borel) measurable if it is measurable from (Ω, B(Ω)) to (X , B(X )). Definition 4.1.3 Let (Ω, F, P) be a probability space and (X , G) be a measurable space. If X is a measurable map from (Ω, F) to (X , G), we call X a random variable. Exercise 4.1.1 Let (X , B) be a measurable space and Ω a set. A function f :Ω → X is always measurable when Ω is endowed with the pre-image σ-algebra σ(f). Moreover, σ(f) is the smallest σ-algebra on Ω such that f :Ω → X is measurable. 4.2 Examples Example 4.1 If X is a discrete space, let the power set be its σ-algebra. Let Y be another space with a σ-algebra. Then any function f : X → Y is measurable. 47 4.3. PROPERTIES OF MEASURABLE FUNCTIONS 48 If f is identically 1 on a set A and zero everywhere else, it is called the indicator function of A. We denote it by 1A: ( 1, if ω ∈ A 1 (ω) = A 0, if ω 6∈ A. The indicator function 1A of a set is a measurable function if and only if A is a measurable set. Example 4.2 If f : E → R takes only a finite number of values {a1, . . . , ak}. Then, f is measurable if and only if, for all i, {f = ai} is measurable. Also, f can be written in the following form: f = Pk i=1 ai1Aj . (Just take Aj = {x : f(x) = aj}). 4.3 Properties of measurable functions Proposition 4.3.1 If f :(X1, F1) → (X2, F2) and g :(X2, F2) → (X3, F3) are measurable functions then so f ◦ g : X1 → X3 is measurable. −1 Proof Let A ∈ F3 then f (A) ∈ F2. Furthermore, −1 −1 (f ◦ g) (A) = {x ∈ X1 : f ◦ g ∈ A} = {x ∈ X1 : g ∈ f (A)} ∈ F1 using g is measurable. Proposition 4.3.2 Let (Ω, F) and (X , B) be measurable spaces. Suppose that B is generated by C and f :Ω → X . If f −1(A) ∈ F whenever A ∈ C, then f is measurable. Proof B0 = {A ∈ B : f −1(A) ∈ F} ⊃ C is a σ-algebra, check with Lemma 2.1.1. It must therefore agree with B = σ(C). Here are some important consequences of the above proposition. Proposition 4.3.3 If X,Y are metric spaces and f : X → Y is a continuous map, then f is Borel measurable. Proof Continuity means that the pre-image of an open set is an open set. Hence, f −1(U) ∈ B(X) for all open subset U of Y . Since B(Y ) is generated by open sets, we deduce that f is Borel measurable. Proposition 4.3.4 Letf1 :(X , F) → (X1, F1) and f2 :(X , F) → (X2, F2) be measurable functions. Then ψ = (f1, f2): X → X1 × X2 is measurable, when the target space is endowed by the product σ-algebra F1 ⊗ F2. 4.4. MEASURABILITY WITH RESPECT TO THE σ-ALGEBRA GENERATED BY A MAP 49 −1 Proof Apply Proposition 4.3.2, it is sufficient to know that h (A × B), where A ∈ F1 and B ∈ F2, is measurable. But, −1 −1 −1 ψ (A × B) = f1 (A) ∩ f2 (B) ∈ F, completing the proof. 4.4 Measurability with respect to the σ-algebra generated by a map An enhanced version of the composition rule for f ◦ g can be proved in the case where A2, the σ-algebra on the second set X2, corresponds to the σ-algebra σ(f) generated by f. Let (X1, A1) and (X3, A3) be measurable spaces, and X2 be a set. Let f : X2 → X3 be a map, and let σ(f) be the σ-algebra on X2 generated by f. Proposition 4.4.1 A map g : X1 → X2 is measurable from (X1, A1) to (X2, σ(f)) if and only if f ◦ g is measurable from (X1, A1) to (X3, A3). Proof Since f is measurable from (X2, σ(f)) to (X , A3), by the composition rule, if g is measurable from (X1, A1) to (X2, σ(f)), then f ◦ g is measurable from (X1, A1) to (X3, A3). Conversely, assume that f ◦ g is measurable from (X1, A1) to (X3, A3), then for any C ∈ A3, we have −1 −1 −1 g (f (C)) = (f ◦ g) (C) ∈ A1. −1 Since σ(f) consists of all subsets of the form f (C), for C ∈ X3, we deduce that g is measurable from (X1, A1) to (X2, σ(f)). 4.5 Measurable maps with values in a subset Let (Ω, F) and (X , G) be measurable spaces, and let Y ⊂ X . Let us denote by i : Y → X the inclusion map. Recall that Y can be endowed with the trace H of the σ-algebra G, see Example2.4 above. Let f :Ω → Y be a map, one can then also consider i ◦ f, which is a map from Ω to X , therefore there are two possible definitions for the measurability of f. The following Proposition states that these coincide. Proposition 4.5.1 Let f :Ω → Y. Then f is measurable from (Ω, F) to (Y, H) if and only if i ◦ f is measurable from (Ω, F) to (X , G). Proof Note that σ(i) = {i−1(A),A ∈ G} = {A ∩ Y,A ∈ G} = H. Hence the result follows from Proposition 4.4.1 above. 4.6. REAL-VALUED MEASURABLE FUNCTIONS 50 In particular, one obtains the following consequence for the case when X is a metric space. Corollary 4.5.2 Let X be a metric space and let Y be a subset of X . Then f :Ω → Y is measurable from (Ω, F) to (Y, B(Y)) if and only if it is measurable from (Ω, F) to (X , B(X )). Proof This follows from the above Proposition, noting that B(Y) is the trace of B(X ) on Y, see Propo- sition 2.3.2. 4.6 Real-valued measurable functions The Borel σ-algebra of R is generated by the set of open intervals, equally they are generated by the set of all closed intervals. We state some of these below, they follow straightforwardly from Proposition 4.3.2. Proposition 4.6.1 Let (X, B) be a measurable space. A function f : X → R is Borel measurable if one of the following conditions hold: 1. for each real number a, {x : f(x) > a} is a measurable set; 2. for each real number a, {x : f(x) < a} is a measurable set; 3. for each real number a, {x : f(x) ≥ a} is a measurable set; 4. for each real number a, {x : f(x) ≤ a} is a measurable set; For the first just note that sets of the form (a, ∞) generates the Borel σ-algebra, apply Proposition 4.3.2 to. conclude. Similarly the other statements chan be shown. Proposition 4.6.2 Let (X, F) be a measurable space. If f, g : X → R are measurable functions, then so are the following functions. 1. f + g 2. fg, 3. max(f, g) 4. f + = max(f, 0), f − = max(−f, 0) 5. |f| = f + + f −. 4.6. REAL-VALUED MEASURABLE FUNCTIONS 51 Proof Firstly (f, g): X → R2 is Borel measurable. Set ψ : X → R2 by ψ(x) = (f(x), g(x)) and define ϕ : R2 → R by ϕ(x, y) = x + y, a continuous function. Then f + g = ϕ ◦ ψ is a composition of measurable function, so measurable. Choosing ϕ(x, y) = xy, for the analogous proof for fg. Now for any a > 0, {max(f, g) < a} = {f < a} ∩ {g < a} ∈ F, so max(f, g) is measurable. Part 4 follows from part 3, and 5 follows part 1 and part 4. (f+g)+|f−g| Remark 4.6.3 Note the relation max(f, g) = 2 . Definition 4.6.1 Let R = R ∪ {−∞, +∞}. One can endow R with the distance d(x, y) = |F (x) − F (y)|, x, y ∈ R, with F (x) = arctan(x), x ∈ R, and F (±∞) = limu→±∞ arctan(u) = ±π/2. We then denote by B(R) the associated Borel σ-algebra. Exercise 4.6.1 1. Show that open subsets of R are of the form i) U, where U is an open subset of R, ii) [−∞, a) where a ∈ R, iii) (b, +∞], where b ∈ R, or any union of i), ii) and iii). 2. Using Remark 2.3.3, deduce that B(R) ⊂ B(R) and that B([0, +∞]) ⊂ B(R). Show also that B(R) is generated by the collection of intervals (t, +∞] ⊂ R, for t ∈ R. 3. Show that f is measurable from (X , F) to (R, B(R)) iff {f = ω} ∈ F for ω ∈ {−∞, +∞} and f −1(A) ∈ F for any A ∈ B(R). Remark 4.6.4 By Proposition 4.5.1, if (X , A) is a measurable space, then a function f : X → R is measurable from (X , A) to (R, B(R)) if and only if i ◦ f is measurable from (X , A) to (R, B(R), where i : R → R is the inclusion map. Proposition 4.6.5 Let (X, B) be a measurable space. If fn : X → R are measurable functions, so are the following: 1. sup{n≥1} fn, 2. inf{n≥1} fn; 3. lim sup fn, 4. lim inf fn. 4.6. REAL-VALUED MEASURABLE FUNCTIONS 52 Proof Firstly {sup fn > a} = ∪n{fn ≥ a}, n belongs to F, as each {fn ≥ a} does (the above is {x : infn≥1 fn(x) < a} = ∪n{x : fn(x) < a}), so sup{n≥1} fn is measurable. Also, { inf fn < a} = ∩n{fn < a} ∈ F, n≥1 showing that inf{n≥1} fn is measurable. Since lim sup fn = inf sup fk, lim inf fn. = sup sup fk, n→∞ n k≥n n k≥n they are both measurable. Proposition 4.6.6 If fn is a sequence of Borel measurable functions with f = limn→∞ fn(x) exists at every x, then f is a Borel measurable function. Example 4.3 If (X , F) and (Y, G) are measurable spaces. Suppose that µ is a measure. Let f : X → Y and g : X → Y. Let A = {f 6= g}. 1. Suppose both f and g are measurable, then f − g is a measurable function, consequently, A = {f − g 6= 0} is a measurable set. 2. Suppose that F is complete with respect to µ. Suppose A is contained in a null measurable set. Since F is complete, then A is measurable and µ(A) = 0. Then f is measurable implies that g is measurable. Indeed, for any B ∈ G, [ {g ∈ B} = ({f ∈ B} ∩ Ac) ({g ∈ B} ∩ A) ∈ F. We use that ({g ∈ B} ∩ A) is contained in a null set and therefore measurable, Ac and {f ∈ B} are measurable. Definition 4.6.2 Let (X , F, µ) be a measure space and (Y, G) be a measurable space. Let f, g two measurable maps from (X , F) to (Y, G). If f = g on a set of full measure, i.e. µ({f 6= g}) = 0, we say that f = g almost-everywhere. This is abbreviated by f = g a.e. Let (Ω, F, P) be a probability space, (Y, G) a measurable space, and X,Y two random variables from (Ω, F) to (Y, G). We say that X = Y almost-surely if X = Y on an event of full probaility, i.e. P({X 6= Y }) = 0. This is abbreviated by X = Y a.s. Example 4.4 (This is not intended to be delivered in class) If fn is a sequence of Borel measurable functions. Set lim fn(x), if the limit exists at x, f(x) = n→∞ 0, otherwise Then f is measurable. 4.7. SIMPLE FUNCTIONS 53 Proof Set U := {x : lim fn(x) exists} = {x : lim sup fn(x) = lim inf fn(x)} n→∞ n→∞ n→∞ Since lim supn→∞ fn(x) and lim infn→∞ fn(x) are both measurable, U is a measurable set. Define Z := (lim inf fn, lim sup fn). ( Let ∆ = {(b, b): b ∈ R} denote the the diagonal subset of R2, then U = Z−1(∆).) Note that f(x) = 0 on U c. Then for any a < 0, {x : lim fn(x) < a} = {x : lim sup fn(x) < a}, n→∞ n→∞ which is measurable since lim supn→∞ fn is measurable. Let a ≥ 0, then {x : lim fn(x) < a} = {x : lim sup fn(x) < a} ∪ U, n→∞ n→∞ which is also a measurable set. In analysis we frequently encounter Lebesgue measurable sets and functions, this refers to the σ- algebra 4.7 Simple functions A simple functions is a real-valued measurable function taking on a finite number of distinct values. It has representation as linear combinations of indicator functions of measurable sets. Representations are not necessarily unique. For example take the domain space to be R, let A1 = [1, 2], A2 = (2, 6], and A3 = (−∞, 1) ∪ (4, ∞). Then, f = 21A1 + 31A2 + 01A3 is a simple function. We often do not include the last term with zero value, but sometimes it is included for the convenience that A1 ∪ A2 ∪ A3 = R. The representation is not unique, c c f = 21A1∩Q + 31A2∩Q + 21A1∩Q + 31A2∩Q , by we also have, f = 21([1,3] + 1(2,3] + 31(3,6]. Let us now give one popular definition for simple functions. PN Definition 4.7.1 Let (X , F) be a measurable space. A function of the form f(x) = i=1 ai1Ai (x) where ai ∈ R and Ai ∈ Fi is said to be a simple function. P∞ Proposition 4.7.1 If Ai are measurable sets, then f = i=1 ai1Ai is measurable. 4.7. SIMPLE FUNCTIONS 54 Proof If B is a Borel set, −1 f (A) = ∪{i:ai∈B}Ai. This is a countable union and belongs to F if every Ai ∈ F. P∞ Remark 4.7.2 Suppose that f = i=1 ai1Ai is measurable. If the numbers ai are distinct, and {Ai} −1 are disjoint sets, then Ai is necessarily measurable. Indeed, f ({ai}) = Ai. So for f to be measurable, Ai must belong to F. −1 If a1 = a2, we can conclude f ({a1}) = A1 ∪ A2 is measurable, but we may not have any further information to conclude the measurability of the sets A1 and A2. Definition 4.7.2 Suppose that a simple function f takes the following values {ai, i = 1,...,N}. Then n X −1 f(x) = ai1f (ai) i=1 −1 is said to be the canonical representation for f. Set Ai = f (ai), then Ai are pairwise disjoint mea- SN surable sets and i=1 Ai = X . P∞ Exercise 4.7.1 Write the following function in the form i=1 ai1Ai where {Ai} are pairwise disjoint. 2, x ∈ A = (−2π, −π] 1 3, x ∈ A = [1, 4) ∪ (7, 8) 2 1 f(x) = , x ∈ A = (4, 6] 2 3 5, x ∈ A4 = (9, ∞) ∩ Q 0, otherwise Solution: The canonical representation is 1 f(x) = 21 + 31 + 1 + 51 . A1 A2 2 A3 A4 4 c We see that {A1,A2,A3,A4, (∪i=1Ai) } is a partition of X . Example 4.5 A step function f : R → R is a piecewise constant function. Steps functions with a finite number of steps are simple functions. Theorem 4.7.3 Let (X , G) be a measurable space. Then, • Every measurable function f : X → R is the pointwise limit of simple functions: f(x) = lim fn(x), n→∞ where fn ∈ E(G). Furthermore we can arrange so that |fn| ≤ |f|. ( If f is bounded, the convergence is uniform.) 4.7. SIMPLE FUNCTIONS 55 • If f ≥ 0 we may choose fn to be non-negative and the sequence (fn)n≥1 non- decreasing so f = sup{n≥1} fn. Proof Step 1. We prove the second claim first. Let us consider dyadic partitions of the interval [0, n]. Let us cut each length 1 interval into 2n pieces of equal length (this is called a dyadic partition). The dyadic partitions have an advantage: any partition points from current level contains all partition points from the previous levels. Let us define the measurable sets: j j + 1 An = x : < f(x) ≤ , 0 ≤ j ≤ n(2n − 1). j 2n 2n j n We approximate f with the value 2n on Aj and define n(2n−1) X j gn = 1An . 2n j j=0 1 Then on {x : 0 ≤ f(x) < n}, |fn(x) − f(x)| ≤ 2n , so lim fn(x) = f(x). n→∞ Also fn increases with n and fn(x) ≤ f(x) for every x. Step 2. For the first claim, we write f = f + − f − where f + = max(f, 0) f − = max(−f, 0). + + − Let (f )n be an non-decreasing sequence of simple functions with limit f . Similarly, Let (f )n be an non-decreasing sequence of simple functions with limit f −. Set + − gn = (f )n − (f )n. Then + − + − |gn(x)| = (f )n(x) + (f )n(x) ≤ f (x) + f (x) = |f(x)|. For every x ∈ f −1([−n, n]), + − |f(x) − gn(x)| = |(f(x) − (f )n + (f )n)(x)| + + − − ≤ |f (x) − (f )n(x)| + |f (x) − (f )n(x)| 2 ≤ . 2n This means that for every x, limn→∞ gn(x) → f(x). If |f| ≤ M ≤ n0 is bounded, then for n ≥ n0, 2 |f(x) − g (x)| ≤ n 2n for all x ∈ X , showing the uniform convergence. 4.8. FACTORISATION LEMMA 56 4.8 Factorisation Lemma Lemma 4.8.1 (The factorisation Lemma) Let (X , G) be a measurable space and Ω a set. Let Y :Ω → X be a measurable function and consider the measurable space (Ω, σ(Y )). Then X :Ω → R is σ(Y )- measurable if and only if there exists a measurable function f : X → R such that X = f ◦ Y . Moreover, if X is non-negative, f may be chosen to be non-negative. Proof Recall that σ(Y ) = {Y −1(B): B ∈ G). Suppose that f : X → R is measurable, since Y is measurable with respect to σ(Y )/G, then the composition of the measurable functions f ◦ Y is measurable. For the converse, suppose that X is σ(Y )-measurable. We first assume X = 1A where A ∈ σ(Y ). Then A = {Y −1(B)} for some B ∈ G and 1B ◦ Y = 1{ω:Y (ω)∈B} = X. So X is the composition of the measurable function 1B with Y . −1 Suppose that X takes only a countable number of values (an) and write An = X ({an}). Since X −1 is σ(Y )-measurable, there exist sets Bn ∈ G such that Y (Bn) = An. Define now the sets [ Cn = Bn \ Bp. p −1 These sets are disjoint and one has again Y (Cn) = An. Setting f(x) = an for x ∈ Cn and f(x) = 0 S for x ∈ X \ n Cn, we see that f has the required property. Suppose that X ≥ 0, we may approximate it with an increasing sequence of simple functions Xn. By the paragraph above, there exists gn : X → R+ simple functions such that Xn = gn(Y ). In fact, gn is an increasing sequence. Let g = supn gn ≥ 0. Since g is a pointwise limit of measurable functions, g is also measurable. We note g(Y (ω)) = supn gn(Y (ω)) = supn Xn(ω)) = X(ω). Finally, in the general case where X takes values in R we complete the proof by exploiting the decomposition X = X+ − X−. 4.9 Pushed-forward measure Definition 4.9.1 Let (X1, B1) and (X2, B2) be measurable spaces, and µ1 a measure on (X1, B1). Let f : X1 → X2 be a measurable function. We define a measure µ2 on B2 by setting −1 µ2(A) = µ1(f (A)) = µ1({x : f(x) ∈ A}). −1 This is called the pushed-forward measure and denoted by f∗(µ1) or sometimes as µ◦ . 4.10. APPENDIX* 57 Exercise 4.9.1 Show that f∗(µ1) is a measure. Definition 4.9.2 If (Ω, F, P) is a probability space and X :Ω → R is a random variable, X∗P is called the probability distribution of the random variable. Example 4.6 If X :Ω → {0, 1}, then X∗P({0}) = P(X = 0) and X∗P = P(X = 0)δ0 + P(X = 1)δ1. 4.10 Appendix* 4.10.1 Littlewood’s three principles Every Borel measurable set on R is nearly a finite union of intervals. Every (measurable) function is nearly continuous, every convergent sequence of measurable functions is nearly uniform convergent. 4.10.2 Product σ-algebras Let I be an arbitrary index and let (Eα, Fα), α ∈ I be a family of measurable spaces. The tensor σ-algebra, also called the product σ-algebra, on E = Πα∈I Eα is defined to be −1 ⊗α∈I Fα = σ{πα (Aα): Aα ∈ Fα, α ∈ I}. Here πα : E = Πα∈I Eα → Eα is the projection (also called the coordinate map). If x = (xα, α ∈ I) is an element of E, πα(x) = xα. The tensor σ-algebra is the smallest one such that for all α ∈ I, the mapping πα :(E, ⊗α∈I Fα) → (Eα, Fα) is measurable. n Let πi : R → R be the projection to the ith component. The tensor σ-algebra ⊗nB(R) is −1 ⊗nB(R) = σ{πi (A): A ∈ B(R)}. −1 For example π1 (A) = A × R × · · · × R. 4.10.3 The Borel σ-algebra on the Wiener Space The set of all maps from [0, 1] to Rd can be denoted as the product space (Rd)[0,1]. The tensor σ-algebra d ⊗[0,1]B(R ) is the smallest space such that the projections πt where πt(σ) = σ(t) is measurable. The product topology is the smallest topology such that the projections are continuous. Hence the Borel σ-algebra of the product topology is larger than the tensor σ-algebra. 4.10. APPENDIX* 58 Let W d = C([0,T ], Rd) be the space of continuous functions from [0, 1] to Rn. It is a Banach space with the supremum norm: kgkW := sup kg(t)k. t∈[0,1] Any measurable set in the tensor σ-algebra is determined by the a countable number of projections. The action to determine whether a function is continuous or not cannot be determined by a countable number of operations. Hence W d is not a measurable set in the tensor σ-algebra. (once we know a function is continuous, the function can be determined by its values on a countable dense set.) Example 4.7 A sample continuous real valued stochastic process (Xt, 0 ≤ t ≤ 1) on a probability space (Ω, F,P ) can be considered as a function, denoted by X, from (Ω, F) to W : X(ω)(t) = Xt(ω). n For each time t, and a ∈ R , {ω : |Xt(ω) − a| < } belongs to F as Xt is measurable. We show that the function X :Ω → W is Borel measurable. Take f ∈ W and the open ball in the Banach space W centred at f with radius ε > 0. Its pre-image by X is: {ω : sup |Xt(ω) − fti | < } = {ω : sup |Xti (ω) − fti | < } 0≤t≤1 0≤ti≤1,ti∈Q = ∩0≤ti≤1,ti∈Q{ω : |Xti (ω) − fti | < }. Since for each i, {ω : |Xti (ω) − fti | < } ∈ F, the intersection of the countable number of such sets belongs to F. Chapter 5 Integration 5.1 Integration Throughout this chapter (X , F, µ) is a measure space with a σ-finite measure µ. (Previously I assume the measure is complete, this assumption is not needed for defining integration.) By the integral of a Borel measurable function f : X → R, we mean a value assigned to this function. Let us call this value I(f). So I : H → R is a function on a set of functions H. For I to qualify to be an integral it should have some properties which we describe below. Property 1 (IP) Let H be a class of Borel measurable functions f : X → R. Suppose that to every f ∈ H, we assign a number or ∞ which we denote by I(f). We say I satisfies the IP if the following holds: 1. (Linearity) If f, g ∈ H and a, b ∈ R, we have I(af + bg) = aI(f) + bI(g). 2. (Monotonicity) If f ≤ g then I(f) ≤ I(g) Notation. We use R f for the integral of f. To emphasize the measure used we also use R fdµ for R the same integral. Sometime we emphasize the domain of integration as follows X fdµ. Sometimes we explicitly express the argument of integration by a dummy variable: R f(x)µ(dx), this is also written as R R R f(x)dµ(x). Also if A is a measurable set, we set A f = f1A etc. 59 5.2. OUTLINE OF THE CONSTRUCTION 60 5.2 Outline of the construction The procedure for defining the integral of a function f : X → R w.r.t. a measure µ is as follows. PN 1. If f = i=1 ai1Ai is a simple function (the assumption that Ai are measurable is included in the definition), we define N Z X f dµ = ai µ(Ai). X i=1 2. If f is a positive function we define Z Z f dµ = sup g dµ X g where the supremum is taken over simple functions g with 0 ≤ g ≤ f. We say f is integrable if R f dµ < ∞. 3. Let f : X → R be a measurable function. Let f + = max(f(x), 0) and f −(x) = max(−f(x), 0) be the positive and negative parts of f, then f = f + − f −. If both f + and f − have finite integrals we say f is integrable and define Z Z Z f dµ = f + dµ − f − dµ. X X X The set of integrable functions is denoted by L1. Observe that in this case Z Z Z |f|dµ = f +dµ + f −dµ. 5.3 Integral of simple functions Let us denote by S the set of measurable functions defined as follows: ( n ) X S = ci1Ai : Ai ∈ F, ci ∈ R, n = 1, 2,... . i=1 Elements of S are called simple functions. We also denote by S+ the set of positive simple functions: ( n ) + X S = ci1Ai : Ai ∈ F, ci > 0, n = 1, 2,... . i=1 A simple function can have more than one representations. Take for example ( 1, x ∈ [0, 1] f(x) = 1, x ∈ (2, 3] 5.3. INTEGRAL OF SIMPLE FUNCTIONS 61 Then f(x) = 1[0,1](x) + 1[2,3](x) = 1[0,1)(x) + 1{1}(x) + 1[2,3](x). There exists a unique canonical representation, up to re-arranging the orders of the sets. By the canonical form we mean n X −1 f = ai1{f (ai)}, i=1 where ai are the set of non-zero values of f (they are of course distinct numbers), this is unique up to re-ordering. Exercise 5.3.1 A simple functions is a measurable function taking only a finite number of values, a measurable function taking only a finite number of values is a simple function. Proof If f is a measurable function taking only a finite number of non-zero values {c1, . . . , cn}, then it n −1 P −1 can be expressed in the form of i=1 ci1f (ci) and f (ci) is measurable. Pn We show the converse: f = i=1 ai1Ai only takes a finite number of values. Remark 5.3.1 An easier way to show that f takes only finitely many values is as below: f is a linear combination of n indicator functions. Since an indicator function takes only value 0 or 1, such a function thus can take at most 2n values. We next show this in at hard way, the proof itself has some value. First suppose that f = a11A1 + a21A2 . Set B1 = A1 \ A2,B2 = A2 \ A1,B3 = A1 ∩ A2. Then f = a11B1 + a21B2 + (a1 + a2)1B3 , so f takes at most the following values: 0, a1, a2, a1 + a2. Pn We now prove by induction to show that every f of the form i=1 ai1Ai can be written in the form m X f = bj1Bj , i=j where Bj are disjoint. We have shown this holds for n ≤ 2. Suppose this holds for k ≤ n. Then, n+1 m X X f = ai1Ai = bj1Bj + an+11An+1 , i=1 i=j 5.3. INTEGRAL OF SIMPLE FUNCTIONS 62 where Bi are disjoint. Set Cj = Bj \ An+1, j = 1, . . . , m, n [ Cm+j = Bj ∩ An+1,C2m+1 = An+1 \ ( Bm). j=1 Then Cj are disjojnt sets and m m X X f = bj1Cj + an+11C2m+1 + (bk + an+1)1Bk∩An+1 . j=1 k=1 Thus f can be written as a finite linear combination of indicator functions of disjoint sets, among other things this also shows that f takes a finite number of values. Exercise 5.3.2 1. If f, g are simple functions, so are f + g and fg. In particular S is a vector space. 2. If f is a simple function, f +, f − are both simple functions. Proof Let us denote by ai the distinct values of f and bj the distinct values for g. We write n m X X −1 −1 f = ai1{f (ai)}, g = bj1{f (bj )}, i=1 j=1 Set −1 −1 Ei = f (ai),Fj = g (bj). Then (Ei ∩ Fj) ∩ (Ek ∩ Fl) = φ if (i, j) 6= (k, l). It is clear that X f + g = (ai + bj)1{Ei∩Fj }. 1≤i≤n,1≤j≤m showing f + g is a simple function. The proof for fg is similar. It is clear that f + = max(f, 0), − f = min(f, 0) are also simple functions. They are obtained by flipping ai to max(ai, 0) or max(−ai, 0). Pn Definition 5.3.1 Let f = i=1 ai1Ai , in the canonical representation. Suppose that µ(Ai) < ∞ for every i, we then set n X I(f) = aiµ(Ai). i=1 + If in addition f ∈ S , then ai > 0, we may remove the assumption µ(Ai) = ∞, I(f) is still defined non-ambiguously. Of course I(f) takes the value ∞ if and only if one of the Ai’s has infinite measure. Observe that if f = 0 almost-everywhere then I(f) = 0. 5.3. INTEGRAL OF SIMPLE FUNCTIONS 63 Lemma 5.3.2 If f ∈ S is represented as M X f = dj1Bj j=1 where Bi are pairwise disjoint, then M X I(f) = diµ(Bi). i=1 Proof Observe that [ Bj = Ai. {j:dj =ai} Also µ(A ) = P µ(B ) by additive property of measures. Hence i {j:dj =ai} j n n X X X I(f) = aiµ(Ai) = ai µ(Bj) i=1 i=1 {j:dj =ai} M X = djµ(Bj), j=1 completing the proof. (It is clear if µ(Ai) = ∞ then one of the µ(Bj) = ∞.) We show next that I has the IP. Proposition 5.3.3 Let f, g ∈ S, vanishing outside of a set of finite measure. Then 1. for any a, b ∈ R, I(af + bg) = aI(f) + bI(g). 2. If f ≤ g a.e., then I(f) ≤ I(g). Proof Let {Ai} ( respectively {Bi}) be the sets in the canonical representation for f (respectively for −1 −1 g). Let A0 = f (0) and B0 = f (0). Then, Ai ∩ Bj forms a finite collection of disjoint measurable sets, denote this collection by {Ei : i ≤ N}. Then N N X X f = ai1Ei , g = bi1Ei i=1 i=1 and N X af + bg = (a ai + b bi)1Ei . i=1 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 64 Use Lemma 5.3.2, we see that N X I(af + bg) = (a ai + b bi)µ(Ei) = aI(f) + bI(g). i=1 Next if f ≤ g a.e. then {f > g} is a measurable set (by comparing their values on Ei as above), and I(f) − I(g) = I(f − g) ≥ 0. This proved the monotonicity. Example 5.1 The Dirichlet function f can be written as f = 1Qc . Its integral w.r.t. the Lebesgue measure on [0, 1] is λ(Qc ∩ [0, 1]) = 1. Example 5.2 Let a, b be real numbers. Consider the measure space ([a, b], B([a, b]). Let µ be the the n Lebesgue measure on [a, b]. It is determined by its value on sets of the form A = ∪i=1(ci, di) (union of disjoint intervals), n X µ(A) = (di − ci). i Consider a = t0 < t1 < ··· < tn = b. Let n−1 X f(x) = f01{0}(x) + aj1(tj ,tj+1](x). j=0 Since µ({0}) = 0, n−1 Z b X f(x)dx = aj(tj+1 − tj). a j=0 5.4 Integration of non-negative functions We will allow a non-negative function to take values in the extended real line R. In the sequel, a non- negative (potentially infinite) function f : X → [0, +∞] will be said to be measurable if it is measurable from (X , F) to ([0, ∞], B([0, ∞])). Note that, by Proposition 4.5.1 above, since B([0, ∞]) ⊂ B(R) (see Exercise 4.6.1) this is equivalent to saying that f is measurable from (X , F) to (R, B(R)). Recall that f is measurable from (X , F) to ([0, ∞], B([0, ∞])) iff {f < ∞} is measurable and f −1(A) ∈ F for any A ∈ B(R). The principle for a measurable function f : X → [0, ∞] is as follows: if f = ∞ on a set of non-zero measure, we should assign ∞ to its integral; if f < ∞ almost everywhere, we ignore those values that are ∞. This fact is encapsulated in the following, comprehensive, definition. 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 65 Definition 5.4.1 If f : X → [0, ∞] is a measurable function, we define the integral of f to be Z fdµ = sup{I(g) : 0 ≤ g ≤ f, g ∈ S}. We say f is integrable if R fdµ < ∞. Remark 5.4.1 If µ is a finite measure, then a bounded measurable function is integrable. A special case is when µ is the Lebesgue measure restricted to any bounded interval. Definition 5.4.2 If A is a measurable set and f1A is integrable, we define Z Z f dµ = f 1A dµ. A Proposition 5.4.2 1. If f ∈ S+ then I(f) = R fdµ. 2. the integral satisfies monotonicity. Proof Firstly, let f ∈ S+. We may take g = f ∈ S+, so sup{I(g): g ≤ f, g ∈ S+} ≥ I(f). Next, by monotonicity for I, if g ∈ S and g ≤ f, then I(g) ≤ I(f). Hence sup{I(g): g ≤ f, g ∈ S+} ≤ I(f). R + This proves that I(f) = f dµ for f ∈ S . Now if f1 ≤ f2 are non-negative and measurable, then Z Z + + f1 dµ = sup{I(g): g ≤ f1, g ∈ S } ≤ sup{I(g): g ≤ f2, g ∈ S } = f2 dµ proving the monotonicity. We cannot directly prove the linearity and try to obtain this from that of simple functions by taking limits. Definition 5.4.3 We say fn → f a.e. if there exists a null set A such that fn → f everywhere outside of the null set A. Theorem 5.4.3 (Fatou’s lemma) Let fn be a sequence of non-negative measurable functions that con- verges almost-everywhere to a function f, then Z Z fdµ ≤ lim inf fndµ. n→∞ 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 66 Proof Let E = {x : fn(x) −→ f(x)}. At the expanse of replacing fn by 1E fn and f by 1E f, we n→∞ may assume without loss of generality that fn → f pointwise. Making this assumption it is sufficient to show that for any g ∈ S+ with g ≤ f, Z Z g dµ ≤ lim inf fn dµ. n→∞ Step 1. Suppose that R g < ∞, set X0 = {x : g(x) > 0}. Since g takes only a finite number of values µ(X0) < ∞ and M = max g < ∞ ( We only need to restrict to X as R g = R g.) 0 X0 X Fix > 0. Since g ≤ f = limn→∞ fn, for any x there exists N(x, ) s.t. for n > N(x, ), fn(x) ≥ (1 − )g(x). Let An = {x ∈ X0 : fk(x) > (1 − )g(x), ∀k ≥ n}. Then An is an increasing sequence and ∪nAn = X0. By the continuity property of measures, lim µ(X0 \ An) = 0. n→∞ Then for k ≥ n, Z Z Z (1 − ) g ≤ fk ≤ fk. An An Also, Z Z Z (1 − ) g = (1 − ) g − (1 − ) g. c An An Hence Z Z Z (1 − ) g − (1 − ) g ≤ fk. c An An Since g ≤ M, we may take to 0, Z Z Z g − g ≤ lim inf fk, c k→∞ An R R for every n. Take n → ∞ to conclude g ≤ lim infn→∞ fn. Step 2. Suppose that R g = ∞, since g take only a finite number of values, and g ≥ 0, it is necessary that there exists a set of infinite measure on which g takes a positive value a > 0. Denote this set by A. Set 1 A = {x : f (x) ≥ a, ∀n ≥ k}. k n 2 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 67 Then An is increasing. Since limn→∞ fn ≥ g, we have ∪nAn = A and limn→∞ µ(An) = µ(A) = ∞. For all n ≥ k, Z Z 1 1 f ≥ a1 = aµ(A ). n 2 Ak 2 k Thus, Z 1 lim inf fn ≥ aµ(Ak). n→∞ 2 Take k → ∞ lim infn→∞ fn ≥ ∞, this completes the proof. Theorem 5.4.4 (Monotone convergence theorem) If fn is a sequence of non-negative measurable func- tions converging almost everywhere to f. Suppose that fn ≤ f for all n. Then Z Z lim fn = lim fn. n n R R Proof If 0 ≤ g ≤ fn then 0 ≤ g ≤ f, hence for every n, fn ≤ f. Z Z lim sup fn ≤ f. n By Fatou’s lemma, Z Z Z lim inf fn ≥ lim fn = f, n completing the proof. The definition of integrals for non-negative functions using the value sup{I(g) : 0 ≤ g ≤ f, g ∈ S} is not easy to use. The monotone convergence theorem allow us to use a sequence. Choose fn ∈ S with fn ↑ f (this exists by Theorem 4.7.3), then Z Z fdµ = lim fn. n→∞ The name ‘Monotone convergence theorem’ comes from the following Corollary. Corollary 5.4.5 If 0 ≤ f1 ≤ f2 ≤ ... is a sequence of increasing real valued measurable functions, R and let f = supn fn. Then f is integrable if and only if supn fn < ∞, and then Z Z sup fn = sup fn. n n This means in particular that Corollary 5.4.6 If fn is a sequence of increasing simple functions with limit f, then Z Z f = lim fn. n→∞ 5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 68 Proposition 5.4.7 If f, g are non-negative and measurable, then the following holds: (1)[Linearity] for any a, b ≥ 0, Z Z Z (af + bg) = a f + b g. (2) [Monotonicity] If f ≥ g then R f ≥ R g. (3) R f = 0 if and only if f = 0 a.e. Proof (1) Note that linearity holds on S. Now we approximate f, g with positive simple increasing + + sequence of functions. Take fn ∈ S , gn ∈ S such that fn ↑ f, gn ↑ g. with fn ≤ f and gn ≤ g. Then Z Z Z (afn + bgn)dµ = fndµ + bgndµ, We apply the monotone convergence theorem to take the limit inside the integrals, proving the linearity. (2) It is sufficient to show f ≥ 0 a.e. then R f ≥ 0. This is true as the simple functions approximating f is non-negative. (3) Let f ≥ 0 a.e. If f = 0 a.e. it is clear that R fdµ = 0 (e.g. use monotonicity.) R 1 ∞ Suppose that f = 0. Let An = {f ≥ n }. Then {f > 0} = ∪n=1An. Z Z Z µ(An) = 1An dµ ≤ 1An nfdµ ≤ n fdµ = 0. Thus ∞ X µ({f > 0} ≤ µ(An) = 0, n=1 concluding the proof. These shows that we can exchange summation (over a finite number of functions) with integration. For a sum of non-negative functions, this holds also with infinitely many summands, as to be explained below. Corollary 5.4.8 (Exchange of order of summation and integration) Suppose that fn is a sequence of non-negative measurable functions, then ∞ ∞ Z X X Z fn = fn. n=1 n=1 5.5. INTEGRABLE FUNCTIONS 69 Proof By Proposition 5.4.7, finite sum can be taken out of the integral. Fatou’s lemma allow us to take the limit. Z ∞ Z N Z N X X Monotone X fn = lim fn = lim fn N→∞ N→∞ n=1 n=1 n=1 N ∞ linearity X Z X Z = lim fn = fn, N→∞ n=1 n=1 concluding the proof. 5.5 Integrable functions Definition 5.5.1 Let f : X → R be measurable. If both f + and f − are integrable we say that f is integrable, in which case we define its integral to be: Z Z Z f = f + − f −. Note that f is integrable if and only if |f| is integrable. A bounded measurable function is integrable with respect to a finite measure. Definition 5.5.2 We denote by the class of integrable functions by L1. The notation L1(X ) is used to emphasize the space. Also L1(µ), L1(X ; µ) are used to emphasize the measure. Proposition 5.5.1 If f, g are integrable functions,t then (1) (linearity) For any a, b ∈ R, Z Z Z (af + bg) = a f + b g. (2) (Monotonicity) If g ≤ f a.e., then R g ≤ R f. (3) R |f| = 0 if and only if f = 0 almost-everywhere. (4) | R f dµ| ≤ R |f| dµ. Proof Part one follows from splitting f = f + − f − and g = g+ − g− and the same property for non-negative functions. If g ≤ f then g − f ≥ 0, so R (g − f) ≥ 0, monotonicity now follows from the linearity. The third property follows from Proposition 5.4.7 and the property |f| ≥ 0. Finally, for (4), since −|f| ≤ f ≤ |f|, by monotonicity we deduce that Z Z Z − |f| dµ ≤ f dµ ≤ |f| dµ, thus yielding the claim. 5.5. INTEGRABLE FUNCTIONS 70 Exercise 5.5.1 Assume that f, g are non-negative measurable functions and h is an integrable function on X sastisfying that f ≥ g + h µ-a.e. Show that Z Z Z f dµ ≥ g dµ + h dµ. Exercise 5.5.2 If f, g are real valued integrable functions, show the following statements hold: R 1. If µ(A) = 0 then A f dµ = 0. R 2. If A f = 0 for every measurable set A then f = 0 µ almost-everywhere. Hint: For part (2), use Proposition 5.4.7 and choose a suitable A. Exercise 5.5.3 Let f, g : X → [0, ∞] be two non-negative measurable functions. We assume that R R A f dµ = A g dµ for every measurable set A. 1. Show that if µ is also assumed to be finite, then necessarily f = g µ-a.e. 1 2. Show that if µ is σ-finite, then the conclusion f = g µ-a.e. remains true. Example 5.3 Recall that for x0 ∈ X , the Dirac measure at x0 is defined by: ( 0, if x0 ∈ A δx0 (A) = = 1A(x0). 1, if x0 6∈ A R Let f : X → R be measurable. Then f is integrable and X f dδx0 = f(x0). Example 5.4 Let p : X → [0, ∞) be integrable. Set Z µp(A) := p(x)µ(dx). A Then µp defines a measure. R Proof Firstly µp(φ) = φ p µ = 0. Next if A ∩ B = φ, 1A∩B = 1A + 1B, So Z µp(A ∪ B) = (1A + 1B)pdµ = µp(A) + µp(B). Finally, if An is an increasing sequence of measurable sets with union A, lim 1A = 1∪∞ A = 1A, n→∞ n n=1 n 1Hint: Beware in this case f and g may not be integrable, so we may not use the result from the previous exercise. One may rather consider, for all a, b ∈ Q such that 0 ≤ a < b, the measurable subset A = {f ≤ a < b ≤ g}, and show that µ(A) = 0. 5.6. FURTHER LIMIT THEOREMS 71 showing that Z Z Z µp(A) = 1Ap dµ = lim 1A pdµ = lim 1A p dµ = lim µp(An). n→∞ n n→∞ n n→∞ Hence the σ-additive property holds. Questions. R Let x0 ∈ R. Can one find a function p : R → R such that, for all A ∈ B(R), δx0 (A) = A p dλ? R Can one find a function p : R → R such that, for all A ∈ B(R), λ(A) = A p dδx0 ? Exercise 5.5.4 The answer are no and no. Prove it. Two measures µ and ν on Ω are mutually singular if there are disjoint sets A1 and A2 such that S Ω = A1 A2 with µ(A1) = 0 and ν(A2) = 0. This is denoted by µ ⊥ ν. The Lebesgue measure and any Dirac measures δx are singular. Definition 5.5.3 We say that µ is absolutely continuous with respect to ν if whenever ν(A) = 0, we have µ(A) = 0. We write µ ν. Later we will see that if µ and ν are two finite measures such that µ ν then we can found a a density function p such that µ is given by integration of p with respect to ν. This is called the Radon-Nikodym Theorem. 5.6 Further limit theorems Theorem 5.6.1 (Fatou’s Lemma) Suppose that fn is a sequence of non-negative measurable functions, then Z Z lim inf fn dµ ≤ lim inf fn dµ. n→∞ n→∞ Proof Firstly, lim inf fn = lim inf fm. n→∞ k→∞ m≥k Observe that infm≥k fm is an increasing sequence and infm≥k fm ≤ fk. So Z Z lim inf fn = lim inf fm n→∞ k→∞ m≥k Z Z = lim inf inf fm ≤ lim inf fk. k→∞ m≥k k→∞ This completes the proof. 5.6. FURTHER LIMIT THEOREMS 72 Theorem 5.6.2 (Dominated convergence theorem) Suppose that fn is a sequence of measurable func- tions converging to a function f almost everywhere. Suppose that there exists an integrable function g such that |fn| ≤ |g|, then fn and f are integrable w.r.t. µ, and (1) Z Z lim fn dµ = lim fn dµ. n→∞ n→∞ (2) Z lim |fn − f| dµ = 0. n→∞ Proof Since g is integrable and |fn| ≤ |g| µ a.e., the fn are integrable as well. Moreover as fn → f a.e., we deduce that |f| ≤ g a.e., so f is also µ-integrable. Then statement (1) follows from statement (2), as Z Z Z fn dµ − f dµ ≤ |fn − f| dµ. Let us prove (2). By the triangle inequality, |fn − f| ≤ |fn| + |f| ≤ 2|g| a.e., hence the measurable function 2|g| − |fn − f| is non-negative and converges to 2|g| a.e. Therefore, by Fatou’s lemma Z Z Z Z 2 g dµ ≤ lim inf (2|g| − |fn − f|) dµ = 2 g dµ − lim sup |fn − f| dµ. n→∞ n→∞ R R Since g dµ < ∞, we deduce that lim sup |fn − f| dµ ≤ 0, and the claim follows. n→∞ Exercise 5.6.1 Prove part (2) of the dominated convergence theorem. Definition 5.6.1 A sequence of measurable functions fn is said to converge to f in L1 if Z lim |fn − f| dµ = 0. n→∞ Exercise 5.6.2 Show that if fn → f in L1 then, Z Z lim fn dµ = f dµ. n→∞ R p Definition 5.6.2 1. A measurable functions with the property |f| < ∞ are said to be in Lp where 1 ≤ p < ∞. R p 2. If there exists a measurable function f such that |fn − f| dµ → 0, then we say fn converges to f is Lp, in which case f is also in Lp. 5.7. INTEGRALS DEPENDING ON A PARAMETER 73 5.7 Integrals depending on a parameter As before, let (X , F, µ) be a measure space, and let E be a metric space. In many cases we are integrating a function f : E × X → R 3 (t, x) → f(t, x) with respect to the variable x, and are interested in the R behaviour of X f(t, x) dµ(x) when the value of t changes. Proposition 5.7.1 Let f : E × X → R be a function, and let t0 ∈ E. We assume that: 1. For all t ∈ E, the function x 7→ f(t, x) is measurable from (X , F) to (R, B(R)). 2. µ(dx)-a.e., the map t 7→ f(t, x) is continuous at the point t0. 3. there exists g : X 7→ [0, ∞] integrable such that, for all t ∈ E, |f(t, x)| ≤ g(x) µ(dx)-a.e. R Let F : E 7→ R given, for t ∈ E, by F (t) = X f(t, x) µ(dx). Then the function F is well-defined and continuous at t0. Proof Since for all t ∈ E, |f(t, x)| ≤ g(x) µ-dx a.e. where g is integrable, we deduce that x 7→ f(t, x) is integrable, hence F (t) is well-defined. Moreover, if (tn)n≥1 is a sequence in E converging to t0, then, in view of asssumption 2., f(tn, x) → f(t0, x) µ(dx)-a.e. Moreover, by Assumption 3, for all n ≥ 1, R |f(tn, x)| ≤ g(x). Since g is integrable, by the dominated convergence theorem, f(tn, x)µ(dx) −→ n→∞ R f(t0, x)µ(dx), i.e. F (tn) −→ F (t0). The continuity property for F follows. n→∞ Assume now that E = I is an open interval R. The next proposition provides sufficient conditions R for the function t 7→ X f(t, x)dµ(x) to be differentiable at a point t0 ∈ I. Proposition 5.7.2 Let f : I × X → R be a function, and let t0 ∈ I. We assume that: 1. For all t ∈ E, the function x 7→ f(t, x) is integrable with respect to µ. ∂f 2. µ(dx)-a.e., the map t 7→ f(t, x) admits a derivative at the point t0, which we denote by ∂t (t0, x). 3. there exists an integrable function g : X 7→ [0, ∞] such that, for all t ∈ E, we have |f(t, x) − f(t0, x)| ≤ g(x) |t − t0|, µ(dx) − a.e. (5.1) R Then the function F : I → R given, for t ∈ I, by F (t) = X f(t, x) µ(dx), is differentiable at t0, and Z 0 ∂f F (t0) = (t0, x) dµ(x). X ∂t 5.7. INTEGRALS DEPENDING ON A PARAMETER 74 Remark 5.7.3 Under the assumptions of the above Proposition, we thus have Z Z ∂ ∂f f(t, x) dµ(x) = (t0, x) dµ(x), ∂t ∂t X t=t0 X i.e., we may interchange the derivation and the integration. Proof By Assumption 1., F (t) is well-defined for all t ∈ I. Moreover, by Assumption 2., for all sequence of points tn ∈ I converging to t0, we have f(tn, x) − f(t0, x) ∂f −→ (t0, x), µ(dx) − a.e. t − t0 n→∞ ∂t Moreover, Assumption 3. yields the domination property f(tn, x) − f(t0, x) ≤ g(x), µ(dx) − a.e. t − t0 holding for all n ≥ 1. Since g is integrable, by the Dominated Convergence Theorem Z Z F (tn) − F (t0) f(tn, x) − f(t0, x) ∂f = dµ(x) −→ (t0, x) dµ(x), tn − t0 X t − t0 n→∞ X ∂t which yields the claim. Often the following, weaker version of the previous theorem proves convenient. It is less general, but its assumptions are quicker to check and are very often satisfied in most applications. Proposition 5.7.4 Let f : I × X → R be a function. We assume that: 1. For all t ∈ E, the function x 7→ f(t, x) is integrable with respect to µ. ∂f 2’. µ(dx)-a.e., the map t 7→ f(t, x) is differentiable on I, with derivative at t denoted by ∂t (t, x). 3’. there exists an integrable function g : X 7→ [0, ∞] such that, µ(dx)-a.e, we have ∂f ∀t ∈ I, (t, x) ≤ g(x). ∂t R Then the function F : I → R given, for t ∈ I, by F (t) = X f(t, x) µ(dx), is differentiable on I, and Z ∂f F 0(t) = (t, x) dµ(x), t ∈ I. X ∂t Proof By Assumptions 2’. and 3’. and the mean value theorem, for any given t0 ∈ I, the inequality (5.1) is satisfied at t0. Hence the result follows by Proposition 5.7.2. . Here are two important examples of applications of the above three propositions 5.8. EXAMPLES AND EXERCISES 75 Example 5.5 (Convolution) Let ϕ : R → R be a Lebesgue-integrable function, and let % : R → R be a bounded continuous function. Then the convolution % ∗ ϕ : R → R , defined by Z % ∗ ϕ(t) = %(t − x) ϕ(x) λ(dx) is continuous on R. Moreover, if % is C1 and %0 is bounded, then % ∗ ϕ is C1 and (% ∗ ϕ)0 = %0 ∗ ϕ. Example 5.6 (Fourier transform) Let ϕ : R → R be a Lebesgue-integrable function. Its Fourier transform ϕˆ : R → C is defined by Z ϕˆ(ξ) = eiξxϕ(x) λ(dx), ξ ∈ R. R Then ϕˆ is continuous. Moreover, if Z |x| |ϕ(x)| λ(dx) < ∞ R 1 0 R iξx then ϕˆ is C , and for all ξ ∈ R, ϕˆ (ξ) = R ixe ϕ(x) λ(dx). Remark 5.7.5 The above three propositions give criteria for regularity (continuity and differentiability) R of the function F : t 7→ X f(t, x) µ(dx). One may also ask for criteria of integrability: this question will be tackled in Chapter 7 below with Fubini’s Theorems. 5.8 Examples and exercises Exercise 5.8.1 If a bounded function f :[a, b] → R is Riemann integrable, then it is Lebesgue measur- able and Lebesgue integrable. Furthermore, the integrals have the common value: Z Z b f(x)dµ = f(x)dx. [a,b] a Hint. Check that the integrals agree on functions of the form: n X αi1(ci,di) i=1 where (ci, di) are disjoint intervals. (b−a) Let ∆ be the dyadic partition of [a, b]: tj = 2n . Consider X fn(t) = sup{f(x): x ∈ [ti, ti+1]}1(ti,ti+1](t). j Then fn is measurable and converges to f a.e. 5.8. EXAMPLES AND EXERCISES 76 Exercise 5.8.2 1. Show that if for two measures µ and ν, R fdµ = R fdν for all f ∈ S, then R R L1(µ) = L1(ν), and fdµ = fdν for all integrable functions. 2. Let E be a measurable set and µE the restriction of µ to the trace σ-algebra FE = {E∩A : A ∈ F}. Show that Z Z fdµ = fdµE. E X Also f ∈ L1(µE) if and only if f1E ∈ L1(µ). (1) By the construction of integrals, R fdµ = R fdν for all non-negative functions. Thus R |f|dµ = R |f|dν, which are either both finite or both infinite. This implies that L1(µ) = L1(ν) and the integrals are the same for they are defined by the integrals of their positive and negative parts. R R (2) If g is elementary, so is g1E and g1Edµ = gdµE for all elementary g ∈ S. Let fn be an increasing sequence of simple functions converging to f, then fn1E ↑ f1E. Hence Z Z Z Z fdµ = lim fn1E dµ = lim fn dµE = fdµE. E n→∞ n→∞ Exercise 5.8.3 Suppose that p :[a, b] → R is a non-negative continuous function. Set Z x F (x) = F (a) + p(t)dt. a Let f :[a, b] → R be a continuous function. Describe the Lebesgue-Stilejes measure µF and the integral Z f dµF in terms of Riemann integral. Sketch. Let A ⊂ B([a, b]). Then, Z Z d Z 1A dµF = µF (A) = p(t)dt = 1A p(t) dt. c [a,b] By linearity for any simple function f, Z Z b fdµF = f(t)p(t)dt. a By Fatou’s lemma we can pass this to non-negative functions, and then by linearity to integrable func- tions: Z Z b fdµF = f(t)p(t)dt. [a,b] a Example 5.7 Let fn = n1 1 Then, fn → 0 everywhere except at 0. [0, n ] Z Z n fndλ = dx = 1. 0 Hence we cannot exchange the order of taking limit with taking integration in this case. 5.9. THE MONOTONE CLASS THEOREM FOR FUNCTIONS * 77 5.9 The Monotone Class Theorem for Functions * Proposition 5.9.1 (The Monotone Class Theorem for Functions) Let (Ω, F) be a measure space. Let C be a π system with σ(C) = F. Let H be a vector space of functions from Ω to R, with the following property: 1. 1 ∈ H, and 1A ∈ H for every A ∈ C. 2. (Monotone class property) If fn ∈ H is an increasing sequence of non-negative functions with f(x) = supn f(x) finite for every x (resp. with f = supn f bounded), then f ∈ H. Then H contains the set of all real valued (resp. all bounded ) F-measurable functions. Proof Let J = {B ∈ F : 1B ∈ H}. By assumption, 1 = 1Ω ∈ H, and J ⊃ C ∪ {Ω}. If A ⊂ B, A, B ∈ J then 1B\A = 1B − 1A ∈ H by the vector space property of H. Thus B \ A ∈ J. Let An ∈ J be an increasing sequence of sets then by condition 2, 1∪∞ A = lim 1A ∈ H. n=1 n n→∞ n ∞ Hence ∪n=1An ∈ J and J is a λ-system. It therefore contains F. This means all indictor functions are in H and simple functions are in J. Suppose H is closed under monotone limit of functions with bounded limit. If f is bounded positive and measurable there is an sequence of positive simple functions fn converging to f, thus f ∈ H. If f is not positive let f = f + − f − to conclude. Suppose H is closed under monotone limit of functions with finite limit. If f is positive and measur- able there is an sequence of positive simple functions fn converging to f, thus f ∈ H. Again, if f is not + − positive let f = f − f to conclude. Remark 5.9.2 From this we see a meta theorem (means the theorem is likely to hold, of course one needs to give a proof): If a property concerning integration of functions holds for all indicator functions, then it holds for all measurable functions (use Fatou’s lemma to obtain the monotone property) or it holds for all bounded measurable functions ( use dominated convergence theorem to obtain the monotone property). 5.9.1 Lebesgue Integration In this section let X = R, F the completion of the Borel σ algebra, i.e. consists of Lebesgue measurable sets, and µ the Lebesgue measure λ. The restriction of λ to any measurable subset is also denoted by the same letter. 5.10. THE PUSHED FORWARD MEASURE 78 A function f : R → R is said to be Lebesgue measurable if it is measurable with respect to F/B(R). Borel measurable functions are of course Lebesgue measurable. In this section by a measurable function we refer to a Lebesgue measurable function. Definition 5.9.1 Integrals with respect to a Lebesgue measure, are called Lebesgue integrals. Exercise 5.9.1 Let I be a Lebesgue measurable set. Suppose that f : I → R is bounded. 1. If f is Lebesgue measurable, show that Z Z inf h dλ, h ≥ f, h ∈ S. = sup g dλ : g ≤ f, g ∈ S (5.2) 2. Show that if (5.2) holds, then f is Lebesgue measurable. 5.10 The pushed forward measure Suppose we are given two measurable spaces (X , A) and (Y, B). Let µ be a measure on X and f : X → Y be a measurable function. Denote by f∗µ the pushed forward (induced) measure on (Y, B), i.e. f∗(µ)(B) = µ{x : f(x) ∈ B}. The right hand side µ(f −1(B)), so we given B the measure of its pre-image. Proposition 5.10.1 Let ϕ : Y → R a measurable function, we have Z Z ϕ ◦ f dµ = ϕ d(f∗µ). X Y This is in the sense that ϕ is integrable with respect to f∗µ if and only if ϕ ◦ f is integrable with respect to µ. Proof This holds for indicator functions of measurable sets by the definition of pushed forward measures. Apply monotone convergence theorem on both sides to see that those functions with the desired property is a monotone class. Apply the monotone class theorem to conclude. Let (Ω, F, P) be a probability measure. Let (S, G) be a measurable space. A measurable function from X :Ω → S is called a random variable, S is called its state space. If ϕ : S → R and X :Ω → S are measurable, then Z Z E[ϕ(X)] := ϕ ◦ X dP = ϕ dX∗(P). Ω S 0 R R 0 Exercise 5.10.1 If both Y , Y are real valued integrable functions on (Ω, F, P) with A Y = A Y for every measurable set A, then Y = Y 0 almost surely. 5.11. APPENDIX: RIEMANN INTEGRALS 79 Proof Let A = {Y > Y 0}. By symmetry it is sufficient to proves that P(A) = 0. Thus we only need R to prove the following statement: for any Y ≥ 0, A Y = 0 for any A ∈ F, then Y = 0 almost surely. For this just note that if {Y 6= 0} = {Y > 0} has positive measure then for some a > 0, {Y > a} has 1 positive measure (for otherwise P(Y > 0) = limn→∞ P(Y > n ) = 0), and then Z Y dP ≥ aP(Y > a) > 0, A this contradicts the assumption. Hence µ({Y 6= 0}) = 0 ,completting the proof. Example 5.8 Let f : X → Y. Given a measure on Y, can we use f and µ to define a measure on X in a meaningful way? The answer is no in general. There is no good sensible way for this construction, for otherwise you should be able to pull back a direct measure, what that would be if f : {1, 2} → R f(1) = 0 and f(2) = 0? If f : R → R is injective, one can define a measure from a measure on the target space, it would be the pushed forward measure for the inverse map. Definition 5.10.1 If f is a measurable map from X to X , we say µ is invariant by f if f∗(µ) = µ. n n n Example 5.9 Let δx be the Dirac measure on R . Then for any transformation T : R → R , T∗(δx) = δT x. The δ-measure is not invariant under rotations (unless x = 0), nor by translation. 5.11 Appendix: Riemann integrals If a function f :[a, b] → R is Riemann integrable, it is Lebesque integrable and the two integrals agree. In this and the next section we review Riemann integrals and Riemann-Stieltjes integrals. I do not intend to cover this in class. However you might find this helpful for comparing several notions of integration theory. We know how to compute the areas of a triangle and a polygon. We then define the integral underneath a continuous curve by rectangle approximation. If y = f(x), x ≥ 0 is the curve, the approximation leads R t R b to 0 f(s)ds, this is the Riemann integral a f(x)dx. (Riemann integrals are covered in M2PM1: Real Analysis) The Riemann integral of f is defined as follows. Take a partition ∆ : a = x0 < a1 < ··· < an = b. Let Mi and mi be respectively the supremum and infimum values of f on the interval [ai−1, ai]. Define n n X X U(∆, f) = Mi(ai − ai−1),L(∆, g) = mi(ai − ai−1). i=1 i=1 U(f) = inf U(∆, f),L(f) = sup U(∆, f). ∆ ∆ 5.12. APPENDIX: RIEMANN-STIELTJES INTEGRALS* 80 Definition 5.11.1 A bounded function f :[a, b] → R is Riemann integrable, if U(f) = L(f) and the common value is the Riemann integral of f on [a, b]. A continuous function is Riemann integrable. Theorem 5.11.1 A bounded function f is Riemann integrable if and only if for every > 0 there exists a partition such that U(f, ∆) − L(f, ∆) < . Observe that the oscillation of f on [ai−1, ai] is Osc(f) = Mi − mi, X U(f, ∆) − L(f, ∆) = Osc(f, [ai−1, ai])(ai − ai−1). Define the Riemann sum: n X ∗ R(f, ∆) = f(xi )(ai − ai−1). i=1 Theorem 5.11.2 Suppose a bounded function f :[a, b] → R is Riemann integrable. If ∆n is a sequence R b of partitions with mesh goes to zero, then their Riemann sum R(f, ∆n) converges to a f(x)dx. This can be proved using Theorem 5.11.1. Remark 5.11.3 If f : [0, ∞) → R is measurable and Riemann integrable on every interval [0, n] where n ∈ N, then f is Lebesgue integrable on [0, ∞) if and only if Z N lim |f(x)|dx < ∞. N→∞ 0 R ∞ R If so, 0 f(x)dx = [0,∞) fdλ. The concept ‘Lebesgue integrable’ does not allow cancellation, while improper Riemann integration sin(x) allows it. x is improper Riemann integrable, bot not Lebesgue integrable. 5.12 Appendix: Riemann-Stieltjes integrals* Riemann-Stieltjes integrals aree defined in parallel to Riemann integrals. Suppose : f[a, b] → R is a bounded function and g a monotone increasing function on [a, b], for example g(x) = x in which we case we are back to Riemann integrals. Write ∆gi = g(ai) − g(ai−1). Define n n X X U(∆, f, g) = Mi∆gi,L(∆, f, g) = mi∆gi. i=1 i=1 5.12. APPENDIX: RIEMANN-STIELTJES INTEGRALS* 81 U(f, g) = inf U(∆, f, g),L(f, g) = sup U(∆, f, g). ∆ ∆ Definition 5.12.1 If U(f, g) = L(f, g), we say f is Riemann-Stieltjes integrable (with respect to g), we R b denote f ∈ D(g). The common value is denoted by a fdg. Theorem 5.12.1 (Integrability Criterion) Suppose f is a bounded function and g a monotone increas- ing function on [a, b]. Then f is Riemann-Stieltjes integrable if and only if for any > 0, there exists a partition ∆ of [a, b] such that U(∆, f, g) − L(∆, f, g) < . Corollary 5.12.2 Suppose f is a bounded function with only a finite number of discontinuity points, and g a monotone increasing function on [a, b], continuous at every point where f is discontinuous. Then f is Riemann-Stieltjes integrable. Proposition 5.12.3 Suppose that f, f1, f2 :[a, b] → R are bounded, Riemann-Stieltjes integrable with respect to a monotone increasing function. g :[a, b] → R. Let c ∈ R. R b R b 1. If f ∈ D(g), then for any c ∈ R, cf ∈ D(g)and a (cf)dg = c a fdg. R b R b R b 2. f1 + f2 ∈ D(g), and a (f1 + f2)dg = a f1dg + a dg. R b R b 3. If f1 ≤ f2 then a f1dg ≤ a f2dg. R b R b 4. If f ∈ D(g) and c > 0 is a constant, then f ∈ D(cg) and a fd(cg) = c a fdg. 0 0 5. If f ∈ D(g) ∩ D(g ), then f ∈ D(g + g ), Z b Z b Z b fd(g + g0) = fdg + ddg0. a a a Definition 5.12.2 If g = g1 − g2, where monotone increasing functions we define Z Z Z fd(g1 − g2) = fdg1 − fdg2. Riemann-Stieljes integrals do not exists if the integrator and the integrand have the same point of R b R c R c discontinuity. That a fdF, b fdF exist in Riemann-Stieljes sense do not necessarily imply that a fdF is Riemann-Stieltjes integrable. Proposition 5.12.4 If g :[a, b] → R is continuous and F :[a, b] → R has bounded variation then the R b R b Riemann-Stieltjes integral a gdF and a F dg exist. If F is furthermore absolutely continuous and the integral equals to the corresponding Lebesgue integral: Z b Z b gdF = gF 0dx. a a 5.13. APPENDIX: FUNCTIONS WITH BOUNDED VARIATION* 82 5.13 Appendix: Functions with bounded variation* The most commonly used integrators are functions of bounded variations. Functions of bounded varia- tions are differences of increasing functions. Given an interval [a, b], define P ([a, b]) to be the set of all partitions of [a, b]: P ([a, b]) = {t = (t0, t1, . . . , tn): a = t0 < t1 < ··· < tn−1 < tn = b, n ∈ N}. Let ∆t = max0≤i≤n−1(ti+1 − ti) be the modulus of the partition. Write n X X |f(u) − f(v)| = |f(tj) − f(tj−1)|. [u,v]⊂∆ j=1 Definition 5.13.1 The total variation of a function f on [a, b] is defined by X |f|TV ([a, b]) = sup |f(u) − f(v)|. ∆∈P ([a,b]) [u,v]⊂∆ If this number is finite we say that f has bounded total variation over [a, b]. 1 The function x sin( x ) does not have bounded variation over any interval containing 0. Note that f could have bounded total variation over [a, b] without having a bounded variation over R (e.f. f(x) = sin x). Theorem 5.13.1 A. function is of bounded variation on [a, b] if and only if f is the difference of two monotone real-valued functions on [a, b]. Proof Let f = f1 − f2, where f, g are increasing functions, then X |f|TV ([a, b]) = sup |f1(u) − f1(v) − (f2(u) − f2(v))|. ∆∈P ([a,b]) [u,v]⊂∆ For f :[a, b] → R, define a function: |f|TV (x) = |f|TV ([a, x]). Example 5.10 1. If f : R → R is an increasing function, |f|TV (x) = f(x) − f(−∞). 2. If f is locally Lipschitz continuous, then f is of bounded variation on any finite time interval. 3. The set of bounded variation function form a vector space. 5.13. APPENDIX: FUNCTIONS WITH BOUNDED VARIATION* 83 If f : R → R is an increasing then it has left and right derivatives at every point and there are only a countable number of points of discontinuity. Furthermore f is differentiable almost everywhere. Theorem 5.13.2 Let F : R → R+ be a function of bounded variation, right continuous with F (−∞) = 0. Then 1. F determines a Borel measure µ on R. 2. The Borel measure µ, with distribution function F , is absolutely continuous with respect to the Lebesgue measure dx if and only if Z x F (x) = F 0(t)dt. 0 3. It is singular with respect to dx if and only if F 0 = 0. Chapter 6 Lp spaces Lebesgue spaces, also known as Lp spaces, are a family of Banach spaces that play a fundamental role in functional analysis. Their construction, which relies heavily on Lebesgue’s theory of integration, is outlined here. Throughout this chapter, we consider a fixed measure space (X , B, µ). 6.1 Holder’s inequality Let f : X → R measurable. For all p ∈ [1, ∞), we set 1 Z p p kfkp := |f| dµ ∈ [0, ∞]. Moreover, for p = ∞, we set kfk∞ := inf{M ∈ [0, ∞]: |f| ≤ M µ − a.e.} ∈ [0, ∞]. Remark 6.1.1 Note that |f| ≤ kfk∞ µ-a.e. Indeed, by definition of kfk∞, for all n ≥ 1 1 µ |f| ≥ kfk + = 0 ∞ n so that [ 1 µ({|f| > M}) = µ |f| ≥ kfk + = 0. ∞ n n≥1 So indeed |f| ≤ kfk∞ µ-a.e, and kfk∞ is thus the smallest constant that bounds |f| µ-a.e. We also call it the essential supremum of f. Exercise 6.1.1 Show that, if µ is a probability measure, then as p ↑ ∞, kfkp → kfk∞. 85 6.2. DEFINITION AND BASIC PROPERTIES OF THE LP SPACES 86 Remark 6.1.2 For all p ∈ [1, ∞], k · kp is homogeneous, i.e. for all f : X → R measurable and all a ∈ R, kafkp = |a|kfkp. Theorem 6.1.3 (Holder’s¨ inequality (Cauchy-Schwarz if p = q = 2)) Let 1 ≤ p, q ≤ ∞ such that 1 1 1 d p + q = 1 (with the convention ∞ = 0). Then, for all f, g : X → R measurable, Z |f · g|dµ ≤ kfkp kgkq. Proof If p = 1 and q = ∞, then |fg| ≤ |f| kgk∞ µ-a.e., hence Z Z |fg|dµ ≤ |f| kgk∞dµ = kfk1kgk∞, so the claim follows. The case where p = ∞ and q = 1 is symmetric, so we now focus on the case p, q ∈ (1, ∞). R p If kfkp = 0, then |f| dµ = 0, whence f = 0 µ -a.e.. Hence we also have fg = 0 µ-a.e., so R |fg| dµ = 0. Similarly, the inequality holds if kgkq = 0, hence we may assume kfkp, kgkq > 0. By homogeneity, at the expanse of considering f g f˜ = , g˜ = kfkp kgkp we may even assume kfkp = kgkq = 1. We now claim the following statement, known as Young’s inequality: xp yq ∀x, y ≥ 0, xy ≤ + . p q To prove that claim, note that the case where x or y vanishes is straightforward, so we may assume x, y > 0. Then, setting u = xp and v = yq, by convexity of the exponential, we have u v u1/p v1/q ≤ + , p q which yields the claim. Now, by Young’s inequality, we have Z Z |f|p |g|q kfkp kgkq 1 1 |f| |g| dµ ≤ + dµ = p + q = + = 1 = kfk = kgk , p q p q p q p q which yields the requested inequality. 6.2 Definition and basic properties of the Lp spaces For p ∈ [1, ∞], we set d Lp = {f : X → R : measurable, kfkp < ∞}. 6.2. DEFINITION AND BASIC PROPERTIES OF THE LP SPACES 87 p p p That Lp is a vector space is clear from the elementary inequality |a + b| ≤ Cp|a| + Cp|b| , for all p ∈ [1, ∞). Question: Does kfkp define a norm on Lp? As we saw it is homogeneous, and it also satisfies the triangle inequality which follows from the following: Theorem 6.2.1 (Minkowski’ inequality) Let f, g ∈ Lp where 1 ≤ p ≤ ∞. Then, f + g ∈ Lp and kf + gkp ≤ kfkp + kgkp. Proof If p = 1 or p = ∞, the inequality follows at once from the triangle inequality, so we restrict our attention to the case p ∈ (1, ∞). We may also assume WLOG that kf + gkp > 0. Let q ∈ (1, ∞) such 1 1 p that p + q = 1, then p = 1 + q . Hence Z Z p p p 1+ q kf + gkp = |f + g| dµ = |f + g| dµ Z p = |f + g| |f + g| q dµ Z p Z p ≤ |f| |f + g| q dµ + |g| |f + g| q dµ where we used the triangle inequality in the last line. Using Holder’s¨ inequality, noting that 1 1 Z p q Z q p ·q p q |f + g| q dµ = |f + g| dµ = kf + gkp , we therefore obtain p p p q q kf + gkp ≤ kfkpkf + gkp + kgkpkf + gkp , p(1− 1 ) q 1 1 whence kf + gkp ≤ kfkp + kgkp. Since p(1 − q ) = p p = 1, the claim follows. Unfortunately, k · kp falls short of defining a norm on Lp because of the following observation: a function f ∈ Lp satisfies kfkp = 0 iff f = 0, µ a.e. In general, a vast collection of non-zero functions may vanish µ a.e. Example 6.1 If (X , F) = (R, B(R)) and µ = δ0, then f = 0 µ-a.e. iff f(0) = 0. To fix this degeneracy, we introduce an equivalence relation on functions as follows: Definition 6.2.1 Two functions f, g : X → R are considered to be in the same equivalent class, and we write f ∼ g, if f = g µ-a.e. Note that this equivalence relation is compatible with the normed vector space structure of Lp. More precisely, if f ∼ f 0 and g ∼ g0, then for all a, b ∈ R, af + bg ∼ af 0 + bg0. Moreover, if f ∼ f 0, then 0 p kfkp = kf kp. This allows us to introduce a normed vector space L as follows: 6.2. DEFINITION AND BASIC PROPERTIES OF THE LP SPACES 88 Definition 6.2.2 For 1 ≤ p ≤ ∞, the space Lp(X , F, µ) (often written Lp(µ), or just Lp) is the space of equivalence classes of measurable functions f such that f ∈ Lp. Loosely speaking, p L = {f : f : X → R measurable, kfkp < ∞}, but keeping in mind that f = g in Lp as soon as f = g µ-a.e. p Proposition 6.2.2 The L spaces are vector spaces, they are Banach spaces under the norm k · kLp . p That L is a Banach space means that it is complete for the norm k · kLp , i.e. for every Cauchy p p sequence fn ∈ L , there exists f ∈ L such that lim kfn − fkp = 0. n→∞ Proof We have already seen that k·kp is homogeneous and verifies the triangle inequality (by Minkowski’sinequality). p Moreover, note that kfkp = 0 if and only if |f| = 0 µ a.e., i.e. if and only if f = 0 in L . Hence k · kp indeed defines a norm on Lp. We recall a useful fact: a normed vector space is a Banach space if and only if any absolutely conver- p P∞ gent sequence is convergent. Let fn ∈ L with n=1 kfnkp = M < ∞. Our aim is to show that there p Pn p exists f ∈ L such that fk −→ f in L . We first assume that p < ∞. We define k=1 n→∞ n X gn(x) = |fk(x)|. k=1 Pn Then, by Minkowski’s inequality, for all n ≥ 1, kgnkp ≤ k=1 kfkkp ≤ M. Hence, invoking the monotone convergence theorem, we obtain Z ∞ !p Z Z X p p p |fk(x)| dµ = lim |gn| dµ = lim |gn| dµ ≤ M < ∞ n→∞ n→∞ k=1 P∞ P∞ Hence the first integral above is finite, so k=1 |fk(x)| < ∞ µ-a.e., i.e. the series k=1 |fk(x)| con- P∞ verges absolutely in R µ(dx)-a.e Then we may define f(x) = k=1 fk(x) whenever the series con- Pn verges absolutely, and f(x) = 0 otherwise. Then f : X → R is measurable. Set hn(x) = k=1 fk(x). P∞ p p Then |hn(x)| ≤ k=1 |fk(x)| ∈ L . By the dominated convergence theorem (since |fn − f| ≤ p P∞ p 1 p 2 ( k=1 |fk(x)|) ∈ L ), |fn − f| converges in L1, hence fn → f in Lp. This completes the proof when p < ∞. When p = ∞, we have, µ-a.e., ∞ ∞ X X |fk(x)| ≤ kfkk < ∞, k=1 k=1 6.3. DENSITY RESULTS 89 P∞ so again the series k=1 fk(x) converges absolutely in R µ(dx)-a.e. Then we may define f(x) = P∞ k=1 fk(x) whenever the series converges absolutely, and f(x) = 0 otherwise. We have the bound P∞ ∞ |f| ≤ k=1 kfkk∞ < ∞ µ-a.e., so f ∈ L . Moreover, µ(dx)-a.e., n ∞ ∞ X X X | fk(x) − f(x)| ≤ |fk(x)| ≤ |fk|∞, k=1 k=n+1 k=n+1 Pn P∞ Pn ∞ so that | fk − f|∞ ≤ |fk|∞ −→ 0, i.e. fk converges to f in L . k=1 k=n+1 n→∞ k=1 Pn Remark 6.2.3 In the proof above we actually established a stronger fact: if the series k=1 fk converges p p Pn p absolutely in L , there exists f ∈ L such that fk −→ f µ-a.e. and in L . In particular, we k=1 n→∞ deduce the following, useful result. p Corollary 6.2.4 Let p ∈ [1, ∞). If a sequence (fn)n≥1 converges to f in L , we can extract a subse- quence (fϕ(n))n≥1 converging to f µ-a.e. −n Proof Since kfn −fkp −→ 0, we can extract a subsequence (fϕ(n))n≥1 such that kfϕ(n) −fkp ≤ 2 , n→∞ Pn p so that the series k=1(fϕ(k) − fϕ(k−1)) converges absolutely in L (where we set fϕ(0) := 0). Thus, Pn p by the previous remark, fϕ(n) = k=1(fϕ(k) − fϕ(k−1)) converges µ-a.e. to its limit in L , which is f. ∞ Exercise 6.2.1 If p = +∞, and if a sequence (fn)n≥1 converges to f in L , then (fn)n≥1 converges to f µ-a.e. (without need to extract a subsequence). 6.3 Density results When we need to prove a certain property for all functions in Lp, where p ∈ [1, ∞], it is often convenient to prove the property on a smaller set of functions, and argue by a density argument that the property remains true on the whole space Lp. We state here some fundamental density results. p Theorem 6.3.1 Let p ∈ [1, ∞]. If f ∈ L (X ) then there exists a sequence of simple functions fn such p p that limn→∞ kfn − fkp = 0. In words, S ∩ L (X ) is dense in L (X ). Proof First assume that p < ∞. We may assume that f ≥ 0 (standard methods passes to f = f + −f −). Let fn ≤ f be a sequence of non-negative increasing simple functions converging to f. Define p p gn = f − (f − fn) . p Then gn → f , and gn is non-negative and increasing , by the monotone convergence theorem, Z Z p p p lim f − (f − fn) dµ = f dµ n→∞ 6.3. DENSITY RESULTS 90 i.e. Z p lim (f − fn) dµ = 0, n→∞ ˜ ˜ and the claim follows. If p = ∞, then |f| ≤ kfk∞ < ∞ µ-a.e. Setting f = 1{|f|≤kfk∞}, then f = f µ-a.e. Moreover f˜is bounded, so there exists a sequence simple functions fn converging to f uniformly. Therefore kfn − fk∞ = kfn − f˜k∞ −→ 0, n→∞ thus concluding the proof. Corollary 6.3.2 Let 1 ≤ p < ∞, then Lp(Rd) is a separable metric space. Pn d Proof The subset D of simple functions i=1 ai1Ai where Ai = Πj=1(ci,j, di,j) with ai, ci,j, di,j ∈ Q is a countable subset of Lp(Rd). We also claim that D is dense in Lp(Rd). Let us prove this claim. For convenience of notation, let us assume that d = 1 (the general case is similar). We know that the subspace S ∩ Lp(R) is dense in Lp(R). Note that S ∩ Lp(R) is the subspace of functions of the form Pn i=1 ai 1Ai , where n ≥ 1, ai ∈ R \{0}, and Ai ∈ B(R) with λ(Ai) < ∞ . Since Q is a dense subset of R, the claim will follow upon showing that if A ∈ B(R) satisfies λ(A) < ∞, then ∀ > 0, ∃f ∈ D, k1A − fkp ≤ . (6.1) Note moreover that for all A ∈ B(R) such that λ(A) < ∞, by the Dominated Convergence Theorem, p 1A∩(−N,N) −→ 1A in L (R), so it is enough to prove (6.1) holds for all subset A ∈ B((−N,N)), for N→∞ N ≥ 1. Given N ≥ 1, let us therefore consider A, the collection of subsets A ∈ B((−N,N)) satisfying (6.1). A is a λ-system, indeed: • 1(−N,N) ∈ D, hence (−N,N) ∈ A • if A, B ∈ A and A ⊂ B, then for all > 0 there exist f, g ∈ D such that k1A − fkp < /2, and k1B − gkp < /2. Since 1B\A = 1B − 1A, it follows by the Minkowski inequality that k1B\A − (g − f)kp < ; since g − f ∈ D, this shows that B \ A ∈ A. • if (An)n≥1 is a non-decreasing sequence of elements of A, and A = ∪n≥1An, then 1A −→ 1A n n→∞ in Lp by the Dominated Convergence Theorem. Hence, for all > 0, there exists a n ≥ 1 such that k1An − 1Akp < /2. Since An ∈ A, there exists f ∈ D such that k1An − fkp < /2. By Minkowski’s inequality, we deduce that k1A − fkp < , so A ∈ A. Thus A is a λ-system, and it contains C := {(c, d): c, d ∈ Q ∩ [−N,N], c < d}. Since C is a π-system and σ(C) = B((−N,N)), by the π − λ Theorem, we deduce that A = B((−N,N)). Hence D is indeed a dense countable subset Lp(Rd), which is therefore separable. 6.4. THE CASE OF FINITE MEASURES 91 Remark 6.3.3 Beware that L∞(Rd) is not separable. ∞ d ∞ d Theorem 6.3.4 Let 1 ≤ p < ∞. The space Cc (R ) of C functions on R with compact support is a dense subspace of Lp(Rd). ∞ d p d ∞ d Proof First note that Cc (R ) ⊂ L (R ). Indeed, if f ∈ Cc (R ), then there exists N ≥ 1 such that f is supported in [−N,N]d. Since moreover f is continuous, it is therefore bounded. Thus Z Z p p p d |f| dλ = |f| 1[−N,N]d dλ ≤ kfk∞ λ([−N,N] ) < ∞, p d ∞ d p d so f ∈ L (R ). We now prove that Cc (R ) is dense in L (R ). By the proof of Corollary 6.3.2, the Pn d subset D of simple functions i=1 ai1Ai where Ai = Πj=1(ci,j, di,j) with ai, ci,j, di,j ∈ Q is dense in p d d L (R ). Hence it suffices to show that for any subset A of the form Πj=1(cj, dj), with cj, dj ∈ Q, there ∞ d p exists a sequence of functions fn ∈ C (R ) such that fn −→ 1A in L . We now recall the following c n→∞ ∞ useful fact: there exists a non-decreasing sequence of functions ϕn ∈ Cc (R) such that • 0 ≤ ϕn ≤ 1(0,1) for all n ≥ 1 • ϕn −→ 1(0,1) pointwise from below. n→∞ d ∞ d Given A = Πj=1(cj, dj), with cj, dj ∈ Q, and cj < dj, we now define fn ∈ Cc (R ) by setting d xj − cj d fn(x) = Πj=1 ϕn , x = (x1, . . . , xd) ∈ R . dj − cj Then we have • 0 ≤ fn ≤ 1A for all n ≥ 1 • fn −→ 1A pointwise from below. n→∞ p By the Dominated Convergence Theorem, we deduce that fn −→ 1A in L , as requested. n→∞ 6.4 The case of finite measures Here we state some fundamental properties specific to the case when µ is a finite measure. Proposition 6.4.1 If µ is a finite measure, then for all r, r0 ∈ [1, ∞] with r < r0, Lr0 (µ) ⊂ Lr(µ). 6.4. THE CASE OF FINITE MEASURES 92 0 ∞ Proof If r = ∞, and f ∈ L (µ), then f ≤ kfk∞ µ a.e., hence for all r ∈ [1, ∞), Z r r |f| dµ ≤ kfk∞ µ(X ) < ∞ r 0 r0 r0 so f ∈ L (µ), thus proving the claim. Let us now assume r < ∞. Let p = r and q = r0−r , so that p, q 0 are two conjugate numbers in (1, ∞). If f ∈ Lr (µ), by Holder’s¨ inequality 1/r 1/pr 1/qr Z Z Z r0−r r pr 0 kfkr = |f| 1 dµ ≤ |f| dµ 1 dµ = kfkr0 µ(X ) rr < ∞, r so f ∈ L (µ), thus proving the claim. Remark 6.4.2 In the special case where µ is a probability measure, the proof above shows that we 0 r0 furthermore have kfkr ≤ kfkr0 for all r > r ≥ 1 and all f ∈ L (µ). In probabilistic notations, 0 0 0 E(|X|r)1/r ≤ E(|X|r )1/r for any random variable X ∈ Lr (µ). Terminology: If X is a random variable on a probability space (Ω, F, P), for all p ∈ [1, ∞), E(|X|p) is called the p-th moment of X. We will now state an inequality that plays an important role in probability theory: Jensen’s inequality. We first recall the definition of a convex function. Definition 6.4.1 A function ϕ : Rd → R is convex if ϕ(tx + (1 − t)y) ≤ tϕ(x) + (1 − t)ϕ(y) for any x, y ∈ Rd and for any t ∈ [0, 1]. The following are convex functions on Rd: ϕ(x) = |x|, |x|p, p > 1. If d = 1, ex is convex, and so are 00 ϕ(x) = xp for p > 1. Moreover, if ϕ is twice differentiable, ϕ is convex if and only if ϕ ≥ 0. Theorem 6.4.3 (Jensen’s Inequality) If ϕ : R → R+ is a convex function and µ is a probability mea- sure, then for all f ∈ L1(X , F, µ), Z Z ϕ fdµ ≤ ϕ(f) dµ. Proof It is a fact that if ϕ is convex then there exists a constant c ∈ R such that Z Z ∀y ∈ R, ϕ(y) ≥ ϕ fdµ + c y − fdµ . Take y = f(x), Z Z ϕ(f(x)) ≥ ϕ fdµ + c f(x) − fdµ . Integrate this over with respect to the probability measure µ (see Exercise 5.5.1) to conclude. 6.4. THE CASE OF FINITE MEASURES 93 Pn Example 6.2 Let x1, . . . xn be points in R and suppose that µ = i=1 λiδxi is a probability measure on R. Then apply Jensen to f(x) = x and any convex ϕ: n n X X ϕ( λixi) ≤ λiϕ(xi). i=1 i=1 6.4.1 Mode of convergence To wrap up we note the commonly used notions of convergence. Let (X , F, µ) be a measure space with σ-finite measure, and fn, f : X → R be Borel measurable functions. Definition 6.4.2 We say fn converges to f in Lp if Z p |fn − f| dµ → 0. Definition 6.4.3 We say fn converges to f in measure if for any > 0, lim µ(x : |fn(x) − f(x)| > ) = 0. n→∞ Definition 6.4.4 We say fn converges to f a.s. if there exists a null set N and for any x 6∈ N, lim fn(x) = f(x). n→∞ Remark 6.4.4 If fn → f in Lp it converges in measure. Chapter 7 Product measures and Fubini’s Theorem Throughout this section let (X , F, µ) and (Y, G, ν) be two measure spaces. A (measurable) product set is of the form A × B where A ∈ F and B ∈ G. Denote by C the collection of products measurable set. C = {A × B : A ∈ F,B ∈ G}. The product σ-algebra is generated by such sets. F ⊗ G = σ(C). Can we assign a measure to F ⊗ G that is consistent to µ and ν? If X = {a1, . . . , an}, we use the power set σ-algebra. If we assign any non-negative values to ai, X say pi, this determines a measure on 2 (no relations is needed on {pi}, as intersections of singletons are emptysets, the union of two singletons is no longer a singleton). The power σ-algebra has precisely 2n elements. Every measure on the finite space X , with the power σ-algebra, is determined uniquely by their values of the singleton sub-sets. Let Y = {b1, . . . , bm}, then X × Y = {(ai, bj) : 1 ≤ i ≤ n, 1 ≤ j ≤ m}, and every element of X Y X Y X ×Y X × Y is contained in 2 ⊗ 2 . Thus 2 ⊗ 2 = 2 . Hence by assigning a value to every {(ai, bj)}, this is a product set, this determines a measure on the product σ-algebra. Given µ on X and ν on Y, we define the product measure by µ × ν({(ai, bj)}) = µ({ai})ν({bj}). Similarly if X is given by a partition {A1,...,An}, and Y is given a partition {B1,...,Bm}, and if n m we set F = σ({A1,...,An}) and G = σ({B1,...,Bm}). Then there are 2 and 2 elements in F and G respectively. If we assign values µ(Ai), this determine a measure on X (because the intersections of the Ai are emptyset, no relation between these numbers are neeeded for extending these to every element of F). Every measure on F is obtained in this way. Also F ⊗ G = σ({Ai × Bj}). It is clear that 95 7.1. PRODUCT MEASURES 96 Ai × Bj is a partition of X × Y. Thus this is totally analogous to the finite state spaces discussed earlier. Furthermore given µ, ν on F and G respectively, the product measure on F ⊗ G will be defined by µ × G(Ai × Bj) = µ(Ai)ν(Bj). If F and G are not finite σ-algebras, they are uncountable, the construction for the product measure becomes much more involved. 7.1 Product measures Recall the definition of elementary family in Definition 2.2.4. Lemma 7.1.1 The collection C of product sets is an elementary family. The collection A of a finite number of disjoint unions of product sets is an algebra. Proof The empty set is in C. Also, (A × B) ∩ (A0 × B0) = (A ∩ A0) × (B ∩ B0), (A × B)c = (A × Bc) ∪ (Ac × B) ∪ (Ac × Bc). Thus, C is closed under taking complement, and the complement of a set is the disjoint union of a finite number of product sets. By exercise 2.2.4, the collection of finite number of disjoint union of sets from C is an algebra. n Let us define a function µ × ν : A → [0, ∞] by the following rules: If E = ∪j=1(Ai × Bj) where Ai ∈ F and Bi ∈ G, then we want to define n X µ × ν(E) = µ(Ai)ν(Bi). (7.1) j=1 We must show that this is independent of the expression of E into disjoint unions of product sets. ∞ Lemma 7.1.2 Let E = A × B ∈ C. Suppose that E = ∪j=1(Aj × Bj) where Ai ∈ F and Bi ∈ G, where {Aj × Bj} are disjoint sets. then ∞ X µ(A)ν(B) = µ(Aj)ν(Bi). j=1 Proof Observe that 1A×B(x, y) = 1A(x)1B(y) for x ∈ X ,B ∈ Y. Also, ∞ ∞ X X 1A×B(x, y) = 1Ai×Bi (x, y) = 1Ai (x)1Bi (y). i=1 i=1 7.1. PRODUCT MEASURES 97 P∞ Holding y fixed, we may integrate 1A(x)1B(y) = i=1 1Ai (x)1Bi (y) with respect to µ. ∞ Z Z X 1A(x) 1B(y) dµ(x) = 1Ai (x)1Bi (y)dµ(y). X X i=1 Use Corollary 5.4.8 for exchanging the summation and integration, ∞ X µ(A)1B = µ(Ai)1Bi . i=1 We use the convention 0 · ∞ = 0. ( So if µ(A) = ∞, then the left hand side is ∞ if y ∈ B and 0 otherwise. The right hand side is interpreted in the same way. Observe that Bi ⊂ B and Ai ⊂ A.) P∞ Integrating both sides from the above line w.r.t. ν we obtain µ(A)ν(B) = i=1 µ(Ai)ν(Bi), com- pleting the proof. S∞ S∞ Lemma 7.1.3 Suppose that i=1(Ai × Bi), j=1(Cj × Dj) are disjoint unions and ∞ ∞ [ [ (Ai × Bi) = (Cj × Dj), i=1 j=1 then ∞ ∞ X X µ(Ai)ν(Bi) = µ(Cj)ν(Dj). i=1 j=1 S∞ Proof Let E = j=1(Ai × Bi). Then, ∞ ∞ [ [ (Ai × Bi) = ( (Ai × Bi)) ∩ E i=1 j=1 ∞ ∞ [ [ = (Ai × Bi) ∩ (Cj × Dj) i=1 j=1 ∞ ∞ [ [ = (Ai ∩ Cj) × (Bi ∩ Dj). i=1 j=1 The last is also a union of disjoint sets. For each i, we apply the previous lemma to ∞ [ Ei = Ai × Bi = (Ai ∩ Cj) × (Bi ∩ Dj) j=1 7.1. PRODUCT MEASURES 98 to see ∞ X µ(Ai × Bi) = µ(Ai ∩ Cj)ν(Bi ∩ Dj) j=1 So, ∞ ∞ ∞ X X X µ(Ai × Bi) = µ(Ai ∩ Cj)ν(Bi ∩ Dj). i=1 i=1 j=1 Similarly ∞ ∞ ∞ X X X µ(Cj × Dj) = µ(Ai ∩ Cj)ν(Bi ∩ Dj). j=1 i=1 j=1 This finishes the proof. The map (7.1) defines a pre-measure on A. Firstly µ × ν(φ) = 0, secondly it has finite additive S∞ property on A. Furthermore, suppose that E = i=1 Ei, where Ei ∈ A are disjoint, with the property P∞ that E ∈ A, we show that µ(E) = i=1 µ(Ei). On one hand, each Ei is a finite disjoint union of product sets, we pull these product sets , that composes of Ei, together: they are all disjoint. Relabel S∞ P these disjoint product sets we see E = i=1 Ai × Bi. Re-arrange it, we see that µ(E) = µi(Ei). With this we define an outer measure µ∗, by (3.1): ∞ ∞ ∗ X [ µ (E) = inf µ × ν(Ej): E ⊂ Ej,Ej ∈ A . j=1 j=1 By the Caratheodory theorem, the outer measure µ∗ is a measure on F ⊗ G and agrees with µ × ν on A. This measure on F ⊗ G will be called the product measure. If µ and ν are σ-finite, there exists a unique measure such that the measure of product sets is the product of the measures on each factor. If µ, ν are finite, so is µ×ν(X ×Y) = µ(X )ν(Y) < ∞. If they are both σ-finite, so is µ×ν. Indeed, there exist An ↑ X and Bn ↑ Y such that µ(An) < ∞, ν(Bn) < ∞ for every n. Now {Ai × bj} is an increasing sequence converging to X × Y with finite measure, hence µ × ν is σ-finite. Remark: Since a countable union of measures of A is a countable union of product sets, ∞ ∞ ∗ X [ µ (E) = inf µ(Aj)ν(Aj): E ⊂ Aj × Bj,Aj ∈ F,Bj ∈ F ∈ A . j=1 j=1 Remark 7.1.4 Similar constructions can be used to obtain a product measure on a finite number of copies of product spaces and their product σ-algebras. Let us take for example three measure spaces (Xi, Fi, µi). Recall that 3 3 ⊗i=1Fi = σ( Πi=1Ai : Ai ∈ Fi, i = 1, 2, 3 ). 7.1. PRODUCT MEASURES 99 We define µ1 × µ2 × µ3 by 3 µ1 × µ2 × µ3(A1 × A2 × A3) = Πi=1µi(Ai). Proposition 7.1.5 Let (Xi, Fi, µi) be measurable spaces. We have, 3 (F1 ⊗ F2) ⊗ F3 = ⊗i=1Fi. If µi are σ-finite, then (µ1 × µ2) × µ3 = µ1 × µ2 × µ3. 3 Proof It is clear that, Πi=1Ai : Ai ∈ Fi, i = 1, 2, 3 ⊂ {E × C : E ∈ F1 ⊗ F2,C ∈ F3}. Hence 3 ⊗i=1Fi ⊂ (F1 ⊗ F2) ⊗ F3. On the other hand, for any C ∈ F3, 3 {E × C : E ∈ F1 ⊗ F2} ⊂ ⊗i=1Fi. so 2 3 {E × C : E ∈ F1 ⊗ F2,C ∈ F3} ⊂ ∪C∈F3 {E × C : E ∈ ⊗i=1Fi} ⊂ ⊗i=1Fi. Thus 3 (F1 ⊗ F2) ⊗ F3 = σ({E × C : E ∈ F1 ⊗ F2,C ∈ F3}) ⊂ ⊗i=1Fi. Now define µ1 × µ2 on F1 ⊗ F2. Then, (µ1 × µ2) × µ3 and µ1 × µ2 × µ3 agree on the product sets. These product set determines σ-finite measures. Example 7.1 Let λ be the Lebesgue measure, the completion of the product measure on R × R is called the Lebesgue measure on R2. This is denoted by the same letter or by λ2. Our question is: given f : R2 → R measurable, what is Z fdλ2? R2 Remark 7.1.6 If each factor (X , F, µ) and (Y, G, ν) are complete, (i.e. the σ-algebra is complete with respect to the given measure), the product σ-algebra is often NOT complete with respect to the product measure. We can of course complete the product σ-algebra under the product measure. Exercise 7.1.1 Give an example of a Borel measure on R2 (i.e. on the Borel σ-algebra) that is not a product measure. 7.1. PRODUCT MEASURES 100 7.1.1 Sections of subsets of product spaces If E ⊂ X × Y, for x ∈ X , we define the section Ex = {y ∈ Y :(x, y) ∈ E}. Similarly for y ∈ Y we define Ey = {x ∈ X :(x, y) ∈ E}. Proposition 7.1.7 If E ∈ F ⊗ G, then 1. Ex ∈ G for every x ∈ X and ν(Ex) is measurable. 2. Ey ∈ F for every y ∈ Y and µ(Ey) is measurable. 3. Also, Z Z y µ × ν(E) = ν(Ex)µ(dx) = µ(E )ν(dy). X Y Proof (1) We first assume that the measures are finite. Let D = {E ∈ F ⊗ G : conclusion holds}. Observe that ( B, x ∈ A, (A × B) = x φ, x 6∈ A. Take µ measure of the above, ( ν(B), x ∈ A, ν((A × B) ) = x 0, x 6∈ A. Integrate w.r.t. x, we see ν((A × B)x)µ(dx) = ν(B)µ(A) = µ × ν(A × B). A similar conclusion holds for the y-sections. This means D contains the set of products sets, which is a π-system. If A ⊂ B, A, B ∈ D then y y y (A \ B)x = Ax \ Bx ∈ G (A \ B) = A \ B ∈ F. Z Z Z ν((A \ B)x)µ(dx) = ν(Ax)dµ(x) − ν(Bx)dµ(y), 7.2. FUBINI’S THEOREM 101 A similar conclusion for the y-sections. Hence A \ B ∈ D. If An ∈ D is an increasing sequence, [ [ ( An)x = (An)x ∈ G. n Thus D is a λ system and thus equals F ⊗ G, as it contains a generating set which is a π-system. This concludes the finite measure case. (2) Suppose that the measures are σ-finite, we take Ai ×Bi increasing with finite measure, then apply the proposition to E ∩ (Ai × Bi), we see that ( E ∩ B , if x ∈ A (E ∩ (A × B )) = x i i i i x φ, otherwise, is measurable for every x, then so is Ex = ∪i(E ∩ (Ai × Bi))x. Similarly conclusions for the y-sections. Since, Z Z y µ × ν(E ∩ (Ai × Bi)) = ν((E ∩ (Ai × Bi))x)µ(dx) = µ((E ∩ (Ai × Bi))x)ν(dy), X Y we may take i → ∞ and use the monotone convergence theorem to conclude. Proposition 7.1.8 If f : X × Y → R is measurable, then so are the following functions: y fx = f(x, ·): Y → R, f = f(·, y): X → R are measurable for every x ∈ X and for every y ∈ Y. Proof For any B ∈ B(R), (f y)−1(B) = {x : f(x, y) ∈ B} = (f −1(B))y ∈ F, −1 −1 (fx) (B) = {y : f(x, y) ∈ B} = (f (B))x ∈ G. This conclusion follows from Proposition 7.1.7. 7.2 Fubini’s Theorem Theorem 7.2.1 (The Fubini-Tonelli Theorem) Let µ and ν be σ-finite measures. R R y (1) Suppose f : X ×Y → [0, ∞] is measurable. Then g(x) = fx(y)dν(y) and h(y) = f (x)dµ(x) are both non-negative and measurable. Furthermore, Z Z Z Z Z fd(µ × ν) = f(x, y)dν(y) dµ(x) = f(x, y)dµ(x) dν(y). (7.2) X Y Y X 7.2. FUBINI’S THEOREM 102 (2) Suppose f ∈ L1(µ × ν), then (a) fx ∈ L1(ν) for a.e. x y (b) f ∈ L1(µ) for a.e. y ∈ Y. R (c) The almost everywhere defined functions g1(x) = fx(y)dν(y) ∈ L1(µ) and g2(y) = R y f (x)dµ(x) ∈ L1(ν) are both integrable. (d) Furthermore, (7.2) holds. Proof This theorem holds for characteristic functions, and therefore holds for simple functions by non- linearity. We prove (1) first. Let f n ∈ S+ such that f n ↑ f. Then Z Z n gn(x) = f (x, y)dν(y) ↑ g(x) = f(x, y)dν(y), Z Z n hn(y) = f (x, y)dµ(x) ↑ h(y) = f(x, y)dµ(x). So g, h are measurable and Z Z Z Z Z fnd(µ × ν) = fn(x, y)dν(y) dµ(x) = fn(x, y)dµ(x) dν(y). X Y Y X Apply the monotone convergence theorem to conclude (1). R R R Also, fd(µ × ν) < ∞ implies that Y fn(x, y)dν(y) is finite for a.e. x and X fn(x, y)dµ(x) is finite for a.e. y. For a general integrable function f, apply this to f + and f −, and use the statement above to conclude. Remark 7.2.2 For simplicity we remove the brackets and write Z Z Z Z f(x, y)dν(y)dµ(x) = f(x, y)dν(y) dµ(x), X Y X Y Z Z Z Z f(x, y)dν(y)dµ(x) = f(x, y)dν(y) dµ(x). X Y Y X Corollary 7.2.3 Fubini’s theorem holds if we replace µ × ν by its completion, provided that µ, ν are complete. To see this we note that if f is measurable with respect to the completion of F ⊗ G, then there exists an F ⊗ G -measurable function fˆ such that f = fˆ except on a null set E, which by enlarging we may y assume to be F × G measurable. Then µ(Ex) = 0 for a.e. x, ν(E ) = 0 for a.e. y. Since µ, ν are complete, they are measurable. Thus, we may apply Fubini’s theorem to fˆ, Z Z ZZ ZZ f dµ × ν = fˆ dµ × ν = fˆ(x, y) dµ(x)dν(y) = fˆ(x, y) dν(y)dµ(x). X ×Y X ×Y 7.3. PRODUCT MEASURES AND INDEPENDENCE 103 We know that fˆ(x, ·) is measurable for almost every x and f(x, ·) differs at most on Ex, a null set. Similarly we conclude f(·, y) is measurable a..e. y, and also they are in L1, the conclusion follows. Exercise 7.2.1 Let X = {1,...,N} be a finite state space. Write down the the Fubini-Tonelli Theorem specific to this case. Given a measure µ on X × Y. We say µi are the marginals of µ if µ(X × B) = µ2(B), µ(A × Y) = µ1(A), ∀A ∈ F,B ∈ G. We also say that µ is a coupling of µ1 and µ2. 2 Exercise 7.2.2 Given an example of a Borel measure µ on R , two Borel measures µi on R such that µ is not the product measure µ1 ×µ2, but µi are marginals of µ. Given another set of examples of measures such that µ = µ1 × µ2 but µi are not the Lebesgue measure. R Note that if f is a non-negative measurable function, µˆ(A) := A f dµ is a measure. Exercise 7.2.3 f : R → R f(x) = x2 + 1 λ For given by √ and the Lebesgue measure, compute the R x−1 pushed forward measure. f∗λ and compute [1,2] e d(f∗λ). (x2−y2) Exercise 7.2.4 Let f(x, y) = (x2+y2)2 for x ∈ [0, 1] and y ∈ (0, 1], also f(0, 0) = 0. Show that f 6∈ L1([0, 1] × [0, 1]). 7.3 Product measures and independence The notion of product measure admits an important interpretation in probability theory with the notion of independence. We first recall the following definitions, for which we assume that we are given a probability space (Ω, F, P). We will say that A is a sub-σ-algebra of F if it is a σ-algbera contained in F. Definition 7.3.1 1. We say that two sub-σ-algebras A and B of F are independent if any A ∈ A and any B ∈ B are independent, i.e. they satisfy P(A ∩ B) = P(A)P(B). 2. Two random variables X and Y on (Ω, F, P) are independent if σ(X) is independent of σ(Y ). 3. A random variable Y is said to be independent of a sub-σ-algebra A of F, if σ(Y ) and A are independent. Thus, if (X , A) and (Y, B) are two measurable spaces, and if X :Ω → X and Y :Ω → Y are two random variables, then X and Y are independent if and only if, for all A ∈ A and B ∈ B, P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B). (7.3) 7.3. PRODUCT MEASURES AND INDEPENDENCE 104 Let us denote by PX (resp. PY ) the probability distribution of X (resp. Y ): this is a probability measure on (X , A) (resp. (Y, B)). Let us further denote by P(X,Y ) the probability distribution of the random variable (X,Y ):Ω → X × Y: this is a probability measure on (X × Y, A ⊗ B). Theorem 7.3.1 The random variables X and Y are independent if and only if P(X,Y ) = PX × PY . Proof X and Y are independent if and only if, for all A ∈ A and B ∈ B, the equality (7.3) holds, and this equality can be written P(X,Y )(A × B) = PX (A) PY (B). Thus X and Y are independent if and only if P(X,Y ) satisfies the characteristic property of the product measure PX × PY , whence the claim. Corollary 7.3.2 If X and Y are independent, then for all measurable functions f : X → [0, ∞] and g : Y → [0, ∞], we have E(f(X) g(Y )) = E(f(X)) E(g(Y )). Proof By the transfer lemma, we have Z E(f(X) g(Y )) = f(x) g(y)dP(X,Y )(x, y). X ×Y But since X and Y are independent, P(X,Y ) = PX × PY , so by the Fubini-Tonnelli Theorem we obtain Z Z E(f(X) g(Y )) = f(x) dPX (x) g(y)dPY (y) X Y = E(f(X)) E(g(Y )) where we used the transfer lemma to obtain the last equality. Exercise 7.3.1 Assume that f : X → R and g : Y → R are measurable functions such that the random variables f(X) and g(Y ) are integrable. Show that if X and Y are independent, then the random variable f(X) g(Y ) is integrable and E(f(X) g(Y )) = E(f(X)) E(g(Y )). Chapter 8 Radon-Nikodym Theorem In this chapter we compare two measure, discuss the concepts of singular and absolutely continuity. 8.1 Singular and absolutely continuity Let (X , F) be a measurable space. Definition 8.1.1 We say that a measure µ is absolutely continuous with respect to ν if whenever ν(A) = 0, we have µ(A) = 0. We write µ ν. Given any two finite measures µ1, µ2 we can find a reference measure (e.g. µ = µ1 + µ2, such that µ1 µ and µ2 µ). Definition 8.1.2 Two measures µ and ν on (X , F) are mutually singular if there are disjoint sets A1 and A2 such that X = A1 ∪ A2 such that µ(A1) = 0 and ν(A2) = 0. This is denoted by µ ⊥ ν. Remark 8.1.1 Note that the above property is equivalent to the existence of a measurable subset B satisfying µ(Bc) = ν(B) = 0. The Lebesgue measure and any Dirac measures δx are singular. Exercise 8.1.1 If ν is a measure on (X , G) and p : X → [0, ∞) is an integrable function. Show that Z µ(A) = p(y)ν(dy) A defines a measure and µ ν. We write µ = p ν, or dµ = p dν. 105 8.1. SINGULAR AND ABSOLUTELY CONTINUITY 106 Example 8.1 The Gaussian measure N(0, 1) on R1 is absolutely continuous w.r.t. the Lebesgue mea- sure. Z 2 1 − |x| N(0, 1)(A) = √ e 2 dx. A 2π R 2 Example 8.2 Let X = [0, 1] and P the Lebesgue measure . Define Q(A) = 2 A x dx so dQ (x) = 2x2. dP Define Q1 by dQ1 = 21[0, 1 ]. dP 2 Then Q1 P and P is not absolutely continuous with respect to Q1. Define Q2 by dQ2 = 21[ 1 ,1]. dP 2 The two measures Q1 and Q2 are singular. The following lemma will be useful in the sequel: Lemma 8.1.2 Let µ, µ˜ and ν three measures on a measurable space (X , F) such that µ ⊥ ν and µ˜ ⊥ ν. Then there exists a measurable subset B such µ(Bc) =µ ˜(Bc) while ν(B) = 0. Proof Since µ ⊥ ν, there exist measurable disjoint susbets A1 and A2 such that A1 ∪ A2 = X and µ(A1) = ν(A2) = 0. Since moreover µ˜ ⊥ ν, there also exist measurable disjoint susbets A˜1 and A˜2 c such that A˜1 ∪ A˜2 = X and µ˜(A˜1) = ν(A˜2) = 0. Setting B = A2 ∪ A˜2, it follows that µ(B ) = ˜ c ˜ µ(A1 ∩ A1) ≤ µ(A1) = 0, and similarly µ˜(B ) = 0, while ν(B) ≤ ν(A1) + ν(A1) = 0. Exercise 8.1.2 Let ν be a finite measure. Sow that ν µ if and only if the following holds: for any > 0 there exists δ > 0 such that whenever µ(E) < δ then ν(E) < . Exercise 8.1.3 Show that if f ∈ L1(µ) is a non-negative function, then for every > 0 there exists δ > 0 such that whenever µ(A) < δ we have Z fdµ < . E R Note that ν(A) = A dµ defines a measure with ν µ. Exercise 8.1.4 Give examples of pairs of measure µ, ν with µ ν; then give examples such that µ ⊥ ν. 8.2. SIGNED MEASURE 107 8.2 Signed measure Definition 8.2.1 Let (X , F) be a measurable space. A signed measure on (X , F) is a map µ : F → R satisfying the following properties: 1. µ(φ) = 0, P 2. if (Aj)j≥1 is a sequence of disjoint measurable sets, then the series j≥1 µ(Aj) is absolutely convergent in R and ∞ ∞ X µ(∪j=1Aj) = µ(Aj). j=1 Remark 8.2.1 We stress that, in the definition of a signed measure µ, we impose µ to have finite real values (this is sometimes refered to as a finite signed measure). We thus allow for A ∈ F such that µ(A) < 0, but not for µ(A) ∈ {±∞}. Terminology: When talking about genuine measures (as introduced in Chapter 3), we will often use the expression “positive measures” to avoid confusion with signed measures. 1 R Example 8.3 If ν is a positive measure, and f ∈ L (X , F, ν), then µ(A) := A f dν defines a signed measure. Definition 8.2.2 A measurable set P is said to be positive for µ if for any measurable F ⊂ P , we have µ(F ) ≥ 0. In other words µ restricts to a (positive) measure on P . A measurable set N is said to be negative for µ if for any F ⊂ N, we have µ(F ) ≤ 0; in other words (−µ) restricts to a (positive) measure on N. ∞ Exercise 8.2.1 If Pn is a sequence of positive sets for µ, then so is ∪n=1Pn. Sn−1 Note that each Pn \ k=1 Pn is positive for µ. Lemma 8.2.2 If A ∈ F satisfies µ(A) < 0, then there exists a negative subset N ⊂ A such that µ(N) < 0. Proof Let δ1 := sup{µ(B): B ∈ F,B ⊂ A}. Note that φ ⊂ A, so that the collection of sets over which we are taking the sup is non-empty, and δ1 ≥ µ(φ) = 0. Thus δ1 ∈ [0, ∞]. By definition of δ1, δ1 we can find B1 ∈ F such that B1 ⊂ A and µ(B1) ≥ min( 2 , 1). Continuing in this way we define, by induction over n, a sequence of non-negative numbers δn and a sequence of disjoint measurable subsets Bn by setting ( n−1 ) [ δn := sup µ(B): B ∈ F,B ⊂ A \ Bi i=1 8.2. SIGNED MEASURE 108