MAA6616 COURSE NOTES FALL 2012

1. σ-algebras

Let X be a set, and let 2^X denote the set of all subsets of X. We write E^c for the complement of E in X, and for E, F ⊂ X, write E \ F = E ∩ F^c. Let E∆F denote the symmetric difference of E and F:

E∆F := (E \ F) ∪ (F \ E) = (E ∪ F) \ (E ∩ F). (1.1)

Definition 1.1. Let X be a set. A Boolean algebra is a nonempty collection A ⊂ 2^X which is closed under finite unions and complements. A σ-algebra is a Boolean algebra which is also closed under countable unions. A measurable space is a pair (X, M) where M ⊂ 2^X is a σ-algebra. A function f : X → Y from one measurable space (X, M) to another (Y, N) is called measurable if f^{-1}(E) ∈ M whenever E ∈ N.
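On a finite set X every countable union is a finite one, so Definition 1.1 can be checked by brute force. The following is a minimal sketch under that finiteness assumption; the helper names (`powerset`, `is_boolean_algebra`, `is_measurable`) and the example sets are illustrative and not part of the notes.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def is_boolean_algebra(X, A):
    """Check nonemptiness and closure under complements and pairwise unions."""
    X = frozenset(X)
    A = {frozenset(E) for E in A}
    return bool(A) and all(X - E in A for E in A) and all(E | F in A for E in A for F in A)

def is_measurable(f, M, N):
    """f is a dict X -> Y; check that the preimage of every E in N lies in M."""
    M = {frozenset(E) for E in M}
    return all(frozenset(x for x in f if f[x] in E) in M for E in N)

X = {1, 2, 3, 4}
M = [set(), {1, 2}, {3, 4}, X]      # a sigma-algebra on X
N = powerset({0, 1})                # the power set of Y = {0, 1}
f = {1: 0, 2: 0, 3: 1, 4: 1}        # constant on the blocks {1, 2} and {3, 4}
g = {1: 0, 2: 1, 3: 1, 4: 1}        # separates 1 from 2

# identity (1.1), checked on every pair of subsets of X
sym_diff_ok = all((E - F) | (F - E) == (E | F) - (E & F)
                  for E in powerset(X) for F in powerset(X))
```

Here `f` is measurable from (X, M) to (Y, 2^Y), while `g` is not, because the preimage of {0} under `g` is {1}, which does not belong to M.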

If {E_α} is any collection of sets in X, then since

\[ \Big( \bigcup_{\alpha} E_\alpha^c \Big)^c = \bigcap_{\alpha} E_\alpha, \tag{1.2} \]

a Boolean algebra (resp. σ-algebra) is automatically closed under finite (resp. countable) intersections. It follows that a Boolean algebra (and a σ-algebra) on X always contains ∅ and X. (Proof: for any E in the collection, X = E ∪ E^c and ∅ = E ∩ E^c.)

Remark 1.2. There is a superficial resemblance between measurable spaces and topological spaces, and between measurable functions and continuous functions. In particular, a topology on X is a collection of subsets of X closed under arbitrary unions and finite intersections, whereas for a σ-algebra we insist only on countable unions, but require complements also. For functions, recall that a function between topological spaces is continuous if and only if pre-images of open sets are open; the definition of measurable function is plainly similar. The two categories are related by the Borel algebra construction, which we will examine later.

The disjointification trick in the next Proposition is often useful.

Proposition 1.3 (Disjointification). A nonempty collection M ⊂ 2^X is a σ-algebra if and only if M is closed under complementation and countable disjoint unions.

Proof. The proof amounts to the observation that if {G_n} is a sequence of subsets of X, then the sets

\[ F_n = G_n \setminus \Big( \bigcup_{k=1}^{n-1} G_k \Big) \tag{1.3} \]

are disjoint, and ⋃_{n=1}^N F_n = ⋃_{n=1}^N G_n for all N (and thus ⋃_{n=1}^∞ F_n = ⋃_{n=1}^∞ G_n).

Date: January 6, 2014.

Examples 1.4.
a) Of course, for any set X its power set 2^X is a σ-algebra. At the other extreme, for any set X the collection {∅, X} is a σ-algebra. In general, if M ⊂ N ⊂ 2^X are σ-algebras, we say M is coarser than N; likewise N is finer than M. So {∅, X} and 2^X are respectively the coarsest and finest σ-algebras on X.
b) If X is any set, the collection
M = {E ⊂ X : E is countable or X \ E is countable} (1.4)
is a σ-algebra (the proof is left as an exercise).
c) If X is a set, M ⊂ 2^X a σ-algebra, and E is any nonempty subset of X, then
M_E := {A ∩ E : A ∈ M} ⊂ 2^E (1.5)
is a σ-algebra on E (exercise).
d) If {M_α : α ∈ A} is a collection of σ-algebras on X, then their intersection ⋂_{α∈A} M_α is also a σ-algebra (checking this is a simple exercise). This means that given any collection of sets E ⊂ 2^X, we can define the σ-algebra
M(E) = ⋂ M (1.6)
where the intersection is taken over all σ-algebras M containing E (this collection is nonempty, since it contains 2^X). M(E) is called the σ-algebra generated by E.
e) An important instance of the construction in (d) is when X is a topological space and E is the collection of open sets of X. In this case the σ-algebra generated by E is called the Borel σ-algebra and is denoted B_X. We will study the Borel σ-algebra over R more closely in Subsection 1.1.
f) If X is a set, (Y, N) a measurable space, and f : X → Y a function, then the collection
f^{-1}(N) = {f^{-1}(E) : E ∈ N} ⊂ 2^X (1.7)
is a σ-algebra on X (check this), called the pull-back σ-algebra. The pull-back σ-algebra is the smallest σ-algebra on X for which the function f : X → Y is measurable.
g) More generally, if we have a set X, a family of measurable spaces (Y_α, N_α) where α ranges over some index set A, and functions f_α : X → Y_α, then we can construct a σ-algebra M on X by taking
E = {f_α^{-1}(E_α) : α ∈ A, E_α ∈ N_α} (1.8)
and letting M = M(E). Unlike the case of a single f, E by itself will not be a σ-algebra in general. M is characterized by the property that it is the smallest σ-algebra on X such that each of the functions f_α is measurable. An important special case of this construction is the product σ-algebra (see Subsection 1.2).

The following proposition is trivial but useful.

Proposition 1.5. If M ⊂ 2^X is a σ-algebra and E ⊂ M, then M(E) ⊂ M.

The proposition is used in the following way: suppose we want to prove that a particular statement is true for every set in some σ-algebra M (say, the Borel σ-algebra B_X), which we know is generated by a collection of sets E (say, the open sets of X). Then it suffices to prove that 1) the statement is true for every set in E, and 2) the collection of sets for which the statement is true forms a σ-algebra. (The monotone class theorem (see Theorem ?? below) is typically used in a similar way.)

A function f : X → Y between topological spaces is said to be Borel measurable if it is measurable when X and Y are equipped with their respective Borel σ-algebras.

Proposition 1.6. If X and Y are topological spaces, then every continuous function f : X → Y is Borel measurable.

Proof. Problem 7.5. (Hint: follow the strategy described after Proposition 1.5.)

We will defer further discussion of measurable functions to the next chapter.
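When X is finite, the generated σ-algebra of Example 1.4(d) can be computed directly by closing the generators under complements and unions, rather than by intersecting all σ-algebras that contain them. The sketch below assumes a finite X; `generate_sigma_algebra` and `pullback` are hypothetical helper names chosen for illustration.

```python
def generate_sigma_algebra(X, gens):
    """Close gens under complement and pairwise union until nothing new appears."""
    X = frozenset(X)
    sets = {frozenset(), X} | {frozenset(E) for E in gens}
    while True:
        new = {X - E for E in sets} | {E | F for E in sets for F in sets}
        if new <= sets:
            return sets
        sets |= new

def pullback(f, N):
    """f is a dict X -> Y; return the collection of preimages of the sets in N."""
    return {frozenset(x for x in f if f[x] in E) for E in N}

X = {1, 2, 3, 4}
M = generate_sigma_algebra(X, [{1}, {1, 2}])           # atoms {1}, {2}, {3, 4}

f = {1: "a", 2: "a", 3: "b", 4: "c"}
N = generate_sigma_algebra({"a", "b", "c"}, [{"a"}])   # empty set, {a}, {b, c}, Y
P = pullback(f, N)                                     # empty set, {1, 2}, {3, 4}, X
```

In this example `M` has 2^3 = 8 elements, one for each union of atoms, and the pull-back `P` is already closed under the two operations, in line with Example 1.4(f). Since {1, 2} ∈ M, Proposition 1.5 predicts that the σ-algebra generated by {1, 2} alone is contained in `M`.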

1.1. The Borel σ-algebra over R. Before going further, we take a closer look at the Borel σ-algebra over R. We begin with a useful lemma on the structure of open subsets of R: Lemma 1.7. Every nonempty open subset U ⊂ R is an (at most countable) disjoint union of open intervals. Here we allow the “degenerate” intervals (−∞, a), (a, +∞), (−∞, +∞).

Proof. Let {I_n : n ∈ N} denote the set of all open intervals with rational endpoints such that I_n ⊂ U. This collection is evidently countable, and ⋃_{n=1}^∞ I_n = U. Write I_n = (a_n, b_n). For each n define

α_n = inf{x ∈ R : x ≤ a_n and (x, b_n) ⊂ U}, (1.9)
β_n = sup{x ∈ R : x ≥ b_n and (a_n, x) ⊂ U}. (1.10)

After removing possible repetitions, the collection of open intervals (α_n, β_n) is disjoint, at most countable, and ⋃_n (α_n, β_n) = U.

Proposition 1.8 (Generators of B_R). Each of the following collections of sets E ⊂ 2^R generates the Borel σ-algebra B_R:
• the open intervals E1 = {(a, b) : a, b ∈ R}
• the closed intervals E2 = {[a, b] : a, b ∈ R}
• the (left or right) half-open intervals E3 = {[a, b) : a, b ∈ R} or E4 = {(a, b] : a, b ∈ R}
• the (left or right) open rays E5 = {(−∞, a) : a ∈ R} or E6 = {(a, +∞) : a ∈ R}
• the (left or right) closed rays E7 = {(−∞, a] : a ∈ R} or E8 = {[a, +∞) : a ∈ R}

Proof. We prove only the open and closed interval cases; the rest are similar and left as exercises. The proof makes repeated use of Proposition 1.5. To prove M(E1) = B_R, first note that since each interval (a, b) is open, M(E1) ⊂ B_R by Proposition 1.5. Conversely, each open set U ⊂ R is a countable union of open intervals (Lemma 1.7), so M(E1) contains all the open sets of R, and since the open sets generate B_R by definition, we have B_R ⊂ M(E1) by Proposition 1.5 again. Thus M(E1) = B_R.

For the closed intervals E2, first note that each closed set is a Borel set, since it is the complement of an open set; thus E2 ⊂ B_R, so M(E2) ⊂ B_R by Proposition 1.5. Conversely, each open interval (a, b) is a countable union of closed intervals [a + 1/n, b − 1/n]. More precisely, since a < b we can choose an integer N such that a + 1/N < b − 1/N; then

\[ (a, b) = \bigcup_{n=N}^{\infty} \Big[ a + \frac{1}{n},\; b - \frac{1}{n} \Big]. \tag{1.11} \]

It follows that E1 ⊂ M(E2), so by Proposition 1.5 and the first part of the proof,

B_R = M(E1) ⊂ M(E2). (1.12)

Sometimes it is convenient to use a more refined version of the above Proposition, where we consider only dyadic intervals.

Definition 1.9. A dyadic interval is an interval of the form

\[ I = \Big[ m + \frac{k}{2^n},\; m + \frac{k+1}{2^n} \Big) \tag{1.13} \]

where m, n, k are integers, n ≥ 0, and 0 ≤ k ≤ 2^n − 1.

In other words, the dyadic intervals are the half-open intervals of length 2^{−n} (n = 0, 1, 2, ...) whose endpoints are dyadic rationals (rationals of the form j/2^n). (Draw a picture of a few of these to get the idea.) A key property of dyadic intervals is the nesting property: if I, J are dyadic intervals, then either they are disjoint, or one is contained in the other. Dyadic intervals are often used to “discretize” analysis problems. The other key property is

Proposition 1.10. Every open subset of R is a countable disjoint union of dyadic intervals.

Proof. Problem 7.3.

It follows (using the same idea as in the proof of Proposition 1.8) that the dyadic intervals generate B_R. The use of half-open intervals here is only a technical convenience, to allow us to say “disjoint” in the above proposition instead of “almost disjoint.” Naturally, one can define dyadic cubes in R^n and prove similar results; we defer these to the discussion of Lebesgue integration on R^n in the next chapter.
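The nesting property of dyadic intervals can be spot-checked with exact rational arithmetic. This is only a sketch: it assumes the left-closed, right-open convention [m + k/2^n, m + (k+1)/2^n), represents an interval by its pair of endpoints, and tests a small finite sample rather than proving anything. The function names are illustrative.

```python
from fractions import Fraction

def dyadic(m, n, k):
    """Endpoints (left, right) of the dyadic interval [m + k/2^n, m + (k+1)/2^n)."""
    assert n >= 0 and 0 <= k <= 2**n - 1
    left = m + Fraction(k, 2**n)
    return (left, left + Fraction(1, 2**n))

def disjoint(I, J):
    """Half-open intervals [a, b) and [c, d) are disjoint iff b <= c or d <= a."""
    return I[1] <= J[0] or J[1] <= I[0]

def contains(I, J):
    """True when J is a subset of I."""
    return I[0] <= J[0] and J[1] <= I[1]

def nested_or_disjoint(I, J):
    return disjoint(I, J) or contains(I, J) or contains(J, I)

# all dyadic intervals with m in {-1, 0, 1} and 0 <= n <= 3
sample = [dyadic(m, n, k) for m in range(-1, 2) for n in range(4) for k in range(2**n)]
nesting_holds = all(nested_or_disjoint(I, J) for I in sample for J in sample)
```

The check would fail for closed intervals of the same shape, since adjacent intervals would then share an endpoint without either containing the other.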

1.2. Product σ-algebras. Given a collection of measurable spaces (X_α, M_α), one would like to construct a σ-algebra on the Cartesian product X = ∏_α X_α that is compatible with the σ-algebras M_α. In particular, from the definition of the product of sets there are “coordinate functions” f_α : X → X_α, and we would like each of these functions to be measurable. For simplicity, at present we will only consider the case of finitely many factors.

The product X = ∏_{j=1}^n X_j is identified with the set of tuples {(x_1, ..., x_n) : x_j ∈ X_j}. For x = (x_1, x_2, ..., x_n) ∈ ∏_{j=1}^n X_j, the coordinate projections π_j : X → X_j are defined by π_j(x) = x_j. If each X_j is equipped with a σ-algebra N_j, then we can form a σ-algebra on X as in Example 1.4(g):

Definition 1.11. If (X_j, N_j), j = 1, ..., n, are measurable spaces, the product σ-algebra ⊗_{j=1}^n N_j is the σ-algebra on X = ∏_{j=1}^n X_j generated by

{π_j^{-1}(E_j) : E_j ∈ N_j, j = 1, ..., n}. (1.14)

The definition guarantees that each coordinate projection π_j is a (⊗_k N_k, N_j)-measurable function. (We will return to this point in the discussion of product measure in the next chapter.) Of course, one can view R^n as the n-fold Cartesian product R^n = R × ··· × R, and the product topology coincides with the usual metric topology on R^n. Thus, we now have two ways of constructing σ-algebras on R^n: the Borel σ-algebra B_{R^n}, or the product σ-algebra obtained by giving each copy of R the Borel σ-algebra B_R and forming the product σ-algebra ⊗_{j=1}^n B_R. It is reasonable to suspect that these two σ-algebras are the same, and indeed this is the case.

Proposition 1.12. B_{R^n} = ⊗_{j=1}^n B_R.

Proof. We use Proposition 1.5 to prove inclusions in both directions. By definition, the product σ-algebra ⊗_{j=1}^n B_R is generated by the sets π_j^{-1}(E_j), j = 1, ..., n, with E_j ∈ B_R, where π_j(x_1, ..., x_n) = x_j is the projection map. Since the open sets generate B_R and taking pre-images commutes with complements and unions, it is also generated by the sets π_j^{-1}(E_j) with E_j ⊂ R open. But π_j^{-1}(E_j) = R × ··· × E_j × ··· × R, where E_j occupies the j-th factor. This is an open subset of R^n, hence Borel, so by Proposition 1.5 we have ⊗_{j=1}^n B_R ⊂ B_{R^n}.

For the reverse inclusion, first note that every open subset U ⊂ R^n is equal to a countable union of open boxes B = (a_1, b_1) × ··· × (a_n, b_n) (just take all the open boxes contained in U having rational vertices). Arguing as in the proof of Proposition 1.8, the open boxes generate the Borel σ-algebra B_{R^n}. If B = ∏_{j=1}^n (a_j, b_j) is an open box and we put E_j = (a_j, b_j), then from the description of π_j^{-1}(E_j) in the first part of the proof we obtain

\[ B = \bigcap_{j=1}^{n} \pi_j^{-1}(E_j). \tag{1.15} \]

So, each open box belongs to the product σ-algebra ⊗_{j=1}^n B_R, and hence B_{R^n} ⊂ ⊗_{j=1}^n B_R by Proposition 1.5.
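For concreteness, here is the box identity used in the proof written out in the case n = 2; this is just the general formula specialized to two factors, with no additional assumptions.

```latex
\[
  (a_1, b_1) \times (a_2, b_2)
  \;=\; \bigl( (a_1, b_1) \times \mathbb{R} \bigr) \cap \bigl( \mathbb{R} \times (a_2, b_2) \bigr)
  \;=\; \pi_1^{-1}\bigl( (a_1, b_1) \bigr) \cap \pi_2^{-1}\bigl( (a_2, b_2) \bigr).
\]
```

An open rectangle is thus the intersection of a vertical strip and a horizontal strip, each of which is one of the generators of the product σ-algebra.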

2. Measures

Definition 2.1. Let X be a set and M a σ-algebra on X. A measure on M is a function µ : M → [0, +∞] such that
• µ(∅) = 0,
• if {E_j}_{j=1}^∞ is a sequence of disjoint sets in M, then

\[ \mu\Big( \bigcup_{j=1}^{\infty} E_j \Big) = \sum_{j=1}^{\infty} \mu(E_j). \]

If µ(X) < ∞, then µ is called finite; if X = ⋃_{j=1}^∞ X_j with X_j ∈ M and µ(X_j) < ∞ for each j, then µ is called σ-finite. Almost all of the measures of importance in analysis (and certainly all of the measures we will work with) are σ-finite. A triple (X, M, µ) where M is a σ-algebra and µ a measure on M is called a measure space.

At this point, we cannot give any really non-trivial examples of measures (such as Lebesgue measure); this will have to wait until we have proved the Caratheodory and Hahn-Kolmogorov theorems in the following sections. For now we describe some simple measures and some procedures for producing new measures from old.

Examples 2.2.
a) Let X be any set, and for E ⊂ X let |E| denote the cardinality of E. The function µ : 2^X → [0, +∞] defined by

\[ \mu(E) = \begin{cases} |E| & \text{if $E$ is finite} \\ +\infty & \text{if $E$ is infinite} \end{cases} \tag{2.1} \]

is a measure on (X, 2^X), called counting measure. It is finite if and only if X is finite, and σ-finite if and only if X is countable.
b) Let X be a set and M the σ-algebra of countable and co-countable sets (Example 1.4(b)). For E ∈ M define µ(E) = 0 if E is countable and µ(E) = +∞ if E is co-countable. Then µ is a measure.
c) Let (X, M, µ) be a measure space and E ∈ M. Recall M_E from Example 1.4(c). The function µ_E(A) := µ(A ∩ E) is a measure on (E, M_E). (Why did we assume E ∈ M?)
d) (Linear combinations) If µ is a measure on M and c > 0, then (cµ)(E) := c · µ(E) is a measure, and if µ_1, ..., µ_n are measures on the same M, then

(µ_1 + ··· + µ_n)(E) := µ_1(E) + ··· + µ_n(E) (2.2)

is a measure. Likewise an infinite sum of measures Σ_{n=1}^∞ µ_n is a measure. (The proof of this last fact requires a small amount of care; see Problem 7.7.)

One can also define products and pull-backs of measures, compatible with the constructions of product and pull-back σ-algebras; these examples will be postponed until the next chapter, after we have built up some more machinery of measurable functions.
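The constructions in Examples 2.2(a), (c) and (d) are easy to experiment with on a finite set, where countable additivity reduces to additivity on pairs of disjoint sets. The sketch below assumes X is finite and uses the full power set as the σ-algebra; `counting`, `restrict`, `combine` and `is_additive` are illustrative names, not notation from the notes.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def counting(E):
    """Counting measure of a finite set."""
    return len(E)

def restrict(mu, E):
    """The set function A -> mu(A ∩ E) of Example 2.2(c)."""
    E = frozenset(E)
    return lambda A: mu(frozenset(A) & E)

def combine(*terms):
    """terms are (c, mu) pairs with c > 0; returns the set function sum of c * mu."""
    return lambda A: sum(c * mu(A) for c, mu in terms)

def is_additive(mu, X):
    """Check mu(empty) = 0 and mu(A ∪ B) = mu(A) + mu(B) for all disjoint A, B ⊂ X."""
    subsets = powerset(X)
    return mu(frozenset()) == 0 and all(
        mu(A | B) == mu(A) + mu(B)
        for A in subsets for B in subsets if not (A & B))

X = range(5)
mu_E = restrict(counting, {0, 1})
nu = combine((2, counting), (3, mu_E))   # nu = 2 * counting + 3 * mu_E
```

For instance `nu({0, 2})` equals 2·2 + 3·1 = 7. A set function such as A ↦ |A|^2 fails the additivity check, which is a quick way to see that not every nonnegative set function vanishing on ∅ is a measure.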

Theorem 2.3 (Basic properties of measures). Let (X, M, µ) be a measure space.
a) (Monotonicity) If E, F ∈ M and E ⊂ F, then µ(E) ≤ µ(F).
b) (Subadditivity) If {E_j}_{j=1}^∞ ⊂ M, then µ(⋃_{j=1}^∞ E_j) ≤ Σ_{j=1}^∞ µ(E_j).
c) (Monotone convergence for sets) If {E_j}_{j=1}^∞ ⊂ M and E_j ⊂ E_{j+1} for all j, then µ(⋃_j E_j) = lim µ(E_j).
d) (Dominated convergence for sets) If {E_j}_{j=1}^∞ ⊂ M and E_j ⊃ E_{j+1} for all j, and µ(E_1) < ∞, then µ(⋂_j E_j) = lim µ(E_j).

Proof. a) By additivity, µ(F) = µ(F \ E) + µ(E) ≥ µ(E).

b) We use the disjointification trick (see Proposition 1.3): for each j ≥ 1 let

\[ F_j = E_j \setminus \Big( \bigcup_{k=1}^{j-1} E_k \Big). \tag{2.3} \]

Then the F_j are disjoint, and F_j ⊂ E_j for all j, so by countable additivity and (a)

\[ \mu\Big( \bigcup_{j=1}^{\infty} E_j \Big) = \mu\Big( \bigcup_{j=1}^{\infty} F_j \Big) = \sum_{j=1}^{\infty} \mu(F_j) \le \sum_{j=1}^{\infty} \mu(E_j). \tag{2.4} \]

c) Put E_0 = ∅. The sets F_j = E_j \ E_{j−1} are disjoint sets whose union is ⋃_{j=1}^∞ E_j, and for each j, ⋃_{k=1}^j F_k = E_j. So by countable additivity,

\[ \mu\Big( \bigcup_{j=1}^{\infty} E_j \Big) = \mu\Big( \bigcup_{j=1}^{\infty} F_j \Big) = \sum_{k=1}^{\infty} \mu(F_k) = \lim_{j\to\infty} \sum_{k=1}^{j} \mu(F_k) = \lim_{j\to\infty} \mu\Big( \bigcup_{k=1}^{j} F_k \Big) = \lim_{j\to\infty} \mu(E_j). \]

d) The sequence µ(E_j) is decreasing (by (a)) and bounded below, so lim µ(E_j) exists. Let F_j = E_1 \ E_j. Then F_j ⊂ F_{j+1} for all j, and ⋃_{j=1}^∞ F_j = E_1 \ ⋂_{j=1}^∞ E_j. So by (c) applied to the F_j, and since µ(E_1) < ∞,

\[ \mu(E_1) - \mu\Big( \bigcap_{j=1}^{\infty} E_j \Big) = \mu\Big( E_1 \setminus \bigcap_{j=1}^{\infty} E_j \Big) = \lim \mu(F_j) = \lim \bigl( \mu(E_1) - \mu(E_j) \bigr) = \mu(E_1) - \lim \mu(E_j). \]

Again since µ(E_1) < ∞, it can be subtracted from both sides.
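The disjointification step (2.3) used in part (b) is a simple running-union computation. The following sketch applies it to a short list of finite sets and uses counting measure (set size) to illustrate (2.4); the function name `disjointify` and the sample sets are chosen only for illustration.

```python
def disjointify(sets):
    """Return F_1, F_2, ... where F_j is E_j minus the union of E_1, ..., E_{j-1}."""
    seen, out = set(), []
    for E in sets:
        E = set(E)
        out.append(E - seen)   # keep only the points not covered so far
        seen |= E
    return out

E = [{1, 2, 3}, {3, 4}, {1, 5}, {6}]
F = disjointify(E)             # [{1, 2, 3}, {4}, {5}, {6}]

union_E = set().union(*E)
union_F = set().union(*F)
```

With counting measure, the union has 6 points, which equals the sum of the sizes of the F_j, while the sizes of the E_j add up to 8, in agreement with the inequality in (2.4).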

Remark 2.4. Note that in item (d) of Theorem 2.3, the hypothesis “µ(E_1) < ∞” can be replaced by “µ(E_j) < ∞ for some j”. However the finiteness hypothesis cannot be removed entirely. To see this, consider (N, 2^N) equipped with counting measure, and let E_j = {k : k ≥ j}. Then µ(E_j) = ∞ for all j, but µ(⋂_{j=1}^∞ E_j) = µ(∅) = 0.

For any set X and subset E ⊂ X, there is a function 1_E : X → {0, 1} defined by

\[ 1_E(x) = \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E, \end{cases} \tag{2.5} \]

called the characteristic or indicator function of E. For a sequence of subsets (E_n) of X, say E_n → E pointwise if 1_{E_n} → 1_E pointwise. (Footnote: What would happen if we asked for uniform convergence?) This notion allows the formulation of a more refined version of the dominated convergence theorem for sets, which foreshadows (and is a special case of) the dominated convergence theorem for the Lebesgue integral. See Problems 7.10 and 7.11.

Definition 2.5. Let (X, M, µ) be a measure space. A null set is a set E ∈ M with µ(E) = 0.

It follows immediately from countable subadditivity that a countable union of null sets is null. The contrapositive of this statement is a measure-theoretic version of the pigeonhole principle:

Proposition 2.6 (Pigeonhole principle for measures). If (E_n)_{n=1}^∞ is a sequence of sets in M and µ(⋃ E_n) > 0, then µ(E_n) > 0 for some n.

It will often be tempting to assert that if µ(E) = 0 and F ⊂ E, then µ(F) = 0, but one must be careful: F need not be a measurable set. This is not a big deal in practice, however, because we can always enlarge the σ-algebra on which a measure is defined so as to contain all subsets of null sets, and it will usually be convenient to do so.

Definition 2.7. If (X, M, µ) has the property that F ∈ M whenever E ∈ M, µ(E) = 0, and F ⊂ E, then µ is called complete.

Theorem 2.8. Let (X, M, µ) be a measure space. Let N := {N ∈ M | µ(N) = 0}. Define

M̄ := {E ∪ F | E ∈ M, F ⊂ N for some N ∈ N}.

Then M̄ is a σ-algebra, and

µ̄(E ∪ F) := µ(E)

defines a complete measure on M̄ such that µ̄|_M = µ.

Proof. First note that M and N are both closed under countable unions, so M̄ is as well. To see that M̄ is closed under complements, consider E ∪ F ∈ M̄ with E ∈ M, F ⊂ N ∈ N. We may assume that E and N are disjoint (replace F, N by F \ E, N \ E if necessary). Then we have (E ∪ F)^c = (E ∪ N)^c ∪ (N \ F), and by inspection (E ∪ N)^c = E^c \ N ∈ M and N \ F ⊂ N ∈ N. Thus M̄ is a σ-algebra. The proof that µ̄ is a well-defined measure on M̄, and extends µ, is left as an exercise (Problem 7.12).
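The completion of Theorem 2.8 can be computed explicitly for a small finite measure space. The sketch below assumes the measure is given as a dictionary on a finite σ-algebra; the function `complete` and the three-point example are illustrative only, and the internal assertion checks the well-definedness that the theorem leaves as an exercise.

```python
from itertools import combinations

def powerset(xs):
    """All subsets of a finite collection, as frozensets."""
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def complete(mu):
    """mu maps each set of a finite sigma-algebra M (a frozenset) to its measure.
    Returns the completed measure as a dict on all sets E ∪ F with E in M
    and F a subset of some null set."""
    small = {F for N, v in mu.items() if v == 0 for F in powerset(N)}
    mu_bar = {}
    for E, v in mu.items():
        for F in small:
            # two representations of the same set must receive the same value
            assert mu_bar.setdefault(E | F, v) == v
    return mu_bar

X = frozenset({1, 2, 3})
mu = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 0, X: 1}
mu_bar = complete(mu)
```

In this example {2, 3} is a null set whose subsets {2} and {3} are not in M, so µ is not complete; the completion is defined on all 8 subsets of X and assigns mass 1 exactly to the sets containing the point 1.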

3. Outer measures and the Caratheodory Extension Theorem

3.1. Outline. The point of the construction of Lebesgue measure on the real line is to extend the naive notion of length for intervals to a suitably large family of subsets of R. We will accomplish this via the Caratheodory Extension Theorem (Theorem 3.3).

Definition 3.1. Let X be a nonempty set. A function µ∗ : 2^X → [0, +∞] is called an outer measure if:
• µ∗(∅) = 0,
• (Monotonicity) µ∗(A) ≤ µ∗(B) whenever A ⊂ B,
• (Subadditivity) if {A_j}_{j=1}^∞ ⊂ 2^X, then

\[ \mu^*\Big( \bigcup_{j=1}^{\infty} A_j \Big) \le \sum_{j=1}^{\infty} \mu^*(A_j). \]

Definition 3.2. If µ∗ is an outer measure on X, then a set E ⊂ X is called outer measurable (or µ∗-measurable) if

µ∗(A) = µ∗(A ∩ E) + µ∗(A ∩ E^c) (3.1)

for every A ⊂ X.

The significance of outer measures and outer measurable sets stems from the following theorem:

Theorem 3.3 (Caratheodory Extension Theorem). If µ∗ is an outer measure on X, then the collection M of outer measurable sets is a σ-algebra, and the restriction of µ∗ to M is a complete measure.

The proof will be given later in this section; we first explain how to construct outer measures. In fact, all of the outer measures we consider will be constructed using the following proposition:

Proposition 3.4. Let E ⊂ 2^X be a collection of sets and µ0 : E → [0, +∞] be such that ∅, X ∈ E and µ0(∅) = 0. For every A ⊂ X, define

\[ \mu^*(A) = \inf\Big\{ \sum_{n=1}^{\infty} \mu_0(E_n) : E_n \in E \text{ and } A \subset \bigcup_{n=1}^{\infty} E_n \Big\}. \tag{3.2} \]

Then µ∗ is an outer measure.

Note that we have assumed X ∈ E, so there is at least one covering of A by sets in E (take E_1 = X and all other E_j empty), so the definition (3.2) makes sense.

Proof of Proposition 3.4. It is immediate from the definition that µ∗(∅) = 0 (cover the empty set by empty sets) and that µ∗(A) ≤ µ∗(B) whenever A ⊂ B (any covering of B is also a covering of A). To prove countable subadditivity, we make our first use of the “ε/2^n” trick. Let (A_n) be a sequence in 2^X and let ε > 0. We may assume µ∗(A_n) < ∞ for every n, since otherwise there is nothing to prove. Then for each n ≥ 1 there exists a countable collection of sets (E_n^k)_{k=1}^∞ in E such that A_n ⊂ ⋃_{k=1}^∞ E_n^k and

\[ \sum_{k=1}^{\infty} \mu_0(E_n^k) < \mu^*(A_n) + \varepsilon 2^{-n}. \tag{3.3} \]

But now the countable collection (E_n^k)_{n,k=1}^∞ covers A = ⋃_{n=1}^∞ A_n, and

\[ \mu^*(A) \le \sum_{k,n=1}^{\infty} \mu_0(E_n^k) \le \sum_{n=1}^{\infty} \bigl( \mu^*(A_n) + \varepsilon 2^{-n} \bigr) = \varepsilon + \sum_{n=1}^{\infty} \mu^*(A_n). \tag{3.4} \]

Since ε > 0 was arbitrary, we conclude µ∗(A) ≤ Σ_{n=1}^∞ µ∗(A_n).

Example 3.5 (Lebesgue outer measure). Let E ⊂ 2^R be the collection of all open intervals (a, b) ⊂ R, with −∞ < a < b < +∞, together with ∅ and R. Define µ0((a, b)) = b − a, the length of the interval, and µ0(∅) = 0, µ0(R) = +∞. For any subset A ⊂ R, its Lebesgue outer measure m∗(A) is defined as in (3.2):

\[ m^*(A) = \inf\Big\{ \sum_{n=1}^{\infty} (b_n - a_n) : A \subset \bigcup_{n=1}^{\infty} (a_n, b_n) \Big\} \tag{3.5} \]

where we allow the “degenerate” interval R = (−∞, +∞). In the next section we will construct Lebesgue measure from m∗ via the Caratheodory Extension Theorem; the main issues will be to show that the outer measure of an interval is equal to its length, and that every Borel subset of R is outer measurable. The other desirable properties of Lebesgue measure (such as translation invariance) will follow from this construction.
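On a finite set, both the infimum in (3.2) and the measurability condition (3.1) can be evaluated by exhaustive search, which gives a feel for how few sets may turn out to be outer measurable. The sketch below makes two assumptions: X is finite, and, since µ0 ≥ 0, the infimum over countable covers is attained by a finite subfamily of E without repetitions. The toy premeasure `mu0` and the function names are illustrative only.

```python
from itertools import combinations

def subfamilies(items):
    """All finite subfamilies (as tuples) of a finite collection."""
    items = list(items)
    for r in range(len(items) + 1):
        yield from combinations(items, r)

def outer_measure(mu0):
    """mu0 maps frozensets (the collection E) to values; returns mu* as in (3.2)."""
    def mu_star(A):
        A = frozenset(A)
        return min(sum(mu0[E] for E in fam)
                   for fam in subfamilies(mu0)
                   if A <= frozenset().union(*fam))
    return mu_star

def measurable_sets(X, mu_star):
    """All E ⊂ X satisfying the Caratheodory condition (3.1) for every A ⊂ X."""
    subsets = [frozenset(c) for c in subfamilies(X)]
    return {E for E in subsets
            if all(mu_star(A) == mu_star(A & E) + mu_star(A - E) for A in subsets)}

X = frozenset({1, 2, 3})
mu0 = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 1, X: 5}
mu_star = outer_measure(mu0)
measurable = measurable_sets(X, mu_star)
```

For this choice of µ0, the outer measurable sets are exactly ∅, {1}, {2, 3} and X: the set {2} fails (3.1) with A = {2, 3}, because µ∗({2}) + µ∗({3}) = 2 while µ∗({2, 3}) = 1. Note also that µ∗(X) = 2 although µ0(X) = 5, so µ∗ need not agree with µ0 on E without further hypotheses.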
Before proving Theorem 3.3 we make one observation, which will be used repeatedly: to prove that a set E is outer measurable for a given outer measure µ∗, it suffices to prove that

µ∗(A) ≥ µ∗(A ∩ E) + µ∗(A \ E) (3.6)

for all A ⊂ X, since the opposite inequality for all A is immediate from the subadditivity of µ∗. We also need one lemma, which will be used to obtain completeness. A set E ⊂ X is called µ∗-null if µ∗(E) = 0.

Lemma 3.6. Every µ∗-null set is µ∗-measurable.

Proof. Let E be µ∗-null and A ⊂ X. By monotonicity, A ∩ E is also µ∗-null, so by monotonicity again,

µ∗(A) ≥ µ∗(A \ E) = µ∗(A ∩ E) + µ∗(A \ E) (3.7)

so the lemma follows from the observation immediately above.

Proof of Theorem 3.3. We first show that M is a σ-algebra. It is immediate from Definition 3.2 that M contains ∅ and X, and since (3.1) is symmetric with respect to E and E^c, M is also closed under complementation. Next we check that M is closed under finite unions (which will prove that M is a Boolean algebra). So, let E, F ∈ M and fix an arbitrary A ⊂ X. To unclutter the notation, partition A into four disjoint sets

A_00 = A \ (E ∪ F), A_01 = (A \ E) ∩ F, A_10 = (A \ F) ∩ E, A_11 = A ∩ E ∩ F. (3.8)

(Draw a Venn diagram to see what is going on.) In this notation, to prove that E ∪ F is outer measurable we must prove that

µ∗(A_00 ∪ A_01 ∪ A_10 ∪ A_11) = µ∗(A_01 ∪ A_10 ∪ A_11) + µ∗(A_00). (3.9)

Since E is outer measurable, we have

µ∗(A_00 ∪ A_01 ∪ A_10 ∪ A_11) = µ∗(A_00 ∪ A_01) + µ∗(A_10 ∪ A_11) (3.10)

and

µ∗(A_01 ∪ A_10 ∪ A_11) = µ∗(A_01) + µ∗(A_10 ∪ A_11). (3.11)

Similarly, by the outer measurability of F we have

µ∗(A_00 ∪ A_01) = µ∗(A_00) + µ∗(A_01). (3.12)

Combining the last three equations gives (3.9). By induction, M is closed under finite unions.

Now we show that M is closed under countable disjoint unions. (It then follows from Proposition 1.3 that M is a σ-algebra.) Let (E_n) be a sequence of disjoint outer measurable sets, and let A ⊂ X. We must show that

\[ \mu^*(A) = \mu^*\Big( A \cap \bigcup_{n=1}^{\infty} E_n \Big) + \mu^*\Big( A \setminus \bigcup_{n=1}^{\infty} E_n \Big). \tag{3.13} \]

As noted above, we only need to prove that

\[ \mu^*(A) \ge \mu^*\Big( A \cap \bigcup_{n=1}^{\infty} E_n \Big) + \mu^*\Big( A \setminus \bigcup_{n=1}^{\infty} E_n \Big), \tag{3.14} \]

since the opposite inequality follows from the subadditivity of µ∗. For each N ≥ 1, we have already proved that ⋃_{n=1}^N E_n is outer measurable, and therefore

\[ \mu^*(A) \ge \mu^*\Big( A \cap \bigcup_{n=1}^{N} E_n \Big) + \mu^*\Big( A \setminus \bigcup_{n=1}^{N} E_n \Big). \tag{3.15} \]

By monotonicity, we also know that µ∗(A \ ⋃_{n=1}^N E_n) ≥ µ∗(A \ ⋃_{n=1}^∞ E_n), so it suffices to prove that

\[ \mu^*\Big( A \cap \bigcup_{n=1}^{\infty} E_n \Big) \le \lim_{N\to\infty} \mu^*\Big( A \cap \bigcup_{n=1}^{N} E_n \Big). \tag{3.16} \]

By the outer measurability of ⋃_{n=1}^N E_n,

\[ \mu^*\Big( A \cap \bigcup_{n=1}^{N+1} E_n \Big) = \mu^*\Big( A \cap \bigcup_{n=1}^{N} E_n \Big) + \mu^*(A \cap E_{N+1}) \tag{3.17} \]

for all N ≥ 1 (the disjointness of the E_n was used here). Iterating this identity, we get

\[ \mu^*\Big( A \cap \bigcup_{n=1}^{N+1} E_n \Big) = \sum_{k=0}^{N} \mu^*(A \cap E_{k+1}) \tag{3.18} \]

and taking limits,

\[ \lim_{N\to\infty} \mu^*\Big( A \cap \bigcup_{n=1}^{N} E_n \Big) = \sum_{N=0}^{\infty} \mu^*(A \cap E_{N+1}). \tag{3.19} \]

But also, by countable subadditivity, we have

\[ \mu^*\Big( A \cap \bigcup_{n=1}^{\infty} E_n \Big) \le \sum_{N=0}^{\infty} \mu^*(A \cap E_{N+1}), \tag{3.20} \]

which proves (3.16), and thus M is a σ-algebra.

To prove that µ∗ is a measure on M, note that it is immediate that µ∗(∅) = 0. To prove countable additivity, suppose that (E_n) is a disjoint sequence of outer measurable sets. It follows from outer measurability that for every N ≥ 1,

\[ \mu^*\Big( \bigcup_{n=1}^{N+1} E_n \Big) = \mu^*\Big( \bigcup_{n=1}^{N} E_n \Big) + \mu^*(E_{N+1}). \tag{3.21} \]

By iterating this we see that µ∗ is finitely additive, that is

\[ \mu^*\Big( \bigcup_{n=1}^{N} E_n \Big) = \sum_{n=1}^{N} \mu^*(E_n). \tag{3.22} \]

By monotonicity of µ∗, the left-hand side is at most µ∗(⋃_{n=1}^∞ E_n) for every N, so taking limits we get

\[ \mu^*\Big( \bigcup_{n=1}^{\infty} E_n \Big) \ge \sum_{n=1}^{\infty} \mu^*(E_n). \tag{3.23} \]

But the reverse inequality is immediate by the countable subadditivity of µ∗, so we are done. Finally, that µ∗ is a complete measure on M is an immediate consequence of Lemma 3.6.

4. Construction of Lebesgue measure

In this section, by an interval we mean any set I ⊂ R of the form (a, b), [a, b], (a, b], [a, b), including ∅, open and closed half-lines, and R itself. We write |I| = b − a for the length of I, interpreted as +∞ in the line and half-line cases and 0 for ∅. Recall the definition of Lebesgue outer measure of a set A ⊂ R from Example 3.5:

\[ m^*(A) = \inf\Big\{ \sum_{n=1}^{\infty} |I_n| : A \subset \bigcup_{n=1}^{\infty} I_n \Big\} \tag{4.1} \]

where the I_n are open intervals, or empty.

Theorem 4.1. If I ⊂ R is an interval, then m∗(I) = |I|.

Proof. We first consider the case where I is a finite, closed interval [a, b]. For any ε > 0, the single open interval (a − ε, b + ε) covers I, so m∗(I) ≤ (b − a) + 2ε = |I| + 2ε, and thus m∗(I) ≤ |I|. For the reverse inequality, again choose ε > 0, and let (I_n) be a cover of I by open intervals such that Σ_{n=1}^∞ |I_n| < m∗(I) + ε. Since I is compact, there is a finite subcollection (I_{n_k})_{k=1}^N of the I_n which covers I. Then

\[ \sum_{k=1}^{N} |I_{n_k}| > b - a = |I|. \tag{4.2} \]

To see this, observe that by passing to a further subcollection, we can assume that none of the intervals I_{n_k} is contained in another one. Then re-index them as I_1, ..., I_N, with I_k = (a_k, b_k), so that the left endpoints a_1, ..., a_N are listed in increasing order. Since these intervals cover I, and there are no containments, it follows that a_2 < b_1, a_3 < b_2, ..., a_N < b_{N−1}. (Draw a picture.) Therefore

\[ \sum_{k=1}^{N} |I_k| = \sum_{k=1}^{N} (b_k - a_k) = b_N - a_1 + \sum_{k=1}^{N-1} (b_k - a_{k+1}) \ge b_N - a_1 > b - a = |I|. \tag{4.3} \]

From the inequality (4.2) we conclude that |I| < m∗(I) + ε, and since ε was arbitrary we have proved m∗(I) = |I|.

Now we consider the cases of bounded, but not closed, intervals (a, b), (a, b], [a, b). If I is such an interval and Ī = [a, b] its closure, then since m∗ is an outer measure we have by monotonicity m∗(I) ≤ m∗(Ī) = |I|. On the other hand, for all ε > 0 sufficiently small we have I_ε := [a + ε, b − ε] ⊂ I, so by monotonicity again m∗(I) ≥ m∗(I_ε) = |I| − 2ε, and letting ε → 0 we get m∗(I) ≥ |I|.

Finally, the result is immediate in the case of unbounded intervals, since any unbounded interval contains arbitrarily large bounded intervals.

Theorem 4.2. Every Borel set E ∈ B_R is m∗-measurable.

Proof. By the Caratheodory extension theorem, the collection of m∗-measurable sets is a σ-algebra, so by Propositions 1.8 and 1.5, it suffices to show that the open rays (a, +∞) are m∗-measurable. Fix a ∈ R and an arbitrary set A ⊂ R. We must prove

m∗(A) ≥ m∗(A ∩ (a, +∞)) + m∗(A ∩ (−∞, a]). (4.4)

To simplify the notation put A_1 = A ∩ (a, +∞), A_2 = A ∩ (−∞, a]. Let (I_n) be a cover of A by open intervals. For each n let I'_n = I_n ∩ (a, +∞) and I''_n = I_n ∩ (−∞, a]. The families (I'_n), (I''_n) consist of intervals (not necessarily open) and cover A_1, A_2 respectively. Now

\[ \sum_{n=1}^{\infty} |I_n| = \sum_{n=1}^{\infty} |I'_n| + \sum_{n=1}^{\infty} |I''_n| \tag{4.5} \]
\[ = \sum_{n=1}^{\infty} m^*(I'_n) + \sum_{n=1}^{\infty} m^*(I''_n) \quad \text{(by Theorem 4.1)} \tag{4.6} \]
\[ \ge m^*\Big( \bigcup_{n=1}^{\infty} I'_n \Big) + m^*\Big( \bigcup_{n=1}^{\infty} I''_n \Big) \quad \text{(by subadditivity)} \tag{4.7} \]
\[ \ge m^*(A_1) + m^*(A_2) \quad \text{(by monotonicity)}. \tag{4.8} \]

Since this holds for all coverings of A by open intervals, taking the infimum we conclude m∗(A) ≥ m∗(A_1) + m∗(A_2).

Definition 4.3. A set E ⊂ R is called Lebesgue measurable if

m∗(A) = m∗(A ∩ E) + m∗(A ∩ E^c) (4.9)

for all A ⊂ R. The restriction of m∗ to the Lebesgue measurable sets is called Lebesgue measure, denoted m.

By Theorem 3.3, m is a measure; by Theorem 4.2, every Borel set is Lebesgue measurable; and by Theorem 4.1 the Lebesgue measure of an interval is its length. It should also be evident by now that m is σ-finite. So, we have arrived at the promised extension of the length function on intervals to a measure. (We have not yet proved anything about the uniqueness of m; this will have to wait until we have proved the Hahn Uniqueness Theorem. See Corollary 5.5.)

Next we prove that m has the desired invariance properties. Given E ⊂ R, x ∈ R, and t > 0, let

E + x = {y ∈ R : y − x ∈ E}, −E = {y ∈ R : −y ∈ E}, and tE = {y ∈ R : y/t ∈ E}. (4.10)

Theorem 4.4. If E ⊂ R is Lebesgue measurable, x ∈ R, and t > 0, then the sets E + x, −E, and tE are Lebesgue measurable. Moreover m(E + x) = m(E), m(−E) = m(E), and m(tE) = t m(E).

Proof. We give the proof for E + x; the others are similar and left as exercises. First, it is immediate that for any interval I, |I + x| = |I|. Since the map (I_n) → (I_n + x) is a bijection between coverings of A and coverings of A + x, it follows that m∗(A) = m∗(A + x) for every set A ⊂ R. Finally, observe that A ∩ (E + x) = ((A − x) ∩ E) + x and A ∩ (E + x)^c = ((A − x) ∩ E^c) + x, so if E is Lebesgue measurable, then for any A ⊂ R

m∗(A) = m∗(A − x) (4.11)
= m∗((A − x) ∩ E) + m∗((A − x) ∩ E^c) (4.12)
= m∗(A ∩ (E + x)) + m∗(A ∩ (E + x)^c). (4.13)

Thus E + x is Lebesgue measurable, and m(E + x) = m∗(E + x) = m∗(E) = m(E).

The condition (4.9) does not make clear which subsets of R are Lebesgue measurable. We next prove two fundamental approximation theorems, which say that 1) if we are willing to ignore sets of measure zero, then every Lebesgue measurable set is a Gδ or Fσ, and 2) if we are willing to ignore sets of measure ε, then every set of finite Lebesgue measure is a finite union of intervals. (Recall that a set in a topological space is called a Gδ-set if it is a countable intersection of open sets, and an Fσ-set if it is a countable union of closed sets.)
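The invariance properties of Theorem 4.4 can be sanity-checked on finite unions of bounded intervals, whose Lebesgue measure is computable by merging overlaps (this uses Theorem 4.1 and finite additivity). The sketch below is a numerical illustration under that restriction, not a proof; intervals are represented by endpoint pairs, endpoints are ignored since single points have measure zero, and all function names are illustrative.

```python
from fractions import Fraction

def length_of_union(intervals):
    """Lebesgue measure of a finite union of bounded intervals given as (a, b) pairs."""
    total, right = 0, None
    for a, b in sorted(intervals):
        if b <= a:
            continue                 # empty or degenerate interval
        if right is None or a > right:
            total += b - a           # starts a new connected component
            right = b
        elif b > right:
            total += b - right       # extends the current component
            right = b
    return total

def translate(intervals, x):
    return [(a + x, b + x) for a, b in intervals]

def reflect(intervals):
    return [(-b, -a) for a, b in intervals]

def dilate(intervals, t):
    return [(t * a, t * b) for a, b in intervals]

E = [(0, 2), (1, 3), (5, 6)]         # the union (0, 3) ∪ (5, 6), of measure 4
x, t = Fraction(7, 3), Fraction(5, 2)
```

With these values, `E`, `translate(E, x)` and `reflect(E)` all have measure 4, while `dilate(E, t)` has measure t · 4 = 10.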

Theorem 4.5. Let E ⊂ R. The following are equivalent:
a) E is Lebesgue measurable.
b) For every ε > 0, there is an open set U ⊃ E such that m∗(U \ E) < ε.
c) For every ε > 0, there is a closed set F ⊂ E such that m∗(E \ F) < ε.
d) There is a Gδ set G such that E ⊂ G and m∗(G \ E) = 0.
e) There is an Fσ set F such that E ⊃ F and m∗(E \ F) = 0.

Proof. The equivalence of (a) with (c) and (e) follows from its equivalence with (b) and (d), applied to E^c, together with the facts that E is Lebesgue measurable if and only if E^c is, and that the complement of an open or Gδ set is a closed or Fσ set respectively. The details are left as an exercise. It therefore suffices to prove (a) =⇒ (b) =⇒ (d) =⇒ (a).

(a) =⇒ (b): Let E be measurable, and suppose for the moment that m(E) < ∞. Let ε > 0. By the definition of m∗ there is a covering of E by open intervals I_n such that Σ_{n=1}^∞ |I_n| < m(E) + ε. Put U = ⋃_{n=1}^∞ I_n. By subadditivity of m,

\[ m(U) \le \sum_{n=1}^{\infty} m(I_n) = \sum_{n=1}^{\infty} |I_n| < m(E) + \varepsilon. \tag{4.14} \]

Since U ⊃ E and m(E) < ∞, we conclude (from the additivity of m) that m∗(U \ E) = m(U \ E) = m(U) − m(E) < ε. To remove the finiteness assumption on E, we apply the ε/2^n trick: for each n ∈ Z let E_n = E ∩ (n, n + 1]. The E_n are disjoint measurable sets whose union is E, and m(E_n) < ∞ for all n. For each n, by the first part of the proof we can pick an open set U_n ⊃ E_n so that m(U_n \ E_n) < ε/2^{|n|}. Let U be the union of the U_n; then U is open and U \ E ⊂ ⋃_{n∈Z} (U_n \ E_n). From the subadditivity of m, we get m∗(U \ E) = m(U \ E) < Σ_{n∈Z} ε 2^{−|n|} = 3ε.

(b) =⇒ (d): Let E ⊂ R be given and for each n ≥ 1 choose (using (b)) an open set U_n ⊃ E such that m∗(U_n \ E) < 1/n. Put G = ⋂_{n=1}^∞ U_n; then G is a Gδ containing E, and G \ E ⊂ U_n \ E for every n. By monotonicity of m∗ we see that m∗(G \ E) < 1/n for every n, and thus m∗(G \ E) = 0. (Note that in this portion of the proof we cannot (and do not!) assume E is measurable.)

(d) =⇒ (a): Since G is a Gδ, it is a Borel set and hence Lebesgue measurable by Theorem 4.2. By Lemma 3.6, every m∗-null set is Lebesgue measurable, so G \ E, and therefore also E = G \ (G \ E), is Lebesgue measurable.

Remark 4.6. Some books take the condition in Theorem 4.5(b) as the definition of Lebesgue measurability. It is possible to prove from this (without using Caratheodory extension) that the collection of sets with this property is a σ-algebra, and the restriction of m∗ to this σ-algebra is a measure. See Problem 7.14.

Theorem 4.7. If E is Lebesgue measurable and m(E) < ∞, then for each ε > 0 there exists a set A which is a finite union of intervals such that m(E∆A) < ε.

Proof. Let $(I_n)$ be a covering of $E$ by open intervals such that
$$\sum_{n=1}^\infty |I_n| < m(E) + \epsilon/2. \tag{4.15}$$
Since the sum is finite there exists an integer $N$ so that
$$\sum_{n=N+1}^\infty |I_n| < \epsilon/2. \tag{4.16}$$
Let $U = \bigcup_{n=1}^\infty I_n$ and $A = \bigcup_{n=1}^N I_n$. Then $A \setminus E \subset U \setminus E$, so $m(A \setminus E) \le m(U) - m(E) < \epsilon/2$ by (4.15). Similarly $E \setminus A \subset U \setminus A \subset \bigcup_{n=N+1}^\infty I_n$, so $m(E \setminus A) < \epsilon/2$ by (4.16). Therefore $m(E \Delta A) < \epsilon$.

Thus, while the “typical” measurable set can be quite complicated in the set-theoretic sense (i.e. in terms of the Borel hierarchy), for most questions in analysis this complexity is irrelevant. In fact, Theorem 4.7 is the precise expression of a useful heuristic:

Littlewood’s First Principle of Analysis: Every measurable set E ⊂ R with m(E) < ∞ is almost a ﬁnite union of intervals.
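As a small numerical illustration of Theorem 4.7, consider the hypothetical set $E = \bigcup_{n \ge 1} (n, n + 2^{-n})$, which is not taken from the notes. It has $m(E) = 1$ and is not a finite union of intervals, but the union $A_N$ of its first $N$ intervals satisfies $m(E \Delta A_N) = 2^{-N}$. The sketch below (the function name is an assumption made for illustration) finds the smallest such $N$ for a given $\epsilon$ using exact rational arithmetic.

```python
from fractions import Fraction


def intervals_needed(eps):
    """Smallest N with m(E symmetric-difference A_N) = 2**-N < eps.

    Here E is the union over n >= 1 of the intervals (n, n + 2**-n) and
    A_N is the union of the first N of them (an illustrative example).
    """
    n = 0
    tail = Fraction(1)  # m(E \ A_0) = m(E) = 1
    while tail >= eps:
        n += 1
        tail /= 2  # removing the n-th interval leaves a tail of measure 2**-n
    return n, tail


n, err = intervals_needed(Fraction(1, 100))
print(n, err)
```

For $\epsilon = 1/100$ this reports that seven intervals suffice, with error $1/128$.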

Theorem 4.8. If $E \subset \mathbb{R}$ is Lebesgue measurable, then
$$m(E) = \inf\{m(U) : U \supset E \text{ and } U \text{ is open}\} = \sup\{m(K) : K \subset E \text{ and } K \text{ is compact}\}.$$
That is, $m$ is a regular Borel measure.

Proof. The first equality follows from monotonicity in the case $m(E) = +\infty$, and from Theorem 4.5(b) (together with the additivity of $m$) in the case $m(E) < \infty$. For the second equality, let $\nu(E)$ be the value of the supremum on the right-hand side. By monotonicity $m(E) \ge \nu(E)$. For the reverse inequality, first assume $m(E) < \infty$ and let $\epsilon > 0$. By Theorem 4.5(c), there is a closed subset $F \subset E$ with $m(E \setminus F) < \epsilon/2$. Since $m(E) < \infty$, we have by additivity $m(E) < m(F) + \epsilon/2$, so $m(F) > m(E) - \epsilon/2$. However this $F$ need not be compact. To fix this, for each $n \ge 1$ let $K_n = F \cap [-n, n]$. Then the $K_n$ are an increasing sequence of compact sets whose union is $F$. By monotone convergence for sets (Theorem 2.3(c)), there is an $n$ so that $m(K_n) > m(F) - \epsilon/2$. It follows that $m(K_n) > m(E) - \epsilon$, and thus $\nu(E) \ge m(E)$. The case $m(E) = +\infty$ is left as an exercise.

Remark 4.9. In the preceding theorem, for any set $E \subset \mathbb{R}$ (not necessarily measurable) the infimum of $m(U) = m^*(U)$ over open sets $U \supset E$ defines the Lebesgue outer measure of $E$. One can also define the Lebesgue inner measure of a set $E \subset \mathbb{R}$ as
$$m_*(E) = \sup\{m^*(K) : K \subset E \text{ and } K \text{ is compact}\}. \tag{4.17}$$
By monotonicity we have $m_*(E) \le m^*(E)$ for all $E \subset \mathbb{R}$. One can then prove that if $E \subset \mathbb{R}$ and $m^*(E) < \infty$, then $E$ is Lebesgue measurable if and only if $m^*(E) = m_*(E)$, in which case this quantity is equal to $m(E)$. You are asked to prove this in Problem 7.15. (Some care must be taken in trying to use this definition if $m^*(E) = +\infty$. Why?)

Example 4.10 (The Cantor set). Recall the usual construction of the “middle thirds” Cantor set. Let $E_0$ denote the unit interval $[0, 1]$. Obtain $E_1$ from $E_0$ by deleting the middle third (open) subinterval of $E_0$, so $E_1 = [0, \frac{1}{3}] \cup [\frac{2}{3}, 1]$. Continue inductively; at the $n$th step delete the middle thirds of all the intervals present at that step. So, $E_n$ is a union of $2^n$ closed intervals of length $3^{-n}$. The Cantor set is defined as the intersection $C = \bigcap_{n=0}^\infty E_n$. It is well-known (though not obvious) that $C$ is uncountable; we do not prove this here. It is clear that $C$ is a closed set (hence Borel) but contains no intervals, since if $J$ is an interval of length $\ell$ and $n$ is chosen so that $3^{-n} < \ell$, then $J \not\subset E_n$ and thus $J \not\subset C$. The Lebesgue measure of $E_n$ is $(2/3)^n$, which goes to $0$ as $n \to \infty$, and thus by monotonicity $m(C) = 0$. So, $C$ is an example of an uncountable, closed set of measure $0$. Another way to see this is that at the $n$th stage ($n \ge 1$) we have deleted a collection of $2^{n-1}$ disjoint open intervals, each of length $3^{-n}$. Thus the Lebesgue measure of $[0, 1] \setminus C$ is
$$\sum_{n=1}^\infty 2^{n-1} 3^{-n} = \frac{1}{2} \cdot \frac{2/3}{1 - 2/3} = 1. \tag{4.18}$$
Thus $m(C) = 0$.

Example 4.11 (Fat Cantor sets). The standard construction of the Cantor set can be modified in the following way: fix a number $0 < c < 1$. Imitate the construction of the Cantor set, except at the $n$th stage delete, from each interval $I$ present at that stage, an open interval centered at the midpoint of $I$ of length $3^{-n} c$. (In the previous construction we had $c = 1$.) Again at each stage we have a set $E_n$ which is a union of $2^n$ closed intervals; let $F = \bigcap_{n=0}^\infty E_n$. One can prove (in much the same way as for $C$) that 1) $F$ is an uncountable, closed set; 2) $F$ contains no intervals; and 3) $m(F) = 1 - c > 0$. Thus, $F$ is a closed set of positive measure, but which contains no intervals.

Example 4.12 (A Lebesgue non-measurable set). Define an equivalence relation on $\mathbb{R}$ by declaring $x \sim y$ if and only if $x - y \in \mathbb{Q}$.
This relation partitions $\mathbb{R}$ into disjoint equivalence classes whose union is $\mathbb{R}$. In particular, for each $x \in \mathbb{R}$ its equivalence class is the set $\{x + q : q \in \mathbb{Q}\}$. Since $\mathbb{Q}$ is dense in $\mathbb{R}$, each equivalence class $C$ contains an element of the closed interval $[0, 1]$. By the axiom of choice, there is a set $E \subset [0, 1]$ that contains exactly one member $x_C$ from each class. We claim the set $E$ is not Lebesgue measurable. To see this, let $y \in [0, 1]$. Then $y$ belongs to some equivalence class $C$, and hence $y$ differs from $x_C$ by some rational number in the interval $[-1, 1]$. This means that
$$[0, 1] \subset \bigcup_{q \in [-1,1] \cap \mathbb{Q}} (E + q). \tag{4.19}$$
On the other hand, since $E \subset [0, 1]$ and $|q| \le 1$ we see that
$$\bigcup_{q \in [-1,1] \cap \mathbb{Q}} (E + q) \subset [-1, 2]. \tag{4.20}$$
Finally, by the construction of $E$ the sets $E + p$ and $E + q$ are disjoint if $p, q$ are distinct rationals. So if $E$ were measurable, then the sets $E + q$ would be also, and we would have by the countable additivity and monotonicity of $m$
$$1 \le \sum_{q \in [-1,1] \cap \mathbb{Q}} m(E + q) \le 3. \tag{4.21}$$
But by translation invariance, all of the $m(E + q)$ must be equal, so the sum is either $0$ or $+\infty$, which is a contradiction.

Remark: The above construction can be modified to show that if $F$ is any Lebesgue set with $m(F) > 0$, then $F$ contains a nonmeasurable subset. See Problem 7.28.
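The series computations behind Examples 4.10 and 4.11 can be checked with exact rational arithmetic. The following is only a sketch: the function name and the truncation at finitely many stages are assumptions made for illustration. It sums the lengths deleted in the first `stages` steps, where step $n$ removes $2^{n-1}$ open intervals of length $c \cdot 3^{-n}$.

```python
from fractions import Fraction


def removed_measure(c, stages):
    """Total length deleted in the first `stages` steps of the construction.

    At step n, 2**(n-1) disjoint open intervals of length c * 3**-n are
    removed; c = 1 gives the middle-thirds Cantor set.
    """
    c = Fraction(c)
    return sum(2 ** (n - 1) * c / Fraction(3) ** n for n in range(1, stages + 1))


for c in (Fraction(1), Fraction(1, 2)):
    # The partial sums equal c * (1 - (2/3)**stages), which tends to c.
    print(c, float(removed_measure(c, 40)))
```

The partial sums approach $c$, consistent with $m(C) = 0$ when $c = 1$ and $m(F) = 1 - c$ for the fat Cantor set.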

5. Premeasures and the Hahn-Kolmogorov Theorem

Definition 5.1. Let $\mathcal{A} \subset 2^X$ be a Boolean algebra. A premeasure on $\mathcal{A}$ is a function $\mu_0 : \mathcal{A} \to [0, +\infty]$ satisfying
• $\mu_0(\emptyset) = 0$
• If $\{A_j\}_{j=1}^\infty$ is a sequence of disjoint sets in $\mathcal{A}$ and $\bigcup_{j=1}^\infty A_j \in \mathcal{A}$, then
$$\mu_0\Big(\bigcup_{j=1}^\infty A_j\Big) = \sum_{j=1}^\infty \mu_0(A_j).$$
Finiteness and σ-finiteness are defined for premeasures in the same way as for measures.

Example 5.2. By an h-interval we mean a (finite or infinite) interval of the form $(a, b]$. (By convention $(a, +\infty)$ is an h-interval but $(-\infty, b)$ is not.) The collection $\mathcal{A} \subset 2^{\mathbb{R}}$ of finite unions of h-intervals is a Boolean algebra, and if $I \in \mathcal{A}$ is a disjoint union $I = \bigcup_{j=1}^n (a_j, b_j]$ then the function
$$\mu_0(I) = \sum_{j=1}^n (b_j - a_j) \tag{5.1}$$
is a premeasure on $\mathcal{A}$. (Warning: this is immediate if we have already constructed Lebesgue measure, but it is not as obvious as it seems to prove from scratch: there are many different ways to decompose a given $I$ as a countable disjoint union of h-intervals, so verifying the countable additivity condition is somewhat delicate. See Section 6.)

The only case of the proposition we will really care about is when E is an algebra and µ0 is a premeasure.

Theorem 5.3 (Hahn-Kolmogorov Theorem). If $\mu_0$ is a premeasure on a Boolean algebra $\mathcal{A} \subset 2^X$ and $\mu^*$ is the outer measure on $X$ defined by (3.2), then every set $A \in \mathcal{A}$ is outer measurable and $\mu^*|_{\mathcal{A}} = \mu_0$. In particular every premeasure $\mu_0 : \mathcal{A} \to [0, +\infty]$ on a Boolean algebra $\mathcal{A}$ can be extended to a measure $\mu : \mathcal{B} \to [0, +\infty]$ on a σ-algebra $\mathcal{B} \supset \mathcal{A}$.

Proof. If $\mathcal{A} \subset 2^X$ is a Boolean algebra and $\mu_0 : \mathcal{A} \to [0, +\infty]$ is a premeasure, then $\mu_0$ determines an outer measure $\mu^*$ by Proposition 3.4. We will prove that 1) $\mu^*|_{\mathcal{A}} = \mu_0$, and 2) every set in $\mathcal{A}$ is outer measurable. The theorem then follows from Theorem 3.3.

To prove 1), let $E \in \mathcal{A}$. It is immediate that $\mu^*(E) \le \mu_0(E)$, since for a covering of $E$ we can take $A_1 = E$ and $A_j = \emptyset$ for all other $j$. For the reverse inequality, let $(A_j)$ be any covering of $E$ by sets $A_j \in \mathcal{A}$, and define sets $B_n \in \mathcal{A}$ by
$$B_n = E \cap \Big(A_n \setminus \bigcup_{j=1}^{n-1} A_j\Big). \tag{5.2}$$
The $B_n$ are disjoint sets in $\mathcal{A}$ whose union is $E$, and $B_n \subset A_n$ for all $n$. Thus by the countable additivity and monotonicity of $\mu_0$,
$$\mu_0(E) = \sum_{n=1}^\infty \mu_0(B_n) \le \sum_{n=1}^\infty \mu_0(A_n). \tag{5.3}$$
Since the covering was arbitrary, we get $\mu_0(E) \le \mu^*(E)$.

For 2), let $A \in \mathcal{A}$, $E \subset X$, and $\epsilon > 0$. Then there exists a sequence of sets $(B_j) \subset \mathcal{A}$ such that $E \subset \bigcup_{j=1}^\infty B_j$ and $\sum_{j=1}^\infty \mu_0(B_j) \le \mu^*(E) + \epsilon$. By additivity and monotonicity of $\mu_0$ again, we have
$$\mu^*(E) + \epsilon \ge \sum_{j=1}^\infty \mu_0(B_j \cap A) + \sum_{j=1}^\infty \mu_0(B_j \cap A^c) \tag{5.4}$$
$$\ge \mu^*(E \cap A) + \mu^*(E \cap A^c). \tag{5.5}$$

Since $\epsilon$ was arbitrary, we are done.

We will refer to the measure $\mu$ constructed in Theorem 5.3 as the Hahn-Kolmogorov extension of the premeasure $\mu_0$. The relationship between premeasures, outer measures, and measures in this construction is summarized in the following table:

                 domain                     additivity condition                  bear
premeasure       Boolean algebra A          countably additive, when possible     mama
outer measure    all of 2^X                 monotone, countably subadditive       papa
measure          σ-algebra containing A     countably additive                    baby

The premeasure $\mu_0$ has the right additivity properties, but is defined on too few subsets of $X$ to be useful. The corresponding outer measure $\mu^*$ constructed in Proposition 3.4 is defined on all of $2^X$, but we cannot guarantee countable additivity. By Theorem 5.3, restricting $\mu^*$ to the σ-algebra of outer measurable sets is “just right.”

We have established that every premeasure $\mu_0$ on an algebra $\mathcal{A}$ can be extended to a measure on the σ-algebra generated by $\mathcal{A}$. The next theorem addresses the uniqueness of this extension:

Theorem 5.4 (Hahn uniqueness theorem). Given any σ-finite premeasure $\mu_0$ on a Boolean algebra $\mathcal{A}$, there is a unique extension of $\mu_0$ to a measure $\mu$ on the σ-algebra generated by $\mathcal{A}$.

Proof. Let $\mathcal{M}$ be the σ-algebra generated by $\mathcal{A}$, let $\mu$ denote the Hahn-Kolmogorov extension of $\mu_0$, but restricted to $\mathcal{M}$, and let $\mu'$ be any other extension of $\mu_0$ to $\mathcal{M}$. To prove that $\mu = \mu'$, we first show that $\mu'(E) \le \mu(E)$ for all $E \in \mathcal{M}$. Let $E \in \mathcal{M}$ and let $(A_n)$ be a sequence in $\mathcal{A}$ such that $E \subset \bigcup_{n=1}^\infty A_n$. Then
$$\mu'(E) \le \sum_{n=1}^\infty \mu'(A_n) = \sum_{n=1}^\infty \mu_0(A_n). \tag{5.6}$$
Taking the infimum over all such coverings of $E$, it follows that $\mu'(E) \le \mu(E)$. (Recall the definition of $\mu$.) Also notice that by monotone convergence for sets, if $A = \bigcup_{n=1}^\infty A_n$, then
$$\mu'(A) = \lim_{N \to \infty} \mu'\Big(\bigcup_{n=1}^N A_n\Big) = \lim_{N \to \infty} \mu\Big(\bigcup_{n=1}^N A_n\Big) = \mu(A). \tag{5.7}$$
Next we show that equality holds in the case $\mu(E) < \infty$. In this case, let $\epsilon > 0$ and choose a covering $(A_n)$ of $E$ by sets in $\mathcal{A}$ such that for $A = \bigcup_{n=1}^\infty A_n$, we have $\mu(A) < \mu(E) + \epsilon$, and consequently $\mu(A \setminus E) < \epsilon$. Thus
$$\mu(E) \le \mu(A) = \mu'(A) = \mu'(E) + \mu'(A \setminus E) \tag{5.8}$$
$$\le \mu'(E) + \mu(A \setminus E) \tag{5.9}$$
$$< \mu'(E) + \epsilon. \tag{5.10}$$
Since $\epsilon$ was arbitrary, we conclude $\mu(E) = \mu'(E)$. (Observe that up to this point we have not used the σ-finiteness assumption; we use it only now to move to the $\mu(E) = +\infty$ case.) Now write $X = \bigcup_{n=1}^\infty A_n$ with the $A_n$ mutually disjoint and $\mu_0(A_n) < \infty$ for all $n$. Each $E \cap A_n$ has finite $\mu$-measure, so the previous case applies to it, and for all $E \in \mathcal{M}$
$$\mu(E) = \sum_{n=1}^\infty \mu(E \cap A_n) = \sum_{n=1}^\infty \mu'(E \cap A_n) = \mu'(E). \tag{5.11}$$

Corollary 5.5 (Uniqueness of Lebesgue measure). If $\mu$ is a Borel measure on $\mathbb{R}$ such that $\mu(I) = |I|$ for every interval $I$, then $\mu(E) = m(E)$ for every Borel set $E \subset \mathbb{R}$.

Uniqueness can fail in the non-σ-finite case. An example is outlined in Problem 7.20.
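The disjointification step (5.2) used in the proof of Theorem 5.3 is easy to experiment with on finite sets. The sketch below is an illustration only, with a hypothetical function name and Python `set` objects standing in for members of the algebra. It builds the pieces $B_n$ from a set $E$ and a covering $A_1, A_2, \dots$ of it.

```python
def disjointify(E, cover):
    """Return the sets B_n = E ∩ (A_n minus the union of A_1, ..., A_{n-1})."""
    seen = set()    # union of the A_j processed so far
    pieces = []
    for A in cover:
        pieces.append(E & (A - seen))
        seen |= A
    return pieces


E = {1, 2, 3, 4}
cover = [{1, 2, 5}, {2, 3}, {3, 4, 6}]
print(disjointify(E, cover))
```

The resulting pieces are pairwise disjoint, each $B_n$ is contained in $A_n$, and their union is $E$, which is what (5.3) relies on.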

6. Lebesgue-Stieltjes measures on R

Let $\mu$ be a Borel measure on $\mathbb{R}$. (This means that the domain of $\mu$ contains all Borel sets, though we allow that the domain of $\mu$ may be larger.) Say $\mu$ is locally finite if $\mu(I) < \infty$ for every compact interval $I$. Given a locally finite Borel measure, define a function $F : \mathbb{R} \to \mathbb{R}$ by
$$F(x) = \begin{cases} 0 & \text{if } x = 0, \\ \mu((0, x]) & \text{if } x > 0, \\ -\mu((x, 0]) & \text{if } x < 0. \end{cases} \tag{6.1}$$
It is not hard to show that $F$ is nondecreasing and continuous from the right (that is, $F(a) = \lim_{x \to a^+} F(x)$ for all $a \in \mathbb{R}$). In this section we prove the converse: given any nondecreasing, right-continuous function $F : \mathbb{R} \to \mathbb{R}$, there is a unique locally finite Borel measure $\mu$ such that (6.1) holds. This will be accomplished using the Hahn-Kolmogorov extension theorem.

Let $\mathcal{A} \subset 2^{\mathbb{R}}$ denote the Boolean algebra generated by the half-open intervals $(a, b]$. (We insist that the interval be open on the left and closed on the right; this is only a convention chosen to be compatible with our definition of $F$.) More precisely, $\mathcal{A}$ consists of all finite unions of intervals of the form $(a, b]$ (with $(-\infty, b]$ and $(a, +\infty)$ allowed). Fix a nondecreasing, right-continuous function $F : \mathbb{R} \to \mathbb{R}$. Since $F$ is monotone, the limits $F(+\infty) := \lim_{x \to +\infty} F(x)$ and $F(-\infty) := \lim_{x \to -\infty} F(x)$ exist (possibly $+\infty$ or $-\infty$ respectively). For each interval $I = (a, b]$ in $\mathcal{A}$, we define its $F$-length by

|I|F := F (b) − F (a). (6.2)

Given a set $A \in \mathcal{A}$, we can write it as a disjoint union of intervals $A = \bigcup_{n=1}^N I_n$ with $I_n = (a_n, b_n]$. Define
$$\mu_0(A) = \sum_{n=1}^N |I_n|_F = \sum_{n=1}^N \big(F(b_n) - F(a_n)\big). \tag{6.3}$$
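Formula (6.3) can be tried out directly for finite intervals. The following is a minimal sketch under stated assumptions: the names `mu0`, `cubic` and `heaviside` are hypothetical, intervals are given as pairs `(a, b)` standing for $(a, b]$, and the caller is responsible for passing disjoint intervals.

```python
def mu0(F, intervals):
    """Sum of the F-lengths F(b) - F(a) over disjoint intervals (a, b]."""
    return sum(F(b) - F(a) for a, b in intervals)


def cubic(x):
    """A nondecreasing continuous function, chosen only for illustration."""
    return x ** 3


def heaviside(x):
    """Nondecreasing and right-continuous, with a unit jump at 0."""
    return 1 if x >= 0 else 0


# Two decompositions of (0, 3] give the same value, as well-definedness requires.
print(mu0(cubic, [(0, 3)]), mu0(cubic, [(0, 1), (1, 3)]))
# The jump of the Heaviside function is charged to intervals (a, b] containing 0.
print(mu0(heaviside, [(-1, 0)]), mu0(heaviside, [(0, 1)]))
```

The second pair of values reflects the half-open convention: $0 \in (-1, 0]$ but $0 \notin (0, 1]$.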

Proposition 6.1. The expression (6.3) is a well-deﬁned premeasure on A .

Proof. That $\mu_0$ is well-defined is left as an exercise; note that once this is established it is immediate from the definition that $\mu_0$ is finitely additive. To prove that $\mu_0$ is a premeasure, let $(I_n)$ be a disjoint sequence of intervals in $\mathcal{A}$ and suppose $I = \bigcup_{n=1}^\infty I_n \in \mathcal{A}$. By finite additivity we may assume $I$ is an interval. Then for each $N$,
$$\mu_0(I) = \mu_0\Big(\bigcup_{n=1}^N I_n\Big) + \mu_0\Big(I \setminus \bigcup_{n=1}^N I_n\Big) \ge \mu_0\Big(\bigcup_{n=1}^N I_n\Big) = \sum_{n=1}^N \mu_0(I_n). \tag{6.4}$$

Taking limits, we conclude $\mu_0\big(\bigcup_{n=1}^\infty I_n\big) \ge \sum_{n=1}^\infty \mu_0(I_n)$. For the reverse inequality, we employ a compactness argument similar to the one used in the proof of Theorem ??. However, the situation is more complicated since we are dealing with half-open intervals. The strategy will be to shrink $I$ to a slightly smaller compact interval, and enlarge the $I_n$ to open intervals, using the right-continuity of $F$ and the $\epsilon/2^n$ trick to control their $F$-lengths.

So, first suppose that $I = (a, b]$ is a finite interval and fix $\epsilon > 0$. By right continuity of $F$, there is a $\delta > 0$ such that $F(a + \delta) - F(a) < \epsilon$. Moreover if $I_n = (a_n, b_n]$ then there exist $\delta_n > 0$ such that $F(b_n + \delta_n) - F(b_n) < \epsilon 2^{-n}$. Shrink $I$ to the compact interval $J = [a + \delta, b]$ and enlarge the $I_n$ to open intervals $J_n = (a_n, b_n + \delta_n)$. Then finitely many of the $J_n$ cover $J$, and these may be chosen so that none is contained in another, and they may be relabeled $J_1, \dots, J_N$ so their left endpoints are listed in increasing order. It follows that

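$$\mu_0(I) \le F(b) - F(a + \delta) + \epsilon \tag{6.5}$$
$$\le F(b_N + \delta_N) - F(a_1) + \epsilon \tag{6.6}$$
$$= F(b_N + \delta_N) - F(a_N) + \sum_{j=1}^{N-1} \big(F(a_{j+1}) - F(a_j)\big) + \epsilon \tag{6.7}$$
$$\le F(b_N + \delta_N) - F(a_N) + \sum_{j=1}^{N-1} \big(F(b_j + \delta_j) - F(a_j)\big) + \epsilon \tag{6.8}$$
$$\le \sum_{n=1}^\infty \mu_0(I_n) + 2\epsilon. \tag{6.9}$$
Since $\epsilon$ was arbitrary, we are done in the case $I = (a, b]$ is finite. The infinite cases are left as an exercise.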

By the Hahn-Kolmogorov Theorem, $\mu_0$ extends to a Borel measure $\mu_F$, called the Lebesgue-Stieltjes measure associated to $F$. It is immediate from the definition that $\mu_0$ is σ-finite (each interval $(n, n + 1]$ has finite $F$-length), so the restriction of $\mu_F$ to the Borel σ-algebra is uniquely determined by $F$. In particular we conclude that the case $F(x) = x$ recovers Lebesgue measure.

Examples 6.2. a) (Dirac measure) Define the Heaviside function
$$H(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \tag{6.10}$$

Then for any interval $I = (a, b]$, $\mu_H(I) = 1$ if $0 \in I$ and $0$ otherwise. Since Dirac measure $\delta_0$ is a Borel measure and also has this property, and the intervals $(a, b]$ generate the Borel σ-algebra, it follows from the Hahn Uniqueness Theorem (Theorem 5.4) that $\mu_H(E) = \delta_0(E)$ for all Borel sets $E \subset \mathbb{R}$. More generally, for a finite set $x_1, \dots, x_n$ in $\mathbb{R}$ and positive numbers $c_1, \dots, c_n$, let $F(x) = \sum_{j=1}^n c_j H(x - x_j)$. Then $\mu_F = \sum_{j=1}^n c_j \delta_{x_j}$.

b) (Infinite sums of point masses) Even more generally, if $(x_n)_{n=1}^\infty$ is an infinite sequence in $\mathbb{R}$ and $(c_n)$ is a sequence of positive numbers with $\sum_{n=1}^\infty c_n < \infty$, define $F(x) = \sum_{n=1}^\infty c_n H(x - x_n)$. Then $\mu_F(E) = \sum_{n : x_n \in E} c_n$. A particularly interesting case is … 1, we can form a Lebesgue-Stieltjes measure $\mu_F$ supported on $C$ (that is, $\mu_F(E) = 0$ if $E \cap C = \emptyset$). This measure is called the Cantor measure. It will be an important example of what is called a singular continuous measure on $\mathbb{R}$.

One can prove that the Lebesgue-Stieltjes measures µF have similar regularity properties as Lebesgue measure; since the proofs involve no new ideas they are left as exercises.

Lemma 6.3. Let $\mu_F$ be a Lebesgue-Stieltjes measure. For any Borel set $E \subset \mathbb{R}$,
$$\mu_F(E) = \inf\Big\{\sum_{n=1}^\infty \mu_F((a_n, b_n)) : E \subset \bigcup_{n=1}^\infty (a_n, b_n)\Big\}. \tag{6.11}$$
Proof. Problem 7.25.

Theorem 6.4. Let µF be a Lebesgue-Stieltjes measure. Then for every Borel set E ⊂ R,

$$\mu_F(E) = \inf\{\mu_F(U) : E \subset U,\ U \text{ open}\} \tag{6.12}$$
$$= \sup\{\mu_F(K) : K \subset E,\ K \text{ compact}\}. \tag{6.13}$$
Proof. Problem 7.26.

7. Problems

Problem 7.1. Prove the “exercise” claims in Example 1.4.

Problem 7.2. a) Let $X$ be a set and let $\mathcal{A} = (A_n)_{n=1}^\infty$ be a sequence of disjoint, nonempty subsets whose union is $X$. Prove that the set of all finite or countable unions of members of $\mathcal{A}$ (together with $\emptyset$) is a σ-algebra. (A σ-algebra of this type is called atomic.)

b) Prove that the Borel σ-algebra $\mathcal{B}_{\mathbb{R}}$ is not atomic. (Hint: there exists an uncountable family of mutually disjoint Borel subsets of $\mathbb{R}$.)

Problem 7.3. a) Prove Proposition 1.10. (First prove that every open set $U$ is a union of dyadic intervals. To get disjointness, show that for each point $x \in U$ there is a unique largest dyadic interval $I$ such that $x \in I \subset U$.) b) Prove that the dyadic intervals generate the Borel σ-algebra $\mathcal{B}_{\mathbb{R}}$.

Problem 7.4. Fix an integer $n \ge 1$. Prove that the set of finite unions of dyadic subintervals of $(0, 1]$ of length at most $2^{-n}$ (together with $\emptyset$) is a Boolean algebra.

Problem 7.5. Prove that if $X, Y$ are topological spaces and $f : X \to Y$ is continuous, then $f$ is Borel measurable.

Problem 7.6. Let $(X, \mathcal{M})$ be a measurable space and suppose $\mu : \mathcal{M} \to [0, +\infty]$ is a finitely additive measure which satisfies the condition in part (c) of Theorem 2.3. Prove that $\mu$ is a measure.

Problem 7.7. Prove that an infinite sum of measures is a measure (Example 2.2(d)). (You will need the following fact from elementary analysis: if $(a_{mn})_{m,n=1}^\infty$ is a doubly indexed sequence of nonnegative reals, then $\sum_{n=1}^\infty \sum_{m=1}^\infty a_{mn} = \sum_{m=1}^\infty \sum_{n=1}^\infty a_{mn}$. Prove this by proving that both sums are equal to
$$\sup_F \sum_{(m,n) \in F} a_{mn} \tag{7.1}$$
where the supremum is taken over all finite subsets $F$ of $\mathbb{N} \times \mathbb{N}$.)

Problem 7.8. Let $\mathcal{A}$ be an atomic σ-algebra generated by a partition $(A_n)_{n=1}^\infty$ of a set $X$ (see Problem 7.2).

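a) Fix $n \ge 1$. Prove that the function $\delta_n : \mathcal{A} \to [0, 1]$ defined by
$$\delta_n(A) = \begin{cases} 1 & \text{if } A_n \subset A \\ 0 & \text{if } A_n \not\subset A \end{cases} \tag{7.2}$$
is a measure on $\mathcal{A}$. b) Prove that if $\mu$ is any measure on $(X, \mathcal{A})$, then there exists a unique sequence $(c_n)$ with each $c_n \in [0, +\infty]$ such that
$$\mu(A) = \sum_{n=1}^\infty c_n \delta_n(A) \tag{7.3}$$
for all $A \in \mathcal{A}$.

Problem 7.9. Let $(X, \mathcal{M}, \mu)$ be a measure space. Prove the following: a) If $E, F \in \mathcal{M}$ and $\mu(E \Delta F) = 0$ then $\mu(E) = \mu(F)$. b) Define $E \sim F$ if and only if $\mu(E \Delta F) = 0$; then $\sim$ is an equivalence relation on $\mathcal{M}$. c) For $E, F \in \mathcal{M}$ define $d(E, F) = \mu(E \Delta F)$. Then $d$ defines a metric on the set of equivalence classes $\mathcal{M} / \sim$.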

Problem 7.10. Let $X$ be a set. For a sequence of subsets $(E_n)$ of $X$, define
$$\limsup E_n = \bigcap_{N=1}^\infty \bigcup_{n=N}^\infty E_n, \qquad \liminf E_n = \bigcup_{N=1}^\infty \bigcap_{n=N}^\infty E_n. \tag{7.4}$$

a) Prove that $\limsup 1_{E_n} = 1_{\limsup E_n}$ and $\liminf 1_{E_n} = 1_{\liminf E_n}$ (thus justifying the names). Conclude that $E_n \to E$ pointwise if and only if $\limsup E_n = \liminf E_n = E$. (Hint: for the first part, observe that $x \in \limsup E_n$ if and only if $x$ lies in infinitely many of the $E_n$, and $x \in \liminf E_n$ if and only if $x$ lies in all but finitely many $E_n$.)

b) Prove that if the En are measurable, then so are lim sup En and lim inf En. Deduce that if En → E pointwise and all the En are measurable, then E is measurable.
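The definitions in (7.4) can be made concrete for a periodic sequence of finite sets, where every tail of the sequence contains every set of the cycle. The sketch below covers only this special case; the function name and the restriction to periodic sequences are assumptions made for illustration, and it is not a solution to the problem.

```python
def limsup_liminf_periodic(cycle):
    """lim sup and lim inf of the periodic sequence E_n = cycle[n % len(cycle)].

    Every tail {E_n : n >= N} contains each set of the cycle, so each tail
    union equals the union of the cycle and each tail intersection equals
    the intersection of the cycle.
    """
    upper = set().union(*cycle)
    lower = set(cycle[0]).intersection(*cycle)
    return upper, lower


# E_n alternates between {1, 2} and {2, 3}.
print(limsup_liminf_periodic([{1, 2}, {2, 3}]))
```

Here the point 2 lies in all of the $E_n$, while 1 and 3 lie in infinitely many but not in all but finitely many, matching the hint in part (a).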

Problem 7.11 (Fatou theorem for sets). Let $(X, \mathcal{M}, \mu)$ be a measure space, and let $(E_n)$ be a sequence of measurable sets.
a) Prove that
$$\mu(\liminf E_n) \le \liminf \mu(E_n). \tag{7.5}$$
b) Assume in addition that $\mu\big(\bigcup_{n=1}^\infty E_n\big) < \infty$. Prove that
$$\mu(\limsup E_n) \ge \limsup \mu(E_n). \tag{7.6}$$
c) Prove the following stronger form of the dominated convergence theorem for sets: suppose $(E_n)$ is a sequence of measurable sets, and there is a measurable set $F \subset X$ such that $E_n \subset F$ for all $n$ and $\mu(F) < \infty$. Prove that if $E_n \to E$ pointwise, then $\mu(E_n) \to \mu(E)$. Give an example to show the finiteness hypothesis on $F$ cannot be dropped. (For parts (a) and (b), use Theorem 2.3.)

Problem 7.12. Complete the proof of Theorem 2.8.

Problem 7.13. Complete the proof of Theorem 4.4.

Problem 7.14. Prove directly that the collection of sets $E \subset \mathbb{R}$ satisfying the condition in Theorem 4.5(b) is a σ-algebra. (Hint: to show countable additivity, use the $\epsilon/2^n$ trick.)

Problem 7.15. Recall the definition of Lebesgue inner measure $m_*$ from Remark 4.9. Prove that if $E \subset \mathbb{R}$ and $m^*(E) < \infty$ then $E$ is Lebesgue measurable if and only if $m_*(E) = m^*(E)$.

Problem 7.16. Prove the following dyadic version of Theorem 4.7: If $m(E) < \infty$ and $\epsilon > 0$, there exists an integer $n \ge 1$ and a set $A$, which is a finite union of dyadic intervals of length $2^{-n}$, such that $m(E \Delta A) < \epsilon$. (This says, loosely, that measurable sets look “pixelated” at sufficiently fine scales.)

Problem 7.17. a) Prove the following strengthening of Theorem 4.7: if $E \subset \mathbb{R}$ and $m^*(E) < \infty$, then $E$ is Lebesgue measurable if and only if for every $\epsilon > 0$, there exists a set $A = \bigcup_{n=1}^N I_n$ (a finite union of open intervals) such that $m^*(E \Delta A) < \epsilon$. b) State and prove a dyadic version of the theorem in part (a).

Problem 7.18. Let $E \subset \mathbb{R}$ be measurable with $m(E) > 0$. a) Prove that for any $\alpha < 1$, there is an open interval $I$ such that $m(E \cap I) > \alpha m(I)$. b) Show that the set $E - E = \{x - y : x, y \in E\}$ contains an open interval centered at $0$. (Choose $I$ as in part (a) with $\alpha > 3/4$; then $E - E$ contains $(-m(I)/2, m(I)/2)$.)

Problem 7.19. Prove the claims made about the Fat Cantor set in Example 4.11.

Problem 7.20. Let $\mathcal{A} \subset 2^{\mathbb{R}}$ be the Boolean algebra generated by the half-open intervals $(a, b]$. For $A \in \mathcal{A}$, let $\mu_0(A) = +\infty$ if $A$ is nonempty and $\mu_0(\emptyset) = 0$. a) Prove that $\mu_0$ is a premeasure. If $\mu$ is the Hahn-Kolmogorov extension of $\mu_0$ and $E \subset \mathbb{R}$ is a nonempty Borel set, prove that $\mu(E) = +\infty$. b) Prove that if $\mu'$ is counting measure on $(\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, then $\mu'$ is an extension of $\mu_0$ different from $\mu$.

Problem 7.21. Suppose $(X, \mathcal{M}, \mu)$ is a measure space and $\mathcal{A} \subset 2^X$ is a Boolean algebra which generates $\mathcal{M}$. Prove that if $E \in \mathcal{M}$ and $\mu(E) < \infty$, then for every $\epsilon > 0$ there exists a set $A \in \mathcal{A}$ such that $\mu(E \Delta A) < \epsilon$. (Hint: let $\mu_0$ be the premeasure obtained by restricting $\mu$ to $\mathcal{A}$. One may then assume that $\mu$ is equal to the Hahn-Kolmogorov extension of $\mu_0$. (Why?))

Problem 7.22. Prove that if $\mu$ is a locally finite Borel measure and $F$ is defined by (6.1), then $F$ is nondecreasing and right-continuous.

Problem 7.23. Let $\mu_F$ be a Lebesgue-Stieltjes measure. Write $F(a^-) := \lim_{x \to a^-} F(x)$. Prove that
• $\mu_F(\{a\}) = F(a) - F(a^-)$,
• $\mu_F([a, b)) = F(b^-) - F(a^-)$,
• $\mu_F([a, b]) = F(b) - F(a^-)$, and
• $\mu_F((a, b)) = F(b^-) - F(a)$.

Problem 7.24. Complete the proof of Proposition 6.1.

Problem 7.25. Prove Lemma 6.3.

Problem 7.26. Prove Theorem 6.4. (Use Lemma 6.3.)

Problem 7.27. This problem gives another construction of a set $E \subset \mathbb{R}$ that is not Lebesgue measurable. a) Prove that there exists a set $E \subset \mathbb{R}$ such that whenever $x, y$ are irrational numbers with $x + y$ rational, $E$ contains exactly one of the pair $x, y$. (Hint: use the well-ordering principle (which is equivalent to the Axiom of Choice).) b) Prove that the set $E$ in part (a) is not Lebesgue measurable. (Hint: suppose it is. Prove that for every interval $I$ with rational endpoints, one has $m(E \cap I) = \frac{1}{2}|I|$.)

Problem 7.28. Let $E$ be the nonmeasurable set described in Example 4.12. a) Show that if $F \subset E$ and $F$ is Lebesgue measurable, then $m(F) = 0$. b) Prove that if $G \subset \mathbb{R}$ has positive measure, then $G$ contains a nonmeasurable subset. (Reduce to the case $G \subset [0, 1]$. Then observe that $G = \bigcup_{q \in \mathbb{Q} \cap [-1,1]} \big(G \cap (E + q)\big)$ and apply part (a).)

8. Measurable functions

We will state and prove a few “categorical” properties of measurable functions between general measurable spaces; however, in these notes we will mostly be interested in functions from a measurable space taking values in the extended positive axis $[0, +\infty]$, the real line $\mathbb{R}$, or the complex numbers $\mathbb{C}$.

Definition 8.1. Let $(X, \mathcal{M})$ and $(Y, \mathcal{N})$ be measurable spaces. A function $f : X \to Y$ is called measurable (or $(\mathcal{M}, \mathcal{N})$ measurable) if $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{N}$.

It is immediate from the definition that if $(X, \mathcal{M})$, $(Y, \mathcal{N})$, $(Z, \mathcal{O})$ are measurable spaces and $f : X \to Y$, $g : Y \to Z$ are measurable functions, then the composition $g \circ f : X \to Z$ is measurable. The following is a routine application of Proposition 1.5; the proof is left as an exercise.

Proposition 8.2. Suppose $(X, \mathcal{M})$ and $(Y, \mathcal{N})$ are measurable spaces and the collection of sets $\mathcal{E} \subset 2^Y$ generates $\mathcal{N}$ as a σ-algebra. Then $f : X \to Y$ is measurable if and only if $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{E}$.

Corollary 8.3. Let $X, Y$ be topological spaces equipped with their Borel σ-algebras $\mathcal{B}_X$, $\mathcal{B}_Y$ respectively. Every continuous function $f : X \to Y$ is $(\mathcal{B}_X, \mathcal{B}_Y)$-measurable (or Borel measurable for short).

Proof. Since the open sets $U \subset Y$ generate $\mathcal{B}_Y$ and $f^{-1}(U)$ is open (hence in $\mathcal{B}_X$) by hypothesis, this is immediate from Proposition 8.2.

Definition 8.4. A function $f : \mathbb{R} \to \mathbb{R}$ is called Lebesgue measurable (resp. Borel measurable) if it is $(\mathcal{L}, \mathcal{B}_{\mathbb{R}})$ (resp. $(\mathcal{B}_{\mathbb{R}}, \mathcal{B}_{\mathbb{R}})$) measurable.

Remark 8.5. Note that since $\mathcal{B}_{\mathbb{R}} \subset \mathcal{L}$, being Lebesgue measurable is a weaker condition than being Borel measurable. If $f$ is Borel measurable, then $f \circ g$ is Borel or Lebesgue measurable if $g$ is. However if $f$ is only Lebesgue measurable, then $f \circ g$ need not be Lebesgue measurable, even if $g$ is continuous. (The difficulty is that we have no control over $g^{-1}(E)$ when $E$ is a Lebesgue set.) A counterexample is described in Problem 13.7.

It will sometimes be convenient to consider functions that are allowed to take the values $\pm\infty$.

Definition 8.6 (The extended real line). Let $\overline{\mathbb{R}}$ denote the set of real numbers together with the symbols $\pm\infty$. The arithmetic operations $+$ and $\cdot$ can be (partially) extended to $\overline{\mathbb{R}}$ by declaring
$\pm\infty + x = x + \pm\infty = \pm\infty$ for all $x \in \mathbb{R}$,
$+\infty \cdot x = x \cdot +\infty = +\infty$ for all $x \in (0, +\infty)$ (and similar rules for the other choices of signs),
$0 \cdot \pm\infty = \pm\infty \cdot 0 = 0$.
The order $<$ is extended to $\overline{\mathbb{R}}$ by declaring $-\infty < x < +\infty$ for all $x \in \mathbb{R}$.

The symbol $+\infty + (-\infty)$ is not defined, so some care must be taken in working out the rules of arithmetic in $\overline{\mathbb{R}}$. Typically we will be performing addition only when all values are finite, or when all values are nonnegative (that is for $x \in [0, +\infty]$). In these cases most of the familiar rules of arithmetic hold (for example the commutative, associative, and distributive laws), and the inequality $\le$ is preserved by multiplying both sides by the same quantity. However cancellation laws are not in general valid when infinite quantities are permitted; in particular from $x \cdot +\infty = y \cdot +\infty$ or $x + +\infty = y + +\infty$ one cannot conclude that $x = y$.

The order property allows us to extend the concepts of supremum and infimum, by defining the supremum of a set that is unbounded from above, or a set containing $+\infty$, to be $+\infty$; similarly for $\inf$ and $-\infty$. This also means every sum $\sum_n x_n$ with $x_n \in [0, +\infty]$ can be meaningfully assigned a value in $[0, +\infty]$, namely the supremum of the finite partial sums $\sum_{n \in F} x_n$.

A set $U \subset \overline{\mathbb{R}}$ will be called open if either $U \subset \mathbb{R}$ and $U$ is open in the usual sense, or $U$ is the union of an open subset of $\mathbb{R}$ with a set of the form $(a, +\infty]$ or $[-\infty, b)$ (or both).

Definition 8.7 (Extended Borel σ-algebra). The extended Borel σ-algebra over $\overline{\mathbb{R}}$ is the σ-algebra over $\overline{\mathbb{R}}$ generated by the Borel sets of $\mathbb{R}$ together with the sets $(a, +\infty]$ for $a > 0$.

Definition 8.8 (Measurable function). Let $(X, \mathcal{M})$ be a measurable space. A function $f : X \to \overline{\mathbb{R}}$ is called measurable if it is $(\mathcal{M}, \mathcal{B}_{\overline{\mathbb{R}}})$ measurable; that is, if $f^{-1}(U) \in \mathcal{M}$ for every open set $U \subset \overline{\mathbb{R}}$. Similarly a function $f : X \to \mathbb{C}$ is called measurable if $f^{-1}(U) \in \mathcal{M}$ for every open set $U \subset \mathbb{C}$. A function $f : \mathbb{R} \to \overline{\mathbb{R}}$ is called Lebesgue measurable if it is $(\mathcal{L}, \mathcal{B}_{\overline{\mathbb{R}}})$ measurable.

In particular we have the following criteria for measurability which will be used repeatedly:

Corollary 8.9 (Equivalent criteria for measurability). Let $(X, \mathcal{M})$ be a measurable space and $f : X \to \overline{\mathbb{R}}$ a function. The following are equivalent:
a) $f$ is measurable.
b) $f^{-1}(E) \in \mathcal{M}$ for all $E \in \mathcal{E}$, where $\mathcal{E}$ is any of the collections of sets $\mathcal{E}_j$ appearing in Proposition 1.8.

The above Corollary will be used most often in the following form: a function $f : X \to \overline{\mathbb{R}}$ is measurable if and only if the sets
$$f^{-1}((t, +\infty]) = \{x : f(x) > t\} \tag{8.1}$$
are measurable for all $t \in \mathbb{R}$.

Examples 8.10 (Examples of measurable functions).
a) An indicator function $1_E$ is measurable if and only if $E$ is measurable. Indeed, the set $\{x : 1_E(x) > t\}$ is either empty, $E$, or all of $X$, in the cases $t \ge 1$, $0 \le t < 1$, or $t < 0$, respectively.
b) By Corollary 8.3, if $X$ is a topological space equipped with its Borel σ-algebra, then every continuous function $f : X \to \mathbb{C}$ is measurable.
c) The next series of propositions will show that measurability is preserved by most of the familiar operations of analysis, including sums, products, sups, infs, and limits (provided one is careful about arithmetic of infinities).
d) Corollary 8.19 below will show that the examples above are, in fact, all the examples in the case of $\mathbb{R}$ or $\mathbb{C}$ valued functions. That is, every measurable function is a pointwise limit of linear combinations of measurable indicator functions.

Proposition 8.11. Let $(X, \mathcal{M})$ be a measurable space. A function $f : X \to \mathbb{C}$ is measurable if and only if $\operatorname{Re} f$ and $\operatorname{Im} f$ are measurable.

Proof. Identify $\mathbb{C}$ with $\mathbb{R}^2$; the Borel σ-algebra of $\mathbb{R}^2$ is generated by open rectangles $(a, b) \times (c, d)$. Suppose $f : X \to \mathbb{C}$ is measurable and let $u = \operatorname{Re} f$. Then $u^{-1}((a, b))$ is equal to $f^{-1}(E)$, where $E$ is the open strip $\{z : a < \operatorname{Re} z < b\}$. It follows that $u^{-1}((a, b))$ lies in $\mathcal{M}$, so $u$ is measurable; similarly for $v = \operatorname{Im} f$. Conversely, suppose $u, v$ are measurable. Fix an open rectangle $R = (a, b) \times (c, d)$. If $E, F$ denote the open strips $\{z : a < \operatorname{Re} z < b\}$, $\{z : c < \operatorname{Im} z < d\}$ respectively, then we see that $f^{-1}(R) = f^{-1}(E) \cap f^{-1}(F) = u^{-1}((a, b)) \cap v^{-1}((c, d))$, which lies in $\mathcal{M}$ by hypothesis. So $f$ is measurable.

Theorem 8.12. Let $(X, \mathcal{M})$ be a measurable space, let $f, g : X \to \mathbb{C}$ be measurable functions, and $c \in \mathbb{C}$. Then $cf$, $f + g$, and $fg$ are measurable.

Proof. By taking real and imaginary parts it suffices to assume that $f, g : X \to \mathbb{R}$ and $c \in \mathbb{R}$. That $cf$ is measurable is left as an exercise. For $f + g$, let $t \in \mathbb{R}$ and $E_t = \{x : f(x) + g(x) > t\}$. Then

$$E_t = \{x \in X : f(x) + g(x) > t\} \tag{8.2}$$
$$= \{x : f(x) > t - g(x)\} \tag{8.3}$$
$$= \bigcup_{q \in \mathbb{Q}} \{x : f(x) > q > t - g(x)\} \tag{8.4}$$
$$= \bigcup_{q \in \mathbb{Q}} \{x : f(x) > q\} \cap \{x : g(x) > t - q\}. \tag{8.5}$$

Since all the sets in the last line are measurable, it follows that $E_t$ is measurable.

Next we prove that $f^2$ is measurable. If $t \ge 0$ we have $\{x : f(x)^2 > t\} = \{x : f(x) > \sqrt{t}\} \cup \{x : f(x) < -\sqrt{t}\}$, and $\{x : f(x)^2 > t\} = X$ if $t < 0$. It follows that $f^2$ is measurable. Finally, when $f, g$ are real valued the measurability of $fg$ follows from the identity
$$fg = \frac{(f + g)^2 - f^2 - g^2}{2}. \tag{8.6}$$

The same argument as in the above proof shows that if $f, g : X \to [0, +\infty]$ are unsigned measurable functions, then $f + g$ is measurable. It is also true that $fg$ is measurable, though the proof must be modified somewhat. (Why?) This result is left as an exercise (Problem 13.8).

Proposition 8.13. Let $f_n$ be a sequence of $\overline{\mathbb{R}}$-valued measurable functions. Then the functions
$$\sup_n f_n, \qquad \inf_n f_n, \qquad \limsup_{n \to \infty} f_n, \qquad \liminf_{n \to \infty} f_n \tag{8.7}$$
are measurable. Moreover if $f_n \to f$ pointwise then $f$ is measurable.

Proof. Since $\inf f_n = -\sup(-f_n)$, it suffices to prove the proposition for $\sup$ and $\limsup$. Let $f(x) = \sup_n f_n(x)$. Then for any real number $t$, we have $f(x) > t$ if and only if $f_n(x) > t$ for some $n$. Thus
$$\{x : f(x) > t\} = \bigcup_{n=1}^\infty \{x : f_n(x) > t\}. \tag{8.8}$$
It follows that $f$ is measurable. Likewise $\inf f_n$ is measurable since $\inf f_n = -\sup(-f_n)$. Finally let $g_N = \sup_{n \ge N} f_n$, $h_N = \inf_{n \ge N} f_n$. Then the $g_N$ and $h_N$ are measurable, and $\limsup f_n = \inf_N g_N$, $\liminf f_n = \sup_N h_N$. If $f_n \to f$ pointwise, then $f = \limsup f_n$ is measurable.

In the previous proposition it is of course essential that the supremum is taken only over a countable set of measurable functions; the supremum of an uncountable collection of measurable functions need not be measurable. Problem 13.6 asks for a counterexample.

One simple but important application of the previous proposition is that if $f, g$ are $\overline{\mathbb{R}}$-valued measurable functions, then $f \wedge g := \min(f, g)$ and $f \vee g := \max(f, g)$ are measurable; in particular $f^+ := \max(f, 0)$ and $f^- := -\min(f, 0)$ are measurable if $f$ is. It also follows that $|f| := f^+ + f^-$ is measurable when $f$ is. Together with Proposition 8.11, this shows that every $\mathbb{C}$ valued measurable function $f$ is a linear combination of four unsigned measurable functions (the positive and negative parts of the real and imaginary parts of $f$). Thus many statements about general measurable functions can be reduced to the unsigned case.

In discussing convergence of functions on a measure space $(X, \mathcal{M}, \mu)$, we say that $f_n \to f$ almost everywhere (or $\mu$-almost everywhere), abbreviated a.e. or $\mu$-a.e., if the set of points $x$ where $f_n(x)$ is either divergent, or does not converge to $f(x)$, has measure $0$. The next two propositions show that 1) provided that $\mu$ is a complete measure, a.e. limits of measurable functions are measurable, and 2) one is not likely to make any serious errors in forgetting the completeness requirement.

Proposition 8.14. Suppose $(X, \mathcal{M}, \mu)$ is a complete measure space.
• If $f$ is measurable and $f = g$ a.e., then $g$ is measurable.
• If fn are measurable functions and fn → f a.e., then f is measurable.

Proof. Problem 13.9.

Proposition 8.15. Let $(X, \mathcal{M}, \mu)$ be a measure space and $(X, \overline{\mathcal{M}}, \overline{\mu})$ its completion. If $f$ is an $\overline{\mathcal{M}}$-measurable function, then there is an $\mathcal{M}$-measurable function $g$ such that $f = g$ $\overline{\mu}$-a.e.

Definition 8.16 (Unsigned simple function). An unsigned function $f : X \to [0, +\infty]$ is called simple if it can be expressed as a finite sum
\[ f = \sum_{n} c_n 1_{E_n} \tag{8.9} \]

where each $E_n$ is a subset of $X$ and each $c_n \in [0, +\infty]$.

Proposition 8.17. A function $f : X \to [0, +\infty]$ is simple if and only if the range of $f$ is a finite set.

Note that the expression (8.9) is far from unique; however, using this proposition we can define a standard representation of a simple function: if $f$ is simple and $\operatorname{range}(f) = \{c_1, \dots, c_n\}$, define $E_k = f^{-1}(\{c_k\})$. Then
\[ f(x) = \sum_{k=1}^{n} c_k 1_{E_k}(x) \tag{8.10} \]

for all $x \in X$. (Note that we have allowed one of the $c_k$ to be 0 if $0 \in \operatorname{range}(f)$; it will be helpful to keep track of this when manipulating simple functions.)

Theorem 8.18 (The Ziggurat approximation). Let $(X, \mathcal{M})$ be a measurable space. If $f : X \to [0, +\infty]$ is an unsigned measurable function, then there exists an increasing sequence of unsigned, measurable simple functions $f_n : X \to [0, +\infty)$ such that $f_n \to f$ pointwise on $X$. If $f$ is bounded, the sequence can be chosen to converge uniformly.

Proof. Fix an integer $n \ge 1$. If $x \in X$ and $0 < f(x) < +\infty$, then there is a unique integer $k \ge 0$ such that
\[ \frac{k}{2^n} < f(x) \le \frac{k+1}{2^n}. \tag{8.11} \]
Let $g_n(x) = \frac{k}{2^n}$ for this unique $k$. (In other words, $g_n$ is obtained from $f$ by "rounding down" to an integer multiple of $2^{-n}$.) Observe that if we set $E_{n,k} := \{x : \frac{k}{2^n} < f(x) \le \frac{k+1}{2^n}\}$ then
\[ g_n(x) = \sum_{k=0}^{\infty} \frac{k}{2^n} 1_{E_{n,k}}(x). \tag{8.12} \]
Now define
\[ f_n(x) = \begin{cases} 0 & \text{if } f(x) = 0, \\ g_n(x) & \text{if } 0 < f(x) \le n, \\ n & \text{if } f(x) > n. \end{cases} \tag{8.13} \]

Note that, using the definition of $g_n$, the $f_n$ may be equivalently expressed as
\[ f_n(x) = \sum_{0 \le k < n2^n} \frac{k}{2^n} 1_{E_{n,k}}(x) + n\, 1_{\{x : f(x) > n\}}(x). \tag{8.14} \]

Then the functions $f_n$ are increasing, and from (8.14) we see that they are simple and measurable. Moreover, $f_n(x) \to +\infty$ if $f(x) = +\infty$, and if $0 \le f(x) < +\infty$ then $|f(x) - f_n(x)| \le \frac{1}{n}$ when $n$ is sufficiently large (namely, $n \ge f(x)$). It follows that $f_n \to f$ pointwise, and the convergence is uniform if $f$ is bounded.

It will be helpful to record for future use the round-off procedure used in this proof. Let $f : X \to [0, +\infty]$ be an unsigned function. For any $\epsilon > 0$, if $0 < f(x) < +\infty$ there is a unique integer $k$ such that
\[ k\epsilon < f(x) \le (k+1)\epsilon. \tag{8.15} \]

Define the "rounded down" function $f_\epsilon(x)$ to be $k\epsilon$ when $f(x) \in (0, +\infty)$, and equal to 0 or $+\infty$ when $f(x) = 0$ or $+\infty$ respectively. Similarly we can define the "rounded up" function $f^\epsilon$ to be $(k+1)\epsilon$, 0, or $+\infty$ as appropriate. (So, in the previous proof, the function $g_n$ was $f_\epsilon$ with $\epsilon = 2^{-n}$.) In particular, we have for all $\epsilon > 0$
\[ f_\epsilon \le f \le f^\epsilon, \tag{8.16} \]
and $f_\epsilon$, $f^\epsilon$ are measurable if $f$ is. Moreover the same argument used in the above proof shows that $f_\epsilon, f^\epsilon \to f$ pointwise as $\epsilon \to 0$.

Finally, by the remarks following Proposition 8.13, the following is immediate (since its proof reduces to the unsigned case):

Corollary 8.19. Every $\mathbb{R}$- or $\mathbb{C}$-valued measurable function is a pointwise limit of measurable simple functions.

9. Integration of simple functions

We will build up the integration theory for measurable functions in three stages. We first define the integral for unsigned simple functions, then extend it to general unsigned functions, and finally to general ($\mathbb{R}$- or $\mathbb{C}$-valued) functions. Throughout this section and the next, we fix a measure space $(X, \mathcal{M}, \mu)$; all functions are defined on this measure space.

Definition 9.1. Let $(X, \mathcal{M}, \mu)$ be a measure space and $f = \sum_{n=1}^{N} c_n 1_{E_n}$ an unsigned measurable simple function (in its standard representation). The integral of $f$ (with respect to the measure $\mu$) is defined to be
\[ \int_X f \, d\mu := \sum_{n=1}^{N} c_n \mu(E_n). \tag{9.1} \]
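To make Definition 9.1 and the ziggurat approximation of Theorem 8.18 concrete, here is a minimal sketch on a finite set carrying point masses, with every subset taken to be measurable. The example data and the names `ziggurat` and `simple_integral` are illustrative assumptions, not part of the notes, and the function values are assumed finite so that no product $0 \cdot \infty$ arises.

```python
from fractions import Fraction
from math import ceil, inf

def ziggurat(value, n):
    """Value f_n(x) from (8.13), given value = f(x) in [0, +inf]."""
    if value == 0:
        return Fraction(0)
    if value > n:                      # also covers value == +inf
        return Fraction(n)
    # Unique integer k >= 0 with k/2^n < value <= (k+1)/2^n, as in (8.11).
    k = ceil(value * 2**n) - 1
    return Fraction(k, 2**n)

def simple_integral(f, mu):
    """Integral (9.1) of a finite-valued simple function via its standard representation.

    f  : dict mapping each point x of X to the value f(x) >= 0
    mu : dict mapping each point x of X to the mass mu({x})
    """
    total = Fraction(0)
    for c in set(f.values()):
        # mu(E_c), where E_c = f^{-1}({c}) is the level set of the value c
        level_set_mass = sum((mu[x] for x in f if f[x] == c), Fraction(0))
        total += c * level_set_mass
    return total

# Hypothetical example: three points with masses 1/2, 1/3, 1/6.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
f = {"a": Fraction(5, 3), "b": Fraction(0), "c": Fraction(7, 2)}

exact = simple_integral(f, mu)
approx = [
    simple_integral({x: ziggurat(v, n) for x, v in f.items()}, mu)
    for n in range(1, 11)
]
print("integral of f:", exact)
print("integrals of f_1, ..., f_10:", [str(a) for a in approx])
```

In this example the integrals of the $f_n$ increase with $n$ and stay below the integral of $f$, as the monotonicity of the $f_n$ suggests.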

One thinks of the graph of the function $c1_E$ as a "rectangle" with height $c$ and "base" $E$; since $\mu$ tells us how to measure the length of $E$, the quantity $c \cdot \mu(E)$ is interpreted as the "area" of the rectangle. This intuition can be made more precise once we have proved Fubini's theorem.

Let $L^+$ denote the set of all unsigned measurable functions on $(X, \mathcal{M})$. We begin by collecting some basic properties of the integrals of simple functions (temporarily referred to as simple integrals). When $X$ and $\mu$ are understood we drop them from the notation and simply write $\int f$ for $\int_X f \, d\mu$.

Theorem 9.2 (Basic properties of simple integrals). Let $(X, \mathcal{M}, \mu)$ be a measure space and let $f, g \in L^+$ be simple functions.
a) (Homogeneity) If $c \ge 0$ then $\int cf = c \int f$.
b) (Monotonicity) If $f \le g$ then $\int f \le \int g$.
c) (Finite additivity) $\int (f + g) = \int f + \int g$.
d) (Almost everywhere equivalence) If $f(x) = g(x)$ for $\mu$-almost every $x \in X$, then $\int f = \int g$.
e) (Finiteness) $\int f < +\infty$ if and only if $f$ is finite almost everywhere and supported on a set of finite measure.
f) (Vanishing) $\int f = 0$ if and only if $f = 0$ almost everywhere.

Proof. (a) is trivial; we prove (b) and (c) and leave the rest as (simple!) exercises.

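Throughout, let $f = \sum_{j=1}^{n} c_j 1_{E_j}$ and $g = \sum_{k=1}^{m} d_k 1_{F_k}$ be simple functions in their standard representations, and notice that $E_j = \bigcup_{k=1}^{m} (E_j \cap F_k)$ for each $j$ and $F_k = \bigcup_{j=1}^{n} (F_k \cap E_j)$ for each $k$, the unions being disjoint.

b) From the assumption $f \le g$ we deduce that $c_j \le d_k$ whenever $E_j \cap F_k \ne \emptyset$. Thus, by the finite additivity of $\mu$,
\[ \int f = \sum_{j,k} c_j \mu(E_j \cap F_k) \le \sum_{j,k} d_k \mu(E_j \cap F_k) = \int g. \tag{9.2} \]

c) It follows from the finite additivity of $\mu$ that
\[ \int f + \int g = \sum_{j,k} (c_j + d_k) \mu(E_j \cap F_k). \tag{9.3} \]
But by the same reasoning the right hand side is equal to $\int (f + g)$.

If $f : X \to [0, +\infty]$ is a measurable simple function, then so is $1_E f$ for any measurable set $E$. We write $\int_E f \, d\mu := \int 1_E f \, d\mu$.

Proposition 9.3. Let $(X, \mathcal{M}, \mu)$ be a measure space and let $f$ be an unsigned measurable simple function. Then the function
\[ \nu(E) := \int_E f \, d\mu \tag{9.4} \]
is a measure on $(X, \mathcal{M})$.

Proof. That $\nu$ is nonnegative and $\nu(\emptyset) = 0$ are immediate from the definition. Let $(E_n)_{n=1}^{\infty}$ be a sequence of disjoint measurable sets and let $E = \bigcup_{n=1}^{\infty} E_n$. Write $f$ in standard form as $\sum_{j=1}^{m} c_j 1_{F_j}$. Then
\[ \nu\Big(\bigcup_{n=1}^{\infty} E_n\Big) = \sum_{j=1}^{m} c_j \mu\Big(F_j \cap \bigcup_{n=1}^{\infty} E_n\Big) = \sum_{n=1}^{\infty} \sum_{j=1}^{m} c_j \mu(F_j \cap E_n) = \sum_{n=1}^{\infty} \nu(E_n). \tag{9.5} \]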
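The proofs above rest on the fact that the integral of a simple function does not depend on how it is written as a sum $\sum c_n 1_{E_n}$. The following sketch checks this, together with finite additivity (Theorem 9.2(c)) and the additivity of $\nu$ from Proposition 9.3, on a small finite measure space. The space, the weights, and all function names are hypothetical choices made for illustration only.

```python
from fractions import Fraction

def evaluate(rep, x):
    """Value at x of the simple function sum of c * 1_E over the pairs (c, E) in rep."""
    return sum((c for c, E in rep if x in E), Fraction(0))

def measure(E, mu):
    """mu(E) for a measure given by point masses."""
    return sum((mu[x] for x in E), Fraction(0))

def integral(rep, X, mu):
    """Integral via the standard representation: group the points of X by function value."""
    level_sets = {}
    for x in X:
        level_sets.setdefault(evaluate(rep, x), set()).add(x)
    return sum((c * measure(E, mu) for c, E in level_sets.items()), Fraction(0))

def nu(A, rep, X, mu):
    """nu(A) = integral of 1_A * f, as in (9.4)."""
    restricted = [(c, E & A) for c, E in rep]
    return integral(restricted, X, mu)

# Hypothetical data: X = {0,...,5} with mu({x}) = (x+1)/21.
X = set(range(6))
mu = {x: Fraction(x + 1, 21) for x in X}
f = [(Fraction(2), {0, 1, 2}), (Fraction(3), {2, 3})]   # overlapping sets: not standard form
g = [(Fraction(1), {3, 4, 5})]

naive_f = sum(c * measure(E, mu) for c, E in f)          # sum of c_n * mu(E_n) as written
print(integral(f, X, mu), naive_f)
print(integral(f + g, X, mu), integral(f, X, mu) + integral(g, X, mu))
print(nu({0, 1, 2}, f, X, mu) + nu({3, 4, 5}, f, X, mu), nu(X, f, X, mu))
```

Each printed pair should agree; list concatenation `f + g` is simply a (non-standard) representation of the pointwise sum.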

10. Integration of unsigned functions

We now extend the definition of the integral to all (not necessarily simple) functions in $L^+$:

Definition 10.1. Let $(X, \mathcal{M}, \mu)$ be a measure space. For an unsigned measurable function $f : X \to [0, +\infty]$, define
\[ \int_X f \, d\mu := \sup_{0 \le g \le f;\ g \text{ simple}} \int_X g \, d\mu, \tag{10.1} \]
where the supremum is over measurable simple functions $g$.

Theorem 10.2 (Basic properties of unsigned integrals). Let $f, g$ be unsigned measurable functions.
a) (Homogeneity) If $c \ge 0$ then $\int cf = c \int f$.
b) (Monotonicity) If $f \le g$ then $\int f \le \int g$.
c) (Almost everywhere equivalence) If $f(x) = g(x)$ for $\mu$-almost every $x \in X$, then $\int f = \int g$.
d) (Finiteness) If $\int f < +\infty$, then $f(x) < +\infty$ for $\mu$-a.e. $x$.
e) (Vanishing) $\int f = 0$ if and only if $f = 0$ almost everywhere.

One would also expect the integral to be additive; however the proof of this will have to wait until we have established the Monotone Convergence Theorem.

Proof of Theorem 10.2. a,b) As in the simple case homogeneity is trivial. Monotonicity is also evident, since any simple function less than $f$ is also less than $g$.

c) Let $E$ be a measurable set with $\mu(E^c) = 0$. It suffices to prove that $\int 1_E f = \int f$. If $g$ is a simple function, then by the almost everywhere equivalence property in Theorem 9.2, $1_E g$ is a simple function and $\int 1_E g = \int g$. Moreover, if $h$ is a simple function then $h \le 1_E f$ if and only if there is a simple function $g \le f$ such that $h = 1_E g$. So
\[ \int f = \sup_{0 \le g \le f} \int g = \sup_{0 \le g \le f} \int 1_E g = \sup_{0 \le h \le 1_E f} \int h = \int 1_E f. \tag{10.2} \]

d) If $f = +\infty$ on a set $E$, and $\mu(E) > 0$, then $\int f \ge \int n 1_E = n\mu(E)$ for all $n$, so $\int f = +\infty$. (A direct proof can be obtained from Markov's inequality below.)

e) If $f = 0$ a.e. and $g \le f$ is a simple function, then $g = 0$ a.e., so $\int g = 0$ and hence $\int f = 0$. Conversely, suppose there is a set $E$ of positive measure such that $f(x) > 0$ for all $x \in E$. Let $E_n = \{x \in E : f(x) > \frac{1}{n}\}$. Then $E = \bigcup_{n=1}^{\infty} E_n$, so by countable subadditivity $\mu(E_N) > 0$ for some $N$. But then $\frac{1}{N} 1_{E_N} \le f$, so $\int f \ge \frac{1}{N} \mu(E_N) > 0$.

Theorem 10.3 (Monotone Convergence Theorem). Suppose $(f_n)$ is a sequence of unsigned measurable functions and $f_n$ increases to $f$ pointwise. Then $\int f_n \to \int f$.

Proof. By monotonicity of the integral, the sequence $\int f_n$ is increasing and $\int f_n \le \int f$ for all $n$, so $\lim \int f_n$ exists and $\lim \int f_n \le \int f$. For the reverse inequality, fix $0 < \epsilon < 1$ and let $g$ be a simple function with $0 \le g \le f$. Consider the sets
\[ E_n = \{x : f_n(x) \ge (1 - \epsilon) g(x)\}. \tag{10.3} \]

The $E_n$ form an increasing sequence of measurable sets whose union is $X$. For all $n$,
\[ \int f_n \ge \int_{E_n} f_n \ge (1 - \epsilon) \int_{E_n} g. \tag{10.4} \]
By monotone convergence for sets (Theorem 2.3(c)) applied to the measure $\nu(E) = \int_E g$ (Proposition 9.3), we see that
\[ \lim_{n\to\infty} \int_{E_n} g = \int_X g. \tag{10.5} \]
Thus $\lim \int f_n \ge (1 - \epsilon) \int g$ for every $0 < \epsilon < 1$ and every simple function $0 \le g \le f$. Taking the supremum over $g$ and then over $\epsilon$, we conclude that $\lim \int f_n \ge \int f$.

Before going on we mention two frequently used applications of the Monotone Convergence Theorem:

Corollary 10.4.
• (Vertical truncation) If $f$ is an unsigned measurable function and $f \wedge n := \min(f, n)$, then $\int f \wedge n \to \int f$.
• (Horizontal truncation) If $f$ is an unsigned measurable function and $(E_n)_{n=1}^{\infty}$ is an increasing sequence of measurable sets whose union is $X$, then $\int_{E_n} f \to \int f$.

Proof. Since $f \wedge n$ and $1_{E_n} f$ are measurable for all $n$ and increase pointwise to $f$, these follow from the Monotone Convergence Theorem.

Theorem 10.5 (Additivity of the unsigned integral). If $f, g$ are unsigned measurable functions, then $\int (f + g) = \int f + \int g$.

Proof. By Theorem 8.18, there exist sequences of unsigned, measurable simple functions $f_n, g_n$ which increase pointwise to $f, g$ respectively. Thus $f_n + g_n$ increases to $f + g$, so by Theorem 9.2(c) and the Monotone Convergence Theorem,
\[ \int (f + g) = \lim \int (f_n + g_n) = \lim \int f_n + \lim \int g_n = \int f + \int g. \tag{10.6} \]

Corollary 10.6 (Tonelli's theorem for sums and integrals). If $(f_n)$ is a sequence of unsigned measurable functions, then $\int \sum_{n=1}^{\infty} f_n = \sum_{n=1}^{\infty} \int f_n$.

Proof. By induction on Theorem 10.5, $\int \sum_{n=1}^{N} f_n = \sum_{n=1}^{N} \int f_n$ for all $N \ge 1$. Since $\sum_{n=1}^{N} f_n$ increases pointwise to $\sum_{n=1}^{\infty} f_n$, we have by the Monotone Convergence Theorem
\[ \int \sum_{n=1}^{\infty} f_n = \lim_{N\to\infty} \int \sum_{n=1}^{N} f_n = \lim_{N\to\infty} \sum_{n=1}^{N} \int f_n = \sum_{n=1}^{\infty} \int f_n. \tag{10.7} \]

If the monotonicity hypothesis is dropped, we can no longer conclude that $\int f_n \to \int f$ if $f_n \to f$ pointwise (see Examples 10.8 below); however, the following weaker result holds.

Theorem 10.7 (Fatou's Theorem). Suppose $(f_n)$ is a sequence of unsigned measurable functions. Then
\[ \int \liminf f_n \le \liminf \int f_n. \tag{10.8} \]

Proof. For each $n$ define functions $g_n(x) := \inf_{m \ge n} f_m(x)$. Then the $g_n$ are unsigned measurable functions increasing pointwise to $\liminf f_n$, and $g_n \le f_n$ for each $n$. Thus by the Monotone Convergence Theorem
\[ \int \liminf f_n = \lim \int g_n = \liminf \int g_n \le \liminf \int f_n. \tag{10.9} \]
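The inequality in Fatou's theorem can be strict. As a quick sanity check (not a proof, since only finitely many indices are inspected), the sketch below represents three "moving bump" step functions $c\,1_{(a,b)}$ on the real line as triples `(c, a, b)`, computes their integrals $c(b-a)$ exactly, and evaluates them at a fixed sample point. The encoding and the function names are assumptions made for this illustration.

```python
from fractions import Fraction

# A step function c * 1_(a, b) on the real line is stored as the triple (c, a, b).
def height_bump(n):
    return (Fraction(n), Fraction(0), Fraction(1, n))        # n * 1_(0, 1/n)

def width_bump(n):
    return (Fraction(1, 2 * n), Fraction(-n), Fraction(n))   # (1/2n) * 1_(-n, n)

def travelling_bump(n):
    return (Fraction(1), Fraction(n), Fraction(n + 1))       # 1_(n, n+1)

def integral(step):
    """Lebesgue integral of c * 1_(a, b), namely c * (b - a)."""
    c, a, b = step
    return c * (b - a)

def value(step, x):
    """Pointwise value of c * 1_(a, b) at x."""
    c, a, b = step
    return c if a < x < b else Fraction(0)

x = Fraction(7, 2)  # an arbitrary sample point
for bump in (height_bump, width_bump, travelling_bump):
    integrals = {integral(bump(n)) for n in range(1, 201)}
    tail_values = [value(bump(n), x) for n in range(100, 201)]
    print(bump.__name__, "integrals:", integrals, "largest tail value at x:", max(tail_values))
```

Every integral equals 1 while the values at the sample point become small, so $\int \liminf f_n = 0 < 1 = \liminf \int f_n$ in each case.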

Examples 10.8 (Failure of convergence of integrals). These examples highlight three modes of failure of the convergence $\int f_n \to \int f$ for sequences of unsigned measurable functions $f_n : \mathbb{R} \to [0, +\infty]$. In each case $f_n \to 0$ pointwise, but $\int f_n = 1$ for all $n$:
• (Escape to height infinity) $f_n = n 1_{(0, \frac{1}{n})}$
• (Escape to width infinity) $f_n = \frac{1}{2n} 1_{(-n,n)}$
• (Escape to horizontal infinity) $f_n = 1_{(n,n+1)}$
Note that in the second example the convergence is even uniform. These examples can be thought of as "moving bump" functions: in each case we have a rectangle and can vary the height, width, and position. If we think of $f_n$ as describing a density of mass distributed over the real line, then $\int f_n$ gives the total "mass"; Fatou's theorem says mass cannot be created in the limit, but these examples show mass can be destroyed.

Proposition 10.9 (Markov's inequality). If $f$ is an unsigned measurable function, then for all $t > 0$
\[ \mu(\{x : f(x) > t\}) \le \frac{1}{t} \int f. \tag{10.10} \]

Proof. Let $E_t = \{x : f(x) > t\}$. Then by definition, $t 1_{E_t} \le f$, so $t\mu(E_t) = \int t 1_{E_t} \le \int f$.

We conclude this section with a few frequently-used corollaries of the monotone convergence theorem, and a converse to it.

Theorem 10.10 (Change of variables). Let $(X, \mathcal{M}, \mu)$ be a measure space, $(Y, \mathcal{N})$ a measurable space, and $\varphi : X \to Y$ a measurable function. Define a function $\varphi_*\mu : \mathcal{N} \to [0, +\infty]$ by
\[ \varphi_*\mu(E) = \mu(\varphi^{-1}(E)). \tag{10.11} \]
Then $\varphi_*\mu$ is a measure on $(Y, \mathcal{N})$, and for every unsigned measurable function $f : Y \to [0, +\infty]$,
\[ \int_Y f \, d(\varphi_*\mu) = \int_X (f \circ \varphi) \, d\mu. \tag{10.12} \]

Proof. Problem 13.15.

The measure $\varphi_*\mu$ is called the push-forward of $\mu$ under $\varphi$.

Lemma 10.11 (Borel-Cantelli Lemma). Let $(X, \mathcal{M}, \mu)$ be a measure space and suppose $(E_n)_{n=1}^{\infty}$ is a sequence of measurable sets with
\[ \sum_{n=1}^{\infty} \mu(E_n) < \infty. \tag{10.13} \]
Then almost every $x \in X$ is contained in at most finitely many of the $E_n$ (that is, the sets $N_x := \{n : x \in E_n\} \subset \mathbb{N}$ are finite for a.e. $x$).

There is a sense in which the monotone convergence theorem has a converse, namely that any map from unsigned measurable functions on a measurable space $(X, \mathcal{M})$ to $[0, +\infty]$, satisfying some reasonable axioms (including MCT), must come from integration against a measure. The precise statement is the following:

Theorem 10.12. Let $(X, \mathcal{M})$ be a measurable space and let $U(X, \mathcal{M})$ denote the set of all unsigned measurable functions $f : X \to [0, +\infty]$. Suppose $L : U(X, \mathcal{M}) \to [0, +\infty]$ is a function obeying the following axioms:
a) (Homogeneity) For every $f \in U(X, \mathcal{M})$ and every real number $c \ge 0$, $L(cf) = cL(f)$.
b) (Additivity) For every pair $f, g \in U(X, \mathcal{M})$, $L(f + g) = L(f) + L(g)$.
c) (Monotone convergence) If $(f_n)$ is a sequence in $U(X, \mathcal{M})$ increasing pointwise to $f$, then $\lim_{n\to\infty} L(f_n) = L(f)$.
Then there is a unique measure $\mu : \mathcal{M} \to [0, +\infty]$ such that $L(f) = \int_X f \, d\mu$ for all $f \in U(X, \mathcal{M})$. In fact, $\mu(E) = L(1_E)$.

Proof. Problem 13.16.
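The change of variables formula (10.12) is easy to test on a finite set, where every subset is measurable and the integral of $f$ reduces to the weighted sum $\sum_x f(x)\mu(\{x\})$. The sketch below is only an illustration under that assumption; the measure, the map `phi`, and the integrand are hypothetical choices, and `pushforward` and `integrate` are names introduced here.

```python
from fractions import Fraction

def pushforward(mu, phi):
    """Push-forward measure: the mass of {y} is mu(phi^{-1}({y})), as in (10.11)."""
    nu = {}
    for x, mass in mu.items():
        y = phi(x)
        nu[y] = nu.get(y, Fraction(0)) + mass
    return nu

def integrate(f, mu):
    """Integral of f against a measure given by finitely many point masses."""
    return sum((f(x) * mass for x, mass in mu.items()), Fraction(0))

# Hypothetical data on X = {-2, -1, 0, 1, 2}.
mu = {-2: Fraction(1, 10), -1: Fraction(1, 5), 0: Fraction(3, 10),
      1: Fraction(1, 5), 2: Fraction(1, 5)}
phi = lambda x: x * x          # measurable map X -> Y = {0, 1, 4}
f = lambda y: y + 1            # unsigned function on Y

nu = pushforward(mu, phi)
lhs = integrate(f, nu)                      # integral of f against the push-forward
rhs = integrate(lambda x: f(phi(x)), mu)    # integral of f composed with phi against mu
print(nu, lhs, rhs)
```

The two integrals agree because both sums group the same terms, once by points of $Y$ and once by points of $X$.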

11. Integration of signed and complex functions

Again we work on a fixed measure space $(X, \mathcal{M}, \mu)$. Suppose $f : X \to \overline{\mathbb{R}}$ is measurable. Split $f$ into its positive and negative parts, $f = f^+ - f^-$. If at least one of $\int f^+$, $\int f^-$ is finite, define
\[ \int f = \int f^+ - \int f^-. \tag{11.1} \]
If both are finite, we say $f$ is integrable (or sometimes absolutely integrable). Note that $f$ is integrable if and only if $\int |f| < +\infty$; this is immediate since $|f| = f^+ + f^-$. We write
\[ \|f\|_1 := \int_X |f| \, d\mu \tag{11.2} \]
when $f$ is integrable. In the complex case, from the inequalities
\[ \max(|\operatorname{Re} f|, |\operatorname{Im} f|) \le |f| \le |\operatorname{Re} f| + |\operatorname{Im} f| \tag{11.3} \]
it is clear that $f : X \to \mathbb{C}$ is absolutely integrable if and only if $\operatorname{Re} f$ and $\operatorname{Im} f$ are. If $f$ is complex-valued and absolutely integrable (that is, $f$ is measurable and $|f|$ is integrable), we define
\[ \int f = \int \operatorname{Re} f + i \int \operatorname{Im} f. \tag{11.4} \]
We also write $\|f\|_1 := \int_X |f| \, d\mu$ in the complex case.

If $f : X \to \overline{\mathbb{R}}$ is absolutely integrable, then necessarily the set $\{x : |f(x)| = +\infty\}$ has measure 0. We may therefore redefine $f$ to be 0, say, on this set, without affecting the integral of $f$ (by Theorem 10.2(c)). Thus when working with absolutely integrable functions, we can (and from now on, will) always assume that $f$ is finite-valued everywhere.

11.1. Basic properties of the absolutely convergent integral.

The next few propositions collect some basic properties of the absolutely convergent integral. Let $L^1(X, \mathcal{M}, \mu)$ denote the set of all absolutely integrable $\mathbb{C}$-valued functions on $X$. (If the measure space is understood, as it is in this section, we just write $L^1$.)

Theorem 11.1 (Basic properties of $L^1$ functions). Let $f, g \in L^1$ and $c \in \mathbb{C}$. Then:
a) $L^1$ is a vector space over $\mathbb{C}$ and the map $f \mapsto \int f$ is a linear functional on it.
b) $\left| \int f \right| \le \int |f|$.
c) $\|cf\|_1 = |c| \|f\|_1$.
d) $\|f + g\|_1 \le \|f\|_1 + \|g\|_1$.

Proof. a) If $f, g \in L^1$, then so is $f + g$, since $f + g$ is measurable and $|f + g| \le |f| + |g|$. Likewise if $c$ is constant, then $\int |cf| = |c| \int |f| < \infty$ (using homogeneity of the unsigned integral), so $cf \in L^1$. To prove that $f \mapsto \int f$ is linear, first assume $f$ and $g$ are real-valued; the complex case then follows by definition. Checking $c \int f = \int cf$ is straightforward. For additivity, let $h = f + g$; then
\[ h^+ - h^- = f^+ + g^+ - f^- - g^- \tag{11.5} \]
and therefore
\[ h^+ + f^- + g^- = h^- + f^+ + g^+. \tag{11.6} \]
Thus
\[ \int (h^+ + f^- + g^-) = \int (h^- + f^+ + g^+), \tag{11.7} \]
which rearranges to $\int h = \int f + \int g$ using the additivity of the unsigned integral, and the finiteness of all the integrals involved.

b) If $f$ is real, then
\[ \left| \int f \right| = \left| \int f^+ - \int f^- \right| \le \int f^+ + \int f^- = \int |f|. \tag{11.8} \]

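When $f$ is complex, assume $\int f \ne 0$ and let $t = \overline{\operatorname{sgn} \int f}$, where $\operatorname{sgn} z = z/|z|$. Then $|t| = 1$ and $\left| \int f \right| = t \int f = \int tf$. It follows that
\[ \left| \int f \right| = \operatorname{Re} \int tf = \int \operatorname{Re}(tf) \le \int |tf| = \int |f|. \tag{11.9} \]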

c,d) These are immediate from the observations already made in the proof of (a).

Because of cancellation, it is clear that $\int f = 0$ does not imply $f = 0$ a.e. when $f$ is a signed or complex function. However the conclusion $f = 0$ a.e. can be recovered if we assume the vanishing of all the integrals $\int_E f$, over all measurable sets $E$.

Proposition 11.2. Let $f \in L^1$. The following are equivalent:
a) $f = 0$ almost everywhere,
b) $\int |f| = 0$,
c) For every measurable set $E$, $\int_E f = 0$.

Proof. Since $f = 0$ a.e. if and only if $|f| = 0$ a.e., (a) and (b) are equivalent by Theorem 10.2(e). Now assuming (b), if $E$ is measurable then by monotonicity and Theorem 11.1(b)
\[ \left| \int_E f \right| \le \int_E |f| \le \int |f| = 0, \tag{11.10} \]
so (c) holds. Finally, to prove (c) implies (a) we prove the contrapositive. Assume (a) is false. If $f \ne 0$ on a set of positive measure, then one of $u = \operatorname{Re} f$, $v = \operatorname{Im} f$ is nonzero on a set of positive measure; suppose without loss of generality it is $u$. Then in turn one of $u^+, u^-$ is nonzero on a set of positive measure; suppose it is $u^+$, and let $E = \{x : u^+(x) > 0\}$, so that $u = u^+$ on $E$. Then by Theorem 10.2(e) again,
\[ \operatorname{Re} \int_E f = \int_E u^+ > 0, \tag{11.11} \]
so (c) is false.

Corollary 11.3. If $f, g \in L^1$ and $f = g$ $\mu$-a.e., then $\int f = \int g$.

Proof. Apply Proposition 11.2 to $f - g$.

The preceding proposition and its corollary say that for the purposes of integration, we are free to alter the definition of functions on sets of measure zero. In particular, if $f : X \to \overline{\mathbb{R}}$ is finite valued almost everywhere, then there is another function $g$ which is finite everywhere and equal to $f$ a.e. (Simply define $g$ to be 0 (or any other finite value) on the set $E$ where $f = \pm\infty$.)

Another consequence is that we can introduce an equivalence relation on $L^1(X, \mathcal{M}, \mu)$ by declaring $f \sim g$ if and only if $f = g$ a.e. If $[f]$ denotes the equivalence class of $f$ under this relation, we may define the integral on equivalence classes by declaring $\int [f] := \int f$. Corollary 11.3 shows that this is well-defined. It is straightforward to check that $[cf + g] = [cf] + [g]$ for all $f, g \in L^1$ and scalars $c$ (so that $L^1/\sim$ is a vector space), and that the properties of the integral given in Theorem 11.1 all persist if we work with equivalence classes. The advantage is that now $\int [|f|] = 0$ if and only if $[|f|] = 0$. This means that the quantity $\|[f]\|_1$ is a norm on $L^1/\sim$. Henceforth we will agree to impose this relation whenever we talk about $L^1$, but for simplicity we will drop the $[\cdot]$ notation, and also write just $L^1$ for $L^1/\sim$. So, when we refer to an $L^1$ function $f$, it is now understood that we refer to the equivalence class of functions equal to $f$ a.e., but in practice this abuse of terminology should cause no confusion.
Just as the Monotone Convergence Theorem is associated to the unsigned integral, there is a convergence theorem for the absolutely convergent integral.

Theorem 11.4 (Dominated Convergence Theorem). Suppose that 1) $(f_n)_{n=1}^{\infty} \subset L^1$ and $f_n \to f$ pointwise a.e., and 2) there exists a function $g \in L^1$ such that for every $n$, we have $|f_n| \le g$ a.e. Then $f \in L^1$, and
\[ \lim_{n\to\infty} \int f_n = \int f. \tag{11.12} \]

Proof. Since $|f_n| \le g$ a.e. and $f_n \to f$ a.e., we have $|f| \le g$ a.e., so $f \in L^1$. By considering the real and imaginary parts separately, we may assume $f$ and all the $f_n$ are real valued. By hypothesis, $g \pm f_n \ge 0$ a.e. Applying Fatou's theorem to these sequences, we find
\[ \int g + \int f \le \liminf \int (g + f_n) = \int g + \liminf \int f_n \tag{11.13} \]
and
\[ \int g - \int f \le \liminf \int (g - f_n) = \int g - \limsup \int f_n. \tag{11.14} \]
Since $\int g$ is finite, it follows that $\liminf \int f_n \ge \int f \ge \limsup \int f_n$.

The conclusion $\int f_n \to \int f$ (equivalently, $\int f_n - \int f \to 0$) can be strengthened somewhat:

Corollary 11.5. Let $f_n, f, g$ satisfy the hypotheses of the Dominated Convergence Theorem. Then $\lim_{n\to\infty} \|f_n - f\|_1 = 0$ (that is, $\lim \int |f_n - f| = 0$).

Proof. Problem 13.19.

Theorem 11.6 (Density of simple functions in $L^1$). Let $f \in L^1$. Then there is a sequence of integrable simple functions $f_n$ such that:
a) $|f_n| \le |f|$ for all $n$,
b) $f_n \to f$ pointwise, and
c) $\lim_{n\to\infty} \|f_n - f\|_1 = 0$.

Proof. Write $f = u + iv$ with $u, v$ real, and $u = u^+ - u^-$, $v = v^+ - v^-$. Each of the four functions $u^\pm, v^\pm$ is nonnegative and measurable, so by the ziggurat approximation we can choose four sequences of measurable simple functions $u_n^\pm, v_n^\pm$ increasing pointwise to $u^\pm, v^\pm$ respectively. Now put $u_n = u_n^+ - u_n^-$, $v_n = v_n^+ - v_n^-$, and $f_n = u_n + iv_n$. By construction, each $f_n$ is simple. Moreover
\[ |u_n| = u_n^+ + u_n^- \le u^+ + u^- = |u|, \tag{11.15} \]
and similarly $|v_n| \le |v|$, so $|f_n| \le |f|$. It follows that each $f_n$ is integrable, and $f_n \to f$ pointwise by construction. Finally, since $f \in L^1$, the $f_n$ satisfy the hypothesis of the dominated convergence theorem (with $g = |f|$), so (c) follows from Corollary 11.5.

12. Modes of convergence

In this section we consider five different ways in which a sequence of functions on a measure space $(X, \mathcal{M}, \mu)$ can be said to converge. There is no simple or easily summarized description of the relation between the five modes; at the end of the section the reader is encouraged to draw a diagram showing the implications.

12.1. The five modes of convergence.

Definition 12.1. Let $(X, \mathcal{M}, \mu)$ be a measure space and $(f_n)_{n=1}^{\infty}$, $f$ be measurable functions.
a) Say $f_n$ converges to $f$ pointwise almost everywhere if $f_n(x)$ converges to $f(x)$ for $\mu$-a.e. $x \in X$.
b) Say $f_n$ converges to $f$ essentially uniformly or in the $L^\infty$ norm if for every $\epsilon > 0$, there exists $N \in \mathbb{N}$ such that $|f_n(x) - f(x)| < \epsilon$ for all $n \ge N$ and $\mu$-a.e. $x \in X$.
c) Say $f_n$ converges to $f$ almost uniformly if for every $\epsilon > 0$, there exists an exceptional set $E \in \mathcal{M}$ such that $\mu(E) < \epsilon$ and $f_n \to f$ uniformly on the complement of $E$.
d) Say $f_n$ converges to $f$ in the $L^1$ norm if $\|f_n - f\|_1 := \int_X |f_n - f| \, d\mu \to 0$ as $n \to \infty$.
e) Say $f_n$ converges to $f$ in measure if for every $\epsilon > 0$, $\mu(\{x : |f_n(x) - f(x)| > \epsilon\}) \to 0$ as $n \to \infty$.

The first thing to notice is that each of these modes of convergence is unaffected if $f$ or the $f_n$ are modified on sets of measure 0 (this is not true of ordinary pointwise or uniform convergence). Thus these are modes of convergence appropriate to measure theory. The $L^1$ and $L^\infty$ modes are special cases of $L^p$ convergence, which will be studied later in the course.

We first treat a few basic properties common to all five modes of convergence.

Proposition 12.2 (Linearity of convergence). Let $(f_n), (g_n), f, g$ be measurable functions.
a) For each of the five modes, $f_n \to f$ in the given mode if and only if $|f_n - f| \to 0$ in the given mode.
b) If $f_n \to f$ and $g_n \to g$ in a given mode, then $f_n + g_n \to f + g$ in the given mode.

Proof. The proof is left as an exercise. (Problem 13.23)

Proposition 12.3 (Easy implications). Let (fn) be a sequence of measurable functions.

a) If $f_n \to f$ essentially uniformly, then $f_n \to f$ pointwise a.e. and almost uniformly.
b) If $f_n \to f$ almost uniformly, then $f_n \to f$ pointwise a.e. and in measure.
c) If $f_n \to f$ in the $L^1$ norm, then $f_n \to f$ in measure.

Proof. (a) is immediate from definitions. For (b), given $m \ge 1$ we can find a measurable set $E_m$ with $\mu(E_m) < 2^{-m}$ such that $f_n \to f$ uniformly (hence pointwise) on $E_m^c$. It follows that $f_n \to f$ pointwise on the set $\bigcup_{m=1}^{\infty} E_m^c$; the complement of this set is $\bigcap_{m=1}^{\infty} E_m$, which has measure 0. The second part of (b) is left as an exercise. Finally, (c) follows from Markov's inequality: fix $\epsilon > 0$, then
\[ \mu(\{x : |f_n(x) - f(x)| > \epsilon\}) \le \frac{1}{\epsilon} \int_X |f_n - f| \, d\mu \to 0. \tag{12.1} \]

Of the twenty possible implications that can hold between the five modes of convergence, only the five implications given in the last proposition (together with those obtained by composing them, such as essentially uniform convergence implying convergence in measure) are true without further hypotheses. Counterexamples for the rest can be obtained using the moving bump examples, which we now recall:

a) (Escape to height infinity) $f_n = n 1_{(0, \frac{1}{n})}$
b) (Escape to width infinity) $f_n = \frac{1}{2n} 1_{(-n,n)}$
c) (Escape to horizontal infinity) $f_n = 1_{(n,n+1)}$
To these we add one more:

Example 12.4 (The Typewriter Sequence). Consider Lebesgue measure $m$ on $(0, 1]$. Let $I_{nk} \subset (0, 1]$ denote the dyadic interval $(\frac{k}{2^n}, \frac{k+1}{2^n}]$ for $n \ge 1$, $0 \le k < 2^n$. List these intervals in dictionary order (first by increasing $n$, then by increasing $k$). So the first few intervals are $I_{10} = (0, \frac{1}{2}]$, $I_{11} = (\frac{1}{2}, 1]$, $I_{20} = (0, \frac{1}{4}]$, $I_{21} = (\frac{1}{4}, \frac{1}{2}]$, etc. (Draw a picture to see what is going on.) The sequence of indicator functions of these intervals (in this order) converges in measure to 0, since for any $0 < \epsilon < 1$ we have $m(\{x : 1_{I_{nk}}(x) > \epsilon\}) = m(I_{nk}) = 2^{-n}$. However, since each point in $(0, 1]$ lies in infinitely many $I_{nk}$ and also lies outside infinitely many $I_{nk}$, the sequence $1_{I_{nk}}(x)$ does not converge at any point of $(0, 1]$.

These four examples suffice to produce counterexamples to all of the implications not covered in Proposition 12.3. For example, the typewriter sequence converges to 0 in the $L^1$ norm and in measure, but in none of the other three modes. The rest are left as exercises.

To understand the differing modes of convergence, it is helpful to work out what they say in the simplest possible case, namely that of step functions. A step function is a function of the form $c1_E$ for a positive constant $c$ and measurable set $E$. In this section we obtain criteria for a sequence of step functions to converge to 0 in each of the five modes. It turns out that the answers are more or less completely determined by three objects associated to the sequence $(c_n 1_{E_n})_{n=1}^{\infty}$: the heights $c_n$, the widths $\mu(E_n)$, and the tail supports $T_n := \bigcup_{j \ge n} E_j$. The proof of the following theorem involves nothing more than the definitions, but is an instructive exercise.

Theorem 12.5. Let $f_n = c_n 1_{E_n}$ be a sequence of step functions. Then:
a) $f_n \to 0$ pointwise a.e. if and only if either $c_n \to 0$ or $\mu(\bigcap_{n=1}^{\infty} T_n) = 0$ (that is, $\limsup E_n$ is a null set).
b) $f_n \to 0$ in $L^\infty$ if and only if $c_n \to 0$.
c) $f_n \to 0$ almost uniformly if and only if either $c_n \to 0$ or $\mu(T_n) \to 0$.
d) $f_n \to 0$ in $L^1$ if and only if $c_n \mu(E_n) \to 0$.
e) $f_n \to 0$ in measure if and only if either $c_n \to 0$ or $\mu(E_n) \to 0$.

Proof. Problem 13.27. To go further we begin with a closer investigation of convergence in measure.
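As a concrete companion to the typewriter sequence of Example 12.4, the sketch below enumerates the dyadic intervals in dictionary order and checks, for finitely many indices only, that the widths shrink while the indicator values at a fixed point keep returning to 1. It also lists the subsequence $I_{n0} = (0, 2^{-n}]$, whose indicators tend to 0 at every point of $(0,1]$. The indexing convention (a single index $j = 1, 2, \dots$) and the function names are assumptions made for this illustration.

```python
from fractions import Fraction

def typewriter_interval(j):
    """Endpoints (left, right) of the j-th interval (left, right], j = 1, 2, ..., in dictionary order."""
    n = 1
    while j > 2**n:        # skip the 2^n intervals of generation n
        j -= 2**n
        n += 1
    k = j - 1
    return Fraction(k, 2**n), Fraction(k + 1, 2**n)

def indicator(j, x):
    left, right = typewriter_interval(j)
    return 1 if left < x <= right else 0

x = Fraction(1, 3)
last_index = 2**6 - 2      # generations n = 1..5 occupy indices 1..62
values = [indicator(j, x) for j in range(1, last_index + 1)]
widths = [typewriter_interval(j)[1] - typewriter_interval(j)[0] for j in range(1, last_index + 1)]

print("hits at x = 1/3:", sum(values), "out of", len(values))
print("final width:", widths[-1])
# Subsequence picking the first interval of each generation: index 2^n - 1 gives (0, 2^-n].
print([typewriter_interval(2**n - 1) for n in range(1, 6)])
```

Each generation contributes exactly one index at which the indicator equals 1 at the chosen point, which is why the full sequence has no pointwise limit there even though the widths tend to 0.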

Definition 12.6. Let $(f_n)_{n=1}^{\infty}$ be a sequence of $\mathbb{C}$-valued measurable functions. Say $(f_n)$ is Cauchy in measure if for every $\epsilon > 0$,
\[ \mu(\{x : |f_n(x) - f_m(x)| > \epsilon\}) \to 0 \quad \text{as } m, n \to \infty. \tag{12.2} \]

It is readily seen that if fn → f in measure, then the sequence (fn) is Cauchy in measure. Indeed, note that by the triangle inequality

\[ \{x : |f_n(x) - f_m(x)| > \epsilon\} \subset \{x : |f_n(x) - f(x)| > \epsilon/2\} \cup \{x : |f_m(x) - f(x)| > \epsilon/2\}, \tag{12.3} \]
and the claim follows by subadditivity of $\mu$. The converse also holds; this will be proved shortly. An inspection of the definition yields the following equivalent formulation of Definition ??

Proposition 12.7. Let $(f_n)_{n=1}^{\infty}$ be a sequence of measurable functions. Then $(f_n)$ is Cauchy in measure if and only if for every $\epsilon > 0$ there exists an integer $N \ge 1$ such that
\[ \mu(\{x : |f_n(x) - f_m(x)| > \epsilon\}) < \epsilon \tag{12.4} \]
for all $n, m \ge N$. Similarly $f_n \to f$ in measure if and only if for every $\epsilon > 0$ there exists $N$ such that
\[ \mu(\{x : |f_n(x) - f(x)| > \epsilon\}) < \epsilon \tag{12.5} \]
for all $n \ge N$.

Proof. Problem ??. We have already seen that convergence in measure does not imply pointwise a.e. convergence (the typewriter sequence). Note, however, that in that example there is at least a subsequence converging pointwise a.e. to 0 (give an example). It turns out this is generic:

Proposition 12.8. Let $(f_n)_{n=1}^{\infty}$ be a sequence of measurable functions which is Cauchy in measure. Then:
a) there is a measurable function $f$ and a subsequence $(f_{n_k})_{k=1}^{\infty}$ such that $f_{n_k} \to f$ a.e. as $k \to \infty$.
b) With $f$ as in part (a), we have $f_n \to f$ in measure, and if also $f_n \to g$ in measure, then $f = g$ a.e.

In other words, if the sequence $(f_n)$ is Cauchy in measure, then it converges in measure to a unique (a.e.) $f$, and a subsequence of $(f_n)$ converges to $f$ a.e.

Proof of Proposition 12.8. a) Taking $\epsilon = 2^{-k}$ in Proposition 12.7, we can choose inductively a sequence of integers $n_1 < n_2 < \dots < n_k < \dots$ such that
\[ \mu(\{x : |f_{n_k}(x) - f_{n_{k+1}}(x)| > 2^{-k}\}) < 2^{-k}. \tag{12.6} \]
Put $g_k = f_{n_k}$. We claim that for a.e. $x \in X$, the sequence $(g_k(x))_{k=1}^{\infty}$ is Cauchy. Let
\[ E_k = \{x : |g_k(x) - g_{k+1}(x)| > 2^{-k}\}; \tag{12.7} \]
by (12.6) we have $\mu(E_k) < 2^{-k}$. Now let $F_N = \bigcup_{k=N}^{\infty} E_k$; then $\mu(F_N) \le \sum_{k=N}^{\infty} 2^{-k} = 2^{-(N-1)}$. For $x \notin F_N$ and $m \ge n \ge N$, we have the estimate
\[ |g_n(x) - g_m(x)| \le \sum_{k=n}^{m-1} |g_{k+1}(x) - g_k(x)| \le \sum_{k=n}^{m-1} 2^{-k} \le 2^{-(N-1)}. \tag{12.8} \]
It follows that $(g_k(x))$ is Cauchy when $x \notin F_N$. If we now put $F = \bigcap_{N=1}^{\infty} F_N = \limsup E_k$, then $(g_k(x))$ is Cauchy for $x \notin F$, and $\mu(F) = \mu(\limsup E_k) = 0$ (why?). In other words, $(g_k)$ is Cauchy (and hence converges) a.e.

b) Let $(g_k)$ and $f$ be as in the proof of part (a). Taking $m \to \infty$ in the estimate (??) we see that $|f(x) - g_n(x)| \le 2^{-(N-1)}$ when $n \ge N$ and $x \notin F_N$. Since $\mu(F_N) \to 0$ as $N \to \infty$, it follows that $g_n \to f$ in measure. In turn it follows that $f_n \to f$ in measure, using the triangle inequality as in (??):
\[ \{x : |f_n(x) - f(x)| > \epsilon\} \subset \{x : |f_n(x) - g_k(x)| > \epsilon/2\} \cup \{x : |g_k(x) - f(x)| > \epsilon/2\}, \tag{12.9} \]
and the measures of the sets on the right can be made less than $\epsilon$ by choosing $n, k$ sufficiently large. Finally, suppose also $f_n \to g$ in measure. Then by one more application of the triangle inequality trick,
\[ \{x : |f(x) - g(x)| > \epsilon\} \subset \{x : |f(x) - f_n(x)| > \epsilon/2\} \cup \{x : |f_n(x) - g(x)| > \epsilon/2\}, \tag{12.10} \]
and the measures of the sets on the right-hand side go to 0 by hypothesis. So $\mu(\{x : |f(x) - g(x)| > \epsilon\}) = 0$ for all $\epsilon$, so taking a sequence of $\epsilon$'s tending to 0, we find that $f = g$ a.e.

From Proposition 12.3(c) we know that $L^1$ convergence implies convergence in measure. We then obtain:

Corollary 12.9. If $f_n \to f$ in $L^1$, then $(f_n)$ has a subsequence converging to $f$ a.e.

Proof. Combine Proposition 12.3(c) and Proposition 12.8.

12.2. Finite measure spaces.

Observe that two of the "moving bump" examples (escape to width infinity and escape to horizontal infinity) exploit the fact that Lebesgue measure on $\mathbb{R}$ is infinite. Moreover, in some cases these are the only counterexamples (of the four) to particular implications: for example, escape to width infinity is the only example where we have $L^\infty$ convergence but not convergence in $L^1$, and escape to horizontal infinity is the only one where we have pointwise a.e. convergence but not convergence in measure. It is then plausible that if we work on a finite measure space ($\mu(X) < \infty$), where these examples are "turned off," we might recover further convergence results. This is indeed the case.

Proposition 12.10. Suppose $(X, \mathcal{M}, \mu)$ is a finite measure space and $(f_n)$ is a sequence in $L^1$. If $f_n \to f$ essentially uniformly, then $f \in L^1$ and $f_n \to f$ in $L^1$.

Proof. Problem 13.26.

Theorem 12.11 (Egorov's Theorem). Let $(X, \mathcal{M}, \mu)$ be a measure space with $\mu(X) < \infty$. If $f_n : X \to \mathbb{C}$ is a sequence of measurable functions and $f_n \to f$ a.e., then $f_n \to f$ almost uniformly.

Proof. Fix $\epsilon > 0$. There is no loss of generality in assuming $f_n \to f$ everywhere. Define the sets
\[ E_{N,k} = \bigcup_{n=N}^{\infty} \{x : |f_n(x) - f(x)| \ge \tfrac{1}{k}\} \tag{12.11} \]
for $N, k \ge 1$. Fix $k$. If $x$ lies in $E_{N,k}$ for every $N$, then $f_n(x) \not\to f(x)$, so for each $k$, $\bigcap_{N \ge 1} E_{N,k} = \emptyset$. Since the $E_{N,k}$ are decreasing with $N$ and $\mu(X) < \infty$, we have by dominated convergence for sets $\lim_{N\to\infty} \mu(E_{N,k}) = 0$ for each $k$. Now for each $k$ choose $N_k$ such that $\mu(E_{N_k,k}) < \epsilon 2^{-k}$. Let $E = \bigcup_{k=1}^{\infty} E_{N_k,k}$. Then $\mu(E) < \epsilon$. To say that $x \notin E$ means that for every $k$ and all $n \ge N_k$, $|f_n(x) - f(x)| < \frac{1}{k}$. Since $N_k$ does not depend on $x$, this says $f_n \to f$ uniformly on $E^c$. Thus $f_n \to f$ almost uniformly.

Remark 12.12. Note that in Egorov's theorem, almost uniform convergence cannot be improved to essential uniform convergence, as the moving bump example $1_{[0, \frac{1}{n}]}$ shows.

12.3. Uniform integrability.

In the last section we saw that some convergence implications could be recovered by making an assumption ($\mu(X) < \infty$) that "turns off" some of the failure modes. In this section we do something similar. In particular note that the moving bump examples show that of the five modes, $L^1$ convergence is the only one which implies $\int f_n \to \int f$ (assuming the $f_n$ and $f$ are integrable). The main result of this section, a version of the Vitali convergence theorem, says that if we make certain assumptions on $f_n$ (namely "uniform integrability") that turn off the moving bump examples, then we can conclude that $L^1$ convergence is equivalent to convergence in measure. This is similar in spirit to the classical Vitali Convergence Theorem (which we will cover later in the context of $L^p$ spaces), though the approach used here (borrowed from T. Tao, An Introduction to Measure Theory, Section 1.5) is slightly different.

Definition 12.13 (Uniform integrability). Let $f_n : X \to \mathbb{C}$ be a sequence of $L^1$ functions. Say that the sequence is uniformly integrable if all of the following hold:
(i) [Uniform bound on $L^1$ norm]: $\sup_n \|f_n\|_1 < \infty$.
(ii) [No escape to vertical infinity]: $\sup_n \int_{\{|f_n| \ge M\}} |f_n| \, d\mu \to 0$ as $M \to +\infty$.
(iii) [No escape to width infinity]: $\sup_n \int_{\{|f_n| \le \delta\}} |f_n| \, d\mu \to 0$ as $\delta \to 0$.
(To visualize conditions (ii) and (iii), work out what they say for sequences of step functions.)

We warm up by observing that, for a single $L^1$ function $f$, we have by the Dominated Convergence theorem
$$\lim_{M \to +\infty} \int_{\{|f| > M\}} |f| \, d\mu = 0 \tag{12.12}$$
and
$$\lim_{\delta \to 0} \int_{\{|f| \le \delta\}} |f| \, d\mu = 0. \tag{12.13}$$
So uniform integrability says the quantities $\int_{\{|f_n| > M\}} |f_n| \, d\mu$ and $\int_{\{|f_n| \le \delta\}} |f_n| \, d\mu$ can be made arbitrarily small by choice of large $M$ and small $\delta$, independently of $n$. Before going further we give an equivalent formulation of (ii) (assuming (i)):

Lemma 12.14. Suppose $(f_n)$ is a sequence of $L^1$ functions such that $\sup_n \|f_n\|_1 < \infty$. Then condition (ii) in Definition 12.13 is equivalent to:
(ii)$'$ For every $\varepsilon > 0$, there exists a $\delta > 0$ such that if $\mu(E) < \delta$, then $\int_E |f_n| < \varepsilon$ for all $n$.

Proof. Suppose (ii) holds and let $\varepsilon > 0$. Choose $M$ such that $\sup_n \int_{\{|f_n| \ge M\}} |f_n| < \frac{\varepsilon}{2}$, and let $\delta < \frac{\varepsilon}{2M}$. Let $E$ be a measurable set with $\mu(E) < \delta$. Then for all $n$,
$$\int_E |f_n| \, d\mu = \int_{E \cap \{|f_n| \ge M\}} |f_n| \, d\mu + \int_{E \cap \{|f_n| < M\}} |f_n| \, d\mu \le \frac{\varepsilon}{2} + M \mu(E) < \varepsilon.$$
Conversely, suppose (ii)$'$ holds. Let $\varepsilon > 0$, and for $\delta > 0$ satisfying (ii)$'$ choose $M$ large enough that $\frac{1}{M} \sup_n \|f_n\|_1 < \delta$. Then by Markov's inequality, for all $n$ we have $\mu(\{|f_n| \ge M\}) \le \frac{\|f_n\|_1}{M} < \delta$. Thus by (ii)$'$
$$\int_{\{|f_n| \ge M\}} |f_n| < \varepsilon \tag{12.14}$$
for all $n$. This proves (ii).

Remark 12.15. On a finite measure space, escape to width infinity is impossible, and a sequence is uniformly integrable if and only if $\sup_n \|f_n\|_1 < \infty$ and (ii) (equivalently, (ii)$'$) is satisfied.
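Definition 12.13 suggests working out conditions (ii) and (iii) for step functions. The following is a minimal sketch of that exercise for three illustrative families of the form $h \cdot 1_{[0,w]}$ on $[0,+\infty)$ with Lebesgue measure. The family names `tall`, `wide`, `tame` and the helper functions are hypothetical, and the supremum over $n$ is only approximated by a maximum over $1 \le n \le$ `n_max`.

```python
from fractions import Fraction


def tail_high(h, w, M):
    """Integral of |f| over {|f| >= M} for the step function f = h * 1_[0, w]."""
    return h * w if h >= M else Fraction(0)


def tail_low(h, w, delta):
    """Integral of |f| over {|f| <= delta}; the set {f = 0} contributes nothing."""
    return h * w if h <= delta else Fraction(0)


def sup_over_n(family, tail, level, n_max=2000):
    """Finite proxy for sup_n: maximum of the tail quantity over 1 <= n <= n_max."""
    return max(tail(*family(n), level) for n in range(1, n_max + 1))


def tall(n):   # n * 1_[0, 1/n]: mass escapes to vertical infinity
    return (Fraction(n), Fraction(1, n))


def wide(n):   # (1/n) * 1_[0, n]: mass escapes to width infinity
    return (Fraction(1, n), Fraction(n))


def tame(n):   # n * 1_[0, 1/n^2]: L^1 norm 1/n, no escape
    return (Fraction(n), Fraction(1, n * n))


for name, family in (("tall", tall), ("wide", wide), ("tame", tame)):
    high = [sup_over_n(family, tail_high, M) for M in (10, 100, 1000)]
    low = [sup_over_n(family, tail_low, Fraction(1, k)) for k in (10, 100, 1000)]
    print(name, [str(v) for v in high], [str(v) for v in low])
```

All three families satisfy (i). In this sketch the quantity in (ii) stays at 1 for `tall`, the quantity in (iii) stays at 1 for `wide`, and both quantities tend to 0 for `tame`.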

Proposition 12.16. If $(f_n)$ is a sequence of $L^1$ functions and $f_n \to f$ in the $L^1$ norm, then $(f_n)$ is uniformly integrable.

Proof. Problem 13.33.

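Theorem 12.17. Let $f_n : X \to \mathbb{C}$ be a uniformly integrable sequence of functions and $f : X \to \mathbb{C}$ a measurable function. Then $f_n \to f$ in the $L^1$ norm if and only if $f_n \to f$ in measure.

Proof. The forward implication follows from Proposition 12.3. For the converse, suppose $f_n \to f$ in measure. We first observe that $f \in L^1$: by uniform integrability, there is a constant $C$ such that $\int_X |f_n| \le C$ for all $n$, and by Corollary 12.9 there is a subsequence $f_{n_k} \to f$ a.e. Applying Fatou's theorem to this subsequence, we conclude
$$\int_X |f| \le C, \tag{12.15}$$
so $f \in L^1$. Now let $\varepsilon > 0$. By uniform integrability there is a $\delta > 0$ such that, for all $n$,
$$\int_{\{|f_n| \le \delta\}} |f_n| \, d\mu \le \varepsilon. \tag{12.16}$$
Again using Fatou we have the same for $f$ (after shrinking $\delta$ if necessary):
$$\int_{\{|f| \le \delta\}} |f| \, d\mu \le \varepsilon. \tag{12.17}$$
Now fix another number $0 < \eta < \delta/2$ (to be specified later). On the region where $|f_n - f| < \eta$ and $|f| \le \delta/2$ we have $|f_n| < \delta/2 + \eta < \delta$, so from (12.16) and (12.17) we have
$$\int_{\{|f_n - f| < \eta,\ |f| \le \delta/2\}} |f_n| \, d\mu \le \varepsilon \tag{12.18}$$
and
$$\int_{\{|f_n - f| < \eta,\ |f| \le \delta/2\}} |f| \, d\mu \le \varepsilon. \tag{12.19}$$
So by the triangle inequality
$$\int_{\{|f_n - f| < \eta,\ |f| \le \delta/2\}} |f_n - f| \, d\mu \le 2\varepsilon. \tag{12.20}$$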

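We now estimate the integral of $|f_n - f|$ over the region $|f_n - f| < \eta$, $|f| > \delta/2$; here the integrand is bounded pointwise by $\eta$, and we control the measure of the region via Markov's inequality. Indeed, we have
$$\mu(\{x : |f(x)| > \delta/2\}) \le \frac{C}{\delta/2} \tag{12.21}$$
so
$$\int_{\{|f_n - f| < \eta,\ |f| > \delta/2\}} |f_n - f| \, d\mu \le \frac{C}{\delta/2}\, \eta \tag{12.22}$$
so if $\eta$ is chosen sufficiently small (depending on $C$, $\varepsilon$, $\delta$) we conclude
$$\int_{\{|f_n - f| < \eta,\ |f| > \delta/2\}} |f_n - f| \, d\mu \le \varepsilon. \tag{12.23}$$
Combining this with (12.20) we get
$$\int_{\{|f_n - f| < \eta\}} |f_n - f| \, d\mu \le 3\varepsilon. \tag{12.24}$$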

Finally we need to estimate the integral of $|f_n - f|$ over the set $|f_n - f| \ge \eta$; for this we use the convergence in measure hypothesis. In particular choose $N$ such that for all $n \ge N$,
$$\mu(\{x : |f_n(x) - f(x)| \ge \eta\}) \le \eta \tag{12.25}$$
(Proposition 12.7). Shrinking $\eta$ if necessary (and enlarging $N$ correspondingly), by Lemma 12.14 we have
$$\int_{\{|f_n - f| \ge \eta\}} |f_n| \, d\mu \le \varepsilon \tag{12.26}$$
and
$$\int_{\{|f_n - f| \ge \eta\}} |f| \, d\mu \le \varepsilon \tag{12.27}$$
so again by the triangle inequality
$$\int_{\{|f_n - f| \ge \eta\}} |f_n - f| \, d\mu \le 2\varepsilon \tag{12.28}$$
for all $n$ sufficiently large. Putting this together with (12.24) we have finally
$$\int_X |f_n - f| \, d\mu \le 5\varepsilon \tag{12.29}$$
for all $n$ sufficiently large, which completes the proof.

Remark 12.18. Say that a sequence of measurable functions $f_n : X \to \mathbb{C}$ is dominated if there is an $L^1$ function $g$ such that $|f_n| \le |g|$ for all $n$. It is not hard to show (Problem 13.32) that if a sequence $(f_n)$ is dominated, then it is uniformly integrable. The converse is false, however (give an example). It is true that if $(f_n)$ is dominated, and $f_n \to f$ a.e., then also $f_n \to f$ almost uniformly, and hence in measure (Problem 13.35). By this circuitous route the dominated convergence theorem can be obtained as a consequence of Theorem 12.17. The main utility of Theorem 12.17 is thus in proving $L^1$ convergence for sequences that are not dominated. Combining Theorem 12.17 with Proposition 12.16, we obtain

Corollary 12.19. Suppose $f_n \to f$ in measure. Then $f_n \to f$ in the $L^1$ norm if and only if the sequence $(f_n)$ is uniformly integrable.

13. Problems 13.1. Measurable functions.

Problem 13.1. Suppose $f : X \to \mathbb{C}$ is a measurable function. Prove that the functions $|f|$ and $\operatorname{sgn} f$ are measurable. (Recall that for a complex number $z$, $\operatorname{sgn}(z) = z/|z|$ if $z \ne 0$, and $\operatorname{sgn}(0) = 0$.)

Problem 13.2. Let $f : \mathbb{R} \to \mathbb{R}$ be a function. For each of the following, prove or give a counterexample.
a) Suppose $f^2$ is Lebesgue measurable. Does it follow that $f$ is Lebesgue measurable?
b) Suppose $f^3$ is Lebesgue measurable. Does it follow that $f$ is Lebesgue measurable?

Problem 13.3. Recall the definition of an atomic σ-algebra (Problem 7.2). Prove that if $(X, \mathcal{M})$ is a measurable space and $\mathcal{M}$ is an atomic σ-algebra, then a function $f : X \to \mathbb{R}$ is measurable if and only if it is constant on each atom $A_n$.

Problem 13.4. Prove that every monotone function $f : \mathbb{R} \to \mathbb{R}$ is Borel measurable.

Problem 13.5. Let $f_n : X \to \mathbb{R}$ be a sequence of measurable functions. Prove that the set $\{x \in X : \lim_{n\to\infty} f_n(x) \text{ exists}\}$ is measurable.

Problem 13.6. Give an example of an uncountable collection $\mathcal{F}$ of Lebesgue measurable functions $f : \mathbb{R} \to [0, +\infty]$ such that the function $g(x) = \sup_{f \in \mathcal{F}} f(x)$ is not Lebesgue measurable.

Problem 13.7. Let $f : [0, 1] \to [0, 1]$ denote the Cantor-Lebesgue function of Example 6.2(c) and define $g(x) = f(x) + x$.
(i) Prove that $g$ is a homeomorphism of $[0, 1]$ onto $[0, 2]$. (Hint: it suffices to show $g$ is a continuous bijection.)
(ii) Let $C \subset [0, 1]$ denote the Cantor set. Prove that $m(g(C)) = 1$.
(iii) By the remark following Example 4.12, $g(C)$ contains a nonmeasurable set $E$. Show that $g^{-1}(E)$ is Lebesgue measurable, but not Borel.
(iv) Prove that there exist functions $F, G$ on $\mathbb{R}$ such that $F$ is Lebesgue measurable, $G$ is continuous, but $F \circ G$ is not Lebesgue measurable.

Problem 13.8. Prove that if $f, g$ are unsigned measurable functions, then $fg$ is measurable.

Problem 13.9. Prove Proposition 8.14.

13.2. The unsigned integral.

Problem 13.10. Complete the proof of Theorem 9.2.

Problem 13.11. Prove that if $f$ is an unsigned measurable function and $\int f < +\infty$, then the set $E := \{x : f(x) > 0\}$ is σ-finite. (That is, $E$ can be written as a disjoint union of measurable sets $E = \bigcup_{n=1}^{\infty} E_n$ with each $\mu(E_n) < +\infty$.)

Problem 13.12. Suppose that $f$ is an unsigned measurable function and $\int f < +\infty$.
a) Prove that for every $\varepsilon > 0$ there is a measurable set $E$ with $\mu(E) < +\infty$, such that $\int f - \int_E f < \varepsilon$.
b) Prove that for every $\varepsilon > 0$ there is a positive integer $n$ such that $\int f - \int \min(f, n) < \varepsilon$.

Problem 13.13. Prove Fatou's Theorem (Theorem 10.7) without using the Monotone Convergence Theorem. Then use Fatou's theorem to prove the Monotone Convergence Theorem.

Problem 13.14. Let $X$ be any set and let $\mu$ be counting measure on $X$. Prove that for every unsigned function $f : X \to [0, +\infty]$, we have $\int_X f \, d\mu = \sum_{x \in X} f(x)$.

Problem 13.15. Prove Theorem 10.10. (Hint: to verify the integral formula, use the Monotone Convergence Theorem.)

Problem 13.16. Prove Theorem 10.12. (Hint: show first that $\mu(E) := L(1_E)$ is a measure, then that $L(f) = \int f \, d\mu$. Problem 7.6 may be helpful.)

Problem 13.17. Let $f$ be an unsigned measurable function on $(X, \mathcal{M}, \mu)$. Prove that the function $\nu(E) := \int_E f \, d\mu$ is a measure on $\mathcal{M}$, and that for all unsigned measurable $g$, we have $\int g \, d\nu = \int g f \, d\mu$.

Problem 13.18. Prove (without using the dominated convergence theorem) that if $f_n$ is a sequence of unsigned measurable functions that decreases pointwise to $f$, and $\int f_N < \infty$ for some $N$, then $\int f = \lim \int f_n$. Give an example to show that the finiteness hypothesis is necessary.

13.3. The signed integral.

Problem 13.19. Prove Corollary 11.5.

Problem 13.20. Prove the following generalization of the dominated convergence theorem: suppose $f_n \to f$ a.e. If $g_n$ is a sequence of $L^1$ functions converging a.e. to an $L^1$ function $g$, if $|f_n| \le g_n$ for all $n$, and $\int g_n \to \int g$, then $\int f_n \to \int f$.

Problem 13.21. Suppose $f_n, f \in L^1$ and $f_n \to f$ a.e. Prove that $\int |f_n - f| \to 0$ if and only if $\int |f_n| \to \int |f|$. (Use the previous problem.)

Problem 13.22. Evaluate each of the following limits, and carefully justify your claims.
a) $\displaystyle \lim_{n\to\infty} \int_0^\infty \frac{\sin(x/n)}{(1 + (x/n))^n} \, dx$
b) $\displaystyle \lim_{n\to\infty} \int_0^\infty \frac{1 + n x^2}{(1 + x^2)^n} \, dx$
c) $\displaystyle \lim_{n\to\infty} \int_0^\infty \frac{n \sin(x/n)}{x (1 + x^2)} \, dx$
d) $\displaystyle \lim_{n\to\infty} \int_a^\infty \frac{n}{1 + n^2 x^2} \, dx$

13.4. Modes of convergence.

Problem 13.23. Prove Proposition 12.2.

Problem 13.24. Prove that if $f_n \to f$ almost uniformly, then $f_n \to f$ in measure.

Problem 13.25. Show that the implications between modes of convergence not given in Proposition 12.3 are false.

Problem 13.26. Prove Proposition 12.10.

Problem 13.27. Prove Theorem 12.5.

Problem 13.28. Prove that if $f_n \to f$ in measure, then there is a subsequence $f_{n_j}$ converging to $f$ almost uniformly. (Hint: this can be extracted from the proof of Proposition 12.8.)

Problem 13.29. Let $(X, \mathcal{M}, \mu)$ be a finite measure space. For measurable functions $f, g : X \to \mathbb{C}$, define
$$d(f, g) = \int_X \frac{|f - g|}{1 + |f - g|} \, d\mu. \tag{13.1}$$
Prove that $d$ is a metric on the set of measurable functions (where we identify $f$ and $g$ when $f = g$ a.e.).

Problem 13.30. Let $(X, \mathcal{M}, \mu)$ be a finite measure space. Prove that $f_n \to f$ in measure if and only if $d(f_n, f) \to 0$, where $d$ is the metric in Problem 13.29.

Problem 13.31 (Fast $L^1$ convergence). Suppose $f_n \to f$ in $L^1$ in such a way that $\sum_{n=1}^{\infty} \|f_n - f\|_1 < \infty$. Prove that $f_n \to f$ almost uniformly. (Note that this hypothesis "turns off" the typewriter sequence.) (Hint: first show that, given $\varepsilon > 0$ and $m \ge 1$, there exists an integer $N$ such that
$$\mu\left( \bigcup_{k=N}^{\infty} \{x : |f_k(x) - f(x)| \ge 2^{-m}\} \right) < \varepsilon. \tag{13.2}$$
To construct your exceptional set $E$, use the $\varepsilon/2^n$ trick.)

Problem 13.32. Prove that if $f_n$ is a dominated sequence, then it is uniformly integrable. Give an example to show that the converse is false.

Problem 13.33. Prove that if $f_n \to f$ in $L^1$, then $(f_n)$ is uniformly integrable (Proposition 12.16).

Problem 13.34. Suppose $f_n \to f$ in measure and $f_n$ is dominated. Give a direct proof that $f_n \to f$ in $L^1$ (without using Theorem 12.17).

Problem 13.35. Prove that if $f_n$ is a dominated sequence, and $f_n \to f$ a.e., then $f_n \to f$ almost uniformly. (Hint: imitate the proof of Egorov's theorem.) (Thus for dominated sequences, a.e. and a.u. convergence are equivalent.)

Problem 13.36 (Defect version of Fatou's theorem). Let $(f_n)$ be a sequence of unsigned $L^1$ functions with $\sup_n \|f_n\|_1 < \infty$. Suppose $f_n \to f$ a.e. Prove that $f_n \to f$ in $L^1$ if and only if $\int f_n \to \int f$.

14. Product measures

We now revisit measures and σ-algebras. Given a pair of measure spaces $(X, \mathcal{M}, \mu)$, $(Y, \mathcal{N}, \nu)$, we would like to construct a "product" measure space over the Cartesian product $X \times Y$. It is natural to ask that Cartesian products $E \times F$ (with $E \in \mathcal{M}$, $F \in \mathcal{N}$) be measurable, and that the measure of $E \times F$ should be $\mu(E) \cdot \nu(F)$. In this section we show that such a construction is possible, and is essentially unique if the factor measure spaces are σ-finite. We first recall the product σ-algebra from Definition ??.

Examples 14.1. a) If $X, Y$ are finite sets given the discrete σ-algebras $2^X, 2^Y$, then $2^X \otimes 2^Y = 2^{X \times Y}$. This is still true if at least one of $X, Y$ is countably infinite, but becomes false when both $X, Y$ are uncountable. (The point is that the cardinality of $2^X \otimes 2^Y$ will be strictly less than that of $2^{X \times Y}$. The proof is an exercise for those familiar with cardinal arithmetic.)

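b) If we take two copies of $\mathbb{R}$ with the Borel σ-algebra $\mathcal{B}_{\mathbb{R}}$, then $\mathcal{B}_{\mathbb{R}} \otimes \mathcal{B}_{\mathbb{R}} = \mathcal{B}_{\mathbb{R}^2}$. (See Proposition 1.12.)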

c) However, if we take two copies of $\mathbb{R}$ with the Lebesgue σ-algebra $\mathcal{L}_{\mathbb{R}}$, we find that $\mathcal{L}_{\mathbb{R}} \otimes \mathcal{L}_{\mathbb{R}}$ is strictly smaller than $\mathcal{L}_{\mathbb{R}^2}$. Indeed if $E \subset [0, 1]$ is a Lebesgue nonmeasurable set, then $G = E \times \{0\}$ belongs to $\mathcal{L}_{\mathbb{R}^2}$ since it is a subset of the null set $[0, 1] \times \{0\}$, but the slice set $G^0 = \{x \in \mathbb{R} : (x, 0) \in G\} = E$ is not in $\mathcal{L}_{\mathbb{R}}$, and hence $G$ is not in $\mathcal{L}_{\mathbb{R}} \otimes \mathcal{L}_{\mathbb{R}}$ by Lemma 14.5 below.

With the product σ-algebra $\mathcal{M} \otimes \mathcal{N}$ in hand, we construct the product measure $\mu \times \nu$.

Theorem 14.2 (Existence and uniqueness of product measure). Let $(X, \mathcal{M}, \mu)$, $(Y, \mathcal{N}, \nu)$ be σ-finite measure spaces. Then there is a unique measure $\mu \times \nu$ on the product σ-algebra $\mathcal{M} \otimes \mathcal{N}$ which satisfies $(\mu \times \nu)(E \times F) = \mu(E)\nu(F)$ for all $E \in \mathcal{M}$, $F \in \mathcal{N}$, and this measure is σ-finite.

Proof. We first show existence, using the Hahn-Kolmogorov theorem. Let us call a set $E \times F$ with $E \in \mathcal{M}$, $F \in \mathcal{N}$ a rectangle, and let $\mathcal{B} \subset \mathcal{M} \otimes \mathcal{N}$ denote the Boolean algebra consisting of all finite unions of rectangles. (Check that this is indeed a Boolean algebra.) Note also that each set $B \in \mathcal{B}$ can be written (not uniquely) as a finite disjoint union of rectangles. For a set $R = \bigcup_{j=1}^{n} (E_j \times F_j) \in \mathcal{B}$ (disjoint union), define
$$\rho(R) = \sum_{j=1}^{n} \mu(E_j)\nu(F_j). \tag{14.1}$$
We claim that $\rho$ is well defined and a premeasure on $\mathcal{B}$. Since any two presentations of $R$ as a finite disjoint union of rectangles have a common refinement, it follows that $\rho$ is well-defined and finitely additive. To show that $\rho$ is a premeasure, it remains to show that it is countably additive whenever a countable disjoint union of sets in $\mathcal{B}$ again lies in $\mathcal{B}$. For this we may assume $R = E \times F$ is a rectangle, and $R$ is decomposed as a countable disjoint union of rectangles $R_n = E_n \times F_n$. Observe that for rectangles, we have the indicator function identity
$$1_{E \times F}(x, y) = 1_E(x) 1_F(y). \tag{14.2}$$
Thus from the decomposition $R = \bigcup_{n=1}^{\infty} R_n$
$$1_E(x) 1_F(y) = \sum_{n=1}^{\infty} 1_{E_n}(x) 1_{F_n}(y). \tag{14.3}$$
Now we fix $x \in X$ and integrate this identity with respect to $\nu$:
$$1_E(x)\nu(F) = \int_Y \sum_{n=1}^{\infty} 1_{E_n}(x) 1_{F_n}(y) \, d\nu(y). \tag{14.4}$$
By the Monotone Convergence Theorem, we can exchange the sum and the integral, so
$$1_E(x)\nu(F) = \sum_{n=1}^{\infty} \int_Y 1_{E_n}(x) 1_{F_n}(y) \, d\nu(y) = \sum_{n=1}^{\infty} 1_{E_n}(x)\nu(F_n)$$
for all $x \in X$. We may now repeat this process by integrating $d\mu$ and applying monotone convergence to exchange the sum and integral, obtaining
$$\rho(R) = \mu(E)\nu(F) = \sum_{n=1}^{\infty} \mu(E_n)\nu(F_n) = \sum_{n=1}^{\infty} \rho(R_n). \tag{14.5}$$
Thus, $\rho$ is a premeasure, and by the Hahn-Kolmogorov theorem it extends to a measure on the σ-algebra generated by $\mathcal{B}$, which is $\mathcal{M} \otimes \mathcal{N}$. Moreover, since $\mu$ and $\nu$ are σ-finite it is straightforward to check that $\rho$ is also, and thus (by Hahn-Kolmogorov again) the extension is unique.

Remark 14.3. We have stated and proved the theorem only for two factors, but there is no difficulty in extending it to finitely many factors $(X_j, \mathcal{M}_j, \mu_j)$, $j = 1, \dots, n$. There is also a construction valid for infinitely many factors when each factor is a probability space (that is, $\mu(X) = 1$) but this requires more care. In these notes we consider only the finite case.
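The well-definedness claim for $\rho$ in the proof of Theorem 14.2 can be checked by hand on finite sets. The sketch below is an illustration only: the weights `mu`, `nu` and the two decompositions are made-up data, and all function names are hypothetical. It evaluates (14.1) on two different disjoint decompositions of the same set and compares both with the sum of the point masses $\mu(\{x\})\nu(\{y\})$.

```python
from fractions import Fraction

# Made-up weighted measures on X = {"a", "b", "c"} and Y = {0, 1}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(2)}
nu = {0: Fraction(3), 1: Fraction(1, 4)}


def measure(weights, subset):
    """Measure of a finite set as the sum of its point masses."""
    return sum((weights[p] for p in subset), Fraction(0))


def rho(rectangles):
    """Formula (14.1) for a finite disjoint union of rectangles E x F."""
    return sum((measure(mu, E) * measure(nu, F) for E, F in rectangles), Fraction(0))


def points(rectangles):
    """The subset of X x Y covered by the given rectangles."""
    return {(x, y) for E, F in rectangles for x in E for y in F}


def product_measure(subset):
    """Sum of the point masses mu{x} * nu{y} over a subset of X x Y."""
    return sum((mu[x] * nu[y] for x, y in subset), Fraction(0))


whole = [({"a", "b", "c"}, {0, 1})]
split = [({"a"}, {0, 1}), ({"b", "c"}, {0}), ({"b"}, {1}), ({"c"}, {1})]

print(rho(whole), rho(split), product_measure(points(whole)))
```

Both decompositions describe the same set, and the three printed values agree, as the common refinement argument predicts.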

Examples 14.4. a) If $X, Y$ are at most countable, and $\mu_X, \mu_Y$ denote counting measure on $X$ and $Y$ respectively, then $2^X \otimes 2^Y = 2^{X \times Y}$ and $\mu_X \times \mu_Y$ is counting measure on $X \times Y$.
b) For two copies of $\mathbb{R}$ with the Borel σ-algebra and Lebesgue measure $m$ (restricted to $\mathcal{B}_{\mathbb{R}}$), the product measure is a σ-finite measure on $\mathcal{B}_{\mathbb{R}^2}$ which has the value $m(E)m(F)$ on measurable rectangles. The completion of this measure is Lebesgue measure on $\mathbb{R}^2$. (By iterating this construction we of course obtain Lebesgue measure on $\mathbb{R}^n$.)

The next step is to develop the theory of integration against product measures; in particular we would like to express integrals against $\mu \times \nu$ as iterated integrals. The goal is the Fubini-Tonelli theorem, which says that this works in the case of unsigned (Tonelli) or absolutely convergent (Fubini) integrals.

To express integrals of unsigned functions $f : X \times Y \to [0, +\infty]$ against product measures as iterated integrals, we introduce the slice functions $f_x : Y \to [0, +\infty]$ and $f^y : X \to [0, +\infty]$, defined for each $x \in X$ (respectively, each $y \in Y$) by
$$f_x(y) := f(x, y), \qquad f^y(x) := f(x, y). \tag{14.6}$$
In other words, starting with $f(x, y)$ we get functions defined on $Y$ by holding $x$ fixed, and functions defined on $X$ by holding $y$ fixed. In addition to these, given a measurable set $E \subset X \times Y$, we can define for all $x \in X$, $y \in Y$ the slice sets $E_x \subset Y$, $E^y \subset X$ by
$$E_x := \{y \in Y : (x, y) \in E\}, \qquad E^y := \{x \in X : (x, y) \in E\}. \tag{14.7}$$
We first show that these constructions preserve measurability:

Lemma 14.5. Let $(X, \mathcal{M})$, $(Y, \mathcal{N})$ be measurable spaces.
a) If $E$ belongs to the product σ-algebra $\mathcal{M} \otimes \mathcal{N}$, then for all $x \in X$ and $y \in Y$ the slice sets $E_x$ and $E^y$ belong to $\mathcal{N}$ and $\mathcal{M}$ respectively.
b) If $(Z, \mathcal{O})$ is another measurable space and $f : X \times Y \to Z$ is a measurable function, then for all $x \in X$ and $y \in Y$, the functions $f_x$ and $f^y$ are measurable on $Y$ and $X$ respectively.

Proof. Let $\mathcal{S}$ denote the set of all $E \in 2^{X \times Y}$ with the property that $E^y \in \mathcal{M}$ and $E_x \in \mathcal{N}$ for all $x \in X$, $y \in Y$. It suffices to prove that $\mathcal{S}$ is a σ-algebra containing all measurable rectangles. First observe that $\mathcal{S}$ contains all rectangles in $\mathcal{M} \otimes \mathcal{N}$, since if $E = F \times G$ then $E_x$ is equal to either $G$ or $\emptyset$, according as $x \in F$ or $x \notin F$. In either case $E_x \in \mathcal{N}$. The same proof works for $E^y$. Next, suppose $(E_n)$ is a sequence of disjoint sets in $\mathcal{S}$ and $E = \bigcup_{n=1}^{\infty} E_n$. Then $E_x = \bigcup_{n=1}^{\infty} (E_n)_x \in \mathcal{N}$; similarly $E^y = \bigcup_{n=1}^{\infty} (E_n)^y \in \mathcal{M}$. Finally, $(E^c)_x = (E_x)^c$ for all $x \in X$; similarly for $E^y$. Thus $\mathcal{S}$ is a σ-algebra (by Proposition 1.3). Part (b) follows from (a) by observing that for any $E \in \mathcal{O}$ and $x \in X$, we have
$$(f_x)^{-1}(E) = (f^{-1}(E))_x \tag{14.8}$$
and similarly for $y$.

Next, a lemma which is of some interest in its own right.

Definition 14.6. Let $X$ be a set. A monotone class is a collection $\mathcal{C} \subset 2^X$ of subsets of $X$ such that
• If $E_1 \subset E_2 \subset \cdots$ belong to $\mathcal{C}$, so does $\bigcup_{n=1}^{\infty} E_n$.
• If $E_1 \supset E_2 \supset \cdots$ belong to $\mathcal{C}$, so does $\bigcap_{n=1}^{\infty} E_n$.

It is immediate that intersections of monotone classes are monotone classes. Trivially, every σ-algebra is a monotone class. The next lemma is a partial converse to this:

Lemma 14.7 (Monotone class lemma). If $\mathcal{A} \subset 2^X$ is a Boolean algebra, then the smallest monotone class containing $\mathcal{A}$ is equal to the σ-algebra generated by $\mathcal{A}$.

Proof. Let $\mathcal{M}$ denote the σ-algebra generated by $\mathcal{A}$ and $\mathcal{C}$ the smallest monotone class containing $\mathcal{A}$. Since $\mathcal{M}$ is itself a monotone class containing $\mathcal{A}$, we have $\mathcal{C} \subset \mathcal{M}$, so it suffices to prove that $\mathcal{M} \subset \mathcal{C}$. First notice that if we replace each set in $\mathcal{C}$ with its complement, then we get another monotone class $\mathcal{C}'$ which contains $\mathcal{A}$, so by minimality $\mathcal{C}' = \mathcal{C}$. Thus $\mathcal{C}$ is closed under complements. To continue, we first make an observation: for a fixed set $E \in \mathcal{A}$, let $\mathcal{C}_E$ denote the set of all $F \in \mathcal{C}$ such that the sets
$$F \setminus E, \quad E \setminus F, \quad F \cap E, \quad X \setminus (E \cup F) \tag{14.9}$$
belong to $\mathcal{C}$. It is immediate that $\mathcal{C}_E$ contains $\mathcal{A}$, and a quick check of the definitions shows that $\mathcal{C}_E$ is a monotone class. Thus, for every $E \in \mathcal{A}$ we have $\mathcal{C}_E = \mathcal{C}$.

Let now $\mathcal{D}$ denote the set of all $E \in \mathcal{C}$ such that
$$F \setminus E, \quad E \setminus F, \quad F \cap E, \quad X \setminus (E \cup F) \tag{14.10}$$
belong to $\mathcal{C}$ for all $F \in \mathcal{C}$. By the observation above, $\mathcal{D}$ contains $\mathcal{A}$, and again a check of definitions shows that $\mathcal{D}$ is a monotone class. It follows that $\mathcal{D} = \mathcal{C}$. Since $\mathcal{C}$ is closed under complements, it follows that $\mathcal{C} = \mathcal{D}$ is also closed under finite unions. Since $\mathcal{C}$ contains $\emptyset$ (why?) we now see that it is a Boolean algebra. Finally, since it is a Boolean algebra closed under countable increasing unions, it is a σ-algebra.

The proof strategy in which the monotone class lemma is applied should be clear: to prove that a statement $P$ holds for a σ-algebra $\mathcal{M}$ generated by a Boolean algebra $\mathcal{A}$, one proves that 1) $P$ is true for all $E \in \mathcal{A}$, and 2) the collection of all $E \in \mathcal{M}$ for which $P$ is true is a monotone class.

We now prove the first version of Tonelli's theorem. Though not strictly necessary, we use the double integral notation $\iint_{X \times Y} f \, d(\mu \times \nu)$ for integrals against product measures.

Theorem 14.8 (Tonelli's theorem, first version). Let $(X, \mathcal{M}, \mu)$, $(Y, \mathcal{N}, \nu)$ be σ-finite measure spaces and let $f : X \times Y \to [0, +\infty]$ be an unsigned $\mathcal{M} \otimes \mathcal{N}$-measurable function. Then
a) The slice integrals $g(x) := \int_Y f_x(y) \, d\nu(y)$ and $h(y) := \int_X f^y(x) \, d\mu(x)$ are measurable on $X$ and $Y$ respectively, and
b) $\displaystyle \iint_{X \times Y} f \, d(\mu \times \nu) = \int_X \int_Y f_x(y) \, d\nu(y) \, d\mu(x) = \int_Y \int_X f^y(x) \, d\mu(x) \, d\nu(y).$

Proof. We begin by proving the very special case $f = 1_R$ where $R$ is a measurable rectangle $R = E \times F$. We use the monotone class lemma to move from this case to the case $f = 1_E$ for general $E \in \mathcal{M} \otimes \mathcal{N}$, and then linearity and monotone convergence to move to simple functions and, finally, general unsigned functions.

So, suppose $R = E \times F$ is a measurable rectangle. Let $f = 1_R$. Then $f(x, y) = 1_E(x) 1_F(y)$, so
$$\int_Y f_x(y) \, d\nu(y) = \nu(F) 1_E(x) \quad \text{and} \quad \int_X f^y(x) \, d\mu(x) = \mu(E) 1_F(y) \tag{14.11}$$
and so these functions are measurable. Moreover, integrating the first equation $d\mu$ and the second $d\nu$ shows that (b) holds. Thus the theorem is proved when $f = 1_R$.

Now let $\mathcal{C}$ denote the collection of all sets $B \in \mathcal{M} \otimes \mathcal{N}$ such that (a) and (b) are true for $f = 1_B$. We have that rectangles lie in $\mathcal{C}$, and by additivity $\mathcal{C}$ is closed under finite disjoint unions, so $\mathcal{C}$ contains the Boolean algebra of finite unions of rectangles. So, it suffices to show that $\mathcal{C}$ is a monotone class. First observe that for any $E \in \mathcal{M} \otimes \mathcal{N}$, for $f = 1_E$ we have $\int_X f^y(x) \, d\mu = \mu(E^y)$ and similarly $\int_Y f_x(y) \, d\nu = \nu(E_x)$.

Let $E_1 \subset E_2 \subset \cdots$ be an increasing sequence of sets in $\mathcal{C}$, and let $E = \bigcup_{n=1}^{\infty} E_n$. Then $(E_n)_x$ and $(E_n)^y$ increase to $E_x$ and $E^y$ respectively, so by monotone convergence for sets the functions $h(y) = \mu(E^y)$ and $g(x) = \nu(E_x)$ are pointwise limits of measurable functions, hence measurable, so (a) holds. Since $1_{E_n}$ increases to $1_E$, the monotone convergence theorem gives (b).

To show that $\mathcal{C}$ is closed under decreasing sequences, we first assume $\mu(X), \nu(Y) < \infty$. For a decreasing sequence $E_1 \supset E_2 \supset \cdots$ with $E = \bigcap_{n=1}^{\infty} E_n$, a similar argument to the one in the previous paragraph (using dominated convergence instead of monotone convergence) shows that (a) and (b) hold for $f = 1_E$, so $\mathcal{C}$ is a monotone class in the case $\mu(X), \nu(Y) < \infty$.

In the σ-finite case, write $X = \bigcup_{n=1}^{\infty} X_n$, $Y = \bigcup_{n=1}^{\infty} Y_n$ as increasing unions of sets with $\mu(X_n), \nu(Y_n) < \infty$ for all $n$. Then the above arguments apply to $E \cap (X_n \times Y_n)$ for each $n$, and we have
$$(\mu \times \nu)(E \cap (X_n \times Y_n)) = \int_X 1_{X_n}(x)\, \nu(E_x \cap Y_n) \, d\mu(x) = \int_Y 1_{Y_n}(y)\, \mu(E^y \cap X_n) \, d\nu(y).$$

Another application of monotone convergence finishes the proof of this case. We conclude that when $X, Y$ are σ-finite, (a) and (b) hold for $f = 1_E$ whenever $E \in \mathcal{M} \otimes \mathcal{N}$.

To move to general unsigned $f$, first note that by linearity we conclude immediately that (a) and (b) also hold for simple functions. For a general unsigned measurable $f : X \times Y \to [0, +\infty]$, approximate it by an increasing sequence of measurable simple functions $f_n$. Let
$$g_n(x) := \int_Y (f_n)_x(y) \, d\nu(y) \quad \text{and} \quad h_n(y) := \int_X (f_n)^y(x) \, d\mu(x). \tag{14.12}$$
The monotone convergence theorem implies that $g_n$ and $h_n$ increase to $g$ and $h$ respectively, so $g$ and $h$ are measurable. One more application of monotone convergence then shows that
$$\int_X g \, d\mu = \lim_{n\to\infty} \int_X g_n \, d\mu = \lim_{n\to\infty} \iint f_n \, d(\mu \times \nu) = \iint f \, d(\mu \times \nu)$$
and similarly for $h$. Thus, finally, (a) and (b) hold for all unsigned measurable functions on $X \times Y$.

As noted above, the product of complete measures is almost never complete. Typically we pass to the completion $\overline{\mu \times \nu}$ of a product measure; the next task is to show that we may still use iterated integrals in this setting. First a corollary of Tonelli's theorem:

Corollary 14.9. Let $(X, \mathcal{M}, \mu)$, $(Y, \mathcal{N}, \nu)$ be σ-finite measure spaces. If $E$ is a null set for $\mu \times \nu$, then $\nu(E_x) = 0$ for $\mu$-a.e. $x \in X$, and $\mu(E^y) = 0$ for $\nu$-a.e. $y \in Y$.

Proof. Apply Tonelli’s theorem to 1E. (Problem 18.5.) To prove the complete version of Tonelli’s theorem we need a couple of facts about measurability on complete measure spaces:

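Proposition 14.10. Let $(X, \mathcal{M}, \mu)$ be a measure space and $(X, \overline{\mathcal{M}}, \overline{\mu})$ its completion.
a) If $f : X \to \mathbb{C}$ is $\overline{\mathcal{M}}$-measurable, then there exists an $\mathcal{M}$-measurable function $\tilde{f}$ such that $f = \tilde{f}$ $\overline{\mu}$-a.e.
b) If $f : X \to \mathbb{C}$ is $\overline{\mathcal{M}}$-measurable and $g : X \to \mathbb{C}$ is a function with $g(x) = f(x)$ for $\overline{\mu}$-a.e. $x$, then $g$ is $\overline{\mathcal{M}}$-measurable.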

Proof. Problem 18.4.

Theorem 14.11 (Tonelli's theorem, complete version). Let $(X, \mathcal{M}, \mu)$, $(Y, \mathcal{N}, \nu)$ be complete σ-finite measure spaces. Let $f : X \times Y \to [0, +\infty]$ be an unsigned $\overline{\mathcal{M} \otimes \mathcal{N}}$-measurable function. Then:
a) For $\mu$-a.e. $x$ and $\nu$-a.e. $y$, the functions $f_x$ and $f^y$ are $\mathcal{N}$- and $\mathcal{M}$-measurable respectively, and the functions
$$g(x) = \int_Y f_x(y) \, d\nu(y), \qquad h(y) = \int_X f^y(x) \, d\mu(x) \tag{14.13}$$
are $\mathcal{M}$- and $\mathcal{N}$-measurable respectively.
b) $\displaystyle \iint_{X \times Y} f(x, y) \, d\overline{\mu \times \nu} = \int_X \int_Y f_x(y) \, d\nu(y) \, d\mu(x) = \int_Y \int_X f^y(x) \, d\mu(x) \, d\nu(y).$

Proof. We know from Proposition 14.10 that there exists an $\mathcal{M} \otimes \mathcal{N}$-measurable function $\tilde{f}$ such that $f(x, y) = \tilde{f}(x, y)$ for $\overline{\mu \times \nu}$-a.e. $(x, y)$. Let $E$ be the exceptional set on which $f \ne \tilde{f}$. Since $\overline{\mu \times \nu}(E) = 0$, there is an $\mathcal{M} \otimes \mathcal{N}$-measurable set $\tilde{E}$ containing $E$ such that $(\mu \times \nu)(\tilde{E}) = 0$. By Corollary 14.9, $\nu(\tilde{E}_x) = 0$ for $\mu$-a.e. $x$, thus since $E_x \subset \tilde{E}_x$ (and since $\nu$ is complete!) we have $\nu(E_x) = 0$ as well, for all such $x$. Additionally, for every such $x$ the set of $y$ for which $f_x(y) \ne \tilde{f}_x(y)$ is a subset of $E_x$ and therefore $\nu$-null, since $\nu$ is assumed complete. It follows that for a.e. $x$, we have $f_x = \tilde{f}_x$ $\nu$-a.e., so again by completeness (Proposition 14.10 again) $f_x$ is $\mathcal{N}$-measurable for every such $x$. Of course, the analogous proof holds for $f^y$.

For the second part of (a), we have that $g(x) = \int_Y \tilde{f}_x(y) \, d\nu(y)$ for almost every $x$, so by Proposition 14.10(b) we conclude that $g$ is $\mathcal{M}$-measurable. (Note that the completeness of $\mu$ is needed here.) Finally (b) follows from (a) and Tonelli's theorem applied to $\tilde{f}$.

Theorem 14.12 (Fubini's theorem). Let $(X, \mathcal{M}, \mu)$, $(Y, \mathcal{N}, \nu)$ be complete σ-finite measure spaces. Suppose $f : X \times Y \to \mathbb{C}$ belongs to $L^1(\overline{\mu \times \nu})$. Then:
a) For $\mu$-a.e. $x$ and $\nu$-a.e. $y$, the functions $f_x$ and $f^y$ belong to $L^1(\nu)$ and $L^1(\mu)$ respectively, and the functions
$$g(x) = \int_Y f_x(y) \, d\nu(y), \qquad h(y) = \int_X f^y(x) \, d\mu(x) \tag{14.14}$$
belong to $L^1(\mu)$ and $L^1(\nu)$ respectively.
b) $\displaystyle \iint_{X \times Y} f(x, y) \, d\overline{\mu \times \nu} = \int_X \int_Y f_x(y) \, d\nu(y) \, d\mu(x) = \int_Y \int_X f^y(x) \, d\mu(x) \, d\nu(y).$

Proof. By taking real and imaginary parts, and then positive and negative parts, it suffices to consider the case that $f$ is unsigned, but then the theorem follows from Tonelli's theorem. Indeed, when $f$ is unsigned and belongs to $L^1(\overline{\mu \times \nu})$, we have by Tonelli
$$\int_X \int_Y f_x(y) \, d\nu(y) \, d\mu(x) = \iint_{X \times Y} f(x, y) \, d\overline{\mu \times \nu} < \infty, \tag{14.15}$$
but then $\int_Y f_x(y) \, d\nu(y) < \infty$ for $\mu$-a.e. $x$; similarly for $f^y$.

Corollary 14.13 (Integral as the area under a graph). Let $(X, \mathcal{M}, \mu)$ be a σ-finite measure space, and give $\mathbb{R}$ the Borel σ-algebra $\mathcal{B}_{\mathbb{R}}$ and Lebesgue measure (restricted to $\mathcal{B}_{\mathbb{R}}$). Then an unsigned function $f : X \to [0, +\infty)$ is measurable if and only if the set

$$G_f := \{(x, t) \in X \times \mathbb{R} : 0 \le t \le f(x)\} \tag{14.16}$$
is measurable. In this case,
$$(\mu \times m)(G_f) = \int_X f \, d\mu. \tag{14.17}$$

Corollary 14.14 (Distribution formula). Let $(X, \mathcal{M}, \mu)$ be a σ-finite measure space and $f : X \to [0, +\infty]$ an unsigned measurable function. Then
$$\int_X f(x) \, d\mu(x) = \int_{[0, +\infty)} \mu(\{|f| \ge t\}) \, dt. \tag{14.18}$$
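Formula (14.18) can be sanity-checked on a finite measure space, where both sides reduce to finite sums. The following sketch is only an illustration: the weights `mu`, the values `f`, and the function names are made up. On such a space $t \mapsto \mu(\{f \ge t\})$ is a step function that is constant between consecutive values of $f$, so the right-hand side can be summed exactly.

```python
from fractions import Fraction


def integral(f, mu):
    """Left side of (14.18): sum of f(x) * mu({x}) over a finite set."""
    return sum((f[x] * mu[x] for x in f), Fraction(0))


def layer_cake(f, mu):
    """Right side of (14.18): integrate the step function t -> mu({f >= t})."""
    levels = sorted({v for v in f.values() if v > 0})
    total, prev = Fraction(0), Fraction(0)
    for v in levels:
        # On the interval (prev, v] the set {f >= t} equals {f >= v}.
        upper = sum((mu[x] for x in f if f[x] >= v), Fraction(0))
        total += (v - prev) * upper
        prev = v
    return total


# Made-up data: four points with weights mu and an unsigned function f.
mu = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(1, 5), "d": Fraction(2)}
f = {"a": Fraction(4), "b": Fraction(1, 3), "c": Fraction(0), "d": Fraction(4)}

print(integral(f, mu), layer_cake(f, mu))
```

Both sides evaluate to the same rational number for this data. The grouping by levels in `layer_cake` is the discrete analogue of slicing the region $G_f$ horizontally rather than vertically.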

Proof. Let $G_f$ be the region under the graph of $f$ as in Corollary 14.13. Then for fixed $t \ge 0$,
$$\int_X 1_{G_f}(x, t) \, d\mu(x) = \mu(\{|f| \ge t\}) \tag{14.19}$$
so by Tonelli's theorem and Corollary 14.13,
$$\int_X f(x) \, d\mu(x) = (\mu \times m)(G_f) = \int_{[0, +\infty)} \int_X 1_{G_f}(x, t) \, d\mu(x) \, dt = \int_{[0, +\infty)} \mu(\{|f| \ge t\}) \, dt. \tag{14.20}$$

Corollary 14.15 (Compatibility of the Riemann and Lebesgue integrals). If $f : [a, b] \to \mathbb{R}$ is continuous, then if we extend $f$ to be 0 off $[a, b]$, the extended $f$ is Lebesgue integrable on $\mathbb{R}$ and $\int_{\mathbb{R}} f \, dm = \int_a^b f(x) \, dx$.

Proof (sketch). We assume $f \ge 0$. For a partition $P$, $a = x_0 < x_1 < \cdots < x_n = b$ of $[a, b]$, define $C_j = \sup\{f(x) : x_{j-1} \le x \le x_j\}$ and $c_j = \inf\{f(x) : x_{j-1} \le x \le x_j\}$, and consider the sums
$$U(P, f) := \sum_{j=1}^{n} C_j (x_j - x_{j-1}) \quad \text{and} \quad L(P, f) := \sum_{j=1}^{n} c_j (x_j - x_{j-1}). \tag{14.21}$$

Let $G_f$ denote the region enclosed by the graph of $f$: $G_f = \{(x, y) : 0 \le y \le f(x)\}$. Then $G_f$ is a closed set in $\mathbb{R}^2$ (hence Borel measurable), and by Corollary 14.13 we have $m^2(G_f) = \int_{\mathbb{R}} f \, dm$. Let $R_+$ and $R_-$ denote the finite unions of closed rectangles corresponding to the over- and under-estimates for the Riemann integral given by $U(P, f)$ and $L(P, f)$. Then $R_- \subset G_f \subset R_+$, and $m^2(R_+) = U(P, f)$, $m^2(R_-) = L(P, f)$. It then follows that $\sup_P L(P, f) \le m^2(G_f) \le \inf_P U(P, f)$. But by the definition of the Riemann integral, the inf and sup are equal to each other, and their common value is $\int_a^b f(x) \, dx$.

Remark 14.16. The above proof can be modified to drop the continuity hypothesis (where was it used?), and conclude that every Riemann integrable function on $[a, b]$ is Lebesgue integrable, and the values of the two integrals agree. With more work it can be shown that a function $f : [a, b] \to \mathbb{R}$ is Riemann integrable if and only if the set of points where $f$ is discontinuous has Lebesgue measure 0. We will not prove this fact in these notes. We also note that it is not difficult to extend these facts about the Riemann integral to "improper" Riemann integrals, defined over $[0, +\infty)$ or $\mathbb{R}$. In particular, note that the distribution function $\mu(\{|f| \ge t\})$ is a decreasing function of $t$ on $[0, +\infty)$, hence Riemann integrable. Thus Corollary 14.14 says that, in principle, the calculation of any Lebesgue integral can be reduced to the computation of a Riemann integral.

15. Integration in $\mathbb{R}^n$

In this section we briefly discuss Lebesgue measure and Lebesgue integration on $\mathbb{R}^n$.

Definition 15.1. Lebesgue measure $m^n$ on $\mathbb{R}^n$ is the completion of the $n$-fold product of $(\mathbb{R}, \mathcal{L}, m)$. We will drop the superscript and just write $m$ for Lebesgue measure on $\mathbb{R}^n$ when the dimension is understood.
We begin with the observation that we could have also constructed Lebesgue measure on $\mathbb{R}^n$ in the same way as on $\mathbb{R}$, namely by introducing boxes $B = I_1 \times I_2 \times \cdots \times I_n$, where each $I_j$ is an interval in $\mathbb{R}$, and declaring $|B| = \prod_{j=1}^{n} |I_j|$. One can then define Lebesgue outer measure $m^{n*}$ by defining, for all $E \subset \mathbb{R}^n$,
$$m^{n*}(E) = \inf\Big\{ \sum_{j=1}^{\infty} |B_j| : E \subset \bigcup_{j=1}^{\infty} B_j \Big\}, \tag{15.1}$$
the infimum taken over all coverings of $E$ by boxes. By imitating the constructions of Section 4, we are led to a σ-finite Borel measure on $\mathbb{R}^n$ such that the measure of a box $B$ is its volume $|B|$. By the Hahn Uniqueness theorem, this measure agrees with the product measure on all Borel sets, and no trouble arises in passing to completions. In particular, we have the following analog of Theorem 4.5; the proof is similar and left as an exercise.

Theorem 15.2. Let $E \subset \mathbb{R}^n$. The following are equivalent:
a) $E$ is Lebesgue measurable.
b) For every $\varepsilon > 0$, there is an open set $U \supset E$ such that $m^{n*}(U \setminus E) < \varepsilon$.
c) For every $\varepsilon > 0$, there is a closed set $F \subset E$ such that $m^{n*}(E \setminus F) < \varepsilon$.
d) There is a $G_\delta$ set $G$ such that $E \subset G$ and $m^{n*}(G \setminus E) = 0$.
e) There is an $F_\sigma$ set $F$ such that $E \supset F$ and $m^{n*}(E \setminus F) = 0$.

$\mathbb{R}^n$ possesses a larger group of symmetries than $\mathbb{R}$ does. In particular we would like to analyze the behavior of Lebesgue measure under invertible linear transformations $T : \mathbb{R}^n \to \mathbb{R}^n$. We have the following analog of Theorem 4.4:

Theorem 15.3. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be an invertible linear transformation. Then:
a) For all $f \in L^1(\mathbb{R}^n)$, $\displaystyle \int_{\mathbb{R}^n} (f \circ T)(x) \, dx = \frac{1}{|\det T|} \int_{\mathbb{R}^n} f(x) \, dx$.
b) If $E \subset \mathbb{R}^n$ is Lebesgue measurable, then $T(E)$ is Lebesgue measurable, and $m(T(E)) = |\det T| \, m(E)$.
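Part (b) of Theorem 15.3 can be sanity-checked in the plane before reading the proof. The sketch below is an illustration under stated assumptions, not the argument of the notes: it builds a $2 \times 2$ matrix as a product of one scaling, one shear and one swap (the names `scale`, `shear`, `swap` are hypothetical), maps the unit square to a parallelogram, and computes its area with the shoelace formula, a standard polygon-area formula that is not used in the text.

```python
from fractions import Fraction


def apply(T, p):
    """Apply the 2x2 matrix T (a pair of rows) to the point p."""
    (a, b), (c, d) = T
    x, y = p
    return (a * x + b * y, c * x + d * y)


def compose(S, T):
    """Matrix of the map p -> S(T(p))."""
    (a, b), (c, d) = S
    (e, f), (g, h) = T
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def det(T):
    (a, b), (c, d) = T
    return a * d - b * c


def polygon_area(vertices):
    """Shoelace formula for a simple polygon with vertices listed in order."""
    n = len(vertices)
    twice = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return Fraction(abs(twice)) / 2


scale = ((Fraction(3), 0), (0, 1))   # multiply one coordinate by c = 3
shear = ((1, 0), (1, 1))             # (x1, x2) -> (x1, x1 + x2)
swap = ((0, 1), (1, 0))              # interchange the two coordinates

T = compose(scale, compose(shear, swap))
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # Lebesgue measure 1
image = [apply(T, p) for p in unit_square]

print(det(T), polygon_area(image))
```

The determinant of `T` is the product of the three elementary determinants, and the area of the image of the unit square matches its absolute value in this example.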

Proof. From linear algebra, every invertible linear transformation of Rn is a ﬁnite product n of transformations of one of the following types (we write vectors in R as x = (x1, . . . xn), in the standard basis):

i) (Scaling a row) T (x1, . . . xj, . . . xn) = (x1, . . . cxj, . . . xn), for some j = 1, . . . n and some c ∈ R ii) (adding a row to another) T (x1, . . . xj, . . . xk, . . . xn) = (x1, . . . xj, . . . xj + xk, . . . xn), some j, k = 1, . . . n iii) (interchanging rows) T (x1, . . . xj, . . . xk, . . . xn) = (x1, . . . xk, . . . xj, . . . xn), some j, k = 1, . . . n. 56 In the first case, det T = c, in the second, det T = 1, and in the third, det T = −1. By the multiplicativity of the determinant, it suffices to prove the theorem for T of each of these types. We may also assume f ≥ 0 (why?) But (a) now follows easily from Tonelli’s theorem and the invariance properties of one-dimensional Lebesgue measure. For example, for T of type (i) we integrate with respect to xj first and use the one-dimensional fact Z 1 Z g(ct) dt = g(t) dt (15.2) R |c|

for all $c \neq 0$. In case (ii) we integrate with respect to $x_k$ first and use translation invariance of one-dimensional Lebesgue measure: for fixed $x_j$,

$$\int_{\mathbb{R}} g(x_j + x_k)\,dx_k = \int_{\mathbb{R}} g(x_k)\,dx_k, \tag{15.3}$$

while for case (iii) we simply interchange the order of integration with respect to $x_j$ and $x_k$. Thus (a) holds in all three cases. (b) follows by applying (a) to $1_{T^{-1}(E)}$ and using the fact that $\det T^{-1} = 1/\det T$.

Corollary 15.4. Lebesgue measure on $\mathbb{R}^n$ is rotation invariant.

Proof. A rotation of $\mathbb{R}^n$ is just a linear transformation satisfying $T^t = T^{-1}$, which implies that $|\det T| = 1$, so the claim follows from Theorem 15.3.

One result we will use frequently in the rest of the course is the following fundamental approximation theorem. We already know that absolutely integrable functions can be approximated in $L^1$ by simple functions; we now show that in $\mathbb{R}^n$ we can approximate in $L^1$ with continuous functions.

Definition 15.5. We say a function $f : X \to \mathbb{C}$ is supported on a set $E \subset X$ if $f = 0$ on the complement of $E$. When $X$ is a topological space, the closed support of $f$ is the smallest closed set $E$ such that $f$ is supported on $E$. Say $f$ is compactly supported if it is supported on a compact set $E$.

Note that since every bounded set in $\mathbb{R}^n$ has compact closure, a function $f : \mathbb{R}^n \to \mathbb{C}$ is compactly supported if and only if it is supported in a bounded set. Since bounded sets have finite Lebesgue measure, it follows that if $f : \mathbb{R}^n \to \mathbb{C}$ is continuous and compactly supported, then it belongs to $L^1(\mathbb{R}^n)$.

Theorem 15.6. If $f \in L^1(\mathbb{R}^n)$ then there is a sequence $(f_n)$ of continuous, compactly supported functions such that $f_n \to f$ in $L^1$.

Proof. We work in $\mathbb{R}$ first, and reduce to the case where $f$ is simple: let $\epsilon > 0$; since simple functions are dense in $L^1$, there is an $L^1$ simple function $\psi$ such that $\|\psi - f\|_1 < \epsilon/2$. If we can find a continuous, compactly supported $g$ such that $\|\psi - g\|_1 < \epsilon/2$ we are done. For this, it suffices (by linearity) to assume $\psi = 1_E$ for a set $E$ with $m(E) < \infty$. By Littlewood's first principle, we can find a set $A$, a finite union of disjoint open intervals $A = \bigcup_{j=1}^n (a_j, b_j)$, such that $m(A \Delta E) < \epsilon$. It follows that $\|1_A - 1_E\|_1 = \|1_{A \Delta E}\|_1 < \epsilon$. Let $g_j$ denote the unique continuous, piecewise linear function such that $g_j(x)$ is equal to 1 for $x \in (a_j, b_j)$ and 0 for $x < a_j - 2^{-j}\epsilon$ and $x > b_j + 2^{-j}\epsilon$. (Draw a picture.) Then for each interval $I_j = (a_j, b_j)$, we have $\|g_j - 1_{I_j}\|_1 \leq 2^{-j}\epsilon$, since the difference is supported on two ramps, each a triangle of base $2^{-j}\epsilon$ and height 1. Let $g = \sum_{j=1}^n g_j$. Then $g$ is continuous and compactly supported, and from the triangle inequality we have $\|g - 1_A\|_1 \leq \sum_{j=1}^n \|g_j - 1_{I_j}\|_1 < \epsilon$. It follows that $\|g - 1_E\|_1 < 2\epsilon$ and the proof is complete in the one-dimensional case.

In higher dimensions, the same approximation scheme works; it suffices (using linearity, Littlewood's first principle, and the $\epsilon/2^n$ trick as before) to approximate the indicator function of a box $B = I_1 \times \cdots \times I_n$; again a piecewise linear function which is 1 on the box and 0 outside a suitably small neighborhood of the box suffices. The details are left as an exercise.

Remark 15.7. There is a more sophisticated way to do continuous approximation in $L^1$, using convolutions. This will be covered in depth later in the course.
Also note that since every L1 convergent sequence has a pointwise a.e. convergent subsequence, every L1 function f can be approximated by a sequence of continuous functions fn which converge to f both in the L1 norm and pointwise a.e. As an application of the above approximation theorem, we prove a very useful fact about integration in Rn, namely that translation is continuous in L1(Rn). The proof strategy is to first prove the result from scratch for continuous, compactly supported f, then use the density of these functions in L1 to get the general result. This density argument is frequently used; we will see it again in the next section when we prove the Lebesgue Differentiation Theorem.
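To see what continuity of translation means in a concrete case, take $f = 1_{[0,1]}$ on $\mathbb{R}$. Then $f_h$ never converges to $f$ uniformly, yet $\|f_h - f\|_1 = 2\min(|h|, 1) \to 0$. The following sketch compares this closed form with a midpoint Riemann sum; the function names and the integration window are assumptions chosen for illustration.

```python
# Illustration only: L^1 distance between f = 1_[0,1] and its translate f_h.
# Function names and the window [lo, hi] are hypothetical choices.

def indicator(x):
    """Indicator function of the interval [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def exact_shift_distance(h):
    """Closed form for ||f_h - f||_1 when f = 1_[0,1]: the two intervals
    [0,1] and [h,1+h] fail to overlap on a set of measure 2*min(|h|, 1)."""
    return 2.0 * min(abs(h), 1.0)

def riemann_l1_distance(f, h, lo=-3.0, hi=4.0, n=100_000):
    """Midpoint-rule approximation of the integral of |f(x-h) - f(x)| over [lo, hi].
    The window must contain the supports of f and f_h."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += abs(f(x - h) - f(x))
    return total * dx
```

For small `h`, `riemann_l1_distance(indicator, h)` should be close to `exact_shift_distance(h)`, and both shrink linearly as `h` tends to 0.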

Proposition 15.8. For $h \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{C}$ a function, define $f_h(x) := f(x - h)$ (the translation of $f$ by $h$). Then for all $f \in L^1(\mathbb{R}^n)$, we have $f_h \to f$ in the $L^1$ norm as $h \to 0$.

Proof. First suppose $f$ is continuous and compactly supported. Then for $|h| < 1$, say, all of the functions $f_h$ and $f$ are supported in a single compact set $K \subset \mathbb{R}^n$. Since $f$ is continuous with compact support, it is uniformly continuous; hence each $f_h$ is continuous, and $f_h \to f$ uniformly on $K$ as $h \to 0$. It follows from Proposition 12.10 that $f_h \to f$ in $L^1$.

Now let $f \in L^1(\mathbb{R}^n)$ and $\epsilon > 0$ be given. We can choose a continuous, compactly supported $g$ such that $\|g - f\|_1 < \epsilon/3$. Note that by the translation invariance of Lebesgue measure, we have that $\|g_h - f_h\|_1 = \|g - f\|_1 < \epsilon/3$ as well. (Here we have used the readily verified fact that $(|f - g|)_h = |f_h - g_h|$.) Now, since the result holds for $g$, there is a $\delta > 0$ such that for all $|h| < \delta$, $\|g_h - g\|_1 < \epsilon/3$. Thus

$$\|f_h - f\|_1 \leq \|f_h - g_h\|_1 + \|g_h - g\|_1 + \|g - f\|_1 < \epsilon, \tag{15.4}$$

which proves the proposition.

The section concludes with some remarks on integration in polar coordinates. Write $\|x\| = (x_1^2 + \cdots + x_n^2)^{1/2}$ for the Euclidean length of a vector $x$. Let $S^{n-1} = \{x \in \mathbb{R}^n : \|x\| = 1\}$ be the unit sphere in $\mathbb{R}^n$. Each nonzero vector $x$ can be expressed uniquely in the form $x = \|x\| \cdot \frac{x}{\|x\|}$ (positive scalar times a unit vector), so we may identify $\mathbb{R}^n \setminus \{0\}$ with $(0, +\infty) \times S^{n-1}$. Precisely, the map $\Phi(x) = (\|x\|, \frac{x}{\|x\|})$ is a continuous bijection from $\mathbb{R}^n \setminus \{0\}$ onto $(0, +\infty) \times S^{n-1}$, with continuous inverse $(r, \xi) \mapsto r\xi$. Using the map $\Phi$ we can define the push-forward of Lebesgue measure to $(0, +\infty) \times S^{n-1}$; namely $m_*(E) = m(\Phi^{-1}(E))$. Let $\rho = \rho_n$ denote the measure on $(0, +\infty)$ defined by $\rho(E) = \int_E r^{n-1}\,dr$.

Theorem 15.9 (Integration in polar coordinates). There is a unique finite Borel measure $\sigma = \sigma_{n-1}$ on $S^{n-1}$ such that $m_* = \rho \times \sigma$. If $f$ is an unsigned or $L^1$ Borel measurable function on $\mathbb{R}^n$ then

$$\int_{\mathbb{R}^n} f(x)\,dx = \int_0^\infty \int_{S^{n-1}} f(r\xi)\, r^{n-1}\,d\sigma(\xi)\,dr. \tag{15.5}$$

Proof.

16. Differentiation theorems

One version of the fundamental theorem of calculus says that if $f$ is continuous on a closed interval $[a, b] \subset \mathbb{R}$, and if we define the function

$$F(x) := \int_a^x f(t)\,dt, \tag{16.1}$$

then $F$ is differentiable on $(a, b)$ and $F'(x) = f(x)$ for all $x \in (a, b)$. Using the definition of derivative, this can be reformulated as

$$\lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t)\,dt = f(x) \tag{16.2}$$

for all $x \in (a, b)$. If, for $h > 0$, we let $I_h = I(x, h)$ denote the open interval $(x, x + h)$, then, re-expressing in terms of the Lebesgue integral, we have

$$\lim_{h \to 0} \frac{1}{m(I_h)} \int_{I_h} f\,dm = f(x). \tag{16.3}$$

It is not hard to show that we can replace $I_h$ with the interval $B(x, h)$ centered on $x$ with radius $h$; in this case $m(B(x, h)) = 2h$ and we still have

$$\lim_{h \to 0} \frac{1}{m(B(x, h))} \int_{B(x,h)} f\,dm = f(x). \tag{16.4}$$

This can be interpreted to say that the average values of $f$ over small intervals centered on $x$ converge to $f(x)$, as one might expect from continuity. Perhaps surprisingly, the result remains true, at least for (Lebesgue) almost every $x$, when we drop the continuity hypothesis.

The goal of this section is to prove the Lebesgue differentiation theorem. To state it we introduce the notation $B(x, r)$ for the open ball of radius $r > 0$ centered at a point $x \in \mathbb{R}^n$. We also write $\int f(y)\,dy$ for integrals against Lebesgue measure. For the rest of this section $L^1$ refers to Lebesgue measure on $\mathbb{R}^n$ unless stated otherwise.

Definition 16.1 (Locally integrable functions). A Lebesgue measurable function $f : \mathbb{R}^n \to \mathbb{C}$ is called locally integrable if $\int_K |f(y)|\,dy < \infty$ for every compact set $K \subset \mathbb{R}^n$. The collection of all locally integrable functions on $\mathbb{R}^n$ is denoted $L^1_{loc}(\mathbb{R}^n)$.

Since every compact set in $\mathbb{R}^n$ is contained in a closed ball, it suffices in the above definition to require only $\int_B |f(y)|\,dy < \infty$ for every ball $B$.

Theorem 16.2 (Lebesgue Differentiation Theorem). Let $f \in L^1_{loc}(\mathbb{R}^n)$. Then for almost every $x \in \mathbb{R}^n$,

a) $$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)|\,dy = 0 \tag{16.5}$$

and

b) $$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y)\,dy = f(x). \tag{16.6}$$

Notice that the second statement follows from the first. One can interpret the theorem as follows: given $f \in L^1_{loc}$, define for each $r > 0$ the function

$$A_{r,f}(x) := \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y)\,dy, \tag{16.7}$$

the average value of $f$ over the ball of radius $r$ centered at $x$. Then the second statement says that the functions $A_{r,f}$ converge to $f$ almost everywhere as $r \to 0$. (It is not hard to show, using a density argument, that for $f \in L^1$ we have $A_{r,f} \to f$ in the $L^1$ norm as $r \to 0$. See Problem 18.15.) To begin with, it is easy to prove Theorem 16.2 in the continuous case:

Lemma 16.3. Let $f : \mathbb{R}^n \to \mathbb{C}$ be continuous and compactly supported. Then for all $x \in \mathbb{R}^n$,

$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)|\,dy = 0. \tag{16.8}$$

Proof of Lemma 16.3. Since $f$ is continuous and compactly supported, it is absolutely integrable. Fix $x \in \mathbb{R}^n$ and let $\epsilon > 0$. By continuity there is a $\delta > 0$ such that $|f(x) - f(y)| < \epsilon$ for all $|y - x| < \delta$. Let $0 < r < \delta$. Then

$$\frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)|\,dy < \frac{1}{m(B(x, r))} \int_{B(x,r)} \epsilon\,dy = \epsilon.$$

It follows that $0 \leq \limsup_{r \to 0} \frac{1}{m(B(x,r))} \int_{B(x,r)} |f(y) - f(x)|\,dy \leq \epsilon$ for every $\epsilon > 0$, and the lemma follows.

To move from continuous, compactly supported $f$ to absolutely integrable $f$ we need the following estimate, which is quite important in its own right. It is an estimate on the Hardy-Littlewood maximal function, which is defined for $f \in L^1(\mathbb{R}^n)$ by

$$M_f(x) = \sup_{r > 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y)|\,dy. \tag{16.9}$$

Theorem 16.4 (Hardy-Littlewood Maximal Theorem). Let $f : \mathbb{R}^n \to \mathbb{C}$ be absolutely integrable. Then for all $t > 0$,

$$m(\{x \in \mathbb{R}^n : M_f(x) > t\}) \leq C\, \frac{\|f\|_1}{t} \tag{16.10}$$

for some constant $C > 0$ that depends only on the dimension $n$.
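To get a feel for the maximal function, it may help to work out one case by hand. The computation below takes $f = 1_{[0,1]}$ on $\mathbb{R}$ (a choice made for this illustration, with $n = 1$); it shows that $M_f$ decays only like $1/|x|$, so it is not integrable, while the weak-type bound of Theorem 16.4 still holds with room to spare.

```latex
% Worked example (illustration): f = 1_{[0,1]} on R, so ||f||_1 = 1 and m(B(x,r)) = 2r.
% For x >= 1 the averages are
\[
\frac{m\big(B(x,r)\cap[0,1]\big)}{2r} =
\begin{cases}
0, & 0 < r \le x-1,\\[2pt]
\dfrac{1-x+r}{2r}, & x-1 < r \le x,\\[6pt]
\dfrac{1}{2r}, & r > x,
\end{cases}
\]
% which is nondecreasing in r on (x-1, x] and decreasing for r > x, so the sup is at r = x:
\[
M_f(x) = \frac{1}{2x}\ (x \ge 1), \qquad
M_f(x) = \frac{1}{2(1-x)}\ (x \le 0), \qquad
M_f(x) = 1\ (0 < x < 1).
\]
% Hence M_f is not in L^1, but the superlevel sets are small:
\[
\int_{\mathbb{R}} M_f \,dx \ \ge\ \int_1^\infty \frac{dx}{2x} = \infty,
\qquad
m(\{M_f > t\}) = m\Big(\big(1-\tfrac{1}{2t},\, \tfrac{1}{2t}\big)\Big) = \frac{1}{t} - 1
\quad (0 < t < \tfrac12).
\]
```

The case $x \leq 0$ follows from the case $x \geq 1$ by the reflection $x \mapsto 1 - x$, which preserves $[0,1]$.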

Remark 16.5. If it were the case that $M_f$ were in $L^1(\mathbb{R}^n)$, then this estimate would be an immediate consequence of Markov's inequality. However $M_f$ is essentially never in $L^1$, even in the simplest case of the indicator function of an interval.

Before proving Theorem 16.4, we will see how it is used to prove the Lebesgue Differentiation Theorem.

Proof of Theorem 16.2. First note that we may assume $f \in L^1(\mathbb{R}^n)$ (not just $L^1_{loc}$); to see this just replace $f$ by $1_{B(0,N)} f$ for $N \in \mathbb{N}$. So, let $f \in L^1(\mathbb{R}^n)$ and fix $\epsilon, t > 0$. We first prove (b) and then use this to deduce (a). First, by Theorem 15.6 there exists a continuous, compactly supported $g$ such that

$$\int_{\mathbb{R}^n} |f(x) - g(x)|\,dx < \epsilon. \tag{16.11}$$

Applying the Hardy-Littlewood maximal inequality to $|f - g|$, we have

$$m\Big(\Big\{x \in \mathbb{R}^n : \sup_{r > 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - g(y)|\,dy > t\Big\}\Big) \leq \frac{C\epsilon}{t}. \tag{16.12}$$

In addition, by Markov's inequality applied to $|f - g|$ we have

$$m(\{x \in \mathbb{R}^n : |f(x) - g(x)| > t\}) \leq \frac{\epsilon}{t}. \tag{16.13}$$

Thus there is a set $E \subset \mathbb{R}^n$ of measure at most $(C+1)\epsilon/t$ such that, outside of $E$, we have both

$$\sup_{r > 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - g(y)|\,dy \leq t \tag{16.14}$$

and

$$|f(x) - g(x)| \leq t. \tag{16.15}$$

Now consider $x \in \mathbb{R}^n \setminus E$. By the result for continuous, compactly supported functions (Lemma 16.3), we have for all sufficiently small $r > 0$

$$\Big| \frac{1}{m(B(x, r))} \int_{B(x,r)} g(y)\,dy - g(x) \Big| \leq t. \tag{16.16}$$

In the left-hand side of this inequality, we add and subtract $f(x)$ and the average value of $f$ over $B(x, r)$. Then by (16.14), (16.15), and the triangle inequality, we have

$$\Big| \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y)\,dy - f(x) \Big| \leq 3t \tag{16.17}$$

for all sufficiently small $r > 0$. It follows that

$$\limsup_{r \to 0} \Big| \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y)\,dy - f(x) \Big| \leq 3t \tag{16.18}$$

for all $x$ outside a set of measure at most $(C + 1)\epsilon/t$. Keeping $t$ fixed and letting $\epsilon \to 0$ (along a countable sequence of values), we have

$$\limsup_{r \to 0} \Big| \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y)\,dy - f(x) \Big| \leq 3t \tag{16.19}$$

for almost every $x \in \mathbb{R}^n$. Now letting $t \to 0$ through a countable sequence of values, we have finally

$$\limsup_{r \to 0} \Big| \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y)\,dy - f(x) \Big| = 0 \tag{16.20}$$

for almost every $x \in \mathbb{R}^n$, which proves part (b) of the theorem.

For part (a), note that if $f$ is locally integrable then so is $|f(x) - c|$ for any constant $c \in \mathbb{C}$.
Thus for each $c \in \mathbb{C}$ we can apply part (b) to conclude that

$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - c|\,dy = |f(x) - c| \tag{16.21}$$

for all $x$ outside an exceptional set $E_c$ with $m(E_c) = 0$. Fix a countable dense subset $Q \subset \mathbb{C}$ and let $E = \bigcup_{c \in Q} E_c$; then $m(E) = 0$, and for fixed $x \notin E$ and $\epsilon > 0$ there exists $c \in Q$ with $|f(x) - c| < \epsilon$, so $|f(y) - f(x)| < |f(y) - c| + \epsilon$, and

$$\limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)|\,dy \leq |f(x) - c| + \epsilon < 2\epsilon. \tag{16.22}$$

Since $\epsilon > 0$ was arbitrary, this proves (a).

It remains to prove the Hardy-Littlewood maximal inequality, Theorem 16.4. The proof we give is based on the following lemma, known as the Wiener covering lemma. We let $B$ denote an open ball in $\mathbb{R}^n$; and for $a > 0$ let $aB$ denote the open ball with the same center as $B$, whose radius is $a$ times the radius of $B$.

Lemma 16.6 (Wiener's covering lemma). Let $\mathcal{B}$ be a collection of open balls in $\mathbb{R}^n$, and let $U = \bigcup_{B \in \mathcal{B}} B$. If $c < m(U)$, then there exist finitely many disjoint balls $B_1, \dots, B_k \in \mathcal{B}$ such that $m(\bigcup_{j=1}^k B_j) > 3^{-n} c$.

Proof. There exists a compact set $K \subset U$ such that $m(K) > c$. The collection of open balls $\mathcal{B}$ covers $K$, so there are finitely many balls $A_1, \dots, A_m$ whose union covers $K$. From these we select a disjoint subcollection by a greedy algorithm: from $A_1, \dots, A_m$ choose a ball with maximal radius. Call this $B_1$. Now discard all the balls that intersect $B_1$. From the balls that remain, choose one of maximal radius, necessarily disjoint from $B_1$; call this $B_2$. Continue inductively, at each stage choosing a ball of maximal radius disjoint from the balls that have already been picked. The process halts after a finite number of steps. We claim that the balls $B_1, \dots, B_k$ have the desired property.

This follows from a geometric observation: suppose $A, A'$ are open balls with radii $r, r'$, and suppose $r \geq r'$. If $A \cap A' \neq \emptyset$, then $A' \subset 3 \cdot A$ (draw a picture). From this observation, it follows that each ball $A_j$ that was not picked during the construction is contained in $3 \cdot B_i$ for some $i$. In particular, the balls $3 \cdot B_1, \dots, 3 \cdot B_k$ cover $K$. From the scaling property of Lebesgue measure (Theorem 15.3), we have $m(3 \cdot B) = 3^n m(B)$. Thus

$$c < m(K) \leq \sum_{j=1}^k m(3 \cdot B_j) = 3^n \sum_{j=1}^k m(B_j) = 3^n\, m\Big(\bigcup_{j=1}^k B_j\Big). \tag{16.23}$$

Proof of Theorem 16.4. Let $f \in L^1(\mathbb{R}^n)$ and fix $\lambda > 0$. Let $E_\lambda = \{x \in \mathbb{R}^n : M_f(x) > \lambda\}$. If $x \in E_\lambda$, then by definition of $M_f$ there is an $r_x > 0$ such that $A_{r_x,|f|}(x) > \lambda$. The open balls $B(x, r_x)$ then cover $E_\lambda$. Fix $c$ with $m(E_\lambda) > c$. Then $m(\bigcup_{x \in E_\lambda} B(x, r_x)) > c$, so by the Wiener covering lemma there are finitely many $x_1, \dots, x_k \in E_\lambda$ so that the balls $B_j := B(x_j, r_{x_j})$ are disjoint and $m(\bigcup_{j=1}^k B_j) > 3^{-n} c$. From the way the radii $r_x$ were chosen, we have for each $x_j$

$$\lambda < A_{r_{x_j},|f|}(x_j) = \frac{1}{m(B(x_j, r_{x_j}))} \int_{B(x_j, r_{x_j})} |f(y)|\,dy, \tag{16.24}$$

so

$$m(B_j) < \frac{1}{\lambda} \int_{B_j} |f(y)|\,dy. \tag{16.25}$$

It follows that

$$c < 3^n\, m\Big(\bigcup_{j=1}^k B_j\Big) = 3^n \sum_{j} m(B_j) < \frac{3^n}{\lambda} \sum_{j} \int_{B_j} |f(y)|\,dy \leq \frac{3^n}{\lambda} \int_{\mathbb{R}^n} |f(y)|\,dy = \frac{3^n}{\lambda} \|f\|_1. \tag{16.26}$$

This holds for all $c < m(E_\lambda)$, so taking the supremum over such $c$ we get finally

$$m(E_\lambda) \leq 3^n\, \frac{\|f\|_1}{\lambda}. \tag{16.27}$$
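The greedy selection in the proof of Wiener's covering lemma is easy to experiment with in one dimension, where a ball is an open interval given by a center and a radius. The sketch below is a minimal illustration under that assumption; the function names and the sample data are hypothetical, not part of the notes.

```python
# Illustrative 1-D version of the greedy selection in Wiener's covering lemma.
# A "ball" is a pair (center, radius) standing for the open interval
# (center - radius, center + radius). Names and sample data are hypothetical.

def are_disjoint(ball_a, ball_b):
    """Two open intervals are disjoint iff their centers are at least r_a + r_b apart."""
    (ca, ra), (cb, rb) = ball_a, ball_b
    return abs(ca - cb) >= ra + rb

def greedy_disjoint(balls):
    """Scan the balls in order of decreasing radius, keeping each ball that is
    disjoint from all balls kept so far. This matches 'pick a ball of maximal
    radius, discard everything meeting it, repeat'."""
    chosen = []
    for ball in sorted(balls, key=lambda b: b[1], reverse=True):
        if all(are_disjoint(ball, kept) for kept in chosen):
            chosen.append(ball)
    return chosen

def covered_by_tripled(ball, chosen):
    """True if the ball lies inside 3*B for some chosen ball B."""
    c, r = ball
    return any(abs(c - c2) + r <= 3 * r2 for c2, r2 in chosen)

balls = [(0.0, 1.0), (1.5, 1.0), (3.0, 0.5), (0.2, 0.3), (5.0, 2.0)]
chosen = greedy_disjoint(balls)
```

Every ball in `balls`, chosen or not, satisfies `covered_by_tripled(ball, chosen)`, which is the geometric observation used in the proof.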

17. Signed measures and the Lebesgue-Radon-Nikodym Theorem

A second form of the fundamental theorem of calculus says that if $f$ is continuous on the closed interval $[a, b]$ and differentiable in $(a, b)$ (with $f'$ integrable), then

$$f(b) - f(a) = \int_a^b f'(x)\,dx. \tag{17.1}$$

Suppose for the moment that $f$ is increasing on $[a, b]$. Then from our construction of Lebesgue-Stieltjes measures, the formula $\mu([c, d]) := f(d) - f(c)$, defined for all subintervals $[c, d] \subset [a, b]$, defines a unique Borel measure on $[a, b]$. On the other hand, from (17.1), this measure can equivalently be defined by the formula

$$\mu(E) = \int_E f'(x)\,dx. \tag{17.2}$$

In particular the function $x \mapsto \mu([a, x])$ is differentiable on $(a, b)$ and its derivative is $f'$, so we can write formally $d\mu = f'\,dm$, or, pressing the issue, $\frac{d\mu}{dm} = f'$. The Radon-Nikodym theorem is an abstract expression of this relationship between the measures $\mu$ and $m$. Before going further, however, it will be helpful to introduce signed measures.

17.1. Signed measures; the Hahn and Jordan decomposition theorems.

If $\mu, \nu$ are measures on a common measurable space $(X, \mathcal{M})$, then we have already seen that we can form new measures $c\mu$ (for $c \geq 0$) and $\mu + \nu$. We would like to extend these operations to allow negative constants and subtraction. The obvious thing to do is to define the difference of two measures to be

$$(\mu - \nu)(E) = \mu(E) - \nu(E). \tag{17.3}$$

The only difficulty is that the right-hand side may take the form $\infty - \infty$ and is therefore undefined. We deal with this problem by avoiding it: the measure $\mu - \nu$ will be defined only when at least one of $\mu, \nu$ is a finite measure, in which case the formula (17.3) always makes sense. It is straightforward to check that, under this assumption, the set function $\mu - \nu$ is finitely additive, and $(\mu - \nu)(\emptyset) = 0$. It is of course not monotone; but countable additivity will hold. From these observations we extract the definition of a signed measure:

Definition 17.1. Let $(X, \mathcal{M})$ be a measurable space.
A signed measure is a function $\rho : \mathcal{M} \to \overline{\mathbb{R}} = [-\infty, +\infty]$ satisfying:
a) $\rho(\emptyset) = 0$,
b) $\rho$ takes at most one of the values $+\infty$, $-\infty$,
c) if $(E_n)_{n=1}^\infty$ is a disjoint sequence of measurable sets, then $\sum_{n=1}^\infty \rho(E_n)$ converges to $\rho(\bigcup_{n=1}^\infty E_n)$, and the sum is absolutely convergent if $\rho(\bigcup_{n=1}^\infty E_n)$ is finite.

Remark 17.2. Actually, the statement about absolute convergence in (c) is an immediate consequence of the Riemann rearrangement theorem.

As an exercise, one can verify that if at least one of $\mu, \nu$ is finite, then $\mu - \nu$ as defined above satisfies these conditions. The main result of this section is the Jordan decomposition theorem, which says that every signed measure arises in this way; moreover the $\mu, \nu$ can be chosen in a canonical way. Note that we can also place a partial order on the set of finite measures on $(X, \mathcal{M})$, by saying that $\mu \geq \nu$ if and only if $\mu - \nu$ is a positive measure.

Example 17.3. Consider a measure space $(X, \mathcal{M}, \mu)$ and let $f : X \to \mathbb{R}$ belong to $L^1(\mu)$. $f$ can be decomposed into its positive and negative parts $f = f^+ - f^-$, where $f^+ = \max(f, 0)$ and $f^- = -\min(f, 0)$. The quantity

$$\rho(E) := \int_E f\,d\mu \tag{17.4}$$

then defines a signed measure on $(X, \mathcal{M})$. Indeed, $\rho$ can be written as $\rho = \mu_{f^+} - \mu_{f^-}$ where $\mu_{f^\pm}$ denotes the measure

$$\mu_{f^\pm}(E) = \int_E f^\pm\,d\mu. \tag{17.5}$$

In fact this will work as long as $f$ is semi-integrable (that is, at least one of $f^+, f^-$ is integrable).

It is not hard to show that monotone and dominated convergence for sets still hold for signed measures:

Proposition 17.4. Let $\rho$ be a signed measure. If $(E_n)_{n=1}^\infty$ is an increasing sequence of measurable sets, then $\rho(\bigcup_{n=1}^\infty E_n) = \lim_{n \to \infty} \rho(E_n)$. If $E_n$ is a decreasing sequence of measurable sets and $\rho(E_1)$ is finite, then $\rho(\bigcap_{n=1}^\infty E_n) = \lim_{n \to \infty} \rho(E_n)$.

Proof. The proof is essentially the same as in the unsigned case (using the disjointification trick) and is left as an exercise (Problem 18.18).

Before going further we introduce some notation and a couple of definitions. If $\rho$ is a signed measure and $Y \subset X$ is a measurable set, we let $\rho|_Y$ denote the measure defined by $\rho|_Y(E) := \rho(Y \cap E)$. Call a set $E$ totally positive for $\rho$ if $\rho|_E \geq 0$; totally negative if $\rho|_E \leq 0$. (Explicitly, $E$ is totally positive for $\rho$ if and only if $\rho(F) \geq 0$ for all measurable $F \subset E$; note that this is equivalent to saying that for all measurable $F \subset E$, we have $\rho(F) \leq \rho(E)$.) Call $E$ totally null if $\rho|_E = 0$. It is immediate that $E$ is totally null for $\rho$ if and only if it is both totally positive and totally negative. Say that a signed measure $\rho$ is supported on $E$ if $E^c$ is totally null for $\rho$. Say that two signed measures $\rho, \sigma$ are mutually singular if they have disjoint supports; we write $\rho \perp \sigma$.

Note that when we decompose a real-valued function $f$ into its positive and negative parts $f = f^+ - f^-$, the sets $X_+ := \{x : f^+(x) > 0\}$ and $X_- := \{x : f^-(x) > 0\}$ are disjoint, and $f|_{X_+} \geq 0$, $f|_{X_-} \leq 0$. A similar statement holds for signed measures:

Theorem 17.5 (Hahn Decomposition Theorem). Let $\rho$ be a signed measure. Then there exists a partition of $X$ into disjoint measurable sets $X = X_+ \cup X_-$ such that $\rho|_{X_+} \geq 0$ and $\rho|_{X_-} \leq 0$. Moreover if $X_+', X_-'$ is another such pair, then $X_+ \Delta X_+'$ and $X_- \Delta X_-'$ are totally null for $\rho$.

Proof. We may assume $\rho$ avoids the value $+\infty$ (otherwise consider $-\rho$). The idea of the proof is to select $X_+$ to be a maximal totally positive set for $\rho$, and then show that $X_- := X \setminus X_+$ is totally negative. $X_+$ is obtained by a "greedy algorithm": let $M$ denote the supremum of $\rho(E)$ over all totally positive sets $E$. Choose a sequence of totally positive sets $E_n$ so that $M = \lim \rho(E_n)$. Since each $E_n$ is totally positive, the union $X_+ := \bigcup_{n=1}^\infty E_n$ is also totally positive (check this), and by construction $\rho(X_+) = M$. (This shows $M$ is finite.)

The proof is finished if we can show that $X_- := X \setminus X_+$ is totally negative. By way of contradiction, suppose it is not. We again use a kind of "greedy algorithm": by our assumption, there is a set $E_1 \subset X_-$ with $\rho(E_1) > 0$. If $E_1$ is totally positive, then $X_+ \cup E_1$ is a totally positive set of $\rho$-measure strictly greater than $M$, a contradiction. Thus $E_1$ contains a subset of strictly larger measure. Choose $E_2 \subset E_1$ such that $\rho(E_2) \geq \rho(E_1) + 1/n_1$, where $n_1$ is the smallest positive integer for which this is possible. Now, if $E_2$ were totally positive, we would again obtain a contradiction, so we next select $E_3 \subset E_2$ such that $\rho(E_3) \geq \rho(E_2) + 1/n_2$, with the smallest possible $n_2$. Continuing by induction, we get a decreasing sequence of measurable sets $E_{j+1} \subset E_j$ and a sequence of integers $n_j$ such that $\rho(E_j) > 0$ for all $j$ and

$$\rho(E_{j+1}) \geq \rho(E_j) + \frac{1}{n_j}; \tag{17.6}$$

and $n_j$ is the smallest positive integer for which such $E_{j+1}$ exists.

Let $E = \bigcap_{j=1}^\infty E_j$. By Proposition 17.4, the set $E$ has positive measure, and hence its measure is finite (recall $\rho$ omits the value $+\infty$). Since $\rho(E)$ is finite, we have from the construction that the $n_j$ go to infinity. It remains to show that $E$ must be totally positive, which will be a contradiction. To see this, suppose $F \subset E$ has $\rho(F) > \rho(E)$. Let $n$ be the smallest integer such that $\rho(F) - \rho(E) \geq 1/n$. Then for every $j$ we would have $F \subset E_j$ and $\rho(F) \geq \rho(E) + 1/n \geq \rho(E_j) + 1/n$, which, since the $n_j$ go to infinity, contradicts the minimality of $n_j$ in (17.6) once $j$ is large enough.

The uniqueness statement in the theorem is left as an exercise (Problem 18.19).

Theorem 17.6 (Jordan Decomposition Theorem). If $\rho$ is a signed measure on $(X, \mathcal{M})$, then there exist unique positive measures $\rho_+, \rho_-$ such that $\rho_+ \perp \rho_-$ and $\rho = \rho_+ - \rho_-$.

Proof. Let $X = X_+ \cup X_-$ be a Hahn decomposition for $\rho$ and put $\rho_+ = \rho|_{X_+}$, $\rho_- = -\rho|_{X_-}$. It is immediate from the properties of the Hahn decomposition that $\rho_+, \rho_-$ have the desired properties; uniqueness is left as an exercise (Problem 18.20).

Example 17.7. Referring to Example 17.3, it is now immediate that the decomposition $\rho = \mu_{f^+} - \mu_{f^-}$ is the Jordan decomposition of $\rho$. Thus the Jordan decomposition theorem should be seen as a generalization of the decomposition of a real-valued function into its positive and negative parts.
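On a finite set, where a signed measure is just a real weight at each point, the Hahn and Jordan decompositions can be written down directly. The sketch below illustrates this special case only; the function names and the sample weights are assumptions made for the example.

```python
# Illustration on a finite set: a signed measure is a dict point -> real weight,
# and rho(E) is the sum of the weights over E. All names here are hypothetical.

def rho(weights, E):
    """Value of the signed measure on a subset E of the points."""
    return sum(weights[x] for x in E)

def hahn_decomposition(weights):
    """One Hahn decomposition: X_plus holds the points of nonnegative weight.
    Points of weight zero are null and could equally be placed in X_minus,
    which is the non-uniqueness allowed by Theorem 17.5."""
    x_plus = {x for x, w in weights.items() if w >= 0}
    x_minus = set(weights) - x_plus
    return x_plus, x_minus

def jordan_decomposition(weights):
    """Positive and negative parts rho_plus, rho_minus as dicts of nonnegative weights."""
    rho_plus = {x: max(w, 0.0) for x, w in weights.items()}
    rho_minus = {x: max(-w, 0.0) for x, w in weights.items()}
    return rho_plus, rho_minus

def total_variation(weights, E):
    """|rho|(E) = rho_plus(E) + rho_minus(E)."""
    return sum(abs(weights[x]) for x in E)

weights = {"a": 2.0, "b": -3.0, "c": 0.0, "d": 1.5}
x_plus, x_minus = hahn_decomposition(weights)
rho_plus, rho_minus = jordan_decomposition(weights)
```

With these weights the whole space has $\rho(X) = 0.5$ but total variation $6.5$, so $|\rho(X)|$ and $|\rho|(X)$ differ whenever weights of both signs occur.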

Let $\rho$ be a signed measure and $\rho = \rho_+ - \rho_-$ its Jordan decomposition. By analogy with the identity $|f| = f^+ + f^-$, we can define a measure $|\rho| := \rho_+ + \rho_-$; this is called the absolute value or total variation of $\rho$. The latter name is explained by the following proposition:

Proposition 17.8. Let $\rho$ be a signed measure. Then $|\rho|$ is the minimal positive measure $\mu$ such that $-\mu \leq \rho \leq \mu$. Moreover, for each measurable set $E$ we have $|\rho|(E) = \sup \sum_{n=1}^\infty |\rho(E_n)|$, where the supremum is taken over all partitions $E = \bigcup_{n=1}^\infty E_n$ of $E$ into disjoint measurable sets.

Proof. For the first assertion, it is immediate from the definition that $-|\rho| \leq \rho \leq |\rho|$. Suppose $\mu$ is a positive measure and $-\mu \leq \rho \leq \mu$. Let $X = X_+ \cup X_-$ be a Hahn decomposition for $\rho$. Then $\mu|_{X_+} \geq \rho|_{X_+} = \rho_+$, and similarly $-\mu|_{X_-} \leq \rho|_{X_-} = -\rho_-$, so $\mu|_{X_-} \geq \rho_-$. Adding the first and last inequalities we get $\mu \geq |\rho|$. The second part of the proposition is an exercise (Problem 18.22).

Warning: A moment's thought shows that in general $|\rho(E)| \neq |\rho|(E)$. As an exercise, prove that $|\rho(E)| \leq |\rho|(E)$ always, with equality if and only if $E$ is either totally positive, totally negative, or totally null for $\rho$.

We can now define a signed measure $\rho$ to be finite or σ-finite according as $|\rho|$ is finite or σ-finite. It is not hard to show that $\rho$ is finite if and only if $\rho(E)$ is finite for every $E$, if and only if $\rho_+, \rho_-$ are finite unsigned measures. It is evident from this that the space of finite signed measures on $(X, \mathcal{M})$ is a real vector space, denoted $M(X)$. (We will see later in the course that the quantity $\|\rho\| := |\rho|(X)$ defines a norm on $M(X)$, called the total variation norm.)

A few remarks about integration against signed measures are in order. If $\rho$ is a signed measure, then $L^1(\rho)$ is defined to be $L^1(|\rho|)$; note that $L^1(|\rho|) = L^1(\rho_+) \cap L^1(\rho_-)$. When $f \in L^1(\rho)$, we define

$$\int f\,d\rho := \int f\,d\rho_+ - \int f\,d\rho_-. \tag{17.7}$$

Proposition 17.9. Let $\rho$ be a signed measure on $(X, \mathcal{M})$.
a) If $f \in L^1(\rho)$, then $\big| \int f\,d\rho \big| \leq \int |f|\,d|\rho|$.
b) If $E \in \mathcal{M}$, then $|\rho|(E) = \sup\{ \big| \int_E f\,d\rho \big| : |f| \leq 1 \}$.

Proof. Problem 18.23.

17.2. The Lebesgue-Radon-Nikodym theorem.

Fix for reference a measurable space $(X, \mathcal{M})$ and an unsigned measure $m$ on this space. (In this section all measures are defined on the same σ-algebra $\mathcal{M}$.) For an unsigned measurable function $f$, we have the measure

$$m_f(E) = \int_E f\,dm. \tag{17.8}$$

The map $f \mapsto m_f$ is thus a map from the space of unsigned measurable functions into the space of nonnegative measures on $X$. We saw in the last section that when $f$ is a real-valued $L^1$ function, this map takes $L^1(m)$ into the space of finite signed measures on $X$. One may ask if every finite measure $\mu$ on $X$ may be expressed as $m_f$ for some $f$, but one can quickly see this is not the case in general. Indeed, if $\mu = m_f$ then $\mu(E) = 0$ whenever $m(E) = 0$, which need not always be the case (e.g., $m$ is Lebesgue measure on $\mathbb{R}$ and $\mu$ is the point mass at 0). However, when the measures involved are σ-finite, it turns out this is the only obstruction.

Theorem 17.10. Let $m$ be an unsigned σ-finite measure and $\mu$ a signed σ-finite measure. Then there is a unique decomposition $\mu = m_f + \mu_s$ where $f$ is semi-integrable with respect to $m$ and $\mu_s \perp m$. Moreover if $\mu$ is unsigned then $f$ and $\mu_s$ are as well, and if $\mu$ is finite then $\mu_s$ is finite and $f \in L^1(m)$.

The proof will make use of a few lemmas.

Lemma 17.11. Let $(X, \mathcal{M}, m)$ be a measure space and $f$ an unsigned measurable function. Then

$$m_f(E) := \int_E f\,dm \tag{17.9}$$

defines a measure on $\mathcal{M}$, and if $g \in L^1(m_f)$ then $gf \in L^1(m)$ and

$$\int g\,dm_f = \int gf\,dm. \tag{17.10}$$

Proof. That $m_f$ is a measure is Proposition 9.3. The integral formula was asked for in Problem 13.16; we give a proof here. First, the integral formula holds by definition when $g = 1_E$, and by linearity it holds for simple $g$. By the monotone convergence theorem it holds for all unsigned measurable $g$, and then for all $g \in L^1(m_f)$ by linearity again.

Lemma 17.12. Suppose $m$ is a σ-finite measure. If $f, g : X \to \mathbb{R}$ belong to $L^1(m)$, then $m_f = m_g$ if and only if $f = g$ $m$-a.e.

Proof. Suppose $m_f = m_g$. By Example 17.3 and the uniqueness of the Jordan decomposition, we have $m_{f^+} = m_{g^+}$ and $m_{f^-} = m_{g^-}$. It then follows from Proposition 11.2 that $f^+ = g^+$ and $f^- = g^-$ a.e. The converse is immediate.

Lemma 17.13. Suppose $\mu$ and $\nu$ are finite positive measures on $(X, \mathcal{M})$. Then either $\mu \perp \nu$, or else there exist $\epsilon > 0$ and a measurable set $E$ such that $\mu(E) > 0$ and $\nu \geq \epsilon\mu$ on $E$ (that is, $E$ is totally positive for $\nu - \epsilon\mu$).

Proof. For each $n \geq 1$, let $X = X_+^n \cup X_-^n$ be a Hahn decomposition for $\nu - \frac{1}{n}\mu$. Let $P = \bigcup_{n=1}^\infty X_+^n$ and $N = \bigcap_{n=1}^\infty X_-^n$. Then $N$ is totally negative for $\nu - \frac{1}{n}\mu$ for all $n$, so $0 \leq \nu(N) \leq \frac{1}{n}\mu(N)$ for all $n$, and it follows that $\nu(N) = 0$. If $\mu(P) = 0$, then $\mu \perp \nu$. Otherwise, $\mu(X_+^n) > 0$ for some $n$, and by construction $X_+^n$ is totally positive for $\nu - \frac{1}{n}\mu$ (thus we take $\epsilon = \frac{1}{n}$, $E = X_+^n$).

Proof of Theorem 17.10. We prove this only for the case that $\mu, m$ are finite; the extension to the σ-finite case is left as an exercise. Using the Jordan decomposition theorem, we may additionally assume that $\mu$ is unsigned.

We first prove existence of $f$ and $\mu_s$. As before, $f$ is selected by a "greedy algorithm." Let $M$ be the supremum of the quantity $\int_X f\,dm$, taken over all unsigned $f$ such that $m_f \leq \mu$. Note that $M$ is finite, since $\mu$ is. Choose a sequence $f_n$ so that $\int_X f_n\,dm \to M$. Replacing the sequence $f_n$ by $g_n := \sup_{1 \leq k \leq n} f_k$, we have that the $g_n$ are increasing, $m_{g_n} \leq \mu$, and $\int_X g_n\,dm \to M$. Since the $g_n$ are increasing, they have a pointwise limit $f = \lim g_n = \sup_n f_n$, and by monotone convergence $m_f \leq \mu$ and $\int_X f\,dm = M$. Thus, the supremum is attained; we take this to be our $f$.

The proof is finished if we can show that $\mu_s := \mu - m_f$ is singular to $m$. If not, then by Lemma 17.13 there is an $\epsilon > 0$ and an $E$ such that $m(E) > 0$ and $\mu_s - \epsilon\, m|_E \geq 0$. This says that $\mu_s \geq \epsilon\, m|_E$; in other words $\mu \geq m_f + \epsilon (m|_E)$, which means that we could add $\epsilon 1_E$ to $f$, contradicting the maximality of $f$.

Finally, to see that $f$ and $\mu_s$ are unique, note that if also $\mu = m_g + \nu_s$, then we have $m_{f-g} = \nu_s - \mu_s$. Observe that since $\mu_s$ and $\nu_s$ are both singular to $m$, then so is their difference. But this means that $m_{f-g}$ is singular to $m$, hence $f - g = 0$ a.e. It then follows that also $\nu_s = \mu_s$.

The function $f$ is called the Radon-Nikodym derivative of $\mu$ with respect to $m$, and we write $\frac{d\mu}{dm} = f$. The basic manipulations suggested by the derivative notation are valid. For example it is easy to check that $d(\mu_1 + \mu_2)/dm = d\mu_1/dm + d\mu_2/dm$.
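For measures on a finite set the decomposition $\mu = m_f + \mu_s$ is completely explicit: $f$ is the ratio of point masses where $m$ charges a point, and $\mu_s$ collects the mass of $\mu$ sitting on $m$-null points. The sketch below illustrates only this discrete case; the function name and sample masses are hypothetical.

```python
# Illustration on a finite set: measures are dicts point -> nonnegative mass.
# The function name and the sample data are assumptions made for this sketch.

def lebesgue_decomposition(mu, m):
    """Return (f, mu_s) with mu = m_f + mu_s and mu_s singular to m.
    f is the density mu/m where m has mass (set to 0 on m-null points, where its
    value is irrelevant); mu_s keeps the mass of mu living on m-null points."""
    f = {x: (mu[x] / m[x] if m[x] > 0 else 0.0) for x in m}
    mu_s = {x: (mu[x] if m[x] == 0 else 0.0) for x in m}
    return f, mu_s

m = {1: 0.5, 2: 2.0, 3: 0.0}    # point 3 is m-null
mu = {1: 1.0, 2: 1.0, 3: 4.0}   # mu charges the m-null point 3
f, mu_s = lebesgue_decomposition(mu, m)
```

Here $\mu_s$ is supported on the single point 3 while $m$ is supported on $\{1, 2\}$, so the two are mutually singular, and $\mu(\{x\}) = f(x)\,m(\{x\}) + \mu_s(\{x\})$ at every point.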
We will see in a moment that the chain rule is valid. As a corollary of the Lebesgue-Radon-Nikodym theorem (combined with some of our earlier work) we obtain the following: Corollary 17.14. Let m be an unsigned σ-finite measure and µ a signed σ-finite measure. The following are equivalent:

i) $\mu = m_f$ for some measurable function $f$.
ii) $\mu(E) = 0$ whenever $m(E) = 0$.
iii) For every $\epsilon > 0$, there exists a $\delta > 0$ such that $|\mu(E)| < \epsilon$ whenever $m(E) < \delta$.

Proof. If $\mu$ is a finite positive measure, then (i) implies (iii) was already proved in Lemma 12.14. The σ-finite and signed cases follow readily from this; the proof is an exercise. (iii) implies (ii) is trivial. For (ii) implies (i), apply Theorem 17.10 to $\mu$ to obtain $\mu = m_f + \mu_s$. Let $E$ be a support for $\mu_s$ with $m(E) = 0$, which exists since $\mu_s \perp m$. By (ii) we have $\mu(E) = 0$, hence $\mu_s(E) = \mu(E) - m_f(E) = 0$; the same argument applies to every measurable subset of $E$, so $\mu_s$ is trivial.

When any of the conditions (i)–(iii) of Corollary 17.14 hold, we say that $\mu$ is absolutely continuous with respect to $m$, and write $\mu \ll m$. It is straightforward to verify that $\mu \ll m$ if and only if $|\mu| \ll m$, and $\mu \perp m$ if and only if $|\mu| \perp m$. In this language, the Lebesgue-Radon-Nikodym theorem (Theorem 17.10) says that $\mu$ can be decomposed uniquely as a sum of two measures, one absolutely continuous with respect to $m$ and one singular to $m$. We can now state and prove the chain rule for Radon-Nikodym derivatives.

Proposition 17.15 (Chain rule for Radon-Nikodym derivatives). Suppose $m, \lambda$ are σ-finite positive measures and $\mu$ is a σ-finite signed measure, with $\mu \ll \lambda$ and $\lambda \ll m$.
i) If $g \in L^1(|\mu|)$, then $g\,\frac{d\mu}{d\lambda} \in L^1(\lambda)$ and
$$\int_X g\,d\mu = \int_X g\,\frac{d\mu}{d\lambda}\,d\lambda. \tag{17.11}$$
ii) We have $\mu \ll m$ and
$$\frac{d\mu}{dm} = \frac{d\mu}{d\lambda}\,\frac{d\lambda}{dm} \quad m\text{-a.e.} \tag{17.12}$$

Proof. By treating $\mu_+, \mu_-$ separately, we may assume $\mu \geq 0$. Write $f = \frac{d\mu}{d\lambda}$. Since $\mu \ll \lambda$ by hypothesis, we have in the notation of this section $\mu = \lambda_f$, so by Lemma 17.11

$$\int_X g\,d\mu = \int_X gf\,d\lambda, \tag{17.13}$$

which proves (i). For (ii), apply (i) with $\lambda, m$ in place of $\mu, \lambda$ and $g = 1_E \frac{d\mu}{d\lambda}$ for a measurable set $E$. Then

$$\mu(E) = \int_E \frac{d\mu}{d\lambda}\,d\lambda = \int_E \frac{d\mu}{d\lambda}\,\frac{d\lambda}{dm}\,dm. \tag{17.14}$$

It follows that $\mu \ll m$, and then by the definition of $\frac{d\mu}{dm}$ we have $\mu(E) = \int_E \frac{d\mu}{dm}\,dm$. Comparing this to (17.14) and using Proposition 11.2 again, we conclude that $\frac{d\mu}{dm} = \frac{d\mu}{d\lambda}\,\frac{d\lambda}{dm}$ a.e. with respect to $m$.
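On a finite set the chain rule reduces to cancelling point masses, which makes it easy to check by machine. The following sketch uses exact rational arithmetic; the helper `density` and the sample measures are assumptions introduced for this illustration.

```python
# Illustration of the chain rule for point-mass measures on a finite set.
# The helper name and sample data are hypothetical.
from fractions import Fraction

def density(mu, m):
    """Radon-Nikodym derivative d(mu)/d(m) for measures given as dicts
    point -> mass, assuming mu << m. On m-null points the value is set to 0;
    any value would do there, since densities are only determined m-a.e."""
    for x in m:
        if m[x] == 0 and mu[x] != 0:
            raise ValueError("mu is not absolutely continuous with respect to m")
    return {x: (mu[x] / m[x] if m[x] != 0 else Fraction(0)) for x in m}

m = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(0)}
lam = {"a": Fraction(3), "b": Fraction(1, 3), "c": Fraction(0)}  # lam << m
mu = {"a": Fraction(6), "b": Fraction(0), "c": Fraction(0)}      # mu << lam

dmu_dlam = density(mu, lam)
dlam_dm = density(lam, m)
dmu_dm = density(mu, m)
```

At every point of positive $m$-mass, `dmu_dm[x]` equals `dmu_dlam[x] * dlam_dm[x]`, which is (17.12) in this setting.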
One important corollary of the Lebesgue-Radon-Nikodym theorem is the existence of conditional expectations:

Proposition 17.16. Let $(X, \mathcal{M}, \mu)$ be a σ-finite measure space ($\mu$ a positive measure), $\mathcal{N}$ a sub-σ-algebra of $\mathcal{M}$, and $\nu = \mu|_{\mathcal{N}}$. If $f \in L^1(\mu)$ then there exists $g \in L^1(\nu)$ (unique modulo $\nu$-null sets) such that

$$\int_E f\,d\mu = \int_E g\,d\nu$$

for all $E \in \mathcal{N}$ ($g$ is called the conditional expectation of $f$ on $\mathcal{N}$).
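When $X$ is finite and $\mathcal{N}$ is generated by a partition of $X$ into atoms, the conditional expectation is simply the $\mu$-weighted average of $f$ over each atom. The sketch below illustrates that special case; the function name, the partition, and the data are hypothetical.

```python
# Illustration: conditional expectation on a finite set, where the sub-sigma-algebra
# is generated by a partition into atoms. Names and data are assumptions.

def conditional_expectation(f, mu, atoms):
    """Return g, constant on each atom, with the same mu-integral as f over each atom.
    f and mu are dicts point -> value / mass; atoms is a list of disjoint sets of points."""
    g = {}
    for atom in atoms:
        mass = sum(mu[x] for x in atom)
        integral = sum(f[x] * mu[x] for x in atom)
        average = integral / mass if mass > 0 else 0.0  # value on a null atom is arbitrary
        for x in atom:
            g[x] = average
    return g

mu = {1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0}
f = {1: 0.0, 2: 4.0, 3: 1.0, 4: 5.0}
atoms = [{1, 2}, {3, 4}]
g = conditional_expectation(f, mu, atoms)
```

Since every set in $\mathcal{N}$ is a union of atoms, checking $\int_E f\,d\mu = \int_E g\,d\nu$ on the atoms is enough.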

Proof. Problem 18.27 17.3. Lebesgue differentiation revisited. Finally, we describe the connection between Radon-Nikodym derivatives and Lebesgue differentiation on Rn. In this section µ will denote a signed Borel measure on Rn which is regular: a positive measure µ is regular if i) µ(K) < ∞ for every compact K ⊂ Rn, and ii) for every Borel set E ⊂ Rn, we have µ(E) = inf{µ(U): U open ,E ⊂ U}. In fact, one can show that the second condition is already implied by the first, but we will not prove this here so we assume (ii) explicitly. A signed measure µ is called regular if its positive and negative parts µ± are regular. Theorem 17.17. Let µ be a regular, signed Borel measure on Rn, with Lebesgue decomposition

µ = m_f + µ_s with respect to Lebesgue measure m. Then
\[ \lim_{r \to 0} \frac{\mu(B_r(x))}{m(B_r(x))} = f(x) \tag{17.15} \]
for m-a.e. x ∈ R^n.

Proof. We may assume µ is positive. By the regularity of µ, we see that the measure m_f is locally finite, so f ∈ L^1_loc. One may verify that the measure m_f is regular, and so µ_s is as well. Applying the Lebesgue differentiation theorem, we know that (17.15) holds already with µ = m_f, so it suffices to prove that
\[ \lim_{r \to 0} \frac{\mu_s(B_r(x))}{m(B_r(x))} = 0 \quad m\text{-a.e.} \tag{17.16} \]
for the singular part µ_s. For this, fix a Borel set E such that µ_s(E) = m(E^c) = 0 and let
\[ E_k = \left\{ x \in E : \limsup_{r \to 0} \frac{\mu_s(B_r(x))}{m(B_r(x))} > \frac{1}{k} \right\}. \]

It will suffice to prove that m(E_k) = 0 for each integer k ≥ 1. By regularity, for given ε > 0 there is an open set U containing E such that µ_s(U) < ε. By the definition of E_k, for each x ∈ E_k there is a ball B_x centered at x (and contained in U) such that µ_s(B_x) > m(B_x)/k. Let V = ⋃_{x ∈ E_k} B_x be the union of these balls. Fix a number c < m(V) and apply Wiener's covering lemma 16.6. We obtain points x_1, …, x_m ∈ E_k such that the balls B_j := B_{x_j}, j = 1, …, m, are disjoint and
\[ c < 3^n \sum_{j=1}^m m(B_j) \le 3^n k \sum_{j=1}^m \mu_s(B_j) \le 3^n k \, \mu_s(V) \le 3^n k \, \mu_s(U) < 3^n k \varepsilon. \]
Since c < m(V) was arbitrary, m(V) ≤ 3^n k ε, and since E_k ⊂ V and ε was arbitrary, we conclude m(E_k) = 0.

18. Problems

18.1. Product measures.

Problem 18.1. Let µ_X denote counting measure on X. Prove that if X, Y are both at most countable, then 2^X ⊗ 2^Y = 2^{X×Y} and µ_X × µ_Y = µ_{X×Y}.

Problem 18.2. Prove that the product measure construction is associative: that is, if (X_j, M_j, µ_j), j = 1, 2, 3, are σ-finite measure spaces, then (M_1 ⊗ M_2) ⊗ M_3 = M_1 ⊗ (M_2 ⊗ M_3), and (µ_1 × µ_2) × µ_3 = µ_1 × (µ_2 × µ_3).

Problem 18.3. Let X = Y = [0, 1], M = N = B_{[0,1]}, µ Lebesgue measure on [0, 1], and ν counting measure on [0, 1]. Let E denote the diagonal {(x, x) : x ∈ [0, 1]} ⊂ [0, 1] × [0, 1]. Prove that
\[ \int_X \int_Y 1_E(x, y) \, d\nu(y) \, d\mu(x), \qquad \int_Y \int_X 1_E(x, y) \, d\mu(x) \, d\nu(y) \tag{18.1} \]
are unequal.

Problem 18.4. Prove Proposition 14.10.

Problem 18.5. Prove Corollary 14.9.

Problem 18.6. Prove Corollary 14.13.

18.2. Integration on R^n.

Problem 18.7. Compare the three integrals
\[ \iint_{[0,1]^2} f \, dm^2, \qquad \int_0^1 \int_0^1 f(x, y) \, dx \, dy, \qquad \int_0^1 \int_0^1 f(x, y) \, dy \, dx \tag{18.2} \]
for the functions
a) f(x, y) = (x^2 − y^2)/(x^2 + y^2)^2
b) f(x, y) = (1 − xy)^{−s}, s > 0

Problem 18.8. Prove that if f ∈ L^1[0, 1] and g(x) = ∫_x^1 t^{−1} f(t) dt, then g ∈ L^1[0, 1] and ∫_0^1 g(x) dx = ∫_0^1 f(x) dx.

Problem 18.9. Prove that ∫_0^∞ |sin x / x| dx = +∞, but the limit lim_{b→+∞} ∫_0^b (sin x / x) dx exists and is finite. (For a bigger challenge, show that the value of the limit is π/2.)

Problem 18.10. Prove Theorem 15.2.

Problem 18.11. Complete the proof of Theorem 15.6.

Problem 18.12 (The Gamma function). Define
\[ \Gamma(x) := \int_0^\infty t^{x-1} e^{-t} \, dt. \tag{18.3} \]
a) Prove that the function t ↦ t^{x−1} e^{−t} is absolutely integrable for all fixed x > 0 (thus Γ(x) is defined for all x > 0).
b) Prove that Γ(x + 1) = xΓ(x) for all x > 0.
c) Compute Γ(1/2). (Hint: if you haven't seen this before, first make the change of variables u = √t, then evaluate the square of the resulting integral using Tonelli's theorem and polar coordinates.)
d) Using (b) and (c), conclude that Γ(n + 1/2) = (n − 1/2)(n − 3/2) ··· (1/2) √π for all n ≥ 1.

Problem 18.13. Complete the following outline to prove that
\[ \sigma(S^{n-1}) = \frac{2 \pi^{n/2}}{\Gamma(n/2)}. \tag{18.4} \]

a) Show that if f ∈ L^1(R^n) and f is a radial function (that is, f(x) = g(|x|) for some function g : [0, ∞) → C), then
\[ \int_{\mathbb{R}^n} f(x) \, dx = \sigma(S^{n-1}) \int_0^\infty g(r) \, r^{n-1} \, dr. \tag{18.5} \]
b) Show that for all c > 0,

\[ \int_{\mathbb{R}^n} e^{-c|x|^2} \, dx = \left( \frac{\pi}{c} \right)^{n/2}. \tag{18.6} \]
(Hint: write e^{−c|x|^2} = ∏_{j=1}^n e^{−c|x_j|^2} and use Tonelli's theorem.)
c) Finish by combining (a) and (b). (Using the results on the Gamma function from the previous exercise, one finds that σ(S^{n−1}) is always a rational multiple of an integer power of π.)

18.3. Differentiation theorems.

Problem 18.14. Prove that if 0 ≠ f ∈ L^1(R), then there exist constants C, R > 0 (depending on f) such that
\[ M_f(x) \ge \frac{C}{|x|} \quad \text{for all } |x| > R. \tag{18.7} \]

(Hint: reduce to the case f = 1_E where E is a bounded set of positive measure.) Conclude that M_f never belongs to L^1(R) if f ∈ L^1 is not a.e. 0.

Problem 18.15. The Lebesgue differentiation theorem says that for f ∈ L^1(R^n), we have A_{r,f} → f pointwise a.e. as r → 0. Prove that also A_{r,f} → f in the L^1 norm. (Hint: the proof can be done in three steps. First prove this under the assumption that f is continuous with compact support. Then prove that for all f ∈ L^1 and r > 0, the function A_{r,f} belongs to L^1; in fact ‖A_{r,f}‖_1 ≤ ‖f‖_1 for all r. Tonelli's theorem will help. Finally, to pass to general L^1 functions, use a density argument.)

Problem 18.16. Let E be a Borel set in R. Define the density of E at x to be
\[ D_E(x) = \lim_{r \to 0} \frac{m(E \cap B(x, r))}{m(B(x, r))} \]
whenever the limit exists.

a) Show that D_E(x) = 1 for a.e. x ∈ E and D_E(x) = 0 for a.e. x ∉ E.
b) Give examples of E and x for which D_E(x) = α (0 < α < 1) and for which D_E(x) does not exist.

Problem 18.17. Define the decentered Hardy-Littlewood maximal function for f ∈ L^1(R^n) by
\[ M_f^*(x) = \sup_B \frac{1}{m(B)} \int_B |f(y)| \, dy, \tag{18.8} \]
where the supremum is taken over all open balls B containing x (not just those centered at x). Prove that
\[ M_f \le M_f^* \le 2^n M_f. \tag{18.9} \]

18.4. Signed measures and the Lebesgue-Radon-Nikodym theorem.

Problem 18.18. Prove Proposition 17.4.

Problem 18.19. Complete the proof of Theorem 17.5.

Problem 18.20. Prove the uniqueness statement in the Jordan decomposition theorem. (Hint: if also ρ = σ^+ − σ^−, use σ^± to obtain another Hahn decomposition of X.)

Problem 18.21. Prove that if ν_j ⊥ µ for j ∈ N then (Σ_j ν_j) ⊥ µ, and if ν_j ≪ µ for j ∈ N then (Σ_j ν_j) ≪ µ.

Problem 18.22. Complete the proof of Proposition 17.8.

Problem 18.23. Prove Proposition 17.9.

Problem 18.24. Complete the proof of Theorem 17.10 in the σ-finite case.

Problem 18.25. Complete the proof of the (i) =⇒ (iii) implication in Corollary 17.14.

Problem 18.26. Suppose ρ is a signed measure on (X, M) and E ∈ M. Prove that
a) ρ^+(E) = sup{ρ(F) : F ∈ M, F ⊂ E} and ρ^−(E) = − inf{ρ(F) : F ∈ M, F ⊂ E}
b) |ρ|(E) = sup{ Σ_{j=1}^n |ρ(E_j)| : E_1, …, E_n are disjoint and ⋃_{j=1}^n E_j = E }

Problem 18.27.
a) Prove Proposition 17.16.
b) In the case µ = Lebesgue measure on [0, 1), fix a positive integer k and let N be the sub-σ-algebra generated by the intervals [j/k, (j+1)/k) for j = 0, …, k − 1. Give an explicit formula for the conditional expectation g in terms of f.