Some Notes on Measure Theory

Chris Preston

This version: February 2005

These notes present the material on measures and kernels which is needed in order to read my lecture notes Specifications and their Gibbs states [16]. They could perhaps be used as a general introduction to some parts of measure theory, but the account is somewhat biased and the contents are determined entirely by the kind of results used in [16]. In particular, the topological aspects of measure theory are missing completely.

The theory in [16] really only requires an integral for non-negative mappings, and so we also make this restriction here. The disadvantage is that we then have non-negative cones of mappings instead of vector spaces, but the advantage is that the value ∞ is much less of a nuisance and in most cases can be regarded as just another number.

There are many texts providing a more balanced account of measure theory. The classical text is Halmos [8] and a very good modern book is Cohn [3]; the first course I gave on the subject was based on Taylor [17]. However, the book everyone should look at at least once is Meyer [14].

Chris Preston
Pankow, February 2005

Contents

1 Extended real numbers 4

2 Measurable spaces and mappings 7

3 Measures 18

4 The Carath´eodory extension theorem 24

5 Measures on the 30

6 How the integral will be introduced 34

7 Partially ordered sets 35

8 Real valued mappings 40

9 Real valued measurable mappings 47

10 The integral 52

11 The Daniell integral 61

12 The Radon-Nikodym theorem 70

13 Image and pre-image measures 76

14 Kernels 82

15 Product measures 88

16 Countably generated measurable spaces 99

17 The Dunford-Pettis theorem 108

18 Substandard Borel spaces 112

19 The Kolmogorov extension property 116

20 Convergence of conditional expectations 122

21 Existence of conditional distributions 129

22 Standard Borel spaces 132

23 The usual diagonal argument 137

Bibliography 139

Index 140

1 Extended real numbers

Most of the mappings we will be dealing with take their values in the set R⁺∞ of non-negative extended real numbers. In this chapter we list (without proofs) the facts which we need about these numbers. The reader should check through the list to see that there are no surprises.

The natural numbers {0, 1, 2, . . .} are denoted by N, the positive natural numbers {1, 2, . . .} by N⁺. A countable set is one which is either finite or countably infinite (the latter meaning it has the same cardinality as N).

Put R⁺ = {a ∈ R : a ≥ 0} and let R⁺∞ = R⁺ ∪ {∞}; the operations of addition and multiplication on R⁺ will be extended to R⁺∞ by letting a + ∞ = ∞ + a = ∞ for all a ∈ R⁺∞, a · ∞ = ∞ · a = ∞ for all a ∈ R⁺∞ \ {0} and 0 · ∞ = ∞ · 0 = 0. As can easily be verified, these extended operations satisfy the usual associative, commutative and distributive laws (without any restriction).

If a, b ∈ R⁺∞ then a − b is not always defined; however, it is useful to always assign |a − b| a value: This is the usual value if a, b ∈ R⁺ and ∞ in all other cases, which means that |∞ − ∞| = ∞. Let a, b ∈ R⁺∞ with a ≤ b; then there exists c ∈ R⁺∞ with b = a + c, and c is unique unless both a and b are equal to ∞. It is convenient to define b − a to be the 'real' difference if a, b ∈ R⁺ and to be ∞ otherwise. Thus b = a + (b − a) for all a, b ∈ R⁺∞ with a ≤ b, and b − a = ∞ whenever b = ∞.

The usual total order ≤ on R⁺ will be extended to a total order on R⁺∞ (also denoted by ≤) by letting a ≤ ∞ for each a ∈ R⁺∞. In particular a ≤ |a − b| + b and b ≤ |a − b| + a for all a, b ∈ R⁺∞. Put a ∧ b = min{a, b}, a ∨ b = max{a, b} and note that |a − b| = (a ∨ b − a) + (a ∨ b − b) for all a, b ∈ R⁺∞. We use a ≥ b as an alternative notation for b ≤ a. Moreover, a < b means of course a ≤ b but not a = b, and a > b means a ≥ b but not a = b.

A sequence {an}n≥1 from R⁺∞ converges to a ∈ R⁺ if for each ε > 0 there exists m ≥ 1 such that |a − an| < ε for all n ≥ m. This means exactly that there exists p ≥ 1 such that an ∈ R⁺ for all n ≥ p and that the sequence {an}n≥p converges to a in R. A sequence {an}n≥1 from R⁺∞ converges to ∞ if for each b ∈ R⁺ there exists m ≥ 1 such that an ≥ b for all n ≥ m. If {an}n≥1 converges to a ∈ R⁺∞ then a is uniquely determined by {an}n≥1; this value a is called the limit of the sequence and will be denoted by limn→∞ an, or mostly just by limn an. If {an}n≥1 is a sequence from R⁺∞ then the statement limn an = a is short for the statement that the sequence {an}n≥1 converges with limit a.

Let {an}n≥1 be a convergent sequence from R⁺∞ with a = limn an. If a ∈ R⁺ then limn |a − an| = 0. However, if a = ∞ then |a − an| = ∞ for all n ≥ 1.

A sequence {an}n≥1 from R⁺∞ is said to be increasing if an ≤ an+1 for all n ≥ 1 and decreasing if an+1 ≤ an for all n ≥ 1. Each increasing sequence {an}n≥1 converges: If there


exists a ∈ R⁺ with an ≤ a for all n ≥ 1 then limn an is just the limit in R, and if no such a exists then limn an = ∞. Moreover, each decreasing sequence {an}n≥1 also converges: Either an = ∞ for all n ≥ 1, in which case limn an = ∞, or an < ∞ for all n ≥ p for some p ≥ 1, and then limn an is just the limit of the sequence {an}n≥p in R.

Let A be a subset of R⁺∞; an element a ∈ R⁺∞ is said to be an upper bound (resp. lower bound) for A if b ≤ a (resp. b ≥ a) for all b ∈ A. An upper bound (resp. lower bound) a is called a least upper bound (resp. greatest lower bound) for A if a ≤ b for each upper bound b for A (resp. if a ≥ b for each lower bound b for A). If a least upper bound (resp. greatest lower bound) exists then it is clearly unique and it will be denoted by sup(A) (resp. by inf(A)).

Each non-empty subset A of R⁺∞ possesses both a least upper bound and a greatest lower bound. If A is a bounded subset of R⁺ (i.e., if b is an upper bound for A for some b ∈ R⁺) then sup(A) is the least upper bound of A in R. If A has no upper bound in R⁺ then sup(A) = ∞. If A = {∞} then inf(A) = ∞. If A ≠ {∞} then inf(A) is the greatest lower bound of A \ {∞} in R.

If {an}n≥1 is any sequence from R⁺∞ and m ≥ 1 then {an : n ≥ m} will be used to denote the set of values {a ∈ R⁺∞ : a = an for some n ≥ m}.

If {an}n≥1 is an increasing sequence from R⁺∞ then limn an is just the least upper bound of its set of values {an : n ≥ 1}. Thus limn an is the unique element a of R⁺∞ with the properties: (i) an ≤ a for all n ≥ 1 and (ii) if b ∈ R⁺∞ with b ≤ a and b ≠ a then b ≤ an for all large enough n. Similarly, if {an}n≥1 is a decreasing sequence from R⁺∞ then limn an is the greatest lower bound of {an : n ≥ 1}, and therefore limn an is the unique element a of R⁺∞ with the properties: (i) an ≥ a for all n ≥ 1 and (ii) if b ∈ R⁺∞ with b ≥ a and b ≠ a then b ≥ an for all large enough n.
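The arithmetic conventions above (in particular 0 · ∞ = 0 and |∞ − ∞| = ∞) can be modelled directly in code. The following Python sketch is illustrative only: `INF` stands for ∞ and the function names are ad hoc choices, not notation from the text.

```python
import math

INF = math.inf  # plays the role of the point at infinity in the extended reals

def xadd(a, b):
    """Extended addition: a + INF = INF + a = INF."""
    return a + b  # float arithmetic already handles a + inf correctly

def xmul(a, b):
    """Extended multiplication with the convention 0 * INF = INF * 0 = 0."""
    if a == 0 or b == 0:
        return 0.0  # imposed by hand, since Python's 0 * inf is nan
    return a * b

def xabsdiff(a, b):
    """|a - b|: the usual value for finite a, b and INF in all other cases,
    so in particular |INF - INF| = INF."""
    if a == INF or b == INF:
        return INF
    return abs(a - b)
```

Note that plain `0 * math.inf` evaluates to `nan` in Python, so the convention 0 · ∞ = 0 really does have to be enforced explicitly.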
Let {an}n≥1 be any sequence from R⁺∞ and for n ≥ 1 let bn = sup{am : m ≥ n} and cn = inf{am : m ≥ n}. Then the sequence {bn}n≥1 is decreasing and {cn}n≥1 is increasing. The limits limn bn and limn cn are denoted by lim supn an and lim infn an respectively. Then lim infn an ≤ lim supn an, with equality if and only if the sequence {an}n≥1 converges. Moreover, if this is the case then

lim infn→∞ an = limn→∞ an = lim supn→∞ an .
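The tail suprema bn and tail infima cn can be explored numerically. The sketch below is a rough illustration only: a true tail ranges over infinitely many indices, so the finite `horizon` is an assumption that only makes sense for sequences whose tail behaviour stabilises, as it does for the oscillating sequence an = 2 + (−1)^n used here.

```python
def tail_sup_inf(seq, n, horizon=10_000):
    """Approximate b_n = sup{a_m : m >= n} and c_n = inf{a_m : m >= n}
    by scanning the tail only up to a finite horizon."""
    tail = [seq(m) for m in range(n, horizon)]
    return max(tail), min(tail)

# a_n = 2 + (-1)^n oscillates between 1 and 3, so it does not converge:
a = lambda n: 2 + (-1) ** n
b1, c1 = tail_sup_inf(a, 1)
# here b_n = 3 and c_n = 1 for every n, so lim sup a_n = 3 > 1 = lim inf a_n
```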

A binary operation ⋆ : R⁺∞ × R⁺∞ → R⁺∞ is said to be monotone if a ⋆ b ≤ a′ ⋆ b′ whenever a ≤ a′ and b ≤ b′. If ⋆ is monotone and {an}n≥1 and {bn}n≥1 are increasing sequences from R⁺∞ then {an ⋆ bn}n≥1 is also an increasing sequence. A monotone operation ⋆ is defined to be continuous if

limn→∞ (an ⋆ bn) = (limn→∞ an) ⋆ (limn→∞ bn)

holds for all increasing sequences {an}n≥1 and {bn}n≥1 from R⁺∞.

The operations +, ·, ∨ and ∧ on R⁺∞ are all continuous (and so in particular monotone). Moreover, for all a, b ∈ R⁺ the operation (c, d) ↦ ac + bd is also continuous. Furthermore, these operations are all finite, where a binary operation ⋆ on R⁺∞ is said to be finite if a ⋆ b ∈ R⁺ for all a, b ∈ R⁺.

Intervals in R⁺∞ are defined as expected: For all a, b ∈ R⁺∞ with a < b put

[a, b] = {c ∈ R⁺∞ : a ≤ c ≤ b} ,   (a, b) = {c ∈ R⁺∞ : a < c < b} ,
(a, b] = {c ∈ R⁺∞ : a < c ≤ b} ,   [a, b) = {c ∈ R⁺∞ : a ≤ c < b} .

2 Measurable spaces and mappings

Before we define a measure in Chapter 3 we must first look at various classes of subsets (algebras, σ-algebras, monotone classes, d-systems) of a given set and the relationships between them. We also need to consider some elementary properties of measurable spaces and mappings.

Let X be a non-empty set and S be a non-empty subset of P(X) (where P(X) denotes the set of all subsets of X). Then S is called

— an algebra if X \ A ∈ S for each A ∈ S and S is closed under finite unions (i.e., A1 ∪ · · · ∪ An ∈ S whenever A1, . . . , An ∈ S),

— a σ-algebra if X \ A ∈ S for each A ∈ S and S is closed under countable unions (i.e., ⋃n≥1 An ∈ S for each sequence {An}n≥1 of elements from S),

— a monotone class if whenever {An}n≥1 is an increasing (resp. a decreasing) sequence of elements from S then ⋃n≥1 An ∈ S (resp. ⋂n≥1 An ∈ S),

— a d-system if X ∈ S and A2 \ A1 ∈ S for all A1, A2 ∈ S with A1 ⊂ A2.

It is clear that a σ-algebra is both an algebra and a monotone class, and that an algebra always contains the elements ∅ and X. Moreover, an algebra is also closed under finite intersections and a σ-algebra is closed under countable intersections.

Lemma 2.1 (1) A monotone class is a σ-algebra if and only if it is an algebra.

(2) Let {An}n≥1 be a sequence from an algebra A. Then there exists a disjoint sequence {Bn}n≥1 from A with Bn ⊂ An for each n ≥ 1 and ⋃n≥1 Bn = ⋃n≥1 An.

(3) An algebra A is a σ-algebra if and only if ⋃n≥1 An ∈ A for each increasing sequence {An}n≥1 from A.

(4) An algebra A is a σ-algebra if and only if ⋃n≥1 An ∈ A for each disjoint sequence {An}n≥1 from A.

Proof Straightforward.
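The disjointification in Lemma 2.1 (2) is constructive: take Bn = An \ (A1 ∪ · · · ∪ An−1). A small Python sketch of this construction for a finite family of sets (the function name is an ad hoc choice):

```python
def disjointify(sets):
    """Return B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1}): the B_n are pairwise
    disjoint, B_n ⊆ A_n, and the B_n have the same union as the A_n."""
    seen = set()   # A_1 ∪ ... ∪ A_{n-1} accumulated so far
    result = []
    for A in sets:
        result.append(set(A) - seen)
        seen |= set(A)
    return result

As = [{1, 2, 3}, {2, 3, 4}, {1, 5}]
Bs = disjointify(As)
# Bs == [{1, 2, 3}, {4}, {5}]
```

Each Bn is built from An and the earlier Ak using only complements and finite unions, which is why Bn stays inside the algebra A.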

Let S be a subset of P(X). Then, since P(X) is a σ-algebra containing S and an arbitrary intersection of σ-algebras is again a σ-algebra, it follows that the intersection of all the σ-algebras containing S is a σ-algebra containing S. This σ-algebra is called the σ-algebra generated by S and it will be denoted by σ(S). Of course, if F is any σ-algebra containing S then by construction F contains σ(S), which means that σ(S) is the smallest σ-algebra containing S. The algebra, monotone class and the d-system generated by S are defined in exactly the same way, and these subsets of P(X) will be denoted by a(S), m(S)

and d(S) respectively. Note that if S is empty then m(S) is also empty and σ(S) = a(S) = d(S) = {∅, X}.

It is instructive to see what σ(T) and a(T) are in the special case in which T is a non-empty finite subset of P(X), and to be explicit suppose that T consists of the n elements A1, . . . , An. Denote by p(T) the set of all non-empty subsets of X having the form A′1 ∩ · · · ∩ A′n, where for each j the set A′j is either Aj or X \ Aj. Then p(T) is finite (and in fact can contain at most 2^n elements). Moreover, denote by p̆(T) the set of all subsets of X which can be written as unions of elements from p(T). (An empty union is allowed here, so ∅ ∈ p̆(T).)

Lemma 2.2 (1) The elements of p(T ) form a partition of X, meaning that each point of X lies in exactly one element of p(T ).

(2) Let 1 ≤ j ≤ n; then for each B ∈ p(T) either B ⊂ Aj or B ∩ Aj = ∅. Hence (by (1)) Aj is the disjoint union of the elements of p(T) it contains.

(3) σ(T) = a(T) = p̆(T).

Proof (1) It is clear that different elements of p(T) are disjoint. Moreover, it is also clear that for each x ∈ X there is an element of p(T) containing x (since x is either in Aj or X \ Aj for each j), and this just says that the union of all the elements in p(T) is equal to X.

(2) This is clear.

(3) It follows immediately from (1) that p̆(T) is a σ-algebra and thus also an algebra, and by (2) T ⊂ p̆(T). But any algebra or σ-algebra containing T must contain p(T) and thus also p̆(T), and therefore σ(T) = a(T) = p̆(T).
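For a finite T the partition p(T) and the σ-algebra σ(T) = p̆(T) can be computed explicitly. A Python sketch (function names are ad hoc choices; sets are represented as frozensets so they can be collected into sets of sets):

```python
from itertools import chain, combinations, product

def atoms(X, T):
    """p(T): the non-empty sets A'_1 ∩ ... ∩ A'_n, where each A'_j is
    either A_j or its complement X minus A_j (Lemma 2.2 (1))."""
    X = frozenset(X)
    cells = set()
    for choice in product([False, True], repeat=len(T)):
        cell = X
        for keep, A in zip(choice, T):
            A = frozenset(A)
            cell &= A if keep else X - A
        if cell:                      # only non-empty intersections count
            cells.add(cell)
    return cells

def generated_sigma_algebra(X, T):
    """σ(T) = a(T) = all unions of atoms (Lemma 2.2 (3)); the empty
    union contributes ∅."""
    cells = list(atoms(X, T))
    unions = set()
    for r in range(len(cells) + 1):
        for combo in combinations(cells, r):
            unions.add(frozenset(chain.from_iterable(combo)))
    return unions

X = {1, 2, 3, 4}
T = [{1, 2}, {2, 3}]
P = atoms(X, T)                    # the four singletons {1}, {2}, {3}, {4}
F = generated_sigma_algebra(X, T)  # 2^4 = 16 sets
```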

Lemma 2.2 implies in particular that σ(T ) and a(T ) are finite when T is finite. The following very useful fact is called the monotone class theorem:

Proposition 2.1 If A is an algebra then σ(A) = m(A).

Proof This is equivalent to showing that σ(A) is a monotone class and that m(A) is a σ-algebra. But a σ-algebra is always a monotone class, and by Lemma 2.1 (1) a monotone class is a σ-algebra if and only if it is an algebra. It thus remains to show that m(A) is an algebra. Let m′(A) = {X \ A : A ∈ m(A)} and put M = m(A) ∩ m′(A). Then A ⊂ M, and it is easily checked that M is a monotone class, and therefore m(A) ⊂ M, i.e., M = m(A). But this just means that X \ A ∈ m(A) for each A ∈ m(A). Now for each A ∈ m(A) let M(A) = {B ∈ m(A) : A ∪ B ∈ m(A)}; then it is again easily checked that M(A) is a monotone class. Moreover, if in addition A ∈ A then A ⊂ M(A), and hence M(A) = m(A) for each A ∈ A. This implies that C ∪ D ∈ m(A) for all

C ∈ A, D ∈ m(A) (since C ∪ D ∈ M(C) = m(A)), and hence A ⊂ M(D) for each D ∈ m(A), i.e., M(D) = m(A) for each D ∈ m(A). In other words, m(A) is closed under finite unions.

A subset S ⊂ P(X) is said to be closed under finite intersections if A1 ∩ A2 ∈ S for all A1, A2 ∈ S.

Proposition 2.2 If S is closed under finite intersections then d(S) is an algebra.

Proof The set d(S) is non-empty, since X ∈ d(S), and also X \ A ∈ d(S) for all A ∈ d(S), since d(S) is a d-system. Put

C = {A ∈ d(S) : A ∩ B ∈ d(S) for all B ∈ S} ;

then S ⊂ C ⊂ d(S) and it is easily seen that C is a d-system. Hence C = d(S), i.e., A ∩ B ∈ d(S) for all A ∈ d(S), B ∈ S. Now put

D = {A ∈ d(S) : A ∩ B ∈ d(S) for all B ∈ d(S)} ;

then S ⊂ D ⊂ d(S) and so again D is a d-system. Therefore D = d(S), i.e., A ∩ B ∈ d(S) for all A, B ∈ d(S). This shows that d(S) is an algebra.

Let Y be a non-empty subset of X; for each non-empty subset S of P(X) denote by S|Y the subset of P(Y ) consisting of all sets having the form Y ∩ A for some A ∈ S. S|Y is referred to as the trace of S on Y . If E is a σ-algebra of subsets of X then E|Y is a σ-algebra of subsets of Y .

Proposition 2.3 σ(S|Y ) = σ(S)|Y .

Proof Denote by F the subset of P(X) consisting of all sets of the form F ∪ (G \ Y ) with F ∈ σ(S|Y ) and G ∈ σ(S), so that F|Y = σ(S|Y ). Then it is easily checked that F is a σ-algebra; moreover, F contains S, since if A ∈ S then A = (A ∩ Y ) ∪ (A \ Y ) and A ∩ Y ∈ S|Y ⊂ σ(S|Y ). Thus σ(S) ⊂ F, which implies that σ(S)|Y ⊂ F|Y = σ(S|Y ). On the other hand, σ(S)|Y is a σ-algebra containing S|Y , and hence also σ(S|Y ) ⊂ σ(S)|Y .

Let X and Y be non-empty sets and f : X → Y be a mapping; then for each B ⊂ Y put f⁻¹(B) = {x ∈ X : f(x) ∈ B} and for each S ⊂ P(Y ) put

f⁻¹(S) = {A ∈ P(X) : A = f⁻¹(B) for some B ∈ S} .

Lemma 2.3 (1) If E ⊂ P(X) is a σ-algebra then {B ∈ P(Y ) : f⁻¹(B) ∈ E} is a σ-algebra of subsets of Y .

(2) For each σ-algebra F of subsets of Y the set f⁻¹(F) is a σ-algebra of subsets of X.

Proof Both (1) and (2) follow from the fact that f⁻¹(∅) = ∅, f⁻¹(Y ) = X, f⁻¹(Y \ B) = X \ f⁻¹(B) for each B ⊂ Y and f⁻¹(⋃n≥1 Bn) = ⋃n≥1 f⁻¹(Bn) for each sequence {Bn}n≥1 of subsets of Y .
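The set identities in this proof are easy to check on a small example; the following sketch (with an ad hoc helper `preimage`) illustrates why taking preimages preserves the σ-algebra operations:

```python
def preimage(f, dom, B):
    """f^{-1}(B) = {x in dom : f(x) in B}."""
    return {x for x in dom if f(x) in B}

X = {0, 1, 2, 3}
Y = {'even', 'odd'}
f = lambda x: 'even' if x % 2 == 0 else 'odd'

A = preimage(f, X, {'even'})   # {0, 2}
# complements and unions pass through the preimage:
#   X minus f^{-1}(B) equals f^{-1}(Y minus B), and
#   f^{-1}(B1) ∪ f^{-1}(B2) equals f^{-1}(B1 ∪ B2)
```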

Proposition 2.4 f⁻¹(σ(S)) = σ(f⁻¹(S)) for each subset S of P(Y ).

Proof By Lemma 2.3 (2) f⁻¹(σ(S)) is a σ-algebra and f⁻¹(S) ⊂ f⁻¹(σ(S)), and this implies that σ(f⁻¹(S)) ⊂ f⁻¹(σ(S)). But by Lemma 2.3 (1) the set E = {B ∈ P(Y ) : f⁻¹(B) ∈ σ(f⁻¹(S))} is a σ-algebra, and it contains S; hence σ(S) ⊂ E and thus

f⁻¹(σ(S)) ⊂ f⁻¹(E) = {A ∈ P(X) : A = f⁻¹(B) for some B ∈ E} ⊂ σ(f⁻¹(S)) .

A pair (X, E) consisting of a non-empty set X and a σ-algebra E of subsets of X is called a measurable space. Let (X, E) and (Y, F) be measurable spaces. A mapping f : X → Y is said to be a measurable mapping from (X, E) to (Y, F) if f⁻¹(F) ⊂ E. This will also be expressed by saying that f : (X, E) → (Y, F) is measurable, or just by saying that f : X → Y is measurable if the σ-algebras E and F can be determined from the context.

Lemma 2.4 Let f : X → Y be a mapping and S be a subset of F with F = σ(S). Then f is measurable if and only if f⁻¹(S) ⊂ E.

Proof This follows immediately from Proposition 2.4.

If f : (X, E) → (Y, F) and g : (Y, F) → (Z, G) are measurable mappings then the composition g ◦ f : X → Z is a measurable mapping from (X, E) to (Z, G).

Let f : (X, E) → (Y, F) be a measurable mapping, and thus f⁻¹(F) ⊂ E. Now there are two further, stronger properties which we will sometimes require of such a mapping. The first is that f⁻¹(F) = E should hold and the second is that both f⁻¹(F) = E and f(X) ∈ F hold. The first property is important when dealing with countably generated measurable spaces (Chapter 16) and the second when dealing with what we call substandard Borel spaces (Chapter 18); these are some kind of poor man's version of standard Borel spaces. Therefore, when looking at constructions which preserve measurability (for example, taking products and disjoint unions) we will also check whether these two additional properties are preserved.

If X is a non-empty topological space then OX will denote the set of open subsets of X. The σ-algebra σ(OX) is called the Borel σ-algebra of X, or the σ-algebra of Borel subsets of X, and it will be denoted by BX.

Lemma 2.5 If X and Y are non-empty topological spaces and f : X → Y is a continuous mapping then f⁻¹(BY) ⊂ BX, i.e., f is a measurable mapping from (X, BX) to (Y, BY). Moreover, if f is a homeomorphism then f⁻¹(BY) = BX.

Proof If f is continuous then f⁻¹(OY) ⊂ OX and thus by Proposition 2.4

f⁻¹(BY) = f⁻¹(σ(OY)) = σ(f⁻¹(OY)) ⊂ σ(OX) = BX .

Moreover, if f is a homeomorphism then f⁻¹(OY) = OX and so Proposition 2.4 here implies that f⁻¹(BY) = BX.

The set R⁺∞ will be considered as a topological space with respect to the order topology: In this topology a subset U of R⁺∞ is defined to be open if for each x ∈ U there exists an open interval J with x ∈ J ⊂ U, where an open interval is a set having one of the forms {x ∈ R⁺∞ : a < x < b} with a < b, {x ∈ R⁺∞ : x < b} with b ≠ 0 and {x ∈ R⁺∞ : x > a} with a ≠ ∞. The mapping t ↦ t/(1 − t) defines a homeomorphism from [0, 1] onto R⁺∞ and so in particular R⁺∞ is compact. Moreover, the relative topology of R⁺ as a subset of R⁺∞ is the same as the relative topology of R⁺ as a subset of R. The Borel σ-algebra of R⁺∞ will be denoted by B⁺∞.

The most basic measurable space in the whole of what follows is (R⁺∞, B⁺∞). If (X, E) is a measurable space then the set of all measurable mappings from (X, E) to (R⁺∞, B⁺∞) will be denoted by M(E); thus a mapping f : X → R⁺∞ is in M(E) if and only if f⁻¹(B⁺∞) ⊂ E. (We omit the X from this notation because it can always be inferred from the σ-algebra E.)

It is usually rather easy to show that a mapping f : X → R⁺∞ is in M(E). This is because by Lemma 2.4 it is enough to show that f⁻¹(A) ∈ E for each A ∈ S, where S is any subset of B⁺∞ with σ(S) = B⁺∞, and there are many simple choices here for S. For example, consider the following subsets of P(R⁺∞):

S1 = {[0, a] : a ∈ R⁺ \ {0}} ,   S2 = {[0, a) : a ∈ R⁺ \ {0}} ,
S3 = {[a, ∞] : a ∈ R⁺ \ {0}} ,   S4 = {(a, ∞] : a ∈ R⁺ \ {0}} .

Lemma 2.6 σ(S1) = σ(S2) = σ(S3) = σ(S4) = B⁺∞.

Proof Let O be the set of open subsets of R⁺∞ and I the set of open intervals (occurring in the definition of the order topology). Then each element of O can be written as a countable union of elements from I, thus O ⊂ σ(I) and hence B⁺∞ = σ(O) ⊂ σ(I). On the other hand, I ⊂ O and so σ(I) ⊂ σ(O) = B⁺∞. This shows that B⁺∞ = σ(I). Let a ∈ R⁺ \ {0}; then

[0, a] = ⋂n≥1 [0, a + n⁻¹) ∈ σ(S2) ,   [0, a) = R⁺∞ \ [a, ∞] ∈ σ(S3) ,

[a, ∞] = ⋂n≥1 (a(1 − n⁻¹), ∞] ∈ σ(S4) ,   (a, ∞] = R⁺∞ \ [0, a] ∈ σ(S1) ,

thus S1 ⊂ σ(S2), S2 ⊂ σ(S3), S3 ⊂ σ(S4) and S4 ⊂ σ(S1), which implies that σ(S1) = σ(S2) = σ(S3) = σ(S4). Denote this common σ-algebra by F. If a ∈ R⁺ \ {0} then [0, a) ∈ I, hence S2 ⊂ I and so F = σ(S2) ⊂ σ(I) = B⁺∞. Now let a, b ∈ R⁺ \ {0} with a < b; then {x ∈ R⁺∞ : a < x < b} = [0, b) ∩ (a, ∞] ∈ F and the same holds for the other possibilities in I. Therefore I ⊂ F, which implies B⁺∞ = σ(I) ⊂ F. This shows that F = B⁺∞.

Putting together Lemmas 2.4 and 2.6 results in the following criteria:

Lemma 2.7 A mapping f : X → R⁺∞ is in M(E) as soon as one of the following conditions is satisfied:

(1) {x ∈ X : f(x) ≤ a} ∈ E for all a ∈ R⁺ \ {0}.
(2) {x ∈ X : f(x) < a} ∈ E for all a ∈ R⁺ \ {0}.
(3) {x ∈ X : f(x) ≥ a} ∈ E for all a ∈ R⁺ \ {0}.
(4) {x ∈ X : f(x) > a} ∈ E for all a ∈ R⁺ \ {0}.

Proof For n = 1, 2, 3, 4 condition (n) says that f⁻¹(Sn) ⊂ E, and therefore by Lemma 2.4 f⁻¹(B⁺∞) ⊂ E, since by Lemma 2.6 B⁺∞ = σ(Sn). Hence f ∈ M(E).

Of course, the choice of Sn, n = 1, 2, 3, 4 was somewhat arbitrary, and many other similar choices could have been made.

Next let S be a non-empty set and for each s ∈ S let (Xs, Es) be a measurable space; put X = ∏s∈S Xs. We will now define an appropriate product σ-algebra on the cartesian product X. Let us call a subset of X a measurable rectangle if it has the form ∏s∈S Es with Es ∈ Es for each s and Es ≠ Xs for only finitely many s ∈ S, and denote the set of such measurable rectangles by R; note that R is closed under finite intersections.

Lemma 2.8 Denote by A (resp. A′) the set of all finite unions (resp. all finite disjoint unions) of elements of R. Then A is an algebra and A′ = A. Moreover, σ(A) = σ(R).

Proof It is immediate that σ(A) = σ(R), since R ⊂ A and A ⊂ σ(R). Now A′ ⊂ A and A is closed under finite unions, and hence it will follow that A is an algebra with A′ = A once we show that if A ∈ A then A and X \ A are both elements of A′. Thus consider A ∈ A, so A has the form R1 ∪ · · · ∪ Rn with Rj = ∏s∈S Esj, where Esj ∈ Es for each s ∈ S, j = 1, . . . , n. Moreover, there exists a finite subset S′ of S such that Esj = Xs for all s ∉ S′, j = 1, . . . , n. For each s ∈ S let Ts be the subset of P(Xs) consisting of the elements Es1, . . . , Esn, and let U be the subset of P(X) consisting of all elements of the form ∏s∈S Es with Es ∈ p(Ts) for each s (where p(Ts) is as in Lemma 2.2, and note that p(Ts) = {Xs} for each s ∉ S′). Then U ⊂ R (since p(Ts) ⊂ Es for each s) and by Lemma 2.2 (1) the elements of U form a partition of X. Furthermore, by Lemma 2.2 (2) it follows that if U ∈ U then either U ⊂ A or U ∩ A = ∅, and this implies that A is the (disjoint) union of the elements of U it contains, and the same holds true of X \ A. In particular, A and X \ A are both elements of A′.

The product of the σ-algebras Es, s ∈ S, is defined to be the σ-algebra σ(R), and this σ-algebra will be denoted by ∏s∈S Es. Moreover, the product of the measurable spaces (Xs, Es), s ∈ S, is defined to be the measurable space (X, E), where E = ∏s∈S Es. For each s ∈ S there is the projection mapping ps : X → Xs given by ps({xs}s∈S) = xs for each {xs}s∈S ∈ X, and ps : (X, E) → (Xs, Es) is clearly measurable. Now consider a further measurable space (Y, F) and for each s ∈ S let fs : Y → Xs be a mapping. Then there is a mapping f : Y → X defined by letting f(y) = {fs(y)}s∈S for each y ∈ Y , and of course fs = ps ◦ f for each s ∈ S. Moreover, all mappings from Y to X can be obtained in this way.

Proposition 2.5 Let f : Y → X be a mapping. Then f : (Y, F) → (X, E) is measurable if and only if ps ◦ f : (Y, F) → (Xs, Es) is measurable for each s ∈ S.

Proof The condition is clearly necessary, since ps is measurable for each s. Thus suppose the mapping ps ◦ f : Y → Xs is measurable for each s ∈ S. Then, since

f⁻¹(∏s∈S Es) = f⁻¹(⋂s∈S ps⁻¹(Es)) = ⋂s∈S f⁻¹(ps⁻¹(Es)) = ⋂s∈S (ps ◦ f)⁻¹(Es)

(and fs⁻¹(Xs) = Y ), it follows that f⁻¹(R) ∈ F for each measurable rectangle R, and therefore by Lemma 2.4 f is measurable.

Suppose now that we have two product spaces: For each s ∈ S let (Xs, Es) and (Ys, Fs) be measurable spaces and let X = ∏s∈S Xs, E = ∏s∈S Es, Y = ∏s∈S Ys and F = ∏s∈S Fs. Also for each s ∈ S let fs : Xs → Ys be a mapping; then there is a mapping f : X → Y given by f({xs}s∈S) = {fs(xs)}s∈S for all {xs}s∈S ∈ X.

Proposition 2.6 (1) If fs : (Xs, Es) → (Ys, Fs) is measurable for each s ∈ S then f : (X, E) → (Y, F) is measurable.

(2) If fs⁻¹(Fs) = Es for each s ∈ S then f⁻¹(F) = E.

(3) If fs⁻¹(Fs) = Es and fs(Xs) ∈ Fs for each s ∈ S and the set S is countable then f(X) ∈ F.

Proof (1) Let RX (resp. RY) be the set of measurable rectangles in X (resp. in Y). If R = ∏s∈S Fs ∈ RY then f⁻¹(R) = ∏s∈S fs⁻¹(Fs) ∈ RX and thus f⁻¹(RY) ⊂ RX. Therefore by Lemma 2.4 f is measurable.

(2) Let R = ∏s∈S Es ∈ RX; then, since fs⁻¹(Fs) = Es, there exists Fs ∈ Fs with fs⁻¹(Fs) = Es for each s and, since fs⁻¹(Ys) = Xs, we can choose Fs = Ys whenever Es = Xs. Thus R′ = ∏s∈S Fs ∈ RY and f⁻¹(R′) = ∏s∈S fs⁻¹(Fs) = ∏s∈S Es = R, and this, together with (1), shows that f⁻¹(RY) = RX. Hence by Proposition 2.4

f⁻¹(F) = f⁻¹(σ(RY)) = σ(f⁻¹(RY)) = σ(RX) = E .

(3) Clearly f(X) = ∏s∈S fs(Xs). Thus if S is finite then f(X) is a measurable rectangle, hence assume S is countably infinite. Let {sn}n≥1 be an enumeration of the elements in S and for each n ≥ 1 put Rn = ∏k≤n fsk(Xsk) × ∏k>n Ysk. Then Rn is a measurable rectangle for each n ≥ 1 and f(X) = ⋂n≥1 Rn. Therefore f(X) ∈ F.

We now discuss what are called sections, but only do this for the product of two spaces. Let X, Y, Z be sets and let f : X × Y → Z be a mapping. For each x ∈ X let fx : Y → Z be the mapping defined by fx(y) = f(x, y) for all y ∈ Y and for each y ∈ Y let f^y : X → Z be the mapping defined by f^y(x) = f(x, y) for all x ∈ X. These mappings are known as sections. Let (X, E), (Y, F) and (Z, G) be measurable spaces.

Proposition 2.7 Let f : (X × Y, E × F) → (Z, G) be a measurable mapping. Then the mapping fx : (Y, F) → (Z, G) is measurable for each x ∈ X and f^y : (X, E) → (Z, G) is measurable for each y ∈ Y .

Proof If B ⊂ X × Y then for each x ∈ X put Bx = {y ∈ Y : (x, y) ∈ B} and for each y ∈ Y put B^y = {x ∈ X : (x, y) ∈ B}; thus (I_B)x = I_{Bx} and (I_B)^y = I_{B^y}. Then (fx)⁻¹(G) = (f⁻¹(G))x and (f^y)⁻¹(G) = (f⁻¹(G))^y for all G ∈ G, x ∈ X and y ∈ Y . It is thus enough to show that if x ∈ X and y ∈ Y then Bx ∈ F and B^y ∈ E for all B ∈ E × F. Let Sx : Y → X × Y be the mapping given by Sx(y) = (x, y) for each y ∈ Y ; then

Sx⁻¹(E × F) = F if x ∈ E and Sx⁻¹(E × F) = ∅ otherwise,

which implies that Sx⁻¹(R) ∈ F for each measurable rectangle R, and therefore by Lemma 2.4 Sx is measurable. But Bx = Sx⁻¹(B), and hence Bx ∈ F for each B ∈ E × F. In the same way B^y ∈ E for all B ∈ E × F.

Besides the product of measurable spaces there is the dual concept of a disjoint union. Let S be a non-empty set and for each s ∈ S let (Xs, Es) be a measurable space; assume that the sets Xs, s ∈ S, are disjoint and put X = ⋃s∈S Xs. Let

E = {E ⊂ X : E ∩ Xs ∈ Es for all s ∈ S} ;

then E is clearly a σ-algebra and the measurable space (X, E) is called the disjoint union of the measurable spaces (Xs, Es), s ∈ S.

For each s ∈ S the inclusion mapping is : Xs → X (with is(x) = x for each x ∈ Xs) is measurable and in fact is⁻¹(E) = Es. The result corresponding to Proposition 2.5 holds (but is rather trivial): Let (Y, F) be a measurable space and let f : X → Y be a mapping. Then f : (X, E) → (Y, F) is measurable if and only if f ◦ is : (Xs, Es) → (Y, F) is measurable for each s ∈ S.

The result corresponding to Proposition 2.6 also holds. Suppose we have two disjoint unions: For each s ∈ S let (Xs, Es) and (Ys, Fs) be measurable spaces with the families of sets {Xs}s∈S and {Ys}s∈S both disjoint and let (X, E) (resp. (Y, F)) be the disjoint union of the measurable spaces (Xs, Es), s ∈ S (resp. (Ys, Fs), s ∈ S). Also for each s ∈ S let fs : Xs → Ys be a mapping; then there is a mapping f : X → Y given by f(x) = fs(x) for all x ∈ Xs, s ∈ S.

Proposition 2.8 (1) If fs : (Xs, Es) → (Ys, Fs) is measurable for each s ∈ S then f : (X, E) → (Y, F) is measurable.

(2) If fs⁻¹(Fs) = Es for each s ∈ S then f⁻¹(F) = E.

(3) If fs⁻¹(Fs) = Es and fs(Xs) ∈ Fs for each s ∈ S and the set S is countable then f(X) ∈ F.

Proof This is very straightforward.

Suppose now S is countable. Let (X, E) be the disjoint union of the measurable spaces (Xs, Es), s ∈ S, and let (Y, F) be a measurable space. Also for each s ∈ S let fs : Xs → Y be a mapping; then there is a mapping f : X → S × Y given by f(x) = (s, fs(x)) for all x ∈ Xs, s ∈ S.

Proposition 2.9 (1) If fs : (Xs, Es) → (Y, F) is measurable for each s ∈ S then f : (X, E) → (S × Y, P(S) × F) is measurable.

(2) If fs⁻¹(F) = Es for each s ∈ S then f⁻¹(P(S) × F) = E.

(3) If fs⁻¹(F) = Es and fs(Xs) ∈ F for each s ∈ S then f(X) ∈ P(S) × F.

Proof For each s ∈ S let Ys = {s} × Y and Fs = {{s} × F : F ∈ F}; thus Fs is a σ-algebra of subsets of Ys. Then S × Y is the disjoint union of the sets Ys, s ∈ S.

Let D be the σ-algebra {A ⊂ S × Y : A ∩ Ys ∈ Fs for all s ∈ S} of this disjoint union; hence A ⊂ S × Y is an element of D if and only if As ∈ F for each s ∈ S, where As is the section {y ∈ Y : (s, y) ∈ A}. But D = P(S) × F: If A ∈ D then {s} × As is a measurable rectangle in S × Y for each s ∈ S and hence A = ⋃s∈S ({s} × As) ∈ P(S) × F. Conversely, if A ∈ P(S) × F then by Proposition 2.7 As ∈ F for each s ∈ S, and so A ∈ D. This shows that (S × Y, P(S) × F) is the disjoint union of the measurable spaces (Ys, Fs), s ∈ S. The result thus follows from Proposition 2.8, since the mapping f : X → S × Y here corresponds to the mapping f in Proposition 2.8.

These results about disjoint unions are rather superficial and the only reason we have presented them is because they do play a role when studying point processes in [16].

Let us end the chapter by looking at the product of topological spaces; here there are two σ-algebras: The first is the product of the Borel σ-algebras, and the second is the Borel σ-algebra of the product topology, and we are interested in finding conditions which ensure that they are equal.

If X is a topological space then a subset O′X of OX is a base for the topology if for each U ∈ OX and each x ∈ U there exists V ∈ O′X with x ∈ V ⊂ U. If O′X can be chosen to be countable then X is said to have a countable base for its topology.

For each s ∈ S let Xs be a non-empty topological space and put X = ∏s∈S Xs. A subset of X is called an open rectangle if it has the form ∏s∈S Us with Us an open subset of Xs for each s ∈ S and Us ≠ Xs for only finitely many s ∈ S; the set of such open rectangles will be denoted by R^O. The product topology on X is defined by stipulating that R^O should be a base for the topology; a subset U ⊂ X is thus open if for each x ∈ U there exists R ∈ R^O with x ∈ R ⊂ U.

We now have the product σ-algebra F = ∏s∈S BXs, with BXs the Borel σ-algebra of Xs for each s ∈ S, and also the Borel σ-algebra BX of X.

Proposition 2.10 If S is countable and each Xs has a countable base for its topology then F = BX .

Proof We first show that F ⊂ BX always holds (without any assumptions on S and the Xs's). For s ∈ S let Rs denote the set of measurable rectangles having the form ∏t∈S Bt with Bs ∈ BXs and Bt = Xt for t ≠ s, and let R^O_s denote the set of open rectangles having the form ∏t∈S Ut with Us ∈ OXs and Ut = Xt for t ≠ s. Moreover, let ps : X → Xs be the projection mapping with ps({xt}t∈S) = xs for each {xt}t∈S ∈ X. Then Rs = ps⁻¹(BXs) and R^O_s = ps⁻¹(OXs) and therefore by Proposition 2.4

Rs = ps⁻¹(BXs) = ps⁻¹(σ(OXs)) = σ(ps⁻¹(OXs)) = σ(R^O_s) ⊂ σ(OX) = BX .

But if R is a measurable rectangle in X then there exist s1, . . . , sn ∈ S and Rk ∈ Rsk, 1 ≤ k ≤ n, such that R = R1 ∩ · · · ∩ Rn. This shows that each measurable rectangle is in BX and thus F ⊂ BX.

We now need the assumptions. For each s ∈ S let Vs be a countable base for the topology on Xs and let V be the set of all open rectangles of the form ∏s∈S Vs, where Vs ∈ Vs for all s ∈ A for some finite set A ⊂ S and Vs = Xs for all s ∈ S \ A. Then V is countable and it is easy to see that V is a base for the topology on X. In particular this means that each open subset of X can be written as a countable union of elements from V, thus OX ⊂ σ(V) and hence BX = σ(OX) ⊂ σ(V). But each element of V is a measurable rectangle and therefore BX ⊂ F. This, together with the first part, implies that F = BX.

3 Measures

Measures are introduced in this chapter and some of their elementary properties studied. These measures are usually defined on σ-algebras, but it is also necessary to look at measures defined just on algebras.

In what follows let X be a non-empty set and let S be a non-empty subset of P(X). A mapping µ : S → R_∞^+ is said to be

— sub-additive if µ(A ∪ B) ≤ µ(A) + µ(B) for all A, B ∈ S with A ∪ B ∈ S.

— additive if µ(A ∪ B) = µ(A) + µ(B) for all A, B ∈ S with A ∪ B ∈ S and A ∩ B = ∅.

— countably sub-additive if

  µ(⋃_{n≥1} A_n) ≤ ∑_{n≥1} µ(A_n)

for each sequence {A_n}_{n≥1} from S with ⋃_{n≥1} A_n ∈ S.

— countably additive if

  µ(⋃_{n≥1} A_n) = ∑_{n≥1} µ(A_n)

for each disjoint sequence {A_n}_{n≥1} from S with ⋃_{n≥1} A_n ∈ S.

— monotone if µ(A) ≤ µ(B) for all A, B ∈ S with A ⊂ B.

— continuous if it is monotone and

  µ(⋃_{n≥1} A_n) = lim_{n→∞} µ(A_n)

for each increasing sequence {A_n}_{n≥1} from S with ⋃_{n≥1} A_n ∈ S.

— ∅-continuous if it is monotone and lim_n µ(A_n) = 0 whenever {A_n}_{n≥1} is a decreasing sequence from S with ⋂_{n≥1} A_n = ∅.

It is clear that if µ : S → R_∞^+ has one of these properties then the restriction of µ to a subset T of S has the same property. It is also clear that if ∅ ∈ S then any countably sub-additive mapping µ : S → R_∞^+ with µ(∅) = 0 is sub-additive and every countably additive mapping µ : S → R_∞^+ with µ(∅) = 0 is additive.

If S is a subset of P(X) with ∅, X ∈ S then an additive mapping µ : S → R_∞^+ with µ(∅) = 0 is said to be a finitely additive measure on S. A countably additive mapping µ : S → R_∞^+ with µ(∅) = 0 is called a measure on S. A measure is always a finitely additive measure (just take A_n = ∅ for each n ≥ 3).

In the following let A ⊂ P(X) be an algebra.
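As a concrete sanity check (my own illustration, not from the notes), the counting measure µ(A) = |A| on the power set of a small finite set satisfies all of the finite properties above; a minimal Python sketch:

```python
from itertools import chain, combinations

X = {0, 1, 2, 3}

def subsets(s):
    # All subsets of s, i.e. the power set P(s).
    s = list(s)
    return [frozenset(c) for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def mu(A):
    # Counting measure: mu(A) = |A|.
    return len(A)

P = subsets(X)
for A in P:
    for B in P:
        # sub-additive: mu(A u B) <= mu(A) + mu(B)
        assert mu(A | B) <= mu(A) + mu(B)
        # additive on disjoint sets: mu(A u B) = mu(A) + mu(B)
        if not (A & B):
            assert mu(A | B) == mu(A) + mu(B)
        # monotone: A c B implies mu(A) <= mu(B)
        if A <= B:
            assert mu(A) <= mu(B)
print("counting measure is sub-additive, additive and monotone on P(X)")
```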


Lemma 3.1 A finitely additive measure µ on A is monotone.

Proof Let A, B ∈ A with A ⊂ B. Then B \ A ∈ A and thus

  µ(A) ≤ µ(A) + µ(B \ A) = µ(A ∪ (B \ A)) = µ(B),

since µ is additive. Thus µ is monotone.

Proposition 3.1 A finitely additive measure µ on A is a measure if and only if it is continuous.

Proof Suppose first that µ is a measure and let {A_n}_{n≥1} be an increasing sequence of elements from A with A = ⋃_{n≥1} A_n ∈ A; then A is the disjoint union of the sets A_m \ A_{m−1}, m ≥ 1 (with A_0 = ∅), and these sets are elements of A. Hence

  µ(A) = ∑_{m≥1} µ(A_m \ A_{m−1}) = lim_{n→∞} ∑_{m=1}^n µ(A_m \ A_{m−1})
       = lim_{n→∞} µ(⋃_{m=1}^n (A_m \ A_{m−1})) = lim_{n→∞} µ(A_n).

Conversely, suppose that µ is continuous and let {A_n}_{n≥1} be a disjoint sequence of elements from A with A = ⋃_{n≥1} A_n ∈ A. For each n ≥ 1 let A′_n = ⋃_{m=1}^n A_m; then {A′_n}_{n≥1} is an increasing sequence of elements from A with ⋃_{n≥1} A′_n = A, and therefore

  µ(A) = µ(⋃_{n≥1} A′_n) = lim_{n→∞} µ(A′_n) = lim_{n→∞} µ(⋃_{m=1}^n A_m)
       = lim_{n→∞} ∑_{m=1}^n µ(A_m) = ∑_{n≥1} µ(A_n);

i.e., µ is a measure.

Lemma 3.2 Let µ be a measure on A and {A_n}_{n≥1} be a decreasing sequence from A with A = ⋂_{n≥1} A_n ∈ A and µ(A_1) < ∞. Then lim_n µ(A_n) = µ(A).

Proof For n ≥ 1 let B_n = A_1 \ A_n; then {B_n}_{n≥1} is an increasing sequence from A with ⋃_{n≥1} B_n = A_1 \ A. Thus by Proposition 3.1 lim_n µ(B_n) = µ(A_1 \ A). But µ(A_1 \ A) = µ(A_1) − µ(A) and µ(B_n) = µ(A_1) − µ(A_n) for each n ≥ 1, since µ(A_1) < ∞, and therefore lim_n µ(A_n) = µ(A).

A measure or a finitely additive measure µ on A is finite if µ(X) < ∞. If µ(X) = 1 then a measure µ on A is called a probability measure.

Proposition 3.2 A finite finitely additive measure µ on A is a measure if and only if it is ∅-continuous.

Proof Lemma 3.2 implies that a finite measure is ∅-continuous. Thus let µ be a finite finitely additive measure on A which is ∅-continuous. Let {A_n}_{n≥1} be an increasing sequence from A with A = ⋃_{n≥1} A_n ∈ A. For each n ≥ 1 put B_n = A \ A_n; then {B_n}_{n≥1} is a decreasing sequence from A with ⋂_{n≥1} B_n = ∅ and therefore lim_n µ(B_n) = 0. But µ(A) = µ(B_n) + µ(A_n) for each n ≥ 1, since µ is additive, and hence µ(A) = lim_n(µ(B_n) + µ(A_n)) = lim_n µ(A_n). This shows that µ is continuous and hence by Proposition 3.1 µ is a measure on A.

Lemma 3.3 A measure µ on A is countably sub-additive. In particular, if {A_n}_{n≥1} is a sequence from A with µ(A_n) = 0 for each n ≥ 1 and A = ⋃_{n≥1} A_n an element of A then µ(A) = 0.

Proof Let {A_n}_{n≥1} be a sequence from A with A = ⋃_{n≥1} A_n ∈ A. Then by Lemma 2.1 (2) there exists a disjoint sequence {B_n}_{n≥1} from A with B_n ⊂ A_n for each n ≥ 1 and ⋃_{n≥1} B_n = A. Then µ(B_n) ≤ µ(A_n) for each n ≥ 1, since µ is monotone, and µ(A) = ∑_{n≥1} µ(B_n), since µ is countably additive. Hence µ(A) ≤ ∑_{n≥1} µ(A_n), which implies that µ is countably sub-additive.

If µ is a probability measure on A then Lemma 3.3 is usually applied to the complements of the sets in the lemma, i.e., in the form: If {B_n}_{n≥1} is a sequence from A with µ(B_n) = 1 for each n ≥ 1 and B = ⋂_{n≥1} B_n ∈ A then µ(B) = 1.

Now let (X, E) be a measurable space (and so E is a σ-algebra). Then E is in particular an algebra and so the above results are still valid when A is replaced by E. (Note that in this case the requirements involving ⋂_{n≥1} A_n and ⋃_{n≥1} A_n in Lemmas 3.2 and 3.3 are automatically satisfied.)

If µ_1 and µ_2 are measures on E and a_1, a_2 ∈ R+ then the linear combination a_1µ_1 + a_2µ_2 is also a measure, where (a_1µ_1 + a_2µ_2)(E) = a_1µ_1(E) + a_2µ_2(E) for all E ∈ E. Proposition 3.3 below is a useful criterion for determining that two finite measures are equal.
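The disjointification via Lemma 2.1 (2) used in the proof of Lemma 3.3 can be carried out explicitly by putting B_1 = A_1 and B_n = A_n \ (A_1 ∪ · · · ∪ A_{n−1}). A small Python sketch with counting measure (my own illustration, not from the notes):

```python
# Disjointify a sequence A_1, A_2, ... : B_n = A_n \ (A_1 u ... u A_{n-1}).
A = [frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({0, 4, 5})]

B, seen = [], frozenset()
for An in A:
    B.append(An - seen)
    seen |= An

union_A = frozenset().union(*A)
# B_n c A_n, the B_n are pairwise disjoint and have the same union as the A_n.
assert all(Bn <= An for Bn, An in zip(B, A))
assert all(not (B[i] & B[j]) for i in range(len(B)) for j in range(i + 1, len(B)))
assert frozenset().union(*B) == union_A

mu = len  # counting measure
# mu(U A_n) = sum mu(B_n) <= sum mu(A_n): countable sub-additivity in miniature.
assert mu(union_A) == sum(mu(Bn) for Bn in B) <= sum(mu(An) for An in A)
print("mu(U A_n) =", mu(union_A), "<=", sum(mu(An) for An in A))
```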

Lemma 3.4 Let µ_1, µ_2 be measures on E and let S ∈ E be such that the numbers µ_1(S) and µ_2(S) are finite and equal. Then

  D = {A ∈ E : µ_1(A ∩ S) = µ_2(A ∩ S)}

is both a d-system and a monotone class.

Proof If A_1, A_2 ∈ D with A_1 ⊂ A_2 then

  µ_2((A_2 \ A_1) ∩ S) = µ_2((A_2 ∩ S) \ (A_1 ∩ S))
    = µ_2(A_2 ∩ S) − µ_2(A_1 ∩ S) = µ_1(A_2 ∩ S) − µ_1(A_1 ∩ S)
    = µ_1((A_2 ∩ S) \ (A_1 ∩ S)) = µ_1((A_2 \ A_1) ∩ S)

and so A_2 \ A_1 ∈ D. Thus D is a d-system, since clearly X ∈ D.

If {A_n}_{n≥1} is an increasing sequence from D and A = ⋃_{n≥1} A_n then {A_n ∩ S}_{n≥1} is an increasing sequence from E with A ∩ S = ⋃_{n≥1}(A_n ∩ S) and hence by Proposition 3.1

  µ_1(A ∩ S) = lim_{n→∞} µ_1(A_n ∩ S) = lim_{n→∞} µ_2(A_n ∩ S) = µ_2(A ∩ S)

and so A ∈ D. On the other hand, if {A_n}_{n≥1} is a decreasing sequence from D and A = ⋂_{n≥1} A_n then in the same way Lemma 3.2 shows that A ∈ D. This implies that D is also a monotone class.

Lemma 3.5 Let S be a subset of E closed under finite intersections and such that there exists an increasing sequence {S_n}_{n≥1} from S with ⋃_{n≥1} S_n = X. Let µ_1, µ_2 be measures on E and suppose that the numbers µ_1(S) and µ_2(S) are equal and finite for all S ∈ S. Then µ_1(E) = µ_2(E) for all E ∈ σ(S). In particular, µ_1 = µ_2 if σ(S) = E.

Proof Fix S ∈ S and let D = {A ∈ E : µ_1(A ∩ S) = µ_2(A ∩ S)}; by Lemma 3.4 D is a d-system and thus d(S) ⊂ D, since S ⊂ D. Hence m(d(S)) ⊂ D, since by Lemma 3.4 D is also a monotone class. But by Proposition 2.2 d(S) is an algebra and so by Proposition 2.1 m(d(S)) = σ(d(S)). Hence σ(S) ⊂ σ(d(S)) ⊂ D, which implies that µ_1(E ∩ S) = µ_2(E ∩ S) for all E ∈ σ(S). Since S ∈ S was arbitrary, this shows that µ_1(E ∩ S) = µ_2(E ∩ S) for all E ∈ σ(S) and all S ∈ S.

Now let {S_n}_{n≥1} be an increasing sequence from S with ⋃_{n≥1} S_n = X, and let E ∈ σ(S). Then {E ∩ S_n}_{n≥1} is an increasing sequence from E with ⋃_{n≥1}(E ∩ S_n) = E and thus by Proposition 3.1

  µ_1(E) = lim_{n→∞} µ_1(E ∩ S_n) = lim_{n→∞} µ_2(E ∩ S_n) = µ_2(E).

Therefore µ_1(E) = µ_2(E) for all E ∈ σ(S).

Proposition 3.3 Let S ⊂ E be closed under finite intersections with X ∈ S and σ(S) = E. Let µ1, µ2 be finite measures on E with µ1(A) = µ2(A) for all A ∈ S. Then µ1 = µ2.

Proof This is a special case of Lemma 3.5.

Proposition 3.3 will often be applied when A is an algebra with σ(A) = E. Then A is closed under finite intersections and X ∈ A; thus if µ_1, µ_2 are finite measures on E with µ_1(A) = µ_2(A) for all A ∈ A then µ_1 = µ_2. Proposition 3.4 below shows that this fact also holds for σ-finite measures: A measure µ on A is said to be σ-finite if there exists a sequence {A_n}_{n≥1} from A with µ(A_n) < ∞ for each n ≥ 1 and X = ⋃_{n≥1} A_n. When dealing with σ-finite measures it is often convenient to choose the sequence {A_n}_{n≥1} here either to be increasing or to be disjoint, and clearly both of these alternatives are possible (in the latter case making use of Lemma 2.1 (2)).
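To illustrate the two normalised choices of the sequence (my own illustration): for length on R the sets C_n = (−n, n] are an increasing cover of finite measure, and D_1 = C_1, D_n = C_n \ C_{n−1} give the corresponding disjoint cover. A sketch:

```python
# sigma-finite cover of R by sets of finite length.
# Increasing version: C_n = (-n, n]; disjoint version: D_1 = C_1, D_n = C_n \ C_{n-1}.
def length(intervals):
    # Total length of a list of disjoint intervals (a, b].
    return sum(b - a for a, b in intervals)

N = 10
C = [[(-n, n)] for n in range(1, N + 1)]
D = [[(-1, 1)]] + [[(-n, -(n - 1)), (n - 1, n)] for n in range(2, N + 1)]

# Each piece has finite measure ...
assert all(length(c) == 2 * (i + 1) for i, c in enumerate(C))
assert all(length(d) == 2 for d in D)
# ... and the disjoint pieces D_1, ..., D_n reassemble to C_n.
assert all(sum(length(d) for d in D[:n]) == length(C[n - 1]) for n in range(1, N + 1))
print("C_n: increasing cover; D_n: disjoint cover with the same total measure")
```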

Proposition 3.4 Let A ⊂ E be an algebra with σ(A) = E and µ be a σ-finite measure on A. Let ν1, ν2 be measures on E with ν1(A) = ν2(A) = µ(A) for all A ∈ A. Then ν1 = ν2.

Proof Let S = {S ∈ A : µ(S) < ∞}. Then S is a subset of E closed under finite intersections and, since µ is σ-finite, there exists an increasing sequence {S_n}_{n≥1} from S with ⋃_{n≥1} S_n = X. In particular, A ⊂ σ(S) and so σ(S) = E. Moreover, for each S ∈ S the numbers ν_1(S) and ν_2(S) are finite and equal (since they are both equal to µ(S)). Hence by Lemma 3.5 ν_1(E) = ν_2(E) for all E ∈ σ(S) = E, i.e., ν_1 = ν_2.

There are good reasons for studying σ-finite measures: First, Lebesgue measure on the real line (perhaps the most fundamental measure of all) is σ-finite but not finite. Second, there are several important results which hold for σ-finite measures (for example Fubini's theorem and the Radon-Nikodym theorem) but do not hold for general measures. The following is a version of Proposition 3.2 for σ-finite measures which will be needed in Chapter 5.

Lemma 3.6 Let A ⊂ P(X) be an algebra and let µ be a finitely additive measure on A. Let {C_n}_{n≥1} be an increasing sequence from A with X = ⋃_{n≥1} C_n and µ(C_n) < ∞ for each n ≥ 1 such that

(1) lim_n µ(A ∩ C_n) = µ(A) for all A ∈ A.

(2) If {A_n}_{n≥1} is a decreasing sequence from A with ⋂_{n≥1} A_n = ∅ and A_1 ⊂ C_p for some p ≥ 1 then lim_n µ(A_n) = 0.

Then µ is a measure on (X, A).

Proof Let {A_n}_{n≥1} be an increasing sequence from A with A = ⋃_{n≥1} A_n ∈ A. Fix p ≥ 1, put A′ = A ∩ C_p and let A′_n = A_n ∩ C_p for each n ≥ 1; then {A′_n}_{n≥1} is an increasing sequence from A with A′ = ⋃_{n≥1} A′_n ∈ A. Now for each n ≥ 1 put B_n = A′ \ A′_n; then {B_n}_{n≥1} is a decreasing sequence from A with ⋂_{n≥1} B_n = ∅ and B_1 ⊂ C_p and hence lim_n µ(B_n) = 0. Now µ(A′) = µ(B_n) + µ(A′_n) for each n ≥ 1, since µ is additive, and thus

  µ(A ∩ C_p) = µ(A′) = lim_{n→∞} (µ(B_n) + µ(A′_n)) = lim_{n→∞} µ(A′_n) ≤ lim_{n→∞} µ(A_n).

It therefore follows that µ(A) = lim_p µ(A ∩ C_p) ≤ lim_n µ(A_n). But µ(A_n) ≤ µ(A) for all n ≥ 1, since µ is monotone, and hence also lim_n µ(A_n) ≤ µ(A). This shows that µ is continuous and so by Proposition 3.1 µ is a measure on A.

4 The Carathéodory extension theorem

In this chapter we prove the Carathéodory extension theorem (Theorem 4.2). The main part of this result (which also occurs as Theorem 4.1) states that if A ⊂ P(X) is an algebra then any measure on A extends to a measure on σ(A). Another proof of Theorem 4.1 will be given in Chapter 11 as a by-product of the construction of the Daniell integral. In what follows let X be a non-empty set and let A ⊂ P(X) be an algebra.

Theorem 4.1 Every measure µ on A can be extended to a measure on σ(A): There exists a measure ν on σ(A) such that ν(A) = µ(A) for all A ∈ A. Moreover, if µ is σ-finite then ν is uniquely determined by µ.

Proof The existence of ν is part of Theorem 4.2. The uniqueness for a σ-finite measure µ follows immediately from Proposition 3.4.

A monotone countably sub-additive mapping µ : P(X) → R_∞^+ with µ(∅) = 0 is called an outer measure.

Let S be a subset of P(X) containing ∅ and X and let µ : S → R_∞^+ be a measure on S. Define a mapping µ* : P(X) → R_∞^+ by

  µ*(A) = inf{ ∑_{n≥1} µ(A_n) : {A_n}_{n≥1} is a sequence from S with A ⊂ ⋃_{n≥1} A_n }.

Lemma 4.1 The mapping µ∗ is an outer measure.

Proof It is clear that µ* is monotone and that µ*(∅) = 0 (since µ(∅) = 0). It thus remains to show that µ* is countably sub-additive.

Let {S_n}_{n≥1} be a sequence from P(X) and put S = ⋃_{n≥1} S_n. If µ*(S_n) = ∞ for some n then also µ*(S) = ∞, since S_n ⊂ S and µ* is monotone, and in this case µ*(S) = ∞ = ∑_{n≥1} µ*(S_n). We can thus assume that µ*(S_n) < ∞ for each n ≥ 1. Let ε > 0; then for each n ≥ 1 there is a sequence {A_{n,k}}_{k≥1} from S with S_n ⊂ ⋃_{k≥1} A_{n,k} and ∑_{k≥1} µ(A_{n,k}) < µ*(S_n) + 2^{−n}ε.

Let h : N+ → N+ × N+ be any bijective mapping and let {B_m}_{m≥1} be the sequence from S with B_m = A_{h(m)} for each m ≥ 1. Let M ≥ 1; then there exist N ≥ 1, K ≥ 1 so that h(m) ∈ {1, . . . , N} × {1, . . . , K} for all m ∈ {1, . . . , M} and thus

  ∑_{m=1}^M µ(B_m) = ∑_{m=1}^M µ(A_{h(m)}) ≤ ∑_{n=1}^N ∑_{k=1}^K µ(A_{n,k})
    ≤ ∑_{n=1}^N (µ*(S_n) + 2^{−n}ε) ≤ ∑_{n=1}^N µ*(S_n) + ε ≤ ∑_{n≥1} µ*(S_n) + ε.


Therefore ∑_{m≥1} µ(B_m) ≤ ∑_{n≥1} µ*(S_n) + ε. But S = ⋃_{n≥1} S_n ⊂ ⋃_{m≥1} B_m (since for each n ≥ 1 and each x ∈ S_n there exists k ≥ 1 with x ∈ A_{n,k} and hence there exists m ≥ 1 with x ∈ B_m) and it follows that

  µ*(S) ≤ ∑_{m≥1} µ(B_m) ≤ ∑_{n≥1} µ*(S_n) + ε.

Since ε > 0 is arbitrary we have µ*(S) ≤ ∑_{n≥1} µ*(S_n) and this shows that µ* is countably sub-additive.

The mapping µ∗ is called the outer measure generated by µ.

Theorem 4.2 Let µ be a measure on A and let µ∗ be the outer measure generated by µ. Put

B = {S ∈ P(X) : µ∗(T ∩ S) + µ∗(T \ S) = µ∗(T ) for all T ∈ P(X)} .

Then:

(1) B is a σ-algebra with A ⊂ B.

(2) The restriction of µ* to B is a measure.

(3) µ*(A) = µ(A) for all A ∈ A.

(4) If B ∈ B with µ*(B) < ∞ then for each ε > 0 there exists a sequence {A_n}_{n≥1} from A with B ⊂ ⋃_{n≥1} A_n such that µ*(⋃_{n≥1} A_n \ B) < ε.

(5) If B ∈ B with µ*(B) < ∞ then for each ε > 0 there exists A ∈ A such that µ*((A \ B) ∪ (B \ A)) < ε.

Let ν be the restriction of µ* to B; then the measure ν on B is called the Carathéodory extension of the measure µ on A. As preparation for the proof of Theorem 4.2 we need a couple of lemmas.

Lemma 4.2 Let λ : P(X) → R_∞^+ be an outer measure and put

B = {S ∈ P(X) : λ(T ∩ S) + λ(T \ S) = λ(T ) for all T ∈ P(X)} .

Then B is a σ-algebra and the restriction of λ to B is a measure.

Proof (1) It is clear that ∅ ∈ B and so in particular B ≠ ∅.

(2) X \ S ∈ B for each S ∈ B, since T ∩ (X \ S) = T \ S and T \ (X \ S) = T ∩ S for all T ∈ P(X).

(3) Let S_1, S_2 ∈ B; then for all T ∈ P(X)

  λ(T) = λ(T ∩ S_1) + λ(T \ S_1)
       = λ(T ∩ S_1) + λ((T \ S_1) ∩ S_2) + λ((T \ S_1) \ S_2)
       = λ(T ∩ (S_1 ∪ S_2) ∩ S_1) + λ((T ∩ (S_1 ∪ S_2)) \ S_1) + λ(T \ (S_1 ∪ S_2)),

since (T \ S_1) ∩ S_2 = (T ∩ (S_1 ∪ S_2)) \ S_1, (T \ S_1) \ S_2 = T \ (S_1 ∪ S_2) and also S_1 = (S_1 ∪ S_2) ∩ S_1. But

  λ(T ∩ (S_1 ∪ S_2) ∩ S_1) + λ((T ∩ (S_1 ∪ S_2)) \ S_1) = λ(T ∩ (S_1 ∪ S_2))

and therefore λ(T ∩ (S_1 ∪ S_2)) + λ(T \ (S_1 ∪ S_2)) = λ(T) for all T ∈ P(X). Hence S_1 ∪ S_2 ∈ B.

(4) By (1), (2) and (3) B is an algebra.

(5) Let S_1, S_2 ∈ B with S_1 ∩ S_2 = ∅; then, since S_1 ∈ B,

  λ(T ∩ (S_1 ∪ S_2)) = λ(T ∩ (S_1 ∪ S_2) ∩ S_1) + λ((T ∩ (S_1 ∪ S_2)) \ S_1)
                     = λ(T ∩ S_1) + λ(T ∩ S_2)

for all T ∈ P(X).

(6) Let {S_n}_{n≥1} be a disjoint sequence from B. Then by induction on m it follows from (5) that for all m ≥ 1

  λ(T ∩ ⋃_{n=1}^m S_n) = ∑_{n=1}^m λ(T ∩ S_n)

for all T ∈ P(X).

(7) Let {S_n}_{n≥1} be a disjoint sequence from B. Since λ is monotone it follows from (4) and (6) that for all m ≥ 1

  λ(T) = λ(T ∩ ⋃_{n=1}^m S_n) + λ(T \ ⋃_{n=1}^m S_n)
       = ∑_{n=1}^m λ(T ∩ S_n) + λ(T \ ⋃_{n=1}^m S_n) ≥ ∑_{n=1}^m λ(T ∩ S_n) + λ(T \ ⋃_{n≥1} S_n)

for all T ∈ P(X). Thus, since λ is countably sub-additive,

  λ(T) ≥ ∑_{n≥1} λ(T ∩ S_n) + λ(T \ ⋃_{n≥1} S_n) ≥ λ(T ∩ ⋃_{n≥1} S_n) + λ(T \ ⋃_{n≥1} S_n).

But an outer measure is sub-additive and hence also

  λ(T) ≤ λ(T ∩ ⋃_{n≥1} S_n) + λ(T \ ⋃_{n≥1} S_n).

Therefore λ(T) = λ(T ∩ ⋃_{n≥1} S_n) + λ(T \ ⋃_{n≥1} S_n) for all T ∈ P(X), and this shows that ⋃_{n≥1} S_n ∈ B.

(8) By (4), (7) and Lemma 2.1 (4) B is a σ-algebra.

(9) The restriction of λ to B is a measure: Let {S_n}_{n≥1} be a disjoint sequence from B. In (7) we saw that

  λ(T) ≥ ∑_{n≥1} λ(T ∩ S_n) + λ(T \ ⋃_{n≥1} S_n)

for all T ∈ P(X). In particular with T = ⋃_{n≥1} S_n this implies that

  λ(⋃_{n≥1} S_n) ≥ ∑_{n≥1} λ(S_n) + λ(∅) = ∑_{n≥1} λ(S_n)

and therefore λ(⋃_{n≥1} S_n) = ∑_{n≥1} λ(S_n), since λ is countably sub-additive.

Lemma 4.3 Let µ be a measure on A and let µ* be the outer measure generated by µ. Then:

(1) µ*(T) = µ*(T ∩ A) + µ*(T \ A) for all A ∈ A, T ∈ P(X).

(2) µ*(A) = µ(A) for all A ∈ A.

Proof (1) Let A ∈ A and T ∈ P(X); then µ*(T) ≤ µ*(T ∩ A) + µ*(T \ A), since µ* is sub-additive. We thus need to show that µ*(T) ≥ µ*(T ∩ A) + µ*(T \ A), and for this we can assume that µ*(T) < ∞. Let ε > 0; then there is a sequence {A_n}_{n≥1} from A with T ⊂ ⋃_{n≥1} A_n such that ∑_{n≥1} µ(A_n) ≤ µ*(T) + ε. But

  ∑_{n≥1} µ(A_n) = ∑_{n≥1} (µ(A_n ∩ A) + µ(A_n \ A))
    = ∑_{n≥1} µ(A_n ∩ A) + ∑_{n≥1} µ(A_n \ A) ≥ µ*(T ∩ A) + µ*(T \ A),

since A_n ∩ A, A_n \ A ∈ A and T ∩ A ⊂ ⋃_{n≥1}(A_n ∩ A), T \ A ⊂ ⋃_{n≥1}(A_n \ A). Thus µ*(T) + ε ≥ µ*(T ∩ A) + µ*(T \ A), and since ε > 0 was arbitrary it follows that µ*(T) ≥ µ*(T ∩ A) + µ*(T \ A).

(2) Let A ∈ A; then

  µ*(A) = inf{ ∑_{n≥1} µ(A_n) : {A_n}_{n≥1} is a sequence from A with A ⊂ ⋃_{n≥1} A_n }

and in particular µ*(A) ≤ µ(A). (Just take A_1 = A and A_n = ∅ for all n > 1.)

Now let {A_n}_{n≥1} be any sequence from A with A ⊂ ⋃_{n≥1} A_n. Let B_1 = A_1 ∩ A and for each n > 1 put B_n = (A_n \ ⋃_{k=1}^{n−1} A_k) ∩ A. Then {B_n}_{n≥1} is a disjoint sequence from A with B_n ⊂ A_n for each n ≥ 1 and ⋃_{n≥1} B_n = A. Thus, since µ is a measure on A,

  µ(A) = ∑_{n≥1} µ(B_n) ≤ ∑_{n≥1} µ(A_n)

and this shows that µ(A) ≤ µ*(A). Hence µ*(A) = µ(A) for all A ∈ A.
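On a finite set the infimum defining µ* runs over finitely many covers and can be computed by brute force; the following sketch (a toy example of my own) checks Lemma 4.3 (2), i.e. that µ* extends µ, for a four-point space:

```python
from itertools import chain, combinations

X = frozenset({0, 1, 2, 3})
# A small algebra on X: generated by the partition {0,1}, {2,3}.
Alg = [frozenset(), frozenset({0, 1}), frozenset({2, 3}), X]
mu = {frozenset(): 0, frozenset({0, 1}): 1, frozenset({2, 3}): 2, X: 3}  # additive by construction

def covers(T):
    # All subfamilies of Alg whose union contains T (finite analogue of the covering sequences).
    fams = chain.from_iterable(combinations(Alg, r) for r in range(1, len(Alg) + 1))
    return [f for f in fams if T <= frozenset().union(*f)]

def mu_star(T):
    # Outer measure: infimum of sum mu(A_n) over all covers of T.
    return min(sum(mu[A] for A in f) for f in covers(T))

# mu* agrees with mu on the algebra (Lemma 4.3 (2)) ...
assert all(mu_star(A) == mu[A] for A in Alg)
# ... and charges a set outside the algebra the mass of its cheapest cover.
assert mu_star(frozenset({0})) == 1
print("mu* extends mu:", [(sorted(A), mu_star(A)) for A in Alg])
```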

Proof of Theorem 4.2: (1), (2) and (3) follow directly from Lemmas 4.2 and 4.3.

(4) Let B ∈ B with µ*(B) < ∞ and let ε > 0. Then there is a sequence {A_n}_{n≥1} from A with B ⊂ ⋃_{n≥1} A_n such that ∑_{n≥1} µ(A_n) < µ*(B) + ε. Now µ(A_n) = µ*(A_n) for each n ≥ 1 and µ*(⋃_{n≥1} A_n) ≤ ∑_{n≥1} µ*(A_n), and therefore

  µ*(⋃_{n≥1} A_n \ B) = µ*(⋃_{n≥1} A_n) − µ*(B) < ε,

since by (2) the restriction of µ* to B is a measure and so in particular it is additive.

(5) Exactly as in (4) there is a sequence {A_n}_{n≥1} from A with B ⊂ ⋃_{n≥1} A_n such that ∑_{n≥1} µ(A_n) < µ*(B) + ε/2. Choose m ≥ 1 with ∑_{n≥m+1} µ(A_n) < ε/2 and put A = ⋃_{n=1}^m A_n. Then A ∈ A and

  µ*((A \ B) ∪ (B \ A)) = µ*(A \ B) + µ*(B \ A)
    ≤ µ*(⋃_{n≥1} A_n \ B) + µ*(⋃_{n≥m+1} A_n) < ε,

since (A \ B) ∩ (B \ A) = ∅, and again using the fact that the restriction of µ* to B is additive.

Lemma 4.4 Let ν on B be the Carathéodory extension of a measure µ on A. Then for each B ∈ B with ν(B) < ∞ there exists A ∈ σ(A) with B ⊂ A and ν(A \ B) = 0.

Proof By Theorem 4.2 (4) there exists for each m ≥ 1 a sequence {A_{m,n}}_{n≥1} from A with B ⊂ ⋃_{n≥1} A_{m,n} such that ν(⋃_{n≥1} A_{m,n} \ B) < 1/m. Therefore A = ⋂_{m≥1} ⋃_{n≥1} A_{m,n} ∈ σ(A) with B ⊂ A and for each m ≥ 1

  ν(A \ B) ≤ ν(⋃_{n≥1} A_{m,n} \ B) < 1/m,

i.e., ν(A \ B) = 0.

Proposition 4.1 Let ν on B be the Carathéodory extension of a σ-finite measure µ on A. Then for each B ∈ B there exists A ∈ σ(A) with B ⊂ A and ν(A \ B) = 0.

Proof Since µ is σ-finite there is a sequence {C_n}_{n≥1} from A with µ(C_n) < ∞ for each n ≥ 1 such that X = ⋃_{n≥1} C_n. Now let B ∈ B and for each n ≥ 1 put B_n = B ∩ C_n; then ν(B_n) ≤ µ(C_n) < ∞ and so by Lemma 4.4 there exists A_n ∈ σ(A) with B_n ⊂ A_n so that ν(A_n \ B_n) = 0. Put A = ⋃_{n≥1} A_n; then A ∈ σ(A) with B ⊂ A and ν(A \ B) = 0, since

  A \ B = (⋃_{n≥1} A_n) \ (⋃_{n≥1} B_n) ⊂ ⋃_{n≥1} (A_n \ B_n)

and ν(⋃_{n≥1}(A_n \ B_n)) ≤ ∑_{n≥1} ν(A_n \ B_n) = 0.

Proposition 4.2 Let µ be a σ-finite measure on A and also denote by µ the unique extension to a measure on σ(A). Then for each B ∈ σ(A) and each ε > 0 there exists a sequence {A_n}_{n≥1} from A with B ⊂ ⋃_{n≥1} A_n such that µ(⋃_{n≥1} A_n \ B) < ε.

Proof Let µ* be the outer measure generated by µ; then µ(B) = µ*(B) for all B ∈ σ(A) (by Theorem 4.2), and since µ is σ-finite there is an increasing sequence {E_m}_{m≥1} from A with µ(E_m) < ∞ for each m ≥ 1 and X = ⋃_{m≥1} E_m. Now let B ∈ σ(A) and let ε > 0; since µ(B ∩ E_m) < ∞ Theorem 4.2 (4) implies there exists a sequence {F_{m,k}}_{k≥1} from A with B ∩ E_m ⊂ ⋃_{k≥1} F_{m,k} so that

  µ(⋃_{k≥1} F_{m,k} \ (B ∩ E_m)) = µ*(⋃_{k≥1} F_{m,k} \ (B ∩ E_m)) < 2^{−m}ε.

Let {A_n}_{n≥1} be an enumeration of the double sequence {F_{m,k}}_{m≥1,k≥1}. Then {A_n}_{n≥1} is a sequence from A with B ⊂ ⋃_{n≥1} A_n and

  µ(⋃_{n≥1} A_n \ B) ≤ µ(⋃_{m≥1} (⋃_{k≥1} F_{m,k} \ (B ∩ E_m)))
    ≤ ∑_{m≥1} µ(⋃_{k≥1} F_{m,k} \ (B ∩ E_m)) < ∑_{m≥1} 2^{−m}ε = ε.

5 Measures on the real line

In this chapter we show how all reasonable measures on the Borel subsets of the real line R can be constructed. The most important example (and perhaps the most important measure of all) is Lebesgue measure. Recall that the σ-algebra of Borel subsets of R is defined to be the σ-algebra BR = σ(OR), where OR is the set of open subsets of R. It is easy to see that BR contains every interval, and in particular all intervals of the form (a, b] with a < b.

A measure µ on BR is called locally finite if µ(B) < ∞ for each bounded set B ∈ BR. Thus µ is locally finite if and only if µ((a, b]) < ∞ for all a, b ∈ R with a < b. A locally finite measure is σ-finite, although in general the converse does not hold.

Lemma 5.1 Let µ be a locally finite measure on BR. Then there exists a mapping α : R → R such that µ((a, b]) = α(b)−α(a) for all a, b ∈ R with a < b. Moreover, any such mapping is increasing and right continuous. (This means: For each x ∈ R and each ε > 0 there exists δ > 0 such that |α(y) − α(x)| < ε for all y ∈ [x, x + δ).)

Proof Define a mapping α : R → R by

  α(x) = µ((0, x])  if x ≥ 0,
  α(x) = −µ((x, 0])  if x < 0.

Then µ((a, b]) = α(b) − α(a) holds for all a, b ∈ R with a < b: If 0 ≤ a < b then µ((a, b]) = µ((0, b]) − µ((0, a]) = α(b) − α(a); if a < 0 ≤ b then µ((a, b]) = µ((a, 0]) + µ((0, b]) = α(b) − α(a); and finally if a < b < 0 then µ((a, b]) = µ((a, 0]) − µ((b, 0]) = α(b) − α(a).

Now let α : R → R be any mapping such that µ((a, b]) = α(b) − α(a) for all a, b ∈ R with a < b. Then α(y) − α(x) = µ((x, y]) ≥ 0 if x < y and thus α is increasing. Moreover, α is right continuous: Let x ∈ R and for each n ≥ 1 let A_n = (x, x + 1/n]; then {A_n}_{n≥1} is a decreasing sequence from B_R with ⋂_{n≥1} A_n = ∅ and thus by Lemma 3.2 it follows that

  lim_{n→∞} (α(x + 1/n) − α(x)) = lim_{n→∞} µ(A_n) = µ(∅) = 0,

since µ(A_1) = α(x + 1) − α(x) < ∞. Let ε > 0; then there exists N ≥ 1 such that |α(x + 1/N) − α(x)| < ε and hence |α(y) − α(x)| < ε for all y ∈ [x, x + 1/N), since α is increasing.

The main result of this chapter states that the converse of Lemma 5.1 holds:


Theorem 5.1 Let α : R → R be increasing and right continuous. Then there exists a unique measure ν on BR such that ν((a, b]) = α(b) − α(a) for all a, b ∈ R with a < b.

The most important example of Theorem 5.1 is when α is the identity mapping idR : R → R (with idR(x) = x for all x ∈ R). This mapping is strictly increasing and continuous and therefore by Theorem 5.1 there exists a unique measure λ on BR such that λ((a, b]) = b − a for all a, b ∈ R with a < b. The measure λ is called Lebesgue measure on BR. We start the proof of Theorem 5.1 by looking at the uniqueness.
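Two instances of Theorem 5.1 can be tabulated directly from α, since ν((a, b]) = α(b) − α(a): the identity gives Lebesgue measure, while a right-continuous unit step at 0 gives the point mass at 0 (the step example is my own illustration). A sketch:

```python
def nu(alpha, a, b):
    # nu((a, b]) = alpha(b) - alpha(a) for a < b.
    assert a < b
    return alpha(b) - alpha(a)

ident = lambda x: x                    # Lebesgue measure: nu((a, b]) = b - a
step = lambda x: 0 if x < 0 else 1     # right-continuous step: point mass at 0

# Lebesgue: length, and additivity over adjacent intervals (a, c] = (a, b] u (b, c].
assert nu(ident, 1.0, 4.0) == 3.0
assert nu(ident, -2.0, 5.0) == nu(ident, -2.0, 1.0) + nu(ident, 1.0, 5.0)

# Point mass at 0: an interval (a, b] carries mass 1 exactly when a < 0 <= b.
assert nu(step, -1.0, 0.0) == 1        # 0 lies in (-1, 0]
assert nu(step, 0.0, 1.0) == 0         # 0 does not lie in (0, 1]
print("alpha = id gives length; the step alpha gives the point mass at 0")
```

Note how the half-open intervals (a, b] pair with right continuity of α: the step function must take the value 1 at 0 itself for the mass to sit at 0.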

Lemma 5.2 Let α : R → R be increasing and right continuous. Then there is at most one measure ν on BR so that ν((a, b]) = α(b) − α(a) for all a, b ∈ R with a < b.

Proof Let ν_1, ν_2 be measures on B_R with ν_1((a, b]) = α(b) − α(a) = ν_2((a, b]) for all a, b ∈ R with a < b. Then the numbers ν_1(S) and ν_2(S) are finite and equal for all S ∈ S, where S = {∅} ∪ {(a, b] : a, b ∈ R with a < b}. But the set S is closed under finite intersections and there exists an increasing sequence {S_n}_{n≥1} from S with ⋃_{n≥1} S_n = R (for example S_n = (−n, n]). Therefore by Lemma 3.5 ν_1(F) = ν_2(F) for all F ∈ σ(S). But σ(S) = B_R: If a, b ∈ R with a < b then (a, b) = ⋃_{n≥1} (a, b − n^{−1}] ∈ σ(S), and thus B_R = σ(O_R) ⊂ σ(S), since each open subset of R can be written as the countable union of open intervals. This implies that ν_1 = ν_2.

The rest of the chapter deals with the existence of the measure ν. For −∞ ≤ a < b < +∞ let us write ⟨a, b⟩ = (a, b] and for −∞ ≤ a < +∞ put ⟨a, +∞⟩ = (a, +∞). Let

  A = { ⋃_{k=1}^n ⟨a_k, b_k⟩ : n ≥ 0 and −∞ ≤ a_1 < b_1 < a_2 < · · · < a_n < b_n ≤ +∞ },

where ⋃_{k=1}^0 ⟨a_k, b_k⟩ = ∅; in particular (−∞, x] ∈ A for each x ∈ R.

Lemma 5.3 A is an algebra with σ(A) = B_R.

Proof Easy exercise. (The proof that σ(A) = BR is already contained in the proof of Lemma 5.2).

Lemma 5.4 For each A ∈ A there exist a unique n ≥ 0 and unique numbers −∞ ≤ a_1 < b_1 < a_2 < · · · < a_n < b_n ≤ +∞ such that A = ⋃_{k=1}^n ⟨a_k, b_k⟩.

Proof This is clear.

By Lemma 5.4 we can define a mapping µ : A → R_∞^+ by

  µ(⋃_{k=1}^n ⟨a_k, b_k⟩) = ∑_{k=1}^n (α(b_k) − α(a_k)),

where α(−∞) = inf(α(R)), α(+∞) = sup(α(R)) and µ(⋃_{k=1}^0 ⟨a_k, b_k⟩) = 0. In particular (with n = 1) this means µ((a, b]) = α(b) − α(a) for all a, b ∈ R with a < b.

Lemma 5.5 The mapping µ : A → R_∞^+ is additive: If A, B ∈ A with A ∩ B = ∅ then µ(A ∪ B) = µ(A) + µ(B).

Proof Easy exercise.

Denote by A0 the set of bounded elements in A.

Lemma 5.6 Let A ∈ A0 and let ε > 0; then there exist n ≥ 0 and numbers −∞ < a_1 < b_1 < a_2 < · · · < a_n < b_n < +∞ with ⋃_{k=1}^n [a_k, b_k] ⊂ A such that µ(⋃_{k=1}^n (a_k, b_k]) ≥ µ(A) − ε.

Proof Another exercise. (It is here that the right continuity of α is needed.)

Let A ∈ A0 and ε > 0; by Lemma 5.6 there exists B ∈ A0 such that the closure B̄ of B is a subset of A with µ(B) ≥ µ(A) − ε. For each n ≥ 1 let C_n = (−n, n]; then {C_n}_{n≥1} is an increasing sequence from A0 with R = ⋃_{n≥1} C_n. Moreover, µ(C_n) = α(n) − α(−n) < ∞ for all n ≥ 1.

Lemma 5.7 µ(A) = limn µ(A ∩ Cn) for all A ∈ A.

Proof This is clear.

Lemma 5.8 µ is a finite measure on A.

Proof Clearly µ(∅) = 0 and µ(R) = sup(α(R)) − inf(α(R)) < +∞, and thus by Lemma 5.5 µ is a finitely additive measure on A. We are going to make use of Lemma 3.6; condition (1) there holds by Lemma 5.7, so it remains to check condition (2). Let {A_n}_{n≥1} be a decreasing sequence from A with ⋂_{n≥1} A_n = ∅ and A_1 ⊂ C_p for some p ≥ 1 (and thus A_n ∈ A0 for all n ≥ 1). Let ε > 0; by Lemma 5.6 there exists for each n ≥ 1 an element B_n ∈ A0 with B̄_n ⊂ A_n and µ(B_n) > µ(A_n) − 2^{−n}ε. For each n ≥ 1 let D_n = ⋂_{k=1}^n B_k; then D_n ∈ A0 with D̄_n ⊂ A_n. Moreover,

  D_n = ⋂_{k=1}^n B_k = ⋂_{k=1}^n (A_k \ (A_k \ B_k)) = ⋂_{k=1}^n A_k \ ⋃_{k=1}^n (A_k \ B_k) = A_n \ ⋃_{k=1}^n (A_k \ B_k)

and this implies that

  µ(D_n) ≥ µ(A_n) − ∑_{k=1}^n µ(A_k \ B_k) ≥ µ(A_n) − ∑_{k=1}^n 2^{−k}ε > µ(A_n) − ε,

i.e., µ(D_n) > µ(A_n) − ε. Now ⋂_{n≥1} A_n = ∅ and so ⋂_{n≥1} D̄_n = ∅. But {D̄_n}_{n≥1} is a decreasing sequence of closed subsets of the compact set C̄_p = [−p, p] and hence there exists m ≥ 1 so that D̄_m = ∅. This means that µ(A_n) < ε for all n ≥ m, which shows that lim_n µ(A_n) ≤ ε. Since ε > 0 is arbitrary, lim_n µ(A_n) = 0, and therefore by Lemma 3.6 µ is a measure on A.

Now by Theorem 4.1 the measure µ on A can be extended to a measure ν on B_R (since by Lemma 5.3 σ(A) = B_R) and then

  ν((a, b]) = µ((a, b]) = α(b) − α(a)

for all a, b ∈ R with a < b. This completes the proof of Theorem 5.1.

6 How the integral will be introduced

Let (X, E) be a measurable space and let µ be a measure on E. In the following four chapters we are going to define the integral ∫ f dµ of f with respect to µ for each measurable mapping f : X → R_∞^+. This will result in a functional ∫ · dµ : M(E) → R_∞^+ with the following properties:

(1) ∫ (af + bg) dµ = a ∫ f dµ + b ∫ g dµ for all f, g ∈ M(E), a, b ∈ R+.

(2) If f, g ∈ M(E) with g ≤ f then ∫ g dµ ≤ ∫ f dµ.

(3) If {f_n}_{n≥1} is an increasing sequence from M(E) then

  ∫ (lim_{n→∞} f_n) dµ = lim_{n→∞} ∫ f_n dµ,

where lim_n f_n is defined pointwise by (lim_n f_n)(x) = lim_n f_n(x) for each x ∈ X.

(4) ∫ I_E dµ = µ(E) for all E ∈ E, where the mapping I_E : X → R_∞^+ is given by I_E(x) = 1 if x ∈ E and I_E(x) = 0 otherwise.

Note that the statements (1) and (3) presuppose that the set M(E) has certain properties, namely that:

(5) af + bg ∈ M(E) for all f, g ∈ M(E) and all a, b ∈ R+.

(6) If {f_n}_{n≥1} is an increasing sequence from M(E) then lim_n f_n ∈ M(E).

Properties (3) and (6) are essentially statements about the order structure on M(E). Property (6) states that M(E) is complete with respect to the usual partial order ≤ on M(E) (in which g ≤ f if and only if g(x) ≤ f(x) for all x ∈ X). Moreover, (3) then says that the integral should be a continuous mapping between the partially ordered sets M(E) and R_∞^+. This observation will be used as the basis for our approach to introducing the integral.

In Chapter 7 we look at some general results about partially ordered sets, in Chapter 8 we apply these results to the set M(X) of all mappings from X to R_∞^+ and then in Chapter 9 specialise to the set M(E) of measurable mappings. Finally, the integral appears in Chapter 10. In Chapter 11 we give another approach to the integral (the Daniell integral). In particular, this provides an alternative proof of the Carathéodory extension theorem (Theorem 4.1).

7 Partially ordered sets

Our approach to the integral makes use of some general results about continuous mappings defined on complete partially ordered sets. These results are presented in the present chapter. Let D be a non-empty set. A binary relation ≤ on D is a partial order if:

(1) z ≤ z for all z ∈ D.

(2) z1 = z2 whenever z1 ≤ z2 and z2 ≤ z1.

(3) z1 ≤ z3 whenever z1 ≤ z2 and z2 ≤ z3.

In other words, the relation ≤ is reflexive, anti-symmetric and transitive. A pair (D, ≤) consisting of a non-empty set D and a partial order ≤ is called a partially ordered set or, for short, a poset. If (D, ≤) is a poset then we mostly just write D instead of (D, ≤) and assume that ≤ can be determined from the context. In fact, unless something explicit to the contrary is stated, the partial orders will always be denoted by ≤ (even if several posets are being considered at the same time). The notation z_1 ≥ z_2 will often be used as an alternative to z_2 ≤ z_1. Moreover, z_1 < z_2 will mean that z_1 ≤ z_2 but z_1 ≠ z_2.

In what follows let D be a poset. A sequence of elements {z_n}_{n≥1} from D is said to be increasing if z_n ≤ z_{n+1} and decreasing if z_{n+1} ≤ z_n for all n ≥ 1. Let A be a subset of D; an element z ∈ D is said to be an upper bound for A if z′ ≤ z for all z′ ∈ A. An upper bound z is called a least upper bound for A if z ≤ z′ for each upper bound z′ for A. If a least upper bound exists then it is clearly unique and it will be denoted by sup(A). The poset D is said to be complete if each non-empty subset of D possesses a least upper bound. For the rest of the chapter assume that D is complete.

If {z_n}_{n≥1} is an increasing sequence of elements from D then the least upper bound sup({z_n : n ≥ 1}) will be denoted by lim_n z_n. Thus lim_n z_n is the unique element z of D with the properties: (i) z_n ≤ z for all n ≥ 1 and (ii) if z′ ∈ D with z′ ≤ z and z′ ≠ z then z′ ≤ z_n for all large enough n. Note that this is not the definition of the limit used for R_∞^+ in Chapter 1, but it is equivalent to it.

A binary operation ⋆ : D × D → D is said to be monotone if w ⋆ z ≤ w′ ⋆ z′ whenever w ≤ w′ and z ≤ z′. If ⋆ is monotone and {w_n}_{n≥1} and {z_n}_{n≥1} are increasing sequences from D then {w_n ⋆ z_n}_{n≥1} is also an increasing sequence. A monotone operation ⋆ is defined to be continuous if

  lim_{n→∞} (w_n ⋆ z_n) = (lim_{n→∞} w_n) ⋆ (lim_{n→∞} z_n)

holds for all increasing sequences {w_n}_{n≥1} and {z_n}_{n≥1} from D.


There is a binary operation ∨ : D × D → D defined by w ∨ z = sup{w, z} for all w, z ∈ D; it is clear that ∨ is monotone.

Lemma 7.1 The operation ∨ is continuous.

Proof Let {wn}n≥1, {zn}n≥1 be increasing sequences from D with w = limn wn and z = limn zn; put v = limn(wn ∨ zn). Then wn ∨ zn ≤ w ∨ z for all n ≥ 1, and hence v ≤ w ∨ z. On the other hand, wn ≤ wn ∨ zn ≤ v for all n ≥ 1 and so w ≤ v, and in the same way z ≤ v. Therefore w ∨ z ≤ v and this shows that w ∨ z = v.

If ⋆ is a binary operation on D then a subset N of D will be called ⋆-closed or closed under ⋆ if z ⋆ w ∈ N for all z, w ∈ N; the subset N is said to be complete if lim_n z_n ∈ N for each increasing sequence {z_n}_{n≥1} from N.

For each subset N of D let N↑ denote the set of all elements of D having the form lim_n z_n for some increasing sequence {z_n}_{n≥1} from N. Thus N ⊂ N↑ and N = N↑ if and only if N is complete.

Lemma 7.2 Let N be a ∨-closed subset of D and let {z_n}_{n≥1} be an increasing sequence from N↑ with z = lim_n z_n. Then there exists an increasing sequence {w_n}_{n≥1} from N with w_n ≤ z_n for all n ≥ 1 and lim_n w_n = z. In particular, this implies z ∈ N↑.

Proof For each n ≥ 1 there exists an increasing sequence {zn,m}m≥1 from N with zn = limm zn,m. For each m ≥ 1 let wm = z1,m ∨ · · · ∨ zm,m; then wm ∈ N, since N is ∨-closed, and

wm = z1,m ∨ · · · ∨ zm,m ≤ z1,m+1 ∨ · · · ∨ zm,m+1 ≤ wm+1 , since ∨ is monotone. Therefore {wm}m≥1 is an increasing sequence from N; put w = limm wm. Now

wn = z1,n ∨ · · · ∨ zn,n ≤ z1 ∨ · · · ∨ zn = zn , i.e., wn ≤ zn for all n ≥ 1 and so in particular w = limn wn ≤ limn zn = z. But on the other hand, zn,m ≤ z1,m ∨ · · · ∨ zm,m = wm ≤ w for all m ≥ n ≥ 1; thus zn = limm zn,m ≤ w for all n ≥ 1 and hence z ≤ w, i.e., w = z.
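The diagonal construction in this proof is concrete enough to check numerically. The following sketch is an illustration only and not part of the text: the family z(n, m) and the horizon M are invented for the example, with floats standing in for elements of R∞+. It builds the diagonal sequence wm = z1,m ∨ · · · ∨ zm,m and checks the properties established above.

```python
# Numerical sketch of the diagonal construction in the proof of Lemma 7.2,
# in the complete poset of non-negative reals (truncated at a finite horizon).
# z(n, m) plays the role of z_{n,m}: increasing in m with limit z_n = 1 - 1/(n+2),
# so that z_n increases to z = 1.  All data here are made up for illustration.

M = 200  # horizon standing in for m -> infinity

def z(n, m):
    return (1 - 1/(n + 2)) * (1 - 1/(m + 1))

def w(m):
    # the diagonal sequence w_m = z_{1,m} v ... v z_{m,m}  (v = max here)
    return max(z(n, m) for n in range(1, m + 1))

ws = [w(m) for m in range(1, M + 1)]
assert all(ws[i] <= ws[i + 1] for i in range(len(ws) - 1))  # {w_m} is increasing
assert all(w(m) <= 1 - 1/(m + 2) for m in range(1, 50))     # w_m <= z_m
assert abs(ws[-1] - 1) < 0.02                               # lim_m w_m = z = 1
```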

Proposition 7.1 Let N be a ∨-closed subset of D. Then N↑ is ∨-closed and complete. Moreover, if ⋆ is a continuous operation on D and N is ⋆-closed then so is N↑.

Proof Lemma 7.2 implies in particular that N↑ is complete. Let ⋆ be continuous and suppose N is ⋆-closed. Let z, w ∈ N↑ and {zn}n≥1, {wn}n≥1 be increasing sequences from N with z = limn zn and w = limn wn. Then zn ⋆ wn ∈ N, since N is ⋆-closed, and limn (zn ⋆ wn) = z ⋆ w. Thus z ⋆ w ∈ N↑, since N↑ is complete. This shows that N↑ is ⋆-closed, and in particular N↑ is ∨-closed, since by Lemma 7.1 ∨ is continuous.

Let E be a further complete poset and let N be a subset of D. A mapping ψ : N → E is said to be monotone if ψ(w) ≤ ψ(z) whenever w, z ∈ N with w ≤ z. Let ψ : N → E be monotone; then {ψ(zn)}n≥1 is an increasing sequence in E for each increasing sequence {zn}n≥1 from N. A monotone mapping ψ : N → E is said to be pre-continuous if

ψ(z) ≤ lim_{n→∞} ψ(zn)

holds whenever {zn}n≥1 is an increasing sequence from N and z ∈ N is such that z ≤ limn zn. Moreover, if N is complete then a monotone mapping ψ : N → E is said to be continuous if

ψ(lim_{n→∞} zn) = lim_{n→∞} ψ(zn)

holds for each increasing sequence {zn}n≥1 from N.

Lemma 7.3 Let N be a complete subset of D and ψ : N → E be a monotone mapping. Then ψ is continuous if and only if it is pre-continuous.

Proof Assume first that the mapping ψ is pre-continuous. Let {zn}n≥1 be an increasing sequence from N and put z = limn zn. Then in particular z ≤ limn zn and hence ψ(z) ≤ limn ψ(zn). But zn ≤ z and therefore ψ(zn) ≤ ψ(z) for all n ≥ 1, since ψ is monotone. Thus also limn ψ(zn) ≤ ψ(z), i.e., limn ψ(zn) = ψ(z), and this shows ψ is continuous. Now assume that ψ is continuous. Let {zn}n≥1 be an increasing sequence from N and let z ∈ N with z ≤ z′ = limn zn. Then, since ψ is monotone, ψ(z) ≤ ψ(z′) = ψ(limn zn) = limn ψ(zn), and this shows ψ is pre-continuous.

The following result gives the basic criterion for the existence of a continuous extension (and is, for example, the essential tool in Chapter 10 for defining the integral).

Proposition 7.2 Let N be a ∨-closed subset of D and let ψ : N → E be a monotone mapping. Then there exists a continuous mapping ψ′ : N↑ → E such that ψ′(z) = ψ(z) for all z ∈ N if and only if ψ is pre-continuous.

Proof If there exists a continuous mapping ψ′ : N↑ → E such that ψ′(z) = ψ(z) for all z ∈ N then by Lemma 7.3 ψ′ is pre-continuous. Therefore ψ, being the restriction of a pre-continuous mapping, is itself pre-continuous.
Assume then conversely that ψ is pre-continuous. Let z ∈ N↑ and let {zn}n≥1, {wn}n≥1 be increasing sequences from N with limn zn = z = limn wn. Then zm ≤ z = limn wn and thus ψ(zm) ≤ limn ψ(wn) for each m ≥ 1, which implies limn ψ(zn) ≤ limn ψ(wn). But the same argument also shows of course that limn ψ(wn) ≤ limn ψ(zn) and therefore limn ψ(zn) = limn ψ(wn). There is thus a unique mapping ψ′ : N↑ → E such that ψ′(limn zn) = limn ψ(zn) for each increasing sequence {zn}n≥1 from N. In particular ψ′(z) = ψ(z) for each z ∈ N.
The mapping ψ′ is monotone: Let z, w ∈ N↑ with z ≤ w and let {zn}n≥1, {wn}n≥1 be increasing sequences from N with z = limn zn and w = limn wn. Then, since N is ∨-closed, {zn ∨ wn}n≥1 is an increasing sequence from N and by Lemma 7.1 limn (zn ∨ wn) = z ∨ w = w. Thus

ψ′(z) = lim_{n→∞} ψ(zn) ≤ lim_{n→∞} ψ(zn ∨ wn) = ψ′(w) .

The mapping ψ′ is continuous: Let {zn}n≥1 be an increasing sequence from N↑ with z = limn zn. Then ψ′(zn) ≤ ψ′(z) for each n ≥ 1, since ψ′ is monotone, and therefore limn ψ′(zn) ≤ ψ′(z). But by Lemma 7.2 there exists an increasing sequence {wn}n≥1 from N with wn ≤ zn for each n ≥ 1 and limn wn = z. Thus

ψ′(z) = lim_{n→∞} ψ(wn) = lim_{n→∞} ψ′(wn) ≤ lim_{n→∞} ψ′(zn) ≤ ψ′(z)

and hence ψ′(z) = limn ψ′(zn).
If there exists a continuous mapping ψ′ : N↑ → E such that ψ′(z) = ψ(z) for all z ∈ N then it is uniquely determined by ψ. (Let z ∈ N↑; then z = limn zn for some increasing sequence {zn}n≥1 from N and so ψ′(z) = limn ψ′(zn) = limn ψ(zn).)
Let A be a subset of D; an element z ∈ D is said to be a lower bound for A if z ≤ z′ for all z′ ∈ A. A lower bound z is called a greatest lower bound for A if z′ ≤ z for each lower bound z′ for A.
If a greatest lower bound exists then it is clearly unique and will be denoted by inf(A). The poset D will be called co-complete if each non-empty subset of D possesses a greatest lower bound.
If D is co-complete and {zn}n≥1 is a decreasing sequence from D then the greatest lower bound inf({zn : n ≥ 1}) will also be denoted by limn zn. Therefore limn zn is the unique element z of D with the properties: (i) z ≤ zn for all n ≥ 1 and (ii) if z′ ∈ D with z ≤ z′ and z′ ≠ z then zn ≤ z′ for all large enough n. Again, this is not the definition of the limit used for R∞+ in Chapter 1, but it is equivalent to it.
If D is co-complete then there is also a binary operation ∧ : D × D → D defined by z ∧ w = inf{z, w} for all z, w ∈ D. The operation ∧ is monotone but in general need not be continuous.

In a co-complete poset D a subset N is said to be co-complete if limn zn ∈ N for each decreasing sequence {zn}n≥1 from N.

In what follows assume that D is complete and co-complete. If {zn}n≥1 is any sequence from D and m ≥ 1 then {zn : n ≥ m} will be used to denote the set of values {z ∈ D : z = zn for some n ≥ m}.

Let {zn}n≥1 be any sequence from D and for n ≥ 1 let un = sup{zm : m ≥ n} and vn = inf{zm : m ≥ n}. Then the sequence {un}n≥1 is decreasing and {vn}n≥1 is increasing. As in R∞+ the limits limn un and limn vn are denoted by lim supn zn and lim infn zn respectively. It is easy to see that lim infn zn ≤ lim supn zn, and if {zn}n≥1 is either increasing or decreasing then

lim inf_{n→∞} zn = lim_{n→∞} zn = lim sup_{n→∞} zn .

In a complete and co-complete poset D this equality can be used to define when a general sequence {zn}n≥1 from D converges. We do not pursue this topic further because we will only be dealing with posets where the convergence is defined explicitly.

8 Real valued mappings

The general results of the previous chapter will now be applied to the poset M(X) of all mappings from a non-empty set X to R∞+.
As was stated in Chapter 1 the poset R∞+ is both complete and co-complete: Let A be a non-empty subset of R∞+. If there exists b ∈ R+ with a ≤ b for all a ∈ A then A is a bounded subset of R and in this case sup(A) is the least upper bound of A in R; if no such b ∈ R+ exists then sup(A) = ∞. Moreover, if A = {∞} then inf(A) = ∞; otherwise inf(A) is the greatest lower bound of A \ {∞} in R.
The operations +, ·, ∨ and ∧ on R∞+ are all continuous (and so in particular monotone). Moreover, for all a, b ∈ R+ the operation (c, d) ↦ ac + bd is also continuous.
Most of the mappings we will be dealing with take their values in R∞+ and so we next look at the elementary properties of such mappings.
Let X be a non-empty set; the set of all mappings from X to R∞+ will be denoted by M(X). The total order on R∞+ induces a partial order ≤ on M(X) defined pointwise by stipulating that f ≤ g if and only if f(x) ≤ g(x) for all x ∈ X. Thus (M(X), ≤) is a poset, and whenever we consider M(X) as a poset it will always be with respect to this partial order.
The poset M(X) is both complete and co-complete. If A is a non-empty subset of M(X) then its least upper bound is the mapping given by sup(A)(x) = sup{f(x) : f ∈ A} for all x ∈ X and its greatest lower bound is the mapping given by inf(A)(x) = inf{f(x) : f ∈ A} for all x ∈ X. In particular, if {fn}n≥1 is either increasing or decreasing then

(lim_{n→∞} fn)(x) = lim_{n→∞} fn(x)

for all x ∈ X. Moreover, if {fn}n≥1 is any sequence from M(X) then

(lim sup_{n→∞} fn)(x) = lim sup_{n→∞} fn(x)  and  (lim inf_{n→∞} fn)(x) = lim inf_{n→∞} fn(x)

for all x ∈ X. These statements about limn fn, lim supn fn and lim infn fn are direct consequences of how the partial order ≤ is defined on M(X); however we use their form as the basis for a definition and say that a sequence {fn}n≥1 from M(X) converges to f ∈ M(X) if limn fn(x) = f(x) for all x ∈ X, and denote the limit f by limn fn. It then follows from the properties of convergence in R∞+ that {fn}n≥1 converges to f if and only if lim infn fn = lim supn fn, and in this case lim infn fn = limn fn = lim supn fn.
For each binary operation ⋆ on R∞+ there is a corresponding operation on M(X) also denoted by ⋆ and defined pointwise by letting (f ⋆ g)(x) = f(x) ⋆ g(x) for all x ∈ X. If ⋆ is a monotone (resp. continuous) binary operation on R∞+ then the corresponding operation ⋆ on M(X) is also monotone (resp. continuous).
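The pointwise definitions above translate directly into code. In the following sketch (an illustration only; X, g, h and the helper functions are invented for the example) mappings in M(X) for a small finite X are represented as dicts, and float('inf') stands in for ∞.

```python
# Pointwise sup, inf, limsup and liminf of a sequence of mappings on a
# finite set X, mirroring the definitions for M(X).
from math import inf

X = ['x1', 'x2', 'x3']

# f_n alternates between two mappings g and h, so limsup f_n = g v h
# and liminf f_n = g ^ h pointwise (for the infinite alternating sequence).
g = {'x1': 1.0, 'x2': 3.0, 'x3': inf}
h = {'x1': 2.0, 'x2': 0.0, 'x3': 5.0}
fs = [g if n % 2 == 0 else h for n in range(100)]

def sup_map(maps):
    return {x: max(f[x] for f in maps) for x in X}

def inf_map(maps):
    return {x: min(f[x] for f in maps) for x in X}

def limsup_map(seq):
    # lim_n sup{f_m : m >= n}; the tail sups decrease, so take their inf
    return inf_map([sup_map(seq[n:]) for n in range(len(seq) - 1)])

def liminf_map(seq):
    return sup_map([inf_map(seq[n:]) for n in range(len(seq) - 1)])

assert limsup_map(fs) == {'x1': 2.0, 'x2': 3.0, 'x3': inf}  # g v h
assert liminf_map(fs) == {'x1': 1.0, 'x2': 0.0, 'x3': 5.0}  # g ^ h
```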


In particular, this results in the continuous operations +, ·, ∨ and ∧ on M(X). Moreover, for all a, b ∈ R+ the operation (f, g) ↦ af + bg is continuous. We also need the scalar multiplication · : R+ × M(X) → M(X) given by (af)(x) = af(x) for all x ∈ X. (It turns out that R+ and not R∞+ is the ‘correct’ set of scalars.)
For each a ∈ R+ the constant mapping in M(X) with value a will also be denoted by a. It will be clear from the context which usage is intended. (Of course, it doesn’t matter if af is interpreted as the scalar multiplication of a with f or as the product of the constant mapping a with f.)
For all f, g ∈ M(X) define the mapping |f − g| ∈ M(X) pointwise by letting |f − g|(x) = |f(x) − g(x)| for each x ∈ X (recalling that |a − b| = ∞ as soon as one of a and b is equal to ∞).
A non-empty subset N of M(X) is said to be a subspace if af + bg ∈ N for all f, g ∈ N and all a, b ∈ R+. Note that a subspace always contains the constant mapping 0. By a complete (resp. co-complete) subspace is meant a subspace N which is also a complete (resp. co-complete) subset of M(X). If ⋆ is a binary operation on M(X) then a subspace N is closed under ⋆ (or is ⋆-closed) if f ⋆ g ∈ N for all f, g ∈ N.
If N is a subspace of M(X) then the statement that N′ is a subspace of N just means that N′ is a subspace of M(X) with N′ ⊂ N. Moreover, the statement that N′ is a complete subspace of N is just a shortened form of the statement that N′ is a complete subspace of M(X) with N′ ⊂ N. The same usage applies when ‘complete’ is replaced by some other adjective.

Proposition 8.1 Let N be a ∨-closed subspace of M(X). Then N↑ is a subspace of M(X) which is complete and ∨-closed. Moreover, if N is closed under a continuous operation ⋆ on M(X) then so is N↑.

Proof This follows from Proposition 7.1. (N↑ is a subspace since for all a, b ∈ R+ the operation (f, g) ↦ af + bg is continuous.)

If f, g ∈ M(X) with g ≤ f then define the mapping f − g ∈ M(X) pointwise by letting (f − g)(x) = f(x) − g(x) for all x ∈ X (recalling that ∞ − b = ∞ for all b ∈ R∞+). Thus f = g + (f − g).
A subspace N of M(X) will be called complemented if for all f, g ∈ N with g ≤ f there exists h ∈ N with f = g + h. In particular, N will be complemented if f − g ∈ N for all f, g ∈ N with g ≤ f.

Denote by MF(X) the set of mappings f ∈ M(X) with f(x) < ∞ for all x ∈ X; thus MF(X) is a complemented subspace of M(X). If N is a subspace of MF(X) then N is complemented if and only if f − g ∈ N for all f, g ∈ N with g ≤ f. Note that f ∨ g + f ∧ g = f + g for all f, g ∈ M(X) and thus a complemented subspace of MF(X) is ∧-closed if and only if it is ∨-closed. Note also that if a subspace N of M(X) contains the constant mapping 1 then a ∈ N for all a ∈ R+.

A complemented subspace of M(X) which contains 1 and is closed under both ∨ and ∧ will be called normal. This definition may seem to be based on a somewhat arbitrary combination of conditions. However, Propositions 9.1 and 9.3 will show that a complete subspace is normal if and only if it has the form M(F) for some σ-algebra F ⊂ P(X).

Proposition 8.2 A complete normal subspace N of M(X) is also co-complete. Moreover, |f − g| ∈ N for all f, g ∈ N, and if g ≤ f then f − g ∈ N.

Proof Let {fn}n≥1 be a decreasing sequence from N with f = limn fn and assume first that f1 ∈ MF(X). For each n ≥ 1 let gn = f1 − fn; then {gn}n≥1 is an increasing sequence from N and thus g = limn gn ∈ N. But g = f1 − f, i.e., f1 = g + f, and since f1 ∈ MF(X) the mapping f is the unique element h of M(X) with f1 = g + h; N being complemented, it follows that f ∈ N. Now let {fn}n≥1 be any decreasing sequence from N and again put f = limn fn. The above argument applied to the sequence {fn ∧ m}n≥1 then shows that f ∧ m ∈ N for each m ≥ 1. But {f ∧ m}m≥1 is an increasing sequence with limm (f ∧ m) = f and hence f ∈ N. This shows that N is co-complete.

Next, let f, g ∈ N with g ≤ f; if n ≥ 1 then g ∧ n ∈ MF(X) and so in this case hn = f − (g ∧ n) is the unique element of N with f = (g ∧ n) + hn; note that hn(x) = ∞ whenever f(x) = ∞. Now {hn}n≥1 is a decreasing sequence from N and so h = limn hn ∈ N, since N is co-complete. Thus h = f − g, since f = g + h and h(x) = ∞ whenever f(x) = ∞, i.e., f − g ∈ N. Finally, |f − g| ∈ N for all f, g ∈ N, since |f − g| = (f ∨ g − f) + (f ∨ g − g).

Let N be a normal subspace of M(X). Then by Proposition 8.1 N ↑ is closed under both ∧ and ∨ and of course 1 ∈ N ↑. Therefore N ↑ is normal if and only if it is complemented. However, in general this will fail to be the case, and the best partial result is perhaps the following:

Lemma 8.1 Let N be a ∨-closed complemented subspace of M(X) and let g ∈ N and f ∈ N ↑ with g ≤ f. Then there exists h ∈ N ↑ with f = g + h.

Proof Let {fn}n≥1 be an increasing sequence from N with f = limn fn; then {g ∨ fn}n≥1 is an increasing sequence from N, also with limn(g ∨ fn) = g ∨ f = f. Now g ≤ g ∨ fn and so there exists hn ∈ N with (g ∨ fn) = g + hn; but

(g ∨ fn) = (g ∨ f1) ∨ · · · ∨ (g ∨ fn) = (g + h1) ∨ · · · ∨ (g + hn) = g + h′n ,

where h′n = h1 ∨ · · · ∨ hn, and {h′n}n≥1 is an increasing sequence from N. Let h = limn h′n; then h ∈ N↑ and f = g + h.

Lemma 8.2 Let N be a normal subspace of M(X). Then N↑ is normal if and only if it is co-complete.

Proof As noted above, N↑ is normal if and only if it is complemented. Moreover, by Proposition 8.2 a complete normal subspace is co-complete. It thus remains to show that N↑ is complemented when it is co-complete. Therefore suppose N↑ is co-complete. Let f, g ∈ N↑ with g ≤ f and let {gn}n≥1 be an increasing sequence from N with limn gn = g. By Lemma 8.1 there exists hn ∈ N↑ such that f = gn + hn; for each n ≥ 1 let h′n = h1 ∧ · · · ∧ hn; then

f = (g1 ∨ · · · ∨ gn) + (h1 ∧ · · · ∧ hn) = gn + h′n

and {h′n}n≥1 is a decreasing sequence from N↑. Hence, since N↑ is co-complete, h′ = limn h′n ∈ N↑, and f = g + h′. Therefore N↑ is complemented.
A mapping f ∈ M(X) is said to be simple (or elementary) if the set of values f(X) of f is a finite subset of R+, and the set of such simple mappings will be denoted by ME(X). For each F ⊂ X define IF ∈ ME(X) by

IF(x) = 1 if x ∈ F ,  and  IF(x) = 0 otherwise .

Lemma 8.3 (1) ME(X) is a normal subspace of M(X) which is ⋆-closed for each finite binary operation ⋆ on R∞+.

(2) If N is a subspace of M(X) with IF ∈ N for all F ⊂ X then ME(X) ⊂ N, which means ME(X) is the smallest subspace of M(X) containing the mappings IF , F ⊂ X.

(3) For each f ∈ M(X) there is an increasing sequence {fn}n≥1 from ME(X) such that f = limn fn. Thus M(X) = (ME(X))↑.

Proof (1) If ⋆ is a finite operation on R∞+ then ME(X) is ⋆-closed since

(f ⋆ g)(X) ⊂ S = {a ⋆ b : a ∈ f(X), b ∈ g(X)}

and S is a finite subset of R+ whenever both f(X) and g(X) are. In particular, af + bg ∈ ME(X) for all f, g ∈ ME(X), a, b ∈ R+, since (c, d) ↦ ac + bd is a finite operation on R∞+; hence ME(X) is a subspace of M(X). Moreover, the subspace ME(X) is complemented, since if f, g ∈ ME(X) with g ≤ f then (f − g)(X) is a subset of the finite set {a − b : a ∈ f(X), b ∈ g(X)}. Finally, it is clear that 1 ∈ ME(X).

(2) If f ∈ ME(X) then f = Σ_{a∈f(X)} a I_{Ea}, where Ea = {x ∈ X : f(x) = a}; thus if IF ∈ N for each F ⊂ X then ME(X) ⊂ N.

(3) Let f ∈ M(X) and for each n ≥ 1 define fn ∈ ME(X) by

fn = Σ_{m=1}^{n2^n} (m − 1)2^{−n} I_{Em,n} ,

where Em,n = {x ∈ X : (m − 1)2^{−n} < f(x) ≤ m2^{−n}}. Then fn ≤ fn+1 ≤ f for each n ≥ 1 and f(x) ≤ 2^{−n} + fn(x) for all x ∈ X with f(x) ≤ n. Thus {fn}n≥1 is an increasing sequence from ME(X) with limn fn = f.
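The dyadic construction in the proof of Lemma 8.3 (3) is completely explicit and can be computed. The sketch below is an illustration only (f, the grid xs and the helper dyadic_approx are invented for the example); floats stand in for elements of R∞+.

```python
# The approximating simple mappings f_n of Lemma 8.3 (3):
# f_n = sum_{m=1}^{n 2^n} (m-1) 2^{-n} I_{E_{m,n}},
# E_{m,n} = {x : (m-1) 2^{-n} < f(x) <= m 2^{-n}}.

def dyadic_approx(f, n):
    def fn(x):
        v = f(x)
        for m in range(1, n * 2**n + 1):
            if (m - 1) * 2**-n < v <= m * 2**-n:
                return (m - 1) * 2**-n
        return 0.0  # v = 0 or v > n: x lies in no E_{m,n}
    return fn

f = lambda x: x * x                 # an example mapping, here on [0, 3]
xs = [i / 10 for i in range(31)]
f4, f5 = dyadic_approx(f, 4), dyadic_approx(f, 5)

# increasing in n, below f, and within 2^{-n} of f wherever f(x) <= n
assert all(f4(x) <= f5(x) <= f(x) for x in xs)
assert all(f(x) - f5(x) <= 2**-5 for x in xs if f(x) <= 5)
```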

A mapping f ∈ M(X) is said to be bounded if there exists b ∈ R+ such that f ≤ b (i.e., such that f(x) ≤ b for all x ∈ X) and the set of all such mappings will be denoted by MB(X). In particular, every simple mapping is bounded.

Lemma 8.4 (1) MB(X) is a normal subspace of M(X) which is ⋆-closed for each finite monotone operation ⋆ on R∞+.

(2) If f ∈ MB(X) then for each ε > 0 there exists g ∈ ME(X) with g ≤ f ≤ g + ε.

Proof (1) This is more-or-less the same as the proof of Lemma 8.3 (1).

(2) Just take g = fn, where fn is as in the proof of Lemma 8.3 (3) with n chosen so that 2−n < ε and f ≤ n.

Let N be a subspace of M(X). A mapping Φ : N → R∞+ will be called linear if it is additive and positive homogeneous, i.e., if

Φ(af + bg) = aΦ(f) + bΦ(g)

for all f, g ∈ N and all a, b ∈ R+, and note then that Φ(0) = 0. (The somewhat inaccurate description of being linear is employed because of its brevity.) As in the general case in Chapter 7 a linear mapping Φ : N → R∞+ is monotone if Φ(g) ≤ Φ(f) whenever f, g ∈ N with g ≤ f. Moreover, a monotone linear mapping Φ : N → R∞+ is pre-continuous if Φ(f) ≤ limn Φ(fn) whenever f ∈ N and {fn}n≥1 is an increasing sequence from N with f ≤ limn fn. Finally, if N is complete then a monotone linear mapping Φ : N → R∞+ is continuous if

Φ(lim_{n→∞} fn) = lim_{n→∞} Φ(fn)

for each increasing sequence {fn}n≥1 from N. If N is a complemented subspace of M(X) then any linear mapping is automatically monotone, since if f, g ∈ N with g ≤ f then there exists h ∈ N with f = g + h and so

Φ(g) ≤ Φ(g) + Φ(h) = Φ(g + h) = Φ(f) .

Proposition 8.3 Let N be a ∨-closed subspace of M(X) and let Φ : N → R∞+ be a monotone linear mapping. Then there exists a continuous linear mapping Φ′ : N↑ → R∞+ with Φ′(f) = Φ(f) for all f ∈ N if and only if Φ is pre-continuous.

Proof This is all contained in Proposition 7.2, except for showing that Φ′ must be linear. However this follows from the definition of Φ′. (Recall Φ′ : N↑ → R∞+ is the unique mapping with Φ′(limn fn) = limn Φ(fn) for each increasing sequence {fn}n≥1 from N and, since the operation (f, g) ↦ af + bg is continuous for each a, b ∈ R+, it follows that Φ′ is linear.)

Note that if Φ is pre-continuous then the extension Φ′ : N↑ → R∞+ is unique.
For the remainder of the chapter let N be a complete normal subspace of M(X) and let Φ : N → R∞+ be a continuous linear mapping.

Proposition 8.4 If {fn}n≥1 is a decreasing sequence from N and Φ(fm) < ∞ for some m ≥ 1 then

Φ(lim_{n→∞} fn) = lim_{n→∞} Φ(fn) .

Proof Let {fn}n≥1 be a decreasing sequence from N with Φ(fm) < ∞ and put f = limn fn. Since {Φ(fn)}n≥m is a decreasing sequence from R∞+, the limit limn Φ(fn) exists; thus it must be shown that this limit is equal to Φ(f). For each n ≥ m let hn be the unique maximal mapping with fn + hn = fm (and so hn(x) = ∞ whenever fm(x) = ∞); thus by Proposition 8.2 hn ∈ N. Now the sequence {hn}n≥m is increasing and if h = limn hn then f + h = fm (since h(x) = ∞ also holds whenever fm(x) = ∞). Moreover Φ(h) < ∞, since h ≤ fm and Φ(fm) < ∞. Therefore

Φ(f) + Φ(h) = Φ(f + h) = Φ(fm) = lim_{n→∞} Φ(fn + hn) = lim_{n→∞} (Φ(fn) + Φ(hn))
= lim_{n→∞} Φ(fn) + lim_{n→∞} Φ(hn) = lim_{n→∞} Φ(fn) + Φ(h)

and hence Φ(f) = limn Φ(fn), since Φ(h) < ∞.

Lemma 8.5 Let N′ be a subspace of M(X) and {fn}n≥1 be a sequence from N′.
(1) If N′ is complete and ∨-closed then sup{fn : n ≥ 1} ∈ N′.
(2) If N′ is co-complete and ∧-closed then inf{fn : n ≥ 1} ∈ N′.
(3) If N′ is a complete normal subspace of M(X) then lim supn fn and lim infn fn are both elements of N′.

Proof (1) Put f = sup{fn : n ≥ 1} and for each n ≥ 1 let gn = f1 ∨ · · · ∨ fn. Then {gn}n≥1 is an increasing sequence from N′ with limn gn = f and therefore f ∈ N′. (2) is the same as (1) and (3) follows from (1) and (2).

Proposition 8.5 For each sequence {fn}n≥1 from N

Φ(lim inf_{n→∞} fn) ≤ lim inf_{n→∞} Φ(fn) .

Moreover, if Φ(sup_{m≥n} fm) < ∞ for some n ≥ 1 then also

Φ(lim sup_{n→∞} fn) ≥ lim sup_{n→∞} Φ(fn) .

Proof Note that by Lemma 8.5 all the mappings occurring here are elements of N. Now put f = lim infn fn and for each n ≥ 1 let gn = infm≥n fm; then {gn}n≥1 is an increasing sequence from N with limn gn = f and thus Φ(f) = limn Φ(gn). But gn ≤ fm and so Φ(gn) ≤ Φ(fm) for all m ≥ n ≥ 1. Hence Φ(gn) ≤ infm≥n Φ(fm) for each n ≥ 1 and this shows that

Φ(f) = lim_{n→∞} Φ(gn) ≤ lim_{n→∞} (inf_{m≥n} Φ(fm)) = lim inf_{n→∞} Φ(fn) .

The second part follows in exactly the same way, but using Proposition 8.4. Here we have lim supn fn = limn gn with gn = supm≥n fm, and this time {gn}n≥1 is a decreasing sequence. The condition Φ(supm≥n fm) < ∞ (i.e., Φ(gn) < ∞) for some n ≥ 1 is what we need in order to apply Proposition 8.4.
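Proposition 8.5 is a poset version of Fatou's lemma, and the first inequality can be strict. A minimal numerical sketch (an illustration only; X, Φ and the alternating sequence below are invented for the example):

```python
# Fatou-type inequality of Proposition 8.5, illustrated on a two-point set X
# with Phi(f) = f(x1) + f(x2), a monotone continuous linear mapping.

X = ['x1', 'x2']
phi = lambda f: sum(f[x] for x in X)

A = {'x1': 1.0, 'x2': 0.0}   # indicator mapping of {x1}
B = {'x1': 0.0, 'x2': 1.0}   # indicator mapping of {x2}
fs = [A if n % 2 == 0 else B for n in range(50)]  # truncation of A, B, A, B, ...

# for the infinite alternating sequence the pointwise liminf is the mapping 0;
# for the truncated list this is just the pointwise minimum
liminf_f = {x: min(f[x] for f in fs) for x in X}

lhs = phi(liminf_f)            # Phi(liminf_n f_n) = 0
rhs = min(phi(f) for f in fs)  # liminf_n Phi(f_n)  = 1
assert lhs <= rhs and (lhs, rhs) == (0.0, 1.0)   # inequality, and strictly so
```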

Lemma 8.6 Let g ∈ N with Φ(g) < ∞ and put G = {x ∈ X : g(x) = ∞}. Then ∞IG ∈ N and Φ(∞IG) = 0.

Proof Put hn = g − g ∧ n; then {hn}n≥1 is a decreasing sequence from N with Φ(h1) ≤ Φ(g) < ∞ and limn hn = ∞IG. Thus ∞IG ∈ N and by Proposition 8.4 Φ(∞IG) = limn Φ(hn) = Φ(g) − limn Φ(g ∧ n) = 0, since {g ∧ n}n≥1 is an increasing sequence from N with limn (g ∧ n) = g and Φ is continuous.

Proposition 8.6 Let {fn}n≥1 be a convergent sequence from N with f = limn fn and suppose there exists g ∈ N with Φ(g) < ∞ such that fn ≤ g for all n ≥ 1. Then the sequence {Φ(fn)}n≥1 converges and limn Φ(fn) = Φ(f). Moreover,

lim_{n→∞} Φ(|fn − f|) = 0 .

Proof For each n ≥ 1 put gn = supk≥n |f − fk|; then by Proposition 8.2 and Lemma 8.5 (1) gn ∈ N. Thus {gn}n≥1 is a decreasing sequence from N with gn ≤ g for all n ≥ 1 and limn gn(x) = 0 for all x ∉ G, where G = {x ∈ X : g(x) = ∞}. Therefore by Proposition 8.4 and Lemma 8.6

lim_{n→∞} Φ(gn) = Φ(lim_{n→∞} gn) ≤ Φ(∞IG) = 0 .

Hence also limn Φ(|fn − f|) = 0, since |f − fn| ≤ gn for each n ≥ 1. Finally, f ≤ |f − fn| + fn and fn ≤ |f − fn| + f, thus Φ(f) ≤ Φ(|f − fn|) + Φ(fn) and Φ(fn) ≤ Φ(|f − fn|) + Φ(f), and also Φ(fn) ≤ Φ(g) < ∞ for all n ≥ 1. This implies that |Φ(f) − Φ(fn)| ≤ Φ(|f − fn|) for each n ≥ 1 and so limn Φ(fn) = Φ(f).

9 Real valued measurable mappings

The subspace M(E) of the poset M(X) will now be studied. In what follows let (X, E) be a measurable space. Recall that M(E) denotes the set of all measurable mappings from (X, E) to (R∞+, B∞+), thus

M(E) = {f ∈ M(X) : f⁻¹(B∞+) ⊂ E} .

If A is an algebra of subsets of X then ME(A) will denote the set of those elements f ∈ ME(X) for which {x ∈ X : f(x) = a} ∈ A for each a ∈ f(X). In particular this means that ME(E) = M(E) ∩ ME(X), i.e., ME(E) consists of those mappings f ∈ M(E) for which f(X) is a finite subset of R+. In Lemmas 9.1 and 9.2 let A ⊂ P(X) be an algebra.

Lemma 9.1 ME(A) is a normal subspace of M(X) which is ⋆-closed for each finite operation ⋆ on R∞+.

Proof Let ⋆ be a finite operation on R∞+. For each h ∈ ME(A) and a ∈ R+ put E^h_a = {x ∈ X : h(x) = a}. Let f, g ∈ ME(A); then

E^{f⋆g}_c = ⋃_{a⋆b=c} (E^f_a ∩ E^g_b)

for each c ∈ (f ⋆ g)(X), where the union is restricted to values a, b ∈ R+ with a ∈ f(X) and b ∈ g(X). Thus E^{f⋆g}_c ∈ A for each c ∈ (f ⋆ g)(X), and f ⋆ g ∈ ME(X) by Lemma 8.3 (1); therefore f ⋆ g ∈ ME(A). This shows that ME(A) is ⋆-closed.
In particular, af + bg ∈ ME(A) for all f, g ∈ ME(A) and all a, b ∈ R+, since (c, d) ↦ ac + bd is a finite operation on R∞+. Therefore ME(A) is a subspace of ME(X), since clearly 0 ∈ ME(A). Now let f, g ∈ ME(A) with g ≤ f and let

h = Σ_{a<b} (b − a) I_{E^g_a ∩ E^f_b} ;

then h ∈ ME(A) and f = g + h, which shows that ME(A) is complemented. Moreover, IA ∈ ME(A) for each A ∈ A and in particular 1 = IX ∈ ME(A); thus ME(A) is normal.

Lemma 9.2 The mapping IA is in ME(A) for all A ∈ A, and any subspace N of M(X) with IA ∈ N for each A ∈ A contains ME(A). Therefore ME(A) is the smallest subspace of M(X) containing the mappings IA, A ∈ A.


Proof It was already shown in the proof of Lemma 9.1 that IA ∈ ME(A) for all A ∈ A. The rest is the same as Lemma 8.3 (2).

Lemma 9.3 For each f ∈ M(E) there is an increasing sequence {fn}n≥1 from ME(E) such that f = limn fn. Thus M(E) = (ME(E))↑.

Proof Let f ∈ M(E); for each n ≥ 1 define fn ∈ ME(X) by

fn = Σ_{m=1}^{n2^n} (m − 1)2^{−n} I_{Em,n} ,

where Em,n = {x ∈ X : (m − 1)2^{−n} < f(x) ≤ m2^{−n}} (i.e., defined as in the proof of Lemma 8.3 (3)). Then {fn}n≥1 is an increasing sequence from ME(X) with limn fn = f. But Em,n ∈ E for all n ≥ 1, 1 ≤ m ≤ n2^n, and therefore fn ∈ ME(E) for each n ≥ 1.

Lemma 9.4 If {fn}n≥1 is any sequence from M(E) and A = {fn : n ≥ 1} then sup(A) and inf(A) are both elements of M(E).

Proof Put f = sup(A) and f′ = inf(A); then for all a ∈ R∞+

{x ∈ X : f(x) ≤ a} = ⋂_{n≥1} {x ∈ X : fn(x) ≤ a}

is an element of E and hence f ∈ M(E). In the same way, for all a ∈ R∞+

{x ∈ X : f′(x) < a} = ⋃_{n≥1} {x ∈ X : fn(x) < a}

is an element of E and therefore again f′ ∈ M(E).

Proposition 9.1 M(E) is a complete normal subspace of M(X) which is also ⋆-closed for each finite continuous operation ⋆ on R∞+.

Proof By Lemma 9.1 ME(E) is a normal subspace of M(X) and by Lemma 9.3 M(E) = (ME(E))↑. Moreover, by Lemma 9.4 M(E) is co-complete. Therefore by Lemma 8.2 M(E) is a complete normal subspace of M(X). Finally, it follows from Proposition 8.1 and Lemma 9.3 that M(E) is ⋆-closed for each finite continuous operation ⋆ on R∞+.
By Propositions 9.1 and 8.2 |f − g| ∈ M(E) for all f, g ∈ M(E), and if g ≤ f then f − g ∈ M(E).

Proposition 9.2 Let N be a complete subspace of M(E) with IE ∈ N for each E ∈ E. Then N = M(E).

Proof Lemma 9.2 implies that ME(E) ⊂ N, since IE ∈ N for each E ∈ E. Therefore by Lemma 9.3 N = M(E), since N is complete. By Proposition 9.1 M(E) is a complete normal subspace of M(X) and it is clear that E = {E ⊂ X : IE ∈ M(E)}. The next result states that the converse also holds.

Proposition 9.3 Let N be a complete normal subspace of M(X). Then

F = {F ⊂ X : IF ∈ N} is a σ-algebra and N = M(F).

Proof If F ∈ F then IF ∈ N with IF ≤ 1 and IX\F is the unique element of M(X) with 1 = IF + IX\F . Thus IX\F ∈ N, since N is complemented, i.e., X \ F ∈ F for all F ∈ F.

Now let E, F ∈ F; then IE∩F = IE ∧ IF ∈ N, since N is ∧-closed, and hence E ∩ F ∈ F. This shows F is an algebra, since IX = 1 ∈ N and thus X ∈ F.

Next let {Fn}n≥1 be an increasing sequence from F and put F = ⋃_{n≥1} Fn. Then {IFn}n≥1 is an increasing sequence from N with limn IFn = IF and thus IF ∈ N. This shows F ∈ F and therefore by Lemma 2.1 (4) F is a σ-algebra.
Now let g ∈ N and a ∈ R+ with a > 0 and let Ga = {x ∈ X : g(x) > a}. Put h = a⁻¹g − (a⁻¹g) ∧ 1; then h ∈ N, since N is complemented and ∧-closed and 1 ∈ N. For each n ≥ 1 let gn = (nh) ∧ 1; then {gn}n≥1 is an increasing sequence from N with limn gn = IGa. Thus IGa ∈ N, since N is complete, i.e., Ga ∈ F, which implies that g ∈ M(F). This shows N ⊂ M(F). But by definition IF ∈ N for all F ∈ F and therefore by Proposition 9.2 N = M(F).

Let MB(E) denote the set of the bounded mappings in M(E) (and therefore MB(E) = M(E) ∩ MB(X)).

Lemma 9.5 (1) MB(E) is a co-complete normal subspace of M(X) which is ⋆-closed for each finite continuous operation ⋆ on R∞+.
(2) If f ∈ MB(E) then for each ε > 0 there exists a mapping g ∈ ME(E) with g ≤ f ≤ g + ε.

Proof (1) This follows immediately from Lemma 8.4 (1) and Proposition 9.1.

(2) If f ∈ MB(E) then the mapping g ∈ ME(X) in the proof of Lemma 8.4 (2) is in fact an element of ME(E).

If f, g ∈ MB(E) then |f − g| ∈ MB(E), since |f − g| is bounded.

Lemma 9.6 Let f, g ∈ M(E); then the set G♦ = {x ∈ X : f(x) ♦ g(x)} is in E whenever ♦ is one of the relations <, ≤, >, ≥, = or ≠.

Proof Let h = f ∨ g − g; thus by Proposition 9.1 h ∈ M(E). Now

{x ∈ X : f(x) > g(x)} = {x ∈ X : (f ∨ g)(x) > g(x)} = {x ∈ X : h(x) > 0} ∩ {x ∈ X : g(x) < ∞}

and thus G> ∈ E. Moreover, this implies also that G≥ ∈ E, since

{x ∈ X : f(x) ≥ g(x)} = {x ∈ X : f(x) = ∞} ∪ ⋂_{n≥1} {x ∈ X : (f + 1/n)(x) > g(x)} .

The other four cases now follow directly from these two.

Lemma 9.7 Let {fn}n≥1 be any sequence from M(E) and f ∈ M(E); then the sets G = {x ∈ X : limn fn(x) exists} and G′ = {x ∈ X : limn fn(x) = f(x)} are both in E.

Proof By Lemma 9.4 the mappings lim supn fn and lim infn fn are both elements of M(E) and G = {x ∈ X : lim supn fn(x) = lim infn fn(x)}. Therefore by Lemma 9.6 G ∈ E. Moreover, G′ = G ∩ {x ∈ X : (lim supn fn)(x) = f(x)} and so, again making use of Lemmas 9.4 and 9.6, G′ ∈ E.

For the remainder of the chapter let µ be a measure on E. If ♦ is one of the relations <, ≤, >, ≥, = or ≠ and f, g ∈ M(E) then we say that f ♦ g µ-almost everywhere (which is usually shortened to f ♦ g µ-a.e.) if µ(X \ {x ∈ X : f(x) ♦ g(x)}) = 0. The usage ‘µ-almost everywhere’ will also be applied to more complex statements; for example, if {fn}n≥1 is a sequence from M(E) and f ∈ M(E) then limn fn = f µ-a.e. means that

µ(X \ {x ∈ X : lim_{n→∞} fn(x) = f(x)}) = 0 .

In general it means that if B is the set of elements for which the statement does not hold then µ(B) = 0 (with the implicit assumption that it has already been verified that B ∈ E).

A sequence {fn}n≥1 from M(E) is said to converge in µ-measure to f ∈ M(E) if

lim_{n→∞} µ({x ∈ X : |fn(x) − f(x)| > ε}) = 0

for each ε > 0. (Note that this is impossible if µ({x ∈ X : f(x) = ∞}) > 0, and we leave the reader to find a better definition which also treats convergence at ∞ properly.)

Proposition 9.4 Let µ be finite and let {fn}n≥1 be a sequence from M(E) which converges µ-a.e. to f ∈ M(E), where µ({x ∈ X : f(x) = ∞}) = 0. Then:

(1) {fn}n≥1 converges to f in µ-measure.

(2) For each ε > 0 there exists E ∈ E with µ(E) < ε such that {fn}n≥1 converges uniformly to f on X \ E.

Proof Put F = {x ∈ X : f(x) < ∞ and f(x) = limn fn(x)}, so that µ(X \ F) = 0. For each ε > 0 and n ≥ 1 let

E^ε_n = ⋃_{k≥n} {x ∈ X : |fk(x) − f(x)| > ε} .

Then {E^ε_n}n≥1 is a decreasing sequence from E with ⋂_{n≥1} E^ε_n ⊂ X \ F and hence by Lemma 3.2 limn µ(E^ε_n) = µ(⋂_{n≥1} E^ε_n) ≤ µ(X \ F) = 0.
(1) This follows because {x ∈ X : |fn(x) − f(x)| > ε} ⊂ E^ε_n for each n ≥ 1.
(2) Let ε > 0; then for each m ≥ 1 there exists nm ≥ 1 with µ(E^{1/m}_{nm}) < 2^{−m}ε. Put E = ⋃_{m≥1} E^{1/m}_{nm}; then E ∈ E and µ(E) < Σ_{m≥1} 2^{−m}ε = ε. Moreover, the sequence {fn}n≥1 converges uniformly to f on X \ E, since for each x ∈ X \ E we have |fn(x) − f(x)| ≤ 1/m for all n ≥ nm.
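Proposition 9.4 (2) is a version of Egorov's theorem, and its content can be seen numerically. In the sketch below (an illustration only; the grid X, the measure µ and the choice fn(x) = x^n are all invented for the example) the convergence x^n → 0 fails to be uniform on the grid, but becomes uniform once a set of small measure near 1 is removed.

```python
# Egorov-type behaviour (Proposition 9.4 (2)) for f_n(x) = x^n on
# X = {0, 1/1000, ..., 999/1000} with the uniform measure mu(E) = |E|/1000.
# f_n -> 0 pointwise on X, but not uniformly; removing a set of small
# measure near 1 makes the convergence uniform.

N = 1000
X = [i / N for i in range(N)]
mu = lambda E: len(E) / N

eps = 0.01
E = [x for x in X if x > 1 - eps]       # exceptional set near 1
assert mu(E) <= eps

rest = [x for x in X if x <= 1 - eps]   # X \ E
sup_err = lambda n: max(x**n for x in rest)  # uniform error of f_n off E

assert sup_err(2000) < 1e-8             # uniform convergence on X \ E
assert max(x**2000 for x in X) > 0.1    # but the sup over all of X stays large
```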

Proposition 9.5 Let {fn}n≥1 be a sequence from M(E) converging to f ∈ M(E) in µ-measure. Then there exists a subsequence {nk}k≥1 such that limk fnk = f µ-a.e.

Proof Since {fn}n≥1 converges to f in µ-measure there exists a subsequence {nk}k≥1 such that µ({x ∈ X : |fnk(x) − f(x)| ≥ 1/k}) < 2^{−k} for each k ≥ 1. Let m ≥ 1; then for each k ≥ m

µ({x ∈ X : |fnk(x) − f(x)| ≥ 1/m}) ≤ µ({x ∈ X : |fnk(x) − f(x)| ≥ 1/k}) < 2^{−k} ,

which means that the series Σ_{k≥m} µ({x ∈ X : |fnk(x) − f(x)| ≥ 1/m}) converges. This in turn implies that

µ(⋂_{N≥1} ⋃_{k≥N} {x ∈ X : |fnk(x) − f(x)| ≥ 1/m}) = 0

for each m ≥ 1 and therefore by Lemma 3.3

µ(⋃_{m≥1} ⋂_{N≥1} ⋃_{k≥N} {x ∈ X : |fnk(x) − f(x)| ≥ 1/m}) = 0 .

But this says exactly that limk fnk = f µ-a.e. 10 The integral

We are now finally in a position to introduce the integral. The main interest is in the integral for a measure defined on a σ-algebra, but some of the results are best stated for a measure defined only on an algebra. Thus to start with let X be a non-empty set and let A ⊂ P(X) be an algebra.
Recall from Chapter 8 that if N is a subspace of M(X) then a monotone linear mapping Φ : N → R∞+ is said to be pre-continuous if Φ(f) ≤ limn Φ(fn) whenever f ∈ N and {fn}n≥1 is an increasing sequence from N with f ≤ limn fn. Recall also that a linear mapping Φ : ME(A) → R∞+ is always monotone (since by Lemma 9.1 ME(A) is complemented).

Proposition 10.1 (1) Let Φ : ME(A) → R∞+ be a linear mapping and define µ : A → R∞+ by µ(A) = Φ(IA) for each A ∈ A. Then µ is a finitely additive measure on A. Moreover, if Φ is pre-continuous then µ is a measure.
(2) For each finitely additive measure µ on A there exists a unique linear mapping Φµ : ME(A) → R∞+ such that Φµ(IA) = µ(A) for all A ∈ A. Moreover, if µ is a measure then Φµ is pre-continuous.

Proof (1) If Φ is linear then µ(∅) = Φ(I∅) = Φ(0) = 0, and if A1, A2 ∈ A with

A1 ∩ A2 = ∅ then IA1∪A2 = IA1 + IA2 and so

µ(A1 ∪ A2) = Φ(IA1∪A2 ) = Φ(IA1 ) + Φ(IA2 ) = µ(A1) + µ(A2) .

This shows that µ is a finitely additive measure on A.

Now let Φ be pre-continuous and let {An}n≥1 be an increasing sequence from A with A = ⋃_{n≥1} An ∈ A. Then {I_{An}}n≥1 is an increasing sequence from ME(A) with lim_n I_{An} = I_A and therefore

µ(A) = Φ(IA) ≤ lim Φ(IAn ) = lim µ(An) . n→∞ n→∞

But µ(An) = Φ(I_{An}) ≤ Φ(I_A) = µ(A) for all n ≥ 1, since Φ is monotone, and hence also lim_n µ(An) ≤ µ(A). Thus lim_n µ(An) = µ(A), which shows that µ is continuous. Therefore by Proposition 3.1 µ is a measure on A.
(2) Define Φµ : ME(A) → R⁺∞ explicitly by letting

Φµ(f) = ∑_{a∈f(X)} a µ({x ∈ X : f(x) = a}) .

In particular this means that Φµ(I_A) = µ(A) for all A ∈ A. Moreover, Φµ is linear: It is clear that Φµ(af) = aΦµ(f) for all f ∈ ME(A), a ∈ R⁺, and so it is enough to show that Φµ(f + g) = Φµ(f) + Φµ(g) for all f, g ∈ ME(A). Using the notation in the proof of Lemma 9.1 it follows that for all c ∈ C = (f + g)(X)

µ(E_c^{f+g}) = µ(⋃_{a+b=c} E_a^f ∩ E_b^g) = ∑_{a+b=c} µ(E_a^f ∩ E_b^g) ,

with a and b being restricted to values in the sets A = f(X) and B = g(X) respectively. Therefore

Φµ(f + g) = ∑_{c∈C} c µ(E_c^{f+g}) = ∑_{c∈C} c ∑_{a+b=c} µ(E_a^f ∩ E_b^g)
= ∑_{c∈C} ∑_{a+b=c} (a + b) µ(E_a^f ∩ E_b^g) = ∑_{a∈A} ∑_{b∈B} (a + b) µ(E_a^f ∩ E_b^g)
= ∑_{a∈A} ∑_{b∈B} a µ(E_a^f ∩ E_b^g) + ∑_{a∈A} ∑_{b∈B} b µ(E_a^f ∩ E_b^g)
= ∑_{a∈A} a µ(E_a^f ∩ ⋃_{b∈B} E_b^g) + ∑_{b∈B} b µ(E_b^g ∩ ⋃_{a∈A} E_a^f)
= ∑_{a∈A} a µ(E_a^f) + ∑_{b∈B} b µ(E_b^g) = Φµ(f) + Φµ(g) .

The uniqueness of Φµ follows immediately from Lemma 9.2. Suppose now that µ is a measure on A; we must show that Φµ is pre-continuous and the next lemma gives the first step in this direction.

Lemma 10.1 Let A ∈ A, a ∈ R⁺ and let {fn}n≥1 be an increasing sequence from ME(A) with aI_A ≤ lim_n fn. Then aµ(A) ≤ lim_n Φµ(fn).

Proof This holds trivially if a = 0 and so we can assume that a ≠ 0. Let b ∈ R⁺ with b < a and for each n ≥ 1 let An = {x ∈ A : fn(x) > b}. Then {An}n≥1 is an increasing sequence from A with ⋃_{n≥1} An = A and hence by Proposition 3.1 lim_n µ(An) = µ(A). But bI_{An} ≤ fn and hence bµ(An) = Φµ(bI_{An}) ≤ Φµ(fn) for all n ≥ 1; therefore bµ(A) = lim_n bµ(An) ≤ lim_n Φµ(fn) and since this holds for all b < a it follows that aµ(A) ≤ lim_n Φµ(fn).

Let f ∈ ME(A) and let {fn}n≥1 be an increasing sequence from ME(A) with f ≤ lim_n fn. For each a ∈ f(X) let E_a = {x ∈ X : f(x) = a}. Now let a ∈ f(X); then {fn I_{E_a}}n≥1 is an increasing sequence from ME(A) with aI_{E_a} ≤ lim_n fn I_{E_a} and hence by Lemma 10.1 aµ(E_a) ≤ lim_n Φµ(fn I_{E_a}). But fn = ∑_{a∈f(X)} fn I_{E_a} for each n ≥ 1 and thus

Φµ(f) = ∑_{a∈f(X)} aµ(E_a) ≤ ∑_{a∈f(X)} lim_n Φµ(fn I_{E_a}) = lim_n ∑_{a∈f(X)} Φµ(fn I_{E_a}) = lim_n Φµ(fn) .

This shows that Φµ is pre-continuous.
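The defining formula Φµ(f) = ∑_{a∈f(X)} aµ({x ∈ X : f(x) = a}) and the linearity just proved can be checked numerically on a toy example. The space X, the point masses and the functions f, g below are hypothetical illustrations, not anything from the text; this is only a sketch of the simple-function integral for a finite measure given by point masses.

```python
# A toy check of Phi_mu(f) = sum_{a in f(X)} a * mu({f = a}) on a
# hypothetical finite set X with a measure given by point masses.
X = ["p", "q", "r", "s"]
mass = {"p": 0.5, "q": 1.0, "r": 2.0, "s": 0.25}          # mu({x}) for each x

def mu(E):
    """Measure of a subset E of X (the sum of its point masses)."""
    return sum(mass[x] for x in E)

def integral(f):
    """Phi_mu(f): sum over the finitely many values a of f of a * mu({f = a})."""
    return sum(a * mu({x for x in X if f[x] == a}) for a in set(f.values()))

f = {"p": 1.0, "q": 2.0, "r": 2.0, "s": 0.0}
g = {"p": 3.0, "q": 0.0, "r": 1.0, "s": 1.0}
fg = {x: f[x] + g[x] for x in X}

# Linearity, as in the computation above: Phi_mu(f + g) = Phi_mu(f) + Phi_mu(g)
assert abs(integral(fg) - (integral(f) + integral(g))) < 1e-12
```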

Here is a version of Proposition 3.2 which is stated in terms of the mapping Φµ:

Proposition 10.2 Suppose that µ is a finite finitely additive measure on A and let Φµ : ME(A) → R⁺∞ be the unique linear mapping such that Φµ(I_A) = µ(A) for all A ∈ A. Then µ is a measure if and only if lim_n Φµ(fn) = 0 for each decreasing sequence {fn}n≥1 from ME(A) with lim_n fn = 0.

Proof Suppose µ is a measure and let {fn}n≥1 be a decreasing sequence from ME(A) with lim_n fn = 0. Since f1 is bounded there exists b ∈ R⁺ with f1 ≤ b (and then fn ≤ b for all n ≥ 1). Let ε > 0, choose η > 0 with 2µ(X)η < ε and for each n ≥ 1 let An = {x ∈ X : fn(x) ≥ η}. Then fn ≤ bI_{An} + η and therefore

Φµ(fn) ≤ bΦµ(I_{An}) + ηΦµ(I_X) = bµ(An) + ηµ(X) < bµ(An) + ε/2 ,

since Φµ is linear and monotone. But {An}n≥1 is a decreasing sequence from A with ⋂_{n≥1} An = ∅ and thus by Proposition 3.2 lim_n µ(An) = 0. Hence there exists m ≥ 1 so that bµ(An) < ε/2 for all n ≥ m, which implies that Φµ(fn) < ε for all n ≥ m. This shows that lim_n Φµ(fn) = 0. Conversely, if lim_n Φµ(fn) = 0 for each decreasing sequence {fn}n≥1 from ME(A) with lim_n fn = 0 then clearly µ is ∅-continuous and therefore by Proposition 3.2 µ is a measure.

Proposition 10.3 For each measure µ on A there exists a unique continuous linear mapping Φµ : (ME(A))↑ → R⁺∞ such that Φµ(I_A) = µ(A) for all A ∈ A.

Proof By Proposition 10.1 (2) Φµ : ME(A) → R⁺∞ is pre-continuous and by Lemma 9.1 ME(A) is a ∨-closed subspace of M(X). Thus by Proposition 8.4 Φµ extends uniquely to a continuous linear mapping from (ME(A))↑ to R⁺∞ (which will also be denoted by Φµ). Thus Φµ : (ME(A))↑ → R⁺∞ is the unique continuous linear mapping such that Φµ(I_A) = µ(A) for all A ∈ A.

Now in what follows let (X, E) be a measurable space.

Proposition 10.4 Let Φ : M(E) → R⁺∞ be a continuous linear mapping and define µ : E → R⁺∞ by µ(E) = Φ(I_E). Then µ is a measure on E.

Proof By Proposition 9.1 M(E) is complete and hence by Lemma 7.3 Φ is also pre-continuous. The result thus follows from Proposition 10.1 (1).

Theorem 10.1 For each measure µ on E there exists a unique continuous linear mapping Φµ : M(E) → R⁺∞ such that Φµ(I_E) = µ(E) for all E ∈ E.

Proof By Lemma 9.3 M(E) = (ME(E))↑ and therefore the result follows from Proposition 10.3.

Proposition 10.4 implies that the mapping µ 7→ Φµ (given in Theorem 10.1) defines a bijection between the set of measures on E and the set of continuous + linear mappings from M(E) to R∞.

It is usual to write something like ∫ f dµ instead of Φµ(f) (at least if Φµ(f) ≠ ∞) and to call Φµ(f) the integral of f with respect to µ. However, we prefer to just write µ(f) instead of Φµ(f). This means that a measure µ on E will also be considered as the unique continuous linear mapping µ : M(E) → R⁺∞ with µ(I_E) = µ(E) for all E ∈ E.
In terms of the mapping µ : M(E) → R⁺∞ a measure µ is finite if and only if µ(1) < ∞ and is a probability measure if and only if µ(1) = 1.
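This functional view of a measure, with its continuity on increasing sequences, can be sketched numerically. The weighted space below (a hypothetical geometric measure on a truncated copy of N) and the functions f, fn are invented for illustration only:

```python
# Continuity of the functional mu(f) on an increasing sequence, sketched on
# the hypothetical space X = {0, 1, 2, ...} truncated at K points.
K = 200
w = [2.0 ** (-k) for k in range(K)]                   # mu({k}) = 2^{-k}

def mu(f):
    """mu(f) = sum_k f(k) * mu({k}), the integral of f against the weights."""
    return sum(f(k) * w[k] for k in range(K))

f = lambda k: float(k)
f_n = lambda n: (lambda k: min(float(k), float(n)))   # f_n = f /\ n increases to f

# mu(f_n) increases, with limit mu(f) = sum_k k * 2^{-k} = 2
vals = [mu(f_n(n)) for n in range(1, 30)]
assert all(vals[i] <= vals[i + 1] + 1e-12 for i in range(len(vals) - 1))
assert abs(mu(f) - 2.0) < 1e-6
```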

Theorem 10.2 Let µ be a measure on E and let {fn}n≥1 be a decreasing sequence from M(E) such that µ(fm) < ∞ for some m ≥ 1. Then

µ( lim_{n→∞} fn ) = lim_{n→∞} µ(fn) .

Proof This is just a special case of Proposition 8.4, since by Proposition 9.1 M(E) is a complete normal subspace of M(X).

Theorem 10.3 Let {fn}n≥1 be a sequence from M(E). Then

µ( lim inf_{n→∞} fn ) ≤ lim inf_{n→∞} µ(fn) .

Moreover, if µ( sup_{m≥n} fm ) < ∞ for some n ≥ 1 then also

µ( lim sup_{n→∞} fn ) ≥ lim sup_{n→∞} µ(fn) .

Proof This is just a special case of Proposition 8.5.

Theorem 10.3 (or at least the first part of it) is known as Fatou’s lemma. The next result is known as the dominated convergence theorem.

Theorem 10.4 Let {fn}n≥1 with f = limn fn be a convergent sequence from M(E) and suppose there exists g ∈ M(E) with µ(g) < ∞ such that fn ≤ g for all n ≥ 1. Then the sequence {µ(fn)}n≥1 converges and limn µ(fn) = µ(f). Moreover, limn µ(|fn − f|) = 0. 10 The integral 56

Proof This is just a special case of Proposition 8.6.
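The hypothesis that some g with µ(g) < ∞ dominates the sequence cannot be dropped. A standard kind of counterexample can be sketched with hypothetical weights on X = {1, 2, ...}: mass escapes to infinity, Fatou's inequality is strict, and no integrable dominating function exists.

```python
# Why the dominating g with mu(g) < infinity is needed in Theorem 10.4:
# on the hypothetical space X = {1, 2, ...} with mu({k}) = 2^{-k}, the
# functions f_n = 2^n * I_{{n}} converge pointwise to 0, yet mu(f_n) = 1.
K = 60
w = {k: 2.0 ** (-k) for k in range(1, K + 1)}        # mu({k}) = 2^{-k}

def mu(f):
    return sum(f(k) * w[k] for k in w)

def f_n(n):
    return lambda k: 2.0 ** n if k == n else 0.0

assert all(abs(mu(f_n(n)) - 1.0) < 1e-12 for n in range(1, 20))
# The pointwise limit is 0, so mu(lim f_n) = 0 < 1 = lim mu(f_n).
# Any g >= all f_n would need g(n) >= 2^n, hence mu(g) >= sum_n 1 = infinity.
```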

Proposition 10.5 Let µ be a measure on E. (1) µ = 0 (i.e., µ(f) = 0 for all f ∈ M(E)) if and only if µ(1) = 0. (2) If µ(f) = 0 then µ(fg) = 0 for all g ∈ M(E).

Proof (1) Let N = {f ∈ M(E) : µ(f) = 0}; then N is a complete subspace of M(E) and so by Proposition 9.2 N = M(E) if and only if µ(I_E) = 0 for all E ∈ E, and this holds if and only if µ(1) = µ(X) = 0, since the mapping µ : E → R⁺∞ is monotone.
(2) This follows from (1), since the mapping g ↦ µ(fg) (from M(E) to R⁺∞) is linear and continuous and thus by Proposition 10.4 a measure.

Lemma 10.2 Let µ be a measure on E. Then

aµ({x ∈ X : f(x) ≥ a}) ≤ µ(f) for all f ∈ M(E) and all a ∈ R+.

Proof Put E = {x ∈ X : f(x) ≥ a}; then aIE ∈ ME(E) with aIE ≤ f and thus aµ(E) = µ(aIE) ≤ µ(f).
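This bound (the Markov/Chebyshev inequality in this setting) is easy to check numerically; the point masses and the function f below are hypothetical illustrations:

```python
# Lemma 10.2: a * mu({f >= a}) <= mu(f), checked on a small hypothetical
# discrete measure with point masses on X = {0, 1, 2, 3}.
mass = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}              # mu({x})
f = {0: 0.0, 1: 1.5, 2: 4.0, 3: 2.5}

def mu_set(E):
    return sum(mass[x] for x in E)

def mu_f():
    return sum(f[x] * mass[x] for x in mass)         # mu(f)

for a in [0.5, 1.0, 2.0, 3.0, 5.0]:
    level_set = {x for x in mass if f[x] >= a}       # {f >= a}
    assert a * mu_set(level_set) <= mu_f() + 1e-12
```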

Proposition 10.6 Let µ be a measure on E and f ∈ M(E). Then: (1) µ(f) = 0 if and only if µ({x ∈ X : f(x) > 0}) = 0.

(2) µ(fIN ) = 0 for all N ∈ E with µ(N) = 0.

(3) If E ∈ E with µ(X \ E) = 0 then µ(fIE) = µ(f). (4) If µ(f) < ∞ then µ({x ∈ X : f(x) = ∞}) = 0. (5) If µ(f) < ∞ and F ∈ E is such that {x ∈ X : 0 < f(x) < ∞} ⊂ F then µ(fIF ) = µ(f). Moreover, µ(hfIF ) = µ(hf) for all h ∈ M(E).

Proof For each a ∈ R⁺ put E_a = {x ∈ X : f(x) ≥ a}.

(1) Let E = {x ∈ X : f(x) > 0}. If µ(f) = 0 then by Lemma 10.2 aµ(E_a) = 0 for each a ∈ R⁺ and in particular µ(E_{1/n}) = 0 for all n ≥ 1. Thus by Lemma 3.3

µ(E) = µ(⋃_{n≥1} E_{1/n}) ≤ ∑_{n≥1} µ(E_{1/n}) = 0 .

Suppose conversely µ(E) = 0 and let g ∈ ME(E) with g ≤ f. Then g ≤ bI_E, where b = max(g(X)), and therefore µ(g) ≤ µ(bI_E) = bµ(E) = 0. It now follows from the definition of µ(f) that µ(f) = 0.

(2) This follows immediately from (1), since {x ∈ X : (fIN )(x) > 0} ⊂ N.

(3) By (2) µ(f) = µ(fI_E + fI_{X\E}) = µ(fI_E) + µ(fI_{X\E}) = µ(fI_E).
(4) By Lemma 10.2 aµ({x ∈ X : f(x) = ∞}) ≤ aµ(E_a) ≤ µ(f) for each a ∈ R⁺, since {x ∈ X : f(x) = ∞} ⊂ E_a, and thus µ({x ∈ X : f(x) = ∞}) = 0.
(5) Let F′ = {x ∈ X : f(x) < ∞}; then fI_{F′} ≤ fI_F ≤ f, which implies that µ(fI_{F′}) ≤ µ(fI_F) ≤ µ(f). But by (4) µ(X \ F′) = 0 and therefore by (3) it follows that µ(fI_{F′}) = µ(f). Thus µ(fI_F) = µ(f). Now let E ∈ E; then µ(I_E f) < ∞ and {x ∈ X : 0 < (I_E f)(x) < ∞} ⊂ {x ∈ X : 0 < f(x) < ∞} ⊂ F, and hence applying the first part to I_E f shows that µ(I_E f I_F) = µ(I_E f). But N = {h ∈ M(E) : µ(hfI_F) = µ(hf)} is clearly a closed subspace of M(E) and hence by Proposition 9.2 N = M(E), i.e., µ(hfI_F) = µ(hf) for all h ∈ M(E).

The following result is the Cauchy-Schwarz inequality:

Proposition 10.7 Let µ be a measure on E; then µ(fg)² ≤ µ(f²)µ(g²) for all f, g ∈ M(E).

Proof If either µ(f²) = 0 or µ(g²) = 0 then by Proposition 10.6 (1) µ(fg) = 0 and so we can assume that µ(f²) > 0 and µ(g²) > 0. Moreover, we can then also assume that µ(f²) < ∞ and µ(g²) < ∞, and this implies that µ(fg) < ∞, since fg ≤ ½(f² + g²). Now if λ ≥ 0 then |f − λg| ∈ M(E) and thus also |f − λg|² ∈ M(E). But then |f − λg|² + 2λfg = f² + λ²g² and hence

µ(|f − λg|²) + 2λµ(fg) = µ(f²) + λ²µ(g²) .

It therefore follows that 2λµ(fg) ≤ λ²µ(g²) + µ(f²) for all λ ≥ 0, which implies µ(fg)² ≤ µ(f²)µ(g²).
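The inequality can be stress-tested on random non-negative functions over a hypothetical finite weighted space (the weights and functions below are invented for the sketch):

```python
# Cauchy-Schwarz, Proposition 10.7: mu(fg)^2 <= mu(f^2) * mu(g^2),
# checked on random non-negative functions over a hypothetical
# finite space with random point masses.
import random

random.seed(1)
n = 8
w = [random.uniform(0.1, 1.0) for _ in range(n)]      # point masses mu({k})
mu = lambda h: sum(h[k] * w[k] for k in range(n))     # discrete integral

for _ in range(100):
    f = [random.uniform(0.0, 5.0) for _ in range(n)]
    g = [random.uniform(0.0, 5.0) for _ in range(n)]
    fg = [f[k] * g[k] for k in range(n)]
    f2 = [x * x for x in f]
    g2 = [x * x for x in g]
    assert mu(fg) ** 2 <= mu(f2) * mu(g2) + 1e-9
```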

Recall that a measure µ on E is σ-finite if there exists a sequence {Bn}n≥1 from E with µ(Bn) < ∞ for each n ≥ 1 and X = ⋃_{n≥1} Bn. Recall also that it is always possible to choose the sequence {Bn}n≥1 here either to be increasing or to be disjoint. Let

MF⁺(E) = {f ∈ M(E) : 0 < f(x) < ∞ for all x ∈ X} .

Lemma 10.3 A measure µ on E is σ-finite if and only if µ(v) < ∞ for some v ∈ MF⁺(E).

Proof Suppose there exists v ∈ MF⁺(E) such that µ(v) < ∞, and for each n ≥ 1 let Bn = {x ∈ X : v(x) ≥ 1/n}. Then Bn ∈ E and ⋃_{n≥1} Bn = X, since v(x) > 0 for all x ∈ X. But by Lemma 10.2 µ(Bn) ≤ nµ(v) < ∞ for each n ≥ 1, and hence µ is σ-finite.
Suppose conversely that µ is σ-finite and let {Bn}n≥1 be a disjoint sequence from E with µ(Bn) < ∞ for each n ≥ 1 and X = ⋃_{n≥1} Bn. For each n ≥ 1 put vn = ∑_{k=1}^n 2^{−k}(1 + µ(Bk))^{−1} I_{Bk}.

Then vn ∈ ME(E) ⊂ M(E), the sequence {vn}n≥1 is increasing and

µ(vn) = ∑_{k=1}^n 2^{−k}(1 + µ(Bk))^{−1} µ(Bk) ≤ ∑_{k=1}^n 2^{−k} < 1 .

Let v = lim_n vn; then v ∈ M(E) and µ(v) ≤ 1. But 0 < v(x) < ∞ for all x ∈ X, since v(x) = 2^{−k}(1 + µ(Bk))^{−1} for all x ∈ Bk, k ≥ 1, and thus v ∈ MF⁺(E).
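The construction in this proof can be carried out numerically; the blocks B_k and their (finite but unbounded) masses below are hypothetical, chosen only to show that v stays strictly positive while µ(v) < 1:

```python
# Lemma 10.3's construction: from a disjoint cover {B_k} with mu(B_k) finite,
# build v = sum_k 2^{-k} (1 + mu(B_k))^{-1} I_{B_k}, so that 0 < v < infinity
# everywhere and mu(v) < 1.  Hypothetical setting: B_k are disjoint blocks
# with masses mu(B_k) = k^2 (finite but unbounded), truncated at K blocks.
K = 500
muB = {k: float(k * k) for k in range(1, K + 1)}      # mu(B_k)

def v(k):                                             # value of v on the block B_k
    return 2.0 ** (-k) / (1.0 + muB[k])

mu_v = sum(v(k) * muB[k] for k in muB)                # mu(v) = sum v|_{B_k} * mu(B_k)
assert all(v(k) > 0 for k in muB)                     # v is strictly positive
assert mu_v < 1.0                                     # and mu(v) < 1
```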

Lemma 10.4 Let µ be a σ-finite measure on E and f, g ∈ M(E).

(1) f ≤ g holds µ-a.e. if and only if µ(IEf) ≤ µ(IEg) for all E ∈ E, and this holds if and only if µ(hf) ≤ µ(hg) for all h ∈ M(E).

(2) f = g holds µ-a.e. if and only if µ(IEf) = µ(IEg) for all E ∈ E, and this holds if and only if µ(hf) = µ(hg) for all h ∈ M(E).

Proof (1) Assume first f ≤ g µ-a.e. and let G = {x ∈ X : f(x) > g(x)}, thus µ(G) = 0. Then I_E f I_{X\G} ≤ I_E g I_{X\G} for each E ∈ E and therefore

µ(I_E f) = µ(I_E f I_{X\G}) ≤ µ(I_E g I_{X\G}) = µ(I_E g)

by Proposition 10.6 (3). But {h ∈ M(E) : µ(hf) ≤ µ(hg)} is clearly a closed subspace of M(E) and hence by Proposition 9.2 µ(hf) ≤ µ(hg) for all h ∈ M(E) (and this part holds for an arbitrary measure µ). Suppose now conversely that µ(hf) ≤ µ(hg) for all h ∈ M(E) (and so in particular µ(I_E f) ≤ µ(I_E g) for all E ∈ E). Fix b ∈ R⁺ and B ∈ E with µ(B) < ∞; for each ε > 0 let

Aε = {x ∈ X : f(x) ≥ (g + ε)(x)} ∩ {x ∈ B : g(x) ≤ b} ;

then, since IAε (g + ε) ≤ IAε f, it follows that

µ(IAε g) + εµ(Aε) = µ(IAε (g + ε)) ≤ µ(IAε f) ≤ µ(IAε g) ,

and so µ(Aε) = 0, since µ(IAε g) ≤ µ(bIB) = bµ(B) < ∞. Thus by Lemma 3.3

µ({x ∈ B : f(x) > g(x) and g(x) ≤ b}) = µ(⋃_{n≥1} A_{1/n}) ≤ ∑_{n≥1} µ(A_{1/n}) = 0 ,

and therefore again making use of Lemma 3.3

µ({x ∈ B : f(x) > g(x)}) = µ(⋃_{n≥1} {x ∈ B : f(x) > g(x) and g(x) ≤ n}) = 0 .

Now let {Bn}n≥1 be a sequence from E with µ(Bn) < ∞ for each n ≥ 1 and with

X = ⋃_{n≥1} Bn. Then, once more using Lemma 3.3, we have

µ({x ∈ X : f(x) > g(x)}) = µ(⋃_{n≥1} {x ∈ Bn : f(x) > g(x)}) = 0 .

(2) This follows immediately from (1).

Let µ be a measure on E; a subset U of M(E) is said to be uniformly µ-integrable if for each ε > 0 there exists δ > 0 such that µ(fIE) < ε for all f ∈ U and all E ∈ E with µ(E) < δ.

Lemma 10.5 Let U be a subset of M(E) and suppose there exists h ∈ M(E) with µ(h) < ∞ such that f ≤ h for all f ∈ U. Then U is uniformly µ-integrable.

Proof This follows from Lemma 10.6 below, since µ(fIE) ≤ µ(hIE) for all f ∈ U and all E ∈ E.
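The δ produced in the proof of Lemma 10.6 below (δ = ε/(2m), where m is chosen so that µ(hI_{h>m}) < ε/2) can be computed explicitly in a toy setting. The space, masses and function h below are hypothetical, chosen so that µ(h) is finite:

```python
# The delta from the proof of Lemma 10.6, tested numerically on a
# hypothetical space X = {1,...,K} with mu({k}) = 1/k^2 and h(k) = sqrt(k),
# so that mu(h) = sum k^{-3/2} is finite.
K = 2000
mass = {k: 1.0 / (k * k) for k in range(1, K + 1)}
h = {k: k ** 0.5 for k in mass}

def mu(E):
    return sum(mass[k] for k in E)

def mu_h(E):
    return sum(h[k] * mass[k] for k in E)             # mu(h I_E)

eps = 0.1
# choose m with mu(h I_{h > m}) < eps/2, then delta = eps/(2m) as in the proof
m = next(m for m in range(1, K) if mu_h({k for k in mass if h[k] > m}) < eps / 2)
delta = eps / (2 * m)

# check mu(h I_E) < eps on some tail sets E with mu(E) < delta
for E in ({k for k in mass if k > j} for j in (500, 1000, 1500)):
    if mu(E) < delta:
        assert mu_h(E) < eps
```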

Note that if U is as in Lemma 10.5 then there exists b ∈ R⁺ (namely b = µ(h)) such that µ(f) ≤ b for all f ∈ U. If µ is finite then usually the converse is true, i.e., if U ⊂ M(E) is uniformly µ-integrable then there will exist b ∈ R⁺ with µ(f) ≤ b for all f ∈ U. (This will be true if for each δ > 0 the set X can be written as a finite union ⋃_{k=1}^m E_k with µ(E_k) < δ for each k.)

Lemma 10.6 Let h ∈ M(E) with µ(h) < ∞. Then for each ε > 0 there exists δ > 0 such that µ(hI_E) < ε for all E ∈ E with µ(E) < δ. In other words, the one element set {h} is uniformly µ-integrable.

Proof For n ≥ 1 let En = {x ∈ X : h(x) > n}; then {hI_{En}}n≥1 is a decreasing sequence from M(E) with µ(hI_{E1}) ≤ µ(h) < ∞ and lim_n hI_{En} = hI_{E∞}, where E∞ = {x ∈ X : h(x) = ∞}. Therefore by Theorem 10.2 and Proposition 10.6 (2) and (4) lim_n µ(hI_{En}) = µ(hI_{E∞}) = 0. Now let ε > 0; then there exists m ≥ 1 such that µ(hI_{Em}) < ε/2. Let δ = ε/(2m); if E ∈ E with µ(E) < δ then

µ(hI_E) = µ(hI_{E∩Em} + hI_{E\Em}) = µ(hI_{E∩Em}) + µ(hI_{E\Em})
≤ µ(hI_{Em}) + µ(mI_E) = µ(hI_{Em}) + mµ(E) < ε/2 + ε/2 = ε ,

since hI_{E∩Em} ≤ hI_{Em} and hI_{E\Em} ≤ mI_E.

Lemma 10.5 implies that the following result is a generalisation of Theorem 10.4 (but only with µ finite). Put Mµ(E) = {f ∈ M(E) : µ({x ∈ X : f(x) = ∞}) = 0}.

Proposition 10.8 Let µ be finite and let {fn}n≥1 be a convergent sequence from Mµ(E) with limn fn = f ∈ Mµ(E). Suppose that the set {fn : n ≥ 1} is uniformly + µ-integrable. Then there exists b ∈ R such that µ(f) ≤ b and µ(fn) ≤ b for each n ≥ 1 and limn µ(|fn − f|) = 0.

Proof Let F = {x ∈ X : f(x) < ∞ and fn(x) < ∞ for all n ≥ 1} and for each m ≥ 1 let Em = {x ∈ F : fn(x) ≤ m for all n ≥ 1}; then {Em}m≥1 is an increasing sequence from E with ⋃_{m≥1} Em = F and therefore by Proposition 3.1 lim_m µ(Em) = µ(F) = µ(X), since by Lemma 3.3 µ(X \ F) = 0. Let ε > 0 and choose δ > 0 so that µ(fnI_E) < ε/3 for all n ≥ 1 whenever µ(E) < δ. Since µ is finite there then exists p ≥ 1 such that µ(X \ Ep) < δ and thus by Theorem 10.3

µ(fI_{X\Ep}) = µ( lim_{n→∞} fn I_{X\Ep} ) ≤ lim inf_{n→∞} µ(fn I_{X\Ep}) ≤ ε/3 .

Now fnIEp ≤ p for each n ≥ 1 and so µ(fn) = µ(fnIEp ) + µ(fnIX\Ep ) ≤ p + ε/3 for each n ≥ 1; also µ(f) ≤ p + ε/3, since fIEp ≤ p. Moreover,

µ(|f − fn|) = µ(|f − fn|IEp ) + µ(|f − fn|IX\Ep )

≤ µ(|f − fn|IEp ) + µ(fIX\Ep ) + µ(fnIX\Ep )

< µ(|f − fn|IEp ) + 2ε/3 .

Put gn = |f − fn|I_{Ep}; then gn ≤ p for all n ≥ 1 (and µ(p) < ∞ since µ is finite) and lim_n gn = 0. Thus by Theorem 10.4 lim_n µ(gn) = 0 and this shows that lim_n µ(|fn − f|) = 0. Finally, taking b = p + ε/3 (for some arbitrary ε > 0) shows that µ(f) ≤ b and µ(fn) ≤ b for each n ≥ 1.

11 The Daniell integral

In this chapter we give another approach to the integral. The material presented here will not be used elsewhere, and so the chapter can be omitted. In the following let X be a non-empty set.

Theorem 11.1 Let N be a normal subspace of M(X) and let Φ : N → R⁺∞ be a pre-continuous linear mapping. Then there exists a σ-algebra E ⊂ P(X) with N ⊂ M(E) and a measure µ on E such that Φ(f) = µ(f) for all f ∈ N.

The construction involved in the proof of this result is due to Daniell (in 1917) and thus goes under the name of the Daniell integral. However, before starting with the proof we apply Theorem 11.1 to obtain another proof of the Carathéodory extension theorem (Theorem 4.1). Thus let A ⊂ P(X) be an algebra and let µ be a measure on A. We want to show that there exists a measure ν on σ(A) such that ν(A) = µ(A) for all A ∈ A. By Lemma 9.1 ME(A) is a normal subspace of M(X) and by Proposition 10.1 (2) the unique linear mapping Φµ : ME(A) → R⁺∞ with Φµ(I_A) = µ(A) for all A ∈ A is pre-continuous. Thus by Theorem 11.1 there exists a σ-algebra E ⊂ P(X) with ME(A) ⊂ M(E) and a measure λ on E such that Φµ(f) = λ(f) for all f ∈ ME(A). Then A ⊂ E, since I_A ∈ ME(A) for each A ∈ A, and hence σ(A) ⊂ E. Let ν be the restriction of λ to σ(A); then µ(A) = Φµ(I_A) = λ(I_A) = λ(A) = ν(A) for all A ∈ A.
We now turn to the proof of Theorem 11.1. A complete subspace N of M(X) will be called weakly complemented if for all f, g ∈ N with g ≤ f there exists an increasing sequence {gn}n≥1 from N ∩ MF(X) with g = lim_n gn and for each n ≥ 1 an element hn ∈ N with f = gn + hn. (The sequence {hn}n≥1 is then more-or-less decreasing, and it can actually be chosen to be decreasing when N is ∧-closed. However, we do not need to make use of this fact.) Let us say that a complete subspace N of M(X) is weakly normal if it is a weakly complemented subspace containing 1 and closed under ∧ and ∨. In particular, a complete normal subspace N is weakly complemented and hence weakly normal. (Let f, g ∈ N with g ≤ f. Then {g ∧ n}n≥1 is an increasing sequence from N ∩ MF(X) with g = lim_n (g ∧ n), and there exists hn ∈ N with f = (g ∧ n) + hn, since N is complemented and g ∧ n ≤ f.)

Lemma 11.1 Let N be a normal subspace of M(X). Then N ↑ is weakly normal.

Proof Let f, g ∈ N↑ with g ≤ f and let {g′n}n≥1 be an increasing sequence from N with g = lim_n g′n. For n ≥ 1 put gn = g′n ∧ n; then {gn}n≥1 is an increasing sequence from N ∩ MF(X) ⊂ N↑ ∩ MF(X) with g = lim_n gn. But gn ∈ N↑ and gn ≤ f and therefore by Lemma 8.1 there exists hn ∈ N↑ with f = gn + hn.

Let N be a normal subspace of M(X) and Φ : N → R⁺∞ be a pre-continuous linear mapping. Then by Lemma 11.1 N↑ is a complete weakly normal subspace of M(X) and Proposition 8.4 implies there exists a continuous linear mapping Φ′ : N↑ → R⁺∞ such that Φ′(f) = Φ(f) for all f ∈ N. Theorem 11.1 thus follows immediately from the following result:

Proposition 11.1 Let N be a complete weakly normal subspace of M(X) and let Φ : N → R⁺∞ be a continuous linear mapping. Then there exists a σ-algebra E ⊂ P(X) with N ⊂ M(E) and a measure µ on E such that Φ(f) = µ(f) for all f ∈ N.

The rest of the chapter is taken up with the proof of Proposition 11.1. Thus let N be a complete weakly normal subspace of M(X) and let Φ : N → R⁺∞ be a continuous linear mapping. Define mappings Φ^*, Φ_* : M(X) → R⁺∞ by letting

Φ^*(f) = inf{Φ(g) : g ∈ N with g ≥ f} ,

Φ_*(f) = sup{Φ(h) − Φ(g) : h, g ∈ N with Φ(g) < ∞ and h ≤ g + f}

for each f ∈ M(X). (Note that Φ_*(f) ≥ 0 since 0 ≤ 0 + f and Φ(0) − Φ(0) = 0.)

Lemma 11.2 (1) The mappings Φ^* and Φ_* are both monotone.
(2) Φ_*(f) ≤ Φ^*(f) for each f ∈ M(X).
(3) Φ^*(af) = aΦ^*(f) and Φ_*(af) = aΦ_*(f) for all f ∈ M(X), a ∈ R⁺.
(4) Φ^*(f1 + f2) ≤ Φ^*(f1) + Φ^*(f2) for all f1, f2 ∈ M(X).
(5) Φ_*(f1 + f2) ≥ Φ_*(f1) + Φ_*(f2) for all f1, f2 ∈ M(X).
(6) If f ∈ M(X), g ∈ N, h ∈ N′ with f + h ≤ g then Φ^*(f) ≤ Φ(g) − Φ(h).
(7) Φ_*(f1 + f2) ≤ Φ^*(f1) + Φ_*(f2) ≤ Φ^*(f1 + f2) for all f1, f2 ∈ M(X).
(8) Φ_*(g) = Φ(g) = Φ^*(g) for each g ∈ N.
(9) If {fn}n≥1 is any sequence from M(X) and f = ∑_{n≥1} fn then

Φ^*(f) ≤ ∑_{n≥1} Φ^*(fn) .

(Here f = lim_n sn, where {sn}n≥1 is the increasing sequence with sn = ∑_{k=1}^n fk for each n ≥ 1.)

Proof Let N′ = {g ∈ N : Φ(g) < ∞}; thus N′ is a subspace of M(X).
(1) This is clear.
(2) Let h ∈ N, g′ ∈ N′ with h ≤ g′ + f and let g ∈ N with g ≥ f. Then h ≤ g′ + g and thus Φ(h) ≤ Φ(g′ + g) = Φ(g′) + Φ(g). Hence Φ(h) − Φ(g′) ≤ Φ(g) and this implies that Φ_*(f) ≤ Φ^*(f).
(3) It is clear that Φ^*(af) = aΦ^*(f) for all f ∈ M(X), a ∈ R⁺. Moreover, Φ_*(0) = 0, since if h ∈ N, g ∈ N′ with h ≤ g + 0 then Φ(h) ≤ Φ(g) and hence Φ(h) − Φ(g) ≤ 0 (and we have already noted that Φ_*(f) ≥ 0 for all f ∈ M(X)). This implies that Φ_*(0f) = 0Φ_*(f) for all f ∈ M(X), and so it remains to show that if a > 0 then Φ_*(af) = aΦ_*(f) for all f ∈ M(X). Now if h ∈ N, g ∈ N′ with h ≤ g + f then ah ∈ N, ag ∈ N′ and ah ≤ ag + af and hence Φ_*(af) ≥ Φ(ah) − Φ(ag) = a(Φ(h) − Φ(g)), from which it follows that Φ_*(af) ≥ aΦ_*(f). Applying this to the mapping af and with a replaced by b = 1/a then also gives Φ_*(f) = Φ_*(b(af)) ≥ bΦ_*(af), i.e., aΦ_*(f) ≤ Φ_*(af).

(4) Let g1, g2 ∈ N with g1 ≥ f1 and g2 ≥ f2. Then g1 + g2 ≥ f1 + f2 with g1 + g2 ∈ N, and thus Φ^*(f1 + f2) ≤ Φ(g1 + g2) = Φ(g1) + Φ(g2). It therefore follows that Φ^*(f1 + f2) ≤ Φ^*(f1) + Φ^*(f2).
(5) Let h1, h2 ∈ N, g1, g2 ∈ N′ with h1 ≤ g1 + f1 and h2 ≤ g2 + f2. Then h1 + h2 ∈ N, g1 + g2 ∈ N′ with h1 + h2 ≤ g1 + g2 + f1 + f2 and thus

Φ_*(f1 + f2) ≥ Φ(h1 + h2) − Φ(g1 + g2) = Φ(h1) − Φ(g1) + Φ(h2) − Φ(g2) .

Therefore Φ_*(f1 + f2) ≥ Φ_*(f1) + Φ_*(f2).
(6) Let f ∈ M(X), g ∈ N and h ∈ N′ with f + h ≤ g. If Φ(g) = ∞ then clearly Φ^*(f) ≤ Φ(g) − Φ(h) and so we can assume that g ∈ N′. Now h ≤ g and N is weakly complemented, and so there exists an increasing sequence {hn}n≥1 from N ∩ MF(X) with h = lim_n hn and for each n ≥ 1 an element h′n ∈ N with g = hn + h′n. Since Φ is continuous we then have Φ(h) = lim_n Φ(hn). Let ε > 0; there thus exists p ≥ 1 with Φ(hp) > Φ(h) − ε. But f + hp ≤ f + h ≤ g = hp + h′p and hence f ≤ h′p, since hp ∈ MF(X). Therefore

Φ^*(f) ≤ Φ(h′p) = Φ(g) − Φ(hp) < Φ(g) − Φ(h) + ε

and this implies that Φ^*(f) ≤ Φ(g) − Φ(h).
(7) We first show that Φ_*(f1 + f2) ≤ Φ^*(f1) + Φ_*(f2), and for this we can assume Φ^*(f1) < ∞. Let h ∈ N, g, g1 ∈ N′ with h ≤ g + f1 + f2 and g1 ≥ f1. Then g + g1 ∈ N′ and h ≤ g + g1 + f2 and thus

Φ(g1) + Φ_*(f2) ≥ Φ(g1) + Φ(h) − Φ(g + g1) = Φ(h) − Φ(g) .

Therefore Φ(g1) + Φ_*(f2) ≥ Φ_*(f1 + f2) for all g1 ∈ N′ with g1 ≥ f1, and this implies that Φ_*(f1 + f2) ≤ Φ^*(f1) + Φ_*(f2) (since Φ^*(f1) < ∞).

We next show that Φ^*(f1) + Φ_*(f2) ≤ Φ^*(f1 + f2) and here we can assume that Φ^*(f1 + f2) < ∞. Let g ∈ N′ with g ≥ f1 + f2 and let h2 ∈ N, g2 ∈ N′ with h2 ≤ g2 + f2 (and so in fact h2 ∈ N′). Then

h2 + f1 ≤ g2 + f2 + f1 = g2 + f1 + f2 ≤ g2 + g ,

i.e., f1 + h2 ≤ g + g2, and hence by (6) Φ^*(f1) ≤ Φ(g + g2) − Φ(h2). Therefore

Φ^*(f1) + Φ(h2) − Φ(g2) ≤ Φ(g) ,

and thus Φ^*(f1) + Φ_*(f2) ≤ Φ(g) for all g ∈ N′ with g ≥ f1 + f2. This in turn implies that Φ^*(f1) + Φ_*(f2) ≤ Φ^*(f1 + f2) (again since Φ^*(f1 + f2) < ∞).
(8) This is clear.

(9) Let {fn}n≥1 be a sequence from M(X) and put f = ∑_{n≥1} fn. We need to show Φ^*(f) ≤ ∑_{n≥1} Φ^*(fn) and for this it can be assumed that ∑_{n≥1} Φ^*(fn) < ∞ (and so in particular Φ^*(fn) < ∞ for each n ≥ 1). Let ε > 0; then for each n ≥ 1 there exists gn ∈ N with gn ≥ fn and Φ(gn) < Φ^*(fn) + 2^{−n}ε. Let g = ∑_{n≥1} gn; then, since N is a complete subspace and Φ is a continuous linear mapping, it follows that g ∈ N and

Φ(g) = lim_{n→∞} Φ(∑_{k=1}^n gk) = lim_{n→∞} ∑_{k=1}^n Φ(gk) = ∑_{n≥1} Φ(gn) .

But g ≥ f and therefore

Φ^*(f) ≤ Φ(g) = ∑_{n≥1} Φ(gn) ≤ ∑_{n≥1} (Φ^*(fn) + 2^{−n}ε) = ∑_{n≥1} Φ^*(fn) + ε .

Thus Φ^*(f) ≤ ∑_{n≥1} Φ^*(fn), since ε > 0 is arbitrary.

Now put M = {f ∈ M(X) : Φ_*(f) = Φ^*(f)} and let Ψ : M → R⁺∞ be the restriction of Φ^* (or Φ_*) to M. Also let M′ = {f ∈ M : Ψ(f) < ∞}.

Lemma 11.3 (1) M is a complete subspace of M(X) with N ⊂ M.
(2) Ψ : M → R⁺∞ is a continuous linear mapping and Ψ(g) = Φ(g) for all g ∈ N.
(3) If f1 ∈ M′, f2 ∈ M(X) with f1 + f2 ∈ M then f2 ∈ M.
(4) If f1, f2 ∈ M′ then f1 ∨ f2 and f1 ∧ f2 are both in M′.
(5) If f ∈ M′ and h ∈ N then h ∧ f ∈ M′.

Proof It follows immediately from Lemma 11.2 (1), (2), (3), (4) and (5) that M is a subspace of M(X) and that Ψ is a monotone linear mapping. Moreover, by Lemma 11.2 (8) N ⊂ M and Ψ(g) = Φ(g) for all g ∈ N. 11 The Daniell integral 65

To complete the proofs of (1) and (2) we must still show that M is complete and that Ψ is continuous, but before doing this we first look at part (3). Thus consider f1 ∈ M′, f2 ∈ M(X) with f1 + f2 ∈ M. Then by Lemma 11.2 (7)

Ψ(f1 + f2) = Φ_*(f1 + f2) ≤ Φ^*(f1) + Φ_*(f2) ≤ Φ^*(f1 + f2) = Ψ(f1 + f2)

and (reversing the roles of f1 and f2)

Ψ(f1 + f2) = Φ_*(f1 + f2) ≤ Φ_*(f1) + Φ^*(f2) ≤ Φ^*(f1 + f2) = Ψ(f1 + f2) .

But Φ_*(f1) = Ψ(f1) = Φ^*(f1), and therefore Ψ(f1) + Φ^*(f2) = Ψ(f1) + Φ_*(f2); hence Φ^*(f2) = Φ_*(f2), since Ψ(f1) < ∞, i.e., f2 ∈ M.

Now let {fn}n≥1 be an increasing sequence from M with f = lim_n fn. Then by Lemma 11.2 (1) Φ_*(f) ≥ lim_n Φ_*(fn) = lim_n Ψ(fn). This means that if we can show that Φ^*(f) ≤ lim_n Ψ(fn) then it will follow both that f ∈ M and that Ψ(f) = lim_n Ψ(fn), thus proving that M is complete and also that Ψ is continuous. We will thus show that Φ^*(f) ≤ lim_n Φ^*(fn), and for this it can be assumed that lim_n Φ^*(fn) = a < ∞. Since fn ≤ fn+1 there exists hn ∈ M(X) with fn+1 = fn + hn and then by (3) hn ∈ M (noting that Ψ(fn) ≤ a < ∞). But fn+1 = f1 + ∑_{k=1}^n hk for each n ≥ 1 and thus f = f1 + ∑_{n≥1} hn. Therefore by Lemma 11.2 (9)

Φ^*(f) ≤ Φ^*(f1) + ∑_{n≥1} Φ^*(hn)
= Ψ(f1) + ∑_{n≥1} Ψ(hn) = Ψ(f1) + ∑_{n≥1} (Ψ(fn+1) − Ψ(fn))
= lim_{n→∞} ( Ψ(f1) + ∑_{k=1}^{n−1} (Ψ(fk+1) − Ψ(fk)) ) = lim_{n→∞} Ψ(fn) .

We have thus now shown that (1), (2) and (3) hold.
(4) For j ∈ {1, 2} let fj ∈ M′. Let ε > 0; then there exist gj, g′j ∈ N′, hj ∈ N with fj ≤ gj, hj ≤ g′j + fj and Φ(gj) + Φ(g′j) < Φ(hj) + ε/2. (Note that Φ(hj) ≤ Φ(g′j) + Φ(gj) < ∞, since hj ≤ g′j + gj.) Then f1 ∨ f2 ≤ g1 ∨ g2 and by assumption g1 ∨ g2 ∈ N; hence Φ^*(f1 ∨ f2) ≤ Φ(g1 ∨ g2). Moreover,

(h1 + g′2) ∨ (h2 + g′1) ≤ g′1 + g′2 + f1 ∨ f2 .

(Let bj, cj, dj ∈ R⁺ with dj ≤ cj + bj for j ∈ {1, 2}; then

(d1 − c1) ∨ (d2 − c2) = ((d1 − c1 − d2 + c2) ∨ 0) + d2 − c2
= ((d1 + c2 − d2 − c1) ∨ 0) + d2 − c2
= (d1 + c2) ∨ (d2 + c1) − (d2 + c1) + d2 − c2
= (d1 + c2) ∨ (d2 + c1) − (c1 + c2)

and thus (d1 + c2) ∨ (d2 + c1) ≤ c1 + c2 + b1 ∨ b2. But this also remains true when R⁺ is replaced by R⁺∞, since the right-hand side of the inequality is ∞ when any of the values is equal to ∞.) Now g′1 + g′2 ∈ N′ and by assumption v = (h1 + g′2) ∨ (h2 + g′1) ∈ N′, since Φ(v) ≤ Φ(h1 + g′2 + h2 + g′1) < ∞. Then v ≤ g′1 + g′2 + f1 ∨ f2, from which it follows that Φ_*(f1 ∨ f2) ≥ Φ(v) − Φ(g′1 + g′2). But

g1 ∨ g2 + h1 + h2 ≤ (h1 + g′2) ∨ (h2 + g′1) + g1 + g2 = v + g1 + g2 .

(Let bj, cj, dj ∈ R⁺ with dj ≤ cj + bj for j ∈ {1, 2}; then, using the calculation made above,

b1 ∨ b2 + c1 + c2 − (d1 + c2) ∨ (d2 + c1) = b1 ∨ b2 − (d1 − c1) ∨ (d2 − c2) ≤ (b1 − d1 + c1) + (b2 − d2 + c2) ,

since (y1 ∨ y2) − (x1 ∨ x2) ≤ (y1 − x1) + (y2 − x2) whenever x1, x2, y1, y2 ∈ R with x1 ≤ y1 and x2 ≤ y2. Thus

b1 ∨ b2 + d1 + d2 ≤ (d1 + c2) ∨ (d2 + c1) + b1 + b2

and this still holds when R⁺ is replaced by R⁺∞, since the right-hand side of the inequality is ∞ when any of the values is equal to ∞.) Therefore

Φ(g1 ∨ g2) + Φ(h1 + h2) = Φ(g1 ∨ g2 + h1 + h2) ≤ Φ(v + g1 + g2) = Φ(v) + Φ(g1 + g2)

which implies that

Φ^*(f1 ∨ f2) − Φ_*(f1 ∨ f2) ≤ Φ(g1 ∨ g2) − Φ(v) + Φ(g′1 + g′2)
≤ Φ(g1 + g2) − Φ(h1 + h2) + Φ(g′1 + g′2)
≤ Φ(g1) − Φ(h1) + Φ(g′1) + Φ(g2) − Φ(h2) + Φ(g′2) < ε/2 + ε/2 = ε .

This shows that Φ^*(f1 ∨ f2) = Φ_*(f1 ∨ f2), i.e., f1 ∨ f2 ∈ M, and hence f1 ∨ f2 ∈ M′, since Ψ(f1 ∨ f2) ≤ Ψ(f1) + Ψ(f2) < ∞. Finally, it now follows from (3) that also f1 ∧ f2 ∈ M′, since f1 ∨ f2 + f1 ∧ f2 = f1 + f2 ∈ M and f1 ∨ f2 ∈ M′.
(5) Let f ∈ M′ and h ∈ N. Now let ε > 0; then there exist g, g′ ∈ N′ and h′ ∈ N with f ≤ g, h′ ≤ g′ + f and Φ(g) + Φ(g′) < Φ(h′) + ε. (Note that Φ(h′) ≤ Φ(g′) + Φ(g) < ∞, since h′ ≤ g′ + g.) Then h ∧ f ≤ h ∧ g and by assumption h ∧ g ∈ N; hence Φ^*(h ∧ f) ≤ Φ(h ∧ g). Moreover,

h0 ∧ (h + g0) ≤ g0 + h ∧ f .

(Let a, b, c, d ∈ R⁺ with d ≤ c + b; then as above a ∧ (d − c) = ((a + c) ∧ d) − c and thus d ∧ (a + c) ≤ c + a ∧ b. But this also remains true when R⁺ is replaced by R⁺∞.)

Now v = h′ ∧ (h + g′) ∈ N′, since Φ(v) ≤ Φ(h′) < ∞. Then v ≤ g′ + h ∧ f, from which it follows that Φ_*(h ∧ f) ≥ Φ(v) − Φ(g′). But

h ∧ g + h′ ≤ (h + g′) ∧ h′ + g = v + g .

(Let a, b, c, d ∈ R⁺ with d ≤ c + b; then, using the calculation made above,

a ∧ b + c − (a + c) ∧ d = a ∧ b − a ∧ (d − c) ≤ b − d + c .

Thus a ∧ b + d ≤ (a + c) ∧ d + b and this still holds when R⁺ is replaced by R⁺∞.) Therefore Φ(h ∧ g) + Φ(h′) = Φ(h ∧ g + h′) ≤ Φ(v + g) = Φ(v) + Φ(g) and so

Φ^*(h ∧ f) − Φ_*(h ∧ f) ≤ Φ(h ∧ g) − Φ(v) + Φ(g′) ≤ Φ(g) + Φ(g′) − Φ(h′) < ε .

This shows that Φ^*(h ∧ f) = Φ_*(h ∧ f), i.e., h ∧ f ∈ M, and hence h ∧ f ∈ M′, since Ψ(h ∧ f) ≤ Ψ(f) < ∞.

By Lemma 11.3 (1) and (2) M′ is a subspace of M(X). Put

E = {E ⊂ X : I_E ∧ f ∈ M′ for all f ∈ M′} .

Lemma 11.4 E is a σ-algebra and N ∪ M′ ⊂ M(E).

Proof First note that if f ∈ M′ and g ∧ f ∈ M for some g ∈ M(X) then in fact g ∧ f ∈ M′, since Ψ(g ∧ f) ≤ Ψ(f) < ∞, and this means that

E = {E ⊂ X : I_E ∧ f ∈ M for all f ∈ M′} .

Let E ∈ E; if f ∈ M′ then I_E ∧ f + I_{X\E} ∧ f = 1 ∧ f and Lemma 11.3 (5) implies that 1 ∧ f ∈ M′, since by assumption 1 ∈ N. Thus by Lemma 11.3 (3) I_{X\E} ∧ f ∈ M. This shows that X \ E ∈ E. Now let E, F ∈ E; if f ∈ M′ then I_{E∩F} ∧ f = (I_E ∧ f) ∧ (I_F ∧ f) and hence by Lemma 11.3 (4) I_{E∩F} ∧ f ∈ M′. This shows that E ∩ F ∈ E. But clearly ∅ ∈ E and therefore E is an algebra.

Next let {En}n≥1 be an increasing sequence from E and put E = ⋃_{n≥1} En. If f ∈ M′ then {I_{En} ∧ f}n≥1 is an increasing sequence from M′ with lim_n I_{En} ∧ f = I_E ∧ f and thus by Lemma 11.3 (1) I_E ∧ f ∈ M. This shows E ∈ E and therefore by Lemma 2.1 (4) E is a σ-algebra.
Now let g ∈ N and a ∈ R⁺ with a > 0 and let G_a = {x ∈ X : g(x) > a}. Put h = a^{−1}g − (a^{−1}g) ∧ 1; then h ∈ N, since N is complemented and ∧-closed and 1 ∈ N. For each n ≥ 1 let gn = (nh) ∧ 1; then {gn}n≥1 is an increasing sequence from N with lim_n gn = I_{G_a}. Thus if f ∈ M′ then by Lemma 11.3 (4) {gn ∧ f}n≥1 is an increasing sequence from M′ with lim_n gn ∧ f = I_{G_a} ∧ f and so by Lemma 11.3 (1) I_{G_a} ∧ f ∈ M. This shows that G_a ∈ E and hence Lemma 2.7 implies that g ∈ M(E).

But the same proof also shows that M′ ⊂ M(E): Let g ∈ M′; then the mapping h defined above satisfies a^{−1}g = (a^{−1}g) ∧ 1 + h and by Lemma 11.3 (5) (a^{−1}g) ∧ 1 is in M′, and so Lemma 11.3 (3) implies that h ∈ M′. Again using Lemma 11.3 (5) this shows gn ∈ M′, and exactly as before it then follows that g ∈ M(E).

Note that if E ⊂ X with I_E ∈ M′ then by Lemma 11.3 (4) E ∈ E. Now define a mapping µ : E → R⁺∞ by letting µ(E) = Ψ(I_E) if I_E ∈ M′, and µ(E) = ∞ otherwise.

Lemma 11.5 µ is a measure on E.

Proof Note that if E, F ∈ E with E ⊂ F and I_F ∈ M′ then by the definition of E it follows that I_E ∈ M′, since I_E = I_E ∧ I_F.

We will show µ is countably additive. Thus let {En}n≥1 be a disjoint sequence from E and put E = ⋃_{n≥1} En. If I_{En} ∉ M′ for some n then by the above remark I_E ∉ M′ (since En ⊂ E) and in this case µ(E) = ∞ = ∑_{n≥1} µ(En). We can thus assume that I_{En} ∈ M′ for all n ≥ 1. Put gn = ∑_{m=1}^n I_{Em}; then {gn}n≥1 is an increasing sequence from M′ with lim_n gn = I_E. Thus by Lemma 11.3 (1) I_E ∈ M and by Lemma 11.3 (2)

Ψ(I_E) = lim_{n→∞} Ψ(gn) = lim_{n→∞} Ψ(∑_{m=1}^n I_{Em}) = lim_{n→∞} ∑_{m=1}^n Ψ(I_{Em}) = lim_{n→∞} ∑_{m=1}^n µ(Em) = ∑_{n≥1} µ(En) .

But Ψ(I_E) = µ(E): Either I_E ∈ M′, in which case µ(E) = Ψ(I_E), or I_E ∈ M \ M′ and here µ(E) = ∞ = Ψ(I_E). This shows µ is countably additive, and therefore it is a measure, since clearly µ(∅) = Ψ(0) = 0.

Lemma 11.6 Let g ∈ ME(E) and f ∈ M(E) with g ≤ f, and where either f ∈ M′ or µ(f) < ∞. Then g ∈ M′ and Ψ(g) = µ(g).

Proof The mapping g has the form ∑_{a∈A} aI_{E_a} with A a finite subset of R⁺ \ {0} and E_a, a ∈ A, disjoint elements from E. Suppose that I_{E_a} ∈ M′ for each a ∈ A; then g ∈ M′, since M′ is a subspace of M(X), and

Ψ(g) = ∑_{a∈A} aΨ(I_{E_a}) = ∑_{a∈A} aµ(E_a) = µ(g) .

But if f ∈ M′ then a^{−1}f ∈ M′ and I_{E_a} ∧ (a^{−1}f) = I_{E_a} and hence I_{E_a} ∈ M′, since E_a ∈ E. On the other hand, if µ(f) < ∞ then µ(g) < ∞ and so µ(E_a) < ∞, and then again I_{E_a} ∈ M′ for each a ∈ A.

Lemma 11.7 M′ = {f ∈ M(E) : µ(f) < ∞} and Ψ(f) = µ(f) holds for all f ∈ M′.

Proof Let f ∈ M′; then by Lemma 11.4 f ∈ M(E) and hence by Lemma 9.4 there exists an increasing sequence {fn}n≥1 from ME(E) with limn fn = f. Thus by Lemma 11.6 fn ∈ M′ with Ψ(fn) = µ(fn) for each n ≥ 1 and therefore

Ψ(f) = limn→∞ Ψ(fn) = limn→∞ µ(fn) = µ(f) ,

since Ψ and µ are both continuous. This shows that Ψ(f) = µ(f) for all f ∈ M′ and so in particular M′ ⊂ {f ∈ M(E) : µ(f) < ∞}.

It remains to show that if f ∈ M(E) with µ(f) < ∞ then f ∈ M′. But again by Lemma 9.4 there is an increasing sequence {fn}n≥1 from ME(E) with limn fn = f and by Lemma 11.6 fn ∈ M′ with Ψ(fn) = µ(fn) for each n ≥ 1. Therefore Ψ(f) = µ(f) < ∞, i.e., f ∈ M′.

Lemma 11.8 Φ(f) = µ(f) for all f ∈ N.

Proof By Lemmas 11.3 (1) and 11.4 N ⊂ M ∩ M(E) and by Lemma 11.3 (2) Φ(f) = Ψ(f) for all f ∈ N. Let f ∈ N; if Φ(f) < ∞ then f ∈ M′ and thus by Lemma 11.7 Φ(f) = Ψ(f) = µ(f). On the other hand, if Φ(f) = ∞ then f ∉ M′ and thus again by Lemma 11.7 µ(f) = ∞, i.e., Φ(f) = ∞ = µ(f). This shows that Φ(f) = µ(f) for all f ∈ N.

The proof of Proposition 11.1 is now complete.

12 The Radon-Nikodym theorem

In this chapter we look at the Radon-Nikodym theorem, which is one of the most useful results in measure theory. In the following let (X, E) be a measurable space.

Let µ and ν be measures on E. Then ν is said to be absolutely continuous with respect to µ if for each ε > 0 there exists δ > 0 such that ν(E) < ε for all E ∈ E with µ(E) < δ. Moreover, ν is said to be weakly absolutely continuous with respect to µ, and we then write ν ≪ µ, if ν(E) = 0 for all E ∈ E with µ(E) = 0. It follows from Proposition 10.6 (1) that ν ≪ µ if and only if ν(f) = 0 for all f ∈ M(E) with µ(f) = 0. If ν is absolutely continuous with respect to µ then clearly also ν ≪ µ. For finite measures the converse holds:

Lemma 12.1 Let µ and ν be finite measures on E with ν ≪ µ. Then ν is absolutely continuous with respect to µ.

Proof Suppose that ν is not absolutely continuous with respect to µ. Then there exists an ε > 0 and for each n ≥ 1 an element En ∈ E such that µ(En) < 2⁻ⁿ but with ν(En) ≥ ε. Put Fn = ⋃m≥n Em; then {Fn}n≥1 is a decreasing sequence from E with ν(Fn) ≥ ν(En) ≥ ε for each n ≥ 1 (since En ⊂ Fn) and by Lemma 3.3 µ(Fn) ≤ Σm≥n µ(Em) ≤ Σm≥n 2⁻ᵐ = 2⁻ⁿ⁺¹. Let F = ⋂n≥1 Fn; then, since µ and ν are finite, it follows from Lemma 3.2 that ν(F) = limn ν(Fn) ≥ ε and µ(F) = limn µ(Fn) = 0. But this is not possible because ν ≪ µ, and hence ν must be absolutely continuous with respect to µ.

Let µ be a measure on E and let h ∈ M(E); define µ·h : M(E) → R+∞ by (µ·h)(f) = µ(hf) for all f ∈ M(E). Then µ·h is clearly linear and continuous and thus by Proposition 10.4 it is a measure. The measure µ·h is finite if and only if µ(h) < ∞, since (µ·h)(1) = µ(h). Note the rather extreme case with ∞ the constant mapping with value ∞: µ·∞ is the measure with (µ·∞)(E) = 0 if µ(E) = 0 and (µ·∞)(E) = ∞ otherwise.

The simplest case of this construction is with h = IF for some F ∈ E, which results in the measure µ·IF with µ·IF (E) = µ(F ∩ E) for all E ∈ E.
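On a finite set both constructions can be carried out directly: a measure is determined by its weights on singletons, and µ·h simply reweights these by h. A minimal sketch of this discrete case (the names `measure`, `density_measure` and the sample data are ours, not from the text):

```python
# Discrete sketch of the measure mu·h on a finite set X: mu is stored via its
# weights on singletons, and (mu·h)(E) = mu(h·I_E) becomes a weighted sum.

def measure(weights, E):
    """mu(E) for mu given by point weights."""
    return sum(w for x, w in weights.items() if x in E)

def density_measure(weights, h, E):
    """(mu·h)(E) = sum over x in E of h(x) * mu({x})."""
    return sum(h(x) * w for x, w in weights.items() if x in E)

mu = {0: 1.0, 1: 2.0, 2: 0.0, 3: 0.5}
h = lambda x: x  # the density

# (mu·h)({1, 2, 3}) = 1*2.0 + 2*0.0 + 3*0.5 = 3.5
assert density_measure(mu, h, {1, 2, 3}) == 3.5

# h = I_F gives (mu·I_F)(E) = mu(F ∩ E):
IF = lambda x, F={2, 3}: 1.0 if x in F else 0.0
assert density_measure(mu, IF, {0, 3}) == measure(mu, {3})
```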

Proposition 12.1 Let µ be a measure on E and let h ∈ M(E) with µ(h) < ∞. Then the measure µ·h is absolutely continuous with respect to µ.

Proof This is just Lemma 10.6.


Proposition 12.2 Let µ be a measure on E. Then µ·h ≪ µ for all h ∈ M(E).

Proof This is just Proposition 10.6 (2).

The converse of Proposition 12.2 holds for σ-finite measures; this result is the Radon-Nikodym theorem:

Theorem 12.1 Let µ, ν be σ-finite measures on E with ν ≪ µ. Then there exists h ∈ M(E) such that ν = µ·h. Moreover, if ν = µ·h′ also holds with h′ ∈ M(E) then h′ = h µ-a.e.

Proof The uniqueness is clear: If ν = µ·h′ = µ·h with h, h′ ∈ M(E) then µ(h′IE) = ν(E) = µ(hIE) for all E ∈ E, and thus Lemma 10.4 (2) implies that h′ = h µ-a.e.

To show that h exists we first reduce things to the case where the measures µ and ν are both finite. If v ∈ MF+(E) then the mapping v⁻¹ ∈ M(X) with v⁻¹(x) = 1/v(x) for each x ∈ X is also an element of MF+(E), since

{x ∈ X : v⁻¹(x) > a} = {x ∈ X : v(x) < 1/a} ∈ E

for all a ∈ R+ \ {0}.

Lemma 12.2 Let ω be a measure on E and let g, h ∈ M(E). Then (ω·g)·h = ω·(gh). In particular, if v ∈ MF+(E) then (ω·v)·v⁻¹ = ω.

Proof If f ∈ M(E) then ((ω·g)·h)(f) = (ω·g)(hf) = ω(ghf) = (ω·(gh))(f), and therefore (ω·g)·h = ω·(gh). The final statement follows because vv⁻¹ = 1 and ω·1 = ω.

Let ω be a measure on E and v ∈ MF+(E). By Proposition 12.2 and Lemma 12.2 it follows that ω·v ≪ ω and ω ≪ ω·v, and so ω and ω·v have the same sets of measure zero.

Now µ and ν are σ-finite and thus by Lemma 10.3 there exist u, v ∈ MF+(E) such that µ(u) < ∞ and ν(v) < ∞. This just means that the measures µ·u and ν·v are finite. Moreover, ν·v ≪ µ·u, since ν·v ≪ ν, ν ≪ µ and µ ≪ µ·u. Suppose there exists h ∈ M(E) such that ν·v = (µ·u)·h. Then by Lemma 12.2

ν = (ν·v)·v⁻¹ = ((µ·u)·h)·v⁻¹ = µ·(uhv⁻¹) .

Thus it is enough to consider the case of finite measures. We need the following fact (adapted from one of the standard proofs of the Hahn-Jordan decomposition for finite signed measures):

Lemma 12.3 Let µ and ν be finite measures on E with ν(X) > µ(X). Then there exists C ∈ E with ν(C) > 0 such that ν(E ∩ C) ≥ µ(E ∩ C) for all E ∈ E.

Proof Let δ1 = sup{µ(E) − ν(E) : E ∈ E} (thus δ1 ≥ 0, since µ(∅) − ν(∅) = 0) and choose E1 ∈ E with µ(E1) − ν(E1) ≥ δ1/2. Then by induction on n we can define a sequence {δn}n≥1 from R+ and a sequence {En}n≥1 from E so that

δn = sup{µ(E) − ν(E) : E ∈ E with E ∩ Ek = ∅ for k = 1, . . . , n − 1}

and En ∈ E a subset of X \ (E1 ∪ · · · ∪ En−1) with µ(En) − ν(En) ≥ δn/2 for each n > 1. In particular {En}n≥1 is a disjoint sequence from E. Put E∞ = ⋃n≥1 En and C = X \ E∞. Then µ(E∞) − ν(E∞) = Σn≥1 (µ(En) − ν(En)) ≥ Σn≥1 δn/2, and from this it follows that both µ(E∞) − ν(E∞) ≥ 0 and Σn≥1 δn < ∞, and also limn δn = 0. Note that ν(E∞) ≤ µ(E∞) then gives ν(C) = ν(X) − ν(E∞) ≥ ν(X) − µ(E∞) ≥ ν(X) − µ(X) > 0. Let E ∈ E; then

E ∩ C = E ∩ (X \ E∞) ⊂ X \ (E1 ∪ · · · ∪ En−1)

and so µ(E ∩ C) − ν(E ∩ C) ≤ δn for each n ≥ 1; since limn δn = 0 this means ν(E ∩ C) ≥ µ(E ∩ C).

Now to the proof of Theorem 12.1 for finite measures, so let µ, ν be finite measures on E with ν ≪ µ. Consider the set

U = {g ∈ M(E) : µ(gIE) ≤ ν(E) for all E ∈ E} ;

then U ≠ ∅, since 0 ∈ U, and by Proposition 9.2 it easily follows that

U = {g ∈ M(E) : µ(gf) ≤ ν(f) for all f ∈ M(E)} .

Let g, h ∈ U and let A = {x ∈ X : g(x) ≥ h(x)}; then for each f ∈ M(E)

µ((g ∨ h)f) = µ(gIAf) + µ(hIX\Af) ≤ ν(IAf) + ν(IX\Af) = ν(f)

and this shows that g ∨ h ∈ U for all g, h ∈ U. Moreover, U is a complete subset of M(E): Let {gn}n≥1 be an increasing sequence from U and let g = limn gn. Then {gnf}n≥1 is an increasing sequence from M(E) with limn gnf = gf and thus µ(gf) = limn µ(gnf) ≤ ν(f), since µ(gnf) ≤ ν(f) for all n ≥ 1; hence g ∈ U.

Let α = sup{µ(g) : g ∈ U}; then α ≤ ν(X) < ∞, since µ(g) ≤ ν(X) for all g ∈ U. For each n ≥ 1 choose gn ∈ U with µ(gn) ≥ α − 1/n and let hn = g1 ∨ · · · ∨ gn. Then {hn}n≥1 is an increasing sequence from U and so h = limn hn is also an element of U. But α − 1/n ≤ µ(gn) ≤ µ(hn) ≤ µ(h) ≤ α for each n ≥ 1 and therefore µ(h) = α.

We now show that ν(E) = µ(hIE) for all E ∈ E; thus suppose that this is not the case. Then there exists B ∈ E with µ(hIB) < ν(B) and so there also exists δ > 0 such that ν(B) > µ(hIB) + δµ(B). Consider the measures µ′, ν′ on E given by µ′(E) = µ((h + δ)IBIE) and ν′(E) = ν(B ∩ E) for all E ∈ E. Then

µ′(X) = µ((h + δ)IB) = µ(hIB) + δµ(B) < ν(B) = ν′(X) .

Hence µ′ and ν′ are both finite and ν′(X) > µ′(X). Thus by Lemma 12.3 there exists C ∈ E with ν′(C) > 0 such that ν′(E ∩ C) ≥ µ′(E ∩ C) for all E ∈ E. Let h′ = h + δIB∩C; then

µ(h′f) + µ(hIB∩Cf) = µ(hf) + δµ(IB∩Cf) + µ(hIB∩Cf)
 = µ(hf) + µ((h + δ)IB∩Cf) ≤ µ(hf) + ν(IB∩Cf)
 = µ(hIX\(B∩C)f) + ν(IB∩Cf) + µ(hIB∩Cf)
 ≤ ν(IX\(B∩C)f) + ν(IB∩Cf) + µ(hIB∩Cf)
 = ν(f) + µ(hIB∩Cf)

for all f ∈ M(E), and so in particular µ(h′IE) ≤ ν(E) for all E ∈ E, since µ(hIB∩CIE) ≤ µ(h) = α < ∞. Hence h′ ∈ U. But ν(B ∩ C) = ν′(C) > 0 and ν ≪ µ and so µ(B ∩ C) > 0, which means that

µ(h′) = µ(h) + δµ(B ∩ C) = α + δµ(B ∩ C) > α .

This is a contradiction and therefore ν(E) = µ(hIE) for all E ∈ E, i.e., ν = µ·h. This completes the proof of Theorem 12.1.

The element h ∈ M(E) in Theorem 12.1 is called a (or the) Radon-Nikodym density (or just the density) of ν with respect to µ.
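In the discrete case the density can be written down explicitly: if ν ≪ µ then h(x) = ν({x})/µ({x}) wherever µ({x}) > 0, and h may be chosen arbitrarily on the µ-null set where µ({x}) = 0 (any choice gives the same measure µ·h, in line with the µ-a.e. uniqueness). A sketch with illustrative names of our own:

```python
# Discrete Radon-Nikodym density: nu and mu are given by point weights on a
# finite set, nu << mu (nu vanishes wherever mu does), and h = d(nu)/d(mu).

def rn_density(nu, mu):
    """Return h with nu(E) = sum_{x in E} h(x) mu({x}); on mu-null points h
    is set to 0 (any other choice changes h only on a mu-null set)."""
    assert all(mu[x] > 0 or nu[x] == 0 for x in mu), "nu << mu fails"
    return {x: (nu[x] / mu[x] if mu[x] > 0 else 0.0) for x in mu}

mu = {"a": 2.0, "b": 1.0, "c": 0.0}
nu = {"a": 1.0, "b": 3.0, "c": 0.0}
h = rn_density(nu, mu)
assert h == {"a": 0.5, "b": 3.0, "c": 0.0}

# nu = mu·h on every subset tested:
for E in [{"a"}, {"b", "c"}, {"a", "b", "c"}]:
    assert sum(nu[x] for x in E) == sum(h[x] * mu[x] for x in E)
```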

Measures τ1 and τ2 on E are said to be mutually singular, and we then write τ1 ⊥ τ2, if there exist E1, E2 ∈ E with X = E1 ∪ E2 and E1 ∩ E2 = ∅ such that τ1(E1) = τ2(E2) = 0.

Theorem 12.2 Let µ, ν be σ-finite measures. Then there exist σ-finite measures ν1, ν2 such that ν = ν1 + ν2, ν1 ≪ µ and ν2 ⊥ µ. Moreover, this decomposition is unique: If ν1′, ν2′ are measures with ν = ν1′ + ν2′, ν1′ ≪ µ and ν2′ ⊥ µ then ν1′ = ν1 and ν2′ = ν2.

Proof Let τ = µ + ν; then τ is σ-finite and µ ≪ τ, thus by Theorem 12.1 there exists h ∈ M(E) such that µ = τ·h. Put D1 = {x ∈ X : h(x) > 0} and D2 = {x ∈ X : h(x) = 0} and define measures ν1, ν2 on E by letting ν1(E) = ν(E ∩ D1) and ν2(E) = ν(E ∩ D2) for all E ∈ E. Then ν1 and ν2 are σ-finite and ν = ν1 + ν2. Now X = D1 ∪ D2 and D1 ∩ D2 = ∅, and

ν2(D1) = ν(D1 ∩ D2) = ν(∅) = 0 and µ(D2) = τ(hID2) = 0 ,

and so ν2 ⊥ µ.

Let E ∈ E with µ(E) = 0. Then τ(hIE) = 0 and thus by Proposition 10.6 (1) τ(E ∩ D1) = 0, since E ∩ D1 = {x ∈ X : h(x)IE(x) > 0}. It therefore follows that ν1(E) = ν(E ∩ D1) ≤ τ(E ∩ D1) = 0, and this shows that ν1 ≪ µ.

It remains to establish the uniqueness. Thus let ν1′, ν2′ be measures on E with ν = ν1′ + ν2′, ν1′ ≪ µ and ν2′ ⊥ µ. In particular, ν1′ and ν2′ are σ-finite (since ν = ν1′ + ν2′ and ν is σ-finite). Now since ν2 ⊥ µ and ν2′ ⊥ µ, there exist E1, E2, E1′, E2′ ∈ E with X = E1 ∪ E2 = E1′ ∪ E2′ and E1 ∩ E2 = E1′ ∩ E2′ = ∅ with µ(E1) = µ(E1′) = ν2(E2) = ν2′(E2′) = 0. Put F1 = E1 ∪ E1′ and F2 = E2 ∩ E2′; then X = F1 ∪ F2, F1 ∩ F2 = ∅ and µ(F1) = ν2(F2) = ν2′(F2) = 0. But ν1 ≪ µ and ν1′ ≪ µ and so by Theorem 12.1 there exist h, h′ ∈ M(E) with ν1 = µ·h and ν1′ = µ·h′. It follows from Proposition 10.6 (3) (since µ(X \ F2) = µ(F1) = 0) that

ν1(E) = µ(hIE) = µ(hIE∩F2) = ν1(E ∩ F2) = ν1(E ∩ F2) + ν2(E ∩ F2)
 = ν(E ∩ F2)
 = ν1′(E ∩ F2) + ν2′(E ∩ F2) = ν1′(E ∩ F2) = µ(h′IE∩F2) = µ(h′IE)
 = ν1′(E)

for all E ∈ E, i.e., ν1 = ν1′. Hence also ν2 = ν2′: If E ∈ E with ν(E) < ∞ then

ν2(E) = ν(E) − ν1(E) = ν(E) − ν1′(E) = ν2′(E)

and thus ν2(E) = ν2′(E) for all E ∈ E, since ν is σ-finite.

The representation of ν as the sum of ν1 and ν2 in Theorem 12.2 is called the Lebesgue decomposition of ν with respect to µ. The proof of the theorem shows that this decomposition has a very simple form: There exist D1, D2 ∈ E with X = D1∪D2 and D1∩D2 = ∅ such that ν1(E) = ν(E∩D1) and ν2(E) = ν(E∩D2) for all E ∈ E.
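The simple form is especially transparent in the discrete case: take D1 to be the set of points charged by µ and D2 its complement. A sketch (function and variable names are ours, not from the text):

```python
# Discrete Lebesgue decomposition of nu with respect to mu: nu1 lives on the
# points mu charges (so nu1 << mu), nu2 on the mu-null points (so nu2 ⊥ mu).

def lebesgue_decomposition(nu, mu):
    D1 = {x for x in mu if mu[x] > 0}
    nu1 = {x: (nu[x] if x in D1 else 0.0) for x in nu}
    nu2 = {x: (nu[x] if x not in D1 else 0.0) for x in nu}
    return nu1, nu2

mu = {"a": 1.0, "b": 0.0, "c": 2.0}
nu = {"a": 4.0, "b": 5.0, "c": 0.5}
nu1, nu2 = lebesgue_decomposition(nu, mu)
assert all(nu1[x] + nu2[x] == nu[x] for x in nu)     # nu = nu1 + nu2
assert all(nu1[x] == 0.0 for x in nu if mu[x] == 0)  # nu1 << mu
assert all(nu2[x] == 0.0 for x in nu if mu[x] > 0)   # nu2 lives on D2
```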

In what follows let E′ be a sub-σ-algebra of E, i.e., a σ-algebra E′ with E′ ⊂ E. Then we can consider M(E′) as a subset of M(E), and it is then in fact a closed subspace of M(E).

Let µ : M(E) → R+∞ be a measure and let µ′ : M(E′) → R+∞ be the restriction of µ to M(E′). Then µ′ is clearly linear and continuous and so by Proposition 10.4 it is a measure on E′. As a mapping from E′ to R+∞ it is of course just the restriction to E′ of the mapping µ : E → R+∞.

Theorem 12.3 Let µ be a finite measure on E and let f ∈ M(E) be a mapping with µ(f) < ∞. Then there exists g ∈ M(E′) such that µ(gh) = µ(fh) for all h ∈ M(E′) (and so µ(g) = µ(f) < ∞). If g′ ∈ M(E′) also satisfies µ(g′h) = µ(fh) for all h ∈ M(E′) then g′ = g µ-a.e.

Proof Let ν : M(E′) → R+∞ be the restriction to M(E′) of the finite measure µ·f : M(E) → R+∞. Then ν is a finite measure on (X, E′) and ν ≪ µ′, where µ′ is again the restriction of µ to M(E′), since if E ∈ E′ with µ′(E) = 0 then µ(E) = µ′(E) = 0 and so by Proposition 12.2 ν(E) = (µ·f)(E) = 0. Thus by Theorem 12.1 there exists g ∈ M(E′) such that ν = µ′·g and then

µ(gh) = µ′(gh) = (µ′·g)(h) = ν(h) = (µ·f)(h) = µ(fh)

for all h ∈ M(E′). Finally, if g′ ∈ M(E′) also satisfies µ(g′h) = µ(fh) for all h ∈ M(E′) then by Lemma 10.4 (2) g′ = g µ-a.e.

Here is a version of Theorem 12.3 for bounded mappings:

Theorem 12.4 Let µ be a finite measure on E and f ∈ MB(E) with f ≤ a. Then there exists g ∈ MB(E′) with g ≤ a such that µ(gh) = µ(fh) for all h ∈ M(E′).

Proof By Theorem 12.3 there exists g ∈ M(E′) such that µ(gh) = µ(fh) for all h ∈ M(E′). For n ≥ 1 let Bn = {x ∈ X : g(x) > a + 1/n}; then

(a + 1/n)µ(Bn) ≤ µ(gIBn) = µ(fIBn) ≤ aµ(Bn)

and thus µ(Bn) = 0. Put B = ⋃n≥1 Bn and g′ = gIX\B; then g′ ∈ MB(E′) with g′ ≤ a and µ(B) = 0. Therefore by Proposition 10.6 (3) µ(g′h) = µ(gh) = µ(fh) for all h ∈ M(E′).

If µ is a probability measure on E then the mapping g ∈ M(E′) in Theorem 12.3 is called a version of the conditional expectation of f with respect to E′ and it is usually denoted by Eµ(f|E′). The next result shows that the operation of taking the conditional expectation behaves in some sense like an orthogonal projection.
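When E′ is generated by a finite partition of X, the conditional expectation has a familiar explicit form: on each partition block with positive µ-measure it is the µ-average of f over that block. A sketch of this special case (all names are ours):

```python
# Conditional expectation with respect to the sigma-algebra generated by a
# finite partition: g is constant on each block, equal to the mu-average of f
# there, so that mu(g·I_B) = mu(f·I_B) for every block B.

def cond_exp(f, mu, partition):
    g = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        avg = (sum(f[x] * mu[x] for x in block) / mass) if mass > 0 else 0.0
        for x in block:
            g[x] = avg
    return g

mu = {1: 0.25, 2: 0.25, 3: 0.5}       # probability measure
f = {1: 2.0, 2: 4.0, 3: 8.0}
partition = [{1, 2}, {3}]
g = cond_exp(f, mu, partition)
assert g == {1: 3.0, 2: 3.0, 3: 8.0}

# Defining property mu(g·I_B) = mu(f·I_B) on each block:
for block in partition:
    assert sum(g[x] * mu[x] for x in block) == sum(f[x] * mu[x] for x in block)
```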

Lemma 12.4 Let µ be a probability measure on E and let E1, E2 be sub-σ-algebras of E with E2 ⊂ E1. Let f ∈ MB(E) and for k = 1, 2 let fk ∈ MB(Ek) be a version of the conditional expectation of f with respect to Ek. Then

µ((f − f1)²) + µ((f1 − f2)²) = µ((f − f2)²) .

In particular, µ((f − f1)²) ≤ µ((f − f2)²).

Proof Since f2 ∈ MB(E2) ⊂ MB(E1) it follows that µ(f2f) = µ(f2f1), and also µ(f1²) = µ(f1f1) = µ(f1f), since f1 ∈ MB(E1). Therefore

µ((f − f1)²) + µ((f1 − f2)²) + 2µ(f1²) + 2µ(ff2)
 = µ((f − f1)²) + µ((f1 − f2)²) + 2µ(f1f) + 2µ(f1f2)
 = µ((f − f1)² + (f1 − f2)² + 2f1f + 2f1f2)
 = µ(f² + 2f1² + f2²) = µ(f²) + 2µ(f1²) + µ(f2²)
 = µ((f − f2)²) + 2µ(f1²) + 2µ(ff2) ,

i.e., µ((f − f1)²) + µ((f1 − f2)²) = µ((f − f2)²).

13 Image and pre-image measures

If (X, E) and (Y, F) are measurable spaces and f : X → Y is a measurable mapping then for each measure µ on E there is a measure µ ◦ f ∗ defined on F which is called the image of µ under f. In this chapter we look at its properties. In particular we are interested in determining which measures on F can occur as image measures, i.e., which have the form µ ◦ f ∗ for some measure µ on E. Let X and Y be non-empty sets and let N be a subspace of M(Y ). A mapping Φ : N → M(X) is linear if it is additive and positive homogeneous, i.e., if

Φ(af + bg) = aΦ(f) + bΦ(g)

for all f, g ∈ N and all a, b ∈ R+. Note then that Φ(0) = 0. The mapping Φ is said to be monotone if Φ(g) ≤ Φ(f) whenever f, g ∈ N with g ≤ f; in this case {Φ(fn)}n≥1 is an increasing sequence in M(X) for each increasing sequence {fn}n≥1 from N. Finally, if N is a complete subspace of M(Y) then a linear mapping Φ : N → M(X) is called continuous if it is monotone and

Φ(limn→∞ fn) = limn→∞ Φ(fn)

for each increasing sequence {fn}n≥1 of elements from N. If N is a complemented subspace of M(Y) (such as M(F)) then any linear mapping Φ : N → M(X) is automatically monotone.

Lemma 13.1 Let N be a subspace of M(Y) and let Φ : N → M(X) be a linear mapping. Then:

(1) The image Φ(N) = {g ∈ M(X) : g = Φ(f) for some f ∈ N} of N under Φ is a subspace of M(X).

(2) If M is a subspace of M(X) then Φ⁻¹(M) = {f ∈ N : Φ(f) ∈ M} is a subspace of N. Moreover, if N and M are both complete and Φ is continuous then Φ⁻¹(M) is also complete.

Proof (1) Let g1, g2 ∈ Φ(N) and a1, a2 ∈ R+; then there exist f1, f2 ∈ N with g1 = Φ(f1) and g2 = Φ(f2). Then a1f1 + a2f2 ∈ N, since N is a subspace of M(Y), and Φ(a1f1 + a2f2) = a1Φ(f1) + a2Φ(f2) = a1g1 + a2g2, since Φ is linear. Thus Φ(N) is a subspace of M(X), since also 0 = Φ(0) ∈ Φ(N).

(2) Let f1, f2 ∈ Φ⁻¹(M) and a1, a2 ∈ R+; then since Φ is linear and M is a subspace Φ(a1f1 + a2f2) = a1Φ(f1) + a2Φ(f2) ∈ M, and so a1f1 + a2f2 ∈ Φ⁻¹(M). Thus Φ⁻¹(M) is a subspace of N, since Φ(0) = 0 ∈ M. Now suppose N and M are both complete and that Φ is continuous; let {fn}n≥1 be an increasing sequence from Φ⁻¹(M) and put f = limn fn (and so f ∈ N). Then Φ(f) = limn Φ(fn),


since Φ is continuous, and {Φ(fn)}n≥1 is an increasing sequence from M. Hence Φ(f) ∈ M, since M is complete, i.e., f ∈ Φ⁻¹(M). This implies that Φ⁻¹(M) is complete.

If M is a subspace of M(X) then the statement that Φ : N → M is a linear mapping means that Φ : N → M(X) is a linear mapping such that Φ(N) ⊂ M. A similar interpretation is also to be made for monotone linear and continuous linear mappings.

In what follows let (X, E) and (Y, F) be measurable spaces and let f : X → Y be a measurable mapping from (X, E) to (Y, F) (i.e., with f⁻¹(F) ⊂ E). Then for each g ∈ M(F) the mapping g ◦ f is in M(E) and so we have a mapping f* : M(F) → M(E) given by f*(g) = g ◦ f for each g ∈ M(F). The mapping f* is clearly linear and continuous.

Lemma 13.2 (1) The mapping f* : M(F) → M(E) is surjective if and only if f⁻¹(F) = E.

(2) The mapping f* : M(F) → M(E) is injective if and only if ∅ is the only element F ∈ F with F ∩ f(X) = ∅. In particular, f* is injective whenever f is surjective.

Proof (1) Let N = {f*(g) : g ∈ M(F)}; then by Lemma 13.1 (1) N is a subspace of M(E). Let {hn}n≥1 be an increasing sequence from N and for each n ≥ 1 let gn ∈ M(F) with f*(gn) = hn. Let gn′ = g1 ∨ · · · ∨ gn; then

f*(gn′)(x) = gn′(f(x)) = g1(f(x)) ∨ · · · ∨ gn(f(x)) = h1(x) ∨ · · · ∨ hn(x) = hn(x)

for each x ∈ X, i.e., f*(gn′) = hn. Now {gn′}n≥1 is an increasing sequence from M(F) and hence by Proposition 9.1 g = limn gn′ ∈ M(F). But

f*(g)(x) = g(f(x)) = limn→∞ gn′(f(x)) = limn→∞ hn(x)

for each x ∈ X, thus limn hn = f*(g) ∈ N, which shows that N is complete. It therefore follows from Proposition 9.2 (1) that N = M(E) (i.e., that f* is surjective) if and only if IE ∈ N for all E ∈ E.

Suppose first that f⁻¹(F) = E and let E ∈ E; there thus exists F ∈ F with f⁻¹(F) = E and then IE = f*(IF) ∈ N. This implies f* is surjective. Suppose conversely that f* is surjective and let E ∈ E; there thus exists g ∈ M(F) with IE = f*(g). Let F = {y ∈ Y : g(y) = 1}; then F ∈ F and IE = f*(IF) and so E = f⁻¹(F). This shows that f⁻¹(F) = E.

(2) Let g1, g2 ∈ M(F) and let D = {y ∈ Y : g1(y) ≠ g2(y)}; thus by Lemma 9.6 D ∈ F. Now f*(g1) = f*(g2) if and only if g1(f(x)) = g2(f(x)) for all x ∈ X, i.e., if and only if D ∩ f(X) = ∅. Thus if ∅ is the only element F ∈ F with F ∩ f(X) = ∅ then f* is injective. Conversely, if there exists F ∈ F with F ≠ ∅ and F ∩ f(X) = ∅ then IF ≠ 0 but f*(IF) = 0 = f*(0) and so in this case f* is not injective.

A measurable space (Y, F) is said to be separable if {y} ∈ F for each y ∈ Y. If (Y, F) is separable then f* : M(F) → M(E) is injective if and only if f is surjective (since if f is not surjective then {y} ∩ f(X) = ∅ for some y ∈ Y).

Let µ : M(E) → R+∞ be a measure. The mapping µ ◦ f* : M(F) → R+∞ is then linear and continuous and is thus a measure on F which will be denoted by f∗µ. As a mapping from F to R+∞ we have (f∗µ)(F) = µ(f⁻¹(F)) for each F ∈ F. The measure f∗µ is called the image of µ under f. Note that

(f∗µ)(g) = µ(f*g) = µ(g ◦ f)

for all g ∈ M(F).

Let us now look at the converse of this concept. Let ν be a measure on F; then we call a measure µ on E a pre-image of ν under f if ν = f∗µ. When discussing the existence of pre-image measures it is sensible to assume that f⁻¹(F) = E. If this is not the case then E can always be replaced by f⁻¹(F). However, even if the pre-image exists as a measure on f⁻¹(F), this still leaves the problem of extending a measure from f⁻¹(F) to E, and in this generality there are no non-trivial results about such extensions.

If f⁻¹(F) = E and a pre-image measure exists then it is clearly unique, since if f∗µ1 = f∗µ2 then µ1(f⁻¹(F)) = µ2(f⁻¹(F)) for all F ∈ F.

Let ν be a measure on F; then a subset B ⊂ Y is said to be thick with respect to ν if ν(F) = 0 for all F ∈ F with F ∩ B = ∅. Of course, an element B ∈ F is thick with respect to ν if and only if ν(Y \ B) = 0.
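In the discrete case the image measure is just a push of weights along f: (f∗µ)(F) = µ(f⁻¹(F)) amounts to summing the µ-weights over each fibre of f. A sketch (names are ours, not from the text):

```python
# Discrete image measure: push the point weights of mu on X forward along f
# to weights on Y, so that (f*mu)({y}) = mu(f^{-1}({y})).

def image_measure(mu, f):
    out = {}
    for x, w in mu.items():
        out[f(x)] = out.get(f(x), 0.0) + w
    return out

mu = {0: 1.0, 1: 2.0, 2: 4.0}
f = lambda x: x % 2          # X = {0, 1, 2} -> Y = {0, 1}
nu = image_measure(mu, f)
assert nu == {0: 5.0, 1: 2.0}

# The functional form (f*mu)(g) = mu(g ∘ f) for any g on Y:
g = {0: 10.0, 1: 1.0}
assert sum(g[y] * nu[y] for y in nu) == sum(g[f(x)] * mu[x] for x in mu)
```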

Theorem 13.1 Suppose f⁻¹(F) = E and let ν be a measure on F. Then the pre-image of ν under f exists if and only if f(X) is thick with respect to ν. In particular, if f(X) ∈ F then the condition is that ν(Y \ f(X)) = 0.

Proof The condition is clearly necessary, since if ν = f∗µ and F ∈ F with F ∩ f(X) = ∅ then f⁻¹(F) = ∅ and hence ν(F) = µ(f⁻¹(F)) = µ(∅) = 0.

Suppose conversely that f(X) is thick with respect to ν. Let g1, g2 ∈ M(F) with g1 ◦ f = g2 ◦ f, and let D = {y ∈ Y : g1(y) ≠ g2(y)}; then by Lemma 9.6 D ∈ F, and D ∩ f(X) = ∅, and so ν(D) = 0. Hence by Proposition 10.6 (3)

ν(g1) = ν(g1IY\D) = ν(g2IY\D) = ν(g2) .

There thus exists a mapping µ : M(E) → R+∞ such that µ(g ◦ f) = ν(g) for all g ∈ M(F), i.e., such that f∗µ = ν (and µ is unique since f* is surjective). It remains to show that µ is a measure. Let h1, h2 ∈ M(E) and a1, a2 ∈ R+; since f* is surjective there exist g1, g2 ∈ M(F) with hj = f*(gj) = gj ◦ f for j = 1, 2. Then (a1g1 + a2g2) ◦ f = a1(g1 ◦ f) + a2(g2 ◦ f) = a1h1 + a2h2 and so

µ(a1h1 + a2h2) = ν(a1g1 + a2g2) = a1ν(g1) + a2ν(g2) = a1µ(h1) + a2µ(h2) .

This shows µ is additive. Let {hn}n≥1 be an increasing sequence from M(E) with h = limn hn and for each n ≥ 1 let gn ∈ M(F) with hn = gn ◦ f. Put gn′ = g1 ∨ · · · ∨ gn; then the sequence {gn′}n≥1 is increasing and

hn = h1 ∨ · · · ∨ hn = (g1 ◦ f) ∨ · · · ∨ (gn ◦ f) = (g1 ∨ · · · ∨ gn) ◦ f = gn′ ◦ f

for each n ≥ 1. Let g′ = limn gn′; then h = g′ ◦ f and thus

limn→∞ µ(hn) = limn→∞ µ(gn′ ◦ f) = limn→∞ ν(gn′) = ν(g′) = µ(g′ ◦ f) = µ(h) .

This shows µ is continuous and therefore it is a measure.

Theorem 13.1 is most often applied to the case in which f : X → Y is a surjective mapping with f⁻¹(F) = E; for each measure ν on F there then exists a unique measure µ on E with ν = f∗µ. Here is a well-known fact which is a simple application of Theorem 13.1:

Proposition 13.1 Let ν be a measure on F and let B ⊂ Y be thick with respect to ν. Then there exists a unique measure µ on F|B (with F|B the trace σ-algebra on B) such that µ(F ∩ B) = ν(F) for all F ∈ F.

Proof Let i : B → Y be the inclusion mapping (with i(y) = y for all y ∈ B). Then i⁻¹(F) = F ∩ B for each F ⊂ Y and hence i⁻¹(F) = F|B. Moreover, i(B) = B is thick with respect to ν and so by Theorem 13.1 there exists a unique measure µ on F|B such that ν = i∗µ, i.e., µ is the unique measure such that µ(F ∩ B) = ν(F) for all F ∈ F.

We end the chapter with a construction which is needed in [16] when dealing with particle models.

If (X, E) is a measurable space then let X/ denote the set of all measures on E taking only values in the set N (and so each measure p ∈ X/ is finite, since p(X) ∈ N); put E/ = σ(E♦), where E♦ is the set of all subsets of X/ having the form {p ∈ X/ : p(E) = k} with E ∈ E and k ∈ N.

Now let (Y, F) be a second measurable space and let f : (X, E) → (Y, F) be a measurable mapping. If p ∈ X/ and f∗p is the image measure on F then (f∗p)(F) = p(f⁻¹(F)) ∈ N for all F ∈ F and so f∗p ∈ Y/. Thus there is a mapping f/ : X/ → Y/ given by f/(p) = f∗p for each p ∈ X/.
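For a finite X an element p ∈ X/ is just an N-valued weight function, i.e., a particle configuration, and f/ pushes a configuration forward along f. A sketch (names are ours, not from the text):

```python
# Elements of X_/ over a finite X are N-valued weight functions ("particle
# configurations"); f_/ maps the configuration p to its image measure f*p,
# which again has N-valued weights. Names here are illustrative.
from collections import Counter

def push_configuration(p, f):
    """f_/(p): the image of the counting measure p under f."""
    q = Counter()
    for x, k in p.items():
        q[f(x)] += k
    return dict(q)

p = {"a": 2, "b": 1, "c": 0}          # two particles at a, one at b
f = lambda x: "even" if x in ("a", "c") else "odd"
q = push_configuration(p, f)
assert q == {"even": 2, "odd": 1}
assert sum(q.values()) == sum(p.values())   # total particle number preserved
```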

Proposition 13.2 (1) The mapping f/ : (X/, E/) → (Y/, F/) is measurable.

(2) If f⁻¹(F) = E then (f/)⁻¹(F/) = E/.

(3) If f⁻¹(F) = E and f(X) ∈ F then f/(X/) ∈ F/.

Proof (1) Let F ∈ F and k ∈ N; then

(f/)⁻¹({q ∈ Y/ : q(F) = k}) = {p ∈ X/ : f/(p)(F) = k} = {p ∈ X/ : p(f⁻¹(F)) = k} .

Thus (f/)⁻¹(F♦) ⊂ E♦ and therefore by Lemma 2.4 (f/)⁻¹(F/) ⊂ E/.

(2) Let E ∈ E and k ∈ N; then there exists F ∈ F with f⁻¹(F) = E and the calculation in (1) shows that

(f/)⁻¹({q ∈ Y/ : q(F) = k}) = {p ∈ X/ : p(E) = k} .

This implies (f/)⁻¹(F♦) = E♦ (since in (1) we showed that (f/)⁻¹(F♦) ⊂ E♦). Therefore by Proposition 2.4 (f/)⁻¹(F/) = E/.

(3) Put f(X) = D, and so D ∈ F. If p ∈ X/ then

(f∗p)(D) = p(f⁻¹(D)) = p(X) = p(f⁻¹(Y)) = (f∗p)(Y) .

On the other hand, if q ∈ Y/ with q(D) = q(Y) then by Theorem 13.1 there exists a measure p on E with f∗p = q, and since p(f⁻¹(F)) = q(F) ∈ N for all F ∈ F and f⁻¹(F) = E it follows that p ∈ X/. Therefore

f/(X/) = {f/(p) : p ∈ X/} = {f∗p : p ∈ X/} = ⋃n∈N ({q ∈ Y/ : q(D) = n} ∩ {q ∈ Y/ : q(Y) = n})

and hence f/(X/) ∈ F/.

It is often useful to partition the space X/ into components consisting of those measures having the same total measure. Thus for each n ∈ N let X/ⁿ denote the set of all measures p on E taking only values in the set Nn = {0, 1, . . . , n} and with p(X) = n; put E/ⁿ = σ(E♦ⁿ), where E♦ⁿ is the set of all subsets of X/ⁿ having the form {p ∈ X/ⁿ : p(E) = k} with E ∈ E and k ∈ Nn. Thus X/ is the disjoint union of the sets X/ⁿ, n ∈ N.

Proposition 13.3 We have E/ = {A ⊂ X/ : A ∩ X/ⁿ ∈ E/ⁿ for each n ∈ N} and therefore the measurable space (X/, E/) is the disjoint union of the measurable spaces (X/ⁿ, E/ⁿ), n ∈ N.

Proof Put D = {A ⊂ X/ : A ∩ X/ⁿ ∈ E/ⁿ for each n ∈ N}, so D is the σ-algebra in the definition of the disjoint union.

Let D/ⁿ = {A ∩ X/ⁿ : A ∈ E/}; then D/ⁿ is the trace σ-algebra of E/ on X/ⁿ and thus D/ⁿ = σ(D♦ⁿ), where D♦ⁿ = {A ∩ X/ⁿ : A ∈ E♦}. But D♦ⁿ = E♦ⁿ and hence D/ⁿ = E/ⁿ, i.e., E/ⁿ = {A ∩ X/ⁿ : A ∈ E/}. Therefore if A ∈ E/ then A ∩ X/ⁿ ∈ E/ⁿ for each n ∈ N, which implies that A ∈ D. This shows E/ ⊂ D.

Conversely, let A ∈ D; then A ∩ X/ⁿ ∈ E/ⁿ and thus there exists An ∈ E/ with A ∩ X/ⁿ = An ∩ X/ⁿ, and this implies that A ∩ X/ⁿ ∈ E/ for each n ∈ N, since X/ⁿ ∈ E/. Finally, we then have A = ⋃n∈N (A ∩ X/ⁿ) ∈ E/, i.e., D ⊂ E/, and hence D = E/.

The result corresponding to Proposition 13.2 also holds for each n ∈ N. Let f : (X, E) → (Y, F) be a measurable mapping. If p ∈ X/ⁿ then f∗p ∈ Y/ⁿ. Thus there is a mapping f/ⁿ : X/ⁿ → Y/ⁿ given by f/ⁿ(p) = f∗p for each p ∈ X/ⁿ.

Proposition 13.4 (1) The mapping f/ⁿ : (X/ⁿ, E/ⁿ) → (Y/ⁿ, F/ⁿ) is measurable.

(2) If f⁻¹(F) = E then (f/ⁿ)⁻¹(F/ⁿ) = E/ⁿ.

(3) If f⁻¹(F) = E and f(X) ∈ F then f/ⁿ(X/ⁿ) ∈ F/ⁿ.

Proof This is the same as the proof of Proposition 13.2.

14 Kernels

Kernels can be thought of as measurable families of measures. In general there are two measurable spaces (X, E) and (Y, F) involved in the definition of a kernel and it turns out that kernels can be identified with the set of continuous linear mappings π : M(F) → M(E). This is the direct analogue of the fact that measures can be regarded as continuous linear mappings µ : M(F) → R+∞.

Recall from Chapter 13 that if X and Y are non-empty sets and N is a subspace of M(Y) then a mapping Φ : N → M(X) is linear if Φ(af + bg) = aΦ(f) + bΦ(g) for all f, g ∈ N and all a, b ∈ R+. Moreover, Φ is monotone if Φ(g) ≤ Φ(f) whenever f, g ∈ N with g ≤ f, and if N is a complete subspace of M(Y) then a linear mapping Φ : N → M(X) is continuous if it is monotone and

Φ(limn→∞ fn) = limn→∞ Φ(fn)

for each increasing sequence {fn}n≥1 of elements from N.

To start with let (Y, F) be a measurable space and X be a non-empty set. We say that a mapping π : X × F → R+∞ is a pre-kernel if π(x, ·) is a measure on F for each x ∈ X.

Proposition 14.1 Let Φ : M(F) → M(X) be a continuous linear mapping and define π : X × F → R+∞ by π(x, F) = Φ(IF)(x). Then π is a pre-kernel.

Proof Let x ∈ X and define Φx : M(F) → R+∞ by Φx(f) = Φ(f)(x) for each f ∈ M(F). Then, since Φ is a continuous linear mapping, Φx : M(F) → R+∞ is also a continuous linear mapping and therefore by Proposition 10.4 the mapping F ↦ Φx(IF) is a measure on F. But we have Φx(IF) = Φ(IF)(x) = π(x, F) and hence π(x, ·) is a measure on F.

Theorem 14.1 For each pre-kernel π there exists a unique continuous linear mapping Φπ : M(F) → M(X) such that Φπ(IF ) = π(·, F ) for all F ∈ F.

Proof In order to emphasise the structure of the proof it is convenient to again employ the notation from Theorem 10.1: Thus if µ is a measure on F then Φµ : M(F) → R+∞ is the unique continuous linear mapping with Φµ(IF) = µ(F) for all F ∈ F. Define a mapping Φπ : M(F) → M(X) by

Φπ(f)(x) = Φπ(x,·)(f)

for all f ∈ M(F), x ∈ X. In particular, Φπ(IF)(x) = Φπ(x,·)(IF) = π(x, F) for all F ∈ F, x ∈ X, i.e., Φπ(IF) = π(·, F) for all F ∈ F.


Let f, g ∈ M(F) and a, b ∈ R+; then, since Φπ(x,·) is linear,

Φπ(af + bg)(x) = Φπ(x,·)(af + bg) = aΦπ(x,·)(f) + bΦπ(x,·)(g) = aΦπ(f)(x) + bΦπ(g)(x) = (aΦπ(f) + bΦπ(g))(x)

for all x ∈ X and thus Φπ(af + bg) = aΦπ(f) + bΦπ(g). This shows Φπ is linear, and so it is also monotone, since by Proposition 9.1 M(F) is complemented.

Now let {fn}n≥1 be an increasing sequence from M(F) with limn fn = f. Then, since Φπ(x,·) is continuous,

Φπ(f)(x) = Φπ(x,·)(f) = limn→∞ Φπ(x,·)(fn) = limn→∞ Φπ(fn)(x) = (limn→∞ Φπ(fn))(x)

for all x ∈ X and thus Φπ(f) = limn Φπ(fn). This shows Φπ is continuous.

Putting this together gives us that Φπ : M(F) → M(X) is a continuous linear mapping with Φπ(IF) = π(·, F) for all F ∈ F. Finally, if Φπ′ : M(F) → M(X) is any continuous linear mapping with Φπ′(IF) = π(·, F) for all F ∈ F then N = {f ∈ M(F) : Φπ′(f) = Φπ(f)} is a complete subspace of M(F) with IF ∈ N for all F ∈ F and therefore by Proposition 9.2 N = M(F), which means that Φπ′ = Φπ.

By Proposition 14.1 the mapping π ↦ Φπ defines a bijection between the set of pre-kernels and the set of continuous linear mappings from M(F) to M(X). If π : X × F → R+∞ is a pre-kernel then we just write π(f) instead of Φπ(f). This means that π will also be considered as the unique continuous linear mapping π : M(F) → M(X) with π(IF) = π(·, F) for all F ∈ F.

Proposition 14.2 Let π : M(F) → M(X) be a pre-kernel. Then π(f)(x) = π(x, ·)(f) for all x ∈ X, f ∈ M(F).

Proof This follows directly from the proof of Theorem 14.1, since this is how the unique mapping Φπ was defined. It also follows from Proposition 9.2 (without appealing to the proof of Theorem 14.1) since it is easily checked that N = {f ∈ M(F) : π(f)(x) = π(x, ·)(f) for all x ∈ X} is a complete subspace of M(F) with IF ∈ N for all F ∈ F.

Theorem 14.2 Let π be a pre-kernel and let {fn}n≥1 be a decreasing sequence from M(F) such that π(fm)(x) < ∞ for all x ∈ X for some m ≥ 1. Then

π(limn→∞ fn) = limn→∞ π(fn) .

Proof This follows immediately from Theorem 10.2.

Let (X, E) be a measurable space (so we now have two measurable spaces (X, E) and (Y, F)).

Lemma 14.1 Let Φ : M(F) → M(X) be a continuous linear mapping. Then Φ(f) ∈ M(E) for all f ∈ M(F) if and only if Φ(IF ) ∈ M(E) for all F ∈ F.

Proof By Lemma 13.1 N = {f ∈ M(F) : Φ(f) ∈ M(E)} is a complete subspace of M(F). Therefore Proposition 9.2 implies that N = M(F) if and only if IF ∈ N for all F ∈ F. In other words, Φ(f) ∈ M(E) for all f ∈ M(F) if and only if Φ(IF ) ∈ M(E) for all F ∈ F.

The statement that Φ : M(F) → M(E) is a linear mapping will be used to express the fact that Φ : M(F) → M(X) is a linear mapping such that Φ(f) ∈ M(E) for all f ∈ M(F).

A mapping π : X × F → R+∞ is called an (X, E)|(Y, F)-kernel if it is a pre-kernel and π(·, F) ∈ M(E) for each F ∈ F. If the measurable spaces (X, E) and (Y, F) can be inferred from the context then π will simply be called a kernel. For example, this is the case if it is known that π is a mapping from X × F to R+∞ and E is the only σ-algebra of subsets of X which has been introduced.

Theorem 14.3 (1) If Φ : M(F) → M(E) is a continuous linear mapping and π : X × F → R+∞ is defined by π(x, F) = Φ(IF)(x) for all x ∈ X, F ∈ F, then π is a kernel.

(2) If π : X × F → R+∞ is a kernel and Φπ : M(F) → M(X) is the unique continuous linear mapping (given in Theorem 14.1) with Φπ(IF) = π(·, F) for all F ∈ F then Φπ(f) ∈ M(E) for all f ∈ M(F). This means that there is a mapping Φπ : M(F) → M(E).

Proof (1) By Proposition 14.1 π is a pre-kernel and π(·, F ) = Φ(IF ) ∈ M(E) for each F ∈ F. Thus π is a kernel.

(2) By definition Φπ(IF ) = π(·, F ) ∈ M(E) holds for all F ∈ F and therefore by Lemma 14.1 Φπ(f) ∈ M(E) for all f ∈ M(F).

By Theorem 14.3 the mapping π ↦ Φπ defines a bijection between the set of (X, E)|(Y, F)-kernels and the set of continuous linear mappings from M(F) to M(E).

As with pre-kernels, we write π(f) instead of Φπ(f) when π : X × F → R⁺∞ is a kernel. Thus π will also be considered as the unique continuous linear mapping π : M(F) → M(E) with π(IF) = π(·, F) for all F ∈ F.

A kernel π : M(F) → M(E) is said to be finite if π(1) ∈ MF(E), i.e., if the measure π(x, ·) is finite for each x ∈ X. Moreover, π is a probability kernel if π(1) = 1, i.e., if π(x, ·) is a probability measure for each x ∈ X. Let us point out that in most applications (including Specifications and their Gibbs states [16]) the kernels involved are defined over a single space, i.e., with (X, E) = (Y, F). However, the results typical to this special situation will be dealt with in [16] when they are needed. In Chapter 13 we already saw a very special kind of kernel: If f : (X, E) → (Y, F) is a measurable mapping then the induced mapping f* : M(F) → M(E) (with f*g = g ∘ f) is continuous and linear and is thus a kernel. As a mapping f* : X × F → R⁺∞ it is given by f*(x, F) = I_{f⁻¹(F)}(x). (This should not be regarded as a typical example of a kernel.)

Now let π : X × F → R⁺∞ be a kernel and µ be a measure on E. Then the mappings π : M(F) → M(E) and µ : M(E) → R⁺∞ can be composed, which results in the mapping µπ : M(F) → R⁺∞ with (µπ)(f) = µ(π(f)) for all f ∈ M(F). This is clearly a continuous linear mapping and hence µπ is a measure on F. Considered as a mapping µπ : F → R⁺∞ it is given by

(µπ)(F ) = µ(π(IF )) = µ(π(·, F )) for each F ∈ F. For those who insist on using integral signs this means that

(µπ)(F) = ∫ π(·, F) dµ

for each F ∈ F. Next let (Z, G) be a further measurable space and consider kernels π : X × F → R⁺∞ and ϱ : Y × G → R⁺∞. Then composing the mappings ϱ : M(G) → M(F) and π : M(F) → M(E) gives the mapping πϱ : M(G) → M(E) with (πϱ)(f) = π(ϱ(f)) for all f ∈ M(G). Now πϱ is clearly a continuous linear mapping and so it is a kernel, and considered as a mapping πϱ : X × G → R⁺∞ it is given by

(πϱ)(x, G) = (πϱ)(IG)(x) = π(ϱ(IG))(x) = π(ϱ(·, G))(x) = π(x, ·)(ϱ(·, G))

for all x ∈ X, G ∈ G, with the final equality following from Proposition 14.2. In terms of integral signs this means that

(πϱ)(x, G) = ∫ ϱ(y, G) π(x, dy)

for all x ∈ X, G ∈ G. Suppose now in addition that we also have a measure µ on E. Then µ(πϱ) = (µπ)ϱ clearly holds, since the composition of mappings is associative.
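In the finite-space picture the two compositions just described are exactly the familiar products of linear algebra: µπ is a row vector times a matrix, and πϱ is a matrix product. A sketch with hypothetical values, which also checks the associativity µ(πϱ) = (µπ)ϱ numerically:

```python
# mu is a measure on X (a row vector), pi an X|Y-kernel, rho a Y|Z-kernel.

def measure_times_kernel(mu, pi):
    """(mu pi)(F) = mu(pi(., F)): the measure mu pi on Y, as a row vector."""
    n = len(pi[0])
    return [sum(mu[x] * pi[x][y] for x in range(len(mu))) for y in range(n)]

def kernel_times_kernel(pi, rho):
    """(pi rho)(x, G) = integral of rho(., G) against pi(x, .): a matrix product."""
    p = len(rho[0])
    return [[sum(pi_row[y] * rho[y][z] for y in range(len(pi_row)))
             for z in range(p)] for pi_row in pi]

mu  = [0.5, 0.5]
pi  = [[0.2, 0.8],
       [0.6, 0.4]]
rho = [[1.0, 0.0],
       [0.5, 0.5]]

print(measure_times_kernel(mu, pi))   # the measure mu pi on Y
print(kernel_times_kernel(pi, rho))   # the kernel pi rho
# associativity mu(pi rho) = (mu pi) rho:
print(measure_times_kernel(mu, kernel_times_kernel(pi, rho)))
print(measure_times_kernel(measure_times_kernel(mu, pi), rho))
```

The last two printed vectors agree, which is the finite-space instance of the associativity remark above.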

Lemma 14.2 Let π : X × F → R⁺∞ and ϱ : Y × G → R⁺∞ be kernels. Then (πϱ)(x, ·) = π(x, ·)ϱ for all x ∈ X. (This states that the measures (πϱ)(x, ·) and π(x, ·)ϱ on G are equal for each x ∈ X.)

Proof Let x ∈ X, G ∈ G; then as above (and thus making use of Proposition 14.2) (πϱ)(x, G) = π(x, ·)(ϱ(·, G)) = (π(x, ·)ϱ)(G), and hence (πϱ)(x, ·) = π(x, ·)ϱ for all x ∈ X.

We end the chapter by presenting a couple of rather technical results which will be important later. These deal with topics such as conditions which uniquely determine a kernel and criteria for showing that a pre-kernel is actually a kernel.

Proposition 14.3 Let π : M(F) → M(X) be a pre-kernel, let f ∈ M(F) and x ∈ X. Then: (1) If π(f)(x) = 0 then π(fg)(x) = 0 for all g ∈ M(F).

(2) If π(f)(x) < ∞ then π(f IF)(x) = π(f)(x) holds for any F ∈ F such that {y ∈ Y : 0 < f(y) < ∞} ⊂ F. Moreover, then also π(hf IF)(x) = π(hf)(x) for all h ∈ M(F).

Proof Since x ∈ X is fixed both parts are just statements about measures. In fact, (1) is just Proposition 10.5 (2) and (2) is Proposition 10.6 (5).

As with a kernel a pre-kernel π : M(F) → M(X) is finite if π(1) ∈ MF(X), i.e., if the measure π(x, ·) is finite for each x ∈ X.

Proposition 14.4 Let S ⊂ F be closed under finite intersections with Y ∈ S and σ(S) = F, and let π1, π2 be finite pre-kernels such that π1(IA) = π2(IA) for all A ∈ S. Then π1 = π2.

Proof This follows immediately from Proposition 3.3.

Lemma 14.3 Let π be a finite pre-kernel such that π(1) ∈ M(E). Then the set D = {A ∈ F : π(IA) ∈ M(E)} is both a d-system and a monotone class.

Proof Let A1, A2 ∈ D with A1 ⊂ A2; then π(IA2) = π(IA1) + π(IA2\A1) and so by Proposition 9.1 π(IA2\A1) ∈ M(E) (since here π(IA2)(x) < ∞ for each x ∈ X). Therefore D is a d-system, since by assumption Y ∈ D. The proof that D is a monotone class is the same as the corresponding part of Proposition 9.2 (making use of Theorems 14.1 and 14.2).

Proposition 14.5 Let S ⊂ F be closed under finite intersections with Y ∈ S and σ(S) = F, and let π : M(F) → M(X) be a finite pre-kernel with π(IA) ∈ M(E) for all A ∈ S. Then π is a kernel, i.e., π(f) ∈ M(E) for all f ∈ M(F).

Proof This is the same as the proof of Proposition 3.3 (using Lemma 14.3 instead of Lemma 3.4).

The requirement in Proposition 14.5 that Y ∈ S just means that π(1) ∈ M(E). In many applications this holds trivially because π is a probability pre-kernel, i.e., π(1) = 1. Here is a version of Proposition 14.5 which also works for pre-kernels which are, in a certain sense, only σ-finite. This will be needed in Chapter 15.

Lemma 14.4 Let S ⊂ F be closed under finite intersections with σ(S) = F. Let π : M(F) → M(X) be a pre-kernel and v ∈ MF⁺(F) with π(v) ∈ MF(E), and suppose that π(vIA) ∈ M(E) for all A ∈ S. Then π is a kernel, i.e., π(f) ∈ M(E) for all f ∈ M(F).

Proof Define π′ : M(F) → M(X) by π′(f) = π(vf) for all f ∈ M(F). Then π′ is a finite pre-kernel and by Proposition 14.5 (applied to S′ = S ∪ {Y}) π′ is a kernel, i.e., π(vf) ∈ M(E) for all f ∈ M(F). But v⁻¹f ∈ M(F) for all f ∈ M(F) and vv⁻¹ = 1, and this implies that π(f) ∈ M(E) for all f ∈ M(F).

Finally, the following result shows how it is sometimes possible to show that a pre-kernel is a kernel by restricting things to suitable subsets of X.

Lemma 14.5 Let π : M(F) → M(X) be a pre-kernel and let {En}n≥1 be a sequence from E with X = ⋃n≥1 En; for each n ≥ 1 there is then the restriction πn : M(F) → M(En) defined by πn(f)(x) = π(f)(x) for each x ∈ En, and clearly πn is a pre-kernel. Suppose each πn is a kernel, i.e., that πn(f) ∈ M(En) for all f ∈ M(F), n ≥ 1, where En is the trace σ-algebra of E on En. Then π is a kernel.

Proof Let f ∈ M(F), a ∈ R⁺∞ and put Dn = {x ∈ En : πn(f)(x) < a}. Then Dn ∈ En and so Dn ∈ E (with Dn here considered as a subset of X). Therefore

{x ∈ X : π(f)(x) < a} = ⋃n≥1 Dn ∈ E

and so by Lemma 2.7 π(f) ∈ M(E). This shows that π is a kernel.

15 Product measures

In this chapter we first discuss the product of two σ-finite measures. This measure is unique and has several useful properties (Fubini's theorem). We then show that a product of two arbitrary measures exists (although in general it is neither unique nor does it have any reasonable properties). The product of finitely many σ-finite measures is considered, as well as a countable product of probability measures. Finally we look at something which we call an implicit product: this can be seen as trying to construct a product measure without explicitly knowing the underlying product structure. To start with let (X, E) and (Y, F) be measurable spaces, let µ be a σ-finite measure on E and ν a σ-finite measure on F. We show that there is a unique measure µ × ν on E × F such that

(µ × ν)(E × F) = µ(E)ν(F)

for all E ∈ E, F ∈ F. This measure is called the product of µ and ν. It is σ-finite, and it is finite if both µ and ν are finite. Recall from Chapter 2 that the product σ-algebra E × F is defined to be σ(R), where R is the set of measurable rectangles, which in the present case (with just two factors) are the sets of the form E × F with E ∈ E and F ∈ F.

For the moment make no assumptions about µ and ν. Let f ∈ M(E × F); by Proposition 2.7 fx ∈ M(F) for each x ∈ X and f^y ∈ M(E) for each y ∈ Y, recalling that the sections fx and f^y are defined by fx(y′) = f(x, y′) for all y′ ∈ Y and f^y(x′) = f(x′, y) for all x′ ∈ X. There is thus a mapping ν̂(f) : X → R⁺∞ defined by

ν̂(f)(x) = ν(fx)

for each x ∈ X, and a mapping µ̂(f) : Y → R⁺∞ defined for each y ∈ Y by

µ̂(f)(y) = µ(f^y) .

This results in mappings ν̂ : M(E × F) → M(X) and µ̂ : M(E × F) → M(Y).

Lemma 15.1 The mappings ν̂ and µ̂ are linear and continuous, and thus they are both pre-kernels. Moreover, these pre-kernels are finite if µ and ν are finite, since ν̂(1)(x) = ν(1) for all x ∈ X and µ̂(1)(y) = µ(1) for all y ∈ Y.

Proof Let f, g ∈ M(E × F) and a, b ∈ R⁺; then for each x ∈ X

ν̂(af + bg)(x) = ν((af + bg)x) = ν(afx + bgx) = aν(fx) + bν(gx) = aν̂(f)(x) + bν̂(g)(x) ,


i.e., ν̂(af + bg) = aν̂(f) + bν̂(g). Thus ν̂ is linear. Now if g ≤ f then gx ≤ fx and so ν̂(g)(x) = ν(gx) ≤ ν(fx) = ν̂(f)(x) for each x ∈ X, i.e., ν̂(g) ≤ ν̂(f), and hence ν̂ is monotone. Next, let {fn}n≥1 be an increasing sequence from M(E × F) and put f = limn fn. Then {(fn)x}n≥1 is an increasing sequence from M(F) with limn (fn)x = fx and thus with

ν̂(f)(x) = ν(fx) = lim_{n→∞} ν((fn)x) = lim_{n→∞} ν̂(fn)(x)

for each x ∈ X. Therefore ν̂(f) = limn ν̂(fn), and so ν̂ is continuous. Finally ν̂(1)(x) = ν(1x) = ν(1) for each x ∈ X. The proof for µ̂ is of course exactly the same.

Now suppose µ and ν are σ-finite measures.

Lemma 15.2 The mappings ν̂ and µ̂ are both kernels: ν̂(f) ∈ M(E) and µ̂(f) ∈ M(F) for each f ∈ M(E × F).

Proof By Lemma 10.3 there exists v ∈ MF⁺(F) such that ν(v) < ∞. Define a mapping v′ : X × Y → R⁺∞ by v′(x, y) = v(y) for all x ∈ X, y ∈ Y; thus v′ = v ∘ p₂ (with p₂ : X × Y → Y the projection) and hence v′ ∈ MF⁺(E × F). Let R = E × F ∈ R; then ν̂(v′IR)(x) = ν(IE(x)vIF) = IE(x)ν(vIF) for each x ∈ X, i.e., ν̂(v′IR) = ν(vIF)IE, and thus ν̂(v′IR) ∈ M(E) for each R ∈ R. Hence by Lemma 14.4 ν̂ is a kernel, since R is closed under finite intersections, X × Y ∈ R and E × F = σ(R). The proof for µ̂ is exactly the same.

By Lemma 15.2 ν̂ and µ̂ are kernels and so we now have the measures µν̂ and νµ̂ on E × F. Note that

(µν̂)(1) = µ(ν̂(1)) = µ(X)ν(Y) = ν(µ̂(1)) = (νµ̂)(1)

and so these measures are finite if µ and ν are finite.

Lemma 15.3 µν̂ = νµ̂, and in particular

(µν̂)(E × F) = µ(E)ν(F) = (νµ̂)(E × F)

for all E ∈ E, F ∈ F. Moreover, the measure µν̂ (= νµ̂) is σ-finite.

Proof By Lemma 10.3 there exist u ∈ MF⁺(E) and v ∈ MF⁺(F) with µ(u) < ∞ and ν(v) < ∞. Define w : X × Y → R⁺∞ by w(x, y) = u(x)v(y) for all x ∈ X, y ∈ Y; thus w ∈ MF⁺(E × F). Put ω = (µν̂)·w and ω′ = (νµ̂)·w. If R = E × F ∈ R then

ω(E × F) = (µν̂)(wIE×F) = µ(ν̂(wIE×F)) = µ(ν(vIF)uIE) = µ(uIE)ν(vIF)
= ν(µ(uIE)vIF) = ν(µ̂(wIE×F)) = (νµ̂)(wIE×F) = ω′(E × F) .

This shows (with E = X and F = Y) that the measures ω and ω′ are finite and ω(R) = ω′(R) for all R ∈ R. But X × Y ∈ R, R is closed under finite intersections and σ(R) = E × F, and thus by Proposition 3.3 ω = ω′. It now follows from Lemma 12.2 that

µν̂ = ((µν̂)·w)·w⁻¹ = ω·w⁻¹ = ω′·w⁻¹ = ((νµ̂)·w)·w⁻¹ = νµ̂ ,

and in particular

(µν̂)(E × F) = µ(ν̂(IE×F)) = µ(ν(IF)IE) = µ(E)ν(F) = ν(µ(IE)IF) = ν(µ̂(IE×F)) = (νµ̂)(E × F)

for all E ∈ E, F ∈ F. Finally, by Lemma 10.3 the measure µν̂ is σ-finite, since (µν̂)(w) = µ(u)ν(v) < ∞.

The measure µν̂ (= νµ̂) will be denoted by µ × ν. As already mentioned, it is called the product of µ and ν. If µ and ν are finite then µ × ν is also finite, since (µ × ν)(X × Y) = µ(X)ν(Y).

Theorem 15.1 µ × ν is the unique measure on E × F satisfying

(µ × ν)(E × F) = µ(E)ν(F)

for all E ∈ E, F ∈ F. Moreover, (µ × ν)(f) = µ(ν̂(f)) = ν(µ̂(f)) holds for all f ∈ M(E × F).

Proof We only have to show the uniqueness, since the rest is Lemma 15.3. Thus let ω1, ω2 be measures on E × F with ω1(E × F) = µ(E)ν(F) = ω2(E × F) for all E ∈ E, F ∈ F. Let R′ = {E × F ∈ R : µ(E) < ∞ and ν(F) < ∞}. Now since µ and ν are σ-finite there exists an increasing sequence {En}n≥1 from E with µ(En) < ∞ for each n ≥ 1 and X = ⋃n≥1 En, and an increasing sequence {Fn}n≥1 from F with ν(Fn) < ∞ for each n ≥ 1 and Y = ⋃n≥1 Fn. Then {En × Fn}n≥1 is an increasing sequence from R′ with X × Y = ⋃n≥1 En × Fn. Moreover, R′ is closed under finite intersections and σ(R′) = E × F (since R ⊂ σ(R′)). But the numbers ω1(R) and ω2(R) are finite and equal for all R ∈ R′ and therefore by Lemma 3.5 ω1 = ω2.

The final statement in Theorem 15.1 (that (µ × ν)(f) = µ(ν̂(f)) = ν(µ̂(f)) for all f ∈ M(E × F)) is more-or-less what is known as Fubini's theorem.
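On finite spaces Fubini's theorem reduces to interchanging two finite sums; the following sketch (hypothetical values) evaluates (µ × ν)(f) via both iterated integrals and obtains the same number.

```python
# mu a measure on X = {0,1}, nu a measure on Y = {0,1,2}, f a function on X x Y
mu = [1.0, 2.0]
nu = [0.5, 0.5, 1.0]
f = [[1.0, 0.0, 2.0],
     [3.0, 1.0, 1.0]]

# integrate out y first: x -> nu(f_x), then integrate against mu
via_nu_first = sum(mu[x] * sum(nu[y] * f[x][y] for y in range(3))
                   for x in range(2))
# integrate out x first: y -> mu(f^y), then integrate against nu
via_mu_first = sum(nu[y] * sum(mu[x] * f[x][y] for x in range(2))
                   for y in range(3))

print(via_nu_first, via_mu_first)  # equal, by Fubini's theorem
```

Both iterated sums give the same value; for non-σ-finite measures this interchange can fail, which is exactly the point of the next paragraph.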

We next show that there always exists a measure µ × ν on E × F such that (µ × ν)(E × F) = µ(E)ν(F) for all E ∈ E, F ∈ F, even when µ and ν are not σ-finite. However, the analogue of Fubini's theorem does not hold in general, and in this case the measure is not very useful. Moreover, it need not be unique. Thus in what follows let µ and ν be arbitrary measures on E and F respectively. The mappings ν̂ and µ̂ are defined as before and Lemma 15.1 shows that ν̂ and µ̂ are pre-kernels. Again denote by A (resp. by A′) the set of all finite unions (resp. all finite disjoint unions) of elements of R. Then by Lemma 2.8 A is an algebra and A′ = A. Moreover, σ(A) = σ(R). Let

N = {f ∈ M(E × F) : ν̂(f) ∈ M(E)} ,  N′ = {f ∈ M(E × F) : µ̂(f) ∈ M(F)} .

Lemma 15.4 N and N′ are complete subspaces of M(E × F) and they both contain ME(A).

Proof By Lemmas 13.1 (2) and 15.1 it follows that N and N′ are complete subspaces of M(E × F). Also if R = E × F ∈ R then, as in the proof of Lemma 15.2, ν̂(IR) = ν(F)IE ∈ M(E) and so IR ∈ N for all R ∈ R. But if A ∈ A′ = A then there exist disjoint R1, . . . , Rn ∈ R such that IA = IR1 + · · · + IRn and hence IA ∈ N for all A ∈ A. Therefore by Lemma 9.2 (1) ME(A) ⊂ N. In the same way ME(A) ⊂ N′.

By Lemma 15.4 we now have continuous linear mappings µν̂ : N → R⁺∞ and νµ̂ : N′ → R⁺∞. Define mappings ω1, ω2 : A → R⁺∞ by ω1(A) = (µν̂)(IA) and ω2(A) = (νµ̂)(IA) for all A ∈ A. Then Proposition 10.1 (1) shows that ω1 and ω2 are measures on A and therefore by Theorem 4.1 they extend to measures on σ(A) = E × F, which will also be denoted by ω1 and ω2 respectively (although these extensions are, in general, not unique). Moreover,

ω1(E × F) = (µν̂)(IE×F) = µ(E)ν(F) = (νµ̂)(IE×F) = ω2(E × F)

for all E ∈ E, F ∈ F. This means that ω1 and ω2 are both candidates for a 'product measure', but in general they will not be equal (even if ω1 and ω2 are uniquely determined by their restrictions to A). There is no problem in extending Theorem 15.1 from the product of two to the product of finitely many factors. Let n ≥ 2 and for each k = 1, . . . , n let (Xk, Ek) be a measurable space. Put X = X1 × · · · × Xn and E = E1 × · · · × En.

Theorem 15.2 For each k = 1, . . . , n let µk be a σ-finite measure on Ek. Then there exists a unique measure µ on E such that

µ(E1 × · · · × En) = µ1(E1) · · · µn(En)

for all Ek ∈ Ek, k = 1, . . . , n. The measure µ is σ-finite, and it is finite if the measures µ1, . . . , µn are all finite.

Proof If X is considered as the product of X1 × · · · × Xn−1 and Xn then it is easy to see that E = (E1 × · · · × En−1) × En. The existence thus follows by applying Theorem 15.1 n − 1 times. The uniqueness then follows exactly as in the proof of Theorem 15.1.

We next show that a countable product of probability measures exists. Let S be a countably infinite set and for each s ∈ S let (Xs, Es) be a measurable space. Put X = ∏s∈S Xs and E = ∏s∈S Es. Let R be the set of measurable rectangles, i.e., sets of the form ∏s∈S Es with Es ∈ Es for each s and Es ≠ Xs for only finitely many s ∈ S; thus by definition E = σ(R).

Theorem 15.3 For each s ∈ S let µs be a probability measure on Es. Then there exists a unique probability measure µ on E such that

µ(∏s∈S Es) = ∏s∈S µs(Es)

for each measurable rectangle ∏s∈S Es ∈ R (and note that there is no problem with the product ∏s∈S µs(Es) since µs(Es) = 1 for all but finitely many s).

Proof The uniqueness follows immediately from Proposition 3.2, since R is closed under finite intersections and E = σ(R). The proof of the existence which follows is somewhat sketchy; the reader is left to fill in the details. Recall from Lemma 2.8 that if A (resp. A′) is the set of all finite unions (resp. all finite disjoint unions) of elements of R then A is an algebra and A′ = A; moreover, σ(A) = σ(R). By enumerating the elements of S we can assume without loss of generality that S = N⁺. For each n ≥ 1 we thus have a measurable space (Xn, En) and a probability measure µn on En, and X = ∏n≥1 Xn and E = ∏n≥1 En.

For each n ≥ 1 let Yn = ∏k≥n Xk and Fn = ∏k≥n Ek. Then (Y1, F1) = (X, E), and if Yn is identified with Xn × Yn+1 in the usual way then Fn = En × Fn+1. For each n ≥ 1 define µ̂n : M(Fn) → M(Yn+1) by µ̂n(f)(y) = µn(f^y). Then by Lemmas 15.1 and 15.2 µ̂n is a continuous linear mapping from M(Fn) to M(Fn+1) and so µ̂n is a probability kernel.

Let n ≥ 1; denote by Rn the set of measurable rectangles occurring in the definition of Fn and let An be the set of all finite unions of elements from Rn. Thus by Lemma 2.8 An is an algebra and σ(An) = Fn. In particular R = R1 and A = A1. Now µ̂n(f) ∈ ME(An+1) for each f ∈ ME(An) and so we can consider µ̂n as a linear mapping from ME(An) to ME(An+1). Moreover, for each f ∈ ME(An) there exists p ≥ 1 such that (µ̂n+p−1 ∘ · · · ∘ µ̂n)(f) is a constant mapping in ME(An+p) and, since µ̂m(1) = 1, we still obtain a constant mapping in ME(An+q) with the same value if p is replaced by q ≥ p. This allows us to define a mapping νn : ME(An) → R⁺∞, where νn(f) is the value of the constant mapping (µ̂n+p−1 ∘ · · · ∘ µ̂n)(f) with p large enough.

The mapping νn : ME(An) → R⁺∞ is linear and thus by Proposition 10.1 (1) the mapping A ↦ νn(IA) from An to R⁺∞ (which we also denote by νn) is a finite finitely additive measure on An. In particular ν1 is a finite finitely additive measure on A1 = A and a direct calculation shows that

ν1(∏n≥1 En) = ∏n≥1 µn(En)

for each measurable rectangle ∏n≥1 En. Therefore it is enough to show that ν1 is a measure on A1, since the extension of ν1 to σ(A1) is then a measure on σ(A1) = F1 = E with the required property.

Fix n ≥ 1 and note that νn = νn+1 ∘ µ̂n. Consider Yn as Xn × Yn+1; then the section fx is in ME(An+1) for each f ∈ ME(An), x ∈ Xn, and so there is a mapping ν′n+1 : ME(An) → M(Xn) given by ν′n+1(f)(x) = νn+1(fx). Moreover, ν′n+1(f) ∈ M(En) and µn(ν′n+1(f)) = νn+1(µ̂n(f)) = νn(f) for all f ∈ ME(An) (this holds when f = IR with R ∈ Rn and hence by linearity it holds for all f ∈ ME(An)). Let ε > 0 and let {fm}m≥1 be a decreasing sequence from ME(An) with νn(fm) ≥ ε for all m ≥ 1. Then {ν′n+1(fm)}m≥1 is a decreasing sequence from M(En) with µn(ν′n+1(fm)) = νn(fm) ≥ ε; also µn(ν′n+1(f1)) = νn(f1) < ∞. Thus by Theorem 10.2

µn(lim_{m→∞} ν′n+1(fm)) = lim_{m→∞} µn(ν′n+1(fm)) ≥ ε

and so there exists x ∈ Xn with (limm ν′n+1(fm))(x) ≥ ε/2. This means there exists x ∈ Xn such that νn+1((fm)x) ≥ ε/2 for all m ≥ 1; moreover, {(fm)x}m≥1 is a decreasing sequence from ME(An+1).

We now iterate this process, starting with a decreasing sequence {fm}m≥1 from ME(A1) with ν1(fm) ≥ ε for all m ≥ 1. For each n ≥ 1 there then exists xn ∈ Xn such that νn+1(fm(x1, . . . , xn)) ≥ 2⁻ⁿε for all m ≥ 1, where g(x1) = gx1 and g(x1, . . . , xn) = (g(x1, . . . , xn−1))xn.

Finally, consider a decreasing sequence {Am}m≥1 from A1 with ν1(Am) ≥ ε > 0 for each m ≥ 1 and apply the above with fm = IAm. There therefore exists an element x = {xn}n≥1 ∈ X = Y1 such that νn+1(IAm(x1, . . . , xn)) ≥ 2⁻ⁿε for all n ≥ 1 and all m ≥ 1.

Fix m ≥ 1; now the mapping IAm(x1, . . . , xn) ∈ ME(An+1) only takes on the values 0 and 1, and hence (since νn+1(IAm(x1, . . . , xn)) > 0) there exists for each n ≥ 1 at least one point yn+1 ∈ Yn+1 with IAm(x1, . . . , xn)(yn+1) = 1. But this means that the element (x1, . . . , xn, yn+1) of Y1 lies in Am for each n ≥ 1, and since Am ∈ A, this in turn implies that x ∈ Am.

We have thus shown that x ∈ ⋂m≥1 Am, and so ⋂m≥1 Am ≠ ∅. Hence if {Am}m≥1 is a decreasing sequence from A1 with ⋂m≥1 Am = ∅ then limm ν1(Am) = 0, i.e., ν1 is ∅-continuous and therefore by Proposition 3.2 ν1 is a measure on A1. This completes the proof of Theorem 15.3.
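For a concrete instance of Theorem 15.3, take each factor to be {0, 1} with a Bernoulli-type measure; the measure of a cylinder set is then the finite product appearing in the displayed formula. A small sketch with hypothetical success probabilities:

```python
# For each coordinate n a probability measure on {0, 1} given by
# p[n] = mu_n({1}); the values are hypothetical.
p = [0.5, 0.25, 0.75, 0.5]

def cylinder_measure(constraints):
    """Measure of the cylinder set {z : z_n = v for (n, v) in constraints}.
    Unconstrained coordinates contribute the factor mu_n({0, 1}) = 1."""
    out = 1.0
    for n, v in constraints.items():
        out *= p[n] if v == 1 else 1.0 - p[n]
    return out

print(cylinder_measure({0: 1, 2: 0}))  # p[0] * (1 - p[2]) = 0.5 * 0.25 = 0.125
```

Only finitely many coordinates are ever constrained, which is exactly why the infinite product in the theorem causes no difficulty.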

We now turn to the final topic of this chapter and look at what will be called an implicit product. If (Y, F) is a measurable space then P(F) will denote the set of probability measures on F.

Let (X, E) be the product of measurable spaces (X1, E1) and (X2, E2) and for each k = 1, 2 let Fk = pk⁻¹(Ek) with pk : X → Xk the projection onto the k-th component. Thus F1 consists of all sets of the form E1 × X2 with E1 ∈ E1 and F2 of all sets of the form X1 × E2 with E2 ∈ E2, and in particular E = F1 ∨ F2, where F1 ∨ F2 = σ(F1 ∪ F2) is the smallest σ-algebra containing both F1 and F2.

Now consider probability measures µ1 ∈ P(E1) and µ2 ∈ P(E2) and let µ be the product of µ1 and µ2. Also let ν1 be the measure on F1 with ν1(E1 ×X2) = µ1(E1) for all E1 ∈ E1 and ν2 the measure on F2 with ν2(X1 × E2) = µ2(E2) for all E2 ∈ E2, thus in fact µk = (pk)∗νk is the image of νk under pk for k = 1, 2. Then µ(F1 ∩ F2) = ν1(F1)ν2(F2) for all F1 ∈ F1, F2 ∈ F2. We are interested in the following converse: Let (X, E) be a measurable space and F1, F2 be sub-σ-algebras of E with E = F1 ∨F2. Then when is it the case that for any probability measures ν1 ∈ P(F1) and ν2 ∈ P(F2) there exists a probability measure µ on E such that µ(F1 ∩ F2) = ν1(F1)ν2(F2) for all F1 ∈ F1, F2 ∈ F2? This can be seen as being able to construct a product measure without explicitly knowing the underlying product structure on the set X.

Note that in the situation above the following property holds: If F1 ∈ F1, F2 ∈ F2 with F1 ≠ ∅, F2 ≠ ∅ then F1 ∩ F2 ≠ ∅. This is because F1 has the form E1 × X2 and F2 the form X1 × E2, and then F1 ∩ F2 = E1 × E2 ≠ ∅ (since E1 ≠ ∅ and E2 ≠ ∅). In order to avoid subscripts as much as possible let us state the problem again, just renaming the σ-algebras and measures. Thus let X be a set and E and F be two σ-algebras of subsets of X. Now if µ ∈ P(E) and ν ∈ P(F) then by Proposition 3.3 there is at most one probability measure ω on E ∨ F such that ω(E ∩ F) = µ(E)ν(F) for all E ∈ E, F ∈ F, and if this measure exists then it will be called the implicit product of µ and ν. The question now becomes: When does the implicit product of µ and ν exist for all µ ∈ P(E), ν ∈ P(F)? It turns out that this is the case if and only if E and F have the property enjoyed by the σ-algebras F1 and F2 in the original situation. More precisely, let us say that E and F are weakly independent if E ∩ F ≠ ∅ for all E ∈ E, F ∈ F with E ≠ ∅, F ≠ ∅. Theorem 15.4 below states in part that the implicit product of µ and ν exists for all µ ∈ P(E), ν ∈ P(F) if and only if E and F are weakly independent. In many applications what is really needed is not just a measure but a kernel. Let π : M(E ∨ F) → M(E) be a probability kernel, so π(x, ·) is a probability measure on E ∨ F for each x ∈ X and π(·, B) ∈ M(E) for each B ∈ E ∨ F. Then π is called an implicit product kernel for ν ∈ P(F) if

π(x, E ∩ F) = IE(x)ν(F)

for all E ∈ E, F ∈ F, x ∈ X. Again, if π exists then by Proposition 14.4 it is unique and will thus be referred to as the implicit product kernel for ν.

For each x ∈ X let εx be the element of P(F) with εx(F ) = IF (x) for all F ∈ F.

Theorem 15.4 The following statements are equivalent: (1) The implicit product of µ and ν exists for all µ ∈ P(E), ν ∈ P(F). (2) The implicit product kernel exists for all ν ∈ P(F).

(3) The implicit product of εx and εy exists for all x, y ∈ X (with εx considered as an element of P(E) and εy as an element of P(F)). (4) The σ-algebras E and F are weakly independent.

Proof (1) ⇒ (3): This is clear.

(3) ⇒ (2): For each x, y ∈ X let εx,y be the implicit product of εx and εy, i.e., εx,y is the unique element of P(E ∨ F) such that εx,y(E ∩ F ) = IE(x)IF (y) for all E ∈ E, F ∈ F. Fix x ∈ X and define a pre-kernel πx : M(E ∨ F) → M(X) by

πx(y, B) = εx,y(B) .

Now πx(·, B) ∈ M(F) for each B ∈ R, where R = {E ∩ F : E ∈ E, F ∈ F}, and therefore by Proposition 14.5 πx : M(E ∨ F) → M(F) is a probability kernel. Let ν ∈ P(F) and define π : X × (E ∨ F) → R⁺∞ by

π(x, B) = ν(πx(·, B)) ;

it will be shown that π is the implicit product kernel for ν. Note first that π(x, ·) = νπx and so in particular π(x, ·) ∈ P(E ∨ F) for each x ∈ X (which means that π is a pre-kernel). Moreover,

π(x, E ∩ F ) = ν(εx,·(E ∩ F )) = ν(IE(x)IF ) = IE(x)ν(F ) for all E ∈ E, F ∈ F, and in particular π(·, R) ∈ M(E) for each R ∈ R. Thus by Proposition 14.5 π(·, B) ∈ M(E) for each B ∈ E ∨ F. (2) ⇒ (1): Let µ ∈ P(E) and ν ∈ P(F), and by assumption there exists an implicit product kernel π for ν. Let ω = µπ, i.e., ω ∈ P(E ∨ F) is the measure defined by ω(B) = µ(π(·, B)) for all B ∈ E ∨ F. Then

ω(E ∩ F) = µ(π(·, E ∩ F)) = µ(IE ν(F)) = µ(E)ν(F)

for all E ∈ E, F ∈ F, and so ω is the implicit product of µ and ν.

(3) ⇒ (4): Let E ∈ E, F ∈ F with E ≠ ∅ and F ≠ ∅. Choose x ∈ E, y ∈ F and let εx,y be the implicit product of εx and εy. Then εx,y(E ∩ F) = IE(x)IF(y) = 1 and so in particular E ∩ F ≠ ∅.

(4) ⇒ (3): This is the only part which is not trivial and its proof will occupy the rest of the chapter. Let I denote the set of all subsets of X having the form E ∩ F with E ∈ E and F ∈ F. Then E ∪ F ⊂ I ⊂ E ∨ F and hence E ∨ F = σ(I). Moreover, denote by C (resp. by C′) the set of all finite unions (resp. all finite disjoint unions) of elements of I. The following result is the analogue of Lemma 2.8 (and the proof is essentially just copied from the proof of Lemma 2.8):

Lemma 15.5 C is an algebra and C′ = C. Moreover, σ(C) = E ∨ F.

Proof It is immediate that σ(C) = E ∨ F since I ⊂ C and C ⊂ σ(I). Now C′ ⊂ C and C is closed under finite unions, and hence it will follow that C is an algebra with C′ = C once we show that if C ∈ C then C and X \ C are both elements of C′. Thus consider C ∈ C, so C has the form ⋃_{k=1}^{n} Ek ∩ Fk with E1, . . . , En ∈ E and F1, . . . , Fn ∈ F. Let S = {E1, . . . , En} and T = {F1, . . . , Fn} and let U be the subset of P(X) consisting of all elements of the form S ∩ T with S ∈ p(S) and T ∈ p(T) (where p(S) and p(T) are as in Lemma 2.2). Then U ⊂ I (since p(S) ⊂ E and p(T) ⊂ F) and by Lemma 2.2 (1) the elements of U form a partition of X. Furthermore, by Lemma 2.2 (2) it follows that if U ∈ U then either U ⊂ C or U ∩ C = ∅, and this implies that C is the (disjoint) union of the elements of U it contains, and the same holds true of X \ C. In particular, C and X \ C are both elements of C′.

Suppose now that E and F are weakly independent and let x, y ∈ X. We want to construct a measure ω on E ∨ F such that ω(E ∩ F) = IE(x)IF(y) for all E ∈ E, F ∈ F, and by Lemma 15.5 and Theorem 11.2 it is enough to construct a measure ω on C with this property.
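On a small finite set, weak independence can be checked by brute force. The sketch below (hypothetical partitions; the helper names are ours) generates two σ-algebras as unions of partition blocks and tests the defining condition.

```python
from itertools import combinations

def sigma_from_partition(blocks):
    """On a finite set the sigma-algebra generated by a partition consists
    of all unions of blocks (including the empty union)."""
    sets = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            sets.add(frozenset().union(*combo))
    return sets

def weakly_independent(E, F):
    """E and F are weakly independent if nonempty members always intersect."""
    return all(e & f for e in E for f in F if e and f)

X = {0, 1, 2, 3}
E = sigma_from_partition([frozenset({0, 1}), frozenset({2, 3})])
F = sigma_from_partition([frozenset({0, 2}), frozenset({1, 3})])
G = sigma_from_partition([frozenset({0}), frozenset({1, 2, 3})])

print(weakly_independent(E, F))  # True: the two partitions cross each other
print(weakly_independent(E, G))  # False: {2, 3} and {0} are disjoint
```

Here E and F behave like the two factor σ-algebras of a product (every block of one partition meets every block of the other), while E and G do not.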

Lemma 15.6 Let {En}n≥1 be a sequence from E and {Fn}n≥1 a sequence from F, and let E ∈ E, F ∈ F with E ≠ ∅, F ≠ ∅ and E ∩ F ⊂ ⋃n≥1 En ∩ Fn. Then E ⊂ ⋃n≥1 En and F ⊂ ⋃n≥1 Fn.

Proof Since E \ ⋃n≥1 En ∈ E, F ≠ ∅ and

(E \ ⋃n≥1 En) ∩ F = (E ∩ F) \ ⋃n≥1 En ⊂ (E ∩ F) \ ⋃n≥1 En ∩ Fn = ∅ ,

it follows from the weak independence of E and F that E \ ⋃n≥1 En = ∅, and thus E ⊂ ⋃n≥1 En. The same argument shows that F ⊂ ⋃n≥1 Fn.

Lemma 15.7 Let {En}n≥1 be a sequence from E and {Fn}n≥1 a sequence from F such that the sequence {En ∩ Fn}n≥1 is disjoint; also let E ∈ E, F ∈ F with x ∈ E, y ∈ F and E ∩ F ⊂ ⋃n≥1 En ∩ Fn. Then there is exactly one index p ≥ 1 such that x ∈ Ep and y ∈ Fp.

Proof Suppose x ∈ Ep, y ∈ Fp and also x ∈ Eq, y ∈ Fq. Then Ep ∩ Eq ≠ ∅ and Fp ∩ Fq ≠ ∅, and hence by weak independence (Ep ∩ Eq) ∩ (Fp ∩ Fq) = (Ep ∩ Fp) ∩ (Eq ∩ Fq) ≠ ∅. But this is only possible if q = p, since the sequence {En ∩ Fn}n≥1 is disjoint. Hence there is at most one index p ≥ 1 such that x ∈ Ep and y ∈ Fp.

Let M = {n ≥ 1 : x ∈ En}; then M ≠ ∅, since by Lemma 15.6 E ⊂ ⋃n≥1 En. Put E′ = ⋂n∈M En \ ⋃m∉M Em; then E′ ∈ E, x ∈ E′ (and so E ∩ E′ ≠ ∅) and

(E ∩ E′) ∩ F ⊂ ⋃n≥1 (En ∩ E′) ∩ Fn = ⋃n∈M (En ∩ E′) ∩ Fn .

Thus by Lemma 15.6 F ⊂ ⋃n∈M Fn and so there exists p ∈ M with y ∈ Fp (and x ∈ Ep since p ∈ M).

Lemma 15.8 Let E1, . . . , Em, E′1, . . . , E′n ∈ E and F1, . . . , Fm, F′1, . . . , F′n ∈ F with (Ej ∩ Fj) ∩ (Ek ∩ Fk) = ∅ and (E′j ∩ F′j) ∩ (E′k ∩ F′k) = ∅ whenever j ≠ k, and such that ⋃_{j=1}^{m} Ej ∩ Fj = ⋃_{k=1}^{n} E′k ∩ F′k. Then

∑_{j=1}^{m} IEj(x)IFj(y) = ∑_{k=1}^{n} IE′k(x)IF′k(y) .

Moreover, the sums occurring here can only take on the values 0 and 1.

Proof Suppose ∑_{j=1}^{m} IEj(x)IFj(y) > 0; then there exists an index j with x ∈ Ej and y ∈ Fj. Thus, since Ej ∩ Fj ⊂ ⋃_{k=1}^{n} E′k ∩ F′k, it follows from Lemma 15.7 that there is exactly one index k such that x ∈ E′k and y ∈ F′k, which implies that ∑_{k=1}^{n} IE′k(x)IF′k(y) = 1. In particular ∑_{k=1}^{n} IE′k(x)IF′k(y) > 0 and so the same argument now shows that ∑_{j=1}^{m} IEj(x)IFj(y) = 1. Therefore if one of these sums is not zero then they are both equal to one; hence they are always equal (and can only take the values 0 and 1).

By Lemmas 15.5 and 15.8 there is a unique mapping ω : C → R⁺∞ such that

ω(⋃_{k=1}^{n} Ek ∩ Fk) = ∑_{k=1}^{n} IEk(x)IFk(y)

whenever E1, . . . , En ∈ E, F1, . . . , Fn ∈ F with E1 ∩ F1, . . . , En ∩ Fn disjoint, and ω can only take on the values 0 and 1.

Lemma 15.9 ω is a measure on C.

Proof It is clear that ω(∅) = 0 and that ω is additive. Let {Cn}n≥1 be a disjoint sequence from C with C = ⋃n≥1 Cn ∈ C. If ω(C) = 0 then ω(Cn) = 0 for all n ≥ 1 (since ω(Cn) + ω(C \ Cn) = ω(C) = 0) and then ω(C) = ∑n≥1 ω(Cn) holds trivially. We can thus assume that ω(C) = 1, and so there exist E ∈ E and F ∈ F with E ∩ F ⊂ C and x ∈ E, y ∈ F. Now for each n ≥ 1 there exist

En1, . . . , Enpn ∈ E and Fn1, . . . , Fnpn ∈ F with En1 ∩ Fn1, . . . , Enpn ∩ Fnpn disjoint and such that Cn = ⋃_{k=1}^{pn} Enk ∩ Fnk. Hence the elements Enk ∩ Fnk, k = 1, . . . , pn, n ≥ 1, are all disjoint and

E ∩ F ⊂ C = ⋃n≥1 Cn = ⋃n≥1 ⋃_{k=1}^{pn} Enk ∩ Fnk .

Therefore by Lemma 15.7 there exist m ≥ 1 and 1 ≤ k ≤ pm such that x ∈ Emk and y ∈ Fmk, which implies that ω(Cm) = 1. But

∑_{k=1}^{n} ω(Ck) = ω(⋃_{k=1}^{n} Ck) ≤ ω(C) = 1

for each n ≥ 1 and thus ∑n≥1 ω(Cn) = 1 = ω(C). This shows that ω is a measure on C.

This completes the proof of (4) ⇒ (3), and hence the proof of Theorem 15.4.

16 Countably generated measurable spaces

A measurable space (X, E) is said to be countably generated if the σ-algebra E is, where a σ-algebra E is countably generated if E = σ(I) for some countable subset I of E. In the present chapter we look at some of the properties enjoyed by such spaces. This chapter can be seen as a preparation for Chapter 18, where we study substandard Borel spaces (our substitute for standard Borel spaces). However, we also apply these results in Chapter 17 to give a simple proof of the Dunford-Pettis theorem. In our treatment of both countably generated measurable spaces and substandard Borel spaces the following 'nice' measurable space (M, B) plays a crucial role: Let M = {0, 1}^N (the space of all sequences {zn}n≥0 of 0's and 1's), considered as a compact metric space with respect to the metric d : M × M → R⁺ given by

d({zn}n≥0, {z′n}n≥0) = ∑n≥0 2⁻ⁿ |zn − z′n|

(or any equivalent metric the reader might prefer), and let B be the σ-algebra of Borel subsets of M. For each m ≥ 0 let qm : M → {0, 1}^{m+1} be given by qm({zn}n≥0) = (z0, . . . , zm) and let Cm = qm⁻¹(P({0, 1}^{m+1})). Then Cm is a finite algebra and each of the sets in Cm is both open and closed; also Cm ⊂ Cm+1. Let C = ⋃m≥0 Cm; then C is a countable algebra (the algebra of cylinder sets) and each of the sets in C is both open and closed. Also for each m ≥ 0 let pm : M → {0, 1} be the projection mapping defined by letting pm({zn}n≥0) = zm for each {zn}n≥0 ∈ M and let

Λm = pm⁻¹({1}) = {{zn}n≥0 ∈ M : zm = 1} .

Lemma 16.1 B = σ(C) = σ({Λm : m ≥ 0}) and hence in particular (M, B) is countably generated.

Proof Let O be the set of open subsets of M. Then the countable set C is a base for the topology on M, and so each O ∈ O can be written as a countable union of elements from C. Hence O ⊂ σ(C) and thus B = σ(O) ⊂ σ(σ(C)) = σ(C), i.e., B = σ(C). Moreover, each element of C can be written as a finite intersection of elements from the set {Λm : m ≥ 0} ∪ {M \ Λm : m ≥ 0} and it therefore follows that C ⊂ σ({Λm : m ≥ 0}). This implies that B = σ({Λm : m ≥ 0}).
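The metric on M introduced above is easy to evaluate: two sequences are close precisely when they agree on a long initial segment. A quick sketch, truncating the sum (which is exact for sequences that agree beyond the listed coordinates):

```python
def d(z, w):
    """The metric on M = {0,1}^N, for sequences given as finite lists that
    agree beyond the listed coordinates (the tail then contributes 0)."""
    return sum(2 ** (-n) * abs(zn - wn) for n, (zn, wn) in enumerate(zip(z, w)))

z = [1, 0, 1, 1, 0]
w = [1, 0, 0, 1, 0]   # differs from z only in coordinate 2
print(d(z, w))        # 2**(-2) = 0.25
```

In particular the cylinder sets of Cm are exactly the sets that are unions of balls of radius 2⁻ᵐ, which is why each of them is both open and closed.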

The next result appears as Theorem 2.1 in Mackey [13].

Proposition 16.1 A measurable space (X, E) is countably generated if and only if there exists a mapping f : X → M with f −1(B) = E.


Proof Suppose first (X, E) is countably generated; then there exists a sequence {En}n≥0 from E such that E = σ({En : n ≥ 0}). Define a mapping f : X → M by f(x) = {IEn(x)}n≥0. Then f^{-1}(Λn) = En for each n ≥ 0 and therefore by Proposition 2.4 and Lemma 16.1

f^{-1}(B) = f^{-1}(σ({Λn : n ≥ 0})) = σ({f^{-1}(Λn) : n ≥ 0}) = σ({En : n ≥ 0}) = E .

Suppose conversely there exists f : X → M with f^{-1}(B) = E and for each n ≥ 0 put En = f^{-1}(Λn). Then again by Proposition 2.4 and Lemma 16.1

σ({En : n ≥ 0}) = σ({f^{-1}(Λn) : n ≥ 0}) = f^{-1}(σ({Λn : n ≥ 0})) = f^{-1}(B) = E

and thus E is countably generated.
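The mapping f(x) = {IEn(x)}n≥0 in this proof is easy to illustrate on a finite example. In the sketch below (the set X and the generators are invented purely for illustration) each point is encoded by the bit-string of the generators containing it; two points receive the same code exactly when no generator separates them:

```python
def encode(x, gens):
    """The map f of Proposition 16.1, truncated to finitely many
    generators: x goes to the tuple (I_{E_0}(x), I_{E_1}(x), ...)."""
    return tuple(1 if x in E else 0 for E in gens)

X = {0, 1, 2, 3, 4}
gens = [{0, 1, 2}, {1, 3}]           # E_0 and E_1, chosen for illustration

# Group the points by their code: these groups are exactly the atoms of the
# sigma-algebra generated by E_0 and E_1 (atoms are discussed later in the
# chapter).
atoms = {}
for x in X:
    atoms.setdefault(encode(x, gens), set()).add(x)
```

Here 0 and 2 get the code (1, 0) and so lie in the same atom, while 1, 3 and 4 are each separated from every other point.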

Proposition 16.2 If (X, E) is a countably generated measurable space then there exists a countable algebra A with E = σ(A).

Proof By Proposition 16.1 there exists a mapping f : X → M with f^{-1}(B) = E and therefore Proposition 2.4 implies that A = f^{-1}(C) is a countable algebra with σ(A) = σ(f^{-1}(C)) = f^{-1}(σ(C)) = f^{-1}(B) = E.

A topological space is separable if it possesses a countable dense set. A metric space is separable if and only if its topology has a countable base.

Proposition 16.3 (1) Let X be a topological space having a countable base for its topology and let BX be the σ-algebra of Borel subsets of X. Then (X, BX ) is countably generated. In particular, this is the case when X is a separable metric space. (2) Let (X, E) be a countably generated measurable space and Y be a non-empty subset of X. Then (Y, E|Y ) is countably generated (with E|Y the trace σ-algebra). (3) Let (X, E) and (Y, F) be measurable spaces with (Y, F) countably generated. If there exists a mapping f : X → Y with f −1(F) = E then (X, E) is countably generated.

(4) Let S be a non-empty countable set and for each s ∈ S let (Xs, Es) be a countably generated measurable space. Then the product measurable space (X, E) is countably generated.

(5) Let S be a non-empty countable set and for each s ∈ S let (Xs, Es) be a countably generated measurable space; assume the sets Xs, s ∈ S, are disjoint. Then the disjoint union measurable space (X, E) is countably generated.

(6) Let (X, E) be a countably generated measurable space. Then the measurable space (X/, E/) is countably generated. (Recall X/ denotes the set of all measures on E taking only values in the set N and that E/ = σ(E♦), where E♦ is the set of all subsets of X/ having the form {p ∈ X/ : p(E) = k} with E ∈ E and k ∈ N.)

Proof (1) If OX is the set of open subsets of X and U is a countable base for the topology then each U ∈ OX can be written as a countable union of elements from U and thus OX ⊂ σ(U). Hence BX = σ(OX ) ⊂ σ(U) and so BX = σ(U). Therefore (X, BX ) is countably generated.

(2) If I is a countable subset of E with E = σ(I) then I|Y is a countable subset of E|Y and by Proposition 2.3 E|Y = σ(I)|Y = σ(I|Y ). (3) If I is a countable subset of F with F = σ(I) then f −1(I) is a countable subset of E and by Proposition 2.4 E = f −1(F) = f −1(σ(I)) = σ(f −1(I)). Note: The proofs of (4), (5) and (6) require Propositions 16.4, 16.5 and 16.7 respectively. These results are dealt with below.

(4) By Proposition 16.1 there exists for each s ∈ S a mapping fs : Xs → M with fs^{-1}(B) = Es. Define f : X → M^S by letting f({xs}s∈S) = {fs(xs)}s∈S for each {xs}s∈S ∈ X. Then by Proposition 2.6 (2) f^{-1}(B^S) = E. But by Propositions 16.1 and 16.4 (M^S, B^S) is countably generated and therefore by (3) (X, E) is countably generated.

(5) By Proposition 16.1 there exists for each s ∈ S a mapping fs : Xs → M with fs^{-1}(B) = Es. Define f : X → S × M by letting f(x) = (s, fs(x)) for each x ∈ Xs, s ∈ S. Then by Proposition 2.9 (2) f^{-1}(P(S) × B) = E. But by (1) and Proposition 16.5 (S × M, P(S) × B) is countably generated and thus by (3) (X, E) is countably generated.

(6) By Proposition 16.1 there exists a mapping f : X → M with f^{-1}(B) = E. Let f/ : X/ → M/ be the mapping defined in Chapter 13 with f/(p) = f∗p for all p ∈ X/. Then by Proposition 13.2 (2) (f/)^{-1}(B/) = E/. But by Proposition 13.3 (M/, B/) is the disjoint union of the measurable spaces (M/^n, B/^n), n ≥ 1, and by (1) and Proposition 16.7 (M/^n, B/^n) is countably generated for each n ∈ N. Thus by (5) (M/, B/) is countably generated and so by (3) (X/, E/) is countably generated.

We now present the results about the various spaces constructed out of M which were used in the proof of Proposition 16.3 and which will be required in the proof of Propositions 18.1 and 18.2.

Proposition 16.4 Let S be a non-empty countable set. Then M^S (with the product topology) is homeomorphic to M. If h : M^S → M is a homeomorphism then also h^{-1}(B) = B^S (with B^S the product σ-algebra on M^S).

Proof The set S × N is countably infinite, so let ϕ : N → S × N be a bijective mapping. If {ws}s∈S ∈ M^S and s ∈ S then the element ws of M = {0, 1}^N will be denoted by {ws,n}n≥0. Now define a mapping g : M^S → M by letting g({ws}s∈S) = {zn}n≥0, where zn = ws,k and (s, k) = ϕ(n). Then it is easy to see that g is bijective, and it is continuous, since pn ◦ g = pk ◦ p′s for each n ≥ 0, where again (s, k) = ϕ(n) and p′s : M^S → M is the projection onto the s-th component. Thus g is a homeomorphism, since M^S is compact and compact subsets of M are closed. Now by Proposition 2.10 B^S is the Borel σ-algebra of M^S and therefore if h : M^S → M is a homeomorphism then by Lemma 2.5 h^{-1}(B) = B^S.
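The bijection ϕ : N → S × N used in this proof is just an interleaving of coordinates. A minimal sketch for finite S = {0, …, k−1} (the countably infinite case works the same way with any pairing function; the function names and the division-based formula are our choices, not the text's):

```python
def phi(n, k):
    """A concrete bijection N -> S x N for S = {0, ..., k-1}: write
    n = q*k + s and send n to (s, q), so that coordinate n of the
    interleaved sequence g({w_s}) reads digit q of the s-th sequence."""
    return (n % k, n // k)

def phi_inv(s, q, k):
    """Inverse of phi: (s, q) goes back to q*k + s."""
    return q * k + s
```

The round-trip property phi_inv(phi(n)) = n is what makes g well defined and bijective.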

Proposition 16.5 Let S be a non-empty countable set considered as a topological space with the discrete topology (in which every subset of S is open); then the topological space S × M is separable and its topology can be given by a complete metric. Moreover, P(S) × B is the Borel σ-algebra of S × M.

Proof Since M is separable there exists a countable dense subset D of M. Then S × D is countable and it is clearly a dense subset of S × M; hence S × M is separable. Now a metric ϱ can be defined on S × M by letting

ϱ((s1, z1), (s2, z2)) = max{δ(s1, s2), d(z1, z2)} ,

where δ(s, s) = 0 and δ(s, t) = 1 if s ≠ t, and it is easily checked that this metric is complete and that it generates the topology on S × M. It remains to show that P(S) × B is the Borel σ-algebra of S × M, which we denote by E. Let D′ denote the set of all sets having the form {s} × C with s ∈ S and C ∈ C. Then D′ is a countable base for the topology on S × M and so σ(D′) = E. But each element of D′ is a measurable rectangle and hence σ(D′) ⊂ P(S) × B, and this shows E ⊂ P(S) × B. Conversely, for each s ∈ S the set Bs = {B ∈ B : {s} × B ∈ E} is a monotone class containing C (since {s} × C is open for each C ∈ C) and hence by Proposition 2.2 Bs = B, i.e., {s} × B ∈ E for all B ∈ B, s ∈ S. Thus P(S) × B ⊂ E, since if E ∈ P(S) × B then E = ∪s∈S({s} × Es) and by Proposition 2.7 the section Es is in B for each s ∈ S.

In the proof of Proposition 16.7 below (and several times later) we need the following remarkable property of the space (M, B):

Proposition 16.6 Let µ be a finite, finitely additive measure on C. Then µ is a measure and so by Theorem 4.1 it extends to a unique measure on B.

Proof If {Cn}n≥1 is a decreasing sequence from C with ∩n≥1 Cn = ∅ then, since the elements of C are compact, there exists m ≥ 1 so that Cn = ∅ for all n ≥ m. Thus µ(Cn) = 0 for all n ≥ m and so in particular limn µ(Cn) = 0. Therefore µ is ∅-continuous, and hence by Proposition 3.3 it is a measure.

For each n ∈ N let M/^n denote the set of all measures p on B taking only values in the set Nn = {0, 1, . . . , n} and with p(M) = n; put B/^n = σ(B♦^n), where B♦^n is the set of all subsets of M/^n having the form {p ∈ M/^n : p(B) = k} with B ∈ B and k ∈ Nn. We also consider M/^n as a topological space: Let U/^n be the set of all non-empty subsets of M/^n having the form

{p ∈ M/^n : p(C) = vC for all C ∈ N}

with N a finite subset of C and {vC}C∈N a family from Nn. Clearly for each p ∈ M/^n there exists U ∈ U/^n with p ∈ U and if U1, U2 ∈ U/^n and p ∈ U1 ∩ U2 then there exists U ∈ U/^n with p ∈ U ⊂ U1 ∩ U2. Thus U/^n is the base for a topology O/^n on M/^n. This means that U ∈ O/^n if and only if for each p ∈ U there exists a finite subset N of C such that

{q ∈ M/^n : q(C) = p(C) for all C ∈ N} ⊂ U .

n n Proposition 16.7 The topological space M/ is compact and metrisable and B/ n is the Borel σ-algebra of M/ .

Proof We start by showing that the topology O/^n on M/^n is given by a metric. Let {Ck}k≥1 be an enumeration of the elements in the countable set C and define a mapping ϱ : M/^n × M/^n → R+ by

ϱ(p, q) = Σk≥1 2^{-k} |p(Ck) − q(Ck)| .

If ϱ(p, q) = 0 then p(C) = q(C) for all C ∈ C and hence by Proposition 3.4 p = q. Thus ϱ is a metric since by definition it is symmetric and it is clear that the triangle inequality holds. Moreover, if p ∈ M/^n then for each ε > 0 there exists a finite subset N of C such that

{q ∈ M/^n : q(C) = p(C) for all C ∈ N} ⊂ {q ∈ M/^n : ϱ(q, p) < ε}

and for each finite subset N of C there exists ε > 0 such that

{q ∈ M/^n : ϱ(q, p) < ε} ⊂ {q ∈ M/^n : q(C) = p(C) for all C ∈ N} .

This means that O/^n is the topology given by the metric ϱ. Note that if {pk}k≥1 is a sequence from M/^n and p ∈ M/^n then limk pk = p (i.e., limk ϱ(pk, p) = 0) if and only if limk pk(C) = p(C) for each C ∈ C.

In order to show that M/^n is compact it is enough to show that the metric space M/^n is sequentially compact. Let {pk}k≥1 be a sequence of elements of M/^n. By the usual diagonal argument (Theorem 23.1) there exists a subsequence {kj}j≥1 such that limj pkj(C) exists for each C ∈ C. Define p : C → R+ by p(C) = limj pkj(C). Then p is clearly finitely additive and p(M) = n and so by Proposition 16.6 p is a measure on C which has a unique extension to a measure (also denoted by p) on B. But D = {B ∈ B : p(B) ∈ Nn} is a monotone class containing the algebra C and thus D = B; hence p ∈ M/^n. Moreover limj ϱ(pkj, p) = 0 and this shows that the metric space M/^n is sequentially compact.

It remains to show that B/^n is the Borel σ-algebra of M/^n. First, the set U/^n is countable and so each element of O/^n can be written as a countable union of elements from U/^n. Thus O/^n ⊂ σ(U/^n), which implies that σ(O/^n) = σ(U/^n), since U/^n ⊂ O/^n. Second, each element of U/^n is a finite intersection of elements from B♦^n and hence U/^n ⊂ B/^n. This shows that σ(O/^n) = σ(U/^n) ⊂ B/^n. Finally, let k ∈ Nn and let D be the set of those B ∈ B for which {p ∈ M/^n : p(B) = k} ∈ σ(O/^n). Then C ⊂ D and D is a monotone class, and so by Proposition 2.2 D = B, and this means that {p ∈ M/^n : p(B) = k} ∈ σ(O/^n) for all B ∈ B, k ∈ Nn, i.e., B♦^n ⊂ σ(O/^n). Therefore B/^n = σ(B♦^n) ⊂ σ(O/^n), and this shows B/^n = σ(O/^n).
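The diagonal argument invoked above (Theorem 23.1) can be mimicked in finitely many steps: since each coordinate takes only finitely many values, one can repeatedly pass to a subfamily on which one more coordinate is constant. The following is a finite Python stand-in for that thinning (the data layout and names are ours; the genuine argument extracts an infinite subsequence):

```python
from collections import Counter

def diagonal_subsequence(ps, depth):
    """Finite stand-in for the usual diagonal argument: ps[k] lists the
    values of p_k on the first `depth` coordinates, each value drawn from
    a finite alphabet.  Coordinate by coordinate we pass to the subfamily
    on which that coordinate is constant (keeping a most frequent value,
    so the subfamily stays as large as possible)."""
    idx = list(range(len(ps)))
    for c in range(depth):
        value = Counter(ps[i][c] for i in idx).most_common(1)[0][0]
        idx = [i for i in idx if ps[i][c] == value]
    return idx
```

On the surviving index set all of the inspected coordinates agree, which is the finite analogue of pointwise convergence of the subsequence.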

A measurable space (X, E) is separable if {x} ∈ E for all x ∈ X. We next look at separable countably generated measurable spaces, and first need the following almost trivial but useful fact.

Lemma 16.2 If f : X → Y is any mapping then f(f −1(F )) = F ∩ f(X) holds for all F ⊂ Y .

Proof If y ∈ f(f^{-1}(F)) then there exists x ∈ f^{-1}(F) with y = f(x) and then y ∈ F. Hence y ∈ F ∩ f(X), i.e., f(f^{-1}(F)) ⊂ F ∩ f(X). On the other hand, if y ∈ F ∩ f(X) then there exists x ∈ X with y = f(x); thus x ∈ f^{-1}(F) and so y ∈ f(f^{-1}(F)), i.e., F ∩ f(X) ⊂ f(f^{-1}(F)).
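Lemma 16.2 is easily machine-checked on a small example; the sketch below (the mapping and the sets are chosen arbitrarily for illustration) verifies f(f^{-1}(F)) = F ∩ f(X):

```python
def image(f, A):
    """f(A) for a mapping f given as a dict."""
    return {f[x] for x in A}

def preimage(f, F):
    """f^{-1}(F) for a mapping f given as a dict."""
    return {x for x in f if f[x] in F}

f = {0: 'a', 1: 'a', 2: 'b', 3: 'c'}   # an arbitrary mapping X -> Y
F = {'a', 'c', 'd'}                     # note 'd' lies outside f(X)
```

The element 'd' of F is discarded on the round trip, exactly as the intersection with f(X) predicts.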

Proposition 16.8 Let (X, E) be a separable and countably generated measurable space and let f : X → M be a mapping with f^{-1}(B) = E; put A = f(X). Then f is injective, and if f is considered as a bijective mapping from X onto A then f^{-1}(B|A) = E.

Proof Let x1, x2 ∈ X with f(x1) = f(x2) = z. Since f^{-1}(B) = E and {x} ∈ E for each x ∈ X there exist B1, B2 ∈ B with f^{-1}(B1) = {x1} and f^{-1}(B2) = {x2}. Therefore by Lemma 16.2

B1 ∩ f(X) = f(f^{-1}(B1)) = f({x1}) = {z} = f({x2}) = f(f^{-1}(B2)) = B2 ∩ f(X)

and thus (since f^{-1}(B) = f^{-1}(B ∩ f(X)) holds for all B ⊂ M)

{x1} = f^{-1}(B1 ∩ f(X)) = f^{-1}({z}) = f^{-1}(B2 ∩ f(X)) = {x2} ;

i.e., x1 = x2. This shows f is injective. Now let B ∈ B|A; then B = B′ ∩ f(X) for some B′ ∈ B, which implies that f^{-1}(B) = f^{-1}(B′) ∈ E. But if E ∈ E then E = f^{-1}(B) for some B ∈ B, thus B ∩ f(X) ∈ B|A and f^{-1}(B ∩ f(X)) = E. Hence f^{-1}(B|A) = E (and note that this part is true without assuming (X, E) is separable).

Let us say that measurable spaces (X, E) and (Y, F) are isomorphic if there exists a bijective mapping f : X → Y such that f −1(F) = E. In this case the mapping E 7→ f(E) maps E bijectively onto F. Proposition 16.8 thus says that any separable countably generated measurable space is isomorphic to (B, B|B) for some B ⊂ M. Conversely, by Proposition 16.3 (2) the measurable space (B, B|B) is separable and countably generated for each B ⊂ M.

We now look at what are called atoms in a measurable space. Let (X, E) be a measurable space and for each x ∈ X let ax be the intersection of all the elements in E containing x. Thus x ∈ ax and for all x, y ∈ X either ax = ay or ax ∩ ay = ∅. A subset A of X is called an atom of E if A = ax for some x ∈ X and the set of all atoms of E will be denoted by A(E). Thus A(E) defines a partition of X: For each x ∈ X there is a unique atom A ∈ A(E) with x ∈ A. (Of course, if (X, E) is separable then ax = {x} for each x ∈ X.) In general atoms need not be measurable, i.e., it will not always be the case that A(E) ⊂ E. However, this problem does not arise if (X, E) is countably generated:

Lemma 16.3 Let (X, E) be a countably generated measurable space. If (Y, F) is a separable measurable space and f : X → Y a mapping with f −1(F) = E then A(E) = {f −1({z}) : z ∈ f(X)} . In particular, A(E) = {f −1({z}) : z ∈ f(X)} for each mapping f : X → M with f −1(B) = E which, together with Proposition 16.1, shows that A(E) ⊂ E.

Proof Let z ∈ f(X), put A = f^{-1}({z}) and consider x ∈ A. If E ∈ E with x ∈ E then there exists F ∈ F with f^{-1}(F) = E and so f(x) ∈ F, from which it follows that A = f^{-1}({z}) ⊂ f^{-1}(F) = E. Since {z} ∈ F (because (Y, F) is separable) and thus A ∈ E, this shows that A = ax for all x ∈ A. Therefore A(E) = {f^{-1}({z}) : z ∈ f(X)}, since clearly x ∈ f^{-1}({f(x)}) for each x ∈ X.

Lemma 16.4 Let (X, E) and (Y, F) be countably generated measurable spaces and let f : X → Y be a surjective mapping with f −1(F) = E. Then A(E) = {f −1(A) : A ∈ A(F)} .

Proof By Proposition 16.1 there exists a mapping g : Y → M with g^{-1}(B) = F. Then g ◦ f : X → M is a mapping with (g ◦ f)^{-1}(B) = E and thus by Lemma 16.3

A(E) = {(g ◦ f)^{-1}({z}) : z ∈ (g ◦ f)(X)} = {f^{-1}(g^{-1}({z})) : z ∈ g(Y)} = {f^{-1}(A) : A ∈ A(F)} .

Proposition 16.9 Let (X, E), (Y, F) be countably generated measurable spaces, and so by Proposition 16.3 (4) (X × Y, E × F) is also countably generated. Then

A(E × F) = {A × B : A ∈ A(E), B ∈ A(F)} .

Proof Let R (resp. R′) denote the set of measurable rectangles in X × Y (resp. in M × M). By Proposition 16.1 there exist mappings f : X → M and g : Y → M with f^{-1}(B) = E and g^{-1}(B) = F. Define a mapping h : X × Y → M × M by h(x, y) = (f(x), g(y)). If B × C ∈ R′ then h^{-1}(B × C) = f^{-1}(B) × g^{-1}(C) ∈ R. On the other hand, if E × F ∈ R then there exist B, C ∈ B with E = f^{-1}(B) and F = g^{-1}(C) and then E × F = h^{-1}(B × C) ∈ h^{-1}(R′). Hence R = h^{-1}(R′) and therefore by Proposition 2.3

E × F = σ(R) = σ(h^{-1}(R′)) = h^{-1}(σ(R′)) = h^{-1}(B × B) .

Moreover, (M × M, B × B) is separable and therefore by Lemma 16.3

A(E × F) = {h^{-1}({(z1, z2)}) : (z1, z2) ∈ h(X × Y)}
= {f^{-1}({z1}) × g^{-1}({z2}) : z1 ∈ f(X), z2 ∈ g(Y)}
= {A × B : A ∈ A(E), B ∈ A(F)} .

Here is another result of this type, whose proof is almost identical with the proof of Proposition 16.9.

Proposition 16.10 Let X be a non-empty set and let E ⊂ P(X), F ⊂ P(X) be countably generated σ-algebras. Then E ∨ F is also countably generated and

A(E ∨ F) = {A ∩ B : A ∈ A(E), B ∈ A(F)} \ {∅} .

Thus each atom of E ∨ F is the intersection of an atom of E with an atom of F and, conversely, the intersection of an atom of E with an atom of F is either empty or an atom of E ∨ F.

Proof Let D = {E ∩ F : E ∈ E, F ∈ F}; thus E ∪ F ⊂ D ⊂ E ∨ F which implies that E ∨ F = σ(D). Now by Proposition 16.1 there exist mappings f : X → M, g : X → M with f^{-1}(B) = E and g^{-1}(B) = F. Define a mapping h : X → M × M by h(x) = (f(x), g(x)) for all x ∈ X. Then h^{-1}(B × C) = f^{-1}(B) ∩ g^{-1}(C) ∈ D for all B, C ∈ B. On the other hand, if E ∈ E and F ∈ F then there exist B, C ∈ B with E = f^{-1}(B) and F = g^{-1}(C) and then E ∩ F = h^{-1}(B × C). Thus D = h^{-1}(R) where R = {B × C : B, C ∈ B}, and therefore by Proposition 2.4

E ∨ F = σ(D) = σ(h^{-1}(R)) = h^{-1}(σ(R)) = h^{-1}(B × B) .

In particular, by Proposition 16.3 (3) (X, E ∨ F) is countably generated, since by Proposition 16.3 (4) (M × M, B × B) is countably generated. Moreover, (M × M, B × B) is separable and therefore by Lemma 16.3

A(E ∨ F) = {h^{-1}({(z1, z2)}) : (z1, z2) ∈ h(X)}
= {f^{-1}({z1}) ∩ g^{-1}({z2}) : (z1, z2) ∈ h(X)}
= {A ∩ B : A ∈ A(E), B ∈ A(F)} \ {∅} .

17 The Dunford-Pettis theorem

Let (X, E) be a measurable space and again denote the set of probability measures on (X, E) by P(E). A subset Q of P(E) is equicontinuous if for each decreasing sequence {En}n≥1 from E with ∩n≥1 En = ∅ and each ε > 0 there exists p ≥ 1 so that µ(Ep) < ε for all µ ∈ Q. The following is the elementary (but more useful) half of the Dunford-Pettis theorem. (The proof of the converse can be found in Dunford and Schwartz [5], Chapter IV.9.) This result plays a fundamental role in Specifications and their Gibbs states [16].

Theorem 17.1 Let Q ⊂ P(E) be equicontinuous; then for each sequence {µn}n≥1 from Q there exists a subsequence {nj}j≥1 and a measure µ ∈ P(E) such that

µ(E) = limj µnj(E) for all E ∈ E.

The rest of the chapter is taken up with a proof of this result. Before starting however, note the following simple fact about the convergence in Theorem 17.1:

Lemma 17.1 Let {µn}n≥1 be a sequence from P(E) and let µ ∈ P(E). Then µ(E) = limn µn(E) holds for all E ∈ E if and only if µ(f) = limn µn(f) for all f ∈ MB(E).

Proof Suppose µ(E) = limn µn(E) for all E ∈ E; then also µ(g) = limn µn(g) for all g ∈ ME(E). Let f ∈ MB(E); then for each ε > 0 there exists by Lemma 9.6 (2) g ∈ ME(E) with g ≤ f ≤ g + ε. Thus

lim sup µn(f) ≤ lim sup µn(g + ε) = lim µn(g) + ε = µ(g) + ε ≤ µ(f) + ε n→∞ n→∞ n→∞ and in the same way

µ(f) ≤ µ(g) + ε = lim µn(g) + ε ≤ lim inf µn(f) + ε . n→∞ n→∞

From this it follows that µ(f) = limn µn(f). The converse is clear.

We say that (X, E) has the weak sequential compactness property if whenever Q ⊂ P(E) is equicontinuous then for each sequence {µn}n≥1 from Q there exists a subsequence {nj}j≥1 and µ ∈ P(E) such that µ(E) = limj µnj (E) for all E ∈ E. Theorem 17.1 therefore states that every measurable space has this property. We first show that (M, B) has the property (with (M, B) as in Chapter 16), then use this to show that Theorem 17.1 holds when (X, E) is countably generated, and finally deal with the general case.


Lemma 17.2 The space (M, B) has the weak sequential compactness property.

Proof We use the notation from Chapter 16, so C is the algebra of cylinder sets. Let Q ⊂ P(B) be equicontinuous and let {µn}n≥1 be a sequence from Q. Since C is countable and the values µn(C) all lie in the compact interval [0, 1] the usual diagonal argument (Theorem 23.1) implies there exists a subsequence {nj}j≥1 and ν : C → R+ so that ν(C) = limj µnj(C) for all C ∈ C. But ν is clearly additive and ν(M) = 1 and hence by Proposition 16.6 there exists µ ∈ P(B) with

µ(C) = ν(C) for all C ∈ C; thus µ(C) = limj µnj(C) for all C ∈ C. Let

K = {B ∈ B : µ(B) = limj→∞ µnj(B)} ;

then C ⊂ K and σ(C) = B, and so by Proposition 2.1 it is enough to show that K is a monotone class. Let {Bn}n≥1 be an increasing sequence from K and put B = ∪n≥1 Bn. For each p ≥ 1 let Ap = B \ Bp; then {Ap}p≥1 is a decreasing sequence from B with ∩p≥1 Ap = ∅. Let ε > 0; there thus exists p ≥ 1 so that µ(Ap) < ε/3 and so that ω(Ap) < ε/3 for all ω ∈ Q. Moreover, since Bp ∈ K, there exists m ≥ 1 so that |µ(Bp) − µnj(Bp)| < ε/3 for all j ≥ m. Hence

|µ(B) − µnj(B)| ≤ |µ(Bp) − µnj(Bp)| + µ(Ap) + µnj(Ap) < ε

for all j ≥ m, and so µ(B) = limj µnj (B), i.e., B ∈ K. The case of a decreasing sequence from K is almost exactly the same.

Now let (X, E) and (Y, F) be measurable spaces.

Lemma 17.3 If there exists a mapping f : X → Y with f −1(F) = E and (Y, F) has the weak sequential compactness property then so does (X, E).

Proof For each µ ∈ P(E) denote the image measure f∗µ ∈ P(F) by µ′. Let Q ⊂ P(E) be equicontinuous; then the subset Q′ = {µ′ : µ ∈ Q} of P(F) is also equicontinuous: If {Fn}n≥1 is a decreasing sequence from F with ∩n≥1 Fn = ∅ and En = f^{-1}(Fn) for each n ≥ 1 then {En}n≥1 is a decreasing sequence from E with ∩n≥1 En = ∅; thus, given ε > 0, there exists p ≥ 1 so that µ(Ep) < ε for all µ ∈ Q and hence µ′(Fp) = µ(Ep) < ε for all µ′ ∈ Q′. Let {µn}n≥1 be a sequence from Q; thus {µ′n}n≥1 is a sequence from Q′, and so if (Y, F) has the weak sequential compactness property then there exists a subsequence {nj}j≥1 and ν ∈ P(F) such that ν(F) = limj µ′nj(F) for all F ∈ F. In particular, if F ∈ F with F ∩ f(X) = ∅ then

ν(F) = limj µ′nj(F) = limj µnj(f^{-1}(F)) = limj µnj(∅) = 0

and so by Theorem 13.1 there exists a µ ∈ P(E) with ν = f∗µ. Let E ∈ E; then E = f^{-1}(F) for some F ∈ F and therefore

µ(E) = µ(f^{-1}(F)) = ν(F) = limj µ′nj(F) = limj µnj(f^{-1}(F)) = limj µnj(E) .

This shows that (X, E) has the weak sequential compactness property.

Proposition 17.1 A countably generated measurable space (X, E) has the weak sequential compactness property.

Proof This follows from Proposition 16.1 and Lemmas 17.2 and 17.3.

We turn to the general case, so now let (X, E) be an arbitrary measurable space. The following standard device (to be found, for example, in Dunford and Schwartz [5], Chapter IV.9.) will be made use of:

Lemma 17.4 Let {µn}n≥1 be any sequence from P(E); then there exists ν ∈ P(E) such that µn ≪ ν for all n ≥ 1.

Proof Just take, for example, ν = Σn≥1 2^{-n} µn.

Let Q be an equicontinuous subset of P(E) and let {µn}n≥1 be a sequence from Q. By Lemma 17.4 there exists ν ∈ P(E) such that µn ≪ ν for each n ≥ 1; hence by Theorem 12.1 there exists hn ∈ M(E) such that µn = ν·hn. Now consider any countably generated σ-algebra F ⊂ E. For each µ ∈ P(E) let µ′ ∈ P(F) be the restriction of µ to F. Then Q′ = {µ′ : µ ∈ Q} is an equicontinuous subset of P(F) and {µ′n}n≥1 is a sequence from Q′. Thus by Proposition 17.1 there exists a subsequence {nj}j≥1 and ω ∈ P(F) such that ω(F) = limj µ′nj(F) for all F ∈ F. But if F ∈ F with ν′(F) = 0 then ν(F) = 0, thus ω(F) = limj µ′nj(F) = 0 and hence ω ≪ ν′. Therefore by Theorem 12.1 there exists h ∈ M(F) such that ω = ν′·h. Now define µ ∈ P(E) by µ = ν·h, and so µ(E) = ν(hIE) for all E ∈ E.

In particular µ(F) = ω(F) = limj µnj(F) for all F ∈ F.
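Lemma 17.4's construction ν = Σn≥1 2^{-n} µn is transparent for finitely many measures on a finite set. The sketch below is an illustration only: since just finitely many summands are used, the weights are renormalised so that ν is again a probability measure, a detail not needed in the lemma itself.

```python
def dominating_measure(mus):
    """Lemma 17.4 in miniature: nu = sum_n 2^{-(n+1)} mu_n for finitely
    many probability vectors mu_n on a common finite set, renormalised so
    that nu is again a probability measure.  If mu_n(E) > 0 for some n
    then nu(E) > 0, i.e. mu_n << nu for every n."""
    weights = [2.0 ** (-(n + 1)) for n in range(len(mus))]
    total = sum(weights)
    points = mus[0].keys()
    return {x: sum(w * mu[x] for w, mu in zip(weights, mus)) / total
            for x in points}
```

Any set charged by some µn is charged by ν, which is exactly the absolute continuity the argument needs.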

Lemma 17.5 Suppose hn ∈ M(F) for each n ≥ 1. Then µ(E) = limj µnj (E) for all E ∈ E.

Proof Let E ∈ E; then IE ≤ 1 and therefore by Theorem 12.4 there exists g ∈ MB(F) with g ≤ 1 such that ν(fg) = ν(fIE) for all f ∈ M(F). Then µ(E) = ν(hIE) = ν(hg) = µ(g) and µn(E) = ν(hnIE) = ν(hng) = µn(g) for each n ≥ 1 and therefore by Lemma 17.1

µ(E) = µ(g) = limj→∞ µnj(g) = limj→∞ µnj(E) .

The next result completes the proof of Theorem 17.1.

Lemma 17.6 There exists a countably generated σ-algebra F ⊂ E such that hn ∈ M(F) for each n ≥ 1.

Proof Let J be the countable set consisting of all elements of E of the form {x ∈ X : hn(x) > r} with n ≥ 1 and r ∈ Q+, and put F = σ(J). Then F ⊂ E and F is countably generated. But if a ∈ R+ then there is a decreasing sequence {rm}m≥1 from Q+ with limm rm = a and thus

{x ∈ X : hn(x) > a} = ∪m≥1 {x ∈ X : hn(x) > rm} ∈ F .

This implies that hn ∈ M(F) for each n ≥ 1.

18 Substandard Borel spaces

There are several important results, including some which play an important role in [16] (such as the Kolmogorov extension theorem), which do not hold in general for arbitrary measurable spaces and not even for arbitrary countably generated measurable spaces. In such situations it is usual to resort to standard Borel spaces, since in most applications the measurable spaces will be standard Borel and the results which are needed can be proved within this framework. (A measurable space (X, E) is standard Borel if there exists a metric on X which makes it a complete separable metric space in such a way that E is then the Borel σ-algebra.) The theory of standard Borel spaces has therefore established itself as a kind of standard ‘advanced’ measure theory. Unfortunately, a first acquaintance with this theory can be a bit off-putting, and many tend to regard it as simply providing a collection of very useful tools whose validity has to be taken for granted. We thus prefer to work with a cheap alternative which we call substandard Borel spaces. The term ‘substandard’ should here be considered in the sense of indicating a (pattern of linguistic) usage which does not conform to that of the prestige group in a (speech) community. Substandard Borel spaces have the advantage of being easier to understand while at the same time being good enough to be a replacement for standard Borel spaces in results such as the Kolmogorov extension theorem. Their definition involves the space (M, B) introduced in Chapter 16. We say that a measurable space (X, E) is a substandard Borel space if there exists a mapping f : X → M with f^{-1}(B) = E such that f(X) ∈ B. (This additional condition is nowhere near as harmless as it might first appear.) Thus by Proposition 16.1 a substandard Borel space is countably generated.
The notion of a substandard Borel space already occurs implicitly in Chapter V of Parthasarathy [15] in the proof of the Kolmogorov extension theorem for the inverse limit of standard Borel spaces. The relationship between standard Borel and substandard Borel spaces will be looked at in Chapter 22. In fact, Proposition 18.1 below implies that a standard Borel space is substandard Borel, and for a separable measurable space (i.e., a measurable space (X, E) with {x} ∈ E for all x ∈ X) the converse is true, but a proof requires the typical machinery associated with standard Borel spaces.

Lemma 18.1 Let (X, E) be substandard Borel and let f : X → M be a mapping with f −1(B) = E and f(X) ∈ B. Then f(E) ∈ B for all E ∈ E.

Proof Let E ∈ E; then since f −1(B) = E there exists B ∈ B with f −1(B) = E, and thus by Lemma 16.2 f(E) = f(f −1(B)) = B ∩ f(X) ∈ B.


Lemma 18.2 Let (X, E) be a measurable space and let (Y, F) be substandard Borel. If there exists a mapping g : X → Y with g^{-1}(F) = E and g(X) ∈ F then (X, E) is a substandard Borel space.

Proof Since (Y, F) is substandard Borel there exists a mapping f : Y → M with f −1(B) = F and f(Y ) ∈ B; put h = f ◦ g. Then h−1(B) = g−1(F) = E and by Lemma 18.1 h(X) = f(g(X)) ∈ B. Therefore (X, E) is substandard Borel.

Proposition 18.1 Let X be a complete separable metric space and BX be the σ-algebra of Borel subsets of X. Then (X, BX ) is a substandard Borel space.

Proof To start with consider the closed interval I = [0, 1] and let b : M → I be the mapping with b({zn}n≥0) = Σn≥0 2^{-n-1} zn; then b is continuous and hence by Lemma 2.5 b^{-1}(BI) ⊂ B, with BI the σ-algebra of Borel subsets of I. Now for each C ∈ C there is a dyadic interval J such that b^{-1}(J) = C and hence b^{-1}(BI) ⊃ C, which implies that b^{-1}(BI) ⊃ σ(C) = B, i.e., b^{-1}(BI) = B. Put

N = {{zn}n≥0 ∈ M : z0 = 0 and zn = 1 for all n ≥ m for some m ≥ 1} ;

then N is countable and b maps Mo = M \ N bijectively onto I. Let v : I → M be the unique mapping with v(b(z)) = z for all z ∈ Mo; then v(I) = Mo and so in particular v(I) ∈ B. Let B ∈ B; then B ∩ Mo ∈ B and thus there exists A ∈ BI with b^{-1}(A) = B ∩ Mo, which implies v^{-1}(B) = v^{-1}(B ∩ Mo) = A ∈ BI. This shows v^{-1}(B) ⊂ BI. But if A ∈ BI then b^{-1}(A) ∈ B and v^{-1}(b^{-1}(A)) = A, and hence v^{-1}(B) = BI. We therefore have a mapping v : I → M with v^{-1}(B) = BI and v(I) ∈ B.

Now define w : I^N → M^N by w({xn}n≥0) = {v(xn)}n≥0. Then parts (2) and (3) of Proposition 2.6 imply that w^{-1}(B^N) = BI^N and w(I^N) = v(I)^N ∈ B^N, where BI^N and B^N are the product σ-algebras on I^N and M^N. But by Proposition 16.4 the topological space M^N (with the product topology) is homeomorphic to M and if u : M^N → M is a homeomorphism then u^{-1}(B) = B^N, and thus also u(B^N) = B. Put g = u ◦ w; then g : I^N → M is a mapping with g^{-1}(B) = BI^N and g(I^N) ∈ B. This shows that (I^N, BI^N) is a substandard Borel space. Moreover, by Proposition 2.10 the product σ-algebra BI^N is the Borel σ-algebra of I^N with the product topology (since N is countable and I has a countable base for its topology).

Finally, let (X, d) be a complete separable metric space. Then there is a standard construction (given below) producing a continuous injective mapping h : X → I^N such that h is a homeomorphism from X to h(X) (with the relative topology) and such that h(X) is the intersection of a sequence of open subsets of I^N, and so in particular with h(X) ∈ BI^N. By Lemma 2.5 h^{-1}(BI^N) ⊂ BX. On the other hand, let U ⊂ X be open; since h : X → h(X) is a homeomorphism there exists an open subset V of I^N with h^{-1}(h(X) ∩ V) = U. But then h^{-1}(V) = U, and this shows that OX ⊂ h^{-1}(BI^N), where OX is the set of open subsets of X.
Hence by Proposition 2.4 BX = σ(OX) ⊂ σ(h^{-1}(BI^N)) = h^{-1}(σ(BI^N)) = h^{-1}(BI^N). We thus have a mapping h : X → I^N with h^{-1}(BI^N) = BX and h(X) ∈ BI^N, and (I^N, BI^N) is substandard Borel. Therefore by Lemma 18.2 (X, BX) is also substandard Borel.

Here is how the mapping h can be constructed: Choose a dense sequence of elements {xn}n≥0 from X and for each n ≥ 0 let hn : X → I be the continuous mapping given by hn(x) = min{d(x, xn), 1} for each x ∈ X. If x, y ∈ X with x ≠ y then there exists n ≥ 0 such that hn(x) ≠ hn(y) (since if n ≥ 0 is such that d(x, xn) < ε, where ε = ½ min{d(x, y), 1}, then hn(x) < ε < hn(y)). Define h : X → I^N by letting h(x) = {hn(x)}n≥0 for each x ∈ X. Then h is continuous (since pn ◦ h = hn is continuous for each n ≥ 0, with pn : I^N → I the projection onto the n-th component) and injective. Moreover, for each x ∈ X and each 0 < ε < 1 there exists n ≥ 0 so that |hn(y) − hn(x)| > ε/2 for all y ∈ X with d(y, x) > ε. (Just take n ≥ 0 so that d(x, xn) < ε/4.) This implies that the bijective mapping h : X → h(X) is a homeomorphism from X to h(X) with the relative topology. (Note that the completeness of X was not needed here.)

The topological space I^N is metrisable (since N is countable and I is metrisable). Let δ be any metric generating the topology on I^N, and for each n ≥ 0 let

Un = {y ∈ I^N : δ(y, y′) < 2^{-n} for some y′ ∈ h(X)} .

Then {Un}n≥0 is a decreasing sequence of open subsets of I^N with h(X) ⊂ Un for each n ≥ 0. In fact h(X) = ∩n≥0 Un: Let y ∈ ∩n≥0 Un; then for each n ≥ 0 there exists yn ∈ h(X) with δ(y, yn) < 2^{-n} and so {yn}n≥0 is a Cauchy sequence in h(X). Thus {xn}n≥0 is a Cauchy sequence in X, where xn is the unique element with h(xn) = yn. Since X is complete the sequence {xn}n≥0 has a limit x ∈ X and then h(x) = y, i.e., y ∈ h(X). This shows that h(X) is the intersection of a sequence of open subsets of I^N.
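Two ingredients of the proof of Proposition 18.1 — the binary-expansion map b with its one-sided inverse v, and the embedding h(x) = (min{d(x, xn), 1})n≥0 — can be sketched numerically. Everything below is an illustrative stand-in: the expansions are truncated after finitely many digits, X is taken to be R, and a finite grid replaces the dense sequence. The greedy digit extraction in v never produces a tail of 1's, mirroring the restriction to Mo:

```python
def b(z, terms=60):
    """b({z_n}) = sum_{n>=0} 2^{-(n+1)} z_n, truncated after `terms` digits."""
    return sum(zn * 2.0 ** (-(n + 1)) for n, zn in enumerate(z[:terms]))

def v(x, terms=60):
    """Greedy binary digits of x in [0, 1): for dyadic x the expansion ends
    in 0's rather than 1's, so v is a right inverse of b avoiding the
    excluded eventually-1 sequences."""
    digits = []
    for _ in range(terms):
        x *= 2
        bit = 1 if x >= 1 else 0
        digits.append(bit)
        x -= bit
    return digits

def h(x, dense):
    """Finite truncation of the embedding x -> (min{|x - x_n|, 1})_n for
    X = R, with `dense` a finite stand-in for a dense sequence; it only
    separates points that the grid separates."""
    return tuple(min(abs(x - xn), 1.0) for xn in dense)

dense = [k / 4.0 for k in range(-8, 9)]   # grid of quarter-integers in [-2, 2]
```

For dyadic rationals (which are exactly representable as floats) the round trip b(v(x)) = x is exact, and points of R at distance at least the grid spacing apart get distinct h-codes.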

Proposition 18.2 (1) If (X, E) is substandard Borel and Y ∈ E is non-empty then (Y, E|Y ) is a substandard Borel space.

(2) Let S be a countable set and for each s ∈ S let (Xs, Es) be a substandard Borel space. Then the product measurable space (X, E) is also substandard Borel.

(3) Let S be a non-empty countable set and for each s ∈ S let (Xs, Es) be a substandard Borel space; assume that the sets Xs, s ∈ S, are disjoint. Then the disjoint union measurable space (X, E) is substandard Borel.

(4) Let (X, E) be a substandard Borel space. Then the measurable space (X/, E/) is substandard Borel. (Recall that X/ denotes the set of all measures on (X, E) taking only values in the set N and that E/ = σ(E♦), where E♦ is the set of all subsets of X/ having the form {p ∈ X/ : p(E) = k} with E ∈ E and k ∈ N.)

Proof (1) Since (X, E) is substandard Borel there exists a mapping f : X → M with f⁻¹(B) = E and f(X) ∈ B; let f|Y be the restriction of f to Y. If B ∈ B then (f|Y)⁻¹(B) = f⁻¹(B) ∩ Y ∈ E|Y and so (f|Y)⁻¹(B) ⊂ E|Y. On the other hand, if A ∈ E|Y then A = E ∩ Y with E ∈ E and there exists B ∈ B with f⁻¹(B) = E. Thus (f|Y)⁻¹(B) = f⁻¹(B) ∩ Y = E ∩ Y = A, which implies (f|Y)⁻¹(B) = E|Y. This means that f|Y : Y → M is a mapping with (f|Y)⁻¹(B) = E|Y. But f|Y(Y) = f(Y) and so by Lemma 18.1 f|Y(Y) ∈ B. Hence (Y, E|Y) is substandard Borel.

(2) This is trivially true if S = ∅, so we can assume S is non-empty. For each s ∈ S there exists a mapping fs : Xs → M with fs⁻¹(B) = Es and fs(Xs) ∈ B. Define f : X → M^S by letting f({xs}s∈S) = {fs(xs)}s∈S for each {xs}s∈S ∈ X. Then by Proposition 2.6 (2) and (3) f⁻¹(B^S) = E and f(X) ∈ B^S. But by Lemma 18.2 and Proposition 16.4 (M^S, B^S) is substandard Borel and thus by Lemma 18.2 (X, E) is substandard Borel.

(3) For each s ∈ S there exists a mapping fs : Xs → M with fs⁻¹(B) = Es and fs(Xs) ∈ B. Define f : X → S × M by letting f(x) = (s, fs(x)) for each x ∈ Xs, s ∈ S. Then by Proposition 2.9 (2) and (3) f⁻¹(P(S) × B) = E and f(X) ∈ P(S) × B. But by Propositions 18.1 and 16.5 (S × M, P(S) × B) is substandard Borel and thus by Lemma 18.2 (X, E) is substandard Borel.

(4) There exists a mapping f : X → M with f⁻¹(B) = E and f(X) ∈ B. Let f/ : X/ → M/ be the mapping defined in Chapter 13 with f/(p) = f∗p for all p ∈ X/. Then by Proposition 13.2 (2) and (3) (f/)⁻¹(B/) = E/ and f/(X/) ∈ B/. But by Proposition 13.3 (M/, B/) is the disjoint union of the measurable spaces (M/ⁿ, B/ⁿ), n ≥ 1, and by Propositions 18.1 and 16.7 (M/ⁿ, B/ⁿ) is substandard Borel for each n ∈ N. Thus by (3) (M/, B/) is substandard Borel and so by Lemma 18.2 (X/, E/) is substandard Borel.

Proposition 16.8 says that any separable substandard Borel space is isomorphic to (B, B|B) for some B ∈ B. Conversely, by Proposition 18.2 (1) the measurable space (B, B|B) is a separable substandard Borel space for each B ∈ B.

19 The Kolmogorov extension property

In the following let (X, E) be a measurable space and {En}n≥0 be an increasing sequence of countably generated sub-σ-algebras of E with E = σ(⋃n≥0 En).

A sequence of measures {µn}n≥0 with µn ∈ P(En) for each n ≥ 0 is said to be consistent if µn(E) = µn+1(E) for all E ∈ En, n ≥ 0. The sequence {En}n≥0 has the Kolmogorov extension property if for each consistent sequence {µn}n≥0 there exists µ ∈ P(E) with µ(E) = µn(E) for all E ∈ En, n ≥ 0. (This measure µ is then unique, since it is uniquely determined by the sequence {µn}n≥0 on the algebra A = ⋃n≥0 En and σ(A) = E.)

Finally, the σ-algebra E is called the inverse limit of the sequence {En}n≥0 if ⋂n≥0 An ≠ ∅ holds whenever {An}n≥0 is a decreasing sequence of atoms with An ∈ A(En) for each n ≥ 0.
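The consistency condition can be made concrete on a finite model. The sketch below is illustrative only: X = {0,1}³, En is generated by the first n coordinates (so the atoms of En are the length-n prefixes), and each µn is stored as a dictionary of atom masses; since every E ∈ En is a disjoint union of atoms, it suffices to check consistency on atoms.

```python
# Toy model of a consistent sequence of measures on a filtration.
# Illustrative representation: a measure on E_n is a dict mapping each
# length-n prefix (an atom of E_n) to its mass.
def is_consistent(mus):
    """Check mu_{n+1}(A) = mu_n(A) for every atom A of E_n, which
    suffices since each E in E_n is a disjoint union of atoms."""
    for n in range(len(mus) - 1):
        for prefix in mus[n]:
            extended = sum(mus[n + 1][prefix + (b,)] for b in (0, 1))
            if abs(extended - mus[n][prefix]) > 1e-12:
                return False
    return True

mu1 = {(0,): 0.5, (1,): 0.5}
mu2 = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.4}
assert is_consistent([mu1, mu2])

bad = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.15}
assert not is_consistent([mu1, bad])  # bad gives mass 0.75 to the prefix (0,)
```

In this finite setting the filtration stabilises and the extending measure µ is simply the last element of the sequence; the content of the Kolmogorov extension property is that such a µ also exists in the genuinely infinite situation.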

Theorem 19.1 Let (X, En) be substandard Borel for each n ≥ 0 and let E be the inverse limit of the sequence {En}n≥0. Then (X, E) is also substandard Borel and the sequence {En}n≥0 has the Kolmogorov extension property.

Theorem 19.1 (for standard Borel spaces) is the form of the Kolmogorov extension theorem occurring in Chapter V of Parthasarathy [15]. In fact the proof in [15] is implicitly a proof for substandard Borel spaces. Theorem 19.1 can be combined with Theorem 17.1 to give the useful result which follows. This also deals with the set-up described above, thus E is the inverse limit of the sequence {En}n≥0 with (X, En) substandard Borel for each n ≥ 0.

The algebra ⋃n≥0 En will be denoted by A. A subset Q of P(E) is said to be locally equicontinuous if for each n ≥ 0 the restrictions of the measures in Q to En are equicontinuous, i.e., if for each n ≥ 0, each decreasing sequence {Ek}k≥1 from En with ⋂k≥1 Ek = ∅ and each ε > 0 there exists p ≥ 1 so that µ(Ep) < ε for all µ ∈ Q.

Theorem 19.2 Let Q be a locally equicontinuous subset of P(E) and {µn}n≥1 be a sequence of elements from Q. Then there exists a subsequence {nj}j≥1 and µ ∈ P(E) such that µ(E) = limj µ_{n_j}(E) for all E ∈ A.

Proof Applying Theorem 17.1 to the restrictions of the measures in Q to Ek for each k ≥ 0 and employing the usual diagonal argument (Theorem 23.1) there exists a subsequence {nj}j≥1 and for each k ≥ 0 a probability measure νk ∈ P(Ek) such that νk(E) = limj µ_{n_j}(E) for all E ∈ Ek, k ≥ 0. But the sequence of measures {νk}k≥0 is then clearly consistent and so by Theorem 19.1 there exists µ ∈ P(E) such that µ(E) = νk(E) for all E ∈ Ek, k ≥ 0. Thus µ(E) = limj µ_{n_j}(E) for all E ∈ A.
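The diagonal argument invoked here can be illustrated mechanically in a finite setting. In the sketch below (all choices illustrative) the "sequences" are random points of {0,1}^K; for each coordinate k a refinement step keeps a sub-collection of indices on which that coordinate is constant (hence convergent), and a strictly increasing diagonal selection inherits eventual constancy in every coordinate.

```python
import random

def refine(indices, coord, x):
    """Keep the indices whose coord-th bit takes the majority value; along
    the result, coordinate `coord` is constant (hence convergent)."""
    zeros = [i for i in indices if x[i][coord] == 0]
    ones = [i for i in indices if x[i][coord] == 1]
    return zeros if len(zeros) >= len(ones) else ones

random.seed(0)
K, N = 5, 4096  # K coordinates, N terms (illustrative sizes)
x = [tuple(random.randint(0, 1) for _ in range(K)) for _ in range(N)]

# nested refinements: refinements[k] is constant in coordinates 0..k
indices, refinements = list(range(N)), []
for k in range(K):
    indices = refine(indices, k, x)
    refinements.append(indices)

# diagonal: pick the j-th term from the j-th refinement, keeping the
# selected indices strictly increasing so they form a subsequence
diag, last = [], -1
for j in range(K):
    nxt = next(i for i in refinements[j] if i > last)
    diag.append(nxt)
    last = nxt

# from position k on, the diagonal lies in refinements[k], so every
# coordinate is eventually constant along the diagonal subsequence
for k in range(K):
    assert len({x[i][k] for i in diag[k:]}) == 1
```

In the proof above the same scheme is applied with countably many levels k and convergence of νk(E) in place of constancy.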

Before coming to the proof of Theorem 19.1 let us first consider an example. Let S be a countably infinite set and for each s ∈ S let (Xs, Es) be a separable substandard Borel space; put X = ∏s∈S Xs and E = ∏s∈S Es. It thus follows from Proposition 18.2 (2) that (X, E) is substandard Borel.

For each non-empty finite subset Λ ⊂ S put X^Λ = ∏s∈Λ Xs and E^Λ = ∏s∈Λ Es and let pΛ : X → X^Λ be the projection mapping with pΛ({xs}s∈S) = {xs}s∈Λ; again by Proposition 18.2 (2) (X^Λ, E^Λ) is a substandard Borel space, which is separable (since each point is a measurable rectangle). Put E_Λ = pΛ⁻¹(E^Λ); then E_Λ is a sub-σ-algebra of E and by Lemma 18.2 (X, E_Λ) is substandard Borel, since pΛ is surjective. Now choose an increasing sequence {Λn}n≥0 of non-empty finite subsets of S with S = ⋃n≥0 Λn. Then {E_{Λn}}n≥0 is an increasing sequence of sub-σ-algebras of E with E = σ(⋃n≥0 E_{Λn}) (since every measurable rectangle lies in ⋃n≥0 E_{Λn}), and (X, E_{Λn}) is a substandard Borel space for each n ≥ 0.

Moreover, E is the inverse limit of the sequence {E_{Λn}}n≥0: Let {An}n≥0 be a decreasing sequence of atoms with An ∈ A(E_{Λn}) for each n ≥ 0. By Lemma 16.4 there exists for each n ≥ 0 a (unique) point xn ∈ X^{Λn} with An = p_{Λn}⁻¹({xn}), and since An+1 ⊂ An it follows that xn = τn(xn+1), with τn the projection of X^{Λn+1} onto X^{Λn}. There therefore exists a unique point x ∈ X with xn = p_{Λn}(x) for each n ≥ 0, which means that x ∈ ⋂n≥0 An, and so in particular ⋂n≥0 An ≠ ∅.

Thus by Theorem 19.1 (X, E) is substandard Borel (which we knew already) and the sequence {E_{Λn}}n≥0 has the Kolmogorov extension property. (In fact, the assumption about the separability of the spaces (Xs, Es) was not really necessary; it was only made to simplify the proof that E is the inverse limit of the sequence {E_{Λn}}n≥0.)

Now for each s ∈ S let µs be a probability measure on Es and for each n ≥ 0 let µ^{Λn} be the corresponding product measure on E^{Λn}. By Theorem 13.1 there exists a unique νn ∈ P(E_{Λn}) with µ^{Λn} = (p_{Λn})∗νn (since the mapping p_{Λn} is surjective and E_{Λn} = p_{Λn}⁻¹(E^{Λn})). Moreover, by checking on rectangles it is easy to see that the sequence of measures {νn}n≥0 is consistent and hence by the Kolmogorov extension property there exists a unique measure µ on E such that µ(E) = νn(E) for all E ∈ E_{Λn}, n ≥ 0. Again by checking on rectangles it follows that µ is the product of the measures µs, s ∈ S, whose existence was established in Theorem 15.3. This gives another proof of the existence of the product measure µ, but only under additional assumptions on the measurable spaces (Xs, Es). However, the framework considered here can be used to show the existence of measures which are far from being product measures, and which the method used in the proof of Theorem 15.3 cannot deal with.

Proof of Theorem 19.1: The proof is based on that of Theorem 4.1 in Chapter V of Parthasarathy [15]. In what follows let (X, E) be a measurable space and {En}n≥0 be an increasing sequence of countably generated sub-σ-algebras of E. Suppose that E is the inverse limit of the sequence {En}n≥0 and that (X, En) is substandard Borel for each n ≥ 0. In the next two lemmas let E′ be a countably generated sub-σ-algebra of E.

Lemma 19.1 Let f, g : X → M be mappings with f⁻¹(B) = E′ and g⁻¹(B) ⊂ E′; put A = f(X). Then there exists a unique mapping h : A → M such that h ∘ f = g, and then h⁻¹(B) ⊂ B|A.

Proof Let z ∈ A; then by Lemma 16.3 f⁻¹({z}) ∈ A(E′) and so g(x1) = g(x2) for all x1, x2 ∈ f⁻¹({z}), since g⁻¹({w}) ∈ E′ for each w ∈ M. There thus exists a unique mapping h : A → M such that h ∘ f = g. Now let B ∈ B; then, since f⁻¹(B) = E′, there exists B′ ∈ B with f⁻¹(B′) = g⁻¹(B). It follows that

h⁻¹(B) = f(f⁻¹(h⁻¹(B))) = f(g⁻¹(B)) = f(f⁻¹(B′)) = f(X) ∩ B′ = A ∩ B′

(since f : X → A is surjective) and therefore h⁻¹(B) ∈ B|A. This shows that h⁻¹(B) ⊂ B|A.

Lemma 19.2 Let f, g : X → M be mappings with f⁻¹(B) = E′ and g⁻¹(B) ⊂ E′; let θ : X → M × M be the mapping with θ = (g, f). Then θ⁻¹(B × B) = E′ and θ(X) ∈ B × B|A, where A = f(X). In particular, if f(X) ∈ B then θ(X) ∈ B × B.

Proof If B1, B2 ∈ B then θ⁻¹(B1 × B2) = g⁻¹(B1) ∩ f⁻¹(B2) ∈ E′ and hence θ⁻¹(B × B) ⊂ E′ (since B × B is the smallest σ-algebra containing all products of the form B1 × B2 with B1, B2 ∈ B). On the other hand, if E ∈ E′ then E = f⁻¹(B) for some B ∈ B and then θ⁻¹(M × B) = E. Thus θ⁻¹(B × B) = E′.

Now put A = f(X); by Lemma 19.1 there exists a unique mapping h : A → M such that h ∘ f = g, and then h⁻¹(B) ⊂ B|A. Let r : M × A → M × M be the mapping with r(z1, z2) = (h(z2), z1); then r⁻¹(B × B) ⊂ B × B|A and

θ(X) = {(g(x), f(x)) : x ∈ X} = {(h(f(x)), f(x)) : x ∈ X} = {(h(z), z) : z ∈ A} = {(z1, z2) ∈ M × A : h(z2) = z1} = r⁻¹(D),

where D = {(z, z) : z ∈ M} is the diagonal in M × M. But D ∈ B × B, since D is closed and B × B is the σ-algebra of Borel subsets of M × M, and hence θ(X) ∈ B × B|A.

Let ∆ : M → M be given by ∆({zn}n≥0) = {z′n}n≥0, where z′n = z2n; thus ∆ is continuous and surjective. The key to the whole proof is the following ‘trick’, which appears in Theorem 2.2 of Mackey [13] and is also used in Lemma 4.1 in Chapter V of Parthasarathy [15]:

Lemma 19.3 Let E′1 and E′2 be sub-σ-algebras of E with E′1 ⊂ E′2 and with (X, E′2) substandard Borel. Let f1 : X → M with f1⁻¹(B) = E′1. Then there exists a mapping f2 : X → M with f2⁻¹(B) = E′2 and f2(X) ∈ B such that f1 = ∆ ∘ f2.

Proof Since (X, E′2) is substandard Borel there exists a mapping f : X → M with f⁻¹(B) = E′2 and f(X) ∈ B. Define θ : X → M × M by θ = (f1, f). Then by Lemma 19.2 (since f1⁻¹(B) = E′1 ⊂ E′2) θ⁻¹(B × B) = E′2 and θ(X) ∈ B × B. Now let h : M × M → M be the homeomorphism given by

h({zn}n≥0, {z′n}n≥0) = {wn}n≥0,

where w2n = zn and w2n+1 = z′n for each n ≥ 0, and let f2 = h ∘ θ. Then it follows that f2⁻¹(B) = θ⁻¹(B × B) = E′2 and f2(X) = h(θ(X)) ∈ B. Finally, f1 = ∆ ∘ f2 holds directly from the definition of h and ∆.
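The relation f1 = ∆ ∘ f2 rests on a purely combinatorial identity between ∆ and the interleaving map h, which can be checked on finite truncations of the sequences (an illustrative simplification of the sequence space M):

```python
# Truncated versions of the maps in Lemma 19.3, acting on finite lists.
def delta(w):
    """Delta({w_n}) = {w_{2n}}: keep the even-indexed entries."""
    return w[::2]

def interleave(z, zp):
    """h({z_n}, {z'_n}) = {w_n} with w_{2n} = z_n and w_{2n+1} = z'_n."""
    w = []
    for a, b in zip(z, zp):
        w.extend([a, b])
    return w

z, zp = [1, 2, 3, 4], [5, 6, 7, 8]
# the identity behind f_1 = Delta o f_2 with f_2 = h o (f_1, f):
assert delta(interleave(z, zp)) == z
```

So composing h with ∆ recovers exactly the first component of θ, which is what the lemma needs.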

Since each (X, En) is a substandard Borel space Lemma 19.3 implies there exists a sequence of mappings {fn}n≥0 from X to M with fn⁻¹(B) = En and fn(X) ∈ B and such that fn = ∆ ∘ fn+1 for each n ≥ 0. Consider the product M^N as a compact metric space in the usual way; then the product σ-algebra B^N is also the Borel σ-algebra. Let

M∆ = {{zn}n≥0 ∈ M^N : zn = ∆(zn+1) for all n ≥ 0};

M∆ is a closed (and thus compact) subset of M^N. Let B∆ be the trace σ-algebra (and so B∆ is the Borel σ-algebra of M∆). For each n ≥ 0 let πn : M∆ → M be the mapping given by πn({zm}m≥0) = zn for each {zm}m≥0 ∈ M∆. Then πn is continuous and surjective (since ∆ is surjective). Moreover, πn = ∆ ∘ πn+1 and this implies that πn⁻¹(B) = πn+1⁻¹(∆⁻¹(B)) ⊂ πn+1⁻¹(B) for each n ≥ 0. Thus {πn⁻¹(B)}n≥0 is an increasing sequence of σ-algebras and B∆ = σ(⋃n≥0 πn⁻¹(B)) (since πn is the restriction of the projection mapping pn : M^N → M). Denote the algebra ⋃n≥0 πn⁻¹(B) by A∆.

Now {fn(x)}n≥0 ∈ M∆ for each x ∈ X, since fn = ∆ ∘ fn+1 for each n ≥ 0, and therefore a mapping f : X → M∆ can be defined by letting f(x) = {fn(x)}n≥0 for each x ∈ X. This means f : X → M∆ is the unique mapping such that πn ∘ f = fn for each n ≥ 0. In particular f⁻¹(πn⁻¹(B)) = fn⁻¹(B) = En and hence

f⁻¹(B∆) = f⁻¹(σ(A∆)) = σ(⋃n≥0 f⁻¹(πn⁻¹(B))) = σ(⋃n≥0 En) = E,

i.e., f⁻¹(B∆) = E. Note that πn⁻¹(fn(X)) ∈ A∆ for each n ≥ 0 (since fn(X) ∈ B) and that the sequence {πn⁻¹(fn(X))}n≥0 is decreasing (since πn = ∆ ∘ πn+1 and fn = ∆ ∘ fn+1 for each n ≥ 0).

Lemma 19.4 f(X) = ⋂n≥0 πn⁻¹(fn(X)) and so in particular f(X) ∈ B∆.

Proof Clearly f(X) ⊂ ⋂n≥0 πn⁻¹(fn(X)), since πn ∘ f = fn for each n ≥ 0. Thus consider z ∈ ⋂n≥0 πn⁻¹(fn(X)), so z = {zn}n≥0 ∈ M∆ with zn ∈ fn(X) and zn = ∆(zn+1) for each n ≥ 0. Put An = fn⁻¹({zn}); by Lemma 16.3 An ∈ A(En) and, since zn = ∆(zn+1),

An+1 = fn+1⁻¹({zn+1}) ⊂ fn+1⁻¹(∆⁻¹({zn})) = fn⁻¹({zn}) = An

for each n ≥ 0. Therefore by assumption A∞ = ⋂n≥0 An ≠ ∅, and f(x) = z for each x ∈ A∞, i.e., z ∈ f(X).

By Proposition 18.1 (M∆, B∆) is substandard Borel and it thus follows from Lemma 19.4 and Lemma 18.2 that (X, E) is substandard Borel.

It remains to show that {En}n≥0 has the Kolmogorov extension property and for this the following fact will be needed:

Proposition 19.1 Let {νn}n≥0 be a sequence from P(B) with νn = ∆∗νn+1 for each n ≥ 0. Then there exists a unique measure ν ∈ P(B∆) such that νn = (πn)∗ν for all n ≥ 0.

Proof Again let C ⊂ B be the algebra of cylinder sets. Now ∆⁻¹(C) ⊂ C, since ∆ is continuous, and thus πn⁻¹(C) = πn+1⁻¹(∆⁻¹(C)) ⊂ πn+1⁻¹(C) for each n ≥ 0, and therefore {πn⁻¹(C)}n≥0 is an increasing sequence of countable algebras. Put C∆ = ⋃n≥0 πn⁻¹(C); then C∆ is a countable algebra with C∆ ⊂ A∆ and

σ(C∆) = σ(⋃n≥0 πn⁻¹(C)) = σ(⋃n≥0 πn⁻¹(σ(C))) = σ(A∆) = B∆.

Moreover, each element of C∆ is compact (since the mappings πn are continuous and M∆ is compact), and hence C∆ has the finite intersection property.

Let n ≥ 0; then (since πn is surjective) Theorem 13.1 implies there is a unique ν′n ∈ P(πn⁻¹(B)) with νn = (πn)∗ν′n, and the sequence {ν′n}n≥0 is consistent in that ν′n+1(D) = ν′n(D) for all D ∈ πn⁻¹(B), n ≥ 0. (Let D ∈ πn⁻¹(B), say D = πn⁻¹(B) = πn+1⁻¹(B′) with B, B′ ∈ B; then B′ = ∆⁻¹(B), since πn = ∆ ∘ πn+1 and πn+1 is surjective, and so ν′n+1(D) = νn+1(B′) = νn+1(∆⁻¹(B)) = νn(B) = ν′n(D).) There is therefore a unique mapping ν′ : A∆ → R⁺ such that ν′(D) = ν′n(D) for all D ∈ πn⁻¹(B), n ≥ 0, and it is clear that ν′ is finitely additive. Now the restriction of ν′ to C∆ is also finitely additive and hence (as in the proof of Proposition 16.6) there exists a unique ν ∈ P(B∆) with ν(D) = ν′(D) for all D ∈ C∆. But then the restriction of ν to πn⁻¹(B) is a probability measure which is an extension of the restriction of ν′n to πn⁻¹(C), and πn⁻¹(C) is an algebra with σ(πn⁻¹(C)) = πn⁻¹(B). This means that ν is an extension of ν′n and from this it immediately follows that νn = (πn)∗ν for each n ≥ 0. The uniqueness of ν is clear, since the requirement that νn = (πn)∗ν for all n ≥ 0 determines ν on the algebra A∆.

Now let {µn}n≥0 be a consistent sequence of measures (with µn ∈ P(En) for each n ≥ 0) and for each n ≥ 0 let νn = (fn)∗µn ∈ P(B). Then

νn(B) = µn(fn⁻¹(B)) = µn+1(fn+1⁻¹(∆⁻¹(B))) = νn+1(∆⁻¹(B))

for each B ∈ B (using that fn = ∆ ∘ fn+1, so fn+1⁻¹(∆⁻¹(B)) = fn⁻¹(B) ∈ En, and that the sequence {µn}n≥0 is consistent), and hence νn = ∆∗νn+1 for each n ≥ 0. By Proposition 19.1 there thus exists a unique measure ν ∈ P(B∆) such that νn = (πn)∗ν for all n ≥ 0. But ν(πn⁻¹(fn(X))) = νn(fn(X)) = µn(X) = 1 for each n ≥ 0 and hence by Lemma 19.4 ν(f(X)) = 1. Therefore by Theorem 13.1 there exists a measure µ ∈ P(E) such that ν = f∗µ. Let n ≥ 0 and E ∈ En; there thus exists B ∈ B with E = fn⁻¹(B) and then

µ(E) = µ(fn⁻¹(B)) = µ(f⁻¹(πn⁻¹(B))) = ν(πn⁻¹(B)) = νn(B) = µn(fn⁻¹(B)) = µn(E);

i.e., µ(E) = µn(E) for all E ∈ En, n ≥ 0.

This shows that the sequence {En}n≥0 has the Kolmogorov extension property, which completes the proof of Theorem 19.1.

20 Convergence of conditional expectations

Let (X, E) be a measurable space and µ ∈ P(E) be a probability measure (both considered to be fixed in what follows). Recall from Theorem 12.3 that if E′ is a sub-σ-algebra of E and if f ∈ M(E) with µ(f) < ∞ then there exists g ∈ M(E′) such that µ(hg) = µ(hf) for all h ∈ M(E′), and that if g′ ∈ M(E′) is a further mapping with this property then g′ = g µ-a.e. Since µ is a probability measure we call g a version of the conditional expectation of f with respect to E′. If f is bounded above by b ∈ R⁺ then by Theorem 12.4 g can also be chosen so that g ≤ b.

In the next chapter and several times in Specifications and their Gibbs states [16] we need the two results below about the convergence of conditional expectations. We note that these results are stated and proved only for bounded mappings (since that is all we will require) but in fact they hold for all mappings f ∈ M(E) with µ(f) < ∞.

Proposition 20.1 Let {En}n≥1 be a decreasing sequence of sub-σ-algebras of E and put E∞ = ⋂n≥1 En. Let f ∈ MB(E), for each n ≥ 1 let fn ∈ MB(En) be a version of the conditional expectation of f with respect to En and let f∞ ∈ MB(E∞) be a version of the conditional expectation of f with respect to E∞. Then

µ({x ∈ X : limn→∞ fn(x) = f∞(x)}) = 1,

i.e., limn fn = f∞ µ-a.e. Moreover, limn µ(|fn − f∞|) = 0.

Proposition 20.2 Let {En}n≥1 be an increasing sequence of sub-σ-algebras of E and put E′ = σ(⋃n≥1 En). Let f ∈ MB(E), for each n ≥ 1 let fn ∈ MB(En) be a version of the conditional expectation of f with respect to En and let f′ ∈ MB(E′) be a version of the conditional expectation of f with respect to E′. Then

µ({x ∈ X : limn→∞ fn(x) = f′(x)}) = 1,

i.e., limn fn = f′ µ-a.e. Moreover, limn µ(|fn − f′|) = 0.
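Proposition 20.2 can be illustrated numerically. The sketch below makes illustrative choices not in the text: X = [0, 1] with Lebesgue measure, discretised to a finite grid; En generated by the dyadic intervals of level n, so that a version of the conditional expectation of f with respect to En is obtained by averaging f over each dyadic interval.

```python
# Dyadic conditional expectations of f(x) = x^2 on [0, 1], on a grid.
# All choices (grid size, f, dyadic filtration) are illustrative.
N = 2 ** 12  # grid points
f = [(i / N) ** 2 for i in range(N)]  # bounded, so f is in MB(E)

def cond_exp(f, n):
    """Average f over each of the 2^n dyadic intervals of level n: a
    version of the conditional expectation of f with respect to E_n."""
    block = len(f) // 2 ** n
    out = []
    for b in range(2 ** n):
        avg = sum(f[b * block:(b + 1) * block]) / block
        out.extend([avg] * block)
    return out

# mu(|f_n - f|) decreases towards 0 as the filtration refines
errs = [sum(abs(a - b) for a, b in zip(cond_exp(f, n), f)) / N
        for n in (2, 5, 8)]
assert errs[0] > errs[1] > errs[2]
```

Here σ(⋃n En) is the whole Borel σ-algebra of [0, 1], so f′ = f and the errors µ(|fn − f′|) shrink as n grows, as the proposition predicts.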

Recall if {fn}n≥1 is any sequence from M(E) and f ∈ M(E) then by Lemma 9.7 the sets {x ∈ X : limn fn(x) exists} and {x ∈ X : limn fn(x) = f(x)} are both in E. Moreover, if f, g ∈ MB(E) then |f − g| ∈ MB(E), since by Lemma 9.5 (1) MB(E) is a normal subspace of M(X) and |f − g| = (f ∨ g − g) + (f ∨ g − f).

The proofs of Propositions 20.1 and 20.2 make use of the corresponding simple versions of the martingale convergence theorem. This theorem is one of the most important results in probability theory and we presume that the reader has seen a proof of it. (A very good presentation can be found in Chapter 5 of Breiman [2].) In order to prove Propositions 20.1 and 20.2 we only need a version for uniformly bounded martingales. Thus we give a simple proof here of the martingale convergence theorem for this special case. The proof is adapted from the proof of Baez-Duarte and Isaac [9] which can be found in Garsia [7]. We start by looking at the version of martingales needed in Proposition 20.1.

Let {En}n≥1 be a decreasing sequence of sub-σ-algebras of E. A sequence {fn}n≥1 from M(E) is said to be adapted to {En}n≥1 if fn ∈ M(En) for all n ≥ 1. An adapted sequence {fn}n≥1 is said to be a martingale if µ(fn) < ∞ for each n ≥ 1 and µ(IEfn) = µ(IEfm) for all E ∈ En whenever n ≥ m. If {fn}n≥1 is a martingale then it immediately follows from Proposition 9.2 that also µ(hfn) = µ(hfm) for all h ∈ M(En) whenever n ≥ m. Usually martingales are defined with an increasing sequence of σ-algebras, and what we are dealing with here are often called backward martingales. The ‘usual’ martingales enter the picture when we look at Proposition 20.2.

Proposition 20.3 (1) Let {fn}n≥1 be a martingale; then fn is a version of the conditional expectation of f1 with respect to En for each n ≥ 1.

(2) Let f ∈ M(E) with µ(f) < ∞ and for each n ≥ 1 let fn be a version of the conditional expectation of f with respect to En. Then {fn}n≥1 is a martingale.

Proof (1) This is clear, since by definition fn ∈ M(En) and µ(hfn) = µ(hf1) for all h ∈ M(En), n ≥ 1.

(2) For each n ≥ 1 let fn be a version of the conditional expectation of f with respect to En. Then fn ∈ M(En) and so the sequence {fn}n≥1 is adapted. Moreover µ(fn) = µ(f) < ∞ for each n ≥ 1. Let n ≥ m and E ∈ En; then also E ∈ Em and therefore µ(IEfn) = µ(IEf) = µ(IEfm). This shows that {fn}n≥1 is a martingale.

Put E∞ = ⋂n≥1 En; thus E∞ is a σ-algebra which is called the tail σ-algebra. Here is the martingale convergence theorem for a backward martingale:

Theorem 20.1 Let {fn}n≥1 be a martingale and f∞ ∈ M(E∞) be a version of the conditional expectation of f1 with respect to E∞. Then

µ({x ∈ X : limn→∞ fn(x) = f∞(x)}) = 1.

We are only going to prove this result for uniformly bounded martingales, i.e., under the additional assumption that there exists b ∈ R⁺ such that fn ≤ b for each n ≥ 1 (and so in particular fn ∈ MB(En) for each n ≥ 1). If f, g ∈ MB(E′) then |f − g| ∈ MB(E′) and thus also (f − g)² ∈ MB(E′).

We start the proof with some results about finite sequences of mappings. Let F1, . . . , Fp be sub-σ-algebras of E with Fk+1 ⊂ Fk for k = 1, . . . , p − 1. A finite sequence {fk}k=1,...,p from MB(E) is adapted to {Fk}k=1,...,p if fk ∈ MB(Fk) for k = 1, . . . , p. An adapted sequence {fk}k=1,...,p is said to be a martingale (resp. a submartingale) if µ(IF fk) = µ(IF fj) (resp. µ(IF fk) ≤ µ(IF fj)) for all F ∈ Fk whenever k ≥ j. (Note that all the mappings occurring here are assumed to be bounded.) If {fk}k=1,...,p is a martingale (resp. submartingale) then we clearly also have µ(hfk) = µ(hfj) (resp. µ(hfk) ≤ µ(hfj)) for all h ∈ MB(Fk) whenever k ≥ j.

Lemma 20.1 Let {gk}k=1,...,p be a submartingale and put g∗ = g1 ∨ · · · ∨ gp. Then

µ({x ∈ X : g∗(x) > a}) ≤ a⁻¹ µ(g1)

for all a > 0.

Proof Put F = {x ∈ X : g∗(x) > a} and for k = 1, . . . , p let

Fk = {x ∈ X : gk(x) > a and gj(x) ≤ a for j = k + 1, . . . , p}.

Then Fk ∈ Fk for each k and F is the disjoint union of the sets F1, . . . , Fp. Thus

µ(IF g1) = ∑k µ(IFk g1) ≥ ∑k µ(IFk gk) ≥ ∑k µ(aIFk) = aµ(F),

with the sums taken over k = 1, . . . , p, and therefore µ(F) ≤ a⁻¹ µ(IF g1) ≤ a⁻¹ µ(g1).
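The disjoint decomposition at the heart of this proof is easy to check mechanically: F = {g∗ > a} splits according to the last index k at which gk exceeds a. The sketch below uses randomly generated non-negative values standing in for the mappings gk (purely illustrative; no martingale structure is needed for the decomposition itself).

```python
import random

# Verify that F = {g* > a} is the disjoint union of the sets
# F_k = {g_k > a and g_j <= a for j = k+1, ..., p} on a finite model.
random.seed(1)
p, npts, a = 6, 200, 0.8
g = [[random.random() for _ in range(npts)] for _ in range(p)]  # g[k][x]

F = {x for x in range(npts) if max(g[k][x] for k in range(p)) > a}
Fk = [
    {x for x in range(npts)
     if g[k][x] > a and all(g[j][x] <= a for j in range(k + 1, p))}
    for k in range(p)
]

# pairwise disjoint, and their union is exactly F
assert all(Fk[i].isdisjoint(Fk[j])
           for i in range(p) for j in range(i + 1, p))
assert set().union(*Fk) == F
```

The submartingale property then lets each µ(IFk gk) be bounded by µ(IFk g1), giving the maximal inequality.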

Lemma 20.2 If {fk}k=1,...,p is a martingale then {fk²}k=1,...,p is a submartingale.

Proof Let j ≤ k and F ∈ Fk. Then µ(IF fk²) = µ(IF fkfk) = µ(IF fkfj) and so by the Cauchy-Schwarz inequality (Proposition 10.7) applied to the measure µ·IF

µ(IF fk²)² = µ(IF fkfj)² ≤ µ(IF fk²) µ(IF fj²).

Thus µ(IF fk²) ≤ µ(IF fj²) (since this holds trivially when µ(IF fk²) = 0).

Lemma 20.3 Let {fk}k=1,...,p be a martingale and for each k put gk = (fk − fp)². Then {gk}k=1,...,p is a submartingale.

Proof Since Fp ⊂ Fk it follows that gk ∈ MB(Fk). Let j ≤ k and F ∈ Fk. Then by Lemma 20.2

µ(IF gk) + 2µ(IF fkfp) = µ(IF (fk − fp)² + 2IF fkfp) = µ(IF fk² + IF fp²)
= µ(IF fk²) + µ(IF fp²) ≤ µ(IF fj²) + µ(IF fp²)
= µ(IF fj² + IF fp²) = µ(IF (fj − fp)² + 2IF fjfp)
= µ(IF gj) + 2µ(IF fjfp).

But µ(IF fkfp) = µ(IF fpfk) = µ(IF fpfj) = µ(IF fjfp), since fp ∈ MB(Fk), and therefore µ(IF gk) ≤ µ(IF gj).

Lemma 20.4 Let {fk}k=1,...,p be a martingale; then for each a > 0

µ({x ∈ X : f∗(x) > a}) ≤ a⁻² µ((f1 − fp)²),

where f∗ = |f1 − fp| ∨ · · · ∨ |fp−1 − fp|.

Proof This follows immediately from Lemmas 20.1 and 20.3.

Now let {fn}n≥1 be a martingale with fn ∈ MB(En) for each n ≥ 1 (though at the moment we do not assume that they are uniformly bounded). Let n ≥ m; then fn ∈ MB(Em) and so µ(fn²) = µ(fnfn) = µ(fnfm). Thus

µ((fn − fm)²) + 2µ(fn²) = µ((fn − fm)² + 2fn²) = µ((fn − fm)² + 2fnfm) = µ(fn² + fm²) = µ(fn²) + µ(fm²)

and hence µ((fn − fm)²) + µ(fn²) = µ(fm²). In particular µ(fm²) − µ(fn²) ≥ 0, which implies that the sequence {µ(fn²)}n≥1 is decreasing.

Let a = limn µ(fn²), and choose a subsequence {nj}j≥1 such that µ(f_{n_j}²) < a + 2⁻²ʲ for all j ≥ 1. Then µ((f_{n_{j+1}} − f_{n_j})²) = µ(f_{n_j}²) − µ(f_{n_{j+1}}²) < 2⁻²ʲ for all j ≥ 1 and hence by the Cauchy-Schwarz inequality (Proposition 10.7)

µ(|f_{n_{j+1}} − f_{n_j}|)² = µ(1 · |f_{n_{j+1}} − f_{n_j}|)² ≤ µ(1) µ((f_{n_{j+1}} − f_{n_j})²) = µ((f_{n_{j+1}} − f_{n_j})²),

which means that µ(|f_{n_{j+1}} − f_{n_j}|) < 2⁻ʲ for all j ≥ 1. Thus

µ(∑j≥1 |f_{n_{j+1}} − f_{n_j}|) = ∑j≥1 µ(|f_{n_{j+1}} − f_{n_j}|) < ∞;

hence if E = {x ∈ X : ∑j≥1 |f_{n_{j+1}} − f_{n_j}|(x) < ∞} then by Proposition 10.6 (4) µ(E) = 1. But clearly E is a subset of the set

G′ = {x ∈ X : limj→∞ f_{n_j}(x) exists}

and therefore µ(G′) = 1. Now for each j ≥ 1 let

f∗j = |f_{n_j} − f_{n_{j+1}}| ∨ · · · ∨ |f_{n_{j+1}−1} − f_{n_{j+1}}|;

then by Lemma 20.4 µ({x ∈ X : f∗j(x) > a}) ≤ a⁻² µ((f_{n_j} − f_{n_{j+1}})²) < a⁻² 2⁻²ʲ for each a > 0 and hence

µ(⋂j≥N {x ∈ X : f∗j(x) ≤ a}) = 1 − µ(⋃j≥N {x ∈ X : f∗j(x) > a}) ≥ 1 − ∑j≥N µ({x ∈ X : f∗j(x) > a}) ≥ 1 − a⁻² ∑j≥N 2⁻²ʲ ≥ 1 − a⁻² 2⁻ᴺ

for all N ≥ 1, a > 0, which implies that

µ(⋃N≥1 ⋂j≥N {x ∈ X : f∗j(x) ≤ a}) = 1

for each a > 0, which in turn implies that

µ(⋂m≥1 ⋃N≥1 ⋂j≥N {x ∈ X : f∗j(x) ≤ 1/m}) = 1.

Thus if G = {x ∈ X : limn fn(x) exists} then

G ⊃ G′ ∩ ⋂m≥1 ⋃N≥1 ⋂j≥N {x ∈ X : f∗j(x) ≤ 1/m},

and we have therefore shown that µ(G) = 1. Define a mapping f′ : X → R⁺∞ by letting f′(x) = limn fn(x) if this limit exists, and f′(x) = 0 otherwise.

Then f′ ∈ M(E), since f′ = IG lim supn fn and lim supn fn ∈ M(E), G ∈ E. But f′ does not change if the sequence {fn}n≥1 is replaced by {fn}n≥m and so the same argument implies that f′ ∈ M(Em) for all m ≥ 1, i.e., f′ ∈ M(E∞).

We now assume that fn ≤ b for all n ≥ 1 for some b ∈ R⁺. Then by Theorem 10.4 and Proposition 10.6 (3) µ(IEf′) = limn µ(IEIGfn) = limn µ(IEfn) for all E ∈ E, since f′ = limn IGfn and µ(X \ G) = 0. In particular, if E ∈ E∞ then

µ(IEf′) = limn→∞ µ(IEfn) = limn→∞ µ(IEf1) = µ(IEf1),

since E ∈ En for each n ≥ 1, and this implies f′ is a version of the conditional expectation of f1 with respect to E∞.

Finally, let f∞ be any version of the conditional expectation of f1 with respect to E∞, let F = {x ∈ X : f∞(x) = f′(x)} and G∞ = {x ∈ X : limn fn(x) = f∞(x)}. Then G∞ ⊃ G ∩ F and µ(G) = µ(F) = 1; hence µ(G∞) = 1. This completes the proof of Theorem 20.1 for a uniformly bounded martingale.

Proposition 20.4 Let {fn}n≥1 be a uniformly bounded martingale and f∞ be a bounded version of the conditional expectation of f1 with respect to E∞. Then

limn→∞ µ(|fn − f∞|) = 0.

Proof Let G = {x ∈ X : limn fn(x) = f∞(x)}; then by Theorem 20.1 µ(G) = 1, and limn IGfn = IGf∞. Moreover IGfn ≤ b, where b ∈ R⁺ is such that fn ≤ b for all n and µ(b) = b < ∞. Thus by Theorem 10.4 and Proposition 10.6 (3)

limn→∞ µ(|fn − f∞|) = limn→∞ µ(IG|fn − f∞|) = limn→∞ µ(|IGfn − IGf∞|) = 0.

Proof of Proposition 20.1: Let b ∈ R+ be such that f ≤ b; then by Theorem 12.4 (together with the uniqueness in Theorem 12.3) fn ≤ b µ-a.e., and hence fn ∧ b is also a version of the conditional expectation of f with respect to En. Thus by Proposition 20.3 (2), Theorem 20.1 and Proposition 20.4 limn fn ∧ b = f∞ µ-a.e. and limn µ(|fn ∧ b − f∞|) = 0, and so the same holds with fn replacing fn ∧ b.

We now turn to the proof of Proposition 20.2, so now let {En}n≥1 be an increasing sequence of sub-σ-algebras of E. A sequence {fn}n≥1 from M(E) is again said to be adapted to {En}n≥1 if fn ∈ M(En) for all n ≥ 1. An adapted sequence {fn}n≥1 is a martingale if µ(fn) < ∞ for each n ≥ 1 and µ(IEfm) = µ(IEfn) for all E ∈ Em whenever m ≤ n.

Proposition 20.5 Let f ∈ M(E) with µ(f) < ∞ and for each n ≥ 1 let fn be a version of the conditional expectation of f with respect to En. Then {fn}n≥1 is a martingale.

Proof This is the same as the proof of Proposition 20.3 (2). (In general, however, there is no result corresponding to Proposition 20.3 (1).)

Here is the martingale convergence theorem for such a (forward) martingale:

Theorem 20.2 Let {fn}n≥1 be a martingale with supn µ(fn) < ∞. Then

µ({x ∈ X : limn→∞ fn(x) exists}) = 1.

Proof For uniformly bounded martingales the proof is essentially the same as the proof of the analogous statement in Theorem 20.1. The only real difference is that the sequence {µ(fn²)}n≥1, which was decreasing for a backward martingale, is now increasing. We thus need the assumption that the martingale is uniformly bounded at this point to ensure that the limit a = limn µ(fn²) is finite. (In the previous case the uniform bound was only required later.) The details of the proof are left to the reader.

Proof of Proposition 20.2: Let b ∈ R⁺ be such that f ≤ b; then as in the proof of Proposition 20.1 we can assume that fn ≤ b for each n ≥ 1. Put

G = {x ∈ X : limn→∞ fn(x) exists};

then by Lemma 9.7 G ∈ E′, since fn ∈ M(E′) for each n ≥ 1, and by Theorem 20.2 and Proposition 20.5 µ(G) = 1. Then limn(IGfn)(x) exists for each x ∈ X and IGfn ∈ MB(E′) for each n ≥ 1; hence if g = limn IGfn then g ∈ MB(E′) (with g ≤ b) and limn fn = g µ-a.e. Now if E ∈ Em for some m ≥ 1 then E ∈ En for all n ≥ m and so by Proposition 10.6 (3) µ(IEf) = µ(IEfn) = µ(IEIGfn) for all n ≥ m. Therefore by Theorem 10.4 µ(IEg) = limn µ(IEIGfn) = µ(IEf), since limn IEIGfn = IEg and IEIGfn ≤ b for each n ≥ 1, and this shows that µ(IEg) = µ(IEf) for all E ∈ A = ⋃m≥1 Em. But D = {E ∈ E : µ(IEg) = µ(IEf)} is clearly a monotone class and A is an algebra and so µ(IEg) = µ(IEf) for all E ∈ E′ = σ(A). Thus g is a version of the conditional expectation of f with respect to E′, which implies that limn fn = f′ µ-a.e., since by Theorem 12.3 f′ = g µ-a.e. Finally, the proof that limn µ(|fn − f′|) = 0 is exactly the same as the proof of Proposition 20.4.

21 Existence of conditional distributions

Let us say that conditional distributions exist for measurable spaces (X, E) and (Y, F) if for each µ ∈ P(E × F) there exists a probability kernel π : X × F → R⁺∞ such that µ(E × F) = µ1(IEπ(IF)) for all E ∈ E, F ∈ F, where µ1 = (p1)∗µ is the image measure of µ under the projection p1 : X × Y → X onto the first component. (Beware that this definition is not symmetric in (X, E) and (Y, F).) Conditional distributions do not exist in general. However, they do exist if (X, E) is countably generated and (Y, F) is substandard Borel. Proofs of this fact with substandard Borel replaced by standard Borel can be found in Chapter 1 of Doob [4], Chapter V of Parthasarathy [15], and also in Appendix 4 of Dynkin and Yushkevich [6].

Theorem 21.1 Conditional distributions exist for (X, E) and (Y, F) if (X, E) is countably generated and (Y, F) is substandard Borel.

Proof We reduce things to the case in which (X, E) = (Y, F) = (M, B).

Lemma 21.1 Conditional distributions exist for (M, B) and (M, B).

Proof For m ≥ 1 again let Cm = qm⁻¹(P({0, 1}ᵐ)), where qm : M → {0, 1}ᵐ is given by qm({zn}n≥1) = (z1, . . . , zm). Thus {Cm}m≥1 is an increasing sequence of finite algebras with C = ⋃m≥1 Cm. For each z ∈ M and each n ≥ 1 let an(z) be the atom of Cn containing z. Let N ⊂ B be the trivial σ-algebra with N = {∅, M}.

Let µ ∈ P(B × B) and µ1 = (p1)∗µ with p1 : M × M → M projecting onto the first component. For each n ≥ 1 define γn : M × B → R⁺ by

γn(z, B) = µ(an(z) × B) / µ1(an(z)),

with 0/0 taken to be 0. Then γn(z, ·) is either 0 or an element of P(B) for each z ∈ M, γn(·, B) ∈ M(Cn) for each B ∈ B and µ(C × B) = µ1(IC γn(IB)) for all C ∈ Cn, B ∈ B. Consider the mapping γ′n : (M × M) × B → R⁺ with γ′n((z1, z2), B) = γn(z1, B); then γ′n(·, B) ∈ M(Cn × N) and

0 µ(IC×N γn(IB)) = µ((C × N) ∩ (M × B)) = µ(IC×N IM×B)

129 21 Existence of conditional distributions 130

0 for all C ∈ Cn, N ∈ N , B ∈ B. Thus γn(IB) is a version of the conditional expectation of IM×B with respect to Cn × N for each n ≥ 1 and it therefore follows from Proposition 20.2 that

µ1 z ∈ M : lim γn(z, B) exists n→∞   0 = µ (z1, z2) ∈ M × M : lim γn((z1, z2), B) exists = 1 n→∞   for each B ∈ B. Put

MC = {z ∈ M : limn→∞ γn(z, C) exists for all C ∈ C};

since C is countable it follows that MC ∈ B and µ1(MC) = 1. Choose z0 ∈ MC and define a mapping γ : M × C → R+ by letting γ(z, C) = limn γn(z, C) if z ∈ MC and γ(z, C) = limn γn(z0, C) if z ∈ M \ MC. Then γ(·, C) ∈ M(B) for each C ∈ C and by Theorem 10.4

µ(C1 × C2) = limn→∞ µ1(IC1 γn(IC2)) = µ1(IC1 γ(IC2))

for all C1, C2 ∈ C. Hence by Proposition 2.1 µ(B1 × C2) = µ1(IB1 γ(IC2)) for all B1 ∈ B, C2 ∈ C. Now it is clear that the mapping γ(z, ·) : C → R+∞ is additive with γ(z, M) = 1 for each z ∈ M and so by Proposition 16.6 it has a unique extension to an element of P(B) which will also be denoted by γ(z, ·). Thus γ : M × B → R+ is a pre-kernel which by Proposition 14.5 is a probability kernel satisfying µ(B1 × B2) = µ1(IB1 γ(IB2)) for all B1, B2 ∈ B.
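The finite-algebra conditioning at the heart of this construction can be checked numerically. The following sketch is a hypothetical toy (not part of the notes): M is shrunk to the finite space {0, 1}³, µ is an arbitrary normalised weight function on pairs, and γn(z, B) = µ(an(z) × B)/µ1(an(z)) conditions on the atom an(z) determined by the first n coordinates of z. The two assertions verify that γn(z, ·) is a probability measure and that the defining identity µ(C × B) = µ1(IC γn(IB)) holds on an atom C ∈ Cn.

```python
from itertools import product

# Toy model: M replaced by {0,1}^3, mu a probability measure on M x M.
POINTS = [tuple(p) for p in product([0, 1], repeat=3)]

weights = {(z, w): 1 + z[0] + 2 * w[1] for z in POINTS for w in POINTS}
total = sum(weights.values())
mu = {zw: v / total for zw, v in weights.items()}

def mu1(A):
    """Image measure of mu under the first projection, evaluated on A."""
    return sum(p for (z, _), p in mu.items() if z in A)

def atom(z, n):
    """The atom a_n(z): all points agreeing with z in the first n coordinates."""
    return {w for w in POINTS if w[:n] == z[:n]}

def gamma(n, z, B):
    """gamma_n(z, B) = mu(a_n(z) x B) / mu_1(a_n(z)), with 0/0 taken to be 0."""
    a = atom(z, n)
    denom = mu1(a)
    if denom == 0:
        return 0.0
    num = sum(p for (x, w), p in mu.items() if x in a and w in B)
    return num / denom

# gamma_n(z, .) is a probability measure on the second factor:
z = (0, 1, 1)
assert abs(gamma(2, z, set(POINTS)) - 1.0) < 1e-12

# the defining identity mu(C x B) = mu_1(I_C gamma_n(I_B)) on an atom C of C_n:
C = atom(z, 2)
B = {w for w in POINTS if w[1] == 1}
lhs = sum(p for (x, w), p in mu.items() if x in C and w in B)
rhs = sum(mu1({x}) * gamma(2, x, B) for x in C)
assert abs(lhs - rhs) < 1e-12
```

On an atom C the value γn(·, B) is constant, which is why the sum on the right collapses to µ(C × B); the infinite-dimensional argument above replaces this finite check by martingale convergence.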

Lemma 21.2 If (Y, F) is substandard Borel then conditional distributions exist for (M, B) and (Y, F).

Proof Since (Y, F) is substandard Borel there exists a mapping f : Y → M with f−1(B) = F such that f(Y) ∈ B. Let µ ∈ P(B × F) and let µ1 = (p1)∗µ with p1 : M × Y → M the projection onto the first component. Put ν = g∗µ, where g = idM × f : M × Y → M × M, so ν ∈ P(B × B). Then by Lemma 21.1 there exists a probability kernel γ : M × B → R+ such that

ν(B1 × B2) = ν1(IB1 γ(IB2)) for all B1, B2 ∈ B, where ν1 = (p̆1)∗ν with p̆1 : M × M → M projecting onto the first component, and note that ν1 = µ1, since p̆1 ◦ g = p̆1 ◦ (idM × f) = p1. Now consider M0 = {z ∈ M : γ(z, f(Y)) = 1}; then M0 ∈ B and ν1(M0) = 1, since

1 = µ(M × Y) = µ(g−1(M × f(Y))) = ν(M × f(Y)) = ν1(γ(f(Y))).

Choose some point z0 ∈ M0 and define γo : M × B → R+∞ by

γo(z, B) = γ(z, B) if z ∈ M0 and γo(z, B) = γ(z0, B) if z ∈ M \ M0;

then γo is a probability kernel with γo(z, f(Y)) = 1 for all z ∈ M and

ν(B1 × B2) = ν1(IB1 γo(IB2)) for all B1, B2 ∈ B. Now by Theorem 13.1 there exists for each z ∈ M a probability measure τ(z, ·) ∈ P(F) so that τ(z, f−1(B)) = γo(z, B) for all B ∈ B, and then τ : M × F → R+ is clearly a probability kernel. Let B ∈ B and F ∈ F; then F = f−1(B′) for some B′ ∈ B and so

µ(B × F) = µ(B × f−1(B′)) = µ(g−1(B × B′)) = ν(B × B′)

= ν1(IB γo(IB′)) = µ1(IB τ(If−1(B′))) = µ1(IB τ(IF))

and this shows that conditional distributions exist for (M, B) and (Y, F).
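The Theorem 13.1 step used here, transporting the kernel γo along f, can be mimicked in a finite setting. The sets Y and M, the injection f and the weights below are invented for illustration: since γ gives full mass to f(Y), setting τ(F) = γ(f[F]) defines a probability on Y with τ(f−1(B)) = γ(B) for every B.

```python
# Finite toy (an illustration, not the construction in the notes) of the
# pre-image measure step: f : Y -> M injective, gamma a probability on M
# giving full mass to f(Y).

Y = ["a", "b", "c"]
f = {"a": 0, "b": 2, "c": 3}              # injective; f(Y) = {0, 2, 3}

gamma = {0: 0.5, 1: 0.0, 2: 0.3, 3: 0.2}  # gamma(f(Y)) = 1

def tau(F):
    """tau(F) = gamma(f[F]) for F a subset of Y."""
    return sum(gamma[f[y]] for y in F)

def preimage(B):
    """f^{-1}(B) as a subset of Y."""
    return {y for y in Y if f[y] in B}

# tau is a probability on Y and tau(f^{-1}(B)) = gamma(B) for every B,
# because gamma assigns mass 0 outside f(Y):
assert abs(tau(set(Y)) - 1.0) < 1e-12
B = {0, 1, 2}
assert abs(tau(preimage(B)) - sum(gamma[b] for b in B)) < 1e-12
```

The condition γo(z, f(Y)) = 1 arranged above is exactly what makes the corresponding identity hold for every B ∈ B, not just for subsets of f(Y).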

Proof of Theorem 21.1: Since the measurable space (X, E) is countably generated there exists by Proposition 16.1 a mapping f : X → M with f−1(B) = E. Let µ ∈ P(X × Y, E × F) and µ1 = (p1)∗µ with p1 : X × Y → X the projection onto the first component. Put ν = g∗µ, where g = f × idY : X × Y → M × Y, so ν ∈ P(M × Y, B × F). Then by Lemma 21.2 there exists a probability kernel τ : M × F → R+ such that ν(B × F) = ν1(IB τ(IF)) for all B ∈ B, F ∈ F, where ν1 = (p̆1)∗ν with p̆1 : M × Y → M the projection onto the first component, and ν1 = f∗µ1, since p̆1 ◦ g = p̆1 ◦ (f × idY) = f ◦ p1. Now define π : X × F → R+ by letting π(x, F) = τ(f(x), F) for all x ∈ X, F ∈ F; thus π is clearly a probability kernel. Let E ∈ E, F ∈ F; then E = f−1(B) for some B ∈ B and so

µ(E × F) = µ(f−1(B) × F) = µ(g−1(B × F)) = ν(B × F)

= ν1(IB τ(IF)) = (f∗µ1)(IB τ(IF))

= µ1(If−1(B) τ(f(·), F)) = µ1(IE π(IF))

and therefore conditional distributions exist for (X, E) and (Y, F). This completes the proof of Theorem 21.1.

22 Standard Borel spaces

In this chapter we discuss the relationship between standard and substandard Borel spaces. We start by giving (in Proposition 22.1) some further properties of the measurable space (M, B). They form the basis for most of the results which follow. However, we offer no proof of these facts, which would involve the typical machinery associated with standard Borel spaces.

Proposition 22.1 (1) If f : M → M is injective and B-measurable (meaning that f−1(B) ⊂ B) then f(B) ∈ B for each B ∈ B (and so in particular f(M) ∈ B). (2) If A ∈ B is uncountable then there exists an injective B-measurable mapping f : M → M with f(M) = A.

Proof Part (1) is a special case of a theorem of Kuratowski. Part (2) is contained in what goes under the name of the isomorphism theorem. For a treatment of these results see, for example, Chapter I of Parthasarathy [15].

A measurable space (X, E) is standard Borel if there exists a metric on X which makes it a complete separable metric space in such a way that E is then the Borel σ-algebra (the smallest σ-algebra containing the open subsets of X). The name ‘standard Borel’ was given to such spaces by Mackey in [13]. In particular, by Proposition 16.3 (1) a standard Borel space is countably generated. Moreover, a standard Borel space (X, E) is separable, i.e., {x} ∈ E for each x ∈ X.

Proposition 22.2 A measurable space (X, E) is standard Borel if and only if it is separable and substandard Borel.

Proof By Proposition 18.1 a standard Borel space is substandard Borel and clearly it is separable. Suppose then that (X, E) is a separable substandard Borel space. Then there exists a mapping f : X → M with f−1(B) = E such that f(X) ∈ B, and by Proposition 16.8 f is injective; if f is considered as a bijective mapping from X onto A = f(X) then f−1(B|A) = E. If A is countable then so is X and then E = P(X), since (X, E) is separable. In this case P(X) is the Borel σ-algebra of X considered as a topological space with the discrete topology. But the discrete topology is generated by the discrete metric δ (with δ(x, x) = 0 and δ(x, y) = 1 if x ≠ y) and the metric space (X, δ) is separable and complete. Hence (X, E) is standard Borel. Suppose then that A = f(X) is uncountable, i.e., A is an uncountable element of B. By Proposition 22.1 (2) there exists an injective B-measurable mapping h : M → M with h(M) = A and by Proposition 22.1 (1) h(B) ∈ B for each


B ∈ B. Define g : X → M by letting g(x) = h−1(f(x)) for each x ∈ X; thus g is surjective and hence bijective. Let B ∈ B; then g−1(B) = f−1(h(B)) ∈ E, since h(B) ∈ B, and so g−1(B) ⊂ E. On the other hand, for each E ∈ E there exists B ∈ B with f−1(B) = E and then h−1(B) ∈ B with g−1(h−1(B)) = E. This shows that g−1(B) = E. We now have a bijective mapping g : X → M with g−1(B) = E and the mapping g can be used to pull the metric on M back to a metric on X; then g becomes a homeomorphism between the metric spaces X and M. Thus X is a separable complete metric space with respect to this metric and E = g−1(B) is the Borel σ-algebra. This shows that (X, E) is standard Borel.

Proposition 22.1 can be exploited to give a lot more information about standard Borel and substandard Borel spaces. A couple of these results are given below. Note that if (X, E) is a countably generated measurable space and f : X → M is a mapping with f−1(B) = E then by Lemma 16.3 f(X) has the same cardinality as A(E). In particular, the cardinality of f(X) does not depend on f.

Lemma 22.1 Let (X, E) be countably generated and let f1, f2 : X → M be mappings with f1−1(B) = f2−1(B) = E; put A1 = f1(X) and A2 = f2(X). Then there exists a bijective mapping v : A1 → A2 with v−1(B|A2) = B|A1.

Proof Let z ∈ A1; then by Lemma 16.3 f1−1({z}) ∈ A(E) and so f2(x1) = f2(x2) for all x1, x2 ∈ f1−1({z}), since f2−1({w}) ∈ E for each w ∈ M. There thus exists a unique mapping v : A1 → M such that v ◦ f1 = f2, and since v(A1) = f2(X) = A2 we can consider v as a surjective mapping from A1 onto A2. Reversing the roles of f1 and f2 there is also a unique surjective mapping u : A2 → A1 with u ◦ f2 = f1, which implies that v is injective (since u ◦ v ◦ f1 = f1). This gives us a bijective mapping v : A1 → A2, and u : A2 → A1 must be the inverse mapping to v. Now let B2 ∈ B|A2, so B2 = B ∩ A2 for some B ∈ B. Then, since f1−1(B) = E, there exists B1 ∈ B with f1−1(B1) = f2−1(B) = f2−1(B2). It follows that

v−1(B2) = f1(f1−1(v−1(B2))) = f1(f2−1(B2)) = f1(f1−1(B1)) = f1(X) ∩ B1 = A1 ∩ B1

(since f1 : X → A1 is surjective) and therefore v−1(B2) ∈ B|A1. This shows that v−1(B|A2) ⊂ B|A1, and reversing the roles of f1 and f2 gives u−1(B|A1) ⊂ B|A2, which together implies that v−1(B|A2) = B|A1.

Proposition 22.3 Let (X, E) be a substandard Borel space. Then f(X) ∈ B for every mapping f : X → M with f−1(B) = E.

Proof Let f1, f2 : X → M with f1−1(B) = f2−1(B) = E, put A1 = f1(X) and A2 = f2(X) and suppose A1 ∈ B. Then by Lemma 22.1 there exists a bijective mapping v : A1 → A2 with v−1(B|A2) = B|A1. In particular, if A1 is countable then so is A2 and in this case A2 ∈ B, since B contains every countable subset of M. We can thus assume that A1 is uncountable and then by Proposition 22.1 (2) there exists an injective B-measurable mapping h : M → M with h(M) = A1. Define a mapping g : M → M by letting g(z) = v(h(z)) for each z ∈ M; thus g is injective and g(M) = A2. Now consider B ∈ B; then A2 ∩ B ∈ B|A2, hence v−1(A2 ∩ B) ∈ B|A1 and so v−1(A2 ∩ B) = A1 ∩ B′ for some B′ ∈ B. But this implies that g−1(B) = h−1(A1 ∩ B′) ∈ B, since A1 ∩ B′ ∈ B, and therefore g is B-measurable. Thus by Proposition 22.1 (1) A2 ∈ B. Finally, since (X, E) is substandard Borel there exists a mapping f : X → M with f−1(B) = E and f(X) ∈ B and so the above shows that f(X) ∈ B for every mapping f : X → M with f−1(B) = E.

Proposition 22.4 Let (X, E) be a substandard Borel space. (1) If A(E) is uncountable then there exists a surjective mapping f : X → M with f−1(B) = E. (2) If A(E) is countable then there exists a countable set N and a surjective mapping f : X → N with f−1(P(N)) = E.

Proof Since (X, E) is substandard Borel there exists a mapping g : X → M with g−1(B) = E such that g(X) ∈ B.

(1) Let A = g(X); then A is uncountable since, as was already noted, it has the same cardinality as A(E). By Proposition 22.1 (2) there thus exists an injective B-measurable mapping h : M → M with h(M) = A and then by Proposition 22.1 (1) h(B) ∈ B for each B ∈ B. Define f : X → M by letting f(x) = h−1(g(x)) for each x ∈ X; thus f is surjective. Let B ∈ B; then f−1(B) = g−1(h(B)) ∈ E, since h(B) ∈ B, and so f−1(B) ⊂ E. On the other hand, for each E ∈ E there exists B ∈ B with g−1(B) = E and then h−1(B) ∈ B with f−1(h−1(B)) = E. This shows that f−1(B) = E.

(2) Here N = g(X) is countable. Thus f = g : X → N is surjective and it now follows immediately from Lemma 16.3 that f−1(P(N)) = E.

Proposition 22.5 If (X, E) is a standard Borel space with X uncountable then there exists a bijective mapping f : X → M with f−1(B) = E. The measurable spaces (X, E) and (M, B) are therefore isomorphic.

Proof Since (X, E) is separable A(E) has the same cardinality as X and is thus uncountable. By Propositions 22.2 and 22.4 (1) there therefore exists a surjective mapping f : X → M with f−1(B) = E. But by Proposition 16.8 f is then injective and hence bijective.

We have treated the Kolmogorov extension property as well as the existence of conditional distributions using just substandard Borel spaces. The common characteristic of these two examples is that they involve constructing measures, and such problems seem to be relatively simple. However, there are plenty of more difficult problems which cannot be dealt with directly using substandard Borel spaces. A typical example concerns the existence of measurable selectors: Let (X, E) and (Y, F) be measurable spaces and h : X → Y be a surjective mapping with h−1(F) ⊂ E. By the axiom of choice there then exists a selector for h, i.e., a mapping ϕ : Y → X such that h ◦ ϕ = idY. If in addition ϕ−1(E) ⊂ F then ϕ is called a measurable selector. Unfortunately, measurable selectors do not always exist, even when (X, E) and (Y, F) are standard Borel spaces: Let I = [0, 1] and let p1 : I × I → I be the projection onto the first component. Then there exists a Borel subset A of I × I with p1(A) = I for which there does not exist a Borel measurable mapping g : I → I × I with g(I) ⊂ A such that p1(g(x)) = x for all x ∈ I. (See, for example, Blackwell [1].) However, so-called universally measurable selectors exist:

Proposition 22.6 Let (X, E) and (Y, F) be standard Borel and let h : X → Y be a surjective mapping with h−1(F) ⊂ E. Then there exists a selector ϕ : Y → X for h with ϕ−1(E) ⊂ F∗. Here F∗ is the intersection of all σ-algebras Fµ with µ a finite measure on F and with Fµ the completion of the σ-algebra F with respect to µ. (F∗ is called the σ-algebra of universally measurable sets.)

Proof Proofs of equivalent results can be found in Cohn [3], Theorem 8.5.3, and Dynkin and Yushkevich [6], Appendix 3.
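The purely set-theoretic content of the statement, that a surjection has a right inverse, needs no choice axiom and no measurability in the finite case; the difficulty in Proposition 22.6 lies entirely in the measurability of ϕ. A minimal sketch of the set-theoretic part, with an invented map h on a finite set:

```python
def selector(h, X):
    """Return a right inverse phi (as a dict) for the surjection x -> h(x)
    restricted to the finite set X: h(phi[y]) = y for every y in h(X)."""
    phi = {}
    for x in X:
        phi.setdefault(h(x), x)   # keep the first preimage found for each y
    return phi

h = lambda x: x % 3               # maps range(10) onto {0, 1, 2}
phi = selector(h, range(10))
assert all(h(phi[y]) == y for y in (0, 1, 2))
```

In the uncountable setting no such canonical "first preimage" exists, which is why one must settle for a selector that is only universally measurable rather than Borel measurable.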

We will show how to reduce the proof of Proposition 22.6 to a similar statement involving (M, B). However, this time it doesn't help and we end up with a task which is no easier than the original one. Let (X, E), (Y, F) and h : X → Y be as in the statement of the theorem, and as usual consider mappings f : X → M and g : Y → M such that f−1(B) = E and g−1(B) = F with A = f(X) and B = g(Y) both in B. Let θ : X → M × M be the mapping with θ = (g ◦ h, f). By Lemma 19.2 (applied to f and g ◦ h) it follows that θ−1(B × B) = E and θ(X) ∈ B × B.

Let D = θ(X) and p1 : M × M → M be the projection onto the first component. Then p1 ◦ θ = g ◦ h and p1(D) = B, since h is surjective.

By Proposition 16.8 θ is injective; let q : D → X be the inverse mapping. It then follows (again by Proposition 16.8) that q−1(E) = (B × B)|D ⊂ B × B. Suppose now there exists a mapping ψ : B → M × M with ψ(B) ⊂ D such that p1 ◦ ψ = idB. Define a mapping ϕ : Y → X by ϕ = q ◦ ψ ◦ g. Then

g ◦ h ◦ ϕ = g ◦ h ◦ q ◦ ψ ◦ g = p1 ◦ θ ◦ q ◦ ψ ◦ g = p1 ◦ ψ ◦ g = idB ◦ g = g

and hence h ◦ ϕ = idY, since by Proposition 16.8 g is injective. Suppose in addition ψ−1(B × B) ⊂ B∗. Then ϕ−1(E) ⊂ F∗, since it is not difficult to see that g−1(B∗) ⊂ F∗. The proof of Proposition 22.6 is therefore reduced to showing that the following holds:

Proposition 22.7 Let D be a non-empty subset of M × M such that D ∈ B × B and B = p1(D) ∈ B. Then there exists a mapping ψ : B → M × M with ψ(B) ⊂ D and ψ−1(B × B) ⊂ B∗ such that p1 ◦ ψ = idB.

Proof As already indicated, in this case the reduction to the statement involving (M, B) does not make the problem any easier. For the proof of Proposition 22.7 the reader will have to look at the references mentioned in the proof of Proposition 22.6.

23 The usual diagonal argument

In a couple of places we have used a result which is typically invoked as an appeal to ‘the usual diagonal argument’, and so we here give a precise statement of what this means.

For each non-empty set X denote by Σ(X) the set of all sequences {xn}n≥0 of elements from X; thus Σ(X) is the set of all mappings from N to X. (We index the sequences with N, but it would make no difference if we used N+ instead of N.) Denote by Σ.(N) the set of all strictly increasing mappings in Σ(N); the elements of Σ.(N) are called subsequences. Note that if τ ∈ Σ.(N) then τ(n) ≥ n for all n ≥ 0 and that τ1 ◦ τ2 ∈ Σ.(N) for all τ1, τ2 ∈ Σ.(N). If s = {xn}n≥0 ∈ Σ(X) is a sequence and τ = {nj}j≥0 ∈ Σ.(N) is a subsequence then there is the sequence s ◦ τ = {xnj}j≥0 ∈ Σ(X) (which is often also called a subsequence). For s, s′ ∈ Σ(X) we write s ∼ s′ if there exists m ≥ 0 such that s(k) = s′(k) for all k ≥ m. Thus ∼ is an equivalence relation on Σ(X). Let ∆ be a subset of Σ(X); we say that ∆ is ∼-invariant if s′ ∈ ∆ for all s′ ∈ Σ(X) such that s′ ∼ s for some s ∈ ∆. Moreover, we say that ∆ is closed under subsequences if s ◦ τ ∈ ∆ for all s ∈ ∆, τ ∈ Σ.(N).

Now let ∆0 and ∆1 be subsets of Σ(X) with ∆0 ⊂ ∆1; we call (∆0, ∆1) an admissible pair in Σ(X) if ∆0 is ∼-invariant and closed under subsequences, ∆1 is closed under subsequences and if for each s ∈ ∆1 there exists τ ∈ Σ.(N) such that s ◦ τ ∈ ∆0.

The prototypical example here is with X = R, with ∆1 the set of bounded real sequences and with ∆0 the set of convergent real sequences: This results in an admissible pair in Σ(R), since the Heine-Borel theorem implies that every bounded sequence possesses a convergent subsequence.
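A discrete variant of this example (an invented toy, not from the text): take X = {0, 1}, let ∆1 = Σ(X) and let ∆0 be the eventually constant sequences. Then ∆0 is ∼-invariant and closed under subsequences, and by the pigeonhole principle every 0-1 sequence contains a constant subsequence, so (∆0, ∆1) is an admissible pair. Once a value v occurring infinitely often is fixed, the witnessing subsequence can be written down explicitly:

```python
def constant_subsequence(s, v):
    """Return tau enumerating (in increasing order) the indices n with
    s(n) == v; assumes v occurs infinitely often in the sequence s."""
    def tau(j):
        count, n = -1, -1
        while count < j:          # scan until the (j+1)-th occurrence of v
            n += 1
            if s(n) == v:
                count += 1
        return n
    return tau

s = lambda n: n % 2               # the sequence 0, 1, 0, 1, ... in Delta_1
tau = constant_subsequence(s, 0)
assert [s(tau(j)) for j in range(5)] == [0, 0, 0, 0, 0]   # s o tau is constant
assert all(tau(j) < tau(j + 1) for j in range(5))         # tau is increasing
```

For a general 0-1 sequence one cannot decide effectively which value occurs infinitely often; the admissibility condition only asserts that a suitable τ exists.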

Theorem 23.1 For each n ≥ 0 let Xn be a non-empty set, let (∆0n, ∆1n) be an admissible pair in Σ(Xn) and let sn ∈ ∆1n. Then there exists a subsequence τ ∈ Σ.(N) such that sn ◦ τ ∈ ∆0n for all n ≥ 0.

Proof The following lemma (involving only the set Σ.(N)) is in some sense the real diagonal argument:

Lemma 23.1 Let {τn}n≥0 be a sequence of elements from Σ.(N) and for each n ≥ 0 put γn = τ0 ◦ · · · ◦ τn (and so γn ∈ Σ.(N)). Define τ : N → N by letting τ(n) = γn(n) for all n ≥ 0. Then τ ∈ Σ.(N). Moreover, for each n ≥ 0 there exists ηn ∈ Σ.(N) such that τ ∼ γn ◦ ηn.


Proof The mapping τ : N → N is strictly increasing since

τ(n + 1) = (τ0 ◦ · · · ◦ τn+1)(n + 1) = (τ0 ◦ · · · ◦ τn)(τn+1(n + 1))

> (τ0 ◦ · · · ◦ τn)(τn+1(n)) ≥ (τ0 ◦ · · · ◦ τn)(n) = τ(n) for all n ≥ 0, and hence τ ∈ Σ.(N). Fix n ≥ 0; then for all m > n

τ(m) = (τ0 ◦ · · · ◦ τm)(m)

= (τ0 ◦ · · · ◦ τn)((τn+1 ◦ · · · ◦ τm)(m)) = γn((τn+1 ◦ · · · ◦ τm)(m)) ; thus if we define ηn : N → N by

ηn(m) = m if m ≤ n and ηn(m) = (τn+1 ◦ · · · ◦ τm)(m) if m > n,

then τ(m) = (γn ◦ ηn)(m) for all m ≥ n, and so τ ∼ γn ◦ ηn. But ηn ∈ Σ.(N): If m < n then ηn(m + 1) = m + 1 > m = ηn(m), if m > n then

ηn(m + 1) = (τn+1 ◦ · · · ◦ τm+1)(m + 1) = (τn+1 ◦ · · · ◦ τm)(τm+1(m + 1))

> (τn+1 ◦ · · · ◦ τm)(τm+1(m)) ≥ (τn+1 ◦ · · · ◦ τm)(m) = ηn(m) and finally ηn(n + 1) = τn+1(n + 1) ≥ n + 1 > n = ηn(n).
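Lemma 23.1 is entirely constructive, so it can be checked mechanically on concrete subsequences. In this sketch the particular family τj(n) = 2n + j is invented for illustration; the assertions verify that the diagonal τ(n) = γn(n) is strictly increasing and that τ(m) = γn(ηn(m)) for all m ≥ n, which is the ∼-relation of the lemma.

```python
# tau_j(n) = 2*n + j: a family of strictly increasing maps from N to N.
taus = [(lambda j: (lambda n: 2 * n + j))(j) for j in range(6)]

def gamma(n):
    """gamma_n = tau_0 after tau_1 after ... after tau_n (tau_n applied first)."""
    def g(k):
        for f in reversed(taus[: n + 1]):
            k = f(k)
        return k
    return g

def tau(n):
    """The diagonal of Lemma 23.1: tau(n) = gamma_n(n)."""
    return gamma(n)(n)

def eta(n, m):
    """eta_n(m) = m for m <= n and (tau_{n+1} after ... after tau_m)(m) otherwise."""
    k = m
    if m > n:
        for f in reversed(taus[n + 1 : m + 1]):
            k = f(k)
    return k

# tau is strictly increasing, and tau(m) = gamma_n(eta_n(m)) for all m >= n:
vals = [tau(n) for n in range(6)]
assert all(a < b for a, b in zip(vals, vals[1:]))
assert all(tau(m) == gamma(n)(eta(n, m)) for n in range(3) for m in range(n, 6))
```

The second assertion is the finite trace of τ ∼ γn ◦ ηn: the two sequences agree from index n on, which is all the equivalence relation ∼ requires.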

Now to the proof of Theorem 23.1. By induction we define a sequence {τn}n≥0 of elements from Σ.(N) such that sn ◦ τ0 ◦ · · · ◦ τn ∈ ∆0n for each n ≥ 0: To start with there exists τ0 ∈ Σ.(N) so that s0 ◦ τ0 ∈ ∆00, since s0 ∈ ∆10. Thus let n ≥ 0 and suppose that there exist τ0, . . . , τn ∈ Σ.(N) such that sk ◦ τ0 ◦ · · · ◦ τk ∈ ∆0k for each k = 0, . . . , n. Then sn+1 ◦ τ0 ◦ · · · ◦ τn ∈ ∆1n+1, since ∆1n+1 is closed under subsequences, and hence there exists τn+1 ∈ Σ.(N) such that sn+1 ◦ τ0 ◦ · · · ◦ τn+1 ∈ ∆0n+1. For n ≥ 0 put γn = τ0 ◦ · · · ◦ τn. Then by Lemma 23.1 there exists τ ∈ Σ.(N) and for each n ≥ 0 a subsequence ηn ∈ Σ.(N) such that τ ∼ γn ◦ ηn. Now since ∆0n is closed under subsequences and sn ◦ γn = sn ◦ τ0 ◦ · · · ◦ τn ∈ ∆0n it follows that sn ◦ γn ◦ ηn ∈ ∆0n. But sn ◦ τ ∼ sn ◦ γn ◦ ηn, since τ ∼ γn ◦ ηn, and therefore sn ◦ τ ∈ ∆0n, since ∆0n is ∼-invariant.

References

[1] Blackwell, D. (1968): A Borel set not containing a graph. Ann. Math. Statist., 39, 1345-1347.

[2] Breiman, L. (1968): Probability. Addison-Wesley, Reading

[3] Cohn, D.L. (1980): Measure Theory. Birkhäuser, Boston

[4] Doob, J.L. (1953): Stochastic Processes. Wiley, New York

[5] Dunford, N., Schwartz, J.T. (1958): Linear Operators, Part I. Interscience, New York

[6] Dynkin, E.B., Yushkevich, A.A. (1979): Controlled Markov Processes. Springer-Verlag, Berlin

[7] Garsia, A.M. (1970): Topics in Almost Everywhere Convergence. Markham, Chicago

[8] Halmos, P.R. (1974): Measure Theory. Springer

[9] Isaac, R. (1965): A proof of the martingale convergence theorem. Proc. Am. Math. Soc., 16, 842-844.

[10] Kingman, J., Taylor, S.J. (1966): Introduction to Measure and Probability. Cambridge University Press

[11] Kolmogorov, A.N. (1933): Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer-Verlag, Berlin

[12] Kuratowski, K. (1966): Topology, Volume 1. Academic Press, New York

[13] Mackey, G.W. (1957): Borel structure in groups and their duals. Trans. Am. Math. Soc., 85, 134-165.

[14] Meyer, P.A. (1966): Probability and Potentials. Blaisdell, Toronto

[15] Parthasarathy, K.R. (1967): Probability Measures on Metric Spaces. Academic Press, New York

[16] Preston, C. (1980): Specifications and their Gibbs States. Current version to be found at: http://www.mathematik.uni-bielefeld.de/~preston/

[17] Taylor, S.J. (1973): Introduction to Measure and Integration. Cambridge University Press (This is the first half of [10].)
