
Probability I Fall 2011

Contents

1 Measures
  1.1 σ-fields and generators
  1.2 Outer measure
  1.3 Carathéodory Theorem
  1.4 Product measure I
  1.5 Hausdorff measure and Hausdorff dimension

2 Integrals
  2.1 Measurable functions
  2.2 Monotone and bounded convergence
  2.3 Various modes of convergence
  2.4 Approximation by continuous functions
  2.5 Fubini and Radon-Nikodym Theorem

3 Probability
  3.1 Probabilistic terminology and notation
  3.2 Independence, Borel-Cantelli, 0-1 Law
  3.3 Lp-spaces
  3.4 Weak convergence
  3.5 Measures on a metric space, tightness vs. compactness

4 Appendix: Selected proofs
  4.1 Section 1
  4.2 Section 2
  4.3 Section 3

1 Measures

1.1 σ-fields and generators

A family F of subsets of a set S is called a field if

(F1) ∅ ∈ F and S ∈ F;
(F2) if A, B ∈ F, then A ∪ B ∈ F;
(F3) if A ∈ F, then A^c = S \ A ∈ F.

Condition (F2), applied repeatedly, is equivalent to

(F2f) if A1, . . . , An ∈ F, then ⋃_{k=1}^{n} Ak ∈ F.

A field F is called a σ-field if (F2f) is strengthened to

(F2c) if A1, A2, . . . ∈ F, then ⋃_{k=1}^{∞} Ak ∈ F.

In view of (F3) and De Morgan's laws, the union "∪" in (F2), (F2f), or (F2c) can be replaced by the intersection "∩".

The members of a σ-field are called measurable sets.

Proposition 1.1 The family 2S of all subsets of S is a σ-field. The intersection of an arbitrary collection of σ-fields is a σ-field.

Proposition 1.2 Let G be any family of subsets of S. Then there is the unique smallest σ-field containing G.

We denote this smallest σ-field by F = σ(G) and call G its generator. We say that G induces or generates or spans F.

The Borel σ-field is generated by the topology in a topological space, and its members are called Borel sets. In general, the definition is non-constructive, although a simple construction exists for finite σ-fields and some special countable σ-fields.

Example 1.3 As a generator of the σ-field of Borel sets in Rn one may use the family of open balls, or the family of closed intervals

[a, b] = { x ∈ R^n : ak ≤ xk ≤ bk, k = 1, . . . , n } = [a1, b1] × · · · × [an, bn],

or the family of left-open right-closed intervals

(a, b] = { x ∈ R^n : ak < xk ≤ bk, k = 1, . . . , n }.

Since the Euclidean topology is countably generated, any of its generators will generate the Borel σ-field.

Example 1.4 The countable Cartesian product R∞ = R × R × · · · is a metric space, e.g., under the metric

d(x, y) = ∑_k ( |xk − yk| ∧ 1 ) / 2^k,   x = (xk), y = (yk). (1)

Let (S, F) and (T, G) be measurable spaces. A function f : S → T is called measurable with respect to (F, G) if f^{-1}G ⊂ F, or equivalently, if f^{-1}G0 ⊂ F for some generator G0 of G. In particular, for Borel measurable spaces spanned by topologies, every continuous function is Borel measurable. If T = R^n is equipped with the Borel σ-field, we talk about Borel measurable functions.

Let (S, F) be a measurable space. A function µ : F → [0, ∞], not constant ∞, is said to be a measure if µ is countably additive, i.e.,

µ( ⋃_k Ak ) = ∑_k µAk, for every sequence of pairwise disjoint Ak ∈ F. (2)

The restriction that µ not be constant ∞ simply removes an unnecessary pathology. It is equivalent to the assumption µ∅ < ∞. Then it follows that µ∅ = 0.

The triple (S, F, µ) is called a measure space. A measure is called finite if µS < ∞ and a probability if µS = 1. If S = ⋃_k Sk with µSk < ∞ for every k, then we call µ σ-finite.

Let S ≠ ∅, F = 2^S. The measure concentrated at a ∈ S, a.k.a. the degenerate measure or the point mass, is defined as

δa(A) = 1IA(a) = { 1, if a ∈ A; 0, if a ∉ A. (3)

Let pk ≥ 0 and ak ∈ S. Then

µ = ∑_k pk δak

is called a discrete measure. It is a probability when ∑_k pk = 1, and the counting measure when pk = 1 for every k.
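As a quick illustration (not from the notes), the point mass (3) and a discrete measure can be evaluated mechanically; the atoms and dyadic weights below are arbitrary choices for the check.

```python
def delta(a, A):
    """Point mass: delta_a(A) = 1I_A(a), as in (3)."""
    return 1.0 if a in A else 0.0

def discrete_measure(atoms, weights):
    """mu = sum_k p_k delta_{a_k}; returns the set function A -> mu(A)."""
    def mu(A):
        return sum(p * delta(a, A) for a, p in zip(atoms, weights))
    return mu

# dyadic weights summing to 1, so mu is a probability
mu = discrete_measure([0, 1, 2], [0.5, 0.25, 0.25])
assert mu({0, 1}) == 0.75
assert mu(set()) == 0.0
# additivity on disjoint sets:
assert mu({0}) + mu({1, 2}) == mu({0, 1, 2}) == 1.0
```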

1.2 Outer measure

Let S ≠ ∅. A set function φ : 2^S → [0, ∞] is called an outer measure, or OM for short, if

1. φ(∅) = 0;

2. φ is monotonic, i.e., A ⊂ B ⇒ φ(A) ≤ φ(B), for every A, B;

3. φ is countably subadditive, i.e., φ( ⋃_{k∈K} Ak ) ≤ ∑_{k∈K} φ(Ak), for every countable family (Ak), k ∈ K.

Theorem 1.5 Let φ be an OM. Then the family

Mφ := { A ⊂ S : φ(P ∪ Q) = φ(P) + φ(Q) for every P ⊂ A and Q ⊂ A^c } (4)

is a σ-field and φ is a measure on Mφ.

The members of Mφ are called φ-measurable sets. Intuitively, the restriction to measurable sets forces the additivity upon φ. The inequality "φ(P ∪ Q) ≤ φ(P) + φ(Q)" is contained in Condition 3 for an OM. So, it suffices to consider

Mφ = { A ⊂ S : φ(P ∪ Q) ≥ φ(P) + φ(Q) for every P ⊂ A and Q ⊂ A^c }.

A set function φ is called superadditive if

φ(A ∪ B) ≥ φ(A) + φ(B) when A ∩ B = ∅

A superadditive and monotonic φ is countably superadditive.

For a topological space S, an OM is called a Borel outer measure (BOM for short) if Mφ contains the Borel sets. In practice it suffices to show that Mφ contains a generator of the Borel sets (e.g., that every open set is measurable).

If (S, d) is a metric space, we call φ a Carathéodory outer measure (COM) if the additivity holds for metrically separated sets, i.e.,

d(A, B) > 0 ⇒ φ(A ∪ B) = φ(A) + φ(B), A, B ⊂ S. (5)

Theorem 1.6 Let S be a metric space. Then COM ⊂ BOM.

Some subsets D of S enjoy "natural" numerical values v(D) ("v" for "volume"), and the empty set most naturally should be assigned the value 0. Any such assignment gives rise to an OM and then, by Theorem 1.5, to a true measure.

Proposition 1.7 Let D ⊂ 2^S contain ∅, and let v : D → [0, ∞] be a set function such that v(∅) = 0. The following formula defines an OM:

φ(A) = φ_{v,D}(A) := inf { ∑_{k=1}^{∞} v(Dk) : Dk ∈ D, A ⊂ ⋃_k Dk }. (6)
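For intuition, formula (6) can be tested by brute force on a finite toy space, where countable covers reduce to subfamilies of a finite D; the cover family and weights below are invented for the illustration.

```python
from itertools import combinations

def outer_measure(A, D):
    """phi(A) = inf { sum v(D_k) : A subset of union of the D_k }, as in (6),
    with D a finite dict {frozenset: weight}; inf taken over all subfamilies.
    Returns float('inf') if no subfamily covers A."""
    best = float('inf')
    items = list(D.items())
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            union = set().union(*(set(d) for d, _ in combo)) if combo else set()
            if A <= union:
                best = min(best, sum(v for _, v in combo))
    return best

D = {frozenset({1, 2}): 1.0, frozenset({2, 3}): 1.0, frozenset({1, 2, 3}): 1.5}
assert outer_measure(set(), D) == 0              # phi(emptyset) = 0
assert outer_measure({2}, D) == 1.0              # cheapest single cover
assert outer_measure({1, 3}, D) == 1.5           # one big set beats two small ones
assert outer_measure({2}, D) <= outer_measure({1, 2, 3}, D)   # monotonicity
```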

The plethora of choices of the cover family D may lead to redundancy or triviality. So we say that two choices (D, v) and (D′, v′) are equivalent if they induce the same OM, i.e., (6) yields φ_{v,D} = φ_{v′,D′}. Note the obvious implications:

1. D ⊂ D′ ⇒ φ_{v,D} ≥ φ_{v,D′};
2. v ≤ v′ ⇒ φ_{v,D} ≤ φ_{v′,D}.

In order to refine the former crude relation, we say that the cover D′ is finer than a cover D, and write D ≺ D′, if

∀ ϵ > 0 ∀ D ∈ D ∃ { D′m } ⊂ D′ such that D ⊂ ⋃_m D′m and v(D) > ∑_m v(D′m) − ϵ.

Proposition 1.8 If D ≺ D′, then φ_{v,D} ≥ φ_{v,D′}.

If D ≺ D′ and D′ ≺ D, then the covers are called equivalent. Equivalent covers induce identical OM’s and measures.

Example 1.9 (Lebesgue measure) Let S = Rd and D consist of either closed, or open, or left-open right-closed intervals, or open or closed balls. Let v be the volume which is also called the length when d = 1 and the area when d = 2. All these families are equivalent.

Example 1.10 (Lebesgue-Stieltjes measure) Let f : Rd → R.

d = 1. Assume that f is a nondecreasing right-continuous function. Let D = { (a, b] } and v(a, b] = f(b) − f(a).

d ≥ 2. The d-dimensional increment can be defined by induction. Alternatively, we observe that

1I_{(a,b]}(x) = ∏_{j=1}^{d} ( 1I_{(−∞,bj]}(xj) − 1I_{(−∞,aj]}(xj) ) = ∑_c s(c) ∏_{j=1}^{d} 1I_{(−∞,cj]}(xj),

where the 2^d points c = (cj) ∈ R^d have coordinates cj ∈ { aj, bj }, and

s(c) = { 1, if c uses an even number of coordinates of a; −1, if c uses an odd number of coordinates of a.

Then we put

v(a, b] := ∑_c s(c) f(c),

and require that v(a, b] ≥ 0 for every a ≤ b, and that¹

lim_{b↘a} v(a, b] = 0.

For example, when d = 2,

v( (a1, a2), (b1, b2) ] = f(b1, b2) − f(a1, b2) − f(b1, a2) + f(a1, a2).

¹The continuity condition is unnecessary for the mere construction of some measure. However, the type of discontinuity of the underlying function should match the type of the selected intervals, and a mismatch will invariably entail unpleasant pathologies at later stages.
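The 2^d-corner formula for v(a, b] is easy to mechanize. A sketch (function names are ours), checked for d = 2 against the explicit identity displayed above:

```python
from itertools import product

def increment(f, a, b):
    """v(a,b] = sum_c s(c) f(c) over the 2^d corners c with c_j in {a_j, b_j};
    s(c) = +1 if c uses an even number of coordinates of a, else -1."""
    d = len(a)
    total = 0.0
    for choice in product((0, 1), repeat=d):     # 0 -> take a_j, 1 -> take b_j
        c = tuple(a[j] if choice[j] == 0 else b[j] for j in range(d))
        n_a = sum(1 for bit in choice if bit == 0)   # coordinates taken from a
        total += (1 if n_a % 2 == 0 else -1) * f(c)
    return total

# d = 2 check: f(x1, x2) = x1*x2 generates the product (Lebesgue-type) volume
f = lambda c: c[0] * c[1]
a, b = (1.0, 2.0), (3.0, 5.0)
assert increment(f, a, b) == (b[0] - a[0]) * (b[1] - a[1])   # both equal 6
```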

Abstract examples

1. Given a countable (or finite) set N ⊂ S, one can put v(B) = card(N ∩ B), which "counts" the number of points of N that lie in B ∈ F = 2^S. Here, D = 2^S.

2. Let H : S → R be a function. For A ∈ D = 2S put

v(A) = range_A(H) = sup_{x∈A} H(x) − inf_{x∈A} H(x).

Let (S, d) be a metric space.

1. Define the diameter of a set as

v(A) = diam(A) = sup_{x,y∈A} d(x, y).

A modification of the metric may induce the same topology but the generated OM’s could differ significantly. For example, d′ = ds for s ∈ (0, 1] or d′ = d ∧ ϵ for ϵ > 0 are equivalent metrics but the “natural” (truly, not any more) values

v′(A) = diam′(A) = sup_{x,y∈A} d′(x, y)

entail different OM’s and measures. A modification does not have to preserve the metric property, so the exponent s above could be arbitrary, or even one may consider any ρ : [0, ∞) → [0, ∞) such that ρ(0) = 0, and put

vρ(A) = ρ(diam(A)).

2. Given v and a family D of sets, define the OM φ by formula (6). Now, consider v′ = φ, select another (perhaps, larger) D′, and define the OM by formula (6). Repeat as long as desired.

3. In the last example, a new OM was made of an old OM. In the former example, we transformed the metric d to an equivalent metric d′. Some transformations preserve OM’s. In fact, the rudimentary OM obtained from a single metric may be rather pathological (see Example 1.16 regarding Rn).

(a) If φ is an OM, and f : [0, ∞] → [0, ∞] is a nondecreasing subadditive (f(x + y) ≤ f(x) + f(y)) function with f(0) = 0, then f(φ) is again an OM. Examples:

f(t) = t^s, for s ∈ (0, 1), f(t) = ln(1 + t), f(t) = 1 − e^{−t}, etc.

All these functions are concave, and concave functions are subadditive. In contrast to Example 1a, where the rudimentary measurement v is unrestricted (so is ρ(v)), the subadditivity must be assumed here.

(b) Given a collection φ, { φi } of OMs, the following set functions are again OMs:

φ ∧ 1, sup_i φi, ∑_i ai φi,

where ai ∈ [0, ∞) and ai = 0 for all but a countable set of indices.

1.3 Carathéodory Theorem

In R, let D = { (a, b] } and consider the left-continuous function f = 1I_{(0,1]}. Proceeding as in the construction of the Lebesgue-Stieltjes measure, we have v(0, 1] = 1 but the induced OM yields φ(0, 1] = 0. Always φ(D) ≤ v(D), but we should have equality for the OM to become a proper extension of the basic measurement v. Further, we expect the original cover sets to be measurable as well.

Lemma 1.11 If D is a π-system, i.e., it is closed under finite intersections, and v is monotonic and count- ably subadditive, then the OM ϕ induced by (6) coincides with v on D.

If cover sets from D were measurable and φ extended v, then v on D would have to be countably additive². Proper set differences would also be measurable, and D could be enlarged to contain them. Therefore, if we wish v to agree with φ, it is reasonable to assume that the cover family D is a ring, i.e., a nonempty family of sets closed under finite unions, intersections, and set differences.

Note that, if D is a ring and v, in addition to being countably subadditive, is (finitely) superadditive, then v is monotonic: B ⊂ A ⇒ v(B) ≤ v(B) + v(A \ B) ≤ v(B ∪ (A \ B)) = v(A).

So, the hypotheses of Lemma 1.11 are satisfied. Further, v is countably superadditive. So, v is in fact countably additive.

The assumptions of the following theorem ensure the desired outcome. The induced measure agrees with the original quantity, the basic cover sets are measurable, and the induced measure is unique on every sub-σ-field of measurable sets. The wording of the third, uniqueness, property has a practical purpose. In the future we will often show that two measures of interest are equal as long as they agree on some generator.

Theorem 1.12 (Carath´eodory) Let D be a ring, and a set function v on D be countably additive on D.

Denote φ = φv,D . Then

1. φ = v on D;

2. D ⊂ Mϕ.

3. Let v be σ-finite, and let F be a σ-field such that D ⊂ F ⊂ Mφ. If µ is a measure on F such that µ = v on D, then µ = φ on F.

Example 1.13 Consider the Lebesgue measure in R^d and choose open intervals to form a family D0. Consider also the ring D of finite unions of left-open right-closed intervals (a, b]. The volume v has a unique additive extension on D.

We claim that φ_{v,D0} = φ_{v,D}. Since D0 ≺ D, Proposition 1.8 yields φ_{v,D0} ≥ φ_{v,D}. Conversely, let A ⊂ ⋃_k Dk, where Dk = ⋃_{j∈Ik} Djk ∈ D, the index sets Ik are finite, and the left-open right-closed intervals Djk are disjoint for j ∈ Ik, for every k. Then, enlarging each Djk slightly to an open interval,

∑_k v(Dk) = ∑_k ∑_{j∈Ik} v(Djk) ≥ φ_{v,D0}(A).

Take the infimum. Then φ_{v,D} ≥ φ_{v,D0}.

Now, in order to show that v is countably additive on D, it suffices to prove that

v(a, b] = v[a, b] ≤ φ_{v,D0}[a, b],

since the converse inequality is obvious. So, let ϵ > 0 and

[a, b] ⊂ ⋃_k (ak, bk), φ[a, b] > ∑_k v(ak, bk) − ϵ.

Since the closed interval is compact, one can choose a finite subcover, say with indices k ≤ n. By finite subadditivity and monotonicity of v,

φ[a, b] > ∑_{k=1}^{n} v(ak, bk) − ϵ ≥ v( ⋃_k (ak, bk) ) − ϵ ≥ v[a, b] − ϵ.

Now, let ϵ → 0.

²The additivity might be satisfied trivially or vacuously, if there were too few unions in D.

1.4 Product measure I

For measure spaces (S, F, µ) and (T, G, ν) put

D0 = { F × G : F ∈ F,G ∈ G } .

and define the set function v = µ ⊗ ν on D0 by the formula

µ ⊗ ν(F × G) = µ(F )ν(G) (7)

(assuming, by convention, that 0 · ∞ = 0, if necessary). Then we define the outer measure φ_{v,D0} and call it the product outer measure. Consider

D = { H : ∃ H1, . . . , Hn ∈ D0 : H = ⋃_{j=1}^{n} Hj }.

One can choose the sets Hj disjoint. Then, for disjoint terms of the union, define

v(H) = ∑_j v(Hj).

It follows that the definition is consistent, independent of a representation of H. D is a ring and v is additive on it. As before,

φ_{v,D0} = φ_{v,D}.

The σ-field generated by D0, containing the ring D, is called the product σ-field, and is denoted by F ⊗ G. The difficult part again is to prove that v = µ ⊗ ν is countably additive on D.
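On finite spaces the product measure is just the pointwise product of the weights, and (7) can be verified directly; the two measures below are invented for the check.

```python
def product_measure(mu, nu):
    """mu ⊗ nu computed pointwise on a finite product space; rectangles and
    their finite unions (the ring D) are then measured by summation."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}

mu = {'a': 0.5, 'b': 0.5}          # a probability on X
nu = {0: 0.25, 1: 0.75}            # a probability on Y
pm = product_measure(mu, nu)

# formula (7): mu ⊗ nu (F × G) = mu(F) nu(G)
F, G = {'a'}, {0, 1}
lhs = sum(pm[(x, y)] for x in F for y in G)
assert lhs == sum(mu[x] for x in F) * sum(nu[y] for y in G)

# the product of probabilities is a probability
assert sum(pm.values()) == 1.0
```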

A simple proof can be presented through the integration techniques (Fubini’s Theorem) but at this moment we can give three examples when this can be done.

1. The previous example in fact dealt with the product measure³. Denote the Lebesgue measure on R^n by λn. Then we may write

λ_{m+n} = λm ⊗ λn, or λn = λ^{⊗n} = λ^n.

So, we showed the countable additivity indirectly, by resorting to the compactness of a closed interval.

2. Let X and Y be locally compact topological spaces endowed with Borel σ-fields. Let µ have the property: µ(A0) = µ(A), and, for every open set U, for every ϵ > 0, there exists an open set V ⊃ U such that µ(V) ≤ µ(U) + ϵ. Let ν have the same property. Then we define µ ⊗ ν by formula (7), assuming that F and G are open sets with compact closures.

3. Assume that the measure spaces are special. Let there exist 1-1 functions f : X → [0, 1] and g : Y → [0, 1] such that F = { f^{-1}(B) : B ∈ B } and G = { g^{-1}(B) : B ∈ B }, and µ(f^{-1}(B)) = λ(B) = ν(g^{-1}(B)). Then the product measure λ2 = λ ⊗ λ is "carried back" to X × Y, so the countable additivity of the planar Lebesgue measure yields the same property of µ ⊗ ν.

The desired property of the product measure v (countable additivity) holds in fact for σ-finite measures µ and ν, but we must introduce new techniques to show this.

Example 1.14 (Infinite product) Let (Xt) be a family of sets indexed by t ∈ T. The Cartesian product X = ∏_t Xt consists of all functions

{ (ωt : t ∈ T) : ωt ∈ Xt for every t ∈ T }.

If T is countable or finite, say T = N or T = { 1, . . . , n }, we may write ∏_t Xt = X1 × X2 × · · · or X1 × · · · × Xn, or even X^∞ or X^n when all Xt = X.

If (Xt, Ft) are measurable spaces, then sets of the form At × ∏_{s≠t} Xs, where At ∈ Ft for some t ∈ T, are called one-dimensional cylindrical sets, and they make a generator of the product σ-field. Their finite intersections form cylindrical sets.

Let Xt be topological spaces and Bt their Borel σ-fields. Note that the Cartesian product X is also a topological space with its own Borel σ-field. In general, this Borel σ-field contains the aforementioned product σ-field generated by cylindrical sets as a proper subfamily. However, in many cases of interest both σ-fields agree.

Proposition 1.15 If T is countable and Xt are separable metric spaces, then both σ-fields coincide, i.e.,

B(X1 × X2 × · · ·) = B(X1) ⊗ B(X2) ⊗ · · ·

3In R1, the proof of the equality v[a, b] = ϕ[a, b] is much simpler

9 1.5 Hausdorff measure and Hausdorff dimension

In this section we consider R^d with the Euclidean distance d(x, y). The family Db of bounded sets, or the family D of open solid spheres, together with v(D) = diam(D), yields the OM φ, which is the Lebesgue OM when d = 1. We desire φ to be a true extension of v; that is, it should be a Borel OM and φ = v on D (always φ ≤ v). Otherwise we deem φ pathological. We observe that

ϕ(αA) = αϕ(A), ϕ(A + a) = ϕ(A),

and that the affine mapping A 7→ αA + a preserves measurable sets.

Example 1.16 Say, d = 2. Then φ is not a Borel OM.

To the contrary, suppose that the open unit square I is φ-measurable. Clearly, φ(I) ≤ diam(I) = √2. I contains the union of n^2 similar squares Ik, each with φ(Ik) = φ(n^{-1}I) = φ(I)/n. Then, φ(I) ≥ n^2 · φ(I)/n = nφ(I) for every n. Hence, φ(I) = 0, which carries over to the scaled translations αI + a. Hence φ(R^2) = 0, since

R^2 = ⋃_N N(2I − (1, 1)).

However, for every segment A of length a > 0, φ(A) = a, which entails a contradiction.

The modification v = diam^2 will lead to the two-dimensional Lebesgue measure multiplied by the constant 4/π.

Hausdorff's idea was to consider a two-step construction. Let ϵ > 0 and consider a new metric d ∧ ϵ (tantamount to admitting into D only the sets with diameters not exceeding ϵ). At the same time, we apply the scaling v = diam^s by an exponent s > 0.

1. Let

Dϵ = { D ⊂ R^d : diam(D) ≤ ϵ }, v(A) = diam(A)^s.

Denote the induced OM by H^s_ϵ. It suffices to use solid spheres of diameter ≤ ϵ in lieu of Dϵ.

2. Since the families Dϵ decrease with ϵ (the smaller the ϵ, the fewer the sets), the corresponding OMs H^s_ϵ increase. The supremum of OMs is an OM. Denote the supremum by H^s and call it the Hausdorff OM or Hausdorff measure.

Theorem 1.17 A Hausdorff OM is COM. Therefore, a Hausdorff OM is a BOM.

Proof. (Comments). Given A, B with d(A, B) > 0, we choose ϵ such that d(A, B) > 2ϵ. Then, sets covering A ∪ B can be separated into two disjoint families, the first consisting of sets whose union covers A, and the second consisting of sets whose union covers B. The inequality ϕ(A ∪ B) ≥ ϕ(A) + ϕ(B) will follow.

We shall discuss the Hausdorff dimension now.

Lemma 1.18 Let r ≥ p > 0. Then ( H^r(A) )^{1/r} ≤ ( H^p(A) )^{1/p}.

Proof. The inequality follows from the basic construction of the OM and the inequality

( ∑_k dk^r )^{1/r} ≤ ( ∑_k dk^p )^{1/p}, dk ≥ 0.

Note that this is equivalent (rename ak = dk^r, α = p/r ≤ 1) to the inequality

( ∑_k ak )^α ≤ ∑_k ak^α,

which, in turn, follows by induction from (a + b)^α ≤ a^α + b^α, or, even easier (t = b/a), from (1 + t)^α ≤ 1 + t^α.

Definition. Let A ⊂ R^n. The Hausdorff dimension is the nonnegative number

h(A) := inf { s > 0 : H^s(A) = 0 } = sup { s > 0 : H^s(A) = ∞ }.

In other words, h(A) = s, if Hs+δ(A) = 0 and Hs−δ(A) = ∞, for every δ > 0.
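A standard illustration (not worked out in the notes): for the middle-thirds Cantor set, the stage-n construction supplies 2^n covering intervals of diameter 3^{−n}, so the s-sums ∑_k diam(Ak)^s equal (2 · 3^{−s})^n. These blow up for s < ln 2/ln 3 and vanish for s > ln 2/ln 3, which suggests, and for this set one can in fact prove, h(C) = ln 2/ln 3. A sketch of the computation:

```python
import math

def cantor_s_sum(s, n):
    """sum of diam^s over the 2^n stage-n Cantor intervals of diameter 3^-n,
    i.e. (2 * 3^(-s))^n."""
    return (2.0 * 3.0 ** (-s)) ** n

dim = math.log(2) / math.log(3)                   # = 0.6309...

assert cantor_s_sum(dim - 0.1, 100) > 1e3         # below dim: sums -> infinity
assert cantor_s_sum(dim + 0.1, 100) < 1e-3        # above dim: sums -> 0
assert abs(cantor_s_sum(dim, 100) - 1.0) < 1e-9   # critical exponent: sums stay at 1
```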

Theorem 1.19 Every set A ⊂ R^n has a unique Hausdorff dimension.

Proof. Let 0 ≤ dk < ϵ. If Aϵ := ∑_k dk^s < ∞, then

∑_k dk^{s+δ} ≤ ϵ^δ Aϵ, ∑_k dk^{s−δ} ≥ ϵ^{−δ} Aϵ.

We use these inequalities with dk = diam(Ak).

If a member of one covering family can be inscribed into a member of another covering family, and conversely (in a suitably uniform manner), then the corresponding Hausdorff measures may differ numerically, yet they will be almost equivalent. We say that two measures µ and ν are almost equivalent on a σ-field F if there is a constant K such that K^{-1}µ ≤ ν ≤ Kµ on F.

2 Integrals

2.1 Measurable functions

A function f : X → Y acting between two measurable spaces (X, F) and (Y, G) is called measurable if f^{-1}G ⊂ F. Since the pre-image f^{-1} preserves set operations, f^{-1}G is also a σ-field.

Proposition 2.1

1. The composition of measurable functions is measurable.

2. A function f is measurable if f^{-1}G0 ⊂ F for some generator G0 of G.

3. For topological spaces and Borel σ-fields, continuity implies measurability.

4. If the range Y = R, then f is measurable iff the preimages of all closed left half-lines { f ≤ y }4 are measurable (same for open or right half-lines).

5. The supremum (also, inf, lim sup, lim inf, lim, etc.) of a sequence of measurable functions is measur- able.

6. For any index set T, functions ft : (X, F) → (Yt, Gt), t ∈ T, are measurable if and only if f = (ft) : X → Y = ∏_{t∈T} Yt is measurable with respect to the product σ-field.

Lemma 2.2 If the range Y is a topological group (i.e., the group operations (y1, y2) ↦ y1 · y2 from the product Y × Y into Y and y ↦ y^{-1} in Y are continuous), then f1 · f2 is measurable if f1, f2 are.

Proof. f · g is in fact a composition G ◦ H, where the measurable (cf. Proposition 2.1) H : X → Y × Y is given by H(x) = (f(x), g(x)), and the continuous G : Y × Y → Y is given by G(y, y′) = y · y′.

A measurable function on a probability space is often called a random variable when the values are real, or a random element if the values run through some measurable space. Typically, metric spaces suffice as ranges of random elements. For example, a sequence f = (fn) of random variables can be seen as a random element with values in R∞.

The indicator function⁵ is defined as

1IA(x) = { 1, if x ∈ A; 0, if x ∉ A.

A simple function is of the form⁶

f = ∑_{k∈K} ak 1I_{Ak}, Ak ∈ F are disjoint, ⋃_{k∈K} Ak = X, (8)

where K is a finite index set.

⁴notice the quite popular convention: variables are often omitted, { f ≤ y } = { x : f(x) ≤ y }.
⁵in measure theory often called a characteristic function; in logic used as a 0-1 assignment to a false or true statement, e.g., 1I{roses are red}.
⁶values may be infinite. By convention, ∞ · 0 = 0.

Lemma 2.3 A measurable f ≥ 0 is the limit of an increasing sequence of simple functions. If f is bounded, the limit can be uniform.

Proof. For natural numbers n and k = 1, . . . , n2^n put

A_{k,n} = { (k − 1)/2^n ≤ f < k/2^n }, Bn = { f ≥ n },

and

fn = ∑_{k=1}^{n2^n} ((k − 1)/2^n) 1I_{A_{k,n}} + n 1I_{Bn}.

If f is bounded, then Bn = ∅ eventually. Then 0 ≤ f − fn ≤ 2^{−n}, uniformly.
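The dyadic staircase of Lemma 2.3 can be coded directly; a sketch for a bounded nonnegative f on [0, 1] (the test function is an arbitrary choice):

```python
def staircase(f, n):
    """f_n = (k-1)/2^n on A_{k,n} = {(k-1)/2^n <= f < k/2^n}, and n on {f >= n}."""
    def fn(x):
        v = f(x)
        if v >= n:
            return float(n)
        return int(v * 2 ** n) / 2 ** n   # floor v to the dyadic grid of mesh 2^-n
    return fn

f = lambda x: x * x                       # bounded by 1 on [0, 1]
g = staircase(f, 10)
for x in [0.0, 0.3, 0.7, 1.0]:
    assert 0 <= f(x) - g(x) <= 2 ** -10   # uniform 2^-n accuracy for bounded f
```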

We will write fn ↗ f when fn increase to the limit f.

Let (X, F, µ) be a measure space. For a simple function (8) we define the Lebesgue integral as

If = ∫_X f dµ = µf = ∑_{k∈K} ak µ(Ak). (9)

Proposition 2.4

1. Simple functions form a vector space, say, S;

2. The integral (9) is well defined;

3. The integral (9) is linear and monotone on S.

4. Chebyshev’s inequality holds on S:

tµ{ f ≥ t } ≤ µf, for f ≥ 0 and t > 0. (10)
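Formulas (9) and (10) on a toy simple function (the values and masses are invented for the check):

```python
def integral(values, masses):
    """mu f = sum_k a_k mu(A_k), formula (9), for a simple function given by
    its values a_k and the masses m_k = mu(A_k)."""
    return sum(a * m for a, m in zip(values, masses))

a = [0.0, 1.0, 3.0]          # values a_k >= 0
m = [0.25, 0.5, 0.25]        # masses mu(A_k)
mu_f = integral(a, m)
assert mu_f == 1.25

# Chebyshev (10): t * mu{ f >= t } <= mu f
for t in [0.5, 1.0, 2.0, 4.0]:
    mu_level = sum(mk for ak, mk in zip(a, m) if ak >= t)
    assert t * mu_level <= mu_f
```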

Lemma 2.5 For simple functions fn, gn,

0 ≤ fn ↗ f and 0 ≤ gn ↗ f ⇒ lim_n µfn = lim_n µgn.

Proof. It suffices to show that lim_n µfn ≥ µg for every simple 0 ≤ g = ∑_i ci 1I_{Ai} ≤ f. For an arbitrary ϵ > 0,

µfn = ∑_i µ( fn 1I_{Ai} ) ≥ ∑_i µ( fn 1I_{Ai ∩ { fn ≥ ci(1−ϵ) }} ) ≥ (1 − ϵ) ∑_i ci µ( Ai ∩ { fn ≥ ci(1 − ϵ) } ).

We let first n → ∞, then ϵ → 0. From the continuity of µ we infer that

lim_n µfn ≥ ∑_i ci µ( Ai ∩ { f ≥ ci } ).

By hypothesis f ≥ g, i.e., Ai ⊂ { f ≥ ci } for every i. Hence, the right hand side is µg.

Lemmas 2.3 and 2.5 enable us to extend the integral to all nonnegative functions. If f ≥ 0, choose simple functions fn ↗ f (Lemma 2.3 provides at least one choice) and set

µf := lim_n µfn.

Lemma 2.5 makes the limit independent of the particular choice of approximating sequence, which makes the integral well defined. Finally, for a measurable f, since f = f+ − f−, we set

µf = µf+ − µf−,

if one of the integrals is finite. We say that the integral exists in that case. We call f integrable when both integrals are finite, which happens iff µ|f| < ∞.

In particular, Chebyshev’s inequality (10) continues to hold for arbitrary functions.

2.2 Monotone and bounded convergence

All functions considered here are measurable.

Theorem 2.6

Monotone Convergence: If 0 ≤ fn ↗ f, then µfn ↗ µf.

Fatou's Lemma: If fn ≥ 0, then lim inf_n µfn ≥ µ( lim inf_n fn ).

Dominated Convergence: If |fn| ≤ g, fn → f, and µg < ∞, then µfn → µf.

2.3 Various modes of convergence

Let (X, F, µ) be a measure space, (S, d) a metric space, and (f; fn, n ∈ N) measurable functions. We say that the convergence of fn to f is (or takes place)

almost everywhere (fn →a.e. f)⁷, if µ{ fn ̸→ f } = 0;

nearly uniform (fn →E f)⁸, if ∀ ϵ > 0 ∃ A ∈ F such that µ(A^c) < ϵ and fn → f uniformly on A;

in measure (fn →µ f), if ∀ ϵ > 0, µ{ d(fn, f) > ϵ } → 0.

Theorem 2.7 The above modes of convergence are related as follows, with the indicated provisos that the measure be finite or that the convergence hold along a subsequence:

nearly uniform convergence implies both almost everywhere convergence and convergence in measure; almost everywhere convergence implies nearly uniform convergence when µ(X) < ∞ (hence also convergence in measure when µ(X) < ∞); and convergence in measure implies almost everywhere convergence along a subsequence.

7also: almost sure, a.s., when µ is a probability; 8also: in the sense of Egorov;
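The classical "typewriter" sequence on ([0, 1], λ) separates the modes (a standard counterexample, not from the notes): writing n = 2^k + j with 0 ≤ j < 2^k and fn = 1I_{[j/2^k, (j+1)/2^k)}, one has λ{ fn > 0 } = 2^{−k} → 0, so fn → 0 in measure, while every x ∈ [0, 1) lies in exactly one block per level, so fn(x) = 1 infinitely often and fn converges at no point. A sketch:

```python
def block(n):
    """n = 2^k + j with 0 <= j < 2^k; return (k, j)."""
    k = n.bit_length() - 1
    return k, n - 2 ** k

def f(n, x):
    """f_n = indicator of the dyadic block [j/2^k, (j+1)/2^k)."""
    k, j = block(n)
    return 1.0 if j / 2 ** k <= x < (j + 1) / 2 ** k else 0.0

def level_measure(n):
    """lambda{ f_n > eps } = length of the block, for any 0 < eps < 1."""
    k, _ = block(n)
    return 2.0 ** -k

assert level_measure(1000) < 0.01          # -> 0: convergence in measure to 0
hits = [n for n in range(1, 2 ** 12) if f(n, 0.3) == 1.0]
assert len(hits) == 12                     # one hit per level k = 0..11: no pointwise limit
```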

For real functions, one may consider the "topology of convergence in measure", choosing for a base⁹ of neighborhoods of 0 the sets of the form

U_{ϵ,η} = { f : µ{ |f| > ϵ } < η }. (11)

This space is not Hausdorff, for f and g such that f ≠ g but µ(f ≠ g) = 0 cannot be separated. Thus, we need to introduce an equivalence relation:

f ∼ g iff f = g a.e., i.e., µ(f ≠ g) = 0.

From now on, we identify equivalent functions10.

Theorem 2.8 L0 is a complete metrizable space.

For real functions, we may consider the convergence in L1: µ|fn − f| → 0. In fact, ∥f∥1 = µ|f| is a seminorm on the vector space L1, and the norm on the quotient space L1 = L1/N. By the Chebyshev inequality (10), L1-convergence implies convergence in measure.

Corollary 2.9 L1 is a Banach11 space.

Proof. Let (fn) be Cauchy in L1. In particular, it is bounded in L1 because

K := sup_n ∥fn∥1 ≤ sup_n ∥fn − fn0∥1 + ∥fn0∥1 < ∞.

By Chebyshev's inequality, it is Cauchy in measure. Thus it converges in measure to some f by Theorem 2.8, and in particular, it converges a.e. along a subsequence (fnk). By Fatou's Lemma, f is integrable:

µ|f| = µ( lim inf_k |fnk| ) ≤ lim inf_k µ|fnk| ≤ K.

Similarly,

lim sup_n ∥f − fn∥1 = lim sup_n µ( lim inf_k |fnk − fn| ) ≤ lim sup_n lim inf_k µ|fnk − fn| ≤ lim sup_{n,m→∞} µ|fm − fn| = 0,

so f is the L1-limit of the Cauchy sequence (fn).

2.4 Approximation by continuous functions

A family of sets that contains the whole space and is closed under countable monotone unions and proper differences is called a λ-system.

Lemma 2.10 (Monotone Class Argument, Sierpiński Theorem) For two families of subsets of some space X, let G ⊂ R, where G is a π-system and R is a λ-system. Then σ(G) ⊂ R.

⁹Real functions form a vector space. To introduce a topology, it suffices to consider neighborhoods { U } of 0, and then translate them, { U + x }. Finite intersections of subbase sets form a base. Open sets are unions of base sets.
¹⁰Say, L0 = L0((X, F, µ); R) denotes the space of real valued measurable functions on a measure space, and N = { f ∼ 0 }. Then we consider the quotient space L0 = L0/N = { f + N : f ∈ L0 }.
¹¹complete normed vector space

Proof. It suffices to show that λ(G), the smallest λ-system containing G, is closed under finite intersections. Then it becomes a σ-field, so σ(G) ⊂ λ(G) ⊂ R. W.l.o.g., we may and do assume that R = λ(G). Let B ∈ G and define A_B = { A : A ∩ B ∈ R }. Clearly, A_B is a λ-system containing G, so λ(G) = R ⊂ A_B. Given A ∈ R, B_A = { B : A ∩ B ∈ R } is again a λ-system containing G, so R ⊂ B_A. This reads: R is closed under intersections.

For the remainder of this subsection we assume that X is a metric space, and µ is a finite Borel measure. All functions are real.

Lemma 2.11 Every Borel measure is regular, i.e., for every Borel B,

µB = sup { µF : closed F ⊂ B } (12)

Using complements, µB = inf { µG : open G ⊃ B }.

Proof. Denote R = { B : (12) holds }. Then R is a λ-system which contains the π-system of open sets, which in turn generates B. Whence B ⊂ R.

Corollary 2.12 The space Cb(X) of bounded continuous functions is dense in L1(X).

Proof. By definition, the space of simple functions is dense in L1. In virtue of Lemma 2.11, it suffices to approximate 1IG by continuous functions, where G is open. Let Gn be the open 1/n-hull of G^c, so that Gn^c = { x : d(x, G^c) ≥ 1/n } increases to G. One may use the following continuous approximators:

fn(x) = d(x, G^c) / ( d(x, G^c) + d(x, Gn^c) ),

which satisfy 0 ≤ fn ↗ 1IG.
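One concrete reading of these approximators (our reconstruction, for G = (0, 1) ⊂ R): take the closed sets Fn = { x : d(x, G^c) ≥ 1/n } ↗ G and fn = d(·, G^c)/( d(·, G^c) + d(·, Fn) ); then fn is continuous, 0 on G^c, 1 on Fn, and fn ↗ 1IG pointwise.

```python
def d_to_Gc(x):
    """distance from x to G^c, for G = (0, 1) in R"""
    return max(0.0, min(x, 1.0 - x))

def d_to_Fn(x, n):
    """distance to F_n = [1/n, 1 - 1/n] (take n >= 3)"""
    return max(0.0, 1.0 / n - x, x - (1.0 - 1.0 / n))

def f_n(x, n):
    # denominator is positive everywhere: on cl(G) we have d_to_Fn-free slack 1/n
    return d_to_Gc(x) / (d_to_Gc(x) + d_to_Fn(x, n))

assert f_n(-0.2, 4) == 0.0 and f_n(1.0, 4) == 0.0   # f_n vanishes off G
assert f_n(0.5, 4) == 1.0                            # f_n = 1 on F_n
assert f_n(0.1, 4) == 0.4                            # interpolates near the boundary
assert f_n(0.1, 4) <= f_n(0.1, 16) == 1.0            # f_n increases to 1I_G
```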

Corollary 2.13 (Lusin Theorem) Let f be a measurable function on a compact X. Then, for every ϵ > 0 there is a closed set F such that µ(F c) < ϵ and f|F is continuous in the relative topology.

Proof. Let gk be continuous functions such that ∥gk − f∥1 → 0. Switching to a subsequence, we may assume that gk → f a.e. By Theorem 2.7, the convergence is nearly uniform. For ϵ > 0 we choose A with µ(A^c) < ϵ/2 such that gk → f uniformly on A. Then, choose closed F ⊂ A with µ(A \ F) < ϵ/2. So, the continuous gk converge uniformly on the compact set F. Thus f is continuous on F, where µ(F^c) < ϵ.

Remark 2.14 Tietze Theorem says that every real continuous function on a closed subset of a metric space X can be extended to a continuous function on X. Thus, in the above statement, f admits a continuous

extension fϵ on X.

2.5 Fubini and Radon-Nikodym Theorem

For a bivariate function f(x, y), we denote the univariate function of the variable x when y is fixed by f(·, y).

Lemma 2.15 Consider measure spaces (X, F, µ) and (Y, G, ν) and a measurable f : X × Y → [0, ∞). Then

1. for any fixed y ∈ Y, f(·, y) ∈ L0(X), hence the integral g(y) = µf(·, y) exists;

16 2. g ∈ L0(Y ).

Proof.

Step 1. f = 1IA×B. Hence, f(x, y) = 1IA(x) · 1IB(y), so both statements are obvious. Further, G = { A × B } is a π-system.

Step 2. The following family is a λ-system:

R = { C ∈ F ⊗ G : f = 1IC satisfies (1) and (2) }

By Lemma 2.10, F ⊗ G = σ(G) ⊂ R ⊂ F ⊗ G.

Step 3. By linearity, (1) and (2) hold for simple functions on X × Y .

Step 4. (1) holds for all nonnegative f by Lemma 2.3, and (2) holds by the Monotone Convergence Theorem.

Example 2.16 Let f = 1IC, where C ∈ F ⊗ G. Then ∫_Y 1IC(x, y) dν(y) is a measurable nonnegative function on X, so the additive set function

λC := ∫_X ( ∫_Y 1IC(x, y) dν(y) ) dµ(x) (13)

exists (it may be infinite). If Cn ↗ C, then λCn ↗ λC, but this does not imply that λ is countably additive (differences of infinities). Yet it is when both µ and ν are finite. Further, by the "chessboard" argument, we can show the countable additivity for σ-finite measures. Since

λ(A × B) = µ(A) · ν(B), (14)

we recognize the product measure, λ = µ ⊗ ν. In the case of σ-finite measures, by the Carathéodory Theorem, it is the unique measure on the product σ-field satisfying the product identity.

Theorem 2.17 (Fubini's Theorem) Consider σ-finite measure spaces (X, F, µ) and (Y, G, ν). For every measurable function f : X × Y → [0, ∞),

µ ⊗ ν f = ∫_{X×Y} f(x, y) µ ⊗ ν(dx, dy) = ∫_Y ( ∫_X f(x, y) µ(dx) ) ν(dy) = ∫_X ( ∫_Y f(x, y) ν(dy) ) µ(dx).

Proof. By the "chessboard" argument and linearity, we may assume that both measures are finite. Then, we check the identity for the indicators 1IA(x) · 1IB(y), and apply the monotone class argument.
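On finite discrete spaces, Fubini's theorem is the interchange of finite sums, which can be checked self-containedly; the measures and integrand below are invented for the check.

```python
# Fubini for a nonnegative f on a product of finite (hence sigma-finite)
# discrete measure spaces: the two iterated integrals (sums) agree.
X, Y = ['a', 'b'], [0, 1, 2]
mu = {'a': 1.0, 'b': 2.0}            # a finite measure on X
nu = {0: 0.5, 1: 0.5, 2: 1.0}        # a finite measure on Y
f = lambda x, y: (1.0 if x == 'a' else 3.0) * (y + 1)

int_xy = sum(sum(f(x, y) * mu[x] for x in X) * nu[y] for y in Y)  # dx then dy
int_yx = sum(sum(f(x, y) * nu[y] for y in Y) * mu[x] for x in X)  # dy then dx
assert int_xy == int_yx == 31.5
```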

We say that a measure ν is absolutely continuous with respect to a measure µ, ν ≪ µ for short, both being defined on the same σ-field F, if

µA = 0 ⇒ νA = 0.

Theorem 2.18 (Radon-Nikodym Theorem) Let µ, ν be σ-finite and ν ≪ µ. Then there exists a measurable f ≥ 0 such that

νA = ∫_A f dµ, A ∈ F. (15)

No elementary proof is available. W.l.o.g., it suffices to prove the statement when µ is finite or simply a probability.
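Although no elementary proof exists in general, on a countable space the Radon-Nikodym derivative is explicit: f(a) = ν{a}/µ{a} on the atoms of µ. A sketch with made-up weights:

```python
def rn_derivative(nu, mu):
    """f = d(nu)/d(mu) on a countable space: f(a) = nu{a}/mu{a}, and f(a) = 0
    where mu{a} = 0 (which forces nu{a} = 0 when nu << mu)."""
    f = {}
    for a, m in mu.items():
        n = nu.get(a, 0.0)
        if m == 0.0:
            assert n == 0.0, "nu is not absolutely continuous w.r.t. mu"
            f[a] = 0.0
        else:
            f[a] = n / m
    return f

mu = {1: 0.5, 2: 0.25, 3: 0.25, 4: 0.0}
nu = {1: 0.1, 2: 0.5, 3: 0.4}          # nu << mu
f = rn_derivative(nu, mu)

# verify (15): nu(A) = integral over A of f dmu, for a few sets A
for A in [{1}, {2, 3}, {1, 2, 3, 4}]:
    lhs = sum(nu.get(a, 0.0) for a in A)
    rhs = sum(f[a] * mu[a] for a in A)
    assert abs(lhs - rhs) < 1e-12
```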

3 Probability

3.1 Probabilistic terminology and notation

Let's note a change of terms. Let (X, F, µ) denote a general measure space, and (Y, G) a measurable space. A "function" always means a "measurable function", in the sense clear from the context.

measure theory                         probability theory              if . . .
measure space                          probability space (Ω, F, P)     P(Ω) = 1
measure                                probability, probability law
measurable sets                        events
measurable functions X, Y, . . . on Ω  random variables                the range = R
                                       random vectors                  the range = R^n
                                       random sequences                the range = R∞
                                       random elements                 the range = Y
integral                               mathematical expectation
almost everywhere, a.e.                almost surely, a.s.
P ◦ X^{-1}                             distribution, law

Every random variable X (or element) induces a probability measure, the distribution or probability law of X in the range space. Conversely, given a probability measure µ on (Y, G), there is a random variable X on some probability space (Ω, F, P), whose distribution equals µ. Indeed, it suffices to consider

(Ω, F, P) = (Y, G, µ), and X(ω) = ω (identity).

For example, considering the Lebesgue measure λ on ([0, 1], B), the identity is simply the function U(x) = x. That is, P(U ≤ x) = x, 0 ≤ x ≤ 1. Often U is called a uniform random variable, U[0, 1], and λ the uniform distribution. Notice that the affine mapping

X = a + (b − a)U

defines a uniform U[a, b] random variable, whose distribution is (b − a)^{-1} λ|_{[a,b]}. More examples can be constructed in special cases.
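The affine map above is how uniform samples on [a, b] are produced in practice from U[0, 1]. A minimal sketch; the endpoints 2 and 5 and the sample size are arbitrary illustrative values:

```python
import random

def uniform_ab(a, b, u=None):
    """Map a U[0,1] draw u to a U[a,b] draw via X = a + (b - a) * u."""
    if u is None:
        u = random.random()      # a U[0,1] variable
    return a + (b - a) * u

random.seed(0)
samples = [uniform_ab(2.0, 5.0) for _ in range(100_000)]

# Sanity checks: the sample stays in [2, 5] and its mean is near (2 + 5) / 2.
assert all(2.0 <= x <= 5.0 for x in samples)
assert abs(sum(samples) / len(samples) - 3.5) < 0.02
```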

Lemma 3.1 Let µ be a probability on R. Let (Ω, F, P) = ([0, 1], B, λ). Then the random variable

X(ω) = sup { x ∈ R : µ(−∞, x] < ω } (16)

has the distribution µ.

Proof. Note that [X(ω) > x] ⇐⇒ µ(−∞, x] < ω. Hence

λ{ ω : X(ω) ≤ x } = λ{ ω : µ(−∞, x] ≥ ω } = µ(−∞, x].

Measures agreeing on a generator (halflines here), agree.

Formula (16) in fact defines the (generalized) inverse F^{-1} of the nondecreasing function F(x) = µ(−∞, x]. Of course, any such function (with limits at ±∞ ranging from 0 to 1) determines a (Stieltjes-Lebesgue) measure. Then the lemma says that F^{-1}(U), where U is U[0, 1], has the distribution F¹². This specific construction cannot be repeated when the range is, say, R^n with n ≥ 2, because it relies on the linear order of R.
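Formula (16) is the basis of inverse-transform sampling. A sketch for the exponential law, whose generalized inverse is explicit, F^{-1}(u) = −ln(1 − u)/λ; the rate λ = 2 and the sample size are illustrative choices:

```python
import math
import random

def exp_inverse_cdf(u, lam):
    """Generalized inverse of F(x) = 1 - exp(-lam * x), for u in (0, 1)."""
    return -math.log(1.0 - u) / lam

random.seed(1)
lam = 2.0
samples = [exp_inverse_cdf(random.random(), lam) for _ in range(200_000)]

# F^{-1}(U) should have mean 1/lam and empirical CDF close to F.
mean = sum(samples) / len(samples)
assert abs(mean - 1 / lam) < 0.01

x = 0.5
empirical = sum(s <= x for s in samples) / len(samples)
assert abs(empirical - (1 - math.exp(-lam * x))) < 0.01
```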

3.2 Independence, Borel-Cantelli, 0-1 Law

Random variables (elements ranging in Y ) X1, ··· ,Xn on (Ω, F, P) are called independent, if

P(X1 ∈ B1,...,Xn ∈ Bn) = P(X1 ∈ B1) ··· P(Xn ∈ Bn),B1,...,Bn ∈ G; in other words, the probability distribution of the random vector (X1,...,Xn) is the product of p.d.’s of its components. By the argument spelled above, one can choose a product probability space as the domain (e.g., the product measure space induced by the product measure), and redefine the random vector

Y = (Y_1, ..., Y_n) with the same probability distribution. Then,

Y_k(ω_1, ..., ω_n) = X_k(ω_k), k = 1, ..., n,

to wit: independent random variables are, essentially, those functions that depend on independent variables.

Let (A_k) be a sequence of sets. Define

lim sup_k A_k ≝ ∩_{n=1}^∞ ∪_{k=n}^∞ A_k,  lim inf_k A_k ≝ ∪_{n=1}^∞ ∩_{k=n}^∞ A_k.

Note that the B_n = ∪_{k≥n} A_k form a descending sequence, while the C_n = ∩_{k≥n} A_k form an ascending sequence of sets. Since

C_m ⊂ A_m for every m, and ∪_{m=1}^∞ C_m = ∪_{m≥n} C_m (the C_m are ascending),

then, for every n,

lim inf_k A_k = ∪_{m=1}^∞ C_m = ∪_{m≥n} C_m ⊂ ∪_{m≥n} A_m = B_n.

Therefore,

lim inf_k A_k ⊂ ∩_n B_n = lim sup_k A_k.

The inclusion becomes clearer when we see that

a ∈ lim inf_k A_k ⇐⇒ ∃ n ∀ k ≥ n: a ∈ A_k ⇐⇒ a ∈ A_k eventually,
a ∈ lim sup_k A_k ⇐⇒ ∀ n ∃ k ≥ n: a ∈ A_k ⇐⇒ a ∈ A_k infinitely often.
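The "eventually / infinitely often" characterization can be checked mechanically for a periodic sequence of sets, since the tail unions and intersections then stabilize after one period. A sketch; the alternating sets below are arbitrary illustrative choices:

```python
# For a periodic sequence of sets A_k the tails stabilize, so
# lim sup = ∩_n ∪_{k≥n} A_k and lim inf = ∪_n ∩_{k≥n} A_k
# can be computed from a single period.

universe = set(range(12))
evens = {x for x in universe if x % 2 == 0}
threes = {x for x in universe if x % 3 == 0}

def A(k):
    return evens if k % 2 == 0 else threes   # alternating sets

period = 2
tail_union = set().union(*(A(k) for k in range(period)))
tail_inter = set(universe)
for k in range(period):
    tail_inter &= A(k)

lim_sup = tail_union   # points lying in A_k infinitely often
lim_inf = tail_inter   # points lying in A_k eventually

assert lim_sup == evens | threes
assert lim_inf == evens & threes
assert lim_inf <= lim_sup      # lim inf ⊂ lim sup, as proved above
```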

12or rather, the probability distribution µ, related to F as described

That's why we frequently use the acronym

{ A_k i.o. } = lim sup_k A_k.

Events are independent if their indicators are independent.

Lemma 3.2 (Borel-Cantelli)

1. For every sequence (A_k) of events,

∑_k P(A_k) < ∞ ⇒ P(A_k i.o.) = 0.

2. For every sequence (A_k) of independent events,

∑_k P(A_k) = ∞ ⇒ P(A_k i.o.) = 1.

Proof. Denote A = lim sup_k A_k.

1: Since A = ∩_n B_n, where B_n = ∪_{k≥n} A_k, then

P(A) ≤ P(B_n) = P( ∪_{k≥n} A_k ) ≤ ∑_{k≥n} P(A_k) → 0, as n → ∞.

2: It suffices to show that P(B_n) = 1 for every n when the series of probabilities diverges. Indeed, then we will have

P(A^c) = P( ∪_n B_n^c ) ≤ ∑_n P(B_n^c) = 0.

Now, using the independence assumption,

P(B_n^c) = P( ∩_{k≥n} A_k^c ) = ∏_{k≥n} P(A_k^c) = ∏_{k≥n} ( 1 − P(A_k) ).

Observe that 1 − p ≤ e^{−p}, p ≥ 0.

Indeed, the graph of the convex (concave up) function y = e^{−p} lies above any of its tangent lines, and y = 1 − p is its tangent line at 0. Thus, using the divergence of the series (even when started at an arbitrary index n),

P(B_n^c) ≤ ∏_{k≥n} e^{−P(A_k)} = exp( − ∑_{k≥n} P(A_k) ) = e^{−∞} = 0.
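Both ingredients of this step can be checked numerically: the tangent-line bound 1 − p ≤ e^{−p}, and, for the divergent series with P(A_k) = 1/k, the vanishing of the partial products, which here even telescope exactly: ∏_{k=n}^{N} (1 − 1/k) = (n − 1)/N. A sketch:

```python
import math

# Tangent-line bound: 1 - p <= exp(-p) for p >= 0.
for i in range(1001):
    p = i / 1000
    assert 1 - p <= math.exp(-p) + 1e-15

# For P(A_k) = 1/k the series diverges, and the partial products
# telescope: prod_{k=n}^{N} (1 - 1/k) = prod (k-1)/k = (n-1)/N -> 0.
def partial_product(n, N):
    prod = 1.0
    for k in range(n, N + 1):
        prod *= 1 - 1 / k
    return prod

n = 5
for N in (50, 500, 5000):
    assert abs(partial_product(n, N) - (n - 1) / N) < 1e-9
```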

Families of events are called independent if arbitrary finite selections form independent events.

Lemma 3.3 (Kolmogorov 0-1 Law) Let F_n be independent σ-fields in a probability space. Then

lim sup_n F_n ≝ ∩_n σ(F_n, F_{n+1}, ...)

is trivial, i.e., it consists of events of probability either 0 or 1.

Proof. In the above intersection one may take the index n > k, for any k. Hence lim sup_n F_n and F_1, ..., F_k are independent for every k; hence lim sup_n F_n is independent of the σ-field generated by all the F_k's. In particular, it is independent of itself. A set independent of itself must have probability either 0 or 1.

3.3 L^p-spaces

Recall that convergence in probability is metrizable, and the space L^0 of equivalence classes of random variables with respect to the equivalence relation X ∼ Y iff P(X = Y) = 1 becomes a complete metrizable space. It is convenient to define a quasi-norm ∥·∥, i.e., a functional such that ∥X − Y∥ is a metric. Here, we may choose

∥X∥_0 ≝ inf { ϵ > 0 : P(|X| > ϵ) < ϵ }. (17)

Alternatively, the following quantity also metrizes the convergence in probability:

∥X∥_0 ≝ E( |X| ∧ 1 ). (18)

Note that implicitly we used the function φ(t) = t ∧ 1, with

∥X∥_0 ≝ E φ(|X|). (19)

To get an L^0-metric, we need a bounded nondecreasing φ : R_+ → R_+, continuous and equal to 0 at 0, and also concave. E.g., φ(t) = 1 − e^{−t}.
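For a concrete sequence converging to 0 in probability, the quasi-norm (18) can be computed exactly. Below, X_n takes the value n on an event of probability 1/n² and 0 otherwise (an illustrative choice), so ∥X_n∥_0 = 1/n² → 0. A sketch:

```python
# X_n = n with probability 1/n^2 and 0 otherwise converges to 0 in
# probability, and ||X_n||_0 = E(|X_n| ∧ 1) detects this: here
# |X_n| ∧ 1 equals 1 on an event of probability 1/n^2.

def quasi_norm(values_probs):
    """E(|X| ∧ 1) for a discrete X given as (value, probability) pairs."""
    return sum(p * min(abs(v), 1.0) for v, p in values_probs)

norms = []
for n in range(1, 11):
    X_n = [(n, 1 / n**2), (0.0, 1 - 1 / n**2)]
    norms.append(quasi_norm(X_n))

assert all(abs(norms[n - 1] - 1 / n**2) < 1e-12 for n in range(1, 11))
assert norms == sorted(norms, reverse=True)   # decreasing to 0
```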

Another metric convergence is convergence in L^p, 0 < p ≤ ∞. The larger the p, the stronger the convergence, because of Jensen's inequality: if φ : R_+ → R_+ is convex, then

φ( E|X| ) ≤ E φ(|X|). (20)

When p ≥ q, apply Jensen's inequality to φ(t) = t^{p/q} to obtain

( E|X|^q )^{1/q} ≤ ( E|X|^p )^{1/p}. (21)

To prove Jensen’s inequality, notice that a convex function can be written as

ϕ(t) = sup { at + b :(a, b) ∈ Q }

where Q is some countable set. Then, by monotonicity and linearity of the integral,

aE |X| + b = E (a|X| + b) ≤ E ϕ(|X|),

and it suffices to take the supremum over Q.
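The resulting monotonicity (21) of the p-norms is easy to check numerically for a discrete random variable. A sketch; the values and weights below are arbitrary illustrative choices:

```python
# Check (E|X|^q)^(1/q) <= (E|X|^p)^(1/p) for q <= p on a discrete X.

values = [0.3, 1.7, 2.5, 4.0]
probs = [0.1, 0.4, 0.3, 0.2]     # a probability vector (sums to 1)

def p_norm(p):
    """(E|X|^p)^(1/p) for the discrete X above."""
    return sum(w * abs(v) ** p for v, w in zip(values, probs)) ** (1 / p)

norms = [p_norm(p) for p in (1, 1.5, 2, 3, 6)]
# The norms are nondecreasing in p, as Jensen's inequality predicts.
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```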

When p ≥ 1 and 1/p + 1/q = 1, the pair (L^p, L^q) forms a dual pair under the pairing

⟨X, Y⟩ = E XY. (22)

The integral is well defined because of Hölder's inequality, true for any measure and nonnegative measurable functions:

µ(fg) ≤ ( µ f^p )^{1/p} ( µ f^q )^{1/q}. (23)

When p = 1, by convention we take q = ∞. Recall the p-norm ∥f∥_p = ( µ|f|^p )^{1/p}. The duality (22) defines a linear functional on L^p, which by virtue of Hölder's inequality is continuous:

Λ_Y(X) = ⟨X, Y⟩,  (Hölder: |⟨X, Y⟩| ≤ C ∥X∥_p, with C = ∥Y∥_q),

and consequently yields the concept of weak convergence:

X_n ∈ L^p, 1 ≤ p < ∞:  X_n →^w X ⇐⇒ ∀ Y ∈ L^q  ⟨X_n, Y⟩ → ⟨X, Y⟩. (24)

If we wish to be more explicit, we may prefer to write X_n →^{(L^p,L^q)} X in lieu of X_n →^w X.

Above, we excluded the exponent p = ∞. Although L^1 consists of continuous linear functionals on L^∞, there are far more such functionals. In contrast, when p ∈ (1, ∞), L^q exhausts all functionals over L^p, and vice versa. The space of all continuous linear functionals on a normed space is called its dual. When p ∈ (1, ∞), L^p is a reflexive Banach space, i.e., one whose double dual coincides with the original space. Still, (24) defines some sort of convergence when p = ∞, called weak* convergence. This is the convergence in a dual space. In a reflexive Banach space weak and weak* convergence coincide.

So, what is the dual of L^∞? Even the easier question, namely what is the dual of, say, C[0, 1], is not easy to answer. We can give an example of a continuous functional over C[0, 1]. Let µ be a finite Borel measure. Then we obtain a continuous linear functional

⟨µ, f⟩ = Λ_µ(f) = ∫_0^1 f dµ,  |Λ_µ f| ≤ µ[0, 1] · ∥f∥_∞.

In fact, there are no other examples. In other words, every continuous linear functional on C[0, 1] must act through a measure via the above formula (this is the Riesz Representation Theorem).

Less rigorously, we may say that measures are continuous functionals over continuous functions.

3.4 Weak convergence of measures

Thus, as a by-product, we arrive at the concept of convergence of finite measures, probabilities in particular: that would be the weak* convergence in a dual space. Let us confine ourselves to measures on a metric space (S, d).

Let C_b(S) be the Banach space of continuous bounded real functions on S. Finite measures, e.g., probabilities, can be identified with continuous linear functionals on C_b(S) acting through the duality

⟨µ, f⟩ = ∫_S f dµ.

On the dual space C_b^*(S) consider the weak* topology, which is inherited by M(S) (the vector subspace of finite measures) or P(S) (the positive cone of probabilities). The inherited weak*-topology is given by the following subbase of neighborhoods of a measure µ:

U_{f,ϵ}(µ) = { ν : | ∫_S f dµ − ∫_S f dν | < ϵ }. (25)

In particular,

µ_n →^{weak*} µ ⇐⇒ ∀ bounded continuous f, µ_n f → µf. (26)

Traditionally, albeit against the aforementioned analytic convention, the introduced convergence is called the weak convergence. So we will follow that misnomer and drop the asterisk * from the notation. Random S-valued elements X_n converge in distribution to a random element X, written X_n →^d X, if their distributions converge weakly. Note that the domains of these random variables may differ.

We will give a few characterizations of weak convergence, first for measures on R; then we will generalize the characterizations to measures on a metric space. Recall that the distribution function F(x) = P(X ≤ x) of a real random variable uniquely determines the distribution µ = L(X) of X, µ(A) = P(X ∈ A). A point x is a continuity point of F if F(x−) = F(x) (F is right continuous). Since F is nondecreasing, there are at most countably many discontinuity points. Denote by C(F) the set of continuity points of F.

Theorem 3.4 (Helly Theorem) Let µ_n = L(X_n), µ = L(X) be probability laws with distribution functions F_n, F. Then

(1) X_n →^d X  ⇐⇒  (2) ∀ x ∈ C(F): F_n(x) → F(x).
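The restriction to continuity points in (2) is essential. For point masses at 1/n, which converge in distribution to the point mass at 0, F_n → F everywhere except at the discontinuity x = 0. A sketch:

```python
def F(x):
    """CDF of the point mass at 0."""
    return 1.0 if x >= 0 else 0.0

def F_n(x, n):
    """CDF of the point mass at 1/n."""
    return 1.0 if x >= 1 / n else 0.0

# At every continuity point x != 0 of F, F_n(x) -> F(x);
# a very large n stands in for the limit here.
for x in (-1.0, -0.1, 0.05, 0.3, 2.0):
    assert F_n(x, 10**9) == F(x)

# At the discontinuity x = 0 convergence fails: F_n(0) = 0 for all n
# while F(0) = 1 -- yet the point masses do converge in distribution.
assert F_n(0.0, 10**9) == 0.0 and F(0.0) == 1.0
```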

In the proof we run (cf. formula (31)) into the following property of a family { X_t } of random variables:

lim_{c→∞} sup_t P(|X_t| > c) = 0, (27)

called tightness. A set is tight if and only if it is bounded in L^0. That is, if ∥·∥_0 is any F-norm metrizing convergence in probability, then for every family { X_t } of random variables,

{ X_t } is tight ⇐⇒ lim_{α→0} sup_t ∥α X_t∥_0 = 0 ⇐⇒ ∀ α_n → 0 ∀ (t_n): α_n X_{t_n} →^P 0.

The latter equivalence follows by monotonicity.

E.g., choose ∥X∥_0 = E(|X| ∧ 1). So, tightness is implied by the new condition, as

∥tX∥_0 ≥ P(|tX| > 1) = P(|X| > t^{−1}).

In general, it suffices to consider sequences only. So, if (X_n) is not tight, then for some subsequence (n_k), P(|X_{n_k}| > k) > d > 0 for some d. But then P(|k^{−1} X_{n_k}| > 1) ̸→ 0.

Conversely, if (X_n) is tight and α_n → 0, then, given positive c and ϵ, there is n_0 such that |α_n| < ϵ/c for n ≥ n_0. Then, for n ≥ n_0,

P(|α_n X_n| > ϵ) ≤ P(|(ϵ/c) X_n| > ϵ) = P(|X_n| > c).

Whence,

lim sup_n P(|α_n X_n| > ϵ) ≤ sup_n P(|X_n| > c) → 0 as c → ∞,

by tightness.

Remark 3.5 The converse inequality is also true. Hence, for the tightness of a sequence we may use the limes superior in lieu of the supremum.

3.5 Measures on a metric space, tightness vs. compactness

Below, S = (S, d) is a Polish (separable complete metric) space. M(S) (respectively, P(S)) denotes the set of all finite Borel measures (respectively, probabilities) on S, and C_b(S) denotes the space of real bounded continuous functions on S. A Borel set B is called a continuity set of a measure µ if its boundary ∂B has measure 0 (B̄ denotes the closure of B while B° denotes its interior).

Many results cited here remain valid for more general spaces. Some assumptions, such as separability or completeness, or even metrizability, may be relaxed, although at the cost of strengthening the properties of the measures. When we compare topologies, the convergence of sequences is not enough unless the topologies are metrizable. Instead, one may consider generalized sequences ("nets"). A net is a family (x_n, n ∈ N), where N is a directed set ordered by a partial order ≻. The limit is defined as usual:

lim_{n∈N} x_n = x ⇐⇒ ∀ open U ∋ x ∃ n_0 ∀ n ≻ n_0: x_n ∈ U.

Some arguments good for sequences immediately carry over to nets (e.g., Alexandrov's Theorem below). However, many don't. For example, the Monotone and Dominated Convergence Theorems fail for nets:

0 = lim_{finite A ⊂ [0,1]} ∫_0^1 1I_A dx ≠ ∫_0^1 ( lim_{finite A ⊂ [0,1]} 1I_A ) dx = ∫_0^1 1I_{[0,1]} dx = 1.

Properties of measures vary greatly: what is automatic in a richer space may happen only occasionally in a space with a poorer structure. For example, we know that every finite Borel measure on a metric space is regular, i.e., its values on Borel sets are determined by the values on closed subsets. Regularity is not automatic in general topological spaces. Similarly, consider tight finite measures:

µ(S) = sup { µ(K): K compact } , or Radon measures:

µ(B) = sup { µ(K) : K ⊂ B, K compact }, for every Borel set B.

Even in compact Hausdorff spaces there are non-Radon measures. However

Proposition 3.6 (Ulam’s Theorem) Every Borel probability measure on a Polish space is Radon.

Proof. Let a sequence { x_n } be dense in S. Consider the array of closed balls C_{n,k} = C(x_n, 1/k), centered at x_n with radii 1/k. Clearly, S = ∪_n C_{n,k} for each fixed k. In particular, for each k and every ϵ > 0 there exists n_k such that

µ( ∪_{n=1}^{n_k} C_{n,k} ) > 1 − ϵ/2^k.

Then, µ(Cϵ) > 1 − ϵ, where the closed set is defined as

C_ϵ = ∩_k ∪_{n=1}^{n_k} C_{n,k}.

In particular, for every k, the set C_ϵ is covered by a finite union of closed balls of radius 1/k (diameter ≤ 2/k). Such sets are called totally bounded. A closed, totally bounded subset of a complete metric space is compact. So we have proved that µ is tight. Since µ is regular, an arbitrary Borel set can be approximated from inside by closed subsets, and each closed subset is again a Polish space; thus the measure of any Borel set can be approximated from inside by compact sets, i.e., the measure is Radon.

If we relax the separability assumption, then we see that a Radon measure is precisely one that is supported by a separable subset Sµ, i.e., µ(Sµ) = 1.

Theorem 3.7 (Alexandrov) Let (µ, µn) be a family of probability measures. T.F.A.E.:

1. µ_n →^w µ;

2. lim sup_n µ_n(F) ≤ µ(F), for every closed set F;

3. lim inf_n µ_n(G) ≥ µ(G), for every open set G;

4. lim_n µ_n(C) = µ(C), for every continuity set C of µ.
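The inequalities in (2) and (3) may be strict; the standard example is µ_n = δ_{1/n} → δ_0, with the closed set F = {0} and the open set G = (0, ∞). A sketch (sets are encoded as membership predicates):

```python
# Point masses mu_n = delta_{1/n} converge weakly to mu = delta_0;
# conditions (2) and (3) hold with strict inequality for suitable sets.

def mu_n(A, n):
    """Measure of the set A (a membership predicate) under delta_{1/n}."""
    return 1.0 if A(1 / n) else 0.0

def mu(A):
    """Measure of A under delta_0."""
    return 1.0 if A(0.0) else 0.0

def closed_F(x):
    return x == 0.0        # the closed set {0}

def open_G(x):
    return x > 0.0         # the open set (0, infinity)

ns = range(1, 100)
# (2): lim sup mu_n(F) = 0 <= mu(F) = 1  (strictly).
assert max(mu_n(closed_F, n) for n in ns) == 0.0 and mu(closed_F) == 1.0
# (3): lim inf mu_n(G) = 1 >= mu(G) = 0  (strictly).
assert min(mu_n(open_G, n) for n in ns) == 1.0 and mu(open_G) == 0.0
```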

Note that we have used neither separability nor completeness at all. However, the metric structure was important; in fact, we needed it only in the implication (1) ⇒ (2). One may introduce a special class of topological spaces that admit such separating continuous functions f_ϵ (they are called completely regular). Yet, if we dealt with nets, the continuity of the measure µ used implicitly would have to be strengthened to continuity along descending nets of sets. Each of the conditions yields a subbase of neighborhoods (analogs of halflines) of a measure µ:

(1) U1(µ; f, ϵ) = { ν : |νf − µf| < ϵ } , ϵ > 0, f ∈ Cb(S);

(2) U2(µ; F, ϵ) = { ν : νF < µF + ϵ } , ϵ > 0,F closed;

(3) U3(µ; G, ϵ) = { ν : νG > µG − ϵ } , ϵ > 0,G open;

(4) U_4(µ; C, ϵ) = { ν : |νC − µC| < ϵ }, ϵ > 0, µ∂C = 0.

Alexandrov's Theorem says that these subbases induce equivalent topologies. The following Lévy-Prokhorov metric comprises all of these alternatives:

π(µ, ν) ≝ inf { ϵ > 0 : µ(B) ≤ ν(B^ϵ) + ϵ for every Borel set B }, (28)

where B^ϵ = { x : d(x, B) < ϵ } denotes the open ϵ-neighborhood of B.
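On a finite metric space the infimum in (28) can be computed by brute force: test the defining condition over all subsets B and bisect on ϵ. A sketch under that finiteness assumption; the three-point space and the two measures are arbitrary illustrative choices, and both directions of the condition are checked for safety (for probabilities the one-sided condition already yields a symmetric metric):

```python
from itertools import combinations

points = [0.0, 1.0, 3.0]                 # a finite metric space inside R
mu = {0.0: 0.5, 1.0: 0.5, 3.0: 0.0}
nu = {0.0: 0.5, 1.0: 0.0, 3.0: 0.5}

def enlargement(B, eps):
    """Open eps-neighborhood B^eps within the finite space."""
    return {x for x in points if any(abs(x - b) < eps for b in B)}

def condition(eps):
    """mu(B) <= nu(B^eps) + eps for every subset B, and symmetrically."""
    for r in range(len(points) + 1):
        for B in combinations(points, r):
            Be = enlargement(B, eps)
            if sum(mu[b] for b in B) > sum(nu[x] for x in Be) + eps + 1e-12:
                return False
            if sum(nu[b] for b in B) > sum(mu[x] for x in Be) + eps + 1e-12:
                return False
    return True

# Bisect for the infimum of admissible eps.
lo, hi = 0.0, 1.0
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if condition(mid) else (mid, hi)
print(round(hi, 6))  # prints 0.5
```

Here half the mass must travel from 1 to 3, so moving mass 1/2 a distance greater than any ϵ < 1/2 forces π(µ, ν) = 1/2.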

Theorem 3.8 (Prokhorov) Let S be Polish¹³. The metric π metrizes the weak topology. In particular, µ_n →^w µ iff π(µ_n, µ) → 0.

The aforementioned notion of tightness of a set of random variables is in fact a property of probability measures. A family Q ⊂ P(S) is called tight if

∀ ϵ > 0 ∃ compact K ∀ µ ∈ Q µ(Kc) < ϵ.

The importance of the concept is illustrated by the following result.

Theorem 3.9 (Prokhorov’s Compactness Theorem) Let S be Polish and Q ⊂ P(S) be a family of probability measures. Then, Q is tight iff Q is relatively compact.

13completeness not needed

4 Appendix: Selected proofs

4.1 Section 1

Proof of Theorem 1.5.

1. It is obvious that ∅,S ∈ Mφ and Mφ is closed under complements;

2. Mφ is closed under intersections.

Let A, B ∈ M_φ and P ⊂ A ∩ B, Q ⊂ (A ∩ B)^c. We partition Q into three disjoint subsets, Q = Q_A ∪ Q_B ∪ Q_0, where Q_A = Q ∩ (A \ B), Q_B = Q ∩ (B \ A), Q_0 = Q ∩ (A ∪ B)^c. Using the measurability of A, then of B (twice), and the subadditivity of φ,

φ(P ∪ Q) = φ( (P ∪ Q_A) ∪ (Q_B ∪ Q_0) )
 ≥ φ(P ∪ Q_A) + φ(Q_B ∪ Q_0)
 ≥ φ(P) + φ(Q_A) + φ(Q_B) + φ(Q_0) ≥ φ(P) + φ(Q).

Hence, and from Step 1, M_φ is closed under differences and unions of two sets. The latter inequality means exactly that φ is superadditive, hence countably superadditive, and thus (together with subadditivity) countably additive on M_φ.

3. M_φ is closed under finite unions. Further, by induction, if A_1, ..., A_n ∈ M_φ are disjoint, P ⊂ A_1 ∪ ... ∪ A_n and Q ⊂ (A_1 ∪ ··· ∪ A_n)^c, then

φ(P ∪ Q) ≥ ∑_{k=1}^n φ(P ∩ A_k) + φ(Q).

4. M_φ is a σ-field.

It suffices to consider disjoint sets A_1, A_2, ... ∈ M_φ. Let P ⊂ ∪_k A_k and Q ⊂ (∪_k A_k)^c. Fix n. From the monotonicity of φ and then from Step 3 we infer that

φ(P ∪ Q) ≥ φ( ((∪_{k=1}^n A_k) ∩ P) ∪ Q ) ≥ ∑_{k=1}^n φ(P ∩ A_k) + φ(Q).

Let n → ∞. Thus, by subadditivity,

φ(P ∪ Q) ≥ ∑_{k=1}^∞ φ(P ∩ A_k) + φ(Q) ≥ φ(P) + φ(Q).

That φ is a measure was proved in Step 2.

Proof of Theorem 1.6

Consider an open set G ⊂ S. Pick P ⊂ G^c and Q ⊂ G. We need to show that φ(P ∪ Q) ≥ φ(P) + φ(Q). If φ(P ∪ Q) = ∞, then the inequality is trivially true. So, assume that φ(P ∪ Q) < ∞; hence φ(Q) < ∞. Define the ascending sets and their differences:

Q_k = { x ∈ Q : d(x, G^c) > 1/k },  D_k = Q_{k+1} \ Q_k,

so Q = ∪_k Q_k because G is open. By monotonicity and the hypothesis, since d(P, Q_k) > 0,

φ(P ∪ Q) ≥ φ(P ∪ Qk) = φ(P ) + φ(Qk).

Since φ(Q) ≥ φ(Q_k), it suffices to show that

φ(Q) ≤ lim_k φ(Q_k).

By subadditivity and since Q \ Q_k = ∪_{i=k}^∞ D_i,

φ(Q) = φ( Q_k ∪ (Q \ Q_k) ) ≤ φ(Q_k) + φ(Q \ Q_k) ≤ φ(Q_k) + ∑_{i=k}^∞ φ(D_i).

To finish, it is enough to show that the series ∑_i φ(D_i) converges. By induction, (5) continues to hold for finitely many metrically separated sets (and the sets D_{2j}, like the sets D_{2j−1}, are pairwise metrically separated). Hence

∑_{i=1}^{2n} φ(D_i) = ∑_{j=1}^{n} φ(D_{2j}) + ∑_{j=1}^{n} φ(D_{2j−1}) = φ( ∪_{j=1}^{n} D_{2j} ) + φ( ∪_{j=1}^{n} D_{2j−1} ) ≤ 2 φ(Q) < ∞.

This completes the proof.

Proof of Lemma 1.11. Define D′ = { A ∈ D : v(A) < ∞ } and consider the induced outer measure ϕ′ = ϕ_{v,D′}. We claim that ϕ = ϕ′. Clearly, ϕ ≤ ϕ′. If for some set A we have ϕ(A) = ∞, then ϕ′(A) = ∞ as well. If ϕ(A) < ∞, then A admits a cover by sets of finite v-value, i.e., belonging to D′. Hence ϕ(A) = ϕ′(A) in any case. So, we may assume that D = D′, i.e., that v is finite on D.

Let D ∈ D. Fix an arbitrary ϵ > 0. Choose D_k, k = 1, 2, ..., with v(D_k) < ∞, such that

D ⊂ ∪_k D_k and ∑_k v(D_k) < ϕ(D) + ϵ. (29)

In particular, D = ∪_k (D ∩ D_k). By countable subadditivity of v, and since D is closed under intersections,

v(D) ≤ ∑_k v(D ∩ D_k).

Since v is monotone,

∑_k v(D ∩ D_k) ≤ ∑_k v(D_k).

Whence, by the definition of the infimum on one hand, and by (29) on the other,

ϕ(D) ≤ v(D) ≤ ϕ(D) + ϵ.

Now, we let ϵ → 0, which proves that ϕ = v on D.

Proof of Theorem 1.12. By Lemma 1.11, ϕ = v on D. To prove that every D ∈ D is ϕ-measurable, let D ∈ D, P ⊂ D, and Q ⊂ D^c. For an ϵ > 0, choose a cover ∪_k D_k ⊃ P ∪ Q such that

ϕ(P ∪ Q) > ∑_k v(D_k) − ϵ.

Since D is a ring, D_k ∩ D, D_k \ D ∈ D, and because v is a measure, v(D_k) = v(D_k ∩ D) + v(D_k ∩ D^c). Further,

P ⊂ ∪_k (D_k ∩ D),  Q ⊂ ∪_k (D_k ∩ D^c).

Hence,

ϕ(P ∪ Q) > ∑_k v(D_k ∩ D) + ∑_k v(D_k ∩ D^c) − ϵ ≥ ϕ(P) + ϕ(Q) − ϵ.

Letting ϵ → 0, we prove that D ∈ Mϕ.

Regarding the uniqueness statement, let µ be a measure on F = σ(D) such that µ = v on D. Pick F ∈ F ⊂ M_ϕ and an arbitrary cover ∪_k D_k ⊃ F, D_k ∈ D. Then, by monotonicity and subadditivity of µ,

µ(F) ≤ ∑_k µ(D_k) = ∑_k v(D_k).

Taking the infimum over all covers, we infer that

∀ F ∈ F µ(F ) ≤ ϕ(F ). (30)

So far, we have not used the σ-finiteness of µ. Pick again F ∈ F. Suppose first that there is D ∈ D such that F ⊂ D and µ(D) = v(D) = ϕ(D) < ∞. Then ϕ(D \ F) < ∞ and µ(D \ F) < ∞ (since D \ F ∈ F). Applying (30) to D \ F, since ϕ = v = µ on D, we have

ϕ(F ) = ϕ(D) − ϕ(D \ F ) = µ(D) − ϕ(D \ F ) ≤ µ(D) − µ(D \ F ) = µ(F ).

Hence, in view of (30), ϕ(F) = µ(F) when F is covered by a single set from D of finite measure. If this does not happen, then by σ-finiteness F ⊂ ∪_k D_k with ϕ(D_k) = v(D_k) = µ(D_k) < ∞. W.l.o.g. we may assume that the D_k are disjoint (using the ring property). Then µ(F ∩ D_k) = ϕ(F ∩ D_k), and hence

µ(F) = ∑_k µ(F ∩ D_k) = ∑_k ϕ(F ∩ D_k) = ϕ(F).

Thus µ = ϕ on F in all cases.

4.2 Section 2

Proof of Theorem 2.7

In all statements, it suffices to consider the convergence of D_n = d(f_n, f) to 0.

Near uniformity implies a.e. convergence: For ϵ_k = 1/k → 0, choose A_k as stated in the premise of near uniformity. Make the sets A_k ascending (A_k ↗). On ∪_k A_k the a.e. convergence holds.

Near uniformity implies convergence in measure. Fix ϵ > 0. For η > 0 we choose A with µAc < η on which

D_k → 0 uniformly. That is, we can choose k_0 = k(η, ϵ) such that for every k ≥ k_0 we have D_k ≤ ϵ uniformly on A. In other words,

A ⊂ { D_k ≤ ϵ },  or  { D_k > ϵ } ⊂ A^c.

Hence, µ{ D_k > ϵ } ≤ µA^c < η.

A.e. convergence implies near uniformity when µ is finite¹⁴. Let µ(X) < ∞ and f_n → f a.e., that is, D_n = d(f_n, f) → 0 a.e. In other words,

µ{ lim sup_n D_n > 0 } = 0, hence ∀ r: µ{ lim sup_n D_n ≥ r } = 0.

But

{ lim sup_n D_n ≥ r } = { inf_n sup_{k≥n} D_k ≥ r } = ∩_n { sup_{k≥n} D_k ≥ r } ⊃ ∩_n ∪_{k≥n} { D_k ≥ r }.

Whence, for each r we have the decreasing (as n ↗ ∞) sets

B_n(r) ≝ ∪_{k≥n} { D_k ≥ r }.

Since D_k → 0 a.e. and µ is finite,

µB_n(r) ↘ 0.

Let ϵ > 0. Let r range over the positive rationals and pick a positive series with ∑_r a(r) = 1. For each fixed r, choose n = n(r) such that

µB_{n(r)}(r) < ϵ a(r),  whence  ∑_r µB_{n(r)}(r) < ϵ.

Thus, for the set defined by

A ≝ ∩_r ( B_{n(r)}(r) )^c,

we have µ(A^c) ≤ ∑_r µB_{n(r)}(r) < ϵ, and

A = { ∀ r ∃ n(r) ∀ k ≥ n(r) Dk < r } .

That is, the uniform convergence takes place on A.

14Egorov Theorem

Convergence in measure implies a.e. convergence along a subsequence. Assume that, for every ϵ > 0,

µ(D_n > ϵ) → 0 as n → ∞. Pick n_1 such that

µ{ D_{n_1} > 1 } < 1/2.

Having picked n_1 < n_2 < ... < n_k, choose n_{k+1} > n_k such that

µ{ D_{n_{k+1}} > 1/(k+1) } < 1/2^{k+1}.

The decreasing sets

A_j = ∪_{k≥j} { D_{n_k} > 1/k }  yield  A = ∩_j A_j.

Also,

µ(A_j) < ∑_{k≥j} 1/2^k → 0.

In particular, µ(A1) < ∞, so

µ(A) = lim_j µ(A_j) = 0.

Now, read A^c:

A^c = ∪_j ∩_{k≥j} { D_{n_k} ≤ 1/k }.

For ϵ > 0, choose j_ϵ such that 1/j_ϵ < ϵ, so 1/k < ϵ for every k ≥ j_ϵ. Hence,

A^c ⊂ ∪_{j≥j_ϵ} ∩_{k≥j} { D_{n_k} ≤ ϵ } = ∪_j { sup_{k≥j} D_{n_k} ≤ ϵ } ⊂ { inf_j sup_{k≥j} D_{n_k} ≤ ϵ }.

That is, for every ϵ,

A^c ⊂ { lim sup_k D_{n_k} ≤ ϵ },

which yields (taking a countable intersection over rational ϵ)

A^c ⊂ { lim sup_k D_{n_k} = 0 },

i.e., D_{n_k} → 0 a.e. on A^c.

Proof of Theorem 2.8. First, (11) can be replaced by a countable subbase (with rational ϵ, η), which implies metrizability. We will prove the completeness. Let (f_n) be Cauchy in measure. Then, we can choose increasing integers n_k such that

µ{ |f_{n_{k+1}} − f_{n_k}| > 1/2^k } < 1/2^k.

That yields the a.e. convergent series

∑_k ( f_{n_{k+1}} − f_{n_k} ),

which in turn implies the existence of the a.e. limit

f = ∑_k ( f_{n_{k+1}} − f_{n_k} ) + f_{n_1},  where  |f − f_{n_k}| ≤ ∑_{j≥k} |f_{n_{j+1}} − f_{n_j}|.

Hence,

µ{ |f − f_{n_k}| > 1/2^{k−1} } ≤ ∑_{j≥k} 1/2^j = 1/2^{k−1}.

Now, for a fixed ϵ > 0,

{ |f_n − f| > ϵ } ⊂ { |f_n − f_{n_k}| > ϵ/2 } ∪ { |f_{n_k} − f| > ϵ/2 }
 ⊂ { |f_n − f_{n_k}| > ϵ/2 } ∪ { |f_{n_k} − f| > 1/2^{k−1} },

when k is large enough. So, for these large k,

lim sup_n µ{ |f_n − f| > ϵ } ≤ lim sup_n µ{ |f_n − f_{n_k}| > ϵ/2 } + 1/2^{k−1}.

Letting k → ∞,

lim sup_n µ{ |f_n − f| > ϵ } = 0.

Proof of Theorem 2.6

MC: By Lemma 2.3, for every n there exists a sequence of simple functions g_{nk} ↗ f_n as k → ∞. Consider the simple functions h_{nk} = g_{1k} ∨ ... ∨ g_{nk}. Observe that

(a) hkk ≤ f1 ∨ ... ∨ fk = fk ≤ f

(b) hnk ≤ hkk for k ≥ n

(c) f_n = lim_k h_{nk} ≤ lim_k h_{kk} ≤ f  [by (a) and (b)]

(d) h_{kk} ↗ f  [by (c)]

By definition of the integral (i.e., by Lemma 2.5) and (a),

µf = lim_k µh_{kk} ≤ lim_k µf_k.

The converse inequality lim_k µf_k ≤ µf follows by monotonicity.

FL: lim inf_n f_n = sup_k ( inf_{n≥k} f_n ) ≝ sup_k h_k, where the h_k increase. By Monotone Convergence and monotonicity,

µ( lim inf_n f_n ) = µ( sup_k h_k ) = sup_k µh_k ≤ sup_k inf_{n≥k} µf_n = lim inf_n µf_n.

DC: Apply Fatou’s Lemma for g − fn and g + fn.

4.3 Section 3

Proof of Theorem 3.4 (1) ⇒ (2): Assume (1). Fix x ∈ C(F ). That is,

∀ ϵ > 0 ∃ δ > 0 ∀y |x − y| ≤ δ ⇒ |F (x) − F (y)| < ϵ.

Denoting I = 1I_{(−∞,x]}, we have F(x) = PI. Let I^δ ≥ I be the function that is 1 over (−∞, x], 0 over [x + δ, ∞), and drops linearly (hence continuously) from 1 to 0 over [x, x + δ]. Similarly, define I_δ ≤ I to be 1 over (−∞, x − δ], 0 over [x, ∞), dropping linearly from 1 to 0 over [x − δ, x]. Let G ∈ { F, F_n }. Then

G(x − δ) ≤ ∫_{−∞}^{∞} I_δ dG ≤ ∫_{−∞}^{∞} I dG = G(x),

G(x) = ∫_{−∞}^{∞} I dG ≤ ∫_{−∞}^{∞} I^δ dG ≤ G(x + δ).

Whence, and since x ∈ C(F),

∫_{−∞}^{∞} I_δ dF − ∫_{−∞}^{∞} I_δ dF_n ≥ F(x − δ) − F_n(x) ≥ F(x) − F_n(x) − ϵ,

∫_{−∞}^{∞} I^δ dF − ∫_{−∞}^{∞} I^δ dF_n ≤ F(x + δ) − F_n(x) ≤ F(x) − F_n(x) + ϵ.

That is,

∫ I^δ dF − ∫ I^δ dF_n − ϵ ≤ F(x) − F_n(x) ≤ ∫ I_δ dF − ∫ I_δ dF_n + ϵ.

Now, letting n → ∞ (the integrals converge by (1), as I_δ and I^δ are bounded and continuous) and then ϵ → 0 completes the proof.

(2) ⇒ (1): We will first prove a weaker statement, that (2) implies

∀ a, b ∈ C(F) ∀ g ∈ C[a, b]: ∫_a^b g dF_n → ∫_a^b g dF.

Assume (2). Let ϵ > 0. Since g is uniformly continuous, there is δ > 0 such that |x − y| ≤ δ ⇒ |g(x) − g(y)| < ϵ. Using the Riemann-Stieltjes definition of the integral, we choose a partition x_0 = a < x_1 < ··· < x_k = b with mesh less than δ, where x_1, ..., x_{k−1} ∈ C(F) (which is possible because C(F) is dense). Then

∫_a^b g dF_n − ∫_a^b g dF = ∑_{i=0}^{k−1} ∫_{x_i}^{x_{i+1}} g d(F_n − F)

 = ∑_i ∫_{x_i}^{x_{i+1}} ( g(x) − g(x_i) ) dF_n + ∑_i ∫_{x_i}^{x_{i+1}} ( g(x_i) − g(x) ) dF
  + ∑_i g(x_i) ( F_n(x_{i+1}) − F(x_{i+1}) + F(x_i) − F_n(x_i) )

 = A_{1n} + A_{2n} + B_n.

We have |A_{1n}| < ϵ and |A_{2n}| < ϵ, and lim_n B_n = 0 by (2). Whence

lim sup_n | ∫_a^b g dF_n − ∫_a^b g dF | ≤ 2ϵ.

Letting ϵ → 0, we finish the proof of the weaker statement.

Note that lim_{c→∞} P(|X| > c) = 0. We claim that the uniform analog of this property,

lim_{c→∞} sup_n P(|X_n| > c) = 0, (31)

is all that is needed to complete the proof. Indeed, given ϵ > 0 and a continuous f bounded by M, we choose c so that

sup_n P(|X_n| > c) < ϵ/(2M),  P(|X| > c) < ϵ/(2M).

Then choose continuity points b > c and a < −c. Then

∫_{−∞}^{∞} f dF_n − ∫_{−∞}^{∞} f dF = ∫_a^b f dF_n − ∫_a^b f dF + r_n,

where |r_n| < ϵ. The rest follows by routine. The proof is complete, subject to establishing (31). To this end, choose c_0 such that c_0, −c_0 ∈ C(F) and P(|X| > c_0) < ϵ/2. Then there is n_0 so that for all n > n_0,

| P(|X_n| > c_0) − P(|X| > c_0) | = | F_n(c_0) − F_n(−c_0) − F(c_0) + F(−c_0) | < ϵ/2.

Hence, for n > n0,

P(|Xn| > c0) < ϵ.

Finally, for j = 1, ..., n_0, let c_j be such that P(|X_j| > c_j) < ϵ. Now, let C = max_{0≤j≤n_0} c_j. So, for c > C,

sup_n P(|X_n| > c) ≤ sup_n P(|X_n| > C) < ϵ,

which completes the proof.

Proof of Theorem 3.7. (1) ⇒ (2): Let F be a closed set and ϵ > 0. Define the function

f_ϵ(x) = d(x, (F^ϵ)^c) / ( d(x, F) + d(x, (F^ϵ)^c) ).

It follows that f_ϵ is continuous, 0 ≤ f_ϵ ≤ 1, f_ϵ = 1 on F, and f_ϵ = 0 on (F^ϵ)^c. Hence, for any measure ν,

ν(F) ≤ ∫_S f_ϵ dν ≤ ν(F^ϵ).

Therefore, (1) yields

lim sup_n µ_n(F) ≤ lim_n ∫_S f_ϵ dµ_n = ∫_S f_ϵ dµ ≤ µ(F^ϵ).

Now let ϵ → 0: since F is closed, µ(F^ϵ) ↓ µ(F).

(2) ⇐⇒ (3) is obvious.

(2)∧(3) ⇒ (4). Let C be a continuity set of µ. Then

lim sup_n µ_n(C) ≤ lim sup_n µ_n(C̄) ≤ µ(C̄) = µ(C),
lim inf_n µ_n(C) ≥ lim inf_n µ_n(C°) ≥ µ(C°) = µ(C).

(4) ⇒ (1): Let f ∈ C_b(S). W.l.o.g., assume 0 < f < 1. Let ϵ > 0. Consider a partition of [0, 1], t_0 = 0 < t_1 < ··· < t_k = 1, with mesh < ϵ and such that µ{ x : f(x) = t_i } = 0. Then the C_i = f^{−1}(t_i, t_{i+1}] are continuity sets of µ. Put

f_1 ≝ ∑_i t_i 1I_{C_i} ≤ f ≤ ∑_i t_{i+1} 1I_{C_i} ≝ f_2.

Thus,

∫ f_1 dµ_n − ∫ f dµ ≤ ∫ f dµ_n − ∫ f dµ ≤ ∫ f_2 dµ_n − ∫ f dµ,

and therefore

−ϵ ≤ lim inf_n ( ∫ f dµ_n − ∫ f dµ ) ≤ lim sup_n ( ∫ f dµ_n − ∫ f dµ ) ≤ ϵ.

This proves (1).

Proof of Theorem 3.8 It suffices to show that, for every µ, every element of either of the four subbases (1)-(4) contains some metric ball { ν : π(µ, ν) < δ }, and conversely, that any metric ball contains some member of one of the bases15.

For the first part, pick U2(µ; F, ϵ). Since µ is continuous from above, there is a positive δ < ϵ/2 such that

µ(F^δ) < µ(F) + ϵ/2.

Then

π(µ, ν) < δ ⇒ ν(F) ≤ µ(F^δ) + δ < µ(F) + ϵ/2 + δ < µ(F) + ϵ ⇒ ν ∈ U_2(µ; F, ϵ).

For the second part, consider a metric ball V = { ν : π(µ, ν) < ϵ }. We will inscribe into it a neighborhood of type (4). It is exactly in this part that we need the separability assumption. We cover S with countably many open balls of diameter < δ, with δ to be specified later, that are also continuity sets of µ. By the usual disjointification procedure, we obtain a countable cover by disjoint µ-continuity sets, each of diameter < δ. Call them D_k. In particular, we find a finite k such that

µ( ∪_{i=1}^k D_i ) > 1 − δ,  i.e., the remainder R = ( ∪_{i=1}^k D_i )^c has measure µR < δ.

Now, let D consist of the 2^k unions of selections from { D_1, ..., D_k }, and put

U ≝ ∩_{D∈D} U_4(µ; D, δ).

In particular, ν(R) < 2δ. Now, we will select δ small enough so that U ⊂ V . For that, we must show

ν ∈ U ⇒ ∀ Borel B: ν(B) ≤ µ(B^ϵ) + ϵ.

So, fix a Borel set B and let ν ∈ U. Let’s establish δ. Select from D those sets Di that have non-empty intersection with B. Let D be their union. Observe that

B ⊂ D ∪ R,  D ⊂ B^δ.

15finite intersections of members of a subbase make a base

Since D ∈ D, ν(D) < µ(D) + δ. So,

ν(B) ≤ ν(D) + ν(R) < ( µ(D) + δ ) + 2δ ≤ µ(B^δ) + 3δ.

So, having chosen δ < ϵ/3, ν(B) ≤ µ(B^ϵ) + ϵ. That is, U ⊂ V.

Proof of Theorem 3.9. The set Q may be identified with a subset of the unit ball in the dual space C_b^*(S). Alaoglu's Theorem says that the unit ball in the dual of a normed (even topological vector) space is weak*-compact. Thus, Q is weak* relatively compact. However, weak*-cluster points of Q might not be probability measures: possibly µ(S) < 1. So, the tightness will ensure that any cluster point is a probability measure. W.l.o.g., we may assume that Q is closed. A cluster point µ is a limit of some generalized sequence: there is a directed set N and µ_n ∈ Q, n ∈ N, such that µ_n →^{w*} µ. By tightness, for every k there is a compact K_k such that

µK_k^c ≤ lim inf_n µ_n K_k^c ≤ sup_n µ_n K_k^c < 1/k.

Thus, µ is a probability, for

µ(S) ≥ µ( ∪_k K_k ) ≥ lim_k ( 1 − 1/k ) = 1.

Conversely, suppose that Q is compact and ϵ > 0. Each member µ of Q is tight, so there is a compact K_µ such that µK_µ > 1 − ϵ. Each K_µ, being totally bounded, is contained in a finite union G_µ of open balls of radius < δ. Clearly, µG_µ > 1 − ϵ. Consider the neighborhood of µ of type (3)

U_3(µ; G_µ, ϵ).

Since Q is compact and the above sets form an open cover of Q, choose a finite subcover, marked by µ_1, ..., µ_n. So, we have a finite number of open balls B_1, ..., B_r (making up the unions G_{µ_k}) of radius < δ such that

µ( ∪_i B_i ) > 1 − 2ϵ, for every µ ∈ Q

(indeed, each µ ∈ Q belongs to some U_3(µ_j; G_{µ_j}, ϵ), so µ(∪_i B_i) ≥ µ(G_{µ_j}) > µ_j(G_{µ_j}) − ϵ > 1 − 2ϵ). The numbers ϵ and δ were arbitrary. Rephrasing: for every k and ϵ > 0 there are balls B_1, ···, B_{r_k} with

diam(B_i) < 1/k and

µ( ∪_{i=1}^{r_k} B_i ) > 1 − ϵ/2^k,  µ ∈ Q.

The set

C = ∩_k ∪_{i=1}^{r_k} B_i

is totally bounded, and µC > 1 − ϵ. Its closure is compact, with measure ≥ 1 − ϵ, uniformly over Q. That makes Q tight.
