Quick viewing(Text Mode)

Probability and Measure

Probability and Measure

Probability and

Stefan Grosskinsky

Cambridge, Michaelmas 2005

These notes and other information about the course are available on www.statslab.cam.ac.uk/∼stefan/teaching/probmeas.html

The text is based on – and partially copied from – notes on the same course by James Nor- ris, Alan Stacey and Geoffrey Grimmett.

1 Introduction

Motivation from two perspectives:

1. Probability  Probability Ω, P(Ω), P , where Ω is a , P(Ω) the set of events ( in this case) and P : P(Ω) → [0, 1] is the probability measure. If Ω is countable, for every A ∈ P(Ω) we have X  P(A) = P {ω} . ω∈A So calculating probabilities just involves (possibly infinite) sums. If Ω = (0, 1] and P is the uniform probability measure on (0, 1] then for every ω ∈ Ω it is P(ω) = 0. So  X  1 = P (0, 1] 6= “ P {ω} “. w∈(0,1] 2. Integration (Analysis) In general, for continuous Ω, the probability of a set A ⊆ Ω can be computed as Z P(A) = 1A(x) ρ(x) dx , Ω provided this (Riemann-)integral exists. Here 1A is the indicator function of the set A and  ρ is the probability density. In the example above ρ ≡ 1 and this leads to P (a, b] = R 1 1 R b 0 (a,b](x) dx = a dx = b − a. Using standard properties of Riemann-integrals this approach works fine, as long as A is a finite union or intersection of intervals. But e.g. for A = (0, 1] ∩ Q, the Riemann-integral R 1 1 0 A(x) dx is not defined, although the probability for this event is intuitively 0.

1  Moreover, since A is countable, it can be written as A = an : n ∈ N . Define 1 1 fn := {a1,...,an} with fn → A for n → ∞. For every n, fn is Riemann-integrable and R 1 0 fn(x) dx = 0. So it should be the case that 1 1 Z ? Z lim fn(x) dx = 0 = 1A(x) dx , n→∞ 0 0 but the latter integral is not defined. Thus the concept of Riemann-integrals is not satisfac- tory for two reasons: The set of Riemann-integrable functions is not closed, and there are “too many” functions which are not Riemann-integrable.

Goals of this course • Generalisation of Riemann-integration to Lebesgue using measure theory

• Using measure theory as the basis of advanced probability and discussing applications in that area

Official schedule Measure spaces, σ-algebras, π-systems and uniqueness of extension, statement * and proof * of Caratheodory’s´ extension theorem. Construction of on R. The Borel σ-algebra of R. Existence of non-measurable of R. Lebesgue-Stieltjes measures and probability distribution functions. Independence of events, independence of σ-algebras. The Borel-Cantelli lemmas. Kolmogorov’s zero-one law. [6] Measurable functions, random variables, independence of random variables. Construction of the integral, expectation. Convergence in measure and convergence almost everywhere. Fa- tou’s lemma, monotone and dominated convergence, differentiation under the integral sign. Discussion of product measure and statement of Fubini s theorem. [6] Chebyshev s inequality, tail estimates. Jensen’s inequality. Completeness of Lp for 1 ≤ p ≤ ∞ . The Holder¨ and Minkowski inequalities, uniform integrability. [4] L2 as a Hilbert space. Orthogonal projection, relation with elementary conditional probability. Variance and covariance. Gaussian random variables, the multivariate normal distribution. [2] The strong law of large numbers, proof for independent random variables with bounded fourth moments. Measure preserving transformations, Bernoulli shifts. Statements * and proofs * of maximal ergodic theorem and Birkhoff s almost everywhere ergodic theorem, proof of the strong law. [4] The Fourier transform of a finite measure, characteristic functions, uniqueness and inversion. Weak convergence, statement of Levy’s´ convergence theorem for characteristic functions. The central limit theorem. [2]

Appropriate books P. Billingsley Probability and Measure. Wiley 1995 (hardback). R.M. Dudley Real Analysis and Probability. Cambridge University Press 2002 (paperback). R.T. Durrett Probability: Theory and Examples. Wadsworth and Brooks/Cole 1991 (hardback). D. Williams Probability with Martingales. Cambridge University Press (paperback).

From the point of view of analysis, the first chapters of this book might be interesting: W. Rudin Real and Complex Analysis. McGraw-Hill (paperback)

2 2 Set systems and measures

2.1 Set systems Let Ω be an arbitrary set and A ⊆ P(Ω) a set of subsets.

Definition 2.1. Say that A is a ring, if for all A, B ∈ A

(i) ∅ ∈ A (ii) B \ A ∈ A (iii) A ∪ B ∈ A .

Say that A is an algebra (or field), if for all A, B ∈ A

(i) ∅ ∈ A (ii) Ac = Ω \ A ∈ A (iii) A ∪ B ∈ A .

Say that A is a σ-algebra (or σ-field), if for all A and A1,A2,... ∈ A

c [ (i) ∅ ∈ A (ii) A ∈ A (iii) An ∈ A . n∈N Remarks. (i) A (σ-)algebra is closed under (countably) finitely many set operations, since c c cc \  [ c  1 A ∩ B = A ∪ B ∈ A , An = An , n∈N n∈N A \ B = A ∩ Bc ∈ A ,A 4 B = (A \ B) ∪ (B \ A) ∈ A .

(ii) Thus: A is a σ-algebra ⇒ A is an algebra ⇒ A is a ring In general the inverse statementss are false, but in special cases they hold: ⇐ (if Ω is finite) ⇐ (if Ω ∈ A)

Examples. (i) P(Ω) and {∅, Ω} are obviously σ-algebras

n Sn o (ii) Ω = R, R = i=1(ai, bi]: ai < bi, i = 1, . . . , n, n ∈ N is the ring of half-open intervals. R is an algebra if we allow for infinite intervals. R is a σ-algebra if in addition we allow for countably many unions.

 Lemma 2.1. Let Ai : i ∈ I be a (possibly uncountable) collection of σ-algebras. Then T S i∈I Ai is a σ-algebra, whereas i∈I Ai in general is not.

Proof. see example sheet 1

Definition 2.2. Let E ⊆ P(Ω). Then the σ-algebra generated by E is \ σ(E) := A , the smallest σ-algebra containing E. E⊆A A σ−alg.

1as long as (ii) is fulfilled “∪” and “∩” are equivalent

3 Remarks. (i) If A is a σ-algebra, then σ(A) = A.

(ii) Let E1, E2 ⊆ P(Ω) with E1 ⊆ E2. Then σ(E1) ⊆ σ(E2).

 c Example. Let ∅= 6 A ( Ω. Then σ {A} = {∅, Ω, A, A }.

The next example is so important, that we spend an extra definition.

Definition 2.3. Let (Ω, τ) be a topological space with topology τ ⊆ P(Ω) (set of open sets)2. Then σ(τ) is called the Borel σ-algebra of Ω, denoted by B(Ω). A ∈ B(Ω) is called a Borel set. One usually denotes B(R) = B.  Lemma 2.2. Let Ω = R, R the ring of half-open intervals and I = (a, b]: a < b the set of all half-open intervals. Then σ(R) = σ(I) = B.

Proof. (i) I ⊆ R ⇒ σ(I) ⊆ σ(R). On the other hand, each A ∈ R can be written as Sn A = i=1(ai, bi] ∈ σ(I). Thus R ⊆ σ(I) ⇒ σ(R) ⊆ σ(I). T∞ 1 (ii) Each A ∈ I can be written as A = (a, b] = n=1(a, b + n ) ∈ B ⇒ σ(I) ⊆ B. Let A ⊆ R be open, i.e. ∀ x∈A ∃ x>0 : (x − x, x + x) ⊆ A. Thus

∀ x ∈ A ∃ ax, bx ∈ Q : {x} ⊆ (ax, bx] ⊆ A. S Then A = x∈A(ax, bx] which is a countable union, since ax, bx ∈ Q. Thus A ∈ σ(I) ⇒ B ⊆ (I). 2

d d d n Y o Analogously, σ(R ) is generated by I = (ai, bi]: ai < bi, 1 = 1, . . . , d . i=1 Definition 2.4. Let A ⊆ P(Ω) be a σ-algebra. The pair (Ω, A) is a and elements of A are measurable sets.

If Ω is finite or countably infinite, one usually takes A = P(Ω) as relevant σ-algebra.

2.2 Measures Definition 2.5. Let A be a ring on Ω.A set function is any function µ : A → [0, ∞] with µ(∅) = 0.

• µ is called additive if for all A, B ∈ A with A ∩ B = ∅: µ(A ∪ B) = µ(A) + µ(B).

• µ is called countably additive (or σ-additive) if for all sequences (An)n∈ with [  [  X N Ai ∩ Aj = ∅ for i 6= j and An ∈ A: µ An = µ(An). n∈N n∈N n∈N

2 d Having a direct definition of open sets for Ω = R , there is also an axiomatic definition of a topology, namely S (i) ∅ ∈ τ and Ω ∈ τ (ii) ∀ A, B ∈ τ : A ∩ B ∈ τ (iii) i∈I Ai ∈ τ, given Ai ∈ τ for all i ∈ I

4   Note. µ countably additive ⇒ µ additive A1 = A, A2 = B,A3 = A4 = ... = ∅

Definition 2.6. Let (Ω, A) be a measurable space. A countably additive set function µ : A → [0, ∞] is called a measure, the triple (Ω, A, µ) is called measure space. If µ(Ω) < ∞, µ is called finite. If µ(Ω) = 1, µ is a probability measure and (Ω, A, µ) is a probability space. If Ω is a topological space and A = B(Ω), then µ is called .

Basic properties. Let (Ω, A, µ) be a measure space.

(i) µ is non-decreasing: For all A, B ∈ A, A ⊆ B it is µ(B) = µ(B \ A) + µ(A) ≥ µ(A). (Note: The version µ(B \ A) = µ(B) − µ(A) only makes sense if µ(A) < ∞).

(ii) µ is subadditive: For all A, B ∈ A

µ(A) + µ(B) = µ(A \ B) + µ(A ∩ B) + µ(B \ A) +µ(A ∩ B) = | {z } =µ(A∪B) = µ(A ∪ B) + µ(A ∩ B) ≥ µ(A ∪ B) .

(Again: µ(A ∪ B) = µ(A) + µ(B) − µ(A ∩ B) if µ(A ∩ B) < ∞.)

(iii) µ is also countably subadditive (see example sheet 1).

Remark. These properties also hold on a ring, (i) and (ii) also for additive set functions.

Examples.

(i) Discrete measure theory: Let Ω be countable. Any m :Ω → [0, ∞] is called mass function. There is a one-to-one correspondance between mass functions and measures µ on Ω, P(Ω), given by

X X ∀ x ∈ Ω: µ{x} = m(x) and ∀ A ⊆ Ω: µ(A) = µ{x} = m(x) . x∈A x∈A

If µ{x} = 1 for all x ∈ Ω, µ is called counting measure. Sn (ii) Let Ω = R and R be the ring of half-open intervals. For A ∈ R write A = i=1(ai, bi] Pn with disjoint intervals and set µ(A) := i=1(bi − ai). The representation of A is not unique, but the definition of µ is independent of the particular choice. Further, µ is additive and translation invariant, i.e. ∀ x∈R : µ(A + x) = µ(A), where Sn A + x := i=1(ai + x, bi + x]. The key question is:

Can µ be extended to a measure on B?

In order to attack this question in the next subsection, it is useful to introduce the following property of set functions.

5 Definition 2.7. Let A be a ring on Ω and µ : A → [0, ∞] an additive set function. µ is continuous at A ∈ A, if

(i) µ is continuous from below: [ given any A1 ⊆ A2 ⊆ A3 ⊆ ... in A with An = A ∈ A (An % A), n∈N then lim µ(An) = µ(A) n→∞ (ii) µ is continuous from above: \ given any A1 ⊇ A2 ⊇ A3 ⊇ ... in A with An = A ∈ A (An & A) and n∈N µ(An) < ∞ for some n ∈ , then lim µ(An) = µ(A) N n→∞

Lemma 2.3. Let A be a ring on Ω and µ : A → [0, ∞] an additive set function. Then:

(i) µ is countably additive ⇒ µ is continuous at all A ∈ A

(ii) µ is continuous from below at all A ∈ A ⇒ µ is countably additive

(iii) µ is cont. from above at ∅ and µ(A) < ∞ for all A ∈ A ⇒ µ is countably additive

Proof. (i) Given An % A in A, then A = (A1 \ A0) ∪ (A2 \ A1) ∪ (A3 \ A2) ∪ ... (A0 = ∅)

∞ m−1 X  [  ⇒ µ(A) = µ(An+1 \ An) = lim µ (An+1 \ An) = lim µ(Am) . m→∞ m→∞ n=0 n=0

Given An & A in A and µ(Am) < ∞ for some m ∈ N. Let Bn := Am \ An for n ≥ m. Then Bn % (Am \ A) for n → ∞ and thus, following the above, n→∞ µ(Am) − µ(An) = µ(Bn) −→ µ(Am \ A) = µ(Am) − µ(A) .

Since µ(Am) < ∞ this implies lim µ(An) = µ(A). n→∞ (ii) see example sheet 1 (iii) analogous to (i) and (ii) 2

2.3 Extension and uniqueness Theorem 2.4. Caratheodory’s´ extension theorem Let A be a ring on Ω and µ : A → [0, ∞] be a countably additive set function. Then there exists a measure µ0 on Ω, σ(A) such that µ0(A) = µ(A) for all A ∈ A.

The proof is given at the end of this section. To formulate a result on uniqueness two fur- ther notions are useful.

Definition 2.8. Let Ω be a set. A ⊆ P(Ω) os a π-system if

(i) ∅ ∈ A , (ii) ∀ A, B ∈ A : A ∩ B ∈ A .

6 A is a d-system if (i) Ω ∈ A , (ii) ∀ A, B ∈ A,A ⊆ B : B \ A ∈ A , [ (iii) ∀ A1,A2,... ∈ A : A1 ⊆ A2 ⊆ ... ⇒ An ∈ A . n∈N Remarks.  (i) The set I = (a, b] | a ≤ b of half-open intervals is a π-system on R. (ii) A is a σ-algebra ⇔ A is a π- and a d-system (see example sheet).

Lemma 2.5. Dynkin’s π-system lemma Let A be a π-system. Then for any d-system D ⊇ A it is D ⊇ σ(A).

Theorem 2.6. Uniqueness of extension Let A ⊆ P(Ω) be a π-system. Suppose that µ1, µ2 are measures on σ(A) with µ1(Ω) = µ2(Ω) < ∞. If µ1 = µ2 on A then µ1 = µ2 on σ(A).  Proof. Consider D = A ∈ σ(A): µ1(A) = µ2(A) . By hypothesis, Ω ∈ D. For A, B ∈ D with A ⊆ B we have

µi(A) + µi(B \ A) = µi(B) < ∞ , i = 1, 2 , thus also B \ A ∈ D. If An ∈ D, n ∈ N, with An % A, then

µ1(A) = lim µ1(An) = lim µ2(An) = µ2(A) , n→∞ n→∞ so A ∈ D. Thus D ⊆ σ(A) is a d-system containing the π-system A, so D = σ(A) by Dynkin’s lemma. 2

These theorems provide general tools for the construction and characterisation of measures. Before we apply them in a specific context in the next subsection we give the missing proofs.

Proof of Caratheodory’s´ extension theorem. ∗ X For any B ⊆ Ω, define the outer measure µ (B) = inf µ(An) , n S where the infimum is taken over all sequences (An)n∈N in A such that B ⊆ n An and is taken to be ∞ if there is no such sequence. Note that µ∗ is increasing and µ∗(∅) = 0. Let us say that A ⊆ Ω is µ∗-measurable if, for all B ⊆ Ω, µ∗(B) = µ∗(B ∩ A) + µ∗(B ∩ Ac) . Write M for the set of all µ∗-measurable sets. We shall show that M is a σ-algebra containing A and that µ∗ is a measure on M, extending µ. This will prove the theorem. Step I. We show that µ∗ is countably subadditive. S ∗ Suppose that B ⊆ n Bn. If µ (Bn) < ∞ for all n, then, given  > 0, there exist sequences (Anm)m∈N in A, with [ ∗ n X Bn ⊆ Anm, µ (Bn) + /2 ≥ µ(Anm) . m m

7 [ [ ∗ X X X ∗ Then B ⊆ Anm and thus µ (B) ≤ µ(Anm) ≤ µ (Bn) +  . n m n m n ∗ X ∗ Hence, in any case µ (B) ≤ µ (Bn) . n Step II. We show that µ∗ extends µ. Since A is a ring and µ is countably additive, µ is countably subadditive. Hence, for A ∈ A S X and any sequence (An)n∈N in A with A ⊆ n An, we have µ(A) ≤ µ(An). n On taking the infimum over all such sequences, we see that µ(A) ≤ µ∗(A). On the other hand, it is obvious that µ∗(A) ≤ µ(A) for A ∈ A. Step III. We show that M contains A. Let A ∈ A and B ⊆ Ω. We have to show that µ∗(B) = µ∗(B ∩ A) + µ∗(B ∩ Ac) . By subadditivity of µ∗, it is enough to show that µ∗(B) ≥ µ∗(B ∩ A) + µ∗(B ∩ Ac) . If µ∗(B) = ∞, this is trivial, so let us assume that µ∗(B) < ∞. Then, given  > 0, we can

find a sequence (An)n∈N in A such that [ ∗ X B ⊆ An , µ (B) +  ≥ µ(An) . n n [ c [ c Then B ∩ A ⊆ (An ∩ A) ,B ∩ A ⊆ (An ∩ A ) , so that n n ∗ ∗ c X X c X ∗ µ (B ∩ A) + µ (B ∩ A ) ≤ µ(An ∩ A) + µ(An ∩ A ) = µ(An) ≤ µ (B) +  . n n n Since  > 0 was arbitrary, we are done. Step IV. We show that M is an algebra. c Clearly Ω ∈ M and A ∈ A whenever A ∈ A. Suppose that A1,A2 ∈ M and B ⊆ Ω. Then ∗ ∗ ∗ c µ (B) = µ (B ∩ A1) + µ (B ∩ A1) ∗ ∗ c ∗ c = µ (B ∩ A1 ∩ A2) + µ (B ∩ A1 ∩ A2) + µ (B ∩ A1) ∗ ∗ c ∗ c c = µ (B ∩ A1 ∩ A2) + µ (B ∩ (A1 ∩ A2) ∩ A1) + µ (B ∩ (A1 ∩ A2) ∩ A1) ∗ ∗ c = µ (B ∩ (A1 ∩ A2)) + µ (B ∩ (A1 ∩ A2) ) .

Hence A1 ∩ A2 ∈ M. Step V. We show that M is a σ-algebra and that µ∗ is a measure on M. We already know that M is an algebra, so it suffices to show that, for any sequence of disjoint S sets (An)n∈N in M, for A = n An we have ∗ X ∗ A ∈ M , µ (A) = µ (An). n So, take any B ⊆ Ω, then ∗ ∗ ∗ c ∗ ∗ ∗ c c µ (B) = µ (B ∩ A1) + µ (B ∩ A1) = µ (B ∩ A1) + µ (B ∩ A2) + µ (B ∩ A1 ∩ A2) n X ∗ ∗ c c = ... = µ (B ∩ Ai) + µ (B ∩ A1 ∩ ... ∩ An) . i=1

8 ∗ c c ∗ c Note that µ (B ∩ A1 ∩ ... ∩ An) ≥ µ (B ∩ A ) for all n. Hence, on letting n → ∞ and using countable subadditivity, we get ∞ ∗ X ∗ ∗ c ∗ ∗ c µ (B) ≥ µ (B ∩ An) + µ (B ∩ A ) ≥ µ (B ∩ A) + µ (B ∩ A ). n=1 The reverse inequality holds by subadditivity, so we have equality. Hence A ∈ M and, setting ∞ ∗ X ∗ B = A, we get µ (A) = µ (An) . 2 n=1

Proof of Dynkin’s π-system lemma. Denote by D the intersection of all d-systems containing A. Then D is itself a d-system. We shall show that D is also a π-system and hence a σ-algebra, thus proving the lemma. Consider D0 = B ∈ D : B ∩ A ∈ D for all A ∈ A . Then A ⊆ D0 because A is a π-system. Let us check that D0 is a d-system: clearly Ω ∈ D0; 0 next, suppose B1,B2 ∈ D with B1 ⊆ B2, then for A ∈ A we have

(B2 \ B1) ∩ A = (B2 ∩ A) \ (B1 ∩ A) ∈ D , 0 0 because D is a d-system, so B2 \ B1 ∈ D ; finally, if Bn ∈ D , n ∈ N, and Bn % B, then for A ∈ A we have

Bn ∩ A % B ∩ A, so B ∩ A ∈ D and B ∈ D0. Hence D = D0. Now consider D00 = B ∈ D : B ∩ A ∈ D for all A ∈ D . Then A ⊆ D00 because D = D0. We can check that D00 is a d-system, just as we did for D0. Hence D00 = D which shows that D is a π-system. 2

2.4 Lebesgue(-Stieltjes) measure

Theorem 2.7. There exists a unique Borel measure µ on (R, B) such that  µ (a, b] = b − a , for all a, b ∈ R with a < b.

Proof. (Existence) Let R be the ring of finite unions of disjoint intervals of the form Pn A = (a1, b1] ∪ ... ∪ (an, bn] and consider the set function µ(A) = i=1(bi − ai) given above. We aim to show that µ is countably additive on A, which then proves existence by Caratheodory’s´ extension theorem. Since µ(A) < ∞ for all A ∈ R it suffices to show that µ is continuous from above by Lemma 2.3 (iii). Suppose not. Then there exists  > 0 and An & ∅ with µ(An) ≥ 2 for all n. −n For each n we can find Cn ∈ A with C¯n ⊆ Bn and µ(An \ Cn) ≤ 2 . Then

µ(C1 ∩ ... ∩ Cn) = µ(An) − µ(An \ C1) − ... − µ(An \ Cn) ≥ X −n ≥ µ(An) − µ(A1 \ C1) − ... − µ(An \ Cn) ≥ 2 − 2 =  , n∈N

9 ¯ ¯ and in particular C1 ∩ ... ∩ Cn 6= ∅. Thus Kn = C1 ∩ ... ∩ Cn, n ∈ N is a monotone sequence T T of compact non-empty sets, so ∅= 6 Kn ⊆ An which is a contradiction to An & ∅. n∈N n∈N (Uniqueness) For each n consider  µn(A) := µ (n, n + 1] ∩ A .

Then µn is a probability measure on (R, B), so, by Theorem 2.6, µn is uniquely determinded P by its values on the π-system I generating B. Since µ(A) = µn(A) it follows that µ is n∈N also uniquely determined. 2

Definition 2.9. A ⊆ R is called null if A ⊆ B ∈ B with µ(B) = 0. Denote by N the set of all null sets. Then L = B ∪ N : b ∈ B,N ∈ N is the Lebesgue σ-algebra. The sets in L are called Lebesgue-measurable sets or Lebesgue sets. The Borel measure µ can be extended to L via

λ(B ∪ N) := µ(B) for all B ∈ B,N ∈ N .

λ is called Lebesgue measure. (see example sheet 1.7)

Theorem 2.8. B ( L ( P(R).

Proof. (i) We argue that B and L have different : According to example sheet 1.4, B is separable and thus card(B) = c .3 With example sheet 1.10 we have for the Cantor set C: card(C) = c and µ(C) = 0. Since with Definition 2.9 P(C) ⊆ L we have cardL = 2c > c. (ii) Using the axiom of choice we construct a of U = [0, 1] which is not L.-measurable: Define the equivalence relation ∼ on U by x ∼ y if x − y ∈ Q. Write E = {Ei : i ∈ I} for the equivalence classes of ∼ and let R be a collection of representatives of the Ei, chosen by the axiom of choice. Then U can be partitioned [ [ [ U = Ei = (R + q) = {r + q : r ∈ R} , i∈I q∈Q∩[0,1) q∈Q∩[0,1) where + is to be understood modulo 1. Suppose R ∈ L, then R + q ∈ L and λ(R) = λ(R + q) for all q ∈ Q by translation invariance of λ. But by countable additivity of λ this means X λ(R) = λ(U) = 1 , q∈Q∩[0,1) which leads to a contradiction for either λ(R) = 0 or λ(R) > 0. Thus R 6∈ L. 2

Remarks. (i) Every set of positive measure has non-measurable subsets and, moreover:

every subset of A is Lebesgue-measurable ⇔ λ(A) = 0 .

(ii) There exists no translation invariant, countably additive set function on P(R) with µ(A) ∈ (0, ∞) for at least one A ⊆ R.

3 Notation: The cardinality of a is card(N) = ℵ0 , of a continuous set card(R) = c  c and of the power set card P(R) = 2 .

10 Definition 2.10. F : R → R is called distribution function if

(i) F is non-decreasing (ii) F is right-continuous, i.e. lim F (x) = F (x0) . x&x0 F is a probability distribution function if in addition

(iii) lim F (x) = 1 , lim F (x) = 0 . x→∞ x→−∞

Proposition 2.9. Let µ be a Radon measure4 on (R, B). Then  µ(0, x] , x > 0 F (x) := −µ(x, 0] , x ≤ 0 is a distr. fct. with µ(a, b] = F (b) − F (a), a < b. A distribution function with this property is uniquely determined up to an additive constant.

 Proof. F % and µ (a, b] = F (b) − F (a) for b > a by definition. For xn & x > 0   F (xn) = µ (0, xn] & µ (0, x] = F (x) by continuity of measures. Let F and G be two such distribution functions. Then for all a < b

µ(∅) = µ(a, b] − µ(a, b] = F (b) − F (a) − G(b) + G(a) = 0 , and F = G + c for some constant c ∈ R. 2

Remarks. (i) The distribution functions for the Lebesgue measure are F (x) = x + c, c ∈ R. (ii) Adding a constant is equivalent to shifting the reference point r in the definition of F , which is taken to r = 0 in Proposition 2.9.

(iii) If µ is a finite measure, e.g. in probability, one usually uses the cumulative distribution function with r = −∞, F (x) = µ(−∞, x] .

On the other hand, Radon measures are uniquely determined by their distribution function.

Theorem 2.10. Let F : R → R be a distribution function. Then there exists a unique µF on (R, B), such that  µF (a, b] = F (b) − F (a) for all a < b .

The completion λF of this measure, defined analogously to λ on an extended σ-algebra LF , is called the Lebesgue-Stieltjes measure of F .

Proof. Same as for Lebesgue measure. But note that the null sets for µF might be differ- ent, so that LF is in general a different σ-algebra than L.

4i.e. µ(K) < ∞ for K ∈ B compact

11 Examples of distribution functions.

0 (i) If a (probability) distribution function F is differentiable on R, then f = F is called the (probability) density function, and for A ∈ LF it is Z Z λF (A) = dλF (x) = f(x) dx . A A

2 √ For instance, f(x) = 1 for the Lebesgue measure, and f(x) = ex /2/ 2π for the Gaussian (normal) distribution.

(ii) Let E = {e1, e2,...} ⊆ R be countable and p : E → [0, ∞] a mass function with X corresponding measure µ(B) = p(ei) defined for all B ∈ P(R). ei∈B X Then F (x) = p(ei) is a distribution function for the measure µ, and the contin-

ei≤x p(x) , x ∈ E uation p¯(x) = is a discrete analogon of the density function. 0 , x 6∈ E (iii) Let F be the uniform probability distribution function on the Cantor set C (see example 0 sheet 1.10). Then F is continuous on R, but differentiable only on R\C with F (x) = 0, so there exists no probability density.

2.5 Independence and Borel-Cantelli lemmas

Let (Ω, A, P) be a probability space. It provides a model for an experiment whose outcome is random. Ω describes the set of possible outcomes, A the set of events (observable sets of outcomes) and P(A) is the probability of an event A ∈ A.

Definition 2.11. The events (Ai)i∈I , Ai ∈ A, are said to be independent if  \  Y P Ai = P(Ai) for all finite J ⊆ I. i∈J i∈J

The σ-algebras (Ai)i∈I , Ai ⊆ A are said to be independent if the events (Ai)i∈I are indepen- dent for any choice Ai ∈ Ai.

A useful way to establish independence of two σ-algebra is given below.

Theorem 2.11. Let A1, A2 ⊆ A be π-systems and suppose that

P(A1 ∩ A2) = P(A1) P(A2) whenever A1 ∈ A1,A2 ∈ A2 .

Then σ(A1) and σ(A2) are independent.

Proof. Fix A1 ∈ A1 and define for A ∈ A

µ(A) := P(A1 ∩ A) , ν(A) := P(A1) P(A) .

12 µ and ν agree on the π-system A2 with µ(Ω) = ν(Ω) = P(A1) < ∞. So, by uniqueness of extension (Theorem 2.6), for all A1 ∈ A1 and A2 ∈ σ(A2)

P(A1 ∩ A2) = µ(A2) = ν(A2) = P(A1) P(A2) .

Now fix A2 ∈ σ(A2) and repeat the same argument with

0 0 µ (A) := P(A ∩ A2) , ν (A) := P(A) P(A2) to show that for all A1 ∈ σ(A1) we have P(A1 ∩ A2) = P(A1) P(A2). 2   Remark. In particular, the σ-algebras σ {A1} and σ {A2} generated by single events are independent if A1 and A2 are independent.

Definition 2.12. Let (An)n∈N be a sequence of sets in Ω. Then  [ \ 00 00 lim inf An := ω ∈ Ω: ω ∈ Anfor all but finitely many n = Ak = An ev. , n∈N k≥n  \ [ 00 00 lim sup An := ω ∈ Ω: ω ∈ Anfor infinitely many n = Ak = An i.o. . n∈N k≥n

Remarks. (i) If An ∈ A for all n ∈ N then lim inf An, lim sup An ∈ A. T S (ii) lim inf An ⊆ lim sup An , since for all m, n: k≥m Ak ⊆ Am∨n ⊆ k≥n Ak . c c (iii) (lim sup An) = lim inf An

Lemma 2.12. First Borel-Cantelli lemma X If P(An) < ∞ , then P(An i.o.) = 0 . n∈N  \ [   [  X Proof. P(An i.o.) = P Ak ≤ P Ak ≤ P(Ak) → 0 for n → ∞. 2 n∈N k≥n k≥n k≥n

This argument is also valid if P is not a probability measure.

Lemma 2.13. Second Borel-Cantelli lemma X Suppose that (An)n∈N are independent. If P(An) = ∞ , then P(An i.o.) = 1. n∈N −a Proof. We use the inequality 1 − a ≤ e . Set an = P(An). Then we have for all n ∈ N

 \ c  Y h X i P Ak = (1 − ak) ≤ exp − ak = 0 . k≥n k≥n k≥n

c  [ \ c  Hence P(An i.o.) = 1 − P(lim inf An) = 1 − P Ak = 0 . 2 n∈N k≥n

13