Probability Theory

Andrew Kobin

Spring 2015

Contents

0 Introduction

1 Probability and Normal Numbers
  1.1 The Weak Law of Large Numbers
  1.2 The Strong Law of Large Numbers
  1.3 Further Properties of Normal Numbers

2 Probability Measures
  2.1 Fields, σ-fields and Probability Measures
  2.2 The Lebesgue Measure on the Unit Interval
  2.3 Extension to σ-fields
  2.4 π-systems and λ-systems
  2.5 Monotone Classes
  2.6 Complete Extensions
  2.7 Non-Measurable Sets

3 Denumerable Probabilities
  3.1 Limit Inferior, Limit Superior and Convergence
  3.2 Independence
  3.3 Subfields
  3.4 The Borel-Cantelli Lemmas

4 Simple Random Variables
  4.1 Convergence in Measure
  4.2 Independent Variables
  4.3 Expected Value and ...
  4.4 Abstract Laws of Large Numbers
  4.5 Second Borel-Cantelli Lemma Revisited
  4.6 Bernstein's Theorem
  4.7 Gambling
  4.8 Markov Chains
  4.9 Transience and Persistence

5 Abstract Measure Theory
  5.1 Measures
  5.2 ...
  5.3 Lebesgue Measure on R^n
  5.4 Measurable Functions
  5.5 Distribution Functions

6 Integration Theory
  6.1 Measure-Theoretic Integration
  6.2 Properties of Integration


0 Introduction

These notes were compiled from a course on measure-theoretic probability taught by Dr. Sarah Raynor in Spring 2015 at Wake Forest University. The course covers the basic concepts in measure theory and uses them to deepen understanding of probability. The companion text for the course is Probability and Measure, 4th ed., by P. Billingsley.

One of the best examples to illustrate the nuance of measure theory is the Cantor set. The Cantor set C is defined as follows. Let A_0 = [0, 1], the unit interval. Let A_1 be the set A_0 − (1/3, 2/3) formed by deleting the middle third of A_0. Next, A_2 is similarly formed by deleting the middle thirds (1/9, 2/9) and (7/9, 8/9) from each component of A_1. The process is continued to define

A_n = A_{n−1} − ⋃_{k=0}^∞ ((1 + 3k)/3^n, (2 + 3k)/3^n).

Finally, the Cantor set is the subset of [0, 1] given by C = ⋂_{n=0}^∞ A_n, that is, the points remaining in the unit interval after iterating this process over the natural numbers. Length is our first idea of measure, from which many others will stem. If we take the usual length of an interval on the real line to be its right endpoint minus its left endpoint, then the unit interval [0, 1] has length 1. One may then ask: How long is the Cantor set? To measure the length of C, we instead calculate the length of its complement and subtract it from 1. This is the following infinite sum:

1/3 + 2/9 + 4/27 + 8/81 + ...

which is a geometric series converging to (1/3)/(1 − 2/3) = 1. Thus the complement of the Cantor set has length 1, but the total unit interval has length 1, meaning the Cantor set has length 0. This is our first example of a set of measure zero. Area, volume, hypervolume, etc. are all extensions of length to higher dimensions; these are also examples of measures. For example, the area of an annulus is easy to compute. Consider the following region R.

[Figure: the annular region R, with inner radius 1 and outer radius 2.]


We compute the area A by A = 4π − π = 3π. However, one may also want to compute the mass of the annulus, say if it were made of aluminum or steel. Given a density function, e.g. ρ = e^{−r²} kg/cm², find the mass of the annular region. This is computed by a double integral,

∬_R ρ dA = ∫_0^{2π} ∫_1^2 e^{−r²} r dr dθ = ∫_0^{2π} −(1/2)(e^{−4} − e^{−1}) dθ = π(e^{−1} − e^{−4}).

If we think of a double integral as the limit of the process of breaking the region into smaller regions and adding together all their masses, we see the same concept at work as in the Cantor set example.
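As a quick sanity check on the arithmetic above, here is a small Python sketch (ours, not part of the original notes) that approximates the mass by a midpoint Riemann sum in the radial variable; the angular integral simply contributes a factor of 2π.

import math

# Midpoint Riemann sum for the mass of the annulus 1 <= r <= 2 with density e^{-r^2},
# computed in polar coordinates: mass = int_0^{2pi} int_1^2 e^{-r^2} r dr dtheta.
def annulus_mass(steps=2000):
    dr = 1.0 / steps
    total = 0.0
    for i in range(steps):
        r = 1.0 + (i + 0.5) * dr             # midpoint of the i-th radial strip
        total += math.exp(-r * r) * r * dr   # e^{-r^2} * r dr
    return 2 * math.pi * total               # the theta integral contributes 2*pi

print(annulus_mass())                          # ~1.0982
print(math.pi * (math.exp(-1) - math.exp(-4))) # pi(e^{-1} - e^{-4}) ~1.0982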

How does this relate to probability?

Example 0.0.1. What is the probability of rolling a prime number on a standard six-sided die? This can be computed by the same divide-and-conquer approach:

P(prime) = P(2) + P(3) + P(5) = 3/6 = 1/2.

Example 0.0.2. When playing craps (rolling two dice), what is the probability of rolling either a 7 or an 11?

P(7 or 11) = P(7) + P(11) = 6/36 + 2/36 = 2/9.

Example 0.0.3. Given a dartboard of unit area, the probability of hitting a small region on the board with your dart is precisely the area of that region.
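The two dice computations can be checked by brute-force enumeration of equally likely outcomes; a short Python sketch (illustration only, not part of the notes):

from fractions import Fraction
from itertools import product

# P(prime) on one die: count favorable faces out of 6.
primes = {2, 3, 5}
p_prime = Fraction(sum(1 for d in range(1, 7) if d in primes), 6)
print(p_prime)                      # 1/2

# P(7 or 11) on two dice: count favorable pairs out of 36.
pairs = list(product(range(1, 7), repeat=2))
p_7_or_11 = Fraction(sum(1 for a, b in pairs if a + b in (7, 11)), len(pairs))
print(p_7_or_11)                    # 2/9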

There is a common theme among the above examples, which is that the calculation of probability relies on our ability to measure things and compare the relative measures. We generalize this in the following way.

Definition. A measure µ on a set S is a function µ : P(S) → [0, ∞] such that µ is countably additive, that is, if A is a subset of S of the form A = ⋃_{n=1}^∞ A_n with A_n ∩ A_m = ∅ for all n ≠ m, then µ(A) = ∑_{n=1}^∞ µ(A_n).

This isn't quite the full definition yet; we will formalize everything in Chapter 5. However, some interesting questions arise from defining a measure this way:


1 Is every subset of S measurable? When the set is finite, the answer is yes. However, for the unit interval with length as a measure, the answer is no. A counterexample is difficult to produce at this time.

2 The Banach-Tarski Paradox (sometimes called the Banach-Tarski Theorem) says that it is possible to take a solid ball of any size, say the size of a basketball, decompose it into finitely many pieces and put them back together only using rigid motions to get a ball the size of the sun. How is this possible?

The domain of a measure must have a special structure, which is called a σ-field (some- times σ-algebra in the literature).

Definition. Let S be a set and F a collection of of S. F is a σ-algebra provided

(1) S ∈ F.

(2) If A ∈ F then AC ∈ F as well.

(3) If A_1, A_2, ... ∈ F (this may be a countable list) then ⋃_{n=1}^∞ A_n ∈ F.

It turns out that this is just enough structure to allow us to define a measure on F. This will be the main ‘universe’ in which we work, defining probability measures and developing their applications. The first four chapters may be treated as an extensive case study of probability spaces, that is, spaces with measure 1. In Chapter 5 we finally introduce the terminology and main theorems in abstract measure theory.
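When Ω is finite, closure under finite unions already gives closure under countable unions, so the σ-field axioms can be checked mechanically. A small Python sketch (purely illustrative; the helper name is ours) that verifies the three axioms for a collection of subsets of a finite Ω:

def is_sigma_field(omega, F):
    """Check the sigma-field axioms for a collection F of subsets of the finite set omega.
    (For finite omega, closure under finite unions equals closure under countable unions.)"""
    F = {frozenset(A) for A in F}
    omega = frozenset(omega)
    if omega not in F:
        return False                                   # (1) Omega is in F
    if any(omega - A not in F for A in F):
        return False                                   # (2) closed under complements
    return all(A | B in F for A in F for B in F)       # (3) closed under unions

omega = {1, 2, 3, 4}
smallest = [set(), omega]                              # {emptyset, Omega}
partition = [set(), {1, 2}, {3, 4}, omega]             # generated by the blocks {1,2}, {3,4}
print(is_sigma_field(omega, smallest), is_sigma_field(omega, partition))   # True True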


1 Probability and Normal Numbers

In these notes we will denote a sample space by Ω and a particular event taken from this sample space by ω. Our prototypical example will have Ω = (0, 1]. For technical reasons we will always assume an interval of the real line is of the form (a, b] so that collections of intervals may be chosen disjointly (so they don't overlap at the endpoints). If I = (a, b] we will denote the usual notion of length by |I| = |b − a|.

Suppose A = ⋃_{i=1}^n I_i where I_i = (a_i, b_i] are pairwise disjoint intervals in the sample space Ω = (0, 1]. We define

Definition. The probability of event A occurring within the sample space Ω is

P(A) := ∑_{i=1}^n |I_i| = ∑_{i=1}^n |b_i − a_i|.

At this point we are carefully avoiding complicated subsets of Ω, such as the Cantor set in the introduction. These will be the focus in later chapters. If A and B are disjoint subsets of Ω and each of A, B is a finite disjoint union of intervals, then P(A ∪ B) = P(A) + P(B). This is called the finite additivity of probability. So far we have brushed over something important: is our definition of P(A) well-defined? That is, if A has two different representations as finite disjoint unions of intervals in Ω, do they both give the same probability? Well, suppose A = ⋃_{i=1}^n I_i = ⋃_{j=1}^m J_j. We create a collection of intervals K_ij = I_i ∩ J_j, called a refinement of the I_i and J_j. Notice that

A = ⋃_{j=1}^m ⋃_{i=1}^n K_ij = ⋃_{j=1}^m ⋃_{i=1}^n (I_i ∩ J_j).

Since each I_i is the disjoint union of the K_ij (over j) and each J_j is the disjoint union of the K_ij (over i), both representations give ∑_i |I_i| = ∑_{i,j} |K_ij| = ∑_j |J_j|. This implies well-definedness of our definition of P(A).

Example 1.0.4. This relates to the Riemann integral in an important way. For a subset A ⊂ Ω which is a disjoint union of finitely many intervals in Ω = (0, 1], define the characteristic function

f_A = ∑_{i=1}^n χ_{I_i}, where χ_{I_i}(x) = 1 if x ∈ I_i and 0 if x ∉ I_i.

Similarly define g_B = ∑_{j=1}^m χ_{J_j}. Then finite additivity of probability implies the additive property of Riemann integrals:

∫_0^1 (f_A + g_B) dx = ∫_0^1 f_A dx + ∫_0^1 g_B dx.


This is because

∫_0^1 χ_I(x) dx = |I| = b − a.

Keep in mind that for the moment we are only dealing with event spaces that are finite disjoint unions of half-open intervals; when we encounter more complicated subsets of Ω, Riemann integration breaks down. In that case we will need to use Lebesgue integration, one of the main tools in modern integration theory. Our next goal is to equate the probabilistic notion of selecting points from the unit interval with the physical act of flipping an infinite number of coins and counting heads and tails. Define d_i(ω) to be the result of the ith flip of the infinite sequence of coin flips; we will denote this numerically by

d_i(ω) = 1 if heads, 0 if tails.

The event ω can be represented as a sequence of 1's and 0's: (d_1(ω), d_2(ω), d_3(ω), ...). We will also make use of the dyadic (binary) representation

ω = ∑_{i=1}^∞ d_i(ω) 2^{−i}.

Each sequence of 0's and 1's corresponds to a unique real number in the interval [0, 1]. However, not every real number in [0, 1] has a unique dyadic representation. For example, 5/8 can be represented by 0.101000... but also by the non-terminating 0.100111... It is convention to prefer the non-terminating representation, since this will coincide with our other preference for half-open intervals (a, b]. Notice that picking only non-terminating dyadic representations excludes 0 = 0.000... from our probability space, so we are indeed constructing (0, 1]. Now, drawing at random with uniform probability from Ω = (0, 1] is equivalent to the dyadic representation of an infinite sequence of coin flips. The reason is that P[d_i(ω) = 1] is equal to the sum of the lengths of 2^{i−1} intervals, each of which has length 2^{−i}. This is illustrated below.

d_1 = 0 on (0, 1/2] and d_1 = 1 on (1/2, 1];
d_2 = 0 on (0, 1/4] ∪ (1/2, 3/4] and d_2 = 1 on (1/4, 1/2] ∪ (3/4, 1].

These are sometimes called dyadic intervals. From this we can see that the probability of any single flip coming up heads is 1/2, since at any level, half of the 2^i intervals are included in this event. The 2^n intervals of length 2^{−n} for any n are called the set of rank n dyadic intervals; they have the nice property of being nested. Formally, if n > m and I_i is an interval of rank n, there is a unique J_j of rank m such that I_i ⊂ J_j.
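A short Python sketch (our own illustration) of the digit functions d_i under the non-terminating convention; it reproduces the two representations of 5/8 mentioned above and checks that exactly half of the rank-n intervals have d_n = 1.

from fractions import Fraction

def dyadic_digits(omega, n):
    """First n digits d_1,...,d_n of omega in (0,1], using the non-terminating
    convention that matches the half-open dyadic intervals ((k-1)/2^i, k/2^i]."""
    omega = Fraction(omega)
    digits = []
    for _ in range(n):
        if omega > Fraction(1, 2):       # omega in (1/2, 1]  ->  d_i = 1
            digits.append(1)
            omega = 2 * omega - 1
        else:                            # omega in (0, 1/2]  ->  d_i = 0
            digits.append(0)
            omega = 2 * omega
    return digits

print(dyadic_digits(Fraction(5, 8), 8))          # [1, 0, 0, 1, 1, 1, 1, 1], i.e. 0.100111...

# Exactly half of the 2^n rank-n dyadic intervals have d_n = 1 (take one point per interval):
n = 4
midpoints = [Fraction(2 * k + 1, 2 ** (n + 1)) for k in range(2 ** n)]
print(sum(dyadic_digits(m, n)[-1] for m in midpoints))   # 8 = 2^(n-1)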


Example 1.0.5. The binomial formula expresses the probability that k heads will be flipped in n trials. Using the interval construction above, we see that

P(k heads in the first n flips) = P(k of the first n digits are 1)
  = (# subsets of {1, ..., n} with k elements) · 2^{−n}
  = (n choose k) · 2^{−n},

which is exactly the same as provided by the binomial formula.
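A quick check of this count in Python (illustrative only), comparing a brute-force enumeration of the rank-n dyadic intervals, encoded as 0/1 strings, with the binomial formula:

from fractions import Fraction
from itertools import product
from math import comb

n, k = 10, 4
# Count the rank-n dyadic intervals (0/1 strings of length n) with exactly k ones.
count = sum(1 for bits in product((0, 1), repeat=n) if sum(bits) == k)
print(Fraction(count, 2 ** n))            # 105/512
print(Fraction(comb(n, k), 2 ** n))       # 105/512, the binomial formula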

Notice that if {Ii}i∈N is a collection of rank n dyadic intervals and n ≥ m, then dm(x) is constant on Ii for all Ii of rank n.

1.1 The Weak Law of Large Numbers

This brings us to the Law of Large Numbers. In probability theory, the LLN states that the average of a sequence of random trials converges to a particular value: the expected value (EV). In this course, we will distinguish between two different versions of the LLN.

Theorem 1.1.1 (Weak Law of Large Numbers). Let ω be an event in the sample space Ω = (0, 1] which may be expressed as a finite disjoint union of intervals. Then for any ε > 0,

lim_{n→∞} P( |(1/n) ∑_{i=1}^n d_i(ω) − 1/2| > ε ) = 0.

Proof. To prove the Weak LLN, we first define the Rademacher functions,

r_i(ω) = 2d_i(ω) − 1 = 1 if heads, −1 if tails,

for each i, and the cumulative Rademacher function of rank n,

s_n(ω) = ∑_{i=1}^n r_i(ω).

In this language, the above probability may be expressed as

lim_{n→∞} P( (1/n)|s_n(ω)| > ε ) = 0.

In addition, the Rademacher functions are defined so P[r_i = 1] = 1/2 and P[r_i = −1] = 1/2, meaning they have an average value of 0:

∫_0^1 r_i(ω) dω = 0.


This implies that the cumulative function also has average value 0. Furthermore, it's easy to see that whenever i ≠ j,

∫_0^1 r_i(ω) r_j(ω) dω = 0

by looking at their graphs. However, r_i(ω)² = 1 for all i, so

∫_0^1 r_i(ω)² dω = ∫_0^1 1 dω = 1.

Putting this all together, we get

∫_0^1 s_n(ω)² dω = ∫_0^1 ( ∑_{i=1}^n r_i(ω) )² dω
  = ∫_0^1 ∑_{i=1}^n ∑_{j=1}^n r_i(ω) r_j(ω) dω
  = ∫_0^1 ( ∑_{i=1}^n r_i(ω)² + ∑_{i≠j} r_i(ω) r_j(ω) ) dω
  = ∑_{i=1}^n 1 + 0 = n.

Finally, we need Chebyshev's inequality, which says that if f is a nonnegative step function and α > 0, then

(1) The set {ω | f(ω) > α} is a finite union of intervals, and

(2) P[ω | f(ω) > α] ≤ (1/α) ∫_0^1 f(ω) dω.

This is used in the following calculation of the original probability limit:

P( (1/n)|s_n(ω)| > ε ) = P[ |s_n(ω)| > nε ]
  = P[ s_n(ω)² > n²ε² ]
  ≤ (1/(n²ε²)) ∫_0^1 s_n(ω)² dω   by Chebyshev's inequality
  = n/(n²ε²) = 1/(nε²),

which converges to 0 as n → ∞. This completes the proof of the Weak LLN (1.1.1).

Let's take a moment to prove the inequality used in the proof above. In measure theory this is known as a weak type ℓ¹ inequality.


Theorem 1.1.2 (Chebyshev's Inequality). If f is a nonnegative step function, then for any α > 0,

(1) The set {ω | f(ω) > α} is a finite union of intervals.

(2) P[ω | f(ω) > α] ≤ (1/α) ∫_0^1 f(ω) dω.

Proof. Let f be a step function given by the values y_i = f(ω) for ω ∈ (x_i, x_{i+1}], where 0 = x_0 < x_1 < ··· < x_m = 1. Then clearly S = {ω | f(ω) > α} equals the union of the (x_i, x_{i+1}] for those values of i such that y_i > α. Note that since we defined S in terms of a partition of (0, 1], the probability P(S) may be written as

P(S) = ∑_{i∈I} |x_{i+1} − x_i|,   where I = {i : 0 ≤ i < m, y_i > α}.

Now

α P[ω | f(ω) > α] = α P(S) = ∑_{i∈I} α |x_{i+1} − x_i|
  < ∑_{i∈I} y_i |x_{i+1} − x_i|
  ≤ ∑_{i=0}^{m−1} y_i |x_{i+1} − x_i|
  = ∫_0^1 f(ω) dω.

Hence the inequality is proved.
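To see the Weak LLN and the Chebyshev bound 1/(nε²) side by side, here is a small Monte Carlo sketch in Python (ours, not from the notes); it estimates P(|s_n/n| > ε) by simulating Rademacher sums.

import random

random.seed(0)
eps, trials = 0.1, 5000
for n in (10, 100, 1000):
    exceed = 0
    for _ in range(trials):
        s = sum(random.choice((-1, 1)) for _ in range(n))   # s_n = r_1 + ... + r_n
        if abs(s / n) > eps:
            exceed += 1
    # empirical P(|s_n/n| > eps) versus the Chebyshev bound 1/(n*eps^2)
    print(n, exceed / trials, 1 / (n * eps * eps))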

1.2 The Strong Law of Large Numbers

Definition. A normal number is a real number ω ∈ (0, 1] such that

lim_{n→∞} (1/n) ∑_{i=1}^n d_i(ω) = 1/2.

The Strong Law of Large Numbers says the following about normal numbers:

Theorem 1.2.1 (Strong Law of Large Numbers). The set N of normal numbers has probability P(N) = 1.

Instead of proving the SLLN directly, we will prove an equivalent statement known as Borel's Theorem on Normal Numbers (1.2.5). In order to make sense of the statement of Borel's Theorem, we must justify the use of P(N); however, this requires defining the Lebesgue measure, which we will cover in subsequent chapters. Alternatively, we can show that P(N^C) = 0. This also requires the Lebesgue measure, but we can get away with just using the definition of measure 0:


Definition. A set W ⊂ Ω is said to be negligible (alternatively, has measure 0) if for every ε > 0, there is a countable collection of intervals {I_k}_{k=1}^∞ such that

(a) W ⊂ ⋃_{k=1}^∞ I_k.

(b) ∑_{k=1}^∞ |I_k| < ε.

Lemma 1.2.2. Suppose {W_i}_{i=1}^∞ are all negligible sets in Ω. Then W = ⋃_{i=1}^∞ W_i is also negligible. That is, the countable union of negligible sets is negligible.

Proof. Let ε > 0 and let i ∈ N. Since W_i is negligible, there is a countable collection {I_k^i}_{k=1}^∞ so that W_i ⊂ ⋃_{k=1}^∞ I_k^i and ∑_{k=1}^∞ |I_k^i| < ε/2^i. Then

W ⊂ ⋃_{i=1}^∞ ⋃_{k=1}^∞ I_k^i   and   ∑_{i=1}^∞ ∑_{k=1}^∞ |I_k^i| < ∑_{i=1}^∞ ε/2^i = ε.

Since the doubly-indexed collection {I_k^i}_{(i,k)∈N²} is countable, we conclude that W is negligible.

This has an immediate and important consequence.

Corollary 1.2.3. All countable sets are negligible.

Proof. If we can prove that any singleton set is negligible, then Lemma 1.2.2 immediately applies since a countable set is just the countable union of singletons. This is easy to establish, since for any singleton {x} ⊂ Ω, the interval (x − ε/2, x + ε/2] covers {x} and has length ε.

Example 1.2.4. By Corollary 1.2.3, Q is a negligible set. This is odd, however, since we know that Q is an example of a set that is dense in the real numbers. It turns out that there are even larger sets than Q that are still negligible. Corollary 1.2.3 suggests an interesting question: Is every negligible set countable? The answer turns out to be no; for example, the Cantor set is uncountable, but as we saw in the introduction, C has measure 0.

Recall that we want to prove the complement of the normal numbers has measure 0. It would be nice if N^C were countable, but unfortunately this is not the case. To see this, what are some numbers in N^C = {ω : lim_{n→∞} (1/n) ∑_{i=1}^n d_i(ω) ≠ 1/2}? Some obvious examples are (0, 1, 1, 1, 1, 1, ...) and (1, 0, 1, 1, 1, 1, ...), but a more interesting one is (1, 0, 0, 1, 0, 0, 1, 0, 0, ...). In fact, anything of the form (a, 0, 0, b, 0, 0, c, 0, 0, d, 0, 0, ...) is in N^C because

(1/n) ∑_{i=1}^n d_i(ω) ≤ 1/3 + 1/n for every n, so the limit of the averages (if it exists) is at most 1/3.


Now (a, b, c, d, ...) is an infinite sequence of 0's and 1's, of which there are uncountably many. Therefore there are uncountably many sequences of the form (a, 0, 0, b, 0, 0, c, 0, 0, ...) in N^C. Hence N^C must be uncountable. Nevertheless, there is a way to prove P(N^C) = 0, which is given below.

Theorem 1.2.5 (Borel's Normal Number Theorem). N^C is negligible.

Proof. Clearly N = {ω | (1/n) s_n(ω) → 0} in the language of Rademacher functions. Then the theorem may be restated as P[ω : |s_n(ω)| > εn] = 0 for any ε > 0. By Chebyshev's Inequality (Theorem 1.1.2),

P[ω : |s_n(ω)| > εn] = P[ω : s_n(ω)⁴ > ε⁴n⁴]
  ≤ (1/(n⁴ε⁴)) ∫_0^1 s_n(ω)⁴ dω.

To evaluate this integral, note that

s_n(ω)⁴ = ( ∑_{α=1}^n r_α(ω) )( ∑_{β=1}^n r_β(ω) )( ∑_{γ=1}^n r_γ(ω) )( ∑_{δ=1}^n r_δ(ω) ) = ∑_{α,β,γ,δ=1}^n r_α(ω) r_β(ω) r_γ(ω) r_δ(ω).

Recall that even powers of the r_i(ω) are identically 1, while odd powers are equal to r_i(ω) itself. The possible values of the Rademacher functions, as well as what they contribute to the integral, are shown in the table below.

distinct indices among α, β, γ, δ | functions | integral | number of instances
1 (α = β = γ = δ) | r_i(ω)⁴ | ∫_0^1 r_i(ω)⁴ dω = 1 | n
2 | r_i(ω)² r_j(ω)² | ∫_0^1 r_i(ω)² r_j(ω)² dω = 1 | 3n(n − 1)
2 | r_i(ω)³ r_j(ω) | ∫_0^1 r_i(ω)³ r_j(ω) dω = 0 | (not needed)
3 | r_i(ω)² r_j(ω) r_k(ω) | ∫_0^1 r_i(ω)² r_j(ω) r_k(ω) dω = 0 | (not needed)
4 | r_i(ω) r_j(ω) r_k(ω) r_l(ω) | ∫_0^1 r_i(ω) r_j(ω) r_k(ω) r_l(ω) dω = 0 | (not needed)

Thus the only parts that don't integrate to 0 are the first two:

∫_0^1 s_n(ω)⁴ dω = 1 · n + 1 · 3n(n − 1) + 0 + 0 = 3n² − 2n < 3n².

This gives us P[ω : |s_n(ω)| > εn] < (1/(ε⁴n⁴)) · 3n² = 3/(ε⁴n²). For a given n, choose ε_n = n^{−1/8}. Then we have

P[ω : |s_n(ω)| > ε_n n] < 3/(ε_n⁴ n²) = 3/((n^{−1/8})⁴ n²) = 3/n^{3/2}


and this tends to 0 as n → ∞. We will use this calculation in a moment.

Let A_n = {ω : (1/n)|s_n(ω)| > ε_n}. We need to verify three things:

(i) N^C ⊂ ⋃_{n=m}^∞ A_n for every m;

(ii) The An are all finite unions of intervals; and

(iii) ∑_{n=m}^∞ |A_n| is sufficiently small.

First, note that (i) is the same as N ⊃ ⋂_{n=m}^∞ A_n^C by DeMorgan's Laws. If ω_0 ∈ ⋂_{n=m}^∞ A_n^C and r is some number such that n > r ≥ m, then

ω_0 ∈ {ω : (1/n)|s_n(ω)| ≤ ε_n} ⊆ {ω : (1/n)|s_n(ω)| ≤ ε_r}.

So lim sup_{n→∞} (1/n)|s_n(ω_0)| ≤ ε_r, and as r → ∞, ε_r → 0, which makes this limit go to 0. Hence ω_0 is normal.

(ii) For a fixed n, we claim that A_n is a finite union of disjoint intervals. From this it will follow that ⋃_{n=m}^∞ A_n is a countable union of intervals, since countable unions of finite unions are countable. Consider A_n = {ω : (1/n)|s_n(ω)| > ε_n}. For any n, s_n(ω) is a step function and therefore so is (1/n)|s_n(ω)|. Hence by (1) of Chebyshev's inequality (Theorem 1.1.2), A_n is a finite union of intervals.

(iii) By (ii), ⋃_{n=m}^∞ A_n is a countable union of some intervals {I_k}. Note that ∑_{k=1}^∞ |I_k| ≤ ∑_{n=m}^∞ P(A_n), since all the intervals I_k on the left appear in the sum on the right. By our work above,

∑_{n=m}^∞ P(A_n) < ∑_{n=m}^∞ 3/n^{3/2} ≤ 3c/√m

for some constant c, by the integral test. Given ε > 0, we can choose m sufficiently large so that ε > 3c/√m. Hence ∑_{k=1}^∞ |I_k| < ε, which completes the proof that N^C is negligible.

We now have at our disposal a 'weak' and 'strong' law of large numbers; naturally from the way they are named, the Strong LLN implies the Weak LLN. However, at the moment we don't have the tools to prove this. So far we have several good examples of negligible sets, so one might be tempted to think that all sets are negligible. Of course that isn't true, or else the very concept of negligibility would be meaningless, so let's find some non-negligible sets.

Proposition 1.2.6. Ω = (0, 1] is not negligible.


Proof. Given a countable collection of intervals {I_k}_{k=1}^∞ such that Ω ⊂ ⋃_{k=1}^∞ I_k, we want to show that ∑_{k=1}^∞ |I_k| ≥ |Ω| = 1. This results from the more general theorem below.

Theorem 1.2.7. Let I and {I_k}_{k=1}^∞ be intervals.

(1) If ⋃_{k=1}^∞ I_k ⊂ I and the I_k are disjoint, then ∑_{k=1}^∞ |I_k| ≤ |I|.

(2) If ⋃_{k=1}^∞ I_k ⊃ I, then ∑_{k=1}^∞ |I_k| ≥ |I|.

(3) If ⋃_{k=1}^∞ I_k = I and the I_k are disjoint, then ∑_{k=1}^∞ |I_k| = |I|.

Proof. First note that (1) and (2) imply (3). We use induction to prove (1) and (2). First suppose there's only one interval I_1 ⊂ I. Then clearly |I_1| ≤ |I|. Now assume inductively that the conclusion holds for I_1, ..., I_n and consider the case for n + 1 intervals. Write {I_k}_{k=1}^{n+1} in order, using the fact that they are disjoint:

a ≤ a1 < b1 ≤ a2 < b2 ≤ · · · < bn ≤ an+1 < bn+1 ≤ b

where I = (a, b] and I_k = (a_k, b_k]. Since ⋃_{k=1}^n I_k ⊂ (a, b_n], the inductive hypothesis gives us

∑_{k=1}^n |I_k| ≤ |b_n − a| ≤ |a_{n+1} − a|.

Then

∑_{k=1}^{n+1} |I_k| = ∑_{k=1}^n |I_k| + |b_{n+1} − a_{n+1}|
  ≤ |a_{n+1} − a| + |b_{n+1} − a_{n+1}|
  ≤ |a_{n+1} − a| + |b − a_{n+1}| = |b − a|,

since the differences are all positive by our chosen ordering. So property (1) holds for finite collections of disjoint intervals. Now consider a countable collection of disjoint intervals {I_k}_{k=1}^∞. For any finite n, ∑_{k=1}^n |I_k| ≤ |I|, and taking the limit as n → ∞ preserves the inequality (e.g. by the monotone convergence theorem); hence ∑_{k=1}^∞ |I_k| ≤ |I| as well.


(2) The case with one interval is the same as above. Assume the inequality holds for n intervals and let {I_k}_{k=1}^{n+1} be a collection of intervals, not necessarily disjoint, so that ⋃_{k=1}^{n+1} I_k ⊃ I. This is the same as (a, b] ⊂ ⋃_{k=1}^{n+1} (a_k, b_k]. Because the intervals are not disjoint, we can't order the a_k and b_k like we did last time; however, we can relabel so that

b_{n+1} ≥ b_n ≥ ··· ≥ b_1, with b_{n+1} ≥ b.

If a_{n+1} ≤ a we're done, since then (a, b] ⊂ (a_{n+1}, b_{n+1}]. Otherwise (a, a_{n+1}] ⊂ ⋃_{k=1}^n (a_k, b_k]. By the inductive hypothesis, |a_{n+1} − a| ≤ ∑_{k=1}^n |b_k − a_k|, so

|b − a_{n+1}| + |a_{n+1} − a| ≤ ( ∑_{k=1}^n |b_k − a_k| ) + |b − a_{n+1}|
  ⟹ |b − a| ≤ ( ∑_{k=1}^n |b_k − a_k| ) + |b − a_{n+1}|
  ≤ ( ∑_{k=1}^n |b_k − a_k| ) + |b_{n+1} − a_{n+1}|
  = ∑_{k=1}^{n+1} |b_k − a_k|.

Hence (2) holds for finite collections of intervals. To prove the property for countable collections, we will exploit the completeness of R via the Heine-Borel Theorem, which states that any closed, bounded interval of real numbers is compact. Thus any open cover of such an interval will have a finite subcover and we can reduce to the finite case. At the moment, though, we have neither a closed interval nor an open cover of the interval. To remedy this, let ε > 0 be arbitrarily small. Shrink I to the closed interval [a + ε, b] and consider the open cover

⋃_{k=1}^∞ (a_k, b_k + ε/2^k) ⊃ [a + ε, b].

⋃_{j=1}^J (a_{k_j}, b_{k_j} + ε/2^{k_j}) ⊃ [a + ε, b],


so the finite case above tells us that

|b − a − ε| ≤ ∑_{j=1}^J |b_{k_j} + ε/2^{k_j} − a_{k_j}|
|b − a| − ε ≤ ∑_{j=1}^J ( |b_{k_j} − a_{k_j}| + ε/2^{k_j} )
  ≤ ∑_{k=1}^∞ ( |b_k − a_k| + ε/2^k ) = ∑_{k=1}^∞ |I_k| + ε.

Letting ε → 0 proves (2) in the countable case.

We obtain two immediate and useful results of this theorem.

Corollary 1.2.8. Any finite interval on the real line is not negligible. In particular, Ω = (0, 1] is not negligible.

Corollary 1.2.9. If A is negligible then A^C is not negligible.

Corollary 1.2.8 is subtle but quite vital to the foundations of measure theory: now that we know our universe (Ω = (0, 1] in this section) is not negligible, we can show sets have full measure by demonstrating that their complements are negligible (Corollary 1.2.9). In particular, once we establish that negligible sets have measure zero (Section 2.3), we will have completed the proof of the Strong Law of Large Numbers (1.2.1).

1.3 Further Properties of Normal Numbers

In Section 1.2 we proved that N is not negligible by showing that its complement N^C is negligible. In this section we explore some consequences of this fact and highlight the difference between negligibility and various other measures of 'smallness'.

Proposition 1.3.1. N and N^C are both dense in (0, 1].

Proof. Suppose ω ∈ (0, 1]. Given ε > 0, let j be a natural number such that 2^{−j} < ε/2. Write ω = ∑_{i∈N} 2^{−i} d_i(ω). Then the number n = ∑_{i=1}^{j−1} 2^{−i} d_i(ω) + ∑_{i=j}^∞ 2^{−i} e_i, where e_i = 0 when i is odd and 1 when i is even, is a normal number, since the tail looks like ...0101010101... In addition,

|ω − n| = | ∑_{i=j}^∞ 2^{−i} (d_i(ω) − e_i) |
  ≤ ∑_{i=j}^∞ 2^{−i} |d_i(ω) − e_i|   by the triangle inequality
  ≤ ∑_{i=j}^∞ 2^{−i} · 1 = 2^{−j}/(1 − 1/2) = 2^{−j+1} < ε   by geometric series and our choice of j above.


Hence N is dense in (0, 1]. On the other hand, given the same ω ∈ (0, 1], we can instead append the sequence 100100100100..., since we know this makes the average term go to 1/3 ≠ 1/2. By the same logic as above, this new number is within ε of ω, so N^C is dense in (0, 1].

Definition. A set A is trifling if for each ε > 0 there exists a finite sequence of intervals I_k satisfying

(i) A ⊂ ⋃_k I_k and

(ii) ∑_k |I_k| < ε.

Clearly a trifling set is negligible, and finite unions of trifling sets are trifling. Trifling sets are especially 'small' and have some nice properties.

Proposition 1.3.2. If A is trifling then Cl(A) is trifling.

Proof. Suppose A is trifling, so that there is a finite collection of intervals {I_k}_{k=1}^n covering A whose total length is less than ε/2. Write I_k = (a_k, b_k]. We enlarge each I_k to form a new collection {J_k}_{k=1}^n given by

J_k = (a_k − ε/2^{k+1}, b_k].

Notice that the maximum length added to the total length of the I_k is

∑_{k=1}^n ε/2^{k+1} ≤ ∑_{k=1}^∞ ε/2^{k+1} = ε/2,

so that their total length is still less than ε/2 + ε/2 = ε. Also, since each I_k ⊂ J_k and the I_k cover A, {J_k}_{k=1}^n is also a finite cover of A. Moreover, each closed interval [a_k, b_k] ⊂ J_k and therefore their union ⋃_{k=1}^n [a_k, b_k] contains the closure of A. This proves that Cl(A) is trifling if A is trifling.

Examples.

1 By Corollary 1.2.3, Q ∩ (0, 1] is negligible. Since Q is dense in the reals, the closure of Q ∩ (0, 1] is (0, 1] so the contrapositive to Proposition 1.3.2 implies that Q ∩ (0, 1] is not trifling.

2 The Cantor set C, as defined in the Introduction, is an uncountable set. However, C is trifling. To see this, let ε > 0 be given and take n to be a natural number such that (2/3)^n < ε. The construction of C shows that at the nth level, every number in C is contained in one of 2^n intervals, each of which has length 3^{−n}. Clearly the sum of the lengths of these intervals is (2/3)^n < ε by our choice of n, and 2^n is finite, so the collection of these intervals satisfies conditions (i) and (ii), proving C is trifling. A short numerical sketch of this covering follows below.
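The sketch (ours, for illustration): generate the 2^n level-n intervals of the Cantor construction and check that their total length (2/3)^n drops below a given ε.

from fractions import Fraction

def cantor_level(n):
    """The 2^n closed intervals of length 3^{-n} whose union contains the Cantor set."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        refined = []
        for a, b in intervals:
            third = (b - a) / 3
            refined.append((a, a + third))        # keep the left third
            refined.append((b - third, b))        # keep the right third
        intervals = refined
    return intervals

eps = Fraction(1, 100)
n = 12                                            # (2/3)^12 < 1/100
cover = cantor_level(n)
total = sum(b - a for a, b in cover)
print(len(cover), float(total), total < eps)      # 4096 0.0077... True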


Definition. A set A ⊂ Ω is nowhere dense if for every interval J ⊂ Ω, there is a subinterval I ⊂ J such that I ∩ A = ∅.

Proposition 1.3.3. A ⊂ Ω is nowhere dense ⇐⇒ the interior of the closure of A is empty.

Proof omitted.

Proposition 1.3.4. A trifling set is nowhere dense.

Proof. Suppose x ∈ Int(Cl(A)). Then there exists an ε > 0 such that J = (x − ε, x + ε) ⊂ Cl(A). Since the closure of a trifling set is trifling, we can cover Cl(A) with a collection of intervals {I_k}_{k=1}^n such that ∑_{k=1}^n |I_k| < ε. However, notice that |J| = |(x + ε) − (x − ε)| = 2ε > ε, so by (2) of Theorem 1.2.7,

J ⊄ ⋃_{k=1}^n I_k,

a contradiction. Hence A is nowhere dense.

Proposition 1.3.5. A compact negligible set is trifling.

Proof. Suppose A ⊂ Ω is compact and negligible. For a given ε > 0, negligibility implies there exists a collection {I_k}_{k=1}^∞ with the following properties:

(a) For each k, I_k = (a_k, b_k].

(b) ⋃_{k=1}^∞ I_k ⊇ A.

(c) ∑_{k=1}^∞ |I_k| < ε/2.

From this collection we form an open cover {J_k}_{k=1}^∞ where J_k = (a_k, b_k + ε/2^{k+1}). Observe that the most length we could have added to the original collection of I_k's is

∑_{k=1}^∞ ε/2^{k+1} = ε/2   by geometric series.

So ∑_{k=1}^∞ |J_k| < ε and {J_k} is an open cover of A. By compactness, there exists a finite (open) subcover {J_{k_i}}_{i=1}^n of A. Then the collection {J*_{k_i}}_{i=1}^n, where J*_{k_i} = (a_{k_i}, b_{k_i} + ε/2^{k_i+1}], is a finite collection of intervals covering A and satisfying

∑_{i=1}^n |J*_{k_i}| = ∑_{i=1}^n |J_{k_i}| ≤ ∑_{k=1}^∞ |J_k| < ε.

Hence A is trifling.


Example.

3 Let B = ⋃_n (r_n − 2^{−n−2}, r_n + 2^{−n−2}], where r_1, r_2, ... is an enumeration of the rationals in (0, 1]. Then B^C = (0, 1] − B is nowhere dense but not trifling (or even negligible).

Proof. To prove B^C is nowhere dense, we will prove that every interval contains a subinterval which is contained in (B^C)^C = B. An additional fact we will exploit is that, given an enumeration r_1, r_2, ... of the rationals in (0, 1], the set {r_m}_{m=n}^∞ for any n ≥ 1 is dense in (0, 1].

Suppose J is an interval in Ω; denote the midpoint of J by x and let |J| = ε. To buy ourselves some space away from the endpoints of J, we will consider the subinterval J* = (x − ε/4, x + ε/4). Let n be a natural number large enough that 2^{−n−1} < ε/4. By the comments above, {r_m}_{m=n}^∞ is dense in (0, 1] for this choice of n, so a rational r_m, m ≥ n, can be found in J*. Then the interval I = (r_m − 2^{−m−2}, r_m + 2^{−m−2}] is contained in B, that is, I ∩ B^C = ∅. What's more, |I| = 2^{−m−1} ≤ 2^{−n−1} < ε/4, and since r_m ∈ J*, I extends at most ε/4 to the right or left of the endpoints of J*. This shows that I ⊂ J, and therefore we conclude that B^C is nowhere dense in Ω.

Definition. We say a set is of the first category if the set can be represented as a countable union of nowhere dense sets.

This is a topological notion of smallness, just as negligibility is a measure-theoretic notion of smallness. The following examples illustrate that neither of these conditions implies the other.

Examples. 4 The non-negligible set N of normal numbers is of the first category.

Proof. To prove the statement, we will show that A_m = ⋂_{n=m}^∞ {ω : |n^{−1} s_n(ω)| < 1/2} is nowhere dense and N ⊂ ⋃_m A_m. Consider A_m for a fixed m ∈ N. To prove A_m is nowhere dense in Ω = (0, 1], we will show that for any interval J ⊂ Ω there is a subinterval I ⊂ J such that I ∩ A_m = ∅. Given such an interval J = (a, b], choose a dyadic interval I ⊂ J of rank n_0 > m such that

(1/n_0)|s_{n_0}(ω)| > 1/2 for all ω ∈ I.

Such a choice of I is possible because specifying I is equivalent to a choice of the first n_0 digits of every ω ∈ I: choose the first few digits so that I ⊂ J, take the remaining digits (up to the n_0th) to be 1's, and take n_0 large enough so that (1/n_0)|s_{n_0}(ω)| is sufficiently close to 1 for all ω ∈ I. We then claim that I ∩ A_m = ∅. To see this, recall the definition of A_m:

A_m = ⋂_{n=m}^∞ {ω : (1/n)|s_n(ω)| < 1/2}.

If ω ∈ I then our choice of n_0 means that ω ∉ {ω : (1/n_0)|s_{n_0}(ω)| < 1/2}, and therefore ω does not lie in the intersection defining A_m. Hence A_m and I are disjoint.


Now to prove N ⊂ ⋃_m A_m, recall that the set of normal numbers may be defined by

N = {ω : lim_{n→∞} (1/n)|s_n(ω)| = 0}
  = {ω : for all ε > 0, (1/n)|s_n(ω)| < ε for all n ≥ some n_0}.

Let ω ∈ N and choose ε = 1/2. Then for all n ≥ n_0, for the appropriate choice of n_0, (1/n)|s_n(ω)| < 1/2. This shows that ω ∈ A_{n_0} and so ω ∈ ⋃_m A_m. Hence N ⊂ ⋃_m A_m.

5 On the other hand, the negligible set N C is not of the first category.

Proof. If N C were a countable union of nowhere dense sets, then (0, 1] = N ∪ N C would be as well since by example 4 , N is of the first category. However, a famous theorem of Baire (see Royden’s Real Analysis) says that a nonempty interval is not of the first category, so it follows that N C is not of the first category.


2 Probability Measures

2.1 Fields, σ-fields and Probability Measures

All the discussion about negligible sets and normal numbers highlights the utility of computing 'lengths' of more complicated sets than finite unions of intervals. If we try to define a function giving length for any subset of (0, 1], there are immediate logical contradictions (e.g. the Banach-Tarski Paradox in R³ and similar issues in other dimensions). So we need to restrict our attention to certain types of subsets. For instance, we want to calculate the length of the normal numbers, which may be written

N = ⋂_{k=1}^∞ ⋃_{m=1}^∞ ⋂_{n=m}^∞ {ω : |(1/n) s_n(ω)| < 1/k}.

Notice that for any particular n and k, the set inside all the unions and intersections is a finite union of intervals. The entirety of measure theory is centered on describing the technicalities of performing probabilistic computations on countable unions.

Definition. Suppose Ω is a set and F is a collection of subsets of Ω. F is a field if

(1) Ω ∈ F.

(2) A ∈ F implies A^C ∈ F.

(3) A, B ∈ F implies A ∪ B ∈ F.

In other words, F is closed under finite set operations. Note that a field is closed under intersection by DeMorgan's Laws: A ∩ B = ((A ∩ B)^C)^C = (A^C ∪ B^C)^C, which is why we say a field is closed under all finite set operations. Also notice that (1) and (2) imply that the empty set ∅ is always an element of the field.

Definition. A field F on Ω is called a σ-field if F is also closed under countable unions. The elements of a σ-field are called measurable sets or F-sets.

Examples.

1 Let B0 = {A ⊂ Ω | A is a finite union of intervals} on Ω = (0, 1].

Claim. B0 is a field.

Proof. (1) Ω ∈ B_0. To prove (2), note that the complement of a finite union of intervals is a finite union of intervals: if

A = ⋃_{i=1}^n (a_i, b_i]

then A^C = ⋂_{i=1}^n (a_i, b_i]^C = ⋂_{i=1}^n ((0, a_i] ∪ (b_i, 1]), which is a finite union of intervals. (3) is very similar to (2).


Note that B_0 is not a σ-field, since for example (0, 1/2) is not in B_0 but can be expressed as a countable union of sets in B_0:

(0, 1/2) = ⋃_{n=1}^∞ (0, 1/2 − 1/n].

2 For a space Ω, consider the following collections of subsets of Ω.

F_1 = {finite and cofinite subsets of Ω} = {X ⊂ Ω | X or X^C is finite}
F_2 = {countable and cocountable subsets} = {X ⊂ Ω | X or X^C is countable}
F_3 = P(Ω)
F_4 = {∅, Ω}.

All four are examples of fields on Ω. In fact, F2 is a σ-field but F1 is not. Both F3 and F4 are special σ-fields: F3 is the largest possible σ-field on Ω and similarly F4 is the smallest.

Definition. The σ-field generated by a collection a ⊂ P(Ω) is the smallest σ-field containing a:

σ(a) := ⋂ {G | G is a σ-field containing a}.

Proposition 2.1.1. For any collection a ⊂ P(Ω), σ(a) exists and is a σ-field.

Proof. First, σ(a) exists because P(Ω) is a σ-field containing a, so it suffices to prove that if {G} is a collection of σ-fields, then so is ⋂G. (1) Ω ∈ G for all G, so Ω ∈ ⋂G. (2) If X ∈ ⋂G then X is in each G and so X^C is in each G. Thus X^C ∈ ⋂G. (3) Same as (2). Hence σ(a) is a σ-field.

Proposition 2.1.2. Let a ⊂ P(Ω). The following are properties of σ(a). (1) a ⊂ σ(a).

(2) a is a σ-field if and only if a = σ(a).

(3) If G is a σ-field containing a then σ(a) ⊂ G.

Proof. Obvious from the definition.

Examples.

3 If a = {{x} | x ∈ Ω} then σ(a) = F_2, the countable/cocountable σ-algebra.

4 The Borel σ-field is defined as B = σ(B0) where B0 is as defined in 1 . An element of B is called a Borel set. Any countable sequence of set operations applied to an interval (a, b] will produce a Borel set.


It is important to note that not every Borel set is obtained in this way; that is, not every Borel set is the result of applying countably many set operations to an interval. This illustrates the difference between the field generated by a and the σ-field generated by a: the former can be defined as the collection of all finite set operations on sets in a, while the latter may not be defined in this way with countable operations. The Borel σ-field is our favourite σ-field in probability theory, as it allows us to define measures in a meaningful way, i.e. so that they are compatible with all Borel sets. However, the one important type of set that we have studied so far – a negligible or measure zero set – is not always a Borel set.

Notice that open sets (and therefore closed sets) are Borel sets. This is because an open set U ⊂ (0, 1] contains a countable dense subset, Q ∩ U. Therefore for any x ∈ U, there exists ε > 0 with (x − ε, x + ε) ⊂ U, and there exist rationals p_x, q_x ∈ Q such that

x − ε ≤ px < x ≤ qx < x + ε.

Then x ∈ (p_x, q_x] ⊂ (x − ε, x + ε) ⊂ U. We can thus express U as a countable union of intervals:

U = ⋃_{x∈U} (p_x, q_x]

(only countably many distinct intervals appear here, since their endpoints are rational). Hence U is Borel.

5 The set of normal numbers N is a Borel set.

We next focus on defining probability measures on a field. Later we will extend this notion to σ-fields.

Definition. A set function P : F → R where F is a field on a set Ω is called a probability measure provided (1) 0 ≤ P (A) ≤ 1 for all A ∈ F.

(2) P (Ω) = 1 and P (∅) = 0.

(3) If {A_i}_{i=1}^∞ are disjoint sets in F and ⋃_{i=1}^∞ A_i ∈ F, then P(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).

The third property is called countable additivity – this is the key property for a probability measure.

Remarks.

(a) F may not be a σ-field, so in order to define P(⋃ A_i) we require that ⋃ A_i ∈ F.

(b) Property (3) implies finite additivity: if {A_i}_{i=1}^n are disjoint F-sets then

P(⋃_{i=1}^n A_i) = ∑_{i=1}^n P(A_i).

This requires the fact that P(∅) = 0.


(c) If A ∈ F then by (2),

P (A) + P (AC ) = P (A ∪ AC ) = P (Ω) = 1.

Hence P (AC ) = 1 − P (A) for all A. This actually implies P (∅) = 0, so stating it in the definition was redundant.

(d) Further, since P(A^C) = 1 − P(A) ≥ 0, P(A) ≤ 1 for all A ∈ F, so this was redundant in the definition too.

Given these redundancies, we can rewrite the conditions of a probability measure as

(1) 0 ≤ P(A) for all A ∈ F.

(2) P (Ω) = 1.

(3) If {A_i}_{i=1}^∞ are disjoint sets in F and ⋃_{i=1}^∞ A_i ∈ F, then P(⋃_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).

Definition. Let F be a field on a set Ω and P : F → R be a probability measure defined on F. The triple (Ω, F, P) is called a probability space.

Definition. If F ∈ F is such that P(F) = 1, F is called a support of P on F.

Examples.

6 Let Ω = N = {1, 2, 3, ...} and define the function p : Ω → R by p(n) = 1/2^n. Notice that ∑_{n=1}^∞ 1/2^n = 1. For any A ⊂ Ω, we define a probability measure P by

P(A) = ∑_{a∈A} p(a).

(A small numerical sketch of this measure appears after these examples.)

The right σ-field to use here is F = P(Ω), which makes (Ω, F,P ) into what is called a discrete probability space (see below).

7 Let Ω = (0, ∞) and consider the σ-field F = P(Ω). We can use exactly the same formula for p to define

P(A) = ∑_{a∈A∩Z} p(a).

For example, P((1/2, 21/2]) = 1/2 + 1/4 + ··· + 1/2^{10}. Oddly, many intervals have zero probability in this space: P((1/3, 1/2]) = 0. In this example, Z is a support for P.

8 Let Ω be a countable space. For a nonnegative function p : Ω → [0, ∞) such that ∑_{ω∈Ω} p(ω) = 1, define P(A) = ∑_{ω∈A} p(ω). Then Ω is called a discrete probability space.
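A small Python sketch of the discrete measure from Example 6 (illustrative only; finite truncations stand in for infinite sets):

from fractions import Fraction

def P(A):
    """P(A) = sum of p(n) = 2^{-n} over n in the finite set A."""
    return sum(Fraction(1, 2 ** n) for n in A)

print(P({1, 2, 3}))                     # 7/8
print(float(P(range(1, 30))))           # partial sums for P(Omega), approaching 1
print(float(P(range(2, 60, 2))))        # partial sums for the even integers, approaching 1/3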

Claim. A discrete probability space cannot contain an infinite sequence A_1, A_2, ... of independent events each of probability 1/2.


Proof. Suppose Ω is such a space. Consider a number ω ∈ Ω. Then ω must lie in one of the following sets:

A_1 ∩ A_2,   A_1 ∩ A_2^C,   A_1^C ∩ A_2,   or   A_1^C ∩ A_2^C,

each of which has probability (1/2)·(1/2) = 1/4 by independence. Thus P(ω) ≤ 1/4. Likewise at the "nth" level we have a collection of intersections

A_1 ∩ ··· ∩ A_n,   A_1^C ∩ A_2 ∩ ··· ∩ A_n,   ...,   A_1^C ∩ ··· ∩ A_n^C

which partition Ω, and by independence each of these sets has probability 2^{−n}. Then ω must lie in one of these intersections, so again P(ω) ≤ 2^{−n}. Taking n → ∞, we conclude P(ω) = 0, but this is not possible in a discrete probability space, since we would have

P(Ω) = ∑_{ω∈Ω} P(ω) = ∑_{ω∈Ω} 0 = 0 ≠ 1.

This can be generalized as follows.

Claim. Suppose that 0 ≤ p_n ≤ 1, and put α_n = min{p_n, 1 − p_n}. If ∑_n α_n diverges, then no discrete probability space can contain independent events A_1, A_2, ... such that A_n has probability p_n.

Proof. As above, define at the nth level a collection of intersections

{B1 ∩ · · · ∩ Bn}

where B_i is a choice of either A_i or A_i^C. For a particular choice of the B_i, the intersection B = ⋂_{i=1}^n B_i has probability

P(B) ≤ ∏_{i=1}^n (1 − α_i)

by independence, since 1 − α_n corresponds to the maximum of {p_n, 1 − p_n}. However, notice that 1 − α_i ≤ e^{−α_i} for each i (since 1 − x ≤ e^{−x} for all x ≥ 0). Thus the above product is bounded by

∏_{i=1}^n (1 − α_i) ≤ ∏_{i=1}^n e^{−α_i} = e^{−∑_{i=1}^n α_i}.

Since the series in the exponent above is assumed to diverge, the term on the right approaches zero as n → ∞. This shows that P(B) → 0 as n gets large. In particular, any ω ∈ Ω must lie in such an intersection B at every level n, and so P(ω) = 0 for all ω ∈ Ω. As in the previous proof, this cannot happen if Ω is a discrete probability space. Hence no such sequence A_1, A_2, ... exists.


Proposition 2.1.3. (Properties of Probability Measures) (1) Monotonicity: if A ⊆ B then P (A) ≤ P (B). (2) Complement: for any A ⊂ Ω, P (AC ) = 1 − P (A). (3) Inclusion-exclusion: P (A ∪ B) = P (A) + P (B) − P (A ∩ B). Proof. (1) Suppose A ⊆ B. Then

P (B) = P ((B ∩ AC ) ∪ A) = P (B ∩ AC ) + P (A) ≥ P (A).

(2) was proven in remark (c). (3) Consider A ∪ B = A ∪ (B ∩ AC ). Then P (A ∪ B) = P (A) + P (B ∩ AC ). Also note that B = (B ∩ AC ) ∪ (A ∩ B), so that P (B) = P (B ∩ AC ) + P (A ∩ B). Putting this together gives us

P (A ∪ B) + P (A ∩ B) = P (A) + P (B ∩ AC ) + P (A ∩ B) = P (A) + P (B).

Rearranging yields P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Part (3) of Proposition 2.1.3 is actually a special case of the more general inclusion-exclusion principle:

Proposition 2.1.4 (The Inclusion-Exclusion Principle). For any finite collection of subsets {A_k}_{k=1}^n,

P(⋃_{k=1}^n A_k) = ∑_i P(A_i) − ∑_{i<j} P(A_i ∩ A_j) + ∑_{i<j<k} P(A_i ∩ A_j ∩ A_k) − ··· + (−1)^{n+1} P(A_1 ∩ ··· ∩ A_n).

Proof. We induct on n; the base case n = 2 is part (3) of Proposition 2.1.3. For the inductive step,

P(⋃_{i=1}^{n+1} A_i) = P( (⋃_{i=1}^n A_i) ∪ A_{n+1} )
  = P(⋃_{i=1}^n A_i) + P(A_{n+1}) − P( (⋃_{i=1}^n A_i) ∩ A_{n+1} )

by the base case. In particular,

P( (⋃_{i=1}^n A_i) ∩ A_{n+1} ) = P( ⋃_{i=1}^n (A_i ∩ A_{n+1}) )
  = ∑_{1≤i≤n} P(A_i ∩ A_{n+1}) − ∑_{1≤i<j≤n} P(A_i ∩ A_j ∩ A_{n+1}) + ···

by the inductive hypothesis applied to the n sets A_i ∩ A_{n+1}.


Now substituting this into the formula above for P(⋃_{i=1}^{n+1} A_i) and collecting terms gives the desired formula for n + 1. Therefore by induction, the inclusion-exclusion principle holds for all n.
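A brute-force numerical check of the inclusion-exclusion formula in Python (our own illustration), using the uniform measure P(A) = |A|/|Ω| on a small finite Ω:

from fractions import Fraction
from itertools import combinations
import random

random.seed(1)
omega = set(range(20))
P = lambda A: Fraction(len(A), len(omega))      # uniform probability measure
sets = [set(random.sample(sorted(omega), 8)) for _ in range(4)]

lhs = P(set().union(*sets))                     # P(A_1 u ... u A_4)
rhs = Fraction(0)
for r in range(1, len(sets) + 1):
    for combo in combinations(sets, r):
        rhs += (-1) ** (r + 1) * P(set.intersection(*combo))
print(lhs == rhs, lhs)                          # True, and the common value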

∞ Theorem 2.1.5. Let P be a probability measure on a field F and suppose {Ak}k=1 is a countable collection of subsets belonging to F.

(1) Continuity from below: if A_1 ⊂ A_2 ⊂ A_3 ⊂ ··· and their union A = ⋃_{k=1}^∞ A_k lies in F, then lim_{n→∞} P(A_n) = P(A).

(2) Continuity from above: if A_1 ⊃ A_2 ⊃ A_3 ⊃ ··· and their intersection A = ⋂_{n=1}^∞ A_n lies in F, then lim_{n→∞} P(A_n) = P(A).

(3) Countable subadditivity: if ⋃_{k=1}^∞ A_k ∈ F, then

P(⋃_{k=1}^∞ A_k) ≤ ∑_{k=1}^∞ P(A_k).

Proof. (1) By monotonicity, lim_{n→∞} P(A_n) exists (and is ≤ 1). Set B_1 = A_1 and define B_k = A_k − A_{k−1}. Then the B_k are disjoint, and each A_n, along with A, can be written as a disjoint union:

A_n = ⋃_{k=1}^n B_k   and   A = ⋃_{k=1}^∞ B_k.

By countable additivity,

P(A) = ∑_{k=1}^∞ P(B_k) = lim_{n→∞} ∑_{k=1}^n P(B_k) = lim_{n→∞} P(A_n).

(2) Suppose A_1 ⊃ A_2 ⊃ A_3 ⊃ ···. Then the complementary sequence is ascending: A_1^C ⊂ A_2^C ⊂ A_3^C ⊂ ···, with union A^C, and since F is a field, ⋂ A_k ∈ F implies (⋂ A_k)^C = ⋃ A_k^C ∈ F as well. Hence by part (1) and property (2) from before,

P(A) = 1 − P(A^C) = 1 − lim_{n→∞} P(A_n^C) = lim_{n→∞} (1 − P(A_n^C)) = lim_{n→∞} P(A_n).

(3) omitted.


2.2 The Lebesgue Measure on the Unit Interval

In this section we explore an important measure on our favourite space, Ω = (0, 1]. Consider the field B_0 on this space; recall that this is the set of finite unions of intervals in (0, 1], which by closure also contains finite intersections and complements. We define a function λ : B_0 → R by λ((a, b]) = b − a and extend by additivity for finite, disjoint unions of intervals:

λ( ⋃_{i=1}^n (a_i, b_i] ) = ∑_{i=1}^n λ((a_i, b_i]).

As with other functions that assign values to disjoint unions of intervals, we must check that λ is well-defined.

Claim. If A = ⋃_{i=1}^n I_i = ⋃_{j=1}^m J_j are two representations of A as finite disjoint unions of intervals, then ∑_{i=1}^n λ(I_i) = ∑_{j=1}^m λ(J_j).

Proof. Notice that each I_i ⊂ A, so we can write these intervals as disjoint unions of their intersections with the J_j's:

I_i = ⋃_{j=1}^m (I_i ∩ J_j).

We can replace the I_i's in the union above to obtain

∑_{i=1}^n |I_i| = ∑_{i=1}^n ∑_{j=1}^m |I_i ∩ J_j|.

By the same logic the Jj’s can be decomposed into their intersections with each of the Ii’s, and because both summations are finite we may reverse the order. This implies the claim.

Definition. The function λ : B0 → R on the set Ω is called the Lebesgue measure. Theorem 2.2.1. The Lebesgue measure is a probability measure.

Proof. It suffices to show countable additivity. Suppose {A_k}_{k=1}^∞ is a disjoint collection of sets in B_0 and A = ⋃_{k=1}^∞ A_k ∈ B_0. This means that each A_k = ⋃_{j=1}^{m_k} J_{kj} for disjoint intervals J_{kj}, and A = ⋃_{i=1}^n I_i where the I_i are also disjoint intervals in B_0. By definition of the Lebesgue measure,

λ(A) = ∑_{i=1}^n |I_i|

and I_i = ⋃_{k=1}^∞ ⋃_{j=1}^{m_k} (I_i ∩ J_{kj}) for all i, since I_i ⊂ ⋃_{k=1}^∞ A_k. Everything in this equation is an interval, so by Theorem 1.2.7,

|I_i| = ∑_{k=1}^∞ ∑_{j=1}^{m_k} |I_i ∩ J_{kj}|.


Therefore,

λ(A) = ∑_{i=1}^n ∑_{k=1}^∞ ∑_{j=1}^{m_k} |I_i ∩ J_{kj}|
  = ∑_{k=1}^∞ ∑_{j=1}^{m_k} ( ∑_{i=1}^n |I_i ∩ J_{kj}| )   since the triple sum converges
  = ∑_{k=1}^∞ ∑_{j=1}^{m_k} |J_{kj}| = ∑_{k=1}^∞ λ(A_k).

Hence λ is a probability measure on B0.

2.3 Extension to σ-fields

We want to extend the Lebesgue measure to a measure on the σ-field B – and in general any probability measure on a field F – so that we can measure sets like N and N^C. The following is perhaps the single most important construction in all of measure theory.

Theorem 2.3.1. Suppose F_0 is a field and P is a probability measure on F_0. Then there exists a unique extension Q of P such that Q is a probability measure on F = σ(F_0) and Q|_{F_0} = P.

Notice that if we apply this theorem to λ on B0, we obtain a probability measure on the Borel σ-field B; this is also called the Lebesgue measure on (0, 1] and is denoted λ. This is consistent with the literature and there should be no confusion as long as context is understood. For the remainder of the chapter we construct such an extension Q and prove that it has the desired properties in the statement of the theorem. Starting with P and F0, we define Definition. For any subset A ⊂ Ω, the P -outer measure of A is

P*(A) = inf { ∑_{k=1}^∞ P(A_k) | A_k ∈ F_0 for all k and ⋃_{k=1}^∞ A_k ⊃ A }.

It turns out that P* satisfies almost every axiom of a probability measure, except that it is not countably additive – however, it is countably subadditive. We will prove this and several other properties of P* in a moment.

Definition. For a set A ⊂ Ω, the P -inner measure of A is defined as

P_*(A) = sup { ∑_{k=1}^∞ P(B_k) | B_k ∈ F_0 for all k and ⋃_{k=1}^∞ B_k ⊂ A }.


One can rewrite the inner measure a bit using the field axioms:

P_*(A) = 1 − inf { ∑_{k=1}^∞ P(B_k) | B_k ∈ F_0, ⋃_{k=1}^∞ B_k ⊃ A^C } = 1 − P*(A^C) = P*(Ω) − P*(A^C).

The inner and outer measures clearly agree on sets in B_0; that is, if A ∈ B_0 then P*(A) = P_*(A) = P(A). In general, we want to say a set A is 'nice' in some way if P*(A) = P_*(A); in other words, P*(A) + P*(A^C) = P*(Ω). It turns out that this isn't quite strong enough; we need the following condition.

Definition. A ⊂ Ω is P*-measurable if for every E ⊂ Ω, P*(A ∩ E) + P*(A^C ∩ E) = P*(E). This is sometimes called the Carathéodory condition.

Proposition 2.3.2. Let P* be the P-outer measure for a probability measure P on F_0.

(1) P*(∅) = 0.

(2) (Positivity) P*(A) ≥ 0 for all A ⊂ Ω.

(3) (Monotonicity) If A ⊆ B then P*(A) ≤ P*(B).

(4) (Countable subadditivity) For {A_n}_{n=1}^∞, P*(⋃_{n=1}^∞ A_n) ≤ ∑_{n=1}^∞ P*(A_n).

Proof. (1) – (3) are obvious from the definition. To prove (4), note that for any ε > 0 and each n,

P*(A_n) = inf { ∑_{k=1}^∞ P(B_k) | B_k ∈ F_0, ⋃_k B_k ⊃ A_n } ≥ ( ∑_{k=1}^∞ P(B_{nk}) ) − ε/2^n

for some choice of {B_{nk}}_{k=1}^∞. Then ⋃_{n=1}^∞ ⋃_{k=1}^∞ B_{nk} ⊃ ⋃_{n=1}^∞ A_n, so

P*(⋃_{n=1}^∞ A_n) ≤ ∑_{n=1}^∞ ∑_{k=1}^∞ P(B_{nk})
  = ∑_{n=1}^∞ [ ( ∑_{k=1}^∞ P(B_{nk}) ) − ε/2^n ] + ε
  ≤ ( ∑_{n=1}^∞ P*(A_n) ) + ε.

Since ε > 0 was arbitrary, we can take ε → 0, which preserves the inequality. This proves that P* is countably subadditive.


Note that property (4) implies P*(E) ≤ P*(A ∩ E) + P*(A^C ∩ E) for all choices of A, E ⊂ Ω, so in the future we need only check the reverse inequality to prove that the Carathéodory condition holds.

Remark. It is clear from the definition of outer measure that for the Lebesgue measure λ, λ∗(A) = 0 ⇐⇒ A is a negligible subset of Ω. This finally proves the Strong Law of Large Numbers (1.2.1) rigorously, since Borel’s normal number theorem (1.2.5) established that N C was negligible.

We now prove a series of lemmata that will construct the extension Q on σ(F0). Let M = {A ⊂ Ω | A is P ∗-measurable}.

Lemma 2.3.3. M is a field.

Proof. (1) Note that P ∗(Ω ∩ E) + P ∗(ΩC ∩ E) = P ∗(E) + P ∗(∅) = P ∗(E) so Ω ∈ M. (2) Suppose A ∈ M. Then for all E ⊂ Ω,

P ∗(E) = P ∗(A ∩ E) + P ∗(AC ∩ E) = P ∗((AC )C ∩ E) + P ∗(AC ∩ E).

So AC ∈ M. (3) It suffices to prove that A, B ∈ M =⇒ A ∪ B ∈ M and induct. Let E ⊂ Ω. Then

P*(E) = P*(B ∩ E) + P*(B^C ∩ E)
  = P*(A ∩ (B ∩ E)) + P*(A^C ∩ (B ∩ E)) + P*(A ∩ (B^C ∩ E)) + P*(A^C ∩ (B^C ∩ E))   since A ∈ M
  ≥ P*( (A ∩ B ∩ E) ∪ (A^C ∩ B ∩ E) ∪ (A ∩ B^C ∩ E) ) + P*(A^C ∩ B^C ∩ E)   by subadditivity
  = P*((A ∪ B) ∩ E) + P*((A ∪ B)^C ∩ E)   by DeMorgan.

By the above remark, this shows A ∪ B ∈ M.

Lemma 2.3.4. Suppose {A_i}_{i=1}^∞ ⊂ M, the A_i are disjoint, and E ⊂ Ω. Then

P*( E ∩ (⋃_{i=1}^∞ A_i) ) = ∑_{i=1}^∞ P*(E ∩ A_i).

Proof. Suppose first that we have a finite collection {A_i}_{i=1}^n. It suffices to show the property for A_1, A_2 (i.e. n = 2) and induct. Since A_1, A_2 and A_1 ∪ A_2 are in M, we have

P*(E) = P*(E ∩ (A_1 ∪ A_2)) + P*(E ∩ (A_1 ∪ A_2)^C)
  = P*(E ∩ A_1) + P*(E ∩ A_1^C)
  = P*(E ∩ A_2) + P*(E ∩ A_2^C).


Combining the first two equations gives us

P*(E ∩ (A_1 ∪ A_2)) = P*(A_1 ∩ E) + P*(E ∩ A_1^C) − P*(E ∩ (A_1 ∪ A_2)^C)
  = P*(E ∩ A_1) + P*((E ∩ A_1^C) ∩ A_2) + P*((E ∩ A_1^C) ∩ A_2^C) − P*(E ∩ (A_1^C ∩ A_2^C))
  = P*(E ∩ A_1) + P*(E ∩ (A_1^C ∩ A_2))
  = P*(E ∩ A_1) + P*(E ∩ A_2),

since A_2 ⊂ A_1^C (by disjointness). After induction, we have

P*( E ∩ (⋃_{i=1}^n A_i) ) = ∑_{i=1}^n P*(E ∩ A_i).

Further, E ∩ (⋃_{i=1}^n A_i) ⊂ E ∩ (⋃_{i=1}^∞ A_i) for any finite n, so by monotonicity,

P*( E ∩ (⋃_{i=1}^∞ A_i) ) ≥ P*( E ∩ (⋃_{i=1}^n A_i) ) = ∑_{i=1}^n P*(E ∩ A_i).

Now taking n → ∞ preserves the inequality, so we have

P*( E ∩ (⋃_{i=1}^∞ A_i) ) ≥ ∑_{i=1}^∞ P*(E ∩ A_i).

For the other inequality, subadditivity gives us

P*( E ∩ (⋃_{i=1}^∞ A_i) ) = P*( ⋃_{i=1}^∞ (E ∩ A_i) ) ≤ ∑_{i=1}^∞ P*(E ∩ A_i).

Lemma 2.3.5. M is a σ-field and P*|_M is countably additive.

Proof. Suppose {A_i}_{i=1}^∞ are disjoint M-sets and set A = ⋃_{i=1}^∞ A_i. For any n ≥ 1, denote F_n = ⋃_{i=1}^n A_i. By Lemma 2.3.3, F_n ∈ M for each n, so

P*(E) = P*(E ∩ F_n) + P*(E ∩ F_n^C)
  ≥ P*(E ∩ F_n) + P*(E ∩ A^C)   by monotonicity
  = ∑_{i=1}^n P*(E ∩ A_i) + P*(E ∩ A^C).


Letting n → ∞, we obtain

P*(E) ≥ ∑_{i=1}^∞ P*(E ∩ A_i) + P*(E ∩ A^C)
  ≥ P*( ⋃_{i=1}^∞ (E ∩ A_i) ) + P*(E ∩ A^C)   by subadditivity
  = P*( E ∩ (⋃_{i=1}^∞ A_i) ) + P*(E ∩ A^C)
  = P*(E ∩ A) + P*(E ∩ A^C).

Hence A ∈ M so M is a σ-field. Now, letting E = Ω we have

1 = P*(A) + P*(A^C) = ∑_{i=1}^∞ P*(A_i) + P*(A^C)

by the above inequalities, which are now equalities. Subtracting P*(A^C) from both sides gives P*(A) = ∑_{i=1}^∞ P*(A_i), so P*|_M is countably additive.

Lemma 2.3.6. F_0 ⊂ M.

Proof. Let A ∈ F_0, E ⊂ Ω, and ε > 0, and choose A_n ∈ F_0 such that E ⊂ ⋃_{n=1}^∞ A_n and ∑_{n=1}^∞ P(A_n) ≤ P*(E) + ε. Let B_n = A_n ∩ A and C_n = A_n ∩ A^C, which are all in F_0 since F_0 is a field. Then E ∩ A ⊂ ⋃_{n=1}^∞ B_n and E ∩ A^C ⊂ ⋃_{n=1}^∞ C_n, so

P*(E ∩ A) + P*(E ∩ A^C) ≤ ∑_{n=1}^∞ P(B_n) + ∑_{n=1}^∞ P(C_n) = ∑_{n=1}^∞ (P(B_n) + P(C_n))
  = ∑_{n=1}^∞ P(B_n ∪ C_n)   by finite additivity
  = ∑_{n=1}^∞ P(A_n) ≤ P*(E) + ε.

Taking ε → 0, we have P ∗(E ∩ A) + P ∗(E ∩ AC ) ≤ P ∗(E). The other direction follows from subadditivity. Hence A is P ∗-measurable.

Lemma 2.3.7. P*|_{F_0} = P.

Proof. Clearly any set A ∈ F_0 covers itself, so

P*(A) = inf { ∑_{n=1}^∞ P(A_n) : {A_n} covers A } ≤ P(A).


For the other inequality, suppose {A_n} is any cover of A by F_0-sets. We will show that ∑_{n=1}^∞ P(A_n) ≥ P(A). Consider

P(A) ≤ P(⋃_{n=1}^∞ A_n)   by monotonicity
  ≤ ∑_{n=1}^∞ P(A_n)   by subadditivity
⟹ P*(A) = inf { ∑_{n=1}^∞ P(A_n) : {A_n} covers A } ≥ P(A).

∗ Together this shows that P (A) = P (A) for all A ∈ F0.

At this point we have a countably additive measure P* on a σ-field M ⊃ σ(F_0) for which P*|_{F_0} = P. This completes the existence portion of Theorem 2.3.1.

2.4 π-systems and λ-systems

For uniqueness, we will prove the following result and apply it to the σ-field F.

Theorem 2.4.1. Let P be a π-system (defined below) and suppose P_1 and P_2 are probability measures on σ(P) satisfying P_1|_P = P_2|_P. Then P_1 = P_2.

Definition. A π-system is a collection P ⊂ P(Ω) for which A1,A2 ∈ P =⇒ A1 ∩ A2 ∈ P.

Definition. A λ-system is a collection L ⊂ P(Ω) satisfying (1) Ω ∈ L.

(2) A ∈ L =⇒ AC ∈ L.

(3) If {A_i}_{i=1}^∞ are disjoint sets in L then ⋃_{i=1}^∞ A_i ∈ L.

Notice that a λ-system is almost a σ-field – the disjointness requirement in (3) is a key difference. Further, a collection that is both a λ-system and a π-system is automatically a σ-field. To see this, let {A_i}_{i=1}^∞ be a collection of not necessarily disjoint sets in such a system. Then by setting B_i = A_i ∖ ⋃_{j=1}^{i−1} A_j, we see that the B_i have the same union as the A_i and are disjoint.

Remark. Notice that an equivalent condition to (2), given that (1) and (3) are true, is: A_1, A_2 ∈ L and A_1 ⊂ A_2 ⟹ A_2 ∖ A_1 ∈ L.

Theorem 2.4.2 (Dynkin's π-λ Theorem). Suppose P is a π-system and L is a λ-system with P ⊂ L. Then σ(P) ⊂ L.


Proof. Define L_0 to be the λ-system generated by P, i.e. the smallest λ-system containing P. Then P ⊂ L_0 ⊂ L. We will show that σ(P) ⊂ L_0, which implies the result. We do this by showing L_0 is a π-system; by the comments above this will mean L_0 is a σ-field. Define L_A = {B ∈ L_0 | B ∩ A ∈ L_0} for a set A ⊂ Ω. First assume that A ∈ L_0. Under this hypothesis we can show that L_A is a λ-system:

(1) Ω ∩ A = A ∈ L_0, so Ω ∈ L_A.

(2) Suppose B_1, B_2 ∈ L_A and B_1 ⊂ B_2. Then A ∩ (B_2 ∖ B_1) = (A ∩ B_2) ∖ (A ∩ B_1), and we notice that A ∩ B_2 and A ∩ B_1 are both in L_0. Since L_0 is a λ-system, the whole expression above is in L_0. Hence B_2 ∖ B_1 ∈ L_A, so (2) holds by the remark.

(3) Suppose {B_i}_{i=1}^∞ are disjoint elements of L_A. Then A ∩ (⋃_{i=1}^∞ B_i) = ⋃_{i=1}^∞ (A ∩ B_i) and A ∩ B_i ∈ L_0 for each i, so because L_0 is a λ-system, ⋃_{i=1}^∞ (A ∩ B_i) ∈ L_0. Thus ⋃_{i=1}^∞ B_i ∈ L_A.

Now suppose A, B ∈ P, which implies A ∩ B ∈ P since P is a π-system. That means that if A ∈ P then L_A ⊃ P, and moreover L_A is a λ-system containing P, so L_A ⊃ L_0. Therefore if A ∈ P and B ∈ L_0 then A ∩ B ∈ L_0. Switching the roles of A and B, we can see that L_B ⊃ P if B ∈ L_0. Thus for every B ∈ L_0, L_B is a λ-system containing P, which implies L_B ⊃ L_0. Finally, for all A, B ∈ L_0, A ∩ B ∈ L_0, which shows L_0 is a π-system. Hence L_0 is a σ-field and the theorem is proved.

Now we can prove Theorem 2.4.1 and use it to prove the uniqueness statement of Theorem 2.3.1.

Proof. Let L = {A ∈ σ(P) | P_1(A) = P_2(A)}. Note that P ⊂ L by hypothesis. We will prove that L is a λ-system, which will imply L ⊃ σ(P) by the π-λ Theorem.

(1) Clearly Ω ∈ L.

(2) If A ∈ L then P_1(A^C) = 1 − P_1(A) = 1 − P_2(A) = P_2(A^C), so A^C ∈ L.

(3) If {A_n}_{n=1}^∞ are disjoint sets in L then

P_1(⋃_{n=1}^∞ A_n) = ∑_{n=1}^∞ P_1(A_n) = ∑_{n=1}^∞ P_2(A_n) = P_2(⋃_{n=1}^∞ A_n)

because P1 and P2 are both countably additive. Hence L is a λ-system and the result follows.


2.5 Monotone Classes

There is an alternate proof of the uniqueness portion of Theorem 2.3.1 which is stated in terms of monotone classes:

Definition. A subset M of P(Ω) is monotone if ∞ \ (1) For every sequence M1 ⊃ M2 ⊃ · · · of sets in M, Mn ∈ M. n=1

∞ [ (2) For every sequence M1 ⊂ M2 ⊂ · · · of sets in M, Mn ∈ M. n=1 Lemma 2.5.1. A monotone class which is a field is a σ-field.

∞ S∞ Proof. We just need to check that if {An}n=1 ⊂ M then n=1 An ∈ M. Define Bn = Sn S∞ S∞ k=1 Ak for each n ∈ N. Then B1 ⊂ B2 ⊂ · · · and n=1 Bn = n=1 An. Since M is a S∞ monotone class, (2) shows that n=1 Bn ∈ M. The following is an analog to Dynkin’s π-λ Theorem which allows one to finish the uniqueness proof in the same way as above.

Theorem 2.5.2 (Halmos’ Class Theorem). If F0 is a field and M is a monotone class containing F0, then M contains σ(F0).

Proof. First define m(F0) to be the monotone class generated by F0, that is the small- est monotone class containing F0. It suffices to show m(F0) is a field and then apply Lemma 2.5.1. (1) Ω ∈ m(F0) since Ω ∈ F0 ⊂ m(F0) by definition of a field. C (2) Suppose G = {A | A ∈ m(F0)}. Since the definition of monotone class is symmetric with respect to complements, we see G is a monotone class. Moreover, since F0 is a field, C G ⊃ F0 so by minimality of m(F0), G ⊃ m(F0). Thus A ∈ m(F0). ∞ (3) Define G1 = {A ∈ m(F0) | A ∪ B ∈ m(F0) for all B ∈ F0}. If {Ai}i=1 ⊂ G1 with Ai ⊂ Ai+1 for all i, then for any B ∈ F0,

∞ ! ∞ [ [ Ai ∪ B = (Ai ∪ B) i=1 i=1 and each of these pieces is in G1. Hence G1 is a monotone class. Now G1 ⊃ F0 since F0 is a field, so by Lemma 2.5.1 G1 must contain σ(F0). Set

G2 = {B ∈ m(F0) | A ∪ B ∈ m(F0) for all A ∈ m(F0)}.

For the same reason as above, G2 is a monotone class, and G2 ⊃ F0 since G1 ⊃ m(F0). This shows that G2 ⊃ m(F0) so for every A, B ∈ m(F0), A ∪ B is also in m(F0). Hence we conclude that m(F0) is a field, and applying Lemma 2.5.1 shows that m(F0) is a σ-field. By definition σ(F0) is the smallest σ-field containing F0), so this proves finally that M ⊃ σ(F0).

34 2.6 Complete Extensions 2 Probability Measures

Now given Ω = (0, 1] with the Lebesgue measure λ defined on B0, Theorem 2.3.1 says that there exists an extension of λ to B = σ(B0), what we are also calling the Lebesgue measure λ. Notice that I = {(a, b] : 0 ≤ a < b ≤ 1} is a π-system, and further, σ(I) = B because σ(I) ⊃ B0. Then by Theorem 2.4.1, λ is the only measure on Ω that restricts to the length function on the collection I of intervals. This means that whenever we make a choice of measure on (0, 1] and want it to coincide with the natural notion of length of an interval, we have no choice but to choose the Lebesgue measure.

Examples. ∞ 1 Suppose {ri}i=1 is an enumeration of Q ∩ (0, 1]. Let ε > 0 and for each i ∈ N, let ε ε  S∞ Ii = ri − 2i+1 , ri + 2i+1 ∩ (0, 1]. Then Ii is open so A = i=1 Ii is open and ∞ X ε λ(A) ≤ = ε 2i i=1 by subadditivity. On the other hand, λ(A) is clearly positive since A contains nonempty intervals of nonzero length. It turns out that AC is nowhere dense, but λ(B) ≥ 1 − ε, i.e. λ(B) is almost always 1.

2 Let An = {ω ∈ Ω | di(ω) = dn+i(ω) = d2n+i(ω) for all i = 1, . . . , n}. For example, an element of A5 might look like 0.010110101101011 01000110100 ... anything 2n S∞ 1 Then P (An) = (23)n for each n, and if A = n=1 An, λ(A) ≤ 3 . As in the previ- ous example, AC turns out to be nowhere dense, but with (relatively) large measure: C 2 λ(A ) = 3 .

2.6 Complete Extensions

We briefly discuss the notion of a complete extension of a probability space in this section. Definition. A probability space (Ω, F,P ) is said to be complete if whenever A and B are subsets of Ω with B ∈ F, A ⊂ B and P (B) = 0, then A ∈ F as well. Note that A ⊂ B and A ∈ F imply by subadditivity that P (A) = 0 as well. Example 2.6.1. Recall the σ-field M = {A ⊂ (0, 1] : A is P ∗-measurable}. The extension ∗ (Ω, M,P |M) of the probability space defined on the Borel set is complete because for any A ⊂ B in M with P (B) = 0 and E ⊂ Ω, P ∗(E ∩ A) + P ∗(E ∩ AC ) ≤ P ∗(E ∩ B) + P ∗(E ∩ AC ) by monotonicity ≤ 0 + P ∗(E ∩ AC ) = P ∗(E). Hence A ∈ M.

35 2.6 Complete Extensions 2 Probability Measures

On any probability space (Ω, F,P ), it is possible to enlarge the σ-field and extend P so that the resulting probability space is complete. In particular, we have

Definition. Given the probability space (Ω, B0, λ) where λ is the Lebesgue measure, the complete extension of λ onto M is called the Lebesgue σ-field. Any set A ∈ M is called a Lebesgue set.

There is a difference in preferences between probabilists and analysts on whether to work with B or its complete extension M. Measure theorists tend to prefer M because it offers completeness, whereas probabilists often switch measures on B0 and therefore do not want to lose the properties they are working with (recall that an extension depends on the underlying measure, so complete extensions are not unique).

36 2.7 Non-Measurable Sets 2 Probability Measures

2.7 Non-Measurable Sets

In this section we construct a non-measurable set. To really show the limits of σ-fields, we will produce a set that is not measurable in the Lebesgue σ-field M, which will then be a counterexample on B0 and B. Example 2.7.1. The sets used to deconstruct and reassemble the sphere in the Banach- Tarski Paradox are examples of non-measurable sets. As these are very technical to describe, we will detail a simpler set on (0, 1] below.

Suppose x, y ∈ (0, 1] and define x ⊕ y = (x + y) mod 1, by which we mean x ⊕ y is the result of adding the two numbers together and, if necessary, subtracting 1 so the sum lands back in the interval (0, 1]. For a set A and any x ∈ Ω, denote A ⊕ x = {a ⊕ x | a ∈ A}. Let L = {A ∈ B | λ(A ⊕ x) = λ(A) for all x ∈ Ω}, i.e. the translation-invariant sets in B. It is easy to verify that L is a λ-system. Also, L contains the collection I of intervals in Ω, so by Halmos’ Class Theorem, L ⊃ σ(I) = B. Hence Lebesgue measure is translation-invariant on B and by uniqueness, this implies λ is translation-invariant on M. We will construct a set that is not a Lebesgue set, thereby proving the existence of non- measurable sets. Define a relation ∼ on Ω by x ∼ y ⇐⇒ x − y ∈ Q. By the Axiom of Choice, there exists a set H ⊂ Ω such that H contains exactly one element from each equivalence class of ∼.

Claim. H is not in M.

Proof. If H were in M, we would have λ(H ⊕ ri) = λ(H) for all ri ∈ Q. Notice that by definition of ⊕, if i 6= j then (H ⊕ ri) ∩ (H ⊕ rj) = ∅, so if H ∈ M, H ⊕ ri must also be in S∞ M for all ri. Further, the H ⊕ ri partition Ω, so i=1(H ⊕ ri) = Ω and this implies

∞ X λ(H ⊕ ri) = λ(Ω) = 1 by countable additivity i=1 ∞ X =⇒ λ(H) = 1. i=1 But adding up the same value λ(H) infinitely many times either equals 0 (if λ(H) = 0) or ∞ (if λ(H) > 0), so this cannot possibly equal 1. Hence H is not an element of M.

37 3 Denumerable Probabilities

3 Denumerable Probabilities

3.1 Limit Inferior, Limit Superior and Convergence

Assume (Ω, F,P ) is a generic probability space for a σ-field F. We will assume in this chapter that all sets are F-sets. Our goal is to explore some more complicated concepts in probability with infinite sequences of subsets (events) in a probability space Ω.

∞ Proposition 3.1.1. Let {An}n=1 be a countable collection of sets in Ω. S∞ (1) If P (An) = 0 for all n then P ( n=1 An) = 0. T∞ (2) If P (Bn) = 1 for all n then P ( n=1 Bn) = 1. Proof. (1) is proven by subadditivity, and (1) =⇒ (2) by DeMorgan’s Law.

∞ Definition. For a sequence {An}n=1 of measurable sets in Ω, the limit superior and limit inferior of the An are

∞ ∞ ∞ ∞ \ [ [ \ lim sup An = Am and lim inf An = Am. n→∞ n→∞ n=1 m=n n=1 m=n

If there exists A ∈ F such that lim sup An = lim inf An = A, we say the An converge to A.

Remark. Notice that x ∈ lim sup An if for every n ∈ N there is some m ≥ n such that x ∈ Am, or equivalently, if x lies in some subsequence of {An}. It is sometimes said that such an x lies in the collection {An} infinitely often (i.o.). On the other hand, x ∈ lim inf An if there exists an n ∈ N such that for every m ≥ n, x ∈ Am, or alternatively if x lies in every Am beyond some cutoff An. It is also said that x lies in all but finitely many of the An. For fixed n,

∞ ∞ ∞ ∞ ∞ [ \ \ [ \ Am ⊃ Am =⇒ Am ⊃ Am m=n m=n n=1 m=n m=n ∞ ∞ ∞ ∞ \ [ [ \ =⇒ Am ⊃ Am. n=1 m=n n=1 m=n

Hence lim sup An ⊃ lim inf An, just as is the case with the numerical lim sup and lim inf.

Example 3.1.2. For ω ∈ Ω, define the value `n(ω) to be the length of the run of heads in a sequence of coin tosses starting at the nth flip. Explicitly,

`n(ω) = {k | dn+k(ω) = 1 and dn+i(ω) = 0 for i = 0, . . . , k − 1}.

38 3.1 Limit Inferior, Limit Superior and Convergence 3 Denumerable Probabilities

Notice that for any nonzero values of k and r,

k+1 Y 1 1 P [` (ω) = k] = = n 2 2k+1 i=1 ∞ X 1 1 and P [` (ω) ≥ r] = = . n 2k+1 2r k=r

Set An = {ω | `n(ω) ≥ r}. Then for any ω ∈ Ω,

∞ ∞ \ [ ω ∈ An infinitely often ⇐⇒ ω ∈ lim sup An = Am n→∞ n=1 m=n

⇐⇒ for every n there is some m ≥ n such that ω ∈ Am

⇐⇒ there is an infinite subsequence {Am} containing ω.

∞ Theorem 3.1.3. For any collection of sets {An}n=1,

(1) P (lim inf An) ≤ lim inf P (An) ≤ lim sup P (An) ≤ P (lim sup An).

(2) If lim An exists, the above inequalities are equalities.

Proof. Note that (2) follows from (1) since if a limit exists, it equals lim inf An and lim sup An. The middle inequality in (1) is obvious. Further, the third inequality is obtained by taking complements in the first inequality, so it suffices to prove the latter. Note that if Bn = T∞ m=n Am so that ∞ ∞ ! ∞ [ \ [ lim inf An = Am = Bn, n=1 m=n n=1 then Bn ⊂ An for all n. So by monotonicity, P (Bn) ≤ P (An). Moreover, the Bn are ascend- ing and their limit is lim inf An. By continuity from below (Theorem 2.1.5), lim P (Bn) = P (lim inf An) so together this gives us P (lim inf An) = lim P (Bn) ≤ lim inf P (An). Remark. Theorem 3.1.3 only holds in the finite measure case, that is, when P (ω) < ∞. When Ω is a space of infinite measure, we cannot take complements and subtract the prob- ability from 1, so the arguments above – including continuity from below – do not apply.

In Example 3.1.2, we can apply Theorem 3.1.3 to see that 1 P [ω | ` (ω) ≥ r i.o.] = P (lim sup A ) ≥ lim sup P (A ) = . n n n 2r

39 3.2 Independence 3 Denumerable Probabilities

3.2 Independence

Definition. Suppose P (A) > 0. Then the conditional probability of B given A is P (B ∩ A) P (B | A) = . P (A) This definition can be rearranged as P (A ∩ B) = P (A)P (B | A) and inducted on to produce the formula

P (A ∩ B ∩ C) = P (A)P (B | A)P (C | A ∩ B), and so on.

∞ Proposition 3.2.1. If {An}n=1 partition Ω then for any B ∈ Ω, ∞ X P (B) = P (B | An). n=1 S∞ Proof sketch. Since B may be expressed as a disjoint, countable union B = n=1(B ∩ An), use countable additivity to produce the result. Definition. Two events A, B ∈ Ω are said to be independent if P (A ∩ B) = P (A)P (B). If P (A),P (B) > 0, the definition of independence is equivalent to

P (B) = P (B | A) and P (A) = P (A | B).

n Definition. A (finite) collection {Ai}i=1 is an independent collection if for any subcol- m lection {Akj }j=1, m m ! Y \ P (Akj ) = P Akj . j=1 j=1 Example 3.2.2. Note that simple pairwise independence is not the same as independence on the whole collection. For instance, define Buv = {ω | du(ω) = dv(ω)}. Then B12,B13 and B23 are pairwise independence events, but the collection of all three is not independent. Remarks. 1 For a collection of n sets, there are 2n − n − 1 possible conditions on those sets.

2 Any subcollection of a collection of independent events is also independent.

3 Independence is invariant under reordering of the sets in a collection.

4 We say an arbitrary collection of sets {Aθ}θ∈Θ is independent if and only if every finite subcollection is independent.

Definition. Given two collections A1 and A2, we say these are independent collections, or A1 is independent of A2, if for every A1 ∈ A1 and A2 ∈ A2, P (A1 ∩ A2 = P (A1)P (A2). n Similarly, a collection of collections {Ak}k=1 is independent if A1,...,An are independent whenever Ak ∈ Ak. This can be extended to arbitrary collections in the same way as before.

40 3.2 Independence 3 Denumerable Probabilities

Example 3.2.3. Let Hn = {ω | dn(ω) = 0}, which represents the event of getting heads ∞ on the nth coin flip (if we identify heads with 0 and tails with 1). Then {Hn}n=1 is an independent collection of events. Furthermore, if we define A1 = {H2k+1 | k ∈ N} the collection of heads on odd flips and A2 = {H2k | k ∈ N} the collection of heads on even flips, then A1 and A2 are independent.

n Theorem 3.2.4. Suppose {Ak}k=1 is an independent collection of collections, where Ak is ∞ a π-system for k = 1, . . . , n. Then {σ(Ak)}k=1 is independent.

Proof. For each k define Bk = Ak ∪ {Ω} which are all still π-systems. The fact that the Ak are independent is equivalent to

n n ! Y \ P (Bk) = P Bk for all Bk ∈ Bk. k=1 k=1

Fix B2 ∈ B2,...,Bn ∈ Bn and define

( n n !!) Y \ L = B ∈ F : P (B) P (Bk) = P B ∩ Bk . k=2 k=2

Clearly L contains B1. We claim that L is a λ-system.

(1)Ω ∈ B1 =⇒ Ω ∈ L. (2) For any B ∈ L,

n n C Y Y P (B ) P (Bk) = (1 − P (B)) P (Bk) k=2 k=2 n n Y Y = P (Ω) P (Bk) − P (B) P (Bk) k=2 k=2 n !! n !! \ \ = P Ω ∩ Bk − P B ∩ Bk k=2 k=2 since Ω,B ∈ L n !! \ = P (Ω r B) ∩ Bk by additivity k=2 n !! C \ = P B ∩ Bk . k=2

Hence BC ∈ L.

(3) follows similarly, again breaking A ∪ B into sets in L and using additivity.

41 3.2 Independence 3 Denumerable Probabilities

Hence L is a λ-system, so by the π-λ theorem, L ⊃ σ(B1). This shows that σ(B1) is independent of B2,..., Bn. Now, since σ(B1) is a π-system, a similar argument can be made by fixing B1 ∈ σ(B1),B3 ∈ B3,...,Bn ∈ Bn to show that {σ(B1), σ(B2), B3,..., Bn} are independent. Repeating shows that {σ(Bk)} are independent, but notice that σ(Ak) = σ(Bk) for all k, so we’re done.

Corollary 3.2.5. If {Aθ}θ∈Θ is an independent, arbitrary collection of π-systems then {σ(Aθ)}θ∈Θ is also independent. Corollary 3.2.6. Given a matrix of events

A11 A12 A13 ··· A21 A22 A23 ··· ...... such that the collection {Aij}(i,j)∈N2 is independent, define Fi to be the σ-field of the events in the ith row. Then the collection of Fi is an independent collection.

Proof. Let Ai be the collection of finite intersections of events in the ith row. This is a π- system by construction. Clearly σ(Ai) ⊂ Fi, and since a single set is a (trivial) intersection, Fi ⊂ σ(Ai) as well, which shows σ(Ai) = Fi for each i. Now we verify that the Ai are independent. Given B1 ∈ Ak1,...,Bj ∈ Akj, each one is a finite intersection of the Aij so we have j j m j m ! j ! Y Y Yi \ \i \ P (Bi) = P (Air) = P Air = P Bi . i=1 i=1 r=1 i=1 r=1 i=1

Therefore Theorem 3.2.4 applies to give us independence of the Fi. Examples. 1 For the even/odd coin flips, we have a matrix

H1 H3 H5 ··· H2 H4 H6 ··· which is independent by Example 3.2.3. By Corollary 3.2.6, the σ-fields of the two rows are independent, but these just correspond to the even and odd flips. What this says is that any event constructed with the odd-numbered coin flips is independent of any other event constructed with the even flips.

2 Recall Buv = {ω | du(ω) = dv(ω)}. The collections A1 = {B12,B13} and A2 = {B23} are independent, but we know their union is not, so σ(A1) and σ(A2) must not be independent. This suggests that some condition of Theorem 3.2.4 fails to be met – in fact, A1 is not a π-system.

3 Suppose A = {Ai} is a partition of Ω and for some set B ∈ F, P (B | Ai) = p for each i such that P (Ai) > 0. Then P (B) = p by Proposition 3.2.1 and B = {B} is independent of σ(A).

42 3.3 Subfields 3 Denumerable Probabilities

3.3 Subfields

It is common in probability theory to only have partial information about an event. This can mean several things, such as knowing some class of sets the event lies in or knowing conditions on the probability of the event based on other events. For a probability space (Ω, F,P ), such partial information information can be represented in terms of a subfield A ⊂ F where A is a σ-field (or sometimes just a field). If B ∈ F is independent of A, then P (B) = P (B ∩ A)P (A) for every A ∈ A. This shows that knowing information about P (A) does not necessarily tell you anything about P (B). 0 Define an equivalence relation ∼A on Ω by ω ∼A ω if and only if for every A ∈ A, 0 χA(ω) = χA(ω ), where χ denotes the characteristic function on a set: ( 1 if ω ∈ A χA(ω) = 0 if ω 6∈ A.

As an example of so-called ‘partial information’ in probability theory, even if we know ev- erything about the characteristic functions on a collection A, we still cannot distinguish between ω and ω0 that satisfy ω ∼ ω0.

Proposition 3.3.1. For a subfield A ⊂ F, ∼A and ∼σ(A) are the same equivalence relation.

0 Proof. It is clear that ∼σ(A) is a finer partition of Ω than ∼A, since A ⊂ σ(A). Fix ω, ω ∈ Ω. 0 0 Then Aω,ω0 = {A ∈ F : χA(ω) = χA(ω )} is a σ-field. Notice that if ω ∼A ω then Aω,ω0 ⊃ A 0 so by the π-λ theorem, Aω,ω0 ⊃ σ(A). Hence ω ∼σ(A) ω . Examples.

0 1 Consider A = {H2n}n∈N where H2n are as defined in Example 3.2.3. Then ω ∼A ω if and only if ω and ω0 have the same results on even coin flips, but this tells us nothing about the odd flips of either event.

2 Let A be the σ-field generated by countable and cocountable subsets of Ω, which is a subfield of the Borel σ-field B on Ω = (0, 1]. Then B is independent of A so we get practically no information about events in A; in fact, every countable set has measure 0 and every cocountable set has measure 1, but this does nothing to distinguish events. 0 0 On the other hand, A is generated by the singletons, so ω ∼A ω ⇐⇒ ω = ω . In this sense, we have all the information on events in A, paradoxically. The difference between these two situations, where we either have no information or complete information is that there is a measure present on B but not on A. The mathematics is actually contained in the statement of independent, and while it may be useful to think of σ- fields and subfields in terms of ‘information’, it is clear that this analogy breaks down in situations such as this.

43 3.4 The Borel-Cantelli Lemmas 3 Denumerable Probabilities

3.4 The Borel-Cantelli Lemmas

P∞ Lemma 3.4.1. If n=1 P (An) converges then P (lim sup An) = 0. S∞ Proof. Fix n. Then lim sup An ⊂ k=n Ak so by subadditivity we have

∞ X P (lim sup An) ≤ P (Ak). k=n But the tail of a convergent series tends to 0 as n gets big, so the right side of this inequality goes to 0. Hence P (lim sup An) = 0. The first Borel-Cantelli Lemma would have been useful at the end of the proof of Borel’s Normal Number Theorem (1.2.5); however, at the time we did not have the constructions required to describe this lemma. The second Borel-Cantelli Lemma is

P∞ ∞ Lemma 3.4.2. If n=1 P (An) diverges and the events {An}n=1 are independent, then P (lim sup An) = 1.

∞ ∞ ! \ [ Proof. We want to show that P (lim sup An) = P Ak = 1. Note that this is the n=1 k=n same as showing ∞ ∞ !C ∞ ∞ ! \ [ [ \ C P Ak = P Ak = 0, n=1 k=n n=1 k=n T∞ C  which will be true if P k=n Ak = 0 for all n ≥ 1. By independence, we have

∞ ! ∞ ∞ \ C Y C Y P Ak = P (Ak ) = (1 − P (Ak)) k=n k=n k=n ∞ Y ≤ e−P (Ak) since 1 − x ≤ e−x on (0, 1] k=n − P∞ P (A ) = e k=n k = 0

P∞ since k=n P (Ak) diverges. Examples.

1 Recall the funtion `n(ω) which denotes the length of the run of 0’s (heads) starting at the nth term in the sequence expression of ω. Let (rn) be a sequence of real numbers ∞ X 1 such that converges. Then P [ω : `n(ω) ≥ rn i.o.] = 0. To see this, note that 2rn n=1

P [ω : `n(ω) ≥ rn i.o.] = P (lim sup{ω | `n(ω) ≥ rn})

1 and we have computed P [ω : `n(ω) ≥ rn] = 2rn . Applying the first Borel-Cantelli Lemma gives the result.

44 3.4 The Borel-Cantelli Lemmas 3 Denumerable Probabilities

rn 1+ε P∞ −rn For ε > 0, set rn = (1 + ε) log2 n so that 2 = n . Then n=1 2 barely converges, but the first Borell-Cantelli Lemma says that

P [ω : `n(ω) ≥ (1 + ε) log2 n i.o.] = 0   ` (ω)   =⇒ P ω : lim sup n > 1 = 0. log2 n

2 Notice that the collection of {ω : `n(ω) = 0} = {ω : dn(ω) = 1} for all n are in- 1 dependent events, each with probability 2 . By the second Borel-Cantelli Lemma, P [ω : `n(ω) = 0 i.o.] = 1. On the other hand, define

An = {ω : `n(ω) = 1} = {ω : dn(ω) = 0, dn+1(ω) = 1}.

1 P∞ Then P (An) = 4 so n=1 P (An) diverges, but we cannot directly apply BC2 since ∞ the An are not independent. However, the collection {A2n}n=1 is independent with 1 P (A2n) = 4 for all n, so BC2 tells us P [ω | ω ∈ A2n i.o.] = 1. Further, we see that {ω | ω ∈ A2n i.o.} ⊆ {ω : `n(ω) = 1 i.o.} so by subadditivity, P [ω : `n(ω) = 1 i.o.] = 1 as well. In the same way, one can prove that P [ω : `n(ω) = k i.o.] = 1 for any k ∈ N.

P∞ 1 3 Suppose (r ) is a nondecreasing sequence such that r diveges. We claim that n n=1 rn2 n P∞ 1 P [ω : ` (ω) ≥ r i.o.] = 1. First, it is known that r diverges if and only if n n n=1 rn2 n P∞ 1 s diverges, where s = dr e. Thus without loss of generality we may assume n=1 sn2 n n n the rn are integers.

Define n1 = 1 and nk+1 = nk + rnk for each k ≥ 2. The events Ak = {ω : `nk (ω) ≥ rnk } P∞ for all k ∈ N are independent events. By BC2, P [ω | ω ∈ Ak i.o.] = 1 if k=1 P (Ak) diverges, but this series can be written

∞ ∞ X 1 X 1 r = 2 nk 2nk+1−nk k=1 k=1 ∞ X nk+1 − nk = rn rn 2 k k=1 k ∞ nk+1 X X 1 = rnk rnk 2 k=1 n=nk+1 ∞ nk+1 X X 1 ≥ r since the rn are nondecreasing rn2 n k=1 n=nk+1 ∞ X 1 = r 2n n=1 n

which diverges. Hence P [ω | ω ∈ Ak i.o.] = 1, and since {ω : `n(ω) ≥ rn i.o.} ⊇ lim sup Ak, we have shown that P [ω : `n(ω) ≥ rn i.o.] = 1.

45 3.4 The Borel-Cantelli Lemmas 3 Denumerable Probabilities

P∞ 1 P∞ 1 4 Let rn = log n as before. Then rn = diverges, so BC2 tells us 2 n=1 rn2 n=1 n log2 n that P [ω : `n(ω) ≥ rn i.o.] = 1, which implies

  ` (ω)   P ω : lim sup n ≥ 1 = 1 log2 n but before we showed that   ` (ω)   P ω : lim sup n > 1 = 0 log2 n (notice the strict inequality). By additivity, this implies that

  ` (ω)   P ω : lim sup n = 1 = 1. log2 n

∞ Definition. Suppose {An}n=1 ⊂ F and set

∞ \ T = σ(Ak,Ak+1,Ak+2,...). k=1

T is called the tail σ-field of the An. The most important theorem related to the tail σ-field is stated below. This is sometimes known as Kolmogorov’s 0-1 Law.

∞ Theorem 3.4.3 (Kolmogorov). Suppose {An}n=1 are independent and A ∈ T . Then either P (A) = 0 or P (A) = 1.

Proof. Clearly σ(A1), σ(A2), . . . , σ(An), σ(An+1,An+2,...) are independent for any fixed n since the Ak are independent (Theorem 3.2.4). So if A ∈ T then A ∈ σ(An+1,An+2,...). Thus A is independent of σ(A1), . . . , σ(An) for any n, which shows that A, A1,A2,... is an independent sequence of events. By Theorem 3.2.4, σ(A) and σ(A1,A2,...) are independent, but A ∈ T ⊆ σ(A1,A2,...) so we have shown that A is independent of itself. Finally,

A is independent of itself ⇐⇒ P (A) = P (A ∩ A) = P (A)P (A) ⇐⇒ P (A) = 0 or 1.

46 4 Simple Random Variables

4 Simple Random Variables

Definition. Given a probability space (Ω, F,P ), we say X :Ω → R is a simple random variable if

(1) X(Ω) is finite. This is the simple condition.

(2) For any x ∈ R, the set {ω : X(ω) = x} is an F-set. This is sometimes called the measurable condition.

The measurability condition (2) will allow us to integrate simple random variables over P -measurable sets. Of course, the sets {ω : X(ω) = x} need not be intervals. For example, consider the rational indicator function on Ω = [0, 1]: ( 1 if x ∈ X(ω) = Q 0 if x 6∈ Q.

R Q

[ ] I Ω

Example 4.0.1. Recall the functions dn, rn and sn introduced in Chapter 1 for the space of infinite sequences of coin flips (equivalently, dyadic representations of ω ∈ (0, 1]). Each of these is a simple random variable. On the other hand, the length function `n is a random variable because it’s measurable, but `n is not simple since its range is infinite. Proposition 4.0.2. X is a simple random variable ⇐⇒ there exists a finite collection of r Ai ∈ F such that {Ai}i=1 is a finite partition of Ω and there exist xi ∈ R such that for any ω ∈ Ω, X(ω) may be expressed in the form

r X X(ω) = xiχAi (ω). i=1 Proof sketch. The backward direction is easy. For the forward direction, suppose X is a −1 simple random variable. Let {x1, . . . , xr} = X(Ω). For i = 1, . . . , r, let Ai = X (xi). Then the Ai partition Ω.

Notice that the partition {Ai} need not be unique. Even more importantly, the xi may not be unique – they may even repeat so that in some cases Ai and Aj, i 6= j, have the same value under X. This is useful if we are comparing simple random variables X and Y and want to use the same partition of Ω for each.

47 4 Simple Random Variables

Definition. For a subfield G ⊂ F, we say a simple random variable X is G-measurable if {ω : X(ω) = x} ∈ G for every x ∈ R. Proposition 4.0.3. If X is G-measurable and H ⊂ R then {ω : X(ω) ∈ H} lies in G. [ Proof. Notice that {ω : X(ω) = H} = {ω : X(ω) = x} is a finite union. x∈H Proposition 4.0.3 can also be stated: if X is G-measurable then X−1(H) is measurable for every H ⊂ R. Define σ(X) to be the smallest σ-field G (equivalently, the intersection of all G) for which X is G-measurable. The next theorem characterizes σ(X) for collections of simple random variables.

Theorem 4.0.4. Suppose X1,...,Xn and Y are simple random variables on a probability n space (Ω, F,P ). Write X = (X1,...,Xn) so that for any ω ∈ Ω, X(ω) ∈ R . n −1 n (1) σ(X) = {{(X1(ω),...,Xn(ω)) ∈ H} | ω ∈ Ω,H ⊂ R } = {X (H) | H ⊂ R }. Moreover, the H may be taken as finite subsets of Rn. (2) Y is σ(X)-measurable if and only if there exists a function f : Rn → R such that Y = f(X) = f(X1,...,Xn). −1 Proof. (1) Let M = {X (H) | H ⊂ Rn}. We will show that M = σ(X). Consider a set −1 X (H) ∈ M. Clearly

r r n ! −1 [ −1 [ \ −1 X (H) = X (~x) = Xj (xj) , i=1 i=1 j=1

Tn −1 −1 where ~x = (x1, . . . , xn), and each j=1 Xj (xj) lies in σ(X), so X (H) ∈ σ(X). This shows M ⊂ σ(X). Additionally, M is a σ-field since (i)Ω= X−1(Rn) ∈ M. −1 −1 (ii) If A ∈ M then A = X (H) =⇒ AC = X (HC ) ∈ M.

∞ ∞ ∞ ! [ [ −1 −1 [ (iii) For Ai ∈ M, Ai = X (Hi) = X Hi which lies in M. i=1 i=1 i=1

Finally, for fixed i, {ω : Xi(ω) = xi} ∈ M but this is precisely the set

{ω : X(ω) ∈ R × R × · · · × {xi} × · · · × R} which lies in M. We have shown that M is a σ-field contained in σ(X) on which the Xi are measurable, so it follows that M = σ(X). (2) On one hand, suppose Y = f(X). Then {ω : Y (ω) = y} = {ω : X(ω) ∈ f −1(y)} ∈ M which lies in σ(X), so Y is σ(X)-measurable. On the other hand, if Y is σ(X)-measurable n then let Y (Ω) = {y1, . . . , yr}. By (1), there exist subsets H1,...,Hr ⊂ R such that for all 1 ≤ i ≤ r, {ω : Y (ω) = yi} = {ω : X(ω) ∈ Hi}. By construction the Hi are disjoint, and we r X can define f(X) = yiχHi (xi). This completes the proof. i=1

48 4.1 Convergence in Measure 4 Simple Random Variables

As a result of this theorem, functions of simple random variables are simple random variables, so in probability theory we can take simple random variables X and Y and form new simple random variables: X2, etX ,X + Y, log X, etc.

Examples.

1 Consider d1, . . . , dn for sequences ω of coin flips. Then dn 6∈ σ(d1, . . . , dn−1) which implies that dn is independent of the other di. These di are defined in this way so as to make different coin flips independent.

2 The Rademacher functions r1, . . . , rn and the cumulative Rademacher functions s1, . . . , sn generate the same σ-field:

σ(r1, . . . , rn) = σ(s1, . . . , sn). Pk This is because for each k, sk = i=1 ri and rk = sk − sk−1 so one can apply (2) of Theorem 4.0.4.

3 The characteristic function χA is G-measurable if and only if A ∈ G.

4.1 Convergence in Measure

∞ Suppose we have a sequence {Xn}n=1 of simple random variables and X a simple random variable to which the Xn might converge. We are interested in pointwise convergence .

Definition. The sequence (Xn) is said to converge to X pointwise almost everywhere h i if P ω : lim Xn(ω) = X(ω) = 1, that is, lim Xn ≡ X on all of Ω except possibly on a set n→∞ of measure zero.

In analytic terms, lim Xn(ω) = X(ω) ⇐⇒ for every ε > 0, there exists an N ∈ N such that for all n > N, |Xn(ω) − X(ω)| < ε. Thus for a given ω ∈ Ω, Xn does not converge to X(ω) if and only if there is some ε > 0 such that |Xn(ω) − X(ω)| ≥ ε infinitely often. 1 Because we want to utilize countable unions, we replace ε with m (these are equivalent by the Archimedean Principle). Then the set {ω : lim Xn(ω) = X(ω)} has complement S∞  1 m=1 ω : |Xn(ω) − X(ω)| ≥ m i.o. and its probability may be expressed as

∞ ! C h i [  1 P ω : lim Xn(ω) = X(ω) = P ω : |Xn(ω) − X(ω)| ≥ i.o. n→∞ m m=1 ∞ ! [  1 = P lim sup ω : |Xn(ω) − X(ω)| ≥ m m=1 ∞ X = 0 = 0. m=1 This suggests a different notion of convergence.

49 4.1 Convergence in Measure 4 Simple Random Variables

Definition. We say a sequence of simple random variables (Xn) converges in measure to X, denoted (Xn) →p X, if for every ε > 0,

lim P [ω : |Xn(ω) − X(ω)| ≥ ε] = 0. n→∞ We can see from the work above that pointwise convergence (a.e.) implies convergence in measure, so the latter is a weaker form of convergence. In several sections we will see that the notions of pointwise convergence (a.e.) and convergence in measure can be translated into generalized laws of large numbers, with the weaker convergence corresponding to the weak law.

Examples.

1 1 Define An = {ω : `n(ω) ≥ log2 n}. We proved that P (An) = n so this tends to 0 as n → ∞, showing (An) converges in measure to ∅. However, we proved in Example 4 of Section 3.4 that P [ω : ω ∈ An i.o.] = 1 using the Borel-Cantelli lemmas. So (An) clearly does not converge to ∅ pointwise. 2 The following sequence of simple random variables is sometimes called ‘the typewriter’ in measure theory.

f1 f2 1

f3 f4 f5 f6

1

Define the sequence beginning with

f1 = χ 1 f2 = χ 1 [0, 2 ] [ 2 ,1]

f3 = χ 1 f4 = χ 1 1 f5 = χ 1 3 f6 = χ 3 [0, 4 ] [ 4 , 2 ] [ 2 , 4 ] [ 4 ,1] etc.

Then (fn) →P 0, but for any x ∈ [0, 1], fn(x) diverges so we see that convergence in measure does not imply pointwise convergence.

50 4.2 Independent Variables 4 Simple Random Variables

4.2 Independent Variables

Definition. A sequence {X1,X2,...} of simple random variables is said to be independent n if σ(X1), σ(X2),... are independent, that is, if whenever H1,H2,... ⊂ R ,

P [ω : X1(ω) ∈ H1,...,Xn(ω) ∈ Hn] = P [ω : X1(ω) ∈ H1] · ... · P [ω : Xn(ω) ∈ Hn].

Example 4.2.1. Let Ω = Sn, the set of permutations of a set of n elements. Assign the 1 discrete probability P (ω) = n! for all ω ∈ Sn. Define a simple random variable ( 1 if position k is the last position in a cycle in ω Xk(ω) = 0 otherwise.

For example, if ω = (1 4 3)(2 5) then X3 = X5 = 1 and X1 = X2 = X4 = 0. For any n X ω ∈ Sn, define S(ω) = Xk(ω). Then S represents the number of disjoint cycles in a cycle k=1 decomposition of ω.

1 Claim. The Xk are independent with P [ω : Xk(ω) = 1] = n−k+1 .

This is easy to justify heuristically, but the details can be tricky. The idea is that X1 = 1 1 if and only if ω is a permutation fixing 1; the probability of this happening is n . If ω(1) = 1 then ω(2) is one of the remaining values 2, . . . , n and thus X2(ω) = 1 if and only if ω(2) = 2; 1 this happens with probability n−1 . On the other hand, if X1(ω) = 0, then ω(1) = i 6= 1, so that ω(i) is one of the values 1, . . . , i − 1, i + 1, . . . , n. Then X2(ω) = 1 if and only if 1 ω(i) = 1 which happens with probability n−1 . This argument can be continued to show that 1 P [ω : Xk(ω) = 1] = n−k+1 , showing the Xk are indeed independent. Definition. Let X be a simple random variable, the distribution of X is the probability measure µ defined for all subsets A ⊂ R by µ(A) = P [ω : X(ω) ∈ A].

The distribution µ is a discrete probability measure. If {x1, . . . , xn} are the distinct values X in the range of X then pi := µ({xi}) = P [ω : X(ω) = xi] so for any A ⊂ R, µ(A) = pi. In xi∈A particular, this shows that µ(R) = 1, but even better, if B is the range of X, then µ(B) = 1 so µ has finite support.

Theorem 4.2.2. If {µn} is a sequence of probability measures on the class of all subsets of R such that each µn has finite support, then there exists an independent sequence {Xn} of simple random variables on some probability space (Ω, F,P ) such that Xn has distribution µn. Proof. Let Ω = (0, 1], let F = B the Borel σ-field and let P = λ, the Lebesgue measure on F. We first consider the case where each µn has range {0, 1}. Set

pn = µn({0}) and qn = 1 − pn = µn({1}).

51 4.3 Expected Value and Variance 4 Simple Random Variables

Divide Ω = (0, 1] into two intervals I0 and I1, where |I0| = p1 and |I1| = q1. Define X1 by ( 0 ω ∈ I0 X1(ω) = 1 ω ∈ I1.

Since P is the Lebesgue measure, we have P [ω : X1(ω) = 0] = p1 and P [ω : X1(ω) = 1] = q1, so X1 has been constructed so that its distribution is µ1. Next, split I0 into intervals I00 and I01 of lengths p1p2 and p1q2; likewise split I1 into intervals I10 and I11 of lengths q1p2 and q1q2. Define the second simple random variable X2 by ( 0 ω ∈ I00 ∪ I10 X2(ω) = 1 ω ∈ I01 ∪ I11.

By construction, P [ω : X1 = X2 = 0] = p1p2 and similarly for the other three choices, so X1 and X2 are independent and X2 has distribution µ2. Repeat this process to define a sequence {Xn} of independent s.r.v.’s such that each Xn has distribution µn. In the general case, µ1 has finite support so instead of dividing Ω into two intervals, we divide the space into the number of intervals corresponding to the size of the range of µ1. The above proof is easily adapted to this setup. Example 4.2.3. A special case of the above construction is called a sequence of Bernoulli trials. Explicitly, Bernoulli trials are a sequence {Xn} of independent random variables satisfying P [ω : Xn(ω) = 1] = p and P [ω : Xn(ω) = 0] = 1 − p for all n. The dyadic interval 1 construction introduced in Chapter 1 are the intervals used to construct {Xn} for p = 2 .

4.3 Expected Value and Variance

Pn Definition. Consider a simple random variable X = i=1 xiχAi for xi ∈ R and Ai ⊂ R. The expected value or mean value of X is

n X E[X] := xiP (Ai). i=1

n Let {xi}i=1 be the range of X. Then the expected value can be written

n X E[X] = xiP [ω : X(ω) = xi]. i=1 This formulation shows that E[X] only depends on the distribution of X, so if for two simple random variables X,Y , P [ω : X(ω) = Y (ω)] = 1 then E[X] = E[Y ].

Examples.

1 −1 1 1 Suppose X = 4 χ(0,1/2] + 4 χ(1/2,1]. Another distribution for X is X = 4 χ(0,1/4] + 1 1 2 χ(1/4,1/2] − 4 χ(1/4,1].

52 4.3 Expected Value and Variance 4 Simple Random Variables

2 For A ⊆ R, recall that χA is a simple random variable. Then E[χA] = P (A).

3 Let f : R → R be a function. Recall from Theorem 4.0.4 that if X is a simple random variable, so is f(X). Then n X X E[f(X)] = f(xi)P (Ai) = f(x)P [ω : X(ω) = x] i=1 x where the last sum is over all x in the range of f(X). Definition. The kth moment of a simple random variable X is X E[Xk] = yP [ω : Xk(ω) = y] y where the sum is over all y in the range of Xk. As above, an alternate expression for the kth moment of X is n k X k E[X ] = xiP [ω : X (ω) = xi] i=1 n where {xi}i=1 is the range of X. Pn Proposition 4.3.1. Let X and Y be two simple random variables given by X = i=1 xiχAi Pm and Y = j=1 yjχBj . (a) (Linearity) For α, β ∈ R, αX + βY is a simple random variable with expected value E[αX + βY ] = αE[X] + βE[Y ].

(b) If X(ω) ≤ Y (ω) for all ω ∈ S where S is a support of P , then E[X] ≤ E[Y ]. (c) |E[X − Y ]| ≤ E[|X − Y |]. (d) If X and Y are independent then E[XY ] = E[X]E[Y ].

Proof. (a) We create a mutual refinement of the Ai and Bj by Cij = Ai∩Bj. Then αX+βY = P i,j(αxi + βyj)χCij . Therefore expected value is given by X X E[αX + βY ] = (αxi + βyj)P (Cij) = (αxi + βyj)P (Ai ∩ Bj) i,j i,j n m X X = α xiP (Ai) + β yjP (Bj) = αE[X] + βE[Y ]. i=1 j=1

(b) If X(ω) ≤ Y (ω) for all ω ∈ S, then xi ≤ yj whenever Ai ∩ Bj is nonempty and thus X X E[X] = xiP (Ai ∩ Bj) ≤ yiP (Ai ∩ Bj) = E[Y ]. i,j i,j (c) Using (b), we have E[−|X|] ≤ E[X] ≤ E[|X|] so |E[X]| ≤ E[|X|]. Moreover, linearity from part (a) gives us |E[X − Y ]| ≤ E[|X − Y |]. P (d) Note that XY = i,j xiyjχCij , so if Ai = [ω : X(ω) = xi] and Bj = [ω : Y (ω) = yj] then P (Ai∩Bj) = P (Ai)P (Bj) by definition of independence. Thus E[XY ] = E[X]E[Y ].

53 4.3 Expected Value and Variance 4 Simple Random Variables

Theorem 4.3.2 (Bounded Convergence). Suppose {Xn} is a sequence of simple random variables that is uniformly bounded, i.e. there is some K > 0 such that |Xn(ω)| ≤ K for all ω and n. If (Xn) converges pointwise a.e. to X, then lim E[Xn] = E[X].

Proof. We know pointwise convergence a.e. implies convergence in measure, so (Xn) →P X. Choose K large enough so that it bounds |X(ω)| as well as |Xn(ω)|, which is possible since X takes on finitely many values. Then for any n, |Xn −X| ≤ 2K. If A = [ω : |Xn(ω)−X(ω)| ≥ ε] then for all ω,

|Xn(ω) − X(ω)| ≤ 2KχA(ω) + εχAC (ω) ≤ 2KχA(ω) + ε.

Then by properties of expected value, E[|Xn −X|] ≤ 2KP [ω : |Xn(ω)−X(ω)| ≥ ε]+ε. Since (Xn) converges in measure to X, P [ω : |Xn(ω)−X(ω)| ≥ ε] −→ 0. Therefore E[|Xn−X|] < ε for any arbitrary ε, which shows lim E[|Xn − X|] = 0. Linearity implies the result. Definition. For a simple random variable X, the variance of X is

Var[X] = E[(X − E[X])2] = E[X2] − E[X]2.

Proposition 4.3.3. (a) If X is a simple random variable and α, β ∈ R then αX + β is a simple random variable with variance Var[αX + β] = α2 Var[X].

(b) If X1,...,Xn are independent, simple random variables then

" n # n X X Var Xi = Var[Xi]. i=1 i=1 Proof. (a) We have already seen that functions of simple random variables are simple random variables, so by properties of expected value,

Var[αX + β] = E[((αX + β) − (αE[X] + β))2] = E[α2(X − E[X])2] = α2 Var[X].

Pn Pn (b) For each i, let mi = E[Xi]. Then by linearity, E [ i=1 Xi] = i=1 mi =: m and

 n !2  n !2 X X E  Xi − m  = E  (Xi − mi)  i=1 i=1 n X 2 X = E[(Xi − mi) ] + 2 E[(Xi − mi)(Xj − mj)]. i=1 1≤i

If the Xi are independent, each E[(Xi−mi)(Xj −mj)] splits into (E[Xi]−mi)(E[Xj]−mj) = 0 so the second sum vanishes. This implies the result.

Definition. A function ϕ : I → R is convex if for all 0 ≤ p ≤ 1 and x, y ∈ I, ϕ(px + (1 − p)y) ≤ pϕ(x) + (1 − p)ϕ(y).

Notice that a sufficient condition for convexity is that ϕ00(x) ≥ 0 for all x ∈ I.

54 4.4 Abstract Laws of Large Numbers 4 Simple Random Variables

Theorem 4.3.4. Let X be a simple random variable with E[X] = m. 1 (1) (Chebyshev’s Inequality) For any α > 0, P [ω : |X(ω)| ≥ α] ≤ E[|X|]. α 1 (2) (Markov’s Inequality) For any α > 0, P [ω : |X(ω)| ≥ α] ≤ E[|X|k]. αk 1 (3) (Chebyshev-Bienaym´eInequality) For α > 0, P [ω : |X − m| ≥ α] ≤ Var[X]. α2 (4) (Jensen’s Inequality) Suppose ϕ is a convex function on an interval containing the range of X. Then ϕ(E[X]) ≤ E[ϕ(X)].

1 1 (5) (H¨older’sInequality) Suppose p, q > 1 are numbers satisfying p + q = 1. Then E[|XY |] ≤ E[|X|p]1/p · E[|Y |q]1/q. The p, q = 2 case of H¨older’sinequality is called Schwartz’s inequality: E[|XY |] ≤ E[X2]1/2 · E[Y 2]1/2. β β Moreover, setting p = α , q = β−α for some 0 < α ≤ β, taking Y ≡ 1 and replacing X with |X|α gives us Lyapounov’s inequality: E[|X|α]1/α ≤ E[|X|β]1/β.

4.4 Abstract Laws of Large Numbers

In this section we generalize the strong and weak laws of large numbers (Chapter 1) to a more general setting involving independent, simple random variables.

Definition. A sequence X1,X2,... of simple random variables on a probability space (Ω, F,P ) is said to be identically distributed if their distributions are all the same. In the case that the Xn are also independent, this is abbreviated i.i.d.

Theorem 4.4.1 (Strong Law). Suppose {Xn} is a sequence of i.i.d. simple random variables on (Ω, F,P ). For each n, set E[Xn] = m and Sn = X1 + ... + Xn. Then  1  P ω : lim Sn(ω) = m = 1. n→∞ n

Proof. Without loss of generality, we may assume m = 0 by shifting all the Xn. Since they are identically distributed, it makes sense to set E[Xn] = m for each n. First we show that  1  ω : lim n Sn(ω) = 0 is an F-set so that we can define its probability. Note that 1 lim Sn = 0 ⇐⇒ for every m ∈ N there is an N ∈ N such that n→∞ n

1 1 for all n > N, Sn − 0 < n m n X n ⇐⇒ X < . k m k=1

55 4.4 Abstract Laws of Large Numbers 4 Simple Random Variables

 Pn n  Clearly ω : | k=1 Xk(ω)| < m ∈ F and we can construct the desired set out of these:

∞ ∞ ∞ " n #  1  \ [ \ X n ω : lim Sn(ω) = 0 = ω : Xk(ω) < . n→∞ n m m=1 N=1 n=N+1 k=1

 1   1  We have shown that P lim n Sn = 0 = 1 ⇐⇒ P lim n Sn ≥ ε i.o. = 0 for any arbitrary ε so we will prove this equivalent condition. Denote the 2nd and 4th moments of each Xi by 2 2 4 4 4 4 4 2 E[Xi ] = σ and E[Xi ] = ξ . Then E[Sn] = nξ + 3n(n − 1)σ ≤ kn for some k (see the table in Section 1.2 to see where these values come from). By Markov’s inequality,

k kn2 k P [|S | ≥ nε] ≤ E[|S |4] ≤ = . n n4ε4 n n4ε4 n2ε4

As n → ∞, this approaches 0 so by the first Borel-Cantelli lemma, P [|Sn| ≥ nε i.o.] = 0. As discussed above, this implies the result.

Example 4.4.2. Consider the Bernoulli trials ( 0 with probability p Xn = 1 with probability 1 − p.

It doesn’t matter on which subset of the reals the Xn are defined, since by Theorem 4.2.2, there exists a sequence of independent simple random variables having the prescribed distri- 1 bution. Clearly E[X] = m, so by the Strong Law, n Sn → p with probability 1. Moreover, since Bernoulli trials are independent, variance is given by σ = Var[Xn] = p(1 − p). The main limitation of the Strong Law is that we must have control over the 4th moments of the Xi, via the i.i.d. condition. The Weak Law is weaker than the Strong Law in its conclusion but it is useful when we only have control over lower moments.

Theorem 4.4.3 (Weak Law). Let {Xn}, m and Sn be defined as in the Strong Law. Then for any ε > 0,   1 lim P ω : Sn − m ≥ ε = 0. n→∞ n Proof. By Chebyshev-Bienaym´e,   1 Var[Sn] P ω : Sn − m ≥ ε ≤ n n2ε2 n Var[X] = by independence n2ε2 −→ 0 as n → ∞.

Hence the Weak Law is proved. In general, Var[X] is proportional to the 2nd moment E[X2] which is easier to control than higher moments such as E[X4].

56 4.4 Abstract Laws of Large Numbers 4 Simple Random Variables

Example 4.4.4. Recall the probability space Ωn = Sn, the set of permutations of n symbols, equipped with the cycle completion variables: ( 1 if the kth position finishes a cycle Xnk = 0 otherwise.

1 Then for any n, {Xn1,...,Xnn} are independent and P [Xnk = 1] = n−k+1 =: mnk. Variance 2 in this case is σnk = mnk(1 − mnk) as with Bernoulli trials. Let Sn = Xn1 + ... + Xnn so that Pn 1 for a permutation ω ∈ Ωn, Sn(ω) represents the number of cycles of ω. Define Ln = k=1 k . Then expected value and variance are calculated by

n n X X 1 E[S ] = E[X ] = = L ; n nk n − k + 1 n k=1 k=1 n X and Var[Sn] = Var[Xnk] by independence k=1 n X = mnk(1 − mnk) < mnk. k=1 So for any ε > 0,   Sn(ω) − Ln(ω) P ω : ≥ ε = P [ω : |Sn(ω) − Ln(ω)| ≥ εLn(ω)] Ln(ω) 1 ≤ 2 Var[Sn] by Chebyshev-Bienaym´e Lnε2 1 1 = 2 2 Ln = 2 Lnε Lnε

but Ln diverges as n → ∞, so this fraction approaches 0. Thus the conclusion of the Weak Law holds and moreover we can see that Sn ∼ Ln ∼ log n. So there are approximately (in a weak sense) log n cycles in an average permutation of n symbols. The Strong Law cannot be applied in this case since Ωn is a different space for each n.

57 4.5 Second Borel-Cantelli Lemma Revisited 4 Simple Random Variables

4.5 Second Borel-Cantelli Lemma Revisited

The independence condition in the second Borel-Cantelli lemma (Section 3.4) is sometimes too restrictive. For example, recall simple random variables `n defined as the length of the run of heads beginning on the nth coin flip in an infinite sequence of flips. The problem is that the `n are not independent, but in some sense they are independent if the runs are far enough apart. Although we cannot apply BC2, we will prove a weaker theorem that will apply to the `n and other non-independent examples.

Let A1,A2,... ∈ F and for each n ∈ N, define Nn(ω) = χA1 (ω) + ... + χAn (ω). Then Nn(ω) represents the number of occurrences of ω in the first n sets in the sequence {Ai} and more importantly, [ω : ω ∈ An i.o.] = [ω : sup Nn(ω) = +∞]. Suppose the An are independent and pn = P (An); set mn = p1 + ... + pn. Then

n X E[Nn] = mn and Var[Nn] = pk(1 − pk) < mn. k=1

If x < mn then   P [ω : Nn(ω) ≤ x] ≤ P ω : |Nn(ω) − mn| ≥ |x − mn|

Var[Nn] mn ≤ 2 < 2 . |x − mn| (mn − x) P If pk diverges then mn → ∞ so for every x ∈ R, P [ω : Nn(ω) ≤ x] −→ 0. Moreover, since P [ω : sup Nn(ω) ≤ x] ≤ P [ω : Nn(ω) ≤ x] for all n and the right term goes to 0, we conclude

P [ω : sup Nn(ω) < ∞] = 0 =⇒ P [ω : ω ∈ An i.o.] = 1.

This is an alternate way to view the second Borel-Cantelli lemma. Notice that the conclusion Var[Nn] still holds, even if the An are not independent, as long as 2 −→ 0. It turns out that (mn − x) this happens when P P (Ai ∩ Aj) lim inf i,j≤n = 1, n→∞ P 2 k≤n P (Ak) as we will see in the proof of the next theorem. In general, this liminf is greater than or equal to 1, so the variance condition will hold as long as the liminf is less than or equal to 1. The following generalizes the second Borel-Cantelli lemma.

Theorem 4.5.1. Let {An} be a sequence of (not-necessarily) independent events and suppose P∞ P (An) diverges and n=1 P P (Ai ∩ Aj) lim inf i,j≤n ≤ 1. n→∞ P 2 k≤n P (Ak)

Then P [ω ∈ An i.o.] = 1.

58 4.5 Second Borel-Cantelli Lemma Revisited 4 Simple Random Variables

Proof. As above, let Nn = χA1 (ω) + ... + χAn (ω). Set P P (Ai ∩ Aj) θ = i,j≤n n P 2 k≤n P (Ak)

so that the hypothesis reads lim inf θn ≤ 1. By the work in the preceding paragraph, it is enough to show that Var[Nn] 2 −→ 0 as n → ∞. (mn − x) First, we estimate variance by

2 2 Var[Nn] = E[Nn] − E[Nn] ! X 2 = E[χAi χAj ] − mn i,j≤n ! X 2 = P (Ai ∩ Aj) − mn i,j≤n P !2 !2 i,j≤n P (Ai ∩ Aj) X X = P (A ) − P (A P 2 k k k≤n P (Ak) k≤n k≤n !2 X 2 = (θn − 1) P (Ak) = (θn − 1)mn. k≤n

For a fixed x ∈ R, 2 Var[Nn] (θn − 1)mn P [ω : Nn(ω) ≤ x] ≤ 2 ≤ 2 (mn − x) (mn − x) and as n → ∞, mn → ∞, so if lim inf θn ≤ 1 then the term on the right approaches 0. This implies P [ω : Nn(ω) → ∞] = 1, i.e. P [An i.o.] = 1.

If the An are independent, it turns out that

P 2 (pk − p ) θ = 1 + k≤n k . n P 2 k≤n pk

This ratio of series goes to 0, so lim inf θn = 1 which implies the original BC2.

Example 4.5.2. Although the sequence {`n} of run-length simple random variables is not independent, we can use this generalized BC2 to prove the following result.

P∞ −rn Claim. P [ω : `n(ω) ≥ rn i.o.] = 1 if and only if n=1 2 diverges.

Proof. As before, without loss of generality we can assume the rn are integers. Define

An = [ω : `n(ω) ≥ rn] = [dn(ω) = dn+1(ω) = ... = dn+rn−1(ω) = 0]. As previously noted, if j + rj ≤ k then Aj and Ak are independent so P (Aj ∩ Ak) = P (Aj)P (Ak) in that case. On

59 4.6 Bernstein’s Theorem 4 Simple Random Variables

the other hand, if j < k < j + rj then P (Aj ∩ Ak) ≤ P [dj = dj+1 = ... = dk−1 = 0 | Ak]. These events are now independent, so 1 P (A ∩ A ) ≤ P [d = d = ... = d = 0]P (A ) = P (A ). j k j j+1 k−1 k 2k−j k Putting all the cases together, we have

X X X X 1 P (A ∩ A ) ≤ P (A ) + 2 P (A )P (A ) + 2 P (A ) j k k j k 2k−j k j,k≤n k≤n j

P∞ P∞ −rn Therefore if n=1 P (An) = n=1 2 diverges, we have P P 2 3 P (Ak) + P (Ak) θ = k≤n k≤n −→ 1. n P 2 k≤n P (Ak)

So by Theorem 4.5.1 we get the desired conclusion, i.e. P [ω : `n(ω) ≥ rn i.o.] = 1. On the other hand, the first Borel-Cantelli lemma applies even in the non-independent case to give P −rn us the converse. Hence P [ω : `n(ω) ≥ rn i.o.] = 1 ⇐⇒ 2 diverges.

4.6 Bernstein’s Theorem

Definition. Suppose f is a real-valued function on [0, 1]. The nth Bernstein polynomial for f is defined by n X k  n B (x) = f xk(1 − x)n−k. n n k k=0

Theorem 4.6.1 (Bernstein). If f is continuous on [0, 1] then the sequence (Bn) converges to f uniformly on [0, 1].

Notice that Bernstein’s Theorem is just a restatement of the Weierstrass Approximation Theorem. In functional analysis we typically prove Weierstrass’s theorem using function convolution. Here, we instead have defined an interpolation of f for each n. We will explicitly prove that the Bn converge to f uniformly. Proof. Since f is continuous on [0, 1], f is bounded and uniformly continuous. Set M = sup |f(x)| and δ(ε) = sup |f(x) − f(y)| over all x, y ∈ [0, 1] such that |x − y| < ε. This x∈[0,1] value δ(ε) is sometimes called the modulus of continuity of f. We will show that 2M sup |f(x) − Bn(x)| ≤ δ(ε) + 2 x∈[0,1] nε

60 4.6 Bernstein’s Theorem 4 Simple Random Variables

−1/3 −1/3 2M which will imply the result since if ε = n , sup |f(x) − Bn(x)| ≤ δ(n ) + n1/3 which approaches 0 as n → ∞. Fix n ∈ N, x ∈ [0, 1] and let X1,X2,... be simple random variables such that for each i, P [Xi = 1] = x and P [Xi = 0] = 1 − x. Set Sn = X1 + ... + Xn. Then for any n k n−k k, P [Sn = k] = k x (1 − x) so the probabilities are the coefficients in the Bernstein n X polynomials. This implies E[f(Sn)] = f(k)P [Sn = k] so k=0    n   Sn X k E f = f P [S = k] n n n k=0 n X k  n = f xk(1 − x)n−k = B (x). n k n k=0

 Sn   Sn   This implies |f(x) − Bn(x)| = f(x) − E f n ≤ E f(x) − f n so we will esti-  Sn   Sn Sn  mate E f(x) − f n . Note that when n − x < ε, f n − f(x) < δ(ε), and when Sn Sn  n − x ≥ ε, f n − f(x) < 2M. Then

 Sn    Sn   Sn  E f(x) − f n ≤ δ(ε)P n − x < ε + 2M · P n − x ≥ ε  Sn   Sn  ≤ δ(ε) + 2M · P n − m ≥ ε where m = E n = δ(ε) + 2M · P [|Sn − nx| ≥ nε] Var[S ] ≤ δ(ε) + 2M · n by Chebyshev’s inequality. n2ε2 By independence,

n X Var[Sn] = Var[Xk] = n Var[Xk] = nx(1 − x) ≤ n, k=1 so our estimate for the above expected value is 2M  Sn   E f(x) − f ≤ δ(ε) + . n nε2

 Sn   Finally, apply the comments from above and the fact that |f(x)−Bn(x)| ≤ E f(x) − f n for all x ∈ [0, 1] to finish the proof.

61 4.7 Gambling 4 Simple Random Variables

4.7 Gambling

In this section, our goal is to convince the reader that, in simple terms, gambling is a bad idea. We will show that, under some fallacious assumptions often made by gamblers, the standard casino game roulette is heavily biased in the casino’s favor. The techniques developed in this section are easily adapted to any game involving a ‘unit bet’, that is, a bet of a fixed sum which is either doubled with a win or lost with a loss. ∞ Suppose {Xi}i=1 is an independent sequence of simple random variables on a probability space (Ω, F,P ) where each variable is defined by ( 1 with probability p Xi(ω) = −1 with probability q = 1 − p.

As usual, set Sn = X1 + ... + Xn and S0 = 0 (i.e. you can’t win if you don’t play).

1 1 Definition. If p > 2 , the game is said to be favorable; if p = 2 , it is a fair game; and if 1 p < 2 , the game is said to be unfavorable. Example 4.7.1. The standard casino setup for the game of roulette is as follows. A wheel with 38 slots is spun and a ball is dropped in so that it lands in one of the slots by the time the wheel stops spinning. There are 18 red slots and 18 black slots which together are labelled with the numbers 1 through 36. In addition, there are two green slots labelled 0 and 18 20 $ 00. Set p = 38 and q = 1 − p = 38 . We will assume the player places a 1 bet on a single red or black number, with a 2:1 payout. The odds against the player winning on a single spin q 10 are ρ = p = 9 . Suppose the player starts with a capital of a dollars and plays $1 per spin (or per trial for a general game). She plays until reaching a fixed goal of c dollars (success) or until she loses her money, i.e. reaches 0 dollars (ruin). After n plays, the player has a total of a + Sn dollars. Set

Aa,n = [ω : a + Sn(ω) = c] ∩ [ω : 0 < a + Si(ω) < c for all i = 1, . . . , n − 1]

Ba,n = [ω : a + Sn(ω) = 0] ∩ [ω : 0 < a + Si(ω) < c for all i = 1, . . . , n − 1] ∞ ! ∞ [ X and sc(a) = the probability of success = P Aa,n = P (Aa,n). n=1 n=1 Fixing c and allowing a to vary, we want to study the optimal starting capital for achieving success. Later we will investigate other strategies for success. By convention, set Aa,0 = ∅ and Ac,0 = Ω so that sc(0) = 0 and sc(c) = 1. Given a, sc(a) = psc(a + 1) + qsc(a − 1) which is really a second order boundary value problem:

sc(a) = psc(a + 1) + qsc(a − 1)

sc(0) = 0

sc(c) = 1.

62 4.7 Gambling 4 Simple Random Variables

q Let ρ = p . It turns out (see Billingsley) that the solutions to this boundary value problem are of the form ( A + Bρa if ρ 6= 1 sc(a) = A + Ba if ρ = 1. Suppose ρ 6= 1, i.e. the game is unfavorable. Given the first boundary condition, we see that 0 = A + B =⇒ A = −B, so the second boundary condition gives us 1 1 = A + Bρc = −B + Bρc =⇒ B = . ρc − 1 Thus the solution for an unfavorable game is ρa − 1 s (a) = . c ρc − 1 On the other hand, if the game is favorable (ρ = 1), then A = 0 and A + Bc = 1, implying B = 1 . This gives the solution c a s (a) = . c c Examples. 1 Suppose the player starts with a = $900 and has a goal of reaching c = $1000. In the 10 fair game, sc(a) = .9 which is reasonably high. However, when ρ = 9 as in roulette, sc(a) = .00003, an extremely low probability.

2 Things are worse when c = $20, 000. In the fair case, sc(a) = .005 which is unsurpris- −911 ingly low. But in the unfavorable case, sc(a) ≈ 3 × 10 . Remark. Ruin is symmetric with success, i.e. the boundary value problem

rc(a) = prc(a + 1) + qrc(a − 1)

rc(0) = 1

rc(c) = 0 has the same solution in terms of ruin: ρc−1 − 1  if ρ 6= 1 r (a) = ρc − 1 c c − a  if ρ = 1. c

Notice that for any choices of c and a, rc(a) + sc(s) = 1, to the probability of the game ending (in success or ruin) is always 1. What if our player had infinite capital? What are the odds of ever achieving the goal? Suppose a, b > 0 and define Ha,b to be the event of reaching +b before −a. For any finite capital a, this can be written

∞ n−1 !! [ \ Ha,b = [ω : Sn(ω) = b] ∩ [ω : −a < Sk(ω) < b] . n=1 k=1

63 4.7 Gambling 4 Simple Random Variables

We can write the probability of one of these events as P (Ha,b) = sa+b(a). Also set Hb to be the event of ever gaining +b, that is the event of success with infinite capital:

∞ [ Hb = Ha,b. a=1

Notice that for a fixed b, {Ha,b} is a monotone increasing sequence that converges to Hb so by continuity from below (Theorem 2.1.5),

P (Hb) = lim P (Ha,b) = lim sa+b(a) a→∞ a→∞  a  lim if ρ = 1 a→∞ a + b = 1 − ρa  lim if ρ 6= 1 a→∞ 1 − ρa+b ( 1 if ρ ≤ 1 = ρ−b if ρ > 1.

Examples. Use the same setup as before.

10 1 If c = $1000 and ρ = 1 then lim sa+c(a) = 1, but if ρ = 9 then lim sa+c(a) = .00003 — the same as before.

10 2 If c = $20, 000 and ρ = 1 then lim sa+c(a) = 1, but on the other hand when ρ = 9 , −911 lim sa+c(a) ≈ 3 × 10 . This shows that having infinite capital helps in the fair game (this is to be expected) but has no effect on the unfavorable game.

What if we have some sort of ‘strategy’ for when to place our bets? We will see that this does not change the unfavorable game’s outcome either. Suppose {Xn} are i.i.d. simple random variables and define an additional random variable ( 1 if we bet on the nth trial Bn = 0 if we don’t bet on the nth trial.

The way Bn is defined can only depend on X1,...,Xn−1, i.e. Bn cannot ‘predict the future’. In other words, Bn is measurable with respect to Fn−1 = σ(X1,...,Xn−1). Recall that this means Bn = f(X1,...,Xn−1) for a function f. Set Nn to be the time the nth bet is placed – notice Nn is not simple random. We will assume P [Bn = 1 i.o.] = 1 so that the game always terminates.

Definition. Given {Xn} a sequence of i.i.d. simple random variables, a choice of Bn that satisfies the above conditions is called a selection scheme.

Theorem 4.7.2. The sequence {Yn}, where Yn = XNn , is independent with P [Yn = 1] = p and P [Yn = −1] = q. In other words, selection schemes do not change the outcome of a game.

64 4.7 Gambling 4 Simple Random Variables

Proof omitted.

Set F0 = a, the ‘initial fortune’ of the player and for each n, let Fn be her/his fortune after the nth trial. We next define a way of altering the player’s wagers for each trial in order to potentially optimize the odds of winning.

Definition. Define the random variable Wn to represent the player’s wager (in dollars, e.g.) on the nth trial of a game. If Wn = gn(F0,X1,...,Xn−1), that is, Wn ∈ σ(X1,...,Xn−1) and Wn depends on initial fortune, and in addition Wn ≥ 0 for all n, then the choice of Wn is called a betting system.

Our player’s fortune can thus be written Fn = Fn−1 + WnXn.

Example 4.7.3. If X1 = ... = Xn−1 = −1 then an example of a betting system is the n−1 double-or-nothing approach, Wn = 2 for all n ≥ 1.

As we have it defined, Wn is not a simple random variable since it depends on F0, which can take on any finite positive value. However, fixing F0 makes Wn a simple random variable. Also notice that Wn is independent of Xn, so we can write

E[WnXn] = E[Wn]E[Xn] = (p − q)E[Wn].

If p = q then E[Fn] = E[Fn−1 + WnXn] = E[Fn−1] + E[Wn] · 0 = E[Fn−1]. In other words, if the game is fair then E[Fn] = F0 for every n ≥ 1. If the game is unfavorable, then E[Fn] ≤ E[Fn−1] and so E[Fn] ≤ F0 for all n. This shows that a betting scheme cannot make an unfavorable game fair (or favorable) but it can increase one’s odds.

Definition. Suppose τ(F0, ω) is a function assigning values in N ∪ {0} for all ω ∈ Ω and F0 ≥ 0 by τ = n when the player bets on the first n games and then stops. If for all n, [ω : τ(F0, ω) = n] ∈ σ(X1,...,Xn) and τ is finite with probability 1, then τ is called a stopping time. The dependence condition requires that τ does not depend on future information, only the previous trials of the game. The finite with probability 1 condition allows for some set of measure 0 on which τ has infinite values. Therefore τ is not a simple random variable because a game can be arbitrarily long.

Definition. A gambling policy, denoted π, is a betting system Wn for a particular initial capital F0 together with a stopping time τ.

Example 4.7.4. A selection scheme is a betting system defined by Wn = Bn ∈ {0, 1}. A stopping time for this Wn is τ = n where n is the first loss. To see this, note that P [ω : τ(ω) > n] = pn and since 0 < p < 1 (assuming an unfavorable game),

∞ ! \ P [ω : τ(ω) is not finite] = P [ω : τ(ω) > n] = lim P [ω : τ(ω) > n] = lim pn = 0. n→∞ n→∞ n=1 Thus τ is finite on a set of probability 1. Now consider

n−1 ! \ [ω : τ(ω) = n] = [ω : Xk(ω) = 1] ∩ [ω : Xn(ω) = −1] k=1

65 4.7 Gambling 4 Simple Random Variables

which lies in Fn := σ(X1,...,Xn). Hence τ is a stopping time so a selection scheme is a gambling policy. As Theorem 4.7.2 above shows, a selection scheme is an example of a policy that does not increase one’s odds of winning.

Suppose π = (F0,Wn, τ) is a gambling policy. Define ( ∗ Fn n ≤ τ ∗ Fn = and Wn = χ[ω:n≤τ(ω)]Wn. Fτ n > τ

∗ ∗ ∗ Then Fn = Fn−1 + Wn Xn so we can embed the stopping time into our betting system. Explicitly, this system tells the player to bet Wn = 0 when n is later than the stopping time. It still follows that E[Fn] = E[Fn−1] = ... = F0 for any n in the fair scenario and E[Fn] ≤ E[Fn−1] ≤ F0 in the unfavorable scenario. In a finite capital scenario, stopping times cannot make an unfavorable game profitable. However, things are different in the infinite capital world. Assuming the game terminates, ∗ Fn converges to Fτ with probability 1 so by the bounded convergence theorem (4.3.2), ∗ lim E[Fn] = E[Fτ ] if the Fn are uniformly bounded. In this case, E[Fτ ] = F0 if the game is ∗ fair and E[Fτ ] ≤ F0 if the game is unfavorable. If the Fn are not uniformly bounded, this means either the gambler or casino started with access to infinite capital, so the uniform bound condition is realistic. 1 Suppose p ≤ 2 and F0 = a. If the player stops when either Fn = c or Fn = 0 then a sc(a) < (1 − sc)(0) + sc(c) = c .

On the other hand, if the player plays until Fn = c, possibly allowing for Fn < 0, then Fτ = c > a =⇒ E[Fτ ] > F0 — wait what? There is a possibility of raising the expected fortune above that of starting levels even in an unfair game! Of course this is only possible with access to infinite capital, so in the real world the probabilities behave closer to expectation. There is a strategy that optimizes winning conditions. First, rescale the fortune scale so that Fn ∈ [0, 1] for n = 0, 1, 2,... and set τ to be the stopping time when one of 0 or 1 is reached for the first time. The gambling policy πb with this stopping time and

( 1 Fn−1 if 0 ≤ Fn−1 ≤ 2 Wn = 1 1 − Fn−1 if 2 ≤ Fn−1 ≤ 1 is called bold play. Informally, it says that if the player can’t reach her goal on the next trial, she wagers everything in a ‘double-or-nothing’ strategy. If she can reach her goal, she wagers exactly the amount that would guarantee her to have fortune 1 with a win. To see that πb is a valid policy, we just need to check that τ is a stopping time. It’s clear that τ only depends on X1,...,Xn so it suffices to check the finite property. Consider

P [ω : τ(ω) > n | τ(ω) > n − 1] = P [ω : Fn(ω) = 0, 1 | Fn−1ω) 6= 0, 1].

1 If Fn−1 ≤ 2 , the outcomes can be split into

Fn = Fn−1 + Fn−1 = 2Fn−1 if Xn = 1; happens with probability p

Fn = Fn−1 − Fn−1 = 0 if Xn = −1; happens with probability q.

66 4.7 Gambling 4 Simple Random Variables

1 On the other hand, if Fn−1 ≥ 2 then

Fn = Fn−1 + (1 − Fn−1) = 1 if Xn = 1; happens with probability p

Fn = Fn−1 − (1 − Fn−1) = 2Fn−1 − 1 if Xn = −1; happens with probability q.

The game terminates in the second and third cases, so the conditional probability is

P [ω : τ(ω) > n | τ(ω) > n − 1] ≤ max{p, q}.

Setting m = max{p, q}, we have P [ω : τ(ω) > n] ≤ mn which tends to 0 as n → ∞. Hence τ is a valid stopping time. Theorem 4.7.5 (Dubins-Savage). Bold play is optimal in the unfavorable case.

Proof sketch. Assume Fτ = 0 or 1 so that τ is a simple random variable. Write Qπ(x) =

P [Fτ = 1 | F0 = x] for any gambling policy π and for each x ∈ [0, 1]. Also set Q(x) = Qπb (x) for bold play. One can check that for any π, Qπ(0) = 0, Qπ(1) = 1 and 0 ≤ Qπ(x) ≤ 1. The 1 theorem then reduces to proving that for every π on a game with p ≤ 2 , Qπ(x) ≤ Q(x). The bold play function has the following properties: ˆ Q is increasing on [0, 1].

ˆ Q is a continuous function of x. ( pQ(2x) 0 ≤ x ≤ 1 ˆ 2 Q(x) = 1 p + qQ(2x − 1) 2 ≤ x ≤ 1 These can be used to show Q(x) ≥ pQ(x+t)+qQ(x−t) whenever 0 ≤ x−t < x < x+t ≤ 1 which implies the result. Remark. This optimal policy is not unique; however the Dubins-Savage theorem shows that bold play is one optimal policy. It is also known that the optimal probability of success sc(a) is computable. Examples.

1 Set a = $900 and c = $1000 as before. In the fair case, sc(a) = .9 and for any policy 10 π, Qπ(x) = x for every x. In the roulette odds, i.e. p = 9 , recall that unit stakes give us sc(a) = .00003. However, bold play yields substantially better odds: Q(.9) = .88.

2 Set a = $100 and c = $20, 000. This generates the following probabilities of success:  .005 fair game  −911 sc(a) = 3 × 10 roulette odds .003 bold play.

Notice that in both examples, bold play gives us odds that are on the same order of magnitude as the fair game odds.

67 4.8 Markov Chains 4 Simple Random Variables

4.8 Markov Chains

Definition. Let S be a countable set and consider a doubly-indexed sequence {pij}i,j∈S such P that pij ≥ 0 and for each i, j∈S pij = 1.A Markov chain on a probability space (Ω, F,P ) ∞ is a sequence {Xn}n=0 of random variables on (Ω, F,P ) such that

(i) For each n, Xn(Ω) ⊂ S.

(ii) For every subset {i0, i1, . . . , in} ⊂ S such that P [X0 = i0,...,Xn = in] > 0,

P [Xn+1 = j | X0 = i0,...,Xn = in] = P [Xn+1 = j | Xn = in] = pinj.

The pij are called the transition probabilities for the Markov chain, each Xn is called the nth state and S is referred to as the state space. The Markov property (ii) implies independence of history, that is, each state only depends on the previous state and ‘forgets’ what has happened before that. Also, property (ii) also implies that the states are autonomous, i.e. they don’t depend on n – if n is thought of as a time variable, then the outcome has the same dependence on the previous outcome no matter the current time. We will denote the initial probabilities by P [X0 = i] = αi for each i ∈ S. Notice that P i∈S αi = 1 and αi ≥ 0 for every i ∈ S. The transition probabilities pij correspond to a matrix P = (pij) called a transition matrix. The conditions on the pij mean that the transition matrix P = (pij) is stochastic.

Examples. 1 Markov chains are useful for describing change in physical models. Consider the fol- lowing (oversimplified) representation of diffusion. Suppose we have 2 buckets, a left and a right one, which are each filled with r balls that are either black or white. We know that there are k white balls, and therefore r − k black balls, in the left bucket and k black balls, and therefore r − k white balls, in the right bucket.

r balls r balls k white r − k white r − k black k black

Consider the state space S = {0, 1, . . . , r} where a state k ∈ S corresponds to there being k white balls in the left bucket. The example above is at state k = 3. The Markov process will be to draw one ball from the left bucket and one ball from the right bucket simultaneously, and to swap them and place each ball in the opposite bucket. We will compute the transition probabilities. Given Xn = k, the possible states for Xn+1 are:

68 4.8 Markov Chains 4 Simple Random Variables

left right probability Xn+1 k  r − k  white white k r r k  k  white black k − 1 r r r − k  r − k  black white k + 1 r r r − k  k  black black k r r Thus the transition probabilities are given by (r − k)2 p = k(k+1) r2 2k(r − k) p = kk r2 k2 p = k(k−1) r2 pkj = 0 if |j − k| ≥ 2. P Notice that for each k, pkj ≥ 0 for all j and j pkj = 1. 2 Random walks are another classic example of Markov chains. Suppose a person, dubbed the ‘walker’, is standing on the number line consisting of integer points. First consider the finite state space S = {0, 1, . . . , r} and assume the walker starts at one of the points in S. We also assume in the finite random walk scenario that the endpoints are absorbing, i.e. if Xn = 0 or r then Xn+j = Xn for all j ≥ 1. At time n, the walker moves to the right with probability p and moves left with probability q = 1 − p. Thus each Xn is a simple random variable. The transition probabilities are as follows:  p if j = i + 1, 0 < i < r  q if j = i − 1, 0 < i < r  pij = 0 if |j − i| ≥ 2, 0 < i < r  1 if i = j = 0 or r  0 if i 6= j = 0 or r.

We typically assume αi = 1 for some i ∈ S and αj = 0 for all j 6= i. That is, there is a positive probability that the walker starts on one of the endpoints. 3 Most of the time we consider an unrestricted random walk, where the state space is S = Z. Here the transition probabilities are  p if j = i + 1  pij = q if j = i − 1 0 otherwise. If p = q, the random walk is said to be symmetric. In the symmetric case, the probability is 1 that the walker returns to her starting point.

69 4.8 Markov Chains 4 Simple Random Variables

4 For higher dimensional random walks, the state space is S = Zk, k ≥ 2. Each point y ∈ Zk has 2k ‘neighbors’ so the notation would be pretty ugly to write down explicitly. Things are even worse if the probabilities of the walker moving to a particular neighbor are not uniform, so let’s assume the random walk is symmetric. Then

( 1 2k if x is a neighbor of y pyx = 0 otherwise.

It turns out that for k = 2, the probability that the walker returns to her starting location is still 1, but for any k > 2, the probability is 0. This suggests something subtle about the geometry of random walks. For this reason, random walks are an active area of modern research. 5 The following is known as either the Princess Problem or the Secretary Problem (in the latter, the princess is replaced by a businessperson trying to hire a secretary). Suppose a princess is trying to find a suitor. The rules are: the suitors appear in random order (e.g. they do not appear in order of increasing wealth or attractiveness); they appear one at a time; after meeting each suitor, the princess must decide whether to accept the suitor, at which point the process ends, or reject the suitor and continue the process. We also assume that the princess has some way of determining how the current suitor relates to every previous suitor in terms of desirability. If a suitor is more desirable than every previous suitor, we will say this suitor is dominant.

Let S1,...,Sr be the suitors in the order they appear, so that S = {S1,...,Sr} is the state space. Set X1 = 1 and for each n ≥ 2, set Xn to be the position of the nth dominant suitor, or r + 1 if the last dominant suitor has already occurred.

For example, suppose S = {S1,S2,S3,S4,S5,S6} and these are ranked in the order S3 < S6 < S2 < S1 < S4. The dominant suitors are S1 (which will always occur) and S4, so X1 = S1, X2 = S4 and X3 = X4 = ... = r + 1 = 7. In general, (k − 1)! 1 P [S is dominant] = = k k! k (j − 2)! 1 P [S is the next dominant suitor] = = if j > i and 0 otherwise j j! j(j − 1) (r−1)! r! k P [Xn+1 = r + 1 | Xn = k] = 1 = . k r Thus the transition probabilities are  k  if k < j ≤ r j(j − 1)  0 if j ≤ k < r pkj = k if j = r + 1  r  1 if j = i = r  0 otherwise.

70 4.8 Markov Chains 4 Simple Random Variables

The princess’ strategy will be to pick a dominant suitor according to some stopping time τ. If she stops at Xτ then we want to know the probability that she picked the k overall best suitor; this is expressed by f(Xτ ) where f(k) = r . Given r the number of suitors, we also want to compute E[f(Xτ )] for different choices of τ but first we need to learn about expected values for Markov chains. We will return to the Princess Problem. The Markov condition of independence of history allows us to calculate higher order transitions by stepping through one state at a time:

P [X0 = i0,X1 = i1,...,Xn = in] = αi0 pi0i1 pi1i2 ··· pin−1in and in general,

P [Xm+t = jt, t = 0, . . . , n | Xs = is for 0 ≤ s ≤ m] = pimj1pj1j2 ··· pjn−1jn .

(n) We will denote a transition of degree n by pij . These can be written

(n) X pij = P [Xm+n = j | Xm = i] = pik1 pk1k2 ··· pkn−1j. k1,...,kn−1∈S

(n) n If S is a finite state space then the transition matrix is (pij ) = P where P = (pij). Notice (0) that P0 = I and pij = δij, the Kronecker delta. If S is countably infinite, the transition probabilities correspond to an infinite matrix which really lives in a Hilbert space.

Theorem 4.8.1. Suppose (pij) is a doubly-indexed sequence of nonnegative real numbers P P such that for all i, j pij = 1 and suppose αi ≥ 0 satisfy i αi = 1. Then there exists ∞ a probability space (Ω, F,P ) and a Markov chain {Xn}n=0 on (Ω, F,P ) with the pij as its transition probabilities and the αi as its initial probabilities. Proof sketch. Let Ω = (0, 1]; F = B, the Borel σ-field; and P = λ, the Lebesgue measure on B. First we want X0 to equal i with probability αi for each i. By Theorem 4.2.2, this is possible in theory but we want to explicitly construct the random variable X0 so that (0) we may continue the process in the next steps. Construct a collection of intervals Ii with (0) (0) length αi for each i by the following process: set I1 = (0, α1],I2 = (α1, α1 + α2], etc. It is evident that X0 satisfies the desired properties. Next, we want X1 to satisfy

P [X1 = j, X0 = i] = P [X0 = i]P [X1 = j | X0 = i].

(0) (1) (0) Subdivide each Ii into Iij by a similar process as above, so that each Iij has length αipij. Repeating this process of subdivision constructs a collection of intervals I(n) with length i0i1···in

αi0 pi0i1 ··· pin−1in . Finally, set

[ (n) i if ω ∈ I  i0i1···in−1i Xn(ω) = i0,...,in−1 0 otherwise.

By construction, {Xn} is a Markov chain with the given initial and transition probabilities.

71 4.9 Transience and Persistence 4 Simple Random Variables

4.9 Transience and Persistence

∞ Let {Xn}n=0 be a Markov chain. First, let’s set up some notation. If a probability is conditioned on X0 = i, we will denote this by Pi. Define

(n) fij = Pi[X1 6= j, X2 6= j, . . . , Xn−1 6= j, Xn = j] which represents the probability that the first occurrence of state j is at time n, given X0 = i. ∞ ! ∞ [ X (n) Also set fij = Pi [Xn = j] = fij . n=1 n=1

∞ Definition. For a Markov chain {Xn}n=0 on state space S, a state i is persistent if fij = 1 and transient if fij < 1.

Suppose n1 < n2 < . . . < nk and consider

(n1) (n2−n1) (nk−nk−1) Pi[X1 6= j, X2 6= j, . . . , Xn1 = j, . . . , Xnk = j] = fij fjj ··· fjj .

Then

X (n1) (n2−n1) (nk−nk−1) Pi[Xn = j at least k times] ≥ fij fjj ··· fjj n1,...,nk k−1 = fijfjj ··· fjj = fijfjj .

Therefore ( 0 if fjj < 1 Pi[Xn = j i.o.] = 1 if fjj = 1.

P (n) Theorem 4.9.1. A state i is transient ⇐⇒ Pi[Xn = i i.o.] = 0 ⇐⇒ n pii converges. Similarly, a state i is persistent ⇐⇒ Pi[Xn = i i.o.] = 1 P (n) Proof. By the first Borel-Cantelli lemma, if n pij < ∞ then Pi[Xn = i i.o.] = 0. Based P (n) on the calculation above, fii < 1 so by definition i is transient. This proves n pii < ∞ =⇒ Pi[Xn = i i.o.] = 0 =⇒ i transient. To close the logic loop, we must show P (n) fii < 1 =⇒ n pii < ∞. For any i, j, consider

(n) pij = Pi[Xn = j] n−1 X = Pi[X1 6= j, . . . , Xn−s−1 6= j, Xn−s = j, Xn = j] s=0 n−1 X = Pi[X1 6= j, . . . , Xn−s−1 6= j, Xn−s = j]Pj[Xs = j] by autonomy s=0 n−1 X (n−s) (s) = fij pjj . s=0

72 4.9 Transience and Persistence 4 Simple Random Variables

(n) Next, we compute the sum of the pii :

n n t−1 X (t) X X (t−s) (s) pii = fii pii t=1 t=1 s=0 n−1 n X (s) X (t−s) = pii fii switching order of summation s=0 t=s+1 n−1 n X (s) X (s) ≤ pii fii ≤ pii fii s=0 s=0 n X (t) (0) = pii fii + fii since pii = 1 by a previous calculation. t=1

n X (t) Rearranging this gives us (1 − fii) pii ≤ fii so if 0 < fii < 1, t=1

n ∞ X (t) fii X (t) fii p ≤ =⇒ p ≤ ii 1 − f ii 1 − f t=1 ii t=1 ii and hence the sum converges. The statement for persistence is proven in a similar fashion. Example 4.9.2. We will prove P´olya’s Theorem for symmetric k-dimensional random walks, P which we state below. First, in order to employ Theorem 4.9.1 we want to know if n an converges or not. If k = 1 then the only way to return to one’s starting position is after an even number of moves, so a2n+1 = 0 for all n. On the other hand, if the walker returns to the start after 2n moves then she had to move left an equal number of times as she moved 2n −2n right. This means a2n = n 2 . A well-known result called Stirling’s Formula says that √ nn n! ∼ 2πn . e

Using this on a2n, we have √ 2n 2n √ 2n 2n 1 (2n)! 4πn e 2 πn 2 n e2n 1 a2n = = √ = = √ . 2 2n n n 2 2n 2n 1 (n!) 2   2n 2πn 2 n 2n πn 2πn e 2 e P Then clearly n an diverges (e.g. by a comparison test) so Theorem 4.9.1 tells us that each state in the state space is persistent. Next, suppose k = 2. By similar logic as above,

n X (2n)! a = 4−2n 2n u!u!(n − u)!(n − u)! u=0 and another application of Stirling’s Formula yields 1 a ∼ . 2n πn 73 4.9 Transience and Persistence 4 Simple Random Variables

P So n an diverges, and thus the states are persistent in the k = 2 case. 1 It turns out that for k ≥ 3, a2n ∼ nk/2 which corresponds to a convergent series, so in these cases the states are all transient. P´olya’s Theorem states this formally:

Theorem 4.9.3 (P´olya). A symmetric, k-dimensional random walk is persistent when k = 1, 2 and transient otherwise.

Definition. A Markov chain is said to be irreducible if for every i, j there exists an n such (n) that pij > 0. In other words, in an irreducible chain there is always a finite sequence of transitions between any pair of states.

Theorem 4.9.4. If a chain is irreducible then either every state is transient or every state is persistent. Furthermore,

S  P (n) (1) Transience is equivalent to Pi j[Xn = j i.o.] = 0 and also to n pij < ∞ for all states i, j ∈ S.

S  P (n) (2) Persistence is equivalent to Pi j[Xn = j i.o.] = 1 and also to n pij = ∞ for all i, j ∈ S.

(r) (s) Proof. Irreducibility implies for all i, j ∈ S there exist r, s such that pij > 0 and pji > 0. (r+s+n) (r) (n) (s) P (n) P (n) Then pii ≥ pij pjj pji so if n pii converges then n pjj converges as well. By Theorem 4.9.4, this shows that if any one state is transient then they all are. If this is the case, then fjj < 1 so

! ∞ ∞ ∞ [ X X X Pi [Xn = j i.o.] ≤ Pi[Xn = j i.o.] = fjj = 0 = 0. j n=1 j=1 j=1

S  Hence Pi j[Xn = j i.o.] = 0. In addition,

n X (n) X X (v) (n−v) pij = fij pjj n n v=1 ∞ ∞ X (v) X (n) = fij pjj switching the summation v=1 n=v ∞ ∞ X (v) X (n) ≤ fij pjj v=1 n=0 ∞ X (n) ≤ pij < ∞ since fij ≤ 1 for all i, j. n=0

P (n) Hence n pij converges.

74 4.9 Transience and Persistence 4 Simple Random Variables

On the other hand, if every state is persistent then Pj[Xn = j i.o.] = 1 by Theorem 4.9.4. Then (m)  pii = Pj[Xm = i] = Pj [Xm = i] ∩ [Xn = j i.o.] X ≤ Pj[Xm = i, Xm+1 6= i, . . . , Xn = j] n>m X (m) (n−m) (m) = pji fij = pji fij. n>m

(m) (m) So pji ≤ pji fij which shows that fij ≥ 1. But fij is a probability so fij = 1. Then by definition Pi[Xn = j i.o.] = 1. Finally, by the contrapositive to the first Borel-Cantelli P (n) lemma, n pij must diverges. Examples. 1 Suppose we have an irreducible Markov chain modeling a restricted random walk. Theorem 4.9.4 can be used to show that the probability of the random walker returning to her initial state infinitely often is 1. In other words, there are no transient states in a finite state space – if transient states exist (in an irreducible chain) then they imply the random walk will exit any finite subset of the state space.

1 2 Consider an asymmetric random walk, i.e. one where p < 2 . Suppose the state space p is unrestricted, e.g. S = Z. Then f01 = q < 1 so every state is transient. Notice in this case that f10 = 1 so it’s not true that fij < 1 for every i, j ∈ S. The previous results 1 only guarantee that fii < 1 for all i. When p < 2 (if p is the probability of the walker moving right), it appears that the chain of right movements is persistent while the left movements are transient.

1 3 If the unrestricted walk is symmetric, i.e. p = q = 2 , and 2 | (n + j − i) then  n  1 1 (n) √ pij = n+j−i n ∼ . 2 2 n

(n) If |j − i| = −1, 0, 1 then lim pij = 0 even though the chain is persistent.

Definition. A matrix Q = (qij) is said to be substochastic if qij ≥ 0 for every i and the P row sums satisfy j qij ≤ 1 for every i.

n (n) (n) P (n) Write Q = (qij ) and σi = j qij so that

(n+1) X (n) X (n) σi = qijσj ≤ σj . j j

(1) (n+1) (n) (n) This implies that σi ≤ 1 and σi ≤ σi for all i, n. So (σi ) is a bounded, monotone (n) sequence and hence σi = limn σi exists. Each σi satisfies a difference equation: X σi = qijσj. j

75 4.9 Transience and Persistence 4 Simple Random Variables

As it turns out, the σi are the maximal solutions to the boundary value problem X xi = qijxj, 1 ≤ i ≤ n j

0 ≤ xi ≤ 1.

(n+1) (n) (This is easily shown using the fact that σi ≤ σi for all n.) Now if U ⊂ S is a subset of (n) the state space then (pij)U is a substochastic matrix and σi = Pi[Xt ∈ U, t ≤ n]. Therefore by the above computations,

(n) σi = lim σi = Pi[Xt ∈ U for all t ≥ 1]. n→∞ Example.

4 Consider a half-random walk where the state space is U = N ∪ {0}, the right half of S = Z. The difference equation from before is now a boundary value problem:

x0 = px1

xi = pxi+1 + qxi−1, i ≥ 1

0 ≤ xi ≤ 1.

q n−1 If ρ = p then the solutions are of the form xn = A + Bn if p = q or xn = A + Bρ if p 6= q. Notice that when p ≤ q, the solution is necessarily unbounded. However, when p > q, the solution is bounded. We want 0 ≤ x ≤ 1 so we must have A = 0 in the case when p = q, or B = −A in the case when p 6= q. Thus the solutions are ( A − An ρ = 1 xn = A − Aρn−1 ρ 6= 1.

Theorem 4.9.5. A state i0 is transient ⇐⇒ there exists a nontrivial solution to the system X xi = pijxj, 0 ≤ xi ≤ 1 for all i 6= i0.

j6=i0

Proof. On one hand, suppose i0 is persistent. By the discussion above, Pi[Xn 6= i0 for all n]

is a maximal solution to this system. But Pi[Xn 6= i0 for all n] = 1 − fii0 so there is a

nontrivial solution ⇐⇒ fii0 < 1 for some i 6= i0 but this is impossible in the persistent case.

On the other hand, we proved that transience implies fi0i0 < 1, but

∞ X X fi0i0 = Pi0 [X1 = i0] + Pi0 [X1 = i, X2 6= i0,...,Xn = i0]

n=2 i6=i0 X = pi0i0 + pi0ifii0 .

i6=i0

If the fii0 were all 1, this would add up to 1 but fi0i0 < 1 so the above shows that fii0 < 1 for some i 6= i0. Hence there is a nontrivial solution.

76 4.9 Transience and Persistence 4 Simple Random Variables

Example. 5 Queueing is used to model physical situations, such as customers standing in line, as well as computer processing. Suppose we have a state space S = N ∪ {0} which represents the number of people currently in line. At each time k, one person at the front of the line is helped and then leaves, and simultaneously, 0, 1 or 2 people enter the line with probabilities t0, t1 and t2, respectively. These satisfy t0 + t1 + t2 = 1 and we assume t0, t2 > 0 so that the chain is irreducible. The queueing ‘matrix’ here is infinite:   t0 t1 t2 0 0 0 ··· t t t 0 0 0 ···  0 1 2  0 t t t 0 0 ··· P =  0 1 2  0 0 t0 t1 t2 0 ···  ......  ...... Notice that the first row is different than the random walk’s transition matrix: if i = 0, no one is served until someone enters the line. Fix i0 = 0 – since the chain is irreducible, either every state is persistent or every state is transient so no generality is lost. The system here is

x1 = t1x1 + t2x2

xk = t0xk−1 + t1xk + t2xk+1, k ≥ 2. This is essentially the same as the system for a half-random walk (see Example 4 ). So there is a nontrivial solution, i.e. i0 is transient, if and only if t2 > t0 and conversely i0 is persistent if and only if t2 ≤ t0. P Definition. A distribution is a sequence πi satisfying 0 ≤ πi ≤ 1, πi = 1 and P i∈S i∈S πipij = πj for all i, j ∈ S. Additionally, the distribution is stationary if P [X0 = j] = πj implies P [Xn = j] = πj for all n. (n) Definition. A state j ∈ S has period t if whenever pij > 0 for any i, t | n. If 1 is a period for j then we say j is aperiodic. Example 4.9.6. A 1-dimensional random walk has period 2 since the walker must return to her starting position after an even number of moves.

Remark. In an irreducible chain, all periods are equal. We will usually assume that {Xn} is an aperiodic, irreducible chain, such as in the next lemma.

Lemma 4.9.7. Suppose a chain {Xn} is an aperiodic, irreducible Markov chain. Then for (n) every i, j ∈ S, there exists an n0 ∈ N such that pij > 0 for all n ≥ n0. (n) (m+n) (m) (n) Proof. Let Mj = {n ∈ N | pij > 0}. Then pij ≥ pij pij for all n so Mj is closed under addition. Since the chain is aperiodic, gcd(Mj) = 1 so by elementary number theory there exists an n0 such that n ∈ Mj for all n ≥ n0. Let i, j ∈ S. Since the chain is irreducible, (r) there exists an r such that pij > 0. If we let nij = nj + r, then every n ≥ nij satisfies (n) (r) (n−r) pij ≥ pij pjj > 0 · 0 = 0.

77 4.9 Transience and Persistence 4 Simple Random Variables

Theorem 4.9.8. Let {Xn} be an aperiodic, irreducible Markov chain and suppose a station- (n) ary distribution πj exists. Then the chain is persistent, limn pij = πj, πi > 0 for all i and the distribution is unique. (n) Proof. First suppose the chain is transient. Then pij → 0 as n → ∞ for any j ∈ S. By the P (n) Weierstrass M-test, j pij πj converges absolutely and uniformly in n, so ! X (n) X  (n) πi = lim pij πj = lim pij πj = 0. n n j j

Hence πi ≡ 0 so πiπi 6= 1 and therefore no stationary distribution exists, contradicting the hypotheses. Therefore the chain is persistent. Consider the double-indexed state space S × S. Define p(ij, kl) = pikpjl to form a direct product of Markov chains Xn × Yn. One can prove that this is still irreducible and aperiodic given the assumptions on Xn. Then for all i, j, i0 ∈ S, Pij[(Xn,Yn) = (i0, i0) i.o.] = 1; that is, the two chains meet in finite time with probability 1. Let τ = infn[(Xn,Yn) = (i0, i0)]. Then another way of saying the previous statement is that τ < ∞ with probability 1. This implies (n) (n) |pik − pjk | ≤ Pij[τ > n] → 0 by the M-test. So i and j really don’t affect the outcome after time τ. Note that

(n) X (n) X (n) πj − pjk = πipik − πjpjk i j X (n) (n) = πi(pik − pjk ) i

(n) and the combined sum approaches 0 by the M-test. Thus πk = limn pij for any j ∈ S and P (n) by uniqueness of limits, πj is unique. Also, for n sufficiently large, πk = i πipik > 0. This finishes everything we needed to check. Example. 5 continued. For the queueing model described before, we can plug in the row sums to obtain:

π0 = π0t0 + π1t1

πk = πk−1t0 + πkt1 + πk+1t2, k ≥ 1. This has a nontrivial solution if the chain is persistent:  A − Ak t = t  0 2  k−1 πk = t0 A − A t0 6= t2.  t2

Of course the system is persistent if t0 ≥ t2 and in particular there is a stationary  k−1! X t0 distribution if t0 > t2, in which case the solution A − A is a geometric t2 k

78 4.9 Transience and Persistence 4 Simple Random Variables

series which we may evaluate. On the other hand, if t0 = t2 there is no stationary distribution even though the chain is persistent.

The examples illustrate our three possibilities so far for a Markov chain {Xn}: ˆ (n) The chain is transient. In this case, pij → 0 and the mean return time for a state P (n) j ∈ S is µj = n nfjj = ∞. ˆ The chain is persistent but has no stationary distribution; this is called nullpersistence. (n) In this case for all j ∈ S, pij → πj and µj = ∞. ˆ The chain is positive persistent, i.e. persistent with a stationary distribution. Here (n) 1 p → πj and µj = < ∞. ij πj Our k-dimensional random walks for different k values illustrate all three scenarios. For k = 1, the chain is positive persistent; for k = 2, the chain is nullpersistent; and finally for k ≥ 3, the chain is transient.

79 5 Abstract Measure Theory

5 Abstract Measure Theory

In Chapters 1 and 2 we studied the Lebesgue measure on Ω = (0, 1] and proved various results in measure theory for probability spaces. In this chapter we generalize the theory in Chapter 2 to general measure spaces.

5.1 Measures

First we recall the definitions of a field and σ-field. Definition. For a space Ω, a collection F of subsets of Ω is called a field if (1) Ω ∈ F. (2) If A ∈ F then AC ∈ F. (3) If A, B ∈ F then A ∪ B ∈ F. S∞ Additionally, F is a σ-field if whenever A1,A2,... ∈ F, n=1 An ∈ F as well. Theorem 5.1.1. Let Ω be a space.

(1) If F is a σ-field on Ω and Ω0 ⊂ Ω then F ∩ Ω0 is a σ-field on Ω0.

(2) If a collection A generates F then A ∩ Ω0 generates F ∩ Ω0. Proof. Easy. Example 5.1.2. The main class of sets studied in Chapters 1 and 2 was the Borel σ-field generated by finite intervals (a, b] ⊂ (0, 1]. This can be generalized to the Borel σ-field on the real line Ω = R which is generated by all finite intervals (a, b] ⊂ R. Things can be generalized even further. Let Ω = Rk, Euclidean k-space. Consider the collection of bounded rectangles

k {(x1, . . . , xk) ∈ R | ai < xi < bi, i = 1, . . . , k} = (a1, b1) × (a2, b2) × (ak, bk).

These are the analogs of (a, b) in the one-dimensional case. Let Rk be the σ-field on Rk generated by these rectangles; this is called the k-dimensional Borel σ-field. Note that R1 ∩ (0, 1] = B by Theorem 5.1.1 above, so there really is no difference between Borel sets on the unit interval as defined in Chapter 1 and the Borel sets defined here – in particular both can be generated by open or closed sets. In a moment we will define a general measure µ to be a function whose output may be infinity. In order to make sense of this, we need to be able to do arithmetic and understand inequalities involving ∞. Assume all numbers below are in [0, ∞]. ˆ For any x ∈ [0, ∞], ∞ + x = ∞. For any finite x, ∞ − x = ∞ as well. ˆ There are other conventions (see Billingsley) for sequences and series on [0, ∞] but everything is as expected.

80 5.1 Measures 5 Abstract Measure Theory

ˆ If x ≤ ∞ then either x = ∞ or x is finite. The notation x < ∞ means x is finite as usual.

The moral of this story is that we can do arithmetic with +∞ but only when the space consists of nonnegative real numbers – anytime you have negative numbers (i.e. the cancel- lation of addition) as well as +∞, things get messy. There’s a separate theory of bounded measures (i.e. finite measures) that may take on negative values.

Definition. Let F be a field on a space Ω. A set function µ : F → R is a measure if (1) For all A ∈ F, µ(A) ∈ [0, ∞]. In other words, a measure is a function µ : F → [0, ∞].

(2) µ(∅) = 0. S∞ (3) If A1,A2,... ∈ F are disjoint and n=1 An ∈ F then

∞ ! ∞ [ X µ An = µ(An). n=1 n=1

Definition. A measure µ is finite if µ(Ω) < ∞ and infinite if µ(Ω) = ∞.

Notice that if µ is a finite measure Ω, we may rescale by µ(Ω) to define a finite measure µ0 with µ0(Ω) = 1, i.e. a probability measure. Therefore we see that finite measures and probability measures are really one and the same.

Definition. Let F be a σ-field on a space Ω. Then (Ω, F) is called a and if µ is a measure on F then (Ω, F, µ) is called a measure space.

Definition. If A ∈ F has the property that µ(AC ) = 0 then A is called a support of µ, or alternatively, µ is concentrated on A.

Remark. As we saw in the probability measure case, if µ is a finite measure, A is a support if and only if µ(A) = µ(Ω).

Definition. A measure µ on a field F is σ-finite if there exists a countable collection ∞ S∞ {An}n=1 ⊂ F so that µ(An) < ∞ for all n and n=1 An = Ω. Note that disjointness is not required in the definition above. Moreover, a σ-finite measure space is precisely a countable union of finite measure spaces (subspaces).

Definition. For a subcollection A ⊂ F, we say µ is σ-finite on A if the σ-finite condition ∞ holds for some collection {An}n=1 ⊂ A. Many properties from Section 2.1 are the same for general measures. Let µ be a measure on a field F.

ˆ Finite additivity is implied by condition (3).

ˆ (Monotonicity) If A ⊆ B then µ(A) ≤ µ(B).

81 5.1 Measures 5 Abstract Measure Theory

ˆ S∞ (Countable subadditivity) If n=1 An ∈ F then

∞ ! ∞ [ X µ An ≤ µ(An). n=1 n=1

ˆ S∞ (Continuity from below) If A1 ⊂ A2 ⊂ A3 ⊂ · · · and A = n=1 An ∈ F then lim µ(An) = µ(A). n→∞ ˆ (Finite inclusion-exclusion) For any A, B ∈ F, µ(A ∪ B) = µ(A) + µ(B) − µ(A ∩ B).

The proofs are the same as in Section 2.1. However, there are some differences here in the abstract case: ˆ T∞ Continuity from above does not hold: A1 ⊃ A2 ⊃ · · · and n=1 An = A do not imply lim µ(An) = µ(A). However, if µ(An) < ∞ for some An in the chain, then continuity n→∞ from above does hold.

ˆ If µ is σ-finite on F, F does not contain an uncountable subcollection of disjoint sets with positive measure.

Theorem 5.1.3. Suppose (Ω, F) is a measurable space and µ1 and µ2 are measures on F such that µ1|P = µ2|P for a π-system P ⊂ F. If Ω is σ-finite with respect to µ1 and µ2 then the measures agree on the σ-field generated by P: µ1|σ(P) = µ2|σ(P).

Proof. Suppose B ∈ P with µ1(B) = µ2(B) < ∞. Define

LB = {A ∈ σ(P) | µ1(A ∩ B) = µ2(A ∩ B)}.

As in the proof of Theorem 2.4.1, LB is a λ-system containing P so by the π-λ theorem, S∞ LB ⊃ σ(P). By σ-finiteness, Ω = k=1 Bk where Bk ∈ P and µ1(Bk) = µ2(Bk) < ∞ for each k. Then by the principle of inclusion-exclusion, for i = 1, 2 and any n ≥ 2,

n ! n [ X X µi (Bk ∩ A) = µi(Bk ∩ A) − µi(Bj ∩ Bk ∩ A) + ... k=1 k=1 1≤j

The sums on the right are all the same for i = 1, 2 since any Bk1 ∩ · · · ∩ Bkm ∩ A lies in LB. Therefore n ! ∞ ! [ [ µ1 (Bk ∩ A) = µ2 (Bk ∩ A) k=1 k=1 S∞ and as n → ∞, µ1 = µ2 on σ(P) since k=1 Bk = Ω. This is a generalization of the uniqueness theorem for probability measures (Theo- rem 2.3.1). Uniqueness does not hold without the σ-finite property.

Corollary 5.1.4. If µ1 and µ2 are finite on σ(P) for a π-system P and Ω is a countable union of P-sets then µ1 = µ2 on σ(P).

82 5.2 Outer Measure 5 Abstract Measure Theory

Examples.

1 Let (Ω, F) be a measurable space. Take a set {wi | i ∈ N} ⊂ Ω. Suppose we have a sequence (mi) ⊂ [0, ∞]. For all A ∈ F, we define the discrete measure P by µ(A) = miδwi where ( 1 if wi ∈ A δwi (A) = 0 if wi 6∈ A.

If {wi} ∈ F for each wi in the collection, then (Ω, F, µ) is σ-finite if and only if mi < ∞ for all i. In general µ is called an atomic measure and the wi are called atoms. In some cases we may even replace the atoms wi with subsets of Ω.

2 Let F = P(Ω), the of Ω. We define the counting measure on sets A ∈ F by ( |A| if A is finite µ(A) = ∞ A is infinite. Then µ is finite if and only if Ω is finite, and µ is σ-finite if and only if Ω is countable. 3 If F is a field, µ is a measure on F and F ⊂ F is a subfield then µ = µ| is a 0 0 F0 measure on F0.

5.2 Outer Measure

Definition. A set function µ∗ : P(Ω) → R is an outer measure if it satisfies (1) µ∗(A) ∈ [0, ∞] for all A ⊂ Ω.

(2) µ∗(∅) = 0. (3) (Monotonicity) If A ⊆ B then µ∗(A) ≤ µ∗(B).

∞ (4) (Countable subadditivity) If {Ai}i=1 ⊂ P(Ω) then

∞ ! ∞ ∗ [ X ∗ µ Ai ≤ µ (Ai). i=1 i=1

Remark. If ρ : A → [0, ∞] is a set function on A ⊂ Ω, ∅ ∈ A and ρ(∅) = 0 then we can define an outer measure µ∗ on Ω by

( ∞ ∞ ) ∗ X [ µ (A) = inf ρ(Ai): Ai ∈ A for all i and Ai ⊃ A . i=1 i=1 Examples. 1 If A = {(a, b] : 0 ≤ a < b ≤ 1} on Ω = (0, 1] and ρ((a, b]) = b − a, then the outer measure µ∗ is Lebesgue outer measure on the unit interval, as defined in Section 2.3.

83 5.2 Outer Measure 5 Abstract Measure Theory

n 2 Let Ω = R . If A = {(a1, b1) × (a2, b2) × · · · × (an, bn) | ai < bi for all i} and Qn ρ(A) = i=1(bi − ai) is the n-dimensional volume function, then the corresponding µ∗ is Lebesgue outer measure on Rn. n/2 n n π ε 3 Suppose A = {Bε(¯x) | ε > 0, x¯ ∈ R } and ρ(Bε(¯x)) = vol(Bε(¯x)) = n , where Γ 2 + 1 Γ(s) is Euler’s Gamma function. Then the corresponding µ∗ is also Lebesgue outer measure, as in 2 .

n r 4 Again let A = {Bε(¯x) | ε > 0, x¯ ∈ R } and this time define ρ(Bε(¯x)) = ε for a fixed 0 ≤ r ≤ n. Then µ∗ is called the r-dimensional Hausdorff outer measure on Rn. Notice that if r = n, this is just a scalar multiple of the Lebesgue outer measure. We also have the special cases:

If n > 2 and r = 2, µ∗ represents (up to a scalar) surface area. If n > 1 and r = 1, µ∗ represents (up to a scalar) arc length.

Definition. A set A ⊂ Ω is µ∗-measurable if for every E ⊂ Ω, the Carath´eodory condition is met: µ∗(E) = µ∗(E ∩ A) + µ∗(E ∩ AC ).

Theorem 5.2.1. Let M(µ∗) be the set of µ∗-measurable sets on a space Ω. Then M(µ∗) is ∗ ∗ a σ-field and µ |M(µ∗) is a measure on M(µ ). Proof. The details are the same as in Lemma 2.3.5.

Definition. A collection A ⊂ P(Ω) is called a semiring if

(1) ∅ ∈ A. (2) For all A, B ∈ A, A ∩ B ∈ A.

(3) Whenever A, B ∈ A such that A ⊂ B, there exist disjoint C1,...,Cn ∈ A such that

n [ B r A = Ck. k=1

Lemma 5.2.2. Suppose A, A1,...,An are elements of a semiring A. Then there exist dis- joint C1,...,Cm ∈ A such that

C C A ∩ A1 ∩ · · · ∩ An = C1 ∪ · · · ∪ Cm.

Proof. If n = 1, this is just the semiring property (3). Induct.

84 5.2 Outer Measure 5 Abstract Measure Theory

Theorem 5.2.3 (Extension Theorem). If µ : A → [0, ∞] is a set function on a semiring A such that µ(∅) = 0, µ is finitely additive and countably subadditive, then µ extends to a measure on σ(A).

Proof. Note that if A ⊂ B and {Ck} ⊂ A satisfy the semiring condition (3) for B r A, then n X µ(B) = µ(A) + µ(Ck) by finite additivity. So µ(B) ≥ µ(A) and therefore µ is monotone. k=1 Define the outer measure µ∗ : P(Ω) → [0, ∞] on a set A ∈ A by ( ∞ ∞ ) ∗ X [ µ (A) = inf µ(Ai): Ai ∈ A, Ai ⊃ A . i=1 i=1 Let M(µ∗) be the set of µ∗-measurable sets. By Theorem 5.2.1, M(µ∗) is a σ-field and ∗ ∗ µ |M(µ∗) is a measure on M(µ ). We next show A ⊂ M(µ∗). Take A ∈ A and E ⊂ Ω. If µ∗(E) = ∞, we’re done because ∞ ≥ µ∗(A ∩ E) + µ∗(AC ∩ E). On the other hand, if µ∗(E) < ∞ then for every ε > 0, there ∞ ∞ ∞ [ X ∗ ∗ exists a sequence {An}n=1 ⊂ A such that An ⊃ E and µ (An) ≤ µ (E) + ε. Now A n=1 n=1 is a semiring, so Bn := An ∩ A lies in A for all n. Notice that the Bn cover A ∩ E, and the mn C [ An rBn cover A ∩E. By the semiring property, An rBn = Cnk for some Cnk ∈ A. Then k=1 ∞ ∞ mn ∗ X ∗ C X X µ (A ∩ E) ≤ µ(Bn) and µ (A ∩ E) ≤ µ(Cnk ). Notice that by finite additivity, n=1 n=1 k=1 m Xn µ(An) = µ(Bn) + µ(Cnk ) for each n. Putting this together, we have k=1

∞ ∞ mn ∗ ∗ C X X X µ (A ∩ E) + µ (A ∩ E) ≤ µ(Bn) + µ(Cnk ) n=1 n=1 k=1 ∞ m ! X Xn = µ(Bn) + µ(Cnk ) n=1 k=1 ∞ X = µ(An) n=1 ≤ µ∗(E) + ε. Since ε > 0 was arbitrary, this shows the Carath´eodory condition holds for A, so A ∈ M(µ∗). ∗ ∗ Finally we show that µ |A = µ. First, any A ∈ A covers itself, so µ (A) ≤ µ(A). On the ∞ other hand, if {An}n=1 is a cover of A by A-sets then ∞ ∞ ! X [ µ(An) ≥ µ An ≥ µ(A) n=1 n=1 by countable subadditivity and monotonicity. Taking the infimum shows that µ∗(A) ≥ µ(A), so we conclude that µ∗(A) = µ(A).

85 5.2 Outer Measure 5 Abstract Measure Theory

Example 5.2.4. Take A to be the collection of intervals in (0, 1] with λ((a, b]) = b−a. Then A is a semiring and λ extends to Lebesgue measure on (0, 1]. Similarly, length λ defined on the semiring A of finite intervals in R extends to Lebesgue measure on R1. We can generalize this construction to Rn by taking A to be the semiring of bounded n-dimensional rectangles; this is a semiring because removing a rectangle from a larger rectangle leaves a finite union of rectangles. Then the hypervolume function λ extends to n-dimensional Lebesgue measure on Rn. Definition. For sets A and B, their symmetric difference A4B is defined to be A4B = (A r B) ∪ (B r A). Theorem 5.2.5. Suppose A is a semiring and µ is a measure on σ(A) that is σ-finite on A.

∞ (1) For every B ∈ σ(A) and for every ε > 0, there exist {An}n=1 ⊂ A such that

∞ ∞ ! [ [ B ⊂ An and µ An r B < ε. n=1 n=1

(2) If in addition µ(B) < ∞, then there exist A1,...,An ∈ A such that

µ(B4(A1 ∪ · · · ∪ An)) < ε.

∗ Proof. (1) First assume µ(B) < ∞. Since µ|σ(A) = µ |σ(A) by the uniqueness theorem, ∗ ∞ S∞ µ(B) = µ (B). Then there exists a sequence {An}n=1 ⊂ A such that n=1 An ⊃ B and for all ε > 0, ∞ ! ∞ [ X ∗ µ An ≤ µ(An) ≤ µ (B) + ε = µ(B) + ε. n=1 n=1

These An satisfy (1). On the other hand if µ(B) = ∞, by σ-finiteness there exist Bn ∈ A S∞ ∞ such that Ω = n=1 Bn and µ(Bn) < ∞ for all n. In this case there exist {Ani }i=1 ⊂ A such S∞ that for each n, i=1 Ani ⊃ B ∩ Bn and

∞ ! [ ε µ A (B ∩ B ) < . ni r n 2 i=1

Then these An satisfy the desired properties. ∞ S∞ (2) Finally, if µ(B) < ∞, then choose {An}n=1 as in part (1). Then µ ( n=1 An) < ∞ S∞ ε SN S∞ and µ ( n=1 An r B) < 2 . Moreover, n=1 An r B −→ n=1 An r B so by continuity from below, there exists an N such that

∞ N ! [ [ ε µ A A < . n r n 2 n=1 n=1

86 5.3 Lebesgue Measure on Rn 5 Abstract Measure Theory

Then the measure of the symmetric difference in (2) can be estimated as

N !! N !! N ! ! [ [ [ µ B4 An = µ B r An + µ An r B n=1 n=1 n=1 ∞ N ! ∞ ! [ [ [ ≤ µ An r An + µ An r B n=1 n=1 n=1 ε ε < + = ε. 2 2

5.3 Lebesgue Measure on Rn

Define A as the set of bounded n-dimensional rectangles with rational coordinates in Rn, along with the empty set. This is a semiring as mentioned in Example 5.2.4 and σ(A) n contains the open sets in R . A is also clearly a π-system. Define λn(A) to be the volume of A for any A ∈ A. This is a measure on A and the conditions of the uniqueness and extension theorems hold so we can define

n Definition. The unique extension of λn(A) = vol(A) to the Borel sets R is called the Lebesgue measure on Rn.

Proposition 5.3.1. λn is translation-invariant.

Proof sketch. Volume is translation-invariant and one can check that if {An} ⊂ A cover A n then for any x ∈ R , {An + x} ⊂ A and {An + x} cover A + x.

Corollary 5.3.2. Hyperplanes in Rn have Lebesgue measure 0.

Proof. Fix a hyperplane H. Then Rn is the uncountable union of all the translations of H and λn is preserved under translation, so they all have the same measure. If any were n positive, this would contradict the σ-finite property of λn on R . Therefore λn(H) = 0.

Proposition 5.3.3. If T : Rn → Rn is a non-singular linear transformation and A is λn-measurable then TA is λn-measurable and

λn(TA) = | det T | λn(A).

Proof omitted.

Corollary 5.3.4. Lebesgue measure on Rn is invariant under rotations. The main idea in this section is the following: given a function F with certain properties, called a distribution function, a measure can be uniquely defined from F . We start by exploring this in the one-dimensional case. Suppose µ is a measure on the real line R

87 5.3 Lebesgue Measure on Rn 5 Abstract Measure Theory

such that every bounded, measurable set A has finite measure. Define the cumulative distribution function F by ( µ(0, x] x ≥ 0 F (x) = µ(x, 0] x < 0.

If µ is a finite measure, then F is bounded and in this case Fe(x) = µ(−∞, x] is well-defined and satisfies Fe − F = µ(−∞, 0]. So we see that F and Fe contain the same information in the finite case. A cumulative distribution function F (often shortened to a distribution) has the following properties: (1) F is nondecreasing, by monotonicity of µ.

(2) F is right-continuous, i.e. for every x ∈ R, lim F (y) = F (x). y→x+ In the case that x ≥ 0, this is a consequence of continuity from above, which holds since we have a finite condition on µ. In the case when x < 0, this follows from continuity from below, which always holds. (Monotone, right-continuous functions are nice since they may only have jump discontinuities, and in fact may only have a countable number of jump discontinuities.) (3) For any a < b, µ(a, b] = F (b) − F (a). (4) The Lebesgue measure λ has distribution F (x) = x. (5) µ determines F uniquely up to an additive constant.

Theorem 5.3.5. If F is a nondecreasing, right-continuous function on R then there exists a unique measure µ on B = R1 such that µ(a, b] = F (b) − F (a) for all a < b in R. Proof. The general case below will imply this theorem.

Now suppose B = Rn is the Borel σ-field on Rn, for n ≥ 2. Recall that B is generated by the collection of bounded rectangles

( n ) Y J = (ai, bi]: ai ≤ bi for all i . i=1

n Define S = {Sx¯ | x¯ ∈ R }, wherex ¯ = (x1, . . . , xn) and n Y Sx¯ = (−∞, xi]. i=1

Any bounded rectangle A is thus the finite difference of some collection of Sx¯. Thus S generates J which generates B, so S generates B. This is advantageous because S is a π-system.

88 5.3 Lebesgue Measure on Rn 5 Abstract Measure Theory

Suppose µ is a measure on Rn that is finite on bounded sets. In the two-dimensional case, inclusion-exclusion allows us to write the measure of a rectangle A as

µ(A) = µ(Sx¯1 ) − µ(Sx¯2 ) − µ(Sx¯3 ) + µ(Sx¯4 ),

n wherex ¯1, x¯2, x¯3, x¯4 are the vertices of A. In the n-dimensional case, A has 2 vertices {x¯1,..., x¯2n } wherex ¯i = (xi1, . . . , xin). For each j = 1, . . . , n, either xij = aj or bj for a left endpoint aj or a right endpoint bj. That is,

n Y A = (aj, bj]. j=1

For each vertex ~xi, let ( 1 if xij = aj an even number of times sgnx¯i (A) = −1 if xij = aj an odd number of times.

Define ∆ F = P sgn (A)F (¯x ). Note that if F (¯x) = µ(S ) then F is right-continuous A x¯i x¯i i x¯ (this means that F is continuous on limits taken in each component from the right). The condition ∆AF ≥ 0 will replace the monotonicity condition in the theorem below.

Theorem 5.3.6. Suppose F is right-continuous on Rn and for every rectangle A ∈ J , ∆AF ≥ 0. Then there exists a unique measure µ on B satisfying µ(A) = ∆AF for every A ∈ J .

Proof. Since ∅ is a degenerate rectangle, ∅ is measurable with µ(∅) = 0. Note that J is a π-system generating B so once we check µ exists, we will also have uniqueness. Also note that J is a semiring and µ(A) = ∆AF is defined on J so by the extension theorem (5.2.3) it only remains to check µ is finitely additive and countably subadditive on J . Qn Suppose A ∈ J can be written A = j=1 Jj, where Jj = (aj, bj]. A partition aj = tj0 < SM Qn tj1 < . . . < tjk < . . . < tjm = bj is regular if A = `=1 A` where M = j=1 m,

n Y A` = Jjk and Jjk = (tj(k−1), tjk]. j=1

PM This can be thought of as a product partition. If we write `=1 ∆A` F then each interior vertex appears an even number of times in the sum with cancelling signs. Each real vertex on the other hand appears once with the correct sign, so

M X ∆A` F = ∆AF. `=1 If we have an irregular partition, we may subdivide further to form a regular partition both of A and of each rectangle A` in the irregular partition. By the previous work, it all adds up correctly. Hence µ is finitely additive.

89 5.3 Lebesgue Measure on Rn 5 Abstract Measure Theory

S∞ Finally, suppose A = k=1 Ak, where A, A1,A2,... ∈ J . We want to show that µ(A) ≤ P∞ Qn k=1 µ(Ak). Let A = i=1(ai, bi]. For each i = 1, . . . , n, use continuity from the right to Qn choose δi > 0 such that B = i=1(ai + δi, bi] satisfies µ(B) > µ(A) − ε. Now A ⊃ B ⊃ B and B is closed and bounded, and hence compact by the Heine-Borel theorem. For each Qn Qn k, Ak = i=1(aki, bki] is not open, so choose δk > 0 such that Bk = i=1(aki, bki + δk] and ◦ ◦ ε µ(Bk) < µ(Ak) + 2k . Then Ak ⊂ Bk ⊂ Bk so {Bk} is an open cover of B. Since B is ◦ N compact, there exists a finite subcover {Bk}k=1. Then

N X µ(A) − ε < µ(B) ≤ µ(Bk) by finite subadditivity k=1 N X  ε  < µ(A ) + k 2k k=1 ∞ ! X < µ(Ak) + ε. k=1 Taking ε → 0 gives the result. Therefore there is a unique µ on B such that µ(·) restricts to ∆(·)F on J . Definition. A measure µ on a σ-field F is regular if it satisfies

(1) For all A ∈ F and for all ε > 0, there exist an open set G and a closed set C such that C ⊂ A ⊂ G and µ(G r C) < ε. (2) If µ(A) < ∞ then µ(A) = sup{µ(K) | K ⊂ A and K is compact}.

Proposition 5.3.7. Every measure µ on R1 assigning finite measure to bounded sets is regular.

ε Proof. If we assume µ(A) < ∞, there exists a bounded set A0 such that µ(A r A0) < 2 . Then (2) follows from (1) since there will exist a closed subset K ⊂ A which is compact ε by Heine-Borel and satisfies µ(A0 r K) < 2 . Letting ε → 0 shows that µ(A) is equal to sup{µ(K) | K ⊂ A is compact}. Now to prove (1), suppose A is a bounded rectangle. Take nearby open and closed rectangles using continuity from above and below. Since the collection of rectangles forms a ∞ semiring and generates B, by Theorem 5.2.5 there exists a countable collection {Ak}k=1 such S∞ ε that µ ( k=1 Ak r A) < 2 . Then for each Ak, choose nearby open and closed rectangles, say ε within 2k+1 , and add them up. Examples.

1 A Peano curve is an example of a space-filling curve, i.e. a map α : [0, 1] → R2 that is one-to-one but has positive area. Such a curve can even be made to fill the area of a square!

90 5.4 Measurable Functions 5 Abstract Measure Theory

2 (Banach-Tarski Paradox) Two sets measurable sets A, B ∈ Rn are said to be congru- ent if there exists an isometry between them, that is, a bijection ϕ : Rn → Rn such that ϕ(A) = B and λn is invariant under ϕ. This can be extended in the following way: any two sets A and B are said to be congruent if they can be decomposed into Sk Sk A = i=1 Ai and B = i=1 Bi such that Ai and Bi are congruent for each 1 ≤ i ≤ k. The Banach-Tarski Theorem says that when n ≥ 3, any two bounded sets A and B with nonempty interiors are congruent. The paradoxical demonstration of the theorem is that, in R3, one can decompose two solid spheres of different radius into finitely many nonmeasurable pieces that are congruent with the other sphere’s decomposition. Therefore any two spheres are congruent, even a baseball and the sun.

5.4 Measurable Functions

Definition. Let (Ω, F) and (Ω0, F 0) be measurable spaces. A function f :Ω → Ω0 is said to be measurable with respect to F/F 0 if for every A ∈ F 0, f −1(A) ∈ F. Definition. A real-valued f is called a random variable. If f(Ω) is finite, we say f is a simple random variable, as in Chapter 4. Theorem 5.4.1. Let f : (Ω, F) → (Ω0, F 0) be a measurable function. (1) If A0 generates F 0, it is sufficient to check the measurable condition for f on A0-sets.

(2) If (Ω00, F 00) is a measurable space and g :Ω0 → Ω00 is measurable, then g ◦ f :Ω → Ω00 is also measurable. Proof. Obvious from the definitions.

We are interested in measurable functions into Rn. Definition. For a measurable space (Ω, F), a measurable function f :Ω → Rn is called a random vector.

n Proposition 5.4.2. A function f :Ω → R of the form f(ω) = (f1(ω), . . . , fn(ω)) is measurable if and only if fi :Ω → R is measurable for each i = 1, . . . , n. n Qn Proof. R is generated by products (aka rectangles). So if A = i=1(ai, bi] then n −1 [ −1 f (A) = fi (ai, bi] ∈ F i=1 since F is a σ-field. The result follows. The definition of a measurable function in terms of pullbacks of measurable sets should recall the definition of continuous functions in analysis/topology. The next proposition shows that they are indeed related.

Proposition 5.4.3. If f :Ω → Rn is continuous, then f is measurable with respect to F/Rn.

91 5.4 Measurable Functions 5 Abstract Measure Theory

Proof. This follows from the fact that Rn is generated by open sets. There are plenty of functions that are measurable but not continuous (e.g. monotone functions, including step functions). So continuity is a strictly stronger notion than measur- ability.

n Proposition 5.4.4. Suppose fj :Ω → R are measurable for j = 1, . . . , n and g : R → R is measurable. Then g(f1(ω), . . . , fn(ω)) : Ω → R is measurable. Proof. Follows the analagous proof of Theorem 4.0.4. We immediately obtain

Corollary 5.4.5. If f, g :Ω → R are measurable then the following are also measurable: (1) af + bg for any a, b ∈ R. (2) (fg)(ω) = f(ω)g(ω).

 f  f(ω) (3) g (ω) = g(ω) provided g(ω) 6= 0.

(4) max{f(ω), g(ω)} and min{f(ω), g(ω)}.

(5) The composition of f with any continuous function h : R → R. Next, we need to understand how limits interact with measurable functions. For the moment, allow f :Ω → R ∪ {±∞} so that our limits may approach ±∞. ∞ Theorem 5.4.6. If {fi}i=1 is a sequence of measurable functions Ω → R then

(1) supi fi(ω), infi fi(ω), lim sup fi and lim inf fi are all measurable.

(2) If limi→∞ fi exists everywhere then it is measurable.

(3) The event [ω : limi→∞ fi(ω) exists] is measurable.

(4) If f :Ω → R is a measurable function then the event [ω : limi→∞ fi(ω) = f(ω)] is measurable.

Proof. This all relies on the fact that for any x ∈ R, ∞ \ A = [ω : sup fi(ω) ≤ x] = [ω : fi(ω) ≤ x] i=1 is a measurable event (by the measurability of countable intersections). Since the collection {(−∞, x]: x ∈ R} generates B, it suffices to check (1) – (4) for sets of this form. (1) The measurability of A above shows that sup fi – and by symmetry, inf fi – is mea- surable. Then we can express the lim sup in the following way,   lim sup fi = inf sup fi i→∞ n∈N i≥n

92 5.5 Distribution Functions 5 Abstract Measure Theory

and see that it is measurable since A is. A similar proof shows that lim inf fi is measurable. (2) If lim fi exists, it is equal to lim sup fi which is measurable by (1). (3) We can write [ω : lim fi(ω) exists] = [ω : lim sup fi(ω) − lim inf fi(ω) = 0]. Then since lim sup fi and lim inf fi are measurable by (2) and differences of measurable functions are measurable by Corollary 5.4.5, this set is measurable. (4) Finally, we can write B = [ω : limi→∞ fi(ω) = f(ω)] as

B = [ω : lim sup fi(ω) = lim inf fi(ω)] ∩ [ω : lim sup fi(ω) = f(ω)] which by (1) and (3) is the intersection of measurable sets, and hence is itself measurable.

Theorem 5.4.7. Suppose f :Ω → R is measurable. Then there exists a sequence of simple random variables fn :Ω → R such that if f(ω) ≥ 0 for all ω ∈ Ω, fn(ω) ≥ 0 for all n, ω and fn converges from below to f; and conversely if f(ω) ≤ 0 for all ω ∈ Ω then fn(ω) ≤ 0 for all n, ω and fn converges from above to f.

Proof sketch. First, f(ω) = χ{f≥0}(ω) + χ{f≤0}(ω) so f is measurable if and only if the characteristics functions are. Thus is suffices to prove the nonnegative case. Assume f(ω) ≥ 0 for all ω. Use dyadic intervals to subdivide the R. Then the sequence ( n if f(ω) ≥ n f (ω) = n k n k k+1 2n for k = 0,..., 2 n if 2n ≤ f(ω) ≤ 2n satisfies the desired conditions. Definition. Suppose (Ω, F, µ) is a measure space, (Ω0, F 0) is a measurable space and T is 0 −1 a measurable function with respect to F/F . Then T∗µ(A) := µ(T (A)) is called the push forward measure of T on (Ω0, F 0).

5.5 Distribution Functions

Definition. Suppose X :Ω → R is a random variable on a probability space (Ω, F,P ). The distribution measure of X is µ(A) = P [ω : X(ω) ∈ A] and the distribution function of X is F (x) = µ(−∞, x] = P [ω : X(ω) ≤ x].

Notice that distribution measure is a push forward measure: P [ω : X(ω) ∈ A] = X∗P (A). Proposition 5.5.1. A distribution F is monotone nondecreasing and right continuous. Since F is monotone, F (x−) := lim F (y) also exists. By continuity from below, we can y→x− write F (x−) = lim µ(−∞, y] so that y→x−

F (x) − F (x−) = µ(−∞, x] − µ(−∞, x) = µ({x}).

Since Ω is a finite measure space, the above implies that there are most countably many jump discontinuities of F – and it is known that for a monotone, right continuous function, these are the only discontinuities F may have.

93 5.5 Distribution Functions 5 Abstract Measure Theory

Theorem 5.5.2. If F is nondecreasing, right continuous and satisfies lim F (x) = 0 and lim F (x) = 1, x→−∞ x→∞ then there exists a probability space (Ω, F,P ) and a random variable X on Ω such that F is the distribution function of X. Proof. By Theorem 5.3.5, there exists a measure µ on R such that µ(a, b] = F (b) − F (a) for all a < b. Note that µ is a probability measure in this case, so the probability space (R, B, µ) and the random variable X(x) = x satisfy the theorem. Example 5.5.3. Suppose F is the waiting time distribution function from Example 5 of Section 4.9, i.e. let F (x) represent the probability of waiting less than time x for some event. By convention, we set F (x) = 0 when x ≤ 0. Then for any x, y ≥ 0, F satisfies 1 − F (x + y) = 1 − F (y). 1 − F (x) Let U(x) = 1 − F (x) so that this may be written U(x + y) = U(x)U(y). This is clearly an exponential function: U(x) = e−αx for some α > 0. Then F (x) = 1 − e−αx is called an exponential distribution.

Definition. Suppose F and {F_n} are distribution functions. We say F_n converges weakly to F, denoted F_n ⇀ F, if lim_{n→∞} F_n(x) = F(x) for any x at which F is continuous.

In analysis, it is common to rescale and translate the F_n's when computing weak convergence, e.g. instead of showing F_n(x) ⇀ F(x) we might show F_n(a_n x + b_n) ⇀ F(x). If X_1, ..., X_n are i.i.d. random variables with distribution function G, set M_n(ω) = max{X_1(ω), ..., X_n(ω)}. Then
\[
P[\omega : M_n(\omega) \le x] = P\!\left(\bigcap_{k=1}^n [\omega : X_k(\omega) \le x]\right) = \prod_{k=1}^n P[\omega : X_k(\omega) \le x] = G(x)^n.
\]
Then F_n(x) = G(x)^n is the distribution function of M_n.

Examples.

1. Consider the exponential distribution G(x) = 1 − e^{−αx} from above. Then M_n has distribution F_n(x) = G(x)^n = (1 − e^{−αx})^n. However, for all x > 0, F_n(x) → 0, which isn't a valid probability distribution. So instead we consider the shifted sequence P[M_n ≤ x + (1/α) log n] (with α as above):
\[
\begin{aligned}
P\left[\omega : M_n(\omega) \le x + \tfrac{1}{\alpha}\log n\right] &= F_n\!\left(x + \tfrac{\log n}{\alpha}\right) = G\!\left(x + \tfrac{\log n}{\alpha}\right)^{\!n} \\
&= \left(1 - e^{-(\alpha x + \log n)}\right)^{\!n} = \left(1 - \frac{e^{-\alpha x}}{n}\right)^{\!n} \rightharpoonup e^{-e^{-\alpha x}}.
\end{aligned}
\]
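A short sanity check of Example 1 (α, x and the values of n are arbitrary choices): G(x + (log n)/α)^n should approach the limit e^{−e^{−αx}} as n grows.

```python
import numpy as np

alpha = 2.0
G = lambda x: 1 - np.exp(-alpha * np.maximum(x, 0.0))   # exponential distribution function

x = 0.5
for n in [10, 100, 10_000, 1_000_000]:
    Fn_shifted = G(x + np.log(n) / alpha) ** n           # P[M_n <= x + (log n)/alpha]
    print(n, Fn_shifted, np.exp(-np.exp(-alpha * x)))    # approaches the limit e^{-e^{-alpha x}}
```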


2. Next, consider the distribution
\[
G(x) = \begin{cases} 0 & x \le 1, \\ 1 - x^{-\alpha} & x \ge 1. \end{cases}
\]

[Figure: graph of the distribution function G, equal to 0 for x ≤ 1 and increasing toward 1 as x → ∞.]

Then the sequence F_n(n^{1/α} x) converges weakly to e^{−x^{−α}} for x ≥ 0, so this is the distribution function for lim M_n. This is because
\[
F_n(n^{1/\alpha} x) = G(n^{1/\alpha} x)^n = \left(1 - (n^{1/\alpha} x)^{-\alpha}\right)^{\!n} = \left(1 - \frac{x^{-\alpha}}{n}\right)^{\!n}
\]

and this last expression converges to e^{−x^{−α}} wherever it is continuous.
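As with Example 1, here is a brief numerical check of Example 2 (the parameter values are my own choices): G(n^{1/α} x)^n approaches e^{−x^{−α}} for a fixed x > 0.

```python
import numpy as np

alpha = 3.0
G = lambda x: np.where(x <= 1, 0.0, 1 - x**(-alpha))    # Pareto-type distribution function

x = 1.7
for n in [10, 1_000, 100_000]:
    print(n, G(n**(1 / alpha) * x) ** n, np.exp(-x**(-alpha)))   # the two values converge together
```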

3. Similarly, consider
\[
G(x) = \begin{cases} 0 & x \le 0, \\ 1 - (1 - x)^\alpha & 0 \le x \le 1, \\ 1 & x \ge 1. \end{cases}
\]

Define F_n = G^n, the distribution of M_n, and rescale as in the previous examples:
\[
F_n(n^{-1/\alpha} x + 1) = \begin{cases} 0 & x \le -n^{1/\alpha}, \\[4pt] \left(1 - \dfrac{(-x)^{\alpha}}{n}\right)^{\!n} & -n^{1/\alpha} \le x \le 0, \\[4pt] 1 & x > 0. \end{cases}
\]

Then F_n ⇀ F, the distribution of lim M_n, given by

\[
F(x) = \begin{cases} e^{-|x|^\alpha} & x \le 0, \\ 1 & x > 0. \end{cases}
\]


4. Define the Heaviside function ∆ to be the distribution
\[
\Delta(x) = \begin{cases} 0 & x < 0, \\ 1 & x \ge 0. \end{cases}
\]

Notice that ∆ is the distribution function of the random variable X ≡ 0. Let X1,X2,... be i.i.d. random variables with distribution

\[
X_n = \begin{cases} +1 & \text{with probability } p = \tfrac{1}{2}, \\ -1 & \text{with probability } q = \tfrac{1}{2}. \end{cases}
\]

As in Theorem 4.4.1, set S_n = X_1 + ... + X_n. Then the Weak LLN (1.1.1) says that P[|\tfrac{1}{n}S_n| > ε] converges to 0 as n → ∞ for every ε > 0. Thus if F_n is the distribution function of \tfrac{1}{n}S_n, this implies F_n ⇀ ∆. Notice that if n is odd, P[S_n = 0] = 0, so the symmetry of [S_n ≤ 0] and [S_n ≥ 0] tells us that F_n(0) = 1/2. Thus F_n does not converge to ∆ at 0, but it doesn't matter since weak convergence doesn't take into account the jump discontinuity of ∆ at 0.
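A simulation sketch of Example 4 (the sample size and evaluation points are my own choices): the empirical distribution function of S_n/n tends to 0 at negative arguments and to 1 at positive ones, while its value at 0 stays near 1/2 for odd n – exactly the exceptional point allowed by weak convergence.

```python
import numpy as np

rng = np.random.default_rng(1)

def empirical_Fn(n, trials=200_000):
    """Empirical distribution function of S_n / n for +/-1 coin flips,
    using S_n = 2 * Binomial(n, 1/2) - n."""
    Sn = 2 * rng.binomial(n, 0.5, size=trials) - n
    vals = Sn / n
    return lambda x: np.mean(vals <= x)

for n in [11, 101, 1001]:                 # odd n, so P[S_n = 0] = 0
    Fn = empirical_Fn(n)
    print(n, Fn(-0.1), Fn(0.0), Fn(0.1))  # tends to 0, stays near 1/2, tends to 1
```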


6 Integration Theory

6.1 Measure-Theoretic Integration

For this chapter we assume f and g are measurable functions on some measure space (Ω, F, µ). We would like to define an integral ∫ f dµ when it exists.

First we consider nonnegative functions. Assume f(ω) ≥ 0 for all ω ∈ Ω. (We are allowing the possibility that f(ω) = +∞ for some ω.) Suppose {A_i}_{i=1}^n is a finite decomposition of Ω into measurable sets. To express the 'area under f' with respect to µ, we have the following sum:

\[
\sum_{i=1}^n \Bigl(\inf_{\omega \in A_i} f(\omega)\Bigr)\, \mu(A_i).
\]
Graphically, this is similar to the approach taken in Riemann integration of adding up the areas of rectangles approximating a curve:

[Figure: the graph of f over R, with rectangles under the curve corresponding to the pieces of a decomposition.]

(Of course, the function f need not be continuous.) We define the measure-theoretic integral of f to be the supremum of these sums over all such decompositions.

Definition. For a nonnegative measurable function f :Ω → R, the integral of f over Ω with respect to µ is

\[
\int_\Omega f\, d\mu = \sup\left\{ \sum_{i=1}^n \Bigl(\inf_{A_i} f(\omega)\Bigr)\mu(A_i) : \{A_i\}_{i=1}^n \text{ is a finite decomposition of } \Omega \right\}.
\]
If f is not necessarily nonnegative, we extend this by defining the functions

\[
f^+(\omega) = \max\{f(\omega), 0\} \quad \text{and} \quad f^-(\omega) = \max\{-f(\omega), 0\}.
\]

Observe that f⁺ and f⁻ are nonnegative, measurable functions. We would like to define the integral as the difference of ∫ f⁺ dµ and ∫ f⁻ dµ, but this only makes sense as long as the two are not both infinite.
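Before moving to signed functions, here is a small Python sketch illustrating the sup-over-decompositions definition for a nonnegative function (the choice f(x) = x² on ([0, 1], Lebesgue) and the dyadic level-set decomposition are mine, not from the notes): the lower sums increase toward ∫₀¹ x² dx = 1/3.

```python
import numpy as np

# Lower sums for f(x) = x^2 on ([0,1], Lebesgue), decomposing [0,1] into the
# dyadic level sets A_k = {x : k/2^n <= f(x) < (k+1)/2^n}.  Since f is
# increasing, A_k is the interval [sqrt(k/2^n), sqrt((k+1)/2^n)), whose
# Lebesgue measure can be written down exactly.
def lower_sum(n):
    k = np.arange(2**n)
    inf_on_Ak = k / 2**n                                   # inf of f on A_k
    mu_Ak = np.sqrt((k + 1) / 2**n) - np.sqrt(k / 2**n)    # length of A_k
    return np.sum(inf_on_Ak * mu_Ak)

for n in [2, 4, 8, 12]:
    print(n, lower_sum(n))    # increases toward the true integral 1/3
```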


Definition. Let f : Ω → R be measurable. If ∫_Ω f⁺ dµ = +∞ and ∫_Ω f⁻ dµ < ∞, we set ∫_Ω f dµ = +∞. If ∫_Ω f⁺ dµ < ∞ and ∫_Ω f⁻ dµ = +∞, we set ∫_Ω f dµ = −∞. If both are infinite, f is said to be nonintegrable. Otherwise f is integrable, and when ∫_Ω f⁺ dµ and ∫_Ω f⁻ dµ are both finite, the integral of f is defined as
\[
\int_\Omega f\, d\mu = \int_\Omega f^+\, d\mu - \int_\Omega f^-\, d\mu.
\]

Theorem 6.1.1. Suppose that f and g are nonnegative, measurable, integrable functions.

(1) If f is a simple function, i.e. f = Σ_{i=1}^n x_i χ_{A_i} for x_i ∈ R and A_i ∈ F, then

\[
\int_\Omega f\, d\mu = \sum_{i=1}^n x_i\, \mu(A_i).
\]

(2) If f(ω) ≤ g(ω) for all ω, then ∫_Ω f dµ ≤ ∫_Ω g dµ.

(3) (Monotone Convergence Theorem) If f_n converges to f from below, then ∫ f_n dµ converges to ∫ f dµ from below.

(4) For any α, β ≥ 0, αf + βg is integrable with
\[
\int_\Omega (\alpha f + \beta g)\, d\mu = \alpha \int_\Omega f\, d\mu + \beta \int_\Omega g\, d\mu.
\]

Proof. (1) By definition, ∫ f dµ ≥ Σ_{i=1}^n x_i µ(A_i). We want to show that for any decomposition {B_j}_{j=1}^m of Ω,
\[
\sum_{i=1}^n x_i\, \mu(A_i) \ge \sum_{j=1}^m \Bigl(\inf_{B_j} f(\omega)\Bigr)\mu(B_j).
\]
By finite additivity, we may expand the right side as

\[
\begin{aligned}
\sum_{j=1}^m \Bigl(\inf_{B_j} f(\omega)\Bigr)\mu(B_j) &= \sum_{j=1}^m \sum_{i=1}^n \inf\{f(\omega) : \omega \in B_j\}\,\mu(A_i \cap B_j) \\
&\le \sum_{j=1}^m \sum_{i=1}^n x_i\,\mu(A_i \cap B_j) = \sum_{i=1}^n \sum_{j=1}^m x_i\,\mu(A_i \cap B_j) \\
&= \sum_{i=1}^n x_i\,\mu(A_i),
\end{aligned}
\]
using finite additivity in the last step. This proves (1).


(2) is clear by the property that inf and sup preserve inequalities.

(3) First, if f_n is monotone increasing then ∫ f_n dµ is monotone increasing. In particular, ∫ f_n dµ is a monotone sequence of real numbers, so the real analysis version of the monotone convergence theorem says ∫ f_n dµ converges (possibly to +∞). Also, f_n ≤ f implies ∫ f_n dµ ≤ ∫ f dµ by (2), and limits preserve inequalities, so lim_{n→∞} ∫ f_n dµ ≤ ∫ f dµ. To show the other inequality, we will show that for any decomposition {A_i},

\[
S := \sum_{i=1}^m \nu_i\, \mu(A_i) \le \lim_{n\to\infty} \int f_n\, d\mu,
\]

where ν_i = inf_{A_i} f(ω). First suppose each ν_i and µ(A_i) is finite and positive. Fix ε > 0 with ε < ν_i for each i. Set A_{in} = [ω ∈ A_i : f_n(ω) ≥ ν_i − ε] and notice that for each i, the A_{in} converge from below to A_i: for a fixed ω ∈ A_i, f_n(ω) ≥ f(ω) − ε ≥ ν_i − ε for n sufficiently large, and the f_n are increasing. Set B_1 = A_{1n}, B_2 = A_{2n}, ..., B_m = A_{mn} and B_0 = (∪_{i=1}^m A_{in})^C. Then {B_j}_{j=0}^m is a finite decomposition of Ω, so

\[
\int f_n\, d\mu \ge \sum_{j=0}^m \Bigl(\inf_{B_j} f_n(\omega)\Bigr)\mu(B_j) \ge \sum_{j=1}^m \Bigl(\inf_{B_j} f_n(\omega)\Bigr)\mu(B_j) \ge \sum_{i=1}^m (\nu_i - \varepsilon)\,\mu(A_{in}).
\]
Letting n → ∞ and then ε → 0 shows that S ≤ lim ∫ f_n dµ, so (3) is proved in the finite, positive case.

Now suppose S is finite but not necessarily positive. Simply ignore the terms that are 0 and repeat the same procedure from the last paragraph. Finally, if S = +∞, there is some i with ν_i > 0 and µ(A_i) = +∞, or with ν_i = +∞ and µ(A_i) > 0. Then for fixed x and y such that 0 < x < ν_i and 0 < y < µ(A_i), we can show that
\[
\lim_{n\to\infty} \int f_n\, d\mu \ge xy.
\]
Taking the sup over all such x, y gives the result.

(4) is easy to check for simple functions, using the same technique of mutual refinement as above. In the non-simple case, Theorem 5.4.7 allows us to select sequences of simple functions f_n and g_n such that f_n converges to f from below and g_n converges to g from below. Then we can apply (3) and the simple case to produce the desired formula.

Recall that the term almost everywhere (abbreviated a.e.) refers to a condition that holds on a set whose complement has measure zero.

Theorem 6.1.2. Suppose f and g are nonnegative functions on a measure space (Ω, F, µ). Then

(1) If f = 0 a.e. then ∫ f dµ = 0.

(2) Conversely, if µ[ω : f(ω) > 0] > 0 then ∫ f dµ > 0.

(3) If ∫ f dµ < ∞ then f is finite a.e.


(4) If f ≤ g a.e. then ∫ f dµ ≤ ∫ g dµ.

(5) If f = g a.e. then ∫ f dµ = ∫ g dµ.

Proof. (1) Wherever f > 0 the measure is 0, so every term of every lower sum vanishes and ∫ f dµ = 0.

(2) Notice that the sequence A_n = [ω : f(ω) > 1/n] is monotone and converges from below to A = [ω : f(ω) > 0]. If µ(A) > 0 then there is some n such that µ(A_n) > 0, and therefore inf{f(ω) : ω ∈ A_n} ≥ 1/n > 0. This implies
\[
\int f\, d\mu \ge \frac{1}{n}\, \mu\bigl[\omega : f(\omega) > \tfrac{1}{n}\bigr] > 0.
\]

(3) If µ[ω : f(ω) = ∞] > 0 then
\[
\int f\, d\mu \ge \infty \cdot \mu[\omega : f(\omega) = \infty] = \infty.
\]

(5) follows from (4). To prove (4), note that we have already proved the statement when f ≤ g everywhere (Theorem 6.1.1(2)); but as in (1), a set of measure 0 cannot affect the value of ∫ f dµ or ∫ g dµ.

6.2 Properties of Integration

Remark. The integral is the unique linear operator that integrates indicator functions correctly – that is, by assigning ∫ χ_A dµ = µ(A) for all measurable sets A with respect to a given measure µ – and satisfies the monotone convergence property.

We have defined the integral for a not necessarily nonnegative function f as
\[
\int_\Omega f\, d\mu = \int_\Omega f^+\, d\mu - \int_\Omega f^-\, d\mu.
\]
Both terms are finite precisely when ∫_Ω |f| dµ < ∞.

The next theorem extends the results on integration for nonnegative functions to all integrable functions.

Theorem 6.2.1. Suppose f and g are integrable functions.

(1) If f ≤ g a.e. then ∫_Ω f dµ ≤ ∫_Ω g dµ.

(2) For any α, β ∈ R, αf + βg is integrable and
\[
\int_\Omega (\alpha f + \beta g)\, d\mu = \alpha \int_\Omega f\, d\mu + \beta \int_\Omega g\, d\mu.
\]


Proof. Write f = f⁺ − f⁻ and g = g⁺ − g⁻ and apply Theorems 6.1.2 and 6.1.1.

Remark. Every theorem about general integration has an analogous theorem for summation. Let Ω = N and let µ be the counting measure, µ(A) = |A|, where |A| may be infinite. Then integration with respect to this measure is summation:
\[
\int_{\mathbb{N}} f\, d\mu = \sum_{n=1}^\infty f(n)\mu(\{n\}) = \sum_{n=1}^\infty f(n).
\]
Notice that this is a theory of absolutely convergent series, since by the first remark above, f : N → R is integrable if and only if Σ_{n=1}^∞ |f(n)| < ∞.

The three most important results in measure theory are stated next. They are: the Monotone Convergence Theorem (MCT), Fatou's Lemma and the Lebesgue Dominated Convergence Theorem (DCT). All three are equivalent, in the sense that starting from any one of them the other two can be derived.

Theorem 6.2.2 (Monotone Convergence Theorem). Suppose {f_n}_{n=1}^∞ and f are nonnegative, integrable functions such that f_n converges from below a.e. to f. Then ∫ f_n dµ converges from below to ∫ f dµ.

Proof. This is a restatement of the MCT from Theorem 6.1.1.

Examples.

1. Let µ be the counting measure on Ω = N. The remark above says that there is a corresponding monotone convergence property for series: if x_{mn} and x_n are real numbers such that 0 ≤ x_{mn} ≤ x_n and x_{mn} → x_n as m → ∞ for each n, then
\[
\sum_{n=1}^\infty x_{mn} \nearrow \sum_{n=1}^\infty x_n \quad \text{as } m \to \infty.
\]
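A numerical illustration of Example 1 (the choice x_n = 1/n² and the truncation-style x_{mn} are mine, not from the notes): the sums Σ_n x_{mn} increase to Σ_n x_n = π²/6.

```python
import numpy as np

# Illustrative choice: x_n = 1/n^2 and x_{mn} = x_n for n <= m, else 0.
# Then sum_n x_{mn} is the m-th partial sum, which increases to sum_n x_n = pi^2/6.
N = 10**6                       # truncation used only to evaluate the series numerically
n = np.arange(1, N + 1)
x = 1.0 / n**2

for m in [10, 1_000, 100_000]:
    print(m, x[:m].sum())       # increases toward
print(np.pi**2 / 6)             # the limit sum_n x_n
```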

2. If µ_n is a sequence of measures on (Ω, F) and µ = Σ_{n=1}^∞ µ_n, then for a nonnegative function f, by the MCT (6.2.2),
\[
\sum_{n=1}^\infty \int_\Omega f\, d\mu_n = \int_\Omega f\, d\mu.
\]
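And a tiny check of Example 2 under one concrete choice (again my own, not the notes'): take µ_n to be Lebesgue measure restricted to [n−1, n), so that µ = Σ µ_n is Lebesgue measure on [0, ∞), and f(x) = e^{−x}.

```python
from math import exp

# Integral of e^{-x} over [n-1, n) with respect to Lebesgue measure, in closed form.
integral_mu_n = lambda n: exp(-(n - 1)) - exp(-n)

partial = sum(integral_mu_n(n) for n in range(1, 60))
print(partial)   # sum over n of  int f d(mu_n), which approaches
print(1.0)       # int f d(mu) = int_0^infty e^{-x} dx = 1
```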

Theorem 6.2.3 (Fatou's Lemma). If f_n ≥ 0 for all n, then
\[
\int \liminf_{n\to\infty} f_n\, d\mu \le \liminf_{n\to\infty} \int f_n\, d\mu.
\]

Proof. Define a sequence v_n = inf{f_k : k ≥ n}. Then v_n converges from below to lim inf f_n as n → ∞. For each n, v_n ≤ f_n, so ∫ v_n dµ ≤ ∫ f_n dµ by monotonicity. Then taking limits of both sides and applying the MCT (6.2.2) to the left side gives
\[
\lim_{n\to\infty} \int v_n\, d\mu \le \liminf_{n\to\infty} \int f_n\, d\mu \implies \int \liminf_{n\to\infty} f_n\, d\mu \le \liminf_{n\to\infty} \int f_n\, d\mu.
\]


Theorem 6.2.4 (Lebesgue Dominated Convergence Theorem). If f_n is a sequence of measurable functions and g is an integrable function such that |f_n| ≤ g a.e. for all n ∈ N and f_n → f, then the f_n and f are all integrable, with
\[
\lim_{n\to\infty} \int f_n\, d\mu = \int \lim_{n\to\infty} f_n\, d\mu = \int f\, d\mu.
\]

Proof. First note that ∫ |f_n| dµ ≤ ∫ g dµ < ∞ by monotonicity, so all the f_n are integrable.

Thus g + f_n is integrable and nonnegative a.e. for each n. By Fatou's Lemma (6.2.3),
\[
\begin{aligned}
\int \liminf_{n\to\infty} (g + f_n)\, d\mu &\le \liminf_{n\to\infty} \int (g + f_n)\, d\mu \\
\implies \int (g + f)\, d\mu &\le \liminf_{n\to\infty} \int (g + f_n)\, d\mu = \int g\, d\mu + \liminf_{n\to\infty} \int f_n\, d\mu \\
\implies \int g\, d\mu + \int f\, d\mu &\le \int g\, d\mu + \liminf_{n\to\infty} \int f_n\, d\mu.
\end{aligned}
\]

Now since g is integrable, ∫ g dµ is finite, so we can actually subtract this term from both sides, leaving
\[
\int f\, d\mu \le \liminf_{n\to\infty} \int f_n\, d\mu.
\]

Similarly, g − f_n is integrable, so
\[
\begin{aligned}
\int (g - f)\, d\mu &\le \liminf_{n\to\infty} \int (g - f_n)\, d\mu \\
\implies \int g\, d\mu - \int f\, d\mu &\le \int g\, d\mu - \limsup_{n\to\infty} \int f_n\, d\mu \\
\implies \int f\, d\mu &\ge \limsup_{n\to\infty} \int f_n\, d\mu.
\end{aligned}
\]
Therefore lim sup ∫ f_n dµ ≤ ∫ f dµ ≤ lim inf ∫ f_n dµ, but lim inf is always less than or equal to lim sup, so we have equality.

Examples.

3. Let λ be Lebesgue measure on R. Consider the three sequences of measurable functions, cleverly nicknamed:

\[
\begin{aligned}
\text{``the train''}: \quad & f_n = \chi_{[n,\, n+1]} \\
\text{``the steamroller''}: \quad & f_n = \tfrac{1}{n}\, \chi_{[0,\, n]} \\
\text{``the teepee''}: \quad & f_n = n\, \chi_{[0,\, 1/n]}.
\end{aligned}
\]
Notice that each f_n converges to 0 a.e. but ∫ f_n dλ = 1 for all n, so lim_{n→∞} ∫ f_n dλ = 1 for each of these. The train, steamroller and teepee are all examples of how Fatou's Lemma (6.2.3) can have a strict inequality.
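A direct numerical look at these three sequences (the grid size and sample points are arbitrary choices): each f_n integrates to (approximately) 1, while the values at any fixed x tend to 0, so the inequality in Fatou's Lemma is strict here.

```python
import numpy as np

train       = lambda n, x: ((n <= x) & (x <= n + 1)).astype(float)
steamroller = lambda n, x: ((0 <= x) & (x <= n)).astype(float) / n
teepee      = lambda n, x: n * ((0 <= x) & (x <= 1 / n)).astype(float)

x = np.linspace(0, 200, 2_000_001)   # grid on [0, 200], spacing 1e-4
dx = x[1] - x[0]

for name, f in [("train", train), ("steamroller", steamroller), ("teepee", teepee)]:
    n = 100
    integral = np.sum(f(n, x)) * dx                        # crude Riemann sum, close to 1
    pointwise = [f(m, np.array([2.0]))[0] for m in [10, 100, 1000]]
    print(name, round(integral, 3), pointwise)             # integral ~1, values at x=2 tend to 0
```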


From MCT (6.2.2) and DCT (6.2.4), we obtain the following long list of corollaries.

Corollary 6.2.5 (Bounded Convergence). If µ(Ω) < ∞ and the sequence f_n of measurable functions is uniformly bounded, then f_n → f a.e. implies
\[
\int_\Omega f_n\, d\mu \longrightarrow \int_\Omega f\, d\mu.
\]

Corollary 6.2.6 (Weierstrass M-Test). If (x_{nm}) is a doubly indexed sequence of real numbers satisfying
\[
\sum_{m=1}^\infty |x_{nm}| \le M_n
\]
for all n, where Σ_{n=1}^∞ M_n < ∞ (that is, the sequence (M_n) is integrable with respect to counting measure), then
\[
\sum_{n=1}^\infty \sum_{m=1}^\infty x_{nm} = \sum_{m=1}^\infty \sum_{n=1}^\infty x_{nm}
\]
and both double summations converge.

Proof. Apply the DCT (6.2.4), with respect to counting measure on N, to the partial sums, with dominating function g(n) = M_n.
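A small numerical illustration of the M-test (the double array x_{nm} = (−1)^m/(n²·2^m) is my own choice; it satisfies Σ_m |x_{nm}| ≤ 1/n² =: M_n with Σ_n M_n < ∞): the two iterated sums of a large truncation agree.

```python
import numpy as np

# Illustrative double array: x_{nm} = (-1)^m / (n^2 * 2^m).
N, M = 2000, 60                                   # truncation levels for the two indices
n = np.arange(1, N + 1).reshape(-1, 1)
m = np.arange(1, M + 1).reshape(1, -1)
x = (-1.0) ** m / (n**2 * 2.0**m)

print(x.sum(axis=1).sum())   # sum over m first, then n
print(x.sum(axis=0).sum())   # sum over n first, then m  -- same value
```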

Corollary 6.2.7. If f_n ≥ 0 then
\[
\sum_{n=1}^\infty \int f_n\, d\mu = \int \sum_{n=1}^\infty f_n\, d\mu.
\]

Corollary 6.2.8. If Σ f_n converges a.e. and there exists an integrable function g such that
\[
\left|\sum_{n=1}^N f_n\right| \le g \quad \text{for all } N,
\]
then
\[
\sum_{n=1}^\infty \int f_n\, d\mu = \int \sum_{n=1}^\infty f_n\, d\mu.
\]

Corollary 6.2.9. If Σ_{n=1}^∞ ∫ |f_n| dµ < ∞ then Σ_{n=1}^∞ f_n converges absolutely a.e. and
\[
\sum_{n=1}^\infty \int f_n\, d\mu = \int \sum_{n=1}^\infty f_n\, d\mu.
\]
