1 Probability Space

Introduction to Probability Theory

Unless otherwise noted, references to Theorems, page numbers, etc. are from Casella-Berger, chapter 1.

Statistics: draw conclusions about a population of objects by sampling from the population.

1 Probability space

We start by introducing the mathematical concept of a probability space, which has three components (Ω, B, P), respectively the sample space, the event space, and the probability function. We cover each in turn.

Ω: sample space. The set of outcomes of an experiment. Example: tossing a coin twice,

    Ω = {HH, HT, TT, TH}.

An event is a subset of Ω. Examples: (i) "at least one head" is {HH, HT, TH}; (ii) "no more than one head" is {HT, TH, TT}; etc.

In probability theory, the event space B is modelled as a σ-algebra (or σ-field) of Ω, which is a collection of subsets of Ω with the following properties:

(1) ∅ ∈ B.
(2) If an event A ∈ B, then A^c ∈ B (closed under complementation).
(3) If A_1, A_2, ... ∈ B, then ∪_{i=1}^∞ A_i ∈ B (closed under countable union). (A countable sequence is one that can be indexed by the natural numbers.)

Additional properties:

(4) (1) + (2) imply Ω ∈ B.
(5) (3) + De Morgan's Laws ((A ∪ B)^c = A^c ∩ B^c) imply ∩_{i=1}^∞ A_i ∈ B (closed under countable intersection).

Consider the two-coin toss example again. Even for this simple sample space Ω = {HH, HT, TT, TH}, there are multiple σ-algebras:

1. {∅, Ω}: the "trivial" σ-algebra.
2. The "powerset" P(Ω), which contains all the subsets of Ω.

In practice, rather than specifying a particular σ-algebra from scratch, there is usually a class of events of interest, C, which we want to be included in the σ-algebra. Hence, we wish to "complete" C by adding events to it so that we get a σ-algebra. For example, consider the 2-coin toss example again. We find the smallest σ-algebra containing {HH}, {HT}, {TH}, {TT}; we call this the σ-algebra "generated" by the fundamental events {HH}, {HT}, {TH}, {TT}. Here it is the full powerset P(Ω).

Formally, let C be a collection of subsets of Ω. The minimal σ-field generated by C, denoted σ(C), satisfies: (i) C ⊂ σ(C); (ii) if B' is any other σ-field containing C, then σ(C) ⊂ B'.

Finally, a probability function P assigns a number (a "probability") to each event in B. It is a function mapping B → [0, 1] satisfying:

1. P(A) ≥ 0, for all A ∈ B.
2. P(Ω) = 1.
3. Countable additivity: if A_1, A_2, ... ∈ B are pairwise disjoint (i.e., A_i ∩ A_j = ∅ for all i ≠ j), then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

Define: the support of P is the set {A ∈ B : P(A) > 0}.

Example: return to the 2-coin toss. Assuming that the coin is fair (50/50 chance of getting heads/tails), the probability function for the σ-algebra consisting of all subsets of Ω includes:

    Event A              P(A)
    {HH}                 1/4
    {HT}                 1/4
    {TH}                 1/4
    {TT}                 1/4
    ∅                    0
    Ω                    1
    {HH, HT, TH}         3/4   (using pt. (3) of the definition above)
    {HH, HT}             1/2
    ...
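To make the finite example concrete, here is a minimal Python sketch (an illustration added here, not part of the original notes): it builds the powerset σ-algebra for the two-coin toss, defines the fair-coin probability function, and checks the three defining properties. The helper names power_set and P are chosen only for this sketch.

```python
from itertools import chain, combinations

# Sample space for two tosses of a coin
omega = {"HH", "HT", "TH", "TT"}

def power_set(s):
    """All subsets of s: the largest sigma-algebra on a finite sample space."""
    s = list(s)
    subsets = chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    return [frozenset(c) for c in subsets]

B = power_set(omega)  # event space: 2^4 = 16 events

def P(event):
    """Fair-coin probability function: each outcome carries mass 1/4."""
    return len(event) / len(omega)

# The three defining properties, checked on this finite space
assert all(P(A) >= 0 for A in B)                       # 1. nonnegativity
assert P(frozenset(omega)) == 1                        # 2. P(Omega) = 1
A1, A2 = frozenset({"HH"}), frozenset({"HT", "TH"})    # disjoint events
assert P(A1 | A2) == P(A1) + P(A2)                     # 3. additivity (finite case)

print(len(B))                            # 16
print(P(frozenset({"HH", "HT", "TH"})))  # "at least one head" -> 0.75
```

Running it lists 16 events and returns 3/4 for "at least one head", matching the table above.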
1.1 Probability on the real line

In statistics, we frequently encounter probability spaces defined on the real line (or a portion thereof). Consider the following probability space: ([0, 1], B([0, 1]), µ).

1. The sample space is the real interval [0, 1].

2. B([0, 1]) denotes the "Borel" σ-algebra on [0, 1]. This is the minimal σ-algebra generated by the elementary events {[0, b) : 0 ≤ b ≤ 1}. This collection contains things like [1/2, 2/3], [0, 1/2] ∪ (2/3, 1], and {1/2}.

   • To see this, note that closed intervals can be generated as countable intersections of open intervals (and vice versa):

         lim_{n→∞} [0, 1/n)            = ∩_{n=1}^∞ [0, 1/n)            = {0},
         lim_{n→∞} (0, 1/n)            = ∩_{n=1}^∞ (0, 1/n)            = ∅,
         lim_{n→∞} (a − 1/n, b + 1/n)  = ∩_{n=1}^∞ (a − 1/n, b + 1/n)  = [a, b],     (1)
         lim_{n→∞} [a + 1/n, b − 1/n]  = ∪_{n=1}^∞ [a + 1/n, b − 1/n]  = (a, b).

     (The limits have an unambiguous meaning because the set sequences are monotonic.)

   • Thus, B([0, 1]) can equivalently be characterized as the minimal σ-field generated by: (i) the open intervals (a, b) in [0, 1]; (ii) the closed intervals [a, b]; (iii) the closed half-lines [0, a]; and so on.

   • Moreover, it is also the minimal σ-field containing all the open sets in [0, 1]: B([0, 1]) = σ(open sets on [0, 1]).

   • This last characterization of the Borel field, as the minimal σ-field containing the open subsets, can be generalized to any metric space (i.e., any space where "openness" is defined). This includes R, R^k, and even function spaces (e.g. L^2[a, b], the space of square-integrable functions on [a, b]).

3. µ(·) is Lebesgue measure: for A ∈ B, µ(A) is the sum of the lengths of the intervals contained in A. E.g. µ([1/2, 2/3]) = 1/6, µ([0, 1/2] ∪ (2/3, 1]) = 5/6, µ({1/2}) = 0.

More examples: consider the measurable space ([0, 1], B). Are the following probability measures?

   • For some δ ∈ [0, 1] and A ∈ B, P(A) = µ(A) if µ(A) ≤ δ, and P(A) = 0 otherwise.
   • P(A) = 1 if A = [0, 1], and P(A) = 0 otherwise.
   • P(A) = 1, for all A ∈ B.

Can you figure out an appropriate σ-algebra for which these functions are probability measures? For the third example: take the σ-algebra {∅, [0, 1]}.

1.2 Additional properties of probability measures (CB Thms 1.2.8-11)

For a probability function P and A, B ∈ B:

   • P(∅) = 0;
   • P(A) ≤ 1;
   • P(A^c) = 1 − P(A);
   • P(B ∩ A^c) = P(B) − P(A ∩ B);
   • P(A ∪ B) = P(A) + P(B) − P(A ∩ B);
   • Subadditivity (Boole's inequality): for events A_i, i ≥ 1,
         P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i);
   • Monotonicity: if A ⊂ B, then P(A) ≤ P(B);
   • P(A) = Σ_{i=1}^∞ P(A ∩ C_i) for any partition C_1, C_2, ... of Ω.

By manipulating the above properties, we get

    P(A ∩ B) = P(A) + P(B) − P(A ∪ B)                                    (2)
             ≥ P(A) + P(B) − 1,

which is called the Bonferroni bound on the joint event A ∩ B. (Note: when P(A) and P(B) are small, the bound is < 0, in which case it is trivially correct. Also, the bound is always ≤ 1.)

With three events, the above properties imply

    P(∪_{i=1}^3 A_i) = Σ_{i=1}^3 P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + P(A_1 ∩ A_2 ∩ A_3),

and with n events we have

    P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i) − Σ_{1≤i<j≤n} P(A_i ∩ A_j) + Σ_{1≤i<j<k≤n} P(A_i ∩ A_j ∩ A_k)
                       + ··· + (−1)^{n+1} P(A_1 ∩ A_2 ∩ ··· ∩ A_n).

This equality, the inclusion-exclusion formula, can be used to derive a wide variety of bounds (depending on what is known and unknown).

2 Updating information: conditional probabilities

Consider a given probability space (Ω, B, P).

Definition 1.3.2: if A, B ∈ B and P(B) > 0, then the conditional probability of A given B, denoted P(A|B), is

    P(A|B) = P(A ∩ B) / P(B).

If you interpret P(A) as "the probability that the outcome of the experiment is in A", then P(A|B) is "the probability that the outcome is in A, given that you know it is in B".

   • If A and B are disjoint, then P(A|B) = 0/P(B) = 0.
   • If A ⊂ B, then P(A|B) = P(A)/P(B) ≤ 1. Here "B is necessary for A".
   • If B ⊂ A, then P(A|B) = P(B)/P(B) = 1. Here "B implies A".

As CB point out, when you condition on the event B, then B becomes the sample space of a new probability space, for which P(·|B) is the appropriate probability measure:

    Ω    →  B
    B    →  {A ∩ B : A ∈ B}
    P(·) →  P(·|B)
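As a quick numerical illustration of this definition and of the re-normalization just described (a sketch added here, not from the notes; it assumes two fair dice, and the helper names P and P_given are made up for the example):

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes
omega = list(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

def P_given(A, B):
    """Conditional probability P(A|B) = P(A intersect B) / P(B); needs P(B) > 0."""
    return P(A & B) / P(B)

A = {w for w in omega if w[0] + w[1] == 8}   # "the two dice sum to 8"
B = {w for w in omega if w[0] == 3}          # "the first die shows 3"

print(P(A))           # 5/36
print(P_given(A, B))  # 1/6: within B, only (3, 5) sums to 8

# Conditioning on B yields a genuine probability measure on the reduced space:
assert P_given(set(omega), B) == 1            # P(Omega | B) = 1
assert sum(P_given({w}, B) for w in B) == 1   # conditional masses within B sum to 1
```

The last two assertions are exactly the point above: restricted to events of the form A ∩ B, the function P(·|B) satisfies the probability axioms with B playing the role of the sample space.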
From manipulating the conditional probability formula, you get

    P(A ∩ B) = P(A|B) · P(B) = P(B|A) · P(A)
    ⇒  P(A|B) = P(B|A) · P(A) / P(B).

For a partition of disjoint events A_1, A_2, ... of Ω: P(B) = Σ_{i=1}^∞ P(B ∩ A_i) = Σ_{i=1}^∞ P(B|A_i) P(A_i). Hence

    P(A_i|B) = P(B|A_i) · P(A_i) / Σ_{j=1}^∞ P(B|A_j) P(A_j),

which is Bayes' Rule.

Example: Let's Make a Deal.

   • There are three doors (numbered 1, 2, 3). Behind one of them, a prize has been randomly placed.
   • You (the contestant) bet that the prize is behind door 1.
   • Monty Hall opens door 2, and reveals that there is no prize behind door 2.
   • He asks you: "do you want to switch your bet to door 3?"

Informally: MH has revealed that the prize is not behind door 2. There are two cases: either (a) it is behind door 1, or (b) it is behind door 3. In which case is MH's opening door 2 more probable? In case (a), MH could have opened either door 2 or door 3; in case (b), MH is forced to open door 2 (since he cannot open door 1, because you chose that door). MH's opening of door 2 is more probable under case (b), so you should switch. (This is actually a "maximum likelihood" argument.)

More formally, define two random variables D (the door behind which the prize is) and M (the door which Monty opens).
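As an illustration of where the formal argument heads (added here, not part of the original notes), the following Python sketch applies Bayes' Rule to the case where you pick door 1 and Monty opens door 2, and checks the answer with a quick simulation; the helper function play is hypothetical.

```python
import random
from fractions import Fraction

# Bayes' Rule: you pick door 1, Monty opens door 2. Condition on the event M = 2.
prior = {d: Fraction(1, 3) for d in (1, 2, 3)}        # P(D = d)
likelihood = {1: Fraction(1, 2),   # D = 1: Monty chooses door 2 or 3 at random
              2: Fraction(0),      # Monty never opens the prize door
              3: Fraction(1)}      # D = 3: door 2 is his only legal choice

evidence = sum(likelihood[d] * prior[d] for d in (1, 2, 3))              # P(M = 2) = 1/2
posterior = {d: likelihood[d] * prior[d] / evidence for d in (1, 2, 3)}
print(posterior)   # D = 1 -> 1/3, D = 2 -> 0, D = 3 -> 2/3: switching is better

# Monte Carlo check of the same conclusion
def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize, pick = random.randint(1, 3), 1
        opened = random.choice([d for d in (1, 2, 3) if d != pick and d != prize])
        final = next(d for d in (1, 2, 3) if d not in (pick, opened)) if switch else pick
        wins += (final == prize)
    return wins / trials

print(play(switch=False))  # about 1/3
print(play(switch=True))   # about 2/3
```

The exact posterior matches the informal argument: conditioning on Monty's choice moves the probability that the prize is behind door 3 from 1/3 to 2/3, while door 1 stays at 1/3.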