Probability (graduate class) Lecture Notes

Tomasz Tkocz (Carnegie Mellon University; [email protected])

These lecture notes were written for the graduate course 21-721 Probability that I taught at Carnegie Mellon University in Spring 2020.

Contents

1 Probability space
  1.1 Definitions
  1.2 Basic examples
  1.3 Conditioning
  1.4 Exercises
2 Random variables
  2.1 Definitions and basic properties
  2.2 $\pi$-$\lambda$ systems
  2.3 Properties of distribution functions
  2.4 Examples: discrete and continuous random variables
  2.5 Exercises
3 Independence
  3.1 Definitions
  3.2 Product measures and independent random variables
  3.3 Examples
  3.4 Borel-Cantelli lemmas
  3.5 Tail events and Kolmogorov's 0-1 law
  3.6 Exercises
4 Expectation
  4.1 Definitions and basic properties
  4.2 Variance and covariance
  4.3 Independence again, via product measures
  4.4 Exercises
5 More on random variables
  5.1 Important distributions
  5.2 Gaussian vectors
  5.3 Sums of independent random variables
  5.4 Density
  5.5 Exercises
6 Important inequalities and notions of convergence
  6.1 Basic probabilistic inequalities
  6.2 $L_p$-spaces
  6.3 Notions of convergence
  6.4 Exercises
7 Laws of large numbers
  7.1 Weak law of large numbers
  7.2 Strong law of large numbers
  7.3 Exercises
8 Weak convergence
  8.1 Definition and equivalences
  8.2 Relations to other notions of convergence and basic algebraic properties
  8.3 Compactness
  8.4 Exercises
9 Characteristic functions
  9.1 Definition and basic properties
  9.2 Inversion formulae
  9.3 Relations to convergence in distribution
  9.4 Exercises
10 Central limit theorem
  10.1 Auxiliary elementary lemmas
  10.2 Vanilla Central Limit Theorem
  10.3 Lindeberg's Central Limit Theorem
  10.4 Multidimensional case
  10.5 Poisson limit theorem
  10.6 Exercises
11 Quantitative versions of the limit theorem*
  11.1 Berry-Esseen theorem via Stein's method
  11.2 Local central limit theorem
  11.3 Poisson limit theorem
  11.4 Exercises
12 Conditional expectation
  12.1 Construction
  12.2 Important properties
  12.3 Basic examples
  12.4 Exercises
13 Martingales I
  13.1 Definitions and basic examples
  13.2 Martingale transforms and stopping times
  13.3 Convergence theorem
  13.4 Exercises
14 Martingales II
  14.1 $L_2$ martingales
  14.2 Uniformly integrable martingales
  14.3 Maximal inequalities
  14.4 Martingales bounded in $L_p$, $p > 1$
  14.5 Exercises
15 Applications of martingale theory
  15.1 Series of independent random variables
  15.2 Kolmogorov's 0-1 law and strong law of large numbers
  15.3 Kakutani's theorem
  15.4 The law of the iterated logarithm for Gaussians
  15.5 The Radon-Nikodym theorem
  15.6 Exercises
16 Large deviations
  16.1 Moment generating functions
  16.2 Upper bounds: Chernoff's inequality
  16.3 Cramér's theorem
  16.4 Quantitative bounds
  16.5 Bounds on the expected maximum of random variables
  16.6 A flavour of concentration inequalities
  16.7 Exercises
A Appendix: Carathéodory's theorem
B Appendix: Dynkin's theorem
C Appendix: Fubini's theorem
D Appendix: Infinite products of measures
E Appendix: Construction of expectation
  E.1 Nonnegative random variables
  E.2 General random variables
  E.3 Lebesgue's dominated convergence theorem
F Appendix: Lindeberg's swapping argument
G Appendix: The moment method
H Appendix: Feller's converse to the central limit theorem
I Appendix: Uniform integrability

1 Probability space

1.1 Definitions

Let $\Omega$ be a set.
A collection $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-algebra (sometimes also a $\sigma$-field) if

(i) $\Omega \in \mathcal{F}$,

(ii) for every $A \in \mathcal{F}$, we have $A^c \in \mathcal{F}$, that is, $\mathcal{F}$ is closed under taking complements,

(iii) for all sets $A_1, A_2, \ldots$ in $\mathcal{F}$, we have $\bigcup_{n=1}^{\infty} A_n \in \mathcal{F}$, that is, $\mathcal{F}$ is closed under taking countable unions.

Note that these imply that $\emptyset \in \mathcal{F}$ and that $\mathcal{F}$ is also closed under taking set differences, countable intersections, etc. For instance, $\mathcal{F} = \{\emptyset, \Omega\}$ is the trivial $\sigma$-algebra and $\mathcal{F} = 2^{\Omega}$ (all the subsets of $\Omega$) is the largest possible $\sigma$-algebra.

Suppose $\mathcal{F}$ is a $\sigma$-algebra on the set $\Omega$. A function
$$\mu\colon \mathcal{F} \to [0, +\infty]$$
is called a measure if

(i) $\mu(\emptyset) = 0$,

(ii) $\mu$ is countably additive, that is, for all pairwise disjoint sets $A_1, A_2, \ldots$ in $\mathcal{F}$, we have
$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n).$$

The measure $\mu$ is finite if $\mu(\Omega) < \infty$, and $\sigma$-finite if $\Omega$ is a countable union of sets in $\mathcal{F}$ of finite measure. The measure $\mu$ is a probability measure if $\mu(\Omega) = 1$.

A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$, where $\Omega$ is a set, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$ and $\mathbb{P}$ is a probability measure on $\mathcal{F}$. The sets in $\mathcal{F}$ are called events. The empty set is called an impossible event because $\mathbb{P}(\emptyset) = 0$. Set operations have natural interpretations: for "$A \cap B$", we say "$A$ and $B$ occur"; for "$A \cup B$", we say "$A$ or $B$ occurs"; for "$A^c$", we say "$A$ does not occur"; for "$\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k$", we say "infinitely many of the events $A_k$ occur"; etc.

This definition is the starting point of modern probability theory. Its foundations were laid by Kolmogorov, who presented his axiom system for probability theory in 1933.

We record some basic and useful properties of probability measures.

1.1 Theorem. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $A, B, A_1, A_2, \ldots$ be events. Then

(i) $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$,

(ii) if $A \subset B$, then $\mathbb{P}(B \setminus A) = \mathbb{P}(B) - \mathbb{P}(A)$ and $\mathbb{P}(A) \leq \mathbb{P}(B)$,

(iii) $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$,

(iv) $\mathbb{P}\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} \mathbb{P}(A_i) - \sum_{i<j} \mathbb{P}(A_i \cap A_j) + \sum_{i<j<k} \mathbb{P}(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} \mathbb{P}(A_1 \cap \cdots \cap A_n)$,

(v) $\mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) \leq \sum_{i=1}^{\infty} \mathbb{P}(A_i)$,

(vi) if $A_1, \ldots, A_n$ are pairwise disjoint, then $\mathbb{P}\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \mathbb{P}(A_i)$.

We omit the proofs (which are rather standard). Part (iv) is the so-called inclusion-exclusion formula. Part (v) is the so-called union bound. We also have the following continuity-of-measure type results for monotone events.

1.2 Theorem. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $A_1, A_2, \ldots$ be events.

(i) if the events $A_n$ are increasing, that is $A_1 \subset A_2 \subset \cdots$, then
$$\mathbb{P}\left(\bigcup_{k=1}^{\infty} A_k\right) = \lim_{n \to \infty} \mathbb{P}(A_n),$$

(ii) if the events $A_n$ are decreasing, that is $A_1 \supset A_2 \supset \cdots$, then
$$\mathbb{P}\left(\bigcap_{k=1}^{\infty} A_k\right) = \lim_{n \to \infty} \mathbb{P}(A_n).$$

Proof. (i) It helps to consider the events $B_1 = A_1$, $B_2 = A_2 \setminus A_1$, $B_3 = A_3 \setminus A_2, \ldots$, which are pairwise disjoint. We skip the details. Part (ii) can be obtained from (i) by taking complements.

1.3 Remark. Theorem 1.1 and Theorem 1.2 (i) hold for arbitrary measures (the proofs do not need the assumption $\mathbb{P}(\Omega) = 1$ of the measure $\mathbb{P}$ being probabilistic). Theorem 1.2 (ii) holds for arbitrary measures $\mathbb{P}$ as long as $\mathbb{P}(A_k) < \infty$ for some $k$.

1.2 Basic examples

1.4 Example. Let $\Omega = \{\omega_1, \omega_2, \ldots\}$ be a countable set and $\mathcal{F} = 2^{\Omega}$ (all subsets). Defining a probability measure on $(\Omega, \mathcal{F})$ really amounts to specifying a nonnegative sequence $p_1, p_2, \ldots$ such that $\sum_i p_i = 1$ and defining $\mathbb{P}(\{\omega_i\}) = p_i$. Then for every subset $A$ of $\Omega$,
$$\mathbb{P}(A) = \sum_{i:\, \omega_i \in A} p_i.$$
Conversely, since
$$1 = \mathbb{P}(\Omega) = \sum_i \mathbb{P}(\{\omega_i\}),$$
every probability measure on $(\Omega, \mathcal{F})$ is of this form.

1.5 Example. Let $\Omega$ be a finite nonempty set and $\mathcal{F} = 2^{\Omega}$. The uniform probability measure on $(\Omega, \mathcal{F})$ (sometimes referred to as classical) is defined as
$$\mathbb{P}(A) = \frac{|A|}{|\Omega|}$$
for every subset $A$ of $\Omega$, where $|\cdot|$ denotes cardinality.
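The identities of Theorem 1.1 are easy to sanity-check on a finite uniform space as in Example 1.5. The following sketch in plain Python (the sample space and the three events are arbitrary illustrative choices, not taken from the notes) verifies the inclusion-exclusion formula (iv) and the union bound (v):

```python
from itertools import combinations

def prob(event, omega):
    """Probability of an event under the uniform measure on a finite set omega."""
    return len(event & omega) / len(omega)

def inclusion_exclusion(events, omega):
    """Right-hand side of the inclusion-exclusion formula for P(A_1 u ... u A_n)."""
    total = 0.0
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)  # alternating sign (-1)^(k-1)
        for subset in combinations(events, k):
            # sum of probabilities of all k-fold intersections
            total += sign * prob(set.intersection(*subset), omega)
    return total

omega = set(range(12))  # uniform probability space on 12 points
A, B, C = {0, 1, 2, 3}, {2, 3, 4, 5, 6}, {5, 6, 7, 8}

lhs = prob(A | B | C, omega)  # direct probability of the union
rhs = inclusion_exclusion([A, B, C], omega)
assert abs(lhs - rhs) < 1e-12  # part (iv): inclusion-exclusion
assert lhs <= prob(A, omega) + prob(B, omega) + prob(C, omega)  # part (v): union bound
```

Each term of the outer loop contributes the sum of probabilities of all $k$-fold intersections with alternating sign, exactly as in part (iv); truncating after $k = 1$ gives the union bound of part (v).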
Our next two examples require nontrivial constructions. We wish to define two probability spaces which are reasonable models of

1) selecting a point uniformly at random from the interval $[0,1]$,

2) tossing a fair coin infinitely many times.

While choosing the ground set $\Omega$ is fairly natural, say $\Omega = [0,1]$ for 1), defining an appropriate $\sigma$-algebra and a probability measure on it poses certain challenges. Let us first try to illustrate the possible subtleties.

Let $\Omega = [0,1]$. If $(\Omega, \mathcal{F}, \mathbb{P})$ is meant to be a probability space modelling the selection of a point uniformly at random from $[0,1]$, then for $0 \leq a < b \leq 1$ we should have $\mathbb{P}((a,b)) = b - a$ (the probability that the point lies in the interval $(a,b)$ equals its length), and more generally, $\mathbb{P}$ should be translation-invariant. Thus $\mathcal{F}$ should at the very least contain all intervals. Thus let $\mathcal{F}$ be such a $\sigma$-algebra, that is, the smallest $\sigma$-algebra containing all the intervals in $[0,1]$; we write.