
18.675: Theory of Probability

Lecturer: Professor Nike Sun Notes by: Andrew Lin

Fall 2019

Introduction

The course website for this class can be found at [1]. The page notably contains a summary of topics covered, as well as some brief lecture notes and further references. 18.675 is the new numbering for 18.175; the material covered will be similar to previous years. The basic goal is to cover basic objects and laws in probability in a formal mathematical framework; some background in analysis and probability is strongly recommended. (If we are missing 18.100, we should talk to Professor Sun after class.) The main readings for this class will come from chapters 1–4 of our textbook (see [2]): readings for each chapter are included in the summary of topics linked above.

Grades are mostly homework (25%), exam 1 (35%), and exam 2 (35%). Class participation is another 5% of the grade – this is a graduate class, but we are expected to come to class. This is a large class that will probably get smaller, so that's one way attendance may be checked. Alternatively, a short quiz may be given occasionally in class. The quizzes won't be graded; they'll mostly be a check of whether we're all following the material (and a formal record of who showed up). Realistically, attending class is highly correlated with doing well!

By the way, there is some overlap with 18.125 (measure theory), particularly near the beginning. However, most of the content will be significantly different.

1 September 4, 2019

We’ll start the class with some measure theory, which gives us a formal language for probability. On its own, measure theory is a bit dry, so let’s try to think about why we need it at all.

Example 1 (We should know this, e.g. from 18.600)
Consider a p-biased coin, where the probability of a head is p and the probability of a tail is 1 − p. Then the probability of six independent p-biased coins coming up H, H, T, H, T, H is just the product of the probabilities: it's p^4 (1 − p)^2. Since there are (6 choose 4) ways to pick 4 heads among the 6 coin tosses, the probability of having 4 heads in total is (6 choose 4) p^4 (1 − p)^2.
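As a quick numerical sanity check on this computation (a sketch, not part of the notes; the choice p = 0.3 is arbitrary):

```python
from math import comb

def prob_sequence(p, seq):
    """Probability of one exact sequence of independent p-biased tosses."""
    return p ** seq.count("H") * (1 - p) ** seq.count("T")

def prob_k_heads(p, n, k):
    """Probability of exactly k heads in n independent p-biased tosses."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p = 0.3
# The specific sequence H, H, T, H, T, H has probability p^4 (1 - p)^2 ...
assert abs(prob_sequence(p, "HHTHTH") - p**4 * (1 - p)**2) < 1e-12
# ... and there are comb(6, 4) = 15 sequences with exactly four heads.
assert abs(prob_k_heads(p, 6, 4) - comb(6, 4) * p**4 * (1 - p)**2) < 1e-12
```

Summing prob_k_heads over k = 0, …, 6 returns 1, as the binomial distribution should.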

We should generally also be familiar with what it means to have U_n be a random variable distributed uniformly over a finite set like {1/n, 2/n, ··· , (n − 1)/n, 1}, as well as what it means for V_n to be an independent copy of U_n: then we can write statements of independence like

P(U_n = j/n, V_n = k/n) = P(U_n = j/n) · P(V_n = k/n) = 1/n^2.
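For finite n this can be verified by brute-force enumeration of the joint distribution (a minimal sketch, not part of the notes; n = 5 is an arbitrary choice):

```python
from fractions import Fraction

n = 5
states = [Fraction(j, n) for j in range(1, n + 1)]  # {1/n, 2/n, ..., 1}

# Joint law of an independent pair (U_n, V_n): product of the marginals.
joint = {(u, v): Fraction(1, n) * Fraction(1, n) for u in states for v in states}

# Every pair (j/n, k/n) has probability exactly 1/n^2, and the mass sums to 1.
assert all(p == Fraction(1, n**2) for p in joint.values())
assert sum(joint.values()) == 1
```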

But this becomes a bit more complicated if the set of allowed states is not finite. For example, what does it mean for U to be distributed uniformly in the range [0, 1]? We can't write down the probability that U takes on any particular value. Also, is it true that U_n converges to U as n → ∞, at least in some sense (since we seem to be populating [0, 1] uniformly)? Let's now see an example that illustrates that this kind of problem is trickier than it might initially seem:

Problem 2 Alice and Bob play a game where they have an infinite sequence of boxes. Alice puts a real number in each box, and Bob can see the number in every box except for one of them, which he can choose. “Show” that Bob can guess the unseen number with 90 percent probability.

“Solution”. To show this, let's set up some preliminary notation: let S be the set of infinite sequences of real numbers, which can also be denoted R^N. For two sequences x, y ∈ S, say that x ∼ y (they are equivalent) if they eventually agree: x_n = y_n for all n > N, for some finite index N. This is an equivalence relation on the space of real-valued sequences, so there is some quotient space S/∼ of equivalence classes. We can pick one representative z from each equivalence class. In other words, if x ∈ S, there is a unique representative sequence z = z(x) such that z ∼ x: since they are equivalent by definition, we can let n(x) be the first index after which x and z agree (that is, x_m = z_m for all m ≥ n(x)).

So here's Bob's strategy for guessing the unseen number. He splits the boxes into ten rows: row k contains the boxes which are in spots k mod 10. We now have ten sequences x^(1), ··· , x^(10) corresponding to our ten rows. Bob picks one of these rows, uniformly at random (say row 10 without loss of generality). He reveals all of the other rows, and he evaluates

N = max{n(x^(1)), n(x^(2)), ··· , n(x^(9))} + 1.

(In other words, he looks at the first nine rows, and he finds the first index after which each row agrees with its representative sequence.) He now reveals every box in row 10 except for box N: he now has enough information to deduce z^(10), the representative sequence for x^(10). Bob now guesses z^(10)_N.

Why is this correct with 90 percent probability? We chose this row uniformly at random, and there's at most a 10 percent chance that n(x^(10)) is the largest out of all the n(x^(i))s, which is the only case where x^(10)_N can disagree with z^(10)_N.

This is a strange example, and we've obviously violated some rules of probability to get to this point. But it's constructed in a way such that we don't really know where we've messed up! So that's why it's worthwhile to understand a little more about measure theory. Let's start with a seemingly basic problem: how do we define the variable U ∼ Unif[0, 1] from above? We could try to define a function (or measure) µ on subsets of [0, 1], such that µ(A) is the probability that U ∈ A for any subset A. For example, for any 0 ≤ a ≤ b ≤ 1, it's natural to say that

µ([a, b]) = b − a.

We also want to have some form of additivity: if A, B ⊆ [0, 1] are disjoint, then we want their disjoint union to satisfy

µ(A ⊔ B) = µ(A) + µ(B).

By induction, this means that for any positive integer n,

µ(⊔_{i=1}^n A_i) = Σ_{i=1}^n µ(A_i).

In addition, we can think about the limit of the expression ⊔_{i=1}^n A_i → ⊔_{i=1}^∞ A_i: it seems natural to also want countable additivity:

µ(⊔_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i).

Remark 3. Note that µ([0, 1]) = 1, but [0, 1] is an uncountable union of points, and each point has measure zero. So it's not really reasonable to expect uncountable additivity to hold, too.

So to summarize, our measure µ(A) = P(U ∈ A) should satisfy the following:

• µ([a, b]) = b − a for all 0 ≤ a ≤ b ≤ 1 (“lengths” of intervals).

• Countable additivity (as shown above).

• We should also want translation invariance: if A ⊆ [0, 1], and we define A_x = A + x to be A shifted rightward by x (mod 1), then µ(A) = µ(A_x).

This is not a lot of requirements: is it enough? If we're given any subset A, can we use these three properties to determine µ(A)? Let's try a slightly complicated example:

Example 4
Consider the Cantor set C = ⋂_{n=0}^∞ C_n, where the C_n are iteratively defined as

C_0 = [0, 1],  C_{n+1} = C_n \ (“middle third” of each interval of C_n).

(Because the C_n are decreasing, the limit C can indeed be defined this way.) How do we find the measure of the Cantor set? We know that µ(C_0) = 1, and we also know that C_n is the disjoint union of C_{n+1} and the middle thirds of C_n, so

µ(C_{n+1}) = µ(C_n) − µ(middle thirds of C_n) = (2/3) µ(C_n).

This means that µ(C_n) = (2/3)^n for all n, and taking the limit, µ(C) = 0. This seems encouraging towards an answer of "yes, we can determine µ(A)," but it turns out the Cantor set is actually pretty well-behaved compared to some worse sets. In fact, we've actually been asking the wrong question:
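The recursion µ(C_{n+1}) = (2/3) µ(C_n) can be checked by tracking the intervals of each C_n explicitly, in exact rational arithmetic (a sketch, not part of the notes):

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third of each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))   # left third
        out.append((b - third, b))   # right third
    return out

intervals = [(Fraction(0), Fraction(1))]   # C_0 = [0, 1]
for n in range(1, 6):
    intervals = cantor_step(intervals)
    measure = sum(b - a for a, b in intervals)
    assert measure == Fraction(2, 3) ** n  # mu(C_n) = (2/3)^n
```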

Proposition 5 There exists no function µ : P([0, 1]) → [0, 1] that satisfies all three of the properties we want!

Proof (the Vitali construction). Define an equivalence relation on the real numbers

x ∼ y if x − y ∈ Q.

This partitions the real numbers into equivalence classes (cosets of the form [x] = x + Q). Each one is a shift of the rationals

by some real number, and similarly to the Alice and Bob problem, we pick a representative z from each equivalence class. For simplicity, we can pick all z to be in [0, 1]. Let A be the set of all representatives, which is a subset of [0, 1]. (This is sometimes called the Vitali set.) By the properties above, because we can partition [0, 1] into translations of A,

1 = µ([0, 1]) = µ(⋃_{q ∈ Q ∩ [0,1)} A_q) = Σ_q µ(A_q)

by countable additivity. But now all the A_q have equal measure by translation invariance, which means that

1 = Σ_q µ(A),

which cannot work: 1 cannot be the countable sum of a bunch of equal numbers.

The way measure theory deals with this problem is to not require that µ be defined on all subsets. As a historical note, when this was first proposed, it was a pretty surprising idea, but it’s really the only thing we can do if we try to write formal definitions down! We don’t really want to relax the three properties we’re given, so it’s better to just not define the measure on some pathological subsets.

Definition 6 A measure space is a triple (Ω, F, µ) satisfying the following axioms:

• Ω, the state space or outcome space, is nonempty.

• F, a set of measurable subsets or events, is a σ-field or σ-algebra over Ω (the terms will be used interchangeably). In other words, F is a nonempty collection of subsets of Ω which is closed under complementation and countable union: if A ∈ F, then A^c = Ω \ A is also in F, and if A_i ∈ F for all i, then ⋃_{i=1}^∞ A_i is also in F.

• µ, the measure, is a map F → [0, ∞] (the extended positive reals) which is not ∞ everywhere and is countably additive: if the A_i ∈ F are disjoint,

µ(⊔_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i).

The goal of next week is basically to construct a uniform measure on [0, 1]: we'll define some measure space ([0, 1], L, µ), where L is still to be defined, and by extension also define (R, L_R, µ).

Definition 7 µ is called a probability measure (denoted P) if µ(Ω) = 1.

Working with the definition, we can gather a few immediate consequences about measure spaces.

Proposition 8
Given a measure space (Ω, F, µ),

• ∅ and Ω are in F.

• µ(∅) = 0.

• (Continuity from below) If A_i ↑ A, that is, A_1 ⊆ A_2 ⊆ ··· and A = ⋃_{i=1}^∞ A_i, then µ(A_i) ↑ µ(A).

• (Continuity from above) If A_i ↓ A, that is, A_1 ⊇ A_2 ⊇ ··· and A = ⋂_{i=1}^∞ A_i, and also µ(A_1) < ∞, then µ(A_i) ↓ µ(A).

• For any Ω, F = {∅, Ω} is a valid σ-field. So is P(Ω).

Proof. For the first point, F is nonempty, so there is some A in F. Then A^c ∈ F, so A ∪ A^c = Ω is in F, and so is Ω^c = ∅. Similarly, for the second point, there is some A such that µ(A) < ∞: then µ(A ⊔ ∅) = µ(A) = µ(A) + µ(∅), so µ(∅) = 0.

For the third point, use countable additivity on A = ⊔_{i=1}^∞ B_i, where B_1 = A_1, B_2 = A_2 \ A_1, B_3 = A_3 \ A_2, and so on. Then

µ(A) = Σ_{n=1}^∞ µ(B_n),

and since µ(A_n) = Σ_{i=1}^n µ(B_i), µ(A_i) indeed converges to µ(A) as desired. Similarly, for the fourth point, define B_1 = ∅, B_2 = A_1 \ A_2, B_3 = A_1 \ A_3, and so on. All the B_i have finite measure since µ(A_1) is finite, and B_i ↑ A_1 \ A, so we can use the third point (together with µ(A_i) = µ(A_1) − µ(B_i)) to finish. (A counterexample with µ(A_1) = ∞ is A_i = [i, ∞).) Finally, the examples in the last point satisfy both closure axioms (complementation and countable union).

If Ω were a finite set, and we were trying to define a probability space in 18.600, we would likely say that P({ω}) = a_ω for every ω ∈ Ω, with Σ_{ω∈Ω} a_ω = 1. In our new terminology, F = P(Ω) is just the power set of Ω, and P(A) = Σ_{ω∈A} a_ω: we can check that this is a valid probability space. But what if Ω = [0, 1] or R? If we take F = P(Ω), we do have a valid σ-field, but the Vitali set shows that we can't define a measure µ on it. Next week, we'll basically ask how we need to restrict our set F to get a useful measure!
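This finite recipe is easy to implement directly. A minimal sketch (the three weights below are an arbitrary example) that checks finite additivity over the whole power set:

```python
from fractions import Fraction
from itertools import chain, combinations

# A finite probability space: Omega with weights a_omega summing to 1.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
assert sum(weights.values()) == 1

def P(event):
    """P(A) = sum of a_omega over omega in A, with F = the power set."""
    return sum(weights[w] for w in event)

omega = list(weights)
# F = P(Omega): every subset of Omega is an event.
events = [set(s) for s in
          chain.from_iterable(combinations(omega, r) for r in range(len(omega) + 1))]

# Check finite additivity on every pair of disjoint events.
for A in events:
    for B in events:
        if A.isdisjoint(B):
            assert P(A | B) == P(A) + P(B)
```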

Definition 9

The Borel σ-field BR is the smallest σ-field over R that contains I, the set of all open intervals.

When we see the word "smallest," we should ask the question "is it well-defined?". We check that if {F_α} is any collection of σ-fields over Ω, then their intersection is also a σ-field. Indeed, if a set A is in the intersection, then both A and A^c are in all of the σ-fields, so A^c is in the intersection; the same argument works for countable unions. Now we can just let B_R be the intersection of all σ-fields containing I, and this is well-defined.

2 September 9, 2019

This lecture and the next are being given by Professor Subhabrata Sen. Recall that our central question today is how to define the uniform measure on [0, 1]: we want to define a function µ on (some) subsets of [0, 1] with the requirements that

• µ((a, b]) = b − a (half-open is a matter of convention).

• Given disjoint subsets {A_i : i ≥ 1}, µ(⊔_i A_i) = Σ_i µ(A_i).

• If we define A_x = (A + x mod 1), then µ(A_x) = µ(A) (translation invariance).

Vitali's construction tells us that we can't do this with all subsets of [0, 1]: there is no function µ : 2^{[0,1]} → [0, 1] that satisfies all three conditions above. So we have to compromise and try to define µ on just a sub-collection of the subsets of [0, 1]. Last time, we defined a measure space (Ω, F, µ), which consists of a nonempty state space, a σ-field, and a measure, respectively. We also defined the Borel σ-field to be the smallest σ-field containing the half-open intervals (which are the sets where we really want the measure to be defined in a particular way):

B = σ(I), I = {(a, b ] ∩ R : a ≤ b}.

With this, let's try to construct the Lebesgue measure on [0, 1]. Our hope is that because we already know how to assign a measure to specific sets, we can assign measures to intervals and then keep assigning measures to more subsets iteratively off of that. Unfortunately, we can't write this down rigorously, because there's no closed form for an arbitrary element of B_R: it's not true that every set in the Borel σ-algebra can be written as a countable union and/or intersection of our intervals! So instead, we do define µ((a, b]) = b − a, and we do define µ(⊔_i (a_i, b_i]) = Σ_i (b_i − a_i) for unions of disjoint intervals (a_i, b_i], but we're not quite done. We now need to know whether there is a consistent extension of our function µ to all sets in B_R. Since this idea will come up again later in the course, we'll make a slight generalization: say we define µ((0, x]) to be some function F(x), where we're currently working with the case F(x) = x.

Remark 10. Remember that measures in general do not need to be translation-invariant: the conditions that we’ve been using are to ensure that we end up with a uniform measure.

In order for us to extend our function µ to be defined on all subsets in BR, this function F (x) needs to satisfy a few properties:

• µ((0, x_1]) ≥ µ((0, x_2]) if x_1 > x_2, which means that F(x_1) ≥ F(x_2) (F is monotone).

• If x_n ↓ x, then the intervals (0, x_n] ↓ (0, x]. From our work earlier, this implies that µ((0, x_n]) ↓ µ((0, x]), so F(x_n) ↓ F(x) (F is right-continuous). Notably, we can't quite make this argument in the other direction to conclude that F is left-continuous (this will be explained later).

Definition 11 A Stieltjes measure function on R is a function F : R → R which is non-decreasing and right-continuous.

Theorem 12

For any Stieltjes measure function F, there exists a unique measure µ = µ_F on (R, B_R) satisfying µ((a, b]) = F(b) − F(a) for all a ≤ b in R.

Here’s the proof idea:

1. There exists a unique extension from µ : I → [0, ∞] to µ : A → [0, ∞], where

A = {A ⊆ R : A is a finite disjoint union of sets in I}.

This sort of corresponds to the "finite union" step: once we've shown that we can do this step in a well-defined way, we'll be able to define our measure on any finite disjoint union of intervals.

2. Show that we can extend our measure from being defined on A to being defined on BR.

Remark 13. Why is using A a reasonable intermediate step? Notice that A is closed under complements and finite unions (exercise), while the Borel σ-algebra is closed under complements and countable unions. So we’re not at the σ-algebra yet, but we’re kind of close.

Proof of step 1. The construction itself is pretty intuitive. Given any subset A ∈ A, we can write it as A = ⊔_{j=1}^n E_j for some E_j in I. Then just define

µ(A) = Σ_{j=1}^n µ(E_j).

Note that there are a lot of ways to write an element of A as a finite disjoint union of intervals. So one thing we need to check is that different representations are consistent with each other:

A = ⊔_{j=1}^n E_j = ⊔_{k=1}^m F_k  =⇒  Σ_{j=1}^n µ(E_j) = Σ_{k=1}^m µ(F_k).

Indeed, we take the common refinement: because all the E_j and F_k are half-open intervals,

Σ_{j=1}^n µ(E_j) = Σ_{j=1}^n µ(E_j ∩ A) = Σ_{j=1}^n µ(E_j ∩ (⊔_{k=1}^m F_k)) = Σ_{j=1}^n Σ_{k=1}^m µ(E_j ∩ F_k),

and starting from Σ_k µ(F_k) also yields this result, so µ(A) is consistently defined.
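To make the consistency check concrete, here is a sketch (in the uniform case F(x) = x, which is an assumption of this example) that evaluates µ on two different decompositions of the same set:

```python
def F(x):
    return x  # Stieltjes function for the uniform measure (an assumed choice)

def mu(intervals):
    """Measure of a finite disjoint union of half-open intervals (a, b]."""
    return sum(F(b) - F(a) for a, b in intervals)

# Two decompositions of the same set (0, 1]:
decomp1 = [(0.0, 1.0)]
decomp2 = [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)]   # a common refinement
assert abs(mu(decomp1) - mu(decomp2)) < 1e-12
```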

In preparation for next lecture, let's record a few properties of the function µ we've just defined on A:

Proposition 14
The function µ we've defined has the following properties:

1. µ is monotone and finitely additive over A: for disjoint {A_i : 1 ≤ i ≤ n} ⊆ A, µ(⊔_{i=1}^n A_i) = Σ_{i=1}^n µ(A_i).

2. µ is finitely sub-additive over A: for any {A_i : 1 ≤ i ≤ n} ⊆ A, µ(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n µ(A_i).

3. µ is countably sub-additive over I: if A, A_i ∈ I and A ⊆ ⋃_i A_i, then µ(A) ≤ Σ_i µ(A_i).

4. µ is countably additive on A.

This last point is really where we want to be, because it’s a lot closer to the condition we are trying to get on BR.

Proof. It suffices to show (1) and (2) for n = 2; the general case follows by induction.

For (1), note that we can write A_1 as E_1 ⊔ ··· ⊔ E_x and A_2 as F_1 ⊔ ··· ⊔ F_y. Because these are disjoint, µ(E_1 ⊔ ··· ⊔ E_x ⊔ F_1 ⊔ ··· ⊔ F_y) is the measure of A_1 ⊔ A_2 by definition; µ of the first x terms gives the measure of A_1, and µ of the last y terms gives the measure of A_2, which is what we want. (Monotonicity follows because all measures are nonnegative.)

For (2), notice that (A_2 \ A_1) ⊆ A_2 (where A_2 \ A_1 denotes all of A_2 not contained in A_1), and A_2 \ A_1 is disjoint from A_1. So now by additivity and monotonicity from (1),

µ(A_1 ∪ A_2) = µ(A_1 ⊔ (A_2 \ A_1)) = µ(A_1) + µ(A_2 \ A_1) ≤ µ(A_1) + µ(A_2),

as desired.

For (3), denote A = (a, b] and A_i = (a_i, b_i] (all sets here are intervals because we're working in I), and we know that (a, b] ⊆ ⋃_{i=1}^∞ (a_i, b_i]. Recall that our measure is defined on intervals uniquely: µ(A) = F(b) − F(a), and µ(A_i) = F(b_i) − F(a_i). Since F is right-continuous, there exist x > a and c_i > b_i such that

F(x) − F(a) ≤ ε,  F(c_i) − F(b_i) ≤ ε/2^i.

(Intuitively, this now lets us work with A as a closed interval and the A_i as open intervals.) Now consider the interval

[x, b] ⊆ ⋃_{i=1}^∞ (a_i, c_i).

Since [x, b] is compact, and the right hand side is an open covering, there exists a finite subcover by the Heine–Borel theorem:

[x, b] ⊆ ⋃_{i=1}^n (a_i, c_i)

(with some abuse of notation). Now by points (1) and (2), because µ is finitely subadditive and monotone,

µ((a, b]) ≤ µ((a, x]) + Σ_{i=1}^n µ((a_i, c_i])
         = F(x) − F(a) + Σ_{i=1}^n (F(c_i) − F(a_i))
         ≤ 2ε + Σ_{i=1}^∞ (F(b_i) − F(a_i))

(one ε comes from F(x) − F(a), and the other comes from the sum of the geometric series Σ_i ε/2^i). Letting ε → 0, we have that

µ((a, b]) ≤ Σ_{i=1}^∞ µ((a_i, b_i]),

as desired.

To prove (4), we are trying to show that given A, A_i ∈ A with A = ⊔_{i=1}^∞ A_i, we have µ(A) = Σ_{i=1}^∞ µ(A_i). First of all, we can write A as

A = (⊔_{i=1}^n A_i) ⊔ (⊔_{i=n+1}^∞ A_i) = (⊔_{i=1}^n A_i) ⊔ C_{n+1},

which is a finite disjoint union, so we can use (1) to write

µ(A) = Σ_{i=1}^n µ(A_i) + µ(C_{n+1}) ≥ Σ_{i=1}^n µ(A_i).

Letting n go to ∞,

µ(A) ≥ Σ_{i=1}^∞ µ(A_i),

which gives us one direction of what we want. Now we need to show the upper bound: without loss of generality, we can assume that A_i ∈ I, because all the A_i are finite disjoint unions of sets in I. By definition, we can write

A = ⊔_{j=1}^n E_j,  E_j ∈ I,

and we've defined

µ(A) = Σ_{j=1}^n µ(E_j) = Σ_{j=1}^n µ(E_j ∩ (⋃_i A_i)) = Σ_{j=1}^n µ(⋃_i (E_j ∩ A_i))

(because A is entirely contained in ⋃_i A_i anyway). By (3), this means

µ(A) ≤ Σ_{j=1}^n Σ_{i=1}^∞ µ(E_j ∩ A_i) = Σ_{i=1}^∞ Σ_{j=1}^n µ(E_j ∩ A_i)

by swapping the order of summation (this is a nonnegative series, so we're allowed to do this), and this is Σ_{i=1}^∞ µ(A ∩ A_i) = Σ_{i=1}^∞ µ(A_i) by finite additivity.

So now we want to extend µ from A to the actual σ-field: let's start moving in that direction. For now, let's assume the measure we want to define is a finite measure.

Theorem 15 (Carathéodory Extension Theorem)
Let A be an algebra over Ω (closed under complements and finite unions), and let µ : A → [0, ∞) be a countably additive finite measure. Then there exists a unique measure µ : σ(A) → [0, ∞) on the generated σ-algebra which extends µ.

We’ll show this next time!

3 September 11, 2019

Recall that the central object we've been trying to study is B_R, the Borel σ-field, which is the smallest σ-algebra containing the intervals I = {(a, b] ∩ R : a ≤ b}. We're working towards a proof that given any Stieltjes measure function (monotone, right-continuous) F, there exists a unique µ_F on (R, B_R) such that µ((a, b]) = F(b) − F(a). Last lecture, we extended the measure from I to the algebra A, consisting of finite disjoint unions of intervals in I, and today, we'll extend this to B_R = σ(A).

First, a quick clarification from last lecture: why can't we prove that F is left-continuous as well if we're defining

F(x) = µ((0, x])? In general, when x_n ↑ x, we have (0, x_n] ↑ (0, x). But this is not exactly what we want, because F(x) uses (0, x] instead:

µ((0, x_n]) ↑ µ((0, x)) ≠ µ((0, x]) = F(x).

The example to keep in mind for this is a step function, coming from a point mass! Then (0, x) and (0, x ] differ by the “mass” at point x. To finish the construction of a uniform measure, we’ll prove Theorem 15 today. We’ll go through the proof in a slightly nonlinear way, so let’s first sketch out the ideas here: we want to prove (1) that an extension exists and (2) that it is unique. We’ll do the latter first, and we’ll do so by introducing the π-λ Theorem.
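The step-function example can be made concrete (a sketch, not part of the notes; putting the unit point mass at x = 1/2 is an arbitrary choice):

```python
def F(x):
    # Right-continuous step function: a unit point mass at x = 1/2.
    return 1.0 if x >= 0.5 else 0.0

def mu(a, b):
    """mu((a, b]) = F(b) - F(a)."""
    return F(b) - F(a)

# As x_n increases to 1/2 from below, mu((0, x_n]) stays 0 ...
xs = [0.5 - 2.0 ** (-k) for k in range(2, 12)]
assert all(mu(0.0, x) == 0.0 for x in xs)
# ... but mu((0, 1/2]) = 1: the limit recovers mu((0, 1/2)), not F(1/2).
assert mu(0.0, 0.5) == 1.0
```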

Definition 16 A π-system P on a set Ω is a nonempty collection of subsets of Ω such that for any A, B ∈ P, A ∩ B ∈ P.

For example, I, the collection of (half-open) intervals, is a π-system.

9 Definition 17 A λ-system L on a set Ω satisfies the following conditions:

• Ω ∈ L.

• For any A, B ∈ L where B ⊆ A, A \ B ∈ L as well.

• If we have a sequence An ∈ L, and An ↑ A, then A ∈ L.

All σ-algebras are both π-systems (we have intersections by definition) and λ-systems: in a σ-algebra, we have the whole set Ω = A ∪ A^c for any A; for B ⊆ A we have A \ B = A ∩ B^c; and ⋃_n A_n = A for any A_n ↑ A. But we also have a converse:

Proposition 18 If a λ-system L is also a π-system, then L is a σ-algebra.

Proof. L contains the whole set Ω, and for any A ∈ L we have Ω \ A = A^c ∈ L, so L is closed under complementation. For countable unions, given A_n ∈ L, the finite unions B_n = A_1 ∪ ··· ∪ A_n = (A_1^c ∩ ··· ∩ A_n^c)^c are in L (using complementation and the π-system property), and B_n ↑ ⋃_n A_n, so the countable union is in L as well.

In other words, the definition of a σ-algebra can be split into the intersections part (π-system) and the rest (λ-system), and this turns out to be very helpful:

Theorem 19 (Dynkin’s π-λ Theorem) If P is a π-system, L is a λ-system, and P ⊆ L, then σ(P) ⊆ L.

We’re going to see applications of this in a lot of different scenarios!

Proof. Without loss of generality, let L be the smallest λ-system containing P. This is well-defined, because there is at least one λ-system – the powerset of Ω – and intersections L1 ∩ L2 of λ-systems are λ-systems as well:

• Ω is in both L1 and L2, so it’s in the intersection.

• If A and B are in the intersection with B ⊆ A, then they're both in L_1 and L_2, so A \ B is in both λ-systems, meaning it's in the intersection.

• If A_n ↑ A and all the A_n are in the intersection, then they're in both λ-systems, so A is in both λ-systems and therefore in the intersection as well.

So L is just the intersection of all λ-systems containing P. It suffices to show that L is also a π-system: this means L is a σ-field containing P, so it must contain σ(P), the smallest σ-field containing P. To do this, we just need to show that given any A, B ∈ L, A ∩ B ∈ L. Fix A ∈ P, and define

LA = {B ∈ L : A ∩ B ∈ L} ⊆ L.

LA is a λ-system:

• A ∩ Ω = A ∈ L, so Ω ∈ LA.

• If B_2 ⊆ B_1 with B_1, B_2 ∈ L_A (so A ∩ B_1, A ∩ B_2 ∈ L), then (A ∩ B_1) \ (A ∩ B_2) = A ∩ (B_1 \ B_2) is also in L. This means that B_1 \ B_2 is also in L_A.

• If we have an increasing sequence B_n ↑ B with B_n ∈ L_A (that is, A ∩ B_n ∈ L for all n), then A ∩ B_n ↑ A ∩ B, so A ∩ B ∈ L since L is a λ-system. But this means B ∈ L_A, as desired.

In addition, L_A contains P: since A ∈ P, for any B ∈ P we have A ∩ B ∈ P ⊆ L. But L is the smallest λ-system containing P, so L_A = L! Therefore, for all A ∈ P and B ∈ L, A ∩ B ∈ L. Now fix B ∈ L, and define

LB = {A ∈ L : A ∩ B ∈ L}.

We just showed that P ⊆ L_B, and we can show by the exact same argument that L_B is a λ-system, so L_B = L. This means that L is indeed a π-system, and we're done.

The important message here is that measure-theoretic proofs are difficult because we don’t have a closed form for all measurable subsets. Often, the right way is to show that every measurable subset has some property. The power of this theorem is in bootstrapping results we find from P, a π-system, to the σ-algebra of P!

Proof of uniqueness of measure. Suppose there were two extensions µ_1 and µ_2 of our measure µ from A to σ(A). Then µ(A) = µ_1(A) = µ_2(A) for all A ∈ A by definition. Define

L = {B ∈ σ(A): µ1(B) = µ2(B)}.

(We're trying to show that L is the σ-algebra σ(A) itself.) We've established that A ⊆ L, and we know that A is a π-system (the intersection of two finite disjoint unions of intervals is again a finite disjoint union of intervals). Now L is a λ-system:

• [0, 1] is in L, because [0, 1] is a finite interval and is therefore in A. (Small technicality here: we really have the interval (0, 1], so the easiest way to deal with this is to just define the measure on the real line instead. Also, for the uniform measure, any individual point is a measure zero set.)

• If µ1(A) = µ2(A), µ1(B) = µ2(B), and B ⊆ A, then µ1(A \ B) = µ2(A \ B) by additivity, so A \ B is also in L.

• If µ_1(A_n) = µ_2(A_n) for all n in an increasing sequence A_n ↑ A, we can define B_1 = A_1, B_2 = A_2 \ A_1, and so on. From the previous point, µ_1(B_i) = µ_2(B_i) for all i, so by countable additivity we have µ_1(⊔ B_i) = µ_2(⊔ B_i). Since µ_1(⊔ B_i) = µ_1(⋃ A_i) = µ_1(A) (and the same holds for µ_2), µ_1(A) = µ_2(A) as desired.

So by the π-λ Theorem, σ(A) = L, and thus µ1 = µ2.

This is powerful – we only had to verify that µ1(A) = µ2(A) for a small subset of all measurable subsets, but L helped us bootstrap that to something more general!

Proof of existence. We start with a definition:

Definition 20
An outer measure is a map ν : 2^Ω → [0, ∞) which is monotone and countably subadditive.

This is basically a “crude” measure: it has certain desirable characteristics of a measure but not all of them.

Definition 21
Given a measure µ on an algebra A, we define

µ*(E) = inf { Σ_{i=1}^∞ µ(A_i) : A_i ∈ A, E ⊆ ⋃_i A_i }

for all E ⊆ Ω.

This is a covering argument: we try to cover E with minimum measure. We can verify that µ* is indeed an outer measure:

• It's monotone, because any covering of B ⊇ A also covers A, so µ*(B) ≥ µ*(A).

• It's countably subadditive, because the union of coverings of the A_i will also cover ⋃_i A_i, and we're still covering with a countable number of sets in A.
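As an illustration of how far countable subadditivity reaches: covering the i-th rational in [0, 1] by an interval of length ε/2^i shows µ*(Q ∩ [0, 1]) ≤ ε for every ε > 0, hence µ*(Q ∩ [0, 1]) = 0. A sketch of the bookkeeping (not part of the notes; enumerating by denominator is an arbitrary choice):

```python
from fractions import Fraction

def rationals_in_unit_interval(q_max):
    """Enumerate the distinct rationals p/q in [0, 1] with denominator q <= q_max."""
    seen = set()
    for q in range(1, q_max + 1):
        for p in range(0, q + 1):
            seen.add(Fraction(p, q))
    return sorted(seen)

eps = Fraction(1, 100)
qs = rationals_in_unit_interval(20)
# Cover the i-th rational (i = 1, 2, ...) by an interval of length eps / 2^i.
total_length = sum(eps / 2**i for i in range(1, len(qs) + 1))
assert total_length < eps   # so mu*(Q cap [0,1]) <= eps, for every eps > 0
```

The covering lengths form a geometric series summing to less than ε no matter how many rationals we enumerate.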

Lemma 22 µ∗ also extends µ: that is, µ∗(A) = µ(A) for all A ∈ A.

Proof. We know that µ*(A) ≤ µ(A), since A covers itself. Suppose that µ*(A) < µ(A) for the sake of contradiction: then there is a covering of A by a countable union of half-open intervals with total measure less than µ(A), and each covering interval may be assumed to be contained within one of the intervals of A (otherwise, we could split the interval up or cut the ends off, resulting in a covering of no larger measure). Say A is made up of N intervals: by the pigeonhole principle, there is some interval I ∈ A covered by intervals I_n ∈ I such that Σ_n µ*(I_n) < µ(I): pick the one with the largest deviation, and we can write

Σ_n µ*(I_n) = µ(I) − ε

for some ε ≥ (µ(A) − µ*(A))/N > 0. But this can't happen, because µ*(I_n) = µ(I_n), and we showed countable subadditivity of µ over intervals in the last lecture:

µ(I) ≤ Σ_n µ(I_n),

which is a contradiction. Thus µ*(A) = µ(A), as desired.

So this outer measure tries to assign a value to every subset of our space Ω. We saw that doing this didn't quite work when we tried it naively: to refine this argument, we need to find a suitable subcollection A* ⊆ 2^Ω such that A* is a σ-algebra and µ*|_{A*} is a measure. If we can find such a set, then A ⊆ A* implies that σ(A) ⊆ σ(A*) = A*, and this gives us the extension that we want!

Definition 23
A subset E ⊆ Ω is measurable if

µ*(F) = µ*(F ∩ E) + µ*(F ∩ E^c)

for all F ⊆ Ω.

This is a collection of sets which satisfy a "desirable property," and it turns out the subcollection we want is

A* = {E : E is measurable with respect to µ*}.

First of all, we need to make sure that we actually have a large enough subcollection:

Proposition 24 A ⊆ A∗.

Proof. We want to show

µ*(F) = µ*(F ∩ A) + µ*(F ∩ A^c)

for any A ∈ A. First of all, we have

µ*(F) ≤ µ*(F ∩ A) + µ*(F ∩ A^c),

because any covering of the right side definitely covers the left side. Now, suppose the left side is optimally covered by intervals I_1, I_2, ···. We wish to show that we can cover the right side with at most 4ε more measure. Since A is a finite collection of intervals, each interval I_k can only have a finite number of changes between the parts contained in A and in A^c, so there are countably many changes between A and A^c overall. Break up the intervals at those changes.

Thus, we cover the right hand side as follows: F ∩ A is a subset of (⋃ I_k) ∩ A (since the I_k cover F), which is a countable union of intervals (possibly closed or open). Similarly, F ∩ A^c is a subset of (⋃ I_k) ∩ A^c, which is another countable union of intervals. Extend both sides of the i-th interval in (⋃ I_k) ∩ A by ε/2^i, and extend both sides of the j-th interval in (⋃ I_k) ∩ A^c by ε/2^j. We've then used at most 4ε more measure than in our covering of F, but we've covered both F ∩ A and F ∩ A^c, so

µ*(F) + 4ε ≥ µ*(F ∩ A) + µ*(F ∩ A^c),

and we're done by taking ε → 0, showing the equality.

To finish our construction, then, we just need to show that A* is a σ-algebra and that µ* is countably additive on it. It's easy to show A* is closed under complementation, because E and E^c play symmetric roles in Definition 23.

So now, let's show that A* is closed under countable unions. First, we prove that it's closed under finite unions: by induction, it suffices to show that if E_1, E_2 ∈ A*, then E_1 ∪ E_2 ∈ A*. By definition, this is equivalent to plugging E_1 ∪ E_2 into Definition 23, but it suffices to just show the inequality

µ*(F) ≥ µ*(F ∩ (E_1 ∪ E_2)) + µ*(F ∩ (E_1 ∪ E_2)^c),

because the reverse inequality is true by subadditivity (given to us by the definition of µ*). By subadditivity,

µ*(F ∩ (E_1 ∪ E_2)) ≤ µ*(F ∩ E_1) + µ*((F \ E_1) ∩ E_2)

(this is just set algebra), and we also have

µ*((F \ E_1) ∩ E_2) = µ*(F ∩ E_1^c) − µ*(F \ (E_1 ∪ E_2))

because we know that E_2 ∈ A* is measurable, and we can plug F ∩ E_1^c into the definition of measurability. Now adding the two equations,

µ*(F ∩ (E_1 ∪ E_2)) + µ*(F \ (E_1 ∪ E_2)) ≤ µ*(F ∩ E_1) + µ*(F ∩ E_1^c) = µ*(F),

where the last step comes from E_1 ∈ A* being measurable. This is what we set out to prove, so A* is indeed closed under finite unions.

To prove that A* is also closed under countable unions, we need to show that given E_i ∈ A*, we also have ⋃_{i=1}^∞ E_i ∈ A*. We claim that for disjoint subsets E_1, E_2 ∈ A*,

µ*(F ∩ (E_1 ∪ E_2)) = µ*(F ∩ E_1) + µ*(F ∩ E_2).

This is actually not too bad: because E_1 is measurable, we know that (using F ∩ (E_1 ∪ E_2) in the definition instead of F)

µ*(F ∩ (E_1 ∪ E_2)) = µ*(F ∩ (E_1 ∪ E_2) ∩ E_1) + µ*(F ∩ (E_1 ∪ E_2) \ E_1) = µ*(F ∩ E_1) + µ*(F ∩ E_2).

To show closure under countable unions, we can first assume without loss of generality that the E_i are disjoint (because we already have closure under complementation and finite unions, so we can take differences of sets by defining M_1 = E_1, M_2 = E_2 \ E_1 = E_2 ∩ E_1^c, and so on). Define

A_n = ⋃_{i=1}^n E_i,  A = ⋃_{i=1}^∞ E_i.

Since we've proved closure for finite unions, we know that all A_n ∈ A*. Thus,

µ*(F) = µ*(F ∩ A_n) + µ*(F \ A_n)

for any n (by definition of measurability), so this is

n X ∗ ∗ ≥ µ (F ∩ Ei ) + µ (F \ A) i=1

(the first term by the claim we just established, and the second term by the monotonicity of µ∗). So now take n → ∞: we find that ∞ ∗ X ∗ ∗ µ (F ) ≥ µ (F ∩ Ei ) + µ (F \ A), i=1 and by countable subadditivity, this is ≥ µ∗(F ∩ A) + µ∗(F \ A), which is what we want! And now we’ve shown that A∗ is a σ-algebra. To prove that µ∗ is indeed a measure on A∗, we need to show countable additivity: and we can do this by showing inequalities on both sides. We know that

∞ ! ∞ ∗ G X ∗ µ Ai ≤ µ (Ai ) i=1 i=1 by countable subadditivity of µ∗, and also

∞ ! n ! n ∗ G ∗ G X ∗ µ Ai ≥ µ Ai = µ (Ai ), i=1 i=1 i=1 and taking n → ∞ gives the other direction of the inequality. This gives us countable additivity of µ∗, and we’re done.

Since we’ve already proved this is unique, we’ve finally extended our measure to σ(I), as desired! So we do have a uniform distribution on [0, 1].

4 September 16, 2019

The first homework assignment will be posted later today (after lecture); it will be due in class next Wednesday. There will be about 4–5 problem sets in this class: this first one will have us work a little more with properties of measures. Let’s start by restating what’s been covered over the last two lectures:

Theorem 25 (Lebesgue-Stieltjes)

Given a nondecreasing and right-continuous function F : R → R, there exists a unique measure µ = µF on (R, BR) such that µ((a, b ]) = F (b) − F (a).

This proof has two parts: first, we extend µ from I, the set of half-open intervals of the form (a, b], to A = {finite disjoint unions of elements of I}. Then, we extend from A to σ(A) (the Borel σ-algebra) with the Carathéodory extension theorem: uniqueness follows from the π-λ theorem, and existence comes from the construction of the outer measure. This is a class about probability, so let’s start to introduce random variables and their properties! We’ll see soon that these variables are specific examples of measurable functions, and that properties like the mean are specific examples of Lebesgue integrals.

Definition 26 A probability space is a measure space (Ω, F, µ) such that µ(Ω) = 1. We will often denote µ as P.

One way to interpret this is that Ω is a space of outcomes, F is the collection of possible events (some sets of outcomes are not “well-behaved” enough to be assigned a probability), and P is the measure itself.

Example 27 We can encode a sequence of n independent fair coin tosses via

Ω = {heads, tails}ⁿ = {0, 1}ⁿ, F = P(Ω) (the powerset), P(A) = |A| / |Ω| ∀A ⊆ Ω.

Notice that it’s okay to take the powerset of Ω because it is a finite set. By definition, here the probability measure is uniform: for any individual event (that is, |A| = 1), we have P(A) = 1/2ⁿ. We’ll now be a bit more general and abstract: let (Ω, F, µ) be any measure space, in particular allowing µ(Ω) = ∞.
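The finite coin space in Example 27 is small enough to enumerate by brute force. Here is a quick Python sketch (our own illustration; the helper name `uniform_P` is not from the course):

```python
from itertools import product
from fractions import Fraction

# Example 27: Omega = {0,1}^n with the uniform measure P(A) = |A| / |Omega|
# defined on the full powerset.
def uniform_P(A, n):
    """Probability of an event A (any set of outcomes) for n fair tosses."""
    return Fraction(len(A), 2 ** n)

n = 3
Omega = list(product([0, 1], repeat=n))      # all 2^n outcomes

A = {w for w in Omega if w[0] == 1}          # event: first toss is heads (= 1)
assert uniform_P(A, n) == Fraction(1, 2)

# every singleton event has probability 1/2^n, and the total mass is 1
assert all(uniform_P({w}, n) == Fraction(1, 2 ** n) for w in Omega)
assert sum(uniform_P({w}, n) for w in Omega) == 1
```

Using `Fraction` keeps all the probabilities exact rather than floating-point.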

Definition 28 An indicator function for a set A ∈ F is defined by 1A(ω) = 1 if ω ∈ A, and 1A(ω) = 0 if ω ∈ Ω \ A.

This may also be denoted 1{ω ∈ A}.

Definition 29 A simple function f :Ω → R can be written in the form

f = ∑_{i=1}^n ci 1_{Ai},

where ci ∈ R and Ai ∈ F.

Definition 30 A measurable function f is one that can be pointwise approximated by simple functions: that is, there exists a

sequence of simple functions {fn} such that

f(ω) = lim_{n→∞} fn(ω) ∀ω ∈ Ω.
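To make this pointwise approximation concrete, here is a small Python sketch (our own illustration, not from the notes) of one standard choice of approximating sequence: dyadic staircases ⌊2ⁿf⌋/2ⁿ capped at 2ⁿ, which increase pointwise to a nonnegative f:

```python
import math

# One standard sequence of simple functions converging pointwise to a
# nonnegative f: chop the *target* axis into dyadic steps of height 1/2^n
# and cap the value at 2^n.  Each f_n takes finitely many values, so it is
# simple, and f_n increases pointwise up to f.
def simple_approx(f, n):
    return lambda x: min(math.floor(f(x) * 2 ** n) / 2 ** n, 2 ** n)

f = lambda x: x * x          # an example measurable function
for x in [0.0, 0.3, 1.7, 5.0]:
    errs = [abs(simple_approx(f, n)(x) - f(x)) for n in range(1, 20)]
    assert errs == sorted(errs, reverse=True)   # error shrinks monotonically
    assert errs[-1] < 1e-4                      # ... toward 0 at each point
```

The same staircase construction reappears whenever one extends a statement from simple functions to all nonnegative measurable functions.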

Definition 31 Let (Ω, F) and (S, G) be two measurable spaces. A measurable mapping is a function f : Ω → S such that for every G ∈ G, the preimage f⁻¹(G) belongs to F.

The intuition to have in mind is that if we take any “well-behaved” set in S, its preimage is “well-behaved” in Ω.

Proposition 32 A measurable function f on (Ω, F) (as defined in Definition 30) is equivalent to a measurable mapping (Ω, F) →

(R, BR) as defined in Definition 31.

Proof. Exercise, left for our homework!

The definition of a measurable function is more concrete, while the mapping is a bit more abstract. The latter is useful because a measure µ on the domain space Ω naturally induces a measure on the target space S!

Proposition 33 If µ is a measure on (Ω, F), and we have a measurable mapping f : (Ω, F) → (S, G), then we have a measure ν on (S, G) called the pushforward measure: ν(G) = µ(f⁻¹(G)) ∀G ∈ G.

This is typically written as ν = f∗µ or f#µ.

Proposition 34 If f , g are measurable, then af + bg is measurable for all a, b ∈ R, and so is f · g.

Proof sketch. Write f and g as pointwise limits of simple functions and do some bookkeeping.

Definition 35 A random variable is a measurable function defined on a probability space (Ω, F, P). (This is usually denoted X or Y rather than f.) More generally, an (S, G)-valued random variable is a measurable mapping (Ω, F) → (S, G).

Usually, random variables are real-valued, but we may occasionally specify a different target space.

Fact 36 (Helpful notation) Since X is a function from Ω → R, we often consider the pre-image X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B}. This will be denoted {X ∈ B}: for example,

{X ≥ 1/2} = {ω ∈ Ω : X(ω) ≥ 1/2}.

Why is this a good notation? X is a mapping from some space (Ω, F, P) to the real line (R, B), so if we take the pushforward measure X∗P, we get the distribution or law of our random variable X, often denoted LX . Let’s do the symbol pushing to make sure we understand what this means! This is a measure on our target space (R, B): it takes in a measurable subset B ∈ B and outputs

LX(B) = (X∗P)(B) = P(X⁻¹(B)) = P({ω ∈ Ω : X(ω) ∈ B}).

So measure theory gives a meaning to P(X ∈ B), which is a notation that everyone likes to use.

Example 37 Looking back at the coin-tossing example, where we have F = P(Ω) and P a uniform measure on Ω, we can consider the following events:

X(ω) = ω1 = 1{first toss is heads},  Y(ω) = ∑_{i=1}^n ωi = number of heads.

We can visualize Ω (in the case n = 3) as the vertices of a cube: under the map Y, each vertex (x, y, z) of the cube maps to x + y + z ∈ R. (This is an example of why we don’t always just want the identity mapping for our random variables: we’re condensing our information.) The law LY is a measure on (R, B), supported on {0, 1, ···, n}, so we just need to find the weight assigned to each integer. Let’s work this out in formal notation:

LY({k}) = P(Y⁻¹({k})) = P({ω ∈ {0, 1}ⁿ : Y(ω) = k}) = (n choose k) / 2ⁿ.

This is the Bin(n, 1/2) distribution. Now that we have a more concrete picture of our random variables, let’s move on to Lebesgue integration. Assume that (Ω, F, µ) is a σ-finite measure space: that means that there exists a sequence Ωn ↑ Ω such that µ(Ωn) < ∞ for all n. Suppose that we have a measurable function f : (Ω, F) → (R, B): our goal is to define the quantity

∫_Ω f dµ = ∫_Ω f(ω) dµ(ω),

such that in the case where µ = P is a probability measure, this yields the mean of f. Since we build up measurable functions from simple functions, it makes sense to do the same for integrals.

1. For any simple function f = ∑_{i=1}^n ci 1_{Ai}, define

∫ f dµ = ∑_{i=1}^n ci µ(Ai).

It’s bookkeeping to check that this integral doesn’t depend on the representation of f as a simple function. (If we can write f using two representations ∑_{i=1}^n ci 1_{Ai} and ∑_{j=1}^m c′j 1_{Bj}, then we can split both integrals up into sums over the intersections Ai ∩ Bj.)

2. Now, let’s define the integral for bounded measurable functions: for a measurable function f : (Ω, F) → (R, B) with sup_{ω∈Ω} |f(ω)| < ∞ and µ({f ≠ 0}) < ∞ (meaning the support has finite measure), define

∫ f dµ = sup { ∫ ϕ dµ : ϕ ≤ f, ϕ a simple function } = S.

It seems arbitrary to approach f from below: in particular, we can also define

I = inf { ∫ ϕ dµ : ϕ ≥ f, ϕ a simple function }.

Claim 38. I = S: these give the same integral.

Proof of claim. It’s clear that S ≤ I, because I is the approximation of f from above and S is the one from below. To show that I ≤ S as well, we can sandwich our function between simple functions:

f_m^lower ≤ f ≤ f_m^upper, where f_m^upper = ⌈mf⌉/m, f_m^lower = (⌈mf⌉ − 1)/m · 1{f ≠ 0}.

Basically, this sifts our function f in increments of 1/m. Now if M is the supremum of |f|, then f_m^upper is at most M + 1, and f_m^lower is at least −M − 1, so these are both bounded as well. (Also, both are 0 when f = 0.) Now we can write I − S in terms of integrals of simple functions:

I − S ≤ ∫ f_m^upper dµ − ∫ f_m^lower dµ,

and these two functions only differ by at most 1/m on {f ≠ 0}, so the right side is at most (1/m) · µ({f ≠ 0}), which goes to 0 as m → ∞. So I − S ≤ 0, meaning I ≤ S.
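Here is a quick numerical illustration of this sandwich (our own sketch; the “integral” below is a fine-grid approximation on [0, 1], not a true Lebesgue integral):

```python
import math

# f_m^upper = ceil(m f)/m and f_m^lower = (ceil(m f) - 1)/m on {f != 0}
# differ by exactly 1/m wherever f != 0, so their integrals differ by at
# most mu({f != 0})/m -- here mu({f != 0}) = 1.
f = lambda x: math.sin(math.pi * x)

def upper(x, m): return math.ceil(m * f(x)) / m
def lower(x, m): return (math.ceil(m * f(x)) - 1) / m if f(x) != 0 else 0.0

N = 10_000
xs = [(i + 0.5) / N for i in range(N)]        # grid approximating [0, 1]
for m in [10, 100, 1000]:
    # sandwich holds pointwise (up to float rounding)
    assert all(lower(x, m) <= f(x) <= upper(x, m) + 1e-12 for x in xs)
    gap = sum(upper(x, m) - lower(x, m) for x in xs) / N
    assert gap <= 1 / m + 1e-9                # integral gap is at most 1/m
```

As m grows, the gap between the upper and lower approximations shrinks like 1/m, mirroring the proof above.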

3. Now we can define the integral on nonnegative (but not necessarily bounded) functions f :

∫ f dµ = sup { ∫ h dµ : 0 ≤ h ≤ f, h bounded with µ({h ≠ 0}) < ∞ }.

In other words, we define the integral by approximating with bounded functions from below.

Claim 39. It’s equivalent to define

∫ f dµ = lim_{m→∞} ∫_{Ωm} min{f, m} dµ,

where Ωm ↑ Ω and µ(Ωm) < ∞.

In other words, we can approximate with a specific family of functions (by setting successively higher “cutoff points” of m).

Proof of claim. Denote

Im = ∫_{Ωm} min{f, m} dµ.

We claim that Im ↑ I (where I denotes ∫ f dµ as defined above). Im is increasing with respect to m, because we’re integrating a larger nonnegative function over a larger set. Also, I ≥ lim_{m→∞} Im, because each min{f, m}·1_{Ωm} is an example of a function h in the definition above: we just want to show that we can reach equality.

By definition of the integral, for any ε > 0, we can find a bounded function h (with 0 ≤ h ≤ f and µ({h ≠ 0}) < ∞) such that ∫_Ω h dµ ≥ I − ε. h is bounded, so pick a constant L such that |h| ≤ L. Then for all m ≥ L,

∫_Ω h dµ = ∫_Ω min{h, m} dµ

(because m ≥ L ≥ h), and we can separate this as

= ∫_{Ωm} min{h, m} dµ + ∫_{Ω\Ωm} min{h, m} dµ.

By definition h ≤ f, so we can replace h with f in the first term. In the second term, h is bounded from above by L, so we can make a crude approximation there:

≤ Im + L · µ((Ω \ Ωm) ∩ {h ≠ 0}).

L is a fixed constant, and as m → ∞, (Ω \ Ωm) ∩ {h ≠ 0} decreases to the empty set. Also, every set in this sequence has finite measure because {h ≠ 0} has finite measure. Recall that if µ(A0) < ∞ and An ↓ ∅, then µ(An) ↓ 0. Thus, for m large enough, the second term is at most ε:

∫_Ω h dµ ≤ Im + ε.

So for any ε > 0 and all sufficiently large m,

I − ε ≤ ∫_Ω h dµ ≤ Im + ε,

and taking m → ∞ and then ε → 0 yields I ≤ lim_{m→∞} Im, so lim_{m→∞} Im = I, as desired.

4. Now, the last part is pretty easy: we want to extend from nonnegative functions to all measurable functions f : (Ω, F) → (R, B). Just break up

f = f+ − f−, f+ = max(f , 0), f− = max(−f , 0).

(We can check that f+ and f− are both measurable if f is measurable.) Then

∫ f dµ = ∫ f+ dµ − ∫ f− dµ,

where both terms on the right side have been defined above. One caveat: if this yields ∞ − ∞, we say that the integral of f is undefined. In contrast, a function f is integrable if ∫ |f| dµ < ∞.

Definition 40 The expectation of a random variable X is

EX = ∫_Ω X(ω) dP(ω) = ∫_Ω X dP.

Why is this integral useful? Any function f : (R, B) → (R, B) that is Riemann-integrable is measurable, and the Riemann integral is the same as the Lebesgue integral with respect to the Lebesgue measure. On the other hand, the function 1_Q is not Riemann integrable (because it is very spiky), but the Lebesgue integral is perfectly well-defined: because the Lebesgue measure of a countable set is 0, we have ∫ 1_Q dµ = 0. We’ll also soon see that Lebesgue integrals have nice convergence properties! On the other hand, we can also interpret discrete sums as Lebesgue integrals:

∑_{i=1}^∞ f(i) = ∫_R f dn,

where n is the counting measure n(A) = |A ∩ N|. So Lebesgue integration can cover both very “smooth” and very discrete cases, and even some in between (which we’ll see on the homework).
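The discrete case is easy to play with directly: against a purely atomic measure, the Lebesgue integral is just a weighted sum. A small sketch (our own toy encoding of a measure as a dict of atoms, not a standard library API):

```python
from fractions import Fraction

# Integral of f against an atomic measure {atom: mass}.
def integrate(f, mu):
    return sum(f(x) * m for x, m in mu.items())

# the counting measure n restricted to {1, ..., 10}: each point has mass 1
counting = {i: 1 for i in range(1, 11)}
f = lambda i: Fraction(1, i * (i + 1))

# ∫ f dn is exactly the partial sum of the series, which telescopes:
assert integrate(f, counting) == Fraction(10, 11)   # sum = 1 - 1/11
```

The same `integrate` helper would work for any finite weighted collection of atoms, which is exactly the “in between” situation hinted at for the homework.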

5 September 18, 2019

Last time, we defined the Lebesgue integral. There are some questions about measurability on our homework (refresh the document, because some typos have been corrected since last class), so we’ll start by restating some definitions.

Remark 41. Professor Sun says that the homework “takes a while,” so we shouldn’t leave it until the last minute. It’s due next Wednesday during class.

A function f is called simple if it can be written as a linear combination f = ∑_{i=1}^n ci 1_{Ai} of indicator functions. (The Ai s can have infinite measure if we want.) Then a measurable function is defined as the pointwise limit of simple functions: it is a homework exercise to show that this is equivalent to having f⁻¹(B) ∈ F for all B ∈ B_R.

So how do we define the Lebesgue integral? Say that (Ω, F, µ) is σ-finite, meaning that there exist Ωm ↑ Ω with all µ(Ωm) < ∞. Then we can define our integral first over simple functions:

f = ∑_{i=1}^n ci 1_{Ai} with all µ(Ai) < ∞ ⟹ ∫_Ω f dµ = ∑_{i=1}^n ci µ(Ai).

With this, we can define the integral of any nonnegative function f :

∫ f dµ = sup { ∫ h dµ : 0 ≤ h ≤ f, h a simple function, µ({h > 0}) < ∞ }.

(We did this in two steps last time by defining the integral for bounded functions first – this definition is indeed equivalent.) Then we define the integral of any real-valued f to be the integral of f+ (the positive part) minus the integral of f− (the negative part): ∫_Ω f dµ = ∫_Ω f(ω) dµ(ω) is then called the Lebesgue integral. Note that this construction is defined independently of the Lebesgue measure (it works for any σ-finite measure).

At the end of last lecture, we started talking about random variables: given a measurable function X on a probability space (Ω, F, P), we can define its law to be the pushforward measure

LX = X#P.

Note that the Lebesgue integral ∫ X dP is the mean EX, which is the reason we started considering it in the first place!

Fact 42 We can think of the Riemann integral as summing the area of a bunch of rectangles by partitioning the domain into small steps. In contrast, we can think of the Lebesgue integral as partitioning the target space into small steps!

If f is sufficiently nice, both of these exist, and we want them to be consistent. This is indeed the case:

Theorem 43 If a function f :[a, b] → R is Riemann-integrable, then f is also Lebesgue-integrable, and the two integrals agree.

But as we mentioned last time, Lebesgue integration is more powerful: we can encode discrete summation via

∑_{i=1}^∞ a(i) = ∫ a dn,

where n is the counting measure on N. We can also define the Lebesgue integral in more abstract spaces (Ω, F, µ), while the Riemann integral requires some sort of Euclidean structure. Today, we’ll talk about a third advantage of the Lebesgue integral: it’s well suited for convergence theorems!

Remark 44. In practice, the Lebesgue integral is usually computed by splitting into some kind of summation. For

example, if we can write f = h + 1_Q, where h is continuous, we can Riemann integrate h and deal with 1_Q by discrete summation. So it’s pretty rare to actually use the definition of the Lebesgue integral directly! In fact, even if the integral is on a more abstract space, we can push forward onto the real line. But we need convergence properties for that. We’ll be asking questions like “if fn → f converges pointwise, does ∫ fn dµ → ∫ f dµ?” Generally, no:

Example 45 If we define

fn = n · 1_{(0, 1/n)},

the integral of each fn is 1, but the sequence fn converges pointwise to 0.
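This “escaping mass” is easy to check by hand, since each fn is simple (a quick sketch of ours, not from the notes):

```python
from fractions import Fraction

# f_n = n * 1_{(0, 1/n)} takes the value n on a set of Lebesgue measure 1/n,
# so its integral is exactly n * (1/n) = 1 for every n.
def f(n, x):
    return n if 0 < x < 1 / n else 0

assert all(Fraction(n) * Fraction(1, n) == 1 for n in range(1, 100))

# But at any fixed x > 0 the sequence is eventually 0, so f_n -> 0 pointwise:
x = 0.01
assert f(50, x) == 50                                # early terms can be large
assert all(f(n, x) == 0 for n in range(101, 200))    # ... but eventually vanish
```

So the pointwise limit has integral 0 while every term has integral 1: the mass slides off toward the origin.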

For the rest of this lecture, we’ll assume that (Ω, F, µ) is a σ-finite measure space, and that fn, f , gn, g are all

measurable functions on (Ω, F). Also, let Ωk ↑ Ω be a sequence of sets with µ(Ωk) < ∞ for all k. Last time, we showed that for a function f ≥ 0,

∫_{Ωk} min{f, k} dµ ↑ ∫ f dµ

as k → ∞. Let’s introduce a weaker notion of convergence:

Definition 46

Say that fn → f converges with respect to µ almost everywhere if fn(ω) → f (ω) except for ω in a set of

µ-measure zero. Similarly, say that fn → f converges in µ-measure if for all ε > 0,

µ ({ω : |fn(ω) − f (ω)| ≥ ε}) → 0

as n → ∞.

It will be a future homework problem to show that pointwise convergence implies µ-almost-everywhere convergence, which implies convergence in measure, but that the converses are not true! Let’s start with something easier, which is unfortunately the least useful:
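To see the definition of convergence in µ-measure in action (our own sketch, separate from the homework problem), take fn(x) = xⁿ on [0, 1] with Lebesgue measure and f = 0: the “bad set” is an interval whose measure we can compute in closed form.

```python
# For f_n(x) = x^n on [0, 1], the exceedance set {x : x^n >= eps} is the
# interval [eps^(1/n), 1], which has Lebesgue measure 1 - eps^(1/n).
def measure_of_exceedance(n, eps):
    return 1 - eps ** (1 / n)

eps = 0.1
ms = [measure_of_exceedance(n, eps) for n in range(1, 200)]
assert ms == sorted(ms, reverse=True)   # the bad set shrinks with n
assert ms[-1] < 0.02                    # ... and its measure tends to 0
```

Here the sequence happens to converge almost everywhere as well; producing examples that separate the notions is exactly the homework exercise mentioned above.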

Theorem 47 (Bounded Convergence Theorem)

Suppose that fn are uniformly bounded: there exists M < ∞ such that |fn| ≤ M for all n. Also suppose that

µ({ω : fn(ω) ≠ 0 for some n}) < ∞.

Then if fn → f in µ-measure, we have ∫ fn dµ → ∫ f dµ.

Proof. Let E be the set {ω : fn(ω) ≠ 0 for some n}. The assumptions imply that |f| ≤ M everywhere except a set of measure zero, and the set {f ≠ 0} is contained in E up to a null set (we can’t have fn(x) = 0 for all n but f(x) ≠ 0, outside a set of measure zero). Thus, we basically don’t care about Ω \ E, because all of our functions are 0 there. We’ll basically split up E into two regions. Fix ε > 0, and define

Bn = {|fn − f | ≥ ε}.

We know that since fn → f in µ-measure, this set can’t have large measure. Now we can bound the difference:

|∫_Ω f dµ − ∫_Ω fn dµ| = |∫_E (fn − f) dµ|
≤ ∫_E |fn − f| dµ
≤ ε·µ(E \ Bn) + µ(Bn) · 2M
≤ ε·µ(E) + 2M·µ(Bn),

because |fn − f| is at most 2M (on Bn) and less than ε (on E \ Bn). By assumption, as n → ∞, µ(Bn) → 0, so

lim sup_{n→∞} |∫_Ω (fn − f) dµ| ≤ ε·µ(E),

and now we can send ε → 0 to show that ∫_Ω fn dµ does indeed converge to ∫_Ω f dµ, as desired.

Theorem 48 (Fatou’s lemma)

If fn ≥ 0, then

∫ ( lim inf_{n→∞} fn ) dµ ≤ lim inf_{n→∞} ( ∫ fn dµ ).

Notice that hidden in this statement, we are assuming that if fn are all measurable, then the lim inf of fn is also measurable. We can check this for ourselves!

Remark 49. Whenever we see lim inf, we should expect that this comes with the µ-almost-everywhere condition, and this is okay because we’re integrating over the function.

Proof. Let f = lim inf_{n→∞} fn. By definition,

lim inf_{n→∞} fn = sup_{n≥1} ( inf_{ℓ≥n} f_ℓ )

(we can use sup instead of lim because the inner term inf_{ℓ≥n} f_ℓ is nondecreasing with n). Define this inner term to be gn: note that 0 ≤ gn ≤ fn for any n, and gn ↑ f. Fix k; we claim that

∫_{Ωk} min{f, k} dµ = lim_{n→∞} ∫_{Ωk} min{gn, k} dµ.

This is because gn converges pointwise to f, so it converges in measure, and then we can use the bounded convergence theorem! (By definition, Ωk has finite measure.) And now

lim_{n→∞} ∫_{Ωk} min{gn, k} dµ ≤ lim_{n→∞} ∫ gn dµ ≤ lim inf_{n→∞} ∫ fn dµ

(first step because we removed the cap on the function and also on the space Ωk, second step because gn ≤ fn). Now, letting k → ∞, the left side ∫_{Ωk} min{f, k} dµ converges to ∫ f dµ (this was a result from last time). Thus ∫ f dµ is at most lim inf_{n→∞} ∫ fn dµ, as desired.

Example 50

Coming back to our example fn = n·1_{(0, 1/n)}, where fn → f = 0 pointwise, notice that

0 = ∫ ( lim inf_{n→∞} fn ) dµ < lim inf_{n→∞} ∫ fn dµ = 1.

Fatou’s lemma is useful because of its applications:

Theorem 51 (Monotone Convergence Theorem) Given a sequence fn ≥ 0 of nonnegative functions, if fn ↑ f, then ∫ fn dµ ↑ ∫ f dµ (where the integrals are allowed to be infinite).

Proof. Since the Lebesgue integral is monotone, ∫ fn dµ is a nondecreasing sequence, so it converges to some limit I ≤ ∫ f dµ (since all fn sit below f). The other direction follows directly by Fatou’s lemma:

∫ f dµ = ∫ ( lim inf_{n→∞} fn ) dµ ≤ lim inf_{n→∞} ∫ fn dµ = lim_{n→∞} ∫ fn dµ.

Theorem 52 (Dominated Convergence Theorem) Suppose there exists an integrable function g (that is, ∫ |g| dµ < ∞) such that |fn| ≤ g for all n. Then if fn → f µ-almost-everywhere, we have ∫ fn dµ → ∫ f dµ.

(It may be good to notice how our example fails each of the integral convergence tests.)

Proof. By assumption, −g ≤ fn ≤ g for all n, so fn + g ≥ 0 and g − fn ≥ 0. Since the fn converge to f almost everywhere, f is the lim inf of the fn (up to a null set), so by Fatou’s lemma,

∫ (g ± f) dµ ≤ lim inf_{n→∞} ∫ (g ± fn) dµ.

Rearranging (using both +f and −f),

lim sup_{n→∞} ∫ fn dµ ≤ ∫ f dµ ≤ lim inf_{n→∞} ∫ fn dµ,

which gives us what we want.
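A concrete instance of dominated convergence (our own illustration): fn(x) = xⁿ on [0, 1] is dominated by g ≡ 1, which is integrable on [0, 1], and fn → 0 almost everywhere (everywhere except x = 1).

```python
from fractions import Fraction

# The exact integrals ∫_0^1 x^n dx = 1/(n+1) indeed decrease to 0 = ∫ f dµ.
integrals = [Fraction(1, n + 1) for n in range(1, 1000)]
assert all(a > b for a, b in zip(integrals, integrals[1:]))
assert float(integrals[-1]) < 1e-2

# Cross-check one closed-form integral with a midpoint Riemann sum:
riemann = sum(((i + 0.5) / 1000) ** 5 for i in range(1000)) / 1000
assert abs(riemann - 1 / 6) < 1e-3   # ∫_0^1 x^5 dx = 1/6
```

By contrast, the example fn = n·1_{(0,1/n)} admits no integrable dominating g (the smallest candidate, sup_n fn, has infinite integral), which is why its integrals fail to converge to the integral of the limit.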

Here’s an application of what we’ve proved so far. Say that we’re working with a probability space (Ω, F, P), and we have a random variable Y with mean EY = ∫_Ω Y dP. Say that Y can be written as f(X), where X is another random variable:

(Ω, F, P) −X→ (S, G, µ) −f→ (R, B, ν),

and Y = f ∘ X. Since µ is the pushforward measure under X of P, meaning that µ = X#P = P ∘ X⁻¹, and ν is the pushforward of µ under f, meaning that ν = f#µ = Y#P, it’s natural to ask if

∫_Ω Y dP ≟ ∫_S f dµ.

We’ll start with a simple case:

Example 53 When Ω is finite, we can just write ∑_{ω∈Ω} p(ω) = 1. Then Y is a simple function Y = ∑_{ω∈Ω} f(X(ω)) 1_{ω}, so the expectation of Y is

EY = ∫_Ω Y dP = ∑_{ω∈Ω} f(X(ω)) p(ω).

f is also a simple function, so we can write

f = ∑_{x∈S} f(x) 1_{x} ⟹ ∫_S f dµ = ∑_{x∈S} f(x) µ({x}).

Now µ({x}) = (X#P)({x}) = P(X⁻¹({x})), which is the same as the measure of the set of ωs that are sent to x under X, so rearranging gives the result.

But this isn’t always just symbol pushing: in general, the theorem requires a bit more work.

Theorem 54 (Change of variables formula) Say we have maps

(Ω, F, P) −X→ (S, G, µ) −f→ (R, B, ν), Y = f(X).

If either f ≥ 0 or ∫ |Y| dP < ∞, then

EY = ∫_Ω f(X(ω)) dP(ω) = ∫_S f dµ = ∫_S f(x) d(P ∘ X⁻¹)(x).

Proof. This is a pretty standard proof method, so we should pay attention to it. Whenever we want to show something about integrals, start with indicators. Assume f = 1_B for some set B ∈ G: since f is simple on S,

∫_S f dµ = µ(B) = P(X⁻¹(B)).

On the other hand, if we consider Y = f(X) = 1_B(X), Y is defined on Ω, and it’s the indicator function 1_{X⁻¹(B)}. Since this is a simple function on Ω,

∫_Ω Y dP = P(X⁻¹(B)).

Thus, it follows by linearity that

∫_S f dµ = ∫_Ω f(X) dP

for all simple functions (which are finite linear combinations of indicator functions). Now, we can prove this for nonnegative functions f ≥ 0: the useful trick here is to consider

fn = min{ ⌊2ⁿ f⌋ / 2ⁿ, 2ⁿ }.

(It’s important that we’re subdividing our partitions each time to ensure that the functions fn are nondecreasing; this wouldn’t work if we used n instead of 2ⁿ.) This is a simple function, and fn ↑ f pointwise. We know that

∫ fn dµ = ∫ fn(X) dP

for all n, and sending n → ∞, these converge by the monotone convergence theorem to

∫ f dµ = ∫ Y dP,

as desired. Finally, split into the positive and negative parts for a general function f : in order for this to work, Y must be integrable.

This is most commonly used when we’re trying to compute a Lebesgue integral: if we want to find EX for some random variable X on a complicated space (Ω, F, P), we can apply some map

(Ω, F, P) −X→ (R, B, LX) −id→ (R, B, LX).

(Here Y = X.) The theorem above then implies that

EX = ∫_R id(x) dLX(x) = ∫_R x dLX(x),

which is what we expect: the expected value of X should only depend on its properties on the real line!
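The change of variables formula is easy to verify by brute force on the coin example (our own check, with Y = number of heads):

```python
from itertools import product
from fractions import Fraction

# Computing E[Y] directly on Omega and via the law L_Y = Y_# P on
# {0, ..., n} must give the same answer.
n = 4
Omega = list(product([0, 1], repeat=n))
p = Fraction(1, 2 ** n)                       # uniform mass of each outcome
Y = lambda w: sum(w)

EY_omega = sum(Y(w) * p for w in Omega)       # ∫_Ω Y dP

law = {}                                      # L_Y({k}) = P(Y = k)
for w in Omega:
    law[Y(w)] = law.get(Y(w), 0) + p
EY_law = sum(k * q for k, q in law.items())   # ∫_R x dL_Y(x)

assert sum(law.values()) == 1
assert EY_omega == EY_law == 2                # n/2 = 2 heads on average
```

This is exactly the finite case of Example 53: both computations rearrange the same sum over outcomes.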

6 September 23, 2019

This is a final reminder that homework is due on Wednesday. Because of a seminar this week, office hours have been changed to Monday 2–3 and Tuesday 11–12. (Homework should be printed and submitted in class.) Two TAs will be reading and grading, so if we are handwriting, we should make sure it is neat.

Remark 55. Writing more is not generally helpful; solutions should be correct in some minimal way.

So far, we’ve been considering measures on R, which is one-dimensional. Today, we’ll be generalizing to higher dimensions, and to do this, we’ll actually need to study our construction on R in more detail. Recall that we started with the collection of half-open intervals equipped with a Lebesgue-Stieltjes function F :

I = {(a, b ]}, µ((a, b ]) = F (b) − F (a).

We first extended this to A, the finite disjoint unions of elements of I, and we had to check that µ was indeed countably additive: that is, given disjoint Ai ∈ A with ⊔_{i=1}^∞ Ai ∈ A,

µ( ⊔_{i=1}^∞ Ai ) = ∑_{i=1}^∞ µ(Ai).

Once we did this, we used the Caratheodory extension theorem to obtain a measure on σ(A). Remember that our proof of this second step holds in an abstract setting, as long as A is an algebra (meaning it is closed under complementation and finite union) and µ is countably additive on A. However, the first step (where we extend from I to A) did use some topological properties of R. So if we want to do this step in general, we want I to actually be a semialgebra:

Definition 56 A semialgebra S is closed under finite intersection and semiclosed under complementation. In other words, if E ∈ S, then Ω \ E is a finite disjoint union of elements of S.

For example, R \ (a, b] = (−∞, a] ⊔ (b, ∞), and both pieces are elements of I as well, so I is a semialgebra. (If we’re concerned that (b, ∞) lacks a closed right endpoint, recall that we actually defined I to contain elements of the form (a, b] ∩ R, with b = ∞ allowed.) So once we have this semialgebra S, our

goal is to extend µ to A, the set of finite disjoint unions of elements in S (in other words, the algebra generated by S).

Lemma 57 Let S be a semialgebra over Ω, and let µ : S → [0, ∞] be countably additive over S. In other words, if we have Ei, E ∈ S such that E = ⊔_{i=1}^∞ Ei, then µ(E) = ∑_{i=1}^∞ µ(Ei). Then there is a unique µ : A → [0, ∞] which extends µ on S and is countably additive over A.

Proof. This was essentially proved during lectures 2 and 3, and it’s a bit of bookkeeping. The details will be on our homework.

We’ll be applying these to product measures. The idea is that if we have (for instance) a Lebesgue measure defined on two different axes, then the measure of a rectangle should just be the product of the Lebesgue measures of the two side lengths. Our goal is to formalize that! For the rest of this lecture, say that we have two measurable spaces (S, G) and (T, H). The product space can just be written as Ω = S × T , but it’s a bit harder to write down what the measurable subsets of Ω should be.

Definition 58 The product σ-field F of (S, G) and (T, H) is defined as the sigma-field σ(S), where

S = {A × B : A ∈ G,B ∈ H}

is the set of “rectangles.”

Note that we’re claiming here that S is indeed a semialgebra. We should draw some pictures to convince ourselves of this: the intersection of two rectangles is a rectangle, and we can decompose Ω \ (A × B) into rectangles in a picture too. But we obviously shouldn’t write that on a test: if we want to be more formal, the right thing to do is to say (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D)

(and indeed both sets on the right side are measurable), and

Ω \ (A × B) = (A × (T \ B)) ⊔ ((S \ A) × T).

So the (product) measurable space that we’ll be trying to work with here is

(Ω, F) = (S × T, G ⊗ H).

All that’s left is for us to actually put a measure on it! If we’re equipped with measures (S, G, λ) and (T, H, ρ), and we’ll assume here that λ(S), ρ(T ) < ∞ (though σ-finite is sufficient), our goal is to define a product measure µ = λ ⊗ ρ on (Ω, F). We’re trying to use Lemma 57 here, which motivates the following:

Lemma 59 Define a function µ : S → [0, ∞) via µ(A × B) = λ(A) · ρ(B). Then µ is countably additive over S: that is, if we can write a rectangle A × B as a disjoint countable union of rectangles A × B = ⊔_{i=1}^∞ (Ai × Bi), then

λ(A)ρ(B) = ∑_{i=1}^∞ λ(Ai)ρ(Bi).

Proof. Assume without loss of generality that all sets are nonempty. (Otherwise, discard them.) Consider the coordinate projection map π(x, y) = x, which just outputs the element of S. Applying π to both sides,

A = π(A × B) = π( ⊔_{i=1}^∞ (Ai × Bi) ) = ⋃_{i=1}^∞ Ai,

since we’re basically getting the projections of the Ai s individually (though we obviously no longer need to have a disjoint union). Thus, A is the union of the Ai s, and by symmetry B is the union of the Bi s.

Now by definition, the pair (x, y) belongs to A × B if and only if there exists some unique index i such that (x, y) ∈ Ai × Bi. Then for any x ∈ A, we can look at the cross-section along B and break it up into disjoint pieces:

B = ⊔_{i : x∈Ai} Bi.

Since we know that ρ is a measure, we can say that

ρ(B) = ∑_{i : x∈Ai} ρ(Bi)

for any x ∈ A. So we can define the function

f(x) = 1{x ∈ A} · ρ(B) = 1{x ∈ A} ∑_{i : x∈Ai} ρ(Bi) = ∑_{i=1}^∞ 1{x ∈ Ai} ρ(Bi).

Since the left side is a number ρ(B) times an indicator function 1_A, f is measurable. Similarly, the right-hand side is a pointwise limit of simple functions (because the partial sums are monotone), so it is measurable too. Thus we can integrate both sides:

λ(A)ρ(B) = ∫ [ ∑_{i=1}^∞ 1{x ∈ Ai} ρ(Bi) ] λ(dx),

and by the monotone convergence theorem, we can interchange the sum and the integral:

= ∑_{i=1}^∞ λ(Ai)ρ(Bi),

and we’re done.
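As a quick numerical sanity check of Lemma 59 (our own illustration, on one concrete partition):

```python
from fractions import Fraction

# The rectangle A x B = [0,2) x [0,2) written as a disjoint union of three
# rectangles, with lambda = rho = length, so mu(A x B) = lambda(A) * rho(B).
def mu(rect):
    (a1, a2), (b1, b2) = rect
    return Fraction(a2 - a1) * Fraction(b2 - b1)

whole = ((0, 2), (0, 2))
pieces = [((0, 1), (0, 2)),    # left half
          ((1, 2), (0, 1)),    # bottom-right quarter
          ((1, 2), (1, 2))]    # top-right quarter

# disjointness and covering are by inspection; additivity checks out:
assert mu(whole) == sum(mu(r) for r in pieces) == 4
```

Note that the pieces need not come from a product grid, which is exactly why the cross-section argument in the proof is needed.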

So now applying Lemma 57, we’ve found a unique µ that extends µ and is countably additive over A, and applying the Caratheodory extension theorem gives us a measure µ on (Ω, F)! There’s an obvious generalization: we can define

(µ1 ⊗ µ2) ⊗ µ3 ⊗ ···,

and we can check that the order of operations doesn’t matter here. It’s also very common to work with infinite product spaces

⊗_{i=1}^∞ (Ωi, Fi, µi),

but these are more tricky and we’ll talk about them later. For now, let’s discuss the generalization of the Lebesgue integral to product spaces as well! Recall that special cases here include double sums ∑_{i,j} aij, so we’re not just working

with Riemann double integrals. Recall that there’s a trick of changing the order of summation,

∑_{i=1}^N ∑_{j=1}^M aij = ∑_{j=1}^M ∑_{i=1}^N aij,

but we don’t always get something so nice for infinite sums:

Example 60 Consider

aij = 1 if i = j, aij = −1 if i = j + 1, aij = 0 otherwise.

Then

∑_{i=1}^∞ ( ∑_{j=1}^∞ aij ) = 1 + 0 + ··· = 1,

while

∑_{j=1}^∞ ( ∑_{i=1}^∞ aij ) = 0 + 0 + ··· = 0.
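The steps above are easy to reproduce numerically (our own check), since each row and column has only finitely many nonzero entries, so the inner sums below are exact and only the outer sums are truncated:

```python
# a_ij = 1 if i == j, -1 if i == j + 1, 0 otherwise.
def a(i, j):
    return 1 if i == j else (-1 if i == j + 1 else 0)

N = 200
def row_sum(i): return sum(a(i, j) for j in range(1, N + 2))
def col_sum(j): return sum(a(i, j) for i in range(1, N + 2))

assert sum(row_sum(i) for i in range(1, N + 1)) == 1   # ∑_i (∑_j a_ij) = 1
assert sum(col_sum(j) for j in range(1, N + 1)) == 0   # ∑_j (∑_i a_ij) = 0
```

Every row sums to 0 except the first (which sums to 1), while every column sums to 0: the two iterated sums genuinely disagree.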

In cases like this, we’ll say that the double sum ∑_{i,j} aij is undefined (and we can’t freely change the order). However, there are some cases where this is okay:

Theorem 61 Let (S, G, λ) and (T, H, ρ) be finite measure spaces, and define (Ω, F, µ) = (S × T, G ⊗ H, λ ⊗ ρ). Let f be a measurable function defined on (Ω, F): if f ≥ 0 or ∫ |f| dµ < ∞, then

∫_S ( ∫_T f(x, y) dρ(y) ) dλ(x) = ∫_Ω f dµ = ∫_T ( ∫_S f(x, y) dλ(x) ) dρ(y).

This goes by the names of Tonelli’s theorem (f ≥ 0) and Fubini’s theorem (∫ |f| dµ < ∞). Recall that the middle integral here is defined by the way we’ve constructed the Lebesgue integral, but it isn’t clear a priori that the left and right double integrals are actually defined!

Proof. We’ll prove the first equality (the other follows by symmetry). We’ll do the usual proof method: let’s start with an indicator function f = 1E for a measurable E ∈ F = G ⊗ H. Fix x, and define fx (y) = f (x, y) to emphasize that f is just a function of y for now. Then

(1E)x (y) = 1{(x, y) ∈ E} = 1{y ∈ Ex },

where Ex is the cross-section of x across E in the space T .

Claim 62. Ex is measurable.

Proof of claim. A slick way to do this is to consider

Fx = {E ∈ F : Ex measurable} ⊆ F.

We want to show that Fx = F. Fx is a σ-field, because:

• If E ∈ Fx, then (Ω \ E)x = T \ Ex (because the complement of the cross-section is the cross-section of the complement), which belongs to H as well, because Ex ∈ H is measurable by definition and H is a σ-field.

• If Ei ∈ Fx, the countable union satisfies (⋃_{i=1}^∞ Ei)x = ⋃_{i=1}^∞ (Ei)x ∈ H by the same logic (since the countable union of the cross-sections is the cross-section of the countable union).

Also, the cross-section of a rectangle A × B is either B or empty, so Fx contains S. But Fx is then a σ-field containing S, so it contains σ(S) = F, and thus Fx = F, as desired.

So going back to our function f = 1_E, the function fx(y) = 1{y ∈ Ex} is a measurable function of y on (T, H). Thus, the inner integral is indeed well-defined:

∫_T f(x, y) dρ(y) = ∫_T 1_E(x, y) dρ(y) = ∫_T 1{y ∈ Ex} dρ(y) = ρ(Ex).

It just remains to do the outer integral: we first need to show that the map x ↦ ρ(Ex) is a measurable function on (S, G). The picture in our mind is that we basically slide x along the axis for S, and we want to see if this is a measurable function! To do this, we use a similar trick: define

F^ρ = { E ∈ F : x ↦ ρ(Ex) is measurable on (S, G), and ∫_S ρ(Ex) dλ(x) = µ(E) } ⊆ F.

(This final condition shows what we want: that this double integral is indeed equal to ∫_Ω f dµ.) To show that F^ρ = F, note that F^ρ contains the rectangles S, because (1) if E = A × B, then ρ(Ex) gives back ρ(B) if x ∈ A and 0 otherwise, so ρ(Ex) = ρ(B)·1{x ∈ A}, which is indeed measurable, and (2) this integrates to ρ(B)λ(A) = µ(E). Also, S is a π-system (it’s closed under intersection), so it’s our goal to show that F^ρ is a λ-system.

• The total space Ω = S × T belongs to F^ρ, because it is a rectangle.

• Given E1, E2 ∈ F^ρ with E1 ⊇ E2, notice that

ρ((E1 \ E2)x ) = ρ((E1)x \ (E2)x ) = ρ((E1)x ) − ρ((E2)x )

because the cross-section of the difference is the difference of the cross-sections, and because ρ is a measure. Since both terms are measurable functions of x and the difference of measurable functions is measurable, this is indeed a measurable function! The last condition (that it integrates to µ(E1 \ E2)) is true by linearity of integration.

• Finally, we want to show that if Ei ∈ F^ρ and Ei ↑ E, then the limit E ∈ F^ρ. The cross-sections will increase to the cross-section of the limit:

ρ((Ei )x ) ↑ ρ(Ex )

for each x because ρ is a measure (continuity from below, taking the countable union), and each ρ((E_i)_x) is measurable, so ρ(E_x) is measurable in x. The last condition holds by the monotone convergence theorem.

So now by the π-λ theorem, F^ρ = F, and we've finished the proof for indicator functions 1_E of any measurable E ∈ F = G ⊗ H. Now we just run the usual machinery: the result holds for simple functions by linearity, then for all nonnegative functions by approximating from below with the monotone convergence theorem, and finally for all integrable functions by splitting into positive and negative parts.

7 September 25, 2019

Recall that last time, we constructed the product measure on two σ-finite measure spaces: given (S, G, λ) and (T, H, ρ), we define Ω = S × T, F = G ⊗ H, µ = λ ⊗ ρ. Note that there are also non-product measures on such a space, such as the uniform measure on some triangle: some examples of these will be on our homework for next time. We can always extend this to finite product spaces by induction:

(∏_{i=1}^n Ω_i, ⊗_{i=1}^n F_i, ⊗_{i=1}^n µ_i).

Last time, we mentioned that infinite products are more subtle, and that's what we'll be going into today! For example, consider the percolation model on the infinite integer lattice: at every point (x, y) ∈ Z², flip a coin to decide whether it is a 0 or a 1. The percolation problem then asks about the large-scale structure of this model! There can be more complicated models too (for example in statistical physics), where perhaps there is some nearest-neighbor interaction between the sites. We also don't really want to work with a large finite subset of the lattice, because it's often hard to handle things around the boundary (it becomes very annoying). There are two main results we're going to prove today:

• A special case of the Ionescu-Tulcea theorem, which says that we can define an infinite product of probability spaces with no further conditions.

• The Kolmogorov extension theorem, which says that we can define a non-product measure on an infinite product space, under a mild regularity condition.

We're using a product σ-algebra in both cases, so we'll start by defining what that means:

Definition 63

Let (Ω_α, F_α) be measurable spaces, indexed by α ∈ I (so there can be an uncountable number of them). Then the product σ-field

F = ⊗_{α∈I} F_α

is the minimal σ-field over Ω = ∏_{α∈I} Ω_α such that all single-coordinate projections π_α are measurable. In other words, we have

(π_α)^{−1}(E_α) ∈ F   ∀ E_α ∈ F_α.

Note that the preimage under π_α looks like

E_α × ∏_{γ∈I\{α}} Ω_γ.

This is not exactly the same as taking Cartesian products of measurable sets Eα ∈ Fα, because I can be uncountable (sigma-algebras only give us countable unions).

Remark 64. This may look familiar if we’ve heard of a product topology before.

Definition 65 (Notation)
For any J ⊆ I, denote the partial products

Ω_J = ∏_{α∈J} Ω_α,   F_J = ⊗_{α∈J} F_α.

Why is this a good definition? Here's where the probability comes in: we often care about finite-dimensional distributions, also known as marginals. Say we have some probability measure on an infinite product probability space (Ω, F, P). If we have a product measure, and I is countable, then

E = ∏_{α∈I} E_α

is measurable, so we would guess that

P(E) ≟ ∏_{α∈I} P_α(E_α).

(This is not exactly true, but it gives us some intuition.) In order for this to be a positive number, most of the factors need to be pretty close to 1. So in most of the coordinates the set basically needs to include the whole space, which motivates looking at only finitely many coordinates at a time! Let J be a finite subset of I. Then the law can be written as

[(π_J)_# P](E_J) = P(π_J^{−1}(E_J)) = P(E_J × Ω_{I\J}).

These are called the finite-dimensional marginals. Note that if we project down, we need a consistent definition: given any finite J ⊆ J′ ⊆ I, the marginal P_J of P on J should agree with what we get by first projecting down to J′ and then from J′ down to J. So to construct a measure on an infinite product space, we already have some necessary conditions: we want to see whether these conditions alone give us a measure on the whole space. We're going in reverse: if we start with a family of P_J's for finite J's, which we defined last class, we want to get a measure P consistent with those marginals. Such a family of finite-dimensional measures is usually called a family of finite-dimensional distributions, abbreviated f.d.d.

Remark 66. The measurability condition in the definition of the product σ-field is necessary to make sure everything in this consistency statement is well-defined!

One more thing before we get to our theorems. We want to apply the Carathéodory extension theorem in this infinite-dimensional case too, which says that if we have a measure µ that is countably additive on an algebra A, then it extends uniquely to a valid measure on σ(A).

Lemma 67 (Variant of Carathéodory)
Say that A is an algebra over Ω, and µ : A → [0, ∞) is a finite measure. Suppose µ is finitely (not necessarily countably) additive, and for any sequence B_n ∈ A such that B_n ↓ ∅, we have µ(B_n) ↓ 0. Then µ is countably additive over A, so we can apply the Carathéodory extension theorem.

Proof. Say that A_n ∈ A are disjoint, with A = ⊔_{n=1}^∞ A_n also in A. (We need to prove countable additivity.) Define B_n = A \ ∪_{i=1}^n A_i: this is in the algebra as well, because we only need to take a finite union. The B_n decrease to the empty set, and now because

µ(A) = Σ_{i=1}^n µ(A_i) + µ(B_n),

taking n → ∞ yields the result (since we assume µ(B_n) goes to 0).

How can we specialize this lemma to (Ω, F) = (∏ Ω_α, ⊗ F_α)? We start with some algebra: as an exercise, check that if we take

A = {(π_J)^{−1}(E_J) = E_J × Ω_{I\J}}

for finite J and E_J measurable in F_J, then we get an algebra. (We just need to check complementation and finite unions, both of which are pretty easy.)

Now, some more simplification for later. Consider a sequence B_n ∈ A such that B_n ↓ ∅. The total number of coordinates that can be involved in the B_i's is countable, because each B_i depends on finitely many coordinates. So we can renumber the coordinates to Q = {1, 2, 3, ...} for convenience, and we can assume without loss of generality that B_1 depends on the first coordinate, B_2 depends on the first two, and so on. This means that we can write

B_n = B̄_n × Ω_{Q\[n]} × Ω_rest,

where Ω_rest is the product of the Ω_α over the index set I \ Q. (Note here that Q depends on the sequence B_n, but it is always countable.) Then to apply Lemma 67 in our theorems, we just need to show that the measures of the B_n decrease to zero.

Theorem 68 (Ionescu-Tulcea theorem, special case)

Given probability spaces (Ω_α, F_α, P_α) indexed by α ∈ I, there exists a unique probability measure P on

(∏_{α∈I} Ω_α, ⊗_{α∈I} F_α)

whose family of finite-dimensional distributions is given by

P_J = ⊗_{α∈J} P_α

for any finite J.

This is essentially us creating a “product measure.”

Proof. Let our algebra consist of the sets

A = {E_J × Ω_{I\J} : J finite, E_J ∈ F_J}.

According to the conditions, we should define

P(E_J × Ω_{I\J}) = (⊗_{α∈J} P_α)(E_J).

It's a bookkeeping exercise to show that P is well-defined and finitely additive over A. So we'll be done by Lemma 67 if we can show that for any sequence B_n as in the boxed equation above, P(B_n) ↓ 0. By definition,

P(B_n) = P_[n](B̄_n)

(because in all other coordinates the set is the whole space), which is then equal to

P_{[n+1]}(B̄_n × Ω_{n+1}) ≥ P_{[n+1]}(B̄_{n+1})

since the B_n are decreasing. Thus, the P(B_n) are decreasing, and we just need to show that their limit is zero. By Fubini's theorem,

P(B_n) = P_[n](B̄_n) = ∫ (··· (∫ 1{B̄_n} dP_n) ···) dP_1.

Consider the doubly infinite array a_{ij} such that a_{ii} = 1{B̄_i}; each time we move right from column n to column n + 1, we cross with Ω_{n+1}, and each time we move from column n + 1 back to column n, we integrate dP_{n+1}.

The 0th column is then P(B_1), P(B_2), .... The first column starts with 1{B̄_1}; the next entry down is 1{B̄_2} integrated over the second coordinate, the entry after that is 1{B̄_3} integrated over the second and third coordinates, and so on.

Each column is nonincreasing, so it has a limit: let g_n be the limit of the nth column. (It will be a function of the first n coordinates.) By the bounded convergence theorem, the "integrate dP_{n+1}" passes through the limit, so

g_n = ∫ g_{n+1} dP_{n+1}   ∀n.

Suppose that g_0 > 0. Then there exists some ω_1 such that g_1(ω_1) is positive. But then

g_1(ω_1) = ∫ g_2(ω_1, ω_2) dP_2(ω_2),

meaning there exists a coordinate ω_2 such that g_2(ω_1, ω_2) is positive. Repeating this, there exists some (ω_1, ω_2, ...) such that g_n(ω_1, ..., ω_n) > 0 for all n. Since g_n is bounded above by 1{(ω_1, ..., ω_n) ∈ B̄_n}, the diagonal entry in our infinite array, this means that

(ω_1, ω_2, ...) ∈ ∩_n B̄_n × Ω_{Q\[n]},

but this is a contradiction: the B_n = B̄_n × Ω_{Q\[n]} × Ω_rest decrease to the empty set, so the intersection of the B̄_n × Ω_{Q\[n]} must be empty. Thus g_0 = 0, and we can apply Lemma 67 to finish.
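The infinite product measure we just built has a computational shadow: a sample from ⊗P_α only ever requires realizing finitely many coordinates, and every finite-dimensional marginal is the corresponding finite product. Here's a small Python sketch of this lazy-sampling idea (the class name and the bias p = 0.3 are our own illustrative choices, not from the lecture):

```python
import random

class LazyCoinSequence:
    """One sample point of the infinite product of Ber(p) measures.

    Coordinates are generated on demand and cached, so only the finitely
    many coordinates we actually inspect are ever realized -- mirroring
    the fact that events in the product sigma-field depend on at most
    countably many coordinates.
    """
    def __init__(self, p, rng):
        self.p = p
        self.rng = rng
        self.cache = {}

    def __getitem__(self, alpha):
        if alpha not in self.cache:
            self.cache[alpha] = 1 if self.rng.random() < self.p else 0
        return self.cache[alpha]

rng = random.Random(0)
omega = LazyCoinSequence(0.3, rng)
# Inspect three widely separated coordinates: only those three are realized.
values = [omega[5], omega[100], omega[10**9]]
realized = len(omega.cache)
```

Asking for `omega[10**9]` costs no more than asking for `omega[5]`; the "infinite" sample point is just a promise to flip each coin when first needed.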

Theorem 69 (Kolmogorov extension theorem)

Let Ω_α be metric spaces with Borel σ-fields F_α. Say we have a (not necessarily product) consistent family of f.d.d.'s P_J for finite J, and say that each P_J is inner regular: every measurable set can be approximated from within by compact sets. Then there exists a probability measure P on the whole space (Ω, F) with these finite-dimensional distributions.

Proof. Use the same A as before, and again define P : A → [0, 1] via

P(EJ × ΩI\J ) = PJ (EJ ).

(Again, it's a bookkeeping exercise to show that P is well-defined and finitely additive over A.) Then we just need to check the condition of Lemma 67: given B_n decreasing to the empty set, which we can rewrite in the convenient form B_n = B̄_n × Ω_{Q\[n]} × Ω_rest, we want to show that P(B_n) goes to 0. Say for contradiction that P(B_n) ↓ ε > 0.

The key step is to replace Bn with a compact set Kn that sits inside it such that the same hypotheses still hold.

By assumption, P_[n] is in our family of finite-dimensional distributions, so it is inner regular, and thus we can find a compact C̄_n ⊆ B̄_n such that

P_[n](C̄_n) ≥ P_[n](B̄_n) − ε/2^{n+1}.

Note that the sets

C_n = C̄_n × Ω_{I\[n]}

are no longer necessarily decreasing, so we instead define

K̄_n = ∩_{ℓ=1}^n (C̄_ℓ × Ω_{[n]\[ℓ]}).

This is a closed subset of a compact set, so it is compact. And now

K_n = K̄_n × Ω_{I\[n]}

are decreasing and contained in the B_n's, so they decrease to the empty set. (Note: K_n is not necessarily compact, but K̄_n is. Otherwise we'd be done immediately, because a nested sequence of nonempty compact sets has nonempty intersection.) Now as an exercise, we should show that

P(K_n) ≥ P(B_n) − ε/2,

which implies that P(K_n) ↓ δ ≥ ε/2, a positive value. But K_n ⊆ B_n, so K_n ↓ ∅. We know that K̄_n is nonempty for all n, so we can find some element ω^{(n)} ∈ K̄_n. Denote ω^{0,n} = ω^{(n)}, and for every ℓ ≥ 1, consider the sequence

(π_[ℓ](ω^{ℓ−1,n}))_{n≥1}.

This is a sequence in the compact set K̄_ℓ, so it has a convergent subsequence; call that subsequence (ω^{ℓ,n})_{n≥1}.

Now we're almost done: take the diagonal (ω^{n,n})_{n≥1}. For every coordinate ℓ,

π_ℓ(ω^{n,n}) → x_ℓ ∈ Ω_ℓ

as n → ∞. But now

(x_1, x_2, ...) ∈ ∩_{n=1}^∞ K̄_n × Ω_{Q\[n]},

which is a contradiction, since the K_n = K̄_n × Ω_{Q\[n]} × Ω_rest decrease to the empty set, so the intersection of the K̄_n × Ω_{Q\[n]} must be empty.

Thus P(Bn) decreases to 0, and we do have a valid probability measure.

8 September 30, 2019

Last week, we did a lot of work to construct product measure spaces, especially in the case where we have a possibly uncountable product of the form

(Ω, F, P) = (∏_{α∈I} Ω_α, ⊗_{α∈I} F_α, ⊗_{α∈I} P_α).

Today, we'll mostly talk about "easier things": we have likely seen concepts like variance, correlation, and independence before, and we'll basically review them in a more formal setting. Also, note that we're now looking at an arbitrary probability space, not necessarily a product space.

Definition 70
Let X be a random variable on (Ω, F, P) (in other words, X is a measurable function Ω → R). Then the σ-field generated by X, denoted σ(X), is the minimal σ-field over Ω such that X is measurable:

σ(X) = {X^{−1}(B) : B ∈ B_R}.

We can indeed check that this is a σ-field. Intuitively, this tells us about the events in Ω that are distinguishable from our mapping X into R. For example, if X just sends everything to 0, we can’t distinguish very much: the preimage of any Borel set B is Ω if it contains 0 and ∅ otherwise, so

σ(X) = {∅, Ω}.

Basically, that means we’ve defined a pretty pointless random variable: the larger σ(X) is, the more “helpful” X is for giving us information.

Definition 71
Two random events A, B ∈ F are independent, denoted A ⊥⊥ B, if and only if P(A ∩ B) = P(A)P(B).

Note that this also implies

P(A ∩ B^c) = P(A) − P(A ∩ B) = P(A) − P(A)P(B) = P(A)P(B^c),

so A ⊥⊥ B ⟺ A ⊥⊥ B^c, and so on. This can be generalized:

Definition 72

Events (A_α : α ∈ I) are (mutually) independent if for all finite J ⊆ I,

P(∩_{α∈J} A_α) = ∏_{α∈J} P(A_α).

By extension, if (Cα : α ∈ I) are each a collection of events, they are independent if for all finite J ⊆ I and

Aα ∈ Cα, the above equality holds.

Definition 73

If (Xα : α ∈ I) are variables defined on the same space, we say that they are (mutually) independent random

variables on (Ω, F, P) if and only if their sigma-algebras (σ(Xα): α ∈ I) are independent.

Notably, this means that for any finite set J ⊆ I and B_α ∈ B_R,

P(X_α ∈ B_α for all α ∈ J) = P(∩_{α∈J} X_α^{−1}(B_α)),

and since the X_α^{−1}(B_α)'s lie in the σ-algebras of the X_α's, we can use the definition of independence to continue:

= ∏_{α∈J} P(X_α^{−1}(B_α)) = ∏_{α∈J} P(X_α ∈ B_α).

Equating the first and last expressions gives what we've likely seen in a less formal probability class.
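We can sanity-check this factorization by brute-force enumeration on the smallest nontrivial product space, two p-biased coins (the bias p = 0.3 below is an arbitrary illustrative choice):

```python
from itertools import product

p = 0.3  # illustrative bias, not from the lecture
# Product probability space Omega = {0,1} x {0,1} with P = Ber(p) x Ber(p).
prob = {(a, b): (p if a else 1 - p) * (p if b else 1 - p)
        for a, b in product((0, 1), repeat=2)}

def P(event):
    """Probability of a subset of Omega."""
    return sum(prob[w] for w in event)

# Coordinate variables X1(w) = w[0], X2(w) = w[1]; run over every Borel
# set B1, B2 (here: every subset of {0,1}) and check the factorization.
for B1 in [set(), {0}, {1}, {0, 1}]:
    for B2 in [set(), {0}, {1}, {0, 1}]:
        joint = {w for w in prob if w[0] in B1 and w[1] in B2}
        E1 = {w for w in prob if w[0] in B1}
        E2 = {w for w in prob if w[1] in B2}
        assert abs(P(joint) - P(E1) * P(E2)) < 1e-12
factorizes = True
```

Since σ(X_1) and σ(X_2) each have only four events here, this check exhausts the definition of independence exactly, not just approximately.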

Example 74
A special case of this is the product probability space (Ω, F, P) = (∏_{α∈I} Ω_α, ⊗_{α∈I} F_α, ⊗_{α∈I} P_α) which we've seen before: we can define random variables X_α(ω) = f_α(ω_α), measurable functions that only look at the α-coordinate.

Then for a measurable subset B,

X_α^{−1}(B) = f_α^{−1}(B) × ∏_{γ∈I\{α}} Ω_γ.

Notice that for any X_α, because we have "no information" about the other coordinates when we only look at the α-coordinate,

σ(X_α) = σ(f_α) ⊗ (⊗_{γ∈I\{α}} T_γ),

where T_γ is the trivial σ-field {∅, Ω_γ}. From this, it's easy to see that this fits our formal definition of independence: the (X_α : α ∈ I) are all independent random variables on (Ω, F, P).

This seems like a special case, but it is really the only case! Suppose we try to define something more general: say that we have some general probability space, and we're told that (X_α : α ∈ I) are independent random variables. Then define a new random variable as a tuple of the X_α's:

X(ω) = (X_α(ω))_{α∈I} ∈ ∏_{α∈I} R,

which is a map from (Ω, F, P) to (∏_{α∈I} R, ⊗_{α∈I} B_R, L_X), where

L_X = X_# P = P ∘ X^{−1}.

Then the X_α's are independent random variables if and only if the law L_X is a product measure! In other words, we should treat independence and product measure as one and the same.

Remark 75. By extension, if we have a bunch of events (or collection of events) Eα, they are independent if and only if the indicator variables 1Eα : α ∈ I are independent random variables.

This is the point in the class where measure theory basically ends: it’s time for us to start talking about more interesting things in probability!

Definition 76
Let X be a random variable on (Ω, F, P). Then the pth moment of X is defined to be E(|X|^p) (which can be infinite), and the L^p norm of X is defined as

||X||_p = ||X||_{L^p(Ω,F,P)} = E(|X|^p)^{1/p}.

X belongs to L^p if ||X||_p is finite.

Here’s a useful fact which we won’t prove here:

Theorem 77 (Jensen's inequality)
Suppose G ⊆ R is an open interval, and g : G → R is a convex function:

g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y)   ∀ x, y ∈ G, λ ∈ [0, 1].

Then if X is a random variable such that P(X ∈ G) = 1 and E|X| < ∞, E|g(X)| < ∞, we have E(g(X)) ≥ g(E(X)).

This gives us a useful property:

Corollary 78 (L^p monotonicity)
Let 0 < r ≤ p < ∞, and let X be a random variable such that ||X||_r, ||X||_p are both finite. Then since g(y) = |y|^{p/r} is convex,

E(|X|^p) = E((|X|^r)^{p/r}) ≥ (E|X|^r)^{p/r},

or equivalently

||X||_r ≤ ||X||_p.
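As a quick numerical sanity check of this monotonicity, we can compute the L^p norms of a small discrete random variable (the four-point distribution below is an arbitrary choice of ours):

```python
# A discrete random variable on a probability space: values and probabilities.
values = [0.5, 1.0, 2.0, 5.0]
probs = [0.4, 0.3, 0.2, 0.1]  # illustrative distribution, sums to 1

def lp_norm(p):
    """||X||_p = E(|X|^p)^(1/p) for this discrete distribution."""
    return sum(q * abs(x) ** p for x, q in zip(values, probs)) ** (1 / p)

# Corollary 78 predicts ||X||_r <= ||X||_p whenever r <= p.
norms = [lp_norm(p) for p in (0.5, 1, 2, 4)]
is_monotone = all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```

Note that the total mass being 1 is essential; the next paragraphs show the inequality reverses on sequence spaces.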

The corollary assumed both norms were finite; it would be surprising if the infinite-norm cases behaved differently, and we can rigorize this:

Proposition 79
If 0 < r ≤ p < ∞ and ||X||_p < ∞, then ||X||_r < ∞. In other words,

L^r(Ω, F, P) ⊇ L^p(Ω, F, P).

Proof. Note that we can split up

E(|X|^r) = E(|X|^r 1{|X| ≤ 1}) + E(|X|^r 1{|X| > 1}).

As standard notation in this class, we'll denote this

= E(|X|^r; |X| ≤ 1) + E(|X|^r; |X| > 1).

The first term is bounded by 1, because |X| ≤ 1 ⟹ |X|^r ≤ 1, so finiteness only depends on the second term. Now because r ≤ p, we have |X|^r ≤ |X|^p on the event {|X| > 1}, so

E(|X|^r; |X| > 1) ≤ E(|X|^p; |X| > 1),

and since the right-hand side is finite, so is the left-hand side.

Note that this L^p monotonicity only holds for probability spaces! For example, consider the sequence spaces, whose elements are of the form x = (x_1, x_2, ...): the ℓ^p norm of x is defined as

||x||_p = (Σ_{i=1}^∞ |x_i|^p)^{1/p}.

In these spaces, if r < p, then ||x||_r ≥ ||x||_p (the inequality goes in the opposite direction, and it has nothing to do with Jensen). Thus, the nesting also goes in the other direction:

ℓ^r ⊆ ℓ^p.

One way to remember this is that the unit ball of ℓ^r is nested inside the unit ball of ℓ^p. For example, the unit ball in ℓ^1 is the set of points with |x_1| + |x_2| + ··· ≤ 1 (a diamond), while the unit ball in ℓ^2 is an actual ball. As we take p larger and larger, the unit ball grows, and the ℓ^∞ unit ball is a cube of side length 2.

As another example, consider the classical function spaces L^p(R, B, λ) under the Lebesgue measure, where

||f||_p = (∫ |f(x)|^p dx)^{1/p}.
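The reversed inequality for sequence spaces is easy to verify numerically (the particular finite sequence below is an arbitrary choice; a finite sequence is an element of every ℓ^p):

```python
x = [0.5, 0.25, 0.125, 0.0625]  # illustrative sequence (geometric tail)

def ellp_norm(seq, p):
    """The little-ell-p norm (sum |x_i|^p)^(1/p)."""
    return sum(abs(t) ** p for t in seq) ** (1 / p)

# On sequence spaces, r < p gives ||x||_r >= ||x||_p -- the reverse of the
# probability-space inequality, because counting measure has total mass > 1.
n1 = ellp_norm(x, 1)
n2 = ellp_norm(x, 2)
ninf = max(abs(t) for t in x)  # the ell-infinity norm
reversed_ok = n1 >= n2 >= ninf
```

Intuitively: raising entries of size at most 1 to a higher power only shrinks them, which is the opposite of what averaging against a probability measure does.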

It turns out that for these function spaces, L^p and L^r are not nested in each other in either direction.

Looking back at random variables X on (Ω, F, P), L^p monotonicity tells us that because L^1 ⊇ L^2,

E(|X|^2) < ∞ ⟹ E|X| < ∞.

In addition, by Jensen's inequality, E|X| ≤ E(X^2)^{1/2}, so we can define the following nonnegative quantity:

Definition 80
The variance of a random variable X with finite second moment is

Var(X) = E[(X − E[X])^2] = E(X^2) − E(X)^2.

Theorem 81 (Cauchy-Schwarz)
For any two random variables X, Y ∈ L^2,

E|XY| ≤ ||X||_2 ||Y||_2.

Proof. Note that if ||X||_2 = 0 or ||Y||_2 = 0, then X = 0 or Y = 0 almost surely, so E|XY| = 0. Otherwise, both ||X||_2 and ||Y||_2 are positive. Consider first the special case where E|XY| < ∞ and X and Y both have L^2 norm 1: then

0 ≤ E[(|X| − |Y|)^2] = E(|X|^2) + E(|Y|^2) − 2E|XY|,

so E|XY| ≤ 1. Next, whenever E|XY| < ∞, we can rescale X and Y to have norm 1:

E[ (|X|/||X||_2) · (|Y|/||Y||_2) ] ≤ 1.

Finally, in general we can truncate our random variables so that the expectation is finite:

E[min{|X|, n} · min{|Y|, n}] ≤ ||min{|X|, n}||_2 ||min{|Y|, n}||_2 ≤ ||X||_2 ||Y||_2.

Now we can take n → ∞ and use the monotone convergence theorem to finish.
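A finite sample, viewed as a uniform probability space, satisfies Cauchy-Schwarz exactly (it's the finite-dimensional inequality), so a quick empirical check always passes; the sample sizes and distributions below are arbitrary choices of ours:

```python
import math
import random

rng = random.Random(1)
# Treat N equally likely outcomes as the probability space, so every
# expectation is a plain average over the sample.
N = 1000
X = [rng.gauss(0, 1) for _ in range(N)]
Y = [rng.gauss(1, 2) for _ in range(N)]

E_absXY = sum(abs(a * b) for a, b in zip(X, Y)) / N
norm_X = math.sqrt(sum(a * a for a in X) / N)   # ||X||_2 under this measure
norm_Y = math.sqrt(sum(b * b for b in Y) / N)   # ||Y||_2 under this measure
holds = E_absXY <= norm_X * norm_Y + 1e-12
```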

This allows us to make another definition:

Definition 82 If X,Y ∈ L2, then their covariance

Cov(X,Y ) = E((X − E[X])(Y − E[Y ])).

X,Y are uncorrelated if Cov(X,Y ) = 0.

Notably, if X, Y ∈ L^2 and X ⊥⊥ Y, then Cov(X, Y) = 0. The converse is false, though: two uncorrelated random variables can still be dependent. We're going to work with this towards laws of large numbers. (Joke: a "standard" example is that if someone goes to prison and flips lots of coins in their free time, we should expect about half of them to be heads.) Note that many of these laws are really just applications of the Pythagorean theorem, just like Cauchy-Schwarz!

If X1, ··· ,Xn are pairwise uncorrelated, consider their sample mean or empirical mean

n 1 X X = X . n n i i=1 Then the variance n n 1 X X Var X = Cov( X , X ) n n2 i i i=1 i=1 has all cross-terms disappear, which means that the expression simplifies to

n 1 X Var X = Var(X ). n n2 i i=1

(Alternatively, we can treat Xi − E(Xi ) as vectors vi , and note that all scalar products (vi , vj ) = 0 by assumption.

38 Thus, we have orthogonal vectors, and thus the variance

n ! n 2 1 X 1 X Var X = v , n i n i i=1 i=1 2 which by the Pythagorean theorem is just n 1 X = ||v ||2 n2 i i=1 2 because the vi s are orthogonal.) So now if all the Xi s have the same law as some specific X ∈ L , then

1 Var(X) Var(X ) = · n Var(X) = . n n2 n
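We can watch the Var(X)/n scaling appear in a quick Monte Carlo experiment (the Uniform(0,1) distribution, which has Var(X) = 1/12, and the sample sizes are arbitrary choices of ours):

```python
import random

rng = random.Random(2)
n, trials = 50, 20000
# Estimate Var(sample mean) for iid Uniform(0,1): repeat the averaging
# experiment many times and compute the empirical variance of the results.
means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
m = sum(means) / trials
var_of_mean = sum((x - m) ** 2 for x in means) / trials

predicted = (1 / 12) / n   # Var(X)/n with Var(Uniform(0,1)) = 1/12
close = abs(var_of_mean - predicted) < 0.3 * predicted
```

Doubling n should roughly halve `var_of_mean`, which is exactly the 1/n decay the next results exploit.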

So taking the average will make the variance smaller! To get to our first result, we’ll employ a few more useful inequalities:

Theorem 83 (Markov's inequality)
If X ≥ 0, then

P(X ≥ t) = E[1{X ≥ t}] ≤ E[X/t] = E[X]/t.

Corollary 84 (Chebyshev's inequality)
If X ∈ L^2, then the probability of a large deviation satisfies

P(|X − E[X]| ≥ t) ≤ E[(X − E[X])²]/t² = Var(X)/t².
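A quick empirical check of Chebyshev's bound, using an exponential random variable (an arbitrary choice with E X = Var X = 1, so the bound reads P(|X − 1| ≥ t) ≤ 1/t²):

```python
import random

rng = random.Random(3)
N = 100000
xs = [rng.expovariate(1.0) for _ in range(N)]
mean = sum(xs) / N  # close to the true mean 1

for t in (1.0, 2.0, 3.0):
    # Empirical tail probability of a deviation of size at least t.
    emp = sum(1 for x in xs if abs(x - mean) >= t) / N
    # Chebyshev bound Var(X)/t^2 = 1/t^2, with a little sampling slack.
    assert emp <= 1.0 / t ** 2 + 0.01
chebyshev_ok = True
```

As usual with Chebyshev, the bound is far from tight here (the true exponential tails decay much faster than 1/t²), but it requires only a second moment.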

Using Chebyshev on the boxed statement above then gives us our first law of large numbers:

Theorem 85 (L² weak law of large numbers)
If X, X_i ∈ L² are pairwise uncorrelated and identically distributed, then

X̄_n = (1/n) Σ_{i=1}^n X_i

converges to E[X] as n → ∞ in L² and in probability.

For completeness, we’ll write out the proof here:

Proof. Note that  n !2  1 X  Var X ||X − X||2 = (X − [X]) = , n E 2 E n i E n  i=1 

L2 which goes to 0 as n → ∞, so Xn → E[X]. Then we can apply Chebyshev: to show convergence in probability, note that for any ε, Var Xn lim P(|Xn − E[X]| ≥ ε) ≤ lim = 0. n→∞ n→∞ ε2
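Here is the weak law in action for fair coin flips (the sample sizes and the 0.05 threshold are illustrative choices of ours; Chebyshev with Var(X) = 1/4 gives P(|X̄_n − 1/2| ≥ 0.05) ≤ 1/(4 · 2000 · 0.0025) = 0.05 at n = 2000):

```python
import random

rng = random.Random(4)

def sample_mean(n):
    """Empirical mean of n iid fair coin flips."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

# Repeat the experiment 200 times and see how often the mean deviates
# from 1/2 by at least 0.05; Chebyshev caps this frequency at about 0.05.
devs = [abs(sample_mean(2000) - 0.5) for _ in range(200)]
frac_far = sum(d >= 0.05 for d in devs) / len(devs)
```

In practice `frac_far` is essentially zero, since 0.05 is about 4.5 standard deviations of the sample mean at n = 2000; Chebyshev is a crude but sufficient bound.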

This proves the simplest version of the law of large numbers, and the main idea is just the geometric fact that the average of a bunch of orthogonal vectors v_1, v_2, ... has small norm. Next time, we'll see how far we can extend this (beyond simple L² geometry)!

9 October 2, 2019

Last time, we started talking about moments of random variables, leading us to define

Var X = E[(X − E X)²] ∈ [0, ∞)

as long as X ∈ L² (meaning that the L² norm ||X||_2 = E[X²]^{1/2} is finite). This led us to the weak law of large numbers: given X, X_i ∈ L² that are pairwise uncorrelated, with all X_i distributed identically to X, we have that

X̄_n = (1/n) Σ_{i=1}^n X_i

converges to E[X] in L² and in probability. We'll strengthen this result in a few ways today: we'll relax the L² assumption, we'll show almost-sure convergence under some conditions, and we'll generalize to triangular arrays. For the rest of class, consider a triangular array of random variables

X_{1,1}
X_{2,1}  X_{2,2}
X_{3,1}  X_{3,2}  X_{3,3}
...

Our basic question: if S_n = Σ_{k=1}^n X_{n,k} is the sum of the nth row of this array, does S_n/b_n go to a constant for some deterministic b_n? Usually b_n = n, but we'll see some other examples as well. Under the L² assumption, it was okay to just assume all vectors were pairwise orthogonal; now we're going to need something stronger. From now on, we'll assume independence within each row.

Theorem 86 (L² weak law for triangular arrays)
If we have a triangular array X_{n,k} as above such that Var X_{n,k} ≤ C < ∞ for all n, k, then (S_n − E S_n)/n converges to 0 in L² and in probability.

Proof. Take the L² norm

||(S_n − E S_n)/n||²_2 = (1/n²) ||S_n − E S_n||²_2,

and by the Pythagorean theorem, because the X_{n,k}'s within a row are uncorrelated, this is

= (1/n²) Σ_{k=1}^n Var X_{n,k} ≤ C/n,

which goes to 0 as n → ∞. Convergence in probability again follows from Chebyshev.

Theorem 87 (Weak law for triangular arrays)
Take the same triangular array as before. Let b_n → ∞ be such that

Σ_{k=1}^n P(|X_{n,k}| > b_n) → 0   and   (1/b_n²) Σ_{k=1}^n E(X_{n,k}²; |X_{n,k}| ≤ b_n) → 0

as n → ∞. Under these assumptions, if we define

Y_{n,k} = X_{n,k} · 1{|X_{n,k}| ≤ b_n},   T_n = Σ_{k=1}^n Y_{n,k},

then (S_n − E[T_n])/b_n goes to 0 in probability.

We do this because under our current definition, E[Sn] might not be defined. The theorem is stated in an ugly way so that it is easy to prove!

Proof. First, we want S_n to be close to T_n; in fact S_n and T_n are equal with high probability. This is because

P(S_n ≠ T_n) ≤ Σ_{k=1}^n P(X_{n,k} ≠ Y_{n,k})

(because T_n is the sum of the truncated versions of the X_{n,k}, while S_n is the sum of the X_{n,k} themselves). This is then bounded by

≤ Σ_{k=1}^n P(|X_{n,k}| > b_n),

which goes to 0 by the first of our convenient assumptions. Therefore, S_n − T_n → 0 in probability, and therefore (S_n − T_n)/b_n → 0 in probability as well (since b_n → ∞). Thus, it suffices to prove the statement for T_n in place of S_n.

But we've designed T_n so that it is nicely bounded: calculating the L² norm,

||(T_n − E T_n)/b_n||²_2 = (1/b_n²) Σ_{k=1}^n Var Y_{n,k}

since the Y_{n,k}'s are uncorrelated (they are functions of the X_{n,k}'s, which are independent), and then this is bounded by

≤ (1/b_n²) Σ_{k=1}^n E(Y_{n,k}²),

which goes to 0 by the second assumption.

Remark 88. Note that we do need the Xn,k s to be independent: just knowing that they’re uncorrelated wouldn’t

guarantee that the Yn,k s are uncorrelated here.

Let's now use the ugly theorem to prove nicer results: the rest of our results are going to involve iid variables, which we can put in a triangular array via

X_1
X_1  X_2
X_1  X_2  X_3
...

Notice now that the columns are strongly dependent on each other, but the rows are independent if our Xi s are independent.

Theorem 89 (Weak law of large numbers for iid sequences)
Let X, X_k be independent and identically distributed, such that as x → ∞,

g(x) = x P(|X| ≥ x) → 0.

Then as n → ∞, S_n/n − µ_n goes to 0 in probability, where we define µ_n = E(X; |X| ≤ n).

Notice that this condition does not imply that E X is finite (which is equivalent to E|X| being finite): E|X| is the same as ∫_0^∞ P(|X| ≥ x) dx, and the condition just tells us that P(|X| ≥ x) decays faster than 1/x as x → ∞. But there are functions that decay faster than 1/x that aren't integrable! For example,

∫_a^b dx/(x log x) = log log x |_a^b,

and this diverges as b → ∞. So that's why we need to define µ_n in the way that we do: the assumptions here are a bit weaker than having a finite first moment.

Proof. Apply Theorem 87 with b_n = n. We just need to check the two conditions. First of all,

Σ_{k=1}^n P(|X_{n,k}| ≥ b_n) = n P(|X| ≥ n) = g(n)

goes to 0 by assumption. The second condition is basically checking that

E(Y_n²)/n → 0,   where Y_n = X · 1{|X| ≤ n}.

But from a homework exercise, we know that

(1/n) E(Y_n²) = (1/n) ∫_0^∞ P(Y_n² ≥ t) dt = (1/n) ∫_0^∞ P(|Y_n| ≥ √t) dt.

Making the change of variables y = √t, dt = 2y dy, this becomes

= (1/n) ∫_0^∞ P(|Y_n| ≥ y) · 2y dy.

We can then cap the upper limit of the integral, because |Y_n| cannot be larger than b_n = n, to get

(1/n) ∫_0^n P(|Y_n| ≥ y) 2y dy = (1/n) ∫_0^n P(n ≥ |X| ≥ y) 2y dy ≤ (2/n) ∫_0^n g(y) dy.

Since this is an average over a large interval of the function g, which decays to 0, this integral goes to 0, and we've checked our second condition as well, finishing the proof.

Remark 90. Remember that because of the way we defined the expectation E X as a Lebesgue integral, the mean E X exists (as a finite number) if and only if E|X| is finite (otherwise we get something like ∞ − ∞).

Corollary 91
If X, X_i are iid with finite mean E X = µ, then S_n/n converges to µ in probability.

Proof. We bound g similarly to Markov's inequality:

g(x) = x P(|X| ≥ x) ≤ E[|X|; |X| ≥ x].

Now the integrand in E[|X|; |X| ≥ x] = E[|X| · 1{|X| ≥ x}] goes to 0 almost surely as x → ∞, and it is dominated by |X|, which has finite mean. Thus, we can use the dominated convergence theorem (remember that the expectation is defined as a Lebesgue integral!) to conclude that g(x) → 0 as x → ∞. Therefore, by Theorem 89, S_n/n − µ_n goes to 0 in probability, and it remains to check that µ_n goes to µ:

|µ − µ_n| ≤ E[|X|; |X| > n],

which again goes to 0 as n gets large by the same logic.

Theorem 92 (Strong law of large numbers)
If X, X_k are iid with E X = µ finite, then S_n/n converges to µ almost surely.

Proof. Assume X ≥ 0; otherwise we can break X into its positive and negative parts and handle them separately. Truncate the X's, setting

Y_k = X_k · 1{X_k ≤ k}.

This is a different truncation than before: previously, we took T_n = Σ_{k=1}^n X_k 1{|X_k| ≤ n}, truncating everything in the same row of our triangular array at the same level, but this time we take

T_n = Σ_{k=1}^n X_k 1{X_k ≤ k}.

Again, we want to show that S_n and T_n are close: we do this by showing that

Σ_{k=1}^∞ P(X_k ≠ Y_k) ≤ Σ_{k=1}^∞ P(X_k ≥ k)

(because those are the points where truncation happens), and now because the X's are iid, this is

= Σ_{k=1}^∞ P(X ≥ k) ≤ 1 + ∫_0^∞ P(X ≥ t) dt

(the +1 is just a constant coming from comparing the Riemann sum to the integral), and this last term is 1 + E X < ∞. By Borel-Cantelli (homework), this means that the probability of the event {X_k ≠ Y_k infinitely often} is 0, where we define

{Xk 6= Yk i.o.} = {ω ∈ Ω: Xk (ω) 6= Yk (ω) for infinitely many k}.

So if {X_k ≠ Y_k i.o.} occurs with probability zero, then (S_n − T_n)/n goes to 0 almost surely: there are only finitely many indices k where X_k ≠ Y_k, so the nonnegative quantity S_n − T_n is eventually equal to some fixed finite C(ω), and since we divide by n → ∞, the quantity (S_n − T_n)/n is eventually bounded by C(ω)/n.

So we want to show that S_n/n → µ almost surely. We've shown (S_n − T_n)/n → 0 almost surely, and E[T_n]/n → µ because

µ − E[T_n]/n = E[S_n − T_n]/n = (1/n) Σ_{k=1}^n E[X; X > k],

and the summand E[X; X > k] goes to 0 as k → ∞ (by the dominated convergence theorem), so the average also

goes to 0. So we can replace Sn with Tn, and we can replace µ with the expectation of Tn: it finally suffices to prove that T − T n E n → 0 almost surely. n We’ll do the same trick as usual: taking the L2 norm,

2 n n Tn − ETn 1 X 1 X 2 = Var Yk ≤ E[Yk ]. n n2 n2 2 k=1 k=1

This is a bit tricky to deal with, because the X_k's aren't even assumed to have finite variance. Suppose that we knew

Σ_{k=1}^∞ E[Y_k²]/k² ≤ A < ∞.

Then we could show our result by noticing that

(1/n²) Σ_{k=1}^n E[Y_k²] = Σ_{k=1}^n (k/n)² · E[Y_k²]/k²,

and we can then split the sum to make the summand small in both regimes:

≤ (log n / n)² Σ_{k≤log n} E[Y_k²]/k² + Σ_{k>log n} E[Y_k²]/k² ≤ (log n / n)² A + Σ_{k>log n} E[Y_k²]/k²,

and this goes to 0 as n → ∞ because we're taking the tail of a convergent sum. So let's show the boxed statement: we do the same calculation as in the weak law for iid sequences to get

Σ_{k=1}^∞ E[Y_k²]/k² ≤ Σ_{k=1}^∞ (1/k²) ∫_0^k 2y P(X ≥ y) dy,

which simplifies after swapping the sum and the integral (okay because everything is positive) to

∫_0^∞ P(X ≥ y) [2y Σ_{k≥y} 1/k²] dy.

The bracketed quantity is uniformly bounded over all y ≥ 0, because small y give 2y times something constant, and large y make the inner sum proportional to 1/y. Thus, our whole expression is bounded by

≤ c ∫_0^∞ P(X ≥ y) dy = c E[X] < ∞,

so in particular (T_n − E T_n)/n goes to 0 in probability. But notice that the boxed expression gives us a bit more: we have a rate of decay. Suppose that we have a subsequence of integers n(ℓ) going to infinity: then

Σ_{ℓ≥1} ||(T_{n(ℓ)} − E T_{n(ℓ)})/n(ℓ)||²_2 ≤ Σ_{ℓ≥1} (1/n(ℓ)²) Σ_{k=1}^{n(ℓ)} E[Y_k²],

and changing the order of summation yields

= Σ_{k=1}^∞ E(Y_k²) Σ_{ℓ≥1} 1{n(ℓ) ≥ k}/n(ℓ)².

This is finite, by the boxed condition, if we can show that

Σ_{ℓ≥1} 1{n(ℓ) ≥ k}/n(ℓ)² ≤ C/k².

This works if we take n(ℓ) = α^ℓ (rounded to an integer) for α > 1: then by a geometric series sum,

Σ_{ℓ≥1} 1{n(ℓ) ≥ k}/n(ℓ)² ≤ C(α)/k²

for all k ≥ 1. So combining this with the boxed statement above, summing over this sufficiently sparse subsequence gives us

Σ_ℓ ||(T_{α^ℓ} − E T_{α^ℓ})/α^ℓ||²_2 < ∞,

so

(T_{α^ℓ} − E T_{α^ℓ})/α^ℓ → 0

almost surely as ℓ → ∞ (by Chebyshev and Borel-Cantelli). In general, for the full sequence, we can use that the X's are nonnegative: if α^ℓ ≤ n ≤ α^{ℓ+1}, then

T_{α^ℓ}/α^{ℓ+1} ≤ T_n/n ≤ T_{α^{ℓ+1}}/α^ℓ,

and since the left and right sides both converge,

µ/α ≤ lim inf T_n/n ≤ lim sup T_n/n ≤ αµ,

and sending α ↓ 1 gives the result.
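The strong law is a statement about individual sample paths: a single realized sequence of running means settles down near µ, not just its distribution. A quick numerical illustration (the Exp(1) distribution, path length, and cutoffs are arbitrary choices of ours):

```python
import random

rng = random.Random(5)
# One sample path of running means S_n / n for iid Exp(1), which has mean 1.
total = 0.0
path = []
for n in range(1, 50001):
    total += rng.expovariate(1.0)
    path.append(total / n)

# The strong law says this single path converges; check that its late
# portion (n >= 20000) stays uniformly within 0.05 of the mean.
late_ok = all(abs(v - 1.0) < 0.05 for v in path[20000:])
```

Contrast with the weak law, which would only say each individual time n is likely to be close to 1; here the whole tail of one path is close at once.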

10 October 7, 2019

Remark 93. There was an attendance quiz in class today, “not intended to be difficult.”

Here’s what the problem was:

Problem 94
Let X be a random variable such that

P(X ≤ x) = 0 for x ≤ 1,   P(X ≤ x) = 1 − 1/x for x ≥ 1.

Calculate E(X^p; X ≤ n).

Since we have the distribution, we could just find the probability density, but we were asked to not do this.

Solution. The probability that $X$ is large is bounded via
$$P(X\ge x) = \frac{1}{x} \quad \forall x\ge 1,$$

and this is a pretty common situation where we have a tail bound on $X$. So now

$$E(X^p;\,X\le n) = E(Y^p), \qquad Y = X\,1\{X\le n\}.$$

Now
$$E(Y^p) = \int_0^\infty P(Y^p\ge t)\,dt,$$
and now changing variables such that $t = y^p$, $dt = py^{p-1}\,dy$, this evaluates to
$$\int_0^\infty py^{p-1}\,P(Y\ge y)\,dy = \int_0^n py^{p-1}\,P(Y\ge y)\,dy,$$
and then we can just directly integrate. The point is that we can use this approach if we just have something like
$$P(X\ge x)\le\frac{c}{x},$$
without needing to have an explicit density function!
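As a quick numerical check (my own addition, not from the lecture), we can evaluate the tail-integral formula for the Problem 94 distribution and compare it against the answer obtained directly from the density $1/x^2$ on $[1,n]$:

```python
import math

def truncated_moment(p, n, steps=200000):
    """E[X^p; X <= n] via  int_0^n p*y^(p-1)*P(Y >= y) dy,  where
    Y = X*1{X <= n} and the tail is P(X >= x) = 1/x for x >= 1."""
    def tail(y):
        # for 0 < y <= 1: P(Y >= y) = P(X <= n) = 1 - 1/n
        # for 1 < y <= n: P(Y >= y) = P(y <= X <= n) = 1/y - 1/n
        return 1.0 - 1.0 / n if y <= 1 else 1.0 / y - 1.0 / n

    h = n / steps  # midpoint rule on (0, n]
    return sum(p * ((i + 0.5) * h) ** (p - 1) * tail((i + 0.5) * h) * h
               for i in range(steps))

p, n = 0.5, 4.0
numeric = truncated_moment(p, n)
# direct integration against the density 1/x^2 on [1, n], p != 1:
closed = (n ** (p - 1) - 1) / (p - 1)
```

The two agree up to quadrature error, even though the tail-formula route never touches the density.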

Back to class content. The last thing we proved last time is the strong law of large numbers: if $X$ and $X_i$ are iid and $E[X]$ is finite, then $\frac1n\sum_{i=1}^n X_i$ converges to $\mu$ almost surely. This time, we're going to work "around that" and find the probability that $\frac{S_n}{n}$ is about $\mu + \varepsilon$ for some constant $\varepsilon$.

Example 95 Let’s say that X ∼ N(0, 1) is standard Gaussian with density

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right).$$

Then $S_n = \sum_{i=1}^n X_i \sim N(0,n)$ is another Gaussian with variance $n$, so $\frac{S_n}{\sqrt n} \sim N(0,1)$ again. In addition, $S_n$ has probability density
$$\varphi\left(\frac{x}{\sqrt n}\right)\frac{dx}{\sqrt n},$$

and the strong law of large numbers tells us that $\frac{S_n}{n}\to0$ almost surely because we have mean zero. But what if we're a small constant $\varepsilon$ away from the mean? For example, can we find

$$P\left(\frac{S_n}{n}\ge\varepsilon\right)?$$
Since $S_n$ has standard deviation $\sqrt n$, this is equivalent to
$$P(N(0,1)\ge\varepsilon\sqrt n) = \int_{z\ge\varepsilon\sqrt n}\varphi(z)\,dz.$$

Since $\varphi$ decreases very fast, this is approximately on the order of $\varphi(\varepsilon\sqrt n) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{n\varepsilon^2}{2}\right)$. How good is this approximation? We should be able to show more precisely that the actual value of the integral is

$$\exp\left(-\frac{n\varepsilon^2}{2} + o(n)\right),$$

so the discrepancy does not affect the leading (exponential) order. So that means that deviating from the mean by a constant $\varepsilon$ is exponentially unlikely in $n$.
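We can watch this rate emerge numerically. The sketch below is my own addition, using the standard identity $P(N(0,1)\ge x) = \frac12\operatorname{erfc}(x/\sqrt2)$ to compute the exact tail, then comparing $-\frac1n\log P(S_n/n\ge\varepsilon)$ against the predicted limit $\varepsilon^2/2$:

```python
import math

def gaussian_tail_rate(eps, n):
    """-(1/n) * log P(N(0,1) >= eps*sqrt(n)), via the erfc tail formula."""
    x = eps * math.sqrt(n)
    tail = 0.5 * math.erfc(x / math.sqrt(2))  # P(N(0,1) >= x)
    return -math.log(tail) / n

eps = 0.5
rates = [gaussian_tail_rate(eps, n) for n in (10, 100, 1000)]
limit = eps ** 2 / 2  # the predicted large-deviation rate
```

The computed rates decrease monotonically toward $\varepsilon^2/2$; the leftover gap is the sub-exponential $o(n)/n$ correction.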

Example 96

Let $X\sim\operatorname{Ber}(p)$ (so 1 with probability $p$ and 0 with probability $1-p$). Then $S_n$ is binomial, $S_n\sim\operatorname{Bin}(n,p)$, with probability mass function
$$P(S_n = k) = \binom{n}{k} p^k(1-p)^{n-k}.$$

Again by the strong law of large numbers, $\frac{S_n}{n}$ approaches $p$ almost surely. The Central Limit Theorem actually tells us that
$$\frac{S_n - n\mu}{\sigma\sqrt n} = \frac{S_n - np}{\sqrt{np(1-p)}} \approx N(0,1)$$
(we haven't proved this yet). But we will do the heuristic calculation to say why this is "approximately" the normal distribution here! Recall Stirling's formula, which says that
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n.$$
So if $0<x<1$, then assuming $nx$ is an integer,

$$P(S_n = nx) = \binom{n}{nx}p^{nx}(1-p)^{n(1-x)},$$
and by Stirling's approximation, this is about
$$\frac{\sqrt{2\pi n}}{\sqrt{2\pi nx}\,\sqrt{2\pi n(1-x)}}\cdot\frac{(n/e)^n}{(nx/e)^{nx}\,(n(1-x)/e)^{n(1-x)}}\cdot p^{nx}(1-p)^{n(1-x)}.$$

(This is good when $x$ is not too close to 0 or 1, because that's when Stirling gives good approximations.) Then $(n/e)^n$ cancels out in the top and bottom, leaving

$$\frac{1}{\sqrt{2\pi nx(1-x)}}\left(\frac{p}{x}\right)^{nx}\left(\frac{1-p}{1-x}\right)^{n(1-x)}.$$

Taking logs and putting everything in the exponential, this is

$$\frac{1}{\sqrt{2\pi nx(1-x)}}\exp\left(-n\left[x\log\frac{x}{p} + (1-x)\log\frac{1-x}{1-p}\right]\right).$$

This inner bracketed term is often denoted $I_p(x)$ or $H(x|p)$, also known as the binary relative entropy. From here, note that we're trying to show that $z$ is standard Gaussian, where

$$z = \frac{S_n - np}{\sqrt{np(1-p)}} = \frac{n(x-p)}{\sqrt{np(1-p)}}:$$
this suggests a change of variables from $x$ to $z$. Implicitly, we need $z$ to be bounded ($O(1)$) for the next calculation, meaning that $x = p\pm\frac{O(1)}{\sqrt n}$, so $S_n$ is pretty close to the mean. We can substitute in
$$P\left(S_n = np + \sqrt{np(1-p)}\,z\right) = \frac{1}{\sqrt{2\pi np(1-p)}}\exp\left(-nI_p\left(p + \sqrt{\frac{p(1-p)}{n}}\,z\right)\right).$$

Notice that the $x$ in the constant in front can be (and has been) replaced with $p$, because $x$ is close to $p$. Now we're taking $I_p$ close to its minimizer at $p$, so we can Taylor expand around $p$: the first two terms disappear because we have a local minimum, and thus the $I_p$ term ends up being approximately $\frac{z^2}{2n} + \frac{O(1)}{n^{3/2}}$. Putting everything together, this is

$$P\left(S_n = np + \sqrt{np(1-p)}\,z\right) \sim \frac{1}{\sqrt{2\pi np(1-p)}}\exp\left(-\frac{z^2}{2} + \frac{O(1)}{\sqrt n}\right),$$

and the last term here gets absorbed as a constant in our proportionality relation. So our random variable $\frac{S_n - np}{\sqrt{np(1-p)}}$ converges to the Gaussian in the sense that if $a<b$ are fixed, then

$$P\left(a\le\frac{S_n - np}{\sqrt{np(1-p)}}\le b\right) \sim \sum_{z\in[a,b]}\frac{1}{\sqrt{2\pi np(1-p)}}\exp\left(-\frac{z^2}{2}\right),$$

where $z$ lies in the lattice $\frac{\mathbb Z - np}{\sqrt{np(1-p)}}$, so that $S_n$ is equal to an integer (don't forget we're still working with a binomial distribution, which is discrete-valued). So this hopefully gives a good sense of how the Central Limit Theorem is supposed to work in general! Note that this does not actually tell us anything of the form

$$P(S_n - np \ge n\varepsilon)$$
for a constant $\varepsilon$: we can only look at $S_n - np$ on the order of $\sqrt n$. We can indeed go a bit further than this because of the extra $\frac{O(1)}{\sqrt n}$ term in our Taylor expansion, but we can check that we can't go out to $S_n - np$ linear in $n$ and still expect Gaussian behavior! To show that, note that

$$P(S_n - np \ge n\varepsilon) = \sum_{k\ge n(p+\varepsilon)}\binom{n}{k}p^k(1-p)^{n-k},$$

and we already know how to approximate each term using Stirling's formula. Here $k$ is larger than the mean, and thus the sum will be dominated by $k$ near the smallest allowed value $n(p+\varepsilon)$: doing out the calculations, this is

$$\approx \exp\left(-nI_p(p+\varepsilon) + o(n)\right).$$

(Again, $o(n)$ is not the best approximation we can get, but we only care about the leading term.) And this is not the same as the naive Gaussian guess: in other words,

$$P\left(\frac{S_n - np}{\sqrt{np(1-p)}}\ge\frac{n\varepsilon}{\sqrt{np(1-p)}}\right) \not\sim \exp\left(-\frac{n\varepsilon^2}{2p(1-p)}\right).$$
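Both claims — that the Stirling form approximates the exact pmf well, and that the true upper-tail rate is $I_p(p+\varepsilon)$ rather than the naive Gaussian guess — can be checked exactly for moderate $n$. This is a numerical illustration of my own; the parameter values are arbitrary:

```python
import math

def log_pmf(n, k, p):
    """Exact log P(Bin(n, p) = k)."""
    return (math.log(math.comb(n, k)) + k * math.log(p)
            + (n - k) * math.log(1 - p))

def I(x, p):
    """Binary relative entropy I_p(x) = x log(x/p) + (1-x) log((1-x)/(1-p))."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def stirling_log_pmf(n, k, p):
    """log of  exp(-n I_p(x)) / sqrt(2 pi n x (1-x)),  with x = k/n."""
    x = k / n
    return -n * I(x, p) - 0.5 * math.log(2 * math.pi * n * x * (1 - x))

def log_tail(n, p, a):
    """Exact log P(Bin(n, p) >= n*a), summed stably in log space."""
    logs = [log_pmf(n, k, p) for k in range(math.ceil(n * a), n + 1)]
    mx = max(logs)
    return mx + math.log(sum(math.exp(v - mx) for v in logs))

n, p, eps = 2000, 0.3, 0.2
pmf_err = abs(log_pmf(n, 700, p) - stirling_log_pmf(n, 700, p))  # x = 0.35
rate = -log_tail(n, p, p + eps) / n           # -(1/n) log P(S_n >= n(p+eps))
gauss_guess = eps ** 2 / (2 * p * (1 - p))    # what naive Gaussian scaling would give
```

The exact rate sits just above $I_p(p+\varepsilon)$ (consistent with the Chernoff bound) and is clearly closer to the entropy rate than to the Gaussian guess.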

So the main goal for this week is to find conditions under which $X, X_i$ iid with $E[X]$ finite satisfy

$$P(S_n - n\mu\ge n\varepsilon) = \exp\left(-nf(\varepsilon,\mu) + o(n)\right)$$
for some function $f$ of $\varepsilon$ and $\mu$. (This is a subject called large deviations theory, which is pretty deep, but we'll only study it for a bit.) To finish class today, we'll do a simple combinatorial example with some large deviations ideas.

Problem 97 Say we have a k × n table of bins for fixed k, where we care about the limit as n → ∞. We put nkp balls into the bins of the table uniformly at random (each bin can hold at most one ball). Let F be the event that every column receives at least one ball. What is the probability of F ?

First of all, we need $nkp\ge n$ (so that we can actually put enough balls in our $n$ columns). For simplicity, let's assume this inequality is strict, so $kp > 1$. Consider the case $k=2$: then we have a $2\times n$ table, and we have $2np$ balls that go in. Every column has either one or two entries: say that $nx$ of them have a single entry, and the remaining $n(1-x)$ have two entries. Then the probability is
$$P(F) = \frac{1}{\binom{2n}{2np}}\cdot\binom{n}{nx}\cdot 2^{nx},$$
because we pick which $nx$ columns only get one entry and whether each takes the top or bottom cell. Also, remember that $x$ is not a free variable: since we have $2np$ balls, $nx + 2n(1-x) = 2np$, so $x = 2(1-p)$. Then plugging in $x = 2(1-p)$ gives the answer, and we can use Stirling's formula if we want something nicer. $k=3$ is a bit trickier: we now break up into columns with one, two, or three occupied entries, but now the number of each type is no longer uniquely determined. So now it's not such an immediate calculation.

So here's one way to view the calculation! Fix a parameter $\theta\in(0,1)$ (currently arbitrary), and say that $X, X_i$ are iid $\operatorname{Bin}(k,\theta)$ for all $1\le i\le n$. We claim that

$$P(F) = P_\theta\left(X_i\ge1\ \forall\,1\le i\le n \;\Big|\; \sum_{i=1}^n X_i = nkp\right)$$

for all $\theta\in(0,1)$. The idea is that we put an independent $\operatorname{Ber}(\theta)$ in every entry, and we just need to condition on actually getting $nkp$ occupied entries. ($\theta$ is completely arbitrary, but conditioning on the total number of 1s gives us the desired setup, since any two configurations with $nkp$ balls are equally likely.) Let $G$ be the event $\{X_i\ge1\ \forall\,1\le i\le n\}$, and let $S$ be the event $\{\sum_{i=1}^n X_i = nkp\}$. By Bayes' rule, we wish to find
$$P_\theta(G|S) = \frac{P_\theta(G\cap S)}{P_\theta(S)} = \frac{P_\theta(G)\,P_\theta(S|G)}{P(\operatorname{Bin}(nk,\theta) = nkp)}.$$
We know how to calculate this denominator from our earlier work. The numerator is not too hard either: $P_\theta(G)$ is now much easier to calculate than $P_\theta(G|S)$, because we now have independent events, so that just gives us $P(\operatorname{Bin}(k,\theta)\ge1)^n$. It remains to deal with $P_\theta(S|G)$, and we can always just bound that to be less than 1: this already gives us a bound of the form
$$P_\theta(G|S) \le \frac{P(\operatorname{Bin}(k,\theta)\ge1)^n}{P(\operatorname{Bin}(nk,\theta) = nkp)},$$
which can be optimized over $\theta$. But it would be nice to know $P(F)$ exactly: specifically, if we pick the optimal $\theta$, do we get the correct answer? Let's write out our final term:

$$P_\theta(S|G) = P_\theta\left(\sum_{i=1}^n X_i = nkp \;\Big|\; X_i\ge1\ \forall i\right) = P\left(\sum_{i=1}^n Y_i = nkp\right),$$

where $Y\sim(X\,|\,X\ge1)$. It's then natural to guess that we want to pick $\theta$ such that $E_\theta Y = kp$: explicitly, this means that
$$E_\theta(X\,|\,X\ge1) = \frac{k\theta}{1-(1-\theta)^k} = kp.$$

Picking this value of $\theta$, we know that $\frac1n\sum Y_i$ converges almost surely to $kp$. On its own, that doesn't mean that the probability $P(\sum Y_i = nkp) = \exp(-o(n))$, but this can be done with a bit more work: it's the local central limit theorem. Note that the $\theta$ here is the same one that optimizes the earlier expression! The purpose of this exercise is to show that a specific configuration-counting problem can be made simpler with probability. Next lecture, we'll look at the more general question we've proposed!
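The equation $\frac{k\theta}{1-(1-\theta)^k} = kp$ generally has no closed form, but since the conditional mean $E_\theta(X|X\ge1)$ is increasing in $\theta$, it is easy to solve numerically by bisection. A minimal sketch of my own, with illustrative parameters:

```python
def tilt_theta(k, p, tol=1e-12):
    """Solve k*theta / (1 - (1-theta)^k) = k*p for theta in (0, 1).
    The left side is E_theta[X | X >= 1] for X ~ Bin(k, theta), which
    increases from 1 to k as theta goes from 0 to 1; requires 1/k < p < 1."""
    def cm(theta):
        return k * theta / (1.0 - (1.0 - theta) ** k)
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cm(mid) < k * p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k, p = 3, 0.6  # kp = 1.8 > 1, as the problem requires
theta = tilt_theta(k, p)
cond_mean = k * theta / (1 - (1 - theta) ** k)
```

Note that the solution satisfies $\theta < p$: conditioning on $X\ge1$ inflates the mean, so the tilt must compensate downward.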

11 October 9, 2019

Another homework assignment has been posted: it will be due in a week. Our most recent discussion has to do with sums of independent random variables: in particular, we’re saying that

if $X, X_i$ are iid with $E[X]$ finite, then the average of $X_1$ through $X_n$ approaches $\mu$ almost surely. We're curious about the probability that this average is more than $\mu+\varepsilon$ for some constant $\varepsilon>0$. Last time, we did a calculation to verify that in the case where $X$ is standard normal,

$$P(S_n\ge n\varepsilon) = \exp\left(-\frac{n\varepsilon^2}{2} + o(n)\right),$$
and similarly if $X$ is Bernoulli with probability $p$, then

$$P(S_n - np\ge n\varepsilon) = \exp\left(-nI_p(p+\varepsilon) + o(n)\right).$$

Today, we’ll generalize these ideas.

Definition 98 If X is a random variable, its moment generating function (also mgf) is

$$m(\theta) = E(e^{\theta X}).$$

Note that $e^{\theta X} > 0$ always, so $m(\theta)\in(0,\infty]$. We always have $m(0) = 1$: it's possible that this is the only finite value of the moment generating function!

Lemma 99 θX Let m(θ) = E(e ), and define

θ+ = sup{θ : m(θ) < ∞}, θ− = inf{θ : m(θ) < ∞}.

Note here that θ+ is in the range [0, ∞], and θ− is in the range [−∞, 0]. Then on the open interval (θ−, θ+), m(θ) is a smooth function with kth derivative

(k) k θX m (θ) = E(X e ) < ∞.

(k) k Also, if θ+ > 0, then m (θ) → E(X ) as θ ↓ 0.

The idea is that in general we can't exchange the derivative and the expectation, but it's okay to do that in this particular case! The proof is basically an exercise in the Dominated Convergence Theorem.

Definition 100 The cumulant generating function (also cgf) is defined as κ(θ) = log m(θ).

So let's look at our question again: what is the probability $P(S_n\ge na)$ for $a>\mu$? Define $x_{\max}$ to be the essential supremum of $X$:

$$x_{\max} = \sup(\operatorname{supp}\mathcal L_X) = \sup\{x\in\mathbb R: P(X>x)>0\}.$$

Note that for any $a > x_{\max}$, $P(X\ge a) = 0$, so $P(S_n\ge na) = 0$ (because each term in the sum $S_n$ is at most $x_{\max}$ almost surely).

Meanwhile, if $a = x_{\max}$,
$$P(S_n\ge na) = P(X=a)^n,$$
since $X$ is upper bounded by $a$ almost surely, and this last value can be positive if we have a point of positive measure at $X=a$. So we already know the answer in those cases: we will assume from now on that

$$EX < a < \operatorname{ess\,sup} X.$$

(This also rules out the case where $EX = \operatorname{ess\,sup}X$, in which case the distribution is concentrated entirely at one value.)

Theorem 101 (Cramér)
Let $X, X_i$ be iid. If $m(\theta) = E(e^{\theta X})$ is finite for some positive $\theta$, then $EX = \mu\in[-\infty,\infty)$ is well-defined. In addition, for any $\mu < a < x_{\max} = \operatorname{ess\,sup}X$, the sum $S_n = \sum_{i=1}^n X_i$ satisfies the "large deviations principle:"
$$-\lim_{n\to\infty}\frac1n\log P(S_n\ge na) = I(a) = \sup_{\theta\ge0}\left[\theta a - \kappa(\theta)\right] = \sup_{\theta\in\mathbb R}\left[\theta a - \kappa(\theta)\right].$$

Here, $E(e^{\theta X})$ being finite for some positive $\theta$ rules out the case where $X$ is really big, but it doesn't do anything about $X$ being really small (so $EX = EX_+ - EX_-$ could be $-\infty$). The actual definition of the large deviations principle is a bit more advanced, but we can read that on our own: the central idea of our first equality is still that $P(S_n\ge na) = \exp(-nI(a)+o(n))$. This last function is denoted $\kappa^*(a)$, and it is often known as the Legendre dual of $\kappa$.

The main idea is that if $S_n\ge na$, which is an atypically large value, there may be many ways for $X_1,\dots,X_n$ to reach that value, but it's intuitively likely that we won't have an edge case like $X_1,\dots,X_{n-1} = 0$, $X_n = na$.

Proof. We start with the upper bound, meaning that we upper-bound the probability $P(S_n\ge na)$. For any $\theta\ge0$,

$$P(S_n\ge na) \le P(e^{\theta S_n}\ge e^{n\theta a})$$

(inequality rather than equality only to cover the case $\theta=0$), and then by Markov's inequality,

$$\le \frac{E(e^{\theta S_n})}{e^{n\theta a}} = \frac{m(\theta)^n}{e^{n\theta a}} = \exp\left[-n(\theta a - \kappa(\theta))\right],$$
so this tells us that
$$P(S_n\ge na) \le \exp\left[-n\sup_{\theta\ge0}(\theta a - \kappa(\theta))\right].$$
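The Legendre dual $\sup_\theta[\theta a - \kappa(\theta)]$ rarely needs to be computed by hand; for Bernoulli variables it should recover the binary relative entropy $I_p(a)$ from last lecture. A quick numerical check of my own (grid search over $\theta$, arbitrary parameters):

```python
import math

def kappa(theta, p):
    """Cumulant generating function of Ber(p): log(1 - p + p e^theta)."""
    return math.log(1 - p + p * math.exp(theta))

def legendre(a, p, lo=-20.0, hi=20.0, grid=100000):
    """sup over theta of [theta*a - kappa(theta)], by a fine grid search."""
    best = float("-inf")
    for i in range(grid + 1):
        th = lo + (hi - lo) * i / grid
        best = max(best, th * a - kappa(th, p))
    return best

p, a = 0.3, 0.5
I_dual = legendre(a, p)
# the dual should match the binary relative entropy H(a|p):
I_entropy = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
```

The agreement confirms that the Chernoff exponent and the combinatorial (Stirling) exponent are the same object.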

Looking at the next equality in our chain, for any $\theta\in(\theta_-,\theta_+)$, we can define a probability measure $P_\theta$ such that we assign to any event $A$
$$P_\theta(A) = E\left[1_A\,\frac{e^{\theta X}}{m(\theta)}\right].$$
This is a reweighting of our measure: we'll cover this kind of operation more in a later lecture, but it is a probability measure because $P_\theta(\Omega) = E\left[\frac{e^{\theta X}}{m(\theta)}\right]$ is indeed normalized to 1. So for any measurable $f$,

$$E_\theta(f(X)) = E\left[f(X)\,\frac{e^{\theta X}}{m(\theta)}\right]$$

(this is true for simple functions by the definition, and then we can approximate by simple functions), and thus

$$E_\theta\left[\frac{m(\theta)}{e^{\theta X}}\,f(X)\right] = E[f(X)].$$

This is often called an exponential tilting of P. So looking at

$$\kappa(\theta) = \log E(e^{\theta X}) \implies \kappa'(\theta) = \frac{m'(\theta)}{m(\theta)} = \frac{E(Xe^{\theta X})}{E(e^{\theta X})} = E_\theta(X)$$
(the expectation of $X$ under the $\theta$-measure), by Lemma 99. The second derivative is also interesting:

$$\kappa''(\theta) = \frac{m''(\theta)}{m(\theta)} - \left(\frac{m'(\theta)}{m(\theta)}\right)^2 = \frac{E(X^2e^{\theta X})}{E(e^{\theta X})} - \left(\frac{E(Xe^{\theta X})}{E(e^{\theta X})}\right)^2 = E_\theta(X^2) - E_\theta(X)^2,$$

and this is the variance $\operatorname{Var}_\theta(X)$, which is strictly positive (because we assumed that $\mu < \operatorname{ess\,sup}X$). In other words,

$\kappa(\theta)$ is strictly convex on $(\theta_-,\theta_+)$. Now since $\kappa'(\theta)$ approaches $\mu$ as $\theta\downarrow0$ (by Lemma 99), $\theta$ can be picked small enough that $\mu < \kappa'(\theta) < a$. For such $\theta$,
$$\frac{d}{d\theta}\left(\theta a - \kappa(\theta)\right) = a - \kappa'(\theta) > 0.$$
Thus, we indeed know that
$$\sup_{\theta\ge0}(\theta a - \kappa(\theta)) = \sup_{\theta\in\mathbb R}(\theta a - \kappa(\theta)),$$
because $\kappa$ is strictly convex on its domain, and $\theta a$ is linear, so $\theta a - \kappa(\theta)$ is strictly concave on its domain $(\theta_-,\theta_+)$. We can now prove the lower bound; we'll only do it in a nice case, assuming that $\sup_{\theta\in\mathbb R}(\theta a - \kappa(\theta))$ is actually attained by some value $\theta_a\in(\theta_-,\theta_+)$. This is an assumption which doesn't always hold: the theorem is true without it, but it just has some annoying infinite cases.
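The identities $\kappa'(\theta) = E_\theta(X)$ and $\kappa''(\theta) = \operatorname{Var}_\theta(X)$ are easy to confirm numerically, say for a Bernoulli variable, where the tilted measure is again Bernoulli with success probability $pe^\theta/m(\theta)$. A small sketch of my own (parameter values arbitrary):

```python
import math

p, theta = 0.3, 0.7

def m(t):
    """mgf of Ber(p)."""
    return 1 - p + p * math.exp(t)

def kappa(t):
    return math.log(m(t))

# tilted measure: P_theta(X = 1) = p e^theta / m(theta)
p_tilt = p * math.exp(theta) / m(theta)
mean_tilt, var_tilt = p_tilt, p_tilt * (1 - p_tilt)

h = 1e-5  # numeric first and second derivatives of kappa
kappa1 = (kappa(theta + h) - kappa(theta - h)) / (2 * h)
kappa2 = (kappa(theta + h) - 2 * kappa(theta) + kappa(theta - h)) / (h * h)
```

The match between the finite differences and the tilted mean/variance is exactly the exponential-tilting mechanics used in the proof.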

So now pick $\theta\in(\theta_a,\theta_+)$ (we know that $\theta_a > 0$ by the work above). By convexity,

$$\kappa'(\theta) > \kappa'(\theta_a) = a,$$

because $\theta_a$ is a critical point of $\theta a - \kappa(\theta)$, and now take $b > \kappa'(\theta)$. We do another change of measure here:

$$P_\theta\left(\bigcap_{i=1}^n\{X_i\in A_i\}\right) = E\left[\prod_{i=1}^n 1\{X_i\in A_i\}\,\frac{e^{\theta X_i}}{m(\theta)}\right]$$

(in other words, we do the change in each random variable). Note that the $X_i$s are iid under $P_\theta$ as well because we're doing this in a product form. So now,
$$P(S_n\ge na) = E_\theta\left[\frac{m(\theta)^n}{e^{\theta S_n}}\cdot 1\{S_n\ge na\}\right],$$
and we want to prove a lower bound, so we restrict $S_n$ a little more: with our $b > \kappa'(\theta)$,
$$P(S_n\ge na) \ge E_\theta\left[\frac{m(\theta)^n}{e^{\theta S_n}}\cdot 1\{S_n\in[na,nb]\}\right].$$

On the event $S_n\in[na,nb]$, the factor $e^{\theta S_n}$ in the denominator can't be too big, meaning that this is at least $\frac{m(\theta)^n}{e^{\theta nb}}\,P_\theta(S_n\in[na,nb])$. Now $P_\theta(S_n\in[na,nb])$ is $1-o(1)$ by the law of large numbers, because under our changed measure,

$$E_\theta[X] = \kappa'(\theta)\in(a,b)$$

by construction. So we've turned a rare event into a typical event! Thus this reduces to

$$P(S_n\ge na) \ge (1-o(1))\,\frac{m(\theta)^n}{e^{\theta nb}} = \exp\left[-n(\theta b - \kappa(\theta)) + o(n)\right]$$

for any $\theta > \theta_a$, $b > \kappa'(\theta)$. Now take $\theta\downarrow\theta_a$, $b\downarrow\kappa'(\theta_a) = a$ to finish.

The main idea is that something magical is happening: the convenient exponentiation in the upper bound when we used Markov's inequality was a bit arbitrary, and we did a pretty arbitrary-looking change of measure in the lower bound too. These bounds just happen to work out to the same Legendre dual! So in some sense, exponentiating is optimal on both sides, and informally we should think of it this way: the most efficient (probable) way to achieve a large deviation of this form is to have $X_1,\dots,X_n$ all behave like a collection of samples from our tilted measure $P_\theta$, choosing $\theta$ such that $E_\theta X = a$. In some special cases, though, it's easier to calculate this in a more explicit way, and that's what we'll do now.

Let's do this in the case where $\mathcal L_X$ has finite support of size $k$: say that

$$P(X = x_j) = \pi_j, \quad 1\le j\le k.$$

(Of course, $\sum_{j=1}^k \pi_j = 1$.) Then given values of our variables $X_1,\dots,X_n$, their empirical measure is denoted (in general, not just when the $X$s have finite support)

$$L_n^X = \frac1n\sum_{i=1}^n \delta_{X_i}.$$

(We put a weight of $\frac1n$ at each point that some $X_i$ achieves.) $L_n^X$ is a probability measure on $\mathbb R$, and it is a random variable that is $\sigma(X_1,\dots,X_n)$-measurable. In our particular case where the support is finite, we can just view this as a vector $L_n^X\in[0,1]^k$ (it's the vector of empirical fractions):
$$(L_n^X)_j = \frac1n\cdot\#\{1\le i\le n: X_i = x_j\}.$$
So now we can do a direct calculation of any particular value:
$$P\left(L_n^X = \nu = (\nu_1,\dots,\nu_k)\right) = \frac{n!}{(n\nu_1)!\,(n\nu_2)!\cdots(n\nu_k)!}\cdot\pi_1^{n\nu_1}\cdots\pi_k^{n\nu_k}$$

(this is the probability that a $\nu_1$ fraction of our $X$s land on $x_1$, and so on). Expanding this with Stirling's formula, we find that
$$P(L_n^X = \nu) = \exp\left[-nH(\nu|\pi) + o(n)\right],$$
where
$$H(\nu|\pi) = \sum_{j=1}^k \nu_j\log\frac{\nu_j}{\pi_j} = D_{\mathrm{KL}}(\nu\,\|\,\pi)$$
is the relative entropy or Kullback-Leibler divergence between the two measures. (Note that this is not symmetric between $\nu$ and $\pi$.) This is just a combinatorial calculation, but we can now do something with this: let's assume that the support of $\pi$ has all $k\ge2$ entries, meaning that $H(\nu|\pi)$ is strictly convex in $\nu$. $S_n$ doesn't depend on the ordering of the $X_i$s: in particular, it's just a function of $L_n^X$ via

$$S_n = n\sum_{j=1}^k x_j\,L_n^X(x_j) = n\,\langle x, L_n^X\rangle.$$

So now suppose that $EX < a < \operatorname{ess\,sup}X$: then

$$P(S_n\ge na) = \sum_\nu P(L_n^X = \nu)\cdot 1\{n\langle x,\nu\rangle\ge na\} = \sum_\nu 1\{\langle x,\nu\rangle\ge a\}\exp\left(-nH(\nu|\pi)+o(n)\right).$$

The number of possible $\nu$ is at most polynomial in $n$ (at most $(n+1)^k$), so the total is not comparable to the scale of the exponential. So that can be absorbed into the exponential $o(n)$ term, meaning that

$$P(S_n\ge na) = \exp\left[-n\inf\{H(\nu|\pi): \langle x,\nu\rangle\ge a\} + o(n)\right].$$

This is now a constrained optimization problem, so writing down the Lagrangian (remembering that we have the constraint that our probability needs to sum to 1)
$$\mathcal L = H(\nu|\pi) + \rho\Big(1 - \sum_j\nu_j\Big) + \theta\Big(a - \sum_j x_j\nu_j\Big),$$
and taking its derivative,
$$\frac{\partial\mathcal L}{\partial\nu_j} = 1 + \log\nu_j - \log\pi_j - \rho - \theta x_j = 0 \implies \nu_j = \frac{\pi_j e^{\theta x_j}}{C},$$
where $C = m(\theta)$ normalizes the probabilities to 1. So this is exactly the $P_\theta$ measure that we were talking about! In this calculation, we didn't need to pull exponentials out of nowhere: they come from the final optimization problem. So to achieve this large deviation event $S_n\ge na$, the optimal value is achieved at $L_n^X\sim\nu = P_\theta$. It's good as an exercise to show that this also tells us
$$P(S_n\ge na) = \exp\left[-n\inf_{\theta:\,E_\theta X\ge a} H(P_\theta|\pi) + o(n)\right],$$
and the optimal $\theta$ has equality $E_\theta X = a$ (it is better to optimize without going farther into the tail).
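The identity behind this — that at the tilted measure $\nu_\theta$ with $E_\theta X = a$ we have $H(\nu_\theta|\pi) = \theta a - \kappa(\theta)$ — can be verified numerically on a small example. The support, base measure, and target below are my own arbitrary choices:

```python
import math

x = [0.0, 1.0, 2.0]      # finite support
pi = [0.5, 0.3, 0.2]     # base probabilities pi_j; E[X] = 0.7
a = 1.2                  # target mean, with E[X] < a < max(support)

def tilted(theta):
    """nu_j proportional to pi_j * exp(theta * x_j); also returns kappa(theta)."""
    w = [q * math.exp(theta * xi) for q, xi in zip(pi, x)]
    Z = sum(w)  # Z = m(theta)
    return [wi / Z for wi in w], math.log(Z)

lo, hi = -50.0, 50.0     # bisect for E_theta[X] = a (tilted mean is increasing)
for _ in range(200):
    mid = 0.5 * (lo + hi)
    nu, _ = tilted(mid)
    if sum(nj * xj for nj, xj in zip(nu, x)) < a:
        lo = mid
    else:
        hi = mid
theta = 0.5 * (lo + hi)
nu, kap = tilted(theta)
mean_tilt = sum(nj * xj for nj, xj in zip(nu, x))
H = sum(nj * math.log(nj / qj) for nj, qj in zip(nu, pi))  # H(nu|pi)
dual = theta * a - kap                                     # Legendre dual at a
```

So the entropy of the constrained minimizer and the Legendre dual from Cramér's theorem are literally the same number.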

12 October 16, 2019

There is a midterm on Monday, so we should make sure to remember to show up to class. Today, we’ll cover some loose ends that have been mentioned but not in detail, and we’ll also introduce the Fourier transform (which is a topic for after the exam). Part of this is fair game for the midterm, but this will be communicated clearly. Here is a sample problem for the midterm:

Problem 102 Prove that the law of a real-valued random variable X is uniquely determined by its cumulative distribution function F (x) = P(X ≤ x).

This is the level of formality we should expect: we should understand, for example, that $X$ is a random variable on some probability space $(\Omega,\mathcal F,P)$, and its law is defined as $\mathcal L_X = X_\#P = \mu$, which is a probability measure on $(\mathbb R,\mathcal B_{\mathbb R})$.

Solution. The cdf $F(x) = P(X\le x)$ is the same as the measure (under $\mu$) of $(-\infty,x]$, so clearly $\mu$ determines $F$: we're asked to show the converse. Suppose there were two measures $\mu,\nu$ on $\mathbb R$ that both satisfy this relation, so $\mu((-\infty,x]) = \nu((-\infty,x])$ for all $x$. This means that
$$\mu((a,b]) = F(b) - F(a) = \nu((a,b])$$

for all half-open intervals. Then if $\mathcal F$ is the set of Borel sets $B$ such that $\mu(B) = \nu(B)$, we need to show that $\mathcal F = \mathcal B_{\mathbb R}$, so we can use a $\pi$-$\lambda$ argument. We know that $\mathcal F$ contains the set of half-open intervals, which is a $\pi$-system, and we can check that $\mathcal F$ is a $\lambda$-system as well. So by the $\pi$-$\lambda$ theorem, $\mathcal F$ contains the $\sigma$-algebra generated by the set of half-open intervals, which is $\mathcal B_{\mathbb R}$.

We won’t be allowed to bring cheat sheets, but some things will be written on the cover page. (However, it is reasonable for us to remember the full weak law of large numbers.)

Remark 103. If something follows directly from a theorem in class, we can cite the theorem and check that all the conditions hold. Otherwise, if it is very similar but we are told to reproduce it, that will be written out.

Let’s review something we talked about but not too formally:

Definition 104

Let $X,Y$ be independent, real-valued random variables with laws $\mathcal L_X = \mu$, $\mathcal L_Y = \nu$, so that $\mathcal L_{(X,Y)} = \mu\otimes\nu$. Denote their cdfs by $F(x) = P(X\le x)$, $G(y) = P(Y\le y)$. Then because $\mu,\nu$ are in one-to-one correspondence with $F,G$ respectively, we can define the convolution $\mu*\nu = \nu*\mu$ to be the law of $Z = X+Y$, and this corresponds to the cdf $F*G = G*F$.

Note that $\mu,\nu$ can be discrete, continuous, or anything in between. We can derive an expression for $F*G$ explicitly:
$$(F*G)(z) = P(X+Y\le z) = \iint 1\{x+y\le z\}\,d\mu(x)\,d\nu(y),$$
where we've implicitly used the change of variables formula and Tonelli's theorem to show that this is well-defined. Then the inner integral is just the probability that $x\le z-y$, which is
$$= \int F(z-y)\,d\nu(y) = \int F(z-y)\,dG(y),$$
where the last step is just saying that $\nu$ and $G$ carry the same information. (By symmetry, this is also equal to $\int G(z-x)\,dF(x)$.) We can say something nicer in the case where $\mu$ and $\nu$ have densities: then $\mu*\nu$ also has a density. To show this, let's say $f,g$ are the densities of $\mu,\nu$. By definition, this means that for all Borel sets $B$, $\mu(B) = \int 1_B f\,dx$ (using the standard Lebesgue measure), where $f$ is a nonnegative measurable function on $(\mathbb R,\mathcal B_{\mathbb R})$, which means that we can just write
$$F(x) = \int_{-\infty}^x f(t)\,dt.$$
We can also derive that if $h(X)$ is integrable, $E[h(X)] = \int h(t)f(t)\,dt$. So then because $dG(y) = g(y)\,dy$,
$$(F*G)(z) = \int F(z-y)\,dG(y) = \int F(z-y)\,g(y)\,dy = \int\int_{-\infty}^z f(t-y)\,dt\;g(y)\,dy,$$
and by Tonelli's theorem this is also $\int_{-\infty}^z\int f(t-y)g(y)\,dy\,dt$, and thus
$$\mu*\nu \text{ has density } (f*g)(z) = \int f(z-y)g(y)\,dy = \int g(z-x)f(x)\,dx.$$
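As a concrete illustration of my own (not from the notes): convolving two Uniform[0,1] densities numerically with the formula above reproduces the triangle density of the sum of two independent uniforms:

```python
def conv_density(f, g, z, lo=-3.0, hi=3.0, steps=20000):
    """(f*g)(z) = int f(z-y) g(y) dy, approximated by the midpoint rule."""
    h = (hi - lo) / steps
    return sum(f(z - (lo + (i + 0.5) * h)) * g(lo + (i + 0.5) * h)
               for i in range(steps)) * h

uniform = lambda t: 1.0 if 0.0 <= t <= 1.0 else 0.0

# the sum of two independent Uniform[0,1] variables has the triangle
# density on [0,2]: value 1 at the peak z=1, value 0.5 at z=0.5
val_peak = conv_density(uniform, uniform, 1.0)
val_half = conv_density(uniform, uniform, 0.5)
```

This also illustrates that the convolution of two discontinuous densities is already continuous — convolving smooths.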

We’ll now move on to our next topic: the Fourier transform. (This won’t be on the first exam, and not everything mentioned today is necessary to know, but it’s nice to know.)

Definition 105

The Fourier transform or characteristic function of a probability measure µ on (R, BR) is a function

$$\varphi_\mu = \hat\mu: \mathbb R\to\mathbb C,$$

defined via
$$\varphi_\mu(t) = \int e^{itx}\,d\mu(x).$$

In particular, if $X$ is a random variable with law $\mu$, we can also write this as

$$\varphi_\mu(t) = E[e^{itX}].$$

Notice that $e^{itX}$ is always on the unit circle in $\mathbb C$, so $\varphi_\mu(t)$ is contained in the unit ball in $\mathbb C$. Thus, for all $t\in\mathbb R$, (by the bounded convergence theorem) $\varphi_\mu$ is well-defined and continuous in $t$. This is good for proving something like the Central Limit Theorem, because the Fourier transform turns convolution into multiplication! If $X$ is a random variable with law $\mu$ and $Y$ is a random variable with law $\nu$, where $X\perp\!\!\!\perp Y$, then

$$\varphi_{\mu*\nu}(t) = E(e^{it(X+Y)}) = E[e^{itX}]\,E[e^{itY}]$$
by independence, which is just $\varphi_\mu(t)\varphi_\nu(t)$. Eventually, we'll have the Fourier transform on $\mathbb R$, which captures a lot of properties of the measure, but first we can look at the Fourier transform on a finite space: let's consider $\Omega = \mathbb Z/n\mathbb Z$ (the integers mod $n$), just to make everything simpler. Let $V$ be the set of functions $f:\Omega\to\mathbb C$, which is equivalent to $\mathbb C^{|\Omega|}$: this is a finite-dimensional complex vector space. $V$ has a hermitian inner product

$$\langle f,g\rangle = \sum_{x=0}^{n-1} f(x)\overline{g(x)} = g^*f,$$
where $g^*$ is the conjugate transpose of $g$ (we're viewing both as vectors). Then for any (nonzero) $z\in\Omega$, we can consider the translation operator on functions in $V$:

$$(T_zf)(x) = f(x-z).$$

We can also think of $T_z$ as a matrix because everything's finite dimensional: it is a circulant matrix, where the shift depends on $z$. What are the eigenvectors and eigenvalues of $T_z$? $f$ is an eigenvector of $T_z$ with eigenvalue $\lambda$ if and only if

$$\lambda f(x) = T_zf(x) = f(x-z)$$
for all $x$, but iterating this $k$ times, $\lambda^k f(x) = f(x-kz) = f(x)$ whenever $kz\equiv0\bmod n$. In particular, this means that $\lambda^n = 1$, so $\lambda$ must be an $n$th root of unity.

If $n$ is prime, $T_z$ (for nonzero $z$) just has $n$ distinct eigenvalues, which are exactly those $n$th roots of unity. So then we can solve for the eigenvectors pretty easily: they're of the form

$$\chi_\ell(x) = \frac{1}{\sqrt n}\exp\left(\frac{2\pi i\ell x}{n}\right)$$

for a translation $T_z$, with corresponding eigenvalue $\exp\left(-\frac{2\pi i\ell z}{n}\right)$. But if $n$ is not prime, there is no longer a unique eigenbasis. Luckily, it turns out those $\chi_0,\dots,\chi_{n-1}$ still do form an eigenbasis of $T_z$ for all $z\in\Omega$, and $\chi_\ell$ is an eigenvector of $T_z$ with eigenvalue $\lambda_{z,\ell} = \exp\left(-\frac{2\pi i\ell z}{n}\right)$. So in summary, we have a finite space with some periodic structure: diagonalizing those translation operators, they have some nice form. We can also find that the scalar product between two vectors in our eigenbasis is

$$\langle\chi_k,\chi_\ell\rangle = (\chi_\ell)^*\chi_k = \sum_x \chi_k(x)\overline{\chi_\ell(x)} = \sum_x \frac1n\exp\left(\frac{2\pi ix(k-\ell)}{n}\right).$$

This is 1 when $k=\ell$ and 0 otherwise, which means that with respect to the hermitian inner product, the $\chi_\ell$ form an orthonormal basis. In other words, if we define
$$U = \begin{pmatrix} | & & | \\ \chi_0 & \cdots & \chi_{n-1} \\ | & & | \end{pmatrix},$$

we have a unitary matrix, meaning that $U^*U = UU^* = I$. So now $\chi_0,\dots,\chi_{n-1}\in V$ form the Fourier basis for $V$, and the Fourier transform is just the change-of-basis operation sending $f$ to $\hat f = U^*f$, the coordinates of $f$ in the Fourier basis: in other words,
$$f = UU^*f = U\hat f,$$
and this last expression is just $\sum_\ell \hat f(\ell)\chi_\ell$, which is $f$ in the Fourier basis. We can actually write out the coefficients explicitly here:
$$\hat f(\ell) = (U^*f)_\ell = \langle f,\chi_\ell\rangle = \sum_x f(x)\overline{\chi_\ell(x)} = \frac{1}{\sqrt n}\sum_x f(x)\exp\left(-\frac{2\pi i\ell x}{n}\right).$$
Other than the normalizing constants, this should look very similar to how we're multiplying $f$ by some complex exponential in the regular Fourier transform! So the whole point is that the Fourier transform is a change-of-basis operation which diagonalizes the translation operators, and the exponentials are motivated by the discrete case, where they just pop out of the eigenvalue calculation. Well, in this discrete case, we have

$$(f*g)(x) = \sum_z g(z)\,f(x-z),$$
which can be viewed as a linear combination of translations of $f$:

$$= \sum_z g(z)\,(T_zf)(x).$$

So we can think of $f*g = C_gf$ as an operator which is a linear combination of the translation operators:
$$C_g = \sum_z g(z)\,T_z.$$

Once we see that this convolution is a linear combination of translations, since the Fourier basis diagonalizes all of the $T_z$s simultaneously, we should expect that the Fourier basis is also good for $C_g$. Here are the details: since
$$C_g = \sum_{z\in\Omega} g(z)\,T_z,$$

and $T_z$ has the diagonal form $U\Lambda_zU^*$ (where $\Lambda_z$ is the diagonal matrix with entries $\lambda_{z,\ell}$, $\ell = 0,\dots,n-1$), we can also write this equivalently as
$$T_z = \sum_\ell \lambda_{z,\ell}\,\chi_\ell(\chi_\ell)^*.$$

So then
$$C_g = \sum_z g(z)\left(\sum_\ell \lambda_{z,\ell}\,\chi_\ell\chi_\ell^*\right) = \sum_\ell\left(\sum_z \lambda_{z,\ell}\,g(z)\right)\chi_\ell\chi_\ell^*,$$
and now because $\lambda_{z,\ell} = \exp\left(-\frac{2\pi i\ell z}{n}\right)$, which matches the entry $U^*_{\ell,z}$ of the matrix $U^*$ (up to the $\sqrt n$ coming from our unitary normalization of the $\chi_\ell$), this inner sum is the same as the $\ell$th coordinate of $U^*g$, up to that factor. In matrix form, this looks like

$$C_g = \begin{pmatrix} | & & | \\ \chi_0 & \cdots & \chi_{n-1} \\ | & & | \end{pmatrix}\begin{pmatrix}(U^*g)_0 & & \\ & \ddots & \\ & & (U^*g)_{n-1}\end{pmatrix}U^* = U\operatorname{diag}(U^*g)\,U^*.$$

This then means that (using $U^*U = I$)

$$\widehat{f*g} = U^*C_gf = \operatorname{diag}(U^*g)\,U^*f = \operatorname{diag}(\hat g)\,\hat f,$$
which is indeed the entrywise product of $\hat f$ and $\hat g$ (again up to the $\sqrt n$ normalization)! So this is an alternate way of arriving at the same idea: this is the unique object that behaves in a nice way under this kind of convolution. We'll finish the class with one more sample problem:

Problem 106

Suppose $X_n$ are random variables that are Cauchy in probability, which means that for all $\varepsilon>0$, $P(|X_m - X_n|\ge\varepsilon)$

goes to 0 as m, n → ∞. Prove there exists a random variable X such that Xn → X in probability.

Solution. First, we need to find an $X$: the Cauchy property lets us extract a subsequence $n_k\to\infty$ along which we get almost sure convergence. In other words, we can choose $n_k$ so that for all $m,n\ge n_k$,
$$P\left(|X_m - X_n|\ge\frac{1}{2^k}\right) \le \frac{1}{2^k}.$$

If we then define $A_k$ to be the event $\{|X_{n_{k+1}} - X_{n_k}|\ge\frac{1}{2^k}\}$, then the probability that $A_k$ occurs infinitely often is 0 by Borel-Cantelli, so on $\Omega\setminus\{A_k\text{ i.o.}\}$, the sequence $(X_{n_k})$ is Cauchy and therefore converges to some limit $X$ almost surely.

So now we need to show that $X_n\to X$ in probability along the full sequence. Well, by the triangle inequality,
$$P(|X_n - X|\ge\varepsilon) \le P\left(|X_n - X_{n_k}| + |X_{n_k} - X|\ge\varepsilon\right),$$
so this is union bounded by
$$\le P\left(|X_n - X_{n_k}|\ge\frac{\varepsilon}{2}\right) + P\left(|X_{n_k} - X|\ge\frac{\varepsilon}{2}\right).$$
The second term goes to 0 because $X_{n_k}$ converges to $X$ almost surely (hence in probability), and the first term goes to 0 as well because we can take $n$ and $n_k$ to infinity using Cauchy convergence in probability.

Office hours for Professor Sun will be Friday 11-12, and the TAs will have office hours as well.

13 October 23, 2019

Recall that last time, we gave some motivation for the Fourier transform by considering the discrete case $\Omega = \mathbb Z/n\mathbb Z$. Letting $V$ be the set of functions from $\Omega\to\mathbb C$, we have a finite-dimensional vector space isomorphic to $\mathbb C^\Omega$ with hermitian inner product
$$\langle f,g\rangle = g^*f = \sum_{x\in\Omega} f(x)\overline{g(x)}.$$
We found that under this model, there exists an orthonormal (under the hermitian inner product) basis

$$(\chi_0,\dots,\chi_{n-1}): \qquad \chi_\ell(x) = \frac{1}{\sqrt n}\exp\left(\frac{2\pi i\ell x}{n}\right)$$

for $V$, which means that the matrix
$$U = (\chi_0\ \chi_1\ \cdots\ \chi_{n-1})\in\mathbb C^{n\times n}$$
is unitary: that is, $UU^* = U^*U = I$. Thus, the Fourier transform on $\Omega$ is a change-of-basis operation, meaning

$$\hat f = U^{-1}f = U^*f.$$

(Explicitly, this means that the coefficient $\hat f(\ell)$ in the Fourier basis is $\langle f,\chi_\ell\rangle$.) It's also nice that we can easily do the Fourier inversion: $f = U\hat f$.

Remark 107. Note that $\langle f,g\rangle = \langle\hat f,\hat g\rangle$ because $U$ is unitary.
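Both Remark 107 and the discrete convolution theorem from last lecture can be checked in a few lines. This sketch is my own (arbitrary test vectors); note that with the unitary normalization $\chi_\ell(x) = n^{-1/2}e^{2\pi i\ell x/n}$, the convolution theorem picks up a factor of $\sqrt n$:

```python
import cmath

n = 8

def dft(f):
    """Unitary DFT: f_hat(l) = (1/sqrt(n)) sum_x f(x) exp(-2 pi i l x / n)."""
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * l * x / n) for x in range(n))
            / n ** 0.5 for l in range(n)]

def circ_conv(f, g):
    """(f*g)(x) = sum_z g(z) f(x - z), with indices mod n."""
    return [sum(g[z] * f[(x - z) % n] for z in range(n)) for x in range(n)]

f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0]
g = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
fh, gh = dft(f), dft(g)

# Remark 107: the unitary DFT preserves the hermitian inner product.
ip_time = sum(a * b for a, b in zip(f, g))              # f, g are real here
ip_freq = sum(a * b.conjugate() for a, b in zip(fh, gh))
parseval_err = abs(ip_time - ip_freq)

# Convolution theorem, with the sqrt(n) from this normalization.
lhs = dft(circ_conv(f, g))
rhs = [n ** 0.5 * a * b for a, b in zip(fh, gh)]
conv_err = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Both errors are at machine precision, confirming unitarity and the diagonalization of convolution.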

Today, we’ll finish what we need to cover about the Fourier transform, which will let us prove the Central Limit Theorem next time! We’ll cover this for functions on R instead of over a finite space.

Let's start by fixing the normalization: recall that if $\mu$ is a probability measure on $(\mathbb R,\mathcal B_{\mathbb R})$, we defined the Fourier transform or characteristic function to be
$$\varphi_\mu(t) = \int e^{itx}\,d\mu(x) = E[e^{itX}]$$

if X ∼ µ. This has some basic properties, which are left as an exercise:

Proposition 108

For any probability measure $\mu$ on $\mathbb R$, $\varphi_\mu$ is a uniformly continuous mapping from $\mathbb R$ to the closed unit disk $D = \{z\in\mathbb C: |z|\le1\}$.

Uniform continuity can be seen because

$$|\varphi_\mu(t+h) - \varphi_\mu(t)| = \left|E\left(e^{i(t+h)X} - e^{itX}\right)\right|,$$

and now by Jensen's inequality on the absolute value, this is

$$\le E\left|e^{itX}(e^{ihX} - 1)\right| = E\left|e^{ihX} - 1\right|,$$

which tends to 0 as $h\to0$ by the bounded convergence theorem, uniformly in $t$ since the bound doesn't depend on $t$. All of this has a special case, which we might be more familiar with: if $\mu$ has a density $f$, then
$$\varphi_\mu(t) = \int e^{itx}f(x)\,dx = \hat f(t)$$

is known as the $L^1$ Fourier transform (because $f$ integrates to 1, so it's in $L^1$). An important result to keep in mind from Fourier analysis:

Theorem 109 (Plancherel/Parseval)
If we have functions $f,h\in L^1(\mathbb R)\cap L^2(\mathbb R)$, then $\hat f,\hat h\in L^\infty(\mathbb R)$ (because the result is bounded) and also $L^2(\mathbb R)$. In addition, the inner product $\langle\hat f,\hat h\rangle = 2\pi\langle f,h\rangle$.

Thus, the mapping
$$U: L^1(\mathbb R)\cap L^2(\mathbb R)\to L^\infty(\mathbb R)\cap L^2(\mathbb R), \qquad U(f) = \frac{\hat f}{\sqrt{2\pi}}$$
has a unique continuous extension to a unitary map $L^2(\mathbb R)\to L^2(\mathbb R)$.

The point is that in principle, the integral $\int e^{itx}f(x)\,dx$ may not be defined if $f$ is in $L^2$ but not in $L^1$. But then we can approximate $f$ with functions in $L^1\cap L^2$, and the limit will make sense! Also, because this is a unitary mapping, we have a simple inverse. If $h\in L^1(\mathbb R)\cap L^2(\mathbb R)$, we can define

$$U^*h(x) = \frac{1}{\sqrt{2\pi}}\int e^{-itx}h(t)\,dt = \frac{\hat h(-x)}{\sqrt{2\pi}}$$
(this just has a negative sign in the exponent). The logic from above applies here as well, so this map extends to a map $U^*: L^2(\mathbb R)\to L^2(\mathbb R)$. This gives us an important result: the Fourier inversion theorem tells us that
$$f(x) = U^*Uf = U^*\left(\frac{\hat f}{\sqrt{2\pi}}\right) = \frac{1}{2\pi}\hat{\hat f}(-x).$$

So the Fourier transform is an involution except for the scaling factor! Well, when we look at the Fourier transform for probability measures, things are not quite so nice.
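Parseval's identity $\langle\hat f,\hat h\rangle = 2\pi\langle f,h\rangle$ can be checked numerically for a function whose transform we know in closed form — for instance the standard Gaussian density, whose Fourier transform is $e^{-t^2/2}$ (a standard fact). A quick sketch of my own:

```python
import math

def f(x):
    """Standard Gaussian density, which lies in both L^1 and L^2."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f_hat(t):
    """Its Fourier transform: int e^{itx} f(x) dx = exp(-t^2/2)."""
    return math.exp(-t * t / 2)

def inner(u, v, lo=-12.0, hi=12.0, steps=40000):
    """L^2 inner product of real-valued functions, by the midpoint rule."""
    h = (hi - lo) / steps
    return sum(u(lo + (i + 0.5) * h) * v(lo + (i + 0.5) * h)
               for i in range(steps)) * h

lhs = inner(f_hat, f_hat)          # <f_hat, f_hat> = int e^{-t^2} dt = sqrt(pi)
rhs = 2 * math.pi * inner(f, f)    # 2*pi*<f, f>
```

Both sides equal $\sqrt\pi$ up to quadrature error, matching the theorem's normalization.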

Theorem 110 (Fourier inversion formula for probability measures on $\mathbb R$)
Suppose $\mu$ is a probability measure on $\mathbb R$ and $\varphi_\mu(t)$ is defined as above. Then

$$\int_{-T}^T \varphi(t)\,\frac{e^{-ita} - e^{-itb}}{2\pi it}\,dt$$
converges as $T\to\infty$ to
$$\tilde\mu(a,b) = \mu((a,b)) + \frac{\mu(\{a,b\})}{2}.$$

Note that this µ˜ is enough to uniquely determine µ (this is an exercise), even though µ˜ and µ are not exactly the

same. But the whole point is that φµ determines µ! By the way, we do need to integrate from −T to T to ensure the integral actually exists, and it’s part of the content of the theorem that this converges.

Example 111
Consider the random variable X which is ±1, each with probability 1/2. Then

φ_µ(t) = (e^{it} + e^{−it})/2 = cos t.

This has no decay as |t| → ∞, so we can't simply integrate from −∞ to ∞.
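As a sanity check on Theorem 110, the truncated inversion integral can be evaluated numerically for this example (a rough sketch, not from the lecture; the truncation level T and grid size are arbitrary choices). The interval (0.5, 1.5) contains the atom at +1 and neither endpoint carries mass, so the result should be near µ̃(0.5, 1.5) = 1/2:

```python
import cmath
import math

def inversion_integral(phi, a, b, T=200.0, n=400_000):
    """Midpoint Riemann sum of phi(t)(e^{-ita} - e^{-itb})/(2*pi*i*t) over [-T, T]."""
    dt = 2 * T / n
    total = 0.0
    for k in range(n):
        t = -T + (k + 0.5) * dt  # midpoints avoid the (removable) singularity at t = 0
        kernel = (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (2 * math.pi * 1j * t)
        total += (phi(t) * kernel).real * dt
    return total

# X = +/-1 with probability 1/2 each, so phi(t) = cos(t)
approx = inversion_integral(math.cos, 0.5, 1.5)
print(round(approx, 2))  # close to mu((0.5, 1.5)) = 1/2
```

The convergence in T is slow (the error decays like 1/T, as the sinc analysis below suggests), which is why T = 200 is used here.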

Intuitively, what is going on here? This is not too rigorous, but we're basically given φ and we want to recover the distribution by finding, for example, µ((a, b)) for some a < b. This is equal to

∫ h(x) dµ(x), where h = 1_{(a,b)},

and the Fourier transform of the indicator function h is (there are no convergence problems with the usual definition, because h has bounded support)

ĥ(t) = ∫ e^{itx} h(x) dx = ∫_a^b e^{itx} dx = (e^{itb} − e^{ita})/(it).

So we sort of want to write

µ((a, b)) = ∫ h dµ = ⟨µ, h⟩

(whatever that means, because µ is a measure), which we should believe yields, by expanding out the hermitian inner product as if Plancherel applied,

(1/2π)⟨φ_µ, ĥ⟩ = ∫_{−∞}^{∞} φ(t) · (e^{−ita} − e^{−itb})/(2πit) dt.

The integral might not even be defined, but this has approximately the correct form: that's the motivation for what's going on here. How do we rigorize this a bit more? We can approximate by truncating in the Fourier domain, integrating the last expression from −T to T only:

∫_{−T}^{T} φ(t) · (e^{−ita} − e^{−itb})/(2πit) dt = (1/2π) ∫ φ(t) · conj(ĥ(t)) · k_T(t) dt,

where k_T is the truncation function k_T(t) = 1{|t| ≤ T}. This then evaluates, at least formally, to

(1/2π)⟨φ, ĥ k_T⟩ = ⟨µ, h_T⟩,

where h_T is the function such that

ĥ_T = ĥ k_T.

We can solve for h_T explicitly, but the idea is that we have a not-smooth function h (the indicator of (a, b)) mapped to the Fourier domain. Since we cut out the high frequencies by truncating with k_T, we should have h_T be essentially a smooth version of h! And we hope that as T → ∞, the smoothing goes away:

lim_{T→∞} h_T(x) = h̃(x) = 1_{(a,b)}(x) + (1/2) · 1_{{a,b}}(x).

This completes the intuition for why we get only half the measure at the endpoints, and we'll go through all of the rigor now! Before diving into the proof, let's figure out what h_T should be so that it satisfies the equations above.

Fact 112
The function sinc(t) = sin(t)/t is not in L¹(R), because it only decays like 1/t. However, the truncated integrals

S(T) = ∫_{−T}^{T} sinc(t) dt

converge: S(T) → π as T → ∞.

Proof. There are various ways to do this; one is through complex analysis. The integral is the same as

∫_{−T}^{T} (e^{it} − e^{−it})/(2it) dt = ∫_{−T}^{T} e^{it}/(it) dt.

The function h(z) = e^{iz}/(iz) is holomorphic on C \ {0}, so if we integrate around any closed contour γ not containing or passing through the origin,

∮_γ h(z) dz = 0.

Integrate once counterclockwise around an indented semicircle in the upper half-plane: segments II and IV along the real axis from −R to −ε and from ε to R, a small semicircle I of radius ε around the origin, and a large semicircle III of radius R; then take R → ∞ and ε → 0.

We want the quantity along II and IV, so it suffices to show that (1) as ε → 0, the integral along I goes to −π, and (2) as R → ∞, the integral along III goes to 0. (2) can be done with some careful bounding, and (1) can be done by parametrizing z = εe^{iθ} for θ from π to 0, which yields

∫_{γ_ε} h(z) dz = ∫_π^0 (e^{iεe^{iθ}}/(iεe^{iθ})) · iεe^{iθ} dθ = ∫_π^0 e^{iεe^{iθ}} dθ.

As ε → 0, the integrand goes to 1, and thus this simplifies to

−∫_0^π 1 dθ = −π.

Thus, because the total contour integral is 0, the integral along II and IV is π in the limit, as desired.

To find h_T, we can apply the Fourier inversion formula for functions:

h_T(x) = (1/2π) ∫_{−T}^{T} e^{−itx} ĥ(t) dt = (1/2π) ∫_{−T}^{T} e^{−itx} · (e^{itb} − e^{ita})/(it) dt.

Averaging the integrand's values at t and −t, and using the fact that sin θ = (e^{iθ} − e^{−iθ})/(2i), this is equivalent to

∫_{−T}^{T} [sin(t(b − x))/(2πt) − sin(t(a − x))/(2πt)] dt.

Looking at each term: if b − x = 0 we get 0, and in all other cases we can make a u-substitution (flipping the limits of integration if b − x or a − x is negative) to turn this into

h_T(x) = [sgn(b − x) S(|b − x| T) − sgn(a − x) S(|a − x| T)] / (2π),

where we define sgn(0) = 0. Now if b − x and a − x are both positive or both negative, both terms converge to the same value (because S converges to π as T → ∞), so we only get something nontrivial when x ∈ [a, b]. If x is strictly between a and b, this goes to 2π/(2π) = 1, and otherwise (if x = a or x = b) it goes to π/(2π) = 1/2, which indeed means that h_T converges pointwise to h̃(x) = 1_{(a,b)}(x) + (1/2) · 1_{{a,b}}(x).

Remark 113. Another way to see that h_T is smooth is that the Fourier transform takes convolution to multiplication, so

ĥ_T = ĥ k_T ⟹ h_T = h ∗ S_T, where Ŝ_T = k_T.

Here S_T is a sinc-type kernel: it is oscillatory, and it becomes more concentrated (and more wavy) as T → ∞.

So now we’re ready to prove the Fourier inversion theorem for probability measures:

Proof. Note that

I_T = ∫_{−T}^{T} φ(t) · (e^{−ita} − e^{−itb})/(2πit) dt = (1/2π) ∫_{−T}^{T} φ(t) · conj(ĥ(t)) dt = (1/2π) ∫_{−T}^{T} (∫ e^{itx} dµ(x)) · conj(ĥ(t)) dt,

and now we can swap the order of integration (because the inner integrand is bounded and the outer integral is over a finite domain) and move the conjugation outside:

I_T = ∫ conj((1/2π) ∫_{−T}^{T} e^{−itx} ĥ(t) dt) dµ(x) = ∫ conj(h_T(x)) dµ(x) = ∫ h_T(x) dµ(x)

for any fixed T (the last step because h_T is explicitly real by our direct calculations). We just need to show that this converges to ∫ h̃(x) dµ(x) as T → ∞: we know that h_T → h̃ pointwise from our work above, and we notice that

h_T(x) = (1/2π)(± S(|b − x| T) ± S(|a − x| T)).

Since S is continuous with a finite limit at infinity, it is uniformly bounded, so h_T is bounded uniformly as well. Thus sup_T ‖h_T‖_∞ is finite, and we're done by the bounded convergence theorem.

By the way, let’s try taking the Fourier transform of a Gaussian

1 2 g(x) = √ e−x /2 : 2π the characteristic function  2  Z 1 2 1 Z (x − it) 2 φ(t) =g ˆ(t) = eitx g(x)dx = √ eitx e−x /2dx = √ exp − e−t /2dx 2π 2π 2 which is an integral along the real line   2 Z 1 1 2 Z 1 2 = e−t /2 √ exp − (x − it)2 dx = e−t /2 √ e−z /2dz. 2 x∈R 2π R−it 2π We can show that this integral is also equal to the integral over the real line by integrating over a rectangle as R → ∞:

y −R R x I IV II III −R − it R − it

63 Since e−z 2/2 is holomorphic everywhere as well, the integral is 0, and the integrals along II and IV go to 0. Thus, the integrals along the real line R and the shifted real line R + it are the same, and they’re both equal to 1. This implies that φ(t) = e−t2/2.
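This identity is easy to check numerically (a sketch, not from the lecture; the truncation |x| ≤ 10 and the grid size are our choices, harmless because the Gaussian tail beyond 10 is negligible):

```python
import cmath
import math

def g(x):
    # standard Gaussian density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def g_hat(t, L=10.0, n=20_000):
    # midpoint Riemann sum for the characteristic function of N(0, 1)
    dx = 2 * L / n
    return sum(cmath.exp(1j * t * (-L + (k + 0.5) * dx)) * g(-L + (k + 0.5) * dx)
               for k in range(n)) * dx

for t in (0.0, 1.0, 2.5):
    # compare the numerical transform against exp(-t^2/2); errors should be tiny
    print(t, abs(g_hat(t) - math.exp(-t * t / 2)) < 1e-5)
```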

This is an important example: ignoring factors of 2π, the Fourier transform of a Gaussian is a Gaussian! So we have an eigenvector of some sort for the transform. To finish, let’s do a weak form of the L2 isometry:

Proposition 114
Suppose f, h ∈ L¹ ∩ L² ∩ L^∞ are continuous. Then

⟨f̂, ĥ⟩ = 2π⟨f, h⟩.

Proof. Consider the following integral as ε → 0:

∫ f̂(t) · conj(ĥ(t)) · exp(−εt²/2) dt.

This is well-defined because f̂, ĥ are bounded and the Gaussian factor is integrable, and it equals

∫ (∫ e^{itx} f(x) dx)(∫ e^{−ity} conj(h(y)) dy) exp(−εt²/2) dt.

Since our functions are in L¹, we can swap the order of integration to find that this is

∫∫ f(x) conj(h(y)) (∫ e^{it(x−y)} exp(−εt²/2) dt) dx dy.

The inner integral is √(2π/ε) · E[e^{i(x−y)Z/√ε}] with Z ∼ N(0, 1), and this expectation equals e^{−(x−y)²/(2ε)}. This leaves us with

√(2π/ε) ∫∫ f(x) conj(h(y)) e^{−(x−y)²/(2ε)} dx dy.

(Notice that a very spread-out Gaussian in the frequency domain becomes a very concentrated Gaussian in the space domain.) As ε → 0, the kernel (1/√(2πε)) e^{−(x−y)²/(2ε)} is an approximate identity, so all of the weight goes to x = y, and the expression converges to

2π ∫ f(x) conj(h(x)) dx = 2π⟨f, h⟩,

while the left side converges to ⟨f̂, ĥ⟩, and we're done.

14 October 28, 2019

Exams will be brought to class on Wednesday. Last time, we discussed the Fourier inversion theorem for probability measures on R, which lets us recover the measure µ for a random variable X given its characteristic function. One key example is that the characteristic function

for the standard Gaussian is

φ(t) = exp(−t²/2).

The main goal of today will be to prove the Central Limit Theorem for iid sequences: basically, we'll show that if E[X] = 0 and E[X²] = 1 (finite second moment), then

(1/√n) Σ_{i=1}^n X_i →d N(0, 1)

in some sense of convergence (“in distribution”) which we'll formally define.

Definition 115
Let S be a metric space with Borel sigma-algebra B generated by the open sets of the metric topology. If µ_n, µ are probability measures on (S, B), then µ_n converges weakly to µ, denoted µ_n ⟹ µ, if

∫ f dµ_n → ∫ f dµ

for all bounded continuous functions f : S → R. If X_n, X are random variables, then X_n →d X (X_n converges in distribution to X) if and only if the laws converge weakly, L(X_n) ⟹ L(X).

Why do we need the continuity assumption? Consider X_n = 1/n (not even random): this converges almost surely to X = 0, but for the discontinuous function f(x) = 1{x > 0}, we have f(X_n) = 1 for all n, which does not converge to f(X) = 0.

Fact 116
Every probability measure µ on (S, B) is regular, which means that for all A ∈ B,

µ(A) = sup{µ(F) : F ⊆ A, F closed} = inf{µ(G) : G ⊇ A, G open}.

The proof in general is similar to our homework. In particular, this implies that µ is determined by the values {µ(F) : F closed}.

This gives us the following result:

Lemma 117
If ∫ f dµ = ∫ f dν for all bounded continuous functions f : S → R, then µ = ν.

Proof. For any closed F, we can approximate its indicator function with the continuous function

f(x) = max(0, 1 − dist(x, F)/ε).

So if x is inside F, f(x) = 1, and if dist(x, F) ≥ ε, then f(x) = 0: we interpolate linearly in between. If F^ε denotes the ε-neighborhood of F, then

1_F ≤ f ≤ 1_{F^ε},

which means that (because f is a bounded continuous function)

µ(F) ≤ ∫ f dµ = ∫ f dν ≤ ν(F^ε)

for all ε > 0. As ε ↓ 0, F^ε ↓ F because F is closed, so by continuity from above, ν(F^ε) ↓ ν(F), and thus ν(F) ≥ µ(F). Repeating the argument in the other direction, we see that µ(F) = ν(F) for all closed F, giving us the result.

Corollary 118

µn cannot converge weakly to two different limits.

So now, for any ε > 0, any bounded continuous functions f_1, ⋯, f_k (k finite), and any probability measure µ on S, define the set

U_{ε, f_1, ⋯, f_k}(µ) = {ν a probability measure on S : |∫ f_i dν − ∫ f_i dµ| < ε for all 1 ≤ i ≤ k}

to be a kind of neighborhood around µ. Let T be the topology on P generated by these neighborhoods, where P is the space of probability measures on (S, B). Then it's a fact that weak convergence is equivalent to convergence in the topology T. Given that we have a notion of a neighborhood of µ, it makes sense to define a distance:

Definition 119
Given µ, ν ∈ P, define

π(µ, ν) = inf{ε > 0 : µ(A) ≤ ν(A^ε) + ε and ν(A) ≤ µ(A^ε) + ε for all A ∈ B},

where A^ε is the ε-neighborhood of A. This is called the Prohorov metric, and if S is a complete separable metric space (called a Polish space), then weak convergence is equivalent to convergence in the Prohorov metric.

Remark 120. “Complete” means that Cauchy sequences converge, and “separable” means that there is a countable dense subset.

When S is Polish, P is itself complete and separable, so it is again a Polish space. This is one of the reasons why almost all of probability happens on Polish spaces: we often want to pick a random probability measure from our space of probability measures, or something like that. So in this class, we'll always work in this setting. With this, we can turn to the Central Limit Theorem. In general, the main strategy for showing convergence in distribution is to break it into two parts:

• Show that the family of measures µ_n is contained in a compact subset of P, so that there are subsequences that converge. This is usually done with rougher estimates.

• Once we know subsequences converge, leverage this to show that all subsequential limits coincide.

This might be a bit abstract, so it's important for us to understand what compact subsets of P actually look like!

Definition 121
Let {µ_α : α ∈ I} ⊆ P be a set of probability measures. Then {µ_α} is tight if for all ε > 0, there exists a compact K_ε ⊆ S such that µ_α(K_ε) ≥ 1 − ε for all α ∈ I.

Some examples of families that are not tight: consider {ν_n = N(0, n)}, which are very spread-out Gaussians: these are not tight because the mass spreads out over all of R. Similarly, {γ_n = N(n, 1)} is another non-example, where the mass escapes to +∞. It turns out that tightness and relative compactness are essentially equivalent:

Theorem 122 (Prohorov)

Let {µ_α} be probability measures on a metric space S. If {µ_α} is tight, then the family is relatively compact, meaning it has compact closure in P. Furthermore, if S is a Polish space, then the converse is true as well, but this direction is less useful.

The case S = R is easier to prove, and there the result is known as the Helly selection theorem. We should make sure we understand the statement of Prohorov's theorem, and we're responsible for reading and understanding the proof of the Helly selection theorem (these will also be in the notes eventually). The key to the proof of the CLT is to relate weak convergence with characteristic functions:

Lemma 123
If µ is a probability measure on R with characteristic function φ, then for all u > 0,

µ({x ∈ R : |x| ≥ 2/u}) ≤ (1/u) ∫_{−u}^{u} (1 − φ(t)) dt.

In other words, when u is small, the tail behavior of µ is controlled by the behavior of the characteristic function near t = 0.

Proof. The right-hand side is

(1/u) ∫_{−u}^{u} (1 − φ(t)) dt = (1/u) ∫_{−u}^{u} ∫ (1 − e^{itx}) dµ(x) dt,

and we can change the order of integration by Fubini (the integrand is bounded and the domain has finite measure) to get

∫ [(1/u) ∫_{−u}^{u} (1 − e^{itx}) dt] dµ(x).

The inner integral evaluates directly to (pulling out a factor of 2)

2 ∫ (1 − (e^{iux} − e^{−iux})/(2iux)) dµ(x) = 2 ∫ (1 − sin(ux)/(ux)) dµ(x).

The integrand is always nonnegative (since |sin θ| ≤ |θ|), so we can restrict the integral to any domain of x. Whenever |ux| ≥ 2, we have |sin(ux)/(ux)| ≤ 1/|ux| ≤ 1/2, so the integrand is at least 1 there, and thus the whole expression is

≥ µ({x : |ux| ≥ 2}),

which is the result we're looking for.
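As a quick illustration (not from the lecture): for the two-point measure of Example 111, with φ(t) = cos t, both sides of the lemma are explicit, since (1/u)∫_{−u}^{u}(1 − cos t) dt = 2 − 2 sin(u)/u, and the inequality can be spot-checked:

```python
import math

def rhs(u):
    # (1/u) * integral of (1 - cos t) over [-u, u] = 2 - 2*sin(u)/u
    return 2 - 2 * math.sin(u) / u

def lhs(u):
    # mu({|x| >= 2/u}) for the measure putting mass 1/2 at each of -1 and +1
    return 1.0 if 2 / u <= 1 else 0.0

for u in (0.5, 1.0, 2.0, 3.0, 5.0):
    assert lhs(u) <= rhs(u)  # the bound of Lemma 123
    print(u, lhs(u), round(rhs(u), 3))
```

Note that the bound only becomes informative once u ≥ 2, which matches the heuristic: small-u behavior of φ controls the far tail.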

This lemma basically directly proves the following:

Theorem 124 (Continuity theorem)

Let µn be probability measures on R with characteristic functions φn. Then

• If µn converge weakly to some µ, then φn converges pointwise to φ, the characteristic function of µ.

• Conversely, if φn converges pointwise to φ, and φ is continuous at t = 0, then φ is a valid characteristic

function of some probability measure µ and µn =⇒ µ.

Example 125 (A non-example)
Say that µ_n = N(0, n) are the same wide Gaussians. Then φ_n(t) = exp(−nt²/2) converges pointwise to the indicator 1{t = 0}, which is not continuous at 0. Indeed, µ_n does not converge weakly to anything.

Proof. The first part is immediate: if µ_n ⟹ µ, then

φ_n(t) = ∫ e^{itx} dµ_n(x)

is the integral of the bounded continuous function x ↦ e^{itx}, so this converges to ∫ e^{itx} dµ(x) = φ(t) for all t ∈ R, which is pointwise convergence. For the other direction, we use the lemma, which says that

µ_n({|x| ≥ 2/u}) ≤ (1/u) ∫_{−u}^{u} (1 − φ_n(t)) dt

for all u, n. Keep u fixed and let n → ∞: the integrand is bounded and converges pointwise to 1 − φ, so the right side converges to

f(u) := (1/u) ∫_{−u}^{u} (1 − φ(t)) dt

by the bounded convergence theorem. Because φ is continuous at 0 with φ(0) = lim φ_n(0) = 1, we get f(u) → 0 as u ↓ 0. Thus the µ_n are tight: we have lim sup_n µ_n({|x| ≥ 2/u}) ≤ f(u), so given ε we can choose u with f(u) < ε, take the compact set [−2/u, 2/u] for all n ≥ N(ε), and enlarge the compact set to handle the finitely many n at the beginning of the sequence.

This means µ_n converges weakly along subsequences by Prohorov's theorem. For any subsequence with µ_{n_k} ⟹ ν, the first part gives φ_{n_k} → φ_ν pointwise. But we assumed φ_n → φ along the whole sequence, so φ_ν = φ: thus φ is a valid characteristic function, and all subsequential limits have the same characteristic function, hence coincide by the inversion theorem. So µ_n ⟹ µ for this common limit µ, as desired.

Remark 126 (Joke). “Does this imply CLT?” “I don’t know, does it?”

As a result of this, all we need to do is show that the characteristic function of the normalized sum in our statement of the CLT converges pointwise to that of the standard Gaussian. Here's a helpful tool for that:

Lemma 127
For all x ∈ R,

|e^{ix} − (1 + ix − x²/2)| ≤ min(x², |x|³/6).

Proof. This is a calculus bash.

We’ll now prove two versions of the central limit theorem:

Theorem 128 (Central limit theorem for iid sequences)
Let X, X_i be iid with E[X] = 0 and E[X²] = 1. (These can always be arranged by rescaling, but notice that this is strictly more restrictive than the strong law of large numbers, because we need a finite second moment.) Then

(1/√n) Σ_{i=1}^n X_i →d N(0, 1).

Proof. Notice that

|E[e^{itX} − (1 + itX − t²X²/2)]| = |φ_X(t) − (1 − t²/2)|

(by linearity of expectation, E[X] = 0, E[X²] = 1), and by Lemma 127 this can be bounded as t → 0:

|φ_X(t) − (1 − t²/2)| ≤ E[min{t²X², |t|³|X|³/6}] = o(t²).

Indeed, dividing by t², the quantity min{X², |t||X|³/6} converges to 0 almost surely as t → 0 (because the term |t||X|³/6 goes to 0) and is dominated by X², which is integrable, so the dominated convergence theorem says its expectation goes to 0. Thus

φ_X(t) = 1 − t²/2 + o(t²)

with an error term of smaller order than t². Now

Z_n = (1/√n) Σ_{i=1}^n X_i

has characteristic function

φ_{Z_n}(t) = φ_X(t/√n)^n = (1 − t²/(2n) + o(t²/n))^n.

This goes to exp(−t²/2) as n → ∞, which is the characteristic function of the standard Gaussian, so we're done by the continuity theorem.
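The theorem is easy to see in simulation. Here is a hedged sketch (the Rademacher choice X = ±1 and the parameters n, trials are ours, not from the lecture), estimating P(Z_n ≤ 1) and comparing with Φ(1) ≈ 0.8413:

```python
import math
import random

random.seed(0)

def standardized_sum(n):
    # Z_n = (X_1 + ... + X_n)/sqrt(n) with X_i = +/-1 (mean 0, variance 1)
    return sum(random.choice((-1, 1)) for _ in range(n)) / math.sqrt(n)

trials, n = 5_000, 400
freq = sum(standardized_sum(n) <= 1.0 for _ in range(trials)) / trials

Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))  # standard normal cdf
print(round(freq, 3), round(Phi(1.0), 3))  # the two numbers should be close
```

The small residual gap comes from the lattice structure of the Rademacher sum (a discreteness effect of order 1/√n) plus sampling noise.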

Remark 129. A similar proof with the moment generating function is harder to produce, because it’s possible to have finite second moment and not have the moment generating function be defined anywhere but t = 0. Also, we don’t have such a nice inversion theorem in that case.

There’s also a more general central limit theorem:

Theorem 130 (Lindeberg-Feller CLT)
Suppose that we have a triangular array such that for each n,

{X_{n,j} : 1 ≤ j ≤ n}

are mutually independent with mean 0 (but not necessarily iid). Suppose that the array is suitably normalized, so that

Var(X_{n,1} + ⋯ + X_{n,n}) = Σ_{j=1}^n E(X_{n,j}²)

converges to some σ² > 0 as n → ∞. Also (the key assumption), for all ε > 0, as n → ∞,

Σ_{j=1}^n E(X_{n,j}²; |X_{n,j}| > ε) → 0.

Then S_n = Σ_{j=1}^n X_{n,j} converges in distribution to N(0, σ²).

The iid CLT is a special case, with X_{n,j} = X_j/√n. Basically, we're assuming that our triangular array is already normalized! And the second condition says that no individual variable can contribute significantly to the variance.

Proof. We’ll show again that the characteristic function converges to the Gaussian. Let φn,j be the characteristic 2 2 function for Xn,j , and let σn,j = E(Xn,j ). Then by the same inequality, ! σ2 t2  t3|X |3  n,j 2 2 n,j φn,j (t) − 1 − ≤ E min t Xn,j , . 2 6

When Xn,j is small, the second term here is smaller, and when it is large, we want to use the condition in the problem. So we split by the value of |X| with cutoff at ε, and this gives us the bound

" 2 # tεE(Xn,j ) ≤ t2 + (X2 ; |X | ≥ ε) 6 E n,j n,j for any ε (where the first term comes from the |X|3 term and the second from the X2 term). Notice that for any

zi , wj ∈ C with modulus at most 1,

n n n Y Y X zi − wj ≤ |zj − wj |

i=1 j=1 j=1

by writing a telescoping sum. We can apply this to the equation above, because φn,j (t) has modulus at most 1, and

2 2 2 2 σn,j = E(Xn,j ) ≤ ε + E(Xn,j ; |Xn,j | ≥ ε)

σ2 t2 and taking ε 0, n makes this go to 0. Thus, for sufficiently large n, the term n,j is always at most 1, → → ∞ 2 meaning

n n σ2 t2 ! n " "tε (X2 ) ## Y Y n,j X 2 E n,j 2 φn,j (t) − 1 − ≤ t + E(Xn,j ; |X| ≥ ε) 2 6 j=1 j=1 j=1 2 Both terms tend to 0 as ε → 0 and n → ∞ (the first term because the expected sum of Xn,j goes to a finite value,

70 second term by the second assumption), so the whole thing goes to 0. And finally this means that

n n 2 2 ! Y Y σn,j t φ (t) → 1 − n,j 2 j=1 j=1

 σ2t2  converges as n → ∞, and indeed the expression on the right goes to exp − 2 as desired.

15 October 30, 2019

Office hours will be 4–6pm today, and we can come and see our midterms. (For logistical reasons, we're not going to get them back during class.) Grades are on Stellar already: the average is fairly low and the distribution is very interesting. The score is out of 40: when we see our score, we should divide it by 40 and multiply by 200 to think about how well we did. Basically, if we got a 10 out of 40, this is problematic (and we should talk to Professor Sun about whether we should stay in the class); otherwise, we can interpret the score as how well we're doing. Moving forward, the class won't change much: the pace and difficulty will stay about the same as they have been so far, but since the problem may be that we've been running out of time, we might arrange a time outside of class and have the next exam at night (but longer than class time). Last time, we talked about the topology of weak convergence (of probability measures on a metric space). Notably, we discussed Prohorov's theorem, which was used to prove the Lindeberg-Feller Central Limit Theorem, which tells us that given a triangular array of random variables X_{n,j} (for 1 ≤ j ≤ n) that are independent with mean zero, if the variance of the sum converges to a finite value,

Σ_{j=1}^n E(X_{n,j}²) → σ²,

and large values of the X_{n,j} don't contribute much to the variance,

Σ_{j=1}^n E(X_{n,j}²; |X_{n,j}| ≥ ε) → 0 for every ε > 0,

then the row sum Σ_j X_{n,j} converges in distribution to N(0, σ²).

Example 131 Consider the number of cycles in a random permutation. Let π be a uniformly random permutation on [n] = {1, ··· , n}: there are n! such possibilities, and we pick one uniformly at random.

As n grows large, what is the behavior here? We’ll write the permutations in (sorted) cycle notation: for example, we can write π = (136)(2975)(48) to indicate that 1 maps to 3 maps to 6 maps to 1, and so on, and to ensure uniqueness of representation, we make sure that the smallest number in each cycle appears at the beginning, and also the cycles are sorted from left to right by minimum element. So now we can sample π sequentially from left to right: start by writing down “(1”, and pick a random integer (uniformly) for 1 to go to, say 3. (If it’s 1, we close the parentheses.) Then pick another random integer that is not 3, and whenever we return to 1, we close this cycle and ignore those numbers from now on.

This is nice, because we can now define the variable

I_{n,k} = 1{a right parenthesis appears after the kth number in sorted cycle notation}.

For example, (I_{n,k})_k takes the values (0, 0, 1, 0, 0, 0, 1, 0, 1) for our example permutation π. We're trying to determine the behavior of S_n = Σ_{k=1}^n I_{n,k}, which is the number of right parentheses, or equivalently the number of cycles. If we look at how we're sampling sequentially, the I_{n,k} are actually independent Bernoulli variables with success probability 1/(n − (k − 1)): there are k − 1 numbers that the kth entry cannot go to, and out of the remaining n − (k − 1) choices, exactly one (the current first entry of the open cycle) closes the parenthesis.

So now let’s try to apply the Central Limit Theorem. The expectation of Sn is

n n X 1 X 1 S = = = log n + O(1). E n n − (k − 1) j k=1 j=1

We can also calculate the variance: because Ber(p) has variance p(1 − p),

n n n X X 1  1 X 1 Var S = Var(I ) = 1 − = S − = log n + O(1). n n,k j j E n j 2 k=1 j=1 j=1
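These two formulas can be checked by direct simulation (a sketch; the sample sizes are arbitrary choices of ours): counting cycles of uniformly random permutations should give sample mean close to Σ_{j≤n} 1/j and sample variance close to Σ 1/j − Σ 1/j²:

```python
import random

random.seed(1)

def cycle_count(perm):
    # count cycles of a permutation given as a list: perm[i] is the image of i
    seen, cycles = [False] * len(perm), 0
    for i in range(len(perm)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles

n, trials = 1000, 2000
counts = []
for _ in range(trials):
    p = list(range(n))
    random.shuffle(p)
    counts.append(cycle_count(p))

harmonic = sum(1 / j for j in range(1, n + 1))        # ~ log n + 0.577
expected_var = harmonic - sum(1 / j**2 for j in range(1, n + 1))
mean = sum(counts) / trials
var = sum((c - mean) ** 2 for c in counts) / trials
print(round(harmonic, 2), round(mean, 2), round(expected_var, 2), round(var, 2))
```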

So we can apply the CLT as follows: recenter and rescale, defining the variables

X_{n,k} = (I_{n,k} − E[I_{n,k}]) / √(log n).

Now the first condition of Lindeberg-Feller holds (the sum of the variances of the X_{n,k} converges to 1), and the second condition is pretty easy to check too: each term of

Σ_{k=1}^n E[X_{n,k}²; |X_{n,k}| ≥ ε]

is just equal to 0 for all n large enough, because the numerator |I_{n,k} − E I_{n,k}| is at most 1, so 1/√(log n) eventually drops below ε no matter what happens to our random variable. So the conditions that we want do hold, and thus

Σ_{k=1}^n X_{n,k} = (S_n − E S_n)/√(log n) →d N(0, 1).

Since E S_n is log n plus O(1), and O(1)/√(log n) → 0, this also means that

(S_n − log n)/√(log n) →d N(0, 1).

So that's an example of the CLT where we don't have identically distributed variables! Notably, though, not everything converges to a Gaussian:

Example 132
Say that I_{n,k} (1 ≤ k ≤ n) are iid, distributed according to Ber(λ/n) for some constant λ > 0.

Then S_n = Σ_{k=1}^n I_{n,k} ∼ Bin(n, λ/n) has expectation λ by linearity of expectation, so the chance that S_n is extremely large (like 100λ) is pretty small. The idea is that most of the mass is supported on a bounded range, but it's also supported only on the integers, so it doesn't really make sense to expect something like a Gaussian.

We can carry out the calculations to make this more precise:

P(S_n = k) = (n choose k) (λ/n)^k (1 − λ/n)^{n−k} = (λ^k/k!) · ((n)_k/n^k) · (1 − λ/n)^{n−k}

(where (n)_k is the falling factorial n!/(n − k)!), and as we fix k and take n → ∞, the middle factor goes to 1, and the −k in the last exponent becomes irrelevant:

P(S_n = k) → e^{−λ} λ^k/k!,

so S_n actually converges in law to the Poisson distribution Pois(λ). Again, this is supported on the integers, so it's definitely not Gaussian! However, if we take λ large enough, we do recover the Gaussian limit:

(Pois(λ) − λ)/√λ →d N(0, 1)

as λ → ∞. Another way of saying this is that if we consider

(Bin(n, p) − np)/√(np(1 − p)),

which is a random variable with mean 0 and variance 1, then (either through calculation or the CLT) if we take p fixed and n large, this goes to the standard normal; on the other hand, if we take p = λ/n with n large (so np stays constant), Bin(n, p) goes to a Poisson distribution. Let's prove the Gaussian-limit fact:

Proposition 133
As λ → ∞, the standardized Poisson distribution (Pois(λ) − λ)/√λ converges in distribution to N(0, 1).

Proof. We can show this using characteristic functions. If Y ∼ Pois(λ), then

φ(t) = E(e^{itY}) = Σ_{k≥0} e^{itk} e^{−λ} λ^k/k! = exp(−λ(1 − e^{it}))

is the characteristic function of Y. And now if we take

Z_λ = (Y_λ − λ)/√λ,

the characteristic function of Z_λ is

ψ(t) = E[exp(it(Y_λ − λ)/√λ)] = exp(−it√λ) · E(e^{itY_λ/√λ}),

and plugging this into what we already know, we end up with

ψ(t) = exp(−λ(1 − e^{it/√λ}) − it√λ).

As λ gets large, we can do a series expansion e^{it/√λ} = 1 + it/√λ − t²/(2λ) + O(λ^{−3/2}):

ψ(t) = exp(−λ(−it/√λ + t²/(2λ) + O(λ^{−3/2})) − it√λ) = exp(−t²/2 + O(λ^{−1/2})) → e^{−t²/2},

which is the characteristic function of the standard normal.
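A quick numerical check of this proposition (a sketch; λ = 400 and the evaluation point are arbitrary choices of ours): summing the exact Poisson pmf up to λ + √λ should come out close to Φ(1), up to a continuity correction of order λ^{−1/2}:

```python
import math

def pois_cdf(lam, m):
    # P(Y <= m) for Y ~ Pois(lam), computed with log-probabilities to avoid overflow
    return sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
               for k in range(m + 1))

lam = 400
exact = pois_cdf(lam, int(lam + math.sqrt(lam)))  # P((Y - lam)/sqrt(lam) <= 1)
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
print(round(exact, 3), round(Phi(1.0), 3))  # close, up to the continuity correction
```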

Professor Sun once taught a class similar to 18.600, so she feels a little silly going over this next topic, but apparently many of us haven’t taken an intro probability class before.

Example 134
We're going to review all of the standard probability distributions. Consider Bernoulli trials, which means that we have I_k distributed iid according to Ber(p) for all k ≥ 1.

Say that 1 indicates a success and 0 a failure: then the number of successes by time m is distributed according to the binomial distribution

B_m = Σ_{k=1}^m I_k ∼ Bin(m, p).

Meanwhile, the time of the first success, which is the first index k such that I_k = 1, is distributed according to the geometric distribution

Geo(p): P(G = k) = (1 − p)^{k−1} p.

As an extension, the time X_r of the rth success is distributed as

X_r =d G_1 + ⋯ + G_r (where G_i, G are all iid Geo(p)),

which is the negative binomial distribution NegBin(r, p). Specifically, for any t ≥ r,

P(X_r = t) = (t − 1 choose r − 1) (1 − p)^{t−r} p^r

(because we need exactly r − 1 successes in the first t − 1 trials, and then a success at time t). By definition, this is also the geometric distribution convolved with itself r times.

So now imagine that we scale p = 1/n to be small, so the vast majority of our experiments are failures. If we also scale time by n (so that we actually see successes at a reasonable rate instead of very rarely), then by time t we'll have done nt experiments, so

B_{nt} ∼ Bin(nt, 1/n) →d Pois(t)

converges to the Poisson distribution. The rescaled time of the first success then converges to a continuous random variable,

(1/n) Geo(1/n) →d Exp,

the exponential distribution with density 1{t ≥ 0} e^{−t} (we can prove this with characteristic functions or direct calculations), and the rescaled time of the rth success converges to

(G_1 + ⋯ + G_r)/n ∼ (1/n) NegBin(r, 1/n) →d E_1 + ⋯ + E_r,

where the E_i are iid exponential. This is the exponential law convolved with itself r times, which is the Gamma distribution Gamma(r). The gamma density is good to remember, and it can be derived by taking the limit of the negative binomial distribution:

P((1/n) NegBin(r, 1/n) ∈ [z, z + dz]) = (nz − 1 choose r − 1) (1 − 1/n)^{nz−r} (1/n)^r (n dz) ≈ ((nz)^{r−1}/(r − 1)!) · n^{1−r} · (1 − 1/n)^{nz−r} dz,

and now if we fix z and take n → ∞, this approaches

(z^{r−1} e^{−z}/(r − 1)!) dz

for all z ≥ 0. For general α > 0 (not necessarily an integer), the Gamma(α) distribution has the analogous density

1{x ≥ 0} x^{α−1} e^{−x}/Γ(α),

where Γ(α) is the normalizing constant! (In particular, Γ is a generalization of the factorial, with Γ(r) = (r − 1)! for positive integers r.) And when do we expect to see a Gaussian distribution? This happens if we take time to be very large relative to our sampling rate, which means we collect enough samples.
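The convergence Geo(1/n)/n →d Exp can be seen directly from the exact tail formulas (a sketch; P(Geo(p) > m) = (1 − p)^m, so P(Geo(1/n)/n > t) = (1 − 1/n)^{⌊nt⌋} → e^{−t}):

```python
import math

def geo_tail_scaled(n, t):
    # P(Geo(1/n)/n > t) = (1 - 1/n)^{floor(n t)}
    return (1 - 1 / n) ** math.floor(n * t)

for t in (0.5, 1.0, 2.0):
    approx = geo_tail_scaled(10_000, t)
    print(t, round(approx, 4), round(math.exp(-t), 4))  # tails nearly agree
```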

Remark 135. This is a bit outside the scope of this class, but remember that if π is a random permutation of [n] and S_n is the number of cycles of π, we showed that

(S_n − log n)/√(log n) →d N(0, 1).

If we let C_{n,k} be the number of cycles in π of length k, then the tuple (C_{n,1}, C_{n,2}, ⋯, C_{n,n}) converges in distribution to (Y_k)_{k≥1}, where the Y_k are independent with Y_k ∼ Pois(1/k). The proof is only a tiny bit outside the scope of what we've done so far: see Arratia and Tavaré's paper [3] for more details (the main idea is that a large number of trials of a rare event is approximately Poisson, and the events that two different elements lie in cycles of length k are close to independent). We can at least easily show that E C_{n,k} = 1/k: combinatorially, there are (n)_k/k possible cycles of length k, and each occurs with probability 1/(n)_k (explore the cycle one element at a time).

With the remaining time, let’s review some weak convergence ideas, with a reminder of what results and proofs we should know for this class in general. If we let S be a complete separable metric space with Borel sigma-algebra B, and µn =⇒ µ (converges weakly in distribution), let P be the space of all probability measures on S (this is a Polish space).

Theorem 136 (Portmanteau theorem)
The following are equivalent:

• µ_n ⟹ µ.

• ∫ f dµ_n → ∫ f dµ for all bounded uniformly continuous f.

• lim sup µ_n(F) ≤ µ(F) for all closed F ⊆ S.

• lim inf µ_n(G) ≥ µ(G) for all open G ⊆ S.

• lim µ_n(A) = µ(A) for all A with µ(∂A) = 0.

We also talked about Prohorov’s theorem, which tells us that tightness is equivalent to relative compactness.

Theorem 137 (Skorohod)

If µn =⇒ µ, then there exists a probability space (Ω, F, P) and measurable mappings Xn,X :Ω → S such that

LXn = µn, LX = µ, and Xn converges to X almost surely under P.

We should know the statements of all of these theorems, and we should know a bit more in the special case of the real line (S = R). There, the µ_n can be represented by their cdfs F_n, and µ by its cdf F. Say that F_n ⟹ F if F_n(x) → F(x) for all x where F is continuous: then µ_n converges weakly to µ if and only if the cdfs converge, F_n ⟹ F. (This is a consequence of the Portmanteau theorem!)

Tightness in R is easy to characterize: it’s equivalent to say that for any ε > 0, infn{Fn(x) − Fn(−x)} ≥ 1 − ε. So showing Prohorov’s theorem for R means that we just need to extract subsequences for our Fns which don’t have

75 mass escaping from the side: we can use the Helly selection theorem to finish, and we should know that proof (it’s a diagonalization argument)! Skorohod’s theorem is somewhat abstract in general, but it’s simple over R, and it’s also very useful. If F is a one-to-one function, then taking a random variable U uniformly distributed from 0 to 1, we claim that F −1(U) ∼ µ. This is true because −1 P(F (U) ≤ x) = P(U ≤ F (x)) = F (x) (first equality because F is monotone and one-to-one). Thus, F −1(U) is distributed according to µ, and we’re done! More generally, if it’s not one-to-one, let

$$X(u) = \inf\{x : F(x) \ge u\}.$$

Then X(u) ≤ x ⇐⇒ F(x) ≥ u because F is right-continuous, and thus P(X(U) ≤ x) = P(F(x) ≥ U) = F(x) again. So given any cdf, we can just define a rough inverse! Now let (Ω, F, P) be ([0, 1], B, Leb), and define X, Xn : Ω → R via

$$X(\omega) = \inf\{x : F(x) \ge \omega\}, \qquad X_n(\omega) = \inf\{x : F_n(x) \ge \omega\}.$$

Because Fn converges to F except at countably many points, Xn converges to X almost surely, and the laws of Xn, X are µn, µ, so we're done. The proof of Skorohod's theorem in general is related to this idea, but we don't have the simple mapping anymore, so we have to do something more complicated.
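As a quick numerical sanity check of this quantile-transform construction (a Python sketch, not part of the lecture; the exponential distribution and the tolerance are arbitrary choices), we can sample from a law by plugging uniforms into the rough inverse X(u) = inf{x : F(x) ≥ u}:

```python
import math
import random

def quantile(u):
    # Rough inverse X(u) = inf{x : F(x) >= u} for the Exp(1) cdf
    # F(x) = 1 - e^{-x}; F is continuous and strictly increasing here,
    # so this is just the ordinary inverse function.
    return -math.log(1.0 - u)

random.seed(0)
n = 200_000
samples = [quantile(random.random()) for _ in range(n)]

# The empirical cdf at x = 1 should be close to F(1) = 1 - 1/e.
emp = sum(s <= 1.0 for s in samples) / n
print(emp)
```

The same one-liner, run with Fn in place of F, is exactly the coupling used in the proof above.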

16 November 4, 2019

We had another attendance quiz today:

Problem 138
Let X, Xi be iid exponential random variables: this means that they have density 1{x > 0} e^{−x} dx. Find bn, Y such that
$$\max_{1 \le i \le n} X_i - b_n \overset{d}{\longrightarrow} Y.$$

Solution. First of all, we know the distribution of the maximum:
$$\mathbb{P}\Big(\max_{1 \le i \le n} X_i \le t\Big) = \mathbb{P}(X \le t)^n = (1 - e^{-t})^n,$$

so if we scale such that t = log n + x,
$$\mathbb{P}\Big(\max_{1 \le i \le n} X_i \le \log n + x\Big) = \left(1 - e^{-(\log n + x)}\right)^n = \left(1 - \frac{e^{-x}}{n}\right)^n \to \exp(-e^{-x}).$$

Thus, if we just take bn = log n, then max_{1≤i≤n} Xi − bn converges in distribution to the random variable Y with
$$\mathbb{P}(Y \le y) = e^{-e^{-y}},$$
which is called the Gumbel distribution. This is an example of an extreme value statistic (it's the limiting law of the extreme of a bunch of samples).
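We can watch this convergence numerically (a quick illustration, not from the lecture; the sample sizes are arbitrary):

```python
import math
import random

random.seed(1)
n, trials = 500, 5000
# Sample the max of n iid Exp(1) variables, centered by b_n = log n.
vals = [max(random.expovariate(1.0) for _ in range(n)) - math.log(n)
        for _ in range(trials)]

# The Gumbel cdf is P(Y <= y) = exp(-e^{-y}); compare the fit at y = 0,
# where the cdf equals 1/e.
emp = sum(v <= 0.0 for v in vals) / trials
print(emp)
```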

(By the way, convergence in law is the same as weak convergence.)

Let's do another example:

Example 139
Let Z, Zi ∼ N(0, 1) be standard Gaussians with density g(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}: what does the maximum of Z1, · · · , Zn look like?

If we define
$$\Psi(x) = \mathbb{P}(Z \ge x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz,$$
this has no closed form expression, but we can get a nice upper bound, especially when x is large:

$$\Psi(x) \le \frac{\mathbb{E}(Z;\, Z \ge x)}{x} = \frac{1}{\sqrt{2\pi}\,x} \int_x^\infty z \exp\left(-\frac{z^2}{2}\right) dz,$$
and now we can integrate this with a substitution:
$$= \frac{1}{\sqrt{2\pi}\,x} \exp\left(-\frac{x^2}{2}\right) = \frac{g(x)}{x}.$$
This is only an upper bound, and it's not very useful for small x. But it is pretty tight for large x, and we'll show this with an almost-matching lower bound: integrate

$$\sqrt{2\pi}\,\Psi(x) = \int_x^\infty \frac{1}{z}\cdot z \exp\left(-\frac{z^2}{2}\right) dz$$
by parts to get
$$= \left[-\frac{1}{z}\exp\left(-\frac{z^2}{2}\right)\right]_x^\infty - \int_x^\infty \frac{1}{z^2}\exp\left(-\frac{z^2}{2}\right) dz.$$

The first term evaluates to \frac{1}{x}\exp(-\frac{x^2}{2}), and the second can be integrated by parts again: since \frac{1}{z^2} = \frac{1}{z^3}\cdot z, this expression becomes
$$\frac{1}{x}\exp\left(-\frac{x^2}{2}\right) - \frac{1}{x^3}\exp\left(-\frac{x^2}{2}\right) + \int_x^\infty \frac{3}{z^4}\exp\left(-\frac{z^2}{2}\right) dz.$$
We want a lower bound, so note that we can ignore the last term because it's adding a nonnegative amount: this means
$$\Psi(x) \ge \frac{g(x)}{x}\left(1 - \frac{1}{x^2}\right).$$

So for large x, take some t ∈ R with |t| ≪ x: then
$$\frac{\Psi(x+t)}{\Psi(x)} \sim \frac{\frac{1}{\sqrt{2\pi}(x+t)}\exp\left(-\frac{(x+t)^2}{2}\right)}{\frac{1}{\sqrt{2\pi}\,x}\exp\left(-\frac{x^2}{2}\right)} \sim \exp\left(-xt - \frac{t^2}{2}\right),$$
and the main term here is proportional to e^{−tx}. So a change of variables tells us that
$$\frac{\Psi\left(x + \frac{t}{x}\right)}{\Psi(x)} \to \exp(-t).$$

With this, we can go back to our original question: we have Zi that are iid standard Gaussians, and we want to understand the distribution of Mn = max Zi. Again, we can write out the distribution:
$$\mathbb{P}(M_n \le x) = (1 - \Psi(x))^n.$$

Just like in the quiz problem, we want to pick x so that Ψ(x) is approximately 1/n: pick x = bn such that Ψ(bn) = 1/n. (This is a deterministic real number, because Ψ is monotone.) So then

$$\mathbb{P}\left(M_n \le b_n + \frac{t}{b_n}\right) = \left(1 - \Psi\left(b_n + \frac{t}{b_n}\right)\right)^n = \left(1 - \frac{e^{-t}}{n}(1 - o_n(1))\right)^n$$

by the calculation above, and then this converges to exp(−e^{−t}) (just like before!). So the only difference here is that our variables need to be rescaled:
$$b_n\left(\max_{1 \le i \le n} Z_i - b_n\right) \overset{d}{\longrightarrow} \text{Gumbel}.$$

Remark 140. Not all variables behave this way, though: it basically depends on the tail behavior of our distribution. But it is true that there is a large class of variables for which this happens: that's why the Gumbel is an extreme value statistic!

And we can actually say a bit more about the bn values we have here: we can estimate bn by just solving
$$\frac{g(b_n)}{b_n} = \frac{1}{n}, \qquad \text{i.e.} \qquad \frac{1}{\sqrt{2\pi}\,b_n}\exp\left(-\frac{b_n^2}{2}\right) = \frac{1}{n}.$$

So to leading order, it seems that we want b_n² = 2 log n + (lower order): to find those next lower order terms, we want to account for the bn prefactor as well, and we can do this by setting
$$\frac{b_n^2}{2} = \log n - \frac{1}{2}\log\log n + O(1) \implies b_n = \sqrt{2\log n}\left(1 - \frac{\log\log n + O(1)}{4\log n}\right).$$

(And the estimate of Ψ(x) as g(x)/x is good enough here, because the (1 − 1/x²) factor only contributes a 1 − O(1)/log n correction.)

We'll do some more examples of weak convergence on our homework: it's useful in general to be able to show that one thing converges to another, so there will definitely be something about that on our next exam! We'll spend a bit of time talking about weak convergence on R^d here: we should also read Section 3.10 in the textbook, but we'll go over some of the main points.
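Before moving on, here's a quick numerical check of the b_n asymptotics (a Python sketch, not from the lecture; it uses the identity Ψ(x) = erfc(x/√2)/2, and the comparison tolerance is ad hoc since the formula carries an O(1) inside):

```python
import math

def Psi(x):
    # Gaussian upper tail P(Z >= x), via the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2))

def b(n):
    # Solve Psi(b_n) = 1/n by bisection; Psi is strictly decreasing.
    lo, hi = 0.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if Psi(mid) > 1.0 / n:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 10**6
L = math.log(n)
bn = b(n)
approx = math.sqrt(2 * L) * (1 - math.log(L) / (4 * L))
print(bn, approx)  # the gap reflects the O(1) term in the expansion
```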

Definition 141
Let µ, µn be probability measures on R^d: define the generalized cdf
$$F(x) = \mu\left(\prod_{i=1}^d (-\infty, x_i]\right)$$
for all x ∈ R^d.

We showed in homework that F is monotone and right-continuous, and also that the measure of any d-dimensional rectangle can be found by an inclusion-exclusion argument:
$$\mu\left(\prod_{i=1}^d (a_i, b_i]\right) = \sum_{v \in V} (-1)^{\#\{i\,:\,v_i = a_i\}}\, F(v),$$
where V is the set of 2^d corners of the rectangle (each v_i ∈ {a_i, b_i}).

Then µn =⇒ µ if and only if Fn converges weakly to F , meaning that

Fn(x) → F (x) ∀x where F is continuous.

There’s also a notion of a characteristic function here: define Z χµ(t) = exp (iht, xi) dµ(x)

78 d for t ∈ R (notice that this integrand is still bounded in the unit disk). Then there’s a Fourier inversion theorem here as well, but it’s easier to state by assuming the boundary has zero measure:

Theorem 142
If A = ∏_{i=1}^d [a_i, b_i] with µ(∂A) = 0, then
$$\mu(A) = \lim_{T\to\infty} \int_{[-T,T]^d} \prod_{j=1}^d \frac{e^{-it_j a_j} - e^{-it_j b_j}}{2\pi i t_j}\, \varphi_\mu(t)\, dt.$$

With this, the one-dimensional continuity theorem also generalizes directly:

Theorem 143 d Let µn be probability measures on R with characteristic functions φn. Then

• If µn =⇒ µ, then the φn converge pointwise to φµ.

• If φn converge pointwise to φ and φ is continuous at t = 0, then φ is the characteristic function of some µ

and µn =⇒ µ.

Theorem 144 (Cramér–Wold)
If Xn, X are R^d-valued random variables, then Xn converges to X in distribution if and only if the one-dimensional distributions converge:
$$\langle\theta, X_n\rangle \overset{d}{\longrightarrow} \langle\theta, X\rangle \qquad \forall\, \theta \in \mathbb{R}^d.$$

Proof. The forward direction is clear (by definition). For the other direction, note that for all θ,
$$\mathbb{E}[\exp(i\langle\theta, X_n\rangle)] \to \mathbb{E}[\exp(i\langle\theta, X\rangle)],$$
because ⟨θ, Xn⟩ is a real random variable converging in distribution to ⟨θ, X⟩, and we're taking expectations of the bounded continuous function e^{is}. The limit is a characteristic function and it's continuous in θ, so we're done by the continuity theorem.

Corollary 145 (CLT for iid sequences in R^d)
Let X, X^{(i)} be iid R^d-valued random variables, such that E(‖X‖²) is finite. Define
$$\mu = \mathbb{E}X \in \mathbb{R}^d, \qquad \Sigma = \operatorname{Cov}(X) = \mathbb{E}[(X-\mu)(X-\mu)^T],$$
where Cov(X) is the d × d covariance matrix given by Σ_{ij} = Cov(X_i, X_j) (covariances between the different coordinates of X). Then
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) \overset{d}{\longrightarrow} N(0 \in \mathbb{R}^d,\, \Sigma)$$
converges to the multivariate normal distribution.

Proof. Clearly, we’re going to show that the characteristic functions converge, so let’s start by figuring out the characteristic function for the multivariate Gaussian. Σ is a d × d positive semidefinite matrix, so it has a Cholesky T d k d decomposition AA , where A ∈ R × : now if Y ∼ N(0, Σ), it’s equivalent to saying that Y = AZ where Z is a standard Gaussian N(0,Ik×k ).

79 So the characteristic function for this distribution is

T φΣ(θ) = E[exp(ihθ, Y i)] = E[exp(ihθ, AZi)] = E[exp(ihA θ, Zi)].

But Z is Gaussian with the identity covariance, so Z = (Z1,Z2, ··· ,Zk ) with all entries standard Gaussian. Thus this is just  ||AT θ||2   1   hθ, Σθi = exp − = exp − (θT AAT θ) = exp − . 2 2 2 So to finish the proof, we just need to show the left side’s characteristic function converges to this: it suffices to show that for all θ ∈ d that R n !! 1 X exp ihθ, √ (X − µ)i → φ (θ). E n i Σ i=1 We know by one-dimensional CLT that

n 1 X d √ hθ, X − µi −→ N(0, Var(hθ, Xi)) = hθ, Σθi. n i i=1 Thus, the characteristic functions converge in the above expression: !! 1 X  hθ, Σθi exp i √ hθ, X − µi → exp − , E n i 2 i as desired.
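A quick simulation of this corollary (my own illustration, not from the notes; the choice X = (U, U + V) with U, V uniform on [−1, 1] is arbitrary) shows the scaled sums acquiring the covariance Σ:

```python
import random

random.seed(2)
n, trials = 400, 4000
samples = []
for _ in range(trials):
    s1 = s2 = 0.0
    for _ in range(n):
        u = random.uniform(-1, 1)
        v = random.uniform(-1, 1)
        # X = (U, U + V) has mean 0 and covariance [[1/3, 1/3], [1/3, 2/3]].
        s1 += u
        s2 += u + v
    samples.append((s1 / n**0.5, s2 / n**0.5))

# The empirical covariance of the scaled sums should approach Sigma.
m1 = sum(a for a, _ in samples) / trials
m2 = sum(b for _, b in samples) / trials
c11 = sum((a - m1) ** 2 for a, _ in samples) / trials
c12 = sum((a - m1) * (b - m2) for a, b in samples) / trials
c22 = sum((b - m2) ** 2 for _, b in samples) / trials
print(c11, c12, c22)  # roughly 1/3, 1/3, 2/3
```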

So this finishes what we’re covering from chapter 3 of Durrett, and now we’re going to move on to martingales and conditional expectation!

Definition 146 Say we’re working on (Ω, F, P), and say that A, B ∈ F with P(A) > 0. Then the conditional probability

$$\mathbb{P}(B \mid A) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(A)}.$$

In particular, P(B|A) = P(B) if A, B are independent. If P(A) > 0 and X ∈ L¹ is integrable, we can define
$$\mathbb{E}(X \mid A) = \frac{\mathbb{E}(X;\, A)}{\mathbb{P}(A)}.$$

Then if X,Y are both random variables on (Ω, F, P) and P(Y = y) > 0, we can define

$$g(y) = \mathbb{E}(X \mid Y = y) = \frac{\mathbb{E}(X;\, Y = y)}{\mathbb{P}(Y = y)}.$$

In particular, if Y satisfies P(Y = y) > 0 for all y ∈ supp LY , then we can define

E[X|Y ] = g(Y ).

Note that in general, it doesn't make sense to condition on events of probability zero in the definition above. However, we will often want to do something that looks like that: for example, say that X and Y are jointly distributed via
$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}\right)$$
for some nondegenerate covariance matrix with det Σ > 0. In general, it's not true that a Gaussian N(0, Σ = AA^T) will have a density on R^d, unless A is actually an invertible d × d matrix. But if we assume that det Σ > 0 in this case, L(X, Y) is going to be a measure on R² with some density g_Σ (it's a calculus argument to figure out how to work this out). But the point is that we would like to have
$$\mathbb{E}(X \mid Y = y) = \frac{\int x\, g_\Sigma(x, y)\, dx}{\int g_\Sigma(x, y)\, dx},$$
so the notion of conditioning from an introductory probability class isn't exactly enough: we'll talk about this more next time.

17 November 6, 2019

The second exam will be similar difficulty to the first one (and weighted equally). The time will be scheduled between 5 and 9pm on December 4th, depending on our availability, so that we’ll have more time. Today, we’ll talk some more about conditional expectation and martingales! Recall that if X is a random variable on (Ω, F, P) and E|X| < ∞, then we can define

$$\mathbb{E}(X \mid A) = \frac{\mathbb{E}(X;\, A)}{\mathbb{P}(A)}$$

for any event A with positive probability. In addition, if Y is another random variable with P(Y = y) positive, we can also condition:
$$\mathbb{E}(X \mid Y = y) = \frac{\mathbb{E}(X;\, Y = y)}{\mathbb{P}(Y = y)}.$$
However, this doesn't really work as generally as we want it to: we want to define E(X|Y) even when P(Y = y) = 0, particularly when Y is completely nonatomic. Since it doesn't make sense to condition on individual events of measure zero, the key idea is that we shouldn't think about individual events {Y = y}: instead, think about Y as a whole, and try to define E(X|Y) to be some other random variable. This should be σ(Y)-measurable, and it should satisfy the key property that
$$\mathbb{E}[\mathbb{E}(X|Y)\, h(Y)] = \mathbb{E}[X\, h(Y)],$$
as long as both sides are well-defined. Let's formalize this:

Definition 147 Say X is a random variable on (Ω, F, P), and say that E|X| < ∞. If G is a sub-σ-algebra of F, then Y is a version of E(X|G) if

• Y ∈ G, meaning Y is measurable with respect to G, • For all A ∈ G, E(X; A) = E(Y ; A).

It’s an exercise to us to reconcile this with the “conditional probability” that we learned in introductory probability! We need to show that this random variable Y exists and also that it’s unique in some way: also, the notation suggests that Y should be integrable, so we should check that as well. But eventually we can use this to define

E(X|Y ) = E(X|σ(Y )).

Lemma 148 (Integrability) If Y is a version of E(X|G), then E|Y | ≤ E|X| < ∞ (so Y is integrable).

Proof. Because Y is G-measurable, {Y > 0} is in G, meaning that
$$\mathbb{E}[Y_+] = \mathbb{E}(Y;\, Y > 0) = \mathbb{E}(X;\, Y > 0),$$
and similarly
$$\mathbb{E}[Y_-] = \mathbb{E}(-Y;\, Y < 0) = \mathbb{E}(-X;\, Y < 0),$$
and adding these says that
$$\mathbb{E}|Y| = \mathbb{E}[X;\, Y > 0] + \mathbb{E}[-X;\, Y < 0] \le \mathbb{E}|X|.$$

Lemma 149 (Uniqueness) If Y, Y′ are both versions of E(X|G), then Y = Y′ almost surely.

Proof. Both functions are G-measurable, so let
$$A_\varepsilon = \{Y - Y' \ge \varepsilon\}.$$
(This is G-measurable.) By the second condition in the definition, this means that
$$0 = \mathbb{E}(X;\, A_\varepsilon) - \mathbb{E}(X;\, A_\varepsilon) = \mathbb{E}[Y - Y';\, A_\varepsilon] \ge \varepsilon\, \mathbb{P}(A_\varepsilon),$$
so P(A_ε) = 0 for all ε > 0. Take ε → 0 to conclude that Y ≤ Y′ almost surely, and the proof is symmetric, so Y′ ≤ Y almost surely as well.

Definition 150 If µ, ν are both measures on (Ω, F), say that ν is absolutely continuous with respect to µ, denoted ν  µ, if for all A ∈ F with µ(A) = 0, we have ν(A) = 0.

Theorem 151 (Radon–Nikodym) Say that µ, ν are σ-finite measures on (Ω, F), such that ν ≪ µ. Then there is an F-measurable function f such that
$$\nu(A) = \int_A f\, d\mu$$
for all A ∈ F (the usual notation is f = dν/dµ).

In other words, ν has a density with respect to µ. We won't prove this right now, but we might come back to it a bit later. Notice that we've done something similar already: in the proof of Cramér's theorem, we did the exponential tilt
$$\frac{d\mathbb{P}_\theta}{d\mathbb{P}} = \frac{\exp(\theta X)}{m(\theta)}.$$
We won't prove the theorem for now, but it will allow us to show existence of the conditional expectation: say X is a random variable on (Ω, F, P), and say that E|X| < ∞. First, define

µ = P|G,

82 which is a probability measure on (Ω, G). If we define

ν(A) = E(X+; A) ∀A ∈ G,

this is a finite measure with ν(Ω) = E(X+) < ∞ by assumption. Then ν is absolutely continuous with respect to µ, because if A is an event of probability zero, then E(X+; A) = 0. So by the Radon–Nikodym theorem, there exists a function Y = dν/dµ ∈ G such that
$$\nu(A) = \int_A Y\, d\mu \qquad \forall A \in \mathcal{G}.$$

But ν(A) = E(X+; A), and the right-hand side is E(Y; A): this is exactly the defining property of the conditional expectation! Thus Y is a version of E(X+|G), and we can subtract a version of E(X−|G) (constructed the same way) to get the result. We should also read the textbook for some basic properties of the conditional expectation: it's linear and monotone, it satisfies a version of Jensen's inequality, and so on. But there are a few properties which are not analogous to the usual expectation:

Proposition 152
If G_small is a sub-σ-algebra of G ⊆ F, and E(X|G) ∈ G_small (that is, E(X|G) is G_small-measurable), then E(X|G) = E(X|G_small).

Proof. Let Y = E(X|G): by assumption, Y is measurable with respect to G_small, and it suffices to show that for all A ∈ G_small, E(X; A) = E(Y; A). But this is true by definition, because this holds for all A ∈ G, so it must hold in particular for all A ∈ G_small.

Proposition 153 (Tower property)
Say that G_small ⊆ G ⊆ F, as before. Then
$$\mathbb{E}(\mathbb{E}(X|\mathcal{G}_{\mathrm{small}})\,|\,\mathcal{G}) = \mathbb{E}(\mathbb{E}(X|\mathcal{G})\,|\,\mathcal{G}_{\mathrm{small}}) = \mathbb{E}(X|\mathcal{G}_{\mathrm{small}}).$$

(The proof here is similar.)

Proposition 154
If E|Y| is finite, E|XY| is finite, and X ∈ G, then
$$\mathbb{E}(XY\,|\,\mathcal{G}) = X\,\mathbb{E}(Y\,|\,\mathcal{G})$$
(in other words, X behaves like a constant with respect to G, because we know everything about it already).

Proof. Let’s start by considering a simple case: we’ll show that the equality holds if X = 1B for B ∈ G. We want to show that the right hand side is a version of E(XY |G): it’s indeed measurable because it’s the product of measurable functions, and we now just need to check that

E(XY ; A) = E(XE(Y |G); A) for all A ∈ G. Indeed, this is true because

E(XY ; A) = E(Y ; A ∩ B) = E(E(Y |G); A ∩ B)

83 (because A ∩ B ∈ G is measurable), and rearranging makes this indeed equal to E(XE(Y |G); A). By linearity, this means this extends to simple and then measurable functions in general.

One last note here is that we can sometimes interpret conditional expectation as an orthogonal projection: if G ⊆ F, then the space of random variables L2(G) is a subspace of L2(F).

Proposition 155
If X ∈ L²(F), then E(X|G) is the orthogonal projection of X onto L²(G).

Proof. First of all, E(E(X|G)²) ≤ E(X²) < ∞ (by the conditional version of Jensen's inequality), so E(X|G) is indeed in L²(G). To show orthogonality, we need to show that the difference between X and E(X|G) is orthogonal to any element of L²(G): that is, if Z ∈ L²(G), then
$$\mathbb{E}[Z(X - \mathbb{E}(X|\mathcal{G}))] = 0.$$

But the first term is E[XZ], and the second term is E[Z E(X|G)]. Since Z is measurable with respect to G, we can put it inside the conditional expectation to make this whole expression
$$\mathbb{E}[XZ] - \mathbb{E}[\mathbb{E}(XZ|\mathcal{G})],$$
and the outer expectation is conditioning on the trivial σ-algebra, so by the tower property, this reduces to E[XZ] − E[XZ] = 0, as desired.
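On a finite sample space this projection picture is completely concrete (a toy illustration of my own; the numbers are made up): E(X|G) averages X over the blocks of the partition generating G, and the residual X − E(X|G) is orthogonal to every G-measurable Z.

```python
# Finite sample space with uniform probability; G generated by a partition.
omega = [0, 1, 2, 3, 4, 5]
blocks = [[0, 1, 2], [3, 4, 5]]
X = {0: 1.0, 1: 4.0, 2: 7.0, 3: 2.0, 4: 2.0, 5: 8.0}

# E(X|G) is constant on each block, equal to the block average.
condX = {}
for B in blocks:
    avg = sum(X[w] for w in B) / len(B)
    for w in B:
        condX[w] = avg

# Orthogonality: for any G-measurable Z, E[Z (X - E(X|G))] = 0.
Z = {w: (10.0 if w in blocks[0] else -3.0) for w in omega}
inner = sum(Z[w] * (X[w] - condX[w]) for w in omega) / len(omega)
print(condX[0], condX[3], inner)  # 4.0 4.0 0.0
```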

There’s one special topic in our textbook about regular probability distributions, which we’ll skip for now. We’ll now move on to martingales, which have many interesting applications. Throughout this discussion, we’re on a probability space (Ω, F, P).

Definition 156

A filtration is a sequence (Fn)n≥0 of σ-fields such that F0 ⊆ F1 ⊆ · · · ⊆ F (we gradually reveal more information).

A sequence of random variables (Xn)n≥0 on (Ω, F, P) is adapted to the filtration (Fn)n≥0 if Xn ∈ Fn for all n.

The natural filtration is Fn = σ(X1, · · · , Xn).

(Note that the above definition doesn’t involve any probability.)

Definition 157

Given a filtration F0 ⊆ F1 ⊆ · · · on (Ω, F, P), say that (Xn)n≥0 is a martingale with respect to Fn if Xn is adapted to Fn, E|Xn| < ∞ for all n (doesn’t have to be uniformly bounded), and

E(Xn+1|Fn) = Xn.

In other words, given everything we know up to the nth step, the next step’s expected value is equal to the current

value. If we have only an inequality E(Xn+1|Fn) ≤ Xn, this is a supermartingale, and if we have E(Xn+1|Fn) ≥ Xn, this is a submartingale.

Example 158
Take Xn to be a simple random walk on Z: start with X0 = 0, and let Xn = Σ_{i=1}^n Yi, where the Yi are iid, taking values in {1, −1} with probability 1/2 each. Then this is indeed a martingale, because each step has probability 1/2 of adding 1 to Xn and probability 1/2 of subtracting 1.

So the first observation we make:

E(Xn) = E [E(Xn|Fn−1)] = EXn−1

first by the tower property, and then by the martingale property. So this really tells us (iteratively) that

E(Xn) = E(Xn−1) = ··· = E(X0).

But we will show that
$$\mathbb{E}X_0 = \mathbb{E}X_\tau$$
for suitable random times τ as well, which will have many applications!

Definition 159 Let τ be a nonnegative integer-valued random variable. Then τ is a stopping time with respect to a filtration

(Fn)n≥0 if {τ = n} ∈ Fn for all n.

The idea is that a martingale can model our money in a stock market, and we can only choose to stop our random

process τ = n based on the information from what we already know (Fn).

Example 160

Again, let Xn be a simple random walk on Z, and let X0 = 0. Define

τ = inf{n : Xn = 1}.

Then τ is a valid stopping time: it's a nonnegative integer-valued random variable, it's finite almost surely, and at any time we can decide whether τ = n (this is measurable with respect to what we already know). But it's obvious that 0 = EX0 ≠ EXτ = 1.

On the other hand, is there any reason to expect that EX0 = EXτ ? The key observation is that if Xn is a martingale and τ is a stopping time, then the stopped process

$$Y_n = X_{n\wedge\tau}, \qquad n \wedge \tau = \min(n, \tau)$$

is also a martingale, because Yn − Xn is only nonzero if τ < n:
$$Y_n = \sum_{i=0}^{n-1} 1\{\tau = i\}\, X_i + 1\{\tau \ge n\}\, X_n.$$

Once we check the integrability property (exercise), notice that the first sum is measurable with respect to F_{n−1}, and 1{τ ≥ n} = 1 − Σ_{i=0}^{n−1} 1{τ = i}, so this is also in F_{n−1}. So if we condition on F_{n−1}, we can take out the known terms:

$$\mathbb{E}(Y_n\,|\,\mathcal{F}_{n-1}) = \sum_{i=0}^{n-1} 1\{\tau = i\}\, X_i + 1\{\tau \ge n\}\, \mathbb{E}(X_n\,|\,\mathcal{F}_{n-1}),$$

and by the martingale property of X, this gives us back Y_{n−1} because E(Xn|F_{n−1}) = X_{n−1}. So the stopped process is also a martingale! This means that the expectation of Yn is constant (by our earlier computation), so
$$\mathbb{E}Y_n = \mathbb{E}Y_0 \implies \mathbb{E}X_0 = \mathbb{E}(X_{n\wedge\tau}).$$

As n → ∞, n ∧ τ → τ almost surely: that is, there exists n(ω) such that τ(ω) ∧ n = τ(ω) for all n ≥ n(ω). Thus X_{n∧τ} converges to Xτ almost surely, so it's reasonable to expect that with enough integrability conditions, we'll indeed get convergence of the expectations, giving us EX0 = EXτ. So much of this part of the class will be about finding conditions under which
$$\mathbb{E}(X_{n\wedge\tau}) \to \mathbb{E}X_\tau.$$
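The simple random walk example above can be simulated directly (a sketch, not from the lecture; the horizon and sample sizes are arbitrary): the stopped process has mean ≈ 0 at any fixed time, even though the walk has already hit 1 in most runs.

```python
import random

random.seed(3)
trials, horizon = 20_000, 200
total_stopped = 0
hits = 0
for _ in range(trials):
    x = 0
    for _ in range(horizon):
        if x == 1:          # tau has occurred: the process is frozen
            break
        x += random.choice((1, -1))
    total_stopped += x      # this is X_{horizon ∧ tau}
    hits += (x == 1)

mean_stopped = total_stopped / trials   # should be near E X_0 = 0
frac_hit = hits / trials                # most paths have already hit 1
print(mean_stopped, frac_hit)
```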

Let’s describe one nice application of this:

Example 161
Let G = (V, E) be a finite connected graph with some "boundary" Z (a proper subset of V). Say that we have some boundary condition f : Z → R. We run a simple random walk on the graph G: at a vertex, we go to a uniformly random neighbor at the next step, and we let τ be the first time we hit the boundary. Define
$$M_n = \mathbb{E}[f(X_\tau)\,|\,\mathcal{F}_n],$$
where Fn = σ(X0, · · · , Xn) is the filtration of the walk. Then Mn is a martingale (exercise), and Mn is a function of the current position: it's h(Xn), where h is the harmonic interpolation of f. (In other words, the value of h at each non-boundary vertex is the average of the neighboring values. Note that if we replace harmonic with subharmonic, we get a submartingale.)
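On a path graph this is the classical gambler's-ruin computation (an illustrative sketch of my own; the graph and parameters are arbitrary): with boundary {0, N} and f(0) = 0, f(N) = 1, the harmonic interpolation is h(x) = x/N, and simulating the walk recovers it.

```python
import random

random.seed(4)
N = 10  # path graph on {0, 1, ..., N} with boundary {0, N}

def h_est(start, trials=20_000):
    # Estimate h(start) = E[f(X_tau) | X_0 = start] by running the walk
    # until it hits the boundary, with f(0) = 0 and f(N) = 1.
    total = 0
    for _ in range(trials):
        x = start
        while 0 < x < N:
            x += random.choice((1, -1))
        total += (x == N)
    return total / trials

# The harmonic interpolation on the path is h(x) = x / N.
est = h_est(3)
print(est)  # close to 3/10
```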

18 November 13, 2019

Last time, we started defining a martingale: given a filtration, which is a nested sequence of σ-fields F0 ⊆ F1 ⊆

· · · ⊆ F, a martingale is a sequence (Xn)n≥0 where all Xi are integrable, Xn ∈ Fn (the sequence is adapted to the filtration), and we satisfy the martingale property

E(Xn+1|Fn) = Xn

If we replace these conditions with an inequality ≥ or ≤, we get a submartingale or supermartingale, respectively.

Remark 162. If Xn is a submartingale, then EX0 ≤ EX1 ≤ · · · , and we have the reverse inequality for supermartingales. We’ll prove some convergence theorems today, but we’ll start with some more preliminary facts.

Lemma 163 1 Let (Xn) be a martingale, and let φ be a convex function with φ(Xn) ∈ L . Then φ(Xn) is a submartingale. (Replacing convex with concave yields a supermartingale.)

Proof. Convex functions are measurable, so φ(Xn) will be measurable with respect to Fn. It suffices to check the submartingale condition: indeed,

E(φ(Xn+1)|Fn) ≥ φ(E(Xn+1|Fn)) = φ(Xn) by the conditional version of Jensen’s inequality, and this is exactly what we want.

Lemma 164
If Xn is a submartingale, and φ is nondecreasing and convex with φ(Xn) ∈ L¹, then φ(Xn) is a submartingale. (Again, replacing "submartingale" and "convex" with "supermartingale" and "concave" also works.)

Proof. This is almost identical to the previous proof: we’re checking the submartingale condition again, and

E(φ(Xn+1)|Fn) ≥ φ(E(Xn+1|Fn)) ≥ φ(Xn) because of the submartingale condition and the fact that φ is nondecreasing.

Definition 165

A process (Hn)_{n≥0} is predictable or previsible with respect to a filtration (Fn) if Hn ∈ F_{n−1} for all n. Then
$$(H \cdot X)_n = \sum_{k=1}^n H_k (X_k - X_{k-1})$$
is called the H-transform of X.

The way to interpret this definition is as follows: we play a randomized game at each time k, where we bet 1 dollar and make net winnings of Xk − X_{k−1} = ΔXk (so we get back 1 + ΔXk dollars). Then say we bet exactly 1 dollar each time: the net winnings at time n are just
$$\sum_{k=1}^n \Delta X_k = X_n - X_0$$

(with a telescoping sum). But we can choose to bet different amounts each time, so if we bet Hk dollars at time k,

we do indeed make net winnings of (H · X)n at time n. And we need Hn ∈ Fn−1, because the amount we bet at time

n should not depend on what we win at time n (this motivates the name “previsible”). Note that if E(∆Xk |Fk−1) ≤ 0 for all k (because casinos tend to make us lose in expectation), we have a supermartingale.

Remark 166. Recall the definition of a stopping time τ, which means that {τ = n} ∈ Fn for all n. Then if Hn = 1{τ ≥ n} = 1 − Σ_{ℓ=0}^{n−1} 1{τ = ℓ}, the process H is previsible, and
$$(H \cdot X)_n = X_{n\wedge\tau} - X_0$$
(because we're intuitively saying that we only bet at time n if τ hasn't happened yet).

Lemma 167
Let Xn be a submartingale, and say that Hn is nonnegative and previsible: suppose also that Hn ∈ L^∞ for all n, which means that for each n there is a constant bounding Hn almost surely. Then H · X is a submartingale as well (an analogous statement holds for supermartingales, too).

Proof. Let Yn = (H · X)n = Σ_{k=1}^n Hk(Xk − X_{k−1}); we want to show that Yn is a submartingale. Yn is integrable, because it's the sum of finitely many terms, where each Hk is bounded and each Xk is integrable (by X being a submartingale). In addition, Yn ∈ Fn because all of the individual terms are in Fn: it suffices to show the submartingale condition, which can be stated as

$$\mathbb{E}(Y_{n+1}\,|\,\mathcal{F}_n) \ge Y_n \iff \mathbb{E}(Y_{n+1} - Y_n\,|\,\mathcal{F}_n) \ge 0$$

because E(Yn|Fn) = Yn when Yn is Fn-measurable. And we know this difference:
$$\mathbb{E}(Y_{n+1} - Y_n\,|\,\mathcal{F}_n) = \mathbb{E}(H_{n+1}(X_{n+1} - X_n)\,|\,\mathcal{F}_n) = H_{n+1}\,\mathbb{E}(X_{n+1} - X_n\,|\,\mathcal{F}_n)$$
(because H_{n+1} is measurable with respect to Fn), and then this last expression is nonnegative because H_{n+1} ≥ 0 and Xn is a submartingale.

Corollary 168

If X is a submartingale and τ is a stopping time, then Yn = Xn∧τ is also a submartingale (because the stopped process can be expressed as an H-transform).

One goal of today's class is to prove the following result, which we should keep in mind: given any nonnegative supermartingale Xn, Xn converges almost surely to some integrable limit X∞ with EX∞ ≤ EX0. Because EX0 ≥ EX1 ≥ · · · ≥ 0, this fact shouldn't be too surprising: the expectations are decreasing and bounded below by 0. The main technical component of the proof is a bound on how often a submartingale can go up, which is equivalent to a bound on how often a supermartingale can go down.

Definition 169

For any −∞ < a < b < ∞ and a submartingale (Xn)n≥0, define τ0 = −1 and

$$\tau_{2k-1} = \inf\{n > \tau_{2k-2} : X_n \le a\}, \qquad \tau_{2k} = \inf\{n > \tau_{2k-1} : X_n \ge b\}$$

for all positive integers k.

In other words, we’re finding the times where Xn goes below a, then above b, then below a, and so on. It’s possible, for example, that τ1 = ∞ (if X never goes below a).

We keep track of the ranges (τ2k−1, τ2k ], which are known as upcrossings (because we’re going from a to b).

Define Un(a, b) to be the number of upcrossings of [a, b] completed by time n, which can also be written as
$$U_n(a, b) = \sup\{k : \tau_{2k} \le n\}.$$

Theorem 170 (Doob’s upcrossing inequality)

Let Xn be a submartingale. Then
$$\mathbb{E}\,U_n(a, b) \le \frac{\mathbb{E}[(X_n - a)_+ - (X_0 - a)_+]}{b - a} \le \frac{\mathbb{E}[(X_n - a)_+]}{b - a}.$$

Proof. Let Hn be the indicator that n ∈ (τ_{2k−1}, τ_{2k}] for some integer k (we are inside an upcrossing). It's equivalent to write this as
$$H_n = \sum_{k \ge 1} 1\{n \in (\tau_{2k-1}, \tau_{2k}]\} = \sum_{k \ge 1} \big(1\{n \le \tau_{2k}\} - 1\{n \le \tau_{2k-1}\}\big),$$
because the intervals are disjoint. All the τs are stopping times, so all indicators here are measurable with respect to F_{n−1}: if we're worried about an infinite sum, note that we only need to sum up to k = n anyway, because we can have at most n upcrossings by time n. Intuitively, we know whether we're in an upcrossing based on what we've seen up to the (n − 1)th step, and this means that Hn is previsible.

Since Hn is bounded by 1 almost surely and is nonnegative, we can apply Lemma 167 to see that H · X and K · X are both submartingales, where we define K = 1 − H (which is also bounded between 0 and 1). Now we have a minor problem, which is that we don't quite count the last (incomplete) upcrossing, during which the process might be arbitrarily negative (which is bad for our bound). So we replace Xn with
$$Y_n = \max(a, X_n) = a + (X_n - a)_+,$$
which is a nondecreasing convex function of Xn, so it is also a submartingale by Lemma 164. By identical logic, H · Y and K · Y are also submartingales.

Now consider (H · Y)n: this is at least (b − a)Un(a, b), because the Y-process moves anything below a up to a, and then H · Y means we only bet during the upcrossings. So we win the increment that we make across each completed upcrossing, which is at least b − a, a total of Un(a, b) times, and we might win some extra amount at the end! (This is why we replaced X with Y.) And now taking expectations of both sides,
$$\mathbb{E}\,U_n(a, b) \le \frac{\mathbb{E}(H \cdot Y)_n}{b - a},$$
and we just want to upper bound this last quantity. Notice that if we bet 1 each time during our process,

$$Y_n - Y_0 = (1 \cdot Y)_n = (H \cdot Y)_n + (K \cdot Y)_n,$$

and rearranging tells us that

$$\mathbb{E}(H \cdot Y)_n = \mathbb{E}[Y_n - Y_0 - (K \cdot Y)_n].$$

But Yn − Y0 = (Xn − a)+ − (X0 − a)+, and (K · Y)n is a submartingale started from 0, meaning E(K · Y)n ≥ 0. Plugging this into the expression for E(H · Y)n gives the result that we want!

Intuitively, the key inequality here is E(K · Y)n ≥ 0: if the process goes up many times, it's hard for (K · Y), the "outside of the upcrossings" part, to go down by much.
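We can test the upcrossing inequality on a simple random walk, which is in particular a submartingale (an illustrative sketch of my own; the interval [a, b] = [−1, 1] and run lengths are arbitrary):

```python
import random

random.seed(5)
n, trials = 200, 5000
a, b = -1.0, 1.0
up_total = 0.0
bound_total = 0.0
for _ in range(trials):
    x, path = 0, [0]
    for _ in range(n):
        x += random.choice((1, -1))
        path.append(x)
    # Count completed upcrossings of [a, b] along the path.
    ups, below = 0, False
    for v in path:
        if not below and v <= a:
            below = True
        elif below and v >= b:
            ups += 1
            below = False
    up_total += ups
    bound_total += max(path[-1] - a, 0.0)   # (X_n - a)_+

mean_ups = up_total / trials
bound = bound_total / trials / (b - a)      # E[(X_n - a)_+] / (b - a)
print(mean_ups, bound)  # Doob: mean_ups should sit below bound
```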

Theorem 171 (Submartingale convergence theorem)
If Xn is a submartingale with sup_n E[(Xn)+] < ∞, then Xn converges almost surely to some X∞ ∈ L¹.

Proof. Note that (x − a)+ ≤ x+ + |a| for all a, x ∈ R, so
$$\mathbb{E}[U_n(a, b)] \le \frac{\mathbb{E}[(X_n)_+ + |a|]}{b - a}.$$
By hypothesis, this is uniformly bounded in n, so we expect a finite number of upcrossings. Indeed, Un(a, b) is monotone in n and converges to some limit U∞(a, b) as n → ∞: since that limit has finite expectation (by monotone convergence), it's finite almost surely. So now

$$\mathbb{P}\big(U_\infty(a, b) < \infty \ \forall \text{ rational } -\infty < a < b < \infty\big) = 1,$$
because the event fails with probability 0 for each of the countably many pairs (a, b). So we can't cross any rational interval infinitely many times during our process, and thus X∞ = lim_{n→∞} Xn exists (possibly infinite) with probability 1. It remains to show that we have a finite limit and that it is integrable. By Fatou's lemma, because the positive part of Xn converges to the positive part of X∞,

$$\mathbb{E}(X_\infty)_+ \le \liminf_{n\to\infty} \mathbb{E}(X_n)_+ < \infty$$
by hypothesis, and thus the positive part has finite expectation and is finite almost surely. Fatou also tells us that

E(X∞)− is similarly bounded by lim inf_{n→∞} E(Xn)−, but we don't have an analogous statement in our hypothesis. Instead, we write
$$\liminf_{n\to\infty} \mathbb{E}(X_n)_- = \liminf_{n\to\infty} \mathbb{E}[(X_n)_+ - X_n],$$
and we're okay because E(Xn)+ is uniformly bounded and the expectation of −Xn goes down as n → ∞ (because Xn is a submartingale). Thus, this is
$$\le \liminf_{n\to\infty} \mathbb{E}(X_n)_+ - \mathbb{E}X_0 < \infty.$$
So X∞ is in L¹ because both positive and negative parts have finite expectation, and thus X∞ is indeed finite almost surely.

Theorem 172

If Xn is a nonnegative supermartingale, then it converges almost surely to some X∞ with EX∞ ≤ EX0.

Proof. Define Yn = −Xn: this is a submartingale with (Yn)+ = 0 almost surely for all n, so the submartingale convergence theorem can be applied to show that Yn → Y∞ almost surely, meaning Xn converges almost surely to X∞ = −Y∞. Then Fatou's lemma tells us that
$$\mathbb{E}X_\infty \le \liminf_{n\to\infty} \mathbb{E}X_n \le \mathbb{E}X_0.$$

One point of caution is that there’s no guarantee in general that EXn → EX∞:

Example 173

Let Xn be a simple random walk on Z, with X0 = 1, and let τ = inf{n ≥ 0 : Xn = 0}. Then if we let Yn = X_{n∧τ}, we have a nonnegative martingale (which is in particular a supermartingale), so it converges almost surely: specifically, it converges to Y∞ = Xτ = 0, even though EYn = EX0 = 1 for every n.

Here’s an application of the convergence theorem:

Example 174 (Branching process)

Say we have an array Xn,i of iid nonnegative integer random variables, all equal in distribution to X. Then the law of X is some probability measure p, supported on the nonnegative integers, called the offspring law. Define

a random Galton-Watson tree, where we start with a root vertex which gets a random number of children X1,1.

Then each of those children gets a random number of children: the first child gets X2,1 children, and so on. (0 children is allowed.)

This tree may be finite or infinite. Let the Galton-Watson process be a summary of this tree, where

Zn = number of vertices at level n.

So now if we define

Fn = σ (X`,i : i ≥ 1, 1 ≤ ` ≤ n) ,

which encodes the tree up to depth n, notice that Zn ∈ Fn (so the process is adapted to Fn). In fact, Fn encodes

more than just σ(Z0,Z1, ··· ,Zn), because Zi only tell us the total population in each generation.

Basically, we can express the process Zn via
$$Z_0 = 1, \qquad Z_n = \sum_{i=1}^{Z_{n-1}} X_{n,i}.$$

In particular, if Z_{n−1} = 0, then Zn = 0 (all future populations are dead). The key observation here is that
$$\mathbb{E}(Z_{n+1}\,|\,\mathcal{F}_n) = Z_n\,\mathbb{E}X,$$
because each of the Zn individuals in generation n generates µ = EX children on average. This means that Zn/µ^n is a nonnegative martingale! Next time, we'll use this to study what happens to the random tree as n gets large.
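A quick simulation (an illustrative sketch of my own; the offspring law P(X=0) = 1/4, P(X=1) = 1/4, P(X=2) = 1/2, so µ = 5/4, is an arbitrary choice) confirms that E[Zn/µ^n] stays at 1, while a positive fraction of trees die out:

```python
import random

random.seed(6)

def offspring():
    # Offspring law: P(X=0) = 1/4, P(X=1) = 1/4, P(X=2) = 1/2, so mu = 5/4.
    u = random.random()
    return 0 if u < 0.25 else (1 if u < 0.5 else 2)

mu, n_gen, trials = 1.25, 12, 10_000
total_W = 0.0
extinct = 0
for _ in range(trials):
    z = 1
    for _ in range(n_gen):
        z = sum(offspring() for _ in range(z))
    total_W += z / mu**n_gen   # the martingale Z_n / mu^n
    extinct += (z == 0)

mean_W = total_W / trials
frac_extinct = extinct / trials
print(mean_W, frac_extinct)
```

For this offspring law the extinction probability solves s = 1/4 + s/4 + s²/2, giving s = 1/2, which matches the extinct fraction.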

19 November 18, 2019

This is a reminder that the next homework assignment is due on Wednesday (there are two extra problems from the original assignment).

Last time, we proved the submartingale convergence theorem: if E(Xn)+ is uniformly bounded, i.e. sup_n E[(Xn)+] < ∞, then Xn converges almost surely to some finite random variable X∞. A special case of this is that if Xn is a nonnegative supermartingale, then Xn converges almost surely to X∞.

However, neither statement implies that the expected values converge, E(Xn) → E(X∞), and it is useful to know when that is true. For example, if Yn = M_{n∧τ} is the stopped process for a martingale Mn, we want to find conditions such that EM_{n∧τ} → EMτ (because this would prove that EMτ = EM0). So the main result we'll be proving today is the L^p martingale convergence theorem: if Xn is a martingale such that sup_n ‖Xn‖_p < ∞ for some 1 < p < ∞, then Xn → X∞ almost surely and in L^p; this implies Xn → X∞ in L¹ as well, and therefore that EXn → EX∞ (which is what we actually care about). Notably, this is not true for p = 1!

Lemma 175 (Doob’s maximal inequality)

If $X_n$ is a submartingale, define $\bar X_n = \max\{(X_i)_+ : 0 \le i \le n\}$. (This is a nondecreasing process.) Then for any $\lambda > 0$,
\[
\lambda P\Big(\max_{0\le i\le n} X_i \ge \lambda\Big) = \lambda P(\bar X_n \ge \lambda) \le E(X_n;\ \bar X_n \ge \lambda) \le E[(X_n)_+;\ \bar X_n \ge \lambda] \le E[(X_n)_+].
\]

All of the inequalities here except the second follow pretty directly from the definitions. Note that if we replaced $\bar X_n$ with $X_n$ in the third expression, the bound would just be Markov's inequality!

Proof. First of all, if $X_n$ is a submartingale and $\tau$ is a stopping time with $0 \le \tau \le k$ almost surely, then $EX_0 \le EX_\tau \le EX_k$ (an exercise, using that $X_n$ is a submartingale). So if $\sigma = \inf\{i \ge 0 : X_i \ge \lambda\}$ is the first time we reach level $\lambda$, and we let $\tau = \sigma \wedge n$ (so that the stopping time is bounded), then $EX_\tau \le EX_n$.

Now notice that $X_\tau \ne X_n$ is only possible when $\sigma \le n$, which is exactly the event $\{\bar X_n \ge \lambda\}$ (we hit level $\lambda$ by time $n$). So the inequality $EX_\tau \le EX_n$ can be restricted to that event:
\[
E(X_\tau;\ \bar X_n \ge \lambda) \le E(X_n;\ \bar X_n \ge \lambda).
\]
But on $\{\bar X_n \ge \lambda\}$ we have $\tau = \sigma$ and $X_\tau = X_\sigma \ge \lambda$, so the left-hand side is at least $\lambda P(\bar X_n \ge \lambda)$, and we're done.
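As a numerical sanity check (my own, not from the lecture), we can compare the two ends of the chain of inequalities for a simple symmetric random walk, which is a martingale and hence a submartingale:

```python
import random

def check_doob(n=200, lam=5.0, trials=10000, seed=1):
    """Monte Carlo comparison of lambda * P(max_{i<=n} X_i >= lam) against
    E[(X_n)_+; max_{i<=n} X_i >= lam] for a simple symmetric random walk."""
    rng = random.Random(seed)
    hits, restricted_sum = 0, 0.0
    for _ in range(trials):
        x, running_max = 0, 0
        for _ in range(n):
            x += rng.choice((-1, 1))
            running_max = max(running_max, x)
        if running_max >= lam:
            hits += 1
            restricted_sum += max(x, 0)
    return lam * hits / trials, restricted_sum / trials

lhs, rhs = check_doob()
print(lhs, rhs)  # the first number should not exceed the second
```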

Corollary 176 (Kolmogorov maximal inequality)

Let $M_n$ be a martingale with $M_0 = 0$, and let $X_n = M_n^2$ (this is a submartingale because $x^2$ is convex). Then
\[
P\Big(\max_{0\le i\le n} |M_i| \ge x\Big) = P\Big(\max_{0\le i\le n} X_i \ge x^2\Big) \le \frac{E[(X_n)_+]}{x^2} = \frac{\operatorname{Var} M_n}{x^2}.
\]

Notice that Chebyshev's inequality just gives us
\[
P(|M_n| \ge x) \le \frac{\operatorname{Var} M_n}{x^2},
\]
while here we're bounding a maximum, which is stronger! This lemma is primarily useful for the following result:

Theorem 177 ($L^p$ maximal inequality)

Let $X_n$ be a submartingale, and define $\bar X_n = \max\{(X_i)_+ : 0 \le i \le n\}$. Then for all $p \in (1, \infty)$, we have
\[
\|\bar X_n\|_p \le \frac{p}{p-1}\,\|(X_n)_+\|_p.
\]

In other words, the maximum over the first $n$ values can be controlled by just the endpoint value. Note that this is really only a statement about the positive part of $X_n$.

Proof. One annoyance here is that we don't know in advance whether $\bar X_n$ is in $L^p$, so we take care of this by truncating at a finite constant $M$, working with $\bar X_n \wedge M$. We claim that
\[
\lambda P(\bar X_n \wedge M \ge \lambda) \le E[(X_n)_+;\ \bar X_n \wedge M \ge \lambda].
\]
This is because whenever $M$ is smaller than $\lambda$ the left side is zero, and whenever $M$ is at least $\lambda$ we just get Lemma 175. So now
\[
E[(\bar X_n \wedge M)^p] = \int_0^\infty p y^{p-1}\, P(\bar X_n \wedge M \ge y)\, dy
\]
with the usual integration trick, and we plug in the bound above:
\[
\le \int_0^\infty p y^{p-2}\, E[(X_n)_+;\ \bar X_n \wedge M \ge y]\, dy.
\]
Change the order of integration (by Tonelli's theorem) to get
\[
= E\left[(X_n)_+ \int_0^\infty p y^{p-2}\, 1\{\bar X_n \wedge M \ge y\}\, dy\right] = \frac{p}{p-1}\, E\big[(X_n)_+\,(\bar X_n \wedge M)^{p-1}\big],
\]
and now by Hölder's inequality, this is
\[
\le \frac{p}{p-1}\, \|(X_n)_+\|_p\, \big\|(\bar X_n \wedge M)^{p-1}\big\|_{p'},
\]
where $\frac{1}{p} + \frac{1}{p'} = 1 \implies p' = \frac{p}{p-1}$, and this is
\[
= \frac{p}{p-1}\, \|(X_n)_+\|_p\, \|\bar X_n \wedge M\|_p^{p-1}.
\]
So we have a bound on the $L^p$ norm of the truncation, and rearranging (dividing by $\|\bar X_n \wedge M\|_p^{p-1}$, which is finite) indeed gives
\[
\|\bar X_n \wedge M\|_p \le \frac{p}{p-1}\, \|(X_n)_+\|_p;
\]
now take $M \to \infty$ and use the monotone convergence theorem.

Theorem 178 ($L^p$ martingale convergence theorem)

Let $X_n$ be a martingale. If there exists $p \in (1, \infty)$ such that $\sup_n \|X_n\|_p < \infty$, then $X_n \to X_\infty$ almost surely and in $L^p$.

Proof. Note that $X_n$ satisfies $\sup_n E[(X_n)_+] \le \sup_n E|X_n| < \infty$, because the family is uniformly bounded in $L^p$ norm and we can use monotonicity of $L^p$ norms on a probability space. So $X_n$ satisfies the conditions of the submartingale convergence theorem, and therefore $X_n \to X_\infty$ almost surely.

For the other part, apply the $L^p$ maximal inequality, which tells us that
\[
\Big\|\max_{0\le i\le n}(X_i)_+\Big\|_p \le \frac{p}{p-1}\,\|(X_n)_+\|_p \quad\text{and}\quad \Big\|\max_{0\le i\le n}(X_i)_-\Big\|_p \le \frac{p}{p-1}\,\|(X_n)_-\|_p
\]
(because both $X_n$ and $-X_n$ are submartingales if $X_n$ is a martingale). By the hypothesis, the two right-hand sides are uniformly bounded in $n$, and the left-hand sides are nondecreasing in $n$, so taking the limit $n \to \infty$, $Y = \sup_{i\ge0} |X_i|$ is an $L^p$-integrable random variable. This means that
\[
\lim_{n\to\infty} E[|X_n - X_\infty|^p] = 0
\]
by the dominated convergence theorem, because $X_n \to X_\infty$ almost surely and $|X_n - X_\infty|^p$ is dominated by $(2Y)^p$.

Let’s apply these to branching processes! Recall the definition of a Galton-Watson tree: we have a double infinite

array (Xn,i )n≥1,i≥1, which are all iid to an offspring law X supported on the nonnegative integers with finite mean

EX = µ. We start with some root vertex o, which generates X1,1 children at depth 1. Then in general, the ith child

at depth (n − 1) generates Xn,i children at depth n, and the Galton-Watson process Zn is just the number of vertices at depth n: Zn−1 X Zn = Xn,i . i=1 We also showed last time that Z (Z |F ) = µZ =⇒ M = n is a martingale E n+1 n n n µn where Fn = σ(X`,i for all i ≥ 1, 1 ≤ ` ≤ n): in other words, we know the structure of the tree up to depth n. Since

Mn is a nonnegative martingale, we know that Mn converges almost surely to M∞, where EM∞ ≤ EM0 = 1. One case that is easy to understand is the subcritical Galton-Watson: if µ < 1, then by Markov’s inequality (and using that Zn is nonnegative-valued) n P(Zn > 0) ≤ E(Zn) = µ , which is exponentially small. Thus Zn must converge to 0 almost surely: the population goes extinct with probability

Zn 1. In fact, we actually have in this case that Mn = µn → 0 almost surely, because Zn is integer-valued so it must eventually be zero. So the limit M∞ is just identically zero, and in fact EM∞ = 0 < 1! The more interesting cases are µ = 1 and µ > 1: µ = 1 is the critical Galton-Watson: we assume that P(X = 1) < 1 (because that case is where the parent has exactly one child, which has one child, and so on). Then it

93 turns out that Zn → 0 almost surely, so we have extinction in this case as well! Showing this is less straightforward, though:

Proof. Here, $Z_n$ is a nonnegative martingale (hence also a supermartingale), so it converges almost surely to some $Z_\infty$. Since $Z_n$ is integer-valued, $Z_\infty$ is also integer-valued, and in fact we must just have $Z_n = Z_\infty$ for all sufficiently large $n \ge n(\omega)$ (depending on $\omega$). This is equivalent to saying that
\[
P\Big(\bigcup_{k\ge0} \bigcup_{m\ge0} \{Z_n = k \text{ for all } n \ge m\}\Big) = 1.
\]
We want to say that this is not possible for any $k > 0$: we're asking the process to stay at population $k$ forever, while $X$ is a nondegenerate random variable. To make that rigorous, note that (for any $k > 0$)
\[
P(Z_{n+1} = k \mid Z_n = k) = c_k < 1,
\]
where $c_k$ does not depend on $n$, so
\[
P(Z_n = k\ \forall n \in \{m, \ldots, m+\ell\}) \le c_k^\ell.
\]
This decreases exponentially with $\ell$, so the probability that $Z_n = k$ for all $n \ge m$ is zero. So for any $k > 0$, the countable union over all $m$ has probability zero, and thus it's equivalent to say that
\[
P\Big(\bigcup_{m\ge0} \{Z_n = 0 \text{ for all } n \ge m\}\Big) = 1,
\]
which is exactly the statement that $Z_n \to 0$ almost surely.

The most interesting case is the supercritical Galton-Watson tree, where $\mu > 1$. Here, if $p_0 = P(X = 0) = 0$, then $Z_n > 0$ almost surely for all $n$ (because every vertex produces a positive number of children). But if $p_0 > 0$, the tree does have positive probability of going extinct. The idea, though, is that surviving to (for example) 100 generations means it's likely the tree is pretty big, so we expect the tree to survive from that point on. We want to know whether the probability of nonextinction is positive. Well, condition on the first level of the tree: if the root has 3 children, say, then the tree is dead at level $n+1$ exactly when all three of the children's subtrees are dead $n$ levels below them. So we have a recursive formula:
\[
P(Z_{n+1} = 0) = \sum_{k\ge0} p_k\, P(Z_n = 0)^k = \psi(P(Z_n = 0)),
\]
where we define the function (a reparameterization of the moment generating function)
\[
\psi(s) = \sum_{k\ge0} p_k s^k = E[s^X].
\]

This is guaranteed to be finite for $0 \le s \le 1$ (since $X$ is nonnegative), and we're only ever applying $\psi$ to probabilities, so $\psi$ is indeed defined. This function is also nondecreasing and convex in $s$, and
\[
\psi'(s) \uparrow \mu \quad \text{as } s \uparrow 1
\]
(this last statement is by differentiating term-by-term and using the dominated convergence theorem). Since $\psi(1) = 1$ and $\psi(0) = p_0$, the graph looks like the sketch from lecture: a convex curve from $(0, p_0)$ up to $(1, 1)$, with slope approaching $\mu$ at $s = 1$.

Since the slope at $s = 1$ is $\mu > 1$, the graph does actually cross the line $\psi(s) = s$ at some unique point $0 < \rho < 1$. And now
\[
P(Z_n = 0) = \psi(P(Z_{n-1} = 0)) = \psi^n(P(Z_0 = 0)) = \psi^n(0).
\]
This converges to $\rho$ as $n \to \infty$ (repeatedly applying this convex function brings us towards the fixed point), so the probability of extinction by time $n$ converges to $\rho$. Now the events $\{Z_n = 0\}$ are nested: if we're extinct at time $n$, we're extinct at time $n+1$. So these events are nondecreasing up to the event $\{\text{tree goes extinct}\}$, and thus the probability that the Galton-Watson tree goes extinct is exactly $\rho < 1$, and we're done.

A more complicated question: if we have a supercritical Galton-Watson tree and we condition on the event that we do not go extinct, what is the limiting random variable

\[
M_\infty = \lim_{n\to\infty} \frac{Z_n}{\mu^n}?
\]

This tells us a lot about the process: if this converged to some finite limit in $(0, \infty)$, then we'd know that $Z_n$ grows like $\mu^n$ up to a constant factor (which is much stronger than something like $Z_n = \mu^{n+o(n)}$). It turns out that we have an exact characterization of when this happens! First of all, we claim that
\[
\psi(P(M_\infty = 0)) = P(M_\infty = 0)
\]
by using the same recursive idea. So $P(M_\infty = 0)$ is a fixed point of $\psi$: it's either $0$, $\rho$, or $1$, and it can't be zero because $P(M_\infty = 0) \ge P(\text{extinction}) = \rho$. (Here, we're using that $p_0 \ne 0$, so $\rho \ne 0$ either.) It turns out both remaining possibilities occur:

Theorem 179 (Kesten-Stigum L log L criterion)

In a supercritical Galton-Watson tree, $M_\infty = \lim_n Z_n/\mu^n$ is not identically zero if and only if the offspring law satisfies $E(X \log X) < \infty$.

We’ll see a special case on our homework, and we’ll discuss this a bit further in a later lecture.

20 November 20, 2019

The last homework assignment is already posted on Stellar: it’s due on Monday, December 2 (the class day before the exam). There will be lecture that Monday but not on Wednesday: we should make sure we can both study for the exam and finish the problem set.

We've been discussing martingale convergence theorems so far: if $X_n$ is a submartingale with a moment condition ($\sup_n E[(X_n)_+] < \infty$), then it converges almost surely (because of the upcrossing inequality). A special case is that any nonnegative supermartingale converges almost surely.

However, we have no guarantee that there is convergence in $L^1$: $EX_n \to EX_\infty$ does not always hold. One sufficient condition we mentioned was the $L^p$ martingale convergence theorem: if there exists some $p \in (1, \infty)$ such that $\sup_n \|X_n\|_p < \infty$, then $X_n \to X_\infty$ almost surely and in $L^p$. Convergence in $L^p$ implies convergence in $L^1$, so the expectation does indeed go to the limiting expectation. This condition isn't necessary, though: we'll find an exact characterization today!

Definition 180

Say that $(X_i)_{i\in I}$ is a family of random variables on a probability space $(\Omega, \mathcal F, P)$. Then $(X_i)$ is uniformly integrable (u.i.) if
\[
\sup_{i\in I} E\big[|X_i|;\ |X_i| \ge M\big] \to 0 \quad \text{as } M \to \infty.
\]

This relates to the notion of tightness for a set of measures (but isn't exactly equivalent)! If we just have one random variable $X \in L^1$, then $E[|X|;\ |X| \ge M]$ automatically goes to 0 as $M \to \infty$ (by the dominated convergence theorem): uniform integrability instead asks for this to hold uniformly over the whole family.

Remark 181. Note that if $(X_i)_{i\in I}$ is uniformly integrable, then $\sup_{i\in I} E|X_i| < \infty$, so uniform integrability is stronger than a uniform bound on the $L^1$ norm. This is because
\[
E|X_i| \le M + E[|X_i|;\ |X_i| \ge M],
\]
and we can make the second term uniformly small by taking $M$ to be sufficiently large. In particular, this shows that for any given $M$, the supremum in the definition is finite.
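The converse fails: a uniform $L^1$ bound does not imply uniform integrability. A standard counterexample (my own illustration, not from the notes) is $X_n = n \cdot 1\{U \le 1/n\}$ with $U$ uniform on $[0,1]$: then $E|X_n| = 1$ for every $n$, but $E[|X_n|;\ |X_n| \ge M] = 1$ whenever $n > M$, so the tail contribution never becomes uniformly small. A quick numerical check:

```python
import random

def tail_expectation(n, M, trials=200000, seed=2):
    """Estimate E[|X_n|; |X_n| >= M] for X_n = n * 1{U <= 1/n}, U ~ Unif[0,1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = n if rng.random() <= 1.0 / n else 0
        if abs(x) >= M:
            total += abs(x)
    return total / trials

# E|X_n| = 1 for every n, yet the tail term stays near 1 for all n > M:
for n in (10, 100, 1000):
    print(n, tail_expectation(n, M=5))
```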

Here’s one natural way to construct such a family of uniformly integrable random variables:

Lemma 182

If $X \in L^1(\Omega, \mathcal F, P)$, then $\{E(X|\mathcal G) : \mathcal G \subseteq \mathcal F \text{ a sub-}\sigma\text{-field}\}$ is uniformly integrable.

Proof. Note that for all ε > 0, there exists δ > 0 such that

P(A) ≤ δ =⇒ E(|X|; A) ≤ ε.

(This was on our homework! The idea is that if this weren't true, we could find a sequence of events $A_n$ with $P(A_n) \downarrow 0$ but $E(|X|; A_n)$ bounded away from 0; since $X \in L^1$, this violates the dominated convergence theorem.) Now by Markov's inequality and then (conditional) Jensen's inequality,
\[
P(|E(X|\mathcal G)| \ge M) \le \frac{E|E(X|\mathcal G)|}{M} \le \frac{E|X|}{M}.
\]
The event $\{|E(X|\mathcal G)| \ge M\}$ is $\mathcal G$-measurable, so its indicator can be brought inside the conditional expectation:
\[
E\big[|E(X|\mathcal G)|;\ |E(X|\mathcal G)| \ge M\big] = E\big|E\big(X\,1\{|E(X|\mathcal G)| \ge M\}\ \big|\ \mathcal G\big)\big|.
\]
By Jensen's inequality again, this is then bounded as
\[
\le E\big[|X|;\ |E(X|\mathcal G)| \ge M\big] \le \varepsilon
\]
whenever $P(|E(X|\mathcal G)| \ge M) \le \delta$. By the Markov bound above, the latter holds for all $\mathcal G$ simultaneously once $M \ge E|X|/\delta$, so taking $M$ large enough (which means taking $\delta$, and hence $\varepsilon$, to 0) gives the result.

Theorem 183

If $X_n \to X$ in probability, then the following are equivalent:

1. $\{X_n\}_{n\ge0}$ is uniformly integrable,
2. $X_n \to X$ in $L^1$,
3. $E|X_n| \to E|X|$.

This is true for random variables in general, but we’ll apply it to martingales!

Proof. (1) ⟹ (2): since our variables are uniformly integrable, we showed that $\sup_n E|X_n| < \infty$. Since there exists a subsequence $X_{n_k} \to X$ almost surely, we know that $E|X| \le \liminf_{k\to\infty} E|X_{n_k}| < \infty$ by Fatou's lemma, so $X$ is in $L^1$.

Now we want to show that $X_n$ goes to $X$ in $L^1$: define the (truncated identity) function
\[
\varphi(x) = \begin{cases} M & x \ge M \\ -M & x \le -M \\ x & \text{otherwise.} \end{cases}
\]
Then
\[
E|X_n - X| \le E|\varphi(X_n) - \varphi(X)| + E|\varphi(X) - X| + E|X_n - \varphi(X_n)|
\]
by the triangle inequality.

• $\varphi$ is a continuous function and it's uniformly bounded, and $X_n$ goes to $X$ in probability, so $E|\varphi(X_n) - \varphi(X)|$ goes to 0 by the bounded convergence theorem as $n \to \infty$.
• Since $|x - \varphi(x)| \le |x|\,1\{|x| \ge M\}$, the second term is upper bounded by $E(|X|;\ |X| \ge M)$. Similarly, the third term is bounded by $E(|X_n|;\ |X_n| \ge M)$ (this time we do need uniform integrability).

And now we can make all three terms small: fix $M$ and take $n \to \infty$ to get rid of the first term, and then take $M \to \infty$.

(2) ⟹ (3): this follows since $\big|E|X_n| - E|X|\big| \le E|X_n - X|$ by the triangle inequality.

(3) ⟹ (1): it suffices to show that
\[
\lim_{M\to\infty}\Big\{\limsup_{n\to\infty} E(|X_n|;\ |X_n| \ge M)\Big\} = 0
\]
(the usual definition uses $\sup$ instead of $\limsup$, but the finitely many remaining terms can always be handled by taking $M$ larger). We went over a similar argument when dealing with tightness! Consider the function
\[
\psi(x) = \begin{cases} x & |x| \le M-1 \\ 0 & |x| \ge M \\ \text{linear interpolation} & \text{otherwise.} \end{cases}
\]
The main point is that $\psi$ is continuous, and
\[
|x|\,1\{|x| \ge M\} \le |x - \psi(x)| \le |x|\,1\{|x| \ge M-1\}.
\]
So now we can use the bound
\[
\limsup_{n\to\infty} E[|X_n|;\ |X_n| \ge M] \le \limsup_{n\to\infty} E|X_n - \psi(X_n)| = \limsup_{n\to\infty}\big(E|X_n| - E|\psi(X_n)|\big):
\]
using the bounded convergence theorem on the second term ($\psi$ is continuous and uniformly bounded by $M$) and the assumption $E|X_n| \to E|X|$ on the first term yields
\[
\le E|X| - E|\psi(X)| = E|X - \psi(X)| \le E[|X|;\ |X| \ge M-1].
\]
As we send $M \to \infty$, the right-hand side goes to 0, which yields the result we're looking for.

We can apply the results now to martingales:

Theorem 184 (Submartingale $L^1$ convergence theorem)

For a submartingale $(X_n)_{n\ge0}$, the following are equivalent:
1. Uniform integrability,
2. Convergence almost surely and in $L^1$,
3. Convergence in $L^1$.

Proof. (1) ⟹ (2): again, $\sup_n E|X_n| < \infty$ from our earlier observations, and thus by the submartingale convergence theorem, the process converges almost surely. So $X_n \to X_\infty$ in probability as well, and we can apply Theorem 183 to get convergence in $L^1$. (2) ⟹ (3) is trivial. (3) ⟹ (1): convergence in $L^1$ implies convergence in probability, so this is a consequence of (2) ⟹ (1) in Theorem 183.

Theorem 185 (Martingale $L^1$ convergence theorem)

For a martingale $(X_n)_{n\ge0}$, the following are equivalent:
1. Uniform integrability,
2. Convergence almost surely and in $L^1$,
3. Convergence in $L^1$,
4. Existence of $X \in L^1$ such that $X_n = E(X|\mathcal F_n)$ for all $n$.

The first three conditions are the same as above (martingales are submartingales), but this also tells us that if we have a uniformly integrable martingale, then there exists an integrable random variable for which the martingale is just “exposing information!”

Proof. The first three are equivalent from above. To show that (3) ⟹ (4), note that
\[
X_n = E[X_\ell \mid \mathcal F_n] \quad \forall \ell \ge n
\]
by the martingale property, which is equivalent to saying that for all $A \in \mathcal F_n$, we have
\[
E[X_n; A] = E[X_\ell; A] \quad \forall \ell \ge n.
\]
Take $\ell \to \infty$: by our assumption of $L^1$ convergence, we get $E[X_n; A] = E[X_\infty; A]$. Thus $E(X_\infty|\mathcal F_n) = X_n$, as desired (so $X = X_\infty$ works).

(4) =⇒ (1): this is Lemma 182.

Definition 186

Given a filtration $\mathcal F_n$ on $(\Omega, \mathcal F, P)$, and given any $X \in L^1(\Omega, \mathcal F, P)$, the sequence $X_n = E(X|\mathcal F_n)$ is the Doob martingale of $X$ with respect to $\mathcal F_n$.

The idea is that we’re revealing the randomness of our object “a little bit at a time,” and in fact, any martingale that converges in L1 is exactly a Doob martingale (by the above result).
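For a concrete picture (my own example, not from the notes), take $X$ to be the number of heads in $N$ fair coin flips and $\mathcal F_n = \sigma(\text{first } n \text{ flips})$; then $E(X|\mathcal F_n)$ equals the heads seen so far plus $(N-n)/2$ for the flips still hidden:

```python
import random

def doob_martingale_path(N=10, seed=3):
    """For X = number of heads among N fair flips and F_n = sigma(first n flips),
    the Doob martingale is E(X | F_n) = (heads in first n flips) + (N - n) / 2."""
    rng = random.Random(seed)
    flips = [rng.randint(0, 1) for _ in range(N)]
    heads_so_far = 0
    path = [N / 2]  # M_0 = E[X] = N/2: nothing revealed yet
    for n, f in enumerate(flips, start=1):
        heads_so_far += f
        path.append(heads_so_far + (N - n) / 2)
    return flips, path

flips, path = doob_martingale_path()
print(path)  # starts at E[X] = 5.0 and ends at M_N = X itself
```

Each flip moves the conditional expectation by $\pm 1/2$: the randomness of $X$ is revealed a little bit at a time.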

Theorem 187 (Lévy convergence theorem)

Let $\mathcal F_n$ be a filtration of $(\Omega, \mathcal F, P)$, and let
\[
\mathcal F_\infty = \sigma\Big(\bigcup_n \mathcal F_n\Big).
\]
(The union of $\sigma$-algebras is not itself a $\sigma$-algebra, but we can take the $\sigma$-algebra it generates. This is sometimes denoted $\mathcal F_n \uparrow \mathcal F_\infty$.) Then if $X \in L^1$, we have $E(X|\mathcal F_n) \to E(X|\mathcal F_\infty)$ almost surely and in $L^1$.

This is a statement about the limit of the Doob martingale. We should compare this with the previous result: there, we explicitly constructed a random variable $X$ satisfying the properties we wanted. The point is that $\mathcal F_\infty$ might be strictly smaller than $\mathcal F$, and the Lévy convergence theorem holds even if $X$ is an arbitrary integrable function on the big $\sigma$-field (even if it's not $\mathcal F_\infty$-measurable to begin with).

Proof. Let $X_n = E(X|\mathcal F_n)$: then $\{X_n\}_{n\ge0}$ is a uniformly integrable family by Lemma 182. Thus it converges to some $X_\infty$ almost surely and in $L^1$ by the $L^1$ martingale convergence theorem. To finish, we just need to show that $X_\infty = E(X|\mathcal F_\infty)$: this is an exercise with the definition of conditional expectation. We have
\[
X_n = E(X|\mathcal F_n) \implies E(X_n; A) = E(X; A) = E(X_\infty; A) \quad \forall A \in \mathcal F_n,
\]
where the last equality comes from $X_n = E(X_\infty|\mathcal F_n)$. So
\[
E(X; A) = E(X_\infty; A) \quad \forall A \in \bigcup_{n\ge0} \mathcal F_n,
\]
and we need to extend this to the generated $\sigma$-algebra by showing that $X_\infty$ is $\mathcal F_\infty$-measurable (true because $X_\infty$ is the almost-sure limit of the $X_n$, each of which is measurable with respect to $\mathcal F_n$ and therefore with respect to $\mathcal F_\infty$), and that the identity above holds for all $A \in \mathcal F_\infty$. This last part is just the standard $\pi$-$\lambda$ argument!

Corollary 188 (Lévy 0-1 law)

If $\mathcal F_n \uparrow \mathcal F_\infty$ and $A \in \mathcal F_\infty$, then $P(A|\mathcal F_n) \to 1_A$ almost surely; in particular, the limit is either 0 or 1.
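To see this in action (my own illustration, not from the notes), take $A = \{\text{at least 50 heads among 100 fair flips}\} \in \mathcal F_{100} \subseteq \mathcal F_\infty$. Then $P(A|\mathcal F_n)$ is an explicit binomial tail probability, and along any sample path it ends at exactly 0 or 1:

```python
import math
import random

def binom_tail(m, k):
    """P(Binomial(m, 1/2) >= k), computed exactly."""
    if k <= 0:
        return 1.0
    if k > m:
        return 0.0
    return sum(math.comb(m, j) for j in range(k, m + 1)) / 2**m

def conditional_prob_path(N=100, seed=4):
    """P(A | F_n) for A = {at least N/2 heads among N fair flips}: given the
    first n flips, A holds iff the remaining N - n flips cover the deficit."""
    rng = random.Random(seed)
    heads, path = 0, [binom_tail(N, N // 2)]
    for n in range(1, N + 1):
        heads += rng.randint(0, 1)
        path.append(binom_tail(N - n, N // 2 - heads))
    return path

path = conditional_prob_path()
print(path[0], path[-1])  # starts near 1/2, ends at exactly 0.0 or 1.0
```

Here $A$ is not a tail event, but it is $\mathcal F_\infty$-measurable, which is all the Lévy 0-1 law needs.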

We'll finish with a few helpful hints for the homework. The Azuma-Hoeffding bound says that if $X_n$ is a martingale with bounded increments, then $X_n - EX_n$ is well-concentrated. One consequence is the concentration of the independence number of $G_{n,p}$, the random graph on $n$ vertices in which each pair of vertices is connected independently with probability $p$. Recall that if $G = (V, E)$ is a graph, then $S \subseteq V$ is an independent set if no two vertices in $S$ are neighbors in $G$; the independence number $\alpha(G)$ is the maximal cardinality $|S|$ of such a set (a number between 1 and $|V|$). We're asked to determine the behavior of $X = \alpha(G)$ as a random variable. One useful way to define a filtration is

\[
\mathcal F_\ell = \sigma(\text{edges restricted to the first } \ell \text{ vertices}).
\]

This is a vertex-revealing filtration (often called the vertex exposure filtration): we're told how vertex $\ell$ is connected to the previously revealed vertices, for increasingly large $\ell$. There are variations on this, but the point is that if we look at $E(X|\mathcal F_\ell)$, we have no integrability issues (because $G$ is a finite graph), and we can get a bound on the concentration by looking at this Doob martingale and using the Azuma-Hoeffding bound! Note that this kind of strategy also works for the clique number and chromatic number of a random graph, too. (This general type of argument comes from a paper [4] by Shamir and Spencer.)

Remark 189. The bound is only good for some asymptotic values of p, though: again, we’ll see the specifics on our homework!
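For intuition about the strength of this kind of bound (a sketch of my own; the graph setting on the homework is more involved), here is the simplest instance of Azuma-Hoeffding: for a martingale with $\pm1$ increments, such as simple random walk, $P(|S_n| \ge t) \le 2\exp(-t^2/(2n))$.

```python
import math
import random

def azuma_check(n=100, t=20, trials=20000, seed=5):
    """Compare P(|S_n| >= t) for a +-1 simple random walk against the
    Azuma-Hoeffding bound 2 * exp(-t^2 / (2n))."""
    rng = random.Random(seed)
    hits = sum(abs(sum(rng.choice((-1, 1)) for _ in range(n))) >= t
               for _ in range(trials))
    return hits / trials, 2 * math.exp(-t * t / (2 * n))

empirical, bound = azuma_check()
print(empirical, bound)  # the empirical probability sits below the bound
```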

21 November 25, 2019

We’ve proved a few results about submartingale convergence so far:

• If $X_n$ is a submartingale with $\sup_n E[(X_n)_+]$ finite, then $X_n \to X_\infty$ almost surely, with $E|X_\infty| < \infty$.
• For a submartingale, uniform integrability, convergence almost surely and in $L^1$, and convergence in $L^1$ are all equivalent.

Today, we're going to talk about the optional stopping theorem! Remember that we defined a stopping time $\tau$ with respect to a filtration $\mathcal F_n$ to be a random variable such that $\{\tau = n\} \in \mathcal F_n$ for all $n$. We know that for any stopping time $0 \le \tau \le k$ (almost surely) and submartingale $X_n$, we have $EX_0 \le EX_\tau \le EX_k$. More generally, we know that
\[
EX_0 \le EX_{\tau\wedge n} \le EX_n
\]
for all finite $n$, but not necessarily if we replace $n$ with $\infty$. Today, we'll actually take the limit and see what we can do!

Lemma 190

If Xn is a uniformly integrable submartingale and τ is any stopping time, then E|Xτ | < ∞.

(Here, we allow for P(τ = ∞) > 0.)

Proof. First of all, we need to check that $X_\tau$ makes sense even on $\{\tau = \infty\}$: since $X_n$ is uniformly integrable, $X_n$ converges to $X_\infty$ almost surely, so $X_\tau$ is indeed well-defined.

Notice that $(X_n)_+$ is also a submartingale, because we're composing with a nondecreasing convex function, which implies that
\[
E[(X_{n\wedge\tau})_+] \le E[(X_n)_+]
\]
as well (using the bounded-stopping-time fact from earlier). The right-hand side of this inequality is uniformly bounded (uniform integrability implies a uniform $L^1$ bound), and thus the left-hand side is also uniformly bounded:
\[
\sup_n E[(X_{n\wedge\tau})_+] < \infty.
\]
Since $(X_{n\wedge\tau})$ is a stopped version of $(X_n)$, it is a submartingale as well, so by the submartingale convergence theorem, $X_{n\wedge\tau}$ converges almost surely to a limit which is in $L^1$. But that limit is exactly $X_\tau$, so we're done.

Corollary 191

If Xn is a uniformly integrable submartingale, and τ is any stopping time, then Yn = Xn∧τ is also a uniformly integrable submartingale.

Proof. We need to show that
\[
\lim_{M\to\infty}\Big\{\sup_n E\big[|X_{n\wedge\tau}|;\ |X_{n\wedge\tau}| \ge M\big]\Big\} = 0.
\]
We break this up into its parts:
\[
E\big[|X_{n\wedge\tau}|;\ |X_{n\wedge\tau}| \ge M\big] = E\big[|X_n|\,1\{\tau > n\} + |X_\tau|\,1\{\tau \le n\};\ |X_{n\wedge\tau}| \ge M\big],
\]
and this simplifies to (dropping indicators and seeing whether we're "using" $n$ or $\tau$)
\[
\le E\big[|X_n|;\ |X_n| \ge M\big] + E\big[|X_\tau|;\ |X_\tau| \ge M\big].
\]
(This bound is uniform over all $n$.) As $M$ gets larger, the first term is small because the $X_n$ are uniformly integrable, and the second is small because $X_\tau \in L^1$ by the above lemma (and then we use the dominated convergence theorem).

Theorem 192

If Xn is a uniformly integrable submartingale and τ is a stopping time, then EX0 ≤ EXτ ≤ EX∞.

Note that if we only know that Xn∧τ is a uniformly integrable submartingale, that’s enough for the EX0 ≤ EXτ part of the inequality. (This will be clear from the proof.)

Proof. We already noted that

\[
EX_0 \le EX_{\tau\wedge n} \le EX_n
\]
for all finite $n$. If $X_{\tau\wedge n}$ is uniformly integrable, then it converges to $X_\tau$ in $L^1$, so $EX_0 \le EX_\tau$. Furthermore, if we know that $X_n$ itself is uniformly integrable, then $X_n$ converges to $X_\infty$ almost surely and in $L^1$, so $EX_n \to EX_\infty$, which gives us the other bound.

Uniform integrability is often difficult to check, though, so here’s an easier criterion:

Theorem 193

Suppose Xn is a submartingale such that E[|Xn+1 − Xn||Fn] ≤ B for all n, and say that τ is a stopping time with Eτ < ∞. Then Xn∧τ is uniformly integrable, so we do have EX0 ≤ EXτ .

Proof. For any $n \ge 0$, we can write
\[
|X_{n\wedge\tau}| \le |X_0| + \sum_{i\ge0} 1\{\tau > i\}\,|X_{i+1} - X_i|.
\]
The right-hand side doesn't depend on $n$: call it $Y$. Note that $1\{\tau > i\}$ is measurable with respect to $\mathcal F_i$, so taking expectations and conditioning on $\mathcal F_i$ term by term,
\[
EY \le E|X_0| + \sum_{i\ge0} E\big(1\{\tau > i\}\,E[|X_{i+1} - X_i| \mid \mathcal F_i]\big)
\]
(we pulled the indicators out of the conditional expectations), and now we can use the other assumption:
\[
\le E|X_0| + B\sum_{i\ge0} P(\tau > i) = E|X_0| + B\,E\tau < \infty.
\]
So the whole stopped process is dominated by the single integrable variable $Y$, which gives uniform integrability.

Here’s a result that’s a bit disconnected from the ones so far: we don’t need uniform integrability at all.

Theorem 194

If Xn is a nonnegative supermartingale, and τ is any stopping time, then EX0 ≥ EXτ .

Proof. From earlier results, $X_n \to X_\infty$ almost surely, so $X_\tau$ is well-defined. We also know that $EX_0 \ge EX_{n\wedge\tau}$ for all $n$, so by Fatou's lemma,
\[
EX_\tau \le \liminf_{n\to\infty} EX_{n\wedge\tau} \le EX_0.
\]

Next, we’ll take a generalization of the first part of Theorem 192:

Definition 195

If τ is a stopping time with respect to Fn, then the stopping time sigma-algebra associated to τ is

\[
\mathcal F_\tau = \{A \in \mathcal F : A \cap \{\tau = n\} \in \mathcal F_n \text{ for all } n\}.
\]

We can check that if $\tau = k$ is some fixed constant (almost surely), then $\mathcal F_\tau = \mathcal F_k$, and also that if $Y_n \in \mathcal F_n$ for each $n$, then $Y_\tau \in \mathcal F_\tau$. Basically, if $A \in \mathcal F_n$, then at time $n$ we know whether we're in $A$ or in $\Omega \setminus A$: thus, on $\{\tau = n\}$ we should know whether we're in $A$, which is exactly the condition $A \cap \{\tau = n\} \in \mathcal F_n$.

Theorem 196

Say that Xn∧τ is a uniformly integrable submartingale. If σ, τ are both stopping times and σ ≤ τ almost surely, then

E(Xτ |Fσ) ≥ Xσ.

(We’ve already proved this for σ = 0.)

Proof. Since $Y_n = X_{n\wedge\tau}$ is a uniformly integrable submartingale, Theorem 192 applies to $Y$ with the stopping time $\sigma$: $EY_0 \le EY_\sigma \le EY_\infty$. Since $\sigma \le \tau$, we have $Y_\sigma = X_{\sigma\wedge\tau} = X_\sigma$ and $Y_\infty = X_\tau$, so plugging in, $EX_0 \le EX_\sigma \le EX_\tau$.

Given any $A \in \mathcal F_\sigma$, define a random variable
\[
\xi(\omega) = \sigma 1_A + \tau 1_{A^c}
\]
(which is either $\sigma$ or $\tau$). This is a stopping time, because
\[
\{\xi = n\} = \big(A \cap \{\sigma = n\}\big) \sqcup \big(A^c \cap \{\tau = n\}\big).
\]
Since $A$ is in $\mathcal F_\sigma$, the first term is in $\mathcal F_n$. $A^c$ is also in $\mathcal F_\sigma$, and we can further decompose the second term:
\[
A^c \cap \{\tau = n\} = \Big(\bigcup_{k=0}^{n} A^c \cap \{\sigma = k\}\Big) \cap \{\tau = n\}.
\]
(The union over $k$ only goes up to $n$, because $\sigma \le \tau = n$ in this case.) And now $A^c \cap \{\sigma = k\} \in \mathcal F_k \subseteq \mathcal F_n$ and $\{\tau = n\} \in \mathcal F_n$, so indeed $\xi$ is a stopping time. $\xi$ is also bounded by $\tau$ almost surely, so by the same logic as before, $EY_\xi \le EY_\infty$, i.e. $EX_\xi \le EX_\tau$. But expanding $EX_\xi$, this says
\[
E(X_\sigma; A) + E(X_\tau; A^c) \le E(X_\tau) \implies E(X_\sigma; A) \le E(X_\tau; A) = E\big[E(X_\tau|\mathcal F_\sigma);\ A\big]
\]
by the definition of conditional expectation, and since this holds for every $A \in \mathcal F_\sigma$ (and both $X_\sigma$ and $E(X_\tau|\mathcal F_\sigma)$ are $\mathcal F_\sigma$-measurable), we get the desired result.

Example 197 (Asymmetric random walk)

We perform a walk
\[
S_n = \sum_{i=1}^n \xi_i, \qquad \xi_i = \begin{cases} +1 & \text{with probability } p \\ -1 & \text{with probability } q = 1 - p, \end{cases}
\]
where $p \in (0.5, 1)$. (Generally, there is a drift in the upward direction.)

We know that
\[
S_n = \sum_{i=1}^n (\xi_i - E\xi) + n \cdot E\xi.
\]
The first term here is the sum of $n$ iid centered terms, so it has standard deviation on the order of $\sqrt n$. Meanwhile, the second term is of order $n$! So the drift is the dominant term here, and eventually we won't return to the 0 line. Let's quantify this with martingales. Notice that
\[
E(s^\xi) = ps + \frac{q}{s} = 1
\]
has solutions at $s = 1$ and $s = \frac{q}{p}$. So if we take
\[
M_n = \left(\frac{q}{p}\right)^{S_n},
\]
which is a product of iid terms of mean 1, we have a nonnegative martingale! For any integer $x$, let $\tau_x = \inf\{n \ge 0 : S_n = x\}$ be the first hitting time, which can be infinite ($\inf \emptyset = \infty$). So now if we take any $a, b > 0$ and define

\[
\tau = \tau_{-a} \wedge \tau_b,
\]
then $EM_\tau = EM_0 = 1$, because the stopped martingale is uniformly bounded: up to time $\tau$, $S_n$ stays in $[-a, b]$, so $M_n$ lies between $(q/p)^b$ and $(q/p)^{-a}$. (Also note that $\tau = \infty$ with probability 0, because we will eventually escape $(-a, b)$.) But now
\[
1 = EM_\tau = \pi \left(\frac{q}{p}\right)^{-a} + (1 - \pi)\left(\frac{q}{p}\right)^{b},
\]
where $\pi$ is the probability that we reach $-a$ before we reach $b$. We can solve this to find that
\[
\pi = \frac{1 - (q/p)^b}{(q/p)^{-a} - (q/p)^b}.
\]
Now if we take $a \to \infty$, we find that $P(\tau_b = \infty) = 0$, and if we take $b \to \infty$, we find that $P(\tau_{-a} < \infty) = (q/p)^a$. So given any positive level, we'll hit it almost surely, and given any negative level, there's a positive probability we'll never hit it! What is the expected time that we hit $b$? To figure this out, we should consider the martingale

\[
M_n = S_n - (E\xi)\,n = S_n - n(p - q).
\]
We want to guess that $E\tau_b = \frac{b}{p-q}$, because at the stopping time, $S_{\tau_b} = b$. To justify this, define
\[
S_{\min} = \min\{S_n : n \ge 0\} \le 0.
\]
This is integrable, because we can write
\[
E(-S_{\min}) = \sum_{a\ge1} P(-S_{\min} \ge a) = \sum_{a\ge1} P(\tau_{-a} < \infty) = \sum_{a\ge1} \left(\frac{q}{p}\right)^a < \infty
\]
(a geometric sum). Thus $S_{\min} \in L^1$, and now we have
\[
S_{n\wedge\tau_b} \in [S_{\min}, b] \quad \forall n,
\]
so $S_{n\wedge\tau_b}$ is uniformly integrable. Thus it converges almost surely and in $L^1$ to its limit $S_{\tau_b}$, and $S_{\tau_b} = b$ by definition! Since $M_n$ is a martingale,
\[
E(S_{n\wedge\tau_b}) = (p - q)\,E(n \wedge \tau_b).
\]
The left side converges to $b$, while $E(n \wedge \tau_b)$ converges to $E\tau_b$ by the monotone convergence theorem, so indeed we have
\[
E\tau_b = \frac{b}{p - q} = \frac{b}{2p - 1}.
\]
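Both formulas are easy to check by simulation (my own sanity check, not from the lecture):

```python
import random

def simulate_walk(p, a, b, trials=20000, seed=6):
    """Estimate P(hit -a before b) and E[time to first hit b] for the
    p-biased walk with steps +1 (prob p) and -1 (prob 1 - p), p > 1/2."""
    rng = random.Random(seed)
    hit_minus_a, total_time_to_b = 0, 0
    for _ in range(trials):
        s, t = 0, 0
        while -a < s < b:
            s += 1 if rng.random() < p else -1
            t += 1
        if s == -a:
            hit_minus_a += 1
        while s < b:  # keep walking until b (happens a.s. since p > 1/2)
            s += 1 if rng.random() < p else -1
            t += 1
        total_time_to_b += t
    return hit_minus_a / trials, total_time_to_b / trials

p, a, b = 0.6, 10, 10
q = 1 - p
pi_exact = (1 - (q / p)**b) / ((q / p)**(-a) - (q / p)**b)
print(simulate_walk(p, a, b))     # estimates of (pi, E tau_b)
print(pi_exact, b / (2 * p - 1))  # the exact values derived above
```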

Example 198 (Patterns in a random string)

Sample $\sigma_1, \sigma_2, \ldots$ iid uniformly from the alphabet $\mathcal A = \{A, B, \ldots, Z\}$ (with $|\mathcal A| = 26$), and let $\tau$ be the first time you see a particular sequence of letters ($w = $ "ABRACADABRA") as the most recent 11 letters.

Note that $\tau \ge \mathrm{length}(w) = 11$, and also $E\tau < \infty$ (because within any block of 11 letters there is a positive chance to see the word, so the tail of $\tau$ decays geometrically). Thus $\tau$ is integrable.

Suppose that before each time $n$, a new gambler $G_n$ enters and bets 1 dollar on the event $\{\sigma_n = w_1\}$. If $\sigma_n \ne w_1$, then $G_n$ loses and exits; otherwise $G_n$ wins 26 dollars and bets all of that money on the event $\{\sigma_{n+1} = w_2\}$, and so on.

This continues until there is a mismatch ($\sigma_{n+i-1} \ne w_i$), or until the entire string matches, in which case the gambler wins $26^{11}$ dollars. Let $M_n$ be the total winnings up to time $n$ from the point of view of the casino: this is integrable for any $n$, and all games are fair, so $M_n$ is a martingale (with respect to $\mathcal F_n$).

Now note that $E(|M_{n+1} - M_n| \mid \mathcal F_n)$ is uniformly bounded almost surely, because only the 11 most recent gamblers can still be playing (so $26^{11} \cdot 11$ is a crude bound). Thus by Theorem 193, with $\tau$ the stopping time at which someone first wins the whole word,
\[
0 = EM_0 = EM_\tau = E\big[\tau - 26^{11} - 26^4 - 26\big],
\]
because the casino has collected one dollar from each of the $\tau$ gamblers, while at the stopping point there are three gamblers still alive, who have won $26^{11}$, $26^4$, and $26$ dollars respectively (corresponding to the prefixes of $w$ that are also suffixes: ABRACADABRA, ABRA, and A). And this tells us the exact value $E\tau = 26^{11} + 26^4 + 26$!

On Wednesday, we'll cover reverse martingales and their applications to zero-one laws.
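The full word is far too rare to simulate directly ($E\tau \approx 3.7 \times 10^{15}$), but the same gambling argument can be checked on a short pattern (my own sanity check, not from the lecture): for iid fair coin letters and the pattern HTH, the overlapping prefix-suffixes HTH and H give $E\tau = 2^3 + 2 = 10$.

```python
import random

def mean_hitting_time(pattern="HTH", trials=100000, seed=7):
    """Estimate E[tau] for the first appearance of `pattern` in an iid
    sequence of fair coin letters H/T; the casino martingale argument
    predicts E[tau] = 2^3 + 2 = 10 for the pattern HTH."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        recent, t = "", 0
        while not recent.endswith(pattern):
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            t += 1
        total += t
    return total / trials

print(mean_hitting_time())  # should be close to 10
```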

22 November 27, 2019

We’ll start with two definitions of a reverse martingale:

Definition 199

A reverse martingale is a martingale "indexed by the nonpositive integers $\mathbb Z_{\le 0}$": that is, we have random variables $(M_n)_{n\le0}$ such that

• $M_n \in L^1(\mathcal F_n)$, where $\cdots \subseteq \mathcal F_{-2} \subseteq \mathcal F_{-1} \subseteq \mathcal F_0 \subseteq \mathcal F$, and

• the martingale property holds: $M_n = E(M_{n+1}|\mathcal F_n)$.

Equivalently, we can index with the nonnegative integers: the process is the usual $(M_n)_{n\ge0}$, except that $\mathcal F \supseteq \mathcal F_0 \supseteq \mathcal F_1 \supseteq \mathcal F_2 \supseteq \cdots$, and the martingale property reads $M_{n+1} = E(M_n|\mathcal F_{n+1})$. (We'll use this latter notation.)

One point is that if $\mathcal F \supseteq \mathcal F_0 \supseteq \cdots$, then $\mathcal F_n \downarrow \mathcal F_\infty = \bigcap_{n\ge0} \mathcal F_n$. Basically, intersections of $\sigma$-algebras are $\sigma$-algebras themselves (though unions may not be)!

The theory of reverse martingales is actually easier than that of normal martingales: note, for example, that $M_n = E(M_0|\mathcal F_n)$ for all $n$, by iterating the martingale property.

Theorem 200

If $(M_n)_{n\ge0}$ is a reverse martingale with respect to a filtration $\mathcal F_n \downarrow \mathcal F_\infty$, then $M_n \to M_\infty = E(M_0|\mathcal F_\infty)$ almost surely and in $L^1$.

Proof. Let $U_n(a, b)$ be the number of upcrossings of $[a, b]$ by the process $(M_n, \ldots, M_0)$ (going backwards in time, which gives us a normal martingale). Thus, we can use the Doob upcrossing inequality to show that
\[
EU_n(a, b) \le \frac{E(M_0 - a)_+}{b - a}.
\]
But $M_0$ doesn't change as $n$ grows, so this is bounded uniformly in $n$, and thus by the monotone convergence theorem, $EU_\infty(a, b) < \infty$. So by the same logic as before, $M_n$ must converge almost surely to some $M_\infty$.

Convergence in $L^1$ follows because $M_n = E(M_0|\mathcal F_n)$, which is a collection of conditional expectations of a single random variable: this family is uniformly integrable (Lemma 182) and therefore also converges in $L^1$.

Corollary 201

If $Y \in L^1(\Omega, \mathcal F, P)$ and $\mathcal F_n \downarrow \mathcal F_\infty$ (where all $\mathcal F_n \subseteq \mathcal F$), then $E(Y|\mathcal F_n)$ converges to $E(Y|\mathcal F_\infty)$ almost surely and in $L^1$.

Proof. $M_n = E(Y|\mathcal F_n)$ is a reverse martingale, so it converges almost surely and in $L^1$ to $M_\infty = E(M_0|\mathcal F_\infty) = E(E(Y|\mathcal F_0)|\mathcal F_\infty) = E(Y|\mathcal F_\infty)$ by the tower property, which is exactly what we want.

We’ll spend the rest of the lecture on zero-one laws.

Definition 202

Suppose we have random variables $(X_i)_{i\ge1}$ which are independent but not necessarily iid on a probability space $(\Omega, \mathcal F, P)$, and define
\[
\mathcal F_{n+} = \sigma(X_n, X_{n+1}, \ldots).
\]
Then the tail $\sigma$-algebra is $\mathcal T = \bigcap_{n\ge1} \mathcal F_{n+}$.

Example 203

Let $A$ be the event $\{X_n \ge 3 \text{ i.o.}\}$ (infinitely often). This is in the tail $\sigma$-algebra, because it doesn't depend on the first $k$ terms for any $k$.

Formally, the way to prove this is to write
\[
A = \bigcap_{n\ge1} \bigcup_{m\ge n} \{X_m \ge 3\},
\]
and because the inner union is decreasing as a function of $n$, this is also
\[
A = \bigcap_{n\ge\ell} \bigcup_{m\ge n} \{X_m \ge 3\} \in \mathcal F_{\ell+}
\]
for all $\ell$, meaning that $A \in \mathcal T$.

Theorem 204 (Kolmogorov 0-1 law)

On (Ω, F, P), suppose we have independent random variables (Xi )i≥1, and let T be their tail σ-algebra. Then T is trivial under P: that is, P(A) = 0 or 1 for all A ∈ T .

In other words, if an event doesn’t change if we change only finitely many coordinates, it either always happens or never happens.

Proof without martingales. Let $\mathcal F_k = \sigma(X_1, \ldots, X_k)$: this is independent of $\mathcal F_{m,n} = \sigma(X_m, \ldots, X_n)$ as long as $k < m \le n$. (In other words, as long as we're using disjoint sets of the $X_i$'s, the $\sigma$-algebras will be independent.) Now $\bigcup_{n\ge m} \mathcal F_{m,n}$ is a $\pi$-system generating $\mathcal F_{m+}$, so $\mathcal F_k$ is independent of $\mathcal F_{m+}$ by a $\pi$-$\lambda$ argument. (Let $\mathcal G$ be the collection of sets satisfying the independence property, and check that it's a $\lambda$-system.)

Now $\mathcal F_{m+}$ contains the tail $\sigma$-algebra, so $\mathcal F_k \perp\!\!\!\perp \mathcal T$ for all $k$. Next, $\bigcup_k \mathcal F_k$ is a $\pi$-system which generates $\mathcal F_{1+}$, so $\mathcal F_{1+}$ is independent of $\mathcal T$, which is another $\pi$-$\lambda$ argument. But $\mathcal F_{1+}$ contains $\mathcal T$, so $\mathcal T$ is independent of itself. Thus, for all $A \in \mathcal T$,
\[
P(A) = P(A \cap A) = P(A)^2,
\]
which forces $P(A) \in \{0, 1\}$.

Proof with forward martingales. Again, argue as above that Fk is independent of T for all k. Let A ∈ T , and consider the martingale

Mn = E(1A|Fn).

By the convergence theorem for uniformly integrable martingales (also known as the Lévy convergence theorem), Mn converges almost surely and in L^1 to

M∞ = E(1A|F∞) = 1A

because T ⊆ F∞. But each Mn is also equal to P(A), because the event A is independent of Fn. So P(A) = 1A almost surely; the left side is a number and the right side takes only the values 0 and 1, so P(A) ∈ {0, 1}.

The Kolmogorov 0-1 law is stated in terms of random variables, but we could also state it slightly differently: if we assume that

(Ω, F, P) = (∏_{i=1}^∞ Si, ⊗_{i=1}^∞ Si, ⊗_{i=1}^∞ µi)

has a product structure, then

Xi : Ω → Si, Xi((ωj)_{j=1}^∞) = ωi ∈ Si

just gives us the ith coordinate of ω. Then the law of Xi is exactly µi. The next result does require the coordinates to be iid, but it's stated similarly:

Definition 205
Say that we have a product space (equivalently, a sequence of iid random variables)

(Ω, F, P) = (∏_{i=1}^∞ S, ⊗_{i=1}^∞ S, ⊗_{i=1}^∞ µ).

The exchangeable σ-algebra E, with T ⊆ E ⊆ F, is the collection of events that are invariant under permutations of finitely many coordinates (bijections π : N → N changing only finitely many coordinates). Formally, if ω ∈ Ω = S^N, let ωπ be the tuple (ω_{π(i)})_{i=1}^∞, and for any A ∈ F, let Aπ be the set {ωπ : ω ∈ A}: A is permutable if A = Aπ for all finite permutations π. Then formally,

E = {A : A permutable}.

We will often denote

En = {A ∈ F : A = Aπ ∀π with π(i) = i ∀i > n}

to be those events invariant under permutations of the first n coordinates. Note that En ↓ E.

Example 206
T ⊆ E, because a permutation of finitely many coordinates only changes finitely many coordinates. They aren't the same set in general, though: consider

A = {∑_{i=1}^n Xi ∈ [9, 10] i.o.}.

This is exchangeable but not generally in T.

Theorem 207 (Hewitt-Savage 0-1 law)
On our product space

(Ω, F, P) = (∏_{i=1}^∞ S, ⊗_{i=1}^∞ S, ⊗_{i=1}^∞ µ),

the exchangeable σ-field E is P-trivial.

Proof without martingales. We’ll start with a useful fact:

Fact 208

On any (Ω, F, P), if Fi ↑ F∞ ⊆ F, then for all A ∈ F∞, we can find Ai ∈ Fi such that

P(A△Ai) → 0,

where △ denotes the symmetric difference. (The proof is a π-λ argument.)

So on our product space, take Fn to be the σ-algebra of sets of the form E1 × E2 × ··· × En × ∏_{i>n} S, where the Ei are measurable subsets of S. Then Fn increases to F by definition, so for all A ∈ F, there exists a sequence An ∈ Fn such that P(A△An) → 0.

Now let πn : N → N act on the first 2n coordinates by swapping k and n + k for all 1 ≤ k ≤ n, and let Bn = (An)πn. If A ∈ E, then

P(A△Bn) = P((A△Bn)πn) = P(A△An)

because the measure P is invariant under permutation of indices (when we have a product iid measure), and then by exchangeability (A = Aπn). On the other hand, An depends only on the first n coordinates, while Bn depends on coordinates n + 1 through 2n, so An ⊥⊥ Bn. Thus,

P(An ∩ Bn) = P(An)P(Bn) =⇒ P(A) = P(A)^2,

because both An and Bn approximate A (symmetric differences going to 0).

Proof with reverse martingales. Again, here’s a useful fact:

Fact 209
If G, H are any two sub-σ-fields of F, W ⊥⊥ H, and E(W|G) is H-measurable, then E(W|G) = E(W) is constant.

Proof. Let Y = E(W |G): by definition, this is G-measurable, and it’s also H-measurable by assumption. If we let

A = {Y − E(W ) ≥ ε} ∈ G ∩ H, then by definition of the conditional expectation (since A ∈ G),

E(W ; A) = E(Y ; A) ≥ (EW + ε) P(A).

But A ∈ H, and W is independent of H, so

E(W ; A) = E(W )P(A).

Putting these together, E(W )P(A) ≥ (E(W ) + ε)P(A), contradiction unless P(A) = 0. Do the same for the other bound (at most −ε) to show that Y = E(W ).

We will show that E ⊥⊥ Fk for all k, which implies that E ⊥⊥ F by a π-λ argument. Since F contains E, this shows that E is independent of itself and we're done as above. To show this, it suffices to show that for all bounded measurable functions φ : S^k → R, if we let W = φ(ω1, ···, ωk) ∈ Fk, then E(W|E) = E(W). (Independence follows because we can take W to be an indicator variable.)

Note that En ↓ E, so

E(W |En) → E(W |E) almost surely and in L1 by Theorem 200. But we can directly evaluate

E(W|En) = E[φ(ω1, ···, ωk)|En]:

the idea is that as long as n ≥ k, we average over all ordered k-tuples of distinct indices in [n],

E(W|En) = (1/(n)_k) ∑_{i ∈ [n]^k distinct} φ(ω_{i1}, ···, ω_{ik}),

where (n)_k = n(n−1)···(n−k+1). This average is measurable with respect to En, and permuting the first n coordinates doesn't change it, so this is indeed E(W|En).

And now W ∈ Fk, and we claim that

lim_{n→∞} E(W|En) is measurable with respect to F(k+1)+:

this is because the probability that the tuple (i1, ···, ik) intersects the first k coordinates goes to 0 as n → ∞ (it's at most k^2/n by a union bound), so the contribution from the first k coordinates makes no difference in the limit. On the other hand, we know that

lim_{n→∞} E(W|En) = E(W|E),

so E(W|E) is F(k+1)+-measurable while W ∈ Fk, and these σ-fields are independent of each other; the result follows by Fact 209.

Here’s a final application of the reverse martingale: remember the strong law of large numbers, which says that 1 Pn X,Xi iid with EX = µ implies that n i=1 Xi converges to µ almost surely. Here’s a shorter proof!

Proof. Define (taking Sn = X1 + ··· + Xn to be the usual partial sum)

Gn = σ(Sn, Sn+1, ···) ↓ G∞.

Let Mn = Sn/n, and note that

E(Mn|Gn+1) = E(Sn/n | Sn+1, Sn+2, ···) = E(Sn/n | Sn+1, Xn+2, Xn+3, ···).

Notice that Xn+2, Xn+3, ··· give no extra information about Sn/n, so this is just

E(Sn/n | Sn+1) = Sn+1/(n + 1) = Mn+1,

by symmetry of X1, ···, Xn+1 given Sn+1. So this is a reverse martingale with respect to Gn, and it converges to M∞ = E(M1|G∞) = E(X1|G∞) almost surely by Theorem 200. But G∞ is contained in E, because Gn is contained in En, so by the Hewitt-Savage 0-1 law we are conditioning on a σ-algebra in which every event has probability 0 or 1: thus E(X1|G∞) is just EX1 = µ.

23 December 2, 2019

Today is the last lecture before the exam, so we'll do some new material and have quite a bit of review. (There is no class on Wednesday, and the exam is in the evening.) First of all, recall that given finite measures µ, ν on (Ω, F), we say that ν is absolutely continuous with respect to µ (written ν ≪ µ) if for all A ∈ F with µ(A) = 0, we also have ν(A) = 0. If there is a measurable function f : Ω → [0, ∞) such that for all A ∈ F,

ν(A) = ∫_A f dµ = ∫_Ω f(ω) 1{ω ∈ A} dµ(ω),

then f is known as the Radon-Nikodym derivative of ν with respect to µ, denoted f = dν/dµ. An example of such a derivative is

dPθ/dP = exp(θ ∑_{i=1}^n Xi) / E(e^{θX})^n.

It's not guaranteed in general that the Radon-Nikodym derivative exists, but if it does, ν must be absolutely continuous with respect to µ, because given any set A with µ(A) = 0,

ν(A) = ∫_A f dµ = 0.

The converse holds, too:

Theorem 210 (Radon-Nikodym)
If µ, ν are finite measures on (Ω, F) and ν ≪ µ, then there exists a measurable f such that ν(A) = ∫_A f dµ for all A ∈ F.

Note that this f is unique µ-almost-surely: otherwise we could find a set of positive µ-measure where the integrals differ. And remember that we used this theorem (without proof) earlier in the class to construct the conditional expectation: if X ∈ L^1(Ω, F, P) and we have a sub-σ-algebra G ⊆ F, then we can assume without loss of generality that X is nonnegative, and define

E(X|G) = dν/dµ,

where

µ = P|G, ν(A) = E(X; A) for all A ∈ G.

These are two different measures, and ν is absolutely continuous with respect to µ, because A having measure 0 means the expected value of X restricted to A is 0. This is well-defined, because (1) it is G-measurable and (2) the conditional property follows from the way we defined µ and ν. Let's prove a few properties that are pretty obvious (and have actually already been used):

Lemma 211
If we have finite measures π ≪ ν ≪ µ on (Ω, F), then dπ/dµ = (dπ/dν)(dν/dµ), µ-almost-surely.

Proof. Note that for all nonnegative measurable functions h,

∫ h dν = ∫ h (dν/dµ) dµ.

This is true because it's true for indicator functions h = 1A (by definition of the Radon-Nikodym derivative), then it's true for simple functions by linearity, and then we can approximate from below and use the monotone convergence theorem for general measurable functions. Now for any measurable A,

π(A) = ∫_A (dπ/dν) dν,

and dπ/dν is a nonnegative measurable function, so we can use the above equality to get

π(A) = ∫_A (dπ/dν)(dν/dµ) dµ.

But the Radon-Nikodym derivative is unique µ-almost-surely, so (dπ/dν)(dν/dµ) must agree µ-almost surely with dπ/dµ, as desired.

Lemma 212
If ν ≪ µ and µ ≪ ν, then dµ/dν = (dν/dµ)^{−1} almost surely under both µ and ν.

Proof. This follows directly from Lemma 211 by taking π = µ or π = ν.

The full proof of the Radon-Nikodym theorem is in Appendix A.4 of Durrett, and it's a bit outside the scope of the class (so we won't cover it). However, there is one case which is easy to prove! Suppose Ω = ⨆_{i=1}^∞ Ωi can be partitioned into countably many pieces such that F = σ(Ω1, Ω2, ···). Say that µ, ν are finite measures on Ω with ν ≪ µ: now we can just define

dν/dµ (ω) = f(ω) = ν(Ωi)/µ(Ωi) when ω ∈ Ωi and µ(Ωi) > 0, and f(ω) = 0 (or any number at all) when ω ∈ Ωi and µ(Ωi) = 0.

(The second case has µ-measure zero, so it doesn't affect µ-almost-sure uniqueness.) We can indeed check both properties of the Radon-Nikodym derivative, and this proves the theorem in this simple case (with a countable partition). A more general situation is one where we can approximate the σ-algebra with partitions of the above kind: consider

(Ω, F), where Fn ↑ F and each Fn is generated by a countable partition of Ω. For example, if we take Ω = R and Fn = σ([i/2^n, (i+1)/2^n) : i ∈ Z), then Fn actually increases to the entire Borel σ-algebra B_R (because B_R is generated by open intervals, and we can approximate open intervals by unions of these dyadic intervals).
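The countable-partition construction is completely explicit, so it can be checked mechanically. A minimal sketch (the four-block partition and the block masses are hypothetical choices): f is constant on each block, and integrating f against µ over any union of blocks recovers ν.

```python
from fractions import Fraction

# Masses of the partition blocks Omega_1, ..., Omega_4 under mu and nu.
mu = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
nu = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]

# dnu/dmu is constant on each block: f = nu(Omega_i) / mu(Omega_i).
# (No block is mu-null here, so the second case of the definition is not needed.)
f = [n / m for n, m in zip(nu, mu)]

def integrate_f(blocks):
    """nu(A) recovered as the integral of f over A = union of the given blocks."""
    return sum(f[i] * mu[i] for i in blocks)

# Check nu(A) = int_A f dmu on a couple of measurable sets.
assert integrate_f({0, 2}) == nu[0] + nu[2]
assert integrate_f({0, 1, 2, 3}) == 1
print([float(x) for x in f])
```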

Now suppose that µ, ν are finite measures on (Ω, F) such that ν ≪ µ, and let µn, νn be the restrictions of µ, ν to Fn. These also satisfy νn ≪ µn, so dνn/dµn exists by the above explicit construction. It's then natural to ask whether dνn/dµn → dν/dµ.

Remark 213. Here's an example where νn ≪ µn for every n, but where we don't have ν ≪ µ: take µ to be the law of an infinite product of Bernoulli(p) variables and ν the law of an infinite product of Bernoulli(q) variables, with p ≠ q. We do have νn ≪ µn for any n, but by the strong law of large numbers, ν is carried by the event that the empirical frequency of 1s is q, which has µ-measure zero: so ν ≪ µ fails.
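This example is easy to watch numerically. Under µ, the likelihood ratio Xn = dνn/dµn = ∏_{i≤n} (q/p)^{ωi} ((1−q)/(1−p))^{1−ωi} is a nonnegative martingale with mean 1, yet when p ≠ q it converges to 0 almost surely. A minimal simulation sketch (p = 0.5 and q = 0.6 are arbitrary choices):

```python
import random

p, q = 0.5, 0.6  # mu = product of Bernoulli(p), nu = product of Bernoulli(q)
rng = random.Random(1)

def likelihood_ratio(n):
    """X_n = d(nu_n)/d(mu_n) evaluated along a sequence sampled from mu."""
    x = 1.0
    for _ in range(n):
        omega = 1 if rng.random() < p else 0
        x *= (q / p) if omega else ((1 - q) / (1 - p))
    return x

# E_mu[X_n] = 1 for every n, yet X_n -> 0 mu-almost surely when p != q.
x_n = likelihood_ratio(20_000)
print(x_n)
```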

Lemma 214
Let µ, ν be probability measures on (Ω, F) (rescaling from finite measures), where we're not assuming ν ≪ µ. Suppose that Fn ↑ F, and define µn = µ|Fn, νn = ν|Fn. Suppose νn ≪ µn, so we can define Xn = dνn/dµn. Then Xn is a martingale with respect to Fn on (Ω, F, µ).

Proof. Let E denote expectation with respect to µ: Xn is Fn-measurable by definition, because it only depends on νn and µn, and

EXn = ∫ (dνn/dµn) dµ = ∫ (dνn/dµn) dµn = νn(Ω) = 1,

so Xn ∈ L^1(Fn). And now for all A ∈ Fn,

E(Xn+1; A) = ∫_A Xn+1 dµ = ∫_A Xn+1 dµn+1 = νn+1(A)

by the definition of the Radon-Nikodym derivative, and this is equal to ν(A), which is the same as νn(A), which is therefore the same as

νn(A) = ∫_A Xn dµn = E(Xn; A),

showing the conditional expectation property.

With this, we can answer our convergence question:

Theorem 215

Let µ, ν be probability measures on (Ω, F), not assuming absolute continuity. Again, suppose Fn ↑ F, µn = µ|Fn, νn = ν|Fn, and say that νn ≪ µn with Xn = dνn/dµn. Then X = lim sup_{n→∞} Xn satisfies

ν(A) = ∫_A X dµ + ν(A ∩ {X = ∞}).

The first part is denoted νcontinuous(A) (it is absolutely continuous with respect to µ), and the second is νsingular(A) (singular, meaning it is carried by a set of µ-measure zero). Here, Xn is a nonnegative martingale under the measure µ, so it converges almost surely to a finite limit under µ, but it may not do so under the measure ν, which explains the "boundary" second term! Also, note that

µ(X = ∞) = 0

by this almost-sure convergence,

but we’re just saying that ν(X = ∞) can be positive. Once we prove this, we’ll have proved the Radon-Nikodym theorem in the special case where we increase with

countable partitions Fn ↑ F (since the singular term is just zero if we already know that ν  µ).

Proof. The key is to introduce a measure that interpolates between µ and ν: let ρ = (µ + ν)/2, and let ρn = ρ|Fn = (µn + νn)/2. By assumption, νn ≪ µn ≪ ρn (because ρn is a mixture of µn and νn, any event with measure zero under ρn must have measure zero under both µn and νn), and thus we can define the derivatives

Xn = dνn/dµn, Yn = dµn/dρn, Zn = dνn/dρn.

Note that Yn + Zn = 2, ρn-almost-surely, and by Lemma 214, Yn and Zn are Fn-martingales on (Ω, F, ρ). Since they are nonnegative, they converge to finite limits Y, Z ρ-almost-surely. Since Yn + Zn = 2, we must have Y + Z = 2, ρ-almost-surely.

Now we claim that Y = dµ/dρ: the nice thing about Y and Z is that they are both bounded, so we have stronger convergence results! Note that if A ∈ Fℓ ⊆ Fn, then

µ(A) = µn(A) = ∫_A Yn dρn = ∫_A Yn dρ,

and this last expression converges to ∫_A Y dρ by the bounded convergence theorem. (This step doesn't necessarily work for X!) Thus, for any A ∈ Fℓ, we have µ(A) = ∫_A Y dρ, and now we want to conclude this holds for all A ∈ F: this is just a π-λ argument. The same reasoning works for Z, so now we know that Y = dµ/dρ, Z = dν/dρ. Now νn ≪ µn ≪ ρn, so

dνn/dρn = (dνn/dµn)(dµn/dρn) =⇒ Zn = Xn Yn

ρ-almost surely, so

Xn = Zn/Yn ∈ [0, ∞],

where the endpoints are only possible when Yn = 0 or Zn = 0. Then X = lim sup_{n→∞} Xn is just Z/Y, and now ν ≪ ρ with derivative Z, so we can write down

ν(A) = ∫_A Z dρ.

To put this into the form we want, note that X = ∞ corresponds to Y = 0, so

ν(A) = ∫_A Z [W Y + 1{Y = 0}] dρ,

where W = 1/Y when Y > 0 and W = 0 otherwise (so the bracketed term is 1, ρ-almost surely). Splitting this up,

ν(A) = ∫_A Z W Y dρ + ∫_A Z 1{Y = 0} dρ,

and now the first term is (because ZW = X on the event that Y is positive, and Y is the derivative dµ/dρ)

∫_A X dµ

(here, we would have to keep track of the indicator 1{X < ∞}, but switching to integrating under µ-measure makes X finite almost surely!), while the second term simplifies to (since {Y = 0} = {X = ∞} and ν has derivative Z with respect to ρ)

ν(A ∩ {X = ∞}),

as desired.

24 December 9, 2019

There won’t be any more assignments for this class: in the last week, we’ll cover a few topics of general interest. Today, we’ll go over the proof of the Kesten-Stigum L log L criterion: the initial work comes from various papers by Kesten and Stigum (including [5]), and many of the ideas covered in this lecture come from [6].

We’ll start with a review of the setup: say that ξ ∼ p is a probability distribution supported on Z≥0, so ξ is the

L in the theorem statement. Define an array ξn,i iid from the law p, and define the Galton-Watson tree by starting

from a root vertex o (at depth 0), where the ith vertex at depth (n − 1) has ξn,i children at depth n. This generates

a finite or infinite tree, and Fn = σ(ξ`,i : i ≥ 1, 1 ≤ ` ≤ n) encodes all the information about the tree.

Let Zn be the population of the tree at depth n, which satisfies the recursive equation

Zn = ∑_{i=1}^{Zn−1} ξn,i.

Assume that E[ξ] = m ∈ (1, ∞), so we’re in the supercritical case. We know from an earlier discussion that

P(extinction) = P(Zn = 0 eventually) = ρ ∈ [0, 1),

so there is a positive probability that the population never dies out. In addition,

Wn = Zn/m^n

is a nonnegative martingale, so it converges almost surely to a limit W ∈ [0, ∞). The probability that W = 0 satisfies the same fixed point equation as the extinction probability, so P(W = 0) ∈ {ρ, 1}. If it's 1, then for large n we know (almost surely) that Zn is of smaller order than m^n; and if it's ρ, then P(W = 0 | nonextinction) = 0 (because whenever extinction occurs, W = 0 as well), so conditioned on nonextinction, W is a positive finite number almost surely, meaning that Zn grows like a constant times m^n. Kesten-Stigum then tells us when each of these cases occurs:
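A minimal simulation sketch of this dichotomy in the "nice" regime: the offspring law is chosen here as Poisson with mean m = 2 (which certainly satisfies E[ξ log ξ] < ∞). Across many runs, Wn = Zn/m^n averages to about EW = 1, and the fraction of runs with Wn = 0 approximates the extinction probability ρ ≈ 0.203 (the root of ρ = e^{2(ρ−1)}).

```python
import math, random

m = 2.0  # supercritical: mean offspring m > 1
rng = random.Random(2)

def poisson(lam):
    """Knuth's inversion sampler for a Poisson(lam) variable."""
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

def w_n(depth=10):
    """Run the Galton-Watson process and return W_n = Z_n / m^n at depth n."""
    z = 1
    for _ in range(depth):
        z = sum(poisson(m) for _ in range(z))
        if z == 0:
            return 0.0  # extinct, so W = 0
    return z / m ** depth

runs = [w_n() for _ in range(200)]
avg_w = sum(runs) / len(runs)                           # near E[W] = 1
frac_extinct = sum(w == 0.0 for w in runs) / len(runs)  # near rho ~ 0.203
print(avg_w, frac_extinct)
```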

Theorem 216 (Kesten-Stigum)

Suppose (Zn)_{n≥0} is a supercritical Galton-Watson tree with offspring law ξ ∼ p, where Eξ = m ∈ (1, ∞). Let Wn = Zn/m^n → W (almost surely). Then the following are equivalent:

• P(W = 0) = ρ,

• EW = 1,

• E[ξ log+ ξ] is finite, where log+ assigns 0 to 0.

On our homework, we showed that if E(ξ^2) < ∞, then the martingale is bounded uniformly in L^2 norm, so the L^2 martingale convergence theorem says that Wn converges in L^2 to W, and hence also in L^1. This means EW = 1, so the probability that W = 0 can't be 1 (and thus it must be ρ). So this theorem is a stronger version of that result!

Here’s something else from our homework: one thing we’ll use in the proof is that if Y,Yi are iid nonnegative random variables, then  Yn 0 EY < ∞ lim sup = n n→∞ ∞ EY = ∞.

(The proof is that for any x > 0, ∑_{n=1}^∞ P(Yn/n ≥ x) is sandwiched between E⌊Y/x⌋ and E⌈Y/x⌉, so these sums are both finite or both infinite: thus we can use Borel-Cantelli.) As an easy consequence, if Y, Yi are nonnegative integer valued random variables, then for any c > 1,

lim sup_{n→∞} Yn/c^n = 0 if E[log+ Y] < ∞, and = ∞ if E[log+ Y] = ∞.

(Transform the previous result: the only thing to worry about is where Yn = 0, but that changes the value by at most 1.) With this, we can consider a branching process with immigration: define the random variables

Xn = ∑_{i=1}^{Xn−1} ξn,i + Yn,

where Y, Yi are iid and count the individuals that immigrate at level n.

Fact 217

On our homework, we showed that if E(log+ Y) = ∞, then lim sup_{n→∞} Xn/c^n is infinite for all c > 1, and if E(log+ Y) < ∞, then Xn/m^n converges to a finite limit in (0, ∞). The first case is easy by the previous discussion, and the second can be shown using the submartingale convergence theorem.

We’ll spend the rest of today linking the two together! One key component comes from an exam question and was also proved last time:

Theorem 218

Let µ, ν be finite measures on (Ω, F), and suppose Fn ↑ F is a filtration. Denote µn = µ|Fn and similarly for ν, and suppose that for all finite n, νn ≪ µn. Then if we denote

Wn = dνn/dµn, W = lim sup_{n→∞} Wn,

we can decompose ν into a continuous and a singular part:

ν(A) = ∫_A W dµ + ν(A ∩ {W = ∞}) = νcontinuous(A) + νsingular(A).

There are two extreme cases of this: first of all, µ(W = 0) = 1 is equivalent to νcontinuous = 0, i.e. ν = νsingular ⊥ µ, which implies ν(W = ∞) = 1 (take A to be everything). On the other hand, ∫_Ω W dµ = 1, meaning that νcontinuous(Ω) = 1, is equivalent to ν ≪ µ, and then ν(W = ∞) = 0.

To prove Theorem 216, we'll apply this with Ω = {rooted graphs} (actually trees) and Fn = σ(tree truncated at depth n). Ω can be made into a Polish space (also on our homework), and Fn increases to F, the Borel σ-algebra on this metric space. Then the measures we'll use are

µ = Galton-Watson measure with offspring law p

(just generate a Galton-Watson tree at random), and a measure ν, yet to be defined, such that dνn/dµn = Wn = Zn/m^n. We'll then proceed by looking at this process under the measure ν!

In principle, this already pins down a well-defined measure ν, since we are specifying its restriction νn for every n. But ν is a bit confusing: it's not clear what it looks like, so we don't really know how to analyze a statement like ν(W = ∞) = 0. So let's start with a few preliminary steps!

First of all, if Tn is the tree T truncated at depth n, then ν(Tn ∈ B) = νn(Tn ∈ B) = ∫ 1{Tn ∈ B} (Zn/m^n) dµn by the definition of the Radon-Nikodym derivative. In particular, note that for all n,

ν({T dies by level n}) = ∫ 0 dµn = 0,

because Zn is zero on this event. Thus ν(extinction) = 0, so under ν, the tree survives forever (while this is not true under µ).

Also, it’s easy to define ν1: dν1 ξ1,1 = W1 = , dµ1 m

so we bias the original measure by the number of offspring at the first level: ν1 is the law of T1 with a size-bias kp pˆ = k ∀k ≥ 1. k m
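Size-biasing is a completely mechanical operation on the offspring law. A minimal sketch with a hypothetical p supported on {0, 1, 2, 3}; note that p̂ assigns no mass to 0, which is why the marked path in the construction that follows never dies out.

```python
# Hypothetical offspring law p on {0, 1, 2, 3}.
p = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.1}
m = sum(k * pk for k, pk in p.items())  # offspring mean

# Size-biased law p_hat_k = k * p_k / m, supported on k >= 1.
p_hat = {k: k * pk / m for k, pk in p.items() if k >= 1}

assert abs(sum(p_hat.values()) - 1.0) < 1e-12  # p_hat is a probability law
print(m, p_hat)
```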

More generally, here's a description of νn: let νn* be a probability measure on pairs (Tn, xn), where Tn is a tree of depth n and xn is a vertex at depth n in Tn. (Basically, we point out a distinguished vertex at the nth level.) Define this via

νn*(Tn, xn) = µn(Tn)/normalization,

where the normalization factor is

∑_{Tn, xn} µn(Tn) = ∑_{Tn} µn(Tn) ∑_{xn} 1 = ∑_{Tn} µn(Tn) Zn(Tn) = m^n,

because E Zn = m^n is the expected number of vertices at the nth level under the ordinary Galton-Watson measure. Now let ν̃n be the marginal of νn* on Tn (where we just forget what xn is): then

ν̃n(Tn) = ∑_{xn} νn*(Tn, xn) = ∑_{xn} µn(Tn)/m^n = µn(Tn) Zn(Tn)/m^n.

Then

dν̃n/dµn = Zn/m^n = Wn,

so ν̃n is indeed the νn that we've been looking for! So we get a tree with a marked vertex, and then we forget the position of that marked vertex. Nothing here seems to actually help yet: we still don't have ν. But the key observation is the following: take (Tn, xn) ∼ νn*, and think of this as Tn with the (unique) marked path ζn from the root o to xn. We can then sample the marked tree (Tn, ζn) level-by-level along the path as follows:

• Start from the root: it generates ξ̂ ∼ p̂ children. Choose one of them uniformly at random, x1, to be on the marked path.

• For all other children except x1 (vertices off the marked path), generate children according to the original law p.

• x1 generates children according to pˆ: choose one of them, x2, to be on the marked path.

• Repeat this: anything not on the marked path generates children according to p.

We can continue this process indefinitely, and this will produce an infinite tree with a marked ray. The tree is guaranteed to be infinite because the size-biased law is supported on the positive integers, so an xi will always exist! So this produces an infinite tree (T, ζ) with marked ray ζ; call its law ν*.

Why is this the same law as that of (Tn, xn)? We need to prove that

ν*|Fn = νn* ∀n,

where the right hand side is defined (as above) via µn(Tn)/m^n. Denote the left hand side by νn◦: the sampling procedure implies that

νn◦(Tn, ζn) = (k pk/m) · (1/k) · [∏_{x ≠ x1 at depth 1} µn−1(T^(x)n−1)] · νn−1◦(T^(x1)n−1, ζn−1),

where k is the number of children of the root, because:

• We generate k children with the size-biased law (probability k pk/m).

• We pick one of the k at random to be x1 (the factor 1/k).

• We grow an ordinary Galton-Watson tree below each child x ≠ x1 off the marked path.

• We continue this sampling procedure up to level (n − 1) below the marked vertex x1.

Indeed, νn◦ satisfies the same recursion as νn*: the latter looks like

νn*(Tn, ζn) = µn(Tn)/m^n = (pk/m^n) ∏_{x at depth 1} µn−1(T^(x)n−1)

(where k is the root degree), because apart from the m^n factor this is just the ordinary Galton-Watson probability. This can be rewritten as

= (pk/m) [∏_{x ≠ x1} µn−1(T^(x)n−1)] · µn−1(T^(x1)n−1)/m^{n−1},

and since (k pk/m) · (1/k) = pk/m while µn−1(T^(x1)n−1)/m^{n−1} = νn−1*(T^(x1)n−1, ζn−1), this is the same recursion as for νn◦. Since the two measures also agree at n = 1, we can prove by induction that they are indeed equal! So now we have a measure ν* on (T, ζ), and we can just forget ζ: let ν be the marginal of ν* on T. Indeed, if we restrict ν* to Fn, we get νn*, whose marginal on Tn is ν̃n = νn, exactly as we want.

Now note that if (T, ζ) ∼ ν*, then T \ ζ is a branching process with immigration. This is because the other

children coming off of ζ (that is, the siblings of the marked children) are “immigrating” in as Y1,Y2, ··· ! Then if

X1, X2, ··· are the numbers of vertices at levels 1, 2, ···, excluding the marked ray, then

Xn = ∑_{i=1}^{Xn−1} ξn,i + Yn.

Xn + By Fact 217, we understand the behavior of lim sup mn : it depends on E[log Y ]. But the law here is [ξ log(ξ − 1)] [log+ Y ] = [log+(ξˆ− 1)] = E , E E m

where ξˆ is the size-biased distribution. And this is finite if and only if E[ξ log ξ] is finite!

So in conclusion, Ω is the space of rooted graphs, Fn are the σ-algebras of the tree up to level n, µ is the Galton-Watson law with offspring law ξ ∼ p, ν* is the law on pairs (T, ζ), and ν is the marginal of ν* on T. For finite n, µn, νn are the restrictions of µ, ν to Fn, and dνn/dµn = Wn = Zn/m^n. And now if (T, ζ) ∼ ν*, then T \ ζ is a branching process with immigration law Y = ξ̂ − 1, so

E[ξ log+ ξ] < ∞ ⇐⇒ E[log+ Y] < ∞ ⇐⇒ ν*(lim_{n→∞} Xn/m^n < ∞) = 1

by Fact 217. Now Xn counts everything at the nth level except the single marked vertex, and 1/m^n is negligible: thus, this happens if and only if (we can also replace ν* with ν)

ν(lim_{n→∞} Wn < ∞) = 1 ⇐⇒ ν ≪ µ, ∫ W dµ = 1,

where the last step comes from the extreme cases of Theorem 218. (This implies that W is not almost surely 0 under µ.) On the other hand,

E[ξ log+ ξ] = ∞ ⇐⇒ E[log+ Y] = ∞ ⇐⇒ ν*(lim sup Xn/m^n = ∞) = 1 ⇐⇒ ν(lim sup Wn = ∞) = 1,

which happens exactly when ν is singular with respect to µ, meaning that µ(W = 0) = 1.

25 December 11, 2019

Today, we’ll cover the ζ(2) limit in the random assignment problem. We’ll use some of the techniques we’ve seen in this class, and the results are very beautiful! The following two formulations of the problem are equivalent:

Problem 219 (Version 1)
Let C ∈ R^{n×n} be a random matrix with iid entries distributed as n · Exp (so mean n). Define

An = (1/n) min_{σ ∈ Sn} ∑_{i=1}^n C_{i,σ(i)}.

(We choose the rook-placement such that the total sum of those entries is minimized, and scale by a factor of n.) What does this quantity behave like: what's E An?

Problem 220 (Version 2)
Consider a complete bipartite graph Gn,n (so we have n vertices on the left and n on the right, and we draw all n^2 edges between the left and the right). Think of the left side as the rows and the right side as the columns, so Cij is the weight of the edge connecting i on the left to j on the right (with the weights defined as above). Here, let

An = (1/n) min_M ∑_{i=1}^n C_{i,M(i)},

where M ranges over perfect matchings (a subset of the edges such that every vertex is adjacent to exactly one of those edges).

Here’s the result (from [7]) that we’ll be covering today:

Theorem 221 (Aldous)
We have

lim_{n→∞} E[An] = ∑_{i=1}^∞ 1/i^2 = ζ(2) = π^2/6 ≈ 1.645.

This answer was first conjectured in 1987 by Mezard and Parisi, and previous work showed

1 + 1/e < 1.51 ≤ lim inf E An ≤ lim sup E An ≤ 1.94 < 2.

Various "famous people" worked on these bounds – many people cared about the answer to this problem! The fact that the lim inf and lim sup are equal came from earlier work by Aldous in [8], and further work (see [9] and [10]) showed the exact formula

E An = ∑_{i=1}^n 1/i^2.

We'll just show the original proof that the limit is π^2/6, but we should note that improvements in [11] have further simplified the proof.
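For small n, the minimum over Sn can be taken by brute force, which makes the theorem easy to probe numerically. With mean-1 exponential entries the n · Exp scaling and the 1/n normalization cancel, so An is just the unscaled minimum assignment cost. A minimal sketch (n = 5 and the trial count are arbitrary; for large n one would use the Hungarian algorithm rather than enumerating permutations); by the exact formula above, E[A5] = ∑_{i=1}^5 1/i^2 ≈ 1.464.

```python
import itertools, random

rng = random.Random(3)

def min_assignment(n):
    """Minimum assignment cost for an n x n matrix of iid Exp(1) entries."""
    c = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]
    return min(sum(c[i][sigma[i]] for i in range(n))
               for sigma in itertools.permutations(range(n)))

n, trials = 5, 1000
avg = sum(min_assignment(n) for _ in range(trials)) / trials
exact = sum(1 / i**2 for i in range(1, n + 1))  # = E[A_n], about 1.464
print(avg, exact)
```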

First of all, notice that there’s no scaling in the An value, so we’ll start by explaining that. If we look at this problem

via the bipartite graph Gn,n with weights Cij ∼ n · Exp, then for any given matching M, define the cost

cost(M) = (1/n) ∑_{i=1}^n C_{i,M(i)}.

We want to understand the minimum cost of a matching, but note that for any deterministic M, the expected value of the cost is

(1/n) ∑_{i=1}^n n = n.

independent random variables C1, ··· ,Cn. By definition,

−t/n P(Ci ≥ t) = e ∀t ≥ 0, so the probability that the minimum is at least t is

  n  −t/n −t P min Ci ≥ t = e = e . 1≤i≤n

This is the cdf of a standard exponential variable! Let C(1) be the smallest of the Ci s, C(2) be the second smallest, and so on. Since the exponential distribution is memoryless, meaning that

P(C ≥ s + t|C ≥ s) = P(C ≥ t), let’s condition on the value of C(1) = min1≤i≤n Ci . Then we have n − 1 random variables left, and we know they’re all exponential and larger than Cmin, so memoryless property tells us that

d n C = C + · Exp. (2) (1) n − 1 Repeating this process yields that (if we sort our vector)   d n n (C , ··· ,C ) = E ,E + E , ··· ,E + E + ··· + nE . (1) (n) 1 1 n − 1 2 1 n − 1 2 n

As n → ∞, this converges to (E1,E1 + E2,E1 + E2 + E3, ··· ), where the Ei are iid exponentials. This is a collection of points on the line equivalent to the Poisson point process of rate 1 (we start from 0, and we go to the right by Ei on the ith step. Then the number of points Ei inside an interval [a, b] satisfies a Poisson distribution). So in summary, if we look at the edges incident to a specific vertex i, the first 100 smallest edges will be of constant order as n gets large! So it’s pretty reasonable to assume constant order for each Ci,M(i). This is just a heuristic, though, and what we’ve said doesn’t really imply that the limit of EAn should go to a constant. So we have to be more specific:

Definition 222

The Poisson-weighted infinite tree is constructed as follows: let Π be the random collection of points (E1, E1 + E2, E1 + E2 + E3, ···) as above, where the Ei are iid exponential. Each vertex in the tree has infinitely many children, indexed by N, and the weights of the edges to the children of each vertex are given by an independent copy of Π. (So the first edge out of each vertex is the lightest, and so on.)

We define a matching M on this infinite tree in the same way as on our graph: it's a subset of the edges such that every vertex is covered exactly once. If we think about the graph Gn,n, notice that it converges locally weakly to the random tree T: we did the calculation for depth 1 above, and there's a lot of independence after that! Then the main idea of Aldous' proof is to consider the random bipartite graph (Gn,n, M*), where M* is the best (lowest-cost) matching. Since Gn,n converges locally weakly to T, and M* is just a collection of 1s and 0s (telling us whether edges are included), it seems reasonable to expect

(Gn,n, M*) → (T, M*)

in the local weak sense. Trees are easier to analyze than complete bipartite graphs, so that motivates using the infinite tree instead!

Remark 223. Here’s one way to get a matching on the tree: pick the lightest edge from the root. For all of the children the root does not connect to, connect those children to their lightest child, and for the connected child, connect all of its children to their lightest child. (Basically, pick the lightest edge whenever available.)

This gives some matching Mˆ , and in whatever sense we take the average (truncating at level n or whatever), the ˆ π2 average cost of M is 1, because all lightest edges are Exp(1). Since 1 is less than 6 , something has gone wrong. The point is that M∗ has to arise from a local weak limit process, which is not true here! (Any local weak limit should be spatially invariant: the choice of root of our tree should not play any special role in whether an edge belongs in the matching, by the definition of local weak convergence.) We’ll use the next result without proof:

Theorem 224 (Aldous)
We have

lim_{n→∞} E An = c* = inf{E W_{φ,M(φ)} : spatially invariant M},

where W_{φ,M(φ)} denotes the weight of the edge from the root φ to M(φ).

Basically, consider any spatially invariant matching M of our infinite tree, and define its average cost to be the expected cost of the edge at the root (because of spatial invariance, all edges are equivalent here)! One direction of this, showing that lim inf E An ≥ c*, is more straightforward, because we can use a local weak limit argument: as n → ∞, we can show that a subsequence of (Gn,n, M*) converges locally weakly to (T, M) for some spatially invariant M. Then the average cost for M must be at least c*, because c* is optimal. In other words, this is some kind of compactness argument! The other direction is more difficult, though, because we have to produce a good approximation of the ideal c* on a finite graph.

So for the rest of this lecture, we'll show that c* = π^2/6. Here's a heuristic that the paper also starts off with: we're going to subtract infinity from itself a bit, so there will be some issues. Let cost(T) be the minimum cost of a matching on T, where T is this Poisson-weighted infinite tree. (Obviously every matching has infinite total weight, but ignore that for now.) Then we can write a recursion based on what happens at the first level:

cost(T) = min_{j≥1} [W_{φ,j} + ∑_{i≠j} cost(T^(i)) + cost(T^(j) \ j)].

Basically, pay the weight of the edge φ, j connected to the root, plus the cost from subtrees T (i) at level 1, and then we can’t use vertex j itself for that subtree. And now if we remove the root vertex, we can get a lot of cancellation:

$$\mathrm{cost}(T) - \mathrm{cost}(T \setminus \phi) = \min_{j \ge 1}\Big\{W_{\phi, j} - \big[\mathrm{cost}(T^{(j)}) - \mathrm{cost}(T^{(j)} \setminus j)\big]\Big\},$$

because the cost of T ∖ φ is the sum of the costs of the subtrees T^{(i)}. If we call the left-hand side X_φ and the bracketed term on the right X_j, we get

$$X_\phi = \min_{j \ge 1}\,\big[W_{\phi, j} - X_j\big].$$

But remember that we're looking for a matching which is spatially invariant, so if we look at the X's by themselves, X_φ and X_j should agree in distribution.

Lemma 225
Let 0 < ζ_1 < ζ_2 < ⋯ ∼ Π be a rate-1 Poisson point process (so that (ζ_1, ζ_2, …) is equal in distribution to (E_1, E_1 + E_2, …) for iid Exp(1) variables E_i). Let X, X_i be iid from a law µ (a probability measure on R). Then

$$X \overset{d}{=} \min_{i \ge 1}\,(\zeta_i - X_i)$$

holds if and only if µ is the logistic distribution, with density $f(x) = \frac{1}{(e^{x/2} + e^{-x/2})^2}$ and cdf $F(x) = \frac{e^x}{1 + e^x}$.

(This is proved by calculus.)
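The fixed-point property can also be checked by Monte Carlo (a quick sketch, assuming numpy; truncating the point process to K points is an approximation, harmless since far-away points essentially never achieve the minimum): sample ζ from a rate-1 Poisson process, X_i iid logistic, and compare the law of min_i(ζ_i − X_i) with the logistic cdf.

```python
import numpy as np

rng = np.random.default_rng(1)
samples, K = 20000, 60   # keep K points per process; later points can't win the min

# zeta_i = E_1 + ... + E_i: a rate-1 Poisson point process on (0, infinity).
zeta = rng.exponential(1.0, size=(samples, K)).cumsum(axis=1)

# X_i iid logistic, sampled by inverting the cdf: F^{-1}(u) = log(u / (1 - u)).
u = rng.uniform(size=(samples, K))
X = np.log(u / (1 - u))

Y = (zeta - X).min(axis=1)

# Empirical cdf of Y versus the logistic cdf F(x) = e^x / (1 + e^x).
for x0 in (-2.0, 0.0, 2.0):
    print(x0, (Y <= x0).mean(), 1 / (1 + np.exp(-x0)))
```

The empirical cdf of the minimum should track the logistic cdf at every point, consistent with the "only if" direction of the lemma.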

Lemma 226
Let X, X_i be iid from µ, the logistic distribution. For x ≥ 0, let

$$h(x) = \mathbb{P}(X_1 + X_2 \ge x).$$

Then
$$\int_0^\infty h(x)\,dx = 1, \qquad \int_0^\infty x\,h(x)\,dx = \frac{\pi^2}{6}.$$
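These two integrals can also be verified numerically (a sketch, assuming numpy; the grids and the truncation of the infinite integrals are ad hoc choices):

```python
import numpy as np

def trapezoid(vals, grid, axis=-1):
    """Composite trapezoid rule along the given axis."""
    vals = np.moveaxis(vals, axis, -1)
    return ((vals[..., 1:] + vals[..., :-1]) / 2 * np.diff(grid)).sum(axis=-1)

# Logistic density and cdf from Lemma 225.
f = lambda t: 1.0 / (np.exp(t / 2) + np.exp(-t / 2)) ** 2
F = lambda t: 1.0 / (1.0 + np.exp(-t))

# h(x) = P(X1 + X2 >= x), written as the convolution integral of f(y)(1 - F(x - y)).
y = np.linspace(-50, 50, 2001)
x = np.linspace(0, 80, 1601)
h = trapezoid(f(y) * (1 - F(x[:, None] - y)), y, axis=1)

print(trapezoid(h, x))       # should be close to 1
print(trapezoid(x * h, x))   # should be close to pi^2/6 ≈ 1.6449
```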

This follows directly from Lemma 225: once we know the distribution function, everything is a straightforward calculation. How does this relate to the matching problem? Going back to the boxed recursion above, we can guess that the weight of the edge next to the root in the optimal matching is

$$W_{\phi, M^*(\phi)} \overset{d}{=} \zeta_{i^*}, \qquad \text{where } i^* = \operatorname{argmin}_{i \ge 1}\,(\zeta_i - X_i).$$

So now we just need to calculate E ζ_{i∗}, and hopefully that's the number we want. There are various ways to do this; the paper does the following.

Consider the process (ζ_i, X_i): these are points scattered on the right half-plane (call the ζ-axis z and the X-axis x). Remember that the number of ζ_i in any interval [a, b] is a Poisson random variable with mean b − a; likewise, the number of points (ζ_i, X_i) falling in any region of the plane is a Poisson random variable. Specifically, (ζ_i, X_i) is a Poisson point process on (0, ∞) × R with intensity measure Leb ⊗ µ, where µ is logistic and
$$\#\{\text{points in } R\} \sim \mathrm{Pois}\big((\mathrm{Leb} \otimes \mu)(R)\big)$$
for any region R. So now if we condition on the event

$$A_y = \{\text{some point } j \text{ of } (\zeta_i, X_i) \text{ lands on the vertical line } z = y\},$$
the rest of the process is distributed as an independent Poisson process: the number of points in any region is Poisson, so the counts in any two disjoint regions R and S are independent, and a narrow strip around z = y is disjoint from everything else. So

$$\mathbb{P}(i^* = j \mid A_y) = \mathbb{P}\Big(y - X_j < \min_{i \ge 1}\,(\tilde\zeta_i - \tilde X_i)\Big),$$

because i∗ = j means that ζ_j − X_j is the minimal value, and we're conditioning on ζ_j = y. (The tildes mean that the variables on the right are the rest of the process, which is independent of X_j.) Let X̃ = min_{i≥1}(ζ̃_i − X̃_i): by Lemma 225, X̃ has the logistic law µ and is independent of X_j, so this probability is

$$= \mathbb{P}(X_j + \tilde X > y) = h(y).$$

Thus
$$\mathbb{P}(\zeta_{i^*} \in [y, y + dy]) = \mathbb{P}(A_{[y, y+dy]})\,\mathbb{P}\big(i^* = j \mid A_{[y, y+dy]}\big) = h(y)\,dy,$$
meaning that ζ_{i∗} =^d W_{φ,M∗(φ)} has density h. So the expected value of W_{φ,M∗(φ)} on our infinite tree is just ∫_0^∞ x h(x) dx = π²/6.

So how do we turn this heuristic into an actual proof? There's a random variable i∗ related to the matching, but we still need to construct the actual matching M∗ on the infinite tree T. Morally, we want to sample the infinite tree and then solve the recursion, but there are infinitely many equations. So instead of taking the entire tree, we only look at it up to depth n. Then we don't have to worry about the dependence between the X's and the tree: at the boundary at level n, we put iid X's pointing up into the tree; at each vertex we apply the boxed recursion upward, and once we reach the root, we go back down. This produces an X_{i→j} as well as an X_{j→i} for each edge between i and j, and even though the X's are correlated with the tree, the recursion is satisfied everywhere. Then we apply the Kolmogorov extension theorem to get an infinite tree (T, X)! And now that we have (T, X), we decide whether the edge (ij) belongs to our matching based on whether

$$j = \operatorname{argmin}_{k \sim i}\,(W_{ik} - X_{k \to i}).$$

(This is equivalent to saying that W_{ij} − X_{j→i} < min_{k∼i, k≠j}(W_{ik} − X_{k→i}), and the right-hand side is just X_{i→j} by the recursion! So we just look at the two X variables associated to the edge (ij) and check whether their sum exceeds W_{ij}.) This gives us one direction of the bound, and to show the other, we check that any deviation from this matching fails to satisfy the recursion.
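Here is a toy version of the depth-n construction (a sketch, using only the Python standard library; the cutoff truncating each node's infinitely many children is an ad hoc approximation, justified because edges heavier than the cutoff essentially never achieve the minimum). With iid logistic messages at the boundary, Lemma 225 says the upward message at every vertex, and in particular at the root, should again be logistic:

```python
import math
import random

random.seed(2)
CUTOFF = 10.0   # truncation of the Poisson weighted infinite tree (ad hoc approximation)

def logistic():
    """Sample the logistic law via the inverse cdf: F^{-1}(u) = log(u / (1 - u))."""
    u = random.random()
    return math.log(u / (1 - u))

def message(depth):
    """Upward message X_{i -> parent} = min over children j of (W_{ij} - X_{j -> i}),
    with iid logistic messages placed at the truncation depth (the boundary at level n)."""
    if depth == 0:
        return logistic()
    w = random.expovariate(1.0)            # child edge weights form a Poisson process
    best = w - message(depth - 1)          # always keep the first child, so best is finite
    while True:
        w += random.expovariate(1.0)
        if w > CUTOFF:
            return best
        best = min(best, w - message(depth - 1))

xs = [message(3) for _ in range(400)]
print(sum(v <= 0 for v in xs) / len(xs))   # should be near the logistic value F(0) = 1/2
```

Once the upward messages are known, going back down gives the second message on each edge, and the rule above (sum of the two messages versus the edge weight) decides which edges enter the matching.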

References

[1] N. Sun. 18.675: Theory of Probability. URL: https://math.mit.edu/~nsun/f19-18675.html.

[2] R. Durrett. Probability: Theory and Examples. Cambridge University Press, Cambridge, 2019.

[3] R. Arratia and S. Tavaré. "The Cycle Structure of Random Permutations". Ann. Probab. 20.3 (1992), pp. 1567–1591.

[4] E. Shamir and J. Spencer. "Sharp concentration of the chromatic number on random graphs G_{n,p}". Combinatorica 7.1 (1987), pp. 121–129.

[5] H. Kesten and B.P. Stigum. "Limit theorems for decomposable multi-dimensional Galton-Watson processes". J. Math. Anal. Appl. 17.2 (1967), pp. 309–338.

[6] R. Lyons, R. Pemantle, and Y. Peres. "Conceptual Proofs of L log L Criteria". Ann. Probab. 23.3 (1995), pp. 1125–1138.

[7] D.J. Aldous. "The ζ(2) limit in the random assignment problem". Random Struct. Alg. 18.4 (2001), pp. 381–418.

[8] D. Aldous. "Asymptotics in the random assignment problem". Probab. Theory Relat. Fields 93.4 (1992), pp. 507–534.

[9] S. Linusson and J. Wästlund. "A proof of Parisi's conjecture on the random assignment problem". Probab. Theory Relat. Fields 128 (2004), pp. 419–440.

[10] C. Nair, B. Prabhakar, and M. Sharma. "Proofs of the Parisi and Coppersmith-Sorkin Random Assignment Conjectures". Random Struct. Alg. 27.4 (2005), pp. 413–444.

[11] J. Wästlund. "An easy proof of the ζ(2) limit in the random assignment problem". Electron. Commun. Probab. 14 (2009), pp. 261–269.
