Advanced Probability Theory
Jiří Černý

June 1, 2016

Preface

These are lecture notes for the lecture `Advanced Probability Theory' given at University of Vienna in SS 2014 and 2016. This is a preliminary version which will be updated regularly during the term. If you have questions, corrections or suggestions for improvements of the text, please let me know.

Contents

1 Introduction
2 Probability spaces, random variables, expectation
  2.1 Kolmogorov axioms
  2.2 Random variables
  2.3 Expectation of real-valued random variables
3 Independence
  3.1 Definitions
  3.2 Dynkin's lemma
  3.3 Elementary facts about independence
  3.4 Borel-Cantelli lemma
  3.5 Kolmogorov 0-1 law
4 Laws of large numbers
  4.1 Kolmogorov three series theorem
  4.2 Weak law of large numbers
  4.3 Strong law of large numbers
  4.4 Law of large numbers for triangular arrays
5 Large deviations
  5.1 Sub-additive limit theorem
  5.2 Cramér's theorem
6 Weak convergence of probability measures
  6.1 Weak convergence on $\mathbb{R}$
  6.2 Weak convergence on metric spaces
  6.3 Tightness on $\mathbb{R}$
  6.4 Prokhorov's theorem*
7 Central limit theorem
  7.1 Characteristic functions
  7.2 Central limit theorem
  7.3 Some generalisations of the CLT*
8 Conditional expectation
  8.1 Regular conditional probabilities*
9 Martingales
  9.1 Definition and examples
  9.2 Martingale convergence, a.s. case
  9.3 Doob's inequality and $L^p$ convergence
  9.4 $L^2$-martingales
  9.5 Azuma-Hoeffding inequality
  9.6 Convergence in $L^1$
  9.7 Optional stopping theorem
  9.8 Martingale central limit theorem*
10 Constructions of processes
  10.1 Semi-direct product
  10.2 Ionescu-Tulcea theorem
  10.3 Complement: Kolmogorov extension theorem
11 Markov chains
  11.1 Definition and first properties
  11.2 Invariant measures of Markov chains
  11.3 Convergence of Markov chains
12 Brownian motion and Donsker's theorem
  12.1 The space $C([0,1])$
  12.2 Brownian motion
  12.3 Donsker's theorem
  12.4 Some applications of Donsker's theorem

1 Introduction

The goal of this lecture is to present the most important concepts of probability theory in the context of infinite sequences $X_1, X_2, \dots$ of random variables, or, otherwise said, in the context of stochastic processes in discrete time.

We will mostly be interested in the asymptotic behaviour of these sequences. The following examples cover some questions that will be answered in the lecture and introduce heuristically some concepts that we will develop in order to solve them.

Example 1.1 (Series with random coefficients). It is well known that
\[
X_n^{(1)} = \sum_{i=1}^n \frac{(-1)^i}{i} \xrightarrow{\,n\to\infty\,} -\log 2, \quad\text{but}\quad
X_n^{(2)} = \sum_{i=1}^n \frac{1}{i} \xrightarrow{\,n\to\infty\,} \infty \quad\text{(no absolute convergence)}.
\]
One can then ask what happens if the signs are chosen randomly, that is, for independent random variables $Z_1, Z_2, \dots$ with $P[Z_i = +1] = P[Z_i = -1] = \tfrac12$ one considers the sum
\[
X_n = \sum_{i=1}^n \frac{Z_i}{i}.
\]
Does this random(!) series converge or not? If yes, is the limit random or deterministic?
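Before developing any theory, a quick simulation is instructive. The following Python snippet is a small illustrative sketch added here (the function names are ad hoc and the experiment is not part of the formal development); it follows independent realisations of the partial sums $X_n$ at several values of $n$:

import random

def partial_sums(n_max, rng, checkpoints):
    # One realisation of X_n = sum_{i<=n} Z_i / i with i.i.d. signs Z_i = +/-1,
    # recorded at the requested values of n.
    s, recorded = 0.0, []
    for i in range(1, n_max + 1):
        s += rng.choice((-1.0, 1.0)) / i
        if i in checkpoints:
            recorded.append((i, round(s, 4)))
    return recorded

rng = random.Random(2016)
# Three independent realisations: along each one the partial sums stabilise,
# suggesting a.s. convergence, yet the three limiting values differ,
# suggesting that the limit itself is random.
for _ in range(3):
    print(partial_sums(10**5, rng, checkpoints={10**2, 10**3, 10**4, 10**5}))

The theory of Chapter 4 (Kolmogorov's three series theorem) confirms this picture: the series converges almost surely, and since $\operatorname{Var}\bigl(\sum_i Z_i/i\bigr) = \sum_i i^{-2} = \pi^2/6 > 0$, the limit is a nondegenerate random variable.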
Example 1.2 (Sums of independent random variables). In the lecture `Probability and Statistics' you were studying the following problem. Let $Z_i$ be as in Example 1.1, that is, the $Z_i$'s are outcomes of independent throws of a fair coin, and set
\[
S_n = \sum_{i=1}^n Z_i, \quad\text{and}\quad X_n = \frac{1}{n} S_n.
\]
By the weak law of large numbers, denoting by $EZ_i\,(=0)$ the expectation of $Z_i$, we know that
\[
P\bigl(|X_n - EX_n| \ge \varepsilon\bigr) \xrightarrow{\,n\to\infty\,} 0 \quad\text{for every } \varepsilon > 0.
\]
Observe however that the last display says only that the probability that $|X_n|$ is far from zero decays with $n$. It says nothing about the convergence of $X_n$ for a single realisation of coin throws.

To address these (and many other) questions we will develop the formalism of probability theory, which is based on measure theory and the Kolmogorov axioms. In this formalism, we will show an improved version of the weak LLN, the so-called strong LLN:
\[
P\Bigl[\lim_{n\to\infty} X_n = 0\Bigr] = 1, \quad\text{or equivalently}\quad \lim_{n\to\infty} X_n = 0, \; P\text{-a.e.}
\]

Example 1.3 (Random walk and Brownian motion). Continuing with Example 1.2, we can view $S_n$ as a function $S : \mathbb{N} \to \mathbb{R}$. By linear interpolation we can extend it to a function $S : \mathbb{R}_+ \to \mathbb{R}$ (see Figure 1.1). This is a random continuous function, i.e. a random element of the space $C(\mathbb{R}_+, \mathbb{R})$. As such a random object cannot be described by the elementary means of the `Probability and Statistics' lecture, one of our goals is to develop a sound mathematical theory allowing for this.

[Figure 1.1: Random walk and its scaling. Observe that on the second picture the x-axis is 100 times longer, but the y-axis only 10 times. The second picture "looks almost like" a Brownian motion.]

We also want to discuss the convergence of such random objects. More exactly, recall that the central limit theorem says that
\[
\frac{1}{\sqrt{n}}\, S_n \xrightarrow[n\to\infty]{d} \mathcal{N}(0,1),
\]
where $\mathcal{N}(0,1)$ stands for the standard normal distribution. The arrow notation in the previous display stands for convergence in distribution, which can formally be defined here, e.g., by
\[
P\Bigl[\frac{1}{\sqrt{n}}\, S_n \le a\Bigr] \xrightarrow{\,n\to\infty\,} \int_{-\infty}^a \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, dx, \quad\text{for all } a \in \mathbb{R}.
\]
In view of the central limit theorem, it seems not unreasonable to scale the function $S$ by $n^{-1}$ in the time direction and by $n^{-1/2}$ in the space direction, that is, to consider
\[
S^{(n)}(t) = n^{-1/2} S_{nt},
\]
and ask `Does this sequence of random elements of $C(\mathbb{R}_+, \mathbb{R})$ converge? What is the limit object?'

We will see that the answer to the first question is `YES', but to this end we need to introduce the right notion of convergence. Even more interesting is the limit object, the Brownian motion. Apart from being very interesting objects in their own right, random walk and Brownian motion are prototypes of two important classes of processes, namely Markov chains/processes and martingales, that we are going to study.
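The rescaling itself can be tried numerically. The sketch below is an illustration added to these notes (not part of the original text; only the Python standard library is used): it samples the linearly interpolated, rescaled path $S^{(n)}$ on a grid of times in $[0,1]$.

import random

def rescaled_walk(n, rng, grid=50):
    # One sample of t -> S^{(n)}(t) = n^{-1/2} S_{nt}, t in [0, 1], where S is
    # the linearly interpolated simple random walk with S_0 = 0.
    s = [0]
    for _ in range(n):
        s.append(s[-1] + rng.choice((-1, 1)))   # S_0, S_1, ..., S_n

    def S(u):                                   # linear interpolation of k -> S_k
        k = min(int(u), n - 1)
        return s[k] + (u - k) * (s[k + 1] - s[k])

    return [n ** -0.5 * S(n * k / grid) for k in range(grid + 1)]

rng = random.Random(1)
# As n grows, the law of the rescaled path approaches that of Brownian motion;
# in particular the endpoint S^{(n)}(1) = S_n / sqrt(n) is approximately
# N(0, 1) by the central limit theorem.
for n in (100, 10_000):
    print(n, [round(x, 2) for x in rescaled_walk(n, rng)[::10]])

Plotting such paths for increasing $n$ reproduces the qualitative picture of Figure 1.1; the precise sense in which $S^{(n)}$ converges to Brownian motion is the content of Donsker's theorem in Chapter 12.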
We close this section with a few examples linking probability theory to other domains of mathematics. Some of them will be treated in the lecture in more detail.

Example 1.4 (Random walk and discrete Dirichlet problem, link to PDEs). Consider a simple random walk on $\mathbb{Z}^2$ started at $x \in \mathbb{Z}^2$, that is, a sequence of random variables $X_0, X_1, \dots$ determined by $X_0 = x$ and by the requirement that its increments $Z_i = X_i - X_{i-1}$, $i \ge 1$, are i.i.d. random variables satisfying
\[
P[Z_i = \pm e_1] = P[Z_i = \pm e_2] = 1/4.
\]
Here $e_1$, $e_2$ stand for the canonical basis vectors of $\mathbb{Z}^2$. See Figure 1.2 for a typical realisation.

[Figure 1.2: Realisation of a random walk on $\mathbb{Z}^2$, started at $x$, exiting the domain $O$ at the point $Y$.]

Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a continuous function and $O$ a large domain in $\mathbb{R}^2$. Let $Y$ be the random position of the exit point of the random walk from the domain $O$, i.e. $Y = X_T$ with $T = \inf\{k : X_k \notin O\}$, see the figure again. Define a function $u : \mathbb{Z}^2 \to \mathbb{R}$ by
\[
u(x) = E_x[g(Y)], \quad x \in \mathbb{Z}^2,
\]
where $E_x$ stands here for the expectation for the random walk started at $x$. We will later show that $u$ solves a discrete Dirichlet problem
\[
\Delta_d u(x) = 0, \quad x \in \mathbb{Z}^2 \cap O,
\]
\[
u(x) = g(x), \quad x \in \mathbb{Z}^2 \setminus O,
\]
where $\Delta_d$ is a discrete Laplace operator,
\[
\Delta_d u(x) = \tfrac14 \bigl\{ u(x+e_1) + u(x-e_1) + u(x+e_2) + u(x-e_2) \bigr\} - u(x).
\]

Example 1.5 (Lower bound on Ramsey numbers, a tiny link to graph theory). The Ramsey number $R(k)$ is the smallest number $n$ such that any colouring of the edges of the complete graph $K_n$ by two colours (red and blue, say) must contain at least one monochromatic (that is, completely blue or completely red) copy of $K_k$ as a subgraph. These numbers are rather famous in graph theory, not least because they are very hard to compute. Actually, the only known values are $R(1) = 1$, $R(2) = 2$, $R(3) = 6$, $R(4) = 18$. For larger Ramsey numbers only bounds are known, e.g. $R(5) \in [43, 49]$ or $R(10) \in [798, 23556]$. It is thus essential to get good estimates on these numbers. We are going to use an easy probabilistic argument to find a lower bound on $R(k)$.

Lemma (taken from [AS08], Proposition 1.1.1). Assume that $\binom{n}{k} 2^{1-\binom{k}{2}} < 1$. Then $R(k) > n$. In particular, $R(k) \ge \lfloor 2^{k/2} \rfloor$ for all $k \ge 3$.

Proof. Consider a random two-colouring of the edges of $K_n$ obtained by colouring each edge independently either red or blue, where each colour is equally likely. For any fixed set $R \subset \{1, \dots, n\}$ of $k$ vertices, let $A_R$ be the event that the induced subgraph of $K_n$ on $R$ is monochromatic. Clearly,
\[
P[A_R] = 2 \cdot 2^{-\binom{k}{2}}.
\]
Since there are $\binom{n}{k}$ possible choices for $R$, the probability that at least one of the events $A_R$ occurs is at most $\binom{n}{k} 2^{1-\binom{k}{2}} < 1$. Hence with positive probability none of the events $A_R$ occurs, so there exists a two-colouring of $K_n$ without a monochromatic $K_k$, that is, $R(k) > n$.
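The lemma's condition is easy to evaluate numerically. As a small added illustration (not part of the original notes; the names are ad hoc), the following sketch finds, for each $k$, the largest $n$ the lemma certifies, i.e. the largest $n$ with $\binom{n}{k} 2^{1-\binom{k}{2}} < 1$:

from math import comb

def certified_bound(k):
    # Largest n with C(n, k) * 2^(1 - C(k, 2)) < 1; the lemma yields R(k) > n.
    # The condition is checked in exact integer arithmetic as
    # 2 * C(n, k) < 2^C(k, 2).
    n = k
    while 2 * comb(n + 1, k) < 2 ** comb(k, 2):
        n += 1
    return n

# Compare the certified bound with floor(2^(k/2)) from the lemma's corollary.
for k in range(3, 11):
    print(k, certified_bound(k), int(2 ** (k / 2)))

For $k = 10$, for instance, this certifies only $R(10) > 100$, far below the best known bound $R(10) \ge 798$ quoted above; the point of the probabilistic argument is not sharpness for small $k$ but the exponential lower bound $R(k) \ge \lfloor 2^{k/2} \rfloor$.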