An Introduction to Large Deviations
Vitalii Konarovskyi ([email protected])

February 16, 2021

Abstract. Large deviations theory is one of the key techniques of modern probability. It concerns the study of probabilities of rare events, and its estimates are a crucial tool for handling many questions in statistical mechanics, engineering, applied probability, statistics, etc. The course is built as a first look at the theory and is oriented toward master's and PhD students.

1 Introduction and some examples

1.1 Introduction

We start by considering a coin-tossing experiment. Assume that we toss a fair coin. The law of large numbers tells us that the frequency of occurrence of "heads" becomes close to $\frac12$ as the number of trials increases to infinity. In other words, if $X_1, X_2, \dots$ are independent random variables taking values 0 and 1 with probabilities $\frac12$, i.e. $P\{X_k = 0\} = P\{X_k = 1\} = \frac12$, then for the empirical mean $\frac1n S_n = \frac{X_1 + \dots + X_n}{n}$ we know that¹

$$P\left\{\left|\frac1n S_n - \frac12\right| > \varepsilon\right\} \to 0, \quad n \to \infty,$$

or, more strongly,²

$$\frac1n S_n \to \frac12 \quad \text{a.s.}, \quad n \to \infty.$$

We are going to look more closely at the probabilities $P\left\{\left|\frac1n S_n - \frac12\right| > \varepsilon\right\}$. These events become more unlikely for large $n$, and their probabilities decay to 0. During the course we will work with such unlikely events and try to understand the rate of their decay to zero. Knowledge of the decay of the probabilities of such unlikely events has many applications in insurance, information theory, statistical mechanics, etc. The aim of the course is to give an introduction to one of the key techniques of modern probability, which is called the theory of large deviations.

Before investigating the rate of decay of the probabilities $P\left\{\left|\frac1n S_n - \frac12\right| > \varepsilon\right\}$, we consider an example with another random variable, where the computations are much simpler. Let $\xi_1, \xi_2, \dots$ be independent identically distributed random variables.
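The decay described above can already be computed exactly for the fair coin. The following short Python sketch is my own illustration, not part of the original notes (the function name `tail_prob` is invented); it evaluates $P\left\{\left|\frac1n S_n - \frac12\right| > \varepsilon\right\}$ by summing binomial probabilities with exact integer arithmetic. The last printed column, $-\frac1n \ln P$, stays bounded away from 0, which is precisely the exponential decay the course will quantify.

```python
from math import comb, log

def tail_prob(n: int, eps: float) -> float:
    """Exact P{|S_n/n - 1/2| > eps} for n tosses of a fair coin,
    obtained by summing C(n, k) / 2^n over all k with |k/n - 1/2| > eps."""
    total = sum(comb(n, k) for k in range(n + 1) if abs(k / n - 0.5) > eps)
    return total / 2 ** n

# The probabilities vanish rapidly, while -(1/n) ln P stays away from 0:
for n in (50, 100, 200, 400):
    p = tail_prob(n, 0.1)
    print(n, p, -log(p) / n)
```

Because Python integers are arbitrary precision, the binomial sum is exact; only the final division produces a floating-point number.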
¹ According to the weak law of large numbers.
² According to the strong law of large numbers.

Universität Hamburg, WS 20/21: An introduction to large deviations / Vitalii Konarovskyi

We also assume that

$$E\,\xi_1 = \mu \in \mathbb{R}, \qquad \operatorname{Var}\xi_1 = \sigma^2 > 0,$$

and denote $S_n = \xi_1 + \dots + \xi_n$. Then the weak law of large numbers says that

$$\frac1n S_n \to \mu \quad \text{in probability, as } n \to +\infty,$$

that is, for all $\varepsilon > 0$ one has $P\left\{\left|\frac1n S_n - \mu\right| \ge \varepsilon\right\} \to 0$ as $n \to +\infty$. This convergence simply follows from Chebyshev's inequality. Indeed,

$$P\left\{\left|\frac1n S_n - \mu\right| \ge \varepsilon\right\} \le \frac{1}{\varepsilon^2}\, E\left|\frac1n S_n - \mu\right|^2 = \frac{1}{\varepsilon^2}\operatorname{Var}\left(\frac1n S_n\right) = \frac{\sigma^2}{n\varepsilon^2} \to 0, \quad n \to +\infty. \tag{1}$$

Estimate (1) shows that the rate of convergence is at least $\frac1n$, but this estimate is too rough. Later we will see that these probabilities in fact decay exponentially fast. Let us demonstrate this on a particular example.

Example 1.1. Let $\xi_1, \xi_2, \dots$ be independent normally distributed random variables with mean $\mu = 0$ and variance $\sigma^2 = 1$ (shortly, $\xi_k \sim N(0,1)$). Then the random variable $S_n$ has the normal distribution with mean 0 and variance $n$. This implies $\frac{1}{\sqrt{n}} S_n \sim N(0,1)$. Now we can consider, for $x > 0$,

$$P\left\{\frac1n S_n \ge x\right\} = P\left\{\frac{1}{\sqrt{n}} S_n \ge x\sqrt{n}\right\} = \frac{1}{\sqrt{2\pi}} \int_{x\sqrt{n}}^{+\infty} e^{-\frac{y^2}{2}}\, dy \sim \frac{1}{\sqrt{2\pi}\, x\sqrt{n}}\, e^{-\frac{n x^2}{2}}, \quad n \to +\infty,$$

by Exercise 1.1 below. Thus, we have for $x > 0$

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \ge x\right\} = \lim_{n\to\infty} \frac1n \ln\left(\frac{1}{\sqrt{2\pi}\, x\sqrt{n}}\, e^{-\frac{n x^2}{2}}\right) = -\lim_{n\to\infty} \frac1n \ln\left(\sqrt{2\pi}\, x\sqrt{n}\right) - \lim_{n\to\infty} \frac{x^2}{2} = -\frac{x^2}{2},$$

due to Exercise 1.2 3).

Remark 1.1. By symmetry, one can show that

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \le x\right\} = -\frac{x^2}{2}$$

for all $x < 0$. Indeed,

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \le x\right\} = \lim_{n\to\infty} \frac1n \ln P\left\{-\frac1n S_n \ge -x\right\} = \lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \ge -x\right\} = -\frac{(-x)^2}{2},$$

because $-\xi_k \sim N(0,1)$ for $k \ge 1$.

Exercise 1.1. Show that

$$\int_x^{+\infty} e^{-\frac{y^2}{2}}\, dy \sim \frac1x\, e^{-\frac{x^2}{2}}, \quad x \to +\infty.$$

Exercise 1.2. Let $(a_n)_{n\ge1}$ and $(b_n)_{n\ge1}$ be two sequences of positive real numbers. We say that they are logarithmically equivalent, and write $a_n \simeq b_n$, if

$$\lim_{n\to\infty} \frac1n (\ln a_n - \ln b_n) = 0.$$

1. Show that $a_n \simeq b_n$ if and only if $b_n = a_n e^{o(n)}$.

2.
Show that $a_n \sim b_n$ implies $a_n \simeq b_n$, and that the converse implication is not correct.

3. Show that $a_n + b_n \simeq \max\{a_n, b_n\}$.

Exercise 1.3. Let $\xi_1, \xi_2, \dots$ be independent normally distributed random variables with mean $\mu$ and variance $\sigma^2$. Let also $S_n = \xi_1 + \dots + \xi_n$. Compute $\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \ge x\right\}$ for $x > \mu$.

1.2 Coin-tossing

In this section, we come back to the coin-tossing experiment and compute the decay of the probability $P\left\{\frac1n S_n \ge x\right\}$. Let, as before, $X_1, X_2, \dots$ be independent random variables taking values 0 and 1 with probabilities $\frac12$, and let $S_n = X_1 + \dots + X_n$ denote their partial sum. We will show that

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \ge x\right\} = -I(x) \tag{2}$$

for all $x \ge \frac12$, where $I$ is some function of $x$. We first note that³ for $x > 1$

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \ge x\right\} = -\infty.$$

Next, for $x \in \left[\frac12, 1\right]$ we observe that

$$P\left\{\frac1n S_n \ge x\right\} = P\{S_n \ge xn\} = \sum_{k \ge xn} P\{S_n = k\} = \frac{1}{2^n} \sum_{k \ge xn} C_n^k,$$

where $C_n^k = \frac{n!}{k!(n-k)!}$. Then we can estimate

$$\frac{1}{2^n} \max_{k \ge xn} C_n^k \le P\{S_n \ge xn\} \le \frac{n+1}{2^n} \max_{k \ge xn} C_n^k. \tag{3}$$

Note that the maximum is attained at $k = \lceil xn \rceil$, the smallest integer $\ge xn$, because $x \ge \frac12$. We denote $l := \lceil xn \rceil$. Using Stirling's formula

$$n! = n^n e^{-n} \sqrt{2\pi n}\left(1 + O\left(\frac1n\right)\right),$$

we have

$$\lim_{n\to\infty} \frac1n \ln \max_{k \ge xn} C_n^k = \lim_{n\to\infty} \frac1n \ln C_n^l = \lim_{n\to\infty} \frac1n \left(\ln n! - \ln l! - \ln(n-l)!\right)$$
$$= \lim_{n\to\infty} \left[(\ln n - 1) - \frac{l}{n}\left(\ln l - 1\right) - \frac{n-l}{n}\left(\ln(n-l) - 1\right)\right]$$
$$= \lim_{n\to\infty} \left[\frac{l}{n} \ln n + \frac{n-l}{n} \ln n - \frac{l}{n} \ln l - \frac{n-l}{n} \ln(n-l)\right]$$
$$= \lim_{n\to\infty} \left[-\frac{l}{n} \ln \frac{l}{n} - \frac{n-l}{n} \ln \frac{n-l}{n}\right] = -x \ln x - (1-x) \ln(1-x),$$

because $\frac{l}{n} = \frac{\lceil xn \rceil}{n} \to x$ as $n \to +\infty$. This together with estimate (3) implies

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \ge x\right\} = -\ln 2 - x \ln x - (1-x) \ln(1-x)$$

for all $x \in \left[\frac12, 1\right]$. So, we can take

$$I(x) = \begin{cases} \ln 2 + x \ln x + (1-x) \ln(1-x), & x \in [0,1], \\ +\infty, & \text{otherwise}. \end{cases} \tag{4}$$

³ We always assume that $\ln 0 = -\infty$.

[Figure: graph of the rate function $I$.⁴]

Remark 1.2. Using the symmetry, we have that

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \le x\right\} = -I(x) \tag{5}$$

for all $x \le \frac12$.
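The limit (2) with rate function (4) can also be checked numerically. The following Python sketch is my own illustration, not part of the notes (the function names are invented); it evaluates $\frac1n \ln P\{S_n \ge xn\}$ exactly using big-integer binomial coefficients and compares it with $-I(x)$.

```python
from math import ceil, comb, log

def I(x: float) -> float:
    """Rate function (4) on (0, 1): ln 2 + x ln x + (1 - x) ln(1 - x)."""
    return log(2) + x * log(x) + (1 - x) * log(1 - x)

def log_upper_tail_rate(n: int, x: float) -> float:
    """(1/n) ln P{S_n >= x n} for n fair coin tosses, computed exactly:
    P{S_n >= x n} = 2^{-n} * sum_{k >= x n} C(n, k)."""
    s = sum(comb(n, k) for k in range(ceil(x * n), n + 1))
    return (log(s) - n * log(2)) / n

# The exact rates approach -I(0.75) ≈ -0.1308 from below as n grows:
for n in (100, 400, 1600):
    print(n, log_upper_tail_rate(n, 0.75))
```

The remaining gap of order $\frac{\ln n}{n}$ comes from the polynomial prefactor in Stirling's formula, which the logarithmic limit ignores.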
Indeed,

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \le x\right\} = \lim_{n\to\infty} \frac1n \ln P\left\{\frac{n - S_n}{n} \ge 1 - x\right\} = \lim_{n\to\infty} \frac1n \ln P\left\{\frac{(1-X_1) + \dots + (1-X_n)}{n} \ge 1 - x\right\} = -I(1-x) = -I(x),$$

because $X_k$ and $1 - X_k$ have the same distribution.

Theorem 1.1. Let $\xi_1, \xi_2, \dots$ be independent Bernoulli distributed random variables with parameter $p \in (0,1)$, that is, $P\{\xi_k = 1\} = p$ and $P\{\xi_k = 0\} = 1 - p$ for all $k \ge 1$. Let also $S_n = \xi_1 + \dots + \xi_n$. Then for all $x \ge p$

$$\lim_{n\to\infty} \frac1n \ln P\left\{\frac1n S_n \ge x\right\} = -I(x),$$

where

$$I(x) = \begin{cases} x \ln \frac{x}{p} + (1-x) \ln \frac{1-x}{1-p}, & x \in [0,1], \\ +\infty, & \text{otherwise}. \end{cases}$$

⁴ The picture was taken from [dH00].

Exercise 1.4. Prove Theorem 1.1.

Exercise 1.5. Using (2) and (5), show that

$$\sum_{n=1}^{\infty} P\left\{\left|\frac{S_n}{n} - \frac12\right| \ge \varepsilon\right\} < \infty$$

for all $\varepsilon > 0$. Conclude that $\frac{S_n}{n} \to \frac12$ a.s. as $n \to \infty$ (strong law of large numbers). (Hint: use the Borel-Cantelli lemma to show the convergence with probability 1.)

2 Cramér's theorem

2.1 Cumulant generating function

The aim of this section is to obtain an analog of Theorem 1.1 for an arbitrary sequence of independent identically distributed random variables. In order to understand the form of the rate function $I$, we make the following computation, trying to obtain an upper bound for $P\left\{\frac1n S_n \ge x\right\}$. Let $\xi_1, \xi_2, \dots$ be independent identically distributed random variables with mean $\mu \in \mathbb{R}$. Let also $S_n = \xi_1 + \dots + \xi_n$. We fix $x \ge \mu$ and $\lambda > 0$ and use Chebyshev's inequality, applied to $e^{\lambda S_n}$, in order to estimate the following probability:

$$P\left\{\frac1n S_n \ge x\right\} = P\{S_n \ge xn\} = P\left\{e^{\lambda S_n} \ge e^{\lambda x n}\right\} \le \frac{1}{e^{\lambda x n}}\, E\, e^{\lambda S_n} = \frac{1}{e^{\lambda x n}} \prod_{k=1}^n E\, e^{\lambda \xi_k} = \frac{1}{e^{\lambda x n}} \left(E\, e^{\lambda \xi_1}\right)^n.$$

Thus, we have

$$\limsup_{n\to\infty} \frac1n \ln P\{S_n \ge xn\} \le \lim_{n\to\infty} \frac1n \ln e^{-\lambda x n} + \lim_{n\to\infty} \frac1n \ln \left(E\, e^{\lambda \xi_1}\right)^n = -\lambda x + \varphi(\lambda), \tag{6}$$

where $\varphi(\lambda) := \ln E\, e^{\lambda \xi_1}$.
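Since (6) holds for every $\lambda > 0$, the best exponential bound comes from minimizing $-\lambda x + \varphi(\lambda)$ over $\lambda$. The following Python sketch is my own illustration, not from the notes (the function names are invented); it performs this minimization numerically for Bernoulli($p$) variables, where $\varphi(\lambda) = \ln(1 - p + p e^{\lambda})$, and recovers the rate function of Theorem 1.1.

```python
from math import exp, log

def phi(lam: float, p: float) -> float:
    """Cumulant generating function ln E e^{lam * xi} for xi ~ Bernoulli(p)."""
    return log(1 - p + p * exp(lam))

def chernoff_rate(x: float, p: float) -> float:
    """Best exponential bound inf_{lam > 0} (-lam * x + phi(lam)),
    found here by a simple grid search over lam in (0, 20)."""
    return min(-lam * x + phi(lam, p) for lam in (i / 1000 for i in range(1, 20000)))

def rate_I(x: float, p: float) -> float:
    """Rate function of Theorem 1.1 for x in (0, 1)."""
    return x * log(x / p) + (1 - x) * log((1 - x) / (1 - p))

# The optimized bound matches -I(x), e.g. for x = 0.7, p = 0.5:
print(chernoff_rate(0.7, 0.5), -rate_I(0.7, 0.5))
```

This suggests, as the course will make precise, that the correct rate function is the Legendre transform of $\varphi$; the grid search here is only a numerical stand-in for that optimization.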