Large Deviations Richard Sowers
Total Page:16
File Type:pdf, Size:1020Kb
Large Deviations Richard Sowers Department of Mathematics, University of Illinois at Urbana-Champaign E-mail address: [email protected] Contents Chapter 1. Some Simple Calculations and Laplace Asymptotics 5 1. The Main Ideas and Examples 7 2. The Contraction Principle 11 Exercises 12 Chapter 2. The G¨artner-Ellis Theorem 15 Exercises 21 Chapter 3. Schilder’s Theorem 23 Exercises 27 Chapter 4. Freidlin-Wentzell-Theory 29 Exercises 34 Chapter 5. Large Deviations for Markov Chains 35 1. Pair Empirical Measure 42 Exercises 43 Chapter 6. Refined Asymptotics 45 1. Reference 48 Bibliography 49 3 CHAPTER 1 Some Simple Calculations and Laplace Asymptotics The basic idea which underlines all of large deviations theory is exemplified in the following calculation. Set def −3/ε −7/ε Iε = e + e for all ε ∈ (0, 1). Then −3/ε n −4/εo Iε = e 1 + e −4/ε for all ε ∈ (0, 1), and limε→0{1 + e } = 1. Thus −4/ε (1) lim ε ln Iε = −3 + lim ε ln{1 + e } = −3. ε→0 ε→0 Definition 0.1. If {Aε}ε∈(0,1) and {Bε}ε∈(0,1) are two real-valued sequences, we say that Aε Bε if lim ε ln Aε = lim ε ln Bε. ε→0 ε→0 The generalization of (1) is thus that N 1 X −cj /ε ane exp − min cn . ε 1≤n≤N n=1 −8/ε −8/ε for generic an’s and cn’s (we clearly have to disallow the situations like 7e + 7e ). 0.1. Laplace Asymptotics. Let’s extend these calculations to integrals. Define 1 def Z 1 (2) Iε = exp − f(x) dx ε ∈ (0, 1) x=0 ε where f : [0, 1] → R is smooth and has a unique minimum at some x∗ ∈ (0, 1). Let’s also assume that f¨(x∗) > 0. Informally, we have that J−1 X 1 1 1 Iε ≈ exp − f(xj) {xj+1 − xj} exp − inf f(xj) exp − inf f(x) ε ε 1≤j≤J−1 ε 0≤x≤1 j=1 where 0 = x1 < x2 . xJ−1 < xJ = 1 is a partition of [0, 1] which is “sufficiently” fine. Let’s see if we can 1 ∗ ˜ make this rigorous. We can write Iε = exp − ε f(x ) Iε where Z 1 1 ∗ I˜ε = exp − {f(x) − f(x )} dx. x=0 ε Our goal is then to show that limε→0 ε ln I˜ε = 0. Let’s break I˜ε into two parts; ˜ ˜A ˜B Iε = Iε + Iε where Z 1 I˜A def= exp − {f(x) − f(x∗)} dx ε x∈[0,1] ε |x−x∗|≥εγ 5 Z 1 I˜B def= exp − {f(x) − f(x∗)} dx ε x∈[0,1] ε |x−x∗|<εγ where γ > 0 is some parameter to be determined. To proceed, we note that ∗ def f(x) − f(x ) (3) c = lim inf 2 > 0. δ&0 x∈[0,1] δ |x−x∗|≥δ Thus for ε ∈ (0, 1) sufficiently small, 1 c inf {f(x) − f(x∗)} ≥ ε2γ−1. x∈[0,1] ε 2 |x−x∗|≥εγ Thus h c i I˜A ≤ exp − ε2γ−1 . ε 2 1 2γ−1 ˜A If γ < 2 , then limε→0 ε = ∞, so limε→0 ε ln Iε = 0. ˜B ∗ Let’s next consider Iε . We first expand f around x . We have that 1 f(x) = f(x∗) + f¨(x∗)(x − x∗)2 + E (x) 2 1 for x ∈ [0, 1], where Z 1 1 ∗ 3 2 (3) ∗ ∗ E1(x) = (x − x ) (1 − s) f (x + s(x − x ))ds 2 s=0 √ ˜B ∗ for all x ∈ [0, 1]. Let’s next rescale the integral representation for Iε by taking z = (x − x )/ ε. If ε > 0 is small enough that (x∗ − εγ , x∗ + εγ ) ⊂ (0, 1), we get that √ Z εγ−1/2 √ ˜B 1 ¨ ∗ 2 1 ∗ Iε = ε exp − f(x )z + E1 x + εz dz z=−εγ−1/2 2 ε If |z| ≤ εγ−1/2, then 1 ∗ √ (3) 1 √ 3 3γ−1 E1 x + εz ≤ kf kC[0,1] εz ≤ ε . ε ε 1 3γ−1 γ−1/2 This suggests that we take γ > 3 so that limε→0 ε = 0; note also that limε→0 ε = ∞. Doing so, we get that Z s 1 ˜B 1 ¨ ∗ 2 2π lim √ Iε = exp − f(x )z dz = . ε→0 ε 2 ¨ ∗ z∈R f(x ) In other words, (s ) B √ 2π I˜ = ε + E2(ε) ε f¨(x∗) (s ) √ 2π A I˜ε = ε + E2(ε) + I˜ f¨(x∗) ε where limε→0 E2(ε) = 0 and thus lim ε ln I˜ε = 0. ε→0 6 0.2. I.I.D. Coin Tosses. Let’s now toss a large collection of independent and identically distributed (i.i.d.) coins. Let {ξn}n∈N be i.i.d. Bernoulli random variables; i.e., P{ξn = 1} = p and P{ξn = 0} = 1 − p for all n ∈ N, where p ∈ (0, 1) is some fixed parameter. Define N def 1 X (4) X = ξ N N n n=1 for all N ∈ N. Thus NXN is a binomial distribution; N {NX = k} = pk(1 − p)N−k P N k for all k ∈ {0, 1 ...N}. The second√ key realization is that we can use Stirling’s formula to approximate n! for large n; recall that n! ≈ (n/e)n 2πn for large n (more precisely, the difference between the two is negligible for large n). Thus we have the following string of approximate equalities: for any α ∈ (0, 1), N! {X ≈ α} ≈ pNα(1 − p)N−Nα P N (Nα)!(N − Nα)! (N/e)N ≈ pNα(1 − p)N(1−α) (Nα/e)Nα(N(1 − α)/e)N(1−α) √ 2πN × √ 2πNαp2πN(1 − α) (5) N N−Nα−N(1−α) p Nα 1 − p N−Nα 1 = e α 1 − α p2πNα(1 − α) 1 p 1 − p = exp N α ln + (1 − α) ln p2πNα(1 − α) α 1 − α 1 = exp [−NH(α, p)] p2πNα(1 − α) where α 1 − α H(α, p) def= α ln + (1 − α) ln . p 1 − p This is of course the famous relative entropy. Specifically, it is the relative entropy of a coin with bias α with respect to the true coin. Thus for any A ∈ [0, 1], we should heuristically have that X X 1 P {XN ∈ A} ≈ P {XN ≈ α} ≈ exp [−NH(α, p)] exp −N inf H(α, p) . p α∈A α∈A α∈A 2πNα(1 − α) √ The 1/ N terms in (5) should of course negligible at the exponential rate of interest. 1. The Main Ideas and Examples Large Deviations theory gives a background behind both Laplace asymptotics and the calculations of i.i.d. coin flips. Note that Laplace asymptotics primarily applies to continuous random variables, and the coin flips are discrete. We can also handle infinite-dimensional random variables. There are three basic questions we would like to address. (1) What is a large deviations principle? What is the “correct” way to formulate a study of rare events? (2) Is there a unifying theory for studying the origination of rare events? (3) How do rare events transform? 7 We will give the definition of a large deviations principle here. We will also motivate the answer to the second question; the formal result is called the G¨artner-Ellis result and will, due to its technicalities, be addressed in the next chapter. The last issue is essentially the “contraction principle” of Varadhan, which we will also address here. 1.1. Definition of an LDP. Our basic setup is as follows. We have a collection {Xε}ε∈(0,1) of random variables which take values in some Polish space X. We will see that the following is the “correct” way to think about exponentially rare events. Definition 1.1. We say that {Xε}ε∈(0,1) has a large deviations principle with rate function I : X → [0, ∞] if def (1) For each s ≥ 0, Φ(s) = {x ∈ X : I(x) ≤ s} is compact. (2) For each closed subset F of X, lim ε ln P{Xε ∈ F } ≤ − inf I(x) ε→0 x∈F (3) For each open subset G of X, lim ε ln P{Xε ∈ G} ≥ − inf I(x) ε→0 x∈G We will see why this is correct in a moment. It turns out, however, that there is an equivalent definition for a large deviations principle. We here let d be the metric on X. Definition 1.2. We say that {Xε}ε∈(0,1) has a large deviations principle with rate function I : X → [0, ∞] if def (1) For each s ≥ 0, Φ(s) = {x ∈ X : I(x) ≤ s} is compact. (2) For each s ≥ 0 and each δ > 0, lim ε ln {d(Xε, Φ(s)) ≥ δ} ≥ −s ε→0 P (3) For every x∗ ∈ X and every δ > 0, ∗ ∗ lim ε ln {d(Xε, x ) < δ} ≤ −I(x ) ε→0 P 1.2. How to get at an LDP. Let’s return to the framework of Laplace asymptotics. Suppose that 2 {Xε}ε∈(0,1) is a collection of R-valued random variables such that, for some f ∈ C [0, 1], Z 1 (6) P{Xε ∈ A} = cε exp − f(x) dx x∈A ε for all A ∈ B[0, 1], where −1 def Z 1 cε = exp − f(x) dx x∈A ε with limN→∞ ε ln cε = 0. Lemma 1.1. For any A ∈ B[0, 1], lim ε ln P{Xε ∈ A} ≤ − inf f(x) ε→0 x∈A¯ lim ε ln P{Xε ∈ A} ≥ − inf f(x).