Basics of the Theory of Large Deviations

Basics of the Theory of Large Deviations Michael Scheutzow February 1, 2018 Abstract In this note, we collect basic results of the theory of large deviations. Missing proofs can be found in the monograph [1]. 1 Introduction We start by recalling the following computation which was done in the course Wahrscheinlichkeitstheorie I (and which is also done in the course Versicherungs- mathematik). Assume that X; X1;X2; ::: are i.i.d real-valued random variables on a probability space (Ω; F; P) and x 2 R, λ > 0. Then, by Markov’s inequality, n n X λX1 P Xi ≥ nx ≤ exp{−λnxg E e = exp{−n(λx − Λ(λ))g; i=1 where Λ(λ) := log E expfλXg. Defining I(x) := supλ≥0fλx − Λ(λ)g, we therefore get n X P Xi ≥ nx ≤ exp{−nI(x)g; i=1 which often turns out to be a good bound. 1 2 Cramer’s theorem for real-valued random variables Definition 2.1. a) Let X be a real-valued random variable. The function Λ: R ! (−∞; 1] defined by λX Λ(λ) := log Ee is called cumulant generating function or logarithmic moment generating function. Let DΛ := fλ : Λ(λ) < 1g. b) Λ∗ : R ! [0; 1] defined by Λ∗(x) := supfλx − Λ(λ)g λ2R ∗ is called Fenchel-Legendre transform of Λ. Let DΛ∗ := fλ :Λ (λ) < 1g. In the following we will often use the convenient abbreviation Λ∗(F ) := inf Λ∗(x) x2F for a subset F ⊆ R (with the usual convention that the infimum of the empty set is +1). Lemma 2.2. a) Λ is convex. b) Λ∗ is convex. c) Λ∗ is lower semi-continuous, i. e. fx 2 E :Λ∗(x) ≤ αg is closed for every α 2 R. ∗ d) If DΛ = f0g, then we have Λ ≡ 0. e) If Λ(λ) < 1 for some λ > 0, then we have EX < 1 (but possibly EX = −∞). f) If EX < 1 (but possibly EX = −∞), then we have ∗ Λ (x) := supfλx − Λ(λ)g; x ≥ EX λ≥0 and Λ∗ is nondecreasing on [EX; 1). 2 g) EjXj < 1 implies Λ∗(EX) = 0. ∗ h) infx2R Λ (x) = 0. i) Λ is differentiable in the interior of DΛ with derivative 0 1 ηX Λ (η) = E(Xe ): E exp(ηX) Further, Λ0(η) = y implies Λ∗(y) = ηy − Λ(η). Proof. [1] Examples 2.3. a) L(X) = Poisson(θ), θ > 0. Then x Λ∗(x) = −x + θ + x log ; x ≥ 0: θ b) L(X) = Bernoulli(p), 0 < p < 1. Then x 1 − x Λ∗(x) = x log + (1 − x) log ; 0 ≤ x ≤ 1: p 1 − p c) L(X) = Exp(θ), θ > 0. Then Λ∗(x) = θx − 1 − log(θx); x ≥ 0: d) L(X) = N (0; σ2), σ > 0. Then x2 Λ∗(x) = ; x 2 : 2σ2 R In all cases Λ∗(x) is 1 for all other values of x. Now we are ready to formulate and prove Cramer’s´ Theorem. Theorem 2.4. Let X1;X2;::: be a sequence of independent and identically dis- 1 Pn tributed real-valued random variables and let µn := L n i=1 Xi ; n 2 N. Then the sequence fµngn2N satisfies the following properties. 1 ∗ a) lim supn!1 n log µn(F ) ≤ −Λ (F ) for every closed set F ⊆ R. 1 ∗ b) lim infn!1 n log µn(G) ≥ −Λ (G) for every open set G ⊆ R. 3 ∗ c) For a closed set F ⊆ R we even have µn(F ) ≤ 2 exp{−nΛ (F )g for every n 2 N and for a closed interval F of R we even have µn(F ) ≤ exp{−nΛ∗(F )g for every n 2 N. Proof. Obviously c) implies a), so it suffices to prove c) and b). We always assume that X is a random variable with law µ := µ1. c) The assertions are clearly true in case F = ;, so we assume that F is closed ∗ ∗ and nonempty. The assertions are also clear in case Λ (F ) = infx2F Λ (x) = 0, so we assume Λ∗(F ) > 0. It follows from Lemma 2.2d) that there exists some λ¯ 6= 0 such that Λ(λ¯) < 1. Assume that λ¯ > 0 (the case λ¯ < 0 is treated analogously). Lemma 2.2e) shows that EX < 1. For λ ≥ 0 and x 2 R we get: ( n ) 1 X µ ([x; 1)) = X ≥ x n P n i i=1 ( n ! ) X = P exp λ Xi ≥ exp (λnx) i=1 n !! X ≤ exp (−λnx) E exp λ Xi i=1 = e−n(λx−Λ(λ)): Since EX < 1, Lemma 2.2f) shows that for x ≥ EX we have −nΛ∗(x) µn([x; 1)) ≤ e : (1) Case 1: EjXj < 1 (i. e. EX > −∞). Since Λ∗(F ) > 0, Lemma 2.2g) c c implies EX 2 F . Let (x−; x+) be the largest interval in F which contains EX. Since F is nonempty, at least one of the numbers x−; x+ is finite. If x+ is finite, then x+ 2 F and ∗ ∗ µn(F \ [x+; 1)) ≤ µn([x+; 1)) ≤ exp{−nΛ (x+)g ≤ exp{−nΛ (F )g: If x− > −∞, then x− 2 F and ∗ ∗ µn(F \(−∞; x−]) ≤ µn(−∞; x−]) ≤ exp{−nΛ (x−)g ≤ exp{−nΛ (F )g: ∗ Hence µn(F ) ≤ 2 exp{−nΛ (F )g. In case F is an interval either x− = −∞ or x+ = 1. 4 Case 2: EX = −∞. Lemma 2.2f) shows that the function x 7! Λ∗(x) is ∗ nondecreasing and Lemma 2.2h) implies that limx→−∞ Λ (x) = 0. Since ∗ Λ (F ) > 0 and F 6= ;, there exists some x+ 2 R such that F ⊆ [x+; 1) and x+ 2 F . Now (1) implies ∗ ∗ µn(F ) ≤ µn([x+; 1)) ≤ exp{−nΛ (x+)g ≤ exp{−nΛ (F )g: This proves part c). b) Below we will show, that for every δ > 0 (and every law µ) we have 1 ∗ lim inf log µn((−δ; δ)) ≥ inf Λ(λ) (= −Λ (0)) : (2) n!1 n λ2R Assume this holds and let x 2 R and Y := X − x. Then we have λY ΛY (λ) = log Ee = −λx + Λ(λ) and ∗ ∗ ΛY (z) = sup fλz − ΛY (λ)g = sup fλz + λx − Λ(λ)g = Λ (z + x): Using (2), this implies that for any x 2 R and δ > 0, we have 1 1 (Y ) lim inf log µn((x − δ; x + δ)) = lim inf log µn ((−δ; δ)) n!1 n n!1 n ∗ ∗ ≥ −ΛY (0) = −Λ (x): If G = ;, then assertion b) is obvious. So assume that G is open and nonempty and x 2 G. Then we have (x − δ; x + δ) ⊂ G for some δ > 0 and hence 1 1 ∗ lim inf log µn(G) ≥ lim inf log µn((x − δ; x + δ)) ≥ −Λ (x); n!1 n n!1 n and – since x 2 G was arbitrary – we have 1 ∗ lim inf log µn(G) ≥ − inf Λ (x): n!1 n x2G It remains to show (2). Case 1: µ((−∞; 0)) > 0, µ((0; 1)) > 0 and µ has compact support. 5 Since µ has compact support, there exists some a > 0 such that µ([−a; a]) = 1. Further, Λ(λ) ≤ jλja < 1 for every λ 2 R. For the rest of the proof, see [1]. Case 2: µ((−∞; 0)) > 0, µ((0; 1)) > 0 and µ has unbounded support. Let M be so large that both µ([−M; 0)) and µ((0;M]) are strictly positive. Below we will let M tend to infinity. Define the probability measure ν on the Borel sets of R by µ(A \ [−M; M]) ν(A) := : µ([−M; M]) Clearly ν satisfies the assumptions of Case 1. Defining νn in analogy to µn and letting ΛM denote the logarithmic moment generating function associ- ated to ν and defining Z M Λ(M)(λ) := log eλydµ(y); −M we get n µn((−δ; δ)) ≥ νn((−δ; δ))µ([−M; M]) and 1 1 lim inf log µ ((−δ; δ)) ≥ lim inf log ν ((−δ; δ)) + log µ([−M; M]) n n n n (M) ≥ ΛM (λ) + log µ([−M; M]) = Λ (λ): It suffices to prove I∗ := lim inf Λ(M)(λ) ≥ inf Λ(λ): (3) M!1 λ2R λ2R (M) Since M 7! infλ2R Λ (λ) is nondecreasing, the sets fλ :Λ(M)(λ) ≤ I∗g are nonempty, compact and decreasing in M, so the intersection of all these sets is nonempty. If λ0 is in the intersection, then (M) ∗ Λ(λ0) = lim Λ (λ0) ≤ I : M!1 This finishes Case 2. 6 Case 3: Either µ((−∞; 0)) = 0 or µ((0; 1)) = 0. In this case, λ 7! Λ(λ) is either nonincreasing or nondecreasing and infλ2R Λ(λ) = log µ(f0g). Therefore n µn((−δ; δ)) ≥ µn(f0g) = (µ(f0g)) ; and hence 1 log µn((−δ; δ)) ≥ log µ(f0g) = inf Λ(λ): n λ2R This proves (2) and hence Cramer’s theorem. 3 Basic concepts of the theory of large deviations In the following E denotes a topological space and E the Borel sets of E. Definition 3.1. I : E ! [0; 1] is called a rate function, in case I is lower semi- continuous (i. e. fx 2 E : I(x) ≤ αg is closed for every α ≥ 0). I is called a good rate function if – in addition – fx 2 E : I(x) ≤ αg is compact for every α ≥ 0. Again we will use the abbreviation I(G) := inffI(x); x 2 Gg for any subset G of E. Definition 3.2. Let fµngn2N be a family of probability measures on (E; E) and let I be a rate function.

Basics of the Theory of Large Deviations

Arxiv:1909.00374V3 [Math.PR] 1 Apr 2021 2010 Mathematics Subject Classiﬁcation

Long Term Risk: a Martingale Approach

On the Form of the Large Deviation Rate Function for the Empirical Measures

Large Deviation Theory

Probability (Graduate Class) Lecture Notes

Spatial Birth-Death Wireless Networks

Large Deviations Theory Lecture Notes Master 2 in Fundamental Mathematics Université De Rennes 1

Large Deviations for a Matching Problem Related to the -Wasserstein

Lectures on the Large Deviation Principle

General Theory of Large Deviations

On the Large Deviation Rate Function for the Empirical Measures Of

Large Deviations for Stochastic Processes