Large Deviation Principles
Spring 2010
Scott Robertson∗   Chris Almost†

The course website contains links to the assignments and further reading.

Contents

1 Introduction to large deviations principles
   1.1 Motivation
   1.2 Definition and basic properties
   1.3 Analogy with weak convergence
2 Cramér's Theorem
   2.1 Cramér's theorem in $\mathbb{R}^d$
   2.2 Sanov's theorem
   2.3 Sampling a Brownian motion
3 Gärtner–Ellis theorem
   3.1 The upper bound
   3.2 The lower bound
4 Varadhan's Integral Lemma
   4.1 Lower bound for l.s.c. functions
   4.2 Upper bound for u.s.c. functions
   4.3 Laplace principles
   4.4 Asymptotically optimal variance reduction
5 General results
   5.1 An existence result for weak LDP
   5.2 Contraction principle
   5.3 Approximate contraction principle
6 Sample Path LDP
   6.1 Mogulskii's theorem
   6.2 Schilder's theorem
   6.3 Freidlin–Wentzell theory
7 LDP for occupancy times
   7.1 Finite state Markov chains
   7.2 Markov chains in Polish spaces
Index

∗[email protected]
†[email protected]

1 Introduction to large deviations principles

1.1 Motivation

Large deviation theory is concerned with the study of probabilities of rare events. We will compute these probabilities asymptotically on an exponential scale.
When we say "rare" we do not mean $P[A] \approx 0$; rather we mean events $A$ for which $\frac{1}{n}\log P_n[A]$ is moderately sized for large $n$, where the family of probability measures $\{P_n\}_{n\ge 1}$ converges (perhaps to a point mass).

Suppose we have $n$ stocks $S^1,\dots,S^n$ and we own \$1 of each at the beginning of the day. Assume the one-day rates of return $r_i$ are i.i.d. with mean zero. What is the probability that your portfolio loss exceeds some given percentage, say 50%? Our wealth at the end of the day is $W_1 = \sum_{i=1}^n S^i_1/S^i_0$, so the portfolio rate of return is $\frac1n\sum_i r_i$ (recall $W_0 = \$n$). We wish to compute $P[\frac1n\sum_i r_i \le -0.5]$. This is an example of a ruin probability.

Why $\frac1n\log P_n[A]$ and not, say, $\frac{1}{n^2}\log P_n[A]$ or $n^2\log P_n[A]$? The answer is because it works for empirical averages of i.i.d. $N(0,1)$ random variables. It is well known that $\frac1n\sum_{i=1}^n Z_i \sim N(0,\frac1n)$. It can be shown (see K&S p. 112) that
\[
\frac{1}{a\sqrt{2\pi n}}\Bigl(1-\frac{1}{na^2}\Bigr)e^{-na^2/2}
\;\le\;
P\Bigl[\frac1n\sum_{i=1}^n Z_i \ge a\Bigr]
= \sqrt{\frac{n}{2\pi}}\int_a^\infty e^{-nx^2/2}\,dx
\;\le\;
\frac{1}{a\sqrt{2\pi n}}\,e^{-na^2/2},
\]
so $\frac1n\log P_n[(a,\infty)] \to -a^2/2$. Since $\frac1n\log P_n[A]$ converges to something meaningful for averages of $N(0,1)$, it makes sense to think it is the correct scaling for averages of other i.i.d. random variables, random walks, Brownian motion, some diffusions, some Markov processes, etc.

Suppose we may write $P_n[A] = g_n(A)e^{-nI(A)}$, where $\frac1n\log g_n(A)\to 0$ uniformly for all $A$. What should $I(A)$ look like? What properties should it have? Assume for now that all $P_n$ are absolutely continuous with respect to some reference measure $P$. By taking "$A = dx$" we get $p_n(x) = g_n(x)e^{-nI(x)}$, so
\[
\frac1n\log P_n[A] = \frac1n\log\int_A e^{-nI(x)}\,P[dx] + o(1).
\]
If $f$ is a non-negative, bounded, measurable function then
\[
\Bigl(\int_A f(x)^n\,P[dx]\Bigr)^{1/n} \xrightarrow{\,n\to\infty\,} \operatorname*{ess\,sup}_P\{f(x) : x\in A\}.
\]
Let $f(x) := e^{-I(x)}$ and combine the previous two equations to get
\[
\lim_{n\to\infty}\frac1n\log P_n[A]
= \log\operatorname*{ess\,sup}_P\{e^{-I(x)} : x\in A\}
= -\operatorname*{ess\,inf}_P\{I(x) : x\in A\}.
\]
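The Gaussian sandwich bounds above are easy to check numerically. The following is a minimal sketch (the helper names `tail_prob` and `scaled_log_prob` are invented here; only the standard library is used): the exact tail is $P[\frac1n\sum_i Z_i \ge a] = \tfrac12\operatorname{erfc}(a\sqrt{n}/\sqrt{2})$, and we compare $\frac1n\log P_n$ with $-a^2/2$.

```python
import math

def tail_prob(n: int, a: float) -> float:
    # (1/n) sum Z_i ~ N(0, 1/n), so P[mean >= a] = P[Z >= a*sqrt(n)] for
    # standard normal Z, expressed via the complementary error function.
    return 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))

def scaled_log_prob(n: int, a: float) -> float:
    return math.log(tail_prob(n, a)) / n

a = 0.5
for n in (100, 1000, 4000):
    print(n, scaled_log_prob(n, a))  # creeps up toward -a^2/2 = -0.125

# The sandwich bounds from the display above, at n = 4000:
n = 4000
pref = 1.0 / (a * math.sqrt(2.0 * math.pi * n))
lower = pref * (1.0 - 1.0 / (n * a * a)) * math.exp(-n * a * a / 2.0)
upper = pref * math.exp(-n * a * a / 2.0)
print(lower <= tail_prob(n, a) <= upper)
```

For $a = 0.5$ the scaled log-probability approaches $-0.125$ from below; the prefactor $\frac{1}{a\sqrt{2\pi n}}$ is exactly the kind of sub-exponential factor $g_n(A)$ appearing in the heuristic above. (Avoid much larger $n$ here: `math.erfc` underflows to zero for very large arguments.)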
This is telling us that we should look for $I$ such that we can write statements of the form
\[
\lim_{n\to\infty}\frac1n\log P_n[A] = -\inf_{x\in A} I(x).
\]
The basic properties $I$ should have include non-negativity, $\inf_{x\in\mathcal X} I(x) = 0$, and there should exist $x\in A$ which minimizes $I$ over $A$ whenever $A$ is compact. In particular, $I$ should be lower semicontinuous, i.e. the set $\{I\le\alpha\}$ is closed for each $\alpha\ge 0$.

1.1.1 Exercise. Show that, in a metric space, $I$ is lower semicontinuous if and only if $\liminf_n I(x_n)\ge I(x)$ whenever $x_n\to x$.

SOLUTION: Assume that $I$ is lower semicontinuous and let $x_n\to x$ be given. Let $\alpha := \liminf_n I(x_n)$. If $\alpha=\infty$ then there is nothing to prove. Assume that $\alpha$ is a finite number. Define $n_1 := 1$ and for each $k>1$ define $n_k$ to be the smallest index greater than $n_{k-1}$ such that $I(x_{n_k}) \le \inf_{m>n_{k-1}} I(x_m) + 2^{-k}$. Then $x_{n_k}\to x$, since it is a subsequence of a convergent sequence, and by construction $I(x_{n_k})\to\alpha$. For each $\varepsilon>0$, eventually $I(x_{n_k})\le\alpha+\varepsilon$, so since $\{I\le\alpha+\varepsilon\}$ is a closed set, it follows that $I(x)\le\alpha+\varepsilon$. Since $\varepsilon>0$ was arbitrary, $I(x)\le\alpha$. If $\alpha=-\infty$ then it is easy to show that $I(x)=-\infty$.

Conversely, suppose $I$ has the property that $\liminf_n I(x_n)\ge I(x)$ for all $x$ and all sequences $x_n\to x$. Let $\alpha>0$ be given and assume for contradiction that the set $\{I\le\alpha\}$ is not closed. Then there is a sequence $\{x_n\}_{n\ge 1}\subseteq\{I\le\alpha\}$ such that $x_n\to x$ and $x\notin\{I\le\alpha\}$. But then $\alpha\ge\liminf_n I(x_n)\ge I(x)>\alpha$, a contradiction. ú

1.2 Definition and basic properties

1.2.1 Definition. Let $(\mathcal X,\tau)$ be a topological space. $I:\mathcal X\to[0,\infty]$ is a rate function if it is lower semicontinuous. $I$ is a good rate function if $\{I\le\alpha\}$ is compact for all $\alpha\ge 0$.

Remark. Some authors take only good rate functions to be rate functions.

Let $I$ be a rate function. It seems as though it would be nice if
\[
\lim_{n\to\infty}\frac1n\log P_n[A] = -\inf_{x\in A} I(x)
\]
for all measurable $A$, but this is too restrictive for reasonable applications.
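The sequential characterization in the exercise can be illustrated concretely. A small sketch (the functions and names below are invented for illustration) contrasts a lower semicontinuous step function with a flipped version that fails the criterion:

```python
def I_lsc(x: float) -> float:
    # Lower semicontinuous: {I_lsc <= alpha} = (-inf, 0], closed, for alpha in [0, 1).
    return 0.0 if x <= 0 else 1.0

def I_not_lsc(x: float) -> float:
    # {I_not_lsc <= 0} = (-inf, 0) is open, not closed: not lower semicontinuous.
    return 1.0 if x >= 0 else 0.0

xs = [-1.0 / n for n in range(1, 200)]  # a sequence x_n -> 0 from the left

# Approximate liminf along the sequence by the infimum over a late tail.
liminf_lsc = min(I_lsc(x) for x in xs[100:])
liminf_bad = min(I_not_lsc(x) for x in xs[100:])

print(liminf_lsc >= I_lsc(0.0))      # liminf = 0 >= I(0) = 0
print(liminf_bad >= I_not_lsc(0.0))  # liminf = 0 <  I(0) = 1: criterion fails
```

The jump of `I_not_lsc` sits on the "wrong" side of the limit point: the sequence values stay at $0$ while the limit value is $1$, exactly the failure mode the closed-sublevel-set condition rules out.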
For example, it is quite reasonable to have $P_n[\{x\}] = 0$ for all $n$ and all $x\in\mathcal X$. But this implies that $I(x)=\infty$ everywhere, which is incompatible with the fact that $\inf_{x\in\mathcal X} I(x)=0$ since $P_n[\mathcal X]=1$.

Recall that $P_n\xrightarrow{w} P$ does not mean $\lim_{n\to\infty} P_n[A]=P[A]$ for all $A$. Rather, it is equivalent to the pair of inequalities
\begin{align*}
\limsup_{n\to\infty} P_n[F] &\le P[F] && \text{for all $F$ closed, and}\\
\liminf_{n\to\infty} P_n[G] &\ge P[G] && \text{for all $G$ open.}
\end{align*}

1.2.2 Definition. Let $(\mathcal X,\mathcal B)$ be a measurable space and $\{P_\varepsilon\}_{\varepsilon>0}$ be a family of probability measures. We say that $\{P_\varepsilon\}_{\varepsilon>0}$ satisfies a large deviations principle, or LDP, with (good) rate function $I$ if, for all $\Gamma\in\mathcal B$,
\[
-\inf_{x\in\Gamma^\circ} I(x)
\le \liminf_{\varepsilon\to 0}\varepsilon\log P_\varepsilon(\Gamma)
\le \limsup_{\varepsilon\to 0}\varepsilon\log P_\varepsilon(\Gamma)
\le -\inf_{x\in\bar\Gamma} I(x).
\]

Remark. (i) There is no reason to assume that $\mathcal B$ is the Borel $\sigma$-algebra of $\mathcal X$. (ii) In most instances, but not all, $\mathcal X$ is a Polish space (i.e. a complete separable metric space) and the $P_\varepsilon$ are Borel measures. (iii) If all the open sets are measurable then we can write the LDP as
\begin{align*}
\limsup_{\varepsilon\to 0}\varepsilon\log P_\varepsilon(F) &\le -\inf_{x\in F} I(x) && \text{for all $F$ closed, and}\\
\liminf_{\varepsilon\to 0}\varepsilon\log P_\varepsilon(G) &\ge -\inf_{x\in G} I(x) && \text{for all $G$ open.}
\end{align*}

The full LDP is the one stated in the definition (or equivalently, the one in the remark when we are in the Borel case). The weak LDP in the Borel case is a slight relaxation:
\begin{align*}
\limsup_{\varepsilon\to 0}\varepsilon\log P_\varepsilon(F) &\le -\inf_{x\in F} I(x) && \text{for all $F$ compact, and}\\
\liminf_{\varepsilon\to 0}\varepsilon\log P_\varepsilon(G) &\ge -\inf_{x\in G} I(x) && \text{for all $G$ open.}
\end{align*}
That is to say, "closed" changes to "compact" for the weak LDP.

1.2.3 Exercise. Find a family satisfying a weak LDP but no full LDP.

SOLUTION: From D&Z, page 7, let $P_\varepsilon := \delta_{1/\varepsilon}$. Then the family $\{P_\varepsilon\}_{\varepsilon>0}$ satisfies the weak LDP for the good rate function $I\equiv\infty$. (The lower bound for open sets is trivial, and the upper bound for compact sets follows because eventually $1/\varepsilon$ escapes any fixed compact set. We define $\log 0 := -\infty$.)
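The behavior of the Dirac family $P_\varepsilon = \delta_{1/\varepsilon}$ can be probed numerically. In this sketch (names invented here), `P_eps_mass` returns the mass $\delta_{1/\varepsilon}$ assigns to an interval, and we watch $\varepsilon\log P_\varepsilon$ on a fixed compact set versus on all of $\mathbb R$:

```python
import math

def P_eps_mass(eps: float, lo: float, hi: float) -> float:
    # delta_{1/eps} puts all of its mass at the single point 1/eps.
    return 1.0 if lo <= 1.0 / eps <= hi else 0.0

def eps_log(eps: float, p: float) -> float:
    # Convention from the notes: log 0 := -infinity.
    return eps * math.log(p) if p > 0 else float("-inf")

K = (-10.0, 10.0)  # a fixed compact set
for eps in (1.0, 0.5, 0.05):
    print(eps, eps_log(eps, P_eps_mass(eps, *K)))
# Once eps < 1/10 the point 1/eps has escaped K, so eps*log P_eps[K] = -inf,
# consistent with the weak-LDP upper bound for I == infinity.

# On the open set R we always have P_eps[R] = 1, so eps*log P_eps[R] = 0,
# while -inf_{x in R} I(x) = -inf: the full-LDP lower bound fails.
print(eps_log(0.05, 1.0))
```

This mirrors the argument in the solution: on compact sets the mass eventually vanishes, but on $\mathbb R$ it never does, so no single rate function can serve both bounds.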
This family does not satisfy a full LDP for any rate function, because $P_\varepsilon[\{x\}]$ is eventually zero for every $x$, so the only possible choice of rate function is $I\equiv\infty$; but $P_\varepsilon[\mathbb R]=1$ implies $\inf_{x\in\mathbb R} I(x)=0$, a contradiction.

This example also shows that having a weak LDP with a good rate function does not imply a full LDP. ú

1.2.4 Definition. A family $\{P_\varepsilon\}_{\varepsilon>0}$ is exponentially tight if, for every $\alpha>0$, there is a compact set $K_\alpha$ such that $\limsup_{\varepsilon\to 0}\varepsilon\log P_\varepsilon[K_\alpha^c] < -\alpha$.
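As a sanity check on the definition, here is a minimal sketch with an example not taken from the notes: let $P_\varepsilon$ be the law of $\sqrt{\varepsilon}\,Z$ with $Z\sim N(0,1)$. Then $\varepsilon\log P_\varepsilon[[-M,M]^c]\to -M^2/2$, so for each $\alpha$ any $M$ with $M^2/2 > \alpha$ gives a compact witness $K_\alpha = [-M,M]$.

```python
import math

def complement_mass(eps: float, M: float) -> float:
    # P_eps = law of sqrt(eps)*Z with Z ~ N(0,1), so
    # P_eps[|x| > M] = P[|Z| > M/sqrt(eps)] = erfc(M / sqrt(2*eps)).
    return math.erfc(M / math.sqrt(2.0 * eps))

def eps_log_complement(eps: float, M: float) -> float:
    return eps * math.log(complement_mass(eps, M))

alpha = 3.0
M = math.sqrt(2.0 * alpha) + 1.0   # chosen so that M**2 / 2 > alpha
for eps in (0.5, 0.1, 0.01):
    print(eps, eps_log_complement(eps, M))  # tends to -M**2/2 < -alpha
```

Even moderately small $\varepsilon$ already puts $\varepsilon\log P_\varepsilon[K_\alpha^c]$ well below $-\alpha$ here. (As before, much smaller $\varepsilon$ would make `math.erfc` underflow to zero.)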