Large Deviations

Richard Sowers

Department of Mathematics, University of Illinois at Urbana-Champaign
E-mail address: [email protected]

Contents

Chapter 1. Some Simple Calculations and Laplace Asymptotics
1. The Main Ideas and Examples
2. The Contraction Principle
Exercises

Chapter 2. The Gärtner-Ellis Theorem
Exercises

Chapter 3. Schilder's Theorem
Exercises

Chapter 4. Freidlin-Wentzell Theory
Exercises

Chapter 5. Large Deviations for Markov Chains
1. Pair Empirical Measure
Exercises

Chapter 6. Refined Asymptotics
1. Reference

Bibliography


CHAPTER 1

Some Simple Calculations and Laplace Asymptotics

The basic idea which underlies all of large deviations theory is exemplified in the following calculation. Set
$$I_\varepsilon \overset{\text{def}}{=} e^{-3/\varepsilon} + e^{-7/\varepsilon}$$
for all $\varepsilon\in(0,1)$. Then
$$I_\varepsilon = e^{-3/\varepsilon}\left\{1 + e^{-4/\varepsilon}\right\}$$

for all $\varepsilon\in(0,1)$, and $\lim_{\varepsilon\to 0}\left\{1+e^{-4/\varepsilon}\right\} = 1$. Thus
$$(1)\qquad \lim_{\varepsilon\to 0}\varepsilon\ln I_\varepsilon = -3 + \lim_{\varepsilon\to 0}\varepsilon\ln\left\{1+e^{-4/\varepsilon}\right\} = -3.$$
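A quick numerical sanity check (a sketch, not part of the original argument): factoring out the dominant exponential as above lets us evaluate $\varepsilon\ln I_\varepsilon$ without underflow and watch it approach $-3$.

```python
import math

def eps_log_I(eps):
    # eps * ln(I_eps) for I_eps = e^{-3/eps} + e^{-7/eps}.
    # Factor out the dominant term e^{-3/eps} to avoid underflow:
    # eps * ln(I_eps) = -3 + eps * ln(1 + e^{-4/eps}).
    return -3.0 + eps * math.log1p(math.exp(-4.0 / eps))

for eps in (0.5, 0.1, 0.01):
    print(eps, eps_log_I(eps))  # approaches -3 as eps shrinks
```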

Definition 0.1. If $\{A_\varepsilon\}_{\varepsilon\in(0,1)}$ and $\{B_\varepsilon\}_{\varepsilon\in(0,1)}$ are two real-valued sequences, we say that $A_\varepsilon \asymp B_\varepsilon$ if
$$\lim_{\varepsilon\to 0}\varepsilon\ln A_\varepsilon = \lim_{\varepsilon\to 0}\varepsilon\ln B_\varepsilon.$$
The generalization of (1) is thus that

$$\sum_{n=1}^{N} a_n e^{-c_n/\varepsilon} \asymp \exp\left[-\frac{1}{\varepsilon}\min_{1\le n\le N} c_n\right]$$
for generic $a_n$'s and $c_n$'s (we clearly have to disallow situations like $7e^{-8/\varepsilon} - 7e^{-8/\varepsilon}$).

0.1. Laplace Asymptotics. Let's extend these calculations to integrals. Define
$$(2)\qquad I_\varepsilon \overset{\text{def}}{=} \int_{x=0}^{1}\exp\left[-\frac{1}{\varepsilon}f(x)\right]dx, \qquad \varepsilon\in(0,1),$$
where $f:[0,1]\to\mathbb{R}$ is smooth and has a unique minimum at some $x^*\in(0,1)$. Let's also assume that $\ddot f(x^*)>0$. Informally, we have that
$$I_\varepsilon \approx \sum_{j=1}^{J-1}\exp\left[-\frac{1}{\varepsilon}f(x_j)\right]\{x_{j+1}-x_j\} \asymp \exp\left[-\frac{1}{\varepsilon}\inf_{1\le j\le J-1}f(x_j)\right] \asymp \exp\left[-\frac{1}{\varepsilon}\inf_{0\le x\le 1}f(x)\right]$$

where $0 = x_1 < x_2 < \dots < x_{J-1} < x_J = 1$ is a partition of $[0,1]$ which is "sufficiently" fine. Let's see if we can make this rigorous. We can write $I_\varepsilon = \exp\left[-\frac{1}{\varepsilon}f(x^*)\right]\tilde I_\varepsilon$ where
$$\tilde I_\varepsilon = \int_{x=0}^{1}\exp\left[-\frac{1}{\varepsilon}\{f(x)-f(x^*)\}\right]dx.$$

Our goal is then to show that $\lim_{\varepsilon\to0}\varepsilon\ln\tilde I_\varepsilon = 0$. Let's break $\tilde I_\varepsilon$ into two parts;
$$\tilde I_\varepsilon = \tilde I^A_\varepsilon + \tilde I^B_\varepsilon$$
where
$$\tilde I^A_\varepsilon \overset{\text{def}}{=} \int_{\substack{x\in[0,1]\\|x-x^*|\ge\varepsilon^\gamma}}\exp\left[-\frac{1}{\varepsilon}\{f(x)-f(x^*)\}\right]dx, \qquad \tilde I^B_\varepsilon \overset{\text{def}}{=} \int_{\substack{x\in[0,1]\\|x-x^*|<\varepsilon^\gamma}}\exp\left[-\frac{1}{\varepsilon}\{f(x)-f(x^*)\}\right]dx,$$
and where $\gamma>0$ is some parameter to be determined. To proceed, we note that

$$(3)\qquad c \overset{\text{def}}{=} \lim_{\delta\searrow0}\inf_{\substack{x\in[0,1]\\|x-x^*|\ge\delta}}\frac{f(x)-f(x^*)}{\delta^2} > 0.$$

Thus for $\varepsilon\in(0,1)$ sufficiently small,
$$\frac{1}{\varepsilon}\inf_{\substack{x\in[0,1]\\|x-x^*|\ge\varepsilon^\gamma}}\{f(x)-f(x^*)\} \ge \frac{c}{2}\varepsilon^{2\gamma-1}.$$
Thus
$$\tilde I^A_\varepsilon \le \exp\left[-\frac{c}{2}\varepsilon^{2\gamma-1}\right].$$
If $\gamma<\frac12$, then $\lim_{\varepsilon\to0}\varepsilon^{2\gamma-1}=\infty$, so $\tilde I^A_\varepsilon$ vanishes faster than any power of $\varepsilon$.
Let's next consider $\tilde I^B_\varepsilon$. We first expand $f$ around $x^*$. We have that
$$f(x) = f(x^*) + \frac12\ddot f(x^*)(x-x^*)^2 + E_1(x)$$
for $x\in[0,1]$, where
$$E_1(x) = \frac12(x-x^*)^3\int_{s=0}^{1}(1-s)^2 f^{(3)}(x^*+s(x-x^*))\,ds$$
for all $x\in[0,1]$. Let's next rescale the integral representation for $\tilde I^B_\varepsilon$ by taking $z = (x-x^*)/\sqrt\varepsilon$. If $\varepsilon>0$ is small enough that $(x^*-\varepsilon^\gamma, x^*+\varepsilon^\gamma)\subset(0,1)$, we get that

$$\tilde I^B_\varepsilon = \sqrt\varepsilon\int_{z=-\varepsilon^{\gamma-1/2}}^{\varepsilon^{\gamma-1/2}}\exp\left[-\left\{\frac12\ddot f(x^*)z^2 + \frac{1}{\varepsilon}E_1\left(x^*+\sqrt\varepsilon z\right)\right\}\right]dz.$$
If $|z|\le\varepsilon^{\gamma-1/2}$, then
$$\frac{1}{\varepsilon}\left|E_1\left(x^*+\sqrt\varepsilon z\right)\right| \le \frac{1}{\varepsilon}\left\|f^{(3)}\right\|_{C[0,1]}\left|\sqrt\varepsilon z\right|^3 \le \left\|f^{(3)}\right\|_{C[0,1]}\varepsilon^{3\gamma-1}.$$

This suggests that we take $\gamma>\frac13$ so that $\lim_{\varepsilon\to0}\varepsilon^{3\gamma-1}=0$; note also that $\lim_{\varepsilon\to0}\varepsilon^{\gamma-1/2}=\infty$. Doing so (i.e., fixing $\gamma\in(\frac13,\frac12)$), we get that
$$\lim_{\varepsilon\to0}\frac{1}{\sqrt\varepsilon}\tilde I^B_\varepsilon = \int_{z\in\mathbb{R}}\exp\left[-\frac12\ddot f(x^*)z^2\right]dz = \sqrt{\frac{2\pi}{\ddot f(x^*)}}.$$
In other words,
$$\tilde I^B_\varepsilon = \sqrt\varepsilon\left\{\sqrt{\frac{2\pi}{\ddot f(x^*)}} + E_2(\varepsilon)\right\}, \qquad \tilde I_\varepsilon = \sqrt\varepsilon\left\{\sqrt{\frac{2\pi}{\ddot f(x^*)}} + E_2(\varepsilon)\right\} + \tilde I^A_\varepsilon$$

where limε→0 E2(ε) = 0 and thus

$$\lim_{\varepsilon\to0}\varepsilon\ln\tilde I_\varepsilon = 0.$$
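As a numerical check of the Laplace asymptotics just derived (a sketch; the quadratic $f(x)=(x-\frac12)^2$ is an illustrative choice, not from the text), the ratio of $I_\varepsilon$ to the predicted leading order $\sqrt{2\pi\varepsilon/\ddot f(x^*)}\,e^{-f(x^*)/\varepsilon}$ should tend to $1$:

```python
import math

def laplace_integral(eps, f, n=100_000):
    # midpoint-rule approximation of I_eps = \int_0^1 exp(-f(x)/eps) dx
    h = 1.0 / n
    return sum(math.exp(-f((i + 0.5) * h) / eps) for i in range(n)) * h

f = lambda x: (x - 0.5) ** 2   # minimum 0 at x* = 1/2, f''(x*) = 2
for eps in (0.1, 0.01, 0.001):
    # predicted leading order: sqrt(2*pi*eps / f''(x*)) * exp(-f(x*)/eps)
    predicted = math.sqrt(2 * math.pi * eps / 2.0)
    print(eps, laplace_integral(eps, f) / predicted)  # ratio tends to 1
```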

0.2. I.I.D. Coin Tosses. Let's now toss a large collection of independent and identically distributed

(i.i.d.) coins. Let {ξn}n∈N be i.i.d. Bernoulli random variables; i.e.,

$$\mathbb{P}\{\xi_n = 1\} = p \quad\text{and}\quad \mathbb{P}\{\xi_n = 0\} = 1-p$$
for all $n\in\mathbb{N}$, where $p\in(0,1)$ is some fixed parameter. Define
$$(4)\qquad X_N \overset{\text{def}}{=} \frac{1}{N}\sum_{n=1}^{N}\xi_n$$
for all $N\in\mathbb{N}$. Thus $NX_N$ has a binomial distribution;
$$\mathbb{P}\{NX_N = k\} = \binom{N}{k}p^k(1-p)^{N-k}$$
for all $k\in\{0,1,\dots,N\}$. The second key realization is that we can use Stirling's formula to approximate $n!$ for large $n$; recall that $n!\approx(n/e)^n\sqrt{2\pi n}$ for large $n$ (more precisely, the ratio of the two tends to $1$ as $n\to\infty$). Thus we have the following string of approximate equalities: for any $\alpha\in(0,1)$,
$$\mathbb{P}\{X_N\approx\alpha\} \approx \frac{N!}{(N\alpha)!(N-N\alpha)!}p^{N\alpha}(1-p)^{N-N\alpha} \approx \frac{(N/e)^N}{(N\alpha/e)^{N\alpha}(N(1-\alpha)/e)^{N(1-\alpha)}}p^{N\alpha}(1-p)^{N(1-\alpha)}\times\frac{\sqrt{2\pi N}}{\sqrt{2\pi N\alpha}\sqrt{2\pi N(1-\alpha)}}$$
$$(5)\qquad = \left(\frac{p}{\alpha}\right)^{N\alpha}\left(\frac{1-p}{1-\alpha}\right)^{N(1-\alpha)}\frac{1}{\sqrt{2\pi N\alpha(1-\alpha)}} = \frac{1}{\sqrt{2\pi N\alpha(1-\alpha)}}\exp\left[N\left\{\alpha\ln\frac{p}{\alpha} + (1-\alpha)\ln\frac{1-p}{1-\alpha}\right\}\right] = \frac{1}{\sqrt{2\pi N\alpha(1-\alpha)}}\exp\left[-NH(\alpha,p)\right]$$
where
$$H(\alpha,p) \overset{\text{def}}{=} \alpha\ln\frac{\alpha}{p} + (1-\alpha)\ln\frac{1-\alpha}{1-p}.$$
This is of course the famous relative entropy. Specifically, it is the relative entropy of a coin with bias $\alpha$ with respect to the true coin. Thus for any $A\subseteq[0,1]$, we should heuristically have that
$$\mathbb{P}\{X_N\in A\} \approx \sum_{\alpha\in A}\mathbb{P}\{X_N\approx\alpha\} \approx \sum_{\alpha\in A}\frac{1}{\sqrt{2\pi N\alpha(1-\alpha)}}\exp\left[-NH(\alpha,p)\right] \asymp \exp\left[-N\inf_{\alpha\in A}H(\alpha,p)\right].$$
The $1/\sqrt{N}$ terms in (5) should of course be negligible at the exponential rate of interest.
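We can test the exponential rate numerically (a sketch, not part of the original text): computing the binomial probability through `math.lgamma` avoids overflow, and $-\frac1N\ln \mathbb{P}\{NX_N = \lfloor N\alpha\rfloor\}$ should be close to $H(\alpha,p)$.

```python
import math

def H(alpha, p):
    # relative entropy of a coin with bias alpha w.r.t. one with bias p
    return alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))

def log_binom_prob(N, k, p):
    # ln P{N X_N = k} = ln C(N, k) + k ln p + (N - k) ln(1 - p), via lgamma
    return (math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            + k * math.log(p) + (N - k) * math.log(1 - p))

p, alpha, N = 0.5, 0.7, 10_000
rate = -log_binom_prob(N, int(alpha * N), p) / N
print(rate, H(alpha, p))  # the two nearly agree for large N
```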

1. The Main Ideas and Examples

Large deviations theory provides a common framework behind both Laplace asymptotics and the calculations for i.i.d. coin flips. Note that Laplace asymptotics primarily applies to continuous random variables, while the coin flips are discrete. We can also handle infinite-dimensional random variables. There are three basic questions we would like to address.
(1) What is a large deviations principle? What is the "correct" way to formulate a study of rare events?
(2) Is there a unifying theory for studying the origination of rare events?
(3) How do rare events transform?

We will give the definition of a large deviations principle here. We will also motivate the answer to the second question; the formal result is the Gärtner-Ellis theorem and will, due to its technicalities, be addressed in the next chapter. The last issue is essentially the "contraction principle" of Varadhan, which we will also address here.

1.1. Definition of an LDP. Our basic setup is as follows. We have a collection {Xε}ε∈(0,1) of random variables which take values in some Polish space X. We will see that the following is the “correct” way to think about exponentially rare events.

Definition 1.1. We say that $\{X_\varepsilon\}_{\varepsilon\in(0,1)}$ has a large deviations principle with rate function $I:X\to[0,\infty]$ if
(1) For each $s\ge0$, $\Phi(s) \overset{\text{def}}{=} \{x\in X : I(x)\le s\}$ is compact.
(2) For each closed subset $F$ of $X$,

$$\varlimsup_{\varepsilon\to0}\varepsilon\ln\mathbb{P}\{X_\varepsilon\in F\} \le -\inf_{x\in F}I(x).$$
(3) For each open subset $G$ of $X$,

$$\varliminf_{\varepsilon\to0}\varepsilon\ln\mathbb{P}\{X_\varepsilon\in G\} \ge -\inf_{x\in G}I(x).$$
We will see why this is correct in a moment. It turns out, however, that there is an equivalent definition of a large deviations principle. We here let $d$ be the metric on $X$.

Definition 1.2. We say that $\{X_\varepsilon\}_{\varepsilon\in(0,1)}$ has a large deviations principle with rate function $I:X\to[0,\infty]$ if
(1) For each $s\ge0$, $\Phi(s) \overset{\text{def}}{=} \{x\in X : I(x)\le s\}$ is compact.
(2) For each $s\ge0$ and each $\delta>0$,

$$\varlimsup_{\varepsilon\to0}\varepsilon\ln\mathbb{P}\{d(X_\varepsilon,\Phi(s))\ge\delta\} \le -s.$$
(3) For every $x^*\in X$ and every $\delta>0$,
$$\varliminf_{\varepsilon\to0}\varepsilon\ln\mathbb{P}\{d(X_\varepsilon,x^*)<\delta\} \ge -I(x^*).$$

1.2. How to get at an LDP. Let's return to the framework of Laplace asymptotics. Suppose that $\{X_\varepsilon\}_{\varepsilon\in(0,1)}$ is a collection of $\mathbb{R}$-valued random variables such that, for some $f\in C^2[0,1]$,
$$(6)\qquad \mathbb{P}\{X_\varepsilon\in A\} = c_\varepsilon\int_{x\in A}\exp\left[-\frac{1}{\varepsilon}f(x)\right]dx$$
for all $A\in\mathscr{B}[0,1]$, where
$$c_\varepsilon \overset{\text{def}}{=} \left\{\int_{x=0}^{1}\exp\left[-\frac{1}{\varepsilon}f(x)\right]dx\right\}^{-1},$$
with $\lim_{\varepsilon\to0}\varepsilon\ln c_\varepsilon = 0$.

Lemma 1.1. For any $A\in\mathscr{B}[0,1]$,

$$\varlimsup_{\varepsilon\to0}\varepsilon\ln\mathbb{P}\{X_\varepsilon\in A\} \le -\inf_{x\in\bar A}f(x), \qquad \varliminf_{\varepsilon\to0}\varepsilon\ln\mathbb{P}\{X_\varepsilon\in A\} \ge -\inf_{x\in A^\circ}f(x).$$
Proof. Let $L$ be Lebesgue measure on $([0,1],\mathscr{B}[0,1])$. The upper bound is obvious;
$$\mathbb{P}\{X_\varepsilon\in A\} \le c_\varepsilon L(A)\exp\left[-\frac{1}{\varepsilon}\inf_{x\in A}f(x)\right] \le c_\varepsilon\exp\left[-\frac{1}{\varepsilon}\inf_{x\in\bar A}f(x)\right].$$
To get the lower bound, fix $x^*\in A^\circ$ and $\delta>0$. Define
$$O \overset{\text{def}}{=} \{x\in[0,1] : f(x) < f(x^*)+\delta\}\cap A^\circ.$$
Since $f$ is continuous, $O$ is open (in the topology $[0,1]$ inherits from $\mathbb{R}$) and contains $x^*$, thus $L(O)>0$. Hence
$$\mathbb{P}\{X_\varepsilon\in A\} \ge c_\varepsilon L(O)\exp\left[-\frac{1}{\varepsilon}\{f(x^*)+\delta\}\right].$$

This gives the lower bound by taking $\varepsilon\to0$, then $\delta\to0$, and then varying $x^*$ over $A^\circ$. □
The upper bound was easy and global; the lower bound required a localization (and the fact that $L(O)>0$). While the two bounds may agree (for instance when $\inf_{x\in\bar A}f(x) = \inf_{x\in A^\circ}f(x)$), they will not agree in general. For example, if $A$ is a single point or a countable dense set, they will disagree. These "technicalities" become even more pronounced in infinite-dimensional spaces.
Is there a robust way to recover $f$ even if we don't start from Laplace asymptotics and where we may not originally know the exact distribution of $X_\varepsilon$ (recall that in (4), we only knew asymptotics of the distribution of $X_N$)? Our interest is to come up with Laplace-type asymptotics when we don't have (6) as a starting point. To see how things work, let's compute asymptotics of the moment generating functions of $X_\varepsilon$. Laplace asymptotics tells us that
$$(7)\qquad \lim_{\varepsilon\to0}\varepsilon\ln E\left[\exp\left[\frac{\theta}{\varepsilon}X_\varepsilon\right]\right] = \lim_{\varepsilon\to0}\varepsilon\ln\left\{c_\varepsilon\int_{x=0}^{1}\exp\left[\frac{1}{\varepsilon}\{\theta x - f(x)\}\right]dx\right\} = \Lambda(\theta)$$
where
$$\Lambda(\theta) \overset{\text{def}}{=} \sup_{x\in[0,1]}\{\theta x - f(x)\}.$$
The key observation we wish to exploit is that $\Lambda$ is almost the Legendre-Fenchel transform of $f$. More exactly, it is exactly the Legendre-Fenchel transform of the function
$$f^\infty(x) \overset{\text{def}}{=} \begin{cases}f(x) & \text{if }x\in[0,1]\\ \infty & \text{else.}\end{cases}$$

Definition 1.3. Let X be a vector space with dual X∗. Fix f : X → [−∞, ∞]. The Legendre-Fenchel transform of f is the function

$$f^*(x^*) \overset{\text{def}}{=} \sup_{x\in X}\left\{\langle x,x^*\rangle_X - f(x)\right\}, \qquad x^*\in X^*.$$
The bidual of $f$ is the function

$$f^{**}(x) \overset{\text{def}}{=} \sup_{x^*\in X^*}\left\{\langle x,x^*\rangle_X - f^*(x^*)\right\}, \qquad x\in X.$$
The deus ex machina which we shall exploit (and in a moment prove) is that if $f$ is "nice", then $f = f^{**}$. The definition of "nice" involves some more definitions.

Definition 1.4. Fix a space X and a map f : X → [−∞, ∞]. Define

$$\mathrm{epi}(f) \overset{\text{def}}{=} \{(x,\alpha)\in X\times\mathbb{R} : \alpha\ge f(x)\} \subset X\times\mathbb{R};$$
this is the epigraph of $f$.
• If $X$ is a vector space, we say that $f$ is convex if $\mathrm{epi}(f)$ is a convex subset of $X\times\mathbb{R}$.
• If $X$ is a topological space, we say that $f$ is lower semicontinuous if $\mathrm{epi}(f)$ is a closed subset of $X\times\mathbb{R}$.
We then have

Lemma 1.2. Suppose that $X$ is a topological vector space. If $f:X\to(-\infty,\infty]$ is convex and lower semicontinuous, then $f = f^{**}$.
We will prove this in the following subsection. In the world of (6), if we know that $f$ is convex and lower semicontinuous, we can reconstruct it from the asymptotics of (7) via the formula
$$f(x) = \sup_{\theta\in\mathbb{R}}\{\theta x - \Lambda(\theta)\}.$$
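A numerical illustration of the bidual (a sketch: the supremum is truncated to a finite grid on $[-5,5]$, and $f(x)=x^2/2$ is an illustrative convex choice, not from the text): for convex lower-semicontinuous $f$, the discrete Legendre-Fenchel transform applied twice recovers $f$ on the grid.

```python
def legendre(vals, xs, thetas):
    # discrete Legendre-Fenchel transform: f*(theta) = sup_x {theta*x - f(x)},
    # with the sup taken over the grid xs instead of all of R
    return [max(t * x - v for x, v in zip(xs, vals)) for t in thetas]

n = 401
xs = [-5 + 10 * i / (n - 1) for i in range(n)]
f = [x * x / 2 for x in xs]          # f(x) = x^2/2 is its own transform
fstar = legendre(f, xs, xs)          # ~ theta^2/2
fstarstar = legendre(fstar, xs, xs)  # bidual: recovers f, since f is convex lsc
print(max(abs(a - b) for a, b in zip(f, fstarstar)))  # essentially zero
```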

1.3. Some Convex Analysis. Let's first understand what the Legendre-Fenchel transform is, at least for a function $f:\mathbb{R}\to\mathbb{R}$. Fix $\theta\in\mathbb{R}$. By definition, $f^*(\theta)$ is the smallest constant $c$ such that $c\ge\theta x - f(x)$ for all $x\in\mathbb{R}$; i.e., the smallest constant $c$ such that $\theta x - c\le f(x)$. If we define $\ell(x) \overset{\text{def}}{=} \theta x - c$ for all $x\in\mathbb{R}$, then $\ell$ is a linear minorant of $f$, and $-c = \ell(0)$ (i.e., the $y$-intercept of $\ell$), thus $c = -\ell(0)$. Hence
$$f^*(\theta) = \inf\{-\ell(0) : \ell\text{ is a linear minorant of }f\text{ with slope }\theta\} = -\sup\{\ell(0) : \ell\text{ is a linear minorant of }f\text{ with slope }\theta\}.$$
Suppose now that $X$ is a topological vector space. Suppose that $f:X\to(-\infty,\infty]$ is convex. Define $\mathrm{dom}(f) \overset{\text{def}}{=} \{x\in X : f(x)<\infty\}$. If a point is not in $\mathrm{epi}(f)$, then we can use the Hahn-Banach theorem to separate it from $\mathrm{epi}(f)$. This leads to the following.
Lemma 1.3. Suppose that $X$ is a topological vector space and that $f:X\to(-\infty,\infty]$ is convex and $f\not\equiv\infty$. Fix $(x,\alpha)\not\in\mathrm{epi}(f)$. Then there is an $x^*\in X^*$, a $c\in\mathbb{R}$, and a $\lambda\ge0$ such that
$$(8)\qquad \langle x,x^*\rangle_X - \lambda\alpha > c, \qquad \langle z,x^*\rangle_X - \lambda f(z) < c \quad\text{for }z\in\mathrm{dom}(f).$$
If $f(x)<\infty$, then $\lambda>0$.

Proof. By the Hahn-Banach theorem, there is a $(x^*,\lambda)\in X^*\times\mathbb{R}$ and a $c\in\mathbb{R}$ such that
$$(9)\qquad \langle x,x^*\rangle_X - \lambda\alpha > c, \qquad \langle y,x^*\rangle_X - \lambda\beta < c \quad\text{for }(y,\beta)\in\mathrm{epi}(f).$$
Since $f\not\equiv\infty$, there is an $\tilde x\in X$ such that $f(\tilde x)<\infty$. From the second inequality, we thus get that
$$\lambda\beta > \langle\tilde x,x^*\rangle_X - c$$
for all $\beta>f(\tilde x)$. Letting $\beta\nearrow\infty$, we arrive at a contradiction if $\lambda<0$; thus indeed $\lambda\ge0$. Since $(z,f(z))\in\mathrm{epi}(f)$ for any $z\in\mathrm{dom}(f)$, we get (8). If $\lambda=0$ and $x\in\mathrm{dom}(f)$, then
$$\langle x,x^*\rangle_X > c > \langle x,x^*\rangle_X,$$
which is a contradiction. Thus if $f(x)<\infty$, we must have $\lambda>0$. □
We can in fact do a bit better.
Corollary 1.4. Under the same assumptions as Lemma 1.3, we can take $\lambda>0$.
Proof. We start with (8). If $\lambda>0$, we are done. We thus henceforth assume that $\lambda=0$. By assumption there is an $\tilde x\in X$ with $f(\tilde x)\in\mathbb{R}$. Taking $\tilde\alpha<f(\tilde x)$, we can apply Lemma 1.3 to $(\tilde x,\tilde\alpha)$. We get an $\tilde x^*\in X^*$, $\tilde c\in\mathbb{R}$, and $\tilde\lambda>0$ such that
$$\langle\tilde x,\tilde x^*\rangle_X - \tilde\lambda\tilde\alpha > \tilde c, \qquad \langle z,\tilde x^*\rangle_X - \tilde\lambda f(z) < \tilde c \quad\text{for }z\in\mathrm{dom}(f).$$
Let's now take a convex combination of this and (8). Fix $\eta\in(0,1)$. Then
$$\langle z,\eta x^* + (1-\eta)\tilde x^*\rangle_X - (1-\eta)\tilde\lambda f(z) < \eta c + (1-\eta)\tilde c \quad\text{for }z\in\mathrm{dom}(f).$$
The claim will follow if we can find $\eta\in(0,1)$ such that
$$\langle x,\eta x^* + (1-\eta)\tilde x^*\rangle_X - (1-\eta)\tilde\lambda\alpha > \eta c + (1-\eta)\tilde c.$$
But this is obvious;
$$\lim_{\eta\nearrow1}\left[\left\{\langle x,\eta x^* + (1-\eta)\tilde x^*\rangle_X - (1-\eta)\tilde\lambda\alpha\right\} - \left\{\eta c + (1-\eta)\tilde c\right\}\right] = \langle x,x^*\rangle_X - c > 0. \qquad\Box$$
We finally state the separation theorem we will use.
Corollary 1.5. Under the same assumptions as Lemma 1.3, we can take $\lambda=1$.

Proof. Using Corollary 1.4, we can take $\lambda>0$ in (8); dividing both inequalities of (8) by $\lambda$ then gives the claim. □
Proof of Lemma 1.2. First, if $f\equiv\infty$, then $f^*\equiv-\infty$ and thus $f^{**}\equiv\infty = f$. We thus now assume that $f$ is not identically $\infty$. Fix $x\in X$. Note that
$$f^{**}(x) = \sup_{x^*\in X^*}\inf_{x'\in X}\left\{\langle x-x',x^*\rangle_X + f(x')\right\} \le \sup_{x^*\in X^*}\left\{\langle x-x,x^*\rangle_X + f(x)\right\} = f(x).$$
The reverse inequality requires a bit more work. Fix $x\in X$ and $\alpha<f(x)$. Apply Corollary 1.5; there are an $x^*\in X^*$ and a $c\in\mathbb{R}$ such that
$$\langle x,x^*\rangle_X - \alpha > c, \qquad \langle z,x^*\rangle_X - f(z) < c \quad\text{for }z\in\mathrm{dom}(f).$$
Then $f^*(x^*)\le c$ and thus $\langle x,x^*\rangle_X - \alpha > f^*(x^*)$. Thus
$$\alpha < \langle x,x^*\rangle_X - f^*(x^*) \le f^{**}(x).$$
Since $\alpha<f(x)$ was arbitrary, we get $f(x)\le f^{**}(x)$. □

2. The Contraction Principle

We next identify how rare events transform.

Theorem 2.1 (Contraction Theorem). Suppose that $\{X_\varepsilon\}_{\varepsilon>0}$ is a collection of random variables which take values in a Polish space $X$ and which has a large deviations principle with rate function $I$. Let $Y$ be a second Polish space and let $f:X\to Y$ be continuous. Define $Y_\varepsilon \overset{\text{def}}{=} f(X_\varepsilon)$ for all $\varepsilon>0$. Then $\{Y_\varepsilon\}_{\varepsilon>0}$ has a large deviations principle with rate function
$$\tilde I(y) \overset{\text{def}}{=} \inf_{\substack{x\in X\\ f(x)=y}}I(x).$$
Proof. Note that
$$(10)\qquad \{y\in Y : \tilde I(y)\le s\} = f\left(\{x\in X : I(x)\le s\}\right);$$
forward images of compact sets are compact. To get the rest of the proof, note that for any measurable subset $A$ of $Y$,
$$\mathbb{P}\{Y_\varepsilon\in A\} = \mathbb{P}\{X_\varepsilon\in f^{-1}(A)\}$$
and that
$$(11)\qquad \inf_{x\in f^{-1}(A)}I(x) = \inf_{y\in A}\tilde I(y). \qquad\Box$$
The point of this is the following. Often one tries to find a large deviations principle for a "fundamental" problem and then pass it through a system. Another use is that in some cases a large deviations principle can be related to another large deviations principle "upstairs".
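A small numerical sketch of the contraction principle (the rate function $I(x)=x^2/2$ and the map $f(x)=x^2$ are illustrative choices, not from the text): approximating the infimum over the fiber $f^{-1}(y)$ on a grid reproduces $\tilde I(y)=y/2$ for $y\ge0$.

```python
import math

def contracted_rate(I, f, xs, y, width=1e-2):
    # I~(y) ~= min{ I(x) : |f(x) - y| < width }, a grid discretization
    # of the exact infimum over the fiber f^{-1}(y)
    vals = [I(x) for x in xs if abs(f(x) - y) < width]
    return min(vals) if vals else math.inf

I = lambda x: x * x / 2            # a Gaussian-type rate function
f = lambda x: x * x                # the continuous map
xs = [i / 1000 for i in range(-3000, 3001)]  # grid on [-3, 3]
for y in (0.5, 1.0, 2.0):
    print(y, contracted_rate(I, f, xs, y))   # close to y/2
```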

Example 2.2. Let $\{\xi_n\}_{n\in\mathbb{N}}$ be an i.i.d. collection of random variables with common law $\mu$ and which take values in some Polish space $X$. For each $N\in\mathbb{N}$, define
$$\mu_N = \frac{1}{N}\sum_{n=1}^{N}\delta_{\xi_n};$$
this is a random element of $\mathscr{P}(X)$. Later on, we will see that $\{\mu_N\}_{N\in\mathbb{N}}$ has a large deviations principle with rate function
$$I^{(1)}(\nu) = \begin{cases}\int_{x\in X}\ln\frac{d\nu}{d\mu}(x)\,\nu(dx) & \text{if }\nu\ll\mu\\ \infty & \text{else}\end{cases}$$
(this is relative entropy). Assume now that $X = \mathbb{R}^d$ and furthermore that
$$\int_{x\in\mathbb{R}^d}\exp\left[\langle\theta,x\rangle_{\mathbb{R}^d}\right]\mu(dx) < \infty$$
for $\theta$ in a small neighborhood of the origin. Define now

$$S_N \overset{\text{def}}{=} \frac{1}{N}\sum_{n=1}^{N}\xi_n = \int_{x\in\mathbb{R}^d}x\,\mu_N(dx);$$
these are random elements of $\mathbb{R}^d$. We will also see later that $S_N$ has a large deviations principle with rate function
$$I^{(2)}(x) \overset{\text{def}}{=} \sup_{\theta\in\mathbb{R}^d}\left\{\langle\theta,x\rangle_{\mathbb{R}^d} - \ln\int_{x'\in\mathbb{R}^d}\exp\left[\langle\theta,x'\rangle_{\mathbb{R}^d}\right]\mu(dx')\right\}.$$
Clearly these two should be related.

Example 2.3. Let $W$ be a standard Brownian motion. For each $\varepsilon\in(0,1)$, define $X_\varepsilon(t) \overset{\text{def}}{=} \sqrt\varepsilon W_t$. Then $X_\varepsilon$ is a $C[0,1]$-valued random variable. We will later see that $X_\varepsilon$ has a large deviations principle with rate function
$$I(\varphi) = \begin{cases}\frac12\int_{s=0}^{1}(\dot\varphi(s))^2\,ds & \text{if }\dot\varphi\in L^2\\ \infty & \text{else.}\end{cases}$$

Fix now $b:\mathbb{R}\to\mathbb{R}$ which is Lipschitz-continuous and a $y_\circ\in\mathbb{R}$. Then we can solve the SDE
$$(12)\qquad dY^\varepsilon_t = b(Y^\varepsilon_t)\,dt + \sqrt\varepsilon\,dW_t, \quad t\in[0,1]; \qquad Y^\varepsilon_0 = y_\circ.$$
It turns out that we can write $Y^\varepsilon$ as a nonlinear transformation of $X_\varepsilon$, so we can immediately get a large deviations principle for $Y^\varepsilon$.
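A simulation sketch of the small-noise limit in (12) (the drift $b(y)=-y$, the start $y_\circ=1$, and the Euler-Maruyama discretization are illustrative choices, not from the text): as $\varepsilon\searrow0$ the paths of $Y^\varepsilon$ concentrate near the zero-noise ODE solution, and the LDP quantifies how rare the deviations are.

```python
import math, random

def euler_maruyama(b, y0, eps, n=1000, seed=0):
    # Euler-Maruyama scheme for dY = b(Y) dt + sqrt(eps) dW on [0, 1]
    rng = random.Random(seed)
    dt, y, path = 1.0 / n, y0, [y0]
    for _ in range(n):
        y += b(y) * dt + math.sqrt(eps * dt) * rng.gauss(0.0, 1.0)
        path.append(y)
    return path

path = euler_maruyama(lambda y: -y, 1.0, eps=1e-6)
ode = [math.exp(-i / 1000) for i in range(1001)]   # zero-noise solution e^{-t}
print(max(abs(a - c) for a, c in zip(path, ode)))  # small: the path hugs the ODE
```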

Exercises

(1) Let $I_\varepsilon \overset{\text{def}}{=} \varepsilon^{-5}e^{-3/\varepsilon} + \varepsilon^{4}e^{-7/\varepsilon}$. Compute $\lim_{\varepsilon\to0}\varepsilon\ln I_\varepsilon$.
(2) Let $X$ be geometric with parameter $p\in(0,1)$; i.e.,
$$\mathbb{P}\{X=k\} = \begin{cases}(1-p)^{k-1}p & \text{for }k\in\mathbb{N}\\ 0 & \text{else.}\end{cases}$$

Compute $\lim_{N\to\infty}\frac{1}{N}\ln\mathbb{P}\{X\ge N\}$.
(3) Let $X$ be negative binomial $(2,p)$ (for some $p\in(0,1)$); i.e., $X$ is the position of the second heads in an i.i.d. sequence of coin flips where each coin has probability $p$ of being heads. More precisely,
$$\mathbb{P}\{X=k\} = \begin{cases}(k-1)p^2(1-p)^{k-2} & \text{for }k\in\{2,3,\dots\}\\ 0 & \text{else.}\end{cases}$$

Compute $\lim_{N\to\infty}\frac{1}{N}\ln\mathbb{P}\{X\ge N\}$.
(4) Define $I_N \overset{\text{def}}{=} N^{3/5} + N^{7/3}$ for all $N\in\mathbb{N}$. Find a nondecreasing function $a:\mathbb{N}\to(0,\infty)$ such that $\lim_{N\to\infty}a(N)=\infty$ and such that
$$\lim_{N\to\infty}\frac{1}{a(N)}\ln I_N$$
exists and is nontrivial. In other words, polynomials can also fit into our framework.
(5) Prove (3).
(6) Prove Laplace asymptotics if $x^* = 0$ and $\dot f(x^*)>0$; i.e., the minimizer is at an endpoint.
(7) Prove Laplace asymptotics if $x^*\in(0,1)$ but $\ddot f(x^*)=0$ and $f^{(4)}(x^*)>0$.
(8) Show that
$$\int_{z=x}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz \le \frac12 e^{-x^2/2}$$
for all $x>0$.

(9) Suppose that $\phi:\mathbb{R}\to\mathbb{R}$ is a measurable function such that
$$\int_{x\in\mathbb{R}}e^{-\phi(x)}\,dx < \infty.$$
Suppose further that there is a $\underline\phi>0$ such that $\phi\ge\underline\phi$ Lebesgue-almost everywhere. Show that
$$\varlimsup_{\varepsilon\to0}\varepsilon\ln\int_{x\in\mathbb{R}}\exp\left[-\frac{1}{\varepsilon}\phi(x)\right]dx \le -\underline\phi.$$
This calculation is relevant for localizing the region of integration to a compact set.
(10) Suppose that $\{X_\varepsilon\}_{\varepsilon\in(0,1)}$ is a collection of random variables such that, for some $f\in C^2[0,1]$,
$$\mathbb{P}\{X_\varepsilon\in A\} = c_\varepsilon\int_{x\in A}\exp\left[-\frac{1}{\varepsilon}f(x)\right]dx$$
for all $A\in\mathscr{B}[0,1]$, where
$$c_\varepsilon \overset{\text{def}}{=} \left\{\int_{x=0}^{1}\exp\left[-\frac{1}{\varepsilon}f(x)\right]dx\right\}^{-1}.$$
Assume that $\lim_{\varepsilon\to0}\varepsilon\ln c_\varepsilon = 0$.
(a) Show that $\inf_{x\in[0,1]}f(x) = 0$.
(b) Assuming that $f$ has a unique minimizer $x^*$, show that $\lim_{\varepsilon\to0}X_\varepsilon = x^*$ in probability.
(11) Let $g:\mathbb{R}_+\to\mathbb{R}_+$ be a Lebesgue-integrable function such that $\lim_{r\to\infty}r^\alpha g(r) = c$ for some $\alpha>0$ and some positive constant $c>0$. Define
$$f(x) = \int_{r=0}^{\infty}\exp\left[-\frac{x^2}{2r}\right]g(r)\,dr, \qquad x>0;$$
find the behavior of $f(x)$ for $x$ large. The idea is to get "fat tails" via a "stochastic volatility".
(12) Suppose that $X_\varepsilon$ is Gaussian with mean $0$ and variance $\varepsilon^2\sigma^2$, where $\sigma$ is some fixed nonzero parameter.
(a) Write the density of $X_\varepsilon$.
(b) Compute $E[\exp[\theta X_\varepsilon]]$ for all $\theta\in\mathbb{R}$.
(c) For each $\theta\in\mathbb{R}$, compute $\Lambda(\theta) \overset{\text{def}}{=} \lim_{\varepsilon\to0}\varepsilon^2\ln E\left[\exp\left[\frac{1}{\varepsilon^2}\theta X_\varepsilon\right]\right]$.
(d) Compute $\Lambda^*(x) \overset{\text{def}}{=} \sup_{\theta\in\mathbb{R}}\{\theta x - \Lambda(\theta)\}$.
(13) Fix $f:\mathbb{R}^n\to\mathbb{R}$. Define
$$f^*(\theta) \overset{\text{def}}{=} \sup_{x\in\mathbb{R}^n}\left\{\langle\theta,x\rangle_{\mathbb{R}^n} - f(x)\right\}$$
for all $\theta\in\mathbb{R}^n$. Fix scalars $a$ and $b$ in $\mathbb{R}$ and a vector $c\in\mathbb{R}^n$. Define
$$\tilde f(x) = af(b(x-c)), \qquad x\in\mathbb{R}^n.$$
Compute $\tilde f^*$ in terms of $f^*$.
(14) Let $X_N$ be exponential with parameter $N\lambda$; i.e., $\mathbb{P}\{X_N\ge t\} = e^{-\lambda Nt}$ for all $t\ge0$.
(a) Show that $X_N\to0$ in probability.
(b) Compute $\Lambda(\theta) \overset{\text{def}}{=} \lim_{N\to\infty}\frac{1}{N}\ln E[\exp[N\theta X_N]]$ for all $\theta\in\mathbb{R}$ (hint: the answer is very degenerate).
(c) Compute $I(x) = \sup_{\theta\in\mathbb{R}}\{\theta x - \Lambda(\theta)\}$.
(15) Fix constants $a$, $b$, and $c$ in $\mathbb{R}$. Compute $f^*$ for the following functions:
(a) when $a<b$,
$$f(x) = \begin{cases}c & \text{if }a<x<b\\ \infty & \text{else}\end{cases}$$
(b) when $a$ and $b$ are nonnegative,
$$f(x) = \begin{cases}ax & \text{if }x>0\\ -bx & \text{if }x\le0\end{cases}$$
(16) Let $X$ be uniformly distributed on $(0,1)$.

(a) For each $\theta\in\mathbb{R}$, compute
$$\Lambda(\theta) \overset{\text{def}}{=} \lim_{N\to\infty}\frac{1}{N}\ln E[\exp[N\theta X]].$$
(b) Compute $I(x) \overset{\text{def}}{=} \sup_{\theta\in\mathbb{R}}\{x\theta - \Lambda(\theta)\}$.
(17) Consider the coin flips of (4).
(a) Show that $\lim_{N\to\infty}X_N = p$.
(b) Compute $\Lambda(\theta) \overset{\text{def}}{=} \ln E[\exp[\theta\xi_n]]$ for all $\theta\in\mathbb{R}$.
(c) Compute $I(x) \overset{\text{def}}{=} \sup_{\theta\in\mathbb{R}}\{\theta x - \Lambda(\theta)\}$ for all $x\in\mathbb{R}$.
(d) Show that $p\mapsto H(p,\alpha)$ is increasing on $[\alpha,1)$ and decreasing on $(0,\alpha)$.
(18) Show that Definitions 1.1 and 1.2 are equivalent.
(19) Fix a space $V$ and a map $f:V\to(-\infty,\infty]$.
(a) Assume that $V$ is a vector space. Show that $f$ is convex if and only if $f(\lambda x + (1-\lambda)y)\le\lambda f(x) + (1-\lambda)f(y)$ for all $x$ and $y$ in $V$ and $\lambda\in[0,1]$.
(b) Assume that $V$ is a topological space. Show that $f$ is lower semicontinuous if and only if $\{v\in V : f(v)\le s\}$ is closed for all $s\in\mathbb{R}$.
(c) Assume that $V$ is a topological space. Show that $f$ is lower semicontinuous if and only if $f(x)\le\varliminf_{n\to\infty}f(x_n)$ whenever $x = \lim_{n\to\infty}x_n$.
(20) Suppose that $f:V\to(-\infty,\infty]$, where $V$ is a topological vector space. Show that
$$f^{**}(x) = \sup\left\{\tilde f(x) : \tilde f:V\to(-\infty,\infty]\text{ is convex, lower semicontinuous, and }\tilde f\le f\text{ everywhere}\right\}.$$

(21) Fix an $\mathbb{R}^d$-valued random variable $X$. Show that
$$\Lambda(\theta) \overset{\text{def}}{=} \ln E\left[\exp\left[\langle\theta,X\rangle_{\mathbb{R}^d}\right]\right], \qquad \theta\in\mathbb{R}^d,$$
is a convex function.
(22) Let $f$ be a function on a vector space $V$.
(a) Show that $f^*$ is convex.
(b) Show that $f^*$ is lower semicontinuous (hint: write $\{v^*\in V^* : f^*(v^*)>s\}$ as a union of open half-spaces for each $s\in\mathbb{R}$).
(23) Suppose that $\{f_n\}_{n\in\mathbb{N}}$ is a collection of convex functions on a vector space $V$. Suppose that $f \overset{\text{def}}{=} \lim_{n\to\infty}f_n$ exists pointwise. Show that $f$ is convex.
(24) Prove (10) and (11). One direction of (10) is very easy. The other direction requires some compactness.
(25) Suppose that $\{X^1_\varepsilon\}_{\varepsilon>0}$ and $\{X^2_\varepsilon\}_{\varepsilon>0}$ are, respectively, collections of $\mathbb{R}^{d_1}$-valued and $\mathbb{R}^{d_2}$-valued random variables with large deviations principles with respective rate functions $I_1$ and $I_2$. Assume that $X^1_\varepsilon$ and $X^2_\varepsilon$ are independent for each $\varepsilon>0$. Define $Y_\varepsilon \overset{\text{def}}{=} (X^1_\varepsilon, X^2_\varepsilon)$ (i.e., $Y_\varepsilon$ is $\mathbb{R}^{d_1+d_2}$-valued). Show that $\{Y_\varepsilon\}_{\varepsilon>0}$ has a large deviations principle and find its rate function.
(26) For Example 2.2, show that
$$I^{(2)}(x) = \inf\left\{I^{(1)}(\nu) : \int_{x'\in\mathbb{R}^d}x'\,\nu(dx') = x\right\}.$$
(27) Consider Example 2.3. For each $\varphi\in C[0,1]$, let $T(\varphi)\in C[0,1]$ be the solution of the integral equation
$$(T(\varphi))(t) = y_\circ + \int_{s=0}^{t}b((T(\varphi))(s))\,ds + \varphi(t), \qquad t\in[0,1].$$
(a) Show that $T$ is a Lipschitz-continuous map from $C[0,1]$ to itself.
(b) Find a stochastic differential equation for $T(X_\varepsilon)$.
(c) Find a large deviations principle for $T(X_\varepsilon)$. Be as explicit as possible.

CHAPTER 2

The Gärtner-Ellis Theorem

Our goal here is to formalize the calculations of Subsection 1.2. This will involve a number of important but disparate thoughts, so we will only connect things together at the end.
For reasons which will be clear a bit later, let's set things up in a canonical way and have the measures, rather than the random variables, depend on $\varepsilon$. Our measurable space will be $(\mathbb{R}^d,\mathscr{B}(\mathbb{R}^d))$, and we will define $X(\omega) \overset{\text{def}}{=} \omega$ for all $\omega\in\mathbb{R}^d$. Fix $\{P_\varepsilon\}_{\varepsilon>0}\subset\mathscr{P}(\mathbb{R}^d)$, the collection of probability measures on $(\mathbb{R}^d,\mathscr{B}(\mathbb{R}^d))$. For each $\varepsilon>0$, let $E_\varepsilon$ be the associated expectation operator. For each $\theta\in\mathbb{R}^d$ and $\varepsilon>0$, define
$$(13)\qquad \Lambda_\varepsilon(\theta) \overset{\text{def}}{=} \varepsilon\ln E_\varepsilon\left[\exp\left[\frac{1}{\varepsilon}\langle\theta,X\rangle_{\mathbb{R}^d}\right]\right].$$

Suppose that $\Lambda(\theta) \overset{\text{def}}{=} \lim_{\varepsilon\searrow0}\Lambda_\varepsilon(\theta)$ exists for all $\theta\in\mathbb{R}^d$. We define
$$(14)\qquad I(x) \overset{\text{def}}{=} \sup_{\theta\in\mathbb{R}^d}\left\{\langle\theta,x\rangle_{\mathbb{R}^d} - \Lambda(\theta)\right\}, \qquad x\in\mathbb{R}^d.$$

0.1. Structure of I and upper bound. Let's start with deterministic calculations.

Lemma 0.4. The function $I$ is lower semicontinuous.

Proof. Assume that $x = \lim_{n\to\infty}x_n$. For any $\theta\in\mathbb{R}^d$,
$$\varliminf_{n\to\infty}I(x_n) \ge \varliminf_{n\to\infty}\left\{\langle\theta,x_n\rangle_{\mathbb{R}^d} - \Lambda(\theta)\right\} = \langle\theta,x\rangle_{\mathbb{R}^d} - \Lambda(\theta).$$
Taking the supremum over $\theta$, $I(x)\le\varliminf_{n\to\infty}I(x_n)$, giving us the claimed semicontinuity. □
Let's next prove a bound which will be central to our calculations.
Lemma 0.5 (Exponential Chebychev). Fix $\theta\in\mathbb{R}^d$ and $L\in\mathbb{R}$. Then

$$\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\left\{\langle\theta,X\rangle_{\mathbb{R}^d}\ge L\right\} \le -(L-\Lambda(\theta)).$$
Proof. We have that
$$P_\varepsilon\left\{\langle\theta,X\rangle_{\mathbb{R}^d}\ge L\right\} = P_\varepsilon\left\{\frac{1}{\varepsilon}\langle\theta,X\rangle_{\mathbb{R}^d}\ge\frac{L}{\varepsilon}\right\} \le e^{-L/\varepsilon}E_\varepsilon\left[\exp\left[\frac{1}{\varepsilon}\langle\theta,X\rangle_{\mathbb{R}^d}\right]\right] = \exp\left[-\frac{1}{\varepsilon}(L-\Lambda_\varepsilon(\theta))\right].$$
The claim follows. □
Let's next start to work on the upper large deviations bound.
Lemma 0.6. For every $K\subset\subset\mathbb{R}^d$,
$$(15)\qquad \varlimsup_{\varepsilon\to0}\varepsilon\ln P_\varepsilon\{X\in K\} \le -\inf_{x\in K}I(x).$$

Proof. First fix $s<\inf_{x\in K}I(x)$, and note that
$$K\subset\{x\in\mathbb{R}^d : I(x)>s\} = \bigcup_{\theta\in\mathbb{R}^d}\left\{x\in\mathbb{R}^d : \langle\theta,x\rangle_{\mathbb{R}^d} - \Lambda(\theta) > s\right\}.$$
Since we have thus covered $K$ by a collection of open sets, we can extract a finite subcover; there is a finite subset $\Theta$ of $\mathbb{R}^d$ such that
$$K\subset\bigcup_{\theta\in\Theta}\left\{x\in\mathbb{R}^d : \langle\theta,x\rangle_{\mathbb{R}^d} > s+\Lambda(\theta)\right\}.$$

Thus
$$P_\varepsilon\{X\in K\} \le \sum_{\theta\in\Theta}P_\varepsilon\left\{\langle\theta,X\rangle_{\mathbb{R}^d} > s+\Lambda(\theta)\right\}$$
and thus
$$\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\{X\in K\} \le \max_{\theta\in\Theta}\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\left\{\langle\theta,X\rangle_{\mathbb{R}^d} > s+\Lambda(\theta)\right\} \le -s,$$
where we have used Lemma 0.5. □
Let's now get some regularity at infinity. For each $L>0$, define

$$\bar B(L) \overset{\text{def}}{=} \{x\in\mathbb{R}^d : \|x\|\le L\}.$$
Lemma 0.7. Assume that
$$(16)\qquad 0\in\mathrm{int}(\mathrm{dom}(\Lambda))$$
(i.e., the origin is in the interior of $\mathrm{dom}(\Lambda)$). Then $\{x\in\mathbb{R}^d : I(x)\le s\}$ is bounded for all $s\ge0$. Furthermore,
$$(17)\qquad \lim_{L\to\infty}\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\{X\not\in\bar B(L)\} = -\infty.$$

Proof. To start, let $\{e_1,e_2,\dots,e_d\}$ be the standard basis vectors in $\mathbb{R}^d$. Define $E^\pm \overset{\text{def}}{=} \{\pm e_1,\pm e_2,\dots,\pm e_d\}$. By (16), there is a $\delta>0$ such that $K \overset{\text{def}}{=} \sup_{e\in E^\pm}\Lambda(\delta e)$ is finite.
If $I(x)\le s$, then $\langle\delta e,x\rangle_{\mathbb{R}^d} - \Lambda(\delta e)\le s$ for all $e\in E^\pm$; hence
$$\sup_{e\in E^\pm}\langle e,x\rangle_{\mathbb{R}^d} \le \frac{s+K}{\delta}.$$
Similarly,

$$P_\varepsilon\{\|X\|_\infty\ge L\} = P_\varepsilon\left\{\langle X,e\rangle_{\mathbb{R}^d}\ge L\text{ for some }e\in E^\pm\right\} \le \sum_{e\in E^\pm}P_\varepsilon\left\{\langle X,e\rangle_{\mathbb{R}^d}\ge L\right\} = \sum_{e\in E^\pm}P_\varepsilon\left\{\langle X,\delta e\rangle_{\mathbb{R}^d}\ge\delta L\right\}.$$
Thus, by Lemma 0.5,
$$\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\{\|X\|_\infty\ge L\} \le \max_{e\in E^\pm}\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\left\{\langle X,\delta e\rangle_{\mathbb{R}^d}\ge\delta L\right\} \le -(\delta L - K).$$
This leads to (17). □
The property (17) is called exponential tightness; here it is framed in terms of balls, which are compact in finite-dimensional spaces.
Corollary 0.8. If (16) holds, then
• For each $s\ge0$, $\{x\in\mathbb{R}^d : I(x)\le s\}\subset\subset\mathbb{R}^d$.
• For $F\subset\mathbb{R}^d$ closed,
$$\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\{X\in F\} \le -\inf_{x\in F}I(x).$$
Proof. Lemma 0.4 ensures that the level sets of $I$ are closed; Lemma 0.7 ensures that they are also bounded. Fix next $F\subset\mathbb{R}^d$ closed and $s<\inf_{x\in F}I(x)$. By Lemma 0.7, we can then find an $L>0$ such that
$$\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\{X\not\in\bar B(L)\} \le -s.$$

Also note that $\inf_{x\in F\cap\bar B(L)}I(x) > s$; since $F\cap\bar B(L)$ is compact, Lemma 0.6 applies to it. We have that
$$P_\varepsilon\{X\in F\} \le P_\varepsilon\{X\in F\cap\bar B(L)\} + P_\varepsilon\{X\not\in\bar B(L)\}.$$
Thus
$$\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\{X\in F\} \le \max\left\{\varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\{X\in F\cap\bar B(L)\},\ \varlimsup_{\varepsilon\searrow0}\varepsilon\ln P_\varepsilon\{X\not\in\bar B(L)\}\right\} \le -s. \qquad\Box$$
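A numerical illustration of the upper bound machinery (a sketch; the Gaussian family $P_\varepsilon = N(0,\varepsilon)$ is an illustrative choice, not from the text): here $\Lambda(\theta)=\theta^2/2$, so optimizing the exponential Chebychev bound of Lemma 0.5 over $\theta$ gives $\varepsilon\ln P_\varepsilon\{X\ge L\}\le -L^2/2 = -I(L)$, and the true tail matches this rate as $\varepsilon\searrow0$.

```python
import math

def eps_log_tail(eps, L):
    # eps * ln P_eps{X >= L} for P_eps = N(0, eps); Lambda(theta) = theta^2/2,
    # so the optimized Chebychev bound is -sup_theta{theta*L - theta^2/2} = -L^2/2
    p = 0.5 * math.erfc(L / math.sqrt(2 * eps))
    return eps * math.log(p)

L = 1.0
for eps in (0.1, 0.01, 0.001):
    print(eps, eps_log_tail(eps, L))  # approaches -L^2/2 = -0.5 from below
```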

0.2. Some More Convex Analysis. We now want to prove the lower bound. This will take a lot of work and require a number of concepts from convex analysis; our treatment is taken from [DZ98]. Since the $\Lambda_\varepsilon$'s are convex, so is $\Lambda$. Furthermore, $I$ is convex. We define
$$\mathrm{ri}(\mathrm{dom}(I)) \overset{\text{def}}{=} \{x\in\mathrm{dom}(I) : \text{if }y\in\mathrm{dom}(I)\text{ then }x+\varepsilon(x-y)\in\mathrm{dom}(I)\text{ for some }\varepsilon>0\};$$
this is the relative interior of $\mathrm{dom}(I)$. Before proceeding, let's prove a regularity result.
Lemma 0.9. Let $f:V\to(-\infty,\infty]$ be convex, where $V$ is a finite-dimensional vector space. Then $f$ is locally bounded on $\mathrm{int}\,\mathrm{dom}(f)$.

Proof. Fix $x^*\in\mathrm{int}\,\mathrm{dom}(f)$. Define $\tilde f(x) \overset{\text{def}}{=} f(x^*+x) - f(x^*)$. Then $\tilde f$ is convex, $\tilde f(0)=0$, and $0\in\mathrm{int}\,\mathrm{dom}(\tilde f)$. We want to show that $\tilde f$ is locally bounded near $0$.
Let $\mathcal{V}$ be a basis of $V$; since $V$ is finite-dimensional, $\mathcal{V}$ is a finite set. Define $\mathcal{V}^\pm = \mathcal{V}\cup(-\mathcal{V})\cup\{0\}$; i.e., $\mathcal{V}^\pm$ is the collection of basis vectors and their negatives, and also $0$. Since $0\in\mathrm{int}\,\mathrm{dom}(\tilde f)$, there is a $\delta>0$ such that $\delta v\in\mathrm{dom}(\tilde f)$ for all $v\in\mathcal{V}^\pm$. Define $M \overset{\text{def}}{=} \max_{v\in\mathcal{V}^\pm}|\tilde f(\delta v)|$; then $M<\infty$. Define now
$$O \overset{\text{def}}{=} \left\{\sum_{v\in\mathcal{V}}\lambda_v v : \sum_{v\in\mathcal{V}}|\lambda_v| < \delta\right\}.$$
Then $O$ is an open neighborhood of $0$. Fix $x = \sum_{v\in\mathcal{V}}\lambda_v v\in O$. For each $v\in\mathcal{V}$, define $s_v \overset{\text{def}}{=} \mathrm{sgn}(\lambda_v)$; i.e., $s_v = \lambda_v/|\lambda_v|$ if $\lambda_v\neq0$, and $s_v = 0$ if $\lambda_v = 0$. Note that $s_v v\in\mathcal{V}^\pm$ for all $v\in\mathcal{V}$ (up to rescaling by $\delta$). We then write
$$x = \sum_{v\in\mathcal{V}}\frac{|\lambda_v|}{\delta}(\delta s_v v) + \left(1 - \sum_{v\in\mathcal{V}}\frac{|\lambda_v|}{\delta}\right)0.$$
Convexity then implies that
$$\tilde f(x) \le \sum_{v\in\mathcal{V}}\frac{|\lambda_v|}{\delta}\tilde f(\delta s_v v) \le M.$$
Thus $\sup_{x\in O}\tilde f(x)\le M$; i.e., $\tilde f$ is bounded from above on $O$.
Fix next $x\in O$. Since $O = -O$, $-x\in O$, so $\tilde f(-x)\le M$. Suppose that $\tilde f(x)<-M$; then there is an $s<-M$ such that $(x,s)\in\mathrm{epi}(\tilde f)$. Clearly $(-x,M)\in\mathrm{epi}(\tilde f)$, so by convexity of the epigraph, $(0,\frac12(s+M))\in\mathrm{epi}(\tilde f)$; thus $\tilde f(0)\le\frac12(s+M)<0$. But this contradicts the fact that $\tilde f(0)=0$. Thus $\tilde f(x)\ge-M$, so $\inf_{x\in O}\tilde f(x)\ge-M$. □

In other words, we can't have $\lim_{y\to x}|f(y)| = \infty$ if $x\in\mathrm{int}\,\mathrm{dom}(f)$. Our first technical result is that if $x\in\mathrm{ri}(\mathrm{dom}(I))$, then the sup is actually a max in the definition of $I$.

Lemma 0.10. Suppose that Λ is lower semicontinuous and that x ∈ ri(dom(I)). Then there is a Θx ∈ dom(Λ) such that

$$(18)\qquad I(x) = \langle x,\Theta_x\rangle_{\mathbb{R}^d} - \Lambda(\Theta_x).$$
Proof. The basic idea is to find a supporting hyperplane for the subderivative of $I$ at $x$. The fact that the subderivative is homogeneous then plays an important role. For $v\in\mathbb{R}^d$, define
$$g(v) \overset{\text{def}}{=} \inf_{\delta>0}\frac{I(x+\delta v) - I(x)}{\delta}.$$
Each difference quotient is convex in $v$, and the quotients are monotone in $\delta$: fix $0<\delta_1<\delta_2$ and $v\in\mathbb{R}^d$. Then
$$\frac{I(x+\delta_1 v) - I(x)}{\delta_1} = \frac{I\left(\frac{\delta_1}{\delta_2}(x+\delta_2 v) + \left(1-\frac{\delta_1}{\delta_2}\right)x\right) - I(x)}{\delta_1} \le \frac{\frac{\delta_1}{\delta_2}I(x+\delta_2 v) + \left(1-\frac{\delta_1}{\delta_2}\right)I(x) - I(x)}{\delta_1} = \frac{I(x+\delta_2 v) - I(x)}{\delta_2}$$

(this is a standard calculation), so $g$ is the decreasing limit of convex functions, hence convex, and in fact
$$g(v) = \lim_{\delta\searrow0}\frac{I(x+\delta v) - I(x)}{\delta}.$$

Note that $g(\alpha v) = \alpha g(v)$ for all $\alpha>0$; thus $g(v) = \|v\|\,g\!\left(\frac{v}{\|v\|}\right)$ for all $v\in\mathbb{R}^d\setminus\{0\}$. We would like to use Lemma 1.3 to find the required $\Theta_x$.
We next claim that $\mathrm{dom}(g)$ is in fact a vector space. Clearly it is convex and is a cone. Next, note that if $g(v)<\infty$, then $x+\delta v\in\mathrm{dom}(I)$ for some $\delta>0$. Since $x\in\mathrm{ri}(\mathrm{dom}(I))$, there is an $\varepsilon>0$ such that $x-\varepsilon\delta v = x+\varepsilon(x-(x+\delta v))\in\mathrm{dom}(I)$. Thus $-v\in\mathrm{dom}(g)$; thus $-\mathrm{dom}(g)\subset\mathrm{dom}(g)$, so in fact $\mathrm{dom}(g) = -\mathrm{dom}(g)$. Fix $v\in\mathrm{dom}(g)$ and $\alpha\in\mathbb{R}$. Since $\mathrm{dom}(g)$ is a cone, $|\alpha|v\in\mathrm{dom}(g)$. Since $\mathrm{dom}(g) = -\mathrm{dom}(g)$, $\alpha v\in\mathrm{dom}(g)$; i.e., $\mathrm{dom}(g)$ is a symmetric cone. Next fix $v_1$ and $v_2$ in $\mathrm{dom}(g)$ and $\alpha_1$ and $\alpha_2$ in $\mathbb{R}$. Since $\mathrm{dom}(g)$ is a symmetric cone, $2\alpha_1 v_1$ and $2\alpha_2 v_2$ are both in $\mathrm{dom}(g)$. Since $\mathrm{dom}(g)$ is convex, we finally have that $\alpha_1 v_1 + \alpha_2 v_2 = \frac12(2\alpha_1 v_1 + 2\alpha_2 v_2)\in\mathrm{dom}(g)$. Thus $\mathrm{dom}(g)$ is indeed a vector space, which we denote as $V$; since $V\subset\mathbb{R}^d$, $V$ is finite-dimensional. By Lemma 0.9, $g$ is bounded on compact subsets of $V$.
The only thing we are missing is lower semicontinuity. We can modify things so that this happens. For each $v\in\mathbb{R}^d$, define $\bar g(v) \overset{\text{def}}{=} \varliminf_{v'\to v}g(v')$; this is the lower-semicontinuous regularization of $g$; it is indeed lower semicontinuous and it retains the convexity of $g$.
We next apply Lemma 1.3 to separate $(0,-1)$ from $\mathrm{epi}(\bar g)$. Note that since $g$ is bounded on compact sets, it is bounded on the unit sphere of $V$; by homogeneity, we must indeed have that $\bar g(0) = 0$, so $(0,-1)\not\in\mathrm{epi}(\bar g)$. Thus by Lemma 1.3, there is a $\Theta_x\in\mathbb{R}^d$ and a $c\in\mathbb{R}$ such that $\langle z,\Theta_x\rangle_{\mathbb{R}^d} - \bar g(z) < c$ for all $z\in\mathrm{dom}(\bar g)$. Since $g\ge\bar g$, for any $z\in\mathrm{dom}(g)$ and $\alpha>0$, we have that

$$g(\alpha z) > \alpha\langle z,\Theta_x\rangle_{\mathbb{R}^d} - c$$
and thus
$$g(z) > \langle z,\Theta_x\rangle_{\mathbb{R}^d} - \frac{c}{\alpha}.$$
Take now $\alpha\nearrow\infty$; then $g(z)\ge\langle z,\Theta_x\rangle_{\mathbb{R}^d}$. Unwinding things, we have that

$$I(x+v) \ge I(x) + \langle\Theta_x,v\rangle_{\mathbb{R}^d}$$
for all $v\in\mathbb{R}^d$. We now can use Lemma 1.2 to write $\Lambda(\Theta_x)$ in terms of $I$ (this is where we use the lower semicontinuity of $\Lambda$). We clearly have that

$$\Lambda(\Theta_x) \ge \langle x,\Theta_x\rangle_{\mathbb{R}^d} - I(x).$$
However, we also have that for any $v\in\mathbb{R}^d$,

$$\langle x+v,\Theta_x\rangle_{\mathbb{R}^d} - I(x+v) \le \langle x,\Theta_x\rangle_{\mathbb{R}^d} - I(x),$$
so in fact, using Lemma 1.2 (i.e., $\Lambda = \Lambda^{**} = I^*$),
$$\Lambda(\Theta_x) = \sup_{y\in\mathbb{R}^d}\left\{\langle y,\Theta_x\rangle_{\mathbb{R}^d} - I(y)\right\} = \langle x,\Theta_x\rangle_{\mathbb{R}^d} - I(x),$$
which gives us the claim (and ensures that $\Theta_x\in\mathrm{dom}(\Lambda)$). □
We used the fact that $x\in\mathrm{ri}\,\mathrm{dom}(I)$ to bound the subderivative from below. Let's next identify some assumptions under which $\Lambda$ is differentiable at any solution of (18).

Definition 0.1. We say that $\Lambda$ is essentially smooth if
(1) $\mathrm{int}\,\mathrm{dom}(\Lambda)\neq\emptyset$;
(2) $\Lambda$ is differentiable on $\mathrm{int}\,\mathrm{dom}(\Lambda)$;
(3) if $\theta\in\partial\,\mathrm{int}\,\mathrm{dom}(\Lambda)$, then $\lim_{\theta'\to\theta,\ \theta'\in\mathrm{int}\,\mathrm{dom}(\Lambda)}\|\nabla\Lambda(\theta')\| = \infty$.

Note that in the last requirement, we impose no condition on the behavior of $\|\nabla\Lambda(\theta')\|$ as $\|\theta'\|\to\infty$ with $\theta'\in\mathrm{int}\,\mathrm{dom}(\Lambda)$. Then we have

Lemma 0.11. Assume that $\Lambda$ is essentially smooth and that $\mathrm{dom}(I)\neq\emptyset$. If $\Theta_x\in\mathbb{R}^d$ satisfies (18), then $\Theta_x\in\mathrm{int}\,\mathrm{dom}(\Lambda)$.

Proof. By the first requirement of essential smoothness, we can find $\Theta^*\in\mathrm{int}\,\mathrm{dom}(\Lambda)$. By Lemma 0.9, for $\delta>0$ sufficiently small, $\Lambda$ is bounded on $\Theta^*+B(\delta)$; i.e., $M_+ \overset{\text{def}}{=} \sup_{v\in B(\delta)}\Lambda(\Theta^*+v)$ is finite.
Let's now use the fact that $\Theta_x$ and $\Theta^*+B(\delta)$ are both in $\mathrm{dom}(\Lambda)$ and that $\mathrm{dom}(\Lambda)$ is convex. For each $\lambda\in[0,1]$, define $O_\lambda \overset{\text{def}}{=} (1-\lambda)\Theta_x + \lambda(\Theta^*+B(\delta))$ and set $\theta_\lambda \overset{\text{def}}{=} (1-\lambda)\Theta_x + \lambda\Theta^*$. Then $O_\lambda\subset\mathrm{dom}(\Lambda)$ for each $\lambda\in[0,1]$, and since $O_\lambda$ is open for each $\lambda\in(0,1]$, in fact $O_\lambda\subset\mathrm{int}\,\mathrm{dom}(\Lambda)$ for each $\lambda\in(0,1]$. By the second requirement of essential smoothness, we know that $\Lambda$ is thus differentiable at $\theta_\lambda$ for each $\lambda\in(0,1]$.
Let's bound $\nabla\Lambda(\theta_\lambda)$. Fix $v\in B(\delta)$ and $s\in(0,1]$. Then
$$(19)\qquad \theta_\lambda + s\lambda v = (1-\lambda)\Theta_x + \lambda(\Theta^*+sv) \in O_\lambda.$$

By convexity,
$$\Lambda(\theta_\lambda + s\lambda v) = \Lambda(s(\theta_\lambda+\lambda v) + (1-s)\theta_\lambda) \le s\Lambda(\theta_\lambda+\lambda v) + (1-s)\Lambda(\theta_\lambda),$$
i.e.,
$$\Lambda(\theta_\lambda+s\lambda v) - \Lambda(\theta_\lambda) \le s\left(\Lambda(\theta_\lambda+\lambda v) - \Lambda(\theta_\lambda)\right).$$
Thus
$$(20)\qquad \langle\nabla\Lambda(\theta_\lambda),\lambda v\rangle_{\mathbb{R}^d} = \lim_{s\searrow0}\frac{\Lambda(\theta_\lambda+s\lambda v) - \Lambda(\theta_\lambda)}{s} \le \Lambda(\theta_\lambda+\lambda v) - \Lambda(\theta_\lambda).$$
Using (19) with $s=1$, we have that
$$\Lambda(\theta_\lambda+\lambda v) \le (1-\lambda)\Lambda(\Theta_x) + \lambda\Lambda(\Theta^*+v) = \Lambda(\Theta_x) + \lambda\left(\Lambda(\Theta^*+v) - \Lambda(\Theta_x)\right).$$

We can use (18) to bound $\Lambda(\theta_\lambda)$ from below; we have that
$$\Lambda(\theta_\lambda) \ge \langle\theta_\lambda,x\rangle_{\mathbb{R}^d} - I(x) = (1-\lambda)\langle\Theta_x,x\rangle_{\mathbb{R}^d} + \lambda\langle\Theta^*,x\rangle_{\mathbb{R}^d} - I(x) = \Lambda(\Theta_x) + \lambda\langle\Theta^*-\Theta_x,x\rangle_{\mathbb{R}^d}.$$
Setting $M' \overset{\text{def}}{=} M_+ + |\Lambda(\Theta_x)| + \|\Theta^*-\Theta_x\|\|x\|$, we thus have that
$$\Lambda(\theta_\lambda+\lambda v) - \Lambda(\theta_\lambda) \le M'\lambda.$$
Using this in (20), we get that
$$\langle\nabla\Lambda(\theta_\lambda),\lambda v\rangle_{\mathbb{R}^d} \le M'\lambda.$$
Letting $v$ vary over $B(\delta)$, we get that $\|\nabla\Lambda(\theta_\lambda)\| \le M'/\delta$. Finally let $\lambda\searrow0$; then $\lim_{\lambda\searrow0}\theta_\lambda = \Theta_x$ while $\|\nabla\Lambda(\theta_\lambda)\|$ stays bounded. By the last requirement of essential smoothness, we thus have that $\Theta_x\not\in\partial\,\mathrm{int}\,\mathrm{dom}(\Lambda)$; since $\Theta_x = \lim_{\lambda\searrow0}\theta_\lambda$ lies in the closure of $\mathrm{int}\,\mathrm{dom}(\Lambda)$, we conclude $\Theta_x\in\mathrm{int}\,\mathrm{dom}(\Lambda)$. □

Finally, we prove that if Λ is differentiable at Θx solving (18), then we have several other properties.

Lemma 0.12. Suppose that Θx solves (18). If Λ is differentiable at Θx, then

• $x = \nabla\Lambda(\Theta_x)$;
• $I(x+v) > I(x) + \langle\Theta_x,v\rangle_{\mathbb{R}^d}$ for all $v\in\mathbb{R}^d\setminus\{0\}$.
Proof. Suppose that (18) holds. Then for any $v\in\mathbb{R}^d$ and $\delta>0$,

$$\Lambda(\Theta_x+\delta v) \ge \langle x,\Theta_x+\delta v\rangle_{\mathbb{R}^d} - I(x) = \langle\delta v,x\rangle_{\mathbb{R}^d} + \Lambda(\Theta_x).$$
Thus
$$\langle v,x\rangle_{\mathbb{R}^d} \le \lim_{\delta\searrow0}\frac{\Lambda(\Theta_x+\delta v) - \Lambda(\Theta_x)}{\delta} = \langle v,\nabla\Lambda(\Theta_x)\rangle_{\mathbb{R}^d}.$$
This holding for any $v\in\mathbb{R}^d$ (in particular for both $v$ and $-v$), we indeed have the first claim.
Suppose now that the second claim is not true. Then there is a $v\in\mathbb{R}^d\setminus\{0\}$ such that

I(x + v) ≤ I(x) + hv, ∇Λ(Θx)i d . R Then for any ψ ∈ Rd and any δ > 0,

hx + v, Θx + δψi d − Λ(Θx + δψ) ≤ hx, ∇Λ(Θx)i d + hv, ∇Λ(Θx)i d R R R Thus Λ(Θx + δψ) − Λ(Θx) hx + v, ψi d ≤ lim = hψ, ∇Λ(Θx)i d R R δ&0 δ d Since this holds for all ψ ∈ R , we must have that x + v = ∇Λ(Θx) = x; i.e., v = 0. 

0.3. Lower Bound. We can now get back to proving the lower large deviations bound.

Lemma 0.13. Assume that $\Lambda$ is essentially smooth. Fix $G\subset\mathbb{R}^d$ open. Then
$$\varliminf_{\varepsilon\searrow0}\varepsilon\ln\mathbb{P}_\varepsilon\{X\in G\}\ge-\inf\{I(x):x\in G\cap\operatorname{ri}\operatorname{dom}(I)\}.$$

Proof. Fix $x\in G\cap\operatorname{ri}\operatorname{dom}(I)$. Since $x\in\operatorname{ri}\operatorname{dom}(I)$, Lemma 0.10 ensures that there is a $\Theta_x\in\operatorname{dom}(\Lambda)$ which satisfies (18). Note that
$$\mathbb{E}_\varepsilon\left[\exp\left[\frac1\varepsilon\langle X-x,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]=\exp\left[\frac1\varepsilon\left\{\Lambda_\varepsilon(\Theta_x)-\langle\Theta_x,x\rangle_{\mathbb{R}^d}\right\}\right]<\infty.$$
We write things in a fairly clever way:
$$\mathbb{P}_\varepsilon\{X\in G\}=\tilde{\mathbb{E}}_\varepsilon\left[\chi_G(X)\exp\left[-\frac1\varepsilon\langle X-x,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]\exp\left[\frac1\varepsilon\left\{\Lambda_\varepsilon(\Theta_x)-\langle\Theta_x,x\rangle_{\mathbb{R}^d}\right\}\right]$$
where $\tilde{\mathbb{P}}_\varepsilon\in\mathscr{P}(\mathbb{R}^d)$ is defined as
$$\tilde{\mathbb{P}}_\varepsilon(A)\overset{\text{def}}{=}\frac{\mathbb{E}_\varepsilon\left[\chi_A\exp\left[\frac1\varepsilon\langle X-x,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]}{\mathbb{E}_\varepsilon\left[\exp\left[\frac1\varepsilon\langle X-x,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]}\qquad A\in\mathscr{B}(\mathbb{R}^d).$$
Note that
$$\lim_{\varepsilon\searrow0}\varepsilon\ln\exp\left[\frac1\varepsilon\left\{\Lambda_\varepsilon(\Theta_x)-\langle\Theta_x,x\rangle_{\mathbb{R}^d}\right\}\right]=-\left\{\langle\Theta_x,x\rangle_{\mathbb{R}^d}-\Lambda(\Theta_x)\right\}=-I(x).$$
Fix now $\delta>0$ such that $x+B(\delta)\subset G$. Since $|\langle X-x,\Theta_x\rangle_{\mathbb{R}^d}|\le\|\Theta_x\|\delta$ when $X\in x+B(\delta)$,
$$\tilde{\mathbb{E}}_\varepsilon\left[\chi_G(X)\exp\left[-\frac1\varepsilon\langle X-x,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]\ge\tilde{\mathbb{E}}_\varepsilon\left[\chi_{x+B(\delta)}(X)\exp\left[-\frac1\varepsilon\langle X-x,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]\ge\tilde{\mathbb{P}}_\varepsilon\{X\in x+B(\delta)\}\exp\left[-\frac{\|\Theta_x\|\delta}\varepsilon\right].$$
We claim that
$$\lim_{\varepsilon\searrow0}\tilde{\mathbb{P}}_\varepsilon\{X\in x+B(\delta)\}=1.$$
We will do this via the upper bound. Define
$$\tilde\Lambda_\varepsilon(\theta)\overset{\text{def}}{=}\varepsilon\ln\tilde{\mathbb{E}}_\varepsilon\left[\exp\left[\frac1\varepsilon\langle\theta,X\rangle_{\mathbb{R}^d}\right]\right]=\Lambda_\varepsilon(\Theta_x+\theta)-\Lambda_\varepsilon(\Theta_x).$$
Thus
$$\tilde\Lambda(\theta)\overset{\text{def}}{=}\lim_{\varepsilon\searrow0}\left\{\Lambda_\varepsilon(\Theta_x+\theta)-\Lambda_\varepsilon(\Theta_x)\right\}=\Lambda(\Theta_x+\theta)-\Lambda(\Theta_x)$$
for all $\theta\in\mathbb{R}^d$. Define also
$$\tilde I(x')\overset{\text{def}}{=}\sup_{\theta\in\mathbb{R}^d}\left\{\langle\theta,x'\rangle_{\mathbb{R}^d}-\tilde\Lambda(\theta)\right\}=I(x')-I(x)-\langle\Theta_x,x'-x\rangle_{\mathbb{R}^d}.$$

By Lemma 0.10 we knew that $\Theta_x\in\operatorname{dom}(\Lambda)$; by Lemma 0.11 we actually know that $\Theta_x\in\operatorname{int}\operatorname{dom}(\Lambda)$. Thus $0\in\operatorname{int}\operatorname{dom}(\tilde\Lambda)$, and we can use the upper large deviations bound of Corollary 0.8. Note that $(x+B(\delta))^c$ is closed. We thus have that
$$\varlimsup_{\varepsilon\searrow0}\varepsilon\ln\tilde{\mathbb{P}}_\varepsilon\{X\notin x+B(\delta)\}\le-\inf_{x'\notin x+B(\delta)}\tilde I(x')=-\min_{x'-x\notin B(\delta)}\left\{I(x')-I(x)-\langle\Theta_x,x'-x\rangle_{\mathbb{R}^d}\right\}.$$
By the strict inequality in the second claim of Lemma 0.12, $\tilde I(x')>0$ for all $x'\ne x$, so
$$\min_{x'-x\notin B(\delta)}\tilde I(x')>0.$$
Thus
$$\varlimsup_{\varepsilon\searrow0}\varepsilon\ln\tilde{\mathbb{P}}_\varepsilon\{X\notin x+B(\delta)\}<0,$$
so $\lim_{\varepsilon\searrow0}\tilde{\mathbb{P}}_\varepsilon\{X\in x+B(\delta)\}=1$. Thus
$$\varliminf_{\varepsilon\searrow0}\varepsilon\ln\tilde{\mathbb{E}}_\varepsilon\left[\chi_G(X)\exp\left[-\frac1\varepsilon\langle X-x,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]\ge-\|\Theta_x\|\delta.$$
Let $\delta\searrow0$ and combine things to get the claimed lower bound. $\square$

We finally have

Corollary 0.14. Assume that $\Lambda$ is essentially smooth. Fix $G\subset\mathbb{R}^d$ open. Then
$$\varliminf_{\varepsilon\searrow0}\varepsilon\ln\mathbb{P}_\varepsilon\{X\in G\}\ge-\inf\{I(x):x\in G\}.$$
Proof. Fix $x\in G$. We claim that
$$(21)\qquad\varliminf_{\varepsilon\searrow0}\varepsilon\ln\mathbb{P}_\varepsilon\{X\in G\}\ge-I(x).$$
This is clearly true if $I(x)=\infty$. Assume next that $x\in\operatorname{dom}(I)$. Since $\operatorname{ri}\operatorname{dom}(I)\ne\emptyset$, fix $z\in\operatorname{ri}\operatorname{dom}(I)$. Consider the points $z_\lambda\overset{\text{def}}{=}(1-\lambda)x+\lambda z$ for $\lambda\in[0,1]$. We want to show that $z_\lambda\in\operatorname{ri}\operatorname{dom}(I)$ for all $\lambda\in(0,1]$. Indeed, fix $y\in\operatorname{dom}(I)$; then since $z\in\operatorname{ri}\operatorname{dom}(I)$, there is an $\varepsilon>0$ such that $z+\varepsilon(z-y)\in\operatorname{dom}(I)$. Define
$$\hat\varepsilon\overset{\text{def}}{=}\frac{\varepsilon\lambda}{1+\varepsilon(1-\lambda)}\qquad\text{and}\qquad\hat\lambda\overset{\text{def}}{=}\frac{\lambda}{1+\varepsilon(1-\lambda)}.$$
We calculate that
$$z_\lambda+\hat\varepsilon(z_\lambda-y)=(1+\hat\varepsilon)z_\lambda-\hat\varepsilon y=\frac{(1+\varepsilon)(1-\lambda)x+(1+\varepsilon)\lambda z-\lambda\varepsilon y}{1+\varepsilon(1-\lambda)}=(1-\hat\lambda)x+\hat\lambda\left(z+\varepsilon(z-y)\right).$$
The import of these calculations is that since $\operatorname{dom}(I)$ is convex and both $x$ and $z+\varepsilon(z-y)$ are in $\operatorname{dom}(I)$, we thus know that $z_\lambda+\hat\varepsilon(z_\lambda-y)\in\operatorname{dom}(I)$. Since $y$ was an arbitrary element of $\operatorname{dom}(I)$, we thus have that indeed $z_\lambda\in\operatorname{ri}\operatorname{dom}(I)$. (Note: it is much easier to deduce the correct formulae for $\hat\varepsilon$ and $\hat\lambda$ by taking $z=0$; i.e., by considering $\operatorname{dom}(I)-x$.) Thus $z_\lambda\in\operatorname{ri}\operatorname{dom}(I)$ for all $\lambda\in(0,1]$. Since $G$ is open, $z_\lambda\in G$ for $\lambda\in(0,1]$ sufficiently small. Thus
$$\varliminf_{\varepsilon\searrow0}\varepsilon\ln\mathbb{P}_\varepsilon\{X\in G\}\ge-\inf\{I(x'):x'\in G\cap\operatorname{ri}\operatorname{dom}(I)\}\ge-I(z_\lambda)$$
by Lemma 0.13. By convexity, $I(z_\lambda)\le(1-\lambda)I(x)+\lambda I(z)$, so $\varliminf_{\lambda\searrow0}I(z_\lambda)\le I(x)$. Thus we again have (21). $\square$

Theorem 0.15 (Gärtner-Ellis). Assume that (16) holds and that $\Lambda$ is essentially smooth. Then $X_\varepsilon$ has a large deviations principle with rate function (14).
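The Legendre-Fenchel transform at the heart of the Gärtner-Ellis theorem can be sanity-checked numerically. The sketch below (our illustration, not from the text) computes $\Lambda(\theta)=\ln\mathbb{E}[e^{\theta\xi}]$ for a standard Gaussian by quadrature and then takes the supremum $I(x)=\sup_\theta\{\theta x-\Lambda(\theta)\}$ over a grid of $\theta$'s; for the Gaussian, $\Lambda(\theta)=\theta^2/2$ and $I(x)=x^2/2$, with the supremum attained at $\Theta_x=x$, the solution of (18).

```python
import numpy as np

# Quadrature grid for the standard Gaussian density.
z = np.linspace(-30.0, 30.0, 60001)
dz = z[1] - z[0]

def Lambda(theta):
    """Λ(θ) = ln E[e^{θξ}] for ξ ~ N(0,1), by a Riemann sum."""
    return np.log(np.sum(np.exp(theta * z - z**2 / 2)) * dz / np.sqrt(2 * np.pi))

# Legendre-Fenchel transform I(x) = sup_θ {θx − Λ(θ)} over a θ-grid.
thetas = np.linspace(-6.0, 6.0, 481)
Lvals = np.array([Lambda(t) for t in thetas])

def I(x):
    return np.max(thetas * x - Lvals)

# For the Gaussian, I(x) should agree with x²/2.
for x in [0.0, 0.5, 1.0, 2.0]:
    print(x, I(x), x**2 / 2)
```

The grid ranges are chosen (an assumption) so that the maximizing $\theta=x$ lies inside them; otherwise the discrete supremum undershoots the true transform.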

Exercises
(1) If $f$ is convex, show that $\operatorname{dom}(f)$ is convex.
(2) We here investigate the natural properties of the $\Lambda_\varepsilon$'s of (13).
  (a) Show that the $\Lambda_\varepsilon$'s are convex.
  (b) Show that the pointwise limit of convex functions is convex.
  (c) Show that the Legendre-Fenchel transform of a function is convex.
(3) Show that if $\operatorname{dom}(I)\ne\emptyset$, then $\operatorname{ri}(\operatorname{dom}(I))\ne\emptyset$.
(4) Let $f:\mathbb{R}^2\to(0,\infty]$ be
$$f(x,y)=\begin{cases}\frac1y&\text{if }y\ge1\text{ and }x=0\\ \infty&\text{else.}\end{cases}$$
Find $\operatorname{ri}(\operatorname{dom}(f))$.
(5) Let $f:\mathbb{R}^d\to[-\infty,\infty]$ be a convex function and fix $a>0$, $b\in\mathbb{R}$, $c\in\mathbb{R}^d$, and $d'\in\mathbb{R}$. Define $\tilde f(x)\overset{\text{def}}{=}af(bx+c)+d'$ for all $x\in\mathbb{R}^d$. Show that $\tilde f$ is convex.
(6) If $\mathcal F$ is a collection of convex functions from a vector space $V$ to $[-\infty,\infty]$, show that $f(x)\overset{\text{def}}{=}\sup_{g\in\mathcal F}g(x)$ is convex.
(7) Fix $f:V\to[-\infty,\infty]$ where $V$ is a topological space. For each $x\in V$, define $\tilde f(x)\overset{\text{def}}{=}\varliminf_{x'\to x}f(x')$.
  (a) Show that $\tilde f$ is lower semicontinuous.
  (b) Assume that $V$ is a topological vector space and $f$ is convex. Show that $\tilde f$ is also convex.
(8) Show that if a lower semicontinuous function $f$ has compact level sets, then $f$ attains its infimum over any nonempty closed set.

(9) One of the basic results in large deviations is Cramér's Theorem. Let $\{\xi_n\}_{n\in\mathbb N}$ be an i.i.d. collection of $\mathbb{R}^d$-valued random variables with common law $\mu$. Define
$$S_N\overset{\text{def}}{=}\frac1N\sum_{n=1}^N\xi_n.$$
  (a) For $\theta\in\mathbb{R}^d$, compute $\Lambda(\theta)\overset{\text{def}}{=}\lim_{N\to\infty}\frac1N\ln\mathbb{E}\left[\exp\left[N\langle\theta,S_N\rangle_{\mathbb{R}^d}\right]\right]$ in terms of $\mu$.
  (b) Show that $\Lambda$ is differentiable on $\operatorname{int}\operatorname{dom}(\Lambda)$.
  (c) Fix $x\in\mathbb{R}^d$ and suppose that $\Theta_x\in\mathbb{R}^d$ satisfies (18). Find the statistics of $\{\xi_n\}_{1\le n\le N}$ under
$$\tilde{\mathbb{P}}_N(A)\overset{\text{def}}{=}\frac{\mathbb{E}\left[\chi_A\exp\left[N\langle S_N,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]}{\mathbb{E}\left[\exp\left[N\langle S_N,\Theta_x\rangle_{\mathbb{R}^d}\right]\right]}\qquad A\in\mathscr F.$$
In this case, the proof of the lower bound can be simplified.
(10) Suppose that $\{X^1_\varepsilon\}_{\varepsilon>0}$ and $\{X^2_\varepsilon\}_{\varepsilon>0}$ are, respectively, collections of $\mathbb{R}^{d_1}$-valued and $\mathbb{R}^{d_2}$-valued random variables. Assume that $X^1_\varepsilon$ and $X^2_\varepsilon$ are independent for each $\varepsilon>0$. Assume that
$$\Lambda_1(\theta)\overset{\text{def}}{=}\lim_{\varepsilon\to0}\varepsilon\ln\mathbb{E}\left[\exp\left[\varepsilon^{-1}\langle\theta,X^1_\varepsilon\rangle_{\mathbb{R}^{d_1}}\right]\right]\qquad\text{and}\qquad\Lambda_2(\theta)\overset{\text{def}}{=}\lim_{\varepsilon\to0}\varepsilon\ln\mathbb{E}\left[\exp\left[\varepsilon^{-1}\langle\theta,X^2_\varepsilon\rangle_{\mathbb{R}^{d_2}}\right]\right]$$
both exist, satisfy (16), and are essentially smooth. Define $Y_\varepsilon\overset{\text{def}}{=}(X^1_\varepsilon,X^2_\varepsilon)$ (i.e., $Y_\varepsilon$ is $\mathbb{R}^{d_1+d_2}$-valued).
  (a) Compute $\Lambda(\theta)\overset{\text{def}}{=}\lim_{\varepsilon\to0}\varepsilon\ln\mathbb{E}\left[\exp\left[\varepsilon^{-1}\langle\theta,Y_\varepsilon\rangle_{\mathbb{R}^{d_1+d_2}}\right]\right]$ for all $\theta=(\theta_1,\theta_2)\in\mathbb{R}^{d_1}\times\mathbb{R}^{d_2}$.
  (b) Show that (16) holds for $\Lambda$.
  (c) Show that $\Lambda$ is essentially smooth.
  (d) Compute the Legendre-Fenchel transform of $\Lambda$ in terms of the Legendre-Fenchel transforms of $\Lambda_1$ and $\Lambda_2$.
(11) Fix $\mu\in\mathscr P[0,\infty)$. Define
$$\Lambda(\theta)\overset{\text{def}}{=}\ln\int_{z\in[0,\infty)}e^{z\theta}\,\mu(dz)\qquad\theta\in\mathbb{R}.$$
Let $I$ denote the Legendre-Fenchel transform of $\Lambda$.
  (a) Compute $I(x)$ for $x<0$.
  (b) Compute $\lim_{\theta\to-\infty}\Lambda(\theta)$. (Hint: note that $e^{-z}=\int_{r\in[0,\infty)}\chi_{\{r\ge z\}}e^{-r}\,dr$.)
  (c) Compute $I(0)$.
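The change of measure in exercise 9(c) is also the standard importance-sampling recipe for rare events. The sketch below (our illustration with standard Gaussian $\xi$'s, so that $\Lambda(\theta)=\theta^2/2$ and $\Theta_x=x$ solves (18)) estimates the tiny probability $\mathbb{P}\{S_N\ge x\}$ by sampling under the tilted law, where the $\xi$'s are i.i.d. $N(\Theta_x,1)$ and the event is no longer rare:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# P{S_N ≥ x} = Ẽ_N[ χ_{S_N ≥ x} · exp(−NΘ_x S_N + NΛ(Θ_x)) ],
# with Θ_x = x and Λ(θ) = θ²/2 for standard Gaussian summands.
x, N, trials = 1.5, 50, 200000
S = rng.standard_normal((trials, N)).mean(axis=1) + x  # S_N under the tilt
w = np.exp(-N * x * S + N * x**2 / 2)                  # Radon-Nikodym weight
est = np.mean(w * (S >= x))
exact = 0.5 * math.erfc(x * math.sqrt(N) / math.sqrt(2))  # P{Z ≥ x√N}
print(est, exact)
```

Both numbers are of order $10^{-26}$, far beyond the reach of naive Monte Carlo with any feasible sample size; this is the probabilistic content of the tilting step in the proof of Lemma 0.13.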

CHAPTER 3

Schilder’s Theorem

We here develop large deviations for path-valued random variables. Namely, let $W$ be a Brownian motion. We want to consider the $C[0,1]$-valued random variable $\varepsilon W$ as $\varepsilon\searrow0$. Let's first approximate. For each $n\in\mathbb N$, define
$$\tilde W^n_t\overset{\text{def}}{=}W_{\lfloor tn\rfloor/n}\left\{\lfloor tn\rfloor+1-tn\right\}+W_{(\lfloor tn\rfloor+1)/n}\left\{tn-\lfloor tn\rfloor\right\}.$$
Thus $\tilde W^n$ agrees with $W$ at each $k/n$, and is piecewise linear on each $[k/n,(k+1)/n]$. We can thus uniquely characterize $\tilde W^n$ as follows:
• $\tilde W^n_0=0$;
• it is piecewise linear on each $[k/n,(k+1)/n]$;
• $\left\{n^{1/2}\left(\tilde W^n_{(k+1)/n}-\tilde W^n_{k/n}\right)\,\middle|\,0\le k\le n-1\right\}$ is an independent collection of standard normal random variables.
We will see that $(\varepsilon\tilde W^n)_{\varepsilon\in(0,1)}$ has a large deviations principle for each $n\in\mathbb N$. Since $\tilde W^n$ is in fact totally determined by its values at the $k/n$'s, it is in fact finite-dimensional; we wish to exploit this fact.
Define $T_n:\mathbb R^n\to C[0,1]$ as
$$T_n(\bar x)(t)\overset{\text{def}}{=}\frac1{\sqrt n}\left\{\sum_{1\le j\le\lfloor tn\rfloor}x_j+(tn-\lfloor tn\rfloor)x_{\lfloor tn\rfloor+1}\right\}\qquad t\in[0,1]$$
for $\bar x=(x_1,x_2\dots x_n)\in\mathbb R^n$. Note that
$$(22)\qquad T_n(\bar x)\left(\frac{k+1}n\right)-T_n(\bar x)\left(\frac kn\right)=\frac1{\sqrt n}x_{k+1}$$

for all $k\in\{0,1\dots n-1\}$. Fix next an independent collection $\{\eta_i\}_{1\le i\le n}$ of standard normal random variables and define $\bar W^n_t\overset{\text{def}}{=}(T_n(\bar\eta))(t)$ for all $t\in[0,1]$, where $\bar\eta\overset{\text{def}}{=}(\eta_1,\eta_2\dots\eta_n)$. Clearly $\bar W^n_0=0$, $\bar W^n$ is piecewise linear on each $[k/n,(k+1)/n]$, and from (22) we have that $\left\{n^{1/2}\left(\bar W^n_{(k+1)/n}-\bar W^n_{k/n}\right)\right\}_{k=0}^{n-1}$ is an independent collection of standard normals.

Lemma 0.16. We have that $(\varepsilon\tilde W^n)_{\varepsilon\in(0,1)}$ has a large deviations principle in $C[0,1]$ with action functional
$$I_n(\varphi)\overset{\text{def}}{=}\begin{cases}\dfrac12\displaystyle\sum_{k=0}^{n-1}\frac{\left(\varphi\left(\frac{k+1}n\right)-\varphi\left(\frac kn\right)\right)^2}{1/n}&\text{if }\varphi\text{ is piecewise linear on each }[k/n,(k+1)/n]\text{ and }\varphi(0)=0\\[2ex] \infty&\text{else.}\end{cases}$$

Proof. By the above comments, the law of $\tilde W^n$ and the law of $\bar W^n$ agree. Note that $T_n(\mathbb R^n)$ is the collection of functions which vanish at 0 and which are piecewise linear on each $[k/n,(k+1)/n]$. We further note that if $T_n(\psi_1,\psi_2\dots\psi_n)=\psi$, then
$$\frac12\sum_{k=1}^n\psi_k^2=\frac12\sum_{k=1}^n\frac{\left(\psi\left(\frac kn\right)-\psi\left(\frac{k-1}n\right)\right)^2}{1/n}.$$
The claim then follows by applying the contraction principle to $(\varepsilon\bar\eta)_{\varepsilon\in(0,1)}$, whose rate function is $\frac12\sum_{k=1}^n\psi_k^2$. $\square$

Letting $n$ become large, let's define
$$(23)\qquad I(\varphi)\overset{\text{def}}{=}\begin{cases}\frac12\int_{s=0}^1(\dot\varphi(s))^2\,ds&\text{if }\varphi\text{ is absolutely continuous and }\varphi(0)=0\\ \infty&\text{else;}\end{cases}$$
see [FW98]. Define $C_0[0,1]\overset{\text{def}}{=}\{\varphi\in C[0,1]:\varphi(0)=0\}$, and for $\varphi\in C_0[0,1]$, set $\|\varphi\|_C=\sup_{0\le t\le1}|\varphi(t)|$. Our goal is to show that $X^\varepsilon\overset{\text{def}}{=}\varepsilon W$ has a large deviations principle in $C_0[0,1]$ with rate functional given by (23); this is Theorem 0.22 below.
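The identity underlying Lemma 0.16 — that the map $T_n$ carries the rate function $\frac12\sum_k\psi_k^2$ of the i.i.d. Gaussians onto the discrete action functional $I_n$ — can be checked directly. A minimal sketch (our construction, mirroring the definitions above):

```python
import numpy as np

rng = np.random.default_rng(1)

def T_n(xbar):
    """Piecewise-linear path in C[0,1] built from x̄ ∈ R^n as in the text:
    value at k/n is (x_1 + ... + x_k)/√n, linear in between."""
    n = len(xbar)
    knots = np.concatenate([[0.0], np.cumsum(xbar) / np.sqrt(n)])
    return lambda t: np.interp(t, np.arange(n + 1) / n, knots)

def I_n(phi, n):
    """Action functional of Lemma 0.16 for a path that is piecewise linear
    on the k/n grid: (1/2) Σ (φ((k+1)/n) − φ(k/n))² / (1/n)."""
    vals = phi(np.arange(n + 1) / n)
    return 0.5 * np.sum(np.diff(vals) ** 2 * n)

n = 100
eta = rng.standard_normal(n)
W_bar = T_n(eta)                         # one sample of W̄^n = T_n(η̄)
# The proof of Lemma 0.16 rests on I_n(T_n(η̄)) = (1/2) Σ η_k²:
print(I_n(W_bar, n), 0.5 * np.sum(eta ** 2))
```

Since the increments of $T_n(\bar\eta)$ over $[k/n,(k+1)/n]$ are exactly $\eta_{k+1}/\sqrt n$, the two printed numbers agree to rounding error.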

Lemma 0.17. For each $s\ge0$, the level sets $\Phi(s)\overset{\text{def}}{=}\{\varphi\in C_0[0,1]:I(\varphi)\le s\}$ of $I$ are compact in $C_0[0,1]$.

Proof. If $I(\varphi)\le s$, then for any $0\le t_1<t_2\le1$, the Cauchy-Schwarz inequality gives
$$|\varphi(t_2)-\varphi(t_1)|=\left|\int_{r=t_1}^{t_2}\dot\varphi(r)\,dr\right|\le\sqrt{t_2-t_1}\left(\int_{r=t_1}^{t_2}|\dot\varphi(r)|^2\,dr\right)^{1/2}\le\sqrt{t_2-t_1}\,\sqrt{2s}.$$
Thus $\Phi(s)$ is uniformly bounded and equicontinuous, hence precompact by Arzelà-Ascoli. We thus need to show that $\Phi(s)$ is closed; i.e., that $I$ is lower semicontinuous. This follows from the fact that
$$(24)\qquad I(\varphi)=\sup\left\{\frac12\sum_{j=0}^{n-1}\frac{|\varphi(t_{j+1})-\varphi(t_j)|^2}{t_{j+1}-t_j}\;\middle|\;0=t_0<t_1<\dots<t_n=1\right\},$$
which exhibits $I$ as a supremum of functions continuous on $C_0[0,1]$. $\square$

Let's next prove the lower bound. Again we use a measure change.

Lemma 0.18. For any $\varphi\in C_0[0,1]$ and $\delta>0$,
$$\varliminf_{\varepsilon\searrow0}\varepsilon^2\ln\mathbb P\{\|X^\varepsilon-\varphi\|_C<\delta\}\ge-I(\varphi),$$
where $X^\varepsilon=\varepsilon W$.

Proof. The result is obvious if $I(\varphi)=\infty$. Assume that $I(\varphi)<\infty$; thus $\dot\varphi$ is well-defined, and we set
$$\tilde W^\varepsilon_t\overset{\text{def}}{=}W_t-\frac1\varepsilon\varphi(t)=W_t-\frac1\varepsilon\int_{s=0}^t\dot\varphi(s)\,ds;$$
then $X^\varepsilon-\varphi=\varepsilon\tilde W^\varepsilon$. Define
$$\tilde{\mathbb P}^\varepsilon(A)\overset{\text{def}}{=}\mathbb E\left[\chi_A\exp\left[\frac1\varepsilon\int_{s=0}^1\dot\varphi(s)\,dW_s-\frac1{2\varepsilon^2}\int_{s=0}^1(\dot\varphi(s))^2\,ds\right]\right]\qquad A\in\mathscr F.$$
By Girsanov's theorem, $\tilde W^\varepsilon$ is a Brownian motion under $\tilde{\mathbb P}^\varepsilon$. We thus have that
$$\mathbb P\{\|X^\varepsilon-\varphi\|_C<\delta\}=\tilde{\mathbb E}^\varepsilon\left[\chi_{\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}\exp\left[-\frac1\varepsilon\int_{s=0}^1\dot\varphi(s)\,d\tilde W^\varepsilon_s-\frac1{\varepsilon^2}I(\varphi)\right]\right]$$
$$\ge\tilde{\mathbb E}^\varepsilon\left[\chi_{\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}\exp\left[-\frac1\varepsilon\int_{s=0}^1\dot\varphi(s)\,d\tilde W^\varepsilon_s\right]\right]\exp\left[-\frac1{\varepsilon^2}I(\varphi)\right].$$
Since $\tilde W^\varepsilon$ is a $\tilde{\mathbb P}^\varepsilon$-Brownian motion, Girsanov's theorem tells us that
$$\lim_{\varepsilon\to0}\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}=1,$$
so for $\varepsilon>0$ sufficiently small, $\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}$ is positive. Thus we can use Jensen's inequality to calculate that
$$\tilde{\mathbb E}^\varepsilon\left[\chi_{\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}\exp\left[-\frac1\varepsilon\int_{s=0}^1\dot\varphi(s)\,d\tilde W^\varepsilon_s\right]\right]\ge\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}\exp\left[-\frac1\varepsilon\frac{\tilde{\mathbb E}^\varepsilon\left[\chi_{\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}\int_{s=0}^1\dot\varphi(s)\,d\tilde W^\varepsilon_s\right]}{\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}\right],$$
and by the Cauchy-Schwarz inequality and the Itô isometry,
$$\left|\frac{\tilde{\mathbb E}^\varepsilon\left[\chi_{\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}\int_{s=0}^1\dot\varphi(s)\,d\tilde W^\varepsilon_s\right]}{\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}\right|\le\frac1{\sqrt{\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}}\left(\tilde{\mathbb E}^\varepsilon\left[\left(\int_{s=0}^1\dot\varphi(s)\,d\tilde W^\varepsilon_s\right)^2\right]\right)^{1/2}=\frac{\sqrt{2I(\varphi)}}{\sqrt{\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}}.$$
Combining things together, we get that
$$\mathbb P\{\|X^\varepsilon-\varphi\|_C<\delta\}\ge\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}\exp\left[-\frac1\varepsilon\sqrt{\frac{2I(\varphi)}{\tilde{\mathbb P}^\varepsilon\{\|\varepsilon\tilde W^\varepsilon\|_C<\delta\}}}-\frac1{\varepsilon^2}I(\varphi)\right].$$
This gives us the stated claim. $\square$

Lemma 0.19. We have that
$$\mathbb P\left\{\sup_{0\le t\le1}|W_t|\ge L\right\}\le\sqrt{\frac8\pi}\exp\left[-\frac{L^2}2\right]$$
for all $L>0$.

Proof. We can prove this either by the reflection principle or the Feynman-Kac formula. We shall use the latter. Define
$$u(t,x)=2\int_{z=L}^\infty\frac1{\sqrt{2\pi t}}\exp\left[-\frac{(z-x)^2}{2t}\right]dz=2\int_{z=(L-x)/\sqrt t}^\infty\frac1{\sqrt{2\pi}}\exp\left[-\frac12z^2\right]dz$$
for all $t>0$ and $x\le L$. Clearly $u\in C^\infty((0,\infty)\times(-\infty,L])$ and
$$\frac{\partial u}{\partial t}(t,x)=\frac12\frac{\partial^2u}{\partial x^2}(t,x).$$
We also have that $u(t,L)=1$ and $\lim_{t\searrow0}u(t,\cdot)=0$ uniformly on compact subsets of $(-\infty,L)$.
The PDE for $u$ and its regularity imply that for each $\delta>0$, $Z^\delta_t\overset{\text{def}}{=}u(1+\delta-t,W_t)$ is a martingale (up to the first time $W$ hits $L$). We have that
$$Z^\delta_0=u(1+\delta,0)=2\int_{z=L/\sqrt{1+\delta}}^\infty\frac1{\sqrt{2\pi}}\exp\left[-\frac{z^2}2\right]dz.$$
Define next
$$\tau\overset{\text{def}}{=}\inf\{t\ge0:W_t\ge L\}.$$
We next have that $\{\tau\le1\}=\left\{\sup_{0\le t\le1}W_t\ge L\right\}$. If $\tau\le1$, then $\tau\wedge1=\tau$ and $W_{\tau\wedge1}=L$, so $Z^\delta_{\tau\wedge1}=u(1+\delta-\tau,L)=1$. On the other hand, if $\tau>1$, then $\tau\wedge1=1$ and $W_{\tau\wedge1}<L$, in which case $Z^\delta_{\tau\wedge1}=u(\delta,W_1)$. Combining things together, we get that
$$u(1+\delta,0)=Z^\delta_0=\mathbb E\left[Z^\delta_{\tau\wedge1}\right]=\mathbb P\left\{\sup_{0\le t\le1}W_t\ge L\right\}+\mathbb E\left[u(\delta,W_1)\chi_{\{\tau>1\}}\right].$$
Let $\delta\searrow0$ and use standard convergence results to see that
$$\mathbb P\left\{\sup_{0\le t\le1}W_t\ge L\right\}=2\int_{z=L}^\infty\frac1{\sqrt{2\pi}}\exp\left[-\frac{z^2}2\right]dz.$$
Thus
$$\mathbb P\left\{\sup_{0\le t\le1}|W_t|\ge L\right\}\le\mathbb P\left\{\sup_{0\le t\le1}W_t\ge L\right\}+\mathbb P\left\{\inf_{0\le t\le1}W_t\le-L\right\}=4\int_{z=L}^\infty\frac1{\sqrt{2\pi}}\exp\left[-\frac{z^2}2\right]dz.$$
A standard bound on the Gaussian error function completes the proof. $\square$

From this we can show that $\tilde X^{n,\varepsilon}\overset{\text{def}}{=}\varepsilon\tilde W^n$ is a good approximation of $X^\varepsilon=\varepsilon W$.

Corollary 0.20. For all $n\in\mathbb N$, $\varepsilon>0$, and $\delta>0$,
$$\mathbb P\left\{\|\tilde X^{n,\varepsilon}-X^\varepsilon\|_C\ge\delta\right\}\le n\sqrt{\frac8\pi}\exp\left[-\frac{\delta^2n}{8\varepsilon^2}\right].$$

Proof. For every $t\in[0,1]$,
$$|\tilde X^{n,\varepsilon}_t-X^\varepsilon_t|\le\varepsilon\left|\tilde W^n_t-W_{\lfloor tn\rfloor/n}\right|+\varepsilon\left|W_{\lfloor tn\rfloor/n}-W_t\right|\le2\varepsilon\sup_{\lfloor tn\rfloor/n\le r\le(\lfloor tn\rfloor+1)/n}\left|W_r-W_{\lfloor tn\rfloor/n}\right|.$$
Thus
$$\mathbb P\left\{\|\tilde X^{n,\varepsilon}-X^\varepsilon\|_C\ge\delta\right\}\le\mathbb P\left\{\sup_{0\le k\le n-1}\sup_{\frac kn\le t\le\frac{k+1}n}\left|W_t-W_{k/n}\right|\ge\frac\delta{2\varepsilon}\right\}\le\sum_{k=0}^{n-1}\mathbb P\left\{\sup_{\frac kn\le t\le\frac{k+1}n}\left|W_t-W_{k/n}\right|\ge\frac\delta{2\varepsilon}\right\}$$
$$=n\,\mathbb P\left\{\sup_{0\le t\le1/n}|W_t|\ge\frac\delta{2\varepsilon}\right\}=n\,\mathbb P\left\{\sup_{0\le t\le1}|W_t|\ge\frac{\delta\sqrt n}{2\varepsilon}\right\}\le n\sqrt{\frac8\pi}\exp\left[-\frac{\delta^2n}{8\varepsilon^2}\right],$$
where we have used Brownian scaling and Lemma 0.19. This completes the proof. $\square$

We can now prove the upper bound.

Lemma 0.21. For $s\ge0$ and $\delta>0$,
$$\varlimsup_{\varepsilon\searrow0}\varepsilon^2\ln\mathbb P\{\operatorname{dist}(X^\varepsilon,\Phi(s))\ge\delta\}\le-s.$$

Proof. We have that
$$\mathbb P\{\operatorname{dist}(X^\varepsilon,\Phi(s))\ge\delta\}\le\mathbb P\left\{\|X^\varepsilon-\tilde X^{n,\varepsilon}\|_C\ge\frac\delta2\right\}+\mathbb P\left\{\operatorname{dist}(\tilde X^{n,\varepsilon},\Phi(s))\ge\frac\delta2\right\}\le n\sqrt{\frac8\pi}\exp\left[-\frac{\delta^2n}{32\varepsilon^2}\right]+\mathbb P\left\{I_n(\tilde X^{n,\varepsilon})\ge s\right\}.$$
We note that
$$I_n\left(\tilde X^{n,\varepsilon}\right)=\frac{n\varepsilon^2}2\sum_{k=0}^{n-1}\left(W_{(k+1)/n}-W_{k/n}\right)^2.$$
We thus have that for $\alpha\in(0,1)$,
$$\mathbb P\left\{I_n(\tilde X^{n,\varepsilon})\ge s\right\}\le\exp\left[-\frac{s(1-\alpha)}{\varepsilon^2}\right]\mathbb E\left[\exp\left[\frac{1-\alpha}{\varepsilon^2}I_n(\tilde X^{n,\varepsilon})\right]\right]=\exp\left[-\frac{s(1-\alpha)}{\varepsilon^2}\right]\mathbb E\left[\prod_{k=0}^{n-1}\exp\left[\frac{n(1-\alpha)}2\left(W_{(k+1)/n}-W_{k/n}\right)^2\right]\right]$$
$$=\exp\left[-\frac{s(1-\alpha)}{\varepsilon^2}\right]\left(\sqrt{\frac n{2\pi}}\int_{z\in\mathbb R}\exp\left[-\frac{\alpha nz^2}2\right]dz\right)^n=\exp\left[-\frac{s(1-\alpha)}{\varepsilon^2}\right]\alpha^{-n/2}.$$
Collect things together (first let $\varepsilon\searrow0$, then $n\to\infty$ and $\alpha\searrow0$). $\square$

Theorem 0.22. We have that $\{X^\varepsilon\}_{\varepsilon>0}$ has an LDP in $C_0[0,1]$ with rate functional $I$ as in (23), which is equivalently written as
$$(25)\qquad I(\varphi)=\begin{cases}\frac12\int_{s=0}^1(\dot\varphi(s))^2\,ds&\text{if }\varphi\text{ is absolutely continuous with }\dot\varphi\in L^2[0,1]\text{ and }\varphi(0)=0\\ \infty&\text{else.}\end{cases}$$

Proof. Collect the above calculations together. $\square$
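Schilder's theorem can be probed by simulation. For the set $A=\{\varphi:\sup_t|\varphi(t)|\ge1\}$, the cheapest path in $A$ is the straight line reaching $\pm1$ at $t=1$, so $\inf_AI=1/2$ and $\varepsilon^2\ln\mathbb P\{\sup_t|\varepsilon W_t|\ge1\}$ should approach $-1/2$. A rough Monte Carlo sketch (our illustration; the convergence in $\varepsilon$ is slow because of subexponential prefactors):

```python
import numpy as np

rng = np.random.default_rng(2)

def estimate(eps, n_steps=400, n_paths=20000):
    """Estimate ε² ln P{ sup_t |εW_t| ≥ 1 } with a discretized walk."""
    dW = rng.standard_normal((n_paths, n_steps)) / np.sqrt(n_steps)
    M = np.abs(np.cumsum(dW, axis=1)).max(axis=1)  # sup of discretized |W|
    p = np.mean(M >= 1.0 / eps)
    return eps**2 * np.log(p)

results = {eps: estimate(eps) for eps in [0.7, 0.55, 0.45]}
print(results)   # all of order −0.6, slowly approaching −1/2
```

The estimates sit somewhat below $-1/2$ at these $\varepsilon$'s; only the exponential rate, not the prefactor, is captured by the LDP.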

Exercises
(1) If $W$ is a Brownian motion, show that for each $\delta>0$ and $T>0$, $\lim_{\varepsilon\to0}\mathbb P\left\{\sup_{0\le t\le T}|\varepsilon W_t|\ge\delta\right\}=0$.
(2) We here study (24).
  (a) Suppose that $0=t_0<t_1<\dots<t_n=1$. Show that
$$\sum_{j=0}^{n-1}\frac{|\varphi(t_{j+1})-\varphi(t_j)|^2}{t_{j+1}-t_j}\le\int_{t=0}^1(\dot\varphi(t))^2\,dt.$$
  (b) Fix an open subset $\mathcal O$ of $[0,1]$. Show that there is a sequence $\{\varphi_n\}_{n\in\mathbb N}\subset C([0,1];[0,1])$ such that $\varphi_n\nearrow\chi_{\mathcal O}$. Hint: construct the $\varphi_n$'s using the distance function to $[0,1]\setminus\mathcal O$.
  (c) Since Lebesgue measure on $([0,1],\mathscr B[0,1])$ is regular, show that for any $A\in\mathscr B[0,1]$ and any $\delta>0$, there is a $\varphi\in C([0,1];[0,1])$ such that
$$\int_{t=0}^1|\chi_A(t)-\varphi(t)|^2\,dt<\delta.$$
  (d) Assume that $\psi\in L^2[0,1]$. Show that for $\delta>0$, there is a $\psi^*\in C[0,1]$ such that
$$\int_{t=0}^1|\psi(t)-\psi^*(t)|^2\,dt<\delta.$$
  (e) Show that if $\varphi\in C^1[0,1]$, then
$$I(\varphi)\le\sup\left\{\frac12\sum_{j=0}^{n-1}\frac{|\varphi(t_{j+1})-\varphi(t_j)|^2}{t_{j+1}-t_j}\;\middle|\;0=t_0<t_1<\dots<t_n=1\right\}.$$
  (f) Show that if $I(\varphi)<\infty$, then for each $\delta>0$ there is a $\varphi^*\in C^1[0,1]$ such that
$$\int_{t=0}^1|\dot\varphi(t)-\dot\varphi^*(t)|^2\,dt<\delta.$$
  (g) Show that if $I(\varphi)<\infty$, then
$$I(\varphi)\le\sup\left\{\frac12\sum_{j=0}^{n-1}\frac{|\varphi(t_{j+1})-\varphi(t_j)|^2}{t_{j+1}-t_j}\;\middle|\;0=t_0<t_1<\dots<t_n=1\right\}.$$
(3) Show that since (24) holds, the level sets of $I$ are closed in $C[0,1]$.
(4) Use the contraction principle (on Brownian scaling) and the large deviations principle in $C[0,1]$ to get the large deviations principle for $\varepsilon W$ as an element of $C[0,T]$ for any $T>0$.


CHAPTER 4

Freidlin-Wentzell-Theory

Let's next extend Schilder's theorem to a nonlinear setting. Fix bounded and Lipschitz-continuous maps $b$ and $\sigma$ from $\mathbb R$ to $\mathbb R$. Define
$$\bar b\overset{\text{def}}{=}\sup_{x\in\mathbb R}|b(x)|\qquad\bar\sigma\overset{\text{def}}{=}\sup_{x\in\mathbb R}|\sigma(x)|\qquad\underline\sigma\overset{\text{def}}{=}\inf_{x\in\mathbb R}|\sigma(x)|$$
$$L_b\overset{\text{def}}{=}\sup_{\substack{x,y\in\mathbb R\\x\ne y}}\frac{|b(x)-b(y)|}{|x-y|}\qquad L_\sigma\overset{\text{def}}{=}\sup_{\substack{x,y\in\mathbb R\\x\ne y}}\frac{|\sigma(x)-\sigma(y)|}{|x-y|}.$$

Then $\bar b$, $\bar\sigma$, $L_b$ and $L_\sigma$ are finite. We also assume that $\underline\sigma>0$. Let $W$ be a Brownian motion and fix an initial point $x_\circ\in\mathbb R$. For each $\varepsilon\in[0,1)$, define
$$dX^\varepsilon_t=b(X^\varepsilon_t)\,dt+\varepsilon\sigma(X^\varepsilon_t)\,dW_t\qquad t>0$$
$$X^\varepsilon_0=x_\circ.$$
Note that $X^0$ is deterministic.
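The small-noise SDE above is easy to simulate. A minimal Euler-Maruyama sketch (our illustration, with bounded Lipschitz coefficients of our own choosing, so that $\underline\sigma=1/2>0$); as $\varepsilon\downarrow0$ the paths collapse onto the deterministic flow $X^0$, which is the content of Lemma 0.23 below:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative coefficients (an assumption, not from the text):
b = lambda x: -np.sin(x)
sigma = lambda x: 1.0 + 0.5 * np.cos(x)   # inf σ = 1/2 > 0

def euler_maruyama(eps, x0=1.0, T=1.0, n=1000, paths=1):
    """Simulate X^ε_T for the SDE dX^ε = b(X^ε)dt + εσ(X^ε)dW."""
    dt = T / n
    X = np.full(paths, x0, dtype=float)
    for _ in range(n):
        dW = rng.standard_normal(paths) * np.sqrt(dt)
        X = X + b(X) * dt + eps * sigma(X) * dW
    return X

x_det = euler_maruyama(0.0)[0]            # the deterministic flow X^0_T
msqs = {}
for eps in [0.5, 0.1, 0.02]:
    XT = euler_maruyama(eps, paths=5000)
    msqs[eps] = np.mean((XT - x_det) ** 2)
print(msqs)   # mean-square gap E|X^ε_T − X^0_T|² shrinks with ε
```

The mean-square gap scales like $\varepsilon^2$, consistent with a Gronwall-type argument (exercise 1 of this chapter).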

Lemma 0.23. For each $T>0$, we have that $\lim_{\varepsilon\to0}\sup_{0\le t\le T}\mathbb E\left[|X^\varepsilon_t-X^0_t|^2\right]=0$.

The proof is one of the problems. We want to find a large deviations principle for $X^\varepsilon$. To make things fit into the large deviations framework, let's fix $T>0$. We will treat $\{X^\varepsilon_t\mid0\le t\le T\}$ as an element of $C[0,T]$. Note that we already concluded a related result in Exercise 27 of Chapter 1; that will be central in our current calculations. Motivated by that exercise, let's define
$$(26)\qquad I(\varphi)\overset{\text{def}}{=}\begin{cases}\dfrac12\displaystyle\int_{s=0}^T\left(\frac{\dot\varphi(s)-b(\varphi(s))}{\sigma(\varphi(s))}\right)^2ds&\text{if }\varphi\text{ is absolutely continuous and }\varphi(0)=x_\circ\\[2ex] \infty&\text{else.}\end{cases}$$
Let's first prove the lower bound; the arguments are very similar to the proof of the lower bound in Schilder's theorem.

Lemma 0.24. For $G\subset C[0,T]$ open,
$$\varliminf_{\varepsilon\searrow0}\varepsilon^2\ln\mathbb P\{X^\varepsilon\in G\}\ge-\inf_{\varphi\in G}I(\varphi).$$
Proof. It is sufficient to fix $\varphi\in C[0,T]$ such that $I(\varphi)<\infty$ and show that
$$\lim_{\delta\searrow0}\varliminf_{\varepsilon\searrow0}\varepsilon^2\ln\mathbb P\left\{\|X^\varepsilon-\varphi\|_{C[0,T]}<\delta\right\}\ge-I(\varphi).$$

Since $I(\varphi)<\infty$, $\varphi$ is absolutely continuous and $\varphi(0)=x_\circ$. Define
$$\tilde X^\varepsilon_t\overset{\text{def}}{=}X^\varepsilon_t-\varphi(t)\qquad t\in[0,T]$$
$$\xi^\varepsilon_t\overset{\text{def}}{=}\frac{\dot\varphi(t)-b(\varphi(t)+\tilde X^\varepsilon_t)}{\sigma(\varphi(t)+\tilde X^\varepsilon_t)}\qquad t\in[0,T]$$
$$\tilde W^\varepsilon_t\overset{\text{def}}{=}W_t-\frac1\varepsilon\int_{s=0}^t\xi^\varepsilon_s\,ds\qquad t\in[0,T]$$
$$\tilde{\mathbb P}^\varepsilon(A)\overset{\text{def}}{=}\mathbb E\left[\chi_A\exp\left[\frac1\varepsilon\int_{s=0}^T\xi^\varepsilon_s\,dW_s-\frac1{2\varepsilon^2}\int_{s=0}^T(\xi^\varepsilon_s)^2\,ds\right]\right]\qquad A\in\mathscr F.$$

Then $\tilde W^\varepsilon$ is a $\tilde{\mathbb P}^\varepsilon$-Brownian motion and
$$d\tilde X^\varepsilon_t=\varepsilon\sigma(\tilde X^\varepsilon_t+\varphi(t))\,d\tilde W^\varepsilon_t\qquad t\in[0,T]$$
$$\tilde X^\varepsilon_0=0.$$
We then have that
$$\mathbb P\left\{\|X^\varepsilon-\varphi\|_{C[0,T]}<\delta\right\}=\tilde{\mathbb E}^\varepsilon\left[\chi_{\{\|\tilde X^\varepsilon\|_{C[0,T]}<\delta\}}\exp\left[-\frac1\varepsilon\int_{s=0}^T\xi^\varepsilon_s\,d\tilde W^\varepsilon_s-\frac1{2\varepsilon^2}\int_{s=0}^T(\xi^\varepsilon_s)^2\,ds\right]\right].$$
To proceed, we first note that for $x$ and $v$ in $\mathbb R$,
$$(x+v)^2-x^2=2xv+v^2\le2|x||v|+v^2\le2\left(|x|\sqrt{|v|}\right)\sqrt{|v|}+v^2\le x^2|v|+|v|+v^2=|v|\left(x^2+1+|v|\right).$$

With this calculation in hand, we compare $\frac12\int_0^T(\xi^\varepsilon_s)^2\,ds$ with $I(\varphi)$. We compute that
$$(\xi^\varepsilon_t)^2-\left(\frac{\dot\varphi(t)-b(\varphi(t))}{\sigma(\varphi(t))}\right)^2=\frac1{\sigma^2(\varphi(t)+\tilde X^\varepsilon_t)}\left\{\left(\left(\dot\varphi(t)-b(\varphi(t))\right)+\left(b(\varphi(t))-b(\varphi(t)+\tilde X^\varepsilon_t)\right)\right)^2-\left(\dot\varphi(t)-b(\varphi(t))\right)^2\right\}$$
$$\qquad+\left(\frac{\dot\varphi(t)-b(\varphi(t))}{\sigma(\varphi(t))}\right)^2\frac{\sigma^2(\varphi(t))-\sigma^2(\varphi(t)+\tilde X^\varepsilon_t)}{\sigma^2(\varphi(t)+\tilde X^\varepsilon_t)}.$$
Applying the elementary bound with $x=\dot\varphi(t)-b(\varphi(t))$ and $v=b(\varphi(t))-b(\varphi(t)+\tilde X^\varepsilon_t)$ (so $|v|\le L_b\delta$ on $\{\|\tilde X^\varepsilon\|_{C[0,T]}\le\delta\}$), and bounding $|\sigma^2(\varphi(t))-\sigma^2(\varphi(t)+\tilde X^\varepsilon_t)|\le2\bar\sigma L_\sigma\delta$, we get that if $\|\tilde X^\varepsilon\|_{C[0,T]}\le\delta$, then
$$\left|\frac12\int_{s=0}^T(\xi^\varepsilon_s)^2\,ds-I(\varphi)\right|\le E(\delta)$$
where
$$E(\delta)\overset{\text{def}}{=}\frac{L_b\delta}{\underline\sigma^2}\left(2\bar\sigma^2I(\varphi)+1+L_b\delta\right)+\frac{2\bar\sigma L_\sigma\delta}{\underline\sigma^2}I(\varphi);$$
what matters is only that $E(\delta)\searrow0$ as $\delta\searrow0$. Note that $\lim_{\varepsilon\searrow0}\tilde{\mathbb P}^\varepsilon\{\|\tilde X^\varepsilon\|_{C[0,T]}\ge\delta\}=0$; i.e., $\lim_{\varepsilon\searrow0}\tilde{\mathbb P}^\varepsilon\{\|\tilde X^\varepsilon\|_{C[0,T]}<\delta\}=1$. By Jensen's inequality, as in the proof of Lemma 0.18, we have that
$$\tilde{\mathbb E}^\varepsilon\left[\chi_{\{\|\tilde X^\varepsilon\|_{C[0,T]}<\delta\}}\exp\left[-\frac1\varepsilon\int_{s=0}^T\xi^\varepsilon_s\,d\tilde W^\varepsilon_s\right]\right]\ge\tilde{\mathbb P}^\varepsilon\left\{\|\tilde X^\varepsilon\|_{C[0,T]}<\delta\right\}\exp\left[-\frac1\varepsilon\sqrt{\frac{\tilde{\mathbb E}^\varepsilon\left[\int_{s=0}^T(\xi^\varepsilon_s)^2\,ds\right]}{\tilde{\mathbb P}^\varepsilon\left\{\|\tilde X^\varepsilon\|_{C[0,T]}<\delta\right\}}}\right].$$
To finish, we need to bound $\tilde{\mathbb E}^\varepsilon\left[\int_{s=0}^T(\xi^\varepsilon_s)^2\,ds\right]$. We calculate that
$$|\xi^\varepsilon_t|^2=\frac{\left(\left(\dot\varphi(t)-b(\varphi(t))\right)+\left(b(\varphi(t))-b(\varphi(t)+\tilde X^\varepsilon_t)\right)\right)^2}{\sigma^2(\varphi(t)+\tilde X^\varepsilon_t)}\le2\frac{|\dot\varphi(t)-b(\varphi(t))|^2+4\bar b^2}{\underline\sigma^2}\le\frac{2\bar\sigma^2}{\underline\sigma^2}\left(\frac{\dot\varphi(t)-b(\varphi(t))}{\sigma(\varphi(t))}\right)^2+\frac{8\bar b^2}{\underline\sigma^2}.$$
Thus
$$\int_{s=0}^T|\xi^\varepsilon_s|^2\,ds\le\frac{4\bar\sigma^2}{\underline\sigma^2}I(\varphi)+\frac{8T\bar b^2}{\underline\sigma^2}.$$
Combine things together as in the proof of Lemma 0.18. $\square$

To prove the compactness of the level sets and the upper bound, we will use the approximation of Exercise 27 of Chapter 1. For $n\in\mathbb N$, define $\tau_n(t)\overset{\text{def}}{=}\lfloor tn\rfloor/n$. For $\varepsilon\in(0,1)$, let's then define
$$dX^{n,\varepsilon}_t=b(X^{n,\varepsilon}_t)\,dt+\varepsilon\sigma\left(X^{n,\varepsilon}_{\tau_n(t)}\right)dW_t\qquad t\in[0,T]$$
$$X^{n,\varepsilon}_0=x_\circ$$
and let's define
$$(27)\qquad I_n(\varphi)\overset{\text{def}}{=}\begin{cases}\dfrac12\displaystyle\int_{t=0}^T\left(\frac{\dot\varphi(t)-b(\varphi(t))}{\sigma(\varphi(\tau_n(t)))}\right)^2dt&\text{if }\varphi\text{ is absolutely continuous and }\varphi(0)=x_\circ\\[2ex] \infty&\text{else.}\end{cases}$$

Lemma 0.25. We have that $\{X^{n,\varepsilon}\}_{\varepsilon\in(0,1)}$ has a large deviations principle in $C[0,T]$ with action functional $I_n$.

This is in the exercises. We begin by showing that $I_n$ is a good approximation of $I$. Define
$$\nu(s,\delta)\overset{\text{def}}{=}\bar\sigma\sqrt{2s}\,\delta^{1/2}+\bar b\delta$$
for all $s>0$ and $\delta>0$.

Lemma 0.26. Fix $\varphi\in C[0,T]$ and $n\in\mathbb N$. We have that
$$|\varphi(t_2)-\varphi(t_1)|\le\min\left\{\nu(I(\varphi),|t_2-t_1|),\,\nu(I_n(\varphi),|t_2-t_1|)\right\}$$
for all $0\le t_1\le t_2\le T$, and
$$I(\varphi)\le I_n(\varphi)\left(1+\frac{L_\sigma\nu(I_n(\varphi),1/n)}{\underline\sigma}\right)^2\qquad\text{and}\qquad I_n(\varphi)\le I(\varphi)\left(1+\frac{L_\sigma\nu(I(\varphi),1/n)}{\underline\sigma}\right)^2.$$

Proof. We can assume that $\varphi$ is absolutely continuous and $\varphi(0)=x_\circ$; otherwise the claims are trivially true. Fix next $\psi\in C[0,T]$, which we will take as either $\varphi$ itself or $\varphi\circ\tau_n$. If $0\le t_1\le t_2\le T$, then
$$|\varphi(t_2)-\varphi(t_1)|=\left|\int_{r=t_1}^{t_2}\dot\varphi(r)\,dr\right|=\left|\int_{r=t_1}^{t_2}\left\{\frac{\dot\varphi(r)-b(\varphi(r))}{\sigma(\psi(r))}\sigma(\psi(r))+b(\varphi(r))\right\}dr\right|$$
$$\le\sqrt{\int_{r=t_1}^{t_2}\left(\frac{\dot\varphi(r)-b(\varphi(r))}{\sigma(\psi(r))}\right)^2dr}\,\sqrt{\int_{r=t_1}^{t_2}\sigma^2(\psi(r))\,dr}+\left|\int_{r=t_1}^{t_2}b(\varphi(r))\,dr\right|\le\bar\sigma\sqrt{2J}\,\sqrt{t_2-t_1}+\bar b(t_2-t_1),$$
where $J$ is $I(\varphi)$ if $\psi=\varphi$ and $I_n(\varphi)$ if $\psi=\varphi\circ\tau_n$. This proves the first claim. To see the last two claims, fix $\psi_1$ and $\psi_2$ in $C[0,T]$. We will either take $\psi_1=\varphi$ and $\psi_2=\varphi\circ\tau_n$, or we will take $\psi_1=\varphi\circ\tau_n$ and $\psi_2=\varphi$. In either case, since $|s-\tau_n(s)|\le1/n$, we have that
$$(28)\qquad|\psi_1(s)-\psi_2(s)|\le|\varphi(s)-\varphi(\tau_n(s))|\le\min\left\{\nu(I(\varphi),1/n),\,\nu(I_n(\varphi),1/n)\right\}.$$

We proceed by calculating that
$$\frac12\int_{s=0}^T\left(\frac{\dot\varphi(s)-b(\varphi(s))}{\sigma(\psi_1(s))}\right)^2ds=\frac12\int_{s=0}^T\left(\frac{\dot\varphi(s)-b(\varphi(s))}{\sigma(\psi_2(s))}\right)^2\left(\frac{\sigma(\psi_2(s))}{\sigma(\psi_1(s))}\right)^2ds$$
$$=\frac12\int_{s=0}^T\left(\frac{\dot\varphi(s)-b(\varphi(s))}{\sigma(\psi_2(s))}\right)^2\left(1+\frac{\sigma(\psi_2(s))-\sigma(\psi_1(s))}{\sigma(\psi_1(s))}\right)^2ds\le\frac12\int_{s=0}^T\left(\frac{\dot\varphi(s)-b(\varphi(s))}{\sigma(\psi_2(s))}\right)^2\left(1+\frac{L_\sigma|\psi_1(s)-\psi_2(s)|}{\underline\sigma}\right)^2ds.$$
We can then use (28) to finish the proof of the last two claims. $\square$

We can now prove that the level sets of $I$ are compact.

Lemma 0.27. For each $s\ge0$, the set
$$\Phi(s)\overset{\text{def}}{=}\{\varphi\in C[0,T]:I(\varphi)\le s\}$$
is a compact subset of $C[0,T]$.

Proof. The first part of Lemma 0.26 implies that $\Phi(s)$ is equicontinuous (and, since every $\varphi\in\Phi(s)$ has $\varphi(0)=x_\circ$, uniformly bounded). To see that it is closed, fix a sequence $(\varphi_m)_{m\in\mathbb N}$ in $\Phi(s)$ and assume that $\varphi\overset{\text{def}}{=}\lim_{m\to\infty}\varphi_m$ is well-defined (in $C[0,T]$). For each $m\in\mathbb N$,
$$I_n(\varphi_m)\le s\left(1+\frac{L_\sigma\nu(s,1/n)}{\underline\sigma}\right)^2.$$

Since the $I_n$'s are lower semicontinuous, we thus have that
$$I_n(\varphi)\le s\left(1+\frac{L_\sigma\nu(s,1/n)}{\underline\sigma}\right)^2.$$

For $n$ large enough, we thus have that $I_n(\varphi)\le2s$, and for such $n$,
$$I(\varphi)\le I_n(\varphi)\left(1+\frac{L_\sigma\nu(2s,1/n)}{\underline\sigma}\right)^2\le s\left(1+\frac{L_\sigma\nu(2s,1/n)}{\underline\sigma}\right)^4.$$
Let $n\to\infty$ and we see that $I(\varphi)\le s$, so $\Phi(s)$ is indeed closed. $\square$

Let's next show that $X^{n,\varepsilon}$ is an exponentially good approximation of $X^\varepsilon$. This is found in [Var84].

Lemma 0.28. For each $\delta>0$, we have that
$$\lim_{n\to\infty}\varlimsup_{\varepsilon\searrow0}\varepsilon^2\ln\mathbb P\left\{\|X^\varepsilon-X^{n,\varepsilon}\|_{C[0,T]}\ge\delta\right\}=-\infty.$$
This will take some work to prove. We start by writing that
$$X^\varepsilon_t-X^{n,\varepsilon}_t=\int_{s=0}^t\left\{b(X^\varepsilon_s)-b(X^{n,\varepsilon}_s)\right\}ds+\varepsilon\int_{s=0}^t\left\{\sigma(X^\varepsilon_s)-\sigma(X^{n,\varepsilon}_s)\right\}dW_s+\varepsilon\int_{s=0}^t\left\{\sigma(X^{n,\varepsilon}_s)-\sigma\left(X^{n,\varepsilon}_{\tau_n(s)}\right)\right\}dW_s.$$
Lemma 0.29. For $\kappa>0$, we have that
$$\lim_{n\to\infty}\varlimsup_{\varepsilon\searrow0}\varepsilon^2\ln\mathbb P\left\{\sup_{0\le t\le T}\left|X^{n,\varepsilon}_t-X^{n,\varepsilon}_{\tau_n(t)}\right|\ge\kappa\right\}=-\infty.$$

Proof. If $j/n\le t<(j+1)/n$ and $n\ge2\bar b/\kappa$, we have that
$$\left|X^{n,\varepsilon}_t-X^{n,\varepsilon}_{j/n}\right|\le\varepsilon\left|\sigma\left(X^{n,\varepsilon}_{j/n}\right)\right|\left|W_t-W_{j/n}\right|+\frac{\bar b}n\le\varepsilon\bar\sigma\sup_{j/n\le r<(j+1)/n}\left|W_r-W_{j/n}\right|+\kappa/2.$$
Thus
$$\mathbb P\left\{\sup_{0\le t\le T}\left|X^{n,\varepsilon}_t-X^{n,\varepsilon}_{\tau_n(t)}\right|\ge\kappa\right\}\le\mathbb P\left\{\max_{0\le j\le\lfloor Tn\rfloor}\sup_{j/n\le t<(j+1)/n}\left|W_t-W_{j/n}\right|\ge\frac\kappa{2\varepsilon\bar\sigma}\right\}$$
$$\le Tn\,\mathbb P\left\{\sup_{0\le t\le1/n}|W_t|\ge\frac\kappa{2\varepsilon\bar\sigma}\right\}=Tn\,\mathbb P\left\{\sup_{0\le t\le1}|W_t|\ge\frac{\kappa\sqrt n}{2\varepsilon\bar\sigma}\right\}\le Tn\sqrt{\frac8\pi}\exp\left[-\frac{\kappa^2n}{8\varepsilon^2\bar\sigma^2}\right],$$
where we have used Brownian scaling and Lemma 0.19. The claim follows. $\square$

We can now bound the error between $X^\varepsilon$ and $X^{n,\varepsilon}$.

Proof of Lemma 0.28. Let's now rewrite things a bit. For $n\in\mathbb N$ and $\varepsilon$, $\delta$, and $\kappa$ in $(0,1)$, define
$$\varrho^{1,n,\varepsilon}_\delta\overset{\text{def}}{=}\inf\left\{t\in[0,T]:|X^\varepsilon_t-X^{n,\varepsilon}_t|\ge\delta\right\}\qquad(\inf\emptyset\overset{\text{def}}{=}T)$$
$$\varrho^{2,n,\varepsilon}_\kappa\overset{\text{def}}{=}\inf\left\{t\in[0,T]:\left|X^{n,\varepsilon}_t-X^{n,\varepsilon}_{\tau_n(t)}\right|\ge\kappa\right\}\qquad(\inf\emptyset\overset{\text{def}}{=}T)$$
$$\varrho^{3,n,\varepsilon}_{\delta,\kappa}\overset{\text{def}}{=}\varrho^{1,n,\varepsilon}_\delta\wedge\varrho^{2,n,\varepsilon}_\kappa.$$
Lemma 0.29 is that
$$\lim_{n\to\infty}\varlimsup_{\varepsilon\searrow0}\varepsilon^2\ln\mathbb P\left\{\varrho^{2,n,\varepsilon}_\kappa<T\right\}=-\infty$$
for $\kappa\in(0,1)$. Fix for a moment $n\in\mathbb N$ and $\varepsilon$ and $\kappa$ in $(0,1)$. We write that
$$\left\{\sup_{0\le t\le T}|X^\varepsilon_t-X^{n,\varepsilon}_t|\ge\delta\right\}=\left\{\varrho^{1,n,\varepsilon}_\delta<T\right\}\subset\left\{\varrho^{1,n,\varepsilon}_\delta<T,\ \varrho^{2,n,\varepsilon}_\kappa=T\right\}\cup\left\{\varrho^{2,n,\varepsilon}_\kappa<T\right\}.$$
Define now
$$Z^{n,\kappa,\varepsilon}_t\overset{\text{def}}{=}\left(|X^\varepsilon_t-X^{n,\varepsilon}_t|^2+\kappa^2\right)^{1/\varepsilon^2}\qquad t\in[0,T].$$
Then
$$\left\{\varrho^{1,n,\varepsilon}_\delta<T,\ \varrho^{2,n,\varepsilon}_\kappa=T\right\}\subset\left\{Z^{n,\kappa,\varepsilon}_{\varrho^{3,n,\varepsilon}_{\delta,\kappa}}\ge\left(\delta^2+\kappa^2\right)^{1/\varepsilon^2}\right\}.$$
We next study the evolution of $Z^{n,\kappa,\varepsilon}$. Set
$$f_{\varepsilon,\kappa}(x)\overset{\text{def}}{=}\left(x^2+\kappa^2\right)^{1/\varepsilon^2}\qquad x\in\mathbb R.$$
Then
$$\dot f_{\varepsilon,\kappa}(x)=\frac1{\varepsilon^2}\left(x^2+\kappa^2\right)^{1/\varepsilon^2-1}(2x)$$
$$\ddot f_{\varepsilon,\kappa}(x)=\frac1{\varepsilon^2}\left(\frac1{\varepsilon^2}-1\right)\left(x^2+\kappa^2\right)^{1/\varepsilon^2-2}4x^2+\frac2{\varepsilon^2}\left(x^2+\kappa^2\right)^{1/\varepsilon^2-1}$$
for all $x\in\mathbb R$. Note that
$$|\dot f_{\varepsilon,\kappa}(x)|\le\frac2{\varepsilon^2}\left(x^2+\kappa^2\right)^{1/\varepsilon^2-1/2}\qquad\text{and}\qquad|\ddot f_{\varepsilon,\kappa}(x)|\le\frac6{\varepsilon^4}\left(x^2+\kappa^2\right)^{1/\varepsilon^2-1}$$
for all $x\in\mathbb R$. By Itô's formula,
$$dZ^{n,\kappa,\varepsilon}_t=\dot f_{\varepsilon,\kappa}(X^\varepsilon_t-X^{n,\varepsilon}_t)\left\{b(X^\varepsilon_t)-b(X^{n,\varepsilon}_t)\right\}dt+\varepsilon\dot f_{\varepsilon,\kappa}(X^\varepsilon_t-X^{n,\varepsilon}_t)\left\{\sigma(X^\varepsilon_t)-\sigma\left(X^{n,\varepsilon}_{\tau_n(t)}\right)\right\}dW_t$$
$$\qquad+\frac{\varepsilon^2}2\ddot f_{\varepsilon,\kappa}(X^\varepsilon_t-X^{n,\varepsilon}_t)\left\{\sigma(X^\varepsilon_t)-\sigma\left(X^{n,\varepsilon}_{\tau_n(t)}\right)\right\}^2dt.$$
We have that
$$\dot f_{\varepsilon,\kappa}(X^\varepsilon_t-X^{n,\varepsilon}_t)\left\{b(X^\varepsilon_t)-b(X^{n,\varepsilon}_t)\right\}\le\frac{2L_b}{\varepsilon^2}Z^{n,\kappa,\varepsilon}_t.$$
If $0\le t\le\varrho^{2,n,\varepsilon}_\kappa$, then
$$\left|\sigma(X^\varepsilon_t)-\sigma\left(X^{n,\varepsilon}_{\tau_n(t)}\right)\right|^2\le L_\sigma^2\left\{|X^\varepsilon_t-X^{n,\varepsilon}_t|+\kappa\right\}^2\le2L_\sigma^2\left\{|X^\varepsilon_t-X^{n,\varepsilon}_t|^2+\kappa^2\right\}$$
and thus
$$\frac{\varepsilon^2}2\ddot f_{\varepsilon,\kappa}(X^\varepsilon_t-X^{n,\varepsilon}_t)\left\{\sigma(X^\varepsilon_t)-\sigma\left(X^{n,\varepsilon}_{\tau_n(t)}\right)\right\}^2\le\frac{6L_\sigma^2}{\varepsilon^2}Z^{n,\kappa,\varepsilon}_t.$$
Note also that $Z^{n,\kappa,\varepsilon}_0=\kappa^{2/\varepsilon^2}$. Set now $\lambda\overset{\text{def}}{=}2L_b+6L_\sigma^2$; then
$$Z^{n,\kappa,\varepsilon}_{t\wedge\varrho^{3,n,\varepsilon}_{\delta,\kappa}}\le\kappa^{2/\varepsilon^2}+\frac\lambda{\varepsilon^2}\int_{s=0}^tZ^{n,\kappa,\varepsilon}_{s\wedge\varrho^{3,n,\varepsilon}_{\delta,\kappa}}\,ds+M_t$$
where $M_t$ is a martingale. Setting
$$\tilde Z^{n,\kappa,\delta,\varepsilon}_t\overset{\text{def}}{=}\exp\left[-\frac\lambda{\varepsilon^2}\left(t\wedge\varrho^{3,n,\varepsilon}_{\delta,\kappa}\right)\right]Z^{n,\kappa,\varepsilon}_{t\wedge\varrho^{3,n,\varepsilon}_{\delta,\kappa}}\qquad t\in[0,T],$$
we have that $\tilde Z^{n,\kappa,\delta,\varepsilon}$ is a supermartingale, so $\mathbb E\left[\tilde Z^{n,\kappa,\delta,\varepsilon}_T\right]\le\kappa^{2/\varepsilon^2}$. Thus

$$\mathbb P\left\{Z^{n,\kappa,\varepsilon}_{\varrho^{3,n,\varepsilon}_{\delta,\kappa}}\ge\left(\delta^2+\kappa^2\right)^{1/\varepsilon^2}\right\}\le\mathbb P\left\{\tilde Z^{n,\kappa,\delta,\varepsilon}_T\ge e^{-\lambda T/\varepsilon^2}\left(\delta^2+\kappa^2\right)^{1/\varepsilon^2}\right\}\le\frac{\mathbb E\left[\tilde Z^{n,\kappa,\delta,\varepsilon}_T\right]e^{\lambda T/\varepsilon^2}}{\left(\delta^2+\kappa^2\right)^{1/\varepsilon^2}}\le\left(\frac{\kappa^2e^{\lambda T}}{\delta^2+\kappa^2}\right)^{1/\varepsilon^2}.$$
Thus
$$\lim_{\kappa\searrow0}\varlimsup_{\varepsilon\searrow0}\varepsilon^2\ln\mathbb P\left\{\varrho^{1,n,\varepsilon}_\delta<T,\ \varrho^{2,n,\varepsilon}_\kappa=T\right\}\le\lim_{\kappa\searrow0}\ln\frac{\kappa^2e^{\lambda T}}{\delta^2+\kappa^2}=-\infty.$$
Collecting things together, we get the desired result. $\square$

Exercises
(1) Prove Lemma 0.23. Hint: use something like Gronwall's inequality.
(2) We want to prove Lemma 0.25. For $\varphi\in C[0,T]$, define $T_n\varphi\in C[0,T]$ as the solution of
$$(T_n\varphi)(t)=x_\circ+\int_{s=0}^tb((T_n\varphi)(s))\,ds+\sigma(\varphi(\tau_n(t)))\left(\varphi(t)-\varphi(\tau_n(t))\right)+\sum_{0\le j\le\lfloor tn\rfloor-1}\sigma\left(\varphi\left(\frac jn\right)\right)\left(\varphi\left(\frac{j+1}n\right)-\varphi\left(\frac jn\right)\right)\qquad t\in[0,T].$$
  (a) Show that $T_n$ is well-defined and continuous.
  (b) Show that $X^{n,\varepsilon}=T_n(\varepsilon W)$.
  (c) Show that (27) is the action functional for $X^{n,\varepsilon}$.

CHAPTER 5

Large Deviations for Markov Chains

Large deviations studies 'atypical' behavior. The law of large numbers gives us one framework of 'typical' behavior. Let's see that we also have a notion of 'typical' behavior for a Markov chain.
To begin, the state space of our Markov chain will be a finite set $S$. The paths of the Markov chain will thus be elements of $\Omega\overset{\text{def}}{=}S^{\mathbb N}$. For each $n\in\mathbb N\cup\{0\}$ and $\omega=(\omega_0,\omega_1,\dots)$, define the coordinate random variable $X_n(\omega)\overset{\text{def}}{=}\omega_n$. Fix a transition matrix $P=(p(s,s')\mid s,s'\in S)$ and an initial distribution $\pi=(\pi(s)\mid s\in S)$; we then consider the probability measure $\mathbb P$ on $(\Omega,\mathscr B(\Omega))$ defined by requiring that
$$\mathbb P\left(\bigcap_{n=0}^N\{X_n=s_n\}\right)=\pi(s_0)\prod_{n=0}^{N-1}p(s_n,s_{n+1})$$
for all $N\in\mathbb N$ and all $(s_0,s_1,\dots s_N)\in S^{N+1}$. We are interested in the asymptotic behavior of the occupation measure of $X$. To set up our notation, define
$$\mathscr P_\circ(S)\overset{\text{def}}{=}\left\{\pi\in\mathbb R_+^S:\sum_{s\in S}\pi_s=1\right\};$$
this is of course equivalent to the collection of probability measures on $S$. For each $s\in S$ and $N\in\mathbb N$, define
$$L_N(s)\overset{\text{def}}{=}\frac1N\sum_{n=1}^N\chi_{\{s\}}(X_n).$$
Clearly $L_N$ is a random element of $\mathscr P_\circ(S)$. We topologize $\mathscr P_\circ(S)$ as a subset of $\ell^1(S)$. The "typical" behavior of $L_N$ for $N$ large is given by the stationary distribution of the Markov chain. To avoid a number of details, recall a standard notion from the theory of Markov chains.
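The occupation measure $L_N$ and its 'typical' limit are easy to see numerically. A minimal sketch (the three-state chain below is our own example, not from the text): simulate the chain, form $L_N$, and compare with the stationary distribution computed as the left eigenvector of $P$ for eigenvalue 1.

```python
import numpy as np

rng = np.random.default_rng(4)

# An irreducible transition matrix on S = {0, 1, 2} (illustrative choice).
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])

# Stationary distribution μ: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu = mu / mu.sum()

# Simulate the chain and form L_N(s) = (1/N) Σ_{n≤N} χ_{{s}}(X_n).
N = 50000
x, counts = 0, np.zeros(3)
for _ in range(N):
    x = rng.choice(3, p=P[x])
    counts[x] += 1
L_N = counts / N
print("L_N:", L_N)
print("mu :", mu)
```

For large $N$ the two vectors nearly coincide, which is the convergence $L_N\to\mu$ asserted in Lemma 0.30 below.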

Definition 0.2. The Markov chain $P$ is irreducible if for every distinct $s$ and $s'$ in $S$ there is an $N\in\mathbb N$ and an $(s_0,s_1,s_2\dots s_N)\in S^{N+1}$ with $s_0=s$ and $s_N=s'$ such that $\prod_{n=0}^{N-1}p(s_n,s_{n+1})>0$.

We then have a standard result from the theory of Markov chains.

Lemma 0.30. If $P$ is irreducible, it has a unique stationary distribution $\mu\in\mathscr P_\circ(S)$; i.e., a unique $\mu\in\mathscr P_\circ(S)$ such that
$$\mu(s)=\sum_{s'\in S}\mu(s')p(s',s)\qquad s\in S.$$
Furthermore,
$$\lim_{N\to\infty}L_N=\mu$$
in probability.

We will prove this in Subsection 0.4. Define $T_P:B(S)\to B(S)$ as
$$(T_Pf)(s)\overset{\text{def}}{=}\sum_{s'\in S}p(s,s')f(s')$$
for all $f\in B(S)$ and $s\in S$. This is the transition operator associated with $P$.

Let's now see if we can understand "unlikely" events. For $q\in\mathscr P_\circ(S)$, define
$$(29)\qquad I(q)\overset{\text{def}}{=}\sup_{\Phi\in B(S)}\sum_{s\in S}\ln\left(\frac{e^{\Phi(s)}}{(T_Pe^\Phi)(s)}\right)q(s).$$

Lemma 0.31. We have that $I:\mathscr P_\circ(S)\to[0,\infty]$. Furthermore, $\{q\in\mathscr P_\circ(S):I(q)\le L\}\subset\subset\mathscr P_\circ(S)$ for all $L\ge0$.

Proof. First note that by taking $\Phi$ constant, we get that $I\ge0$. As in our earlier calculations, for each $L\ge0$ we have that
$$\{q\in\mathscr P_\circ(S):I(q)\le L\}=\bigcap_{\Phi\in B(S)}\left\{q\in\mathscr P_\circ(S):\sum_{s\in S}\ln\left(\frac{e^{\Phi(s)}}{(T_Pe^\Phi)(s)}\right)q(s)\le L\right\}.$$
Since the map $q\mapsto\sum_{s\in S}\ln\left(\frac{e^{\Phi(s)}}{(T_Pe^\Phi)(s)}\right)q(s)$ is continuous, $\{q\in\mathscr P_\circ(S):I(q)\le L\}$ is an intersection of closed sets and is thus closed. Since $\mathscr P_\circ(S)$ is compact, we can conclude the proof. $\square$
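One expects $I(\mu)=0$ for the stationary distribution $\mu$: for any $\Phi\in B(S)$, Jensen's inequality gives $\ln(T_Pe^\Phi)\ge T_P\Phi$ pointwise, so stationarity forces $\sum_s\ln(e^{\Phi(s)}/(T_Pe^\Phi)(s))\,\mu(s)\le\sum_s\{\Phi(s)-(T_P\Phi)(s)\}\mu(s)=0$, with equality for constant $\Phi$. A small numerical check of this (the chain is an illustrative choice of ours):

```python
import numpy as np

rng = np.random.default_rng(5)

P = np.array([[0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4],
              [0.6, 0.1, 0.3]])
w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu = mu / mu.sum()

def functional(Phi, q):
    """Σ_s ln( e^{Φ(s)} / (T_P e^Φ)(s) ) q(s), the quantity inside (29)."""
    return np.dot(q, Phi - np.log(P @ np.exp(Phi)))

vals = [functional(rng.normal(size=3, scale=2.0), mu) for _ in range(1000)]
print(max(vals))                      # every value ≤ 0
print(functional(np.ones(3), mu))     # = 0 at constant Φ
```

So the supremum in (29) is attained (at the constants) and $I(\mu)=0$, consistent with $\mu$ being the 'typical' value of $L_N$.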

Lemma 0.32. Fix $q\in\mathscr P_\circ(S)$. If $I(q)<\infty$, we have that
$$\lim_{\delta\searrow0}\lim_{N\to\infty}\frac1N\ln\mathbb P\{\|L_N-q\|<\delta\}+I(q)=0.$$
If $I(q)=\infty$, then
$$\lim_{\delta\searrow0}\varlimsup_{N\to\infty}\frac1N\ln\mathbb P\{\|L_N-q\|<\delta\}=-\infty.$$

Proof. Fix $\Phi\in B(S)$ and define
$$\tilde\Phi(s)\overset{\text{def}}{=}\ln\frac{e^{\Phi(s)}}{(T_Pe^\Phi)(s)}=\Phi(s)-\ln\left(T_Pe^\Phi\right)(s)\qquad s\in S.$$
We now write that
$$\mathbb P\{\|L_N-q\|<\delta\}=\mathbb E\left[\chi_{\{\|L_N-q\|<\delta\}}\exp\left[-N\sum_{s\in S}\tilde\Phi(s)L_N(s)\right]\exp\left[N\sum_{s\in S}\tilde\Phi(s)L_N(s)\right]\right].$$

First note that if $\|L_N-q\|<\delta$, then
$$\left|\sum_{s\in S}\tilde\Phi(s)L_N(s)-\sum_{s\in S}\tilde\Phi(s)q(s)\right|=\left|\sum_{s\in S}\tilde\Phi(s)(L_N(s)-q(s))\right|\le\|\tilde\Phi\|_\infty\delta.$$

Secondly,
$$\exp\left[N\sum_{s\in S}\tilde\Phi(s)L_N(s)\right]=\exp\left[\sum_{n=1}^N\tilde\Phi(X_n)\right]=\exp\left[\sum_{n=1}^N\Phi(X_n)-\sum_{n=1}^N\ln\left(T_Pe^\Phi\right)(X_n)\right]$$
$$=\exp\left[\sum_{n=1}^N\Phi(X_n)-\sum_{n=0}^{N-1}\ln\left(T_Pe^\Phi\right)(X_n)\right]\frac{\left(T_Pe^\Phi\right)(X_0)}{\left(T_Pe^\Phi\right)(X_N)}=\left\{\prod_{n=1}^N\frac{e^{\Phi(X_n)}}{\left(T_Pe^\Phi\right)(X_{n-1})}\right\}\frac{\left(T_Pe^\Phi\right)(X_0)}{\left(T_Pe^\Phi\right)(X_N)}.$$
For $N\in\mathbb N$, define the measure $\tilde{\mathbb P}_N$ on $(\Omega,\mathscr B(\Omega))$ as
$$\tilde{\mathbb P}_N(A)\overset{\text{def}}{=}\mathbb E\left[\chi_A\prod_{n=1}^N\frac{e^{\Phi(X_n)}}{\left(T_Pe^\Phi\right)(X_{n-1})}\right]\qquad A\in\mathscr B(\Omega).$$
To understand the measure $\tilde{\mathbb P}_N$, define the matrix $\tilde P^\Phi=\left(\tilde p^\Phi(s,s')\,;\ s,s'\in S\right)$ as
$$\tilde p^\Phi(s,s')\overset{\text{def}}{=}\frac{p(s,s')e^{\Phi(s')}}{\left(T_Pe^\Phi\right)(s)}\qquad s,s'\in S.$$

Clearly $\tilde P^\Phi$ is a stochastic matrix. For any $(s_0,s_1\dots s_N)\in S^{N+1}$,
$$\tilde{\mathbb P}_N\left(\bigcap_{n=0}^N\{X_n=s_n\}\right)=\mathbb E\left[\left\{\prod_{n=0}^N\chi_{\{X_n=s_n\}}\right\}\left\{\prod_{n=1}^N\frac{e^{\Phi(s_n)}}{\left(T_Pe^\Phi\right)(s_{n-1})}\right\}\right]=\pi(s_0)\prod_{n=1}^N\tilde p^\Phi(s_{n-1},s_n).$$

Thus $\tilde{\mathbb P}_N$ is a probability measure and $\{X_n\}_{n=0}^N$ is Markovian with transition matrix $\tilde P^\Phi$ under $\tilde{\mathbb P}_N$. Collecting things together, we now have that
$$(30)\qquad\mathbb P\{\|L_N-q\|<\delta\}=\tilde{\mathbb E}_N\left[\chi_{\{\|L_N-q\|<\delta\}}\exp\left[-N\sum_{s\in S}\tilde\Phi(s)L_N(s)\right]\frac{\left(T_Pe^\Phi\right)(X_0)}{\left(T_Pe^\Phi\right)(X_N)}\right].$$

Suppose now that $I(q)<\infty$ and that the supremum in (29) is achieved at some $\Phi\in B(S)$; then
$$I(q)=\sum_{s\in S}q(s)\ln\frac{e^{\Phi(s)}}{\left(T_Pe^\Phi\right)(s)}=\sup_{\Phi'\in B(S)}\sum_{s\in S}\left\{\Phi'(s)-\ln\left(T_Pe^{\Phi'}\right)(s)\right\}q(s).$$
The first-order conditions of optimality are thus that
$$(31)\qquad\sum_{s\in S}\left\{\eta(s)-\frac{\left(T_P(e^\Phi\eta)\right)(s)}{\left(T_Pe^\Phi\right)(s)}\right\}q(s)=0$$
for all $\eta\in B(S)$. We see that
$$\frac{\left(T_P(e^\Phi\eta)\right)(s)}{\left(T_Pe^\Phi\right)(s)}=\left(\tilde T_\Phi\eta\right)(s)$$
for all $s\in S$, where $\tilde T_\Phi$ is the transition operator of $\tilde P^\Phi$. Thus (31) means that
$$q(s')=\sum_{s\in S}q(s)\,\tilde p^\Phi(s,s')$$
for all $s'\in S$, so $q$ is a stationary distribution for $\tilde P^\Phi$. In fact, by Lemmas 0.35 and 0.36, we then know that $\lim_{N\to\infty}\tilde{\mathbb P}_N\{\|L_N-q\|<\delta\}=1$. Since $\Phi\in B(S)$,
$$e^{-2\|\Phi\|_\infty}\le\frac{\left(T_Pe^\Phi\right)(X_0)}{\left(T_Pe^\Phi\right)(X_N)}\le e^{2\|\Phi\|_\infty}.$$
Combining things, we get the claim when $I(q)<\infty$.

Suppose next that $I(q)=\infty$. Fix $\Phi\in B(S)$; from (30) we have that
$$\lim_{\delta\searrow0}\varlimsup_{N\to\infty}\frac1N\ln\mathbb P\{\|L_N-q\|<\delta\}\le-\sum_{s\in S}\ln\left(\frac{e^{\Phi(s)}}{\left(T_Pe^\Phi\right)(s)}\right)q(s).$$

Since $I(q)=\infty$, upon letting $\Phi$ vary, we get the desired claim. $\square$

0.4. Proofs. For each $s\in S$, define
$$(32)\qquad\tau_s\overset{\text{def}}{=}\inf\{n\ge1:X_n=s\}$$

and let $\mathbb P_s$ be the probability measure on $(\Omega,\mathscr B(\Omega))$ defined by requiring that
$$\mathbb P_s\left(\bigcap_{n=0}^N\{X_n=s_n\}\right)=\delta_s(s_0)\prod_{n=0}^{N-1}p(s_n,s_{n+1})$$
for all $N\in\mathbb N$ and all $(s_0,s_1,\dots s_N)\in S^{N+1}$. Thus of course
$$\mathbb P(A)=\sum_{s\in S}\mathbb P_s(A)\,\pi(s)\qquad A\in\mathscr B(\Omega).$$
For $s\in S$, define

$$\tau^1_s\overset{\text{def}}{=}\min\{n>0:X_n=s\}$$
$$\tau^k_s\overset{\text{def}}{=}\min\{n>\tau^{k-1}_s:X_n=s\}\qquad k\in\mathbb N\quad(\min\emptyset\overset{\text{def}}{=}\infty).$$

Note that $\tau_s=\tau^1_s$. For each $k\in\mathbb N$, let's also define
$$S^k_s\overset{\text{def}}{=}\begin{cases}\tau^{k+1}_s-\tau^k_s&\text{if }\tau^k_s<\infty\\0&\text{else.}\end{cases}$$
Also, for each $s\in S$, define
$$V^n_s\overset{\text{def}}{=}\sum_{j=1}^n\chi_{\{X_j=s\}}\qquad\text{and}\qquad V^\infty_s\overset{\text{def}}{=}\sum_{j=1}^\infty\chi_{\{X_j=s\}}=\lim_{n\to\infty}V^n_s.$$
Thus $V^n_s$ is the number of times $X$ visits $s$ during the set of times $\{1,2\dots n\}$. Note that
$$L_n(s)=\frac1nV^n_s.$$
We start by understanding the structure of the $S^k_s$'s.

Lemma 0.33. Fix $s$ and $s'$ in $S$, $A\in\mathscr F_{\tau^k_s}$, and $k$ and $n$ in $\mathbb N$. Then
$$\mathbb P_{s'}\left(\{S^k_s=n\}\cap A\cap\{\tau^k_s<\infty\}\right)=\mathbb P_s\{\tau_s=n\}\,\mathbb P_{s'}\left(A\cap\{\tau^k_s<\infty\}\right).$$
Thus, conditionally on $\tau^k_s<\infty$, $S^k_s$ is independent of $\mathscr F_{\tau^k_s}$ and has the $\mathbb P_s$-law of $\tau_s$.

Proof. We have that
$$\mathbb P_{s'}\left(\{S^k_s=n\}\cap A\cap\{\tau^k_s<\infty\}\right)=\sum_{m=0}^\infty\mathbb P_{s'}\left(\{S^k_s=n\}\cap A\cap\{\tau^k_s=m\}\right).$$
Since $\tau^k_s$ is a stopping time,
$$A\cap\{\tau^k_s=m\}=\left(A\cap\{\tau^k_s\le m\}\right)\setminus\left(A\cap\{\tau^k_s\le m-1\}\right)\in\mathscr F_m$$
if $m\in\mathbb N$ is positive; of course $A\cap\{\tau^k_s=0\}=A\cap\{\tau^k_s\le0\}\in\mathscr F_0$. Thus for any $m\in\{0,1,\dots\}$,

$$\mathbb P_{s'}\left(\{S^k_s=n\}\cap A\cap\{\tau^k_s=m\}\right)=\mathbb P_{s'}\left(\bigcap_{j=1}^{n-1}\{X_{m+j}\ne s\}\cap\{X_{m+n}=s\}\cap A\cap\{\tau^k_s=m\}\right)$$

$$=\mathbb P_s\left(\bigcap_{j=1}^{n-1}\{X_j\ne s\}\cap\{X_n=s\}\right)\mathbb P_{s'}\left(A\cap\{\tau^k_s=m\}\right),$$
by the Markov property (on $\{\tau^k_s=m\}$ we have $X_m=s$).

Combine the terms to get the first claim. The second claim is an interpretation of the first. $\square$
We can next show recurrence.

0 ∞ Lemma 0.34. For all s and s in S, Ps{τs0 < ∞} = Ps{Vs0 = ∞} = 1. In particular, Ps{τs < ∞} = 1 for all s ∈ S, so P is recurrent.

Proof. First note that for any s and s0 in S and any n ∈ N ∞ n+1 n+1 n n n Ps{Vs0 ≥ n + 1} = Ps{τs0 < ∞} = Ps{τs0 < ∞, τs0 < ∞} = Ps{Ss0 < ∞, τs0 < ∞} (33) n ∞ = Ps0 {τs0 < ∞}Ps{τs0 < ∞} = Ps0 {τs0 < ∞}Ps{Vs0 ≥ n}. Observe now that X ∞ 0 X ∞ Es0 [Vs00 ] π(s ) = E [Vs00 ] = ∞. s0,s00∈S s00∈S 0 00 ∞ Thus there is an s and s in S such that Es0 [Vs00 ] = ∞. Explicitly, ∞ X p(n)(s0, s00) = ∞. n=1 00 0 Fix now s ∈ S. Since P is irreducible, there is an n0 and n00 in N such that p(n )(s, s0) > 0 and p(n )(s00, s) > 0. Thus for any n, 00 0 00 0 p(n +n+n )(s, s) ≥ p(n )(s, s0)p(n)(s0, s00)p(n )(s00, s) so in fact ∞ ∞ X (n) Es [Vs ] = p (s, s) = ∞. n=1 Rewriting this, we see that ∞ X ∞ (34) Ps{Vs ≥ n} = Es [Vs ] = ∞. n=1 ∞ If Ps{τs < ∞} < 1, then by Lemma 0.34 we have that Es [Vs ] < ∞ which contradicts (34) (use the ratio test for series). Hence we indeed have recurrence. 0 Next fix s and s in S. By recurrence, Ps0 {τs0 < ∞} = 1. Using Lemma 0.34, we have that ∞ n−1 ∞ n Ps0 {Vs0 ≥ n} = (Ps0 {τs0 < ∞}) Ps0 {Vs0 ≥ 1} = (Ps0 {τs0 < ∞}) = 1 ∞ (n) 0 so in fact Ps0 {Vs0 = ∞} = 1. By irreducibility, there is an n ∈ N such that p (s, s ) > 0. Thus

 ∞   ∞  ∞ X  X X 00 0 0 1 = Ps {Vs0 = ∞} = Ps χ{Xj =s} = ∞ = Ps χ{Xj =s } = ∞,Xn = s j=n  s00∈S j=n  X (n) 0 00 ∞ = p (s , s )Ps00 {Vs0 = ∞} . s00∈S P (n) 0 00 ∞ Since s00∈S p (s , s ) = 1, we must in fact have that Ps {Vs0 = ∞} = 1. Furthermore, ∞ 1 = Ps {Vs0 = ∞} ≤ Ps{τs0 < ∞} = 1. The proof is complete.  For each s ∈ S, define ( 1 def if Es[τs] < ∞ µ(s) = Es[τs] 0 else Then we have

Lemma 0.35. Since $P$ is irreducible, $\lim_{N\to\infty} L_N(s) = \mu(s)$ in probability for each $s \in S$.

Proof. Fix $s \in S$. Lemmas 0.33 and 0.34 imply that $\{S_s^k\}_{k \in \mathbb{N}}$ is an i.i.d. collection of random variables with common law the $\mathbb{P}_s$-law of $\tau_s$, and by Lemma 0.34 the $S_s^k$'s and $\tau_s$ are $\mathbb{P}$-a.s. finite. If $\mathbb{E}_s[\tau_s] < \infty$, then the strong law of large numbers implies that
\[
\mathbb{P}_s\Bigl\{\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^N S_s^n = \mathbb{E}_s[\tau_s]\Bigr\} = 1.
\]
If $\mathbb{E}_s[\tau_s] = \infty$, then by virtue of the fact that the $S_s^n$'s are nonnegative,
\[
\mathbb{P}_s\Bigl\{\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^N S_s^n \wedge M = \mathbb{E}_s[\tau_s \wedge M]\Bigr\} = 1
\]
for all $M \in \mathbb{N}$. Thus $\mathbb{P}_s$-a.s.
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^N S_s^n \ge \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^N S_s^n \wedge M = \mathbb{E}_s[\tau_s \wedge M].
\]
Letting $M \nearrow \infty$, we again get that
\[
\mathbb{P}_s\Bigl\{\lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^N S_s^n = \mathbb{E}_s[\tau_s]\Bigr\} = 1.
\]
By Lemma 0.34, we have that
\[
\mathbb{P}\Bigl\{\lim_{n\to\infty} V_s^n = \infty\Bigr\} = \mathbb{P}\{V_s^\infty = \infty\} = 1.
\]
If $V_s^n = k \in \mathbb{N}$, then $X$ visits the state $s$ exactly $k$ times during the times $\{1, 2, \dots, n\}$. This means that
\[
\tau_s + \sum_{1 \le k' \le k-1} S_s^{k'} \le n < \tau_s + \sum_{1 \le k' \le k} S_s^{k'}.
\]
In other words,
\[
\tau_s + \sum_{1 \le k' \le V_s^n - 1} S_s^{k'} \le n < \tau_s + \sum_{1 \le k' \le V_s^n} S_s^{k'}.
\]
Hence
\[
\frac{\tau_s + \sum_{1 \le k' \le V_s^n - 1} S_s^{k'}}{V_s^n} \le \frac{n}{V_s^n} < \frac{\tau_s + \sum_{1 \le k' \le V_s^n} S_s^{k'}}{V_s^n}.
\]
Consequently
\[
\mathbb{P}\Bigl\{\lim_{n\to\infty} \frac{n}{V_s^n} = \mathbb{E}_s[\tau_s]\Bigr\} = 1.
\]
This implies the claim. □

This gives us the sought-after stationary distribution.

Lemma 0.36. Since $P$ is irreducible, $\mu$ is the unique stationary distribution.

Proof. Note that $0 \le L_N(s) \le 1$, so the convergence in probability given by Lemma 0.35 implies that
\[
\mu(s) = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mathbb{P}\{X_n = s\}.
\]

Let's first see that $\mu \in \mathcal{P}_\circ(S)$. Clearly $\mu \ge 0$. Secondly, since $|S| < \infty$,
\[
\sum_{s \in S} \mu(s) = \sum_{s \in S} \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mathbb{P}\{X_n = s\} = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \sum_{s \in S} \mathbb{P}\{X_n = s\} = 1.
\]
Let's next see that $\mu$ is indeed a stationary distribution. For any $s \in S$,

\[
\mu(s) = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mathbb{P}\{X_n = s\}
= \lim_{N\to\infty} \frac{1}{N} \Bigl(\pi(s) + \sum_{n=1}^{N-1} \sum_{s' \in S} \mathbb{P}\{X_{n-1} = s'\}\, p(s', s)\Bigr)
\]
\[
= \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-2} \sum_{s' \in S} \mathbb{P}\{X_n = s'\}\, p(s', s)
= \sum_{s' \in S} p(s', s) \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-2} \mathbb{P}\{X_n = s'\}
= \sum_{s' \in S} \mu(s')\, p(s', s).
\]

Fix next any stationary distribution $\tilde\mu \in \mathcal{P}_\circ(S)$. Since $\mu \in \mathcal{P}_\circ(S)$, there is an $s^* \in S$ such that $\mu(s^*) > 0$; i.e., $\mathbb{E}_{s^*}[\tau_{s^*}] < \infty$. Fix $s \in S$. Then

\[
\tilde\mu(s) = \sum_{s_1 \in S} \tilde\mu(s_1)\, p(s_1, s) = \sum_{\substack{s_1 \in S \\ s_1 \ne s^*}} \tilde\mu(s_1)\, p(s_1, s) + \tilde\mu(s^*)\, p(s^*, s)
\]
\[
= \sum_{\substack{s_1, s_2 \in S \\ s_1 \ne s^*}} \tilde\mu(s_2)\, p(s_2, s_1)\, p(s_1, s) + \tilde\mu(s^*)\, p(s^*, s)
\]
\[
(35)\qquad = \sum_{\substack{s_1, s_2 \in S \\ s_1, s_2 \ne s^*}} \tilde\mu(s_2)\, p(s_2, s_1)\, p(s_1, s) + \sum_{\substack{s_1 \in S \\ s_1 \ne s^*}} \tilde\mu(s^*)\, p(s^*, s_1)\, p(s_1, s) + \tilde\mu(s^*)\, p(s^*, s)
\]
\[
= \sum_{\substack{s_1, s_2 \in S \\ s_1, s_2 \ne s^*}} \tilde\mu(s_2)\, p(s_2, s_1)\, p(s_1, s) + \tilde\mu(s^*) \Bigl\{ \sum_{\substack{s_1 \in S \\ s_1 \ne s^*}} p(s^*, s_1)\, p(s_1, s) + p(s^*, s) \Bigr\}.
\]
Note that
\[
p(s^*, s) = \mathbb{P}_{s^*}\{X_1 = s\} = \mathbb{P}_{s^*}\{X_1 = s,\ \tau_{s^*} > 0\},
\]
\[
\sum_{\substack{s_1 \in S \\ s_1 \ne s^*}} p(s^*, s_1)\, p(s_1, s) = \mathbb{P}_{s^*}\{X_1 \ne s^*,\ X_2 = s\} = \mathbb{P}_{s^*}\{X_2 = s,\ \tau_{s^*} > 1\}.
\]
Extending (35) and using the fact that all the terms are nonnegative, we get that $\tilde\mu(s) \ge \tilde\mu(s^*)\gamma(s)$, where
\[
\gamma(s) \stackrel{\text{def}}{=} \sum_{n=0}^\infty \mathbb{P}_{s^*}\{X_n = s,\ \tau_{s^*} > n\} = \mathbb{E}_{s^*}\Bigl[\sum_{n=0}^{\tau_{s^*}-1} \chi_{\{X_n = s\}}\Bigr].
\]

Since this holds for all $s \in S$, we have that $\tilde\mu \ge \tilde\mu(s^*)\gamma$. Note that since $\mathbb{E}_{s^*}[\tau_{s^*}] < \infty$, $X_0 = X_{\tau_{s^*}} = s^*$ $\mathbb{P}_{s^*}$-a.s. Thus
\[
\gamma(s) = \mathbb{E}_{s^*}\Bigl[\sum_{n=1}^{\tau_{s^*}} \chi_{\{X_n = s\}}\Bigr] = \sum_{n=1}^\infty \mathbb{P}_{s^*}\{X_n = s,\ n \le \tau_{s^*}\} = \sum_{n=1}^\infty \mathbb{P}_{s^*}\{X_n = s,\ n - 1 < \tau_{s^*}\}
\]
\[
= \sum_{s' \in S} \sum_{n=1}^\infty \mathbb{P}_{s^*}\{X_n = s,\ X_{n-1} = s',\ n - 1 < \tau_{s^*}\}
= \sum_{s' \in S} \sum_{n=1}^\infty \mathbb{P}_{s^*}\Bigl(\{X_n = s\} \cap \{X_{n-1} = s'\} \cap \bigcap_{j=1}^{n-1}\{X_j \ne s^*\}\Bigr)
\]
\[
= \sum_{s' \in S} p(s', s) \sum_{n=1}^\infty \mathbb{P}_{s^*}\Bigl(\{X_{n-1} = s'\} \cap \bigcap_{j=1}^{n-1}\{X_j \ne s^*\}\Bigr)
= \sum_{s' \in S} p(s', s) \sum_{n=0}^\infty \mathbb{P}_{s^*}\{X_n = s',\ \tau_{s^*} > n\} = \sum_{s' \in S} p(s', s)\, \gamma(s').
\]

Fix $s \in S$. Since $P$ is irreducible, there is an $n \in \mathbb{N}$ such that $p^{(n)}(s, s^*) > 0$. Thus, noting that $\gamma(s^*) = 1$, we have that
\[
0 = \tilde\mu(s^*) - \tilde\mu(s^*)\gamma(s^*) = \sum_{s' \in S} p^{(n)}(s', s^*)\, \bigl\{\tilde\mu(s') - \tilde\mu(s^*)\gamma(s')\bigr\}.
\]
Since $\tilde\mu \ge \tilde\mu(s^*)\gamma$, all of the terms on the right are nonnegative. Thus
\[
0 = p^{(n)}(s, s^*)\, \bigl\{\tilde\mu(s) - \tilde\mu(s^*)\gamma(s)\bigr\},
\]
and thus in fact $\tilde\mu(s) = \tilde\mu(s^*)\gamma(s)$. Thus all stationary distributions must coincide. □

Proof of Lemma 0.30. Combine things. 
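As a sanity check on Lemmas 0.35 and 0.36, the identity $\mu(s) = 1/\mathbb{E}_s[\tau_s]$ can be tested numerically. The sketch below uses an assumed two-state transition matrix (not from the text): it computes the stationary distribution by power iteration, the expected return times by first-step analysis, and the empirical occupation frequency $L_N(s)$ from one long simulated path; the three should agree.

```python
import random

# A numerical sketch of Lemmas 0.35 and 0.36 (the two-state matrix is an
# assumed example, not from the text).  We compare three quantities that
# should agree: the stationary distribution mu (via power iteration on
# mu P = mu), 1 / E_s[tau_s] (via first-step analysis for the expected
# return time), and the empirical occupation frequency L_N(s).
P = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.4, 1: 0.6}}
S = sorted(P)

# Stationary distribution by power iteration (P is irreducible and aperiodic).
mu = {s: 1.0 / len(S) for s in S}
for _ in range(500):
    mu = {s: sum(mu[t] * P[t][s] for t in S) for s in S}

def expected_return_time(target):
    """E_target[tau_target] by first-step analysis: g(y) = expected number of
    steps to hit `target` from y, solved here by value iteration."""
    g = {s: 0.0 for s in S}
    for _ in range(2000):
        g = {s: 0.0 if s == target else
             1.0 + sum(P[s][t] * g[t] for t in S) for s in S}
    return 1.0 + sum(P[target][t] * g[t] for t in S)

# Empirical occupation frequency L_N(s) from one long trajectory.
rng = random.Random(0)
N, x = 200_000, 0
visits = {s: 0 for s in S}
for _ in range(N):
    x = 0 if rng.random() < P[x][0] else 1
    visits[x] += 1
L = {s: visits[s] / N for s in S}

for s in S:
    print(s, round(mu[s], 4), round(1.0 / expected_return_time(s), 4), round(L[s], 2))
```

For this particular matrix the exact answer is $\mu = (4/11, 7/11)$, so the first two columns should both read 0.3636 and 0.6364, with the simulated frequencies nearby.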

1. Pair Empirical Measure

Let's use the theory for Markov chains to look at pairs. Namely, consider

\[
L_N^{(2)}(s, s') \stackrel{\text{def}}{=} \frac{1}{N} \sum_{n=1}^N \chi_{\{(s, s')\}}(X_{n-1}, X_n), \qquad (s, s') \in S^2.
\]

This measures the frequency with which $X$ makes a transition from $s$ to $s'$. We note that $X_n^{(2)} \stackrel{\text{def}}{=} (X_n, X_{n+1})$ is Markovian with state space $S^2$ and transition matrix $P^{(2)} = \bigl(p^{(2)}((s_1, s_2), (s_1', s_2')) : (s_1, s_2), (s_1', s_2') \in S^2\bigr)$ defined by

\[
p^{(2)}\bigl((s_1, s_2), (s_1', s_2')\bigr) = \begin{cases} p(s_2, s_2') & \text{if } s_1' = s_2 \\ 0 & \text{else.} \end{cases}
\]
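To make the pair chain concrete, here is a short numerical sketch (the two-state matrix and all variable names are assumed, not from the text). It builds $P^{(2)}$ from $P$ and checks that $P^{(2)}$ is again a transition matrix, and that $q(s, s') = \mu(s)p(s, s')$ is stationary for it.

```python
# A sketch (assumed two-state example, not from the text) of the pair chain:
# P2[(s1,s2)][(s1p,s2p)] = p(s2, s2p) if s1p == s2, else 0.  We check that P2
# is again a transition matrix and that q(s,s') = mu(s) p(s,s') is stationary.
P = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.4, 1: 0.6}}
S = sorted(P)
pairs = [(a, b) for a in S for b in S]

P2 = {x: {y: (P[x[1]][y[1]] if y[0] == x[1] else 0.0) for y in pairs}
      for x in pairs}

# Every row of P2 sums to one, so P2 is a bona fide transition matrix on S^2.
row_sums = [sum(P2[x].values()) for x in pairs]

# mu is the stationary distribution of P (exact for this matrix), and
# q(s,s') = mu(s) p(s,s') satisfies sum_x q(x) P2[x][y] = q(y) for all y.
mu = {0: 4 / 11, 1: 7 / 11}
q = {(a, b): mu[a] * P[a][b] for (a, b) in pairs}
errs = [abs(sum(q[x] * P2[x][y] for x in pairs) - q[y]) for y in pairs]

print(all(abs(r - 1.0) < 1e-12 for r in row_sums), max(errs) < 1e-12)
```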

We also define $T_{P^{(2)}} : B(S^2) \to B(S^2)$ as

\[
(36)\qquad (T_{P^{(2)}} f)(s_1, s_2) \stackrel{\text{def}}{=} \sum_{(s_1', s_2') \in S^2} p^{(2)}\bigl((s_1, s_2), (s_1', s_2')\bigr) f(s_1', s_2') = \sum_{s_2' \in S} f(s_2, s_2')\, p(s_2, s_2').
\]
Define

\[
(37)\qquad I^{(2)}(q) \stackrel{\text{def}}{=} \sup_{\Phi \in B(S^2)} \sum_{(s_1, s_2) \in S^2} q(s_1, s_2) \ln \frac{e^{\Phi(s_1, s_2)}}{(T_{P^{(2)}} e^\Phi)(s_1, s_2)}
\]

for all $q \in \mathcal{P}_\circ(S^2)$. We are interested in a simpler formulation of this expression. For $q \in \mathcal{P}_\circ(S^2)$, define

\[
q_1(s) \stackrel{\text{def}}{=} \sum_{s' \in S} q(s, s') \qquad \text{and} \qquad q_2(s) \stackrel{\text{def}}{=} \sum_{s' \in S} q(s', s)
\]
for all $s \in S$. If $q_1(s) > 0$, define
\[
q_{2|1}(s'|s) \stackrel{\text{def}}{=} \frac{q(s, s')}{q_1(s)}, \qquad s' \in S.
\]

Lemma 1.1. We have that
\[
I^{(2)}(q) = \begin{cases} \displaystyle\sum_{s \in S} q_1(s)\, H\bigl(q_{2|1}(\cdot|s) \,\big|\, p(s, \cdot)\bigr) & \text{if } q_1 = q_2 \\ \infty & \text{else} \end{cases}
\]
for all $q \in \mathcal{P}_\circ(S^2)$.

Proof. First rewrite (37) as
\[
I^{(2)}(q) = \sup_{\Phi \in B(S^2)} \Bigl\{ \sum_{(s_1, s_2) \in S^2} q(s_1, s_2)\, \Phi(s_1, s_2) - \sum_{(s_1, s_2) \in S^2} q(s_1, s_2) \ln \sum_{s_2' \in S} e^{\Phi(s_2, s_2')}\, p(s_2, s_2') \Bigr\}
\]
\[
(38)\qquad = \sup_{\Phi \in B(S^2)} \Bigl\{ \sum_{(s_1, s_2) \in S^2} q(s_1, s_2)\, \Phi(s_1, s_2) - \sum_{s' \in S} q_2(s') \ln \sum_{s_2' \in S} e^{\Phi(s', s_2')}\, p(s', s_2') \Bigr\}.
\]
We use here the fact that the right-hand side of (36) doesn't depend on $s_1$. If $q_1 = q_2$, then we can continue (38) as
\[
I^{(2)}(q) = \sup_{\Phi \in B(S^2)} \Bigl\{ \sum_{(s, s') \in S^2} q(s, s')\, \Phi(s, s') - \sum_{s \in S} q_1(s) \ln \sum_{s' \in S} e^{\Phi(s, s')}\, p(s, s') \Bigr\}
\]
\[
= \sup_{\{\phi_s\}_{s \in S} \subset B(S)} \sum_{s \in S} q_1(s) \Bigl\{ \sum_{s' \in S} q_{2|1}(s'|s)\, \phi_s(s') - \ln \sum_{s' \in S} e^{\phi_s(s')}\, p(s, s') \Bigr\}
= \sum_{s \in S} q_1(s)\, H\bigl(q_{2|1}(\cdot|s) \,\big|\, p(s, \cdot)\bigr).
\]

Next suppose that $q_1 \ne q_2$. Then there is an $s^* \in S$ such that $q_1(s^*) < q_2(s^*)$ (we use here the fact that both $q_1$ and $q_2$ sum up to 1). For each $\alpha \in \mathbb{R}$, define

\[
\Phi_\alpha(s, s') = \begin{cases} \alpha & \text{if } s = s^* \\ 0 & \text{else.} \end{cases}
\]

Then
\[
I^{(2)}(q) \ge \sup_{\alpha \in \mathbb{R}} \Bigl\{ \sum_{(s, s') \in S^2} q(s, s')\, \Phi_\alpha(s, s') - \sum_{s' \in S} q_2(s') \ln \sum_{s_2 \in S} e^{\Phi_\alpha(s', s_2)}\, p(s', s_2) \Bigr\}
= \sup_{\alpha \in \mathbb{R}} \bigl\{ \alpha q_1(s^*) - \alpha q_2(s^*) \bigr\} = \sup_{\alpha \in \mathbb{R}} \alpha \bigl( q_1(s^*) - q_2(s^*) \bigr) = \infty,
\]
since $q_1(s^*) - q_2(s^*) < 0$ and we may let $\alpha \to -\infty$.

This completes the proof. 

When q1 = q2, we say that q is shift-invariant.
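The formula in Lemma 1.1 is easy to evaluate numerically. The sketch below (assumed two-state example, not from the text) returns infinity off the shift-invariant measures, vanishes at the typical pair measure $q(s, s') = \mu(s)p(s, s')$, and is strictly positive at other shift-invariant $q$'s.

```python
import math

# A sketch (assumed two-state example) of the formula in Lemma 1.1:
# I2(q) = sum_s q1(s) H(q_{2|1}(.|s) | p(s,.)) when q1 = q2, and infinity
# otherwise, where H(nu | rho) = sum nu ln(nu / rho) is relative entropy.
P = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.4, 1: 0.6}}
S = sorted(P)

def I2(q):
    q1 = {s: sum(q[(s, t)] for t in S) for s in S}
    q2 = {s: sum(q[(t, s)] for t in S) for s in S}
    if any(abs(q1[s] - q2[s]) > 1e-12 for s in S):
        return math.inf          # q is not shift-invariant
    total = 0.0
    for s in S:
        if q1[s] > 0:
            cond = {t: q[(s, t)] / q1[s] for t in S}   # q_{2|1}(.|s)
            total += q1[s] * sum(cond[t] * math.log(cond[t] / P[s][t])
                                 for t in S if cond[t] > 0)
    return total

mu = {0: 4 / 11, 1: 7 / 11}      # stationary distribution of P
q_star = {(s, t): mu[s] * P[s][t] for s in S for t in S}
q_unif = {(s, t): 0.25 for s in S for t in S}
q_asym = {(0, 0): 0.5, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.1}

print(abs(I2(q_star)) < 1e-9, I2(q_unif) > 0, I2(q_asym) == math.inf)
```

Here `q_star` is where the pair empirical measure actually converges, `q_unif` is shift-invariant but atypical, and `q_asym` has unequal marginals.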

Exercises

(1) Show that $\tau_s$ of (32) is a stopping time; i.e., for each $n \in \mathbb{N}$,
\[
\{\tau_s \le n\} \in \sigma\{X_0, X_1, \dots, X_n\}.
\]
(2) Show that thanks to Lemmas 0.33 and 0.34,
\[
\mathbb{P}_s\Bigl(\bigcap_{k=1}^K \{S_s^k = n_k\}\Bigr) = \prod_{k=1}^K \mathbb{P}_s\{S_s^k = n_k\}
\]
for all $K \in \mathbb{N}$ and $(n_1, n_2, \dots, n_K) \in \mathbb{N}^K$.
(3) Show that $I$ of (29) can be rewritten as
\[
I(q) = \sup_{\substack{\Phi \in B(S) \\ \Phi > 0}} \sum_{s \in S} \ln \Bigl\{\frac{\Phi(s)}{(T\Phi)(s)}\Bigr\} q(s) = \sup_{\substack{\Phi \in B(S) \\ 0 < \Phi \le 1}} \sum_{s \in S} \ln \Bigl\{\frac{\Phi(s)}{(T\Phi)(s)}\Bigr\} q(s).
\]
(4) Assume that $S = \{1, 2\}$. Compute $I$ of (29) for the matrices
\[
P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 1 \\ \alpha & 1 - \alpha \end{pmatrix},
\]
where for the third matrix $\alpha$ is a fixed element of $(0, 1)$. Note that for the first and second matrices, there is no randomness in the dynamics. For the first, every distribution is stationary, but for the second there is only one stationary distribution. The third matrix is irreducible.

CHAPTER 6

Refined Asymptotics

Let's now discuss some issues of refined asymptotics. A general theory would be much less instructive than a full analysis of a special case, so we return to the case of Subsection 0.2. We again define $X_N$ as in (4). Fix $\alpha \in (p, 1)$. Then we should have that

\[
\lim_{N\to\infty} \frac{1}{N} \ln \mathbb{P}\{X_N \ge \alpha\} \le -\inf_{\alpha' \ge \alpha} H(\alpha', p) < 0,
\]
so $\{X_N \ge \alpha\}$ is a rare event. Fix now a measurable map $\Psi : \mathbb{R} \to \mathbb{R}_+$. We want to compute
\[
I_N = \mathbb{E}\bigl[\Psi(X_N - \alpha)\, \chi_{[\alpha, \infty)}(X_N)\bigr] = \mathbb{E}\bigl[\Psi(X_N - \alpha)\, \chi_{[0, \infty)}(X_N - \alpha)\bigr].
\]
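As a numerical illustration (the values of $p$ and $\alpha$ below are assumed), the exact binomial tail shows $-\frac{1}{N}\ln \mathbb{P}\{X_N \ge \alpha\}$ approaching $H(\alpha, p)$ from above, with a gap of order $\ln N / N$; that gap is precisely the kind of prefactor this chapter is after.

```python
import math

# A quick numerical look at the rare event (p and alpha are assumed values):
# for X_N the empirical mean of N Bernoulli(p) variables,
# -(1/N) ln P{X_N >= alpha} should approach H(alpha, p) as N grows.
p, alpha = 0.3, 0.5
H = alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))

def tail(N):
    """Exact P{X_N >= alpha} for the Binomial(N, p) sum."""
    k0 = math.ceil(N * alpha)
    return sum(math.comb(N, k) * p**k * (1 - p) ** (N - k)
               for k in range(k0, N + 1))

rates = {N: -math.log(tail(N)) / N for N in (50, 200, 800)}
for N, r in rates.items():
    print(N, round(r, 4))
# The rates decrease toward H(alpha, p) (about 0.087 here); the remaining gap
# is of order ln(N)/N, coming from the polynomial prefactor in the tail.
```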

If $\Psi$ is nice enough (e.g., bounded), we should have that $\lim_{N\to\infty} I_N = 0$, but how quickly? To start, let's write down the measure change suggested by the Gärtner-Ellis theorem. We have that
\[
H(\alpha, p) = \sup_{\theta \in \mathbb{R}} \bigl\{\theta\alpha - \ln\bigl(pe^\theta + 1 - p\bigr)\bigr\}.
\]
The supremum is attained at $\theta^*$, where
\[
\alpha = \frac{pe^{\theta^*}}{pe^{\theta^*} + 1 - p}.
\]
We then write that

\[
I_N = \tilde I_N \exp\bigl[-N H(\alpha, p)\bigr],
\]
where
\[
\tilde{\mathbb{P}}_N(A) \stackrel{\text{def}}{=} \frac{\mathbb{E}\bigl[\chi_A \exp\bigl[N\theta^*(X_N - \alpha)\bigr]\bigr]}{\mathbb{E}\bigl[\exp\bigl[N\theta^*(X_N - \alpha)\bigr]\bigr]}, \qquad A \in \mathcal{F},
\]
\[
\tilde I_N = \tilde{\mathbb{E}}_N\bigl[\Psi(X_N - \alpha)\, \chi_{[0, \infty)}(X_N - \alpha)\, \exp\bigl[-N\theta^*(X_N - \alpha)\bigr]\bigr].
\]

We recall that under $\tilde{\mathbb{P}}_N$, $\{\xi_1, \xi_2, \dots, \xi_N\}$ are i.i.d. Bernoulli random variables with
\[
\tilde{\mathbb{P}}_N\{\xi_n = 1\} = \frac{pe^{\theta^*}}{pe^{\theta^*} + 1 - p} = \alpha.
\]
Thus, as expected, $\tilde{\mathbb{E}}_N[X_N] = \alpha$. To make headway with this specific problem, let's define
\[
\tilde X_N \stackrel{\text{def}}{=} N(X_N - \alpha) = \sum_{n=1}^N \xi_n - N\alpha,
\qquad
S_N \stackrel{\text{def}}{=} \{n - N\alpha : n \in \{0, 1, \dots, N\}\}.
\]
Then
\[
\tilde I_N = \tilde{\mathbb{E}}_N\bigl[\Psi\bigl(N^{-1}\tilde X_N\bigr)\, \chi_{[0, \infty)}\bigl(\tilde X_N\bigr)\, \exp\bigl[-\theta^*\tilde X_N\bigr]\bigr]
\]
and $\tilde X_N$ takes values in $S_N$. We will later choose some specific $\Psi$'s. For the moment, we know that $\tilde{\mathbb{E}}_N[\tilde X_N] = 0$. It is easy to see that $\tilde{\mathbb{E}}_N[\tilde X_N^2] = N\alpha(1-\alpha)$, so in fact the variance of $\tilde X_N$ tends

to infinity. Note further, by the central limit theorem, that $\frac{1}{\sqrt N}\tilde X_N$ should be approximately Gaussian as $N \to \infty$.

Remark 0.2. As an example of what we are after, let's assume that $\Psi(z) = z^\alpha$ for some $\alpha > 0$. Admitting some prescience, let's replace $\tilde X_N$ by a Gaussian with mean 0 and variance $N\alpha(1-\alpha)$. We can then compute that
\[
\int_{z=0}^\infty \Bigl(\frac{z}{N}\Bigr)^\alpha \frac{1}{\sqrt{2\pi N\alpha(1-\alpha)}} \exp\bigl[-\theta^* z\bigr] \exp\Bigl[-\frac{z^2}{2N\alpha(1-\alpha)}\Bigr]\, dz
\approx \frac{1}{N^{\alpha + 1/2}\sqrt{2\pi\alpha(1-\alpha)}} \int_{z=0}^\infty z^\alpha e^{-\theta^* z}\, dz
= \frac{1}{N^{\alpha + 1/2}\sqrt{2\pi\alpha(1-\alpha)}} \frac{\Gamma(\alpha + 1)}{(\theta^*)^{\alpha + 1}}.
\]
This is the type of asymptotic result we want. To mimic this calculation, let's assume some regularity of $\Psi$.

Assumption 0.3. We assume that
\[
\sup_{z \ge 0} \frac{|\Psi(z)|}{1 + z^\beta} < \infty
\]

for some $\beta > 0$, and that $C \stackrel{\text{def}}{=} \lim_{z \searrow 0} \Psi(z)/z^\alpha$ is well-defined, finite, and positive for some $\alpha > 0$.
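Before continuing, the tilt itself can be worked out in closed form for the Bernoulli case; in the sketch below (the values of $p$ and $\alpha$ are assumed), the tilted mean comes out to $\alpha$ and $H(\alpha, p)$ agrees with the Bernoulli relative entropy.

```python
import math

# The exponential tilt above, computed numerically (p and alpha are assumed
# values).  Solving alpha = p e^theta / (p e^theta + 1 - p) gives
# theta* = ln( alpha (1 - p) / (p (1 - alpha)) ).
p, alpha = 0.3, 0.5

theta_star = math.log(alpha * (1 - p) / (p * (1 - alpha)))
z = p * math.exp(theta_star) + 1 - p          # the tilted normalizer
tilted_mean = p * math.exp(theta_star) / z
H = theta_star * alpha - math.log(z)

# H(alpha, p) is also the relative entropy of Bernoulli(alpha) w.r.t. Bernoulli(p).
H_entropy = (alpha * math.log(alpha / p)
             + (1 - alpha) * math.log((1 - alpha) / (1 - p)))

print(round(theta_star, 4), round(tilted_mean, 4), round(H, 4))
```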

The negative exponential in $\tilde I_N$ should mean that most of the mass of the integral in $\tilde I_N$ is centered where $\tilde X_N$ is $O(1)$. Thus, we want to find the behavior of $\tilde{\mathbb{P}}_N\{\tilde X_N = s\}$ for $s \in S_N$ of order 1 as $N \to \infty$. We proceed by using characteristic functions. Fix $N \in \mathbb{N}$ and define
\[
P_N(\psi) \stackrel{\text{def}}{=} \tilde{\mathbb{E}}_N\bigl[\exp\bigl[i\psi\tilde X_N\bigr]\bigr], \qquad \psi \in \mathbb{R}.
\]
We then have that
\[
P_N(\psi) = \sum_{n=0}^N \exp\bigl[i\psi(n - N\alpha)\bigr]\, \tilde{\mathbb{P}}_N\bigl\{\tilde X_N = n - N\alpha\bigr\}
\]
\[
(39)\qquad P_N(\psi) = \tilde{\mathbb{E}}_N\Bigl[\exp\Bigl[i\psi\Bigl\{\sum_{n=1}^N \xi_n - N\alpha\Bigr\}\Bigr]\Bigr] = \exp\bigl[-i\psi N\alpha\bigr]\bigl\{e^{i\psi}\alpha + 1 - \alpha\bigr\}^N = \tilde P(\psi)^N,
\]

where
\[
\tilde P(\psi) \stackrel{\text{def}}{=} \bigl\{1 + \alpha\bigl(e^{i\psi} - 1\bigr)\bigr\}\, e^{-i\psi\alpha}.
\]
Firstly, we can rearrange the first equation of (39) to get that
\[
P_N(\psi)\, e^{iN\alpha\psi} = \sum_{n=0}^N e^{i\psi n}\, \tilde{\mathbb{P}}_N\bigl\{\tilde X_N = n - N\alpha\bigr\}.
\]
Integrating against $e^{-i\psi n}$, we get that
\[
\tilde{\mathbb{P}}_N\bigl\{\tilde X_N = n - N\alpha\bigr\} = \frac{1}{2\pi} \int_{\psi \in (-\pi, \pi)} P_N(\psi) \exp\bigl[-i(n - N\alpha)\psi\bigr]\, d\psi,
\]
so that in fact
\[
\tilde{\mathbb{P}}_N\bigl\{\tilde X_N = s\bigr\} = \frac{1}{2\pi} \int_{\psi \in (-\pi, \pi)} P_N(\psi) \exp\bigl[-i\psi s\bigr]\, d\psi
\]

for all $s \in S_N$.

Let's next look at the second equation of (39). Since $P_N$ is a characteristic function, $|P_N| \le 1$. We actually have that $|P_N(\psi)| = |\tilde P(\psi)|^N$. Note that $P_N(0) = 1$. We can actually get a more precise estimate.

Lemma 0.4. There is a $K_1 > 0$ such that

\[
\bigl|\tilde P(\psi)\bigr| \le \exp\bigl[-K_1\psi^2\bigr], \qquad \psi \in (-\pi, \pi).
\]
Secondly, we have that
\[
\tilde P(\psi) = \exp\Bigl[-\frac{\alpha(1-\alpha)}{2}\psi^2 + E(\psi)\Bigr], \qquad \psi \in (-\pi, \pi),
\]
where there is a $K_2 > 0$ such that $|E(\psi)| \le K_2|\psi|^3$ for all $|\psi| \le 1$.

Proof. We first calculate that
\[
\bigl|\tilde P(\psi)\bigr|^2 = \bigl|1 - \alpha + \alpha\cos\psi + i\alpha\sin\psi\bigr|^2 = (1 - \alpha + \alpha\cos\psi)^2 + \alpha^2\sin^2\psi
\]
\[
= (1-\alpha)^2 + \alpha^2 + 2\alpha(1-\alpha)\cos\psi = 1 - 2\alpha(1-\alpha)(1 - \cos\psi).
\]
Since $-1 < \cos\psi < 1$ for all nonzero $\psi \in (-\pi, \pi)$, we have that $0 < 1 - \cos\psi < 2$ for all nonzero $\psi \in (-\pi, \pi)$. We also have that $0 < \alpha(1-\alpha) \le \frac14$. Thus for all nonzero $\psi \in (-\pi, \pi)$, we have that
\[
0 < 2\alpha(1-\alpha)(1 - \cos\psi) < 4\alpha(1-\alpha) \le 1,
\]
and thus $0 < |\tilde P(\psi)| < 1$ for all nonzero $\psi \in (-\pi, \pi)$. Hence
\[
K_1 \stackrel{\text{def}}{=} -\frac12 \sup_{\psi \in (-\pi, \pi) \setminus \{0\}} \frac{\ln\bigl\{1 - 2\alpha(1-\alpha)(1 - \cos\psi)\bigr\}}{\psi^2}
\]
is well-defined and $K_1 \ge 0$. We have that
\[
\lim_{|\psi| \to \pi} \frac{\ln\bigl\{1 - 2\alpha(1-\alpha)(1 - \cos\psi)\bigr\}}{\psi^2} = \frac{\ln\bigl\{1 - 4\alpha(1-\alpha)\bigr\}}{\pi^2} < 0.
\]
We can use l'Hôpital's rule to also see that
\[
\lim_{\psi \to 0} \frac{\ln\bigl\{1 - 2\alpha(1-\alpha)(1 - \cos\psi)\bigr\}}{\psi^2} = \lim_{\psi \to 0} \frac{-2\alpha(1-\alpha)\sin\psi}{2\psi\bigl\{1 - 2\alpha(1-\alpha)(1 - \cos\psi)\bigr\}} = -\alpha(1-\alpha) < 0.
\]
Thus the supremum is strictly negative, so in fact $K_1 > 0$, and $|\tilde P(\psi)|^2 \le \exp\bigl[-2K_1\psi^2\bigr]$ for all nonzero $\psi \in (-\pi, \pi)$. Note of course that $|\tilde P(0)| = 1 \le \exp\bigl[-K_1 0^2\bigr]$. This proves the first claim.

To see the second claim, we want to use the principal branch of the logarithm. Note that if $\tilde P(\psi) \in \mathbb{R}_-$, then $\sin\psi = 0$. Thus $\tilde P(\psi) \notin \mathbb{R}_-$ for all $\psi \in (-\pi, \pi)$. We can thus define
\[
f(\psi) \stackrel{\text{def}}{=} \ln \tilde P(\psi) = \ln\bigl\{1 + \alpha\bigl(e^{i\psi} - 1\bigr)\bigr\} - i\psi\alpha
\]
for all $\psi \in (-\pi, \pi)$, and calculate that
\[
f'(\psi) = \frac{i\alpha e^{i\psi}}{1 + \alpha(e^{i\psi} - 1)} - i\alpha,
\qquad
f''(\psi) = \frac{-\alpha e^{i\psi}}{1 + \alpha(e^{i\psi} - 1)} + \frac{\alpha^2 e^{2i\psi}}{\bigl(1 + \alpha(e^{i\psi} - 1)\bigr)^2}.
\]
Thus $f(0) = 0$, $f'(0) = 0$, and $f''(0) = -\alpha + \alpha^2 = -\alpha(1-\alpha)$. A third-order Taylor expansion of $f$ then gives the second claim. □

We can next connect things back together. For any $s \in S_N$, we have that
\[
\tilde{\mathbb{P}}_N\bigl\{\tilde X_N = s\bigr\} = \frac{1}{2\pi\sqrt N} \int_{\psi \in (-\pi\sqrt N, \pi\sqrt N)} P_N\Bigl(\frac{\psi}{\sqrt N}\Bigr) \exp\Bigl[-i\frac{\psi s}{\sqrt N}\Bigr]\, d\psi
\]
\[
= \frac{1}{2\pi\sqrt N} \int_{\psi \in (-\pi\sqrt N, \pi\sqrt N)} \exp\Bigl[-\frac{\alpha(1-\alpha)}{2}\psi^2 + N E\Bigl(\frac{\psi}{\sqrt N}\Bigr)\Bigr] \exp\Bigl[-i\frac{\psi s}{\sqrt N}\Bigr]\, d\psi.
\]
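The integral above is heading toward a local central limit theorem: for $s$ of order 1, $\tilde{\mathbb{P}}_N\{\tilde X_N = s\}$ should look like $\frac{1}{\sqrt{2\pi N\alpha(1-\alpha)}}e^{-s^2/(2N\alpha(1-\alpha))}$. A quick numerical check ($\alpha$ and $N$ below are assumed values; recall that under $\tilde{\mathbb{P}}_N$ the sum of the $\xi$'s is Binomial$(N, \alpha)$):

```python
import math

# A numerical sketch of where the rescaled integral is heading (alpha and N
# are assumed values): a local CLT for P~_N{X~_N = s} with s of order 1.
alpha, N = 0.5, 400            # alpha chosen so that N * alpha is an integer
sigma2 = N * alpha * (1 - alpha)

def pmf(n):
    """Under P~_N, xi_1 + ... + xi_N is Binomial(N, alpha)."""
    return math.comb(N, n) * alpha**n * (1 - alpha) ** (N - n)

s = 3                          # a lattice point n - N*alpha with n = 203
exact = pmf(int(N * alpha) + s)
local_clt = math.exp(-s * s / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
print(round(exact / local_clt, 2))   # the ratio should be close to 1
```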

1. References

Two common references for large deviations are [DZ98] and [FW98]. See also the references therein. Another nice source is [Var84].

Bibliography

[DZ98] Amir Dembo and Ofer Zeitouni. Large deviations techniques and applications, volume 38 of Applications of Mathematics. Springer-Verlag, New York, second edition, 1998.
[FW98] M. I. Freidlin and A. D. Wentzell. Random perturbations of dynamical systems, volume 260 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, New York, second edition, 1998. Translated from the 1979 Russian original by Joseph Szücs.
[Var84] S. R. S. Varadhan. Large deviations and applications, volume 46 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1984.
