STAT 811 Probability Theory II

Chapter 7: Laws of Large Numers and Sums of Independent Random Variables

Dr. Dewei Wang Associate Professor Department of Statistics University of South Carolina [email protected]

1 / 56 7.1 Truncation and Equivalence

From the deﬁnition of ui, we see that dealing with random variables that are uniformly bounded or that have moments are often easier. When these desirable properties are absent, we often use truncation to induce their presence and then see whether the truncation makes any equivalence. For example, we often compare

0 {Xn} with {Xn} = {XnI|Xn|≤n}

where the second one is a truncated version of the ﬁrst.

2 / 56 7.1 Truncation and Equivalence

Deﬁnition. 0 Two sequences {Xn} and {Xn} are tail equivalent if

X 0 P[Xn 6= Xn] < ∞. n Remark: From the Borel-Cantelli Lemma, we have that the above tail equivalence implies 0 P Xn 6= Xn i.o. = 0 or equivalently 0 P lim inf Xn = X = 1 n→∞ n 0 0 Let N = lim infn→∞ [Xn = Xn] = ∪n ∩k≥n [Xk = Xk ], we have 0 P(N) = 1. For ω ∈ N, it means when k ≥ K(ω) , Xk (ω) = Xk (ω). P 0 Thus for ω ∈ N, n(Xn(ω) − Xn(ω)) converges; i.e., P 0 n(Xn − Xn) converges a.s.

3 / 56 7.1 Truncation and Equivalence

P∞ P∞ 0 In addition, we have n=K(ω) Xn(ω) = n=K(ω) Xn(ω). And if an ↑ ∞ and when n ≥ K(ω),

n K(ω) −1 X 0 −1 X 0 an (Xj (ω) − Xj (ω)) = an (Xj (ω) − Xj (ω)) → 0. j=1 j=1

Proposition 7.1.1 (Equivalence) 0 Suppose the two sequences {Xn} and {Xn} are tail equivalent. Then P 0 1. n(Xn − Xn) converges a.s. P P 0 2. n Xn conveges a.s. iﬀ n Xn converge a.s. 3. If an ↑ ∞ and if there exits a random variable X such that

n n −1 X a.s. −1 X 0 a.s. an Xj → X , then also an Xj → X . j=1 j=1

4 / 56 7.2 A General Weak Law of Large Numbers

Theorem 7.2.1 (General weak law of large numbers)

Suppose {Xn} are independent random variables and deﬁne Pn Sn = j=1 Xj . If Pn 1. j=1 P[|Xj | > n] → 0, −2 Pn 2 2. n j=1 E Xj I[|Xj ≤n] → 0, then if we deﬁne n X an = E Xj I[|Xj ≤n] , j=1 we get S − a n n = X¯ − a /n →P 0. n n n

Remark: No assumptions about moments of Xn’s need to be made.

5 / 56 By Chebychev’s inequality n X 2 2 P[|Xj | > n] = nP [|X1| > n] ≤ nE X1 /n → 0 j=1 and n X 1 1 n−2 E X 2I = nE X 2I ≤ E X 2 → 0 j [|Xj ≤n] n2 1 [|X1|≤n] n 1 j=1 Finally, we observe, as n → ∞, a n = E X I → E (X ) = µ n 1 [|X1|≤n] 1

7.2 A General Weak Law of Large Numbers (Special Cases)

Case (a) WLLN with variances. 2 Suppose {Xn} are iid with E(Xn) = µ and E(Xn ) < ∞. Then as P n → ∞, Sn/n → µ. Proof.

6 / 56 We ﬁrst have, since E (|X1|) < ∞,

n X P[|Xj | > n] = nP [|X1| > n]= E nI[|X1|>n] j=1 ≤ E |X1| I[|X1|>n] → 0.

7.2 A General Weak Law of Large Numbers (Special Cases)

Case (b) Khintchin’s WLLN under the ﬁrst moment hypothesis.

Suppose {Xn} are iid with E(|Xn|) < ∞ and E(Xn) = µ. Then as P n → ∞, Sn/n → µ. Proof.

7 / 56 Next, for any > 0, n X 1 n−2 E X 2I = EX 2I j [|Xj ≤n] n 1 [|X1|≤n] j=1 1 ≤ E X 2I √ + E X 2I √ n 1 [|X1|≤ n] 1 [ n≤|X1|≤n] 2n 1 ≤ + E n |X1| I √ n n [ n≤|X1|≤n] 2 √ ≤ + E |X1| I n≤|X1| → 2. Moreover, since nE X1I [|X1|≤n] − E (X ) ≤ E |X | I → 0, 1 1 [|X1|>n] n P we have X¯n → µ.

7.2 A General Weak Law of Large Numbers (Special Cases)

8 / 56 It suﬃces to show that (1), obviously,

n X P[|Xj | > n] = nP [|X1| > n] → 0, j=1

and (2)

n X 1 n−2 E X 2I = E X 2I → 0 j [|Xj ≤n] n 1 [|X1|≤n] j=1

7.2 A General Weak Law of Large Numbers (Special Cases)

Case (c) Feller’s WLLN without a ﬁrst moment assumption.

Suppose {Xn} are iid with limx→∞ xP[|X1| > x] = 0, then P Sn/n − E(X1I[|X1|≤n]) → 0. Proof.

9 / 56 We deﬁne τ(x) := xP [|X1| > x] and let F (x) be the cdf. Then Z Z 1 2 1 2 |X1| I[|X1|≤n]dP = x F (dx) n Ω n |x|≤n ! 1 Z Z |x| = 2sds F (dx) n |x|≤n s=0 " # 1 Z n Z = 2s F (dx) ds (by Fubini) n 0 s<|x|≤n 1 Z n = 2s (P [|X1| > s] − P [|X1| > n]) ds n 0 1 Z n 1 Z n = 2τ(s)ds − 2sdsP [|X1| > n] n 0 n 0 2 Z n = τ(s)ds − nP [|X1| > n] → 0 n 0 | {z } τ(n) since if τ(s) → 0 so does its average.

7.2 A General Weak Law of Large Numbers (Special Cases)

10 / 56 Deﬁne n 0 0 X 0 X = Xj I and S = X nj [|Xj |≤n] n nj j=1 Then n n X 0 X P Xnj 6= Xj = P [|Xj | > n] → 0 j=1 j=1 So   0 0 [ 0  P Sn − Sn > ≤ P Sn 6= Sn ≤ P Xnj 6= Xj j=1  n X 0 ≤ P Xnj 6= Xj → 0 j=1 and therefore 0 P Sn − Sn → 0.

7.2 A General Weak Law of Large Numbers (Proof)

Proof of Theorem 7.2.1.

11 / 56 0 Now look at Sn. By Chebychev’s inequality

0 0 0 n Sn − ESn var (Sn) 1 X 0 2 P > ≤ ≤ E X n n22 n22 nj j=1 n 1 X = E X 2I → 0 n22 j [|Xj |≤n] j=1

0 Pn Note an = ES = EXj I , and thus n j=1 [|Xj |≤n] S0 − a n n →P 0 n We therefore get S − a S − S0 S0 − a n n = n n + n n →P 0. n n n

7.2 A General Weak Law of Large Numbers (Proof)

12 / 56 Note that Z ∞ e e Z ∞ dy EX + = dx = = ∞, e 2x log x 2 1 y because the distribution is symmetric E (X +) = E (X −) = ∞ and E(X ) does not exist. However, e e τ(x) = xP [|X | > x] = x · = → 0 1 x log x log x

and an = 0 because F is symmetric, so without a mean existing, S n →P 0. n

7.2 A General Weak Law of Large Numbers (Example)

Example e Let us consider a CDF: F (x) = 1 − 2x log x , x ≥ e. Suppose we have from it an iid sequence {Xn}. What is the mean of X ? What does Sn/n converges in probability to? Solution.

13 / 56 Note that E(X ) neither exists. Now

−1 τ(x) = xP [|X1| > x] = x · (1 − 2π arctan x).

But 1 − 2π−1 arctan x −2π−1(1 + x2)−1 lim τ(x) = lim = lim x→∞ x→∞ x−1 x→∞ −x−2 2x2 2 = lim = 6= 0. (Oops!) x→∞ π(1 + x2) π

7.2 A General Weak Law of Large Numbers (Example)

Example How about (standard) Cauchy? We have

F (x) = 0.5 + π−1 arctan x.

Solution.

14 / 56 When N = 1, it is trivial. For any N > 1, we deﬁne PN A = {supj≤N |Sj | > α} = k=1 Ak where Ak = {|Sk | > α, |Sj | ≤ α, j < k}, k = 1,..., N, are disjoint events. Note that 2 2 P(Ak ) = E(IAk ) ≤ E(Sk /α IAk ) −2 2 2 −2 2 ≤ α E((Sk + (SN − Sk ) )IAk ) = α E(Sn IAk ). Summing over k yields −2 2 −2 2 P(A) = P(sup |Sj | > α) ≤ α E(SN IA) ≤ α E(SN ). j≤N

7.3 Almost Sure Convergence of Sums of Independent Random Variables Proposition 7.3.0 (Komogorov’s inequality: about tail probabilities of maxima of sums)

Suppose {Xn} is an independent sequence of random variables and suppose E(Xn) = 0 and var(Xn) < ∞. Then for each α > 0, " # N 1 1 X P sup |S | > α ≤ var(S ) = E(X 2). j α2 N α2 j j≤N j=1 Proof.

15 / 56 7.3 Almost Sure Convergence of Sums of Independent Random Variables Proposition 7.3.1 (Skorohod’s inequality: about tail probabilities of maxima of sums)

Suppose {Xn} is an independent sequence of random variables and suppose α > 0 is ﬁxed. For n ≥ 1, set

if iid c = sup P[|SN − Sj | > α] = sup P[|Sj | > α]. j≤N j≤N

Suppose c < 1, then " # 1 P sup |Sj | > 2α ≤ P[|SN | > α]. j≤N 1 − c

Remark. Compared to the famous Kolmogorov’s inequality, one advantage of Skorohod’s inequality is that it does not require moment assumptions. 16 / 56 Deﬁne J := inf {j : |Sj | > 2α} with the convention that inf ∅ = ∞. Note that

" # N X sup |Sj | > 2α = [J ≤ N] = [J = j] j≤N j=1

where the last union is a union of disjoint sets. Now we write

P [|SN | > α] ≥ P [|SN | > α, J ≤ N] N N X X = P [|SN | > α, J = j] ≥ P [|SN − Sj | ≤ α, J = j] j=1 j=1

where, the last inequality is because if |SN − Sj | ≤ α and |Sj | > 2α, then |SN | > α.

7.3 Almost Sure Convergence of Sums of Independent Random Variables

Proof of Skorohod’s inequality.

17 / 56 PN It is also true that SN − Sj = i=j+1 Xj ∈ B (Xj+1,..., XN ) and [J = j] = supi 2α ∈ B (X1 ... Xj ). Noting B (Xj+1,..., XN ) ⊥ B (X1 ... Xj ) , we have

N X P [|SN | > α] ≥ P [|SN − Sj | ≤ α] P[J = j] j=1 N X ≥ (1 − c)P[J = j] j=1 =( 1 − c)P[J ≤ N] " # =( 1 − c)P sup |Sj | > 2α . j≤N

7.3 Almost Sure Convergence of Sums of Independent Random Variables

18 / 56 7.3 Almost Sure Convergence of Sums of Independent Random Variables

Based on Skorohod’s inequality, we may now present a remarkable result which shows the equivalence of convergence in probability to almost sure convergence for sums of independent random variables. Theorem 7.3.2 (Lévy’s theorem)

If {Xn} are independent, then X X Xn converges i.p. iﬀ Xn converges a.s.; n n Pn i.e., letting Sn = i=1 Xi , then the following are equivalent 1. {Sn} is Cauchy in probability.

2. {Sn} converges in probability.

3. {Sn} converges almost surely.

4. {Sn} is almost surely Cauchy.

19 / 56 Assume {Sn} is convergent in probability, so that {Sn} is Cauchy in probability. We show {Sn} is almost surely convergent by showing {Sn} is almost surely Cauchy. To show that {Sn} is almost surely Cauchy, we need to show a.s. ξN = sup |Sm − Sn| → 0, m,n≥N

as N → ∞. Note that {ξN , N ≥ 1} is a decreasing sequence. Known P a.s. P that if ξN → 0, then ξN → 0. To show ξN → 0, we see

ξN = sup |Sm − SN + SN − Sn| m,n≥N

≤ sup |Sm − SN | + sup |Sn − SN | m≥N n≥N

= 2 sup |Sn − SN | n≥N

= 2 sup |SN+j − SN | j≥0

7.3 Almost Sure Convergence of Sums of Independent Random Variables

Proof of Lévy’s theorem.

20 / 56 P It suﬃces to show that supj≥0 |SN+j − SN | → 0. For any > 0, and 1 0 < δ < 2 , the assumption that {Sn} is Cauchy i.p. implies that there exists N,δ such that h i P |S − S 0 | > ≤ δ m m 2 0 if m, m ≥ N,δ, and hence, if N ≥ N,δ, h i P |S − S | > ≤ δ, ∀j ≥ 0. N+j N 2

7.3 Almost Sure Convergence of Sums of Independent Random Variables

21 / 56 0 Now we seek to apply Skorohod’s inequality. Let Xi = XN+i and j j 0 X 0 X Sj = Xi = XN+i = SN+j − SN . i=1 i=1 With this notation we have " " # 0 P sup | SN+j −SN |> ] = P sup Sj > N0≥j≥0 N0≥j≥0   1 h 0 i ≤   P SN0 > h 0 0 i 2 1 − sup 0 P S − S > j≤N N0 j 2 ! 1 h 0 i = P SN0 > 1 − supj≤N0 P |SN+N0 − SN+j | > 2 2 ≤ (1 − δ)−1δ ≤ 2δ which can be chosen arbitrarily small.

7.3 Almost Sure Convergence of Sums of Independent Random Variables

22 / 56 P∞ 2 WLOG, assume E(Xj ) = 0. Then j=1 EXj < ∞, which implies that {Sn} is L2–Cauchy since (m < n) n 2 X 2 kSn − Smk2 = var (Sn − Sm) = EXj → 0 j=m+1

as m, n → ∞. So {Sn} , being L2− Cauchy, is also Cauchy in proba- −2 −2 Pn bility. P [|Sn − Sm| > ] ≤ var (Sn − Sm) = j=m var (Xj ) → 0 as n, m → ∞. By Lévy’s theorem {Sn} is almost surely convergent.

7.3 Almost Sure Convergence of Sums of Independent Random Variables

Theorem 7.3.3 (Kolmogorov convergence criterion)

If {Xn} are independent, and ∞ X var(Xj ) < ∞, j=1 ∞ then X (Xj − E(Xj )) converges almost surely. j=1 Proof.

23 / 56 7.4 Strong Laws of Large Numbers

This section considers the problem of when sums of independent random variables properly scaled and centered converge almost surely. We will prove that sample averages converge to mathematical ex- pectations when the sequence is iid and a mean exists. We begin with a number theory result which is traditionally used in the development of the theory. Lemma 7.4.1 (Kronecker’s lemma)

Suppose we have two sequences {xk } and {ak } such that xk ∈ R and0 < an ↑ ∞. If X xk converges, ak k then n −1 X lim a xk = 0. n→∞ n k=1

24 / 56 P∞ Let rn = k=n+1 xk /ak so that rn → 0 as n → ∞. Given > 0, there exists N0 = N0() such that for n ≥ N0, we have |rn| ≤ . Now

xn = rn−1 − rn an so xn = an (rn−1 − rn) , n ≥ 1 and n n X X xk = (rk−1 − rk ) ak k=1 k=1 n−1 X = (aj+1 − aj ) rj + a1r0 − anrn j=1

7.4 Strong Laws of Large Numbers

Proof of Kronecker’s lemma.

25 / 56 Then for n ≥ N0 Pn xk k=1 an N0−1 n−1 X (aj+1 − aj ) X (aj+1 − aj ) a1r0 anrn ≤ |rj | + |rj | + + an an an an j=1 j=N0 const ≤ + (aN0+1 − aN0 + aN0+2 − aN0+1 an an

+aN0+3 − aN0+2 + ··· + an − an−1) + (a − a ) ≤o(1) + n N0 + an ≤o(1) + 2.

This shows the result.

7.4 Strong Laws of Large Numbers

26 / 56 By the Kolmogorov convergence criterion, we know X ((Xk − EXk )/bk ) converges a.s. n Then follow the Kronecker Lemma, we have n −1 X a.s. bn (Xk − EXk ) → 0. k=1

7.4 Strong Laws of Large Numbers

Corollary 7.4.1 (A strong law) 2 Let {Xn} be independent with E(Xn ) < ∞. Suppose we have a monotone sequence bn ↑ ∞. If X Xk var < ∞, bk k then S − E(S ) n n a→.s. 0. bn Proof.

27 / 56 Pn 1 Taken from analysis: log n − j=1 j → c (Euler’s constant) as n → ∞.

7.4.1 Example 1: Record counts

Suppose {Xn} is iid with continuous cdf F . Deﬁne

n N X . X µN = I = 1j , [Xj is a record] j=1 j=1

so µN is the number of recordes in the ﬁrst N observations. Proposition 7.4.1 (Logarithmic growth rate) The number of records in an iid sequence grows logarithmically and we have the almost sure limit µ lim N → 1. N→∞ log N

Proof.

28 / 56 Recall that {1j , j ≥ 1} are independent. The basic facts about {1j } are that 1 1 P [1 = 1]= , E (1 ) = j j j j 1 1 j − 1 var (1 )= E (1 )2 − (E1 )2 = − = j j j j j2 j2

This implies that ∞ ∞ X 1j X 1 var = var (1 ) log j (log j)2 j j=2 2 ∞ X j − 1 = < ∞ j2(log j)2 2

7.4.1 Example 1: Record counts

29 / 56 The Kolmogorov convergence criterion implies that ∞ ∞ 1 − 1 X 1j − E (1j ) X j j = converges log j log j j=2 j=2

and Kronecker’s lemma yields

Pn −1 Pn Pn −1 Pn −1 1j − j 1j − j µn − j 0 a←.s. j=1 = j=1 j=1 = j=1 log n log n log n

Thus Pn −1 Pn −1 µ µn − j j − log n n − 1 = j=1 + j=1 → 0 log n log n log n

This completes the derivation.

7.4.1 Example 1: Record counts

30 / 56 By the Kolmogorov Zero-One Law, we know that P[explosion] = P P[ n Xn < ∞] = 0 or1.

7.4.2 Example 2: Explosions in the Pure Birth Process.

Let {Xn} be independent with

−λnx P[Xn > x] = e , x > 0 Pn where λn ≥ 0 are birth parameters. Let S0 = 0, Sn = i=1 Xi and the population size process {X (t), t ≥ 0} of the pure birth process by X (t) = n if Sn−1 ≤ t < Sn. Next deﬁne the event explosion by ∞ X [explosion] = [ Xn < ∞] = [X (t) = ∞ for some ﬁnite t]. n=1 Proposition 7.4.2

X −1 P[explosion] = I ( λn < ∞). n Proof.

31 / 56 P −1 If n λn < ∞, then by the monotone convergence theorem, X X X −1 E( Xn) = E(Xn) = λn < ∞. n n n Thus X P[explosion] = P[ Xn < ∞] = 1. n P P∞ Conversely, suppose P [ n Xn < ∞] = 1. Then1 > exp {− n=1 Xn} > P 0 a.s. which implies that1 > E (exp {− n Xn}) > 0. Known that ∞ ! N ! − P∞ X Y −X Y −X E e n=1 n = E e n = E lim e n N→∞ n=1 k=1 N ! Y = lim E e−Xn (by Monotone Convergence) N→∞ n=1 N N Y Y λn = lim E e−Xn (by independence) = lim N→∞ N→∞ 1 + λ n=1 n=1 n

7.4.2 Example 2: Explosions in the Pure Birth Process.

32 / 56 P Taking log, we have P [ n Xn < ∞] = 1 implies ∞ X −1 0 < log(1 + λn ) < ∞. n=1

−1 −1 Then log(1 + λn ) → 0 which implies λ → 0.since log(1 + x) lim = 1 x↓0 x by L’Hôpital’s rule, we have

−1 −1 log 1 + λn ∼ λn as n → ∞ and thus ∞ ∞ X −1 X −1 0 < log 1 + λn < ∞ iﬀ λn < ∞ n=1 n=1

7.4.2 Example 2: Explosions in the Pure Birth Process.

33 / 56 (a) ↔ (c): Observe that: ∞ Z ∞ X Z n+1 E (|X1|)= P [|X1| ≥ x] dx = P [|X1| ≥ x] dx 0 n=0 n ∞ ∞ X X ≥ P [|X1| ≥ n + 1] ≤ P [|X1| ≥ n] n=0 n=0 P∞ Thus E (|X1|) < ∞ iff n=0 P [|X1| ≥ n] < ∞. For every > 0, set Y = X1/. We get E(|X1|) < ∞ iff E(|Y |) < ∞ iff P∞ P∞ n=0 P [|Y | ≥ n] = n=0 P [|X1| ≥ n] < ∞.

7.5 The Strong Law of Large Numbers for IID Sequences

Lemma 7.5.1 Let {Xn} be iid. The following are equivalent:

(a) E|X1| < ∞ Xn (b) limn→∞ n = 0 almost surely. P∞ (c) For > 0, n=1 P[|X1| ≥ n] < ∞. Proof.

34 / 56 (c) ↔ (b): Given > 0, X X P [|X1| ≥ n] = P [|Xn| ≥ n] < ∞ n n is equivalent, by the Borel zero-one law, to |X | P n > i.o. = 0 n which is in turn equivalent to |X | lim sup n ≤ n→∞ n

|Xn| almost surely. since lim supn→∞ n is a tail function, it is a.s. constant. If this constant is bounded by for any > 0, then |X | lim sup n = 0 n→∞ n This gives (b) and the converse is similar.

7.5 The Strong Law of Large Numbers for IID Sequences

35 / 56 We show ﬁrst that S n a→.s. c implies E (|X |) < ∞. n 1 This is because the following and Lemma 7.5.1 X S − S S n − 1 S n = n n−1 = n − n−1 a→.s. c − c = 0 . n n n n n − 1

7.5 The Strong Law of Large Numbers for IID Sequences

Kolmogorov’s SLLN Pn Let {Xn} be iid and set Sn = i=1 Xi . There exists c ∈ R such that a.s. X¯n = Sn/n → c

iﬀ E(|X1) < ∞ in which case c = µ = E(X1). Corollary 7.5.1 ¯ a.s. 2 If {Xn} is iid, then E(|X1|) < ∞ implies Xn → µ and EX1 < ∞ 1 Pn ¯ 2 a.s. 2 implies n i=1(Xi − Xn) → σ = var(X1) and 1 Pn ¯ 2 a.s. 2 n−1 i=1(Xi − Xn) → σ = var(X1). Proof of Kolmogorov’s SLLN.

36 / 56 a.s. Now we show that E (|X1|) < ∞ implies Sn/n → E (X1) . To do this, we use a truncation argument. Deﬁne

0 Xn = XnI[|Xn|≤n], n ≥ 1

Then X 0 X P Xn 6= Xn = P [|Xn| > n] < ∞ n n 0 (since E |X1| < ∞ ) and hence {Xn} and {Xn} are tail equivalent. Therefore by Proposition 7.1.1

n a.s. 0 X 0 a.s. Sn/n → E (X1) iﬀ Sn/n = Xj /n → E (X1) j=1

So it suﬃces to consider the truncated sequence.

7.5 The Strong Law of Large Numbers for IID Sequences

37 / 56 Next observe that 0 0 0 Pn S − E (S ) S − E (S ) nE (X1) − E X1I[|X |≤j] n n n n j=1 1 − = n n n

n E X I X 1 [|X1|≤j] = E (X1) − → 0. n j=1 This last step follows from the fact that E (X1) − E X1I[|X1|≤n] ≤ E |X1| I[|X1|>n] → 0. We thus conclude that Pn 0 0 S0 1 Xj − E Xj n − E (X ) a→.s. 0 iﬀ a→.s. 0. n 1 n Then, it is enough by Kronecker’s lemma to prove X 0 var(Xj /j) < ∞. j

7.5 The Strong Law of Large Numbers for IID Sequences

38 / 56 2 0 ! E X 0 ∞ X Xj X j X 1 var ≤ = E X 2I j j2 j2 1 [|X1|≤j] j j j=1 ∞ j ∞  ∞  X X 1 X X 1 = E X 2I = E X 2I j2 1 [k−1<|X1|≤k]  j2  1 [k−1<|X1|≤k] j=1 k=1 k=1 j=k ∞ X 2 ≤ E |X |2 I k 1 [k−1<|X1|≤k] k=2 ∞ X 2 ≤ · kE |X | I k 1 [k−1<|X1|≤k] k=2

= 2E (|X1|) < ∞.

7.5 The Strong Law of Large Numbers for IID Sequences

39 / 56 7.5.1 Application 1: Renewal Theory

Suppose Xn’s are iid and non-negative. with E(Xn) = µ ∈ (0, ∞). Then a.s. a.s. Sn/n → µ, and Sn → ∞.

Let S0 = 0 and deﬁne

∞ X N(t) = I[Sj ≤t]. j=0

We call N(t) the number of renewals in [0, t]. Then

[N(t) ≤ n] = [Sn > t] and SN(t)−1 ≤ t < SN(t).

We conjecture that N(t)/t a→.s. µ−1.

40 / 56 We ﬁrst show N(t) → ∞ a.s. as t → ∞ and because of the monotonicity, it suﬃces to show N(t) →P ∞. Since for any m

lim P[N(t) ≤ m] = lim P [Sm > t] → 0 t→∞ t→∞ we get the desired result that N(t) →P ∞. Now deﬁne the sets

n Sn(ω) o Λ1 = ω : n → µ Λ2 = {ω : N(t, ω) → ∞} a.s. by →, we have P (Λ1) = P (Λ2) = 1. Then Λ := Λ1 ∩ Λ2 has P(Λ) = 1. For ω ∈ Λ, as t → ∞

SN(t,ω)(ω)/N(t, ω) → µ a.s. and so SN(t)/N(t) → µ as t → ∞. From N(t) − 1 S t S N(t)−1 < ≤ N(t) , N(t) N(t) − 1 N(t) N(t) we conclude that t/N(t) a→.s. µ and thus N(t)/t a→.s. µ−1. Thus the long run rate of renewals is µ−1.

7.5.1 Application 1: Renewal Theory

41 / 56 Deﬁne ← xv,k := F (v/k), v = 1,..., k where F ←(x) = inf{u : F (u) ≥ x}. Recall F ←(u) ≤ t iﬀ u ≤ F (t) and F (F ←(u)) ≥ u, F (F ←(u)−) ≤ u.

7.5.1 Application 1: Glivenko-Cantelli Theorem

Suppose Xn’s are iid with common distribution F . We estimate F by the empirical cumulative distribution function (ecdf) n ˆ −1 X Fn(x) = n I[Xj ≤x]. j=1 a.s. By the SLLN, Fˆn(x) → F (x) for each ﬁxed x. Glivenko-Cantelli Theorem a.s. Dn = sup |Fˆn(x) − F (x)| → 0. x Proof.

42 / 56 If xv,k ≤ x < xv+1,k , then monotonicity implies

F (xv,k ) ≤ F (x) ≤ F (xv+1,k −) , Fˆn (xv,k ) ≤ Fˆn(x) ≤ Fˆn (xv+1,k −)

and for such x

Fˆn (xv,k ) − F (xv+1,k −) ≤ Fˆn(x) − F (x)

≤ Fˆn (xv+1,k −) − F (xv,k ) . Since v + 1 v 1 F (x −) − F (x ) ≤ − = , v+1,k v,k k k k we have 1 Fˆ (X ) − F (x ) − ≤ Fˆ (x) − F (x) n v,k v,k k n 1 ≤ Fˆ (x −) − F (x −) + . n v+1,k v+1,k k

7.5.1 Application 1: Glivenko-Cantelli Theorem

43 / 56 Therefore

sup Fˆn(x) − F (x) x∈[xv,k ,xv+1,k )

≤ 1 + Fˆ (x ) − F (x ) ∨ Fˆ (x −) − F (x −) k n v,k v,k n v+1,k v+1,k

which is valid for v = 1,..., k − 1, and taking the supremum,

sup Fˆn(x) − F (x) x∈[x1,k ,xk,k )

≤ 1 + Wk Fˆ (x ) − F (x ) ∨ Fˆ (x −) − F (x −) k v=1 n v,k v,k n v,k v,k = RHS

We now show that this inequality also holds for x < x1,k and x ≥ xk,k . If x ≥ xk,k , then F (x) = Fˆn(x) = 1 so Fˆn(x) − F (x) = 0 and RHS is still an upper bound.

7.5.1 Application 1: Glivenko-Cantelli Theorem

44 / 56 If x < x1,k , either (i) F (x) ≥ Fˆn(x) in which case

ˆ Fn(x, ω) − F (x) = F (x) − Fn(x, ω)

≤ F (x) ≤ F (x1,k −) 1 ≤ ≤ RHS k

or (ii) Fˆn(x) > F (x) in which case

ˆ ˆ Fn(x, ω) − F (x) = Fn(x, ω) − F (x)

≤ Fˆn (x1,k −, ω) − F (x1,k −) + F (x1,k −) − F (x)

ˆ ≤ Fn (x1,k −, ω) − F (x1,k −) + |F (x1k −) − F (x)| 1 ≤ + Fˆ (x −, ω) − F (x −) ≤ RHS k n 1k 1k We therefore conclude that

Dn ≤ RHS

7.5.1 Application 1: Glivenko-Cantelli Theorem

45 / 56 The SLLN implies that there exist sets Λv,k , and Λ˜v,k , such that P (Λv,k ) = P Λ˜v,k = 1 and such that

Fˆn (xv,k ) → F (xvk ) , n 1 X Fˆn (xv,k −) = I → P [X1 < xv,k ] = F (xv,k −) n [Xj

provided ω ∈ Λvk and Λ˜vk respectively. Let \ \\ Λk = Λv,k Λ˜v,k v

so P (Λk ) = 1. Then for ω ∈ Λk , 1 lim sup Dn(ω) ≤ n→∞ k T T For ω ∈ k Λk , where P ( k Λk ) = 1,

lim Dn(ω) = 0. n→∞

7.5.1 Application 1: Glivenko-Cantelli Theorem

46 / 56 7.6 The Kolmogorov Three Series Theorem

The Kolmogorov three series theorem provides necessary and suf- ﬁcient conditions for a series of independent random variables to converge. The result is especially useful when the Kolmogorov convergence criterion may not be applicable, for example, when existence of variances is not guaranteed. Theorem 7.6.1 P Let {Xn} be independent. in order for n Xn converges a.s., it is necessary and suﬃcient that there exists c > 0 such that P (i) n P[|Xn| > c] < ∞ P (ii) n var(XnI[|Xn|≤c]) < ∞ P (iii) n E(XnI|Xn|≤c ) converges. P If n Xn converges a.s., then (i), (ii), (iii) hold for any c > 0. Thus if the three series converge for one value of c > 0, they converge for all c > 0.

47 / 56 Suppose the three series converge. Deﬁne 0 Xn = XnI[|Xn|≤c] Then X 0 X P Xn 6= Xn = P [|Xn| > c] < ∞ n n 0 P by (i) so {Xn} and {Xn} are tail equivalent. Thus n Xn converges P 0 almost surely iﬀ n Xn converges almost surely. From (ii) X 0 var Xn < ∞ n so by the Kolmogorov convergence criterion X 0 0 Xj − E Xj converges j P 0 P 0 But (iii) implies n E (Xn) converges and thus n Xn converges, as desired.