
STAT 811 II

Chapter 7: Laws of Large Numbers and Sums of Independent Random Variables

Dr. Dewei Wang Associate Professor Department of Statistics University of South Carolina [email protected]

7.1 Truncation and Equivalence

From the definition of uniform integrability, we see that random variables that are uniformly bounded or that have finite moments are often easier to deal with. When these desirable properties are absent, we often use truncation to induce them, and then ask whether the truncated sequence is equivalent to the original one. For example, we often compare

$$\{X_n\} \quad \text{with} \quad \{X_n'\} = \{X_n I_{[|X_n| \le n]}\},$$

where the second one is a truncated version of the first.


Definition. Two sequences $\{X_n\}$ and $\{X_n'\}$ are tail equivalent if
$$\sum_n P[X_n \ne X_n'] < \infty.$$

Remark: From the Borel-Cantelli Lemma, the above tail equivalence implies
$$P\big[X_n \ne X_n' \ \text{i.o.}\big] = 0, \quad \text{or equivalently} \quad P\Big[\liminf_{n\to\infty}\,[X_n = X_n']\Big] = 1.$$
Let $N = \liminf_{n\to\infty} [X_n = X_n'] = \bigcup_n \bigcap_{k \ge n} [X_k = X_k']$, so that $P(N) = 1$. For $\omega \in N$, there exists $K(\omega)$ such that $X_k(\omega) = X_k'(\omega)$ whenever $k \ge K(\omega)$. Thus for $\omega \in N$, $\sum_n (X_n(\omega) - X_n'(\omega))$ converges; i.e., $\sum_n (X_n - X_n')$ converges a.s.


In addition, we have $\sum_{n=K(\omega)}^\infty X_n(\omega) = \sum_{n=K(\omega)}^\infty X_n'(\omega)$. And if $a_n \uparrow \infty$, then for $n \ge K(\omega)$,
$$a_n^{-1} \sum_{j=1}^n \big(X_j(\omega) - X_j'(\omega)\big) = a_n^{-1} \sum_{j=1}^{K(\omega)} \big(X_j(\omega) - X_j'(\omega)\big) \to 0.$$

Proposition 7.1.1 (Equivalence)
Suppose the two sequences $\{X_n\}$ and $\{X_n'\}$ are tail equivalent. Then
1. $\sum_n (X_n - X_n')$ converges a.s.
2. $\sum_n X_n$ converges a.s. iff $\sum_n X_n'$ converges a.s.
3. If $a_n \uparrow \infty$ and if there exists a random variable $X$ such that
$$a_n^{-1} \sum_{j=1}^n X_j \xrightarrow{a.s.} X, \quad \text{then also} \quad a_n^{-1} \sum_{j=1}^n X_j' \xrightarrow{a.s.} X.$$
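To make the tail-equivalence mechanism concrete, here is a minimal simulation sketch (Python with NumPy; the Exp(1) example and all names are our own illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tail-equivalence sketch: for iid Exp(1), X_n' = X_n I[|X_n| <= n]
# differs from X_n only when X_n > n, and sum_n P[X_n > n] = sum_n e^{-n}
# is finite, so a.s. only finitely many indices differ (Borel-Cantelli).
n = np.arange(1, 10**5 + 1)
x = rng.exponential(1.0, size=n.size)
print("indices where X_n != X_n':", np.sum(x > n))  # almost always 0 or 1
print("sum of P[X_n > n]:", np.exp(-n).sum())       # = 1/(e-1) ~ 0.582
```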

7.2 A General Weak Law of Large Numbers

Theorem 7.2.1 (General weak law of large numbers)

Suppose $\{X_n\}$ are independent random variables and define $S_n = \sum_{j=1}^n X_j$. If
1. $\sum_{j=1}^n P[|X_j| > n] \to 0$,
2. $n^{-2} \sum_{j=1}^n E\big(X_j^2 I_{[|X_j| \le n]}\big) \to 0$,
then, defining
$$a_n = \sum_{j=1}^n E\big(X_j I_{[|X_j| \le n]}\big),$$
we get
$$\frac{S_n - a_n}{n} = \bar X_n - a_n/n \xrightarrow{P} 0.$$

Remark: No assumptions about moments of Xn’s need to be made.
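Before the special cases, a quick Monte Carlo sanity check of Theorem 7.2.1 (a Python sketch; the Exponential(1) choice and the 0.05 tolerance are our own assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

# Check the two conditions of Theorem 7.2.1 for iid Exp(1) variables,
# and watch (S_n - a_n)/n -> 0 in probability.
n, reps = 10_000, 500
x = rng.exponential(1.0, size=(reps, n))

# Condition 1: sum_j P[|X_j| > n] = n P[X_1 > n] = n e^{-n}, tiny here.
cond1 = n * np.exp(-n)

# a_n = n E[X_1 1{X_1 <= n}]; for Exp(1), E[X 1{X<=n}] = 1 - (n+1)e^{-n}.
a_n = n * (1 - np.exp(-n) * (n + 1))

s_n = x.sum(axis=1)
print(cond1)                                    # ~ 0
print(np.mean(np.abs((s_n - a_n) / n) > 0.05))  # small: (S_n - a_n)/n ->_P 0
```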

5 / 56 By Chebychev’s inequality n X 2 2 P[|Xj | > n] = nP [|X1| > n] ≤ nE X1 /n → 0 j=1 and n X   1 1 n−2 E X 2I = nE X 2I  ≤ E X 2 → 0 j [|Xj ≤n] n2 1 [|X1|≤n] n 1 j=1 Finally, we observe, as n → ∞, a n = E X I  → E (X ) = µ n 1 [|X1|≤n] 1

7.2 A General Weak Law of Large Numbers (Special Cases)

Case (a) WLLN with variances. 2 Suppose {Xn} are iid with E(Xn) = µ and E(Xn ) < ∞. Then as P n → ∞, Sn/n → µ. Proof.

7.2 A General Weak Law of Large Numbers (Special Cases)

Case (b) Khintchin's WLLN under the first moment hypothesis.
Suppose $\{X_n\}$ are iid with $E(|X_n|) < \infty$ and $E(X_n) = \mu$. Then as $n \to \infty$, $S_n/n \xrightarrow{P} \mu$.

Proof.
We first have, since $E(|X_1|) < \infty$,
$$\sum_{j=1}^n P[|X_j| > n] = n P[|X_1| > n] = E\big(n I_{[|X_1| > n]}\big) \le E\big(|X_1| I_{[|X_1| > n]}\big) \to 0.$$
Next, for any $\epsilon > 0$,
$$n^{-2}\sum_{j=1}^n E\big(X_j^2 I_{[|X_j| \le n]}\big) = \frac{1}{n} E\big(X_1^2 I_{[|X_1| \le n]}\big) \le \frac{1}{n}\Big(E\big(X_1^2 I_{[|X_1| \le \epsilon\sqrt n]}\big) + E\big(X_1^2 I_{[\epsilon\sqrt n < |X_1| \le n]}\big)\Big)$$
$$\le \frac{\epsilon^2 n}{n} + \frac{1}{n} E\big(n |X_1| I_{[\epsilon\sqrt n < |X_1| \le n]}\big) \le \epsilon^2 + E\big(|X_1| I_{[\epsilon\sqrt n \le |X_1|]}\big) \to \epsilon^2.$$
Since $\epsilon$ is arbitrary, condition 2 of Theorem 7.2.1 holds. Moreover, since
$$\Big|\frac{n E\big(X_1 I_{[|X_1| \le n]}\big)}{n} - E(X_1)\Big| \le E\big(|X_1| I_{[|X_1| > n]}\big) \to 0,$$
we have $\bar X_n \xrightarrow{P} \mu$.

7.2 A General Weak Law of Large Numbers (Special Cases)

Case (c) Feller's WLLN without a first moment assumption.
Suppose $\{X_n\}$ are iid with $\lim_{x\to\infty} x P[|X_1| > x] = 0$. Then $S_n/n - E\big(X_1 I_{[|X_1| \le n]}\big) \xrightarrow{P} 0$.

Proof.
It suffices to verify the two conditions of Theorem 7.2.1: (1), which holds by hypothesis, since
$$\sum_{j=1}^n P[|X_j| > n] = n P[|X_1| > n] \to 0,$$
and (2)
$$n^{-2}\sum_{j=1}^n E\big(X_j^2 I_{[|X_j| \le n]}\big) = \frac{1}{n} E\big(X_1^2 I_{[|X_1| \le n]}\big) \to 0.$$
To check (2), define $\tau(x) := x P[|X_1| > x]$ and let $F$ be the cdf of $X_1$. Then
$$\frac{1}{n}\int_\Omega |X_1|^2 I_{[|X_1| \le n]}\, dP = \frac{1}{n}\int_{|x| \le n} x^2\, F(dx) = \frac{1}{n}\int_{|x| \le n} \Big(\int_{s=0}^{|x|} 2s\, ds\Big) F(dx)$$
$$= \frac{1}{n}\int_0^n 2s\Big(\int_{s < |x| \le n} F(dx)\Big)\, ds \quad \text{(by Fubini)}$$
$$= \frac{1}{n}\int_0^n 2s\big(P[|X_1| > s] - P[|X_1| > n]\big)\, ds = \frac{1}{n}\int_0^n 2\tau(s)\, ds - \frac{1}{n}\int_0^n 2s\, ds\; P[|X_1| > n]$$
$$= \frac{2}{n}\int_0^n \tau(s)\, ds - \underbrace{n P[|X_1| > n]}_{\tau(n)} \to 0,$$
since if $\tau(s) \to 0$ then so does its average, and $\tau(n) \to 0$ by hypothesis. Theorem 7.2.1 now gives the claim.

7.2 A General Weak Law of Large Numbers (Proof)

Proof of Theorem 7.2.1.
Define
$$X_{nj}' = X_j I_{[|X_j| \le n]} \quad \text{and} \quad S_n' = \sum_{j=1}^n X_{nj}'.$$
Then
$$\sum_{j=1}^n P[X_{nj}' \ne X_j] = \sum_{j=1}^n P[|X_j| > n] \to 0.$$
So
$$P\big[|S_n - S_n'| > \epsilon\big] \le P[S_n \ne S_n'] \le P\Big(\bigcup_{j=1}^n [X_{nj}' \ne X_j]\Big) \le \sum_{j=1}^n P[X_{nj}' \ne X_j] \to 0,$$
and therefore
$$\frac{S_n - S_n'}{n} \xrightarrow{P} 0.$$
Now look at $S_n'$. By Chebychev's inequality,
$$P\Big[\Big|\frac{S_n' - E S_n'}{n}\Big| > \epsilon\Big] \le \frac{\operatorname{var}(S_n')}{n^2\epsilon^2} \le \frac{1}{n^2\epsilon^2}\sum_{j=1}^n E(X_{nj}')^2 = \frac{1}{n^2\epsilon^2}\sum_{j=1}^n E\big(X_j^2 I_{[|X_j| \le n]}\big) \to 0.$$
Note $a_n = E S_n' = \sum_{j=1}^n E\big(X_j I_{[|X_j| \le n]}\big)$, and thus
$$\frac{S_n' - a_n}{n} \xrightarrow{P} 0.$$
We therefore get
$$\frac{S_n - a_n}{n} = \frac{S_n - S_n'}{n} + \frac{S_n' - a_n}{n} \xrightarrow{P} 0.$$


7.2 A General Weak Law of Large Numbers (Example)

Example
Consider the cdf $F(x) = 1 - \frac{e}{2x\log x}$ for $x \ge e$, with $F$ symmetric about $0$. Suppose we have an iid sequence $\{X_n\}$ from it. What is the mean of $X$? What does $S_n/n$ converge to in probability?

Solution.
Note that
$$E(X^+) = \int_e^\infty \frac{e}{2x\log x}\, dx = \frac{e}{2}\int_1^\infty \frac{dy}{y} = \infty;$$
because the distribution is symmetric, $E(X^+) = E(X^-) = \infty$ and $E(X)$ does not exist. However,
$$\tau(x) = x P[|X_1| > x] = x \cdot \frac{e}{x\log x} = \frac{e}{\log x} \to 0,$$
and $a_n = 0$ because $F$ is symmetric. So, even without a mean existing,
$$\frac{S_n}{n} \xrightarrow{P} 0.$$
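A simulation sketch of this example (Python with NumPy/SciPy; sampling by inverting the tail through the Lambert $W$ function is our own device, not part of the text):

```python
import numpy as np
from scipy.special import lambertw

rng = np.random.default_rng(2)

def sample(n):
    # P[|X| > x] = e/(x log x) for x >= e; inverting x log x = e/u gives
    # x = exp(W(e/u)) with W the Lambert W function. A random sign makes
    # the law symmetric, so a_n = 0.
    u = rng.uniform(size=n)
    mag = np.exp(lambertw(np.e / u).real)
    return rng.choice([-1.0, 1.0], size=n) * mag

for n in [10**3, 10**4, 10**5]:
    frac = np.mean([abs(sample(n).mean()) < 0.5 for _ in range(100)])
    print(n, frac)  # fraction with |S_n/n| < 0.5 drifts toward 1, slowly:
                    # tau(x) = e/log x decays only logarithmically
```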

7.2 A General Weak Law of Large Numbers (Example)

Example
How about the (standard) Cauchy? We have
$$F(x) = 0.5 + \pi^{-1}\arctan x.$$

Solution.
Note that $E(X)$ again does not exist. Now
$$\tau(x) = x P[|X_1| > x] = x\big(1 - 2\pi^{-1}\arctan x\big).$$
But
$$\lim_{x\to\infty}\tau(x) = \lim_{x\to\infty}\frac{1 - 2\pi^{-1}\arctan x}{x^{-1}} = \lim_{x\to\infty}\frac{-2\pi^{-1}(1+x^2)^{-1}}{-x^{-2}} = \lim_{x\to\infty}\frac{2x^2}{\pi(1+x^2)} = \frac{2}{\pi} \ne 0. \quad \text{(Oops!)}$$
So Feller's condition fails; in fact, $S_n/n$ is itself standard Cauchy for every $n$ and never converges in probability to a constant.
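The failure is easy to see numerically (a Python sketch; the 200-replication design is our own choice): the sample mean of $n$ standard Cauchy variables is again standard Cauchy, so, e.g., the median of $|\bar X_n|$ stays near $1$ for every $n$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sample means of iid standard Cauchy do not stabilize: S_n/n is again
# standard Cauchy for every n, consistent with tau(x) -> 2/pi != 0.
for n in [10, 1_000, 100_000]:
    means = np.array([rng.standard_cauchy(n).mean() for _ in range(200)])
    print(n, np.median(np.abs(means)))  # ~ 1 at every n (median of |Cauchy|)
```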

7.3 Almost Sure Convergence of Sums of Independent Random Variables

Proposition 7.3.0 (Kolmogorov's inequality: about tail probabilities of maxima of sums)
Suppose $\{X_n\}$ is an independent sequence of random variables with $E(X_n) = 0$ and $\operatorname{var}(X_n) < \infty$. Then for each $\alpha > 0$,
$$P\Big[\sup_{j\le N} |S_j| > \alpha\Big] \le \frac{1}{\alpha^2}\operatorname{var}(S_N) = \frac{1}{\alpha^2}\sum_{j=1}^N E(X_j^2).$$

Proof.
When $N = 1$, this is Chebychev's inequality. For $N > 1$, we define $A = \big[\sup_{j\le N}|S_j| > \alpha\big] = \bigcup_{k=1}^N A_k$, where
$$A_k = \big[|S_k| > \alpha,\ |S_j| \le \alpha \text{ for } j < k\big], \quad k = 1,\dots,N,$$
are disjoint events. Note that
$$P(A_k) = E(I_{A_k}) \le \alpha^{-2} E\big(S_k^2 I_{A_k}\big) \le \alpha^{-2} E\big((S_k^2 + (S_N - S_k)^2)\, I_{A_k}\big) = \alpha^{-2} E\big(S_N^2 I_{A_k}\big),$$
where the last equality holds because the cross term $2E\big(S_k(S_N - S_k) I_{A_k}\big)$ vanishes: $S_N - S_k$ has mean zero and is independent of $S_k I_{A_k}$. Summing over $k$ yields
$$P(A) = P\Big[\sup_{j\le N}|S_j| > \alpha\Big] \le \alpha^{-2} E\big(S_N^2 I_A\big) \le \alpha^{-2} E(S_N^2).$$
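A Monte Carlo check of the inequality (a Python sketch with parameters of our own choosing):

```python
import numpy as np

rng = np.random.default_rng(4)

# Kolmogorov's inequality for X_i ~ Uniform(-1, 1):
# P[max_{j<=N} |S_j| > alpha] <= var(S_N)/alpha^2 = N/(3 alpha^2).
N, alpha, reps = 200, 15.0, 20_000
s = rng.uniform(-1, 1, size=(reps, N)).cumsum(axis=1)
lhs = np.mean(np.abs(s).max(axis=1) > alpha)
print(lhs, N / (3 * alpha**2))  # empirical tail vs. the bound
```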

7.3 Almost Sure Convergence of Sums of Independent Random Variables

Proposition 7.3.1 (Skorohod's inequality: about tail probabilities of maxima of sums)
Suppose $\{X_n\}$ is an independent sequence of random variables and $\alpha > 0$ is fixed. For $N \ge 1$, set
$$c = \sup_{j \le N} P[|S_N - S_j| > \alpha] \quad \Big(= \sup_{j \le N} P[|S_j| > \alpha] \text{ if iid}\Big).$$
Suppose $c < 1$. Then
$$P\Big[\sup_{j\le N}|S_j| > 2\alpha\Big] \le \frac{1}{1-c}\, P[|S_N| > \alpha].$$

Remark. Compared to the famous Kolmogorov inequality, one advantage of Skorohod's inequality is that it does not require moment assumptions.

Proof of Skorohod's inequality.
Define $J := \inf\{j : |S_j| > 2\alpha\}$, with the convention that $\inf\emptyset = \infty$. Note that
$$\Big[\sup_{j\le N}|S_j| > 2\alpha\Big] = [J \le N] = \bigcup_{j=1}^N [J = j],$$

" # N X sup |Sj | > 2α = [J ≤ N] = [J = j] j≤N j=1

where the last union is a union of disjoint sets. Now we write

$$P[|S_N| > \alpha] \ge P[|S_N| > \alpha,\ J \le N] = \sum_{j=1}^N P[|S_N| > \alpha,\ J = j] \ge \sum_{j=1}^N P[|S_N - S_j| \le \alpha,\ J = j],$$
where the last inequality holds because if $|S_N - S_j| \le \alpha$ and $|S_j| > 2\alpha$, then $|S_N| > \alpha$.

It is also true that $S_N - S_j = \sum_{i=j+1}^N X_i \in \mathcal{B}(X_{j+1},\dots,X_N)$ and
$$[J = j] = \Big[\sup_{i < j}|S_i| \le 2\alpha,\ |S_j| > 2\alpha\Big] \in \mathcal{B}(X_1,\dots,X_j).$$
Noting that $\mathcal{B}(X_{j+1},\dots,X_N) \perp \mathcal{B}(X_1,\dots,X_j)$, we have

$$P[|S_N| > \alpha] \ge \sum_{j=1}^N P[|S_N - S_j| \le \alpha]\, P[J = j] \ge (1-c)\sum_{j=1}^N P[J = j] = (1-c)P[J \le N] = (1-c)P\Big[\sup_{j\le N}|S_j| > 2\alpha\Big].$$

7.3 Almost Sure Convergence of Sums of Independent Random Variables

Based on Skorohod’s inequality, we may now present a remarkable result which shows the equivalence of convergence in probability to almost sure convergence for sums of independent random variables. Theorem 7.3.2 (Lévy’s theorem)

If $\{X_n\}$ are independent, then
$$\sum_n X_n \text{ converges i.p.} \quad \text{iff} \quad \sum_n X_n \text{ converges a.s.};$$
i.e., letting $S_n = \sum_{i=1}^n X_i$, the following are equivalent:
1. $\{S_n\}$ is Cauchy in probability.

2. {Sn} converges in probability.

3. {Sn} converges almost surely.

4. {Sn} is almost surely Cauchy.

Proof of Lévy's theorem.
Assume $\{S_n\}$ converges in probability, so that $\{S_n\}$ is Cauchy in probability. We show $\{S_n\}$ is almost surely convergent by showing it is almost surely Cauchy. To show $\{S_n\}$ is almost surely Cauchy, we need to show
$$\xi_N := \sup_{m,n \ge N}|S_m - S_n| \xrightarrow{a.s.} 0$$
as $N \to \infty$. Note that $\{\xi_N, N \ge 1\}$ is a decreasing sequence, so $\xi_N \xrightarrow{P} 0$ implies $\xi_N \xrightarrow{a.s.} 0$. To show $\xi_N \xrightarrow{P} 0$, we see
$$\xi_N = \sup_{m,n\ge N}|S_m - S_N + S_N - S_n| \le \sup_{m\ge N}|S_m - S_N| + \sup_{n\ge N}|S_n - S_N| = 2\sup_{n\ge N}|S_n - S_N| = 2\sup_{j\ge 0}|S_{N+j} - S_N|.$$

It suffices to show that $\sup_{j\ge 0}|S_{N+j} - S_N| \xrightarrow{P} 0$. For any $\epsilon > 0$ and $0 < \delta < \frac{1}{2}$, the assumption that $\{S_n\}$ is Cauchy i.p. implies that there exists $N_{\epsilon,\delta}$ such that
$$P\big[|S_m - S_{m'}| > \tfrac{\epsilon}{2}\big] \le \delta \quad \text{if } m, m' \ge N_{\epsilon,\delta},$$
and hence, if $N \ge N_{\epsilon,\delta}$,
$$P\big[|S_{N+j} - S_N| > \tfrac{\epsilon}{2}\big] \le \delta, \quad \forall j \ge 0.$$

Now write
$$P\Big[\sup_{j\ge 0}|S_{N+j} - S_N| > \epsilon\Big] = P\Big[\lim_{N'\to\infty}\Big[\sup_{0\le j\le N'}|S_{N+j} - S_N| > \epsilon\Big]\Big] = \lim_{N'\to\infty} P\Big[\sup_{0\le j\le N'}|S_{N+j} - S_N| > \epsilon\Big].$$


Now we seek to apply Skorohod's inequality. Let $X_i' = X_{N+i}$ and
$$S_j' = \sum_{i=1}^j X_i' = \sum_{i=1}^j X_{N+i} = S_{N+j} - S_N.$$
With this notation (applying Skorohod's inequality with $2\alpha = \epsilon$) we have
$$P\Big[\sup_{0\le j\le N'}|S_{N+j} - S_N| > \epsilon\Big] = P\Big[\sup_{0\le j\le N'}|S_j'| > \epsilon\Big] \le \frac{1}{1 - \sup_{j\le N'} P\big[|S_{N'}' - S_j'| > \frac{\epsilon}{2}\big]}\, P\big[|S_{N'}'| > \tfrac{\epsilon}{2}\big]$$
$$= \frac{1}{1 - \sup_{j\le N'} P\big[|S_{N+N'} - S_{N+j}| > \frac{\epsilon}{2}\big]}\, P\big[|S_{N'}'| > \tfrac{\epsilon}{2}\big] \le (1-\delta)^{-1}\delta \le 2\delta,$$
which can be made arbitrarily small. This completes the proof.


7.3 Almost Sure Convergence of Sums of Independent Random Variables

Theorem 7.3.3 (Kolmogorov convergence criterion)
If $\{X_n\}$ are independent, and
$$\sum_{j=1}^\infty \operatorname{var}(X_j) < \infty,$$
then
$$\sum_{j=1}^\infty \big(X_j - E(X_j)\big) \text{ converges almost surely.}$$

Proof.
Without loss of generality, assume $E(X_j) = 0$. Then $\sum_{j=1}^\infty E(X_j^2) < \infty$, which implies that $\{S_n\}$ is $L_2$-Cauchy, since for $m < n$
$$\|S_n - S_m\|_2^2 = \operatorname{var}(S_n - S_m) = \sum_{j=m+1}^n E(X_j^2) \to 0$$
as $m, n \to \infty$. So $\{S_n\}$, being $L_2$-Cauchy, is also Cauchy in probability:
$$P[|S_n - S_m| > \epsilon] \le \epsilon^{-2}\operatorname{var}(S_n - S_m) = \epsilon^{-2}\sum_{j=m+1}^n \operatorname{var}(X_j) \to 0$$
as $n, m \to \infty$. By Lévy's theorem, $\{S_n\}$ is almost surely convergent.
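A simulation sketch of the criterion (Python; the exponent $3/4$ is our own choice):

```python
import numpy as np

rng = np.random.default_rng(5)

# Kolmogorov convergence criterion: X_j = eps_j / j^{0.75} with
# independent random signs has sum_j var(X_j) = sum_j j^{-1.5} < infinity,
# so sum_j X_j converges a.s.: each simulated path of partial sums settles.
n = 10**6
j = np.arange(1, n + 1)
for _ in range(3):
    eps = rng.choice([-1.0, 1.0], size=n)
    path = np.cumsum(eps / j**0.75)
    print(path[n // 100 - 1], path[n // 10 - 1], path[-1])  # nearly equal
```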

7.4 Strong Laws of Large Numbers

This section considers the problem of when sums of independent random variables, properly scaled and centered, converge almost surely. We will prove that sample averages converge to mathematical expectations when the sequence is iid and a mean exists. We begin with a number theory result which is traditionally used in the development of the theory.

Lemma 7.4.1 (Kronecker's lemma)
Suppose we have two sequences $\{x_k\}$ and $\{a_k\}$ such that $x_k \in \mathbb{R}$ and $0 < a_n \uparrow \infty$. If
$$\sum_k \frac{x_k}{a_k} \text{ converges},$$
then
$$\lim_{n\to\infty} a_n^{-1}\sum_{k=1}^n x_k = 0.$$

Proof of Kronecker's lemma.
Let $r_n = \sum_{k=n+1}^\infty x_k/a_k$, so that $r_n \to 0$ as $n \to \infty$. Given $\epsilon > 0$, there exists $N_0 = N_0(\epsilon)$ such that $|r_n| \le \epsilon$ for $n \ge N_0$. Now
$$\frac{x_n}{a_n} = r_{n-1} - r_n, \quad \text{so} \quad x_n = a_n(r_{n-1} - r_n), \quad n \ge 1,$$
and, by summation by parts,
$$\sum_{k=1}^n x_k = \sum_{k=1}^n a_k(r_{k-1} - r_k) = \sum_{j=1}^{n-1}(a_{j+1} - a_j)\, r_j + a_1 r_0 - a_n r_n.$$

Then for $n \ge N_0$,
$$\Big|\frac{\sum_{k=1}^n x_k}{a_n}\Big| \le \sum_{j=1}^{N_0-1}\frac{a_{j+1} - a_j}{a_n}|r_j| + \sum_{j=N_0}^{n-1}\frac{a_{j+1} - a_j}{a_n}|r_j| + \frac{a_1 |r_0|}{a_n} + \frac{a_n |r_n|}{a_n}$$
$$\le \frac{\text{const}}{a_n} + \frac{\epsilon}{a_n}\big(a_{N_0+1} - a_{N_0} + a_{N_0+2} - a_{N_0+1} + \cdots + a_n - a_{n-1}\big) + \epsilon$$
$$\le o(1) + \epsilon\,\frac{a_n - a_{N_0}}{a_n} + \epsilon \le o(1) + 2\epsilon.$$

This shows the result.


7.4 Strong Laws of Large Numbers

Corollary 7.4.1 (A strong law)
Let $\{X_n\}$ be independent with $E(X_n^2) < \infty$. Suppose we have a monotone sequence $b_n \uparrow \infty$. If
$$\sum_k \operatorname{var}\Big(\frac{X_k}{b_k}\Big) < \infty,$$
then
$$\frac{S_n - E(S_n)}{b_n} \xrightarrow{a.s.} 0.$$

Proof.
By the Kolmogorov convergence criterion,
$$\sum_k \frac{X_k - E(X_k)}{b_k} \text{ converges a.s.}$$
Then, from Kronecker's lemma, we have
$$b_n^{-1}\sum_{k=1}^n \big(X_k - E(X_k)\big) \xrightarrow{a.s.} 0.$$

7.4.1 Example 1: Record counts

Suppose $\{X_n\}$ is iid with continuous cdf $F$. Define
$$\mu_N = \sum_{j=1}^N I_{[X_j \text{ is a record}]} =: \sum_{j=1}^N 1_j,$$
so $\mu_N$ is the number of records in the first $N$ observations. A fact from analysis we will use: $\sum_{j=1}^n \frac{1}{j} - \log n \to \gamma$ (Euler's constant) as $n \to \infty$.

Proposition 7.4.1 (Logarithmic growth rate)
The number of records in an iid sequence grows logarithmically, and we have the almost sure limit
$$\lim_{N\to\infty}\frac{\mu_N}{\log N} = 1.$$

Proof.

Recall that $\{1_j, j \ge 1\}$ are independent. The basic facts about $\{1_j\}$ are that
$$P[1_j = 1] = \frac{1}{j}, \quad E(1_j) = \frac{1}{j}, \quad \operatorname{var}(1_j) = E(1_j^2) - (E 1_j)^2 = \frac{1}{j} - \frac{1}{j^2} = \frac{j-1}{j^2}.$$
This implies that
$$\sum_{j=2}^\infty \operatorname{var}\Big(\frac{1_j}{\log j}\Big) = \sum_{j=2}^\infty \frac{\operatorname{var}(1_j)}{(\log j)^2} = \sum_{j=2}^\infty \frac{j-1}{j^2(\log j)^2} < \infty.$$


The Kolmogorov convergence criterion implies that
$$\sum_{j=2}^\infty \frac{1_j - E(1_j)}{\log j} = \sum_{j=2}^\infty \frac{1_j - \frac{1}{j}}{\log j} \quad \text{converges a.s.,}$$
and Kronecker's lemma yields
$$\frac{\sum_{j=1}^n 1_j - \sum_{j=1}^n j^{-1}}{\log n} = \frac{\mu_n - \sum_{j=1}^n j^{-1}}{\log n} \xrightarrow{a.s.} 0.$$
Thus
$$\frac{\mu_n}{\log n} - 1 = \frac{\mu_n - \sum_{j=1}^n j^{-1}}{\log n} + \frac{\sum_{j=1}^n j^{-1} - \log n}{\log n} \to 0 \quad \text{a.s.}$$

This completes the derivation.
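A simulation sketch of Proposition 7.4.1 (Python; note the convergence is quite slow):

```python
import numpy as np

rng = np.random.default_rng(6)

# Record counts: mu_N = number of records among the first N of an iid
# continuous sequence; Proposition 7.4.1 gives mu_N / log N -> 1 a.s.
# (slowly: E(mu_N) ~ log N + gamma with sd of order sqrt(log N)).
N = 10**6
x = rng.uniform(size=N)
is_record = x == np.maximum.accumulate(x)   # ties have probability zero
mu = np.cumsum(is_record)
for n in [10**2, 10**4, 10**6]:
    print(n, mu[n - 1] / np.log(n))
```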


7.4.2 Example 2: Explosions in the Pure Birth Process

Let $\{X_n\}$ be independent with
$$P[X_n > x] = e^{-\lambda_n x}, \quad x > 0,$$
where $\lambda_n > 0$ are birth parameters. Let $S_0 = 0$, $S_n = \sum_{i=1}^n X_i$, and define the population size process $\{X(t), t \ge 0\}$ of the pure birth process by $X(t) = n$ if $S_{n-1} \le t < S_n$. Next define the event explosion by
$$[\text{explosion}] = \Big[\sum_{n=1}^\infty X_n < \infty\Big] = \big[X(t) = \infty \text{ for some finite } t\big].$$

Proposition 7.4.2
$$P[\text{explosion}] = I\Big(\sum_n \lambda_n^{-1} < \infty\Big).$$

Proof.
By the Kolmogorov Zero-One Law, we know that $P[\text{explosion}] = P[\sum_n X_n < \infty] = 0$ or $1$.

If $\sum_n \lambda_n^{-1} < \infty$, then by the monotone convergence theorem,
$$E\Big(\sum_n X_n\Big) = \sum_n E(X_n) = \sum_n \lambda_n^{-1} < \infty,$$
so $\sum_n X_n < \infty$ a.s. Thus
$$P[\text{explosion}] = P\Big[\sum_n X_n < \infty\Big] = 1.$$
Conversely, suppose $P[\sum_n X_n < \infty] = 1$. Then $1 > \exp\{-\sum_{n=1}^\infty X_n\} > 0$ a.s., which implies that $1 > E\big(\exp\{-\sum_n X_n\}\big) > 0$. Now
$$E\big(e^{-\sum_{n=1}^\infty X_n}\big) = E\Big(\prod_{n=1}^\infty e^{-X_n}\Big) = E\Big(\lim_{N\to\infty}\prod_{n=1}^N e^{-X_n}\Big) = \lim_{N\to\infty} E\Big(\prod_{n=1}^N e^{-X_n}\Big) \quad \text{(by monotone convergence)}$$
$$= \lim_{N\to\infty}\prod_{n=1}^N E\big(e^{-X_n}\big) \quad \text{(by independence)} = \lim_{N\to\infty}\prod_{n=1}^N \frac{\lambda_n}{1 + \lambda_n}.$$


Taking logs, we see that $P[\sum_n X_n < \infty] = 1$ implies
$$0 < \sum_{n=1}^\infty \log\big(1 + \lambda_n^{-1}\big) < \infty.$$
Then $\log(1 + \lambda_n^{-1}) \to 0$, which implies $\lambda_n^{-1} \to 0$. Since
$$\lim_{x\downarrow 0}\frac{\log(1+x)}{x} = 1$$
by L'Hôpital's rule, we have $\log(1 + \lambda_n^{-1}) \sim \lambda_n^{-1}$ as $n \to \infty$, and thus
$$0 < \sum_{n=1}^\infty \log\big(1 + \lambda_n^{-1}\big) < \infty \quad \text{iff} \quad \sum_{n=1}^\infty \lambda_n^{-1} < \infty.$$
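A simulation sketch contrasting the two regimes (Python; the rate choices $\lambda_n = n^2$ and $\lambda_n = n$ are our own):

```python
import numpy as np

rng = np.random.default_rng(7)

# Pure birth process: X_n ~ Exp(rate lambda_n). With lambda_n = n^2 the
# series sum 1/lambda_n converges, so the total time sum_n X_n is a.s.
# finite (explosion); with lambda_n = n, sum 1/lambda_n diverges and the
# total time is a.s. infinite (here it grows like log n in the cutoff).
n = np.arange(1, 10**6 + 1)
for rates, label in [(n**2, "lambda_n = n^2"), (n, "lambda_n = n")]:
    total = rng.exponential(1.0 / rates).sum()
    print(label, total)   # ~ finite (mean pi^2/6) vs. ~ 14 (like H_n)
```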


7.5 The Strong Law of Large Numbers for IID Sequences

Lemma 7.5.1
Let $\{X_n\}$ be iid. The following are equivalent:
(a) $E|X_1| < \infty$;
(b) $\lim_{n\to\infty}\frac{X_n}{n} = 0$ almost surely;
(c) for every $\epsilon > 0$, $\sum_{n=1}^\infty P[|X_1| \ge \epsilon n] < \infty$.

Proof.
(a) $\leftrightarrow$ (c): Observe that
$$E(|X_1|) = \int_0^\infty P[|X_1| \ge x]\, dx = \sum_{n=0}^\infty \int_n^{n+1} P[|X_1| \ge x]\, dx,$$
so
$$\sum_{n=0}^\infty P[|X_1| \ge n+1] \le E(|X_1|) \le \sum_{n=0}^\infty P[|X_1| \ge n].$$
Thus $E(|X_1|) < \infty$ iff $\sum_{n=0}^\infty P[|X_1| \ge n] < \infty$. For every $\epsilon > 0$, set $Y = X_1/\epsilon$. We get $E(|X_1|) < \infty$ iff $E(|Y|) < \infty$ iff $\sum_{n=0}^\infty P[|Y| \ge n] = \sum_{n=0}^\infty P[|X_1| \ge \epsilon n] < \infty$.

(c) $\leftrightarrow$ (b): Given $\epsilon > 0$,
$$\sum_n P[|X_1| \ge \epsilon n] = \sum_n P[|X_n| \ge \epsilon n] < \infty$$
is equivalent, by the Borel zero-one law, to
$$P\Big[\frac{|X_n|}{n} > \epsilon \ \text{i.o.}\Big] = 0,$$
which is in turn equivalent to
$$\limsup_{n\to\infty}\frac{|X_n|}{n} \le \epsilon \quad \text{almost surely.}$$
Since $\limsup_{n\to\infty}|X_n|/n$ is a tail function, it is a.s. constant. If this constant is bounded by $\epsilon$ for every $\epsilon > 0$, then
$$\limsup_{n\to\infty}\frac{|X_n|}{n} = 0.$$
This gives (b), and the converse is similar.


7.5 The Strong Law of Large Numbers for IID Sequences

Kolmogorov's SLLN
Let $\{X_n\}$ be iid and set $S_n = \sum_{i=1}^n X_i$. There exists $c \in \mathbb{R}$ such that
$$\bar X_n = S_n/n \xrightarrow{a.s.} c$$
iff $E(|X_1|) < \infty$, in which case $c = \mu = E(X_1)$.

Corollary 7.5.1
If $\{X_n\}$ is iid, then $E(|X_1|) < \infty$ implies $\bar X_n \xrightarrow{a.s.} \mu$, and $E(X_1^2) < \infty$ implies
$$\frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 \xrightarrow{a.s.} \sigma^2 = \operatorname{var}(X_1) \quad \text{and} \quad \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 \xrightarrow{a.s.} \sigma^2 = \operatorname{var}(X_1).$$

Proof of Kolmogorov's SLLN.
We show first that $S_n/n \xrightarrow{a.s.} c$ implies $E(|X_1|) < \infty$. This follows from Lemma 7.5.1 and
$$\frac{X_n}{n} = \frac{S_n - S_{n-1}}{n} = \frac{S_n}{n} - \frac{n-1}{n}\cdot\frac{S_{n-1}}{n-1} \xrightarrow{a.s.} c - c = 0.$$

Now we show that $E(|X_1|) < \infty$ implies $S_n/n \xrightarrow{a.s.} E(X_1)$. To do this, we use a truncation argument. Define
$$X_n' = X_n I_{[|X_n| \le n]}, \quad n \ge 1.$$
Then
$$\sum_n P[X_n' \ne X_n] = \sum_n P[|X_n| > n] < \infty$$
(since $E|X_1| < \infty$), and hence $\{X_n\}$ and $\{X_n'\}$ are tail equivalent. Therefore, by Proposition 7.1.1,
$$S_n/n \xrightarrow{a.s.} E(X_1) \quad \text{iff} \quad S_n'/n = \sum_{j=1}^n X_j'/n \xrightarrow{a.s.} E(X_1).$$

So it suffices to consider the truncated sequence.


Next observe that
$$\Big|\frac{S_n' - E(S_n')}{n} - \Big(\frac{S_n'}{n} - E(X_1)\Big)\Big| = \Big|\frac{n E(X_1) - \sum_{j=1}^n E\big(X_1 I_{[|X_1|\le j]}\big)}{n}\Big| \le \frac{1}{n}\sum_{j=1}^n \Big|E(X_1) - E\big(X_1 I_{[|X_1|\le j]}\big)\Big| \to 0.$$
This last step follows from the fact that
$$\big|E(X_1) - E\big(X_1 I_{[|X_1|\le n]}\big)\big| \le E\big(|X_1| I_{[|X_1|>n]}\big) \to 0,$$
whose Cesàro averages then also tend to zero. We thus conclude that
$$\frac{S_n'}{n} - E(X_1) \xrightarrow{a.s.} 0 \quad \text{iff} \quad \frac{\sum_{j=1}^n \big(X_j' - E(X_j')\big)}{n} \xrightarrow{a.s.} 0.$$
Then, by Kronecker's lemma, it is enough to prove
$$\sum_j \operatorname{var}(X_j'/j) < \infty.$$


$$\sum_j \operatorname{var}\Big(\frac{X_j'}{j}\Big) \le \sum_j \frac{E(X_j')^2}{j^2} = \sum_{j=1}^\infty \frac{1}{j^2} E\big(X_1^2 I_{[|X_1|\le j]}\big) = \sum_{j=1}^\infty \frac{1}{j^2}\sum_{k=1}^j E\big(X_1^2 I_{[k-1<|X_1|\le k]}\big)$$
$$= \sum_{k=1}^\infty \Big(\sum_{j=k}^\infty \frac{1}{j^2}\Big) E\big(X_1^2 I_{[k-1<|X_1|\le k]}\big) \le \sum_{k=1}^\infty \frac{2}{k}\, E\big(X_1^2 I_{[k-1<|X_1|\le k]}\big) \le \sum_{k=1}^\infty \frac{2}{k}\cdot k\, E\big(|X_1| I_{[k-1<|X_1|\le k]}\big)$$
$$= 2 E(|X_1|) < \infty.$$
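A simulation sketch of the SLLN (Python; Exp(1) is our own choice of distribution):

```python
import numpy as np

rng = np.random.default_rng(8)

# Kolmogorov's SLLN: for iid Exp(1), every path of the running average
# converges to mu = 1 (contrast with the Cauchy example, where no first
# moment exists and the running average never settles).
n = 10**6
x = rng.exponential(1.0, size=n)
avg = np.cumsum(x) / np.arange(1, n + 1)
print(avg[[99, 9_999, 999_999]])  # -> 1
```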


7.5.1 Application 1: Renewal Theory

Suppose the $X_n$'s are iid and non-negative, with $E(X_n) = \mu \in (0, \infty)$. Then $S_n/n \xrightarrow{a.s.} \mu$, and $S_n \xrightarrow{a.s.} \infty$.

Let $S_0 = 0$ and define
$$N(t) = \sum_{j=0}^\infty I_{[S_j \le t]}.$$
We call $N(t)$ the number of renewals in $[0, t]$. Then
$$[N(t) \le n] = [S_n > t] \quad \text{and} \quad S_{N(t)-1} \le t < S_{N(t)}.$$
We conjecture that $N(t)/t \xrightarrow{a.s.} \mu^{-1}$.

We first show that $N(t) \to \infty$ a.s. as $t \to \infty$; because of monotonicity, it suffices to show $N(t) \xrightarrow{P} \infty$. Since for any $m$,
$$\lim_{t\to\infty} P[N(t) \le m] = \lim_{t\to\infty} P[S_m > t] = 0,$$
we get the desired result that $N(t) \xrightarrow{P} \infty$. Now define the sets
$$\Lambda_1 = \Big\{\omega : \frac{S_n(\omega)}{n} \to \mu\Big\}, \quad \Lambda_2 = \{\omega : N(t,\omega) \to \infty\};$$
by the a.s. convergences above, $P(\Lambda_1) = P(\Lambda_2) = 1$, so $\Lambda := \Lambda_1 \cap \Lambda_2$ has $P(\Lambda) = 1$. For $\omega \in \Lambda$, as $t \to \infty$,
$$S_{N(t,\omega)}(\omega)/N(t,\omega) \to \mu,$$
and so $S_{N(t)}/N(t) \to \mu$ a.s. as $t \to \infty$. From
$$\frac{N(t)-1}{N(t)}\cdot\frac{S_{N(t)-1}}{N(t)-1} \le \frac{t}{N(t)} < \frac{S_{N(t)}}{N(t)},$$
we conclude that $t/N(t) \xrightarrow{a.s.} \mu$ and thus $N(t)/t \xrightarrow{a.s.} \mu^{-1}$. Thus the long-run rate of renewals is $\mu^{-1}$.


7.5.2 Application 2: Glivenko-Cantelli Theorem

Suppose the $X_n$'s are iid with common distribution $F$. We estimate $F$ by the empirical cumulative distribution function (ecdf)
$$\hat F_n(x) = n^{-1}\sum_{j=1}^n I_{[X_j \le x]}.$$
By the SLLN, $\hat F_n(x) \xrightarrow{a.s.} F(x)$ for each fixed $x$.

Glivenko-Cantelli Theorem
$$D_n := \sup_x |\hat F_n(x) - F(x)| \xrightarrow{a.s.} 0.$$

Proof.
Define
$$x_{v,k} := F^{\leftarrow}(v/k), \quad v = 1,\dots,k, \quad \text{where } F^{\leftarrow}(x) = \inf\{u : F(u) \ge x\}.$$
Recall that $F^{\leftarrow}(u) \le t$ iff $u \le F(t)$, and that $F(F^{\leftarrow}(u)) \ge u$, $F(F^{\leftarrow}(u)-) \le u$.

If $x_{v,k} \le x < x_{v+1,k}$, then monotonicity implies
$$F(x_{v,k}) \le F(x) \le F(x_{v+1,k}-), \qquad \hat F_n(x_{v,k}) \le \hat F_n(x) \le \hat F_n(x_{v+1,k}-),$$
and for such $x$,
$$\hat F_n(x_{v,k}) - F(x_{v+1,k}-) \le \hat F_n(x) - F(x) \le \hat F_n(x_{v+1,k}-) - F(x_{v,k}).$$
Since
$$F(x_{v+1,k}-) - F(x_{v,k}) \le \frac{v+1}{k} - \frac{v}{k} = \frac{1}{k},$$
we have
$$\hat F_n(x_{v,k}) - F(x_{v,k}) - \frac{1}{k} \le \hat F_n(x) - F(x) \le \hat F_n(x_{v+1,k}-) - F(x_{v+1,k}-) + \frac{1}{k}.$$


Therefore
$$\sup_{x\in[x_{v,k},\, x_{v+1,k})} \big|\hat F_n(x) - F(x)\big| \le \frac{1}{k} + \big|\hat F_n(x_{v,k}) - F(x_{v,k})\big| \vee \big|\hat F_n(x_{v+1,k}-) - F(x_{v+1,k}-)\big|,$$
which is valid for $v = 1,\dots,k-1$. Taking the supremum over $v$,
$$\sup_{x\in[x_{1,k},\, x_{k,k})} \big|\hat F_n(x) - F(x)\big| \le \frac{1}{k} + \bigvee_{v=1}^k \Big(\big|\hat F_n(x_{v,k}) - F(x_{v,k})\big| \vee \big|\hat F_n(x_{v,k}-) - F(x_{v,k}-)\big|\Big) =: \text{RHS}.$$
We now show that this inequality also holds for $x < x_{1,k}$ and $x \ge x_{k,k}$. If $x \ge x_{k,k}$, then $F(x) = \hat F_n(x) = 1$ a.s., so $\hat F_n(x) - F(x) = 0$ and RHS is still an upper bound.


If $x < x_{1,k}$, then either (i) $F(x) \ge \hat F_n(x)$, in which case
$$\big|\hat F_n(x,\omega) - F(x)\big| = F(x) - \hat F_n(x,\omega) \le F(x) \le F(x_{1,k}-) \le \frac{1}{k} \le \text{RHS},$$
or (ii) $\hat F_n(x) > F(x)$, in which case
$$\big|\hat F_n(x,\omega) - F(x)\big| = \hat F_n(x,\omega) - F(x) \le \hat F_n(x_{1,k}-,\omega) - F(x_{1,k}-) + F(x_{1,k}-) - F(x)$$
$$\le \big|\hat F_n(x_{1,k}-,\omega) - F(x_{1,k}-)\big| + \big|F(x_{1,k}-) - F(x)\big| \le \frac{1}{k} + \big|\hat F_n(x_{1,k}-,\omega) - F(x_{1,k}-)\big| \le \text{RHS}.$$
We therefore conclude that
$$D_n \le \text{RHS}.$$


The SLLN implies that there exist sets $\Lambda_{v,k}$ and $\tilde\Lambda_{v,k}$ with $P(\Lambda_{v,k}) = P(\tilde\Lambda_{v,k}) = 1$ such that
$$\hat F_n(x_{v,k}) \to F(x_{v,k}) \quad \text{on } \Lambda_{v,k},$$
$$\hat F_n(x_{v,k}-) = \frac{1}{n}\sum_{j=1}^n I_{[X_j < x_{v,k}]} \to P[X_1 < x_{v,k}] = F(x_{v,k}-) \quad \text{on } \tilde\Lambda_{v,k}.$$
Let
$$\Lambda_k = \bigcap_v \big(\Lambda_{v,k} \cap \tilde\Lambda_{v,k}\big),$$
so $P(\Lambda_k) = 1$. Then for $\omega \in \Lambda_k$,
$$\limsup_{n\to\infty} D_n(\omega) \le \frac{1}{k}.$$
For $\omega \in \bigcap_k \Lambda_k$, where $P\big(\bigcap_k \Lambda_k\big) = 1$,
$$\lim_{n\to\infty} D_n(\omega) = 0.$$
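A simulation sketch of the Glivenko-Cantelli theorem (Python with NumPy/SciPy; evaluating the sup at the order statistics is the standard computational trick, and standard normal data is our own choice):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(10)

# D_n = sup_x |F_hat_n(x) - F(x)| for iid standard normal data. The ecdf
# jumps only at the order statistics X_(1) <= ... <= X_(n), so the sup is
# max_i of max(i/n - F(X_(i)), F(X_(i)) - (i-1)/n).
for n in [100, 10_000, 1_000_000]:
    xs = np.sort(rng.standard_normal(n))
    f = norm.cdf(xs)
    i = np.arange(1, n + 1)
    d_n = np.maximum(i / n - f, f - (i - 1) / n).max()
    print(n, d_n)  # -> 0, roughly at rate 1/sqrt(n)
```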


7.6 The Kolmogorov Three Series Theorem

The Kolmogorov three series theorem provides necessary and sufficient conditions for a series of independent random variables to converge. The result is especially useful when the Kolmogorov convergence criterion may not be applicable, for example, when the existence of variances is not guaranteed.

Theorem 7.6.1
Let $\{X_n\}$ be independent. In order for $\sum_n X_n$ to converge a.s., it is necessary and sufficient that there exists $c > 0$ such that
(i) $\sum_n P[|X_n| > c] < \infty$;
(ii) $\sum_n \operatorname{var}\big(X_n I_{[|X_n|\le c]}\big) < \infty$;
(iii) $\sum_n E\big(X_n I_{[|X_n|\le c]}\big)$ converges.
If $\sum_n X_n$ converges a.s., then (i), (ii), (iii) hold for any $c > 0$. Thus if the three series converge for one value of $c > 0$, they converge for all $c > 0$.
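Before the proof, a simulation sketch contrasting a convergent and a divergent case of the three series test (Python; both examples are our own choices):

```python
import numpy as np

rng = np.random.default_rng(11)

# Three-series contrast (take c = 1). For X_n = eps_n/n with random signs,
# all three series converge, so sum_n X_n converges a.s. For X_n = Z_n/n
# with Z_n standard Cauchy, series (i) already fails:
# sum_n P[|X_n| > 1] ~ sum_n 2/(pi n) = infinity, so sum_n X_n diverges a.s.
n = np.arange(1, 10**6 + 1)
signs = rng.choice([-1.0, 1.0], size=n.size) / n
cauchy = rng.standard_cauchy(n.size) / n
for x, label in [(signs, "eps_n/n"), (cauchy, "Z_n/n")]:
    s = np.cumsum(x)
    print(label, s[9_999], s[99_999], s[-1])  # settles vs. keeps wandering
```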

Proof of Sufficiency.
Suppose the three series converge. Define
$$X_n' = X_n I_{[|X_n|\le c]}.$$
Then
$$\sum_n P[X_n' \ne X_n] = \sum_n P[|X_n| > c] < \infty$$
by (i), so $\{X_n\}$ and $\{X_n'\}$ are tail equivalent. Thus $\sum_n X_n$ converges almost surely iff $\sum_n X_n'$ converges almost surely. From (ii),
$$\sum_n \operatorname{var}(X_n') < \infty,$$
so by the Kolmogorov convergence criterion
$$\sum_j \big(X_j' - E(X_j')\big) \text{ converges a.s.}$$
But (iii) implies $\sum_n E(X_n')$ converges, and thus $\sum_n X_n'$ converges a.s., as desired.

We now examine a proof of the necessity of the Kolmogorov three series theorem. Two lemmas pave the way. The first is a partial converse to the Kolmogorov convergence criterion.

Lemma 7.6.1
Suppose $\{X_n\}$ are independent and uniformly bounded, so that for some $\alpha > 0$ and all $\omega \in \Omega$ we have $|X_n(\omega)| \le \alpha$. If $\sum_n (X_n - E X_n)$ converges almost surely, then $\sum_n \operatorname{var}(X_n) < \infty$.

Proof.
Without loss of generality, we suppose $E(X_n) = 0$ for all $n$ and prove the statement: if $\{X_n, n \ge 1\}$ are independent with $E(X_n) = 0$ and $|X_n| \le \alpha$, then $\sum_n X_n$ almost surely convergent implies $\sum_n E(X_n^2) < \infty$. We set $S_n = \sum_{i=1}^n X_i$ and begin by estimating $\operatorname{var}(S_N) = \sum_{i=1}^N E(X_i^2)$ for a positive integer $N$. To help with this, fix a constant $\lambda > 0$ and define the first passage time out of $[-\lambda, \lambda]$,
$$\tau := \inf\{n \ge 1 : |S_n| > \lambda\},$$
with the convention that $\tau = \infty$ on the set $\big[\bigvee_{n=1}^\infty |S_n| \le \lambda\big]$.

We then have
$$\sum_{i=1}^N E(X_i^2) = E(S_N^2) = E\big(S_N^2 I_{[\tau\le N]}\big) + E\big(S_N^2 I_{[\tau>N]}\big) =: I + II.$$
Note that on $[\tau > N]$ we have $\sup_{i\le N}|S_i| \le \lambda$, so that in particular $S_N^2 \le \lambda^2$. Hence,
$$II \le \lambda^2 P[\tau > N] \le (\lambda+\alpha)^2 P[\tau > N].$$
For $I$ we have
$$I = \sum_{j=1}^N E\big(S_N^2 I_{[\tau=j]}\big),$$
and for $j < N$,
$$E\big(S_N^2 I_{[\tau=j]}\big) = E\Big(\Big(S_j + \sum_{i=j+1}^N X_i\Big)^2 I_{[\tau=j]}\Big).$$


50 / 56 "j−1 # _ Note [τ = j] = |Si | ≤ λ, |Sj | > λ ∈ σ (X1,..., Xj ) i=1 N N X X while Xi ∈ σ (Xj+1,..., XN ) and thus I[τ=j] ⊥ Xi . i=j+1 i=j+1 Hence, for j < N,

 N N !2  2  2 X X E SN I[τ=j] = E Sj + 2Sj Xi + Xi  I[τ=j] i=j+1 i=j+1

N !2 2  X =E Sj I[τ=j] + 0 + E Xi P[τ = j] i=j+1 N 2  X 2 ≤E (|Sj−1| + |Xj |) I[τ=j] + E (Xi ) P[τ = j] i=j+1 N 2 X 2 ≤(λ + α) P[τ = j] + E (Xi ) P[τ = j]. i=j+1


With the convention $\sum_{i=N+1}^N E(X_i^2) = 0$, we have
$$E\big(S_N^2 I_{[\tau=j]}\big) \le \Big((\lambda+\alpha)^2 + \sum_{i=j+1}^N E(X_i^2)\Big) P[\tau=j], \quad 1 \le j \le N.$$
Adding over $j$ yields
$$I = E\big(S_N^2 I_{[\tau\le N]}\big) \le \Big((\lambda+\alpha)^2 + \sum_{i=1}^N E(X_i^2)\Big) P[\tau\le N] = \big((\lambda+\alpha)^2 + E(S_N^2)\big) P[\tau\le N].$$
Finally,
$$E(S_N^2) = \sum_{i=1}^N E(X_i^2) = I + II \le \big((\lambda+\alpha)^2 + E(S_N^2)\big) P[\tau\le N] + (\lambda+\alpha)^2 P[\tau>N]$$
$$\le (\lambda+\alpha)^2 + E(S_N^2)\, P[\tau\le N],$$
i.e.,
$$E(S_N^2) \le \frac{(\lambda+\alpha)^2}{P[\tau>N]}.$$


Let $N \to \infty$. We get
$$\sum_{i=1}^\infty E(X_i^2) \le \frac{(\lambda+\alpha)^2}{P[\tau=\infty]},$$
which is helpful and gives the desired result only if $P[\tau=\infty] > 0$. However, note that we assumed $\sum_n X_n$ is almost surely convergent, and hence for a.e. $\omega$, $\{S_n(\omega), n \ge 1\}$ is a bounded sequence of numbers. So $\sup_n |S_n|$ is almost surely a finite random variable, and there exists $\lambda > 0$ such that
$$P[\tau=\infty] = P\Big[\sup_n |S_n| \le \lambda\Big] > 0;$$
otherwise $P[\sup_n |S_n| < \infty] = 0$, which is a contradiction. This completes the proof of the lemma.


Lemma 7.6.2
Suppose $\{X_n\}$ are independent and uniformly bounded, so that for some $\alpha > 0$ and all $\omega \in \Omega$ we have $|X_n(\omega)| \le \alpha$. Then $\sum_n X_n$ converges almost surely implies that $\sum_n E X_n$ converges.

Proof.
The proof uses a technique called symmetrization. Define an independent sequence $\{Y_n, n \ge 1\}$ which is independent of $\{X_n, n \ge 1\}$ and satisfies $Y_n \stackrel{d}{=} X_n$. Let
$$Z_n = X_n - Y_n, \quad n \ge 1.$$
Then $\{Z_n, n \ge 1\}$ are independent random variables, $E(Z_n) = 0$, and the distribution of each $Z_n$ is symmetric, which amounts to the statement that
$$Z_n \stackrel{d}{=} -Z_n.$$
Further,
$$\operatorname{var}(Z_n) = \operatorname{var}(X_n) + \operatorname{var}(Y_n) = 2\operatorname{var}(X_n),$$
and $|Z_n| \le |X_n| + |Y_n| \le 2\alpha$.

Since $\{X_n, n \ge 1\} \stackrel{d}{=} \{Y_n, n \ge 1\}$ as random elements of $\mathbb{R}^\infty$, the convergence properties of the two sequences are identical, and since $\sum_n X_n$ is assumed almost surely convergent, the same is true of $\sum_n Y_n$. Hence $\sum_n Z_n$ is also almost surely convergent. Since $\{Z_n\}$ is also uniformly bounded, we conclude from Lemma 7.6.1 that
$$\sum_n \operatorname{var}(Z_n) = 2\sum_n \operatorname{var}(X_n) < \infty.$$
From the Kolmogorov convergence criterion we get that $\sum_n (X_n - E(X_n))$ is almost surely convergent. Since we also assume $\sum_n X_n$ is almost surely convergent, it can only be the case that $\sum_n E(X_n)$ converges.


We now turn to the necessity of the Kolmogorov three series theorem.

Proof of necessity.
Since $\sum_n X_n$ converges almost surely, we have $X_n \xrightarrow{a.s.} 0$ and thus
$$P\big([|X_n| > c] \ \text{i.o.}\big) = 0.$$
By the Borel zero-one law, it follows that $\sum_n P[|X_n| > c] < \infty$, which is (i). Then $\{X_n\}$ and $\{X_n I_{[|X_n|\le c]}\}$ are tail equivalent, and one series converges iff the other does. So the uniformly bounded sequence $\{X_n I_{[|X_n|\le c]}\}$ satisfies: $\sum_n X_n I_{[|X_n|\le c]}$ converges almost surely. By Lemma 7.6.2, $\sum_n E\big(X_n I_{[|X_n|\le c]}\big)$ (the series in (iii)) is convergent. Thus the infinite series of uniformly bounded summands
$$\sum_n \Big(X_n I_{[|X_n|\le c]} - E\big(X_n I_{[|X_n|\le c]}\big)\Big)$$
is almost surely convergent, and by Lemma 7.6.1,
$$\sum_n \operatorname{var}\big(X_n I_{[|X_n|\le c]}\big) < \infty,$$
which is (ii).
