STAT 811 Probability Theory II
Chapter 7: Laws of Large Numbers and Sums of Independent Random Variables
Dr. Dewei Wang, Associate Professor, Department of Statistics, University of South Carolina, [email protected]
7.1 Truncation and Equivalence

Random variables that are uniformly bounded or that have moments are often easier to deal with. When these desirable properties are absent, we often use truncation to induce their presence and then ask in what sense the truncated sequence is equivalent to the original one. For example, we often compare $\{X_n\}$ with the truncated sequence
$$\{X_n'\} = \{X_n I_{[|X_n| \le n]}\}.$$
Definition. Two sequences $\{X_n\}$ and $\{X_n'\}$ are tail equivalent if
$$\sum_n P[X_n \ne X_n'] < \infty.$$

Remark: By the Borel-Cantelli Lemma, tail equivalence implies
$$P[X_n \ne X_n' \text{ i.o.}] = 0, \quad\text{or equivalently}\quad P\left[\liminf_{n\to\infty} [X_n = X_n']\right] = 1.$$
Let $N = \liminf_{n\to\infty}[X_n = X_n'] = \bigcup_n \bigcap_{k\ge n} [X_k = X_k']$, so that $P(N) = 1$. For $\omega \in N$, there is a $K(\omega)$ such that $X_k(\omega) = X_k'(\omega)$ whenever $k \ge K(\omega)$. Thus for $\omega \in N$, $\sum_n (X_n(\omega) - X_n'(\omega))$ converges; i.e., $\sum_n (X_n - X_n')$ converges a.s.

In addition, we have $\sum_{n=K(\omega)}^\infty X_n(\omega) = \sum_{n=K(\omega)}^\infty X_n'(\omega)$. And if $a_n \uparrow \infty$, then for $n \ge K(\omega)$,
$$a_n^{-1}\sum_{j=1}^n \left(X_j(\omega) - X_j'(\omega)\right) = a_n^{-1}\sum_{j=1}^{K(\omega)} \left(X_j(\omega) - X_j'(\omega)\right) \to 0.$$
Proposition 7.1.1 (Equivalence) 0 Suppose the two sequences {Xn} and {Xn} are tail equivalent. Then P 0 1. n(Xn − Xn) converges a.s. P P 0 2. n Xn conveges a.s. iff n Xn converge a.s. 3. If an ↑ ∞ and if there exits a random variable X such that
n n −1 X a.s. −1 X 0 a.s. an Xj → X , then also an Xj → X . j=1 j=1
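Numerical illustration (an addition, not part of the original slides): the following Python sketch truncates an iid Exp(1) sequence at level $n$, as above. Since $\sum_n P[|X_n| > n] = \sum_n e^{-n} < \infty$, the two sequences are tail equivalent; empirically they disagree in at most a few early coordinates, so the scaled sums in part 3 of Proposition 7.1.1 agree. The distribution, sample size, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
X = rng.exponential(scale=1.0, size=n)      # iid Exp(1); E|X_1| < infinity
idx = np.arange(1, n + 1)                   # truncation level n for X_n
Xp = np.where(np.abs(X) <= idx, X, 0.0)     # X'_n = X_n * 1[|X_n| <= n]

# Tail equivalence: X_n != X'_n can happen only finitely often (a.s.).
print("indices with X_n != X'_n:", np.nonzero(X != Xp)[0] + 1)

# The scaled sums agree in the limit (Proposition 7.1.1, part 3, a_n = n).
print("mean of X :", X.mean())
print("mean of X':", Xp.mean())
```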
7.2 A General Weak Law of Large Numbers
Theorem 7.2.1 (General weak law of large numbers)
Suppose $\{X_n\}$ are independent random variables and define $S_n = \sum_{j=1}^n X_j$. If
1. $\sum_{j=1}^n P[|X_j| > n] \to 0$,
2. $n^{-2}\sum_{j=1}^n E\left(X_j^2 I_{[|X_j|\le n]}\right) \to 0$,
then, defining
$$a_n = \sum_{j=1}^n E\left(X_j I_{[|X_j|\le n]}\right),$$
we get
$$\frac{S_n - a_n}{n} = \bar X_n - a_n/n \xrightarrow{P} 0.$$
Remark: No assumptions about moments of Xn’s need to be made.
7.2 A General Weak Law of Large Numbers (Special Cases)

Case (a) WLLN with variances.
Suppose $\{X_n\}$ are iid with $E(X_n) = \mu$ and $E(X_n^2) < \infty$. Then $S_n/n \xrightarrow{P} \mu$ as $n\to\infty$.

Proof. By Chebychev's inequality,
$$\sum_{j=1}^n P[|X_j| > n] = nP[|X_1| > n] \le nE(X_1^2)/n^2 \to 0$$
and
$$n^{-2}\sum_{j=1}^n E\left(X_j^2 I_{[|X_j|\le n]}\right) = \frac1n E\left(X_1^2 I_{[|X_1|\le n]}\right) \le \frac1n E(X_1^2) \to 0,$$
so both conditions of Theorem 7.2.1 hold. Finally, we observe that, as $n\to\infty$,
$$\frac{a_n}{n} = E\left(X_1 I_{[|X_1|\le n]}\right) \to E(X_1) = \mu. \qquad\square$$

Case (b) Khintchin's WLLN under the first moment hypothesis.
Suppose $\{X_n\}$ are iid with $E(|X_n|) < \infty$ and $E(X_n) = \mu$. Then $S_n/n \xrightarrow{P} \mu$ as $n\to\infty$.

Proof. We first have, since $E(|X_1|) < \infty$,
$$\sum_{j=1}^n P[|X_j| > n] = nP[|X_1| > n] = E\left(nI_{[|X_1|>n]}\right) \le E\left(|X_1| I_{[|X_1|>n]}\right) \to 0.$$
Next, for any $\epsilon > 0$,
$$n^{-2}\sum_{j=1}^n E\left(X_j^2 I_{[|X_j|\le n]}\right) = \frac1n E\left(X_1^2 I_{[|X_1|\le n]}\right) \le \frac1n E\left(X_1^2 I_{[|X_1|\le \epsilon\sqrt n]}\right) + \frac1n E\left(X_1^2 I_{[\epsilon\sqrt n \le |X_1| \le n]}\right)$$
$$\le \frac{\epsilon^2 n}{n} + \frac1n E\left(n|X_1| I_{[\epsilon\sqrt n \le |X_1| \le n]}\right) \le \epsilon^2 + E\left(|X_1| I_{[\epsilon\sqrt n \le |X_1|]}\right) \to \epsilon^2.$$
Since $\epsilon$ is arbitrary, condition 2 of Theorem 7.2.1 holds. Moreover, since
$$\left|E\left(X_1 I_{[|X_1|\le n]}\right) - E(X_1)\right| \le E\left(|X_1| I_{[|X_1|>n]}\right) \to 0,$$
we have $\bar X_n \xrightarrow{P} \mu$. $\square$
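Numerical illustration (my addition): a minimal Monte Carlo check of Khintchin's WLLN for a distribution with a finite mean but infinite variance, a standard Pareto with shape $\alpha = 1.5$, so Case (a) does not apply but Case (b) does. Convergence is slow because only the first moment exists; the shape, sample sizes, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.5                        # E|X_1| < inf but var(X_1) = inf
mu = alpha / (alpha - 1)           # mean of a Pareto(x_m = 1, alpha)

for n in [10**3, 10**4, 10**5, 10**6]:
    X = rng.pareto(alpha, size=n) + 1.0   # standard Pareto(x_m = 1) sample
    print(n, abs(X.mean() - mu))          # |bar X_n - mu|: shrinks, slowly
```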
Case (c) Feller's WLLN without a first moment assumption.
Suppose $\{X_n\}$ are iid with $\lim_{x\to\infty} xP[|X_1| > x] = 0$. Then $S_n/n - E\left(X_1 I_{[|X_1|\le n]}\right) \xrightarrow{P} 0$.

Proof. It suffices to verify the two conditions of Theorem 7.2.1: (1), which holds obviously,
$$\sum_{j=1}^n P[|X_j| > n] = nP[|X_1| > n] \to 0,$$
and (2)
$$n^{-2}\sum_{j=1}^n E\left(X_j^2 I_{[|X_j|\le n]}\right) = \frac1n E\left(X_1^2 I_{[|X_1|\le n]}\right) \to 0.$$
To verify (2), define $\tau(x) := xP[|X_1| > x]$ and let $F(x)$ be the cdf of $X_1$. Then
$$\frac1n\int_\Omega |X_1|^2 I_{[|X_1|\le n]}\,dP = \frac1n\int_{|x|\le n} x^2\,F(dx) = \frac1n\int_{|x|\le n}\left(\int_{s=0}^{|x|} 2s\,ds\right)F(dx)$$
$$= \frac1n\int_0^n 2s\left[\int_{s<|x|\le n} F(dx)\right]ds \quad\text{(by Fubini)}$$
$$= \frac1n\int_0^n 2s\left(P[|X_1| > s] - P[|X_1| > n]\right)ds = \frac1n\int_0^n 2\tau(s)\,ds - \frac1n\int_0^n 2s\,ds\;P[|X_1| > n]$$
$$= \frac2n\int_0^n \tau(s)\,ds - \underbrace{nP[|X_1| > n]}_{\tau(n)} \to 0,$$
since if $\tau(s)\to 0$, so does its average. $\square$
7.2 A General Weak Law of Large Numbers (Proof)

Proof of Theorem 7.2.1. Define
$$X_{nj}' = X_j I_{[|X_j|\le n]} \quad\text{and}\quad S_n' = \sum_{j=1}^n X_{nj}'.$$
Then
$$\sum_{j=1}^n P[X_{nj}' \ne X_j] = \sum_{j=1}^n P[|X_j| > n] \to 0.$$
So, for any $\epsilon > 0$,
$$P[|S_n - S_n'| > \epsilon] \le P[S_n \ne S_n'] \le P\left[\bigcup_{j=1}^n [X_{nj}' \ne X_j]\right] \le \sum_{j=1}^n P[X_{nj}' \ne X_j] \to 0,$$
and therefore $S_n - S_n' \xrightarrow{P} 0$.

Now look at $S_n'$. By Chebychev's inequality,
$$P\left[\frac{|S_n' - ES_n'|}{n} > \epsilon\right] \le \frac{\mathrm{var}(S_n')}{n^2\epsilon^2} \le \frac{1}{n^2\epsilon^2}\sum_{j=1}^n E(X_{nj}')^2 = \frac{1}{n^2\epsilon^2}\sum_{j=1}^n E\left(X_j^2 I_{[|X_j|\le n]}\right) \to 0.$$
Note $a_n = ES_n' = \sum_{j=1}^n E\left(X_j I_{[|X_j|\le n]}\right)$, and thus
$$\frac{S_n' - a_n}{n} \xrightarrow{P} 0.$$
We therefore get
$$\frac{S_n - a_n}{n} = \frac{S_n - S_n'}{n} + \frac{S_n' - a_n}{n} \xrightarrow{P} 0. \qquad\square$$
7.2 A General Weak Law of Large Numbers (Example)

Example. Let us consider a symmetric distribution whose cdf satisfies $F(x) = 1 - \frac{e}{2x\log x}$ for $x \ge e$, and suppose we have an iid sequence $\{X_n\}$ from it. What is the mean of $X$? What does $S_n/n$ converge to in probability?

Solution. Note that
$$E(X^+) = \int_0^\infty P[X > x]\,dx \ge \int_e^\infty \frac{e}{2x\log x}\,dx = \frac e2\int_1^\infty \frac{dy}{y} = \infty$$
(substituting $y = \log x$). Because the distribution is symmetric, $E(X^+) = E(X^-) = \infty$ and $E(X)$ does not exist. However,
$$\tau(x) = xP[|X| > x] = x\cdot\frac{e}{x\log x} = \frac{e}{\log x} \to 0,$$
and $a_n = 0$ because $F$ is symmetric, so without a mean existing,
$$\frac{S_n}{n} \xrightarrow{P} 0.$$
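Numerical illustration (my addition): the sketch below samples from this symmetric distribution by numerically inverting the tail $P[|X| > x] = e/(x\log x)$ with scipy's brentq root finder, then checks that $S_n/n$ hovers near 0 even though no mean exists. The drift toward 0 is slow since $\tau(x) = e/\log x$ decays only logarithmically; the sample sizes and seed are arbitrary.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(2)

def sample_one():
    """Draw X, symmetric, with P[|X| > x] = e/(x log x) for x >= e."""
    u = rng.uniform(1e-12, 1.0)             # target tail probability
    c = np.e / u                            # solve x log x = e/u on [e, c]
    x = brentq(lambda t: t * np.log(t) - c, np.e, c)
    return x if rng.uniform() < 0.5 else -x

for n in [10**3, 10**4, 10**5]:
    s = sum(sample_one() for _ in range(n))
    print(n, s / n)                         # S_n / n stays near 0
```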
Example. How about the (standard) Cauchy? We have
$$F(x) = \frac12 + \pi^{-1}\arctan x.$$

Solution. Note that $E(X)$ does not exist here either. Now
$$\tau(x) = xP[|X_1| > x] = x\left(1 - 2\pi^{-1}\arctan x\right).$$
But
$$\lim_{x\to\infty}\tau(x) = \lim_{x\to\infty}\frac{1 - 2\pi^{-1}\arctan x}{x^{-1}} = \lim_{x\to\infty}\frac{-2\pi^{-1}(1+x^2)^{-1}}{-x^{-2}} = \lim_{x\to\infty}\frac{2x^2}{\pi(1+x^2)} = \frac2\pi \ne 0. \quad\text{(Oops!)}$$
So Feller's condition fails, and the weak law does not apply to the Cauchy.
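Numerical illustration (my addition): for the standard Cauchy, $S_n/n$ is again standard Cauchy for every $n$, so the running mean never settles down, in contrast with the previous example. The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_cauchy(size=10**6)
means = np.cumsum(X) / np.arange(1, X.size + 1)

# The running mean keeps jumping at every scale instead of stabilizing.
for k in [10**2, 10**3, 10**4, 10**5, 10**6]:
    print(k, means[k - 1])
```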
7.3 Almost Sure Convergence of Sums of Independent Random Variables

Proposition 7.3.0 (Kolmogorov's inequality: about tail probabilities of maxima of sums)
Suppose $\{X_n\}$ is an independent sequence of random variables with $E(X_n) = 0$ and $\mathrm{var}(X_n) < \infty$. Then for each $\alpha > 0$,
$$P\left[\sup_{j\le N}|S_j| > \alpha\right] \le \frac{1}{\alpha^2}\mathrm{var}(S_N) = \frac{1}{\alpha^2}\sum_{j=1}^N E(X_j^2).$$

Proof. When $N = 1$, it is trivial. For any $N > 1$, we define $A = \left[\sup_{j\le N}|S_j| > \alpha\right] = \bigcup_{k=1}^N A_k$, where
$$A_k = \left[|S_k| > \alpha,\ |S_j| \le \alpha,\ j < k\right], \quad k = 1,\dots,N,$$
are disjoint events. Note that
$$P(A_k) = E(I_{A_k}) \le E\left(\frac{S_k^2}{\alpha^2} I_{A_k}\right) \le \alpha^{-2}E\left(\left(S_k^2 + (S_N - S_k)^2\right) I_{A_k}\right) = \alpha^{-2}E\left(S_N^2 I_{A_k}\right),$$
where the last equality holds because the cross term $2E\left(S_k(S_N - S_k)I_{A_k}\right)$ vanishes: $S_k I_{A_k}$ and $S_N - S_k$ are independent and $E(S_N - S_k) = 0$. Summing over $k$ yields
$$P(A) = P\left[\sup_{j\le N}|S_j| > \alpha\right] \le \alpha^{-2}E\left(S_N^2 I_A\right) \le \alpha^{-2}E(S_N^2). \qquad\square$$
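Numerical illustration (my addition): a Monte Carlo check of Kolmogorov's inequality for a simple $\pm1$ random walk; the walk length, threshold $\alpha$, replication count, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
N, reps, alpha = 100, 20_000, 15.0
X = rng.choice([-1.0, 1.0], size=(reps, N))    # mean-0, variance-1 steps
S = np.cumsum(X, axis=1)

lhs = (np.abs(S).max(axis=1) > alpha).mean()   # estimate of P[sup|S_j| > alpha]
rhs = N / alpha**2                             # var(S_N) / alpha^2
print(lhs, "<=", rhs)
```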
Proposition 7.3.1 (Skorohod's inequality: about tail probabilities of maxima of sums)
Suppose $\{X_n\}$ is an independent sequence of random variables and suppose $\alpha > 0$ is fixed. For $N \ge 1$, set
$$c = \sup_{j\le N} P[|S_N - S_j| > \alpha]$$
(if the $X_n$ are iid, this equals $\sup_{j\le N} P[|S_j| > \alpha]$, since then $S_N - S_j \stackrel{d}{=} S_{N-j}$). Suppose $c < 1$. Then
$$P\left[\sup_{j\le N}|S_j| > 2\alpha\right] \le \frac{1}{1-c}\,P[|S_N| > \alpha].$$
Remark. Compared to the famous Kolmogorov inequality, one advantage of Skorohod's inequality is that it does not require moment assumptions.

Proof of Skorohod's inequality. Define $J := \inf\{j : |S_j| > 2\alpha\}$, with the convention that $\inf\emptyset = \infty$. Note that
$$\left[\sup_{j\le N}|S_j| > 2\alpha\right] = [J \le N] = \bigcup_{j=1}^N [J = j],$$
where the last union is a union of disjoint sets. Now we write
$$P[|S_N| > \alpha] \ge P[|S_N| > \alpha, J \le N] = \sum_{j=1}^N P[|S_N| > \alpha, J = j] \ge \sum_{j=1}^N P[|S_N - S_j| \le \alpha, J = j],$$
where the last inequality holds because if $|S_N - S_j| \le \alpha$ and $|S_j| > 2\alpha$, then $|S_N| > \alpha$.
It is also true that $S_N - S_j = \sum_{i=j+1}^N X_i \in \mathcal{B}(X_{j+1},\dots,X_N)$ and $[J = j] = \left[\sup_{i<j}|S_i| \le 2\alpha,\ |S_j| > 2\alpha\right] \in \mathcal{B}(X_1,\dots,X_j)$, so $[|S_N - S_j| \le \alpha]$ and $[J = j]$ are independent. Hence
$$P[|S_N| > \alpha] \ge \sum_{j=1}^N P[|S_N - S_j| \le \alpha]\,P[J = j] \ge \sum_{j=1}^N (1-c)\,P[J = j] = (1-c)\,P[J \le N] = (1-c)\,P\left[\sup_{j\le N}|S_j| > 2\alpha\right]. \qquad\square$$
Based on Skorohod's inequality, we may now present a remarkable result which shows the equivalence of convergence in probability and almost sure convergence for sums of independent random variables.

Theorem 7.3.2 (Lévy's theorem)
If $\{X_n\}$ are independent, then
$$\sum_n X_n \text{ converges i.p.} \iff \sum_n X_n \text{ converges a.s.};$$
i.e., letting $S_n = \sum_{i=1}^n X_i$, the following are equivalent:
1. $\{S_n\}$ is Cauchy in probability.
2. $\{S_n\}$ converges in probability.
3. $\{S_n\}$ converges almost surely.
4. $\{S_n\}$ is almost surely Cauchy.
Proof of Lévy's theorem. Assume $\{S_n\}$ is convergent in probability, so that $\{S_n\}$ is Cauchy in probability. We show $\{S_n\}$ is almost surely convergent by showing $\{S_n\}$ is almost surely Cauchy. To show that $\{S_n\}$ is almost surely Cauchy, we need to show
$$\xi_N = \sup_{m,n\ge N}|S_m - S_n| \xrightarrow{a.s.} 0$$
as $N\to\infty$. Note that $\{\xi_N, N\ge1\}$ is a decreasing sequence, so $\xi_N \xrightarrow{P} 0$ implies $\xi_N \xrightarrow{a.s.} 0$. To show $\xi_N \xrightarrow{P} 0$, we see that
$$\xi_N = \sup_{m,n\ge N}|S_m - S_N + S_N - S_n| \le \sup_{m\ge N}|S_m - S_N| + \sup_{n\ge N}|S_n - S_N| = 2\sup_{n\ge N}|S_n - S_N| = 2\sup_{j\ge 0}|S_{N+j} - S_N|.$$

It therefore suffices to show that $\sup_{j\ge0}|S_{N+j} - S_N| \xrightarrow{P} 0$. For any $\epsilon > 0$ and $0 < \delta < \frac12$, the assumption that $\{S_n\}$ is Cauchy i.p. implies that there exists $N_{\epsilon,\delta}$ such that
$$P\left[|S_m - S_{m'}| > \frac\epsilon2\right] \le \delta \quad\text{if } m, m' \ge N_{\epsilon,\delta},$$
and hence, if $N \ge N_{\epsilon,\delta}$,
$$P\left[|S_{N+j} - S_N| > \frac\epsilon2\right] \le \delta, \quad \forall j \ge 0.$$
Now write
$$P\left[\sup_{j\ge0}|S_{N+j} - S_N| > \epsilon\right] = P\left[\lim_{N'\to\infty}\sup_{0\le j\le N'}|S_{N+j} - S_N| > \epsilon\right] = \lim_{N'\to\infty}P\left[\sup_{0\le j\le N'}|S_{N+j} - S_N| > \epsilon\right].$$

Now we seek to apply Skorohod's inequality. Let $X_i' = X_{N+i}$ and
$$S_j' = \sum_{i=1}^j X_i' = \sum_{i=1}^j X_{N+i} = S_{N+j} - S_N.$$
With this notation we have
$$P\left[\sup_{0\le j\le N'}|S_{N+j} - S_N| > \epsilon\right] = P\left[\sup_{0\le j\le N'}|S_j'| > \epsilon\right] \le \frac{1}{1 - \sup_{j\le N'}P\left[|S_{N'}' - S_j'| > \frac\epsilon2\right]}\,P\left[|S_{N'}'| > \frac\epsilon2\right]$$
$$= \left(\frac{1}{1 - \sup_{j\le N'}P\left[|S_{N+N'} - S_{N+j}| > \frac\epsilon2\right]}\right)P\left[|S_{N'}'| > \frac\epsilon2\right] \le (1-\delta)^{-1}\delta \le 2\delta,$$
which can be made arbitrarily small. $\square$
Theorem 7.3.3 (Kolmogorov convergence criterion)
If $\{X_n\}$ are independent and
$$\sum_{j=1}^\infty \mathrm{var}(X_j) < \infty,$$
then
$$\sum_{j=1}^\infty \left(X_j - E(X_j)\right) \text{ converges almost surely.}$$

Proof. WLOG, assume $E(X_j) = 0$. Then $\sum_{j=1}^\infty E(X_j^2) < \infty$, which implies that $\{S_n\}$ is $L_2$-Cauchy, since for $m < n$,
$$\|S_n - S_m\|_2^2 = \mathrm{var}(S_n - S_m) = \sum_{j=m+1}^n E(X_j^2) \to 0$$
as $m,n\to\infty$. So $\{S_n\}$, being $L_2$-Cauchy, is also Cauchy in probability:
$$P[|S_n - S_m| > \epsilon] \le \epsilon^{-2}\mathrm{var}(S_n - S_m) = \epsilon^{-2}\sum_{j=m+1}^n \mathrm{var}(X_j) \to 0$$
as $n,m\to\infty$. By Lévy's theorem, $\{S_n\}$ is almost surely convergent. $\square$
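Numerical illustration (my addition): the classical random-signs series $\sum_j \pm 1/j$ has $\sum_j \mathrm{var}(X_j) = \sum_j 1/j^2 < \infty$, so by the criterion it converges a.s.; the partial sums visibly stabilize. The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10**6
signs = rng.choice([-1.0, 1.0], size=n)
partial = np.cumsum(signs / np.arange(1, n + 1))   # X_j = (+/-) 1/j

# Partial sums settle to a (random) limit, as the theory predicts.
for k in [10**2, 10**3, 10**4, 10**5, 10**6]:
    print(k, partial[k - 1])
```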
7.4 Strong Laws of Large Numbers

This section considers the problem of when sums of independent random variables, properly scaled and centered, converge almost surely. We will prove that sample averages converge to mathematical expectations when the sequence is iid and a mean exists. We begin with a number theory result which is traditionally used in the development of the theory.

Lemma 7.4.1 (Kronecker's lemma)
Suppose we have two sequences $\{x_k\}$ and $\{a_k\}$ such that $x_k \in \mathbb{R}$ and $0 < a_n \uparrow \infty$. If
$$\sum_k \frac{x_k}{a_k} \text{ converges,}$$
then
$$\lim_{n\to\infty} a_n^{-1}\sum_{k=1}^n x_k = 0.$$
Proof of Kronecker's lemma. Let $r_n = \sum_{k=n+1}^\infty x_k/a_k$, so that $r_n \to 0$ as $n\to\infty$. Given $\epsilon > 0$, there exists $N_0 = N_0(\epsilon)$ such that $|r_n| \le \epsilon$ for $n \ge N_0$. Now
$$\frac{x_n}{a_n} = r_{n-1} - r_n, \quad\text{so}\quad x_n = a_n(r_{n-1} - r_n), \quad n \ge 1,$$
and summation by parts gives
$$\sum_{k=1}^n x_k = \sum_{k=1}^n a_k(r_{k-1} - r_k) = \sum_{j=1}^{n-1}(a_{j+1} - a_j)r_j + a_1 r_0 - a_n r_n.$$
Then for $n \ge N_0$,
$$\left|\frac{\sum_{k=1}^n x_k}{a_n}\right| \le \sum_{j=1}^{N_0-1}\frac{a_{j+1} - a_j}{a_n}|r_j| + \sum_{j=N_0}^{n-1}\frac{a_{j+1} - a_j}{a_n}|r_j| + \frac{a_1|r_0|}{a_n} + \frac{a_n|r_n|}{a_n}$$
$$\le \frac{\text{const}}{a_n} + \frac{\epsilon}{a_n}\left((a_{N_0+1} - a_{N_0}) + (a_{N_0+2} - a_{N_0+1}) + \cdots + (a_n - a_{n-1})\right) + \epsilon$$
$$\le o(1) + \epsilon\,\frac{a_n - a_{N_0}}{a_n} + \epsilon \le o(1) + 2\epsilon.$$
This shows the result. $\square$
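Numerical illustration (my addition): take $x_k = (-1)^{k+1}\sqrt k$ and $a_k = k$. Then $\sum_k x_k/a_k = \sum_k (-1)^{k+1}/\sqrt k$ converges by the alternating series test, so Kronecker's lemma forces $n^{-1}\sum_{k\le n} x_k \to 0$, which the sketch below confirms; the choice of sequences is arbitrary.

```python
import numpy as np

n = 10**6
k = np.arange(1, n + 1)
x = (-1.0) ** (k + 1) * np.sqrt(k)   # x_k = (-1)^{k+1} sqrt(k), a_k = k

scaled = np.cumsum(x) / k            # a_n^{-1} * sum_{k<=n} x_k
for m in [10**2, 10**3, 10**4, 10**5, 10**6]:
    print(m, scaled[m - 1])          # -> 0, roughly like O(1/sqrt(m))
```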
Corollary 7.4.1 (A strong law)
Let $\{X_n\}$ be independent with $E(X_n^2) < \infty$, and suppose we have a monotone sequence $b_n \uparrow \infty$. If
$$\sum_k \mathrm{var}\left(\frac{X_k}{b_k}\right) < \infty,$$
then
$$\frac{S_n - E(S_n)}{b_n} \xrightarrow{a.s.} 0.$$

Proof. By the Kolmogorov convergence criterion,
$$\sum_k \frac{X_k - E(X_k)}{b_k} \text{ converges a.s.}$$
Then, by Kronecker's lemma,
$$b_n^{-1}\sum_{k=1}^n \left(X_k - E(X_k)\right) \xrightarrow{a.s.} 0. \qquad\square$$
7.4.1 Example 1: Record counts

Suppose $\{X_n\}$ is iid with continuous cdf $F$. Say $X_j$ is a record if $X_j > \max(X_1,\dots,X_{j-1})$ (with $X_1$ a record by convention), and define
$$\mu_N = \sum_{j=1}^N I_{[X_j \text{ is a record}]} =: \sum_{j=1}^N 1_j,$$
so $\mu_N$ is the number of records in the first $N$ observations. We will also use the following fact from analysis: $\sum_{j=1}^n \frac1j - \log n \to c$ (Euler's constant) as $n\to\infty$.

Proposition 7.4.1 (Logarithmic growth rate)
The number of records in an iid sequence grows logarithmically, and we have the almost sure limit
$$\lim_{N\to\infty}\frac{\mu_N}{\log N} = 1.$$

Proof.
Recall that $\{1_j, j\ge1\}$ are independent. The basic facts about $\{1_j\}$ are that
$$P[1_j = 1] = \frac1j, \quad E(1_j) = \frac1j, \quad \mathrm{var}(1_j) = E(1_j^2) - (E 1_j)^2 = \frac1j - \frac1{j^2} = \frac{j-1}{j^2}.$$
This implies that
$$\sum_{j=2}^\infty \mathrm{var}\left(\frac{1_j}{\log j}\right) = \sum_{j=2}^\infty \frac{\mathrm{var}(1_j)}{(\log j)^2} = \sum_{j=2}^\infty \frac{j-1}{j^2(\log j)^2} < \infty.$$
The Kolmogorov convergence criterion then implies that
$$\sum_{j=2}^\infty \frac{1_j - E(1_j)}{\log j} = \sum_{j=2}^\infty \frac{1_j - \frac1j}{\log j} \text{ converges a.s.},$$
and Kronecker's lemma yields
$$\frac{\sum_{j=1}^n \left(1_j - j^{-1}\right)}{\log n} = \frac{\sum_{j=1}^n 1_j - \sum_{j=1}^n j^{-1}}{\log n} = \frac{\mu_n - \sum_{j=1}^n j^{-1}}{\log n} \xrightarrow{a.s.} 0.$$
Thus
$$\frac{\mu_n}{\log n} - 1 = \frac{\mu_n - \sum_{j=1}^n j^{-1}}{\log n} + \frac{\sum_{j=1}^n j^{-1} - \log n}{\log n} \xrightarrow{a.s.} 0.$$
This completes the derivation. $\square$
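Numerical illustration (my addition): counting records of iid uniforms with a running maximum. $\mu_N/\log N$ drifts toward 1, though slowly, since $E(\mu_N) = \sum_{j\le N} 1/j \approx \log N + c$. The sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 10**6
X = rng.uniform(size=N)
is_record = X == np.maximum.accumulate(X)   # record iff a new running max
mu = np.cumsum(is_record)                   # mu_N = number of records so far

for k in [10**2, 10**3, 10**4, 10**5, 10**6]:
    print(k, mu[k - 1] / np.log(k))         # -> 1, slowly
```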
7.4.2 Example 2: Explosions in the Pure Birth Process

Let $\{X_n\}$ be independent with
$$P[X_n > x] = e^{-\lambda_n x}, \quad x > 0,$$
where $\lambda_n > 0$ are birth parameters. Let $S_0 = 0$, $S_n = \sum_{i=1}^n X_i$, and define the population size process $\{X(t), t\ge0\}$ of the pure birth process by $X(t) = n$ if $S_{n-1} \le t < S_n$. Next define the event explosion by
$$[\text{explosion}] = \left[\sum_{n=1}^\infty X_n < \infty\right] = [X(t) = \infty \text{ for some finite } t].$$

Proposition 7.4.2
$$P[\text{explosion}] = I\left(\sum_n \lambda_n^{-1} < \infty\right).$$

Proof. By the Kolmogorov Zero-One Law, we know that $P[\text{explosion}] = P\left[\sum_n X_n < \infty\right] = 0$ or $1$.
If $\sum_n \lambda_n^{-1} < \infty$, then by the monotone convergence theorem,
$$E\left(\sum_n X_n\right) = \sum_n E(X_n) = \sum_n \lambda_n^{-1} < \infty.$$
Thus $\sum_n X_n < \infty$ a.s.; that is,
$$P[\text{explosion}] = P\left[\sum_n X_n < \infty\right] = 1.$$
Conversely, suppose $P\left[\sum_n X_n < \infty\right] = 1$. Then $1 > \exp\left\{-\sum_{n=1}^\infty X_n\right\} > 0$ a.s., which implies that $1 > E\left(\exp\left\{-\sum_n X_n\right\}\right) > 0$. Now
$$E\left(e^{-\sum_{n=1}^\infty X_n}\right) = E\left(\prod_{n=1}^\infty e^{-X_n}\right) = E\left(\lim_{N\to\infty}\prod_{n=1}^N e^{-X_n}\right) = \lim_{N\to\infty}E\left(\prod_{n=1}^N e^{-X_n}\right) \quad\text{(by monotone convergence)}$$
$$= \lim_{N\to\infty}\prod_{n=1}^N E\left(e^{-X_n}\right) \quad\text{(by independence)} = \lim_{N\to\infty}\prod_{n=1}^N \frac{\lambda_n}{1+\lambda_n}.$$

Taking logs, $P\left[\sum_n X_n < \infty\right] = 1$ implies
$$0 < \sum_{n=1}^\infty \log\left(1 + \lambda_n^{-1}\right) < \infty.$$
Then $\log(1 + \lambda_n^{-1}) \to 0$, which implies $\lambda_n^{-1} \to 0$. Since
$$\lim_{x\downarrow 0}\frac{\log(1+x)}{x} = 1$$
by L'Hôpital's rule, we have $\sum_n \lambda_n^{-1} < \infty$ as well, completing the proof. $\square$
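Numerical illustration (my addition): with birth rates $\lambda_n = n^2$ we have $\sum_n \lambda_n^{-1} = \pi^2/6 < \infty$, so the process explodes in finite time with probability one; every simulated run has a finite total lifetime with mean $\pi^2/6 \approx 1.645$. The rate sequence, truncation at $10^5$ terms, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n_terms = 10**5
lam = np.arange(1, n_terms + 1, dtype=float) ** 2   # lambda_n = n^2

for _ in range(5):
    X = rng.exponential(scale=1.0 / lam)   # X_n ~ Exp(lambda_n)
    print(X.sum())                         # total lifetime: finite each run
```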