O.H. Probability II (MATH 2647) M15

2 Convergence of random variables

In probability theory one uses various modes of convergence of random variables, many of which are crucial for applications. In this section we shall consider some of the most important of them: convergence in $L^r$, convergence in probability, and convergence with probability one (a.k.a. almost sure convergence).

2.1 Weak laws of large numbers

Definition 2.1. Let $r > 0$ be fixed. We say that a sequence $X_j$, $j \ge 1$, of random variables converges to a random variable $X$ in $L^r$ (write $X_n \xrightarrow{L^r} X$) as $n \to \infty$, if
\[ E|X_n - X|^r \to 0 \quad \text{as } n \to \infty. \]

Example 2.2. Let $(X_n)_{n \ge 1}$ be a sequence of random variables such that for some real numbers $(a_n)_{n \ge 1}$ we have
\[ P(X_n = a_n) = p_n, \qquad P(X_n = 0) = 1 - p_n. \tag{2.1} \]
Then $X_n \xrightarrow{L^r} 0$ iff $E|X_n|^r \equiv |a_n|^r p_n \to 0$ as $n \to \infty$.

The following result is the $L^2$ weak law of large numbers ($L^2$-WLLN).

Theorem 2.3. Let $X_j$, $j \ge 1$, be a sequence of uncorrelated random variables with $E X_j = \mu$ and $Var(X_j) \le C < \infty$. Denote $S_n = X_1 + \dots + X_n$. Then
\[ \tfrac{1}{n} S_n \xrightarrow{L^2} \mu \quad \text{as } n \to \infty. \]

Proof. Immediate from

\[ E\Big( \frac{S_n}{n} - \mu \Big)^2 = \frac{E(S_n - n\mu)^2}{n^2} = \frac{Var(S_n)}{n^2} \le \frac{Cn}{n^2} \to 0 \quad \text{as } n \to \infty. \]

Definition 2.4. We say that a sequence $X_j$, $j \ge 1$, of random variables converges to a random variable $X$ in probability (write $X_n \xrightarrow{P} X$) as $n \to \infty$, if for every fixed $\varepsilon > 0$
\[ P\big( |X_n - X| \ge \varepsilon \big) \to 0 \quad \text{as } n \to \infty. \]

Example 2.5. Let the sequence $(X_n)_{n \ge 1}$ be as in (2.1). Then for every $\varepsilon > 0$ we have $P(|X_n| \ge \varepsilon) \le P(X_n \ne 0) = p_n$, so that $X_n \xrightarrow{P} 0$ if $p_n \to 0$ as $n \to \infty$.

The usual weak law of large numbers (WLLN) is just a convergence in probability result:

Theorem 2.6. Under the conditions of Theorem 2.3,
\[ \tfrac{1}{n} S_n \xrightarrow{P} \mu \quad \text{as } n \to \infty. \]

Exercise 2.7. Derive Theorem 2.6 from the Chebyshev inequality.

We prove Theorem 2.6 using the following simple fact:

Lemma 2.8. Let $X_j$, $j \ge 1$, be a sequence of random variables. If $X_n \xrightarrow{L^r} X$ for some fixed $r > 0$, then $X_n \xrightarrow{P} X$ as $n \to \infty$.


Proof. By the generalized Markov inequality with $g(x) = x^r$ and $Z_n = |X_n - X| \ge 0$, we get: for every fixed $\varepsilon > 0$

\[ P(Z_n \ge \varepsilon) \equiv P\big( |X_n - X| \ge \varepsilon \big) \le \frac{E|X_n - X|^r}{\varepsilon^r} \to 0 \quad \text{as } n \to \infty. \]

Proof of Theorem 2.6. Follows immediately from Theorem 2.3 and Lemma 2.8.
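The two-point law of Examples 2.2 and 2.5 is easy to explore numerically. Below is a minimal Python sketch; the concrete choices $a_n = n$, $p_n = n^{-2}$ and $r = 1$ are ours, made for illustration, so that $E|X_n| = |a_n| p_n = 1/n \to 0$:

```python
# Two-point law of Examples 2.2/2.5: P(X_n = a_n) = p_n, P(X_n = 0) = 1 - p_n,
# so E|X_n|^r = |a_n|^r * p_n and P(|X_n| >= eps) <= p_n for small eps.
# Illustrative choices (ours): a_n = n, p_n = 1/n^2, r = 1.
def lr_moment(a_n, p_n, r):
    """E|X_n|^r for the two-point law above."""
    return abs(a_n) ** r * p_n

moments = [lr_moment(n, n ** -2, 1) for n in (1, 10, 100, 1000)]
print(moments)  # proportional to 1/n, so X_n -> 0 in L^1 (and in probability)
```

With these parameters the sequence converges in $L^1$; taking instead $a_n = n^2$, the moments $|a_n| p_n \equiv 1$ would not vanish even though $p_n \to 0$, separating the two modes of convergence.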

As the following example shows, a high-dimensional cube is almost a sphere.

Example 2.9. Let $X_j$, $j \ge 1$, be i.i.d. with $X_j \sim U(-1, 1)$. Then the variables $Y_j = (X_j)^2$ satisfy $E Y_j = \tfrac{1}{3}$, $Var(Y_j) \le E[(Y_j)^2] = E[(X_j)^4] \le 1$. Fix $\varepsilon > 0$ and consider the set

\[ A_{n,\varepsilon} \stackrel{def}{=} \Big\{ z \in \mathbb{R}^n : (1 - \varepsilon)\sqrt{n/3} < |z| < (1 + \varepsilon)\sqrt{n/3} \Big\}, \]
where $|z|$ is the usual Euclidean length in $\mathbb{R}^n$, $|z|^2 = \sum_{j=1}^n (z_j)^2$. By the WLLN,
\[ \frac{1}{n} \sum_{j=1}^n Y_j \equiv \frac{1}{n} \sum_{j=1}^n (X_j)^2 \xrightarrow{P} \frac{1}{3}; \]

in other words, for every fixed $\varepsilon > 0$, a point $X = (X_1, \dots, X_n)$ chosen uniformly at random in $(-1,1)^n$ satisfies
\[ P\Big( \Big| \frac{1}{n} \sum_{j=1}^n (X_j)^2 - \frac{1}{3} \Big| \ge \varepsilon \Big) \equiv P\big( X \notin A_{n,\varepsilon} \big) \to 0 \quad \text{as } n \to \infty, \]
i.e., for large $n$, with probability approaching one, a random point $X \in (-1,1)^n$ lies near the $n$-dimensional sphere of radius $\sqrt{n/3}$ centred at the origin.
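A quick Monte Carlo check of Example 2.9 (a Python sketch; the dimensions, the tolerance $\varepsilon = 0.1$ and the sample size are our choices): the fraction of uniform points of $(-1,1)^n$ falling in the shell $A_{n,\varepsilon}$ approaches one as $n$ grows.

```python
import math
import random

# Monte Carlo check of Example 2.9: a uniform random point of (-1,1)^n
# has Euclidean length close to sqrt(n/3) when n is large.
# Dimensions, eps and the number of sample points are our choices.
random.seed(1)

def fraction_in_shell(n, eps=0.1, points=500):
    """Fraction of uniform points of (-1,1)^n landing in the shell A_{n,eps}."""
    lo, hi = (1 - eps) * math.sqrt(n / 3), (1 + eps) * math.sqrt(n / 3)
    hits = 0
    for _ in range(points):
        length = math.sqrt(sum(random.uniform(-1, 1) ** 2 for _ in range(n)))
        hits += lo < length < hi
    return hits / points

f10, f1000 = fraction_in_shell(10), fraction_in_shell(1000)
print(f10, f1000)  # the high-dimensional fraction is essentially 1
```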

Theorem 2.10. Let random variables $S_n$, $n \ge 1$, have two finite moments, $\mu_n \equiv E S_n$, $\sigma_n^2 \equiv Var(S_n) < \infty$. If, for some sequence $b_n$, we have $\sigma_n / b_n \to 0$ as $n \to \infty$, then $(S_n - \mu_n)/b_n \to 0$ as $n \to \infty$, both in $L^2$ and in probability.

Proof. The result follows immediately from the observation

\[ E\Big( \frac{S_n - \mu_n}{b_n} \Big)^2 = \frac{Var(S_n)}{b_n^2} \to 0 \quad \text{as } n \to \infty. \]

Example 2.11. In the "coupon collector's problem"^{12} let $T_n$ be the time to collect all $n$ coupons. It is easy to show that $E T_n = n \sum_{m=1}^n \frac{1}{m} \sim n \log n$ and $Var(T_n) \le n^2 \sum_{m=1}^n \frac{1}{m^2} \le \frac{\pi^2}{6} n^2$, so that
\[ \frac{T_n - E T_n}{n \log n} \to 0, \quad \text{i.e.,} \quad \frac{T_n}{n \log n} \to 1 \]

as $n \to \infty$, both in $L^2$ and in probability.

^{12} Problem R4
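A simulation sketch of Example 2.11 in Python (the number $n = 2000$ of coupon types and the number of repetitions are our choices): the ratio $T_n/(n \log n)$ is close to 1.

```python
import math
import random

# Simulation of the coupon collector's time T_n from Example 2.11.
# We draw coupons uniformly at random until all n types are seen;
# n = 2000 and the 5 repetitions are our choices.
random.seed(2)

def collect_time(n):
    """Number of uniform draws needed to see all n coupon types."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n = 2000
ratios = [collect_time(n) / (n * math.log(n)) for _ in range(5)]
print(sum(ratios) / len(ratios))  # close to 1, as T_n ~ n log n
```

Note the slight upward bias for moderate $n$: $E T_n = n \sum_{m \le n} 1/m$ exceeds $n \log n$ by roughly $0.577\, n$, so the simulated ratios sit a little above 1.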


2.2 Almost sure convergence

Let $(X_k)_{k \ge 1}$ be a sequence of i.i.d. random variables having mean $E X_1 = \mu$ and finite second moment. Denote $S_n \stackrel{def}{=} \sum_{k=1}^n X_k$. Then the usual (weak) law of large numbers (WLLN) tells us that for every $\delta > 0$
\[ P\big( |n^{-1} S_n - \mu| > \delta \big) \to 0 \quad \text{as } n \to \infty. \tag{2.2} \]
In other words, according to the WLLN, $n^{-1} S_n$ converges in probability to the constant random variable $X \equiv \mu = E(X_1)$, as $n \to \infty$ (recall Definition 2.4, Theorem 2.6). It is important to remember that convergence in probability is not related to pointwise convergence, i.e., convergence $X_n(\omega) \to X(\omega)$ for a fixed $\omega \in \Omega$. The following useful definition can be realised in terms of a $U[0,1]$ random variable, recall Remark 1.6.2.

Definition 2.12. The canonical probability space is $(\Omega, \mathcal{F}, P)$, where $\Omega = [0,1]$, $\mathcal{F}$ is the smallest $\sigma$-field containing all intervals in $[0,1]$, and $P$ is the 'length measure' on $\Omega$ (i.e., for $A = [a,b] \subseteq [0,1]$, $P(A) = b - a$).

Example 2.13. Let $(\Omega, \mathcal{F}, P)$ be the canonical probability space. For every event $A \in \mathcal{F}$ consider the indicator random variable
\[ 1_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A, \\ 0, & \text{if } \omega \notin A. \end{cases} \tag{2.3} \]
For $n \ge 1$ put $m = [\log_2 n]$, i.e., $m \ge 0$ is such that $2^m \le n < 2^{m+1}$, define
\[ A_n = \Big[ \frac{n - 2^m}{2^m}, \frac{n + 1 - 2^m}{2^m} \Big] \subseteq [0, 1] \]
and let $X_n \stackrel{def}{=} 1_{A_n}$. Since $P(|X_n| > 0) = P(A_n) = 2^{-[\log_2 n]} < \frac{2}{n} \to 0$ as $n \to \infty$, the sequence $X_n$ converges in probability to $X \equiv 0$. However,

\[ \big\{ \omega \in \Omega : X_n(\omega) \to X(\omega) \equiv 0 \text{ as } n \to \infty \big\} = \emptyset, \]
i.e., there is no point $\omega \in \Omega$ for which the sequence $X_n(\omega) \in \{0, 1\}$ converges to $X(\omega) = 0$. [Try the R script simulating this sequence from the course webpage!]

The following is the key definition of this section.

Definition 2.14. A sequence $(X_k)_{k \ge 1}$ of random variables in $(\Omega, \mathcal{F}, P)$ converges, as $n \to \infty$, to a random variable $X$ with probability one (or almost surely) if
\[ P\big( \big\{ \omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty \big\} \big) = 1. \tag{2.4} \]

Remark 2.14.1. For $\varepsilon > 0$, let $A_n(\varepsilon) = \big\{ \omega : |X_n(\omega) - X(\omega)| > \varepsilon \big\}$. Then property (2.4) is equivalent to saying that for every $\varepsilon > 0$
\[ P\big( A_n(\varepsilon) \text{ finitely often} \big) = 1. \tag{2.5} \]
This is why the Borel-Cantelli lemma is so useful in studying almost sure limits.
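The notes point to an R script for Example 2.13; here is an equivalent sketch in Python (ours). We take the dyadic intervals half-open, a minor variation on the notes, so that each block $2^m \le n < 2^{m+1}$ partitions $[0,1)$ exactly; then for any fixed $\omega$ exactly one $n$ in every block has $X_n(\omega) = 1$, so $X_n(\omega) \not\to 0$ even though $P(X_n \ne 0) \to 0$.

```python
# The "moving indicator" X_n = 1_{A_n} of Example 2.13 on Omega = [0,1).
# We use half-open intervals A_n = [(n-2^m)/2^m, (n+1-2^m)/2^m), so that
# each dyadic block of indices partitions [0,1) exactly.
def X(n, omega):
    """X_n(omega) for the canonical probability space."""
    m = n.bit_length() - 1              # m = floor(log2 n)
    lo = (n - 2 ** m) / 2 ** m
    return 1 if lo <= omega < lo + 2.0 ** -m else 0

omega = 0.3
for m in range(10):
    # exactly one n in each block 2^m <= n < 2^{m+1} hits omega
    assert sum(X(n, omega) for n in range(2 ** m, 2 ** (m + 1))) == 1
print("X_n(0.3) = 1 infinitely often, although P(X_n != 0) -> 0")
```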


Example 1.5 (continued). Consider a finite random variable $X$, i.e., satisfying $P(|X| < \infty) = 1$. Then the sequence $(X_k)_{k \ge 1}$ defined via $X_k \stackrel{def}{=} \frac{1}{k} X$ converges to zero with probability one.

Solution. The previous discussion established exactly (2.5).

In general, verifying convergence with probability one is not immediate. The following lemma gives a sufficient condition for almost sure convergence.

Lemma 2.15. Let $X_1, X_2, \dots$ and $X$ be random variables. If, for every $\varepsilon > 0$,

\[ \sum_{n=1}^{\infty} P\big( |X_n - X| > \varepsilon \big) < \infty, \tag{2.6} \]
then $X_n$ converges to $X$ almost surely.

Proof. Fix $\varepsilon > 0$ and let $A_n(\varepsilon) = \big\{ \omega \in \Omega : |X_n(\omega) - X(\omega)| > \varepsilon \big\}$. By (2.6), $\sum_n P\big( A_n(\varepsilon) \big) < \infty$, and, by Lemma 1.6a), with probability one only a finite number of the events $A_n(\varepsilon)$ occur. This means that for every fixed $\varepsilon > 0$ the event
\[ A(\varepsilon) \stackrel{def}{=} \big\{ \omega \in \Omega : |X_n(\omega) - X(\omega)| \le \varepsilon \text{ for all } n \text{ large enough} \big\} \]

has probability one. By monotonicity ($A(\varepsilon_1) \subset A(\varepsilon_2)$ if $\varepsilon_1 < \varepsilon_2$), the event

\[ \big\{ \omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty \big\} = \bigcap_{\varepsilon > 0} A(\varepsilon) = \bigcap_{m \ge 1} A(1/m) \]
has probability one. The claim follows.
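The mechanism behind Lemma 2.15 can be seen in a simulation (a Python sketch; the law $P(X_n = 1) = n^{-2}$, chosen so that the series (2.6) converges, together with the horizon and repetitions, is ours): in a typical realization only finitely many of the independent indicators are nonzero, and the last nonzero index is small.

```python
import random

# Independent X_n with P(X_n = 1) = 1/n^2 and X = 0: the series in (2.6)
# converges, so almost surely only finitely many X_n are nonzero.
# The horizon of 100000 steps and the 5 repetitions are our choices.
random.seed(4)

def last_nonzero_index(horizon=100_000):
    """Largest n <= horizon with X_n = 1 in one realization (0 if none)."""
    last = 0
    for n in range(1, horizon + 1):
        if random.random() < n ** -2:
            last = n
    return last

lasts = [last_nonzero_index() for _ in range(5)]
print(lasts)  # tiny compared to the horizon: X_n(omega) -> 0 along each path
```

Replacing $n^{-2}$ by $n^{-1}$ makes the series diverge, and (by the second Borel-Cantelli lemma, since the events are independent) the last nonzero index would then keep growing with the horizon.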

A straightforward application of Lemma 2.15 improves the WLLN (2.2) and gives the following famous (Borel) Strong Law of Large Numbers (SLLN):

Theorem 2.16 ($L^4$-SLLN). Let the variables $X_1, X_2, \dots$ be i.i.d. with $E(X_k) = \mu$ and $E[(X_k)^4] < \infty$. If $S_n \stackrel{def}{=} X_1 + X_2 + \dots + X_n$, then $S_n/n \to \mu$ almost surely, as $n \to \infty$.

Proof. We may and shall suppose^{13} that $\mu = E(X_k) = 0$. Now,

\[ E[(S_n)^4] = E\Big[ \Big( \sum_{k=1}^n X_k \Big)^4 \Big] = \sum_{k=1}^n E[(X_k)^4] + 6 \sum_{1 \le k < m \le n} E[(X_k)^2 (X_m)^2] \le C n^2 \]
for some constant $C < \infty$ (all cross terms containing an odd power of some $X_k$ vanish, since the variables are independent with $E X_k = 0$). Hence, for every fixed $\varepsilon > 0$,
\[ P\big( |S_n/n| > \varepsilon \big) = P\big( |S_n| > n\varepsilon \big) \le \frac{E[(S_n)^4]}{(n\varepsilon)^4} \le \frac{C}{n^2 \varepsilon^4}, \]
and the result follows from (2.6).

^{13} Otherwise, consider the centred variables $X_k' = X_k - \mu$ and deduce the result from the relation $\frac{1}{n} S_n' = \frac{1}{n} S_n - \mu$ and linearity of almost sure convergence.
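A single-path simulation of Theorem 2.16 in Python (the distribution Uniform(-1,1), which has $\mu = 0$ and finite fourth moment, together with the seed, horizon and checkpoints, is our choice): one realization of $S_n/n$ settles down to 0.

```python
import random

# One sample path of S_n/n for i.i.d. Uniform(-1,1) summands (mu = 0,
# E X^4 < infinity), illustrating the almost sure convergence in
# Theorem 2.16.  Seed, horizon and checkpoints are our choices.
random.seed(5)

s, checkpoints, deviations = 0.0, (100, 10_000, 100_000), []
for n in range(1, 100_001):
    s += random.uniform(-1.0, 1.0)
    if n in checkpoints:
        deviations.append(abs(s / n))

print(deviations)  # |S_n/n| at n = 10^2, 10^4, 10^5, shrinking towards 0
```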


With some additional work,^{14} one can obtain the following SLLN (which is due to Kolmogorov):

Theorem 2.17 ($L^1$-SLLN). Let $X_1, X_2, \dots$ be i.i.d. random variables with $E|X_k| < \infty$. If $E(X_k) = \mu$ and $S_n \stackrel{def}{=} X_1 + \dots + X_n$, then $\frac{1}{n} S_n \to \mu$ almost surely, as $n \to \infty$.

Notice that verifying almost sure convergence through the Borel-Cantelli lemma (or the sufficient condition (2.6)) is easier than using an explicit construction in the spirit of Example 1.5. We shall see more examples below.

2.3 Relations between different types of convergence

It is important to remember the relations between different types of convergence. We know that (Lemma 2.8)

\[ X_n \xrightarrow{L^r} X \implies X_n \xrightarrow{P} X; \]
one can also show^{15} that
\[ X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{P} X. \]
In addition, according to Example 2.13,

\[ X_n \xrightarrow{P} X \not\Rightarrow X_n \xrightarrow{a.s.} X, \]
and the same construction shows that
\[ X_n \xrightarrow{L^r} X \not\Rightarrow X_n \xrightarrow{a.s.} X. \]
The following examples fill in the remaining gaps:

Example 2.18 ($X_n \xrightarrow{L^r} X \not\Rightarrow X_n \xrightarrow{a.s.} X$). Let $X_n$ be a sequence of independent random variables such that $P(X_n = 1) = p_n$, $P(X_n = 0) = 1 - p_n$. Then
\[ X_n \xrightarrow{P} X \iff p_n \to 0 \iff X_n \xrightarrow{L^r} X \quad \text{as } n \to \infty, \]
whereas
\[ X_n \xrightarrow{a.s.} X \iff \sum_n p_n < \infty. \]
In particular, taking $p_n = 1/n$ we deduce the claim. Notice that this example also shows that $X_n \xrightarrow{P} X \not\Rightarrow X_n \xrightarrow{a.s.} X$.

Example 2.19 ($X_n \xrightarrow{P} X \not\Rightarrow X_n \xrightarrow{L^r} X$). Let $(\Omega, \mathcal{F}, P)$ be the canonical probability space, recall Definition 2.12. For every $n \ge 1$, define
\[ X_n(\omega) \stackrel{def}{=} e^n \cdot 1_{[0,1/n]}(\omega) \equiv \begin{cases} e^n, & 0 \le \omega \le 1/n, \\ 0, & \omega > 1/n. \end{cases} \]

We obviously have $X_n \xrightarrow{a.s.} 0$ and $X_n \xrightarrow{P} 0$ as $n \to \infty$; however, for every $r > 0$,
\[ E|X_n|^r = \frac{e^{nr}}{n} \to \infty \quad \text{as } n \to \infty, \]
i.e., $X_n \not\xrightarrow{L^r} 0$. Notice that this example also shows that $X_n \xrightarrow{a.s.} X \not\Rightarrow X_n \xrightarrow{L^r} X$.

^{14} we will not do this here!
^{15} although we shall not do it here!
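The two quantities in Example 2.19 are easy to tabulate (a small Python sketch, ours): $P(X_n \ne 0) = 1/n$ vanishes while $E|X_n|^r = e^{nr}/n$ blows up for every fixed $r > 0$.

```python
import math

# Example 2.19: X_n = e^n on [0, 1/n] and 0 otherwise, so
# P(X_n != 0) = 1/n -> 0 (convergence in probability, indeed a.s.),
# while E|X_n|^r = e^{n r}/n -> infinity for every fixed r > 0.
def prob_nonzero(n):
    return 1 / n

def lr_moment(n, r):
    return math.exp(n * r) / n

print([prob_nonzero(n) for n in (1, 10, 100)])  # shrinks to 0
print([lr_moment(n, 1) for n in (1, 5, 10)])    # grows without bound
```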
