O.H. Probability II (MATH 2647) M15

2 Convergence of random variables

In probability theory one uses various modes of convergence of random variables, many of which are crucial for applications. In this section we shall consider some of the most important of them: convergence in $L^r$, convergence in probability, and convergence with probability one (a.k.a. almost sure convergence).

2.1 Weak laws of large numbers

Definition 2.1. Let $r > 0$ be fixed. We say that a sequence $X_j$, $j \ge 1$, of random variables converges to a random variable $X$ in $L^r$ (write $X_n \xrightarrow{L^r} X$) as $n \to \infty$, if
\[ E|X_n - X|^r \to 0 \quad \text{as } n \to \infty. \]

Example 2.2. Let $(X_n)_{n \ge 1}$ be a sequence of random variables such that for some real numbers $(a_n)_{n \ge 1}$ we have
\[ P(X_n = a_n) = p_n, \qquad P(X_n = 0) = 1 - p_n. \tag{2.1} \]
Then $X_n \xrightarrow{L^r} 0$ iff $E|X_n|^r \equiv |a_n|^r p_n \to 0$ as $n \to \infty$.

The following result is the $L^2$ weak law of large numbers ($L^2$-WLLN).

Theorem 2.3. Let $X_j$, $j \ge 1$, be a sequence of uncorrelated random variables with $E X_j = \mu$ and $Var(X_j) \le C < \infty$. Denote $S_n = X_1 + \dots + X_n$. Then
\[ \tfrac{1}{n} S_n \xrightarrow{L^2} \mu \quad \text{as } n \to \infty. \]

Proof. Immediate from

\[ E\Big( \frac{S_n}{n} - \mu \Big)^2 = \frac{E(S_n - n\mu)^2}{n^2} = \frac{Var(S_n)}{n^2} \le \frac{Cn}{n^2} \to 0 \quad \text{as } n \to \infty. \]

Definition 2.4. We say that a sequence $X_j$, $j \ge 1$, of random variables converges to a random variable $X$ in probability (write $X_n \xrightarrow{P} X$) as $n \to \infty$, if for every fixed $\varepsilon > 0$
\[ P\big( |X_n - X| \ge \varepsilon \big) \to 0 \quad \text{as } n \to \infty. \]

Example 2.5. Let the sequence $(X_n)_{n \ge 1}$ be as in (2.1). Then for every $\varepsilon > 0$ we have $P(|X_n| \ge \varepsilon) \le P(X_n \ne 0) = p_n$, so that $X_n \xrightarrow{P} 0$ if $p_n \to 0$ as $n \to \infty$.

The usual weak law of large numbers (WLLN) is just a convergence in probability result:

Theorem 2.6. Under the conditions of Theorem 2.3,
\[ \tfrac{1}{n} S_n \xrightarrow{P} \mu \quad \text{as } n \to \infty. \]

Exercise 2.7. Derive Theorem 2.6 from the Chebyshev inequality.

We prove Theorem 2.6 using the following simple fact:

Lemma 2.8. Let $X_j$, $j \ge 1$, be a sequence of random variables. If $X_n \xrightarrow{L^r} X$ for some fixed $r > 0$, then $X_n \xrightarrow{P} X$ as $n \to \infty$.


Proof. By the generalized Markov inequality with $g(x) = x^r$ and $Z_n = |X_n - X| \ge 0$, we get: for every fixed $\varepsilon > 0$

\[ P(Z_n \ge \varepsilon) \equiv P\big( |X_n - X| \ge \varepsilon \big) \le \frac{E|X_n - X|^r}{\varepsilon^r} \to 0 \quad \text{as } n \to \infty. \]

Proof of Theorem 2.6. Follows immediately from Theorem 2.3 and Lemma 2.8.
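The two-point law of Examples 2.2 and 2.5 is easy to explore numerically. Below is a minimal Python sketch; the concrete choices $a_n = n$, $p_n = n^{-2}$ and $r = 1$ are ours, made for illustration, so that $E|X_n| = |a_n| p_n = 1/n \to 0$:

```python
# Two-point law of Examples 2.2/2.5: P(X_n = a_n) = p_n, P(X_n = 0) = 1 - p_n,
# so E|X_n|^r = |a_n|^r * p_n and P(|X_n| >= eps) <= p_n for small eps.
# Illustrative choices (ours): a_n = n, p_n = 1/n^2, r = 1.
def lr_moment(a_n, p_n, r):
    """E|X_n|^r for the two-point law above."""
    return abs(a_n) ** r * p_n

moments = [lr_moment(n, n ** -2, 1) for n in (1, 10, 100, 1000)]
print(moments)  # proportional to 1/n, so X_n -> 0 in L^1 (and in probability)
```

With these parameters the sequence converges in $L^1$; taking instead $a_n = n^2$, the moments $|a_n| p_n \equiv 1$ would not vanish even though $p_n \to 0$, separating the two modes of convergence.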

As the following example shows, a high-dimensional cube is almost a sphere.

Example 2.9. Let $X_j$, $j \ge 1$, be i.i.d. with $X_j \sim U(-1, 1)$. Then the variables $Y_j = (X_j)^2$ satisfy $E Y_j = \tfrac{1}{3}$, $Var(Y_j) \le E[(Y_j)^2] = E[(X_j)^4] \le 1$. Fix $\varepsilon > 0$ and consider the set

\[ A_{n,\varepsilon} \stackrel{def}{=} \Big\{ z \in \mathbb{R}^n : (1 - \varepsilon)\sqrt{n/3} < |z| < (1 + \varepsilon)\sqrt{n/3} \Big\}, \]
where $|z|$ is the usual Euclidean length in $\mathbb{R}^n$, $|z|^2 = \sum_{j=1}^n (z_j)^2$. By the WLLN,
\[ \frac{1}{n} \sum_{j=1}^n Y_j \equiv \frac{1}{n} \sum_{j=1}^n (X_j)^2 \xrightarrow{P} \frac{1}{3}; \]

in other words, for every fixed $\varepsilon > 0$, a point $X = (X_1, \dots, X_n)$ chosen uniformly at random in $(-1,1)^n$ satisfies
\[ P\Big( \Big| \frac{1}{n} \sum_{j=1}^n (X_j)^2 - \frac{1}{3} \Big| \ge \varepsilon \Big) \equiv P\big( X \notin A_{n,\varepsilon} \big) \to 0 \quad \text{as } n \to \infty, \]
i.e., for large $n$, with probability approaching one, a random point $X \in (-1,1)^n$ lies near the $n$-dimensional sphere of radius $\sqrt{n/3}$ centred at the origin.
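A quick Monte Carlo check of Example 2.9 (a Python sketch; the dimensions, the tolerance $\varepsilon = 0.1$ and the sample size are our choices): the fraction of uniform points of $(-1,1)^n$ falling in the shell $A_{n,\varepsilon}$ approaches one as $n$ grows.

```python
import math
import random

# Monte Carlo check of Example 2.9: a uniform random point of (-1,1)^n
# has Euclidean length close to sqrt(n/3) when n is large.
# Dimensions, eps and the number of sample points are our choices.
random.seed(1)

def fraction_in_shell(n, eps=0.1, points=500):
    """Fraction of uniform points of (-1,1)^n landing in the shell A_{n,eps}."""
    lo, hi = (1 - eps) * math.sqrt(n / 3), (1 + eps) * math.sqrt(n / 3)
    hits = 0
    for _ in range(points):
        length = math.sqrt(sum(random.uniform(-1, 1) ** 2 for _ in range(n)))
        hits += lo < length < hi
    return hits / points

f10, f1000 = fraction_in_shell(10), fraction_in_shell(1000)
print(f10, f1000)  # the high-dimensional fraction is essentially 1
```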

Theorem 2.10. Let random variables $S_n$, $n \ge 1$, have two finite moments, $\mu_n \equiv E S_n$, $\sigma_n^2 \equiv Var(S_n) < \infty$. If, for some sequence $b_n$, we have $\sigma_n / b_n \to 0$ as $n \to \infty$, then $(S_n - \mu_n)/b_n \to 0$ as $n \to \infty$, both in $L^2$ and in probability.

Proof. The result follows immediately from the observation

\[ E\Big( \frac{S_n - \mu_n}{b_n} \Big)^2 = \frac{Var(S_n)}{b_n^2} \to 0 \quad \text{as } n \to \infty. \]

Example 2.11. In the "coupon collector's problem"^{12} let $T_n$ be the time to collect all $n$ coupons. It is easy to show that $E T_n = n \sum_{m=1}^n \frac{1}{m} \sim n \log n$ and $Var(T_n) \le n^2 \sum_{m=1}^n \frac{1}{m^2} \le \frac{\pi^2}{6} n^2$, so that
\[ \frac{T_n - E T_n}{n \log n} \to 0, \quad \text{i.e.,} \quad \frac{T_n}{n \log n} \to 1 \]

as $n \to \infty$, both in $L^2$ and in probability.

^{12} Problem R4
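A simulation sketch of Example 2.11 in Python (the number $n = 2000$ of coupon types and the number of repetitions are our choices): the ratio $T_n/(n \log n)$ is close to 1.

```python
import math
import random

# Simulation of the coupon collector's time T_n from Example 2.11.
# We draw coupons uniformly at random until all n types are seen;
# n = 2000 and the 5 repetitions are our choices.
random.seed(2)

def collect_time(n):
    """Number of uniform draws needed to see all n coupon types."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n = 2000
ratios = [collect_time(n) / (n * math.log(n)) for _ in range(5)]
print(sum(ratios) / len(ratios))  # close to 1, as T_n ~ n log n
```

Note the slight upward bias for moderate $n$: $E T_n = n \sum_{m \le n} 1/m$ exceeds $n \log n$ by roughly $0.577\, n$, so the simulated ratios sit a little above 1.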


2.2 Almost sure convergence

Let $(X_k)_{k \ge 1}$ be a sequence of i.i.d. random variables having mean $E X_1 = \mu$ and finite second moment. Denote $S_n \stackrel{def}{=} \sum_{k=1}^n X_k$. Then the usual (weak) law of large numbers (WLLN) tells us that for every $\delta > 0$
\[ P\big( |n^{-1} S_n - \mu| > \delta \big) \to 0 \quad \text{as } n \to \infty. \tag{2.2} \]
In other words, according to the WLLN, $n^{-1} S_n$ converges in probability to the constant random variable $X \equiv \mu = E(X_1)$, as $n \to \infty$ (recall Definition 2.4, Theorem 2.6). It is important to remember that convergence in probability is not related to pointwise convergence, i.e., convergence $X_n(\omega) \to X(\omega)$ for a fixed $\omega \in \Omega$. The following useful definition can be realised in terms of a $U[0,1]$ random variable, recall Remark 1.6.2.

Definition 2.12. The canonical probability space is $(\Omega, \mathcal{F}, P)$, where $\Omega = [0,1]$, $\mathcal{F}$ is the smallest $\sigma$-field containing all intervals in $[0,1]$, and $P$ is the 'length measure' on $\Omega$ (i.e., for $A = [a,b] \subseteq [0,1]$, $P(A) = b - a$).

Example 2.13. Let $(\Omega, \mathcal{F}, P)$ be the canonical probability space. For every event $A \in \mathcal{F}$ consider the indicator random variable
\[ 1_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A, \\ 0, & \text{if } \omega \notin A. \end{cases} \tag{2.3} \]
For $n \ge 1$ put $m = [\log_2 n]$, i.e., $m \ge 0$ is such that $2^m \le n < 2^{m+1}$, define
\[ A_n = \Big[ \frac{n - 2^m}{2^m}, \frac{n + 1 - 2^m}{2^m} \Big] \subseteq [0, 1] \]
and let $X_n \stackrel{def}{=} 1_{A_n}$. Since $P(|X_n| > 0) = P(A_n) = 2^{-[\log_2 n]} < \frac{2}{n} \to 0$ as $n \to \infty$, the sequence $X_n$ converges in probability to $X \equiv 0$. However,

\[ \big\{ \omega \in \Omega : X_n(\omega) \to X(\omega) \equiv 0 \text{ as } n \to \infty \big\} = \emptyset, \]
i.e., there is no point $\omega \in \Omega$ for which the sequence $X_n(\omega) \in \{0, 1\}$ converges to $X(\omega) = 0$. [Try the R script simulating this sequence from the course webpage!]

The following is the key definition of this section.

Definition 2.14. A sequence $(X_k)_{k \ge 1}$ of random variables in $(\Omega, \mathcal{F}, P)$ converges, as $n \to \infty$, to a random variable $X$ with probability one (or almost surely) if
\[ P\big( \big\{ \omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty \big\} \big) = 1. \tag{2.4} \]

Remark 2.14.1. For $\varepsilon > 0$, let $A_n(\varepsilon) = \big\{ \omega : |X_n(\omega) - X(\omega)| > \varepsilon \big\}$. Then property (2.4) is equivalent to saying that for every $\varepsilon > 0$
\[ P\big( A_n(\varepsilon) \text{ finitely often} \big) = 1. \tag{2.5} \]
This is why the Borel-Cantelli lemma is so useful in studying almost sure limits.
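The notes point to an R script for Example 2.13; here is an equivalent sketch in Python (ours). We take the dyadic intervals half-open, a minor variation on the notes, so that each block $2^m \le n < 2^{m+1}$ partitions $[0,1)$ exactly; then for any fixed $\omega$ exactly one $n$ in every block has $X_n(\omega) = 1$, so $X_n(\omega) \not\to 0$ even though $P(X_n \ne 0) \to 0$.

```python
# The "moving indicator" X_n = 1_{A_n} of Example 2.13 on Omega = [0,1).
# We use half-open intervals A_n = [(n-2^m)/2^m, (n+1-2^m)/2^m), so that
# each dyadic block of indices partitions [0,1) exactly.
def X(n, omega):
    """X_n(omega) for the canonical probability space."""
    m = n.bit_length() - 1              # m = floor(log2 n)
    lo = (n - 2 ** m) / 2 ** m
    return 1 if lo <= omega < lo + 2.0 ** -m else 0

omega = 0.3
for m in range(10):
    # exactly one n in each block 2^m <= n < 2^{m+1} hits omega
    assert sum(X(n, omega) for n in range(2 ** m, 2 ** (m + 1))) == 1
print("X_n(0.3) = 1 infinitely often, although P(X_n != 0) -> 0")
```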


Example 1.5 (continued). Consider a finite random variable $X$, i.e., satisfying $P(|X| < \infty) = 1$. Then the sequence $(X_k)_{k \ge 1}$ defined via $X_k \stackrel{def}{=} \frac{1}{k} X$ converges to zero with probability one.

Solution. The previous discussion established exactly (2.5).

In general, verifying convergence with probability one is not immediate. The following lemma gives a sufficient condition for almost sure convergence.

Lemma 2.15. Let $X_1, X_2, \dots$ and $X$ be random variables. If, for every $\varepsilon > 0$,

\[ \sum_{n=1}^{\infty} P\big( |X_n - X| > \varepsilon \big) < \infty, \tag{2.6} \]
then $X_n$ converges to $X$ almost surely.

Proof. Fix $\varepsilon > 0$ and let $A_n(\varepsilon) = \big\{ \omega \in \Omega : |X_n(\omega) - X(\omega)| > \varepsilon \big\}$. By (2.6), $\sum_n P\big( A_n(\varepsilon) \big) < \infty$, and, by Lemma 1.6a), with probability one only a finite number of the events $A_n(\varepsilon)$ occur. This means that for every fixed $\varepsilon > 0$ the event
\[ A(\varepsilon) \stackrel{def}{=} \big\{ \omega \in \Omega : |X_n(\omega) - X(\omega)| \le \varepsilon \text{ for all } n \text{ large enough} \big\} \]

has probability one. By monotonicity ($A(\varepsilon_1) \subset A(\varepsilon_2)$ if $\varepsilon_1 < \varepsilon_2$), the event

\[ \big\{ \omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty \big\} = \bigcap_{\varepsilon > 0} A(\varepsilon) = \bigcap_{m \ge 1} A(1/m) \]
has probability one. The claim follows.
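The mechanism behind Lemma 2.15 can be seen in a simulation (a Python sketch; the law $P(X_n = 1) = n^{-2}$, chosen so that the series (2.6) converges, together with the horizon and repetitions, is ours): in a typical realization only finitely many of the independent indicators are nonzero, and the last nonzero index is small.

```python
import random

# Independent X_n with P(X_n = 1) = 1/n^2 and X = 0: the series in (2.6)
# converges, so almost surely only finitely many X_n are nonzero.
# The horizon of 100000 steps and the 5 repetitions are our choices.
random.seed(4)

def last_nonzero_index(horizon=100_000):
    """Largest n <= horizon with X_n = 1 in one realization (0 if none)."""
    last = 0
    for n in range(1, horizon + 1):
        if random.random() < n ** -2:
            last = n
    return last

lasts = [last_nonzero_index() for _ in range(5)]
print(lasts)  # tiny compared to the horizon: X_n(omega) -> 0 along each path
```

Replacing $n^{-2}$ by $n^{-1}$ makes the series diverge, and (by the second Borel-Cantelli lemma, since the events are independent) the last nonzero index would then keep growing with the horizon.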

A straightforward application of Lemma 2.15 improves the WLLN (2.2) and gives the following famous (Borel) Strong Law of Large Numbers (SLLN):

Theorem 2.16 ($L^4$-SLLN). Let the variables $X_1, X_2, \dots$ be i.i.d. with $E(X_k) = \mu$ and $E[(X_k)^4] < \infty$. If $S_n \stackrel{def}{=} X_1 + X_2 + \dots + X_n$, then $S_n/n \to \mu$ almost surely, as $n \to \infty$.

Proof. We may and shall suppose^{13} that $\mu = E(X_k) = 0$. Now,

\[ E[(S_n)^4] = E\Big[ \Big( \sum_{k=1}^n X_k \Big)^4 \Big] = \sum_{k=1}^n E[(X_k)^4] + 6 \sum_{1 \le k < m \le n} E[(X_k)^2 (X_m)^2] \le C n^2 \]
for some constant $C < \infty$ (all cross terms containing an odd power of some $X_k$ vanish, since the variables are independent with $E X_k = 0$). Hence, for every fixed $\varepsilon > 0$,
\[ P\big( |S_n/n| > \varepsilon \big) = P\big( |S_n| > n\varepsilon \big) \le \frac{E[(S_n)^4]}{(n\varepsilon)^4} \le \frac{C}{n^2 \varepsilon^4}, \]
and the result follows from (2.6).

^{13} Otherwise, consider the centred variables $X_k' = X_k - \mu$ and deduce the result from the relation $\frac{1}{n} S_n' = \frac{1}{n} S_n - \mu$ and linearity of almost sure convergence.
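A single-path simulation of Theorem 2.16 in Python (the distribution Uniform(-1,1), which has $\mu = 0$ and finite fourth moment, together with the seed, horizon and checkpoints, is our choice): one realization of $S_n/n$ settles down to 0.

```python
import random

# One sample path of S_n/n for i.i.d. Uniform(-1,1) summands (mu = 0,
# E X^4 < infinity), illustrating the almost sure convergence in
# Theorem 2.16.  Seed, horizon and checkpoints are our choices.
random.seed(5)

s, checkpoints, deviations = 0.0, (100, 10_000, 100_000), []
for n in range(1, 100_001):
    s += random.uniform(-1.0, 1.0)
    if n in checkpoints:
        deviations.append(abs(s / n))

print(deviations)  # |S_n/n| at n = 10^2, 10^4, 10^5, shrinking towards 0
```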


With some additional work,^{14} one can obtain the following SLLN (which is due to Kolmogorov):

Theorem 2.17 ($L^1$-SLLN). Let $X_1, X_2, \dots$ be i.i.d. random variables with $E|X_k| < \infty$. If $E(X_k) = \mu$ and $S_n \stackrel{def}{=} X_1 + \dots + X_n$, then $\frac{1}{n} S_n \to \mu$ almost surely, as $n \to \infty$.

Notice that verifying almost sure convergence through the Borel-Cantelli lemma (or the sufficient condition (2.6)) is easier than using an explicit construction in the spirit of Example 1.5. We shall see more examples below.

2.3 Relations between different types of convergence

It is important to remember the relations between different types of convergence. We know that (Lemma 2.8)

\[ X_n \xrightarrow{L^r} X \implies X_n \xrightarrow{P} X; \]
one can also show^{15} that
\[ X_n \xrightarrow{a.s.} X \implies X_n \xrightarrow{P} X. \]
In addition, according to Example 2.13,

\[ X_n \xrightarrow{P} X \not\Rightarrow X_n \xrightarrow{a.s.} X, \]
and the same construction shows that
\[ X_n \xrightarrow{L^r} X \not\Rightarrow X_n \xrightarrow{a.s.} X. \]
The following examples fill in the remaining gaps:

Example 2.18 ($X_n \xrightarrow{L^r} X \not\Rightarrow X_n \xrightarrow{a.s.} X$). Let $X_n$ be a sequence of independent random variables such that $P(X_n = 1) = p_n$, $P(X_n = 0) = 1 - p_n$. Then
\[ X_n \xrightarrow{P} X \iff p_n \to 0 \iff X_n \xrightarrow{L^r} X \quad \text{as } n \to \infty, \]
whereas
\[ X_n \xrightarrow{a.s.} X \iff \sum_n p_n < \infty. \]
In particular, taking $p_n = 1/n$ we deduce the claim. Notice that this example also shows that $X_n \xrightarrow{P} X \not\Rightarrow X_n \xrightarrow{a.s.} X$.

Example 2.19 ($X_n \xrightarrow{P} X \not\Rightarrow X_n \xrightarrow{L^r} X$). Let $(\Omega, \mathcal{F}, P)$ be the canonical probability space, recall Definition 2.12. For every $n \ge 1$, define
\[ X_n(\omega) \stackrel{def}{=} e^n \cdot 1_{[0,1/n]}(\omega) \equiv \begin{cases} e^n, & 0 \le \omega \le 1/n, \\ 0, & \omega > 1/n. \end{cases} \]

We obviously have $X_n \xrightarrow{a.s.} 0$ and $X_n \xrightarrow{P} 0$ as $n \to \infty$; however, for every $r > 0$,
\[ E|X_n|^r = \frac{e^{nr}}{n} \to \infty \quad \text{as } n \to \infty, \]
i.e., $X_n \not\xrightarrow{L^r} 0$. Notice that this example also shows that $X_n \xrightarrow{a.s.} X \not\Rightarrow X_n \xrightarrow{L^r} X$.

^{14} we will not do this here!
^{15} although we shall not do it here!
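The two quantities in Example 2.19 are easy to tabulate (a small Python sketch, ours): $P(X_n \ne 0) = 1/n$ vanishes while $E|X_n|^r = e^{nr}/n$ blows up for every fixed $r > 0$.

```python
import math

# Example 2.19: X_n = e^n on [0, 1/n] and 0 otherwise, so
# P(X_n != 0) = 1/n -> 0 (convergence in probability, indeed a.s.),
# while E|X_n|^r = e^{n r}/n -> infinity for every fixed r > 0.
def prob_nonzero(n):
    return 1 / n

def lr_moment(n, r):
    return math.exp(n * r) / n

print([prob_nonzero(n) for n in (1, 10, 100)])  # shrinks to 0
print([lr_moment(n, 1) for n in (1, 5, 10)])    # grows without bound
```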
