O.H. Probability II (MATH 2647) M15
2 Convergence of random variables
In probability theory one uses various modes of convergence of random variables, many of which are crucial for applications. In this section we shall consider some of the most important of them: convergence in $L^r$, convergence in probability, and convergence with probability one (a.k.a. almost sure convergence).
2.1 Weak laws of large numbers

Definition 2.1. Let $r > 0$ be fixed. We say that a sequence $X_n$, $n \ge 1$, of random variables converges to a random variable $X$ in $L^r$ (write $X_n \xrightarrow{L^r} X$) as $n \to \infty$, if
\[ E|X_n - X|^r \to 0 \quad \text{as } n \to \infty. \]

Example 2.2. Let $(X_n)_{n \ge 1}$ be a sequence of random variables such that for some real numbers $(a_n)_{n \ge 1}$ we have
\[ P(X_n = a_n) = p_n, \qquad P(X_n = 0) = 1 - p_n. \tag{2.1} \]
Then $X_n \xrightarrow{L^r} 0$ iff $E|X_n|^r \equiv |a_n|^r p_n \to 0$ as $n \to \infty$.

The following result is the $L^2$ weak law of large numbers ($L^2$-WLLN).
Theorem 2.3. Let $X_j$, $j \ge 1$, be a sequence of uncorrelated random variables with $E X_j = \mu$ and $\mathrm{Var}(X_j) \le C < \infty$. Denote $S_n = X_1 + \dots + X_n$. Then $\frac{1}{n} S_n \xrightarrow{L^2} \mu$ as $n \to \infty$.

Proof. Immediate from
\[ E\Bigl|\frac{S_n}{n} - \mu\Bigr|^2 = E\,\frac{(S_n - n\mu)^2}{n^2} = \frac{\mathrm{Var}(S_n)}{n^2} \le \frac{Cn}{n^2} \to 0 \quad \text{as } n \to \infty. \]

Definition 2.4. We say that a sequence $X_n$, $n \ge 1$, of random variables converges to a random variable $X$ in probability (write $X_n \xrightarrow{P} X$) as $n \to \infty$, if for every fixed $\varepsilon > 0$
\[ P\bigl( |X_n - X| \ge \varepsilon \bigr) \to 0 \quad \text{as } n \to \infty. \]

Example 2.5. Let the sequence $(X_n)_{n \ge 1}$ be as in (2.1). Then for every $\varepsilon > 0$ we have $P(|X_n| \ge \varepsilon) \le P(X_n \ne 0) = p_n$, so that $X_n \xrightarrow{P} 0$ if $p_n \to 0$ as $n \to \infty$.

The usual weak law of large numbers (WLLN) is just a convergence in probability result:
Theorem 2.6. Under the conditions of Theorem 2.3, $\frac{1}{n} S_n \xrightarrow{P} \mu$ as $n \to \infty$.

Exercise 2.7. Derive Theorem 2.6 from the Chebyshev inequality.

We prove Theorem 2.6 using the following simple fact:
Lemma 2.8. Let $X_j$, $j \ge 1$, be a sequence of random variables. If $X_n \xrightarrow{L^r} X$ for some fixed $r > 0$, then $X_n \xrightarrow{P} X$ as $n \to \infty$.
Proof. By the generalized Markov inequality with $g(x) = x^r$ and $Z_n = |X_n - X| \ge 0$, we get, for every fixed $\varepsilon > 0$,
\[ P(Z_n \ge \varepsilon) \equiv P\bigl( |X_n - X| \ge \varepsilon \bigr) \le \frac{E|X_n - X|^r}{\varepsilon^r} \to 0 \quad \text{as } n \to \infty. \]

Proof of Theorem 2.6. Follows immediately from Theorem 2.3 and Lemma 2.8.
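The converse of Lemma 2.8 fails: convergence in probability does not imply $L^r$ convergence. In the two-point Example 2.2 both criteria are completely explicit, so this is easy to see numerically. A short Python sketch; the particular choices $a_n = n$ with $p_n = 1/n$ or $p_n = 1/n^3$ are mine, for illustration only:

```python
# In Example 2.2, X_n equals a_n with probability p_n and 0 otherwise, so
#   E|X_n|^r = |a_n|^r * p_n   and   P(|X_n| >= eps) <= p_n.
def lr_moment(a_n, p_n, r):
    """r-th absolute moment of X_n in the two-point distribution (2.1)."""
    return abs(a_n) ** r * p_n

ns = range(1, 1001)

# Illustrative choice a_n = n, p_n = 1/n: p_n -> 0, so X_n -> 0 in
# probability, but E|X_n| = n * (1/n) = 1 for every n, so there is no
# L^1 convergence -- the converse of Lemma 2.8 fails.
l1_moments_no_conv = [lr_moment(n, 1 / n, 1) for n in ns]

# Illustrative choice a_n = n, p_n = 1/n^3: now E|X_n|^2 = n^2/n^3 = 1/n -> 0,
# so the convergence even holds in L^2.
l2_moments_conv = [lr_moment(n, 1 / n ** 3, 2) for n in ns]
```

The first list stays at the constant value $1$, while the second decays like $1/n$, exactly matching the criterion of Example 2.2.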
As the following example shows, a high-dimensional cube is almost a sphere.
Example 2.9. Let $X_j$, $j \ge 1$, be i.i.d. with $X_j \sim U(-1,1)$. Then the variables $Y_j = (X_j)^2$ satisfy $E Y_j = \frac{1}{3}$ and $\mathrm{Var}(Y_j) \le E[(Y_j)^2] = E[(X_j)^4] \le 1$. Fix $\varepsilon > 0$ and consider the set
\[ A_{n,\varepsilon} \stackrel{\text{def}}{=} \bigl\{ z \in \mathbb{R}^n : (1-\varepsilon)\sqrt{n/3} < |z| < (1+\varepsilon)\sqrt{n/3} \bigr\}, \]
where $|z|$ is the usual Euclidean length in $\mathbb{R}^n$, $|z|^2 = \sum_{j=1}^n (z_j)^2$. By the WLLN,
\[ \frac{1}{n} \sum_{j=1}^n Y_j \equiv \frac{1}{n} \sum_{j=1}^n (X_j)^2 \xrightarrow{P} \frac{1}{3} ; \]
in other words, for every fixed $\varepsilon > 0$, a point $X = (X_1, \dots, X_n)$ chosen uniformly at random in $(-1,1)^n$ satisfies
\[ P\Bigl( \Bigl| \frac{1}{n} \sum_{j=1}^n (X_j)^2 - \frac{1}{3} \Bigr| \ge \varepsilon \Bigr) \equiv P\bigl( X \notin A_{n,\varepsilon} \bigr) \to 0 \quad \text{as } n \to \infty, \]
i.e., for large $n$, with probability approaching one, a random point $X \in (-1,1)^n$ is near the $n$-dimensional sphere of radius $\sqrt{n/3}$ centred at the origin.
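The concentration in Example 2.9 can be checked by simulation: sample uniform points in $(-1,1)^n$ and count how many land in the shell $A_{n,\varepsilon}$. A rough Monte Carlo sketch (the dimensions, $\varepsilon$ and trial counts are arbitrary choices of mine):

```python
import random

random.seed(1)

def in_shell(point, eps):
    """True iff |z| lies within (1 +/- eps) * sqrt(n/3), i.e. z is in A_{n,eps}."""
    n = len(point)
    norm_sq = sum(z * z for z in point)
    # Compare squared lengths to avoid taking square roots.
    return (1 - eps) ** 2 * n / 3 < norm_sq < (1 + eps) ** 2 * n / 3

def shell_fraction(n, eps=0.1, trials=1000):
    """Monte Carlo estimate of P(X in A_{n,eps}) for X uniform in (-1,1)^n."""
    hits = sum(
        in_shell([random.uniform(-1, 1) for _ in range(n)], eps)
        for _ in range(trials)
    )
    return hits / trials
```

For small $n$ the shell captures only a modest fraction of the cube, while for $n$ in the hundreds virtually every sampled point lands in it, as the WLLN predicts.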
Theorem 2.10. Let random variables $S_n$, $n \ge 1$, have two finite moments, $\mu_n \equiv E S_n$, $\sigma_n^2 \equiv \mathrm{Var}(S_n) < \infty$. If, for some sequence $b_n$, we have $\sigma_n / b_n \to 0$ as $n \to \infty$, then $(S_n - \mu_n)/b_n \to 0$ as $n \to \infty$, both in $L^2$ and in probability.

Proof. The result follows immediately from the observation
\[ E\,\frac{(S_n - \mu_n)^2}{b_n^2} = \frac{\mathrm{Var}(S_n)}{b_n^2} \to 0 \quad \text{as } n \to \infty. \]

Example 2.11. In the "coupon collector's problem"[12] let $T_n$ be the time to collect all $n$ coupons. It is easy to show that $E T_n = n \sum_{m=1}^n \frac{1}{m} \sim n \log n$ and $\mathrm{Var}(T_n) \le n^2 \sum_{m=1}^n \frac{1}{m^2} \le \frac{\pi^2}{6} n^2$, so that
\[ \frac{T_n - E T_n}{n \log n} \to 0, \quad \text{i.e.,} \quad \frac{T_n}{n \log n} \to 1 \]
as $n \to \infty$, both in $L^2$ and in probability.

[12] Problem R4.
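The coupon collector's limit $T_n/(n \log n) \to 1$ is easy to observe by simulation: draw coupons uniformly at random until every type has been seen. A minimal sketch (the sample sizes are arbitrary choices of mine):

```python
import math
import random

random.seed(2)

def collect_time(n):
    """One sample of T_n: uniform draws until all n coupon types are seen."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

n = 2000
# Each ratio T_n / (n log n) should be close to 1 (slightly above, since
# E T_n = n(log n + gamma + o(1)) exceeds n log n for finite n).
ratios = [collect_time(n) / (n * math.log(n)) for _ in range(20)]
average_ratio = sum(ratios) / len(ratios)
```

Averaging over several runs tightens the estimate, in line with the $L^2$ bound of Theorem 2.10.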
2.2 Almost sure convergence
Let $(X_k)_{k \ge 1}$ be a sequence of i.i.d. random variables having mean $E X_1 = \mu$ and finite second moment. Denote $S_n \stackrel{\text{def}}{=} \sum_{k=1}^n X_k$. Then the usual (weak) law of large numbers (WLLN) tells us that for every $\delta > 0$
\[ P\bigl( |n^{-1} S_n - \mu| > \delta \bigr) \to 0 \quad \text{as } n \to \infty. \tag{2.2} \]
In other words, according to the WLLN, $n^{-1} S_n$ converges in probability to the constant random variable $X \equiv \mu = E(X_1)$, as $n \to \infty$ (recall Definition 2.4 and Theorem 2.6). It is important to remember that convergence in probability is not related to pointwise convergence, i.e., convergence $X_n(\omega) \to X(\omega)$ for a fixed $\omega \in \Omega$. The following useful definition can be realised in terms of a $U[0,1]$ random variable; recall Remark 1.6.2.

Definition 2.12. The canonical probability space is $(\Omega, \mathcal{F}, P)$, where $\Omega = [0,1]$, $\mathcal{F}$ is the smallest $\sigma$-field containing all intervals in $[0,1]$, and $P$ is the 'length measure' on $\Omega$ (i.e., for $A = [a,b] \subseteq [0,1]$, $P(A) = b - a$).

Example 2.13. Let $(\Omega, \mathcal{F}, P)$ be the canonical probability space. For every event $A \in \mathcal{F}$ consider the indicator random variable
\[ \mathbf{1}_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A, \\ 0, & \text{if } \omega \notin A. \end{cases} \tag{2.3} \]
For $n \ge 1$ put $m = [\log_2 n]$, i.e., $m \ge 0$ is such that $2^m \le n < 2^{m+1}$, define
\[ A_n = \Bigl[ \frac{n - 2^m}{2^m}, \frac{n + 1 - 2^m}{2^m} \Bigr] \subseteq [0,1], \]
and let $X_n \stackrel{\text{def}}{=} \mathbf{1}_{A_n}$. Since $P(X_n > 0) = P(A_n) = 2^{-[\log_2 n]} < \frac{2}{n} \to 0$ as $n \to \infty$, the sequence $X_n$ converges in probability to $X \equiv 0$. However,
\[ \bigl\{ \omega \in \Omega : X_n(\omega) \to X(\omega) \equiv 0 \text{ as } n \to \infty \bigr\} = \varnothing , \]
i.e., there is no point $\omega \in \Omega$ for which the sequence $X_n(\omega) \in \{0,1\}$ converges to $X(\omega) = 0$. [Try the R script simulating this sequence from the course webpage!]

The following is the key definition of this section.

Definition 2.14. A sequence $(X_k)_{k \ge 1}$ of random variables in $(\Omega, \mathcal{F}, P)$ converges, as $n \to \infty$, to a random variable $X$ with probability one (or almost surely) if
\[ P\bigl( \omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty \bigr) = 1 . \tag{2.4} \]

Remark 2.14.1. For $\varepsilon > 0$, let $A_n(\varepsilon) = \{ \omega : |X_n(\omega) - X(\omega)| > \varepsilon \}$. Then property (2.4) is equivalent to saying that for every $\varepsilon > 0$
\[ P\bigl( A_n(\varepsilon) \text{ finitely often} \bigr) = 1 . \tag{2.5} \]
This is why the Borel-Cantelli lemma is so useful in studying almost sure limits.
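The sliding-block sequence of Example 2.13 is easy to simulate (cf. the R script mentioned above); below is a Python analogue, written directly from the definition of $A_n$ (the function names are my own). It shows that, although $P(X_n > 0) \to 0$, every dyadic block $2^m \le n < 2^{m+1}$ contains at least one index with $X_n(\omega) = 1$, so $X_n(\omega)$ converges for no $\omega$:

```python
import math
import random

random.seed(3)

def typewriter(n, omega):
    """X_n(omega) for the indicator variables of Example 2.13."""
    m = int(math.log2(n))            # 2^m <= n < 2^(m+1)
    left = (n - 2 ** m) / 2 ** m     # A_n = [left, right] has length 2^-m
    right = (n - 2 ** m + 1) / 2 ** m
    return 1 if left <= omega <= right else 0

omega = random.random()              # one sample point in Omega = [0, 1]
xs = [typewriter(n, omega) for n in range(1, 2 ** 12)]

# Within each dyadic block 2^m <= n < 2^(m+1) the intervals A_n tile [0, 1],
# so omega is covered at least once per block: X_n(omega) = 1 infinitely often,
# even though P(X_n > 0) = 2^-m -> 0.
ones_per_level = [sum(xs[2 ** m - 1 : 2 ** (m + 1) - 1]) for m in range(12)]
```

Each entry of `ones_per_level` is at least $1$, confirming that the single sample path keeps returning to the value $1$.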
Example 1.5 (continued). Consider a finite random variable $X$, i.e., one satisfying $P(|X| < \infty) = 1$. Then the sequence $(X_k)_{k \ge 1}$ defined via $X_k \stackrel{\text{def}}{=} \frac{1}{k} X$ converges to zero with probability one.

Solution. The previous discussion established exactly (2.5).
In general, verifying convergence with probability one is not immediate. The following lemma gives a sufficient condition for almost sure convergence.
Lemma 2.15. Let $X_1, X_2, \dots$ and $X$ be random variables. If, for every $\varepsilon > 0$,
\[ \sum_{n=1}^{\infty} P\bigl( |X_n - X| > \varepsilon \bigr) < \infty , \tag{2.6} \]
then $X_n$ converges to $X$ almost surely.
Proof. Fix $\varepsilon > 0$ and let $A_n(\varepsilon) = \{ \omega \in \Omega : |X_n(\omega) - X(\omega)| > \varepsilon \}$. By (2.6), $\sum_n P(A_n(\varepsilon)) < \infty$, and, by Lemma 1.6a), with probability one only a finite number of the events $A_n(\varepsilon)$ occur. This means that for every fixed $\varepsilon > 0$ the event
\[ A(\varepsilon) \stackrel{\text{def}}{=} \bigl\{ \omega \in \Omega : |X_n(\omega) - X(\omega)| \le \varepsilon \text{ for all } n \text{ large enough} \bigr\} \]
has probability one. By monotonicity ($A(\varepsilon_1) \subset A(\varepsilon_2)$ if $\varepsilon_1 < \varepsilon_2$), the event
\[ \bigl\{ \omega \in \Omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty \bigr\} = \bigcap_{\varepsilon > 0} A(\varepsilon) = \bigcap_{m \ge 1} A(1/m) \]
has probability one. The claim follows.
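To see condition (2.6) at work, take the two-point variables of Example 2.2 with $a_n = 1$ and make them independent (an extra assumption of mine, so that the divergent case can be contrasted with the second Borel-Cantelli lemma). With summable $p_n = 1/n^2$ only finitely many $X_n$ equal $1$, while with $p_n = 1/n$ the ones keep occurring. A rough sketch:

```python
import random

random.seed(4)

def count_successes(p, big_n):
    """Number of n <= big_n with X_n = 1, for independent X_n, P(X_n = 1) = p(n)."""
    return sum(random.random() < p(n) for n in range(1, big_n + 1))

big_n = 100_000

# p_n = 1/n^2: condition (2.6) holds (sum p_n = pi^2/6 < infinity), so by
# Lemma 2.15 we have X_n -> 0 almost surely; only a handful of ones appear.
ones_summable = count_successes(lambda n: 1 / n ** 2, big_n)

# p_n = 1/n: sum p_n diverges, so (2.6) fails; for *independent* X_n the
# second Borel-Cantelli lemma shows X_n = 1 infinitely often, and indeed the
# ones keep showing up throughout the range.
ones_divergent = count_successes(lambda n: 1 / n, big_n)
```

Note that the failure of (2.6) alone does not disprove almost sure convergence; it is the independence assumption that upgrades the divergent case to "infinitely often".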
A straightforward application of Lemma 2.15 improves the WLLN (2.2) and gives the following famous (Borel) Strong Law of Large Numbers (SLLN):
Theorem 2.16 ($L^4$-SLLN). Let the variables $X_1, X_2, \dots$ be i.i.d. with $E(X_k) = \mu$ and $E\bigl((X_k)^4\bigr) < \infty$. If $S_n \stackrel{\text{def}}{=} X_1 + X_2 + \dots + X_n$, then $S_n/n \to \mu$ almost surely, as $n \to \infty$.

Proof. We may and shall suppose that $\mu = E(X_k) = 0$.[13] Now,
\[ E\bigl((S_n)^4\bigr) = E\Bigl( \sum_{k=1}^n X_k \Bigr)^4 = \sum_{k=1}^n E\bigl((X_k)^4\bigr) + 6 \sum_{1 \le k < m \le n} E\bigl((X_k)^2\bigr)\, E\bigl((X_m)^2\bigr)