
36-752 Advanced Probability Overview, Spring 2018
Lecture 18: April 5
Lecturer: Alessandro Rinaldo    Scribes: Wanshan Li

Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

18.1 Continuous Mapping Theorem

Let $\{X_n\}_{n=1}^{\infty}$ and $X$ be random variables taking values in a metric space $(\mathcal{X}, d)$. Recall that last time we defined $X_n \xrightarrow{D} X$ to mean $\lim_{n\to\infty} \mathbb{E}[g(X_n)] = \mathbb{E}[g(X)]$ for all bounded continuous functions $g$. This relationship can actually be generalized as follows:

Theorem 18.1. $X_n \xrightarrow{D} X$ if and only if
$$\lim_{n\to\infty} \mathbb{E}[g(X_n)] = \mathbb{E}[g(X)]$$
for all bounded $g$ that are continuous a.e. $[\mu_X]$, that is, $\mu_X(\{x : g \text{ is not continuous at } x\}) = 0$.

Remark For the direct proof of this theorem, you can see Theorem 3.9.1 on Durrett’s book, or the section on weak convergence of Billingsley’s book. You can also prove it by using Skorokhod’s Representation Theorem given below:

Theorem 18.2 (Skorokhod's Representation Theorem). Suppose $X_n \xrightarrow{D} X$, all taking values in a metric space $(\mathcal{X}, d)$, and the probability measure $\mu_X$ of $X$ is separable. Then there exist $\{Y_n\}$ and $Y$, taking values in $(\mathcal{X}, d)$ and defined on some probability space $(\Omega, \mathcal{F}, P)$, such that $X_n \stackrel{d}{=} Y_n$, $X \stackrel{d}{=} Y$, and
$$Y_n \xrightarrow{a.s.} Y,$$
where the a.s. convergence is with respect to $P$.
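On the real line, a standard way to realize such a coupling is through quantile functions: with a single $U \sim \mathrm{Uniform}(0,1)$, set $Y_n = F_n^{-1}(U)$ and $Y = F^{-1}(U)$. The following is a minimal numerical sketch of this construction (not from the notes), assuming numpy and scipy are available; the specific sequence $X_n \sim N(1/n, (1+1/n)^2)$ is only an illustrative choice.

```python
# A minimal sketch (not from the notes) of the quantile-coupling construction
# behind Skorokhod's representation on the real line, assuming numpy and scipy
# are available.  With a single U ~ Uniform(0,1), set Y_n = F_n^{-1}(U) and
# Y = F^{-1}(U); then Y_n has the law of X_n, Y has the law of X, and Y_n -> Y
# almost surely.  Here X_n ~ N(1/n, (1 + 1/n)^2), which converges to N(0, 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
u = rng.uniform(size=5)                     # one Uniform(0,1) draw per "omega"
y = norm.ppf(u)                             # Y = F^{-1}(U), the law of the limit X

for n in [1, 10, 100, 10000]:
    yn = norm.ppf(u, loc=1.0 / n, scale=1.0 + 1.0 / n)   # Y_n = F_n^{-1}(U)
    print(n, np.max(np.abs(yn - y)))        # max over these omegas shrinks to 0
```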

In a narrow sense, the so-called continuous mapping theorem (CMT) concerns convergence in distribution of random variables, as we discuss first. The theorem has three parts. Roughly speaking, the main part says that if $X_n \xrightarrow{D} X$ and $f$ is an a.e. $[\mu_X]$ continuous function, then $f(X_n) \xrightarrow{D} f(X)$.

Theorem 18.3 (Continuous Mapping Theorem, I). Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables and $X$ another random variable, all taking values in the same metric space $\mathcal{X}$. Let $\mathcal{Y}$ be a metric space and $f : \mathcal{X} \to \mathcal{Y}$ a measurable function. Define
$$C_f = \{x : f \text{ is continuous at } x\}.$$
Suppose that $X_n \xrightarrow{D} X$ and $P(X \in C_f) = 1$. Then $f(X_n) \xrightarrow{D} f(X)$.


Proof. Denote the probability measures of $X_n$ and $X$ by $\mu_n$ and $\mu$. Last time we saw that for every closed set $B \subset \mathcal{Y}$,
$$\overline{f^{-1}(B)} \subset f^{-1}(B) \cup C_f^c,$$
where $\overline{f^{-1}(B)}$ is the closure of $f^{-1}(B) = \{x \in \mathcal{X} : f(x) \in B\}$. Therefore,
$$\limsup_{n\to\infty} P(f(X_n) \in B) = \limsup_{n\to\infty} \mu_n(f^{-1}(B)) \le \limsup_{n\to\infty} \mu_n(\overline{f^{-1}(B)}) \le \mu(\overline{f^{-1}(B)}) \quad \text{(Portmanteau)}$$
$$\le \mu(f^{-1}(B)) + \mu(C_f^c) = \mu(f^{-1}(B)) = P(f(X) \in B).$$
Since $B$ was an arbitrary closed set, this implies $f(X_n) \xrightarrow{D} f(X)$ by the Portmanteau Theorem.

Example 18.4. Suppose $X_1, X_2, \dots$ are i.i.d. samples from a (well-behaved) distribution with mean $\mu$ and variance $\sigma^2$. By the CLT we have $\frac{\sqrt{n}}{\sigma}(\bar{X}_n - \mu) \xrightarrow{D} N(0, 1)$. Now by the CMT we have $\left[\frac{\sqrt{n}}{\sigma}(\bar{X}_n - \mu)\right]^2 \xrightarrow{D} \chi^2_1$. (A small simulation sketch of this example appears after Theorem 18.6.)

Theorem 18.5 (Continuous Mapping Theorem, II). Let $\{X_n\}_{n=1}^{\infty}$, $X$, and $\{Y_n\}_{n=1}^{\infty}$ be random variables taking values in a metric space with metric $d$. Suppose that $X_n \xrightarrow{D} X$ and $d(X_n, Y_n) \xrightarrow{P} 0$. Then $Y_n \xrightarrow{D} X$.

Theorem 18.6 (Continuous Mapping Theorem, III). Let $\{X_n\}_{n=1}^{\infty}$ and $\{Y_n\}_{n=1}^{\infty}$ be random variables. Suppose that $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} c$ for a constant $c$. Then $(X_n, Y_n)' \xrightarrow{D} (X, c)'$. Furthermore, if $X_n \perp\!\!\!\perp Y_n$ and $Y_n \xrightarrow{D} Y$, then $(X_n, Y_n)' \xrightarrow{D} (X, Y)'$ with $X \perp\!\!\!\perp Y$.
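The following is a minimal simulation sketch of Example 18.4, assuming numpy is available; the choice of Exponential(1) data (so $\mu = \sigma^2 = 1$) is only an illustrative assumption.

```python
# A simulation sketch of Example 18.4 (not from the notes), assuming numpy is
# available: the squared standardized sample mean is approximately chi^2_1.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20000
x = rng.exponential(scale=1.0, size=(reps, n))   # i.i.d. Exp(1): mu = sigma^2 = 1
z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0    # approximately N(0, 1) by the CLT
t = z ** 2                                       # approximately chi^2_1 by the CMT
# P(chi^2_1 <= 1) = 2*Phi(1) - 1, roughly 0.683; the simulated frequency is close.
print(np.mean(t <= 1.0))
```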

The CMT can also be extended to convergence in probability, as in the following theorem.

Theorem 18.7 (CMT for convergence in probability). If $X_n \xrightarrow{P} X$ and $f$ is continuous a.e. $[\mu_X]$, then $f(X_n) \xrightarrow{P} f(X)$.

Remark Also notice the (essentially trivial) fact that if $X_n \xrightarrow{a.s.} X$ and $f$ is continuous a.e. $[\mu_X]$, then $f(X_n) \xrightarrow{a.s.} f(X)$. Therefore the CMT holds for all three modes of convergence.

Proof. Fix an arbitrary $\varepsilon > 0$; we want to show that

$$P(d_{\mathcal{Y}}(f(X_n), f(X)) > \varepsilon) \to 0 \quad \text{as } n \to \infty.$$

Fix $\delta > 0$ and let $B_\delta$ be the subset of $\mathcal{X}$ consisting of all $x$ such that there exists $y \in \mathcal{X}$ with $d_{\mathcal{X}}(x, y) < \delta$ and $d_{\mathcal{Y}}(f(x), f(y)) > \varepsilon$. Then
$$\{X \notin B_\delta\} \cap \{d_{\mathcal{Y}}(f(X_n), f(X)) > \varepsilon\} \subset \{d_{\mathcal{X}}(X_n, X) \ge \delta\}.$$
By the fact that $A^c \cap B \subset C$ implies $B \subset A \cup C$, we have
$$P(d_{\mathcal{Y}}(f(X_n), f(X)) > \varepsilon) \le P(X \in B_\delta) + P(d_{\mathcal{X}}(X_n, X) \ge \delta) =: T_1 + T_2.$$
Now $T_2 \to 0$ as $n \to \infty$ because $X_n \xrightarrow{P} X$. As for $T_1$, notice that
$$P(X \in B_\delta) = P(X \in B_\delta \cap C_f) \downarrow 0 \quad \text{as } \delta \downarrow 0,$$
since every point of $C_f$ lies outside $B_\delta$ once $\delta$ is small enough. Thus, choosing $\delta$ small enough that $T_1 \le \varepsilon/2$ and then $n$ large enough that $T_2 \le \varepsilon/2$,
$$P(d_{\mathcal{Y}}(f(X_n), f(X)) > \varepsilon) \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

Remark If $f$ (defined on $\mathcal{X}$) is continuous a.e. $[\mu_X]$ and $g$ (defined on $\mathcal{Y}$) is bounded and continuous, then $g \circ f$ is bounded and continuous a.e. $[\mu_X]$. Therefore, by Theorem 18.1 we get the CMT (for convergence in distribution) immediately.

Example 18.8. Suppose $X_1, X_2, \dots$ are i.i.d. samples with mean $\mu$ and variance $\sigma^2$. We want to estimate $\sigma^2$ by an estimator $\hat{\sigma}_n^2$ with $\hat{\sigma}_n^2 \xrightarrow{P} \sigma^2$. Consider $\hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2$, where $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. By the WLLN we have
$$\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 \xrightarrow{P} \sigma^2, \qquad \bar{X}_n \xrightarrow{P} \mu.$$
Using this, together with the fact that
$$\hat{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2 - (\bar{X}_n - \mu)^2,$$
and $(\bar{X}_n - \mu)^2 \xrightarrow{P} 0$ by the CMT, we have
$$\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2,\; (\bar{X}_n - \mu)^2\right)' \xrightarrow{P} (\sigma^2, 0)'.$$
Now let $f(x, y) = x - y$, which is continuous; by the CMT we have
$$\hat{\sigma}_n^2 = f\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2,\; (\bar{X}_n - \mu)^2\right) \xrightarrow{P} \sigma^2.$$
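Here is a quick numerical check of Example 18.8, assuming numpy is available; the Normal$(3, 4)$ data-generating distribution is only an illustrative assumption.

```python
# A minimal check (not from the notes) that the biased sample variance
# (1/n) * sum (X_i - Xbar)^2 converges in probability to sigma^2 = 4.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma2 = 3.0, 4.0
for n in [10, 100, 10_000, 1_000_000]:
    x = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=n)
    sigma2_hat = np.mean((x - x.mean()) ** 2)    # the estimator from Example 18.8
    print(n, sigma2_hat)                         # settles near sigma^2 = 4
```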

A direct but useful corollary of the continuous mapping theorem is Slutsky's Theorem.

Theorem 18.9 (Slutsky's Theorem). If $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} c$ for a constant $c$, then

1. $X_n + Y_n \xrightarrow{D} X + c$;

2. $X_n Y_n \xrightarrow{D} cX$;

3. $X_n / Y_n \xrightarrow{D} X/c$, provided that $c \neq 0$.

Example 18.10. By the Central Limit Theorem we have $\frac{\sqrt{n}}{\sigma}(\bar{X}_n - \mu) \xrightarrow{D} N(0, 1)$. By the Law of Large Numbers we have $\hat{\sigma}_n^2 \xrightarrow{P} \sigma^2$. Then by the continuous mapping theorem $\sigma/\hat{\sigma}_n \xrightarrow{P} 1$, and by Slutsky's Theorem,
$$\frac{\sqrt{n}}{\hat{\sigma}_n}(\bar{X}_n - \mu) = \frac{\sigma}{\hat{\sigma}_n}\cdot\frac{\sqrt{n}}{\sigma}(\bar{X}_n - \mu) \xrightarrow{D} N(0, 1).$$
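Below is a simulation sketch of Example 18.10, assuming numpy is available; the Gamma$(2, 1)$ data (mean $2$, variance $2$) are an illustrative assumption, chosen to be non-normal so that the studentized mean really relies on the CLT and Slutsky's Theorem rather than exact normality.

```python
# A simulation sketch (not from the notes): the studentized mean
# sqrt(n) * (Xbar - mu) / sigma_hat is approximately N(0, 1) by Slutsky.
import numpy as np

rng = np.random.default_rng(3)
n, reps, mu = 200, 20000, 2.0
x = rng.gamma(shape=2.0, scale=1.0, size=(reps, n))   # mean 2, variance 2
t = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1)   # std uses the (1/n) variance
# Phi(1.96) is roughly 0.975; the simulated frequency should be close to it.
print(np.mean(t <= 1.96))
```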

18.2 Tightness

Definition 18.11. A sequence of probability measures $\{\mu_n\}_{n=1}^{\infty}$ is said to be tight if for every $\varepsilon > 0$ there exists a compact set $C$ such that

$$\mu_n(C) > 1 - \varepsilon \quad \text{for all } n.$$

Equivalently, if $X_n \sim \mu_n$, then $\{X_n\}_{n=1}^{\infty}$ is tight, or bounded in probability, if for every $\varepsilon > 0$ there exists $M$ such that

$$\sup_n P(\|X_n\| > M) < \varepsilon.$$

In that case, we write $X_n = O_P(1)$.

Example 18.12. Cases where boundedness in probability fails:

1. $P(X_n = c_n) = 1$ with $c_n \to \infty$.

2. $X_n \sim \mathrm{Uniform}([-n, n])$. (Note that then $P(X_n \in [-M, M]^c) = 1 - \frac{M}{n}$ for $n \ge M$.)

Theorem 18.13. Let $\{X_n\}_{n=1}^{\infty}$ be a sequence of random variables taking values in $\mathbb{R}^d$.

(i) If $X_n \xrightarrow{D} X$, then $\{X_n\}$ is tight.

(ii) (Helly-Bray Selection Theorem.) If $\{X_n\}$ is tight, then there exists a subsequence $\{n_k\}$ and a random variable $X$ such that $X_{n_k} \xrightarrow{D} X$. Further, if every convergent (in distribution) subsequence converges to the same $X$, then $X_n \xrightarrow{D} X$.

Proof of (i). $X$ is a single random variable, so for every $\varepsilon > 0$ we can choose $M = M(\varepsilon) > 0$ such that $P(\|X\| \ge M) < \varepsilon$. Then by the Portmanteau Theorem (applied to the closed set $\{x : \|x\| \ge M\}$), there exists $n_0 = n_0(\varepsilon, M)$ such that for all $n \ge n_0$,

$$P(\|X_n\| \ge M) \le P(\|X\| \ge M) + \varepsilon \le 2\varepsilon.$$

On the other hand, since there are only finitely many indices $n < n_0$, we can find $M_1 > M$ such that for all $n < n_0$,

$$P(\|X_n\| \ge M_1) \le 2\varepsilon.$$

And for $n \ge n_0$, we have

$$P(\|X_n\| \ge M_1) \le P(\|X_n\| \ge M) \le P(\|X\| \ge M) + \varepsilon \le 2\varepsilon.$$

Thus $\sup_n P(\|X_n\| \ge M_1) \le 2\varepsilon$, and since $\varepsilon$ was arbitrary this proves tightness (closed balls in $\mathbb{R}^d$ are compact).

Theorem 18.14 (Polya's Theorem). Suppose $\{X_n\}_{n=1}^{\infty}$ and $X$ take values in $\mathbb{R}^d$, and $X_n \xrightarrow{D} X$. If the c.d.f. $F$ of $X$ is continuous (in particular, if $\mu_X \ll \lambda_d$, where $\lambda_d$ is the Lebesgue measure on $\mathbb{R}^d$), then, writing $F_n$ for the c.d.f. of $X_n$,
$$\sup_{x \in \mathbb{R}^d} |F_n(x) - F(x)| \to 0.$$

Theorem 18.15 (Glivenko-Cantelli Theorem). Suppose $X_1, X_2, \dots$, with $X_i \in \mathbb{R}$, are i.i.d. samples from a distribution with c.d.f. $F$. For each $n$, let $\hat{F}_n$ be the empirical c.d.f.
$$\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}.$$
We have
$$\|\hat{F}_n - F\|_{\infty} = \sup_{x \in \mathbb{R}} |\hat{F}_n(x) - F(x)| \xrightarrow{a.s.} 0.$$

Remark

• Notice that $\mathbb{E}[\hat{F}_n(x)] = F(x)$ for all $x$, and $\hat{F}_n(x)$ is an average of i.i.d. bounded random variables, so by the SLLN we have $|\hat{F}_n(x) - F(x)| \xrightarrow{a.s.} 0$ for each fixed $x$. However, the Glivenko-Cantelli Theorem is much stronger than this because it asserts almost sure convergence of the supremum over $x$, i.e. uniform convergence.

• We often use another (even stronger) result instead, named after Aryeh Dvoretzky, Jack Kiefer, and Jacob Wolfowitz, who proved a version of this inequality in 1956 (the sharp constant 2 below is due to Massart):

Theorem 18.16 (Dvoretzky-Kiefer-Wolfowitz). Under the same conditions, for any $\varepsilon > 0$ and all $n$, we have
$$P(\|\hat{F}_n - F\|_{\infty} \ge \varepsilon) \le 2\exp\{-2n\varepsilon^2\}.$$
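To close, here is a small simulation sketch of the Glivenko-Cantelli Theorem and the DKW inequality, assuming numpy is available; using Uniform$(0,1)$ data (so that $F(x) = x$) is only an illustrative assumption that makes the sup distance easy to evaluate exactly at the order statistics.

```python
# A simulation sketch (not from the notes): the sup distance between the
# empirical c.d.f. and the true c.d.f. shrinks as n grows (Glivenko-Cantelli),
# and the frequency of {sup distance >= eps} respects the DKW bound
# 2 * exp(-2 * n * eps^2).
import numpy as np

rng = np.random.default_rng(4)

def sup_dist_uniform(n):
    # For Uniform(0, 1) data, F(x) = x, and the sup of |F_hat_n - F| is attained
    # at the order statistics: compare i/n and (i-1)/n against X_(i).
    x = np.sort(rng.uniform(size=n))
    i = np.arange(1, n + 1)
    return max((i / n - x).max(), (x - (i - 1) / n).max())

eps = 0.05
for n in [100, 1000, 10000]:
    dists = np.array([sup_dist_uniform(n) for _ in range(500)])
    # mean sup distance, empirical P(sup distance >= eps), and the DKW bound
    print(n, dists.mean(), (dists >= eps).mean(), 2 * np.exp(-2 * n * eps**2))
```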