
Probability and Measure Theory

Robert L. Wolpert, Institute of Statistics and Decision Sciences, Duke University, Durham, NC, USA

Convergence of Random Variables

1 Convergence Concepts

1.1 Convergence of Real Numbers

A sequence of real numbers $a_n$ converges to a limit $a$ if and only if, for each $\epsilon > 0$, the sequence eventually lies within a ball of radius $\epsilon$ centered at $a$. It's okay if the first few (or few million) terms lie outside that ball, and the number of terms that do lie outside the ball may depend on how big $\epsilon$ is (if $\epsilon$ is small enough it may take millions of terms before the remaining sequence lies inside the ball). This can be made mathematically precise by introducing a letter (say, $N_\epsilon$) for how many initial terms we have to throw away: $a_n \to a$ if and only if, for each $\epsilon > 0$, there is an $N_\epsilon < \infty$ so that, for each $n \ge N_\epsilon$, $|a_n - a| < \epsilon$: only finitely many $a_n$ can be farther than $\epsilon$ from $a$.

The same notion of convergence works in any (complete) metric space, where we require that the distance $d(a_n, a)$ from $a_n$ to $a$ tend to zero in the sense that it exceeds each number $\epsilon > 0$ for at most some finite number $N_\epsilon$ of terms. Points $a_n$ in $d$-dimensional Euclidean space converge to a limit $a \in \mathbb{R}^d$ if and only if each of their coordinates converges; and, since there are only finitely many coordinates, if they all converge then they do so uniformly (i.e., for each $\epsilon$ we can take the same $N_\epsilon$ for all $d$ of the coordinate sequences).
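To make the $N_\epsilon$ bookkeeping concrete, here is a minimal Python sketch (the example sequence $a_n = 1 + \sin(n)/n$, the function name, and the finite search horizon are all illustrative choices, not from the notes): it scans an initial stretch of a sequence and reports the first index after which every inspected term stays within $\epsilon$ of the limit.

```python
import math

def first_settling_index(seq, a, eps, n_max=10_000):
    """Return the smallest N with |seq(n) - a| < eps for all N <= n <= n_max
    (a finite-horizon stand-in for N_eps); None if even the last inspected
    term lies outside the eps-ball."""
    N = None
    for n in range(1, n_max + 1):
        if abs(seq(n) - a) >= eps:
            N = None           # a term escaped the ball, so the tail restarts
        elif N is None:
            N = n              # tentative first index of the final in-ball run
    return N

# a_n = 1 + sin(n)/n converges to 1, and N_eps grows as eps shrinks.
a = lambda n: 1 + math.sin(n) / n
for eps in (0.1, 0.01, 0.001):
    print(eps, first_settling_index(a, 1.0, eps))
```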

2 Convergence of Random Variables

For random variables $X_n$ the idea of convergence to a limiting random variable $X$ is more delicate, since each $X_n$ is a function of $\omega \in \Omega$ and usually there are infinitely many points $\omega \in \Omega$. What should we mean in asking about the convergence of a sequence $X_n$ of random variables to a limit $X$? Should we mean that $X_n(\omega)$ converges to $X(\omega)$ for each fixed $\omega$? Or that these sequences converge uniformly in $\omega \in \Omega$? Or that some notion of the distance $d(X_n, X)$ between $X_n$ and the limit $X$ decreases to zero? Should the probability measure $\mathsf{P}$ be involved in some way? Here are a few different choices of what we might mean by the statement that "$X_n$ converges to $X$," for a sequence of random variables $X_n$ and a random variable $X$, all defined on the same probability space $(\Omega, \mathcal{F}, \mathsf{P})$:

pw: The sequence of real numbers $X_n(\omega) \to X(\omega)$ for every $\omega \in \Omega$ (pointwise convergence):

$$(\forall \epsilon > 0)\ (\forall \omega \in \Omega)\ (\exists N_{\epsilon,\omega} < \infty)\ (\forall n \ge N_{\epsilon,\omega})\quad |X_n(\omega) - X(\omega)| < \epsilon.$$

uni: The sequences of real numbers $X_n(\omega) \to X(\omega)$ uniformly for $\omega \in \Omega$:

$$(\forall \epsilon > 0)\ (\exists N_\epsilon < \infty)\ (\forall \omega \in \Omega)\ (\forall n \ge N_\epsilon)\quad |X_n(\omega) - X(\omega)| < \epsilon.$$

a.s.: Outside some null event $N \in \mathcal{F}$, each sequence of real numbers $X_n(\omega) \to X(\omega)$ (almost-sure convergence, or "almost everywhere" (a.e.)): for some $N \in \mathcal{F}$ with $\mathsf{P}[N] = 0$,

$$(\forall \epsilon > 0)\ (\forall \omega \notin N)\ (\exists N_{\epsilon,\omega} < \infty)\ (\forall n \ge N_{\epsilon,\omega})\quad |X_n(\omega) - X(\omega)| < \epsilon,$$

i.e., $\mathsf{P}\bigl[\bigcup_{\epsilon > 0} \bigcap_{N < \infty} \bigcup_{n \ge N} \{|X_n(\omega) - X(\omega)| \ge \epsilon\}\bigr] = 0$.

$L_\infty$: Outside some null event $N \in \mathcal{F}$, the sequences of real numbers $X_n(\omega) \to X(\omega)$ converge uniformly ("almost-uniform" or "$L_\infty$" convergence): for some $N \in \mathcal{F}$ with $\mathsf{P}[N] = 0$,

$$(\forall \epsilon > 0)\ (\exists N_\epsilon < \infty)\ (\forall \omega \notin N)\ (\forall n \ge N_\epsilon)\quad |X_n(\omega) - X(\omega)| < \epsilon.$$

i.p.: For each $\epsilon > 0$, the probabilities $\mathsf{P}[|X_n - X| > \epsilon] \to 0$ (convergence "in probability," or "in measure"):

$$(\forall \epsilon > 0)\ (\forall \eta > 0)\ (\exists N_{\epsilon,\eta} < \infty)\ (\forall n \ge N_{\epsilon,\eta})\quad \mathsf{P}[|X_n - X| > \epsilon] < \eta.$$

$L_1$: The expectation $\mathsf{E}[|X_n - X|]$ converges to zero (convergence "in $L_1$"):

$$(\forall \epsilon > 0)\ (\exists N_\epsilon < \infty)\ (\forall n \ge N_\epsilon)\quad \mathsf{E}[|X_n - X|] < \epsilon.$$

$L_p$: For some fixed number $p > 0$, the expectation of the $p$th power $\mathsf{E}[|X_n - X|^p]$ converges to zero (convergence "in $L_p$," sometimes called "in the $p$th mean"):

$$(\forall \epsilon > 0)\ (\exists N_\epsilon < \infty)\ (\forall n \ge N_\epsilon)\quad \mathsf{E}[|X_n - X|^p] < \epsilon.$$

i.d.: The distributions of $X_n$ converge to the distribution of $X$, i.e., the measures $\mathsf{P} \circ X_n^{-1}$ converge in some way to $\mathsf{P} \circ X^{-1}$ ("vague" or "weak" convergence, or "convergence in distribution," sometimes written $X_n \Rightarrow X$):

$$(\forall \epsilon > 0)\ (\forall \varphi \in C_b(\mathbb{R}))\ (\exists N_{\epsilon,\varphi} < \infty)\ (\forall n \ge N_{\epsilon,\varphi})\quad |\mathsf{E}\,\varphi(X_n) - \mathsf{E}\,\varphi(X)| < \epsilon.$$

Which of these eight notions of convergence is right for random variables? The answer is that all of them are useful for one purpose or another. You will want to know which ones imply which other ones, and under what conditions. All but the first two (pointwise, uniform) notions depend upon the measure $\mathsf{P}$; it is possible for a sequence $X_n$ to converge to $X$ in any of these senses for one probability measure $\mathsf{P}$, but to fail to converge for another $\mathsf{P}'$. Most of them can be phrased as metric convergence for some notion of distance between random variables:

i.p.: $X_n \to X$ in probability if and only if $d_0(X, X_n) \to 0$ as real numbers, where:

$$d_0(X, Y) \equiv \mathsf{E}\left[\frac{|X - Y|}{1 + |X - Y|}\right]$$

$L_1$: $X_n \to X$ in $L_1$ if and only if $d_1(X, X_n) = \|X - X_n\|_1 \to 0$ as real numbers, where:

$$\|X - Y\|_1 \equiv \mathsf{E}|X - Y|$$

$L_p$: $X_n \to X$ in $L_p$ if and only if $d_p(X, X_n) = \|X - X_n\|_p \to 0$ as real numbers, where:

$$\|X - Y\|_p \equiv (\mathsf{E}|X - Y|^p)^{1/p}$$

$L_\infty$: $X_n \to X$ almost uniformly if and only if $d_\infty(X, X_n) = \|X - X_n\|_\infty \to 0$ as real numbers, where:

$$\|X - Y\|_\infty = \mathrm{l.u.b.}\,\{r < \infty : \mathsf{P}[|X - Y| > r] > 0\}$$

As the notation suggests, convergence in probability and in $L_\infty$ are in some sense limits of convergence in $L_p$ as $p \to 0$ and $p \to \infty$, respectively. Almost-sure convergence is an exception: there is no metric notion of distance $d(X, Y)$ for which $X_n \to X$ a.s. if and only if $d(X, X_n) \to 0$.
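The metrics above are easy to estimate by simulation. Here is a minimal sketch, assuming Python with numpy; the helper names and the illustrative family $X_n = X + n^{-1/2} \cdot \text{noise}$ (which converges to $X$ i.p. and in $L_p$) are my own choices, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def d0(x, y):
    """Monte Carlo estimate of d_0(X, Y) = E[|X-Y| / (1 + |X-Y|)] from paired samples."""
    z = np.abs(x - y)
    return float(np.mean(z / (1 + z)))

def lp_dist(x, y, p):
    """Monte Carlo estimate of ||X - Y||_p = (E|X-Y|^p)^(1/p)."""
    return float(np.mean(np.abs(x - y) ** p) ** (1 / p))

x = rng.normal(size=100_000)
for n in (1, 10, 100, 1000):
    xn = x + rng.normal(size=x.size) / np.sqrt(n)    # X_n -> X as n grows
    print(n, round(d0(xn, x), 4), round(lp_dist(xn, x, 2), 4))
```

Both estimates shrink toward zero, consistent with convergence i.p. and in $L_2$.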

2.1 Almost-Sure Convergence

Let $X$ and $\{X_n\}$ be a collection of RV's on some $(\Omega, \mathcal{F}, \mathsf{P})$. The set of points $\omega$ for which $X_n(\omega)$ does converge to $X(\omega)$ is just

$$\bigcap_{\epsilon > 0} \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} [\omega : |X_n(\omega) - X(\omega)| \le \epsilon],$$

the points which, for all $\epsilon > 0$, have $|X_n(\omega) - X(\omega)|$ less than $\epsilon$ for all but finitely many $n$. The sequence $X_n$ is said to converge "almost everywhere" (a.e.) to $X$, or to converge to $X$ "almost surely" (a.s.), if this set of $\omega$ has probability one, or (equivalently) if its complement is a null set:

$$\mathsf{P}\Bigl[\bigcup_{\epsilon > 0} \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} [\omega : |X_n(\omega) - X(\omega)| > \epsilon]\Bigr] = 0.$$

The union over $\epsilon > 0$ is only a countable one, since we need include only rational $\epsilon$ (or, for that matter, any sequence $\epsilon_k$ tending to zero, such as $\epsilon_k = 1/k$). Thus $X_n \to X$ a.e. if and only if, for each $\epsilon > 0$,

$$\mathsf{P}\Bigl[\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} [\omega : |X_n(\omega) - X(\omega)| > \epsilon]\Bigr] = 0. \tag{a.e.}$$

This combination of intersection and union occurs frequently in probability, and has a name: for any sequence $E_n$ of events, $\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} E_n$ is called the lim sup of the $\{E_n\}$, and is sometimes described more colorfully as $[E_n \text{ i.o.}]$, the set of points in $E_n$ "infinitely often." Its complement is the lim inf of the sets $F_n = E_n^c$, $\bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} F_n$: the set of points in all but finitely many of the $F_n$. Since $\mathsf{P}$ is countably additive, and since the intersection in the definition of lim sup is decreasing and the union in the definition of lim inf

is increasing, we always have $\mathsf{P}[\bigcup_{n=m}^{\infty} E_n] \searrow \mathsf{P}[\bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} E_n]$ and $\mathsf{P}[\bigcap_{n=m}^{\infty} F_n] \nearrow \mathsf{P}[\bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} F_n]$ as $m \to \infty$. Thus,

Theorem 1. $X_n \to X$ $\mathsf{P}$-a.s. if and only if for every $\epsilon > 0$,

$$\lim_{m \to \infty} \mathsf{P}\bigl[\,|X_n - X| > \epsilon \text{ for some } n \ge m\,\bigr] = 0.$$

In particular, $X_n \to X$ $\mathsf{P}$-a.s. if $\sum_n \mathsf{P}[|X_n - X| > \epsilon] < \infty$ for each $\epsilon > 0$ (why?).
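This Borel-Cantelli criterion can be watched numerically. A hypothetical sketch (Python with numpy; the path counts and the two tail-probability families are illustrative choices): take independent indicators $X_n$ with $\mathsf{P}[X_n = 1] = p_n$ and $X = 0$; when $\sum p_n < \infty$ almost every path produces its last 1 early, while for $p_n = 1/n$ (not summable) a fixed fraction of paths keep producing 1's arbitrarily late.

```python
import numpy as np

rng = np.random.default_rng(1)
N, paths = 5_000, 2_000
ns = np.arange(1, N + 1)

# By Borel-Cantelli (and its converse for independent events),
# X_n -> 0 a.s. iff sum(p_n) < infinity.
for label, p in (("p_n = 1/n^2 (summable)    ", 1 / ns**2),
                 ("p_n = 1/n   (not summable)", 1 / ns)):
    X = rng.random((paths, N)) < p              # each row is one sample path
    seen_late = X[:, N // 2:].any(axis=1)       # a 1 occurred in the last half
    print(label, "fraction of paths with a late 1:", float(seen_late.mean()))
```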

2.2 Convergence In Probability

The sequence $X_n$ is said to converge to $X$ "in probability" (i.p.) if, for each $\epsilon > 0$,

$$\mathsf{P}[\omega : |X_n(\omega) - X(\omega)| > \epsilon] \to 0. \tag{i.p.}$$

If we denote by $E_n$ the event $[\omega : |X_n(\omega) - X(\omega)| > \epsilon]$, we see that convergence almost surely requires that $\mathsf{P}[\bigcup_{n \ge m} E_n] \to 0$ as $m \to \infty$, while convergence in probability requires only that $\mathsf{P}[E_n] \to 0$. Thus:

Theorem 2. If $X_n \to X$ a.e. then $X_n \to X$ i.p.

Here is a partial converse:

Theorem 3. If $X_n \to X$ i.p., then there is a subsequence $n_k$ such that $X_{n_k} \to X$ a.e.

Proof. Set $n_0 = 0$ and, for each integer $k \ge 1$, set

$$n_k = \inf\left\{n > n_{k-1} : \mathsf{P}\left[\omega : |X_n(\omega) - X(\omega)| > \frac{1}{k}\right] \le 2^{-k}\right\}.$$

For any $\epsilon > 0$ we have $\frac{1}{k} < \epsilon$ eventually (namely, for $k > k_0 = \lceil \frac{1}{\epsilon} \rceil$) and for each $m > k_0$,

$$\begin{aligned}
\mathsf{P}\Bigl[\bigcup_{k=m}^{\infty} [\omega : |X_{n_k}(\omega) - X(\omega)| > \epsilon]\Bigr]
&\le \mathsf{P}\Bigl[\bigcup_{k=m}^{\infty} [\omega : |X_{n_k}(\omega) - X(\omega)| > \tfrac{1}{k}]\Bigr] \\
&\le \sum_{k=m}^{\infty} \mathsf{P}[\omega : |X_{n_k}(\omega) - X(\omega)| > \tfrac{1}{k}] \\
&\le \sum_{k=m}^{\infty} 2^{-k} = 2^{1-m} \to 0 \quad \text{as } m \to \infty. \qquad \square
\end{aligned}$$
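The proof's recipe for extracting an a.e.-convergent subsequence can be run directly. A sketch under illustrative assumptions (Python with numpy; $X_n - X \sim \mathsf{N}(0, 1/n)$ stands in for any sequence known to converge i.p., and the tail probabilities are estimated by simulation rather than computed exactly):

```python
import numpy as np

rng = np.random.default_rng(2)

def est_prob_exceed(n, eps, draws=20_000):
    """Monte Carlo estimate of P[|X_n - X| > eps] for the illustrative
    model X_n - X ~ N(0, 1/n)."""
    return float(np.mean(np.abs(rng.normal(scale=1 / np.sqrt(n), size=draws)) > eps))

# Mimic the proof: n_k = first n > n_{k-1} with P[|X_n - X| > 1/k] <= 2^{-k}.
n_prev, subseq = 0, []
for k in range(1, 8):
    n = n_prev + 1
    while est_prob_exceed(n, 1 / k) > 2.0 ** (-k):
        n += 1
    subseq.append(n)
    n_prev = n
print(subseq)   # indices along which a.e. convergence is guaranteed
```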

2.2.1 A Counter-Example

If $X_n \to X$ a.e. implies $X_n \to X$ i.p., and if the converse holds at least along subsequences, are the two notions really identical? Or is it possible for RV's $X_n$ to converge to $X$ i.p., but not a.e.? The answer is that the two notions are different, and that a.e. convergence is strictly stronger than convergence i.p. Here's an example: Let $(\Omega, \mathcal{F}, \mathsf{P})$ be the unit interval with the Borel sets and Lebesgue measure. Define a sequence of random variables $X_n : \Omega \to \mathbb{R}$ by

$$X_n(\omega) = \begin{cases} 1 & \text{if } \frac{i}{2^j} < \omega \le \frac{i+1}{2^j} \\ 0 & \text{otherwise} \end{cases} \qquad \text{where } n = i + 2^j,\ 0 \le i < 2^j.$$

Each $X_n$ is one on an interval of length $2^{-j}$, where $j = \lfloor \log_2(n) \rfloor$; since $\frac{1}{n} \le \frac{1}{2^j} < \frac{2}{n}$,

$$\mathsf{P}[|X_n| > \epsilon] = 2^{-j} < \frac{2}{n} \to 0$$

for each $0 < \epsilon < 1$, and $X_n \to 0$ i.p. On the other hand, for every $j > 0$ we have

$$\Omega = \bigcup_{i=0}^{2^j - 1} \left(\frac{i}{2^j}, \frac{i+1}{2^j}\right] = \bigcup_{n=2^j}^{2^{j+1}-1} [\omega : X_n(\omega) = 1],$$

so $[\omega : X_n(\omega) \to 0]$ is empty, not a set of probability one! Obviously $X_n$ does not converge a.e. This example is a building-block for several examples to

come, so getting to know it well is worthwhile. Try to verify that $X_n \to 0$ in probability and in $L_p$ but not almost surely. What is $\|X_n\|_p$? Why doesn't $X_n \to 0$ a.s.? What would happen if we multiplied $X_n$ by $n$? By $n^2$? What about the subsequence $Y_n = X_{2^n}$? Does $X_n$ converge in $L_\infty$?
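A quick simulation of this example shows both halves at once (a minimal sketch in Python with numpy; the sample sizes and the particular $\omega$ are arbitrary choices): the marginal probabilities $\mathsf{P}[X_n = 1] = 2^{-\lfloor \log_2 n \rfloor}$ tend to zero, yet each fixed $\omega$ is hit exactly once in every dyadic block of indices.

```python
import numpy as np

rng = np.random.default_rng(3)

def X(n, omega):
    """The moving-blocks sequence: X_n = 1 on (i/2^j, (i+1)/2^j], n = i + 2^j."""
    j = int(np.floor(np.log2(n)))
    i = n - 2 ** j
    return ((i / 2 ** j < omega) & (omega <= (i + 1) / 2 ** j)).astype(float)

omega = rng.random(100_000)              # sample points of the unit interval
for n in (10, 100, 1000, 10_000):
    print(n, X(n, omega).mean())         # ~ P[X_n = 1] = 2^{-floor(log2 n)} -> 0

# Yet within each block 2^j <= n < 2^{j+1} exactly one X_n(omega) equals 1,
# so X_n(omega) converges for no omega.
w = 0.3141
hits = [n for n in range(1, 1024) if X(n, np.array([w]))[0] == 1]
print(len(hits), hits)                   # one hit per dyadic block
```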

3 Cauchy Convergence

Sometimes we wish to consider a sequence $X_n$ that converges to some limit $X$, perhaps without knowing $X$ in advance; the concept of Cauchy convergence is ideal for this. For any of the distance measures $d_p$ above, with $0 \le p \le \infty$, say "$X_n$ is a Cauchy sequence in $L_p$" if

$$(\forall \epsilon > 0)\ (\exists N < \infty)\ (\forall n \ge m \ge N)\quad d_p(X_m, X_n) < \epsilon.$$

The spaces $L_p$ for $0 \le p \le \infty$ are all complete in the sense that if $X_n$ is Cauchy for $d_p$ then there exists $X \in L_p$ for which $d_p(X_n, X) \to 0$. To see this, take an increasing subsequence $N_k$ along which $d_p(X_m, X_n) < 2^{-k}$ for $n \ge m \ge N_k$, set $X_0 = 0$ and $N_0 = 0$, and set $Y_k \equiv X_{N_k} - X_{N_{k-1}}$. Check to confirm that $\sum_{k=1}^{\infty} Y_k$ converges a.s., to some limit $X \in L_p$ with $d_p(X_n, X) \to 0$.

4 Uniform Integrability

Let $Y \ge 0$ be integrable on some probability space $(\Omega, \mathcal{F}, \mathsf{P})$,

$$\mathsf{E}[Y] = \int_\Omega Y \, d\mathsf{P} < \infty;$$

it follows (from the DCT or MCT, for example) that

$$\lim_{t \to \infty} \mathsf{E}\bigl[Y \mathbf{1}_{[Y > t]}\bigr] = \lim_{t \to \infty} \int_{[\omega : Y(\omega) > t]} Y \, d\mathsf{P} = 0$$

and, consequently, for any sequence of random variables $X_n$ "dominated" by $Y$ in the sense that $|X_n| \le Y$ a.s.,

$$\lim_{t \to \infty} \mathsf{E}\bigl[|X_n| \mathbf{1}_{[|X_n| > t]}\bigr] = \lim_{t \to \infty} \int_{[\omega : |X_n(\omega)| > t]} |X_n| \, d\mathsf{P} \le \lim_{t \to \infty} \int_{[\omega : Y(\omega) > t]} Y \, d\mathsf{P} = 0,$$

uniformly in $n$.

Call the sequence $X_n$ uniformly integrable (or simply UI) if $\mathsf{E}[|X_n| \mathbf{1}_{[|X_n| > t]}] \to 0$ as $t \to \infty$, uniformly in $n$, even if it is not dominated by a single integrable random variable $Y$. The big result is:

Theorem 4. If $X_n \to X$ i.p. and if $X_n$ is UI then $X_n \to X$ in $L_1$.

Proof. Without loss of generality take $X \equiv 0$. Fix any $\epsilon > 0$; find (by UI) $t_\epsilon > 0$ such that $\mathsf{E}[|X_n| \mathbf{1}_{[|X_n| > t_\epsilon]}] \le \epsilon$ for all $n$. Now find (by $X_n \to X$ i.p.) $N_\epsilon \in \mathbb{N}$ such that, for $n \ge N_\epsilon$, $\mathsf{P}[|X_n| > \epsilon] < \epsilon / t_\epsilon$; then:

$$\begin{aligned}
\mathsf{E}[|X_n|] &= \int_{[|X_n| \le \epsilon]} |X_n| \, d\mathsf{P} + \int_{[\epsilon < |X_n| \le t_\epsilon]} |X_n| \, d\mathsf{P} + \int_{[t_\epsilon < |X_n|]} |X_n| \, d\mathsf{P} \\
&\le \int_{[|X_n| \le \epsilon]} \epsilon \, d\mathsf{P} + \int_{[\epsilon < |X_n| \le t_\epsilon]} t_\epsilon \, d\mathsf{P} + \int_{[t_\epsilon < |X_n|]} |X_n| \, d\mathsf{P} \\
&\le \epsilon + t_\epsilon \, \mathsf{P}[|X_n| > \epsilon] + \epsilon \\
&\le 3\epsilon. \qquad \square
\end{aligned}$$

Similarly, for any $p > 0$, $X_n \to X$ (i.p.) and $|X_n|^p$ UI (for example, $|X_n| \le Y \in L_p$) gives $X_n \to X$ ($L_p$). In the special case of $|X_n| \le Y \in L_p$ this is just Lebesgue's Dominated Convergence Theorem (DCT). We have seen that $\{X_n\}$ is UI whenever $|X_n| \le Y \in L_1$, but UI is more general than that. Here are two more criteria:

Theorem 5. If $\{X_n\}$ is uniformly bounded in $L_p$ for some $p > 1$ then $\{X_n\}$ is UI.

Proof. Let $c \in \mathbb{R}_+$ be an upper bound for $\mathsf{E}|X_n|^p$. First recall that, by Fubini's Theorem, any random variable $X$ satisfies, for any $q > 0$,

$$\mathsf{E}|X|^q = \int_0^\infty q x^{q-1}\, \mathsf{P}[|X| > x] \, dx.$$

We apply this for $q = 1$ and $q = p$ to the random variables $X_n$ and, for $t > 0$, to $X_n \mathbf{1}_{[|X_n| > t]}$. Fix any $t > 0$; then

$$\begin{aligned}
\mathsf{E}\bigl[|X_n| \mathbf{1}_{[|X_n| > t]}\bigr] &= \int_0^\infty \mathsf{P}\bigl[|X_n| \mathbf{1}_{[|X_n| > t]} > x\bigr] \, dx \\
&= \int_0^t \mathsf{P}[|X_n| > t] \, dx + \int_t^\infty \mathsf{P}[|X_n| > x] \, dx \\
&\le t\, \mathsf{P}[|X_n|^p > t^p] + \int_t^\infty \frac{p x^{p-1}}{p t^{p-1}}\, \mathsf{P}[|X_n| > x] \, dx \\
&\le t\, \frac{\mathsf{E}|X_n|^p}{t^p} + \frac{1}{p t^{p-1}} \int_0^\infty p x^{p-1}\, \mathsf{P}[|X_n| > x] \, dx \\
&= t^{1-p}\,(1 + p^{-1})\, \mathsf{E}|X_n|^p \\
&\le c\, t^{1-p}\,(1 + p^{-1}) \to 0 \quad \text{as } t \to \infty, \text{ uniformly in } n. \qquad \square
\end{aligned}$$

Theorem 6. If $\{X_n\}$ is UI, then

$$(\forall \epsilon > 0)\ (\exists \delta > 0)\ (\forall A \in \mathcal{F} \text{ with } \mathsf{P}(A) < \delta)\quad \mathsf{E}[|X_n| \mathbf{1}_A] < \epsilon.$$

Conversely, if $\{X_n\}$ is uniformly bounded in $L_1$ and if $(\forall \epsilon > 0)(\exists \delta > 0)$ such that $\mathsf{E}[|X_n| \mathbf{1}_A] < \epsilon$ whenever $\mathsf{P}[A] < \delta$, then $\{X_n\}$ is UI.

Proof. Straightforward. The condition "$\{X_n\}$ is uniformly bounded in $L_1$" is unnecessary if $(\Omega, \mathcal{F}, \mathsf{P})$ is non-atomic.
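A numerical diagnostic makes these criteria concrete: estimate $\sup_n \mathsf{E}[|X_n| \mathbf{1}_{[|X_n| > t]}]$ for growing $t$ and see whether it decays. The sketch below (Python with numpy; both test families and all names are my own illustrative choices) contrasts an $L_2$-bounded family, UI by Theorem 5, with the non-UI family $X_n = n$ with probability $1/n$ (else $0$), for which $\mathsf{E}[X_n \mathbf{1}_{[X_n > t]}] = 1$ whenever $n > t$.

```python
import numpy as np

rng = np.random.default_rng(4)

def tail_mean(x, t):
    """Monte Carlo estimate of E[|X| 1_{|X| > t}] from a sample x."""
    ax = np.abs(x)
    return float(np.mean(np.where(ax > t, ax, 0.0)))

def ui_profile(sample_fn, ns, ts, draws=100_000):
    """For each t, the sup over n of the estimated E[|X_n| 1_{|X_n| > t}];
    this tends to 0 as t grows exactly when the family is UI."""
    samples = [sample_fn(n, draws) for n in ns]
    return [round(max(tail_mean(x, t) for x in samples), 4) for t in ts]

ns, ts = range(1, 51), [1, 2, 5, 10, 20]

# UI family (Theorem 5): X_n ~ N(0,1) for every n is bounded in L_2.
print(ui_profile(lambda n, d: rng.normal(size=d), ns, ts))

# Non-UI family: X_n = n with probability 1/n, else 0; the sup never decays.
print(ui_profile(lambda n, d: n * (rng.random(d) < 1 / n), ns, ts))
```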

5 Summary: Uniform Integrability and Convergence Concepts

I. Uniform Integrability (UI)
   A. $|X_n| \le Y \in L_1$ implies $\int_{[|X_n| > t]} |X_n| \, d\mathsf{P} \le \int_{[Y > t]} Y \, d\mathsf{P} \to 0$ as $t \to \infty$, uniformly in $n$; this uniform convergence is the definition of UI.
   B. If $(\Omega, \mathcal{F}, \mathsf{P})$ is nonatomic, $\{X_n\}$ is UI iff $\forall \epsilon\ \exists \delta$ such that $\int_\Lambda |X_n| \, d\mathsf{P} < \epsilon$ whenever $\mathsf{P}[\Lambda] \le \delta$ (take $\delta = \epsilon/2t$). [If $(\Omega, \mathcal{F}, \mathsf{P})$ has atoms one must also require $\mathsf{E}|X_n| \le B$.]
   C. $\mathsf{E}|X_n|^p \le c < \infty$ implies $\{|X_n|^r\}$ is UI for each $r < p$.

II. Convergence Concepts
   A. $X_n \to X$ i.p. iff every subsequence $n_k$ has a further subsequence $n_{k_i}$ with $X_{n_{k_i}} \to X$ a.e. (by contradiction)
   B. $X_n \to X$ a.s. and $\varphi(x)$ continuous implies $\varphi(X_n) \to \varphi(X)$ a.s.
   C. $X_n \to X$ i.p. and $\varphi(x)$ continuous implies $\varphi(X_n) \to \varphi(X)$ i.p. (use A)
   D. Definition: $X_n \Rightarrow X$ if $\mathsf{E}\,\varphi(X_n) \to \mathsf{E}\,\varphi(X)$ for all $\varphi \in C_b(\mathbb{R})$
      1. Prop: $X_n \to X$ i.p. implies $X_n \Rightarrow X$ (use II.C)
      2. Prop: $X_n \Rightarrow X$ implies $F_n(r) \to F(r)$ wherever $F(r) = F(r-)$.
         a. Remark: Even if $X_n \Rightarrow X$, $F_n(r)$ may not converge where $F(r)$ jumps;
         b. Remark: Even if $X_n \Rightarrow X$, $f_n(r) = F_n'(r)$ may not converge to $f(r) = F'(r)$; in fact, either may fail to exist.

III. Implications among these notions: a.e., i.p., $L_p$, $L_r$, $L_\infty$, i.d. ($0 < p < r < \infty$)
   A. a.e. $\Rightarrow$ i.p. (Theorem 2)
      1. i.p. $\not\Rightarrow$ a.e. (counterexample: the moving-blocks example of §2.2.1)
   B. $L_p$ $\Rightarrow$ i.p. (Chebychev's inequality)
      1. a.e. $\not\Rightarrow$ $L_p$ (counterexample: $X_n = n^{1/p} \mathbf{1}_{(0,1/n]}$)
      2. i.p. $\not\Rightarrow$ $L_p$ (counterexample: $X_n = n^{1/p} \mathbf{1}_{(0,1/n]}$)
   C. $L_r$ $\Rightarrow$ $L_p$ (by Jensen's inequality)
      1. $L_p$ $\not\Rightarrow$ $L_r$ (counterexample: $X_n = n^{1/r} \mathbf{1}_{(0,1/n]}$)
   D. $L_\infty$ $\Rightarrow$ $L_p$ (simple estimate)
      1. $L_p$ $\not\Rightarrow$ $L_\infty$ (counterexample: $X_n = n^{1/2p} \mathbf{1}_{(0,1/n]}$)
   E. $L_\infty$ $\Rightarrow$ a.e. (uniform cgce implies pointwise cgce)
   F. i.p. $\Rightarrow$ i.d. (II.D.1 above)
      1. i.d. $\not\Rightarrow$ i.p. (counterexample: $X_n$, $X$ on different spaces)
      2. i.d. $\Rightarrow$ $\exists\, (\tilde\Omega, \tilde{\mathcal{F}}, \tilde{\mathsf{P}})$ and $\tilde X_n \overset{d}{=} X_n$, $\tilde X \overset{d}{=} X$ with $\tilde X_n \to \tilde X$ a.e. (Skorokhod representation)

6 Infinite Coin-Toss and the Laws of Large Numbers

The traditional interpretation of the probability of an event $E$ is its asymptotic frequency: the limit as $n \to \infty$ of the fraction of $n$ repeated, similar, and independent trials in which $E$ occurs. Similarly the "expectation" of a random variable $X$ is taken to be its asymptotic average, the limit as $n \to \infty$ of the average of $n$ repeated, similar, and independent replications of $X$. As statisticians trying to make inference about the underlying probability distribution $f(x \mid \theta)$ governing observed random variables $X_i$, this suggests that we should be interested in the probability distribution for large $n$ of quantities like the average of the RV's, $\frac{1}{n} \sum_{i=1}^n X_i$.

Three of the most celebrated theorems of probability theory concern this sum. For independent random variables $X_i$, all with the same probability distribution satisfying $\mathsf{E}|X_i|^3 < \infty$, set $\mu = \mathsf{E} X_i$, $\sigma^2 = \mathsf{E}|X_i - \mu|^2$, and $S_n = \sum_{i=1}^n X_i$. The three main results are:

Laws of Large Numbers:
$$\frac{S_n - n\mu}{\sigma n} \longrightarrow 0 \quad \text{(i.p. and a.s.)}$$

Central Limit Theorem:
$$\frac{S_n - n\mu}{\sigma \sqrt{n}} \Longrightarrow \mathsf{N}(0, 1) \quad \text{(i.d.)}$$

Law of the Iterated Logarithm:
$$\limsup_{n \to \infty} \frac{S_n - n\mu}{\pm\sigma\sqrt{2n \log\log n}} = 1 \quad \text{(a.s.)}$$

Together these three give a clear picture of how quickly and in what sense $\frac{1}{n} S_n$ tends to $\mu$. We begin with the Law of Large Numbers (LLN), in its "weak" form (asserting convergence i.p.) and in its "strong" form (convergence a.s.). There are several versions of both theorems. The simplest requires the $X_i$ to be IID and $L_2$; stronger results allow us to weaken (but not eliminate) the independence requirement, permit non-identical distributions, and consider what happens if the RV's are only $L_1$ (or worse!) instead of $L_2$. The text covers these things well; to complement it I am going to: (1) prove the simplest version, and with it the Borel-Cantelli theorems; and (2) show what happens with Cauchy random variables, which don't satisfy the requirements (the LLN fails).

I. Weak version, non-iid, $L_2$: $\mu_i = \mathsf{E} X_i$, $\sigma_{ij} = \mathsf{E}[X_i - \mu_i][X_j - \mu_j]$
   A. $Y_n = (S_n - \Sigma \mu_i)/n$ satisfies $\mathsf{E} Y_n = 0$, $\mathsf{E} Y_n^2 = \frac{1}{n^2} \Sigma_i \sigma_{ii} + \frac{2}{n^2} \Sigma_{i<j} \sigma_{ij}$;
   B. Chebychev: if the $X_i$ are uncorrelated ($\sigma_{ij} = 0$, $i \ne j$) with $\sigma_{ii} \le M$ and (WLOG) $\mu_i = 0$, then $\mathsf{P}[|S_n| > n\epsilon] \le \frac{nM}{n^2\epsilon^2} = \frac{M}{n\epsilon^2} \to 0$.

II. Strong version, uncorrelated, $L_2$ (again $\mu_i = 0$, $\sigma_{ii} \le M$)
   A. Along the subsequence $n^2$:
      1. $\mathsf{P}[|S_{n^2}| > n^2\epsilon] < \frac{M}{n^2\epsilon^2}$, so $\Sigma_n \mathsf{P}[|S_{n^2}| > n^2\epsilon] < \frac{M\pi^2}{6\epsilon^2}$
      2. Borel-Cantelli: $\mathsf{P}[|S_{n^2}| > n^2\epsilon \text{ i.o.}] = 0$, so $\frac{1}{n^2} S_{n^2} \to 0$ a.s.
      3. $D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|$ satisfies $\mathsf{E} D_n^2 \le 2n\, \mathsf{E}|S_{(n+1)^2} - S_{n^2}|^2 \le 4n^2 M$
      4. Chebychev: $\mathsf{P}[D_n > n^2\epsilon] < \frac{4n^2 M}{n^4\epsilon^2}$, so $\frac{D_n}{n^2} \to 0$ a.s.
   B. $|S_k|/k \le \frac{|S_{n^2}| + D_n}{n^2} \to 0$ a.s. for $n^2 \le k < (n+1)^2$, QED
      1. Bernoulli RV's, normal-number theorem, Monte Carlo (see the sketch below).
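Item II.B.1's Monte Carlo is easy to carry out. A minimal sketch (Python with numpy; the path count and sample sizes are arbitrary choices) tracks the running averages $S_n/n$ of Bernoulli(1/2) draws along several independent sample paths; each path individually drifts to $1/2$, as the strong LLN asserts.

```python
import numpy as np

rng = np.random.default_rng(5)

n, paths = 100_000, 5
X = (rng.random((paths, n)) < 0.5).astype(float)       # Bernoulli(1/2) draws
running_mean = np.cumsum(X, axis=1) / np.arange(1, n + 1)
for k in (100, 1_000, 10_000, 100_000):
    print(k, np.round(running_mean[:, k - 1], 4))      # every path nears 0.5
```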

III. Weak version, pairwise-iid, $L_1$
   A. Equivalent sequences: $\Sigma_n \mathsf{P}[X_n \ne Y_n] < \infty$
      1. $\Sigma_n [X_n - Y_n]$ converges a.s.
      2. $\Sigma_{i=1}^n X_i$ and $\frac{1}{a_n} \Sigma_{i=1}^n X_i$ converge iff $\Sigma_{i=1}^n Y_i$ and $\frac{1}{a_n} \Sigma_{i=1}^n Y_i$ do
      3. Truncation: $Y_n = X_n \mathbf{1}_{[|X_n| \le n]}$

IV. Counterexamples: Cauchy
   A. $X_i \sim \frac{dx}{\pi[1 + x^2]}$ $\Rightarrow$ $\mathsf{P}[|S_n/n| \le \epsilon] \equiv \frac{2}{\pi} \tan^{-1}(\epsilon) \not\to 1$, WLLN fails.
   B. $\mathsf{P}[X_i = \pm n] = \frac{c}{n^2}$, $n \ge 1$; $X_i \notin L_1$, and $S_n/n \not\to 0$ i.p. or a.s.
   C. $\mathsf{P}[X_i = \pm n] = \frac{c}{n^2 \log n}$, $n \ge 3$; $X_i \notin L_1$, but $S_n/n \to 0$ i.p. and not a.s.
   D. Medians: for ANY RV's with $X_n \to X_\infty$ i.p., the medians satisfy $m_n \to m_\infty$ if $m_\infty$ is unique.

Let $X_i$ be iid standard Cauchy RV's, with

$$\mathsf{P}[X_1 \le t] = \int_{-\infty}^t \frac{dx}{\pi[1 + x^2]} = \frac{1}{2} + \frac{1}{\pi} \arctan(t)$$

and characteristic function

$$\mathsf{E}\, e^{i\lambda X_1} = \int_{-\infty}^{\infty} e^{i\lambda x}\, \frac{dx}{\pi[1 + x^2]} = e^{-|\lambda|},$$

so $S_n/n$ has characteristic function

$$\mathsf{E}\, e^{i\lambda S_n/n} = \mathsf{E}\, e^{i\frac{\lambda}{n}[X_1 + \cdots + X_n]} = \left(\mathsf{E}\, e^{i\frac{\lambda}{n} X_1}\right)^n = \bigl(e^{-|\lambda|/n}\bigr)^n = e^{-|\lambda|}$$

and $S_n/n$ also has the standard Cauchy distribution, with $\mathsf{P}[S_n/n \le t] = \frac{1}{2} + \frac{1}{\pi} \arctan(t)$; in particular, $S_n/n$ does not converge almost surely, or even in probability.
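This failure is easy to see in simulation (a minimal sketch in Python with numpy; the sample sizes are arbitrary choices): the running averages of standard Cauchy draws keep making large jumps at every scale, and $\mathsf{P}[|S_n/n| \le 1]$ stays near $\frac{2}{\pi}\arctan(1) = 0.5$ instead of tending to 1.

```python
import numpy as np

rng = np.random.default_rng(6)

n = 100_000
X = rng.standard_cauchy(size=n)
Sn_over_n = np.cumsum(X) / np.arange(1, n + 1)
for k in (10, 100, 1_000, 10_000, 100_000):
    print(k, round(float(Sn_over_n[k - 1]), 3))   # wild jumps persist at every scale

# P[|S_n/n| <= 1] stays near 2/pi * arctan(1) = 0.5 for every n (here n = 50).
means = rng.standard_cauchy(size=(20_000, 50)).mean(axis=1)
print(float(np.mean(np.abs(means) <= 1)))
```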
