Almost Sure Central Limit Theory
Fredrik Jonsson U.U.D.M. Project Report 2007:9
Degree project in mathematical statistics, 20 credits. Supervisor and examiner: Allan Gut. March 2007
Department of Mathematics Uppsala University
Abstract
The Almost sure central limit theorem states in its simplest form that a sequence of independent, identically distributed random variables $\{X_k\}_{k\ge1}$ with moments $EX_1 = 0$ and $EX_1^2 = 1$ obeys
$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,I\Big\{\frac{S_k}{\sqrt{k}}\le x\Big\}=\Phi(x)\quad\text{a.s.}$$
for each value $x$. $I\{\cdot\}$ here denotes the indicator function of events, $\Phi$ the distribution function of the standard normal distribution and $S_n$ the $n$:th partial sum of the above mentioned sequence of random variables. The purpose of this thesis is to present and summarize various kinds of generalizations of this result which may be found in the research literature.
Acknowledgement
I would like to thank Professor Allan Gut for introducing me to the subject, for careful readings of my drafts and for interesting conversations.
Contents
1 Introduction
  1.1 Notation
2 Preliminaries
  2.1 Probability measures and weak convergence
  2.2 Central limit theory
  2.3 Summation methods: linear transformations
3 Almost Sure Converging Means of Random Variables
  3.1 Bounds for variances of weighted partial sums
  3.2 Bounds for covariances among individual variables
  3.3 Refinements with respect to weight sequences
4 Almost Sure Central Limit Theory
  4.1 Independent random variables
  4.2 Weakly dependent random variables
  4.3 Subsequences
  4.4 An almost sure version of Donsker's theorem
5 Generalizations and Related Results
  5.1 A universal result and some consequences
  5.2 Return times
  5.3 A local limit theorem
  5.4 Generalized moments in the almost sure central limit theorem
1 Introduction
The Almost sure central limit theorem states in its simplest form that a sequence of independent, identically distributed random variables $\{X_k\}_{k\ge1}$ with moments $EX_1 = 0$ and $EX_1^2 = 1$ obeys
$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,I\Big\{\frac{S_k}{\sqrt{k}}\le x\Big\}=\Phi(x)\quad\text{a.s.}\qquad(1.1)$$
for each value $x$. $I\{\cdot\}$ here denotes the indicator function of events, $\Phi$ the distribution function of the standard normal distribution and $S_n$ the $n$:th partial sum of the sequence of random variables $\{X_k\}_{k\ge1}$. The notation "a.s." abbreviates "almost surely", that is, with probability one.

The first version of (1.1) was proved in the late 1980s, but a preliminary result was considered in the 1930s by Paul Lévy. It was at this early stage shown (consult [13] for an elementary proof in the case of the simple, symmetric random walk) that the random quantity
$$\frac{1}{n}\sum_{k=1}^{n}I\{S_k\le0\}\qquad(1.2)$$
does not cease to vary randomly as $n$ tends to infinity. On the contrary, the distributions of these random variables converge to the arcsine distribution. The quantity in (1.2) can be interpreted as the amount of time the random walk $\{S_n\}$ has spent below zero up to time $n$. In the result (1.1), apart from replacing $0$ by the more general $x$, there are weights $\{1/k\}_{k\ge1}$ and a different normalization, corresponding to the fact that $\sum_{1\le k\le n}1/k\sim\log n$ as $n\to\infty$. In this way the randomness vanishes asymptotically, but on the other hand, the random walk occupancy time interpretation seems to be lost.

There are many other kinds of sequences of random variables $\{X_k\}_{k\ge1}$ than the one mentioned at the beginning for which
$$\frac{S_n}{\sqrt{n}}\stackrel{d}{\longrightarrow}N\quad\text{as }n\to\infty.$$
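The limit relation (1.1) can be illustrated numerically. The following sketch is our own addition (the function name and all parameter choices are arbitrary): with standard normal increments, $S_k/\sqrt{k}$ is exactly $N(0,1)$-distributed for every $k$, and the log-weighted occupation measure at $x=0$ should settle near $\Phi(0)=1/2$. Since the averaging in (1.1) acts on the very slow $\log n$ scale, several independent paths are averaged.

```python
import math
import random

def asclt_log_average(x=0.0, n=5000, paths=40, seed=1):
    """Average over independent paths of
    (1/log n) * sum_{k=1}^{n} (1/k) * I{S_k/sqrt(k) <= x},
    where S_k is a random walk with N(0,1) increments."""
    total = 0.0
    for p in range(paths):
        rng = random.Random(seed + p)
        s, acc = 0.0, 0.0
        for k in range(1, n + 1):
            s += rng.gauss(0.0, 1.0)
            if s / math.sqrt(k) <= x:
                acc += 1.0 / k
        total += acc / math.log(n)
    return total / paths
```

For $x=0$ the averaged value lands near $\Phi(0)=0.5$, while a single path still fluctuates noticeably; this slow stabilization is characteristic of the logarithmic averaging in (1.1).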
For an even larger class of interesting sequences $\{X_k\}_{k\ge1}$ one has
$$\frac{S_n-b_n}{a_n}\stackrel{d}{\longrightarrow}G\quad\text{as }n\to\infty,\qquad(1.3)$$
for some sequences $\{a_n\}_{n\ge1}$ and $\{b_n\}_{n\ge1}$ and a distribution $G$ (different from the degenerate distribution at zero). One may call these results Central limit theorems.

The purpose of this thesis is to present and summarize known generalizations of (1.1) for some of the more well-known examples satisfying (1.3), especially those where the $X_k$-variables are independent. This is the content of Sections 4.1 and 4.2. Improvements, or other ways of generalizing (1.1), are presented in Sections 4.4 and 5.4. The remaining parts of Chapter 5 and Section 4.3 present other results which are interesting in this context. Some useful background material is presented in Chapter 2, while Chapter 3 gives relevant results to be used in Chapters 4 and 5.

The results of Chapter 3 are to a large extent based on arguments in [4], which we separate and put in a more general form. However, we make a more elementary connection to the theory of summation methods. No reference to the general theory of Riesz typical means, as can be found in [4], is made here. A slight generalization of the results has also been obtained under the influence of [11]. A different way of arriving at (1.1), and at related results which will be considered here, is based on Characteristic functions and may be found in [20].

It is not the aim of this thesis to give a complete overview of results inspired by, or connected to, (1.1). We refer to the survey paper [3] for further results. We rather hope to introduce the subject and perhaps to contribute towards unification and by filling in some gaps where research articles most of the time leave out the details. Examples of the latter kind are Theorems 2.4 and 5.19. Some extensions of previously published results may also be new.
1.1 Notation
We follow Landau's "small o", "big O" notation for real-valued functions and sequences. That is: $f = o(g)$ means $f(x)/g(x)\to0$ as $x\to\infty$, and $f = O(g)$ means $f(x)/g(x)$ remains bounded as $x\to\infty$. We presume the reader's familiarity with such statements as: For all $\varepsilon>0$, $\log x = o(x^\varepsilon)$. By $f\sim g$ we mean "asymptotically equal", that is $f(x)/g(x)\to1$ as $x\to\infty$. An example of a true statement of this kind is $\log(1+1/x)\sim1/x$. We follow the tradition of denoting iterated logarithms by $\log_k$, $k\ge1$. That is, $\log_1(x):=\log x$, and recursively $\log_{k+1}(x):=\log\log_k(x)$.
We also presume the reader's familiarity with the (hopefully universal) fundamental concepts of probability theory. We refer e.g. to the first chapters of [17]. As for notation we reserve $N(\mu,\sigma^2)$ to denote the normal distribution with expectation $\mu$ and variance $\sigma^2$. We also denote the standard normal distribution, $N(0,1)$, by $N$, its distribution function by $\Phi$ and its density function by $\phi$. As is common, we abbreviate "independent, identically distributed" by i.i.d.
In some places we refer to facts and concepts from the theory of integration. In measure spaces we denote the indicator function (defined on the same space, assuming values 0 and 1) of a subset $A$ by $I\{A\}$.
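The asymptotic relations above are easy to sanity-check numerically. The following small sketch is our own (with an arbitrary choice of $\varepsilon=1/2$ and of the evaluation point); it computes the ratios appearing in the definitions of $o(\cdot)$ and $\sim$:

```python
import math

x = 1.0e6
# log x = o(x^eps): the ratio log(x) / x^eps is already small at x = 1e6
# for eps = 0.5 (for smaller eps one must take x much larger)
ratio_small_o = math.log(x) / x ** 0.5
# log(1 + 1/x) ~ 1/x: the ratio of the two sides is close to 1
ratio_sim = math.log(1.0 + 1.0 / x) / (1.0 / x)
```

Both ratios behave as the definitions predict: the first is small and tends to $0$, the second is close to $1$.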
2 Preliminaries
2.1 Probability measures and weak convergence
This section concerns probability measures on some space $S$, equipped with a metric $\varrho(\cdot,\cdot)$ and the usual $\sigma$-field $\mathcal{S}$ generated by the open balls in $S$. Given such a measure $P$ on $(S,\mathcal{S})$, a set being $P$-continuous means that its boundary has $P$-measure 0. Given a $P$-integrable function $f$ on $S$ to $\mathbb{R}$, we also denote $\int f\,dP$ by $Pf$. To such a setting we may extend the familiar notion (in $S=\mathbb{R}$) of Convergence in distribution to what is called Weak convergence of probability measures. It concerns a collection $(\{P_n\}_{n\ge1},P)$ of probability measures, is denoted $P_n\Rightarrow P$ and is defined by
$$\lim_{n\to\infty}P_nf=Pf,\quad\text{for all bounded and continuous }f.$$
The following theorem, usually called "the Portmanteau theorem", gives five equivalent definitions.
Theorem 2.1. Let $\{P_n\}_{n\ge1}$ and $P$ be probability measures on $(S,\mathcal{S})$. These five conditions are equivalent:
(i) $P_n\Rightarrow P$;
(ii) $\lim_{n\to\infty}P_nf=Pf$, for all bounded, uniformly continuous $f$;
(iii) $\limsup_n P_nF\le PF$, for all closed sets $F$;
(iv) $\liminf_n P_nG\ge PG$, for all open sets $G$;
(v) $P_nA\to PA$, for all $P$-continuity sets $A$.
Proof. [6, Theorem 2.1, page 16].
In the case of a Separable metric space, i.e. one with a countable dense subset, condition (ii) may, for sufficiency, be weakened as follows:
Proposition 2.2. Let $\{P_n\}_{n\ge1}$, $P$ and $(S,\mathcal{S})$ be as before. Assume $S$ separable. Then there exists a sequence $\{f_m\}_{m\ge1}$ of bounded, Lipschitz-continuous functions, $f_m:S\to\mathbb{R}$, such that
$$\lim_{n\to\infty}P_nf_m=Pf_m,\quad\text{for all }m\ge1,$$
implies $P_n\Rightarrow P$.
Proof. Let $\{x_k\}_{k\ge1}$ denote a dense sequence in $S$. There are countably many balls $\{B(x_m,q):m\in\mathbb{N},\,q\in\mathbb{Q}^+\}$, which we denote $\{A_k\}_{k\ge1}$. The sets $\{A_k\}$ generate the open sets in $S$ in the sense that every open set $A$ may be written
$$A=\bigcup_{k\in\mathcal{A}}A_k,\qquad(2.1)$$
for some $\mathcal{A}\subseteq\mathbb{N}$. Indeed, if $x\in A$ there exists an $\varepsilon>0$ so that $B(x,\varepsilon)\subseteq A$ and an $x_k$ so that $x_k\in B(x,\varepsilon/2)$. By the triangle inequality $B(x_k,\varepsilon/2)\subseteq A$, so that $\mathcal{A}:=\{k\in\mathbb{N}:A_k\subseteq A\}$ will do.
To verify condition (iv) of Theorem 2.1 it will be enough to consider only finite unions of sets $A_k$, since assuming so and writing $A=\bigcup_{j\in\mathbb{N}}A_{k_j}$ by (2.1) implies that
$$\liminf_n P_nA=\liminf_n\lim_{m\to\infty}P_n\Big(\bigcup_{j=1}^m A_{k_j}\Big)\ge\lim_{m\to\infty}\liminf_n P_n\Big(\bigcup_{j=1}^m A_{k_j}\Big)\ge\lim_{m\to\infty}P\Big(\bigcup_{j=1}^m A_{k_j}\Big)=PA.\qquad(2.2)$$
The first inequality (changing the order of limits) is valid since
$$\liminf_n P_n\Big(\bigcup_{j=1}^m A_{k_j}\Big)\quad\text{is non-decreasing in }m,$$
so that for any $\varepsilon>0$ and some $M(\varepsilon)$:
$$\lim_{m\to\infty}\liminf_n P_n\Big(\bigcup_{j=1}^m A_{k_j}\Big)\le\liminf_n P_n\Big(\bigcup_{j=1}^M A_{k_j}\Big)+\varepsilon.$$
The first term on the right is majorized by
$$\liminf_n\lim_{m\to\infty}P_n\Big(\bigcup_{j=1}^m A_{k_j}\Big),$$
since
$$P_n\Big(\bigcup_{j=1}^m A_{k_j}\Big)\quad\text{is non-decreasing in }m,\text{ for all }n.$$
This proves (2.2).
The collection of finite unions of balls $A_k$ is also countable. Indeed, the collection of $n$-ary unions is of no larger cardinality than the $n$-fold cartesian product, which is countable. And a countable union of countable sets is countable. It therefore remains to show that for any fixed, finite union $A$ of sets $A_k$ there exists a sequence $\{f_m\}$ of bounded Lipschitz functions so that $P_nf_m\to Pf_m$ for all $m\in\mathbb{N}$ implies $\liminf_n P_nA\ge PA$. This last
condition is equivalent to $\limsup_n P_nF\le PF$, for $F=A^c$, where $F$ is a closed set. Define
$$\varrho(x,F):=\inf_{z\in F}\varrho(x,z),$$
$$f_m(x):=\big(1-\varrho(x,F)\,m\big)^+,$$
$$F_m:=\Big\{x\in S:\varrho(x,F)<\frac{1}{m}\Big\}.$$
Then $F=\bigcap_{m\in\mathbb{N}}F_m$, since $F$ closed implies that
$$x\notin F\;\Rightarrow\;\varrho(x,F)>0\;\Rightarrow\;\exists m:\varrho(x,F)\ge\frac{1}{m}\;\Rightarrow\;x\notin F_m\;\Rightarrow\;x\notin\bigcap_{m\in\mathbb{N}}F_m.$$
Moreover, $F_{m+1}\subseteq F_m$ implies that
$$PF=\lim_{M\to\infty}P\Big(\bigcap_{m=1}^M F_m\Big)=\lim_{M\to\infty}PF_M,\qquad(2.3)$$
by fundamental properties of measures. Finally, it follows that for any $m$
$$I\{F\}\le f_m\le I\{F_m\}.\qquad(2.4)$$
Indeed, $x\in F$ implies that $f_m(x)=1$ and (2.4) holds with equalities. Taking $x\in F_m\setminus F$ implies that $0\le f_m(x)=1-\varrho(x,F)\,m\le1$, so that (2.4) holds with 0 and 1 on the boundaries. Taking $x\notin F_m$ finally implies that $f_m(x)=0$ and (2.4) once again holds with equalities. Now, assuming $P_nf_m\to Pf_m$ for all $m\in\mathbb{N}$, we get by (2.4) that
$$\limsup_n P_nF\le\limsup_n P_nf_m=Pf_m\le PF_m.$$
The statement $\limsup_n P_nF\le PF$ follows by (2.3). It only remains to show that $f_m$ is Lipschitz, that is
$$|f_m(x)-f_m(y)|\le N\varrho(x,y),\qquad(2.5)$$
for some constant $N$ and all $x$ and $y$, since boundedness is obvious. In fact (2.5) holds with $N=m$. This follows from Lemma 2.3 by, as for (2.4), going through the different cases where $x$ and $y$ belong to $F$ and $F_m$ respectively.
Lemma 2.3. Let $A$ be any subset of $S$ and define a positive function on $S$ by $\varrho(x,A):=\inf_{z\in A}\varrho(x,z)$. Then $\varrho(\cdot,A)$ is Lipschitz-1, i.e.
$$|\varrho(x,A)-\varrho(y,A)|\le\varrho(x,y).$$
Proof. Assume w.l.o.g. that $\varrho(x,A)\ge\varrho(y,A)$. For $\varepsilon>0$ take $z\in A$ so that $\varrho(y,z)-\varrho(y,A)\le\varepsilon$. Then
$$|\varrho(x,A)-\varrho(y,A)|=\varrho(x,A)-\varrho(y,A)\le\varrho(x,z)-\varrho(y,A)\le\varrho(x,y)+\varrho(y,z)-\varrho(y,A)\le\varrho(x,y)+\varepsilon.$$
The proof is complete since $\varepsilon$ was arbitrary.
From Theorem 2.1 and Proposition 2.2 we now deduce a result to be used in Chapters 4 and 5.
Theorem 2.4. Let $\{d_k\}_{k\ge1}$ be a sequence of positive real numbers and set $D_n:=\sum_{1\le k\le n}d_k$ for $n\ge1$. Let further $\{X_k\}_{k\ge1}$ be a sequence of random elements in a separable metric space $S$, defined on a probability space $(\Omega,\mathcal{P},P)$. Let further $G$ be a probability measure on $S$ and, in case $S=\mathbb{R}$, $C_G\subseteq\mathbb{R}$ its set of continuity points. Finally, let for $x\in S$, $\delta(x)$ denote the Dirac point measure at $x$. The following two conditions are equivalent:
(i) $\frac{1}{D_n}\sum_{k=1}^n d_k\,\delta(X_k)\Rightarrow G$, almost surely;
(ii) $\frac{1}{D_n}\sum_{k=1}^n d_k\,f(X_k)\stackrel{a.s.}{\longrightarrow}\int f\,dG$, for all bounded Lipschitz functions $f$.
In the case $S=\mathbb{R}$ the following is a third equivalent condition:
(iii) $\frac{1}{D_n}\sum_{k=1}^n d_k\,I\{X_k\le x\}\stackrel{a.s.}{\longrightarrow}G(x)$, for all $x\in C_G$.
Proof. Define for all $n$ and $k$:
$$F_n(\omega):=\frac{1}{D_n}\sum_{k=1}^n d_k\,\delta(X_k(\omega)),\qquad G_k(\omega):=\delta(X_k(\omega)).$$
Since $d_k\ge0$ and $D_n=\sum_{k\le n}d_k$, this defines, for $\omega\in\Omega$ fixed, probability measures on $S$. For the equivalence of (i) and (iii) when $S=\mathbb{R}$ we merely note that the $F_n(\omega)$ have distribution functions
$$F_n(\omega;x):=\frac{1}{D_n}\sum_{k=1}^n d_k\,I\{X_k(\omega)\le x\}.$$
Conditions (i) and (iii) are therefore equivalent (cf. [6, Chapter 1]). In general, condition (i) may now be stated as
$$F_n(\omega)\Rightarrow G,\quad\text{for all }\omega\notin N,\qquad(2.6)$$
for some $P$-null set $N\in\mathcal{P}$. Theorem 2.1 gives the equivalence of (2.6) and the statement
$$\int_S f\,dF_n\longrightarrow\int_S f\,dG,$$
for all bounded, uniformly continuous $f$ and all $\omega\notin N$. But
$$\int f\,dF_n=\frac{1}{D_n}\sum_{k=1}^n d_k\int f\,dG_k=\frac{1}{D_n}\sum_{k=1}^n d_k\,f(X_k(\omega)).$$
Since $f$ Lipschitz implies $f$ uniformly continuous, condition (i) implies condition (ii) with the same null set $N$ for all $f$. On the other hand, by Proposition 2.2, statement (2.6) is also equivalent to:
$$\int f_m\,dF_n\longrightarrow\int f_m\,dG,\quad\text{for all }m\text{ and all }\omega\notin N,$$
where $\{f_m\}$ is a certain sequence of bounded, Lipschitz-continuous functions. Once again:
$$\int f_m\,dF_n=\frac{1}{D_n}\sum_{k=1}^n d_k\int f_m\,dG_k=\frac{1}{D_n}\sum_{k=1}^n d_k\,f_m(X_k(\omega)).$$
It therefore remains to show that condition (ii) implies:
$$\frac{1}{D_n}\sum_{k=1}^n d_k\,f_m(X_k(\omega))\longrightarrow\int f_m\,dG,\qquad(2.7)$$
for some $P$-null set $N$ and all $m$, all $\omega\notin N$. Condition (ii) gives null sets $N_m$ for each $f_m$. Taking $N:=\bigcup_m N_m$ gives another null set, since $P(N)\le\sum_m P(N_m)=0$. Finally, for $m$ fixed we have:
$$\omega\notin N\;\Rightarrow\;\omega\notin N_m\;\Rightarrow\;(2.7).$$
Remark 2.5. Theorem 2.4 will in later chapters be applied in cases where $S=\mathbb{R}$ and $S=C[0,1]$, the set of continuous real-valued functions on $[0,1]$ equipped with the metric of uniform convergence. Moreover also for $S=D[0,1]$, the set of functions $f:[0,1]\to\mathbb{R}$ which at each point are right-continuous and have a left-hand limit, equipped with any of the metrics $d$ and $d^\circ$ defined in [6, Chapter 3]. Another common candidate, which will not be considered, is $S=\mathbb{R}^d$. Billingsley [6] proves that all these spaces are separable.
2.2 Central limit theory
We begin by stating three versions of the Central limit theorem: first the classical formulation, then the Lindeberg–Lévy–Feller version and finally an extension of the first, which is not restricted to random variables with finite variance and the normal limit.
Theorem 2.6. Let $\{X_k\}_{k\ge1}$ be a sequence of i.i.d. random variables with finite expectation $\mu$ and positive, finite variance $\sigma^2$, and set $S_n=\sum_{k=1}^n X_k$. Then
$$\frac{S_n-n\mu}{\sigma\sqrt{n}}\stackrel{d}{\longrightarrow}N\quad\text{as }n\to\infty.$$
Proof. Confer for example [17, Theorem 1.1, Chapter 3, page 330].
Theorem 2.7. Let $\{X_k\}_{k\ge1}$ be a sequence of independent random variables with finite expectations $\mu_k$ and positive, finite variances $\sigma_k^2$, and set $S_n=\sum_{k=1}^n X_k$ and $s_n^2=\sum_{k=1}^n\sigma_k^2$. To avoid trivialities, assume that $s_1>0$. Among the three conditions below, (ii) is equivalent to the conjunction of (i) and (iii).
(i) $\max_{1\le k\le n}\frac{\sigma_k^2}{s_n^2}\to0$ as $n\to\infty$;
(ii) $\frac{1}{s_n^2}\sum_{k=1}^n E|X_k-\mu_k|^2\,I\{|X_k-\mu_k|>\varepsilon s_n\}\to0$ as $n\to\infty$, for every $\varepsilon>0$;
(iii) $\frac{1}{s_n}\sum_{k=1}^n(X_k-\mu_k)\stackrel{d}{\longrightarrow}N$ as $n\to\infty$.
Proof. [17, Theorem 2.1, Chapter 3, page 331].
Before the next result we need some new notions. A probability distribution $F$ on $\mathbb{R}$ belongs to the Domain of attraction of a non-degenerate distribution $G$ whenever a suitably centered and normalized sequence of partial sums of i.i.d. $F$-distributed random variables converges in distribution to $G$. It can be shown that $G$ is unique, up to centering and normalization, relative to $F$, and that only Stable distributions may occur as $G$-distributions (cf. [17, pages 428-431]). We only mention here that stable distributions may be characterized by an order parameter $\alpha\in(0,2]$, a skewness parameter $\beta\in[-1,1]$, and finally by centering and normalization. Their characteristic functions admit finite expressions, cf. [17, page 427]. They possess moments of order $r$, $r\in(0,\alpha)$, except when $\alpha=2$, which gives the normal distribution (no extra skewness parameter in this case), which has moments of all orders. The two most well-known members of the family are the symmetric Cauchy distribution ($\alpha=1$) and the standard normal distribution ($\alpha=2$).
A positive, Lebesgue-measurable function $L$ defined on $[a,\infty)$ for some $a>0$ is said to be Slowly varying at infinity, $L\in\mathcal{SV}$, whenever
$$\frac{L(tx)}{L(t)}\to1\quad\text{as }t\to\infty,\text{ for all }x>0.$$
Examples are positive functions with a finite limit at infinity, and $L=\log^+$.
Theorem 2.8. A random variable $X$ with distribution function $F$ belongs to the domain of attraction of a stable distribution of order $\alpha$ if and only if there exists $L\in\mathcal{SV}$ such that
$$EX^2I\{|X|\le x\}\sim x^{2-\alpha}L(x)\quad\text{as }x\to\infty,\qquad(2.8)$$
and, moreover, for $\alpha\in(0,2)$, there exists some $p\in[0,1]$ such that
$$\frac{P(X>x)}{P(|X|>x)}\to p\quad\text{and}\quad\frac{P(X<-x)}{P(|X|>x)}\to1-p\quad\text{as }x\to\infty.\qquad(2.9)$$
Proof. Confer [17, Theorem 3.2, Chapter 9, page 432].
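As a concrete illustration of (2.8) (our addition, not part of the thesis): for the standard Cauchy distribution, which is stable with $\alpha=1$ and, by symmetry, $p=1/2$ in (2.9), the truncated second moment has a closed form, $EX^2I\{|X|\le x\}=(2/\pi)(x-\arctan x)\sim x^{2-1}\cdot(2/\pi)$, so here the slowly varying factor $L$ is even asymptotically constant.

```python
import math

def truncated_second_moment_cauchy(x):
    """E[X^2 I{|X| <= x}] for the standard Cauchy density 1/(pi*(1+t^2)):
    since t^2/(1+t^2) = 1 - 1/(1+t^2), the integral over [-x, x]
    equals (2/pi) * (x - arctan(x))."""
    return (2.0 / math.pi) * (x - math.atan(x))
```

The ratio `truncated_second_moment_cauchy(x) / x` approaches $2/\pi\approx0.637$ as $x$ grows, which is exactly (2.8) with $\alpha=1$.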
Remark 2.9. It is possible to replace (2.8) by a condition involving $P(|X|>x)$ instead of $EX^2I\{|X|\le x\}$, cf. [17, page 432]. The centering sequence may be ignored when $\alpha<1$ and taken to be $\{nEX\}$ when $\alpha>1$. This is possible since $X$ possesses moments of the same order as the stable distribution to which convergence occurs. An explicit expression for the centering sequence may also be given in the case $\alpha=1$. The normalization sequence will typically be of the form $n^{1/\alpha}L(n)$, for some $L\in\mathcal{SV}$, and may be taken to be increasing, cf. [17] or [14].
Theorem 2.14 below is due to de Acosta and Giné [10]. We here restrict ourselves to the case of real random variables and give the original proof in a somewhat more detailed version. We first state some facts that will be needed, related to symmetric random variables and symmetrization, and one lemma concerning slowly varying functions. The inequalities in Proposition 2.13 are the so-called Lévy inequalities. For a random variable $X$ we define the distribution of the Symmetrized variable by $X^s\stackrel{d}{=}X-X'$, where $X'$ and $X$ are i.i.d.
Lemma 2.10. Assume that $a_n=n^{1/\alpha}L(n)$ with $L\in\mathcal{SV}$, $\alpha>0$. Then there exist constants $C$ and $N$ and a sequence $\{\tau_n\}$, such that $\tau_n\to0$ as $n\to\infty$ and so that for $n>N$ and all $m$
$$\frac{a_{mn}}{a_n}\le Cm^{1/\alpha+\tau_n}.$$
Proof. Set $p=1/\alpha$. By assumption
$$\frac{a_{2n}}{a_n}\to2^p\quad\text{as }n\to\infty.$$
Choose $N$ so that one may define a non-increasing sequence $\{\tau_n\}$, with $\tau_N=1$ and $\tau_n\to0$ as $n\to\infty$, so that
$$\frac{a_{2n}}{a_n}\le2^{p+\tau_n},\quad\text{when }n>N.$$
It now follows for $n>N$ that
$$\frac{a_{2^kn}}{a_n}=\prod_{j=1}^k\frac{a_{2^jn}}{a_{2^{j-1}n}}\le\big(2^{p+\tau_n}\big)^k=\big(2^k\big)^{p+\tau_n}.$$
It remains to show that for some constant $C$, all $k$, $2^{k-1}\le m\le2^k$ and $n>N$,
$$\frac{a_{mn}}{a_{2^{k-1}n}}\le C,\qquad(2.10)$$
since then
$$\frac{a_{mn}}{a_n}=\frac{a_{mn}}{a_{2^{k-1}n}}\cdot\frac{a_{2^{k-1}n}}{a_n}\le C\big(2^{k-1}\big)^{p+\tau_n}\le Cm^{p+\tau_n}.$$
When $a_n$ is (ultimately) increasing it follows that
$$\frac{a_{mn}}{a_{2^{k-1}n}}\le\frac{a_{2^kn}}{a_{2^{k-1}n}}\le2^{p+\tau_n}\le2^{p+1}.$$
But (2.10) also holds in general by using the uniform convergence theorem [7, Theorem 1.5.2, page 22], so that
$$\frac{a_{mn}}{a_{2^{k-1}n}}\le\Big|\frac{a_{mn}}{a_{2^{k-1}n}}-\Big(\frac{m}{2^{k-1}}\Big)^p\Big|+\Big(\frac{m}{2^{k-1}}\Big)^p\le1+2^p,$$
for $n>N_1$ say. One may then choose $N_0=N\vee N_1$ instead of $N$.
Remark 2.11. This lemma could also easily be deduced from Karamata's representation theorem for slowly varying functions (which may be found in [7, page 12]).
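The behaviour captured by Lemma 2.10 is easy to observe numerically. Taking the hypothetical normalizing sequence $a_n=n^{1/\alpha}\log n$ with $\alpha=2$ (so that the slowly varying part is $L=\log$), the doubling ratios $a_{2n}/a_n$ decrease towards $2^{1/2}$, but slowly; this slow drift is exactly what the correction exponents $\tau_n$ in the lemma absorb. The sketch below is our own illustration:

```python
import math

def a(n, alpha=2.0):
    # hypothetical normalizing sequence a_n = n^(1/alpha) * L(n),
    # with the slowly varying factor L(n) = log n
    return n ** (1.0 / alpha) * math.log(n)

# doubling ratios a_{2n}/a_n; they drift down towards 2^(1/2) = 1.414...
ratios = [a(2 * n) / a(n) for n in (10, 10**3, 10**6)]
```

Here $a_{2n}/a_n=\sqrt{2}\,\log(2n)/\log n=\sqrt{2}\,(1+\log2/\log n)$, so the convergence to $2^{1/2}$ is logarithmically slow.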
Proposition 2.12. Let $X$ be a random variable and let $\mathrm{med}(X)$ denote the median of $X$. Then for any $r>0$,
$$\tfrac{1}{2}E|X-\mathrm{med}(X)|^r\le E|X^s|^r.$$
Proof. [17, Proposition 6.4, Chapter 3, page 135].
Proposition 2.13. Let $\{X_k\}_{k\ge1}$ be a sequence of independent, symmetric random variables with partial sums $S_n$, $n\ge1$. Then
$$P\big(\max_{1\le k\le n}S_k>x\big)\le2P(S_n>x),$$
$$P\big(\max_{1\le k\le n}|S_k|>x\big)\le2P(|S_n|>x).$$
Proof. [17, Theorem 7.1, Chapter 3, page 139].
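Since the Lévy inequalities concern finite maxima, the first of them can be verified exactly for a short symmetric $\pm1$ walk by enumerating all equally likely sign patterns. This brute-force check is our own sketch, not part of the thesis:

```python
from itertools import accumulate, product

def levy_check(n=10, x=2):
    """Exhaustively compute P(max_{1<=k<=n} S_k > x) and 2*P(S_n > x)
    for the symmetric +-1 random walk (all 2^n paths equally likely)."""
    max_exceeds = end_exceeds = total = 0
    for signs in product((-1, 1), repeat=n):
        walk = list(accumulate(signs))   # partial sums S_1, ..., S_n
        total += 1
        max_exceeds += max(walk) > x
        end_exceeds += walk[-1] > x
    return max_exceeds / total, 2 * end_exceeds / total

lhs, rhs = levy_check()
```

For the simple symmetric walk, the reflection principle even gives equality whenever $P(S_n=x+1)=0$, so the factor 2 cannot be improved in general.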
Theorem 2.14. Assume that a distribution function $F$ belongs to the domain of attraction of a stable distribution $G$ of index $\alpha$. Take $\{X_k\}_{k\ge1}$ i.i.d. and $F$-distributed, set $S_n=\sum_{k=1}^n X_k$ and assume that
$$\frac{S_n-b_n}{a_n}\stackrel{d}{\longrightarrow}G,\quad\text{as }n\to\infty,\qquad(2.11)$$
for some positive sequence $\{a_n\}$ and real sequence $\{b_n\}$. Then
$$\sup_n E\Big|\frac{S_n-b_n}{a_n}\Big|^\beta<\infty,\quad\text{for all }\beta\in(0,\alpha).\qquad(2.12)$$
Remark 2.15. Theorem 2.14 implies that moments of order strictly less than $\alpha$ converge to the moments of corresponding order in the limit relation. To prove this one needs to verify Uniform integrability for the sequences
$$\Big\{\Big|\frac{S_n-b_n}{a_n}\Big|^\beta\Big\}_{n\ge1},\quad\beta\in(0,\alpha).$$
Uniform boundedness for all such $\beta$ suffices by [17, Theorem 4.2, Chapter 5, page 215].
Proof of Theorem 2.14. It suffices to prove the result for symmetric random variables $X_k$. Indeed, for the general case of (2.11) it follows (by subtracting independent, convergent random variables) that
$$\frac{S_n^s}{a_n}\stackrel{d}{\longrightarrow}G^s,\quad\text{as }n\to\infty.$$
Assuming (2.12) for this sequence we may then use Proposition 2.12 to conclude that
$$\frac{1}{2}E\Big|\frac{S_n-b_n}{a_n}-\mathrm{med}\Big(\frac{S_n-b_n}{a_n}\Big)\Big|^\beta\le E\Big|\frac{S_n^s}{a_n}\Big|^\beta.$$
It then remains to prove that the sequence $\{\mathrm{med}(\frac{S_n-b_n}{a_n})\}$ is bounded, but this follows from assumption (2.11).
For symmetric random variables no centering constants $b_n$ are necessary. We now proceed to prove the theorem in this situation. For $\varepsilon\in(0,1/2)$, choose $d$ so that for all $n$
$$P\Big(\Big|\frac{S_n}{a_n}\Big|>d\Big)<\varepsilon.$$
This is possible since convergence in distribution implies stochastic boundedness. It now follows by the second inequality in Proposition 2.13 that
$$P\big(\max_{1\le k\le m}|S_{nk}-S_{n(k-1)}|/a_{mn}>d\big)\le2P(|S_{mn}|/a_{mn}>d)\le2\varepsilon.$$
Now since
$$|S_{nk}-S_{n(k-1)}|\stackrel{d}{=}|S_n|,\quad\text{and these are independent for }1\le k\le m,$$
and since for any i.i.d. random variables $Y_k$ and $Y$,