Almost Sure Central Limit Theory
Almost Sure Central Limit Theory

Fredrik Jonsson

U.U.D.M. Project Report 2007:9
Degree project in mathematical statistics, 20 credits
Supervisor and examiner: Allan Gut
March 2007
Department of Mathematics, Uppsala University

Abstract

The almost sure central limit theorem states in its simplest form that a sequence of independent, identically distributed random variables {X_k}_{k≥1}, with moments EX_1 = 0 and EX_1^2 = 1, obeys

\[
\lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k}\, I\left\{ \frac{S_k}{\sqrt{k}} \le x \right\} = \Phi(x) \quad \text{a.s.}
\]

for each value x. I{·} here denotes the indicator function of events, Φ the distribution function of the standard normal distribution and S_n the n:th partial sum of the above-mentioned sequence of random variables. The purpose of this thesis is to present and summarize various kinds of generalizations of this result which may be found in the research literature.

Acknowledgement

I would like to thank Professor Allan Gut for introducing me to the subject, for careful readings of my drafts and for interesting conversations.

Contents

1 Introduction
  1.1 Notation
2 Preliminaries
  2.1 Probability measures and weak convergence
  2.2 Central limit theory
  2.3 Summation methods: linear transformations
3 Almost Sure Converging Means of Random Variables
  3.1 Bounds for variances of weighted partial sums
  3.2 Bounds for covariances among individual variables
  3.3 Refinements with respect to weight sequences
4 Almost Sure Central Limit Theory
  4.1 Independent random variables
  4.2 Weakly dependent random variables
  4.3 Subsequences
  4.4 An almost sure version of Donsker's theorem
5 Generalizations and Related Results
  5.1 A universal result and some consequences
  5.2 Return times
  5.3 A local limit theorem
  5.4 Generalized moments in the almost sure central limit theorem

1 Introduction

The almost sure central limit theorem states in its simplest form that a sequence of independent, identically distributed random variables {X_k}_{k≥1}, with moments EX_1 = 0 and EX_1^2 = 1, obeys

\[
\lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k}\, I\left\{ \frac{S_k}{\sqrt{k}} \le x \right\} = \Phi(x) \quad \text{a.s.} \tag{1.1}
\]

for each value x. I{·} here denotes the indicator function of events, Φ the distribution function of the standard normal distribution and S_n the n:th partial sum of the sequence of random variables {X_k}_{k≥1}. The notation "a.s." abbreviates "almost surely", that is, with probability one.

The first version of (1.1) was proved in the late 1980s, but a preliminary result was considered in the 1930s by Paul Lévy. It was shown at this early stage (consult [13] for an elementary proof in the case of the simple, symmetric random walk) that the random quantity

\[
\frac{1}{n} \sum_{k=1}^{n} I\{S_k \le 0\} \tag{1.2}
\]

does not cease to vary randomly as n tends to infinity. On the contrary, the distributions of these random variables converge to the arcsine distribution. The quantity in (1.2) can be interpreted as the proportion of time the random walk {S_n} has spent below zero up to time n. In the result (1.1), apart from replacing 0 by the more general x, there are weights {1/k}_{k≥1} and a different normalization, corresponding to the fact that ∑_{1≤k≤n} 1/k ∼ log n as n → ∞. In this way the randomness vanishes asymptotically, but on the other hand, the random-walk occupancy-time interpretation seems to be lost.

There are many other kinds of sequences of random variables {X_k}_{k≥1} than the one mentioned at the beginning, for which

\[
\lim_{n\to\infty} \frac{S_n}{\sqrt{n}} \stackrel{d}{=} \Phi.
\]

For an even larger class of interesting sequences {X_k}_{k≥1} one has

\[
\lim_{n\to\infty} \frac{S_n - b_n}{a_n} \stackrel{d}{=} G, \tag{1.3}
\]

for some sequences {a_n}_{n≥1} and {b_n}_{n≥1} and a distribution G (different from the zero-distribution). One may call these results central limit theorems.
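The limit (1.1) is easy to probe numerically. The following sketch is our own illustration, not part of the thesis: the sample size, seed, and function names are arbitrary choices. It computes the logarithmic average of (1.1) for an i.i.d. standard normal sequence and compares it with Φ(x).

```python
import numpy as np
from math import erf, sqrt, log

def Phi(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def log_average(S, x):
    """Logarithmic average (1/log n) * sum_{k<=n} (1/k) * I{S_k/sqrt(k) <= x}."""
    n = len(S)
    k = np.arange(1, n + 1)
    indicators = S / np.sqrt(k) <= x          # I{S_k/sqrt(k) <= x} as booleans
    return float(np.sum(indicators / k) / log(n))

rng = np.random.default_rng(seed=1)           # fixed seed, purely illustrative
X = rng.standard_normal(100_000)              # i.i.d. with EX_1 = 0, EX_1^2 = 1
S = np.cumsum(X)                              # partial sums S_1, ..., S_n

for x in (-1.0, 0.0, 1.0):
    print(f"x = {x:+.1f}: log-average = {log_average(S, x):.3f}, Phi(x) = {Phi(x):.3f}")
```

Since the normalizing factor log n grows slowly, the agreement with Φ(x) is only rough even for n = 100 000; the almost sure convergence in (1.1) sets in at a correspondingly slow rate.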
The purpose of this thesis is to present and summarize known generalizations of (1.1) for some of the more well-known examples satisfying (1.3), especially those where the X_k-variables are independent. This is the content of Sections 4.1 and 4.2. Improvements, and other ways of generalizing (1.1), are presented in Sections 4.4 and 5.4. The remaining parts of Chapter 5 and Section 4.3 present other results which are interesting in this context. Some useful background material is presented in Chapter 2, while Chapter 3 gives relevant results to be used in Chapters 4 and 5.

The results of Chapter 3 are to a large extent based on arguments in [4], which we separate and put in a more general form. However, we make a more elementary connection to the theory of summation methods; no reference to the general theory of Riesz typical means, as can be found in [4], is made here. A slight generalization of the results has also been obtained under the influence of [11]. A different way of arriving at (1.1), and at related results which will be considered here, is based on characteristic functions and may be found in [20].

It is not the aim of this thesis to give a complete overview of results inspired by, or connected to, (1.1); we refer to the survey paper [3] for further results. We rather hope to introduce the subject and perhaps to contribute to unification, and to fill in some of the gaps where research articles most of the time leave out the details. Examples of the latter kind are Theorems 2.4 and 5.19. Some extensions of previously published results may also be new.

1.1 Notation

We follow Landau's "small o", "big O" notation for real-valued functions and sequences. That is: f = o(g) means f(x)/g(x) → 0 as x → ∞, and f = O(g) means that f(x)/g(x) remains bounded as x → ∞. We presume the reader's familiarity with such statements as: for all ε > 0, log x = o(x^ε). By f ∼ g we mean "asymptotically equal", that is, f(x)/g(x) → 1 as x → ∞.
An example of a true statement of this kind is log(1 + 1/x) ∼ 1/x.

We follow the tradition of denoting iterated logarithms by log_k, k ≥ 1. That is, log_1(x) := log x and, recursively, log_{k+1}(x) := log(log_k(x)).

We also presume the reader's familiarity with the (hopefully universal) fundamental concepts of probability theory; we refer e.g. to the first chapters of [17]. As for notation, we reserve N(μ, σ²) to denote the normal distribution with expectation μ and variance σ². We also denote the standard normal distribution N(0, 1) by N, its distribution function by Φ and its density function by φ. As is common, we abbreviate "independent, identically distributed" by i.i.d.

In some places we refer to facts and concepts from the theory of integration. In measure spaces we denote the indicator function (defined on the same space, assuming the values 0 and 1) of a subset A by I{A}.

2 Preliminaries

2.1 Probability measures and weak convergence

This section concerns probability measures on some space S, equipped with a metric ϱ(·, ·) and the usual σ-field 𝒮 generated by the open balls in S. Given such a measure P on (S, 𝒮), a set being P-continuous means that its boundary has P-measure 0. Given a P-integrable function f on S to ℝ, we also denote ∫ f dP by Pf. In such a setting we may extend the familiar notion (in S = ℝ) of convergence in distribution to what is called weak convergence of probability measures. It concerns a collection ({P_n}_{n≥1}, P) of probability measures, is denoted P_n ⇒ P, and is defined by

\[
\lim_{n\to\infty} P_n f = P f, \quad \text{for all bounded and continuous } f.
\]

The following theorem, usually called "the Portmanteau theorem", gives five equivalent definitions.

Theorem 2.1. Let {P_n}_{n≥1} and P be probability measures on (S, 𝒮).
These five conditions are equivalent:

(i) P_n ⇒ P;
(ii) lim_{n→∞} P_n f = P f, for all bounded, uniformly continuous f;
(iii) lim sup_n P_n F ≤ P F, for all closed sets F;
(iv) lim inf_n P_n G ≥ P G, for all open sets G;
(v) P_n A → P A, for all P-continuity sets A.

Proof. [6, Theorem 2.1, page 16].

In the case of a separable metric space, i.e. one with a countable dense subset, condition (ii) may, for sufficiency, be weakened as follows:

Proposition 2.2. Let {P_n}_{n≥1}, P and (S, 𝒮) be as before, and assume that S is separable. Then there exists a sequence {f_m}_{m≥1} of bounded, Lipschitz-continuous functions f_m : S → ℝ such that

\[
\lim_{n\to\infty} P_n f_m = P f_m, \quad \text{for all } m \ge 1,
\]

implies P_n ⇒ P.

Proof. Let {x_k}_{k≥1} denote a dense sequence in S. There are countably many balls {B(x_m, q) : m ∈ ℕ, q ∈ ℚ}, which we denote {A_k}_{k≥1}. The sets {A_k} generate the open sets in S in the sense that every open set A may be written

\[
A = \bigcup_{k \in \mathcal{A}} A_k, \tag{2.1}
\]

for some 𝒜 ⊆ ℕ. Indeed, if x ∈ A there exists an ε so that B(x, ε) ⊆ A and an x_k so that x_k ∈ B(x, ε/2). By the triangle inequality B(x_k, ε/2) ⊆ A, so that 𝒜 := {k ∈ ℕ : A_k ⊆ A} will do.

To verify condition (iv) of Theorem 2.1 it will be enough to consider only finite unions of the sets A_k, since, assuming so and writing A = ∪_{j∈ℕ} A_{k_j} by (2.1),

\[
\liminf_n P_n A = \liminf_n \lim_{m\to\infty} P_n \Big( \bigcup_{j=1}^{m} A_{k_j} \Big)
\ge \lim_{m\to\infty} \liminf_n P_n \Big( \bigcup_{j=1}^{m} A_{k_j} \Big)
\ge \lim_{m\to\infty} P \Big( \bigcup_{j=1}^{m} A_{k_j} \Big) = PA.
\]
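To make conditions (iii)–(v) of Theorem 2.1 concrete, here is a small toy example of our own (not part of the proof above): on S = ℝ, take P_n to be the point mass at 1/n, so that P_n ⇒ P with P the point mass at 0. For the open set G = (0, 1) the inequality in condition (iv) is strict.

```python
def point_mass(point, a, b):
    """P((a, b)) for the point mass at `point` on the real line."""
    return 1.0 if a < point < b else 0.0

# P_n = point mass at 1/n; the weak limit P is the point mass at 0.
G = (0.0, 1.0)                                   # an open set

# For n >= 2 every term P_n(G) equals 1, so the liminf is just the minimum.
liminf_G = min(point_mass(1.0 / n, *G) for n in range(2, 1000))
limit_G = point_mass(0.0, *G)

# Condition (iv): liminf_n P_n(G) >= P(G); here 1 >= 0, strictly.
# Condition (v) does not apply: G is not a P-continuity set, since its
# boundary {0, 1} carries P-mass 1 -- and indeed P_n(G) -> 1 != 0 = P(G).
print(liminf_G, limit_G)
```

The same example with the closed set F = [1, 2] gives lim sup_n P_n(F) = 0 ≤ P(F) = 0, matching condition (iii).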