<<

Almost Sure Central Limit Theory

Fredrik Jonsson U.U.D.M. Project Report 2007:9

Examensarbete i matematisk statistik, 20 poäng Handledare och examinator: Allan Gut Mars 2007

Department of Mathematics Uppsala University

Abstract

The Almost sure central limit theorem states in its simplest form that a sequence of independent, identically distributed random variables {Xk}k≥1, 2 with moments EX1 = 0 and EX1 = 1, obeys n   1 X 1 Sk a.s. lim I √ ≤ x = Φ(x), n→∞ log n k k=1 k for each value x. I{·} here denotes the indicator function of events, Φ the distribution function of the standard normal distribution and Sn the n:th partial sum of the above mentioned sequence of random variables. The purpose of this thesis is to present and summarize various kinds of generalizations of this result which may be found in the research literature.

i

Acknowledgement

I would like to thank Professor Allan Gut for introducing me to the subject, for careful readings of my drafts and for interesting conversations.

iii

Contents

1 Introduction 1 1.1 Notation ...... 2

2 Preliminaries 3 2.1 Probability measures and weak convergence ...... 3 2.2 Central limit theory ...... 8 2.3 Summation methods: linear transformations ...... 15

3 Almost Sure Converging Means of Random Variables 19 3.1 Bounds for variances of weighted partial sums ...... 19 3.2 Bounds for covariances among individual variables ...... 22 3.3 Refinements with respect to weight sequences ...... 25

4 Almost Sure Central Limit Theory 29 4.1 Independent random variables ...... 31 4.2 Weakly dependent random variables ...... 37 4.3 Subsequences ...... 40 4.4 An almost sure version of Donsker’s theorem ...... 40

5 Generalizations and Related Results 45 5.1 A universal result and some consequences ...... 45 5.2 Return times ...... 47 5.3 A local limit theorem ...... 50 5.4 Generalized moments in the almost sure central limit theorem 52

v

1 Introduction

The Almost sure central limit theorem states in its simplest form that a sequence of independent, identically distributed random variables {Xk}k≥1, 2 with moments EX1 = 0 and EX1 = 1, obeys n   1 X 1 Sk a.s. lim I √ ≤ x = Φ(x), (1.1) n→∞ log n k k=1 k for each value x. I{·} here denotes the indicator function of events, Φ the distribution function of the standard normal distribution and Sn the n:th partial sum of the sequence of random variables {Xk}k≥1. The notation “a.s.” abbreviates “almost surely”, that is, with probability one. The first version of (1.1) was proved in the late 1980s, but a preliminary result was considered in the 1930s by Paul L´evy. It was at this early stage shown (consult [13] for an elementary proof in the case of the simple, symmetric ) that the random quantity

n 1 X IS ≤ 0 (1.2) n k k=1 does not stop to vary randomly as n tends to infinity. On the contrary, the distributions of the random variables converge to the Arc sine distribution. The quantity in (1.2) can be interpreted as the amount of time the random walk {Sn} has spent below zero up to time n. In the result (1.1), except for replacing 0 by the more general x, there are weights {1/k}k≥1 and a P different normalization corresponding to the fact that 1≤k≤n 1/k ∼ log n, as n → ∞. In this way vanishes asymptotically, but on the other hand, the random walk occupancy time interpretation seems to be lost. There are many other kinds of sequences of random variables {Xk}k≥1, than the one mentioned at the beginning where S lim √n =d Φ. n→∞ n

For an even larger class of interesting sequences {Xk}k≥1 one has S − b lim n n =d G, (1.3) n→∞ an for some sequences {an}n≥1 and {bn}n≥1 and a distribution G (different from the zero-distribution). One may call these results Central limit theorems. The purpose of this thesis is to present and summarize known general- izations of (1.1) for some of the more well-known examples satisfying (1.3), especially those where the Xk-variables are independent. This is the content

1 of Sections 4.1 and 4.2. Improvements, or other ways of generalizing (1.1), are presented in Sections 4.4 and 5.4. The remaining parts of Chapter 5 and Section 4.3 present other results which are interesting in this context. Some useful background material is presented in Chapter 2 while Chapter 3 gives relevant results to be used in Chapters 4 and 5. The results of Chapter 3 are to a large extent based on arguments in [4] which we separate and put in a more general form. However, we make a more elementary connection to the theory of summation methods. No reference to the general theory of Riesz typical means, as can be found in [4], is made here. A slight generalization of the results has also been obtained under the influence of [11]. A different way of arriving at (1.1), and related results which will be considered here, is based on Characteristic functions and may be found in [20]. It is not the aim of this thesis to give a complete overview of results inspired by, or connected to (1.1). We refer to the survey paper [3] for further results. We rather hope to introduce the subject and perhaps contribute concerning unification and by filling out some gaps where research articles most of the times leave out the details. Examples of the latter kind are Theorems 2.4 and 5.19. Some extensions of previously published results may also be new.

1.1 Notation We follow Landau’s “small o”, “big O” notation for real-valued functions and sequences. That is: f = o(g) means f(x)/g(x) → 0 as x → ∞ and f = O(g) means f(x)/g(x) remains bounded as x → ∞. We presume the reader’s familiarity with such statements as: For all  > 0, log x = o(x). By f ∼ g we mean “asymptotically equal”, that is f(x)/g(x) → 1 as x → ∞. An example of a true statement of this kind is log (1 + 1/x) ∼ 1/x. We follow the tradition of denoting iterated logarithms by logk, k ≥ 1. That is, log1 (x) := log x, and recursively logk+1 (x) := log logk (x). We also presume the reader’s familiarity with the (hopefully universal) fundamental concepts of . We refer e.g. to the first chap- ters of [17]. As for notation we reserve N(µ, σ2) to denote the normal dis- tribution with expectation µ and variance σ2. We also denote the standard normal distribution, N(0, 1), by N, its distribution function by Φ and its density function by φ. As is common, we abbreviate “independent, identi- cally distributed” i.i.d. At some places we refer to facts and concepts from the theory of inte- gration. In measure spaces we denote the indicator function (defined on the same space, assuming values 0 and 1) of a subset A by I{A}.

2 2 Preliminaries

2.1 Probability measures and weak convergence This section concerns probability measures on some space S, equipped with a metric %(·, ·) and the usual σ-field S generated by the open balls in S. Given such a measure P on (S, S), a set being P-continuous means that its boundary has P -measure 0. Given an P -integrable function f on S to R, we also denote R fdP by P f. To such a setting we may extend the familiar notion (in S = R) of Convergence in distribution to what is called Weak convergence of probability measures. It concerns a collection ({Pn}n≥1,P ) of probability measures, is denoted Pn ⇒ P and defined by

lim Pnf = P f, for all bounded and continuous f. n→∞ The following theorem, usually called ”The Portmanteau Theorem”, gives five equivalent definitions.

Theorem 2.1. Let {Pn}n≥1 and P be probability measures on (S, S). These five conditions are equivalent: (i) Pn ⇒ P ; (ii) limn→∞ Pnf = P f, for all bounded, uniformly continuous f; (iii) lim supn PnF ≤ PF , for all closed sets F; (iv) lim infn PnG ≥ PG, for all open sets G; (v) PnA ⇒ PA, for all P-continuity sets A. Proof. [6, Theorem 2.1, page 16].

In the case of a Separable metric space, i.e. one with a countable dense subset, condition (ii) may, for sufficiency, be weakened as follows:

Proposition 2.2. Let {Pn}n≥1, P and (S, S) be as before. Assume S separa- ble. Then there exists a sequence {fm}m≥1 of bounded, Lipschitz-continuous functions, fm : S → R, such that

lim Pnfm → P fm, for all m ≥ 1, n→∞ implies Pn ⇒ P .

Proof. Let {xk}k≥1 denote a dense sequence in S. There are countably many balls {B(xm, q): m ∈ N, q ∈ Q} which we denote {Ak}k≥1. The sets {Ak} generate the open sets in S in the sense that every open set A may be written [ A = Ak, (2.1) k∈A

3 for some A ⊆ N. Indeed, if x ∈ A there exists an  so that B(x, ) ⊆ A and   an xk so that xk ∈ B(x, ). By the triangle inequality B(xk, ) ⊆ A so that  2 2 A := k ∈ N : Ak ⊆ A will do. To verify condition (iv) of Theorem 2.1 it will be enough to consider only finite unions of sets A , since assuming so and writing A = S A k j∈N kj by (2.1) implies that

m  [  lim inf PnA = lim inf lim Pn Ak n n m→∞ j j=1 m  [  ≥ lim lim inf Pn Ak m→∞ n j j=1 m  [  ≥ lim P Ak = P A. (2.2) m→∞ j j=1

The first inequality (changing order of limits) is valid since

m  [  lim inf Pn Ak is non-decrasing in m, n j j=1 so that for any  > 0 and some M():

m M  [   [  lim lim inf Pn Ak ≤ lim inf Pn Ak + . m→∞ n j n j j=1 j=1

The first term on the right is majorized by

m  [  lim inf lim Pn Ak , n m→∞ j j=1 since m  [  Pn Akj is non-decrasing in m, for all n. j=1 This proves (2.2). The collection of finite unions of balls Ak is also countable. Indeed, the collection of n-ary unions is of non-larger cardinality than the n-fold cartesian product, which is countable. And a countable union of countable sets is countable. It therefor remains to show that for any fixed, finite union of sets Ak there exist a sequence {fm} of bounded Lipschitz-functions so that Pnfm → P fm for all m ∈ N implies lim infn PnA ≥ PA. This last

4 c condition is equivalent to lim supn PnF ≤ PF , for F = A , where F is a closed set. Define

%(x, F ) := inf %(x, z) z∈F + fm(x) := 1 − %(x, F )m n 1 o F := x ∈ S : %(x, F ) < . m m T Then F = Fm, since F closed implies that m∈N 1 \ x∈ / F ⇒ %(x, F ) > 0 ⇒ ∃m : %(x, F ) ≥ ⇒ x∈ / F ⇒ x∈ / F . m m m m∈N

Moreover, Fm+1 ⊆ Fm implies that

M  \  PF = lim P Fm = lim PFM , (2.3) M→∞ M→∞ m=1 by fundamental properties of measures. Finally, it follows that for any m

I{F } ≤ fm ≤ I{Fm}. (2.4)

Indeed, x ∈ F implies that fm(x) = 1 and (2.4) holds with equalities. Taking x ∈ Fm\F implies that fm(x) = 1 − %(x, F )m ≤ 1 so that (2.4) holds with 0 and 1 on the boundaries. Taking x∈ / Fm finally implies that fm(x) = 0 and (2.4) once again holds with equalities. Now, assuming Pnfm → P fm for all m ∈ N we get by (2.4) that

lim sup PnF ≤ lim sup Pnfm = P fm ≤ PFm. n n

The statement lim supn PnF ≤ PF follows by (2.3). It only remains to show that fm is Lipschitz, that is

|fm(x) − fm(y)| ≤ N%(x, y), (2.5) for some constant N and all x and y, since boundedness is obvious. In fact (2.5) holds with N = m. This follows from Lemma 2.3 by, as for (2.4), going through the different cases where x and y belong to F and Fm respectively.

Lemma 2.3. Let A be any subset of S and define a positive function on S by %(x, A) := infz∈A %(x, z). Then %(·,A) is Lipschitz-1, i.e.

%(x, A) − %(y, A) ≤ %(x, y).

5 Proof. Assume w.l.o.g. that %(x, A) ≥ %(y, A). For  > 0 take z ∈ A so that %(y, z) − %(y, A) ≤ . Then

%(x, A) − %(y, A) = %(x, A) − %(y, A) ≤ %(x, z) − %(y, A) ≤ %(x, y) + %(y, z) − %(y, A) ≤ %(x, y) + .

The proof is complete since  was arbitrary.

From Theorem 2.1 and Proposition 2.2 we now deduce a result to be used in Chapters 4 and 5.

Theorem 2.4. Let {dk}k≥1 be a sequence of positive real numbers and set P Dn := 1≤k≤n dk for n ≥ 1. Let further {Xk}k≥1 be a sequence of ran- dom elements in a separable metric space S defined on a (Ω, P,P ). Let further G be a probability measure on S and in case S = R, CG ⊆ R its continuity points. Finally, let for x ∈ S, δ(x) denote the Dirac point measure of x. The following two conditions are equivalent:

(i) 1 Pn d δ(X ) ⇒ G, almost surely; Dn k=1 k k (ii) 1 Pn d f(X ) −→a.s. R fdG,for all bounded Lipschitz-functions f. Dn k=1 k k In the case S = R the following is a third equivalent condition: (iii) 1 Pn d I{X ≤ x} −→a.s. G(x), for all x ∈ C . Dn k=1 k k G Proof. Define for all n and k:

n 1 X Fn(ω) := dkδ(Xk(ω)), Dn k=1 Gk(ω) := δ(Xk(ω)). P Since dk ≥ 0 and Dn = k≤n dk, this defines, for ω ∈ Ω fixed, probability measures on S. For the equivalence of (i) and (iii) when S = R we merely note that Fn(ω) have distribution functions

n 1 X Fn(ω; x) := dkI{Xk(ω) ≤ x}. Dn k=1 Conditions (i) and (iii) are therefore equivalent (cf. [6, Chapter 1]). In general, condition (i) could now be stated

Fn(ω) ⇒ G, all ω∈ / N. (2.6)

6 for some P -null set N ∈ P. Theorem 2.1 gives the equivalence of (2.6) and the statement Z Z fdFn −→ fdG, S S for all bounded, uniformly continuous f and all ω∈ / N. But Z n Z n 1 X 1 X  fdFn = dk fdGk = dkf Xk(ω) . Dn Dn k=1 k=1 Since f Lipschitz implies f Uniformly continuous, condition (i) implies con- dition (ii) with the same null set N for all f. On the other hand, by Proposition 2.2, statement (2.6) is also equivalent to: Z Z fmdFn −→ fmdG, for all m and all ω∈ / N, where {fm} is a certain sequence of bounded, Lipschitz-continuous functions. Once again: Z n Z n 1 X 1 X  fmdFn = dk fmdGk = dkfm Xk(ω) . Dn Dn k=1 k=1 It therefore remains to show that condition (ii) implies:

n Z 1 X  dkfm Xk(ω) −→ fmdG, (2.7) Dn k=1 for some P-null set N and all m, all ω∈ / N. Condition (iii) gives null S sets Nm for each fm. Taking N := m Nm gives another null set, since P P (N) ≤ m P (Nm) = 0. Finally, for m fixed we have:

ω∈ / N ⇒ ω∈ / N ⇒ (2.7). m  Remark 2.5. Theorem 2.4 will in later chapters be applied in cases where S = R and S = C[0, 1], the set of continuous real-valued functions on C[0, 1] equipped with the metric of uniform convergence. Moreover also for S = D[0, 1], the set of functions f : [0, 1] → R, at each point being right- continuous and having a left-hand limit, equipped with any of the metrics d and d◦ defined in [6, Chapter 3]. Another common candidate which will d not be considered is S = R . Billingsley [6] proves that all these spaces are separable.

7 2.2 Central limit theory We begin by stating three versions of the Central limit theorem. First the classical formulation, then the Lindeberg-L´evy-Feller version and finally an extension of the first, which not merely concerns random variables with finite variance and the normal limit.

Theorem 2.6. Let {Xk}k≥1 be a sequence of i.i.d. random variables with 2 Pn finite expectation µ and positive, finite variance σ , and set Sn = k=1 Xk. Then S − nµ n √ −→d N as n → ∞. σ n Proof. Confer for example [17, Theorem 1.1, Chapter 3, page 330].

Theorem 2.7. Let {Xk}k≥1 be a sequence of independent random variables 2 with finite expectations µk and positive, finite variances σk, and set Sn = Pn 2 Pn 2 k=1 Xk and sn = k=1 σk. To avoid trivialities, assume that s1 > 0. Among the three conditions below, (ii) is equivalent to the conjunction of (i) and (iii). 2 σk (i) max1≤k≤n 2 → 0 as n → ∞; sn 1 Pn 2  (ii) 2 E|Xk − µk| I |Xk − µk| > sn → 0 as n → ∞; sn k=1

(iii) 1 Pn (X − µ ) −→d N as n → ∞. sn k=1 k k Proof. [17, Theorem 2.1, Chapter 3, page 331]. Before the next result we need some new notions. A probability dis- tribution F on R, belongs to the Domain of attraction of a non-degenerate distribution G whenever a suitably centered and normalized sequence of par- tial sums of i.i.d. F -distributed random variables converge in distribution to G. It can be shown that G is unique, up to centering and normalization, rel- ative to F , and that only Stable distributions may occur as G-distributions. (cf. [17, pages 428-431]) We only mention here that stable distributions may be characterized by an order parameter, α ∈ (0, 2], the skewness parameter β ∈ [−1, 1], and finally by centering and normalization. Their characteristic functions admit finite expressions, cf. [17, page 427]. They possess moments of order r, r ∈ (0, α), except when α = 2, which gives the normal distribution (no extra skewness parameter for this case), which has moments of all orders. The two most well-known members of the family are the symmetric Cauchy distribution (α = 1) and the standard normal distribution (α = 2). A positive, Lebesgue-measurable function L defined on [a, ∞) for some a > 0, is said to be Slowly varying at infinity, L ∈ SV, whenever L(tx) → 1 as t → ∞, for all x > 0. L(t)

8 Examples are positive functions with a finite limit at infinity and L = log+ . Theorem 2.8. A X with distribution function F belongs to the domain of attraction of a stable distribution of order α if and only if there exists L ∈ SV, such that

EX2I{|X| ≤ x} ∼ x2−αL(x) as x → ∞. (2.8) and, moreover, for α ∈ (0, 2), there exists some p ∈ [0, 1], such that P (X > x) P (X < −x) → p and → 1 − p as x → ∞. (2.9) P (|X| > x) P (|X| > x) Proof. Confer [17, Theorem 3.2, Chapter 9, page 432].

Remark 2.9. It is possible to replace (2.9) by a condition involving P (|X| > x) instead of EX2I{|X| ≤ x}, cf. [17, page 432]. The centering sequence may be ignored when α < 1 and taken to be {nEX} when α > 1. This is possible since X possesses moments of the same order as the stable distribution to which convergence occurs. An explicit expression for the centering sequence may also be given for the case α = 1. The normalization sequence will typically be of the form n1/αL(n), for some L ∈ SV and may be taken to be increasing. cf. [17] or [14]. Theorem 2.14 below is due to deAcosta and Gin´e [10]. We here restrict to the case of real random variables and give the original proof in a somewhat more detailed version. We first state some facts that will be needed which are related to symmetric random variables and symmetrization and one lemma concerning slowly varying functions. The inequalities in Proposition 2.13 are the so-called L´evyinequalities. For a random variable X we define the distribution of the Symmetrized variable by Xs =d X − X0, where X0 and X are i.i.d.

1/α Lemma 2.10. Assume that an = n L(n) with L ∈ SV, α > 0. Then there exist constants C and N and a sequence {τn}, such that τn → 0 as n → ∞ and so that for n > N and all m a mn ≤ Cm1/α+τn . an Proof. Set p = 1/α. By assumption a 2n → 2p as n → ∞. an

Choose N so that one may define a non-increasing sequence {τn}, with τN = 1 and τn → 0 as n → ∞ so that a 2n ≤ 2p+τn , when n > N. an

9 It now follows for n > N that

k a k Y a j k p+τn 2 n = 2 n ≤ 2p+τn  = 2k . a a j−1 n j=1 2 n

It remains to show that for some constant C all k, 2k−1 ≤ m ≤ 2k and n > N a mn ≤ C, (2.10) a2k−1n since then

a a a k−1 a p+τ mn = mn · 2 n ≤ mn · 2k−1 n ≤ Cmp+τn . an a2k−1n an a2k−1n

When an is (ultimately) increasing it follows that

a a k mn ≤ 2 n ≤ 2p+τn ≤ 2p+1. a2k−1n a2k−1n But (2.10) also holds in general by using the uniform convergence theorem [7, Theorem 1.5.2, page 22], so that

p p a m 2k−1n a m 2k−1n     amn 2k−1 2k−1 m m p = ≤ − k−1 + k−1 ≤ 1 + 2 , a2k−1n a2k−1n a2k−1n 2 2 for n > N1 say. One may then choose N0 = N ∨ N1 instead of N. Remark 2.11. This lemma could also easily be deduced from Karamata’s representation theorem of slowly varying functions, (which may be found in [7, page 12]).

Proposition 2.12. Let X be a random variable and let med (X) denote the median of X. Then for any r > 0, 1 E|X − med (X)|r ≤ E|Xs|r. 2 Proof. [17, Proposition 6.4, Chapter 3, page 135].

Proposition 2.13. Let {Xk}k≥1 be a sequence of independent, symmetric random variables with partial sums Sn, n ≥ 1. Then

P ( max Sk > x) ≤ 2P (Sn > x), 1≤k≤n

P ( max |Sk| > x) ≤ 2P (|Sn| > x). 1≤k≤n

Proof. [17, Theorem 7.1, Chapter 3, page 139].

10 Theorem 2.14. Assume that a distribution function F belongs to the do- main of attraction of a stable distribution G of index α. Take {Xk}k≥1 i.i.d. Pn and F -distributed, set Sn = k=1 Xk and assume that S − b n n −→d G, as n → ∞, (2.11) an for some positive sequence {an} and real sequence {bn}. Then

β Sn − bn sup E < ∞, for all β ∈ (0, α). (2.12) n an Remark 2.15. Theorem 2.14 implies that moments of order strictly less than α converge to moments of corresponding order in the limit relation. To prove this one needs to verify Uniform integrability for the sequences

β Sn − bn , β ∈ (0, α). an Uniform boundedness for all such β suffices by [17, Theorem 4.2, Chapter 5, page 215].

Proof of Theorem 2.14. It suffices to prove the result for symmetric random variables Xk. Indeed, for the general case of (2.11) it follows (by subtracting independent, convergent random variables) that Ss n −→d Gs, as n → ∞. an By assuming (2.12) for this sequence we may then use Proposition 2.12 to conclude that

   β s β 1 Sn − bn Sn − bn Sn E − med ≤ E . 2 an an an

It then remains to prove that the sequence {med ( Sn−bn )} is bounded, but an this follows from assumption (2.11). For symmetric random variables no centering constants bn are necessary. We now proceed to prove the theorem for this situation. For  ∈ (0, 1/2), choose d so that for all n   Sn P > d < . an This is possible since convergence implies stochastic boundedness. It now follows by the second inequality in Proposition 2.13 that

P ( max |Snk − Sn(k−1)|/amn > d) ≤ 2P (|Smn|/amn > d) ≤ 2. 1≤k≤m

11 Now since

d |Snk − Sn(k−1)| = |Sn|, and all independent for 1 ≤ k ≤ m, and since for any i.i.d. random variables Yk and Y ,

  m P max |Yk| > d = 1 − P max |Yk| ≤ d = 1 − P (|Y | ≤ d) , 1≤k≤m k≤m  1/m P (|Y | > d) = 1 − 1 − P max |Yk| > d , 1≤k≤m it follows that

1/m P (|Sn|/amn > d) ≤ 1 − (1 − 2) . (2.13)

One verifies that for 0 < a < 1, some constant C = C(a) and all m,

1 − a1/m ≤ C/m. (2.14)

Indeed

1 − ax ≤ Cx ⇐⇒ ax ≥ 1 − Cx ⇐⇒ x log a ≥ log (1 − Cx),

But log (1 − Cx) ∼ −Cx as x → 0, and x log (1/a) ≤ Cx obviously holds for C ≥ log (1/a). Equation (2.13) therefore turns into

P (|Sn|/amn > d) ≤ C0/m, for some constant C0. (2.15)

It now follows by Lemma 2.10 that

1/α+τn mP (|Sn|/an > Cdm ) ≤ mP (|Sn|/amn > d) ≤ C0. (2.16)

From this we are now in a position to prove:   P |Sn| > t tα−δ ≤ C, an for any δ such that 0 < 2δ < α, for n > N(δ) and for all t. (2.17)

Indeed choose N(δ) so that for n > N(δ) −1 δ 1 < τ < · . (2.18) α n α α − δ Put then for simplicity Cd = M in (2.16) and choose m so that

Mm1/α+τn < t ≤ M(m + 1)1/α+τn . (2.19)

12 From these assumptions we may now deduce

|S |  |S |  P n > t ≤ P n > Cdm1/α+τn , (2.20) an an and

α−δ α−δ (1/α+τ )(α−δ) 1+τ (α−δ)− δ t ≤ M (m + 1) n ≤ Cm n α ≤ Cm. (2.21)

Statement (2.17) now follows from (2.16), (2.20) and (2.21). With this established we can now conclude, for any such δ and n > N(δ), that

α−2δ Z ∞ Sn α−2δ−1 E = (α − 2δ)t P (|Sn|/an > t)dt an 0 Z ∞ ≤ C(α − 2δ) t−(1+δ)dt < ∞. (2.22) 0 For δ fixed and the finite set n < N(δ) = N we may simply use that E|X|α−2δ < ∞ and Minkowski’s inequality to conclude that

α−2δ α−2δ 1 α−2δ α−2δ α−2δ E|Sn| ≤ n(E|X| ) α−2δ ≤ N E|X| . (2.23)

Uniform boundedness follows from (2.22) and (2.23).  We now turn to extensions of the first three limit theorems of this section to random functions, or elements, in the spaces C[0, 1] and D[0, 1] mentioned in Section 1.1. Limiting distributions considered are Wiener measure, i.e. the probability measure of the Brownian motion on the unit interval (cf. [6, Section 8]) and more general L´evystable processes (cf. [32, page 113]). Especially Theorems 2.16 and 2.17 are known as versions of Donsker’s theorem. In this context we shall also mention the Arc sine distribution of L´evyand its connection to the amount of time certain random walks asymptotically spend on the positive or negative axis.

Theorem 2.16. Let W denote Wiener measure on C[0, 1]. If {ξk}k≥1 is a sequence of i.i.d. random variables with mean 0 and variance σ2, and if Xn is the continuous random function on [0, 1] defined by 1 1 Xn(ω) = √ S (ω) + nt − [nt] √ ξ (ω). t σ n [nt] σ n [nt]+1

Then Xn ⇒ W as n → ∞.

Proof. [6, Theorem 8.2].

13 Theorem 2.17. Let W denote Wiener measure on D[0, 1]. Suppose that {ξk}k≥1 is an independent sequence of random variables with mean 0 and 2 2 variances σk satisfying the conditions of Theorem 2.7. Let sn and Sn be the partial sums of variances and random variables respectively. Let further Xn be the random function on [0, 1] defined by

 2 2  n Sk(ω) sk sk+1 Xt (ω) = , for t ∈ 2 , 2 , k = 1, ..., n. sn sn sn Then Xn ⇒ W as n → ∞.

Proof. [6, Theorem 14.1 and its extension].

Theorem 2.18. Let F , G, {Xk}k≥1, {Sn}, {an} and {bn} be as in Theorem 2.14. Assume moreover (w.l.o.g.) that G has normalization parameter 1 and centering parameter 0. Let further Xn be the random function on [0, 1] defined by   n Sk(ω) − bk k k + 1 Xt (ω) = , for t ∈ , , k = 1, ..., n. an n n

Then Xn ⇒ X as n → ∞, where X ∈ D[0, 1] is the α-stable L´evymotion (or L´evyprocess) whose one-dimensional marginal distribution at t = 1 is G.

Proof. [30, Proposition 3.4., page 81]. For more information about α-stable L´evymotions confer also [32, page 113].

Remark 2.19. There is no difficulty in using the same kind of linearly interpolated random variables as in Theorem 2.16 also in Theorem 2.17 to remain in the context of C[0, 1]. This is on the other hand not possible for Theorem 2.18, since it is not possible to define X as an element of C[0, 1] when α < 2 (confer [32, Exercises 9.5-9.6, Chapter 9]).

For simplicity we state the following theorem in the context of D[0, 1]. The function h which occurs could be defined on the entire function space; it then assigns the amount of the unit interval where a given function takes positive values.

n Theorem 2.20. Let {X }n≥1 be random elements in D[0, 1] defined as in Theorem 2.17. Define random variables by:

n 1 X h(Xn) = σ2I{S > 0}. s2 k k n k=1 Then h(Xn) −→d A, as n → ∞,

14 where A denotes the arc sine law, that is, the distribution concentrated on [0, 1] satisfying: 2 √ P (A ≤ t) = arcsin t, 0 < t < 1. π Proof. [6, Section 8] and [6, Section 14].

We finally state a local limit theorem related to the results of this section. A more general version than the one given below concerns i.i.d. random variables with some lattice distribution and finite variance. Confer [17, Theorem 7.6, Chapter 7, page 365] or [15, §49].

Theorem 2.21. Suppose that {Xk}k≥1 is an sequence of i.i.d. random vari- ables assuming only integer values. Assume that every integer a is a possible P value of Sn := 1≤k≤n Xk for all sufficiently large n. Assume moreover that 2 EX1 = 0 and that Var X1 = σ < ∞. Then, uniformly over all integers a,

√  1 nP Sn = a → √ as n → ∞. σ 2π Proof. [15, §49].

2.3 Summation methods: linear transformations The concept of Summation method originates from questions concerning how to assign limits to divergent sums. A classical treatise on the subject is [18]. We will here rather be interested in assigning limits to divergent sequences, but the problems are more or less equivalent. Typically the divergence is due to oscillation in some form. We shall only consider a certain type of such methods, namely certain kinds of linear transformations of sequences ∞ (as elements in R ). We associate a summation method T to a double sequence of real numbers, {cm,n}m,n≥1, and say that a sequence {sn}n≥1 is summable to s (T ), whenever

∞ X tm = cm,nsn → s, as m → ∞. n=1 One important concept is that of Regularity. A method T is said to be regular whenever usual convergence of a sequence, sn → s, implies that sn is summable to s by T . The following theorem is associated with T oeplitz and originally proved in the beginning of the 20th century.

Theorem 2.22. In order that T should be regular, it is necessary and suf- ficient that P (i) γm = n≥1 |cm,n| < H, for some constant H independent of m;

15 (ii) cm,n → 0, for each n, when m → ∞; P (iii) cm = n≥1 cm,n → 1, when m → ∞. Proof. [18, Theorem 2, Chapter 3, page 43] or [24, Theorem 4.10-1, page 270]. The former reference proves necessity by counterexamples, while the latter uses results from functional analysis.

We shall mostly be interested in an application where {pn}n≥1 is a non- Pm negative sequence with p0 > 0. Set Pm = n=0 pn, assume that Pm → ∞, as m → ∞, and set

cm,n = pn/Pm, when n ≤ m, cm,n = 0, when n > m.

The method associated to {cm,n} will be denoted by (N,¯ pn). One example is pn = n + 1 which gives Ces`arosummation or arithmetic means. The pn-sequence will be called Weights and gives rise to weighted means tm.

Theorem 2.23. The method (N,¯ pn) is regular. Proof. This could be established in an elementary way without any reference to Theorem 2.22. The conditions (i)-(iii) are anyhow easily verified. γm = cm = 1, verifying (i) and (iii). To verify (ii), cm,n = pn/Pm → 0, as m → ∞.

In fact, one may go a bit further and prove that methods of the type (N,¯ pn) are Totally regular. This means that we add the condition that sn → ∞, implies that sn is summable to ∞ (T ). Confer [18, Theorem 10, Chapter 3, page 53]. Much investigation has been made on how different summation meth- ods relate to each other. The following theorem gives such results for the methods recently defined. P P Theorem 2.24. If pn > 0, qn > 0, pn = ∞, qn = ∞, and either:

(a) qn+1/qn ≤ pn+1/pn; or

(b) pn+1/pn ≤ qn+1/qn and Pn/pn ≤ HQn/qn, for some constant H, then sn → s (N,¯ pn) implies sn → s (N,¯ qn). Proof. [18, Theorem 14, Chapter 3, page 58].

16 The relation Strength and the concept of Equivalence, applied to couples of methods, refer to situations as in the conclusion of the theorem, that is to relations between the two sets of summable sequences. A special case, which we shall encounter later, is where two methods defined by bounded weight sequences pn and qn are related so that pn ∼ qn as n → ∞. Do they necessarily give rise to equivalent summation methods? This question is answered in the following proposition. The proof of the second part is inspired by the proof of Theorem 2.24 found in [18].

Theorem 2.25. Assume that two methods (N,¯ pn) and (N,¯ qn) with bounded weights pn and qn are related so that pn ∼ qn as n → ∞. Then

(i) sn → s (N,¯ pn) ⇐⇒ sn → s (N,¯ qn), for bounded sequences sn;

(ii) There exists sequences pn and qn giving rise to non-equivalent methods.

Proof. It follows that Pn ∼ Qn. Indeed, for  > 0 small, take N such that for k ≥ N, qk ≤ (1 + )pk. Then for m > N: Pm PN Pm k=1 qk k=1 qk (1 + ) k=N+1 pk Pm ≤ Pm + Pm . k=1 pk k=1 pk k=1 pk The first term tends to zero as m → ∞, while the second is bounded by 1 + . Reversing inequalities and replacing  by −, yields the result. For (i), assume {sn} bounded and that sn → s (N,¯ pn). We must now prove sn → s (N,¯ qn) which is equivalent to showing that 1 X qksk → s, as n → ∞. (2.24) Pn k≤n We then decompose 1 X 1 X 1 X qksk = pksk + (qk − pk)sk, Pn Pn Pn k≤n k≤n k≤n and it remains to prove that the second term to the right tends to zero as n tends to infinity. But for  > 0 small, take N large such that for k ≥ N,

|qk − pk| ≤ pk.

We use boundedness of sn to conclude that

1 X 1 X (qk − pk)sk ≤ C |qk − pk| Pn Pn k≤n k≤n n 1  X X  ≤ C |qk − pk| +  pk Pn k≤N k=N+1  1 X  ≤ C |qk − pk| +  . Pn k≤N

17 The conclusion follows since Pn → ∞ and since  was arbitrary. (−1)k For (ii), take pk = 1 and qk = 1 + log k for k ≥ 2 and q0 = q1 = 1. They indeed satisfy the conditions. Let, for any sequence {sk}k≥1, 1 X 1 X tm = pksk, um = qksk. Pm Qn k≤m k≤n It follows that

1  q0 q1 qm  um = P0t0 + (P1t1 − P0t0) + ··· + (Pmtm − Pm−1tm−1) Qm p0 p1 pm m X = cm,ntn, n=0 where    qn − qn+1 Pn , n < m,  pn pn+1 Qm q cm,n = n+1 Pn , n = m,  pn+1 Qm  0, n > m. We may now come to the desired conclusion by verifying that some of con- ditions (i)-(iii) in Theorem 2.22 are violated for this double sequence {cm,n}. In fact, for n < m   1 1 Pn |cm,n| = + , log n log(n + 1) Qm so that X 1 X  1 1  |c | > n + m,n Q log n log(n + 1) n . m log n log(n + 1) m log n n

18 Proposition 2.26. If pn > 0 and sn → s (N,¯ pn), then

sn − s = o(Pn/pn).

Proof. The proof is found in [18, Theorem 13, Chapter 3, page 57] and may be given in one line:

pnsn = Pntn − Pn−1tn−1 = s(Pn − Pn−1) + o(Pn) = spn + o(Pn). 

It is a characteristic feature that the the strength of methods (N,¯ pn) is negatively related to the speed with which Pm tends to infinity. This is present in Theorem 2.24 and Proposition 2.26. The following proposition further illustrates this. It gives an necessary upper bound on this speed for the method to sum something else than convergent sequences. P Proposition 2.27. If Pn+1/Pn ≥ 1 + δ > 1, for some δ, then an cannot be summable (N,¯ pn) unless it is convergent. Proof. [18, Theorem 15, Chapter 3, page 59].

3 Almost Sure Converging Means of Random Vari- ables

The main result of this chapter is Theorem 3.1. The situation resembles The for random variables of finite variance, although no assumptions are made here on independence or orthogonality but instead of boundedness. Moreover we no longer restrict to arithmetic means, but allow rather arbitrary weight sequences.

3.1 Bounds for variances of weighted partial sums The proof of the following theorem is fairly simple in principle and consists of two parts. The procedure is known as the ”method of taking subsequences” or the ”gap method”. It should however be noted that the second part of such proofs usually is the hardest one, demanding some analysis and use of ”maximal inequalities”. This will not enter here, although such reasoning would be possible also in this more general situation. For further discussion confer Remark 3.4. The arguments in the proof are to a large extent implicit in [4], confer also [11] for a similar result. A general result in the case of arithmetic means and no assumption on boundedness, may be found in [19].

Theorem 3.1. Let {ξk}k≥1 be a sequence of random variables, uniformly bounded below and with finite variances, and let {dk}k≥1 be a sequence of

19 positive numbers. Set for n ≥ 1 D := Pn d and T := 1 Pn d ξ . n k=1 k n Dn k=1 k k Assume that Dn → ∞ and Dn+1/Dn → 1, (3.1) as n → ∞. If for some  > 0, C and all n

2 −1 −2 ETn ≤ C(log Dn) (log2 Dn) , (3.2) then a.s. Tn −→ 0 as n → ∞. (3.3) For the proof we shall need the following lemma.

Lemma 3.2. Let {dk}k≥1 and {Dn}n≥1 be as in Theorem 3.1. Then for each a > 1 there exists a subsequence {nk}, such that

k Dnk ∼ a , as k → ∞. (3.4)

Proof. Choose N large so that n > N implies that Dn+1/Dn < a. For k ≤ N take nk = k. For k > N define

k nk = inf{n : Dn ≥ a }.

k k Now nk+1 > nk since Dnk ≥ a , Dnk−1 < a implies that

Dnk k k+1 Dnk = Dnk−1 < aa = a . Dnk−1 Moreover, it also follows for k > N that

Dnk Dnk Dnk−1 Dnk 1 ≤ k = k < . a Dnk−1 a Dnk−1

The desired conclusion follows since Dn/Dn−1 → 1. Proof of Theorem 3.1. Let a > 1 and apply Lemma 3.2 with this a. Equation (3.2) for this subsequence then becomes

ET 2 ≤ Ck−1(log k)−2 , for some constant C. (3.5) nk

k This follows since (3.4) implies that Dnk /a remains bounded. We hence have ∞ ∞ ∞ X X X 1 E T 2 = ET 2 ≤ C < ∞, nk nk k log2 k k=1 k=1 k=1 since the summands are positive random variables. A fortiori P∞ T 2 < k=1 nk ∞, almost surely, which in turn implies that T 2 −→a.s. 0 which finally implies nk that a.s. Tnk −→ 0, as k → ∞. (3.6)

20 Consider now an arbitrary n and assume that nk < n ≤ nk+1. From (3.4) it follows that

Dnk+1 /Dnk → a. (3.7)

By assumption there exists an M such that −M < ξk, for all k. Define, for n ≥ 1, n 0 1 X Tn = dk(ξk + M) = Tn + M. Dn k=1 Then by (3.6) T 0 −→a.s. M, as k → ∞. (3.8) nk Moreover, by positivity D T 0 ≤ D T 0 ≤ D T 0 , nk nk n n nk+1 nk+1 which gives that D Dn nk T 0 ≤ T 0 ≤ k+1 T 0 . nk n nk+1 Dnk+1 Dnk 0 a.s. From (3.7) and (3.8) it then follows, by letting a → 1, that Tn −→ M so a.s. that Tn −→ 0 as n → ∞ as desired. 

Remark 3.3. The random variables ξk will usually have zero expectation 2 in applications of the theorem, so that ETn denotes the variance of Tn. If (3.2) holds for random variables not obeying this assumption, it also does for the sequence of centralized random variables. Indeed the variance of a random variable X satisfies Var X = inf E[X − a]2. a∈R The two conclusions from the theorem would then imply that the sequence of means of the original random variables is zero summable for this particular summation method. Remark 3.4. One could try to weaken (3.2) even further by choosing a subsequence {Dnk } not obeying ”Dnk+1 /Dnk bounded”. More classically one could then try to replace the second part of the proof by proving appropriate bounds for:

Mk := sup |Sm − Snk |, nk≤m

21 Remark 3.5. In order to relax Almost sure convergence to Convergence in probability in the conclusion of the theorem, one could also weaken assump- tion (3.2). The following assumption would, for example, be sufficient:

2 −γ ETn ≤ C(log Dn) , for some γ > 0 and constant C. (3.9) By the standard Chebyshev-inequality it suffices that the right hand side of (3.2) tends to zero as n tends to infinity.

3.2 Bounds for covariances among individual variables We shall now consider a way to obtain both weight sequences and estimates as in Theorem 3.1 by means of covariance estimates among individual ran- dom variables. Condition (3.10) below gives a restriction in the scope of weight se- quences {dk} to be considered in comparison with Theorem 3.1. In the preceeding theorem it was assumed that dk = o(Dk) while condition (3.10) gives a minimum rate of this convergence.

Theorem 3.6. Let {ξk}k≥1 be as in Theorem (3.1); {ck}k≥1 a nondecreasing positive sequence with c1 = 1 and ck → ∞. Define X dk = log (ck+1/ck),Dn := dk = log (ck+1), 1≤k≤n and assume that for some constant C and all k ≥ 1 −1 −2 dk ≤ CDk log Dk log2 Dk . (3.10)

ee Assume further that for some constant C and for 1 ≤ k < l, cl/ck > e , −1 −2 |E(ξkξl)| ≤ C log2 (cl/ck) log3 (cl/ck) . (3.11) For other indices 1 ≤ k < l assume that

|E(ξkξl)| ≤ C. (3.12)

Then condition (3.2) will be satisfied with Dn and Tn as in Theorem 3.1 and the conclusion of Theorem 3.1 follows. Proof. Since for any numbers

n n 2  X  X 2 X ak = ak + 2 akal, k=1 k=1 1≤k

22 n  X 2 X ak ≥ akal. (3.14) k=1 1≤k≤l≤n

If we apply (3.13) and the triangle inequality with ak = dkξk we arrive at n  X 2 X E dkξk ≤ 2 dkdl|E(ξkξl)|. (3.15) k=1 1≤k≤l≤n What we need to show is hence that

X −1 −2 2 dkdl|E(ξkξl)| ≤ C(log Dn) (log2 Dn) Dn, (3.16) 1≤k≤l≤n for some constant C. For those indices in the left hand sum where cl/ck ≥ 1/2 exp (Dn ) equation (3.11) gives: ∗ −1 −2 |E(ξkξl)| ≤ C (log Dn) (log2 Dn) . The sum over such indices is hence majorized by

∗ −1 −2 X ∗ −1 −2 2 C (log Dn) (log2 Dn) dkdl ≤ C (log Dn) (log2 Dn) Dn. 1≤k≤l≤n The last inequality is an application of (3.14). For remaining indices k, l where l lies in:  1/2 l : k ≤ l ≤ n and cl/ck < exp (Dn ) =: Ak , we can apply (3.12) which by (3.11) generalizes to all indices l and k. We note that the fact that {ck} is increasing implies that for all k considered Ak = {l : k ≤ l ≤ nk} for some nk ≤ n. We thereby get that X dl = log (cnk+1/ck), l∈Ak and by (3.10) and the definition of Ak:

log (cnk+1/ck) = log (cnk+1/cnk ) + log (cnk /ck) −1 −2 1/2 ≤ CDn(log Dn) (log2 Dn) + Dn 0 −1 −2 ≤ C Dn(log Dn) (log2 Dn) . (3.17) Now putting things together, the left hand side in (3.16) for indices where 1/2 cl/ck < exp (Dn ), is majorized by: n X X 0 2 −1 −2 C dk dl ≤ CC Dn(log Dn) (log2 Dn) . k=1 l∈Ak Inequality (3.16) is therefore satisfied for some constant C. The conclu- sion of Theorem 3.1 now follows since the sequence {dk} fulfills the desired conditions by assumptions on {ck}.

23 Remark 3.7. A stronger assumption than (3.10) is that ck+1/ck = O(1). We now derive two consequences of Theorem 3.6. α Theorem 3.8. If (3.11) and (3.12) are fulfilled for ck = k and some α > 0 then equivalently to (3.3), n 1 X ξk a.s. −→ 0 log n k k=1 Proof. We have  1  1 d = α log 1 + ∼ α , as k → ∞. (3.18) k k k Pn 1 This implies that Dn ∼ α k=1 k ∼ α log n as n → ∞. We may obviously disregard the occurrence of α (set α = 1). By Theorem 2.24 we may prove equivalence between the methods (N,¯ dk) and (N,¯ 1/k) as defined in Section 2.3 by either showing (i) or (ii):

(i) (k + 1)dk+1 ≤ kdk; (3.19)

(ii) (k + 1)dk+1 ≥ kdk. (3.20) In fact (3.19) holds, since the function g, ! k + 1k g(k) := k log (1 + 1/k) = log , k

k+1 k is decreasing. Indeed, one easily verifies that ( k ) decreases as k increases. The logarithmic function is on the other hand increasing.

α Theorem 3.9. If (3.11) and (3.12) are fulfilled for ck = (log k) and some α > 0 then equivalently to (3.3), n 1 X ξk a.s. −→ 0. log n k log k 2 k=1 Proof.  log (k + 1) − log k  d = α log 1 + k log k log (k + 1) − log k  1 ∼ α ∼ α , as k → ∞. log k k log k Pn 1 This implies that Dn ∼ α k=1 k log k ∼ α log2 n. As in Theorem 3.8 we need to show either (i) or (ii):

(i) dk+1(k + 1) log (k + 1) ≤ dkk log k; (3.21)

(ii) dk+1(k + 1) log (k + 1) ≥ dkk log k. (3.22)

24 Define log (x + 1) f(x) = x log x log , log x g(x) = x log (1 + 1/x)

We need to show that f is either decreasing or increasing and in the proof of Theorem 3.8 we showed that g is decreasing. Now x log x x log x f(x) = y log (1 + 1/y) = g(y), y y log x y = , log (1 + 1/x) x log x = x log (1 + 1/x) = g(x). y y is increasing in x so that f is the product of two decreasing functions, and hence decreasing. In conclusion, (3.21) holds.

3.3 Refinements with respect to weight sequences

Condition (3.23) below is stronger than (3.11) in Theorem 3.6 with ck = k. That is, Theorem 3.8 with log-summation is valid in this case. The next theorem shows that one may with this condition improve on the result by choosing other sequences {Dn}n≥1 tending faster to infinity. Confer Proposition 3.13 for results concerning in what sense one may call this an improvement.

Theorem 3.10. Let {ξk}k≥1 be as in Theorem (3.1). Assume further that for some positive constants C, γ and all 1 ≤ k ≤ l,

−γ |E(ξkξl)| ≤ C(l/k) . (3.23)

1 Then condition (3.2) is satisfied for any α ∈ [0, 2 ) and condition (3.9) in Remark 3.5 is satisfied for all α ∈ [0, 1) with  1    d := log 1 + exp (log k)α , k k and Dn and Tn defined by these as before. It follows that: 1 T −→a.s. 0, as n → ∞ for all α ∈ [0, ), n 2 p Tn −→ 0, as n → ∞ for all α ∈ [0, 1).

First we need a lemma concerning these weight sequences.

25 Lemma 3.11. For α ∈ [0, 1] and Dn above, there exists some constant C > 0 such that 1−α  α Dn ∼ C(log n) exp (log n) . Proof. The boundary cases are well-known with C = 1. Assume that α ∈ (0, 1) and define:   F (x) = (log x)1−α exp (log x)α , 1   f(x) = exp (log x)α . x

Then dk ∼ f(k). Together with Dn → ∞ it follows, just as in the beginning of the proof of Theorem 2.25, that n n X X Dn = dk ∼ f(k). (3.24) k=1 k=1 On the other hand  α (1 − α)  F 0(x) = exp (log x)α + (log x)−α = αf(x) + g(x), x x with  (1 − α)  g(x) := exp (log x)α (log x)−α . x Clearly g = o(f). Together with f, g ≥ 0 this implies that Z n Z n F (n) = F (n) − F (1) = F 0(x)dx ∼ αf(x)dx. (3.25) 1 1 To prove this, take for  > 0 small, N such that

x ≥ N ⇒ g(x) ≤ αf(x).

Then for y > N: R y R N R y 1 g(x)dx 1 g(x)dx N αf(x)dx R y ≤ R y + R y . 1 (αf(x) + g(x))dx 1 (αf(x) + g(x))dx 1 (αf(x) + g(x))dx The first term tends to zero since F (y) → ∞ as y → ∞. The second is bounded by . This proves (3.25). It now remains to prove that n X Z n f(k) ∼ f(x)dx. (3.26) k=1 1 For x such that log x ≥ 1, it follows that  α(log x)α−1 1   α 1  α − 1 f 0(x) = exp (log x)α − ≤ x − = < 0. x2 x2 x2 x2 x

26 The function f is hence decreasing for x ≥ 3, say. Hence

n n+1 X Z n X f(k) ≤ f(x)dx ≤ f(k). k=3 3 k=3 R n Since F (n) → ∞ implies k=1 f(x)dx → ∞ by (3.25), it only remains to show that f(n + 1) Pn → 0. k=1 f(k) 1 But this is obvious since 0 ≤ f(k) ≤ 1. Our proof is complete and C = α , except for α = 0 where C = 1.

Proof of Theorem 3.10. The proof will be similar to the proof of Theorem 3.6. Condition (3.1) is satisfied since {dk}k≥1 is bounded, indeed,

 1     1  log 1 + exp (log k)α ≤ log 1 + exp (log k) k k  1  = k log 1 + ∼ 1. k

As before we need to show that

X −1 −2 2 dkdl|E(ξkξl)| ≤ C(log Dn) (log2 Dn) Dn, (3.27) 1≤k≤l≤n for some constant C. Now consider indices l, k satisyfing

2/γ l/k ≥ (log Dn) and their complement separately. For the first set condition (3.23) trans- forms into −2 |E(ξkξl)| ≤ C(log Dn) , so that −2 X −2 2 C(log Dn) dkdl ≤ C(log Dn) Dn. 1≤k≤l≤n The last inequality follows from (3.14). For the complement we proceed similarly as in Theorem 3.6. Define

n 2/γo n o Ak := l : k ≤ l ≤ n and l/k < (log Dn) = l : k ≤ l ≤ nk .

Inequality (3.17) is replaced by 2 log ((n + 1)/k) ≤ log 2 + log D ≤ C log D , (3.28) k γ 2 n 2 n

27 We shall also use the fact that for α > 0:

 α ∗ −(1−α)/α exp (log n) ∼ C Dn(log Dn) , (3.29) for some constant C∗. This comes from Lemma 3.11. Indeed,

α (1−α)/α 1−α log Dn ∼ (log n) ⇒ log Dn ∼ (log n) .

We proceed:

n n X X X X 1   d d = d log (1 + ) exp (log l)α k l k l k=1 A k=1 A n X X 1   ≤ d log (1 + ) exp (log n)α k l k=1 A n  α X ≤ C exp (log n) dk log2 (Dn) k=1  α = C exp (log n) Dn log2 (Dn) ∗ 2 −(1−α)/α ≤ CC Dn log2 (Dn)(log Dn) 00 2 − 1−α + ≤ C Dn(log Dn) α .

1−α The last inequality holds for all  > 0. Now α < 1/2 gives α = 1 + δ, while α < 1 only implies that 1−α > 0. This completes the proof. α  Remark 3.12. There seems to be an open question whether 1 > α ≥ 1/2 permits any Almost sure conclusions. Negative results in the case α = 1 follow from Theorem 2.20. Berkes claims that Proposition 3.10 holds for arbitrary sequences ck (with properties as in Theorem 3.6) with weights

 α dk := log (ck+1/ck) exp (log ck) .

To verify this one needs to prove a generalization of Lemma 3.11 and to show that dk satisfies (3.1). The rest of the proof would be similar. The family of summation methods defined in Theorem 3.10 ranges be- tween log summation (α = 0) and Ces`aro summation (α = 1). The strength of these methods decreases as α increases, as we shall see in the following proposition. In fact, in the formulation of Theorem 2.24 there is no claim of proving any strict difference of strength between methods. But this seems possible to prove here and in other situations of the same type by continuing the proof in [18], which is based on Theorem 2.22. Confer the end of the proof of the next proposition.

28 α  Proposition 3.13. Let pk(α) = log (1 + 1/k) exp (log k ) . Then for 0 ≤ 0 0 α < α ≤ 1 and for all sequences {sn}n≥1, sn → s (N,¯ pk(α )) implies that sn → s (N,¯ pk(α)). 0 P P Proof. Set pk = pk(α ), qk = pk(α) and Pn = k≤n pk, Qn = k≤n qk. By Lemma 3.11

1 0 0 P ∼ (log n)1−α exp (log n)α , n α0 1 Q ∼ (log n)1−α exp (log n)α. n α

We investigate condition (a) of Theorem 2.24, namely qn+1/qn ≤ pn+1/pn.

qn+1/qn ≤ pn+1/pn ⇐⇒ 0 0 exp (log (n + 1))α − (log n)α ≤ exp (log (n + 1))α − (log n)α  ⇐⇒ 0 0 (log (n + 1))α − (log n)α ≤ (log (n + 1))α − (log n)α .

The last inequality follows from the mean value theorem and the fact that the derivative of f(x) = xα increases pointwise for positive x as α increases. Condition (b) of Theorem 2.24 for p and q in reversed order does on the other hand not hold. In fact for some constant C 1−α pnQn (log n) α0−α ∼ C 1−α0 = C(log n) , qnPn (log n) which tends to infinity as n tends to infinity. This indicates that there is no equivalence between methods belonging to different values of α.

4 Almost Sure Central Limit Theory

The results of the previous chapter will now be used to derive Almost sure central limit theorems for sums of random variables. We use the notation Pn Sn = k=1 Xk and often presume a Central limit theorem of the form S − b n n −→d G, (4.1) an for some sequences {an}n≥1 and {bn}n≥1 and a non-trivial distribution G. So far there are no dependency assumptions whatsoever concerning the se- quence {Xk}k≥1. The following result is based on Theorem 3.6 and will be an important tool in subsequent sections. Theorem 4.1. Assume that for some distribution G, random variables {Xk}k≥1 and real sequences {an}n≥1 and {bn}n≥1 S − b n n −→d G, as n → ∞, (4.2) an

29 with an > 0. Let f be a bounded Lipschitz-function on R and define:      Sk − bk Sk − bk ξk = f − E f . ak ak Assume moreover that for some constants M and  (possibly depending on f) and all k and l obeying 1 ≤ k ≤ l, −1 −2 E(ξkξl) ≤ M log2 (cl/ck) log3 (cl/ck) , (4.3) or stronger   ck E(ξkξl) ≤ M , (4.4) cl with {ck}n≥1 positive and nondecreasing to infinity such that condition (3.10) is fulfilled.  Pn Then, for dk := log ck+1/ck , Dn := k=1 dk, n   1 X Sk − ak a.s. dkI ≤ x −→ G(x), for all x ∈ CG. (4.5) Dn bk k=1 Proof. Put

Sk − bk Yk = . ak Theorem 2.4 shows that (4.5) is equivalent to: n Z 1 X a.s. dkf(Yk) −→ fdG, Dn k=1 for all bounded Lipschitz funtions f. It follows by (4.2) and Theorem 2.1 that Z Ef(Yk) → fdG as k → ∞, for such functions f. Since the summation method defined by {dk}k≥1 is regular it follows that

n 1 X Z dkEf(Yk) → fdG. Dn k=1 It therefore remains to prove that

n 1 X  a.s. dk f(Yk) − Ef(Yk) −→ 0, Dn k=1 for all bounded Lipschitz funtions f. To this end our assumptions make sure that Theorem 3.6 applies.

30 Remark 4.2. Berkes [4, page 23] states that there is an example (due to Lifschitz) in the case cl = l of a sequence of independent random variables {X } with mean 0 and finite variances that obeys (4.2) with b = 0 and k k√≥1 n an = n and −1 E(ξkξl) ≤ M log2 (l/k) , as in (4.3) but not (4.3) itself nor the conclusion (4.5).

4.1 Independent random variables We shall now recall the results of the beginning of Section 1.2 to see when, and for which weight sequences {dk}k≥1 Theorem 4.1 may be applied. Con- dition (4.4) is crucial and will be investigated in the following proposition.

Proposition 4.3. Let Sn denote the n:th partial sum of a sequence of inde- pendent random variables {Xk}k≥1. Let {an}n≥1 and {bn}n≥1 be sequences of real numbers. Assume that {bn}n≥1 is positive. Let f be a bounded Lipschitz-function on R and define:      Sk − bk Sk − bk ξk = f − E f . (4.6) ak ak Assume further that p Sn − bn E ≤ C (4.7) an for some constant C and some p < 1. Then, for some constant M and all k and l obeying 1 ≤ k ≤ l, it follows that  p ak E(ξkξl) ≤ M . (4.8) al Proof. Let K denote a Lipschitz-constant, as well as an upper bound for f. Define further:   Sk − bk fk = f , for all k. ak   Sl − Sk − (bl − bk) fk,l = f , for all 1 ≤ k ≤ l. al

By independency assumptions, fk,l is independent of fk, for all 1 ≤ k ≤ l. It follows that

E(ξkξl) = Cov (fk, fl) = Cov (fk, fl − fk,l) 0   ≤ C E fl − fk,l . (4.9)

31 The inequality holds for some C0 depending on K, since f is bounded, and thereby also fk, fk,l and their moments. From the assumptions on f and the triangle inequality it follows that   Sk − bk fl − fk,l ≤ K ∧ 2K al Now proceeding from (4.9) gives     Sk − bk E fl − fk,l ≤ E K ∧ 2K al   1 Sk − bk = 2KE ∧ 1 . (4.10) 2 al Since what is inside the last two brackets is less than 1, it will increase by being raised by p < 1, defined at the beginning. We finish by continuing from (4.10) using hypothesis (4.7): " # 1 S − b   p 1p S − b p k k 1 Sk−bk k k E ∧ 1 ≤ E 2 a ∧ 1 ≤ E 2 al l 2 al " #  p p a p ak Sk−bk ∗ k = 2a E a ≤ C . l k al

∗ 1 p Here C = C 2 . The i.i.d. case of Part (b) of the following theorem is the original Almost sure (or everywhere) central limit theorem, Brosamler [8], Schatte [33] and Lacey and Philipp [25]. The general case of (b) seems to be due to Atlagh [1]. Part (a) may not have been given in this general form before. An early source for Part (c) is Peligrad and R´ev´esz [27].

Theorem 4.4. Let {Xk}k≥1 be a sequence of independent random variables. Then

(a) When {Xk}k≥1 have finite expectations {µk}k≥1 and finite variances 2 2 Pn 2 {σk}k≥1 such that for sn = k=1 σk, some constant C and all k ≥ 1,

2 σk  −1 −2 (a1) log 1 + 2 ≤ C(log sk)(log2 sk) (log3 sk) ; sk−1 (a2) 1 Pn (X − µ ) −→d N, sn k=1 k k as n → ∞, then for all real x

n S − P µ  1 X 2 2 k 1≤i≤k k a.s. 2 log sk+1/sk I ≤ x −→ Φ(x). (4.11) log s sk n k=1

32 (b) When {Xk}k≥1 have finite expectations {µk}k≥1 and finite variances 2 2 Pn 2 {σk}k≥1 such that for sn = k=1 σk,

2 n σk 1 X d max 2 → 0, and (Xk − µk) −→ N, 1≤k≤n s sn n k=1 as n → ∞ (cf. Theorem 2.7), then for all real x

n 2  P  1 X σk+1 Sk − 1≤i≤k µk a.s. 2 2 I ≤ x −→ Φ(x). (4.12) log s s sk n k=1 k

(c) When {Xk}k≥1 are identically distributed and in the domain of attrac- tion of a stable distribution G of index α, 0 < α ≤ 2 so that S − b n n , −→d G as n, → ∞ an

for some positive sequence {an} and real sequence {bn}, (cf. Theorem 2.8), then for all real x n   1 X 1 Sk − bk a.s. I ≤ x −→ G(x). (4.13) log n k ak k=1

2 2 (d) The sequence {pk}k≥1, pk := log (sk+1/sk) may in (4.11) be replaced by any other sequence {qk}k≥1, such that qk ∼ pk, as k → ∞. The corresponding conclusion also holds with respect to (4.12), that is, with 2 2 {qk}k≥1 such that qk ∼ σk+1/sk and with respect to (4.13), that is, with {qk}k≥1 such that qk ∼ 1/k. Proof. To begin with, note that Theorem 2.25 (i) applies so that conclu- sion (d) holds if (a), (b) and (c) are proved with weight sequences {pk}k≥1, 2 2 {σk+1/sk}k≥1 and {1/k}k≥1 respectively, or any other ∼-equivalent sequences. For part (a), condition (4.7) in Proposition 4.3 is satisified even for p = 2 2 which suggests that we choose ck = sk in (4.4) of Theorem 4.1, yielding 2 2 weights dk = log (sk+1/sk). Condition (a1) corresponds to condition (3.10) in Theorem 3.6 concerning the behavior of the weight sequence. Conclusion (a) thereby follows from Theorem 4.1. For part (b), which is a subcase of part (a), note that

 2   2  2 sk+1 σk+1 σk+1 log 2 = log 1 + 2 ∼ 2 . (4.14) sk sk sk Indeed 2 σk+1 2 → 0, as k → ∞. (4.15) sk

33 2 2 Arguing by contradiction, statement σk+1 > sk implies that

2 2  2  σk+1 sk σk+1 2 > 2 =  1 − 2 . sk+1 sk+1 sk+1 We hence get 2 σk+1  2 > = δ. (4.16) sk+1 1 +  From (4.16) we may contradict condition (ii) of Theorem 2.7, that is, if (4.15) does not hold. This proves (4.14). Furthermore n n X 2 2 X 2 2 2 2 2  σk+1/sk ∼ log sk+1/sk = log sn+1/s1 ∼ log sn . (4.17) k=1 k=1

2 2 2 2 2 Indeed sn+1 ∼ sn, which combined with sn → ∞ gives log (sn+1) ∼ log (sn). The first asymptotic equivalence is a consequence of (4.14). Conclusion (b) is thereby proved by the results (a) and (d). For part (c), condition (4.7) in Proposition 4.3 is satisified for p < α, according to Theorem 2.14. We hence get, for 1 ≤ k ≤ l,

a  l  1 L(l) l = α , (4.18) ak k L(k) with L slowly varying at infinity. We would now like to show that condition (4.4) with this sequence {ak}k≥1 is both stronger than the same condi- tion with {ak}k≥1 replaced by some of {ck}k≥1 sequences of Theorem 3.8 and weaker than the same condition with {ak}k≥1 replaced by some other {ck}k≥1 sequences of Theorem 3.8. (Confer statements (4.19) and (4.20) below.) This would give (4.13) as a natural conclusion in view of the same theorem. In fact, Karamata’s representation theorem of slowly varying functions (see e.g. [7, page 12]), tells us that

 Z x  L(x) = c(x)exp (u)du/u , 0 with c(x) → c ∈ (0, ∞) and (x) → 0 as x → ∞. We may therefore conclude that L(l)  l M ≤ A , (4.19) L(k) k for some constants M and A and all 1 ≤ k ≤ l. Indeed, |(x)| ≤ M gives

Z l (u)du/u ≤ M log (l/k), k

34 and c(x) bounded below and above gives c l ≤ A. ck This proves (4.19). On the other hand, introducing

1 f(n) = n 2α L(n), it follows that a  l  1 f(l) l = 2α , ak k f(k) and it remains to prove that f(l) ≥ B, (4.20) f(k) for some positive constant B and all l, k such that l > k. Now by [7, Theorem 1.5.3, page 23] it follows that

f(x) = inf{f(t): t ≥ x} ∼ f(x) as x → ∞, so that for some constant K and all k > K, f(l) f(k) ≥ ≥ 1/2. f(k) f(k) It also follows [7, Proposition 1.5.1, page 22] that

f(x) → ∞ as x → ∞, so that for some constant N and all l > N and k < K, f(l) f(l) > max f(k) =⇒ ≥ 1, k≤K f(k) We may finally define n o B = min f(l)/f(k) ∨ 1/2, k,l≤K∧N since we now take the minimum over a finite set.

One could investigate further the range of summation methods (N,¯ pn), as defined in Section 1.3, which permits conclusions of the form (4.11), (4.12) and (4.13) under the conditions (a), (b) and (c) of Theorem 4.4. Of special interest may be how weak summation methods the conclusion permits, with respect to log - and Ces`arosummation. We shall here move in this direction for the case (c) by making use of Theorem 3.10.

35 Theorem 4.5. Let {Xk}k≥1 be a sequence of i.i.d. random variables in the domain of attraction of a stable distribution G of index α, 0 < α ≤ 2 so that S − b n n −→d G, as n → ∞, an for some positive sequence {an}n≥1 and real sequence {bn}n≥1. Let, for γ ∈ [0, 1],

(γ) 1   X d := exp (log k)(γ) ,D(γ) := d . k k n k 1≤k≤n Then, for any γ ∈ [0, 1/2) and all real x, n   1 X (γ) Sk − bk a.s. lim d I ≤ x = G(x). (4.21) (γ) k n→∞ ak Dn k=1 For any γ ∈ [0, 1) and all real x, n   1 X (γ) Sk − bk p lim d I ≤ x = G(x). (4.22) (γ) k n→∞ ak Dn k=1 Proof. From (4.18) and (4.19) above and Proposition 4.3 we see that con- dition (4.4) in Theorem 4.1 is fulfilled with ck = k. Theorem 3.10 thereby applies for the type of {ξk}-sequences defined in Theorem 4.1. We here have replaced each weight sequence {dk} by an asymptotic equivalent, but this does not disturb the result according to Theorem 4.4 (d). Conclusions (4.21) and (4.22) now follow in the same manner as in Theorem 4.1.

The following theorem gives an example of the situation in (b) of Theo- rem 4.4. It is related to the theory of Records and Extremes. It follows from (iii) below that usual log-summation is not enough in this case.

Theorem 4.6. Let {Xk} be a sequence of i.i.d. continuous random variables and define

$$I_k = I\{X_k > X_l \text{ for all } l < k\}, \qquad \mu(n) = \sum_{k=1}^{n} I_k.$$
Then as $n \to \infty$,

(i) $\dfrac{\mu(n) - \log n}{\sqrt{\log n}} \xrightarrow{d} N$;

(ii) $\dfrac{1}{\log_2 n}\displaystyle\sum_{k=2}^{n} \frac{1}{k\log k}\, I\Bigl\{\frac{\mu(k) - \log k}{\sqrt{\log k}} \le x\Bigr\} \xrightarrow{a.s.} \Phi(x)$, for all $x$;

(iii) $\dfrac{1}{\log n}\displaystyle\sum_{k=1}^{n} \frac{1}{k}\, I\{\mu(k) - \log k > 0\} \xrightarrow{d} A$, where $A$ denotes the arc sine distribution.

Proof. It can be shown that the $I_k$ are independent Bernoulli$(1/k)$-random variables [17, page 93], so that $E\mu(n) \sim \log n$ and $\operatorname{Var}\mu(n) \sim \log n$. Moreover, the conditions of Theorem 2.7 may be verified, yielding (i), cf. [17, page 351]. Statement (ii) follows from Theorem 4.4 (b) and (d) since

$$\frac{\operatorname{Var} I_{k+1}}{\operatorname{Var}\mu(k)} = \frac{\frac{1}{k+1}\bigl(1-\frac{1}{k+1}\bigr)}{\operatorname{Var}\mu(k)} \sim \frac{1}{k\operatorname{Var}\mu(k)} \sim \frac{1}{k\log k}.$$

Finally, from Theorem 2.20 it follows that

$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr) I\{\mu(k) - \log k > 0\} \xrightarrow{d} A.$$
This is equivalent to (iii) since

$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k^2}\, I\{\mu(k) - \log k > 0\} \le \frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k^2},$$
i.e. the difference tends to zero as $n \to \infty$, and by the general fact (Cramér's theorem) for sequences of random variables: $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{p} 0$ imply that $X_n + Y_n \xrightarrow{d} X$.
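
Since the $I_k$ are independent Bernoulli$(1/k)$ variables, the quantities in Theorem 4.6 are easy to simulate. The sketch below checks (i) and (ii) numerically at $x = 0$ (an illustration only; the $\log_2 n$ normalization in (ii) makes the convergence there extremely slow, so the numbers mainly illustrate the construction).

    import numpy as np

    # Record indicators I_k ~ Bernoulli(1/k), independent; mu(k) = I_1 + ... + I_k.
    rng = np.random.default_rng(1)
    n = 200_000
    k = np.arange(1, n + 1)

    I = rng.random(n) < 1.0 / k
    mu = np.cumsum(I)
    logk = np.log(k)

    # (i): (mu(n) - log n)/sqrt(log n) is approximately standard normal
    print((mu[-1] - logk[-1]) / np.sqrt(logk[-1]))

    # (ii) at x = 0: weights 1/(k log k), normalization log log n, sum from k = 2
    w = 1.0 / (k[1:] * logk[1:])
    lhs = np.sum(w * ((mu[1:] - logk[1:]) / np.sqrt(logk[1:]) <= 0.0)) / np.log(logk[-1])
    print(lhs, 0.5)   # compare with Phi(0)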

4.2 Weakly dependent random variables

By relaxing the assumption of independence we encounter different characterizations and definitions of Weakly dependent sequences which may behave asymptotically like independent sequences and may allow central limit theorems of the form (4.1). Important classes are those of Strongly (or $\alpha$-) mixing, and $\rho$-mixing sequences. Let $\{X_n\}_{n\ge1}$ be a sequence of random variables and denote the $\sigma$-field generated by the random variables $\{X_j : m \le j \le n\}$ by $\mathcal{F}_m^n$. We then take as defining properties, respectively:
(i) $\alpha(n) := \sup_{k\ge1}\{|P(A\cap B) - P(A)P(B)| : A \in \mathcal{F}_1^k,\ B \in \mathcal{F}_{k+n}^{\infty}\} \to 0$;
(ii) $\rho(n) := \sup_{k\ge1}\Bigl\{\dfrac{\operatorname{Cov}(X,Y)}{(EX^2)^{1/2}(EY^2)^{1/2}} : X \in L_2(\mathcal{F}_1^k),\ Y \in L_2(\mathcal{F}_{k+n}^{\infty})\Bigr\} \to 0$,
as $n \to \infty$. Another class is Associated sequences. The defining property for a sequence $\{X_n\}_{n\ge1}$ is that for any $n \ge 1$ and any coordinatewise increasing functions $f, g : \mathbb{R}^n \to \mathbb{R}$ we have

(iii) $\operatorname{Cov}\bigl(f(X_1,\dots,X_n),\, g(X_1,\dots,X_n)\bigr) \ge 0$,
whenever the left hand side is well-defined. We shall state results from [28] concerning these classes with respect to conclusions of the form (c) of Theorem 4.4. Proofs will be indicated through

results of previous chapters but in close connection to those in [28]. For an overview of some further results consult [3, Chapter 6]. To begin with we state one result from [25] which is not related to assumptions (i)-(iii) but which, on the other hand, depends on a different convergence result than the usual central limit theorem. It applies to broad classes of sequences of random variables (mixing, martingale differences, lacunary trigonometric etc.), cf. [25].

Theorem 4.7. Let $\{X_n\}_{n\ge1}$ be a sequence of real random variables whose partial sums $S_n$ permit the approximation
$$S_n - \sum_{k\le n} Y_k = o(\sqrt{n}), \quad\text{a.s. as } n \to \infty, \qquad (4.23)$$
by a sequence of i.i.d. standard normal random variables $\{Y_n\}_{n\ge1}$. Then
$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, I\Bigl\{\frac{S_k}{\sqrt{k}} \le x\Bigr\} \xrightarrow{a.s.} \Phi(x), \quad\text{for all } x.$$
Proof. In view of Theorem 2.4 we need to show that
$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, f\bigl(S_k/\sqrt{k}\bigr) \xrightarrow{a.s.} \int_{\mathbb{R}} f\,dN, \qquad (4.24)$$
for any bounded Lipschitz function $f$. When $X_k = Y_k$, this is nothing but Theorem 4.4 (b). For the general case, put $\tilde S_n = \sum_{k\le n} Y_k$. Then, as $k \to \infty$,
$$\Bigl|f\Bigl(\frac{S_k}{\sqrt{k}}\Bigr) - f\Bigl(\frac{\tilde S_k}{\sqrt{k}}\Bigr)\Bigr| \le C\,\Bigl|\frac{S_k}{\sqrt{k}} - \frac{\tilde S_k}{\sqrt{k}}\Bigr| = \frac{C}{\sqrt{k}}\bigl|S_k - \tilde S_k\bigr| = o(1), \qquad (4.25)$$
by (4.23) and the Lipschitz continuity of $f$. Since
$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, f\bigl(\tilde S_k/\sqrt{k}\bigr) \xrightarrow{a.s.} \int_{\mathbb{R}} f\,dN, \qquad (4.26)$$
and since $(\bar N, 1/n)$ is regular (cf. Theorem 2.23), statement (4.24) now follows from (4.25) and (4.26).

Theorem 4.8. Let $\{X_n\}_{n\ge1}$ be a stationary strong mixing sequence satisfying $EX_1 = 0$, $EX_1^2 < \infty$, $\sigma_n^2 = ES_n^2 \to \infty$ and $\alpha(n) = O\bigl((\log n)^{-\gamma}\bigr)$ for some $\gamma > 0$. Assume that
$$S_n/\sigma_n \xrightarrow{d} N. \qquad (4.27)$$
Then
$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, I\Bigl\{\frac{S_k}{\sigma_k} \le x\Bigr\} \xrightarrow{a.s.} \Phi(x), \quad\text{for all } x. \qquad (4.28)$$

Theorem 4.9. Let $\{X_n\}_{n\ge1}$ be a stationary associated sequence with $EX_1 = 0$ and $\sum_{k=1}^{\infty} EX_1X_k < \infty$. Let $\sigma_n^2 = ES_n^2$; then conclusions (4.27) and (4.28) hold.
Proof sketches of Theorems 4.8 and 4.9. By Theorem 2.4 we need to show that
$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, f\bigl(S_k/\sigma_k\bigr) \xrightarrow{a.s.} \int_{\mathbb{R}} f\,dN, \qquad (4.29)$$
for any bounded Lipschitz function $f$. By the same arguments as in Theorem 4.1 and by using assumption (4.27) (which also holds by the assumptions in Theorem 4.9, cf. [28]) we reduce (4.29) to

$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,\Bigl(f\bigl(S_k/\sigma_k\bigr) - Ef\bigl(S_k/\sigma_k\bigr)\Bigr) \xrightarrow{a.s.} 0.$$
Using Theorem 3.1 we would be done by establishing
$$\operatorname{Var}\Bigl(\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, f\bigl(S_k/\sigma_k\bigr)\Bigr) = O\bigl((\log_2 n)^{-1}(\log_3 n)^{-2}\bigr). \qquad (4.30)$$
Peligrad and Shao [28] establish the stronger statement that

$$\operatorname{Var}\Bigl(\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, f\bigl(S_k/\sigma_k\bigr)\Bigr) = O\bigl((\log n)^{-\epsilon}\bigr),$$
for some $\epsilon > 0$, using the assumptions of Theorems 4.8 and 4.9.

From central limit theorems for mixing sequences one may now deduce the following two theorems as corollaries of Theorem 4.8 (cf. [28] and further references therein).

Theorem 4.10. Let $\{X_n\}_{n\ge1}$ be a stationary strong mixing sequence with $EX_1 = 0$. Assume that
$$E|X_1|^{2+\delta} < \infty \ \text{ for some } \delta > 0, \quad\text{and}\quad \sum_{n=1}^{\infty} \alpha^{\delta/(2+\delta)}(n) < \infty. \qquad (4.31)$$
Then $\sigma^2 = EX_1^2 + 2\sum_{k=2}^{\infty} EX_1X_k < \infty$. If in addition $\sigma^2 > 0$, then (4.28) is true.

Theorem 4.11. Let $\{X_n\}_{n\ge1}$ be a stationary $\rho$-mixing sequence with $EX_1 = 0$, $EX_1^2 < \infty$. Assume that $\sigma_n^2 \to \infty$ and
$$\sum_{n=1}^{\infty} \rho(2^n) < \infty. \qquad (4.32)$$
Then (4.28) holds.

Remark 4.12. Condition (4.31) combined with the condition $\sigma^2 > 0$ is essentially sharp in the context of $\alpha$-mixing sequences with respect to the central limit theorem (4.27). That is, counterexamples exist where (4.31) is slightly violated and where (4.27) does not hold. The same is true of condition (4.32) in the context of $\rho$-mixing sequences. Confer Theorems 1.7 and 2.3 with subsequent comments in the survey article [26].
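
For a concrete numerical illustration of Theorem 4.8 one can take a stationary 1-dependent sequence, which is strong mixing with $\alpha(n) = 0$ for $n \ge 2$; the particular example and parameters below are my own choices and are not taken from the sources cited above.

    import numpy as np
    from math import erf, sqrt

    # X_k = (eps_k + eps_{k+1})/sqrt(2) with i.i.d. standard normal eps:
    # stationary, 1-dependent, EX_1 = 0, and sigma_n^2 = E S_n^2 = 2n - 1.
    rng = np.random.default_rng(9)
    n, x = 500_000, 0.0

    eps = rng.standard_normal(n + 1)
    X = (eps[:-1] + eps[1:]) / np.sqrt(2.0)
    S = np.cumsum(X)
    k = np.arange(1, n + 1)
    sigma = np.sqrt(2.0 * k - 1.0)

    lhs = np.sum((S / sigma <= x) / k) / np.log(n)
    print(lhs, 0.5 * (1.0 + erf(x / sqrt(2.0))))   # compare with Phi(x)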

4.3 Subsequences

We now return to independent random variables to prove a result which in fact is a consequence of Theorem 4.4 (a). The idea is to consider subsequences of partial sums in order to prove almost sure results with the weaker, but perhaps more convenient, arithmetic (or Cesàro) summation. We confine ourselves to the situation of i.i.d. random variables with finite variances. One may proceed in the same manner to derive similar results for all types of sequences of independent random variables considered in Theorem 4.4. The following theorem was introduced and proved by Schatte [33] under the extra condition $E|X_1|^3 < \infty$. Atlagh and Weber [2] proved it in the form in which it is given below.

Theorem 4.13. Let $\{X_n\}_{n\ge1}$ be a sequence of i.i.d. random variables with $E(X_1) = 0$ and $E(X_1^2) = 1$. Set $S_n = \sum_{k=1}^{n} X_k$; then
$$\frac{1}{n}\sum_{k=1}^{n} I\Bigl\{\frac{S_{2^k}}{\sqrt{2^k}} \le x\Bigr\} \xrightarrow{a.s.} \Phi(x), \quad\text{for all } x. \qquad (4.33)$$

Proof. Put, for $k \ge 2$, $Y_k := S_{2^k} - S_{2^{k-1}}$, $Y_1 := S_2$, and put, for $k \ge 1$, $b_k := \sqrt{2^k}$. Then $\{Y_k\}_{k\ge1}$ is a sequence of independent random variables,
$$S_{2^n} = \sum_{i=1}^{n} Y_i =: \tilde S_n \quad\text{and}\quad \frac{\tilde S_n}{b_n} \xrightarrow{d} N, \text{ as } n \to \infty.$$

From Theorem 4.4 (a) applied to the random variables $Y_k$ it follows that
$$\frac{1}{D_n}\sum_{k=1}^{n} d_k\, I\Bigl\{\frac{\tilde S_k}{b_k} \le x\Bigr\} \xrightarrow{a.s.} \Phi(x), \quad\text{for all } x, \qquad (4.34)$$
with $d_k := \log\bigl(b_{k+1}/b_k\bigr) = \tfrac{1}{2}\log 2$ and $D_n := \sum_{k=1}^{n} d_k = \tfrac{n}{2}\log 2$. Since $\{d_k\}$ is bounded it follows that condition (a1) of Theorem 4.4 is fulfilled. Statement (4.34) is equivalent to (4.33).
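
Statement (4.33) is also easy to check numerically. The following minimal sketch (an illustration only, here with standard normal summands) evaluates the Cesàro average along the subsequence $2^k$ at $x = 0$; with only $n$ terms in the average the agreement is necessarily rough.

    import numpy as np

    # Cesaro averaging along the subsequence 2^k, cf. (4.33).
    rng = np.random.default_rng(2)
    n, x = 20, 0.0                       # uses S_{2^k} for k = 1, ..., n

    X = rng.standard_normal(2 ** n)
    S = np.cumsum(X)
    idx = 2 ** np.arange(1, n + 1)

    lhs = np.mean(S[idx - 1] / np.sqrt(idx) <= x)
    print(lhs, 0.5)                      # compare with Phi(0)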

4.4 An almost sure version of Donsker's theorem

Just as Theorems 2.16 and 2.17 provide extensions of classical central limit theorems to Functional central limit theorems, we shall now consider extensions of the almost sure results Theorem 4.4 (a) and (b).

Lemma 4.14. Let $\{\xi_k\}_{k\ge1}$ be a sequence of independent square integrable random variables with partial sums $S_n$, and let $\mu_k = E\xi_k$ and $s_n^2 = \sum_{1\le k\le n} \operatorname{Var}\xi_k$. Then, for some constant $C$,

$$E\Bigl(\max_{1\le k\le n}\Bigl|S_k - \sum_{i=1}^{k} \mu_i\Bigr|\Bigr) \le C s_n. \qquad (4.35)$$
Proof. By the Kolmogorov inequality [17, Chapter 3, Theorem 1.6] it follows that, for $x > 0$,

$$P\Bigl(\max_{1\le k\le n}\Bigl|S_k - \sum_{i=1}^{k} \mu_i\Bigr| > x\, s_n\Bigr) \le x^{-2}. \qquad (4.36)$$
The right hand side of (4.36) is integrable at infinity, which proves (4.35).

Lemma 4.15. Let $\{\xi_k\}_{k\ge1}$ be a sequence of i.i.d. and $F$-distributed random variables, with $F$ belonging to the domain of attraction of a stable distribution $G$ of index $\alpha$, for some $0 < \alpha \le 2$. Assume moreover that
$$\frac{S_n - b_n}{a_n} \xrightarrow{d} G, \qquad (4.37)$$
for some sequences $\{a_n\}_{n\ge1}$ and $\{b_n\}_{n\ge1}$. Then, for all $\beta < \alpha$ and some constant $C$,
$$E\Bigl(\max_{1\le k\le n} \frac{|S_k - b_k|^{\beta}}{a_n^{\beta}}\Bigr) \le C. \qquad (4.38)$$
Proof. We first assume that $F$ and $G$ are symmetric (which implies that $\{b_n\}_{n\ge1}$ may be left out). By Proposition 2.13 (the Lévy inequalities) it follows that
$$P\Bigl(\max_{1\le k\le n} \frac{|S_k|}{a_n} > x\Bigr) \le 2P\Bigl(\frac{|S_n|}{a_n} > x\Bigr). \qquad (4.39)$$
The integral of the right hand side of (4.39) against $x^{\beta-1}$ is finite and bounded uniformly in $n$ by Theorem 2.14, so that the same is true of the left hand side, proving (4.38) in the symmetric case. For the non-symmetric case we apply a strong symmetrization inequality [17, Proposition 6.3, Chapter 3]:
$$P\Bigl(\max_{1\le k\le n}\Bigl|\frac{S_k - b_k}{a_n} - \operatorname{med}\Bigl(\frac{S_k - b_k}{a_n}\Bigr)\Bigr| \ge x\Bigr) \le 2P\Bigl(\max_{1\le k\le n}\frac{|S_k^{s}|}{a_n} \ge x\Bigr).$$
Since
$$\Bigl|\operatorname{med}\Bigl(\frac{S_k - b_k}{a_n}\Bigr)\Bigr| \le C\,\Bigl|\operatorname{med}\Bigl(\frac{S_k - b_k}{a_k}\Bigr)\Bigr| \le C\,2^{1/\beta}\Bigl(E\Bigl|\frac{S_k - b_k}{a_k}\Bigr|^{\beta}\Bigr)^{1/\beta} \le C',$$

for all $\beta < \alpha$ and some constants, by [17, Proposition 6.1, Chapter 3] and Theorem 2.14, it follows by the triangle inequality that for some positive constant $a$,
$$P\Bigl(\max_{1\le k\le n}\frac{|S_k - b_k|}{a_n} \ge x\Bigr) \le 2P\Bigl(\max_{1\le k\le n}\frac{|S_k^{s}|}{a_n} \ge x - a\Bigr). \qquad (4.40)$$
The first part of the proof now gives that also the integral of the left hand side of (4.40) against $x^{\beta-1}$ is finite and bounded uniformly in $n$, proving (4.38) in the non-symmetric case.

We can now state and prove our result. The proof given here is an adaptation of the one given in [25] for the i.i.d. case.

Theorem 4.16. Let $W$ denote Wiener measure on $C[0,1]$. Let $\{\xi_k\}_{k\ge1}$ be a sequence of independent random variables satisfying the conditions of Theorem 2.16. Let $\{s_n^2\}_{n\ge1}$ and $\{S_n\}_{n\ge1}$ be the partial sums of variances and variables respectively, and define for $n \ge 1$

$$X_t^{n}(\omega) = \frac{S_k(\omega)}{s_n}, \quad\text{for } t = \frac{s_k^2}{s_n^2},\ k = 1,\dots,n,$$
and linearly interpolated for $t \in [0,1]$ in between. Then, as $n \to \infty$,

$$\frac{1}{\log s_n^2}\sum_{k=1}^{n}\frac{\sigma_{k+1}^2}{s_k^2}\,\delta_{X^k} \Rightarrow W, \quad\text{almost surely.}$$
Proof. We proceed in analogy with the proofs of Theorems 4.1 and 4.4. Let $d_k = \sigma_{k+1}^2/s_k^2$ and $D_n = \log(s_n^2)$. By Theorem 2.4 we need to show that
$$\frac{1}{D_n}\sum_{k=1}^{n} d_k\, f\bigl(X^{k}\bigr) \xrightarrow{a.s.} \int f\,dW,$$
for all bounded Lipschitz functions $f$ defined on $C[0,1]$. Theorem 2.17 (through Remark 2.19) and Theorem 2.1 give that
$$Ef\bigl(X^{k}\bigr) \to \int f\,dW,$$
so that it remains to show that
$$\frac{1}{D_n}\sum_{k=1}^{n} d_k\,\bigl(f(X^{k}) - Ef(X^{k})\bigr) \xrightarrow{a.s.} 0,$$
for all bounded Lipschitz functions $f$. Let

$$\zeta_k = f\bigl(X^{k}\bigr) - Ef\bigl(X^{k}\bigr).$$

Just as in Theorem 4.4 it suffices to prove that for some constant $M$ and all $k$ and $l$ obeying $1 \le k \le l$,

$$\bigl|E(\zeta_k\zeta_l)\bigr| \le M\,\frac{s_k}{s_l}. \qquad (4.41)$$
Define for $0 \le t \le 1$ the random element

$$r_{k,l}(t) = \begin{cases} 0, & 0 \le t \le s_k^2/s_l^2, \\ X_t^{l} - S_k/s_l, & s_k^2/s_l^2 \le t \le 1, \end{cases}$$
and $f_{k,l} := f(r_{k,l})$. Then $r_{k,l}$ depends only on $\xi_{k+1},\dots,\xi_l$, is independent of $X^{k}$, and moreover

$$X_t^{l} - r_{k,l}(t) = \begin{cases} X_t^{l}, & 0 \le t \le s_k^2/s_l^2, \\ S_k/s_l, & s_k^2/s_l^2 \le t \le 1. \end{cases}$$

Therefore, with $f_k := f(X^{k})$ and $f_l := f(X^{l})$ and by the regularity of $f$, we have that

$$\bigl|E(\zeta_k\zeta_l)\bigr| = \bigl|\operatorname{Cov}(f_k, f_l)\bigr| = \bigl|\operatorname{Cov}(f_k, f_l - f_{k,l})\bigr| \le C\,E|f_l - f_{k,l}|$$

$$\le C'E\bigl(\|X^{l} - r_{k,l}\|_{\infty}\bigr) = C'E\Bigl(\max_{i\le k}\frac{|S_i|}{s_l}\Bigr) = C'\,\frac{1}{s_l}\,E\Bigl(\max_{i\le k}|S_i|\Bigr).$$
To arrive at (4.41) we finally apply the inequality in the conclusion of Lemma 4.14.

Remark 4.17. Theorem 4.7 could easily be extended to the form of Theorem 4.16 by means of the conclusion of Theorem 4.16. Consult [25] for further details.
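
For i.i.d. standard normal $\xi_k$ (so that $s_k^2 = k$ and $d_k = 1/k$), the conclusion of Theorem 4.16 can be illustrated numerically through the functional $w \mapsto \sup_{[0,1]} w$, whose limit law is $P(\sup_{[0,1]} W \le x) = 2\Phi(x) - 1$ by the reflection principle. The following is a minimal sketch only; the supremum of the polygonal path $X^k$ is attained at the grid points, so no interpolation is needed.

    import numpy as np
    from math import erf, sqrt

    # Log-averaged empirical distribution of sup X^k, compared with P(sup W <= x).
    rng = np.random.default_rng(3)
    n, x = 100_000, 1.0

    S = np.cumsum(rng.standard_normal(n))
    k = np.arange(1, n + 1)
    path_sup = np.maximum(np.maximum.accumulate(S), 0.0) / np.sqrt(k)

    lhs = np.sum((path_sup <= x) / k) / np.log(n)
    Phi = 0.5 * (1.0 + erf(x / sqrt(2.0)))
    print(lhs, 2.0 * Phi - 1.0)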

The following theorem, as well as its proof, is modeled on the preceding one. The result may however be new. Note that we no longer consider convergence of measures on C[0, 1] but on D[0, 1] (cf. Remark 2.19).

Theorem 4.18. Let $F$, $G$, $\{\xi_k\}_{k\ge1}$, $\{S_n\}_{n\ge1}$, $\{a_n\}_{n\ge1}$, $\{b_n\}_{n\ge1}$ and $X \in D[0,1]$ be as in Lemma 4.15. Let further $X^{n}$ be the random function on $[0,1]$ defined by
$$X_t^{n}(\omega) = \frac{S_k(\omega) - b_k}{a_n}, \quad\text{for } t \in \Bigl[\frac{k}{n}, \frac{k+1}{n}\Bigr),\ k = 1,\dots,n.$$
Then, as $n \to \infty$,

$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,\delta_{X^k} \Rightarrow X, \quad\text{almost surely.}$$

Proof. Let $d_k = 1/k$ and $D_n = \log n$. By Theorem 2.4 we need to show that
$$\frac{1}{D_n}\sum_{k=1}^{n} d_k\, f\bigl(X^{k}\bigr) \xrightarrow{a.s.} \int f\,dX,$$
for all bounded Lipschitz functions $f$ defined on $D[0,1]$. Theorem 2.18 gives that
$$Ef\bigl(X^{k}\bigr) \to \int f\,dX,$$
so that it remains to show that
$$\frac{1}{D_n}\sum_{k=1}^{n} d_k\,\bigl(f(X^{k}) - Ef(X^{k})\bigr) \xrightarrow{a.s.} 0.$$
Let $\zeta_k = f(X^{k}) - Ef(X^{k})$. Just as in Theorem 4.4 (b) it suffices to prove that for some constants $M$ and $p$ and all $k$ and $l$ obeying $1 \le k \le l$,
$$\bigl|E(\zeta_k\zeta_l)\bigr| \le M\Bigl(\frac{a_k}{a_l}\Bigr)^{p}.$$
Define for $0 \le t \le 1$ the random element
$$r_{k,l}(t) = \begin{cases} 0, & 0 \le t \le k/l, \\ X_t^{l} - \dfrac{S_k - b_k}{a_l}, & k/l \le t \le 1, \end{cases}$$
and $f_{k,l} := f(r_{k,l})$. Then $r_{k,l}$ depends only on $\xi_{k+1},\dots,\xi_l$, is independent of $X^{k}$, and moreover
$$X_t^{l} - r_{k,l}(t) = \begin{cases} X_t^{l}, & 0 \le t \le k/l, \\ \dfrac{S_k - b_k}{a_l}, & k/l \le t \le 1. \end{cases}$$
Now let $d(\cdot,\cdot)$ be the metric of $D[0,1]$ and note that

$$d(x, y) \le \|x - y\|_{\infty},$$
for any two elements $x, y$ of $D[0,1]$. Let further $K$ be a Lipschitz constant, as well as an upper bound, for $f$, and let $p < \alpha \wedge 1$ be any number. With $f_k := f(X^{k})$ and $f_l := f(X^{l})$, we therefore have that

$$\begin{aligned}
\bigl|E(\zeta_k\zeta_l)\bigr| &= \bigl|\operatorname{Cov}(f_k, f_l)\bigr| = \bigl|\operatorname{Cov}(f_k, f_l - f_{k,l})\bigr| \le C\,E|f_l - f_{k,l}| \\
&\le C\,E\bigl(K d(X^{l}, r_{k,l}) \wedge 2K\bigr) \le C\,E\bigl(K\|X^{l} - r_{k,l}\|_{\infty} \wedge 2K\bigr) \\
&= C'E\Bigl(\max_{i\le k}\frac{|S_i - b_i|}{2a_l} \wedge 1\Bigr) \le C'E\Bigl(\Bigl(\max_{i\le k}\frac{|S_i - b_i|}{2a_l} \wedge 1\Bigr)^{p}\Bigr) \\
&\le C'E\Bigl(\max_{i\le k}\Bigl(\frac{|S_i - b_i|}{2a_l}\Bigr)^{p}\Bigr) = C'\,\frac{a_k^{p}}{(2a_l)^{p}}\,E\Bigl(\max_{i\le k}\frac{|S_i - b_i|^{p}}{a_k^{p}}\Bigr) \le C''\Bigl(\frac{a_k}{a_l}\Bigr)^{p},
\end{aligned}$$
by Lemma 4.15.

5 Generalizations and Related Results

5.1 A universal result and some consequences

Here we collect some results which no longer merely concern central limit theory. Theorem 5.1 is due to Berkes and Csáki [4, Theorem 2] and can be proved in a similar fashion to how we arrived at Theorem 4.4. Its main condition is reminiscent of the proof of Proposition 4.3. In fact, Theorem 4.4 may be deduced from Theorem 5.1 with, for example,

$$f_l(x_1,\dots,x_l) = \Bigl(\sum_{i=1}^{l} x_i - b_l\Bigr)\Big/ a_l, \qquad
f_{k,l}(x_{k+1},\dots,x_l) = \Bigl(\sum_{i=k+1}^{l} x_i - (b_l - b_k)\Bigr)\Big/ a_l,$$
when proving statement (b). (Cf. [4, Theorem A].) When applying Theorem 5.1 one could typically start from the result

$$f_k(X_1,\dots,X_k) \xrightarrow{d} G, \quad\text{as } k \to \infty.$$
The main condition is then a measure of how little the first $k$ independent random variables influence the variables $\{f_l(X_1,\dots,X_l)\}_{l>k}$ asymptotically as $l \to \infty$. A limit relation which no longer holds when a few variables are changed is typically what Theorem 5.1 does not cover. Confer [4, pages 3-5] for further discussion.

Theorem 5.1. Let $\{X_k\}_{k\ge1}$ be independent random variables, $f_k : \mathbb{R}^k \to \mathbb{R}$, $k = 1, 2, \dots$, measurable functions, and assume that for each $1 \le k < l$ there exists a measurable function $f_{k,l} : \mathbb{R}^{l-k} \to \mathbb{R}$ such that
$$E\bigl(\bigl|f_l(X_1,\dots,X_l) - f_{k,l}(X_{k+1},\dots,X_l)\bigr| \wedge 1\bigr) \le C\bigl(\log^{+}\log^{+}(c_l/c_k)\bigr)^{-(1+\varepsilon)},$$
for some constants $C > 0$, $\varepsilon > 0$ and a positive, non-decreasing sequence $\{c_n\}_{n\ge1}$ satisfying $c_n \to \infty$ and $c_{n+1}/c_n = O(1)$ as $n \to \infty$. Put
$$d_k = \log\bigl(c_{k+1}/c_k\bigr), \qquad D_n = \sum_{1\le k\le n} d_k.$$
Then, for any distribution function $G$, the relations
$$\lim_{n\to\infty}\frac{1}{D_n}\sum_{k=1}^{n} d_k\, I\{f_k(X_1,\dots,X_k) \le x\} = G(x) \quad\text{a.s., for any } x \in C_G,$$
and
$$\lim_{n\to\infty}\frac{1}{D_n}\sum_{k=1}^{n} d_k\, P\{f_k(X_1,\dots,X_k) \le x\} = G(x), \quad\text{for any } x \in C_G,$$

are equivalent. The result remains valid if we replace the weight sequence $\{d_k\}_{k\ge1}$ by any sequence $\{d_k^{*}\}_{k\ge1}$ such that $0 \le d_k^{*} \le d_k$ and $\sum d_k^{*} = \infty$.
Proof. [4, Theorem 2].

Berkes and Csáki deduce Theorems 5.2 and 5.3 below from Theorem 5.1. Theorem 5.2 was originally proved independently in [12] and [9].

Theorem 5.2. Let $\{X_k\}_{k\ge1}$ be i.i.d. random variables such that, setting $M_k = \max_{1\le i\le k} X_i$, we have

$$a_k(M_k - b_k) \xrightarrow{d} G,$$
for some numerical sequences $\{a_n\}_{n\ge1}$ and $\{b_n\}_{n\ge1}$ and a distribution function $G$. Then
$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, I\{a_k(M_k - b_k) \le x\} = G(x) \quad\text{a.s., for any } x \in C_G.$$
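
A minimal numerical sketch of Theorem 5.2 (an illustration only, with my own choice of distribution): for i.i.d. standard exponential variables one may take $a_k = 1$ and $b_k = \log k$, and $G$ is then the Gumbel distribution $G(x) = \exp(-e^{-x})$.

    import numpy as np

    # Log-averaged indicators of normalized maxima of Exp(1) variables.
    rng = np.random.default_rng(4)
    n, x = 500_000, 0.5

    X = rng.exponential(size=n)
    M = np.maximum.accumulate(X)              # M_k = max_{i<=k} X_i
    k = np.arange(1, n + 1)

    lhs = np.sum((M - np.log(k) <= x) / k) / np.log(n)
    print(lhs, np.exp(-np.exp(-x)))           # Gumbel limit G(x)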

Theorem 5.3. Let $\{X_k\}_{k\ge1}$ be i.i.d. random variables with continuous distribution function $F$, let $F_n$ be the empirical distribution function defined by
$$F_n(x) = \frac{1}{n}\sum_{1\le k\le n} I\{X_k \le x\},$$
and let

$$D_n = \sup_{x}\bigl|F_n(x) - F(x)\bigr|$$
be the Kolmogorov statistic. Then

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, I\bigl\{\sqrt{k}\,D_k \le x\bigr\} = \sum_{j=-\infty}^{\infty} (-1)^{j} e^{-2j^2x^2} \quad\text{a.s., for any } x.$$

Theorems 5.4 and 5.5 are more recent (2002 and 2006 respectively). The latter is the “almost sure version” of the former.

Theorem 5.4. Let $\{X_k\}_{k\ge1}$ be a sequence of positive i.i.d. square integrable random variables with $E(X_1) = \mu$, $\operatorname{Var}(X_1) = \sigma^2 > 0$ and coefficient of variation $\gamma = \sigma/\mu$. Let $S_k = \sum_{i=1}^{k} X_i$, $k \ge 1$. Then
$$Y_n := \Bigl(\frac{\prod_{k=1}^{n} S_k}{n!\,\mu^{n}}\Bigr)^{1/(\gamma\sqrt{n})} \xrightarrow{d} e^{\sqrt{2}N}, \quad\text{as } n \to \infty.$$

Proof. Cf. [29].

Theorem 5.5. Let $\{X_k\}_{k\ge1}$, $\{Y_k\}_{k\ge1}$, $\mu$, $\sigma$ and $\gamma$ be as in Theorem 5.4. Then, for any real $x$,

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, I\{Y_k \le x\} = F(x) \quad\text{a.s.}, \qquad (5.1)$$
where $F$ is the distribution function of the random variable $e^{\sqrt{2}N}$.

Proof. Cf. [16].
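
As a sketch of (5.1) (numerical illustration only), take the $X_k$ standard exponential, so that $\mu = \sigma = \gamma = 1$; working with $\log Y_k$ avoids overflow, and $F(x) = \Phi\bigl(\log x/\sqrt{2}\bigr)$ for $x > 0$.

    import numpy as np
    from math import erf, log

    # Products of partial sums with Exp(1) summands (mu = sigma = gamma = 1).
    rng = np.random.default_rng(5)
    n, x = 200_000, 2.0

    X = rng.exponential(size=n)
    S = np.cumsum(X)
    k = np.arange(1, n + 1)
    logY = (np.cumsum(np.log(S)) - np.cumsum(np.log(k))) / np.sqrt(k)

    lhs = np.sum((logY <= log(x)) / k) / np.log(n)
    print(lhs, 0.5 * (1.0 + erf(log(x) / 2.0)))   # F(x) = Phi(log x / sqrt(2))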

5.2 Return times

We now use results from the previous section to derive some results concerning Return times, with respect to the origin, of the simple, symmetric random walk. We first consider two dimensions and then one dimension. The analysis is inspired by [4, pages 32-33], from which Theorem 5.9 is taken. Theorem 5.13, which we deduce here, could be viewed as the corresponding result for one dimension. Let in both cases $0 = \tau_0 < \tau_1 < \dots$ denote the successive times of return to the origin and $X_n = \tau_n - \tau_{n-1}$, $n \ge 1$, the excursion times. Naturally, $\{X_n\}_{n\ge1}$ forms a sequence of i.i.d. random variables. (Cf. Propositions 5.6 and 5.10 below for the question of well-definedness.) The following proposition dates back to 1950 and Dvoretzky and Erdős.

Proposition 5.6. Let $\tau_1$ be the time of the first return to the origin in the two-dimensional setting. Then
$$P(\tau_1 > t) \sim \frac{\pi}{\log t}, \quad\text{as } t \to \infty.$$

Proof. [31, Lemma 19.1, page 197].

Proposition 5.7. Let $M_k = \max_{1\le i\le k} X_i$, where $\{X_i\}_{i\ge1}$ denotes the sequence of excursion times in the two-dimensional setting. Then

(i) $\frac{1}{k}\log M_k \xrightarrow{d} H$;

(ii) $\frac{1}{k}\log \tau_k \xrightarrow{d} H$,
where the distribution function of $H$ is defined by

$$H(x) = \begin{cases} e^{-\pi/x}, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}$$

Proof. Since $M_k \le \tau_k \le kM_k$,

it follows that $0 \le \log\tau_k - \log M_k \le \log k$, so that
$$\frac{1}{k}\bigl(\log\tau_k - \log M_k\bigr) \to 0, \quad\text{as } k \to \infty.$$
Statements (i) and (ii) are therefore equivalent. To prove (i) we note that the random variables $\{\log M_k/k\}$ are positive since $X_1 \ge 2$. For $x > 0$ it now follows by independence and Proposition 5.6 that
$$P\Bigl(\frac{\log M_k}{k} \le x\Bigr) = P\bigl(M_k \le e^{kx}\bigr) = P\bigl(X_1 \le e^{kx}\bigr)^{k} \sim \Bigl(1 - \frac{\pi}{kx}\Bigr)^{k} \to e^{-\pi/x},$$
as $k \to \infty$.

Remark 5.8. The distribution $H$ defined in Proposition 5.7 belongs to the family of extremal distributions; it is in fact of Fréchet type. The same holds for the distribution $G$ defined in Proposition 5.11.

Theorem 5.9. Consider the two-dimensional setting. Then

(i) $\lim_{n\to\infty}\dfrac{1}{\log n}\displaystyle\sum_{k=1}^{n}\frac{1}{k}\, I\Bigl\{\frac{1}{k}\log\tau_k \le x\Bigr\} = H(x)$ a.s., for any $x$;

(ii) $\lim_{n\to\infty}\dfrac{1}{\log n}\displaystyle\sum_{k=1}^{n}\frac{1}{k}\, I\Bigl\{\frac{1}{k}\log M_k \le x\Bigr\} = H(x)$ a.s., for any $x$,
where $H$ and $\{M_k\}_{k\ge1}$ are defined in Proposition 5.7.
Proof. The result follows from Theorem 5.1, cf. [4, Theorem H, page 33].

We now turn to the corresponding results for the one-dimensional simple, symmetric random walk.

Proposition 5.10. Let τ1 be the time of the first return to the origin in the one-dimensional setting. Then

$$P(\tau_1 > t) \sim \Bigl(\frac{2}{\pi}\Bigr)^{1/2} t^{-1/2}, \quad\text{as } t \to \infty.$$
Proof. By combinatorial arguments (cf. e.g. [31, page 94]) we get that

$$P(X_1 > 2n) = 2^{-2n}\binom{2n}{n}, \qquad P(X_1 > 2n+1) = P(X_1 > 2n).$$

The result follows by Stirling approximation of the factorials.

Proposition 5.11. Let $M_k = \max_{1\le i\le k} X_i$, where $\{X_i\}_{i\ge1}$ denotes the sequence of excursion times in the one-dimensional setting. Then
$$\frac{M_k}{k^2} \xrightarrow{d} G,$$
where the distribution function of $G$ is defined by
$$G(x) = \begin{cases} \exp\Bigl(-\Bigl(\dfrac{2}{\pi x}\Bigr)^{1/2}\Bigr), & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}$$

Proof. Positivity of the random variables $M_k/k^2$ is obvious. For $x > 0$ it now follows by independence and Proposition 5.10 that
$$P\Bigl(\frac{M_k}{k^2} \le x\Bigr) = P\bigl(M_k \le k^2x\bigr) = P\bigl(X_1 \le k^2x\bigr)^{k} \sim \Bigl(1 - \frac{1}{k}\Bigl(\frac{2}{\pi x}\Bigr)^{1/2}\Bigr)^{k} \to \exp\Bigl(-\Bigl(\frac{2}{\pi x}\Bigr)^{1/2}\Bigr),$$
as $k \to \infty$.

Proposition 5.12. Consider the one-dimensional setting. Then
$$\frac{\tau_k}{k^2} \xrightarrow{d} F, \qquad (5.2)$$
where $F$ is the so-called Lévy distribution with distribution function defined by
$$F(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_0^x t^{-3/2} e^{-\frac{1}{2t}}\,dt, & \text{if } x > 0, \\ 0, & \text{if } x \le 0. \end{cases}$$

Proof. Since $\{\tau_k\}$ is a partial sum sequence of positive, i.i.d. $\tau_1$-distributed variables, (5.2) follows from Proposition 5.10 and Theorem 2.8 with preceding remarks, and by noting that the above mentioned Lévy distribution has skewness parameter $\beta = 1$ and index $\alpha = 1/2$, in the conventional terminology concerning stable distributions. A proof based on exact calculations of probabilities and Stirling approximation is also possible, confer [31, Theorem 9.11, page 99].

Theorem 5.13. Consider the one-dimensional setting. Then

(i) $\lim_{n\to\infty}\dfrac{1}{\log n}\displaystyle\sum_{k=1}^{n}\frac{1}{k}\, I\Bigl\{\frac{\tau_k}{k^2} < x\Bigr\} = F(x)$ a.s., for any $x$;

(ii) $\lim_{n\to\infty}\dfrac{1}{\log n}\displaystyle\sum_{k=1}^{n}\frac{1}{k}\, I\Bigl\{\frac{M_k}{k^2} < x\Bigr\} = G(x)$ a.s., for any $x$,
where $F$ is defined in Proposition 5.12 and $G$ and $\{M_k\}_{k\ge1}$ in Proposition 5.11.
Proof. Statements (i) and (ii) are consequences of Theorems 4.4 (c) and 5.2 respectively.
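
A rough numerical sketch of Theorem 5.13 (i) (an illustration only): simulate one long simple random walk, record its return times $\tau_k$, and compare the log-averaged indicators with the Lévy distribution function, which can also be written in closed form as $F(x) = \operatorname{erfc}\bigl(1/\sqrt{2x}\bigr)$ for $x > 0$.

    import numpy as np
    from math import erfc, sqrt

    # Return times of the one-dimensional simple symmetric random walk.
    rng = np.random.default_rng(6)
    N = 2_000_000                              # length of the simulated walk

    S = np.cumsum(rng.choice((-1, 1), size=N))
    tau = np.flatnonzero(S == 0) + 1           # tau_1 < tau_2 < ... (up to time N)
    n = len(tau)
    k = np.arange(1, n + 1)

    x = 2.0
    lhs = np.sum((tau / k**2 < x) / k) / np.log(n)
    print(n, lhs, erfc(1.0 / sqrt(2.0 * x)))   # compare with F(x)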

5.3 A local limit theorem

All results of this section are stated in Chapter 8 of the review paper [3], together with some further results and references. The following theorem was published in 1951. It may be interesting to note that the line of reasoning in the published proof is closely related to the arguments in Chapter 3.

Theorem 5.14. Let $\{X_k\}_{k\ge1}$ be a sequence of i.i.d. integer-valued random variables with $E(X_1) = 0$ and let $S_n = \sum_{k=1}^{n} X_k$ for $n \ge 1$. Assume that every integer $a$ is a possible value of $S_n$ for all sufficiently large $n$. Finally, set

$$M_k := \sum_{i=1}^{k} P(S_i = a).$$
Then
$$\lim_{n\to\infty}\frac{1}{\log M_n}\sum_{k=1}^{n}\frac{I\{S_k = a\}}{M_k} = 1 \quad\text{a.s.}$$
Proof. [22, Theorem 6].

The next result, which we may call the Almost sure local central limit theorem, follows from the preceding theorem and Theorem 2.21, as is noted in the proof below. We here also provide a proof independent of Theorem 5.14, based on the results of Chapter 3 and Theorem 2.21.

Theorem 5.15. Let $\{X_k\}_{k\ge1}$ and $\{S_n\}_{n\ge1}$ be as in Theorem 5.14. Assume moreover that $E(X_1^2) = \sigma^2 < \infty$. Then

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{I\{S_k = a\}}{k^{1/2}} = \frac{1}{\sigma}\Bigl(\frac{1}{2\pi}\Bigr)^{1/2} \quad\text{a.s.} \qquad (5.3)$$
Proof. From Theorem 2.21 it follows that, for all integers $b$,
$$P(S_n = b) \sim \frac{1}{\sigma\sqrt{2\pi n}}, \qquad (5.4)$$
and by (what we now find to be!) standard techniques we deduce that

$$M_k \sim \sigma^{-1}(2k/\pi)^{1/2} \quad\text{and}\quad \log M_k \sim (\log k)/2,$$
with $M_k$ defined as in Theorem 5.14. The desired conclusion follows from Theorem 2.25 (i) and Theorem 5.14. For the alternative proof, define for $k \ge 1$,

$$C = \frac{1}{\sigma}\Bigl(\frac{1}{2\pi}\Bigr)^{1/2}; \quad I_k = I\{S_k = a\}; \quad P_k = P(S_k = a); \quad \xi_k = \sqrt{k}\,I_k - C.$$

The desired conclusion (5.3) is equivalent to

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{\xi_k}{k} = 0 \quad\text{a.s.} \qquad (5.5)$$

The sequence {ξk}k≥1 is uniformly bounded below and each variable is square-integrable, so that the conditions of Theorem 3.1 are fulfilled. Through Theorem 3.8, statement (5.5) follows if we show e.g. that

$$\bigl|E(\xi_k\xi_l)\bigr| \le M\Bigl(\frac{k}{l}\Bigr)^{1/2}, \qquad (5.6)$$
for some constant $M$ and all $1 \le k \le l$. Now
$$E(\xi_k\xi_l) = \sqrt{k}\sqrt{l}\,P(S_k = a,\, S_l = a) + C^2 - C\sqrt{l}\,P_l - C\sqrt{k}\,P_k
= \sqrt{k}\sqrt{l}\,P_k\,P(S_{l-k} = 0) + C^2 - C\sqrt{l}\,P_l - C\sqrt{k}\,P_k.$$

Applying (5.4) with $b = a$ and $b = 0$ we get that
$$\bigl|E(\xi_k\xi_l)\bigr| \le DC^2\Bigl|\sqrt{k}\sqrt{l}\,\frac{1}{\sqrt{k}\sqrt{l-k}} + 1 - 1 - 1\Bigr| = C'\Bigl|\frac{\sqrt{l}}{\sqrt{l-k}} - 1\Bigr|,$$
for some constant $D$ and $C' = DC^2$. Finally
$$\frac{\sqrt{l}}{\sqrt{l-k}} = \Bigl(1 + \frac{k}{l-k}\Bigr)^{1/2} \le 1 + \frac{\sqrt{k}}{\sqrt{l-k}} \le 1 + \sqrt{2}\,\frac{\sqrt{k}}{\sqrt{l}},$$
so that
$$\bigl|E(\xi_k\xi_l)\bigr| \le \sqrt{2}\,C'\Bigl(\frac{k}{l}\Bigr)^{1/2},$$
proving (5.6) as desired.
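
A minimal numerical sketch of (5.3) with $a = 0$ (an illustration only; the step distribution is my own choice): a walk with steps uniform on $\{-1, 0, 1\}$ is aperiodic, has $EX_1 = 0$ and $\sigma^2 = 2/3$, and thus satisfies the assumptions of Theorems 5.14 and 5.15 directly.

    import numpy as np
    from math import pi, sqrt

    # Almost sure local CLT (5.3) with a = 0 for an aperiodic integer-valued walk.
    rng = np.random.default_rng(7)
    n = 2_000_000

    S = np.cumsum(rng.integers(-1, 2, size=n))   # steps uniform on {-1, 0, 1}
    k = np.arange(1, n + 1)

    lhs = np.sum((S == 0) / np.sqrt(k)) / np.log(n)
    print(lhs, 1.0 / (sqrt(2.0 / 3.0) * sqrt(2.0 * pi)))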

We now investigate the consequences for the simple, symmetric random walk in one dimension. As noted in the proof below, there is a connection to Theorem 5.13 (i) of the preceding section.

Theorem 5.16. Let $\{\xi_k\}_{k\ge1}$ be a sequence of i.i.d. random variables with $P(\xi_1 = 1) = P(\xi_1 = -1) = 1/2$ and let $S_n = \sum_{1\le k\le n} \xi_k$. Let $0 = \tau_0 < \tau_1 < \dots$ denote the successive subscripts where $S_k = 0$. Then

(i) $\lim_{n\to\infty}\dfrac{1}{\log n}\displaystyle\sum_{k=1}^{n}\Bigl(\frac{1}{\tau_k}\Bigr)^{1/2} = \Bigl(\dfrac{2}{\pi}\Bigr)^{1/2}$ a.s.;

(ii) $\lim_{n\to\infty}\dfrac{1}{\log \tau_n}\displaystyle\sum_{k=1}^{n}\Bigl(\frac{1}{\tau_k}\Bigr)^{1/2} = \Bigl(\dfrac{1}{2\pi}\Bigr)^{1/2}$ a.s.;

(iii) $\lim_{n\to\infty}\dfrac{\log \tau_n}{\log n} = 2$ a.s.

Proof. It suffices to prove two of the three statements above. Since (iii) follows from Theorem 11.6 in [31], it suffices to prove (ii). Statement (ii) now follows from Theorem 5.15 since

$$\frac{1}{\log \tau_n}\sum_{k=1}^{n}\Bigl(\frac{1}{\tau_k}\Bigr)^{1/2} = \frac{1}{\log \tau_n}\sum_{k=1}^{\tau_n}\frac{I\{S_k = 0\}}{k^{1/2}},$$

and since $E\xi_1^2 = 1$.
Remark 5.17. One may note the connection between (i) and Theorem 5.13. Namely, by defining

$$g(x) = x^{-1/2}, \quad\text{for } x > 0,$$
we see that statement (i) is equivalent to

$$\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,g\Bigl(\frac{\tau_k}{k^2}\Bigr) \xrightarrow{a.s.} \int_0^{\infty} f(x)g(x)\,dx = \Bigl(\frac{2}{\pi}\Bigr)^{1/2},$$
where $f(x) = \frac{1}{\sqrt{2\pi}}x^{-3/2}e^{-1/(2x)}$, $x > 0$, is the density function of the Lévy distribution. To deduce (i) immediately from Theorem 5.13 there is however a slight difficulty to overcome, namely that $g$ is not bounded.

5.4 Generalized moments in the almost sure central limit theorem

Returning to the beginning and our fundamental result (1.1), we saw (by Theorem 2.4), and also made use of, the fact that it could be equivalently stated as
$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, f\Bigl(\frac{S_k}{\sqrt{k}}\Bigr) = \int f(x)\phi(x)\,dx \quad\text{a.s.}, \qquad (5.7)$$
for any bounded, Lipschitz-continuous function $f$. The question now arises: can the result (5.7) be extended to a larger class of functions $f$? Do we need to assume more than the well-definedness of $\int f(x)\phi(x)\,dx$? The answers are in both cases affirmative. Conditions concerning continuity as well as asymptotic behavior need to be imposed. Several papers have dealt with the problem in this form (cf. [3, Chapter 3]). Theorem 5.18 below, due to Ibragimov and Lifshits, seems to be best possible at the moment. For a counterexample where condition (5.8) is not fulfilled we refer to [21].

Theorem 5.18. Let $\{X_k\}_{k\ge1}$ be a sequence of i.i.d. random variables with $EX_1 = 0$ and $EX_1^2 = 1$. Further, let $A$ and $H_0 > 0$ be constants and assume that $f : [A,\infty) \to \mathbb{R}_{+}$ is a nondecreasing function such that $f(x)\exp\{-H_0x^2\}$

is nonincreasing and $\int_A^{\infty} f(x)\phi(x)\,dx < \infty$. Then for every continuous function $h$ which satisfies

$$|h(x)| \le f(|x|), \quad |x| \ge A, \qquad (5.8)$$
we have
$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\, h\Bigl(\frac{S_k}{\sqrt{k}}\Bigr) = \int h(x)\phi(x)\,dx \quad\text{a.s.}$$
Proof. [21].

Several authors have noted and used the connection between results of the form (5.7) and the Birkhoff ergodic theorem applied to the Ornstein-Uhlenbeck process. Corollary 5.20 is indeed a “continuous version” of (5.7), with partial sums replaced by Brownian motion and sums by integrals. In this case, no extra conditions on $f$ are necessary. Theorem 5.19 below concerns the more general setting which corresponds to Theorem 4.16.

Theorem 5.19. Let $B(t)$, $t \ge 0$, be a standard Brownian motion defined on a probability space $(\Omega, \mathcal{F}, P)$. Set for each $t > 0$ and each $s \in [0,1]$

$$B_t(s) = t^{-1/2}B(st).$$

Let further W denote Wiener measure on C[0, 1]. Then, for all f ∈ L1(W ),

$$\frac{1}{\log T}\int_1^{T}\frac{1}{u}\, f(B_u)\,du \xrightarrow{a.s.} \int f\,dW, \quad\text{as } T \to \infty.$$
Proof. By changing variables, $u = e^t$, the conclusion is equivalent to

$$\frac{1}{T}\int_0^{T} f(X_t)\,dt \xrightarrow{a.s.} \int f\,dW, \quad\text{as } T \to \infty, \qquad (5.9)$$
where we define

$$X_t(s) := B_{e^t}(s) = e^{-t/2}B(se^t), \quad\text{for } s \in [0,1] \text{ and } t \ge 0. \qquad (5.10)$$

We shall deduce (5.9) from Birkhoff's ergodic theorem. We transfer the problem (cf. Billingsley [5, page 19]) and define the product space $\tilde\Omega = C[0,1]^{\mathbb{R}_{\ge0}}$ and the mapping

$$\psi : \Omega \longrightarrow \tilde\Omega, \qquad \psi(\omega) = X_{\cdot}(\cdot, \omega).$$

We then equip $\tilde\Omega$ with the $\sigma$-algebra and measure induced by $\psi$, hence defining a probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$. If we can show that $\tilde P$ is an invariant and ergodic measure with respect to the usual one-step shift operator $\tau_1$

defined on $\tilde\Omega$, then an application of Birkhoff's ergodic theorem would give that
$$\frac{1}{n}\sum_{k=0}^{n-1} g(\tau_k\tilde\omega) \xrightarrow{a.s.} \int g\,d\tilde P \quad\text{as } n \to \infty, \qquad (5.11)$$
for all $g \in L_1(\tilde\Omega)$. But as Krengel states in [23, page 10], the continuous time version
$$\frac{1}{T}\int_0^{T} g(\tau_t\tilde\omega)\,dt \xrightarrow{a.s.} \int g\,d\tilde P \quad\text{as } T \to \infty, \qquad (5.12)$$
is a consequence of the usual theorem if we (with analogous definitions) assume that $\tilde P$ is invariant and ergodic with respect to the semiflow $\{\tau_t,\ t \ge 0\}$ of $t$-step shifts. We now let $T$ be the projection on the first coordinate of $\tilde\Omega$,
$$T : C[0,1]^{\mathbb{R}_{\ge0}} \longrightarrow C[0,1], \qquad T(\tilde\omega) = \tilde\omega_0,$$
and for $f \in L_1(W)$ we define $g = f \circ T$. Now $g \in L_1(\tilde\Omega, \tilde P)$, since, for each $t \ge 0$, $X_t(\cdot)$ is a Brownian motion on $[0,1]$, and it also follows that

$$f \circ X_t = f \circ T \circ \tau_t \circ \psi = g \circ \tau_t \circ \psi.$$
By definition of $\psi$, and since $\int g\,d\tilde P = \int f\,dW$, (5.9) therefore follows from (5.12). It remains to show invariance and ergodicity of the measure $\tilde P$ with respect to the shift operators. Invariance (or stationarity) here means
$$\tilde P(\tau_u^{-1}\tilde B) = \tilde P(\tilde B), \quad\text{for all } u. \qquad (5.13)$$
Replacing ergodicity by the stronger mixing property (cf. [5, page 12]) means showing
$$\tilde P(\tilde A \cap \tau_u^{-1}\tilde B) \to \tilde P(\tilde A)\,\tilde P(\tilde B) \quad\text{as } u \to \infty. \qquad (5.14)$$
$\tilde A$ and $\tilde B$ here denote arbitrary sets in $\tilde\Omega$. Following Billingsley [6, page 194] we regard our original Brownian motion $B(t)$ as a random element in $D[0,\infty)$. The $\sigma$-algebra of this space is generated by sets of the form $\{x \in D[0,\infty) : x(t) \in I\}$, for a fixed interval $I$ and a fixed value of $t \in [0,\infty)$ (cf. [6, Theorem 12.5, page 134]). Now $\tilde P$ is likewise induced by the mapping
$$\varphi : D[0,\infty) \longrightarrow \tilde\Omega, \qquad \varphi(x)_t(s) = e^{-t/2}x(se^t), \quad\text{for } t \ge 0 \text{ and } s \in [0,1],$$
with $D[0,\infty)$ equipped with Brownian motion measure, which we denote by $P_W$. This means that both (5.14) and (5.13) only need to be considered for a restricted class of sets, namely sets $\tilde A$ and $\tilde B$ of the following type:
$$A = \{x \in D[0,\infty) : x(t_0) \in I\}, \quad \varphi^{-1}(\tilde A) = A,$$
$$B = \{x \in D[0,\infty) : x(t_1) \in J\}, \quad \varphi^{-1}(\tilde B) = B.$$

If we as usual distinguish elements in $\tilde\Omega$ through coordinates $t \ge 0$ and $s \in [0,1]$, then $\tilde B$ is characterized by

$$\tilde\omega_t(s) \in e^{-t/2}J, \quad\text{for all } t \ge 0 \text{ and } s \in [0,1] \text{ such that } s = e^{-t}t_1.$$

It follows that $\tau_u^{-1}\tilde B$ is characterized by

$$\tilde\omega_{t-u}(s) \in e^{-t/2}J, \quad\text{for all } t \ge u \text{ and } s \in [0,1] \text{ such that } s = e^{-t}t_1.$$

Therefore, denoting $\varphi^{-1}(\tau_u^{-1}\tilde B) =: B^u$, we finally get

$$B^u = \{x \in D[0,\infty) : e^{u/2}x(t_1e^{-u}) \in J\}.$$

We may then prove (5.13) since

$$\tilde P(\tau_u^{-1}\tilde B) = P_W(B^u) = P_W\bigl(e^{u/2}B(t_1e^{-u}) \in J\bigr) = P_W\bigl(B(t_1) \in J\bigr) = P_W(B) = \tilde P(\tilde B).$$

To prove (5.14) we note that

$$\tilde P\bigl(\tilde A \cap \tau_u^{-1}\tilde B\bigr) = P_W\bigl(\varphi^{-1}(\tilde A \cap \tau_u^{-1}\tilde B)\bigr) = P_W\bigl(\varphi^{-1}(\tilde A) \cap \varphi^{-1}(\tau_u^{-1}\tilde B)\bigr) = P_W\bigl(A \cap B^u\bigr)$$
$$= P_W\bigl(\{B(t_0) \in I\} \cap \{e^{u/2}B(t_1e^{-u}) \in J\}\bigr). \qquad (5.15)$$

We therefore need to show that (5.15) converges to
$$P_W\bigl(B(t_0) \in I\bigr)\,P_W\bigl(B(t_1) \in J\bigr), \qquad (5.16)$$

as $u \to \infty$. Let $X := B(t_0)$, $Y := B(t_1)$ and $Y_u := e^{u/2}B(t_1e^{-u})$. Now

$$\operatorname{Cov}(X, Y_u) = e^{u/2}\operatorname{Cov}\bigl(B(t_0), B(t_1e^{-u})\bigr) = e^{u/2}\,t_1e^{-u} = e^{-u/2}t_1,$$
for $u$ sufficiently large, and hence tends to $0$ as $u \to \infty$. Now

$$Y_u \overset{d}{=} Y \sim N(0, t_1), \quad\text{for all } u.$$
Since $(X, Y_u)$ is bivariate normal, it now follows that the density of $(X, Y_u)$ converges to the product of the marginal densities of $X$ and $Y$, proving convergence of (5.15) to (5.16) for arbitrary intervals $I$ and $J$.

Corollary 5.20. Let B(t), t ≥ 0, be a standard Brownian motion. Then, for all f ∈ L1(N),

$$\frac{1}{\log T}\int_1^{T}\frac{1}{u}\, f\Bigl(\frac{B(u)}{u^{1/2}}\Bigr)\,du \xrightarrow{a.s.} \int f(x)\phi(x)\,dx, \quad\text{as } T \to \infty.$$
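
A minimal numerical sketch of Corollary 5.20 (an illustration only), discretizing the Brownian motion on a uniform grid and taking $f(y) = I\{y \le 0\}$, for which $\int f(x)\phi(x)\,dx = 1/2$; the convergence is slow, since the effective averaging takes place over a time scale of order $\log T$.

    import numpy as np

    # Pathwise average (1/log T) int_1^T f(B(u)/sqrt(u)) du/u for f = 1{. <= 0}.
    rng = np.random.default_rng(8)
    T, m = 10_000.0, 2_000_000
    dt = T / m

    t = dt * np.arange(1, m + 1)
    B = np.cumsum(rng.standard_normal(m)) * np.sqrt(dt)

    mask = t >= 1.0
    lhs = np.sum((B[mask] / np.sqrt(t[mask]) <= 0.0) * dt / t[mask]) / np.log(T)
    print(lhs, 0.5)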

References

[1] M. Atlagh. Théorème central limite presque sûr et loi du logarithme itéré pour des sommes de variables aléatoires indépendantes. C. R. Acad. Sci. Paris Sér. I, 316:929–933, 1993.

[2] M. Atlagh and M. Weber. Un théorème central limite presque sûr relatif à des sous-suites. C. R. Acad. Sci. Paris Sér. I, 315:203–206, 1992.

[3] I. Berkes. Results and problems related to the pointwise central limit theorem. In B. Szyszkowicz, editor, Asymptotic results in Probability and Statistics, pages 59–96. Elsevier, Amsterdam, 1998.

[4] I. Berkes and E. Csáki. A universal result in almost sure central limit theory. Stochastic Process. Appl., 94(1):105–134, 2001.

[5] P. Billingsley. Ergodic Theory and Information. Tracts on Prob. and Statistics. John Wiley and Sons, 1965.

[6] P. Billingsley. Convergence of Probability Measures. John Wiley and Sons, second edition, 1999.

[7] N.H. Bingham, C.M. Goldie, and J.L. Teugels. Regular variation. Cambridge University Press, 1987.

[8] G. Brosamler. An almost everywhere central limit theorem. Math. Proc. Cambridge Phil. Soc., 104:561–574, 1988.

[9] S. Cheng, L. Peng, and Y. Qi. Almost sure convergence in extreme value theory. Math. Nachr., 190:43–50, 1998.

[10] A. de Acosta and E. Giné. Convergence of moments and related functionals in the general central limit theorem in Banach spaces. Z. Wahrsch. verw. Gebiete, 48:213–231, 1979.

[11] N. Etemadi. Stability of sums of weighted nonnegative random variables. J. Multivariate Anal., 13:361–365, 1983.

[12] I. Fahrner and U. Stadtmüller. On almost sure max-limit theorems. Statist. and Probab. Lett., 37:229–236, 1998.

[13] W. Feller. An Introduction to Probability Theory and Its Applications, Vol 1. John Wiley and Sons, second edition, 1968.

[14] W. Feller. An Introduction to Probability Theory and Its Applications, Vol 2. John Wiley and Sons, second edition, 1971.

[15] B.V. Gnedenko and A.N. Kolmogorov. Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, revised edition, 1968.

[16] K. Gonchigdanzan and G.A. Rempala. A note on the almost sure limit theorem for the product of partial sums. Appl. Math. Lett., 19:191–196, 2006.

[17] A. Gut. Probability: A Graduate Course. Springer, 2005.

[18] G.H. Hardy. Divergent series. Oxford University Press, 1949.

[19] T-C. Hu, A. Rosalsky, and A.I. Volodin. On the golden ratio, strong law, and first passage problem. Mathematical Scientist, pages 1–10, 2005.

[20] I.A. Ibragimov and M.A. Lifshits. On almost sure limit theorems. Theory Probab. Appl., 44(2):254–272, 1998.

[21] I.A. Ibragimov and M.A. Lifshits. On the convergence of generalized moments in almost sure central limit theorem. Statist. and Probab. Lett., 40:343–351, 1998.

[22] K.L. Chung and P. Erdős. Probability limit theorems assuming only the first moment I. Memoirs of the AMS, 6, 1951.

[23] U. Krengel. Ergodic Theorems. Walter de Gruyter, 1985.

[24] E. Kreyszig. Introductory Functional Analysis with Applications. John Wiley and Sons, 1978.

[25] M. Lacey and W. Philipp. A note on the almost everywhere central limit theorem. Statist. and Probab. Lett., 9:201–205, 1990.

[26] M. Peligrad. Recent advances in the central limit theorem and its weak invariance principle for mixing sequences of random variables (a survey). In E. Eberlein and M.S. Taqqu, editors, Dependence in Probability and Statistics. A Survey of Recent Results, pages 193–225. Birkhäuser, 1985.

[27] M. Peligrad and P. Révész. On the almost sure central limit theorem. In A. Bellow and R. Jones, editors, Almost everywhere convergence II, pages 209–225. Academic Press, New York, 1991.

[28] M. Peligrad and Q. Shao. A note on the almost sure central limit theorem for weakly dependent random variables. Statist. and Probab. Lett., 22(2):131–136, 1995.

[29] G. Rempala and J. Wesolowski. Asymptotics for products of sums and U-statistics. Electron. Comm. Probab., 7:47–54, 2002.

[30] S. I. Resnick. Point processes, regular variation and weak convergence. Adv. in Appl. Probab., 18:66–138, 1986.

[31] P. Révész. Random walk in random and non-random environments. World Scientific Publishing Co., 1990.

[32] G. Samorodnitsky and M. S. Taqqu. Stable Non-Gaussian Random Processes. Chapman and Hall, New York, 1994.

[33] P. Schatte. On strong versions of the central limit theorem. Math. Nachr., 137:249–256, 1988.
