Average laws in analysis

Silvius Klein
Norwegian University of Science and Technology (NTNU)

The law of large numbers: informal statement

The theoretical expected value of an experiment is approximated by the average of a large number of independent samples.

theoretical expected value ≈ empirical average

The law of large numbers (LLN)

Let X1, X2, ..., Xn, ... be a sequence of jointly independent, identically distributed copies of a scalar random variable X. Assume that X is absolutely integrable, with expectation µ.

Define the partial sum process

Sn := X1 + X2 + ... + Xn.

Then the average process

Sn / n → µ as n → ∞.

The law of large numbers: formal statements

Let X1, X2, . . . be a sequence of independent, identically distributed random variables with common expectation µ.

Let Sn := X1 + X2 + ... + Xn be the corresponding partial sums process. Then

1 (weak LLN) Sn / n → µ in probability. That is, for every ε > 0,

P( | Sn / n − µ | > ε ) → 0 as n → ∞.

2 (strong LLN) Sn / n → µ almost surely.

It was the best of times, it was the worst of times.

Charles Dickens, A Tale of Two Cities
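As an illustration (not part of the original slides), here is a minimal simulation of the strong LLN: empirical averages of fair die rolls approach the expectation µ = 3.5. The sample size and the seed are arbitrary choices.

```python
import random

random.seed(0)

# X1, X2, ... : i.i.d. rolls of a fair six-sided die, with expectation mu = 3.5
n = 100_000
samples = [random.randint(1, 6) for _ in range(n)]

# Print the running averages S_k / k at a few growing values of k
partial_sum = 0
for k, x in enumerate(samples, start=1):
    partial_sum += x
    if k in (10, 100, 10_000, 100_000):
        print(f"S_{k}/{k} = {partial_sum / k:.4f}")
```

The printed averages fluctuate for small k and settle near 3.5 as k grows, as the strong LLN predicts.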

Application of LLN: the infinite monkey theorem

Let X1, X2, . . . be i.i.d. random variables drawn uniformly from a finite alphabet.

Then almost surely, every finite phrase (i.e. finite string of symbols in the alphabet) appears (infinitely often) in the string X1X2X3 . . ..
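A small-scale numerical sketch of this statement (the alphabet, phrase, and string length below are illustrative choices, not from the slides): over a two-letter alphabet, even a moderately long random string contains any fixed short phrase many times.

```python
import random

random.seed(0)

alphabet = "ab"
phrase = "abba"          # a fixed finite phrase over the alphabet
n = 100_000              # length of the random string

text = "".join(random.choice(alphabet) for _ in range(n))

# Count (possibly overlapping) occurrences of the phrase
count = sum(1 for i in range(n - len(phrase) + 1)
            if text[i:i + len(phrase)] == phrase)
print(count)  # on the order of n / |alphabet|^len(phrase), i.e. roughly 6250
```

Each position starts a copy of the phrase with probability 1/2^4 = 1/16, so the expected number of occurrences is about n/16; the infinite monkey theorem says that in an infinite string the count is almost surely infinite.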

The second Borel-Cantelli lemma

Let E1, E2,..., En,... be a sequence of jointly independent events. If

P(E1) + P(E2) + ... + P(En) + ... = ∞,

then almost surely, infinitely many of the events En occur.

This can be deduced from the strong law of large numbers, applied to the random variables

Xk := 1_{Ek} (the indicator of the event Ek).

The actual proof of the infinite monkey theorem

Split every realization of the infinite string of symbols in the alphabet X1X2X3 ... Xn ... into finite strings S1, S2, . . . of length 52 each.

Let En be the event that the phrase

“It was the best of times, it was the worst of times.”

is exactly the n-th finite string Sn.

These are independent events. They each have the same probability p > 0 to occur.

Apply the second Borel-Cantelli lemma.

The law of large numbers

We have seen that if

X1, X2, ..., Xn, ... is a sequence of jointly independent, identically distributed copies of a scalar random variable X, and if we denote the corresponding sum process by

Sn := X1 + X2 + ... + Xn, then the average process

Sn / n → E X as n → ∞.

A rather deterministic system: circle rotations

Let S be the unit circle in the (complex) plane.

There is a natural measure λ on S (i.e. the extension of the arc-length).

Let 2πα be an angle, and denote by Rα the rotation by 2πα on S.

That is, consider the transformation

Rα : S → S,

where, if z = e^{2πi x} ∈ S and if we denote ω := e^{2πi α}, then

Rα(z) = e^{2πi (x + α)} = z · ω.

Note that Rα preserves the measure λ.

Iterations of the circle rotation

Let 2πα be an angle.

Start with a point z = e^{2πi x} ∈ S and consider successive applications of the rotation map Rα:

Rα^1(z) = Rα(z) = e^{2πi (x + α)}
Rα^2(z) = Rα ◦ Rα(z) = e^{2πi (x + 2α)}
. . .
Rα^n(z) = Rα ◦ . . . ◦ Rα(z) = e^{2πi (x + nα)}
. . .

The maps Rα^1, Rα^2, . . . , Rα^n, . . . are the iterations of Rα.

Given a point z ∈ S, the set {Rα^1(z), Rα^2(z), . . . , Rα^n(z), . . . } is called the orbit of z.

When α is irrational, the orbit of every point is dense on the circle.

This transformation satisfies a very weak form of independence called ergodicity.

An orbit of a circle rotation

Let Rα be the circle rotation by the angle 2πα, where α is an irrational number.

Pick a point z on the circle S.

The orbit of z (or rather a finite subset of it).

Observables on the unit circle

Any measurable function f : S → R is called a (scalar) observable of the measure space (S, A, λ).

We will assume our observables to be absolutely integrable.

A basic example of an observable: f = 1_I, where I is an arc (or any other measurable set) on the circle.

[Figure: an arc I on the circle, with |I| ∼ ε.]

“Observations” of the orbit points of a circle rotation.

Average number of orbit points visiting an arc

Let Rα be the circle rotation by the angle 2πα, where α is an irrational number. Let I be an arc on the circle.


The first n orbit points of a circle rotation and their visits to I.

The average number of visits to I:

#{ j ∈ {1, 2, . . . , n} : Rα^j(z) ∈ I } / n

What does this look like for large enough n? Or in other words, is there a limit of these averages as n → ∞?

As n → ∞, the average number of visits to I converges:

#{ j ∈ {1, 2, . . . , n} : Rα^j(z) ∈ I } / n → λ(I) = ∫_S 1_I dλ ,

for all points z ∈ S.

Note that the number of visits can be written as a sum of indicator functions:

#{ j ∈ {1, 2, . . . , n} : Rα^j(z) ∈ I } = ∑_{j=1}^{n} 1_I(Rα^j(z)).

Then the average number of visits to I can be written:

[ 1_I(Rα^1(z)) + 1_I(Rα^2(z)) + . . . + 1_I(Rα^n(z)) ] / n → λ(I) = ∫_S 1_I dλ .

Measure preserving dynamical systems
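The convergence above can be checked numerically. The sketch below works in the angle coordinate x, so the rotation becomes x ↦ x + α mod 1; the choices of α (the fractional part of √2), of the base point, and of the arc are arbitrary assumptions for illustration.

```python
import math

alpha = math.sqrt(2)      # irrational rotation number (taken mod 1 below)
x = 0.12345               # arbitrary starting angle in [0, 1)
arc = (0.3, 0.4)          # an arc I with lambda(I) = 0.1 (normalized length)

n = 100_000
visits = 0
for _ in range(n):
    x = (x + alpha) % 1.0          # apply the rotation R_alpha
    if arc[0] <= x < arc[1]:       # evaluate the indicator 1_I at the orbit point
        visits += 1

print(visits / n)  # close to lambda(I) = 0.1
```

The proportion of visits matches the (normalized) length of the arc, for any starting point: exactly the equidistribution stated above.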

A probability space (X, B, µ) together with a transformation T : X → X defines a measure preserving dynamical system if T is measurable and it preserves the measure of any B-measurable set: µ(T^{-1} A) = µ(A) for all A ∈ B.

Ergodic dynamical system. For any B-measurable set A with µ(A) > 0, the iterations T A, T^2 A, . . . , T^n A, . . . fill up the whole space X, except possibly for a set of measure zero.

Ergodicity leads to some very, very weak form of independence.

Some examples of ergodic dynamical systems

1 The Bernoulli shift, which encodes sequences of independent, identically distributed random variables.

2 The circle rotation by an irrational angle.

3 The doubling map.

T :[0, 1] → [0, 1], Tx = 2x mod 1.

. . .

The pointwise ergodic theorem

Given: an ergodic dynamical system (X, B, µ, T), and an absolutely integrable observable f : X → R, define the n-th Birkhoff sum

Sn f(x) := f(Tx) + f(T^2 x) + . . . + f(T^n x).

Then as n → ∞,

(1/n) Sn f(x) → ∫_X f dµ for µ-a.e. x ∈ X.

The law of large numbers

We have seen that if

X1, X2, ..., Xn, ... is a sequence of jointly independent, identically distributed copies of a scalar random variable X, and if we denote the corresponding sum process by

Sn := X1 + X2 + ... + Xn, then as n → ∞,

(1/n) Sn → E X almost surely.

An immediate application of the ergodic theorem

Let (X, B, µ, T) be an ergodic dynamical system. Let x ∈ X, and consider its orbit Tx, T^2 x, . . . , T^n x, . . .

Equidistribution of orbit points. For any B-measurable set A, the average number of orbit points that visit A converges as n → ∞:

#{ j ∈ {1, 2, . . . , n} : T^j x ∈ A } / n → µ(A) ,

for µ-almost every point x ∈ X.

Proof. Just apply the pointwise ergodic theorem to the observable f = 1_A, and note that the counting of orbit points above equals the n-th Birkhoff sum of this observable.

Another simple application of the ergodic theorem

Consider the decimal representation of every real number x ∈ [0, 1):

x = 0. x1 x2 . . . xn . . . , where the digits xk ∈ {0, 1, 2, . . . , 9}.

What is the frequency (average occurrence) of each digit in the decimal representation of a “typical” real number x ∈ [0, 1)? For instance,

#{ j ∈ {1, 2, . . . , n} : xj = 7 } / n ≈ ? as n → ∞.

Solution. Consider the dynamical system given by the 10-fold map: T : [0, 1) → [0, 1), Tx = 10x mod 1. Let f : [0, 1) → R be the observable defined as

f(x) = 1 if x1 = 7, and f(x) = 0 otherwise.

Then the counting above is the n-th Birkhoff sum of f, and the pointwise ergodic theorem gives that the frequency of the digit 7 (and likewise of every other digit) equals ∫ f dλ = 1/10, for Lebesgue-almost every x.

The law of large numbers
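A word of caution for experiments: iterating Tx = 10x mod 1 in floating point destroys the digits after about 16 steps. The sketch below instead models a “typical” x ∈ [0, 1) by drawing its decimal digits independently and uniformly, which is exactly the distribution of the digit sequence under Lebesgue measure, and counts the frequency of the digit 7. The sample size and seed are arbitrary.

```python
import random

random.seed(0)

# The digit sequence of a Lebesgue-typical x in [0, 1) is i.i.d. uniform on {0, ..., 9}
n = 100_000
digits = [random.randrange(10) for _ in range(n)]

freq_of_7 = digits.count(7) / n
print(freq_of_7)  # close to 1/10
```

Each digit occurs with frequency close to 1/10, the value ∫ f dλ predicted by the ergodic theorem.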

We have seen that if

X1, X2,..., Xn,... is a sequence of jointly independent, identically distributed scalar random variables, and if we denote the corresponding sum process by

Sn := X1 + X2 + ... + Xn, then the averages

(1/n) Sn converge almost surely as n → ∞.

Random matrices and geometric averages

Consider a sequence

M1, M2,..., Mn,... of random matrices. We assume that this sequence is independent and identically distributed.

Consider the partial products process:

Πn = Mn · ... · M2 · M1 .

Furstenberg-Kesten’s theorem. Almost surely, and as n → ∞, the “geometric averages”

(1/n) log ‖Πn‖

converge to a constant. This constant is called the Lyapunov exponent of the multiplicative process.
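A numerical sketch of this theorem, with 2×2 matrices with independent standard Gaussian entries as an arbitrary choice of distribution: the normalized log-norms (1/n) log ‖Πn‖ of the running product settle down to a constant. The product is renormalized at every step so the entries never overflow.

```python
import math
import random

def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def frobenius(A):
    return math.sqrt(sum(A[i][j] ** 2 for i in range(2) for j in range(2)))

def lyapunov_estimate(n, seed):
    """Estimate (1/n) log ||Pi_n|| for a product of i.i.d. Gaussian 2x2 matrices."""
    rng = random.Random(seed)
    P = [[1.0, 0.0], [0.0, 1.0]]       # running (renormalized) product
    log_norm = 0.0
    for _ in range(n):
        M = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
        P = matmul(M, P)
        s = frobenius(P)
        log_norm += math.log(s)        # accumulate the log of the growth factor
        P = [[P[i][j] / s for j in range(2)] for i in range(2)]  # renormalize
    return log_norm / n

# Two independent runs give nearly the same value: the Lyapunov exponent
print(lyapunov_estimate(20_000, 1))
print(lyapunov_estimate(20_000, 2))
```

Independent runs agree to within sampling error, illustrating convergence to a constant; for this particular Gaussian model the limit is in fact known in closed form (about 0.06), though that is not needed for the illustration.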

If you liked this . . .

. . . then MA3105 Advanced real analysis will cover in depth many of these topics.

You are ready to take MA3105 if this picture makes some sense to you.

MA3105 Advanced real analysis main topics

A monkey typing random stuff.

Arnold’s cat map.