The Pointwise Ergodic Theorem and its Applications

Peter Oberly 11/9/2018

Introduction

Algebra has homomorphisms and topology has continuous maps; in these notes we explore the structure preserving maps for measure theory known (somewhat unimaginatively) as measure preserving transformations. The first section contains some (but not all) of the necessary definitions for this talk and in the second we introduce some classical examples to illustrate these definitions. We then turn our attention to the dynamics of measure preserving maps which leads us to the pointwise ergodic theorem. In the final section we use the ergodic theorem to prove Borel’s theorem on normal numbers.

Definitions

Definition. A σ-algebra $\mathcal{A}$ is a collection of subsets of a non-empty set $X$ so that $X \in \mathcal{A}$ and $\mathcal{A}$ is closed under complementation and countable unions. The pair $(X, \mathcal{A})$ is called a measurable space, and elements of $\mathcal{A}$ are called measurable sets. A particularly important σ-algebra is the collection of Borel sets, defined to be the σ-algebra generated by the open subsets of a topological space $X$.

Definition. A measure $m : \mathcal{A} \to [0, \infty]$ is a function which satisfies the following:
1. $m(E) \ge 0$ for all $E \in \mathcal{A}$;
2. $m(\emptyset) = 0$;
3. If $\{E_n\}_{n=1}^{\infty} \subset \mathcal{A}$ is a sequence of pairwise disjoint sets in $\mathcal{A}$, then $m\left(\bigcup_n E_n\right) = \sum_n m(E_n)$.

A measure space is a triple $(X, \mathcal{A}, m)$ where $(X, \mathcal{A})$ is a measurable space and $m$ is a measure defined on $\mathcal{A}$. The triple $(X, \mathcal{A}, m)$ is called a probability space if $m(X) = 1$.

Definition. Let $(X, \mathcal{A}, m)$ and $(Y, \mathcal{B}, n)$ be measure spaces, and let $T : X \to Y$ be a map from $X$ into $Y$. The map $T$ is said to be measurable if $T^{-1}(E) \in \mathcal{A}$ for each $E \in \mathcal{B}$; that is, if the pre-image of every measurable set is measurable.

Definition. A measurable transformation $T : (X, \mathcal{A}, m) \to (Y, \mathcal{B}, n)$ is said to be measure preserving if $m(T^{-1}(E)) = n(E)$ for all $E \in \mathcal{B}$. If $T$ is a bijection and $T^{-1}$ is also measure preserving, then $T$ is said to be invertible. If $(X, \mathcal{A}, m)$ is a probability space, and if $T : X \to X$ is measure preserving, then the quadruple $(X, \mathcal{A}, m, T)$ is sometimes referred to as a measurable dynamical system.

Remarks:

(1) We should really write $T : (X, \mathcal{A}, m) \to (Y, \mathcal{B}, n)$, since the measure preserving property depends on both the σ-algebras and the measures, but we will often write $T : X \to Y$ instead.

(2) If $T : (X, \mathcal{A}, m) \to (Y, \mathcal{B}, n)$ and $S : (Y, \mathcal{B}, n) \to (Z, \mathcal{C}, p)$ are measure preserving, then so is $S \circ T$.

(3) Measure preserving maps are the structure preserving transformations (morphisms) of measure spaces.

(4) As such, a measure preserving map $T : X \to X$ induces a morphism on the Banach space $L^1(m)$ of $m$-integrable functions. In detail, let $U_T : L^1(m) \to L^1(m)$ be defined by $U_T(f) = f \circ T$. It is evident that $U_T$ is linear, and if $f \ge 0$ (and so is real valued), then $(U_T f)(x) = f(T(x)) \ge 0$ for $x \in X$. So $U_T$ is positive. In fact, $U_T$ is an isometry. For if $s = \sum_{k=1}^{n} a_k \chi_{A_k}$ is a non-negative simple function, where the $a_k$ are positive scalars and the $A_k$ are disjoint measurable sets whose union is the set where $s > 0$, then, since $\chi_{A_k} \circ T = \chi_{T^{-1}(A_k)}$,
\[
\int U_T(s)\,dm = \sum_{k=1}^{n} a_k \int \chi_{A_k} \circ T\,dm = \sum_{k=1}^{n} a_k\, m(T^{-1}(A_k)) = \sum_{k=1}^{n} a_k\, m(A_k) = \int s\,dm.
\]
Therefore choosing a sequence of simple functions $s_n$ which increases monotonically to $|f|$, where $f \in L^1(m)$, shows by the monotone convergence theorem that $\|U_T(f)\|_1 = \|f\|_1$. Note also that this shows $U_T$ really does map into $L^1(m)$.

(5) As we are interested in the dynamics of measure preserving maps, from now on we will restrict our attention to measurable functions $T : X \to X$. Additionally, unless otherwise stated, we will assume that $(X, \mathcal{A}, m)$ is a probability space.

Our last definition requires a bit of motivation. Let $(X, \mathcal{A}, m, T)$ be a measurable dynamical system. If $T^{-1}(E) = E$ for some $E \in \mathcal{A}$, then $T^{-1}(X \setminus E) = X \setminus E$ and we could study our system by examining the two simpler systems $(E, \mathcal{A} \cap E, m|_{\mathcal{A} \cap E}, T|_E)$ and $(X \setminus E, \mathcal{A} \cap (X \setminus E), m|_{\mathcal{A} \cap (X \setminus E)}, T|_{X \setminus E})$ (with the corresponding measures normalized appropriately). If $0 < m(E) < 1$, then we have actually decomposed our original system into two smaller ones. However, if $m(E) = 0$ or $m(X \setminus E) = 0$ (i.e. $m(E) = 1$), then one of our simpler systems is in fact trivial, and we are left with a system essentially the same as the one we started with. It follows that those measurable dynamical systems where $T^{-1}(E) = E$ implies $m(E) = 0$ or $1$ are not usefully decomposable in this way. It makes sense therefore to study those systems where such decomposition is not possible, for understanding these will enable us to understand the ones which can be simplified. We call such systems ergodic.

Definition. A measurable dynamical system $(X, \mathcal{A}, m, T)$ is said to be ergodic if $E \in \mathcal{A}$ and $T^{-1}(E) = E$ implies that $m(E) = 0$ or $1$.

We will often have a specific probability space $(X, \mathcal{A}, m)$ in mind and refer to the measure preserving transformation $T$ as ergodic. There are many characterizations of ergodicity; one which will prove useful in this talk is the following.

Theorem 1. $(X, \mathcal{A}, m, T)$ is ergodic if and only if $f \in L^1(m)$ and $f \circ T = f$ ae implies that $f$ is constant ae.

Proof. Assume that for all $f \in L^1(m)$, if $f \circ T = f$ ae then $f$ is constant ae. Let $E \in \mathcal{A}$ be so that $T^{-1}(E) = E$. Then $\chi_E \circ T = \chi_{T^{-1}(E)} = \chi_E$. As $\chi_E \in L^1(m)$, it follows that $\chi_E$ is constant ae. Therefore $\chi_E$ is either $0$ or $1$ ae and so $m(E) = 0$ or $1$. The converse is more technical, and can be found in McDonald and Weiss on page 616.
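The definition of ergodicity can be tested by brute force on a toy system. The sketch below is only an illustration under assumptions that are not in the notes: $X$ is a six point set, $m$ is the uniform probability measure, and $T$ is a permutation (which preserves $m$). The names `preimage` and `is_ergodic` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import combinations

def preimage(T, E, points):
    """Return T^{-1}(E) = {x : T(x) in E} for a map T given as a dict."""
    return frozenset(x for x in points if T[x] in E)

def is_ergodic(T, points):
    """Check the definition directly: every invariant set has measure 0 or 1.

    Assumption: m is the uniform probability measure, m(E) = |E| / |X|.
    """
    n = len(points)
    for size in range(n + 1):
        for subset in combinations(points, size):
            E = frozenset(subset)
            if preimage(T, E, points) == E:
                m = Fraction(len(E), n)
                if m not in (0, 1):
                    return False
    return True

points = list(range(6))
one_cycle = {x: (x + 1) % 6 for x in points}   # a single 6-cycle
two_cycles = {x: (x + 2) % 6 for x in points}  # evens and odds never mix

print(is_ergodic(one_cycle, points))   # True
print(is_ergodic(two_cycles, points))  # False
```

In the second system the set of even points is invariant and has measure 1/2, which is exactly the kind of decomposition described in the motivation above.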


Examples

(1) Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear map and let $m$ be the Lebesgue measure on the Borel sets of $\mathbb{R}^n$. If $T$ is singular, then the range of $T$ is a proper subspace of $\mathbb{R}^n$, which has measure zero while its pre-image is all of $\mathbb{R}^n$, and so $T$ is not measure preserving. If instead $T$ is non-singular, then from linear algebra $m(T^{-1}(E)) = m(E)/|\det T|$ for all Borel sets $E$. Therefore $T$ is a measure preserving linear map if and only if $|\det T| = 1$.
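As a small illustration of the criterion $|\det T| = 1$, the following sketch checks a few $2 \times 2$ integer matrices. The matrices and the helper names `det2` and `preserves_lebesgue_measure` are chosen here for illustration and do not come from the notes.

```python
def det2(M):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    return a * d - b * c

def preserves_lebesgue_measure(M):
    """Criterion from the text: a linear map preserves measure iff |det| = 1."""
    return abs(det2(M)) == 1

shear = ((1, 1), (0, 1))       # det = 1
reflection = ((0, 1), (1, 0))  # det = -1
scaling = ((2, 0), (0, 1))     # det = 2, so m(T^{-1}(E)) = m(E) / 2
projection = ((1, 0), (0, 0))  # singular, range is a line

for name, M in [("shear", shear), ("reflection", reflection),
                ("scaling", scaling), ("projection", projection)]:
    print(name, det2(M), preserves_lebesgue_measure(M))
```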

(2) Let $S^1 = \{z \in \mathbb{C} : |z| = 1\}$ and let $\mathcal{B}$ denote the Borel σ-algebra. Then with normalized circular Lebesgue measure $m$, the triple $(S^1, \mathcal{B}, m)$ is a probability space. For $a \in S^1$, define the rotation $T_a : S^1 \to S^1$ by $T_a(z) = az$. Then $T_a$ is measure preserving and invertible for all $a$. It is very instructive to show the following.

Theorem 2. The rotation $T = T_a$ is ergodic if and only if $a$ is not a root of unity.

Proof. Suppose that $a$ is a root of unity. Then $a^p = 1$ for some $p \ne 0$. Let $f : S^1 \to S^1$ be defined by $f(z) = z^p$. Then $(f \circ T)(z) = f(az) = a^p z^p = f(z)$ for all $z \in S^1$. Therefore $f \circ T = f$ but $f$ is non-constant. So $T$ is not ergodic by Theorem 1.

Conversely, let $A$ be a measurable subset of $S^1$ so that $T^{-1}(A) = A$. Notice that the functions $e_n : S^1 \to S^1$ defined by $e_n(z) = z^n$, $n \in \mathbb{Z}$, form an orthonormal basis for $L^2(m)$. Let the Fourier series for $\chi_A$ be $\chi_A \sim \sum_n b_n e_n$. Since $e_n(Tz) = a^n e_n(z)$, it follows by a change of variable that
\[
b_n = \int \chi_A\, e_{-n}\,dm = a^{-n} \int_{T^{-1}(A)} e_{-n}\,dm,
\]
and so $\chi_{T^{-1}(A)} \sim \sum_n a^n b_n e_n$. As $T^{-1}(A) = A$, the functions $\chi_A$ and $\chi_{T^{-1}(A)}$ are equal and thus have the same Fourier coefficients. Therefore $b_n = a^n b_n$ for all $n$. If $a$ is not a root of unity, the only way this can hold is if $b_n = 0$ for all $n \ne 0$. By the uniqueness of Fourier coefficients, $\chi_A$ is constant almost everywhere and so $m(A) = 0$ or $1$. Therefore $T_a$ is ergodic when $a$ is not a root of unity.

(3) Let $([0, 1), \mathcal{B}, m)$ be the probability space consisting of the half open unit interval with Borel sets $\mathcal{B}$ and $m$ the Lebesgue measure. Define $T : [0, 1) \to [0, 1)$ by
\[
T(x) = 2x \bmod 1 =
\begin{cases}
2x, & \text{if } 0 \le x < 1/2; \\
2x - 1, & \text{if } 1/2 \le x < 1.
\end{cases}
\]

This map is referred to as the dyadic transformation. Notice that if $x$ has binary expansion $x = 0.x_1x_2x_3\ldots_{(2)}$ then $T(x) = 0.x_2x_3\ldots_{(2)}$. It is worth showing that $T$ is measure preserving. From measure theory [Billingsley, p. 4], it suffices to prove that $T$ preserves measure on a semi-algebra which generates the Borel σ-algebra. The collection of half open intervals with rational dyadic endpoints is such a semi-algebra. So let $E = [\frac{k}{2^n}, \frac{j}{2^n})$ where $n \ge 0$ and $0 \le k \le j \le 2^n$. Then
\[
\begin{aligned}
T^{-1}(E) &= \{x \in [0, 1/2) : \tfrac{k}{2^n} \le 2x < \tfrac{j}{2^n}\} \cup \{x \in [1/2, 1) : \tfrac{k}{2^n} \le 2x - 1 < \tfrac{j}{2^n}\} \\
&= \left[\tfrac{k}{2^{n+1}}, \tfrac{j}{2^{n+1}}\right) \cup \left[\tfrac{1}{2} + \tfrac{k}{2^{n+1}}, \tfrac{1}{2} + \tfrac{j}{2^{n+1}}\right) \\
&= \tfrac{1}{2}E \cup \left(\tfrac{1}{2} + \tfrac{1}{2}E\right)
\end{aligned}
\]
and the translation invariance of the Lebesgue measure implies
\[
m(T^{-1}(E)) = \tfrac{1}{2}m(E) + \tfrac{1}{2}m(E) = m(E).
\]
So $T$ is measure preserving.

We sketch the proof that $T$ is in fact ergodic. Let $A$ be a measurable subset of $[0, 1)$ with $T^{-1}(A) = A$. Let $x = 0.0x_2x_3\ldots_{(2)}$ and $x' = 0.1x_2x_3\ldots_{(2)}$ and assume that these are unique expansions. Then $T(x) = T(x') = 0.x_2x_3\ldots_{(2)}$. Now $x \in A$ is equivalent to $Tx \in A$, and similarly $x' \in A$ exactly when $Tx' \in A$. So $T(x) = T(x')$ implies $x \in A$ if and only if $x' \in A$. It follows that $A \cap [1/2, 1) = 1/2 + A \cap [0, 1/2)$. So $m(A \cap [0, 1/2)) = m(A \cap [1/2, 1))$, and hence
\[
m(A) = m(A \cap [0, 1/2)) + m(A \cap [1/2, 1)) = 2m(A \cap [0, 1/2)) = \frac{m(A \cap [0, 1/2))}{m([0, 1/2))}.
\]
Thus $m(A)\,m([0, 1/2)) = m(A \cap [0, 1/2))$. This argument can be elaborated to show that the same is true of any half open interval with rational dyadic endpoints, or any disjoint union of such intervals. Given $\varepsilon > 0$, choose such a disjoint union $E$ so that $m(A \,\triangle\, E) < \varepsilon$, where $\triangle$ denotes the symmetric difference (which we can do as $A$ is measurable and the half open dyadic intervals generate the Borel sets). Then $|m(A) - m(E)| < \varepsilon$ and, since $m(A \cap E) = m(A)m(E)$, also $|m(A) - m(A)m(E)| = |m(A) - m(A \cap E)| < \varepsilon$. Hence $|m(A) - m(A)^2| < 2\varepsilon$, and as $\varepsilon$ is arbitrary, $m(A) = m(A)^2$. So $m(A) = 0$ or $1$ and $T$ is ergodic.
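The pre-image computation for the dyadic transformation can be replayed in exact rational arithmetic. This is a minimal sketch, assuming intervals are represented as pairs `(lo, hi)` standing for $[lo, hi)$; the function names are hypothetical and the check covers only finitely many dyadic intervals, so it illustrates the calculation rather than proving it.

```python
from fractions import Fraction

def doubling_map(x):
    """T(x) = 2x mod 1 on [0, 1), in exact rational arithmetic."""
    return (2 * x) % 1

def preimage_of_interval(a, b):
    """Pre-image of [a, b) under T, as the two intervals (1/2)E and 1/2 + (1/2)E."""
    half = Fraction(1, 2)
    return [(a / 2, b / 2), (half + a / 2, half + b / 2)]

def length(intervals):
    """Total Lebesgue measure of a list of disjoint half-open intervals."""
    return sum((hi - lo for lo, hi in intervals), Fraction(0))

# Check m(T^{-1}(E)) = m(E) for every dyadic interval E = [k/2^n, j/2^n), n <= 4.
for n in range(5):
    for k in range(2 ** n + 1):
        for j in range(k, 2 ** n + 1):
            a, b = Fraction(k, 2 ** n), Fraction(j, 2 ** n)
            assert length(preimage_of_interval(a, b)) == b - a

print(preimage_of_interval(Fraction(1, 4), Fraction(3, 4)))
# [(Fraction(1, 8), Fraction(3, 8)), (Fraction(5, 8), Fraction(7, 8))]
```

Using `Fraction` rather than floating point avoids the rounding that would otherwise make repeated doubling collapse to 0 after about fifty iterations.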

The Ergodic Theorem

To motivate the pointwise ergodic theorem, we first show that all measure preserving transformations on a finite measure space enjoy the property of recurrence:

Theorem 3 (The Poincaré Recurrence Theorem). Let $T : X \to X$ be a measure preserving transformation of a probability space $(X, \mathcal{A}, m)$. Let $E \in \mathcal{A}$ with $m(E) > 0$. Then almost every point of $E$ returns to $E$ infinitely often under iteration by $T$; that is, for almost all $x \in E$ we have $T^n(x) \in E$ for infinitely many $n$.

Proof. Given $N \ge 0$, set $E_N = \bigcup_{n=N}^{\infty} T^{-n}(E)$ and set $F = E \cap \bigcap_{N=0}^{\infty} E_N$. Then $x \in F$ if and only if $x \in E$ and for all $N \ge 0$ there is an $n \ge N$ so that $T^n(x) \in E$. So $F$ is the set of points of $E$ which return to $E$ infinitely often under iteration by $T$. Note that if $x \in F$, then there is a sequence $n_1 < n_2 < \ldots < n_j < \ldots$ of natural numbers so that $T^{n_j}(x) \in E$ for all $j$; therefore for each $i$ we have $T^{n_i}(x) \in F$, since $T^{n_j - n_i}(T^{n_i}(x)) = T^{n_j}(x) \in E$ for all $j > i$. Thus every point of $F$ returns to $F$ infinitely often under iteration by $T$.

It remains to show that $m(F) = m(E)$. Note that $T^{-1}(E_N) = \bigcup_{n=N}^{\infty} T^{-(n+1)}(E) = E_{N+1}$ and so $m(E_N) = m(E_{N+1})$ for all $N$. Therefore $m(E_N) = m(E_0)$ for all $N$, and since $E_0 \supset E_1 \supset \ldots$ and $m$ is finite, $m\left(\bigcap_{N=0}^{\infty} E_N\right) = m(E_0)$. Therefore $m(F) = m(E \cap E_0) = m(E)$ as $E \subset E_0$.

This raises the natural question: how often, or with what frequency, do the iterates of $T(x)$ return to a set? There is a very big difference between $T^{2n}(x) \in E$ and $T^{n!}(x) \in E$ for all $n$ (and almost all $x \in E$) even though both return to $E$ infinitely often. It makes sense then to consider the long term behavior of the average number of times $T^n(x)$ returns to $E$; that is, to consider the limit of the ratios

\[
\frac{1}{n} \sum_{k=0}^{n-1} \chi_E(T^k(x))
\]
as $n \to \infty$. It is not obvious in what sense, if any at all, this limit exists. It is also quite restrictive to consider just characteristic functions; in a wide variety of applications both in theoretical math and the sciences, it is impossible to calculate or observe the orbit of a point directly. Instead, we rely on numerical data. We are therefore led to consider the convergence of the ratios
\[
\frac{1}{n} \sum_{k=0}^{n-1} (f \circ T^k)(x)
\]
where $f : X \to \mathbb{C}$ is now a measurable function. It is even less clear in what sense this limit may exist, or what restrictions we may require to ensure convergence. Birkhoff's celebrated pointwise ergodic theorem provides an answer to these questions.

Theorem 4 (Birkhoff's Pointwise Ergodic Theorem). Let $(X, \mathcal{A}, m)$ be a (possibly σ-finite) measure space and let $T : X \to X$ be measure preserving. If $f \in L^1(m)$, then the averages
\[
\frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k
\]
converge pointwise almost everywhere as $n \to \infty$ to a function $f^* \in L^1(m)$. Furthermore, $f^* \circ T = f^*$ ae ($f^*$ is invariant), and if $m(X) < \infty$ then
\[
\int f^*\,dm = \int f\,dm.
\]

Remark. If $T$ is also ergodic, then $f^*$ is constant ae by Theorem 1. So if $m(X) < \infty$, then $\int f^*\,dm = f^*\,m(X) = \int f\,dm$ and thus $f^* = \frac{1}{m(X)} \int f\,dm$ ae. In particular, if $T$ is ergodic and $(X, \mathcal{A}, m)$ is a probability space then
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} (f \circ T^k)(x) = \int f\,dm
\]
for almost all $x \in X$ and all $f \in L^1(m)$. This is the form of the ergodic theorem that may be the most familiar: the time average tends to the space average for almost every point. This answers our question on the asymptotic frequency with which the orbit of a point $x$ lies in a given measurable set $E$. For if $T$ is ergodic, then
\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \chi_E(T^k(x)) = m(E)
\]
for almost every $x$ in the probability space $X$.

We will only outline the proof of Theorem 4. A detailed exposition can be found in Walters, Halmos, or Billingsley. The form of this proof is from Walters.
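Before turning to the proof, the time average statement in the remark can be observed numerically for the rotation of Example (2). The sketch below makes several assumptions that are not in the notes: the circle is identified with $[0, 1)$ via $z = e^{2\pi i x}$, the rotation angle is a floating-point stand-in for the irrational number $(\sqrt{5} - 1)/2$, and the starting point and test functions are chosen arbitrarily. A finite computation of this kind illustrates the limit but proves nothing.

```python
import math

def birkhoff_average(f, T, x, n):
    """Return (1/n) * sum_{k=0}^{n-1} f(T^k(x))."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

ALPHA = (math.sqrt(5) - 1) / 2  # assumed rotation angle, measured in full turns

def rotation(x):
    """The rotation z -> a z written additively on [0, 1): x -> x + ALPHA mod 1."""
    return (x + ALPHA) % 1.0

def indicator_quarter(x):
    """Characteristic function of E = [0, 1/4), so m(E) = 1/4."""
    return 1.0 if 0.0 <= x < 0.25 else 0.0

def cos_squared(x):
    """f(x) = cos^2(2 pi x), whose integral over [0, 1) is 1/2."""
    return math.cos(2 * math.pi * x) ** 2

for n in (100, 10_000, 50_000):
    print(n,
          birkhoff_average(indicator_quarter, rotation, 0.1, n),
          birkhoff_average(cos_squared, rotation, 0.1, n))
```

The printed averages should approach the space averages $m(E) = 1/4$ and $\int f\,dm = 1/2$ as $n$ grows.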

(1) The first step is to prove the maximal ergodic theorem, or rather the following corollary of it. The maximal ergodic theorem, along with the convergence theorems of Lebesgue theory, is what drives the proof of the pointwise ergodic theorem.

Theorem 5 (Maximal Ergodic Theorem). Let $(X, \mathcal{A}, m)$ be a finite measure space and $T : X \to X$ be measure preserving. If $f$ is real-valued and integrable, then
\[
\int_A f\,dm \ge 0,
\]
where
\[
A = \left\{x \in X : \sup_{n \ge 1} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x)) > 0\right\}.
\]
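The inequality can be checked exactly on a finite toy system. The sketch below works under assumptions made only for this example: $X$ has twelve points with the uniform probability measure, $T$ is a cyclic shift, and $f$ takes pseudo-random integer values. It tests the truncated sets $A_N = \{x : \max_{1 \le n \le N} (f(x) + f(Tx) + \ldots + f(T^{n-1}x)) > 0\}$, whose union over $N$ is the set $A$ of the theorem; the helper names are hypothetical.

```python
import random
from fractions import Fraction

def maximal_set(f, T, N):
    """A_N = {x : some partial sum f(x) + ... + f(T^{n-1} x), 1 <= n <= N, is > 0}."""
    A = set()
    for x in range(len(f)):
        partial, y = 0, x
        for _ in range(N):
            partial += f[y]
            y = T[y]
            if partial > 0:
                A.add(x)
                break
    return A

def integral_over(f, A):
    """Integral of f over A for the uniform probability measure on len(f) points."""
    return Fraction(sum(f[x] for x in A), len(f))

rng = random.Random(0)
size = 12
T = [(x + 1) % size for x in range(size)]      # cyclic shift, preserves the uniform measure
f = [rng.randint(-5, 5) for _ in range(size)]  # arbitrary integer-valued test function

results = [integral_over(f, maximal_set(f, T, N)) for N in range(1, 25)]
print(all(value >= 0 for value in results))  # True
```

Note that a partial sum is positive exactly when the corresponding average is positive, so the sets above agree with the ones defined through averages.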

Proof. As noted in Remark (4), the map $U_T : L^1_{\mathbb{R}}(m) \to L^1_{\mathbb{R}}(m)$ defined by $U_T(f) = f \circ T$ is a positive linear isometry. Let $f_0 = 0$ and $f_n = f + U_T f + \ldots + U_T^{n-1} f$ for $n \ge 1$. Set $F_N = \max_{0 \le n \le N} f_n$ and note that $F_N \ge 0$ for all $N \in \mathbb{N}$, since $f_0 = 0$. Also observe that $F_N$ is integrable since each $f_n$ is. We have $F_N \ge f_n$ for $0 \le n \le N$, and so $U_T(F_N) \ge U_T(f_n)$ by positivity. Hence $U_T(F_N) + f \ge f_{n+1}$, and therefore $U_T(F_N) + f \ge \max_{1 \le n \le N} f_n$. Thus if $x \in X$ and $F_N(x) > 0$, then
\[
(U_T F_N)(x) + f(x) \ge \max_{0 \le n \le N} f_n(x) = F_N(x).
\]
So $f \ge F_N - U_T F_N$ on $A_N = \{x \in X : F_N(x) > 0\}$. As $F_N(x) = 0$ on $X \setminus A_N$, then
\[
\begin{aligned}
\int_{A_N} f\,dm &\ge \int_{A_N} F_N\,dm - \int_{A_N} U_T(F_N)\,dm \\
&= \int_X F_N\,dm - \int_{A_N} U_T(F_N)\,dm \\
&\ge \int_X F_N\,dm - \int_X U_T(F_N)\,dm \\
&= \|F_N\|_1 - \|U_T(F_N)\|_1 = 0,
\end{aligned}
\]
where we have used the fact that $\int_{A_N} U_T(F_N)\,dm \le \int_X U_T(F_N)\,dm$ and that $U_T$ is an isometry. Given $x \in X$, we see that $\sup_{n \ge 1} \frac{1}{n} \sum_{k=0}^{n-1} (U_T^k f)(x) > 0$ if and only if there is an $N$ so that $\max_{0 \le n \le N} f_n(x) = F_N(x) > 0$; hence $A = \bigcup_{N=0}^{\infty} A_N$. As $F_N \le F_{N+1}$, then $A_N \subset A_{N+1}$, and so applying the monotone convergence theorem to $f \cdot \chi_{A_N}$ (separately to the positive and negative parts of $f$) yields the desired claim.

(2) We make some simplifying assumptions and introduce notation. Assume first that $m(X) < \infty$ and that $f$ is real valued. Given $x \in X$, define

\[
a_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x)),
\]
and
\[
f^*(x) = \limsup_n a_n(x), \qquad f_*(x) = \liminf_n a_n(x).
\]
As $a_n$ is measurable for all $n$, then so are $f^*$ and $f_*$. Notice that
\[
a_n(Tx) = \frac{n+1}{n}\,a_{n+1}(x) - \frac{f(x)}{n}
\]
for all $n$. Since $f \in L^1(m)$, we can assume that $|f(x)| < \infty$ by redefining $f$ on a set of measure zero if necessary. Therefore $f(x)/n \to 0$ as $n \to \infty$ and so
\[
f^*(T(x)) = \limsup_n a_n(Tx) = \limsup_n \left(\frac{n+1}{n}\,a_{n+1}(x) - \frac{f(x)}{n}\right) = \limsup_n a_{n+1}(x) = f^*(x).
\]
A similar argument shows that $f_* \circ T = f_*$ ae.

(3) We show that $f^* = f_*$ ae; that is, that the set $E = \{x \in X : f_*(x) < f^*(x)\}$ has measure zero. For real numbers $a$ and $b$ with $a < b$, let $E(a, b) = \{x \in X : f_*(x) < a < b < f^*(x)\}$. Then $E = \bigcup \{E(a, b) : a, b \in \mathbb{Q},\ a < b\}$ is a countable union, so it suffices to show $m(E(a, b)) = 0$. As $f^*$ and $f_*$ are measurable, then so is $E(a, b)$ and therefore so is $E$. As $f^* \circ T = f^*$ and $f_* \circ T = f_*$ ae, then (up to a set of measure zero)
\[
T^{-1}(E(a, b)) = \{x \in X : f_*(Tx) < a < b < f^*(Tx)\} = E(a, b).
\]
It is here that we need to use the maximal ergodic theorem.

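(4) Notice that $E(a, b) \cap \{x \in X : \sup_{n \ge 1} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x)) > b\} = E(a, b)$. So apply the maximal ergodic theorem to the function $f - b$ (on the invariant set $E(a, b)$, with $T$ restricted to it) to conclude
\[
\int_{E(a,b)} (f - b)\,dm \ge 0, \quad \text{so} \quad \int_{E(a,b)} f\,dm \ge b\,m(E(a, b)),
\]
and similarly
\[
\int_{E(a,b)} (a - f)\,dm \ge 0, \quad \text{so} \quad \int_{E(a,b)} f\,dm \le a\,m(E(a, b)).
\]
Therefore $a\,m(E(a, b)) \ge b\,m(E(a, b))$; since $b > a$, this can be true only if $m(E(a, b)) = 0$. Hence $f_* = f^*$ ae.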


(5) To show that $f^*$ is integrable, note that
\[
\int |a_n|\,dm \le \frac{1}{n} \int \sum_{k=0}^{n-1} |f \circ T^k|\,dm = \int |f|\,dm < \infty,
\]
where we have used a change of variables and the fact that $T$ preserves $m$. Fatou's lemma then implies
\[
\int \liminf_n |a_n|\,dm \le \liminf_n \int |a_n|\,dm \le \int |f|\,dm < \infty.
\]
Since $a_n \to f^*$ ae, it follows that $f^* \in L^1(m)$.

(6) The last part is to show that $\int f\,dm = \int f^*\,dm$. Notice that
\[
\int a_n\,dm = \frac{1}{n} \sum_{k=0}^{n-1} \int f \circ T^k\,dm = \int f\,dm
\]
by changing variables and since $T$ preserves measure. Therefore if we show that the interchange of limit and integral
\[
\int f\,dm = \lim_n \int a_n\,dm = \int \lim_n a_n\,dm = \int f^*\,dm
\]
is valid, then the proof for the case $m(X) < \infty$ is complete. This is accomplished by another application of the maximal ergodic theorem and the dominated convergence theorem.

(7) For the case when $X$ is σ-finite, the above will work so long as $m(E(a, b)) < \infty$ so that we can apply the maximal ergodic theorem. This is done by choosing a subset $C \subset E(a, b)$ with finite measure (which exists by σ-finiteness) and applying the maximal ergodic theorem to the function $f - b\chi_C$ to conclude (after a few more steps) that
\[
\int |f|\,dm \ge b\,m(C).
\]
Therefore if $C \subset E(a, b)$ has $m(C) < \infty$, then $m(C) \le \frac{1}{b} \int |f|\,dm$; it follows from σ-finiteness that $m(E(a, b)) < \infty$ as well.

Consequences of the Ergodic Theorem

A real number x is normal to base r if the expansion of x in base r contains each digit in the same proportion.

Theorem 6 (Borel's Theorem on Normal Numbers). Almost all numbers in $[0, 1)$ are normal to base $r$ for all integers $r \ge 2$; i.e. for almost all $x \in [0, 1)$, each of the digits $0, 1, 2, \ldots, r - 1$ occurs in the base $r$ expansion of $x$ with the same asymptotic frequency $1/r$.
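The statement can be explored numerically by reading digits off the orbit of a point under $x \mapsto rx \bmod 1$, which is the device used in the proof below. This sketch relies on assumptions made only for illustration: exact `Fraction` arithmetic for a rational starting point, and a seeded pseudo-random digit string standing in for a "typical" point of $[0, 1)$. The helper names are hypothetical, and a finite sample can only suggest the limiting frequencies.

```python
import math
import random
from collections import Counter
from fractions import Fraction

def r_adic_map(x, r):
    """T(x) = r x mod 1 on [0, 1), in exact rational arithmetic."""
    return (r * x) % 1

def digits_from_orbit(x, r, n):
    """First n base-r digits of x: the j-th digit is floor(r * T^j(x))."""
    digits = []
    for _ in range(n):
        digits.append(math.floor(r * x))  # index k with x in [k/r, (k+1)/r)
        x = r_adic_map(x, r)
    return digits

def frequencies(digits, r):
    """Proportion of each digit 0, ..., r-1 among the given digits."""
    counts = Counter(digits)
    return [Fraction(counts[k], len(digits)) for k in range(r)]

# A rational point has a periodic expansion, so it lies in the exceptional null set.
print(digits_from_orbit(Fraction(1, 7), 10, 12))  # [1, 4, 2, 8, 5, 7, 1, 4, 2, 8, 5, 7]

# Assumption: seeded pseudo-random digits model the digits of a typical point.
rng = random.Random(2018)
sample = [rng.randrange(10) for _ in range(100_000)]
print([round(float(p), 3) for p in frequencies(sample, 10)])
```

For $1/7$ the digits 0, 3, 6 and 9 never appear, so it is not normal to base 10, while the sampled frequencies should all be close to $1/10$.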


Proof. Let $r \ge 2$ be an integer and define the $r$-adic transformation $T : [0, 1) \to [0, 1)$ by
\[
T(x) = rx \bmod 1 =
\begin{cases}
rx, & 0 \le x < \frac{1}{r}; \\
rx - 1, & \frac{1}{r} \le x < \frac{2}{r}; \\
\quad\vdots \\
rx - (r - 1), & \frac{r-1}{r} \le x < 1.
\end{cases}
\]
Just as for the dyadic transformation ($r = 2$), $T$ is measure preserving and ergodic on $[0, 1)$ with respect to the Lebesgue measure and Borel σ-algebra. Let $X$ denote the set of points of $[0, 1)$ which have a unique base $r$ expansion. Then $[0, 1) \setminus X$ is countable, so $m(X) = 1$. Let $x \in X$ and write $x$ uniquely as $x = 0.x_1x_2x_3\ldots_{(r)}$. Then
\[
T(x) = T(0.x_1x_2\ldots) = 0.x_2x_3x_4\ldots_{(r)},
\]
and so $T^j(x) = 0.x_{j+1}x_{j+2}\ldots_{(r)}$ where $j \ge 0$. For ease of writing, let $f$ denote the characteristic function $f = \chi_{[\frac{k}{r}, \frac{k+1}{r})}$, where $0 \le k < r$ is an integer. Then
\[
f(T^j(x)) = f(0.x_{j+1}x_{j+2}\ldots) =
\begin{cases}
1, & \text{if } x_{j+1} = k; \\
0, & \text{else.}
\end{cases}
\]
Therefore the number of times $k$ appears in the first $n$ digits of the $r$-adic expansion of $x$ is $\sum_{j=0}^{n-1} f(T^j(x))$. Dividing by $n$ and applying the ergodic theorem gives
\[
\frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) \to \int_{[0,1)} f\,dm = m\left(\left[\tfrac{k}{r}, \tfrac{k+1}{r}\right)\right) = \frac{1}{r}.
\]
Hence the frequency with which $k \in \{0, 1, \ldots, r - 1\}$ appears in the $r$-adic expansion of almost all numbers in $[0, 1)$ is $1/r$. Since there are only countably many pairs $(r, k)$ and a countable intersection of sets of full measure has full measure, the conclusion holds for all $r \ge 2$ simultaneously.

The pointwise ergodic theorem gives the following nice characterization of ergodicity.

Theorem 7. A measurable dynamical system $(X, \mathcal{A}, m, T)$ is ergodic if and only if for all $A, B \in \mathcal{A}$
\[
\frac{1}{n} \sum_{k=0}^{n-1} m(T^{-k}(A) \cap B) \to m(A)\,m(B).
\]

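Proof. Suppose that $T$ is ergodic. Applying the ergodic theorem to $\chi_A$ shows that
\[
\frac{1}{n} \sum_{k=0}^{n-1} \chi_A \circ T^k \to m(A) \quad \text{a.e.}
\]
Multiplying by $\chi_B$ gives $\frac{1}{n} \sum_{k=0}^{n-1} (\chi_A \circ T^k)\,\chi_B \to m(A)\,\chi_B$ a.e. Since $(\chi_A \circ T^k)\,\chi_B = \chi_{T^{-k}(A) \cap B}$ and the averages are bounded by 1, integrating and applying the dominated convergence theorem gives
\[
\frac{1}{n} \sum_{k=0}^{n-1} m(T^{-k}(A) \cap B) \to m(A)\,m(B).
\]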


Conversely, suppose the convergence property holds. Suppose that $E \in \mathcal{A}$ with $T^{-1}(E) = E$. Set $A = B = E$; since $T^{-k}(E) \cap E = E$ for all $k$, the assumption gives
\[
\frac{1}{n} \sum_{k=0}^{n-1} m(E) \to m(E)^2.
\]
Since $\frac{1}{n} \sum_{k=0}^{n-1} m(E) = m(E)$ for all $n$, then $m(E) = m(E)^2$ and so $m(E) = 0$ or $1$.

This theorem provides a physical aid for understanding ergodic transformations; they are the maps which "stir" our space enough so that, on average, every measurable set will intersect every other measurable set in proportion to their relative size.
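The averaged "stirring" in Theorem 7 can be computed exactly on a finite toy system. As in any such sketch, the setting is an assumption made for illustration only: six points with the uniform probability measure, two permutations as $T$, and arbitrarily chosen sets $A$ and $B$; the helper names are hypothetical.

```python
from fractions import Fraction

def preimage_iter(T, A, k, size):
    """T^{-k}(A) = {x : T^k(x) in A} for a map T on {0, ..., size-1}."""
    result = set()
    for x in range(size):
        y = x
        for _ in range(k):
            y = T[y]
        if y in A:
            result.add(x)
    return result

def averaged_overlap(T, A, B, n, size):
    """(1/n) * sum_{k<n} m(T^{-k}(A) & B) for the uniform probability measure."""
    total = sum(len(preimage_iter(T, A, k, size) & B) for k in range(n))
    return Fraction(total, n * size)

size = 6
A, B = {0, 1, 2}, {0, 2}
one_cycle = [(x + 1) % size for x in range(size)]   # ergodic: a single cycle
two_cycles = [(x + 2) % size for x in range(size)]  # not ergodic: evens and odds
target = Fraction(len(A), size) * Fraction(len(B), size)  # m(A) m(B) = 1/6

print(averaged_overlap(one_cycle, A, B, 60, size), target)   # 1/6 1/6
print(averaged_overlap(two_cycles, A, B, 60, size), target)  # 2/9 1/6
```

For the single cycle the average equals $m(A)m(B)$ whenever $n$ is a multiple of 6, while for the two-cycle map the averages settle at $2/9$, reflecting the invariant set of even points.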

References

1. John McDonald and Neil Weiss. A Course in Real Analysis, 2nd edition. Academic Press, 2012. Chapter 16.
2. Paul R. Halmos. Lectures on Ergodic Theory. Martino Publishing, 2013.
3. Peter Walters. An Introduction to Ergodic Theory. Springer-Verlag New York Inc., 1982.
4. Patrick Billingsley. Ergodic Theory and Information. John Wiley and Sons Inc., 1965.
