<<

MATH41112/61112

Ergodic Theory

Charles Walkden

4th January, 2018

Contents

0 Preliminaries

1 An introduction to ergodic theory. Uniform distribution of real sequences

2 More on uniform distribution mod 1. Measure spaces

3 Lebesgue integration. Invariant measures

4 More examples of invariant measures

5 Ergodic measures: definition, criteria, and basic examples

6 Ergodic measures: using the Hahn-Kolmogorov Extension Theorem to prove ergodicity

7 Continuous transformations on compact metric spaces

8 Ergodic measures for continuous transformations

9 Recurrence

10 Birkhoff's Ergodic Theorem

11 Applications of Birkhoff's Ergodic Theorem

12 Solutions to the Exercises


0. Preliminaries

§0.1 Contact details

The lecturer is Dr Charles Walkden, Room 2.241, Tel: 0161 275 5805, Email: [email protected]. My office hour is Monday 2pm-3pm. If you want to see me at another time then please email me first to arrange a mutually convenient time.

§0.2 Course structure

This is a reading course, supported by one lecture per week. I have split the notes into weekly sections. You are expected to have read through the material before the lecture, and then go over it again afterwards in your own time. In the lectures I will highlight the most important parts, explain the statements of the theorems and what they mean in practice, and point out common misunderstandings. As a general rule, I will not normally go through the proofs in great detail (but they are examinable unless indicated otherwise). You will be expected to work through the proofs yourself in your own time.

All the material in the notes is examinable, unless it says otherwise. (Note that if a proof is marked 'not examinable' then it means that I won't expect you to reproduce the proof, but you will be expected to know and understand the statement of the result. If an entire section is marked 'not examinable' (for example, the review of Riemann integration in §1.3, and the discussions on the proofs of von Neumann's Ergodic Theorem and Birkhoff's Ergodic Theorem in §§9.6, 10.5, respectively) then you don't need to know the statements of any subsidiary lemmas/propositions in those sections that are not used elsewhere, but reading this material may help your understanding.)

Each section of the notes contains exercises. The exercises are a key part of the course and you are expected to attempt them. The solutions to the exercises are contained in the notes; I would strongly recommend attempting the exercises first without referring to the solutions. Please point out any mistakes (typographical or mathematical) in the notes.

§0.3 The exam

The exam is a 3 hour written exam. There are several past exam papers on the course website. Note that some topics covered in previous years are not covered this year; there are also some new topics (for example, Kac's Lemma) that were not covered in 2010 or earlier. The format of the exam is the same as last year's. The exam has 5 questions, of which you must do 4. If you attempt all 5 questions then you will get credit for your best 4 answers. The style of the questions is similar to last year's exam as well as to 'Section B' questions from earlier years.


Each question is worth 30 marks. Thus the total number of marks available on the exam is 4 × 30 = 120. This will then be converted to a mark out of 100 (by multiplying by 100/120). There is no coursework, in-class test or mid-term for this course.

§0.4 Recommended texts

There are several suitable introductory texts on ergodic theory, including:

W. Parry, Topics in Ergodic Theory,
P. Walters, An Introduction to Ergodic Theory,
I.P. Cornfeld, S.V. Fomin and Ya.G. Sinai, Ergodic Theory,
K. Petersen, Ergodic Theory,
M. Einsiedler and T. Ward, Ergodic Theory: With a View Towards Number Theory.

Parry's or Walters' books are the most suitable for this course.


1. An introduction to ergodic theory. Uniform distribution of real sequences

§1.1 Introduction

A dynamical system consists of a space X, often called a phase space, and a rule that determines how points in X evolve in time. Time can be either continuous (in which case a dynamical system is given by a first order autonomous differential equation) or discrete (in which case we are studying the iterates of a single map T : X → X). We will only consider the case of discrete time in this course.

Thus we will be studying the iterates of a single map T : X → X. We will write T^n = T ∘ ··· ∘ T (n times) to denote the nth composition of T. If x ∈ X then we can think of T^n(x), the result of applying the map T n times to the point x, as being where x has moved to after time n. We call the sequence
x, T(x), T^2(x), ..., T^n(x), ...
the orbit of x. If T is invertible (and so we can iterate backwards by repeatedly applying T^{-1}) then sometimes we refer to the doubly-infinite sequence
..., T^{-n}(x), ..., T^{-1}(x), x, T(x), ..., T^n(x), ...
as the orbit of x and the sequence x, T(x), ..., T^n(x), ... as the forward orbit of x.

As an example, consider the map T : [0, 1] → [0, 1] defined by
\[ T(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1/2, \\ 2x - 1 & \text{if } 1/2 < x \le 1 \end{cases} \]
(the doubling map). We say that x is a periodic point of period n > 0 if T^n(x) = x. (Note that we do not assume that n is least.) If x is a periodic point of period n then we call {x, T(x), ..., T^{n-1}(x)} a periodic orbit of period n.

Other points for the doubling map may have a dense orbit in [0, 1]. Recall that a set Y is dense in [0, 1] if any point in [0, 1] can be arbitrarily well approximated by a point in Y. Thus the orbit of x is dense in [0, 1] if: for all x′ ∈ X and for all ε > 0 there exists n > 0 such that |T^n(x) − x′| < ε.

Consider a subinterval [a, b] ⊂ [0, 1]. How frequently does an orbit of a point under the doubling map visit the interval [a, b]? Define the characteristic function χ_B of a set B by
\[ \chi_B(x) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases} \]
Then
\[ \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x)) \]
denotes the number of the first n points in the orbit of x that lie in [a, b]. Hence

\[ \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x)) \]
denotes the proportion of the first n points in the orbit of x that lie in [a, b]. Hence

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x)) \]
denotes the frequency with which the orbit of x lies in [a, b]. In ergodic theory, one wants to understand when this is equal to the 'size' of the interval [a, b] (we will make 'size' precise later by using measure theory; for the moment, 'size' = 'length'). That is, when does

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x)) = b - a \qquad (1.1.1) \]
hold for every interval [a, b]? Note that if x satisfies (1.1.1) then the proportion of time that the orbit of x spends in an interval [a, b] is equal to the length of that interval; i.e. the orbit of x is equidistributed in [0, 1] and does not favour one region of [0, 1] over another. In general, one cannot expect (1.1.1) to hold for every point x; indeed, if x is periodic then (1.1.1) does not hold. Even if the orbit of x is dense, (1.1.1) may not hold. However, one might expect (1.1.1) to hold for 'typical' points x ∈ X (where again we can make 'typical' precise using measure theory). One might also want to replace the function χ_{[a,b]} with an arbitrary function f : X → R. In this case one would want to ask: for the doubling map T, when is it the case that

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \int_0^1 f(x)\,dx? \]
The goal of the course is to understand the statement, prove, and explain how to apply the following result.

Theorem 1.1.1 (Birkhoff's Ergodic Theorem)
Let (X, B, µ) be a probability space. Let f ∈ L¹(X, B, µ) be an integrable function. Suppose that T : X → X is an ergodic measure-preserving transformation of X. Then
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \int f \, d\mu \]
for µ-a.e. point x ∈ X.

Ergodic theory has many applications to other areas of mathematics, notably hyperbolic dynamics, number theory, and geometry. We shall see some of the (simpler) applications to number theory throughout the course.
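As a purely illustrative numerical sketch (not a proof, and not part of the theory developed here), the following Python snippet estimates a Birkhoff average for the circle rotation T(x) = x + α mod 1 with α = √2, a map we meet again in §3.3.2; the rotation is used rather than the doubling map because iterating 2x mod 1 in floating-point arithmetic collapses to 0 after about 50 steps. The observable f, the starting point and the sample sizes are arbitrary choices; the averages should approach the integral of f, which here is 1/2.

import math

def T(x, alpha=math.sqrt(2)):
    # circle rotation by an irrational angle; Lebesgue measure is invariant (see Section 3)
    return (x + alpha) % 1.0

def f(x):
    # an arbitrary continuous observable with integral 1/2 over [0, 1]
    return math.cos(2 * math.pi * x) ** 2

x0 = 0.1234                        # an arbitrary starting point
for n in (10, 1000, 100000):
    total, y = 0.0, x0
    for _ in range(n):
        total += f(y)
        y = T(y)
    print(n, total / n)            # Birkhoff averages; expect values near 0.5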


§1.2 Uniform distribution mod 1

Let T : X → X be a dynamical system. In ergodic theory we are interested in the long-term distributional behaviour of the sequence of points x, T(x), T^2(x), .... Before studying this problem, we consider an analogous problem in the context of sequences of real numbers.

Let x_n ∈ R be a sequence of real numbers. We may decompose x_n as the sum of its integer part [x_n] = sup{m ∈ Z | m ≤ x_n} (i.e. the largest integer which is less than or equal to x_n) and its fractional part {x_n} = x_n − [x_n]. Clearly, 0 ≤ {x_n} < 1. The study of x_n mod 1 is the study of the sequence {x_n} in [0, 1].

Definition. We say that the sequence x_n is uniformly distributed mod 1 (u.d. mod 1 for short) if for every a, b with 0 ≤ a < b ≤ 1, we have that
\[ \lim_{n\to\infty} \frac{1}{n}\,\mathrm{card}\{ 0 \le j \le n-1 \mid \{x_j\} \in [a,b] \} = b - a. \]

Remarks.

(i) Here, card denotes the cardinality of a set.

(ii) Thus x_n is uniformly distributed mod 1 if, given any interval [a, b] ⊂ [0, 1], the frequency with which the fractional parts of x_n lie in the interval [a, b] is equal to its length, b − a.

(iii) We can replace [a, b] by [a, b), (a, b] or (a, b) without altering the definition.

The following result gives a necessary and sufficient condition for the sequence x_n ∈ R to be uniformly distributed mod 1.

Theorem 1.2.1 (Weyl's Criterion)
The following are equivalent:

(i) the sequence x_n ∈ R is uniformly distributed mod 1;

(ii) for every continuous function f : [0, 1] → R with f(0) = f(1) we have
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) = \int_0^1 f(x)\,dx; \qquad (1.2.1) \]

(iii) for each ℓ ∈ Z \ {0} we have
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0. \]

Remarks.

(i) As a grammatical point, criterion is singular (the plural is criteria). Weyl's criterion is that (i) and (iii) are equivalent. Statement (ii) has been included because it is an important intermediate step in the proof and, as we shall see, it closely resembles an ergodic theorem.

(ii) One can replace the hypothesis that f is continuous in (1.2.1) with f is Riemann integrable.


(iii) To prove that (i) is equivalent to (iii) we work, in fact, not on the unit interval [0, 1] but on the unit circle R/Z. To form R/Z, we work with real numbers modulo the integers (informally: we ignore integer parts). Note that ignoring integer parts means that 0 and 1 in [0, 1] are 'the same'. Thus the end-points of the unit interval 'join up' and we see that R/Z is a circle. More formally, R is an additive group, Z is a subgroup and the quotient group R/Z is, topologically, a circle. Note that the requirement in (ii) that f(0) = f(1) means that f : [0, 1] → R is a well-defined function on the circle R/Z. It is, however, the case that (i) is equivalent to (ii) without the hypothesis in (ii) that f(0) = f(1).

§1.2.1 The sequence x_n = αn

The behaviour of the sequence x_n = αn depends on whether α is rational or irrational.

If α ∈ Q then it is easy to see that {αn} can take on only finitely many values in [0, 1]. Indeed, if α = p/q (p ∈ Z, q ∈ Z, q ≠ 0, hcf(p, q) = 1) then {αn} takes the q values
\[ 0 = \left\{ \frac{0}{q} \right\}, \left\{ \frac{p}{q} \right\}, \left\{ \frac{2p}{q} \right\}, \ldots, \left\{ \frac{(q-1)p}{q} \right\} \]
as {qp/q} = 0. In particular {αn} is not uniformly distributed mod 1.

If α ∉ Q then the situation is completely different. We shall show that αn is uniformly distributed mod 1 by applying Weyl's Criterion. Let ℓ ∈ Z \ {0}. As α ∉ Q we have that ℓα is never an integer; hence e^{2πiℓα} ≠ 1. Note that
\[ \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell \alpha j} = \frac{1}{n}\, \frac{e^{2\pi i \ell \alpha n} - 1}{e^{2\pi i \ell \alpha} - 1} \]
by summing the geometric progression. Hence

\[ \left| \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| = \frac{1}{n}\, \frac{|e^{2\pi i \ell \alpha n} - 1|}{|e^{2\pi i \ell \alpha} - 1|} \le \frac{1}{n}\, \frac{2}{|e^{2\pi i \ell \alpha} - 1|}. \qquad (1.2.2) \]

As α ∉ Q, the denominator in the right-hand side of (1.2.2) is not 0. Letting n → ∞ we see that
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0. \]

Hence xn is uniformly distributed mod 1.
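The decay of these exponential averages is easy to observe numerically. The following Python sketch evaluates the Weyl sum for x_n = αn with the illustrative choices α = √2 and ℓ = 3, together with the upper bound from (1.2.2); both quantities shrink like 1/n.

import cmath

alpha, ell = 2 ** 0.5, 3           # illustrative irrational alpha and nonzero integer ell
for n in (10, 1000, 100000):
    s = sum(cmath.exp(2j * cmath.pi * ell * alpha * j) for j in range(n)) / n
    bound = 2 / (n * abs(cmath.exp(2j * cmath.pi * ell * alpha) - 1))
    print(n, abs(s), bound)        # |average| should not exceed the bound (up to rounding)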

Remarks.

1. More generally, we could consider the sequence xn = αn + β. It is easy to see by modifying the above argument that xn is uniformly distributed mod 1 if and only if α is irrational. (See Exercise 1.2.)

2. Fix α > 1 and consider the sequence x_n = α^n x. Then it is possible to show that for (Lebesgue) almost every x ∈ R, the sequence x_n is uniformly distributed mod 1. We will prove this, at least for the cases when α = 2, 3, 4, ....


3. Suppose we set x = 1 in the above remark and consider the sequence x_n = α^n. Then one can show that x_n is uniformly distributed mod 1 for almost every α > 1. However, not a single example of such an α is known! Indeed, it is not even known if (3/2)^n is dense mod 1.

§1.2.2 Proof of Weyl's Criterion

We prove (i) implies (ii). Suppose that the sequence x_n ∈ R is uniformly distributed mod 1. If χ_{[a,b]} is the characteristic function of the interval [a, b], then we may rewrite the definition of uniform distribution mod 1 as
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) = \int_0^1 \chi_{[a,b]}(x)\,dx. \]
From this we deduce that
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) = \int_0^1 g(x)\,dx \]
whenever g is a step function, i.e., when g(x) = \sum_{k=1}^{m} c_k \chi_{[a_k,b_k]}(x) is a finite linear combination of characteristic functions of intervals.

Now let f be a continuous function on [0, 1]. Then, given ε > 0, we can find a step function g with ||f − g||_∞ ≤ ε. We have the estimate
\[ \left| \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) - \int_0^1 f(x)\,dx \right| \le \left| \frac{1}{n} \sum_{j=0}^{n-1} (f(\{x_j\}) - g(\{x_j\})) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right| + \left| \int_0^1 g(x)\,dx - \int_0^1 f(x)\,dx \right| \]
\[ \le \frac{1}{n} \sum_{j=0}^{n-1} |f(\{x_j\}) - g(\{x_j\})| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right| + \int_0^1 |g(x) - f(x)|\,dx \]
\[ \le 2\varepsilon + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right|. \]

Since the last term converges to zero as n → ∞, we obtain
\[ \limsup_{n\to\infty} \left| \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) - \int_0^1 f(x)\,dx \right| \le 2\varepsilon. \]

Since ε> 0 is arbitrary, this gives us that

\[ \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) \to \int_0^1 f(x)\,dx \]
as n → ∞.

We now prove (ii) implies (iii). Suppose that f : [0, 1] → C is continuous and f(0) = f(1). By writing f = Re f + i Im f and applying (ii) to the real and imaginary parts of f we have that
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) = \int_0^1 f(x)\,dx. \]
For ℓ ∈ Z, ℓ ≠ 0, we let f(x) = e^{2πiℓx}. Note that, as exp is 2πi-periodic, f({x_j}) = e^{2πiℓx_j}. Hence
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \int_0^1 e^{2\pi i \ell x}\,dx = \left[ \frac{1}{2\pi i \ell} e^{2\pi i \ell x} \right]_0^1 = 0, \]
as ℓ ≠ 0.

We prove (iii) implies (i). Suppose that (iii) holds. Then

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) = \int_0^1 g(x)\,dx \]
whenever g(x) = \sum_{k=1}^{m} c_k e^{2\pi i \ell_k x} is a trigonometric polynomial, c_k ∈ C, i.e. a finite linear combination of exponential functions.

Note that the space C(X, C) is a vector space: if f, g ∈ C(X, C) then f + g ∈ C(X, C), and if f ∈ C(X, C), λ ∈ C then λf ∈ C(X, C). A linear subspace S ⊂ C(X, C) is an algebra if whenever f, g ∈ S then fg ∈ S. We will need the following result:

Theorem 1.2.2 (Stone-Weierstrass Theorem)
Let X be a compact metric space and let C(X, C) denote the space of continuous functions defined on X. Suppose that S ⊂ C(X, C) is an algebra of continuous functions such that

(i) if f ∈ S then f̄ ∈ S,

(ii) S separates the points of X, i.e. for all x, y ∈ X, x ≠ y, there exists f ∈ S such that f(x) ≠ f(y),

(iii) for every x ∈ X there exists f ∈ S such that f(x) ≠ 0.

Then S is uniformly dense in C(X, C), i.e. for all f ∈ C(X, C) and all ε > 0, there exists g ∈ S such that ||f − g||_∞ = sup_{x∈X} |f(x) − g(x)| < ε.

We shall apply the Stone-Weierstrass Theorem with S given by the set of trigonometric polynomials. It is easy to see that S satisfies the hypotheses of Theorem 1.2.2. Let f be any continuous function on [0, 1] with f(0) = f(1). Given ε > 0 we can find a trigonometric polynomial g such that ||f − g||_∞ ≤ ε. As in the first part of the proof, we can conclude that
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) = \int_0^1 f(x)\,dx. \]

Now consider the interval [a, b] ⊂ [0, 1]. Given ε > 0, we can find continuous functions f_1 and f_2 (with f_1(0) = f_1(1), f_2(0) = f_2(1)) such that

\[ f_1 \le \chi_{[a,b]} \le f_2 \]

and
\[ \int_0^1 f_2(x) - f_1(x)\,dx \le \varepsilon. \]
We then have that
\[ \liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) \ge \liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_1(\{x_j\}) = \int_0^1 f_1(x)\,dx \ge \int_0^1 f_2(x)\,dx - \varepsilon \ge \int_0^1 \chi_{[a,b]}(x)\,dx - \varepsilon \]
and
\[ \limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) \le \limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_2(\{x_j\}) = \int_0^1 f_2(x)\,dx \le \int_0^1 f_1(x)\,dx + \varepsilon \le \int_0^1 \chi_{[a,b]}(x)\,dx + \varepsilon. \]
Since ε > 0 is arbitrary, we have shown that

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) = \int_0^1 \chi_{[a,b]}(x)\,dx = b - a, \]
so that x_n is uniformly distributed mod 1. ✷

§1.2.3 Exercises

Exercise 1.1
Show that if x_n is uniformly distributed mod 1 then {x_n} is dense in [0, 1]. (The converse is not true.)

Exercise 1.2
Let α, β ∈ R. Let x_n = αn + β. Show that x_n is uniformly distributed mod 1 if and only if α ∉ Q.

Exercise 1.3
(i) Prove that log_10 2 is irrational.
(ii) The leading digit of an integer is the left-most digit of its base 10 representation. (Thus the leading digit of 32 is 3, the leading digit of 1024 is 1, etc.) Show that the frequency with which 2^n has leading digit r (r = 1, 2, ..., 9) is log_10(1 + 1/r).
(Hint: first show that 2^n has leading digit r if and only if

\[ r \cdot 10^k \le 2^n < (r+1) \cdot 10^k \]
for some k ∈ N.)

Exercise 1.4
Calculate the frequency with which the penultimate leading digit of 2^n is equal to r, r = 0, 1, 2, ..., 9. (The penultimate leading digit is the second-to-leftmost digit in the base 10 expansion. The penultimate leading digit of 2048 is 0, etc.)
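The leading-digit frequencies in Exercise 1.3 are easy to check empirically (this is a numerical sanity check, not a solution to the exercise). The cut-off N below is arbitrary; Python's exact integer arithmetic means 2^n can be computed directly.

import math

N = 5000
counts = [0] * 10
for n in range(1, N + 1):
    counts[int(str(2 ** n)[0])] += 1                 # leading digit of 2^n
for r in range(1, 10):
    print(r, counts[r] / N, math.log10(1 + 1 / r))   # observed frequency vs. predicted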


§1.3 Appendix: a recap on the Riemann Integral

(This subsection is included for general interest and to motivate the Lebesgue integral. Hence it is not examinable.)

You have probably already seen the construction of the Riemann integral. This gives a method for defining the integral of suitable functions defined on an interval [a, b]. In the next section we will see how the Lebesgue integral is a generalisation of the Riemann integral in the sense that it allows us to integrate functions defined on spaces more general than subintervals of R. The Lebesgue integral has other nice properties, for example it is well-behaved with respect to limits. Here we give a brief exposition about some inadequacies of the Riemann integral and how they motivate the Lebesgue integral.

Let f : [a, b] → R be a bounded function (for the moment we impose no other conditions on f). A partition ∆ of [a, b] is a finite set of points ∆ = {x_0, x_1, x_2, ..., x_n} with a = x_0 < x_1 < x_2 < ··· < x_n = b. Define the upper and lower sums

\[ U(f, \Delta) = \sum_{i=0}^{n-1} \sup_{x \in [x_i, x_{i+1}]} f(x)\,(x_{i+1} - x_i), \qquad L(f, \Delta) = \sum_{i=0}^{n-1} \inf_{x \in [x_i, x_{i+1}]} f(x)\,(x_{i+1} - x_i). \]
The idea is then that if we make the subintervals in the partition small, these sums will be a good approximation to our intuitive notion of the integral of f over [a, b] as the area bounded by the graph of f. More precisely, if

\[ \inf_{\Delta} U(f, \Delta) = \sup_{\Delta} L(f, \Delta), \]
where the infimum and supremum are taken over all possible partitions of [a, b], then we write
\[ \int_a^b f(x)\,dx \]
for their common value and call it the (Riemann) integral of f between those limits. We also say that f is Riemann integrable.

The class of Riemann integrable functions includes continuous functions and step functions (i.e. finite linear combinations of characteristic functions of intervals). However, there are many functions for which one wishes to define an integral but which are not Riemann integrable, making the theory rather unsatisfactory. For example, define f : [0, 1] → R by
\[ f(x) = \chi_{Q \cap [0,1]}(x) = \begin{cases} 1 & \text{if } x \in Q, \\ 0 & \text{otherwise}. \end{cases} \]
Since between any two distinct real numbers we can find both a rational number and an irrational number, given 0 ≤ y < z ≤ 1, we can find y < x < z with f(x) = 1 and y < x′ < z with f(x′) = 0. Hence for any partition ∆ = {x_0, x_1, ..., x_n} of [0, 1], we have
\[ U(f, \Delta) = \sum_{i=0}^{n-1} (x_{i+1} - x_i) = 1, \qquad L(f, \Delta) = 0. \]

Taking the infimum and supremum, respectively, over all partitions ∆ shows that f is not Riemann integrable. Why does Riemann integration not work for the above function and how could we go about improving it? Let us look again at (and slightly rewrite) the formulæ for U(f, ∆) and L(f, ∆). We have

\[ U(f, \Delta) = \sum_{i=0}^{n-1} \sup_{x \in [x_i, x_{i+1}]} f(x)\,\lambda([x_i, x_{i+1}]) \quad \text{and} \quad L(f, \Delta) = \sum_{i=0}^{n-1} \inf_{x \in [x_i, x_{i+1}]} f(x)\,\lambda([x_i, x_{i+1}]), \]
where, for an interval [y, z],
\[ \lambda([y, z]) = z - y \]
denotes its length.

In the example above, things did not work because dividing [0, 1] into intervals (no matter how small) did not 'separate out' the different values that f could take. But suppose we had a notion of 'length' that worked for more general sets than intervals. Then we could do better by considering more complicated 'partitions' of [0, 1], where by partition we now mean a collection of subsets {E_1, ..., E_m} of [0, 1] such that E_i ∩ E_j = ∅ if i ≠ j, and ∪_{i=1}^m E_i = [0, 1]. In the example, for instance, it might be reasonable to write
\[ \int_0^1 f(x)\,dx = 1 \times \lambda([0,1] \cap Q) + 0 \times \lambda([0,1] \setminus Q) = \lambda([0,1] \cap Q). \]
Instead of using subintervals, the Lebesgue integral uses a much wider class of subsets (namely, sets in a given σ-algebra) together with a notion of 'generalised length' (namely, measure).


2. More on uniform distribution mod 1. Measure spaces

§2.1 Uniform distribution of sequences in R^k

We shall now look at the uniform distribution of sequences in R^k. We will say that a sequence x_n = (x_{n,1}, ..., x_{n,k}) ∈ R^k is uniformly distributed mod 1 if, given any k-dimensional cube, the frequency with which the fractional parts of x_n lie in the cube is equal to its k-dimensional volume.

Definition. A sequence x_n = (x_{n,1}, ..., x_{n,k}) ∈ R^k is said to be uniformly distributed mod 1 if, for each choice of k intervals [a_1, b_1], ..., [a_k, b_k] ⊂ [0, 1], we have that
\[ \lim_{n\to\infty} \frac{1}{n}\,\mathrm{card}\{ j \in \{0, 1, \ldots, n-1\} \mid (\{x_{j,1}\}, \ldots, \{x_{j,k}\}) \in [a_1, b_1] \times \cdots \times [a_k, b_k] \} = (b_1 - a_1) \cdots (b_k - a_k). \]

We have the following criterion for uniform distribution.

Theorem 2.1.1 (Multi-dimensional Weyl's Criterion)
Let x_n = (x_{n,1}, ..., x_{n,k}) ∈ R^k. The following are equivalent:

(i) the sequence x_n ∈ R^k is uniformly distributed mod 1;

(ii) for any continuous function f : R^k/Z^k → R we have
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_{j,1}\}, \ldots, \{x_{j,k}\}) = \int \cdots \int f(x_1, \ldots, x_k)\,dx_1 \cdots dx_k; \]

(iii) for all ℓ = (ℓ_1, ..., ℓ_k) ∈ Z^k \ {0} we have
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i (\ell_1 x_{j,1} + \cdots + \ell_k x_{j,k})} = 0. \]

Remark. Here and throughout, 0 ∈ Z^k denotes the zero vector (0, ..., 0).

Remark. In §1 we commented that, topologically, the quotient group R/Z is a circle. More generally, the quotient group R^k/Z^k is a k-dimensional torus.

Remark. Consider the case when k = 2 so that R^2/Z^2 is the 2-dimensional torus. We can regard R^2/Z^2 as the square [0, 1] × [0, 1] with the top and bottom sides identified and the left and right sides identified. Thus a continuous function f : R^2/Z^2 → R has the property that f(0, y) = f(1, y) and f(x, 0) = f(x, 1). More generally, we can identify the k-dimensional torus R^k/Z^k with [0, 1]^k with (x_1, ..., x_{i-1}, 0, x_{i+1}, ..., x_k) and (x_1, ..., x_{i-1}, 1, x_{i+1}, ..., x_k) identified, 1 ≤ i ≤ k. A continuous function f : R^k/Z^k → R then corresponds to a continuous function f : [0, 1]^k → R such that
f(x_1, ..., x_{i-1}, 0, x_{i+1}, ..., x_k) = f(x_1, ..., x_{i-1}, 1, x_{i+1}, ..., x_k)
for each i, 1 ≤ i ≤ k.


Proof of Theorem 2.1.1. The proof of Theorem 2.1.1 is essentially the same as in the case k = 1. ✷

§2.2 The sequence x_n = (α_1 n, ..., α_k n)

We shall apply Theorem 2.1.1 to the sequence x_n = (α_1 n, ..., α_k n), for real numbers α_1, ..., α_k.

Definition. Real numbers β_1, ..., β_s ∈ R are said to be rationally independent if the only rationals r_1, ..., r_s ∈ Q such that
\[ r_1 \beta_1 + \cdots + r_s \beta_s = 0 \]
are r_1 = ··· = r_s = 0.

Proposition 2.2.1
Let α_1, ..., α_k ∈ R. Then the following are equivalent:

(i) the sequence x_n = (α_1 n, ..., α_k n) ∈ R^k is uniformly distributed mod 1;

(ii) α_1, ..., α_k and 1 are rationally independent.

Proof. The proof is similar to the discussion in §1.2.1 and we leave it as an exercise. (See Exercise 2.1.) ✷

Remark. Note that in the case k = 1, Proposition 2.2.1 reduces to the results of §1.2.1. To see this, note that α, 1 are rationally dependent if and only if there exist rationals r, s (not both zero) such that rα + s = 0. This holds if and only if α is rational. Hence α, 1 are rationally independent if and only if α is irrational.

§2.3 Weyl's Theorem on Polynomials

We have seen that αn + β is uniformly distributed mod 1 if α is irrational. Weyl's Theorem generalises this to polynomials of higher degree. Write

\[ p(n) = \alpha_k n^k + \alpha_{k-1} n^{k-1} + \cdots + \alpha_1 n + \alpha_0. \]

Theorem 2.3.1 (Weyl's Theorem on Polynomials)
If any one of α_1, ..., α_k is irrational then p(n) is uniformly distributed mod 1.

To prove this theorem we shall need the following technical result.

Lemma 2.3.2 (van der Corput's Inequality)
Let z_0, ..., z_{n-1} ∈ C and let 1 ≤ m ≤ n. Then
\[ m^2 \left| \sum_{j=0}^{n-1} z_j \right|^2 \le m(n+m-1) \sum_{j=0}^{n-1} |z_j|^2 + 2(n+m-1)\,\mathrm{Re} \sum_{j=1}^{m-1} (m-j) \sum_{i=0}^{n-1-j} z_{i+j}\bar{z}_i. \]


Proof (not examinable). The proof is essentially an exercise in multiplying out a product and some careful book-keeping of the cross-terms. You are familiar with a particular case of it, namely the fact that
\[ |z_0 + z_1|^2 = (z_0 + z_1)(\bar{z}_0 + \bar{z}_1) = |z_0|^2 + |z_1|^2 + z_0\bar{z}_1 + \bar{z}_0 z_1 = |z_0|^2 + |z_1|^2 + 2\,\mathrm{Re}(z_0\bar{z}_1). \]
Construct the following parallelogram:

z_0
z_0  z_1
z_0  z_1  z_2
  ⋮
z_0  z_1  z_2  ⋯  z_{m-1}
     z_1  z_2  ⋯  z_{m-1}  z_m
          z_2  ⋯  z_m      z_{m+1}
               ⋱
          z_{n-m}    ⋯  z_{n-1}
          z_{n-m+1}  ⋯  z_{n-1}
                     ⋱
               z_{n-2}  z_{n-1}
                        z_{n-1}

(There are n columns, the ith column consisting of m copies of z_i placed in rows i, i+1, ..., i+m−1; there are thus n + m − 1 rows.) Let s_j, 0 ≤ j ≤ n+m−2, denote the sum of the terms in the jth row. Each z_i occurs in exactly m of the row sums s_j. Hence
\[ s_0 + \cdots + s_{n+m-2} = m(z_0 + \cdots + z_{n-1}) \]
so that
\[ m^2 \left| \sum_{j=0}^{n-1} z_j \right|^2 = |s_0 + \cdots + s_{n+m-2}|^2 \le (|s_0| + \cdots + |s_{n+m-2}|)^2 \le (n+m-1)(|s_0|^2 + \cdots + |s_{n+m-2}|^2), \]
where the final inequality follows from the (n+m−1)-dimensional Cauchy-Schwarz inequality.

Recall that |s_j|^2 = s_j \bar{s}_j. Expanding out this product and recalling that 2 Re(z) = z + \bar{z}, we have that
\[ |s_j|^2 = \sum_k |z_k|^2 + 2\,\mathrm{Re} \sum_{k,\ell} z_k \bar{z}_\ell, \]
where the first sum is over all indices k of the z_i occurring in the definition of s_j, and the second sum is over the indices ℓ < k of the z_i occurring in the definition of s_j. Noting that the number of times the term z_k\bar{z}_\ell occurs in |s_0|^2 + \cdots + |s_{n+m-2}|^2 is equal to m − (k − ℓ), we can write
\[ |s_0|^2 + \cdots + |s_{n+m-2}|^2 \le m \sum_{j=0}^{n-1} |z_j|^2 + 2\,\mathrm{Re} \sum_{j=1}^{m-1} (m-j) \sum_{i=0}^{n-1-j} z_{i+j}\bar{z}_i \]
and the result follows. ✷
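A quick numerical check of van der Corput's inequality can be reassuring. The Python sketch below tests the inequality of Lemma 2.3.2 on random complex numbers for a few values of m; the sample size n, the values of m and the random data are arbitrary.

import random

def van_der_corput_holds(n, m):
    z = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
    lhs = m ** 2 * abs(sum(z)) ** 2
    cross = sum((m - j) * sum(z[i + j] * z[i].conjugate() for i in range(n - j))
                for j in range(1, m))
    rhs = (m * (n + m - 1) * sum(abs(w) ** 2 for w in z)
           + 2 * (n + m - 1) * cross.real)
    return lhs <= rhs + 1e-9       # small tolerance for rounding error

print(all(van_der_corput_holds(60, m) for m in (1, 2, 5, 30, 60)))   # expect True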


Let x_n ∈ R. For each m ≥ 1 define the sequence x_n^{(m)} = x_{n+m} − x_n to be the sequence of mth differences. The following lemma allows us to infer the uniform distribution of the sequence x_n if we know the uniform distribution of each of the mth differences of x_n.

Lemma 2.3.3
Let x_n ∈ R be a sequence. Suppose that for each m ≥ 1 the sequence x_n^{(m)} of mth differences is uniformly distributed mod 1. Then x_n is uniformly distributed mod 1.

Proof. We shall apply Weyl's Criterion. We need to show that if ℓ ∈ Z \ {0} then
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0. \]
Let z_j = e^{2πiℓx_j} for j = 0, ..., n−1. Note that |z_j| = 1. Let 1 < m < n. Applying van der Corput's inequality (Lemma 2.3.2) and dividing by n^2, we have

\[ \frac{m^2}{n^2} \left| \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right|^2 \le \frac{m}{n^2}(n+m-1)n + \frac{2(n+m-1)}{n}\,\mathrm{Re} \sum_{j=1}^{m-1} (m-j)\,\frac{1}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell (x_{i+j}-x_i)} = \frac{m}{n}(m+n-1) + \frac{2(n+m-1)}{n}\,\mathrm{Re} \sum_{j=1}^{m-1} (m-j) A_{n,j} \]
where
\[ A_{n,j} = \frac{1}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell (x_{i+j}-x_i)} = \frac{1}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell x_i^{(j)}}. \]
As the sequence x_i^{(j)} of jth differences is uniformly distributed mod 1, by Weyl's Criterion we have that A_{n,j} → 0 for each j = 1, ..., m−1. Hence for each m ≥ 1
\[ \limsup_{n\to\infty} \frac{m^2}{n^2} \left| \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right|^2 \le \limsup_{n\to\infty} m\,\frac{n+m-1}{n} = m. \]

Hence, for each m > 1 we have
\[ \limsup_{n\to\infty} \left| \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| \le \frac{1}{\sqrt{m}}. \]

As m > 1 is arbitrary, the result follows. ✷

Proof of Weyl's Theorem. We will only prove Weyl's Theorem on Polynomials (Theorem 2.3.1) in the special case where the leading coefficient α_k of
\[ p(n) = \alpha_k n^k + \cdots + \alpha_1 n + \alpha_0 \]
is irrational. (The general case, where α_i is irrational for some 1 ≤ i ≤ k, can be deduced easily from this special case and we leave this as an exercise. See Exercise 2.2.)

We shall use induction on the degree of p. Let ∆(k) denote the statement 'for every polynomial p of degree ≤ k, with irrational leading coefficient, the sequence p(n) is uniformly distributed mod 1'. We know that ∆(1) is true; this follows immediately from Exercise 1.2.


Suppose that ∆(k−1) is true. Let p(n) = α_k n^k + ··· + α_1 n + α_0 be any polynomial of degree k with α_k irrational. Let m ∈ N and consider the sequence p^{(m)}(n) = p(n+m) − p(n) of mth differences. We have that

\begin{align*}
p^{(m)}(n) &= p(n+m) - p(n) \\
&= \alpha_k (n+m)^k + \alpha_{k-1}(n+m)^{k-1} + \cdots + \alpha_1 (n+m) + \alpha_0 - \alpha_k n^k - \alpha_{k-1} n^{k-1} - \cdots - \alpha_1 n - \alpha_0 \\
&= \alpha_k n^k + \alpha_k k n^{k-1} m + \cdots + \alpha_{k-1} n^{k-1} + \alpha_{k-1}(k-1) n^{k-2} m + \cdots + \alpha_1 n + \alpha_1 m + \alpha_0 - \alpha_k n^k - \alpha_{k-1} n^{k-1} - \cdots - \alpha_1 n - \alpha_0.
\end{align*}
After cancellation, we can see that, for each m, p^{(m)}(n) is a polynomial of degree k−1 with irrational leading coefficient α_k k m. Therefore, by the inductive hypothesis, p^{(m)}(n) is uniformly distributed mod 1. We may now apply Lemma 2.3.3 to conclude that p(n) is uniformly distributed mod 1 and so ∆(k) holds. This completes the induction. ✷
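Theorem 2.3.1 can be illustrated numerically in the same spirit as §1.2.1. The Python sketch below estimates how often the fractional parts of √2 · n² fall in an interval [a, b]; by Weyl's Theorem the proportion should approach b − a. The interval, the polynomial and the sample size are arbitrary illustrative choices.

import math

a, b, N = 0.3, 0.45, 200000
alpha = math.sqrt(2)
hits = sum(1 for n in range(N) if a <= (alpha * n * n) % 1.0 <= b)
print(hits / N, b - a)             # observed frequency vs. the length of the interval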

§2.4 Measures and the Lebesgue integral

You may have seen the definition of Lebesgue measure, Lebesgue outer measure and the Lebesgue integral in other courses, for example in Fourier Analysis and Lebesgue Integration. The theory developed in that course is one particular example of a more general theory, which we sketch here. Measure theory is a key technical tool in ergodic theory, and so a good knowledge of measures and integration is essential for this course (although we will not need to know the (many) technical intricacies).

§2.4.1 Measure spaces

Loosely speaking, a measure is a function that, when given a subset of a space X, will say how 'big' that subset is. A motivating example is given by Lebesgue measure on [0, 1]: the Lebesgue measure of an interval [a, b] is given by its length b − a. In defining an abstract measure space, we will be taking the properties of 'length' (or, in higher dimensions, 'volume') and abstracting them, in much the same way that a metric space abstracts the properties of 'distance'. It turns out that in general it is not possible to define the measure of an arbitrary subset of X. Instead, we will usually have to restrict our attention to a class of subsets of X.

Definition. A collection B of subsets of X is called a σ-algebra if the following properties hold:

(i) ∅ ∈ B,

(ii) if E ∈ B then its complement X \ E ∈ B,

(iii) if E_n ∈ B, n = 1, 2, 3, ..., is a countable sequence of sets in B then their union ∪_{n=1}^∞ E_n ∈ B.

Definition. If X is a set and B a σ-algebra of subsets of X then we call (X, B) a measurable space.

Examples.


1. The trivial σ-algebra is given by B = {∅, X}.

2. The full σ-algebra is given by B = P(X), i.e. the collection of all subsets of X.

Remark. In general, the trivial σ-algebra is too small and the full σ-algebra is too big. We shall see some more interesting examples of σ-algebras later.
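For a finite set X, 'countable' unions reduce to finite ones, so the σ-algebra generated by a collection of subsets can be computed by brute force: repeatedly close the family under complements and pairwise unions until nothing new appears. The Python sketch below does this for a toy example; the set X and the generating sets are arbitrary illustrative choices.

def generated_sigma_algebra(X, generators):
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(g) for g in generators}
    while True:
        bigger = set(family)
        bigger |= {X - A for A in family}                  # close under complements
        bigger |= {A | B for A in family for B in family}  # close under (finite) unions
        if bigger == family:
            return family
        family = bigger

sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2, 3}])
print(len(sigma))                  # 8 sets: generated by the partition {1}, {2,3}, {4}
for A in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(sorted(A))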

Here are some easy properties of σ-algebras:

Lemma 2.4.1
Let B be a σ-algebra of subsets of X. Then

(i) X ∈ B;

(ii) if E_n ∈ B then ∩_{n=1}^∞ E_n ∈ B.

In the special case when X is a compact metric space there is a particularly important σ-algebra.

Definition. Let X be a compact metric space. We define the Borel σ-algebra B(X) to be the smallest σ-algebra of subsets of X which contains all the open subsets of X.

Remarks.

1. By 'smallest' we mean that if C is another σ-algebra that contains all open subsets of X then B(X) ⊂ C, that is:
\[ B(X) = \bigcap \{ C \mid C \text{ is a σ-algebra that contains the open sets} \}. \]

2. We say that the Borel σ-algebra is generated by the open sets. We call a set in B(X) a Borel set.

3. By Definition 2.4.1(ii), the Borel σ-algebra also contains all the closed sets and is the smallest σ-algebra with this property.

4. By Lemma 2.4.1 it follows that B contains all countable intersections of open sets, all countable unions of countable intersections of open sets, all countable intersections of countable unions of countable intersections of open sets, etc., and indeed many other sets.

5. There are plenty of sets that are not Borel sets, although by necessity they are rather complicated. For example, consider R as an additive group and Q ⊂ R as a subgroup. Form the quotient group R/Q and choose an element in [0, 1] for each coset (this requires the Axiom of Choice). The set E of coset representatives is a non-Borel set.

6. In the case when X = [0, 1] or R/Z, the Borel σ-algebra is also the smallest σ-algebra that contains all sub-intervals.

Let X be a set and let B be a σ-algebra of subsets of X.

Definition. A function µ : B → R is called a (finite) measure if:

(i) µ(∅) = 0;


(ii) if E_n is a countable collection of pairwise disjoint sets in B (i.e. E_n ∩ E_m = ∅ for n ≠ m) then
\[ \mu\left( \bigcup_{n=1}^{\infty} E_n \right) = \sum_{n=1}^{\infty} \mu(E_n). \]

We call (X, B, µ) a measure space. If µ(X) = 1 then we call µ a probability measure and refer to (X, B, µ) as a probability space.

Remark. Thus a measure just abstracts properties of ‘length’ or ‘volume’. Condition (i) says that the empty set has zero length, and condition (ii) says that the length of a disjoint union is the sum of the lengths of the individual sets.

Definition. We say that a property holds µ-almost everywhere if the set of points on which the property fails to hold has measure zero. We will often abbreviate this to 'µ-a.e.' or to 'a.e.' when the implied measure is clear from the context.

Example. We shall see (Exercise 2.9) that the set of rationals in [0, 1] forms a Borel set with zero Lebesgue measure. Thus Lebesgue almost every point in [0, 1] is irrational. (Thus, ‘typical’ (in the sense of measure theory, and with respect to Lebesgue measure) points in [0, 1] are irrational.)

We will usually be interested in studying measures on the Borel σ-algebra of a compact metric space X. To define such a measure, we need to define the measure of an arbitrary Borel set. In general, the Borel σ-algebra is extremely large. We shall see that it is often unnecessary to do this and instead it is sufficient to define the measure of a certain class of subsets.

§2.4.2 The Hahn-Kolmogorov Extension Theorem

A collection A of subsets of X is called an algebra if:

(i) ∅ ∈ A,

(ii) if A_1, A_2, ..., A_n ∈ A then ∪_{j=1}^n A_j ∈ A,

(iii) if A ∈ A then A^c ∈ A.

Thus an algebra is like a σ-algebra, except that it is closed under finite unions and not necessarily closed under countable unions.

Example. Take X = [0, 1], and A = {all finite unions of subintervals}.

Let B(A) denote the σ-algebra generated by A, i.e., the smallest σ-algebra containing A. More precisely:
\[ B(A) = \bigcap \{ C \mid C \text{ is a σ-algebra, } C \supset A \}. \]
In the case when X = [0, 1] and A is the algebra of finite unions of intervals, we have that B(A) is the Borel σ-algebra. Indeed, in the special case of the Borel σ-algebra of a compact metric space X, it is usually straightforward to check whether an algebra generates the Borel σ-algebra.


Proposition 2.4.2
Let X be a compact metric space and let B be the Borel σ-algebra. Let A be an algebra of Borel subsets, A ⊂ B. Suppose that for every x_1, x_2 ∈ X, x_1 ≠ x_2, there exist disjoint open sets A_1, A_2 ∈ A such that x_1 ∈ A_1, x_2 ∈ A_2. Then A generates the Borel σ-algebra B.

The following result says that if we have a function which looks like a measure defined on an algebra, then it extends uniquely to a measure defined on the σ-algebra generated by the algebra.

Theorem 2.4.3 (Hahn-Kolmogorov Extension Theorem)
Let A be an algebra of subsets of X. Suppose that µ : A → [0, 1] satisfies:

(i) µ(∅) = 0;

(ii) if A_n ∈ A, n ≥ 1, are pairwise disjoint and if ∪_{n=1}^∞ A_n ∈ A then
\[ \mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n). \]

Then there is a unique probability measure µ : B(A) → [0, 1] which is an extension of µ : A → [0, 1].

Remarks.

(i) We will often use the Hahn-Kolmogorov Extension Theorem as follows. Take X = [0, 1] and take A to be the algebra consisting of all finite unions of subintervals of X. We then define the 'measure' µ of a subinterval in such a way as to be consistent with the hypotheses of the Hahn-Kolmogorov Extension Theorem. It then follows that µ does indeed define a measure on the Borel σ-algebra.

(ii) Here is another way in which we shall use the Hahn-Kolmogorov Extension Theorem. Suppose we have two measures, µ and ν, and we want to see if µ = ν. A priori we would have to check that µ(B) = ν(B) for all B ∈ B. The Hahn-Kolmogorov Extension Theorem says that it is sufficient to check that µ(A) = ν(A) for all A in an algebra A that generates B. For example, to show that two Borel probability measures on [0, 1] are equal, it is sufficient to show that they give the same measure to each subinterval.

(iii) There is a more general version of the Hahn-Kolmogorov Extension Theorem for the case when X does not have finite measure (indeed, this is the setting in which the Hahn-Kolmogorov Theorem is usually stated). Suppose that X is a set, B is a σ-algebra of subsets of X, and A is an algebra that generates B. Suppose that µ : A → R ∪ {∞} satisfies conditions (i) and (ii) of Theorem 2.4.3. Suppose in addition that there exist countably many sets A_n ∈ A, n = 1, 2, 3, ..., such that X = ∪_{n=1}^∞ A_n and µ(A_n) < ∞. Then there exists a unique measure µ : B(A) → R ∪ {∞} which is an extension of µ : A → R ∪ {∞}.

A consequence of the proof (which we omit) of the Hahn-Kolmogorov Extension Theorem is that sets in B can be arbitrarily well approximated by sets in A in the following sense. We define the symmetric difference between two sets A, B by
\[ A \,\triangle\, B = (A \setminus B) \cup (B \setminus A). \]
Thus, two sets are 'close' if their symmetric difference is small.


Proposition 2.4.4
Suppose that A is an algebra that generates the σ-algebra B. Let B ∈ B and let ε > 0. Then there exists A ∈ A such that µ(A △ B) < ε.

Remark. It is straightforward to check that if µ(A △ B) < ε then |µ(A) − µ(B)| < ε.

§2.4.3 Examples of measure spaces

Lebesgue measure on [0, 1]. Take X = [0, 1] and take A to be the collection of all finite unions of subintervals of [0, 1]. For a subinterval [a, b] define

\[ \mu([a, b]) = b - a. \]
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra B. This is Lebesgue measure.

Lebesgue measure on R/Z. Take X = R/Z and take A to be the collection of all finite unions of subintervals of [0, 1). For a subinterval [a, b] define

\[ \mu([a, b]) = b - a. \]
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra B. This is Lebesgue measure on the circle.

Lebesgue measure on the k-dimensional torus. Take X = R^k/Z^k and take A to be the collection of all finite unions of k-dimensional sub-cubes ∏_{j=1}^k [a_j, b_j] of [0, 1]^k. For a sub-cube ∏_{j=1}^k [a_j, b_j] of [0, 1]^k, define

\[ \mu\left( \prod_{j=1}^{k} [a_j, b_j] \right) = \prod_{j=1}^{k} (b_j - a_j). \]
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra B. This is Lebesgue measure on the torus.

Stieltjes measures.¹ Take X = [0, 1] and let ρ : [0, 1] → R⁺ be an increasing function such that ρ(1) − ρ(0) = 1. Take A to be the algebra of finite unions of subintervals and define
\[ \mu_\rho([a, b]) = \rho(b) - \rho(a). \]
This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra B. We say that µ_ρ is the measure on [0, 1] with density ρ.

Dirac measures. Finally, we give an example of a class of measures that do not fall into the above categories. Let X be an arbitrary space and let B be an arbitrary σ-algebra. Let x ∈ X. Define the measure δ_x by
\[ \delta_x(A) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases} \]
Then δ_x defines a probability measure. It is called the Dirac measure at x.

¹ An approximate pronunciation of Stieltjes is 'Steeel-tyuz'.
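Each of these examples assigns a number to an interval, and it can be helpful to see them side by side. The Python sketch below evaluates Lebesgue measure, a Stieltjes measure with the illustrative choice ρ(x) = x² (so that ρ(1) − ρ(0) = 1), and the Dirac measure δ_{1/2} on a few subintervals of [0, 1]; the intervals are arbitrary.

def lebesgue(a, b):
    return b - a

def stieltjes(a, b, rho=lambda x: x * x):        # mu_rho([a, b]) = rho(b) - rho(a)
    return rho(b) - rho(a)

def dirac_half(a, b):                            # delta_{1/2}([a, b])
    return 1.0 if a <= 0.5 <= b else 0.0

for a, b in [(0.0, 0.25), (0.25, 0.75), (0.0, 1.0)]:
    print((a, b), lebesgue(a, b), stieltjes(a, b), dirac_half(a, b))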


§2.5 Exercises

Exercise 2.1
Prove Proposition 2.2.1: let α_1, ..., α_k ∈ R and let x_n = (α_1 n, ..., α_k n) ∈ R^k. Prove that x_n is uniformly distributed mod 1 if and only if α_1, ..., α_k, 1 are rationally independent.

Exercise 2.2 Deduce the general case of Weyl’s Theorem on Polynomials (where at least one non-constant coefficient is irrational) from the special case proved above (where the leading coefficient is irrational).

Exercise 2.3
Let α be irrational. Show that p(n) = αn^2 + n + 1 is uniformly distributed mod 1 by using Lemma 2.3.3 and Exercise 1.2: i.e. show that, for each m ≥ 1, the sequence p^{(m)}(n) = p(n+m) − p(n) of mth differences is uniformly distributed mod 1.

Exercise 2.4
Let
\[ p(n) = \alpha_k n^k + \alpha_{k-1} n^{k-1} + \cdots + \alpha_1 n + \alpha_0, \qquad q(n) = \beta_k n^k + \beta_{k-1} n^{k-1} + \cdots + \beta_1 n + \beta_0. \]
Show that (p(n), q(n)) ∈ R^2 is uniformly distributed mod 1 if, for some 1 ≤ i ≤ k, α_i, β_i and 1 are rationally independent.

Exercise 2.5 Prove Lemma 2.4.1.

Exercise 2.6 Let X = [0, 1]. Find the smallest σ-algebra that contains the sets: [0, 1/4), [1/4, 1/2), [1/2, 3/4), and [3/4, 1]

Exercise 2.7
Let X = [0, 1] and let B denote the Borel σ-algebra. A dyadic interval is an interval of the form
\[ \left[ \frac{p_1}{2^k}, \frac{p_2}{2^k} \right], \qquad p_1, p_2 \in \{0, 1, \ldots, 2^k\}. \]
Show that the algebra formed by taking finite unions of all dyadic intervals (over all k ∈ N) generates the Borel σ-algebra.

Exercise 2.8
Show that A = {all finite unions of subintervals of [0, 1]} is an algebra.

Exercise 2.9
Let µ denote Lebesgue measure on [0, 1]. Show that for any x ∈ [0, 1] we have that µ({x}) = 0. Hence show that the Lebesgue measure of any countable set is zero. Show that Lebesgue almost every point in [0, 1] is irrational.

Exercise 2.10
Let X = [0, 1]. Let µ = δ_{1/2} denote the Dirac δ-measure at 1/2. Show that
\[ \mu([0, 1/2) \cup (1/2, 1]) = 0. \]
Conclude that µ-almost every point in [0, 1] is equal to 1/2.


3. Lebesgue integration. Invariant measures

§3.1 Lebesgue integration

Let (X, B, µ) be a measure space. We are interested in how to integrate functions defined on X with respect to the measure µ. In the special case when X = [0, 1], B is the Borel σ-algebra and µ is Lebesgue measure, this will extend the definition of the Riemann integral to a class of functions that are not Riemann integrable.

Definition. Let f : X → R be a function. If D ⊂ R then we define the pre-image of D to be the set
\[ f^{-1}D = \{ x \in X \mid f(x) \in D \}. \]
A function f : X → R is measurable if f^{-1}D ∈ B for every Borel subset D of R. One can show that this is equivalent to requiring that f^{-1}(−∞, c) ∈ B for all c ∈ R.

A function f : X → C is measurable if both the real and imaginary parts, Re f and Im f, are measurable.

Remark. In writing f^{-1}D, we are not assuming that f is a bijection. We are writing f^{-1}D to denote the pre-image of the set D.

We define integration via simple functions.

Definition. A function f : X → R is simple if it can be written as a linear combination of characteristic functions of sets in B, i.e.:
\[ f = \sum_{j=1}^{r} a_j \chi_{B_j}, \]
for some a_j ∈ R, B_j ∈ B, where the B_j are pairwise disjoint.

Remarks.

(i) Note that the sets B_j are sets in the σ-algebra B; even in the case when X = [0, 1] we do not assume that the sets B_j are intervals.

(ii) For example, χQ∩[0,1] is a simple function. Note, however, that χQ∩[0,1] is not Riemann integrable.

For a simple function f : X → R we define
\[ \int f \, d\mu = \sum_{j=1}^{r} a_j \mu(B_j). \qquad (3.1.1) \]
For example, if µ denotes Lebesgue measure on [0, 1] then

\[ \int \chi_{Q \cap [0,1]} \, d\mu = \mu(Q \cap [0, 1]) = 0, \]
as Q ∩ [0, 1] is a countable set and so has Lebesgue measure zero.

A simple function can be written as a linear combination of characteristic functions of pairwise disjoint sets in many different ways (for example, χ_{[1/4,3/4]} = χ_{[1/4,1/2)} + χ_{[1/2,3/4]}). However, one can show that the definition of the integral of a simple function f given in (3.1.1) is independent of the choice of representation of f as a linear combination of characteristic functions. Thus for a simple function f, the integral of f can be regarded as being the area of the region in X × R bounded by the graph of f.

If f : X → R, f ≥ 0, is measurable then one can show that there exists an increasing sequence of simple functions f_n such that f_n ↑ f pointwise¹ as n → ∞, and we define

\[ \int f \, d\mu = \lim_{n\to\infty} \int f_n \, d\mu. \]
This can be shown to exist (although it may be ∞) and to be independent of the choice of sequence f_n.

For an arbitrary measurable function f : X → R, we write f = f⁺ − f⁻, where f⁺ = max{f, 0} ≥ 0 and f⁻ = max{−f, 0} ≥ 0, and define
\[ \int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu. \]
If ∫ f⁺ dµ = ∞ and ∫ f⁻ dµ is finite then we set ∫ f dµ = ∞. Similarly, if ∫ f⁺ dµ is finite but ∫ f⁻ dµ = ∞ then we set ∫ f dµ = −∞. If both ∫ f⁺ dµ and ∫ f⁻ dµ are infinite then we leave ∫ f dµ undefined.

Finally, for a measurable function f : X → C, we define
\[ \int f \, d\mu = \int \mathrm{Re} f \, d\mu + i \int \mathrm{Im} f \, d\mu. \]
We say that f is integrable if
\[ \int |f| \, d\mu < +\infty. \]
(Note that, in the case of a measurable function f : X → R, saying that f is integrable is equivalent to saying that both ∫ f⁺ dµ and ∫ f⁻ dµ are finite.)

Denote the space of C-valued integrable functions by L¹(X, B, µ). (We shall see a slightly more sophisticated definition of this space below.)

Note that when we write ∫ f dµ we are implicitly integrating over the whole space X. We can define integration over subsets of X as follows.

Definition. Let (X, B, µ) be a probability space. Let f ∈ L¹(X, B, µ) and let B ∈ B. Then χ_B f ∈ L¹(X, B, µ). We define

\[ \int_B f \, d\mu = \int \chi_B f \, d\mu. \]

¹ f_n ↑ f pointwise means: for every x, f_n(x) is an increasing sequence of real numbers and lim_{n→∞} f_n(x) = f(x).
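On a finite space the definition of the integral of a simple function can be written out directly, which may help fix the ideas. The Python sketch below computes ∫ f dµ = Σ_j a_j µ(B_j) for a toy probability measure on a three-point space; the measure and the function are arbitrary illustrative data.

def integrate_simple(simple_f, mu):
    # simple_f: list of pairs (a_j, B_j) with the sets B_j pairwise disjoint
    # mu: dict assigning a mass to each point of a finite space
    return sum(a * sum(mu[x] for x in B) for a, B in simple_f)

mu = {1: 0.2, 2: 0.3, 3: 0.5}                    # a probability measure on {1, 2, 3}
f = [(4.0, {1}), (-1.0, {2, 3})]                 # f = 4*chi_{1} - chi_{2,3}
print(integrate_simple(f, mu))                   # 4*0.2 - 1*0.8 = 0.0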


§3.1.1 Examples

Lebesgue measure. Let X = [0, 1] and let µ denote Lebesgue measure on the Borel σ-algebra. If f : [0, 1] → R is Riemann integrable then it is also Lebesgue integrable and the two definitions agree. However, there are plenty of examples of functions which are Lebesgue integrable but not Riemann integrable. For example, take f(x) = χ_{Q∩[0,1]}(x) defined on [0, 1] to be the characteristic function of the rationals. Then f(x) = 0 µ-a.e. Hence f is integrable and ∫ f dµ = 0. However, f is not Riemann integrable.

The Stieltjes integral. Let ρ : [0, 1] → R⁺ and suppose that ρ is differentiable. Then one can show that
\[ \int f \, d\mu_\rho = \int f(x)\rho'(x) \, dx. \]

Integration with respect to Dirac measures. Let x ∈ X. Recall that we defined the Dirac measure at x by
\[ \delta_x(B) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases} \]
If χ_B denotes the characteristic function of B then

\[ \int \chi_B \, d\delta_x = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases} \]

Suppose that f = Σ_j a_j χ_{B_j} is a simple function and that, without loss of generality, the B_j are pairwise disjoint. Then ∫ f dδ_x = a_j where j is chosen so that x ∈ B_j (and the integral equals zero if no such B_j exists). Now let f : X → R. By choosing an increasing sequence of simple functions, we see that

\[ \int f \, d\delta_x = f(x). \]
We say that two measurable functions f, g : X → C are equivalent or equal µ-a.e. if f = g µ-a.e., i.e. if µ({x ∈ X | f(x) ≠ g(x)}) = 0. The following result says that if two functions differ only on a set of measure zero then their integrals are equal.

Lemma 3.1.1
Suppose that f, g ∈ L¹(X, B, µ) and f, g are equal µ-a.e. Then ∫ f dµ = ∫ g dµ.

Functions being equivalent is an equivalence relation. We shall write L¹(X, B, µ) for the set of equivalence classes of integrable functions f : X → C on (X, B, µ). We define
\[ \|f\|_1 = \int |f| \, d\mu. \]
Then d(f, g) = ||f − g||_1 is a metric on L¹(X, B, µ). One can show that L¹(X, B, µ) is a vector space; indeed, it is complete in the L¹ metric, and so is a Banach space.

Remark. In practice, we will often abuse notation and regard elements of L¹(X, B, µ) as functions rather than equivalence classes of functions. In general, in measure theory one can often ignore sets of measure zero and treat two objects (functions, sets, etc.) that differ only on a set of measure zero as 'the same'.


More generally, for any p ≥ 1, we can define the space L^p(X, B, µ) consisting of (equivalence classes of) measurable functions f : X → C such that |f|^p is integrable. We can again define a metric on L^p(X, B, µ) by defining d(f, g) = ||f − g||_p where
\[ \|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p} \]
is the L^p norm.

Apart from L¹, the most interesting L^p space is L²(X, B, µ). This is a Hilbert space² with the inner product
\[ \langle f, g \rangle = \int f\bar{g} \, d\mu. \]
The Cauchy-Schwarz inequality holds: |⟨f, g⟩| ≤ ||f||_2 ||g||_2 for all f, g ∈ L²(X, B, µ). Suppose that µ is a finite measure. It follows from the Cauchy-Schwarz inequality that L²(X, B, µ) ⊂ L¹(X, B, µ).

In general, the Riemann integral does not behave well with respect to limits. For example, if f_n is a sequence of Riemann integrable functions such that f_n(x) → f(x) at every point x then it does not follow that f is Riemann integrable. Even if f is Riemann integrable, it does not follow that ∫ f_n(x) dx → ∫ f(x) dx. The following convergence theorems hold for the Lebesgue integral.

Theorem 3.1.2 (Monotone Convergence Theorem)
Suppose that f_n : X → R is an increasing sequence of integrable functions on (X, B, µ). Suppose that ∫ f_n dµ is a bounded sequence of real numbers (i.e. there exists M > 0 such that |∫ f_n dµ| ≤ M for all n). Then f(x) = lim_{n→∞} f_n(x) exists µ-a.e. Moreover, f is integrable and
\[ \int f \, d\mu = \lim_{n\to\infty} \int f_n \, d\mu. \]

Theorem 3.1.3 (Dominated Convergence Theorem)
Suppose that g : X → R is integrable and that f_n : X → R is a sequence of measurable functions with |f_n(x)| ≤ g(x) µ-a.e. and lim_{n→∞} f_n(x) = f(x) µ-a.e. Then f is integrable and

\[ \lim_{n\to\infty} \int f_n \, d\mu = \int f \, d\mu. \]

Remark. Both the Monotone Convergence Theorem and the Dominated Convergence Theorem fail for Riemann integration.

§3.2 Invariant measures

We are now in a position to study dynamical systems. Let (X, B, µ) be a probability space. Let T : X → X be a dynamical system. If B ∈ B then we define
\[ T^{-1}B = \{ x \in X \mid T(x) \in B \}, \]
that is, T^{-1}B is the pre-image of B under T.

² An inner product ⟨·, ·⟩ : H × H → C on a complex vector space H is a function such that: (i) ⟨v, v⟩ ≥ 0 for all v ∈ H with equality if and only if v = 0, (ii) ⟨u, v⟩ = \overline{⟨v, u⟩}, and (iii) for each v ∈ H, u ↦ ⟨u, v⟩ is linear. An inner product determines a norm by setting ||v|| = (⟨v, v⟩)^{1/2}. A norm determines a metric by setting d(u, v) = ||u − v||. We say that H is a Hilbert space if the vector space H is complete with respect to the metric induced from the inner product.


Remark. Note that we do not have to assume that T is a bijection for this definition to make sense. For example, let T(x) = 2x mod 1 be the doubling map on [0, 1]. Then T is not a bijection. One can easily check that, for example, T^{-1}(0, 1/2) = (0, 1/4) ∪ (1/2, 3/4).

Definition. A transformation T : X → X is said to be measurable if T^{-1}B ∈ B for all B ∈ B.

Remark. We will often work with compact metric spaces X equipped with the Borel σ-algebra. In this setting, any continuous transformation is measurable.

Remark. Suppose that A is an algebra of sets that generates the σ-algebra B. One can show that if T^{-1}A ∈ B for all A ∈ A then T is measurable.

Definition. We say that T is a measure-preserving transformation (m.p.t. for short) or, equivalently, µ is said to be a T-invariant measure, if µ(T^{-1}B) = µ(B) for all B ∈ B.

§3.3 Using the Hahn-Kolmogorov Extension Theorem to prove invariance

Recall the Hahn-Kolmogorov Extension Theorem:

Theorem 3.3.1 (Hahn-Kolmogorov Extension Theorem)
Let A be an algebra of subsets of X and let B(A) denote the σ-algebra generated by A. Suppose that µ : A → [0, 1] satisfies:

(i) µ(∅) = 0;

(ii) if A_n ∈ A, n ≥ 1, are pairwise disjoint and if ∪_{n=1}^∞ A_n ∈ A then
\[ \mu\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n). \]

Then there is a unique probability measure µ : B(A) → [0, 1] which is an extension of µ : A → [0, 1].

That is, if µ looks like a measure on an algebra A, then it extends uniquely to a measure defined on the σ-algebra B(A) generated by A.

Corollary 3.3.2
Let A be an algebra of subsets of X. Suppose that µ_1 and µ_2 are two measures on B(A) such that µ_1(A) = µ_2(A) for all A ∈ A. Then µ_1 = µ_2 on B(A).

We shall discuss several examples of dynamical systems and prove that certain naturally occurring measures are invariant using the Hahn-Kolmogorov Extension Theorem.

Suppose that (X, B, µ) is a probability space and suppose that T : X → X is measurable. We define a new measure T_*µ by
\[ T_*\mu(B) = \mu(T^{-1}B) \qquad (3.3.1) \]
where B ∈ B. It is straightforward to check that T_*µ is a probability measure on (X, B, µ) (see Exercise 3.4). Thus µ is a T-invariant measure if and only if T_*µ = µ, i.e. T_*µ and µ are the same measure. Corollary 3.3.2 says that if two measures agree on an algebra, then they agree on the σ-algebra generated by that algebra. Hence if we can show that T_*µ(A) = µ(A) for all sets A ∈ A for some algebra A that generates B, then T_*µ = µ, and so µ is a T-invariant measure.


§3.3.1 The doubling map

Let X = R/Z be the circle, B be the Borel σ-algebra, and let µ denote Lebesgue measure. Define the doubling map by T(x) = 2x mod 1.

Proposition 3.3.3
Let X = R/Z be the circle, B be the Borel σ-algebra, and let µ denote Lebesgue measure. Define the doubling map by T(x) = 2x mod 1. Then Lebesgue measure µ is T-invariant.

Proof. Let A denote the algebra of finite unions of intervals. For an interval [a, b] we have that
\[ T^{-1}[a, b] = \{ x \in R/Z \mid T(x) \in [a, b] \} = \left[ \frac{a}{2}, \frac{b}{2} \right] \cup \left[ \frac{a+1}{2}, \frac{b+1}{2} \right]. \]
See Figure 3.3.1.

[Figure 3.3.1: The pre-image of an interval under the doubling map.]

Hence

\begin{align*}
T_*\mu([a, b]) &= \mu(T^{-1}[a, b]) \\
&= \mu\left( \left[ \frac{a}{2}, \frac{b}{2} \right] \cup \left[ \frac{a+1}{2}, \frac{b+1}{2} \right] \right) \\
&= \frac{b}{2} - \frac{a}{2} + \frac{b+1}{2} - \frac{a+1}{2} \\
&= b - a = \mu([a, b]).
\end{align*}
Hence T_*µ = µ on the algebra A. As A generates the Borel σ-algebra, by uniqueness in the Hahn-Kolmogorov Extension Theorem we see that T_*µ = µ. Hence Lebesgue measure is T-invariant. ✷
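Invariance can also be checked by simulation (a sanity check, not a proof): if X is distributed according to Lebesgue measure then µ(T^{-1}[a, b]) is exactly the probability that T(X) lies in [a, b], so for an invariant measure the empirical frequency of T(X) in [a, b] should be close to b − a. The interval and sample size in the Python sketch below are arbitrary.

import random

def T(x):
    return (2 * x) % 1.0           # the doubling map

a, b, N = 0.2, 0.7, 200000
hits = sum(1 for _ in range(N) if a <= T(random.random()) <= b)
print(hits / N, b - a)             # should agree to within Monte Carlo error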

§3.3.2 Rotations on a circle

Let X = R/Z be the circle, let B be the Borel σ-algebra and let µ be Lebesgue measure. Fix α ∈ R. Define T : R/Z → R/Z by T(x) = x + α mod 1. We call T a rotation through angle α.


One can also regard R/Z as the unit circle K = {z ∈ C | |z| = 1} in the complex plane via the map t ↦ e^{2πit}. In these co-ordinates, the map T becomes T(e^{2πiθ}) = e^{2πiα}e^{2πiθ}, which is a rotation about the origin through the angle 2πα.

Proposition 3.3.4
Let T : R/Z → R/Z, T(x) = x + α mod 1, be a circle rotation. Then Lebesgue measure is an invariant measure.

Proof. Let [a, b] ⊂ R/Z be an interval. By the Hahn-Kolmogorov Extension Theorem, if we can show that T_*µ([a, b]) = µ([a, b]) then it follows that µ(T^{-1}B) = µ(B) for all B ∈ B, hence µ is T-invariant.

Note that T^{-1}[a, b] = [a − α, b − α] where we interpret the endpoints mod 1. (One needs to be careful here: if a − α < 0 < b − α then T^{-1}([a, b]) = [0, b − α] ∪ [a − α + 1, 1], etc.) Hence
\[ T_*\mu([a, b]) = \mu([a - \alpha, b - \alpha]) = (b - \alpha) - (a - \alpha) = b - a = \mu([a, b]). \]
Hence T_*µ = µ on the algebra A. As A generates the Borel σ-algebra, by uniqueness in the Hahn-Kolmogorov Extension Theorem we see that T_*µ = µ. Hence Lebesgue measure is T-invariant. ✷

§3.3.3 The Gauss map

Let X = [0, 1] be the unit interval and let B be the Borel σ-algebra. Define the Gauss map T : X → X by
\[ T(x) = \begin{cases} \frac{1}{x} \bmod 1 & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases} \]
See Figure 3.3.2.

[Figure 3.3.2: The graph of the Gauss map (note that there are, in fact, infinitely many branches to the graph; only the first 5 are illustrated).]

The Gauss map is very closely related to continued fractions. Recall that if x ∈ (0, 1) then x has a continued fraction expansion of the form
\[ x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}} \qquad (3.3.2) \]
where x_j ∈ N. If x is rational then this expansion is finite. One can show that x is irrational if and only if it has an infinite continued fraction expansion. Moreover, if x is irrational then it has a unique infinite continued fraction expansion.

If x has continued fraction expansion given by (3.3.2) then
\[ \frac{1}{x} = x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}}. \]
Hence, taking the fractional part, we see that T(x) has continued fraction expansion given by
\[ T(x) = \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}}, \]
i.e. T acts by deleting the zeroth term in the continued fraction expansion of x and then shifting the remaining digits one place to the left.

The Gauss map does not preserve Lebesgue measure (see Exercise 3.5). However it does preserve Gauss' measure µ defined by
\[ \mu(B) = \frac{1}{\log 2} \int_B \frac{dx}{1+x} \]
(here log denotes the natural logarithm; the factor log 2 is a normalising constant to make this a probability measure).

Proposition 3.3.5 Gauss’ measure is an invariant measure for the Gauss map.

Proof. It is sufficient to check that $\mu([a,b]) = \mu(T^{-1}[a,b])$ for any interval $[a,b]$. First note that
$$T^{-1}[a,b] = \bigcup_{n=1}^{\infty} \left[\frac{1}{b+n}, \frac{1}{a+n}\right].$$
Thus

$$\begin{aligned}
\mu(T^{-1}[a,b]) &= \frac{1}{\log 2} \sum_{n=1}^{\infty} \int_{1/(b+n)}^{1/(a+n)} \frac{1}{1+x}\,dx \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left[ \log\left(1 + \frac{1}{a+n}\right) - \log\left(1 + \frac{1}{b+n}\right) \right] \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left[ \log(a+n+1) - \log(a+n) - \log(b+n+1) + \log(b+n) \right] \\
&= \lim_{N \to \infty} \frac{1}{\log 2} \sum_{n=1}^{N} \left[ \log(a+n+1) - \log(a+n) - \log(b+n+1) + \log(b+n) \right] \\
&= \frac{1}{\log 2} \lim_{N \to \infty} \left[ \log(a+N+1) - \log(a+1) - \log(b+N+1) + \log(b+1) \right] \\
&= \frac{1}{\log 2} \left( \log(b+1) - \log(a+1) + \lim_{N \to \infty} \log\left(\frac{a+N+1}{b+N+1}\right) \right) \\
&= \frac{1}{\log 2} \left( \log(b+1) - \log(a+1) \right) \\
&= \frac{1}{\log 2} \int_a^b \frac{1}{1+x}\,dx = \mu([a,b]),
\end{aligned}$$
as required. ✷
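One can also check this invariance numerically. The sketch below (my own illustration, not from the notes) samples from Gauss' measure by inverting its distribution function $F(x) = \log(1+x)/\log 2$, i.e. $X = 2^U - 1$ with $U$ uniform on $[0,1)$, and compares the empirical distributions of $X$ and $T(X)$.

import math, random

def T(x):
    return (1 / x) % 1.0 if x != 0 else 0.0

random.seed(1)
N = 200_000
samples = [2 ** random.random() - 1 for _ in range(N)]   # distributed according to Gauss' measure
t = 0.4
before = sum(x <= t for x in samples) / N
after = sum(T(x) <= t for x in samples) / N
print(before, after, math.log(1 + t) / math.log(2))      # all three values should be close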

§3.3.4 Markov shifts

Let $S$ be a finite set, for example $S = \{1, 2, \dots, k\}$, with $k \geq 2$. Let
$$\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in S\}$$
denote the set of all infinite sequences of symbols chosen from $S$. Thus a point $x$ in the phase space $\Sigma$ is an infinite sequence of symbols $x = (x_0, x_1, x_2, \dots)$. Define the shift map $\sigma: \Sigma \to \Sigma$ by

$$\sigma((x_0, x_1, x_2, \dots)) = (x_1, x_2, x_3, \dots)$$

(equivalently, $(\sigma(x))_j = x_{j+1}$). Thus $\sigma$ takes a sequence, deletes the zeroth term in this sequence, and then shifts the remaining terms in the sequence one place to the left.
When constructing a measure $\mu$ on the Borel $\sigma$-algebra $\mathcal{B}$ of $[0,1]$ we first defined $\mu$ on an algebra $\mathcal{A}$ that generates the $\sigma$-algebra $\mathcal{B}$ and then extended $\mu$ to $\mathcal{B}$ using the Hahn-Kolmogorov Extension Theorem. In this case, our algebra $\mathcal{A}$ was the collection of finite unions of intervals; thus to define $\mu$ on $\mathcal{A}$ it was sufficient to define $\mu$ on an interval. We want to use a similar procedure to define measures on $\Sigma$. To do this, we first need to define a metric on $\Sigma$, so that it makes sense to talk about the Borel $\sigma$-algebra, and then we need an algebra of subsets that generates the Borel $\sigma$-algebra.
Let $x, y \in \Sigma$. Suppose that $x \neq y$. Define $n(x,y) = n$ where $x_n \neq y_n$ but $x_j = y_j$ for $0 \leq j \leq n-1$. Thus $n(x,y)$ is the index of the first place in which the sequences $x, y$ disagree. For convenience, define $n(x,y) = \infty$ if $x = y$. Define
$$d(x,y) = \frac{1}{2^{n(x,y)}}.$$
Thus two sequences $x, y$ are close if they agree for a large number of initial places. One can show (see Exercise 3.10) that $d$ is a metric on $\Sigma$ and that the shift map $\sigma: \Sigma \to \Sigma$ is continuous.
Fix $i_j \in S$, $j = 0, 1, \dots, n-1$. We define the cylinder set
$$[i_0, i_1, \dots, i_{n-1}] = \{x = (x_j)_{j=0}^{\infty} \in \Sigma \mid x_j = i_j, \ j = 0, 1, \dots, n-1\}.$$

That is, the cylinder set [i0, i1,...,in−1] consists of all infinite sequences of symbols from S that begin i0, i1,...,in−1. We call n the rank of the cylinder. Cylinder sets for shifts often

play the same role that intervals do for maps of the unit interval or circle. Let $\mathcal{A}$ denote the algebra of all finite unions of cylinders. Then $\mathcal{A}$ generates the Borel $\sigma$-algebra $\mathcal{B}$. To see this we use Proposition 2.4.2. It is sufficient to check that $\mathcal{A}$ separates every pair of distinct points in $\Sigma$. Let $x = (x_j)_{j=0}^{\infty}, y = (y_j)_{j=0}^{\infty} \in \Sigma$ and suppose that $x \neq y$. Then there exists $n \geq 0$ such that $x_n \neq y_n$. Hence $x, y$ are in different cylinders of rank $n+1$, and the claim follows.
We will construct a family of $\sigma$-invariant measures on $\Sigma$ by first constructing them on cylinders and then extending them to the Borel $\sigma$-algebra by using the Hahn-Kolmogorov Extension Theorem.
A $k \times k$ matrix $P$ is called a stochastic matrix if
(i) $P(i,j) \in [0,1]$,
(ii) each row of $P$ sums to 1: for each $i$, $\sum_{j=1}^{k} P(i,j) = 1$.
(Here, $P(i,j)$ denotes the $(i,j)$th entry of the matrix $P$.)
We say that $P$ is irreducible if: for all $i, j$, there exists $n > 0$ such that $P^n(i,j) > 0$. We say that $P$ is aperiodic if there exists $n > 0$ such that every entry of $P^n$ is strictly positive. Thus $P$ is irreducible if for every $(i,j)$ there exists an $n$ such that the $(i,j)$th entry of $P^n$ is positive, and $P$ is aperiodic if this $n$ can be chosen to be independent of $(i,j)$.
Suppose that $P$ is irreducible. Let $d$ be the highest common factor of $\{n > 0 \mid P^n(i,i) > 0\}$. One can show that $P$ is aperiodic if and only if $d = 1$. We call $d$ the period of $P$.
In general, if $d$ is the period of an irreducible matrix $P$ then $\{1, 2, \dots, k\}$ can be partitioned into $d$ sets, $S_0, S_1, \dots, S_{d-1}$, say, such that $P(i,j) > 0$ only if $i \in S_\ell$, $j \in S_{\ell+1 \bmod d}$. The matrix $P^d$ restricted to the indices that comprise each set $S_j$ is then aperiodic.
The eigenvalues of aperiodic (or, more generally, irreducible) stochastic matrices are extremely well-behaved.

Theorem 3.3.6 (Perron-Frobenius Theorem)
Let $P$ be an irreducible stochastic matrix with period $d$. Then the following statements hold:

(i) The $d$th roots of unity are simple eigenvalues for $P$ and all other eigenvalues have modulus strictly less than 1.

(ii) Let $\mathbf{1}$ denote the column vector $(1, 1, \dots, 1)^T$. Then $P\mathbf{1} = \mathbf{1}$, so that $\mathbf{1}$ is a right eigenvector corresponding to the maximal eigenvalue 1. Moreover, there exists a corresponding left eigenvector $p = (p(1), \dots, p(k))$ for the eigenvalue 1, that is, $pP = p$. The vector $p$ has strictly positive entries $p(j) > 0$, and we can assume that $p$ is normalised so that $\sum_{j=1}^{k} p(j) = 1$.

(iii) For all $i, j \in \{1, 2, \dots, k\}$, we have that $P^{nd}(i,j) \to p(j)$ as $n \to \infty$.

Proof (not examinable). We prove only the aperiodic case. In this case, the period $d = 1$. We must show that 1 is a simple eigenvalue, construct the positive left eigenvector $p$, and show that $P^n(i,j) \to p(j)$ as $n \to \infty$.
First note that 1 is an eigenvalue of $P$ as $P\mathbf{1} = \mathbf{1}$; this follows from the fact that, for a stochastic matrix, the rows sum to 1. Suppose $P$ has an eigenvalue $\lambda$ with corresponding eigenvector $v$. Then $Pv = \lambda v$. Hence $P^n v = \lambda^n v$. As the entries of $P$ are non-negative we have that
$$|\lambda|^n |v| \leq P^n |v|.$$


Note that if P is stochastic then so is P n for any n 1. As P n is stochastic, the right-hand ≥ side is a bounded sequence in n. Hence P keeps an eigenvector in a bounded region of Ck. If λ > 1 then λn v , a contradiction if v = 0. Hence the eigenvalues of P have | | | || |→∞ 6 modulus less than or equal to 1. Suppose that P v = λv and λ = 1. Then P n v v . As P is aperiodic, we can choose | | | | ≥ | | n such that P n(i, j) > 0 for all i, j. Hence

k P n(i, j) v(j) v(i) (3.3.3) | | ≥ | | Xj=1 and choose i such that v(i ) = max v(j) 1 j k . Also, as P n is stochastic and 0 | 0 | {| | | ≤ ≤ } P n(i, j) > 0, we must have that

k v(i ) P n(i , j) v(j) (3.3.4) | 0 |≥ 0 | | Xj=1 as the right-hand side of (3.3.4) is a convex combination of the v(j) . Thus v(j) = v(i ) | | | | | 0 | for every j, 1 j k. We can assume, by normalising, that v(j) = 1 for all 1 j k. ≤ ≤ | | ≤ ≤ Now P v = λv, i.e. λv(i)= k P (i, j)v(j), a convex combination of v(j). As the v(j) j=1 | | all have the same modulus, this can only happen if all of the v(j) are the same. Hence v is P a multiple of 1 and λ = 1. So 1 is a simple eigenvalue and there are no other eigenvalues of modulus 1. Since 1 is a simple eigenvalue, there is a unique (up to scalar multiples) left eigenvector p such that pP = p. As P is non-negative, we have that p P p , i.e. | | ≥ | | k p(i) P (i, j) p(j) (3.3.5) | | ≥ | | Xi=1 and summing over j gives k k p(i) p(j) | |≥ | | Xi=1 Xj=1 as P is stochastic. Hence we must have equality in (3.3.5), i.e. p P = p . Hence p is a | | | | | | left eigenvector for P . Hence p is a scalar multiple of p , so without loss of generality we | | can assume that p(i) 0 for all i. ≥ To see that p(i) > 0, choose n such that all of the entries of P n are positive. Then n k n pP = p. Hence p(j) = i=1 p(i)P (i, j). The right-hand side of this expression is a sum of non-negative terms and can only be zero if p(i) = 0 for all 1 i k, i.e. if p = 0. Hence P ≤ ≤ all of the entries of p are strictly positive. k We can normalise p and assume that j=1 p(j) = 1. Decompose Rk into the sum V + V of eigenspaces where 0 1 P V = v p, v = 0 , V = span 1 0 { | h i } 1 { } so that V1 is the eigenspace corresponding to the eigenvalue 1 and V0 is the sum of the eigenspaces of the remaining eigenvalues. Then P (V ) = V and P (V ) V . Note that 1 1 0 ⊂ 0 if w V then P nw 0 as 1 is not an eigenvalue of P when restricted to V and the ∈ 0 → 0 eigenvalues of P restricted to V0 have modulus strictly less than 1.


Let $v \in \mathbb{R}^k$ and write $v = c\mathbf{1} + w$ where $\langle p, w \rangle = 0$. Hence $c = \langle p, v \rangle$. Then
$$P^n v = \langle p, v \rangle \mathbf{1} + P^n w.$$
Hence $P^n v \to \langle p, v \rangle \mathbf{1}$ as $n \to \infty$. Taking $v = e_j = (0, \dots, 0, 1, 0, \dots, 0)$, the standard basis vectors, we see that $P^n(i,j) \to p(j)$. ✷

Given an irreducible stochastic matrix $P$ with corresponding normalised left eigenvector $p$, we define a Markov measure $\mu_P$ on cylinders by defining
$$\mu_P([i_0, i_1, \dots, i_{n-1}]) = p(i_0) P(i_0, i_1) P(i_1, i_2) \cdots P(i_{n-2}, i_{n-1}).$$
We can then extend $\mu_P$ to a probability measure on the Borel $\sigma$-algebra of $\Sigma$.
Bernoulli measures are particular examples of Markov measures. Let $p = (p(1), \dots, p(k))$, $p(j) \in (0,1)$, $\sum_{j=1}^{k} p(j) = 1$, be a probability vector. Define
$$\mu_p([i_0, i_1, \dots, i_{n-1}]) = p(i_0) p(i_1) \cdots p(i_{n-1})$$
and then extend to the Borel $\sigma$-algebra. We call $\mu_p$ the Bernoulli-$p$ measure.
We can now prove that Markov measures are invariant for shift maps.

Proposition 3.3.7
Let $\sigma: \Sigma \to \Sigma$ be a shift map on the set of $k$ symbols $S$. Let $P$ be an irreducible stochastic matrix with left eigenvector $p$. Then the Markov measure $\mu_P$ is a $\sigma$-invariant measure.

Proof. It is sufficient to prove that $\mu_P(\sigma^{-1}[i_0, \dots, i_{n-1}]) = \mu_P([i_0, \dots, i_{n-1}])$ for each cylinder $[i_0, \dots, i_{n-1}]$. First note that
$$\sigma^{-1}[i_0, \dots, i_{n-1}] = \{x \in \Sigma \mid \sigma(x) \in [i_0, \dots, i_{n-1}]\} = \{x \in \Sigma \mid x = (i, i_0, \dots, i_{n-1}, \dots), \ i \in S\} = \bigcup_{i=1}^{k} [i, i_0, \dots, i_{n-1}].$$
Hence
$$\begin{aligned}
\mu_P(\sigma^{-1}[i_0, \dots, i_{n-1}]) &= \mu_P\left( \bigcup_{i=1}^{k} [i, i_0, \dots, i_{n-1}] \right) \\
&= \sum_{i=1}^{k} \mu_P([i, i_0, \dots, i_{n-1}]) && \text{as this is a disjoint union} \\
&= \sum_{i=1}^{k} p(i) P(i, i_0) P(i_0, i_1) \cdots P(i_{n-2}, i_{n-1}) \\
&= p(i_0) P(i_0, i_1) \cdots P(i_{n-2}, i_{n-1}) && \text{as } pP = p \\
&= \mu_P([i_0, \dots, i_{n-1}]),
\end{aligned}$$
where we have used the fact that $pP = p$. ✷
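A quick numerical sketch of this identity (my own example; the matrix and vector below are assumptions, not from the notes): summing the Markov measures of the cylinders $[i, i_0, \dots, i_{n-1}]$ over $i$ recovers the measure of $[i_0, \dots, i_{n-1}]$.

def markov_measure(p, P, cyl):
    # mu_P([i_0,...,i_{n-1}]) = p(i_0) P(i_0,i_1) ... P(i_{n-2},i_{n-1})
    m = p[cyl[0]]
    for a, b in zip(cyl, cyl[1:]):
        m *= P[a][b]
    return m

P = [[0.5, 0.5],
     [0.25, 0.75]]     # a 2x2 stochastic matrix
p = [1/3, 2/3]         # its left probability eigenvector: pP = p
cyl = (0, 1, 1)
lhs = sum(markov_measure(p, P, (i,) + cyl) for i in range(2))
print(lhs, markov_measure(p, P, cyl))   # the two values coincide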


Remark. Bernoulli measures are familiar from probability. Suppose that $S = \{H, T\}$ so that $\Sigma$ denotes all infinite sequences of $H$s and $T$s. We can think of an element of $\Sigma$ as the outcome of an infinite sequence of coin tosses. Suppose that $p = (p_H, p_T)$ is a probability vector with corresponding Bernoulli measure $\mu_p$. Then, for example, the cylinder set $[H, H, T]$ denotes the set of (infinite) coin tosses that start $H, H, T$, and this set has measure $p_H p_H p_T$, corresponding to the probability of tossing $H, H, T$.
Markov measures are similar. Given a stochastic matrix $P = (P(i,j))$ and a left probability eigenvector $p = (p(1), \dots, p(k))$ we defined
$$\mu_P([i_0, i_1, \dots, i_{n-1}]) = p(i_0) P(i_0, i_1) P(i_1, i_2) \cdots P(i_{n-2}, i_{n-1}).$$
We can regard $p(i_0)$ as being the probability of outcome $i_0$. Then we can regard $P(i_0, i_1)$ as being the probability of outcome $i_1$, given that the previous outcome was $i_0$.

3.4 Exercises § Exercise 3.1 Show that in Weyl’s Criterion (Theorem 1.2.1) one cannot replace the hypothesis in equa- tion (1.2.1) that f is continuous with the hypothesis that f L1(R/Z, ,µ) (where µ ∈ B denotes Lebesgue measure).

Exercise 3.2 Let X be a compact metric space equipped with the Borel σ-algebra . Show that a B continuous transformation T : X X is measurable. → Exercise 3.3 Give an example of a sequence of functions f L1([0, 1], ,µ) (µ = Lebesgue measure) n ∈ B such that f 0 µ-a.e. but f 0 in L1. n → n 6→ Exercise 3.4 Let (X, ,µ) be a probability space and suppose that T : X X is measurable. Show B → that T µ is a probability measure on (X, ,µ). ∗ B Exercise 3.5 (i) Show that the Gauss map does not preserve Lebesgue measure. (That is, find an example of a Borel set B such that T −1B and B have different Lebesgue measures.) (ii) Let µ denote Gauss’ measure and let λ denote Lebesgue measure. Show that if B , ∈B the Borel σ-algebra of [0, 1], then 1 1 λ(B) µ(B) λ(B). (3.4.1) 2 log 2 ≤ ≤ log 2 Conclude that a set B has Lebesgue measure zero if and only if it has Gauss’ ∈ B measure zero. (Two measures with the same sets of measure zero are said to be equivalent.) (iii) Using (3.4.1), show that f L1([0, 1], ,µ) if and only if f L1([0, 1], , λ). ∈ B ∈ B Exercise 3.6 For an integer k 2 define T : R/Z R/Z by T (x) = kx mod 1. Show that T preserves ≥ → Lebesgue measure.


Exercise 3.7
Let $\beta > 1$ denote the golden ratio (so that $\beta^2 = \beta + 1$). Define $T: [0,1] \to [0,1]$ by $T(x) = \beta x \bmod 1$. Show that $T$ does not preserve Lebesgue measure. Define the measure $\mu$ by $\mu(B) = \int_B k(x)\,dx$ where
$$k(x) = \begin{cases} \dfrac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}} & \text{on } [0, 1/\beta), \\[2ex] \dfrac{1}{\beta\left(\frac{1}{\beta} + \frac{1}{\beta^3}\right)} & \text{on } [1/\beta, 1]. \end{cases}$$
By using the Hahn-Kolmogorov Extension Theorem, show that $\mu$ is a $T$-invariant measure.

Exercise 3.8
Define the logistic map $T: [0,1] \to [0,1]$ by $T(x) = 4x(1-x)$. Define the measure $\mu$ by
$$\mu(B) = \frac{1}{\pi} \int_B \frac{1}{\sqrt{x(1-x)}}\,dx.$$
(i) Check that $\mu$ is a probability measure.

(ii) By using the Hahn-Kolmogorov Extension Theorem, show that µ is a T -invariant measure.

Exercise 3.9 Define T : [0, 1] [0, 1] by → 1 1 n(n + 1)x n if x , , n N T (x)= − ∈ n + 1 n ∈     0 if x = 0. This is called the L¨uroth map. Show that ∞ 1 = 1. n(n + 1) n=1 X Show that T preserves Lebesgue measure.

Exercise 3.10 Let Σ = x = (x )∞ x 1, 2,...,k denote the shift space on k symbols. For { j j=0 | j ∈ { }} x, y Σ, define n(x, y) to be the index of the first place in which the two sequences x, y ∈ disagree, and write n(x, y)= if x = y. Define ∞ 1 d(x, y)= . 2n(x,y) (i) Show that d(x, y) is a metric.

(ii) Show that the shift map σ is continuous.

(iii) Show that a cylinder set [i0,...,in−1] is both open and closed. (One can also prove that Σ is compact; we shall use this fact later.)


Exercise 3.11 Show that the matrix 0 1 0 0 0 1/4 0 3/4 0 0   P = 0 1/2 0 1/2 0  0 0 3/4 0 1/4     0 0 0 1 0      is irreducible but not aperiodic. Show that P has period 2. Show that 1, 2,..., 5 can be { } partitioned into two sets S0 S1 so that P (i, j) > 0 only if i Sℓ and j Sℓ+1 mod 2. Show 2 ∪ ∈ ∈ that P , when restricted to indices in S0 and in S1 is aperiodic. Determine the eigenvalues of P . Find the unique left probability eigenvector p such that pP = p.

Exercise 3.12 Show that Bernoulli measures are Markov measures. That is, given a probability vector p = (p(1),...,p(k)), construct a stochastic matrix P such that pP = p. Show that the corresponding Markov measure is the Bernoulli-p measure.


4. More examples of invariant measures

§4.1 Criteria for invariance

We shall give more examples of invariant measures. Recall that, given a measurable transformation $T: X \to X$ of a probability space $(X, \mathcal{B}, \mu)$, we say that $\mu$ is a $T$-invariant measure (or, equivalently, $T$ is a measure-preserving transformation) if $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}$.
We will need the following characterisations of invariance.

Lemma 4.1.1
Let $T: X \to X$ be a measurable transformation of a probability space $(X, \mathcal{B}, \mu)$. Then the following are equivalent:
(i) $T$ is a measure-preserving transformation;
(ii) for each $f \in L^1(X, \mathcal{B}, \mu)$ we have
$$\int f\,d\mu = \int f \circ T\,d\mu;$$
(iii) for each $f \in L^2(X, \mathcal{B}, \mu)$ we have
$$\int f\,d\mu = \int f \circ T\,d\mu.$$

Proof. We will use the identity $\chi_{T^{-1}B} = \chi_B \circ T$; this is straightforward to check, see Exercise 4.1.
We prove that (i) implies (ii). Suppose that $T$ is a measure-preserving transformation. For any characteristic function $\chi_B$, $B \in \mathcal{B}$,
$$\int \chi_B\,d\mu = \mu(B) = \mu(T^{-1}B) = \int \chi_{T^{-1}B}\,d\mu = \int \chi_B \circ T\,d\mu,$$
and so the equality holds for any simple function (a finite linear combination of characteristic functions). Given any $f \in L^1(X, \mathcal{B}, \mu)$ with $f \geq 0$, we can find an increasing sequence of simple functions $f_n$ with $f_n \to f$ pointwise, as $n \to \infty$. For each $n$ we have
$$\int f_n\,d\mu = \int f_n \circ T\,d\mu$$
and, applying the Monotone Convergence Theorem to both sides, we obtain
$$\int f\,d\mu = \int f \circ T\,d\mu.$$
To extend the result to a general real-valued integrable function $f$, consider the positive and negative parts. To extend the result to complex-valued integrable functions $f$, take real and imaginary parts.


That (ii) implies (iii) follows immediately, as L2(X, ,µ) L1(X, ,µ). B ⊂ B 2 Finally, we prove that (iii) implies (i). Let B . Then χB L (X, ,µ) as 2 ∈ B ∈ B χB dµ = χB dµ = µ(B). Recalling that χB T = χ 1 we have that | | ◦ T − B R R −1 µ(B)= χB dµ = χB T dµ = χ 1 dµ = µ(T B) ◦ T − B Z Z Z so that µ is a T -invariant probability measure. ✷
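Criterion (ii) is easy to test numerically for a given example. The sketch below (my own illustration, not from the notes) compares Monte Carlo estimates of $\int f\,d\mu$ and $\int f \circ T\,d\mu$ for the doubling map with Lebesgue measure and one particular $f$.

import math, random

def T(x):
    return (2 * x) % 1.0

def f(x):
    return math.cos(2 * math.pi * x) ** 2 + x

random.seed(2)
N = 300_000
xs = [random.random() for _ in range(N)]
print(sum(f(x) for x in xs) / N, sum(f(T(x)) for x in xs) / N)   # should agree closely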

§4.2 Invariant measures on periodic orbits

Recall that if $x \in X$ then we define the Dirac measure $\delta_x$ by
$$\delta_x(B) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases}$$
We also recall that if $f: X \to \mathbb{R}$ then $\int f\,d\delta_x = f(x)$.
Let $T: X \to X$ be a measurable dynamical system defined on a measurable space $(X, \mathcal{B})$. Suppose that $x = T^n x$ is a periodic point with period $n$. Then the probability measure
$$\mu = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x}$$
is $T$-invariant. This is clear from Lemma 4.1.1, noting that for $f \in L^1(X, \mathcal{B}, \mu)$
$$\int f \circ T\,d\mu = \frac{1}{n}\left(f(Tx) + \cdots + f(T^{n-1}x) + f(T^n x)\right) = \frac{1}{n}\left(f(x) + f(Tx) + \cdots + f(T^{n-1}x)\right) = \int f\,d\mu,$$
using the fact that $T^n x = x$.
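A concrete instance (my own example, not from the notes): $x = 1/7$ is a periodic point of the doubling map with orbit $\{1/7, 2/7, 4/7\}$, and the orbit average of $f$ equals the orbit average of $f \circ T$, which is exactly the invariance condition of Lemma 4.1.1(ii) for the measure supported on this orbit.

from fractions import Fraction

def T(x):
    return (2 * x) % 1

x = Fraction(1, 7)
orbit = [x, T(x), T(T(x))]          # the period-3 orbit {1/7, 2/7, 4/7}
f = lambda t: float(t) ** 2
print(sum(f(p) for p in orbit) / 3, sum(f(T(p)) for p in orbit) / 3)   # equal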

4.3 The change of variables formula § The change of variables formula (equivalently, integration by substitution) for (Riemann) integration should be familiar to you. It can be stated in the following way: if u : [a, b] → [c, d] is a differentiable bijection with continuous derivative and f : [c, d] R is (Riemann) → integrable then f u : [a, b] R is (Riemann) integrable and ◦ → u(b) b f(x) dx = f(u(x))u′(x) dx. (4.3.1) Zu(a) Za Allowing for the possibility that u is decreasing (so that u(b) < u(a)), we can rewrite (4.3.1) as f(x) dx = f(u(x)) u′(x) dx. (4.3.2) | | Z[c,d] Z[a,b] We would like a version of (4.3.2) that holds for (Lebesgue) integrable functions on subsets of Rn, equipped with Lebesgue measure on Rn.


Theorem 4.3.1 (Change of variables formula) Let B Rn be a Borel subset of Rn and suppose that B U for some open subset U. ⊂ ⊂ Suppose that u : U Rn is a diffeomorphism onto its image (i.e. u : U u(U) is a → → differentiable bijection with differentiable inverse). Then u(B) is a Borel set. Let µ denote Lebesgue measure on Rn and let f : Rn C be integrable. Then → f dµ = f u det Du dµ ◦ | | Zu(B) ZB where Du denotes the matrix of partial derivatives of u. There are more sophisticated versions of the change of variables formula that hold for arbitrary measures on Rn.

4.4 Rotations of a circle § We illustrate how one can use the change of variables formula for integration to prove that Lebesgue measure is an invariant measure for certain maps on the circle.

Proposition 4.4.1
Fix $\alpha \in \mathbb{R}$ and define $T(x) = x + \alpha \bmod 1$. Then Lebesgue measure $\mu$ is $T$-invariant.

Proof. By Lemma 4.1.1(ii) we need to show that $\int f \circ T\,d\mu = \int f\,d\mu$ for every $f \in L^1(X, \mathcal{B}, \mu)$.
Recall that we can identify functions on $\mathbb{R}/\mathbb{Z}$ with 1-periodic functions on $\mathbb{R}$. By using the substitution $u(x) = x + \alpha$ and the change of variables formula for integration we have that
$$\int f \circ T\,d\mu = \int_0^1 f(Tx)\,dx = \int_0^1 f(x+\alpha)\,dx = \int_{\alpha}^{1+\alpha} f(x)\,dx = \int_{\alpha}^{1} f(x)\,dx + \int_{1}^{1+\alpha} f(x)\,dx = \int_{\alpha}^{1} f(x)\,dx + \int_{0}^{\alpha} f(x)\,dx = \int_0^1 f(x)\,dx,$$
where we have used the fact that $\int_0^{\alpha} f(x)\,dx = \int_1^{1+\alpha} f(x)\,dx$ by the periodicity of $f$. ✷

x1 x1 . A . .  .  7→  .  x x  k   k      For brevity, we shall often abuse this notation by writing this as (x ,...,x ) A(x ,...,x ). 1 k 7→ 1 k Since A is an integer matrix it maps Zk to itself. We claim that A allows us to define a map T = T : X X : (x ,...,x )+ Zk A(x ,...,x )+ Zk. A → 1 k 7→ 1 k We shall often abuse notation and write T (x1,...,xk)= A(x1,...,xk) mod 1.


To see that this map is well defined, we need to check that if x + Zk = y + Zk then Ax + Zk = Ay + Zk. If x,y Rk give the same point in the torus, then x = y + n for some ∈ n Zk. Hence Ax = A(y + n) = Ay + An. As A maps Zk to itself, we see that An Zk ∈ ∈ so that Ax, Ay determine the same point in the torus.

Definition. Let A = (a(i, j)) denote a k k matrix with integer entries such that det A = × 6 0. Then we call the map T : Rk/Zk Rk/Zk a linear toral endomorphism. A → The map T is not invertible in general. However, if det A = 1 then A−1 exists and is ± an integer matrix. Hence we have a map T −1 given by −1 −1 T (x1,...,xk)= A (x1,...,xk) mod 1. One can easily check that T −1 is the inverse of T .

Definition. Let A = (a(i, j)) denote a k k matrix with integer entries such that det A = × 1. Then we call the map T : Rk/Zk Rk/Zk a linear toral automorphism. ± A → Remark. The reason for this nomenclature is clear. If TA is either a linear toral en- domorphism or linear toral automorphism, then it is an endomorphism or automorphism, respectively, of the torus regarded as an additive group.

Example. Take A to be the matrix 2 1 A = 1 1   and define T : R2/Z2 R2/Z2 to be the induced map: → T (x1,x2) = (2x1 + x2 mod 1,x1 + x2 mod 1). Then T is a linear toral automorphism and is called Arnold’s CAT map (CAT stands for ‘C’ontinuous ‘A’utomorphism of the ‘T’orus). See Figure 4.5.1.
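As a small illustration of how one works with $T_A$ in practice (my own sketch, not from the notes), the CAT map can be iterated exactly on points with rational coordinates, which are always periodic; the point $(1/5, 2/5)$ is an example.

from fractions import Fraction

def cat(p):
    # Arnold's CAT map on the 2-torus: (x, y) -> (2x + y, x + y) mod 1.
    x, y = p
    return ((2 * x + y) % 1, (x + y) % 1)

p = (Fraction(1, 5), Fraction(2, 5))
q = p
for n in range(1, 30):
    q = cat(q)
    if q == p:
        print("period", n)   # the orbit of (1/5, 2/5) returns to itself after finitely many steps
        break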

Definition. Suppose that det A = 1. Then we call T a hyperbolic toral automorphism ± if A has no eigenvalues of modulus 1.

Proposition 4.5.1 Let T be a linear toral automorphism of the k-dimensional torus X = Rk/Zk. Then Lebesgue measure µ is T -invariant.

Proof. By Lemma 4.1.1(ii) we need to show that $\int f \circ T\,d\mu = \int f\,d\mu$ for every $f \in L^1(X, \mathcal{B}, \mu)$.
Recall that we can identify functions $f: \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{C}$ with functions $f: \mathbb{R}^k \to \mathbb{C}$ that satisfy $f(x+n) = f(x)$ for all $n \in \mathbb{Z}^k$. We apply the change of variables formula with the substitution $T(x) = Ax$. Note that $DT(x) = A$ and $|\det DT| = 1$. Hence, by the change of variables formula,
$$\int f \circ T\,d\mu = \int_{\mathbb{R}^k/\mathbb{Z}^k} f \circ T\,|\det DT|\,d\mu = \int_{T(\mathbb{R}^k/\mathbb{Z}^k)} f\,d\mu = \int f\,d\mu. \qquad ✷$$

We shall see in §5.4.3 that linear toral endomorphisms (i.e. when $A$ is a $k \times k$ integer matrix with $\det A \neq 0$) also preserve Lebesgue measure.


Figure 4.5.1: Arnold’s CAT map

4.6 Exercises § Exercise 4.1 Suppose that T : X X. Show that χ 1 = χB T . → T − B ◦ Exercise 4.2 Let T : R/Z R/Z, T (x) = 2x mod 1, denote the doubling map. Show that the periodic → points for T are points of the form p/(2n 1), p = 0, 1,..., 2n 2. Conclude that T has − − infinitely many invariant measures.

Exercise 4.3 By using the change of variables formula, prove that the doubling map T (x) = 2x mod 1 on R/Z preserves Lebesgue measure.

Exercise 4.4 Fix α R and define the map T : R2/Z2 R2/Z2 by ∈ → T ((x,y)+ Z2) = (x + α, x + y)+ Z2.

By using the change of variables formula, prove that Lebesgue measure is T -invariant.


5. Ergodic measures: definition, criteria, and basic examples

§5.1 Introduction

In section 3 we defined what is meant by an invariant measure or, equivalently, what is meant by a measure-preserving transformation. In this section, we define what is meant by an ergodic measure. The primary motivation for ergodicity is Birkhoff's Ergodic Theorem: if $T$ is an ergodic measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$ then, for each $f \in L^1(X, \mathcal{B}, \mu)$, we have that
$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \int f\,d\mu$$
for $\mu$-a.e. $x \in X$. Checking that a given measure-preserving transformation is ergodic is often a highly non-trivial task and we shall study some methods for proving ergodicity.

5.2 Ergodicity § We define what it means to say that a measure-preserving transformation is ergodic.

Definition. Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T: X \to X$ be a measure-preserving transformation. We say that $T$ is an ergodic transformation with respect to $\mu$ (or that $\mu$ is an ergodic measure) if, whenever $B \in \mathcal{B}$ satisfies $T^{-1}B = B$, we have that $\mu(B) = 0$ or $1$.

Remark. Ergodicity can be viewed as an indecomposability condition. If ergodicity does not hold then we can find a set B such that T −1B = B and 0 < µ(B) < 1. We can ∈ B then split T : X X into T : B B and T : X B X B with invariant probability → → \ → \ measures 1 µ( B) and 1 µ( (X B)), respectively. µ(B) ·∩ 1−µ(B) ·∩ \ It will sometimes be convenient for us to weaken the condition T −1B = B to µ(T −1B B)= △ 0, where denotes the symmetric difference: △ A B = (A B) (B A). △ \ ∪ \ We will often write that A = B µ-a.e. or A = B mod 0 to mean that µ(A B) = 0. △ Remark. It is easy to see that if A = B µ-a.e. then µ(A)= µ(B).

Lemma 5.2.1 Let T be a measure-preserving transformation of the probability space (X, ,µ). B Suppose that B is such that µ(T −1B B) = 0. Then there exists B′ with ∈ B △ ∈ B T −1B′ = B′ and µ(B B′) = 0. (In particular, µ(B)= µ(B′).) △


Proof (not examinable). For each n 0, we have the inclusion ≥ n−1 n−1 T −nB B T −(j+1)B T −jB = T −j(T −1B B). △ ⊂ △ △ j[=0   j[=0 Hence, as T preserves µ,

µ(T −nB B) nµ(T −1B B) = 0. △ ≤ △ Let ∞ ∞ B′ = T −jB. n=0 \ j[=n We have that ∞ ∞ µ B T −jB µ(B T −nB) = 0.  △  ≤ △ j[=n Xj=n   Since the sets ∞ T −jB decrease as n increases we have that µ(B B′) = 0. Also, j=n △ S ∞ ∞ ∞ ∞ T −1B′ = T −(j+1)B = T −jB = B′, n=0 n=0 \ j[=n \ j=[n+1 as required. ✷

Corollary 5.2.2 If T is ergodic and µ(T −1B B) = 0 then µ(B) = 0 or 1. △ We have the following convenient characterisations of ergodicity.

Proposition 5.2.3
Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$. The following are equivalent:

(i) $T$ is ergodic;

(ii) whenever $f \in L^1(X, \mathcal{B}, \mu)$ satisfies $f \circ T = f$ $\mu$-a.e. we have that $f$ is constant $\mu$-a.e.;

(iii) whenever $f \in L^2(X, \mathcal{B}, \mu)$ satisfies $f \circ T = f$ $\mu$-a.e. we have that $f$ is constant $\mu$-a.e.

Remark. If $f$ is a constant function then clearly $f \circ T = f$. Proposition 5.2.3 says that, when $T$ is ergodic, the constants are the only $T$-invariant functions (up to sets of measure zero).

Proof. We prove that (i) implies (ii). Suppose that T is ergodic. Suppose that f ∈ L1(X, ,µ) is such that f T = f µ-a.e. By taking real and imaginary parts, we can B ◦ assume without loss of generality that f is real-valued. For k Z and n N, define ∈ ∈ k k + 1 k k + 1 X(k,n)= x X f(x) < = f −1 , . ∈ | 2n ≤ 2n 2n 2n     Since f is measurable, we have that X(k,n) . ∈B


We have that

T −1X(k,n) X(k,n) x X f(Tx) = f(x) △ ⊂ { ∈ | 6 } so that µ(T −1X(k,n) X(k,n)) = 0. △ Hence, as T is ergodic, we have by Corollary 5.2.2 that µ(X(k,n)) = 0 or µ(X(k,n)) = 1. As f L1(X, ,µ) is integrable, we have that f is finite almost everywhere. Hence, for ∈ B each n,

∞ ∞ ∞ k k + 1 k k + 1 −1R −1 −1 f = f n , n = f n , n = X(k,n) 2 2 ! 2 2 k=[−∞   k=[−∞   k=[−∞ is equal to X up to a set of measure zero, i.e.,

µ X X(k,n) = 0; △ Z ! k[∈ moreover, this union is disjoint. Hence we have

µ(X(k,n)) = µ(X) = 1 Z Xk∈ and so there is a unique kn for which µ(X(kn,n)) = 1. Let

∞ Y = X(kn,n). n\=1 Then µ(Y ) = 1. Let x,y Y . Then for each n we have that f(x),f(y) [k /2n, (k + ∈ ∈ n n 1)/2n). Hence for all n 1 we have that ≥ 1 f(x) f(y) . | − |≤ 2n Hence f(x)= f(y). Hence f is constant on the set Y . Hence f is constant µ-a.e. That (ii) implies (iii) is clear as if f L2(X, ,µ) then f L1(X, ,µ). ∈ B ∈ B Finally, we prove that (iii) implies (i). Suppose that B is such that T −1B = B. ∈ B Then χ L2(X, ,µ) and χ T (x)= χ (x) for all x X. Hence χ is constant µ-a.e. B ∈ B B ◦ B ∈ B Since χB only takes the values 0 and 1, we must have χB = 0 µ-a.e. or χB = 1 µ-a.e. Therefore 0 if χB = 0 µ-a.e. µ(B)= χB dµ = 1 if χB = 1 µ-a.e. ZX  Hence T is ergodic with respect to µ. ✷

5.3 Fourier series § We shall give a method for proving that certain transformations of the circle or torus are ergodic with respect to Lebesgue measure. To do this, we use Proposition 5.2.3 and Fourier series.


Let X = R/Z denote the unit circle and let f : X R. (Alternatively, we can think → of f as a periodic function R R with f(x)= f(x + n) for all n Z.) Equip X with the → ∈ Borel σ-algebra, let µ denote Lebesgue measure and assume that f L2(X, ,µ). ∈ B We can associate to f its Fourier series

∞ a 0 + (a cos 2πnx + b sin 2πnx) , (5.3.1) 2 n n n=1 X where 1 1 an = 2 f(x) cos 2πnx dµ, bn = 2 f(x) sin 2πnx dµ. Z0 Z0 (Notice that we are not claiming that the series converges—we are just formally associating the Fourier series to f.) We shall find it more convenient to work with a complex form of the Fourier series and rewrite (5.3.1) as ∞ 2πinx cne , (5.3.2) n=−∞ X where 1 −2πinx cn = f(x)e dµ. Z0 1 (In particular, c0 = 0 f dµ.) We call cn the nth Fourier coefficient. Remark. That (5.3.2)R and (5.3.1) are equivalent follows from the fact that

e2πinx + e−2πinx e2πinx e−2πinx cos 2πnx = , sin 2πnx = − 2 2i One can explain Fourier series by considering a more general construction. Recall that an inner product on a complex vector space is a function , : C such that H h· ·i H×H→ (i) u, v = v, u for all u, v , h i h i ∈H (ii) for each v , u u, v is linear, ∈H 7→ h i (iii) v, v 0 for all v , with equality if and only if v = 0. h i≥ ∈H Given an inner product, one can define a norm on by setting v = v, v . One can H k k h i then define a metric on by setting d (u, v)= u v . H H k − k p If is a complex vector space with an inner product , such that is complete with H h· ·i H respect to the metric given by the inner product then we call a Hilbert space. H Recall that L2(X, ,µ) is a Hilbert space with the inner product B f, g = fg¯ dµ. h i Z The metric on L2(X, ,µ) is then given by B 1/2 d(f, g)= f g 2 dµ . | − | Z  Let be an infinite dimensional Hilbert space. We say that e ∞ is an orthonormal H { j }j=0 basis for if: H


0 if i = j (i) e , e = 6 h i ji 1 if i = j.  (ii) every v can be written in the form ∈H ∞ v = cjej. (5.3.3) Xj=0 As (5.3.3) involves an infinite sum, we need to be careful about what convergence means. n To make (5.3.3) precise, let sn = j=0 cjej denote the nth partial sum. Then (5.3.3) means that v s 0 as n . k − nk→ →∞ P As the vectors e ∞ are orthonormal, taking the inner product of (5.3.3) with e { j}j=0 i shows that ∞ ∞ v, e = c e , e = c e , e = c . h ii h j j ii jh j ii i Xj=0 Xj=0 Let X = R/Z be the circle and let be the Borel σ-algebra. Let µ denote Lebesgue B measure. Let x R/Z. Let e (x) = e2πinx. Then e ∞ is an orthonormal basis for ∈ n { n}n=−∞ the Hilbert space L2(X, ,µ). Thus if f L2(X, ,µ) then we can write B ∈ B ∞ 2πinx f(x)= cne n=−∞ X (in the sense that the sequence of partial sums L2-converges to f) where

c = f, e = f(x)e−2πinx dµ. (5.3.4) n h ni Z If we want to make the dependence of cn on f clear, then we will sometimes write cn(f) for cn. We shall need the following facts about Fourier coefficients.

Proposition 5.3.1 (i) Let f, g L2(X, ,µ). Then f = g µ-a.e. if and only if their Fourier coefficients are ∈ B equal, i.e. c (f)= c (g) for all n Z. n n ∈ (ii) Let f L2(X, ,µ). Then c 0 as n . ∈ B n → → ±∞ Remark. Proposition 5.3.1(ii) is better known as the Riemann-Lebesgue Lemma.

So far, we have studied Fourier series for functions defined on the circle; a similar construction works for functions defined on the k-dimensional torus. Let X = Rk/Zk be the k-dimensional torus equipped with the Borel σ-algebra and let µ denote Lebesgue measure on X. Then L2(X, ,µ) is a Hilbert space when equipped with the inner product B f, g = fg¯ dµ. h i Z k k k 2πihn,xi Let x R /Z . Let n = (n1,...,nk) Z and define en(x) = e where n,x = ∈ ∈ 2 h i n x + + n x . Then en k is an orthonormal basis for L (X, ,µ). Thus we can 1 1 · · · k k { }n∈Z B write f L2(X, ,µ) as ∈ B 2πihn,xi f(x)= cne k nX∈Z 47 MATH4/61112 5. Ergodic measures in the sense that the sequence of partial sums s converges in L2(X, ,µ) where N B 2πihn,xi sN (x)= cne . k n=(n1,...,nXk)∈Z ,|nj|≤N The nth Fourier coefficient is given by

−2πihn,xi cn = cn(f)= f(x)e dµ. Z We have the following analogue of Proposition 5.3.1:

Proposition 5.3.2 (i) Let f, g L2(X, ,µ). Then f = g µ-a.e. if and only if their Fourier coefficients are ∈ B equal.

(ii) Let f L2(X, ,µ). Let n = (n ,...,n ) Zk and define n = max n . ∈ B 1 k ∈ k k 1≤j≤k | j| Then cn 0 as n . → k k→∞ Remark. We could have used any norm on Zk in (ii).

5.4 Proving ergodicity using Fourier series § In the previous section we studied a number of examples of dynamical systems defined on the circle or the torus and we proved that Lebesgue measure is invariant. We show how Proposition 5.2.3 can be used in conjunction with Fourier series to determine whether Lebesgue measure is ergodic. Recall that if f L2(X, ,µ) then we associate to f the Fourier series ∈ B ∞ 2πinx cn(f)e n=−∞ X where −2πinx cn(f)= f(x)e dµ. Z If we let s (x)= n c (f)e2πiℓx then s f 0 as n . n ℓ=−n ℓ k n − k2 → →∞ If T is a measure-preserving transformation then it follows that P 1/2 1/2 s T f T = s T f T 2 dµ = ( s f )2 T dµ k n ◦ − ◦ k2 | n ◦ − ◦ | | n − | ◦ Z  Z  1/2 = ( s f )2 dµ = s f 0 | n − | k n − k2 → Z  as n , where we have used Lemma 4.1.1. By Proposition 5.3.2(i) it follows that, if → ∞ lim s T is a possibly infinite sum of terms of the form e2πinx, then it must be the n→∞ n ◦ Fourier series of f T . In practice, this means that if we take the Fourier series for f(x) ◦ and evaluate it at Tx, then we obtain the Fourier series for f(Tx). If f T = f almost ◦ everywhere, then we can use Proposition 5.3.1(i) to compare Fourier coefficients to obtain relationships between the Fourier coefficients, and then show that f must be constant. A similar method works for Fourier series on the torus, as we shall see.


§5.4.1 Rotations on a circle

Fix $\alpha \in \mathbb{R}$ and define $T: \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ by $T(x) = x + \alpha \bmod 1$. We have already seen that $T$ preserves Lebesgue measure. The following result gives a necessary and sufficient condition for $T$ to be ergodic.

Theorem 5.4.1 Let T (x)= x + α mod 1.

(i) If $\alpha \in \mathbb{Q}$ then $T$ is not ergodic with respect to Lebesgue measure.

(ii) If $\alpha \notin \mathbb{Q}$ then $T$ is ergodic with respect to Lebesgue measure.

Proof. Suppose that $\alpha \in \mathbb{Q}$ and write $\alpha = p/q$ for $p, q \in \mathbb{Z}$ with $q \neq 0$. Define $f(x) = e^{2\pi i q x} \in L^2(X, \mathcal{B}, \mu)$. Then $f$ is not constant but
$$f(Tx) = e^{2\pi i q(x + p/q)} = e^{2\pi i(qx + p)} = e^{2\pi i q x} = f(x).$$
Hence $T$ is not ergodic.
Suppose that $\alpha \notin \mathbb{Q}$. Suppose that $f \in L^2(X, \mathcal{B}, \mu)$ is such that $f \circ T = f$ a.e. We want to prove that $f$ is constant. Suppose that $f$ has Fourier series
$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}.$$
Then $f \circ T$ has Fourier series
$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n \alpha} e^{2\pi i n x}.$$
Comparing Fourier coefficients we see that
$$c_n = c_n e^{2\pi i n \alpha}$$
for all $n \in \mathbb{Z}$. As $\alpha \notin \mathbb{Q}$, we see that $e^{2\pi i n \alpha} \neq 1$ unless $n = 0$. Hence $c_n = 0$ for $n \neq 0$. Hence $f$ has Fourier series $c_0$, i.e. $f$ is constant a.e. ✷
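Ergodicity of the irrational rotation is exactly what makes Birkhoff averages along its orbits converge to the space average. A quick numerical illustration (my own sketch, with $\alpha = \sqrt{2}$ and a particular $f$; these choices are assumptions, not from the notes):

import math

alpha = math.sqrt(2)
f = lambda x: math.sin(2 * math.pi * x) ** 2      # its integral over [0,1] is 1/2

x, total = 0.1, 0.0
N = 100_000
for _ in range(N):
    total += f(x)
    x = (x + alpha) % 1.0
print(total / N)    # close to 0.5, the space average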

5.4.2 The doubling map § Let X = R/Z. Recall that if f L2(X, ,µ) has Fourier series ∈ B ∞ 2πinx cne n=−∞ X then the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)) tells us that c 0 as n . n → →∞ Proposition 5.4.2 The doubling map T : X X defined by T (x) = 2x mod 1 is ergodic with respect to → Lebesgue measure µ.


Proof. Let f L2(X, ,µ) and suppose that f T = f µ-a.e. Let f have Fourier series ∈ B ◦ ∞ 2πinx f(x)= cne . n=−∞ X For each p 0, f T p has Fourier series ≥ ◦ ∞ 2πin2px cne . n=−∞ X Comparing Fourier coefficients we see that

cn = c2pn for all n Z and each p = 0, 1, 2,.... Suppose that n = 0. Then 2pn as p . By ∈ 6 | |→∞ →∞ the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)), c p 0 as p . As c p = c , 2 n → → ∞ 2 n n we must have that c = 0 for n = 0. Thus f has Fourier series c , and so must be equal to n 6 0 a constant a.e. Hence T is ergodic with respect to µ. ✷

5.4.3 Toral endomorphisms § The argument for the doubling map can be generalised using higher-dimensional Fourier series to study toral endomorphisms. Let X = Rk/Zk and let µ denote Lebesgue measure. When T is invertible (and so a linear toral automorphism) we have already seen that Lebesgue measure is an invariant measure; in 7 we shall see that Lebesgue measure is an § invariant measure when T is a linear toral endomorphism. Recall that f L2(X, ,µ) has Fourier series ∈ B 2πihn,xi cne , k nX∈Z where n = (n ,...,n ), x = (x ,...,x ). Define n = max n . Then the Riemann- 1 k 1 k | | 1≤j≤k | j| Lebesgue Lemma tells us that cn 0 as n . → | |→∞ Let A be a k k integer matrix with det A = 0 and define T : X X by × 6 → k k T ((x1,...,xk)+ Z )= A(x1,...,xk)+ Z .

Proposition 5.4.3
A linear toral endomorphism $T$ is ergodic with respect to $\mu$ if and only if no eigenvalue of $A$ is a root of unity.

Remark. In particular, hyperbolic toral automorphisms (i.e. det A = 1 and A has no ± eigenvalues of modulus 1) are ergodic with respect to Lebesgue measure.

Proof. Suppose that T is ergodic but, for a contradiction, that A has a pth root of unity as an eigenvalue. We choose p > 0 to be the least such integer. Then Ap has 1 as an eigenvalue, and so n(Ap I) = 0 for some non-zero vector n = (n ,...,n ) Rk. Since A − 1 k ∈ is an integer matrix, we have that Ap I is an integer matrix, and so we can in fact take − n Zk. Note that ∈ p p e2πihn,A xi = e2πihnA ,xi = e2πihn,xi.


T This is because, writing x = (x1,...,xk) ,

a a x 1,1 · · · 1,k 1 n, Ax = (n ,...,n ) . . . = nA, x . h i 1 k  . .   .  h i a a x  k,1 · · · k,k   k      Define p−1 j f(x)= e2πihn,A xi. Xj=0 Then f L2(X, ,µ) and is T -invariant. Since T is ergodic, we must have that f is ∈ B constant. But the only way in which this can happen is if n = 0, a contradiction. Conversely suppose that no eigenvalue of A is a root of unity; we prove that T is ergodic with respect to Lebesgue measure. Suppose that f L2(X, ,µ) is T -invariant µ-a.e. We ∈ B show that f is constant µ-a.e. Associate to f its Fourier series:

2πihn,xi cne . k nX∈Z Since fT p = f µ-a.e., for all p> 0, we have that

2πihnAp,xi 2πihn,xi cne = cne . k k nX∈Z nX∈Z Comparing Fourier coefficients we see that, for every n Zk, ∈

cn = cn = = cn p = . A · · · A · · ·

If cn = 0 then there can only be finitely many indices in the above list, for otherwise it 6 would contradict the fact that cn 0 as n , by the Riemann-Lebesgue Lemma → | |→∞ (Proposition 5.3.1(ii)). Hence there exist q > q 0 such that nAq1 = nAq2 . Letting 1 2 ≥ p = q q > 0 we see that nAp = n. Thus n is either equal to 0 or n is an eigenvector for 1 − 2 Ap with eigenvalue 1. In the latter case, A would have a pth root of unity as an eigenvalue. Hence n = 0. Hence cn = 0 unless n = 0 and so f is equal to the constant c0 µ-a.e. Thus T is ergodic. ✷

5.5 Exercises § Exercise 5.1 Suppose that α Q. Show directly from the definition that the rotation T (x)= x+α mod 1 ∈ is not ergodic, i.e. find an invariant set B = T −1B, B , which has Lebesgue measure ∈ B 0 <µ(B) < 1.

Exercise 5.2 Define T : R2/Z2 R2/Z2 by → T ((x,y)+ Z2) = (x + α, x + y)+ Z2.

Suppose that α Q. By using Fourier series, show that T is ergodic with respect to 6∈ Lebesgue measure.


Exercise 5.3 Let T : X X be a measurable transformation of a measurable space (X, ). Suppose → B that x = T nx is a periodic point with period n. Define the measure µ supported on the periodic orbit of x by n−1 1 µ = δ j n T x Xj=0 where δx denotes the Dirac measure at x. Show from the definition of ergodicity that µ is an ergodic measure.

Exercise 5.4 (Part (iv) of this exercise is outside the scope of the course!) It is easy to construct lots of examples of hyperbolic toral automorphisms (i.e. no eigenvalues of modulus 1—the CAT map is such an example), which must necessarily be ergodic with respect to Lebesgue measure. It is harder to show that there are ergodic toral automorphisms with some eigenvalues of modulus 1.

(i) Show that to have an ergodic toral automorphism of Rk/Zk with an eigenvalue of modulus 1, we must have k 4. ≥ Consider the matrix 0 1 0 0 0 0 1 0 A =   . 0 0 0 1  1 8 6 8   − −    4 4 (ii) Show that A defines a linear toral automorphism TA of the 4-dimensional torus R /Z . (iii) Show that A has four eigenvalues, two of which have modulus 1.

(iv) Show that TA is ergodic with respect to Lebesgue measure. (Hint: you have to show that the two eigenvalues of modulus 1 are not roots of unity, i.e. are not solutions to λn 1 = 0 for some n. The best way to do this is to use results from Galois theory − on the irreducibility of polynomials.)


6. Ergodic measures: Using the Hahn-Kolmogorov Extension Theorem to prove ergodicity

6.1 Introduction § We illustrate a method for proving that a given transformation is ergodic using the Hahn- Kolmogorov Extension Theorem. The key observation is the following technical lemma.

Lemma 6.1.1
Let $(X, \mathcal{B}, \mu)$ be a probability space and suppose that $\mathcal{A} \subset \mathcal{B}$ is an algebra that generates $\mathcal{B}$. Let $B \in \mathcal{B}$. Suppose there exists $K > 0$ such that
$$\mu(B)\mu(I) \leq K \mu(B \cap I) \tag{6.1.1}$$
for all $I \in \mathcal{A}$. Then $\mu(B) = 0$ or $1$.

Proof. Let $\varepsilon > 0$. As $\mathcal{A}$ generates $\mathcal{B}$ there exists $I \in \mathcal{A}$ such that $\mu(B^c \,\triangle\, I) < \varepsilon$. Hence $|\mu(B^c) - \mu(I)| < \varepsilon$. Moreover, note that $B \cap I \subset B^c \,\triangle\, I$ so that $\mu(B \cap I) < \varepsilon$. Hence
$$\mu(B)\mu(B^c) \leq \mu(B)(\mu(I) + \varepsilon) \leq \mu(B)\mu(I) + \mu(B)\varepsilon \leq K\mu(B \cap I) + \varepsilon \leq (K+1)\varepsilon.$$
As $\varepsilon > 0$ is arbitrary, it follows that $\mu(B)\mu(B^c) = 0$. Hence $\mu(B) = 0$ or $1$. ✷

Remark. We will often apply Lemma 6.1.1 when is an algebra of finite unions of A intervals or cylinders. In this case, we need only check that there exists a constant K > 0 k such that (6.1.1) holds for intervals or cylinders. To see this, let I = j=1 Ij be a finite union of pairwise disjoint sets in . Then if (6.1.1) holds for I then A j S k k µ(B)µ(I) = µ(B)µ I = µ(B)µ(I )  j j j[=1 Xj=1 k   k K µ(B I )= Kµ B I = Kµ(B I). ≤ ∩ j  ∩ j ∩ Xj=1 j[=1   We will also use the change of variables formula for integration. Recall that if I, J R ⊂ are intervals, u : I J is a differentiable bijection, and f : J R is integrable, then → → f(x) dx = f(u(x)) u′(x) dx. | | ZJ ZI


6.2 The doubling map § To illustrate the method, we give another proof that the doubling map is ergodic with respect to Lebesgue measure. Let X = [0, 1] be the unit interval, let be the Borel B σ-algebra, and let µ be Lebesgue measure. Given x [0, 1], we can write x as a base 2 ‘decimal’ expansion: ∈ ∞ x x = x x x . . . = j (6.2.1) · 0 1 2 2j+1 Xj=0 where x 0, 1 . Note that j ∈ { } ∞ x ∞ x ∞ x T (x) = 2 j mod 1 = x + j+1 mod 1 = j+1 . 2j+1 0 2j+1 2j+1 Xj=0 Xj=0 Xj=0 Hence if x has base 2 expansion given by (6.2.1) then T (x) has base 2 expansion given by

T (x)= x x x . . . · 1 2 3 i.e. T deletes the zeroth term in the base 2 expansion of x and shifts the remaining terms one place to the left. We introduce dyadic intervals or cylinders to be the sets

I(i , i ,...,i )= x [0, 1] x = i , j = 0,...,n 1 . 0 1 n−1 { ∈ | j j − } (So, for example, I(0) = [0, 1/2], I(1) = [1/2, 1], I(0, 0) = [0, 1/4],I(0, 1) = [1/4, 1/2], etc.) We call n the rank of the cylinder. A dyadic interval is an interval with end-points at k/2n, (k + 1)/2n where n 1 and k 0, 1,..., 2n . ≥ ∈ { } Let denote the algebra of finite unions of cylinders. Then generates the Borel A A σ-algebra. This follows from Proposition 2.4.2 by noting that cylinders are intervals (and so Borel) and that they separate points: if x,y [0, 1], x = y, then they have base 2 ∈ 6 expansions that differ at some index, say x = y . Hence x,y belong to disjoint cylinders n 6 n of rank n. Define the maps x x + 1 φ (x)= , φ (x)= . 0 2 1 2 Then φ : [0, 1] I(0) and φ : [0, 1] I(1) are differentiable bijections. Indeed, if 0 → 1 → x [0, 1] has base 2 expansion ∈ x = x x x . . . · 0 1 2 then φ0(x) and φ1(x) have base 2 expansions given by

φ (x)= 0x x x ..., φ (x)= 1x x x .... 0 · 0 1 2 1 · 0 1 2

Thus φ0 and φ1 act on base 2 expansions as a shift to the right, inserting the digits 0 and 1 in the zeroth place, respectively. Note that T φ (x)= x and T φ (x)= x for all x [0, 1]. 0 1 ∈ Given i , i ,...,i 0, 1 , define 0 1 n−1 ∈ { }

φi0,i1,...,in 1 : [0, 1] I(i0, i1,...,in−1) − → by

φi0,i1,...,in 1 = φi0 φi1 φin 1 . (6.2.2) − · · · −


Thus φi0,i1,...,in 1 takes the point x with base 2 expansion given by (6.2.1), shifts the digits − n places to the right, and inserts the digits i0, i1,...,in−1 in the first n places. Note that n T φi0,i1,...,in 1 (x)= x for all x [0, 1]. − ∈ We are now in a position to prove that T is ergodic with respect to Lebesgue measure. Let B be such that T −1B = B. We must show that µ(B) = 0 or 1. By Lemma 6.1.1, ∈B it is sufficient to prove that there exists K > 0 such that µ(B)µ(I) Kµ(B I) for all ≤ ∩ intervals I; in fact, we shall prove that µ(B)µ(I)= µ(B I) for all dyadic intervals I. −n ∩ Note that T B = B. Let I = I(i0, i1,...,in−1) be a cylinder of rank n and let n n φ = φi0,i1,...,in 1 . Then T φ(x) = x. Note also that µ(I) = 1/2 . We will also need the ′ − n ′ ′ fact that φ (x) = 1/2 (this follows by noting that φ0(x)= φ1(x) = 1/2 and differentiating (6.2.2) using the chain rule). Finally, we observe that

$$\begin{aligned}
\mu(B \cap I) &= \int \chi_{B \cap I}(x)\,dx \\
&= \int \chi_B(x)\chi_I(x)\,dx \\
&= \int_I \chi_B(x)\,dx \\
&= \int_0^1 \chi_B(\phi(x))\phi'(x)\,dx && \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi(x))\phi'(x)\,dx && \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi(x)))\phi'(x)\,dx && \text{as } \chi_{T^{-n}B} = \chi_B \circ T^n \\
&= \int_0^1 \chi_B(x)\phi'(x)\,dx && \text{as } T^n\phi(x) = x \\
&= \frac{1}{2^n}\int_0^1 \chi_B(x)\,dx && \text{as } \phi'(x) = 1/2^n \\
&= \mu(I)\mu(B) && \text{as } \mu(I) = 1/2^n.
\end{aligned}$$

Hence $\mu(B \cap I) = \mu(B)\mu(I)$ for all sets $I$ in the algebra of cylinders. By Lemma 6.1.1 it follows that $\mu(B) = 0$ or $1$. Hence Lebesgue measure is an ergodic measure for $T$.

6.3 The Gauss map § Let x [0, 1]. If x has continued fraction expansion ∈ 1 x = 1 x + 0 1 x + 1 x + 2 · · · then for brevity we write x = [x0,x1,x2,...]. Let X = [0, 1] and recall that the Gauss map is defined by T (x) = 1/x mod 1 (with T defined at 0 by setting T (0) = 0). If x has continued fraction expansion [x0,x1,x2,...] then T (x) has continued fraction expansion [x1,x2,...]. We have already seen that T leaves


Gauss’ measure µ invariant, where Gauss’ measure is defined by 1 1 µ(B)= dx. log 2 1+ x ZB We shall find it convenient to swap between Gauss’ measure µ and Lebesgue measure, which we shall denote here by λ. Recall from Exercise 3.5 that for any set B we have ∈B 1 1 λ(B) µ(B) λ(B). 2 log 2 ≤ ≤ log 2 Hence µ(B) = 0 if and only if λ(B) = 0. Thus to prove ergodicity it suffices to show that any T -invariant set B has either λ(B)=0 or λ(Bc) = 0. We shall also need some basic facts about continued fractions. Let x (0, 1) be irrational ∈ and have continued fraction expansion [x ,x ,...]. For any t [0, 1], write 0 1 ∈ Pn(x0,x1,...,xn−1; t) [x0,x1,...,xn−1 + t]= Qn(x0,x1,...,xn−1; t) where Pn(x0,x1,...,xn−1; t) and Qn(x0,x1,...,xn−1; t) are polynomials in x0,x1,...,xn−1 and t. Let Pn = Pn(x0,x1,...,xn−1; 0), Qn = Qn(x0,x1,...,xn−1; 0) (we suppress the dependence of Pn and Qn on x0,...,xn−1 for brevity). The following lemma is easily proved using induction.

Lemma 6.3.1 (i) We have

Pn(x0,x1,...,xn−1; t)= Pn + tPn−1, Qn(x0,x1,...,xn−1; t)= Qn + tQn−1.

and the following recurrence relations hold:

Pn+1 = xnPn + Pn−1, Qn+1 = xnQn + Qn−1

with initial conditions P0 = 0, P1 = 1,Q0 = 1,Q1 = x0. (ii) The following identity holds:

Q P Q P = ( 1)n. n n−1 − n−1 n − Let i , i ,...,i N. Define the cylinder I(i , i ,...,i ) to be the set of all points 0 1 n−1 ∈ 0 1 n−1 x (0, 1) whose continued fraction expansion starts with i ,...,i . This is easily seen ∈ 0 n−1 to be an interval; indeed

I(i , i ,...,i )= [i , i ,...,i + t] t [0, 1) . 0 1 n−1 { 0 1 n−1 | ∈ } Let denote the algebra of finite unions of cylinders. Then generates the Borel σ- A A algebra. (This follows from Proposition 2.4.2: cylinders are clearly Borel sets and they separate points. To see this, note that if x = y then they have different continued fraction 6 expansions. Hence there exists n such that x = y . Hence x,y are in different cylinders n 6 n of rank n, and these cylinders are disjoint.) For each i N define the map φ : [0, 1) I(i) by ∈ i → 1 φ (x)= . i i + x


Thus if x has continued fraction expansion [x0,x1,...] then φi(x) has continued fraction expansion [i, x ,x ,...]. Clearly T (φ (x)) = x for all x [0, 1). 0 1 i ∈ For i , i ,...,i N, define 0 1 n−1 ∈

φi0,i1,...,in 1 = φi0 φi1 φin 1 : [0, 1) I(i0, i1,...,in−1). − · · · − →

Then φi0,i1,...,in 1 (x) takes the continued fraction expansion of x, shifts every digit n places − to the right, and inserts the digit i0, i1,...,in−1 in the first n places. Clearly

n T (φi0,i1,...,in 1 (x)) = x − for all x [0, 1). ∈ We first need an estimate on the length of (i.e. the Lebesgue measure of) the cylinder I(i0, i1,...,in−1). Note that

Pn(i0,...,in−1; t) Pn + tPn−1 φi0,i1,...,in 1 (t)= = . − Qn(i0,...,in−1; t) Qn + tQn−1 Differentiating this expression with respect to t and using Lemma 6.3.1(ii), we see that

Q P P Q 1 φ′ (t) = n n−1 n n−1 = . i0,i1,...,in 1 − 2 2 | − | (Q + tQ ) (Q + tQ ) n n−1 n n−1

It follows from Lemma 6.3.1(ii) that Q + Q 2Q . Hence n n−1 ≤ n 1 1 1 1 φ′ (t) . (6.3.1) 2 2 i0,i1,...,in 1 2 4 Qn ≤ (Qn + Qn−1) ≤ | − |≤ Qn Hence 1 ′ λ(I(i0, i1,...,in−1)) = χI(i0,i1,...,in 1)(t) dt = dt = φi0,i1,...,in 1 (t) dt − − I(i0,i1,...,in 1) 0 | | Z Z − Z (6.3.2) where we have used the change of variables formula. Combining (6.3.2) with (6.3.1) we see that 1 1 1 2 λ(I(i0, i1,...,in−1)) 2 . (6.3.3) 4 Qn ≤ ≤ Qn We can now prove that the Gauss map is ergodic with respect to Gauss’ measure µ. Suppose that T −1B = B where B . Let I(i , i ,...,i ) be a cylinder. Then ∈B 0 1 n−1 λ(B I(i , i ,...,i )) ∩ 0 1 n−1 = χB(x) dx I(i0,i1,...,in 1) Z − 1 ′ = χB(φi0,i1,...,in 1 (x)) φi0,i1,...,in 1 (x) dx by the change of variables formula. 0 − | − | Z 1 ′ −n n = χT − B(φi0,i1,...,in 1 (x)) φi0,i1,...,in 1 (x) dx as T B = B 0 − | − | Z 1 n ′ n n = χB(T (φi0,i1,...,in 1 (x))) φi0,i1,...,in 1 (x) dx as χT − B = χB T 0 − | − | ◦ Z 1 ′ n = χB(x) φi0,i1,...,in 1 (x) dx as T φi0,i1,...,in 1 (x)= x. | − | − Z0 57 MATH4/61112 6. Ergodic measures: using the HKET

By (6.3.1) and (6.3.3) it follows that 1 1 λ(B I(i0, i1,...,in−1)) 2 λ(B) λ(B)λ(I(i0, i1,...,in−1)) ∩ ≥ 4Qn ≥ 4 so that λ(B)λ(I(i , i ,...,i )) 4λ(B I(i , i ,...,i )). 0 1 n−1 ≤ ∩ 0 1 n−1 By Lemma 6.1.1 it follows that λ(B)=0 or λ(Bc) = 0. Hence, as Lebesgue measure and Gauss’ measure have the same sets of measure zero, it follows that either µ(B) = 0or µ(Bc) = 0. Hence T is ergodic with respect to Gauss’ measure.
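The recurrences in Lemma 6.3.1 are also easy to compute with. The following sketch (my own illustration, not from the notes) builds the numerators $P_n$ and denominators $Q_n$ of the convergents and checks the identity $Q_n P_{n-1} - Q_{n-1} P_n = (-1)^n$ used above.

def convergents(digits):
    # P_0 = 0, P_1 = 1, Q_0 = 1, Q_1 = x_0, then the recurrences of Lemma 6.3.1(i).
    P = [0, 1]
    Q = [1, digits[0]]
    for x in digits[1:]:
        P.append(x * P[-1] + P[-2])
        Q.append(x * Q[-1] + Q[-2])
    return P, Q

P, Q = convergents([1, 2, 3, 4, 5])
for n in range(1, len(P)):
    print(n, Q[n] * P[n - 1] - Q[n - 1] * P[n])   # equals (-1)**n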

6.4 Bernoulli shifts § Let S = 1,...,k be a finite set of symbols and let Σ = x = (x )∞ x 1, 2,...,k { } { j j=0 | j ∈ { }} denote the shift space on k symbols. Let σ :Σ Σ denote the left shift map, so that → (σ(x))j = xj+1. Recall that we defined the cylinder [i0,...,in−1] to be the set of all sequences in Σ that start with symbols i0,...,in−1, that is

[i ,...,i ]= x = (x )∞ Σ x = i , j = 0, 1,...,n 1 . 0 n−1 { j j=0 ∈ | j j − } k Let p = (p(1),...,p(k)) be a probability vector (that is, p(j) > 0, j=1 p(j) = 1). We defined the Bernoulli measure µ on cylinders by setting p P µ [i ,...,i ]= p(i )p(i ) p(i ). p 0 n−1 0 1 · · · n−1

We have already seen that µp is a σ-invariant measure.

Proposition 6.4.1 Let µp be a Bernoulli measure. Then µp is ergodic.

Proof. We first make the following observation: let I = [i0,...,ip−1], J = [j0,...,jq−1] be cylinders of ranks p,q, respectively. Consider I σ−nJ where n p. Then ∩ ≥ I σ−nJ = x = (x )∞ Σ x = i for k = 0, 1,...,p 1, x = j for k = 0, 1,...,q 1 ∩ { k k=0 ∈ | k k − k+n k − } = [i0, i1,...,ip−1,xp,...,xn−1, j0,...,jq−1], xp,...,xn 1 [ − a disjoint union. Hence

µ (I σ−nJ) = µ [i , i ,...,i ,x ,...,x , j ,...,j ] p ∩ p 0 1 p−1 p n−1 0 q−1 xp,...,xn 1 X − = p(i )p(i ) p(i )p(x ) p(x )p(j )p(j ) p(j ) 0 1 · · · p−1 p · · · n−1 0 1 · · · q−1 xp,...,xn 1 X − k k = p(i )p(i ) p(i )p(j )p(j ) p(j ) as p(x )= = p(x ) = 1 0 1 · · · p−1 0 1 · · · q−1 p · · · n−1 xp=1 xn 1=1 X X− = µp(I)µp(J). (6.4.1)

Let B be σ-invariant. By Lemma 6.1.1 it is sufficient to prove that µ (B)µ (I) ∈B p p ≤ µ (B I) for each cylinder I. Let ε> 0. We first approximate the invariant set B by a finite p ∩

58 MATH4/61112 6. Ergodic measures: using the HKET union of cylinders. By Proposition 2.4.4, we can find a finite disjoint union of cylinders A = r J such that µ (B A) < ε. Note that µ (A) µ (B) < ε. j=1 j p △ | p − p | Let n be any integer greater than the rank of I. Note that σ−nB σ−nA = σ−n(B A). S △ △ Hence µ (σ−nB σ−nA)= µ (σ−n(B A)) = µ (B A) < ε, p △ p △ p △ −n where we have used the facts that σ B = B and that µp is an invariant measure. r As A = j=1 Jj is a finite union of cylinders and n is greater than the rank of I, it follows from (6.4.1) that S r r µ (σ−nA I) = µ σ−n J I = µ (σ−nJ I) p ∩ p   j ∩  p j ∩ j[=1 Xj=1 r    r = µ (J )µ (I)= µ J µ (I) p j p p  j p Xj=1 j[=1 = µp(A)µp(I).  

Finally, note that (σ−nA I) (σ−nB I) (σ−nA) (σ−nB). Hence µ ((σ−nA ∩ △ ∩ ⊂ △ p ∩ I) (σ−nB I)) < ε so that µ (σ−nA I) <µ (σ−nB I)+ ε. Hence △ ∩ p ∩ p ∩ µ (B)µ (I) = µ (σ−nB)µ (I) µ (σ−nA)µ (I)+ ε = µ (σ−nA I)+ ε p p p p ≤ p p p ∩ µ (σ−nB I) + 2ε = µ(B I) + 2ε. ≤ p ∩ ∩ As ε > 0 is arbitrary, we have that µ (B)µ (I) µ (B I) for any cylinder I. By p p ≤ p ∩ Lemma 6.1.1, it follows that µp(B) = 0 or 1. Hence µp is ergodic. ✷
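The product identity (6.4.1) that drives this proof can be checked directly on small examples. A sketch (my own, with an assumed probability vector, not from the notes):

from itertools import product
from math import prod

p = [0.2, 0.3, 0.5]                         # a probability vector on 3 symbols
bern = lambda cyl: prod(p[i] for i in cyl)  # mu_p of a cylinder

I, J, n = (0, 2), (1,), 4                   # n must be at least the rank of I
total = sum(bern(I + gap + J) for gap in product(range(3), repeat=n - len(I)))
print(total, bern(I) * bern(J))             # mu_p(I ∩ sigma^{-n} J) equals mu_p(I) mu_p(J)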

6.5 Markov shifts § Let P be an irreducible stochastic k k matrix with entries P (i, j). Let p = (p(1),...,p(k)) × be the unique left probability eigenvector corresponding to eigenvalue 1, so that pP = p. Recall that the Markov measure µP is defined on the Borel σ-algebra by defining it on cylinders in the following way:

µ ([i , i ,...,i ]) = p(i )P (i , i )P (i , i ) P (i , i ). P 0 1 n−1 0 0 1 1 2 · · · n−2 n−1

We have seen that µP is an invariant measure for the shift map σ. We can adapt the proof of Proposition 6.4.1 to show that µP is ergodic.

Proposition 6.5.1 Let P be an irreducible stochastic matrix. Then the corresponding Markov measure µP is ergodic.

Proof (not examinable). Let d denote the period of P . Let I = [i0,...,ip−1], J = [j0,...,jq−1] be cylinders of ranks p,q, respectively. Consider I σ−nJ where n p. Then ∩ ≥ I σ−nJ = x Σ x = i for j = 0, 1,...,p 1, x = y for j = 0, 1,...,q 1 ∩ { ∈ | j j − j+n j − } = [i0, i1,...,ip−1,xp,...,xn−1, j0,...,jq−1], xp,...,xn 1 [ − 59 MATH4/61112 6. Ergodic measures: using the HKET a disjoint union. Hence

µ (I σ−nJ) P ∩ = µp[i0, i1,...,ip−1,xp,...,xn−1, j0,...,jq−1] xp,...,xn 1 X − = p(i )P (i , i ) P (i , i )P (i ,x )P (x ,x ) P (x , j ) 0 0 1 · · · p−2 p−1 p−1 p p p+1 · · · n−1 0 xp,...,xn 1 X − P (j , j ) P (j , j ) × 0 1 · · · q−2 q−1 1 = µP (I)µP (J) P (ip−1,xp)P (xp,xp+1) P (xn−1, j0) p(j0) · · · xp,...,xn 1 X − n−1−p P (ip−1, j0) = µP (I)µP (J) p(j0)

By the Perron-Frobenius Theorem (Theorem 3.3.6), we know that P nd(i, j) p(j) as → n . Hence, letting n through an appropriate subsequence, we see that →∞ →∞ µ (I σ−nJ) µ (I)µ (J). (6.5.1) P ∩ → P P The remainder of the proof is almost identical to the proof of Proposition 6.4.1. Let B ∈ be σ-invariant. By Proposition 6.1.1 it is sufficient to prove that µ (B)µ (I) µ (B I) B P P ≤ P ∩ for every cylinder I. Let ε> 0. We approximate B by a finite union of cylinders by using r Proposition 2.4.4. That is, we can find a finite disjoint union of cylinders A = j=1 Jj such that µ (B A) < ε. Note that µ (A) µ (B) < ε. P △ | P − P | S Let n be any integer greater than the rank of I. Note that σ−nB σ−n(A)= σ−n(B A). △ △ Hence µ (σ−nB σ−nA)= µ (σ−n(B A)) = µ (B A) < ε P △ P △ P △ −n where we have used the facts that σ B = B and that µP is an invariant measure. r As A = j=1 Jj is a finite union of cylinders, it follows from (6.5.1) that by choosing n sufficiently large, we have that µP (σ−nJ I) µ (J )µ (I)+ ε for j = 1, 2,...,r. Hence S j ∩ ≤ P j P r r µ (σ−nA)µ (I) = µ σ−n J µ (I)= µ (σ−nJ )µ (I) P P P   j P P j P j[=1 Xj=1 r    r = µ (J I)+ ε = µ J I + ε P j ∩ P  j ∩  Xj=1 j[=1 = µ (A I)+ ε.   P ∩ Finally, note that (σ−nA I) (σ−nB I) (σ−nA) (σ−nB). Hence µ ((σ−nA ∩ △ ∩ ⊂ △ P ∩ I) (σ−nB I)) < ε so that µ (σ−nB I) <µ (σ−nA I)+ ε. Hence △ ∩ P ∩ P ∩ µ (B)µ (I) = µ (σ−nB)µ (I) µ (σ−nA)µ (I)+ ε = µ (σ−nA I) + 2ε P P P P ≤ P P P ∩ µ (σ−nB I) + 3ε = µ (B I) + 3ε. ≤ P ∩ P ∩ As ε> 0 was arbitrary, we have that µ (B)µ (I) µ (B I). By Proposition 6.1.1, the P P ≤ P ∩ result follows. ✷


6.6 Exercises § Exercise 6.1 The dynamical system T : [0, 1] [0, 1] defined by → 2x if 0 x 1/2 T (x)= ≤ ≤ 2(1 x) if 1/2 x 1  − ≤ ≤ is called the .

(i) Prove that T preserves Lebesgue measure.

(ii) Prove that T is ergodic with respect to Lebesgue measure.

Exercise 6.2 Recall that the L¨uroth map T : [0, 1] [0, 1] is defined to be → 1 1 n(n + 1)x n if x , T (x)= − ∈ n + 1 n     0 if x = 0. We saw in Exercise 3.9 that Lebesgue measure is a T -invariant probability measure. Prove that Lebesgue measure is ergodic.

Exercise 6.3 Prove (using induction on n) Lemma 6.3.1.

Exercise 6.4
Let Σ = {x = (x_j)_{j=0}^{∞} | x_j ∈ {0, 1}} and let σ : Σ → Σ, (σ(x))_j = x_{j+1}, be the shift map on the space of infinite sequences of two symbols 0, 1. Note that Σ supports uncountably many different σ-invariant measures (for example, the Bernoulli-(p, 1 − p) measures are all ergodic and all distinct for p ∈ (0, 1)). We will use this observation to prove that the doubling map has uncountably many ergodic measures. Define π : Σ → R/Z by

π(x) = π(x_0, x_1, ...) = x_0/2 + x_1/2^2 + ··· + x_n/2^{n+1} + ···.

(i) Show that π is continuous.

(ii) Let T : R/Z → R/Z be the doubling map: T(x) = 2x mod 1. Show that π ∘ σ = T ∘ π.

(iii) If µ is a σ-invariant probability measure on Σ, show that π_*µ (where π_*µ(B) = µ(π^{-1}B) for a Borel subset B ⊂ R/Z) is a T-invariant probability measure on R/Z. (Lebesgue measure on R/Z corresponds to choosing µ to be the Bernoulli-(1/2, 1/2) measure on Σ.)

(iv) Show that if µ is an ergodic measure for σ, then π_*µ is an ergodic measure for T.

(v) Conclude that there are uncountably many different ergodic measures for the doubling map.


7. Continuous transformations on compact metric spaces

7.1 Introduction §
So far, we have been studying a measurable map T defined on a probability space (X, B, µ). We have asked whether the given measure µ is invariant or ergodic. In this section, we shift our focus slightly and consider, for a given transformation T : X → X, the space M(X, T) of all probability measures that are invariant under T. In order to equip M(X, T) with some structure we will need to assume that the underlying space X is itself equipped with some additional structure other than merely being a measure space. Throughout this section we will work in the context of X being a compact metric space and T being a continuous transformation.

7.2 Probability measures on compact metric spaces §
Let X be a compact metric space equipped with the Borel σ-algebra B. (Recall that the Borel σ-algebra is the smallest σ-algebra that contains all the open subsets of X.)
Let C(X, R) = {f : X → R | f is continuous} denote the space of real-valued continuous functions defined on X. Define the uniform norm of f ∈ C(X, R) by

‖f‖_∞ = sup_{x∈X} |f(x)|.

With this norm, C(X, R) is a Banach space. An important property of C(X, R) that will prove to be useful later on is that it is separable: C(X, R) contains a countable dense subset. Thus we can choose a sequence {f_n}_{n=1}^{∞} ⊂ C(X, R) such that, for all f ∈ C(X, R) and all ε > 0, there exists n such that ‖f − f_n‖_∞ < ε.
Let M(X) denote the set of all Borel probability measures on (X, B).
It will be very important to have a sensible notion of convergence in M(X); the appropriate notion for us is called weak* convergence. We say that a sequence of probability measures µ_n weak* converges to µ as n → ∞ if, for every f ∈ C(X, R),

lim_{n→∞} ∫ f dµ_n = ∫ f dµ.

If µ_n weak* converges to µ then we write µ_n ⇀ µ as n → ∞. We can make M(X) into a metric space compatible with this definition of convergence by choosing a countable dense subset {f_n}_{n=1}^{∞} ⊂ C(X, R) and, for µ_1, µ_2 ∈ M(X), setting

d_{M(X)}(µ_1, µ_2) = Σ_{n=1}^{∞} (1 / (2^n ‖f_n‖_∞)) | ∫ f_n dµ_1 − ∫ f_n dµ_2 |

(we can assume that f_n ≢ 0 for any n). It is easy to check that µ_n ⇀ µ as n → ∞ if and only if d_{M(X)}(µ_n, µ) → 0 as n → ∞.
However, we will not need to work with a particular metric: what will be important is the definition of convergence.
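As an illustration of such a metric, the following Python sketch truncates the series to finitely many test functions f_n(x) = cos(2πnx) + 2 on X = [0, 1] (an arbitrary illustrative family, not the dense sequence above, chosen so that ‖f_n‖_∞ = 3 and ∫ f_n dx = 2) and shows that empirical measures built from uniform random points get close to Lebesgue measure as the number of points grows.

```python
import numpy as np

# Truncation of the metric d_{M(X)} on X = [0,1], using the illustrative test
# functions f_n(x) = cos(2*pi*n*x) + 2 (so ||f_n||_inf = 3, and the integral of
# f_n with respect to Lebesgue measure is 2).
def d_to_lebesgue(atoms, N=40):
    weights = np.full(len(atoms), 1.0 / len(atoms))
    total = 0.0
    for n in range(1, N + 1):
        fn_vals = np.cos(2 * np.pi * n * atoms) + 2.0
        mu_n = np.sum(weights * fn_vals)   # integral of f_n against the atomic measure
        total += abs(mu_n - 2.0) / (2 ** n * 3.0)
    return total

# Empirical measures of k independent uniform points weak* converge to Lebesgue
# measure, and the (truncated) distance decreases accordingly.
rng = np.random.default_rng(0)
for k in (10, 100, 10000):
    print(k, d_to_lebesgue(rng.random(k)))
```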


Remark. Note that with this definition it is not necessarily true that lim_{n→∞} µ_n(B) = µ(B) for B ∈ B.

7.2.1 The Riesz Representation Theorem §
Let µ ∈ M(X) be a Borel probability measure. Then we can think of µ as a functional that acts on C(X, R), that is we can regard µ as a map

µ : C(X, R) → R : f ↦ ∫ f dµ.

We will often write µ(f) for ∫ f dµ. Notice that this functional enjoys several natural properties:

(i) the functional defined by µ is linear:

µ(λ_1 f_1 + λ_2 f_2) = λ_1 µ(f_1) + λ_2 µ(f_2)

where λ_1, λ_2 ∈ R and f_1, f_2 ∈ C(X, R);

(ii) the functional defined by µ is bounded: i.e. if f ∈ C(X, R) then |µ(f)| ≤ ‖f‖_∞;

(iii) if f ≥ 0 then µ(f) ≥ 0 (we say that the functional µ is positive);

(iv) consider the function 1 defined by 1(x) ≡ 1 for all x; then µ(1) = 1 (we say that the functional µ is normalised).

The Riesz Representation Theorem says that the above properties characterise all Borel probability measures on X. That is, if we have a map w : C(X, R) R that satisfies → the above four properties, then w must be given by integrating with respect to a Borel probability measure. This will be a very useful method of constructing measures: we need only construct bounded positive normalised linear functionals.

Theorem 7.2.1 (Riesz Representation Theorem)
Let w : C(X, R) → R be a functional such that:
(i) w is linear: i.e. w(λ_1 f_1 + λ_2 f_2) = λ_1 w(f_1) + λ_2 w(f_2);
(ii) w is bounded: i.e. for all f ∈ C(X, R) we have |w(f)| ≤ ‖f‖_∞;
(iii) w is positive: i.e. if f ≥ 0 then w(f) ≥ 0;
(iv) w is normalised: i.e. w(1) = 1.

Then there exists a Borel probability measure µ ∈ M(X) such that

w(f) = ∫ f dµ.

Moreover, µ is unique. Thus the Riesz Representation Theorem says that "if it looks like integration on continuous functions, then it is integration with respect to a (unique) Borel probability measure."


7.2.2 Properties of M(X) §
First note that the space M(X) of Borel probability measures on the compact metric space X is non-empty (provided X ≠ ∅). This is because, for each x ∈ X, the Dirac measure δ_x is a Borel probability measure. Indeed, we have the following result:

Proposition 7.2.2
There is a continuous embedding of X in M(X) given by the map X → M(X) : x ↦ δ_x, i.e. if x_n → x then δ_{x_n} ⇀ δ_x.

Proof. See Exercise 7.1. ✷

Recall that a subset C of a vector space is convex if whenever v_1, v_2 ∈ C and α ∈ [0, 1] then αv_1 + (1 − α)v_2 ∈ C.

Proposition 7.2.3
The space M(X) is convex.

Proof. Let µ_1, µ_2 ∈ M(X), α ∈ [0, 1]. Then it is easy to check that αµ_1 + (1 − α)µ_2, defined by

(αµ_1 + (1 − α)µ_2)(B) = αµ_1(B) + (1 − α)µ_2(B),

is a Borel probability measure. ✷

Finally, recall that a metric space K is said to be (sequentially) compact if every se- quence of points in K has a convergent subsequence.

Proposition 7.2.4 The space M(X) is weak∗ compact.

Proof. For convenience, we shall write µ(f)= f dµ. Since C(X, R) is separable, we can choose a countable dense subset of functions f ∞ R { i}i=1 ⊂ C(X, R). Given a sequence µ M(X), we shall first consider the sequence of real numbers n ∈ µn(f1) R. We have that µn(f1) f1 ∞ for all n, so µn(f1) is a bounded sequence of ∈ | | ≤ k k (1) real numbers. As such, it has a convergent subsequence, µn (f1) say. (1) (1) Next we apply the sequence of measures µn to f and consider the sequence µn (f ) 2 2 ∈ R. Again, this is a bounded sequence of real numbers and so it has a convergent subsequence (2) µn (f2). (i) (i−1) In this way we obtain, for each i 1, nested subsequences µn µn such that (i) ≥ { } ⊂(n) { } µn (fj) converges for 1 j i. Now consider the diagonal sequence µn . Since, for n i, (n) ≤ (i)≤ (n) ≥ µn is a subsequence of µn , µn (fi) converges for every i 1. ≥ (n) We can now use the fact that f is dense to show that µn (f) converges for all { i} f C(X, R), as follows. For any ε > 0, we can choose fi such that f fi ∞ ε. Since (n∈) k − k ≤ µn (f ) converges, there exists N such that if n,m N then i ≥ µ(n)(f ) µ(m)(f ) ε. | n i − m i |≤ Thus if n,m N we have ≥ µ(n)(f) µ(m)(f) µ(n)(f) µ(n)(f ) + µ(n)(f ) µ(m)(f ) + µ(m)(f ) µ(m)(f) | n − m | ≤ | n − n i | | n i − m i | | m i − m | 3ε, ≤


(n) so µn (f) converges, as required. (n) To complete the proof, write w(f) = limn→∞ µn (f). We claim that w satisfies the hypotheses of the Riesz Representation Theorem and so corresponds to integration with respect to a probability measure. (i) By construction, w is a linear mapping: w(λf + µg)= λw(f)+ µw(g).

(ii) As |w(f)| ≤ ‖f‖_∞, we see that w is bounded.

(iii) If f ≥ 0 then it is easy to check that w(f) ≥ 0. Hence w is positive.

(iv) It is easy to check that w is normalised: w(1) = 1.

Therefore, by the Riesz Representation Theorem, there exists µ ∈ M(X) such that w(f) = ∫ f dµ. We then have that ∫ f dµ_n^{(n)} → ∫ f dµ, as n → ∞, for all f ∈ C(X, R), i.e., that µ_n^{(n)} converges weak* to µ, as n → ∞. ✷

7.3 Invariant measures for continuous transformations §
Let X be a compact metric space equipped with the Borel σ-algebra B and let T : X → X be a continuous transformation. It is clear that T is measurable.
Given a measure µ, we have already defined the measure T_*µ by T_*µ(B) = µ(T^{-1}B). If µ is a Borel probability measure, then it is straightforward to check that T_*µ is a Borel probability measure. We can think of T_* as a transformation on M(X), namely:

T_* : M(X) → M(X),   T_*µ = µ ∘ T^{-1}.

That is, if B ∈ B then T_*µ(B) = µ(T^{-1}B). The following result tells us how to integrate with respect to T_*µ.

Lemma 7.3.1
For f ∈ L^1(X, B, µ) we have

∫ f d(T_*µ) = ∫ f ∘ T dµ.

Proof. From the definition, for B ∈ B,

∫ χ_B d(T_*µ) = (T_*µ)(B) = µ(T^{-1}B) = ∫ χ_{T^{-1}B} dµ = ∫ χ_B ∘ T dµ.

Thus the result holds for simple functions. If f ≥ 0 is a positive measurable function then we can choose an increasing sequence of simple functions f_n increasing to f pointwise. We have

∫ f_n d(T_*µ) = ∫ f_n ∘ T dµ

and, applying the Monotone Convergence Theorem (Theorem 3.1.2) to each side, we obtain

∫ f d(T_*µ) = ∫ f ∘ T dµ.

The result extends to an arbitrary real-valued f ∈ L^1(X, B, µ) by considering positive and negative parts, and then to complex-valued integrable functions by taking real and imaginary parts. ✷
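On a finite space the identity in Lemma 7.3.1 can be checked directly, since integrals are finite sums. The following Python sketch does this for a made-up five-point space, map and measure.

```python
import numpy as np

# Finite-space check of Lemma 7.3.1 (all data made up): X = {0,...,4}, a map T,
# a probability vector mu and an observable f.
T = np.array([2, 0, 4, 4, 1])                 # x -> T[x]
mu = np.array([0.1, 0.3, 0.2, 0.25, 0.15])
f = np.array([1.0, -2.0, 0.5, 3.0, 0.0])

# Push-forward: (T_* mu)({y}) = mu(T^{-1}{y}).
T_star_mu = np.zeros_like(mu)
for x in range(len(mu)):
    T_star_mu[T[x]] += mu[x]

print(np.sum(f * T_star_mu))   # integral of f with respect to T_* mu
print(np.sum(f[T] * mu))       # integral of f o T with respect to mu -- the same number
```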


Recall that a measure µ is said to be T-invariant if µ(T^{-1}B) = µ(B) for all B ∈ B. Hence µ is T-invariant if and only if T_*µ = µ. Write

M(X, T) = {µ ∈ M(X) | T_*µ = µ}

to denote the space of all T-invariant Borel probability measures. The following result gives a useful criterion for checking whether a measure is T-invariant.

Lemma 7.3.2
Let T : X → X be a continuous mapping of a compact metric space. The following are equivalent:

(i) µ ∈ M(X, T);

(ii) for all f ∈ C(X, R) we have that

∫ f ∘ T dµ = ∫ f dµ.   (7.3.1)

Proof. We prove (i) implies (ii). Suppose that µ ∈ M(X, T) so that T_*µ = µ. Let f ∈ C(X, R). Then f ∈ L^1(X, B, µ). Hence by Lemma 7.3.1, for any f ∈ C(X, R) we have

∫ f ∘ T dµ = ∫ f d(T_*µ) = ∫ f dµ.

Conversely, Lemma 7.3.1 allows us to write (7.3.1) as: µ(f) = (T_*µ)(f) for all f ∈ C(X, R). Hence µ and T_*µ determine the same linear functional on C(X, R). By uniqueness in the Riesz Representation Theorem, we have T_*µ = µ. ✷

7.4 Invariant measures for continuous maps on the torus § We can use Lemma 7.3.2 to prove that a given measure is invariant for certain dynami- cal systems. We first note that we need only check (7.3.1) for a dense set of continuous functions.

Lemma 7.4.1
Suppose that S ⊂ C(X, R) is a uniformly dense subset of functions (that is, for all f ∈ C(X, R) and all ε > 0 there exists g ∈ S such that ‖f − g‖_∞ < ε). Suppose that ∫ g ∘ T dµ = ∫ g dµ for all g ∈ S. Then ∫ f ∘ T dµ = ∫ f dµ for all f ∈ C(X, R).

Proof. Let f ∈ C(X, R) and let ε > 0. Choose g ∈ S such that ‖f − g‖_∞ < ε. Then

| ∫ f ∘ T dµ − ∫ f dµ |
  ≤ | ∫ f ∘ T dµ − ∫ g ∘ T dµ | + | ∫ g ∘ T dµ − ∫ g dµ | + | ∫ g dµ − ∫ f dµ |
  ≤ ∫ |f ∘ T − g ∘ T| dµ + | ∫ g ∘ T dµ − ∫ g dµ | + ∫ |f − g| dµ.

66 MATH4/61112 7. Continuous transformations

Noting that, as ‖f − g‖_∞ < ε, we have |f(Tx) − g(Tx)| < ε for all x, and that ∫ g ∘ T dµ = ∫ g dµ, we have that

| ∫ f ∘ T dµ − ∫ f dµ | < 2ε.

As ε is arbitrary, the result follows. ✷

Corollary 7.4.2
Let T be a continuous transformation of a compact metric space X, equipped with the Borel σ-algebra. Let µ be a Borel probability measure on X. Suppose that S ⊂ C(X, R) is a uniformly dense subset of functions such that ∫ g ∘ T dµ = ∫ g dµ for all g ∈ S. Then µ is a T-invariant measure.

Proof. This follows immediately from Lemma 7.3.2 and Lemma 7.4.1. ✷

We show how to use Corollary 7.4.2 by studying some of our examples.

7.4.1 Circle rotations §
Let T(x) = x + α mod 1 be a circle rotation. We show how to use Corollary 7.4.2 to prove that Lebesgue measure µ is T-invariant.
Let ℓ ∈ Z. We first note that if ℓ ≠ 0 then

∫_0^1 e^{2πiℓx} dx = [e^{2πiℓx} / 2πiℓ]_0^1 = 0.

We also note that if ℓ = 0 then ∫ e^{2πiℓx} dx = 1. Let S denote the set of trigonometric polynomials, i.e.

S = { Σ_{j=0}^{r-1} c_j e^{2πiℓ_j x} | c_j ∈ R, ℓ_j ∈ Z, r ∈ N }.

Then S is uniformly dense in C(X, R) by the Stone–Weierstrass Theorem (Theorem 1.2.2).
Let g ∈ S be a trigonometric polynomial and write

g(x) = Σ_{j=0}^{r-1} c_j e^{2πiℓ_j x}

where ℓ_j = 0 if and only if j = 0. Hence ∫ g dµ = c_0. Note that

g(Tx) = Σ_{j=0}^{r-1} c_j e^{2πiℓ_j (x+α)} = Σ_{j=0}^{r-1} c_j e^{2πiℓ_j α} e^{2πiℓ_j x}.

Hence

∫ g ∘ T dµ = Σ_{j=0}^{r-1} c_j e^{2πiℓ_j α} ∫ e^{2πiℓ_j x} dµ

and the only non-zero integral occurs when ℓ_j = 0, i.e. j = 0. We must therefore have that ∫ g ∘ T dµ = c_0.
Hence ∫ g ∘ T dµ = ∫ g dµ for all g ∈ S. It follows from Corollary 7.4.2 that µ is T-invariant.
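The computation above can also be checked numerically. The following Python sketch, with an arbitrary choice of α and an illustrative trigonometric polynomial g, approximates ∫ g dµ and ∫ g ∘ T dµ by Riemann sums and shows that they agree.

```python
import numpy as np

# Numerical check that a circle rotation leaves the integral of a trigonometric
# polynomial unchanged.  alpha and g are illustrative choices.
alpha = np.sqrt(2) - 1
x = (np.arange(200_000) + 0.5) / 200_000      # midpoint rule on [0, 1]

def g(t):
    return 2.0 + np.cos(2 * np.pi * t) - 0.5 * np.sin(6 * np.pi * t)

print(g(x).mean())                    # approximately the integral of g
print(g((x + alpha) % 1.0).mean())    # approximately the integral of g o T (same value)
```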


7.4.2 Toral endomorphisms §
Let A be a k × k integer matrix with det A ≠ 0. Define the linear toral endomorphism T : R^k/Z^k → R^k/Z^k by

T((x_1, ..., x_k) + Z^k) = A(x_1, ..., x_k) + Z^k.

When T is a linear toral automorphism (i.e. when det A = ±1) we have already seen that Lebesgue measure is invariant. We use Corollary 7.4.2 to prove that Lebesgue measure µ is T-invariant when det A ≠ 0.
For n = (n_1, ..., n_k) ∈ Z^k and x = (x_1, ..., x_k) ∈ R^k define, as before, ⟨n, x⟩ = n_1 x_1 + ··· + n_k x_k. Note that

∫ e^{2πi⟨n,x⟩} dµ = ∫ ··· ∫ e^{2πin_1x_1} ··· e^{2πin_kx_k} dx_1 ··· dx_k.

Hence

∫ e^{2πi⟨n,x⟩} dµ = 0 if n ≠ 0,  and 1 if n = 0,

where 0 = (0, ..., 0) ∈ Z^k. Let

S = { Σ_{j=0}^{r-1} c_j e^{2πi⟨n^{(j)},x⟩} | r ∈ N, c_j ∈ R, n^{(j)} = (n_1^{(j)}, ..., n_k^{(j)}) ∈ Z^k }.

By the Stone–Weierstrass Theorem (Theorem 1.2.2), we see that S is uniformly dense in C(R^k/Z^k, R). Let g ∈ S and write

g(x) = Σ_{j=0}^{r-1} c_j e^{2πi⟨n^{(j)},x⟩}

where n^{(j)} = 0 if and only if j = 0. Then

∫ g dµ = Σ_{j=0}^{r-1} c_j ∫ e^{2πi⟨n^{(j)},x⟩} dµ = c_0.

Note that

g(Tx) = Σ_{j=0}^{r-1} c_j e^{2πi⟨n^{(j)},Ax⟩} = Σ_{j=0}^{r-1} c_j e^{2πi⟨n^{(j)}A,x⟩}.

Hence

∫ g ∘ T dµ = Σ_{j=0}^{r-1} c_j ∫ e^{2πi⟨n^{(j)}A,x⟩} dµ.

These integrals are zero unless n^{(j)}A = 0. As det A ≠ 0 this happens only when n^{(j)} = 0, i.e. when j = 0. Hence

∫ g ∘ T dµ = c_0 = ∫ g dµ.

Hence by Corollary 7.4.2, µ is a T-invariant measure.


Remark. You will notice a strong connection between the above arguments and Fourier series, and you may think that we could take g(x) to be the nth partial sum of the Fourier series for f. However, one needs to take care. Suppose f ∈ C(R^k/Z^k, R) has Fourier series Σ_n c_n(f) e^{2πi⟨n,x⟩}. We need to be careful about what it means for this infinite series to converge. We know that the sequence of partial sums s_n converges in L^2 to f, but we do not know that the partial sums converge uniformly to f. That is, we know that ‖f − s_n‖_2 → 0, but not necessarily that ‖f − s_n‖_∞ → 0. In fact, in general, it is not true that ‖f − s_n‖_∞ → 0.
However, if one defines σ_n = (1/n) Σ_{j=0}^{n-1} s_j to be the average of the first n partial sums, then it is true that ‖f − σ_n‖_∞ → 0. (This is quite a deep result.)

7.5 Existence of invariant measures § Given a continuous mapping T : X X of a compact metric space, it is natural to ask → whether invariant measures necessarily exist, i.e., whether M(X, T ) = . The next result 6 ∅ shows that this is the case.

Theorem 7.5.1 Let T : X X be a continuous mapping of a compact metric space. Then there exists at → least one T -invariant probability measure.

Proof. Let ν ∈ M(X) be a probability measure (for example, we could take ν to be a Dirac measure). Define the sequence µ_n ∈ M(X) by

µ_n = (1/n) Σ_{j=0}^{n-1} T_*^j ν,

so that, for B ∈ B,

µ_n(B) = (1/n) (ν(B) + ν(T^{-1}B) + ··· + ν(T^{-(n-1)}B)).

Since M(X) is weak* compact, some subsequence µ_{n_k} converges, as k → ∞, to a measure µ ∈ M(X). We shall show that µ ∈ M(X, T). By Lemma 7.3.2, this is equivalent to showing that

∫ f dµ = ∫ f ∘ T dµ for all f ∈ C(X, R).

To see this, first note that f ∘ T − f is continuous. Then

| ∫ f ∘ T dµ − ∫ f dµ | = | ∫ (f ∘ T − f) dµ |
  = lim_{k→∞} | ∫ (f ∘ T − f) dµ_{n_k} |
  = lim_{k→∞} | ∫ (f ∘ T − f) d( (1/n_k) Σ_{j=0}^{n_k-1} T_*^j ν ) |
  = lim_{k→∞} (1/n_k) | Σ_{j=0}^{n_k-1} ∫ (f ∘ T − f) dT_*^j ν |
  = lim_{k→∞} (1/n_k) | Σ_{j=0}^{n_k-1} ∫ (f ∘ T^{j+1} − f ∘ T^j) dν |
  = lim_{k→∞} (1/n_k) | ∫ (f ∘ T^{n_k} − f) dν |
  ≤ lim_{k→∞} 2‖f‖_∞ / n_k = 0.

Therefore, µ ∈ M(X, T), as required. ✷
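The averaging construction in this proof is easy to experiment with. The following Python sketch takes ν = δ_x, so that µ_n is the empirical measure along the first n points of the orbit of x, and checks the estimate at the end of the proof: the invariance defect |∫ f ∘ T dµ_n − ∫ f dµ_n| is at most 2‖f‖_∞/n. The map (the logistic map on [0, 1]) and the starting point are illustrative choices only.

```python
import numpy as np

# Averaging construction from the proof, with nu = delta_x: mu_n is the empirical
# measure along the first n points of the orbit of x.  The invariance defect
# | integral of f o T dmu_n - integral of f dmu_n | is at most 2*||f||_inf / n.
T = lambda t: 4.0 * t * (1.0 - t)             # logistic map on [0, 1]
f = lambda t: np.cos(2 * np.pi * t)           # ||f||_inf = 1

x0, n = 0.1234, 100_000
orbit = np.empty(n)
orbit[0] = x0
for j in range(1, n):
    orbit[j] = T(orbit[j - 1])

defect = abs(np.mean(f(T(orbit))) - np.mean(f(orbit)))
print(defect, 2.0 / n)                        # defect <= 2*||f||_inf / n
```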

We will need the following additional properties of M(X, T).

Theorem 7.5.2
Let T : X → X be a continuous mapping of a compact metric space. Then M(X, T) is a weak* compact and convex subset of M(X).

Proof. The fact that M(X, T) is convex is straightforward from the definition.
To see that M(X, T) is weak* compact it is sufficient to show that it is a weak* closed subset of the weak* compact space M(X). Suppose that µ_n ∈ M(X, T) is such that µ_n ⇀ µ ∈ M(X). We need to show that µ ∈ M(X, T). To see this, observe that for any f ∈ C(X, R) we have that

∫ f ∘ T dµ = lim_{n→∞} ∫ f ∘ T dµ_n   (as f ∘ T is continuous)
           = lim_{n→∞} ∫ f dµ_n       (as µ_n ∈ M(X, T))
           = ∫ f dµ                   (as µ_n ⇀ µ).
✷

7.6 Exercises §
Exercise 7.1
Prove Proposition 7.2.2: show that if x_n, x ∈ X and x_n → x then δ_{x_n} ⇀ δ_x.

Exercise 7.2
Prove that T_* : M(X) → M(X) is weak* continuous (i.e. if µ_n ⇀ µ then T_*µ_n ⇀ T_*µ).

Exercise 7.3
Let X be a compact metric space. For µ ∈ M(X) define

‖µ‖ = sup_{f∈C(X,R), ‖f‖_∞≤1} | ∫ f dµ |.

We say that µ_n converges strongly to µ if ‖µ_n − µ‖ → 0 as n → ∞. The topology this determines is called the strong topology (or the operator topology).

(i) Show that if µ_n → µ strongly then µ_n ⇀ µ in the weak* topology.

(ii) Suppose that X is infinite. Show that X ↪ M(X) : x ↦ δ_x is not continuous in the strong topology.


(iii) Prove that ‖δ_x − δ_y‖ = 2 if x ≠ y. (You may use Urysohn's Lemma: Let A and B be disjoint closed subsets of a metric space X. Then there is a continuous function f ∈ C(X, R) such that 0 ≤ f ≤ 1 on X while f ≡ 0 on A and f ≡ 1 on B.)
Hence prove that M(X) is not compact in the strong topology when X is infinite.

Exercise 7.4
Give an example of a sequence of measures µ_n and a set B such that µ_n ⇀ µ but µ_n(B) ↛ µ(B).

Exercise 7.5 Prove that M(X, T ) is convex.

Exercise 7.6
Suppose that S ⊂ C(X, R) is a uniformly dense subset of functions (that is, for all f ∈ C(X, R) and all ε > 0 there exists g ∈ S such that ‖f − g‖_∞ < ε). Let µ_n, µ ∈ M(X). Suppose that ∫ f dµ_n → ∫ f dµ for all f ∈ S. Prove that µ_n ⇀ µ.

Exercise 7.7
Let Σ = {x = (x_j)_{j=0}^{∞} | x_j ∈ {0, 1}} denote the shift space on two symbols 0, 1. Let σ : Σ → Σ, (σ(x))_j = x_{j+1}, denote the shift map.

(i) How many periodic points of period n are there?

(ii) Let Per(n) denote the set of periodic points of period n. Define

µ_n = (1/2^n) Σ_{x∈Per(n)} δ_x.

Let i_j ∈ {0, 1}, 0 ≤ j ≤ m − 1, and define the cylinder set

[i_0, i_1, ..., i_{m-1}] = {x = (x_j)_{j=0}^{∞} ∈ Σ | x_j = i_j, j = 0, 1, ..., m − 1}.

Let µ denote the Bernoulli-(1/2, 1/2) measure. Prove that

∫ χ_{[i_0,i_1,...,i_{m-1}]} dµ_n → ∫ χ_{[i_0,i_1,...,i_{m-1}]} dµ as n → ∞.

(iii) Prove that χ_{[i_0,i_1,...,i_{m-1}]} is a continuous function.

(iv) Use Exercise 7.6 and the Stone–Weierstrass Theorem (Theorem 1.2.2) to show that µ_n ⇀ µ as n → ∞.

Exercise 7.8
Let X = R^3/Z^3 be the 3-dimensional torus. Let α ∈ R. Define T : X → X by

T((x, y, z) + Z^3) = (α + x, y + x, z + y) + Z^3.

Use Corollary 7.4.2 to prove that Lebesgue measure µ is a T-invariant measure.


8. Ergodic measures for continuous transformations

8.1 Introduction §
In the previous section we saw that, given a continuous transformation of a compact metric space, the set of T-invariant Borel probability measures is non-empty. One can ask a similar question: is the set of ergodic Borel probability measures non-empty? In this section we address this question. We let E(X, T) ⊂ M(X, T) denote the set of ergodic T-invariant Borel probability measures on X.

8.2 Radon–Nikodym derivatives §
We will need the concept of Radon–Nikodym derivatives.

Definition. Let µ be a measure on the measurable space (X, B). We say that a measure ν is absolutely continuous with respect to µ, and write ν ≪ µ, if ν(B) = 0 whenever µ(B) = 0, B ∈ B.

Remark. Thus ν is absolutely continuous with respect to µ if sets of µ-measure zero also have ν-measure zero (but there may be more sets of ν-measure zero). For example, let f ∈ L^1(X, B, µ) be non-negative and define a measure ν by

ν(B) = ∫_B f dµ.   (8.2.1)

Then ν ≪ µ.
As a particular example, let X = [0, 1] be equipped with the Borel σ-algebra B. Define f : [0, 1] → R by

f(x) = 2x if 0 ≤ x ≤ 1/2,  and 0 if 1/2 < x ≤ 1,

and set

ν(B) = ∫_B f dµ.

If A ⊂ [1/2, 1] is any Borel set then ν(A) = 0.
The following theorem says that, essentially, all absolutely continuous measures occur by the construction in (8.2.1).

Theorem 8.2.1 (Radon–Nikodym)
Let (X, B, µ) be a probability space. Let ν be a measure defined on B and suppose that ν ≪ µ. Then there is a non-negative measurable function f such that

ν(B) = ∫_B f dµ for all B ∈ B.

Moreover, f is unique in the sense that if g is a measurable function with the same property then f = g µ-a.e.


Remark. If ν ≪ µ then it is customary to write dν/dµ for the function given by the Radon–Nikodym theorem, that is

ν(B) = ∫_B (dν/dµ) dµ.

The following relations are all easy to prove, and indicate why the notation was chosen in this way.

(i) If ν ≪ µ and f is a µ-integrable function then f is ν-integrable and

∫ f dν = ∫ f (dν/dµ) dµ.

(ii) If ν_1, ν_2 ≪ µ then

d(ν_1 + ν_2)/dµ = dν_1/dµ + dν_2/dµ.

(iii) If λ ≪ ν ≪ µ then λ ≪ µ and

dλ/dµ = (dλ/dν)(dν/dµ).
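On a finite space the Radon–Nikodym derivative is simply a ratio of point masses, which makes the defining property easy to verify. The following Python sketch uses made-up values of µ and ν with ν ≪ µ.

```python
import numpy as np

# Toy Radon-Nikodym derivative on a four-point space (made-up values): when
# nu << mu, dnu/dmu(x) = nu({x}) / mu({x}) wherever mu({x}) > 0, and then
# nu(B) equals the integral of dnu/dmu over B with respect to mu.
mu = np.array([0.2, 0.5, 0.3, 0.0])
nu = np.array([0.1, 0.7, 0.2, 0.0])           # nu vanishes wherever mu does

dnu_dmu = np.divide(nu, mu, out=np.zeros_like(nu), where=mu > 0)

B = [1, 3]                                    # an arbitrary subset
print(nu[B].sum(), (dnu_dmu[B] * mu[B]).sum())   # the two numbers agree
```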

8.3 Ergodic measures as extreme points §
8.3.1 Extreme points of convex sets §
A point in a convex set is called an extreme point if it cannot be written as a non-trivial convex combination of (other) elements of the set. More precisely, µ is an extreme point of M(X, T) if, whenever µ = αµ_1 + (1 − α)µ_2 with µ_1, µ_2 ∈ M(X, T) and 0 < α < 1, we have µ_1 = µ_2 = µ.

Remarks.
(i) Let Y be the unit square

Y = {(x, y) | 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} ⊂ R^2.

Then the extreme points of Y are the corners (0, 0), (0, 1), (1, 0), (1, 1).

(ii) Let Y be the (closed) unit disc

Y = {(x, y) | x^2 + y^2 ≤ 1} ⊂ R^2.

Then the set of extreme points of Y is precisely the unit circle {(x, y) | x^2 + y^2 = 1}.

8.3.2 Existence of ergodic measures §
The next result will allow us to show that ergodic measures for continuous transformations on compact metric spaces always exist.

Theorem 8.3.1 Let T be a continuous transformation of a compact metric space X equipped with the Borel σ-algebra . The following are equivalent: B


(i) the T -invariant probability measure µ is ergodic;

(ii) µ is an extreme point of M(X, T ).

Proof. We prove (ii) implies (i): if µ is an extreme point of M(X, T) then it is ergodic. In fact, we shall prove the contrapositive. Suppose that µ is not ergodic; we show that µ is not an extreme point of M(X, T). As µ is not ergodic, there exists B ∈ B such that T^{-1}B = B and 0 < µ(B) < 1. Define probability measures µ_1 and µ_2 on X by

µ_1(A) = µ(A ∩ B) / µ(B),   µ_2(A) = µ(A ∩ (X \ B)) / µ(X \ B).

(The assumption that 0 < µ(B) < 1 ensures that the denominators are not equal to zero.) Clearly µ_1 ≠ µ_2, since µ_1(B) = 1 while µ_2(B) = 0.
Since T^{-1}B = B, we also have T^{-1}(X \ B) = X \ B. Thus we have

µ_1(T^{-1}A) = µ(T^{-1}A ∩ B) / µ(B)
            = µ(T^{-1}A ∩ T^{-1}B) / µ(B)
            = µ(T^{-1}(A ∩ B)) / µ(B)
            = µ(A ∩ B) / µ(B)
            = µ_1(A)

and (by the same argument)

µ_2(T^{-1}A) = µ(T^{-1}A ∩ (X \ B)) / µ(X \ B) = µ_2(A),

i.e., µ_1 and µ_2 are both in M(X, T). However, we may write µ as the non-trivial (since 0 < µ(B) < 1) convex combination

µ = µ(B)µ_1 + (1 − µ(B))µ_2,

so that µ is not an extreme point. ✷

Proof (not examinable). We prove (i) implies (ii). Suppose that µ is ergodic and that µ = αµ + (1 α)µ , with µ ,µ M(X, T ) and 0 <α< 1. We shall show that µ = µ 1 − 2 1 2 ∈ 1 (so that µ2 = µ, also). This will show that µ is an extreme point of M(X, T ). If µ(A) = 0 then µ (A) = 0, so that µ µ. Therefore the Radon-Nikodym derivative 1 1 ≪ dµ /dµ 0 exists. One can easily deduce from the statement of the Radon-Nikodym 1 ≥ Theorem that µ1 = µ if and only if dµ1/dµ = 1 µ-a.e. We shall show that this is indeed the case by showing that the sets where, respectively, dµ1/dµ < 1 and dµ1/dµ > 1 both have µ-measure zero. Let dµ B = x X 1 (x) < 1 . ∈ | dµ   74 MATH4/61112 8. Ergodic measures for continuous transformations

Now dµ1 dµ1 dµ1 µ1(B)= dµ = dµ + dµ (8.3.1) dµ 1 dµ 1 dµ ZB ZB∩T − B ZB\T − B and −1 dµ1 dµ1 dµ1 µ1(T B)= dµ = dµ + dµ. (8.3.2) 1 dµ 1 dµ 1 dµ ZT − B ZB∩T − B ZT − B\B As µ M(X, T ), we have that µ (B)= µ (T −1B). Hence comparing the last summands 1 ∈ 1 1 in both (8.3.1) and (8.3.2) we obtain dµ dµ 1 dµ = 1 dµ. (8.3.3) 1 dµ 1 dµ ZB\T − B ZT − B\B In fact, these integrals are taken over sets of the same µ-measure:

µ(T −1B B) = µ(T −1B) µ(T −1B B) \ − ∩ = µ(B) µ(T −1B B) − ∩ = µ(B T −1B). \

Note that on the left-hand side of (8.3.3), the integrand dµ1/dµ < 1. However, on the right- hand side of (8.3.3), the integrand dµ /dµ 1. Thus we must have that µ(B T −1B) = 1 ≥ \ µ(T −1B B) = 0, which is to say that µ(T −1B B) = 0, i.e. T −1B = B µ-a.e. Therefore, \ △ since µ is ergodic, we have that µ(B)=0 or µ(B) = 1. We can rule out the possibility that µ(B) = 1 by observing that if µ(B)= 1 then dµ dµ 1= µ (X)= 1 dµ = 1 dµ<µ(B) = 1, 1 dµ dµ ZX ZB a contradiction. Therefore µ(B) = 0. If we define dµ C = x X 1 (x) > 1 ∈ | dµ   then repeating essentially the same argument gives µ(C) = 0. Hence dµ µ x X 1 (x) = 1 = µ(X (B C)) = µ(X) µ(B) µ(C) = 1, ∈ | dµ \ ∪ − −   i.e., dµ1/dµ = 1 µ-a.e. Therefore µ1 = µ, as required. ✷

We can now prove that a continuous transformation of a compact metric space always has an ergodic measure. To do this, we will show that M(X, T ) has an extreme point.

Theorem 8.3.2 Let T : X X be a continuous mapping of a compact metric space. Then there exists at → least one ergodic measure in M(X, T ).

Proof. By Theorem 8.3.1, it is equivalent to prove that M(X, T ) has an extreme point. Choose a countable dense subset of C(X, R), f ∞ say. Consider the first function { i}i=0 f0. Since the map M(X, T ) R : µ f dµ → 7→ 0 Z 75 MATH4/61112 8. Ergodic measures for continuous transformations is (weak∗) continuous and M(X, T ) is compact, there exists (at least one) ν M(X, T ) ∈ such that

f0 dν = sup f0 dµ. Z µ∈M(X,T ) Z If we define M = ν M(X, T ) f dν = sup f dµ 0 ∈ | 0 0 ( Z µ∈M(X,T ) Z ) then the above shows that M0 is non-empty. Also, M0 is closed and hence compact. We now consider the next function f1 and define

M = ν M f dν = sup f dµ . 1 ∈ 0 | 1 1 ( Z µ∈M0 Z )

By the same reasoning as above, M1 is a non-empty closed subset of M0. Continuing inductively, we define

Mj = ν Mj−1 fj dν = sup fj dµ ( ∈ | µ∈Mj 1 ) Z − Z and hence obtain a nested sequence of sets

M(X, T ) M M M ⊃ 0 ⊃ 1 ⊃···⊃ j ⊃··· with each Mj non-empty and closed. Now consider the intersection ∞ M∞ = Mj. j\=0 Recall that the intersection of a decreasing sequence of non-empty compact sets is non- empty. Hence M is non-empty and we can pick µ M . We shall show that µ is an ∞ ∞ ∈ ∞ ∞ extreme point (and hence ergodic). Suppose that we can write µ = αµ + (1 α)µ , µ ,µ M(X, T ), 0 <α< 1. We ∞ 1 − 2 1 2 ∈ have to show that µ = µ . Since f ∞ is dense in C(X, R), it suffices to show that 1 2 { j}j=0

f dµ = f dµ j 0. j 1 j 2 ∀ ≥ Z Z Consider f0. By assumption

f dµ = α f dµ + (1 α) f dµ . 0 ∞ 0 1 − 0 2 Z Z Z In particular, f dµ max f dµ , f dµ . 0 ∞ ≤ 0 1 0 2 Z Z Z  However µ M and so ∞ ∈ 0

f dµ = sup f dµ max f dµ , f dµ . 0 ∞ 0 ≥ 0 1 0 2 Z µ∈M(X,T ) Z Z Z 


Therefore

f0 dµ1 = f0 dµ2 = f0 dµ∞. Z Z Z Thus, the first identity we require is proved and µ ,µ M . This last fact allows us to 1 2 ∈ 0 employ the same argument on f1 (with M(X, T ) replaced by M0) and conclude that

f1 dµ1 = f1 dµ2 = f1 dµ∞ Z Z Z and µ ,µ M . 1 2 ∈ 1 Continuing inductively, we show that for an arbitrary j 0, ≥

fj dµ1 = fj dµ2 Z Z and µ ,µ M . This completes the proof. ✷ 1 2 ∈ j

8.4 An example: the North-South map §
For many dynamical systems there exist uncountably many different ergodic measures. This is the case for the doubling map, Markov shifts, toral automorphisms, etc. Here we give an example of a dynamical system T : X → X for which one can construct M(X, T) and E(X, T) explicitly.
Let X ⊂ R^2 denote the circle of radius 1 centred at (0, 1) ∈ R^2. Call N = (0, 2) the North Pole and S = (0, 0) the South Pole of X. Define a map φ : X \ {N} → R × {0} by drawing a straight line through N and x and denoting by φ(x) the unique point on the x-axis that this line crosses (this is just stereographic projection of the circle). Define T : X → X by

T(x) = φ^{-1}( (1/2) φ(x) )  if x ∈ X \ {N},
       N                     if x = N.

Hence T(N) = N, T(S) = S and if x ≠ N, S then T^n(x) → S as n → ∞. We call T the North-South map.

[Figure 8.4.1: The North-South map.]

Clearly both N and S are fixed points for T. Hence δ_N and δ_S (the Dirac delta measures at N, S, respectively) are T-invariant. It is easy to see that both δ_N and δ_S are ergodic.


Now let µ ∈ M(X, T) be an invariant measure. We claim that µ assigns zero measure to the set X \ {N, S}. Let x ∈ X be any point in the right semi-circle (for example, take x = (1, 1) ∈ R^2) and consider the arc I of semi-circle from x to T(x). Then ⋃_{n=-∞}^{∞} T^{-n}I is a disjoint union of arcs of semi-circle and, moreover, is equal to the entire right semi-circle. Now

µ( ⋃_{n=-∞}^{∞} T^{-n}I ) = Σ_{n=-∞}^{∞} µ(T^{-n}I) = Σ_{n=-∞}^{∞} µ(I)

and the only way for this to be finite is if µ(I) = 0. Hence µ assigns zero measure to the entire right semi-circle. Similarly, µ assigns zero measure to the left semi-circle. Hence µ is concentrated on the two points N, S, and so must be a convex combination of the Dirac delta measures δ_N and δ_S. Hence

M(X, T) = { αδ_N + (1 − α)δ_S | α ∈ [0, 1] }

and the ergodic measures are the extreme points of M(X, T), namely δ_N, δ_S.
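One can see this behaviour numerically by working in the stereographic coordinate, in which T becomes t ↦ t/2 on the real line with t = 0 corresponding to S. The following Python sketch (with an illustrative observable f and starting point) shows that Birkhoff averages along an orbit converge to the value of f at S, consistent with every invariant measure being a combination of δ_N and δ_S.

```python
import numpy as np

# The North-South map in the stereographic coordinate: phi conjugates T to the
# map t -> t/2 on the real line, with t = 0 corresponding to S.  Birkhoff
# averages along any orbit off N converge to the value of the observable at S.
f = lambda t: np.exp(-t * t)                  # an illustrative continuous observable

t, n, total = 5.0, 1000, 0.0
for _ in range(n):
    total += f(t)
    t = 0.5 * t                               # one application of the conjugated map

print(total / n, f(0.0))                      # Birkhoff average is close to f(S) = f(0)
```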

8.5 Unique ergodicity § We conclude by looking at the case where T : X X has a unique invariant probability → measure.

Definition. Let T : X → X be a continuous transformation of a compact metric space X. If there is a unique T-invariant probability measure then we say that T is uniquely ergodic.

Remark. You might wonder why such T are not instead called 'uniquely invariant'. Recall that the extreme points of M(X, T) are precisely the ergodic measures. If M(X, T) consists of just one measure then that measure is an extreme point, and so must be ergodic.

Unique ergodicity implies the following strong convergence result.

Theorem 8.5.1 (Oxtoby's Ergodic Theorem)
Let X be a compact metric space and let T : X → X be a continuous transformation. The following are equivalent:

(i) T is uniquely ergodic;

(ii) for each f ∈ C(X, R) there exists a constant c(f) such that

(1/n) Σ_{j=0}^{n-1} f(T^j x) → c(f) as n → ∞,   (8.5.1)

uniformly for x ∈ X.

Remark. The convergence in (8.5.1) means that

lim_{n→∞} sup_{x∈X} | (1/n) Σ_{j=0}^{n-1} f(T^j x) − c(f) | = 0.


Remark. If M(X, T )= µ then the constant c(f) in (8.5.1) is f dµ. { } Proof. We prove (ii) implies (i). Suppose that µ,ν are T -invariantR probability measures; we shall show that µ = ν. Integrating the expression in (ii), we obtain

n−1 n−1 1 1 f dµ = lim f T j dµ = lim f T j dµ = c(f) dµ = c(f), n→∞ n ◦ n→∞ n ◦ Z Xj=0 Z Z Xj=0 Z (that the convergence in (8.5.1) is uniform allows us to interchange integration and taking limits) and, by the same argument

f dν = c(f). Z Therefore f dµ = f dν for all f C(X, R) ∈ Z Z and so µ = ν (by the Riesz Representation Theorem). We prove (i) implies (ii). Let M(X, T ) = µ . If (ii) is true, then, by the Dominated { } Convergence Theorem (Theorem 3.1.3), we must necessarily have c(f)= f dµ. The convergence in (ii) means: f C(X, R), ε > 0, N N such that if n N ∀ ∈ ∀ ∃ ∈ R ≥ then for all x X we have ∈ n−1 1 f(T jx) f dµ < ε. n − j=0 Z X

Suppose that (ii) is false. Then, negating the above quantifiers, we see that there exists f C(X, R) and ε> 0 and an increasing sequence n such that there exists x for 0 ∈ k ↑∞ nk which n −1 1 k f (T jx ) f dµ ε. (8.5.2) n 0 nk − 0 ≥ k j=0 Z X

Define the probability measure µk M(X) by ∈ n −1 1 k µ = T jδ , k ∗ xnk nk Xj=0 so that (8.5.2) can be written as

f dµ f dµ ε. 0 k − 0 ≥ Z Z

Now µ M(X) and M(X) is weak∗ compact. Hence there exists a weak∗ convergent k ∈ subsequence, say with weak∗ limit ν. By following the proof of Theorem 7.5.1, it is easy to see that ν M(X, T ). In particular, we have ∈

f dν f dµ ε. 0 − 0 ≥ Z Z

Therefore, ν = µ, contradicting unique ergodicity. ✷ 6


8.6 Irrational rotations §
Let X = R/Z, T : X → X, T(x) = x + α mod 1, where α is irrational. We have already seen that Lebesgue measure µ is an ergodic T-invariant measure. We can prove that Lebesgue measure is the only invariant measure.

Proposition 8.6.1
An irrational rotation of a circle is uniquely ergodic and the unique T-invariant measure is Lebesgue measure.

Proof. We use Oxtoby’s Ergodic Theorem. To prove that T is uniquely ergodic, we must show that (8.5.1) holds for every continuous function f C(X, R). Note that the ∈ convergence in (8.5.1) is uniform, i.e. we must show that

n−1 1 f(T j(x)) f dµ 0 (8.6.1) n − → j=0 Z X ∞ as n . →∞ We first prove (8.6.1) in the case when f(x) = e2πiℓx, ℓ Z 0 . Note that T j(x) = ∈ \ { } x + jα. Hence

n−1 n−1 1 1 f(T j(x)) = e2πiℓ(x+jα) n n j=0 j=0 X X n−1 1 = e2πiℓx e2πiℓαj n j=0 X 1 e2πiℓαn 1 = | − | n e2πiℓα 1 | − | 1 2 . (8.6.2) ≤ n e2πiℓα 1 | − | As α is irrational, the denominator in (8.6.2) is not zero. Note also that e2πiℓx dµ = 0. Hence n−1 R 1 sup f(T j(x)) f dµ 0 n − → x∈X j=0 Z X as n when f(x) = e2πiℓx, ℓ Z 0 . Clearly (8.6.1) holds when f is a constant → ∞ ∈ \ { } function. By taking finite linear combinations of exponential functions we see that

n−1 1 sup g(T j(x)) g dµ 0 n − → x∈X j=0 Z X as n for all trigonometric polynomials g. By the Stone-Weierstrass Theorem (The- → ∞ orem 1.2.2), trigonometric polynomials are uniformly dense in C(X, R). Let f C(X, R) ∈ and let ε > 0. Then there exists a trigonometric polynomial g such that f g < ε. k − k∞


Hence for any x X we have ∈ n−1 1 f(T j(x)) f dµ n − j=0 Z X n−1 n−1 1 1 (f(T j(x)) g (T j(x))) + g(T j(x)) g dµ + g f dµ ≤ n − n − − j=0 j=0 Z Z X X n−1 n−1 1 1 f(T j(x)) g(T j(x)) + g(T j (x)) g dµ + g f dµ ≤ n | − | n − | − | j=0 j=0 Z Z X X n−1 1 2ε + g(T j(x)) g dµ . ≤ n − j=0 Z X

Hence, taking the supremum over all x X, we have ∈ n−1 n−1 1 1 f(T j(x)) f dµ < 2ε + g(T j(x)) g dµ . n − n − j=0 Z j=0 Z X ∞ X ∞

Letting n we see that →∞ n−1 1 lim sup f(T j(x)) f dµ < 2ε. n→∞ n − j=0 Z X ∞

As ε> 0 is arbitrary, it follows that

n−1 1 lim f(T j(x)) f dµ = 0. n→∞ n − j=0 Z X ∞

Hence statement (ii) in Oxtoby’s Ergodic Theorem holds. As (i) and (ii) in Oxtoby’s Ergodic Theorem are equivalent, it follows that T is uniquely ergodic and Lebesgue measure is the unique invariant measure. ✷
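The uniform convergence in (8.6.1) is easy to observe numerically. The following Python sketch, with an arbitrary irrational α and an illustrative trigonometric polynomial f satisfying ∫ f dµ = 0, computes the Birkhoff averages over a grid of starting points and prints the supremum over the grid, which tends to 0 as n grows.

```python
import numpy as np

# Uniform convergence of Birkhoff averages for an irrational rotation, as in
# (8.6.1).  alpha and f are illustrative; f has Lebesgue integral 0.
alpha = (np.sqrt(5) - 1) / 2
f = lambda t: np.cos(2 * np.pi * t) + 0.3 * np.sin(4 * np.pi * t)

xs = np.linspace(0.0, 1.0, 200, endpoint=False)      # grid of starting points
for n in (10, 100, 1000, 10000):
    j = np.arange(n)
    averages = f((xs[:, None] + j[None, :] * alpha) % 1.0).mean(axis=1)
    print(n, np.max(np.abs(averages)))                # sup over the grid tends to 0
```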

8.7 Exercises §
Exercise 8.1
Prove the following identities concerning Radon–Nikodym derivatives.

(i) If ν ≪ µ and f ∈ L^1(X, B, µ) then f ∈ L^1(X, B, ν) and

∫ f dν = ∫ f (dν/dµ) dµ.

(ii) If ν_1, ν_2 ≪ µ then

d(ν_1 + ν_2)/dµ = dν_1/dµ + dν_2/dµ.


(iii) If λ ≪ ν ≪ µ then λ ≪ µ and

dλ/dµ = (dλ/dν)(dν/dµ).

Exercise 8.2
Let X = R^3/Z^3 be the 3-dimensional torus. Fix α ∉ Q. Define T : X → X by

T((x, y, z) + Z^3) = (α + x, y + x, z + y) + Z^3.

Prove by induction that for n ≥ 3

T^n((x, y, z) + Z^3) = ( C(n,1)α + x,  C(n,2)α + C(n,1)x + y,  C(n,3)α + C(n,2)x + C(n,1)y + z ) + Z^3

(here C(n, r) denotes the binomial coefficient "n choose r").
Let f(x, y, z) = e^{2πi(kx+ℓy+mz)} where k, ℓ, m ∈ Z. Assuming Weyl's Theorem on Polynomials (Theorem 2.3.1), prove using Weyl's Criterion (Theorem 1.2.1) that

sup_{x,y,z} | (1/n) Σ_{j=0}^{n-1} f(T^j((x, y, z) + Z^3)) | → 0

as n → ∞, whenever (k, ℓ, m) ∈ Z^3 \ {(0, 0, 0)}.
Hence, using Oxtoby's Ergodic Theorem, prove that T is uniquely ergodic and Lebesgue measure is the unique invariant measure.

Exercise 8.3 Let T be a homeomorphism of a compact metric space X. Suppose that T is uniquely ergodic with unique invariant measure µ. Prove that every orbit of T is dense if, and only if, µ(U) > 0 for every non-empty open set U.


9. Recurrence

9.1 Introduction §
We can now begin to study ergodic theorems. Before we do this, we discuss a remarkable result due to Poincaré.

9.2 Poincaré's Recurrence Theorem §
Theorem 9.2.1 (Poincaré's Recurrence Theorem)
Let T : X → X be a measure-preserving transformation of the probability space (X, B, µ). Let B ∈ B be such that µ(B) > 0. Then for µ-a.e. x ∈ B, the orbit {T^n x}_{n=0}^{∞} returns to B infinitely often.

Proof. Let

E = {x ∈ B | T^n x ∈ B for infinitely many n ≥ 1};

then we have to show that µ(B \ E) = 0.
If we write

F = {x ∈ B | T^n x ∉ B for all n ≥ 1}

then we have the identity

B \ E = ⋃_{k=0}^{∞} (T^{-k}F ∩ B).

Thus we have the estimate

µ(B \ E) = µ( ⋃_{k=0}^{∞} (T^{-k}F ∩ B) ) ≤ µ( ⋃_{k=0}^{∞} T^{-k}F ) ≤ Σ_{k=0}^{∞} µ(T^{-k}F).

Since µ(T^{-k}F) = µ(F) for all k ≥ 0 (because the measure is preserved), it suffices to show that µ(F) = 0.
First suppose that n > m and that T^{-m}F ∩ T^{-n}F ≠ ∅. If y lies in this intersection then T^m y ∈ F and T^{n-m}(T^m y) = T^n y ∈ F ⊂ B, which contradicts the definition of F. Thus T^{-m}F and T^{-n}F are disjoint. Since {T^{-k}F}_{k=0}^{∞} is a disjoint family, we have

Σ_{k=0}^{∞} µ(T^{-k}F) = µ( ⋃_{k=0}^{∞} T^{-k}F ) ≤ µ(X) = 1.

Since the terms in the summation all have the constant value µ(F), we must have µ(F) = 0. ✷


Remark. Note that the hypotheses of the Poincar´eRecurrence Theorem are very mild: all one needs is for T to be a measure-preserving transformation of a probability space. (One does not need T to be ergodic.) If you carefully look at the proof, you will see that the fact that T is measure-preserving and the fact that µ(X) = 1 are used just once. The same proof continues to hold in the case when µ(X) is finite. Poincar´e’s Recurrence Theorem is false with either of the hypotheses that µ(X) is finite or T is measure-preserving removed.

9.3 Ergodic Theorems §
An ergodic theorem is a result that describes the limiting behaviour of sequences of the form

(1/n) Σ_{j=0}^{n-1} f ∘ T^j   (9.3.1)

as n → ∞. The precise formulation of an ergodic theorem depends on the class of function f (for example, one could assume that f is integrable, L^2, or continuous), and the notion of convergence used (for example, the convergence could be pointwise, L^2, or uniform).
We have already studied when one has uniform convergence of (9.3.1): this is Oxtoby's Ergodic Theorem and it only holds in the very special circumstances when T is uniquely ergodic. In what follows we will discuss von Neumann's (Mean) Ergodic Theorem and Birkhoff's Ergodic Theorem. Von Neumann's Ergodic Theorem is in the context of f ∈ L^2(X, B, µ) and L^2-convergence of the ergodic averages (9.3.1); Birkhoff's Ergodic Theorem is in the context of f ∈ L^1(X, B, µ) and almost everywhere pointwise convergence of (9.3.1). Note that L^2 convergence neither implies nor is implied by almost everywhere pointwise convergence.
Before stating these theorems, we first need to discuss conditional expectation.

9.4 Conditional expectation §
Let (X, B, µ) be a probability space. Let A ⊂ B be a sub-σ-algebra. Note that µ defines a measure on A by restriction. Let f ∈ L^1(X, B, µ) be non-negative. Then we can define a measure ν on A by setting, for A ∈ A,

ν(A) = ∫_A f dµ.

Note that ν ≪ µ|_A. Hence by the Radon–Nikodym theorem, there is a unique A-measurable function E(f | A) such that

ν(A) = ∫_A E(f | A) dµ

for all A ∈ A. We call E(f | A) the conditional expectation of f with respect to the σ-algebra A.
So far, we have only defined E(f | A) for non-negative f. To define E(f | A) for an arbitrary real-valued f, we split f into positive and negative parts f = f_+ − f_−, where f_+, f_− ≥ 0, and define

E(f | A) = E(f_+ | A) − E(f_− | A).

For a complex-valued f we split f into its real and imaginary parts and define

E(f | A) = E(Re(f) | A) + i E(Im(f) | A).


Thus we can view conditional expectation as an operator

E(· | A) : L^1(X, B, µ) → L^1(X, A, µ).

Note that E(f | A) is uniquely determined by the two requirements that

(i) E(f | A) is A-measurable, and

(ii) ∫_A f dµ = ∫_A E(f | A) dµ for all A ∈ A.

Intuitively, one can think of E(f | A) as the best approximation to f in the smaller space of A-measurable functions.
Let T be a measure-preserving transformation of the probability space (X, B, µ). To state von Neumann's and Birkhoff's Ergodic Theorems precisely, we will need the sub-σ-algebra I of T-invariant subsets, namely:

I = {B ∈ B | T^{-1}B = B a.e.}.

It is straightforward to check that I is a σ-algebra. Note that if T is ergodic then I is the trivial σ-algebra consisting of all sets in B of measure 0 or 1.
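When A is generated by a finite partition, the conditional expectation is just the average of f over each partition element (compare Exercise 9.4 below). The following Python sketch checks the defining property (ii) in that case; the space, measure, partition and function are made-up data.

```python
import numpy as np

# Conditional expectation onto the sigma-algebra generated by a finite partition
# (cf. Exercise 9.4): E(f | A) is constant on each partition element, equal to
# the mu-average of f there.
mu = np.full(8, 1.0 / 8)
f = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
partition = [np.array([0, 1, 2]), np.array([3, 4]), np.array([5, 6, 7])]

E = np.empty_like(f)
for A in partition:
    E[A] = np.sum(f[A] * mu[A]) / np.sum(mu[A])       # (1/mu(A)) * integral_A f dmu

# Defining property: for each A in the partition the integrals of f and E(f|A) agree.
for A in partition:
    print(np.sum(f[A] * mu[A]), np.sum(E[A] * mu[A]))
```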

9.5 Von Neumann's Ergodic Theorem §
Von Neumann's Ergodic Theorem deals with the L^2-limiting behaviour of (1/n) Σ_{j=0}^{n-1} f ∘ T^j for f ∈ L^2(X, B, µ).

Theorem 9.5.1 (von Neumann's Ergodic Theorem)
Let (X, B, µ) be a probability space and let T : X → X be a measure-preserving transformation. Let I denote the σ-algebra of T-invariant sets. Then for every f ∈ L^2(X, B, µ), we have

lim_{n→∞} (1/n) Σ_{j=0}^{n-1} f ∘ T^j = E(f | I)

where the convergence is in L^2.

When T is ergodic with respect to µ then von Neumann's Ergodic Theorem takes a particularly simple form.

Corollary 9.5.2
Let (X, B, µ) be a probability space and let T : X → X be an ergodic measure-preserving transformation. Let f ∈ L^2(X, B, µ). Then

lim_{n→∞} (1/n) Σ_{j=0}^{n-1} f ∘ T^j = ∫ f dµ,   (9.5.1)

where the convergence is in L^2.

Proof. If T is ergodic then I is the trivial σ-algebra N consisting of sets of measure 0 and 1. If f ∈ L^2(X, B, µ) then E(f | N) = ∫ f dµ. ✷


Remark. The meaning of convergence in (9.5.1) is that

lim_{n→∞} ‖ (1/n) Σ_{j=0}^{n-1} f ∘ T^j − ∫ f dµ ‖_2 = 0,

i.e.

lim_{n→∞} ( ∫ | (1/n) Σ_{j=0}^{n-1} f(T^j x) − ∫ f dµ |^2 dµ )^{1/2} = 0.

9.6 Proof of von Neumann's Ergodic Theorem §
None of this section is examinable—it is included for people who like hard-core analysis! We prove von Neumann's Ergodic Theorem in the case where T is invertible.
In order to prove von Neumann's Ergodic Theorem, it is useful to recast it in terms of linear analysis.

Theorem 9.6.1 (von Neumann's Ergodic Theorem for Operators)
Let U be a unitary operator of a complex Hilbert space H. Let I = {v ∈ H | Uv = v} be the closed subspace of U-invariant vectors and let P_I : H → I be the orthogonal projection onto I. Then for all v ∈ H we have

lim_{n→∞} (1/n) Σ_{j=0}^{n-1} U^j v = P_I v   (9.6.1)

in the norm induced on H by the inner product.

Proof of Theorem 9.6.1. Denote the inner product and norm on by , and , H h· ·i k · k respectively. First note that if v I then (9.6.1) holds, as ∈ n−1 1 U jv = v = P v. n I Xj=0 If v = Uw w for some w H then − ∈ n−1 1 1 1 U jv = U nw w 2 w 0 n nk − k≤ n k k→ j=0 X as n . If we let C denote the norm-closure of the subspace Uw w w H then it →∞ { − | ∈ } follows that n−1 1 lim U jv = 0 n→∞ n Xj=0 for all v C, by approximation. ∈ We claim that H = I C, an orthogonal decomposition. Suppose that v C. Then ⊕ ⊥ v,Uw w = 0 for all w H. Hence U ∗v, w = v, w for all w H. Hence U ∗v = v. h − i ∈ h i h i ∈ As U is unitary, we have that U ∗ = U −1. Hence v = Uv, so that v I. Reversing each ∈ implication we see that v I implies v C, and the claim follows. ✷ ∈ ⊥


Remark. Note that an isometry of a Hilbert space H is a linear operator U such that Uv,Uw = v, w for all v, w H. We say that U is unitary if, in addition, it is invertible. h i h i ∈ Equivalently, U is unitary if the dual operator U ∗ is the inverse of U: U ∗U = UU ∗ = id.

We can prove von Neumann’s Ergodic Theorem for an invertible measure-preserving transformation T of a probability space (X, ,µ) as follows. Recall that L2(X, ,µ) is a B B Hilbert space with respect to the inner product

f, g = fg¯ dµ h i Z and that T induces a linear operator U : L2(X, ,µ) L2(X, ,µ) by Uf = f T . As T B → B ◦ is measure-preserving, we have that U is an isometry; if T is invertible then U is unitary. Let P : L2(X, ,µ) L2(X, ,µ) denote the orthogonal projection onto the subspace I B → I of T -invariant functions. One can easily check (see Exercise 9.6) that P f = E(f ). I |I Hence, when T is invertible, Theorem 9.5.1 follows immediately from Theorem 9.6.1. One can deduce from Theorem 9.6.1 that the result continues to hold when U is an isometry and is not assumed to be invertible.

9.7 Exercises §
Exercise 9.1
Construct an example to show that Poincaré's Recurrence Theorem does not hold on infinite measure spaces. That is, find a measure space (X, B, µ) with µ(X) = ∞ and a measure-preserving transformation T : X → X such that the conclusion of Poincaré's Recurrence Theorem does not hold.

Exercise 9.2
Poincaré's Recurrence Theorem says that, if we have a measure-preserving transformation T of a probability space (X, B, µ) and a set A ∈ B with µ(A) > 0, then, if we start iterating a typical point x ∈ A, the orbit of x will return to A infinitely often.
Construct an example to show that if we have a measure-preserving transformation T of a probability space (X, B, µ) and two sets A, B ∈ B with µ(A), µ(B) > 0, then, if we start iterating a typical point x ∈ A, the orbit of x does not necessarily visit B infinitely often.

Exercise 9.3
(i) Prove that f ↦ E(f | A) is linear.

(ii) Suppose that T is a measure-preserving transformation. Show that E(f | A) ∘ T = E(f ∘ T | T^{-1}A).

(iii) Show that E(f | B) = f.

(iv) Let N denote the trivial σ-algebra consisting of all sets of measure 0 and 1. Show that a function f is N-measurable if and only if it is constant a.e. Show that E(f | N) = ∫ f dµ.

Exercise 9.4
Let (X, B, µ) be a probability space.


(i) Let α = {A_1, ..., A_n}, A_j ∈ B, be a finite partition of X. (By a partition we mean that X = ⋃_{j=1}^{n} A_j and A_i ∩ A_j = ∅ if i ≠ j.) Let A denote the set of all finite unions of sets in α. Check that A is a σ-algebra.

(ii) Show that g : X → R is A-measurable if and only if g is constant on each A_j, i.e.

g(x) = Σ_{j=1}^{n} c_j χ_{A_j}(x).

(iii) Let f ∈ L^1(X, B, µ). Show that

E(f | A)(x) = Σ_{j=1}^{n} χ_{A_j}(x) ( ∫_{A_j} f dµ ) / µ(A_j).

Thus E(f | A) is the best approximation to f that is constant on sets in the partition α.

Exercise 9.5
Prove that I is a σ-algebra.

Exercise 9.6
Let T be a measure-preserving transformation of the probability space (X, B, µ) and let I denote the sub-σ-algebra of T-invariant sets. Let P_I : L^2(X, B, µ) → L^2(X, I, µ) denote the orthogonal projection onto the subspace of T-invariant functions. Prove that P_I f = E(f | I) for all f ∈ L^2(X, B, µ).


10. Birkhoff’s Ergodic Theorem

10.1 Birkhoff's Ergodic Theorem §
Birkhoff's Ergodic Theorem deals with the behaviour of (1/n) Σ_{j=0}^{n-1} f(T^j x) for µ-a.e. x ∈ X and for f ∈ L^1(X, B, µ).

Theorem 10.1.1 (Birkhoff's Ergodic Theorem)
Let (X, B, µ) be a probability space and let T : X → X be a measure-preserving transformation. Let I denote the σ-algebra of T-invariant sets. Then for every f ∈ L^1(X, B, µ), we have

lim_{n→∞} (1/n) Σ_{j=0}^{n-1} f(T^j x) = E(f | I)(x)

for µ-a.e. x ∈ X.

Corollary 10.1.2 (Birkhoff's Ergodic Theorem for an ergodic transformation)
Let (X, B, µ) be a probability space and let T : X → X be an ergodic measure-preserving transformation. Let f ∈ L^1(X, B, µ). Then

lim_{n→∞} (1/n) Σ_{j=0}^{n-1} f(T^j x) = ∫ f dµ

for µ-a.e. x ∈ X.

10.2 Consequences of, and criteria for, ergodicity §
Here we give some simple corollaries of Birkhoff's Ergodic Theorem. The first result says that, for a typical orbit of an ergodic dynamical system, 'time averages' equal 'space averages'.

Corollary 10.2.1
Let T be an ergodic measure-preserving transformation of the probability space (X, B, µ). Suppose that B ∈ B. Then for µ-a.e. x ∈ X, the frequency with which the orbit of x lies in B is given by µ(B), i.e.

lim_{n→∞} (1/n) card{ j ∈ {0, 1, ..., n − 1} | T^j x ∈ B } = µ(B) µ-a.e.

Proof. Apply the Birkhoff Ergodic Theorem with f = χB. ✷
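Corollary 10.2.1 is easy to observe numerically for a system where we know Lebesgue measure is ergodic. The following Python sketch uses the irrational rotation T(x) = x + α mod 1 and B = [0, 1/4) (illustrative choices): the fraction of time the orbit spends in B approaches µ(B) = 1/4.

```python
import numpy as np

# Frequency of visits for the irrational rotation T(x) = x + alpha mod 1, for
# which Lebesgue measure is ergodic.  Here B = [0, 1/4), so the frequency should
# approach mu(B) = 1/4.  alpha and the starting point are illustrative.
alpha = np.sqrt(2) - 1
x, n, hits = 0.3, 100_000, 0
for _ in range(n):
    if x < 0.25:                  # chi_B(T^j x)
        hits += 1
    x = (x + alpha) % 1.0

print(hits / n)                   # approximately 0.25
```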

It is possible to characterise ergodicity in terms of the behaviour of the iterated pre-images of sets, rather than of the iterates of points, under the dynamics. The next result deals with this.


Proposition 10.2.2
Let (X, B, µ) be a probability space and let T : X → X be a measure-preserving transformation. The following are equivalent:

(i) T is ergodic;

(ii) for all A, B ∈ B,

lim_{n→∞} (1/n) Σ_{j=0}^{n-1} µ(T^{-j}A ∩ B) = µ(A)µ(B).

Proof. We prove that (i) implies (ii). Suppose that T is ergodic. Since χ_A ∈ L^1(X, B, µ), Birkhoff's Ergodic Theorem tells us that

n−1 1 j lim χA(T x)= µ(A) n→∞ n Xj=0 for µ-a.e. x X. Multiplying both sides by χ (x) gives ∈ B n−1 1 j lim χA(T x) χB(x)= µ(A)χB (x) n→∞ n Xj=0 for µ-a.e. x X. Since the left-hand side is bounded (by 1), we can apply the Dominated ∈ Convergence Theorem (Theorem 3.1.3) to see that

n−1 n−1 n−1 1 −j 1 j 1 j lim µ(T A B) = lim χA T χB dµ = lim χA T χB dµ = µ(A)µ(B). n→∞ n ∩ n→∞ n ◦ n→∞ n ◦ Xj=0 Xj=0 Z Z Xj=0 We prove that (ii) implies (i). Now suppose that the convergence holds. Suppose that T −1B = B and take A = B. Then µ(T −jA B)= µ(B) so ∩ n−1 1 lim µ(B)= µ(B)2. n→∞ n Xj=0 This gives µ(B)= µ(B)2. Therefore µ(B)=0 or 1 and so T is ergodic. ✷

10.3 Kac's Lemma §
Poincaré's Recurrence Theorem tells us that, under a measure-preserving transformation, almost every point of a subset A of positive measure will return to A. However, it does not tell us how long we should have to wait for this to happen. One would expect that return times to sets of large measure are small, and that return times to sets of small measure are large. This is indeed the case, and forms the content of Kac's Lemma.
Let T : X → X be a measure-preserving transformation of a probability space (X, B, µ) and let A ⊂ X be a measurable subset with µ(A) > 0. By Poincaré's Recurrence Theorem, the integer

n_A(x) = inf{ n ≥ 1 | T^n(x) ∈ A }

is defined for a.e. x ∈ A.


Theorem 10.3.1 (Kac's Lemma)
Let T be an ergodic measure-preserving transformation of the probability space (X, B, µ). Let A ∈ B be such that µ(A) > 0. Then

∫_A n_A dµ = 1.

Proof. Let

A_n = A ∩ T^{-1}A^c ∩ ··· ∩ T^{-(n-1)}A^c ∩ T^{-n}A.

Then A_n consists of those points in A that return to A after exactly n iterations of T, i.e.

A_n = {x ∈ A | n_A(x) = n}.

Consider the illustration in Figure 10.3.1.

[Figure 10.3.1: The return times to A.]

As T is ergodic, almost every point of X eventually enters A. Hence the diagram represents a decomposition of X. Note that the column above A_n in the diagram consists of n sets, A_{n,0}, ..., A_{n,n-1} say, with A_{n,0} = A_n. Note that T^{-k}A_{n,k} = A_n. As T is measure-preserving, it follows that µ(A_{n,k}) = µ(A_n) for k = 0, ..., n − 1. Hence

1 = µ(X) = Σ_{n=1}^{∞} Σ_{k=0}^{n-1} µ(A_{n,k}) = Σ_{n=1}^{∞} n µ(A_n) = Σ_{n=1}^{∞} ∫_{A_n} n_A dµ = ∫_A n_A dµ.
✷

Remark. Let A be as in the statement of Kac's Lemma (Theorem 10.3.1). Define a probability measure µ_A on A by µ_A = µ/µ(A), so that µ_A(A) = 1. Then Kac's Lemma says that

∫_A n_A dµ_A = 1/µ(A),

i.e. the expected return time of a point in A to the set A is 1/µ(A).
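This interpretation can be checked by simulation. The following Python sketch uses an irrational rotation with Lebesgue measure (an illustrative ergodic choice) and A = [0, 0.1): along a long orbit the average gap between successive visits to A is close to 1/µ(A) = 10.

```python
import numpy as np

# Return times for the irrational rotation with Lebesgue measure (ergodic) and
# A = [0, 0.1).  Along a long orbit, the average gap between successive visits
# to A should be close to 1/mu(A) = 10.  alpha and the start point are illustrative.
alpha = np.sqrt(2) - 1
x, steps, visits = 0.05, 1_000_000, 0
for _ in range(steps):
    x = (x + alpha) % 1.0
    if x < 0.1:
        visits += 1

print(steps / visits)             # approximately 10
```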

10.4 Ehrenfests' example §
The following example, due to P. and T. Ehrenfest, demonstrates that the return times in Poincaré's Recurrence Theorem may be extremely large.
Consider two urns. One urn contains 100 balls, numbered 1 to 100, and the other urn is empty. We also have a random number generator: this could be a bag containing 100 slips of paper, numbered 1 to 100. Each second, a slip of paper is drawn from the bag, the number is noted, and the slip of paper is returned to the bag. The ball bearing that number is then moved from whichever urn it is currently in to the other urn.
Naively, we would expect that the system will settle into an equilibrium state in which there are 50 balls in each urn. Of course, there will continue to be small random fluctuations about the 50-50 distribution. However, it would appear highly unlikely for the system to return to the state in which 100 balls are in the first urn. Nevertheless, the Poincaré Recurrence Theorem tells us that this situation will occur and Kac's Lemma tells us how long we should expect to wait.
To see this, we represent the system as a shift on 101 symbols with an appropriate Markov measure. Regard x_j ∈ {0, ..., 100} as being the number of balls in the first urn after j seconds. Hence a sequence (x_j)_{j=0}^{∞} records the number of balls in the first urn at each time. Let

Σ = {x = (x_j)_{j=0}^{∞} | x_j ∈ {0, 1, ..., 100}}.

Let p(i) denote the probability of there being i balls in the first urn. This is equal to the number of possible ways of choosing i balls from 100, divided by the total number of ways of distributing 100 balls across the 2 urns. There are C(100, i) ways of choosing i balls from 100 balls. As there are 2 possible urns for each ball to be in, there are 2^100 possible arrangements of all the balls. Hence the probability of there being i balls in the first urn is

p(i) = C(100, i) / 2^100.

If we have i balls in the first urn then at the next stage we must have either i − 1 or i + 1 balls in the first urn. The number of balls becomes i − 1 if the random number chosen is equal to the number of one of the balls in the first urn. As there are currently i such balls, the probability of this happening is i/100. Hence the probability P(i, i − 1) that there are i − 1 balls remaining, given that we started with i balls in the first urn, is i/100. Similarly, the probability P(i, i + 1) that there are i + 1 balls in the first urn, given that we started with i balls, is (100 − i)/100. If j ≠ i − 1, i + 1 then we cannot have j balls in the first urn given that we started with i balls; thus P(i, j) = 0. This defines a stochastic matrix:

P = ( 0       1       0       0       0       ···
      1/100   0       99/100  0       0       ···
      0       2/100   0       98/100  0       ···
      0       0       3/100   0       97/100  ···
      ⋮       ⋮       ⋮       ⋮       ⋮       ⋱ ).

It is straightforward to check that pP = p. Hence we have a Markov probability measure µ_P defined on Σ. The matrix P is irreducible (but is not aperiodic); this ensures that µ_P is ergodic.
Consider the cylinder A = [100] of length 1. This represents there being 100 balls in the first urn. By Poincaré's Recurrence Theorem, if we start in A then we return to A infinitely often. Thus, with probability 1, we will return to the situation where all 100 balls have returned to the first urn—and this will happen infinitely often!
We can use Kac's Lemma to calculate the expected amount of time we will have to wait until all the balls first return to the first urn. By Kac's Lemma, the expected first return time to A is

1/µ_P(A) = 2^100 seconds,

which is about 4 × 10^22 years, or about 3 × 10^12 times the length of time that the Universe has so far existed! (This system, with 4 balls rather than 100, was also studied in Exercise 3.11.)
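A small-scale simulation makes the role of Kac's Lemma here concrete. The following Python sketch runs the urn model with 4 balls instead of 100 (as in Exercise 3.11); Kac's Lemma predicts an expected first-return time of 2^4 = 16 to the state in which all 4 balls are in the first urn, and the empirical average of the return times is close to this.

```python
import numpy as np

# Ehrenfest urn with 4 balls (cf. Exercise 3.11).  Kac's Lemma predicts an
# expected first-return time of 2^4 = 16 to the state "all 4 balls in urn one".
rng = np.random.default_rng(1)
N_BALLS, trials, total = 4, 20_000, 0
for _ in range(trials):
    state, t = N_BALLS, 0                     # number of balls in the first urn
    while True:
        t += 1
        # a ball is chosen uniformly; with probability state/N_BALLS it lies in
        # the first urn and moves out, otherwise a ball moves in
        if rng.random() < state / N_BALLS:
            state -= 1
        else:
            state += 1
        if state == N_BALLS:
            break
    total += t

print(total / trials)                         # close to 16
```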

10.5 Proof of Birkhoff’s Ergodic Theorem § None of this section is examinable—it is included for people who like hard-core ε-δ analysis! The proof is based on the following inequality.

Theorem 10.5.1 (Maximal Inequality)
Let (X, B, µ) be a probability space, let T : X → X be a measure-preserving transformation and let f ∈ L^1(X, B, µ). Define f_0 = 0 and, for n ≥ 1,

f_n = f + f ∘ T + ··· + f ∘ T^{n-1}.

For n ≥ 1, set F_n(x) = max_{0≤j≤n} f_j(x), so that F_n(x) ≥ 0. Then

∫_{{x∈X | F_n(x)>0}} f dµ ≥ 0.

If Fn(x) > 0 then max fj(x)= max fj(x)= Fn(x), 1≤j≤n 0≤j≤n so we obtain that f F F T ≥ n − n ◦ on the set A = x F (x) > 0 . { | n }


Hence

f dµ F dµ F T dµ ≥ n − n ◦ ZA ZA ZA = F dµ F T dµ as F =0 on X A n − n ◦ n \ ZX ZA F dµ F T dµ as F T 0 ≥ n − n ◦ n ◦ ≥ ZX ZX = 0as µ is T -invariant. ✷

Corollary 10.5.2
Let $g \in L^1(X,\mathcal{B},\mu)$ and let
\[ M_\alpha = \left\{ x \in X \;\Big|\; \sup_{n \geq 1} \frac{1}{n} \sum_{j=0}^{n-1} g(T^j x) > \alpha \right\}. \]
Then for all $B \in \mathcal{B}$ with $T^{-1}B = B$ we have that
\[ \int_{M_\alpha \cap B} g \, d\mu \geq \alpha \mu(M_\alpha \cap B). \]
Proof. Suppose first that $B = X$. Let $f = g - \alpha$. Then
\[ M_\alpha = \bigcup_{n=1}^{\infty} \left\{ x \;\Big|\; \sum_{j=0}^{n-1} g(T^j x) > n\alpha \right\} = \bigcup_{n=1}^{\infty} \{ x \mid f_n(x) > 0 \} = \bigcup_{n=1}^{\infty} \{ x \mid F_n(x) > 0 \} \]
(since $f_n(x) > 0 \Rightarrow F_n(x) > 0$ and $F_n(x) > 0 \Rightarrow f_j(x) > 0$ for some $1 \leq j \leq n$). Write $C_n = \{ x \mid F_n(x) > 0 \}$ and observe that $C_n \subset C_{n+1}$. Thus $\chi_{C_n}$ converges to $\chi_{M_\alpha}$ and so $f\chi_{C_n}$ converges to $f\chi_{M_\alpha}$, as $n \to \infty$. Furthermore, $|f\chi_{C_n}| \leq |f|$. Hence, by the Dominated Convergence Theorem,

\[ \int_{C_n} f \, d\mu = \int_X f\chi_{C_n} \, d\mu \to \int_X f\chi_{M_\alpha} \, d\mu = \int_{M_\alpha} f \, d\mu, \quad \text{as } n \to \infty. \]
Applying the Maximal Inequality, we have for all $n \geq 1$ that $\int_{C_n} f \, d\mu \geq 0$. Therefore $\int_{M_\alpha} f \, d\mu \geq 0$, i.e. $\int_{M_\alpha} g \, d\mu \geq \alpha\mu(M_\alpha)$.
For the general case, we work with the restriction of $T$ to $B$, $T|_B : B \to B$, and apply the Maximal Inequality on this subset to get

\[ \int_{M_\alpha \cap B} g \, d\mu \geq \alpha \mu(M_\alpha \cap B), \]
as required. $\Box$
We will also need the following convergence result.

Proposition 10.5.3 (Fatou's Lemma)
Let $(X,\mathcal{B},\mu)$ be a probability space and suppose that $f_n : X \to \mathbb{R}$ are non-negative measurable functions. Define $f(x) = \liminf_{n\to\infty} f_n(x)$. Then $f$ is measurable and

\[ \int f \, d\mu \leq \liminf_{n\to\infty} \int f_n \, d\mu \]
(one or both of these expressions may be infinite).


Proof of Birkhoff’s Ergodic Theorem. Let

\[ f^*(x) = \limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x), \qquad f_*(x) = \liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x). \]
These exist (but may be $\pm\infty$, respectively) at all points $x \in X$. Clearly $f_*(x) \leq f^*(x)$. Let
\[ a_n(x) = \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x). \]
Observe that
\[ \frac{n+1}{n}\, a_{n+1}(x) = a_n(Tx) + \frac{1}{n} f(x). \]
As $f$ is finite $\mu$-a.e., we have that $f(x)/n \to 0$ $\mu$-a.e. as $n \to \infty$. Hence, taking the lim sup and lim inf as $n \to \infty$, gives us that $f^* \circ T = f^*$ $\mu$-a.e. and $f_* \circ T = f_*$ $\mu$-a.e.
We have to show

(i) $f^* = f_*$ $\mu$-a.e.;
(ii) $f^* \in L^1(X,\mathcal{B},\mu)$;
(iii) $\int f^* \, d\mu = \int f \, d\mu$.

We prove (i). For $\alpha, \beta \in \mathbb{R}$, define
\[ E_{\alpha,\beta} = \{ x \in X \mid f_*(x) < \beta \text{ and } f^*(x) > \alpha \}. \]
Since $f^*$ and $f_*$ are $T$-invariant, $T^{-1}E_{\alpha,\beta} = E_{\alpha,\beta}$. Note that
\[ E_{\alpha,\beta} \subset \left\{ x \in X \;\Big|\; \sup_{n\geq 1} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) > \alpha \right\} = M_\alpha, \]
so that $E_{\alpha,\beta} \cap M_\alpha = E_{\alpha,\beta}$.
Applying Corollary 10.5.2 we have that

\[ \int_{E_{\alpha,\beta}} f \, d\mu = \int_{E_{\alpha,\beta} \cap M_\alpha} f \, d\mu \geq \alpha\mu(E_{\alpha,\beta} \cap M_\alpha) = \alpha\mu(E_{\alpha,\beta}). \]
Replacing $f$, $\alpha$ and $\beta$ by $-f$, $-\beta$ and $-\alpha$, and using the fact that $(-f)^* = -f_*$ and $(-f)_* = -f^*$, we also get
\[ \int_{E_{\alpha,\beta}} f \, d\mu \leq \beta\mu(E_{\alpha,\beta}). \]
Therefore
\[ \alpha\mu(E_{\alpha,\beta}) \leq \beta\mu(E_{\alpha,\beta}) \]


and, since $\beta < \alpha$, this shows that $\mu(E_{\alpha,\beta}) = 0$. As the set $\{x \mid f_*(x) < f^*(x)\}$ is the union of the sets $E_{\alpha,\beta}$ over rational pairs $\beta < \alpha$, it follows that $f^* = f_*$ $\mu$-a.e. and

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = f^*(x) \quad \mu\text{-a.e.} \]
We prove (ii). Let
\[ g_n(x) = \left| \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) \right|. \]

Then $g_n \geq 0$ and
\[ \int g_n \, d\mu \leq \int |f| \, d\mu, \]
so we can apply Fatou's Lemma (Proposition 10.5.3) to conclude that $\lim_{n\to\infty} g_n = |f^*|$ is integrable, i.e. that $f^* \in L^1(X,\mathcal{B},\mu)$.
We prove (iii). For $n \in \mathbb{N}$ and $k \in \mathbb{Z}$, define
\[ D_k^n = \left\{ x \in X \;\Big|\; \frac{k}{n} \leq f^*(x) < \frac{k+1}{n} \right\}. \]
For every $\varepsilon > 0$, we have that
\[ D_k^n \cap M_{\frac{k}{n}-\varepsilon} = D_k^n. \]
Since $T^{-1}D_k^n = D_k^n$, we can apply Corollary 10.5.2 again to obtain

\[ \int_{D_k^n} f \, d\mu \geq \left( \frac{k}{n} - \varepsilon \right) \mu(D_k^n). \]
Since $\varepsilon > 0$ is arbitrary, we have

\[ \int_{D_k^n} f \, d\mu \geq \frac{k}{n}\, \mu(D_k^n). \]
Thus
\[ \int_{D_k^n} f^* \, d\mu \leq \frac{k+1}{n}\, \mu(D_k^n) \leq \frac{1}{n}\mu(D_k^n) + \int_{D_k^n} f \, d\mu \]
(where the first inequality follows from the definition of $D_k^n$). Since

\[ X = \bigcup_{k\in\mathbb{Z}} D_k^n \]
(a disjoint union), summing over $k \in \mathbb{Z}$ gives
\[ \int_X f^* \, d\mu \leq \frac{1}{n}\mu(X) + \int_X f \, d\mu = \frac{1}{n} + \int_X f \, d\mu. \]
Since this holds for all $n \geq 1$, we obtain
\[ \int_X f^* \, d\mu \leq \int_X f \, d\mu. \]

Applying the same argument to $-f$ gives
\[ \int (-f)^* \, d\mu \leq \int -f \, d\mu, \]
so that
\[ \int f^* \, d\mu = \int f_* \, d\mu \geq \int f \, d\mu. \]
Therefore
\[ \int f^* \, d\mu = \int f \, d\mu, \]
as required.
Finally, we prove that $f^* = E(f \mid \mathcal{I})$. First note that, as $f^*$ is $T$-invariant, it is measurable with respect to $\mathcal{I}$. Moreover, if $I$ is any $T$-invariant set then
\[ \int_I f \, d\mu = \int_I f^* \, d\mu. \]
Hence $f^* = E(f \mid \mathcal{I})$. $\Box$
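The convergence asserted by Birkhoff's Ergodic Theorem is easy to see numerically. The sketch below (not part of the notes) takes the doubling map $T(x) = 2x \bmod 1$ with Lebesgue measure; since iterating $T$ in floating point quickly loses accuracy, it instead chooses the binary digits of $x$ at random (which samples a Lebesgue-typical point) and uses the fact that $T$ acts as the shift on those digits. The helper name `point` is just for this illustration.

# Birkhoff average (1/n) sum_{j<n} f(T^j x) for the doubling map, with f(x) = x^2,
# sampled at a Lebesgue-typical point given by random binary digits.
import random

random.seed(0)
n = 100_000
digits = [random.randint(0, 1) for _ in range(n + 60)]   # binary digits of x

def point(j, bits=53):
    """Approximate T^j(x) = 0.d_j d_{j+1}... in base 2 using `bits` digits."""
    return sum(d / 2 ** (i + 1) for i, d in enumerate(digits[j:j + bits]))

average = sum(point(j) ** 2 for j in range(n)) / n   # Birkhoff average of x^2
print(average, 1 / 3)                                 # integral of x^2 over [0,1] is 1/3

The printed average should be close to 1/3, the space average of $f(x) = x^2$.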

10.6 Exercises §
Exercise 10.1
Suppose that $T$ is an ergodic measure-preserving transformation of the probability space $(X,\mathcal{B},\mu)$ and suppose that $f \in L^1(X,\mathcal{B},\mu)$. Prove that
\[ \lim_{n\to\infty} \frac{f(T^n x)}{n} = 0 \quad \mu\text{-a.e.} \]

Exercise 10.2
Deduce from Birkhoff's Ergodic Theorem that if $T$ is an ergodic measure-preserving transformation of a probability space $(X,\mathcal{B},\mu)$ and $f \geq 0$ is measurable but $\int f \, d\mu = \infty$ then
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \infty \quad \mu\text{-a.e.} \]
(Hint: define $f_M = \min\{f, M\}$ and note that $f_M \in L^1(X,\mathcal{B},\mu)$. Apply Birkhoff's Ergodic Theorem to each $f_M$.)

Exercise 10.3
Let $T$ be a measure-preserving transformation of the probability space $(X,\mathcal{B},\mu)$. Prove that the following are equivalent:

(i) T is ergodic with respect to µ,

(ii) for all $f, g \in L^2(X,\mathcal{B},\mu)$ we have that
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x)\, g(x) \, d\mu = \int f \, d\mu \int g \, d\mu. \]


Exercise 10.4
Let $X$ be a compact metric space equipped with the Borel $\sigma$-algebra $\mathcal{B}$ and let $T : X \to X$ be continuous. Suppose that $\mu \in M(X)$ is an ergodic measure.
Prove that there exists a set $Y \in \mathcal{B}$ with $\mu(Y) = 1$ such that
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \int f \, d\mu \]
for all $x \in Y$ and for all $f \in C(X,\mathbb{R})$.
(Thus, in the special case of a continuous transformation of a compact metric space and continuous functions $f$, the set of full measure for which Corollary 10.1.2 holds can be chosen to be independent of the function $f$.)

Exercise 10.5
A popular illustration of recurrence concerns a monkey typing the complete works of Shakespeare on a typewriter. Here we study this from an ergodic-theoretic viewpoint.
Imagine a(n idealised) monkey typing on a typewriter. Each second he types one letter, and each letter occurs with equal probability (independently of the preceding letter). Suppose that the keyboard has 26 keys (so no space bar, carriage return, numbers, etc).
Show how to model this using a shift space on 26 symbols with an appropriate Bernoulli measure. Use Birkhoff's Ergodic Theorem to show that the monkey must, with probability 1, eventually type the word 'MONKEY'. Use Kac's Lemma to calculate the expected time it would take for the monkey to first type 'MONKEY'.
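As a sense of the scale involved (not a substitute for the exercise, which asks you to set the model up properly), the following sketch assumes the uniform Bernoulli model on 26 symbols, under which the cylinder for a fixed 6-letter word has measure $(1/26)^6$, so Kac's Lemma predicts an expected first-occurrence time of $26^6$ seconds.

# Expected time to first type a fixed 6-letter word, under the uniform
# Bernoulli(1/26,...,1/26) model, via Kac's Lemma: 1 / (1/26)^6 = 26^6 seconds.
expected_seconds = 26 ** 6
print(expected_seconds)                              # 308,915,776 seconds
print(expected_seconds / (60 * 60 * 24 * 365.25))    # roughly 9.8 years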


11. Applications of Birkhoff’s Ergodic Theorem

11.1 Introduction § We will show how to use Birkhoff’s Ergodic Theorem to prove some interesting results in number theory.

11.2 Normal and simply normal numbers §
Recall that any number $x \in [0,1]$ can be written as a decimal
\[ x = \cdot x_0 x_1 x_2 \ldots = \sum_{j=0}^{\infty} \frac{x_j}{10^{j+1}} \]
where $x_j \in \{0,1,\ldots,9\}$. This decimal expansion is unique unless the decimal expansion ends in either infinitely repeated 0s or infinitely repeated 9s.
More generally, given any integer base $b \geq 2$, we can write $x \in [0,1]$ as a base $b$ expansion:
\[ x = \cdot x_0 x_1 x_2 \ldots = \sum_{j=0}^{\infty} \frac{x_j}{b^{j+1}} \]
where $x_j \in \{0,1,\ldots,b-1\}$. This expansion is unique unless it ends in either infinitely repeated 0s or infinitely repeated $(b-1)$s. In what follows, if $x$ has two expansions in base $b$ then we always choose the expansion that ends in infinitely repeated 0s; note that this is a countable set and so has Lebesgue measure zero.

Definition. A number $x \in [0,1]$ is said to be simply normal in base $b$ if, for each $k = 0,1,\ldots,b-1$, the frequency with which digit $k$ occurs in the base $b$ expansion of $x$ is equal to $1/b$.

Remarks. 1. Thus a number is simply normal in base b if all of the b possible digits in its base b expansion are equally likely to occur.

2. It is straightforward to construct examples of simply normal numbers in a given base. For example,
\[ x = \cdot\, 012\cdots 89\; 012\cdots 89 \,\cdots \tag{11.2.1} \]
consisting of the block of decimal digits $012\cdots 89$ infinitely repeated, is simply normal in base 10. If a number is simply normal in one base then it need not be simply normal in any other base. For example, $x$ as defined in (11.2.1) is not simply normal in base $10^{10}$.


Fix $b \geq 2$. Define the map $T_b : [0,1] \to [0,1]$ by $T(x) = T_b(x) = bx \bmod 1$. It is easy to see, by following any of the arguments we have seen for the doubling map, that Lebesgue measure $\mu$ on $[0,1]$ is an ergodic invariant measure for $T_b$.
There is a close connection between the map $T_b$ and base $b$ expansions. Note that if $x \in [0,1]$ has base $b$ expansion
\[ x = \sum_{j=0}^{\infty} \frac{x_j}{b^{j+1}} = \cdot x_0 x_1 x_2 \cdots \]
then
\[ T_b(x) = b\sum_{j=0}^{\infty} \frac{x_j}{b^{j+1}} \bmod 1 = \left( x_0 + \sum_{j=1}^{\infty} \frac{x_j}{b^{j}} \right) \bmod 1 = \sum_{j=0}^{\infty} \frac{x_{j+1}}{b^{j+1}} = \cdot x_1 x_2 x_3 \cdots. \]
Thus $T_b$ acts on base $b$ expansions by deleting the zeroth term and then shifting the remaining digits one place to the left. This relationship between base $b$ expansions and the map $T_b$ can be used to prove the following result.

Proposition 11.2.1
Let $b \geq 2$. Then Lebesgue almost every real number in $[0,1]$ is simply normal in base $b$.

Proof. Fix $k \in \{0,1,\ldots,b-1\}$. Note that $x_0 = k$ if and only if $x \in [k/b,(k+1)/b)$. Hence $x_j = k$ if and only if $T_b^j(x) \in [k/b,(k+1)/b)$. Thus
\[ \frac{1}{n} \operatorname{card}\{ 0 \leq j \leq n-1 \mid x_j = k \} = \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[k/b,(k+1)/b)}(T_b^j x). \tag{11.2.2} \]
By Birkhoff's Ergodic Theorem, for Lebesgue almost every point $x$ the above expression converges to $\int \chi_{[k/b,(k+1)/b)}(x)\,dx = 1/b$. Let $X_b(k)$ denote the set of points $x \in [0,1]$ for which (11.2.2) converges. Then $\mu(X_b(k)) = 1$ for each $k = 0,1,\ldots,b-1$.
Let $X_b = \bigcap_{k=0}^{b-1} X_b(k)$. Let $\mu$ denote Lebesgue measure. Then $\mu(X_b) = 1$. If $x \in X_b$ then the frequency with which each digit $k \in \{0,1,\ldots,b-1\}$ occurs in the base $b$ expansion of $x$ is equal to $1/b$, i.e. $x$ is simply normal in base $b$. $\Box$
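A quick numerical sanity check of Proposition 11.2.1 (a sketch, not part of the notes): choosing the base-$b$ digits of $x$ uniformly at random samples a Lebesgue-typical point, and every digit should then occur with frequency close to $1/b$. The base $b = 7$ below is an arbitrary choice.

# Empirical digit frequencies in base b for a "typical" point, compared with 1/b.
import random
from collections import Counter

random.seed(1)
b, n = 7, 200_000
digits = [random.randrange(b) for _ in range(n)]   # base-b digits of a typical x
freq = Counter(digits)
for k in range(b):
    print(k, freq[k] / n, 1 / b)                   # empirical frequency vs 1/b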

We can consider a more general notion of normality of numbers as follows. Take $x \in [0,1]$ and write $x$ as a base $b$ expansion
\[ x = \cdot x_0 x_1 x_2 \ldots = \sum_{j=0}^{\infty} \frac{x_j}{b^{j+1}} \]
where $x_j \in \{0,1,\ldots,b-1\}$. Fix a finite word of symbols $i_0, i_1, \ldots, i_{k-1}$ where $i_j \in \{0,1,\ldots,b-1\}$, $j = 0,\ldots,k-1$. We can ask what is the frequency with which the block of symbols $i_0, i_1, \ldots, i_{k-1}$ occurs in the base $b$ expansion of $x$. We call a block of symbols $i_0, i_1, \ldots, i_{k-1}$ a word of length $k$. Note that $x$ has a base $b$ expansion that starts $i_0 i_1 \cdots i_{k-1}$ precisely when
\[ x \in \left[ \sum_{j=0}^{k-1} \frac{i_j}{b^{j+1}}, \; \sum_{j=0}^{k-1} \frac{i_j}{b^{j+1}} + \frac{1}{b^k} \right). \]
Call this interval $I(i_0,\ldots,i_{k-1})$ and note that it has Lebesgue measure $1/b^k$.


Definition. A number $x \in [0,1]$ is simply normal if it is simply normal in base $b$ for all $b \geq 2$.
Proposition 11.2.2
Lebesgue almost every real number $x \in [0,1]$ is simply normal.
Proof. See Exercise 11.2(i). $\Box$

Definition. A number $x \in [0,1]$ is said to be normal in base $b$ if, for each $k \geq 1$ and for each word $i_0, i_1, \ldots, i_{k-1}$ of length $k$, the frequency with which this word occurs in the base $b$ expansion of $x$ is equal to $1/b^k$.

Proposition 11.2.3
Let $b \geq 2$ be an integer. Lebesgue almost every real number in $[0,1]$ is normal in base $b$.

Proof. Fix a word $i_0, i_1, \ldots, i_{k-1}$ of length $k$ and define the interval $I(i_0,\ldots,i_{k-1})$ as above. Then the word $i_0, i_1, \ldots, i_{k-1}$ occurs at the $j$th place in the base $b$ expansion of $x$ if and only if $T_b^j(x) \in I(i_0,\ldots,i_{k-1})$. Thus
\[ \frac{1}{n} \operatorname{card}\{ 0 \leq j \leq n-1 \mid i_0, i_1, \ldots, i_{k-1} \text{ occurs at the } j\text{th place in the base } b \text{ expansion of } x \} \]

\[ = \frac{1}{n} \sum_{j=0}^{n-1} \chi_{I(i_0,\ldots,i_{k-1})}(T_b^j x). \tag{11.2.3} \]
By Birkhoff's Ergodic Theorem, for Lebesgue almost every point $x$ the above expression converges to $\int \chi_{I(i_0,\ldots,i_{k-1})}(x)\,dx = 1/b^k$. Let $X_b(i_0,i_1,\ldots,i_{k-1})$ denote the set of points $x \in [0,1]$ for which (11.2.3) converges. Let $\mu$ denote Lebesgue measure. Then $\mu(X_b(i_0,i_1,\ldots,i_{k-1})) = 1$ for each word $i_0,i_1,\ldots,i_{k-1}$ of length $k$. Let
\[ X_b = \bigcap_{k=1}^{\infty} \; \bigcap_{i_0,i_1,\ldots,i_{k-1}} X_b(i_0,i_1,\ldots,i_{k-1}) \]
where the second intersection is taken over all words of length $k$. As this is a countable intersection, we have that $\mu(X_b) = 1$. If $x \in X_b$ then the frequency with which any word of length $k$ occurs in the base $b$ expansion of $x$ is equal to $1/b^k$, i.e. $x$ is normal in base $b$. $\Box$
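The same kind of numerical check works for words rather than single digits (again a sketch, not part of the notes). Here the base $b = 3$ and the word $(0,2,1)$ are arbitrary choices; the predicted frequency is $1/b^k = 1/27$.

# Frequency of a fixed word of length k in the base-b expansion of a typical x.
import random

random.seed(2)
b, n, word = 3, 300_000, (0, 2, 1)
digits = [random.randrange(b) for _ in range(n + len(word))]
count = sum(1 for j in range(n) if tuple(digits[j:j + len(word)]) == word)
print(count / n, 1 / b ** len(word))     # empirical frequency vs 1/b^k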

We can then make the following definition.

Definition. A number $x \in [0,1]$ is normal if it is normal in base $b$ for every base $b \geq 2$.
We can now prove the following result:

Proposition 11.2.4
Lebesgue almost every number $x \in [0,1]$ is normal.
Proof. See Exercise 11.2(ii). $\Box$

Remark. Although a ‘typical’ number is normal, it is remarkably difficult to exhibit explicit examples: no familiar constant (such as $\pi$, $e$ or $\sqrt{2}$) is known to be normal.


11.3 Continued fractions § We can use Birkhoff’s Ergodic Theorem to study the frequency with which a given digit occurs in the continued fraction expansion of real numbers.

Proposition 11.3.1
For Lebesgue-almost every $x \in [0,1]$, the frequency with which the natural number $k$ occurs in the continued fraction expansion of $x$ is
\[ \frac{1}{\log 2} \log\left( \frac{(k+1)^2}{k(k+2)} \right). \]
Proof. Let $\lambda$ denote Lebesgue measure and let $\mu$ denote Gauss' measure. Recall that $\lambda$ and $\mu$ are equivalent, i.e. they have the same sets of measure zero. Then $\lambda$-a.e. and $\mu$-a.e. $x \in [0,1]$ is irrational and has an infinite continued fraction expansion
\[ x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}. \tag{11.3.1} \]
Let $T$ denote the continued fraction map. Then
\[ T(x) = \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}} \]
so that
\[ \frac{1}{T(x)} = x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}. \]
Hence $x_1 = [1/T(x)]$, where $[x]$ denotes the integer part of $x$. More generally, $x_n = [1/T^n x]$.
Fix $k \in \mathbb{N}$. Note that $x$ has a continued fraction expansion starting with digit $k$ (i.e. $x_0 = k$) precisely when $[1/x] = k$. That is, $x_0 = k$ precisely when
\[ k \leq \frac{1}{x} < k+1, \]
which is equivalent to requiring
\[ \frac{1}{k+1} < x \leq \frac{1}{k}. \]

Hence $x_j = k$ precisely when $T^j(x) \in (1/(k+1), 1/k]$. By Birkhoff's Ergodic Theorem, the frequency with which the digit $k$ occurs in the continued fraction expansion of $x$ is
\[
\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{(1/(k+1),1/k]}(T^j x)
&= \int \chi_{(1/(k+1),1/k]} \, d\mu && \text{for } \mu\text{-a.e. } x \\
&= \frac{1}{\log 2} \left( \log\left(1 + \frac{1}{k}\right) - \log\left(1 + \frac{1}{k+1}\right) \right) && \text{for } \mu\text{-a.e. } x \\
&= \frac{1}{\log 2} \log\left( \frac{(k+1)^2}{k(k+2)} \right) && \text{for } \mu\text{-a.e. } x.
\end{aligned}
\]


As µ and λ have the same sets of measure zero, this holds for Lebesgue almost every point. ✷
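Proposition 11.3.1 can be checked numerically (a sketch, not part of the notes). To avoid rounding problems when iterating the Gauss map, the sketch takes a random rational with a huge denominator as a stand-in for a typical point and extracts its continued fraction digits exactly with the Euclidean algorithm, then compares digit frequencies with the predicted values.

# Continued fraction digit frequencies of a "typical" x versus the Gauss-measure prediction.
import math, random
from collections import Counter

random.seed(3)
N = 3000
p, q = random.randrange(10 ** N), 10 ** N      # x = p/q, a stand-in for a typical point
digits = []
while p:                                        # continued fraction of p/q via Euclid
    digits.append(q // p)
    p, q = q % p, p
freq = Counter(digits)
for k in range(1, 6):
    predicted = math.log((k + 1) ** 2 / (k * (k + 2))) / math.log(2)
    print(k, freq[k] / len(digits), predicted)

For instance, the digit 1 should appear with frequency about $\log(4/3)/\log 2 \approx 0.415$.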

We can also study the limiting arithmetic and geometric means of the digits in the continued fraction expansion of Lebesgue almost every point $x \in [0,1]$.
Proposition 11.3.2
(i) For Lebesgue-almost every $x \in [0,1]$, the limiting arithmetic mean of the digits in the continued fraction expansion of $x$ is infinite. More specifically, for Lebesgue almost every $x \in [0,1]$ we have
\[ \lim_{n\to\infty} \frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \infty \]
where $x$ has continued fraction expansion given by (11.3.1).

(ii) For Lebesgue-almost every $x \in [0,1]$, the limiting geometric mean of the digits in the continued fraction expansion of $x$ is

\[ \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k^2+2k} \right)^{\log k / \log 2}. \]
More specifically, for Lebesgue almost every $x \in [0,1]$ we have
\[ \lim_{n\to\infty} (x_0 x_1 \cdots x_{n-1})^{1/n} = \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k^2+2k} \right)^{\log k / \log 2} \]
where $x$ has continued fraction expansion given by (11.3.1).

Proof. Writing
\[ x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}, \]
the proposition claims that
\[ \lim_{n\to\infty} \frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \infty \tag{11.3.2} \]
for Lebesgue almost every point, and that

\[ \lim_{n\to\infty} (x_0 x_1 \cdots x_{n-1})^{1/n} = \prod_{k=1}^{\infty} \left( 1 + \frac{1}{k^2+2k} \right)^{\log k / \log 2} \tag{11.3.3} \]
for Lebesgue almost every point. We leave (11.3.2) as an exercise.
We prove (11.3.3). Define $f(x) = \log k$ for $x \in (1/(k+1), 1/k]$, so that $f(x) = \log k$ precisely when $x_0 = k$. Then $f(T^j x) = \log k$ precisely when $x_j = k$. By Exercise 3.5(iii),

to show $f \in L^1(X,\mathcal{B},\mu)$ it is sufficient to show that $f \in L^1(X,\mathcal{B},\lambda)$. Note that
\[ \int f \, d\lambda = \sum_{k=1}^{\infty} \log k \; \lambda\!\left( \left(\frac{1}{k+1}, \frac{1}{k}\right] \right) = \sum_{k=1}^{\infty} \frac{\log k}{k(k+1)} \leq \sum_{k=1}^{\infty} \frac{\log k}{k^2}, \]
which converges. Hence $f \in L^1(X,\mathcal{B},\mu)$.
Now

\[
\begin{aligned}
\lim_{n\to\infty} \log\left( (x_0 x_1 \cdots x_{n-1})^{1/n} \right)
&= \lim_{n\to\infty} \frac{1}{n}(\log x_0 + \log x_1 + \cdots + \log x_{n-1}) \\
&= \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) \\
&= \frac{1}{\log 2} \int_0^1 \frac{f(x)}{1+x} \, dx \\
&= \frac{1}{\log 2} \sum_{k=1}^{\infty} \int_{1/(k+1)}^{1/k} \frac{\log k}{1+x} \, dx \\
&= \sum_{k=1}^{\infty} \frac{\log k}{\log 2} \log\left( 1 + \frac{1}{k^2+2k} \right)
\end{aligned}
\]
as $n \to \infty$, for Gauss-almost every point $x \in [0,1]$. As Gauss' measure and Lebesgue measure have the same sets of measure zero, this limit also exists for Lebesgue almost every point. Exponentiating both sides of the above gives the result. $\Box$
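The limiting geometric mean in Proposition 11.3.2(ii) is approximately 2.685 (Khinchin's constant). The sketch below (not part of the notes) compares the geometric mean of the continued fraction digits of a random high-precision rational, again a stand-in for a typical point, with a partial product of the infinite product above.

# Geometric mean of continued fraction digits versus the infinite product in (11.3.3).
import math, random

random.seed(4)
N = 3000
p, q = random.randrange(10 ** N), 10 ** N
digits = []
while p:                                   # continued fraction digits via Euclid
    digits.append(q // p)
    p, q = q % p, p

geometric_mean = math.exp(sum(math.log(d) for d in digits) / len(digits))
product = math.prod((1 + 1 / (k * k + 2 * k)) ** (math.log(k) / math.log(2))
                    for k in range(1, 100_000))
print(geometric_mean, product)             # both should be roughly 2.68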

Let $x \in [0,1]$ be irrational and have continued fraction expansion $[x_0, x_1, \ldots]$. Then $[x_0, x_1, \ldots, x_{n-1}]$ is a rational number; write $[x_0, x_1, \ldots, x_{n-1}] = P_n(x)/Q_n(x)$, where $P_n(x), Q_n(x)$ are co-prime integers. Then $P_n(x)/Q_n(x)$ is a 'good' rational approximation to $x$. We write $P_n(x), Q_n(x)$ if we wish to indicate the dependence on $x$. As $x$ and $P_n(x)/Q_n(x)$ lie in the same cylinder $I(x_0,\ldots,x_{n-1})$ of rank $n$, we must have that

\[ \left| x - \frac{P_n(x)}{Q_n(x)} \right| \leq \operatorname{diam} I(x_0,\ldots,x_{n-1}) \leq \frac{1}{Q_n(x)^2}, \]

where $\operatorname{diam} I$ denotes the length of the interval $I$. Thus we can quantify how good a rational approximation $P_n(x)/Q_n(x)$ is to $x$ by looking at the denominator $Q_n(x)$. Thus understanding how $Q_n(x)$ grows gives us information about $x$. For a typical point, $Q_n(x)$ grows exponentially fast and we can determine the exponential growth rate.
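The approximation bound above is easy to test on an example (a sketch, not part of the notes). It builds the convergents by the recurrences $P_{n+1} = x_n P_n + P_{n-1}$, $Q_{n+1} = x_n Q_n + Q_{n-1}$ from Exercise 6.3; the sample point is an arbitrary rational chosen only for illustration.

# Check |x - P_n/Q_n| <= 1/Q_n^2 for the convergents of a sample x.
from fractions import Fraction

x = Fraction(355687, 1000000)             # sample point standing in for x
p, q = x.numerator, x.denominator
digits = []
while p:                                   # continued fraction digits via Euclid
    digits.append(q // p)
    p, q = q % p, p

P_prev, P_cur, Q_prev, Q_cur = 0, 1, 1, digits[0]    # P_0, P_1, Q_0, Q_1
for d in digits[1:]:
    P_prev, P_cur = P_cur, d * P_cur + P_prev        # P_{n+1} = x_n P_n + P_{n-1}
    Q_prev, Q_cur = Q_cur, d * Q_cur + Q_prev        # Q_{n+1} = x_n Q_n + Q_{n-1}
    error = abs(x - Fraction(P_cur, Q_cur))
    print(float(error), 1 / Q_cur ** 2, error <= Fraction(1, Q_cur ** 2))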

Proposition 11.3.3
For Lebesgue almost every real number $x \in [0,1]$ we have that
\[ \lim_{n\to\infty} \frac{1}{n} \log Q_n(x) = \frac{\pi^2}{12\log 2}. \]


Remark. Thus, for a typical point $x \in [0,1]$, we have that $Q_n(x) \sim e^{n\pi^2/(12\log 2)}$.
Proof (not examinable). Let $x \in [0,1]$ be irrational and have continued fraction expansion $[x_0, x_1, \ldots]$. Write
\[ [x_0, x_1, \ldots, x_{n-1}] = \frac{P_n(x)}{Q_n(x)}. \]
Then
\[ \frac{P_n(x)}{Q_n(x)} = \frac{1}{x_0 + [x_1,\ldots,x_{n-1}]} = \frac{1}{x_0 + \dfrac{P_{n-1}(Tx)}{Q_{n-1}(Tx)}} = \frac{Q_{n-1}(Tx)}{P_{n-1}(Tx) + x_0 Q_{n-1}(Tx)}. \tag{11.3.4} \]
By Lemma 6.3.1(ii) and the Euclidean algorithm, we know that for all $n$ and all $x$, $P_n(x)$ and $Q_n(x)$ are coprime. As $P_{n-1}(Tx)$ and $Q_{n-1}(Tx)$ are coprime, it follows that $P_{n-1}(Tx) + x_0 Q_{n-1}(Tx)$ and $Q_{n-1}(Tx)$ are coprime. Hence, comparing the numerators in (11.3.4), we see that $P_n(x) = Q_{n-1}(Tx)$. Also note that $P_1(x) = 1$. Hence

\[ \frac{P_n(x)}{Q_n(x)} \cdot \frac{P_{n-1}(Tx)}{Q_{n-1}(Tx)} \cdots \frac{P_1(T^{n-1}x)}{Q_1(T^{n-1}x)} = \frac{P_1(T^{n-1}x)}{Q_n(x)} = \frac{1}{Q_n(x)}. \]
Taking the logarithm and dividing by $n$ gives that

\[ -\frac{1}{n}\log Q_n(x) = \frac{1}{n} \log \prod_{j=0}^{n-1} \frac{P_{n-j}(T^j x)}{Q_{n-j}(T^j x)} = \frac{1}{n} \sum_{j=0}^{n-1} \log \frac{P_{n-j}(T^j x)}{Q_{n-j}(T^j x)}. \tag{11.3.5} \]

This resembles an ergodic sum, except that the function $P_{n-j}/Q_{n-j}$ depends on $j$ and so we cannot immediately apply Birkhoff's Ergodic Theorem. We will consider ergodic sums using the function $f(x) = \log x$ and show that the difference between $\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x)$ and (11.3.5) is small.
Let $f(x) = \log x$. Then we can write (11.3.5) as

\[ -\frac{1}{n}\log Q_n(x) = \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) - \frac{1}{n}\sum_{j=0}^{n-1}\left( \log T^j(x) - \log \frac{P_{n-j}(T^j x)}{Q_{n-j}(T^j x)} \right) = \frac{1}{n}\Sigma_n^{(1)} - \frac{1}{n}\Sigma_n^{(2)}. \]
We evaluate $\lim_{n\to\infty}\frac{1}{n}\Sigma_n^{(1)}$. By Birkhoff's Ergodic Theorem and the fact that Gauss' measure $\mu$ and Lebesgue measure are equivalent, it follows that for Lebesgue almost every $x \in [0,1]$ we have that
\[ \lim_{n\to\infty}\frac{1}{n}\Sigma_n^{(1)} = \frac{1}{\log 2}\int \frac{f(x)}{1+x}\,dx = \frac{1}{\log 2}\int_0^1 \frac{\log x}{1+x}\,dx. \]
Integrating by parts we have that

\[ \int_0^1 \frac{\log x}{1+x}\,dx = \log x \log(1+x)\Big|_0^1 - \int_0^1 \frac{\log(1+x)}{x}\,dx = -\int_0^1 \frac{\log(1+x)}{x}\,dx. \]
The Taylor series expansion of $\log(1+x)$ about zero is

\[ \log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}x^k}{k} \]
so that
\[ \frac{\log(1+x)}{x} = \sum_{k=0}^{\infty} \frac{(-1)^k x^k}{k+1}. \]
Hence for almost every $x$,

\[ \lim_{n\to\infty}\frac{1}{n}\Sigma_n^{(1)} = -\frac{1}{\log 2}\sum_{k=0}^{\infty}\frac{(-1)^k}{k+1}\int_0^1 x^k\,dx = -\frac{1}{\log 2}\sum_{k=0}^{\infty}\frac{(-1)^k}{(k+1)^2}. \]
Note that
\[ \sum_{k=0}^{\infty}\frac{(-1)^k}{(k+1)^2} = \sum_{n=1}^{\infty}\frac{1}{n^2} - 2\sum_{n=1}^{\infty}\frac{1}{(2n)^2} = \sum_{n=1}^{\infty}\frac{1}{n^2} - \frac{1}{2}\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{12}, \]
using the well-known fact that $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$. Hence for almost every $x$,
\[ \lim_{n\to\infty}\frac{1}{n}\Sigma_n^{(1)} = -\frac{\pi^2}{12\log 2}. \]

It remains to show that $\frac{1}{n}\Sigma_n^{(2)} \to 0$ as $n \to \infty$. Recall that in §6.3 we introduced the cylinder set $I(x_0,x_1,\ldots,x_{n-1})$ of rank $n$ to denote the set of points $x$ with continued fraction expansion that starts $x_0,\ldots,x_{n-1}$. We proved in §6.3 that $I(x_0,x_1,\ldots,x_{n-1})$ is an interval with length at most $1/Q_n(x)^2$. Note that both $x$ and $P_n(x)/Q_n(x)$ lie in the same interval of rank $n$. Hence
\[ \left| \frac{x}{P_n(x)/Q_n(x)} - 1 \right| = \frac{Q_n(x)}{P_n(x)}\left| x - \frac{P_n(x)}{Q_n(x)} \right| \leq \frac{Q_n(x)}{P_n(x)}\cdot\frac{1}{Q_n(x)^2} = \frac{1}{P_n(x)Q_n(x)}. \]

It follows from Lemma 6.3.1(i) that $P_n(x) \geq 2^{(n-2)/2}$ and $Q_n(x) \geq 2^{(n-1)/2}$. Hence
\[ \left| \frac{x}{P_n(x)/Q_n(x)} - 1 \right| \leq \frac{1}{2^{n-3/2}}. \]

By the triangle inequality and the fact that $\log y \leq y - 1$ we have that
\[
\begin{aligned}
|\Sigma_n^{(2)}| &\leq \sum_{j=0}^{n-1} \left| \log\left( \frac{T^j(x)}{P_{n-j}(T^j(x))/Q_{n-j}(T^j(x))} \right) \right| \\
&\leq \sum_{j=0}^{n-1} \left| \frac{T^j(x)}{P_{n-j}(T^j(x))/Q_{n-j}(T^j(x))} - 1 \right| \\
&\leq \sum_{j=0}^{n-1} \frac{1}{2^{n-j-3/2}}.
\end{aligned}
\]
Note that
\[ \sum_{j=0}^{n-1}\frac{1}{2^{n-j-3/2}} \leq \sum_{j=0}^{\infty}\frac{1}{2^{j-3/2}} = 2^{3/2}\sum_{j=0}^{\infty}\frac{1}{2^{j}} = 2^{5/2}. \]
Hence $|\Sigma_n^{(2)}| \leq 2^{5/2}$ for all $n$. Hence
\[ \lim_{n\to\infty}\frac{1}{n}\Sigma_n^{(2)} = 0 \]
and the result follows. $\Box$
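Proposition 11.3.3 can also be seen numerically (a sketch, not part of the notes): for a "typical" point the quantity $\frac{1}{n}\log Q_n(x)$ should be close to $\pi^2/(12\log 2) \approx 1.1866$. As before, a random high-precision rational stands in for a typical point and the denominators $Q_n$ are built with the recurrence from Exercise 6.3.

# Growth rate (1/n) log Q_n(x) versus pi^2 / (12 log 2).
import math, random

random.seed(5)
N = 2000
p, q = random.randrange(10 ** N), 10 ** N
digits = []
while p:                                       # continued fraction digits via Euclid
    digits.append(q // p)
    p, q = q % p, p

q_prev, q_cur = 1, digits[0]                   # Q_0 = 1, Q_1 = x_0
for d in digits[1:1000]:
    q_prev, q_cur = q_cur, d * q_cur + q_prev  # Q_{n+1} = x_n Q_n + Q_{n-1}
n = min(1000, len(digits))
print(math.log(q_cur) / n, math.pi ** 2 / (12 * math.log(2)))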


11.4 Exercises §
Exercise 11.1
Let $b \geq 2$ be an integer. Prove that Lebesgue measure is an ergodic invariant measure for $T_b(x) = bx \bmod 1$ defined on the unit interval.

Exercise 11.2
(i) A number $x \in [0,1]$ is said to be simply normal if it is simply normal in base $b$ for all $b \geq 2$. Prove that Lebesgue a.e. number $x \in [0,1]$ is simply normal.
(ii) Prove Proposition 11.2.4.

Exercise 11.3
Let $r \geq 2$ be an integer. Prove that for Lebesgue almost every $x \in [0,1]$, the sequence $x_n = r^n x$ is uniformly distributed mod 1.

Exercise 11.4
Prove that the arithmetic mean of the digits appearing in the base 10 expansion of Lebesgue-a.e. $x \in [0,1)$ is equal to 4.5, i.e. prove that if $x = \sum_{j=0}^{\infty} x_j/10^{j+1}$, $x_j \in \{0,1,\ldots,9\}$, then
\[ \lim_{n\to\infty}\frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = 4.5 \quad \text{a.e.} \]

Exercise 11.5
Let $x \in [0,1]$ have continued fraction expansion $x = [x_0, x_1, x_2, \ldots]$. Prove that
\[ \lim_{n\to\infty}\frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \infty \]
for Lebesgue almost every $x \in [0,1]$. (Hint: use Exercise 10.2.)


12. Solutions to the Exercises

Solution 1.1
Suppose that $x_n \in \mathbb{R}$ is uniformly distributed mod 1. Let $x \in [0,1]$ and let $\varepsilon > 0$. We want to show that there exists $n$ such that $\{x_n\} \in (x-\varepsilon, x+\varepsilon) \cap [0,1]$ (as usual, $\{x_n\}$ denotes the fractional part of $x_n$). By the definition of uniform distribution mod 1 we have that
\[ \lim_{n\to\infty}\frac{1}{n}\operatorname{card}\{ j \mid 0 \leq j \leq n-1,\ \{x_j\} \in (x-\varepsilon, x+\varepsilon) \} = 2\varepsilon. \]
Then there exists $n_0$ such that if $n \geq n_0$ then
\[ \frac{1}{n}\operatorname{card}\{ j \mid 0 \leq j \leq n-1,\ \{x_j\} \in (x-\varepsilon, x+\varepsilon) \} > \varepsilon > 0. \]
Hence
\[ \operatorname{card}\{ j \mid 0 \leq j \leq n-1,\ \{x_j\} \in (x-\varepsilon, x+\varepsilon) \} > 0 \]
for some $n$, so there exists $j$ such that $\{x_j\} \in (x-\varepsilon, x+\varepsilon)$.

Solution 1.2
We use Weyl's Criterion. Let $\ell \in \mathbb{Z}\setminus\{0\}$. Then
\[ \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell(\alpha j + \beta)} = e^{2\pi i \ell\beta}\,\frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell\alpha j} = e^{2\pi i \ell\beta}\,\frac{1}{n}\left( \frac{e^{2\pi i \ell\alpha n}-1}{e^{2\pi i \ell\alpha}-1} \right), \]
summing the geometric progression. As $\alpha \notin \mathbb{Q}$, we have that $e^{2\pi i \ell\alpha} \neq 1$ for any $\ell \in \mathbb{Z}\setminus\{0\}$. Hence
\[ \left| \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| \leq \frac{1}{n}\left| \frac{e^{2\pi i \ell\alpha n}-1}{e^{2\pi i \ell\alpha}-1} \right| \leq \frac{1}{n}\,\frac{2}{|e^{2\pi i \ell\alpha}-1|} \to 0 \]
as $n \to \infty$, as $|e^{2\pi i \ell\beta}| = 1$. Hence $x_n = \alpha n + \beta$ is uniformly distributed.

Solution 1.3
(i) If $\log_{10} 2 = p/q$ with $p, q$ integers, $\operatorname{hcf}(p,q) = 1$, then $2 = 10^{p/q}$, i.e. $2^q = 10^p = 5^p 2^p$. Comparing indices, we see that $0 = p = q$, a contradiction.

(ii) Let $2^n$ have leading digit $r$. Then

\[ 2^n = r \cdot 10^{\ell} + \text{terms involving lower powers of } 10, \]
where the terms involving lower powers of 10 are integers lying in $[0, 10^{\ell})$. Hence

\[
\begin{aligned}
2^n \text{ has leading digit } r &\Leftrightarrow r\cdot 10^{\ell} \leq 2^n < (r+1)\cdot 10^{\ell} \\
&\Leftrightarrow \log_{10} r + \ell \leq n\log_{10} 2 < \log_{10}(r+1) + \ell \\
&\Leftrightarrow \log_{10} r \leq \{ n\log_{10} 2 \} < \log_{10}(r+1).
\end{aligned}
\]


Hence
\[ \frac{1}{n}\operatorname{card}\{ k \mid 0 \leq k \leq n-1,\ 2^k \text{ has leading digit } r \} = \frac{1}{n}\operatorname{card}\{ k \mid 0 \leq k \leq n-1,\ \{k\log_{10}2\} \in [\log_{10}r, \log_{10}(r+1)) \}, \]
which, by uniform distribution, converges to $\log_{10}(r+1) - \log_{10}r = \log_{10}(1+1/r)$ as $n \to \infty$.

Solution 1.4
The frequency with which the penultimate leading digit of $2^n$ is $r$ is given by $\sum_{q=1}^{9} A(q,r)$, where $A(q,r)$ is the frequency with which the leading digit is $q$ and the penultimate leading digit is $r$. Now $2^n$ has leading digit $q$ and penultimate digit $r$ precisely when

\[ q\cdot 10^{\ell} + r\cdot 10^{\ell-1} \leq 2^n < q\cdot 10^{\ell} + (r+1)\cdot 10^{\ell-1}, \]
i.e.

\[ \log_{10}(10q+r) + \ell - 1 \leq n\log_{10}2 < \log_{10}(10q+r+1) + (\ell-1). \]
Reducing this mod 1 gives

\[ \log_{10}(10q+r) - 1 \leq \{ n\log_{10}2 \} < \log_{10}(10q+r+1) - 1 \]
(the $-1$s appear because $1 < \log_{10}(10q+r), \log_{10}(10q+r+1) < 2$). As $\{n\log_{10}2\}$ is uniformly distributed mod 1, we see that

\[ A(q,r) = (\log_{10}(10q+r+1) - 1) - (\log_{10}(10q+r) - 1) = \log_{10}\left( 1 + \frac{1}{10q+r} \right). \]
Hence the frequency with which the penultimate leading digit of $2^n$ is $r$ is

\[ \sum_{q=1}^{9}\log_{10}\left(1+\frac{1}{10q+r}\right) = \log_{10}\prod_{q=1}^{9}\left(1+\frac{1}{10q+r}\right). \]

Solution 2.1
Suppose first that the numbers $\alpha_1,\ldots,\alpha_k, 1$ are rationally independent. This means that if $r_1,\ldots,r_k, r$ are rational numbers such that

\[ r_1\alpha_1 + \cdots + r_k\alpha_k + r = 0, \]
then $r_1 = \cdots = r_k = r = 0$. In particular, for $\ell = (\ell_1,\ldots,\ell_k) \in \mathbb{Z}^k\setminus\{0\}$,
\[ \ell_1\alpha_1 + \cdots + \ell_k\alpha_k \notin \mathbb{Z}, \]
so that $e^{2\pi i(\ell_1\alpha_1+\cdots+\ell_k\alpha_k)} \neq 1$.


By summing the geometric progression we have that

\[ \left|\frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i(\ell_1 j\alpha_1+\cdots+\ell_k j\alpha_k)}\right| = \frac{1}{n}\left|\frac{e^{2\pi i n(\ell_1\alpha_1+\cdots+\ell_k\alpha_k)}-1}{e^{2\pi i(\ell_1\alpha_1+\cdots+\ell_k\alpha_k)}-1}\right| \leq \frac{1}{n}\,\frac{2}{|e^{2\pi i(\ell_1\alpha_1+\cdots+\ell_k\alpha_k)}-1|} \to 0, \quad \text{as } n\to\infty. \]
Therefore, by Weyl's Criterion, $(n\alpha_1,\ldots,n\alpha_k)$ is uniformly distributed mod 1.
Now suppose that the numbers $\alpha_1,\ldots,\alpha_k, 1$ are rationally dependent. Thus there exist rationals $r_1,\ldots,r_k, r$ (not all zero) such that $r_1\alpha_1+\cdots+r_k\alpha_k+r = 0$. By multiplying by a common denominator we can find $\ell = (\ell_1,\ldots,\ell_k) \in \mathbb{Z}^k\setminus\{0\}$ such that
\[ \ell_1\alpha_1+\cdots+\ell_k\alpha_k \in \mathbb{Z}. \]
Thus $e^{2\pi i(\ell_1 n\alpha_1+\cdots+\ell_k n\alpha_k)} = 1$ for all $n \in \mathbb{N}$ and so
\[ \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i(\ell_1 j\alpha_1+\cdots+\ell_k j\alpha_k)} = 1 \not\to 0, \quad \text{as } n\to\infty. \]
Therefore, $(n\alpha_1,\ldots,n\alpha_k)$ is not uniformly distributed mod 1.

Solution 2.2
Let $p(n) = \alpha_k n^k + \cdots + \alpha_1 n + \alpha_0$. Suppose that $\alpha_k,\ldots,\alpha_{s+1} \in \mathbb{Q}$ but $\alpha_s \notin \mathbb{Q}$. Let
\[ p_1(n) = \alpha_k n^k + \cdots + \alpha_{s+1}n^{s+1}, \qquad p_2(n) = \alpha_s n^s + \cdots + \alpha_1 n + \alpha_0, \]
so that $p(n) = p_1(n) + p_2(n)$. By choosing $q$ to be a common denominator for $\alpha_k,\ldots,\alpha_{s+1}$, we can write
\[ p_1(n) = \frac{1}{q}(m_k n^k + \cdots + m_{s+1}n^{s+1}) \]
where $m_j \in \mathbb{Z}$.
By Weyl's Criterion, we want to show that for $\ell \in \mathbb{Z}\setminus\{0\}$ we have
\[ \frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i\ell p(j)} \to 0 \]
as $n \to \infty$.
Write $j = qm + r$ where $r = 0,\ldots,q-1$. Then $p_1(qm+r) = d_r \bmod 1$ for some $d_r \in \mathbb{Q}$. Moreover, $p_2(qm+r) = p_2^{(q,r)}(m)$ is a polynomial in $m$ with irrational leading coefficient. Now

\[ \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i\ell p(j)} = \lim_{n\to\infty}\frac{1}{n}\sum_{r=0}^{q-1} e^{2\pi i\ell d_r}\sum_{m=0}^{\lfloor n/q\rfloor - 1} e^{2\pi i\ell p_2^{(q,r)}(m)} = \lim_{n\to\infty}\frac{\lfloor n/q\rfloor}{n}\sum_{r=0}^{q-1} e^{2\pi i\ell d_r}\,\frac{1}{\lfloor n/q\rfloor}\sum_{m=0}^{\lfloor n/q\rfloor - 1} e^{2\pi i\ell p_2^{(q,r)}(m)} = 0, \]
as $p_2^{(q,r)}(m)$ is uniformly distributed mod 1.


Solution 2.3
Let $p(n) = \alpha n^2 + n + 1$ where $\alpha \notin \mathbb{Q}$. Let $m \geq 1$ and consider the sequence $p^{(m)}(n) = p(n+m) - p(n)$ of $m$th differences. We have that
\[ p^{(m)}(n) = \alpha(n+m)^2 + (n+m) + 1 - \alpha n^2 - n - 1 = 2\alpha mn + \alpha m^2 + m, \]
which is a degree 1 polynomial in $n$ with leading coefficient $2\alpha m \notin \mathbb{Q}$. Note that $2\alpha m \neq 0$, as $m \geq 1$. By Exercise 1.2 we have that $p^{(m)}(n)$ is uniformly distributed mod 1 for every $m \geq 1$. By Lemma 2.3.3, it follows that $p(n)$ is uniformly distributed mod 1.

Solution 2.4
By Weyl's Criterion, we require that for each $(\ell_1,\ell_2) \in \mathbb{Z}^2\setminus\{(0,0)\}$ we have
\[ \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\exp 2\pi i(\ell_1 p(k) + \ell_2 q(k)) = 0. \tag{12.0.1} \]
Let

\[ p_{\ell_1,\ell_2}(n) = \ell_1 p(n) + \ell_2 q(n) = (\ell_1\alpha_k+\ell_2\beta_k)n^k + \cdots + (\ell_1\alpha_1+\ell_2\beta_1)n + (\ell_1\alpha_0+\ell_2\beta_0). \]
This is a polynomial of degree at most $k$. Then the expression in (12.0.1) can be written as
\[ \frac{1}{n}\sum_{k=0}^{n-1} \exp 2\pi i\, p_{\ell_1,\ell_2}(k). \]
By the 1-dimensional version of Weyl's Criterion (using the integer $\ell = 1$), this will converge to 0 as $n \to \infty$ if $p_{\ell_1,\ell_2}(n)$ is uniformly distributed mod 1. By Weyl's Theorem on Polynomials (Theorem 2.3.1), this happens if at least one of $\ell_1\alpha_k+\ell_2\beta_k, \ell_1\alpha_{k-1}+\ell_2\beta_{k-1}, \ldots, \ell_1\alpha_1+\ell_2\beta_1$ is irrational. Note that $\ell_1\alpha_i+\ell_2\beta_i \notin \mathbb{Q}$ if and only if $\alpha_i, \beta_i, 1$ are rationally independent.

Solution 2.5
(i) We know that $\emptyset \in \mathcal{B}$ and that if $E \in \mathcal{B}$ then $X\setminus E \in \mathcal{B}$. Hence $X = X\setminus\emptyset \in \mathcal{B}$.
(ii) Let $E_n \in \mathcal{B}$. Then $X\setminus E_n \in \mathcal{B}$. Then $\bigcup_n (X\setminus E_n) \in \mathcal{B}$. Now $\bigcup_n (X\setminus E_n) = X\setminus\bigcap_n E_n$. Hence $\bigcap_n E_n = X\setminus\bigcup_n (X\setminus E_n) \in \mathcal{B}$.

Solution 2.6
The smallest $\sigma$-algebra containing the sets $[0,1/4)$, $[1/4,1/2)$, $[1/2,3/4)$ and $[3/4,1]$ is
\[
\begin{aligned}
\mathcal{B} = \{ &\emptyset,\ [0,1/4),\ [1/4,1/2),\ [1/2,3/4),\ [3/4,1],\ [0,1/2),\ [0,1/4)\cup[1/2,3/4),\ [0,1/4)\cup[3/4,1],\\
&[1/4,3/4),\ [1/4,1/2)\cup[3/4,1],\ [1/2,1],\ [0,3/4),\ [0,1/2)\cup[3/4,1],\\
&[0,1/4)\cup[1/2,1],\ [1/4,1],\ [0,1] \}.
\end{aligned}
\]

Solution 2.7
Clearly a finite union of dyadic intervals is a Borel set.
By Proposition 2.4.2 we need to show that if $x, y \in [0,1]$, $x \neq y$, then there exist disjoint dyadic intervals $I_1, I_2$ such that $x \in I_1$, $y \in I_2$. Let $\varepsilon = |x-y|$ and choose $n$ such that $1/2^n < \varepsilon/2$. Without loss of generality, assume that $x < y$. Then $x$ and $y$ lie in dyadic intervals $I_1$ and $I_2$ of length $1/2^n$; since $|x-y| > 2/2^n$, these intervals are distinct, and distinct dyadic intervals of the same length are disjoint.


Solution 2.8 Let denote the collection of finite unions of intervals. Trivially . If A, B A ∅∈A ∈ A are finite unions of intervals then A B is a finite union of intervals. Hence is closed ∪ A under taking finite unions. If A = [a, b] [0, 1] then Ac = [0, a) (b, 1] is a finite union of ⊂ ∪ intervals. Hence is an algebra. A Solution 2.9 First note that if µ is a measure and A B then µ(A) µ(B). (To see this, note that ⊂ ≤ if A B then B = A (B A) is a disjoint union. Hence µ(B) = µ(A (B A)) = ⊂ ∪ \ ∪ \ µ(A)+ µ(B A) µ(A).) \ ≥ Let µ denote Lebesgue measure on [0, 1]. Let x [0, 1]. For any ε > 0, we have that ∈ x (x ε, x + ε) [0, 1]. Hence µ( x ) 2ε. As ε > 0 is arbitrary, it follows that { } ⊂ − ∩ { } ≤ µ( x ) = 0. { } Let E = x ∞ be a countable set. Then { j}j=1 ∞ ∞ µ(E)= µ x = µ( x ) = 0.  { j} { j} j[=1 Xj=1   Hence any countable set has Lebesgue measure 0. As the rational points in [0, 1] are countable, it follows that µ(Q [0, 1]) = 0. Hence ∩ Lebesgue almost every point in [0, 1] is irrational.

Solution 2.10 Let µ = δ be the Dirac δ-measure at 1/2. Then, by definition, µ([0, 1/2) (1/2, 1]) = 0 1/2 ∪ as 1/2 [0, 1/2) (1/2, 1]. Hence µ x [0, 1] x = 1/2 = 0, so that µ-a.e. point in [0, 1] 6∈ ∪ { ∈ | 6 } is equal to 1/2.

Solution 3.1 Let x = αn where α R is irrational. Then x is uniformly distributed mod 1 (by the n ∈ n results in 1.2.1). Let A = αn n 0 [0, 1] denote the set of fractional parts of the § {{ } | ≥ }⊂ sequence x ; note that A is a countable set. Let f = χ . Then f L1([0, 1], ,µ) (where n A ∈ B denotes the Borel σ-algebra and µ denotes Lebesgue measure on [0, 1]) and f 0 a.e. B ≡ Hence f dµ = 0. However, f( x ) = 1 for each n. Hence { n} n−1 R 1 f( x ) = 1 f dµ = 0 n { n} 6→ Xj=0 Z as n . →∞ Solution 3.2 Let X be a compact metric space equipped with the Borel σ-algebra . Let T : X X B → be continuous. Recall that is generated by the open sets. It is sufficient to check that B T −1U for all open sets U. But this is clear: as T is continuous, the pre-image T −1U ∈ B of any open set is open, hence T −1U . ∈B Solution 3.3 Define f : [0, 1] R by n → n n2x if 0 x 1/n f (x)= − ≤ ≤ n 0 if 1/n x 1.  ≤ ≤ 112 MATH4/61112 12. Solutions

(Draw a picture!) Then f is continuous, hence f L1(X, ,µ). Moreover, f dµ = 1/2 n n ∈ B n for each n. Hence f 0 in L1(X, ,µ). n 6→ B R However, f 0 µ-a.e. To see this, let x [0, 1], x = 0. Choose n such that 1/n

∞ ∞ ∞ ∞ ∞ −1 −1 −1 T∗µ En = µ T En = µ T En = µ(T En)= T∗µ(En) ! ! ! n[=1 n[=1 n[=1 nX=1 nX=1 −1 ∞ ∞ −1 where we have used the fact that T n=1 En = n=1 T En. Hence T∗µ is a measure. −1 S S −1 Finally, note that T (X) = X. Hence T∗µ(X) = µ(T X) = µ(X) = 1, so that T is a probability measure.

Solution 3.5 (i) Let λ denote Lebesgue measure on [0, 1]. All one needs to do is to find a set B such that λ(B) = λ(T −1B), and any (reasonable) choice of set B will work. For example, 6 take B = (1/2, 1). Then

∞ 1 1 T −1(B)= , . n + 1 n + 1/2 n=1 [   It follows that ∞ 1 1 λ(T −1B)= = log(4) 1 < = λ(B). (1 + 2n)(1 + n) − 2 nX=1 (ii) Recall that 1 dx 1 χ (x) µ(B)= = B dx. log 2 1+ x log 2 1+ x ZB Z Note that 1/2 1/(1 + x) 1 if 0 x 1. Hence ≤ ≤ ≤ ≤ 1 χ (x) 1 B dx µ(B) χ (x) dx log 2 2 ≤ ≤ log 2 B Z Z so that 1 1 1 1 λ(B)= χ (x) dx µ(B) χ (x) dx = λ(B). 2 log 2 2 log 2 B ≤ ≤ log 2 B log 2 Z Z


(iii) From (3.4.1) it follows that

1 1 f dλ f dµ f dλ (12.0.2) 2 log 2 ≤ ≤ log 2 Z Z Z for all simple functions f. By taking increasing sequences of simple functions, we see that (12.0.2) continues to hold for non-negative measurable functions. Let f ∈ L1(X, ,µ). Then B 1 f dλ f dµ 2 log 2 | | ≤ | | Z Z so that f L1(X, , λ). Similarly, if f L1(X, , λ) then f L1(X, ,µ). ∈ B ∈ B ∈ B Solution 3.6 Let [a, b] [0, 1]. Then ⊂ k−1 a + j b + j T −1[a, b]= , k k j[=0   so that k−1 k−1 b + j a + j b a T µ([a, b]) = = − = b a = µ([a, b]). ∗ k − k k − Xj=0 Xj=0 Hence T∗µ and µ agree on intervals. Hence, by the Hahn-Kolmogorov Extension Theorem, T∗µ = µ so that µ is a T -invariant measure.

Solution 3.7 Let T (x)= βx mod 1. Then T has a graph as illustrated in Figure 12.1.

1

1/β

0 0 1/β 1

Figure 12.1: The graph of T (x)= βx mod 1.

Let us first show that T does not preserve Lebesgue measure. For this, it is sufficient to find a set B such that B and T −1B do not have equal Lebesgue measure; in fact, almost any

114 MATH4/61112 12. Solutions reasonable choice of B will suffice, but here is a specific example. Let λ denote Lebesgue measure. Take B = [1/β, 1]. Then λ(B) = 1 1/β = 1/β2 (as β 1 = 1/β). Now − − T −1[1/β, 1] = [1/β2, 1/β] so that λ(T −1B) = 1/β 1/β2 = 1/β3 = λ(B). − 6 We now show that T does preserve the measure µ defined as in the statement of the question. To do this we again use the Hahn-Kolmogorov Extension Theorem, which tells us that it is sufficient to prove that µ(T −1[a, b]) = µ[a, b] for all intervals [a, b] [0, 1]. ⊂ If [a, b] [0, 1/β] then ⊂ a b a + 1 b + 1 T −1[a, b]= , , , β β ∪ β β     a disjoint union. Hence,

−1 1 (b a) 1 (b + 1) (a + 1) µ(T [a, b]) = 1 1 − + − + 3 β 1 1 β β β   β β + β3   b a 1 1   = − + 1 + 1 β β2 β β3   = µ([a, b]).

If [a, b] [1/β, 1] then T −1[a, b] = [a/β, b/β] and ⊂ a b 1 b a µ(T −1[a, b]) = µ , = − = µ([a, b]). β β 1 + 1 β   β β3   If a< 1/β < b then we write [a, b] = [a, 1/β] [1/β, b]. Then T −1[a, b]= T −1[a, 1/β] ∪ ∪ T −1[1/β, b], a disjoint union. Hence

µ(T −1[a, b]) = µ(T −1[a, 1/β] T −1[1/β, b]) ∪ = µ(T −1[a, 1/β]) + µ(T −1[1/β, b]) = µ([a, 1/β]) + µ([1/β, b]) = µ([a, b]).

Solution 3.8 (i) Note that 1 1 1 2 π/2 µ([0, 1]) = dx = dθ = 1, π 0 x(1 x) π 0 Z − Z using the substitution x = sin2 θ. p

(ii) By the Hahn-Kolmogorov Extension Theorem it is sufficient to prove that µ([a, b]) = µ(T −1[a, b]) for all intervals [a, b]. Note that 1 √1 a 1 √1 b 1+ √1 b 1+ √1 a T −1[a, b]= − − , − − − , − 2 2 ∪ 2 2     (as the graph of T is decreasing on [1/2, 1] the order of a, b are reversed in the second sub-interval). It is sufficient to prove that 1 µ([(1 √1 a)/2, (1 √1 b)/2]) = µ([a, b]) − − − − 2

115 MATH4/61112 12. Solutions

and 1 µ([(1 + √1 b)/2, (1 + √1 a)/2]) = µ([a, b]) − − 2 We prove the first equality (the second is similar). Now

1 √1 b 1 − 2 − 1 µ([(1 √1 a)/2, (1 √1 b)/2]) = dx. (12.0.3) − − − − π 1 √1 a x(1 x) Z − 2 − − Consider the substitution u = 4x(1 x). Then du = 4(1p 2x)dx and as x ranges − − between (1 √1 a)/2 and (1 √1 b)/2, u ranges between a, b. Note also that − − − − a simple manipulation shows that (1 2x)2 = 1 u. Hence the right-hand side of − − (12.0.3) is equal to 1 b 1 1 du = µ([a, b]). 2π a u(1 u) 2 Z − Similarly, p 1 µ([(1 + √1 b)/2, (1 + √1 a)/2]) = µ([a, b]) − − 2 and the result follows.

Solution 3.9 Note that ∞ 1 1 X = , 0 n + 1 n ∪ { } n=1 [   and that this is a disjoint union. Hence, denoting Lebesgue measure by µ,

∞ ∞ ∞ 1 1 1 1 1 1= µ(X)= µ , = = . n + 1 n n − n + 1 n(n + 1) n=1 n=1 n=1 X   X X By the Hahn-Kolmogorov Extension Theorem, it is sufficient to check that µ(T −1[a, b]) = µ([a, b]) for all intervals [a, b]. It is straightforward to check that

∞ a + n b + n T −1[a, b]= , n(n + 1) n(n + 1) n=1 [   and that this is a disjoint union. Hence

∞ a + n b + n µ(T −1[a, b]) = µ , n(n + 1) n(n + 1) n=1   X∞ b a = − n(n + 1) n=1 X = b a = µ([a, b]). − Solution 3.10 (i) Clearly d(x, y) 0 with equality if and only if x = y. It is also clear that d(x, y) = ≥ d(y, x). It remains to prove the triangle inequality: d(x, z) d(x, y)+ d(y, z) for all ≤ x, y, z Σ. ∈

116 MATH4/61112 12. Solutions

If any of x, y, z are equal then the triangle inequality is clear, so we can assume that x, y, z are all distinct. Suppose that x and y agree in the first n places and that y and z agree in the first m places. Then x and z agree in at least the first min n,m { } places. Hence n(x, z) min n(x, y),n(y, z) . Hence ≥ { } 1 1 1 1 d(x, z)= + = d(x, y)+ d(y, z). 2n(x,z) ≤ 2min{n(x,y),n(y,z)} ≤ 2n(x,y) 2n(y,z)

(ii) Let ε > 0 and choose n 1 such that 1/2n < ε. Choose δ = 1/2n+1. Suppose that ≥ d(x, y) < δ = 1/2n+1. Then n(x, y) > n + 1, i.e. x and y agree in at least the first n + 1 places. Hence σ(x) and σ(y) agree in at least the first n places. Hence n(σ(x),σ(y)) >n. Hence d(σ(x),σ(y)) < 1/2n < ε.

(iii) We show that [i ,...,i ] is open. Let x [i ,...,i ] so that x = i for j = 0 n−1 ∈ 0 n−1 j j 0, 1,...,n 1. Choose ε = 1/2n. Suppose that d(x, y) < ε. Then n(x, y) >n, i.e. x − and y agree in at least the first n places. Hence x = y for j = 0, 1,...,n 1. Hence j j − y = i for j = 0, 1,...,n 1 so that y [i ,...,i ]. Hence [i ,...,i ] is open. j j − ∈ 0 n−1 0 n−1 To see that [i0,...,in−1] is closed, note that

Σ [i ,...,i ]= [i′ ,...,i′ ] \ 0 n−1 0 n−1 [ where the union is over all n-tuples (i′ , i′ ,...,i′ ) = (i , i ,...,i ). This is a 0 1 n−1 6 0 1 n−1 finite union of open sets, and so is open. Hence [i0,...,in−1], as the complement of an open set, is closed.

Solution 3.11 First note that

0 1 0 0 0 1/4 0 3/4 0 0 1/4 0 3/4 0 0 0 5/8 0 3/8 0 P =  0 1/2 0 1/2 0  , P 2 =  1/8 0 3/4 0 1/8  ,  0 0 3/4 0 1/4   0 3/8 0 5/8 0       0 0 0 1 0   0 0 3/4 0 1/4          0 5/8 0 3/8 0 5/32 0 3/4 0 3/32 5/32 0 3/4 0 3/32 0 17/32 0 15/32 0     P 3 = 0 1/2 0 1/2 0 , P 4 = 1/8 0 3/4 0 1/8 .  3/32 0 3/4 0 5/32   0 15/32 0 17/32 0       0 3/8 0 5/8 0   3/32 0 3/4 0 5/32          As for each i, j there exists n for which P n(i, j) > 0, it follows that P is irreducible. Recall that the period of P is the highest common factor of n> 0 P n(i, i) > 0 . As { | } all the diagonal entries of P 2 are positive, it follows that P has period 2. Decompose 1, 2, 3, 4, 5 = 1, 3, 5 2, 4 = S S . If P (i, j) > 0 then either i S { } { } ∪ { } 0 ∪ 1 ∈ 0 and j S , or i S and j S , i.e. i S and j S . When restricted to the ∈ 1 ∈ 1 ∈ 0 ∈ ℓ ∈ ℓ+1 mod 2 indices 1, 3, 5 , P 2 has the form { } 1/4 3/4 0 1/8 3/4 1/8   0 3/4 1/4   117 MATH4/61112 12. Solutions which is easily seen to be irreducible and aperiodic. When restricted to the indices 2, 4 , { } P 2 has the form 5/8 3/8 3/8 5/8   which is clearly irreducible and aperiodic. The eigenvalues of P are found by evaluating the determinant

λ 1 0 0 0 − 1/4 λ 3/4 0 0 − 0 1/2 λ 1/2 0 . − 0 0 3/4 λ 1/4 − 0 0 0 1 λ

After simplifying this expression, we obtain 1 (1 λ)(1 + λ)λ λ + . − 4   (Note that, as P has period 2, we expect from the Perron-Frobenius Theorem that the square roots of 1 to be the eigenvalues of modulus 1 for P .) A left eigenvector p = (p(1),p(2),p(3),p(4),p(5)) for the eigenvalue 1 is determined by

0 1 0 0 0 1/4 0 3/4 0 0 (p(1),p(2),p(3),p(4),p(5))  0 1/2 0 1/2 0  = (p(1),p(2),p(3),p(4),p(5))  0 0 3/4 0 1/4     0 0 0 1 0      which simplifies to 1 1 3 3 1 1 p(2) = p(1), p(1)+ p(3) = p(2), p(2)+ p(4) = p(3), p(3)+p(5) = p(4), p(4) = p(5). 4 2 4 4 2 4 Setting p(1) = 1 we obtain (p(1),p(2),p(3),p(4),p(5)) = (1, 4, 6, 4, 1), and normalising this to form a probability vector we obtain

1 1 3 1 1 p = , , , , . 16 4 8 4 16   Solution 3.12 Let p = (p(1),...,p(k)) be a probability vector. Let P be the matrix

p(1) p(2) p(k) · · · p(1) p(2) p(k) P =  . · · · .  . . .    p(1) p(2) p(k)   · · ·    Then P is a stochastic matrix. As each p(j) > 0, it follows that P is aperiodic. It is straightforward to check that pP = p. As P (i, j) = p(j), the Markov measure determined by the matrix P is the same as Bernoulli measure determined by the probability vector p.
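The stationary probability vector found in Solution 3.11 is easy to verify numerically. The sketch below (not part of the notes; the matrix is simply re-entered from that solution) checks that $p = (1,4,6,4,1)/16$ satisfies $pP = p$ for the 5-state chain there.

# Verify pP = p for the Ehrenfest-type chain of Exercise/Solution 3.11.
P = [[0, 1, 0, 0, 0],
     [1/4, 0, 3/4, 0, 0],
     [0, 1/2, 0, 1/2, 0],
     [0, 0, 3/4, 0, 1/4],
     [0, 0, 0, 1, 0]]
p = [1/16, 4/16, 6/16, 4/16, 1/16]
pP = [sum(p[i] * P[i][j] for i in range(5)) for j in range(5)]
print(pP)   # should reproduce p = [0.0625, 0.25, 0.375, 0.25, 0.0625]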

118 MATH4/61112 12. Solutions

Solution 4.1 Note that −1 χ 1 (x) = 1 x T B T (x) B χB(T (x)) = 1. T − B ⇔ ∈ ⇔ ∈ ⇔ Hence χ 1 = χB T . T − B ◦ Solution 4.2 Note that T n(x) = x if and only if 2nx = x mod 1, i.e. 2nx = x + p for some integer p. Hence x = p/(2n 1). We get distinct values of x in R/Z when p = 0, 1,..., 2n 2 (note − − that when p = 2n 1 then x = 1, which is the same as 0 in R/Z). − Hence there are infinitely many distinct periodic orbits for the doubling map. If n−1 n−1 x,Tx,...,T x is a periodic orbit of period n then let δ(x) = 1/n j=0 δT j x denote the periodic orbit measure supported on the orbit of x. As there are infinitely many distinct P periodic orbits, there are infinitely many distinct measures supported on periodic orbits.

Solution 4.3 Recall that R/Z can be regarded as [0, 1] where 0 and 1 are identified. Suppose that f : [0, 1] R is integrable and that f(0) = f(1) so that f is a well-defined function on → R/Z. Then

1/2 1 f T dµ = f T dµ + f T dµ ◦ ◦ ◦ Z Z0 Z1/2 1/2 1 = f(2x) dx + f(2x 1) dx (12.0.4) − Z0 Z1/2 1 1 1 1 = f(x) dx + f(x) dx 2 2 Z0 Z0 = f dµ Z where we have used the substitution u(x) = 2x for the first integral and u(x) = 2x 1 for − the second integral in (12.0.4)

Solution 4.4 It is straightforward to check that T : R2/Z2 R2/Z2 is a diffeomorphism. → Recall that we can identify functions f : R2/Z2 C with functions f : R2 C that → → satisfy f(x + n)= f(x) for all n Z2. We apply the change of variables formula with the ∈ substitution u(x)= T (x). Note that 1 0 DT (x)= 1 1   so that det DT = 1. Hence, by the change of variables formula | | f T dµ = f T det DT dµ = f dµ = f dµ. ◦ 2 2 ◦ | | 2 2 Z ZR /Z ZT (R /Z ) Z Solution 5.1 Let α = p/q with p,q Z, q = 0, hcf(p,q) = 1. Let ∈ 6 q−1 j j 1 B = , + . q q 2q j[=0   119 MATH4/61112 12. Solutions

Then j j 1 j p j p 1 T −1 , + = − , − + q q 2q q q 2q     so that T −1B = B (draw a picture to understand this better). However µ(B) = 1/2, so that T is not ergodic with respect to Lebesgue measure.

Solution 5.2 Suppose that f L2(X, ,µ) has Fourier series ∈ B 2πi(nx+my) c(n,m)e . 2 (n,mX)∈Z Then f T has Fourier series ◦ 2πi(n(x+α)+m(x+y)) 2πinα 2πi((n+m)x+my) c(n,m)e = c(n,m)e e . 2 2 (n,mX)∈Z (n,mX)∈Z Comparing coefficients we see that

2πinα c(n+m,m) = e c(n,m).

Suppose that m = 0. Then for each j > 0, 6 c = = c = c , | (n+jm,m)| · · · | (n+m,m)| | (n,m)| as e2πinα = 1. Note that if m = 0 then (n + jm,m) as j . By the Riemann- | | 6 → ∞ → ∞ Lebesgue Lemma (Proposition 5.3.2(ii)), we must have that c =0 if m = 0. Hence f (n,m) 6 has Fourier series 2πinx c(n,0)e 2 (n,X0)∈Z and f T has Fourier series ◦ 2πinα 2πinx c(n,0)e e . 2 (n,X0)∈Z Comparing Fourier coefficients we see that

2πinα c(n,0) = c(n,0)e .

Suppose that n = 0. As α Q, e2πinα = 1. Hence c = 0 unless n = 0. Hence f has 6 6∈ 6 (n,0) Fourier series c(0,0), i.e. f is constant a.e. Hence T is ergodic with respect to Lebesgue measure.

Solution 5.3 Suppose that T : X X has a periodic point x with period n. Let → n−1 1 µ = δ j . n T x Xj=0 Let B and suppose that T −1B = B. We must show that µ(B) = 0 or 1. ∈B

120 MATH4/61112 12. Solutions

Suppose that x B. Then x T −1B. Hence T (x) B. Continuing inductively, we see ∈ ∈ ∈ that T j(x) B for j = 0, 1,...,n 1. Hence ∈ − n−1 n−1 1 1 µ(B)= δ j (B)= 1 = 1. n T x n Xj=0 Xj=0 Similarly, if x X B then T j(x) X B for j = 0, 1,...,n 1 (we have used the fact ∈ \ ∈ \ − that if B is T -invariant then X B is T -invariant). Hence µ(B) = 0. \ Solution 5.4 (i) Recall that the determinant of a matrix is equal to the product of all the eigenvalues. Let T be a linear toral automorphism with corresponding matrix A. Suppose that A has an eigenvalue of modulus 1. By considering A2 if necessary, there is no loss in generality in assuming that det A = 1. Suppose k = 2. Then the matrix A has two eigenvalues, λ, λ. As A has an eigenvalue of modulus 1, we must have that λ = e2πiθ for some θ [0, 1). Then λ, λ satisfy the ∈ equation λ2 + 2 cos θλ + 1 = 0. However the matrix A = (a, b; c, d) has characteristic equation λ2+(a+d)λ+1 = 0. Hence 2 cos θ = a+d, an integer. Thus θ = 0, π/2, π. ± ± Hence λ = 1, i, and is a root of unity and so T cannot be ergodic. ± ± Now suppose k = 3. Then, assuming that A has an eigenvalue of modulus 1, the eigenvalues must be λ = e2πiθ, λ¯ and µ R. As det A = 1, we must have that ∈ λλµ¯ = 1. As λλ¯ = 1, it follows that µ = 1, Hence A has 1 as an eigenvalue and so T cannot be ergodic. Thus k 4. ≥ (ii) A has integer entries and it is easy to see that det A = 1. Hence A determines a linear toral automorphism of R4/Z4.

(iii) It is straightforward to calculate that the characteristic equation for A is

λ4 8λ3 + 6λ2 8λ +1=0. − − Clearly, λ = 0. Dividing by λ2 and substituting u = λ + λ−1 we see that 6 u2 8u +4=0. − Hence u = 4 2√3. ± Multiplying λ + λ−1 = u by λ we obtain a quadratic in λ with solution

u √u2 4 λ = ± − . 2 Substituting the two different values of u gives four values of λ, namely:

2+ √3 6 + 4√3, 2 √3 i 4√3 6. ± − ± − q q The first two are real and not of unit modulus, whereas the second two are complex numbers of unit modulus.

121 MATH4/61112 12. Solutions

(iv) This question is not part of the course and is included for completeness only. The solution requires ideas from Galois theory. We first claim that f(λ) = λ4 8λ3 + 6λ2 6λ + 1 is irreducible over Q. (To see − − this, recall that irreducibility over Q is equivalent to irreducibility over Z. Consider the polynomial f(λ +1) = λ4 4λ3 12λ2 14λ 6 and apply Eisenstein’s criterion − − − − using the prime 2 to see that f(λ + 1) is irreducible over Z. Hence f(λ) is irreducible over Z.) Hence λ4 8λ3 + 6λ2 6λ + 1 has no common factors with λn 1 for any − − − n. Hence λ is not a root of unity.

Solution 6.1 (i) Let µ denote Lebesgue measure. We prove that T∗µ = µ by using the Hahn- Kolmogorov Extension Theorem. It is sufficient to prove that T∗µ([a, b]) = µ([a, b]) for all intervals [a, b]. Note that a b b a T −1[a, b]= , 1 , 1 . 2 2 ∪ − 2 − 2     Hence b a a b T µ([a, b]) = + 1 1 = b a = µ([a, b]). ∗ 2 − 2 − 2 − − 2 −     Hence µ is a T -invariant measure.

(ii) Define I(0) = [0, 1/2], I(1) = [1/2, 1] and define the maps x x φ : [0, 1] I(0) : x , φ : [0, 1] I(1) : x 1 . 0 → 7→ 2 1 → 7→ − 2

Then T φ0(x)= x, T φ1(x)= x. Given i ,...,i 0, 1 define 0 n−1 ∈ { }

φi0,i1,...,in 1 = φi0 φi1 φin 1 − · · · − n and note that T φi0,i1,...,in 1 (x)= x for all x [0, 1]. − ∈ Define

I(i0, i1,...,in−1)= φi0,i1,...,in 1 ([0, 1]) − and call this a cylinder of rank n. It is easy to see that cylinders of rank n are dyadic intervals (although the labelling of these cylinders is not the same as the labelling that one gets when using the doubling map: for example, for the tent map I(1, 1) = [1/2, 3/4] whereas for the doubling map I(1, 1) = [3/4, 1]). Hence the algebra of finite unions of cylinders generates the Borel σ-algebra. A −1 −n Let B be such that T B = B. Note that T B = B. Let I = I(i0, i1,...,in−1) ∈B n be a cylinder of rank n and let φ = φi0,i1,...,in 1 . Then T φ(x) = x. Note also that µ(I) = 1/2n. We will also need the fact that φ−′(x) = 1/2n (this follows from noting | | that φ′ (x) = φ′ (x) = 1/2 and using the chain rule). | 0 | | 1 | Finally, we observe that

µ(B I) = χ (x) dx ∩ B∩I Z = χB(x)χI (x) dx Z 122 MATH4/61112 12. Solutions

= χB(x) dx I Z 1 ′ = χB(φ(x)) φ (x) dx by the change of variables formula 0 | | Z 1 ′ −n n = χT − B(φ(x)) φ (x) dx as T B = B 0 | | Z 1 n ′ = χB(T (φ(x))) φ (x) dx 0 | | Z 1 = χ (x) φ′(x) dx as T nφ(x)= x B | | Z0 1 1 = χ (x) as φ′(x) = 1/2n 2n B | | Z0 = µ(I)µ(B) as µ(I) = 1/2n.

Hence µ(B I)= µ(B)µ(I) for all sets I in the algebra of cylinders. By Lemma 6.1.1 ∩ it follows that µ(B) = 0 or 1. Hence Lebesgue measure is an ergodic measure for T .

Solution 6.2 For each n 1 define I(n) = [1/(n + 1), 1/n] and define the maps ≥ x n φ : [0, 1] I(n) : x − . n → 7→ n(n + 1)

Note that T φ (x)= x for all x [0, 1]. n ∈ Given i , i ,...,i N define 0 1 n−1 ∈

φi0,i1,...,in 1 = φi0 φi1 φin 1 − · · · − n and note that T φi0,i1,...,in 1 (x)= x for all x [0, 1]. − ∈ Define I(i0, i1,...,in−1) = φi0,i1,...,in 1 ([0, 1]) and call this a cylinder of rank n. Note that − 1 1 φ′ (x)= n n(n + 1) ≤ 2 so that, by the chain rule,

n−1 1 1 φ′ (x)= . i0,i1,...,in 1 n − ij(ij + 1) ≤ 2 jY=0

By the Intermediate Value Theorem, I(i0, i1,...,in−1) is an interval of length no more than 1/2n. For each n, the cylinders of rank n partition [0, 1]. Let x,y [0, 1] and suppose that ∈ x = y. Choose n such that x y > 1/2n. Then x,y must lie in different cylinders of rank 6 | − | n. Hence the cylinders separate the points of [0, 1]. By Proposition 2.4.2 it follows that the algebra of finite unions of cylinders generates the Borel σ-algebra. A −1 −n Let B be such that T B = B. Note that T B = B. Let I = I(i0, i1,...,in−1) ∈ B n be a cylinder of rank n and let φ = φi0,i1,...,in 1 . Then T φ(x)= x. Note that − n−1 1 µ(I)= = φ′(x) (12.0.5) ij(ij + 1) jY=0 123 MATH4/61112 12. Solutions for any x [0, 1]. ∈ Finally, we observe that

µ(B I) = χ (x) dx ∩ B∩I Z = χB(x)χI (x) dx Z = χB(x) dx I Z 1 ′ = χB(φ(x)) φ (x) dx by the change of variables formula 0 | | Z 1 ′ −n n = χT − B(φ(x)) φ (x) dx as T B = B 0 | | Z 1 n ′ = χB(T (φ(x))) φ (x) dx 0 | | Z 1 = χ (x) φ′(x) dx B | | Z0 = µ(I)µ(B) by (12.0.5).

Hence µ(B I) = µ(B)µ(I) for all sets I in the algebra of cylinders. By Lemma 6.1.1 it ∩ follows that µ(B) = 0 or 1. Hence Lebesgue measure is an ergodic measure for T .

Solution 6.3 (i) First note that 1 P1 1 x1 P2 = , 1 = = . x0 Q1 x + x0x1 + 1 Q2 0 x1

If we define P0 = 0,Q0 = 1 then we have that P2 = x1P1 + P0 and Q2 = x1Q1 + Q0. Similarly,

1 P1(x0; t) 1 x1 + t P2(x0,x1; t) = , 1 = = . x0 + t Q1(x0; t) x + x0x1 +1+ t Q2(x0,x1; t) 0 x1+t then P2(x0,x1; t)= P2 + tP1, Q2(x0,x1; t)= Q2 + tQ1.

Suppose that Pn(x0,...,xn−1)= Pn + tPn−1, Qn(x0,...,xn−1)= Qn + tQn−1. Then

Pn+1(x0,x1,...,xn; t) = [x0,...,xn−1,xn + t] Qn+1(x0,x1,...,xn; t) 1 = x ,...,x + 0 n−1 x + t  n  1 Pn(x0,x1,...,xn−1; ) = xn+t Q (x ,x ,...,x ; 1 ) n 0 1 n−1 xn+t 1 Pn + Pn−1 = xn+t Q + 1 Q n xn+t n−1 x P + P + tP = n n n−1 n . xnQn + Qn−1 + tQn

124 MATH4/61112 12. Solutions

Hence

Pn+1(x0,x1,...,xn; t)= xnPn+Pn−1+tPn, Qn+1(x0,x1,...,xn; t)= xnQn+Qn−1+tQn. Putting t = 0 we obtain the recurrence relations

Pn+1 = xnPn + Pn+1, Qn+1 = xnQn + Qn+1. Hence

Pn+1(x0,x1,...,xn; t)= Pn+1 + tPn, Qn+1(x0,x1,...,xn; t)= Qn+1 + tQn. By induction, the recurrence relations hold. (ii) Note that Q P Q P = (x Q + Q )P Q (x P + P ) n n−1 − n−1 n n−1 n−1 n−2 n−1 − n−1 n−1 n−1 n−2 = (Q P Q P )= = ( 1)n. − n−1 n−2 − n−2 n−1 · · · − Solution 6.4 (i) Let x = (x ,x ,...), y = (y ,y ,...) Σ. Let d and d denote the usual metrics 0 1 0 1 ∈ R/Z Σ on R/Z and Σ, respectively. Now

dR/Z(π(x),π(y)) (12.0.6) π(x ,x ,...) π(y ,y ,...) ≤ | 0 1 − 0 1 | x y x y = 0 − 0 + 1 − 1 + 2 22 · · ·

x y x y | 0 − 0| + | 1 − 1| + . (12.0.7) ≤ 2 22 · · · n Now if dΣ(x, y) < 1/2 then xj = yj for j = 0,...n. Hence we can bound the right-hand side of (12.0.7) by x y x y 1 1 | n+1 − n+1| + | n+2 − n+2| + + + 2n+2 2n+3 ··· ≤ 2n+2 2n+3 · · · 1 1 1 = 1+ + + 2n+2 2 22 · · ·   1 , ≤ 2n+1 summing the geometric progression. This implies that π is continuous. To see this, n+1 n let ε > 0. Choose n such that 1/2 < ε. Choose δ = 1/2 . If dΣ(x, y) < δ then dR/Z(π(x),π(y)) < ε. (ii) Observe that if x = (x )∞ Σ then j j=0 ∈ x x π(σ(x)) = π(σ(x ,x ,...)) = π(x ,x ,...)= 1 + 2 + 0 1 1 2 2 22 · · · and

T (π(x)) = T (π(x0,x1,...)) x x = T 0 + 1 + 2 22 · · ·  x1 x2  = x + + + mod 1 0 2 22 · · · x x = 1 + 2 + . 2 22 · · ·

125 MATH4/61112 12. Solutions

(iii) We must show that T∗(π∗µ)= π∗µ. To see this, observe that −1 T∗(π∗µ)(B) = π∗µ(T B) = µ(π−1T −1B) = µ(σ−1π−1B) as πσ = Tπ −1 = (σ∗µ)(π B) = µ(π−1B) as µ is σ-invariant

= (π∗µ)(B).

(iv) Suppose that µ is an ergodic measure for σ. We claim that π∗µ is an ergodic measure for T , i.e. if B (R/Z) is such that T −1B = B then π µ(B) = 0 or 1. ∈B ∗ First observe that π−1(B) is σ-invariant. This follows as:

σ−1(π−1(B)) = π−1T −1(B)= π−1(B).

As µ is an ergodic measure for σ, we must have that µ(π−1(B)) = 0 or 1. Hence π∗µ(B) = 0 or 1.

(v) There are uncountably many different Bernoulli measures µp for Σ given by the family of probability vectors (p, 1 p). These are ergodic for σ. To see that π∗µp are all − −1 different, notice that π∗µp([0, 1/2)) = µp(π [0, 1/2)) = µp([0]) = p, where [0] denotes the cylinder consisting of all sequences that start with 0.

Solution 7.1 Suppose that x x. We must show that δ ⇀ δ . Let f C(X, R). Then n → xn x ∈ f dδ = f(x ) f(x)= f dδ xn n → x Z Z as f is continuous. Hence δxn ⇀ δx.

Solution 7.2 Suppose that µ ⇀µ. We must show that T µ ⇀ T µ. Let f C(X, R). Then n ∗ ∗ ∈

lim f d(T∗µn) = lim f T dµn = f T dµ = f d(T∗µ) n→∞ n→∞ ◦ ◦ Z Z Z Z as f T is continuous. Hence T µ ⇀ T µ as n . ◦ ∗ n ∗ →∞ Solution 7.3 (i) Suppose that µ µ as n . We claim that µ ⇀ µ. To show this, we have to n → → ∞ n prove that if f C(X, R) then f dµ f dµ as n . ∈ n → →∞ Let f C(X, R). Note that f/ f C(X, R) and that (f/ f ) = 1. Hence ∈ kR k∞ ∈ R k k k∞ k∞ f f f dµ f dµ = f dµ dµ n − k k∞ f n − f Z Z Z ∞ Z ∞ k k k k

sup g dµn g dµ ≤ g∈C(X,R),kgk ≤1 − ∞ Z Z

= µn µ , k − k which tends to 0 as n . →∞

126 MATH4/61112 12. Solutions

(ii) Suppose that x x but that x = x for all n. We claim that δ δ . Note that n → n 6 xn 6→ x

δxn δx = sup f(xn) f(x) . k − k f∈C(X,R),kfk ≤1 | − | ∞ For each n, we can choose a continuous function f C(X, R) such that f (x) = 1, n ∈ n f (x )=0 and f 1. Hence n n k nk∞ ≤

δxn δx = sup f(xn) f(x) . k − k f∈C(X,R),kfk ≤1 | − | ∞ f (x ) f (x) ≥ | n n − n | = 1. Hence δ δ . xn 6→ x (iii) First note that if f C(X, R) is any continuous function with f 1, then ∈ k k∞ ≤ f dδ f dδ = f(x) f(y) f(x) + f(y) 2. x − y | − | ≤ | | | |≤ Z Z

Hence

δx δy = sup f dδx f dδy 2. k − k f∈C(X,R),kfk ≤1 − ≤ ∞ Z Z

Conversely, by Urysohn’s Lemma, there exist continuous functions g1, g2 such that g (x) = g (y) = 1, g (y) = g (x) = 0 and 0 g , g 1. Let h = g g . Then 1 2 1 2 ≤ 1 2 ≤ 1 − 2 h(x) = 1, h(y)= 1 and 1 h 1 (so that h = 1). Hence − − ≤ ≤ k k∞ 2 = h(x) h(y) | − | = h dδ h dδ x − y Z Z

sup f dδx f dδy ≤ f∈C(X,R),kfk ≤1 − ∞ Z Z

= δx δy . k − k Hence if x = y then δ δ = 2. 6 k x − yk Suppose that X is infinite. Let x X be pairwise distinct and consider the sequence n ∈ µ = δ . Then µ µ = 2 if n = m. Hence µ cannot have a convergent n xn k n − mk 6 n subsequence, and so M(X) is not compact in the strong topology when X is infinite.

Solution 7.4 Let x X be a sequence such that x x and x = x for all n. Let µ = δ and µ = δ . n ∈ n → n 6 n xn x Then µ ⇀µ. Take B = x . Then µ (B)=0 but µ(B) = 1. Hence µ (B) µ(B). n { } n n 6→ Solution 7.5 Let µ ,µ M(X, T ) and suppose that α [0, 1]. Then αµ + (1 α)µ M(X). To 1 2 ∈ ∈ 1 − 2 ∈ check that αµ + (1 α)µ M(X, T ), note that 1 − 2 ∈ (T (αµ + (1 α)µ ))(B) = (αµ + (1 α)µ )(T −1B) ∗ 1 − 2 1 − 2 = αµ (T −1B) + (1 α)µ (T −1B) 1 − 2 = αµ (B) + (1 α)µ (B) 1 − 2 = (αµ + (1 α)µ )(B). 1 − 2

127 MATH4/61112 12. Solutions

Solution 7.6 Let C(X, R) be uniformly dense. Let f C(X, R). Let ε> 0. Choose g such that S ⊂ ∈ ∈ S f g < ε. Choose N such that if n N then g dµ g dµ < ε. Then k − k∞ ≥ | n − | R R f dµ f dµ n − Z Z

f dµ g dµ + g dµ g dµ + f dµ g dµ ≤ n − n n − − Z Z Z Z Z Z

f g dµ + g dµ g dµ + f g dµ ≤ | − | n n − | − | Z Z Z Z

3ε. ≤ As ε> 0 is arbitrary, the result follows.

Solution 7.7 (i) Recall that x Σ is a periodic point with period n if σn(x) = x. If x is periodic ∈ with period n then xj+n = xj for all j = 0, 1, 2,.... Hence x is determined by the n first n symbols, which then repeat. As there are two choices for each xj, there are 2 periodic points with period n.

(ii) First note that µn is a Borel probability measure. Let [i , i ,...,i ] be a cylinder. Let n m. Then the periodic points x of period 0 1 m−1 ≥ n in [i0, i1,...,im−1] have the form

x = (i0, i1,...,im−1,xm,...,xn−1, i0, i1,...,im−1,xm,...,xn−1, i0,...)

where the finite string of symbols i0, i1,...,im−1,xm,...,xn−1 repeats. The symbols n−m xm,...,xn−1 can be chosen arbitrarily. Hence there are 2 such periodic points. Hence, if n m, ≥ 1 n−m 1 χ[i0,i1,...,im 1] dµn = n 2 = m = χ[i0,i1,...,im 1] dµ. − 2 × 2 − Z Z

(iii) To prove that χ[i0,i1,...,im 1] is continuous we need to show that, if xn x then − → χ[i0,i1,...,im 1](xn) χ[i0,i1,...,im 1](x). − → − First suppose that x [i , i ,...,i ]. As x x, it follows from the definition of ∈ 0 1 m−1 n → the metric on Σ that there exists N N such that if n N then x and x agree in ∈ ≥ n the first m places. Hence if n N then x [i , i ,...,i ]. Hence, if n N, then ≥ n ∈ 0 1 m−1 ≥ χ[i0,i1,...,im 1](xn)=1= χ[i0,i1,...,im 1](x). − − Now suppose that x [i , i ,...,i ]. As x x, it follows from the definition of 6∈ 0 1 m−1 n → the metric on Σ that there exists N N such that if n N then x and x agree ∈ ≥ n in the first m places. Hence if n N, there exists j 0, 1,...,m 1 such that ≥ ∈ { − } (x ) = i ; that is, if n N, then x [i , i ,...,i ]. Hence, if n N, then n j 6 j ≥ n 6∈ 0 1 m−1 ≥ χ[i0,i1,...,im 1](xn)=0= χ[i0,i1,...,im 1](x). − −

Hence χ[i0,i1,...,im 1] is continuous. − (iv) Let denote the set of finite linear combinations of characteristic functions of cylin- S ders. By the Stone-Weierstrass Theorem, is uniformly dense in C(X, R). By (ii) S above, if g then g dµ g dµ as n . Let f C(X, R) and let ε > 0. ∈ S n → → ∞ ∈ Choose g such that f g < ε. Then a 3ε argument as in the solutions to ∈ S R k − kR∞ Exercise 7.6 proves that lim sup f dµ f dµ < 3ε and the result follows. n→∞ | n − | R R 128 MATH4/61112 12. Solutions

Solution 7.8 As trigonometric polynomials are uniformly dense in C(X, R), it is sufficient to prove that (j) g T dµ = g dµ for all trigonometric polynomials g. Let g(x) = r c e2πihn ,xi, ◦ j=0 j (j) (j) (j) (j) 3 Rcj R, n =R (n1 ,n2 ,n3 ) Z be a trigonometric polynomial. We labelP the coefficients ∈ (j) ∈ so that n = 0 if and only if j = 0. Then g dµ = c0. Note that R x α + x g T y + Z3 = g x + y + Z3 ◦       z y + z    r   2πih(n(j),n(j),n(j)),(α+x,x+y,y+x)i = cje 1 2 3 Xj=0 r (j) (j) (j) (j) 2πin α 2πi“n1 x+n2 (x+y)+n3 (y+z)” = cje 1 e Xj=0 r 2πin(j)α 2πih(n(j)+n(j),n(j)+n(j),n(j)),(x,y,z)i = cje 1 e 1 2 2 3 3 . Xj=0 Hence r 2πin(j)α 2πih(n(j)+n(j),n(j)+n(j),n(j)),(x,y,z)i g T dµ = c e 1 e 1 2 2 3 3 dµ ◦ j Z Z Xj=0 r 2πin(j)α 2πih(n(j)+n(j),n(j)+n(j),n(j)),(x,y,z)i = cje 1 e 1 2 2 3 3 dµ. Xj=0 Z (j) (j) (j) (j) (j) The integral is equal to zero unless (n1 + n2 ,n2 + n3 ,n3 ) = (0, 0, 0), i.e. unless (j) (j) (j) n1 = n2 = n3 = 0. By our choice of labelling the coefficients, this only happens if j = 0. Hence g T dµ = c = g dµ. ◦ 0 Z Z Solution 8.1 (i) Let B and let f = χ . Note that ∈B B dν dν dν f dν = χ dν = dν = ν(B)= dµ = χ dµ = f dµ. B dµ B dµ dµ Z Z ZB ZB Z Z Hence the result holds for characteristic functions, hence for simple functions (finite linear combinations of characteristic functions). Let f L1(X, ,µ) be such that ∈ B f 0. By considering an increasing sequence of simple functions, the result follows ≥ for positive L1 functions. By splitting an arbitrary real-valued L1 function into its positive and negative parts, and then an arbitrary L1(X, ,µ) function into its real B and imaginary parts, the result holds.

Solution 8.1
(i) Let B ∈ B and let f = χ_B. Note that

∫ f dν = ∫ χ_B dν = ν(B) = ∫_B (dν/dµ) dµ = ∫ χ_B (dν/dµ) dµ = ∫ f (dν/dµ) dµ.

Hence the result holds for characteristic functions, hence for simple functions (finite linear combinations of characteristic functions). Let f ∈ L¹(X, B, µ) be such that f ≥ 0. By considering an increasing sequence of simple functions, the result follows for positive L¹ functions. By splitting an arbitrary real-valued L¹ function into its positive and negative parts, and then an arbitrary L¹(X, B, µ) function into its real and imaginary parts, the result holds in general.

(ii) Now dν_1/dµ, dν_2/dµ are the unique functions such that

ν_1(B) = ∫_B (dν_1/dµ) dµ,   ν_2(B) = ∫_B (dν_2/dµ) dµ,

respectively. Hence

ν_1(B) + ν_2(B) = ∫_B (dν_1/dµ) dµ + ∫_B (dν_2/dµ) dµ = ∫_B (dν_1/dµ + dν_2/dµ) dµ.

However,

(ν_1 + ν_2)(B) = ∫_B (d(ν_1 + ν_2)/dµ) dµ.

Hence, by uniqueness in the Radon-Nikodym theorem, we have that

d(ν_1 + ν_2)/dµ = dν_1/dµ + dν_2/dµ.

(iii) Suppose that µ(B) = 0. As ν ≪ µ, we have ν(B) = 0. As λ ≪ ν, we have λ(B) = 0. Hence λ ≪ µ.
Now as λ ≪ µ we have

λ(B) = ∫_B (dλ/dµ) dµ.

As λ ≪ ν we have

λ(B) = ∫_B (dλ/dν) dν = ∫ χ_B (dλ/dν) dν.

By part (i), using the fact that ν ≪ µ, it follows that

∫ χ_B (dλ/dν) dν = ∫ χ_B (dλ/dν)(dν/dµ) dµ = ∫_B (dλ/dν)(dν/dµ) dµ.

Hence by uniqueness in the Radon-Nikodym theorem,

dλ/dµ = (dλ/dν)(dν/dµ).
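For a concrete illustration of (iii) (this example is my own and is not in the notes): take X = [0, 1] with µ equal to Lebesgue measure, let ν be the measure with dν/dµ(x) = 2x and let λ be the measure with dλ/dν(x) = (3/2)x. Then for any Borel set B ⊂ [0, 1],

λ(B) = ∫_B (3/2)x dν = ∫_B (3/2)x · 2x dµ = ∫_B 3x² dµ,

so dλ/dµ(x) = 3x² = (dλ/dν)(x)(dν/dµ)(x), exactly as the chain rule predicts.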

Solution 8.2
The claimed formula is easily seen to be valid for n = 3. Suppose the formula is valid for n. Then, writing C(n, k) for the binomial coefficient,

T^{n+1}((x, y, z) + Z³) = T(T^n((x, y, z) + Z³))
 = T((C(n,1)α + x, C(n,2)α + C(n,1)x + y, C(n,3)α + C(n,2)x + C(n,1)y + z) + Z³)
 = ((C(n,1)α + x) + α, C(n,2)α + C(n,1)x + y + (C(n,1)α + x), C(n,3)α + C(n,2)x + C(n,1)y + z + (C(n,2)α + C(n,1)x + y)) + Z³
 = (C(n+1,1)α + x, C(n+1,2)α + C(n+1,1)x + y, C(n+1,3)α + C(n+1,2)x + C(n+1,1)y + z) + Z³.

Hence the claimed formula holds by induction.
Let f(x, y, z) = e^{2πi(kx + ℓy + mz)}. Then

(1/n) Σ_{j=0}^{n−1} f(T^j((x, y, z) + Z³)) = (1/n) Σ_{j=0}^{n−1} e^{2πi p_{x,y,z}^{(k,ℓ,m)}(j)}

where p_{x,y,z}^{(k,ℓ,m)}(n) is a polynomial. When m ≠ 0, p_{x,y,z}^{(k,ℓ,m)}(n) is a degree 3 polynomial with leading coefficient mα/6 ∉ Q. When m = 0, ℓ ≠ 0, p_{x,y,z}^{(k,ℓ,m)}(n) is a degree 2 polynomial with leading coefficient ℓα/2 ∉ Q. When m = ℓ = 0, k ≠ 0, p_{x,y,z}^{(k,ℓ,m)}(n) is a degree 1 polynomial with leading coefficient kα ∉ Q. In all three cases, p_{x,y,z}^{(k,ℓ,m)}(n) is uniformly distributed mod 1, by Weyl's Theorem on Polynomials (Theorem 2.3.1). Hence by Weyl's Criterion (Theorem 1.2.1), for all (k, ℓ, m) ∈ Z³ \ {(0, 0, 0)} we have

(1/n) Σ_{j=0}^{n−1} e^{2πi p_{x,y,z}^{(k,ℓ,m)}(j)} → 0

as n → ∞. When k = ℓ = m = 0 we trivially have that

(1/n) Σ_{j=0}^{n−1} e^{2πi p_{x,y,z}^{(k,ℓ,m)}(j)} = (1/n) Σ_{j=0}^{n−1} 1 → 1

as n → ∞. Hence

(1/n) Σ_{j=0}^{n−1} f(T^j((x, y, z) + Z³)) → ∫ f dµ

whenever f(x, y, z) = e^{2πi(kx + ℓy + mz)}. By taking finite linear combinations of exponential functions we see that

sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ| → 0

as n → ∞ for all trigonometric polynomials g. By the Stone-Weierstrass Theorem (Theorem 1.2.2), trigonometric polynomials are uniformly dense in C(X, R). Let f ∈ C(X, R) and let ε > 0. Then there exists a trigonometric polynomial g such that ||f − g||_∞ < ε. Hence for any x ∈ X we have

|(1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ|
 ≤ |(1/n) Σ_{j=0}^{n−1} (f(T^j(x)) − g(T^j(x)))| + |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ| + |∫ g dµ − ∫ f dµ|
 ≤ (1/n) Σ_{j=0}^{n−1} |f(T^j(x)) − g(T^j(x))| + |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ| + ∫ |g − f| dµ
 ≤ 2ε + |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ|.

Hence, taking the supremum over all x ∈ X, we have

sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ| ≤ 2ε + sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} g(T^j(x)) − ∫ g dµ|.

Letting n → ∞ we see that

lim sup_{n→∞} sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ| ≤ 2ε.

As ε > 0 is arbitrary, it follows that

lim_{n→∞} sup_{x∈X} |(1/n) Σ_{j=0}^{n−1} f(T^j(x)) − ∫ f dµ| = 0.

Hence statement (ii) in Oxtoby’s Ergodic Theorem holds. As (i) and (ii) in Oxtoby’s Ergodic Theorem are equivalent, it follows that T is uniquely ergodic and Lebesgue measure is the unique invariant measure.
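The closed formula for T^n can also be checked numerically. The sketch below (my own illustration, not part of the notes; it assumes Python 3.8+ for math.comb, and the choices of α and of the test point are arbitrary) compares direct iteration of T with the binomial-coefficient formula.

from math import comb
import random

def T(p, alpha):
    # the skew product T(x, y, z) = (alpha + x, x + y, y + z) mod 1
    x, y, z = p
    return ((alpha + x) % 1.0, (x + y) % 1.0, (y + z) % 1.0)

def T_iterated(p, alpha, n):
    for _ in range(n):
        p = T(p, alpha)
    return p

def T_closed_form(p, alpha, n):
    # the claimed formula: the coordinates of T^n pick up binomial coefficients C(n, k)
    x, y, z = p
    return ((comb(n, 1) * alpha + x) % 1.0,
            (comb(n, 2) * alpha + comb(n, 1) * x + y) % 1.0,
            (comb(n, 3) * alpha + comb(n, 2) * x + comb(n, 1) * y + z) % 1.0)

random.seed(1)
alpha = 2 ** 0.5
p = tuple(random.random() for _ in range(3))
for n in (1, 2, 5, 10):
    a, b = T_iterated(p, alpha, n), T_closed_form(p, alpha, n)
    # agreement up to rounding (the second test allows wrap-around at 0 = 1 mod 1)
    assert all(abs(u - v) < 1e-8 or abs(abs(u - v) - 1.0) < 1e-8 for u, v in zip(a, b))
print("closed form agrees with direct iteration")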

Solution 8.3
Let T be a uniquely ergodic homeomorphism with unique invariant measure µ. Suppose that every orbit is dense. Let U be a non-empty open set. Then for all x ∈ X there exists n ∈ Z such that T^n(x) ∈ U. Hence X = ⋃_{n=−∞}^{∞} T^{−n}U. Hence

1 = µ(X) = µ(⋃_{n=−∞}^{∞} T^{−n}U) ≤ Σ_{n=−∞}^{∞} µ(T^{−n}U) = Σ_{n=−∞}^{∞} µ(U)

as µ is T-invariant. Hence µ(U) > 0.
Conversely, suppose that µ(U) > 0 for all non-empty open sets U. Suppose for a contradiction that there exists x_0 ∈ X such that the orbit of x_0 is not dense. Clearly {T^n(x_0) | n ∈ Z} is T-invariant. As T is continuous, the set

Y = cl{T^n(x_0) | n ∈ Z}

is also T-invariant. As the orbit of x_0 is not dense, Y is a proper subset of X. As Y is closed and X is compact, it follows that Y is compact. By Theorem 7.5.1 there exists an invariant probability measure ν for the map T : Y → Y. Extend ν to X by setting ν(B) = ν(B ∩ Y) for Borel subsets B ⊂ X. Noting that X \ Y is also T-invariant, it follows that ν is an invariant measure for T : X → X. This contradicts unique ergodicity, as ν(X \ Y) = 0 but µ(X \ Y) > 0.

Solution 9.1
Let X = R equipped with the Borel σ-algebra and Lebesgue measure. Define T(x) = x + 1. Then Lebesgue measure is T-invariant. Take A = [0, 1). Then A has positive measure, but no point of A returns to A under T.

Solution 9.2
Take X = {0, 1} to be a set consisting of two elements. Let B be the set of all subsets of X and equip X with the measure µ = (1/2)δ_0 + (1/2)δ_1 that assigns measure 1/2 to both 0 and 1. Take T(x) = x to be the identity. Then T is a measure-preserving transformation. Let A = {0}, B = {1}. Then µ(A) = µ(B) = 1/2 > 0. However, T^j(0) never lands in B.

Solution 9.3
Recall that E(f | A) is determined as being the unique A-measurable function such that

∫_A E(f | A) dµ = ∫_A f dµ

for all A ∈ A.

(i) We need to show that

E(αf + βg | A) = αE(f | A) + βE(g | A).

Note that αE(f | A) + βE(g | A) is A-measurable. Moreover, as

∫_A (αE(f | A) + βE(g | A)) dµ = α ∫_A E(f | A) dµ + β ∫_A E(g | A) dµ
 = α ∫_A f dµ + β ∫_A g dµ
 = ∫_A (αf + βg) dµ
 = ∫_A E(αf + βg | A) dµ

for all A ∈ A, the claim follows.

(ii) First note that E(f | A) ∘ T is T^{−1}A-measurable. To see this, note that E(f | A) is A-measurable, i.e.

{x ∈ X | E(f | A)(x) ≤ c} ∈ A for all c ∈ R.

Hence

{x ∈ X | E(f | A)(Tx) ≤ c} = T^{−1}{x ∈ X | E(f | A)(x) ≤ c} ∈ T^{−1}A


so that E(f | A) ∘ T is T^{−1}A-measurable.
Note that for any A ∈ A

∫_{T^{−1}A} E(f | A) ∘ T dµ = ∫ χ_{T^{−1}A} · (E(f | A) ∘ T) dµ
 = ∫ (χ_A ∘ T) · (E(f | A) ∘ T) dµ
 = ∫ χ_A E(f | A) dµ   (as µ is T-invariant)
 = ∫_A E(f | A) dµ
 = ∫_A f dµ.

Moreover

∫_{T^{−1}A} E(f ∘ T | T^{−1}A) dµ = ∫_{T^{−1}A} f ∘ T dµ
 = ∫ χ_{T^{−1}A} · (f ∘ T) dµ
 = ∫ (χ_A ∘ T) · (f ∘ T) dµ
 = ∫ χ_A f dµ
 = ∫_A f dµ.

Hence

∫_{T^{−1}A} E(f ∘ T | T^{−1}A) dµ = ∫_{T^{−1}A} E(f | A) ∘ T dµ

for all A ∈ A. By the characterisation of conditional expectation, it follows that

E(f ∘ T | T^{−1}A) = E(f | A) ∘ T.

(iii) That E(f | B) = f is immediate from the above characterisation of conditional expectation.

(iv) Recall that a function f : X → R is A-measurable if f^{−1}(−∞, c) ∈ A for all c ∈ R.
Suppose that f is N-measurable. Let B_c = f^{−1}(−∞, c) ∈ N. Hence µ(B_c) = 0 or 1. Note that c_1 < c_2 implies B_{c_1} ⊂ B_{c_2}. Hence there exists c_0 such that

c_0 = sup{c | µ(B_c) = 0} = inf{c | µ(B_c) = 1}.

We claim that f(x) = c_0 µ-a.e. If c < c_0 then µ({x ∈ X | f(x) < c}) = 0. Hence f(x) ≥ c_0 µ-a.e. Let c > c_0. Then

µ({x ∈ X | f(x) ≥ c}) = µ(X \ {x ∈ X | f(x) < c}) = 1 − µ({x ∈ X | f(x) < c}) = 0.

Hence µ({x ∈ X | f(x) > c_0}) = 0. Hence f(x) = c_0 µ-a.e.


Suppose that f is constant almost everywhere, say f(x) = a µ-a.e. Then f^{−1}(−∞, c) = ∅ µ-a.e. if c < a and f^{−1}(−∞, c) = X µ-a.e. if c > a. Hence µ(f^{−1}(−∞, c)) = 0 or 1. Hence f^{−1}(−∞, c) ∈ N for all c ∈ R. Hence f is N-measurable.
If N ∈ N has measure 0 then

∫_N f dµ = 0 = ∫_N (∫ f dµ) dµ

and if N ∈ N has measure 1 then

∫_N f dµ = ∫ f dµ = ∫_N (∫ f dµ) dµ.

Hence E(f | N) = ∫ f dµ.

Solution 9.4
(i) Let α = {A_1, ..., A_n} be a finite partition of X into sets A_j ∈ B and let A be the set of all finite unions of sets in α. Trivially ∅ ∈ A.
Let B_j = ⋃_{i=1}^{ℓ_j} A_{i,j}, A_{i,j} ∈ α, be a countable collection of finite unions of sets in α. Then ⋃_j B_j is a union of sets in α. As there are only finitely many sets in α, we have that ⋃_j B_j is a finite union of sets in α. Hence ⋃_j B_j ∈ A.
It is clear that A is closed under taking complements.
Hence A is a σ-algebra.

(ii) Recall that g : X → R is A-measurable if g^{−1}(−∞, c) ∈ A for all c ∈ R.
Suppose that g is constant on each A_j ∈ α and write g(x) = Σ_j c_j χ_{A_j}(x). Then g^{−1}(−∞, c) = ⋃ A_j, where the union is taken over those sets A_j for which c_j < c. Hence g is A-measurable.
Conversely, suppose that g is A-measurable. For each c ∈ R, let A_c = g^{−1}(−∞, c). Then A_c ∈ A. Moreover, A_c ↓ ∅ as c → −∞ (in the sense that ⋂_{c∈R} A_c = ∅) and A_c ↑ X as c → ∞ (in the sense that ⋃_c A_c = X). Let A ∈ α. Then there exists c_0 such that A ⊄ A_c for c < c_0 and A ⊂ A_c for c > c_0. Hence g(x) = c_0 for all x ∈ A. Hence g is constant on each element of α.

(iii) Define g by

g(x) = Σ_{j=1}^{n} χ_{A_j}(x) (∫_{A_j} f dµ) / µ(A_j).

Then g is constant on each set in α, hence g is A-measurable.
Let A_i ∈ α. Then

∫_{A_i} g dµ = Σ_{j=1}^{n} ∫ χ_{A_i} χ_{A_j} ((∫_{A_j} f dµ) / µ(A_j)) dµ = Σ_{j=1}^{n} ∫ χ_{A_i ∩ A_j} ((∫_{A_j} f dµ) / µ(A_j)) dµ = ∫_{A_i} f dµ.

Hence ∫_A g dµ = ∫_A f dµ for all A ∈ A. It follows that g = E(f | A).
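The formula in (iii) is completely explicit and can be illustrated numerically. The following sketch (my own illustration, not part of the exercise; the partition, the function f(x) = x² and the Riemann-sum approximation of the integrals are all assumptions) computes E(f | A) for a three-set partition of [0, 1) with µ equal to Lebesgue measure, and then checks the defining property ∫_A E(f | A) dµ = ∫_A f dµ on a union of partition elements.

N = 100_000
xs = [(i + 0.5) / N for i in range(N)]              # midpoints of a fine grid on [0, 1)
partition = [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)]  # the partition alpha

def f(x):
    return x * x

def average_on(a, b):
    # (1 / mu(A_j)) * integral of f over A_j, i.e. the value of E(f | A) on A_j
    pts = [x for x in xs if a <= x < b]
    return sum(f(x) for x in pts) / len(pts)

cond_exp = {cell: average_on(*cell) for cell in partition}
print(cond_exp)   # approximately 1/48, 7/48 and 7/12 on the three cells

# check the defining property on A = [0, 1/2), a union of two partition elements
lhs = sum(cond_exp[(a, b)] * (b - a) for (a, b) in partition[:2])  # integral of E(f|A) over A
rhs = sum(f(x) for x in xs if x < 0.5) / N                         # integral of f over A
print(lhs, rhs)   # both approximately 1/24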


Solution 9.5
Clearly ∅ ∈ I.
Let I ∈ I, so that T^{−1}(I) = I. Then T^{−1}(X \ I) = X \ I, so that the complement of I is in I.
Let I_n ∈ I. Then T^{−1}(⋃_n I_n) = ⋃_n T^{−1}I_n = ⋃_n I_n, so that ⋃_n I_n ∈ I.
Hence I is a σ-algebra.

Solution 9.6
Recall that E(f | I) is determined by the requirements that E(f | I) is I-measurable and that

∫_I E(f | I) dµ = ∫_I f dµ

for all I ∈ I. Let P_I : L²(X, B, µ) → L²(X, I, µ) denote the orthogonal projection onto the subspace of I-measurable functions. To show that P_I f = E(f | I) it is thus sufficient to check that for each I ∈ I we have

∫_I P_I f dµ = ∫_I f dµ

for all f ∈ L²(X, B, µ).
Note that ∫_I P_I f dµ = ∫ χ_I P_I f dµ = ⟨χ_I, P_I f⟩ and, similarly, ∫_I f dµ = ⟨χ_I, f⟩, where we use ⟨·, ·⟩ to denote the inner product on L²(X, B, µ). Hence it is sufficient to prove that, for all I ∈ I, ⟨χ_I, f − P_I f⟩ = 0.
It is proved in the proof of Theorem 9.6.1 that L²(X, B, µ) = L²(X, I, µ) ⊕ C where C denotes the norm-closure of the subspace {w ∘ T − w | w ∈ L²(X, B, µ)}. Hence it is sufficient to prove that ⟨χ_I, g⟩ = 0 for all g ∈ C. To see this, first note that for w ∈ L²(X, B, µ) we have that

⟨χ_I, w ∘ T − w⟩ = ⟨χ_I, w ∘ T⟩ − ⟨χ_I, w⟩
 = ⟨χ_{T^{−1}I}, w ∘ T⟩ − ⟨χ_I, w⟩
 = ⟨χ_I ∘ T, w ∘ T⟩ − ⟨χ_I, w⟩ = 0,

using the facts that I = T^{−1}I a.e. and that T is measure-preserving. It follows that ⟨χ_I, g⟩ = 0 for all g ∈ C.

Solution 10.1
Let T be an ergodic measure-preserving transformation of the probability space (X, B, µ) and let f ∈ L¹(X, B, µ). Let

S_n = Σ_{j=0}^{n−1} f(T^j x).

By Birkhoff's Ergodic Theorem, there exists a set N such that µ(N) = 0 and if x ∉ N then S_n/n → ∫ f dµ as n → ∞. Let x ∉ N. Note that

((n + 1)/n) (S_{n+1}/(n + 1)) = f(T^n x)/n + S_n/n.

Letting n → ∞ we have that (n + 1)/n → 1, S_{n+1}/(n + 1) → ∫ f dµ and S_n/n → ∫ f dµ. Hence if x ∉ N then f(T^n x)/n → 0 as n → ∞. Hence f(T^n x)/n → 0 as n → ∞ for µ-a.e. x ∈ X.


Solution 10.2
Let f ≥ 0 be measurable and suppose that ∫ f dµ = ∞. For each integer M > 0 define f_M(x) = min{f(x), M}. Then 0 ≤ f_M ≤ M, hence f_M ∈ L¹(X, B, µ). Moreover f_M(x) ↑ f(x) as M → ∞ for all x ∈ X. Hence by the Monotone Convergence Theorem (Theorem 3.1.2), ∫ f_M dµ → ∫ f dµ = ∞.
By Birkhoff's Ergodic Theorem, there exists N_M ⊂ X with µ(N_M) = 0 such that for all x ∉ N_M we have

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f_M(T^j x) = ∫ f_M dµ.   (12.0.8)

Let N = ⋃_{M=1}^{∞} N_M. Then µ(N) = 0. Moreover, for any M > 0 we have that if x ∉ N then (12.0.8) holds.
Let K ≥ 0 be arbitrary. As ∫ f_M dµ → ∞, it follows that there exists M > 0 such that ∫ f_M dµ ≥ K. Hence for all x ∉ N we have

lim inf_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) ≥ lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f_M(T^j x) = ∫ f_M dµ ≥ K.

As K is arbitrary, we have that for all x ∉ N

lim inf_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∞.

Hence (1/n) Σ_{j=0}^{n−1} f(T^j x) → ∞ for µ-a.e. x ∈ X.

Solution 10.3
We prove that (i) implies (ii). Suppose that T is an ergodic measure-preserving transformation of the probability space (X, B, µ). Recall from Proposition 10.2.2 that for all A, B ∈ B,

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} µ(T^{−j}A ∩ B) = µ(A)µ(B).

Equivalently, for all A, B ∈ B we have

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ χ_A(T^j x) χ_B(x) dµ = ∫ χ_A dµ ∫ χ_B dµ.   (12.0.9)

Let f(x) = Σ_{k=1}^{r} c_k χ_{A_k}(x) be a simple function. Then taking linear combinations of expressions of the form (12.0.9) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ f(T^j x) χ_B(x) dµ = ∫ f dµ ∫ χ_B dµ.

If f ≥ 0 is a positive measurable function then we can choose a sequence of simple functions f_n ↑ f that increase pointwise to f. By the Monotone Convergence Theorem (Theorem 3.1.2) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ f(T^j x) χ_B(x) dµ = ∫ f dµ ∫ χ_B dµ   (12.0.10)

for all positive measurable functions f. Suppose that f ∈ L¹(X, B, µ) is real-valued. Then by writing f = f⁺ − f⁻ where f⁺, f⁻ are positive, we have that (12.0.10) holds when f is integrable and real-valued. By taking real and imaginary parts of f, we have that (12.0.10) holds for all f ∈ L¹(X, B, µ).
By taking finite linear combinations of characteristic functions in (12.0.10) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ f(T^j x) g(x) dµ = ∫ f dµ ∫ g dµ   (12.0.11)

for all simple functions g. By taking an increasing sequence of simple functions and applying the Monotone Convergence Theorem as above, we have that (12.0.11) holds for all positive measurable functions g. By writing g = g⁺ − g⁻ where g⁺, g⁻ are positive, we have that (12.0.11) holds for any real-valued integrable function g. By taking real and imaginary parts, we have that (12.0.11) holds for any g ∈ L¹(X, B, µ).
We prove that (ii) implies (i). Suppose that for all f, g ∈ L²(X, B, µ) we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ f(T^j x) g(x) dµ = ∫ f dµ ∫ g dµ.

Suppose that T^{−1}B = B, B ∈ B. Then χ_B ∈ L²(X, B, µ). Taking f = g = χ_B we have that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} ∫ χ_B(T^j x) χ_B(x) dµ = ∫ χ_B dµ ∫ χ_B dµ = µ(B)².

Note that χ_B(T^j x) χ_B(x) = χ_{T^{−j}B ∩ B}(x) = χ_B(x) as T^{−j}B = B. Hence

(1/n) Σ_{j=0}^{n−1} ∫ χ_B(T^j x) χ_B(x) dµ = (1/n) Σ_{j=0}^{n−1} ∫ χ_B dµ = ∫ χ_B dµ = µ(B).

Hence µ(B) = µ(B)² so that µ(B) = 0 or 1.

Solution 10.4
Choose a countable dense set of continuous functions {f_i}_{i=1}^{∞} ⊂ C(X, R). By Birkhoff's Ergodic Theorem there exists Y_i ∈ B such that µ(Y_i) = 1 and

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f_i(T^j x) = ∫ f_i dµ

for all x ∈ Y_i. Let Y = ⋂_{i=1}^{∞} Y_i. Then Y ∈ B and µ(Y) = 1.
Let f ∈ C(X, R), ε > 0, x ∈ Y. Choose i such that ||f − f_i||_∞ < ε. Choose N such that if n ≥ N then

|(1/n) Σ_{j=0}^{n−1} f_i(T^j x) − ∫ f_i dµ| < ε.

Then

|(1/n) Σ_{j=0}^{n−1} f(T^j x) − ∫ f dµ|
 ≤ |(1/n) Σ_{j=0}^{n−1} (f(T^j x) − f_i(T^j x))| + |(1/n) Σ_{j=0}^{n−1} f_i(T^j x) − ∫ f_i dµ| + |∫ f_i dµ − ∫ f dµ|
 < 3ε.

As ε > 0 is arbitrary, we have that for all f ∈ C(X, R) and for all x ∈ Y,

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∫ f dµ.

Solution 10.5
Let S = {A, B, C, ..., Z} denote the finite set of letters (symbols) in the alphabet. Let Σ = {x = (x_j)_{j=0}^{∞} | x_j ∈ S, j = 0, 1, 2, ...} denote the space of all infinite sequences of symbols. For each s ∈ S, let p(s) = 1/26 denote the probability of choosing symbol s. Let B denote the Borel σ-algebra on Σ and equip Σ with the Bernoulli probability measure µ defined on cylinders by

µ([i_0, i_1, ..., i_{n−1}]) = p(i_0)p(i_1) ··· p(i_{n−1}).

Define σ : Σ → Σ by (σ(x))_j = x_{j+1}.
We regard an element x ∈ Σ as one possible outcome of the monkey typing an infinite sequence of letters. Let B denote the cylinder [M, O, N, K, E, Y]. Then µ(B) = 1/26^6 > 0. By Birkhoff's Ergodic Theorem, for µ-a.e. x ∈ Σ,

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} χ_B(σ^j(x)) = µ(B) > 0.

Hence, almost surely, the infinite sequence of letters x will contain 'MONKEY'. Hence, with probability 1, the monkey will type the word 'MONKEY'. (Indeed, with probability one he will type 'MONKEY' infinitely often.) By Kac's Lemma, the expected first time at which 'MONKEY' appears is 1/µ(B) = 26^6. If the monkey types 1 letter a second, then one would expect to wait 26^6 seconds (about 9.8 years) until 'MONKEY' first appears in a block of 6.
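The waiting-time arithmetic is easy to confirm. A minimal sketch (my own, not part of the notes), converting 26^6 one-second keystrokes into years:

seconds = 26 ** 6                      # expected waiting time from Kac's Lemma
years = seconds / (60 * 60 * 24 * 365.25)
print(seconds, round(years, 1))        # 308915776 seconds, roughly 9.8 years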

Solution 11.1
We first claim that for each integer b ≥ 2, T(x) = T_b(x) = bx mod 1 is ergodic with respect to Lebesgue measure µ (we already know that Lebesgue measure is invariant by Exercise 3.6). To see this, we use Fourier series, following the argument that was used to prove that the doubling map is ergodic with respect to Lebesgue measure.
Suppose that f ∈ L²(R/Z, B, µ) is such that f ∘ T = f µ-a.e. Then f ∘ T^p = f µ-a.e. for all p ∈ N. Associate to f its Fourier series Σ_{n=−∞}^{∞} c_n e^{2πinx}. Then f ∘ T^p has Fourier series Σ_{n=−∞}^{∞} c_n e^{2πin b^p x}. Comparing Fourier coefficients we see that c_{b^p n} = c_n. Suppose that n ≠ 0. Then |b^p n| → ∞ as p → ∞. By the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)), c_n = c_{b^p n} → 0 as p → ∞. Hence c_n = 0 if n ≠ 0. Hence f has Fourier series c_0, i.e. f is constant a.e. Hence T is ergodic with respect to Lebesgue measure.

Solution 11.2
(i) Let X_b = {x ∈ [0, 1) | x is simply normal in base b}.


Then for each b ≥ 2, X_b has Lebesgue measure µ(X_b) = 1. Hence

X_∞ = ⋂_{b=2}^{∞} X_b

consists of all numbers that are simply normal in every base b ≥ 2. Clearly µ(X_∞) = 1.

(ii) Let X^{(b)} denote the set of numbers that are normal in base b, b ≥ 2. Then ⋂_{b=2}^{∞} X^{(b)} consists of all normal numbers and clearly has Lebesgue measure 1.
Alternatively, note that x ∈ [0, 1] is simply normal in base b^k if and only if every word of length k occurs with frequency 1/b^k in the base b expansion of x. Hence a number is normal in every base if and only if it is simply normal in every base.

Solution 11.3
Let T(x) = rx mod 1, T : R/Z → R/Z. From Exercise 11.1 we know that T is ergodic with respect to Lebesgue measure µ. Let x ∈ [0, 1] and let x_n = r^n x. Then {x_n}, the fractional part of x_n, is equal to T^n x. Let ℓ ∈ Z \ {0} and let f_ℓ(x) = e^{2πiℓx}. Then there exists N_ℓ ∈ B, µ(N_ℓ) = 0, such that

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πiℓ x_j} = lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f_ℓ(T^j x) = ∫ f_ℓ(x) dx = 0

for all x ∉ N_ℓ.
Let N = ⋃_{ℓ∈Z\{0}} N_ℓ. As µ(N_ℓ) = 0 and this is a countable union, we have that µ(N) = 0. Hence if x ∉ N we have, for all ℓ ∈ Z \ {0},

lim_{n→∞} (1/n) Σ_{j=0}^{n−1} e^{2πiℓ x_j} = 0.

By Weyl's Criterion it follows that if x ∉ N then x_n is uniformly distributed mod 1.
(Aside: you might wonder why we had to use Weyl's Criterion and did not just use the definition of uniform distribution. Whilst it is certainly true that

(1/n) card{j ∈ {0, 1, ..., n−1} | {x_j} ∈ [a, b]} = (1/n) Σ_{j=0}^{n−1} χ_{[a,b]}(T^j x) → ∫ χ_{[a,b]} dµ = b − a

for µ-a.e. x ∈ X, the set of measure zero for which this fails depends on the interval [a, b]. We need a set of measure zero that works for all intervals. As there are uncountably many intervals, we cannot just take the union of all the sets of measure zero as we did above. One can make an argument along these lines work by considering intervals with rational endpoints (and so a countable collection of intervals) and then approximating an arbitrary interval.)

Solution 11.4
Let T(x) = 10x mod 1. From Exercise 11.1 we know that T is ergodic with respect to Lebesgue measure. Let x ∈ [0, 1] have decimal expansion

x = Σ_{j=0}^{∞} x_j / 10^{j+1}

with x_j ∈ {0, 1, ..., 9}. Let

f(x) = Σ_{k=0}^{9} k χ_{[k/10, (k+1)/10)}(x)

so that f(x) = k precisely when x_0 = k. Note that f(T^j x) = k precisely when x_j = k. Then

(1/n)(x_0 + x_1 + ··· + x_{n−1}) = (1/n) Σ_{j=0}^{n−1} f(T^j x).

Hence by Birkhoff's Ergodic Theorem,

lim_{n→∞} (1/n)(x_0 + x_1 + ··· + x_{n−1}) = lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j x) = ∫ f(x) dx a.e. = Σ_{k=0}^{9} k/10 a.e. = 4.5 a.e.
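This is easy to see experimentally. The sketch below (my own illustration, not part of the notes) draws the digits x_0, x_1, ... independently and uniformly from {0, ..., 9}, which is the same as choosing x according to Lebesgue measure; working with the digit sequence directly avoids the loss of precision that iterating T(x) = 10x mod 1 in floating point would cause.

import random

random.seed(0)
digits = [random.randrange(10) for _ in range(100_000)]
for n in (10, 1_000, 100_000):
    print(n, sum(digits[:n]) / n)   # the averages approach 4.5 as n grows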

Solution 11.5
If x ∈ [0, 1] then write the continued fraction expansion of x as [x_0, x_1, ...].
Define f : [0, 1] → R by

f(x) = Σ_{k=1}^{∞} k χ_{(1/(k+1), 1/k]}(x)

if x ∈ (0, 1] and set f(0) = 0. Then for x ∈ (0, 1] we have that f(x) = k precisely when 1/(k + 1) < x ≤ 1/k.