
Notes on ergodic theory

Michael Hochman1

January 27, 2013

¹ Please report any errors to [email protected]

Contents

1 Introduction

2 Measure preserving transformations
  2.1 Measure preserving transformations
  2.2 Recurrence
  2.3 Induced action on functions and measures
  2.4 Dynamics on metric spaces
  2.5 Some technicalities

3 Ergodicity
  3.1 Ergodicity
  3.2 Mixing
  3.3 Kac's return time formula
  3.4 Ergodic measures as extreme points
  3.5 Ergodic decomposition I
  3.6 Measure integration
  3.7 Measure disintegration
  3.8 Ergodic decomposition II

4 The ergodic theorem
  4.1 Preliminaries
  4.2 Mean ergodic theorem
  4.3 The pointwise ergodic theorem
  4.4 Generic points
  4.5 Unique ergodicity and circle rotations
  4.6 Sub-additive ergodic theorem

5 Some categorical constructions
  5.1 Isomorphism and factors
  5.2 Product systems
  5.3 Natural extension
  5.4 Inverse limits
  5.5 Skew products

6 Weak mixing
  6.1 Weak mixing
  6.2 Weak mixing as a multiplier property
  6.3 Isometric factors
  6.4 Eigenfunctions
  6.5 Spectral isomorphism and the Kronecker factor
  6.6 Spectral methods

7 Disjointness and a taste of entropy theory
  7.1 Joinings and disjointness
  7.2 Spectrum, disjointness and the Wiener-Wintner theorem
  7.3 Shannon entropy: a quick introduction
  7.4 Digression: applications of entropy
  7.5 Entropy of a stationary process
  7.6 Couplings, joinings and disjointness of processes
  7.7 Kolmogorov-Sinai entropy
  7.8 Application to filtering

8 Rohlin's lemma
  8.1 Rohlin lemma
  8.2 The group of automorphisms and residuality

9 Appendix
  9.1 The weak-* topology
  9.2 Conditional expectation
  9.3 Regularity

Preface

These are notes from an introductory course on ergodic theory given at the Hebrew University of Jerusalem in the fall semester of 2012. The course covers the usual basic subjects, though relatively little about entropy (a subject that was covered in a course the previous year). On the less standard side, we have included a discussion of Furstenberg disjointness.

Chapter 1

Introduction

At its most basic level, dynamical systems theory is about understanding the long-term behavior of a map T : X → X under iteration. X is called the phase space, and the points x ∈ X may be imagined to represent the possible states of the "system". The map T determines how the system evolves with time: time is discrete, and from state x the system transitions to state Tx in one unit of time. Thus if at time 0 the system is in state x, then the states at all future times t = 1, 2, 3, ... are determined: at time t = 1 it will be in state Tx, at time t = 2 in state T(Tx) = T^2 x, and so on; in general we define

T^n x = T ∘ T ∘ ... ∘ T(x)    (n-fold composition)

so T^n x is the state of the system at time n, assuming that at time zero it is in state x. The "future" trajectory of an initial point x is called the (forward) orbit, denoted

O_T(x) = {x, Tx, T^2 x, ...}

When T is invertible, y = T^{-1}x satisfies Ty = x, so it represents the state of the world at time t = -1, and we write T^{-n} = (T^{-1})^n = (T^n)^{-1}. Then one can also consider the full or two-sided orbit

O_T^±(x) = {T^n x : n ∈ Z}

There are many questions one can ask. Does a point x ∈ X necessarily return close to itself at some future time, and how often does this happen? If we fix another set A, how often does x visit A? If we cannot answer this for all points, we would like to know the answer at least for typical points. What is the behavior of pairs of points x, y ∈ X: do they come close to each other? Given another pair x', y', is there some future time when x is close to x' and y is close to y'? If f : X → R, how well does the value of f at time 0 predict its value at future times? How does randomness arise from deterministic evolution of time? And so on.
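As a small concrete illustration of iteration (our own sketch, not part of the notes), the finite forward orbit of a point can be computed for the doubling map Tx = 2x mod 1, which reappears in Example 2.1.5 below; the starting point 1/5 is an arbitrary choice, and exact rationals are used to avoid floating-point artifacts.

```python
from fractions import Fraction

def orbit(T, x, n):
    """Return the finite forward orbit [x, Tx, ..., T^(n-1) x]."""
    out = []
    for _ in range(n):
        out.append(x)
        x = T(x)
    return out

# Doubling map Tx = 2x mod 1, computed exactly with rationals.
T = lambda x: (2 * x) % 1

print(orbit(T, Fraction(1, 5), 6))
# the orbit of 1/5 is eventually periodic: 1/5, 2/5, 4/5, 3/5, 1/5, ...
```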


In the set-theoretic framework developed so far there is relatively little that can be said besides trivialities, but things become more interesting when more structure is given to X and T. For example, X may be a topological space and T continuous; or X may be a compact manifold and T a differentiable map (or k-times differentiable for some k); or there may be a measure on X which T preserves (we will give a precise definition shortly). The first of these settings is called topological dynamics, the second smooth dynamics, and the last is ergodic theory. Our main focus in this course is ergodic theory, though we will also touch on some subjects in topological dynamics.

One might ask why these various assumptions are natural ones to make. First, in many cases, all these structures are present. In particular a theorem of Liouville from celestial mechanics states that for Hamiltonian systems, e.g. systems governed by Newton's laws, all these assumptions are satisfied. Another example comes from the algebraic setting of flows on homogeneous spaces. At the same time, in some situations only some of these structures are available; an example can be found in the applications of ergodic theory to combinatorics, where there is no smooth structure in sight. Thus the study of these assumptions individually is motivated by more than mathematical curiosity.

In these notes we focus primarily on ergodic theory, which is in a sense the most general of these theories. It is also the one with the most analytical flavor, and a surprisingly rich theory emerges from fairly modest axioms. The purpose of this course is to develop some of these fundamental results. We will also touch upon some applications and connections with dynamics on compact metric spaces.

Chapter 2

Measure preserving transformations

2.1 Measure preserving transformations

Our main object of study is the following.

Definition 2.1.1. A measure preserving system is a quadruple X = (X, B, µ, T) where (X, B, µ) is a probability space, and T : X → X is a measurable, measure-preserving map; that is,

T^{-1}A ∈ B and µ(T^{-1}A) = µ(A) for all A ∈ B

If T is invertible and T^{-1} is measurable then T^{-1} satisfies the same conditions, and the system is called invertible.

Example 2.1.2. Let X be a finite set with the σ-algebra of all subsets and normalized counting measure µ, and T : X → X a bijection. This is a measure preserving system, since measurability is trivial and

µ(T^{-1}A) = |T^{-1}A| / |X| = |A| / |X| = µ(A)

This example is very trivial, but many of the phenomena we will encounter can already be observed (and usually are easy to prove) for finite systems. It is worth keeping this example in mind.

Example 2.1.3. The identity map on any probability space is measure preserving.

Example 2.1.4 (Circle rotation). Let X = S^1 with the Borel sets B and normalized length measure µ. Let α ∈ R and let R_α : S^1 → S^1 denote the rotation by angle α, that is, z ↦ e^{2πiα}z (if α ∉ Z then this map is not the identity). Then R_α preserves µ; indeed, it transforms intervals to intervals of equal length. If we consider the algebra of half-open intervals with endpoints

in Q[α], then R_α preserves this algebra and preserves the measure on it, hence it preserves the extension of the measure to B, which is µ.

This example is sometimes described as X = R/Z; then the map is written additively, x ↦ x + α.

This example has the following generalization: let G be a compact group with normalized Haar measure µ, fix g ∈ G, and consider R_g : G → G given by x ↦ gx. To see that µ(R_g^{-1}A) = µ(A), let ν(A) = µ(g^{-1}A), and note that ν is a Borel probability measure that is right invariant: for any h ∈ G,

ν(Bh) = µ(g^{-1}Bh) = µ(g^{-1}B) = ν(B)

Thus, by uniqueness of Haar measure, ν = µ.

Example 2.1.5 (Doubling map). Let X = [0, 1] with the Borel sets and Lebesgue measure, and let Tx = 2x mod 1. This map is onto but not 1-1; in fact every point has two pre-images, which differ by 1/2, except for 1, which is not in the image. To see that T preserves µ, note that for any interval I = [a, a+r) ⊆ [0, 1),

T^{-1}[a, a+r) = [a/2, (a+r)/2) ∪ [a/2 + 1/2, (a+r)/2 + 1/2)

which is the union of two intervals, each of half the length of I; the total length is unchanged. Note that TI is generally of larger length than I; the property of measure preservation is defined by µ(T^{-1}A) = µ(A). This example generalizes easily to T_a x = ax mod 1 for any integer a ≥ 2; for non-integer a > 1, Lebesgue measure is not preserved.

If we identify [0, 1) with R/Z then the example above coincides with the endomorphism x ↦ 2x of the compact group R/Z. More generally one can consider a compact group G with Haar measure µ and an endomorphism T : G → G. Then from uniqueness of Haar measure one can again show that T preserves µ.

Example 2.1.6 (Symbolic spaces and product measures). Let A be a finite set, |A| ≥ 2, which we think of as a discrete topological space. Let X⁺ = A^N and X = A^Z with the product σ-algebras. In both cases there is a map σ which shifts "to the right",

(σx)_n = x_{n+1}

In the case of X this is an invertible map (the inverse is (σ^{-1}x)_n = x_{n-1}). In the one-sided case X⁺ the shift is not 1-1, since for every sequence x = x_1 x_2 ... ∈ A^N we have

σ^{-1}(x) = {x_0 x_1 x_2 ... : x_0 ∈ A}

Let p be a probability measure on A and let µ = p^Z, µ⁺ = p^N be the product measures on X, X⁺, respectively. By considering the algebra of cylinder sets [a] = {x : x_i = a_i}, where a is a finite sequence of symbols, one may verify that σ preserves the measure.
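The verification on cylinder sets can be carried out concretely. The following sketch (ours; the two-letter alphabet and the marginal p are arbitrary choices) checks µ(σ^{-1}[a]) = µ([a]) for every length-3 cylinder of the one-sided shift, using the disjoint decomposition σ^{-1}[a] = ⋃_{b∈A} [ba].

```python
from itertools import product

A = (0, 1)               # alphabet (arbitrary choice)
p = {0: 0.3, 1: 0.7}     # marginal probability vector, sums to 1

def mu(cyl):
    """Product measure of the cylinder [a] = {x : x_1 ... x_k = a}."""
    out = 1.0
    for sym in cyl:
        out *= p[sym]
    return out

def mu_preimage(cyl):
    """sigma^{-1}[a] is the disjoint union of [b a] over b in A."""
    return sum(mu((b,) + tuple(cyl)) for b in A)

# the shift preserves the measure of every length-3 cylinder
for a in product(A, repeat=3):
    assert abs(mu(a) - mu_preimage(a)) < 1e-12
print("shift preserves all length-3 cylinder measures")
```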

Example 2.1.7 (Stationary processes). In probability theory, a sequence {ξ_n}_{n=1}^∞ of random variables is called stationary if the distribution of a consecutive n-tuple (ξ_k, ..., ξ_{k+n-1}) does not depend on where it begins; i.e. (ξ_1, ..., ξ_n) =

(ξ_k, ..., ξ_{k+n-1}) in distribution, for every k and n. Intuitively this means that if we observe a finite sample from the process, the values that we see give no information about when the sample was taken.

From a probabilistic point of view it rarely matters what the sample space is, and one may as well choose it to be (X, B) = (Y^N, C^N), where (Y, C) is the range of the variables. On this space there is again defined the shift map σ : X → X given by σ((y_n)_{n=1}^∞) = (y_{n+1})_{n=1}^∞. For any A_1, ..., A_n ∈ C and k ≥ 0 let

A^k = Y × ... × Y (k times) × A_1 × ... × A_n × Y × Y × ...

Note that B is generated by the family of such sets. If P is the underlying probability measure, then stationarity means that for any A_1, ..., A_n and k,

P(A^0) = P(A^k)

Since A^k = σ^{-k}A^0, this shows that the family of sets B with P(σ^{-1}B) = P(B) contains all the sets of the form above. Since this family is a σ-algebra and the sets above generate B, we see that σ preserves P.

There is a converse to this: suppose that P is a σ-invariant measure on X = Y^N. Define ξ_n(y) = y_n. Then (ξ_n) is a stationary process.

Example 2.1.8 (Hamiltonian systems). The notion of a measure-preserving system emerged from the following class of examples. Let Ω = R^{2n}; we denote ω ∈ Ω by ω = (p, q) where p, q ∈ R^n. Classically, q describes the positions of n particles and p their momenta. Let H : Ω → R be a smooth map and consider the differential equations

dp_i/dt = -∂H/∂q_i
dq_i/dt = ∂H/∂p_i

Under suitable assumptions, for every initial state ω = (p^0, q^0) ∈ Ω and every t ∈ R there is a unique solution ω(t) = (p(t), q(t)), and ω_t = ω(t) is the state of the world after evolving for a period of t, started from ω.

Thinking of t as fixed, we have defined a map T_t : Ω → Ω by T_t ω = ω(t). Clearly

T_0(ω) = ω(0) = ω

We claim that this defines an action of R. Indeed, notice that φ(s) = ω(t+s) satisfies φ(0) = ω(t) = ω_t and φ̇(s) = ω̇(t+s), so φ satisfies the same differential equation as the solution started from ω_t. Thus by uniqueness of the solution, ω_t(s) = ω(t+s). This translates to

T_{t+s}(ω) = ω(t+s) = ω_t(s) = T_s ω_t = T_s(T_t ω)

and of course also T_{t+s} = T_{s+t} = T_t T_s. Thus (T_t)_{t∈R} is an action of R on Ω.

It often happens that Ω contains compact subsets which are invariant under the action. For example there may be a notion of energy E : Ω → R that is preserved, i.e. E(T_t ω) = E(ω), and then the level sets M = E^{-1}(e_0) are invariant under the action. If E is nice enough, M will be a smooth and often compact manifold. Furthermore, by a remarkable theorem of Liouville, if the equation governing the evolution is a Hamiltonian equation (as is the case in classical mechanics) then the flow preserves volume, i.e. vol(T_t U) = vol(U) for every t and every open (or Borel) set U. The same is true for the volume form on M.
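A hedged numerical illustration of these facts (our own, not from the notes): for the harmonic-oscillator Hamiltonian H(p, q) = (p^2 + q^2)/2, Hamilton's equations read q̇ = p, ṗ = -q, and the flow T_t is an explicit rotation of the plane. The sketch below checks the group property T_{t+s} = T_t T_s, conservation of H, and Liouville's theorem (the Jacobian of T_t is a rotation matrix, hence has determinant 1).

```python
import math

def flow(t, state):
    """Exact Hamiltonian flow T_t for H(p,q) = (p^2 + q^2)/2: a rotation."""
    p, q = state
    c, s = math.cos(t), math.sin(t)
    return (p * c - q * s, q * c + p * s)

H = lambda st: (st[0] ** 2 + st[1] ** 2) / 2

w = (0.8, -0.3)                      # an arbitrary initial state
# group property T_{t+s} = T_t T_s
a = flow(1.3, flow(0.4, w))
b = flow(1.7, w)
assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
# the energy E = H is invariant: H(T_t w) = H(w)
assert abs(H(flow(2.5, w)) - H(w)) < 1e-12
# Liouville: the Jacobian of T_t is a rotation matrix, so det = 1
t = 0.9
det = math.cos(t) ** 2 + math.sin(t) ** 2
assert abs(det - 1) < 1e-12
print("flow property, energy conservation and Liouville all verified")
```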

2.2 Recurrence

One of the deep and basic properties of measure preserving systems is that they display "recurrence": roughly, for typical x, anything that happens along its orbit happens infinitely often. This phenomenon was first discovered by Poincaré and bears his name.

Given a set A and x ∈ A, it will be convenient to say that x returns to A if T^n x ∈ A for some n > 0; this is the same as x ∈ A ∩ T^{-n}A. We say that x returns to A infinitely often if there are infinitely many such n.

The following proposition is, essentially, the pigeon-hole principle.

Proposition 2.2.1. Let A be a measurable set, µ(A) > 0. Then there is an n such that µ(A ∩ T^{-n}A) > 0.

Proof. Consider the sets A, T^{-1}A, T^{-2}A, ..., T^{-k}A. Since T is measure preserving, all the sets T^{-i}A have measure µ(A), so for k > 1/µ(A) they cannot be pairwise disjoint mod µ (if they were, then 1 ≥ µ(X) ≥ Σ_{i=0}^k µ(T^{-i}A) > 1, which is impossible). Therefore there are indices 0 ≤ i < j ≤ k such that µ(T^{-i}A ∩ T^{-j}A) > 0. Now,

T^{-i}A ∩ T^{-j}A = T^{-i}(A ∩ T^{-(j-i)}A)

so µ(A ∩ T^{-(j-i)}A) > 0, as desired.

Theorem 2.2.2 (Poincaré recurrence theorem). If µ(A) > 0 then µ-a.e. x ∈ A returns to A.

Proof. Let

E = {x ∈ A : T^n x ∉ A for all n > 0} = A \ ⋃_{n=1}^∞ T^{-n}A

Thus E ⊆ A, and E ∩ T^{-n}E ⊆ E ∩ T^{-n}A = ∅ for n ≥ 1, by the definition of E. Therefore, by the previous proposition, µ(E) = 0.

Corollary 2.2.3. If µ(A) > 0 then µ-a.e. x ∈ A returns to A infinitely often.

Proof. Let E be as in the previous proof. If x ∈ A returns to A only finitely many times, and its last return occurs at time n, then T^n x ∈ E; hence the set of such x is contained in ⋃_{n=0}^∞ T^{-n}E. Since µ(E) = 0 and T preserves µ, this union has measure 0, which proves the claim.

For systems on metric spaces, recurrence can be given a topological meaning.

Definition 2.2.4. Let X be a metric space. A point x ∈ X is recurrent for a map T : X → X if lim inf_{n→∞} d(T^n x, x) = 0.

Proposition 2.2.5. Let (X, B, µ, T) be a measure preserving system, where X is a separable metric space and B its Borel σ-algebra. Then µ-a.e. x ∈ X is recurrent.

Proof. Let A_i = B_{r_i}(x_i) be a countable sequence of balls that generate the topology. By Theorem 2.2.2, there are sets A'_i ⊆ A_i with µ(A_i \ A'_i) = 0 such that every x ∈ A'_i returns to A_i. Let X_0 = X \ ⋃_i (A_i \ A'_i), which is of full µ-measure. For x ∈ X_0, if x ∈ A_i then x returns to A_i, so it returns to within diam A_i of itself. Since x belongs to balls A_i of arbitrarily small diameter, x is recurrent.

When the phenomenon of recurrence was discovered it created quite a stir. Indeed, by Liouville's theorem it applies to Hamiltonian systems, such as planetary systems and the motion of molecules in a gas. In these settings, Poincaré recurrence seems to imply that the system is stable in the strong sense that it nearly returns to the same configuration infinitely often. The question of stability arose originally in the context of the solar system, in a weaker sense: will it persist indefinitely, or will the planets eventually collide with the sun or fly off into deep space? Stability in the strong sense above contradicts our experience. One thing to note, however, is that the time frame for this recurrence is enormous, and in the celestial-mechanical or thermodynamic context it does not say anything about the short-term stability of the systems.

Recurrence also implies that there are no quantities that only increase as time moves forward; this is, on the face of it, in contradiction with the second law of thermodynamics, which asserts that the thermodynamic entropy of a mechanical system increases monotonically over time. A function f : X → R is increasing (respectively, constant) along orbits if f(Tx) ≥ f(x) a.e. (respectively, f(Tx) = f(x) a.e.). This is the same as requiring that for a.e. x the sequence f(x), f(Tx), f(T^2 x), ... is non-decreasing (respectively, constant). Although superficially stronger, the latter condition follows because for fixed n,

µ(x : f(T^{n+1}x) < f(T^n x)) = µ(T^{-n}{x : f(Tx) < f(x)}) = µ(x : f(Tx) < f(x)) = 0

Proposition 2.2.6. Let (X, B, µ, T) be a measure preserving system and f : X → R a measurable function which is increasing along orbits. Then f is a.e. constant along orbits.

Proof. For δ > 0 let

J(δ) = {x ∈ X : f(Tx) ≥ f(x) + δ}

We must show that µ(J(δ)) = 0 for all δ > 0, since then µ(⋃_{n=1}^∞ J(1/n)) = 0, which implies that f(Tx) = f(x) for a.e. x.

Suppose there were some δ > 0 such that µ(J(δ)) > 0. For k ∈ Z let

J(δ, k) = {x ∈ J(δ) : kδ/2 ≤ f(x) < (k+1)δ/2}

Notice that J(δ) = ⋃_{k∈Z} J(δ, k), so there is some k with µ(J(δ, k)) > 0. On the other hand, if x ∈ J(δ, k) then

f(Tx) ≥ f(x) + δ ≥ kδ/2 + δ > (k+1)δ/2

so Tx ∉ J(δ, k). Similarly, for any n ≥ 1, since f is increasing along orbits, f(T^n x) ≥ f(Tx) > (k+1)δ/2, so T^n x ∉ J(δ, k). We have shown that no point of J(δ, k) returns to J(δ, k), contradicting Poincaré recurrence.

The last result highlights the importance of measurability. Using the axiom of choice one can easily choose a representative x from each orbit, and using it define f(T^n x) = n for n ≥ 0 (and also for n < 0 if T is invertible). Then we have a function which is strictly increasing along orbits; but by the preceding result, it cannot be measurable.
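A small numerical sketch of Poincaré recurrence (our own; the angle α and the set A are arbitrary choices): the circle rotation x ↦ x + α mod 1 preserves Lebesgue measure, so a.e. point of A = [0, 0.1) must return to A, and indeed every sampled point does so quickly.

```python
import math

alpha = math.sqrt(2) - 1          # irrational rotation angle (arbitrary)
T = lambda x: (x + alpha) % 1.0

def first_return(x, a, b, max_iter=10**5):
    """Smallest n > 0 with T^n x in A = [a, b), or None if never observed."""
    y = x
    for n in range(1, max_iter + 1):
        y = T(y)
        if a <= y < b:
            return n
    return None

# every sampled point of A = [0, 0.1) returns to A
returns = [first_return(j / 1000.0, 0.0, 0.1) for j in range(100)]
assert all(n is not None for n in returns)
print("max observed return time:", max(returns))
```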

2.3 Induced action on functions and measures

Given a map T : X → Y there is an induced map T̂ on functions with domain Y, given by

T̂f(x) = f(Tx)

On the spaces of functions f : Y → R or f : Y → C the operator T̂ has some obvious properties: it is linear, positive (f ≥ 0 implies T̂f ≥ 0), and multiplicative (T̂(fg) = T̂f · T̂g). Also |T̂f| = T̂|f|, and T̂ commutes with complex conjugation.

When (X, B) and (Y, C) are measurable spaces and T is measurable, the induced map T̂ acts on the space of measurable functions on Y.

Similarly, in the measurable setting T induces a map on measures. Write M(X) and P(X) for the spaces of signed measures and probability measures on (X, B), respectively, and similarly for Y. Then T̂ : M(X) → M(Y) is given by

(T̂µ)(A) = µ(T^{-1}A) for measurable A ⊆ Y

This is called the push-forward of µ and is sometimes denoted T_*µ or T_#µ. It is easy to check that T̂µ ∈ M(Y), and that the operator µ ↦ T̂µ is linear, i.e. T̂(aµ + bν) = aT̂µ + bT̂ν for scalars a, b.

Lemma 2.3.1. ν = T̂µ is the unique measure satisfying ∫ f dν = ∫ T̂f dµ for every bounded measurable function f : Y → R (or for every f ∈ C(Y) if Y is compact).

Proof. For A ∈ C note that T̂1_A(x) = 1_A(Tx) = 1_{T^{-1}A}(x), hence

∫ T̂1_A dµ = µ(T^{-1}A) = (T̂µ)(A) = ∫ 1_A dT̂µ

This shows that ν = T̂µ has the stated property when f is an indicator function. Every bounded measurable function (and in particular every continuous function, if Y is compact) is the pointwise limit of a uniformly bounded sequence of linear combinations of indicator functions, so the same holds by dominated convergence (note that f_n → f implies T̂f_n → T̂f). Uniqueness follows from the fact that ν is determined by the values of ∫ f dν as f ranges over bounded measurable functions or, when Y is compact, over continuous functions.

Corollary 2.3.2. Let (X, B, µ) be a measure space and T : X → X a measurable map. Then T preserves µ if and only if ∫ f dµ = ∫ T̂f dµ for every bounded measurable f : X → R (or f ∈ C(X) if X is compact).

Proof. T preserves µ if and only if µ = T̂µ, so this is a special case of the previous lemma.

Proposition 2.3.3. Let T : X → Y be a map between measurable spaces, µ ∈ P(X) and ν = T̂µ ∈ P(Y). Then T̂ maps L^p(ν) isometrically into L^p(µ) for every 1 ≤ p ≤ ∞.

Proof. First note that if f is an a.e.-defined function then so is T̂f, because if E is the null set where f is not defined then T^{-1}E is the set where T̂f is not defined, and µ(T^{-1}E) = ν(E) = 0. Thus T̂ acts on equivalence classes of measurable functions mod µ. Now, for 1 ≤ p < ∞ we have

‖T̂f‖_p^p = ∫ |T̂f|^p dµ = ∫ T̂(|f|^p) dµ = ∫ |f|^p dν = ‖f‖_p^p

For p = ∞ the claim follows from the identity ‖f‖_∞ = lim_{p→∞} ‖f‖_p.

Corollary 2.3.4. In a measure preserving system, T̂ is a norm-preserving self-map of L^p, and if T is invertible then T̂ is an isometry of L^p.

The operator T̂ on L^2 is sometimes called the Koopman operator. When T is invertible it is a unitary operator, and this opens the door to using spectral techniques to study the underlying system. We will return to this idea later.

We will follow the usual convention and write T instead of T̂.
This introduces slight ambiguity, but the meaning should usually be clear from the context.
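For finitely supported measures, the push-forward and the change-of-variables formula ∫ f d(T̂µ) = ∫ T̂f dµ can be computed exactly. A minimal sketch (the measure and the map below are toy choices of ours):

```python
from collections import defaultdict

def pushforward(T, mu):
    """Push-forward T^mu of a finitely supported measure mu = {point: mass}."""
    out = defaultdict(float)
    for x, m in mu.items():
        out[T(x)] += m          # (T^mu)(A) = mu(T^{-1} A)
    return dict(out)

def integrate(f, mu):
    return sum(f(x) * m for x, m in mu.items())

mu = {0: 0.25, 1: 0.25, 2: 0.5}   # a toy measure (arbitrary)
T = lambda x: x % 2               # a toy map

nu = pushforward(T, mu)
f = lambda y: 3 * y + 1
# change of variables: integral of f dnu equals integral of f(Tx) dmu
assert abs(integrate(f, nu) - integrate(lambda x: f(T(x)), mu)) < 1e-12
print(nu)  # {0: 0.75, 1: 0.25}
```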

2.4 Dynamics on metric spaces

Many (perhaps most) spaces and maps studied in ergodic theory have additional topological structure, and there is a developed dynamical theory for systems of this kind. Here we will discuss only a few aspects of it, especially those which are related to ergodic theory.

Definition 2.4.1. A topological dynamical system is a pair (X, T) where X is a compact metric space and T : X → X is continuous.

It is sometimes useful to allow compact non-metrizable spaces, but in this course we shall not encounter them.

Before we begin discussing such systems we review some properties of the space of measures. Let M(X) denote the linear space of signed (finite) Borel measures on X and P(X) ⊆ M(X) the convex space of Borel probability measures. Two measures µ, ν ∈ M(X) are equal if and only if ∫ f dµ = ∫ f dν for all f ∈ C(X), so the maps µ ↦ ∫ f dµ, f ∈ C(X), separate points.

Definition 2.4.2. The weak-* topology on M(X) (or P(X)) is the weakest topology that makes the maps µ ↦ ∫ f dµ continuous for all f ∈ C(X). In particular,

µ_n → µ if and only if ∫ f dµ_n → ∫ f dµ for all f ∈ C(X)

Proposition 2.4.3. The weak-* topology is metrizable and compact.

For the proof see the Appendix (Chapter 9).

Let (X, T) be a topological dynamical system. It is clear that the induced map T̂ on functions preserves the space C(X) of continuous functions.

Lemma 2.4.4. T̂ : C(X) → C(X) is contracting in ‖·‖_∞, and if the original map T : X → X is onto, then T̂ : C(X) → C(X) is an isometry. T̂ : P(X) → P(X) is continuous.

Proof. The first part follows from

‖T̂f‖_∞ = max_{x∈X} |f(Tx)| = max_{y∈TX} |f(y)| ≤ ‖f‖_∞

and the fact that there is equality if TX = X. For the second part, if µ_n → µ then for f ∈ C(X),

∫ f dT̂µ_n = ∫ f∘T dµ_n → ∫ f∘T dµ = ∫ f dT̂µ

This shows that T̂µ_n → T̂µ, so T̂ is continuous.

The following result is why ergodic theory is useful in studying topological systems.

Proposition 2.4.5. Every topological dynamical system (X, T) has invariant measures.

Proof. Let x ∈ X be an arbitrary initial point and define

µ_N = (1/N) Σ_{n=0}^{N-1} δ_{T^n x}

Note that

∫ f dµ_N = (1/N) Σ_{n=0}^{N-1} f(T^n x)

Passing to a subsequence N(k) → ∞, we can assume by compactness that µ_{N(k)} → µ ∈ P(X). We must show that ∫ f dµ = ∫ f∘T dµ for all f ∈ C(X). Now,

∫ f dµ − ∫ f∘T dµ = lim_{k→∞} ∫ (f − f∘T) dµ_{N(k)}
                  = lim_{k→∞} (1/N(k)) Σ_{n=0}^{N(k)-1} (f − f∘T)(T^n x)
                  = lim_{k→∞} (1/N(k)) (f(x) − f(T^{N(k)} x))
                  = 0

because the sum telescopes and f is bounded.

There are a number of common variations of this proof. We could have defined µ_N = (1/N) Σ_{n=0}^{N-1} δ_{T^n x_N} (with the initial point x_N varying with N), or begun with an arbitrary measure µ and set µ_N = (1/N) Σ_{n=0}^{N-1} T̂^n µ. The proof would then show that any accumulation point of the µ_N is T-invariant.

We denote the space of T-invariant measures by P_T(X).

Corollary 2.4.6. In a topological dynamical system (X, T), the set P_T(X) is non-empty, compact and convex.

Proof. We already showed that it is non-empty, and convexity is trivial. For compactness we need only show it is closed. We know that

P_T(X) = ⋂_{f∈C(X)} { µ ∈ P(X) : ∫ (f − f∘T) dµ = 0 }

Each of the sets in the intersection is the pre-image of 0 under the map µ ↦ ∫ (f − f∘T) dµ; since f − f∘T is continuous this map is continuous, and so P_T(X) is an intersection of closed sets, hence closed.

Corollary 2.4.7. Every topological dynamical system (X, T) contains recurrent points.

Proof. Choose any invariant measure µ ∈ P_T(X) and apply Proposition 2.2.5 to the measure preserving system (X, B, µ, T).
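The averaging construction in the proof of Proposition 2.4.5 can be watched numerically. A sketch (ours; the rotation, the observable and the starting point are arbitrary choices): for an irrational circle rotation the averages (1/N) Σ f(T^n x) approach the integral of f against the invariant (Lebesgue) measure.

```python
import math

alpha = math.sqrt(2) - 1                    # irrational angle (arbitrary)
T = lambda x: (x + alpha) % 1.0             # rotation of R/Z, preserves Lebesgue
f = lambda x: math.cos(2 * math.pi * x)     # observable; its integral over [0,1) is 0

def orbit_average(x, N):
    """Integral of f against the empirical measure mu_N of the orbit of x."""
    total, y = 0.0, x
    for _ in range(N):
        total += f(y)
        y = T(y)
    return total / N

avg = orbit_average(0.123, 100_000)
# the empirical measures converge weak-* to Lebesgue measure, so the
# orbit average tends to the space average, which is 0 for this f
assert abs(avg) < 1e-3
print("orbit average:", avg)
```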

2.5 Some technicalities

Much of ergodic theory holds in general probability spaces, but some of the results require assumptions on the measure space in order to avoid pathologies. There are two theories available which for our purposes are essentially equivalent: the theory of Borel spaces and that of Lebesgue spaces. We will work in the Borel category. In this section we review without proof the main facts we will use; these belong to descriptive set theory.

Definition 2.5.1. A Polish space is an uncountable topological space whose topology is induced by a complete, separable metric.

Note that a space can be Polish even if the metric originally defining its topology isn't complete. For example, [0, 1) is not complete in the usual metric, but it is homeomorphic to [0, ∞), which is complete (and separable), and so [0, 1) is Polish.

Definition 2.5.2. A standard Borel space is a pair (X, B) where X is an uncountable Polish space and B is its Borel σ-algebra. A standard measure space is a σ-finite measure on a standard Borel space.

Definition 2.5.3. Two measurable spaces (X, B) and (Y, C) are isomorphic if there is a bijection f : X → Y such that both f and f^{-1} are measurable.

Theorem 2.5.4. Standard Borel spaces satisfy the following properties.

1. All standard Borel spaces are isomorphic.
2. Countable products of standard Borel spaces are standard Borel.
3. If A is an uncountable measurable subset of a standard Borel space, then the restriction of the σ-algebra to A is again a standard Borel space.
4. If f is a measurable injection (1-1 map) between standard Borel spaces, then the image of a Borel set is Borel. In particular, f is an isomorphism from its domain to its image.

Another important operation on measure spaces is factoring.

Definition 2.5.5. A factor between measurable spaces (X, B) and (Y, C) is a measurable onto map f : X → Y. If there are measures µ, ν on X, Y, respectively, then the factor is required also to satisfy f̂µ = ν.

Given a factor f : (X, B) → (Y, C) between Borel spaces, we can pull back the σ-algebra C and obtain a sub-σ-algebra E ⊆ B:

E = f^{-1}C = { f^{-1}C : C ∈ C }

Note that C is countably generated (since it is isomorphic to the Borel σ-algebra of a separable metric space), so E is countably generated as well.

This procedure can to some extent be reversed. Let (X, B) be a Borel space and E ⊆ B a countably generated sub-σ-algebra. Partition X into the atoms of E, that is, according to the equivalence relation

x ∼ y  ⟺  1_E(x) = 1_E(y) for all E ∈ E

Let π : X → X/∼ denote the factor map, and let

E/∼ = { E ⊆ X/∼ : π^{-1}E ∈ E }

Then the quotient space X/E = (X/∼, E/∼) is a measurable space and π is a factor map. Notice also that π^{-1} : E/∼ → E is 1-1, so E/∼ is countably generated. Also, the atoms of ∼ are measurable: if E is generated by {E_n}, then the atom of x is just ⋂ F_n, where F_n = E_n if x ∈ E_n and F_n = X \ E_n otherwise. Hence E/∼ separates points in X/∼.

These two operations are not true inverses of each other: it is not in general true that if E ⊆ B is countably generated then (X/∼, E/∼) is a Borel space. But if one introduces a measure then this is true up to measure 0.
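The description of atoms via a generating family {E_n} can be made concrete on a finite set. A toy sketch (ours; the universe and the generating sets are arbitrary): the atom of x is ⋂ F_n with F_n = E_n if x ∈ E_n and F_n = X \ E_n otherwise, and two points share an atom exactly when they lie in the same generators.

```python
X = set(range(10))
gens = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}]   # generating sets (arbitrary)

def atom(x):
    """Atom of x: intersect each E_n or its complement, according to x."""
    a = set(X)
    for E in gens:
        a &= E if x in E else (X - E)
    return a

# x ~ y iff x and y lie in exactly the same generating sets,
# i.e. iff they belong to the same atom
assert atom(2) == atom(3) == {2, 3}
assert atom(0) == {0, 1}
assert atom(9) == {8, 9}
print("atoms computed correctly")
```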

Theorem 2.5.6. Let µ be a probability measure on a Borel space (X, B), and E ⊆ B a countably generated infinite sub-σ-algebra. Then there is a measurable subset X_0 ⊆ X of full measure such that the quotient space of X_0 by E is a Borel space.

As we mentioned above, there is an alternative theory available with many of the same properties, namely the theory of Lebesgue spaces. These are the measure spaces arising as completions of σ-finite measures on Borel spaces. In this theory all definitions are modulo sets of measure zero, and all of the properties above hold. In particular, the last theorem can be stated more cleanly, since the removal of a set of measure 0 is implicit in the definitions. Many of the standard texts in ergodic theory work in this category. The disadvantage of Lebesgue spaces is that they make it cumbersome to consider different measures on the same space, since the σ-algebra depends non-trivially on the measure. This is the primary reason we work in the Borel category.

Chapter 3

Ergodicity

3.1 Ergodicity

In this section and the following ones we will study how a measure preserving system may be decomposed into simpler systems.

Definition 3.1.1. Let (X, B, µ, T) be a measure preserving system. A measurable set A ⊆ X is invariant if T^{-1}A = A. The system is ergodic if there are no non-trivial invariant sets; i.e. every invariant set has measure 0 or 1.

If A is invariant then so is X \ A. Indeed,

T^{-1}(X \ A) = T^{-1}X \ T^{-1}A = X \ A

Thus, ergodicity is an irreducibility condition: in a non-ergodic system the dynamics splits into two (non-trivial) parts which do not "interact", in the sense that an orbit in one of them never enters the other.

Example 3.1.2. Let X be a finite set with normalized counting measure, and T : X → X a 1-1 map. If X consists of a single orbit then the system is ergodic, since any non-empty invariant set contains the whole orbit. In general, X splits into a disjoint (finite) union of orbits, and each of these orbits is invariant and of positive measure. Thus the system is ergodic if and only if it consists of a single orbit.

Note that every (invertible) system splits into a disjoint union of orbits. However, these typically have measure zero, so they do not in themselves prevent ergodicity.

Example 3.1.3. By taking disjoint unions of measure preserving systems, with the normalized sum of the measures, one gets many examples of non-ergodic systems.

Definition 3.1.4. A function f : X → Y, for some set Y, is invariant if f(Tx) = f(x) for all x ∈ X.

The primary example is 1_A for an invariant set A.

Lemma 3.1.5. The following are equivalent:

1. (X, B, µ, T) is ergodic.
2. If T^{-1}A = A mod µ then µ(A) = 0 or 1.
3. Every measurable invariant function is constant a.e.
4. If f ∈ L^1 and T̂f = f a.e. then f is a.e. constant.

Proof. (1) and (3) are equivalent, since an invariant set A produces the invariant function 1_A, while if f is invariant and not a.e. constant then there is a measurable set U in the range of f such that 0 < µ(f^{-1}U) < 1; but this set is clearly invariant. Exactly the same argument shows that (2) and (4) are equivalent.

We complete the proof by showing the equivalence of (3) and (4). Clearly (4) implies (3). Conversely, suppose that f ∈ L^1 and T̂f = f a.e. Let g(x) = lim sup_n f(T^n x). Then g is T-invariant, since g(Tx) is the lim sup of the shifted sequence f(T^{n+1}x), which is the same as the lim sup of f(T^n x), i.e. g(x). The proof will be done by showing that g = f a.e. This is true at a point x if f(T^n x) = f(x) for all n ≥ 0, and for this it is enough that f(T^{n+1}x) = f(T^n x) for all n ≥ 0; equivalently, that T^n x ∈ {T̂f = f} for all n, i.e. that x ∈ ⋂_n T^{-n}{T̂f = f}. But this is an intersection of sets of measure 1, and hence holds for a.e. x, as desired.

Example 3.1.6 (Irrational circle rotation). Let R_α(x) = e^{2πiα}x be an irrational circle rotation (α ∉ Q) on S^1 with Lebesgue measure. We claim that this system is ergodic. Indeed, let χ_n(z) = z^n (the characters of the compact group S^1) and consider an invariant function f ∈ L^2(µ). Since f ∈ L^2, it can be represented in L^2 as a Fourier series f = Σ a_n χ_n. Now,

T̂χ_n(z) = (e^{2πiα}z)^n = e^{2πinα}z^n = e^{2πinα}χ_n(z)

so from

f = T̂f = Σ a_n T̂χ_n = Σ e^{2πinα} a_n χ_n

Comparing this with the original expansion, we have a_n = e^{2πinα} a_n. Thus if a_n ≠ 0 then e^{2πinα} = 1, which, since α ∉ Q, can occur only if n = 0. Thus f = a_0 χ_0, which is constant.

Non-ergodicity means that one can split the system into two parts that don't "interact". The next proposition reformulates this in a positive way: ergodicity means that any two non-trivial sets do "interact".

Proposition 3.1.7. The following are equivalent:

1. (X, B, µ, T) is ergodic.
2. For any B ∈ B, if µ(B) > 0 then ⋃_{n=N}^∞ T^{-n}B = X mod µ for every N.

3. If A, B ∈ B and µ(A), µ(B) > 0, then µ(A ∩ T^{-n}B) > 0 for infinitely many n.

Proof. (1) implies (2): Given B, let B' = ⋃_{n=N}^∞ T^{-n}B and note that

T^{-1}B' = ⋃_{n=N+1}^∞ T^{-n}B ⊆ B'

Since µ(T^{-1}B') = µ(B'), we have B' = T^{-1}B' mod µ, hence by ergodicity B' = X mod µ.

(2) implies (3): Given A, B as in (3), we conclude from (2) that, for every N, µ(A ∩ ⋃_{n=N}^∞ T^{-n}B) = µ(A) > 0, hence there is some n > N with µ(A ∩ T^{-n}B) > 0. This implies that there are infinitely many such n.

Finally, if (3) holds and A is invariant with µ(A) > 0, then taking B = X \ A, clearly A ∩ T^{-n}B = ∅ for all n, so µ(B) = 0 by (3). Thus every invariant set is trivial.

3.2 Mixing

Although a wide variety of ergodic systems can be constructed or shown abstractly to exist, it is surprisingly difficult to verify ergodicity of naturally arising systems. In fact, in most cases ergodicity can be proved because the system satisfies a stronger "mixing" property.

Definition 3.2.1. $(X,\mathcal{B},\mu,T)$ is called mixing if for every pair $A,B$ of measurable sets,

$$\mu(A \cap T^{-n}B) \to \mu(A)\mu(B) \quad\text{as } n\to\infty$$

It is immediate from the definition that mixing systems are ergodic. The advantage of mixing over ergodicity is that it is enough to verify it for a "dense" family of sets $A,B$. It is better to formulate this in a functional way.

Lemma 3.2.2. For fixed $n$, the map $(f,g) \mapsto \int f\cdot T^ng\,d\mu$ is bilinear on $L^2$, and $|\int f\cdot T^ng\,d\mu| \leq \|f\|_2\|g\|_2$.

Proof. Using Cauchy–Schwarz and the previous lemma,

$$\left|\int f\cdot T^ng\,d\mu\right| \leq \|f\|_2\,\|T^ng\|_2 = \|f\|_2\,\|g\|_2$$

Proposition 3.2.3. $(X,\mathcal{B},\mu,T)$ is mixing if and only if for every $f,g \in L^2$,

$$\int f\cdot T^ng\,d\mu \to \int f\,d\mu\cdot\int g\,d\mu \quad\text{as } n\to\infty$$

Furthermore, this limit holds for all $f,g \in L^2$ if and only if it holds for $f,g$ in a dense subset of $L^2$.

Proof. We prove the second statement first. Suppose the limit holds for $f,g \in V$ with $V \subseteq L^2$ dense. Let $f,g \in L^2$, and for $\varepsilon > 0$ choose $f',g' \in V$ with $\|f - f'\|_2 < \varepsilon$ and $\|g - g'\|_2 < \varepsilon$. Writing $f = (f-f') + f'$ and $g = (g-g') + g'$ and expanding, the previous lemma gives

$$\left|\int f\cdot T^ng\,d\mu - \int f'\cdot T^ng'\,d\mu\right| \leq \left|\int (f-f')\cdot T^ng'\,d\mu\right| + \left|\int f'\cdot T^n(g-g')\,d\mu\right| + \left|\int (f-f')\cdot T^n(g-g')\,d\mu\right| \leq \varepsilon\|g'\|_2 + \|f'\|_2\,\varepsilon + \varepsilon^2$$

Since $\int f'\cdot T^ng'\,d\mu \to \int f'\,d\mu\cdot\int g'\,d\mu$, and since $|\int f'\,d\mu\cdot\int g'\,d\mu - \int f\,d\mu\cdot\int g\,d\mu|$ is also $O(\varepsilon)$ (the norms $\|f'\|_2,\|g'\|_2$ stay bounded as $\varepsilon\to0$), and $\varepsilon$ was arbitrary, this shows that $\int f\cdot T^ng\,d\mu \to \int f\,d\mu\cdot\int g\,d\mu$, as desired.

For the first part, using the identities $\int 1_A\,d\mu = \mu(A)$, $T^n1_A = 1_{T^{-n}A}$ and $1_A1_B = 1_{A\cap B}$, we see that mixing is equivalent to the limit above for indicator functions. Since the expression is bilinear in $f,g$, the limit then holds for linear combinations of indicator functions, and these combinations are dense in $L^2$, so we are done by what we proved above.

Example 3.2.4. Let $X = A^{\mathbb{Z}}$ for a finite set $A$, take the product $\sigma$-algebra, and let $\mu$ be a product measure with marginal given by a probability vector $p = (p_a)_{a\in A}$. Let $\sigma: X\to X$ be the shift map, $(\sigma x)_n = x_{n+1}$. We claim that this map is mixing and hence ergodic.

To prove this, note that if $f(x) = f(x_1,\dots,x_k)$ depends on the first $k$ coordinates of the input, then $\sigma^nf(x) = f(x_{n+1},\dots,x_{n+k})$. If $f,g$ are two such functions then for $n$ large enough, $\sigma^ng$ and $f$ depend on different coordinates, and hence, because $\mu$ is a product measure, they are independent in the sense of probability theory:

$$\int f\cdot\sigma^ng\,d\mu = \int f\,d\mu\cdot\int\sigma^ng\,d\mu = \int f\,d\mu\cdot\int g\,d\mu$$

so the same is true when taking $n\to\infty$. Mixing follows from the previous proposition.
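The computation above can be checked empirically. The following sketch (not part of the notes' formal development; the seed, trajectory length, and lags are arbitrary choices) samples a long trajectory of the Bernoulli(1/2) shift and compares the empirical value of $\mu(A\cap\sigma^{-n}B)$ for the cylinders $A = B = \{x_0 = 1\}$ with the product $\mu(A)\mu(B) = 1/4$:

```python
import random

# Empirical check that the Bernoulli(1/2) shift is mixing:
# mu(A ∩ sigma^{-n}B) -> mu(A)mu(B) for the cylinders A = B = {x_0 = 1}.
random.seed(0)
p = 0.5                  # marginal probability of the symbol 1
N = 200_000              # length of the sampled trajectory (arbitrary)
x = [1 if random.random() < p else 0 for _ in range(N)]

def empirical_corr(n):
    """Frequency of seeing a 1 at positions i and i + n, which estimates
    mu(A ∩ sigma^{-n}B) along this trajectory."""
    hits = sum(1 for i in range(N - n) if x[i] == 1 and x[i + n] == 1)
    return hits / (N - n)

for n in (1, 5, 25):
    print(n, empirical_corr(n))   # each value should be close to 0.25
```

For a product measure the correlation is already exactly $\mu(A)\mu(B)$ in distribution for every $n\geq1$; the sample estimates fluctuate around $0.25$ by the usual $O(1/\sqrt{N})$ statistical error.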

3.3 Kac’s return time formula

We pause to give a nice application of ergodicity to estimating the "recurrence rate" of points to a set.

Let $(X,\mathcal{B},\mu,T)$ be ergodic and let $\mu(A) > 0$. Set $X_0 = \bigcup_{n=1}^{\infty}T^{-n}A$; its measure is at least $\mu(A)$, which is positive, so by ergodicity $\mu(X_0) = 1$. Thus for a.e. $x$ there is a minimal $n \geq 1$ with $T^nx \in A$; we denote this number by $r_A(x)$

and note that $r_A$ is measurable, since

$$\{r_A = n\} = T^{-n}A\setminus\bigcup_{i=1}^{n-1}T^{-i}A$$

Theorem 3.3.1 (Kac's formula). Assume that $T$ is invertible. Then $\int_A r_A\,d\mu = 1$; in particular, $\mathbb{E}(r_A\,|\,A) = 1/\mu(A)$, so the expected time to return to $A$ starting from $A$ is $1/\mu(A)$.

Proof. Let $A_n = A\cap\{r_A = n\}$. Then

$$\int_A r_A\,d\mu = \sum_{n=1}^{\infty}n\,\mu(A_n) = \sum_{n=1}^{\infty}\sum_{k=1}^{n}\mu(T^kA_n)$$

The proof will be completed by showing that the sets $\{T^kA_n : n\in\mathbb{N},\ 1\leq k\leq n\}$ are pairwise disjoint and that their union has full measure.

Indeed, by ergodicity (applied to $T^{-1}$), for a.e. $x\in X$ there is a least $m\geq1$ such that $y = T^{-m}x\in A$. Let $n = r_A(y)$. Clearly $m\leq n$, since if $n < m$ then $T^{-(m-n)}x = T^ny\in A$, contradicting the minimality of $m$. Thus $x = T^my\in T^mA_n$ with $1\leq m\leq n$, so the union of the sets has full measure. For disjointness, suppose $x\in T^kA_n\cap T^{k'}A_{n'}$ with $1\leq k\leq n$, $1\leq k'\leq n'$ and $(k,n)\neq(k',n')$. If $k = k'$ then $T^{-k}x\in A_n\cap A_{n'}$, forcing $n = n'$; so assume $k < k'$. Put $y' = T^{-k'}x\in A_{n'}$, so $r_A(y') = n'$. But $T^{k'-k}y' = T^{-k}x\in A_n\subseteq A$, and $1\leq k'-k\leq k'-1 < n'$, so $y'$ returns to $A$ before time $r_A(y')$, a contradiction. Hence the sets are pairwise disjoint, their union has full measure, and $\sum_n\sum_{k=1}^{n}\mu(T^kA_n) = 1$, completing the proof.
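Kac's formula lends itself to a quick numerical illustration. The sketch below (the angle $\alpha$, the set $A$, and the number of steps are arbitrary choices, and not part of the notes) runs the invertible, ergodic rotation $T(x) = x + \alpha \bmod 1$ and records return times to $A = [0, 0.2)$; by Kac their average should be about $1/\mu(A) = 5$:

```python
import math

# Numerical illustration of Kac's formula for the (invertible, ergodic)
# irrational rotation T(x) = x + alpha mod 1 with A = [0, a):
# the average return time to A should be 1/mu(A) = 1/a.
alpha = math.sqrt(2) - 1     # irrational rotation angle (arbitrary)
a = 0.2                      # A = [0, a), so mu(A) = a
steps = 1_000_000            # orbit length (arbitrary)

hit_times = []
x = 0.0
for n in range(steps):
    if x < a:                # the orbit is currently in A
        hit_times.append(n)
    x = (x + alpha) % 1.0

# gaps between consecutive visits are exactly the return times r_A
gaps = [t2 - t1 for t1, t2 in zip(hit_times, hit_times[1:])]
mean_return = sum(gaps) / len(gaps)
print(mean_return)           # close to 1/a = 5
```

By the three-distance theorem the individual return times take only a few distinct values, yet their average is pinned down by the measure of $A$ alone.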

3.4 Ergodic measures as extreme points

It is clear that $\mathcal{P}_T(X)$ is convex; in this section we will prove a nice algebraic characterization of the ergodic measures as precisely the extreme points of $\mathcal{P}_T(X)$. Recall that a point in a convex set is an extreme point if it cannot be written as a convex combination of other points in the set.

Lemma. Let $\mu,\nu\in\mathcal{P}_T(X)$ with $\nu\ll\mu$. Then $f = d\nu/d\mu$ is $T$-invariant $\mu$-a.e.

Proof. Let $f = d\nu/d\mu$. Given $t$, let $E = \{f < t\}$. Now

$$\nu(E) = \int_E f\,d\mu = \int_{E\cap T^{-1}E}f\,d\mu + \int_{E\setminus T^{-1}E}f\,d\mu$$

On the other hand,

$$\nu(E) = \nu(T^{-1}E) = \int_{T^{-1}E\cap E}f\,d\mu + \int_{T^{-1}E\setminus E}f\,d\mu$$

Subtracting, we find that

$$\int_{E\setminus T^{-1}E}f\,d\mu = \int_{T^{-1}E\setminus E}f\,d\mu$$

On the left-hand side the integral is over a subset of $E$, where $f < t$; on the right-hand side it is over a set disjoint from $E$, where $f\geq t$, so the right-hand side is $\geq t\mu(T^{-1}E\setminus E)$ while the left-hand side is strictly smaller than $t\mu(E\setminus T^{-1}E)$ unless that set is null. Since $\mu(E) = \mu(T^{-1}E)$, the two set differences have equal measure, so equality is possible only if both are $\mu$-null sets, i.e. only if $E = T^{-1}E\bmod\mu$, which is the desired invariance of $E$; since this holds for every $t$, the function $f$ is invariant a.e.

Remark 3.4.1. If $T$ is invertible, there is an easier argument: since $T\mu = \mu$ and $T\nu = \nu$ we have $dT\nu/dT\mu = d\nu/d\mu = f$. Now, for any measurable set $A$,

$$\int_A dT\nu = \int 1_A\,dT\nu = \int 1_A\circ T\,d\nu = \int (1_A\circ T)\,f\,d\mu = \int 1_A\cdot(f\circ T^{-1})\,dT\mu = \int_A f\circ T^{-1}\,dT\mu$$

This shows that $f\circ T^{-1} = dT\nu/dT\mu = f$, i.e. $f$ is invariant. But of course we have used invertibility.

Proposition 3.4.2. The ergodic invariant measures are precisely the extreme points of $\mathcal{P}_T(X)$.

Proof. If $\mu\in\mathcal{P}_T(X)$ is non-ergodic then there is an invariant set $A$ with $0 < \mu(A) < 1$. Then $B = X\setminus A$ is also invariant. Let $\mu_A = \frac{1}{\mu(A)}\mu|_A$ and $\mu_B = \frac{1}{\mu(B)}\mu|_B$ denote the normalized restrictions of $\mu$ to $A,B$. Clearly $\mu = \mu(A)\mu_A + \mu(B)\mu_B$, so $\mu$ is a convex combination of $\mu_A,\mu_B$, and these measures are invariant:

$$\mu_A(T^{-1}E) = \frac{1}{\mu(A)}\mu(A\cap T^{-1}E) = \frac{1}{\mu(A)}\mu(T^{-1}A\cap T^{-1}E) = \frac{1}{\mu(A)}\mu(T^{-1}(A\cap E)) = \frac{1}{\mu(A)}\mu(A\cap E) = \mu_A(E)$$

Thus $\mu$ is not an extreme point of $\mathcal{P}_T(X)$.

Conversely, suppose that $\mu = \alpha\nu + (1-\alpha)\theta$ for $\nu,\theta\in\mathcal{P}_T(X)$, $0 < \alpha < 1$, and $\nu\neq\mu$. Clearly $\mu(E) = 0$ implies $\nu(E) = 0$, so $\nu\ll\mu$, and by the previous lemma $f = d\nu/d\mu\in L^1(\mu)$ is invariant. Since $1 = \nu(X) = \int f\,d\mu$, we know that $f\neq0$, and since $\nu\neq\mu$ we know that $f$ is not a.e. constant. Hence $\mu$ is not ergodic by Lemma 3.1.5.

As an application we find that distinct ergodic measures are also separated at the spatial level:

Corollary 3.4.3. Let $\mu,\nu$ be ergodic measures for a measurable map $T$ of a measurable space $(X,\mathcal{B})$. Then either $\mu = \nu$ or $\mu\perp\nu$.

Proof. Suppose $\mu\neq\nu$ and let $\theta = \frac12\mu + \frac12\nu$. Since this is a nontrivial representation of $\theta$ as a convex combination, $\theta$ is not ergodic, so there is a nontrivial invariant set $A$. By ergodicity, $A$ must have $\mu$-measure $0$ or $1$, and similarly for $\nu$. The two measures cannot both give $A$ measure $0$, since this would imply $\theta(A) = 0$, and they cannot both give it measure $1$, since this would imply $\theta(A) = 1$. Therefore one is $0$ and one is $1$. This implies that $A$ supports one of the measures and $X\setminus A$ the other, so $\mu\perp\nu$.

3.5 Ergodic decomposition I

Having described those systems that are "indecomposable", we now turn to study how a non-ergodic system may decompose into ergodic ones. One can begin such a decomposition immediately from the definitions: if $\mu\in\mathcal{P}_T(X)$ is not ergodic then there is an invariant set $A$, and $\mu$ is a convex combination of the invariant measures $\mu_A,\mu_{X\setminus A}$, which are supported on disjoint invariant sets. If $\mu_A,\mu_B$ are not ergodic we can split each of them further as a convex combination of mutually singular invariant measures. Iterating this procedure, we obtain representations of $\mu$ as convex combinations of increasingly "fine" mutually singular measures. While at each finite stage the component measures need not be ergodic, in a sense they are getting closer to being ergodic, since at each stage we eliminate a potential invariant set. One would like to pass to a limit, in some sense, and represent the original measure as a convex combination of ergodic ones.

Example 3.5.1. If $T$ is a bijection of a finite set with normalized counting measure, then the measure splits as a convex combination of uniform measures on orbits, each of which is ergodic.

In general, it is too much to ask that a measure be a finite convex combination of ergodic ones.

Example 3.5.2. Let $X = [0,1]$ with the Borel sets and Lebesgue measure $\mu$, and let $T$ be the identity map. In this case the only ergodic measures are atomic (the point masses $\delta_x$), so we cannot write $\mu$ as a finite convex combination of ergodic measures.
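Example 3.5.1 can be made concrete in code. The following sketch (the particular permutation is an arbitrary choice, not taken from the notes) finds the cycles of a bijection of a six-point set and re-assembles normalized counting measure as a convex combination of the uniform measures on those cycles:

```python
# Example 3.5.1 in code: a bijection T of a finite set decomposes the
# normalized counting measure into uniform measures on its orbits (cycles).
T = {0: 2, 1: 3, 2: 0, 3: 4, 4: 1, 5: 5}   # cycles: (0 2), (1 3 4), (5)

def orbit(x):
    """The T-orbit (cycle) through x."""
    o, y = [x], T[x]
    while y != x:
        o.append(y)
        y = T[y]
    return o

n = len(T)
orbits, seen = [], set()
for x in T:
    if x not in seen:
        o = orbit(x)
        seen.update(o)
        orbits.append(o)

# mu = sum over orbits O of the weight |O|/n times the uniform measure on O;
# this convex combination must give every point its counting-measure mass 1/n.
mu = {x: sum((len(o) / n) * (1 / len(o)) for o in orbits if x in o)
      for x in T}
print(orbits, mu)
```

Each uniform measure on a cycle is invariant and ergodic (the cycle is a single orbit), and the weights $|O|/n$ sum to one.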

The idea of decomposing $\mu$ is also motivated by the characterization of ergodic measures as the extreme points of $\mathcal{P}_T(X)$ and the fact that in finite-dimensional vector spaces a point in a convex set is a convex combination of extreme points. There are also infinite-dimensional versions of this: if $\mathcal{P}_T(X)$ can be made into a compact convex set satisfying some other mild conditions, one can apply Choquet's theorem. However, we will take a more measure-theoretic approach, through measure integration and disintegration.

3.6 Measure integration

Given a measurable space $(X,\mathcal{B})$, a family $\{\nu_x\}_{x\in X}$ of probability measures on $(Y,\mathcal{C})$ is measurable if for every $E\in\mathcal{C}$ the map $x\mapsto\nu_x(E)$ is measurable (with respect to $\mathcal{B}$). Equivalently, for every bounded measurable function $f: Y\to\mathbb{R}$, the map $x\mapsto\int f(y)\,d\nu_x(y)$ is measurable.

Given a measure $\mu\in\mathcal{P}(X)$ we can define the probability measure $\nu = \int\nu_x\,d\mu(x)$ on $Y$ by

$$\nu(E) = \int\nu_x(E)\,d\mu(x)$$

For bounded measurable $f: Y\to\mathbb{R}$ this gives

$$\int f\,d\nu = \int\left(\int f\,d\nu_x\right)d\mu(x)$$

and the same holds for $f\in L^1(\nu)$ by approximation (although $f$ is then defined only on a set $E$ of full $\nu$-measure, we have $\nu_x(E) = 1$ for $\mu$-a.e. $x$, so the inner integral is well defined $\mu$-a.e.).

Example 3.6.1. Let $X$ be finite and $\mathcal{B} = 2^X$. Then

$$\int\nu_x\,d\mu(x) = \sum_{x\in X}\mu(x)\cdot\nu_x$$

Any convex combination of measures on $Y$ can be represented this way, so the definition above generalizes convex combinations.

Example 3.6.2. For any measure $\mu$ on $(X,\mathcal{B})$ the family $\{\delta_x\}_{x\in X}$ is measurable, since $\delta_x(E) = 1_E(x)$, and $\mu = \int\delta_x\,d\mu(x)$ because

$$\mu(E) = \int 1_E(x)\,d\mu(x) = \int\delta_x(E)\,d\mu(x)$$

In this case the parameter space is the same as the target space. In particular, this representation shows that Lebesgue measure on $[0,1]$ is an integral of ergodic measures for the identity map.

Example 3.6.3. Let $X = [0,1]$ and $Y = [0,1]^2$. For $x\in[0,1]$ let $\nu_x$ be Lebesgue measure on the fiber $\{x\}\times[0,1]$. Measurability is verified using the definition of the product $\sigma$-algebra, and by Fubini's theorem

$$\nu(E) = \int\nu_x(E)\,d\mu(x) = \int_0^1\int_0^1 1_E(x,y)\,dy\,dx = \iint_E 1\,dx\,dy$$

so $\nu$ is just Lebesgue measure on $[0,1]^2$.

One could also represent $\nu$ as $\int\nu_{x,y}\,d\nu(x,y)$ where $\nu_{x,y} = \nu_x$. Written this way, each fiber measure appears many times.

3.7 Measure disintegration

We now reverse the procedure above and study how a measure may be decomposed as an integral of other measures. Specifically, we will study the decomposition of a measure with respect to a partition.

Example 3.7.1. Let $(X,\mathcal{B},\mu)$ be a probability space and let $\mathcal{P} = \{P_1,\dots,P_n\}$ be a finite partition of it, i.e. the $P_i$ are measurable, $P_i\cap P_j = \emptyset$ for $i\neq j$, and $X = \bigcup P_i$. For simplicity assume also that $\mu(P_i) > 0$. Let $\mathcal{P}(x)$ denote the unique $P_i$ containing $x$, and let $\mu_x$ denote the conditional measure on it, $\mu_x = \frac{1}{\mu(\mathcal{P}(x))}\mu|_{\mathcal{P}(x)}$. Then it is easy to check that $\mu = \int\mu_x\,d\mu(x)$.

Alternatively we can define $Y = \{1,\dots,n\}$ with the probability measure given by $P(\{i\}) = \mu(P_i)$. Let $\mu_i = \frac{1}{\mu(P_i)}\mu|_{P_i}$. Then $\mu = \sum\mu(P_i)\mu_i = \int\mu_i\,dP(i)$.

Our goal is to give a similar decomposition of a measure with respect to an infinite (usually uncountable) partition $\mathcal{E}$ of $X$. Then the partition elements $E\in\mathcal{E}$ typically have measure $0$, and the formula $\frac{1}{\mu(E)}\mu|_E$ no longer makes sense. As in probability theory, one can define the conditional probability of an event $A$ given the partition as the conditional expectation $\mathbb{E}(1_A|\mathcal{E})$ evaluated at $x$ (conditional expectation is reviewed in the Appendix). This would appear to give the desired decomposition: define $\mu_x(A) = \mathbb{E}(1_A|\mathcal{E})(x)$. On any countable algebra of sets this does give a countably additive measure defined for $\mu$-a.e. $x$. The problem is that $\mu_x(A)$ is defined only off a nullset that depends on $A$, while we want $\mu_x$ to be defined simultaneously on all measurable sets. Overcoming this problem is a technical but nontrivial chore which will occupy us for the rest of the section.

For a measurable space $(X,\mathcal{B})$ and a sub-$\sigma$-algebra $\mathcal{E}\subseteq\mathcal{B}$ generated by a countable sequence $\{E_n\}$, write $x\sim_{\mathcal{E}}y$ if $1_E(x) = 1_E(y)$ for every $E\in\mathcal{E}$, or equivalently, $1_{E_n}(x) = 1_{E_n}(y)$ for all $n$. This is an equivalence relation. The atoms of $\mathcal{E}$ are by definition the equivalence classes of $\sim_{\mathcal{E}}$, which are measurable, being intersections of sequences $F_n$ of the form $F_n\in\{E_n, X\setminus E_n\}$. We denote by $\mathcal{E}(x)$ the atom containing $x$.

In the next theorem we assume that the space is compact, which makes the Riesz representation theorem available as a means of defining measures. We shall discuss this restriction afterwards.
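Before turning to the general theorem, the finite case of Example 3.7.1 can be checked concretely. The sketch below (the six-point space, the measure, and the two-cell partition are all arbitrary choices) computes the conditional measures $\mu_x$ and verifies that integrating them recovers $\mu$:

```python
# Concrete instance of Example 3.7.1: disintegrating a measure on a
# six-point space over a two-cell partition, then re-integrating.
mu = {0: 0.1, 1: 0.2, 2: 0.1, 3: 0.25, 4: 0.25, 5: 0.1}
cells = [{0, 1, 2}, {3, 4, 5}]          # the partition P

def cell_of(x):
    return next(P for P in cells if x in P)

def mu_x(x):
    """Conditional measure on the cell of x: (1/mu(P(x))) * mu restricted
    to P(x)."""
    P = cell_of(x)
    w = sum(mu[y] for y in P)
    return {y: (mu[y] / w if y in P else 0.0) for y in mu}

# Integrate the family back: nu(E) = sum_x mu_x(E) mu(x); this recovers mu.
nu = {e: sum(mu_x(x)[e] * mu[x] for x in mu) for e in mu}
print(nu)
```

The re-integrated measure agrees with $\mu$ point by point, which is exactly the identity $\mu = \int\mu_x\,d\mu(x)$ in this finite setting.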

Theorem 3.7.2. Let $X$ be a compact metric space, $\mathcal{B}$ the Borel $\sigma$-algebra, and $\mathcal{E}\subseteq\mathcal{B}$ a countably generated sub-$\sigma$-algebra. Then there is an $\mathcal{E}$-measurable family $\{\mu_y\}_{y\in X}\subseteq\mathcal{P}(X)$ such that $\mu_y$ is supported on $\mathcal{E}(y)$ and

$$\mu = \int\mu_y\,d\mu(y)$$

Furthermore, if $\{\mu'_y\}_{y\in X}$ is another such family then $\mu_y = \mu'_y$ a.e.

Note that $\mathcal{E}$-measurability has the following consequence: for $\mu$-a.e. $y$, for every $y'\in\mathcal{E}(y)$ we have $\mu_{y'} = \mu_y$ (and, since $\mu_y(\mathcal{E}(y)) = 1$, it follows that $\mu_{y'} = \mu_y$ for $\mu_y$-a.e. $y'$).

Definition 3.7.3. The representation $\mu = \int\mu_y\,d\mu(y)$ constructed in the proof is often called the disintegration of $\mu$ over $\mathcal{E}$.

We adopt the convention that $y$ denotes the variable of $\mathcal{E}$-measurable functions. Let $V\subseteq C(X)$ be a countable dense $\mathbb{Q}$-linear subspace with $1\in V$. For $f\in V$ let

$$\bar f = \mathbb{E}(f|\mathcal{E})$$

(see the Appendix for a discussion of conditional expectation). Since $V$ is countable, there is a subset $X_0\subseteq X$ of full measure such that, on $X_0$, $\bar f$ is defined everywhere for every $f\in V$, the map $f\mapsto\bar f$ is $\mathbb{Q}$-linear and positive, and $\bar1 = 1$. Thus, for $y\in X_0$ the functionals $\Lambda_y: V\to\mathbb{R}$ given by

$$\Lambda_y(f) = \bar f(y)$$

are positive $\mathbb{Q}$-linear functionals on the normed space $(V,\|\cdot\|_\infty)$, and they are continuous, since by positivity of conditional expectation $\|\bar f\|_\infty\leq\|f\|_\infty$. Thus $\Lambda_y$ extends to a positive $\mathbb{R}$-linear functional $\Lambda_y: C(X)\to\mathbb{R}$. Note that $\Lambda_y1 = \bar1(y) = 1$. Hence, by the Riesz representation theorem, there exists $\mu_y\in\mathcal{P}(X)$ such that

$$\Lambda_yf = \int f(x)\,d\mu_y(x)$$

For $y\in X\setminus X_0$ define $\mu_y$ to be some fixed measure, to ensure measurability.

Proposition 3.7.4. $y\mapsto\mu_y$ is $\mathcal{E}$-measurable, and $\mathbb{E}(1_A|\mathcal{E})(y) = \mu_y(A)$ $\mu$-a.e., for every $A\in\mathcal{B}$.

Proof. Let $\mathcal{A}\subseteq\mathcal{B}$ denote the family of sets $A\in\mathcal{B}$ such that $y\mapsto\mu_y(A)$ is $\mathcal{E}$-measurable and $\mathbb{E}(1_A|\mathcal{E})(y) = \mu_y(A)$ $\mu$-a.e. We want to show that $\mathcal{A} = \mathcal{B}$.

Let $\mathcal{A}_0\subseteq\mathcal{B}$ denote the family of sets $A\subseteq X$ such that $1_A$ is a pointwise limit of a uniformly bounded sequence of continuous functions. First, $\mathcal{A}_0$ is an algebra: clearly $X,\emptyset\in\mathcal{A}_0$; if $f_n\to1_A$ then $1-f_n\to1_{X\setminus A}$, and if also $g_n\to1_B$ then $f_ng_n\to1_A1_B = 1_{A\cap B}$.

We claim that $\mathcal{A}_0\subseteq\mathcal{A}$. Indeed, if $f_n\to1_A$ and $\|f_n\|_\infty\leq C$, then

$$\int f_n\,d\mu_y\to\int 1_A\,d\mu_y = \mu_y(A)$$

by dominated convergence, so $y\mapsto\mu_y(A)$ is the pointwise limit of the functions $y\mapsto\int f_n\,d\mu_y$, which agree a.e. with the measurable functions $\bar f_n = \mathbb{E}(f_n|\mathcal{E})$. This establishes measurability of the limit function $y\mapsto\mu_y(A)$ and also proves that this function is $\mathbb{E}(1_A|\mathcal{E})$ a.e., since $\mathbb{E}(\cdot|\mathcal{E})$ is continuous in $L^1$ and $f_n\to1_A$ boundedly. This proves $\mathcal{A}_0\subseteq\mathcal{A}$.

Now, $\mathcal{A}_0$ contains the closed sets, since if $A\subseteq X$ is closed then $1_A = \lim f_n$ for $f_n(x) = \exp(-n\cdot d(x,A))$. Thus $\mathcal{A}_0$ generates the Borel $\sigma$-algebra $\mathcal{B}$.

Finally, we claim that $\mathcal{A}$ is a monotone class. Indeed, if $A_1\subseteq A_2\subseteq\dots$ belong to $\mathcal{A}$ and $A = \bigcup A_n$, then $\mu_y(A) = \lim\mu_y(A_n)$, and so $y\mapsto\mu_y(A)$ is the pointwise limit of the measurable functions $y\mapsto\mu_y(A_n)$. The latter functions are just $\mathbb{E}(1_{A_n}|\mathcal{E})$ (a.e.) and, since $1_{A_n}\to1_A$ in $L^1$, by continuity of conditional expectation $\mathbb{E}(1_{A_n}|\mathcal{E})\to\mathbb{E}(1_A|\mathcal{E})$ in $L^1$. Hence $\mu_y(A) = \mathbb{E}(1_A|\mathcal{E})$ a.e., as desired; decreasing sequences are handled in the same way.

Since $\mathcal{A}$ is a monotone class containing the algebra $\mathcal{A}_0$, and $\mathcal{A}_0$ generates $\mathcal{B}$, by the monotone class theorem $\mathcal{B}\subseteq\mathcal{A}$. Thus $\mathcal{A} = \mathcal{B}$, as desired.

Proposition 3.7.5. $\mathbb{E}(f|\mathcal{E})(y) = \int f\,d\mu_y$ $\mu$-a.e., for every $f\in L^1(\mu)$.

Proof. We know that this holds for $f = 1_A$ by the previous proposition. Both sides of the claimed equality are linear in $f$ and continuous under monotone increasing sequences. Approximating by simple functions, this gives the claim for positive $f\in L^1$ and, taking differences, for all $f\in L^1$.

Proposition 3.7.6. $\mu_y$ is $\mu$-a.s. supported on $\mathcal{E}(y)$; that is, $\mu_y(\mathcal{E}(y)) = 1$ for $\mu$-a.e. $y$.

Proof. For $E\in\mathcal{E}$ we have, $\mu$-a.e.,

$$1_E(y) = \mathbb{E}(1_E|\mathcal{E})(y) = \int 1_E\,d\mu_y = \mu_y(E)$$

and it follows that $\mu_y(E) = 1_E(y)$ a.e. Let $\{E_n\}_{n=1}^\infty$ generate $\mathcal{E}$, and choose a set of full measure on which the above holds for all $E = E_n$. For $y$ in this set let $F_n\in\{E_n, X\setminus E_n\}$ be such that $\mathcal{E}(y) = \bigcap F_n$. By the above $\mu_y(F_n) = 1$, and so $\mu_y(\mathcal{E}(y)) = 1$, as claimed.

Proposition 3.7.7. If $\{\mu'_y\}_{y\in X}$ is another family with the same properties then $\mu'_y = \mu_y$ for $\mu$-a.e. $y$.

Proof. For $f\in L^1(\mu)$ define $f'(y) = \int f\,d\mu'_y$. Then $f\mapsto f'$ is clearly a linear operator defined on $L^1(X,\mathcal{B},\mu)$, and its range lies in $L^1(X,\mathcal{E},\mu)$ because

$$\int|f'|\,d\mu\leq\int\left(\int|f|\,d\mu'_y\right)d\mu(y) = \int|f|\,d\mu = \|f\|_1$$

The same calculation shows that $\int f'\,d\mu = \int f\,d\mu$. Finally, for $E\in\mathcal{E}$ we know that $\mu'_y$ is supported on $E$ for $\mu$-a.e. $y\in E$ and on $X\setminus E$ for $\mu$-a.e. $y\in X\setminus E$. Thus $\mu$-a.s. we have

$$(1_Ef)'(y) = \int 1_Ef\,d\mu'_y = 1_E(y)\int f\,d\mu'_y = (1_E\cdot f')(y)$$

By a well-known characterization of conditional expectation, $f' = \mathbb{E}(f|\mathcal{E}) = \bar f$ (see the Appendix). Applying this with $f$ ranging over a countable dense subset of $C(X)$, we find that for $\mu$-a.e. $y$ the measures $\mu'_y$ and $\mu_y$ integrate all such $f$ identically, and hence $\mu'_y = \mu_y$.

It remains to address the compactness assumption on $X$. Examples show that the disintegration theorem does require some assumption; it does not hold for arbitrary measure spaces and sub-$\sigma$-algebras. We will not eliminate the compactness assumption so much as explain why it is not a large restriction: every standard Borel space is isomorphic, as a measurable space, to a Borel subset of a compact metric space, so the theorem transfers to that setting. We can now formulate the disintegration theorem as follows.

Theorem 3.7.8. Let $\mu$ be a probability measure on a standard Borel space $(X,\mathcal{B})$ and let $\mathcal{E}\subseteq\mathcal{B}$ be a countably generated sub-$\sigma$-algebra. Then there is an $\mathcal{E}$-measurable family $\{\mu_y\}_{y\in X}\subseteq\mathcal{P}(X,\mathcal{B})$ such that $\mu_y$ is supported on $\mathcal{E}(y)$ and

$$\mu = \int\mu_y\,d\mu(y)$$

Furthermore, if $\{\mu'_y\}_{y\in X}$ is another such family then $\mu_y = \mu'_y$ $\mu$-a.e.

3.8 Ergodic decomposition II

Let $(X,\mathcal{B},\mu,T)$ be a measure preserving system on a Borel space. Let $\mathcal{I}\subseteq\mathcal{B}$ denote the family of $T$-invariant measurable sets. It is easy to check that $\mathcal{I}$ is a $\sigma$-algebra.

The $\sigma$-algebra $\mathcal{I}$ in general is not countably generated. Consider for example the case of an invertible ergodic transformation on a Borel space with a non-atomic invariant measure, such as an irrational circle rotation or a two-sided Bernoulli shift. Then $\mathcal{I}$ consists only of sets of measure $0$ and $1$. If $\mathcal{I}$ were countably generated, by $\{I_n\}_{n=1}^\infty$ say, then for each $n$ either $\mu(I_n) = 1$ or $\mu(X\setminus I_n) = 1$. Set $F_n = I_n$ or $F_n = X\setminus I_n$ according to these possibilities, so that $\mu(F_n) = 1$. Then $F = \bigcap F_n$ is an invariant set of measure $1$ and is an atom of $\mathcal{I}$. But the atoms of $\mathcal{I}$ are the orbits, since each point in $X$ is measurable and hence every countable set is, and orbits are invariant. This would imply that $\mu$ is supported on a single countable orbit, contradicting the assumption that it is non-atomic.

We shall work instead with a fixed countably generated, $\mu$-dense sub-$\sigma$-algebra $\mathcal{I}_0$ of $\mathcal{I}$. Indeed, $L^1(X,\mathcal{I},\mu)$ is a closed subspace of $L^1(X,\mathcal{B},\mu)$, and since the latter is separable, so is the former. Choose a dense countable sequence $f_n\in L^1(X,\mathcal{I},\mu)$, choosing representatives of the functions that are genuinely $\mathcal{I}$-measurable, not just modulo a $\mathcal{B}$-measurable nullset. Now consider the countable family of sets $A_{n,p,q} = \{p < f_n < q\}$ for $p,q\in\mathbb{Q}$, and let $\mathcal{I}_0\subseteq\mathcal{I}$ be the $\sigma$-algebra they generate; it is countably generated, and every $\mathcal{I}$-measurable function can be approximated in $L^1$ by $\mathcal{I}_0$-measurable ones.

Let $\{\mu_y\}_{y\in X}$ be the disintegration of $\mu$ relative to $\mathcal{I}_0$. We need only show that for $\mu$-a.e. $y$ the measure $\mu_y$ is $T$-invariant and ergodic.

Claim 3.8.2. For $\mu$-a.e. $y$, $\mu_y$ is $T$-invariant.

Proof. Define $\mu'_y = T\mu_y$. This is a measurable family, since for any $E\in\mathcal{B}$ we have $\mu'_y(E) = \mu_y(T^{-1}E)$, so measurability of $y\mapsto\mu'_y(E)$ follows from that of $y\mapsto\mu_y(E)$. We claim that $\{\mu'_y\}_{y\in X}$ is a disintegration of $\mu$ over $\mathcal{I}_0$. Indeed, for any $E\in\mathcal{B}$,

$$\int\mu'_y(E)\,d\mu(y) = \int\mu_y(T^{-1}E)\,d\mu(y) = \mu(T^{-1}E) = \mu(E)$$

Also $T^{-1}\mathcal{I}_0(y) = \mathcal{I}_0(y)$ (since $\mathcal{I}_0(y)\in\mathcal{I}$), so

$$\mu'_y(\mathcal{I}_0(y)) = \mu_y(T^{-1}\mathcal{I}_0(y)) = \mu_y(\mathcal{I}_0(y)) = 1$$

so $\mu'_y$ is supported on $\mathcal{I}_0(y)$. Thus $\{\mu'_y\}_{y\in X}$ is an $\mathcal{I}_0$-measurable disintegration of $\mu$, so $\mu'_y = \mu_y$ a.e. by uniqueness. This is exactly the a.e. invariance of $\mu_y$.

Claim 3.8.3. For $\mu$-a.e. $y$, $\mu_y$ is ergodic.

Proof. This can be proved by purely measure-theoretic means, but we will give a proof that uses the mean ergodic theorem, Theorem 4.2.3 below. Let $\mathcal{F}\subseteq C(X)$ be a dense countable family. Then

$$\frac{1}{N}\sum_{n=1}^{N}T^nf\to\mathbb{E}(f|\mathcal{I}) = \mathbb{E}(f|\mathcal{I}_0)$$

in $L^2(\mathcal{B},\mu)$. For each $f\in\mathcal{F}$ we can ensure that this holds a.e. along an appropriate subsequence, and by a diagonal argument we can construct a single subsequence $N_k\to\infty$ such that $\frac{1}{N_k}\sum_{n=1}^{N_k}T^nf\to\mathbb{E}(f|\mathcal{I}_0)$ a.e., for all $f\in\mathcal{F}$. Since $\mu = \int\mu_y\,d\mu(y)$, this holds $\mu_y$-a.e. for $\mu$-a.e. $y$. Now, for such a $y$, in the measure preserving system $(X,\mathcal{B},\mu_y,T)$, for $f\in\mathcal{F}$ we have $\frac{1}{N_k}\sum_{n=1}^{N_k}T^nf\to\mathbb{E}_{\mu_y}(f|\mathcal{I})$ in $L^2(\mu_y)$; since $f\in\mathcal{F}$ is bounded and the limit is $\mu_y$-a.s. equal to $\mathbb{E}(f|\mathcal{I}_0)$, we have

$$\mathbb{E}_{\mu_y}(f|\mathcal{I}) = \mathbb{E}(f|\mathcal{I}_0)\quad\mu_y\text{-a.e.}$$

But the right hand side is $\mathcal{I}_0$-measurable, hence $\mu_y$-a.e. constant ($\mu_y$ being supported on a single atom of $\mathcal{I}_0$). We have found that for $f\in\mathcal{F}$ the conditional expectation $\mathbb{E}_{\mu_y}(f|\mathcal{I})$ is $\mu_y$-a.e. constant. Since $\mathcal{F}$ is dense in $C(X)$, and therefore in $L^1(\mathcal{B},\mu_y)$, and $\mathbb{E}_{\mu_y}(\cdot|\mathcal{I})$ is continuous, the image of $\mathbb{E}_{\mu_y}(\cdot|\mathcal{I})$ is contained in the constant functions. But if $g\in L^1(\mathcal{B},\mu_y)$ is invariant then it is $\mathcal{I}$-measurable and $\mathbb{E}_{\mu_y}(g|\mathcal{I}) = g$, so $g$ is constant. Thus all invariant functions in $L^1(\mathcal{B},\mu_y)$ are constant, which implies that $(X,\mathcal{B},\mu_y,T)$ is ergodic.

Our formulation of the ergodic decomposition theorem represents $\mu$ as an integral of ergodic measures parametrized by $y\in X$ (in an $\mathcal{I}_0$-measurable way). Sometimes the following formulation is given, in which $\mathcal{P}(X)$ carries the $\sigma$-algebra generated by the maps $\mu\mapsto\mu(E)$, $E\in\mathcal{B}$; this coincides with the Borel structure induced by the weak-* topology when $X$ is given the structure of a compact metric space. One can show that the set of ergodic measures is measurable, for example because in the topological representation they are the extreme points of a weak-* compact convex set.

Theorem 3.8.4 (Ergodic decomposition, second version). Let $(X,\mathcal{B},\mu,T)$ be a measure preserving system on a Borel space. Then there is a unique probability measure $\theta$ on $\mathcal{P}(X)$, supported on the ergodic measures, such that $\mu = \int\nu\,d\theta(\nu)$.

Chapter 4

The ergodic theorem

4.1 Preliminaries

We have seen that in a measure preserving system, a.e. $x\in A$ returns to $A$ infinitely often. Now we will see that more is true: these returns occur with a definite frequency which, in the ergodic case, is just $\mu(A)$; in the non-ergodic case the limit is $\mu_x(A)$, where $\mu_x$ is the ergodic component to which $x$ belongs.

This phenomenon is better formulated at an analytic level, in terms of averages of functions along an orbit. To this end let us introduce some notation. Let $T: V\to V$ be a linear operator of a normed space $V$, and suppose $T$ is a contraction, i.e. $\|Tf\|\leq\|f\|$. This is the case when $T$ is induced from a measure-preserving transformation (in fact we then have equality). For $v\in V$ define

$$S_Nv = \frac{1}{N}\sum_{n=0}^{N-1}T^nv$$

Note that in the dynamical setting, the frequency of visits of $x$ to $A$ up to time $N$ is $S_N1_A(x) = \frac{1}{N}\sum_{n=0}^{N-1}1_A(T^nx)$. Clearly $S_N$ is linear, and since $T$ is a contraction, $\|T^nv\|\leq\|v\|$ for $n\geq1$, so by the triangle inequality $\|S_Nv\|\leq\frac{1}{N}\sum_{n=0}^{N-1}\|T^nv\|\leq\|v\|$. Thus the $S_N$ are also contractions. This has the following useful consequence.

Lemma 4.1.1. Let $T: V\to V$ be as above and let $S: V\to V$ be another bounded linear operator. Suppose that $V_0\subseteq V$ is a dense subset and that $S_Nv\to Sv$ as $N\to\infty$ for all $v\in V_0$. Then the same is true for all $v\in V$.

Proof. Let $v\in V$ and $w\in V_0$. Then

$$\limsup_{N\to\infty}\|S_Nv - Sv\|\leq\limsup_{N\to\infty}\|S_Nv - S_Nw\| + \limsup_{N\to\infty}\|S_Nw - Sv\|$$

Since $\|S_Nv - S_Nw\| = \|S_N(v-w)\|\leq\|v-w\|$ and $S_Nw\to Sw$ (because $w\in V_0$), we have

$$\limsup_{N\to\infty}\|S_Nv - Sv\|\leq\|v-w\| + \|Sw - Sv\|\leq(1 + \|S\|)\cdot\|v-w\|$$

Since $\|v-w\|$ can be made arbitrarily small, the lemma follows.

4.2 Mean ergodic theorem

Historically, the first ergodic theorem is von Neumann's "mean" ergodic theorem, which can be formulated in a purely Hilbert-space setting (and it is not hard to adapt it to $L^p$). Recall that if $T: V\to V$ is a bounded linear operator of a Hilbert space then $T^*: V\to V$ is the adjoint operator, characterized by $\langle v,Tw\rangle = \langle T^*v,w\rangle$ for $v,w\in V$, and satisfying $\|T^*\| = \|T\|$.

Lemma 4.2.1. Let $T: V\to V$ be a contracting linear operator of a Hilbert space. Then $v\in V$ is $T$-invariant if and only if it is $T^*$-invariant.

Remark 4.2.2. When $T$ is unitary (which is one of the main cases of interest to us) this lemma is trivial. Note however that without the contraction assumption it is false even in $\mathbb{R}^d$.

Proof. Since $(T^*)^* = T$ it suffices to prove that $T^*v = v$ implies $Tv = v$. Now,

$$\|v - Tv\|^2 = \langle v-Tv,\,v-Tv\rangle = \|v\|^2 + \|Tv\|^2 - \langle Tv,v\rangle - \langle v,Tv\rangle = \|v\|^2 + \|Tv\|^2 - \langle v,T^*v\rangle - \langle T^*v,v\rangle = \|v\|^2 + \|Tv\|^2 - 2\langle v,v\rangle = \|Tv\|^2 - \|v\|^2 \leq 0$$

where the last inequality holds because $T$ is a contraction. Hence $\|v-Tv\|^2 = 0$, i.e. $Tv = v$.

Theorem 4.2.3 (Hilbert-space mean ergodic theorem). Let $T$ be a linear contraction of a Hilbert space $V$, i.e. $\|Tv\|\leq\|v\|$. Let $V_0\subseteq V$ denote the closed subspace of $T$-invariant vectors (i.e. $V_0 = \ker(T-I)$) and $\pi$ the orthogonal projection to $V_0$. Then

$$\frac{1}{N}\sum_{n=0}^{N-1}T^nv\to\pi v\qquad\text{for all }v\in V$$

Proof. If $v\in V_0$ then $S_Nv = v$ and so $S_Nv\to v = \pi v$ trivially. Since $V = V_0\oplus V_0^\perp$ and $S_N$ is linear, it suffices to show that $S_Nv\to0$ for $v\in V_0^\perp$. The key insight is that $V_0^\perp$ can be identified as the closure of the space of co-boundaries:

$$V_0^\perp = \overline{\{v - Tv : v\in V\}}\tag{4.1}$$

Assuming this, by Lemma 4.1.1 we need only show that $S_N(v - Tv)\to0$ for $v\in V$, and this follows from

$$S_N(v - Tv) = \frac{1}{N}\sum_{n=0}^{N-1}T^n(v - Tv) = \frac{1}{N}(v - T^Nv)\to0$$

where in the last step we used $\|v - T^Nv\|\leq\|v\| + \|T^Nv\|\leq2\|v\|$.

To prove (4.1) it suffices to show that if $w\perp\{v - Tv : v\in V\}$ then $w\in V_0$ (the reverse inclusion holds since for $u\in V_0$ the lemma gives $T^*u = u$, so $\langle u, v-Tv\rangle = \langle u,v\rangle - \langle T^*u,v\rangle = 0$). Suppose that $w\perp(v - Tv)$ for all $v\in V$. Since

$$\langle w, v - Tv\rangle = \langle w,v\rangle - \langle w,Tv\rangle = \langle w,v\rangle - \langle T^*w,v\rangle = \langle w - T^*w, v\rangle$$

we conclude that $\langle w - T^*w, v\rangle = 0$ for all $v\in V$, hence $w = T^*w$. By the lemma, $Tw = w$, as desired.
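The theorem can be visualized numerically for a concrete contraction. In the sketch below (the angle, the vector and $N$ are arbitrary choices, not from the notes), $T$ is rotation of $\mathbb{R}^3$ about the first axis, a unitary map whose invariant subspace $\ker(T-I)$ is spanned by $e_1$, so the averages should converge to the orthogonal projection $(v_1, 0, 0)$:

```python
import math

# Numerical illustration of the mean ergodic theorem for a concrete
# contraction: rotation of R^3 about the first axis (unitary, hence a
# contraction). Its invariant subspace ker(T - I) is span(e_1), so the
# Cesaro averages of T^n v should approach the projection (v_1, 0, 0).
theta = 1.0                          # rotation angle (arbitrary)
c, s = math.cos(theta), math.sin(theta)

def T(v):
    x, y, z = v
    return (x, c * y - s * z, s * y + c * z)

v = (0.7, 1.0, -2.0)                 # arbitrary starting vector
N = 10_000
acc = [0.0, 0.0, 0.0]
w = v
for _ in range(N):                   # accumulate T^0 v, ..., T^{N-1} v
    for i in range(3):
        acc[i] += w[i]
    w = T(w)
avg = [a / N for a in acc]
print(avg)                           # approaches (0.7, 0, 0)
```

The rotating components average out at rate $O(1/N)$ (a geometric-series bound, exactly the co-boundary telescoping in the proof), while the invariant component is preserved.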

Now let $(X,\mathcal{B},\mu,T)$ be a measure preserving system and let $T$ denote also the Koopman operator induced on $L^2$ by $T$. Then the space $V_0$ of $T$-invariant vectors is just $L^2(X,\mathcal{I},\mu)$, where $\mathcal{I}\subseteq\mathcal{B}$ is the $\sigma$-algebra of invariant sets, and the orthogonal projection $\pi$ to $V_0$ is just the conditional expectation operator, $\pi f = \mathbb{E}(f|\mathcal{I})$ (see the Appendix). We derive the following:

Corollary 4.2.4 (Dynamical mean ergodic theorem). Let $(X,\mathcal{B},\mu,T)$ be a measure-preserving system, let $\mathcal{I}$ denote the $\sigma$-algebra of invariant sets, and let $\pi$ denote the orthogonal projection from $L^2(X,\mathcal{B},\mu)$ to the closed subspace $L^2(X,\mathcal{I},\mu)$. Then for every $f\in L^2$,

$$\frac{1}{N}\sum_{n=0}^{N-1}T^nf\to\mathbb{E}(f|\mathcal{I})\quad\text{in }L^2$$

In particular, if the system is ergodic then the limit is constant:

$$\frac{1}{N}\sum_{n=0}^{N-1}T^nf\to\int f\,d\mu\quad\text{in }L^2$$

Specializing to $f = 1_A$, and noting that $L^2$-convergence implies, for example, convergence in probability, the last result says that on an arbitrarily large part of the space, the frequency of visits of an orbit to $A$ up to time $N$ is arbitrarily close to $\mu(A)$, if $N$ is large enough.

4.3 The pointwise ergodic theorem

Very shortly after von Neumann's mean ergodic theorem (and appearing in print before it), Birkhoff proved a stronger version in which convergence takes place a.e. and in $L^1$.

Theorem 4.3.1 (Pointwise ergodic theorem). Let $(X,\mathcal{B},\mu,T)$ be a measure-preserving system and let $\mathcal{I}$ denote the $\sigma$-algebra of invariant sets. Then for any $f\in L^1(\mu)$,

$$\frac{1}{N}\sum_{n=0}^{N-1}T^nf\to\mathbb{E}(f|\mathcal{I})\quad\text{a.e. and in }L^1$$

In particular, if the system is ergodic then the limit is constant:

$$\frac{1}{N}\sum_{n=0}^{N-1}T^nf\to\int f\,d\mu\quad\text{a.e. and in }L^1$$

We shall see several proofs of this result. The first and most "standard" proof follows the same scheme as the mean ergodic theorem: one first establishes the statement for a dense subspace $V\subseteq L^1$, and then uses some continuity property to extend to all of $L^1$. The first step is nearly identical to the proof of the mean ergodic theorem.

Proposition 4.3.2. There is a dense subspace $V\subseteq L^1$ such that the conclusion of the theorem holds for every $f\in V$.

Proof. We temporarily work in $L^2$. Let $V_1$ denote the set of invariant $f\in L^2$, for which the theorem holds trivially because $S_Nf = f$ for all $N$. Let $V_2\subseteq L^2$ denote the linear span of functions of the form $f = g - Tg$ for $g\in L^\infty$. The theorem also holds for these, since

$$\|g - T^Ng\|_\infty\leq\|g\|_\infty + \|T^Ng\|_\infty = 2\|g\|_\infty$$

and therefore

$$\frac{1}{N}\sum_{n=0}^{N-1}T^n(g - Tg) = \frac{1}{N}(g - T^Ng)\to0\quad\text{a.e. and in }L^1$$

Set $V = V_1 + V_2$. By linearity of $S_N$, the theorem holds for $f\in V_1 + V_2$. Now, $L^\infty$ is dense in $L^2$ and $T$ is continuous on $L^2$, so $\overline{V_2} = \overline{\{g - Tg : g\in L^2\}}$. In the proof of the mean ergodic theorem we saw that $L^2 = V_1\oplus\overline{V_2}$, so $V = V_1 + V_2$ is dense in $L^2$, and hence in $L^1$, as required.

By Lemma 4.1.1, this proves the ergodic theorem in the sense of $L^1$-convergence for all $f\in L^1$. In order to similarly extend the pointwise version to all of $L^1$ we need a little bit of "continuity", which is provided by the following.

Theorem 4.3.3 (Maximal inequality). Let $f\in L^1$ with $f\geq0$ and $S_Nf = \frac{1}{N}\sum_{n=0}^{N-1}T^nf$. Then for every $t > 0$,

$$\mu\left(x : \sup_N S_Nf(x) > t\right)\leq\frac{1}{t}\int f\,d\mu$$

Before giving the proof let us show how this finishes the proof of the ergodic theorem. Write $S = \mathbb{E}(\cdot|\mathcal{I})$, which is a bounded linear operator on $L^1$; let $f\in L^1$ and $g\in V$. Then

$$|S_Nf - Sf|\leq|S_Nf - S_Ng| + |S_Ng - Sf|\leq S_N|f-g| + |S_Ng - Sf|$$

Now, $S_Ng\to Sg$ a.e., hence $|S_Ng - Sf|\to|S(g-f)|\leq S|f-g|$ a.e. Thus,

$$\limsup_{N\to\infty}|S_Nf - Sf|\leq\limsup_{N\to\infty}S_N|f-g| + S|g-f|$$

If the left hand side is $>\varepsilon$ then at least one of the terms on the right is $>\varepsilon/2$. Therefore,

$$\mu\left(\limsup_{N\to\infty}|S_Nf - Sf| > \varepsilon\right)\leq\mu\left(\limsup_{N\to\infty}S_N|f-g| > \varepsilon/2\right) + \mu\left(S|g-f| > \varepsilon/2\right)$$

Now, by the maximal inequality, the first term on the right side is bounded by $\frac{1}{\varepsilon/2}\|f-g\|_1$, and by Markov's inequality and the identity $\int Sh\,d\mu = \int h\,d\mu$, the second term is bounded by $\frac{1}{\varepsilon/2}\|g-f\|_1$ as well. Thus for any $\varepsilon > 0$ and $g\in V$ we have found that

$$\mu\left(\limsup_{N\to\infty}|S_Nf - Sf| > \varepsilon\right)\leq\frac{4}{\varepsilon}\|f-g\|_1$$

For each fixed $\varepsilon > 0$, the right-hand side can be made arbitrarily close to $0$, hence $\limsup|S_Nf - Sf| = 0$ a.e., which is just $S_Nf\to Sf = \mathbb{E}(f|\mathcal{I})$, as claimed.

We now return to the maximal inequality, which will be proved by reducing it to a purely combinatorial statement about functions on the integers. Given a function $\hat f:\mathbb{N}\to[0,\infty)$ and a finite set $\emptyset\neq I\subseteq\mathbb{N}$, the average of $\hat f$ over $I$ is denoted

$$S_I\hat f = \frac{1}{|I|}\sum_{i\in I}\hat f(i)$$

In the following discussion we write $[i,j]$ also for integer segments, i.e. $[i,j]\cap\mathbb{Z}$.

Proposition 4.3.4 (Discrete maximal inequality). Let $\hat f:\mathbb{N}\to[0,\infty)$. Let $J\subseteq I\subseteq\mathbb{N}$ be finite intervals, and for each $j\in J$ let $I_j\subseteq I$ be a sub-interval of $I$ whose left endpoint is $j$. Suppose that $S_{I_j}\hat f > t$ for all $j\in J$. Then

$$S_I\hat f > t\cdot\frac{|J|}{|I|}$$

Proof. Suppose first that the intervals $\{I_j\}$ are disjoint. Then together with $U = I\setminus\bigcup I_j$ they form a partition of $I$, and by splitting the average $S_I\hat f$ according to this partition, we have the identity

$$S_I\hat f = \frac{|U|}{|I|}S_U\hat f + \sum_j\frac{|I_j|}{|I|}S_{I_j}\hat f$$

Since $\hat f\geq0$, also $S_U\hat f\geq0$, and so

$$S_I\hat f\geq\sum_j\frac{|I_j|}{|I|}S_{I_j}\hat f > t\cdot\frac{1}{|I|}\sum_j|I_j|\geq t\cdot\frac{|J|}{|I|}$$

where the last inequality holds because $J\subseteq\bigcup I_j$ (each $j$ is the left endpoint of $I_j$).

Now, $\{I_j\}_{j\in J}$ need not be a disjoint family, but the above applies to every disjoint sub-collection of it. Therefore we will be done if we can extract from $\{I_j\}_{j\in J}$ a disjoint sub-collection whose union still covers $J$. This is the content of the next lemma.

Lemma 4.3.5 (Covering lemma). Let $I, J, \{I_j\}_{j\in J}$ be intervals as above. Then there is a subset $J_0\subseteq J$ such that (a) $J\subseteq\bigcup_{j\in J_0}I_j$ and (b) the collection of intervals $\{I_j\}_{j\in J_0}$ is pairwise disjoint.

Proof. Write $I_j = [j, j + N(j) - 1]$. We define $J_0 = \{j_k\}$ by induction using a greedy procedure. Let $j_1 = \min J$ be the leftmost point. Assuming we have defined $j_1 < \dots < j_k$, let

$$j_{k+1} = \min\{j\in J : j > j_k + N(j_k) - 1\}$$

if such a point exists; otherwise stop. The chosen intervals are pairwise disjoint, since each $I_{j_{k+1}}$ begins to the right of $I_{j_k}$; and every $j\in J$ is covered, for if $j_k$ is the largest chosen point $\leq j$ and $j\notin I_{j_k}$, then $j$ would have been eligible at the next step, so $j_{k+1}\leq j$, contradicting maximality. Thus we can continue until we have covered all of $J$.

We return now to the dynamical setting. Each $x\in X$ defines a function $\hat f = \hat f_x:\mathbb{N}\to[0,\infty)$ by evaluating $f$ along the orbit:

$$\hat f(i) = f(T^ix)$$

Let

$$A = \left\{\sup_N S_Nf > t\right\}$$

and note that if $T^jx\in A$ then there is an $N = N(j)$ such that $S_Nf(T^jx) > t$. Writing $I_j = [j, j + N(j) - 1]$, this is the same as

$$S_{I_j}\hat f > t$$

Fixing a large $M$ (we eventually take $M\to\infty$), consider the interval $I = [0, M-1]$ and the collection $\{I_j\}_{j\in J}$, where

$$J = J_x = \{0\leq j\leq M-1 : T^jx\in A\text{ and }I_j\subseteq[0,M-1]\}$$

The proposition then gives

  S_{[0,M−1]} f̂ > t · |J| / M

In order to estimate the size of J we will restrict to intervals of some bounded length R > 0 (which we will eventually send to infinity). Let

  A_R = { sup_{0≤N≤R} S_N f > t }

Then

  J ⊇ { 0 ≤ j ≤ M − R − 1 : T^j x ∈ A_R }

and if we write h = 1_{A_R}, then we have

  |J| ≥ Σ_{j=0}^{M−R−1} ĥ(j) = (M − R − 1) S_{[0,M−R−1]} ĥ

With this notation in place, the above becomes

  S_{[0,M−1]} f̂_x > t · ((M − R − 1)/M) · S_{[0,M−R−1]} ĥ_x    (4.2)

and notice that the average on the right-hand side is just the frequency of visits to A_R up to time M. We now apply a general principle, called the transference principle, which relates the integral ∫ g dµ of a function g : X → ℝ to its discrete averages S_I ĝ along orbits: using ∫ g = ∫ T^n g, we have

  ∫ g dµ = (1/M) Σ_{m=0}^{M−1} ∫ T^m g dµ
    = ∫ ( (1/M) Σ_{m=0}^{M−1} T^m g ) dµ
    = ∫ S_{[0,M−1]} ĝ_x dµ(x)

Applying this to f and using (4.2), we obtain

  ∫ f dµ = ∫ S_{[0,M−1]} f̂_x dµ(x)
    > t · ((M − R − 1)/M) · ∫ h dµ
    = t · (1 − (R+1)/M) · ∫ 1_{A_R} dµ
    = t · (1 − (R+1)/M) · µ(A_R)

where in the second line we used (4.2) and the transference principle applied to h.

Letting M → ∞, this is

  ∫ f dµ ≥ t · µ(A_R)

Finally, letting R → ∞ and noting that µ(A_R) → µ(A), we conclude that ∫ f dµ ≥ t · µ(A), which is what was claimed.

Example 4.3.6. Let (ξ_n)_{n=1}^∞ be an independent identically distributed sequence of random variables, represented by a product measure on (X, B, µ) = (Ω, F, P)^ℕ, with ξ_n(ω) = ξ(ω_n) for some ξ ∈ L¹(Ω, F, P). Let σ : X → X be the shift, which preserves µ and is ergodic, so that ξ_n = ξ₀ ∘ σ^n. Since the shift acts ergodically on product measures, the ergodic theorem implies

  (1/N) Σ_{n=0}^{N−1} ξ_n = (1/N) Σ_{n=0}^{N−1} ξ₀ ∘ σ^n → E(ξ₀ | I) = E ξ₀  a.e.

Thus the ergodic theorem generalizes the law of large numbers. However, it is a very broad generalization: it holds for any stationary process (ξ_n)_{n=1}^∞ without any independence assumption, as long as the process is ergodic.

When T is invertible it is also natural to consider the two-sided averages S̃_N = (1/(2N+1)) Σ_{n=−N}^{N} T^n f. Up to an extra term (1/(2N+1)) f, this is just (1/2) S_N(T, f) + (1/2) S_N(T^{−1}, f), where we write S_N(T, f) to emphasize which map is being used. Since both of these converge in L¹ and a.e. to the same function E(f | I), the same is true for S̃_N f.
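Example 4.3.6 can be illustrated numerically. The sketch below is our own illustration (variable names and sample size are arbitrary choices): for an i.i.d. fair-coin process, the ergodic average of the coordinate function along one sample path is just the running mean, which by the ergodic theorem (here, the law of large numbers) should approach E(ξ₀) = 1/2.

```python
import random

random.seed(0)

# One sample path of an i.i.d. Bernoulli(1/2) process of length N.
# For the shift on the product space, the ergodic average of xi_0 along
# this orbit is the running mean of the coordinates.
N = 100_000
omega = [random.randint(0, 1) for _ in range(N)]
ergodic_average = sum(omega) / N
```

With N = 100000 the average is within a few thousandths of 1/2.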

4.4 Generic points

The ergodic theorem is an a.e. statement relative to a given L¹ function, and, anyway, L¹ functions are only defined a.e. Therefore it is not clear how to interpret the statement that the orbit of an individual point distributes well in the space. There is an exception: when the space is a compact metric space, one can use the continuous functions as test functions to define a more robust notion.

Definition 4.4.1. Let (X, T) be a topological dynamical system. A point x ∈ X is generic for a Borel measure µ ∈ P(X) if it satisfies the conclusion of the ergodic theorem for every continuous function, i.e.

  (1/N) Σ_{n=0}^{N−1} T^n f(x) → ∫ f dµ  for all f ∈ C(X)    (4.3)

We have already seen that any measure µ that satisfies the above is T-invariant.

Lemma 4.4.2. Let F ⊆ C(X) be a countable ‖·‖_∞-dense set. If (4.3) holds for every f ∈ F then x is generic for µ.

Proof. By a familiar calculation, given f ∈ C(X) and g ∈ F,

  limsup_N |S_N f(x) − ∫ f dµ| ≤ limsup_N |S_N f(x) − S_N g(x)| + limsup_N |S_N g(x) − ∫ f dµ|

   ≤ limsup_N S_N |f − g|(x) + limsup_N |∫ g dµ − ∫ f dµ|
   ≤ 2 ‖f − g‖_∞

Since g can be made arbitrarily close to f, we are done.

Proposition 4.4.3. If µ is T-invariant with ergodic decomposition µ = ∫ µ_x dµ(x), then µ-a.e. x is generic for µ_x.

Proof. Since µ = ∫ µ_x dµ(x), it suffices to show that for µ-a.e. x, for µ_x-a.e. y, y is generic for µ_x. Thus we may assume that µ is ergodic and show that a.e. point is generic for it. To do this, fix a ‖·‖_∞-dense countable set F ⊆ C(X). By the ergodic theorem, S_N f(x) → ∫ f dµ a.e., for every f ∈ F, so since F is countable there is a set of measure one on which this holds simultaneously for all f ∈ F. The previous lemma implies that each of these points is generic for µ.

This allows us to give a new interpretation of the ergodic decomposition when T : X → X is a continuous map of a compact metric space. For a given ergodic measure µ, let G_µ denote the set of generic points for µ. Since a measure is characterized by its integral against continuous functions, if µ ≠ ν then G_µ ∩ G_ν = ∅. Finally, it is not hard to see that G_µ is measurable, and µ(G_µ) = 1 by the proposition above. Thus we may regard G_µ as the ergodic component of µ. One can also show that G = ∪ G_µ, the set of points that are generic for ergodic measures, is measurable, because these are just the points such that ergodic averages exist against every continuous function, or equivalently every function in a dense countable subset of C(X). Now, for any invariant measure ν with ergodic decomposition ν = ∫ ν_x dν(x),

  ν(G) = ∫ ν_x(G) dν(x) = 1

because the ν_x are a.s. ergodic and G_{ν_x} ⊆ G. Thus on a set of full ν-measure the sets G_µ give a partition that coincides with the ergodic decomposition. Note, however, that this partition does not depend on ν (in the ergodic decomposition theorem it is not a-priori clear that such a decomposition can be achieved).

Example 4.4.4. Let X = {0, 1}^ℕ and let µ₀ = δ_{000...} and µ₁ = δ_{111...}. These are ergodic measures for the shift T. Now let x ∈ X be the point such that x_n = 0 for k² ≤ n < (k+1)² if k is even, and x_n = 1 for k² ≤ n < (k+1)² if k is odd. Thus

  x = 111000001111111000000000111...

We claim that x is generic for the non-ergodic measure µ = ½µ₀ + ½µ₁. It suffices to prove that for any ℓ,

  (1/N) Σ_{n=0}^{N−1} 1_{[0^ℓ]}(T^n x) → 1/2
  (1/N) Σ_{n=0}^{N−1} 1_{[1^ℓ]}(T^n x) → 1/2

where [0^ℓ], [1^ℓ] are the sets of points beginning with ℓ consecutive 0s and ℓ consecutive 1s, respectively. The proofs are similar so we show this for [0^ℓ]. Notice that 1_{[0^ℓ]}(T^n x) = 1 if k² ≤ n < (k+1)² − ℓ and k is even, and 1_{[0^ℓ]}(T^n x) = 0 otherwise. Now, each N satisfies k² ≤ N < (k+2)² for some even k. Then

  Σ_{n=0}^{N−1} 1_{[0^ℓ]}(T^n x) = Σ_{j=1}^{k/2} ((2j+1)² − ℓ − (2j)²) + O(k) = Σ_{j=1}^{k/2} (4j + 1 − ℓ) + O(k) = ½k² + O(k)

Also N − k² ≤ (k+2)² − k² = O(k). Therefore S_N 1_{[0^ℓ]}(x) → 1/2 as claimed.

Example 4.4.5. With (X, T) as in the previous example, let y_n = 0 if 2^k ≤ n < 2^{k+1} for k even and y_n = 1 otherwise. Then one can show that y is not generic for any measure, ergodic or not.

Our original motivation for considering ergodic averages was to study the frequency of visits of an orbit to a set. Usually 1_A is not continuous even when A is topologically a nice set (e.g. open or closed), so generic points do not have to behave well with respect to visit frequencies. The following shows that this can be overcome with slightly stronger assumptions on A and x.

Lemma 4.4.6. If x is generic for µ, and if U is open and C is closed, then

  liminf_N (1/N) Σ_{n=0}^{N−1} 1_U(T^n x) ≥ µ(U)
  limsup_N (1/N) Σ_{n=0}^{N−1} 1_C(T^n x) ≤ µ(C)

Proof. Let f_k ∈ C(X) with f_k ↗ 1_U (e.g. f_k(y) = 1 − e^{−k·d(y, U^c)}). Then 1_U ≥ f_k and so

  liminf_N (1/N) Σ_{n=0}^{N−1} 1_U(T^n x) ≥ lim_N (1/N) Σ_{n=0}^{N−1} f_k(T^n x) = ∫ f_k dµ → µ(U)

The other inequality is proved similarly, using g_k ↘ 1_C.

Proposition 4.4.7. If x is generic for µ, A ⊆ X and µ(∂A) = 0, then (1/N) Σ_{n=0}^{N−1} 1_A(T^n x) → µ(A).

Proof. Let U = interior(A) and C = closure(A), so 1_U ≤ 1_A ≤ 1_C. By the lemma,

  liminf S_N 1_A ≥ liminf S_N 1_U ≥ µ(U)

and

  limsup S_N 1_A ≤ limsup S_N 1_C ≤ µ(C)

But by our assumption, µ(U) = µ(C) = µ(A), and we find that

  µ(A) ≤ liminf S_N 1_A ≤ limsup S_N 1_A ≤ µ(A)

So all are equalities, and S_N 1_A → µ(A).

4.5 Unique ergodicity and circle rotations

When can the ergodic theorem be strengthened from a.e. point to every point? Once again the question does not make sense for L¹ functions, since these are only defined a.e., but it makes sense for continuous functions.

Definition 4.5.1. A topological system (X, T) is uniquely ergodic if there is only one invariant probability measure, which in this case is denoted µ_X.

Proposition 4.5.2. Let (X, T) be a topological system and µ ∈ P_T(X). The following are equivalent:

1. Every point is generic for µ.

2. S_N f → ∫ f dµ uniformly, for every f ∈ C(X).

3. (X, T) is uniquely ergodic and µ is its invariant measure.

Proof. (1) implies (3): If ν ≠ µ were another invariant measure there would be points that are generic for it, contrary to (1).

(3) implies (2): Suppose (2) fails, so there is an f ∈ C(X) such that S_N f ↛ ∫ f dµ uniformly. Then there is some sequence x_k ∈ X and integers N_k → ∞ such that S_{N_k} f(x_k) → c ≠ ∫ f dµ. Let ν be an accumulation point of (1/N_k) Σ_{n=1}^{N_k} δ_{T^n x_k}. This is a T-invariant measure and ∫ f dν = c, so ν ≠ µ, contradicting (3).

(2) implies (1) is immediate.

Proposition 4.5.3. Let X = ℝ/ℤ and α ∉ ℚ. The map T_α x = x + α on X is uniquely ergodic, with invariant measure µ = Lebesgue.

We give two proofs.

Proof number 1. We know that µ is ergodic for T_α, so a.e. x is generic. Fix one such x. Let y ∈ X be any other point. Then there is a β ∈ ℝ such that y = T_β x, where T_β denotes translation by β.

For any function f ∈ C(X),

  (1/N) Σ_{n=0}^{N−1} T_α^n f(y) = (1/N) Σ_{n=0}^{N−1} f(y + αn)
    = (1/N) Σ_{n=0}^{N−1} f(x + αn + β)
    = (1/N) Σ_{n=0}^{N−1} (T_β f)(T_α^n x)
    → ∫ T_β f dµ = ∫ f dµ

Therefore every point is generic for µ and T_α is uniquely ergodic.

Our second proof is based on a more direct calculation that does not rely on the ergodic theorem.

Definition 4.5.4. A sequence (x_k) in a compact metric space X equidistributes for a measure µ if (1/N) Σ_{n=1}^{N} δ_{x_n} → µ weak-*.

Lemma 4.5.5 (Weyl's equidistribution criterion). A sequence (x_k) ⊆ ℝ/ℤ equidistributes for Lebesgue measure µ if and only if for every m,

  (1/N) Σ_{n=0}^{N−1} e^{2πi m x_n} → { 0 if m ≠ 0 ; 1 if m = 0 }

Proof. Let χ_m(t) = e^{2πimt}. The linear span of {χ_m}_{m∈ℤ} is dense in C(ℝ/ℤ) by Fourier analysis, so equidistribution of (x_k) is equivalent to S_N χ_m → ∫ χ_m dµ for every m. This is what the lemma says.

Proof number 2. Fix t ∈ ℝ/ℤ and let x_k = t + αk. For m = 0 the limit in Weyl's criterion is automatic, so we only need to check m ≠ 0. Then

  (1/N) Σ_{n=0}^{N−1} e^{2πi m x_n} = e^{2πimt} · (1/N) Σ_{n=0}^{N−1} (e^{2πimα})^n = e^{2πimt} · (1/N) · (e^{2πimαN} − 1)/(e^{2πimα} − 1) → 0

(note that α ∉ ℚ ensures that the denominator is not 0; otherwise the summation formula is invalid).
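The computation in proof number 2 is easy to check numerically. In the sketch below (our own illustration; the choices α = √2, m = 1, the interval [0, 0.3) and the tolerances are arbitrary) we evaluate the Weyl sum and the empirical frequency of visits of the orbit of 0 to an interval:

```python
import cmath
import math

alpha = math.sqrt(2)  # an irrational rotation number
N = 10_000

# Weyl sum (1/N) * sum_n e^{2 pi i m x_n} with m = 1, x_n = n*alpha:
# by the geometric-series bound it is O(1/N).
weyl_sum = sum(cmath.exp(2j * math.pi * n * alpha) for n in range(N)) / N

# Frequency of visits of {n*alpha} to [0, 0.3): equidistribution
# predicts Leb([0, 0.3)) = 0.3.
freq = sum(1 for n in range(N) if (n * alpha) % 1.0 < 0.3) / N
```

Both quantities land close to their predicted limits (0 and 0.3) already at N = 10000.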

Corollary 4.5.6. For any interval A ⊆ ℝ/ℤ (open or closed), and for every x ∈ ℝ/ℤ, S_N 1_A(x) → Leb(A).

Proof. The boundary of an interval is finite, hence of Lebesgue measure 0, so Proposition 4.4.7 applies.

Example 4.5.7 (Benford's law). Many samples of numbers collected in the real world exhibit the interesting feature that the most significant digit is not uniformly distributed. Rather, 1 is the most common digit, with frequency approximately 0.30; the frequency of 2 is about 0.18; the frequency of 3 is about 0.13; etc. More precisely, the frequency of the digit k is approximately log₁₀(1 + 1/k). We will show that a similar distribution of most significant digits holds for powers of b whenever b is not a rational power of 10. The main observation is that the most significant base-10 digit of x ∈ [1, ∞) is determined by y = log₁₀ x mod 1, and is equal to k if y ∈ I_k = [log₁₀ k, log₁₀(k+1)). Therefore, the asymptotic frequency of k being the most significant digit of b^n is

  lim_{N→∞} (1/N) Σ_{n=1}^{N} 1_{I_k}(log₁₀ b^n mod 1) = lim_{N→∞} (1/N) Σ_{n=1}^{N} 1_{I_k}(n · (ln b / ln 10) mod 1)
   = Leb(I_k)

   = Leb[log₁₀ k, log₁₀(k+1))
   = log₁₀(1 + 1/k)

since this is just the frequency of visits of the orbit of 0 to [log₁₀ k, log₁₀(k+1)) under the map t ↦ t + ln b / ln 10 mod 1, and ln b / ln 10 ∉ ℚ by assumption (it would be rational if and only if b were a rational power of 10).
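A quick experiment (our own illustration, using exact integer arithmetic so no floating-point issue enters the digit count) confirms the prediction for b = 2: the leading digit of 2^n is 1 with frequency close to log₁₀ 2 ≈ 0.301.

```python
import math

# Count the leading base-10 digits of 2^1, ..., 2^N exactly.
N = 5_000
counts = {d: 0 for d in range(1, 10)}
p = 1
for _ in range(N):
    p *= 2
    counts[int(str(p)[0])] += 1

freq1 = counts[1] / N          # empirical frequency of leading digit 1
benford1 = math.log10(1 + 1)   # Benford prediction log10(1 + 1/1) ~ 0.30103
```

At N = 5000 the empirical frequency agrees with log₁₀ 2 to well within 0.01.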

4.6 Sub-additive ergodic theorem

Theorem 4.6.1 (Subadditive ergodic theorem). Let (X, B, µ, T) be an ergodic measure-preserving system. Suppose that f_n ∈ L¹(µ) satisfy the subadditivity relation

  f_{m+n}(x) ≤ f_m(x) + f_n(T^m x)

and are uniformly bounded above, i.e. f_n ≤ L for some L. Then lim_{n→∞} (1/n) f_n(x) exists a.e. and is equal to the constant lim_{n→∞} (1/n) ∫ f_n.

Before giving the proof we point out two examples. First, if f_n = Σ_{k=0}^{n−1} T^k g then f_n satisfies the hypothesis, so this is a generalization of the usual ergodic theorem (for ergodic T).

For a more interesting example, let A_n = A(T^n x) be a stationary sequence of d × d matrices (for example, if the entries are i.i.d.). Then f_n(x) = log ‖A(T^{n−1}x) · ... · A(x)‖ satisfies the hypothesis. Thus, the subadditive ergodic theorem implies that random matrix products have a Lyapunov exponent – their norm growth is asymptotically exponential.

Proof. Let us first make a simple observation. Suppose that {1, ..., N} is partitioned into intervals {[a_i, b_i)}_{i∈I}. Then subadditivity implies

  f_N(x) ≤ Σ_{i∈I} f_{b_i − a_i}(T^{a_i} x)

Let

  a = liminf_{n→∞} (1/n) f_n

We claim that a is invariant. Indeed, from f_{n+1}(x) ≤ f_1(x) + f_n(Tx),

  (1/n) f_n(Tx) ≥ (1/n) ( f_{n+1}(x) − f_1(x) )

From this it follows that a(Tx) ≥ a(x), so by ergodicity a is constant.

Fix ε > 0. Since liminf (1/n) f_n = a, there is an N such that the set

  A = { x : (1/n) f_n(x) < a + ε for some 1 ≤ n ≤ N }

satisfies µ(A) > 1 − ε. Now fix a typical point x. By the ergodic theorem, for every large enough M,

  (1/M) Σ_{n=0}^{M−1} 1_A(T^n x) > 1 − ε

Fix such an M and let

  I₀ = { 0 ≤ n ≤ M − N : T^n x ∈ A }

For i ∈ I₀ there is a 1 ≤ n(i) ≤ N such that (1/n(i)) f_{n(i)}(T^i x) < a + ε; let U_i = [i, i + n(i)). By the covering lemma we can extract a sub-collection {U_i}_{i∈I₁} of pairwise disjoint intervals with |∪_{i∈I₁} U_i| ≥ |I₀| > (1 − ε)M − N. By construction also ∪_{i∈I₁} U_i ⊆ [0, M).

Choose an enumeration {U_i}_{i∈I₂} of the complementary intervals in [0, M) \ ∪_{i∈I₁} U_i, so that {U_i}_{i∈I₁∪I₂} is a partition of [0, M). Writing U_i = [a_i, b_i) and using the observation above, we find that

  (1/M) f_M(x) ≤ (1/M) ( Σ_{i∈I₁} f_{b_i−a_i}(T^{a_i} x) + Σ_{i∈I₂} f_{b_i−a_i}(T^{a_i} x) )
   ≤ (Σ_{i∈I₁} |U_i| / M)(a + ε) + (Σ_{i∈I₂} |U_i| / M) ‖f_1‖_∞
   ≤ (a + ε) + 2ε ‖f_1‖_∞

for all large enough M, since the complementary intervals have total length at most εM + N. As ε was arbitrary, we conclude that limsup (1/M) f_M ≤ a = liminf (1/n) f_n, so the limit exists and is equal to a.

It remains to identify a = lim (1/n) ∫ f_n. First note that

  ∫ f_{m+n} dµ ≤ ∫ f_m dµ + ∫ f_n ∘ T^m dµ = ∫ f_m dµ + ∫ f_n dµ

so a_n = ∫ f_n dµ is subadditive, hence the limit a′ = lim (1/n) a_n exists. By Fatou's lemma (since f_n ≤ L we can apply it to L − f_n) we get

  a = ∫ limsup (1/n) f_n dµ ≥ limsup (1/n) ∫ f_n dµ = lim (1/n) a_n = a′

Suppose the inequality were strict, a′ < a. Fix ε > 0 and let n be such that (1/n) a_n < a − ε. Splitting [0, N) into consecutive blocks of length n (plus a remainder of bounded length) and applying the observation at the start of the proof together with the ergodic theorem for f_n, one finds that for a.e. x,

  limsup_{N→∞} (1/N) f_N(x) ≤ (1/n) ∫ f_n dµ < a − ε

contradicting a = lim (1/N) f_N(x). Hence a = a′, completing the proof.
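The matrix example can be made concrete. The sketch below (our own illustration: plain 2×2 matrices over floats and the Frobenius norm, which is submultiplicative) checks that log ‖AB‖ ≤ log ‖A‖ + log ‖B‖, i.e. exactly the subadditivity hypothesis for f_n = log ‖A_{n−1} ⋯ A_0‖:

```python
import math
import random

random.seed(1)

def matmul(A, B):
    # Product of 2x2 matrices given as [[a, b], [c, d]].
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def frob(A):
    # Frobenius norm; submultiplicative: ||AB|| <= ||A|| * ||B||.
    return math.sqrt(sum(x * x for row in A for x in row))

def rand_mat():
    return [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

# Check ||AB|| <= ||A|| ||B|| (equivalently log||AB|| <= log||A|| + log||B||)
# on random pairs of matrices.
ok = all(
    frob(matmul(A, B)) <= frob(A) * frob(B) + 1e-12
    for A, B in [(rand_mat(), rand_mat()) for _ in range(100)]
)
```

Submultiplicativity is what feeds the subadditive ergodic theorem and produces the Lyapunov exponent for stationary matrix products.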

4.6.1 Group actions

Let G be a countable group. A measure-preserving action of G on a measure space (X, B, µ) is, first of all, an action, that is a map G × X → X, (g, x) ↦ gx, such that g(hx) = (gh)(x) for all g, h ∈ G and x ∈ X. In addition, for each g ∈ G the map T_g : x ↦ gx must be measurable and measure-preserving. It is convenient to denote the action by {T_g}_{g∈G}.

An invariant set for the action is a set A ∈ B such that T_g A = A for all g ∈ G. If every such set satisfies µ(A) = 0 or µ(X \ A) = 0, then the action is ergodic. There is an ergodic decomposition theorem for such actions, but for simplicity (and without loss of generality) we will assume that the action is ergodic.

For a function f : X → ℝ the function T_g f = f ∘ T_g^{−1} : X → ℝ has the same regularity, and {T_g}_{g∈G} gives an isometric action on L^p for all 1 ≤ p ≤ ∞. Given a finite set E ⊆ G, let S_E f be the function defined by

  S_E f(x) = (1/|E|) Σ_{g∈E} f(T_g x)

As before, this is a contraction in L^p. We say that a sequence E_n ⊆ G of finite sets satisfies the ergodic theorem along {E_n} if S_{E_n} f → ∫ f, in a suitable sense (e.g. in L² or a.e.), for every ergodic action and every suitable f.

Definition 4.6.2. A group G is amenable if there is a sequence of sets E_n ⊆ G such that for every g ∈ G,

  |E_n Δ gE_n| / |E_n| → 0

Such a sequence {E_n} is called a Følner sequence.

For example, ℤ^d is amenable because E_n = [−n, n]^d ∩ ℤ^d satisfies

  |(E_n + u) Δ E_n| = o(|E_n|)  for every fixed u ∈ ℤ^d

since translating the cube by u changes it only in a boundary region of at most 2d ‖u‖_∞ (2n+1)^{d−1} points.
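For d = 1 the Følner property of E_n = [−n, n] ∩ ℤ is easy to verify directly; the small sketch below (our own illustration, with the arbitrary choice u = 3) computes the symmetric-difference ratio:

```python
# Folner property of E_n = {-n, ..., n} in the group (Z, +):
# shifting by u changes the set only near its two endpoints.
def folner_ratio(n, u):
    E = set(range(-n, n + 1))
    shifted = {e + u for e in E}
    return len(E ^ shifted) / len(E)  # |E_n symm.diff. (E_n + u)| / |E_n|

# For |u| <= n the symmetric difference has exactly 2|u| elements,
# so the ratio is 2|u| / (2n + 1) -> 0 as n -> infinity.
r10 = folner_ratio(10, 3)
r1000 = folner_ratio(1000, 3)
```

Here r10 = 6/21 and r1000 = 6/2001, illustrating the decay to 0.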

The class of amenable groups is closed under taking subgroups and countable increasing unions, and if G and N ◁ G are amenable so is G/N. Groups of sub-exponential growth are amenable; the free group is not amenable, but there are amenable groups of exponential growth.

Theorem 4.6.3. If {E_n} is a Følner sequence in an amenable group G then the ergodic theorem holds along {E_n} in the L² sense (the mean ergodic theorem).

Proof. Let

  V₀ = span{ f − T_g f : f ∈ L², g ∈ G }

One can show exactly as before that V₀^⊥ consists of the invariant functions (in this case, the constant functions, because we are assuming the action is ergodic). Then one must only show that S_{E_n}(f − T_g f) → 0 for f ∈ L². But this is immediate from the Følner property, since

  S_{E_n} f − S_{E_n} T_g f = (1/|E_n|) ( Σ_{h ∈ E_n \ E_n g^{−1}} T_h f − Σ_{h ∈ E_n g^{−1} \ E_n} T_h f )

and therefore

  ‖S_{E_n}(f − T_g f)‖₂ ≤ (1/|E_n|) |E_n Δ E_n g^{−1}| · ‖f‖₂ → 0

This proves the mean ergodic theorem.

The proof of the pointwise ergodic theorem for amenable groups is more delicate, and it does not hold for every Følner sequence. However, one can reduce it as before to a maximal inequality. What one then needs is an analog of the discrete maximal inequality, which now concerns functions f̂ : G → [0, ∞), and requires an analog of the covering Lemma 4.3.5. Such a result is known under a stronger assumption on {E_n}, namely assuming that |E_n^{−1} E_n| ≤ C |E_n| (Tempelman's condition).

The basic non-amenable example is the free group F_s on s ≥ 2 generators. Every element of F_s is represented by a unique reduced word, obtained by repeatedly deleting adjacent pairs of the form gg^{−1}; for example, in

  a b a^{−1} c c^{−1} a b b = a b a^{−1} a b b = a b b b

the right-hand side is reduced. Let E_n ⊆ F_s denote the set of reduced words of length n.

Theorem 4.6.4 (Nevo-Stein, Bufetov). If F_s acts ergodically by measure-preserving transformations on (X, B, µ) then S_{E_n} f → ∫ f for every f ∈ L¹(µ).

There is a major difference between the proof of this result and the amenable case. Because |E_n Δ E_n g| / |E_n| ↛ 0, there is no trivial reason for the averages of co-boundaries to tend to 0. Consequently there is no natural dense set of functions in L¹ for which convergence holds. In any case, the maximal inequality is not valid either. The proofs in the non-amenable case take completely different approaches (but we will not discuss them here).

4.6.2 Hopf’s ergodic theorem Another generalization is to the case of a measure-preserving transformation T of a measure space (X, ,µ) with µ(X)= (but -finite). Ergodicity is B 1 defined as before – all invariant sets are of measure 0 or their complement is of measure 0. It is also still true that T : L2(µ) L2(µ) is norm-preserving, ! and so the mean ergodic theorem holds: S f ⇡f for f L2, where ⇡ is the N ! 2 projection to the subspace of invariant L2 functions. Now, however, the only constant function that is integrable is 0,andwefindthatS f 0 in L2. In N ! fact this is true in L1 and a.e. The meaning is, however, the same: if we take a set of finite measure A,thissaysthatthefractionoftimeanorbitspendsinA is the same as the relative size of A compared to ⌦;inthiscaseµ(A)/µ(⌦) = 0. Instead of asking about the absolute time spent in A,itisbettertoconsider two sets A, B of positive finite measure. Then an orbit visits both with frequency 0,butonemayexpectthatthefrequencyofvisitstoA is µ(A)/µ(B)-times the frequency of visits to B. This is actually he case: Theorem 4.6.5 (Hopf). If T is an ergodic measure-preserving transformation of (X, ,µ) with µ(X)= , and if f,g L1(µ) and gdµ =0, then B 1 2 ´ 6 N 1 n T f fdµ n=0 a.e. N 1 n N ! ´ gdµ Pn=0 T g !1 ´ Since the right handP side is usually not 0,onecannotexpectthistoholdin L1. Hopf’s theorem can also be generalized to group actions, but the situation there is more subtle, and it is known that not all amenable groups have sequences E T gf/ T gh f/ h n such that En En .See??. ! ´ ´ P P Chapter 5

Some categorical constructions

5.1 Isomorphism and factors

Definition 5.1.1. Two measure-preserving systems (X, B, µ, T) and (Y, C, ν, S) are isomorphic if there are invariant subsets X₀ ⊆ X and Y₀ ⊆ Y of full measure and a bijection π : X₀ → Y₀ such that π, π^{−1} are measurable, πµ = ν, and π ∘ T = S ∘ π. The last condition means that the following diagram commutes:

  X₀ --T--> X₀
  |π        |π
  v         v
  Y₀ --S--> Y₀

It is immediate that ergodicity and mixing are isomorphism invariants. Also, π induces an isometry L²(µ) → L²(ν) in the usual manner, and the induced maps of T, S on these spaces commute with π, so the induced maps of T, S are unitarily equivalent in the Hilbert-space sense. The same is true for the associated L^p spaces.

Example 5.1.2. Let α ∈ ℝ and X = ℝ/ℤ with Lebesgue measure µ, and T_α x = x + α mod 1. Then T_α, T_{−α} are isomorphic via the isomorphism x ↦ −x.

Example 5.1.3. Let X = {0, 1}^ℕ with µ the product measure (½, ½)^ℕ and the shift T, and Y = [0, 1] with ν = Lebesgue measure and Sx = 2x mod 1. Let π : X → Y be the map π(x) = Σ_{n=1}^{∞} x_n 2^{−n}, or π(x) = 0.x₁x₂x₃... in binary notation. Then it is well known that πµ = ν, and we have

  S(πx) = S(0.x₁x₂...) = 0.x₂x₃... = π(Tx)

Thus π is a factor map between the corresponding systems. Furthermore it is an isomorphism: take X₀ ⊆ X to be the set of sequences that are not eventually periodic, and Y₀ = Y \ ℚ. These are invariant sets; µ(X₀) = 1, since there are countably many eventually periodic sequences and each has measure 0; and ν(Y₀) = 1. Finally, π : X₀ → Y₀ is 1-1 and onto, with an inverse given by the binary expansion, which is measurable. This proves that the two systems are isomorphic.

Example 5.1.4. An irrational rotation is not isomorphic to a shift space with a product measure. This can be seen in many ways, one of which is the following. Note that there is a sequence n_k → ∞ such that T_α^{n_k} 0 → 0; this follows from the fact that 0 equidistributes for Lebesgue measure, so its orbit must return arbitrarily close to 0. Since x = T_x 0, we find that

  T_α^{n_k} x = T_α^{n_k} T_x 0 = T_x T_α^{n_k} 0 → T_x 0 = x

so T_α^{n_k} x → x for all x ∈ ℝ/ℤ. It follows from dominated convergence that for every f ∈ L²(µ) ∩ L^∞(µ) we have T^{n_k} f → f in L², hence

  ∫ T^{n_k} f · f dµ → ∫ f² dµ

On the other hand, if (A^ℤ, C, ν, S) is a shift space with a product measure then we have already seen that it is mixing, hence for every f ∈ L²(ν) we have

  ∫ S^{n_k} f · f dν → ( ∫ f dν )²

By Cauchy-Schwarz, in general (∫ f)² ≠ ∫ f², so the operators S, T cannot be unitarily equivalent.

Definition 5.1.5. A measure-preserving system (Y, C, ν, S) is a factor of a measure-preserving system (X, B, µ, T) if there are invariant subsets X₀ ⊆ X, Y₀ ⊆ Y of full measure and a measurable map π : X₀ → Y₀ such that πµ = ν and π ∘ T = S ∘ π. This is the same as an isomorphism, but without requiring π to be 1-1 or onto. Note that πµ = ν means that π is automatically "onto" in the measure sense: if A ⊆ Y₀ and π^{−1}(A) = ∅ then ν(A) = 0.

Remark 5.1.6. When X, Y are standard Borel spaces (i.e. as measurable spaces they are isomorphic to complete separable metric spaces with the Borel σ-algebra), one can always assume that π : X₀ → Y₀ is onto (even though the image of a measurable set is not in general measurable, in the standard setting one can find a measurable subset of the image that has full measure, and restrict to it).
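Returning to Example 5.1.3, the semiconjugacy S ∘ π = π ∘ T can be checked exactly on points with finitely many nonzero bits, using rational arithmetic so that no floating-point error intervenes (a sketch; bits beyond the given prefix are taken to be 0):

```python
import random
from fractions import Fraction

random.seed(2)

def pi(bits):
    # pi(x) = sum x_n 2^{-n}: the binary-expansion factor map,
    # with all bits beyond the prefix equal to 0.
    return sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))

def S(y):
    # Doubling map y -> 2y mod 1 on [0, 1).
    return (2 * y) % 1

# T is the shift on sequences, so T(bits) = bits[1:].
# Check S(pi(x)) == pi(T(x)) exactly on random finite bit strings.
all_commute = all(
    S(pi(bits)) == pi(bits[1:])
    for bits in ([random.randint(0, 1) for _ in range(20)] for _ in range(50))
)
```

The identity 2·π(x) mod 1 = π(Tx) holds exactly because 2π(x) = x₁ + π(Tx).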

Example 5.1.7. Let T_α be the rotation by α, and let k ∈ ℤ \ {0}. Then π : x ↦ kx mod 1 maps Lebesgue measure to Lebesgue measure on ℝ/ℤ and

  π T_α x = k(x + α) = T_{kα} π x

Thus T_{kα} is a factor of T_α. Note that unless k = ±1 this is not an isomorphism, since |π^{−1}(y)| = |k| for all y.

Example 5.1.8. Let A = {1, ..., n}, let p = (p₁, ..., p_n) be a non-degenerate probability vector, and let X = A^ℤ with the product σ-algebra and product measure µ = p^ℤ. Let B be a set and π : A → B any map, and let q = p ∘ π^{−1} be the push-forward probability vector, q_b = Σ_{a : π(a)=b} p_a. Let Y = B^ℤ and ν = q^ℤ. Finally, let S be the shift (on both spaces) and extend π to a map X → Y pointwise:

  π(..., x_{−1}, x₀, x₁, ...) = (..., π(x_{−1}), π(x₀), π(x₁), ...)

Then by considering cylinder sets it is easy to show that πµ = ν, and clearly Sπ = πS. Thus Y is a factor of X.

Proposition 5.1.9. Let (X, B) and (Y, C) be measurable spaces, T : X → X and S : Y → Y measurable maps, and π : X → Y a measurable onto map such that πT = Sπ.

1. If µ is an invariant measure for T then ν = πµ is an invariant measure for S, and π is a factor map between (X, B, µ, T) and (Y, C, ν, S).

2. If the spaces are standard Borel spaces and if ν is an invariant measure for S, then there exists an invariant measure µ for T such that ν = πµ and π is a factor map between (X, B, µ, T) and (Y, C, ν, S) (but no uniqueness is claimed).

Proof. The first part is an exercise. The second part is less trivial. We give a proof for the case that X, Y are compact metric spaces, B, C the Borel σ-algebras, and T, S, π continuous. In this situation, we first need a non-dynamical version:

Lemma 5.1.10. There is a measure µ₀ on X such that πµ₀ = ν.

Proof No. 1 (almost elementary). Start by constructing a sequence ν_n of atomic measures on Y with ν_n → ν weakly, i.e. ∫ g dν_n → ∫ g dν for all g ∈ C(Y). To get such a sequence, given n choose a finite partition E_n of Y into measurable sets of diameter < 1/n (for instance, cover Y by balls B_i of radius < 1/n and set

  E_i = B_i \ ∪_{j<i} B_j  ).

For each E ∈ E_n with ν(E) > 0 choose a point y_E ∈ E, and set ν_n = Σ_{E∈E_n} ν(E) δ_{y_E}; since the partition elements have diameter < 1/n, indeed ν_n → ν weakly. Next, for each such E choose x_E ∈ π^{−1}(y_E) (recall that π is onto) and set µ_n = Σ_{E∈E_n} ν(E) δ_{x_E}, so that πµ_n = ν_n. Passing to a subsequence, let µ₀ be a weak-* limit of the µ_n. Then for every g ∈ C(Y),

  ∫ g dν = lim ∫ g dν_n = lim ∫ g ∘ π dµ_n = ∫ g ∘ π dµ₀ = ∫ g d(πµ₀)

as claimed.

Proof No. 2 (function-analytic). First a few general remarks. A linear functional µ* on C(X) is positive if it takes non-negative values on non-negative functions. This property implies boundedness: to see this, note that for any f ∈ C(X) we have ‖f‖_∞ − f ≥ 0, hence by linearity and positivity µ*(‖f‖_∞) − µ*(f) ≥ 0, giving

  µ*(f) ≤ µ*(‖f‖_∞) = ‖f‖_∞ · µ*(1)

Similarly, using f + ‖f‖_∞ ≥ 0 we get µ*(f) ≥ −‖f‖_∞ · µ*(1). Combining the two we have |µ*(f)| ≤ C ‖f‖_∞, where C = µ*(1).

Since a positive functional µ* is bounded, it corresponds to integration against a regular signed Borel measure µ, and since ∫ f dµ = µ*(f) ≥ 0 for continuous f ≥ 0, regularity implies that µ is a positive measure. Hence a linear functional µ* ∈ C(X)* corresponds to a probability measure if and only if it is positive and µ*(1) = 1 (this is the normalization condition ∫ 1 dµ = 1).

We now begin the proof. Let ν* : C(Y) → ℝ be the bounded positive linear functional g ↦ ∫ g dν. The map π* : C(Y) → C(X), g ↦ g ∘ π, embeds C(Y) isometrically as a subspace V = π*(C(Y)) ⊆ C(X) containing the constants. Define µ₀* on V by µ₀*(g ∘ π) = ν*(g); this is a positive linear functional of norm µ₀*(1) = 1 on V. By the Hahn-Banach theorem, extend it to a functional µ* on C(X) of the same norm; since µ*(1) = 1 = ‖µ*‖, the extension is positive, and therefore corresponds to a Borel probability measure µ₀ on X. Then for g ∈ C(Y),

  ∫ g dπµ₀ = ∫ g ∘ π dµ₀ = µ*(g ∘ π) = µ₀*(g ∘ π) = ν*(g) = ∫ g dν

so µ₀ is the desired measure.

Now let µ_n = (1/n) Σ_{k=0}^{n−1} T^k µ₀ and let µ be a subsequential weak-* limit of the µ_n. It is easy to check in the usual way that µ is T-invariant. Also,

  πµ_n = (1/n) Σ_{k=0}^{n−1} π(T^k µ₀) = (1/n) Σ_{k=0}^{n−1} S^k πµ₀ = (1/n) Σ_{k=0}^{n−1} S^k ν = ν

Since the set M of measures on X projecting to ν is weak-* closed and µ_n ∈ M, also their limit point µ ∈ M, as claimed.

Example 5.1.11. Let A, B be finite sets, A^ℤ, B^ℤ the product spaces and S the shift. Let π : A^{2n+1} → B be any map and extend π to A^ℤ → B^ℤ by

  (πx)_i = π(x_{i−n}, ..., x_i, ..., x_{i+n})

This commutes with the shift, and so any invariant measure on A^ℤ projects to an invariant measure on B^ℤ, and vice versa.

For example, if A = B = {0, 1}, n = 1 and π(a, b, c) = b + c mod 2, then we get a factor map as above, and one may verify that the (½, ½) product measure is mapped to itself; but the factor map is non-trivial, since each sequence has two pre-images.

If π : X → Y is a factor map between measure-preserving systems (X, B, µ, T) and (Y, C, ν, S) (already restricted to the invariant subsets of full measure), then

  π^{−1}C = { π^{−1}C : C ∈ C }

is a sub-σ-algebra of B, and it is invariant since πT = Sπ. Note also that π is an isometry between L²(π^{−1}C, µ) and L²(C, ν).

There is also a converse:

Proposition 5.1.12. Let (X, B, µ, T) be a measure-preserving system on a standard Borel space and let C₀ ⊆ B be an invariant, countably generated σ-algebra. Then there is a factor (Y, C, ν, S), with (Y, C) a standard Borel space, and a factor map π such that C₀ = π^{−1}C.

The proof relies on the analogous non-dynamical fact that a countably generated sub-σ-algebra in a standard Borel space always arises as the pullback via a measurable map of some other standard Borel space. We shall not go into the details.
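The parity code π(a, b, c) = b + c mod 2 can be tested empirically: applied to a sample of i.i.d. fair bits it should again produce i.i.d. fair bits, i.e. the (½, ½) product measure maps to itself (a sketch; sample size and tolerances are arbitrary choices of ours):

```python
import random

random.seed(3)

# Input: a finite sample of the (1/2, 1/2) Bernoulli shift.
N = 50_000
x = [random.randint(0, 1) for _ in range(N)]

# Sliding block code y_i = x_i XOR x_{i+1}: the map pi(a, b, c) = b + c mod 2,
# which only reads the middle and right coordinates of the window.
y = [x[i] ^ x[i + 1] for i in range(N - 1)]

# If the image measure is again the (1/2, 1/2) product measure, symbol
# frequencies should be ~1/2 and adjacent-pair frequencies ~1/4.
freq1 = sum(y) / len(y)
freq11 = sum(1 for i in range(len(y) - 1) if y[i] == 1 and y[i + 1] == 1) / (len(y) - 1)
```

Indeed y_i = x_i ⊕ x_{i+1} and y_{i+1} = x_{i+1} ⊕ x_{i+2} are independent fair bits, which the empirical frequencies confirm.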

5.2 Product systems

Another basic construction is to take products:

Definition 5.2.1. Let (X, B, µ, T) and (Y, C, ν, S) be measure-preserving systems. Let T × S : X × Y → X × Y denote the map T × S(x, y) = (Tx, Sy). Then (X × Y, B × C, µ × ν, T × S) is called the product of X and Y.

Claim 5.2.2. The product of measure-preserving systems is measure preserving.

Proof. Let θ = µ × ν and R = T × S. It is enough to check that θ(R^{−1}(A × B)) = θ(A × B) for A ⊆ X, B ⊆ Y, since these sets generate the product σ-algebra and the family of sets whose measure is preserved is a σ-algebra. For such A, B,

  θ(R^{−1}(A × B)) = θ(T^{−1}A × S^{−1}B) = µ(T^{−1}A) ν(S^{−1}B) = µ(A) ν(B) = θ(A × B)

Remark 5.2.3. Observe that the coordinate projections π₁ and π₂ are factor maps from the product system to the original systems. The definition generalizes to products of finitely many or countably many systems.

We turn to the construction of inverse limits. Let (X_n, B_n, µ_n, T_n) be measure-preserving systems, let π_n : X_{n+1} → X_n be factor maps, and suppose that the π_n are defined everywhere and onto their image. Let X ⊆ ×_{n=1}^∞ X_n denote the set of sequences (..., x_n, x_{n−1}, ..., x₁), with x_n ∈ X_n, such that π_n(x_{n+1}) = x_n.

5.3 Natural extension

When (X, B, µ, T) is an ergodic m.p.s. on a standard Borel space, there is a canonical invertible m.p.s. (X̃, B̃, µ̃, T̃) and factor map π : X̃ → X such that if (Y, C, ν, S) is another invertible system and τ : Y → X a factor map, then τ factors through π, that is, there is a factor map φ : Y → X̃ with τ = π ∘ φ:

  Y --φ--> X̃
    τ ↘    |π
           v
           X

We now construct X̃. Let π_n : X^ℤ → X denote the coordinate projections, which are measurable with respect to the product σ-algebra, and let

  X̃ = { x ∈ X^ℤ : T x_n = x_{n+1} }

This is the intersection of the measurable sets { x ∈ X^ℤ : π_{n+1} x = T π_n x }, so X̃ is measurable. The shift map T̃ is measurable and X̃ is clearly invariant. It is also easy to check that π_n : X̃ → X is a factor map: T π_n = π_n T̃.

As for uniqueness, if φ₀ : (Y, C, ν, S) → (X, B, µ, T) is a factor map from an invertible system, we can define τ : Y → X^ℤ by τ(y)_n = φ₀(S^n y). Then the image is in X̃, φ₀ = π₀ ∘ τ is automatic, and one can show that τν = µ̃; again, we omit the details.

Lemma 5.3.1. If an m.p.s. is ergodic then so is its natural extension.

Proof. Let X be the original system and X̃ its natural extension, π = π₀ : X̃ → X the factor map. Then from the construction above it is clear that f = lim E(f | B_n) for f ∈ L²(µ̃), where B_n = T̃^n π^{−1}B. It is clear that (X̃, B_n, µ̃, T̃) ≅ (X, B, µ, T) and is therefore ergodic. It is also clear that E(T̃f | B_n) = T̃ E(f | B_n), since B_n is invariant. Therefore if f ∈ L²(µ̃) is T̃-invariant then E(f | B_n) is invariant. Since it corresponds to an invariant function on the ergodic system (X, B, µ, T), it is constant. Since f = lim E(f | B_n), f is constant. Therefore X̃ has no non-constant invariant functions, so it is ergodic.

Example 5.3.2. Let (X, B, µ, T) be an ergodic measure-preserving system and A ∈ B. Define the return time r_A(x) as usual. If T is invertible, Kac's formula (Theorem 3.3.1) ensures that ∫_A r_A dµ = 1. We can now prove this in the non-invertible case: let (X̃, T̃, µ̃) and π : X̃ → X be the natural extension. Let Ã = π^{−1}A and r_Ã the return time function in X̃. Since X̃ is ergodic, by Kac's formula ∫_Ã r_Ã dµ̃ = 1. But clearly r_Ã = r_A ∘ π, so ∫_Ã r_Ã dµ̃ = ∫_A r_A dπµ̃ = ∫_A r_A dµ, and the claim follows.

5.4 Inverse limits

A very similar construction gives the following theorem.

Theorem 5.4.1. Let (X_n, B_n, µ_n, T_n) be m.p.s. and π_n : X_n → X_{n−1} factor maps. Then there is a measure-preserving system (X̃, B̃, µ̃, T̃) and factor maps τ_n : X̃ → X_n such that

1. τ_{n−1} = π_n τ_n.

2. X̃ is unique in the sense that if Y is another m.p.s. with factor maps φ_n : Y → X_n satisfying φ_{n−1} = π_n φ_n, then there is a factor map τ : Y → X̃ such that φ_n = τ_n τ.

3. If all the X_n are ergodic, so is X̃.

5.5 Skew products

Let (X, B, µ, T) be an m.p.s. and G a compact group with normalized Haar measure m (for example ℝ/ℤ with Lebesgue measure). More generally, fix a probability space (Y, C, ν) and let G = Aut(ν) be the group of invertible ν-preserving maps Y → Y. This group can be identified with a subgroup of the bounded linear operators on L²(ν), with the operator topology (strong or weak) and the induced Borel structure. Now suppose that φ : x ↦ φ_x is a measurable map X → G, and form X × Y and T_φ : X × Y → X × Y by T_φ(x, y) = (Tx, φ_x y).

Lemma 5.5.1. Let θ = µ × ν. Then θ is T_φ-invariant.

Proof. Using Fubini, for A ⊆ X × Y we have:

  θ(T_φ^{−1}A) = ∫ 1_{T_φ^{−1}A} dθ

   = ∫ 1_A(T_φ(x, y)) dθ(x, y)

   = ∫ ( ∫ 1_A(Tx, φ_x y) dν(y) ) dµ(x)
   = ∫ ( ∫ 1_A(Tx, y) dν(y) ) dµ(x)
   = ∫ ( ∫ 1_A(Tx, y) dµ(x) ) dν(y)
   = ∫ ( ∫ 1_A(x, y) dµ(x) ) dν(y)
   = θ(A)

as claimed (the second equality uses that each φ_x preserves ν, and the fourth that T preserves µ).
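As a concrete instance of the construction, take X = Y = ℝ/ℤ, Tx = x + α, and φ_x = rotation by x, so that T_φ(x, y) = (x + α, y + x) on the torus. A Monte Carlo sketch (our own illustration; sample size and the test rectangle are arbitrary) checks that Lebesgue × Lebesgue is preserved by comparing the mass of a rectangle before and after applying T_φ:

```python
import math
import random

random.seed(4)

alpha = math.sqrt(2) % 1.0

def T_phi(x, y):
    # Skew product over the rotation: the base moves by alpha,
    # the fiber is rotated by the base point x.
    return ((x + alpha) % 1.0, (y + x) % 1.0)

# Push forward N uniform points on the torus; the mass of the rectangle
# [0, 1/2) x [0, 1/2) should stay near its Lebesgue measure 1/4.
N = 50_000
pts = [(random.random(), random.random()) for _ in range(N)]
before = sum(1 for (x, y) in pts if x < 0.5 and y < 0.5) / N
imgs = [T_phi(x, y) for (x, y) in pts]
after = sum(1 for (x, y) in imgs if x < 0.5 and y < 0.5) / N
```

Since T_φ preserves θ = Leb × Leb, the image of a uniform sample is again (statistically) uniform.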

The m.p.s. $(\tilde X,\theta,\tilde T)$ is called a skew-product over $X$ (the base) with cocycle $\varphi$. Note that $\pi(x,y)=x$ is a factor map from $(X\times Y,\theta,\tilde T)$ to $(X,\mu,T)$. In a sense there is a converse:

Theorem 5.5.2 (Rohlin). Let $\pi : Y\to X$ be a factor map between standard Borel spaces. Then $Y$ is isomorphic to a skew-product over $X$.

Note that we already know that the measure on $Y$ disintegrates into conditional measures over $X$. What is left to do is to identify all the fibers with a single space; the cocycle $\varphi$ is then defined by the transfer maps between fibers. We omit the technical details.

Chapter 6

Weak mixing

6.1 Weak mixing

Ergodicity is the most basic "mixing" property: it ensures that any two sets $A,B$ of positive measure intersect at some time, i.e. there is an $n$ with $\mu(A\cap T^{-n}B)>0$. It also ensures that a.e. orbit "samples" the entire space correctly. The notion of weak mixing is a natural generalization to pairs (and later $k$-tuples) of points: we are interested in understanding when independent pairs $x,y$ of points sample the product space correctly. This is ensured by the following definition.

Definition 6.1.1. A m.p.s. $(X,\mathcal B,\mu,T)$ is weak mixing if $(X\times X,\mathcal B\times\mathcal B,\mu\times\mu,T\times T)$ is ergodic.

This property has many equivalent and surprising forms which we will now discuss. Let us first give some examples. Since $X$ is a factor of $X\times X$ and the factor of an ergodic system is ergodic, weak mixing implies ergodicity. The converse is false.

Example 6.1.2. The translation of $\mathbb R/\mathbb Z$ by $\alpha\notin\mathbb Q$ is ergodic but not weak mixing. Indeed, let $d$ be a translation-invariant metric on $\mathbb R/\mathbb Z$, so that $d(T_\alpha x,T_\alpha y)=d(x,y)$. Then the function $d : \mathbb R/\mathbb Z\times\mathbb R/\mathbb Z\to\mathbb R$ is $T_\alpha\times T_\alpha$-invariant and is not a.e. constant with respect to Lebesgue measure on the product space (each level set is a sub-manifold of measure $0$). Therefore the product is not ergodic.

Example 6.1.3. More generally, if $(X,d)$ is a compact metric space, $T : X\to X$ an isometry, and $\mu$ an invariant measure supported on more than one point, then the same argument shows that $(X,\mu,T)$ is not weak mixing.

Example 6.1.4. Let $X=\{0,1\}^{\mathbb Z}$ with the product measure $\mu=\{\frac12,\frac12\}^{\mathbb Z}$ and the shift $S$. Then $X\times X\cong\{00,01,10,11\}^{\mathbb Z}$ with the product measure $\{\frac14,\frac14,\frac14,\frac14\}^{\mathbb Z}$, which is ergodic. Therefore $(X,\mu,S)$ is weak mixing.

In the last example $X$ was in fact strong mixing: $\mu(A\cap T^{-n}B)\to\mu(A)\mu(B)$ for all measurable $A,B$, or equivalently $\int f\cdot T^ng\,d\mu\to\int f\,d\mu\int g\,d\mu$ for $f,g\in L^2$. Weak mixing has a similar characterization. Before stating it we need a definition. For a subset $I\subseteq\mathbb N$ we define the upper density to be
$$\bar d(I)=\limsup_{N\to\infty}\frac{|I\cap\{1,\dots,N\}|}{N}.$$

Definition 6.1.5. A sequence $a_n\in\mathbb R$ converges in density to $a\in\mathbb R$, denoted $a_n\xrightarrow{D}a$ or $D\text{-}\lim a_n=a$, if
$$\bar d(\{n : |a_n-a|>\varepsilon\})=0\qquad\text{for all }\varepsilon>0.$$
Compare this to the usual notion of convergence, where we require the set above to be finite rather than of zero density. Since the union of finitely many sets of zero density has zero density, this notion of limit has the usual properties (with the exception that a subsequence may not have the same limit). One can also show the following:

Lemma 6.1.6. For a bounded sequence $a_n$, the following are equivalent:

1. $a_n\xrightarrow{D}a$.

2. $\frac1N\sum_{n=0}^{N-1}|a_n-a|\to0$.

3. $\frac1N\sum_{n=0}^{N-1}(a_n-a)^2\to0$.

4. There is a subset $I=\{n_1<n_2<\dots\}\subseteq\mathbb N$ with $\bar d(\mathbb N\setminus I)=0$ such that $a_{n_k}\to a$ as $k\to\infty$.
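The gap between convergence in density and ordinary convergence can be seen in a tiny example (an illustration, not part of the notes): take $a_n=1$ when $n$ is a perfect square and $a_n=0$ otherwise. The sequence does not converge, but the squares have density $0$, so $D\text{-}\lim a_n=0$, and correspondingly the averages in item 2 of Lemma 6.1.6 tend to $0$:

```python
import math

def a(n):
    # a_n = 1 if n is a perfect square, else 0; the squares have density 0.
    r = math.isqrt(n)
    return 1 if r * r == n else 0

N = 10_000
avg = sum(abs(a(n) - 0) for n in range(N)) / N   # item 2 of the lemma, with a = 0
density = sum(a(n) for n in range(N)) / N        # fraction of n < N with |a_n| > eps
print(avg, density)   # both equal (number of squares below N)/N ~ 1/sqrt(N)
```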

Theorem 6.1.7. For a m.p.s. $(X,\mathcal B,\mu,T)$ the following are equivalent:

1. $X$ is weak mixing.

2. $\frac1N\sum_{n=0}^{N-1}\left|\int f\cdot T^ng\,d\mu-\int f\,d\mu\int g\,d\mu\right|\to0$ for all $f,g\in L^2(\mu)$.

3. $\frac1N\sum_{n=0}^{N-1}\left|\mu(A\cap T^{-n}B)-\mu(A)\mu(B)\right|\to0$ for every $A,B\in\mathcal B$.

4. $\mu(A\cap T^{-n}B)\xrightarrow{D}\mu(A)\mu(B)$ for every $A,B\in\mathcal B$.

This is based on the following:

Lemma 6.1.8. For a m.p.s. $(Y,\mathcal C,\nu,S)$ the following are equivalent:

1. $Y$ is ergodic.

2. $\frac1N\sum_{n=0}^{N-1}\big(\nu(A\cap S^{-n}B)-\nu(A)\nu(B)\big)\to0$ for every $A,B\in\mathcal C$.

3. $\frac1N\sum_{n=0}^{N-1}\int f\cdot S^ng\,d\nu\to\int f\,d\nu\cdot\int g\,d\nu$ for every $f,g\in L^2(\nu)$.

Proof. The equivalence of the last two conditions is standard; we prove the equivalence of the first two.

If the limit in (2) holds, then $\nu(A\cap S^{-n}B)=\int 1_A\cdot S^n1_B\,d\nu>0$ infinitely often whenever $\nu(A)\nu(B)>0$. This gives ergodicity.

Conversely, if the system is ergodic then by the mean ergodic theorem $S_Ng=\frac1N\sum_{n=0}^{N-1}S^ng\to\int g\,d\nu$ in $L^2$ for any $g\in L^2$. So by continuity of the inner product,
$$\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\nu(A\cap S^{-n}B)=\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\int 1_A\cdot S^n1_B\,d\nu=\lim_{N\to\infty}\langle 1_A,S_N1_B\rangle=\Big\langle 1_A,\int 1_B\,d\nu\Big\rangle=\nu(A)\nu(B)$$
which is what we wanted.
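The contrast between Lemma 6.1.8 and Theorem 6.1.7 can be seen numerically for an irrational rotation (a sketch, not from the notes). For $A=B=[0,\frac12)$ and the rotation by $\alpha$, a short computation with interval overlaps gives $\mu(A\cap T^{-n}A)=|\{n\alpha\}-\frac12|$, where $\{\cdot\}$ denotes fractional part. The signed Cesàro averages of $\mu(A\cap T^{-n}A)-\frac14$ tend to $0$ (ergodicity, Lemma 6.1.8), but the averages of $|\mu(A\cap T^{-n}A)-\frac14|$ tend to $\int_0^1\big||t-\frac12|-\frac14\big|\,dt=\frac18\neq0$, so the rotation is not weak mixing:

```python
alpha = 0.5 ** 0.5                               # irrational rotation angle
corr = lambda n: abs((n * alpha) % 1.0 - 0.5)    # mu(A ∩ T^-n A) for A = [0, 1/2)

N = 100_000
erg = sum(corr(n) - 0.25 for n in range(N)) / N        # -> 0   (ergodic)
wm  = sum(abs(corr(n) - 0.25) for n in range(N)) / N   # -> 1/8 (not weak mixing)
print(erg, wm)
```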

Proof of Theorem 6.1.7. Since $|\mu(A\cap T^{-n}B)-\mu(A)\mu(B)|\le1$, the equivalence of (3) and (4) is Lemma 6.1.6. The equivalence of (2) and (3) is standard, by approximating $L^2$ functions by simple functions and using Cauchy–Schwarz. So we have to prove that (1) $\iff$ (2).

We may suppose that $X$ is ergodic, since otherwise (1) fails trivially and (2) fails already without absolute values, by the lemma. Then for $f,g\in L^2(\mu)$ we know from the lemma that
$$\frac1N\sum_{n=0}^{N-1}\int f\cdot T^ng\,d\mu\to\int f\,d\mu\int g\,d\mu.\qquad(6.1)$$
Suppose that $X$ is weak mixing. Let $f'=f(x)f(y)\in L^2(\mu\times\mu)$ and define $g'\in L^2(\mu\times\mu)$ similarly. By ergodicity of $X\times X$,
$$\frac1N\sum_{n=0}^{N-1}\int f'\cdot(T\times T)^ng'\,d\mu\times\mu\to\int f'\cdot\int g'$$
but
$$\int f'\cdot(T\times T)^ng'\,d\mu\times\mu=\iint f(x)f(y)g(T^nx)g(T^ny)\,d\mu(x)d\mu(y)=\Big(\int f\cdot T^ng\,d\mu\Big)^2$$
and
$$\int f'\,d\mu\times\mu\cdot\int g'\,d\mu\times\mu=\Big(\int f\,d\mu\Big)^2\Big(\int g\,d\mu\Big)^2.$$
Thus we have proved:
$$\frac1N\sum_{n=0}^{N-1}\Big(\int f\cdot T^ng\,d\mu\Big)^2\to\Big(\int f\,d\mu\int g\,d\mu\Big)^2.\qquad(6.2)$$
Combining this with (6.1), we find that
$$\frac1N\sum_{n=0}^{N-1}\Big(\int f\cdot T^ng\,d\mu-\int f\,d\mu\int g\,d\mu\Big)^2\to0\qquad(6.3)$$
and since the terms are bounded, this implies (2).

In the opposite direction, assume (4), which is equivalent to (2). We must prove that $X\times X$ is ergodic, or equivalently that for every $F,G\in L^2(\mu\times\mu)$,
$$\frac1N\sum_{n=0}^{N-1}\int F\cdot(T\times T)^nG\,d\mu\times\mu\to\int F\,d\mu\times\mu\cdot\int G\,d\mu\times\mu.$$
By approximation it is enough to prove this when $F(x_1,x_2)=f_1(x_1)f_2(x_2)$ and $G(x_1,x_2)=g_1(x_1)g_2(x_2)$. Furthermore we may assume that $f_1,f_2,g_1,g_2$ are simple, and even indicator functions $1_A,1_{A'},1_B,1_{B'}$. Thus we want to prove that for $A,A',B,B'$,
$$\frac1N\sum_{n=0}^{N-1}\mu(A\cap T^{-n}B)\,\mu(A'\cap T^{-n}B')\to\mu(A)\mu(B)\mu(A')\mu(B').$$
But $\mu(A\cap T^{-n}B)\to\mu(A)\mu(B)$ in density and $\mu(A'\cap T^{-n}B')\to\mu(A')\mu(B')$ in density, so the same is true of their product, and hence the averages converge as desired.

Corollary 6.1.9. If $(X,\mathcal B,\mu,T)$ is mixing then it is weak mixing.

Proof. $\mu(A\cap T^{-n}B)\to\mu(A)\mu(B)$ implies convergence in density. Now apply the theorem.

6.2 Weak mixing as a multiplier property

Proposition 6.2.1. $(X,\mathcal B,\mu,T)$ is weak mixing if and only if $X\times Y$ is ergodic for every ergodic system $(Y,\mathcal C,\nu,S)$.

Proof. One direction is trivial: if $X\times Y$ is ergodic whenever $Y$ is ergodic, then this is true in particular for the one-point system; then $X\times Y\cong X$, so $X$ is ergodic. It then follows, taking $Y=X$, that $X\times X$ is ergodic, so $X$ is weak mixing.

In the opposite direction we must prove that for every $F,G\in L^2(\mu\times\nu)$,
$$\frac1N\sum_{n=0}^{N-1}\int F\cdot(T\times S)^nG\,d\mu\times\nu\to\int F\,d\mu\times\nu\cdot\int G\,d\mu\times\nu.$$
As before it is enough to prove this when $F(x_1,x_2)=f_1(x_1)f_2(x_2)$ and $G(x_1,x_2)=g_1(x_1)g_2(x_2)$, and it reduces to
$$\frac1N\sum_{n=0}^{N-1}\int f_1\cdot T^ng_1\,d\mu\cdot\int f_2\cdot S^ng_2\,d\nu\to\int f_1\,d\mu\int g_1\,d\mu\cdot\int f_2\,d\nu\int g_2\,d\nu.$$
Splitting $L^2(\mu)$ into constant functions and their orthogonal complement (functions of integral $0$), it is enough to prove this for $f_1$ in each of these spaces. If $f_1$ is constant then $\int f_1\cdot T^ng_1\,d\mu=\int f_1\,d\mu\int g_1\,d\mu$ for every $n$, and the claim follows from ergodicity of $S$. On the other hand, if $\int f_1\,d\mu=0$ we have, by Cauchy–Schwarz,
$$\Big(\frac1N\sum_{n=0}^{N-1}\int f_1\cdot T^ng_1\,d\mu\cdot\int f_2\cdot S^ng_2\,d\nu\Big)^2\le\frac1N\sum_{n=0}^{N-1}\Big(\int f_1\cdot T^ng_1\,d\mu\Big)^2\cdot\frac1N\sum_{n=0}^{N-1}\Big(\int f_2\cdot S^ng_2\,d\nu\Big)^2$$
but
$$\frac1N\sum_{n=0}^{N-1}\Big(\int f_1\cdot T^ng_1\,d\mu\Big)^2\to\Big(\int f_1\,d\mu\int g_1\,d\mu\Big)^2=0$$
by weak mixing of $X$ (equation (6.2)), and the second factor is bounded, so we are done.

Corollary 6.2.2. If $X$ is weak mixing, so are $X\times X$ and $X\times X\times\dots\times X$.

Proof. For any ergodic $Y$, $(X\times X)\times Y=X\times(X\times Y)$. Since $X\times Y$ is ergodic, so is $X\times(X\times Y)$. The general claim follows in the same way.

More generally,

Corollary 6.2.3. If $X_1,X_2,\dots$ are weak mixing, so is $X_1\times X_2\times\dots$.

Also,

Corollary 6.2.4. If $(X,\mathcal B,\mu,T)$ is weak mixing then so is $T^n$ for all $n\in\mathbb N$ (if $T$ is invertible, also negative $n$).

Proof. Since $T\times T$ is ergodic if and only if $T^{-1}\times T^{-1}$ is ergodic, weak mixing of $T$ and $T^{-1}$ are equivalent, so we only need to consider $n>0$.

First we show that weak mixing of $T$ implies that $T^m$ is ergodic. Otherwise, let $f\in L^2$ be a $T^m$-invariant non-constant function. Consider the system $Y=\{0,\dots,m-1\}$ with $S(y)=y+1\bmod m$ and uniform measure. Since $(X,T)$ is weak mixing, $(X\times Y,T\times S)$ is ergodic. Let $F(x,i)=f(T^{m-i}x)$. Then
$$F(Tx,Si)=f(T^{m-(i+1)}Tx)=f(T^{m-i}x)=F(x,i)$$
(using $f\circ T^m=f$ when $i=m-1$), so $F$ is $T\times S$-invariant and non-constant, a contradiction. Thus $T^m$ is ergodic. Applying this to $T\times T$, which is weak mixing, we find that $(T\times T)^m$ is ergodic, equivalently $T^m\times T^m$ is ergodic, so $T^m$ is weak mixing.

6.3 Isometric factors

An isometric m.p.s. is a m.p.s. $(Y,\mathcal C,\nu,S)$ where $Y$ is a compact metric space, $\mathcal C$ the Borel algebra, and $S$ preserves the metric. We say that the system is nontrivial if $\nu$ is not a single atom. We have already seen that nontrivial isometric systems are not weak mixing, since if $d$ is the metric then $d : Y\times Y\to\mathbb R$ is a non-trivial invariant function. In this section we prove the following.

Theorem 6.3.1. Let $(X,\mathcal B,\mu,T)$ be an invertible m.p.s. on a standard Borel space. Then it is weak mixing if and only if it has no nontrivial isometric factors.

One direction is easy: if $Y$ is an isometric factor then $X\times X$ factors onto $Y\times Y$, and since the latter is not ergodic, neither is the former. For the converse direction we must show that if $X$ is not weak mixing then it has a nontrivial isometric factor. We can assume that $X$ is ergodic, since otherwise the ergodic decomposition gives a factor onto a compact metric space with the identity map, which is an isometry.

Recall that if $(X,\mathcal B,\mu)$ is a probability space then there is a pseudo-metric on $\mathcal B$ defined by
$$d(A,B)=\mu(A\triangle B)=\|1_A-1_B\|_1.$$
Identifying sets that differ by measure $0$ gives a metric on equivalence classes, and the resulting space may be identified with the set of $0$-$1$ valued functions in $L^1$, which is the same as the set of indicator functions. This identification is an isometry, and since the latter space is closed in $L^1$, it is complete.

Now suppose that $X$ is not weak mixing and let $A\subseteq X\times X$ be a non-trivial invariant set. Consider the map $X\to L^1$ given by $x\mapsto 1_{A_x}$, where
$$A_x=\{y\in X : (x,y)\in A\}.$$
This map is measurable: using the fact that the Borel structures of the unit ball of $L^1$ in the norm and weak topologies coincide (this fact is left as an exercise), we only need to check that
$$x\mapsto\int 1_{A_x}(y)g(y)\,d\mu(y)$$
is measurable for every $g\in L^\infty$. For fixed $g$ this clearly holds when $A$ is a product set or a finite union of product sets, and the general case follows from the monotone class theorem.

Now, notice that
$$TA_x=\{Ty : (x,y)\in A\}=\{y : (x,T^{-1}y)\in A\}=\{y : (x,T^{-1}y)\in(T\times T)^{-1}A\}=\{y : (Tx,y)\in A\}=A_{Tx}$$
so the map $x\mapsto 1_{A_x}$ intertwines the action of $T$ on $X$ with its action on $L^1$. Finally, the action of $T$ on $L^1$ is an isometry. Therefore we have proved:

Claim 6.3.2. If $T$ is not weak mixing then there is a complete metric space $(Y,d)$, an isometry $S : Y\to Y$, and a Borel map $\pi : X\to Y$ such that $S\pi=\pi T$.

Let $\nu=\pi\mu$ be the image measure; it is $S$-invariant. Thus $(Y,\nu,S)$ is almost the desired factor, except that the space $Y$ need not be compact (and there is another technicality we will mention later). To fix these problems we need a few general facts.

Definition 6.3.3. Let $(Y,d)$ be a complete metric space. A subset $Z\subseteq Y$ is called totally bounded if for every $\varepsilon>0$ there is a finite set $Z_\varepsilon\subseteq Y$ such that $Z\subseteq\bigcup_{z\in Z_\varepsilon}B_\varepsilon(z)$.

Lemma 6.3.4. Let $Z\subseteq Y$ be closed, with $Y$ as above. Then $Z$ is compact if and only if $Z$ is totally bounded.

Proof. This is left as an exercise.

Lemma 6.3.5. Let $(Y,d)$ be a complete separable metric space, $S : Y\to Y$ an isometry and $\mu$ an invariant and ergodic Borel probability measure. Then $\operatorname{supp}\mu$ is compact.

Proof. Let $C=\operatorname{supp}\mu$. This is a closed set and is clearly invariant, so we only need to show that it is compact; for this it is enough to show that it is totally bounded. Choose a $\mu$-typical point $y$. By the ergodic theorem its orbit is dense in $C$. Furthermore, since $S$ is an isometry, $B_r(S^ny)=S^nB_r(y)$, and since $S$ preserves $\mu$, all of these balls have the same measure $\mu(B_r(y))$, which is positive because $y\in\operatorname{supp}\mu$. Hence, for fixed $r$, any pairwise disjoint family of balls $B_r(S^ny)$ is finite; choosing a maximal disjoint family $B_r(S^{n_1}y),\dots,B_r(S^{n_k}y)$, every $S^ny$ lies within $2r$ of some $S^{n_i}y$. Finally, given $z\in C$ there is an $n$ with $d(z,S^ny)<r$, so the finitely many balls $B_{3r}(S^{n_i}y)$ cover $C$. As $r>0$ was arbitrary, $C$ is totally bounded.

6.4 Eigenfunctions

Definition 6.4.1. Let $U$ be a unitary operator on a Hilbert space $H$. Then $\lambda\in\mathbb C$ is an eigenvalue and $u\neq0$ a corresponding eigenvector (or eigenfunction) if $Uu=\lambda u$. We denote the set of eigenvalues by $\Sigma(U)$.

As in the finite-dimensional case, unitary operators have only eigenvalues of modulus $1$, because if $Uv=\lambda v$ and $\|v\|=1$ then
$$|\lambda|^2=\langle\lambda v,\lambda v\rangle=\langle Uv,Uv\rangle=\langle U^*Uv,v\rangle=\langle v,v\rangle=1.$$

Lemma 6.4.2. If $Uv=\lambda v$ and $Uv'=\lambda'v'$ and $\lambda\neq\lambda'$, then $v\perp v'$. In particular, if $H$ is separable then $|\Sigma(U)|\le\aleph_0$.

Proof. We can assume $\|v\|=\|v'\|=1$. Then the same calculation as above shows that $\langle v,v'\rangle=\lambda\overline{\lambda'}\langle v,v'\rangle$, which is possible only if $\langle v,v'\rangle=0$.

From now on let $(X,\mathcal B,\mu,T)$ be an invertible m.p.s. and denote also by $T$ the induced unitary operator on $L^2(\mu)$. Notice that an invariant function is an eigenvector of eigenvalue $1$, and the space of such functions is always at least one-dimensional, since it contains the constant functions. Therefore,

Lemma 6.4.3. $T$ is ergodic if and only if $1$ is a simple eigenvalue (the corresponding eigenspace has complex dimension $1$).

If $T$ is ergodic and $e^{2\pi i\alpha}$ is an eigenvalue with eigenfunction $f\in L^2(\mu)$, then $|f|$ satisfies
$$T|f|=|Tf|=|e^{2\pi i\alpha}f|=|f|$$
so $|f|$ is invariant. Therefore,

Corollary 6.4.4. If $T$ is ergodic then all eigenfunctions have constant modulus (a.e.).

By convention, we always normalize eigenfunctions to have modulus $1$. Then if $\alpha,\beta\in\Sigma(T)$ with corresponding eigenfunctions $f,g$, the product $f\bar g$ has modulus $1$, hence is in $L^2$, and
$$T(f\bar g)=Tf\cdot\overline{Tg}=\alpha f\cdot\bar\beta\bar g=\alpha\bar\beta\,f\bar g$$
so $\alpha\bar\beta\in\Sigma(T)$ and $f\bar g$ is an eigenfunction for it. Similarly, considering $\bar f$ we find that $\alpha^{-1}=\bar\alpha\in\Sigma(T)$. This shows:

Corollary 6.4.5. $\Sigma(T)\subseteq\mathbb C$ is a (multiplicative) group, and if $T$ is ergodic then all eigenvalues are simple.

Proof. The first statement is immediate. For the second, note that if $f,g$ are eigenfunctions for the same $\alpha$ then $f\bar g$ is an eigenfunction for $\alpha\bar\alpha=1$; by ergodicity $f\bar g$ is constant, so (using $\bar g=g^{-1}$) $f=cg$ for a constant $c$.

Note that $\Sigma(U)$ is not in general a group if $U$ is merely unitary. The following observation is trivial:

Lemma 6.4.6. If $\pi : (X,T)\to(Y,S)$ is a factor map between measure preserving systems, then $\Sigma(S)\subseteq\Sigma(T)$.

Proof. If $\lambda\in\Sigma(S)$, let $f$ be a corresponding eigenfunction, $Sf=\lambda f$. Then
$$T(f\circ\pi)=f\circ\pi\circ T=f\circ S\circ\pi=\lambda\,(f\circ\pi).$$
This shows that $\lambda\in\Sigma(T)$.

Theorem 6.4.7. Let $(X,\mathcal B,\mu,T)$ be ergodic. Then it is weak mixing if and only if $1$ is the only eigenvalue, i.e. $\Sigma(T)=\{1\}$.

One direction of the proof is easy. Note that from a dynamical point of view, an eigenfunction $f$ for $\alpha\in\Sigma(T)$ gives a factor map to a rotation on $S^1$: if $R_\alpha z=\alpha z$ denotes the rotation of $S^1$, the eigenfunction equation is just
$$f(Tx)=\alpha f(x)=R_\alpha(f(x))$$
so $fT=R_\alpha f$, and the image measure $f\mu$ is a probability measure on $S^1$ invariant under $R_\alpha$. Of course $R_\alpha$ is an isometry of $S^1$, and if $\alpha\neq1$ then $f\mu$ is not a single atom; thus we have found a non-trivial isometric factor of $X$, so $X$ is not weak mixing.

For the other direction we need the following. For a group $G$ and $g\in G$ let $L_g : G\to G$ denote the map $h\mapsto gh$. A dynamical system in which $G$ is a compact group and the map is $L_g$ is called a group translation. Note that Haar measure is automatically invariant, since by definition it is the unique Borel probability measure invariant under all translations. If Haar measure is also ergodic, we say the translation is ergodic.

Proposition 6.4.8. Let $(Y,d)$ be a compact metric space and $S : Y\to Y$ an isometry with a dense orbit. Then there is a compact metric group $G$, an element $g\in G$, and a homeomorphism $\pi : Y\to G$ such that $L_g\pi=\pi S$. Furthermore, if $\nu$ is an invariant measure on $Y$ then it is ergodic and $\pi\nu$ is Haar measure on $G$.

Proof. Consider the group $\Gamma$ of isometries of $Y$ with the sup metric,
$$d(\gamma,\gamma')=\sup_{y\in Y}d(\gamma(y),\gamma'(y)).$$
Then $(\Gamma,d)$ is a complete metric space, and note that the metric is right invariant: $d(\gamma\sigma,\gamma'\sigma)=d(\gamma,\gamma')$.

Let $y_0\in Y$ have dense orbit and set $Y_0=\{S^ny_0\}_{n\in\mathbb Z}$. If the orbit is finite, $Y=Y_0$ is a finite set permuted cyclically by $S$, so the statement is trivial. Otherwise, $y\in Y_0$ uniquely determines the $n$ such that $S^ny_0=y$, and we can define $\pi : Y_0\to\Gamma$ by $y\mapsto S^n$ for this $n$.

We claim that $\pi$ is an isometry. Fix $y,y'\in Y_0$, so $y=S^ny_0$ and $y'=S^{n'}y_0$, and
$$d(\pi y,\pi y')=\sup_{z\in Y}d(S^nz,S^{n'}z).$$
Given $z\in Y$ there is a sequence $n_k\to\infty$ such that $S^{n_k}y_0\to z$. But then
$$d(S^nz,S^{n'}z)=d\big(S^n(\lim S^{n_k}y_0),S^{n'}(\lim S^{n_k}y_0)\big)=\lim d(S^{n+n_k}y_0,S^{n'+n_k}y_0)=\lim d\big(S^{n_k}(S^ny_0),S^{n_k}(S^{n'}y_0)\big)=\lim d(S^ny_0,S^{n'}y_0)=d(y,y').$$
Thus $d(\pi y,\pi y')=d(y,y')$ and $\pi$ is an isometry $Y_0\to\Gamma$. Furthermore, for $y=S^ny_0\in Y_0$,
$$\pi(Sy)=\pi(S\cdot S^ny_0)=S^{n+1}=L_SS^n=L_S\pi(y).$$
It follows that $\pi$ extends uniquely to an isometry $Y\to\Gamma$, also satisfying $\pi(Sy)=L_S(\pi y)$. The image $\pi(Y)$ is compact, being the continuous image of the compact set $Y$. Since $\pi(Y_0)=\{S^n\}_{n\in\mathbb Z}$ and this is a group, its closure is also a group $G$.

Finally, suppose $\nu$ is an invariant measure on $Y$. Then $m=\pi\nu$ is $L_S$-invariant on $G$. Since it is invariant under $L_S$, it is invariant under $L_{S^n}$ for all $n\in\mathbb Z$, and this is a dense set of elements of $G$. Thus $m$ is invariant under every translation of $G$, and there is only one such measure up to normalization: Haar measure. The same argument applies to every ergodic component of $m$ (with respect to $L_S$) and shows that the ergodic components are also Haar measure. Thus $m$ is $L_S$-ergodic, and since $\pi$ is an isomorphism, $(Y,\nu,S)$ is ergodic.

Corollary 6.4.9. If $(X,\mathcal B,\mu,T)$ is ergodic but not weak mixing, then it has a non-trivial ergodic group translation as a factor.

Finally, for the existence of eigenfunctions we rely on a well-known result from group theory.

Theorem 6.4.10 (Peter–Weyl). If $G$ is a compact topological group with Haar measure $m$, then the characters form an orthonormal basis of $L^2(m)$ (a character is a continuous homomorphism $\xi : G\to S^1\subseteq\mathbb C$).

Corollary 6.4.11. Let $G$ be a compact group with Haar measure $m$ and $g\in G$. Then $L^2(m)$ is spanned by eigenfunctions of $L_g$.

Proof. If $\xi$ is a character then
$$L_g\xi(x)=\xi(gx)=\xi(g)\xi(x)$$
so $\xi$ is an eigenfunction with eigenvalue $\xi(g)$. Since the characters span $L^2(m)$, we are done.
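Corollary 6.4.11 can be checked directly for the finite group $G=\mathbb Z/n\mathbb Z$ (a sketch, not part of the notes; the names `chi`, `n`, `g` are my own). The characters are $\xi_k(x)=e^{2\pi ikx/n}$, and the eigenfunction equation $L_g\xi_k=\xi_k(g)\,\xi_k$ holds pointwise:

```python
import cmath

n, g = 12, 5                      # G = Z/12Z, translation by g = 5

def chi(k, x):
    # character xi_k(x) = e^{2 pi i k x / n} of Z/nZ
    return cmath.exp(2j * cmath.pi * k * x / n)

# Verify the eigenfunction equation L_g xi_k = xi_k(g) * xi_k for every k, x.
ok = all(
    abs(chi(k, (x + g) % n) - chi(k, g) * chi(k, x)) < 1e-9
    for k in range(n) for x in range(n)
)
print(ok)   # True: each character is an eigenfunction with eigenvalue xi_k(g)
```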

In summary, we have seen that if $(X,\mathcal B,\mu,T)$ is ergodic but not weak mixing then it has a nontrivial isometric factor. This is isomorphic to a non-trivial group translation, which has a non-trivial eigenfunction. This eigenfunction lifts, via the composition of all these factor maps, to a non-trivial eigenfunction on $X$. This completes the characterization of weak mixing in terms of the existence of eigenfunctions.

6.5 Spectral isomorphism and the Kronecker factor

We will now take a closer look at systems that are isomorphic to group rotations. Let $T_\infty=(S^1)^{\mathbb N}$, which is a compact metrizable group with the product topology. Let $m$ be the infinite product of Lebesgue measure, which is invariant under translations of $T_\infty$. Given $\alpha\in T_\infty$, let $L_\alpha : T_\infty\to T_\infty$ as usual be the translation map. Note that $L_\alpha^nx=\alpha^nx$.

Lemma 6.5.1. The orbit closure $G_\alpha$ of the identity of $T_\infty$ under $L_\alpha$ (that is, the closure of $\{\alpha^n : n\in\mathbb N\}$) is the closed subgroup generated by $\alpha$.

Proof. It is clear that $G_\alpha$ is contained in the group in question, is closed, contains $\alpha$, and is a semigroup. The latter is because it is the closure of the semigroup $\{\alpha^n\}_{n\in\mathbb N}$: explicitly, if $x,y\in G_\alpha$ then $x=\lim\alpha^{n_i}$ and $y=\lim\alpha^{m_j}$, so
$$xy=(\lim\alpha^{m_j})(\lim\alpha^{n_i})=\lim\alpha^{m_i+n_i}\in G_\alpha.$$
Thus, we only need to check that $x\in G_\alpha$ implies $x^{-1}\in G_\alpha$. Let $x=\lim\alpha^{n_i}$. We can assume $n_{i+1}>2n_i$. Passing to a subsequence, we can also assume that $\alpha^{n_{i+1}-2n_i}\to y$. But then
$$xy=(\lim\alpha^{n_i})(\lim\alpha^{n_{i+1}-2n_i})=\lim\alpha^{n_{i+1}-n_i}=\lim\alpha^{n_{i+1}}\cdot\lim\alpha^{-n_i}=x\cdot x^{-1}=1$$
so $y=x^{-1}$, and in particular $x^{-1}\in G_\alpha$.

Lemma 6.5.2. Let $\alpha\in T_\infty=(S^1)^{\mathbb N}$ and $G_\alpha$ be as above. Let $m_\alpha$ be the Haar measure on $G_\alpha$, equivalently the unique $L_\alpha$-invariant measure. Then $\Sigma(L_\alpha)=\langle\alpha_1,\alpha_2,\dots\rangle\subseteq S^1$, the (discrete) group generated by the coordinates $\alpha_i$ of $\alpha$.

Proof. Let $\pi_n : T_\infty\to S^1$ denote the $n$-th coordinate projection. Clearly $\pi_n(L_\alpha x)=\alpha_nx_n=\alpha_n\pi_n(x)$, so the functions $\pi_n$ are eigenfunctions of $(G_\alpha,m_\alpha,L_\alpha)$. Let $\mathcal A$ denote the $\mathbb C$-algebra generated by $\{\pi_n\}$. This is an algebra of continuous functions that separates points of $T_\infty$, and certainly of $G_\alpha$, so it is dense in $L^2(m_\alpha)$. Since this algebra consists precisely of (linear combinations of) eigenfunctions with eigenvalues in $\langle\alpha_1,\alpha_2,\dots\rangle$, we are done.

Proposition 6.5.3. Let $(X,\mathcal B,\mu,T)$ be an ergodic measure preserving system on a standard Borel space. Let $\Sigma(T)=\{\alpha_1,\alpha_2,\dots\}$ and $\alpha=(\alpha_1,\alpha_2,\dots)\in T_\infty$. Then:

1. $(G_\alpha,m_\alpha,L_\alpha)$ is a factor of $X$.

2. If $L^2(\mu)$ is spanned by eigenfunctions then the factor map is an isomorphism: $X\cong G_\alpha$.

3. If $\tau : X\to Y$ is a factor map to an isometric system $(Y,\mathcal C,\nu,S)$, then $\tau$ factors through $G_\alpha$.

Proof. We may restrict to a subset of $X$ of full measure on which eigenfunctions $f_i$ for all the $\alpha_i$ are defined and satisfy $Tf_i=\alpha_if_i$. Let $F : X\to T_\infty$ denote the map $F(x)=(f_1(x),f_2(x),\dots)$. Then it is immediate that $F(Tx)=L_\alpha F(x)$. Let $\nu=F\mu$, which is an $L_\alpha$-invariant measure on $T_\infty$, and let $y\in\operatorname{supp}\nu$ be a point with dense orbit (which exists by ergodicity). Consider the map $L_y^{-1} : x\mapsto y^{-1}x$, which commutes with $L_\alpha$ (because $T_\infty$ is abelian), and note that $L_y^{-1}$ maps the $L_\alpha$-orbit of $y$ to the $L_\alpha$-orbit of the identity. Writing $m=L_y^{-1}\nu$, it follows that $m$ is $L_\alpha$-invariant and $\operatorname{supp}m=G_\alpha$; hence $m=m_\alpha$. Also $\pi : x\mapsto L_y^{-1}F(x)$ is a factor map from $X$ to $G_\alpha$.

Suppose that the eigenfunctions span $L^2(\mu)$. Since each eigenfunction of $X$ is lifted by $\pi$ from an eigenfunction of $G_\alpha$, we find that $L^2(\pi^{-1}\mathcal B_\alpha)=L^2(\mu)$, where $\mathcal B_\alpha$ is the Borel algebra of $G_\alpha$. It follows that $\pi$ is 1-1 a.e., and by standardness it is an isomorphism.

Finally, suppose $\tau : X\to Y$ is as in the statement. Then $Y$ is (isomorphic to) a group rotation, its eigenfunctions are dense in $L^2(\nu)$, and we have an isomorphism $\sigma : Y\to G_\beta$, where $\beta=(\beta_1,\beta_2,\dots)$ enumerates $\Sigma(S)$. Now, each eigenfunction of $(G_\beta,L_\beta)$ lifts to one of $X$, and since the multiplicity is $1$, it lifts to the eigenfunction for its eigenvalue. Thus the coordinates of $\alpha$ include those of $\beta$. Let $u : G_\alpha\to T_\infty$ denote the projection to the coordinates corresponding to eigenvalues of $S$; then the map $u\pi$ agrees with $\sigma\tau$. The claim follows.

Let us point out two consequences of this theorem. First,

Definition 6.5.4. An ergodic measure preserving system has discrete spectrum if $L^2$ is spanned by eigenfunctions.

Corollary 6.5.5. Discrete spectrum systems are isomorphic if and only if the induced unitary operators are unitarily equivalent.

Proof. This follows from the theorem above and the fact that two diagonalizable unitary operators are unitarily equivalent if and only if they have the same eigenvalues (counted with multiplicity); ergodicity implies that all eigenvalues are simple.

Second, part (3) of the theorem above shows that every ergodic measure preserving system has a maximal isometric factor. This factor is called the Kronecker factor. The factor is canonical, although the factor map is not: one can always post-compose it with a translation of the group.

We emphasize that in general it is false that unitary equivalence implies ergodic-theoretic isomorphism. The easiest example to state is that the product measures $(1/2,1/2)^{\mathbb Z}$ and $(1/3,1/3,1/3)^{\mathbb Z}$ with the shift map have unitarily isomorphic induced actions on $L^2$, but they are not isomorphic as measure preserving systems.

6.6 Spectral methods

Our characterization of weak mixing is, in the end, purely a Hilbert-space statement. Thus one should be able to prove the existence of eigenfunctions without reference to the underlying dynamical system. This can be done with the help of the spectral theorem. Let us first give a brief review of the version we will use.

Let us begin with an example of a unitary operator. Let $\mu$ be a probability measure on the circle $S^1$ and let $M : L^2(\mu)\to L^2(\mu)$ be given by $(Mf)(z)=zf(z)$. Note that $M$ preserves norm, since $|zf(z)|=|f(z)|$ for $z\in S^1$ and hence for $\mu$-a.e. $z$; it is invertible, since the inverse is given by multiplication by $\bar z$. The spectral theorem says that any unitary operator can be represented in this way on any invariant subspace for which it has a cyclic vector.

Theorem 6.6.1 (Spectral theorem for unitary operators). Let $U : H\to H$ be a unitary operator and $v\in H$ a unit vector such that $\overline{\operatorname{span}}\{U^nv\}_{n=-\infty}^{\infty}=H$. Then there is a probability measure $\mu_v\in\mathcal P(S^1)$ and a unitary operator $V : L^2(\mu_v)\to H$ such that $U=VMV^{-1}$, where $M : L^2(\mu_v)\to L^2(\mu_v)$ is as above. Furthermore $V(1)=v$.

We give the main idea of the proof. The measure $\mu_v$ is characterized by the statement, because its Fourier transform $\hat\mu_v : \mathbb Z\to\mathbb C$ is given by
$$\hat\mu_v(n)=\int z^n\,d\mu_v=\langle M^n1,1\rangle_{L^2(\mu_v)}=\langle U^nv,v\rangle_H.$$
Reversing this argument, in order to construct $\mu_v$ one starts with the sequence $a_n=\langle U^nv,v\rangle_H$. This sequence is positive definite in the sense that for any sequence $\lambda_i\in\mathbb C$ and any $n$, $\sum_{i,j=-n}^n\lambda_i\bar\lambda_ja_{i-j}\ge0$:
$$\sum_{i,j=-n}^n\lambda_i\bar\lambda_ja_{i-j}=\sum_{i,j=-n}^n\lambda_i\bar\lambda_j\langle U^{i-j}v,v\rangle_H=\sum_{i,j=-n}^n\langle\lambda_iU^iv,\lambda_jU^jv\rangle_H=\Big\|\sum_{i=-n}^n\lambda_iU^iv\Big\|^2\ge0.$$
Therefore, by a theorem of Herglotz (also Bochner), $(a_n)$ is the Fourier transform of a probability measure $\mu_v$ on $S^1$ (note that $a_0=\|v\|^2=1$).

One first defines $V$ on complex polynomials $p(z)=\sum_{n=0}^db_nz^n$ by $Vp=\sum_{n=0}^db_nU^nv$. One can check that this preserves inner products; it suffices to check for monomials, and indeed
$$\langle V(bz^m),V(cz^n)\rangle=b\bar c\,\langle U^mv,U^nv\rangle=b\bar c\,a_{m-n}=b\bar c\int z^{m-n}\,d\mu_v=\int(bz^m)\overline{(cz^n)}\,d\mu_v=\langle bz^m,cz^n\rangle_{L^2(\mu_v)}.$$
Since polynomials are dense in $L^2(\mu_v)$, it remains to extend $V$ to measurable functions. The technical details of carrying this out can be found in many textbooks.

Lemma 6.6.2. Let $U : H\to H$ be unitary and $v$ a cyclic unit vector for $U$ with spectral measure $\mu_v$. Then $\alpha\in\Sigma(U)$ if and only if $\mu_v(\{\alpha\})>0$.

Proof. If $\alpha$ is an atom of $\mu_v$, let $f=1_{\{\alpha\}}$. This is a non-zero vector in $L^2(\mu_v)$, and $Mf(z)=zf(z)=\alpha f(z)$. Hence $\alpha\in\Sigma(M)$, and by the spectral theorem $\alpha\in\Sigma(U)$.

Conversely, suppose that $\mu_v(\{\alpha\})=0$. Consider the operator $U_\alpha(w)=\bar\alpha Uw$, which is easily seen to be unitary. Clearly $w$ is an eigenvector with eigenvalue $\alpha$ if and only if $U_\alpha w=w$. Thus it suffices for us to prove that $\frac1N\sum_{n=0}^{N-1}U_\alpha^nw\to0$ for all $w$; since $v$ is cyclic and the averaging operator is linear and continuous, it is enough to check this for $v$. Transferring the problem to $(S^1,\mu_v,M)$, we must show that $\frac1N\sum_{n=0}^{N-1}\bar\alpha^nz^n\to0$ in $L^2(\mu_v)$. We have
$$\frac1N\sum_{n=0}^{N-1}\bar\alpha^nz^n=\frac1N\cdot\frac{(\bar\alpha z)^N-1}{\bar\alpha z-1}.$$
This converges to $0$ at every point $z\neq\alpha$, hence $\mu_v$-a.e., and it is bounded. Hence by bounded convergence it tends to $0$ in $L^2(\mu_v)$, as required.
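The averaging computation at the end of the proof of Lemma 6.6.2 can be checked numerically (a sketch, not from the notes; the helper name `cesaro` is mine): for $z\neq\alpha$ on the circle, $\frac1N\sum_{n=0}^{N-1}(\bar\alpha z)^n=\frac1N\cdot\frac{(\bar\alpha z)^N-1}{\bar\alpha z-1}\to0$, while at $z=\alpha$ it is identically $1$.

```python
import cmath

def cesaro(alpha, z, N):
    # (1/N) * sum_{n=0}^{N-1} (conj(alpha) * z)**n via the geometric series
    w = alpha.conjugate() * z
    if abs(w - 1) < 1e-12:
        return 1.0 + 0j
    return (w ** N - 1) / (N * (w - 1))

alpha = cmath.exp(2j * cmath.pi * 0.3)
z1 = cmath.exp(2j * cmath.pi * 0.7)     # a point z != alpha on the circle
v_far = abs(cesaro(alpha, z1, 10_000))  # -> 0 like O(1/N)
v_at = abs(cesaro(alpha, alpha, 10_000))
print(v_far, v_at)   # ~0 away from alpha, exactly 1 at z = alpha
```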

Proposition 6.6.3. Let $U,H,v,\mu_v$ be as above. If $\mu_v$ is continuous (has no atoms), then
$$\frac1N\sum_{n=0}^{N-1}|\langle w,U^nw'\rangle|\to0$$
for every $w,w'\in H_v$.

Proof. Using the fact that $w,w'$ can be approximated by linear combinations of $\{U^nv\}$, it is enough to prove this for $w,w'\in\{U^nv\}$. Since the statement we are trying to prove is formally unchanged if we replace $w$ by $U^{\pm1}w$ or $w'$ by $U^{\pm1}w'$, we may assume that $w=w'=v$. Also, we may square the summand, as we have seen this does not affect the convergence of the averages to $0$. Passing to the spectral setting, we have
$$\frac1N\sum_{n=0}^{N-1}|\langle v,U^nv\rangle|^2=\frac1N\sum_{n=0}^{N-1}\Big|\int z^n\,d\mu_v\Big|^2=\frac1N\sum_{n=0}^{N-1}\int z^n\,d\mu_v\cdot\overline{\int z^n\,d\mu_v}=\frac1N\sum_{n=0}^{N-1}\iint z^n\bar y^n\,d\mu_v(y)\,d\mu_v(z)=\iint\Big(\frac1N\cdot\frac{(z\bar y)^N-1}{z\bar y-1}\Big)d\mu_v\times\mu_v(z,y).$$
The integrand is bounded by $1$ and tends pointwise to $0$ off the diagonal $\{y=z\}$, which has $\mu_v\times\mu_v$-measure $0$ since $\mu_v$ is non-atomic. Therefore by bounded convergence the expression tends to $0$.

Corollary 6.6.4. If $(X,\mathcal B,\mu,T)$ is ergodic then it is weak mixing if and only if $\mu_f$ is continuous (has no atoms) for every $f\perp1$ (equivalently, the maximal spectral type is non-atomic except for an atom at $1$), if and only if $\Sigma(T)=\{1\}$.

Proof. Suppose that for every $f\perp1$ the spectral measure $\mu_f$ is continuous. By the last proposition, $\frac1N\sum_{n=0}^{N-1}|\int f\cdot T^nf\,d\mu|\to0$ for such $f$. For general $f,g$ we can write $f=f'+\int f\,d\mu$ and $g=g'+\int g\,d\mu$, where $f',g'\perp1$. Substituting into
$$\frac1N\sum_{n=0}^{N-1}\Big|\int f\cdot T^ng\,d\mu-\Big(\int f\,d\mu\Big)\Big(\int g\,d\mu\Big)\Big|$$
we obtain the expression
$$\frac1N\sum_{n=0}^{N-1}\Big|\int f'\cdot T^ng'\,d\mu\Big|$$
which by assumption tends to $0$. This was one of our characterizations of weak mixing.

Conversely, suppose $T$ is weak mixing. Then it has no eigenfunctions except the constants (this was the trivial direction of the eigenfunction characterization), so if $f\perp1$ then also $\overline{\operatorname{span}}\{U^nf\}\perp1$ and, since on this subspace $T$ has no eigenfunctions, $\mu_f$ is continuous.

We already know that weak mixing implies $\Sigma(T)=\{1\}$. In the other direction, if $T$ is not weak mixing, we just saw that there is some $f\perp1$ with $\mu_f(\{\alpha\})>0$ for some $\alpha$, and by the previous lemma $\alpha\in\Sigma(T)$.

In a certain sense, we can now "split" the dynamics of a non-weak-mixing system into an isometric part and a weak mixing part:

Corollary 6.6.5. Let $(X,\mathcal B,\mu,T)$ be ergodic. Then $L^2(\mu)=U\oplus V$, where $U=L^2(\mu,\mathcal E)$ for the Kronecker factor $\mathcal E\subseteq\mathcal B$, and $V$ is an invariant subspace on which $T$ is weak mixing in the sense that $\frac1N\sum_{n=0}^{N-1}|\int f\cdot T^ng\,d\mu|\to0$ for every $f\in L^2(\mu)$ and $g\in V$.

One should note that, in general, the subspace $V$ in the corollary does not correspond to a factor in the dynamical sense. An important consequence is the following:

Theorem 6.6.6. Let $(X,\mathcal B,\mu,T)$ and $(Y,\mathcal C,\nu,S)$ be ergodic measure preserving systems. Then $X\times Y$ is ergodic if and only if $\Sigma(T)\cap\Sigma(S)=\{1\}$.

Proof. Let $Z=X\times Y$, $R=T\times S$, $\theta=\mu\times\nu$. If $\alpha\neq1$ is a common eigenvalue of $T$ and $S$ with eigenfunctions $f$ and $g$, then $h(x,y)=f(x)\overline{g(y)}$ is a non-trivial invariant function, since
$$h(R(x,y))=f(Tx)\overline{g(Sy)}=\alpha f(x)\,\bar\alpha\,\overline{g(y)}=h(x,y),$$
and so $Z$ is not ergodic.

Conversely, write $L^2(\mu)=V_{wm}\oplus V_{pp}$, where $V_{pp}$ is spanned by eigenfunctions and $T|_{V_{wm}}$ has no eigenvalues, as in the previous corollary, and decompose $L^2(\nu)=W_{wm}\oplus W_{pp}$ similarly. We must show that
$$\frac1N\sum_{n=0}^{N-1}\int h\cdot R^nh\,d\theta\to\Big(\int h\,d\theta\Big)^2$$
for every $h\in L^2(\theta)$, and it suffices to check this for $h=f\otimes g$ with $f\in L^2(\mu)$, $g\in L^2(\nu)$, since the span of such functions is dense in $L^2$. Then $\int h\cdot R^nh\,d\theta=\int f\cdot T^nf\,d\mu\cdot\int g\cdot S^ng\,d\nu$. Also, since we can write $f=f_{wm}+f_{pp}$ and $g=g_{wm}+g_{pp}$ with $f_{wm}\in V_{wm}$ etc., we can expand the expression above and obtain a sum of terms:
$$\frac1N\sum_{n=0}^{N-1}\int h\cdot R^nh\,d\theta=\sum_{i,j,s,t\in\{wm,pp\}}\frac1N\sum_{n=0}^{N-1}\Big(\int f_i\cdot T^nf_j\,d\mu\Big)\Big(\int g_s\cdot S^ng_t\,d\nu\Big).\qquad(6.4)$$
Consider the terms in parentheses; they are all bounded independently of $n$. So if $j=wm$ we can bound
$$\frac1N\sum_{n=0}^{N-1}\Big|\Big(\int f_i\cdot T^nf_j\,d\mu\Big)\Big(\int g_s\cdot S^ng_t\,d\nu\Big)\Big|\le C\cdot\frac1N\sum_{n=0}^{N-1}\Big|\int f_i\cdot T^nf_j\,d\mu\Big|\to0,$$
and similarly if $t=wm$. Also, if $j=pp$ and $i=wm$, then (taking $f_j$ to be an eigenfunction, by linearity) $T^nf_j=\alpha^nf_j$ for some $\alpha$, and $f_j\perp\bar f_i$. Hence
$$\int f_i\cdot T^nf_j\,d\mu=\alpha^n\int f_if_j\,d\mu=\alpha^n\langle f_j,\bar f_i\rangle=0.$$
Thus in (6.4) we only need to consider the terms with $i,j,s,t=pp$. In this case we can expand each of the functions as a series of distinct eigenfunctions: $f_{pp}=\sum\varphi_k$ and $g_{pp}=\sum\psi_k$, where $T\varphi_k=\alpha_k\varphi_k$ and $S\psi_k=\beta_k\psi_k$; we take $\varphi_0$ and $\psi_0$ to be the constant components. Expanding again using linearity, we must consider terms of the form
$$\frac1N\sum_{n=0}^{N-1}\Big(\int\varphi_i\cdot T^n\varphi_j\,d\mu\Big)\Big(\int\psi_s\cdot S^n\psi_t\,d\nu\Big)=\frac1N\sum_{n=0}^{N-1}\alpha_j^n\beta_t^n\Big(\int\varphi_i\varphi_j\,d\mu\Big)\Big(\int\psi_s\psi_t\,d\nu\Big).$$
Now, the first integral is $0$ unless $\varphi_j=\bar\varphi_i$, and the second is $0$ unless $\psi_t=\bar\psi_s$. If this is the case then, writing $c_{i,s}=\|\varphi_i\|_2^2\|\psi_s\|_2^2$,
$$\frac1N\sum_{n=0}^{N-1}c_{i,s}(\alpha_j\beta_t)^n\xrightarrow[N\to\infty]{}\begin{cases}c_{i,s}&\text{if }\alpha_j\beta_t=1\\0&\text{otherwise.}\end{cases}$$
Since $\Sigma(T)\cap\Sigma(S)=\{1\}$ and the spectra are groups, $\alpha_j\beta_t=1$ forces $\alpha_j=\beta_t=1$, so the limit is $0$ except when $i=j=s=t=0$. In the latter case $c_{0,0}=\int\varphi_0^2\,d\mu\cdot\int\psi_0^2\,d\nu=(\int f)^2(\int g)^2=(\int h\,d\theta)^2$, so this was the limit we wanted.

Chapter 7
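Theorem 6.6.6 can be illustrated with two circle rotations (a sketch, not part of the notes; the names below are my own). For rotations by $\alpha$ and $\beta$, the function $h(x,y)=e^{2\pi i(x-y)}$ has space average $0$; its time average along the orbit of $(0,0)$ is $\frac1N\sum_n e^{2\pi in(\alpha-\beta)}$, which stays at $1$ when $\beta=\alpha$ (a common eigenvalue, product not ergodic) and tends to $0$ when $\alpha-\beta$ is irrational:

```python
import cmath

def time_avg(alpha, beta, N=100_000):
    # Time average of h(x,y) = e^{2 pi i (x - y)} along the orbit of (0,0)
    # under (x, y) -> (x + alpha, y + beta) mod 1.  The space average is 0.
    w = cmath.exp(2j * cmath.pi * (alpha - beta))
    return abs(sum(w ** n for n in range(N)) / N)

a = 2 ** 0.5 / 2
same = time_avg(a, a)              # common eigenvalue e^{2 pi i a}: average stays 1
diff = time_avg(a, 3 ** 0.5 / 2)   # no common eigenvalue: average -> 0
print(same, diff)
```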

Disjointness and a taste of entropy theory

7.1 Joinings and disjointness

Definition 7.1.1. A joining of measure preserving systems $(X,\mathcal B,\mu,T)$ and $(Y,\mathcal C,\nu,S)$ is a measure $\theta$ on $X\times Y$ that is invariant under $T\times S$ and projects to $\mu$ and $\nu$, respectively, under the coordinate projections.

Remark 7.1.2. There is a more general notion of a joining of $X,Y$: namely, a measure preserving system $(Z,\mathcal E,\theta,R)$ together with factor maps $\pi_X : Z\to X$ and $\pi_Y : Z\to Y$. This gives a joining in the previous sense by taking the image of the measure $\theta$ under $z\mapsto(\pi_X(z),\pi_Y(z))$.

Joinings always exist, since we can take $\theta=\mu\times\nu$ with the coordinate projections. This is called the trivial joining. Another example arises when $\varphi : X\to Y$ is an isomorphism. Then the graph map $g : x\mapsto(x,\varphi x)$ pushes $\mu$ forward to a measure on $X\times Y$ that is invariant under $T\times S$ and projects in a 1-1 manner under the coordinate maps to $\mu$ and $\nu$ respectively. In particular this joining is different from the product joining unless $X$ consists of a single point.

We saw that when $X,Y$ are isomorphic there are non-trivial joinings. The following can be viewed, then, as an extreme form of non-isomorphism.

Definition 7.1.3. $X,Y$ are disjoint if the only joining is the trivial one. In this case we write $X\perp Y$.

If $X$ is not a single point then it cannot be disjoint from itself: indeed, the graph of $T$ is a non-trivial joining. Also, if $X,Y$ are ergodic but $X\times Y$ is not, then the ergodic components $\theta_z$ of $\mu\times\nu$ must project to $\mu,\nu$ under the coordinate maps: $\pi_X\theta_z$ is $T$-invariant, $\int\theta_z\,dz=\mu\times\nu$ implies $\int\pi_X\theta_z\,dz=\mu$, and by ergodicity of $\mu$ this implies $\pi_X\theta_z=\mu$ for a.e. $z$; similarly for $\pi_Y\theta_z$. Thus if the ergodic decomposition is non-trivial then the ergodic components are non-trivial joinings.


Example 7.1.4. Let $X = \mathbb Z/m\mathbb Z$ and $Y = \mathbb Z/n\mathbb Z$ with the maps $i\mapsto i+1$ (and uniform measure). If $\pi : X\to Y$ is a factor map between these systems then, since it is measure preserving, the fibers $\pi^{-1}(i)$ all have the same cardinality, hence $n\mid m$. Conversely, if $n\mid m$ there exists a factor map, given by $x\mapsto x\bmod n$.

It is now a simple algebraic fact that if $\theta$ is a joining of $X,Y$, then it is uniform on a coset of the subgroup of $(\mathbb Z/m\mathbb Z)\times(\mathbb Z/n\mathbb Z)$ generated by $(1,1)$, the dynamics being $(i,j)\mapsto(i,j)+(1,1)$. Thus the systems are disjoint if and only if $(\mathbb Z/m\mathbb Z)\times(\mathbb Z/n\mathbb Z)$ is a cyclic group of order $mn$ generated by $(1,1)$, which is the same as saying that $mn$ is the least common multiple of $m$ and $n$; this is equivalent to $\gcd(m,n) = 1$.

Since any common divisor of $m,n$ gives rise to a common factor of the systems, in this setting disjointness is equivalent to the absence of a nontrivial common factor. It also shows that if $X,Y$ are disjoint, then any factor of $X$ is disjoint from any factor of $Y$ (since if $\gcd(m,n) = 1$ then $\gcd(m',n') = 1$ for any $m'\mid m$ and $n'\mid n$).

These phenomena hold to some extent in the general ergodic setting.
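The arithmetic in this example is easy to confirm by brute force. The sketch below (the helper `orbit` is my own naming, not from the notes) computes the orbit of $(0,0)$ under the product rotation; every ergodic joining is uniform on a coset of this orbit, so the two rotations are disjoint exactly when the orbit is all of $\mathbb Z/m\mathbb Z\times\mathbb Z/n\mathbb Z$:

```python
from math import gcd

def orbit(m, n):
    """Orbit of (0,0) under (i, j) -> (i+1, j+1) on Z/m x Z/n; every
    ergodic joining of the two rotations is uniform on a coset of it."""
    pts, p = set(), (0, 0)
    while p not in pts:
        pts.add(p)
        p = ((p[0] + 1) % m, (p[1] + 1) % n)
    return pts

for m, n in [(4, 9), (4, 6), (5, 5)]:
    o = orbit(m, n)
    assert len(o) == m * n // gcd(m, n)   # the orbit has size lcm(m, n)
    disjoint = (len(o) == m * n)          # covers everything iff gcd = 1
    assert disjoint == (gcd(m, n) == 1)
```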

Proposition 7.1.5. Let $X,Y$ be invertible systems on standard Borel spaces (neither assumption is really necessary). Suppose that $\pi_X : X\to W$ and $\pi_Y : Y\to W$ are factor maps and $(W,\mathcal E,\theta,R)$ is non-trivial. Then $X\not\perp Y$.

Proof. Let $\mu = \int\mu_z\,d\theta(z)$ and $\nu = \int\nu_z\,d\theta(z)$ denote the disintegrations of $\mu$ and $\nu$ over the factor $W$. Define a measure $\tau$ on $X\times Y$ by
$$\tau = \int\mu_z\times\nu_z\,d\theta(z)$$

We claim that $\tau$ is a non-trivial joining of $X$ and $Y$. First, it is invariant:

$$(T\times S)\tau = \int (T\times S)(\mu_z\times\nu_z)\,d\theta(z) = \int T\mu_z\times S\nu_z\,d\theta(z) = \int\mu_{Rz}\times\nu_{Rz}\,d\theta(z) = \int\mu_z\times\nu_z\,d\theta(z) = \tau$$

Second, it is distinct from $\mu\times\nu$. Indeed, let $E\subseteq W$ be a non-trivial set and $E_0 = \pi_X^{-1}E\times\pi_Y^{-1}E$. Then

$$\tau(E_0) = \int\mu_z(\pi_X^{-1}E)\,\nu_z(\pi_Y^{-1}E)\,d\theta(z) = \int 1_E(z)1_E(z)\,d\theta(z) = \theta(E)$$

On the other hand

$$(\mu\times\nu)(E_0) = \mu(\pi_X^{-1}E)\cdot\nu(\pi_Y^{-1}E) = \theta(E)^2$$
Since $\theta(E)\notin\{0,1\}$ we have $\theta(E)\ne\theta(E)^2$, so $\tau\ne\mu\times\nu$. $\square$

The converse is false in the ergodic setting: there exist systems with no common factor but with non-trivial joinings.

Proposition 7.1.6. If $X_1\perp X_2$ and $X_1\to Y_1$ and $X_2\to Y_2$, then $Y_1\perp Y_2$.

Proof. Suppose $X_1\to Y_1$ and $X_2\to Y_2$, and let $\theta\in\mathcal P(Y_1\times Y_2)$ be a non-trivial joining. Let $\mu_1 = \int\mu_{1,y}\,d\nu_1(y)$ and $\mu_2 = \int\mu_{2,y}\,d\nu_2(y)$ be the disintegrations. Define a measure $\tau$ on $X_1\times X_2$ by
$$\tau = \int\mu_{1,y_1}\times\mu_{2,y_2}\,d\theta(y_1,y_2)$$
One can check, as in the previous proposition, that $\tau$ is invariant and projects to $\mu_1,\mu_2$ under the coordinate projections. It is not the product measure, because under the map $X_1\times X_2\to Y_1\times Y_2$ given by applying the original factor maps to each coordinate, the image of $\tau$ is $\theta$, whereas the image of $\mu_1\times\mu_2$ is $\nu_1\times\nu_2$. $\square$

Proposition 7.1.7. Let $(X,T)$ and $(Y,S)$ be topological systems and let $\mu,\nu$ be invariant measures on them, respectively. Suppose $\mu\perp\nu$.

1. For any generic point $x$ for $\mu$ and $y$ for $\nu$, the point $(x,y)\in X\times Y$ is generic for $\mu\times\nu$.

2. If $X,Y$ are uniquely ergodic then so is $X\times Y$.

Proof. Clearly any accumulation point of $\frac1N\sum_{n=0}^{N-1}\delta_{(T^nx,S^ny)}$ projects under the coordinate maps to $\mu,\nu$, hence is a joining. Since the only joining is $\mu\times\nu$, there is only one accumulation point, so the averages converge to $\mu\times\nu$, which is the same as saying that $(x,y)$ is generic for $\mu\times\nu$.

If $X,Y$ are uniquely ergodic then every invariant measure on $X\times Y$ is a joining (since it projects to invariant measures under the coordinate maps, and these are necessarily $\mu,\nu$). Therefore, by disjointness, the only invariant measure on $X\times Y$ is $\mu\times\nu$. $\square$

Remark 7.1.8. In the ergodic setting, unlike the arithmetic setting, there do exist examples of systems $X,Y$ that are not disjoint but have no common factor. The examples are highly nontrivial to construct, however.

7.2 Spectrum, disjointness and the Wiener-Wintner theorem

Theorem 7.2.1. If $X$ is weak mixing and $Y$ has discrete spectrum then $X\perp Y$.

In fact this is a consequence of the following more general fact.

Theorem 7.2.2. Let $X,Y$ be ergodic measure preserving systems with $Y$ of discrete spectrum. If $\Sigma(X)\cap\Sigma(Y) = \{1\}$ then they are disjoint.

Proof. We may assume $Y = G$ is a compact abelian group with Haar measure $m_G$ and $S = L_a$ a translation. Suppose that $\theta$ is a joining. Define the maps $\widetilde L_g : X\times G\to X\times G$ by

$$\widetilde L_g(x,h) = (x, L_gh) = (x, gh)$$

and let $\theta_g = \widetilde L_g\theta$. These are invariant, since $L_g$ and $L_a$ commute, hence $\widetilde L_g$ commutes with $T\times L_a$. Also, $\pi_1\theta_g = \pi_1\theta = \mu$ and $\pi_2\theta_g = L_g\pi_2\theta = L_gm_G = m_G$, so $\theta_g$ is a joining. Let
$$\theta_0 = \int\theta_g\,dm_G(g)$$
This is again a joining. Now, for any $F\in L^2(\theta_0)$,

$$\begin{aligned}
\int F(x,h)\,d\theta_0(x,h) &= \iint F(x,h)\,d\theta_g(x,h)\,dm_G(g)\\
&= \iint \widetilde L_gF(x,h)\,d\theta(x,h)\,dm_G(g)\\
&= \iint F(x,gh)\,d\theta(x,h)\,dm_G(g)\\
&= \iint F(x,gh)\,dm_G(g)\,d\theta(x,h)\\
&= \iint F(x,g)\,dm_G(g)\,d\theta(x,h)
\end{aligned}$$

which means that $\theta_0 = \mu\times m_G$ is a product measure.

Now, from Theorem ?? we know that $X\times Y$ is ergodic, hence $\theta_0$ is ergodic. But $\theta_0 = \int\theta_g\,dm_G(g)$ and the $\theta_g$ are invariant; thus, by uniqueness of the ergodic decomposition, the $\theta_g$ are $m_G$-a.s. equal to $\theta_0$. Since they are all images of each other under the $\widetilde L_g$ maps, they are all ergodic; in particular $\theta = \theta_e = \theta_0$, which is what we wanted to prove. $\square$

Corollary 7.2.3. Among systems with discrete spectrum, disjointness is equivalent to the absence of nontrivial common factors.

Theorem 7.2.4 (Wiener-Wintner theorem). Let $(X,\mathcal B,\mu,T)$ be an ergodic measure preserving system, with $X$ compact metric and $T$ continuous (this is no restriction assuming the space is standard), and let $f\in L^1$. Then for a.e. $x$ the limit $\lim_{N\to\infty}\frac1N\sum_{n=0}^{N-1}\alpha^nf(T^nx)$ exists for every $\alpha\in S^1$.

Proof. Fix $f$. If we fix a countable sequence $\alpha_i\in S^1$, then the conclusion for these values is obtained as follows. Given $\alpha$, consider the product system $X\times S^1$ with the map $T\times R_\alpha$ and the function $g(x,z) = zf(x)$. By the ergodic theorem, $\lim_N\frac1N\sum_{n=0}^{N-1}g\big((T\times R_\alpha)^n(x,w)\big)$ exists a.s.; hence for $\mu$-a.e. $x$ there is a $w\in S^1$ such that the limit exists at $(x,w)$, but this is just $\lim_N\frac1N\sum_{n=0}^{N-1}w\alpha^nf(T^nx)$, so the claim follows.

First assume $f$ is continuous. Applying the above to the (countable) collection of $\alpha$ such that $\alpha$ or some power $\alpha^n$ lies in $\Sigma(T)$, we obtain a set of full measure where the associated averages converge, so it is enough to prove that there is a set of full measure where the averages converge for the remaining values of $\alpha$. For these $\alpha$, consider the system $X\times S^1$ as above. Now $(X,T)$ and $(S^1,R_\alpha)$ are disjoint, since $\Sigma(R_\alpha) = \{\alpha^n\}$ and so, by the choice of $\alpha$, $\Sigma(T)\cap\Sigma(R_\alpha) = \{1\}$; note that no power of $\alpha$ equals $1$ (since $1\in\Sigma(T)$), so $R_\alpha$ is an irrational rotation and uniquely ergodic. Hence the only invariant measure on the product is $\mu\times m$. It follows that if $x$ is a generic point for $T$ then $(x,z)$ is generic for $\mu\times m$ (this is proved similarly to the statement about products of disjoint uniquely ergodic systems).
Therefore the ergodic averages $\frac1N\sum_{n=0}^{N-1}g\big((T\times R_\alpha)^n(x,z)\big)$ converge, and hence so does $\frac1N\sum_{n=0}^{N-1}\alpha^nf(T^nx)$.

We have proved the following: for continuous $f$ there is a set of $x$ of full measure such that $\frac1N\sum_{n=0}^{N-1}\alpha^nf(T^nx)$ converges for every $\alpha$. Now if $f\in L^1$ we can approximate $f$ by continuous functions $f'$, and note that the limsup of the difference of the averages in question is bounded by $\|f-f'\|_1$ for a.e. $x$, uniformly in $\alpha$. This gives, for fixed $f$, a set of measure $1$ where the desired limit exists for every $\alpha$. $\square$

7.3 Shannon entropy: a quick introduction

Entropy originated as a measure of randomness. It was introduced into information theory by Shannon in 1948; later the ideas turned out to be relevant in ergodic theory, and were adapted by Kolmogorov, with an important modification due to Sinai. We will quickly cover both of these topics now.

We begin with the non-dynamical setting. Suppose we are given a discrete random variable. How random is it? Clearly a variable that is uniformly distributed on 3 points is more random than one that is uniformly distributed on 2 points, and the latter is more random than a non-uniform measure on 2 points. It turns out that the way to quantify this randomness is through entropy. Given a random variable $X$ on a probability space $(\Omega,\mathcal F,P)$, taking values in a countable set $A$, its entropy $H(X)$ is defined by

$$H(X) = -\sum_{a\in A}P(X=a)\log P(X=a)$$
with the convention that $\log$ is in base 2 and $0\log 0 = 0$. This quantity may in general be infinite.

Evidently this definition is not affected by the actual values in $A$, and depends only on the probabilities of each value, $P(X=a)$. We shall write $\operatorname{dist}(X)$ for the distribution of $X$, which is the probability vector

$$\operatorname{dist}(X) = \big(P(X=a)\big)_{a\in\operatorname{range}(X)}$$
The entropy is then a function of $\operatorname{dist}(X)$; fixing the size $k$ of the range, the function is
$$H(t_1,\dots,t_k) = -\sum t_i\log t_i$$
Again we define $0\log 0 = 0$, which makes $t\log t$ continuous on $[0,1]$.

Lemma 7.3.1. $H(\cdot)$ is strictly concave.

Remark: if $p,q$ are distributions on $\Omega$ then $tp+(1-t)q$ is also a distribution on $\Omega$ for all $0\le t\le 1$.

Proof. Let $f(t) = -t\log t$. Then

(computing with the natural logarithm, which changes $H$ only by a positive constant factor)
$$f'(t) = -\log t - 1, \qquad f''(t) = -\frac1t$$
so $f$ is strictly concave on $(0,\infty)$. Now
$$H(tp+(1-t)q) = \sum_i f\big(tp_i+(1-t)q_i\big) \ge \sum_i\big(tf(p_i)+(1-t)f(q_i)\big) = tH(p)+(1-t)H(q)$$
with equality if and only if $p_i = q_i$ for all $i$. $\square$

Lemma 7.3.2. If $X$ takes on $k$ values with positive probability then $0\le H(X)\le\log k$. The left inequality is an equality if and only if $k = 1$, and the right one if and only if $\operatorname{dist}(X)$ is uniform, i.e. $P(X=x) = 1/k$ for each of the values.

Proof. The inequality $H(X)\ge 0$ is trivial. For the second, note that since $H$ is strictly concave on the convex set of probability vectors, it has a unique maximum; by symmetry this must be $(1/k,\dots,1/k)$. $\square$

Definition 7.3.3. The joint entropy of random variables $X,Y$ is $H(X,Y) = H(Z)$ where $Z = (X,Y)$.

The conditional entropy of $X$ given that another random variable $Y$ (defined on the same probability space as $X$) takes on the value $y$ is the entropy of the conditional distribution of $X$ given $Y = y$, i.e.

$$H(X\mid Y=y) = H\big(\operatorname{dist}(X\mid Y=y)\big) = -\sum_x P(X=x\mid Y=y)\log P(X=x\mid Y=y)$$
The conditional entropy of $X$ given $Y$ is the average of these over $y$,

$$H(X\mid Y) = \sum_y P(Y=y)\cdot H\big(\operatorname{dist}(X\mid Y=y)\big)$$

Example 7.3.4. If $X,Y$ are independent $\frac12,\frac12$ coin tosses then $(X,Y)$ takes 4 values, each with probability $1/4$, so $H(X,Y) = \log 4$. If $X,Y$ are correlated fair coin tosses then $(X,Y)$ is distributed non-uniformly on its 4 values and the entropy $H(X,Y) < \log 4$.

Lemma 7.3.5.

1. $H(X,Y) = H(X) + H(Y\mid X)$.

2. $H(X,Y)\ge H(X)$, with equality if and only if $Y$ is a function of $X$.

3. $H(X\mid Y)\le H(X)$, with equality if and only if $X,Y$ are independent.

4. More generally, $H(X\mid Y,Z)\le H(X\mid Y)$.

Proof. Write

$$\begin{aligned}
H(X,Y) &= -\sum_{x,y}P\big((X,Y)=(x,y)\big)\log P\big((X,Y)=(x,y)\big)\\
&= -\sum_y P(Y=y)\sum_x P(X=x\mid Y=y)\Big(\log P(Y=y) + \log P(X=x\mid Y=y)\Big)\\
&= -\sum_y P(Y=y)\log P(Y=y)\sum_x P(X=x\mid Y=y)\\
&\qquad -\sum_y P(Y=y)\sum_x P(X=x\mid Y=y)\log P(X=x\mid Y=y)\\
&= H(Y) + H(X\mid Y)
\end{aligned}$$
Since $H(Y\mid X)\ge 0$, the second inequality follows, and it is an equality if and only if $H(Y\mid X=x) = 0$ for all $x$ which $X$ attains with positive probability. But this occurs if and only if on each event $\{X=x\}$ the variable $Y$ takes one value. This means that $Y$ is determined by $X$.

The third inequality follows from concavity:

$$H(X\mid Y) = \sum_y P(Y=y)H\big(\operatorname{dist}(X\mid Y=y)\big) \le H\Big(\sum_y P(Y=y)\operatorname{dist}(X\mid Y=y)\Big) = H(X)$$
with equality if and only if the $\operatorname{dist}(X\mid Y=y)$ are all equal to each other and to $\operatorname{dist}(X)$, which is the same as independence. The last inequality follows similarly from concavity. $\square$

It is often convenient to re-formulate entropy for partitions rather than random variables. Given a countable partition $\beta = \{B_i\}$ of $\Omega$, let

$$\operatorname{dist}(\beta) = \big(P(B)\big)_{B\in\beta}$$
and define the entropy by

$$H(\beta) = H(\operatorname{dist}(\beta)) = -\sum_{B\in\beta}P(B)\log P(B)$$
This is the entropy of the random variable that assigns to $\omega\in\Omega$ the unique set of $\beta$ containing $\omega$; this set is denoted $\beta(\omega)$. Conversely, if $X$ is a random variable then it induces a partition of $\Omega$, and the entropy of $X$ and of this partition coincide.

Given partitions $\alpha,\beta$, their join is the partition

$$\alpha\vee\beta = \{A\cap B : A\in\alpha,\ B\in\beta\}$$

This is the partition induced by the pair $(X,Y)$ of random variables $X : \omega\mapsto\alpha(\omega)$ and $Y : \omega\mapsto\beta(\omega)$. If we define the conditional entropy of $\alpha$ on a set $B$ by
$$H(\alpha\mid B) = -\sum_{A\in\alpha}P(A\mid B)\log P(A\mid B)$$
and
$$H(\alpha\mid\beta) = \sum_{B\in\beta}P(B)\cdot H(\alpha\mid B)$$
then we have the identities corresponding to Lemma 7.3.5. Specifically, we say that $\alpha$ refines $\beta$, written $\alpha\succeq\beta$, if every $A\in\alpha$ is a subset of some $B\in\beta$. Then we have

Lemma 7.3.6.

1. $H(\alpha\vee\beta) = H(\alpha) + H(\beta\mid\alpha)$.

2. $H(\alpha\vee\beta)\ge H(\alpha)$, with equality if and only if $\alpha\succeq\beta$.

3. $H(\alpha\mid\beta)\le H(\alpha)$, with equality if and only if $\alpha,\beta$ are independent.

Definition 7.3.7. Given a discrete random variable $X$ and a $\sigma$-algebra $\mathcal B\subseteq\mathcal F$, the conditional entropy $H(X\mid\mathcal B)$ is
$$H(X\mid\mathcal B) = \int H\big(\operatorname{dist}(X\mid\mathcal B(\omega))\big)\,dP(\omega) = \int H_{P_\omega}(X)\,dP(\omega)$$
where $\operatorname{dist}(X\mid\mathcal B(\omega))$ is the distribution of $X$ given the atom $\mathcal B(\omega)$, and $P_\omega$ is the disintegrated measure at $\omega$ given $\mathcal B$. For a partition $\alpha$ we define $H(\alpha\mid\mathcal B)$ similarly.

Proposition 7.3.8. Suppose $\alpha$ is a finite partition, $\beta_1\preceq\beta_2\preceq\cdots$ a sequence of refining partitions, and $\mathcal B = \sigma(\beta_1,\beta_2,\dots) = \bigvee_n\beta_n$ the generated $\sigma$-algebra. Then
$$H(\alpha\mid\mathcal B) = \lim_{n\to\infty}H(\alpha\mid\beta_n)$$
Equivalently, if $X$ is a finite-valued random variable and $Y_n$ are discrete random variables, then
$$H(X\mid Y_1,Y_2,\dots) = \lim_{n\to\infty}H(X\mid Y_1,\dots,Y_n)$$

Remark 7.3.9. The same is true when $\alpha$ is a countable partition, but the proof is slightly longer and we won't need this more general version.

Proof. By the martingale convergence theorem, for each $A\in\alpha$ we have

$$P\big(A\mid\beta_n(\omega)\big) = E(1_A\mid\beta_n)(\omega) \;\xrightarrow[n\to\infty]{}\; E(1_A\mid\mathcal B)(\omega) = P\big(A\mid\mathcal B(\omega)\big)\quad\text{a.e.}$$
This just says that $\operatorname{dist}(\alpha\mid\beta_n)\to\operatorname{dist}(\alpha\mid\mathcal B)$ pointwise as probability vectors. Since $H(t_1,\dots,t_{|\alpha|})$ is continuous, this implies that $H(\operatorname{dist}(\alpha\mid\beta_n))\to H(\operatorname{dist}(\alpha\mid\mathcal B))$ pointwise; since these quantities are bounded by $\log|\alpha|$, bounded convergence gives the convergence of the integrals, which is the desired conclusion. $\square$

Remark 7.3.10. There is a beautiful axiomatic description of entropy as the only continuous functional of random variables satisfying the conditional entropy formula. Suppose that $H_m(t_1,\dots,t_m)$ are functions on the space of $m$-dimensional probability vectors, satisfying

1. $H_2(\cdot,\cdot)$ is continuous,

2. $H_2(\frac12,\frac12) = 1$,

3. $H_{k+m}(p_1,p_2,\dots,p_k,q_1,\dots,q_m) = H_2\big(\sum p_i,\sum q_i\big) + \big(\sum p_i\big)H_k(p_1',\dots,p_k') + \big(\sum q_i\big)H_m(q_1',\dots,q_m')$, where $p_i' = p_i/\sum p_j$ and $q_i' = q_i/\sum q_j$.

Then $H_m(t) = -\sum t_i\log t_i$. We leave the proof as an exercise.

7.4 Digression: applications of entropy

We describe here two applications of entropy. The first demonstrates how entropy can serve as a useful analog of cardinality, but with better analytic properties. The basic connection is that if $X$ is uniform on its range $\Sigma$, then $H(X) = \log|\Sigma|$.

Proposition 7.4.1 (Loomis-Whitney theorem). Let $A\subseteq\mathbb R^3$ be a finite set and $\pi_{xy},\pi_{xz},\pi_{yz}$ the projections to the coordinate planes. Then the image of $A$ under one of the projections has size at least $|A|^{2/3}$.

The analogous statement in 2 dimensions is trivial: if $|\pi_x(A)| < \sqrt{|A|}$ and $|\pi_y(A)| < \sqrt{|A|}$ then $A\subseteq\pi_x(A)\times\pi_y(A)$ has cardinality $< |A|$, which is impossible. This argument does not work in three dimensions.

Lemma 7.4.2 (Shearer's inequality). If $X,Y,Z$ are random variables then
$$H(X,Y,Z)\le\frac12\big(H(X,Y) + H(X,Z) + H(Y,Z)\big)$$

Proof. Write

$$\begin{aligned}
H(X,Y,Z) &= H(X) + H(Y\mid X) + H(Z\mid X,Y)\\
H(X,Y) &= H(X) + H(Y\mid X)\\
H(Y,Z) &= H(Y) + H(Z\mid Y)\\
H(X,Z) &= H(X) + H(Z\mid X)
\end{aligned}$$

Using $H(Y)\ge H(Y\mid X)$, $H(Z\mid X)\ge H(Z\mid X,Y)$ and $H(Z\mid Y)\ge H(Z\mid X,Y)$, the sum of the last three lines is at least twice the first line. This proves the claim. $\square$

Proof of Loomis-Whitney. Let $P = (X,Y,Z)$ be a random variable uniformly distributed on $A$. Thus

$$H(P) = H(X,Y,Z) = \log|A|$$

On the other hand, by Shearer's inequality,
$$H(P)\le\frac12\big(H(X,Y) + H(X,Z) + H(Y,Z)\big) = \frac12\big(H(\pi_{xy}(P)) + H(\pi_{xz}(P)) + H(\pi_{yz}(P))\big)$$
so at least one of $H(\pi_{xy}(P)), H(\pi_{xz}(P)), H(\pi_{yz}(P))$ is $\ge\frac23\log|A|$. Since $\pi_{xy}(P)$ is supported on $\pi_{xy}(A)$ we also have $H(\pi_{xy}(P))\le\log|\pi_{xy}(A)|$, etc. In conclusion, for one of the projections $\log|\pi(A)|\ge\frac23\log|A|$, i.e. $|\pi(A)|\ge|A|^{2/3}$, which is what we claimed. $\square$

We now turn to the original use of Shannon entropy in information theory, where $H(X)$ should be thought of as a quantitative measure of the amount of randomness in the variable $X$. Suppose we want to record the value of $X$, taking values in a set $\Sigma$, using a string of 0s and 1s. Such an association $c : \Sigma\to\{0,1\}^*$ is called a code. We shall require that the code be 1-1, and for simplicity we require it to be a prefix code, which means that if $a,b\in\Sigma$ then neither of $c(a),c(b)$ is a prefix of the other. Let $|c(a)|$ denote the length of the codeword $c(a)$.

Lemma 7.4.3. If $c : \Sigma\to\{0,1\}^*$ is a prefix code then $\sum_{a\in\Sigma}2^{-|c(a)|}\le 1$. Conversely, if $\ell : \Sigma\to\mathbb N$ is given and $\sum_{a\in\Sigma}2^{-\ell(a)}\le 1$, then there is a prefix code $c : \Sigma\to\{0,1\}^*$ with $|c(a)| = \ell(a)$.

Theorem 7.4.4. Let $\{\ell_i\}_{i\in\Sigma}$ be integers. Then the following are equivalent:

1. $\sum 2^{-\ell_i}\le 1$.

2. There is a prefix code with lengths $\ell_i$.

Proof. We can assume $\Sigma = \{1,\dots,n\}$. Let $L = \max\ell_i$ and order $\ell_1\le\ell_2\le\dots\le\ell_n = L$. Identify $\bigcup_{i\le L}\{0,1\}^i$ with the full binary tree of height $L$, so each vertex has two children, one connected to the vertex by an edge marked 0 and the other by an edge marked 1. Each vertex is identified with the labels from the root to the vertex; the root corresponds to the empty word, and the leaves (at distance $L$ from the root) correspond to words of length $L$.

(2) $\Rightarrow$ (1): The prefix condition means that $c(i),c(j)$ are not prefixes of each other if $i\ne j$; consequently no leaf of the tree has both as an ancestor. Writing $A_i$ for the set of leaves descended from $c(i)$, we have $|A_i| = 2^{L-\ell_i}$ and the sets are disjoint, therefore
$$\sum 2^{L-\ell_i} = \sum|A_i| = \Big|\bigcup A_i\Big| \le |\{0,1\}^L| = 2^L$$
Dividing by $2^L$ gives (1).

(1) $\Rightarrow$ (2): Conversely, a greedy procedure allows us to construct a prefix code with lengths $\ell_i$ as above. The point is that if we have defined a prefix code on $\{1,\dots,k-1\}$, then the set of leaves below the codewords chosen so far has size
$$\sum_{i<k}|A_i| = \sum_{i<k}2^{L-\ell_i} < 2^L$$
so some word of length $\ell_k$ is the ancestor of a leaf not in $\bigcup_{i<k}A_i$; such a word is comparable with no earlier codeword, and can be taken as $c(k)$. $\square$

Theorem 7.4.5. If $c$ is a prefix code and $X$ a random variable with values in $\Sigma$, then $E(|c(X)|)\ge H(X)$.

Proof. Write $p_i = P(X=i)$, $\ell_i = |c(i)|$ and $r_i = 2^{-\ell_i}$, so that $\sum r_i\le 1$ by the previous theorem. By concavity of the logarithm,
$$H(X) - E(|c(X)|) = \sum_i p_i\log\frac{r_i}{p_i} \le \log\Big(\sum_i r_i\Big) \le \log 1 = 0$$
Equality occurs only if $r_i/p_i = 1$ for all $i$. $\square$

Theorem 7.4.6 (Achieving optimal coding length (almost)). There is a prefix code whose average coding length is $\le H(X)+1$.

Proof. Set $\ell_i = \lceil-\log p_i\rceil$. Then
$$\sum 2^{-\ell_i}\le\sum 2^{\log p_i} = \sum p_i = 1$$
and since $\ell_i\le-\log p_i + 1$, the expected coding length is
$$\sum p_i\ell_i\le H(X)+1$$
Thus, up to one extra bit, we achieve the optimal coding length. $\square$
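The coding-length bounds are easy to check numerically. The following sketch (helper names are mine; it assumes the base-2 conventions of this section) computes the lengths $\ell_i = \lceil-\log p_i\rceil$ from the proof and verifies the Kraft inequality and the one-extra-bit bound:

```python
from math import ceil, log2

def entropy(p):
    """H(X) = -sum p_i log2 p_i (Shannon entropy, base 2)."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def shannon_lengths(p):
    """Code lengths l_i = ceil(-log2 p_i), as in the proof of Theorem 7.4.6."""
    return [ceil(-log2(pi)) for pi in p]

p = [0.5, 0.25, 0.125, 0.125]
ls = shannon_lengths(p)
assert sum(2.0 ** -l for l in ls) <= 1.0    # Kraft: a prefix code exists
avg = sum(pi * l for pi, l in zip(p, ls))
# within one bit of optimal (here p is dyadic, so the code is exactly optimal)
assert entropy(p) - 1e-9 <= avg <= entropy(p) + 1
```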

Corollary 7.4.7. If $(X_n)$ is a stationary finite-valued process with entropy $h(X) = \lim\frac1nH(X_1\dots X_n)$, then for every $\varepsilon>0$, if $n$ is large enough, we can code $X_1\dots X_n$ using $\le h+\varepsilon$ bits per symbol (average coding length $\le(h+\varepsilon)n$).

Proof. We can code $X_1,\dots,X_n$ using $H(X_1,\dots,X_n)+1$ bits on average. Dividing by $n$, and assuming $n > 2/\varepsilon$ and also that $\frac1nH(X_1\dots X_n) < h+\varepsilon/2$, the average number of bits per symbol is at most $h+\varepsilon$. $\square$

7.5 Entropy of a stationary process

Let $X = (X_n)_{n=-\infty}^\infty$ be a stationary sequence of random variables with values in a finite set $\Sigma$; as we know, such a sequence may be identified with a shift-invariant measure $\mu$ on $\Sigma^{\mathbb Z}$, with $X_n(\omega) = \omega_n$ for $\omega = (\omega_n)\in\Sigma^{\mathbb Z}$, so that $X_n(\omega) = X_0(S^n\omega)$, where $S$ is the shift. The partition induced by $X_0$ on $\Sigma^{\mathbb Z}$ is the partition according to the 0-th coordinate, which we denote by $\alpha$. Then $X_n$ induces the partition $S^{-n}\alpha = \{S^{-n}A : A\in\alpha\}$. Since $(X_n(\omega))_{n=-\infty}^\infty$ determines $\omega$, the partitions $S^{-n}\alpha$, $n\in\mathbb Z$, separate points, so $\mathcal B = \bigvee_{n=-\infty}^\infty S^{-n}\alpha$.

Definition 7.5.1. The entropy $h(X)$ of the process $X = (X_n)$ is
$$h(X) = \lim_{n\to\infty}\frac1nH(X_1,\dots,X_n)$$

Let us first show that the limit exists. Set $H_n(X) = H(X_1,\dots,X_n)$. For each $n,m$,

$$\begin{aligned}
H_{m+n}(X) &= H(X_1,\dots,X_m,X_{m+1},\dots,X_{m+n})\\
&= H(X_1,\dots,X_m) + H(X_{m+1},\dots,X_{m+n}\mid X_1,\dots,X_m)\\
&\le H(X_1,\dots,X_m) + H(X_{m+1},\dots,X_{m+n})\\
&= H(X_1,\dots,X_m) + H(X_1,\dots,X_n)\\
&= H_m(X) + H_n(X)
\end{aligned}$$

Here the inequality is because conditioning cannot increase entropy, and then we used stationarity, which implies $\operatorname{dist}(X_{m+1},\dots,X_{m+n}) = \operatorname{dist}(X_1,\dots,X_n)$. We have shown that the sequence $H_n(X)$ is subadditive, so the limit $h(X) = \lim\frac1nH_n(X)$ exists (and equals $\inf_n\frac1nH_n(X)$).

Example 7.5.2. If the $X_n$ are i.i.d. then $H(X_1,\dots,X_n) = \sum H(X_i) = nH(X_1)$, and so $h(X) = H(X_1)$.

Example 7.5.3. Let $\mu$ be the $S$-invariant measure on the orbit of a periodic point $\omega = S^N\omega$. Then $(X_n)$ can be obtained by choosing a random shift of the sequence $\omega$. Since $(X_1,\dots,X_n)$ takes on at most $N$ different values, $H(X_1,\dots,X_n)\le\log N$ and so $\frac1nH(X_1\dots X_n)\to 0 = h(X)$.
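The two examples can be checked by computing block entropies exactly. In the sketch below (function names are mine), the i.i.d. block entropy is exactly $nH(X_1)$, while for a periodic orbit it stays bounded:

```python
from itertools import product
from math import log2

def H(dist):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(p * log2(p) for p in dist if p > 0)

def block_dist_iid(p, n):
    """Exact distribution of a block (X_1,...,X_n) of an i.i.d. process
    with marginal p: product probabilities over all words of length n."""
    dist = []
    for w in product(range(len(p)), repeat=n):
        q = 1.0
        for i in w:
            q *= p[i]
        dist.append(q)
    return dist

p = (0.7, 0.3)
for n in (1, 2, 4):
    # H(X_1,...,X_n) = n H(X_1) for an i.i.d. process (Example 7.5.2)
    assert abs(H(block_dist_iid(p, n)) - n * H(p)) < 1e-9

# for the orbit of a periodic point of period N (Example 7.5.3), an n-block
# with n >= N takes at most N values, each with probability 1/N, so the
# block entropy is at most log N and stays bounded; hence h(X) = 0
assert abs(H([1/3] * 3) - log2(3)) < 1e-12
```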

Example 7.5.4. Let $\theta\in\mathbb R\setminus\mathbb Q$ and let $R_\theta : \mathbb R/\mathbb Z\to\mathbb R/\mathbb Z$ be the translation, for which Lebesgue measure $\mu$ is the unique invariant measure. Let $X_1(x) = 1_{[0,1/2]}(x)$ and $X_n = X_1(R_\theta^nx)$. We claim that the process $X = (X_n)$ has entropy 0. Indeed, the partition determined by $X_n$ consists of two intervals of length $1/2$ in $\mathbb R/\mathbb Z$, and so $(X_1,\dots,X_n)$ determines a partition that is the join of $n$ such interval partitions. This is the partition into intervals determined by $2n$ endpoints, and so it consists of at most $2n$ intervals. Hence $H(X_1,\dots,X_n)\le\log 2n$, and so $\frac1nH(X_1\dots X_n)\to 0 = h(X)$.
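The word-counting bound in this example is easy to observe empirically. The sketch below (a sampling experiment along a single orbit; the helper `words` is my own) never finds more than $2n$ distinct words of length $n$, which is exactly what the bound $H(X_1,\dots,X_n)\le\log 2n$ rests on:

```python
def words(theta, n, samples=20000):
    """Distinct coding words (X_1,...,X_n), with X_i the indicator of
    [0,1/2) along the orbit of the rotation by theta; by unique ergodicity
    one long orbit eventually visits every word of positive measure."""
    x = 0.0
    orbit = []
    for _ in range(samples + n):
        x = (x + theta) % 1.0
        orbit.append(0 if x < 0.5 else 1)
    return {tuple(orbit[i:i + n]) for i in range(samples)}

theta = 2 ** 0.5 - 1   # an irrational rotation number
for n in (5, 10, 25):
    # the join of n two-interval partitions has at most 2n atoms
    assert len(words(theta, n)) <= 2 * n
```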

Theorem 7.5.5. $h(X) = H(X_0\mid X_{-1},X_{-2},\dots)$.

Proof. It is convenient to write $X_m^n = (X_m,\dots,X_n)$. Now,

$$H(X_1^n) = H(X_1\dots X_{n-1}) + H(X_n\mid X_1,\dots,X_{n-1}) = H(X_1\dots X_{n-1}) + H(X_0\mid X_{-(n-1)}^{-1})$$
by stationarity. By induction,

$$H(X_1^n) = \sum_{i=0}^{n-1}H(X_0\mid X_{-i}^{-1})$$
so
$$h(X) = \lim_{n\to\infty}\frac1nH(X_1,\dots,X_n) = \lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}H(X_0\mid X_{-i}^{-1})$$
Since $H(X_0\mid X_{-\infty}^{-1}) = \lim_{n\to\infty}H(X_0\mid X_{-n}^{-1})$, the summands in the averages above converge, so the averages converge to the same limit, which is what we claimed. $\square$

Definition 7.5.6. A process $(X_n)_{n=-\infty}^\infty$ is deterministic if $X_{-\infty}^{-1}$ a.s. determines $X_0$ (and then, by induction, also $X_1,X_2,\dots$).

Corollary 7.5.7. $(X_n)$ is deterministic if and only if $h(X) = 0$.

Proof. $H(X_0\mid X_{-\infty}^{-1}) = 0$ if and only if a.e. the conditional distribution of $X_0$ given $X_{-\infty}^{-1}$ is a point mass, which is the same as saying that $X_0$ is measurable with respect to $\sigma(X_{-\infty}^{-1})$, which is the same as determinism. $\square$

Remark 7.5.8. Note that the entropy of the time-reversal $(X_{-n})_{n=-\infty}^\infty$ is the same as that of $(X_n)$, because the entropy of $n$ consecutive variables is the same in both cases. It follows that a process and its time-reversal are either both deterministic (if the entropy is 0) or neither is. This conclusion is non-obvious!
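For a stationary Markov chain the entropy rate is $H(X_0\mid X_{-1})$ (a special case of Theorem 7.5.5), which gives a closed formula, so the equality of entropy between a process and its time-reversal can be checked numerically. The chain below is an arbitrary choice of mine, made doubly stochastic so that the stationary vector is uniform and the reversal is simply the transpose:

```python
from math import log2

# a stationary Markov chain on {0,1,2}; P is doubly stochastic, so the
# stationary vector is uniform, and the chain is NOT reversible
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
pi = [1/3, 1/3, 1/3]

def rate(P, pi):
    """Entropy rate h = H(X_0 | X_{-1}) = -sum_i pi_i sum_j P_ij log2 P_ij."""
    n = len(pi)
    return -sum(pi[i] * P[i][j] * log2(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0)

# the time-reversed chain: P*_ij = pi_j P_ji / pi_i (here the transpose)
Pr = [[pi[j] * P[j][i] / pi[i] for j in range(len(pi))] for i in range(len(pi))]
assert Pr != P                                   # the reversal is a different chain
assert abs(rate(P, pi) - rate(Pr, pi)) < 1e-12   # but it has the same entropy rate
```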

Definition 7.5.9. For a stationary process $(X_n)_{n=-\infty}^\infty$, the tail $\sigma$-algebra is the $\sigma$-algebra $\mathcal T = \bigcap_{n=1}^\infty\sigma(X_{-\infty}^{-n})$. The process has trivial tail, or is a Kolmogorov process, if $\mathcal T$ is the trivial $\sigma$-algebra (modulo nullsets).

Intuitively, a process with trivial tail is a process in which the remote past has no effect on the present (and future).

Example 7.5.10. If $(X_n)$ are i.i.d. then $\sigma(X_{-\infty}^{-n})$ is independent of $\sigma(X_{-n+1}^{\infty})$. So $\mathcal T$ is independent of all the variables $X_n$ and hence, since it is measurable with respect to them, it is independent of itself; so it is trivial.

Proposition 7.5.11. If $X = (X_n)$ has trivial tail and $H(X_0) > 0$, then $h(X) > 0$.

Proof. Since $H(X_0\mid X_{-\infty}^{-n})\to H(X_0\mid\mathcal T) = H(X_0) > 0$, there is some $n_0$ such that $H(X_0\mid X_{-\infty}^{-n_0}) > 0$. Now,
$$H(X_0,\dots,X_n)\ge H\big(X_0,X_{n_0},X_{2n_0},\dots,X_{[n/n_0]n_0}\big) = \sum_{i=0}^{[n/n_0]}H\big(X_{in_0}\mid X_0,X_{n_0},\dots,X_{(i-1)n_0}\big) \ge \Big(\frac{n}{n_0}-1\Big)H\big(X_0\mid X_{-\infty}^{-n_0}\big)$$
so
$$h(X) = \lim_{n\to\infty}\frac1nH(X_0,\dots,X_n)\ge\frac1{n_0}H\big(X_0\mid X_{-\infty}^{-n_0}\big) > 0$$
as claimed. $\square$

Corollary 7.5.12. The only process that is both deterministic and has trivial tail is the trivial process, $X_n = a$ a.s. for all $n$.

7.6 Couplings, joinings and disjointness of processes

Let $(X_n),(Y_n)$ be stationary processes taking values in $A,B$ respectively. A coupling is a stationary process $(Z_n)$ with $Z_n = (X_n',Y_n')$ such that the marginal processes $(X_n'),(Y_n')$ have the same distributions as $(X_n),(Y_n)$, respectively. This is the probabilistic analog of a joining. Indeed, identify $(X_n),(Y_n)$ with invariant measures $\mu,\nu$ on $A^{\mathbb Z},B^{\mathbb Z}$, respectively. The coupling $(Z_n)$ can be identified with an invariant measure $\theta$ on $(A\times B)^{\mathbb Z}$, and the assumption on the distributions of $(X_n'),(Y_n')$ is the same as saying that the projection $(A\times B)^{\mathbb Z}\to A^{\mathbb Z}$ maps $\theta\mapsto\mu$ and the other projection maps $\theta\mapsto\nu$; thus $\theta$ is a joining of the systems $(A^{\mathbb Z},\mu,S)$ and $(B^{\mathbb Z},\nu,S)$. On the other hand, any joining gives an associated coupling, $Z_n = \pi_n(z)$, where $\pi_n : (A\times B)^{\mathbb Z}\to A\times B$ is the projection to the $n$-th coordinate.

Definition 7.6.1. We say that two processes are disjoint if the only coupling is the independent one; equivalently, if the associated shift-invariant measures are disjoint.

Proposition 7.6.2. If $(X_n),(Y_n)$ have positive entropy, they are not disjoint.

Proof. We prove this for $\{0,1\}$-valued processes; the general proof is the same. If $U,V$ are random variables taking each of the values $0,1$ with positive probability, then there is a non-trivial coupling of them, constructed as follows: let $p = P(U=1)$ and $q = P(V=1)$, let $Z\sim U[0,1]$, and set $U' = 1_{\{Z<p\}}$ and $V' = 1_{\{Z<q\}}$. Then $U'\sim U$ and $V'\sim V$, but $U',V'$ are not independent. Applying this coupling step by step to the conditional distributions of $X_n$ and $Y_n$ given their pasts, one obtains a stationary process $W' = (W_n')$, $W_n' = (X_n',Y_n')$, whose marginal processes have the distributions of $(X_n)$ and $(Y_n)$. We must show that $W'$ is not the independent coupling; since the independent coupling has entropy $h(X)+h(Y)$, and
$$h(W') = H\big(W_0'\mid (W')_{-\infty}^{-1}\big) = \lim_{T\to\infty}H\big(W_0'\mid (W')_{-T}^{-1}\big)$$
it suffices to prove that
$$\lim_{T\to\infty}H\big(W_0'\mid (W')_{-T}^{-1}\big) < h(X) + h(Y)$$
By stationarity, $H(W_0'\mid (W')_{-T}^{-1}) = H(W_1'\mid (W')_{-T+1}^{0})$, and as $T\to\infty$ this decreases to $H(W_1'\mid (W')_{-\infty}^0)$. Therefore it is enough to show that
$$H\big(W_1'\mid (W')_{-\infty}^0\big) < h(X) + h(Y)$$
Disintegrating over the past,
$$H\big(W_1'\mid (W')_{-\infty}^0\big) = \int H\big(W_1'\mid (W')_{-\infty}^0 = w_{-\infty}^0\big)\,dP(w_{-\infty}^0)$$
By the construction, conditioned on the past the pair $(X_1',Y_1')$ is the coupling above of the conditional distributions of $X_1$ given $x_{-\infty}^0$ and of $Y_1$ given $y_{-\infty}^0$. Since both processes have positive entropy, on a set of positive measure both conditional distributions are non-degenerate, and there the conditional entropy of the pair is strictly smaller than the corresponding sum; hence
$$H\big(W_1'\mid (W')_{-\infty}^0\big) < \int\Big(H\big(X_1\mid X_{-\infty}^0 = x_{-\infty}^0\big) + H\big(Y_1\mid Y_{-\infty}^0 = y_{-\infty}^0\big)\Big)\,d\mu\times\nu = h(X) + h(Y)$$
as desired. $\square$
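The elementary coupling used at the start of the proof is easy to simulate. The Monte Carlo sketch below (names of my choosing) confirms that the marginals are the given Bernoulli distributions while the pair is far from independent:

```python
import random

def monotone_pair(p, q, rng):
    """The coupling from the proof: one uniform Z drives both coordinates,
    U' = 1{Z < p}, V' = 1{Z < q}; marginals are Bernoulli(p), Bernoulli(q)."""
    z = rng.random()
    return (1 if z < p else 0, 1 if z < q else 0)

rng = random.Random(0)
p = q = 0.5
n = 100_000
samples = [monotone_pair(p, q, rng) for _ in range(n)]
pu = sum(u for u, _ in samples) / n
puv = sum(u * v for u, v in samples) / n
assert abs(pu - p) < 0.01   # the marginal is (approximately) correct
assert puv > 0.4            # but P(U'=V'=1) ~ min(p,q) = 0.5, not p*q = 0.25
```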

Theorem 7.6.3. Let $(X_n)$ be a process with trivial tail and $(Y_n)$ a deterministic process, taking values in finite sets $A,B$ respectively. Then $(X_n)\perp(Y_n)$.

Lemma 7.6.4. Let $Z = \big((X_n,Y_n)\big)_{n=-\infty}^\infty$ be a stationary process with values in $A\times B$. Then
$$h(Z) = h(Y) + H\big(X_0\mid X_{-\infty}^{-1}, Y_{-\infty}^{\infty}\big)$$

Proof. Expand $H(Z_1,\dots,Z_n) = H(X_1,\dots,X_n,Y_1,\dots,Y_n)$ as

$$\begin{aligned}
H(Y_1,\dots,Y_n,X_1,\dots,X_n) &= H(Y_n) + H(Y_1,\dots,Y_{n-1},X_1,\dots,X_n\mid Y_n)\\
&= H(Y_n) + H(Y_{n-1}\mid Y_n) + H(Y_1,\dots,Y_{n-2},X_1,\dots,X_n\mid Y_{n-1},Y_n)\\
&\;\;\vdots\\
&= \sum_{i=0}^{n-1}H\big(Y_{n-i}\mid Y_{n-i+1}^n\big) + H\big(X_1,\dots,X_n\mid Y_1^n\big)\\
&= \sum_{i=0}^{n-1}H\big(Y_{n-i}\mid Y_{n-i+1}^n\big) + \sum_{i=0}^{n-1}H\big(X_{n-i}\mid X_1^{n-i-1},Y_1^n\big)
\end{aligned}$$
Dividing by $n$, the first term converges to $h(Y)$. The second term, after shifting indices, is an average of terms of the form $H(X_1\mid X_{-k}^0, Y_{-k}^m)$, where $k\to\infty$ and $m\to\infty$; these tend uniformly to $H(X_0\mid X_{-\infty}^{-1},Y_{-\infty}^{\infty})$ (up to a shift of index), so the average does too. $\square$

Lemma 7.6.5. Let $Y_n^M = (Y_{n-M},\dots,Y_{n+M})$. Then $h(Y^M) = h(Y)$.

Lemma 7.6.6. If $(X_n)$ has trivial tail, then $H(X_0\mid X_{-M},X_{-2M},\dots)\to H(X_0)$ as $M\to\infty$.

Proof of the theorem. Suppose $Z_n = (X_n,Y_n)$ is a coupling. Let us first show that $X_0$ is independent of $(Y_n)_{n=-\infty}^\infty$. If not, then there is some $M$ and $\varepsilon>0$ such that $H(X_0\mid Y_{-M}^M) < H(X_0) - \varepsilon$. Then

$$h(Z^M)\ge h\big((X_{Mn})_{n=1}^\infty\big)\ge H(X_0) - \varepsilon$$
assuming $M$ is large enough, by Lemma 7.6.6 (this is where the trivial tail of $X$ is used). But applying the previous lemma,

$$h(Z^M) = h\big((Y_n^M)\big) + H\big(X_0\mid (X_{-k})_{k=1}^\infty,(Y_n^M)_{n=-\infty}^\infty\big)\le 0 + H(X_0\mid Y_0^M) = H(X_0\mid Y_{-M}^M) < H(X_0) - \varepsilon$$
a contradiction (the first summand vanishes because $Y$, and hence $Y^M$, is deterministic; see Lemma 7.6.5). Thus $X_0$ is independent of $(Y_n)_{n=-\infty}^\infty$, and the same argument applied to blocks of the $X$-process shows that the coupling is the independent one. $\square$

7.7 Kolmogorov-Sinai entropy

Let $(X,\mathcal B,\mu,T)$ be an invertible measure-preserving system. For a partition $\alpha = \{A_i\}$ of $X$ we define
$$h_\mu(T,\alpha) = \lim_{n\to\infty}\frac1nH_\mu\Big(\bigvee_{i=1}^nT^{-i}\alpha\Big)$$
This is the same as the entropy of the process $(X_n^\alpha)_{n=-\infty}^\infty$, where
$$X_n^\alpha(x) = i \iff T^nx\in A_i$$

Definition 7.7.1. The Kolmogorov-Sinai entropy $h_\mu(T)$ of the system is

$$h_\mu(T) = \sup_\alpha h_\mu(T,\alpha)$$

This number is non-negative and may be infinite.

Proposition 7.7.2. $h_\mu(T)$ is an isomorphism invariant of the system (that is, isomorphic systems have equal entropies).

Proof. Let $\pi : X\to Y$ be an isomorphism with a system $(Y,\mathcal C,\nu,S)$. Since $H_\mu(\bigvee_{i=1}^nT^{-i}\alpha)$ depends only on the masses of the atoms of $\bigvee_{i=1}^nT^{-i}\alpha$, and the partition $\beta = \pi\alpha$ of $Y$ has the property that $\pi\big(\bigvee_{i=1}^nT^{-i}\alpha\big) = \bigvee_{i=1}^nS^{-i}\beta$, the two joins have equal entropy. It follows that $h_\mu(T,\alpha) = h_\nu(S,\pi\alpha)$. This shows that the supremum of the entropies of partitions of $Y$ is at least as large as the supremum of the entropies of partitions of $X$. By symmetry, we have equality. $\square$

Definition 7.7.3. A partition $\alpha$ of $X$ is generating if $\bigvee_{i=-\infty}^\infty T^{-i}\alpha = \mathcal B \bmod \mu$.

Example 7.7.4. The partition of $A^{\mathbb Z}$ according to the 0-th coordinate is generating.

In order for entropy to be useful, we need to know how to compute it. This is made possible by the following theorem.

Theorem 7.7.5. If $\alpha$ is a generating partition, then $h_\mu(T) = h_\mu(T,\alpha)$.

Proof. Let $\alpha$ be a generating partition and $\beta$ another partition. We only need to show that $h_\mu(T,\alpha)\ge h_\mu(T,\beta)$. Consider the process $((X_n,Y_n))$, where $(X_n)$ is the process determined by $\alpha$ and $(Y_n)$ the one determined by $\beta$ (so $((X_n,Y_n))$ is determined by $\alpha\vee\beta$). By definition, $h_\mu(T,\alpha) = h((X_n))$ and $h_\mu(T,\beta) = h((Y_n))$. Now, since $H(X_1^n,Y_1^n)\ge H(Y_1^n)$, we have $h((X_n,Y_n))\ge h((Y_n))$. On the other hand, by a previous lemma,

$$h\big((X_n,Y_n)\big) = h\big((X_n)\big) + H\big(Y_0\mid Y_{-\infty}^{-1},X_{-\infty}^{\infty}\big)$$

Since $\sigma(X_{-\infty}^{\infty})$ is the full $\sigma$-algebra up to nullsets, the conditional entropy on the right is $0$. Thus $h_\mu(T,\alpha) = h((X_n))\ge h((Y_n)) = h_\mu(T,\beta)$, as desired. $\square$

Example 7.7.6. Let $p = (p_1,\dots,p_n)$ be a probability vector, $\mu_p = p^{\mathbb Z}\in\mathcal P(\{1,\dots,n\}^{\mathbb Z})$, and let $S$ be the shift. The partition $\alpha$ according to the 0-coordinate is generating, so $h_{\mu_p}(S) = h((X_n))$, where $(X_n)$ is the process associated to $\alpha$. This is an i.i.d. process with marginal $p$, so its entropy is $H(p)$. Thus $\mu_p\cong\mu_q$ implies $H(p) = H(q)$. In particular, this shows that $(1/2,1/2)^{\mathbb Z}\not\cong(1/3,1/3,1/3)^{\mathbb Z}$.
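The final computation is a one-liner. The sketch below (with a helper `H` of my own naming) evaluates the two entropies and confirms they differ:

```python
from math import log2

def H(p):
    """Shannon entropy (base 2) of a probability vector."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

h2 = H([1/2, 1/2])        # entropy of the (1/2, 1/2) Bernoulli shift
h3 = H([1/3, 1/3, 1/3])   # entropy of the (1/3, 1/3, 1/3) Bernoulli shift
assert abs(h2 - 1.0) < 1e-12
assert abs(h3 - log2(3)) < 1e-12
assert h2 < h3            # different entropies, hence non-isomorphic shifts
```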

7.8 Application to filtering

See Part I, Section 9 of H. Furstenberg, Disjointness in ergodic theory, minimal sets, and a problem in Diophantine approximation, 1967, available online at http://www.kent.edu/math/events/conferences/cbms2011/upload/furst-disjointness.pdf

Chapter 8

Rohlin’s lemma

8.1 Rohlin lemma

Theorem 8.1.1 (Rohlin's lemma). Let $(X,\mathcal B,\mu,T)$ be an invertible measure preserving system, and suppose that for every $\delta>0$ there is a set $A$ with $\mu(A)<\delta$ and $\mu(X\setminus\bigcup_{n=0}^\infty T^nA) = 0$. Then for every $\varepsilon>0$ and integer $N\ge 1$ there is a set $B$ such that $B,TB,\dots,T^{N-1}B$ are pairwise disjoint, and their union has mass $>1-\varepsilon$.

Remark 8.1.2. We will discuss the hypothesis soon, but note for now that if $\mu$ is non-atomic and $T$ is ergodic then it is automatically satisfied: for any set $A$ of positive measure, $C = \bigcup_{n=0}^\infty T^nA$ satisfies $TC\subseteq C$; since $\mu(TC) = \mu(C)$, the set $C$ is invariant mod $\mu$, and hence by ergodicity $\mu(C) = 1$.

Thus the theorem has the following heuristic implication: any two measure preserving maps behave in an identical manner on an arbitrarily large fraction of the space, on which each acts simply as a "shift"; the differences are "hidden" in the exceptional set of mass $\varepsilon$.

Proof. Let $\varepsilon,N$ be given and choose $A$ as in the hypothesis for $\delta = \varepsilon/N$. Let $r_A(x) = \min\{n>0 : T^nx\in A\}$ be the first return time, and partition $A$ into the sets $A_n = \{r_A = n\}$, that is,
$$A_n = \{x\in A : T^ix\notin A \text{ for } 0<i<n,\ T^nx\in A\}$$


For each $n$ set
$$A_n' = \bigcup_{k=0}^{[n/N]-1}T^{kN}A_n$$
(every $N$-th level of the column over $A_n$), and also set
$$E_n = \bigcup_{i=N[n/N]}^{n-1}T^iA_n$$
Notice that
$$\bigcup_{i=0}^{n-1}T^iA_n = E_n\cup\bigcup_{i=0}^{N-1}T^iA_n'$$
and the union is disjoint. Since the above holds for all $n$, the set

$$B = \bigcup_nA_n'$$
has the property that $B,TB,\dots,T^{N-1}B$ are pairwise disjoint, and $\mu\big(X\setminus\bigcup_{i=0}^{N-1}T^iB\big) = \mu(E)$, where $E = \bigcup E_n$. Finally, to estimate $\mu(E)$, we have
$$\mu(E) = \sum_n\mu(E_n)\le\sum_nN\mu(A_n) = N\mu(A) < \varepsilon$$
$\square$

Definition 8.1.3. For a measure-preserving system $(X,\mathcal B,\mu,T)$ let

$$\operatorname{Per}(T) = \bigcup_{n=1}^\infty\{x\in X : x = T^nx\}$$
The system is aperiodic if $\mu(\operatorname{Per}(T)) = 0$.

Proposition 8.1.4. If $(X,\mathcal B,\mu,T)$ is aperiodic and $(X,\mathcal B)$ is standard, then the hypothesis of Theorem 8.1.1 is satisfied.

Proof. First, we may assume that $\operatorname{Per}(T) = \emptyset$, by replacing $X$ with $X\setminus\operatorname{Per}(T)$. Let $\varepsilon>0$ and consider the class $\mathcal A$ of measurable sets $A$ with the property that $\mu(A)\le\varepsilon\,\mu\big(\bigcup_{n=0}^\infty T^nA\big)$. Note that $\mathcal A$ is non-empty, since it contains $\emptyset$, and it is closed under monotone increasing unions of its members.

Consider the partial order on $\mathcal A$ given by $A\prec A'$ if $A\subseteq A'$ and $\mu(A)<\mu(A')$. Any maximal chain is countable, since there are no uncountable bounded increasing sequences of reals; therefore a maximal chain has a maximal element, namely the union of its members. We shall show that every such maximal element $A$ must satisfy $\mu\big(\bigcup_{n=0}^\infty T^nA\big) = 1$.

To see this, suppose that $A\in\mathcal A$ and $\mu\big(\bigcup_{n=0}^\infty T^nA\big) < 1$. Set $X_0 = X\setminus\bigcup_{n=0}^\infty T^nA$, which is an invariant set (up to a nullset, which we also remove as necessary). All we must show is that there exists $E\subseteq X_0$ with $\mu(E)>0$ and $\mu(E)\le\varepsilon\,\mu\big(\bigcup_{n=0}^\infty T^nE\big)$; then $A\cup E\succ A$. To see this, let $d$ be a compatible metric on $X$ with $\mathcal B$ the Borel algebra, and let $N = [1/\varepsilon]$. By Lusin's theorem we can find $X''\subseteq X_0$ with $\mu(X'')>0$ such that all the maps $T,\dots,T^N$ are continuous on $X''$. For $x\in X''$, since the points $x,Tx,\dots,T^Nx$ are all distinct, there is a

relative ball $B = B_{r(x)}(x)\cap X''$ such that $B,TB,\dots,T^NB$ are pairwise disjoint, which implies $\mu(B)\le\frac1{N+1}\mu\big(\bigcup_{n=0}^\infty T^nB\big)\le\varepsilon\,\mu\big(\bigcup_{n=0}^\infty T^nB\big)$. Moreover, the same holds for any $B'\subseteq B$. Choose such a $B'$ containing $x$ from some fixed countable basis for the topology of $X''$ (here we use standardness, though actually only separability is needed). Since these $B'$ form a countable cover of $X''$, one of them must have positive mass. This is the desired set $E$. $\square$

8.2 The group of automorphisms and residuality

Fix $\Omega=([0,1],\mathcal{B},m)$, the unit interval with its Borel sets and Lebesgue measure. Let $\operatorname{Aut}$ denote the group of invertible measure-preserving maps of $\Omega$, with two maps identified if they agree on a Borel set of full measure. We introduce a topology on $\operatorname{Aut}$ whose sub-basis is given by sets of the form
\[ U(T,A,\varepsilon)=\{S\in\operatorname{Aut}\,:\,\mu(S^{-1}A\,\triangle\,T^{-1}A)<\varepsilon\} \]
where $T\in\operatorname{Aut}$, $A\in\mathcal{B}$ and $\varepsilon>0$. Note that $T\in U(T,A,\varepsilon)$.

We may also identify $\operatorname{Aut}$ with a subgroup of the group of unitary (or bounded linear) operators on $L^2(m)$. As such it inherits the strong and weak operator topologies from the group of bounded linear operators. These are the topologies whose bases are given by the sets
\[ \{S\,:\,\|Tf-Sf\|_2<\varepsilon\}\qquad\text{for given operator }T,\ f\in L^2,\ \varepsilon>0 \]
and
\[ \{S\,:\,|(Tf,Sf)-(f,f)|<\varepsilon\}\qquad\text{for given operator }T,\ f\in L^2,\ \varepsilon>0 \]
respectively. When restricted to the group of unitary operators these are equivalent bases, as can be seen from the identity
\[ \|Tf-Sf\|_2^2=\|Tf\|_2^2-2(Tf,Sf)+\|Sf\|_2^2=2(f,f)-2(Tf,Sf) \]
Now, the topology we have defined is clearly weaker than the strong operator topology, since
\[ \mu(S^{-1}A\,\triangle\,T^{-1}A)=\|1_{S^{-1}A}-1_{T^{-1}A}\|_2^2=\|S1_A-T1_A\|_2^2 \]
so
\[ \{S\,:\,\|S1_A-T1_A\|_2<\varepsilon\}\subseteq U(T,A,\varepsilon^2) \]
On the other hand, for step functions $f=\sum_{i=1}^ka_i1_{A_i}$ we can show that it is also stronger: setting $a=\sum|a_i|$,
\[ \bigcap_{i=1}^kU(T,A_i,\varepsilon/ak)\subseteq\{S\,:\,\|Sf-Tf\|_2<\varepsilon\} \]
For general $f$ one can argue by approximation. Hence all three topologies on $\operatorname{Aut}$ agree. This also shows that the topology makes composition continuous and makes the map $T\mapsto T^{-1}$ continuous (since this is true in the unitary group).
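Since the dyadic intervals generate $\mathcal{B}$, a countable sub-basis already determines this topology, and one can write down an explicit metric; the particular metric below is our own choice of illustration, not taken from the text:

```latex
% A compatible metric on Aut, summing the sub-basis quantities over
% the countable family of dyadic intervals (which generate B):
\[
  d(S,T)\;=\;\sum_{n=1}^{\infty}2^{-2n}
      \sum_{I\in\mathcal{D}_n}
      \mu\bigl(S^{-1}I\,\triangle\,T^{-1}I\bigr).
\]
% Since |D_n| = 2^n and each summand is at most 1, the inner sum is
% at most 2^n and the series converges. Convergence in d is
% equivalent to convergence in every U(T, I, eps) with I dyadic, and
% approximating an arbitrary Borel set by dyadic sets recovers the
% topology defined above.
```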

Definition 8.2.1. Denote by $\mathcal{D}_n$ the partition of $[0,1)$ into the dyadic intervals $[k/2^n,(k+1)/2^n)$.

Lemma 8.2.2. $\operatorname{Aut}$ is closed in the group of unitary operators with the strong operator topology.

Proof. Let $T_n\in\operatorname{Aut}$. Since the $T_n$ arise from maps of $[0,1]$, each $T_n$ is not only a unitary map of $L^2$, it also preserves pointwise multiplication of functions in $L^\infty$: $T_n(fg)=T_nf\cdot T_ng$. It is easy to see that if $T_n\to T\in U(L^2(m))$ in the strong operator topology, then $T$ also preserves pointwise multiplication. But it is then a classical fact that $T$ arises from a measure-preserving map of the underlying space. Using the fact that $T_n^{-1}\to T^{-1}$, we similarly see that $T^{-1}$ arises from a measure-preserving map. Now the relation $T_nT_n^{-1}A=T_n^{-1}T_nA=A$ implies that $T$ and $T^{-1}$ are inverses of one another, so $T,T^{-1}\in\operatorname{Aut}$.

Corollary 8.2.3. $\operatorname{Aut}$ is Polish in our topology.

Definition 8.2.4. A set is called dyadic (of generation $n$) if it is a union of elements of $\mathcal{D}_n$. Let $\mathcal{D}_n^*$ denote the algebra of these sets.

$T\in\operatorname{Aut}$ is dyadic (of generation $n$) if it permutes the elements of $\mathcal{D}_n$ (with the map of intervals realized by isometries). A dyadic automorphism is cyclic if the permutation is a cycle.

Note that if $\pi$ is a permutation of $\mathcal{D}_n$ then there is an automorphism $S_\pi\in\operatorname{Aut}_D$ such that $S_\pi:I\to\pi I$ in a linear and orientation-preserving manner.

Proposition 8.2.5. The cyclic dyadic automorphisms are dense in $\operatorname{Aut}$. Furthermore, for every $n_0$, the cyclic dyadic automorphisms of order at least $2^{n_0}$ are dense.

Proof. Fix $n_0$. Let $U_i=U(T,A_i,\varepsilon_i)$, $i=1,\dots,k$, be given (it suffices to consider basic neighborhoods of a single $T$). We must show that $\operatorname{Aut}_C\cap\bigcap_{i=1}^kU_i\neq\emptyset$.
First let us show that $\operatorname{Aut}_D\cap\bigl(\bigcap U_i\bigr)\neq\emptyset$. Let $\mathcal{A}$ denote the coarsest partition of $X$ that refines the partitions $\{A_i,X\setminus A_i\}$. Let $T\mathcal{A}$ be its image, which is similarly defined by the sets $TA_i$. Fix an auxiliary parameter $\delta>0$. Given $n$ and $A\in\mathcal{A}$, let
\[ \mathcal{E}_n(A)=\{I\in\mathcal{D}_n\,:\,m(I\cap A)>(1-\delta)m(I)\} \]
\[ \mathcal{F}_n(A)=\{I\in\mathcal{D}_n\,:\,m(I\cap TA)>(1-\delta)m(I)\} \]
and write
\[ E_n(A)=\bigcup\mathcal{E}_n(A)\qquad\qquad F_n(A)=\bigcup\mathcal{F}_n(A) \]
By Lebesgue's differentiation theorem, we can choose an $n>n_0$ so large that for every $A\in\mathcal{A}$,
\[ m(A\,\triangle\,E_n(A))<\delta \]
\[ m(TA\,\triangle\,F_n(A))<\delta \]
Of course, $m(A)=m(TA)$, so the above means that $|m(E_n(A))-m(F_n(A))|<2\delta$. If $m(E_n(A))<m(F_n(A))$, discard elements of $\mathcal{F}_n(A)$ until this is no longer the case (and symmetrically in the other direction), changing the measures involved by at most $2\delta$; we may then choose a permutation $\pi$ of $\mathcal{D}_n$ which, for each $A\in\mathcal{A}$, carries the intervals of $\mathcal{E}_n(A)$ to intervals of $\mathcal{F}_n(A)$.

Assuming $\delta$ was small enough, this shows that $S_\pi\in\bigcap U_i$.

The permutation $\pi$ above need not be cyclic, so far we have only shown that $\operatorname{Aut}_D$ is dense. To improve this to $\operatorname{Aut}_C$, proceed as follows. Fix a large $N$ and consider the partition $\mathcal{D}_{n+N}$ into dyadic intervals of level $n+N$. Let $\pi'$ be a permutation of $\mathcal{D}_{n+N}$ with the property that if $I'\in\mathcal{D}_{n+N}$ and $I'\subseteq I$ for $I\in\mathcal{D}_n$, then $\pi'I'\subseteq\pi I$. There are many such $\pi'$, and we may choose one so that every cycle of $\pi'$ covers a complete cycle of $\pi$. For example, if $I_1\to I_2\to I_1$ is a cycle of order $2$ of $\pi$, then enumerate the $(n+N)$-adic subintervals of $I_1$ as $I'_{1,1},\dots,I'_{1,k}$, let $I'_{2,1},\dots,I'_{2,k}$ similarly be the subintervals of $I_2$, and define
\[ \pi'\,:\,I'_{1,1}\to I'_{2,1}\to I'_{1,2}\to I'_{2,2}\to I'_{1,3}\to I'_{2,3}\to\dots\to I'_{1,k}\to I'_{2,k}\to I'_{1,1} \]
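One can check directly that the $\pi'$ displayed above consists of a single cycle; the count, written out for convenience:

```latex
% The orbit of I'_{1,1} under pi' visits, in order,
%   I'_{1,1}, I'_{2,1}, I'_{1,2}, I'_{2,2}, ..., I'_{1,k}, I'_{2,k},
% i.e. all 2k level-(n+N) subintervals of I_1 and I_2, before
% returning to I'_{1,1}:
\[
  (\pi')^{2j}\,I'_{1,1}=I'_{1,j+1},\qquad
  (\pi')^{2j+1}\,I'_{1,1}=I'_{2,j+1},\qquad 0\le j<k .
\]
% Hence pi' acts on these intervals as one cycle of length 2k,
% covering the 2-cycle (I_1 I_2) of pi, as required.
```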

Now, if $S'=S_{\pi'}$ is the automorphism associated to $\pi'$ and $S$ to $\pi$, then $S'I=SI$ for every $I\in\mathcal{D}_n$, so the argument above shows that $m(S'A\,\triangle\,TA)<6\delta$ for $A\in\mathcal{A}$. Also, it is clear that $\pi'$ has at most as many cycles as $\pi$, which is at most $2^n$. By changing the definition of $\pi'$ on at most $2^n$ of the $(n+N)$-adic intervals, we obtain a cyclic permutation $\pi''$ and associated $S''=S_{\pi''}\in\operatorname{Aut}_C$ with
\[ m(S''I\,\triangle\,S'I)\le 2^n\cdot2^{-(n+N)}=2^{-N}\qquad\text{for }I\in\mathcal{D}_n \]
Thus, assuming $2^{-N}<\delta$, we will have $m(S''A\,\triangle\,TA)<8\delta$ for $A\in\mathcal{A}$. Thus $S''\in\operatorname{Aut}_C\cap\bigcap U_i$, as desired.

Theorem 8.2.6 (Rohlin density phenomenon). Let $(X,\mathcal{B},\mu,T)$ be an aperiodic measure-preserving transformation on a standard measure space. Then the isomorphism class
\[ [T]=\{S\in\operatorname{Aut}\,:\,S\cong T\} \]
is dense in $\operatorname{Aut}$.

Proof. Take $S\in\operatorname{Aut}$, $A\in\mathcal{B}$ and $\varepsilon>0$; we must show $[T]\cap U(S,A,\varepsilon)\neq\emptyset$. By the previous proposition, it suffices to show this for $S\in\operatorname{Aut}_C$ of arbitrarily high order $N$. Now, by Rohlin's lemma we can find $B\in\mathcal{B}$ such that $B,TB,\dots,T^{N-1}B$ are disjoint and fill $>1-\varepsilon$ of $X$. Let $I_1,\dots,I_N$ denote an ordering of the dyadic intervals as permuted by $S$. Let $I'_n$ denote $I_n\setminus I''_n$, where $I''_n$ is the interval at the right end of $I_n$ of length $\varepsilon/N$. Thus $m(I'_n)=(1-\varepsilon)/N=\mu(B)$ and $S$ maps $I'_n\to I'_{n+1}$.

Using standardness we can find an isomorphism $\pi_0:B\to I'_1$. Now for $x\in T^nB$, $0\le n<N$, set $\pi(x)=S^n\pi_0(T^{-n}x)$; extending $\pi$ to an isomorphism on the remaining sets of measure $\varepsilon$, one checks that $\pi T\pi^{-1}\in[T]\cap U(S,A,\varepsilon)$.

Take a moment to appreciate this theorem. It immediately implies that the group rotations are dense in $\operatorname{Aut}$ (in fact, the isomorphism class of any particular group rotation is); the isomorphism class of every nontrivial product measure is dense in $\operatorname{Aut}$; etc. Stated another way: for any two non-atomic invertible ergodic transformations of a standard measure space, one can find realizations of them in $\operatorname{Aut}$ that are arbitrarily close. Hence, essentially nothing can be learned about an automorphism by observing it at a ``finite resolution''.

Theorem 8.2.7. Let

\[ WM=\{T\in\operatorname{Aut}\,:\,T\text{ is weak mixing}\} \]

Then $WM$ is a dense $G_\delta$ in $\operatorname{Aut}$. In particular, the set of ergodic automorphisms contains a dense $G_\delta$.

Proof. Let $\{f_i\}$ be a countable dense subset of $L^2(m)$. Let

\[ U=\bigcap_i\bigcap_j\bigcap_n\bigcup_k\Bigl\{T\in\operatorname{Aut}\,:\,\Bigl|(f_i,T^kf_j)-\int f_i\int f_j\Bigr|<1/n\Bigr\} \]
Since $T\mapsto(f_i,T^kf_j)$ is continuous, the innermost set is open, and so this is a $G_\delta$ set. We claim $U=WM$. To see that $WM\subseteq U$, note that for every weak mixing transformation and every $i,j$, we know that

\[ \Bigl|(f_i,T^kf_j)-\int f_i\int f_j\Bigr|\xrightarrow{\ \text{density}\ }0 \]
so for every $n$ there is a $k$ with the difference $<1/n$, hence $T\in U$.

In the other direction, if $T\in\operatorname{Aut}\setminus WM$, then there is a non-constant eigenfunction $T\varphi=\lambda\varphi$, $\|\varphi\|=1$, which we may assume has zero integral. Taking $f_i,f_j$ close to $\varphi$, we will have $|(f_i,T^kf_j)|>|(\varphi,T^k\varphi)|-1/2=1/2$ for all $k$; thus for $n=3$ and these $i,j$ we do not have $\bigl|(f_i,T^kf_j)-\int f_i\int f_j\bigr|<1/n$ for any $k$, and therefore $T\notin U$.

Finally, $WM$ is dense by the previous theorem, since it contains aperiodic automorphisms.
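The eigenfunction step in the proof above can be written out explicitly (with the convention that the inner product is linear in the second argument; either convention gives the same modulus):

```latex
% Since T is unitary and T phi = lambda phi with ||phi|| = 1,
% the eigenvalue satisfies |lambda| = 1, and hence for every k
\[
  \bigl|(\varphi,T^k\varphi)\bigr|
  =\bigl|(\varphi,\lambda^k\varphi)\bigr|
  =|\lambda|^k\,\|\varphi\|_2^2
  =1 ,
\]
% while the integral of phi is 0, so the correlations cannot tend
% to 0 along any subsequence, let alone in density.
```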

Theorem 8.2.8. Let
\[ SM=\{T\in\operatorname{Aut}\,:\,T\text{ is strongly mixing}\} \]
Then $SM$ is meager in $\operatorname{Aut}$.

Proof. Let $I=[0,1/2]$. Let $o(S)$ denote the order of a cyclic automorphism $S\in\operatorname{Aut}_C$, and let
\[ V=\bigcap_n\ \bigcup_{\{S\in\operatorname{Aut}_C\,:\,o(S)\ge n\}}\ \bigcap_{i=0}^{o(S)-1}U\Bigl(S,S^iI,\frac{1}{n\cdot o(S)}\Bigr) \]
This is a $G_\delta$ set, and it is dense since $\operatorname{Aut}_C$ is dense. So we only need to prove that $V\cap SM=\emptyset$. Indeed, if $T\in V$ then for arbitrarily large $n$ there is a cyclic automorphism $S$ of order $k=o(S)\ge n$ with $T\in\bigcap_{i=0}^{k-1}U(S,S^iI,1/nk)$. Now note that $m(I\cap S^kI)=m(I)$. It is an easy induction to show that $m(S^iI\,\triangle\,T^iI)\le i/nk$ for $1\le i\le k$: for $i=1$ it follows from $T\in U(S,I,1/nk)$, and assuming we know it for $i-1$, we can write
\begin{align*}
m(S^iI\,\triangle\,T^iI)&=m\bigl(S(S^{i-1}I)\,\triangle\,T(T^{i-1}I)\bigr)\\
&\le m\bigl(S(S^{i-1}I)\,\triangle\,T(S^{i-1}I)\bigr)+m\bigl(T(S^{i-1}I)\,\triangle\,T(T^{i-1}I)\bigr)\\
&\le\frac{1}{nk}+m(S^{i-1}I\,\triangle\,T^{i-1}I)\\
&\le\frac{1}{nk}+\frac{i-1}{nk}
\end{align*}

where in the second inequality we used $T\in U(S,S^{i-1}I,1/nk)$ and the fact that $T$ is measure-preserving. We conclude that
\[ m(S^kI\,\triangle\,T^kI)\le k\cdot\frac{1}{nk}=\frac1n \]
so
\[ m(I\,\triangle\,T^kI)\le m(I\,\triangle\,S^kI)+m(S^kI\,\triangle\,T^kI)=0+\frac1n \]
Since $k\to\infty$ as $n\to\infty$, we do not have $m(I\cap T^kI)\to m(I)^2=\frac14$. This shows that $T$ is not strongly mixing.

Chapter 9

Appendix

9.1 The weak-* topology

Proposition 9.1.1. Let $X$ be a compact metric space. Then $\mathcal{P}(X)$ is metrizable and compact in the weak-* topology.

Proof. Let $\{f_i\}_{i=1}^{\infty}$ be a countable dense subset of the unit ball in $C(X)$. Define a metric on $\mathcal{P}(X)$ by

\[ d(\mu,\nu)=\sum_{i=1}^{\infty}2^{-i}\Bigl|\int_Xf_i\,d\mu-\int_Xf_i\,d\nu\Bigr| \]
It is easy to check that this is a metric. We must show that the topology induced by this metric is the weak-* topology. If $\mu_n\to\mu$ weak-* then $\int f_i\,d\mu_n-\int f_i\,d\mu\to0$ as $n\to\infty$, hence $d(\mu_n,\mu)\to0$. Conversely, if $d(\mu_n,\mu)\to0$, then $\int f_i\,d\mu_n\to\int f_i\,d\mu$ for every $i$ and therefore for every linear combination of the $f_i$. Given $f\in C(X)$ and $\varepsilon>0$ there is a linear combination $g$ of the $f_i$ such that $\|f-g\|_\infty<\varepsilon$. Then

\begin{align*}
\Bigl|\int f\,d\mu_n-\int f\,d\mu\Bigr|&\le\Bigl|\int f\,d\mu_n-\int g\,d\mu_n\Bigr|+\Bigl|\int g\,d\mu_n-\int g\,d\mu\Bigr|+\Bigl|\int g\,d\mu-\int f\,d\mu\Bigr|\\
&<\varepsilon+\Bigl|\int g\,d\mu_n-\int g\,d\mu\Bigr|+\varepsilon
\end{align*}
and the right-hand side is $<3\varepsilon$ when $n$ is large enough. Hence $\mu_n\to\mu$ weak-*.

Since the space is metrizable, to prove compactness it is enough to prove sequential compactness, i.e. that every sequence $\mu_n\in\mathcal{P}(X)$ has a convergent subsequence. Let $V=\operatorname{span}_{\mathbb{Q}}\{f_i\}$, which is a countable dense $\mathbb{Q}$-linear subspace of $C(X)$. The range of each $g\in V$ is a compact subset of $\mathbb{R}$ (since $X$ is compact and $g$ continuous), so for each $g\in V$ we can choose a convergent subsequence of $\int g\,d\mu_n$. Using a diagonal argument we may select a single subsequence $\mu_{n(j)}$ such that $\int g\,d\mu_{n(j)}\to\Lambda(g)$ as $j\to\infty$ for every $g\in V$. Now, $\Lambda$ is a $\mathbb{Q}$-linear functional because

\begin{align*}
\Lambda(af_i+bf_j)&=\lim_{k\to\infty}\int(af_i+bf_j)\,d\mu_{n(k)}\\
&=\lim_{k\to\infty}\Bigl(a\int f_i\,d\mu_{n(k)}+b\int f_j\,d\mu_{n(k)}\Bigr)\\
&=a\Lambda(f_i)+b\Lambda(f_j)
\end{align*}

$\Lambda$ is also uniformly continuous because, if $\|f_i-f_j\|_\infty<\varepsilon$, then

\begin{align*}
|\Lambda(f_i-f_j)|&=\lim_{k\to\infty}\Bigl|\int(f_i-f_j)\,d\mu_{n(k)}\Bigr|\\
&\le\lim_{k\to\infty}\int|f_i-f_j|\,d\mu_{n(k)}\\
&\le\varepsilon
\end{align*}
Thus $\Lambda$ extends to a continuous linear functional on $C(X)$. Since $\Lambda$ is positive (i.e. non-negative on non-negative functions), so is its extension, so by the Riesz representation theorem there exists $\mu\in\mathcal{P}(X)$ with $\Lambda(f)=\int f\,d\mu$. By definition $\int g\,d\mu_{n(k)}-\int g\,d\mu\to0$ as $k\to\infty$ for $g\in V$, hence this is true for the $f_i$, so $d(\mu_{n(k)},\mu)\to0$. Hence $\mu_{n(k)}\to\mu$ weak-*.
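As a quick sanity check of the metric (our own illustration, not taken from the text), consider point masses:

```latex
% For Dirac measures the metric reduces to
\[
  d(\delta_x,\delta_y)=\sum_{i=1}^{\infty}2^{-i}\,|f_i(x)-f_i(y)| ,
\]
% so if x_n -> x, continuity of each f_i together with the uniform
% bound |f_i| <= 1 (dominated convergence of the series) gives
% d(delta_{x_n}, delta_x) -> 0. Conversely, if x_n does not converge
% to x, a continuous function equal to 1 at x and vanishing outside
% a small ball, approximated by some f_i, keeps d(delta_{x_n},
% delta_x) bounded away from 0 along a subsequence. Thus
% x -> delta_x embeds X into P(X).
```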

When $(X,\mathcal{B},\mu)$ is a probability space, $f\in L^1$, and $A$ a set of positive measure, the conditional expectation of $f$ on $A$ is usually defined as $\frac{1}{\mu(A)}\int_Af\,d\mu$. When $A$ has measure $0$ this formula is meaningless, and it is not clear how to give an alternative definition. But if $\mathcal{A}=\{A_i\}_{i\in I}$ is a partition of $X$ into measurable sets (possibly of measure $0$), one can sometimes give a meaningful definition of the conditional expectation of $f$ on $\mathcal{A}(x)$ for a.e. $x$, where $\mathcal{A}(x)$ is the element $A_i$ containing $x$. Thus the conditional expectation of $f$ on $\mathcal{A}$ is a function that assigns to a.e. $x$ the conditional expectation of $f$ on the set $\mathcal{A}(x)$.

Rather than partitions, we will work with $\sigma$-algebras; the connection is made by observing that if $\mathcal{E}$ is a countably generated $\sigma$-algebra then the partition of $X$ into the atoms of $\mathcal{E}$ is a measurable partition.

Theorem 9.2.1. Let $(X,\mathcal{B},\mu)$ be a probability space and $\mathcal{E}\subseteq\mathcal{B}$ a sub-$\sigma$-algebra. Then there is a unique linear operator $E(\cdot\,|\mathcal{E}):L^1(X,\mathcal{B},\mu)\to L^1(X,\mathcal{E},\mu)$ satisfying

1. Chain rule: $\int E(f|\mathcal{E})\,d\mu=\int f\,d\mu$.

2. Product rule: $E(gf|\mathcal{E})=g\cdot E(f|\mathcal{E})$ for all $g\in L^\infty(X,\mathcal{E},\mu)$.

Proof. We begin with existence. Let $f\in L^1(X,\mathcal{B},\mu)$ and let $\mu_f$ be the finite signed measure $d\mu_f=f\,d\mu$. Then $\mu_f\ll\mu$ in the measure space $(X,\mathcal{B},\mu)$, and this remains true in $(X,\mathcal{E},\mu)$. Let $E(f|\mathcal{E})=d\mu_f/d\mu\in L^1(X,\mathcal{E},\mu)$, the Radon-Nikodym derivative of $\mu_f$ with respect to $\mu$ in $(X,\mathcal{E},\mu)$.

The domain of this map is $L^1(X,\mathcal{B},\mu)$ and its range is in $L^1(X,\mathcal{E},\mu)$ by the properties of $d\mu_f/d\mu$. Linearity follows from uniqueness of the Radon-Nikodym derivative and the definitions. The chain rule is also immediate:
\[ \int E(f|\mathcal{E})\,d\mu=\int\frac{d\mu_f}{d\mu}\,d\mu=\int f\,d\mu \]

For the product rule, let $g\in L^\infty(X,\mathcal{E},\mu)$. We must show that $g\cdot\frac{d\mu_f}{d\mu}=\frac{d\mu_{gf}}{d\mu}$ in $(X,\mathcal{E},\mu)$. Equivalently, we must show that
\[ \int_Eg\,\frac{d\mu_f}{d\mu}\,d\mu=\int_E\frac{d\mu_{gf}}{d\mu}\,d\mu\qquad\text{for all }E\in\mathcal{E} \]
Now, for $A\in\mathcal{E}$ and $g=1_A$ we have
\begin{align*}
\int_E1_A\,\frac{d\mu_f}{d\mu}\,d\mu&=\int_{A\cap E}\frac{d\mu_f}{d\mu}\,d\mu\\
&=\mu_f(A\cap E)\\
&=\int_{A\cap E}f\,d\mu\\
&=\int_E1_Af\,d\mu\\
&=\int_E\frac{d\mu_{1_Af}}{d\mu}\,d\mu
\end{align*}
so the identity holds. By linearity of these integrals in the $g$ argument it holds for linear combinations of indicator functions. For arbitrary $g\in L^\infty$ we can take a uniformly bounded sequence of such functions converging pointwise to $g$, and pass to the limit using dominated convergence. This proves the product rule.

To prove uniqueness, let $T:L^1(X,\mathcal{B},\mu)\to L^1(X,\mathcal{E},\mu)$ be an operator with these properties. Then for $f\in L^1(X,\mathcal{B},\mu)$ and $E\in\mathcal{E}$,
\begin{align*}
\int_ETf\,d\mu&=\int1_E\,Tf\,d\mu\\
&=\int T(1_Ef)\,d\mu\\
&=\int1_Ef\,d\mu\\
&=\int_Ef\,d\mu
\end{align*}
where the second equality uses the product rule and the third uses the chain rule. Since this holds for all $E\in\mathcal{E}$, we must have $Tf=d\mu_f/d\mu$.

Proposition 9.2.2. The conditional expectation operator satisfies the following properties:

1. Positivity: $f\ge0$ a.e. implies $E(f|\mathcal{E})\ge0$ a.e.

2. Triangle inequality: $|E(f|\mathcal{E})|\le E(|f|\,\big|\,\mathcal{E})$.

3. Contraction: $\|E(f|\mathcal{E})\|_1\le\|f\|_1$; in particular, $E(\cdot\,|\mathcal{E})$ is $L^1$-continuous.

4. Sup/inf property: $E(\sup f_i|\mathcal{E})\ge\sup E(f_i|\mathcal{E})$ and $E(\inf f_i|\mathcal{E})\le\inf E(f_i|\mathcal{E})$ for any countable family $\{f_i\}$.

5. Jensen's inequality: if $g$ is convex then $g(E(f|\mathcal{E}))\le E(g\circ f|\mathcal{E})$.

6. Fatou's lemma: $E(\liminf f_n|\mathcal{E})\le\liminf E(f_n|\mathcal{E})$.

Remark 9.2.3. Properties (2)–(6) are consequences of positivity only.

Proof. (1) Suppose $f\ge0$ and $E(f|\mathcal{E})\not\ge0$ a.e., so that $E(f|\mathcal{E})<0$ on a set $A\in\mathcal{E}$ of positive measure. Applying the product rule with $g=1_A$, we have
\[ E(1_Af|\mathcal{E})=1_A\,E(f|\mathcal{E}) \]
hence, replacing $f$ by $1_Af$, we can assume that $f\ge0$ and $E(f|\mathcal{E})<0$. But this contradicts the chain rule, since $\int f\,d\mu\ge0$ and $\int E(f|\mathcal{E})\,d\mu<0$.

(2) Decompose $f$ into positive and negative parts, $f=f^+-f^-$, so that $|f|=f^++f^-$. By positivity,
\begin{align*}
|E(f|\mathcal{E})|&=|E(f^+|\mathcal{E})-E(f^-|\mathcal{E})|\\
&\le|E(f^+|\mathcal{E})|+|E(f^-|\mathcal{E})|\\
&=E(f^+|\mathcal{E})+E(f^-|\mathcal{E})\\
&=E(f^++f^-|\mathcal{E})\\
&=E(|f|\,\big|\,\mathcal{E})
\end{align*}

(3) We compute:

\begin{align*}
\|E(f|\mathcal{E})\|_1&=\int|E(f|\mathcal{E})|\,d\mu\\
&\le\int E(|f|\,\big|\,\mathcal{E})\,d\mu\\
&=\int|f|\,d\mu\\
&=\|f\|_1
\end{align*}
where we have used the triangle inequality and the chain rule.

(4) We prove the sup version. By monotonicity and continuity it suffices to prove this for finite families, and hence for two functions. The claim now follows from the identity $\max\{f_1,f_2\}=\frac12(f_1+f_2+|f_1-f_2|)$, linearity, and the triangle inequality.

(5) For an affine function $g(t)=at+b$,

\[ E(g\circ f|\mathcal{E})=E(af+b|\mathcal{E})=aE(f|\mathcal{E})+b=g(E(f|\mathcal{E})) \]

If $g$ is convex then $g=\sup g_i$, where $\{g_i\}_{i\in I}$ is a countable family of affine functions. Thus
\begin{align*}
E(g\circ f|\mathcal{E})&=E(\sup_ig_i\circ f|\mathcal{E})\\
&\ge\sup_iE(g_i\circ f|\mathcal{E})\\
&=\sup_ig_i(E(f|\mathcal{E}))\\
&=g(E(f|\mathcal{E}))
\end{align*}

(6) Since $\inf_{k>n}f_k\nearrow\liminf f_k$ as $n\to\infty$, the convergence also holds in $L^1$, so by continuity and positivity the same holds after taking the conditional expectation. Thus, using the inf property,

\begin{align*}
\liminf_{n\to\infty}E(f_n|\mathcal{E})&=\lim_{n\to\infty}\inf_{k>n}E(f_k|\mathcal{E})\\
&\ge\lim_{n\to\infty}E(\inf_{k>n}f_k|\mathcal{E})\\
&=E(\liminf_{n\to\infty}f_n|\mathcal{E})
\end{align*}

Corollary 9.2.4. The restriction of the conditional expectation operator to $L^2(X,\mathcal{B},\mu)$ coincides with the orthogonal projection $\pi:L^2(X,\mathcal{B},\mu)\to L^2(X,\mathcal{E},\mu)$.

Proof. Write $\pi=E(\cdot\,|\mathcal{E})$. If $f\in L^2$ then by convexity of $t\mapsto t^2$ and Jensen's inequality (which is immediate for simple functions and hence holds for $f\in L^1$ by approximation),

\begin{align*}
\|\pi f\|_2^2&=\int|E(f|\mathcal{E})|^2\,d\mu\\
&\le\int E(|f|^2\,\big|\,\mathcal{E})\,d\mu\\
&=\int|f|^2\,d\mu\qquad\text{by the chain rule}\\
&=\|f\|_2^2
\end{align*}
Thus $\pi$ maps $L^2$ into the subspace of $\mathcal{E}$-measurable $L^2$ functions, i.e. $\pi:L^2(X,\mathcal{B},\mu)\to L^2(X,\mathcal{E},\mu)$. We will now show that $\pi$ is the identity on $L^2(X,\mathcal{E},\mu)$ and that $\pi$ is self-adjoint. Indeed, if $g\in L^2(X,\mathcal{E},\mu)$ then
\[ \pi g=E(g\cdot1|\mathcal{E})=g\cdot E(1|\mathcal{E}) \]
Since $E(1|\mathcal{E})=1$, this shows that $\pi$ is the identity on $L^2(X,\mathcal{E},\mu)$. Next, if $f,g\in L^2$ then $fg\in L^1$, and

\begin{align*}
\langle f,\pi g\rangle&=\int f\cdot E(g|\mathcal{E})\,d\mu\\
&=\int E\bigl(f\cdot E(g|\mathcal{E})\,\big|\,\mathcal{E}\bigr)\,d\mu&&\text{by the chain rule}\\
&=\int E(f|\mathcal{E})\,E(g|\mathcal{E})\,d\mu&&\text{by the product rule}\\
&=\int E\bigl(E(f|\mathcal{E})\cdot g\,\big|\,\mathcal{E}\bigr)\,d\mu&&\text{by the product rule}\\
&=\int E(f|\mathcal{E})\cdot g\,d\mu&&\text{by the chain rule}\\
&=\langle\pi f,g\rangle
\end{align*}
so $\pi$ is self-adjoint.
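For a concrete instance of the operator just studied (a standard example, in our own notation), let $\mathcal{E}$ be generated by a finite partition $\{A_1,\dots,A_k\}$ with $\mu(A_i)>0$ for each $i$. Then the Radon-Nikodym definition reduces to averaging over atoms:

```latex
% On the sigma-algebra generated by a finite partition, an
% E-measurable function is constant on each atom, and
\[
  E(f\mid\mathcal{E})
  \;=\;\sum_{i=1}^{k}\Bigl(\frac{1}{\mu(A_i)}\int_{A_i}f\,d\mu\Bigr)1_{A_i}.
\]
% The chain rule is then the identity
%   sum_i int_{A_i} f dmu  =  int f dmu,
% and the corresponding pi is the orthogonal projection of L^2 onto
% the span of the indicators 1_{A_1}, ..., 1_{A_k}.
```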

9.3 Regularity

I'm not sure we use this anywhere, but for the record:

Lemma 9.3.1. A Borel probability measure on a complete separable metric space is regular.

Proof. It is easy to see that the family of sets $A$ with the property that
\begin{align*}
\mu(A)&=\inf\{\mu(U)\,:\,U\supseteq A\text{ is open}\}\\
&=\sup\{\mu(C)\,:\,C\subseteq A\text{ is closed}\}
\end{align*}
contains all open and closed sets, and is a $\sigma$-algebra. Therefore every Borel set $A$ has this property. We need to verify that in the second condition we can replace closed by compact. Clearly it is enough to show that for every closed set $C$ and every $\varepsilon>0$ there is a compact $K\subseteq C$ with $\mu(K)>\mu(C)-\varepsilon$.

Fix $C$ and $\varepsilon>0$. For every $n$ we can find a finite family $B_{n,1},\dots,B_{n,k(n)}$ of $\frac1n$-balls whose union $B_n=\bigcup_iB_{n,i}$ intersects $C$ in a set of measure $>\mu(C)-\varepsilon/2^n$. Let $K_0=C\cap\bigcap_nB_n$, so that $\mu(K_0)>\mu(C)-\varepsilon$. By construction $K_0$ is totally bounded, and by completeness its closure $K=\overline{K_0}\subseteq C$ is compact, so $K$ has the desired property.