<<

and Mixing Properties 225

Ergodicity and Mixing Properties average of any measurement as the system evolves is equal to the average over the component. Ergodic decomposi- ANTHONY QUAS tion gives a precise description of the manner in which Department of and Statistics, a system can be split into ergodic components. University of Victoria, Victoria, Canada A related (stronger) property of a -preserv- ing transformation is mixing. Here one is investigating Article Outline the correlation between the state of the system at different times. The system is mixing if the states are asymptotically Glossary independent: as the times between the measurements in- Definition of the Subject crease to infinity, the observed values of the measurements Introduction at those times become independent. Basics and Examples Ergodicity Ergodic Decomposition Introduction Mixing The term ergodic was introduced by Boltzmann [8,9]in Hyperbolicity and Decay of Correlations his work on statistical mechanics, where he was study- Future Directions ing Hamiltonian systems with large numbers of particles. Bibliography The system is described at any time by a point of phase space,asubsetofR6N where N is the number of parti- Glossary cles. The configuration describes the 3-dimensional posi- Bernoulli shift Mathematical abstraction of the scenario tion and velocity of each of the N particles. It has long been in statistics or probability in which one performs re- known that the Hamiltonian (i. e. the overall energy of the peated independent identical experiments. system) is invariant over time in these systems. Thus, given A probability model describing a sequence a starting configuration, all future configurations as the of observations made at regularly spaced time intervals system evolves lie on the same energy surface as the initial such that at each time, the probability distribution of one. the subsequent observation depends only on the cur- Boltzmann’s ergodic hypothesis was that the trajectory rent observation and not on prior observations. of the configuration in phase space would fill out the en- Measure-preserving transformation Amapfromamea- tire energy surface. The term ergodic is thus an amalgama- sure space to itself such that for each measurable sub- tion of the Greek words for work and path. This hypothe- set of the space, it has the same measure as its inverse sis then allowed Boltzmann to conclude that the long-term image under the . average of a quantity as the system evolves would be equal Measure-theoretic entropy A non-negative (possibly in- to its average value over the phase space. finite) real number describing the of a mea- Subsequently, it was realized that this hypothesis is sure-preserving transformation. rarely satisfied. The ergodic hypothesis was replaced in Product transformation Given a pair of measure-pre- 1911 by the quasi-ergodic hypothesis of the Ehrenfests [17] serving transformations: T of X and S of Y,theprod- which stated instead that each trajectory is dense in the uct transformation is the map of X Y given by energy surface, rather than filling out the entire energy (T S)(x; y) D (T(x); S(y)). surface. The modern notion of ergodicity (to be defined below) is due to Birkhoff and Smith [7]. Koopman [44] suggested studying a measure-preserving transformation Definition of the Subject by means of the associated isometry on Hilbert space, 2 2 Many physical phenomena in equilibrium can be modeled UT : L (X) ! L (X)definedbyUT ( f ) D f ı T.This as measure-preserving transformations. is point of view was used by von Neumann [91] in his proof the abstract study of these transformations, dealing in par- of the mean ergodic theorem. This was followed closely ticular with their long term average behavior. by Birkhoff [6] proving the pointwise ergodic theorem. One of the basic steps in analyzing a measure-preserv- An ergodic measure-preserving transformation enjoys the ing transformation is to break it down into its simplest property that Boltzmann first intended to deduce from his possible components. These simplest components are its hypothesis: that long-term averages of an observable quan- ergodic components, and on each of these components, tity coincide with the integral of that quantity over the the system enjoys the ergodic property: the long-term time phase space. 226 Ergodicity and Mixing Properties

These theorems allow one to deduce a form of in- a Lebesgue space (that is, the space together with its com- dependence on the average: given two sets of configura- pleted -algebra agrees up to a measure-preserving bijec- tions A and B, one can consider the volume of the phase tion with the unit interval with Lebesgue measure and the space consisting of points that are in A at time 0 and in B usual -algebra of Lebesgue measurable sets). Although at time t. In an ergodic measure-preserving transforma- this sounds like a strong restriction, in practice it is barely tion, if one computes the average of the volumes of these a restriction at all, as almost all of the spaces that appear regions over time, the ergodic theorems mentioned above in the theory (and all of those that appear in this article) allow one to deduce that the limit is simply the product of turn out to be Lebesgue spaces. For a detailed treatment the volume of A and the volume of B. This is the weak- of the theory of Lebesgue spaces, the reader is referred est mixing-type property. In this article, we will outline to Rudolph’s book [76]. The reader is referred also to the a rather full range of mixing properties with ergodicity at chapter on  Measure Preserving Systems. the weakest end and the Bernoulli property at the strongest While many of the definitions that we shall present are end. valid for both invertible and non-invertible measure-pre- We will set out in some detail the various mixing prop- serving transformations, the strongest mixing conditions erties, basing our study on a number of concrete examples are most useful in the case of invertible transformations. sitting at various points of this hierarchy. Many of the mix- It will be helpful to present a selection of simple exam- ing properties may be characterized in terms of the Koop- ples, relative to which we will be able to explore ergodicity man operators operators mentioned above (i. e. they are and the various notions of mixing. These examples and the spectral properties), but we will see that the strongest mix- lemmas necessary to show that they are measure-preserv- ing properties are not spectral in nature. ing transformations as claimed may be found in the books We shall also see that there are connections between of Petersen [64], Rudolph [76]andWalters[92]. More de- the range of mixing properties that we discuss and mea- tails on these examples can also be found in the chapter on sure-theoretic entropy. In measure-preserving transfor-  ErgodicTheory:BasicExamplesandConstructions. mations that arise in practice, there is a correlation be- Example 1 (Rotation on the circle) Let ˛ 2 R.Let tween strong mixing properties and positive entropy, al- R˛ :[0; 1) ! [0; 1) be defined by R˛(x) D x C ˛ mod 1. though many of these properties are logically independent. It is straightforward to verify that R˛ preserves the re- One important issue for which many questions remain striction of Lebesgue measure to [0; 1) (it is sufficient to open is that of higher-order mixing. Here, the issue is if 1 check that (R˛ (J)) D (J) for an interval J). instead of asking that the observations at two times sep- arated by a large time T be approximately independent, Example 2 (Doubling Map) Let M2 :[0; 1) ! [0; 1) be one asks whether if one makes observations at more times, defined by M2(x) D 2x mod 1. Again, Lebesgue measure each pair suitably separated, the results can be expected is invariant under M2 (to see this, one observes that for an 1 to be approximately independent. This issue has an ana- interval J, M2 (J) consists of two intervals, each of half the logue in , where it is well-known that it length of J). This may be generalized in the obvious way to is possible to have a collection of random variables that are amapMk for any integer k 2. pairwise independent, but not mutually independent. Example 3 (Interval Exchange Transformation) The class of interval exchange transformations was introduced by Sinai [85]. An interval exchange transformation is the map Basics and Examples obtained by cutting the interval into a finite number of In this article, except where otherwise stated, the measure- pieces and permuting them in such a way that the result- preserving transformations that we consider will be de- ing map is invertible, and restricted to each interval is an fined on probability spaces. order-preserving isometry. More specifically, given a measurable space (X; B)and More formally, one takes a sequence of positive B a probability measure defined on ,ameasure-preserv- lengths `1;`2;:::;`k summing to 1 andP a permutation B B ing transformation of (X; ;)isa -measurable map Pof f1;:::;kg and defines ai D j

sures, not simply probability measures, most of the re- T :[0; 1) ! [0; 1) defined by Tj[ai ;aiC1)(x) D xC(bi ai ). sults and definitions below only make sense in the proba- It is straightforward to check that any such interval ex- bility measure case. Sometimes it will be helpful to make change transformation preserves Lebesgue measure on the the assumption that the underlying probability space is unit interval. Ergodicity and Mixing Properties 227

Example 4 (Bernoulli Shift) Let A be a finite set and fix of identical balls which move at constant velocity until N a vector (pi )i2A of positive numbers that sum to 1. Let A two balls collide, whereupon they elastically swap mo- denote the set of sequences of the form x0x1x2 :::,where mentum along the direction of contact. The phase space Z 6N xn 2 A for each n 2 N and let A denote the set of bi- forthissystemisaregionofR (with N 3-dimensional infinite sequences of the form :::x2x1 x0x1x2 ::: (the position vectors and N 3-dimensional velocity vectors). is a placeholder that allows us to distinguish (for example) More abstractly, the system is equivalent to the motion of between the sequences :::0101010101 ::: and :::10101 a single point particle in a region of RM RM (with the 01010 :::). first M-vector representing position and the second rep- N We define a map (the shift map) S on A by (S(x))n D resenting velocity). The system is constrained in that its Z M xnC1 and define S on A by the same formula. Note that S position is required to lie in a bounded region S of R is invertible as a transformation on AZ but non-invertible with a piecewise smooth boundary. The system evolves as a transformation on AN . by moving the position at a constant rate in the direction We need to equip AN and AZ with measures. This of the velocity vector until the point reaches @S,atwhich is done by defining the measure of a preferred class of time the component of the velocity parallel to the normal sets, checking certain consistency conditions and appeal- to @S is reversed. This then defines a flow (i. e. a family ing to the Kolmogorov extension theorem. Here the pre- of maps (Tt)t2R satisfying TtCs D Tt ı Ts )onthephase ferred sets are the cylinder sets. Given m n in the in- space. Since the magnitude of the velocity is conserved, it n vertible case and a sequence am :::an,welet[am :::an]m is convenient to restrict to flows with speed 1. This sys- Z denote fx 2 A : xm D am;:::;xn D ang and de- tem is clearly the closest of the examples that we consider n fine ([am :::an]m) D pam pamC1 :::pan .Thisisthen to the situation envisaged by Boltzmann. Perhaps not sur- shown to uniquely define a measure on the -alge- prisingly, proofs of even the most basic properties for this bra of AZ generated by the cylinder sets. It is immedi- system are much harder than those for the other examples ate to see that for any C, (S1C) D (C), that we consider. and it follows that S is a measure-preserving transforma- Z We will need to make use of the concept of measure-theo- tion of (A ; B;). The construction is exactly analogous retic isomorphism. Two measure-preserving transforma- inthenon-invertiblecase.Seethechapteron Mea- tions T of (X; B;)andS of (Y; F;)aremeasure-theo- sure Preserving Systems or the books of Walters [92]or retically isomorphic (or just isomorphic) if there exist mea- Rudolph [76] for more details of defining measures in surable maps g : X ! Y and h : Y ! X such that these systems. The class of Bernoulli shifts will play a distinguished role 1. g ı h and h ı g agree with the respective identity maps in what follows. almost everywhere; N Z 1 D 1 D 2 F Example 5 (Markov Shift) The spaces A and A are ex- 2. (g F) (F)and (h B) (B)forallF 2 B actly as above, as is the shift map. All that changes is the and B ;and ı D ı measure. 3. S g(x) g T(x)for -almost every x (or equivalently ı D ı To define a Markov shift, we need a stochastic matrix P T h(y) h S(y)for -almost every y). (i. e. a matrix with non-negative entries whose rows sum Measure-theoretic isomorphism is the basic notion of to 1) with rows and columns indexed by A and a left eigen- ‘sameness’ in ergodic theory. It is in some sense quite vector for P with eigenvalue 1 with the property that weak, so that systems may be isomorphic that feel very dif- the entries of are non-negative and sum to 1. The exis- ferent (for example, as we discuss later, the time one map tence of such an eigenvector is a consequence of the Per- of a flow is isomorphic to a Bernoulli shift). For ron–Frobenius theory of positive matrices. Provided that 0 comparison, the notion of sameness in topological dynam- the matrix P is irreducible (for each a and a in A,thereis ical systems () is far stronger. n > 0 > an n 0suchthatPa;a 0), the eigenvector is unique. As an example of measure-theoretic isomorphism, it Given the pair (P;), one defines the measure of may be seen that the doubling map is isomorphic to the n acylindersetby([am :::an] ) D a Pa a C ::: m m m m 1 one-sided Bernoulli shift on f0; 1g with p0 D p1 D 1/2 Pan1 an and extends as before to a probability measure N Z (the map g takes an x 2 [0; 1) to the sequence of 0’s and on A or A . 1’s in its binary expansion (choosing the sequence ending Example 6 (Hard Sphere Gases and Billiards) We wish to with 0’s, for example, if x is of the form p/2n ) and the in- model the behavior of a gas in a bounded region. We make verse map h takesasequenceof0’sand1’stothepointin the assumption that the gas consists of a large number N [0; 1) with that binary expansion.) 228 Ergodicity and Mixing Properties

Given a measure-preserving transformation T of be the -algebra generated by sets of the form A¯n D a probability space (X; B;), T is associated to an isom- fx¯ 2 X : xn 2 Ag, ¯(A¯n) D (A)andT(x0; x1;:::) D 2 etry of L (X; B;)byUT ( f ) D f ı T.Thisoperatoris (T(x0); x0; x1;:::). The transformation T of (X; B; )is known as the Koopman Operator.InthecasewhereT is called the natural extension of the transformation T of invertible, the operator UT is unitary. Two measure-pre- (X; B;)(seethechapteron ErgodicTheory:BasicEx- serving transformations T and S of (X; B;)and(Y; F;) amples and Constructions for more details). In situations are spectrally isomorphic if there is a Hilbert space iso- where one wants to use invertibility, it is often possible to morphism ‚ from L2(X; B;)toL2(Y; F;)suchthat pass to the natural extension, work there and then derive ‚ ı UT D US ı ‚. As we shall see below, spectral isomor- conclusions about the original non-invertible transforma- phism is a strictly weaker property than measure-theoretic tion. isomorphism. Since in ergodic theory, measure-theoretic isomor- Ergodicity phism is the basic notion of sameness, all properties that are used to describe measure-preserving systems are re- Given a measure-preserving transformation T : X ! X, 1 1 c c quired to be invariant under measure-theoretic isomor- if T A D A,thenT A D A also. This allows us to c phism (i. e. if two measure-preserving transformations are decompose the transformation X into two pieces A and A measure-theoretically isomorphic, the first has a given and study the transformation T separately on each. In fact 1 property if and only if the second does). On the other the same situation holds if T A and A agree up to a set hand,weshallseethatsomemixing-typepropertiesarein- of measure 0. For this reason, we call a set A invariant if 1 variant under spectral isomorphism, while others are not. (T AA) D 0. If a property is invariant under spectral isomorphism, we Returning to Boltzmann’s ergodic hypothesis, exis- saythatitisaspectral property. tence of an invariant set of measure between 0 and 1 would There are a number of mixing type properties that be a bad situation as his essential idea was that the orbit of occur in the probability literature (˛-mixing, ˇ-mixing, a single point would ‘see’ all of X,whereasifX were de- -mixing, -mixing etc.) (see Bradley’s survey [12]for composed in this way, the most that a point in A could see c a description of these conditions). Many of these are would be all of A, and similarly the most that a point in A c stronger than the Bernoulli property, and are therefore not could see would be all of A . preserved under measure-theoretic isomorphism. For this A measure-preserving transformation will be called er- reason, these properties are not widely used in ergodic the- godic if it has no non-trivial decomposition of this form. ory, although ˇ-mixing turns out to be equivalent to the More formally, let T be a measure-preserving transforma- so-called weak Bernoulli property (which turns out to be tion of a probability space (X; B;). The transformation T stronger than the Bernoulli property that we discuss in this is said to be ergodic if for all invariant sets, either the set or article – see Smorodinsky’s paper [87]) and ˛-mixing is its complement has measure 0. equivalent to strong-mixing. Unlike the remaining concepts that we discuss in this A basic construction (see the article on  Ergodic The- article, this definition of ergodicity applies also to infinite ory: Basic Examples and Constructions) that we shall re- measure-preserving transformations and even to certain quire in what follows is the product of a pair of measure- non-measure-preserving transformations. See Aaronson’s preserving transformations: given transformations T of book [1] for more information. (X; B;)andS of (Y; F;), we define the product trans- The following lemma is often useful: formation T S :(X Y; B˝F;)by(T S)(x; y) D Lemma 1 Let (X; B;) be a probability space and let T : ; (Tx Sy). X ! X be a measure-preserving transformation. Then T is One issue that we face on occasion is that it is some- ergodic if and only if the only measurable functions f sat- times convenient to deal with invertible measure-preserv- isfying f ı T D f (up to sets of measure 0) are constant ing transformations. It turns out that given a non-invert- almost everywhere. ible measure-preserving transformation, there is a nat- ural way to uniquely associate an invertible measure- For the straightforward proof, we notice that if the condi- preserving transformation transformation sharing almost tion in the lemma holds and A is an invariant set, then all of the ergodic properties of the original transfor- 1A ı T D 1A almost everywhere, so that 1A is an a. e. mation. Specifically, given a non-invertible measure-pre- constant function and so A or Ac is of measure 0. Con- serving transformation T of (X; B;), one lets X D versely, if f is an invariant function, we see that for each ˛, f(x0; x1;:::): xn 2 X and T(xn) D xn1 for all ng, B fx : f (x) <˛g is an invariant set and hence of measure 0 Ergodicity and Mixing Properties 229 or 1. It follows that f is constant almost everywhere. We that works simultaneously for all L1 functions. In the case remark for future use that it is sufficient to check that the where X is a compact metric space, it is well known that bounded measurable invariant functions are constant. C(X), the space of continuous functions on X with the uni- The following corollary of the lemma shows that er- form norm has a countable , ( fn)n1 say. If the godicity is a spectral property. invariant measure is ergodic, then for each n,thereis asetB of measure 1 such that for all x 2 B , A f (x) ! Corollary 2 Let T be a measure-preserving transformation R n T n N n f d. Letting B D B ,oneobtainsafullmeasure of the probability space (X; B;).ThenTisergodicifand n n n R setsuchthatforalln and all x 2 B, AN fn(x) ! fn d. only if 1 is a simple eigenvalue of UT . A simple approximation argument thenR shows that for all The ergodic theorems mentioned earlier due to von Neu- x 2 B and all f 2 C(X), AN f (x) ! f d.Apointx mann and Birkhoff are the following (see also the chapter with this property is said to be generic for .Theobserva- on  Ergodic Theorems). tions above show that for an ergodic invariant measure , Theorem 3 (von Neumann Mean Ergodic Theorem [91]) we have fx : x is generic for gD1. n Let T be a measure-preserving transformation of the prob- If T is ergodic, but T is not ergodic for some n,then 2 ;:::; ability space (X; B;).Forf 2 L (X; B;),letAN f D one can show that the space X splits up as A1 Ad for 1/N( f C f ı T C C f ı T N1).Thenforallf 2 some djn in such a way that T(Ai ) D AiC1 for i < d 2 2 D n L (X; B;),AN fconvergesinL to an invariant function and T(Ad ) A1 with T acting ergodically on each Ai. n f . The transformation T is totally ergodic if T is ergodic for all n 2 N. One can check that a non-invertible transfor- Theorem 4 (Birkhoff Pointwise Ergodic Theorem [6]) mation T is ergodic if and only if its natural extension is Let T be a measure-preserving transformation of the prob- ergodic. ability space (X; B;).Letf 2 L1(X; B;).LetA fbeas N The following lemma gives an alternative characteriza- above. Then for -almost every x 2 X, (A f (x)) is a con- N tion of ergodicity, which in particular relates it to mixing. vergent sequence. Of these two theorems, the pointwise ergodic theorem is Lemma 5 (Ergodicity as a Mixing Property) Let T the deeper result, and it is straightforward to deduce the be a measure-preserving transformation of the probability B mean ergodic theorem from the pointwise ergodic theo- space (X; ;). Then T is ergodic if and only if for all f 2 rem. The mean ergodic theorem was reproved very con- and g in L , cisely by Riesz [71] and it is this proof that is widely known XN now. Riesz’s proof is reproduced in Parry’s book [63]. 1 h f ; g ı Tni!hf ; 1ih1; gi: There have been many different proofs given of the point- N nD0 wise ergodic theorem. Notable amongst these are the argu- ment due to Garsia [23] and a proof due to Katznelson and P In particular, if T is ergodic, then (1/N) N1 (A \ Weiss [40] based on work of Kamae [35], which appears in nD0 Tn B) ! (A)(B) for all measurable sets A and B. a simplified form in work of Keane and Petersen [42]. If the measure-preserving transformation T is ergodic, Proof Suppose that T is ergodic. ThenP the left-hand side then by virtue of Lemma 1, the limit functions appearing N1 n of the equality is equal to h f ; (1/N) nD0 g ı T i.The in the ergodic theorems are constant. One sees that the mean ergodic theorem shows that the second termR con- constant is simply the integral of f with respectR to ,so verges in L2 to the constant function with value g d D that in this situation AN f (x) converges to f d in norm hg; 1i, and the equality follows. and pointwise almost everywhere, thereby providing a jus- Conversely, if the equation holds for all f and g in L2, tification of Boltzmann’s original claim: for ergodic mea- suppose that A is an invariant set. Let f D g D 1A.Then sure-preserving transformations, time averages agree with n since g ıT D 1A for all n,theleft-handsideish1A; 1AiD spatial averages.InthecasewhereT is not ergodic, it is (A). On the other hand, the right-hand side is (A)2,so also possible to identify the limit in the ergodic theorems: that the equation yields (A) D (A)2,and(A)iseither E I I we have f D ( f j ), where is the -algebra of T-in- 0or1asrequired. variant sets. Taking f D 1A and g D 1B for measurable sets A Note that the set on which the almost everywhere con- and B gives the final statement. vergence in Birkhoff’s theorem takes place depends on the L1 function f that one is considering. Straightforward con- We now examine the ergodicity of the examples presented siderations show that there is no single full measure set above. Firstly, for the rotation of the circle, we claim that 230 Ergodicity and Mixing Properties

the transformation is ergodic if and only if the ‘angle’ ˛ is tion, it is necessary to decompose the measure into ergodic irrational.Toseethis,weargueasfollows.If˛ D p/q, pieces. This is known as ergodic decomposition. then we see that f (x) D e2 iqx is a non-constant R˛- The set of invariant measures for a measurable map T invariant function, and hence R˛ is not ergodic. On the of a measurable space (X; B) to itself forms a simplex. other hand, if ˛ is irrational, suppose f is a bounded mea- General functional analytic considerations (due to Cho- surable invariant function. Since f is bounded, it is an L2 quet [14,15] – see also Phelps’ account [66] of this theory) 2 function, andP so f may be expressed in L as a Fourier se- mean that it is possible to write any member of the simplex 2 inx ries: f D n2Z cPn en where en(x) D e .Wethen as an integral-convex combination of the extreme points. 2 in˛ see that f ı R˛ D n2Z e cn en.Inorderforf to be Further, the extreme points of the simplex may be iden- equal in L2 to f ı R˛, they must have the same Fourier tified as precisely the ergodic invariant measures for T.It 2 in˛ coefficients, so that cn D e cn for each n.Since˛ is follows that any invariant probability measure for T may irrational, this forces cn D 0foralln ¤ 0, so that f is be uniquely expressed in the form constant as required. Z The doubling map and the Bernoulli shift are both er- (A) D (A)dm() ; ; godic, although we defer proof of this for the time be- Merg(X T) ing, since they in fact have the strong-mixing property. A Markov chain with matrix P and vector is ergodic if where Merg(X; T) denotes the set of ergodic T-invariant measures on X and m is a measure on Merg(X; T). and only if for all i and j in A with i > 0andj > 0, n We will give a proof of this theorem in the special case there exists an n 0withPij > 0. This follows from the ergodic theorem for Markov chains (which is derived from of a continuous transformation of a compact space. Our the Strong Law of Large Numbers) (see [18]fordetails).In proof is based on the Birkhoff ergodic theorem and the particular, if the underlying Markov chain is irreducible, Riesz Representation Theorem identifying the dual space then the measure is ergodic. of the space of continuous functions on a compact space In the case of interval exchange transformations, there as the set of bounded signed measures on the space (see is a simple necessary condition on the permutation for Rudin’s book [75] for details). We include it here because irreducibility, namely for 1 j < k,wedonothave this special case covers many cases that arise in practice, f1;:::;jgDf1;:::;jg. Under this condition, Ma- and because few of the standard ergodic theory references sur [49] and Veech [88] independently showed that for al- include a proof of ergodic decomposition. An exception to this is Rudolph’s book [76] which gives a full proof in the most all values of the sequence of lengths (`i )1ik,the interval exchange transformation is ergodic. (In fact they case that X is a Lebesgue space. This is based on a detailed showed the stronger condition of unique ergodicity:that development of the theory of these spaces and builds mea- the transformation has no other invariant measure than sures using conditional exceptions. Kalikow’s notes [32] Lebesgue measure. This implies that Lebesgue measure is give a brief outline of a proof similar to that which follows. ergodic, because if there were a non-trivial invariant set, Oxtoby [62] also wrote a survey article containing much then the restriction of Lebesgue measure to that set would of the following (and much more besides). be another invariant measure). Theorem 6 Let X be a compact metric space, B be the For the hard sphere systems, there are no results on Borel -algebra, be an invariant Borel probability mea- ergodicity in full generality. Important special cases have sure and T be a continuous measure-preserving transfor- been studied by Sinai [84], Sinai and Chernov [86], Krámli, mation of (X; B;).Thenforeachx 2 X, there exists an Simányi and Szász [45], Simányi and Szász [81], Sim- invariant Borel measure x such that: nyi [79,80]andYoung[95]. R RR 1 1. For f 2 L (X; B;), f d D f dx d(x); 1 2. Given f 2 L (XR; B;),for-almost every x 2 X, one Ergodic Decomposition has AN f (x) ! f dx ; We already observed that if a transformation is not er- 3. The measure x is ergodic for -almost every x 2 X. godic, then it may be decomposed into parts. Clearly if these parts are not ergodic, they may be further decom- Notice that conclusion (2) shows that x can be under- posed. It is natural to ask whether the transformation can stood as the distribution on the phase space “seen” if one be decomposed into ergodic parts, and if so what form starts the system in an initial condition of x.Thisinter- does the decomposition take? In fact such a decomposi- pretation of the measures x corresponds closely with the tion does exist, but rather than decompose the transforma- ideas of Boltzmann and the Ehrenfests in the formulation Ergodicity and Mixing Properties 231 of the ergodic and quasi-ergodic hypotheses, which can be Ris alsoT measurable, and by monotone convergenceT we see seen as demanding that x is equal to for (almost) all x. x ( Uk )d(x) D 0. ItT follows that x ( Uk ) D 0 for -almost every x.Since Uk C, the lemma follows. Proof The proof will be divided into 3 main steps: defin- ing the measures , proving measurability with respect x Given a set A 2 B,letf be a sequence of con- to x and proving ergodicity of the measures. k tinuous functions (uniformly bounded by 1) satisfying n Step 1: Definition of x k fk 1AkL1() < 2 ,sothatinparticularR fk(x) ! 1 B Given a function f 2 L (X; ;), Birkhoff’s theorem 1A(x)for-almost every x.Foreachk, x 7! fk dx D states that for -almost every x 2 X,(AN f (x)) is con- limn!1 An fk(x) is a measurable function. By Lemma 7, ˜ vergent. It will be convenient to denote the limit by f (x). for -almost every x, fk ! 1A x-almost everywhere, Let f1; f2;::: be a sequence of continuous functions that soR that by the bounded convergence theorem limk!1 is dense in C(X). For each k,thereisasetBk of x’s mea- fk dx D x (A)for-almost every x. Since the limit 1 sure 1 for which (An fk(x))nD1 is a convergent sequence. of measurable functions is measurable, it follows that x 7! Intersecting these gives a set B of full measure such that x (A) is measurable for any measurable set A 2 B. for x 2 B,foreachk 1, An fk(x) is convergent. A simple R This allows us to define a measure by (A) D approximation argument shows that for x 2 B and f an ar- x (RA)d(x).R ForR a bounded measurable function f ,we bitrary , An f (x) is convergent. Given haveR f d D ( f dx )d(x). Since this agrees with ˜ x 2 B,defineamapLx : C(X) ! R by Lx ( f ) D f (x). f d for continuous functions by (1), it follows that This is a continuous linear functional on C(X), and hence D . Conclusion (1) of the theorem now follows eas- by the Riesz Representation TheoremR there exists a Borel ily. ˜ 1 measure x such that f (x) D f dx for each f 2 C(X) Given f 2 L (X), we let ( fk ) be a sequence of contin- and x 2 B.SinceLx ( f ) 0whenf is a non-negative uous functions such that k fk f kL1() is summable. This function and L (1) D 1, the measure is a probability 1 x x implies that k fk f kL (R x ) is summableR for -almost ev- measure. Since Lx ( f ı T) D Lx ( f )for f 2 C(X), one can ery x and in particular, fk dx ! f dx for almost check that x must be an invariant probability measure. every x. On the other hand, by the remark following the c ˜ For x 62 B,simplydefinex D .SinceB is a set of mea- statement of Birkhoff ’s theorem, we have fk D E( fk jI) ˜ ˜ ˜ ˜ sure 0, this will not affect any of the statements that we are so that k f fkkL1() is summable and fk(x) ! f (x)for trying to prove. -almost every x. Combining these two statements, we see Now for f continuous, we have AN f is a boundedR se- that for -almost every x,wehave quence of functions withR AN f (x) convergingR to f dx Z Z D almost everywhere and AN f d f d since T is ˜ ˜ f (x) D lim fk(x) D lim fk dx D f dx : measure-preserving. It follows from the bounded conver- k!1 k!1 gence theorem that for f 2 C(X), Z Z Z This establishes conclusion (2) of the theorem. f d D f dx d(x) : (1) Step 3: Ergodicity of x We have shown how to disintegrate the invariant mea- Step 2: Measurability of x 7! x (A) sure as an integral combination of x’s, and we have in- 2 B D D Lemma 7 Let C satisfy (C) 0.Then x (C) 0 terpreted the x’s as describing the average behavior start- 2 for -almost every x X. ing from x. It remains to show that the x’s are ergodic Proof Using regularity of Borel probability measures (see measures. Fix for now a continuous function f and a number 0 < Rudin’s book [75] for details), there exist open sets U1 <1. Since A f (x) ! f˜(x) -almost everywhere, there U2 C with (Uk ) < 1/k.Thereexistcon- n ˜ 3 1 exists an N such that fx : jAN f (x) f (x)j >/2g < /8. tinuous functions gk;m with (gk;m(x))mD1 increasing to ; c We now claim the following: 1Uk everywhereR (e.R g. gk m(x) D min(1; m d(x; Uk ))). By (1), weR have ( gk;m dx )d(x) < 1/k for all k; m. ˚ R ˜ Note that gk;m dx D limn!1 An gk;m(x)isamea- x : x fy : j f (y) f dx j >g > <: (2) surable function of x, so that using the monotoneR conver- R ˜ gence theorem (taking the limitR R in m), x 7! 1Uk dx D To see this, note that fy : j f (y) f dx j >g ˜ ˜ x (Uk )ismeasurableand ( 1Uk dx )d(x) T 1/k. fy : j f (y)AN f (y)j >/2Rg[fy : jAN f (y) f (x)j >/2g, ˜ We now see that x 7! limk!1 x (Uk ) D x ( Uk ) so that if x fy : j f (y) f dx j >g >, then either 232 Ergodicity and Mixing Properties

˜ x fy : j f (y)AN f (y)j >/2g >/2 or x fy : jAN f (y) In order for T to be strong-mixing,werequiresimply f˜(x)j >/2g >/2. We show that the set of x’s satisfying (A \ TN B) ! (A)(B)asN !1. It is clear that each condition is small. strong-mixing implies weak-mixing and weak-mixing im- 3 ˜ Firstly,R we have /8 >fy : j f (y) AN f (y)j > plies ergodicity. ˜ d d /2gD x fy : j f (y) AN f (y)j >/2g d(x), so that If T is not ergodic (so that T A D A for some A of ˜ 2 nd fx : x fy : j f (y) AN f (y)j >/2g >/2g </4 < measure strictly between 0 and 1), then j(T A \ A) /2. (A)2jD(A)(1 (A)), so that T is not weak-mixing. For the second term, given c 2 R,letFc (x) D An alternative characterization of weak-mixing is as jAN f (x) cj and G(x) D Ff˜(x)(x). Note that follows: Z Lemma 8 The measure-preserving transformation T is F ˜ (y)dx (y) D lim An F ˜ (x) D lim AnG(x) weak-mixing if and only if for every pair of measurable f (x) n!1 f (x) n!1 sets A and B, there exists a J of N of density 1 (i. e. #(J \f1;:::;Ng)/N ! 1)suchthat (using the facts that y 7! Ff˜(x)(y) is a continuous ˜ function and that since f (x) isR an invariant function, \ n D : k k 3 lim (A T B) (A) (B) (3) Ff˜(x)(T x) DR G(T x)). Since G(x)d(x) </8, it n!1 n62J 2 follows that Ff˜(x)(y)dx (y) /4 except on a set of x’s of measure less than /2. Outside this bad set, we By taking a countable family of measurable sets that are ˜ have x fy : jAN f (y) f (x)j >/2g </2 so that dense (with respect to the metric d(A; B) D (AB)) and ˜ fx : x fy : jAN f (y) f (x)j >/2g >/2g </2 as taking a suitable intersection of the corresponding J sets, required. one shows that for a given weak-mixing measure-preserv- This establishes our claim (2) above. Since >0is ing transformation, there is a single set J N such that arbitrary, it follows that for each f 2 C(X), forR -almost (3)holdsforall measurable sets A and B (see Petersen [64] ˜ every x, x-almost every y satisfies f (y) D f dx .As or Walters [92] for a proof). usual, taking a countable dense sequence ( fk)inC(X), it We show that an of the circle is not ˜ R Q isR the case that for all k and -almost every x, fk (y) D weak-mixing as follows: let ˛ 2 n and let A be the 1 3 fk dx x-almost everywhere. Let the set of x’s with this interval [ 4 ; 4 ). There is a positive proportion of n’s in the property be D.Weclaimthatforx 2 D, x is ergodic. natural numbers (in fact proportion 1/3) with the property n 1 1 1 n 1 Suppose not. Then let x 2 D and let J be an invariant set that jT ( 2 ) 2 j < 6 .Forthesen’s (A \ T A) > 3 , n 1 of x measure between ı and 1 ı for some ı>0. Then so that in particular j(A \ T A) (A)(A)j > 12 . 1 by density of C(X)inL (x ), there exists an f k with k fk Clearly this precludes the required convergence to 0 in the 1 1JkL (x ) <ı.Since1J is an invariant function, we have definition of weak-mixing, so that an irrational rotation is ˜ n 1˜J D 1J. On the other hand, fk is a constant function. It ergodic but not weak-mixing. Since R˛ D Rn˛,theearlier n ˜ ˜ 1 ˛ follows that k fk 1J kL (x ) ı>k fk 1JkL1( x ).This argument shows that R˛ is ergodic, so that R is totally contradicts the identification of the limit as a conditional ergodic. expectation and concludes the proof of the theorem. On the other hand, we show that any Bernoulli shift is strong-mixing. To see this, let A and B be arbitrary mea- surable sets. By standard measure-theoretic arguments, A Mixing and B may each be approximated arbitrarily closely by a fi- nite union of cylinder sets. Since if A0 and B0 are finite As mentioned above, ergodicity may be seen as an in- unions of cylinder sets, we have that (A0 \ Tn B0)is dependence on average property. More specifically, one equal to (A0)(B0)forlargen,itiseasytodeducethat n wants to know whether in some sense (A \ T B)con- (A \ Tn B) ! (A)(B) as required. Since the dou- verges to (A)(B)asn !1. Ergodicity is the property bling map is measure-theoretically isomorphic to a one- that there is convergence in the Césaro sense. Weak-mix- sided Bernoulli shift, it follows that the doubling map is ing is the property that there is convergence in the strong also strong-mixing. Césaro sense. That is, a measure-preserving transforma- Similarly, if a Markov Chain is irreducible (i. e. for any tion T is weak-mixing if n states i and j,thereexistsann 0suchthatPij > 0) and aperiodic (there is a state i such that gcdfn: Pn > 0gD1), NX1 ii 1 then given any pair of cylinder sets A0 and B0 we have by j(A\Tn B)(A)(B)j!0asN !1: N standard theorems of Markov chains (A0 \ Tn B0) ! nD0 Ergodicity and Mixing Properties 233

(A0)(B0). The same argument as above then shows that hand, work of Katok [36] shows that no interval exchange an aperiodic irreducible Markov Chain is strong-mixing. transformation is strong-mixing. On the other hand, if a Markov chain is periodic (d D It is of interest to understand the behavior of the ‘typi- n gcdfn: Pii > 0g > 0), then letting A D B Dfx : x0 D ig, cal’ measure-preserving transformation. There are a num- we have that (A\ Tn B) D 0 whenever d − n. It follows ber of Baire category results addressing this. In order to Td is not ergodic, so that T is not weak-mixing. state them, one needs a set of measure-preserving transfor- Both weak- and strong-mixing have formulations in mations and a on them. As mentioned earlier, it terms of functions: is effectively no restriction to assume that a transformation is a Lebesgue-measurable map on the unit interval preserv- Lemma 9 Let T be a measure-preserving transformation ing Lebesgue measure. The classical category results are of the probability space (X; B;). then on the collection of invertible Lebesgue-measure pre- 1. T is weak-mixing if and only if for every f ; g 2 L2 one serving transformations of the unit interval. One topology has on these is the ‘weak’ topology, where a sub-base is given by sets of the form N(T; A;) DfS : (S(A)T(A)) <g. NX1 1 With respect to this topology, Halmos [26] showed that jh f ; g ı Tnihf ; 1ih1; gij ! 0 as N !1: N a residual set (i. e. a dense Gı set) of invertible measure- nD0 preserving transformations is weak-mixing (see also work 2. T is strong-mixing if and only if for every f ; g 2 L2,one of Alpern [3]), while Rokhlin [72] showed that the set of has strong-mixing transformations is meagre (i. e. a nowhere dense F set), allowing one to conclude that with respect h f ; g ı T N i!hf ; 1ih1; gi as N !1: to this topology, the typical transformation is weak- but not strong-mixing. As often happens in these cases, even when a certain Using this, one can see that both mixing conditions are kind of behavior is typical, it may not be simple to exhibit spectral properties. concrete examples. In this case, a well-known example of Lemma 10 Weak- and strong-mixing are spectral proper- a transformation that is weak-mixing but not strong-mix- ties. ing was given by Chacon [13]. While on the face of it the formulation of weak-mix- Proof Suppose S is a weak-mixing transformation of ing is considerably less natural than that of strong-mixing, (Y; F;) and the transformation T of (X; B;)is the notion of weak-mixing turns out to be extremely nat- spectrally isomorphic to S by the Hilbert space iso- ural from a spectral point of view. Given a measure-pre- morphism ‚.Thenfor f ; g 2 L2(X; B;), h f ; g ı serving transformation T,letU be the Koopman opera- Tni hf ; 1i h1; gi Dh‚( f );‚(g) ı Sni T X X X Y tor described above. Since this operator is an isometry, any h‚( f );‚(1)i h‚(1);‚(g)i . Since 1 is an eigenfunction Y Y eigenvalue must lie on the unit circle. The constant func- of U with eigenvalue 1, ‚(1) is an eigenfunction of U T S tion 1 is always an eigenfunction with eigenvalue 1. If T is with an eigenvalue 1, so since S is ergodic, ‚(1) must ergodic and g and h are eigenfunctions of U with eigen- be a constant function. Since ‚ preserves norms, ‚(1) T value ,thengh¯ is an eigenfunction with eigenvalue 1, must have a constant value of absolute value 1 and hence hence invariant, so that g D Kh for some constant K.We h f ; g ı Tni hf ; 1i h1; gi Dh‚( f );‚(g) ı Sni X X X Y see that for ergodic transformations, up to rescaling, there h‚( f ); 1i h1;‚(g)i . It follows from Lemma 9 that T is Y Y is at most one eigenfunction with any given eigenvalue. weak-mixing. If UT has a non-constant eigenfunction f ,thenone A similar proof shows that strong-mixing is a spectral n 2 has jhUT f ; f ij D k f k for each n, whereas by Cauchy– property. 2 2 n Schwartz, jh f ; 1ij < k f k . It follows that jhUT f ; f i Both weak- and strong-mixing properties are preserved by h f ; 1ih1; f ij c for some positive constant c,sothatusing taking natural extensions. Lemma 9, T is not weak-mixing. Recent work of Avila and Forni [4]showsthatforin- Using the spectral theorem, the converse is shown to terval exchange transformations of k 3 intervals with hold. the underlying permutation satisfying the non-degeneracy condition above, almost all divisions of the interval (with Theorem 11 The measure-preserving transformation T is respect to Lebesgue measure on the k1-dimensional sim- weak-mixing if and only UT has no non-constant eigen- plex) lead to weak-mixing transformations. On the other functions. 234 Ergodicity and Mixing Properties

Of course this also shows that weak-mixing is a spectral ing on (X; B0;) is measure-theoretically isomorphic to property. Equivalently, this says that the transformation T arotationonacompactgroup.Thisallowsonetosplit 2 B 2 B0 2 B is weak-mixing if and only if the apart from the constant L (X; ;)asL (X; ;) ˚ Lc (X; ;), where, as men- eigenfunction, the operator UT has only continuous spec- tioned above the first part is the discrete spectrum part, trum (that is, the operator has no other eigenfunctions). spanned by eigenfunctions, and the second part is the con- For a very nice and concise development of the part of tinuous spectrum part, consisting of functions whose spec- spectral theory relevant to ergodic theory, the reader is re- tral measure is continuous. Since we have split L2 into ferred to the Appendix in Parry’s book [63]. a discrete part and a continuous part, it is natural to ask Using this theory, one can establish the following: whether the underlying transformation T can be split up in some way into a weak-mixing part and a discrete spectrum Theorem 12 (compact group rotation) part, somewhat analogously to 1. T is weak-mixing if and only if T Tisergodic; the ergodic decomposition. Unfortunately, there is no- 2. If T and S are ergodic, then T S is ergodic if and only if such decomposition available. However for some appli- US and UT have no common eigenvalues other than 1. cations, for example to multiple recurrence (starting with the work of Furstenberg [20,21]), the decomposition of L2 (possibly into more complicated parts) plays a crucial role Proof The main factor in the proof is that the eigenvalues (see the chapters on  Ergodic Theory: Recurrence and of U are precisely the set of ˛ˇ,where˛ is an eigen- T S  Ergodic Theory: Interactions with Combinatorics and value of U and ˇ is an eigenvalue of U .Further,the T S Number Theory). eigenfunctions of U with eigenvalue are spanned by T S For non-invertible measure-preserving transforma- eigenfunctions of the form f ˝ g,wheref is an eigenfunc- tions, the transformation is weak- or strong-mixing if and tion of U , g is an eigenfunction of U ,andtheproductof T S only if its natural extension has that property. the eigenvalues is . The understanding of weak-mixing in terms of the dis- Suppose that T is weak-mixing. Then the only eigen- crete part of the spectrum of the operator also extends to function is the constant function, so that the only eigen- total ergodicity. Tn is ergodic if and only if T has no eigen- function of U is the constant function, proving that T T values of the form e2 ip/n other than 1. From this it fol- T T is ergodic. Conversely, if U has an eigenvalue (so T lows that an ergodic measure-preserving transformation T that f ı T D ˛T for some non-constant f )thenf ˝ f¯ is is totally ergodic if and only if it has no rational spectrum a non-constant invariant function of T T so that T T (i. e. no eigenvalues of the form e2 ip/q other than the sim- is not ergodic. ple eigenvalue 1). For the second part, if U and U have a common S T An intermediate mixing condition between strong- eigenvalue other than 1 (say f ı T D ˛ f and g ı T D ˛g), and weak- mixing is that a measure-preserving transfor- then f ˝ g¯ is a non-constant invariant function. Con- mation is mild-mixing if whenever f ı Tn i ! f for an versely, if T S has a non-constant invariant function h, L2 function f and a sequence n !1,thenf is a.e. then h can be decomposed into functions of the form f ˝g, i constant. Clearly mild-mixing is a spectral property. If where f and g are eigenfunctions of U and U respec- T S a transformation has an eigenfunction f , then it is straight- tively with eigenvalues ˛ and ˇ satisfying ˛ˇ D 1. Since forward to find a sequence n such that f ı Tn i ! f , the eigenvalues of S are closed under complex conjuga- i so we see that mild-mixing implies weak-mixing. To see tion, we see that UT and US have a common eigenvalue that strong-mixing implies mild-mixing, suppose that T other than 1 as required. n isR strong-mixing and that f ı T i ! f .Thenwehave n 2 For a measure-preserving transformation T,weletK f ı T i f¯ !kf k . OnR the other hand, the strong mixing bethesubspaceofL2 spanned by the eigenfunctions of property implies that f ı Tn i f¯ !jhf ; 1ij2.Theequality UT. It is a remarkable fact that K may be identified as of these implies that f is a. e. constant. Mild-mixing has L2(X; B0;)whereB0 is a sub--algebra of B.ThespaceK a useful reformulation in terms of ergodicity of general is called the Kronecker factor of T. The terminology comes (not necessarily probability) measure-preserving transfor- fromthefactthatanysub--algebra F of B gives rise to mations: A transformation T is mild-mixing if and only if a factor mapping :(X; B;) ! (X; F;)with(x) D for every conservative ergodic measure-preserving trans- x.ByconstructionL2(X; B0;) is the closed linear span formation S, T S is ergodic. See Furstenberg and Weiss’ of the eigenfunctions of T considered as a measure-pre- article [22] for further information on mild-mixing. serving transformation of (X; B0;). By the Discrete Spec- The strongest spectral property that we consider is trum Theorem of Halmos and von Neumann [27], T act- that of having countable Lebesgue spectrum. While we Ergodicity and Mixing Properties 235 will avoid a detailed discussion of spectral theory in this Newton [52]) have countable Lebesgue spectrum but zero article, this is a special case that can be described sim- entropy. ply. Specifically, let T be an invertible measure-preserv- The fact that (two-sided) Bernoulli shifts have the K ing transformation. Then T has countable Lebesgue spec- propertyW follows from Kolmogorov’s 0–1 law by taking F 1 nP P trum if there is a sequence of functions f1; f2;::: such that D nD0 T ,where is the partition into cylinder n Z N f1g[fUT f j : n 2 ; j 2 g forms an orthonormal basis sets (see Williams’s book [93]fordetailsofthe0–1law). for L2(X). Although the K property is explicitly an invertible To see that this property is stronger than strong- property, it has a non-invertible counterpart, namely t n B mixing, we simply observe that it implies that hUT UT f j; Texactness. A transformation T of (X; ;)isexact if m 1 nB UT fk i!0ast !1. Then by approximating f and g nD0 T consists entirely of null sets and sets of mea- by their expansions with respect to a finite part of the ba- sure 1. It is not hard to see that a non-invertible transfor- n sis, we deduce that hUT f ; gi!hf ; 1ih1; gi as required. mation is exact if and only if its natural extension is K. Since already strong-mixing is atypical from the topologi- The final and strongest property in our list is that of be- cal point of view, it follows that countable Lebesgue spec- ing measure-theoretically isomorphic to a Bernoulli shift. trum has to be atypical. In fact, Yuzvinskii [96]showed If T is measure-theoretically isomorphic to a Bernoulli that the typical invertible measure-preserving transforma- shift, we say that T has the Bernoulli property.Whilein tion has simple singular spectrum. principle this could apply to both invertible and non-in- The property of countable Lebesgue spectrum is by vertible transformations, in practice the definition applies definition a spectral property. Since it completely describes to a large class of invertible transformations, but occurs the transformation up to spectral isomorphism, there can comparatively seldom for non-invertible transformations. be no stronger spectral properties. The remaining prop- For this reason, we will restrict ourselves to a discussion of erties that we shall examine are invariant under measure- the Bernoulli property for invertible transformations (see theoretic isomorphisms only. however work of Hoffman and Rudolph [29]andHeicklen An invertible measure-preserving transformation T of and Hoffman [28] for work on the one-sided Bernoulli (X; B;) is said to be K (for Kolmogorov) if there is a sub- property). -algebra F of B such that In the case of invertible Bernoulli shifts, Orn- T stein [53,58] developed in the early 1970s a powerful iso- 1. 1 TnF is the trivial -algebra up to sets of mea- nD1 morphism theory, showing that two Bernoulli shifts are sure 0 (i. e. the intersection consists only of null sets and measure-theoretically isomorphic if and only if they have sets of full measure). W the same entropy. Entropy had already been identified as 2. 1 TnF D B (i. e. the smallest -algebra contain- nD1 an invariant by Kolmogorov and Sinai [43,82], so this es- ing TnF for all n > 0isB). tablished that it was a complete invariant for Bernoulli The K property has a useful reformulation in terms of en- shifts. Keane and Smorodinsky [41] gave a proof which tropy as follows: T is K if and only if for every non-trivial showed that two Bernoulli shifts of the same entropy are partition P of X,theentropyofT with respect to the par- isomorphic using a conjugating map that is continuous tition P is positive: T has completely positive entropy.See almost everywhere. With other authors, this theory was the chapter on  Entropy in Ergodic Theory for the rel- extended to show that the property of being isomorphic evant definitions. The equivalence of the K property and to a Bernoulli shift applied to a surprisingly large class of completely positive entropy was shown by Rokhlin and measure-preserving transformations (e. g. geodesic flows Sinai [74]. For a general transformation T,onecancon- on manifolds of constant negative (Ornstein sider the collection of all B of X such that with re- and Weiss [60]), aperiodic irreducible Markov chains c spect to the partition PB DfB; B g, h(PB) D 0. One can (Friedman and Ornstein [19]), toral automorphisms show that this is a -algebra. This -algebra is known as (Katznelson [39]) and more generally many Gibbs mea- the Pinsker -algebra. The above reformulation allows us sures for hyperbolic dynamical systems (see the book of to say that a transformation is K if and only if it has a trivial Bowen [11])). Pinsker -algebra. Initially, it was conjectured that the properties of be- The K property implies countable Lebesgue spectrum ing K and Bernoulli were the same, but since then a num- (see Parry’s book [63] for a proof). To see that K is not im- ber of measure-preserving transformations that are K but plied by countable Lebesgue spectrum, we point out that not Bernoulli have been identified. The earliest was due to certain measure-preserving transformations derived from Ornstein [55]. Ornstein and Shields [59]thenprovidedan Gaussian systems (see for example the paper of Parry and uncountable family of non-isomorphic K automorphisms. 236 Ergodicity and Mixing Properties

Katok [37] gave an example of a smooth diffeomorphism Bergelson [5] generalized this by showing that that is K but not Bernoulli; and Kalikow [33] gave a very p1(n) pk (n) natural probabilistic example of a transformation that has lim A0 \ T A1 \\T Ak n!1;n62J this property (the T, T1 process). Yk While in systems that one regularly encounters there D (Ai ) is a correlation between positive entropy and the stronger iD0 mixing properties that we have discussed, these properties whenever p1(n);:::;pk (n) are non-constant integer-val- are logically independent (for example taking the prod- ued polynomials such that pi (n) p j (n) is unbounded for uct of a Bernoulli shift and the identity transformation i ¤ j. The method of proof of both of these results was gives a positive entropy transformation that fails to be er- a Hilbert space version of the van der Corput inequality of godic; also, the zero entropy Gaussian systems with count- analytic number theory. Furstenberg’s proof played a key able Lebesgue spectrum mentioned above have relatively role in his ergodic proof [20] of Szemerédi’s theorem on strong mixing properties but zero entropy). the existence of arbitrarily long arithmetic progressions in In many of the mixing criteria discussed above we have a subset of the integers of positive density (see the chap- considered a pair of sets A and B and asked for asymptotic ter on  Ergodic Theory: Interactions with Combinatorics n independence of A and B (so that for large n, A and T B and Number Theory for more information about this di- become independent). It is natural to ask, given a finite rection of study). collection of sets A0; A1;:::;Ak , under whatQ conditions The conclusions that one draws here are much weaker n1 nk k (A0 \T A1 \\T Ak ) converges to jD0 (A j). than the requirement for mixing of all orders. For mix- A measure-preserving transformation is said to be ing of all orders, it was required that provided the gaps mixing of order k + 1 if for all measurable sets A0;:::;Ak , between 0; n1;:::;nk diverge to infinity, one achieves asymptotic independence, whereas for these weak-mixing n1 nk lim A0 \ T A1 \\T Ak results, the gaps are increasing along prescribed sequences n1!1;n jC1n j!1 with regular growth properties. Yk It is interesting to note that the analogous question of D (A ) : j whether mixing implies mixing of all orders is known to jD0 fail in higher-dimensional actions. Here, rather than a Z An outstanding open question asked by Rokhlin [73]ap- action, in which there is a single measure-preserving trans- pearing already in Halmos’ 1956 book [27] is to determine formation (so that the integer n acts on a point x 2 X by n Zd whether mixing (i. e. mixing of order 2) implies mixing of mapping it to T x), one takes a action. For such an all orders. Kalikow [34] showed that mixing implies mix- action, one has d commuting transformations T1;:::;Td ing of all orders for rank 1 transformations (existence of and a vector (n1;:::;nd )actsonapointx by sending it to n1 nd rank one mixing transformations having been previously T1 Td x. Ledrappier [46] studied the following two- Z2 established by Ornstein in [54]). Later Ryzhikov [77]used dimensional action. Let X Dfx 2f0; 1g : xv C xvCe1 C joining methods to establish the result for transformations xvCe2 D 0(mod2)g and let Ti (x)v D xvCei .SinceX is with finite rank, and Host [30] also used joining methods a compact Abelian group, it has a natural measure in- to establish the result for measure-preserving transforma- variant under the group operations (the Haar measure). It tions with singular spectrum, but the general question re- is not hard to show that this system is mixing (i. e. given n1 n2 mains open. any measurable sets A and B, (A \ T1 T2 B) ! It is not hard to show using martingale arguments that (A)(B)ask(n1; n2)k!1). Ledrappier showed that K automorphisms and hence all Bernoulli measure-pre- the system fails to be 3-mixing. Subsequently Masser [48] serving transformations are mixing of all orders. established necessary and sufficient conditions for similar For weak-mixing transformations, Furstenberg [21] higher-dimensional algebraic actions to be mixing of or- has established the following weak-mixing of all orders der k but not order k C 1 for any given k. statement: if a measure-preserving transformation T is weak-mixing, then given sets A0;:::;Ak ,thereisasub- Hyperbolicity and Decay of Correlations sequence J of the integers of density 0 such that One class of systems in which the stronger mixing prop- erties are often found is the class of smooth systems pos- Yk n kn sessing uniform hyperbolicity (i. e. the tangent space to lim A0 \ T A1 \\T Ak D (Ai ) : n!1 n62J the manifold at each point splits into stable and unstable iD0 Ergodicity and Mixing Properties 237

s u subspaces E (x)andE (x) such that the kDTjEs(x)k than exhibiting exponential decay of correlations, the map 1 a < 1forallx and kDT jEu (x)ka and DT(Es(x)) D has polynomial decay of correlations with a rate depend- Es (T(x)) and DT(Eu(x)) D Eu (T(x))). In some cases sim- ing on ˛. ilar conclusions are found in systems possessing non-uni- In Young’s survey [94], a variety of techniques are out- form hyperbolicity. See Katok and Hasselblatt’s book [38] lined for understanding the strong ergodic properties of for an overview of hyperbolic dynamical systems, as well as non-uniformly hyperbolic diffeomorphisms. In her arti- the chapter in this volume on  Smooth Ergodic Theory. cle [95], methods are introduced for studying many classes Inthesimplecaseofexpandingpiecewisecontinu- of non-uniformly hyperbolic systems by looking at suit- ous maps of the interval (that is, maps for which the ab- ably high powers of the map, for which the power has solute value of the derivative is uniformly bounded be- strong hyperbolic behavior. The article shows how to un- low by a constant greater than 1), it is known that if derstand the ergodic behavior of these systems. These they are totally ergodic and topologically transitive (i. e. methods are applied (for example) to billiards, one-dimen- the forward images of any interval cover the entire inter- sional quadratic maps and Hénon maps. val), then provided that the map has sufficient smoothness (e. g. the map is C1 and the derivative satisfies a certain Future Directions additional summability condition), the map has a unique absolutely continuous invariant measure which is exact and whose natural extension is Bernoulli (see the paper Problem 1 (Mixing of all orders) Does mixing imply of Góra [25]forresultsofthistypeprovedundersome mixing of all orders? Can the results of Kalikow, Ryzhikov of the mildest hypotheses). These results were originally and Host be extended to larger classes of measure-preserv- established for maps that were twice continuously differ- ing transformations? Thouvenot observed that it is suffi- entiable, and the hypotheses were progressively weakened, cient to establish the result for measure-preserving trans- approaching, but never meeting, C1.Subsequentworkof formations of entropy 0. This observation (whose proof is Quas [68,69]providedexamplesofC1 expanding maps based on the Pinsker -algebra) was stated in Kalikow’s of the interval for which Lebesgue measure was invariant, paper [34] and is reproduced as Proposition 3.2 in recent but respectively not ergodic and not weak-mixing. Some work of de la Rue [16] on the mixing of all orders problem. of the key tools in controlling mixing in one-dimensional Problem 2 (Multiple weak-mixing) 1 As mentioned expanding maps that are absent in the C case are bounded above, Bergelson [5] showed that if T is a weak-mixing distortion estimates. Here, there is a constant 1 C < 1 n transformation, then there is a subset J of the integers of such that given any interval I on which some power T density 0 such that of T acts injectively and any sub-interval J of I,onehas 1/C (jTn Jj/jTn Ij)/(jJj/jIj) C. An early place in p1(n) pk (n) lim A0 \ T A1 \\T Ak which bounded distortion estimates appear is the work of n!1;n62J Rényi [70]. Yk One important class of results for expanding maps es- D (Ai ) tablishes an exponential decay of correlations. Here, one iD0 starts withR a pair of smoothR functionsR f and g and one esti- n mates f g ıT d f d g d,where is an abso- whenever p1(n);:::;pk (n) are non-constant integer-val- lutely continuous invariant measure. If is mixing, we ex- ued polynomials such that pi (n) p j (n) is unbounded for pect this to converge to 0. In fact though, in good cases this i ¤ j. It is natural to ask what is the most general class of converges to 0 at an exponential rate for each pair of func- times that can replace the sequences (p1(n));:::;(pk (n)). tions f and g belonging to a sufficiently smooth class. In In unpublished notes, Bergelson and Håland considered this case, the measure-preserving transformation T is said as times the values taken by a family of integer-valued to have exponential decay of correlations. See Liverani’s generalized polynomials (those functions of an integer article [47] for an introduction to a method of establishing variable that can be obtained by the operations of ad- this based on cones. Exponential decay of correlations im- dition, multiplication, addition of or multiplication by plies in particular that the natural extension is Bernoulli. ap real constantp and taking integer parts (e. g. g(n) D Hu [31] has studied the situation of maps of the in- b 2bncCb 3nc2c)). They conjectured necessary and terval for which the derivative is bigger than 1 everywhere sufficient conditions for the analogue of Bergelson’s weak- except at a fixed point, where the local behavior is of the mixing polynomial ergodic theorem to hold, and proved form x 7! x C x1C˛ for 0 <˛<1. In this case, rather theconjectureincertaincases. 238 Ergodicity and Mixing Properties

In a recent paper of McCutcheon and Quas [50], the 7. Birkhoff GD, Smith PA (1924) Structural analysis of surface analogous question was addressed in the case where T is transformations. J Math 7:345–379 a mild-mixing transformation. 8. Boltzmann L (1871) Einige allgemeine Sätze über Wärme- gleichgewicht. Wiener Berichte 63:679–711 Problem 3 (Pascal adic transformation) Vershik [89,90] 9. Boltzmann L (1909) Wissenschaftliche Abhandlungen. Akade- mie der Wissenschaften, Berlin introduced a family of transformations known as the adic 10. Boshernitzan M, Berend D, Kolesnik G (2001) Irrational dilations transformations. The underlying spaces for these trans- of Pascal’s triangle. Mathematika 48:159–168 formations are certain spaces of paths on infinite graphs, 11. Bowen R (1975) Equilibrium States and the Ergodic Theory of and the transformations act by taking a path to its lex- Anosov Diffeomorphisms. Springer, Berlin icographic neighbor. Amongst the adic transformations, 12. Bradley RC (2005) Basic properties of strong mixing conditions. A survey and some open questions. Probab Surv 2:107–144 the so-called Pascal adic transformation (so-called because 13. Chacon RV (1969) Weakly mixing transformations which are the underlying graph resembles Pascal’s triangle) has been not strongly mixing. Proc Amer Math Soc 22:559–562 singled out for attention in work of Petersen and oth- 14. Choquet G (1956) Existence des représentations intégrales au ers [2,10,51,65]. In particular, it is unresolved whether this moyen des points extrémaux dans les cônes convexes. C R transformation is weak-mixing with respect to any of its Acad Sci Paris 243:699–702 15. Choquet G (1956) Unicité des représentations intégrales au ergodic measures. Weak-mixing has been shown by Pe- moyen de points extrémaux dans les cônes convexes réticulés. tersen and Schmidt to follow from a number-theoretic C R Acad Sci Paris 243:555–557 condition on the binomial coefficients [2,10]. 16. de la Rue T (2006) 2-fold and 3-fold mixing: why 3-dot-type counterexamples are impossible in one dimension. Bull Braz Problem 4 (Weak Pinsker Conjecture) Pinsker [67] Math Soc (NS) 37(4):503–521 conjectured that in a measure-preserving transformation 17. Ehrenfest P, Ehrenfest T (1911) Begriffliche Grundlage der with positive entropy, one could express the transfor- statistischen Auffassung in der Mechanik. No. 4. In: En- mation as a product of a Bernoulli shift with a system cyclopädie der mathematischen Wissenschaften. Teubner, Leipzig with zero entropy. This conjecture (now known as the 18. Feller W (1950) An Introduction to Probability and its Applica- Strong Pinsker Conjecture) was shown to be false by Orn- tions. Wiley, New York stein [56,57]. Shields and Thouvenot [78] showed that the 19. Friedman NA, Ornstein DS (1970) On isomorphism of weak collection of transformations that can be written as a prod- Bernoulli transformations. Adv Math 5:365–394 uct of a zero entropy transformation with a Bernoulli shift 20. Furstenberg H (1977) Ergodic behavior of diagonal mea- ¯ sures and a theorem of Szemerédi on arithmetic progressions. is closed in the so-called d-metric that lies at the heart of J Analyse Math 31:204–256 Ornstein’s theory. 21. Furstenberg H (1981) Recurrence in Ergodic Theory and Com- It is, however, the case that if T : X ! X has en- binatorial Number Theory. Princeton University Press, Prince- tropy h > 0, then for all h0 h, T has a factor S ton with entropy h0 (this was originally proved by Sinai [83] 22. Furstenberg H, Weiss B (1978) The finite multipliers of infinite ergodic transformations. In: The Structure of in Dy- and reproved using the Ornstein machinery by Ornstein namical Systems. Proc Conf, North Dakota State Univ, Fargo, and Weiss in [61]). The Weak Pinsker Conjecture states N.D., 1977. Springer, Berlin that if a measure-preserving transformation T has entropy 23. Garsia AM (1965) A simple proof of E. Hopf’s maximal ergodic h > 0, then for all >0, T may be expressed as a product theory. J Math Mech 14:381–382 of a Bernoulli shift and a measure-preserving transforma- 24. Girsanov IV (1958) Spectra of dynamical systems generated by stationary Gaussian processes. Dokl Akad Nauk SSSR 119:851– tion with entropy less than . 853 25. Góra P (1994) Properties of invariant measures for piecewise Bibliography expanding one-dimensional transformations with summable oscillations of derivative. Ergodic Theory Dynam Syst 14:475– 1. Aaronson J (1997) An Introduction to Infinite Ergodic Theory. 492 American Mathematical Society, Providence 26. Halmos PA (1944) In general a measure-preserving transforma- 2. Adams TM, Petersen K (1998) Binomial coefficient multiples of tion is mixing. Ann Math 45:786–792 irrationals. Monatsh Math 125:269–278 27. Halmos P (1956) Lectures on Ergodic Theory. Chelsea, New 3. Alpern S (1976) New proofs that weak mixing is generic. Invent York Math 32:263–278 28. Hoffman C, Heicklen D (2002) Rational maps are d-adic 4. Avila A, Forni G (2007) Weak-mixing for interval exchange bernoulli. Ann Math 156:103–114 transformations and translation flows. Ann Math 165:637–664 29. Hoffman C, Rudolph DJ (2002) Uniform endomorphisms which 5. Bergelson V (1987) Weakly mixing pet. Ergodic Theory Dynam are isomorphic to a Bernoulli shift. Ann Math 76:79–101 Systems 7:337–349 30. Host B (1991) Mixing of all orders and pairwise indepen- 6. Birkhoff GD (1931) Proof of the ergodic theorem. Proc Nat Acad dent joinings of systems with singular spectrum. Israel J Math Sci 17:656–660 76:289–298 Ergodicity and Mixing Properties 239

31. Hu H (2004) Decay of correlations for piecewise smooth maps 57. Ornstein DS (1973) A mixing transformation for which Pinsker’s with indifferent fixed points. Ergodic Theory Dynam Syst conjecture fails. Adv Math 10:103–123 24:495–524 58. Ornstein DS (1974) Ergodic Theory, , and Dynam- 32. Kalikow S () Outline of ergodic theory. Notes freely available ical Systems. Yale University Press, Newhaven for download. See http://www.math.uvic.ca/faculty/aquas/ 59. Ornstein DS, Shields PC (1973) An uncountable family of K-au- kalikow/kalikow.html tomorphisms. Adv Math 10:89–102 1 33. Kalikow S (1982) T; T transformation is not loosely Bernoulli. 60. Ornstein DS, Weiss B (1973) Geodesic flows are Bernoullian. Is- Ann Math 115:393–409 rael J Math 14:184–198 34. Kalikow S (1984) Twofold mixing implies threefold mixing for 61. Ornstein DS, Weiss B (1975) Unilateral codings of Bernoulli sys- rank one transformations. Ergodic Theory Dynam Syst 2:237– tems. Israel J Math 21:159–166 259 62. Oxtoby JC (1952) Ergodic sets. Bull Amer Math Soc 58:116–136 35. Kamae T (1982) A simple proof of the ergodic theorem using 63. Parry W (1981) Topics in Ergodic Theory. Cambridge, Cam- non-standard analysis. Israel J Math 42:284–290 bridge 36. Katok A (1980) Interval exchange transformations and some 64. Petersen K (1983) Ergodic Theory. Cambridge, Cambridge special flows are not mixing. Israel J Math 35:301–310 65. Petersen K, Schmidt K (1997) Symmetric Gibbs measures. Trans 37. Katok A (1980) Smooth non-Bernoulli K-automorphisms. In- Amer Math Soc 349:2775–2811 vent Math 61:291–299 38. Katok A, Hasselblatt B (1995) Introduction to the Modern The- 66. Phelps R (1966) Lectures on Choquet’s Theorem. Van Nostrand, ory of Dynamical Systems. Cambridge, Cambridge New York 39. Katznelson Y (1971) Ergodic automorphisms of T n are 67. Pinsker MS (1960) Dynamical systems with completely positive Bernoulli shifts. Israel J Math 10:186–195 or zero entropy. Soviet Math Dokl 1:937–938 1 40. Katznelson Y, Weiss B (1982) A simple proof of some ergodic 68. Quas A (1996) A C expanding map of the circle which is not theorems. Israel J Math 42:291–296 weak-mixing. Israel J Math 93:359–372 41. Keane M, Smorodinsky M (1979) Bernoulli schemes of the same 69. Quas A (1996) Non-ergodicity for C1 expanding maps and entropy are finitarily isomorphic. Ann Math 109:397–406 g-measures. Ergodic Theory Dynam Systems 16:531–543 42. Keane MS, Petersen KE (2006) Nearly simultaneous proofs of 70. Rényi A (1957) Representations for real numbers and their er- the ergodic theorem and maximal ergodic theorem. In: Dy- godic properties. Acta Math Acad Sci Hungar 8:477–493 namics and Stochastics: Festschrift in Honor of M.S. Keane. In- 71. Riesz F (1938) Some mean ergodic theorems. J Lond Math Soc stitute of Mathematical Statistics, pp 248–251, Bethesda MD 13:274–278 43. Kolmogorov AN (1958) New metric invariant of transitive dy- 72. Rokhlin VA (1948) A ‘general’ measure-preserving transforma- namical systems and endomorphisms of Lebesgue spaces. tion is not mixing. Dokl Akad Nauk SSSR Ser Mat 60:349–351 Dokl Russ Acad Sci 119:861–864 73. Rokhlin VA (1949) On endomorphisms of compact commuta- 44. Koopman BO (1931) Hamiltonian systems and Hilbert space. tive groups. Izvestiya Akad Nauk SSSR Ser Mat 13:329–340 Proc Nat Acad Sci 17:315–218 74. Rokhlin VA, Sinai Y (1961) Construction and properties of in- 45. Krámli A, Simányi N, Szász D (1991) The K-property of three bil- variant measurable partitions. Dokl Akad Nauk SSSR 141:1038– liard balls. Ann Math 133:37–72 1041 46. Ledrappier F (1978) Un champ Markovien peut être d’entropie 75. Rudin W (1966) Real and Complex Analysis. McGraw Hill, New nulle et mélangeant. C R Acad Sci Paris, Sér A-B 287:561–563 York 47. Liverani C (2004) Decay of correlations. Ann Math 159:1275– 76. Rudolph DJ (1990) Fundamentals of Measurable Dynamics. 1312 Oxford, Oxford University Press 48. Masser DW (2004) Mixing and linear equations over groups in 77. Ryzhikov VV (1993) Joinings and multiple mixing of the actions positive characteristic. Israel J Math 142:189–204 of finite rank. Funct Anal Appl 27:128–140 49. Masur H (1982) Interval exchange transformations and mea- 78. Shields P, Thouvenot J-P (1975) Entropy zero × Bernoulli pro- sured foliations. Ann Math 115:169–200 cesses are closed in the d¯-metric. Ann Probab 3:732–736 50. McCutcheon R, Quas A (2007) Generalized polynomials and 79. Simányi N (2003) Proof of the Boltzmann–Sinai ergodic hy- mild mixing systems. Canad J Math (to appear) pothesis for typical hard disk systems. Invent Math 154:123– 51. Méla X, Petersen K (2005) Dynamical properties of the Pascal 178 adic transformation. Ergodic Theory Dynam Syst 25:227–256 52. Newton D, Parry W (1966) On a factor automorphism of a nor- 80. Simányi N (2004) Proof of the ergodic hypothesis for typical mal . Ann Math Statist 37:1528–1533 hard ball systems. Ann Henri Poincaré 5:203–233 53. Ornstein DS (1970) Bernoulli shifts with the same entropy are 81. Simányi N, Szász D (1999) Hard ball systems are completely hy- isomorphic. Adv Math 4:337–352 perbolic. Ann Math 149:35–96 54. Ornstein DS (1972) On the root problem in ergodic theory. In: 82. Sinai YG (1959) On the notion of entropy of a dynamical sys- Proceedings of the Sixth Berkeley Symposium on Mathemat- tem. Dokl Russ Acad Sci 124:768–771 ical Statistics and Probability (Univ. California, Berkeley, Calif., 83. Sinai YG (1964) On a weak isomorphism of transformations 1970/1971), vol II: Probability theory. Univ. California Press, with invariant measure. Mat Sb (NS) 63:23–42 pp 347–356 84. Sinai YG (1970) Dynamical systems with elastic reflections. 55. Ornstein DS (1973) An example of a Kolmogorov automor- Ergodic properties of dispersing billiards. Uspehi Mat Nauk phism that is not a Bernoulli shift. Adv Math 10:49–62 25:141–192 56. Ornstein DS (1973) A K-automorphism with no square root and 85. Sinai Y (1976) Introduction to Ergodic Theory. Princeton, Pinsker’s conjecture. Adv Math 10:89–102 Princeton, translation of the 1973 Russian original 240 Ergodicity and Mixing Properties

86. Sinai YG, Chernov NI (1987) Ergodic properties of some sys- 91. von Neumann J (1932) Proof of the quasi-ergodic hypothesis. tems of two-dimensional disks and three-dimensional balls. Proc Nat Acad Sci USA 18:70–82 Uspekhi Mat Nauk 42:153–174, 256 92. Walters P (1982) An Introduction to Ergodic Theory. Springer, 87. Smorodinsky M (1971) A partition on a Bernoulli shift which is Berlin not weakly Bernoulli. Math Systems Th 5:201–203 93. Williams D (1991) Probability with Martingales. Cambridge, 88. Veech W (1982) Gauss measures for transformations on Cambridge the space on interval exchange maps. Ann Math 115:201– 94. Young LS (1995) Ergodic theory of differentiable dynamical 242 systems. In: Real and Complex Dynamical Systems (Hillerød, 89. Vershik A (1974) A description of invariant measures for ac- 1993). Kluwer, Dordrecht, pp 293–336 tions of certain infinite-dimensional groups. Soviet Math Dokl 95. Young LS (1998) Statistical properties of dynamical systems 15:1396–1400 with some hyperbolicity. Ann Math 147:585–650 90. Vershik A (1981) Uniform algebraic approximation of shift and 96. Yuzvinskii SA (1967) Metric automorphisms with a simple spec- multiplication operators. Soviet Math Dokl 24:97–100 trum. Soviet Math Dokl 8:243–245