Lecture 4. Entropy and Markov Chains

The most important numerical invariant related to the orbit growth in topological dynamical systems is the topological entropy.¹ It represents the exponential growth rate of the number of orbit segments which are distinguishable with an arbitrarily high but finite precision. Of course, topological entropy is invariant under topological conjugacy. For measurable dynamical systems, an entropy can be defined using the invariant measure. It gives an indication of the amount of randomness or disorder of the system. The relation between measure–theoretical entropy and topological entropy is given by a variational principle.

4.1 Topological Entropy

We will follow the definition given by Rufus Bowen in [Bo, Chapter 4]. Let X be a compact metric space with metric d and let f : X → X be a continuous map.

Definition 4.1 Let S ⊂ X, n ∈ N and ε > 0. S is an (n, ε)–spanning set if for every x ∈ X there exists y ∈ S such that d(f^j(x), f^j(y)) ≤ ε for all 0 ≤ j ≤ n.

It is immediate to check that the compactness of X implies the existence of finite spanning sets. Let r(n, ε) be the least number of points in an (n, ε)–spanning set. If we bound the time of observation of our system by n and our precision in making measurements is ε, we will see at most r(n, ε) distinct orbits.

Exercise 4.2 Show that if X admits a cover by m sets of diameter ≤ ε then r(n, ε) ≤ m^{n+1}.

Definition 4.3 The topological entropy htop(f) of f is given by

htop(f) = lim_{ε→0} lim sup_{n→+∞} (1/n) log r(n, ε) . [4.1]

In the previous definition one cannot replace lim sup with lim, since there exist examples of maps for which the limit does not exist. However one can replace it with lim inf, still obtaining the topological entropy (see [Mn1], Proposition 7.1, p. 237).

Exercise 4.4 Show that the topological entropy of any diffeomorphism of a compact manifold is always finite.
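As a concrete illustration of Definition 4.3, here is a minimal numerical sketch (our own, not from the text : the doubling map x ↦ 2x mod 1 on the circle, the grid discretization and all function names are our choices). It greedily extracts an (n, ε)-spanning set from a fine grid of initial conditions; the quantities (1/n) log r(n, ε) overestimate log 2 for small n, with a bias that decays like (1/n) log(1/2ε).

```python
import math

def circle_dist(a, b):
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def orbit(x, n):
    # orbit segment x, f(x), ..., f^n(x) of the doubling map f(x) = 2x mod 1
    seg = []
    for _ in range(n + 1):
        seg.append(x)
        x = (2.0 * x) % 1.0
    return seg

def r_spanning(n, eps, grid=1000):
    # Greedy (n, eps)-spanning set: keep a grid point only if no point kept so
    # far eps-shadows its whole orbit segment; the kept points span the grid.
    kept = []
    for i in range(grid):
        o = orbit(i / grid, n)
        if not any(all(circle_dist(o[j], k[j]) <= eps for j in range(n + 1))
                   for k in kept):
            kept.append(o)
    return len(kept)

eps = 0.1
for n in range(2, 7):
    r = r_spanning(n, eps)
    print(n, r, math.log(r) / n)   # last column decreases toward log 2 ~ 0.693
```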

Exercise 4.5 Let X = {x ∈ ℓ^2(N) , |x_i| < 2^{−i} for all i ∈ N}, f((x_i)_{i∈N}) = (2x_{i+1})_{i∈N}. Let k ∈ N. Show that for this system r(n, k^{−1}) > k^n, thus htop(f) = ∞.

¹ According to Roy Adler [BKS, p. 103] “topological entropy was first defined by C. Shannon [Sh] and called by him noiseless channel capacity”.

Exercise 4.6 Show that the topological entropy of the p–adic map of Exercise 2.36 is log p.

Remark 4.7 The topological entropy of a flow ϕt is defined as the topological entropy of the time–one diffeomorphism f = ϕ1.

Exercise 4.8 Show that :
(i) the topological entropy of an isometry is zero ; if h is an isometry the topological entropy of f equals that of h^{−1} ◦ f ◦ h ;
(ii) if f is a homeomorphism of a compact metric space X then htop(f) = htop(f^{−1}) ;
(iii) htop(f^m) = |m| htop(f).

Exercise 4.9 Let X be a metric space and f a continuous endomorphism of X. We say that a set A is (n, ε)–separated if for all distinct x, y ∈ A there exists 0 ≤ j ≤ n such that d(f^j(x), f^j(y)) > ε. We denote s(n, ε) the maximal cardinality of an (n, ε)–separated set. Show that :
(i) s(n, 2ε) ≤ r(n, ε) ≤ s(n, ε) ;
(ii) htop(f) = lim_{ε→0} lim sup_{n→+∞} (1/n) log s(n, ε) ;
(iii) if X is a compact subset of R^l and f is Lipschitz with Lipschitz constant K then htop(f) ≤ l log K.
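The inequalities in (i) are easy to experiment with numerically. The sketch below (again our own illustration for the doubling map, with hypothetical names) greedily builds a maximal (n, ε)-separated subset of a grid; by maximality such a set is also (n, ε)-spanning, so its cardinality g(n, ε) sits between s(n, 2ε) and s(n, ε), and g(n, 2ε) ≤ g(n, ε).

```python
import math

def circle_dist(a, b):
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def greedy_separated(n, eps, grid=1000):
    # Greedy maximal (n, eps)-separated subset of a grid for the doubling map;
    # by maximality it is also (n, eps)-spanning (cf. Exercise 4.9 (i)).
    kept = []
    for i in range(grid):
        x, o = i / grid, []
        for _ in range(n + 1):
            o.append(x)
            x = (2.0 * x) % 1.0
        if all(max(circle_dist(o[j], k[j]) for j in range(n + 1)) > eps
               for k in kept):
            kept.append(o)
    return len(kept)

for n in range(2, 7):
    g1, g2 = greedy_separated(n, 0.1), greedy_separated(n, 0.2)
    print(n, g2, g1, math.log(g1) / n)  # g(n,2e) <= g(n,e); rate tends to log 2
```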

Proposition 4.10 The topological entropy does not depend on the choice of the metric on X provided that the induced topology is the same. The topological entropy is invariant under topological conjugacy.

Proof. We first show how the second statement is a consequence of the first. Let f, g be topologically conjugate via a homeomorphism h. Let d denote a fixed metric on X and d′ denote the pullback of d via h : d′(x1, x2) = d(h^{−1}(x1), h^{−1}(x2)). Then h becomes an isometry, so htop(f) = htop(g) (see Exercise 4.8).
Let us now show the first part. Let d and d′ be two different metrics on X which induce the same topology and let r_d(n, ε) and r_{d′}(n, ε) denote the minimal cardinality of an (n, ε)–spanning set in the two metrics. We will denote htop,d(f) and htop,d′(f) the corresponding topological entropies.
Let ε > 0 and consider the set Dε of all pairs (x1, x2) ∈ X × X such that d(x1, x2) ≥ ε. This is a compact subset of X × X, thus d′ attains a minimum δ′(ε) > 0 on Dε. Thus any δ′(ε)–ball in the metric d′ is contained in an ε–ball in the metric d. From this one gets r_{d′}(n, δ′(ε)) ≥ r_d(n, ε), thus htop,d′(f) ≥ htop,d(f). Interchanging the roles of the two metrics one obtains the opposite inequality. □

Exercise 4.11 Show that if g is a factor of f then htop(g) ≤ htop(f).

An alternative but equivalent definition of topological entropy is obtained considering all possible open covers of X and their refinements obtained by iterating f.


Definition 4.12 If α, β are open covers of X their join α ∨ β is the open cover by all sets of the form A ∩ B, where A ∈ α and B ∈ β. An open cover β is a refinement of α, written α < β, if every member of β is a subset of a member of α.

Let α be an open cover of X and let N(α) be the number of sets in a finite subcover of α with smallest cardinality. We denote f^{−1}α the open cover consisting of all sets f^{−1}(A) where A ∈ α.

Exercise 4.13 If {a_n}_{n∈N} is a sequence of real numbers such that a_{n+m} ≤ a_n + a_m for all n, m then lim_{n→+∞} a_n/n exists and equals inf_{n∈N} a_n/n. [Hint : write n = kp + m ; then a_n/n ≤ a_p/p + a_m/(kp).]
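A quick sanity check of the exercise (a toy subadditive sequence of our own choosing, not from the text) : a_n = cn + √n is subadditive since √(n+m) ≤ √n + √m, and a_n/n decreases to inf_n a_n/n = c.

```python
import math

c = 0.5
a = lambda n: c * n + math.sqrt(n)   # subadditive: a(n+m) <= a(n) + a(m)

for n, m in [(3, 5), (10, 7), (100, 250)]:
    assert a(n + m) <= a(n) + a(m)   # spot-check subadditivity
for n in [1, 10, 100, 1000, 10000]:
    print(n, a(n) / n)               # tends to 0.5 = inf_n a_n / n
```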

Theorem 4.14 The topological entropy of f is given by

htop(f) = sup_α lim_{n→∞} (1/n) log N( ∨_{i=0}^{n−1} f^{−i}α ) . [4.2]

For its proof see [Wa, pp. 173-174].

4.2 Entropy and information. Metric entropy.

In order to define metric entropy and to make clear its analogy with the formula [4.2] of topological entropy we will preliminarily introduce some general considerations on the relationship between entropy and information (see [Khi]).
Suppose that one performs an experiment, which we will denote α, which has m ∈ N possible mutually exclusive outcomes A1, …, Am (e.g. throwing a coin, m = 2, or a die, m = 6). Assume that each possible outcome Ai happens with a probability pi ∈ [0, 1], Σ_{i=1}^m pi = 1 (in an experimental situation the probability will be defined statistically). In a probability space (X, A, µ) this corresponds to the following setting : α is a finite partition X = A1 ∪ … ∪ Am mod(0), Ai ∈ A, µ(Ai ∩ Aj) = 0 for i ≠ j, µ(Ai) = pi.
We want to define a quantity (called entropy) which measures the uncertainty associated to a prediction of the result of the experiment (or, equivalently, which measures the amount of information which one can gain from performing the experiment). Let ∆^(m) denote the standard m–simplex of R^m,

∆^(m) = {(x1, …, xm) ∈ R^m | xi ∈ [0, 1], Σ_{i=1}^m xi = 1}.

Definition 4.15 A function H^(m) : ∆^(m) → [0, +∞] is called an entropy if it has the following properties :


(1) symmetry : ∀ i, j ∈ {1, …, m}, H^(m)(p1, …, pi, …, pj, …, pm) = H^(m)(p1, …, pj, …, pi, …, pm) ;
(2) H^(m)(1, 0, …, 0) = 0 ;
(3) H^(m)(0, p2, …, pm) = H^(m−1)(p2, …, pm) ∀ m ≥ 2, ∀ (p2, …, pm) ∈ ∆^(m−1) ;
(4) ∀ (p1, …, pm) ∈ ∆^(m) one has H^(m)(p1, …, pm) ≤ H^(m)(1/m, …, 1/m), where equality is possible if and only if pi = 1/m for all i = 1, …, m ;
(5) let (π11, …, π1l, π21, …, π2l, …, πm1, …, πml) ∈ ∆^(ml) ; for all (p1, …, pm) ∈ ∆^(m) one must have

H^(ml)(π11, …, π1l, π21, …, πml) = H^(m)(p1, …, pm) + Σ_{i=1}^m pi H^(l)(πi1/pi, …, πil/pi).

In the above definition : (2) says that if some outcome is certain then the entropy is zero ; (3) says that no information is gained from impossible outcomes (i.e. outcomes with probability zero) ; (4) says that the maximal uncertainty of the outcome is obtained when the possible results have the same probability ; (5) describes the behaviour of entropy when two distinct experiments are performed in succession. Let β denote another experiment with possible outcomes B1, …, Bl (i.e. another partition of (X, A, µ)). Let πij be the probability of the joint occurrence of Ai and Bj. The conditional probability of Bj given Ai is prob(Bj | Ai) = πij/pi (i.e. µ(Ai ∩ Bj)/µ(Ai)). Clearly the uncertainty of the outcome of the experiment β once one has already performed α with outcome Ai is given by H^(l)(πi1/pi, …, πil/pi).

Theorem 4.16 An entropy is necessarily a positive multiple of

H(p1, …, pm) = − Σ_{i=1}^m pi log pi . [4.3]

Here we adopt the convention 0 log 0 = 0. The above theorem and its proof are taken from [Khi, pp. 10-13].

Proof. Let K(m) = H(1/m, …, 1/m). By (3) and (4) K is increasing : K(m) = H(0, 1/m, …, 1/m) ≤ H(1/(m + 1), …, 1/(m + 1)) = K(m + 1). Let m and l be two positive integers. Applying (5) with πij ≡ 1/(ml), pi ≡ 1/m gives

K(lm) = K(m) + Σ_{i=1}^m (1/m) K(l) = K(m) + K(l),

thus K(l^m) = mK(l). Given three integers r, n, l let m be such that l^m ≤ r^n ≤ l^{m+1}, i.e. m/n ≤ log r / log l ≤ m/n + 1/n. Since

mK(l) = K(l^m) ≤ K(r^n) = nK(r) ≤ K(l^{m+1}) = (m + 1)K(l)

one obtains m/n ≤ K(r)/K(l) ≤ m/n + 1/n, i.e. |K(r)/K(l) − log r / log l| ≤ 1/n. Thus K(r)/log r = K(l)/log l and K(m) = c log m, c > 0.
Let (p1, …, pm) ∈ Q^m ∩ ∆^(m) and let s denote the least common multiple of their denominators. Then pi = ri/s and Σ_{i=1}^m ri = s. In addition to the partition α with elements A1, …, Am and associated probabilities p1, …, pm we also consider β with s outcomes B1, …, Bs, which we divide into m groups containing r1, …, rm outcomes respectively. Let πij = pi/ri = 1/s, i = 1, …, m, j = 1, …, ri. Given any outcome Ai of α the possible ri outcomes of β are equally probable, thus

H(πi1/pi, …, πiri/pi) = c log ri

and

Σ_{i=1}^m pi H(πi1/pi, …, πiri/pi) = c Σ_{i=1}^m pi log ri = c Σ_{i=1}^m pi log pi + c log s.

On the other hand H(π11, …, πmrm) = c log s and by (5)

H(p1, …, pm) = H(π11, …, πmrm) − Σ_{i=1}^m pi H(πi1/pi, …, πiri/pi) = −c Σ_{i=1}^m pi log pi,

thus [4.3] holds on a dense subset of ∆^(m). By continuity it must hold everywhere. □
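Property (5), the only nontrivial axiom, is easy to check numerically for [4.3]. A minimal sketch (Python; the joint distribution π below is an arbitrary example of our own) :

```python
import math

def H(p):
    # Shannon entropy H(p) = -sum p_i log p_i, convention 0 log 0 = 0
    return -sum(x * math.log(x) for x in p if x > 0)

# a hypothetical joint distribution pi[i][j]; marginals p_i = sum_j pi[i][j]
pi = [[0.10, 0.20], [0.05, 0.25], [0.30, 0.10]]
p = [sum(row) for row in pi]

lhs = H([x for row in pi for x in row])   # H^(ml)(pi_11, ..., pi_ml)
rhs = H(p) + sum(p[i] * H([x / p[i] for x in pi[i]]) for i in range(len(p)))
print(lhs, rhs)   # the two values agree, as property (5) requires
```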

The entropy H can be regarded as −1/N times the logarithm of the probability of a “typical” result of the experiment α repeated N times. Indeed, if N is large and α is repeated N times, by the law of large numbers one should observe each Ai approximately piN times. Thus the probability of a “typical” outcome is p1^{p1N} p2^{p2N} ⋯ pm^{pmN}.
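This heuristic is easy to test by simulation (our own sketch; the probability vector and the sample size are arbitrary choices, not from the text) :

```python
import math, random

random.seed(0)
p = [0.2, 0.3, 0.5]                      # hypothetical outcome probabilities
H = -sum(x * math.log(x) for x in p)

N = 100000
outcomes = random.choices(range(3), weights=p, k=N)
# -(1/N) * log of the probability of the observed result of N repetitions
logprob = sum(math.log(p[i]) for i in outcomes)
print(-logprob / N, H)   # close to H, as the "typical result" heuristic predicts
```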

We now want to extend the notion of entropy to measurable dynamical systems (X, A, µ, f). If α and β are two partitions of X, their joint partition α ∨ β is {A ∩ B, A ∈ α, B ∈ β}. Given n partitions α1, …, αn we will denote ∨_{i=1}^n αi their joint partition. If f is measurable and f^{−1}(A) ∈ A for all A ∈ A, and α is a partition, f^{−1}α is the partition defined by the subsets {f^{−1}A, A ∈ α}. Finally a partition β is finer than α, denoted α < β, if ∀ B ∈ β ∃ A ∈ α such that B ⊂ A.
The entropy H(α) of a partition α = {A1, …, Am} is given by H(α) = − Σ_{i=1}^m µ(Ai) log µ(Ai).

Definition 4.17 Let (X, A, µ, f) be a measurable dynamical system and α a partition. The entropy of f w.r.t. the partition α is

hµ(f, α) := lim_{n→∞} (1/n) H( ∨_{i=0}^{n−1} f^{−i}α ) . [4.4]

The entropy of f is

hµ(f) := sup{hµ(f, α), α is a finite partition of X}. [4.5]

Remark 4.18 Using the strict convexity of x log x on R+, one can prove the existence of the limit [4.4]. Indeed the sequence (1/n) H( ∨_{i=0}^{n−1} f^{−i}α ) is non–negative and non–increasing. Thus hµ(f, α) ≥ 0 for all α.

Exercise 4.19 Show that if two measurable dynamical systems are isomorphic then they have the same entropy.

The above considerations show that the entropy of a partition α measures the amount of information obtained making a measurement by means of a device which distinguishes points of X with the resolution prescribed by {A1, …, Am} = α. If x ∈ X and we consider the orbit of x up to time n − 1, i.e. x, fx, f²x, …, f^{n−1}x, since α is a partition mod(0) of X the points f^ix, 0 ≤ i ≤ n − 1, belong (almost surely) to exactly one of the sets of α : f^ix ∈ A_{ki} with ki ∈ {1, …, m} for all i = 0, …, n − 1. H( ∨_{i=0}^{n−1} f^{−i}α ) measures the information obtained from the knowledge of the distribution w.r.t. α of a segment of orbit of length n. Thus (1/n) H( ∨_{i=0}^{n−1} f^{−i}α ) is the average amount of information per unit of time and hµ(f, α) is the amount of information (asymptotically) obtained at each iteration of the dynamical system from the knowledge of the distribution of the orbit of a point w.r.t. the partition α. A more satisfactory formulation of this is given by the following theorem [Mn1].

Theorem 4.20 (Shannon-Breiman-McMillan) Let (X, A, µ, f) be an ergodic measurable dynamical system, α a finite partition of X. Given x ∈ X let α^n(x) be the element of ∨_{i=0}^{n−1} f^{−i}α which contains x. For µ–a.e. x ∈ X one has

hµ(f, α) = lim_{n→∞} −(1/n) log µ(α^n(x)). [4.6]
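Formula [4.6] can be watched in action on a Bernoulli scheme (introduced in Section 4.3 below), where µ(α^n(x)) is an explicit product of symbol probabilities. A sketch under these assumptions (the bias 0.25/0.75 and all names are our own choices) :

```python
import math, random

random.seed(1)
p = [0.25, 0.75]                       # Bernoulli measure on the binary shift
h = -sum(q * math.log(q) for q in p)   # entropy of the time-zero partition

x = random.choices([0, 1], weights=p, k=50000)    # a mu-typical point
for n in [10, 100, 1000, 50000]:
    log_mu = sum(math.log(p[s]) for s in x[:n])   # log mu(alpha^n(x))
    print(n, -log_mu / n, h)   # -(1/n) log mu(alpha^n(x)) -> h, as in [4.6]
```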


Remark 4.21 The previous theorem admits the following interpretation : if a system is ergodic then there exists a non–negative number h such that ∀ ε > 0, if α is a sufficiently fine partition of X, then there exists a positive integer N such that for all n ≥ N there is a subset Xn of X with measure µ(Xn) > 1 − ε made of approximately e^{nh} elements of ∨_{i=0}^{n−1} f^{−i}α, each of measure about e^{−nh}.
Let X be a compact metric space and A be the Borel σ-algebra. Brin and Katok [M. Brin and A. Katok, Lecture Notes in Mathematics 1007 (1983) 30–38] gave a “topological version” of the Shannon-Breiman-McMillan Theorem. Let B(x, ε) be the ball of center x ∈ X and radius ε. Let f : X → X be continuous and preserving the probability measure µ : A → [0, 1]. Let

B(x, ε, n) := {y ∈ X | d(f^ix, f^iy) ≤ ε for all i = 0, …, n − 1},

i.e. B(x, ε, n) is the set of points y ∈ X whose orbit stays at a distance at most ε from the orbit of x up to time n − 1. Then one has

Theorem 4.22 (Brin-Katok) Assume that (X, A, µ, f) is ergodic. For µ–a.e. x ∈ X one has

sup_{ε>0} lim sup_{n→∞} −(1/n) log µ(B(x, ε, n)) = hµ(f). [4.7]

When the entropy is positive some of the observables are not predictable. A system is chaotic if it has positive entropy. Brin-Katok's Theorem together with the Poincaré recurrence theorem show that the orbits of chaotic systems are subject to two apparently contrasting requirements. On one hand almost every orbit is recurrent. On the other hand the probability that two orbits stay close to each other for an interval of time of length n decays exponentially with n. Since two initially close orbits must come infinitely many times close to their origin, if the entropy is positive they cannot be correlated. Typically they will separate one from the other and return at different times n. To this complexity of the motions one associates the notion of chaos : it can be impossible to compute the values that an observable will assume in the future from the knowledge of its past.

Remark 4.23 To compute the entropy one can use the following important result of Kolmogorov and Sinai : if α is a partition of X which generates the σ-algebra A, the entropy of (X, A, µ, f) is simply given by

hµ(f) = hµ(f, α). [4.8]

We recall that α generates A iff ∨_{i=−∞}^{+∞} f^{−i}α = A mod(0) if f is invertible, and ∨_{i=0}^{∞} f^{−i}α = A mod(0) if f is not invertible.


Exercise 4.24 Show that the entropy of the p–adic map is log p.

Exercise 4.25 Interpret formula [4.2] in terms of information (so that its analogy with [4.4] becomes clear).

4.3 Shifts and Bernoulli schemes

Let N ≥ 2, ΣN = {1, …, N}^Z. For x = (xi)_{i∈Z}, y = (yi)_{i∈Z} we define their distance

d(x, y) = 2^{−a(x,y)} where a(x, y) = inf{|n| , n ∈ Z , xn ≠ yn} . [4.9]

Then (ΣN , d) is a compact (ultra)–metric space. The shift σ : ΣN → ΣN is the bilipschitz homeomorphism of ΣN (the Lipschitz constant is 2, since a(σx, σy) ≥ a(x, y) − 1) defined by

σ((xi)i∈Z) = (xi+1)i∈Z . [4.10]

Topological properties of the shift map :
• The space ΣN is totally disconnected and has Hausdorff dimension 2 log N / log 2 with respect to the metric [4.9].
• The homeomorphism σ is expansive : for all x ≠ y there exists n such that d(σ^n(x), σ^n(y)) ≥ 1.
• The topological entropy of (ΣN , σ) is log N.

Let (p1, …, pN) ∈ ∆^(N) and let ν be the probability measure on {1, …, N} such that ν({i}) = pi.

Definition 4.26 The Bernoulli scheme BS(p1, …, pN) is the measurable dynamical system given by the shift map σ : ΣN → ΣN with the (product) probability measure µ = ν^Z on ΣN.

Exercise 4.27 Show that the σ–algebra of measurable subsets of ΣN coincides with its Borel σ–algebra and is generated by cylinders : if j1, …, jk ∈ {1, …, N} and i1, …, ik ∈ Z the corresponding cylinder is

C(j1, …, jk ; i1, …, ik) = {x ∈ ΣN | x_{i1} = j1, x_{i2} = j2, …, x_{ik} = jk} . [4.11]

Check that the measure of cylinders for the Bernoulli scheme BS(p1, …, pN) is

µ( C(j1, …, jk ; i1, …, ik) ) = p_{j1} ⋯ p_{jk} , [4.12]

and that it is preserved by the shift map.

Proposition 4.27 The Kolmogorov–Sinai entropy of the Bernoulli scheme BS(p1, …, pN) is − Σ_{i=1}^N pi log pi.


Proof. The partition α defined by the cylinders {C(j ; 0)}_{j=1,…,N} generates the σ-algebra A. By Remark 4.23 we can thus use it to compute the entropy. Since

α ∨ σ^{−1}α = {C(j0, j1 ; 0, 1)}_{j0,j1=1,…,N} ,
α ∨ σ^{−1}α ∨ σ^{−2}α = {C(j0, j1, j2 ; 0, 1, 2)}_{ji=1,…,N, i=0,1,2} ,

and so on, the corresponding entropies are

H(α) = − Σ_{j=1}^N pj log pj ,
H(α ∨ σ^{−1}α) = − Σ_{j0=1}^N Σ_{j1=1}^N p_{j0} p_{j1} log(p_{j0} p_{j1})
= − Σ_{j0=1}^N (p_{j0} log p_{j0}) Σ_{j1=1}^N p_{j1} − Σ_{j1=1}^N (p_{j1} log p_{j1}) Σ_{j0=1}^N p_{j0}
= −2 Σ_{j=1}^N pj log pj ,
H(α ∨ σ^{−1}α ∨ σ^{−2}α) = − Σ_{j0,j1,j2} p_{j0} p_{j1} p_{j2} log(p_{j0} p_{j1} p_{j2}) = −3 Σ_{j=1}^N pj log pj ,

and so on. Thus hµ(σ, α) = − Σ_{j=1}^N pj log pj. □

Remark 4.28 Note that hµ(σ) ≤ log N for all (p1, …, pN) ∈ ∆^(N), with equality if and only if pi = 1/N for all i, in which case we get the unique invariant measure of the shift on N symbols which realizes the topological entropy.
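A quick numerical confirmation of the remark (our own sketch; the randomly drawn probability vectors are just an illustration) :

```python
import math, random

random.seed(2)
N = 4

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

# random points of the simplex Delta^(N): normalized positive weights
for _ in range(5):
    w = [random.expovariate(1.0) for _ in range(N)]
    p = [x / sum(w) for x in w]
    print(entropy(p) <= math.log(N) + 1e-12)   # always True
print(entropy([1.0 / N] * N), math.log(N))     # equality at the uniform vector
```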

Let us see how the shift and the shift–invariant compact subsets of ΣN arise naturally in the context of symbolic dynamics (the following description is taken from the lectures of J.–C. Yoccoz at the 1992 ICTP School on Dynamical Systems). Let (Y, d) be a compact metric space and f a homeomorphism of Y. Let Y = Y1 ∪ … ∪ YN, where the Yi are compact. Given a point y ∈ Y we define

Σ(f, y) = {x ∈ ΣN , f^i(y) ∈ Y_{xi} ∀ i ∈ Z} .

This is a nonempty compact subset of ΣN . Moreover we define

Σ(f) = ∪_{y∈Y} Σ(f, y) = {x ∈ ΣN , ∩_{i∈Z} f^{−i}(Y_{xi}) ≠ ∅} .


Exercise 4.29 Show that Σ(f) is also a compact subset of ΣN, invariant under the shift. [Hint : Σ(f, f(y)) = σ(Σ(f, y)).]

Assume that the map f is expansive, i.e. there exists ε > 0 such that for all y1 ≠ y2 there exists an integer n such that d(f^n(y1), f^n(y2)) > ε, and choose the compact sets Yi above with diam(Yi) < ε. Then by expansivity if y1 ≠ y2 the sets Σ(f, y1) and Σ(f, y2) are disjoint and we can define a map h : Σ(f) → Y by the property h^{−1}(y) = Σ(f, y).

Exercise 4.30 Show that h is surjective, continuous and h ◦ σ = f ◦ h, i.e. h is a semiconjugacy from the restriction of the shift σ to Σ(f) to f.

Exercise 4.31 Show that the semiconjugacy above is indeed a topological conjugacy if and only if Y is totally disconnected (and f is expansive). [Hint : choose the compact sets Yi with diam(Yi) < ε and disjoint.]

4.4 (Topological) Markov chains and Markov maps

The discussion at the end of the previous section shows the importance of the shift invariant compact subsets of ΣN. Among these a very important subclass are the so–called topological Markov chains or subshifts of finite type.
Let Γ ⊂ {1, …, N}² and let Γ⃗ be a connected directed graph on the vertices {1, …, N} with at most one arrow between two vertices : there is an arrow from i to j if and only if (i, j) ∈ Γ. We denote A = A_Γ the N × N matrix with entries aij ∈ {0, 1} defined as follows :

aij = 1 ⇐⇒ (i, j) ∈ Γ ⇐⇒ there is an arrow in Γ⃗ from i to j ;
aij = 0 otherwise.

We moreover assume that for all i ∈ {1, …, N} there exist j, k ∈ {1, …, N} such that aij = aki = 1. We associate to the matrix A (or, equivalently, to the directed graph Γ⃗) the subset ΣA ⊂ ΣN defined as follows :

ΣA = {x ∈ ΣN , (xi, xi+1) ∈ Γ ∀i ∈ Z} .

Exercise 4.32 Show that ΣA is a compact shift invariant subset of ΣN .

The restriction of the shift σ to ΣA is denoted σA and is called the topological Markov chain (or subshift of finite type) associated to the matrix A (equivalently to the graph Γ⃗).


Exercise 4.33 Show that card(Fix(σA^n)) = Tr(A^n) for all n ∈ N. Deduce from this that the Artin-Mazur zeta function

ζ_A(t) = exp( Σ_{n=1}^∞ (1/n) card(Fix(σA^n)) t^n )

is rational (indeed it is equal to det(I − tA)^{−1}).

The matrix A is called primitive if there exists a positive integer m such that all the entries of A^m are strictly positive : A^m = (a^{(m)}_{ij}) and a^{(m)}_{ij} > 0 for all i, j. Then it is easy to show that for all n ≥ m one also has a^{(n)}_{ij} > 0 for all i, j.
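Exercise 4.33 can be checked by brute force for small matrices. The sketch below (the golden-mean matrix A is a hypothetical example of our own choosing) counts the admissible cyclic words of length n and compares the count with Tr(A^n) :

```python
import itertools

A = [[1, 1],
     [1, 0]]   # golden-mean shift: the transition 1 -> 1 is forbidden
N = len(A)

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def trace_power(n):
    M = [[1 if i == j else 0 for j in range(N)] for i in range(N)]
    for _ in range(n):
        M = mat_mul(M, A)
    return sum(M[i][i] for i in range(N))

def count_periodic(n):
    # periodic points of period n in Sigma_A <-> admissible cyclic words
    return sum(all(A[w[i]][w[(i + 1) % n]] for i in range(n))
               for w in itertools.product(range(N), repeat=n))

for n in range(1, 9):
    print(n, count_periodic(n), trace_power(n))   # the two counts agree
```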

Exercise 4.34 Show that if A is primitive then σA is topologically transitive, and its periodic orbits are dense in ΣA. Moreover σA is topologically mixing.

When the matrix is primitive one can apply the classical Perron–Frobenius theorem to compute the topological entropy of the associated subshift.

Theorem 4.35 (Perron–Frobenius, see [Gan]) If A is primitive then there exists an eigenvalue λA > 0 such that :
(i) λA > |λ| for all eigenvalues λ ≠ λA ;
(ii) the left and right eigenvectors associated to λA are strictly positive and are unique up to constant multiples ;
(iii) λA is a simple root of the characteristic polynomial of A.

Exercise 4.35 Assume that A is primitive. Show that the topological entropy of σA is log λA (clearly λA > 1 since all the integers a^{(m)}_{ij} are > 0).

Very much as the shift on N symbols preserves many invariant measures (the Bernoulli schemes on N symbols) a topological Markov chain preserves many invariant measures (which are called Markov chains). Let P = (Pij) be an N × N matrix such that
(i) Pij ≥ 0 for all i, j, and Pij > 0 ⇐⇒ aij = 1 ;
(ii) Σ_{j=1}^N Pij = 1 for all i = 1, …, N ;
(iii) P^m has all its entries strictly positive for some positive integer m.
Such a matrix is called a stochastic matrix. Applying the Perron–Frobenius theorem to P we see that 1 is a simple eigenvalue of P and there exists a normalized eigenvector p = (p1, …, pN) ∈ ∆^(N) such that pi > 0 for all i and

Σ_{i=1}^N pi Pij = pj , ∀ 1 ≤ j ≤ N.
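Both computations are easy to sketch numerically (same hypothetical golden-mean matrix as above; power iteration is our choice of method, not the text's) : the Perron eigenvalue λA gives htop(σA) = log λA, and iterating p ↦ pP gives the stationary vector of a compatible stochastic matrix.

```python
import math

A = [[1, 1],
     [1, 0]]   # golden-mean shift; lambda_A = (1 + sqrt(5)) / 2

# power iteration for the Perron-Frobenius eigenvalue of the primitive matrix A
v = [1.0, 1.0]
for _ in range(200):
    w = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
    lam = max(w)
    v = [x / lam for x in w]
print(math.log(lam), math.log((1 + math.sqrt(5)) / 2))  # htop estimate vs exact

# stationary probability vector p of a stochastic matrix P: iterate p <- pP
P = [[0.5, 0.5],
     [1.0, 0.0]]   # compatible with A: P_ij > 0 iff a_ij = 1
p = [0.5, 0.5]
for _ in range(200):
    p = [sum(p[i] * P[i][j] for i in range(2)) for j in range(2)]
print(p)   # converges to the stationary vector (2/3, 1/3)
```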

We define a probability measure µ on ΣA corresponding to P prescribing its value on the cylinders :

µ( C(j0, …, jk ; i, …, i + k) ) = p_{j0} P_{j0 j1} ⋯ P_{j_{k−1} j_k} ,

for all i ∈ Z, k ≥ 0 and j0, …, jk ∈ {1, …, N}. It is called the Markov measure associated to the stochastic matrix P.

Exercise 4.36 Prove that the subshift σA preserves the Markov measure µ.

The subshift of finite type σA with the preserved measure µ is called a Markov chain.

Exercise 4.37 Show that the Kolmogorov–Sinai entropy of (ΣA, A, σA, µ) is

hµ(σA) = − Σ_{i,j=1}^N pi Pij log Pij .

Check that hµ(σA) ≤ htop(σA). One can prove that there exists a stochastic matrix P such that the entropy of the associated Markov chain is equal to the topological entropy of σA. Moreover this measure is unique (Parry measure, see [Mn1]) ; a numerical sketch is given at the end of this section.

Remark 4.38 There is another point of view which can be useful in studying topological Markov chains and their invariant Markov measures. Call a sequence x ∈ ΣA a configuration of a one–dimensional spin system (or Potts system) with configuration space ΣA. Then part of the classical statistical mechanics of spin systems [Ru] is just the ergodic theory of the topological Markov chain (the shift–invariant measures being interpreted as translation–invariant measures).

Remark 4.39 An interesting application of the symbolic dynamics method described at the end of Section 4.3 is the theory of piecewise expanding Markov maps of the interval (Exercise 2.21). Let Y = [0, 1] and f : Y → Y be piecewise monotonic and C², i.e. there exists a finite decomposition of the interval [0, 1] into N subintervals Ii = [ai, ai+1), (a1 = 0, aN+1 = 1) on which f is monotonic and of class C² on their closure. On each of these subintervals an inverse branch f_i^{−1} of f is well–defined. Assume moreover

• Markov property : f(Ii) = I_{ki} ∪ I_{ki+1} ∪ … ∪ I_{ki+ni} ;
• aperiodicity : there exists an integer m such that f^m(Ii) = Y for all i = 1, …, N ;
• eventual expansivity : some iterate of f has its derivative bounded away from 1 in modulus.

By Section 4.3 the symbolic dynamics of these maps is just a topological Markov chain. Moreover one can prove that there exists a unique invariant ergodic measure absolutely continuous w.r.t. the Lebesgue measure, with piecewise continuous density bounded away from 0 and ∞. With this measure the system is isomorphic to the Markov chain with the Parry measure : see [AF]. The existence of

an absolutely continuous invariant measure can be proven also under weaker assumptions, see the classical [LY].
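As promised, a numerical sketch of the Parry measure (the golden-mean subshift is our own running example; the formula Pij = aij vj/(λA vi), with v the right Perron eigenvector, is the standard construction of the Parry chain) : its entropy − Σ pi Pij log Pij agrees with htop(σA) = log λA.

```python
import math

phi = (1 + math.sqrt(5)) / 2       # Perron eigenvalue of the golden-mean matrix
A = [[1, 1], [1, 0]]
v = [phi, 1.0]                     # right Perron eigenvector: A v = phi v

# Parry measure transitions: P_ij = a_ij * v_j / (phi * v_i)
P = [[A[i][j] * v[j] / (phi * v[i]) for j in range(2)] for i in range(2)]

# stationary vector of P by iterating p <- pP
p = [0.5, 0.5]
for _ in range(500):
    p = [sum(p[i] * P[i][j] for i in range(2)) for j in range(2)]

h = -sum(p[i] * P[i][j] * math.log(P[i][j])
         for i in range(2) for j in range(2) if P[i][j] > 0)
print(h, math.log(phi))   # the Parry chain attains the topological entropy
```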
