
Measures of maximal entropy for shifts of finite type

Ben Sherman

July 7, 2013

Contents

Preface

1 Introduction

2 Dynamical systems

3 Topological entropy

4 Measure-preserving dynamical systems

5 Measure-theoretic entropy

6 Parry measure

7 Conclusion

Appendix: Deriving the Parry measure

Preface

Before I get to the technical material, I thought I might provide a little non-technical background as to why entropy is so interesting, and why unpredictable dynamical systems (i.e., dynamical systems with positive entropy) provide an interesting explanation of randomness.

What is randomness?

“How dare we speak of the laws of chance? Is not chance the antithesis of all law?” —Joseph Bertrand, Calcul des probabilités, 1889

Throughout history, phenomena which could not be understood in a mundane sense have been ascribed to fate. After all, if something cannot be naturally understood, what else but a supernatural power can explain it? Concepts of chance and randomness thus have been closely associated with actions of deities. After all, for something to be random, it must, by definition, defy all explanation. Many ancient cultures were fascinated with games of chance, such as throwing dice or flipping coins, and interpreted their outcomes as prescriptions of fate. To play a game of chance was to probe the supernatural world. In ancient Rome, the goddess Fortuna was the deity who determined fortune and fate; the two concepts were naturally inseparable. Randomness, in many ways, is simply a lack of understanding. Philosophers of ancient Greece realized the subjectivity of randomness with an explanatory story (Wikipedia, 2012). Suppose two men arrange to send their servants to fetch water at the same time. When the servants meet while fetching water, they deem their encounter random, and ascribe it to actions of the Gods, while it is known by the men who sent them that their meeting was arranged by mortals. As Bertrand exclaims, it seems paradoxical to formulate laws that explain the unexplainable. Perhaps this is why probability theory was so slow to develop. If each random event is determined by fate, it would be unreasonable to expect collections of random events to follow definite, mundane patterns.

It was not until 1565, when Gerolamo Cardano published Liber de Ludo Aleae, a gambler's manual that discusses the odds of winning games of chance, that it was realized that a collection of many outcomes of random events follows strong patterns. Cardano noted that as one observes more outcomes of games of chance, the frequencies of the outcomes come closer to certain numbers. Cardano called these special numbers probabilities, and the theory of probability was born. While the development of probability theory answered many questions about the aggregated outcomes of random events, it still leaves undecided how the outcome of an individual random event is determined. As a consequence, it also cannot provide an explanation for why the frequencies of outcomes of random events approach the probabilities of those events. Chance remains “the antithesis of all law.” There are two directions we can turn to for answers. We can ascribe it to fate, as has been done since the dawn of history. Or, we could turn to the laws of physics.

Newton’s laws

“God doesn’t play dice with the world.” —Albert Einstein, conversation with William Hermanns, 1943

The world is unpredictable. One could say that we have good fortune that this is true, as a predictable world would certainly not be very exciting. Given our current state, we'd know exactly what would happen in the future. It would be a world without choice and devoid of agency: simply a long, boring march through time. In fact, it would be much worse; in a predictable world, our infinite knowledge of the world's state in the past and future would preclude us from experiencing time, as we would not accumulate memory as time passed by. There would be no way to assert which direction of time would be “forward,” and time would reduce to something akin to another spatial dimension. But this putative dull world is not so far-fetched, and in fact it is quite peculiar that our world is not that way. When physicist Isaac Newton devised what are now known as the classical laws of physics (around the year 1700), he essentially claimed that our world is that bleak predictable one described earlier.

Let $x \in \mathbb{R}^n$ be a vector of spatial coordinates of particles in an isolated system (such as the universe). For example, we could have $x \in \mathbb{R}^6$ for two particles in 3-dimensional space, with $x_1$ and $x_4$ the x-coordinates of particles 1 and 2, respectively, et cetera. Let $m_i$ be the mass of the particle whose coordinate is described by $x_i$. Newton claimed then that there was some time-independent energy potential $U(x)$ that was a function only of the positions of the particles, such that for all $i$,

$$-\frac{\partial U(x)}{\partial x_i} = m_i \frac{d^2 x_i}{dt^2}.$$

Thus, Newton claimed that the trajectory of any system of particles was determined by a system of n ordinary differential equations which are described above. Suppose for a given time $t_0$, we know the positions $x_0$ and velocities $\dot{x}_0$ of all coordinates. Then existence and uniqueness theorems for ordinary differential equations assure that there is a single unique solution x(t), the trajectory of all the particles through time. Therefore, Newton's laws uniquely determine the trajectory of a system. This means that Newton's laws are deterministic, and leave no room for anything such as randomness. We also observe that Newton's laws are time-symmetric; suppose we have a trajectory x(t) that satisfies Newton's laws. Then one can check that the time-reversed trajectory $\tilde{x}(t)$ defined by $\tilde{x}(t) = x(-t)$ also satisfies Newton's laws, and thus is an equally plausible trajectory. Newton's laws have received some adjustments in the 300 years since he first formulated them, but the two principles of determinism and time-symmetry still hold, with some revision. Now, in order to produce a plausible time-reversed trajectory, we also need to mirror charge and parity. But every trajectory still has a plausible time-reversed trajectory. The quantum physical revision states that unobservable information about particles still evolves deterministically, but leaves open to interpretation how these unobservables relate to measured observables of a system.

But both classical and quantum physics seem to imply that the world evolves deterministically with time. But we are not consigned to the dull world that Newton's laws imply. We don't know the weather two weeks from now, or the winner of tonight's basketball game. We can't even predict the start of an earthquake or volcanic eruption the second before one occurs! What gives? There is one physical “law” that is time-asymmetric and thus discriminates between directions of time (Baranger, 2000). It is the Second Law of Thermodynamics, which states that the entropy of the world does not decrease (and sometimes increases) as time moves “forward.” This actually provides the only physical definition for what the “forward” direction of time even means! But what is this nebulous quantity entropy? In the study of thermodynamics, it was empirically discovered that energy, in the form of heat, tends to flow from objects of high temperature to those of low temperature as time advances. Physicist Rudolf Clausius coined the term entropy in 1868, drawing from the Greek word entropia, meaning “a turning toward,” for a measure he defined that related to the lack of potential of low-temperature objects to transfer heat energy to high-temperature ones. But alas, temperature was in turn defined in terms of capability to transfer heat, and we are left with little insight. And none of this could be derived from Newton's laws. Physicist Ludwig Boltzmann was the first to rigorously define entropy. Boltzmann is known as the founder of statistical mechanics, a field that uses statistics to reconcile the behavior of microscopic particles with the properties of the bulk material they compose. His tombstone famously bears the equation that encapsulates this founding idea,

$$S = k \log W,$$
where S is the entropy of a system, k is a constant (the now-eponymous Boltzmann constant), and W is the number of possible indistinguishable microstates that could equivalently describe the system's state. Therefore, the Second Law equivalently states that the value of W for the universe doesn't decrease as time moves forward.

Boltzmann was also able to define temperature of a bulk material in terms of the microscopic states of its constituent particles. Boltzmann's entropy managed to suitably explain irreversible processes like heat transfer from hot to cold objects, mixing of two different substances, and phase transitions of substances at given temperatures. Boltzmann's definition is simultaneously problematic and intriguing. First, it is problematic, because calculating the entropy of a system depends very strongly on how a system's state (specifically its “macrostate”) is to be described. The more accurately we describe the system, the fewer indistinguishable microstates there would be that would count as describing the system, and thus the lower the entropy. So the thermodynamic fact that Clausius concluded reduces to something that seems surprisingly subjective. Perhaps, as time goes by, physicists simply get lazier and less rigorously describe the macroscopic states of their systems! But it also offers a very intriguing interpretation of the Second Law: as time goes by, we cannot ever become more capable of describing the world, and sometimes we become less capable. In essence, the Second Law states that the world is unpredictable (and ever-increasingly so)! Or, seen from another perspective, as time goes by, the number of possible microstates describing the world becomes larger, and thus some random-like process has somehow generated new microstates. We now need more information to describe our system; it has become more complicated. Entropy is the measure of randomness that has accumulated in the world. In this way, Boltzmann's entropy seems to fly in the face of Newton's laws. After all, consider a system that can be described by any of W indistinguishable microstates. For each microstate, we could use Newton's laws to traverse time, and end up with exactly W indistinguishable microstates for the system at any point in the past or future. Newton's laws, and the fact that ordinary differential equations have unique solutions, preclude generating several microstates from one.¹

¹In reality, since microstate parameters such as position and velocity vary continuously, W is not a discrete number, but rather a microstate density. Therefore, the entropy S of a single macrostate cannot be unambiguously defined, but the difference between two macrostates can be (by using the same reference density for both states). In this case, there is a theorem called Liouville's theorem that says that Newton's laws preserve phase-space density, and so entropy should not change.

However, Boltzmann claimed to derive the Second Law for an ideal gas (a particularly tractable physical system) from Newton's laws with his H-theorem. But physicist Johann Loschmidt complained that Boltzmann's H-theorem must be inaccurate, as it should not be possible to derive a time-asymmetric result from time-symmetric laws. This objection is now known as Loschmidt's paradox. According to Newton's laws, why can't we, for all of our plausible trajectories x(t) where entropy doesn't decrease as time increases, create time-reversed trajectories $\tilde{x}(t)$ in which entropy doesn't decrease as time decreases as well, and conclude that entropy is constant? It is now accepted that Boltzmann had implicitly made a (time-asymmetric) probabilistic assumption of “molecular chaos,” meaning that particles randomly collided with each other. Despite its mysterious underpinnings, the Second Law is astoundingly real. It is the reason we constantly search for new sources of usable energy, even though Newton's laws say that energy is conserved. It is the reason milk cannot be unspilt, liquids cannot be unmixed, and ears don't warm up on frosty days. Boltzmann's entropy formula fascinatingly links these phenomena to the probabilistic outcome of random-like events. But how can deterministic, and even reversible, dynamics produce random-like events? The answer is chaos.

Chaos

“Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” —Edward Lorenz, title of a talk presented in 1972

One of the founders of chaos theory was meteorologist Edward Lorenz. In 1961, Lorenz had a (deterministic) weather model that forecasted future weather conditions based on current conditions. These calculations were done using an early computer (Gleick, 1987). In one case, Lorenz had run a simulation, but wanted to look at some data again.

To save computing time, he started the re-run of his simulation in the middle of the trajectory by inputting what the previous simulation had output. The long-term weather prediction results were profoundly different in this second simulation, and Lorenz realized that this was because the numbers that he had input were rounded off compared to what the computer had worked with (from 6 digits of precision to 3). But the results were drastically different (as opposed to proportional to the size of the input error)! Lorenz realized that little differences in weather conditions propagate enormously with time, and gave a talk entitled “Does the flap of a butterfly's wings in Brazil set off a tornado in Texas?”. Lorenz's realization was that no matter how accurately current weather conditions could be measured, physical models would still fail to make accurate long-term weather predictions. At the same time, a similar revelation was made in the mathematical world (Sinai, 2007). Mathematician Andrei Kolmogorov was studying measure-theoretic isomorphism of dynamical systems around 1959, and created a class of probabilistic systems (which he called quasi-regular) and defined a notion of entropy for those systems. Entropy was a measure of how unpredictable those dynamical systems are, and was intended to categorize systems based on how random they were. The notion of entropy was like this: suppose you have been “watching” a dynamical system for a very long time and have measured the state of the system at each point in time with arbitrarily high (but imperfect) accuracy. Entropy is then how much uncertainty you have as to what state the system will be in at the next point in time.²

²This definition may seem different from the one presented later in this paper. However, this is an equivalent definition of entropy. For either topological or measure-theoretic entropy, an equivalent definition of the entropy of a transformation T with respect to a partition α is

$$h(T, \alpha) = \lim_{n\to\infty}\left[ H\!\left(\bigvee_{j=0}^{n} T^{-j}\alpha\right) - H\!\left(\bigvee_{j=1}^{n} T^{-j}\alpha\right)\right].$$

This is a precise version of the colloquial definition that was given.

Vladimir Rokhlin learned of these topics, and decided to compute the entropy of deterministic automorphisms of the 2-dimensional torus. Yakov Sinai, a doctoral student of Kolmogorov, tried to prove that the entropy must be zero since the transformation is entirely deterministic, but failed. Kolmogorov then looked at the problem and told him the entropy must be positive! Thus they reached the same conclusion as Lorenz: there are purely deterministic dynamical systems for which, no matter how accurately past and current states have been measured, it is impossible to predict the future state of the system. This gives a suitable definition for a chaotic dynamical system: one that has positive entropy. Unfortunately, real examples of chaotic (deterministic) dynamical systems are often very difficult to analyze, and so it can be most interesting to look at the properties of some toy dynamical systems called shifts of finite type. This paper will investigate the entropy (i.e., the rate of entropy production) of these systems.

1 Introduction

The Variational Principle states that the topological entropy of a dynamical system is the supremum of the measure-theoretic entropy of all measures preserved by the dynamical system. For certain systems called shifts of finite type, there is a measure, called Parry measure, which achieves this supremum (and it is actually the unique measure to do so). The goal of this paper is to derive this entropy-maximizing measure and confirm that it achieves the topological entropy. First, we introduce dynamical systems and describe shifts of finite type. We then introduce topological entropy, and calculate the topological entropy for shifts of finite type. Next, we present the concept of measure-preserving dynamical systems, and briefly describe Markov measures on the shift space. We then cover measure-theoretic entropy and the Variational Principle. In the final section, we describe the Parry measure and confirm that it achieves the topological entropy.

2 Dynamical systems

2.1 Background

Let X be a set (space), and let T : X → X be a transformation of that space. Then (X,T ) is called a dynamical system. Our space X consists of all possible states of the system, and is called the phase space. The function T models the passage of time in discrete steps. We can imagine a trajectory of a point x ∈ X in this dynamical system as an infinite sequence of points where each successive point is the image of the previous point under T , i.e.,

xn+1 = T (xn).

For any $n \in \mathbb{Z}^+$, let $T^n$ denote $T$ composed with itself $n$ times. Then if $x_0$ is the initial state of the point, we have $x_n = T^n(x_0)$.

Note that since T is not necessarily invertible, we may not be able to “backtrace” where a point has been in the past.
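As a quick illustration (my own sketch, not from the text), the following Python snippet iterates a map T to produce a finite piece of a trajectory; the doubling map on the unit interval is used purely as an assumed example of a transformation.

```python
def trajectory(T, x0, n):
    """Return the orbit x0, T(x0), ..., T^n(x0) of x0 under the map T."""
    orbit = [x0]
    for _ in range(n):
        orbit.append(T(orbit[-1]))
    return orbit

# Example map (an assumption, not from the paper): the doubling map x -> 2x (mod 1).
doubling = lambda x: (2 * x) % 1.0
print(trajectory(doubling, 0.2, 5))   # approximately [0.2, 0.4, 0.8, 0.6, 0.2, 0.4]
```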

2.2 Shifts of finite type

Shifts of finite type are a class of dynamical systems which are particularly amenable to study and are the focus of this paper. We can imagine shifts of finite type as giving dynamical meaning to infinite walks along finite graphs. Let us consider a directed graph with nodes labeled 1, . . . , k. We represent the edges between nodes of this graph with an adjacency matrix A, where Ai,j = 1 if the transition i → j is allowed, and 0 otherwise.

Definition. Let A be a k × k adjacency matrix. Then the one-sided shift space on A is the space
$$\Sigma_A^+ = \{(x_0, x_1, \ldots, x_n, \ldots) : A_{x_i, x_{i+1}} = 1 \text{ for } i \in \mathbb{Z}^+\}$$
of all possible infinite paths x0 → x1 → · · · along the graph represented by A.

Definition. A cylinder set of length n is a set of the form
$$[x_0, x_1, \ldots, x_n] = \{(y_i)_{i=0}^{\infty} \in \Sigma_A^+ : y_i = x_i \text{ for } 0 \le i \le n\},$$
which is the set of all infinite paths which begin with the path x0 → x1 → · · · → xn (which is a path of length n).

A useful topology on $\Sigma_A^+$ is that generated by the set of all cylinder sets (together with ∅ and $\Sigma_A^+$). We will denote this topology as C. Note that with this topology, all cylinder sets are both open and closed (they are the complement of the union of a finite number of cylinders), and so the space is totally disconnected.

We can imagine x0 as the “current position” of a given walk along the graph, and the rest of the sequence as a record of the nodes to be visited in the future. To model the passage of time, we create the shift transformation.

Definition. The shift transformation $\rho : \Sigma_A^+ \to \Sigma_A^+$ is defined by $\rho(x_0, x_1, \ldots) = (x_1, x_2, \ldots)$.

The entire dynamical system is known as a shift of finite type.
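To make the definitions concrete, here is a small Python sketch (my own, not part of the original text) that encodes a shift of finite type by its adjacency matrix. The matrix used, the “golden mean” shift which forbids consecutive 1s, is an assumed example; any 0/1 matrix works the same way.

```python
import numpy as np

A = np.array([[1, 1],
              [1, 0]])   # assumed example: transition 1 -> 1 is forbidden

def is_allowed(word, A):
    """Check whether a finite word x_0 x_1 ... x_n labels a path in the graph of A."""
    return all(A[word[i], word[i + 1]] == 1 for i in range(len(word) - 1))

def shift(word):
    """The shift map rho applied to a (finite prefix of a) sequence."""
    return word[1:]

w = (0, 1, 0, 0, 1)
print(is_allowed(w, A))        # True: no consecutive 1s
print(is_allowed((1, 1), A))   # False: 1 -> 1 is forbidden
print(shift(w))                # (1, 0, 0, 1)
```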

2.3 Perron-Frobenius theorem

The Perron-Frobenius theorem will be an important tool for describing the limiting behavior of the shift space. This is taken almost directly from (Walkden, 2012, lec. 15), and will be stated without proof. We begin with some necessary definitions.

Definition. A matrix A is non-negative if each entry is non-negative, and positive if each entry is positive.

Definition. A non-negative matrix A is aperiodic if there exists some $n \in \mathbb{N}$ such that $A^n$ is positive.

Remark. If A is an adjacency matrix, A is aperiodic if there is some n ∈ N such that a path of length n can be constructed from any node to any other node.

Theorem 1 (Perron-Frobenius Theorem). Let A be a non-negative aperiodic k × k matrix. Then:

1. There is a positive real eigenvalue λ of A such that all other eigenvalues $\lambda_i \in \mathbb{C}$ satisfy $|\lambda_i| < \lambda$,

2. The eigenvalue λ is simple (i.e. the corresponding eigenspace is one-dimensional),

3. There are unique left- and right-eigenvectors u and v such that $u^T A = \lambda u^T$ and $Av = \lambda v$, with all entries $u_i > 0$ and $v_i > 0$, and $\sum_i u_i = 1$ and $u^T v = 1$.

The main use of this theorem is to characterize what $A^n$ looks like for large n. The following corollary will be used repeatedly later on.

Corollary 2. A useful consequence of this theorem is

$$\lim_{n\to\infty} \frac{A^n}{\lambda^n} = v u^T.$$

Proof. We put A into Jordan canonical form,
$$A = P D P^{-1} = \begin{bmatrix} v & v_2 & \cdots & v_k \end{bmatrix} \begin{bmatrix} \lambda & & & \\ & J_1 & & \\ & & \ddots & \\ & & & J_m \end{bmatrix} \begin{bmatrix} u^T \\ u_2^T \\ \vdots \\ u_k^T \end{bmatrix},$$
where each $J_i$ is a Jordan block with eigenvalue $\lambda_i$ (so m ≤ k − 1). Note that this is correct since $Av = v\lambda u^T v = \lambda v$, and likewise for u. Recall that the nth power of a Jordan block of size h with eigenvalue $\lambda_i$ is
$$J_i^n = \begin{bmatrix} \lambda_i^n & \binom{n}{1}\lambda_i^{n-1} & \cdots & \binom{n}{h-1}\lambda_i^{n-h+1} \\ & \lambda_i^n & \ddots & \vdots \\ & & \ddots & \binom{n}{1}\lambda_i^{n-1} \\ & & & \lambda_i^n \end{bmatrix}.$$

Since for all eigenvalues other than λ, we have $|\lambda_i|/\lambda < 1$, this means that for each Jordan block $J_i$, we have
$$\lim_{n\to\infty} \frac{1}{\lambda^n} J_i^n = 0.$$
Then
$$\frac{A^n}{\lambda^n} = P \frac{D^n}{\lambda^n} P^{-1} = \begin{bmatrix} v & v_2 & \cdots & v_k \end{bmatrix} \begin{bmatrix} 1 & & & \\ & J_1^n/\lambda^n & & \\ & & \ddots & \\ & & & J_m^n/\lambda^n \end{bmatrix} \begin{bmatrix} u^T \\ u_2^T \\ \vdots \\ u_k^T \end{bmatrix}.$$

Taking the limit as n → ∞, and noting that all entries in the block diagonal matrix tend to 0 except for the first, we recover the result.
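The corollary is easy to check numerically. The sketch below (my own illustration, using NumPy; the matrix A is an assumed example) extracts λ, u, and v from a small aperiodic 0/1 matrix, normalizes them as in Theorem 1, and compares $A^n/\lambda^n$ against $v u^T$.

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]], dtype=float)   # an aperiodic adjacency matrix (assumed example)

# Right eigenvector v and left eigenvector u for the leading eigenvalue lambda.
evals, vecs = np.linalg.eig(A)
lam = evals.real.max()
v = np.abs(vecs[:, np.argmax(evals.real)].real)
evalsT, vecsT = np.linalg.eig(A.T)
u = np.abs(vecsT[:, np.argmax(evalsT.real)].real)

# Normalize as in the Perron-Frobenius theorem: sum(u) = 1 and u . v = 1.
u = u / u.sum()
v = v / (u @ v)

n = 50
print(np.allclose(np.linalg.matrix_power(A, n) / lam**n, np.outer(v, u)))  # True (approximately)
```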

3 Topological entropy

The first thing that we’d like to do is calculate the topological entropy of a given shift space. Therefore, we will now define topological entropy, and then prove a theorem that allows us to more easily calculate it.

3.1 Background

The following definitions build to allow us to define topological entropy. In the section that follows, we will let X be a compact Hausdorff topological space.

Definition. Let α and β be open covers of X. The common refinement of α and β is

α ∨ β := {A ∩ B : A ∈ α, B ∈ β}.

Note that α ∨ β is also an open cover of X.

Definition. Let α be an open cover of X. Let N(α) denote the minimum cardinality of all finite subcovers of α (we are guaranteed that α has a finite subcover since X is compact). Then the topological entropy of α is $H(\alpha) := \lg(N(\alpha))$, where lg is the base-2 logarithm.³

Definition. Let α be an open cover of a space X, and let T : X → X be a continuous mapping. The topological entropy of T with respect to α is

$$h(T, \alpha) := \lim_{n\to\infty} \frac{1}{n} H\!\left(\bigvee_{i=0}^{n-1} T^{-i}\alpha\right),$$

where $T^{-i}\alpha = \{T^{-i}A : A \in \alpha\}$ (note that $T^{-i}\alpha$ is also an open cover of X, since T is continuous).

This all depends on the open cover α, of course. If we consider the trivial open cover {X}, we have h(T, {X}) = 0, since $T^{-i}X = X$ for any i. It is apparent that finer open covers will result in higher entropy values. To quantify how random or unpredictable T is, we don't want to depend on which open cover α we choose. We want to take open covers that are arbitrarily fine to isolate the unpredictability of T.

Definition. Let X be a topological space and let T : X → X be continuous. The topolog- ical entropy of T is

h(T ) := sup{h(T, α): α is an open cover of X}.

Of course, it is impractical to attempt to enumerate every open cover of a space and compute the topological entropy of T with respect to each open cover. A theorem, presented by Pollicott and Yuri (1998), will allow us to assert that if we can find an open cover α that is specific enough, then h(T ) = h(T, α). The following definitions and lemmas lead up to this theorem.

³The logarithm in any base is appropriate, but base-2 is most often used in an information-theoretic context.

Definition. An open cover α of a space X endowed with a continuous map T : X → X is said to be a generator if for any sequence of sets $(A_n)_{n=0}^{\infty}$ in α, we have that
$$\bigcap_{n=0}^{\infty} T^{-n}\overline{A_n}$$
contains at most one point (where for any set $E \subseteq X$, $\overline{E}$ denotes the closure of E).

Lemma 3. Any subcover of a generator is also a generator.

Proof. Any sequence of sets in a subcover of a cover α is also a sequence of sets in α itself, so the intersection condition that the subcover must satisfy to be a generator is already satisfied because α is a generator.

Definition. Let α and β be open covers of X. We say α refines β if each set in α is wholly contained in a set in β.

Remark. Note that the common refinement of two open covers refines each open cover.

Lemma 4. Let α and β be open covers of X. If α refines β, then H(α) ≥ H(β).

Proof. Equivalently, we must prove that N(α) ≥ N(β). Let α0 = {A1,...,An} be a finite subcover of α. Since α refines β, for each Ai ∈ α0, we can find a Bi in β such that Ai ⊆ Bi.

Then if we take β0 = {B1, . . . , Bn}, where each Ai ⊆ Bi, we see that β0 is a finite subcover of β of cardinality at most n. Therefore, if α has a finite subcover of any given cardinality, then β has a finite subcover of at most that cardinality. Thus N(α) ≥ N(β).

Lemma 5. Let α and β be open covers of X. If α refines β, then h(T, α) ≥ h(T, β).

Proof. First, we claim that $\bigvee_{i=0}^{n-1} T^{-i}\alpha$ refines $\bigvee_{i=0}^{n-1} T^{-i}\beta$. Suppose we have an arbitrary set $A = \bigcap_{i=0}^{n-1} T^{-i}A_i$ in $\bigvee_{i=0}^{n-1} T^{-i}\alpha$. Then for each $A_i$, we can let $B_i$ be such that $A_i \subseteq B_i$, since α refines β. Let $B = \bigcap_{i=0}^{n-1} T^{-i}B_i$, and note that this set is in $\bigvee_{i=0}^{n-1} T^{-i}\beta$. We note that A ⊆ B, and since A was arbitrary, we conclude that $\bigvee_{i=0}^{n-1} T^{-i}\alpha$ refines $\bigvee_{i=0}^{n-1} T^{-i}\beta$. Using Lemma 4, we can conclude that

$$H\!\left(\bigvee_{i=0}^{n-1} T^{-i}\alpha\right) \ge H\!\left(\bigvee_{i=0}^{n-1} T^{-i}\beta\right),$$

and thus h(T, α) ≥ h(T, β).

The following is the generator theorem, which allows us to calculate topological entropy by finding a generator.

Theorem 6. If α is a generator, then h(T ) = h(T, α).

Before we can prove this theorem, we will need several lemmas. For the remainder of this section, let α be a generator. For notational convenience, denote $\alpha_n := \bigvee_{i=0}^{n-1} T^{-i}\alpha$.

Lemma 7. Let α be a generator. If β is an open cover of X, then there is some N ∈ N such that some subcover of αN refines β.

Proof. For all $x \in X$ and for all $n \in \mathbb{N}$, let $A_{x,n}$ be any set in α such that $x \in T^{-n}A_{x,n}$. Then, since α is a generator, for all $x \in X$, we know that $\bigcap_{n=0}^{\infty} T^{-n}A_{x,n} = \{x\}$. For each $x \in X$, let $B_x$ be any set in β such that $x \in B_x$.

For each $x \in X$, we claim that there is some $N_x$ such that $\bigcap_{n=0}^{N_x} T^{-n}A_{x,n} \subseteq B_x$, due to the fact that X is compact and since each of those finite intersections is compact (Keynes and Robertson, 1969). Then, we will define
$$\gamma = \left\{ \bigcap_{n=0}^{N_x} T^{-n}A_{x,n} \right\}_{x \in X}.$$
Note that γ is an open cover of X, since each point x is contained in the open set indexed by itself. Moreover, because of how we chose each $N_x$, we observe that γ refines β.

Since X is compact, let $\gamma_0$ be a finite subcover of γ. For concreteness, say
$$\gamma_0 = \left\{ \bigcap_{n=0}^{N_{x_i}} T^{-n}A_{x_i,n} \right\}_{i=1}^{M},$$
and note that this subcover will still refine β. Now, define $N = \max_i N_{x_i}$. Now we will define an open cover
$$\delta = \bigcup_{i=1}^{M} \left\{ \bigcap_{n=0}^{N_{x_i}} T^{-n}A_{x_i,n} \;\cap \bigcap_{n=N_{x_i}+1}^{N} T^{-n}A_n \;:\; A_{N_{x_i}+1}, \ldots, A_N \in \alpha \right\}.$$

Essentially, δ splits up some of the open sets in $\gamma_0$ down to the “level” of $\alpha_N$. From the form, it is apparent that δ refines $\gamma_0$. At the same time, the form shows that δ must be a subcover of $\alpha_N$. Since refinement is transitive, we conclude that δ refines β as well.

Lemma 8. Let α be a generator. The entropy of T with respect to the refined open cover

αN is equal to that of T with respect to the original open cover α,

h(T, αN ) = h(T, α).

Proof. First, note that

k−1 k−1 N−1  k+N−2 _ −i _ −i _ −j _ −i T αN = T  T α = T α = αk+n−1. i=0 i=0 j=0 i=0

Then

$$h(T, \alpha_N) = \lim_{k\to\infty} \frac{1}{k} H\!\left(\bigvee_{i=0}^{k-1} T^{-i}\alpha_N\right) = \lim_{k\to\infty} \frac{1}{k} H(\alpha_{k+N-1}) = \lim_{k\to\infty} \frac{1}{k+N} H(\alpha_{k+N-1}) = h(T, \alpha).$$

We can now prove Theorem 6.

Proof. Let α be a generator, and let β be any open cover of X. By Lemma 7, we can let N be such that some subcover of αN , say δ, refines β. Since δ refines β, we know by Lemma 5 that h(T, δ) ≥ h(T, β). From the definition of topological entropy, it is clear that a cover and its subcover must have the same topological entropy, and thus h(T, αN ) = h(T, δ), and therefore h(T, αN ) ≥ h(T, β). From Lemma 8, we know that h(T, αN ) = h(T, α), so therefore we conclude that h(T, α) ≥ h(T, β). This is true for any open cover β, so h(T, α) ≥ h(T ). Since α is itself an open cover, we have the equality h(T, α) = h(T ).

3.2 Topological entropy of the shift space

Let A be a k × k aperiodic adjacency matrix and let u, v, and λ be the corresponding objects from the Perron-Frobenius Theorem (Thm. 1). We would like to calculate the topological entropy of the shift transformation ρ on the shift space $\Sigma_A^+$. We would like to find a generator so that we may calculate the topological entropy using Theorem 6. We will choose to partition the space by grouping together all sequences starting with a common symbol, so let α = {[1], . . . , [k]} be that open cover of $\Sigma_A^+$.

Lemma 9. The open cover α is a generator.

Proof. First, we recall that all cylinder sets are clopen, and so for all $A \in \alpha$, $\overline{A} = A$. Consider an arbitrary sequence of sets $([i_n])_{n=0}^{\infty}$ in α. Then the only possible member of the set $\bigcap_{n=0}^{\infty} \rho^{-n}[i_n]$ is the sequence $(i_n)_{n=0}^{\infty}$, if that sequence is indeed in the shift space (otherwise, the set is empty). Thus the set determined by any arbitrary sequence of sets in α contains at most one point, and so α is a generator.

Note that the constituents of the open cover $\bigvee_{i=0}^{n} \rho^{-i}\alpha$ are simply the cylinder sets of length n, and so the minimum cardinality of all of its subcovers is simply the number of possible distinct (non-empty) cylinder sets of length n.

Therefore, we’d like to count all possible cylinder sets of length n. Let θn be the number of possible distinct cylinder sets of length n (i.e., paths of length n). We wish to count θn.

Proposition 10. $A^n = [A^n_{i,j}]$ is the matrix whose $(i, j)$ entry is the number of possible cylinders of length n beginning with i and ending with j.

Proof. We prove by induction on n. First, note that since A is aperiodic, a sequence in $\Sigma_A^+$ may begin with any symbol. Thus, for n = 0, the possible paths are trivial ones that start and end at the same point, so there is 1 possible path if i = j and 0 if i ≠ j; this yields $I = A^0$. Now suppose the proposition holds for n. Then the number of paths of length n + 1 starting at i and ending at j is equal to the sum of the number of paths of length n from i to h over all h for which the transition h → j is possible, which is
$$\sum_{h=1}^{k} A^n_{i,h} A_{h,j} = (A^n A)_{i,j} = A^{n+1}_{i,j}.$$

Let $\mathbf{1} := (1, 1, \ldots, 1)^T$ be the vector of 1s of length k. This will allow us to conveniently notate summations of rows or columns of matrices. For a k × k matrix M, $\mathbf{1}^T M$ gives a row vector of sums of columns of M, while $M\mathbf{1}$ gives a column vector of sums of rows of M. So $\mathbf{1}^T M \mathbf{1}$ is the sum of all entries of M.

Proposition 11. The number of possible cylinders of length n grows like $\lambda^n$. Specifically,
$$\lim_{n\to\infty} \frac{\theta_n}{\lambda^n} = \mathbf{1}^T v.$$

Proof. We note that due to Proposition 10, $\theta_n$ is simply the sum of entries of $A^n$, i.e.,
$$\theta_n = \mathbf{1}^T A^n \mathbf{1}.$$
We would like to investigate the rate of growth of $\theta_n$, so we calculate the ratio $\theta_n/\lambda^n$,
$$\frac{\theta_n}{\lambda^n} = \frac{\mathbf{1}^T A^n \mathbf{1}}{\lambda^n} = \mathbf{1}^T \left(\frac{A^n}{\lambda^n}\right) \mathbf{1}.$$
Taking the limit as n → ∞, and using Corollary 2,
$$\lim_{n\to\infty} \frac{\theta_n}{\lambda^n} = \mathbf{1}^T (v u^T) \mathbf{1} = (\mathbf{1}^T v)(u^T \mathbf{1}) = \mathbf{1}^T v \cdot 1 = \mathbf{1}^T v.$$

Theorem 12. The topological entropy of the system $(\Sigma_A^+, \rho)$ is

h(ρ) = lg λ.

Proof. Since α is a generator, we can use Theorem 6 to assert that h(ρ) = h(ρ, α). Then, using the fact that the subcover of minimal cardinality for $\bigvee_{i=0}^{n-1} \rho^{-i}\alpha$ is the collection of all allowed cylinders of length n − 1, we know that $N\!\left(\bigvee_{i=0}^{n-1} \rho^{-i}\alpha\right) = \theta_{n-1}$. Therefore, using Proposition 11,

$$h(\rho, \alpha) = \lim_{n\to\infty} \frac{1}{n} H\!\left(\bigvee_{i=0}^{n-1} \rho^{-i}\alpha\right) = \lim_{n\to\infty} \frac{1}{n} \lg \theta_{n-1} = \lim_{n\to\infty} \frac{1}{n} \lg(\mathbf{1}^T v \, \lambda^{n-1}) = \lim_{n\to\infty} \frac{1}{n} \lg(\mathbf{1}^T v) + \lim_{n\to\infty} \frac{n-1}{n} \lg \lambda = \lg \lambda.$$
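As a sanity check (my own, not from the paper), the growth rate of the cylinder count can be computed directly: $\theta_n$ is the sum of the entries of $A^n$, and $(1/n)\lg\theta_n$ should approach lg λ. The matrix A is the same assumed example used earlier.

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]], dtype=float)   # assumed example adjacency matrix

lam = np.linalg.eigvals(A).real.max()     # Perron eigenvalue of A

for n in (5, 20, 80):
    theta_n = np.linalg.matrix_power(A, n).sum()   # number of allowed cylinders of length n
    print(n, np.log2(theta_n) / n, np.log2(lam))   # the two columns converge
```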

4 Measure-preserving dynamical systems

The next segment of the paper will describe measure-theoretic entropy for measure-preserving dynamical systems, so first we must describe what measure-preserving dynamical systems are. We will then examine Markov measures, a family of measures on the shift space that are preserved by the shift transformation.

4.1 Background

In this section, let X be a space equipped with σ-algebra B. Let T : X → X be a B-measurable transformation of that space and let µ : B → [0, 1] be a probability measure on X such that µ is preserved by T, meaning that for all B ∈ B, $\mu(T^{-1}B) = \mu(B)$. Then (X, B, µ, T) is called a measure-preserving dynamical system.

4.2 Measures on the shift space

For shifts of finite type, the shift transformation ρ preserves a wide variety of measures on the shift space $\Sigma_A^+$. An interesting family of measures are called Markov measures, which assign measures (i.e., probabilities) to cylinder sets according to the likelihood that such a sequence of states would be observed in a Markov chain that is in its steady state (to this end, we consider only aperiodic Markov chains).

Markov chains are determined by a stochastic matrix P (a k × k matrix) of transition probabilities, where $P_{i,j}$ is the probability that state i transitions to state j.

Definition. A non-negative k × k matrix P is a stochastic matrix if

P 1 = 1.

Thus, if a system has probability $p_i$ of being in state i at a given time, we can represent this distribution with a row vector $p = (p_1, \ldots, p_k)$, so that the probability distribution of the following state will be $p' = pP$. If the Markov chain is aperiodic (i.e., if P is aperiodic), then it will approach a unique “steady-state” probability distribution. Henceforth, we will only consider aperiodic Markov chains. Thus, the Perron-Frobenius Theorem (Thm. 1) says that P has an eigenvalue which is simple and has greater modulus than all other eigenvalues. Because P is stochastic, this eigenvalue is actually equal to 1 (with right eigenvector $\mathbf{1}$). The left eigenvector p with eigenvalue 1 for P (i.e., pP = p) is the “steady-state” or equilibrium probability distribution of the Markov chain.
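A minimal sketch (my own, using NumPy; the matrix P is an assumed example) of finding the stationary distribution of an aperiodic stochastic matrix by simply iterating p ← pP from any starting distribution:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],    # hypothetical 3-state aperiodic stochastic matrix
              [0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0]])

p = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    p = p @ P                     # power iteration converges to the stationary distribution

print(p)                          # equilibrium distribution
print(np.allclose(p @ P, p))      # True: pP = p
```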

Definition. A stochastic matrix P is compatible with the shift space $\Sigma_A^+$ if all transitions allowed by P are allowed by A, i.e., if for every i, j, $P_{i,j} > 0$ only as long as $A_{i,j} = 1$.

Remark. Many stochastic matrices may be compatible with a given shift space, and many shift spaces may be compatible with a given stochastic matrix.

Definition. A Markov measure on the shift space $\Sigma_A^+$ assigns to each cylinder set the probability
$$\mu_P([x_0, \ldots, x_n]) = p_{x_0} P_{x_0,x_1} \cdots P_{x_{n-1},x_n},$$
where P is a stochastic matrix compatible with $\Sigma_A^+$, and p is the left eigenvector associated with the eigenvalue 1 for the matrix P.

One can check that Markov measures are preserved by the shift transformation ρ.
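A sketch of this check (my own, continuing the assumed matrix above): the preimage $\rho^{-1}[x_0, \ldots, x_n]$ is the disjoint union of the cylinders $[i, x_0, \ldots, x_n]$ over all symbols i, so its Markov measure should equal that of $[x_0, \ldots, x_n]$, because pP = p.

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],     # assumed stochastic matrix compatible with some A
              [0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0]])
k = P.shape[0]

p = np.full(k, 1.0 / k)
for _ in range(1000):
    p = p @ P                      # stationary distribution

def mu_P(cyl):
    """Markov measure of the cylinder set [x0, ..., xn]."""
    m = p[cyl[0]]
    for a, b in zip(cyl, cyl[1:]):
        m *= P[a, b]
    return m

cyl = (0, 1, 2, 0)
preimage = sum(mu_P((i,) + cyl) for i in range(k))   # measure of rho^{-1}[x0, ..., xn]
print(np.isclose(preimage, mu_P(cyl)))               # True: the measure is shift-invariant
```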

5 Measure-theoretic entropy

In this section, we will finally describe the measure-theoretic entropy of a dynamical system and its relation to the analogous topological entropy.

5.1 Background

We will now give some definitions in order to define the measure-theoretic entropy of a measure-preserving dynamical system (X, B, µ, T ).

Remark. Our definition of measure-theoretic entropy of a partition of a probability measure space is a natural analog to the topological case. There is an enticing correspondence between measurable sets and open sets, measurable functions and continuous functions, partitions and open covers, joins and common refinements, and strongly-generating partitions and generators. Perhaps the analogy shouldn't be taken too far, due to the different properties of topological and measure spaces. However, in the special case of the shift space, the fact that the shift space is totally disconnected makes this correspondence very close.

Definition. Let X be a measure space and let α = {A1, . . . , An} be a finite collection of non-empty, mutually disjoint, measurable subsets of X. If $\bigcup_{i=1}^{n} A_i = X$, then α is a partition of X.

Definition. Let α and β be partitions of X. The join of α and β is

α ∨ β := {A ∩ B : A ∈ α, B ∈ β, and A ∩ B 6= ∅}.

Note that α ∨ β is also a partition of X.

Definition. Let α be a partition of a space X with probability measure µ. The measure-theoretic entropy of α is
$$H_\mu(\alpha) := \sum_{A \in \alpha} \mu(A) \lg\!\left(\frac{1}{\mu(A)}\right),$$
where we use the convention $0 \lg \frac{1}{0} := 0$.⁴

Note that if we have a collection α which is both an open cover and a partition, then if the measure µ is evenly distributed among all elements of the partition, the measure- theoretic entropy and topological entropy of α are equivalent, i.e., Hµ(α) = H(α). The following definitions are also analogous to corresponding quantities for topological entropy.
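A small sketch (my own) of computing the entropy of a partition from its measure values; with the uniform distribution over N sets it reduces to lg N, matching the topological H(α) for a cover that is also a partition.

```python
import numpy as np

def partition_entropy(masses):
    """H_mu(alpha) = sum mu(A) * lg(1/mu(A)), with the convention 0 * lg(1/0) = 0."""
    masses = np.asarray(masses, dtype=float)
    nonzero = masses[masses > 0]
    return float(np.sum(nonzero * np.log2(1.0 / nonzero)))

print(partition_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 = lg 4, the uniform case
print(partition_entropy([0.5, 0.5, 0.0]))           # 1.0; zero-mass sets contribute nothing
```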

Definition. Let (X, B, µ, T) be a measure-preserving dynamical system, and let α be a partition of X. The measure-theoretic entropy of (X, B, µ, T) with respect to α is
$$h_\mu(T, \alpha) := \lim_{n\to\infty} \frac{1}{n} H_\mu\!\left(\bigvee_{i=0}^{n-1} T^{-i}\alpha\right).$$

Definition. Let (X, B, µ, T) be a measure-preserving dynamical system. The measure-theoretic entropy of (X, B, µ, T) is

hµ(T ) := sup{hµ(T, α): α is a partition of X}.

Just as there is a generator theorem for topological entropy (Thm. 6), there is an equivalent generator theorem for measure-theoretic entropy, called Sinai's generator theorem. We begin with several analogous definitions and lemmas.

Definition. Let α be a partition of X. The algebra generated by α is
$$\sigma(\alpha) := \left\{ \bigcup_{S \in \mathcal{S}} S \;:\; \mathcal{S} \in \mathcal{P}(\alpha) \right\},$$
where $\mathcal{P}(\alpha)$ is the power set of α. That is, σ(α) is exactly the set of all unions of some sets of α. Note that the collection σ(α) is an algebra of sets.

Definition. A partition α of a space X endowed with σ-algebra B and measurable map

T : X → X is said to strongly T-generate B if for any set B ∈ B, there exists some $n \in \mathbb{N}$ such that
$$B \in \sigma\!\left(\bigvee_{i=0}^{n-1} T^{-i}\alpha\right).$$

⁴This is a sensible convention, since $\lim_{\epsilon \to 0} \epsilon \lg \frac{1}{\epsilon} = 0$.

Definition. Let α and β be partitions of X. We say α refines β if every set A ∈ α is contained within some set B ∈ β. Equivalently, α refines β if every set in β is a union of sets in α, or if β ⊆ σ(α).

Lemma 13. Let α and β be partitions of X. If α refines β, then $H_\mu(\alpha) \ge H_\mu(\beta)$.

Proof. Since every set in β is a union of sets in α,

$$H_\mu(\beta) = -\sum_{B\in\beta} \mu(B)\lg\mu(B) = -\sum_{B\in\beta}\sum_{\substack{A\in\alpha \\ A\subseteq B}} \mu(A)\lg\mu(B) \le -\sum_{B\in\beta}\sum_{\substack{A\in\alpha \\ A\subseteq B}} \mu(A)\lg\mu(A) = H_\mu(\alpha).$$
The critical step in the above equation is that for B ∈ β and A ∈ α with A ⊆ B, we have µ(A) ≤ µ(B), so $-\lg\mu(A) \ge -\lg\mu(B)$.

Lemma 14. Let α and β be partitions of X and let f : X → X be measurable. If α refines β, then f −1α refines f −1β.

Proof. Given f −1A ∈ f −1α, we find a B ∈ β such that A ⊆ B, and then f −1A ⊆ f −1B.

Lemma 15. Let α, β, γ, and δ be partitions of X. If α refines β and γ refines δ, then α ∨ γ refines β ∨ δ.

Proof. Given A ∩ C ∈ α ∨ γ, we find B ∈ β and D ∈ δ such that A ⊆ B and C ⊆ D, and we observe that A ∩ C ⊆ B ∩ D.

Lemma 16. Let α and β be partitions of X. If α refines β, then hµ(T, α) ≥ hµ(T, β).

Proof. We claim that for any $n \in \mathbb{N}$, $\bigvee_{i=0}^{n-1} T^{-i}\alpha$ refines $\bigvee_{i=0}^{n-1} T^{-i}\beta$. We proceed by induction on n. The base case is simply the assumption of the lemma. Now suppose $\bigvee_{i=0}^{n-1} T^{-i}\alpha$ refines $\bigvee_{i=0}^{n-1} T^{-i}\beta$ as the induction hypothesis. We will prove that $\bigvee_{i=0}^{n} T^{-i}\alpha$ refines $\bigvee_{i=0}^{n} T^{-i}\beta$. These partitions are just
$$T^{-n}\alpha \vee \bigvee_{i=0}^{n-1} T^{-i}\alpha \quad\text{and}\quad T^{-n}\beta \vee \bigvee_{i=0}^{n-1} T^{-i}\beta.$$

By Lemma 14, we know that $T^{-n}\alpha$ refines $T^{-n}\beta$. By Lemma 15, and using the induction hypothesis, we find that $\bigvee_{i=0}^{n} T^{-i}\alpha$ refines $\bigvee_{i=0}^{n} T^{-i}\beta$. Therefore, for any $n \in \mathbb{N}$, $\bigvee_{i=0}^{n-1} T^{-i}\alpha$ refines $\bigvee_{i=0}^{n-1} T^{-i}\beta$. By the previous lemma, we can conclude that for all $n \in \mathbb{N}$,

$$H_\mu\!\left(\bigvee_{i=0}^{n-1} T^{-i}\alpha\right) \ge H_\mu\!\left(\bigvee_{i=0}^{n-1} T^{-i}\beta\right),$$
and by the monotonicity of limits, $h_\mu(T, \alpha) \ge h_\mu(T, \beta)$.

We will now begin to prove Sinai’s generator theorem.

Theorem 17 (Sinai's generator theorem). If α strongly T-generates the σ-algebra B, then $h_\mu(T) = h_\mu(T, \alpha)$.

For the remainder of this section, let α strongly T-generate B, and let β be a partition of X. For notational convenience, denote $\alpha_n := \bigvee_{i=0}^{n-1} T^{-i}\alpha$.

Lemma 18. Let α be a partition of X which strongly T -generates B. If β is a partition of X, then there is some N ∈ N such that αN refines β.

Proof. Since β is finite, enumerate it as {B1,...,Bn}. Since α strongly T -generates B, and since each Bi is measurable, for each Bi ∈ β, there is some ni ∈ N such that Bi ∈ σ(αni ).

Let N = max{ni}. Then for each Bi ∈ β, Bi ∈ σ(αN ). Therefore, β ⊆ σ(αN ), and so αN refines β.

Lemma 19. Let α be a partition of X which strongly T -generates B. The entropy of T with respect to the refined partition αN is equal to that of T with respect to the original partition α,

h(T, αN ) = h(T, α).

Proof. This proof follows exactly as in Lemma 8.

We can now prove Sinai’s generator theorem.

Proof. Let α strongly T-generate B, and let β be any partition of X. By Lemma 18, we can let N be such that αN refines β. Since αN refines β, we know by Lemma 16 that hµ(T, αN ) ≥ hµ(T, β). From Lemma 19, we know that hµ(T, αN ) = hµ(T, α), so therefore we conclude that hµ(T, α) ≥ hµ(T, β). This is true for any partition β, so hµ(T, α) ≥ hµ(T ).

Since α is itself a partition, we have equality hµ(T, α) = hµ(T ).

5.2 Variational Principle

Measure-theoretic entropy is defined in a very similar way as topological entropy, and it is natural to wonder about the relationship between the two invariants of dynamical systems. Suppose we have some α which is both an open cover and a partition. If we could imagine a measure µ which assigns approximately equal measure to every set in the partition $\bigvee_{j=1}^{n} T^{-j}\alpha$, for n very large, then each set would be assigned measure $1 / N\!\left(\bigvee_{j=1}^{n} T^{-j}\alpha\right)$, and so the measure-theoretic entropy with respect to α would look like

$$h_\mu(T, \alpha) = \lim_{n\to\infty} \frac{1}{n} H_\mu\!\left(\bigvee_{j=1}^{n} T^{-j}\alpha\right) = -\lim_{n\to\infty} \frac{1}{n} \sum_{A \in \bigvee_{j=1}^{n} T^{-j}\alpha} \frac{1}{N\!\left(\bigvee_{j=1}^{n} T^{-j}\alpha\right)} \lg \frac{1}{N\!\left(\bigvee_{j=1}^{n} T^{-j}\alpha\right)} = \lim_{n\to\infty} \frac{1}{n} \lg N\!\left(\bigvee_{j=1}^{n} T^{-j}\alpha\right) = \lim_{n\to\infty} \frac{1}{n} H\!\left(\bigvee_{j=1}^{n} T^{-j}\alpha\right) = h(T, \alpha).$$

However, if a measure were not to assign approximately equal measure to each of these sets, then because the logarithm is a concave function, the measure-theoretic entropy would be less. So it intuitively seems that the measure-theoretic entropy should not exceed the topological entropy, and it seems that one could choose a measure which distributes its measure in the way described above so as to achieve topological entropy. In fact, the following deep theorem states almost exactly that (Dajani and Dirksin, 2008, ch. 8).

Theorem 20 (Variational Principle). Let X be a compact Hausdorff space, and let T : X → X be continuous. The topological entropy of a transformation T is equal to the supremum of the measure-theoretic entropy of all T -invariant measures defined on B, i.e.,

sup{hµ(T ): µ is T -invariant} = h(T ).

The Variational Principle was first proved around 1970 by E. I. Dinaburg, T. N. T. Goodman, and L. W. Goodwyn (Adler et al., 2008). The proof of the Variational Principle is quite involved, and so this theorem is presented without proof. For shifts of finite type, there is indeed a measure which achieves the topological entropy. This measure is called the Parry measure, after the mathematician William Parry, who proved that the measure is the unique measure to achieve the topological entropy (Parry, 1964). The remainder of this paper describes the Parry measure and shows that it achieves the topological entropy.

6 Parry measure

It turns out that the Parry measure is actually a Markov measure. This fact shouldn't be too surprising due to the strong relationship between shifts of finite type and Markov measures. For shifts of finite type, a sequence $[x_0, \ldots, x_n]$ is allowed if and only if $[x_0, \ldots, x_{n-1}]$ is allowed and $A_{x_{n-1},x_n} = 1$, while a Markov measure $\mu_P$ associated with a stochastic matrix P assigns

$$\mu_P([x_0, \ldots, x_n]) = \mu_P([x_0, \ldots, x_{n-1}]) P_{x_{n-1},x_n}.$$

Definition. The Parry measure for a shift space $\Sigma_A^+$ is the Markov measure associated with the stochastic matrix P which has
$$P_{i,j} = A_{i,j} \frac{v_j}{\lambda v_i},$$
where λ is the largest eigenvalue of A, and v is the corresponding right eigenvector for that eigenvalue.

The stationary probability distribution for each symbol (i.e., the left eigenvector associated with eigenvalue 1 for P), p, is given by

$$p_i = u_i v_i.$$

First, we confirm that P is a stochastic matrix and that p is its left eigenvector. Since for any i, we have
$$\sum_{j=1}^{k} P_{i,j} = \frac{1}{\lambda v_i} \sum_{j=1}^{k} A_{i,j} v_j = \frac{1}{\lambda v_i} (\lambda v_i) = 1,$$
and since all entries of P are clearly non-negative, we know that P is a stochastic matrix. Since for any j, we have
$$(p P)_j = \sum_{i=1}^{k} p_i P_{i,j} = \sum_{i=1}^{k} u_i v_i A_{i,j} \frac{v_j}{\lambda v_i} = \frac{v_j}{\lambda} \sum_{i=1}^{k} u_i A_{i,j} = \frac{v_j}{\lambda} (\lambda u_j) = u_j v_j = p_j,$$
we know that p is indeed the eigenvector for P with eigenvalue 1. Parry measure also has the following interesting characteristic, which is certainly not a general property of Markov measures.

Proposition 21. Let µ be the Parry measure on $\Sigma_A^+$. Parry measure assigns to any allowed cylinder of length n beginning with i and ending with j the probability

$$\mu([i, x_1, \ldots, x_{n-1}, j]) = \frac{u_i v_j}{\lambda^n}.$$

Proof. Parry measure assigns to the cylinder $[i, x_1, \ldots, x_{n-1}, j]$ the measure

$$\mu([i, x_1, \ldots, x_{n-1}, j]) = p_i P_{i,x_1} P_{x_1,x_2} \cdots P_{x_{n-2},x_{n-1}} P_{x_{n-1},j} = (u_i v_i) \left(A_{i,x_1}\frac{v_{x_1}}{\lambda v_i}\right)\left(A_{x_1,x_2}\frac{v_{x_2}}{\lambda v_{x_1}}\right) \cdots \left(A_{x_{n-2},x_{n-1}}\frac{v_{x_{n-1}}}{\lambda v_{x_{n-2}}}\right)\left(A_{x_{n-1},j}\frac{v_j}{\lambda v_{x_{n-1}}}\right) = \frac{u_i v_j}{\lambda^n}\left(A_{i,x_1} A_{x_1,x_2} \cdots A_{x_{n-2},x_{n-1}} A_{x_{n-1},j}\right).$$

Thus Parry measure assigns the cylinder measure 0 if the path is not allowed (as required), and $\frac{u_i v_j}{\lambda^n}$ if the path is allowed.

This proposition shows that Parry measure actually assigns the same measure to many different cylinders of the same length. This will allow us to more easily calculate the measure-theoretic entropy of the Parry measure. Moreover, it shows that Parry measure assigns approximately equal measure to all allowed cylinders of the same length, and yields an intuitive reason why the shift transformation has such large entropy with respect to Parry measure.
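The sketch below (my own, with NumPy, continuing the assumed example matrix) builds the Parry matrix P and stationary vector p from A and spot-checks the facts above: the rows of P sum to 1, pP = p, and an allowed cylinder [i, ..., j] of length n gets measure $u_i v_j / \lambda^n$.

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]], dtype=float)   # assumed aperiodic adjacency matrix

# Perron data, normalized as in Theorem 1: sum(u) = 1 and u . v = 1.
evals, vecs = np.linalg.eig(A)
lam = evals.real.max()
v = np.abs(vecs[:, np.argmax(evals.real)].real)
evalsT, vecsT = np.linalg.eig(A.T)
u = np.abs(vecsT[:, np.argmax(evalsT.real)].real)
u = u / u.sum()
v = v / (u @ v)

P = A * v[np.newaxis, :] / (lam * v[:, np.newaxis])   # P_ij = A_ij * v_j / (lam * v_i)
p = u * v                                             # stationary distribution p_i = u_i v_i

print(np.allclose(P.sum(axis=1), 1))                  # P is stochastic
print(np.allclose(p @ P, p))                          # p is stationary

def parry_measure(cyl):
    """Parry (Markov) measure of an allowed cylinder [x0, ..., xn]."""
    m = p[cyl[0]]
    for a, b in zip(cyl, cyl[1:]):
        m *= P[a, b]
    return m

cyl = (0, 1, 2, 0, 1)                                 # an allowed path of length n = 4 in this graph
n = len(cyl) - 1
print(np.isclose(parry_measure(cyl), u[cyl[0]] * v[cyl[-1]] / lam**n))   # True
```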

Lemma 22. The partition α = {[1],..., [k]} strongly ρ-generates C.

Proof. To confirm this, since C is generated by cylinder sets, we must simply check that for an arbitrary cylinder set $B = [x_0, \ldots, x_n]$, there is some m such that $B \in \sigma\!\left(\bigvee_{i=0}^{m-1} \rho^{-i}\alpha\right)$. We take m = n + 1 and note that

$$B = [x_0] \cap \rho^{-1}[x_1] \cap \cdots \cap \rho^{-n}[x_n] \in \bigvee_{i=0}^{n} \rho^{-i}\alpha,$$
and so the claim follows.

Theorem 23. The measure-theoretic entropy of the Parry measure µ achieves the topological entropy. That is,

hµ(ρ) = h(ρ) = lg λ.

Proof. By the previous lemma, we can calculate the entropy of the Parry measure by calculating the entropy with respect to the partition α = {[1], . . . , [k]} and subsequently using Sinai's generator theorem. Note that cylinders of the same length n which start with the same symbol i and end with the same symbol j have the same measure, so we would like to count how many of these cylinders we will have. Recall from Proposition 10 that this is just $A^n_{i,j}$. By Corollary 2,

$$\lim_{n\to\infty} \frac{A^n_{i,j}}{\lambda^n} = (v u^T)_{i,j} = v_i u_j.$$

Thus, using the previous proposition, which gives the measure of each of these cylinders, the entropy is

$$h_\mu(\rho) = -\lim_{n\to\infty} \frac{1}{n} \sum_{\text{cylinders } B \text{ of length } n} \mu(B) \lg \mu(B) = -\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{k}\sum_{j=1}^{k} (v_i u_j \lambda^n) \frac{u_i v_j}{\lambda^n} \lg\!\left(\frac{u_i v_j}{\lambda^n}\right) = \lim_{n\to\infty} \frac{n}{n} \sum_{i=1}^{k} u_i v_i \sum_{j=1}^{k} u_j v_j \lg \lambda = \lg\lambda \sum_{i=1}^{k} u_i v_i \,(1) = \lg \lambda.$$
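Numerically, a standard fact (not proved in this paper) is that the measure-theoretic entropy of a Markov measure equals its entropy rate, $\sum_i p_i \sum_j P_{i,j} \lg(1/P_{i,j})$; for the Parry measure this should equal lg λ. A quick check (my own, continuing the assumed example above):

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]], dtype=float)

evals, vecs = np.linalg.eig(A)
lam = evals.real.max()
v = np.abs(vecs[:, np.argmax(evals.real)].real)
evalsT, vecsT = np.linalg.eig(A.T)
u = np.abs(vecsT[:, np.argmax(evalsT.real)].real)
u /= u.sum()
v /= u @ v

P = A * v[np.newaxis, :] / (lam * v[:, np.newaxis])   # Parry transition matrix
p = u * v                                             # Parry stationary distribution

# Entropy rate of the Markov chain (p, P): -sum_i p_i sum_j P_ij lg P_ij.
with np.errstate(divide="ignore", invalid="ignore"):
    terms = np.where(P > 0, P * np.log2(P), 0.0)
entropy_rate = -np.sum(p[:, np.newaxis] * terms)

print(entropy_rate, np.log2(lam))   # the two values agree
```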

In fact, it is the case that the Parry measure is the unique measure on the shift space which achieves the topological entropy (Parry, 1964).

7 Conclusion

We have shown that for shifts of finite type, the topological entropy of the shift transformation is simply the logarithm of the largest eigenvalue of the adjacency matrix for the shift space. The Variational Principle states that the topological entropy is the supremum of all possible measure-theoretic entropies. We showed that the Parry measure on the shift space achieves the topological entropy, proving that in the case of shifts of finite type, there is indeed a measure which achieves the topological entropy. The appendix gives a derivation of the Parry measure.

Appendix: Deriving the Parry measure

In the main article, Parry measure was simply defined, and it was confirmed that it achieves topological entropy. But where does Parry measure come from? The following discussion and exploration demonstrates a method of “finding” the Parry measure, and finds it to be closely related to the Hausdorff measure on the space with a suitably chosen metric.

From previous commentary, and the derivation of topological entropy for shifts of finite type, it seems that if we want to have a measure whose measure-theoretic entropy matches the topological entropy, the measure ought to be approximately evenly distributed among all possible “very long” cylinders of the same length. If we can define a metric on the shift space that makes all cylinders of the same length have the same diameter, then Hausdorff content will allow us to create a measure that does most of what we want.

Hausdorff content and dimension

In this section, we introduce Hausdorff content in general. Let X be a space endowed with metric d(·, ·).

Definition. Let d ≥ 0. Let

$$h_d^r(X) = \inf\{|\alpha|\, r^d : \alpha \text{ is a finite open cover of } X \text{ where each } A \in \alpha \text{ is an open ball of radius } r\},$$
where |·| denotes the cardinality of a set. The d-dimensional Hausdorff content (for d ≥ 0) of X is
$$H_d(X) = \lim_{r \downarrow 0} h_d^r(X).$$
The Hausdorff dimension of X is

H(X) = sup{d : Hd(X) > 0}.

One can confirm that any d-dimensional Hausdorff content for a space is a measure for that space (however, that measure is only non-trivial if the dimension used is the Hausdorff dimension).

A metric on the shift space

Remember that the goal is to create a measure on $\Sigma_A^+$ which achieves the topological entropy. Since we have found a strongly generating partition α = {[1], . . . , [k]}, we know that we want to assign roughly equal measure to “very long” cylinders of the same length, and so we want to define a metric which makes all cylinders of the same length have the same diameter.

We will now define a metric d(·, ·) on $\Sigma_A^+$ that makes the space a metric space using exactly this concept. If two points x and y are equal, then define d(x, y) = 0; if they are not equal, then let n(x, y) be the largest number n for which the two points x and y lie in the same member of the partition $\bigvee_{i=0}^{n-1} \rho^{-i}\alpha$. Note that, for shifts of finite type, n(x, y) is exactly the first coordinate at which the two sequences differ. For distinct points, we define their distance as $d(x, y) = \beta^{-n(x,y)}$, where β is some number larger than 1. One can confirm that d(·, ·) is indeed a metric.

Definition. An open ball of radius r around a point $x \in \Sigma_A^+$ is the set

$$B_r(x) = \{y \in \Sigma_A^+ : d(x, y) < r\}.$$

Note that this metric induces the same topology as the topology C generated by cylinder sets that was introduced earlier. If $x = (x_0, x_1, \ldots) \in \Sigma_A^+$, then

$$B_r(x) = [x_0, \ldots, x_n],$$
where n is the largest whole number such that $\beta^{-n} \ge r$. Thus any open ball is a cylinder set.

Remark. For any two points $x, y \in \Sigma_A^+$, d(x, y) ≤ 1.
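A small sketch (my own) of this metric, computed on finite prefixes of two sequences; n(x, y) is the first coordinate at which they differ.

```python
def shift_metric(x, y, beta=2.0):
    """d(x, y) = beta**(-n(x, y)), where n(x, y) is the first index at which x and y differ.
    x and y are finite prefixes, assumed long enough to contain a disagreement (or be equal)."""
    if tuple(x) == tuple(y):
        return 0.0
    n = next(i for i, (a, b) in enumerate(zip(x, y)) if a != b)
    return beta ** (-n)

print(shift_metric((0, 1, 0, 1), (0, 1, 1, 0)))   # differ first at index 2: d = 2**-2 = 0.25
print(shift_metric((1, 0, 0), (0, 0, 0)))         # differ at index 0: d = 1.0
```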

Hausdorff content of the shift space

Now that we have made a metric which gives all members of the partition $\bigvee_{i=0}^{n} \rho^{-i}\alpha$ (i.e., all cylinders of length n − 1) the same diameter, we can now make a measure that gives approximately equal weighting to all of these cylinders. Creating a Hausdorff measure on $\Sigma_A^+$ will allow us to do exactly this: it will assign measure to cylinders according to how many “very long” sub-cylinders they contain (since all these sub-cylinders were given equal diameter), making all “very long” cylinders approximately equally likely.

We would like to calculate the d-dimensional Hausdorff content of $\Sigma_A^+$. Since a cylinder of length n − 1 is an open ball of radius $\beta^{-n}$, for convenience we will write $h_d^{n-1}(\Sigma_A^+)$ to mean $h_d^{\beta^{-n}}(\Sigma_A^+)$. Since $\Sigma_A^+$ can be covered minimally by $\theta_n$ cylinders of length n,

$$h_d^{n-1}(\Sigma_A^+) = \theta_{n-1}(\beta^{-n})^d = \theta_{n-1}\beta^{-nd}.$$

Taking the limit as the radius of the balls goes to 0 means taking the limit as the length of cylinders approaches ∞, so (using Proposition 11) the Hausdorff content of the shift space is

$$H_d(\Sigma_A^+) = \lim_{n\to\infty} h_d^{n-1}(\Sigma_A^+) = \lim_{n\to\infty} \theta_{n-1}\beta^{-nd} = \lim_{n\to\infty} \frac{\theta_{n-1}}{\lambda^{n-1}} \cdot \frac{1}{\lambda}\, \lambda^n \beta^{-nd} = \frac{\mathbf{1}^T v}{\lambda} \lim_{n\to\infty} \lambda^n \beta^{-nd} = \frac{\mathbf{1}^T v}{\lambda} \lim_{n\to\infty} \exp(n(\ln\lambda - d\ln\beta)).$$

Note that $\ln\lambda - d\ln\beta > 0$ when $d < \frac{\ln\lambda}{\ln\beta}$. In this case, $H_d(\Sigma_A^+) = \infty$. If $d > \frac{\ln\lambda}{\ln\beta}$, then $H_d(\Sigma_A^+) = 0$. The following proposition summarizes our results.

Proposition 24. The Hausdorff dimension of $\Sigma_A^+$ is
$$D := H(\Sigma_A^+) = \frac{\ln\lambda}{\ln\beta},$$
and the D-dimensional Hausdorff content of $\Sigma_A^+$ is
$$H_D(\Sigma_A^+) = \frac{\mathbf{1}^T v}{\lambda}.$$

The properties of Hausdorff content allow us to define a probability measure $\mu_H : \mathcal{C} \to [0, 1]$ as
$$\mu_H(B) := \frac{H_D(B)}{H_D(\Sigma_A^+)} = \frac{\lambda}{\mathbf{1}^T v} H_D(B).$$

For a given cylinder set B = [x0, . . . , xn], we will calculate µH (B).

Proposition 25. The Hausdorff measure of any allowed cylinder $[x_0, \ldots, x_n]$ is

$$\mu_H([x_0, \ldots, x_n]) = \frac{v_{x_n}}{\mathbf{1}^T v \, \lambda^n}.$$

Proof. First, we must count how the number of sub-cylinders contained in B grows as the length of those cylinders grows. Let $\psi_m$ be the number of cylinders of length m comprising B. Note that there is a one-to-one correspondence between cylinders of length m comprising B and cylinders of length m − n beginning at $x_n$.

Thus, by Proposition 10,
$$\psi_m = \sum_{j=1}^{k} A^{m-n}_{x_n, j} = e_{x_n}^T A^{m-n} \mathbf{1}.$$

Then we see that the rate of growth of $\psi_m$ is
$$\lim_{m\to\infty} \frac{\psi_m}{\lambda^m} = \lim_{m\to\infty} \frac{e_{x_n}^T A^{m-n}\mathbf{1}}{\lambda^m} = \lambda^{-n} e_{x_n}^T \left(\lim_{m\to\infty} \frac{A^{m-n}}{\lambda^{m-n}}\right)\mathbf{1} = \lambda^{-n} e_{x_n}^T (v u^T)\mathbf{1} = \lambda^{-n}(e_{x_n}^T v)(u^T\mathbf{1}) = \lambda^{-n} v_{x_n}.$$

We have
$$h_D^m(B) = \psi_m(\beta^{-(m+1)})^D = \psi_m e^{-(m+1) D \ln\beta} = \psi_m \lambda^{-(m+1)},$$
so this means that the Hausdorff content is

$$H_D(B) = \lim_{m\to\infty} h_D^m(B) = \frac{1}{\lambda}\lim_{m\to\infty} \psi_m \lambda^{-m} = \frac{1}{\lambda} \cdot \frac{v_{x_n}}{\lambda^n}.$$
Therefore, the Hausdorff measure of the set B is

$$\mu_H(B) = \frac{\lambda}{\mathbf{1}^T v} \cdot \frac{1}{\lambda} \cdot \frac{v_{x_n}}{\lambda^n} = \frac{v_{x_n}}{\mathbf{1}^T v \, \lambda^n}.$$

As desired, the Hausdorff measure assigns approximately equal probability to cylinders of the same length. However, there is a problem! The Hausdorff measure isn't preserved by ρ! To create a derivative measure that is preserved by ρ, we can just see how ρ transforms $\mu_H$.

Applying ρ to $\mu_H$ iteratively, if we approach some measure in the limit, then that measure will be preserved by ρ, and moreover, it should still assign very long cylinders approximately equal measure.

Definition. The pushforward of a measure $\mu : \mathcal{B} \to [0, 1]$ by a transformation T : X → X is $T_*\mu : \mathcal{B} \to [0, 1]$, defined by, for $B \in \mathcal{B}$,

$$T_*\mu(B) = \mu(T^{-1}B).$$

Proposition 26. There is a measure µ such that $\mu = \lim_{m\to\infty} \rho_*^m \mu_H$,⁵ and µ is preserved by ρ.

Proof. Let B = [x0, . . . , xn] be a cylinder. Then

$$\rho^{-m}B = \bigcup_{A_{a_0,a_1}\cdots A_{a_{m-1},a_m} A_{a_m,x_0} = 1} [a_0, \ldots, a_m, x_0, \ldots, x_n].$$

Note that all cylinders on the right-hand side end with the same symbol and are of the same length, m + n + 1, so they have the same Hausdorff measure (and they are all mutually disjoint). Thus we only need to count the number of cylinders on the right-hand side. Remember that we have $\mathbf{1}^T A^m e_{a_m}$ distinct cylinders of length m ending with $a_m$. We will only count those cylinders with $a_m$ where $A_{a_m,x_0} = 1$. Thus we have

$$\mu_H(\rho^{-m}B) = \frac{v_{x_n}}{\mathbf{1}^T v\, \lambda^{m+n+1}} \sum_{i=1}^{k} A_{i,x_0} \left(\mathbf{1}^T A^m e_i\right) = \frac{v_{x_n}}{\mathbf{1}^T v\, \lambda^{n+1}} \sum_{i=1}^{k} A_{i,x_0}\, \mathbf{1}^T \frac{A^m}{\lambda^m} e_i.$$
Taking the limit as m → ∞,

$$\lim_{m\to\infty}\mu_H(\rho^{-m}B) = \frac{v_{x_n}}{\mathbf{1}^T v\,\lambda^{n+1}} \sum_{i=1}^{k} A_{i,x_0}\, \mathbf{1}^T \left(\lim_{m\to\infty}\frac{A^m}{\lambda^m}\right) e_i = \frac{v_{x_n}}{\mathbf{1}^T v\,\lambda^{n+1}} \sum_{i=1}^{k} A_{i,x_0}\, \mathbf{1}^T (v u^T) e_i = \frac{v_{x_n}}{\lambda^{n+1}} \sum_{i=1}^{k} A_{i,x_0} u_i = \frac{v_{x_n}}{\lambda^{n+1}} u^T A e_{x_0} = \frac{v_{x_n}}{\lambda^{n+1}} \lambda u_{x_0} = \frac{u_{x_0} v_{x_n}}{\lambda^n}.$$

⁵I am glossing over what topology this limit is taken in. This limit is convergent in the weak-* topology on measures.

Thus, our limit measure µ has
$$\mu([x_0, \ldots, x_n]) = \frac{u_{x_0} v_{x_n}}{\lambda^n}.$$
The measure µ is ρ-invariant since

$$\mu(\rho^{-1}[x_0, \ldots, x_n]) = \mu\!\left(\bigcup_{A_{i,x_0}=1} [i, x_0, \ldots, x_n]\right) = \sum_{i=1}^{k} A_{i,x_0} \frac{u_i v_{x_n}}{\lambda^{n+1}} = \frac{v_{x_n}}{\lambda^{n+1}} \sum_{i=1}^{k} A_{i,x_0} u_i = \frac{v_{x_n}}{\lambda^{n+1}} \lambda u_{x_0} = \frac{u_{x_0} v_{x_n}}{\lambda^n} = \mu([x_0, \ldots, x_n]).$$
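This convergence is easy to watch numerically. The sketch below (my own, continuing the earlier NumPy example with the same assumed matrix A) computes $\mu_H(\rho^{-m}B)$ for a fixed allowed cylinder B using the counting argument above, and compares it with the limiting Parry value $u_{x_0} v_{x_n} / \lambda^n$.

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]], dtype=float)
k = A.shape[0]

evals, vecs = np.linalg.eig(A)
lam = evals.real.max()
v = np.abs(vecs[:, np.argmax(evals.real)].real)
evalsT, vecsT = np.linalg.eig(A.T)
u = np.abs(vecsT[:, np.argmax(evalsT.real)].real)
u /= u.sum()
v /= u @ v
one_v = v.sum()                      # 1^T v

B = (0, 1, 2)                        # an allowed cylinder [x0, ..., xn] with n = 2
x0, xn, n = B[0], B[-1], len(B) - 1

def pushforward_mu_H(m):
    """mu_H(rho^{-m} B): count allowed prefixes (a0, ..., am) with A[am, x0] = 1,
    each contributing the Hausdorff measure of a cylinder of length m + n + 1."""
    Am = np.linalg.matrix_power(A, m)
    count = sum(Am[:, i].sum() for i in range(k) if A[i, x0] == 1)
    return count * v[xn] / (one_v * lam ** (m + n + 1))

parry_value = u[x0] * v[xn] / lam ** n
for m in (1, 5, 20, 60):
    print(m, pushforward_mu_H(m), parry_value)   # converges to the Parry value
```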

Proposition 27. The limit measure µ is a Markov measure, with

$$p_i = u_i v_i, \qquad P_{i,j} = A_{i,j}\frac{v_j}{\lambda v_i}.$$
This is known as the Parry measure.

Proof. Suppose we have a cylinder [x0, . . . , xn−1, i] and another one [x0, . . . , xn−1, i, j]. If µ is indeed a Markov measure, then the conditional probability

$$\frac{\mu([x_0, \ldots, x_{n-1}, i, j])}{\mu([x_0, \ldots, x_{n-1}, i])} = P_{i,j},$$
regardless of what $x_0, \ldots, x_{n-1}$ are (and what n is).

If $A_{i,j} = 1$, then
$$\frac{\mu([x_0, \ldots, x_{n-1}, i, j])}{\mu([x_0, \ldots, x_{n-1}, i])} = \frac{u_{x_0} v_j / \lambda^{n+1}}{u_{x_0} v_i / \lambda^{n}} = \frac{v_j}{\lambda v_i}.$$
Otherwise, the conditional probability above will be 0. Since this conditional probability is dependent only on i and j (and since µ is preserved by the shift transformation ρ), we observe that µ is a Markov measure! Thus

$$P_{i,j} = A_{i,j}\frac{v_j}{\lambda v_i}.$$

We have $\mu([i]) = u_i v_i$, so the stationary probability vector has $p_i = u_i v_i$. So the limit measure µ which we have found here is actually the Parry measure which was defined earlier!

Therefore, it was possible to discover the Parry measure in the following manner: First, we found a strongly-generating partition α. We then created a metric which determined closeness of two points according to the largest number n for which the two points lie in the same member of the partition $\bigvee_{i=0}^{n-1} \rho^{-i}\alpha$. We then found the Hausdorff measure on the shift space with this metric. Since this measure was not preserved by the shift transformation ρ, we iteratively found the pushforward of the Hausdorff measure, and in the limit, we got a measure that was shift-invariant. It happened to be the case that for shifts of finite type, this resulting measure achieved the topological entropy. It would be interesting to see whether this strategy works for other topological dynamical systems which have strongly-generating partitions.

References

Roy Adler, Tomasz Downarowicz, and Michal Misiurewicz. Topological entropy. Scholarpedia, 3(2):2200, 2008.

Michel Baranger. Chaos, complexity, and entropy. New England Complex Systems Institute, 2000.

Karma Dajani and Sjoerd Dirksin. A Simple Introduction to Ergodic Theory. 2008.

James Gleick. Chaos: Making a New Science. Cardinal, London, 1987.

Harvey B. Keynes and James B. Robertson. Generators for topological entropy and expansiveness. Mathematical Systems Theory, 3(1):51–59, 1969.

William Parry. Intrinsic Markov chains. Transactions of the American Mathematical Society, 112(1):55–66, Jul. 1964.

Mark Pollicott and Michiko Yuri. Dynamical Systems and Ergodic Theory, chapter 3. Cambridge University Press, 1998.

Yakov Sinai. Metric entropy of dynamical system. 2007.

Charles Walkden. Math 41112/61112 ergodic theory lecture notes, 2012.

Wikipedia. History of randomness. 2012. URL http://en.wikipedia.org/wiki/History_of_randomness.
