
Dynamical Systems [Michał Misiurewicz]

Basic Notions, Examples

The main objects studied in the theory of dynamical systems are maps $f : X \to X$, where $X$ is a space with some structure. For instance, this structure can be topological or differentiable, or there may be a measure on $X$. Therefore the three basic cases we will consider are:

(a) $X$ is a topological space (usually compact) and $f$ is continuous,

(b) $X$ is a smooth ($C^r$, $1 \le r \le \infty$) manifold (also usually compact), with or without boundary, and $f$ is smooth ($C^r$),

(c) $X$ is a space with a σ-field $\mathcal{F}$ and a measure $\mu$ on it; the measure is a probability measure, that is, $\mu(X) = 1$; and $f$ preserves $\mu$, that is, $\mu(f^{-1}(A)) = \mu(A)$ for every $A \in \mathcal{F}$. Another way of saying that $f$ preserves $\mu$ is that $\mu$ is $f$-invariant, or that $f$ is an endomorphism of $(X, \mathcal{F}, \mu)$.

In each of cases (a)-(c), we consider the iterates of $f$, defined by induction: $f^0 = \mathrm{id}_X$, and then $f^{n+1} = f^n \circ f$. Thus, $f^n = f \circ \cdots \circ f$ ($n$ times). This means that we look at the set of all iterates of $f$, $\{f^n\}_{n \in \mathbb{Z}_+}$. In other words, we look at the action of the semigroup $\mathbb{Z}_+$ on the space $X$.

The special case which is considered as often as the general one (and maybe even more often) is when the map $f$ is invertible. We then require that $f^{-1}$ is also within the class of maps we consider. That means that in case (a) $f$ is a homeomorphism, in case (b) $f$ is a diffeomorphism (of class $C^r$), and in case (c) $f$ is an invertible measure preserving transformation (an automorphism) of $X$. If we consider an invertible $f$ then we also study negative iterates of $f$, defined as $f^{-n} = (f^{-1})^n = (f^n)^{-1}$. Then we have the group $\mathbb{Z}$ acting on $X$.

Yet another case is when we consider an action of the group $\mathbb{R}$ on $X$. This means that instead of one map and its iterates we have a one-parameter group $(\varphi^t)_{t \in \mathbb{R}}$ of invertible maps. That it is a group means that $\varphi^0 = \mathrm{id}_X$ and $\varphi^{s+t} = \varphi^s \circ \varphi^t$ for all $s, t \in \mathbb{R}$. Usually we call $(\varphi^t)_{t \in \mathbb{R}}$ a flow. A classical example of such a situation is when $X$ has a differentiable structure and there is a vector field $F$ on $X$.
Then we can consider the ordinary differential equation $\dot{x} = F(x)$. Under certain regularity conditions on $F$, the solutions to this equation are defined on the whole real line (the vector field is complete) and they depend in a smooth way on initial conditions. If the solution with the initial condition $x = x_0$ is $(\varphi^t(x_0))_{t \in \mathbb{R}}$, then $(\varphi^t)_{t \in \mathbb{R}}$ is a one-parameter group of diffeomorphisms of $X$. In fact, this situation was the starting point of the whole theory of dynamical systems.

Let us give several examples.

Example 1. Let $X = [0, 1]$ and let $f : X \to X$ be given by the formula

$$f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1/2, \\ 2 - 2x & \text{if } 1/2 \le x \le 1. \end{cases}$$

This is the so-called (full) tent map. The graph of this map is shown in Figure 1. Our space is a topological space (an interval), and the map is continuous. Therefore we have a situation from (a).

We can also easily introduce an invariant measure. Let $\lambda$ be the Lebesgue measure on $[0, 1]$, and let $\mathcal{F}$ be the σ-field of all measurable subsets of $[0, 1]$. If $A \in \mathcal{F}$ then $f^{-1}(A)$ is the union of two sets: one contained in $[0, 1/2]$, the other in $[1/2, 1]$, each of them of measure $(1/2)\lambda(A)$ (see Figure 1). Hence, $\lambda$ is $f$-invariant. Therefore we have a situation from (c). Notice that if $B = f^{-1}(A) \cap [0, 1/2]$ then $f(B) = A$, but $\lambda(B) \ne \lambda(A)$ (if $\lambda(A) \ne 0$). This shows that in the definition of a measure preserving transformation we should really take the inverse image of $A$, not the image of $A$ (of course, if we want this example to work).

Figure 1. The graph of the tent map, with a set $A$ and its inverse image $f^{-1}(A)$.
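The invariance of the Lebesgue measure under the tent map can be checked by hand on intervals. The following is a minimal sketch, assuming $A = [a, b]$ is an interval (the endpoints 1/5 and 3/5 and all function names are illustrative choices, not part of the notes). It uses exact rational arithmetic.

```python
from fractions import Fraction

def tent(x):
    """Tent map f on [0, 1]."""
    return 2 * x if x <= Fraction(1, 2) else 2 - 2 * x

def preimage_of_interval(a, b):
    """Return the two branches of f^{-1}([a, b]) for 0 <= a <= b <= 1."""
    left = (a / 2, b / 2)            # branch contained in [0, 1/2]
    right = (1 - b / 2, 1 - a / 2)   # branch contained in [1/2, 1]
    return left, right

def length(interval):
    """Lebesgue measure of an interval given as (lo, hi)."""
    lo, hi = interval
    return hi - lo

# Illustrative interval A = [1/5, 3/5].
a, b = Fraction(1, 5), Fraction(3, 5)
left, right = preimage_of_interval(a, b)
measure_A = b - a
measure_preimage = length(left) + length(right)
```

Here `measure_preimage` equals `measure_A`, while the left branch alone (the set $B$ of Example 1) is mapped onto $A$ but has only half its measure.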

In fact, there is a deeper reason for taking the inverse image of $A$ instead of the image of $A$. Namely, if $g : X \to Y$ is a map and we have a measure $\mu$ on $X$ then $g$ carries this measure through to $Y$ and we get a measure $\nu = g^*(\mu)$ on $Y$ defined as follows. If $A \subset Y$ then we say that $A$ is measurable if $g^{-1}(A)$ is measurable in $X$, and then we set $\nu(A) = \mu(g^{-1}(A))$. It turns out that the family of measurable subsets of $Y$ is a σ-field and $\nu$ is a measure.

Exercise 1. Prove it.

Using the above result, we see that $f : X \to X$ preserves the measure $\mu$ if and only if $f^*(\mu) = \mu$.

Example 2. Let $F$ be a constant vector field in $\mathbb{R}^2$: $F(x) = v$ for every $x \in \mathbb{R}^2$. The flow generated by this vector field is simply $\varphi^t(x) = x + tv$. Now consider the two-dimensional torus $\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$. We can think of it as the square $[0, 1]^2$ with opposite sides identified (upper with lower and left with right). Since $F(x + k) = F(x)$ for every $k \in \mathbb{Z}^2$, the vector field $F$ induces a (constant) vector field on $\mathbb{T}^2$. Its flow is the projection of the flow $\varphi^t$ onto the torus. A piece of a trajectory of this flow is shown in Figure 2.

Assume that $v$ is not horizontal. If we take a horizontal circle $S \subset \mathbb{T}^2$ and the first return map on $S$ then we get a map $f : S \to S$. The system $(S, f)$ is a so-called Poincaré section of the flow $(\varphi^t)_{t \in \mathbb{R}}$ (see Figure 2). The first return map is defined as follows. Take a point $x$ of $S$ and follow the flow until the trajectory hits $S$ again; this point is the image of $x$. Since we can follow the trajectory back from $f(x)$ to $x$, the map we get is invertible.


If it happens, as in our example, that the first return map is defined on the whole S, it is a diffeomorphism. It is easy to show that in our example it is a rotation on the circle S.

Figure 2. A piece of a trajectory of the flow on the torus, with the points $x$, $f(x)$, $f^2(x)$, $f^3(x)$ on the circle $S$.
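The claim that the first return map is a rotation can be made concrete. The sketch below assumes that $S$ is the circle $\{y = 0\}$, parametrized by the first coordinate modulo 1, and that $v = (v_1, v_2)$ with $v_2 > 0$; these conventions and the function name are assumptions made for illustration. The trajectory returns to $S$ after time $1/v_2$, so the return map is $x \mapsto x + v_1/v_2 \pmod 1$.

```python
from fractions import Fraction

def first_return(x, v):
    """First return map to the circle S = {y = 0} for the flow x + t*v on the torus.

    Assumes v = (v1, v2) with v2 > 0; S is parametrized by the first
    coordinate modulo 1.
    """
    v1, v2 = v
    t = 1 / v2                 # time needed for the second coordinate to grow by 1
    return (x + t * v1) % 1    # rotation of S by v1 / v2

# Illustrative direction vector: the return map is the rotation by 2/3.
v = (Fraction(1, 3), Fraction(1, 2))
image = first_return(Fraction(1, 2), v)
```

With this choice of $v$ the rotation amount is rational, so every point of $S$ is periodic; an irrational ratio $v_1/v_2$ gives an irrational rotation.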

We now describe another basic example.

Example 3. Let $S$ be a finite set consisting of more than one point, for instance $S = \{1, 2, \dots, s\}$ with $s > 1$. Define $\Sigma = \prod_{-\infty}^{\infty} S$ and $\Sigma_+ = \prod_{0}^{\infty} S$. More precisely, $\Sigma = \prod_{i=-\infty}^{\infty} S_i$ and $\Sigma_+ = \prod_{i=0}^{\infty} S_i$, where $S_i = S$ for each $i$. Thus, the elements of $\Sigma$ are the doubly infinite sequences $(\dots, x_{-2}, x_{-1}, x_0, x_1, x_2, \dots)$ with $x_i \in S$ for all $i$, and the elements of $\Sigma_+$ are the usual one-sided sequences $(x_0, x_1, x_2, \dots)$ with $x_i \in S$ for all $i$.

We will regard $\Sigma$ and $\Sigma_+$ as topological spaces. The topology is defined as the product topology on $\Sigma$ and $\Sigma_+$, where $S$ has the discrete topology. It can be described in three ways.

The first way is to specify an open basis of the space. This is a family of open subsets such that a set $A$ is open if and only if for every point $x \in A$ there is a set $B$ from the basis such that $x \in B \subset A$. In our case the open basis we choose will consist of all cylinders. A cylinder is a set of the form

$$C_{y_{-n}, y_{-n+1}, \dots, y_{n-1}, y_n} = \{(\dots, x_{-2}, x_{-1}, x_0, x_1, x_2, \dots) \in \Sigma : x_i = y_i \text{ for all } i \in \{-n, -n+1, \dots, n-1, n\}\}$$

in the space $\Sigma$ and of the form

$$C_{y_0, y_1, \dots, y_n} = \{(x_0, x_1, x_2, \dots) \in \Sigma_+ : x_i = y_i \text{ for all } i \in \{0, 1, \dots, n\}\}$$

in the space $\Sigma_+$. Notice that the cylinders are not only open, but also closed.

The second way of describing the topology on $\Sigma$ and $\Sigma_+$ is by specifying a metric. We do it by setting $d(x, y) = 2^{-k}$ for $x \ne y$, where $k$ is the smallest non-negative integer such that there is $m$ with $|m| = k$ and such that the $m$-th terms of the sequences $x$ and $y$ are different.

The third way is to say when $\lim_{n \to \infty} x^{(n)} = y$. If $x^{(n)} = (x_i^{(n)})_{i=-\infty}^{\infty}$ and $y = (y_i)_{i=-\infty}^{\infty}$ (for $\Sigma$) or $x^{(n)} = (x_i^{(n)})_{i=0}^{\infty}$ and $y = (y_i)_{i=0}^{\infty}$ (for $\Sigma_+$), then $\lim_{n \to \infty} x^{(n)} = y$ if and only if for every $k$ there exists $N$ such that if $n \ge N$ then $x_i^{(n)} = y_i$ for every $i$ with $|i| \le k$. This means that as $n \to \infty$, $x^{(n)}$ coincides with $y$ on longer and longer pieces around 0.

We define a shift $\sigma$ on $\Sigma$ and $\Sigma_+$ (we will use the same letter in both cases) as the shift by one to the left. On $\Sigma_+$ this means that $\sigma(x_0, x_1, x_2, \dots) = (x_1, x_2, x_3, \dots)$. To write the formula for $\sigma$ on $\Sigma$ is more difficult. For this we have to introduce notation for the points of $\Sigma$ which shows where the 0-th coordinate is. Namely, we shall write $x = (\dots, x_{-2}, x_{-1}, \overset{*}{x_0}, x_1, x_2, \dots)$ if the 0-th coordinate of $x$ is $x_0$. With this notation we can write

$$\sigma(\dots, x_{-2}, x_{-1}, \overset{*}{x_0}, x_1, x_2, \dots) = (\dots, x_{-1}, x_0, \overset{*}{x_1}, x_2, x_3, \dots).$$

From the description of convergence it follows immediately that $\sigma$ is continuous in both cases. Moreover, in the case of the space $\Sigma$, $\sigma$ is a homeomorphism. In the case of the space $\Sigma_+$, $\sigma$ is $s$-to-one.

If $\nu$ is a probability measure on $S$, we can take the product measures $\mu$ on $\Sigma$ and $\Sigma_+$ (again we will use the same letter in both cases). Let us describe them. For this it is enough to specify the measures of cylinders. If $\nu(\{i\}) = p_i$ then $\mu(C_{y_{-n}, y_{-n+1}, \dots, y_{n-1}, y_n}) = \prod_{i=-n}^{n} p_{y_i}$ for $\Sigma$ and $\mu(C_{y_0, y_1, \dots, y_n}) = \prod_{i=0}^{n} p_{y_i}$ for $\Sigma_+$. The measures of other Borel sets are determined by taking unions of disjoint cylinders and passing to the limit. (The σ-field of the Borel sets is defined as the smallest σ-field containing all closed and all open sets.) It can be easily checked that with our assignment of measures for cylinders, the measure $\mu$ is well defined (in other words, if a cylinder is a union of disjoint cylinders, its measure is the sum of the measures of those cylinders).

Let us show that $\mu$ is $\sigma$-invariant. Let us do it for the case of $\Sigma_+$; the case of $\Sigma$ is similar. Since the measure of every measurable set can be obtained from the measures of the cylinders, it is enough to check the equality $\mu(\sigma^{-1}(A)) = \mu(A)$ only in the case when $A$ is a cylinder. Let $A = C_{y_0, y_1, \dots, y_n}$. Then $\sigma^{-1}(A) = \bigcup_{j=1}^{s} C_{j, y_0, y_1, \dots, y_n}$. Since the sets $C_{j, y_0, y_1, \dots, y_n}$, $j = 1, 2, \dots, s$, are pairwise disjoint, we get

$$\mu(\sigma^{-1}(A)) = \sum_{j=1}^{s} \mu(C_{j, y_0, y_1, \dots, y_n}) = \sum_{j=1}^{s} \Big( p_j \cdot \prod_{i=0}^{n} p_{y_i} \Big) = \Big( \sum_{j=1}^{s} p_j \Big) \prod_{i=0}^{n} p_{y_i} = \prod_{i=0}^{n} p_{y_i} = \mu(A),$$

since $\sum_{j=1}^{s} p_j = 1$ ($\nu$ is a probability measure).

The system $(\Sigma, \mu, \sigma)$ is called a two-sided Bernoulli shift and the system $(\Sigma_+, \mu, \sigma)$ a one-sided Bernoulli shift. Often, when it is obvious which shift we consider, we drop the "two-sided" or "one-sided", and speak just of Bernoulli shifts.
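The invariance computation for cylinders in $\Sigma_+$ can be replayed on a small example. This is a sketch under assumptions: the probability vector on $S = \{1, 2, 3\}$ and the word $(2, 1, 3)$ are arbitrary illustrative choices, and a cylinder $C_{y_0, \dots, y_n}$ is represented simply by the tuple $(y_0, \dots, y_n)$.

```python
from fractions import Fraction
from math import prod

# Hypothetical probability vector nu on S = {1, 2, 3} (sums to 1).
p = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

def cylinder_measure(word):
    """mu(C_{y_0,...,y_n}) for the one-sided shift: the product of p[y_i]."""
    return prod(p[y] for y in word)

def shift_preimage(word):
    """sigma^{-1}(C_word) as the list of disjoint cylinders C_{j, word}, j in S."""
    return [(j,) + tuple(word) for j in p]

word = (2, 1, 3)
preimage_measure = sum(cylinder_measure(w) for w in shift_preimage(word))
```

The same helper also illustrates the consistency of the definition: splitting a cylinder according to the next symbol does not change its total measure.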

Exercise 2. Check that all three descriptions of the topology on Σ (Σ+) give the same topology.

Exercise 3. Check that the definition of the measure µ on Σ (Σ+) is consistent.

Exercise 4. Check that µ is σ-invariant in the case of Σ.

Exercise 5. Prove that Σ and Σ+ are compact.

If S consists of two elements then it is easy to see that Σ+ is homeomorphic to the standard Cantor set. In fact, Σ and Σ+ are always homeomorphic to the Cantor set.

In various mathematical theories, objects which look similar are treated as the same object. Thus, in geometry one does not distinguish between two sets if they are isometric, in topology one does not distinguish between two spaces if they are homeomorphic, in algebra one does not distinguish between two groups if they are isomorphic, etc. Here a similar situation occurs.

Let $X$ and $Y$ be topological spaces and let $f : X \to X$ and $g : Y \to Y$ be continuous maps. If there is a homeomorphism $h : X \to Y$ such that $h \circ f = g \circ h$ (in other words, the diagram

$$\begin{array}{ccc} X & \xrightarrow{\;f\;} & X \\ {\scriptstyle h}\downarrow & & \downarrow{\scriptstyle h} \\ Y & \xrightarrow{\;g\;} & Y \end{array}$$

commutes), we will say that $f$ and $g$ are conjugate. Notice that in this case $X$ and $Y$ are homeomorphic. The homeomorphism $h$ is called a conjugacy (between $f$ and $g$).

If we study the system from the topological point of view, we do not distinguish between two maps if they are conjugate. If we look also at the differentiable structure, there may be real differences, because $h$ may be not smooth (and in many cases, is not). Of course, if $h$ is a diffeomorphism (say, of the same class as $f$ and $g$), then $f$ and $g$ are smoothly conjugate, and then they are the same also from a "smooth" point of view.

In the situation as above, if $h$ is not necessarily a homeomorphism, but just a continuous map of $X$ onto $Y$, we say that $g$ is a factor of $f$ and $h$ is a semiconjugacy of $f$ with $g$.

Assume that $f$ is a measure preserving transformation of $(X, \mu)$ and $g$ is a measure preserving transformation of $(Y, \nu)$. We say that $f$ and $g$ are isomorphic if there exists $h : X \to Y$ such that $h^*(\mu) = \nu$, $h$ is invertible, and $h \circ f = g \circ h$. Here, as always in the case of spaces with measures, we take everything modulo sets of measure zero. That is, if we speak about some property, like for instance invertibility of $h$, we mean that this property holds perhaps after removing some sets of measure zero. The map $h$ is called an isomorphism (between $f$ and $g$). If in the above definition we delete the requirement of invertibility of $h$ then we say that in such a situation $g$ is a factor of $f$.

Example 4. Let $f : [0, 1] \to [0, 1]$ be the tent map and let $g : [0, 1] \to [0, 1]$ be the logistic map, that is the map given by $g(x) = 4x(1 - x)$. Let $h : [0, 1] \to [0, 1]$ be given by $h(x) = (1 - \cos(\pi x))/2$. Since the map $x \mapsto \cos(\pi x)$ is a homeomorphism of $[0, 1]$ onto $[-1, 1]$, the map $h$ is a homeomorphism of $[0, 1]$ onto itself. If $0 \le x \le 1/2$ then $f(x) = 2x$, so

$$h(f(x)) = \tfrac{1}{2}(1 - \cos(2\pi x)) = \tfrac{1}{2}(1 - \cos^2(\pi x) + \sin^2(\pi x)) = \sin^2(\pi x).$$

If $1/2 \le x \le 1$ then $f(x) = 2 - 2x$, so

$$h(f(x)) = \tfrac{1}{2}(1 - \cos(2\pi - 2\pi x)) = \tfrac{1}{2}(1 - \cos(2\pi x)),$$

so again $h(f(x)) = \sin^2(\pi x)$. In both cases

$$g(h(x)) = 4 \cdot \frac{1 - \cos(\pi x)}{2} \cdot \frac{1 + \cos(\pi x)}{2} = 1 - \cos^2(\pi x) = \sin^2(\pi x).$$
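The identity $h(f(x)) = g(h(x)) = \sin^2(\pi x)$ computed above is easy to sanity-check numerically. The following sketch evaluates both sides on a grid of 1001 points (the grid size is an arbitrary choice) and records the largest discrepancy, which should be at the level of floating-point rounding.

```python
import math

def f(x):
    """Tent map."""
    return 2.0 * x if x <= 0.5 else 2.0 - 2.0 * x

def g(x):
    """Logistic map g(x) = 4x(1 - x)."""
    return 4.0 * x * (1.0 - x)

def h(x):
    """Candidate conjugacy h(x) = (1 - cos(pi x)) / 2."""
    return (1.0 - math.cos(math.pi * x)) / 2.0

grid = [i / 1000 for i in range(1001)]
max_error = max(abs(h(f(x)) - g(h(x))) for x in grid)
max_error_vs_sin = max(abs(h(f(x)) - math.sin(math.pi * x) ** 2) for x in grid)
```

A numerical check of this kind does not replace the trigonometric argument; it only guards against algebra slips.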

Therefore $h \circ f = g \circ h$. This proves that $f$ and $g$ are conjugate.

Example 5. Let $f : [0, 1] \to [0, 1]$ be the tent map and let $\sigma : \Sigma_+ \to \Sigma_+$ be a one-sided shift on two symbols (that is, $S$ has two elements). For convenience, we will choose those symbols as $L = [0, 1/2]$ and $R = [1/2, 1]$ ($L$ and $R$ stand for "left" and "right" respectively). Thus, $S = \{L, R\}$ and the elements of $\Sigma_+$ are sequences of elements of $S$. The map $f$ maps each of the intervals $L$, $R$ onto the whole $[0, 1]$ in a monotone way.

Now we apply the coding procedure to the system $([0, 1], f)$ with the partition $\{L, R\}$ of $[0, 1]$ (strictly speaking, this is not a partition, because $L$ and $R$ have a point in common, but we will use this name). For every point $x \in [0, 1]$ we look at which of the sets $L$, $R$ the point $f^i(x)$ lies in for $i = 0, 1, 2, \dots$ and call this set $A_i$. In such a way we get for every point $x \in [0, 1]$ a code $(A_0, A_1, A_2, \dots) \in \Sigma_+$. Since $1/2$ belongs to both $L$ and $R$, for some points such a code is not unique. However, we claim that given a code there is a unique point with this code.

Let $K = (A_0, A_1, A_2, \dots) \in \Sigma_+$. For a given $n$ the set of points whose code (or one of the codes) begins with $(A_0, A_1, \dots, A_{n-1})$ is equal to $I_n(K) = \bigcap_{i=0}^{n-1} f^{-i}(A_i)$. We show by induction that for every code $K$ such a set is an interval of length $2^{-n}$. This is definitely true for $n = 1$. If it is true for some $n$ then, since $I_{n+1}(K) = A_0 \cap f^{-1}(I_n(\sigma(K)))$ and by the induction hypothesis $I_n(\sigma(K))$ is an interval of length $2^{-n}$, we get that $I_{n+1}(K)$ is an interval of length $2^{-n-1}$. This completes the induction step. Now we have a descending sequence $(I_n(K))_{n=0}^{\infty}$ of closed non-empty subsets of $[0, 1]$. Since $[0, 1]$ is compact, the intersection of all $I_n(K)$ is non-empty. Since the length of $I_n(K)$ goes to 0 as $n \to \infty$, this intersection consists of one point. We call this point $\varphi(K)$.

In such a way we defined a map $\varphi : \Sigma_+ \to [0, 1]$. Since every point of $[0, 1]$ has a code, this map is onto.
Definitely, it is not one-to-one, but only countably many points have more than one inverse image (the points whose trajectory passes through $1/2$), and each of those points has exactly two inverse images.

We shall show that $\varphi$ is continuous. Let $\lim_{n \to \infty} K_n = M = (B_0, B_1, B_2, \dots)$. As $n \to \infty$, longer and longer initial pieces of $K_n$ are the same as the initial pieces of $M$, so if we choose any $m$ then $I_m(K_n) = I_m(M)$ if $n$ is sufficiently large. This implies that $|\varphi(K_n) - \varphi(M)| \le 2^{-m}$ if $n$ is sufficiently large. Therefore $\lim_{n \to \infty} \varphi(K_n) = \varphi(M)$, so $\varphi$ is continuous. Moreover, since $f(I_{n+1}(K)) \subset I_n(\sigma(K))$ for every $n$, we have $f \circ \varphi = \varphi \circ \sigma$. In such a way we have proved that $f$ is a factor of $\sigma$.

Exercise 6. Let $f$ be the tent map. Let $g : [0, 1] \to [0, 1]$ be a continuous map such that $g(0) = g(1) = 0$, there is a point $c \in (0, 1)$ such that $g(c) = 1$, $g$ is of class $C^1$ on each of the intervals $[0, c]$ and $[c, 1]$, and on each of them $\inf |g'| > 1$. Prove that $f$ and $g$ are conjugate. Hint: Use coding for $f$ and $g$ to get semiconjugacies $\varphi$ and $\psi$ of $\sigma$ with $f$ and $g$ respectively. Then prove that if $K, M \in \Sigma_+$ then $\varphi(K) < \varphi(M)$ if and only if $\psi(K) < \psi(M)$.

Notice that the decimal, binary, etc. representations of real numbers can be obtained by applying the above coding procedure to suitable piecewise continuous functions. For instance, if we replace the tent map by the map given by

$$f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1/2, \\ 2x - 1 & \text{if } 1/2 \le x \le 1, \end{cases}$$

then the coding gives a binary representation of a point from $[0, 1]$. Strictly speaking, $f$ is not a function, since it has two values at $1/2$, but this does not matter much. The points like $3/4$ have two binary representations ($.1100000000\dots$ and $.1011111111\dots$ in this case).
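The coding procedure for the tent map can be sketched in a few lines. The code below is illustrative only: the function names are invented here, ties at $1/2$ are resolved as `'L'` (one of the two admissible codes), and the interval $I_n(K)$ is computed from the recursion $I_{n+1}(K) = A_0 \cap f^{-1}(I_n(\sigma(K)))$ by processing the code from its last symbol to its first.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def tent(x):
    """Tent map on [0, 1]."""
    return 2 * x if x <= HALF else 2 - 2 * x

def itinerary(x, n):
    """First n symbols of a code of x (ties at 1/2 are resolved as 'L')."""
    code = []
    for _ in range(n):
        code.append('L' if x <= HALF else 'R')
        x = tent(x)
    return code

def cylinder_interval(code):
    """Endpoints of I_n(K) for the finite prefix code = (A_0, ..., A_{n-1})."""
    lo, hi = Fraction(0), Fraction(1)       # I_0(K) = [0, 1]
    for symbol in reversed(code):
        if symbol == 'L':
            lo, hi = lo / 2, hi / 2         # branch of f^{-1} inside L
        else:
            lo, hi = 1 - hi / 2, 1 - lo / 2  # branch of f^{-1} inside R
    return lo, hi

x = Fraction(1, 3)
code = itinerary(x, 4)
lo, hi = cylinder_interval(code)
```

For $x = 1/3$ the orbit is $1/3, 2/3, 2/3, \dots$, so the code starts with one `L` followed by `R`'s, and the interval returned has length $2^{-4}$ and contains $x$.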

Whenever we have a conjugacy (or at least a semiconjugacy) $h$ of $f : X \to X$ with $g : Y \to Y$ and an $f$-invariant Borel probability measure $\mu$ on $X$, we can transport it to $Y$. Let $\nu = h^*(\mu)$. If $A \subset Y$ is a Borel set then $\nu(g^{-1}(A)) = \mu((g \circ h)^{-1}(A)) = \mu((h \circ f)^{-1}(A)) = \mu(f^{-1}(h^{-1}(A)))$. Since $\mu$ is $f$-invariant, we have $\mu(f^{-1}(h^{-1}(A))) = \mu(h^{-1}(A)) = \nu(A)$. Thus, $\nu(g^{-1}(A)) = \nu(A)$. This proves that $\nu$ is $g$-invariant. Clearly, in this situation if $h$ is a conjugacy, it is also an isomorphism between $(X, \mu, f)$ and $(Y, \nu, g)$.

If $I$ is an interval then there is a special class of measures on $I$ which are very important from the point of view of applications. Namely, they are measures that are absolutely continuous with respect to the Lebesgue measure. Such a measure can be written as $\mu = \rho\lambda$, where $\lambda$ is the Lebesgue measure on $I$ and $\rho$ is a measurable non-negative function on $I$ with integral 1 (the latter condition is imposed in order to make $\mu$ a probability measure). This function $\rho$ is called the density of $\mu$. For every measurable set $A \subset I$ we have $\mu(A) = \int_A \rho(x)\,dx$. A function $\varphi$ on $I$ is $\mu$-integrable if $\varphi\rho$ is $\lambda$-integrable, and then $\int_I \varphi\,d\mu = \int_I \varphi(x)\rho(x)\,dx$.

Let $h : I \to J$ be a diffeomorphism and let $\mu = \rho\lambda$ be an absolutely continuous measure on $I$. Set $P_h(\rho)(x) = \rho(h^{-1}(x)) / |h'(h^{-1}(x))|$. We have $1/h'(h^{-1}(x)) = (h^{-1})'(x)$, so if $h^{-1}(x) = y(x)$ then $P_h(\rho)(x) = \rho(y(x))\,|dy/dx|$. From the formula for integration by substitution we get for a measurable subset $A$ of $J$:

$$\int_{h^{-1}(A)} \rho(y)\,dy = \int_A \rho(y(x))\,|dy/dx|\,dx$$

(the absolute value is necessary since we work with integrals over sets, not from $a$ to $b$). Therefore $\mu(h^{-1}(A)) = \int_{h^{-1}(A)} \rho(y)\,dy = \int_A P_h(\rho)(x)\,dx$. This proves that $h^*(\mu)$ is an absolutely continuous measure with the density $P_h(\rho)$.

In fact, in the above computation we can (as always) neglect sets of measure zero. Therefore it does not matter if, for instance, the homeomorphism $h$ is not differentiable at a finite number of points.

Figure 3. The graph of the density $P_h(1)$.


Example 6. Let $f$, $g$ and $h$ be as in Example 4. As we know from Example 1, the Lebesgue measure $\lambda$ is invariant for $f$. Therefore the measure $h^*(\lambda)$ is a $g$-invariant measure, absolutely continuous with respect to the Lebesgue measure. Let us compute its density $P_h(1)$ (here 1 denotes the constant function equal to 1 everywhere). Recall that $h(y) = (1 - \cos(\pi y))/2$. If $h^{-1}(x) = y$ then we get $\cos(\pi y) = 1 - 2x$. Therefore

$$|h'(y)| = (\pi/2)\,|\sin(\pi y)| = (\pi/2)\sqrt{1 - \cos^2(\pi y)} = (\pi/2)\sqrt{1 - (1 - 2x)^2} = \pi\sqrt{x}\sqrt{1 - x}.$$

Hence we get $P_h(1)(x) = 1/(\pi\sqrt{x}\sqrt{1 - x})$ (see Figure 3).
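As a check of the density formula, one can compare the mass that the density $1/(\pi\sqrt{x}\sqrt{1-x})$ assigns to an interval $[a, b]$ with the value $\lambda(h^{-1}([a, b])) = h^{-1}(b) - h^{-1}(a)$ given by the definition of $h^*(\lambda)$, where $h^{-1}(x) = \arccos(1 - 2x)/\pi$. The sketch below uses a plain midpoint rule; the interval $[0.2, 0.7]$ and the number of subintervals are arbitrary choices that keep away from the endpoint singularities.

```python
import math

def density(x):
    """Candidate density P_h(1)(x) = 1 / (pi * sqrt(x) * sqrt(1 - x))."""
    return 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))

def h_inverse(x):
    """Inverse of h(y) = (1 - cos(pi y)) / 2 on [0, 1]."""
    return math.acos(1.0 - 2.0 * x) / math.pi

def integrate_midpoint(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    step = (b - a) / n
    return step * sum(func(a + (k + 0.5) * step) for k in range(n))

a, b = 0.2, 0.7
mass_from_density = integrate_midpoint(density, a, b)
mass_from_pushforward = h_inverse(b) - h_inverse(a)
```

The two numbers agree up to the quadrature error, consistent with $h^*(\lambda)$ having density $P_h(1)$.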

Periodic Points, Circle Maps

Let $X$ be a topological space and $f : X \to X$ a continuous map. A point $x \in X$ is periodic if there is a positive integer $n$ such that $f^n(x) = x$. The smallest $n$ with this property is called the period of $x$. The orbit of $x$ is then finite and equal to $\{x, f(x), \dots, f^{n-1}(x)\}$. It is called a periodic orbit. All points of a given periodic orbit have the same period. A periodic point $x$ of period $n$ is called attracting if there exists a neighborhood $U$ of $x$ such that for every $y \in U$ we have $\lim_{k \to \infty} f^{kn}(y) = x$.

Periodic points, especially attracting ones, play a very important role in applications. If the system $(X, f)$ is a model of some physical, chemical, biological or other process, a trajectory $(f^m(z))_{m=0}^{\infty}$ of the initial state $z$ of the process is what we observe (here $m$ plays the role of time). If $x$ is an attracting periodic point then it can easily happen that at a certain moment $f^m(z)$ comes sufficiently close to $x$, and then the rest of the trajectory gets attracted to the orbit of $x$ (which consists of a finite number of points). Soon it gets so close to this orbit that there is no way of distinguishing them. Thus, what we start to observe is practically a periodic motion.

Example 7. Consider the situation from Example 5: $f$ is the tent map and $\sigma$ is the shift on two symbols. A point $x = (x_0, x_1, x_2, \dots) \in \Sigma_+$ is $\sigma$-periodic of period $n$ if and only if the sequence $(x_0, x_1, x_2, \dots)$ is periodic with the least period $n$. Thus, for every $n$ the shift $\sigma$ has many periodic points of this period. There are $2^n$ fixed points of $\sigma^n$. Some of them are fixed points of $\sigma^k$ for some divisor $k$ of $n$, but not all of them. For instance, the sequence starting with one $L$ followed by $n - 1$ $R$'s and then repeating periodically has least period $n$.

We have a semiconjugacy $\varphi$ between $\sigma$ and $f$ constructed in Example 5. We claim that $\varphi$ maps bijectively the set of periodic points of $\sigma$ onto the set of periodic points of $f$ and that it preserves periods. Let us prove it. Let $x$ be a periodic point of $\sigma$ of period $n$.
Then $\sigma^n(x) = x$, so $f^n(\varphi(x)) = \varphi(\sigma^n(x)) = \varphi(x)$. Thus, $\varphi(x)$ is $f$-periodic and its period is a divisor of $n$. If this period is smaller than $n$ then the $f$-orbit of $\varphi(x)$ has fewer elements than the $\sigma$-orbit of $x$. Therefore there is a point $y$ of the $f$-orbit of $\varphi(x)$ which has more than one inverse image under $\varphi$. We know that the only points of $[0, 1]$ with this property are those whose orbits pass through $1/2$. Thus, the orbit of $\varphi(x)$ passes through $1/2$. However, $f^i(1/2) \ne 1/2$ for any $i > 0$, whereas $f^n(y) = y$. This proves that the period of $\varphi(x)$ is $n$. Thus, the set of periodic points of $\sigma$ is mapped by $\varphi$ into the set of periodic points of $f$ and this map preserves periods. Moreover, as we noticed, $\varphi(x)$ has only one inverse image under $\varphi$, so this map is one-to-one. It remains to prove that it is onto, that is, that if $\varphi(x)$ is $f$-periodic then $x$ itself is $\sigma$-periodic. However, this is clear, since the code of a periodic point must be periodic (we use again the fact that a periodic orbit of $f$ does not contain $1/2$). This proves our claim.

In such a way we have shown that the number of periodic points of a given period of the tent map is the same as for the shift on two symbols. In particular, the tent map has periodic points of all periods. Clearly, we can take instead of the tent map any map conjugate to it (for instance, the logistic map), and the above conclusion remains true.

For every $x \in [0, 1]$, except $1/2$, we have $|f'(x)| = 2$. Therefore if $x$ is periodic of period $n$ then $|(f^n)'(x)| = 2^n$. If $x$ is an attracting periodic point of period $n$ and $f^n$ is differentiable at $x$ then clearly $|(f^n)'(x)| \le 1$. Thus $f$ has no attracting periodic points. If we consider the conjugacy $h$ from Example 4 between $f$ and the logistic map $g$, from the formula $h \circ f^n = g^n \circ h$ we get $|h'(x)|\,|(f^n)'(x)| = |(g^n)'(h(x))|\,|h'(x)|$ for a periodic point $x$ of $f$ of period $n$. Unless $x = 0$, this yields $|(g^n)'(h(x))| = |(f^n)'(x)| = 2^n$. If $x = 0$ then $h(x) = 0$ is a fixed point of $g$ (a periodic point of period 1), and an easy computation gives $g'(0) = 4$. Thus $g$ also has no attracting periodic points, a result not so obvious as for $f$.

Exercise 7. Prove that if $f$ is an interval map, $x$ is a periodic point of period $n$, $f^n$ is differentiable at $x$ and $|(f^n)'(x)| < 1$ then $x$ is attracting.

Example 8. One can check that for the map $g : [0, 1] \to [0, 1]$ given by $g(x) = 3.832\,x(1 - x)$ there is an attracting periodic point of period 3 (very close to $1/2$).
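The claim of Example 8 can be explored numerically. The sketch below iterates $g(x) = 3.832\,x(1 - x)$ starting from the critical point $1/2$ (the starting point and the number of iterations are choices made here, not taken from the notes), then inspects the three points reached and the derivative of $g^3$ along them, computed by the chain rule as a product of values of $g'$.

```python
def g(x, r=3.832):
    """The map g(x) = r x (1 - x) with r = 3.832 as in Example 8."""
    return r * x * (1.0 - x)

def g_prime(x, r=3.832):
    """Derivative g'(x) = r (1 - 2x)."""
    return r * (1.0 - 2.0 * x)

# Iterate long enough for the trajectory to settle near the periodic orbit.
x = 0.5
for _ in range(3000):
    x = g(x)

cycle = [x, g(x), g(g(x))]
return_error = abs(g(g(g(x))) - x)                  # how well g^3 fixes x
multiplier = g_prime(cycle[0]) * g_prime(cycle[1]) * g_prime(cycle[2])
closest_to_half = min(abs(c - 0.5) for c in cycle)  # distance of the orbit from 1/2
```

A value of `multiplier` with absolute value below 1 is exactly the hypothesis of Exercise 7 for $n = 3$, so the computation is consistent with the orbit being attracting.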

In many models in mechanics one gets a flow with invariant tori, and the flow restricted to such a torus has no fixed points. Then the situation is similar to the one considered in Example 2 and by means of a Poincaré section we can get an orientation preserving diffeomorphism of a circle. This motivates a deeper study of orientation preserving circle homeomorphisms, which was in fact started by Poincaré.

Let $S^1$ be a circle. We shall consider it to be the unit circle $\{z : |z| = 1\}$ in the complex plane. The natural projection $e : \mathbb{R} \to S^1$ is given by $e(X) = \exp(2\pi i X)$. Notice that $e(X) = e(Y)$ if and only if $X - Y \in \mathbb{Z}$.

If $f : S^1 \to S^1$ is a continuous map then there exists a lifting of $f$, that is a continuous map $F : \mathbb{R} \to \mathbb{R}$ such that the diagram

$$\begin{array}{ccc} \mathbb{R} & \xrightarrow{\;F\;} & \mathbb{R} \\ {\scriptstyle e}\downarrow & & \downarrow{\scriptstyle e} \\ S^1 & \xrightarrow{\;f\;} & S^1 \end{array}$$

commutes. For every $k \in \mathbb{Z}$ the map $F + k$ (that is the map given by $(F + k)(X) = F(X) + k$ for $X \in \mathbb{R}$) is also a lifting of $f$. On the other hand, if $G$ is a lifting of $f$ then $G = F + k$ for some $k \in \mathbb{Z}$.

If $X \in \mathbb{R}$ then $e(X) = e(X + 1)$, so $e(F(X)) = f(e(X)) = f(e(X + 1)) = e(F(X + 1))$. Therefore $d = F(X + 1) - F(X)$ is an integer. By continuity, $d$ does not depend on the choice of $X$. It is called the degree of $f$. It really depends on $f$, not on $F$: if we take $G = F + k$ then $G(X + 1) - G(X) = [F(X + 1) + k] - [F(X) + k] = F(X + 1) - F(X) = d$. If $x$ moves around the circle once then $f(x)$ moves around the circle $d$ times (negative $d$ means the opposite direction).


One can easily prove by induction that if $F$ is a lifting of a degree $d$ map and $k \in \mathbb{Z}$ then $F(X + k) = F(X) + kd$. Moreover, if $F$ and $G$ are liftings of the maps $f$ and $g$ of degrees $d$ and $c$ respectively then $(F \circ G)(X + 1) = F(G(X) + c) = (F \circ G)(X) + cd$. Since $F \circ G$ is a lifting of $f \circ g$, the degree of $f \circ g$ is $cd$. In particular, by induction we get that the degree of $f^n$ is $d^n$.

If $f$ is an orientation preserving homeomorphism then its degree is 1. Therefore we shall concentrate our attention on the degree one circle maps. In this case for the lifting $F$ we get $F(X + k) = F(X) + k$ for $k \in \mathbb{Z}$, and the degree of all iterates of our map is 1 (that follows also from the fact that all iterates are also orientation preserving homeomorphisms).

The central role in the study of orientation preserving circle homeomorphisms is played by the rotation numbers. We shall introduce them for a slightly wider class of maps $\mathcal{L}$, consisting of all degree one circle maps with nondecreasing liftings. The next theorem establishes the existence of the rotation number $\rho(F)$ for every such lifting.

Theorem 1. Let $f \in \mathcal{L}$ and let $F$ be its lifting. Then the limit

$$\rho(F) = \lim_{n \to \infty} \frac{F^n(X) - X}{n}$$

exists for every $X \in \mathbb{R}$ and is independent of $X$.

Proof. If $X, Y \in \mathbb{R}$ then there exists an integer $k$ such that $X + k \le Y \le X + k + 1$. We have

$$F^n(X) - X - 1 = F^n(X + k) - (X + k + 1) \le F^n(Y) - Y \le F^n(X + k + 1) - (X + k) = F^n(X) - X + 1,$$

so

$$\big| [F^n(X) - X] - [F^n(Y) - Y] \big| \le 1. \tag{1}$$

Since $F^{mn}(X) - X = [F^n(X) - X] + [F^n(F^n(X)) - F^n(X)] + \dots + [F^n(F^{(m-1)n}(X)) - F^{(m-1)n}(X)]$, by (1) we get $\big| m[F^n(X) - X] - [F^{mn}(X) - X] \big| \le m$. Therefore

$$\left| \frac{F^n(X) - X}{n} - \frac{F^{mn}(X) - X}{mn} \right| \le \frac{1}{n}.$$

By switching the roles of $n$ and $m$ we get

$$\left| \frac{F^m(X) - X}{m} - \frac{F^{mn}(X) - X}{mn} \right| \le \frac{1}{m}.$$

Therefore

$$\left| \frac{F^n(X) - X}{n} - \frac{F^m(X) - X}{m} \right| \le \frac{1}{n} + \frac{1}{m}.$$

This proves that the sequence $((F^n(X) - X)/n)_{n=1}^{\infty}$ is a Cauchy sequence and therefore it converges. By (1), the limit does not depend on $X$.

We write $\rho(F)$, not $\rho(f)$, because the rotation number depends on the choice of a lifting. If we take another lifting $F + k$ then $(F + k)^n(X) = F^n(X) + nk$, so $\rho(F + k) = \rho(F) + k$. Therefore $e(\rho(F)) \in S^1$ is already independent of the choice of a lifting. However, it is sometimes simpler to think about rotation numbers as real numbers, so we shall rather use $\rho(F) \in \mathbb{R}$, referring to it also as a rotation number of $f$.
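The quotients $(F^n(X) - X)/n$ from Theorem 1 are easy to compute for a concrete lifting. The sketch below uses the family $F(X) = X + \alpha + \frac{b}{2\pi}\sin(2\pi X)$, which is an assumption made here for illustration and does not appear in the notes. For $|b| \le 1$ it satisfies $F(X + 1) = F(X) + 1$ and $F' \ge 0$, so it is a nondecreasing lifting of a degree one map.

```python
import math

def make_lift(alpha, b):
    """Hypothetical lifting F(X) = X + alpha + (b / (2 pi)) sin(2 pi X).

    For |b| <= 1 it is nondecreasing and satisfies F(X + 1) = F(X) + 1.
    """
    def F(X):
        return X + alpha + (b / (2.0 * math.pi)) * math.sin(2.0 * math.pi * X)
    return F

def rotation_estimate(F, n, X=0.0):
    """The quotient (F^n(X) - X) / n from Theorem 1."""
    Y = X
    for _ in range(n):
        Y = F(Y)
    return (Y - X) / n

translation = make_lift(0.25, 0.0)   # pure translation X -> X + 0.25
F = make_lift(0.3, 0.9)              # illustrative parameters
r_100 = rotation_estimate(F, 100)
r_1000 = rotation_estimate(F, 1000)
r_1000_shifted = rotation_estimate(F, 1000, X=0.37)
```

By the estimates in the proof, `r_100` and `r_1000` differ by at most $1/100 + 1/1000$, and changing the starting point changes the $n$-th quotient by at most $1/n$; for the pure translation every quotient equals the translation amount.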

Exercise 8. Prove that the rotation number depends continuously on the map. More precisely, prove that if $g, f_1, f_2, f_3, \dots \in \mathcal{L}$ have liftings $G, F_1, F_2, F_3, \dots$ respectively and the sequence $(F_n)_{n=1}^{\infty}$ converges uniformly to $G$ then $\lim_{n \to \infty} \rho(F_n) = \rho(G)$.

Example 9. Set $R_\alpha(z) = z \cdot \exp(2\pi i \alpha)$. We call $R_\alpha$ the rotation by $\alpha$ (although strictly speaking this is the rotation by the angle $2\pi\alpha$). The map $T_\alpha$ given by $T_\alpha(X) = X + \alpha$ (the translation by $\alpha$) is a lifting of $R_\alpha$. We have $T_\alpha^n(X) - X = n\alpha$, so $\rho(T_\alpha) = \alpha$.

If $\alpha$ is rational, $R_\alpha$ is called a rational rotation; if $\alpha$ is irrational, $R_\alpha$ is called an irrational rotation. Note that for a rational rotation all points are periodic.

There is a basic difference between the dynamical properties of maps from $\mathcal{L}$ with rational and irrational rotation numbers.

Theorem 2. Let $f \in \mathcal{L}$ and let $F$ be its lifting. If $\rho(F)$ is irrational then $f$ has no periodic points. If $\rho(F) = p/q$, where $p$ and $q$ are integers, $q > 0$, and $p$ and $q$ are relatively prime, then $f$ has a periodic point and all periodic points of $f$ have period $q$.

Proof. Suppose that $x$ is a periodic point of period $q$ of $f$. Take $X \in \mathbb{R}$ with $e(X) = x$. Then $e(F^q(X)) = f^q(x) = x$, so $F^q(X) = X + p$ for some $p \in \mathbb{Z}$. We have $F^{nq}(X) = X + np$, so $\rho(F) = \lim_{n \to \infty} (F^{nq}(X) - X)/(nq) = p/q$. This shows that if $\rho(F)$ is irrational then $f$ has no periodic points.

Assume now that $\rho(F) = p/q$, where $p$ and $q$ are integers, $q > 0$, and $p$ and $q$ are relatively prime. Look at $G = F^q - p$. It is a lifting of $f^q$. If $G$ has no fixed point then there is $\varepsilon > 0$ such that either $G(X) - X \ge \varepsilon$ for every $X \in [0, 1]$ or $G(X) - X \le -\varepsilon$ for all $X \in [0, 1]$. Since for $k \in \mathbb{Z}$ we have $G(X + k) = G(X) + k$, we get either $G(X) - X \ge \varepsilon$ for every $X \in \mathbb{R}$ or $G(X) - X \le -\varepsilon$ for all $X \in \mathbb{R}$. Since $G^n(Y) - Y = \sum_{i=0}^{n-1} [G(G^i(Y)) - G^i(Y)]$, we get either $\rho(G) \ge \varepsilon$ or $\rho(G) \le -\varepsilon$. On the other hand, $G^n(Y) - Y = F^{nq}(Y) - Y - np$, so $\rho(G) = q\rho(F) - p = 0$, a contradiction. This proves that $G$ has a fixed point.
If $Z$ is this fixed point and $e(Z) = z$ then $f^q(z) = e(F^q(Z)) = e(Z + p) = z$, so $f$ has a periodic point.

Let $x$ be a periodic point of period $m$ of $f$. Take $X \in \mathbb{R}$ with $e(X) = x$. We have $e(G^m(X)) = e(F^{mq}(X) - mp) = e(F^{mq}(X)) = f^{mq}(x) = x$, so $G^m(X) = X + l$ for some $l \in \mathbb{Z}$. If $Z$ is a fixed point of $G$ then for every $k \in \mathbb{Z}$ the point $Z + k$ is also a fixed point of $G$. Therefore, if $X$ is not a fixed point of $G$ then there are fixed points $Z_1$ and $Z_2$ of $G$ such that $Z_1 < X < Z_2$ and either $G(Y) > Y$ for all $Y \in (Z_1, Z_2)$ or $G(Y) < Y$ for all $Y \in (Z_1, Z_2)$. Therefore either $\lim_{n \to \infty} G^n(X) = Z_2$ or $\lim_{n \to \infty} G^n(X) = Z_1$. In both cases we get a contradiction with $G^m(X) = X + l$. This proves that $X$ is a fixed point of $G$. Therefore $f^q(x) = e(F^q(X)) = e(X + p) = x$. This proves that $m$ divides $q$. We have $e(F^m(X)) = e(X)$, so $F^m(X) = X + r$ for some $r \in \mathbb{Z}$. If $q/m = s$ then $F^q(X) = X + rs$, so $p = rs$. Thus, $s$ is a common divisor of $p$ and $q$. We assumed that those two numbers were relatively prime and hence $s = 1$. Therefore $q = m$. This proves that all periodic points of $f$ have period $q$.

In fact, from the above proof we get even more information about the dynamics of a map from $\mathcal{L}$ with a rational rotation number. Namely, if this rotation number is $p/q$ with $p$ and $q$ as above, then for every $x \in S^1$ the sequence $(f^{nq}(x))_{n=0}^{\infty}$ converges to a periodic point of $f$.

Now we begin a closer study of maps from $\mathcal{L}$ with irrational rotation numbers.

Theorem 3. For an irrational rotation all orbits are dense.

Proof. Let α be an irrational number. Suppose that Rα has an orbit which is not dense. This means that there exists X R and an interval I R such that for every n,k Z with n 0 the point X + nα k∈does not belong to I. ⊂ ∈ There≥ is a positive integer−m such that the length of I is larger than 1/m. Divide the 2 m circle into m arcs of equal length. Among the points x,Rα(x),Rα(x),...,Rα (x) (where x = e(X)) there are two which fall into the same arc. This means that there are integers i, j with 0 i 0. Since sα + k is smaller than the length of I, among| the points| X,X + sα + r, X +2− sα +2r, X +3| sα +3|r,... there must be one which belongs to I + p for some p Z (we define I + p as Y + p : Y I ). This means that X + nα belongs to I + k for some∈ n,k Z with n 0,{ and this is∈ what} we wanted to prove. ∈ ≥ Theorem 4. Let f have an irrational rotation number α. Then it is semiconjugate to the rotation by α via∈ L a map from . L Proof. Let F be the lifting of f with ρ(F ) = α. We are looking for a map H such that the diagram R F R −→

H H

 Tα  R R y −→ y commutes and H is a lifting of some h . Then h will be the desired semiconjugacy. We start by choosing some X R and∈L setting H(X) = 0. Thenfor k,n Z with n 0 we have to set H(F n(X) + k) = nα∈+ k. This we can do, since f has no periodic∈ point (see≥ Theorem 2) and consequently for different pairs (k,n) we get different points F n(X) + k. We claim that the function H defined up to now (that is, on the set A = F n(X) + k : k,n Z, n 0 ) is increasing. To prove it assume that F n(X) + k < F{m(X) + l. If n = m∈ then k k l. Set G = F m−n (k l). Since f has no periodic points, G has no fixed− points. Since− G(Z) > Z for Z =− F n−(X), we have the same inequality for all Z R. As in the proof of Theorem 2 this leads to the conclusion that there exists ε > 0 such∈ that G(Z) Z + ε for all Z R, and consequently ρ(G) ε > 0. Since ρ(G)=(m n)ρ(F ) (k ≥l), we get α = ρ(∈F ) > (k l)/(m n). Therefore≥ k l < (m n)α, that− is nα + k−


With a semiconjugacy $h$ as above, the inverse images of points under $h$ are either points or closed arcs. The inverse images of different points are disjoint, so there can be at most countably many points with arcs as inverse images. Thus any map from $\mathcal{L}$ with irrational rotation number $\alpha$ can be obtained (up to a conjugacy) from the rotation by $\alpha$ by blowing up to an arc at most countably many points. Notice that by continuity, if we blow up some point, we have to blow up its inverse image (under the rotation), the inverse image of that point, etc., that is the whole backward orbit of such a point. Moreover, if our map is a homeomorphism then with each point we have to blow up the whole positive (and consequently the full, positive and negative) orbit of this point. This describes almost fully the dynamics of those maps from the topological point of view.

We continue our study of orientation preserving circle homeomorphisms (or more generally, the maps from the class $\mathcal{L}$). Let $f \in \mathcal{L}$ have an irrational rotation number. Let $h \in \mathcal{L}$ be the semiconjugacy with a rotation (see Theorem 4). We shall denote by $\Omega(f)$ the set of all points $x \in S^1$ such that for every neighborhood $U$ of $x$ the set $h(U)$ consists of more than one point (then it is either an arc or the whole circle). Thus, $S^1 \setminus \Omega(f)$ is the union of the interiors of all sets of the form $h^{-1}(y)$ where $y \in S^1$. If $h^{-1}(y)$ consists of one point, its interior is empty. Otherwise, $h^{-1}(y)$ is a closed arc and its interior is an open arc. We will call such an open arc a wandering arc. Thus, $S^1 \setminus \Omega(f)$ is the union of all wandering arcs of $f$.

Notice that if $A$ is a wandering arc (or any arc contained in a wandering arc) then $h(A)$ consists of one point $y$, and since the rotation number $\alpha$ of $f$ is irrational, $R_\alpha^n(y) \ne y$ for any $n > 0$. Therefore $f^n(A) \cap A = \emptyset$ for any $n > 0$. Notice that if $f$ is a homeomorphism then the same is true also for $n < 0$. Therefore in this case we get $f^i(A) \cap f^j(A) = \emptyset$ if $i, j \in \mathbb{Z}$, $i \ne j$.

On the other hand, if $A$ is an open set containing any point of $\Omega(f)$ then $h(A)$ contains an arc $B$. By Theorem 3, if $y \in B$ then there exists $n > 0$ such that $R_\alpha^n(y) \in B$, so $R_\alpha^n(B) \cap B \ne \emptyset$. Therefore there exists $n > 0$ such that $f^n(A) \cap A \ne \emptyset$.

Clearly, if there are no wandering arcs then $h$ is a conjugacy. Thus, if we want to prove that some $f \in \mathcal{L}$ with an irrational rotation number $\alpha$ is conjugate to the rotation by $\alpha$ then it is enough to show that it has no wandering arcs. Of course, in this case $f$ has to be a homeomorphism (every open arc on which $f$ is constant is contained in a wandering one).

Let $\alpha$ be an irrational number. Fix a point $x \in S^1$ and look at the $R_\alpha$-trajectory of $x$. By Theorem 3 it is dense in the circle, so from time to time it approaches $x$ closer than ever before. Thus we get a sequence of closest approaches. It is closely related to the number-theoretical properties of $\alpha$, in particular to the continued fraction representation of $\alpha$. On the other hand, it is related to the existence of smooth conjugacies of diffeomorphisms with rotation number $\alpha$ with $R_\alpha$. However, all that we will need here is the existence of this sequence. If $n$ is one of those closest approaches then $R_\alpha^n(x)$ is closer to $x$ than any of the points $R_\alpha^i(x)$, $0 < i < n$.

Proof. Take a point $y \in S^1$ and $n$ such that $R_\alpha^n(y)$ is closer to $y$ than any of the points $R_\alpha^i(y)$, $0 < i < n$.

\[
\bigl| [\log f'(z_0) - \log f'(z_{-n})] + [\log f'(z_1) - \log f'(z_{-n+1})] + \dots + [\log f'(z_{n-1}) - \log f'(z_{-1})] \bigr| \le V. \tag{2}
\]

The expression at the left-hand side of (2) is equal to

\[
\bigl| [\log f'(z_0) + \log f'(z_1) + \dots + \log f'(z_{n-1})] - [\log f'(z_{-n}) + \log f'(z_{-n+1}) + \dots + \log f'(z_{-1})] \bigr|
= \bigl| \log (f^n)'(z_0) - \log (f^n)'(z_{-n}) \bigr|.
\]

Furthermore, we have $(f^n)'(z_{-n}) = 1/(f^{-n})'(z_0)$, so the expression at the left-hand side of (2) is equal to $\bigl| \log[(f^n)'(z_0) \cdot (f^{-n})'(z_0)] \bigr|$. Thus we get from (2)

\[
\exp(-V) \le (f^n)'(z_0) \cdot (f^{-n})'(z_0) \le \exp V, \tag{3}
\]

where $z_0 = z$ is an arbitrary point of the circle.

Let $A$ be an arc and set $A_i = f^i(A)$ for $i \in \mathbb{Z}$. Denote by $|A_i|$ the length of $A_i$. We have $|A_i| = \int_A (f^i)'(x)\,dx$, so $|A_n| + |A_{-n}| = \int_A [(f^n)'(x) + (f^{-n})'(x)]\,dx$. Since the arithmetic mean of $(f^n)'(x)$ and $(f^{-n})'(x)$ is greater than or equal to their geometric mean, we get from (3)

\[
|A_n| + |A_{-n}| \ge 2 \int_A [(f^n)'(x) \cdot (f^{-n})'(x)]^{1/2}\,dx \ge 2 \exp(-V/2)\,|A|. \tag{4}
\]

If $A$ is a wandering arc then we know that $f^i(A) \cap f^j(A) = \emptyset$ if $i, j \in \mathbb{Z}$, $i \ne j$. Thus the arcs $A_k$ are pairwise disjoint, so the sum of their lengths does not exceed the length of the circle. Therefore $\lim_{k \to \infty} (|A_k| + |A_{-k}|) = 0$, so (4) can hold only for finitely many $n$'s. However, it holds for every $n$ from the sequence of closest approaches, and there are infinitely many of those. Hence we get a contradiction. This proves that there is no wandering arc. Consequently, $f$ is conjugate to the rotation by $\alpha$.

Let us make a couple of comments about the Denjoy Theorem. The assumption on the bounded variation of $f'$ may seem difficult to verify. However, it is not. Namely, if $f$ is of class $C^2$ then $|f''|$ is bounded by some constant $M$, that is, $M$ bounds the absolute value of the derivative of $f'$. Therefore the variation of $f'$ is bounded by $M$ times the length of the circle (which is 1 in our model). Thus, any orientation preserving $C^2$ diffeomorphism of the circle with irrational rotation number is conjugate to an irrational rotation.

The second comment is that the density of orbits is an invariant of conjugacies. Thus, if a circle homeomorphism is conjugate to an irrational rotation then all its orbits are dense.

Exercise 9 (Denjoy counterexample). Prove that for every irrational $\alpha$ there exists a $C^1$ orientation preserving circle diffeomorphism with rotation number $\alpha$ and not conjugate to $R_\alpha$. Hint: Take $R_\alpha$ and blow up one full orbit. Choose lengths of new arcs in such a way that you can extend your map to a diffeomorphism.
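The sequence of closest approaches used in the argument above is easy to compute numerically. The following Python sketch does this in the model $S^1 = \mathbb{R}/\mathbb{Z}$, where $R_\alpha$ acts as $t \mapsto t + \alpha \pmod 1$. The function names and the choice of the golden mean as $\alpha$ are illustrative assumptions, not part of the text, and floating-point arithmetic limits the sketch to moderate $n$.

```python
import math


def circle_distance(t):
    """Distance from t to the nearest integer, i.e. the distance on R/Z between e(t) and e(0)."""
    frac = t % 1.0
    return min(frac, 1.0 - frac)


def closest_approaches(alpha, n_max):
    """Return all n in 1..n_max such that R_alpha^n(x) is closer to x
    than every R_alpha^i(x) with 0 < i < n (the answer does not depend on x)."""
    record = float("inf")
    times = []
    for n in range(1, n_max + 1):
        d = circle_distance(n * alpha)
        if d < record:
            record = d
            times.append(n)
    return times


# Illustrative choice of an irrational angle (an assumption of this sketch).
golden = (math.sqrt(5.0) - 1.0) / 2.0
print(closest_approaches(golden, 100))
```

For this particular $\alpha$ the closest approaches occur at the Fibonacci times, which reflects the connection with continued fractions mentioned earlier.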

Nonwandering Points

Many of the notions and ideas present in the theory of circle maps can be further developed and generalized. Let $X$ be a compact metric space and $f : X \to X$ a continuous map. We will call a point $x \in X$ wandering if there exists a neighborhood $U$ of $x$ such that $U \cap f^n(U) = \emptyset$ for every $n > 0$. A point is called nonwandering if it is not wandering (logical, isn't it?). The set of nonwandering points of $f$ is denoted $\Omega(f)$. Notice that in the case of a circle map from the class $\mathcal{L}$ this agrees with our previous definition of $\Omega(f)$.

Exercise 10. Prove that for any shift and for the tent map the set of nonwandering points is the whole space.

To get simple examples where the set of nonwandering points is not the whole space we can use the following construction.

Figure 4


Example 10. Let $X$ be a compact closed surface (for instance a sphere, a torus, a pretzel) embedded smoothly into $\mathbb{R}^3$ (see Figure 4). The minus height function $H(x) = H(x_1, x_2, x_3) = -x_3$ restricted to $X$ determines the gradient vector field on $X$. This vector field determines the gradient flow $(\varphi^t)_{t \in \mathbb{R}}$ of $H$ on $X$. We have $\frac{d}{dt}\varphi^t(x) = \nabla(H|_X)(\varphi^t(x))$. Notice that the vector $\nabla(H|_X)(x)$ is equal to the orthogonal projection of the vector $\nabla H(x) = \langle 0, 0, -1 \rangle$ to the plane tangent to $X$ at $x$. One can think about the flow $\varphi$ as describing the movement of a point on $X$ under the force of gravity.

Now we consider the time one map $f = \varphi^1$ of the flow $\varphi$. It is a diffeomorphism of $X$ onto itself. At the points of $X$ where the tangent plane is horizontal the gradient vector field vanishes, so those points are fixed points of $f$. For all other points $x \in X$ the third component of the vector field is negative, so the third coordinate of $f(x)$ is smaller than the third coordinate of $x$. If we take a sufficiently small neighborhood $U$ of such an $x$ then by continuity for all points $y = (y_1, y_2, y_3) \in U$ and $z = (z_1, z_2, z_3) \in f(U)$ we have $z_3 < y_3$. For any $n \ge 0$ the third coordinate of $f^n(z)$ is smaller than or equal to $z_3$. Therefore $f^{n+1}(U)$ is disjoint from $U$. This shows that such an $x$ is wandering. Since clearly every fixed point is nonwandering, we see that $\Omega(f)$ consists of all points at which the plane tangent to $X$ is horizontal.

The following proposition lists the basic properties of the set of nonwandering points.

Proposition 6. Let $f : X \to X$ be a continuous map of a compact metric space $X$ into itself. Then the following properties hold.
(a) $\Omega(f)$ is compact.
(b) $\Omega(f)$ is invariant, that is $f(\Omega(f)) \subset \Omega(f)$.
(c) If $f$ is a homeomorphism then $f(\Omega(f)) = \Omega(f)$.
(d) $\Omega(f^n) \subset \Omega(f)$ for any $n > 0$.
(e) If $f$ is a homeomorphism then $\Omega(f^{-1}) = \Omega(f)$.

Proof. By the definition, the set of wandering points is open. Therefore the set of nonwandering points is closed, and hence compact (since $X$ is compact).
This proves (a).

To prove (b), assume that $x \in \Omega(f)$. Let $U$ be a neighborhood of $f(x)$. Then $V = f^{-1}(U)$ is a neighborhood of $x$. Since $x$ is nonwandering, there exists $n > 0$ such that $V \cap f^n(V) \ne \emptyset$. This means that there is $y \in V$ such that $f^n(y) \in V$. Then $f(y) \in f(V) \subset U$ and $f^n(f(y)) = f(f^n(y)) \in f(V) \subset U$. Therefore $U \cap f^n(U) \ne \emptyset$. This proves that $f(x) \in \Omega(f)$. Hence (b) holds.

To prove (d), assume that $x \in \Omega(f^n)$. If $U$ is a neighborhood of $x$ then there exists $m > 0$ such that $U \cap (f^n)^m(U) \ne \emptyset$. Since $(f^n)^m = f^{mn}$, this shows that $x \in \Omega(f)$. Hence (d) holds.

To prove (e) assume that $f$ is a homeomorphism and $x \in \Omega(f)$. If $U$ is a neighborhood of $x$ then there exists $n > 0$ such that $U \cap f^n(U) \ne \emptyset$. This means that there is $y \in U$ such that $f^n(y) \in U$. Therefore $U \cap f^{-n}(U) \ne \emptyset$. This shows that $\Omega(f) \subset \Omega(f^{-1})$. This is true for all homeomorphisms of $X$, in particular for $f^{-1}$. Therefore $\Omega(f^{-1}) \subset \Omega((f^{-1})^{-1}) = \Omega(f)$. This proves (e).

From (e) and (b) it follows that if $f$ is a homeomorphism then

\[
f^{-1}(\Omega(f)) = f^{-1}(\Omega(f^{-1})) \subset \Omega(f^{-1}) = \Omega(f),
\]

so $\Omega(f) \subset f(\Omega(f))$. Together with (b) this gives (c).

Example 11. Consider the space

\[
X = \{0, 2\} \cup \{-2^{-n} : n \ge 0\} \cup \{2^{-n} : n \ge 0\} \cup \{2 + 2^{-n} : n \ge 0\}.
\]

Define the map $f : X \to X$ by

\[
f(x) = \begin{cases} -2x & \text{if } x < 0, \\ -x & \text{if } 0 \le x \le 1, \\ x - 2 & \text{if } x \ge 2. \end{cases}
\]

Clearly, $X$ is compact and $f$ is continuous. All the points of $X$, except 0 and 2, are isolated and not periodic. Therefore they are wandering. The point 0 is a fixed point, so it is nonwandering. In any neighborhood of the point 2 there are points of the form $2 + 2^{-n}$. The image of such a point is $2^{-n}$ and the $(2n+2)$-nd image of this point is 2. Therefore 2 is nonwandering. Thus, $\Omega(f) = \{0, 2\}$. However, we have $f(\{0, 2\}) = \{0\}$. This shows that the assumption in (c) that $f$ is a homeomorphism is essential.

The set $U = \{2\} \cup \{2 + 2^{-n} : n \ge 0\}$ is a neighborhood of 2. Set $V = \{0\} \cup \{-2^{-n} : n \ge 0\}$. We have $f^2(U) \subset V$ and $f^2(V) \subset V$. Since $V$ is disjoint from $U$, we get $U \cap (f^2)^m(U) = \emptyset$ for every $m > 0$. Therefore 2 is wandering for $f^2$. This shows that in (d) we do not necessarily have equality. This example shows also that the set $\Omega(f|_{\Omega(f)})$ (which is $\{0\}$ here) can be smaller than $\Omega(f)$.

The next proposition gives two reasons why the notion of the set of nonwandering points is important. We will use the notation $\operatorname{dist}(x, y)$ for the distance between two points $x$ and $y$, and $\operatorname{dist}(x, A)$ for the distance of a point $x$ from the set $A$ (that is, $\operatorname{dist}(x, A) = \inf_{y \in A} \operatorname{dist}(x, y)$).

Proposition 7. Let $f : X \to X$ be a continuous map of a compact metric space $X$ into itself. Then the following properties hold.
(a) Every point is attracted by the set of nonwandering points, that is

\[
\lim_{n \to \infty} \operatorname{dist}(f^n(x), \Omega(f)) = 0
\]

for every $x \in X$.
(b) Every invariant probability measure $\mu$ is concentrated on the set of nonwandering points, that is $\mu(X \setminus \Omega(f)) = 0$.

Proof. We prove (a) first. Suppose that there is a point $x \in X$ such that $\operatorname{dist}(f^n(x), \Omega(f))$ does not tend to zero. Then there is an $\varepsilon > 0$ and a subsequence of the orbit $(f^n(x))_{n=0}^\infty$ of $x$ such that the distance of every point of this subsequence from $\Omega(f)$ is greater than or equal to $\varepsilon$. Since $X$ is compact, there is a subsequence of this subsequence that converges to some point $y \in X$. Clearly, $\operatorname{dist}(y, \Omega(f)) \ge \varepsilon$. For every neighborhood $U$ of $y$ there are arbitrarily large integers $n$ with $f^n(x) \in U$. We choose two of them, $n < m$. We have $f^m(x) \in U$, and since $f^n(x) \in U$, we get $f^m(x) = f^{m-n}(f^n(x)) \in f^{m-n}(U)$. Thus, $U \cap f^{m-n}(U) \ne \emptyset$. This proves that the point $y$ is nonwandering, contrary to the property $\operatorname{dist}(y, \Omega(f)) \ge \varepsilon > 0$. This contradiction shows that (a) holds.

Now we prove (b). Let $\mu$ be an invariant probability measure on $X$. Every wandering point $x \in X$ has a neighborhood $U_x$ such that $f^n(U_x) \cap U_x = \emptyset$ for every $n > 0$. Since $X$ is compact metric, the open set $X \setminus \Omega(f)$ is a countable union of compact sets, and each of those compact sets can be covered by finitely many $U_x$'s. Thus, $X \setminus \Omega(f)$ can be covered by countably many $U_x$'s. Hence, to prove (b) it is enough to show that $\mu(U_x) = 0$ for every wandering $x$.

We claim that if $m > n \ge 0$ then $f^{-m}(U_x) \cap f^{-n}(U_x) = \emptyset$. If not, then there exists $y \in f^{-m}(U_x) \cap f^{-n}(U_x)$ and then both $f^m(y)$ and $f^n(y)$ belong to $U_x$. Hence, $f^{m-n}(U_x) \cap U_x \ne \emptyset$, contrary to our choice of $U_x$. This proves the claim.

Since $\mu$ is invariant, each of the sets $f^{-n}(U_x)$ has the same measure as $U_x$. As we have shown, those sets are pairwise disjoint, and thus if this measure is positive, the measure of the whole space is infinite. By our assumptions, $\mu$ is a probability measure, so this is impossible. Therefore $\mu(U_x) = 0$. This proves (b).

We can interpret the above proposition as follows. The trajectory of every point approaches $\Omega(f)$, so on longer and longer pieces it looks like trajectories of some points of $\Omega(f)$. Thus, all interesting dynamics can be found in $\Omega(f)$. Moreover, if we look at our system with some invariant probability measure, then with probability 1 a randomly chosen point is nonwandering.

In fact, in Proposition 7 the set of nonwandering points can be replaced by a smaller set, namely the center of the system. It can be obtained by taking $\Omega(f)$, then $\Omega(f|_{\Omega(f)})$, etc. We can continue by transfinite induction until the sequence stabilizes, and this is the center. As Example 11 shows, it can be smaller than $\Omega(f)$.
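Example 11 can be checked mechanically with exact rational arithmetic. The Python sketch below assumes that the map is given by $f(x) = -2x$ for $x < 0$, $f(x) = -x$ for $0 \le x \le 1$ and $f(x) = x - 2$ for $x \ge 2$ (the reading consistent with the return time stated in the example); the helper names are ours. It lists the times at which the orbit of $2 + 2^{-n}$ returns to the neighborhood $U$ of 2.

```python
from fractions import Fraction


def f(x):
    """The map of Example 11 (as read above), evaluated exactly on points of X."""
    if x < 0:
        return -2 * x
    if 0 <= x <= 1:
        return -x
    if x >= 2:
        return x - 2
    raise ValueError("point outside the space X")


def return_times_to_U(n, k_max):
    """Times k in 1..k_max with f^k(2 + 2^-n) in U = {2} ∪ {2 + 2^-j : j >= 0}.
    Within X, belonging to U is the same as being >= 2."""
    x = Fraction(2) + Fraction(1, 2 ** n)
    times = []
    for k in range(1, k_max + 1):
        x = f(x)
        if x >= 2:
            times.append(k)
    return times


for n in range(4):
    print(n, return_times_to_U(n, 30))
```

Under this reading each point $2 + 2^{-n}$ returns to $U$ exactly once, at the odd time $2n + 3$. This is why 2 is nonwandering for $f$ but wandering for $f^2$.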
For any point $x \in X$ we define the omega limit set $\omega(x)$ of $x$ (or of the trajectory of $x$) as the set of limits of all convergent subsequences of $(f^n(x))_{n=0}^\infty$. Thus another way of stating Proposition 7 (a) is that $\omega(x) \subset \Omega(f)$ for every $x \in X$. One can show that the center of the system is equal to the closure of the set of recurrent points (that is, points $x$ such that $x \in \omega(x)$).

When we say that the orbit of a point $x$ is dense in $X$, we can understand it in two ways. The first way is that we regard the orbit of $x$ as a set, and then we mean that the set $\{f^n(x) : n \ge 0\}$ is dense in $X$. The second way is that we regard the orbit of $x$ as a sequence and then we mean that $\omega(x) = X$. In most cases, those two statements are equivalent (see Exercise 11). However, in the general case it is not so (see Example 12). To avoid confusion, let us agree to use the expression "the orbit of a point $x$ is dense in $X$" only in the second sense, that is "$\omega(x) = X$".

Exercise 11. Prove that if $X$ has no isolated points then $\{f^n(x) : n \ge 0\}$ is dense in $X$ if and only if $\omega(x) = X$.

Example 12. Let $X = \{0, 1\}$ and let $f(0) = f(1) = 0$. The set $\{f^n(1) : n \ge 0\}$ is the whole space, whereas $\omega(1) = \{0\}$.
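On a finite space with the discrete metric the two notions above can be computed directly: a subsequence of the orbit converges exactly when it is eventually constant, so $\omega(x)$ is the set of points visited infinitely often, that is the cycle that the orbit eventually enters. The following Python sketch (with a map stored as a dictionary, which is our own illustrative encoding) reproduces Example 12.

```python
def orbit_set(f, x):
    """The set {f^n(x) : n >= 0} for a map f given as a dict on a finite set."""
    seen = []
    while x not in seen:
        seen.append(x)
        x = f[x]
    return set(seen)


def omega_limit(f, x):
    """omega(x) on a finite discrete space: the points visited infinitely often,
    i.e. the periodic cycle that the orbit of x eventually enters."""
    seen = []
    while x not in seen:
        seen.append(x)
        x = f[x]
    # x is now the first repeated point; the cycle starts at its first occurrence.
    return set(seen[seen.index(x):])


# Example 12: X = {0, 1}, f(0) = f(1) = 0.
f12 = {0: 0, 1: 0}
print(orbit_set(f12, 1), omega_limit(f12, 1))
```

Here the orbit of 1 as a set is all of $X$, while $\omega(1)$ contains only the fixed point 0.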


Transitivity,

Let us assume again that $X$ is a nonempty compact metric space and $f : X \to X$ a continuous map.

The existence of a dense orbit is an important property, called transitivity. It can be characterized also in a different way. In the proof of equivalence of various characterizations we are going to use the Baire category method. Any compact metric space is complete, that is every Cauchy sequence is convergent. In such a space the Baire Theorem holds: the intersection of a countable family of open dense sets is dense. The intersection of a countable family of open sets is called a $G_\delta$ set. Thus the sets containing dense $G_\delta$ sets are in some sense large. They are called residual sets (their complements are called the sets of first category). Thus, every open dense set is residual, and the intersection of a countable number of residual sets is residual. It happens often that it is much easier to prove that some set is residual than just that it is nonempty.

Theorem 8. Let $f : X \to X$ be a continuous map of a nonempty compact metric space $X$ into itself. Then the following properties are equivalent.
(a) $f$ is transitive.
(b) The set of points with dense orbit is residual.
(c) For all nonempty open sets $U, V \subset X$ there exists $n \ge 0$ such that $f^n(U) \cap V \ne \emptyset$.
(d) For all nonempty open sets $U, V \subset X$ and every $m \ge 0$ there exists $n \ge m$ such that $f^n(U) \cap V \ne \emptyset$.

Proof. Clearly, (b) implies (a) and (d) implies (c). We will show that (a) implies (d), (c) implies (d), and (d) implies (b). Then it will follow that all four conditions are equivalent.

Let us assume (a) and prove (d). Let $x$ be a point with dense orbit, let $U, V \subset X$ be nonempty open sets, and let $m \ge 0$. There exists $k$ such that $f^k(x) \in U$ and $l \ge k + m$ such that $f^l(x) \in V$. Then $f^n(U) \cap V \ne \emptyset$ for $n = l - k \ge m$.

Now let us assume (c) and prove (d). We start by proving that $f$ is a surjection (that is, maps $X$ onto itself). Suppose that $f(X) \ne X$. Take a point $x \in X \setminus f(X)$.
Then $f(x) \ne x$, so there exist disjoint open sets $U \ni f(x)$ and $W \ni x$. Since $X$ is compact, $f(X)$ is also compact, so $X \setminus f(X)$ is open. Then $V = W \cap (X \setminus f(X))$ is open. It is nonempty, since it contains $x$. By (c), there is $n \ge 0$ such that $f^n(U) \cap V \ne \emptyset$. This means that there is a point $y \in U$ such that $f^n(y) \in V$. Since $U$ is disjoint from $W$, we have $n > 0$. Therefore $f(f^{n-1}(y)) = f^n(y) \in V \subset X \setminus f(X)$. However, we have $f(f^{n-1}(y)) \in f(X)$, a contradiction. This shows that $f$ is a surjection.

Let $U, V \subset X$ be nonempty open sets and let $m \ge 0$. The set $W = f^{-m}(V)$ is open, and since $f$ is a surjection, it is nonempty. By (c), there exists $n \ge 0$ such that $f^n(U) \cap W \ne \emptyset$. Then $f^{n+m}(U) \cap V \ne \emptyset$. This proves (d).

Finally, let us assume (d) and prove (b). The space $X$ has a countable open base $(U_i)$. Let $A_{i,j}$ be the set of those points $x \in X$ for which there exists $n \ge j$ such that $f^n(x) \in U_i$. In other words, $A_{i,j} = \bigcup_{n=j}^\infty f^{-n}(U_i)$. Since $f$ is continuous, $A_{i,j}$ is open. We will show that it is dense.

If $A_{i,j}$ is not dense then there exists an open nonempty set $V$ disjoint from it. That is, $f^n(V) \cap U_i = \emptyset$ for every $n \ge j$. However, this contradicts (d). Therefore $A_{i,j}$ is dense for all $i, j$. By the Baire Theorem, the set $A = \bigcap_{i,j} A_{i,j}$ is residual. If $x \in A$ then the orbit of $x$ passes through each $U_i$ infinitely often. This means that this orbit is dense. Hence, (b) is proved.

From Theorem 8 it follows immediately that for a transitive map the set of nonwandering points is the whole space. Indeed, if $x \in X$ and if $U$ is a neighborhood of $x$ then from condition (d) it follows that there exists $n > 0$ such that $f^n(U) \cap U \ne \emptyset$. The identity on any space consisting of more than one point is a simple example of a map for which the set of nonwandering points is the whole space but which is not transitive.

From the experimental point of view transitivity is a weak form of chaos: if we wait sufficiently long then probably we will see all possible types of behavior of our orbit.

Any shift and the tent map are transitive (checking that is similar to Exercise 10). The systems from Examples 10 and 11 are not transitive. An orientation preserving circle homeomorphism is transitive if and only if it is conjugate to an irrational rotation. A circle map from the class $\mathcal{L}$ that is not a homeomorphism is not transitive.

Another property similar to transitivity but much stronger is minimality. A continuous map $f : X \to X$ is minimal if every orbit is dense. An example of a minimal map is an irrational rotation or any homeomorphism conjugate to it. If the space is infinite then clearly a minimal map has no periodic points. Therefore our other standard examples, like the tent map or shifts, are not minimal.

Exercise 12. Let $R_\alpha$ be an irrational rotation on the circle. Divide $S^1$ into two semicircles $L$ and $R$ (they are closed arcs, so the endpoints are common, and, as for an interval, strictly speaking this is not a partition). Make the coding. Let $X \subset \Sigma_+$ be the closure of the set of all the codes that we get. Consider the subshift $\sigma|_X : X \to X$. Prove that it is well defined (that is, $\sigma(X) \subset X$) and minimal.

One can look at minimality in a different way (that justifies the name). For $f : X \to X$ consider the class of all nonempty compact invariant subsets of $X$, ordered by inclusion. Since $X$ is compact, the intersection of any descending sequence of such sets is again such a set. Therefore by Zorn's Lemma (or by a transfinite induction) we see that every set from our family contains a minimal one (that does not contain any other one). In most examples, this will be a periodic orbit. For a circle map from the class $\mathcal{L}$ with an irrational rotation number this will be $\Omega(f)$. Now, $f$ is minimal if and only if the whole space is a minimal set in the above sense.

Let us take another look at the definition of the rotation numbers. If $f$ is a map from the class $\mathcal{L}$ with a lifting $F$ then we can define the displacement function $\varphi$ as $\varphi(x) = F(X) - X$, where $e(X) = x$. Clearly, this does not depend on the choice of $X \in e^{-1}(x)$. The function $\varphi$ is continuous. We have

\[
\frac{F^n(X) - X}{n} = \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^i(x)),
\]

where $e(X) = x$. This means that in order to find the rotation number of $F$ we have to take the limit of the averages of the values of the displacement function over longer and longer pieces of trajectories. We will refer to them as time averages (we can think of the number of the iterates as of time), or ergodic averages of the function $\varphi$.

This is a special case of a more general situation. If a system $(X, f)$ (now $X$ is again a space, not a point) is a model of some phenomenon, we can often observe values of some function $\varphi$. If the initial state of the system (at time $n = 0$) is $x$ then at time $n$ we observe the value $\varphi(f^n(x))$ of the function $\varphi$. Then it is very natural to look at the time averages and ask questions like: do they converge to a limit, and if yes, does this limit depend on the initial state?

For circle maps from the class $\mathcal{L}$ and the displacement function we know the answer: they do converge, and the limit is independent of the initial state. We will prove that for irrational rotations and all continuous functions on $S^1$ the situation is even better, namely we have uniform convergence of time averages to a constant. Of course, this situation is preserved under conjugacy. Thus, the same is true for any homeomorphism conjugate to an irrational rotation. In fact, the same is true for a larger class of maps, including all maps from $\mathcal{L}$; we will speak about it later.

We shall refer to the measure $\mu = (e|_{[0,1]})_*(\lambda)$, where $\lambda$ is the Lebesgue measure on $[0, 1]$, as the normalized Lebesgue measure on $S^1$. Thus, we have $\mu(A) = \lambda(e^{-1}(A) \cap [0, 1])$ for a measurable subset $A$ of $S^1$. If we think about the unit circle $S^1$ as a topological group, we see that $\mu$ is the Haar measure (invariant with respect to the rotations).
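The identity between $(F^n(X) - X)/n$ and the time averages of the displacement function can be checked numerically. The Python sketch below uses a hypothetical lifting $F(X) = X + a + \frac{b}{2\pi}\sin(2\pi X)$ with $|b| < 1$ (a standard-type example chosen by us for illustration, not taken from the text); the parameter values and function names are assumptions as well.

```python
import math

A_PARAM = 0.3   # hypothetical parameter a of the illustrative lifting
B_PARAM = 0.5   # hypothetical parameter b, |b| < 1 keeps F increasing


def F(X):
    """Illustrative lifting: F(X + 1) = F(X) + 1 and F is increasing."""
    return X + A_PARAM + B_PARAM / (2.0 * math.pi) * math.sin(2.0 * math.pi * X)


def displacement(X):
    """phi(e(X)) = F(X) - X; it is 1-periodic in X, so it depends only on x = e(X)."""
    return F(X) - X


def time_average(X, n):
    """(1/n) * sum_{i=0}^{n-1} phi(f^i(x)), computed along the lifted orbit F^i(X)."""
    total = 0.0
    Y = X
    for _ in range(n):
        total += displacement(Y)
        Y = F(Y)
    return total / n


def iterate_F(X, n):
    """Return F^n(X)."""
    for _ in range(n):
        X = F(X)
    return X


n = 1000
print(time_average(0.1, n), (iterate_F(0.1, n) - 0.1) / n, time_average(0.7, n))
```

The first two printed numbers agree up to rounding (the sum telescopes), and the third shows that the average for a different initial point differs by at most a quantity of order $1/n$, in line with the independence of the rotation number of the initial state.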

Theorem 9. For an irrational rotation $R_\alpha$ of the circle and a continuous function $\varphi : S^1 \to \mathbb{R}$ the sequence of the time averages of $\varphi$ converges uniformly to the space average of $\varphi$ with respect to the normalized Lebesgue measure. That is

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \varphi \circ R_\alpha^k = \int_{S^1} \varphi\,d\mu
\]

in the $C^0$ topology.

Proof. Trigonometric polynomials are dense (in the $C^0$ topology) in the space of continuous functions periodic with period $2\pi$. The function $\Phi$ defined by $\Phi(t) = \varphi(\exp(it))$ is such a function. Therefore for a given $\varepsilon > 0$ there exists a trigonometric polynomial

\[
\Psi(t) = a_0 + \sum_{j=1}^{k} a_j \cos(jt) + \sum_{j=1}^{k} b_j \sin(jt)
\]

such that $|\Phi(t) - \Psi(t)| \le \varepsilon$ for all $t \in \mathbb{R}$.

We have $\cos(jt) + i \sin(jt) = \exp(ijt) = (\exp(it))^j$. Therefore $\Psi(t) = \psi(\exp(it))$, where

\[
\psi(z) = a_0 + \sum_{j=1}^{k} a_j \Re(z^j) + \sum_{j=1}^{k} b_j \Im(z^j)
\]

(where $\Re$ and $\Im$ denote the real and imaginary parts). With our new notation we have $|\varphi(z) - \psi(z)| \le \varepsilon$ for all $z \in S^1$.

In order to estimate the time averages of $\varphi$, we estimate time averages of $\psi$, and for this we have to estimate the time averages of the functions $z \mapsto z^j$. If $w = e(\alpha)$ then $R_\alpha$ is just the multiplication by $w$. Therefore the $n$-th time average of the function $z \mapsto z^j$ at a point $z$ is equal to

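\[
\frac{1}{n} \sum_{k=0}^{n-1} (z w^k)^j = z^j\,\frac{1}{n} \sum_{k=0}^{n-1} (w^j)^k = z^j\,\frac{1}{n} \cdot \frac{w^{jn} - 1}{w^j - 1}.
\]

Since $|z| = |w| = 1$, we get

\[
\left| \frac{1}{n} \sum_{k=0}^{n-1} (z w^k)^j \right| \le \frac{1}{n} \cdot \frac{2}{|w^j - 1|}.
\]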
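The geometric-sum estimate above can be checked numerically. The following Python sketch assumes that $e(X) = \exp(2\pi i X)$, so that $w = \exp(2\pi i \alpha)$; the function names and the sample angle $\alpha = \sqrt{2} - 1$ are our illustrative choices.

```python
import cmath
import math


def rotation_factor(alpha):
    """w = e(alpha), assuming e(X) = exp(2 pi i X)."""
    return cmath.exp(1j * 2.0 * math.pi * alpha)


def time_average_of_power(z, alpha, power, n):
    """n-th time average of the function z -> z^power along the R_alpha-orbit of z."""
    w = rotation_factor(alpha)
    return sum((z * w ** k) ** power for k in range(n)) / n


def geometric_bound(alpha, power, n):
    """The bound 2 / (n |w^power - 1|) from the estimate above."""
    w = rotation_factor(alpha)
    return 2.0 / (n * abs(w ** power - 1.0))


alpha = math.sqrt(2.0) - 1.0          # illustrative irrational angle
z = cmath.exp(1j * 0.7)               # an arbitrary point of the unit circle
for n in (10, 100, 1000):
    print(n, abs(time_average_of_power(z, alpha, 1, n)), geometric_bound(alpha, 1, n))
```

In each printed row the absolute value of the time average stays below the bound, and both decay like $1/n$.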

Since $\alpha$ is irrational, $|w^j - 1| > 0$ for every $j \ne 0$. Hence the time averages of the function $z \mapsto z^j$ (and therefore also of its real and imaginary parts) converge uniformly to zero. This proves that the time averages of the function $\psi - a_0$ converge uniformly to zero. Hence, there exists $N$ such that for every $n \ge N$ the absolute values of those averages are smaller than $\varepsilon$ at every point. Notice that those averages are equal to the time averages of $\psi$ minus $a_0$ (subtract $a_0$ after taking an average).

The integral with respect to $\mu$ of every function $z \mapsto z^j$ (if $j \ne 0$) is zero. Therefore $\int \psi\,d\mu = a_0$. Since the $C^0$ distance between $\psi$ and $\varphi$ is smaller than or equal to $\varepsilon$, we have $\left| \int \psi\,d\mu - \int \varphi\,d\mu \right| \le \varepsilon$. Therefore $\left| \int \varphi\,d\mu - a_0 \right| \le \varepsilon$. Moreover, the absolute value of the difference between the $n$-th time averages of $\psi$ and $\varphi$ also does not exceed $\varepsilon$. Therefore we get for $n \ge N$ and $z \in S^1$

\[
\left| \frac{1}{n} \sum_{k=0}^{n-1} \varphi(R_\alpha^k(z)) - \int \varphi\,d\mu \right|
\le \left| \frac{1}{n} \sum_{k=0}^{n-1} \varphi(R_\alpha^k(z)) - \frac{1}{n} \sum_{k=0}^{n-1} \psi(R_\alpha^k(z)) \right|
+ \left| \frac{1}{n} \sum_{k=0}^{n-1} \psi(R_\alpha^k(z)) - a_0 \right|
+ \left| a_0 - \int \varphi\,d\mu \right| \le 3\varepsilon.
\]

This completes the proof.

In the general case we cannot say much about the convergence of time averages from the topological point of view. The situation is very different when we look at measure preserving transformations. It seems that this is the main reason why in applications (and therefore also in theory) invariant measures play such a great role.

Ergodic Theorem (Birkhoff). Let $f : X \to X$ be a transformation preserving a probability measure $\mu$ on $X$. Let $\varphi : X \to \mathbb{R}$ be an integrable function. Then the limit of ergodic averages

\[
\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \varphi \circ f^k
\]

exists $\mu$-almost everywhere, is integrable, and has the same integral as $\varphi$. Moreover, the limit function $\tilde\varphi$ is $f$-invariant, that is $\tilde\varphi \circ f = \tilde\varphi$ $\mu$-almost everywhere.

We will not prove here the Ergodic Theorem, although the proof is not very difficult.

An $f$-invariant probability measure $\mu$ is called ergodic if every measurable $f$-invariant set has measure 0 or 1 (sometimes, if $\mu$ is fixed, we say that $f$ is ergodic). Notice that when we consider measure preserving transformations, it does not matter whether we think of an invariant set (a set $A$ such that $f(A) \subset A$) or a fully invariant set (a set $A$ such that $f^{-1}(A) = A$).

Exercise 13. Prove that if $A$ is an invariant set then there exists a fully invariant set $B$ such that the measure of the symmetric difference of $A$ and $B$ is zero.

The main reason for the importance of ergodicity is that if $\mu$ is ergodic then the limit function $\tilde\varphi$ in the Ergodic Theorem is constant almost everywhere. Indeed, if $\tilde\varphi$ is not constant almost everywhere then there exists $t \in \mathbb{R}$ such that the set $\{x \in X : \tilde\varphi(x) < t\}$ has measure strictly between 0 and 1. Since $\tilde\varphi$ is $f$-invariant, this set is invariant (up to a set of measure zero), which contradicts the ergodicity of $\mu$.

Ergodic Theory for Continuous Maps

Let us analyze the connections between topological and measure theoretical dynamical systems from the point of view of functional analysis.

Let $X$ be a nonempty compact metric space and $f : X \to X$ a continuous map. Up to now we did not specify explicitly the $\sigma$-field of subsets of $X$ on which the measures were considered. Now at last we have to do it. We want all open and all closed sets to be measurable. The smallest $\sigma$-field of subsets of $X$ containing those sets is the $\sigma$-field of all Borel subsets of $X$ (by the definition). A measure defined on this $\sigma$-field is called a Borel measure. Denote by $\mathcal{M}$ (or $\mathcal{M}(X)$ if we deal with more than one space at once) the set of all Borel probability measures on $X$. If $\mu \in \mathcal{M}$ then we can make a construction similar to that for the Lebesgue measure: we say that a subset of a set of measure zero is also measurable and has measure zero, and then we consider as measurable all sets which differ from Borel sets by sets of measure zero. However, we neglect the sets of measure zero anyhow, so such a completion of a $\sigma$-field does not produce a new measure from our point of view. Therefore it is enough to look at the measures from $\mathcal{M}$.

The linear space of all continuous real functions on $X$ with the $C^0$ norm will be denoted by $C(X)$. The space dual to $C(X)$ (consisting of all continuous linear functionals on $C(X)$) will be denoted by $C^*(X)$. By the Riesz theorem, there is a one-to-one correspondence between the non-negative linear functionals on $C(X)$ of norm 1 and Borel probability measures on $X$. That $\Psi$ is non-negative means that $\Psi(\varphi) \ge 0$ for any non-negative $\varphi$. If $\mu \in \mathcal{M}$ then the corresponding functional $\Psi$ is given by

\[
\Psi(\varphi) = \int_X \varphi\,d\mu. \tag{5}
\]

If a non-negative linear functional $\Psi$ of norm 1 is given then there exists a unique finite Borel measure $\mu$ such that (5) holds. In this case the norm of $\Psi$ is attained on the constant function 1, so by (5) it is equal to $\Psi(1) = \int_X 1\,d\mu = \mu(X)$. Hence $\mu(X) = 1$, that is $\mu \in \mathcal{M}$.

The natural topology to consider in the space $C^*(X)$ is the weak topology (sometimes referred to as the weak-$*$ topology). In this topology, a sequence of functionals $(\Psi_n)$ is convergent to $\Psi$ if and only if for every $\varphi \in C(X)$ the sequence $(\Psi_n(\varphi))$ converges to $\Psi(\varphi)$. Since by the Riesz theorem we can consider $\mathcal{M}$ as a subset of $C^*(X)$, we have the weak topology on $\mathcal{M}$. A sequence of measures $(\mu_n)$ converges weakly to $\mu$ if for every $\varphi \in C(X)$ the sequence $\left(\int_X \varphi\,d\mu_n\right)$ converges to $\int_X \varphi\,d\mu$. The advantage of using this topology is that we can use the Alaoglu theorem, saying that the unit ball in the space dual to a Banach space with the weak topology is compact. Since $\mathcal{M}$ is a weakly closed subset of the unit ball in $C^*(X)$, we get as a corollary that $\mathcal{M}$ is compact in the weak topology.

If $X$ and $Y$ are compact metric spaces and $f : X \to Y$ a continuous map then $f$ induces a linear operator $f^* : C(Y) \to C(X)$ by $f^*(\varphi) = \varphi \circ f$. This operator induces in turn a linear operator $f_* : C^*(X) \to C^*(Y)$ by $f_*(\Psi) = \Psi \circ f^*$. Thus, for every continuous function $\varphi$ on $Y$ we have $(f_*(\Psi))(\varphi) = \Psi(\varphi \circ f)$. Assume that $\Psi \in \mathcal{M}(X)$. If $\varphi$ is non-negative then $\varphi \circ f$ is also non-negative, so $(f_*(\Psi))(\varphi) \ge 0$. If $\varphi = 1$ then $\varphi \circ f = 1$, so $(f_*(\Psi))(\varphi) = 1$. Therefore $f_*(\Psi) \in \mathcal{M}(Y)$. Thus, $f_*$ maps probability Borel measures on $X$ to probability Borel measures on $Y$. If $A$ is a Borel subset of $Y$ then its characteristic function can be approximated in $L^1(Y)$ by continuous functions.
Thus, the integral of the characteristic function of $f^{-1}(A)$ with respect to the measure $\Psi$ is equal to the integral of the characteristic function of $A$ with respect to $f_*(\Psi)$. The integral of the characteristic function of a set is equal to the measure of this set. Therefore our new definition of $f_*$ gives the same map on the measures as the old one.

Let us return to the situation where $f : X \to X$. From the definition of $f_*$ it follows immediately that it is continuous. Hence $f_*$ restricted to $\mathcal{M}$ is also continuous. We know already that a measure $\mu \in \mathcal{M}$ is $f$-invariant if and only if $f_*(\mu) = \mu$. Therefore $\mathcal{M}_f$ is the set of fixed points of $f_*$ on $\mathcal{M}$, and therefore it is closed. Moreover, since $\mathcal{M}$ is compact, $\mathcal{M}_f$ is also compact. Clearly, $\mathcal{M}_f$ is convex. Therefore by the Krein-Milman theorem it is equal to the closed convex hull of the set of its extremal points.

Let us identify extremal points of $\mathcal{M}_f$. A measure $\mu \in \mathcal{M}_f$ is not extremal if and only if it is of the form $\mu = t\nu_1 + (1 - t)\nu_2$ for some $t \in (0, 1)$ and $\nu_1, \nu_2 \in \mathcal{M}_f$, $\nu_1 \ne \nu_2$. Suppose that $\mu$ is of this form. Any set of $\mu$-measure zero has also $\nu_1$-measure zero. This means that $\nu_1$ is absolutely continuous with respect to $\mu$. By the Radon-Nikodym theorem, there exists a non-negative function $\rho_1$ such that $\nu_1 = \rho_1 \mu$. Similarly, there exists a non-negative function $\rho_2$ such that $\nu_2 = \rho_2 \mu$. Set $A = \{x \in X : \rho_1(x) > \rho_2(x)\}$. Since the functions $\rho_1$ and $\rho_2$ are measurable, the set $A$ is measurable. Both measures $\nu_1$ and $\nu_2$ are invariant, so

\[
\begin{aligned}
\nu_1(A) - \nu_2(A) &= \nu_1(f^{-1}(A)) - \nu_2(f^{-1}(A)) \\
&= \nu_1(f^{-1}(A) \cap A) - \nu_2(f^{-1}(A) \cap A) + \nu_1(f^{-1}(A) \setminus A) - \nu_2(f^{-1}(A) \setminus A).
\end{aligned}
\]

By the definition of $A$, if $B$ is disjoint from $A$ then $\nu_1(B) \le \nu_2(B)$. Therefore we get $\nu_1(A) - \nu_2(A) \le \nu_1(f^{-1}(A) \cap A) - \nu_2(f^{-1}(A) \cap A)$. From this and from the definition of $A$ it follows that $\mu(A \setminus f^{-1}(A)) = 0$. Therefore $A$ is invariant. Since $\nu_1 \ne \nu_2$ and $\nu_1(X) = 1 = \nu_2(X)$, we have $0 < \mu(A) < 1$. This shows that $\mu$ is not ergodic.

Conversely, if $\mu$ is not ergodic then there exists an invariant set $A$ with $0 < \mu(A) < 1$. Define $\nu_1$ as $\mu$ restricted to $A$ and normalized (divided by $\mu(A)$) and $\nu_2$ as $\mu$ restricted to $X \setminus A$ and normalized (divided by $1 - \mu(A)$). One can think of $\nu_1$ and $\nu_2$ as conditional probabilities. Since $A$ is invariant, both $\nu_1$ and $\nu_2$ are invariant. We have $\mu = t\nu_1 + (1 - t)\nu_2$ for $t = \mu(A)$, so $\mu$ is not an extremal point of $M_f$. This proves that the extremal points of $M_f$ are exactly the ergodic measures. Thus, we get the following result. The set of all invariant Borel probability measures is equal to the closure of the convex hull of the set of ergodic ones. In other words, the set of finite convex combinations of ergodic measures is dense in $M_f$.

The functional-analytic approach presented above is very useful. For instance, with it we can easily prove the existence of invariant measures.

Theorem 10 (Krylov-Bogolyubov). Let $f : X \to X$ be a continuous map of a nonempty compact metric space $X$ into itself. Then $M_f \ne \emptyset$.

Proof. We start with any measure $\nu \in M$, for instance a measure concentrated at one point. Then for $n > 0$ we set

$$\mu_n = \frac{1}{n} \sum_{k=0}^{n-1} f_*^k(\nu).$$

From the sequence $(\mu_n)_{n=1}^{\infty}$ we can choose a subsequence weakly convergent to some $\mu \in M$. We shall show that $\mu \in M_f$. Take any $\varphi \in C(X)$. We have

$$\left| \int \varphi \, d(f_*(\mu_n)) - \int \varphi \, d\mu_n \right| = \frac{1}{n} \left| \sum_{k=0}^{n-1} \int \varphi \, d(f_*^k(\nu)) - \sum_{k=0}^{n-1} \int \varphi \, d(f_*^{k+1}(\nu)) \right| = \frac{1}{n} \left| \int \varphi \, d\nu - \int \varphi \, d(f_*^n(\nu)) \right| \le \frac{2}{n} \sup_{x \in X} |\varphi(x)|.$$

Passing to the limit along our subsequence we get $\int \varphi \, d(f_*(\mu)) = \int \varphi \, d\mu$. Since this holds for all $\varphi \in C(X)$, we get $f_*(\mu) = \mu$. This means that $\mu$ is invariant.

Since convex combinations of ergodic measures are dense in $M_f$ and $M_f$ is nonempty, we see that for every $f$ as above there exists an ergodic measure in $M_f$.
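The estimate used in the proof of Theorem 10 can be checked numerically. The following is only a minimal sketch under stated assumptions: the map $f(x) = x^2$ on $[0, 1]$, the test function $\cos$ and the starting point $0.9$ are hypothetical choices made for illustration, and $\nu$ is the Dirac measure at that point, so that integrating against $\mu_n$ amounts to averaging along an orbit.

```python
import math

def pushforward_average_integral(f, phi, x0, n):
    """Integral of phi against mu_n = (1/n) * sum_{k<n} f_*^k(delta_{x0}).

    Since f_*^k(delta_{x0}) is the Dirac measure at f^k(x0), this is the
    average of phi over the first n points of the orbit of x0.
    """
    total, x = 0.0, x0
    for _ in range(n):
        total += phi(x)
        x = f(x)
    return total / n

def f(x):
    # Hypothetical example map of [0, 1] into itself (an assumption, not from the text).
    return x * x

phi = math.cos                      # continuous, with sup |phi| <= 1 on [0, 1]

def phi_after_f(x):
    # Integrating phi o f against mu_n is the same as integrating phi against f_*(mu_n).
    return phi(f(x))

x0 = 0.9
defects = {}
integrals = {}
for n in (10, 100, 1000):
    pushed = pushforward_average_integral(f, phi_after_f, x0, n)
    plain = pushforward_average_integral(f, phi, x0, n)
    integrals[n] = plain
    defects[n] = abs(pushed - plain)
    print(n, defects[n], "bound:", 2.0 / n)
```

In this particular example the orbit of $0.9$ converges to the fixed point $0$, so the measures $\mu_n$ approach the Dirac measure at $0$, which is indeed invariant.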

Smooth Systems, Hyperbolicity

Let $X$ be a nonempty compact metric space and let $f : X \to X$ be a continuous map. We will say that $f$ is uniquely ergodic if the set $M_f$ consists of one element. If additionally $f$ is minimal, we say that $f$ is strictly ergodic. For uniquely ergodic maps the situation is similar to that for irrational rotations of the circle.


Proposition 11. Let $f : X \to X$ be a continuous map of a nonempty compact metric space $X$ into itself. Assume that $f$ is uniquely ergodic. Let $\mu$ be the unique element of $M_f$. Then for every continuous function $\varphi : X \to \mathbb{R}$ its ergodic averages converge uniformly to $\int \varphi \, d\mu$.

Proof. Suppose that there is a continuous function $\varphi : X \to \mathbb{R}$ such that its ergodic averages do not converge uniformly to $\int \varphi \, d\mu$. This means that there exists $\varepsilon > 0$ such that for every $i$ there exist $n_i \ge i$ and $x_i \in X$ such that

$$\left| \frac{1}{n_i} \sum_{k=0}^{n_i - 1} \varphi(f^k(x_i)) - \int \varphi \, d\mu \right| \ge \varepsilon.$$

Let $\delta_{x_i}$ be the Dirac delta at $x_i$, that is the probability measure concentrated at the point $x_i$. Set

$$\nu_i = \frac{1}{n_i} \sum_{k=0}^{n_i - 1} f_*^k(\delta_{x_i}).$$

Some subsequence of the sequence $(\nu_i)$ is convergent to a measure $\nu \in M$. If $\psi \in C(X)$ then the same estimate as in the proof of Theorem 10 gives us

$$\left| \int \psi \, d(f_*(\nu_i)) - \int \psi \, d\nu_i \right| \le \frac{2}{n_i} \sup_{x \in X} |\psi(x)|.$$

Passing to the limit along our subsequence gives $\int \psi \, d(f_*(\nu)) = \int \psi \, d\nu$, and since this holds for all $\psi \in C(X)$, we obtain $f_*(\nu) = \nu$. This proves that $\nu \in M_f$.

We have $f_*^k(\delta_{x_i}) = \delta_{f^k(x_i)}$, so from the definition of $\nu_i$ we get

$$\int \varphi \, d\nu_i = \frac{1}{n_i} \sum_{k=0}^{n_i - 1} \varphi(f^k(x_i)).$$

Therefore by our assumption $\left| \int \varphi \, d\nu_i - \int \varphi \, d\mu \right| \ge \varepsilon$ for every $i$. Consequently,

$$\left| \int \varphi \, d\nu - \int \varphi \, d\mu \right| \ge \varepsilon.$$

This shows that $\nu \ne \mu$, and that contradicts the assumption that $f$ is uniquely ergodic.

A point $x \in X$ is called generic for a measure $\mu \in M_f$ if for every $\varphi \in C(X)$ the values of the ergodic averages of $\varphi$ at $x$ converge to $\int \varphi \, d\mu$. Since the value of the $n$-th ergodic average of $\varphi$ at $x$ is equal to the integral of $\varphi$ with respect to the measure $\frac{1}{n} \sum_{k=0}^{n-1} f_*^k(\delta_x)$, a point $x$ is generic for $\mu$ if and only if the sequence $\left( \frac{1}{n} \sum_{k=0}^{n-1} f_*^k(\delta_x) \right)_{n=1}^{\infty}$ converges weakly to $\mu$.

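Lemma 12. If $\mu \in M_f$ is ergodic then $\mu$-almost every point of $X$ is generic for $\mu$.

Proof. Let $\mu \in M_f$ be an ergodic measure. Since $X$ is compact and metric, there exists a countable dense subset $A \subset C(X)$. For $\varphi \in A$ denote by $B_\varphi$ the set of points of $X$ at which the ergodic averages of $\varphi$ converge to $\int \varphi \, d\mu$. By the ergodic theorem, $\mu(B_\varphi) = 1$. Therefore the set $B = \bigcap_{\varphi \in A} B_\varphi$ has also measure 1. We claim that every point of $B$ is generic for $\mu$. Take a function $\psi \in C(X)$, a point $x \in B$ and a number $\varepsilon > 0$. There exists a function $\varphi \in A$ such that $|\varphi(y) - \psi(y)| \le \varepsilon$ for every $y \in X$. Then

$$\left| \frac{1}{n} \sum_{k=0}^{n-1} \varphi(f^k(x)) - \frac{1}{n} \sum_{k=0}^{n-1} \psi(f^k(x)) \right| \le \varepsilon$$

and

$$\left| \int \varphi \, d\mu - \int \psi \, d\mu \right| \le \varepsilon.$$

Since $x \in B_\varphi$, there exists $N$ such that for all $n \ge N$ we have

$$\left| \frac{1}{n} \sum_{k=0}^{n-1} \varphi(f^k(x)) - \int \varphi \, d\mu \right| \le \varepsilon.$$

Thus for all $n \ge N$ we get

$$\left| \frac{1}{n} \sum_{k=0}^{n-1} \psi(f^k(x)) - \int \psi \, d\mu \right| \le 3\varepsilon.$$

This proves that $x$ is generic for $\mu$.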

Now we can show that the property from Proposition 11 characterizes uniquely ergodic maps.

Theorem 13. Let $f : X \to X$ be a continuous map of a nonempty compact metric space $X$ into itself. Then the following properties are equivalent.
(a) $f$ is uniquely ergodic.
(b) For every continuous function its ergodic averages converge uniformly to a constant.
(c) There exists a measure such that every point is generic for it.

Proof. By Proposition 11, (a) implies (b). Let us assume (b) and prove (c). For any $\varphi \in C(X)$ let $\Psi(\varphi)$ be the limit of the ergodic averages of $\varphi$. Clearly, $\Psi$ is a non-negative bounded linear functional on $C(X)$ and its norm is 1. Therefore we can treat it as a measure from $M$. The standard estimates from the proof of Theorem 10 show that it is $f$-invariant. By the definition of this measure, every point is generic for it. To complete the proof of the theorem, we assume (c) and prove (a). If $f$ is not uniquely ergodic then there exist at least two ergodic measures in $M_f$. By Lemma 12, for each of them there is a generic point. Since no point can be generic for two distinct measures, (c) cannot hold.

From Theorems 9 and 13 it follows immediately that an irrational rotation of the circle is uniquely ergodic. Since it is minimal, it is even strictly ergodic.
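Property (b) of Theorem 13 can be observed numerically for an irrational rotation. The sketch below is illustrative only: the rotation number (the golden mean), the test function $\cos(2\pi x)$, whose integral with respect to Lebesgue measure is 0, and the grid of starting points are all assumptions made for the example.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2      # an irrational rotation number (assumed example)

def phi(x):
    # Continuous function on the circle R/Z whose Lebesgue integral is 0.
    return math.cos(2 * math.pi * x)

def ergodic_average(x, n, alpha=ALPHA):
    """n-th ergodic average of phi at x for the rotation x -> x + alpha (mod 1)."""
    return sum(phi((x + k * alpha) % 1.0) for k in range(n)) / n

def sup_deviation(n, grid=200):
    """Maximum over a grid of starting points of |average - integral| (the integral is 0)."""
    return max(abs(ergodic_average(j / grid, n)) for j in range(grid))

for n in (10, 100, 1000):
    print(n, sup_deviation(n))
```

For this particular function the sum is a geometric series, so the deviation at every starting point is at most $1 / (n \, |\sin \pi\alpha|)$. That bound does not depend on the point, which is the uniformity asserted in (b).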


Exercise 14. Show that every circle map from the class $\mathcal{L}$ is uniquely ergodic.

In certain cases, the above theorem provides a means of proving that a measure is ergodic. Namely, if $f$ is uniquely ergodic then the unique $f$-invariant measure has to be ergodic. For instance, the normalized Lebesgue measure on the circle is ergodic for any irrational rotation (clearly, it is not ergodic for rational rotations). However, in the general case the situation is not so simple. Checking ergodicity directly from its definition is impossible except in the trivial cases when the support of the measure is finite. Thus, we need a new characterization of ergodicity.

Theorem 14. Assume that a map $f : X \to X$ preserves a probability measure $\mu$ on $X$. Then $\mu$ is ergodic if and only if

(6) $$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mu(A \cap f^{-k}(B)) = \mu(A)\mu(B)$$

for all measurable sets $A, B \subset X$.

Proof. Assume that $\mu$ is ergodic. We have $\mu(A \cap f^{-k}(B)) = \int_A \chi_B \circ f^k \, d\mu$, where $\chi_B$ is the characteristic function of $B$. Therefore

$$\frac{1}{n} \sum_{k=0}^{n-1} \mu(A \cap f^{-k}(B)) = \frac{1}{n} \sum_{k=0}^{n-1} \int_A \chi_B \circ f^k \, d\mu = \int_A \frac{1}{n} \sum_{k=0}^{n-1} \chi_B \circ f^k \, d\mu.$$

By the ergodic theorem,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \chi_B \circ f^k = \mu(B)$$

$\mu$-almost everywhere. Hence,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \mu(A \cap f^{-k}(B)) = \lim_{n \to \infty} \int_A \frac{1}{n} \sum_{k=0}^{n-1} \chi_B \circ f^k \, d\mu = \int_A \mu(B) \, d\mu = \mu(A)\mu(B).$$

We could pass to the limit inside the integral since all the functions $\frac{1}{n} \sum_{k=0}^{n-1} \chi_B \circ f^k$ are bounded from below by 0 and from above by 1, and the constant functions are integrable.

Now we assume that (6) holds for all measurable sets $A, B \subset X$. If a measurable set $A$ is invariant then $\mu(A \cap f^{-k}(A)) = \mu(A)$, and if we take $B = A$ then the left-hand side of (6) is equal to $\mu(A)$. The right-hand side of (6) is equal to $(\mu(A))^2$, so we get $\mu(A) = (\mu(A))^2$. Therefore $\mu(A) = 0$ or $\mu(A) = 1$. This proves that $\mu$ is ergodic.

In view of the above theorem, in order to prove that $\mu$ is ergodic it is enough to show that (6) holds for all measurable sets $A, B \subset X$. However, notice that it is enough to check (6) only for $A, B$ from some family of sets generating the whole $\sigma$-field of measurable sets. This is a consequence of the following result.


Exercise 15. Prove that if we fix A then the family of all sets B satisfying (6) is a σ-field.
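Condition (6) can be explored numerically for an irrational rotation of the circle with Lebesgue measure. The following sketch makes several assumptions for illustration: the rotation number is the golden mean, and $A = [0, a)$, $B = [0, b)$ are arcs with $a = b = 0.1$. It computes the individual terms $\mu(A \cap f^{-k}(B))$ exactly as arc overlaps and compares their Cesàro average with $\mu(A)\mu(B)$.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # irrational rotation number (assumed example)

def overlap_with_initial_arc(start, length, a):
    """Lebesgue measure of [0, a) intersected with the arc [start, start + length) on R/Z.

    Assumes 0 <= a <= 1 and 0 <= length <= 1.
    """
    start %= 1.0
    # The arc may wrap around 1; split it into at most two ordinary intervals.
    pieces = [(start, min(start + length, 1.0))]
    if start + length > 1.0:
        pieces.append((0.0, start + length - 1.0))
    return sum(max(0.0, min(hi, a) - lo) for lo, hi in pieces)

def correlation(k, a, b, alpha=ALPHA):
    """mu(A ∩ f^{-k}(B)) for A = [0, a), B = [0, b) and f(x) = x + alpha (mod 1)."""
    # f^{-k}(B) is the arc [-k * alpha, -k * alpha + b) on the circle.
    return overlap_with_initial_arc(-k * alpha, b, a)

a, b, n = 0.1, 0.1, 20000
terms = [correlation(k, a, b) for k in range(n)]
cesaro = sum(terms) / n
print("Cesaro average:", cesaro, " mu(A)mu(B):", a * b)
print("smallest and largest single term:", min(terms), max(terms))
```

The averages approach $\mu(A)\mu(B)$, as (6) requires, while the individual terms keep oscillating between 0 and values close to $\min(a, b)$.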

Now we can show that the measures from Example 3 are ergodic.

Example 13. Let $\sigma$ be a shift and let $\mu$ be the product measure from Example 3 (that is, the system is a Bernoulli shift). To prove that $\mu$ is ergodic, it is enough to show that (6) holds for all cylinders $A, B$. However, if $k$ is larger than the length of the cylinder $A$ (in the non-invertible case) or the sum of the lengths of the cylinders $A$ and $B$ (in the invertible case) then we have $\mu(A \cap f^{-k}(B)) = \mu(A)\mu(B)$. Therefore

(7) $$\lim_{k \to \infty} \mu(A \cap f^{-k}(B)) = \mu(A)\mu(B),$$

so (6) holds. Thus, Bernoulli shifts are ergodic.

Notice that in a similar way as for (6), if (7) holds for cylinders then it holds for all measurable sets. If it holds for all measurable $A, B \subset X$ then we say that $\mu$ (or $f$) is mixing. This property is stronger than ergodicity. Thus, we even proved that Bernoulli shifts are mixing. Notice that irrational rotations are not mixing (take as $A$ and $B$ short arcs).

Now we are going to study smooth dynamical systems more closely. The first example we would like to analyze is the flow and the corresponding diffeomorphism from Example 10. The set of nonwandering points for that diffeomorphism consisted of fixed points. Therefore our immediate attention will be focused on the fixed points. Close to a fixed point the diffeomorphism can be approximated by its derivative at this point. This derivative is a linear map, so we refer to this procedure as a linearization. The behavior of the system close to the fixed point can be described with high accuracy by the behavior of the linear system determined by this derivative. In fact, the Grobman-Hartman theorem says that under some conditions those two systems are locally (in a sufficiently small neighborhood of the fixed point) topologically conjugate. The assumption we have to make is that the fixed point is hyperbolic. This means that the derivative at the fixed point has no eigenvalues of modulus one. Notice that this is a generic condition. The set of invertible $N \times N$ matrices satisfying it is open and dense in the space of all invertible $N \times N$ matrices.

Our diffeomorphism from Example 10 is given as the time one map of a flow, and this flow is determined by a vector field. Therefore we have to find a connection between the vector field in a neighborhood of a critical point (a point at which this vector field vanishes) and the derivative of the corresponding time one map at this point. We consider a more general situation.
Let $F$ be a vector field, $(\varphi^t)$ its flow, and $f = \varphi^1$ the corresponding time one map. Assume that $F(x_0) = 0$ for some point $x_0$. We consider everything in a small neighborhood of the point $x_0$. Thus, even if our vector field is defined on a manifold, we can transport it to an open set in $\mathbb{R}^N$. We also assume that the vector field is smooth (at least of class $C^1$). It is known that in such a case we can change the order of taking the spatial and temporal derivatives of the solutions. That is, $D_x D_t \varphi^t(x) = D_t D_x \varphi^t(x)$. The fact that $(\varphi^t)$ is the flow of $F$ means that $D_t \varphi^t(x) = F(\varphi^t(x))$. Therefore, by the chain rule, we get $D_x F(\varphi^t(x)) \, D_x \varphi^t(x) = D_t D_x \varphi^t(x)$. One can think of both sides of this equality as $N \times N$ matrices. Therefore they can act on vectors $v \in \mathbb{R}^N$. Hence, at the point $x = x_0$, which is fixed by the flow, we have $D_x F(x_0)(D_x \varphi^t(x_0)(v)) = D_t D_x \varphi^t(x_0)(v)$ for all $v \in \mathbb{R}^N$. We can interpret this in the following way. Consider the linear differential equation $\dot{v} = D_x F(x_0)(v)$. Then $\psi^t(v) = D_x \varphi^t(x_0)(v)$ is the solution of this equation with the initial value $v$. On the other hand, we know how to solve linear differential equations. The general solution is $e^{t D_x F(x_0)}(v)$. This shows that we have $D_x \varphi^t(x_0) = e^{t D_x F(x_0)}$. In particular, for $t = 1$ we have $D_x \varphi^1(x_0) = e^{D_x F(x_0)}$. Since $f = \varphi^1$, we get the formula we were looking for, namely

$$D_x f(x_0) = e^{D_x F(x_0)}.$$
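The relation between the derivative of the time one map and the exponential of the linear part of the vector field can be tested numerically. The sketch below rests on assumptions made only for illustration: the planar vector field $F(x, y) = (x + y^2, -2y + x^2)$ is a hypothetical example with a critical point at the origin and linear part $\mathrm{diag}(1, -2)$ there, the flow is approximated by the classical Runge-Kutta scheme, and the derivative is approximated by central differences.

```python
import math

def F(x, y):
    # Hypothetical smooth vector field with a critical point at the origin;
    # its linear part there is diag(1, -2).
    return (x + y * y, -2.0 * y + x * x)

def time_one_map(x, y, steps=2000):
    """Approximate the time one map of the flow of F with the classical RK4 scheme."""
    h = 1.0 / steps
    for _ in range(steps):
        k1 = F(x, y)
        k2 = F(x + 0.5 * h * k1[0], y + 0.5 * h * k1[1])
        k3 = F(x + 0.5 * h * k2[0], y + 0.5 * h * k2[1])
        k4 = F(x + h * k3[0], y + h * k3[1])
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return (x, y)

def jacobian_at_origin(eps=1e-5):
    """Central finite-difference approximation of the derivative of the time one map at (0, 0)."""
    px, mx = time_one_map(eps, 0.0), time_one_map(-eps, 0.0)
    py, my = time_one_map(0.0, eps), time_one_map(0.0, -eps)
    return [[(px[0] - mx[0]) / (2 * eps), (py[0] - my[0]) / (2 * eps)],
            [(px[1] - mx[1]) / (2 * eps), (py[1] - my[1]) / (2 * eps)]]

J = jacobian_at_origin()
expected = [[math.e, 0.0], [0.0, math.exp(-2.0)]]   # exponential of diag(1, -2)
print(J)
print(expected)
```

Because the linear part is diagonal in this example, its exponential is simply $\mathrm{diag}(e, e^{-2})$, which keeps the comparison free of any matrix-exponential routine.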

From the above formula it follows immediately that the eigenvalues of the derivative of $f$ at $x_0$ are $e^{\lambda}$ for the corresponding eigenvalues $\lambda$ of the linear part $D_x F(x_0)$ of the vector field $F$ at $x_0$. Thus, hyperbolicity of the fixed point $x_0$ of $f$ is equivalent to the hyperbolicity of $D_x F(x_0)$, that is the property that no eigenvalue of $D_x F(x_0)$ is purely imaginary. Of course, we can speak in such a case also of hyperbolicity of $x_0$ for the flow $(\varphi^t)$.

As we know from the theory of linear differential equations (or just from linear algebra), in the hyperbolic case the space $\mathbb{R}^N$ can be written as $V^s \oplus V^u$, where $V^s$ and $V^u$ correspond to the eigenvalues of $D_x F(x_0)$ with respectively negative and positive real parts. Both subspaces are invariant. In the linear subspace $V^s$ all solutions approach 0 exponentially fast, and in $V^u$ they escape from 0 exponentially fast. Thus, the same is true for $f$, except that $V^s$ corresponds to the eigenvalues of $D_x f(x_0)$ with moduli smaller than 1 (that is, lying inside the unit circle), and $V^u$ corresponds to the eigenvalues of $D_x f(x_0)$ with moduli larger than 1 (that is, lying outside the unit circle).

Now, to continue examining fixed points of the map from Example 10, we have to be able to derive $D_x F(x_0)$ from the equations describing the surface $X$. In order to do it we change slightly our notation (the symbols $x$ and $x_0$ will change their meaning). Assume that the surface $X$ is the graph of $z = G(x, y)$ in a neighborhood of a point $P = (x_0, y_0, z_0)$, where $z_0 = G(x_0, y_0)$, and assume that $G$ is of class $C^2$. We assume also that the point $P$ is a critical point of our gradient vector field $F$. This means that the tangent plane to $X$ at $P$ is horizontal, that is the partial derivatives of $G$ at $Q = (x_0, y_0)$ vanish: $G_x(Q) = G_y(Q) = 0$. We know that the vector $w$ of our vector field at $R = (x, y, G(x, y))$ is the projection of the vector $u = \langle 0, 0, -1 \rangle$ to the plane tangent to $X$ at $R$. The vector

$$v = \frac{\langle G_x(x, y), G_y(x, y), -1 \rangle}{\sqrt{(G_x(x, y))^2 + (G_y(x, y))^2 + 1}}$$

is a unit normal vector to $X$ at $R$. We can decompose the vector $u$ into the sum of the vectors $w$ orthogonal to $v$ and $u - w$ parallel to $v$. Since the norm of $v$ is 1, we get

$$u - w = ((u - w) \cdot v) v = (u \cdot v) v = \frac{\langle G_x(x, y), G_y(x, y), -1 \rangle}{(G_x(x, y))^2 + (G_y(x, y))^2 + 1}.$$

Therefore

$$w = \left\langle -\frac{G_x(x, y)}{a}, -\frac{G_y(x, y)}{a}, \frac{1}{a} - 1 \right\rangle,$$

where $a = (G_x(x, y))^2 + (G_y(x, y))^2 + 1$. The first two components of $w$ are functions of $x$ and $y$ and do not depend on $z$. Therefore we can project the whole picture, including the vector field, to the $x,y$-plane, and continue working there. Now we have a vector field $F$ defined by

$$F(x, y) = \left\langle -\frac{G_x(x, y)}{(G_x(x, y))^2 + (G_y(x, y))^2 + 1}, -\frac{G_y(x, y)}{(G_x(x, y))^2 + (G_y(x, y))^2 + 1} \right\rangle$$

in a neighborhood of the point $Q$. We want to compute the matrix of the derivative of $F$ at $Q$. The fact that $G_x(Q) = G_y(Q) = 0$ facilitates this computation enormously. We get immediately

$$\frac{\partial F_1}{\partial x}(Q) = -G_{xx}(Q), \quad \frac{\partial F_1}{\partial y}(Q) = \frac{\partial F_2}{\partial x}(Q) = -G_{xy}(Q), \quad \text{and} \quad \frac{\partial F_2}{\partial y}(Q) = -G_{yy}(Q).$$

This means that the matrix of the first derivative of $F$ is equal to $-M$, where $M$ is the matrix of the second derivative of $G$ (the minus sign is due to the fact that the function $H$, with respect to which we take the gradient, is $-z$, not $z$).

Since $M$ is the matrix of the second derivative of a function of class $C^2$, it is symmetric (the mixed partial derivatives are equal). Therefore both eigenvalues of $M$ are real. The condition for hyperbolicity becomes in this case much simpler. Namely, it says that both eigenvalues are non-zero, i.e. the matrix $M$ is nondegenerate.

Assume that $M$ is nondegenerate. Then there are three cases possible. If $G$ has a local minimum at $Q$ then the determinant of $M$ is positive and both entries on the diagonal are positive. Therefore the determinant and the trace of $M$ are positive. This means that both eigenvalues of $M$ are positive, so both eigenvalues of $-M$ are negative. Therefore the point $P$ is (exponentially) attracting. If $G$ has a local maximum at $Q$ then the determinant of $M$ is positive and both entries on the diagonal are negative. Therefore the determinant of $M$ is positive and the trace of $M$ is negative. This means that both eigenvalues of $M$ are negative, so both eigenvalues of $-M$ are positive. Therefore the point $P$ is (exponentially) repelling. The third possibility is that $Q$ is a saddle point for $G$. Then the determinant of $M$ is negative. Therefore one eigenvalue of $M$ is positive and the other one is negative. The same is true for $-M$. This is what we call also a saddle for $f$ (or for the flow $(\varphi^t)$). In the tangent space at $P$ there is one stable (attracting) and one unstable (repelling) direction.

In the general case of a diffeomorphism $f$ of a manifold $X$, hyperbolic fixed points give us more structure to look at. Namely, if $x$ is a hyperbolic fixed point of a diffeomorphism $f$, then as in the case when it was derived from a flow, we can split the tangent space at $x$ and write it as $V^s \oplus V^u$, where the subspaces $V^s$ and $V^u$ correspond to the eigenvalues with moduli smaller than 1 and larger than 1 respectively (this is elementary linear algebra). In particular, both $V^s$ and $V^u$ are $Df(x)$-invariant. Moreover, there are constants $c > 0$ and $\lambda \in (0, 1)$ such that $\|Df^n(x)(v)\| \le c\lambda^n \|v\|$ for every $n > 0$ and $v \in V^s$, and $\|Df^{-n}(x)(v)\| \le c\lambda^n \|v\|$ for every $n > 0$ and $v \in V^u$. In fact, those properties characterize $V^s$ and $V^u$. We call $V^s$ a stable or contracting subspace, and $V^u$ an unstable or expanding subspace.
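The computation of the linear part of the projected gradient field and the case analysis by determinant and trace can be sketched as follows. Everything concrete here is an assumption chosen for illustration: the height function $G(x, y) = x^2 - xy + 2y^2$ with its critical point at $Q = (0, 0)$, the finite-difference step, and the function names. For the downhill flow (gradient of $-z$), a positive definite $M$, that is a local minimum of $G$, gives an attracting point.

```python
def G_partials(x, y):
    # Hypothetical height function G(x, y) = x^2 - x*y + 2*y^2, critical point at Q = (0, 0).
    return (2.0 * x - y, -x + 4.0 * y)        # (G_x, G_y)

M = [[2.0, -1.0], [-1.0, 4.0]]                # matrix of second derivatives of G at Q

def F(x, y):
    """Projected gradient field (-G_x / a, -G_y / a) with a = G_x^2 + G_y^2 + 1."""
    gx, gy = G_partials(x, y)
    a = gx * gx + gy * gy + 1.0
    return (-gx / a, -gy / a)

def jacobian_of_F_at_Q(eps=1e-6):
    """Central finite-difference approximation of the derivative of F at Q = (0, 0)."""
    px, mx = F(eps, 0.0), F(-eps, 0.0)
    py, my = F(0.0, eps), F(0.0, -eps)
    return [[(px[0] - mx[0]) / (2 * eps), (py[0] - my[0]) / (2 * eps)],
            [(px[1] - mx[1]) / (2 * eps), (py[1] - my[1]) / (2 * eps)]]

def classify(hessian):
    """Classify a nondegenerate critical point from the symmetric 2x2 matrix M of G."""
    det = hessian[0][0] * hessian[1][1] - hessian[0][1] * hessian[1][0]
    trace = hessian[0][0] + hessian[1][1]
    if det == 0:
        raise ValueError("degenerate critical point: not hyperbolic")
    if det < 0:
        return "saddle"
    # det > 0: both eigenvalues of M have the sign of the trace; the linear part of F is -M.
    return "attracting" if trace > 0 else "repelling"

J = jacobian_of_F_at_Q()
print(J, classify(M))
```

In the saddle case the stable and unstable subspaces are both one-dimensional. In the attracting case $V^s$ is the whole tangent plane, and in the repelling case $V^u$ is.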


Warning. There is a natural tendency of the human mind to think that everything moves forward in time (perhaps only time travelers do not think so). Therefore if we have the condition $\|Df^n(x)(v)\| \le c\lambda^n \|v\|$ with $\lambda < 1$ and $n = 1, 2, 3, \dots$ for a contracting subspace, there is a temptation to use a similar condition $\|Df^n(x)(v)\| \ge c\lambda^n \|v\|$ with $\lambda > 1$ and $n = 1, 2, 3, \dots$ for an expanding one. However, this is wrong. Suppose that

$$Df(x) = \begin{pmatrix} 2 & 0 \\ 0 & 1/2 \end{pmatrix}.$$

We have $V^s = \{\langle 0, a \rangle : a \in \mathbb{R}\}$ and $V^u = \{\langle a, 0 \rangle : a \in \mathbb{R}\}$. However, for every vector $v = \langle a, b \rangle$ with $a \ne 0$ there exist constants $c > 0$ and $\lambda > 1$ such that $\|Df^n(x)(v)\| \ge c\lambda^n \|v\|$.

In the situation described above (prior to the Warning), the picture in the manifold is similar to that in the tangent space at $x$. Namely, there exist stable and unstable manifolds at $x$. Locally, they are submanifolds of $X$. Both pass through $x$ and the stable manifold $W^s(x)$ is tangent to $V^s$, and the unstable manifold $W^u(x)$ is tangent to $V^u$. In particular, their dimensions are the same as the dimensions of $V^s$ and $V^u$ respectively, that is the same as the number of eigenvalues of $Df(x)$ of modulus smaller and larger than 1 respectively (counting multiple ones several times according to their multiplicity). The stable and unstable manifolds are at least as smooth as $f$. Globally, they are embedded manifolds, that is globally their topology is not necessarily inherited from $X$ (although locally it is). In particular, it may happen that they are dense in $X$. One can define a local stable and local unstable manifold by choosing a sufficiently small $\varepsilon > 0$ and setting

$$W^s_\varepsilon(x) = \{y \in X : \mathrm{dist}(x, f^n(y)) < \varepsilon \text{ for } n = 0, 1, 2, \dots\}$$

and

$$W^u_\varepsilon(x) = \{y \in X : \mathrm{dist}(x, f^{-n}(y)) < \varepsilon \text{ for } n = 0, 1, 2, \dots\}$$

(a similar warning as above applies also here). They are (for sufficiently small $\varepsilon$) neighborhoods of $x$ in $W^s(x)$ and $W^u(x)$ respectively.

The theorem on existence of stable and unstable manifolds is called the Hadamard-Perron theorem. Notice that from the Grobman-Hartman theorem we can get those manifolds, but since the conjugacy in that theorem is not necessarily smooth, we would not get smoothness.
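The point of the Warning can be made concrete with the diagonal matrix $\mathrm{diag}(2, 1/2)$ used there. In the sketch below the vectors $v = \langle 1, 1 \rangle$ and $u = \langle 1, 0 \rangle$ and the range of $n$ are assumptions chosen for illustration. Forward iterates of $v$ grow exponentially although $v$ is not in $V^u$, while the backward-time condition does separate $u$ from $v$.

```python
import math

def apply_power(diag, v, n):
    """Apply the n-th power of the diagonal matrix diag(d0, d1) to v (n may be negative)."""
    return (diag[0] ** n * v[0], diag[1] ** n * v[1])

def norm(v):
    return math.hypot(v[0], v[1])

DF = (2.0, 0.5)          # diagonal entries of the matrix from the Warning
v = (1.0, 1.0)           # not in V^u, but its first coordinate is nonzero
u = (1.0, 0.0)           # a vector in V^u

# Growth ratios ||Df^n w|| / ||w|| and ||Df^{-n} w|| / ||w|| for n = 1, ..., 10.
forward_v = [norm(apply_power(DF, v, n)) / norm(v) for n in range(1, 11)]
backward_v = [norm(apply_power(DF, v, -n)) / norm(v) for n in range(1, 11)]
backward_u = [norm(apply_power(DF, u, -n)) / norm(u) for n in range(1, 11)]

print(forward_v[-1], backward_v[-1], backward_u[-1])
```

Here `forward_v` grows at least like $2^n / \sqrt{2}$, so the tempting forward condition holds for $v$. The backward ratios of $v$ also grow, so $v$ fails the correct condition for $V^u$, whereas those of $u$ decay like $2^{-n}$.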

Figure 5

If $x$ is a periodic point of $f$ of period $n$ then we say that $x$ is hyperbolic if it is a hyperbolic fixed point of $f^n$. Then we can apply the above names and constructions to $x$ as a fixed point of $f^n$.

Let us look at the special case of the situation from Example 10. Namely, let $X$ be a torus positioned vertically in $\mathbb{R}^3$ (see Figure 5). More precisely, $X$ is the surface of revolution obtained by rotating a circle of radius 1 centered at $(2, 0)$ in the $xy$-plane, around the $y$-axis. The time one map $f$ of the gradient vector field for the function $-z$ has four fixed points: a repelling one $T$ at the top, an attracting one $B$ at the bottom, and two saddle points $S_u$ (the upper one) and $S_l$ (the lower one) (see Figure 5). All those points are hyperbolic (see Exercise 16). The stable manifold of $T$ consists of $T$ only. The unstable manifold of $T$ is the whole torus except the unstable manifolds of the saddle points and the point $B$. Similarly, the unstable manifold of $B$ consists of $B$ only and the stable manifold of $B$ is the whole torus except the stable manifolds of the saddle points and the point $T$. The stable manifold of $S_u$ is the vertical circle through $S_u$ and $T$, except the point $T$. The unstable manifold of $S_u$ is the vertical “inner” circle of the torus through $S_u$ and $S_l$, except the point $S_l$. Similarly, the unstable manifold of $S_l$ is the vertical circle through $S_l$ and $B$, except the point $B$, and the stable manifold of $S_l$ coincides with the unstable manifold of $S_u$ except that we remove from the “inner” circle the point $S_u$ instead of $S_l$ (see Figure 5).

Exercise 16. Check that in the situation described above all four fixed points are hyperbolic.

Very often along with a given dynamical system we want to look at its small perturbations. We may think of those small perturbations as caused by a small guy who sometimes even appears in the figures illustrating those perturbations. Let our small guy kick the torus a little, so that it is no longer vertical. We want to see how the above picture changes. We assume that the small guy kicking the torus is behind it near the top. Then the upper part of the torus is closer to us than the lower one (see Figure 6).

Figure 6


There are still four fixed points, and they are still hyperbolic (small smooth perturbations cannot destroy hyperbolicity). Their stable and unstable manifolds look the same as before with two exceptions. Namely, the unstable manifold of $S_u$ no longer ends up at $S_l$, but goes all the way down to $B$. Similarly, the stable manifold of $S_l$ no longer ends up at $S_u$, but goes all the way up to $T$. Moreover, those manifolds do not intersect each other (see Figure 6). Thus, the picture is qualitatively different than before. However, further small kicks will not change it. This can be expressed rigorously in the following way.

We say that a diffeomorphism $f : X \to X$ is structurally stable if there is a neighborhood $U$ of $f$ in the space $\mathrm{Diff}^1(X)$ of $C^1$ diffeomorphisms of $X$ onto itself (with the $C^1$ topology) such that every element of $U$ is topologically conjugate to $f$. A topological conjugacy maps stable and unstable manifolds to the corresponding stable and unstable manifolds. Therefore we see that our first diffeomorphism $f$ is not structurally stable.

One can show that the second one is structurally stable. Namely, its set of nonwandering points is finite and consists of periodic points, and all stable and unstable manifolds intersect transversally (we will explain this notion in a moment). A diffeomorphism with those properties is called a Morse-Smale diffeomorphism. It is known that all Morse-Smale diffeomorphisms are structurally stable.

We say that two embedded manifolds in $X$ intersect transversally if at every point $q$ of their intersection the vectors tangent to at least one of them span the whole tangent space to $X$ at $q$. In particular, if those manifolds have empty intersection then they intersect transversally. If one of the manifolds has the same dimension as $X$ then they also intersect transversally. If the dimensions of our two manifolds add up to the dimension of $X$ then they intersect transversally if and only if at each point of their intersection there is no nonzero vector tangent to both of them.

In fact, properties like hyperbolicity of periodic points and transversal intersections of stable and unstable manifolds are quite common. We say that some property of a smooth dynamical system is generic if the set of diffeomorphisms having this property is residual in $\mathrm{Diff}^1(X)$ (that is, contains a dense $G_\delta$ subset of $\mathrm{Diff}^1(X)$). The Kupka-Smale theorem says that the following properties are generic: the periodic points are dense in the set of nonwandering points, all the periodic points are hyperbolic, and the stable manifold of any periodic point intersects transversally the unstable manifold of any periodic point.

Another important class of diffeomorphisms that are known to be structurally stable is the class of Anosov diffeomorphisms. A diffeomorphism $f : X \to X$ is Anosov if the set of nonwandering points of $f$ is the whole manifold $X$ and $f$ is globally hyperbolic. Global hyperbolicity means that the tangent space at every point $x$ of $X$ can be written as $V^s_x \oplus V^u_x$, both $V^s_x$ and $V^u_x$ depend continuously on $x$, are invariant (that is $Df(V^s_x) = V^s_{f(x)}$ and $Df(V^u_x) = V^u_{f(x)}$), and there are constants $c > 0$ and $\lambda \in (0, 1)$ such that $\|Df^n(v)\| \le c\lambda^n \|v\|$ and $\|Df^{-n}(u)\| \le c\lambda^n \|u\|$ for every $v \in V^s_x$, $u \in V^u_x$ and $n > 0$. In some sense the Anosov diffeomorphisms are very far from the Morse-Smale diffeomorphisms. Namely, the set of nonwandering points is as small as possible for Morse-Smale diffeomorphisms, and as large as possible for Anosov diffeomorphisms.

Standard examples of Anosov diffeomorphisms are hyperbolic algebraic automorphisms of an $N$-dimensional torus, described in the next example.


Example 14. As in Example 2, we look at the $N$-dimensional torus $\mathbb{T}^N$ as the quotient $\mathbb{R}^N / \mathbb{Z}^N$. Let $A$ be an $N \times N$ matrix with integer entries and determinant $\pm 1$. We can look at $A$ as a linear diffeomorphism of $\mathbb{R}^N$ onto itself. It preserves $\mathbb{Z}^N$, and therefore it induces a smooth map $f_A : \mathbb{T}^N \to \mathbb{T}^N$. If $\pi : \mathbb{R}^N \to \mathbb{T}^N$ is the natural projection then the diagram

$$\begin{array}{ccc} \mathbb{R}^N & \xrightarrow{\;A\;} & \mathbb{R}^N \\ \downarrow \pi & & \downarrow \pi \\ \mathbb{T}^N & \xrightarrow{\;f_A\;} & \mathbb{T}^N \end{array}$$

commutes. Since the determinant of $A$ is $\pm 1$, the matrix $A^{-1}$ has also integer entries. Therefore the smooth map $f_{A^{-1}} : \mathbb{T}^N \to \mathbb{T}^N$ is well defined. Clearly, it is the inverse of $f_A$, and therefore $f_A$ is a diffeomorphism. The derivative of $f_A$ is the same as the derivative of $A$. Since $A$ is linear, this derivative is simply $A$. Thus, the necessary and sufficient condition for $f_A$ to be hyperbolic is that the matrix $A$ is hyperbolic, i.e. that it has no eigenvalues of modulus 1.

A standard example of such a diffeomorphism in dimension 2 is $f_A$ where $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$. The characteristic polynomial of $A$ is $\lambda^2 - 3\lambda + 1$, so the eigenvalues are $\lambda_1 = (3 + \sqrt{5})/2 > 1$ and $\lambda_2 = (3 - \sqrt{5})/2 \in (0, 1)$. The corresponding eigenvectors are $\langle (1 + \sqrt{5})/2, 1 \rangle$ and $\langle -1, (1 + \sqrt{5})/2 \rangle$. They are orthogonal since $A$ is symmetric.

The most popular method of looking at how the diffeomorphism $f_A$ acts is to look at $\mathbb{T}^2$ as the unit square with the parallel sides identified, draw a cat's face in the square, and see how it gets mutilated under the action of the diffeomorphism (see Figure 7). This poor cat is sometimes referred to as Arnold's cat.

Figure 7
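The algebraic facts about the matrix $A$ from Example 14 are easy to verify by direct computation. The following sketch uses only the standard library; the helper names are hypothetical and introduced just for this check.

```python
import math

A = [[2, 1], [1, 1]]
A_inv = [[1, -1], [-1, 2]]        # candidate inverse with integer entries

def mat_mul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_vec(P, v):
    """Product of a 2x2 matrix and a vector."""
    return (P[0][0] * v[0] + P[0][1] * v[1], P[1][0] * v[0] + P[1][1] * v[1])

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
trace = A[0][0] + A[1][1]

# Roots of the characteristic polynomial lambda^2 - trace * lambda + det.
disc = math.sqrt(trace * trace - 4 * det)
lam1, lam2 = (trace + disc) / 2, (trace - disc) / 2

golden = (1 + math.sqrt(5)) / 2
v1 = (golden, 1.0)                # eigenvector for lam1
v2 = (-1.0, golden)               # eigenvector for lam2

is_hyperbolic = abs(lam1) != 1 and abs(lam2) != 1
print(det, lam1, lam2, is_hyperbolic)
```

Since $\lambda_1 \lambda_2 = \det A = 1$, one eigenvalue lies outside the unit circle and the other inside, so the stable and unstable directions are both one-dimensional.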

The set of periodic points of $f_A$ is dense in the torus (see Exercise 17). Since $f_A$ is globally hyperbolic, each of those points is hyperbolic. Since $A$ is linear, in $\mathbb{R}^2$ their stable and unstable manifolds are straight lines in the directions of the corresponding eigenvectors. In $\mathbb{T}^2$, they are the images of those lines under the projection $\pi$. Thus, they are trajectories of the flows of the constant vector fields, as described in Example 2. Since the numbers 1 and $(1 + \sqrt{5})/2$ are non-commensurable, the rotations obtained from those flows by means of the Poincaré section are irrational. Therefore all trajectories of those rotations are dense in the circle. Consequently, all trajectories of our flows are dense in the torus. Hence, all stable and unstable manifolds of all periodic points of $f_A$ are dense in the torus.
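Periodicity of points with rational coordinates can be observed experimentally (this is an illustration, not a proof). The sketch below uses exact rational arithmetic for the map $f_A$ with $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$; the function names, the iteration cap and the sample points are assumptions made for the example.

```python
from fractions import Fraction

def cat_map(p):
    """The map f_A on the torus R^2 / Z^2 for A = [[2, 1], [1, 1]], in exact arithmetic."""
    x, y = p
    return ((2 * x + y) % 1, (x + y) % 1)

def period(p, max_iter=10_000):
    """Smallest n >= 1 with f_A^n(p) = p, or None if none is found within max_iter steps."""
    q, n = cat_map(p), 1
    while q != p:
        if n >= max_iter:
            return None
        q, n = cat_map(q), n + 1
    return n

p = (Fraction(1, 5), Fraction(2, 5))
print(period(p))

# Periods of all points whose coordinates have denominator 7.
periods_mod_7 = {(i, j): period((Fraction(i, 7), Fraction(j, 7)))
                 for i in range(7) for j in range(7)}
print(sorted(set(periods_mod_7.values())))
```

Exact fractions are used on purpose: with floating point numbers the expansion along the unstable direction amplifies rounding errors, and computed orbits of rational points would quickly stop being periodic.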

Exercise 17. Let $f_A$ be an algebraic automorphism of an $N$-dimensional torus. Prove that every point with rational coordinates is periodic for $f_A$.

It is easy to see that in the example described above the stable and unstable manifolds of any periodic point $p$ intersect each other at a point different from $p$. Such a point $q$ of intersection is called a homoclinic point. If the intersection at $q$ is transversal (as for $f_A$), $q$ is called a transversal homoclinic point. Existence of a transversal homoclinic point tells us that the dynamics of the diffeomorphism are complicated. We will analyze the situation in dimension 2.

Figure 8 (labels in the figure: $W^u(p)$, $W^s(p)$, $p$, $q$, $f(q)$, $f^2(q)$, $f^3(q)$)

Let the one-dimensional stable and unstable manifolds $W^s(p)$ and $W^u(p)$ of the hyperbolic fixed point $p$ of a diffeomorphism $f$ intersect transversally at a point $q \ne p$. Assume for simplicity that both eigenvalues of $Df(p)$ are positive. As we follow $W^u(p)$ further, pieces of this manifold approach closer and closer the piece of $W^u(p)$ close to $p$, crossing $W^s(p)$ at the points $f^n(q)$ (see Figure 8). Thus a rectangular neighborhood $R$ of $p$ has an image $f^n(R)$ whose part intersects $R$ in a horseshoe-like way (see Figure 9). We shall analyze this phenomenon in a model situation called Smale's horseshoe.

Figure 9 (labels in the figure: $W^u(p)$, $W^s(p)$, $p$, $R$, part of $f^n(R)$)

Example 15 (Smale's horseshoe). Let $P$ be the union of the square $R = [0, 1]^2$ and two semidisks: $D_0$ below $R$ and $D_1$ above $R$, adjacent to $R$. We define a diffeomorphism of $P$ into $P$ as follows. We take two disjoint horizontal rectangles $R_0 = [0, 1] \times [1/5, 2/5]$ and $R_1 = [0, 1] \times [3/5, 4/5]$ (see Figure 10). Now we squeeze $R$ horizontally, stretch vertically and bend like a horseshoe. The map $f$ on $R_i$ is linear (more precisely, affine), and $f(R_0) = [1/5, 2/5] \times [0, 1]$, $f(R_1) = [3/5, 4/5] \times [0, 1]$. The rectangle between $R_0$ and $R_1$ is mapped into $D_1$. The rectangles below $R_0$ and above $R_1$, and the semidisks $D_i$ are mapped into $D_0$ (see Figure 10). In $f(D_0)$ there is an attracting fixed point. Every point outside $R_0 \cup R_1$ is mapped to $D_0 \cup D_1$, then to $D_0$, then to $f(D_0)$, and then it becomes attracted to this fixed point. Thus, all nonwandering points in $P$ other than this attracting fixed point have trajectories contained in $R_0 \cup R_1$.

If we want to have a diffeomorphism of a compact manifold without boundary onto itself, we can complete the picture by embedding $P$ into a sphere and extending the diffeomorphism to the whole sphere. We can do it in such a way that there will be a repelling fixed point outside $P$ and all backward trajectories of points outside $P$ will converge to it. Then there will be one nonwandering point more (namely, this repelling fixed point).
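The affine branches of the horseshoe map on $R_0$ and $R_1$ can be written down explicitly. The formulas below are an assumption: they are one concrete choice compatible with the stated images $f(R_0)$ and $f(R_1)$ and with the bending (a half-turn on $R_1$), not necessarily the exact normalization intended in the figure. The sketch works in exact rational arithmetic and only covers points of $R_0 \cup R_1$.

```python
from fractions import Fraction as Fr

def in_R0(p):
    return 0 <= p[0] <= 1 and Fr(1, 5) <= p[1] <= Fr(2, 5)

def in_R1(p):
    return 0 <= p[0] <= 1 and Fr(3, 5) <= p[1] <= Fr(4, 5)

def horseshoe(p):
    """Assumed affine branches of f on R0 and R1 (undefined elsewhere in this sketch)."""
    x, y = Fr(p[0]), Fr(p[1])
    if in_R0((x, y)):
        # Squeeze by 1/5 horizontally, stretch by 5 vertically.
        return (x / 5 + Fr(1, 5), 5 * y - 1)
    if in_R1((x, y)):
        # Same rates, composed with the half-turn coming from the bend.
        return (-x / 5 + Fr(4, 5), -5 * y + 4)
    raise ValueError("point is outside R0 and R1")

# Images of two opposite corners of each rectangle.
print(horseshoe((Fr(0), Fr(1, 5))), horseshoe((Fr(1), Fr(2, 5))))
print(horseshoe((Fr(0), Fr(3, 5))), horseshoe((Fr(1), Fr(4, 5))))

# Fixed points of the two branches, found by solving the affine equations by hand.
fixed_in_R0 = (Fr(1, 4), Fr(1, 4))
fixed_in_R1 = (Fr(2, 3), Fr(2, 3))
# A point of period 2 that alternates between R0 and R1.
period_two = (Fr(19, 26), Fr(9, 26))
```

With this choice each branch has exactly one fixed point, and the listed point of period 2 visits $R_0$ and $R_1$ alternately, so its whole trajectory stays in $R_0 \cup R_1$.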


Figure 10 (labels in the figure: $D_1$, $R_1$, $R$, $R_0$, $D_0$)

Let us concentrate our attention on the set $C$ of points with both forward and backward trajectories in $R_0 \cup R_1$. Let us apply a coding procedure similar to the one used for interval maps. Namely, for a point $x \in C$ let us write $c_i(x) = j$ if $f^i(x) \in R_j$. In such a way we have defined a map $c : C \to \Sigma$, where $\Sigma$ is the space of all two-sided 0-1 sequences. Clearly, the diagram

$$\begin{array}{ccc} C & \xrightarrow{\ f\ } & C \\ {\scriptstyle c}\big\downarrow & & \big\downarrow{\scriptstyle c} \\ \Sigma & \xrightarrow{\ \sigma\ } & \Sigma \end{array}$$

commutes. If the distance between $x$ and $y$ is small then for a long time the images and preimages of $x$ and $y$ stay close to each other, so the codes for those points are close. Therefore $c$ is continuous. We will show that $c$ is a homeomorphism of $C$ onto $\Sigma$. That would mean that $f|_C : C \to C$ is conjugate to the full 2-shift. We have to show that $c$ is one-to-one and onto.

Suppose we have some two-sided 0-1 sequence $(\varepsilon_i)$. By induction we easily see that the set of points $x \in R$ with $c_i(x) = \varepsilon_i$ for $i = -1, -2, \dots, -n$ is a vertical strip of width $5^{-n}$ and height 1 (we can define such a finite coding for every point of $P$ whose relevant images and preimages lie in $P$). Therefore the set of points $x \in R$ with $c_i(x) = \varepsilon_i$ for $i = -1, -2, \dots$ is a vertical segment of height 1. Similarly, the set of points $x \in R$ with $c_i(x) = \varepsilon_i$ for $i = 0, 1, 2, \dots$ is a horizontal segment of width 1. Their intersection consists of one point. Of course this point belongs to $C$. Therefore for every point of $\Sigma$ there is exactly one point $x \in C$ with $c(x)$ equal to this point. Since $C$ is compact and $c$ is continuous, one-to-one and onto, it is a homeomorphism.

Since there are infinitely many periodic points of $\sigma$, there are infinitely many periodic points of $f$ in $C$. The matrix of the derivative of $f$ is $\begin{pmatrix} 1/5 & 0 \\ 0 & 5 \end{pmatrix}$ on $R_0$ and $\begin{pmatrix} -1/5 & 0 \\ 0 & -5 \end{pmatrix}$ on $R_1$. Therefore all those periodic points are hyperbolic. Moreover, we have a hyperbolic structure on $C$.

Now we see that the existence of a transversal homoclinic point for a periodic point implies a very rich structure of the diffeomorphism, in particular infinitely many hyperbolic periodic points.

A set like the set $C$ from our example can be interesting from the theoretical point of view; however, it may be simply invisible in experiments. Namely, for the classical Smale's horseshoe described above, Lebesgue-almost all points are attracted to the attracting fixed point. Therefore a random choice of the initial point will give with probability 1 a trajectory attracted to this point, and there will be no sign of the existence of $C$. However, in other examples it may happen that sets with complicated dynamics are visible, since they attract all points from some set of positive Lebesgue measure. One example is an Anosov diffeomorphism, where the dynamics on the whole manifold is complex.

There are many non-equivalent definitions of an attractor, depending on the point of view of the person producing them. We will give one of them. Let $X$ be a compact metric space and $f : X \to X$ a continuous map. We will say that a set $A \subset X$ is an attractor if there is a neighborhood $U$ of $A$ such that the image of the closure of $U$ is contained in $U$, the intersection of the sets $f^n(U)$ over all nonnegative $n$ is equal to $A$, and $f|_A$ is transitive. Notice that the condition $\bigcap_{n=0}^{\infty} f^n(U) = A$ implies invariance of $A$. Moreover, $A$ is also the intersection of the closed sets $f^n(\overline{U})$, so it is closed. Therefore it makes sense to speak of transitivity of $f|_A : A \to A$.

According to our definition, an attracting periodic orbit is an attractor, and the whole manifold for an Anosov diffeomorphism is an attractor. These attractors are, from the topological point of view, quite simple. However, there exist more complicated ones. They are usually called strange attractors.
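The trapping condition in the definition of an attractor can be checked by hand for an attracting periodic orbit. The sketch below uses an illustrative example that is not taken from the text: the map $f(x) = x^2 - 1$, whose orbit $\{0, -1\}$ has period 2 and is attracting because the derivative of $f^2$ vanishes there. The set $U$ is assumed to be the union of the open intervals $(-3/10, 3/10)$ and $(-11/10, -9/10)$, and $X$ is taken to be the closure of $U$. The helpers `image` and `inside` are hypothetical names.

```python
from fractions import Fraction as F

def image(interval):
    """Exact image of a closed interval [a, b] under f(x) = x**2 - 1."""
    a, b = interval
    lo = F(0) if a <= 0 <= b else min(a * a, b * b)
    hi = max(a * a, b * b)
    return (lo - 1, hi - 1)

def inside(inner, outer):
    """True if the closed interval inner lies in the open interval outer."""
    return outer[0] < inner[0] and inner[1] < outer[1]

# Closure of U: one closed interval around each point of the orbit {0, -1}.
U = [(F(-3, 10), F(3, 10)), (F(-11, 10), F(-9, 10))]

# f swaps the two components, so the image of each closed piece
# must lie inside the other open piece.
first = [image(J) for J in U]
trapping = inside(first[0], U[1]) and inside(first[1], U[0])

# Iterate the closed pieces; after an even number of steps each piece
# is back near the orbit point it started at.
pieces = U
for _ in range(10):
    pieces = [image(J) for J in pieces]
widths = [float(b - a) for a, b in pieces]

print(trapping)
print(widths)
```

In this example the images of the closure of $U$ shrink onto the two points of the orbit, so their intersection is $A = \{0, -1\}$. Transitivity of $f|_A$ is automatic, since $A$ is a single periodic orbit.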

Figure 11

Example 16 (solenoid). Let $X$ be a solid torus, that is, a solid obtained by revolving a disk $D$ around a line $l$ lying in the same plane as $D$ but disjoint from $D$. We will refer to the direction of the circles obtained during this revolution as the circle direction, and to the disks obtained as intersections of $X$ with the half-planes having $l$ as the boundary as transversal disks. We stretch $X$ in the circle direction and squeeze it in the transversal direction. Then we wrap it twice around $l$ and put it back into the solid torus in such a way that the images of the transversal disks are contained in transversal disks (see Figure 11). It is possible to do this smoothly; one can even write simple real analytic formulas. In such a way we have defined a diffeomorphism $f$ of $X$ into itself.

Again, it is possible to extend our diffeomorphism to a diffeomorphism of some three-dimensional manifold without boundary (for instance, a 3-sphere) onto itself, adding one repelling fixed point outside.

Set $A = \bigcap_{n=0}^{\infty} f^n(X)$. If $U$ is $X$ minus the boundary of $X$ in $\mathbb{R}^3$, then the closure of $f(U)$ is contained in $U$ and $A = \bigcap_{n=0}^{\infty} f^n(U)$. Therefore, to show that $A$ is an attractor for $f$, it remains to prove that $f|_A$ is transitive. We do it again by coding. Since $f$ preserves the foliation of $X$ by transversal disks, we can factorize it to a circle map. If the stretching in the circle direction was exactly by the factor 2 (we may assume that), we get as a factor of $f$ the map $z \mapsto z^2$ (we assume that the circle is the unit circle in the complex plane). This map can be viewed as a discontinuous interval map with the coding that gives binary expansions of numbers from $[0, 1)$. (In fact, it can also be viewed as an algebraic endomorphism of the one-dimensional torus given by the $1 \times 1$ matrix $(2)$.) This coding gives a semiconjugacy from the one-sided two-shift to our circle map. The corresponding coding for $f$ gives a semiconjugacy from the two-sided two-shift to $f|_A$. Since the two-shift is transitive, $f|_A$ is also transitive. Thus, $A$ is a strange attractor for $f$. It is called a solenoid.

Locally, $A$ is a product of a segment and a Cantor set contained in a 2-dimensional transversal disk. We have a hyperbolic structure on $A$. Namely, the direction tangent to the "fibers" is unstable, and the directions in the planes of transversal disks are stable. Thus, $A$ is a hyperbolic attractor.
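Two steps of this example can be checked numerically. The sketch below assumes one commonly used analytic model of the solenoid map, which is not necessarily the formula the author has in mind: a point of $X$ is a pair $(\theta, z)$, with $\theta$ the angle in the circle direction and $z$ a complex coordinate in the transversal unit disk, and $f(\theta, z) = (2\theta, z/4 + e^{i\theta}/2)$. The sketch also compares the itinerary of a point under $x \mapsto 2x \bmod 1$ with its binary expansion. All function names are hypothetical.

```python
import cmath
import math
from fractions import Fraction as F

def solenoid(theta, z):
    """Assumed model: the angle doubles, the transversal disk |z| <= 1 shrinks by 1/4."""
    return ((2 * theta) % (2 * math.pi), z / 4 + cmath.exp(1j * theta) / 2)

def doubling_itinerary(x, n):
    """Itinerary of x in [0, 1) under x -> 2x mod 1 for the partition [0, 1/2), [1/2, 1)."""
    digits = []
    for _ in range(n):
        digits.append(0 if x < F(1, 2) else 1)
        x = (2 * x) % 1
    return digits

def binary_digits(x, n):
    """First n binary digits of x in [0, 1), computed directly."""
    return [math.floor(x * 2 ** (k + 1)) % 2 for k in range(n)]

# Two points on the transversal disks at angles theta and theta + pi:
# both disks are mapped into the single disk at angle 2 * theta.
theta, z, w = 0.7, 0.9 + 0.3j, -0.5j
a1, z1 = solenoid(theta, z)
a2, w1 = solenoid(theta + math.pi, w)

print(abs(z1), abs(w1))   # both at most 3/4, so the images stay inside the solid torus
print(abs(z1 - w1))       # at least 1/2, so the two image disks are disjoint

x = F(5, 13)
print(doubling_itinerary(x, 12) == binary_digits(x, 12))
```

In this model the two image disks have radius $1/4$ and centers $\pm e^{i\theta}/2$, which is why they are disjoint and the map is one-to-one. The contraction by $1/4$ in the disks and the doubling of the angle correspond to the stable and unstable directions.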
