Information Dimension and the Probabilistic Structure of Chaos

J. Doyne Farmer

Dynamical Systems Collective, Physics Department, UC Santa Cruz, Santa Cruz, California

Z. Naturforsch. 37a, 1304-1325 (1982); received May 18, 1982

The concepts of entropy and dimension as applied to dynamical systems are reviewed from a physical point of view. The information dimension, which measures the rate at which the information contained in a probability density scales with resolution, fills a logical gap in the classification of attractors in terms of metric entropy, fractal dimension, and topological entropy. Several examples are presented of chaotic attractors that have a self-similar, geometrically scaling structure in their coarse grained probability distributions; for these attractors the information dimension and fractal dimension are different. Just as the metric (Kolmogorov-Sinai) entropy places an upper bound on the information gained in a sequence of measurements, the information dimension can be used to estimate the information obtained in an isolated measurement. The metric entropy can be expressed in terms of the information dimension of a probability distribution constructed from a sequence of measurements. An algorithm is presented that allows the experimental determination of the information dimension and metric entropy.

Introduction

Let us start out with a quotation from Abbott's "Flatland" [1]:

"Dimension implies direction, implies measurement, implies the more and the less."

"As one of your Spaceland poets has said, we are all liable to the same errors, all alike the Slaves of our respective Dimensional prejudices."
— A Square

There is no unique notion of dimension. In the familiar territory of Euclidean space, we may all be slaves to a different aspect of dimension and yet arrive at the same conclusions. But, if we venture into the realm of the bizarre, distinct notions of dimension diverge. In the domain of chaos, deterministic structure amplifies the uncertainty inherent in measurement, until only probabilistic information remains. To comprehend the strange objects that inhabit this world, we must expand our concept of dimension to encompass chance as well as certainty.

When dealing with a dissipative dynamical system we may begin with a Euclidean phase space of initial conditions of large or even infinite dimension; after some time passes, the transients relax, some modes may damp out, and the point in phase space that describes the state of the system approaches an attractor. The number of degrees of freedom is thereby reduced. Dimension provides the natural vehicle to make the notion of "degrees of freedom" more precise.

Following A Square [1], we will distinguish three distinct intuitive notions of dimension:

(1) Direction: Euclidean 3-space is naturally described by 3-tuples, and is not homeomorphic to Euclidean 2-space. Following Poincare, Brouwer, Menger, and Urysohn, this notion naturally leads to the modern definition of topological dimension [2].

(2) Capacity (the more and the less): The volume of a cube varies as the third power of its side, whereas the "volume" of a square varies as the second. Following Hausdorff and Besicovitch, we are led to the fractal dimension, beautifully described in the book by Mandelbrot [3].

(3) Measurement: Here we come unstuck from pure geometry. Measurement leads to probability, and probability leads to entropy, or, to take the positive aspect, information. The rate at which this information scales as the length of resolution varies defines the information dimension. This little known quantity was originally defined by Balatoni and Renyi [4]. Exploring its properties and arguing for the utility of this concept are the central goals of this paper.

For simple, predictable attractors such as fixed points, limit cycles, or 2-tori, the separate notions of dimension converge; by any reasonable definition these attractors are of dimension 0, 1, or 2 respectively. Chaotic (strange) attractors, however, pose a more difficult problem.


At any fixed level of precision, most information about initial conditions is lost in a finite time τ. For times greater than τ, knowledge of the future is limited to the information contained in the probability distribution of points on the attractor [6]. By assigning a dimension to a probability distribution, the information dimension provides a probabilistic notion of dimension. As we shall see, chaotic attractors can have a self similar structure in their probability distributions that causes their information dimension to differ from their fractal dimension. Just as Mandelbrot [3] calls a set a fractal if the fractal dimension exceeds the topological dimension, we will call a probability distribution a fractal measure if the fractal dimension exceeds the information dimension.

The direct physical relevance of the information dimension is in measurement. Knowledge of the information dimension of an attractor allows an observer to estimate the information gained when a measurement is made at a given level of precision. This is to be contrasted with the new information gained in a series of measurements that are not isolated. In this case there are typically causal relationships between the measured values, so that the average information is less than that gained in an isolated measurement. The metric entropy is by definition the upper bound on the information acquisition rate. The metric entropy also estimates the rate at which the accuracy of a prediction of the future decays as the time for prediction increases. We will take positive metric entropy as the definition of chaos. Information dimension and metric entropy are related concepts. Information dimension deals with isolated measurements, and metric entropy with sequences of measurements, so it is natural to discuss them together.

This paper is a synthesis of old and new results. By presenting new results integrated into the background of previous results, I hope that the context of the new results will be clear, and also that logical gaps in previous results will be filled so that they form a coherent whole. Most of the new results are contained in the first section, which develops the concept of information dimension as applied to chaotic attractors. New results include the application of information dimension to dynamical systems, invention of "Cantor's density", computations on the Henon map, calculations of the information dimension of an asymmetric Cantor set, and computations of the information dimension of the asymptotic probability distribution of the logistic equation. The second section is primarily a review of the metric entropy, exceptions being the development of an algorithm for information dimension and metric entropy, and the expression of the metric entropy in terms of the information dimension.

Part I: Isolated Measurements

Background: Partitions, Measures, and Measurement

To begin the discussion, consider a measuring instrument with a uniform minimum scale of resolution ε. For a ruler, for example, ε is the distance between adjacent graduations. If a measuring instrument is assigned to each of the N real variables of a dynamical system, the graduations of these instruments induce a partition of the phase space. (A partition β = {B_i} of a set S is a collection of nonempty, nonintersecting measurable sets B_i that cover S; see Figure 1.) Thus, there is an element of the partition corresponding to every possible outcome of a measurement.

Note: A measuring instrument will typically induce a partition only on a region of the phase space, which we will assume to contain the attractors of interest. Also, although measurements of classical quantities can be made continuously, they are limited by observational noise, i.e. random perturbations that occur as a measurement is being made. Although continuous, these observations are inevitably recorded as rational numbers with a finite number of digits. In the following discussion assume that the number of digits recorded is commensurate with the quality of an instrument, so that the induced partition forms a good model for the measurement process.

In the limit of perfect resolution the values of the N variables of the dynamical system at any given time may be represented by a point in the phase space. We will refer to this hypothetical point as the state of the system. Throughout this paper, assume that the motion in phase space is bounded, and that a sufficient time has passed without a perturbation of the system so that the state is close to an attractor. This means that the probability to find the state in an element of the partition not containing part of the attractor is negligibly small.

Fig. 1. A schematic drawing of a uniform two dimensional partition. ε is the minimum scale of resolution, and B_i is an element of the partition, corresponding to the outcome of a measurement.

By making repeated measurements at random time intervals, a histogram can be made to estimate the frequency of occurrence P_i of the i-th element of the partition. The set of probabilities {P_i} is called the coarse grained asymptotic probability distribution. Take the limit as the scale of resolution goes to zero, and the number of samples goes to infinity. Assume that for any fixed set C on the attractor, the sum of P_i over elements of the partition covering C approaches a number μ(C). This defines a measure μ on the attractor, with a corresponding probability density P(x):

μ(C) = ∫_C P(x) dx .   (1)

Thus, for any partition β = {B_i}, P_i = μ(B_i). For convenience, assume that within the basin of attraction, almost every initial condition produces the same measure μ. (Numerical experiments indicate that in many cases these are good assumptions.) Let the phase space flow φ be defined by x(t) = φ^t(x(0)). Under the flow an initial probability density evolves according to the Frobenius-Perron equation; for a map F,

P_{j+1}(x) = Σ_{y ∈ F^{-1}(x)} P_j(y)/|det DF(y)| .   (4)

(For a flow the sum in (4) is unnecessary, but for discrete time mappings the inverse may not be unique.)

Information Dimension

The amount of information gained in making a measurement depends on the a priori knowledge of the observer making the measurement, i.e. information depends on its context [7]. The context that we shall concentrate on here is that of an observer who has a knowledge of the equations of motion, and all the information that can (in principle) be extracted from them. This observer is therefore capable of computing a coarse grained asymptotic probability distribution P_i, and can obtain a good approximate knowledge of the measure μ.

Suppose that this observer makes an isolated measurement of the state of the system. "Isolated measurement" means that no measurements have been made recently, so that the observer has no information about the state of the system beyond that contained in the asymptotic measure μ.

The following definitions allow us to define the information dimension D_I in an unambiguous manner:

The diameter d(C) of a set C is d(C) = sup_{x,y} d(x, y), where d(x, y) is the distance between any two points x and y in C. The diameter d of a partition β = {B_i} is d(β) = sup_{B_i} d(B_i). A partition of diameter ε is labeled β(ε). The information of β with respect to a measure μ is

I(β) = Σ_i μ(B_i) log (1/μ(B_i)) .   (7)

Let M be a manifold with metric d, and let x be a random variable with measure μ. Define I(ε) as the minimum information possible for a partition of diameter ε, i.e. I(ε) = inf_{β(ε)} I(β(ε)). The information dimension of (M, d, μ) is

D_I = lim_{ε→0} {I(ε)/|log ε|} .   (8)

The information dimension of an attractor is computed by imposing a metric on the dynamical system and using the invariant measure defined by (1). The dependence on the metric is clear from the presence of ε in the definition; nevertheless, we conjecture that the information dimension remains invariant under smooth coordinate transformations, providing that the measure induced by the coordinate transformation is used to compute the information dimension of the transformed attractor.

The information dimension was defined by Balatoni and Renyi [4] in 1956. They refer to it simply as "the dimension of a probability distribution".

Once the information dimension of an attractor is known, a rough estimate can be made of the quantity of information gained in making a measurement, without taking the trouble to accumulate or calculate an asymptotic probability distribution. The accuracy of an experimental instrument is usually given in terms of the signal to noise ratio S. Taking the simplest case in which the extent L of the attractor is roughly the same in every direction, and adjusting the units of measurement so that L = 1, the number of bits of experimental resolution is log S = |log ε| (logs to base 2). From (6), the new information gained in a measurement is of the order of D_I log S:

I ≈ D_I log S .   (9)

Note: The definition of information dimension, (6) or (8), takes advantage of the fact that, since the limit is taken as ε → 0, the partition need not cover the attractor efficiently at finite levels of resolution. The definition of information dimension depends on a ratio of I(ε) to |log ε|, and is independent of the units used to measure ε. Thus, there is no loss of generality in taking L = 1.

A clear distinction should be made between observational noise and external noise. Observational noise, which determines the signal to noise ratio S, occurs only during the measurement process and, at least in the classical limit, does not affect the operation of the dynamical system. External noise is caused by random perturbations that do affect the dynamical system. If the level of external noise exceeds the level of observational noise, then at fine resolution the observer will no longer be measuring properties of the dynamical system; the extra information gained will be information about the perturbations, and will not scale according to D_I. As Shaw has pointed out [6], fractal structure is effectively truncated by the presence of external noise.

Relation to Fractal Dimension

This definition of information dimension is reminiscent of the notion of capacity, or fractal dimension [3], defined for an attractor as follows: Let n(ε) be the minimum number of balls of diameter ε needed to cover an attractor. The fractal dimension D_F is:

D_F = lim_{ε→0} {log n(ε)/|log ε|} .   (10)

Note: This does not depend on a measure. Nonetheless, the similarity is clear: If all of the n(ε) elements of a partition are equally probable, I(ε) = log n(ε). Furthermore, given a cover of balls that minimizes n(ε), a partition covering the attractor of diameter ≤ ε can be formed by systematically removing the regions where the balls intersect. Log n(ε) ≥ I(ε), which implies that D_F ≥ D_I. The fractal dimension is an upper bound on the information dimension.

The fractal dimension provides a means of estimating the information gained in a measurement when the asymptotic probability distribution is assumed to be uniform, i.e. using Lebesgue measure. To an observer who knows the number of cells needed to cover the attractor, but does not know their relative probability, the amount of new information gained in a measurement is I = log n(ε) ≈ D_F log S.

Note: The fractal dimension is a property of a set, rather than a measure, so properly speaking it always refers to the support of the measure, which roughly speaking is the set of points x that have P(x) ≠ 0, where P(x) is an associated probability density. More precisely, the support of a measure μ is the set of points x such that every open set C containing x has μ(C) > 0.
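The scaling I(ε) ≈ D_I |log ε| translates directly into a numerical recipe: coarse grain a long sample of points on a uniform grid of boxes of side ε, estimate the box probabilities from visitation frequencies, and fit the slope of I(ε) against |log ε|. The sketch below is not from the paper; the function names and the sanity-check data (points on a circle, for which D_I = 1) are illustrative assumptions, and a uniform grid is used in place of the infimum over partitions.

```python
import numpy as np

def coarse_grained_information(points, eps):
    """Shannon information (bits) of the box-occupation frequencies for a
    uniform grid of boxes of side eps, i.e. Eq. (7) with P_i = mu(B_i)."""
    boxes = np.floor(points / eps).astype(np.int64)   # box index of each point
    _, counts = np.unique(boxes, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_dimension(points, eps_list):
    """Least-squares slope of I(eps) versus |log2 eps|, following Eq. (8)."""
    info = [coarse_grained_information(points, e) for e in eps_list]
    slope = np.polyfit(-np.log2(eps_list), info, 1)[0]
    return slope, info

if __name__ == "__main__":
    # Sanity check on a set with a known answer: points spread uniformly
    # on a circle form a smooth curve, so D_I should come out near 1.
    theta = 2 * np.pi * np.random.rand(200_000)
    pts = np.column_stack([np.cos(theta), np.sin(theta)])
    eps_list = np.array([2.0**-k for k in range(3, 9)])
    d_i, info = information_dimension(pts, eps_list)
    for e, i in zip(eps_list, info):
        print(f"eps = {e:8.5f}   I(eps) = {i:6.3f} bits")
    print(f"estimated D_I ~ {d_i:.3f}   (expected 1 for a smooth curve)")
```

With a finite sample, I(ε) is biased downward once the occupied boxes hold only a few points each, so in practice the fit should be restricted to scales coarse enough that each box is well populated.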



Fig. 2. "Cantor's density" has an information dimension Fig. 3. The binary Cantor density. This is similar to Fig. 2, Di = log 2/log 3, but the fractal dimension of its support but the interval is partitioned into 2n rather than 371 pieces. is 1 (see Eq. (12)). (a) Ql (x) = 3p0 ^ 0.3406 on (0,1/3) and (2/3,1), and Qi (x) = 3pm 2.3187 on (1/3,2/3). Using a uniform three piece partition, the information 7(1/3) is one bit. (b) To form Q2, the probabilities are re- distributed among 9 pieces (see Eq. (7), i = 2). Using the 32 element partition, 1 = 2 bits, (c) i = 3, 1 = 3 bits, (d) i = 6. Because of the finite linewidth of the plotter, this is indistinguishable from the limiting case except for the vertical scale (in the limit the spikes become infinite). 1309 ,T.D . Farmer • Information Dimension and the Probabilistic Structure of Chaos

Some Examples

(1) Continuous Density

Renyi has proven [8] that a D dimensional random variable with a continuous probability density has an information dimension of D.

(2) Cantor's Density

Divide the unit interval into thirds. Let the probability of each of the two outer pieces be p_0, and the probability of the piece in the middle be p_m. The first approximation to "Cantor's density" is Q_1(x) = (3p_0, 3p_m, 3p_0). (The factor of 3 appears to make ∫_0^1 Q_1(x) dx = 1, which also implies that p_0 = (1 − p_m)/2.) Form Q_2(x) by dividing each piece again into thirds, and redistributing the probability within each of these nine pieces so that the ratios within each third are the same as those of Q_1 (see Figure 2). Alternatively, Q_i can be defined as follows: Round the value of x down to the nearest i digit base 3 number, x ≈ .x_1 x_2 … x_i; then Q_i(x) = 3^i p_{x_1} p_{x_2} ⋯ p_{x_i}, where p_{x_j} = p_0 if the digit x_j is 0 or 2, and p_{x_j} = p_m if it is 1.

With a uniform partition into 3^i pieces, each refinement adds the same amount of information, 2 p_0 log(1/p_0) + p_m log(1/p_m), so the information dimension is this quantity divided by log 3. For the values of p_0 and p_m used in Fig. 2 each refinement adds one bit, and D_I = log 2/log 3, even though the support of the limiting density is the entire interval, whose fractal dimension is 1. The binary Cantor density of Fig. 3 is constructed in the same way, except that the interval is partitioned into 2^n rather than 3^n pieces and the probability of each piece is split in the ratio p : 1 − p; its information dimension is the binary entropy function

H(p) = − p log p − (1 − p) log (1 − p) .   (14)

(3) An Asymmetric, Dissipative Baker's Transformation: Yorke's Map

Although the Cantor density presented in the previous section may seem bizarre, and at first glance unlikely to be physically relevant, these Cantor-like probability densities are typical of chaotic attractors of dynamical systems. The following example of a dissipative Baker's transformation was introduced by Yorke [9] in order to illustrate this point:

Define a map F(x, y), Eq. (15), on the unit square 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, with a parameter 0 < q < 1 and p = 1 − q, as shown in Figure 4. The map cuts the square into two pieces, compresses the square horizontally by a factor of 3, and stretches each piece vertically until it has a height 1 and a width 1/3. Then it slices the two pieces apart along their boundary, flips both of them about their vertical midlines, and places them as shown in Figure 4. Acting on an initially uniform probability density, one iteration leaves two vertical strips with nonzero probability; the probability of the left piece is p, and that of the right piece is q. When the mapping is repeated, four thin strips are left with nonzero probability; from left to right, their values are pq, p^2, q^2, and pq, as shown in Figure 4.

When this process is continued indefinitely, the resulting attractor is the Cartesian product of the classic Cantor set and the interval (0, 1), and has a fractal dimension of 1 + log 2/log 3. The invariant measure on this attractor is extremely nonuniform, however, as can be imagined by extrapolating Figure 4. (If the "holes" are removed, and the order of the probabilities appropriately reshuffled, the probability density projected onto the x axis is the binary Cantor density of Figure 3.) The information dimension can be computed by dividing the square into 3^{2n} uniform pieces (cells of side 3^{-n}) for n = 1, 2, ... . For any value of n, the ratio I(ε)/|log ε| = I(3^{-n})/log 3^n is 1 + (p log(1/p) + q log(1/q))/log 3.

Remembering that p = 1 − q, the information dimension D_I can be written as

D_I = 1 + H(q)/log 3 ,   (16)

where H(q) is the binary entropy function (see (14)).

The essential feature that causes the difference between the information dimension and fractal dimension of this attractor is that the Jacobian determinant is not a constant. At a coarse level of resolution, each iteration of the mapping results in different probabilities for the "leaves" of the fractal set. Successive iterations move these leaves closer together, so that the resulting probability density has self similar structure on all scales. Since vector fields with constant Jacobians form a set of measure zero in the space of all possible vector fields, chaotic attractors with self similar structure typically have fractal measures. (Self similar structure is not a necessary condition, however; there may be chaotic attractors that do not have an exact self similar structure that nonetheless have fractal measures.)

Fig. 4. (a) A geometric illustration of Yorke's map. (b) Successive iterations of the map act on an initially uniform probability density as shown. The attractor is the Cartesian product of the classic Cantor set and the interval. The information dimension is given by (16).
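Equation (16) is easy to check numerically from the x-projection of the measure: after n iterations the 2^n occupied strips of width 3^{-n} carry probabilities p^k q^{n−k}, with C(n, k) strips sharing each value. The short sketch below does this check; the particular parameter value q = 0.3 is only an illustrative assumption.

```python
import math

def binary_entropy(q):
    """H(q) = -q log2 q - (1 - q) log2 (1 - q), Eq. (14)."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def yorke_x_information(n, q):
    """Information (bits) of the x-marginal coarse grained at scale 3**-n.
    The 2**n occupied strips have probabilities p**k * q**(n-k), and
    binom(n, k) strips share each value."""
    p = 1 - q
    total = 0.0
    for k in range(n + 1):
        prob = p**k * q**(n - k)
        total += math.comb(n, k) * prob * (-math.log2(prob))
    return total

q = 0.3                                   # illustrative parameter value
for n in (5, 10, 20):
    # The uniform vertical direction contributes exactly 1 to D_I,
    # the x-marginal contributes I_n / (n log2 3), giving Eq. (16).
    d_i = 1 + yorke_x_information(n, q) / (n * math.log2(3))
    print(f"n = {n:2d}   D_I estimate = {d_i:.4f}")

print("Eq. (16):   1 + H(q)/log 3 =", 1 + binary_entropy(q) / math.log2(3))
print("fractal:    1 + log 2/log 3 =", 1 + 1 / math.log2(3))
```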

(4) Henon's Map

A nonconstant Jacobian determinant is not, however, a necessary condition for a chaotic attractor to have a fractal measure. For example, consider the Henon [10] mapping,

x_{i+1} = 1 + y_i − a x_i^2 ,   y_{i+1} = b x_i ,   (17)

with a = 1.4 and b = 0.3. A picture of the attractor is shown in Fig. 5, together with the two blowups as shown by Henon in [10]. The number of observable leaves of this fractal set depends on the quality of this reproduction and the keenness of your eyesight. Figure 5(d) shows a probability distribution made by resolving the blowup of Fig. 5(c) into 6 distinct leaves, and binning the total number of points that visit each leaf into a histogram. As can be seen in the figure, the probabilities of the discernible leaves are quite different.

This is true even though Henon's map is invertible and has a constant Jacobian determinant. Taken together these two properties imply that the probability density on the attractor is uniform. What is not uniform is the spacing of the leaves of the fractal set that make up the attractor. To see this, begin with a set C, and assign it a constant probability density. Iterate the map several times (see Figure 6). The set will be stretched and folded until it begins to resemble the attractor, but according to (4), the probability density on the iterated set F^i(C) remains uniform.

Fig. 5. The Henon attractor, see Equation (17). Figures 5(b) and 5(c) show successive blowups of the regions inside the box of the previous figure. Resolving the fractal structure of Fig. 5(c) into six "leaves", the relative probabilities of these leaves are plotted in Figure 5(d). The width of each bar is roughly the width of the corresponding leaf. This unequal distribution of probabilities persists on all scales, making the Henon attractor a "measure fractal".
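A histogram like the one in Fig. 5(d) can be approximated by iterating (17), keeping only the points that fall in a small window, and binning their x coordinates; distinct bands then acquire visibly different weights. In the sketch below the window is centred on an arbitrary orbit point and the bin count is an arbitrary choice — Henon's published blowups use a particular fixed region, so this is illustrative only.

```python
import numpy as np

def henon_orbit(n, a=1.4, b=0.3, discard=1000):
    """Iterate the Henon map (17) and return the orbit after transients."""
    x, y = 0.1, 0.0
    pts = np.empty((n, 2))
    for i in range(n + discard):
        x, y = 1.0 + y - a * x * x, b * x
        if i >= discard:
            pts[i - discard] = (x, y)
    return pts

orbit = henon_orbit(2_000_000)

# Zoom on a small window centred on one point of the orbit (illustrative).
cx, cy = orbit[1234]
half = 0.01
sel = orbit[(np.abs(orbit[:, 0] - cx) < half) &
            (np.abs(orbit[:, 1] - cy) < half)]

# Bin the x coordinates of the selected points; different bands of the
# fractal set acquire visibly different probabilities (cf. Fig. 5(d)).
counts, edges = np.histogram(sel[:, 0], bins=12, range=(cx - half, cx + half))
probs = counts / max(counts.sum(), 1)
for lo, hi, p in zip(edges[:-1], edges[1:], probs):
    print(f"[{lo:.4f}, {hi:.4f})   P = {p:.3f}")
```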

Fig. 6. Successive iterates by the Henon map of the line segment shown in Figure 6(a). The spacing between the folds that form after several iterations is not constant; when evaluated at a finite level of resolution, unequal numbers of them appear to merge, yielding the probability structure seen in Figure 5(d).

The spacing between the folds of F^i(C) is not equal, however, and at a fixed level of resolution the visible leaves may contain different numbers of folds of F^i(C). The total probability of each resolvable leaf therefore appears different. The fact that this structure persists on such a microscopic level suggests that the Henon attractor has a fractal measure.

(5) An Asymmetric Cantor Set

If the intersection is taken of the Henon attractor and a curve cutting transversally through it, the result is an asymmetric Cantor set. Construction of an example gives an understanding of the manner in which asymmetries in the spacing of the elements of a set can create a fractal measure. Begin as usual with the interval [0, 1], and delete the segment from 1/2 to 3/4, i.e. delete the third fourth, as shown in Figure 7(a). Then delete the third fourth of each remaining piece, and so on. The limit is an asymmetric Cantor set, of fractal dimension D_F = log 3/log 4.

Assign a uniform probability density to the points of this set. To compute the information dimension, partition the interval (0, 1) successively into 2^n equal bins, as shown in Figure 7(b). Each time the resolution is cut in half, the asymmetry causes neighboring bins to acquire different probabilities. This inequality persists on all scales. The information dimension is computed in Appendix I, and shown to be

D_I = 3 H(1/3)/(4 log 2) ≈ 0.69 .   (18)

This example illustrates that the information dimension depends on the metric properties of a set (the spatial distribution of the elements of the set) as well as the distribution of probability within the set. Asymmetries in either of these properties can cause a chaotic attractor to have a fractal measure.

Fig. 7. (a) Successive approximations to an asymmetric Cantor set, formed by deleting the third fourth of each continuous piece. (b) Assuming the initial probability density of points on the interval is uniform, the coarse grained probability distribution is evaluated at successively finer scales of resolution ε = 2^{-i}, i = 1, 2, ..., 5.

(6) The Logistic Equation

One of the properties that is necessary for a measure to be a fractal measure is structure on all scales. As the following example shows, this is not a sufficient condition; fractal measures must have "geometrically multiplying" small scale structure.

binning the result into a histogram. The result of the latter procedure at r = 3.7 is shown in Figure 8. Although this probability density has structure on all scales of resolution, a direct calculation with a uniform partition shows that the entropy scales a) linearly with | log £ | (see Figure 9). An investiga- tion into the nature of P{x) makes this result llog P(x) reasonable: Using an initial condition Po (a?) = 1, suppose the Frobenius-Perron equation (4) is used to construct P{x). Providing the iterates of the critical point are not asymptotically periodic, at every iteration the fact that F (x) — 0 at the critical point x = zo causes a new singularity to be added to Pj(x). The resulting asymptotic density P(x) is not conti- nuous, and contains an infinite number of singu- larities at Zi = Fi(zo). Nevertheless, since each singularity falls off on one side roughly as x~l<2, the singularities are integrable (see Appendix II). Thus a coarse grained probability distribution {Pi} computed in this manner will converge in root- mean-square to the probability density P(x); the resulting measure is absolutely continuous with respect to Lebesgue measure. When ordered accord- ing to iteration from the critical point, the ampli- tude of the singularities decreases roughly exponen- tially at a rate given by the Lyapunov exponent,

log, P(X)

b) Fig. 7. (a) Successive approximations to an asymmetric Cantor set, formed by deleting the third fourth of each continuous piece, (b) Assuming the initial probability den- sity of points on the interval is uniform, the coarse grained probability distribution is evaluated at successively finer scales of resolution e = 2_i, i = 1, 2,..., 5.

Consider a mapping of the interval with a single quadratic maximum, for example, the logistic equation

xi+r = rXi{\ — xi), (19) where 0

I (bits) happens to contain one of the early iterates of the critical point, it looks relatively flat. When the segment is split in half to go to a finer level of reso- lution, the probability of the segment will divide evenly between the two new pieces. Asymptotically (when viewed at fine resolution), this density be- haves like a uniform density almost everywhere. At a coarse level of resolution, the information is less than that of a uniform density, but at fine resolution the information increases linearly, and the information dimension is one, as shown in Figure 9. l°g2n This example illustrates the manner in which Fig. 9. The information 7 as a function of the logarithm of small scale structure must occur to make a fractal the scale of resolution. The interval is evenly divided into n measure. For the logistic equation, the size of the equal pieces, and P (x) for the distribution shown in Fig. 8 is averaged over each piece. The dots are measured values visible spikes decreases exponentially, but the of 1(e), where e — l/n. The line is a plot of log», shown number of spikes in a given logarithmic size range for reference, and is the information contained in a uniform is fixed, independent of scale. Compare this with P( x). Fig. 2(d) or Figure 2(e). The size of the spikes decreases exponentially, but the number of spikes as demonstrated in Appendix II. The following visible at finer scales also increases exponentially. approximate form may add insight into the be- When a blowup is made of a small piece, structure havior of P(x) (see Figure 10). on the scale of that piece is visible almost every- where. (20) t=i \zi\' Part II: Sequential Measurements where zi = [Ft(x)]/x=:Zl, and X{ = (x — zt) multiplied by the sign of zi. 6(x) is the step function, i.e. Metric Entropy as Information Acquisition Rate 0 (z) = 0 for a;< 0, and 6 (x) = 1 for x ^0. The discussion so far has concerned the amount In the limit of large i, the size of a singularity of information gained by an observer in making relative to its background goes to zero. If a blowup a single, isolated measurement, i.e., the information is made of a small segment of P(x), unless it gained in taking a "snapshot" of a dynamical system. We might alternatively ask how much new information is obtained per unit time by an ob- log2 P(x) server who is, say, watching a movie of a dynamical system. In other words, what is the information acquisition rate of an experimenter who makes a series of measurements to monitor the behavior of a dynamical system ? For a predictable dynam- ical system eventually new measurements provide no further information. But, if the dynamical sys- tem is chaotic, new measurements are constantly required to update the knowledge of the observer. The metric, or Kolmogorov-Sinai entropy pro- vides an upper bound on the information acquisition rate. Although in a context unrelated to dynamical systems, the metric entropy wras originally defined by Shannon [11]. This quantity was later applied to dynamical systems, and shown to be a topological invariant by Kolmogorov [12] and Sinai [13]. To 605,T. D . Farmer • Information Dimension and the Probabilistic Structure of Chaos 1315 simplify the discussion, we will follow the historical and defining course by first defining the metric entropy of a
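The calculation behind Fig. 9 is easy to reproduce: iterate (19) at r = 3.7, histogram the orbit on n = 2^k equal bins, and compare I(ε) with log_2 n. A minimal sketch follows; the orbit length and the range of k are arbitrary choices.

```python
import numpy as np

def logistic_orbit(r=3.7, n=5_000_000, discard=1000, x0=0.4):
    """Iterate the logistic equation (19) and return the orbit."""
    x = x0
    out = np.empty(n)
    for i in range(n + discard):
        x = r * x * (1.0 - x)
        if i >= discard:
            out[i - discard] = x
    return out

orbit = logistic_orbit()
for k in range(2, 15):
    counts, _ = np.histogram(orbit, bins=2**k, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    info = -np.sum(p * np.log2(p))
    # The information grows roughly linearly with |log eps| with unit
    # slope: the information dimension of P(x) is one (cf. Fig. 9).
    print(f"|log2 eps| = {k:2d}   I(eps) = {info:6.3f} bits")
```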

Part II: Sequential Measurements

Metric Entropy as Information Acquisition Rate

The discussion so far has concerned the amount of information gained by an observer in making a single, isolated measurement, i.e., the information gained in taking a "snapshot" of a dynamical system. We might alternatively ask how much new information is obtained per unit time by an observer who is, say, watching a movie of a dynamical system. In other words, what is the information acquisition rate of an experimenter who makes a series of measurements to monitor the behavior of a dynamical system? For a predictable dynamical system eventually new measurements provide no further information. But, if the dynamical system is chaotic, new measurements are constantly required to update the knowledge of the observer.

The metric, or Kolmogorov-Sinai, entropy provides an upper bound on the information acquisition rate. Although in a context unrelated to dynamical systems, the metric entropy was originally defined by Shannon [11]. This quantity was later applied to dynamical systems, and shown to be an invariant, by Kolmogorov [12] and Sinai [13]. To simplify the discussion, we will follow the historical course by first defining the metric entropy of a Markov process, and then discussing it in the context of dynamical systems (see also Refs. [14-18]).

A Markov process may be thought of as an information source that randomly generates one of n symbols at discrete timesteps [19]. The text of this paper, for example, can be considered to constitute a Markov process. These symbols can be thought of as the outcome of sequential measurements. For convenience, assume that the n possible symbols are the integers from 0 to n − 1. If the occurrence of a given symbol s depends on m − 1 preceding symbols, then the Markov process is of m-th order. Assume throughout that the Markov process is ergodic, so that time averages are independent of starting time, and are equal to ensemble averages. For an m-th order process the m most recent symbols can be represented as the m digit base n fraction .s_1 s_2 … s_m, abbreviated S_m. This number defines the state of the source. (This should not be confused with the state of a dynamical system.) The probability that a given state S_m will occur is

P(s_1, s_2, …, s_m) = P(S_m) .

Given that the source is in state S_m, the conditional probability that the (m+1)-th symbol will be s_{m+1} is P(S_m | s_{m+1}), and the amount of new information obtained on learning that this particular transition has occurred is I = −log P(S_m | s_{m+1}). Averaging over all possible transitions from S_m to s_{m+1}, and over all possible states S_m, the average amount of new information gained per symbol, ΔI_m, is

ΔI_m = − Σ_{S_m = 0}^{1−n^{-m}} P(S_m) Σ_{s_{m+1} = 0}^{n−1} P(S_m | s_{m+1}) log P(S_m | s_{m+1}) .

Making use of the facts that P(S_m, s_{m+1}) = P(S_m) P(S_m | s_{m+1}) is the joint probability for S_m and s_{m+1} to occur in succession, and that probability is conserved when the system makes a transition to a new state, i.e.

Σ_{s_{m+1} = 0}^{n−1} P(S_m, s_{m+1}) = P(S_m) ,   (22)

and defining

I_m = − Σ_{S_m = 0}^{1−n^{-m}} P(S_m) log P(S_m) ,   (23)

the average amount of new information gained per symbol emitted by the source can be rewritten

ΔI_m = I_{m+1} − I_m .   (24)

For an ergodic source, ΔI_m should be independent of m when m is greater than the order of the source. If the source is not ergodic, however, ΔI_m may oscillate for large m. This problem may be avoided by taking the limit as m goes to infinity, and defining the information rate ΔI/Δt as

ΔI/Δt = lim_{m→∞} {I_m/(m Δt)} ,   (25)

where Δt is the time interval between symbols. Shannon calls this simply the entropy of the source; however, we will call it the information rate. To an observer with a knowledge of the past history of the source, ΔI/Δt is a measure of the unpredictability of the sequences it generates.

Metric Entropy and Refinement of Initial Conditions

A sequence of numbers obtained by making a series of measurements may be thought of as the symbols emitted by a Markov source. A distinct symbol is assigned to each of the n elements of the partition induced by the measuring instrument. A measurement at time t = 0 determines that the state of the dynamical system is located somewhere inside a given element of the partition. A finite time Δt later, another measurement may give a different outcome, indicating that the state is in a different partition element. Thus, a series of measurements generates a sequence of symbols. The information rate per unit time for this symbol sequence may be calculated by taking the limit as m → ∞ of I_m/(m Δt) = ΔI/Δt. Following Sinai [13], the metric entropy can be defined as the maximum information rate when the partition and sampling rate are varied:

h_μ = sup_{β, Δt} lim_{m→∞} {I_m/(m Δt)} .   (26)

To an observer with optimal measuring instruments taking samples at the optimal rate, the metric entropy is the average amount of new information gained per sample.
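Equations (23)-(25) translate directly into a block-entropy estimate from data: compute I_m for words of length m observed in a symbol string and examine ΔI_m = I_{m+1} − I_m. The sketch below is generic; the binary test sequence, generated from the logistic map with the partition placed at the critical point x = 1/2 (a partition discussed later in the text), is only an illustration of the kind of input the estimator expects.

```python
from collections import Counter
import math

def block_information(symbols, m):
    """I_m of Eq. (23): Shannon information (bits) of length-m words."""
    words = Counter(tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1))
    total = sum(words.values())
    return -sum((c / total) * math.log2(c / total) for c in words.values())

def information_rate(symbols, m_max=10):
    """Delta I_m = I_{m+1} - I_m, Eq. (24); its large-m limit estimates the
    information rate per sample for this partition."""
    info = [block_information(symbols, m) for m in range(1, m_max + 2)]
    return [info[m] - info[m - 1] for m in range(1, m_max + 1)]

# Illustrative symbol source: logistic map at r = 3.7, two-element
# partition at the critical point x = 1/2 (symbols 0 and 1).
r, x, symbols = 3.7, 0.4, []
for _ in range(200_000):
    x = r * x * (1.0 - x)
    symbols.append(1 if x > 0.5 else 0)

for m, di in enumerate(information_rate(symbols), start=1):
    print(f"m = {m:2d}   Delta I_m = {di:.4f} bits/symbol")
```

With finite data ΔI_m is reliable only while the number of observed words is much smaller than the sample length, so m should not be pushed too far.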

Fig. 11. A schematic illustration of the manner in which initial conditions are refined by a sequence of measurements.

A sequence of several measurements allows an initial condition to be isolated more precisely than does any one of the measurements taken alone. Suppose, for example, that a sequence of measurements taken at times 0, 1, ..., m shows that the state of the system was successively in the partition elements B_{i_0}, B_{i_1}, ..., B_{i_m}. The sets B_i evolve deterministically under the action of the flow φ. Thus, for example, if at time t = 1 the state is somewhere inside B_{i_1}, at time t = 0 it must have been somewhere inside φ^{-1}(B_{i_1}). When the two measurements at time t = 0 and t = 1 are taken together, we know that the initial condition at t = 0 must have been somewhere inside φ^{-1}(B_{i_1}) ∩ B_{i_0} (see Figure 11). After m measurements, it is located somewhere inside of m such intersections. In this manner a sequence of measurements can be used to refine knowledge of an initial condition.

Thus, the metric entropy may alternatively be thought of as measuring the extra amount of knowledge gained about an initial condition with each new measurement. For a predictable system, the trajectories on the average diverge at a linear or polynomial rate, and asymptotically a new measurement ceases to provide better definition of the initial condition. For chaotic systems, on the average trajectories diverge locally at an exponential rate, and each successive measurement provides new information.

To obtain the average rate of refinement, it is necessary to average over all possible initial conditions. This can be done by examining simultaneously the intersection of all the elements of a partition with their own inverse images. Let the intersection of two partitions α and β be the partition formed by taking all possible intersections of their elements:

α ∧ β = {A_i ∩ B_j}   (27)

for all A_i ∈ α and all B_j ∈ β. The intersection of a partition β with the partition formed out of its inverse images is called a refinement of the partition. If this is done m times the resulting partition, β_m, is called the m-th refinement of β with respect to φ:

β_m = β ∧ φ^{-1}(β) ∧ … ∧ φ^{-m}(β) .

The definition of metric entropy requires a measure (ironically, it does not require a metric). To define the metric entropy of an attractor it is necessary to choose a measure on the attractor. The physically relevant choice is the measure for time averages, defined in (3). Again, we will assume that this is the same for almost every initial condition (with respect to Lebesgue measure) in a given basin. By definition a chaotic attractor has positive metric entropy.

Topological Entropy

The topological entropy is the upper bound on the information acquisition rate for an observer who knows what symbol sequences occur, but does not know their relative probability. In this case, the information contained in a refined partition β_m is log N_m, where N_m is the total number of elements of β_m. For a partition consisting of n symbols (n elements), N_m is the number of distinct symbol sequences that actually occur, in contrast to the n^m symbol sequences that might occur. The topological entropy can be defined as

h_T = sup_{β, Δt} lim_{m→∞} {log N_m/(m Δt)} .   (30)
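For the topological entropy only the number N_m of length-m words that actually occur matters, not their probabilities, so the same kind of observed symbol string can be reused; log_2 N_m / m then estimates h_T (in bits per sample) for the chosen partition. A sketch, again with an illustrative binary sequence from the logistic map:

```python
import math

def distinct_words(symbols, m):
    """N_m: the number of distinct length-m symbol sequences that occur."""
    return len({tuple(symbols[i:i + m]) for i in range(len(symbols) - m + 1)})

# Illustrative symbol source: logistic map at r = 3.7 with a two-element
# partition at x = 1/2 (same construction as the sketch above).
r, x, symbols = 3.7, 0.4, []
for _ in range(500_000):
    x = r * x * (1.0 - x)
    symbols.append(1 if x > 0.5 else 0)

for m in (4, 8, 12, 16):
    n_m = distinct_words(symbols, m)
    # log2(N_m)/m approaches the topological entropy of this partition,
    # an upper bound on the metric-entropy estimate from Delta I_m.
    print(f"m = {m:2d}   N_m = {n_m:6d}   log2(N_m)/m = {math.log2(n_m) / m:.4f}")
```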

This is only one of several equivalent methods of defining the topological entropy. The topological entropy is a topological invariant, and can be defined, for example, as the exponent of the rate of growth of the number of orbits that are distinguishable at finite resolution.

The set of probabilities P(S_m) can itself be mapped onto the unit interval: associate each sequence S_m with the base n fraction .s_1 s_2 … s_m, and assign the interval of width n^{-m} beginning at S_m the uniform density P_m(x) = n^m P(S_m). As m → ∞, define P_∞ = lim_{m→∞} P_m. Although P_∞ may be a very nasty probability density, the normalization condition insures that it induces a well defined measure μ_∞ (see Equation (1)).

Turning this construction around, P(S_m) can be viewed as the result of smearing the limiting distribution P_∞ over each of n^m bins:

P(S_m) = ∫_{i_{S_m}} P_∞(x) dx = μ_∞(i_{S_m})   (32)

for S_m = 0 to 1 − n^{-m}, with i_{S_m} the interval from S_m to S_m + n^{-m}. Studying symbol sequences of length m is equivalent to examining the symbol sequence probability density P_∞ at a scale of resolution of ε = n^{-m}. This mapping of the set of probabilities P(S_m) onto the unit interval allows the metric entropy to be rewritten in terms of the information dimension of P_∞. Since |log ε| = m log n, the metric entropy h_μ can be written:

h_μ = lim_{m→∞} {I_m/(m log n)} · log n = D_I(P_∞) log n ,   (33)

where D_I(P_∞) is the information dimension of P_∞. Log n is the amount of information per symbol if the symbols are of equal probability and do not depend on the previous symbols. D_I(P_∞) measures the nonuniformity of the probability of symbol sequences.

Fig. 12. The binary shift map (see Equation (35)).

(1) The Binary Shift Map

Consider the transformation that shifts the binary expansion of a number x ∈ (0, 1) one digit to the left and discards the digit that passes the decimal point:

.101100111… → .01100111…   (35)

Alternatively, this can be thought of as the one dimensional map shown in Figure 12. If the digit that is shifted past the decimal point on a given iteration is 1, we can say that this Markov source has generated a 1; if it is 0, it has generated a zero. Equivalently, the interval can be partitioned into two equal pieces, and the elements of the partition assigned the symbols 0 and 1. This source emits one bit of information per iteration. The refinements of this partition, found by computing the inverse images of .5, divide the interval into 2, 4, 8, ... equal pieces of equal probability for m = 0, 1, 2, ... . The information contained in the m-th refinement is log 2^m = m. Since all possible symbol sequences are equally likely, the symbol sequence density P_∞ is a constant, and has an information dimension of one. Thus, using any of the formulations that have been discussed here, the information rate is one bit per iteration (see [6] and [21]).

This map with the partition above is a model of a fair coin flip. If the interval is partitioned into two pieces of unequal length p and 1 − p, however, the coin is not fair. In this case the density P_∞ is not uniform; heads is weighted differently than tails, so that in passing from sequences of length m to sequences of length m + 1, the probability in each bin of width 2^{-m} is split unequally between two bins of width 2^{-m-1}. The resulting density P_∞ is the binary Cantor density, shown in Figure 3. The information production rate is the information dimension of P_∞ multiplied by log 2; the information rate is therefore H(p), where H(p) is the binary entropy function (see Eqs. (14) and (15)). The equal partition p = 1/2 gives the maximum value H(1/2) = 1, corresponding to the metric entropy of the map.

Fig. 13. P_∞ for the logistic equation (19) with r = 3.7, taken from [27]. The interval is partitioned into two equal parts, which are assigned the symbols 0 and 1. Symbol sequences S_m = .s_1 s_2 … s_m are generated by iterating the map, and binned to approximate P(S_m). Successive approximations to P_∞ are shown in the figure. The metric entropy is the information dimension of P_∞, and the topological entropy is the fractal dimension (using log 2).

(2) The Logistic Equation Revisited

In a similar manner it is possible to compute the information production rate of the logistic equation, (19). For r = 4, the logistic equation is symmetric and two onto one. The invariant probability density P(x) = 1/(π √(x(1 − x))) is also symmetric. As pointed out by Shaw [6], assuming also that there are no stable periodic orbits, these facts imply that the metric entropy with respect to P(x) is one bit per iteration, since in order to reconstruct a trajectory going backwards in time a binary decision must be made at each iteration. For r ≠ 4, the information rate can be computed numerically by making a partition of the map and iterating many times. A histogram can be made of the relative frequency of occurrence of the symbol sequences, to approximate P_∞, as shown in Figure 13. This figure shows the characteristic behavior of a fractal measure as the scale of resolution is varied; compare to Fig. 7(b), for example. As demonstrated numerically by Crutchfield and Packard [27], the information rate is maximized for a two element partition when the division is made at the critical point x = 1/2.

Equivalently, the symbol sequence probability distribution P(S_m) can be obtained by computing the measure P(x) of points on the interval (Fig. 8) via any convenient method, computing the inverse iterates of the critical point to get a refined partition, and summing P(x) over each of the elements of the refined partition to calculate the probabilities P(S_m) of the corresponding symbol sequences. If the elements of the refined partition β_m are B_m, the probabilities of the symbol sequences S_m are

P(S_m) = ∫_{B_m} P(x) dx = μ(B_m) .   (36)

(3) Yorke's Map Revisited

Let us reconsider Yorke's map (15) in the context of sequential measurements. Partition the square into two pieces by dividing it along the line y = q, as shown in Figure 14. The preimages of this partition are horizontal strips whose widths are distributed according to a binomial distribution, and are shown on the left side of Figure 14. Now, consider a sequence of measurements at successive iterations of the map made at times −m, −m+1, ..., 0, ..., m−1, m. If the map is used to compare all the preimages of this sequence of measurements at the same past time −m, the y value at t = −m can be determined to arbitrary precision as m increases. From the example of Part I, Fig. 4, we know that the measure μ is uniform vertically, so the measure of the preimages is just proportional to their areas. Except for a reshuffling of their order, this distribution of probabilities is exactly the binary Cantor density (Figure 3). Thus the information rate is H(q), the binary entropy function of q (Equation (14)).

Now, consider the images of this partition. These are the uniform vertical bars shown on the right side of Fig. 14, and are exactly the successive approximations to the attractor shown in Figure 4. If the images of each measurement are compared at the same future time m, the horizontal position, and hence the "leaf" of the Cantor set that the state occupies at time m, can be determined to arbitrary precision. Although the areas of the images are the same, their probability is not; the relative probabilities are the same as those plotted on the attractor in Figure 4.

Fig. 14. Iterates of the partition of Yorke's map shown in the upper left corner (refer to Eq. (15) and Figure 4). The preimages of the partition are shown on the left side, and the images on the right side. When a combination of images and preimages is used, an initial condition can be determined arbitrarily well.

The agreement between these probability distributions is not a coincidence. Since Yorke's map is invertible, either preimages or images of the partition can be used to make refinements. Conservation of probability requires that the measure of each element of the partition remain the same under the action of the map. If both images and preimages are used to evaluate all the measurements at the same time t = 0, both the horizontal and the vertical position can be determined arbitrarily well as m grows large. A partition such as this one, with the property that every point on the attractor can be represented arbitrarily well, is called a generator. From Sinai's theorem [21], the information rate computed using a generator is equal to the metric entropy.

This example illustrates the ambiguities that are possible in constructing P_∞. On one hand, using the two symbol partition shown on the left of Fig. 14, P_∞ is exactly the binary Cantor density. If, on the other hand, the three symbol partition shown on the right of Fig. 14 is used, then P_∞ is just the same as the density P(x) on the attractor projected onto the x axis, as shown in Figure 4. Although these two versions of P_∞ have different information dimensions, the rule given by Eq. (33) gives the same metric entropy.

Predicting the Future

Taken together, the metric entropy and information dimension can be used to estimate the length of time that a physical prediction remains valid. The information dimension allows an estimate to be made of the information contained in an initial measurement, and the metric entropy estimates the rate at which this information decays.

As we have already seen, the metric entropy is an upper bound on the information gained in each measurement of a series of measurements. But, if each measurement is made with the same precision, the information gained must equal the information that would have been lost had the measurement not been made. Thus, the metric entropy also quantifies the initial rate at which knowledge of the state of the system is lost after a measurement.

Now, to make the notion of "information contained in a prediction" more precise: Let β = {B_i} be a partition. For convenience, refer to B_i, the elements of the partition, as simply outcomes. Let

p_ij(t) = probability that a measurement at time t has outcome B_j, if a measurement at time 0 had outcome B_i.   (37)

p_ij(0) must satisfy

p_ij(0) = 1 if i = j;   p_ij(0) = 0 if i ≠ j.   (38)

If B_i is known, the information gained on learning of outcome B_j is −log p_ij(t). With no initial information, the information gained from the measurement is determined solely by the asymptotic measure μ, and is −log μ(B_j). The extra information gained using a prediction from the initial data is the difference of the two, or

I_rel = log {p_ij(t)/μ(B_j)} .   (39)

This must be averaged over all possible measurements B_j at time t, and all possible initial measurements B_i. The measurements B_j are weighted by their probability of occurrence p_ij(t), and the initial measurements are weighted by μ(B_i). This gives

I(t) = Σ_i μ(B_i) Σ_j p_ij(t) log {p_ij(t)/μ(B_j)} .   (40)

I(t) is the average amount of information contained in a prediction made a time t into the future. Using the invariant measure μ, p_ij(t) can be expressed in more geometrical terms: At time t the initial measurement B_i evolves into φ^t(B_i); the probability of outcome B_j at time t is therefore μ(φ^t B_i ∩ B_j) (see Figure 15). Dividing by μ(B_i) to make p_ij(0) = 1, p_ij(t) can be written in the alternate form

p_ij(t) = μ(φ^t B_i ∩ B_j)/μ(B_i) .   (41)

Combining (40) and (41) we get

I(t) = Σ_{i,j} μ(φ^t B_i ∩ B_j) log {μ(φ^t B_i ∩ B_j)/(μ(B_i) μ(B_j))} .   (42)

Fig. 15. (a) Assume that an initial measurement finds that the state is contained in partition element B_i. (b) Given outcome B_i at time 0, a time t later the state must be somewhere inside φ^t B_i; the probability that a measurement at time t will have outcome B_j is μ(B_j ∩ φ^t B_i), where μ is the invariant measure (which gives the proper a priori weight to each outcome).

For an attractor that is mixing,

lim_{t→∞} μ(φ^t B ∩ A) = μ(A) μ(B)   (43)

for any sets A and B on the attractor. Intuitively, this just means that B is "smeared" throughout the attractor by the flow, so that the probability of finding a point originating in B inside of A is just the original probability of B, weighted by the probability of A. For an attractor that is mixing, substituting (43) into (42) implies that I(∞) = 0.

On a chaotic attractor, adjacent trajectories on the average diverge exponentially, and I(t) initially decreases at a linear rate, as shown in Figure 16. Goldstein [29] has shown that this initial rate is equal to the metric entropy for flows with a continuous invariant measure. The metric entropy and information dimension can therefore be used to estimate I(t) for short times. For initial measurements made with a signal to noise ratio S,

I(t) = I(0) − h_μ t = D_I log S − h_μ t   (44)

(see Equation (9)). The initial data becomes useless after a characteristic time τ = (D_I/h_μ) log S.

Fig. 16. The typical behavior of I(t), the information contained in a prediction, for a chaotic attractor. Initially I(t) decreases at a linear rate given by the metric entropy.
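Equation (44) gives a back-of-the-envelope prediction horizon. For example, for an attractor with D_I = 2.5, a metric entropy of 0.5 bits per second, and measurements accurate to 1% (S = 100, about 6.6 bits), the initial information is about 16.6 bits and is exhausted after roughly 33 seconds. The sketch below simply evaluates (44) and τ = (D_I/h_μ) log S for assumed, purely illustrative values of D_I, h_μ, and S.

```python
import math

def prediction_information(t, d_i, h_mu, snr):
    """I(t) of Eq. (44): bits of information remaining in a prediction a
    time t after a measurement with signal-to-noise ratio snr."""
    return max(d_i * math.log2(snr) - h_mu * t, 0.0)

def prediction_horizon(d_i, h_mu, snr):
    """tau = (D_I / h_mu) * log2(S), after which the initial data is useless."""
    return d_i * math.log2(snr) / h_mu

# Illustrative numbers only: D_I = 2.5, h_mu = 0.5 bits/s, S = 100.
d_i, h_mu, snr = 2.5, 0.5, 100.0
print("I(0)  =", prediction_information(0.0, d_i, h_mu, snr), "bits")
print("I(10) =", prediction_information(10.0, d_i, h_mu, snr), "bits")
print("tau   =", prediction_horizon(d_i, h_mu, snr), "seconds")
```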

Experimental Applications: Infinite Dimensions, Limited Knowledge

Thus far we have assumed a dynamical system of finite dimension N. In contrast, many important physical systems, for example fluid flow, are described by partial differential equations, which have an infinite dimensional phase space. In this case, to determine a completely arbitrary initial condition, an infinite number of real variables must be measured, corresponding to the infinite degrees of freedom possible in choosing an initial function. When the state of the system is close to an attractor, however, the number of degrees of freedom may be reduced. Throughout the following discussion we will assume that the attractors considered can be contained in a smooth manifold of dimension m_A.

When this assumption holds, the number of measurements required to determine an initial condition at a given level of precision is at most M = 2 m_A + 1. ("Number of measurements" can refer to simultaneous measurements of M real variables, or, as discussed below, a single variable can be measured M successive times.) Assume that the attractor can be embedded in M dimensions. If M is substituted for the phase space dimension N in the previous discussions, all of the quantities that refer to attractors, such as information dimension and metric entropy, are well defined for infinite dimensional systems (see Refs. [31] and [32]).

In addressing the experimental determination of the dimension of a strange attractor, Takens [32] has expressed the fractal dimension and topological entropy in a manner that illustrates their conjugate relationship. As shown below, the information dimension and the metric entropy can be expressed in an analogous form. This form provides an algorithm for their experimental determination, based on reconstructing qualitative properties of the attractor from a projection onto only a few dimensions. As demonstrated in Refs. [33], [34] and [35], a valid representation of an attractor can be constructed from a projection of a trajectory onto a single dimension. For example, if v_x(r_0, t) is a single component of the velocity of a fluid at a fixed point r_0 in space, a set of phase variables (v_1, v_2, ..., v_K) can be constructed:

v_1(t) = v_x(r_0, t),
v_2(t) = v_x(r_0, t + τ_1),
v_3(t) = v_x(r_0, t + τ_2),
…
v_K(t) = v_x(r_0, t + τ_{K−1}),   (45)

where τ_1, ..., τ_{K−1} are suitably chosen delay times (see Ref. [35]). We will refer to coordinates constructed by taking delayed values of a projection onto a single dimension as delay coordinates. If M of these coordinates are taken, then, as shown in [32], any given state can be represented in terms of delay coordinates.

The representation of an attractor in terms of delay coordinates can be viewed as an attractor of a dynamical system that is related to other representations by a coordinate transformation. Providing this coordinate transformation is smooth, the attractors should have the same metric entropy. This relies on the invariance of the metric entropy; if the measure, the partition, and the flow are all transformed, the metric entropy remains the same [21]. The invariance properties of the information dimension are not currently known, but for the purpose of the following discussion we will assume that the information dimension is also invariant under smooth coordinate transformations.

Let P_{m,n} be the probability distribution of sequences of length m for an n symbol partition of a single variable, and let I(P_{m,n}) be the information contained in P_{m,n}. To compute the information dimension, the limit of I(P_{m,n})/|log ε| must be taken as the scale of resolution goes to zero, which requires that the number n of elements in the partition go to infinity. Furthermore, the number of samples m must be large enough so that the delay representation is of sufficient dimension to faithfully represent states on the attractor. Although the embedding dimension M is in general unknown, a faithful representation can be accomplished by letting m get arbitrarily large. Once m ≥ M, adding further samples is redundant, and the limit should converge.

To calculate the metric entropy, the limit as m → ∞ must be taken for a partition and sampling rate that maximize the information rate. Intuitively, this rate should be approached by an observer with accurate instruments that induce a uniformly fine partition of the phase space containing the attractor. With this assumption, the information dimension and metric entropy of an attractor can be written in the following form:

D_I = lim_{m→∞} lim_{n→∞} {I(P_{m,n})/log n} ,   (46a)

h_μ = lim_{n→∞} lim_{m→∞} {I(P_{m,n})/m} .   (46b)
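In practice (46a) and (46b) amount to: quantize a single measured variable into n levels, form length-m words of successive quantized values (delay coordinates with unit delay), and compute the information I(P_{m,n}) of the resulting word distribution; I/log n at fixed large m approximates D_I, and the growth of I per added symbol at fixed fine n approximates h_μ. The sketch below applies this to a scalar series from the Henon map, an illustrative data source; the ranges of m and n and the sample size are arbitrary and far too small for a serious estimate.

```python
from collections import Counter
import math

def henon_series(n, a=1.4, b=0.3):
    """Scalar time series: the x coordinate of the Henon map (17)."""
    x, y, out = 0.1, 0.0, []
    for _ in range(n + 1000):
        x, y = 1.0 + y - a * x * x, b * x
        out.append(x)
    return out[1000:]

def word_information(series, m, n):
    """I(P_{m,n}): information (bits) of length-m words of the series
    quantized into n equal bins (a uniform partition of one variable)."""
    lo, hi = min(series), max(series)
    q = [min(int((v - lo) / (hi - lo) * n), n - 1) for v in series]
    words = Counter(tuple(q[i:i + m]) for i in range(len(q) - m + 1))
    total = sum(words.values())
    return -sum(c / total * math.log2(c / total) for c in words.values())

series = henon_series(500_000)

m_fixed = 4
for n in (4, 8, 16, 32):
    i_mn = word_information(series, m_fixed, n)
    print(f"m = {m_fixed}, n = {n:3d}   I/log2(n) = {i_mn / math.log2(n):.3f}   (-> D_I, Eq. 46a)")

n_fixed = 16
info = {m: word_information(series, m, n_fixed) for m in (2, 3, 4, 5, 6)}
for m in (3, 4, 5, 6):
    rate = info[m] - info[m - 1]
    print(f"n = {n_fixed}, m = {m}   I_m - I_(m-1) = {rate:.3f} bits/sample   (-> h_mu, Eq. 46b)")
```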

Unfortunately, neither this algorithm, nor other algorithms previously presented in the literature, are practical to implement in experiments (a possible exception is the algorithm given in [35]). For attractors of dimension greater than three, the amount of data that needs to be gathered to achieve a reasonable level of resolution is prohibitive [35]. Future applications to physical experiments await more efficient algorithms, if they exist.

Table 1. Dynamical quantities.

                             |                                  | (Need measure)                  | Relevant probability density
  Dimensions (need metric)   | D_F                              | D_I                             | Asymptotic probability
                             | Fractal dimension                | Information dimension           | density P(x)
                             | lim_{ε→0} log n(ε)/|log ε|       | lim_{ε→0} I(ε)/|log ε|          |
  Entropies                  | h_T                              | h_μ                             | Symbol sequence
                             | Topological entropy              | Metric entropy                  | distribution P_∞
                             | D_F(P_∞) log n                   | D_I(P_∞) log n                  |

If the equations of motion are known, the information dimension is much easier to compute than the fractal dimension. Kaplan and Yorke define a quantity they call the Lyapunov dimension [36, 37], which can be expressed in terms of the spectrum of Lyapunov exponents (see Ref. [32], for example, for a definition of Lyapunov exponents). They have recently conjectured that the Lyapunov dimension is generally equal to the information dimension [9]. For any dynamical system that can be simulated efficiently it is feasible to compute the relevant part of the spectrum of Lyapunov exponents. The computational time and memory requirements increase linearly with the dimension of the attractor, rather than as a power of the dimension, as they do for a direct computation of the fractal dimension. In addition, the algorithms are relatively independent of resolution. Assuming that the Kaplan-Yorke conjecture is true, Ref. [33] contains examples of computations of the information dimension using this technique, for an infinite dimensional dynamical system with attractors of dimension as great as twenty (see also the comparisons of the Lyapunov dimension and the fractal dimension made in [38]).
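The Lyapunov (Kaplan-Yorke) dimension mentioned above is computed from the ordered exponent spectrum λ_1 ≥ λ_2 ≥ … by adding exponents until the running sum turns negative and then interpolating: D_L = j + (λ_1 + … + λ_j)/|λ_{j+1}|, where j is the largest index with a nonnegative partial sum. A small sketch; the example spectrum is invented purely for illustration.

```python
def lyapunov_dimension(exponents):
    """Kaplan-Yorke (Lyapunov) dimension from a spectrum of Lyapunov
    exponents; the spectrum is sorted in decreasing order first."""
    lam = sorted(exponents, reverse=True)
    partial, j = 0.0, 0
    for k, l in enumerate(lam):
        if partial + l < 0.0:
            break
        partial += l
        j = k + 1
    if j == len(lam):            # partial sums never go negative
        return float(j)
    return j + partial / abs(lam[j])

# Invented example spectrum (nats per unit time), for illustration only.
spectrum = [0.42, 0.0, -0.8, -2.1]
print("Lyapunov dimension D_L =", lyapunov_dimension(spectrum))
```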
Table 1. Dynamical quantities.

                                                  (need measure)               Relevant probability density

  Dimensions       D_F                            D_I                          P(x)
  (need metric)    Fractal Dimension              Information Dimension        Asymptotic Probability Density
                   lim_{ε→0} log n(ε) / |log ε|   lim_{ε→0} I(ε) / |log ε|

  Entropies        h_T                            h_μ                          P_∞
                   Topological Entropy            Metric Entropy               Symbol Sequence Distribution
                   D_F(P_∞) log n                 D_I(P_∞) log n


Conclusions

Four distinct dimensions have been discussed in this paper. The topological dimension and fractal dimension are discussed in detail elsewhere [2, 3], and not much is said about them here. The embedding dimension is the minimum dimension for an embedding into Euclidean space; for an attractor, roughly speaking, it is the minimum number of independent real numbers needed to smoothly label points everywhere on the attractor. The embedding dimension is important because it gives the minimum number of modes that are needed for a physical description. For the topological, fractal, and embedding dimensions it is sufficient to consider the attractor as a set (for the second two a metric is also needed); a notion of relative probability is not necessary.

The information dimension requires both a metric and a probability measure for its definition, and provides a geometrical way to understand the amount of information gained in sampling a random variable. It can thus be used to discuss some of the probabilistic properties of attractors. Knowledge of the information dimension of an attractor of a dynamical system allows an estimate to be made of the information gained in a measurement of an initial condition. For a continuous probability density defined on a smooth manifold, the information dimension is an integer, but for probability densities with structure that multiplies geometrically as the scale of resolution decreases, it is usually not an integer.

The information dimension fills a logical gap in the classification of chaotic attractors in terms of metric entropy, fractal dimension, and topological dimension. As summarized in Table 1, the information dimension plays the same role relative to the metric entropy that the fractal dimension does relative to the topological entropy. Information dimension and metric entropy are probabilistic notions, and require a measure for their definition. Topological dimension and fractal dimension do not require a measure. Both of the dimensions require a concept of distance (a metric), whereas the entropies do not. Dimensions relate to isolated measurements (snapshots), whereas entropies relate to sequences of measurements (movies). The entropies can both be expressed in terms of the corresponding dimension of the symbol sequence distribution.

Because it weights the probabilistic aspects inherent in any physical system, the information dimension has a more direct physical interpretation than the fractal dimension. The same is true of the metric entropy relative to the topological entropy. Another advantage of the information dimension is that, when the equations of motion are known, it is more feasible to compute than the fractal dimension.

Outstanding mathematical questions: Is the information dimension an invariant under smooth coordinate transformations (assuming the metric and measure are also transformed appropriately)? Does a uniform partition always provide the correct value of the information dimension in the limit where the elements of the partition are all very fine? If not, is there an algorithm to find partitions of a given diameter that minimize the information?


Acknowledgements

I would like to give special thanks to Norman Packard and Jim Crutchfield for their help in unravelling the relationship between the dynamical quantities of Table 1, and for contributing Figure 13. I am especially indebted to Jim Yorke for illuminating phone conversations and for providing the example of (15). I would also like to acknowledge valuable discussions with John Guckenheimer, Allan Kaufman, Joe Rudnick, David Rand, and Rob Shaw. I am indebted to Richard Kaplan and the Department of Aerospace Engineering at the University of Southern California for their gracious hospitality and the use of their PDP 11/55 minicomputer. This study was supported by the Fanny and John Hertz Foundation.
Appendix I

Information Dimension of an Asymmetric Cantor Set

This appendix contains a computation of the information dimension of the asymmetric Cantor set made by deleting third fourths, shown in Figure 7. As described in Example (5) of Part I, begin the construction of the Cantor measure with a uniform probability density on the interval [0, 1].

Assume that the dimension can be computed using a uniform partition consisting of 2^i elements, where i = 0, 1, 2, ... . With i = 1, a two element partition, the total measure mapped onto the left half of the interval is 2/3 and the total measure on the right is 1/3; their ratio is 2:1. Letting P_j^i be the measure of the jth element of the ith partition, we get the following sequence of coarse grained probability distributions, where n_i is the number of nonempty elements in the ith partition.

Table A1.

    {P_j^0} = [1],                                    n_0 = 1
    {P_j^1} = (1/3) [2, 1],                           n_1 = 2
    {P_j^2} = (1/9) [4, 2, 3],                        n_2 = 3
    {P_j^3} = (1/27) [8, 4, 6, 6, 3],                 n_3 = 5
    {P_j^4} = (1/81) [16, 8, 12, 12, 6, 12, 6, 9],    n_4 = 8

Note that the n_i are the Fibonacci numbers, satisfying n_i = n_{i-1} + n_{i-2}. The P_j^i are given by the following recursion relation:

    for j = 1, 2, ..., n_{i-1}:        P_j^i = (2/3) P_j^{i-1},
    for j = n_{i-1} + 1, ..., n_i:     P_j^i = (1/3) P_{j - n_{i-1}}^{i-2}.

Let the information contained in the ith partition be

    I_i = - Σ_{j=1}^{n_i} P_j^i log P_j^i
        = - Σ_{j=1}^{n_{i-1}} (2/3) P_j^{i-1} log [(2/3) P_j^{i-1}]
          - Σ_{j=1}^{n_{i-2}} (1/3) P_j^{i-2} log [(1/3) P_j^{i-2}].              (A1)

A little algebra shows

    I_i = (2/3) I_{i-1} + (1/3) I_{i-2} + H(1/3),                                 (A2)

where H(p) is the binary entropy function, Equation (14). Let the ith approximation to the dimension be D_i = I_i / log 2^i = I_i / (i log 2). Recasting (A2) in terms of D_i gives

    D_i = [2(i-1)/(3i)] D_{i-1} + [(i-2)/(3i)] D_{i-2} + H(1/3)/(i log 2).        (A3)

In the limit as i goes to infinity, D_i approaches a stable fixed point D_I,

    D_I = 3 H(1/3) / (4 log 2),

which is approximately equal to 0.6887, in contrast to the fractal dimension D_F = log 3 / log 4 ≈ 0.7925.

Note that a random Cantor set formed by randomly deleting, at each stage of the construction, either the second fourth or the third fourth of each continuous piece will have the same dimensions. Any Cantor set made this way will have the same sequence of coarse grained probability distributions as shown in Table A1, except that the order of the elements will be different. Since the information is insensitive to the order of the elements, the dimension will be the same.
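The recursion above is easy to iterate numerically. The following sketch (an added illustration, not part of the original appendix) builds the coarse-grained measures of Table A1 directly from the recursion relation, evaluates D_i = I_i / (i log 2), and compares the result with the fixed point 3 H(1/3) / (4 log 2); the approach to the fixed point is slow, of order 1/i.

# Sketch: numerical check of Appendix I (illustrative only).
import math

def H(p):
    """Binary entropy function, in natural logarithms."""
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def information(P):
    return -sum(p * math.log(p) for p in P)

# Coarse-grained measures of Table A1:
# P^i is (2/3) P^{i-1} on the first n_{i-1} elements, then (1/3) P^{i-2} on the rest.
P_prev2, P_prev = [1.0], [2.0 / 3.0, 1.0 / 3.0]        # i = 0 and i = 1
for i in range(2, 26):
    P = [2.0 / 3.0 * p for p in P_prev] + [1.0 / 3.0 * p for p in P_prev2]
    P_prev2, P_prev = P_prev, P
    if i % 5 == 0:
        D_i = information(P) / (i * math.log(2.0))     # D_i = I_i / log 2^i
        print(f"i={i:2d}  n_i={len(P):6d}  D_i={D_i:.4f}")

print("fixed point 3 H(1/3) / (4 log 2) =",
      round(3.0 * H(1.0 / 3.0) / (4.0 * math.log(2.0)), 4))   # approximately 0.6887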

Appendix II

In this Appendix I investigate some properties of the asymptotic probability density P(x) for chaotic attractors of one dimensional maps x_{i+1} = F(x_i) with a single quadratic maximum. There are typically an infinite number of integrable singularities in P(x), which occur at the iterates of the critical point and fall off on one side as k x^{-1/2}. The amplitude k decreases roughly as e^{-jλ/2}, where j is the order of iteration from the critical point and λ is the Lyapunov exponent.

Assume throughout that the iterates of the critical point z_0 (F'(z_0) = 0) are not asymptotically periodic. This is the typical case for chaotic attractors; exceptions are, for example, r = 4 for the logistic equation. Assume also that the asymptotic probability density can be computed using the Frobenius-Perron equation (4). Numerical computations by Rob Shaw and myself at several parameter values of the logistic equation support this assumption.

With the initial condition P_0(x) = 1, the jth iterate P_j(x) of the Frobenius-Perron equation (10) will contain j singularities. The jth singularity occurs at the jth iterate of the critical point, which will be labeled z_j.

Let I_1 be a small interval of length ε_1 centered on z_1, and let I_j be the jth iterate of this interval, with length ε_j. For small ε_1,

    ε_j = |F'(z_{j-1}) F'(z_{j-2}) ... F'(z_1)| ε_1 = |z_j'| ε_1,                 (A.4)

where z_j' = [dF^{j-1}(x)/dx]_{x = z_1}. In the limit as j → ∞, the Lyapunov exponent of z_1 is λ_{z_1} = (log |z_j'|)/j. Quadratic mappings with a single maximum have only one attractor, and almost every point tends to this attractor. As long as the critical point is one of these typical points, the subscript can be dropped, and the Lyapunov exponent of almost every point written simply λ. Therefore ε_j ≈ exp(jλ) ε_1 for large j.
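The two routes to λ used here, a time average of log |F'| along a typical orbit and (log |z_j'|)/j along the orbit of the critical point, are easy to compare numerically. The sketch below (an added illustration) does so for the logistic map F(x) = r x (1 - x) at r = 3.9, an arbitrary chaotic parameter value chosen only for illustration (r = 4 is excluded above).

# Sketch: two estimates of the Lyapunov exponent for the logistic map (illustrative only).
import math

R = 3.9
F = lambda x: R * x * (1.0 - x)
dF = lambda x: R * (1.0 - 2.0 * x)

def lyapunov_typical(x0=0.123, n=200000, transient=1000):
    """Time average of log|F'| along a typical orbit."""
    x = x0
    for _ in range(transient):
        x = F(x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(dF(x)))
        x = F(x)
    return total / n

def lyapunov_critical(j=200000):
    """(log |z_j'|)/j, with z_j' the derivative of F^{j-1} at z_1 = F(1/2) (cf. A.4)."""
    z = F(0.5)                         # z_1, the image of the critical point z_0 = 1/2
    log_deriv = 0.0
    for _ in range(j - 1):
        log_deriv += math.log(abs(dF(z)))
        z = F(z)
    return log_deriv / j

print("lambda from a typical orbit   :", round(lyapunov_typical(), 4))
print("lambda from the critical orbit:", round(lyapunov_critical(), 4))
# The two estimates agree, so the interval lengths of (A.4) grow as eps_j ~ exp(j*lambda)*eps_1.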

Conservation of probability implies that the total probability of I_j will be the sum of the probabilities of the inverse iterates of I_j:

    μ(I_j) = ∫_{I_j} P_j(x) dx = Σ_{z ∈ F^{-(j-1)}(I_j)} ∫ P_1(x) dx ≈ μ(I_1).    (A.5)

The fact that F has a quadratic maximum, together with the initial condition P_0(x) = 1, implies that for x ∈ I_1 and I_1 sufficiently small,

    P_1(x) ≈ k_1 (x - z_1)^{-1/2}.                                                (A.6)

Since I_1 is the only one of the inverse iterates of I_j that contains a singularity, for ε_j small enough it dominates the contribution to the integrals in (A.5), so that for x ∈ I_j, P_j(x) ≈ k_j (x - z_j)^{-1/2}. Therefore, using (A.5) and (A.6),

    ∫_{I_j} P_j(x) dx = k_j (2 ε_j)^{1/2} = ∫_{I_1} P_1(x) dx = k_1 (2 ε_1)^{1/2},    (A.7)

and

    k_j / k_1 = (ε_1 / ε_j)^{1/2} = 1 / |z_j'|^{1/2} ≈ e^{-jλ/2}.

Thus the amplitude k of the singularities falls off roughly exponentially at the rate given above. Notice that the larger the characteristic exponent, the faster the amplitude of the singularities decreases. This provides the motivation for the approximation for P(x) presented in (20).
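As a rough numerical check of (A.4) through (A.7) (again an added illustration, using the logistic map at r = 3.9 and an arbitrarily small initial interval), one can iterate the endpoints of an interval I_1 about z_1 and compare the predicted amplitude ratio (ε_1/ε_j)^{1/2} with e^{-jλ/2}, with λ estimated from a typical orbit. The two columns printed below fall off at a comparable exponential rate, as the argument above predicts, although the prefactor fluctuates with the local stretching along the critical orbit.

# Sketch: amplitude ratio (eps_1/eps_j)^(1/2) versus exp(-j*lambda/2) (illustrative only).
import math

R = 3.9
F = lambda x: R * x * (1.0 - x)

def lyapunov(x0=0.123, n=200000, transient=1000):
    """Lyapunov exponent from a time average of log|F'| along a typical orbit."""
    x = x0
    for _ in range(transient):
        x = F(x)
    s = 0.0
    for _ in range(n):
        s += math.log(abs(R * (1.0 - 2.0 * x)))
        x = F(x)
    return s / n

lam = lyapunov()
eps1 = 1.0e-12
z1 = F(0.5)                                   # image of the critical point z_0 = 1/2
a, b = z1 - eps1 / 2.0, z1 + eps1 / 2.0       # endpoints of I_1
for j in range(2, 21):
    a, b = F(a), F(b)                         # I_j is the iterate of I_1
    eps_j = abs(b - a)
    if j % 4 == 0:
        print(f"j={j:2d}  (eps1/eps_j)^(1/2)={math.sqrt(eps1 / eps_j):.3e}"
              f"  exp(-j*lam/2)={math.exp(-j * lam / 2.0):.3e}")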

[1] E. A. Abbot, Flatland: A Romance of Many Dimensions, Blackwell, Oxford 1888, 1962.
[2] W. Hurewicz and H. Wallman, Dimension Theory, Princeton 1948.
[3] B. Mandelbrot, Fractals: Form, Chance, and Dimension, Freeman, San Francisco 1977.
[4] J. Balatoni and A. Renyi, Publications of the Math. Inst. of the Hungarian Acad. of Sci. 1, 9 (1956) (Hungarian). See English translation in The Selected Papers of A. Renyi, Vol. 1, p. 558, Akademia Budapest 1976.
[5] G. Benettin, M. Casartelli, L. Galgani, A. Giorgilli, and G. M. Strelcyn, Nuovo Cimento 44B, 1883 (1978), and also 50B, 211 (1979).
[6] R. Shaw, Z. Naturforsch. 36a, 80 (1981).
[7] L. Brillouin, Science and Information Theory, Academic Press, New York 1956.
[8] A. Renyi, Acta Mathematica (Hungary) 10, 193 (1959). See also A. Renyi, Probability Theory, North Holland, Amsterdam 1970.
[9] J. Yorke, private communication.

[10] M. Henon, Comm. Math. Phys. 53, 69 (1976).
[11] C. Shannon, Bell Syst. Tech. J. 27, 379 (1948).
[12] A. Kolmogorov, Dokl. Akad. Nauk SSSR 124, 754 (1959). English summary in MR 21, 2035.
[13] Ya. G. Sinai, Dokl. Akad. Nauk SSSR 124, 768 (1959). English summary in MR 21, 2036.
[14] V. M. Alekseyev and M. V. Yakobson, Symbolic Dynamics and Hyperbolic Dynamic Systems, to appear in Physics Reports.
[15] R. Bowen, Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms, Springer-Verlag, Berlin 1975, 470.
[16] R. Bowen and D. Ruelle, Inventiones Math. 29, 181 (1975).
[17] Ya. Sinai, Russ. Math. Surv. 166, 21 (1972).
[18] Y. Oono, T. Kohda, and H. Yamazaki, J. Phys. Soc. Japan 48, 3, 738 (1980).
[19] See, for example, N. Abramson, Information Theory and Coding, McGraw-Hill, New York 1963.
[20] V. I. Arnold and A. Avez, Ergodic Problems of Classical Mechanics, Benjamin, New York 1968.
[21] P. Billingsley, Ergodic Theory and Information, John Wiley and Sons, New York 1965.
[22] R. Adler, A. Konheim, and M. McAndrew, Trans. Amer. Math. Soc. 114, 309 (1965); MR 30, 5291.
[23] R. Bowen, Trans. Amer. Math. Soc. 153, 401 (1971).
[24] J. Guckenheimer, Comm. Math. Phys. 70, 133 (1979).
[25] I. Shimada, Prog. Theor. Phys. 62, 61 (1979).
[26] A. Kaufman, H. Abarbanel, and C. Grebogi, Statistical Analysis of a Deterministic Stochastic Orbit, Proceedings of the Int. Conf. on Plasma Physics, Nagoya, Japan 1980.
[27] J. Crutchfield and N. Packard, UCSC preprint (1981).
[28] J. Curry, On the Entropy of the Henon Attractor, IHES preprint.
[29] S. Goldstein, Entropy Increase in Dynamical Systems, Rutgers U. Math. Dept. preprint (1980).
[30] D. Farmer, J. Crutchfield, H. Froehling, N. Packard, and R. Shaw, Ann. N.Y. Acad. Sci. 357, 453 (1980).
[31] D. Farmer, Spectral Broadening of Period Doubling Sequences, Phys. Rev. Lett. 47, 179 (1981).
[32] F. Takens, Detecting Strange Attractors in Turbulence, Rijksuniversiteit Groningen Mathematisch Instituut, preprint (1980).
[33] D. Farmer, Chaotic Attractors of an Infinite Dimensional Dynamical System, Physica 4D, 366-399 (1982).
[34] N. Packard, J. Crutchfield, D. Farmer, and R. Shaw, Phys. Rev. Lett. 45, 712 (1980).
[35] H. Froehling, J. Crutchfield, D. Farmer, N. Packard, and R. Shaw, On Determining the Dimension of Chaotic Flows, Physica 3D, 605-617 (1981).
[36] J. Kaplan and J. Yorke, in Functional Differential Equations and Approximation of Fixed Points, H. O. Peitgen and H. O. Walther, Eds., Springer-Verlag, Berlin 1979, p. 228, Bd. 570.
[37] P. Frederickson, J. Kaplan, and J. Yorke, The Dimension of the Strange Attractor for a Class of Difference Systems, U. of Maryland Math. Dept. preprint, to appear in J. Differential Eqs.
[38] D. Russell, J. Hanson, and E. Ott, Phys. Rev. Lett. 45, 1175 (1980).