

Chapter 2. New worlds versus scaling: from van Leeuwenhoek to Mandelbrot

2.1 Scalebound thinking and the missing quadrillion

We just took a voyage through scales, noticing structures in cloud photographs and wiggles on graphs. Collectively these spanned ranges of scale over factors of billions in space and billions of billions in time. We are immediately confronted with the question: how can we conceptualize and model such fantastic variation? Two extreme approaches have developed. For the moment I will call the dominant one the "new worlds" view, after Antoni van Leeuwenhoek (1632-1723), who developed a powerful early microscope; the other is the self-similar (scaling) view of Benoit Mandelbrot (1924-2010), which I discuss in the next section. My own view - scaling, but with the notion of scale itself an emergent property - is discussed in ch. 3.

When van Leeuwenhoek peered through his microscope^a, in his amazement he is said to have discovered a "new world in a drop of water": "animalcules", the first micro-organisms^b (fig. 2.1). Since then, the idea that zooming in will reveal something totally new has become second nature: in the 21st century, imaging microscopes are developed precisely because of the promise of such new worlds. The scale-by-scale "newness" idea was graphically illustrated by K. Boeke's highly influential book "Cosmic View" (1957), which starts with a photograph of a girl holding a cat, first zooming away to show the surrounding vast reaches of space, and then zooming in until reaching the nucleus of an atom. The book was incredibly successful and was included in Mortimer Adler's "Gateway to the Great Books" (1963), a 10 volume series featuring works by Aristotle, Shakespeare, Einstein and others. In 1968, two films were based on Boeke's book: "Cosmic Zoom"^c and "Powers of Ten" (1968^d, re-released in 1977^e), which encouraged the idea that nearly every power of ten in scale hosted different phenomena. More recently (2012), there is even the interactive "Cosmic Eye" app for the iPad, iPhone, or iPod. In a 1981 paper, Mandelbrot coined the term "scalebound" for this "New Worlds" view, a convenient shorthand^f that I use frequently below^g.

While "Powers of Ten" was proselytizing the new worlds view to an entire generation, other developments were pushing scientific thinking in the same direction. In the 1960's, long ice and ocean cores were revolutionizing climate science by supplying the first quantitative data at centennial, millennial and longer time scales. This coincided with the development of practical techniques to decompose a signal into oscillating components: "spectral analysis".

^a The inventor of the first microscope is not known, but van Leeuwenhoek's was more powerful, up to about 300 times magnification.
^b Recent historical research indicates that Robert Hooke may in fact have preceded van Leeuwenhoek, but the latter is usually credited with the discovery.
^c Produced by the National Film Board of Canada.
^d By Charles and Ray Eames.
^e The re-release had the subtitle "A Film Dealing with the Relative Size of Things in the Universe and the Effect of Adding Another Zero" and was narrated by P. Morrison. More recently, the similar "Cosmic Voyage" (1996) appeared in IMAX format.
^f He wrote it as here, as one word: a single concept.
^g He was writing in Leonardo, to an audience of architects: "I propose the term scalebound to denote any object, whether in nature or one made by an engineer or an artist, for which characteristic elements of scale, such as length and width, are few in number and each with a clearly distinct size": 1 Mandelbrot, B. Scalebound or scaling shapes: a useful distinction in the visual arts and in the natural sciences. Leonardo 14, 43-47 (1981).

While it had been known since Joseph Fourier (1768-1830) that any time series may be written as a sum of sinusoids, applying this idea to real data was computationally challenging and, in atmospheric science, had been largely confined to the study of turbulence. The breakthrough was the development of fast computers combined with the discovery of the "Fast Fourier Transform" (FFT) algorithm^h (1965). The beauty of Fourier decomposition is that each sinusoid has an exact, unambiguous time scale: its period (the inverse of its frequency) is the length of time it takes to make a full oscillation (see fig. 2.2a, upper left, for examples). Fourier analysis thus provides a systematic way of quantifying the contribution of each time scale to a time series.

Fig. 2.2a illustrates this for the Weierstrass function, which in this example is constructed by summing sinusoids with frequencies increasing by factors of two, so that the nth frequency is ω_n = 2^n. Fig. 2.2a (upper left) shows the result for H = 1/3 with all the terms up to 128 cycles per second (upper row); the amplitudes decrease by factors of 2^(-H) (here 2^(-1/3) ≈ 0.79), so that the nth amplitude is A_n = 2^(-nH). Eliminating n, we find the power law relation A = ω^(-H). More generally, for a scaling process we have:

Spectrum ≈ (frequency)^(-β)

where β is the usual notation for the "spectral exponent"^i. The spectrum is the square of the amplitude, so that in this (discrete) example^j we have β = 2H. The spectrum of the Weierstrass function is shown in fig. 2.2a, bottom row (left), as a discrete series of dots, one for each of the eight sinusoids in the upper left construction. On the bottom row (right) we show the same spectrum on a logarithmic plot, on which power laws are straight lines. Of course, in the real world - unlike this academic example - there is nothing special about powers of 2, so that all frequencies - a continuum - are present.

The Weierstrass function was created by adding sinusoids: Fourier composition. Now take a messy piece of data - for example the multifractal simulation of the data series (lower left in fig. 1.3): it has small, medium and large wiggles. To analyze it we need the inverse of composition, and this is where the FFT is handy. In this case, by construction, we know that all the wiggles are generated randomly by the process: they are unimportant. However, if we had no knowledge - or only a speculation - about the mechanism that produced it, we would wonder: do the wiggles hide signatures of important processes, or are they simply uninteresting details that should be averaged out and ignored?
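The clean, academic example is easy to verify numerically. Below is a minimal sketch (in Python with numpy; the parameters simply mirror the fig. 2.2a construction, and the series length is an arbitrary choice) that sums the eight sinusoids and recovers β = 2H from the log-log slope of the spectrum:

```python
import numpy as np

# Weierstrass-like construction of fig. 2.2a: frequencies 2^n (cycles per second),
# amplitudes 2^(-nH), summed over one second of "data".
H = 1.0 / 3.0
n_terms = 8                       # frequencies 1, 2, 4, ..., 128 cycles per second
t = np.arange(4096) / 4096.0      # one second sampled at 4096 points

W = sum(2.0**(-n * H) * np.cos(2.0 * np.pi * 2**n * t) for n in range(n_terms))

# The spectral power at omega_n = 2^n should be proportional to the squared
# amplitude 2^(-2nH), i.e. E(omega) ~ omega^(-beta) with beta = 2H.
power = np.abs(np.fft.rfft(W) / t.size)**2
omegas = 2**np.arange(n_terms)    # the eight frequencies actually present

beta = -np.polyfit(np.log(omegas), np.log(power[omegas]), 1)[0]
print(f"fitted beta = {beta:.3f}; expected 2H = {2 * H:.3f}")
```

On the logarithmic plot of fig. 2.2a (bottom right), this slope is precisely the straight line.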

^h The speed-up due to the invention of the FFT is huge: even for the relatively short series in fig. 1.3 (2048 points) it is about a factor of one hundred. In GCMs it accelerates calculations by factors of millions.
^i The negative sign is used so that in typical situations β is positive.
^j In the more usual case of continuous spectra, we have β = 1 + 2H, possibly with corrections when intermittency is important.

Fig. 2.1: Antoni van Leeuwenhoek discovering "animalcules" (micro-organisms), circa 1675.

Fig. 2.2b shows the spectrum of the multifractal simulation (fig. 1.3, lower left) for all periods longer than 10 milliseconds. How do we interpret the plot? One sees three strong spikes, at frequencies of 12, 28 and 41 cycles per second (corresponding to periods of 1/12, 1/28 and 1/41 of a second: about 83, 36 and 24 milliseconds). Are they the signals of some important fundamental process, or are they just noise? Naturally, this question can only be answered if we have a mental model of how the process might be generated, and this is where it gets interesting.

First of all, consider the case where we have only a single series. If we knew the signal was turbulent (as it was for the top data series), then turbulence theory tells us that we would expect all the frequencies in a wide continuum of scales to be important and, furthermore, that at least on average their amplitudes should decay in a power law manner (as with the Weierstrass function). But the theory only tells us the spectrum that we would expect to find if we averaged over a large number of identical experiments^k (each one with different "bumps" and wiggles, but from the same overall conditions). In fig. 2.2b, this average is the smooth blue curve. But in the figure, we see that there are apparently large departures from this average. Are these departures really exceptional, or are they just the "normal" variations expected from randomly chosen pieces of turbulence?

Before the development of cascade models and the discovery of multifractals in the 1970's and 80's, turbulence theory would have led us to expect that the up and down variations about a smooth line through the spectrum should roughly follow the "bell curve". If this were the case, then the spectrum should not exceed the bottom red curve more than 1% of the time, nor the top curve more than one in ten billion times. Yet we see that even this 1/10,000,000,000 curve is exceeded twice in this single but nonexceptional simulation^l. Had we encountered this series in an experiment, turbulence theory itself would probably have been questioned - as indeed it repeatedly was (and still is).

^k An "ensemble" or "statistical" average.
^l I admit that to make my point, I made 500 simulations of the multifractal process in fig. 1.3 and then searched through the first 50 to find the one with the most striking variation. But this was by no means the most extreme of the 500, and if the statistics had been from the bell curve, then the extreme point in the spectrum in fig. 2.2b would have corresponded to a probability of one in 10 trillion, so that my slight cheating in the selection process would still have been extremely unlikely to have caused the result!

Failure to fully appreciate the huge variability that is expected in turbulent processes, together with the continued embrace of inappropriate bell curve paradigms, has spuriously discredited many attempts at establishing turbulent laws and has been a major obstacle to their understanding.
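The failure of the bell curve bounds is easy to demonstrate numerically. The sketch below is only illustrative: lacking the exact process behind fig. 1.3, it uses a bare lognormal multiplicative cascade as a crude multifractal stand-in (the depth and the intermittency parameter sigma are arbitrary assumptions). For Gaussian statistics, each spectral value divided by its ensemble mean is exponentially distributed, so the probability-p limit is a factor -ln(p) above the mean (4.6 for 1%, 23 for one in ten billion); the cascade exceeds these limits far more often than the nominal p:

```python
import numpy as np

rng = np.random.default_rng(1)

def cascade(levels=12, sigma=0.4):
    """A bare lognormal multiplicative cascade: a crude multifractal stand-in."""
    x = np.ones(1)
    for _ in range(levels):
        x = np.repeat(x, 2)                                   # halve the scale
        x = x * rng.lognormal(-sigma**2 / 2, sigma, x.size)   # mean-one weights
    return x

def periodogram(x):
    f = np.fft.rfft(x - x.mean())
    return np.abs(f[1:])**2          # drop the zero frequency

# Ensemble of spectra; the mean is the analogue of the smooth blue curve.
spectra = np.array([periodogram(cascade()) for _ in range(500)])
mean_spec = spectra.mean(axis=0)

for p in (1e-2, 1e-6, 1e-10):
    bound = -np.log(p)               # Gaussian (exponential) exceedance factor
    rate = (spectra / mean_spec > bound).mean()
    print(f"p = {p:.0e}: bound = {bound:4.1f} x mean, observed rate = {rate:.1e}")
```

With heavy-tailed cascade statistics, the observed exceedance rates at the one in a million and one in ten billion levels come out far above the nominal p - the spirit of the fig. 2.2b comparison.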


Fig. 2.2a: Upper left: The first eight contributions to the Weierstrass function (displaced in the vertical for clarity): sinusoids with frequencies of 1, 2, 4, 8, 16, 32, 64 and 128 cycles per second (the time t is in seconds). Upper right: Sinusoids with frequencies of 2, 4, 8, 16, 32, 64 and 128 cycles per second, stretched by a factor 2^H ≈ 1.26 in the vertical and a factor of two in the horizontal. The sum (top) is the same as that on the left but is missing the highest frequency detail (see the discussion a little later). Lower left: The spectrum on a linear-linear scale, each point indicating the (squared) contribution at each frequency. Lower right: The same as lower left but on a logarithmic plot (it is now linear).

In conclusion, until the 1980's, even if we knew that the series came from an apparently routine turbulent wind trace on the roof of the physics building, we would still have concluded that the bumps were indeed significant. But what would be our interpretation if instead fig. 2.2b were the spectrum of a climate series? We would have no good theory of the variability, and we would typically have only a single trace. Let's take the example of an ice core record. The series itself was likely the product of a near heroic scientific effort, possibly involving months in freezing conditions near the south pole. The sample would first be cored and then transported to the lab. This would be followed by a painstaking sampling and analysis of the isotopic composition using a mass spectrometer, and then a digitization of the result.

Careful comparison with other cores or with ice flow models would eventually establish a chronology. At this point, the researcher would be eager for a quantitative look at what she had found. If the black curve in fig. 2.2b were the spectrum of such a core, how would she react to the bumps in the spectrum? Unlike the turbulence situation, where there was some theory, an early core would have had little with which to compare it. This is the point where the new worlds view could easily influence the researcher's results^m. She would be greatly tempted to conclude that the spikes were so strong - so far from the bell curve theory - that they represented real physical oscillations occurring over a narrow range of time scales. She would also remark that the two main bumps in the spectrum involve several successive frequencies, whereas according to the usual statistical assumptions, "background noise" should not be correlated in this way. This wide bump would strengthen the interpretation that there was a hidden oscillatory process at work^n. Armed with the series of bumps, she might start to speculate about possible physical mechanisms to explain them.


Fig. 2.2b: Black: the Fourier spectrum of the changes in wind speed in the 1 second long simulation shown at the bottom left of fig. 1.3, showing the amplitudes^o of the first 100 frequencies (ω). The upper left is thus for one cycle over the length of the simulation, i.e. one cycle per second: a period of one second. The far right shows the variability at 100 cycles per second, giving the amplitude of the wiggles at 10 milliseconds (higher frequencies are not shown, for clarity). The brown curve shows the average over 500 random series, each statistically identical to that in fig. 1.3: as expected, it is nearly exactly the theoretical (scaling) power law (blue; the two are virtually on top of each other). The three red curves show the theoretical 1%, one in a million and one in ten billion extreme fluctuation limits (bottom to top), determined by assuming that the spectrum has bell curve (Gaussian) probabilities.
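For readers who want to reproduce this kind of estimate, here is a minimal sketch of the windowing and ensemble averaging described in the caption (the Hann window and the white noise test are illustrative assumptions, not necessarily the choices used for the figure):

```python
import numpy as np

def ensemble_spectrum(series):
    """Ensemble average of squared amplitudes, with a Hann window applied to
    each series to reduce the spectral leakage mentioned in the caption."""
    series = np.asarray(series, dtype=float)
    series = series - series.mean(axis=1, keepdims=True)
    w = np.hanning(series.shape[1])
    f = np.fft.rfft(series * w, axis=1)
    return (np.abs(f)**2).mean(axis=0)

# Illustration: 500 white noise series should give a flat (beta = 0) spectrum.
rng = np.random.default_rng(0)
E = ensemble_spectrum(rng.standard_normal((500, 1024)))
print(E[1:6] / E[1:].mean())   # ratios near 1: flat, as expected
```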

We should thus not be surprised to learn that the 1970's witnessed a rash of papers based on spectra resembling that of fig. 2.2b: oscillators were suddenly ubiquitous^p.

^m Alternatively, there might be incorrect theories that could be spuriously supported by unfortunately placed random spectral bumps, and much time would be wasted chasing blind alleys.
^n According to standard assumptions - which, if only as demonstrated by this example, are clearly inappropriate - successive frequencies should be statistically independent of each other.
^o The spectrum is actually the ensemble average of the squares of the absolute amplitudes. It was "windowed" in order to avoid spurious "spectral leakage" that could artificially smear out the spectrum.
^p I could also mention the contribution of "Box-Jenkins" techniques (1970) to bolstering scalebound blinkers. These were originally engineering tools for analyzing and modeling stochastic processes based on the a priori scalebound assumption that the correlations decay in an exponential manner. This especially contributed to scalebound thinking in precipitation and hydrology; see for example the influential publications: 2 Zawadzki, I. Statistical properties of precipitation patterns. Journal of Applied Meteorology 12, 469-472 (1973); 3 Bras, R. L. & Rodriguez-Iturbe, I. Rainfall generation: a nonstationary time varying multidimensional model. Water Resources Research 12, 450-456 (1976); 4 Bras, R. L. & Rodriguez-Iturbe, I. Random Functions and Hydrology. (Addison-Wesley Publishing Company, 1985).

It was in this context that Murray Mitchell^5 (1928-1990) famously made the first explicit attempt to conceptualize temporal atmospheric variability (fig. 2.3a). Mitchell's ambitious composite spectrum ranged from hours to the age of the earth (from ≈4.5x10^9 years down to 10^-4 years; bottom, fig. 2.3a). In spite of his candid admission that this was mostly an "educated guess", and notwithstanding the subsequent revolution in climate and paleoclimate data, over forty years later it has achieved iconic status and is still regularly cited and reproduced in climate papers and textbooks^6,7,8. Its continuing influence is demonstrated by the slightly updated version shown in fig. 2.3b that (until 2015) adorned NOAA's National Climatic Data Center (NCDC) paleoclimate web site^q. The site was surprisingly forthright about the figure's ideological character. While admitting that "in some respects it overgeneralizes and over-simplifies climate processes", it continues: "… the figure is intended as a mental model to provide a general "powers of ten" overview of climate variability, and to convey the basic complexities of climate dynamics for a general science savvy audience." Notice the explicit reference to the "powers of ten" mindset over fifty years after Boeke's book^r.

Certainly the continuing influence of Mitchell's figure has nothing to do with its accuracy. Within fifteen years of its publication, two scaling composites (close to several of those shown in fig. 2.3a), over the ranges 1 hr to 10^5 yrs (see fig. 2.10 for the related fluctuations) and 10^3 to 10^8 yrs, had already shown astronomical discrepancies^9,10. In the figure, we have superposed the spectra of several of the series analysed in ch. 1; the difference with Mitchell's original is literally astronomical. Whereas over the range 1 hr to 10^9 yrs Mitchell's background varies by a factor of ≈150, the spectra from real data imply that the true range is a factor of a quadrillion (10^15)^s; NOAA's fig. 2.3b extends this error by a further factor of ten^t. Writing a decade and a half after Mitchell, the leading climatologists Shackleton and Imbrie^10 laconically noted that their own spectrum was "much steeper than that visualised by Mitchell", a conclusion subsequently reinforced by several scaling composites^11,12. Over at least a significant part of this range, Wunsch^13 further underlined its misleading nature by demonstrating that the contribution to the variability from the specific frequencies associated with specific "spikes" (presumed to originate in oscillatory processes) was much smaller than the contribution due to the continuum.
Just as van Leeuwenhoek peered through his microscope and discovered a new world, today we automatically anticipate finding new worlds by zooming in or out of scale. It is a scientific ideology so powerful that even quadrillions do not shake it.

^q The site explicitly acknowledges Mitchell's influence.
^r If this were not enough, the site adds a further gratuitous interpretation, assuring any sceptics that just "because a particular phenomenon is called an oscillation, it does not necessarily mean there is a particular oscillator causing the pattern. Some prefer to refer to such processes as variability." Since any time series - whether produced by turbulence, the stock market or a pendulum - can be decomposed into sinusoids, the decomposition has no physical content per se; yet we are told that variability and oscillations are synonymous.
^s In fig. 2.11a, we plot the same information in real space and find that the RMS fluctuations at 5.53x10^8 years are ≈ ±10 K, whereas extrapolating a Gaussian white noise over this range implies a value of ≈ 10^-6 K: an error by a factor of ≈ 10^7.
^t If we attempt to extend Mitchell's picture to the dissipation scales (at frequencies a million times higher, corresponding to millisecond variability), the spectral range would increase by an additional factor of a billion.

Mitchell's scalebound view led to a framework for atmospheric dynamics that emphasized the importance of numerous processes occurring at well defined time scales: the quasi-periodic "foreground" processes illustrated as bumps - the signals - on Mitchell's nearly flat background, which was considered an unimportant noise^u. Although in Mitchell's original figure the lettering is difficult to decipher, fig. 2.3b spells them out more clearly with numerous conventional examples. For example, the QBO is the "Quasi-Biennial Oscillation", ENSO is the "El Niño Southern Oscillation", the PDO is the "Pacific Decadal Oscillation" and the NAO is the "North Atlantic Oscillation". At longer time scales, the Dansgaard-Oeschger, Milankovitch and tectonic "cycles"^v will be discussed in ch. 4. The point here is not that these processes and mechanisms are wrong or nonexistent; it is rather that they only explain a small fraction of the overall variability.

Even the nonlinear revolution was affected by scalebound thinking. This included atmospheric applications of low dimensional deterministic chaos. When it was applied to weather and climate, the spectral bumps were associated with specific chaos models, analysed with the help of the dynamical systems machinery of bifurcations, limit cycles and the like^w. Of course - as discussed below - from the alternative scaling, turbulence view, wide range continuum spectra are generic results of systems with large numbers of interacting components ("degrees of freedom") - "stochastic chaos"^15 - and are incompatible with the usual small number of interacting components ("low dimensional") of deterministic chaos. Incredibly, a famous paper published in Nature even claimed that four interacting components (!) were enough to describe and model the climate^16. Similarly, the spectra will be scaling - i.e. power laws - whenever there are no dynamically important characteristic scales or scale breaks^x (ch. 3).

^u Mitchell actually assumed that his background was either a white noise or, over short ranges, sums (integrals) of a white noise.
^v The figure refers to these as "cycles" rather than oscillations, perhaps because they are broader.
^w More recently updated with the help of stochastics: the "random dynamical systems" approach, see e.g.: 14 Chekroun, M. D., Simonnet, E. & Ghil, M. Stochastic Climate Dynamics: Random Attractors and Time-dependent Invariant Measures. Physica D 240, 1685-1700 (2010); 8 Dijkstra, H. Nonlinear Climate Dynamics. (Cambridge University Press, 2013).
^x Although in the more recent random dynamical systems approach the driving noise may be viewed as the expression of a large number of degrees of freedom, this interpretation is only justified if there is a significant scale break between the scales of the noise and of the explicitly modelled dynamics; it is not trivially compatible with scaling spectra.

[Fig. 2.3a image: a log-log plot of Log10 E(ω) against Log10 ω (ω in (yr)^-1), with reference lines E(ω) ≈ ω^-β; the scaling regimes weather, macroweather, climate, macroclimate and megaclimate are indicated, with the spectral exponents β = 0.2, 0.6 and 1.8 marked on the various segments.]

Fig. 2.3a: A comparison of Mitchell's relative scale, "educated guess" of the spectrum (grey, bottom^5) with modern evidence from the spectra of a selection of the series displayed in fig. 1.4 (the plot is logarithmic in both axes). There are three sets of red lines; on the far right, the spectra from the 1871-2008 20CR reanalysis (at daily resolution) quantify the difference between the globally averaged temperature (bottom) and local (2°x2°) averages (top). The spectra were averaged over frequency intervals (10 per factor of ten in frequency), thus "smearing out" the daily and annual spectral "spikes". These spikes have been re-introduced without this averaging, and are indicated by the green spikes above the red daily resolution curves. Using the daily resolution data, the annual cycle is a factor of ≈1000 above the continuum, whereas using hourly resolution data, the daily spike is a factor of ≈3000 above the background. Also shown is the other striking narrow spectral spike, at (41 kyrs)^-1 (obliquity; ≈ a factor of 10 above the continuum); it is shown in dashed green since it is only apparent over the period 0.8-2.56 Myr BP (before present). The blue lines have slopes indicating the scaling behaviours. The thin dashed green lines show the transition periods that separate the regimes discussed in detail in ch. 3; these are at 20 days, 50 yrs, 80,000 yrs and 500,000 yrs. Mitchell's original figure has been faithfully reproduced many times (with the same admittedly mediocre quality); it is not actually very important to be able to read the lettering near the spikes, but if needed they can be seen in fig. 2.3b, which was inspired by it.
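The averaging "over frequency intervals (10 per factor of ten in frequency)" used in the figure is ordinary logarithmic binning. A minimal sketch (the Brownian motion test series and the function names are illustrative assumptions):

```python
import numpy as np

def log_binned(omega, E, bins_per_decade=10):
    """Average spectral values over logarithmically spaced frequency bins."""
    log_om = np.log10(omega)
    edges = np.arange(np.floor(log_om.min()),
                      log_om.max() + 1.0 / bins_per_decade, 1.0 / bins_per_decade)
    idx = np.digitize(log_om, edges)
    om_out, E_out = [], []
    for i in np.unique(idx):
        sel = idx == i
        om_out.append(10**log_om[sel].mean())
        E_out.append(E[sel].mean())
    return np.array(om_out), np.array(E_out)

# Example: bin the raw periodogram of a Brownian motion (beta = 2) before fitting.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(2**14))
E = np.abs(np.fft.rfft(x - x.mean())[1:])**2
omega = np.arange(1, E.size + 1)
om_b, E_b = log_binned(omega, E)
slope = np.polyfit(np.log(om_b), np.log(E_b), 1)[0]
print(f"fitted spectral slope = {slope:.2f} (expect about -2)")
```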


Fig. 2.3b: The updated version of Mitchell's spectrum, reproduced from NOAA's NCDC paleoclimate web site^y. The "background" on this paleo site is perfectly flat; hence, in comparison with the empirical spectra in fig. 2.3a, it is in error by an overall factor of ≈ 10^16.

***

At weather scales, and at virtually the same time as Mitchell's scalebound framework for temporal variability, Isidoro Orlanski proposed a scalebound spatial classification of atmospheric phenomena by powers of ten (fig. 2.4)^17. The figure is a reproduction of Orlanski's phenomenological space-time diagram^z, with eight different dynamical regimes indicated on the right according to their spatial scales. The diagram does more than just classify phenomena according to their size: it also relates their sizes to their lifetimes^aa. Along the diagonal, various pre-existing conventional phenomena are indicated, including fronts, hurricanes, tornadoes and thunderstorms. The straight line embellishment was added by my colleagues and me in 1997^18; it shows that the figure is actually scaling, not scalebound! This is because straight lines on logarithmic plots such as this are power laws; more on this below.

At the time of Orlanski's classification, meteorology was already largely scalebound. This was partly due to its near total divorce from the primarily scaling turbulence theory^bb, and partly due to its heritage from the older, more qualitative traditions of "synoptic" meteorology and of linearized approaches - these being the only ones available in the pre-computer era. Orlanski's classification therefore rapidly became popular as a systematic rationalization of an already strongly phenomenologically scalebound approach. It is ironic that just as Orlanski tried to perfect the old scalebound approach - and unbeknownst to the modellers - the computer revolution was ushering in the opposite, scaling approach: the General Circulation Model^cc.

^y The page has since been taken down.
^z Sometimes called "Stommel diagrams" after Henry Stommel, who produced such diagrams in oceanography.
^aa Notice that he indicates that the climate starts at about two weeks (bottom row).
^bb All the turbulence theories were scaling; the question was whether one or two ranges were required. We discuss this in detail in ch. 3.

Fig. 2.4: Orlanski's space-time diagram, with eight different dynamical regimes indicated on the right according to their spatial scales. Notice that he indicates that the climate starts at about two weeks (bottom row). The straight line shows that the figure is actually scaling (straight on this logarithmic plot). Reproduced from ref. 18.

2.2 Scaling: Big whirls have little whirls and little whirls have lesser whirls

Scalebound thinking is now so entrenched that it seems obvious that "zooming in" to practically anything will disclose hidden secrets. Indeed, we would likely express more wonder if we zoomed in only to find that nothing had changed - if the system's structure was scaling (fig. 2.5)! Yet in the last thirty years antiscaling prejudices have started to unravel; much of this is thanks to Mandelbrot's path breaking "Fractals: Form, Chance and Dimension"^19,dd (1977) and "The Fractal Geometry of Nature"^20 (1982). His books made an immediate visual impact thanks to his stunning avant-garde use of computer graphics, the first to display the realism of scaling. One was also struck by the word "geometry" in the titles: the last time scientists had cared about geometry was when D'Arcy Thompson^21 brilliantly used it to understand the shapes of diatoms and other biomorphologies.

^cc At the time, the GCMs were much too small to allow for proper statistical scaling analyses of their outputs, and the reigning turbulence theory turned out to be seriously unrealistic; see ch. 3. Even today, the scaling of the GCMs is practically unknown!
^dd This was actually a translation and extension of the earlier French book "Les objets fractals" (1975).

While Mandelbrot's simulations, imagery and scaling ideas sparked the fractal strand of the nonlinear revolution - and continue to transform our thinking - his insistence on geometry is now nearly forgotten. The basic reason is that scientists have - in my opinion rightly - long been more interested in statistics than in geometry. There is also a less obvious reason: the most interesting thing to come from the scaling revolution was arguably not fractals but multifractals, and these cannot be reduced to geometry^ee. In contrast with a "scalebound" object, Mandelbrot counterposed his new scaling, fractal one:

"A scaling object, by contrast, includes as its defining characteristic the presence of very many different elements whose scales are of any imaginable size. There are so many different scales, and their harmonics are so interlaced and interact so confusingly that they are not really distinct from each other, but merge into a continuum. For practical purposes, a scaling object does not have a scale that characterizes it. Its scales vary also depending upon the viewing points of beholders. The same scaling object may be considered as being of a human's dimension or of a fly's dimension."^1

Fig. 2.5: The scaling approach: looking through the microscope at the Mandelbrot set (the black in the upper left square), Mandelbrot notices one of an infinite number of reduced scale versions.

I had the good fortune to begin my own graduate career in 1976, just as the scalebound weather and climate paradigms were ossifying but before the nonlinear revolution really took off. I was thus totally unprepared, and I can vividly remember the epistemic shock when, shortly after it appeared, I first encountered "Fractals: Form, Chance and Dimension". Revealingly, it was neither my PhD supervisor Geoff Austin nor any other scientific colleague who introduced me to the book, but rather my mother^ff - an artist - who was awed by Mandelbrot's imagery and fascinated by its artistic implications.

At the time, my thesis topic was the measurement of precipitation from satellites^gg, and I had become frustrated because the enormous space-time variability of rain was way beyond anything that conventional methods could handle^hh. The problem was that there were several competing techniques for estimating rain from satellites and each one was different, yet there was essentially no way to validate any of them: scientific progress in the field was essentially blocked. Fortunately, this didn't prevent radar and satellite remote sensing technology from continuing to advance. Not long after reading Mandelbrot's book, I started working on developing fractal models of rain, so that when I finally submitted my thesis in November 1980, about half of it was on conventional remote sensing topics while the other half was an attempt to understand precipitation variability by using fractal analyses and fractal models of rain^ii. Given that three of the more conventional thesis chapters had already been published in journals - and had thus passed peer review - I was confident of a rubber stamp by the external thesis examiner. Since I had already been awarded a post-doctoral fellowship financed by Canada's Natural Sciences and Engineering Research Council (NSERC), I happily began preparing for a move to Paris to take it up at the Météorologie Nationale (the French weather service).

But rather than getting a nod and a wink, the unimaginable happened: my thesis was rejected! The external examiner, David Atlas (1924-2015), then at NASA, was a pioneering radar meteorologist who was involved in the then fashionable meso-scale scalebound theorizing (ch. 3). Atlas was clearly uncomfortable with the fractal material, but rather than attacking it directly, he instead claimed that while the thesis content might be acceptable, its structure was not. To his way of thinking, there were in fact two theses, not one. The first was a conventional one that had already been published, while the second was a thesis on fractal precipitation which, according to him, was unrelated to the first. The last point piqued me, since it seemed obvious that the fractals were there in an attempt to overcome longstanding problems of untamed space-time variability: on the contrary, they were very relevant to a remote sensing thesis. At that point, I panicked. According to the McGill thesis regulations, I had only two options: either I accept the referee's strictures, amputate the offending fractal material and resubmit, or I could refuse to bend.

^ee The mathematical issue is their singular small scale nature. The basic multifractal processes are cascades (box 2.1); these do not converge at mathematical points but only in the neighbourhood of points. This precludes them from being represented as geometric sets of points.
In the latter case, the thesis would be sent without change to two external referees, both of whom would have to accept it - a highly risky proposition. Although I was ready to defend the fractal material, I knew full well that it had not received serious critical attention.

^ff She was a pioneering electronic artist and had been working with early colour Xerox machines to develop electronic imagery before the development of personal computers: 22 Lovejoy, M. Postmodern Currents: Art and Artists in the Age of Electronic Media. (Prentice Hall College Division, 1989).
^gg My thesis (1981) was entitled "The remote sensing of rain", Physics dept., McGill University.
^hh Conventional methods are still in vogue, but over the last ten years our understanding of precipitation has been revolutionized by the application of the first satellite borne weather radar (the Tropical Rainfall Measuring Mission), which has unequivocally demonstrated that - like the other atmospheric variables - precipitation is a global scale cascade process that is distinctive primarily because its intermittency parameter is much larger than for the other fields. More on this below.
^ii My approach to rainfall modeling followed the method that Mandelbrot had used to make cloud and mountain models in his book, except that I used a variant that was far more variable (based on Levy distributions rather than the bell curve).
While the Mandelbrot set has been termed “the most complex object in mathematics”25 it is simultaneously one of the simplest, being generated by simply iterating the algorithm: “I take a number, square it, add a constant, square it, add a constant…”oo. Precisely because of this algorithmic simplicity, it is now the subject of a small cottage industry of computer geeks who superbly combine numerics, graphics, and music. The You Tube is replete with examples; the last time I looked, the record-holder displayed a zoom by a factor of over 104000 (a one with four thousand zeroes)! The Mandelbrot set may be easy to generate, but it is hardly easy to understand. To understand the scaling, fractal idea, consider instead the simplest (and historically the first) fractal, the “perfect” Cantor set (fig. 2.6). Start with a segment one unit long (infinitely thin: this is mathematics!); the “base”. Then remove the middle 1/3, this is the “motif” (second from the top in the figure). Then iterate by removing the middle third of each of the two 1/3 long segments from the previous. Continue iterating so that at every level, one removes all the middle segments before moving to the next level. When this is repeated to infinitely

^jj Twenty five years later, I met up with Atlas, by then in his 80's but still occupying an office at NASA. His rejection of my thesis had been a fatherly act intended to steer me back into mainstream science. During our discussion, he was mostly intrigued that I was still pursuing the material he had rejected so long ago!
^kk Within two months of the start of my post-doc, Gilet was given a high level administrative position and essentially withdrew from research. As a free agent, I soon started collaborating with Daniel Schertzer in the newly formed turbulence group.
^ll The paper caused a stir; since then it has been cited nearly a thousand times.
^mm The FSP model was an extension and improvement of the Levy fault model that I had developed during my PhD thesis, but it was nevertheless still mono- (not multi-) fractal.
^nn The small versions are actually slightly deformed versions of the larger ones.
^oo To get an interesting result, the constant should be a complex number (i.e. one that involves the square root of minus one).

To understand the scaling, fractal idea, consider instead the simplest (and historically the first) fractal, the "perfect" Cantor set (fig. 2.6a). Start with a segment one unit long (infinitely thin: this is mathematics!): the "base". Then remove the middle 1/3 to obtain the "motif" (second from the top in the figure). Then iterate by removing the middle third of each of the two 1/3-long segments from the previous level, and continue so that at every level one removes all the middle segments before moving to the next. When this is repeated down to infinitely small segments, the result is the Cantor set^26,pp. From the figure, we can see that if either the left or right half of the set is enlarged by a factor of three, one obtains the same set. This property - that a part is in some way similar to the whole - is for obvious reasons called "self-similarity". In this case, the left and right halves are identical to the whole; in atmospheric applications, the relationship between a part and the whole will generally be statistical: small parts will only be the same as the whole on average.

The Cantor set has many interesting properties, the main one for our purposes being its fractality, a consequence of its self-similarity. Let's consider it a little more closely. After n construction levels, the number of segments is N = 2^n, and the length of each segment is L = (1/3)^n. Therefore, N and L are related by a power law: eliminating the level n, we find N = L^(-D), where D = log2/log3 = 0.63… D is the fractal dimension. In this case, it is called the "box counting" dimension, since - if we considered a fully formed Cantor set - the number of segments of length L (one dimensional "boxes") that we would need to cover the set would be the same^qq N.

If the previous fractal dimension seems a bit weird, consider what happens if we apply the same reasoning to the entire initial (one dimensional) segment: we can check that we do indeed recover D = 1. Suppose that we did not remove the middle third (we kept the original segment) but analysed it in the same way. We would still divide by 3 at every iteration, so that as before L = (1/3)^n, but now the number of segments is N = 3^n instead of 2^n. This leads to D = log3/log3 = 1, simply confirming that the segment does indeed have the usual dimension of a line (one).

When a quantity such as N changes in a power law manner with the scale L, it is called "scaling", so that N = L^(-D) is a scaling law. Contrary to a scalebound process that changes its mechanism (its "laws") every factor of 10 or so, a unique scaling law may hold over a wide range of scales - for the Cantor set and other mathematical fractals, over an infinite range. Of course, real physical fractals can only be scaling over finite ranges of scale; there is always a smallest and largest scale beyond which the scaling is no longer valid. Why does a power law imply scaling (and vice versa)?
The answer is simply that if N = L^(-D) and we zoom in by a factor λ (so that L → L/λ), then N → N λ^D, so that the form of the law is unchanged. For a scalebound process, changing scales by zooming would give us something quite different. Whenever there is a scaling law, there is something that doesn't change with scale - something that is scale invariant: in the previous example it is the fractal dimension D. No matter how far we zoom into the Cantor set, we will always recover the same value D. Self-similarity is a special case of scale invariance; it occurs when - as its name suggests - some aspect of the system is unchanged under an ordinary blow-up. In physics, quantities such as energy that are invariant (conserved) under various transformations are of fundamental importance; hence the significance of exponents such as fractal dimensions, which are invariant under scale transformations. More generally, a system can be invariant under more generalized "zooms", i.e. blow-ups combined with stretchings, rotations or other transformations. As an example, let's return to the Weierstrass function, which is scale invariant but not self-similar. To show that it is indeed scale invariant, we must combine a blow-up with a squashing - or alternatively, blow up by different factors in each of the coordinate directions. This property is shown in fig. 2.2a by comparing the full Weierstrass function on the interval between 0 and 1 with the upper right panel, which shows the left half (omitting the lowest frequency^rr) stretched in the horizontal direction by a factor 2 and in the vertical direction by the factor 2^H = 1.26. Objects that are scale invariant only after being blown up by different factors in perpendicular directions are called "self-affine"; the Weierstrass function is thus self-affine. Scale invariance is still more general than this, as we discuss at length in the next chapter.

On the other hand, in the infinitely small limit, the Cantor set is simply a collection of disconnected points (Mandelbrot calls such sets "dusts")^ss, and a mathematical point has dimension zero^tt. The Cantor set is thus an example of a set whose fractal dimension, 0.63…, lies between 0 and 1; D quantifies the extent to which the set fills the line. Sets of points with such in-between (usually noninteger) dimensions are fractals^uu. More generally, for the purposes of this book, a fractal is a geometric set of points that is scale invariant^vv.

^pp It was apparently discovered a bit earlier, by H. J. S. Smith in 1874.
^qq The box counting dimension is (almost always) the same as the Hausdorff dimension that is sometimes used in this context.
^rr If we don't remove the lowest frequency in the upper left construction, the result is only approximately self-affine; however, the construction mechanism itself is nevertheless self-affine.
^ss At some point, any connected segment would have been cut by the removal of a middle third.
^tt The familiar geometric shapes studied by Euclid - points, lines, planes, volumes - have "topological dimensions" 0, 1, 2, 3. For fractal sets, the fractal dimension and the topological dimension are generally different.
^uu Due to nontrivial mathematical issues, there are numerous mathematical definitions of dimension; a full discussion would take us too far afield.
^vv Of course, the line in the above example is scale invariant with D = 1, so according to this definition it is also a fractal. However, we generally reserve the term "fractal" for less trivial scale invariant sets.
^ww This construction, and the analogous construction based on removing middle triangles, is credited to W. Sierpinski in 1916. Mobile phone and wifi antennae have been produced using a few iterations of the Sierpinski carpet, exploiting their scale invariance to accommodate multiple frequencies. The Sierpinski triangle goes back to at least the 13th century, where it has been found in churches as a decorative motif.

As another mathematical example, consider fig. 2.6b, the Sierpinski carpet^27,ww. The figure shows the base (upper left) and the motif (upper middle), obtained by dividing the square into nine subsquares, each one third the size, and then removing the middle one; the bottom right shows the result after six iterations. Using the same approach as above, after n construction steps (levels) the number of squares is N = 8^n, and the size of each is L = (1/3)^n. Thus N = L^(-D) with D = log8/log3 = 1.89… Indeed, the Cantor set, the Sierpinski carpet and the unit segment illustrate the general result:

Number of boxes ≈ (size L)^(-D) ≈ (scale)^(-D)
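This relation is easy to check numerically for the Cantor set; a minimal box-counting sketch (the construction depth is an arbitrary illustrative choice):

```python
import numpy as np

def cantor_left_endpoints(levels):
    """Left endpoints of the 2^levels segments surviving after 'levels' iterations."""
    x = np.array([0.0])
    for _ in range(levels):
        x = np.concatenate([x / 3.0, x / 3.0 + 2.0 / 3.0])
    return x

pts = cantor_left_endpoints(12)          # 4096 segments of length 3^-12

# Cover [0,1] with boxes of size L = 3^-k and count the occupied ones.
for k in range(1, 9):
    L = 3.0**-k
    N = np.unique(np.floor(pts / L)).size
    print(f"L = 3^-{k}: N = {N:4d},  log N / log(1/L) = "
          f"{np.log(N) / np.log(1.0 / L):.3f}")   # -> D = log2/log3 = 0.631
```

Each line of output returns the same exponent: the fractal dimension D that survives the zoom.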

Just as the Cantor set has a fractal dimension D = 0.63… between 0 and 1 - between a point and a line - the value of D for the Sierpinski carpet is between 1 and 2, i.e. between a line and a plane, and it quantifies the extent to which the carpet exceeds a line while only partially filling the plane. These examples show a basic feature of fractal sets: due to their hierarchical clustering of points, they are "sparse", and their fractal dimension quantifies their sparseness.

While the number of boxes gives us information about the absolute frequency of occurrence of parts of the set of size L, it is often more useful to characterize the relative density of the boxes of size L, obtained by dividing the number of boxes needed to cover the set by the total number of possible boxes: L^(-1) for the Cantor set and L^(-2) for the Sierpinski carpet, since they are sets on the line (d = 1) and in the plane (d = 2) respectively. This ratio is their relative frequency, i.e. the probability that a randomly placed segment (d = 1) or square (d = 2) of size L will happen to land on part of the set; the ratio is L^C, where C = d − D is the "codimension" of the set. Whereas D measures absolute sparseness and frequencies of occurrence, C measures relative sparseness and probabilities. For the Cantor set, C = 1 − log2/log3 = 0.36… and for the Sierpinski carpet, C = 2 − log8/log3 = 0.11…, so that their relative sparsenesses are not so different. If I put a circle (or square) of size L at random on the Sierpinski carpet (iterated to infinitely small scales), the probability of it landing on part of the carpet is L^0.11, whereas for the Cantor set, a randomly placed segment of length L would have almost the same probability - L^0.36 - of landing on the set. In science, we're usually interested in probabilities, so that fractal codimensions are generally more useful than fractal dimensions. This example illustrates the general result:

probability ≈ (scale)^C
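This probability law can be checked by Monte Carlo. In the base-3 construction of the Sierpinski carpet, a grid box of size L = 3^-k survives exactly when no digit position has a '1' in both coordinates, so randomly chosen boxes land on the carpet with probability (8/9)^k = L^C; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
C = 2 - np.log(8) / np.log(3)        # Sierpinski carpet codimension, 0.107...

def hit_fraction(k, trials=200_000):
    """Fraction of random grid boxes of size 3^-k that intersect the carpet."""
    dx = rng.integers(0, 3, size=(trials, k))   # base-3 digits of the box corner
    dy = rng.integers(0, 3, size=(trials, k))
    return np.mean(~np.any((dx == 1) & (dy == 1), axis=1))

for k in (2, 4, 6):
    L = 3.0**-k
    print(f"L = 3^-{k}: observed probability = {hit_fraction(k):.4f}, "
          f"L^C = {L**C:.4f}")
```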


Fig. 2.6a: The Cantor set. Starting at the top with a segment one unit long (the "base"), the "motif" is obtained by removing the middle third. The operation of removing middle thirds is then iterated to infinitely small scales. The red ellipses show the property of self-similarity: the left hand half of one level, when blown up by a factor of three (X3), gives the next level up.

Fig. 2.6b: The construction of the Sierpinski carpet. The base (upper left) is transformed into the motif (upper middle) by dividing the square into nine subsquares, each one third the size, and then removing the middle square. The construction proceeds left to right, top to bottom, to the sixth iteration.


An example of a fractal set relevant to atmospheric science is the set of points where meteorological measurements are taken (fig. 2.6c). In this case, the set is sparse because the measurement stations are concentrated on continents and in richer nations. To estimate its fractal dimension, one can place circles of radius L on each station^xx (one is shown in the figure) and determine the average number of other stations within the distance L. If one repeats this operation for each radius L, averaging over all the stations, one finds that on average^yy there are L^D stations within a radius L, and that this behavior continues down to a scale of 1 km^zz. For the measuring network (fig. 2.6d), we found D = 1.75.

Even today, much of our knowledge of the atmosphere comes from meteorological stations; for climate purposes - such as estimating the evolution of the atmosphere over the last century - we must also consider ship measurements, place the data on a grid (typically 5°x5° in size) and average them over a month (e.g. fig. 2.6e). It turns out that for any given month, the set of grid points having some temperature data is similarly sparse^aaa, so that both in situ weather and climate data are fractal. An immediate consequence of a fractal network is that it will not detect sparse fractal phenomena - for example, the violent centres of storms - that are so sparse that their fractal dimensions are less than C (in this example, 0.25). By systematically missing these rare but violent events, the statistics end up being biased, a subject that we discuss in the case of macroweather in ch. 3.

This analysis shows that as we use larger and larger circles, they typically encompass larger and larger voids, so that the number of stations per square kilometer systematically decreases: the measuring network effectively has holes at all scales. This means that the usual way of handling missing data must be revised. At present, one thinks of the measuring network as a two dimensional array, although with some grid points empty. According to this way of thinking, since the earth has a surface area of about 500 million square kilometers, each of the 10,000 stations represents about 50 thousand square kilometers. This corresponds to a box about 220 kilometers on a side, so that atmospheric structures (e.g. storms) smaller than this will typically not be detected. Although it is admitted to be imperfect^bbb, the grid is therefore supposed to have a spatial resolution of 220 km. Our analysis shows that, on the contrary, the problem is one of inadequate dimensional resolution^ccc.
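The circle-counting procedure described above (a "correlation dimension" estimate; see footnote xx below) is straightforward to code. A minimal sketch using great-circle distances; lacking the station list, it is run here on uniformly scattered synthetic points, which should give D close to 2 (the real network gave D = 1.75):

```python
import numpy as np

def mean_neighbours(lat, lon, radii_km):
    """Average number of other stations within distance L of a station,
    for each radius L; the log-log slope estimates the correlation dimension."""
    R = 6371.0                                   # mean Earth radius, km
    phi, lam = np.radians(lat), np.radians(lon)
    dphi = phi[:, None] - phi[None, :]
    dlam = lam[:, None] - lam[None, :]
    a = (np.sin(dphi / 2)**2 +
         np.cos(phi[:, None]) * np.cos(phi[None, :]) * np.sin(dlam / 2)**2)
    d = 2 * R * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))   # haversine distance
    return np.array([(d < L).sum(axis=1).mean() - 1 for L in radii_km])  # minus self

rng = np.random.default_rng(0)
lat = np.degrees(np.arcsin(rng.uniform(-1, 1, 2000)))   # uniform on the sphere
lon = rng.uniform(-180.0, 180.0, 2000)
radii = np.array([100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0])
n = mean_neighbours(lat, lon, radii)
D = np.polyfit(np.log(radii), np.log(n), 1)[0]
print(f"estimated D = {D:.2f}")
```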
^xx This technique actually estimates the "correlation dimension" of the set. If instead one centres circles at points chosen at random on the earth's surface (not only on stations), then one obtains the box-counting dimension discussed above. It turns out that in general the two are slightly different; the density of points is an example of a multifractal measure. Indeed, one can introduce an infinite hierarchy of different exponents associated with the density of points.
^yy The rule L^D for the number of stations within a circle of radius L follows from the number of occupied boxes decreasing as L^(-D): since the total number of stations is fixed, the average number of stations per occupied box must grow as L^D (L^(-D) x L^D = constant).
^zz The geographical locations of the stations were only specified to the nearest kilometer, so it is possible that the curve extends to slightly smaller scales. For large L it is valid up to several thousand kilometers, which is about as much as is theoretically possible given that there are only about 10,000 stations.
^aaa Both in space and - due to data outages and ship movements - in time; the fractal dimensions and codimensions are nearly the same as for the meteorological network: 28 Lovejoy, S. & Schertzer, D. The Weather and Climate: Emergent Laws and Multifractal Cascades. (Cambridge University Press, 2013).
^bbb The techniques for filling the "holes", such as "kriging", typically also make scalebound assumptions (exponential decorrelations and the like).
^ccc When estimating global temperatures over scales up to decades, the problem of missing data does indeed dominate the other errors (although this is not the same as dimensional resolution). It dominates those associated with instrumental siting (e.g. the "heat island effect"), changing technology and other potential biases due to human influence: 29 Lovejoy, S. How accurately do we know the temperature of the surface of the earth? Clim. Dyn., doi:10.1007/s00382-017-3561-9 (2017).


Fig. 2.6c: The geographical distribution of the 9962 stations that the World Meteorological Organization listed as giving at least one meteorological measurement per 24 hours (in 1986); it can be seen that the distribution closely follows that of the land masses and is concentrated in the rich and populous countries. The main visible artificial feature is the Trans-Siberian Railroad. Also shown is an example of a circle of radius L used in the analysis. Adapted from ref. 30.

Fig. 2.6d: The average number n(L) of stations (vertical axis) within a circle of radius L (horizontal axis, in kilometers). The top straight line has slope D = 1.75, i.e. n(L) ≈ L^D. Adapted from ref. 30.


Fig. 2.6e: Black indicates the 5°x5° grid points for which there is some data in the month of January 1878 (20% of the 2592 grid points were filled). This is from the HadCRUT data set^31. Although highly deformed by this map projection, we can almost make out the South American continent (white surrounded by black, lower left) and Europe (the central upper black mass of grid points).

***

The preceding examples of fractal sets were deliberately chosen so that one could get an intuitive feel for the sparseness that the dimension quantifies. In many cases, one instead deals with sets made up of "wiggly" lines, such as the Koch curve^32 (1904) shown in fig. 2.7a^ddd; the fractal dimension can often quantify wiggliness. The construction proceeds from top to bottom by replacing each straight segment by a copy of the motif (the second curve from the top), made of four pieces, each one third of the original size. Again, after n iterations we have N = 4^n and L = (1/3)^n; hence the fractal dimension is D = log4/log3 = 1.26… In this curve, the "wiggles" have a dimension between 1 and 2; the wiggliness is quantified by D.

Note that as we proceed to more and more iterations, the length of the curve increases. Indeed, after each iteration the length increases by the factor 4/3, since each segment is replaced by four segments, each 1/3 of the previous length. Therefore after n iterations the length is (4/3)^n, which becomes infinite as n grows. If a completed Koch curve is measured with a ruler of length L (such a ruler is insensitive to wiggles smaller than this^eee), then in terms of the fractal dimension the measured length of the curve is proportional to L^(1-D). As the ruler gets shorter and shorter, it picks up more and more details, and the length increases accordingly. Since D = 1.26 > 1, the length grows as L^(-0.26), becoming infinite for rulers with small enough L.

How far can we take wiggliness? In 1890, Peano proposed the fractal construction that bears his name (fig. 2.7b). The Peano curve is made from a line that is so wiggly that - by successive iterations - it ends up literally filling the plane! At the time, this stunned the mathematical community, since it was believed that a square was two dimensional because two numbers (coordinates) are required to specify a point in it. Peano's curve allows a point to be specified instead by a single coordinate: its position along an (infinite) line wiggling its way around the square^fff.

^ddd If three Koch curves are joined in the shape of a triangle, one obtains the Koch "snowflake", which is probably more familiar.
^eee This method is sometimes called the "Richardson dividers method" after L. F. Richardson, who first used it to estimate the length of coastlines and other geographic features; see below.
^fff Note that in the infinitely small limit, at each point the Peano curve touches itself. This means that while it is a mapping of the line onto the plane, the mapping is not one to one.

We have already seen another example of wiggliness: the Weierstrass function (figs. 1.3, 2.2a), constructed by adding sinusoids with geometrically increasing frequencies and geometrically decreasing amplitudes. The Weierstrass function was originally proposed as the first example of a function whose value is everywhere well defined (it is continuous) but which does not have a tangent anywhere. A visual inspection (fig. 2.2a) shows why this is so: to determine the tangent, we must zoom in to find a smooth enough part over which to estimate the slope, and since the Weierstrass function is a fractal, we zoom in forever without finding anything smooth.

An atmospheric example of a wiggly curve is the perimeter of a cloud, defined for example on a cloud photograph by the lines separating regions brighter or darker than a fixed brightness threshold. In order to estimate the fractal dimension of a cloud perimeter, we could try to measure it with rulers of different lengths and use the fact that the measured perimeter increases as L^(1-D) (since D > 1). It turns out to be more convenient to use fixed resolution satellite or radar images and many clouds of different sizes. If we ignore any holes in the clouds^ggg, and if the perimeters all have the same fractal dimension, then the areas (A) turn out to be related to the perimeters^hhh (P) by P = A^(D/2). Fig. 2.8 shows this technique applied to real cloud and rain areas. Although various theories were later developed to explain the empirical dimension (D = 1.35), the most important implication of this figure is that it gave the first modern evidence of the complete failure of Orlanski's scalebound classification. Had Orlanski's classification been based on real physical phenomena, each different and acting over narrow ranges of scale, then we would expect a series of different slopes, one for each of his ranges. The expectation that the behaviour would be radically different over different scale ranges was especially strong as concerns the meso-scale - the horizontal range from about 1 to 100 kilometers - where it was believed that the atmospheric thickness would play a key role in changing the behaviour (the "meso-scale gap", see ch. 3).

Before this, the only other quantitative evidence for wide range atmospheric scaling was from various empirical tests of Richardson's 4/3 law of turbulent diffusion^33; fig. 2.8 (left) shows his original verification, using notably data from pilot balloons and volcanic ash^iii. The atmospheric 4/3 power law has since been repeatedly confirmed^jjj, with theorists invariably complaining that it extends beyond the range for "which it can be justified theoretically"^kkk.
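The area-perimeter analysis is simple to carry out on gridded binary images. The sketch below (assuming numpy and scipy are available) uses a thresholded, power-law filtered noise field as a stand-in for real radar or satellite data; the filter exponent and threshold are arbitrary illustrative choices:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Synthetic "cloud" field: power-law filtered noise, then thresholded.
N = 512
k2 = np.fft.fftfreq(N)[:, None]**2 + np.fft.fftfreq(N)[None, :]**2
k2[0, 0] = 1.0                                     # avoid dividing by zero
field = np.fft.ifft2(np.fft.fft2(rng.standard_normal((N, N))) * k2**-0.75).real
mask = field > np.quantile(field, 0.6)

labels, n_regions = ndimage.label(mask)            # connected cloud "areas"
padded = np.pad(labels, 1)

# Perimeter of each region: count pixel edges that face a different label.
perimeter = np.zeros(n_regions + 1)
for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
    neigh = np.roll(padded, shift, axis=(0, 1))
    boundary = (padded != neigh) & (padded > 0)
    np.add.at(perimeter, padded[boundary], 1)
area = np.bincount(labels.ravel(), minlength=n_regions + 1)

big = area[1:] > 20                                # ignore the tiniest regions
A, P = area[1:][big], perimeter[1:][big]
slope = np.polyfit(np.log(P), np.log(A), 1)[0]     # A ~ P^(2/D)
print(f"area ~ perimeter^{slope:.2f}  =>  D = {2.0 / slope:.2f}")
```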
jjj For example, in a later paper, Richardson tested his law in the ocean using imaginative means, including bags of parsnips that he watched diffusing from a bridge. However, some controversy about the law was generated by a large scale balloon experiment called EOLE (1974) that claimed to have indirectly invalidated it: 35 Morel, P. & Larchevêque, M. Relative dispersion of constant level balloons in the 200 mb general circulation. J. of the Atmos. Sci. 31, 2189-2196 (1974). However, the original interpretation was shown to be wrong by: 36 Lacorta, G., Aurell, E., Legras, B. & Vulpiani, A. Evidence for a k^-5/3 spectrum from the EOLE Lagrangian balloons in the lower stratosphere. J. of the Atmos. Sci. 61, 2936-2942 (2004). And even this reinterpretation was incomplete (!), see appendix 6A of: 28 Lovejoy, S. & Schertzer, D. The Weather and Climate: Emergent Laws and Multifractal Cascades. (Cambridge University Press, 2013). It seems that Richardson was indeed right! 37 Richardson, L. F. & Stommel, H. Note on eddy diffusivity in the sea. J. Met. 5, 238-240 (1948).
kkk Meaning that it cannot be accounted for by the dominant three dimensional isotropic homogeneous turbulence theory, see e.g. p. 557 of: 38 Monin, A. S. & Yaglom, A. M. Statistical Fluid Mechanics. (MIT press, 1975).
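Returning to the area-perimeter relation: it lends itself to an equally compact numerical sketch. The following Python fragment (my illustration; the data are synthetic stand-ins for the radar and satellite images) estimates D as twice the slope of log P versus log A:

```python
import numpy as np

# Sketch of the area-perimeter method: P ~ A^(D/2) implies
# log P = (D/2) log A + const, so D is twice the log-log slope.
def perimeter_dimension(areas, perimeters):
    slope, _ = np.polyfit(np.log(areas), np.log(perimeters), 1)
    return 2 * slope

# Synthetic stand-in data built with D = 1.35 (the empirical cloud/rain value):
rng = np.random.default_rng(0)
A = 10 ** rng.uniform(0, 6, 500)                    # areas over 6 orders of magnitude
P = A ** (1.35 / 2) * rng.lognormal(0.0, 0.1, 500)  # scatter about the power law
print(perimeter_dimension(A, P))                    # ≈ 1.35
```

If the clouds were instead scalebound, with different mechanisms in different ranges, the log-log plot would break into segments with different slopes rather than follow a single line.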

Fig. 2.7a: Left: A fractal Koch curve32 (1904), reproduced from Welander39 (1955), who used it as a model of the interface between two parts of a turbulent fluid.

Fig. 2.7b: Left: the first three steps of the original Peano curve, showing how a line (dimension 1) can literally fill the plane (dimension 2). Right: a variant reproduced from Steinhaus40 (1960), who used it as a model for a hydrographic network, illustrating how streams can fill a surface.


[Fig. 2.8 image: right panel axes span perimeters from 1 km to 10^4 km (horizontal) and areas from (1 km)^2 to (1000 km)^2 (vertical).]

Fig. 2.8: The left shows Richardson's proposed 4/3 law of turbulent diffusion33lll, which includes a few estimated data points. Right: the area-perimeter relation for radar detected rain areas (black) and infrared satellite cloud images (open circles); the perimeter is on the horizontal axis, the area on the vertical axis. The slope corresponds to D = 1.35. The meso-scale (roughly 1 to 100 km) is shown by the red brackets: nothing special. Adapted from ref. 23.

***
We have discussed several of the famous 19th century fractals: the Cantor set (fig. 2.7a), the first set with a noninteger dimension; the Peano curve (fig. 2.7b), the first line that could pass through every point in the unit square (a plane); and the Weierstrass function (fig. 2.2a), the first continuous curve without a tangent anywheremmm. But these were considered to be essentially mathematical constructions: academic oddities without physical relevance. Mandelbrot provocatively called them "monsters". Mandelbrot not only coined the term "fractal", but with his indefatigable energy put fractals squarely on the scientific map. Although he made numerous mathematical contributionsnnn, his most important role was as a towering pioneer in applying fractals and scaling to the real world. In this regard, his only serious scientific precursor was Lewis Fry Richardsonooo (1881-1953).

lll The quality of the figure is low; thanks to Monin, it is already an improved version: 41 Monin, A. S. Weather forecasting as a problem in physics. (MIT press, 1972).
mmm It is a nowhere differentiable function.
nnn I will let the mathematicians judge his contributions to mathematics. However, there is no question that Mandelbrot's contribution to science has been monumental and underrated. In any case (and in spite of Mandelbrot's efforts!), it is still early to evaluate his place in the history of science. Interested readers may consult his autobiography, "The Fractalist", published posthumously: 42 Mandelbrot, B. B. The Fractalist. (First Vintage Books, 2011).
ooo Other notable precursors were Jean Perrin (1870-1942), who questioned the differentiability of the coast of Brittany:

Due to his Quaker beliefs, Richardson was a pacifist, and this made his career difficult, essentially disqualifying him from academic positions. He instead joined the Meteorology Office, but temporarily quit it in order to drive an ambulance during the First World War. Afterwards he rejoined the Meteorology Office, but resigned in 1920 when it was merged into the Air Ministry and militarized. Richardson worked on a range of topics and is eponymously remembered for the nondimensional Richardson number that characterizes atmospheric stability, the Richardson 4/3 law (fig. 2.8, left), the Modified Richardson Iteration and Richardson Acceleration techniques of numerical analysis, and the Richardson dividers method. The latter is a variant on box-counting (discussed above) that he notably used to estimate the length of the coastline of Britain, demonstrating that it followed a power law. Mandelbrot's famous 1967 paper that initiated fractals, "How long is the coastline of Britain? Statistical self-similarity and fractional dimension"45ppp, took Richardson's graphs and interpreted the exponent in terms of a fractional dimensionqqq.

Richardson was fully aware of the problem of conceptualizing wide range atmospheric variability: he was the first to explicitly propose that the atmosphere might be fractal. A remarkable subheading in his 1926 paper on turbulent diffusion is entitled "Does the wind possess a velocity?", followed by the statement: "this question, at first sight foolish, improves upon acquaintance". He then suggested that a particle transported by the wind might have a Weierstrass function-like trajectory, implying that its speed (tangent) would not be well defined.

Richardson is unique in that he straddled the two main - and superficially opposing - threads of atmospheric science: the low level deterministic approach and the high level statistical turbulence approach. Remarkably, he was a founding figure for both. His seminal book "Weather prediction by numerical process46" rrr (1922) inaugurated the era of numerical weather prediction. In it, Richardson not only wrote down the modern equations of atmospheric dynamics, but pioneered numerical techniques for their solution; he even laboriously attempted a manual integrationsss. Yet this work also contained the seed of an alternative: buried in the middle of a paragraph, he slyly inserted the now iconic poem describing the cascade idea: "Big whirls have little whirls that feed on their velocity, little whirls have smaller whirls and so on to viscosity (in the molecular sense)"ttt.
This was soon followed by the first turbulence law: the Richardson 4/3 law of turbulent diffusion33, today celebrated as the starting point for modern theories of turbulence, including the key ideas of cascades and scale invariance. Unencumbered by later notions of the meso-scaleuuu, and with remarkable prescience, he even proposed that his scaling law could hold from dissipation scales up to planetary scales (fig. 2.8, left), a hypothesis confirmed 35 years ago by the area-perimeter analysis, and since then by the large body of results discussed in the chapters below.

43 Perrin, J. Les Atomes. (NRF-Gallimard, 1913). and Hugo Steinhaus (1887-1972), who questioned the integrability of the length of the river Vistula: 44 Steinhaus, H. Length, Shape and Area. Colloquium Mathematicum III, 1-13 (1954). Lack of differentiability and integrability are typical scaling features and are discussed again in ch. 7.
ppp This was nearly a decade before Mandelbrot coined the word "fractal".
qqq Above we saw that the length of the cloud perimeter varies as L^(1-D), where L is the length of the ruler and D is the fractal dimension.
rrr Lacking support, he paid for the publication out of his own pocket.
sss Near the war's end, he somehow found six weeks to attempt a manual integration of the weather equations. His estimate of the pressure tendency at a single grid point in Europe turned out to be badly wrong (as he admitted), but the source of the error was only recently identified; see the fascinating account by Lynch: 47 Lynch, P. The emergence of numerical weather prediction: Richardson's Dream. (Cambridge University Press, 2006).
ttt This poem was a parody of a nursery rhyme, the "Siphonaptera": "Big fleas have little fleas, Upon their backs to bite 'em, And little fleas have lesser fleas, and so, ad infinitum."
uuu That predicted a strong break in the scaling, see ch. 3.

Today, Richardson is honoured both as the father of numerical weather prediction by the Royal Meteorological Society's Richardson prize and as the grandfather of turbulence by the European Geosciences Union's Richardson medalvvv. As a humanist, Richardson worked to prevent war, his "The problem of contiguity: an appendix of statistics of deadly quarrels"48 founding the mathematical (and nonlinear!) study of war. He was also anxious that his research be applied directly to improve the situation of humanity, and proposed that a vast "weather factory" be built, employing tens of thousands of human "computers" in order to make real time weather forecasts! Recognizing (from personal experience) the tedium of manual computation, he foresaw the need for the factory to include social and cultural amenities.

Let me now explain a deep consequence of Richardson's cascade idea, one that didn't fully mature until the nonlinear revolution of the 1980's. We have seen that the alternative to scalebound thinking is scaling thinking, and that fractals embody this idea for geometric sets of points: the Koch curve was a model of a turbulent interface (the set of points bounding two different regions of a fluid), the Peano curve a model of a hydrographic network. However, in order to apply fractal geometry to the set of bounding (perimeter) points, we were already faced with a problem: we had to reduce the grey shades to white or black (cloud or no cloud). Since atmospheric science does not often deal with black/white sets, but rather with fields such as the cloud brightness or temperature that have numerical values everywhere in space and that vary in time, something new is necessary.

(Re)consider fig. 1.5, the aircraft temperature transect. One could repeat the treatment of the Weierstrass function and try to fit the transect into the framework of fractal sets by simply considering the points on the top graph as the set of interest. But this turns out to be a bad idea: as we also saw in fig. 1.5 (bottom), the figure was actually hiding some incredibly variable, spiky (intermittent) changes, and this behaviour requires something new to handle it: multifractalswww.
Indeed, multifractals were first discovered precisely as models of such turbulent intermittencyxxx. Focus on the bottom of fig. 1.5, the "spikes". Rather than treating all the points on the graph as a wiggly fractal set, instead consider the set of points that exceed a fixed threshold, for example those above the level of one standard deviation, as indicated by the horizontal line in the figure: a kind of Cantor set. If the spikes are scale invariant, then this set will be a fractal with a certain fractal dimension. Now move the horizontal line a little higher to define a different set: we find that the fractal dimension of this new set is lower. Indeed, moving to higher and higher levels in this way, we could determine the fractal dimension of each of the level sets, thus completely characterizing the set of spikes by an infinite number of fractal dimensions. The absolute temperature changes (the spikes) - and indeed the temperature transect itself - are thus multifractals. It turns out that multifractals are naturally produced by cascade processes that are physical models of the concentration of energy and other turbulent fluxes into smaller and smaller regions; interested readers can find more information in box 2.1.

vvv The highest honour of the Nonlinear Processes division.
www Mathematically, the nontrivial point is that whereas the Weierstrass function is continuous, i.e. well defined at each instant t (a mathematical point on the time axis), a multifractal only converges in the neighbourhood of an instant: in order to converge, the multifractal must be averaged over a finite interval. This is the origin of the "dressed" properties that are related to the extreme events discussed in ch. 6.
xxx It was actually a little more complicated than that: the key multifractal formula independently appeared in three publications in 1983, one in turbulence and the other two in the field of deterministic chaos: 49 Schertzer, D. & Lovejoy, S. in IUTAM Symp. on turbulence and chaotic phenomena in fluids. (ed T. Tasumi) 141-144. 50 Grassberger, P. Generalized dimensions of strange attractors. Physics Letters A 97, 227 (1983). 51 Hentschel, H. G. E. & Procaccia, I. The infinite number of generalized dimensions of fractals and strange attractors. Physica D 8, 435-444 (1983). While the turbulence publication was admittedly only a conference proceeding, the debate about the priority of discovery was soon overshadowed by Mandelbrot's claim to be the "father of multifractals": 52 Mandelbrot, B. B. Multifractals and Fractals. Physics Today 39, 11, doi:10.1063/1.2815135 (1986). Soon after the initial discovery of multifractals, a major contribution was made by Parisi and Frisch, who were also the first to coin the term "multifractal": 53 Parisi, G. & Frisch, U. in Turbulence and predictability in geophysical fluid dynamics and climate dynamics (eds M. Ghil, R. Benzi, & G. Parisi) 84-88 (North Holland, 1985). Recognizing the importance of multifractals, Mandelbrot subsequently spent a huge effort claiming their paternity. Ironically, Steven Wolfram, in his review of Mandelbrot's posthumous autobiography "The Fractalist", complained that Mandelbrot had "diluted" the fractal concept by insisting on multifractals: 54 Wolfram, S. in Wall Street Journal (2012).
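The level-set idea translates directly into an analysis one can try numerically. Here is a minimal sketch in Python (my illustration; the series, thresholds and box sizes are arbitrary stand-ins) that thresholds a series and box-counts the exceedance set at several resolutions:

```python
import numpy as np

# Box-counting dimension of the set of points exceeding a threshold:
# cover the record with non-overlapping boxes of size s, count the occupied
# ones; scaling N(s) ~ s^(-D) gives D as minus the log-log slope.
def levelset_dimension(series, threshold, scales=(1, 2, 4, 8, 16, 32)):
    exceed = np.abs(series) > threshold
    counts = []
    for s in scales:
        n = len(exceed) - len(exceed) % s
        boxes = exceed[:n].reshape(-1, s)            # boxes of size s
        counts.append(np.count_nonzero(boxes.any(axis=1)))
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return -slope

spikes = np.random.default_rng(1).standard_normal(4096)  # stand-in for the spikes
for k in (1.0, 1.5, 2.0):  # thresholds in standard deviations
    # For this toy series the values are finite-range estimates, but they
    # illustrate the idea: the dimension shrinks as the level rises.
    print(k, levelset_dimension(spikes, k))
```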
Mathematically, fractals are scale invariant geometric sets of points: they are black or white, you are either on the set or off it. In contrast, multifractals are scale invariant fields: like the temperature, they have numerical values at each point in space, at each instant in time.

2.3 Fluctuations as a microscope

When confronted with variability over huge ranges of space and time scales, we have argued that there are two extreme, opposing ways of conceptualizing it. We can either assume that everything changes as we move from one range of scales to another (every factor of ten or so), or on the contrary assume that, at least over wide ranges (factors of a hundred, a thousand or more), blowing up gives us something that is essentially the same. But this is science; it shouldn't be a question of ideology. If we are given a temperature series or a cloud image, how can we analyse the data to distinguish the two, to tell which is correct? We have already introduced two methods: spectral analysis, which is quite general, and the area-perimeter relation, which is rather specialized. While spectral analysis is a powerful technique, its interpretation is not so simple - indeed, had the interpretations been obvious, we would never have missed the quadrillion, and the distinction between macroweather and the climate would have been clarified long ago! It is therefore important to use an analysis technique that is both easy to apply and easy to understand, a kind of analytic microscope that allows us to zoom in and systematically compare a time series or a transect at different scales, directly testing the scalebound versus scaling alternatives: fluctuation analysis.

We probably all have an intuitive idea of what a fluctuation is: in a time series, it is the change in the value of the series over an interval of time. Consider a temperature series; we are interested in how much the temperature has fluctuated over an interval of time Δt. The simplest fluctuation is simply the difference between the temperature now and at a time Δt earlier (fig. 2.9a, top). This is indeed the type of fluctuation that has traditionally been used in turbulence theory, and it was used in the first attempt to test the scaling hypothesis on climate data. In order to make fig. 2.10, I started with two instrumental series: the Manley series from central England starting in 1659 (open circles) and an early northern hemisphere series starting in 1880 (black circles); the former is essentially local, the latter global in scale. The other series were from early paleo isotope series as indicated in the caption, using the official calibrations into temperature valuesyyy.

yyy Long before the internet, scanners and publicly accessible data archives, as a post-doc at the Météorologie Nationale in Paris, I recall taking the published graphs, making enlarged photocopies, and then using tracing paper to painstakingly digitize them.

In order to make the graph, for a given time interval Δt one systematically calculates all the non-overlapping differences in each series and averages their squares; the "typical value" shown in the plot is the square root of this (i.e. the standard deviation of the differences). One then plots the results in logarithmic coordinates, since in that case scaling appears as a straight line and can easily be identified. Reading the graph, one can see for example that over 10 year intervals the typical northern hemisphere temperature change is about 0.2 °C, and that at about 50,000 years the typical temperature difference is about 6 °C (about ±3 °C); this corresponds to the typical difference of temperature between glacials and interglacials, hence the box (which allows for some uncertainty) is the "glacial-interglacial window". The fluctuations are therefore straightforward to understand. In fig. 2.10, a reference line with slope H = 0.4 is shown, corresponding to the scaling behaviour ΔT ≈ Δt^H and linking hemispheric temperature variations at ten years to paleo variations at hundreds of thousands of years. Although this basic picture is essentially correct, later work provided a number of nuances that help to explain why things were not fully cleared up until much later. Notice in particular the two essentially flat sets of points in the figure: one from the local central England temperature up to roughly three hundred years, the other from an ocean core that is flat at scales of 100,000 years and longer. It turns out that the flatness is an artefact of the use of differences in the definition of fluctuations: we need something a bit better.

Before continuing, let us recall the scaling laws that we have introduced up until now:

Spectrum ≈ (frequency ω)^(-β) ≈ (scale)^β
Number of boxes ≈ (size L)^(-D) ≈ (scale)^(-D)
Probability ≈ (size L)^C ≈ (scale)^C
Fluctuations ≈ (interval Δt)^H ≈ (scale)^H

where β is the spectral exponent, D the fractal dimension of a set, C the codimension and H the fluctuation exponentzzz. A nonobvious problem with defining fluctuations as differences is that, on average, differences cannot decrease with increasing time intervalsaaaa. This means that whatever the value of H, positive or negative, difference fluctuations cannot decrease, so that whenever H is negative they simply give a roughly constant resultbbbb: the flat parts of fig. 2.10. But do regions of negative H exist? One way to investigate is to infer H from the spectrum, which does not suffer from an analogous restriction: its exponent β can take any value. In this case there is an approximate formulacccc: β = 1 + 2H. This formula implies that negative H corresponds to β < 1, and a check on the spectrum (fig. 2.3a) indicates that several regions are indeed flat enough to imply negative H. How do we fix the problem and estimate the correct H when it is negative?
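To see the failure concretely, here is a minimal sketch in Python (my illustration; the series are synthetic stand-ins for the data) of the difference structure function described above. Applied to a series with H = +1/2 it recovers the exponent; applied to one with H = -1/2 it saturates:

```python
import numpy as np

# RMS difference structure function: for each lag dt, average the squared
# non-overlapping differences and take the square root. In log-log
# coordinates, scaling appears as a straight line of slope H.
def rms_differences(T, lags):
    return np.array([np.sqrt(np.mean((T[dt::dt] - T[:-dt:dt]) ** 2)) for dt in lags])

rng = np.random.default_rng(2)
lags = np.array([1, 2, 4, 8, 16, 32, 64, 128])
brownian = np.cumsum(rng.standard_normal(100_000))  # a series with H = +1/2
white = rng.standard_normal(100_000)                # a series with H = -1/2
for name, T in (("H=+1/2", brownian), ("H=-1/2", white)):
    slope, _ = np.polyfit(np.log(lags), np.log(rms_differences(T, lags)), 1)
    print(name, round(slope, 2))  # recovers +0.5, but saturates near 0 when H < 0
```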

zzz The symbol H is used in honour of Edwin Hurst, who discovered the "Hurst effect" - the long range memory associated with scaling in hydrology - by examining ancient records of Nile flooding: 55 Hurst, H. E. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770-808 (1951). It turns out that the fluctuation exponent is in general not the same as Hurst's exponent; they are only the same if the data follow the bell curve… which they rarely do! This distinction has caused much confusion in the literature.
aaaa This is true for any series whose correlations decrease with Δt (as physically relevant series always do).
bbbb D and C cannot be negative, so this problem does not arise for them.
cccc Valid if we ignore intermittency.

It took a surprisingly long time to clarify this issue. To start with, the turbulence community - who had been the first to use fluctuations as differences and who, seventy five years ago, introduced the first H as the exponent in Kolmogorov's famous lawdddd (H = 1/3, ch. 3, 4) - had many convenient theoretical results for difference fluctuations. In classical turbulence all the H's are positive, so the restriction to positive H was not a problem. Later, in the wake of the nonlinear revolution of the 1980's, mathematicians invented an entire mathematics of fluctuations called "wavelets"eeee. Although difference fluctuations are technically wavelets, mathematicians mock them, calling them the "poor man's wavelet" and promoting more sophisticated ones. Wavelets turned out to have many beautiful mathematical properties, often with colourful names such as the "Mexican Hat", the "Hermitian Hat" or the "Cohen-Daubechies-Feauveau wavelet". For mathematicians, the fact that physical interpretations were not evident was irrelevant. The mastery of wavelet mathematics also required a fair intellectual effort, and this further limited their application.

This was the situation in the 1990's, when scaling started to be applied to geo time series involving negative H (essentially to any macroweather series, although at the time this was not at all clear). It fell to a statistical physicist, Chung-Kang Peng, to develop a technique valid for H < 0, which he applied to biological series: the Detrended Fluctuation Analysis (DFA) methodffff56. At this time, another part of the scaling community (including my colleagues and I) was focusing on multifractality and intermittency; since these issues didn't involve negative H, the problem was ignored. Over the following nearly two decades there were thus several more or less independent strands of scaling analysis, each with its own mathematical formalism and interpretations: the wavelet community, dealing with fluctuations directly but unconcerned with simplicity of interpretation; the DFA communitygggg, wielding a somewhat complex method that could nevertheless be readily implemented numerically and didn't require much theoretical baggagehhhh; and the turbulence community, focused on multifractal intermittency. In the meantime, mainstream geoscientists continued to use spectral analysis, but without insisting much on the interpretation of the results.

Ironically, the impasse was broken by the first wavelet, one that Alfréd Haar (1885-1933) had introduced in 1910, even before the wavelet formalism had been invented57. The Haar fluctuation is beautiful for two reasons: the simplicity of its definition and calculation, and the simplicity of its interpretation58. To determine the Haar fluctuation over a time interval Δt, one takes the average of the first half of the interval and subtracts the average of the second half (fig. 2.9a, b). That's itiiii! As for the interpretation, it is easy to show that when H is positive, the Haar fluctuation is (nearly) the same as a difference, whereas whenever H is negative, we not only recover its correct valuejjjj, but the fluctuation itself can be interpreted as an "anomaly"kkkk.
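Since the definition really is that simple, the calculation fits in a few lines. The following minimal sketch (again my own Python illustration, with a synthetic stand-in series) recovers the negative exponent that differences miss:

```python
import numpy as np

# RMS Haar fluctuation: over each interval of length dt, subtract the mean
# of the first half from the mean of the second half. Useful for -1 < H < 1.
def rms_haar(T, lags):
    S = []
    for dt in lags:                        # each dt must be even
        n = len(T) - len(T) % dt
        blocks = T[:n].reshape(-1, dt)     # non-overlapping intervals of length dt
        half = dt // 2
        fluct = blocks[:, half:].mean(axis=1) - blocks[:, :half].mean(axis=1)
        S.append(np.sqrt(np.mean(fluct ** 2)))
    return np.array(S)

white = np.random.default_rng(3).standard_normal(100_000)  # a series with H = -1/2
lags = np.array([2, 4, 8, 16, 32, 64, 128, 256])
H, _ = np.polyfit(np.log(lags), np.log(rms_haar(white, lags)), 1)
print(round(H, 2))  # ≈ -0.5: the exponent that difference fluctuations cannot recover
```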

dddd Kolmogorov's law was actually very close to Richardson's 4/3 law: the 4/3 exponent was H + 1.
eeee Although wavelets can be traced back to Alfréd Haar (1910, see below), the field really took off in the early 1980's with the continuous wavelet transformation of Alex Grossman and Jean Morlet.
ffff The key innovation was simply to first sum the series, effectively adding one to the value of H so that in most cases (as long as H > -1) the result became positive, allowing the usual difference and difference-like fluctuations to be applied.
gggg At last count, Peng's original paper had more than 2000 citations, an astounding number for a highly mathematical paper.
hhhh The DFA method estimates fluctuations by the standard deviation of the residuals of a polynomial fit to the running sum of the series. The interpretation is so obscure that typical plots do not even bother to use units for the fluctuation amplitudes.
iiii I recall a comment from a referee of a paper in which I explained the Haar fluctuation using the same words. Expecting something complicated, he complained that he didn't understand the words and instead wanted an equation!
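For comparison, here is an equally short sketch of the DFA recipe summarized in footnote hhhh (my own hypothetical rendering, using a linear fit in each window); consistent with footnote ffff, the initial running sum adds one to the exponent:

```python
import numpy as np

# DFA: sum the series, split the sum into windows, remove a polynomial
# (here linear) fit from each window, and use the standard deviation of
# the residuals as the fluctuation at that window size.
def dfa_fluctuations(T, windows, order=1):
    walk = np.cumsum(T - T.mean())             # the "running sum" of the series
    F = []
    for w in windows:
        n = len(walk) - len(walk) % w
        segments = walk[:n].reshape(-1, w)
        x = np.arange(w)
        residuals = [seg - np.polyval(np.polyfit(x, seg, order), x) for seg in segments]
        F.append(np.sqrt(np.mean(np.square(residuals))))
    return np.array(F)

white = np.random.default_rng(4).standard_normal(100_000)  # H = -1/2 again
windows = np.array([8, 16, 32, 64, 128, 256])
alpha, _ = np.polyfit(np.log(windows), np.log(dfa_fluctuations(white, windows)), 1)
print(round(alpha - 1, 2))  # subtracting the 1 added by the sum gives H ≈ -0.5
```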

[Fig. 2.9a image: top panel, "Difference fluctuations" (weather regime, H > 0, Δt < 10 days); bottom panel, "Anomaly fluctuations" (macroweather regime, H < 0, 10 days < Δt < 300 yr).]

Fig. 2.9a: Schematic illustration of difference (top) and anomaly (bottom) fluctuations for a multifractal simulation of the atmosphere in the weather regime (0 ≤ H ≤ 1, top) and in the lower frequency macroweather regime (bottom). Notice the "wandering" and "cancelling" behaviours.

jjjj The Haar fluctuation is only useful for H in the range -1 to 1, but this turns out to cover most of the series that are encountered in geoscience.
kkkk In this context, an anomaly is simply the average over a segment of length Δt of the series with its long term average first removed.


Fig. 2.9b: Schematic illustration of Haar fluctuations (useful for processes with -1≤H≤1). The Haar fluctuation over the interval Δt is the mean of the first half subtracted from the mean of the second half of the interval Δt.


[Fig. 2.10 image: RMS difference S(Δt) (0.1 to 10 °C, vertical) versus Δt (1 to 10^6 years, horizontal).]

Fig. 2.10: The RMS difference structure function estimated from local (central England) temperatures since 1659 (open circles, upper left), northern hemisphere temperatures (black circles), and paleo temperatures from Vostok (Antarctic, solid triangles), Camp Century (Greenland, open triangles) and an ocean core (asterisks). For the northern hemisphere temperatures, the (power law, linear on this plot) climate regime starts at about 10 years. The rectangle (upper right) is the "glacial-interglacial window" through which the structure function must pass in order to account for typical variations of ±2 to ±3 K for cycles with half periods ≈ 50 kyrs. Reproduced from ref. 9.

2017-03-23 3:42 pm 31

[Fig. 2.11a image: RMS Haar fluctuation ΔT (0.2 to 20 °C, vertical) versus time scale Δt (10^-4 to 10^9 yrs, horizontal), spanning the weather, macroweather, climate, macroclimate and megaclimate regimes; reference lines have slopes 0.4, -0.4 and -0.7.]

Fig. 2.11a: A composite of typical Haar fluctuationsllll from (daily and annually detrended) hourly station temperatures (left), 20CR temperatures (1871-2008, averaged over 2° pixels at 75°N) and paleo-temperatures from EPICA ice cores over the last 800 kyrs (right). The reference lines are indicated; their slopes are estimates of H. Adapted from ref. 59.

[Fig. 2.11b image: the series plotted with vertical offsets and labelled with ΔT ≈ Δt^H slopes: megaclimate (Veizer: 290-511 Myr BP, H ≈ 0.4; Zachos: 0-67 Myr), macroclimate (Huybers: 0-2.56 Myr, H ≈ -0.8), climate (Epica: 25-97 kyr BP, H ≈ 0.4), macroweather (Berkeley: 1880-1895 AD, H ≈ -0.4) and weather (Lander Wy.: July 4-11, 2005, H ≈ 0.4). H > 0: "wandering", unstable; H < 0: "cancelling", stable.]

llll These are root mean square Haar fluctuations.

Fig. 2.11b: Representative series from each of the five scaling regimes, taken from fig. 1.4a-d with the addition of the hourly surface temperatures from Lander, Wyoming (bottom; detrended daily and annually). In order to fairly contrast their appearances, each series has the same number of points (180) and was normalized by its overall range (the maximum minus the minimum); each series was then offset by 1 °C in the vertical for claritymmmm. The series resolutions were 1 hour, 1 month, 400 years, 14 kyrs, 370 kyrs and 1.23 Myrs, bottom to top respectively. The black curves have H > 0, the red H < 0. Reproduced from ref. 59.

When Haar fluctuations are substituted for difference fluctuations using the climate series discussed in ch. 1, we obtain the composite fig. 2.11a, which covers nearly the same range of scales as Mitchell's. One can clearly make out five distinct regions, each of which - with the exception of macroclimatennnn - is scaling over a range of roughly a thousand in time scaleoooo. We clearly see the alternation of the sign of the slope (H): from positive in the weather regime (up to about 10 days, left), through the longer macroweather regime with decreasing fluctuations and negative H, to the increasing climate regime, the decreasing macroclimate and the increasing megaclimate. The numbers on the vertical axis of fig. 2.11a all make perfect sense and give a precise idea of typical fluctuations at the corresponding time scales. We can compare this to the amplitude of the fluctuations implied by Mitchell's scalebound spectrum. To do this, note that his background was mostly "white noise", which would have a slope of -1/2 on fig. 2.11a. Using Haar fluctuations, Mitchell would have predicted that at scales of millions of years, changes of only millionths of a degree would occur: more than a million times too smallpppp!

By comparing the difference and Haar fluctuations (figs. 2.10 and 2.11a respectively), we can now understand the limitations of the difference-based analysis (fig. 2.10), and understand why macroweather was not clearly discerned until so recently. As expected, the increasing parts of the two figures are quite similar, whereas the flat parts of fig. 2.10 do indeed correspond to negative H: both macroweather and macroclimate. The remaining apparent divergence between the difference and Haar fluctuations (figs. 2.10, 2.11a) has to do with the difference between local and globally averaged temperatures, and between industrial and pre-industrial temperatures (due to anthropogenic warming); we defer this discussion to ch. 5.

mmmm From top to bottom, the ranges used for normalizing are: 10.1, 4.59, 1.61 (Veizer, Zachos, Huybers respectively, all δ18O), 6.87 K, 2.50 K, 25 K (Epica, Berkeley, Lander).
nnnn It is not clear that this is indeed a true scaling regime, see ch. 3.
oooo The weather regime apparently continues for a factor of more than a million down to millisecond dissipation scales.
pppp The missing quadrillion refers to the spectrum, which is the square of the fluctuation amplitudes; Mitchell's error in the fluctuations themselves is only a factor of about ten million.

References

1 Mandelbrot, B. Scalebound or scaling shapes: a useful distinction in the visual arts and in the natural sciences. Leonardo 14, 43-47 (1981).
2 Zawadzki, I. Statistical properties of precipitation patterns. Journal of Applied Meteorology 12, 469-472 (1973).
3 Bras, R. L. & Rodriguez-Iturbe, I. Rainfall generation: a nonstationary time varying multidimensional model. Water Resources Research 12, 450-456 (1976).
4 Bras, R. L. & Rodriguez-Iturbe, I. Random Functions and Hydrology. (Addison-Wesley Publishing Company, 1985).

5 Mitchell, J. M. An overview of climatic variability and its causal mechanisms. Quaternary Res. 6, 481-493 (1976).
6 Dijkstra, H. & Ghil, M. Low frequency variability of the large scale ocean circulations: a dynamical systems approach. Rev. Geophys. 43 (2005).
7 Fraedrich, K., Blender, R. & Zhu, X. Continuum Climate Variability: Long-Term Memory, Scaling, and 1/f-Noise. International Journal of Modern Physics B 23, 5403-5416 (2009).
8 Dijkstra, H. Nonlinear Climate Dynamics. (Cambridge University Press, 2013).
9 Lovejoy, S. & Schertzer, D. Scale invariance in climatological temperatures and the local spectral plateau. Annales Geophysicae 4B, 401-410 (1986).
10 Shackleton, N. J. & Imbrie, J. The δ18O spectrum of oceanic deep water over a five-decade band. Climatic Change 16, 217-230 (1990).
11 Pelletier, J. D. The power spectral density of atmospheric temperature from scales of 10^-2 to 10^6 yr. EPSL 158, 157-164 (1998).
12 Huybers, P. & Curry, W. Links between annual, Milankovitch and continuum temperature variability. Nature 441, 329-332, doi:10.1038/nature04745 (2006).
13 Wunsch, C. The spectral energy description of climate change including the 100 ky energy. Climate Dynamics 20, 353-363 (2003).
14 Chekroun, M. D., Simonnet, E. & Ghil, M. Stochastic Climate Dynamics: Random Attractors and Time-dependent Invariant Measures. Physica D 240, 1685-1700 (2010).
15 Lovejoy, S. & Schertzer, D. in Chaos, Fractals and Models 96 (eds F. M. Guindani & G. Salvadori) 38-52 (Italian University Press, 1998).
16 Nicolis, C. & Nicolis, G. Is there a climate attractor? Nature 311, 529 (1984).
17 Orlanski, I. A rational subdivision of scales for atmospheric processes. Bull. Amer. Met. Soc. 56, 527-530 (1975).
18 Schertzer, D., Lovejoy, S., Schmitt, F., Chigirinskaya, Y. & Marsan, D. Multifractal cascade dynamics and turbulent intermittency. Fractals 5, 427-471 (1997).
19 Mandelbrot, B. B. Fractals, Form, Chance and Dimension. (Freeman, 1977).
20 Mandelbrot, B. B. The Fractal Geometry of Nature. (Freeman, 1982).
21 Thompson, D. W. On Growth and Form. (Cambridge University Press, 1917).
22 Lovejoy, M. Postmodern Currents: Art and Artists in the Age of Electronic Media. (Prentice Hall College Division, 1989).
23 Lovejoy, S. Area perimeter relations for rain and cloud areas. Science 187, 1035-1037 (1982).
24 Lovejoy, S. & Mandelbrot, B. B. Fractal properties of rain and a fractal model. Tellus 37A, 209 (1985).
25 Dewdney, A. K. A computer microscope zooms in for a close look at the most complicated object in mathematics. Scientific American, August, 16-24 (1985).
26 Cantor, G. Sur les ensembles infinis et linéaires de points. Acta Mathematica 2, 381-408 (1883).

27 Sierpiński, W. Sur une courbe cantorienne qui contient une image biunivoque et continue de toute courbe donnée. C. r. hebd. Seanc. Acad. Sci., Paris (in French) 162, 629-632 (1916).
28 Lovejoy, S. & Schertzer, D. The Weather and Climate: Emergent Laws and Multifractal Cascades. (Cambridge University Press, 2013).
29 Lovejoy, S. How accurately do we know the temperature of the surface of the earth? Clim. Dyn., doi:10.1007/s00382-017-3561-9 (2017).
30 Lovejoy, S., Schertzer, D. & Ladoy, P. Fractal characterisation of inhomogeneous measuring networks. Nature 319, 43-44 (1986).
31 Kennedy, J. J., Rayner, N. A., Smith, R. O., Saunby, M. & Parker, D. E. Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850, part 2: biases and homogenisation. J. Geophys. Res. 116, D14104, doi:10.1029/2010JD015220 (2011).
32 Koch, H. On a continuous curve without tangents constructible from elementary geometry. Arkiv för Matematik, Astronomi och Fysik 1, 681-702 (1904).
33 Richardson, L. F. Atmospheric diffusion shown on a distance-neighbour graph. Proc. Roy. Soc. A110, 709-737 (1926).
34 Okubo, A. & Ozmidov, R. V. Empirical dependence of the horizontal eddy diffusivity in the ocean on the length scale of the cloud. Izv. Akad. Nauk SSSR, Fiz. Atmosf. i Okeana 6 (5), 534-536 (1970).
35 Morel, P. & Larchevêque, M. Relative dispersion of constant level balloons in the 200 mb general circulation. J. of the Atmos. Sci. 31, 2189-2196 (1974).
36 Lacorta, G., Aurell, E., Legras, B. & Vulpiani, A. Evidence for a k^-5/3 spectrum from the EOLE Lagrangian balloons in the lower stratosphere. J. of the Atmos. Sci. 61, 2936-2942 (2004).
37 Richardson, L. F. & Stommel, H. Note on eddy diffusivity in the sea. J. Met. 5, 238-240 (1948).
38 Monin, A. S. & Yaglom, A. M. Statistical Fluid Mechanics. (MIT press, 1975).
39 Welander, P. Studies on the general development of motion in a two dimensional, ideal fluid. Tellus 7, 156 (1955).
40 Steinhaus, H. Mathematical Snapshots. (Oxford University Press, 1960).
41 Monin, A. S. Weather forecasting as a problem in physics. (MIT press, 1972).
42 Mandelbrot, B. B. The Fractalist. (First Vintage Books, 2011).
43 Perrin, J. Les Atomes. (NRF-Gallimard, 1913).
44 Steinhaus, H. Length, Shape and Area. Colloquium Mathematicum III, 1-13 (1954).
45 Mandelbrot, B. B. How long is the coastline of Britain? Statistical self-similarity and fractional dimension. Science 155, 636-638 (1967).
46 Richardson, L. F. Weather prediction by numerical process. (Cambridge University Press, 1922; republished by Dover, 1965).
47 Lynch, P. The emergence of numerical weather prediction: Richardson's Dream. (Cambridge University Press, 2006).
48 Richardson, L. F. The problem of contiguity: an appendix of statistics of deadly quarrels. General Systems Yearbook 6, 139-187 (1961).

49 Schertzer, D. & Lovejoy, S. in IUTAM Symp. on turbulence and chaotic phenomena in fluids (ed T. Tasumi) 141-144.
50 Grassberger, P. Generalized dimensions of strange attractors. Physics Letters A 97, 227 (1983).
51 Hentschel, H. G. E. & Procaccia, I. The infinite number of generalized dimensions of fractals and strange attractors. Physica D 8, 435-444 (1983).
52 Mandelbrot, B. B. Multifractals and Fractals. Physics Today 39, 11, doi:10.1063/1.2815135 (1986).
53 Parisi, G. & Frisch, U. in Turbulence and predictability in geophysical fluid dynamics and climate dynamics (eds M. Ghil, R. Benzi, & G. Parisi) 84-88 (North Holland, 1985).
54 Wolfram, S. in Wall Street Journal (2012).
55 Hurst, H. E. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770-808 (1951).
56 Peng, C.-K. et al. Mosaic organisation of DNA nucleotides. Phys. Rev. E 49, 1685-1689 (1994).
57 Haar, A. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen 69, 331-371 (1910).
58 Lovejoy, S. & Schertzer, D. Haar wavelets, fluctuations and structure functions: convenient choices for geophysics. Nonlinear Proc. Geophys. 19, 1-14, doi:10.5194/npg-19-1-2012 (2012).
59 Lovejoy, S. A voyage through scales, a missing quadrillion and why the climate is not what you expect. Climate Dyn. 44, 3187-3210, doi:10.1007/s00382-014-2324-0 (2015).