

Chapter 2. New worlds versus scaling: from van Leeuwenhoek to Mandelbrot

2.1 Scalebound thinking and the missing quadrillion

2.1.1 New worlds and spectral bumps

We just took a voyage through scales, noticing structures in cloud photographs and wiggles on graphs. Collectively these spanned ranges of scale over factors of billions in space and billions of billions in time. We are immediately confronted with the question: how can we conceptualize and model such fantastic variation? Two extreme approaches have developed. For the moment I will call the dominant one the "new worlds" view after Antoni van Leeuwenhoek (1632-1723), who developed a powerful early microscope. The other is the self-similar (scaling) view of Benoit Mandelbrot (1924-2010), which I discuss in the next section. My own view - scaling, but with the notion of scale itself an emergent property - is discussed in ch. 3.

When van Leeuwenhoek peered through his microscopea, in his amazement he is said to have discovered a "new world in a drop of water": "animalcules", the first micro-organismsb (fig. 2.1). Since then, the idea that zooming will reveal something totally new has become second nature: in the 21st century, imaging microscopes are developed precisely because of the promise of such new worlds. The scale-by-scale "newness" idea was graphically illustrated by K. Boeke's highly influential book "Cosmic View" (1957), which starts with a photograph of a girl holding a cat, first zooming away to show the surrounding vast reaches of space, and then zooming in until reaching the nucleus of an atom. The book was incredibly successful; it was included in Mortimer Adler's "Gateway to the Great Books" (1963), a 10 volume series featuring works by Aristotle, Shakespeare, Einstein and others. In 1968, two films were based on Boeke's book: "Cosmic Zoom"c and "Powers of Ten" (1968d, re-released in 1977e), encouraging the idea that nearly every power of ten in scale hosts different phenomena. More recently (2012), there's even the interactive "Cosmic Eye" app for the iPad, iPhone, or iPod, not to mention a lavish update: the "Zoomable Universe"1.

In a 1981 paper, Mandelbrot coined the term "scalebound" for this "new worlds" view, a convenient shorthandf that I use frequently belowg. Often, scaleboundedness is obvious. For example, adult humans have heights in the factor of ten range 30 cm to 3 m, and atoms are several tenths of a nanometer across (give or take a factor of 10). Stretching the idea only a little, bacteria are scalebound even though they range in size from a tenth of a micron to a tenth of a millimeter. But what about a cloud, a coastline or a storm?

a The inventor of the first microscope is not known, but van Leeuwenhoek's was powerful, up to about 300 times magnification.
b Recent historical research indicates that Robert Hooke may in fact have preceded van Leeuwenhoek, but the latter is usually credited with the discovery.
c Produced by the National Film Board of Canada.
d By Charles and Ray Eames.
e The re-release had the subtitle: "A Film Dealing with the Relative Size of Things in the Universe and the Effect of Adding Another Zero" and was narrated by P. Morrison. More recently, the similar "Cosmic Voyage" (1996) appeared in IMAX format.
f He wrote it as here, as one word, as a single concept.
g He was writing in Leonardo, to an audience of architects: "I propose the term scalebound to denote any object, whether in nature or one made by an engineer or an artist, for which characteristic elements of scale, such as length and width, are few in number and each with a clearly distinct size": 2 Mandelbrot, B. Scalebound or scaling shapes: a useful distinction in the visual arts and in the natural sciences. Leonardo 14, 43-47 (1981).

While "Powers of Ten" was proselytizing the new worlds view to an entire generation, there were other developments pushing scientific thinking in the same direction. In the 1960's, long ice and ocean cores were revolutionizing climate science by supplying the first quantitative data at centennial, millennial and longer time scales. This coincided with the development of practical techniques to decompose a signal into oscillating components: "spectral analysis". While it had been known since Joseph Fourier (1768-1830) that any time series may be written as a sum of sinusoids, applying this idea to real data was computationally challenging, and in atmospheric science it had been largely confined to the study of turbulenceh. The breakthrough was the development of fast computers combined with the discovery of the "Fast Fourier Transform" (FFT)3 algorithmi (1965).

The beauty of Fourier decomposition is that each sinusoid has an exact, unambiguous time scale: its period (the inverse of its frequency) is the length of time it takes to make a full oscillation (fig. 2.2a, upper left, for examples). Fourier analysis thus provides a systematic way of quantifying the contribution of each time scale (inverse frequency) to a time series. The spectrum is obtained from the analysis by averagingj the square of the contributionk at a given frequency (the inverse period) per frequency band. Qualitatively, it indicates the variability of the process as a function of time scale (one divided by the frequency). In the following, this qualitative idea is the main thing to keep in mindl.

Fig. 2.2a illustrates this for the Weierstrass function which, in this example, is constructed by summing sinusoids with frequencies increasing by factors of two, so that the nth frequency is ω_n = 2^n. The amplitudes decrease by factors of 2^-H (here H = 1/3, so that 2^-H = 0.79), so that the nth amplitude is A_n = 2^-nH. Fig. 2.2a (upper left) shows the result for H = 1/3 with all the terms up until 128 cycles per second (upper row); eliminating n, we find the power law relation A = ω^-H. More generally, for a scaling process we have:

Spectrum = (frequency)^-β

where β is the usual notation for the "spectral exponent"m. The spectrum is the square of the amplitude, so that in this (discrete) examplen we have β = 2H. The spectrum of the Weierstrass function is shown in fig. 2.2a, bottom row (left), as a discrete series of dots, one for each of the 8 sinusoids in the upper left construction. On the bottom row (right) we show the same spectrum on a logarithmic plot, on which power laws are straight lines. Of course, in the real world - unlike this academic example - there is nothing special about powers of 2, so that all frequencies - a continuum - are present.

The Weierstrass function was created by adding sinusoids: Fourier composition. Now take a messy piece of data - for example the multifractal simulation of the data series (lower left in fig. 1.3): it has small, medium and large wiggles. To analyze it we need the inverse of composition, and this is where the FFT is handy. In this case, by construction, we know that the wiggles are unimportanto: they are generated randomly by the process. However, if we had no knowledge - or only a speculation - about the mechanism that produced it, we would wonder: do the wiggles hide signatures of important processes of interest, or are they simply uninteresting details that should be averaged out and ignored?

h Even here, spectra were often estimated by using specialized circuitry involving numerous narrow band filters.
i The speed-up due to the invention of the FFT is huge: even for the relatively short series in fig. 1.3 (2048 points) it is about a factor of one hundred. In numerical weather models it accelerates calculations by factors of millions.
j The averaging is over a collection (ensemble) of similar processes. If we only have a single time series, then technically, the result is a "periodogram".
k Its variance.
l A precise interpretation of the spectrum is technically challenging and, as we discuss below, even the professionals missed key implications!
m The negative sign is used by convention so that in typical situations, β is positive.
n In the more usual case of continuous spectra, we have β = 1+2H, with corrections when intermittency is important (box 2.1).
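For readers who like to experiment, the construction and its spectrum are easy to reproduce numerically. The following is a minimal sketch (Python with numpy; the variable names are mine, not from any standard package): it sums the eight sinusoids with ω_n = 2^n and A_n = 2^-nH and checks that the periodogram spikes fall off with the expected exponent β = 2H.

```python
import numpy as np

H = 1/3              # the exponent used in fig. 2.2a
n_terms = 8          # frequencies 1, 2, 4, ..., 128 cycles per second
t = np.linspace(0, 1, 4096, endpoint=False)   # one second of "data"

# Weierstrass-style sum: the nth term has frequency 2^n and amplitude 2^(-nH)
W = sum(2**(-n*H) * np.cos(2*np.pi * 2**n * t) for n in range(n_terms))

# Periodogram: squared modulus of the Fourier coefficients (no ensemble average)
E = np.abs(np.fft.rfft(W))**2
freqs = np.fft.rfftfreq(t.size, d=t[1] - t[0])   # 0, 1, 2, ... cycles per second

# The spikes sit at 1, 2, 4, ..., 128 Hz and their heights decay as freq^(-2H):
# a straight line of slope -beta = -2H on the log-log plot (fig. 2.2a, lower right)
spikes = 2**np.arange(n_terms)          # here frequency = index, since df = 1
slope = np.polyfit(np.log(spikes), np.log(E[spikes]), 1)[0]
print(f"estimated beta = {-slope:.2f}, expected 2H = {2*H:.2f}")
```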

Fig. 2.1: Antoni van Leeuwenhoek discovering "animalcules" (micro-organisms), circa 1675.

Fig. 2.2b shows the spectrum of the multifractal simulation (fig. 1.3, lower left) for all periods longer than 10 milliseconds. How do we interpret the plot? One sees three strong spikes, at frequencies of 12, 28 and 41 cycles per second (corresponding to periods of 1/12, 1/28 and 1/41 of a second: about 83, 35 and 24 milliseconds). Are they signals of some important fundamental process or are they just noise? Naturally, this question can only be answered if we have a mental model of how the process might be generated, and this is where it gets interesting.

First of all, consider the case where we have only a single series. If we knew the signal was turbulent (as it was for the data series at the top), then turbulence theory tells us that we would expect all the frequencies in a wide continuum of scales to be important and, furthermore, that at least on average their amplitudes should decay in a power law manner (as with the Weierstrass function). But the classical theory only tells us the spectrum that we would expect to find if we averaged over a large number of identical experimentsp (each one with different "bumps" and wiggles, but from the same overall conditions). In fig. 2.2b, this average is the smooth blue curveq. But in the figure, we see that there are apparently large departures from this average. Are these departures really exceptional, or are they just "normal" variations expected from randomly chosen pieces of turbulence?

Before the development of cascade models and the discovery of multifractals in the 1970's and 80's, turbulence theory would have led us to expect that the up and down variations about a smooth line through the spectrum should roughly follow the "bell curve". If this were the case, then the spectrum should not exceed the bottom red curve more than 1% of the time, and the top curve more than one in ten billion times. Yet we see that even this 1/10,000,000,000 curve is exceeded twice in this single but unexceptional simulationr (indicated by the two leftmost arrows in the figure). The spectrum turns out to be very sensitive to "jumps" and "spikes" hiding in the signal, as illustrated in fig. 1.5 (but see also figs. 4.6 and 5.3). This turns out to be an example of the extreme "black swan" variability discussed in box 3.1. Had we encountered this series in an experiment, turbulence theory itself would probably have been questioned - as indeed it repeatedly was (and still is). Failure to fully appreciate the huge variability that is expected in turbulent processes, and the continued embrace of inappropriate bell curve type paradigms, has spuriously discredited many attempts at establishing turbulent laws and has been a major obstacle to their understanding.

o We mean that they don't imply any special origin or mechanism. However, various applications might only be sensitive to a narrow range of frequencies - for example wind blowing against a swing. In this case, the wiggles that occur at the frequency of the swing would be important even if - from the point of view of the underlying mechanisms - they had no special role.
p An "ensemble" or "statistical" average, see ch. 4.

q It is the ensemble average spectrum. In this example, it lies mostly below the spectrum of the single (somewhat exceptional) realization shown in black.
r I admit that to make my point, I made 5000 simulations of the multifractal process in fig. 1.3 and then searched through the first 50 to find the one with the most striking variation. But this was by no means the most extreme of the 5000, and if the statistics had been from the bell curve, then the extreme point in the spectrum in fig. 2.3 would have corresponded to a probability of one in 10 trillion, so that my slight cheating in the selection process would still have been extremely unlikely to have caused the result!
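The realization-to-realization scatter of periodograms is easy to see numerically, even in the tamest possible setting. Below is a minimal sketch (Python/numpy; the names are mine) that synthesizes Gaussian scaling noises with a power law spectrum by giving the Fourier coefficients random phases, then compares a single periodogram with the average over many. Even in this Gaussian toy case, each periodogram value scatters by order-one factors about the ensemble mean; the multifractal series discussed in the text wander enormously further, which is the whole point.

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta, n_real = 2048, 1.8, 1000

def gaussian_scaling_series():
    # Fourier synthesis: amplitude ~ omega^(-beta/2), uniformly random phases.
    # This gives a *Gaussian* scaling noise - a tame stand-in for the far more
    # variable multifractal simulation of fig. 1.3.
    omega = np.arange(1, N//2 + 1)
    coeffs = omega**(-beta/2) * np.exp(2j * np.pi * rng.random(omega.size))
    return np.fft.irfft(np.concatenate(([0], coeffs)), n=N)

periodograms = np.array([np.abs(np.fft.rfft(gaussian_scaling_series())[1:])**2
                         for _ in range(n_real)])

single = periodograms[0]                # one realization: the analogue of the black curve
ensemble = periodograms.mean(axis=0)    # the smooth ensemble average
ratio = single / ensemble
print(f"single/ensemble ratio: median {np.median(ratio):.2f}, max {ratio.max():.1f}")
```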


Fig. 2.2a: Upper left: The first eight contributions to the Weierstrass function (displaced in the vertical for clarity): sinusoids with frequencies of 1, 2, 4, 8, 16, 32, 64 and 128 cycles per second (the time t is in seconds). Upper right: Sinusoids with frequencies of 2, 4, 8, 16, 32, 64 and 128 cycles per second, stretched by a factor 2^H ≈ 1.26 in the vertical and a factor of two in the horizontal. The sum (top) is the same as that on the left, but is missing the highest frequency detail (see the discussion a little later). Lower left: The spectrum on a linear-linear scale, each point indicating the contribution (squared) and the frequency. Lower right: The same as lower left, but on a logarithmic plot (it is now linear).

In conclusion, until the development of multifractals in the 1980's, even if we knew that the series came from an apparently routine turbulent wind trace on the roof of the physics building, we would still have concluded that the bumps were indeed significant. But what would be our interpretation if instead fig. 2.2b was the spectrum of a climate seriess? We would have no good theory of the variability, and we would typically only have a single trace. Let's take the example of an ice core record. The series itself was likely the product of a near heroic scientific effort, possibly involving months in freezing conditions near the south pole. The sample would first be cored and then transported to the lab. This would have been followed up by painstaking sampling, analysis of the isotopic composition using a mass spectrometer, and finally a digitization of the result. Careful comparison with other cores or with ice flow models would eventually establish a chronology. At this point, the researcher would be eager for a quantitative look at what she had found. If the black curve in fig. 2.2b was the spectrum of such a core, how would she react to the bumps in the spectrum? Unlike the turbulence situation where there was some theory, an early core would have had little with which to compare it. This is the point where the new worlds view could easily influence the researcher's resultst. She would be greatly tempted to conclude that the spikes were so strong, so far from the bell curve theory, that they represented real physical oscillations occurring over a narrow range of time scales. She would also remark that the two extreme bumps in the spectrum involve several successive frequencies, and according to usual statistical assumptions, "background noise" should not be correlated in this way. This wide bump would strengthen the spurious interpretation that there was a hidden oscillatory process at worku. Armed with the series of bumps, she might start to speculate about possible physical mechanisms to explain themv.

s Obviously with totally different time scales!


Fig. 2.2b: Black: the Fourier spectrumw E(ω) (proportional to the amplitude squared of the components discussed in fig. 2.2a) of the changes in wind speed in the 1 second long simulation shown at the bottom left of fig. 1.3, showing the amplitudesx of the first 100 frequencies ω. The points furthest to the left are thus at frequencies of one cycle over the length of the simulation, i.e. one cycle per second, a period of one second. The far right shows the variability at 100 cycles per second, giving the amplitude of the wiggles at 10 milliseconds (higher frequencies were not shown for clarity). At the bottom there are actually two grey curves that are nearly indistinguishable. One shows the average over 5000 random series, each statistically identical to that in fig. 1.3. As expected, it is nearly identical to the superposed theoretical (scaling) power law grey curve. The top three grey curves show the theoretical 1%, one in a million and one in ten billion extreme fluctuation limits (bottom to top), determined by assuming that the spectrum has bell curve (Gaussian) probabilities. The arrows show the most extreme spikes, each of which has a probability of less than one in a million.

t Alternatively, there might be incorrect theories that could be spuriously supported by fortuitously placed random spectral bumps, and much time would be wasted chasing blind alleys.
u According to standard assumptions (which this example shows are inappropriate), successive frequencies should be statistically independent of each other: there should be no relationship between them.
v In ch. 4 we give some examples of spurious periodicities that emerged in the 1990's from inappropriate probability assumptions in the use of the multi-taper method (MTM) and Singular Spectral Analysis (SSA) method. The search for significant oscillations continues; a recent climate example of a spectral analysis with a nearly identical spectrum to fig. 2.2b (albeit with much smaller spikes, but still claimed to be significant) can be found in: 4 Galloway, J. M. et al. Climate change and decadal to centennial-scale periodicities recorded in a late Holocene NE Pacific marine record: Examining the role of solar forcing. Palaeogeogr. Palaeoclimatol. Palaeoecol. 386, 669-689 (2013).
w Since the black curve is the result of analyzing a single series, there is no averaging over a collection of series, so that technically this is a periodogram; the spectrum is represented rather by the smooth grey curves at the bottom; these are obtained by averaging over many periodograms (one of these by averaging over 5000, the other is the theoretical result for averaging over an infinite number).
x The spectrum is actually the ensemble average of the squares of the absolute amplitudes. It was "windowed" in order to avoid spurious "spectral leakage" that could artificially smear out the spectrum.

We should thus not be surprised to learn that the 1970's witnessed a rash of papers based on spectra resembling that of fig. 2.2b: oscillators were suddenly ubiquitousy. It was in this context that Murray Mitchell8 (1928-1990) famously made the first explicit attempt to conceptualize temporal atmospheric variability (fig. 2.3a). Mitchell's ambitious composite spectrum ranged from hours to the age of the earth (≈4.5x10^9 to 10^-4 years, bottom, fig. 2.3a). In spite of his candid admissionz that this was mostly an "educated guess", and notwithstanding the subsequent revolution in climate and paleoclimate data, over forty years later it has achieved an iconic status and is still regularly cited and reproduced in climate papers and textbooks9,10,11. Its continuing influence is demonstrated by the slightly updated version shown in fig. 2.3b that (until 2015) adorned NOAA's National Climate Data Center (NCDC) paleoclimate web siteaa. The site was surprisingly forthright about the figure's ideological character. While admitting that "in some respects it overgeneralizes and over-simplifies climate processes", it continued: "... the figure is intended as a mental model to provide a general "powers of ten" overview of climate variability, and to convey the basic complexities of climate dynamics for a general science savvy audience." Notice the explicit reference to the "powers of ten" mindset over fifty years after Boeke's bookbb.

Certainly the continuing influence of Mitchell's figure has nothing to do with its accuracy. Within fifteen years of its publication, two scaling composites (close to several of those shown in fig. 2.3a, one shown in fig. 2.11), over the ranges 1 hour to 10^5 years and 10^3 to 10^8 years, already showed colossal discrepancies12,13. In fig. 2.3a, we have superposed the spectra of several of the series analysed in ch. 1; the difference with Mitchell's original is literally astronomical. Whereas over the range 1 hour to 10^9 years Mitchell's background varies by a factor ≈150 (bottom, grey), the spectra from real data imply that the true range is a factor of a quadrillioncc (10^15); NOAA's fig. 2.3b extends this error by a further factor of tendd.

Rather than plotting the data in the difficult to interpret terms of amplitudes of sinusoids, we can plot it in a much simpler to interpret way using typical amplitudes of fluctuations. Fig. 2.3c (top) shows the analysis of many of the same series as in fig. 2.3a, but using the fluctuations described in detail in section 2.3 (see also fig. 2.12a,b). Even with only an intuitive understanding of fluctuations, we can see that fig. 2.3c makes intuitive sense: reading the numbers off the graph, we see that typical temperature fluctuations vary between a little less than a degree and as much as 15 or 20°C at hundreds of millions of years; the full details and discussion are given later. In comparison, fig. 2.3c (bottom: the line and the grey areas) shows that the quantitative implications of Mitchell's spectrum are quite implausible. Although the amplitude of the fluctuations in fig. 2.3b was not specified, the baseline background in fig. 2.3b is a constant level of white noise that corresponds to fluctuations decreasing with a slope of -1/2, as indicated. We see, for example, that it implies that consecutive one million year average global temperatures would vary by only a millionth of a degree centigrade! Both fig. 2.3a and fig. 2.3c (top) show how these analyses can be used to establish a basic typology, to categorize the basic dynamical regimes; we return to this in section 2.3 and in ch. 5.

Writing a decade and a half after Mitchell, the leading climatologists Shackleton and Imbrie13 laconically noted that their own spectrum was "much steeper than that visualised by Mitchell", a conclusion already anticipated five years earlier (fig. 2.11) and subsequently reinforced by several scaling composites14,15. Over at least a significant part of this range, Wunsch16 further underlined its misleading nature by demonstrating that the contribution to the variability from specific frequencies associated with specific "spikes" (presumed to originate in oscillatory processes) was much smaller than the contribution due to the continuum. But none of this perturbed the dominant scalebound view. Just as van Leeuwenhoek peered through the first microscope and discovered a new world, today we automatically anticipate finding new worlds by zooming in or out of scale. It is a scientific ideology so powerful that even quadrillions do not shake it.

The scalebound view led to a framework for atmospheric dynamics that emphasized the importance of numerous processes occurring at well defined time scales: the quasi-periodic "foreground" processes illustrated as bumps - the signals - on Mitchell's nearly flat background, which was considered to be an unimportant noiseee. Although in Mitchell's original figure the lettering is difficult to decipher, fig. 2.3b spells them out more clearly with numerous conventional examples. For example, the QBO is the "Quasi-Biennial Oscillation", ENSO is the "El Niño Southern Oscillation", the PDO is the "Pacific Decadal Oscillation" and the NAO is the "North Atlantic Oscillation". At longer time scales, the Dansgaard-Oeschger, Milankovitch and tectonic "cycles"ff will be discussed in ch. 4. The point here is not that these processes and mechanisms are wrong or inexistentgg; it is rather that - at best - they only explain a small fraction of the overall variability.

Even the nonlinear revolution was affected by scalebound thinking. This included atmospheric applications of low dimensional deterministic chaos. When chaos techniques were applied to weather and climate, the spectral bumps were associated with specific chaos models, analysed with the help of the dynamical systems machinery of bifurcations, limit cycles and the likehh. Of course - as discussed below - from the alternative scaling, turbulence view, wide range continuum spectra are generic results of systems with large numbers of interacting components ("degrees of freedom") - "stochastic chaos"18 - and are incompatible with the usual small number of interacting components of ("low dimensional") deterministic chaosii. Similarly, whenever there are no dynamically important characteristic scales or scale breaks, the spectra will be scaling, i.e. power lawsjj (ch. 3).

y I could also mention the contribution of "Box-Jenkins" techniques (1970) to bolstering scalebound blinkers. These were originally engineering tools for analyzing and modeling stochastic processes based on the a priori scalebound assumption that the correlations decayed in an exponential manner. This especially contributed to scalebound thinking in precipitation and hydrology. See for example the influential publications: 5 Zawadzki, I. Statistical properties of precipitation patterns. Journal of Applied Meteorology 12, 469-472 (1973). 6 Bras, R. L. & Rodriguez-Iturbe, I. Rainfall generation: a nonstationary time varying multidimensional model. Water Resources Research 12, 450-456 (1976). 7 Bras, R. L. & Rodriguez-Iturbe, I. Random Functions and Hydrology. (Addison-Wesley Publishing Company, 1985).
z I should make it clear that the problem was not Mitchell - who made an important pioneering contribution - but rather the later elevation of Mitchell's provisional, tentative interpretation into an unexamined ideology.
aa The site explicitly acknowledged Mitchell's influence.
bb If this were not enough, the site adds a further gratuitous interpretation. To assure sceptics it continues: "[just] because a particular phenomenon is called an oscillation, it does not necessarily mean there is a particular oscillator causing the pattern. Some prefer to refer to such processes as variability." Since any time series, whether produced by turbulence, the stock market or a pendulum, can be decomposed into sinusoids, the decomposition has no physical content per se; yet we are told that variability and oscillations are synonymous.
cc In fig. 2.4a, we plot the same information in real space and find that the RMS fluctuations at 5.53x10^8 years are ≈ ±10 K, whereas extrapolating Gaussian white noise over the range implies a value ≈10^-6 K, i.e. it is in error by a factor ≈10^7.
dd If we attempt to extend Mitchell's picture to the dissipation scales (at frequencies a million times higher, corresponding to millisecond variability), the spectral range would increase by an additional factor of a billion.
ee Mitchell actually assumed that his background was either a white noise or, over short ranges, sums (integrals) of a white noise.
ff The figure refers to these as "cycles" rather than oscillations, perhaps because they are broader.

[Fig. 2.3a graphic: a log-log plot of Log10 E(ω) against Log10 ω (in yr^-1, spanning roughly 10^-9 to 10^4), with the regimes megaclimate, macroclimate, climate, macroweather and weather marked, reference slopes β = 0.2, 0.6 and 1.8 on the various regimes, and reference scales at 10^5 yr, 1 year and 24 hrs; see the caption below.]

Fig. 2.3a: A comparison of Mitchell's relative scale, "educated guess" of the spectrum (grey, bottom8) with modern evidence from spectra of a selection of the series displayed in fig. 1.4 (the plot is logarithmic in both axes). There are three sets of thick black lines; on the far right, the spectra from the 1871-2008 20CR (at daily resolution) quantify the difference between the globally averaged temperature (bottom) and local averages (2°x2°, top)kk. The spectra were averaged over frequency intervals (10 per factor of ten in frequency), thus "smearing out" the daily and annual spectral "spikes". These spikes have been re-introduced without this averaging, and are indicated by vertical lines on the daily and annual resolution curves. Using the daily resolution data, the annual cycle is a factor ≈1000 above the continuum, whereas using hourly resolution data, the daily spike is a factor ≈3000 above the background. Also shown is the other striking narrow spectral spike, at (41 kyrs)^-1 (obliquity; ≈ a factor 10 above the continuum); this is shown in dashed grey since it is only apparent over the period 0.8-2.56 Myr BP (before present). The thick dashed lines have slopes indicating the scaling behaviours. The thin dashed lines show the transition periods that separate out the regimes discussed in detail in chs. 4 and 5; these are at 20 days, 50 yrs, 80,000 yrs and 500,000 yrs. Mitchell's original figure has been faithfully reproduced many times (with the same admittedly mediocre quality). It is not actually very important to be able to read the lettering near the spikes; if needed, it can be seen in fig. 2.3b, which was inspired by it (see also fig. 2.3c).

gg Although in some cases, maybe they are!
hh More recently updated with the help of stochastics: the "random dynamical systems" approach, see e.g.: 17 Chekroun, M. D., Simonnet, E. & Ghil, M. Stochastic Climate Dynamics: Random and Time-dependent Invariant Measures. Physica D 240, 1685-1700 (2010). 11 Dijkstra, H. Nonlinear Climate Dynamics. (Cambridge University Press, 2013).
ii In the early 1980's, excitement about chaos was so strong that enthusiasm sometimes replaced cool heads. A famous paper published in Nature, based on a new time series analysis algorithm, even claimed that four interacting components were enough to describe and model the climate! 19 Nicolis, C. & Nicolis, G. Is there a climate attractor? Nature 311, 529 (1984).
jj Although in the more recent random dynamical systems approach the driving noise may be viewed as the expression of a large number of degrees of freedom, this interpretation is only justified if there is a significant scale break between the scales of the noise and of the explicitly modelled dynamics; it is not trivially compatible with scaling spectra.

Fig. 2.3b: The updated version of Mitchell's spectrum, reproduced from NOAA's NCDC paleoclimate web sitell. The "background" on this paleo site is perfectly flat; hence, in comparison with the empirical spectrum in fig. 2.3a, it is in error by an overall factor ≈10^16.

kk Mitchell also distinguished between regional and global spectra, with the latter being the lower curve completely contained within the grey area near the bottom.
ll The page has since been taken down.

[Fig. 2.3c graphic: typical fluctuation ΔT (logarithmic scale, from ≈10^-6 °C to ≈20 °C) against time scale Δt (10^-4 to 10^9 years), with the regimes weather, macroweather, climate, macroclimate and megaclimate. Labelled features include the diurnal cycle and its harmonic, the annual cycle and its harmonics, the QBO, ENSO, PDO and NAO, synoptic weather, Dansgaard-Oeschger cycles, orbital (Milankovitch) cycles, tectonic cycles (mountain building and weathering) and the evolution of the earth, atmosphere and biosphere, together with the flat-spectrum Gaussian white noise line of slope -0.5; see the caption below.]

Fig. 2.3c: The spectrum in fig. 2.3b replotted in terms of fluctuations (grey), where the flat baseline of fig. 2.3b now has a slope of -1/2, corresponding to an uncorrelated Gaussian "white noise" background. Since the amplitudes in fig. 2.3b were not specified, the amplitudes of the transformed "bumps" are only notional. At the top is superposed the typical fluctuation at time scale Δt, as estimated from various instrumental and paleo series among those displayed in fig. 1.4. More details can be found in section 2.3 and fig. 2.12a.
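Fluctuations are defined carefully in section 2.3; as a preview, here is a minimal sketch (Python/numpy; the names are mine) of one simple definition, the Haar fluctuation: the difference between the averages of the second and first halves of each interval of length Δt. Applied to a Gaussian white noise, it reproduces the -1/2 slope of the flat-spectrum baseline in fig. 2.3c.

```python
import numpy as np

def rms_haar_fluctuation(series, lag):
    """RMS Haar fluctuation at scale `lag`: mean of the second half of each
    length-`lag` interval minus the mean of the first half."""
    n = (series.size // lag) * lag
    blocks = series[:n].reshape(-1, lag)
    half = lag // 2
    dT = blocks[:, half:].mean(axis=1) - blocks[:, :half].mean(axis=1)
    return np.sqrt((dT**2).mean())

rng = np.random.default_rng(3)
noise = rng.standard_normal(2**16)       # uncorrelated Gaussian "white noise"
for lag in (4, 16, 64, 256, 1024):
    print(f"lag {lag:5d}: {rms_haar_fluctuation(noise, lag):.4f}")
# each factor of 4 in lag halves the fluctuation: slope -1/2 on a log-log plot
```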

2.1.2 New worlds and the meteorological zoo

At weather scales, and at virtually the same time as Mitchell proposed a scalebound framework for temporal variability, Isidoro Orlanski proposed a scalebound spatial classification of atmospheric phenomena by powers of ten (fig. 2.4)20. The figure is a reproduction of Orlanski's phenomenological space-time diagrammm with eight different dynamical regimes indicated on the right according to their spatial scales. The diagram does more than just classify phenomena according to their size: it also relates their sizes to their lifetimes. Along the diagonal, various pre-existing conventional phenomena are indicated, including fronts, hurricanes, tornadoes and thunderstorms. The straight line embellishment was added by colleagues and me in 199721 and shows that the figure actually displays scaling, not scalebound behaviour! This is because straight lines on logarithmic plots such as this are power laws, and even the slope of the line (-3/2) turns out to be theoretically predicted using the energy rate density; more on this below.

At the time of Orlanski's classification, meteorological thinking was already largely scalebound. There were several reasons for this. One was its near total divorce from the (primarily scalingnn) theory of turbulence. Another was its heritage from the older, more qualitative traditions of "synoptic" meteorologyoo. Finally, we should cite the influence of analytic linearized approaches, the only ones available in the pre-computer era. Orlanski's classification therefore rapidly became popular as a systematic rationalization of an already strongly phenomenologically scalebound approach.

It is ironic that just as Orlanski tried to perfect the old scalebound approach, and unbeknownst to the modellers, the computer revolution was ushering in the opposite, scaling approach: the General Circulation Model (GCM)pp.

mm Sometimes called "Stommel diagrams" after Henry Stommel, who produced such diagrams in oceanography.
nn All the fundamental turbulence theories were scaling; the question was whether one or two - or in some cases three - scaling ranges were required. We discuss this in detail in ch. 3.
oo For example, as we discuss in the next chapter, clouds had been classified since the early 19th century, but these classifications were based on cloud morphologies, not size.
pp At the time the GCMs were much too small to allow for proper statistical scaling analyses of their outputs, and the reigning turbulence theory turned out to be seriously unrealistic, see ch. 3. Even today, the fact that GCMs are scaling is practically unknown; the absence of a scale break is even sometimes seen not as a model strength but as a model deficiency, see the discussion in ch. 4!
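The slope quoted above follows from a one-line dimensional argument. Here is a sketch, assuming - as in classical turbulence theory - that the only relevant quantity is the energy rate density ε, with units of m² s⁻³ (the sign of the slope depends only on the orientation of the axes):

```latex
% Sketch: \varepsilon has units of m^2 s^{-3}, so the only lifetime that can
% be built from a size \ell and \varepsilon alone is
\tau \sim \varepsilon^{-1/3}\,\ell^{2/3},
\qquad \text{equivalently} \qquad
\ell \sim \varepsilon^{1/2}\,\tau^{3/2},
% i.e. a straight line of (absolute) slope 3/2 relating size and lifetime
% on Orlanski's logarithmic space-time diagram.
```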


Fig. 2.4: Orlanski's space-time diagram, with eight different dynamical regimes indicated on the right according to their spatial scales. Notice that he indicates that the climate starts at about two weeks (bottom row). The straight line shows that the figure is actually scaling (straight on this logarithmic plot). Reproduced from 21.

2.2 Scaling: Big whirls have little whirls and little whirls have lesser whirls

2.2.1 The revolution

Scalebound thinking is now so entrenched that it seems obvious that "zooming in" to practically anything will disclose hidden secrets. Indeed, we would likely express more wonder if we zoomed in only to find that nothing had changed - if the system's structure was scaling!

Yet in the last thirty years, antiscaling prejudices have started to unravel; much of this is thanks to Mandelbrot's path breaking "Fractals, form chance and dimension"22qq (1977) and "The Fractal Geometry of Nature"23 (1982). Thanks to his avant-garde use of stunning computer graphics, his books made an immediate visual impact; they were the first to display the realism of scaling. One was also struck by the word "geometry" in the title: the last time scientists had emphasized geometry was 60 years earlier, when D'Arcy Thompson24 brilliantly used it to understand the shapes of diatoms and other biomorphologies. While Mandelbrot's simulations, imagery and scaling ideas sparked the fractal strand of the nonlinear revolution - and continue to transform our thinking - his insistence on geometry is now nearly forgotten. The basic reason is that scientists are - in my opinion rightly - more interested in statistics than in geometry. There is also a less obvious reason: the most interesting thing to come from the scaling revolution was probably not fractals per se, but multifractals, and - despite attempts - these cannot generally be reduced to geometry (box 2.1)rr. In contrast with a "scalebound" object, Mandelbrot counterposed his new scaling, fractal one:

“A scaling object … includes as its defining characteristic the presence of very many different elements whose scales are of any imaginable size. There are so many different scales, and their harmonics are so interlaced and interact so confusingly that they are not really distinct from each other, but merge into a continuum. For practical purposes, a scaling object does not have a scale that characterizes it. Its scales vary also depending upon the viewing points of beholders. The same scaling object may be considered as being of a human's dimension or of a fly's dimension.”2

qq This was actually a translation and extension of his earlier French book "Fractals" (1975).
rr The mathematical issue is their singular small scale nature. The basic multifractal process is a cascade that does not converge to mathematical points, but only converges in the neighbourhood of points. This precludes them from being represented as a geometric set of points. They are like the Dirac delta functions used in physics.

Fig. 2.5: The scaling approach: looking through the microscope at the Mandelbrot set (the black silhouette in the upper left square); after two magnifications (lower left and right hand side), Mandelbrot notices one of an infinite number of reduced scale versions (slightly deformed, near the lower right hand corner).

I had the good fortune to begin my own graduate career in 1976, just as the scalebound weather and climate paradigms were ossifying but before the nonlinear revolution really took off. I was thus totally unprepared, and I can vividly remember the epistemic shock when, shortly after it appeared, I first encountered "Fractals, form chance and dimension". Revealingly, it was neither my PhD supervisor Geoff Austin nor any other scientific colleague who introduced me to the book, but rather my motherss - an artist - who was awed by Mandelbrot's imagery and fascinated by its artistic implications. At the time, my thesis topic was the measurement of precipitation from satellitestt, and I had become frustrated because the enormous space-time variability of rain was way beyond anything that conventional methods could handleuu. The problem was that there were several competing techniques for estimating rain from satellites and each one was different, yet there was essentially no way to validate any of them: scientific progress in the field was blocked. Fortunately, this didn't prevent radar and satellite remote sensing technology from continuing to advance. Not long after reading Mandelbrot's book, I started working on developing fractal models of rain, so that when I finally submitted my thesis in November 1980, about half of it

was on conventional remote sensing topics, while the other half was an attempt to understand precipitation variability by using fractal analyses and fractal models of rainvv. Given that three of the more conventional thesis chapters had already been published in journals - and had thus passed peer review - I was confident of a rubber stamp by the external thesis examiner. Since I had already been awarded a post-doctoral fellowship financed by Canada's National Science and Engineering Research Council (NSERC), I happily began preparing for a move to Paris to take it up at the Météorologie Nationale (the French weather service). But rather than getting a nod and a wink, the unthinkable happened: my thesis was rejected!

The external examiner, David Atlas (1924-2015), then at NASA, was a pioneering radar meteorologist who was involved in the - then fashionable - scalebound meso-scale theorizing (ch. 4). Atlas was clearly uncomfortable with the fractal material, but rather than attacking it directly, he instead claimed that while the content of the thesis might be acceptable, its structure was not. To his way of thinking, there were in fact two theses, not one. The first was a conventional one that had already been published, while the second was a thesis on fractal precipitation that, according to him, was unrelated to the first. The last point piqued me, since it seemed obvious that the fractals were there in an attempt to overcome longstanding problems of untamed space-time variability: on the contrary, they were very relevant to a remote sensing thesis.

At that point, I panicked. According to the McGill thesis regulations, I had only two options: either I accept the referee's strictures, amputate the offending fractal material and resubmit, or I could refuse to bend. In the latter case, the thesis would be sent without change to two external referees, both of whom would have to accept it - a highly risky proposition. Although I was ready to defend the fractal material, I knew full well it had not received serious critical attention. There might be errors that would sink the whole thing. A second rejection would be disastrous because then McGill would not permit me to resubmit a thesis on the same topic. Therefore, before making a decision - and with the encouragement of Austin - I contacted Mandelbrot, visiting him at his Yorktown Heights IBM office in January 1981. Mandelbrot was very positive about the material in the draft thesis. Not being very familiar with atmospheric scienceww, and wanting to give me the best possible advice, he contacted his friend, the oceanographer Eric Mollo-Christensen (1923-2009) at MIT. Mollo-Christensen advised me to simply remove the fractal material and get the thesis out of the way; I could then try to publish it in the usual scientific literature. Beyond that, Mandelbrot advised me to make a short publication out of the analysis part, hinting that we could later start a collaboration to develop an improved fractal model of rain.

With the fractals excised, the thesis was accepted without a hitchxx, and at the end of June, a week after defending my thesis, I went off to my Paris post-doc at the Météorologie Nationale, to work with a radar specialist, Marc Giletyy. In literally my first week in the French capital, I wrote up the analysis part - the empirical rain and cloud area-perimeter relation (fig. 2.8) - and submitted it to Science26zz. A few months later, I started a series of triweekly visits to Mandelbrot in Yorktown Heights. This collaboration eventually spawned the Fractal Sums of Pulses (FSP) model27aaa, close to the "H-model" described in box 2.2.

ss She was a pioneering electronic artist and had been working with early colour Xerox machines to develop electronic imagery years before the development of personal computers and cheap computer graphics: 25 Lovejoy, M. Postmodern Currents: Art and Artists in the Age of Electronic Media. (Prentice Hall College Division, 1989).
tt My thesis (1981) was entitled "The remote sensing of rain", Physics dept., McGill University.
uu Conventional methods are still in vogue, but over the last ten years our understanding of precipitation has been revolutionized by the application of the first satellite borne weather radar (the Tropical Rainfall Measurement Mission), which has unequivocally demonstrated that - like the other atmospheric variables - precipitation is a global scale cascade process that is distinctive primarily because its intermittency parameter is much larger than for the other fields.
vv My approach to rainfall modeling followed the method that Mandelbrot had used to make cloud and mountain models in his book, except that I used a variant that was far more variable (based on "Levy" distributions rather than the bell curve, see box 3.1).
ww "The Fractal Geometry of Nature" contained simulations of cloud "surfaces" based on turbulence theory (the Corrsin-Obukhov law), but these were not more directly related to meteorology.
xx Twenty five years later, I met up with Atlas, by then in his 80's but still occupying an office at NASA. His rejection of my thesis had been a fatherly act intended to steer me back towards mainstream science. During our discussion, he was mostly intrigued that I was still pursuing the material he had rejected so long ago!
yy Within two months of the start of my post-doc, Gilet was given a high level administrative position and essentially withdrew from research. As a free agent, I soon started collaborating with Daniel Schertzer in the newly formed turbulence group.

2.2.2 Fractal sets and dimensions

The converse of scalebound objects are scaling, fractal objectsbbb. Let's take a look at some examples. An object is scaling when, if blown up, a small part in some way resembles the larger whole. An unavoidable example is the icon that now bears his name - the Mandelbrot set, the black silhouette in fig. 2.5. It can be seen that after a series of blow-ups we find reduced scale copies of the original (largest version) of the setccc. While the Mandelbrot set has been termed "the most complex object in mathematics"28, it is simultaneously one of the simplest, being generated by simply iterating the algorithm: "I take a number, square it, add a constant, square it, add a constant..."ddd. Precisely because of this algorithmic simplicity, it is now the subject of a small cottage industry of computer geeks who superbly combine numerics, graphics and music. YouTube is replete with examples; the last time I looked, the record-holder displayed a zoom by a factor of over 10^4000 (a one with four thousand zeroes)!

The Mandelbrot set may be easy to generate, but it is hardly easy to understand. To understand the scaling, fractal idea, consider instead the simplest (and historically the first) fractal: the "perfect" Cantor set (fig. 2.6a). Start with a segment one unit long (infinitely thin: this is mathematics!); this is the "base". Then remove the middle 1/3; this is the "motif" (second from the top in the figure). Then iterate by removing the middle third of each of the two 1/3 long segments from the previous level. Continue iterating, so that at every level one removes all the middle segments before moving to the next level. When this is repeated to infinitely small segments, the result is the Cantor set29eee. From the figure, we can see that if either the left or the right half of the set is enlarged by a factor of three, then one obtains the same set. This property - that a part is in some way similar to the whole - is for obvious reasons called "self-similarity". In this case, the left or right halves are identical to the whole. In atmospheric applications, the relationship between a part and the whole will generally be statistical: small parts are only the same as the whole on average.

The Cantor set has many interesting properties; for our purposes the main one is its fractality, a consequence of its self-similarity. Let's consider it a little more closely. After n construction levels, the number of segments is N = 2^n, and the length of each segment is L = (1/3)^n. Therefore N and L are themselves related by a power law: eliminating the level n, we find N = L^-D, where D = log(2)/log(3) = 0.63... ("log" for logarithm). D is the fractal dimension. In this case, it is called the "box counting" dimension since - if we considered a fully formed Cantor set - the number of segments of length L (one dimensional "boxes") that we would need to cover the set would be the samefff, N. If this fractal dimension seems a bit weird, consider what would happen if we applied it to the entire initial (one dimensional) line segment. We can check that we do indeed recover D = 1. To see this, consider what would happen if we did not remove the middle third (we kept the original segment) but analysed it using the same reasoning. In this case we would still divide by 3 at every iteration, so that, as before, L = (1/3)^n, but now the number of segments is simply N = 3^n instead of 2^n. This would lead to D = log3/log3 = 1, simply confirming that the segment does indeed have the usual dimension of a line.

When a quantity such as N changes in a power law manner with scale L, it is called "scaling", so that N = L^-D is a scaling law. Contrary to a scalebound process that changes its mechanism (its "laws") every factor of 10 or so, a unique scaling law may hold over a wide range of scales; for the Cantor set and other mathematical fractals, over an infinite range. Of course, real physical fractals can only be scaling over finite ranges of scale: there is always a smallest and a largest scale beyond which the scaling will no longer be valid. Why does a power law imply scaling (and vice versa)? The answer is simply that if N = L^-D and we zoom in by a factor λ (so that L -> L/λ), then we see that N -> N λ^D, so that the form of the law is unchanged. In contrast, for a scalebound process, changing scales by zooming would give us something quite different.

zz The paper sparked a stir; since then it has been cited nearly a thousand times.
aaa The FSP model was an extension and improvement over the Levy fault model that I had developed during my PhD thesis, but was nevertheless still mono - not multi - fractal.
bbb The use of the term "object" is frequent in this context but is sometimes confusingly vague. Mathematically, fractals are sets of points that have a scale symmetry (scaling), whereas multifractals are fields - such as the temperature in the atmosphere - that have values at each point in space and in time and that are scaling.
ccc The small versions are actually slightly deformed versions of the larger ones.
ddd To get an interesting result, the constant should be a complex number (i.e. one that involves the square root of minus one).
eee It was apparently discovered a bit earlier by H. J. S. Smith in 1874.
fff Here - as almost always - the box counting dimension is the same as the Hausdorff dimension that is sometimes used in this context.
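The box-counting argument above can be checked mechanically. A minimal sketch (Python/numpy; the function and variable names are mine): build the 2^n segments left after n removals and confirm that N = L^-D with D = log2/log3 at every level.

```python
import numpy as np

def cantor_segments(levels):
    """The 2^n segments of length 3^-n left after `levels` middle-third removals."""
    segs = [(0.0, 1.0)]
    for _ in range(levels):
        segs = [piece for a, b in segs
                for piece in ((a, a + (b - a)/3), (b - (b - a)/3, b))]
    return segs

for n in range(1, 9):
    L = 3.0**(-n)                    # box size: at level n each segment has length 3^-n
    N = len(cantor_segments(n))      # number of boxes needed to cover the set
    print(f"L = 3^-{n}:  N = {N},  log(N)/log(1/L) = {np.log(N)/np.log(1/L):.4f}")
# every level gives exactly D = log(2)/log(3) = 0.6309...
```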
Whenever there is a scaling law, there is something that doesn't change with scale, something that is scale invariant: in the previous example it is the fractal dimension D. No matter how far we zoom into the Cantor set, we will always recover the same value of D. Self-similarity is a special case of scale invariance and occurs when - as its name suggests - some aspect of the system is unchanged under a usual blow-up. In physics, quantities such as energy that are invariant (conserved) under various transformations are of fundamental importance; hence the significance of exponents, such as fractal dimensions, that are invariant under scale transformations.

More generally, a system can be invariant under more generalized "zooms", i.e. blow-ups combined with stretchings, rotations or other transformations. As an example, let's return to the Weierstrass function, which is scale invariant but not self-similar. To show that it is indeed scale invariant, we must combine a blow-up with a squashing - or, alternatively, blow up by different factors in each of the coordinate directions. This property is shown in fig. 2.2a by comparing the full Weierstrass function on the interval between 0 and 1 with the upper right panel, which shows the left half (omitting the lowest frequencyggg), stretched in the horizontal direction by a factor of 2 and in the vertical direction by the factor 2^H = 1.26. Objects that are scale invariant only after being blown up by different factors in perpendicular directions are called "self-affine"; the Weierstrass function is thus self-affine. As we discuss at length in the next chapter, scale invariance is still much more general than this.

On the other hand, in the infinitely small limit, the Cantor set is simply a collection of disconnected points (Mandelbrot calls such sets "dusts")hhh, and a mathematical point has dimension zeroiii. The Cantor set is thus an example of a set whose fractal dimension 0.63... is between 0 and 1, and D quantifies the extent to which it fills the line. Sets of points with such in-between dimensions (they are usually noninteger) are fractalsjjj. More generally, for the purposes of this book, a fractal is a geometric set of points that is scale invariantkkk.

ggg If we don't remove the lowest frequency in the upper left construction, then the result is only approximately self-affine; however, the construction mechanism itself is nevertheless self-affine.
hhh At some stage in the construction, any connected segment would have been cut by the removal of a middle third.
iii The familiar geometric shapes studied by Euclid - points, lines, planes, volumes - have "topological dimensions" 0, 1, 2, 3. Topological dimensions have to do with the connectedness of a set. For fractal sets, the fractal dimension and the topological dimension are generally different.

As another mathematical example, consider next fig. 2.6b, the "Sierpinski carpet"30lll. The figure shows the base (upper left) and the motif (upper right), obtained by dividing an initial square into squares one third the size and then removing the middle one; the bottom right shows the result after 6 iterations. Using the same approach as above, after n construction steps (levels), the number of squares is N = 8^n, and the size of each is L = (1/3)^n. Thus N = L^-D with D = log8/log3 = 1.89... Indeed, the Cantor set, the Sierpinski carpet and the unit segment illustrate the general result:

Number of boxes ≈ (scale)^-D

where the scale is identified with the length of the side of a square. Just as the Cantor set has a fractal dimension D = 0.63... between 0 and 1 - between a point and a line - the value of D for the Sierpinski carpet is between 1 and 2, i.e. between a line and a plane, and it quantifies the extent to which the Sierpinski carpet exceeds a line while partially filling the plane. These examples show a basic feature of fractal sets: due to their hierarchical clustering of points, they are "sparse", and their fractal dimension quantifies their sparseness.

While the number of boxes gives us information about the absolute frequency of occurrence of parts of the set of size L, it is often more useful to characterize the density of the boxes of size L, obtained by dividing the number of boxes needed to cover the set by the total number of possible boxes: L^-1 for the Cantor set and L^-2 for the Sierpinski carpet, since they are sets on the line (d = 1) and plane (d = 2) respectivelymmm. This ratio is their relative frequency, i.e. it is the probability that a randomly placed segment (d = 1) or square (d = 2) will happen to land on part of the set; the ratio is L^-D / L^-d = L^(d-D) = L^C, where C = d-D is the codimension of the set. Whereas D measures absolute sparseness and frequencies of occurrence, C measures relative sparseness and probabilities of occurrence. For the Cantor set, C = 1 - log2/log3 = 0.36... and for the Sierpinski carpet, C = 2 - log8/log3 = 0.11..., so that their relative sparsenesses are not so different. If I put a circle (or square) of size L at random on the Sierpinski carpet (iterated to infinitely small scales), the probability of it landing on part of the carpet is L^0.11, whereas for the Cantor set, a randomly placed segment of length L would have almost the same probability - L^0.36 - of landing on the set. Notice that in both cases the probability gets smaller as the scale L is reduced. This is because small segments/boxes are more likely to land in "holes" than larger ones. This example illustrates the general result:

probability ≈ (scale)^C

jjj Due to nontrivial mathematical issues, there are numerous mathematical definitions of dimension; a full discussion would take us too far afield.
kkk Of course, the line in the above example is scale invariant with D = 1, so according to this definition it is also a fractal. However, we generally reserve the term "fractal" for less trivial scale invariant sets, usually with fractal dimension greater than the topological dimension.
lll This construction, and the analogous one based on removing middle triangles, is credited to W. Sierpinski in 1916; the name "carpet" was added by Mandelbrot. Mobile phone and wifi antennae have been produced using a few iterations of the Sierpinski carpet, exploiting their scale invariance to accommodate multiple frequencies. The Sierpinski triangle goes back to at least the 13th century, where it has been found in churches as a decorative motif.
mmm The negative sign is because smaller segments are more numerous. Indeed, N×L = total length (in d = 1), N×L^2 = total area (in d = 2), etc.

In science, we’re usually interested in probabilities, so that fractal codimensions are generally more useful than fractal dimensions.
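The probability law can also be checked by brute force. The sketch below (Python/numpy; the names are mine) drops random intervals of length L onto a deep pre-Cantor set and counts how often they touch it; under the stated approximation (probe intervals much longer than the smallest retained segments), the hit probability tracks L^C with C = 1 - log2/log3 ≈ 0.37.

```python
import numpy as np

rng = np.random.default_rng(1)

def cantor_segments(levels):
    """The 2^n segments of length 3^-n left after n middle-third removals."""
    segs = [(0.0, 1.0)]
    for _ in range(levels):
        segs = [piece for a, b in segs
                for piece in ((a, a + (b - a)/3), (b - (b - a)/3, b))]
    return segs

C = 1 - np.log(2)/np.log(3)     # codimension of the Cantor set, about 0.369
segs = cantor_segments(11)      # deep pre-Cantor set: segments of length 3^-11

for k in range(2, 8):
    L = 3.0**(-k)                         # probe length, still >> 3^-11
    x = rng.random(20_000) * (1 - L)      # random left ends of the probes
    hit = np.zeros(x.size, dtype=bool)
    for a, b in segs:                     # a probe touches the set iff it
        hit |= (x < b) & (x + L > a)      # overlaps some remaining segment
    print(f"L = 3^-{k}:  P = {hit.mean():.3f},  ratio P/L^C = {hit.mean()/L**C:.2f}")
# once L << 1 the ratio settles down to a constant: probability ~ (scale)^C
```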


Fig. 2.6a: The Cantor set. Starting at the top with a segment one unit long (the "base"), the "motif" is obtained by removing the middle third. The operation of removing middle thirds is then iterated to infinitely small scales. The dashed ellipses show the property of self-similarity: the left hand half of any level, when blown up by a factor of three, gives the next level up.

Fig. 2.6b: The construction of the Sierpinski carpet. The base (upper left) is transformed to the motif (upper middle) by dividing the square into nine subsquares each one third the size and then removing the middle square. The construction proceeds left to right, top to bottom, to the sixth iteration.

An example of a fractal set relevant to atmospheric science is the set of locations where meteorological measurements are taken, represented as points in fig. 2.6c. In this case, the set is sparse because the measurement stations are concentrated on continents and in richer nations. To estimate its fractal dimension, one can place circles of radius L on each stationnnn (one is shown in the figure) and determine the average number of other stations within the distance L. If one repeats this operation for each radius L, averaging over all the stations, one finds that on averageooo there are L^D stations within a radius L, and that this behaviour continues down to a scale of 1 kmppp. For the measuring network, we found D = 1.75 (fig. 2.6d). Even today, much of our knowledge of the atmosphere comes from meteorological stations; for climate purposes - such as estimating the evolution of the atmosphere over the last century - we must also consider ship measurements, place the data on a grid (typically 5°×5° in size) and average them over a month (e.g. fig. 2.6e). It turns out that for any given month, the set of grid points having some temperature data is similarly sparseqqq, so that both in situ weather and climate data are taken on fractal sets. An immediate consequence of a fractal network is that it will not detect sufficiently sparse fractal phenomena - for example, the violent centres of storms - whose fractal dimensions are less than the network's codimension C (in this example, 0.25). The analysis shows that as we use larger and larger circles, they typically encompass larger and larger voids, so that the number of stations per square kilometer systematically decreases: the measuring network effectively has holes at all scales. This means that the usual way of handling missing data must be revised (see box 5.1). At present, one thinks of the measuring network as a two dimensional spatial array or grid (three dimensional if we include time), although with some grid points empty. According to this way of thinking, since the earth has a surface area of about 500 million square kilometers, each of the 10,000 stations represents about 50 thousand square kilometers. This corresponds to a box about 220 kilometers on a side, so that atmospheric structures (e.g. storms) smaller than this will typically not be detected. Although it is admitted to be imperfectrrr, the grid is therefore supposed to have a spatial resolution of 220 km. Our analysis shows that on the contrary, the problem is one of inadequate dimensional resolutionsss.

nnn This technique actually estimates the "correlation dimension" of the set. If instead one centres circles at points chosen at random on the earth's surface (instead of only on stations), then one obtains the box-counting dimension discussed above. It turns out that in general the two are slightly different; the density of points is an example of a multifractal measure. Indeed, one can introduce an infinite hierarchy of different exponents associated with the density of points.
ooo The rule L^D for the number of stations in a circle of radius L is a consequence of the fact that the number of occupied boxes at scale L varies as L^(-D) while the total number of stations is fixed: on average, the number of stations per box therefore varies as L^D, since L^(-D) × L^D = constant.
ppp The geographical locations of the stations in fig. 2.6c were only specified to the nearest kilometer, so it is possible that the curve actually extends to even smaller scales. For large L it is valid up to several thousand kilometers, which is about as much as is theoretically possible given that there were only 9962 stations.
qqq Both in space and - due to data outages and ship movements - in time; the fractal dimensions and codimensions are nearly the same as for the meteorological network: 31 Lovejoy, S. & Schertzer, D. The Weather and Climate: Emergent Laws and Multifractal Cascades. (Cambridge University Press, 2013).
rrr The techniques for filling the "holes", such as "Kriging", typically also make scalebound assumptions (exponential decorrelations and the like).
sss When estimating global temperatures over scales up to decades, the problem of missing data does indeed dominate the other errors (although this is not the same as dimensional resolution), see box 5.1.
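The station analysis is easy to emulate on a synthetic fractal point set. In the sketch below (again my own illustration, with a chaos-game Sierpinski triangle standing in for the real station coordinates), the same pair-counting recipe is applied: for each radius L one computes the average number of other points within a distance L, and the log-log slope estimates the correlation dimension (here log3/log2 ≈ 1.58, versus the D = 1.75 found for the actual network):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "station network": points on a Sierpinski triangle generated
# by the chaos game; its correlation dimension is log3/log2 = 1.585...
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])
pts = np.empty((2000, 2))
p = vertices[0]
for i in range(len(pts)):
    p = (p + vertices[rng.integers(3)]) / 2   # jump halfway to a random vertex
    pts[i] = p

# Pairwise distances, then the average number of *other* points within L
dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
radii = 2.0 ** -np.arange(1, 7)               # L = 1/2, 1/4, ..., 1/64
counts = [(dists < L).sum(axis=1).mean() - 1 for L in radii]  # -1 excludes self

D = np.polyfit(np.log(radii), np.log(counts), 1)[0]
print(D)   # roughly 1.58 for this set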



Fig. 2.6c: The geographical distribution of the 9962 stations that the World Meteorological Organization listed as giving at least one meteorological measurement per 24 hours (in 1986); it can be seen that the network closely follows the distribution of land masses and is concentrated in the rich and populous countries. The main visible artificial feature is the Trans-Siberian Railroad. Also shown is an example of a circle used in the analysis. Adapted from ref. 32.

Fig. 2.6d: The average number of stations (vertical axis) within a circle of radius L (horizontal axis, in kilometers). The slope of the top straight line is D = 1.75. Adapted from ref. 32.


Fig. 2.6e: Black indicates the 5°×5° grid points for which there is some data in the month of January 1878 (20% of the 2560 grid points were filled)ttt. Although highly deformed by this map projection, we can almost make out the South American continent (white surrounded by black; the data are from ship measurements, lower left) and Europe, the central upper black mass of grid points.

2.2.3 Fractal lines and wiggliness

The preceding examples of fractal sets were deliberately chosen so that one could get an intuitive feel for the sparseness that the dimension quantifies. In many cases, one instead deals with sets made up of "wiggly" lines such as the Koch curve34 (1904), shown in fig. 2.7auuu; the fractal dimension can often quantify wiggliness. The construction proceeds from top to bottom by replacing each straight segment by segments in the shape of the second curve from the top, i.e. made of four pieces, each one third of the original size. Again, after n iterations, we have N = 4^n and L = (1/3)^n, hence the fractal dimension is D = log4/log3 = 1.26… This wiggly curve has a dimension between 1 and 2; the wiggliness is quantified by D. Note that as we proceed to more and more iterations, the length of the curve increases. Indeed, after each iteration the length increases by the factor 4/3, since each segment is replaced by four segments each 1/3 the previous length. Therefore after n iterations the length is (4/3)^n, which becomes infinite as n grows. If a completed Koch curve is measured with a ruler of length L (such a ruler will be insensitive to wiggles smaller than thisvvv), then in terms of the fractal dimension, the length of the Koch curve is L^(1-D). As the ruler gets shorter and shorter, it can measure more and more details, and the length increases accordingly: since D = 1.26 > 1, the length grows as L^(-0.26) and becomes infinite for rulers with small enough L (see the numerical sketch below). How far can we take wiggliness? In 1890, Giuseppe Peano (1858-1932) proposed the fractal construction that bears his name (fig. 2.7b). The Peano curve is made from a line that is so wiggly that - by successive iterations - it ends up literally filling part of the plane!

ttt This is from the HadCRUT data set using an equal angle projection that greatly distorts high latitudes: 33 Kennedy, J. J., Rayner, N. A., Smith, R. O., Saunby, M. & Parker, D. E. Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850 part 2: biases and homogenisation. J. Geophys. Res. 116, D14104, doi:10.1029/2010JD015220 (2011).
uuu If three Koch curves are joined in the shape of a triangle, one obtains the Koch "snowflake", which is probably more familiar.
vvv This method is sometimes called the "Richardson dividers method" after L. F. Richardson, who first used it to estimate the length of coastlines and other geographic features, see below.

At the time, this stunned the mathematical community since it was believed that a square was two dimensional because it required two numbers (coordinates) to specify a point in it. Peano's curve allows a point to be specified instead by a single coordinate giving its position on an (infinite) line wiggling its way around the squarewww. We have already seen another example of wiggliness, the Weierstrass function (figs. 1.3, 2.2a), constructed by adding sinusoids with geometrically increasing frequencies and geometrically decreasing amplitudes. The Weierstrass function was originally proposed as the first example of a function whose value is everywhere well defined (it is continuous), but which does not have a tangent anywhere (it is "nowhere differentiable"). A visual inspection (fig. 2.2a) shows why this is so: to determine the tangent, we must zoom in to find a smooth enough part over which to estimate the slope; since the Weierstrass function is a fractal, we zoom in forever without finding anything smooth. An atmospheric example of a wiggly curve is the perimeter of a cloud, defined on a cloud photograph as the line separating regions brighter or darker than a fixed brightness threshold26. In order to estimate the fractal dimension of a cloud perimeter, we could try to measure it with rulers of different lengths and use the fact that the perimeter length increases as L^(1-D) (since D > 1). It turns out to be more convenient to use fixed resolution satellite or radar images and many clouds of different sizes. If we ignore any holes in the cloudsxxx and if their perimeters all have the same fractal dimension, then their areas (A) turn out to be related to their perimetersyyy as P = A^(D/2). Fig. 2.8 (right) shows an example of this technique applied to real cloud and rain areas. Although various theories were later developed to explain the empirical dimensionzzz (D = 1.35), the most important implication of this figure is that it gave the first modern evidence of the complete failure of Orlanski's scalebound classification. Had Orlanski's classification been based on real physical phenomena, each different and acting over narrow ranges of scales, then we would have expected a series of different slopes, one for each of his ranges. The expectation that the behaviour would be radically different over different scale ranges was especially strong concerning the meso-scale, the horizontal range from about 1 to 100 kilometers, where it was believed that the atmosphere's thickness would play a key role in changing the behaviour (the "meso-scale gap", see ch. 4). Before this, the only other quantitative evidence for wide range atmospheric scaling was from various empirical tests of Richardson's 4/3 law of turbulent diffusion35; fig. 2.8 (left) shows his original verification using, notably, data from pilot balloons and volcanic ashaaaa. Through the 1950's and 1960's, Richardson's atmospheric 4/3 power law was repeatedly confirmed, with theorists invariably complaining that it extends beyond the range for "which it can be justified theoretically"bbbb. However, the story didn't end there: in the 1970's, in the wake of two-dimensional isotropic turbulence theory, a large scale balloon experiment (EOLE) was undertaken, and it claimed to invalidate his law. In ch. 4, we describe the saga of how this conclusion was based on an erroneous analysis, and how (partially) in 2004 and (fully) in 2013, Richardson was vindicated.

www Note that in the infinitely small limit, at each point, the Peano curve touches itself. This means that while it is a mapping of the line onto the plane, the mapping is not one-to-one.
xxx If the cloud area is itself a fractal set, then P = A^(D/Dc), where Dc < 2 is the fractal dimension of the clouds.
yyy The area-perimeter relation was proposed in: 22 Mandelbrot, B. B. Fractals, form, chance and dimension. (Freeman, 1977).
zzz These results were in my original thesis; in the final version they were excised in order to satisfy the external referee, and they were subsequently published in Science.
aaaa The ocean is also an example of a stratified turbulent system and the 4/3 law holds fairly accurately; Richardson creatively tested his law in the ocean using, amongst other things, bags of parsnips that he watched diffusing from a bridge: 36 Richardson, L. F. & Stommel, H. Note on eddy diffusivity in the sea. J. Met. 5, 238-240 (1948). Still later, it was verified in the ocean over the range 10 m to 10,000 km in: 37 Okubo, A. & Ozmidov, R. V. Empirical dependence of the horizontal eddy diffusivity in the ocean on the length scale of the cloud. Izv. Akad. Nauk SSSR, Fiz. Atmosf. i Okeana 6 (5), 534-536 (1970).
bbbb Meaning that it cannot be accounted for by the dominant three dimensional isotropic homogeneous turbulence theory, see e.g. p. 557 of: 38 Monin, A. S. & Yaglom, A. M. Statistical Fluid Mechanics. (MIT Press, 1975).
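To make the ruler argument concrete, here is a small numerical sketch (my own illustration; the recursive construction and the names are mine). It builds the Koch curve to successive depths n and checks that the total length, when the curve is resolved down to ruler size L = 3^(-n), grows as L^(1-D) with D = log4/log3:

import numpy as np

def koch(p0, p1, depth):
    # Recursively replace the segment p0->p1 by four segments 1/3 its size,
    # with an equilateral "bump" (a +60 degree rotation) in the middle third.
    if depth == 0:
        return [p0]
    d = (p1 - p0) / 3.0
    a, b = p0 + d, p0 + 2 * d                 # the two "third" points
    c, s = 0.5, np.sqrt(3) / 2                # cos 60, sin 60
    apex = a + np.array([d[0] * c - d[1] * s, d[0] * s + d[1] * c])
    pts = []
    for q0, q1 in [(p0, a), (a, apex), (apex, b), (b, p1)]:
        pts += koch(q0, q1, depth - 1)
    return pts

D = np.log(4) / np.log(3)                     # = 1.2618...
for n in range(1, 8):
    pts = np.array(koch(np.array([0.0, 0.0]), np.array([1.0, 0.0]), n)
                   + [[1.0, 0.0]])
    length = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    ruler = 3.0 ** -n                         # segment (ruler) size at depth n
    print(n, round(length, 4), round(ruler ** (1 - D), 4))  # both equal (4/3)^n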

Fig. 2.7a: Left: A fractal Koch curve34 (1904), reproduced from Welander39 (1955) who used it as a model of the interface between two parts of a turbulent fluid.

Fig. 2.7b: Left, the first three steps of the original Peano curve, showing how a line (dimension 1) can literally fill the plane (dimension 2).

38 Monin, A. S. & Yaglom, A. M. Statistical Fluid Mechanics. (MIT press, 1975). Friday, February 2, 2018 Chapter 2 25

Right: A variant reproduced from Steinhaus40 (1960) who used it as a model for a hydrographic network, illustrating how streams can fill a surface.


Fig. 2.8: The left shows Richardson's proposed 4/3 law of turbulent diffusion35cccc, which includes a few estimated data points. Right: the area-perimeter relation for radar-detected rain areas (black) and infrared satellite cloud images (open circles); the perimeter is on the horizontal axis, the area on the vertical axis. The slope corresponds to D = 1.35. The meso-scale (roughly 1 to 100 km) is indicated by the brackets: nothing special happens there. Adapted from ref. 26.

2.2.4 Richardson

We have discussed several of the famous 19th century fractals: the Cantor set (fig. 2.6a), the first set with a noninteger dimension; the Peano curve (fig. 2.7b), the first line that could pass through every point in the unit square (a plane); and the Weierstrass function (fig. 2.2a), the first continuous curve that doesn't have a tangent anywhere. But for a long time these were considered to be essentially mathematical constructions: academic oddities without physical relevance. Mandelbrot rehabilitated them and provocatively called them "monsters". Mandelbrot not only coined the term "fractal" but, with his indefatigable energy, put fractals squarely on the scientific map. Although he made numerous mathematical contributionsdddd, his most important one was as a towering pioneer in applying fractals and scaling to the real world. In this regard, his only serious scientific precursor was Lewis Fry Richardsoneeee (1881-1953).

Due to his Quaker beliefs, Richardson was a pacifist, and this made his career difficult, essentially disqualifying him from academic positions. He instead joined the Meteorology Office, but temporarily quit it in order to drive an ambulance during the first world war. Afterwards, he rejoined the Meteorology Office but resigned in 1920 when it was militarized by being merged into the Air Ministry. Richardson worked on a range of topics and is remembered for the nondimensional Richardson number that characterizes atmospheric stability, the Richardson 4/3 law (fig. 2.8, left), the Modified Richardson Iteration and Richardson Acceleration techniques of numerical analysis, and the Richardson dividers method. The latter is a variant on box counting (discussed above) that he notably used to estimate the length of the coastline of Britain, demonstrating that it followed a power law. Mandelbrot's famous 1967 paper that initiated fractals, "How long is the coastline of Britain? Statistical self-similarity and fractional dimension"45ffff, took Richardson's graphs and interpreted the exponent in terms of a fractional dimensiongggg. Fully aware of the problem of conceptualizing wide range atmospheric variability, Richardson was the first to explicitly propose that the atmosphere might be fractal. A remarkable subheading in his 1926 paper on turbulent diffusion is entitled "Does the wind possess a velocity?", followed by the statement: "this question, at first sight foolish, improves upon acquaintance". He then suggested that a particle transported by the wind might have a Weierstrass function-like trajectory, which would imply that its speed (tangent) would not be well definedhhhh.

Richardson is unique in that he straddled the two main - and superficially opposing - threads of atmospheric science: the low level deterministic approach and the high level statistical turbulence approach. Remarkably, he was a founding figure for both. His seminal book "Weather prediction by numerical process"46iiii (1922) inaugurated the era of numerical weather prediction. In it, Richardson not only wrote down the modern equations of atmospheric dynamics, but he pioneered numerical techniques for their solution; he even laboriously attempted a manual integrationjjjj. Yet this work also contained the seed of an alternative: buried in the middle of a paragraph, he slyly inserted the now iconic poem describing the cascade idea: "Big whirls have little whirls that feed on their velocity, little whirls have smaller whirls and so on to viscosity (in the molecular sense)"kkkk. Fig. 2.9 is a schematic showing a modern interpretation that is the basis of turbulent models of intermittency and of the basic multifractal model (see Box 2.1).

cccc The quality of the figure is low, but this version, due to Monin, is already improved from the original: 41 Monin, A. S. Weather forecasting as a problem in physics. (MIT Press, 1972).
dddd I will let the mathematicians judge his contributions to mathematics. However, there is no question that Mandelbrot's contribution to science has been monumental and underrated. In any case (and in spite of Mandelbrot's efforts!), it is still a bit early to evaluate his place in the history of science. Interested readers may consult his posthumously published autobiography, "The Fractalist": 42 Mandelbrot, B. B. The Fractalist. (First Vintage Books, 2011).

Mandelbrot’s efforts!), it is still a bit early to evaluate his place in the history of science. Interested readers may consult his posthumously published autobiography, “The Fractalist”: 42 Mandebrot, B. B. The Fractalist. (First Vintage Books, 2011). eeee We have given some historical examples of early geophysical fractal models (figs. 2.7a,b) but other notable precursors were Jean Perrin (1870-1942), who questioned the differentiability of the coast of Brittany: 43 Perrin, J. Les Atomes. (NRF-Gallimard, 1913). and Hugo Steinhaus (1887- 1972) who questioned the integrability of the length of the river Vistula: 44 Steinhaus, H. Length, Shape and Area. Colloquium Mathematicum III, 1-13 (1954). Lack of differentiability and its converse, integrability are typical scaling, fractal features. Looking back, these examples are significant, but are isolated. In contrast, Mandelbrot initiated a whole body of work. ffff This was still nearly a decade before Mandelbrot coined the word “fractal”. gggg Above we saw that the length of the cloud perimeter varies as L1-D where L is the length of the ruler and D is the fractal dimension. hhhh It turned out that the problem was not the velocity, but the acceleration. iiii Lacking support, he paid for the publication out of his own pocket. jjjj Near the war’s end, he somehow found six weeks to attempt a manual integration of the weather equations. His estimate of the pressure tendency at a single grid point in Europe turned out to be badly wrong (as he admitted), but the source of the error was only recently identified, see the fascinating account by Lynch: 47 Lynch, P. The emergence of numerical weather prediction: Richardson's Dream. (Cambridge University Press, 2006). kkkk This poem was a parody of a nursery rhyme, the “Siphonaptera”: “Big fleas have little fleas, Upon their backs to bite 'em, And little fleas have lesser fleas, and so, ad infinitum. Friday, February 2, 2018 Chapter 2 27 schematic showing a modern interpretation that is the basis of turbulent models of intermittency and is the basic multifractal model (see Box 2.1). Richardson’s book was soon followed by the first turbulent law, the Richardson 4/3 law of turbulent diffusion35, today celebrated as the starting point for modern theories of turbulence - including the key idea of cascades and scale invariance. Unencumbered by later notions of meso-scalellll, and with remarkable prescience, he even proposed that his scaling law could hold from dissipation up to planetary scales (fig. 2.8, left). Richardson is the precursor of much of the work described in this book including the area-perimeter analysis (fig. 2.8, right), and the large body of results that we discuss later. Today, he is honoured both as father of numerical weather prediction by the Royal Meteorological Society’s Richardson prize and as grandfather of turbulence by the European Geosciences Union’s Richardson medalmmmm. As a humanist, Richardson worked to prevent war, with his book “The problem of contiguity: an appendix of statistics of deadly quarrels”48 he founded the mathematical (and nonlinear!) study of war. He was also anxious that his research be applied to directly improve the situation of humanity and proposed the construction of a vast “Weather factory”. This would employ tens of thousands of human “computers” and would make real time weather forecasts. Recognizing (from personal experience) the tedium of manual computation, he foresaw the need for the factory to include social and cultural amenities. 
Let me now explain a deep consequence of Richardson's cascade idea that didn't fully mature until the nonlinear revolution in the 1980's. We have seen that the alternative to scalebound thinking is scaling thinking, and that fractals embody this idea for geometric sets of points. For example, the Koch curve was a model of a turbulent interface - the set of points bounding two different regions - and the Peano curve a model of a hydrographic network. However, in order to apply fractal geometry to the set of bounding (perimeter) points, we were already faced with a problem: we had to reduce the grey shades to white or black (cloud or no cloud). Since atmospheric science does not often deal with black/white sets - but rather with fields such as the cloud brightness or temperature that have numerical values everywhere in space and that vary in time - something new is necessary.

Box 2.1 Intermittency, multifractals and the α model

(Re)consider fig. 1.5, the aircraft temperature transect. One could repeat the treatment of the Weierstrass function to try to fit the transect into the framework of fractal sets by simply considering the points on the top graph as the set of interest. But this turns out to be a bad idea because, as we also saw in fig. 1.5 (bottom), the figure was actually hiding some incredibly variable, spiky (intermittent) changes, and this behaviour requires something new to handle it: multifractalsnnnn. Indeed, multifractals were first discovered precisely as models of such turbulent intermittencyoooo.

Focus on the bottom of fig. 1.5, the "spikes". Rather than treating all the points on the graph as a wiggly fractal set, instead consider the set of points that exceed a fixed threshold - for example, those above the level of one standard deviation, as indicated by the horizontal line in the figure - as a kind of Cantor set. If the spikes are scale invariant, then this set will be a fractal with a certain fractal dimension. Now move the horizontal line a little higher to consider a different set: the spikes that exceed this higher level. We find that the fractal dimension of this different set is lower. Indeed, moving to higher and higher levels in this way, we could specify the fractal dimensions of all the different level sets, thus completely characterizing the set of spikes by an infinite number of fractal dimensions. The absolute temperature changes (the spikes) - and indeed the temperature transect itself - are thus multifractals. It turns out that multifractals are naturally produced by cascade processes that are physical models of the concentration of energy and other turbulent fluxes into smaller and smaller regions (fig. 2.9). Mathematically, whereas fractals are scale invariant geometric sets of points - they are black or white, you are either on or off the set of points - multifractals (at least when averaged over small segments and time intervals) are scale invariant fields: like the temperature, they have numerical values at each point in space, at each instant in time.
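Cascades are simple enough to simulate in a few lines. The following sketch is a minimal illustration (mine, with arbitrarily chosen parameter values) of a two-state "α model" of the type named in this box's title: at each step every eddy splits in two and its flux density is multiplied by a random boost or decrease whose mean is one, producing exactly the kind of spiky, intermittent field just described:

import numpy as np

rng = np.random.default_rng(42)

# Two-state alpha model: at each cascade step every eddy splits in two and
# its flux density is multiplied by w_plus ("boost", probability p) or
# w_minus ("decrease"); w_minus is chosen so the mean multiplier is 1.
# (Setting w_minus = 0 would give the "alive/dead" beta model of fig. 2.9.)
p, w_plus = 0.3, 2.0
w_minus = (1 - p * w_plus) / (1 - p)   # = 4/7 here: p*w_plus + (1-p)*w_minus = 1

flux = np.ones(1)
for _ in range(14):                    # 14 steps: 2^14 = 16384 grid points
    flux = np.repeat(flux, 2)          # each eddy splits into two sub-eddies
    boost = rng.random(flux.size) < p
    flux *= np.where(boost, w_plus, w_minus)

print(flux.mean())   # close to 1: the flux is conserved on average
print(flux.max())    # a rare, very large spike: the signature of intermittency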

49 Schertzer, D. & Lovejoy, S. in IUTAM Symp. on turbulence and chaotic phenomena in fluids. (ed T. Tasumi) 141-144. 50 Grassberger, P. Generalized dimensions of strange attractors. Physical Review Letter A 97, 227 (1983). 51 Hentschel, H. G. E. & Procaccia, I. The infinite number of generalized dimensions of fractals and strange attractors. Physica D 8, 435-444 (1983). While the turbulent publication was admittedly only in a conference proceeding, the debate about the priority of discovery was soon overshadowed by Mandelbrot’s claim to be the “father of multifractals”: 52 Mandelbrot, B. B. Multifractals and Fractals. Physics Today 39, 11, doi: http://dx.doi.org/10.1063/1.2815135 (1986). Soon after the initial discovery of multifractals, a major contribution was made by Parisi and Frisch who were also the first to coin the term “multifractal”: 53 Parisi, G. & Frisch, U. in Turbulence and predictability in geophysical fluid dynamics and climate dynamics (eds M. Ghil, R. Benzi, & G. Parisi) 84-88 (North Holland, 1985). Recognizing the importance of multifractals, Mandelbrot subsequently spent a huge effort claiming its paternity. Ironically, Steven Wolfram in his review of Mandelbrot’s posthumous autobiography “The Fractalist” complained that Mandelbrot had “diluted” the fractals concept by insisting on multifractals: 54 Wolfram, S. in Wall Street journal (2012). Friday, February 2, 2018 Chapter 2 29

Fig. 2.9 labels: left, D = log4/log2 = 2; right, D = log3/log2 = 1.58…

Fig. 2.9: A schematic diagram showing the first few steps in a Richardson-inspired cascade process. At each step, the parent eddy (Richardson's "big whirl", top) is broken up into "daughter" eddies, each reduced by a factor of 2 in scale, indicated as squares. The left shows a homogeneous cascadepppp in which the energy flux is simply redistributed from large to small structures while keeping its density constant. The right-hand side shows an improvement: "on/off" intermittency is modelled by an "alive/dead" alternative at each step (here only the bottom right sub-eddy becomes dead); the result is a fractal set of active areas, whose fractal dimension is shown at the bottom. For pedagogical reasons, the alternative displayed is purely deterministic, but it could easily be randomized (see text). Adapted from Schertzer and Lovejoy (1987). For the generalization of the (fractal) model to multifractals, see box 2.1.

2.3 Fluctuation analysis as a microscope

When confronted with variability over a huge range of space and time scales, we have argued that there are two extreme, opposing ways of conceptualizing it. We can either assume that everything changes as we move from one range of scale to another (every factor of ten or so) or, on the contrary, we can assume that, at least over a wide range (factors of hundreds, thousands or more), blowing up gives us something that is essentially the same. But this is science; it shouldn't be a question of ideology. If we are given a temperature series or a cloud image, how can we analyse the data to distinguish the two, to tell which is correct? We have already introduced two methods: spectral analysis, which is quite general, and the area-perimeter relation, which is rather specialized. While spectral analysis is a powerful technique, its interpretation is not so simple - indeed, had the interpretations been obvious, we would never have missed the quadrillion, and the distinction between macroweather and the climate would have been clarified long ago! It is therefore important to use an analysis technique that is both easy to apply and easy to understand, a kind of analytic microscope that allows us to zoom in and to systematically compare a time series or a transect at different scales, to directly test the scalebound or scaling alternative: fluctuation analysis, already briefly discussed in fig. 2.3c.

pppp Corresponding to Kolmogorov's 1941 homogeneous turbulence.

We probably all have an intuitive idea about what a fluctuation is. In a time series, it's about how the values change over an interval in time. Consider a temperature series: we are interested in how much the temperature has fluctuated over an interval of time Δt. The simplest fluctuation is simply the difference between the temperature now and at a time Δt earlier (fig. 2.10, top). This is indeed the type of fluctuation that has traditionally been used in turbulence theory and that was used in the first attempt to test the scaling hypothesis on climate data (fig. 2.11). To make fig. 2.11, two instrumental series were first analyzed: the Manley55 series from central England starting in 1659qqqq (open circles) and an early northern hemisphere series from 1880 (black circles); the former is essentially local (regional), the latter global in scale. The other series were early paleo isotope series, transformed into temperature values using the official calibrationsrrrr. To make the graph, for a given time interval Δt, one systematically calculates all the nonoverlapping differences in each series and averages the squares of these differences; the typical values shown in the plot are the square roots of these averages (the "root mean squares"). One then plots the results in logarithmic coordinates, since in that case scaling appears as straight lines and can easily be identified. Reading the graph, one can see for example that at 10 year intervals, the typical northern hemisphere temperature change is about 0.2 °C, and that over about 50,000 years the typical temperature difference is roughly 6 °C (±3 °C); this corresponds to the typical difference of temperature between glacials and interglacials, hence the box (which allows for some uncertainty) is the "glacial-interglacial window". These fluctuations are therefore straightforward to understand. In fig. 2.11, a reference line with slope H = 0.4 is shown, corresponding to the scaling behaviour ΔT ≈ Δt^H, linking hemispheric temperature variations at ten years to paleo variations at hundreds of thousands of years. Although this basic picture is essentially correct, later work provided a number of nuances that help to explain why things were not fully cleared up until much later. Notice in particular the two essentially flat sets of points in the figure: one from the local central England temperature up to roughly three hundred years, the other from an ocean core, flat from scales of 100,000 years and longer. It turns out that the flatness is an artefact of the use of differences in the definition of fluctuations: we need something better. Before continuing, let us recall the scaling laws that we have introduced up until now:

Spectrum ≈ (frequency ω)^(-β) ≈ (scale)^β
Number of boxes ≈ (size L)^(-D) ≈ (scale)^(-D)
Probability ≈ (scale)^C
Fluctuations ≈ (interval Δt)^H ≈ (scale)^H

where β is the spectral exponent, D is the fractal dimension of a set, C the codimension and H the fluctuation exponentssss. A nonobvious problem with defining fluctuations as differences is that, on average, differences cannot decrease with increasing time intervalstttt. This means that no matter what the value of H (whether positive or negative) they cannot decrease, so that whenever H is negative (implying that Δt^H decreases with increasing Δt) the difference-based fluctuations simply give a constant resultuuuu: the flat parts of fig. 2.11. But do regions of negative H exist? One way to investigate this is to try to infer H from the spectrum, which does not suffer from an analogous restriction: its exponent β can take any value. In this case there is an approximate formulavvvv we can use: β = 1 + 2H. This formula implies that negative H corresponds to β < 1, and a check on the spectrum (fig. 2.3a) indicates that several regions are indeed flat enough to imply negative H. How do we fix the problem with difference fluctuations and estimate the correct H when it is negative?

qqqq This monthly series from the Greater London area is famous for being the longest complete series based on real thermometer measurements.
rrrr Long before the internet, scanners and publicly accessible data archives, as a post-doc at the Météorologie Nationale in Paris, I recall taking the published graphs, making enlarged photocopies, and then using tracing paper to painstakingly digitize them.
ssss The symbol H is used in honour of Harold Edwin Hurst, who discovered the "Hurst effect": long range memory associated with scaling in hydrology. He did this by examining ancient records of Nile flooding. It turns out that the fluctuation exponent is in general not the same as Hurst's exponent; they are only the same if the data follow the bell curve… which they rarely do! This distinction has caused much confusion.
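A minimal numerical illustration of the problem (my own sketch, using synthetic series rather than the data): root mean square difference fluctuations correctly recover H = 0.5 for a random walk, but for white noise (for which H = -0.5) they simply go flat, exactly like the flat parts of fig. 2.11. The spectral cross-check at the end uses β = 1 + 2H, so the walk should give β near 2 and the noise β near 0:

import numpy as np

def diff_rms(x, lags):
    # Root mean square difference fluctuation at each lag (in samples).
    return np.array([np.sqrt(np.mean((x[lag:] - x[:-lag]) ** 2)) for lag in lags])

rng = np.random.default_rng(0)
noise = rng.standard_normal(2 ** 16)   # white noise: H = -1/2 (macroweather-like)
walk = np.cumsum(noise)                # random walk: H = +1/2 (wandering)

lags = 2 ** np.arange(1, 12)
for name, x in [("walk ", walk), ("noise", noise)]:
    slope = np.polyfit(np.log(lags), np.log(diff_rms(x, lags)), 1)[0]
    print(name, round(slope, 2))       # walk: ~0.5 (correct); noise: ~0.0, not -0.5

# Spectral cross-check of beta = 1 + 2H (ignoring intermittency corrections)
for name, x in [("walk ", walk), ("noise", noise)]:
    f = np.fft.rfftfreq(len(x))[1:]
    P = np.abs(np.fft.rfft(x))[1:] ** 2
    edges = np.geomspace(f[0], f[-1], 20)          # average in log-frequency bins
    idx = np.digitize(f, edges)
    fm = [f[idx == i].mean() for i in range(1, 20) if (idx == i).any()]
    Pm = [P[idx == i].mean() for i in range(1, 20) if (idx == i).any()]
    beta = -np.polyfit(np.log(fm), np.log(Pm), 1)[0]
    print(name, round(beta, 1))        # roughly 2 for the walk, roughly 0 for the noise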

Box 2.2 Fluctuations and the fractal H model

It took a surprisingly long time to clarify this issue. To start with, the turbulence community was fond of difference fluctuations, for which it had developed many convenient theoretical results. Turbulence theorists had been the first to use fluctuations as differences and, a decade before Hurst, effectively introduced the first fluctuation exponent: the H = 1/3 in Kolmogorov's famous lawwwww (see ch. 3, 4). In classical turbulence all the H's are positive, so the restriction to positive H was not a problem. Later, in the wake of the nonlinear revolution in the 1980's, mathematicians invented an entire mathematics of fluctuations called "wavelets"xxxx. Although technically difference fluctuations are indeed wavelets, mathematicians mock them, calling them the "poor man's wavelet" and promoting more sophisticated ones. Wavelets turned out to have many beautiful mathematical properties and often have colourful names such as the "Mexican Hat", "Hermitian Hat", or the "Cohen-Daubechies-Feauveau wavelet". For mathematicians, it was irrelevant that the corresponding physical interpretations were not evident. The mastery of wavelet mathematics also required a fair intellectual effort, and this further limited their scientific applications. This was the situation in the 1990's, when scaling started to be applied to geo time series involving negative H (essentially to any macroweather series, although at the time this was not at all clear). It fell to a statistical physicist, Chung-Kang Peng, to develop an H < 0 technique, which he applied to biological series: the Detrended Fluctuation Analysis (DFA) methodyyyy57. Also at this time, another part of the scaling community (including my colleagues and me) was focusing on multifractality and intermittency; these issues didn't involve negative H, so the problem was ignored. Over the following nearly two decades, there were thus several more or less independent strands of scaling analysis, each with its own mathematical formalism and interpretations.

The wavelet community dealt with fluctuations directly but was unconcerned about the simplicity of physical interpretations; the DFA communityzzzz wielded a somewhat complex method, but one that could be readily implemented numerically and didn't require much theoretical baggageaaaaa; and the turbulence community focused on multifractal intermittency. In the meantime, mainstream geoscientists continued to use spectral analysis, focusing on spectral peaks that supposedly represented quasi-oscillating processes, not on the scaling or on the interpretation of the amplitudes of the spectra, most of which was treated as uninteresting background noise. Ironically, the impasse was broken by the first wavelet, the one that Alfréd Haar (1885-1933) had introduced in 1910, even before the wavelet formalism had been invented58. The Haar fluctuation is beautiful for two reasons: the simplicity of its definition and calculation, and the simplicity of its interpretation59. To determine the Haar fluctuation over a time interval Δt, one takes the average of the first half of the interval and subtracts the average of the second half (fig. 2.10, bottom). That's itbbbbb! As for the interpretationccccc, it is easy to show that when H is positive, it is (nearly) the same as a difference, whereas whenever H is negative, we not only recover its correct valueddddd, but the fluctuation itself can be interpreted as an "anomaly"eeeee (see the numerical sketch below).

56 Hurst, H. E. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770-808 (1951).
tttt This is true for any series that has correlations that decrease with increasing interval Δt, as physically relevant series always do.
uuuu D and C cannot be negative, so this problem does not arise for them.
vvvv Valid if we ignore intermittency; otherwise there are "intermittency corrections", box 2.1.
wwww Kolmogorov's law was actually very close to Richardson's 4/3 law: the 4/3 was H + 1.
xxxx Although wavelets can be traced back to Alfréd Haar (1910, see below), the field really took off starting in the early 1980's with the continuous wavelet transformation of Alex Grossmann and Jean Morlet.
yyyy In retrospect, the key innovation was simply that the method started by taking the running sum of the original series, effectively adding one to the value of H. As long as the original H was greater than -1, the new series thus had a positive H, allowing the usual differences and difference-like fluctuations to be used.


zzzz At last count, Peng's original paper had more than 2000 citations, an astounding number for such a highly mathematical paper.
aaaaa In words, the DFA method estimates fluctuations by the standard deviation of the residuals of a polynomial fit to the running sum of the series. The interpretation is so obscure that typical plots do not even bother to use units for the fluctuation amplitudes, thus throwing away much of the information.
bbbbb I can recall a comment of a referee of a paper in which I explained the Haar fluctuation using the same words. Expecting a complicated wavelet expression, he complained that he didn't understand the words and instead wanted an equation!
ccccc The correspondence Haar = difference with H positive and Haar = anomaly with H negative is not numerically exact; it is usual to multiply the "raw" Haar fluctuation by a factor of 2 in order to make the correspondence closer. This has been done in fig. 2.4 and elsewhere in this book.
ddddd The Haar fluctuation is only useful for H in the range -1 to 1, but this turns out to cover almost all of the series that are encountered in geoscience.
eeeee In this context an anomaly is simply the average over a segment of length Δt of the series after its long term average has been removed.
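Here is the Haar recipe in a few lines of code (my own sketch, applied to the same synthetic series as above; the sign convention - second half minus first half - is irrelevant for root mean square statistics). Unlike differences, it recovers the negative exponent:

import numpy as np

def haar_rms(x, scales):
    # RMS Haar fluctuation: for each interval of length s, the mean of the
    # second half minus the mean of the first half, RMS'd over disjoint intervals.
    out = []
    for s in scales:                  # s = interval length in samples (even)
        n = len(x) // s
        blocks = x[:n * s].reshape(n, s)
        fluct = blocks[:, s // 2:].mean(axis=1) - blocks[:, :s // 2].mean(axis=1)
        out.append(np.sqrt(np.mean(fluct ** 2)))
    return np.array(out)

rng = np.random.default_rng(0)
noise = rng.standard_normal(2 ** 16)  # white noise: H = -1/2
walk = np.cumsum(noise)               # random walk: H = +1/2

scales = 2 ** np.arange(2, 13)
for name, x in [("walk ", walk), ("noise", noise)]:
    H = np.polyfit(np.log(scales), np.log(haar_rms(x, scales)), 1)[0]
    print(name, round(H, 2))          # ~ +0.5 and ~ -0.5: both correctly recovered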

Fig. 2.10: Top: Schematic illustration of difference fluctuations for a multifractal simulation of the atmosphere in the weather regime (0≤H≤1). Middle: Illustration of the anomaly fluctuation for a series in the lower frequency macroweather regime; notice the "wandering" and "cancelling" behaviours. Bottom: Illustration of Haar fluctuations (useful for processes with -1≤H≤1). The Haar fluctuation over the interval Δt is the mean of the first half subtracted from the mean of the second half of the interval.


Fig. 2.11: The square root of the average squared differences (the "structure function" S(Δt), vertical axis in °C, against the time interval Δt in years, horizontal axis, from 1 to 10^6 years), estimated from local (central England) temperatures since 1659 (open circles, upper left), northern hemisphere temperatures (black circles), and from paleo temperatures from Vostok (Antarctic, solid triangles), Camp Century (Greenland, open triangles) and an ocean core (asterisks). For the northern hemisphere temperatures, the (power law, linear on this plot) climate regime starts at about 10 years. The rectangle (upper right) is the "glacial-interglacial window" through which the structure function must pass in order to account for typical variations of ±2 to ±3 K for cycles with half periods ≈ 50 kyrs. Reproduced from ref. 12.


Fig. 2.12a: A composite of typical Haar fluctuationsfffff (these are root mean square fluctuations: the square root of the average of the squares of the fluctuations). This composite is equivalent to - but easier to understand than - the spectrum in fig. 2.3a. Using largely the same data, it extends the range of time scales by a factor of 100,000, from an hour down to 2×0.017 = 0.034 seconds. From left to right, the curves are from thermistors at 0.017 s resolution (the same data as the lower right of fig. 1.4e), from (daily and annually detrended) hourly temperatures (second from the left, from a station in Lander, Wyoming), 20CR temperatures (thick, middle, the same data as fig. 1.4d but at 75°N) and paleo-temperatures from EPICA ice cores ("S" shaped curve at the right, from the data shown in fig. 1.4b and in the spectrum, fig. 2.3a) over the last 800 kyrs; the two far right curves are from the upper benthic curves in fig. 1.4a and fig. 2.3a. The different dynamical regimes are indicated by dashed lines, roughly separating regions with linear scale dependencies. The slopes are estimates of H. Adapted from ref. 60.

fffff These are root mean square Haar fluctuations.


Fig. 2.12b: The same as fig. 2.12a but for spatial fluctuations. The curves show data averaged over weather ("W"), macroweather ("M") and climate ("C") time scales. The straight reference line has slope H = 0.4. The lower left curve is from the aircraft data used in fig. 1.5 (280 m resolution); the upper right is from daily temperatures in January 2006, with the fluctuations taken in the longitudinal direction, every 1° in longitude (the same data as used in fig. 5.3; see the description there). The middle ("M") and lower ("C") right curves are from monthly and 140 year averaged data in the longitudinal direction; see fig. 5.3 for more details.

[Fig. 2.12c labels, top to bottom: megaclimate (Veizer: 290-511 Myrs BP; H ≈ 0.4); megaclimate (Zachos: 0-67 Myrs); macroclimate (Huybers: 0-2.56 Myrs; H ≈ -0.8); climate (Epica: 25-97 kyrs BP; H ≈ 0.4); macroweather (Berkeley: 1880-1895 AD; H ≈ -0.4); weather (Lander Wy.: July 4-11, 2005; H ≈ 0.4); weather (thermistor, Montreal).]

Fig. 2.12c: Representative series from each of the five scaling regimes, taken from figs. 1.4 and 2.12a, with the addition of the hourly surface temperatures from Lander, Wyoming (bottom, detrended daily and annually) and a thermistor series from Montreal. In order to fairly contrast their appearances, each series had the same number of points (180) and was normalized by its overall range (the maximum minus the minimum), and each series was offset by 1 °C in the vertical for clarityggggg. The series resolutions were 1 hour, 1 month, 400 years, 14 kyrs, 370 kyrs and 1.23 Myrs, bottom to top respectively. The black curves have H > 0, the red H < 0. Adapted from ref. 60.

When Haar fluctuations are substituted for difference fluctuations, using the climate series discussed in ch. 1, we obtain the composite fig. 2.12a, which covers a range about 5 orders of magnitude greater than that covered by Mitchellhhhhh (fig. 2.3a). One can clearly make out five distinct regions, each of which - with the exception of macroclimateiiiii - is scaling over a range of roughly a thousand in time scalejjjjj. We clearly see the alternation of the sign of the slope (H): from positive (the weather regime up to about 10 days, left), through the longer macroweather regimekkkkk with decreasing fluctuations and negative H, to the increasing climate regime, the decreasing macroclimate and the increasing megaclimate. The numbers on the vertical axis of fig. 2.12a all make perfect sense and give a precise idea of typical fluctuations at the corresponding time scale. For example, reading the numbers off the graph, we see that typical temperature fluctuations at intervals of one second are about 0.1 °C and at 10 days (midlatitudes) about 10 °C; these are on the increasing part of the curve at the left and indicate that typical changes at these scales are ±0.05 °C and ±5 °C. At 10 years - on the decreasing part of the curve - we again have fluctuations with amplitudes of about 1 °C, in this case indicating that typical consecutive ten year averages differ by this amount. Continuing to longer time periods, we find that typical ice age variations (with half-periods Δt of about 50,000 years) are roughly 6 °C, and at one hundred million years, about 12 °C. Similarly, in space (fig. 2.12b), kilometer to kilometer changes are of the order of 0.2 °C, with typical changes at 100 km of 1 °C. In contrast, we saw by direct comparison (fig. 2.3c) that the quantitative implications of Mitchell's spectrum are quite implausible; for example, analysis shows that it implies that consecutive one million year average global temperatures would vary by only a millionthlllll of a degree centigrade. Fig. 2.12c visually shows how fluctuations in the different regimes look, confirming the cancelling (H < 0) and wandering (H > 0) behaviours. By comparing the bottom two series, at 0.017 s and hourly resolutions, we can visually confirm that although they have the same H's, their characters are somewhat different, the bottom one being more "spiky": a consequence of intermittency (box 2.1). Finally, fig. 2.12b is the spatial counterpart of fig. 2.12a, showing how data averaged over different time intervals (the different regimes) fluctuate in space. It shows excellent scaling (straight lines in the figure), including through the meso-scale discussed earlier (1-100 km). By comparing the difference and Haar fluctuations (figs. 2.11 and 2.12a respectively), we can now understand the limitations of the difference-based analysis (fig. 2.11), and understand why macroweather was not clearly discerned until so recently. As expected, the increasing parts of the two figures are quite similar, whereas the flat parts of fig. 2.11 do indeed correspond to negative H - both macroweather and macroclimate. The remaining apparent divergence between the difference and Haar fluctuations (figs. 2.11, 2.12a) has to do with the difference between local and globally averaged temperatures and the difference between industrial and preindustrial temperatures (due to anthropogenic warming); we defer discussion of this until ch. 5 and box 5.1.

ggggg From top to bottom, the ranges used for normalizing are: 10.1, 4.59, 1.61 (Veizer, Zachos, Huybers respectively, all δ18O), 6.87 °C, 2.50 °C, 25 °C (Epica, Berkeley, Lander).
hhhhh To avoid cluttering the figure, we did not show the curve for the globally averaged temperature; this is discussed extensively in ch. 5.
iiiii It is not clear that this is indeed a true scaling regime, see ch. 5.
jjjjj The weather regime apparently continues for a factor of more than a million, down to millisecond dissipation scales.
kkkkk The figure shows macroweather at a high latitude location (Greenland). Over the ocean or averaged globally, it is less steep; H is less negative, see ch. 5.
lllll The missing quadrillion refers to the spectrum, the amplitude of the fluctuations squared; Mitchell's error in fluctuations is only by a factor in the range of about one to ten million. The sub-second thermistor data analyzed in the lower left of fig. 2.12a extends Mitchell's range of time scales by a further factor of one hundred thousand (from two hours down to 0.03 s). If Mitchell had extended his roughly flat background spectrum to the corresponding high frequencies, the error would have been compounded by a further factor of one hundred million or so.

References

1 Scharf, C. The Zoomable Universe. (Scientific American, 2017).
2 Mandelbrot, B. Scalebound or scaling shapes: a useful distinction in the visual arts and in the natural sciences. Leonardo 14, 43-47 (1981).
3 Cooley, J. W. & Tukey, J. W. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation 19 (90), 297-301, doi:10.1090/S0025-5718-1965-0178586-1 (1965).
4 Galloway, J. M. et al. Climate change and decadal to centennial-scale periodicities recorded in a late Holocene NE Pacific marine record: Examining the role of solar forcing. Palaeogeogr. Palaeoclimatol. Palaeoecol. 386, 669-689 (2013).
5 Zawadzki, I. Statistical properties of precipitation patterns. Journal of Applied Meteorology 12, 469-472 (1973).
6 Bras, R. L. & Rodriguez-Iturbe, I. Rainfall generation: a nonstationary time varying multidimensional model. Water Resources Research 12, 450-456 (1976).
7 Bras, R. L. & Rodriguez-Iturbe, I. Random Functions and Hydrology. (Addison-Wesley Publishing Company, 1985).
8 Mitchell, J. M. An overview of climatic variability and its causal mechanisms. Quaternary Res. 6, 481-493 (1976).
9 Dijkstra, H. & Ghil, M. Low frequency variability of the large scale ocean circulations: a dynamical systems approach. Rev. Geophys. 43 (2005).
10 Fraedrich, K., Blender, R. & Zhu, X. Continuum Climate Variability: Long-Term Memory, Scaling, and 1/f-Noise. International Journal of Modern Physics B 23, 5403-5416 (2009).
11 Dijkstra, H. Nonlinear Climate Dynamics. (Cambridge University Press, 2013).
12 Lovejoy, S. & Schertzer, D. Scale invariance in climatological temperatures and the local spectral plateau. Annales Geophysicae 4B, 401-410 (1986).
13 Shackleton, N. J. & Imbrie, J. The δ18O spectrum of oceanic deep water over a five-decade band. Climatic Change 16, 217-230 (1990).
14 Pelletier, J. D. The power spectral density of atmospheric temperature from scales of 10^-2 to 10^6 yr. EPSL 158, 157-164 (1998).
15 Huybers, P. & Curry, W. Links between annual, Milankovitch and continuum temperature variability. Nature 441, 329-332, doi:10.1038/nature04745 (2006).
16 Wunsch, C. The spectral energy description of climate change including the 100 ky energy. Climate Dynamics 20, 353-363 (2003).

17 Chekroun, M. D., Simonnet, E. & Ghil, M. Stochastic Climate Dynamics: Random Attractors and Time-dependent Invariant Measures. Physica D 240, 1685-1700 (2010).
18 Lovejoy, S. & Schertzer, D. in Chaos, Fractals and Models 96 (eds F. M. Guindani & G. Salvadori) 38-52 (Italian University Press, 1998).
19 Nicolis, C. & Nicolis, G. Is there a climate attractor? Nature 311, 529 (1984).
20 Orlanski, I. A rational subdivision of scales for atmospheric processes. Bull. Amer. Met. Soc. 56, 527-530 (1975).
21 Schertzer, D., Lovejoy, S., Schmitt, F., Chigirinskaya, Y. & Marsan, D. Multifractal cascade dynamics and turbulent intermittency. Fractals 5, 427-471 (1997).
22 Mandelbrot, B. B. Fractals, form, chance and dimension. (Freeman, 1977).
23 Mandelbrot, B. B. The Fractal Geometry of Nature. (Freeman, 1982).
24 Thompson, D. W. On Growth and Form. (Cambridge University Press, 1917).
25 Lovejoy, M. Postmodern Currents: Art and Artists in the Age of Electronic Media. (Prentice Hall College Division, 1989).
26 Lovejoy, S. Area-perimeter relation for rain and cloud areas. Science 216, 185-187 (1982).
27 Lovejoy, S. & Mandelbrot, B. B. Fractal properties of rain and a fractal model. Tellus 37A, 209 (1985).
28 Dewdney, A. K. A computer microscope zooms in for a close look at the most complicated object in mathematics. Scientific American, August, 16-24 (1985).
29 Cantor, G. Sur les ensembles infinis et linéaires de points. Acta Mathematica 2, 381-408 (1883).
30 Sierpiński, W. Sur une courbe cantorienne qui contient une image biunivoque et continue de toute courbe donnée. C. r. hebd. Seanc. Acad. Sci., Paris (in French) 162, 629-632 (1916).
31 Lovejoy, S. & Schertzer, D. The Weather and Climate: Emergent Laws and Multifractal Cascades. (Cambridge University Press, 2013).
32 Lovejoy, S., Schertzer, D. & Ladoy, P. Fractal characterisation of inhomogeneous measuring networks. Nature 319, 43-44 (1986).
33 Kennedy, J. J., Rayner, N. A., Smith, R. O., Saunby, M. & Parker, D. E. Reassessing biases and other uncertainties in sea-surface temperature observations measured in situ since 1850 part 2: biases and homogenisation. J. Geophys. Res. 116, D14104, doi:10.1029/2010JD015220 (2011).
34 Koch, H. On a continuous curve without tangents constructible from elementary geometry. Arkiv för Matematik, Astronomi och Fysik 1, 681-702 (1904).
35 Richardson, L. F. Atmospheric diffusion shown on a distance-neighbour graph. Proc. Roy. Soc. A110, 709-737 (1926).
36 Richardson, L. F. & Stommel, H. Note on eddy diffusivity in the sea. J. Met. 5, 238-240 (1948).

37 Okubo, A. & Ozmidov, R. V. Empirical dependence of the horizontal eddy diffusivity in the ocean on the length scale of the cloud. Izv. Akad. Nauk SSSR, Fiz. Atmosf. i Okeana 6 (5), 534-536 (1970). 38 Monin, A. S. & Yaglom, A. M. Statistical Fluid Mechanics. (MIT press, 1975). 39 Welander, P. Studies on the general development of motion in a two dimensional , ideal fluid. Tellus 7, 156 (1955). 40 Steinhaus, H. Mathematical Snapshots. (Oxford University Press, 1960). 41 Monin, A. S. Weather forecasting as a problem in physics. (MIT press, 1972). 42 Mandebrot, B. B. The Fractalist. (First Vintage Books, 2011). 43 Perrin, J. Les Atomes. (NRF-Gallimard, 1913). 44 Steinhaus, H. Length, Shape and Area. Colloquium Mathematicum III, 1-13 (1954). 45 Mandelbrot, B. B. How long is the coastline of Britain? Statistical self- similarity and fractional dimension. Science 155, 636-638 (1967). 46 Richardson, L. F. Weather prediction by numerical process. (Cambridge University Press republished by Dover, 1965, 1922). 47 Lynch, P. The emergence of numerical weather prediction: Richardson's Dream. (Cambridge University Press, 2006). 48 Richardson, L. F. The problem of contiguity: an appendix of statistics of deadly quarrels. General Systems Yearbook 6, 139-187 (1961). 49 Schertzer, D. & Lovejoy, S. in IUTAM Symp. on turbulence and chaotic phenomena in fluids. (ed T. Tasumi) 141-144. 50 Grassberger, P. Generalized dimensions of strange attractors. Physical Review Letter A 97, 227 (1983). 51 Hentschel, H. G. E. & Procaccia, I. The infinite number of generalized dimensions of fractals and strange attractors. Physica D 8, 435-444 (1983). 52 Mandelbrot, B. B. Multifractals and Fractals. Physics Today 39, 11, doi: http://dx.doi.org/10.1063/1.2815135 (1986). 53 Parisi, G. & Frisch, U. in Turbulence and predictability in geophysical fluid dynamics and climate dynamics (eds M. Ghil, R. Benzi, & G. Parisi) 84-88 (North Holland, 1985). 54 Wolfram, S. in Wall Street journal (2012). 55 Manley, G. Central England temperatures: monthly means 1659-1973. Q. J. Roy. Meterolog. Soc. 100, 389-495 (1974). 56 Hurst, H. E. Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770-808 (1951). 57 Peng, C.-K. et al. Mosaic organisation of DNA nucleotides. Phys. Rev. E 49, 1685-1689 (1994). 58 Haar, A. Zur Theorie des orthogonalen Funktionsysteme. Mathematische Annalen 69, 331-371 (1910). 59 Lovejoy, S. & Schertzer, D. Haar wavelets, fluctuations and structure functions: convenient choices for geophysics. Nonlinear Proc. Geophys. 19, 1- 14, doi:10.5194/npg-19-1-2012 (2012). 60 Lovejoy, S. A voyage through scales, a missing quadrillion and why the climate is not what you expect. Climate Dyn. 44, 3187-3210, doi:doi: 10.1007/s00382-014-2324-0 (2015). Friday, February 2, 2018 Chapter 2 40