Imperial College London Department of Mathematics
Non-autonomous Random Dynamical Systems: Stochastic Approximation and Rate-Induced Tipping
Michael Hartl
Supervised by Prof Sebastian van Strien and Dr Martin Rasmussen
A thesis presented for the degree of Doctor of Philosophy at Imperial College London.
Declaration
I certify that the research documented in this thesis is entirely my own. All ideas, theories and results that originate from the work of others are marked as such and fully referenced, and ideas originating from discussions with others are also acknowledged as such.
Copyright
The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives license. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the license terms of this work.
Abstract
In this thesis we extend the foundational theory and the areas of application of non-autonomous random dynamical systems beyond the current state of the art. We generalize results from autonomous random dynamical systems theory to a non-autonomous realm. We use this framework to study stochastic approximations from a new point of view; in particular, we apply it to study noise-induced transitions between equilibrium points and prove a bifurcation result. We then turn our attention to parameter shift systems with bounded additive noise. We extend the framework of rate-induced tipping from deterministic parameter shifts to this case and introduce tipping probabilities. Finally, we perform a case study by developing and applying a numerical method for calculating tipping probabilities and examining the results.
Acknowledgments
I consider myself very lucky to have been included in the Innovative Training Network CRITICS¹, funded entirely by the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 643073.
Special thanks to: Hassan Alkhayuon, Peter Ashwin, Michel Benaïm, Jens Bendel, Daniele Castellana, Andrew Clarke, Michael Collins, Peter Deiml, Gabriela Depetri, Maximilian Engel, Gabriel Fuhrmann, Tobias Jäger, Gerhard Keller, Jeroen Lamb, Johannes Lohmann, Iacopo Longo, Usman Mirza, Karl Nyman, Christian Oertel, Guillermo Olicon, Greg Pavliotis, Courtney Quinn, Martin Rasmussen, Flavia Remo, Chris Richley, Paul Ritchie, Pablo Rodríguez-Sánchez, Edmilson Roque, Anderson Santos, Cristina Sargent, Tobias Schwedes, Jakob Seifert, Jan Sieber, Leif Stolberg, Sebastian van Strien, Damian Smug, Kalle Timperi, Dmitry Turaev, and my family.

Dedicated to Rudi Wutz.
¹ Critical Transitions in Complex Systems
Contents

1 Introduction
  1.1 Non-autonomous random dynamical systems
  1.2 Stochastic approximations
  1.3 Rate-induced tipping
  1.4 Structure of the thesis
  1.5 Notation
2 Random dynamical systems
  2.1 Background on autonomous RDS
    2.1.1 Skew product systems
    2.1.2 Random invariant sets and measures
    2.1.3 Attractors and repellers
  2.2 Non-autonomous random dynamical systems
    2.2.1 Non-autonomous sets and measures
    2.2.2 Attractors and repellers
3 Stochastic Approximations
  3.1 Setup and notation
  3.2 Examples
    3.2.1 Urn models and market competition
    3.2.2 Learning in games
    3.2.3 Stochastic gradient descent
  3.3 The Limit Set Theorem
  3.4 Noise induced tipping in one dimension
    3.4.1 Preliminaries
    3.4.2 The invertible case
    3.4.3 The non-invertible case
    3.4.4 Further remarks
  3.5 A bifurcation arising from touchpoints
  3.6 Conclusions and outlook
4 Rate-induced tipping in random systems
  4.1 Deterministic R-tipping
  4.2 Asymptotically autonomous NRDS
  4.3 Random parameter shift systems and rate induced tipping
  4.4 An example
  4.5 Further remarks and outlook
A Hausdorff distance
B Conditional expectation, martingales and stopping times
C Chain recurrence and asymptotic pseudotrajectories
1 Introduction
1.1 Non-autonomous random dynamical systems

The theory of dynamical systems dates back at least to the 17th century, when Isaac Newton and his contemporaries began exploring the motion of celestial bodies, work which would later develop into the field of classical mechanics. Very loosely speaking, a dynamical system is a space together with a prescribed set of motion laws. The classical theory of dynamical systems assumes that those laws are known and fixed for all time. In many areas of application, however, this setting is far too restrictive, and this gives rise to the notion of non-autonomous dynamical systems. There have been efforts towards developing a unified theory of non-autonomous dynamical systems; a foundational account is the book [49] by Kloeden and Rasmussen, which also contains a comprehensive historical overview of the developments in this field. The basic idea is to describe non-autonomy by a base flow on some abstract parameter space which influences the observed dynamics on the phase space. Generally there are two main categories of such base flows.

Firstly, one may think of situations where a system is subject to some driving force which does not follow any known or prescribed motion laws. This can be simply because the system is far too complex for every detail of it to be included in the model, as is the case, for example, in climate studies. Another possibility is that a system is subject to a truly random influence, as with many economic or financial markets, which are driven by human decisions. Models in those cases are often based on stochastic differential equations driven by a random process, typically a Brownian motion; examples include [23, 34, 58]. In systems with multiple time scales, a fast variable can appear random from the viewpoint of the slow time scale, so that a stochastic differential equation describes the motion of the slow variables fairly well; see e.g. [57] for a theoretical approach and [43] for practical examples.
Discrete-time random systems are also of interest, where points in a space are iterated according to a map depending on some random parameter, such as random circle diffeomorphisms [78] or iterated function systems [10, 42]. The abstract parameter space in this case is usually a measure space equipped with a measure preserving transformation, a setup popularized by Arnold in his seminal book [3].

On the other hand, motion laws can depend explicitly on time, for example through a varying parameter. Examples include the FitzHugh–Nagumo model for the firing of neurons in the brain (cf. [39]), where the parameter is the level of stimulation. An early theoretical account is [71, 72]. In this setting, the abstract parameter space is the real "time" line, and the dynamics on it are given by the right shift. In the case of a periodic time dependency one can also use the unit circle as the abstract parameter space.

Despite both types of systems falling under the umbrella term non-autonomous dynamical system, a large part of the existing literature treats them separately. Random dynamical systems with their measure preserving base transformations give access to many measure-theoretic tools that are not available in the time shift case; we present a few highlights in Section 2.1. On the other hand, this limits the way a non-autonomous system can depend on the driving force: the right shift on the real line does not possess an invariant probability measure. Only recently have mathematicians turned their attention to systems that depend on time and noise simultaneously. To the best of our knowledge, the non-autonomous random
dynamical system formalism as presented in this thesis appeared for the first time in [30] under the name partial-random dynamical system, as the solution operator of a stochastic partial differential equation with a time-varying domain. The term "partial-random" refers to the fact that the non-autonomous aspect of the dynamical system is only partially due to a random influence. The authors introduce a concept of non-autonomous random attractors and prove the existence of a global attractor in their sense for an example class of systems. Their ideas were followed up by Wang [73, 74]. Other work, such as [1, 24, 74], is particularly concerned with the existence of pullback attractors under periodic deterministic forcing. Cui and Langa [32] discuss various concepts of non-autonomous random attractors with a compact deterministic component of the base flow. Notably, that paper does not assume that the NRDS is induced by a non-autonomous SDE.

An NRDS on a (metric) state space X has two basic ingredients.

• A base consisting of a measurable map T on some probability space (Ω, A, µ) which serves as the dynamical model for the noise.

• A cocycle mapping
ϕ : N0 × T × Ω × X → X,   (k, n, ω, x) ↦ ϕ(k, n; ω) x
which describes the evolution of a state x under the influence of the noise realization ω, starting at time n and ending at time n + k. We assume throughout this thesis that the time line T is discrete, i.e. T = N0 or T = Z, but the case of continuous time is similar. Table 1 shows the evolution of noise and state when going from time n to n + k and then to n + k + l, and it illustrates the cocycle property
ϕ(k + l, n; ω) x = ϕ(l, n + k; T^k ω) ∘ ϕ(k, n; ω) x

of ϕ. While this setup is generally agreed on in the literature cited above, the consensus there is also that the base (Ω, A, µ, T) has to be measure preserving. Roughly speaking, this means that the influence of the noise is in some sense stationary, an assumption that makes sense for autonomous random systems. However, we believe that this is an unnecessary restriction in the case of non-autonomous random systems. Chapter 3 presents a class of examples where T is indeed not measure preserving in general.

A more formal introduction is given in Section 2.2. We generalize notions from autonomous RDS theory and show that some classical results remain true in our non-autonomous framework, such as the Krylov–Bogolyubov Theorem 2.31 and the 1:1 correspondence between stationary measures and Markov invariant measures in Theorem 2.38. An important concept is pullback attraction, which is introduced in Definition 2.39. First one should notice that "sets" and "points" in a non-autonomous and random framework have both a time and a random component. Roughly speaking, a non-autonomous random set is a sequence (A_n)_{n∈Z} of set-valued maps A_n : Ω → P(X). Given two such objects U = (U_n)_{n∈Z} and A = (A_n)_{n∈Z}, we say that A pullback attracts U if ϕ(k, n − k; T^{−k} ω) U_{n−k}(T^{−k} ω) converges to A_n(ω) almost surely in the semi-Hausdorff distance as k → ∞, for all n. If moreover U is a neighborhood of A and A is invariant under ϕ, then we say A is a (local pullback) attractor. Chapters 3 and 4 are each devoted to a class of NRDS, and in both cases this concept plays a big role in describing the dynamics of the systems.
time:   n   |   n + k          |   n + k + l
noise:  ω   |   T^k ω          |   T^{k+l} ω
state:  x   |   ϕ(k, n; ω) x   |   ϕ(k + l, n; ω) x

Table 1: Non-autonomous random dynamics
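As a sanity check on the cocycle property above, here is a minimal discrete-time sketch in Python. The base map is the left shift T on a sequence of noise samples, and the fiber maps x ↦ x/2 + sin(n) + ω_n are a hypothetical choice made for illustration, not a system studied in this thesis.

```python
import math
import random

def shift(omega, k):
    """The base map T^k: drop the first k noise samples (left shift)."""
    return omega[k:]

def phi(k, n, omega, x):
    """Cocycle phi(k, n; omega) x: evolve x for k steps starting at time n.
    The fiber maps are an arbitrary explicitly time-dependent example."""
    for j in range(k):
        x = 0.5 * x + math.sin(n + j) + omega[j]
    return x

random.seed(0)
omega = tuple(random.uniform(-0.1, 0.1) for _ in range(10))
k, l, n, x = 3, 4, 2, 1.0

# Cocycle property: phi(k+l, n; w) = phi(l, n+k; T^k w) o phi(k, n; w)
lhs = phi(k + l, n, omega, x)
rhs = phi(l, n + k, shift(omega, k), phi(k, n, omega, x))
assert abs(lhs - rhs) < 1e-12
```

The composition on the right uses the noise samples ω_k, ω_{k+1}, … and the times n + k, n + k + 1, …, exactly as in Table 1.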
1.2 Stochastic approximations

The term stochastic approximation was coined by Robbins and Monro in [66]. In their paper they study a system that transforms an input x ∈ R into an output M(x) ∈ R, which is only accessible through a measurement with some stochastic error U. Under the assumption that M is monotone, they give an algorithm of updated measurements of the form

x_{n+1} = x_n + (1/(n+1)) (−M(x_n) + α + U_{n+1}),   (1.1)

which converges almost surely to the solution of M(x) = α, provided α is such that this equation has exactly one solution. The term in brackets on the right-hand side of (1.1) corresponds to a measurement of −M(x_n) + α with (random) error U_{n+1}. Soon after, Kiefer and Wolfowitz [47] adapted an algorithm of a similar form for finding extremal points of a differentiable function M in the same manner, by approximating M′ from erroneous measurements.

The almost sure convergence in the above examples comes from the fact that the studied algorithms approximate a so-called mean field differential equation. Equation (1.1) looks like an Euler approximation for ẋ = −M(x) + α, but with a decreasing time step and an additional noise term. The desired point is the stable fixed point of this differential equation. This ODE point of view was put on a solid footing by Ljung in [56] and by Kushner and Clark in [51]. Generalizations of this dynamical systems approach have been made by Benaïm and Hirsch, for example in [12, 13, 15], and by Pemantle, e.g. in [59]. We will present some highlights of these results, in particular the ones by Benaïm and Hirsch, in Section 3.3.

A large field of applications for stochastic approximations arises through their links to urn processes. The simplest model, dating back to Eggenberger and Pólya [36], consists of one urn with an initial distribution of balls of two colors. At each step, one ball is drawn randomly and returned to the urn, together with a fixed number of balls of the same color.
The proportion of the balls forms a stochastic approximation process. Friedman's urn is similar, but adds α > 0 balls of the color drawn and β > 0 of the other color (see e.g. [40]). While these simple models can be analyzed with well established martingale techniques, several proposed generalizations are studied with stochastic approximation tools. One such generalization is to let the probability of drawing a specific color from the urn be a function of its proportion, rather than the proportion itself. In his doctoral thesis [64], Renlund studies these models in a fairly general stochastic approximation setup. Graph-based interactions of n > 2 colors in an urn were studied e.g. in [14] and [55]. Such models have been used to analyze market competition between similar products of different companies [5]. An overview of various applications of these models can be found in the survey paper [61].

Interacting Pólya urns have been used to create learning algorithms for games, for example the reinforcement learning models by Erev and Roth [38] and Arthur [6]. In [45] the author uses stochastic approximation techniques to give a complete description of the limiting behavior
of such an algorithm for the case of two players and two strategies. Similar techniques are applied to higher-dimensional games in [46]. Fictitious play is a learning method based on players' best responses to their opponents' actions. Introduced by Brown [20] in a deterministic setting, it was extended to a random setting by Fudenberg and Kreps in [41]. Benaïm and Hirsch proved convergence of this algorithm in a typical setting using stochastic approximation techniques in [16].

In Chapter 3 of this thesis we focus on one-dimensional algorithms of the Robbins–Monro type, which take the general form
x_{n+1} = x_n + γ_{n+1} (f(x_n) + U_{n+1}).   (1.2)
We will study a setting where all the noise terms U_n are bounded by a number R > 0, and where the aforementioned results by Benaïm and Hirsch can be applied to show that limits of paths of (1.2) can only be stable equilibrium points of the mean field ODE ẋ = f(x). Assuming there are several such equilibria, we strive to answer the question whether the algorithm can get trapped near one equilibrium forever or whether, with positive probability, it can escape and eventually converge to another stable point. Our main result in this chapter is Theorem 3.28. It provides a critical noise value for such transitions to occur and is of the following type.
Theorem. Let s0 and s1 be stable equilibrium points of ẋ = f(x) and let V be a suitably small neighborhood of s0. Then there is a critical value R* such that

µ(x_n → s1 | x_k ∈ V) > 0 for all k ∈ N if R > R*, and

µ(x_n → s1 | x_k ∈ V) = 0 for all large enough k whenever R < R*.
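The flavor of this result can be seen in a quick numerical experiment. The following sketch iterates (1.2) with the hypothetical choices f(x) = x − x³ (stable equilibria at ±1, an unstable one at 0), step sizes γ_n = 1/(n+1) and i.i.d. noise uniform on [−R, R]; none of these specific choices are prescribed by the thesis.

```python
import random

def robbins_monro(f, x0, R, n_steps, seed=0):
    """Iterate x_{n+1} = x_n + gamma_{n+1} (f(x_n) + U_{n+1}) with
    gamma_n = 1/(n+1) and U_n i.i.d. uniform on [-R, R]."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        x += (f(x) + rng.uniform(-R, R)) / (n + 1)
    return x

f = lambda x: x - x**3  # mean field: stable points +/-1, unstable point 0

# Small noise: a path started near s0 = 1 stays trapped and converges to 1.
x_small = robbins_monro(f, x0=0.9, R=0.1, n_steps=20000)
```

For small R such a path settles at the nearby stable point; for R above the critical value, the theorem asserts that transitions to the other stable point occur with positive probability no matter how late the path is observed near s0.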
However, this formulation of the result is somewhat problematic. Firstly, it depends on an a priori arbitrary neighborhood V of the starting equilibrium point s0 as a notion of being "close" to the point. Secondly, the process (x_n)_{n∈N0} depends on its initial value x0 ∈ R. This may cause ambiguity, since for some choices of x0, V and k the event {x_k ∈ V} may have probability 0 to begin with, in which case the conditional probabilities above are not well defined. For this reason we provide an alternative point of view: instead of seeing a stochastic approximation as a collection of stochastic processes {(x_n)_{n∈N0} | x0 ∈ R}, we write it as an NRDS ϕ. This novel approach allows us to avoid such issues. We show that every unstable equilibrium point of the mean field ODE corresponds to a unique repelling non-autonomous random fixed point of ϕ. These points separate the phase space into non-autonomous, random regions, each of which acts as the basin of attraction of one of the stable fixed points. These basins can then be used to reformulate and prove the above result in a coherent way, as we have done in Theorem 3.28. Roughly speaking, a tipping occurs if one stable equilibrium of the mean field is contained in the basin corresponding to another one.

This interesting interplay between the non-autonomous random system ϕ and the autonomous deterministic one generated by the mean field ODE shows how powerful our extended NRDS setup is. For one, NRDS generated by a Robbins–Monro algorithm are in general
not invertible in any sense. In general the sequence (U_n)_{n∈N} need not be stationary and thus cannot be modeled using a measure preserving system (Ω, A, µ, T) in the base. Moreover, the fiber maps need not be invertible either, and the system is not defined for negative times. Yet the remarkable fact that the limiting objects are deterministic and autonomous can still be captured and described.

After a brief discussion of possible generalizations of our findings, we close Chapter 3 by extending a result by Pemantle [60] on urn processes with touchpoints. A touchpoint is an equilibrium of the mean field ODE which is stable from one side and unstable from the other, and it is shown in [60] that the one-sided derivative on the stable side determines whether the probability of convergence to this point is positive or not. Theorem 3.38 extends this result to a much more general class of stochastic approximations, which also includes systems with unbounded noise. Since the proof presented in [60] relies on the specific structure of the model studied, a new approach was necessary.
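To make the notion of a touchpoint concrete, consider the hypothetical mean field f(x) = −x² (our own illustrative choice, not an example from [60]): the origin attracts trajectories starting on the right and repels those starting on the left. A short integration makes this one-sided stability visible.

```python
def flow(x, t_end=5.0, dt=0.001):
    """Euler-integrate the mean field ODE x' = -x^2 starting from x.
    The origin is a touchpoint: f vanishes there but f < 0 on both sides."""
    t = 0.0
    while t < t_end:
        x += dt * (-x * x)
        t += dt
    return x

x_right = flow(0.5)   # started on the stable side: decays towards 0
x_left = flow(-0.1)   # started on the unstable side: moves away from 0
```

The exact solution x(t) = x0 / (1 + x0 t) confirms the picture: positive initial conditions decay towards the touchpoint, negative ones escape.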
1.3 Rate-induced tipping

Many phenomena in science and nature can be described, on a very superficial level, as critical transitions or tipping points. We refer the reader to the Nature article [70] by Scheffer et al. for a brief discussion of the terms and for example situations. A more detailed account is the book [69] by the same author, and the article [35] by Ditlevsen focuses on tipping in the climate system. While the phenomenon of a critical transition is easy to grasp intuitively (an abrupt qualitative change in the system) and many insights have been gained, its mathematical foundation is still rather vague. The book [69] strives to explain many of the examples by means of bifurcations. The theory of topological bifurcations is well developed, but lies beyond the scope of this thesis, so we refer the reader to the ample literature on this topic. However, it has been suggested that topological bifurcations alone cannot sufficiently explain all critical transitions. As mentioned in Section 1.1, many models incorporate a random component, and it has been found that this noise can also lead to a critical transition: a system in a stable state can be pushed into another state by a large enough perturbation. The models presented in [43] are an example. One of our main results (Theorem 3.28 in Chapter 3) gives necessary and sufficient conditions for the presence of noise-induced tipping in non-autonomous random systems generated by a stochastic approximation algorithm.

Wieczorek, Ashwin et al. [75] identify yet another type of critical transition, which they call rate-induced tipping. They study a system in which a parameter changes linearly, without putting the system through a classical bifurcation, and in which noise is not present. They observe that the speed, or rate, at which the parameter changes can influence the stability of the system.
Vaguely speaking, rate-induced tipping occurs when the parameter change is so fast that a system in a stable state does not have enough time to follow the change and instead tips into another state. In [7] the authors give a coherent definition of rate-induced tipping in parameter shift systems ẋ_t = f(x_t, Λ(rt)), where Λ is a bounded, real-valued (parameter shift) function limiting to some λ± as t → ±∞, and r > 0 is the speed, or rate, of the parameter change. They show that for every stable fixed point x− of the limiting ODE ẋ_t = f(x_t, λ−) there is a non-autonomous pullback attracting fixed point of the parameter shift system converging to x− as t → −∞. If x− gets continuously transformed to a stable fixed point x+ of ẋ_t = f(x_t, λ+)
[Figure 1 appears here: four panels under the title "An example for rate-induced tipping", showing the ramp Λ(t) (top left), the nonlinearity f0(x) (top right), and the solutions x_t plotted against rt for rates r = 0.2 (bottom left) and r = 0.4 (bottom right).]
Figure 1: A simple model that displays rate-induced tipping is given by f(x, λ) = f0(x − λ) with f0(x) = x − x³ (top right panel) and a ramping function Λ that increases linearly from 0 to 2 on the interval [0, 1] and is constant otherwise (top left panel). For t < 0 the non-autonomous ODE ẋ_t = f(x_t, Λ(rt)) is equivalent to the autonomous ODE ẋ_t = f0(x_t), which has attracting equilibria at x = 1 and x = −1. Similarly, for t > 1/r it is equivalent to the ODE ẋ_t = f(x_t, 2) with attracting equilibria x = 1 and x = 3, where 1 is linearly shifted into 3 and −1 into 1. The bottom two panels show the solution of ẋ_t = f(x_t, Λ(rt)) starting in x_t = 1 for t < 0 for rates r = 0.2 (left) and r = 0.4 (right); the aforementioned shift of the equilibria is indicated by the dashed lines. On the left, the solution x_t follows the dashed line, albeit with a slight delay, whereas on the right it tips towards the other equilibrium.
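The toy model of Figure 1 is easy to reproduce numerically. The following sketch (an explicit Euler scheme and a cut-off integration horizon are our own choices) recovers the tracking/tipping dichotomy between the rates r = 0.2 and r = 0.4.

```python
def f0(x):
    # Nonlinearity of Figure 1: f0(x) = x - x^3.
    return x - x**3

def ramp(s):
    # Parameter shift Lambda: linear from 0 to 2 on [0, 1], constant otherwise.
    return 2.0 * min(max(s, 0.0), 1.0)

def solve(r, t_end=50.0, dt=0.01):
    """Euler-integrate x' = f0(x - ramp(r t)) from the equilibrium x = 1."""
    x, t = 1.0, 0.0
    while t < t_end:
        x += dt * f0(x - ramp(r * t))
        t += dt
    return x

x_slow = solve(0.2)  # tracks the drifting equilibrium and ends near x = 3
x_fast = solve(0.4)  # tips and ends near the other equilibrium x = 1
```

In co-moving coordinates y = x − Λ(rt), the drift y' = f0(y) − 2r passes through a bottleneck near y = 1/√3: for r = 0.2 the ramp ends before the solution falls out of the basin of x = 3, while for r = 0.4 it does not.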
[Figure 2 appears here: a plot of the tipping probability (vertical axis, ranging from 0 to 1) against the rate r of the parameter change.]
Figure 2: Probability that the example system of Section 4.4 undergoes rate-induced tipping, depending on the rate r, for a fixed noise size and parameter shift Λ. Figure 6 shows similar plots for various noise sizes and parameter shifts Λ.
under the parameter change from λ− to λ+, a rate-induced tipping occurs if the aforementioned non-autonomous point does not converge to x+. These results are generalized to arbitrary attractors of the limiting systems in [2]; we will present some of the results from that paper in Chapter 4. A toy example is displayed and explained in Figure 1.

Ashwin, Wieczorek et al. propose in [8] that the three scenarios described above are typical, and they subsequently suggest a classification of tipping scenarios into the three categories

• bifurcation-induced tipping (B-tipping),

• noise-induced tipping (N-tipping) and

• rate-induced tipping (R-tipping).

However, these categories need not be mutually disjoint. Ritchie and Sieber [65] perform a phenomenological study of a system with both noise and a parameter change. They observe that tipping is a random phenomenon, but that the tipping probability depends on the rate; tipping becomes more likely at higher rates. In Chapter 4 we are concerned with exactly this kind of system, albeit on a discrete time line and with bounded noise. That this makes a difference is explored in Section 4.4. For a given example with a parameter shift function very similar to the one used in [65], we performed a numerical calculation of tipping probabilities; Figure 2 shows a plot of rate versus tipping probability. Perhaps the most striking observation from this analysis is that the tipping probability is not a monotone function of the rate, contradicting both the findings of [65] and the intuition that a faster change of a parameter should make it easier for a system to be knocked out of equilibrium. At this stage we cannot give a full analytic explanation of the observed phenomenon, but we discuss a few possible starting points for such an analysis.

Due to the simultaneous presence of noise and an explicit time dependence of the parameter
change, the framework of non-autonomous random dynamical systems is a natural choice. Building on the ideas in [7], we describe a parameter shift system with additive noise as a family (ϕ^r)_{r>0} of NRDS converging to limiting autonomous RDS as time tends to ∞ and −∞, with the limiting systems not depending on the rate r. The main result of this chapter, Theorem 4.18, states, roughly speaking, that any nice enough random attractor A− of the past-limiting system admits a non-autonomous random attractor (A^r_n)_{n∈Z} of ϕ^r for every rate r, such that A^r_n converges to A− in the semi-Hausdorff distance as n → −∞.

Using these non-autonomous attractors, we say the system tips at rate r if A^r_n does not converge to an attractor A+ of the future-limiting system as n → ∞. As usual with convergence of objects in probability spaces, there are different types of convergence, which lead to different definitions of tipping. A weaker version is presented in Definition 4.22 and a stronger one in Definition 4.24. The latter has the advantage that a tipping probability can be defined as well. These attractors, together with a similarly constructed family of repellers, are also the foundation of the algorithm we used to calculate the tipping probabilities in Figure 6.
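The algorithm used in this thesis is built on attractors and repellers; as a naive baseline, a path-wise Monte Carlo estimate of a tipping probability can be sketched as follows, reusing the toy model of Figure 1 with bounded additive noise. All specific choices here, including the uniform noise law and its per-step scaling, are our own for illustration and differ from the discrete-time setup of Chapter 4.

```python
import random

def f0(x):
    return x - x**3

def ramp(s):
    return 2.0 * min(max(s, 0.0), 1.0)

def path_tips(r, eps, rng, t_end=50.0, dt=0.01):
    """One noisy Euler path of x' = f0(x - ramp(r t)) plus bounded kicks
    uniform on [-eps, eps]; returns True if the path ends closer to x = 1
    (tipped) than to the tracked equilibrium x = 3."""
    x, t = 1.0, 0.0
    while t < t_end:
        x += dt * (f0(x - ramp(r * t)) + rng.uniform(-eps, eps))
        t += dt
    return abs(x - 1.0) < abs(x - 3.0)

def tipping_probability(r, eps, n_paths=200, seed=0):
    """Fraction of sampled paths that tip at rate r and noise bound eps."""
    rng = random.Random(seed)
    return sum(path_tips(r, eps, rng) for _ in range(n_paths)) / n_paths
```

With eps = 0 this reduces to the deterministic dichotomy of Figure 1 (probability 0 for slow rates, 1 for fast ones); for eps > 0 intermediate probabilities appear, which is the quantity plotted against r in Figure 2.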
1.4 Structure of the thesis

The main body of this thesis is divided into three thematically distinct parts, corresponding to Chapters 2, 3 and 4. Chapter 2 is a more conceptual account of random dynamical systems theory. Our main goal is to extend well established notions and results from autonomous RDS theory to a non-autonomous setting. For this reason, Chapter 2 is split into two main parts that progress in parallel.

The first part gives a short overview of some key elements of autonomous random dynamical systems theory, relying entirely on existing literature. We present the general RDS framework of a skew product flow on a metric space over an ergodic base, and introduce concepts such as random invariant points, sets and measures, the decomposition of random measures, stationary measures and the Markov property, and pullback attractors. Moreover, we state two classic theorems, namely the random Krylov–Bogolyubov Theorem and the theorem on the 1:1 correspondence of stationary measures and Markov invariant measures.

In the second part we present the concept of a non-autonomous random dynamical system as a cocycle flow on a metric space. As opposed to the existing literature on this topic, we do not assume that the randomness is generated by an ergodic or at least measure preserving dynamical system, but by an arbitrary measurable transformation on a probability space. This allows us to include systems in our framework where the statistics of the noise change over time, as is the case for the stochastic approximations presented in Chapter 3. We generalize the notions of random invariant points, sets and measures to our extended setting, taking inspiration from existing works such as [24, 30, 32], before proving a non-autonomous random Krylov–Bogolyubov result (Theorem 2.31). We proceed by introducing non-autonomous stationary measures and a non-autonomous Markov property. To our knowledge, this has not been done in the existing literature.
One has to be a bit careful with the latter, since the past of the system can be understood with respect to either the explicit time component or the noise component. Theorem 2.38 is a non-autonomous generalization of the aforementioned correspondence theorem for stationary and Markov measures. Finally, we present a concept of pullback attraction and discuss some of its properties and its differences to the autonomous case.

Chapter 3 starts with a formal introduction of Robbins–Monro-type algorithms as recursively defined stochastic processes approximating a mean field ODE, and explains some standard assumptions. While the largest part of the existing literature seems to understand stochastic approximations as a special type of recursively defined stochastic process, we introduce an alternative point of view by describing them within the non-autonomous RDS framework introduced in Chapter 2. This will prove to be more adequate for some of our research. In the next section we present a few example applications that can take the form of stochastic approximations: generalized Pólya urns and their application in economics, two different models of learning in game theory, and finally stochastic gradient descent algorithms as often used in optimization and machine learning problems. Section 3.3 is a brief recapitulation of some established results about stochastic approximations, mostly due to Benaïm et al.

Section 3.4 is concerned with noise-induced tipping in stochastic approximation NRDS with bounded noise; we focus on the one-dimensional case. We assume that the mean field is hyperbolic and has a finite number of equilibria. We provide a simple condition under which almost every path of the corresponding stochastic approximation converges to one of the stable equilibria of the mean field. Theorem 3.16 shows that a.e. path is bounded, which allows us to use the results from Section 3.3. After establishing this, we show in Lemma 3.23 and Theorem 3.26 that the unstable equilibria of the mean field correspond 1:1 to repelling non-autonomous fixed points of the NRDS converging to them. This is where it becomes apparent that the NRDS approach is more suitable than the classical stochastic-process one, because in the latter case any given solution of the Robbins–Monro algorithm a.s. does not converge to an unstable point.
These repelling points separate the phase space into (non-autonomous random) regions, each of which serves as the basin of attraction of a stable equilibrium, and this allows us to give a precise definition of noise-induced transitions. Finally, Theorem 3.28 shows the existence of critical values of the noise size at which transitions between two given stable points become possible. The rest of Section 3.4 shows how the techniques developed can be applied to other scenarios, and also their limitations. First we assume that the condition preventing paths from being unbounded is violated. Based on Lemma 3.33, we use our methods to provide at least a local description of the dynamics in this case. Example 3.35, however, shows that the global dynamics can then be more complicated. Finally, we briefly discuss stochastic approximations of parameter-dependent mean fields and on the unit circle (Theorem 3.36).

Section 3.5 is motivated by [60], where the author studies mean fields with non-hyperbolic touchpoints in the context of generalized Pólya urn models and presents conditions under which the probability of convergence to such a point is zero or strictly positive. However, the methods used there rely on the specific structure of the model. By applying a different approach, we show that the main result of [60] holds true for a much larger class of stochastic approximations (Theorem 3.38). In particular, our result includes scenarios with bounded as well as unbounded noise.

Chapter 4 is devoted to rate-induced tipping in random systems. Our starting point is the deterministic framework for rate-induced tipping proposed by Ashwin, Wieczorek et al. in [7, 9], which we briefly recall in Section 4.1. Section 4.2 is a conceptual analysis of asymptotically autonomous NRDS, i.e. systems whose dynamics converge to those of a limiting autonomous RDS as time goes to −∞.
We are mostly concerned with the question: If this limiting system has an attractor, does this correspond to a non-autonomous attractor “tracking” the autonomous one? In Definition 4.7 we introduce two different concepts of
tracking, one of which is weaker than the other one. Theorem 4.9 gives a condition under which a given limiting attractor is tracked in the weak sense. In a similar fashion, Theorem 4.11 provides some stronger conditions under which an attracting fixed point of the limiting system is tracked in the strong sense. In both cases the proof is constructive and moreover it is shown that the resulting non-autonomous attractor is maximal in the sense that every other non-autonomous tracking attractor is contained in it. Section 4.3 introduces NRDS that stem from parameter shift systems as defined by Ashwin, Wieczorek et al., but with bounded additive noise. Theorem 4.18 shows that for small enough noise size, those systems are in the realm of the previous section and thus the existence of tracking attractors is guaranteed. Moreover we prove in Theorem 4.19 that they depend continuously on the rate. These attractors allow us to extend the definition of rate-induced tipping to random dynamical systems. Theorem 4.23 shows that there is a strictly positive threshold the rate has to pass in order for tipping to occur. This is known in the deterministic case, but our result extends this to the random setting. Corresponding to the above-mentioned strong version of tracking is a path-wise notion of tipping which in turn allows us to speak about tipping probabilities. We explore the properties of tipping probabilities through an example in Section 4.4, but the methods developed can be applied in a wider context, the extent of which is unclear to us at the present stage. The main tool we use consists of non-autonomous random repellers tracking autonomous random repellers of the future limit system. After giving an equivalent characterization of tipping in Lemma 4.30 using those repellers, we then apply the continuity of attractors and repellers to show that the tipping probability is uniformly continuous as a function of the rate (Theorem 4.32).
A further analysis of tipping probabilities is performed with a numerical method we developed based on the tipping characterization using repellers. Our results show that, surprisingly, the tipping probability is not a monotone function of the rate, and moreover that it is not differentiable. We present an intuition why this is the case, but a rigorous proof is still to be found. Finally, in Theorem 4.35, we calculate the limits of tipping probabilities as the noise size tends to zero.
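As an illustration of what a tipping probability computation can look like, here is a crude Monte Carlo sketch. The drift ẋ = (x + λ(rt))² − 1 with λ(s) = (3/2)(1 + tanh s) is a standard toy parameter shift from the deterministic rate-induced tipping literature, not the system studied in this thesis, and the scheme below is a plain Euler simulation rather than the repeller-based numerical method described above; all parameter values are arbitrary.

```python
import math
import random

def tipping_probability(rate, noise_size, n_paths=20, dt=0.01, seed=1):
    """Crude Monte Carlo estimate of a tipping probability for the toy
    parameter-shift system dx/dt = (x + lam(r*t))**2 - 1 with bounded
    additive noise, where lam(s) = 1.5*(1 + tanh(s)) shifts from 0 to 3.
    A path 'tips' if it fails to track the drifting stable equilibrium
    x = -1 - lam(r*t) and instead escapes upward past the repeller."""
    rng = random.Random(seed)
    lam = lambda s: 1.5 * (1.0 + math.tanh(s))
    tipped = 0
    for _ in range(n_paths):
        t = -5.0 / rate              # start well before the shift ...
        t_end = 5.0 / rate + 2.0     # ... and keep watching a while after it
        x = -1.0                     # stable equilibrium of the past system
        while t < t_end:
            drift = (x + lam(rate * t)) ** 2 - 1.0
            x += drift * dt + noise_size * rng.uniform(-1.0, 1.0) * dt
            t += dt
            if x > 1.0:              # well past the future repeller at x = -2
                tipped += 1
                break
    return tipped / n_paths

p_slow = tipping_probability(rate=0.05, noise_size=0.05)  # tracks the branch
p_fast = tipping_probability(rate=50.0, noise_size=0.05)  # shift too fast: tips
```

For a slow shift the path adiabatically tracks the moving stable equilibrium and essentially never tips, while for a very fast shift the path is left above the repeller of the future system and escapes; the interesting, non-monotone behavior established in the thesis occurs between these extremes.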
1.5 Notation

The following notational and other conventions are in use throughout this thesis.
• We use the symbol N to denote the set of natural numbers 1, 2, ... and N_0 for the set of non-negative integers 0, 1, 2, ... .
• As usual the symbols Z, Q and R denote the sets of integers, rational numbers and real numbers respectively.
• The empty set ∅ is a finite set and every finite set is countable.

Let B ⊆ A be sets and f : A → R any map.
• We use the shorthand notation

sup_B f := sup_{x∈B} f(x),

and similarly max_B f, inf_B f and min_B f.
• The symbol 1_B denotes the indicator function of B, i.e. 1_B : A → R with 1_B(x) = 1 if x ∈ B and 1_B(x) = 0 if x ∉ B.
• The power set of A is denoted as P(A).
• The complement of B is B^c := A \ B.

For a subset A ⊆ M of a topological space M we write
• cl A for the closure of A,
• int A for the interior of A and
• ∂A := cl A \ int A for its boundary.

Let (M, d) be a metric space.
• Given a subset A ⊆ M and a point x ∈ M we write

d(x, A) = d(A, x) := inf_{y∈A} d(x, y).
• For ε > 0 we denote by
Bε (x) := {y ∈ M | d (x, y) < ε}
the open ε-ball around x. • We reuse the symbol Bε to denote open ε-neighborhoods of sets A ⊆ M,
Bε (A) := {y ∈ M | d (A, y) < ε} .
• The according closed ball and closed neighborhood are denoted by B≤ε (x) and B≤ε (A). • The diameter of A is
diam (A) := sup {d (x, y) | x, y ∈ A} .
• The semi-Hausdorff distance on subsets of M is denoted as dist and the Hausdorff distance as d_h. For details we refer to Appendix A.

Let (Ω, A, µ) be a measure space and (Θ, B) a measurable space.
• The pushforward measure of µ under a measurable map f : Ω → Θ is defined via f_*µ(B) = µ(f^{-1}B) for B ∈ B, and f_*µ is a measure on (Θ, B).
• For a collection A of subsets of Ω we denote by σ(A) the smallest σ-algebra containing every element of A.
• If f : Ω → Θ is a map and A consists of the sets f^{-1}B with B ∈ B we write σ(f) = σ(A).
• Similarly if (f_i)_{i∈I} is a family of maps we write σ(f_i : i ∈ I) for the smallest σ-algebra containing all sets f_i^{-1}B, B ∈ B.
• If (M, T) is a topological space we denote its Borel-σ-algebra as B(M) = σ(T).
• A measurable map T : Ω → Ω is bi-measurable if it is invertible and the inverse map is measurable.
• If A, B ∈ A we say that A ⊆ B modulo µ if µ(A \ B) = 0, and A = B modulo µ if A ⊆ B and B ⊆ A modulo µ.

If M is a metric space and µ, µ_1, µ_2, ... are probability measures on (M, B(M)) we say that µ_n converges weakly to µ if

lim_{n→∞} ∫ h dµ_n = ∫ h dµ

for all continuous, bounded maps h : M → R. We write µ = w-lim_{n→∞} µ_n in this case.
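For readers who prefer a computational picture, the metric notions above can be sketched for finite subsets of R. The semi-Hausdorff convention assumed below, dist(A, B) = sup_{x∈A} d(x, B), is the usual one; the definitions actually used in this thesis are those of Appendix A.

```python
def d_point_set(x, A):
    """d(x, A) = inf_{y in A} d(x, y) for a finite set A of reals."""
    return min(abs(x - y) for y in A)

def diam(A):
    """diam(A) = sup { d(x, y) | x, y in A }."""
    return max(abs(x - y) for x in A for y in A)

def dist(A, B):
    """Semi-Hausdorff distance sup_{x in A} d(x, B), measuring how far A
    sticks out of B (assumed convention); note it is not symmetric."""
    return max(d_point_set(x, B) for x in A)

def dh(A, B):
    """Hausdorff distance: the symmetrization of dist."""
    return max(dist(A, B), dist(B, A))

A, B = {0.0, 1.0}, {0.0, 1.0, 3.0}
# A is contained in B, so dist(A, B) = 0, while dist(B, A) = 2 because of
# the extra point 3.0; hence dh(A, B) = 2.
```

The asymmetry of dist is exactly why the attractor convergence in Section 2.1.3 is phrased with dist rather than d_h.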
2 Random dynamical systems
2.1 Background on autonomous RDS This section is an overview of important results from (autonomous) RDS theory, serving as a primer for the non-autonomous setting. For further reading on the topic we recommend the book [3].
2.1.1 Skew product systems A random dynamical system is, roughly speaking, a dynamical system on some metric space (X, d), where the dynamics at each time are subject to a random influence. So in order to describe those dynamics we need not just a single map g : X → X, but rather a whole family (g_ω)_{ω∈Ω}. Then the (random) orbit of an initial value x_0 is the sequence
x_0, x_1 = g_{ω_0}(x_0), x_2 = g_{ω_1}(x_1), ... (2.1)

where the ω_0, ω_1, ... are picked from the set Ω according to some probability law. It has proven convenient to use a dynamical model of this noise influence. By this we mean that we describe the sequence of noise variables ω_0, ω_1, ... as a dynamical system on some probability space.
Definition 2.1. Let (Ω, A, µ) be a probability space.
(a) A map T : Ω → Ω is called measure preserving if µ(T^{-1}A) = µ(A) for every A ∈ A. In that case we call the tuple (Ω, A, µ, T) a measure preserving dynamical system (MPDS) and the measure µ invariant under T.
(b) An MPDS (Ω, A, µ, T) is called ergodic if µ(A) ∈ {0, 1} for each A ∈ A with T^{-1}A = A. Such sets A are said to be invariant.
(c) We say the MPDS (Ω, A, µ, T) is invertible if the map T is bi-measurable.
Remark 2.2. If (Ω, A, µ, T) is measure preserving/ergodic and invertible, then the inverse map T^{-1} is measure preserving/ergodic as well. It is a well known result from ergodic theory that ergodic systems are the building blocks of MPDS. Every (nice enough) T-invariant measure µ can be decomposed into ergodic measures in a unique way ([37, Theorem 6.2]). Moreover, any ergodic system can be extended to be invertible (cf. [37, Exercise 2.1.7]). Thus we can assume that our noise model is ergodic and invertible without loss of (a lot of) generality. From now on we assume that (Ω, A, µ, T) is an ergodic and invertible MPDS.
Definition 2.3. A random homeomorphism is a map g : Ω × X → X such that ω 7→ g (ω, x) is measurable for every x ∈ X and x 7→ g (ω, x) is continuous for every ω ∈ Ω.
Instead of g(ω, x) we will write g_ω(x). Just as in (2.1), the dynamics in X are described by the maps g_ω, but at the same time we can now evolve ω according to T. This leads to the skew product map
T ⋉ g : Ω × X → Ω × X, (ω, x) 7→ (Tω, g_ω(x)).
This allows us to describe the inherently non-autonomous random dynamical system on X as living within an autonomous dynamical system on the extended phase space Ω × X. We
can describe the n-time-step dynamics on X by applying (T ⋉ g)^n and extracting the second component. We observe that (T ⋉ g)^n is also of skew product form, namely T^n ⋉ g^n, where

g^n_ω := g_{T^{n-1}ω} ∘ ··· ∘ g_{Tω} ∘ g_ω.

One can easily see that the family (g^n)_{n∈N} of random homeomorphisms has the cocycle property

g^{n+k}_ω = g^n_{T^k ω} ∘ g^k_ω.

Example 2.4 (Barnsley’s chaos game). Let X = [0, 1] and define the maps

g_0, g_1 : X → X, g_0(x) = x/3, g_1(x) = x/3 + 2/3.

Barnsley’s chaos game is defined as follows. For a given initial value x_0 ∈ X we create a sequence (x_n)_{n∈N_0} in the following way. Assume we know x_n. Then we randomly pick an
ω_n ∈ {0, 1} and define x_{n+1} := g_{ω_n}(x_n). This defines a random dynamical system and we want to show how to write it as a skew product. One way to do this is to pick all the ω_n simultaneously and then shift the obtained sequence accordingly. Let Ω^+ := {0, 1}^{N_0} be the set of all one-sided sequences ω = (ω_n)_{n∈N_0} with values in {0, 1}. If we equip this space with the Borel-σ-algebra A w.r.t. the usual metric we can define a probability measure as follows. To any finite sequence α_0, ..., α_n ∈ {0, 1} we can associate the cylinder
[α_0, ..., α_n] := {ω ∈ Ω^+ | ω_0 = α_0, ..., ω_n = α_n} ∈ A.

Then there is a unique probability measure µ on (Ω^+, A) such that

µ([α_0, ..., α_n]) = 2^{-(n+1)}

for all cylinders [α_0, ..., α_n]. Selecting a sequence ω according to µ is equivalent to selecting all the ω_n individually and independently from {0, 1} with probability 1/2 each. Instead of accessing ω_n directly we shift the sequence ω to the left n times and read out the value at the 0 component. Formally we introduce the shift map

T : Ω^+ → Ω^+, Tω = (ω_{n+1})_{n∈N_0},

and in a slight abuse of notation we write g_ω := g_{ω_0}. Then the orbit of a point x_0 is given by x_{n+1} = g_{T^n ω}(x_n), i.e. (x_n)_{n∈N_0} = (g^n_ω(x_0))_{n∈N_0}. The shift T is not invertible on Ω^+, but can be made invertible by extending to the space Ω = {0, 1}^Z of two-sided sequences. It is well known that T is ergodic on both Ω and Ω^+.
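The chaos game recursion x_{n+1} = g_{ω_n}(x_n) can be simulated directly; a minimal sketch (our own illustration of Example 2.4):

```python
import random

def chaos_game(x0, n_steps, seed=0):
    """Simulate one random orbit of Barnsley's chaos game on X = [0, 1]:
    at each step draw omega_n in {0, 1} with probability 1/2 each and
    apply g_{omega_n}, where g0(x) = x/3 and g1(x) = x/3 + 2/3."""
    rng = random.Random(seed)
    g = [lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0]
    orbit = [x0]
    for _ in range(n_steps):
        omega_n = rng.randrange(2)      # the noise symbol at time n
        orbit.append(g[omega_n](orbit[-1]))
    return orbit

orbit = chaos_game(x0=0.5, n_steps=1000)
# Since g0 maps [0, 1] into [0, 1/3] and g1 maps it into [2/3, 1], every
# iterate after the first step avoids the open middle third (1/3, 2/3),
# foreshadowing the Cantor-set structure discussed in Section 2.1.3.
```

Note that the single seeded generator plays the role of one fixed ω ∈ Ω^+: rerunning with the same seed reproduces the same random orbit.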
2.1.2 Random invariant sets and measures Random dynamical systems are often studied by means of random invariant measures. We will present a few classical results below, some of which we will generalize to the non-autonomous case in Chapter 2.2. For a broader introduction to the topic we refer to the book [27] by Crauel. For a set A ⊆ Ω × X and ω ∈ Ω we denote by A(ω) := {x ∈ X | (ω, x) ∈ A} the ω-fiber of A. Using this notation we can re-interpret subsets A ⊆ Ω × X as maps from Ω to the power set P(X) of X. We will use both notions interchangeably.
Definition 2.5 ([27, Definition 2.1]). A random closed/compact set is a subset A ⊆ Ω × X such that A(ω) is closed/compact for every ω ∈ Ω and
ω 7→ d(x, A(ω))

is measurable for every x ∈ X. A set U ⊆ Ω × X is called random open if its fiber-wise complement U^c, defined via U^c(ω) := X \ U(ω), is random closed. We use the umbrella term random set for both random open and random closed sets.
Remark 2.6. Crauel and Kloeden propose in [29] that any measurable subset A ∈ A ⊗ B(X) should be called a random set. Open/closed/compact random sets are similarly defined by openness/closedness/compactness of the fibers A(ω). They justify this proposition with the following example. Given a non-measurable subset N ⊆ X, the set A = Ω × N is a random set in the sense of Definition 2.5, but not measurable as a subset of Ω × X. However, Proposition 2.4 in [27] suggests that both notions are equivalent in the case that (Ω, A, µ) is a complete probability space, and Lemma 2.7 in the same book shows that for any A ∈ A ⊗ B(X) with closed fibers A(ω) there exists a random closed set Ã in the sense of Definition 2.5 with A = Ã a.s. Since we are mostly concerned with random closed sets, the difference in definition boils down to a question of measurability. This will not cause any problems throughout this thesis. There are several equivalent characterizations of random closed/compact/open sets, see e.g. [27, Proposition 2.4, Theorem 2.6] or [22, Theorem III.2].
Definition 2.7. A random set A is said to be forward invariant for the RDS T ⋉ g if g_ω A(ω) ⊆ A(Tω) for a.e. ω ∈ Ω. It is invariant if g_ω A(ω) = A(Tω) for almost every ω ∈ Ω.
If A ⊆ Ω × X is a random set it is easy to see that the fibers of (T ⋉ g)A can be expressed as

[(T ⋉ g)A](Tω) = g_ω A(ω),

such that the above definition of (forward) invariance is equivalent to (forward) invariance under the skew product flow if we identify two random sets whenever they agree in almost every fiber. A special case arises when each fiber A(ω) consists of exactly one point.
Definition 2.8. A random fixed point is a measurable map a : Ω → X such that g_ω(a(ω)) = a(Tω) for a.e. ω ∈ Ω.
We will not make a strict distinction between a random fixed point a and the according invariant random compact set {(ω, a (ω)) | ω ∈ Ω}. All definitions and results formulated for invariant random (compact) sets are assumed to include random fixed points as well. Random measures are probability measures living on the extended phase space Ω × X which preserve the ergodic structure in the ω component.
Definition 2.9. A random measure over (Ω, A , µ) is a probability measure α on A ⊗ B (X) such that α (A × X) = µ (A) for all A ∈ A . We also say that random measures have marginal µ on Ω. This structure allows us to decompose random measures into their ω-fibers.
Theorem 2.10 ([27, Proposition 6.6]). Let α be a random measure over (Ω, A, µ). There exists a decomposition of α into a family (α_ω)_{ω∈Ω} of probability measures on X such that
ω 7→ α_ω(B)

is measurable for every B ∈ B(X) and

α(A) = ∫ α_ω(A(ω)) dµ(ω)

for all A ∈ A ⊗ B(X). Moreover, this decomposition is a.s. unique in the following sense. If (β_ω)_{ω∈Ω} is another decomposition of α, then α_ω = β_ω for a.e. ω ∈ Ω. In analogy to Definition 2.7, we recall the notion of random invariant measures.
Definition 2.11. A random invariant measure is a random measure α such that g_{ω*}α_ω = α_{Tω} for a.e. ω ∈ Ω.
A standard argument shows that for a random measure α the fibers of its pushforward under the skew product transformation are
((T ⋉ g)_* α)_{Tω} = g_{ω*}α_ω,

such that α is invariant if and only if (T ⋉ g)_* α = α. The following statement is a generalization of the well known Krylov-Bogoliubov Theorem from topological dynamics (see e.g. [19, Section 4.6]). A proof can be found in Chapter 6 of [27]. We borrow these ideas to prove Theorem 2.31, which extends this result to non-autonomous random systems.
Theorem 2.12. Let A be an invariant random compact set. There is at least one random invariant measure α supported on A, i.e. α (A) = 1.
Of special importance are so-called Markov invariant measures, as they are linked to statistical properties of the RDS. The defining property of a Markov process is, roughly speaking, that the future is independent of the past. The equivalent notion for RDS is known as the Markov property. We give a precise definition below, loosely following the presentation in [50, Chapter 1.3.3]. For −∞ ≤ p < q ≤ ∞ we define the σ-algebra
F_{p,q} := σ( g^k_{T^n ω}(x) : x ∈ X and n ∈ Z, k ∈ N with p ≤ n < n + k ≤ q ).
We call F^− := F_{−∞,−1} the σ-algebra of the past and F^+ := F_{0,∞} the σ-algebra of the future.

Definition 2.13. (a) An RDS is called a Markov RDS if F^− and F^+ are independent. In this case we say that T ⋉ g has the Markov property. (b) A random invariant measure α is called a Markov measure if the map ω 7→ α_ω(B) is F^−-measurable for all B ∈ B(X).
The name Markov property is justified, as in a Markov RDS over X ⊆ R^d, the stochastic process (Y_n)_{n∈Z} defined via Y_n(ω) = g^n_ω(x) is a Markov chain w.r.t. the filtration (F_{−∞,n})_{n∈Z}. Indeed we have the relation
Y_{n+1}(ω) = g_{T^n ω}(Y_n(ω)),

and as Y_n is F_{−∞,n}-measurable and ω 7→ g_{T^n ω}(·) is independent of F_{−∞,n} we see that

E[Y_{n+1} | F_{−∞,n}](ω) = ∫ g_{ω′}(Y_n(ω)) dµ(ω′) = E[Y_{n+1} | Y_n]. (2.2)
Equation (2.2) moreover shows that (Y_n)_{n∈Z} is homogeneous and we can infer that its generator takes the form

M : P → P, Mρ = ∫ g_{ω*}ρ dµ(ω),
where P denotes the set of all Borel probability measures on R^d.

Definition 2.14. A measure ρ ∈ P is called stationary for M if Mρ = ρ.
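The Markov operator of Example 2.4 (Barnsley’s chaos game) can be iterated explicitly: with µ giving each map probability 1/2, M acts as Mρ = (1/2)(g_0)_*ρ + (1/2)(g_1)_*ρ. The sketch below (our own illustration) represents measures by finitely many weighted atoms; the iterates of a point mass approach the stationary Cantor measure, which puts mass 1/2 on [0, 1/3], mass 1/2 on [2/3, 1] and none in between.

```python
def markov_step(atoms):
    """One application of the Markov operator M rho = (1/2)(g0)_* rho +
    (1/2)(g1)_* rho for Barnsley's chaos game, acting on a measure
    represented as a list of (position, weight) atoms."""
    new_atoms = []
    for x, w in atoms:
        new_atoms.append((x / 3.0, w / 2.0))              # (g0)_* with weight 1/2
        new_atoms.append((x / 3.0 + 2.0 / 3.0, w / 2.0))  # (g1)_* with weight 1/2
    return new_atoms

# Start from the point mass at 1/2 and iterate M ten times (2**10 atoms).
rho = [(0.5, 1.0)]
for _ in range(10):
    rho = markov_step(rho)

mass_left = sum(w for x, w in rho if x <= 1.0 / 3.0)
mass_middle = sum(w for x, w in rho if 1.0 / 3.0 < x < 2.0 / 3.0)
# The iterates converge weakly to the stationary measure: half the mass
# sits in [0, 1/3], half in [2/3, 1], and the open middle third is empty.
```

The number of atoms doubles with every step, so for longer iterations one would bin nearby atoms; for this illustration ten steps suffice.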
Markov operators describe the statistical evolution of the RDS, but a priori it seems that they do not contain any actual dynamical information. However, the following result due to Ledrappier and Young [54] and Crauel [26] shows that stationary measures correspond 1:1 to invariant Markov measures of the random dynamical system. It is Theorem 4.2.9 in [50], where a proof is presented. Theorem 2.38 in the next chapter is a generalization of this result to the non-autonomous case.
Theorem 2.15. Let ϕ be a Markov RDS. (a) If α is a Markov measure then

ρ := ∫ α_ω dµ(ω) (2.3)
is a stationary measure. (b) If ρ is a stationary measure for M, there exists a Markov measure α fulfilling (2.3). The disintegration of this measure α is a.s. given as
α_ω = w-lim_{k→∞} (g^k_{T^{−k}ω})_* ρ,

where w-lim denotes the weak limit of probability measures on X.
Remark 2.16. If a is a past-measurable random fixed point then we can define a Markov measure via α_ω = δ_{a(ω)}. That this measure is invariant follows from the relation

g_{ω*}α_ω = δ_{g_ω(a(ω))} = α_{Tω}.
In that case the corresponding stationary measure is given as
ρ(B) = µ({a ∈ B}).
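For Example 2.4 the pullback construction in Theorem 2.15(b) can be watched numerically. Since g_0 and g_1 contract by a factor 1/3, the map g^k_{T^{−k}ω} squeezes all of [0, 1] into an interval of length 3^{−k}, so the weak limit α_ω is a point mass at a past-measurable random fixed point a(ω), as in Remark 2.16. A small sketch (the sampled past sequence is arbitrary):

```python
import random

def pullback_point(past, x0):
    """Compute g^k_{T^{-k} omega}(x0) = g_{omega_{-1}} o ... o g_{omega_{-k}}(x0)
    for Barnsley's chaos game, where past = [omega_{-k}, ..., omega_{-1}]
    lists the noise symbols from oldest to newest."""
    g = [lambda x: x / 3.0, lambda x: x / 3.0 + 2.0 / 3.0]
    x = x0
    for symbol in past:          # the oldest symbol omega_{-k} acts first
        x = g[symbol](x)
    return x

# Sample one past noise sequence omega_{-40}, ..., omega_{-1}.
rng = random.Random(42)
past = [rng.randrange(2) for _ in range(40)]

# Pushing two very different initial points through the same past gives
# numerically identical results: the pullback limit is the point mass
# delta_{a(omega)} at the random fixed point a(omega).
a1 = pullback_point(past, 0.0)
a2 = pullback_point(past, 1.0)
```

After 40 steps the two images differ by at most 3^{−40}, far below floating-point resolution, which is exactly the collapse of (g^k_{T^{−k}ω})_* ρ onto δ_{a(ω)}.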
2.1.3 Attractors and repellers For a continuous dynamical system given by a map F : X → X on some metric space X, a local attractor is a compact invariant set A such that there is an η > 0 with
lim_{n→∞} dist(F^n B_η(A), A) = 0, (2.4)

or in other words the sequence (F^k B_η(A))_{k∈N} converges to A in the Hausdorff distance. In order to generalize this definition to an RDS T ⋉ g the most obvious ansatz would be to study the sequence

(g^k_ω(B_η(K(ω))))_{k∈N} (2.5)

for a random compact invariant set K, but in general this sequence does not converge as k → ∞. This can easily be seen in the example of Barnsley’s chaos game, as each element of the sequence independently has probability 1/2 to be contained in either [0, 1/3] or [2/3, 1]. However, if instead of interpreting F^n in (2.4) as going from time 0 to time n, we think of it as going from time −n to 0 and replace (2.5) by