
Imperial College London
Department of Mathematics

Non-autonomous Random Dynamical Systems: Stochastic Approximation and Rate-Induced Tipping

Michael Hartl

Supervised by Prof Sebastian van Strien and Dr Martin Rasmussen

A thesis presented for the degree of Doctor of Philosophy at Imperial College London.

Declaration

I certify that the research documented in this thesis is entirely my own. All ideas, theories and results that originate from the work of others are marked as such and fully referenced, and ideas originating from discussions with others are also acknowledged as such.

Copyright

The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives license. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the license terms of this work.

Abstract

In this thesis we extend the foundational theory behind, and areas of application of, non-autonomous random dynamical systems beyond the current state of the art. We generalize results from autonomous random dynamical systems theory to a non-autonomous realm. We use this framework to study stochastic approximations from a different point of view; in particular, we apply it to study noise-induced transitions between equilibrium points and prove a bifurcation result. Then we turn our attention to parameter shift systems with bounded additive noise. We extend the framework of rate-induced tipping in deterministic parameter shifts to this case and introduce tipping probabilities. Finally, we perform a case study by developing and applying a numerical method for calculating tipping probabilities and examining the results thereof.

Acknowledgments

I consider myself very lucky that I was included in the Innovative Training Network CRITICS (Critical Transitions in Complex Systems), funded entirely by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 643073.

Special thanks to: Hassan Alkhayuon, Peter Ashwin, Michel Benaïm, Jens Bendel, Daniele Castellana, Andrew Clarke, Michael Collins, Peter Deiml, Gabriela Depetri, Maximilian Engel, Gabriel Fuhrmann, Tobias Jäger, Gerhard Keller, Jeroen Lamb, Johannes Lohmann, Iacopo Longo, Usman Mirza, Karl Nyman, Christian Oertel, Guillermo Olicon, Greg Pavliotis, Courtney Quinn, Martin Rasmussen, Flavia Remo, Chris Richley, Paul Ritchie, Pablo Rodríguez-Sánchez, Edmilson Roque, Anderson Santos, Cristina Sargent, Tobias Schwedes, Jakob Seifert, Jan Sieber, Leif Stolberg, Sebastian van Strien, Damian Smug, Kalle Timperi, Dmitry Turaev, and my family.

Dedicated to Rudi Wutz.


Contents

1 Introduction
  1.1 Non-autonomous random dynamical systems
  1.2 Stochastic approximations
  1.3 Rate-induced tipping
  1.4 Structure of the thesis
  1.5 Notation

2 Random dynamical systems
  2.1 Background on autonomous RDS
    2.1.1 Skew product systems
    2.1.2 Random invariant sets and measures
    2.1.3 Attractors and repellers
  2.2 Non-autonomous random dynamical systems
    2.2.1 Non-autonomous sets and measures
    2.2.2 Attractors and repellers

3 Stochastic Approximations
  3.1 Setup and notation
  3.2 Examples
    3.2.1 Urn models and market competition
    3.2.2 Learning in games
    3.2.3 Stochastic gradient descent
  3.3 The Limit Set Theorem
  3.4 Noise induced tipping in one dimension
    3.4.1 Preliminaries
    3.4.2 The invertible case
    3.4.3 The non-invertible case
    3.4.4 Further remarks
  3.5 A bifurcation arising from touchpoints
  3.6 Conclusions and outlook

4 Rate-induced tipping in random systems
  4.1 Deterministic R-tipping
  4.2 Asymptotically autonomous NRDS
  4.3 Random parameter shift systems and rate induced tipping
  4.4 An example
  4.5 Further remarks and outlook

A Hausdorff distance
B Conditional expectation, martingales and stopping times
C Chain recurrence and asymptotic pseudotrajectories

1 Introduction

1.1 Non-autonomous random dynamical systems

The theory of dynamical systems dates back to at least the 17th century, when Isaac Newton and contemporaries began exploring the motion of celestial bodies, which would later develop into the field of classical mechanics. Very loosely speaking, a dynamical system is a space with a prescribed set of motion laws. The classical theory of dynamical systems assumes that those laws are known and fixed for all times. In many areas of application, however, this setting is far too restrictive. This gives rise to the notion of non-autonomous dynamical systems.

There have been efforts towards developing a unified theory of non-autonomous dynamical systems. A foundational account of this is the book [49] by Kloeden and Rasmussen, which also contains a comprehensive historical overview of the developments in this field. The basic idea is to describe non-autonomy by a base flow on some abstract parameter space which influences the observed dynamics on the phase space. Generally there are two main categories of such base flows.

Firstly, one may think of situations where a system is subject to some driving force which does not follow any known or prescribed motion laws. This can be simply due to the fact that a system is far too complex for every detail of it to be included in the model, as is the case for example in climate studies. Another possibility is that a system is subject to a truly random influence, like many economic or financial markets, which are subject to human decisions. Models in those cases are often based on stochastic differential equations driven by a random process, often a Brownian motion; examples include [23, 34, 58]. In systems with multiple time scales, a fast variable can seem random from the viewpoint of a slow timescale, such that a stochastic differential equation describes the motion of slow variables fairly well; see e.g. [57] for a theoretical approach and [43] for practical examples.
Discrete time random systems are also of interest, where points in a space are iterated according to a map depending on some random parameter, such as random circle diffeomorphisms [78] or iterated function systems [10, 42]. The abstract parameter space in this case is usually a probability space, equipped with a measure preserving transformation; a setup popularized by Arnold in his seminal book [3].

On the other hand, motion laws can explicitly depend on time, for example through a varying parameter. Examples include the FitzHugh–Nagumo model for the firing of neurons in the brain (cf. [39]), where the parameter is the level of stimulation. An early theoretical account is [72, 71]. In this setting, the abstract parameter space is the real “time” line, and the dynamics are given by a right shift. In the case of a periodic time dependency one can also use the unit circle as the abstract parameter space.

Despite the two types of systems being included within the umbrella term non-autonomous dynamical system, a big part of the existing literature treats them separately. Random dynamical systems with their measure preserving base transformations give rise to a lot of measure theoretic tools that are not available for the time shift case. We will present a few highlights in Section 2.1. On the other hand, this limits the way a non-autonomous system can depend on the driving force; the right shift on the real line does not possess an invariant probability measure.

Only very recently have mathematicians turned their attention to simultaneously time and noise dependent systems. To the best of our knowledge, the non-autonomous random

dynamical system formalism as presented in this thesis appeared for the first time in [30] under the name partial-random dynamical system, as the solution operator of a stochastic partial differential equation with a time varying domain. The term “partial-random” refers to the fact that the non-autonomous aspect of the dynamical system is only partially due to a random influence. The authors introduce a concept of non-autonomous random attractors and prove the existence of a global attractor in their sense for an example class of systems. Their ideas were followed up by Wang [73, 74]. Other work, such as [1, 24, 74], is particularly concerned with the existence of pullback attractors under periodic deterministic forcing. Cui and Langa [32] discuss various concepts of non-autonomous random attractors with compact deterministic component of the base flow. Remarkably, this paper does not necessarily assume that the NRDS is induced by a non-autonomous SDE.

A NRDS on a metric space X has two basic ingredients.

• A base consisting of a measurable map T on some probability space (Ω, A, µ) which serves as the dynamical model for the noise.

• A cocycle mapping

ϕ: N0 × T × Ω × X → X, (k, n, ω, x) ↦ ϕ(k, n; ω)x

describes the evolution of a state x under the influence of the noise realization ω, starting at time n and ending at time n + k. We assume throughout this thesis that the time line T is discrete, i.e. T = N0 or T = Z, but the case of continuous time is similar. Table 1 shows the evolution of noise and state when going from time n to n + k and then to n + k + l, and it illustrates the cocycle property

ϕ(k + l, n; ω)x = ϕ(l, n + k; T^k ω) ∘ ϕ(k, n; ω)x

of ϕ. While this setup is generally agreed on in the literature cited above, the consensus is also that the base (Ω, A, µ, T) has to be measure-preserving. Roughly speaking, this means that the influence of the noise is in some sense stationary, an assumption that makes sense when speaking about autonomous random systems. However, we believe that this is an unnecessary restriction in the case of non-autonomous random systems. Chapter 3 presents a class of examples where indeed T is not measure preserving in general.

A more formal introduction is presented in Section 2.2. We generalize notions from autonomous RDS theory and show that some classical results remain true in our non-autonomous framework, such as the Krylov-Boguliubov Theorem 2.31 and the 1:1 correspondence between stationary measures and Markov invariant measures in Theorem 2.38.

An important concept is pullback attraction, which is introduced in Definition 2.39. First one should notice that “sets” and “points” in a non-autonomous and random framework have both a time and a random component. Roughly speaking, a non-autonomous random set is a sequence (A_n)_{n∈Z} of set-valued maps A_n: Ω → P(X). Given two such objects U = (U_n)_{n∈Z} and A = (A_n)_{n∈Z}, we say that A pullback attracts U if ϕ(k, n − k; T^{−k}ω) U_{n−k}(T^{−k}ω) converges to A_n(ω) as k → ∞, almost surely in the semi-Hausdorff distance and for all n. If moreover U is a neighborhood of A and A is invariant under ϕ, then we say A is a (local pullback) attractor. Chapters 3 and 4 are each devoted to a class of NRDS, and in both cases this concept will play a big role in describing the dynamics of the systems.

time  | n | n + k        | n + k + l
noise | ω | T^k ω        | T^{k+l} ω
state | x | ϕ(k, n; ω)x  | ϕ(k + l, n; ω)x

Table 1: Non-autonomous random dynamics
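To make the cocycle property concrete, the following Python sketch implements ϕ for a simple affine recursion with a time-dependent contraction; the maps, the noise model and all names (`phi`, `shift`, `a`) are illustrative choices of ours, not objects from the thesis.

```python
import math
import random

def phi(k, n, omega, x):
    """Evolve the state x from time n to time n + k under the realization omega.

    The j-th step applies the fiber map x -> a(n + j) * x + omega(j): the
    explicit time dependence enters through a(), the randomness through omega.
    """
    a = lambda m: 0.5 + 0.3 * math.sin(m)  # time-dependent contraction factor
    for j in range(k):
        x = a(n + j) * x + omega(j)
    return x

# A fixed noise realization: omega(j) is the j-th noise value; the base map T
# is the left shift, so T^k omega reads the realization k steps later.
random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(64)]
omega = lambda j: values[j]
shift = lambda om, k: (lambda j: om(j + k))  # T^k

# Cocycle property: phi(k + l, n; omega) = phi(l, n + k; T^k omega) o phi(k, n; omega)
n, k, l, x0 = 3, 4, 5, 0.7
lhs = phi(k + l, n, omega, x0)
rhs = phi(l, n + k, shift(omega, k), phi(k, n, omega, x0))
assert abs(lhs - rhs) < 1e-12
```

Both sides apply the same fiber maps in the same order, which is exactly what Table 1 expresses.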

1.2 Stochastic approximations

The term stochastic approximation was coined by Robbins and Monro in [66]. In their paper they were studying a system that changes some input x ∈ R to an output M(x) ∈ R, which is only accessible through a measurement with some stochastic error U. Under the assumption that M is monotone, they give an algorithm of updated measurements of the form

x_{n+1} = x_n + (1/(n + 1)) (−M(x_n) + α + U_{n+1}),    (1.1)

that converges almost surely to the solution of M(x) = α if α is such that this equation has exactly one solution. The term in brackets on the right hand side of (1.1) corresponds to a measurement of −M(x_n) + α with (random) error U_{n+1}. Soon after, Kiefer and Wolfowitz [47] adapted an algorithm of a similar form for finding extremal points of some differentiable function M in the same manner, by approximating M′ from erroneous measurements.

The almost sure convergence in the above examples comes from the fact that the studied algorithms approximate a so-called mean field differential equation. Equation (1.1) looks like an Euler approximation for ẋ = −M(x) + α, but with a decreasing time step and an additional noise term. The desired point is the stable fixed point of this differential equation. This ODE point of view has been established in a more solid way by Ljung in [56] and Kushner and Clark in [51]. Generalizations to this dynamical systems approach have been made by Benaïm and Hirsch, for example [12, 13, 15], and by Pemantle, e.g. in [59]. We will present some highlights of these results, in particular the ones by Benaïm and Hirsch, in Section 3.3.

A large field of applications for stochastic approximations arises through their links to urn processes. The simplest model, dating back to Eggenberger and Pólya [36], consists of one urn with an initial distribution of balls of two colors. At each step, one ball is drawn randomly and returned to the urn, together with a fixed number of balls of the same color.
The proportion of the balls forms a stochastic approximation process. Friedman’s urn is similar, but adds α > 0 balls of the color drawn and β > 0 of the other color (see e.g. [40]). While those simple models can be analyzed with well established martingale techniques, several generalizations of those models that have been proposed are studied with stochastic approximation tools. One way of doing so is to modify the probability of drawing a specific color from the urn to be a function of its proportion, rather than the proportion itself. In his doctoral thesis [64], Renlund studies these models in a fairly general stochastic approximation setup. Graph based interactions of n > 2 colors in an urn were studied e.g. in [14] and [55]. Such models have been used to analyze market competition between similar products of different companies [5]. An overview of various applications for those models can be found in the survey paper [61].

Interacting Pólya urns have been used to create learning algorithms for games, for example the reinforcement learning models by Erev and Roth [38] and Arthur [6]. In [45] the author uses stochastic approximation techniques to give a complete description of the limiting behavior of such an algorithm for the case of two players and two strategies. Similar techniques are applied to higher dimensional games in [46]. Fictitious play is a learning method based on players’ best response to their opponents’ actions. Introduced by Brown [20] in a deterministic setting, the algorithm was extended to a random setting by Fudenberg and Kreps in [41]. Benaïm and Hirsch proved convergence of this algorithm in a typical setting using stochastic approximation techniques in [16].

In Chapter 3 of this thesis we focus on one-dimensional algorithms of the Robbins–Monro type, which take the general form

x_{n+1} = x_n + γ_{n+1} (f(x_n) + U_{n+1}).    (1.2)

We will study a setting where all the noise terms U_n are bounded by a number R > 0, and where the aforementioned results by Benaïm and Hirsch can be applied to show that limits of paths of (1.2) can only be stable equilibrium points of the mean field ODE ẋ = f(x). Assuming there are several such equilibria, we strive to answer the question whether the algorithm can get trapped near an equilibrium forever or whether, with positive probability, it can escape and eventually converge to another stable point. Our main result in this chapter is Theorem 3.28. It provides a critical noise value for such transitions to occur and is of the following type.

Theorem. Let s_0 and s_1 be stable equilibrium points of ẋ = f(x), and let V be a suitably small neighborhood of s_0. Then there is a critical value R* such that

µ(x_n → s_1 | x_k ∈ V) > 0

for all k ∈ N if R > R*, and

µ(x_n → s_1 | x_k ∈ V) = 0

for all large enough k whenever R < R*.

However, this formulation of the result is somewhat problematic. Firstly, it depends on an a priori arbitrary neighborhood V of the starting equilibrium point s_0 as a notion of being “close” to the point. Secondly, the process (x_n)_{n∈N_0} depends on its initial value x_0 ∈ R. This may cause ambiguity, since for some choices of x_0, V and k the event {x_k ∈ V} may have probability 0 to begin with, and thus the conditional probabilities above are not well defined in those cases. For this reason we provide an alternative point of view: instead of seeing a stochastic approximation as a collection of stochastic processes {(x_n)_{n∈N_0} | x_0 ∈ R}, we write them as a NRDS ϕ. This novel approach allows us to avoid such issues. We show that every unstable equilibrium point of the mean field ODE corresponds to a unique repelling non-autonomous random fixed point of ϕ. Those points separate the phase space into non-autonomous, random regions, each of which acts as the basin of attraction for one of the stable fixed points. These basins can then be used to reformulate and prove the above result in a coherent way, as we have done in Theorem 3.28. Roughly speaking, a tipping occurs if one stable equilibrium of the mean field is contained in the basin corresponding to another one.

This interesting interplay between the non-autonomous random system ϕ and the autonomous deterministic one generated by the mean field ODE shows how powerful our extended NRDS setup is. For one, NRDS generated by a Robbins-Monro algorithm are in general not invertible in any sense. In general the sequence (U_n)_{n∈N} need not be stationary and thus cannot be modeled using a measure preserving system (Ω, A, µ, T) in the base. Moreover, the fiber maps need not be invertible either, and the system is not defined for negative times. Yet the remarkable fact that the limiting objects are deterministic and autonomous can be captured and described.

After a brief discussion of possible generalizations of our findings, we close Chapter 3 by extending a result by Pemantle [60] on urn processes with touchpoints. A touchpoint is hereby an equilibrium of the mean field ODE which is stable from one side and unstable from the other, and it is shown in [60] that the one-sided derivative from the stable side determines whether the probability of convergence to this point is positive or not. Theorem 3.38 extends this result to a much more general class of stochastic approximations, which also includes systems with unbounded noise. Since the proof presented in [60] relies on the specific structure of the model studied, a new approach was necessary.
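A small simulation illustrates the kind of transition discussed above. The sketch below is our own illustration: the mean field f(x) = x − x³ (stable equilibria at ±1, unstable at 0), the gains, the clipping and all constants are assumptions, not the setup of Theorem 3.28.

```python
import random

def f(x):
    return x - x**3  # mean field: stable equilibria at -1 and 1, unstable at 0

def escape_fraction(R, paths=300, steps=1500, seed=2):
    """Fraction of Robbins-Monro paths started at s0 = -1 that end in the
    basin of s1 = +1, with i.i.d. noise uniform in [-R, R] and gains 1/n.
    States are clipped to [-2, 2] (a standard projection device) to keep
    the sketch numerically stable during the large early steps."""
    rng = random.Random(seed)
    escaped = 0
    for _ in range(paths):
        x = -1.0
        for n in range(1, steps + 1):
            x += (f(x) + rng.uniform(-R, R)) / n
            x = max(-2.0, min(2.0, x))
        if x > 0:
            escaped += 1
    return escaped / paths

# Large noise: transitions from -1 to +1 occur with positive probability.
p_large = escape_fraction(R=3.0)
# Small noise: the drift barrier (|f| reaches about 0.385 on (-1, 0)) can
# never be overcome by noise bounded by 0.2, so no path escapes.
p_small = escape_fraction(R=0.2)
assert 0.0 < p_large < 1.0
assert p_small == 0.0
```

The qualitative picture, a critical noise size separating the two regimes, is what the result sketched above makes precise in terms of non-autonomous random basins.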

1.3 Rate-induced tipping

Many phenomena in science and nature can be described, on a very superficial level, as critical transitions or tipping points. We refer the reader to the Nature article [70] by Scheffer et al. for a brief discussion of the terms and example situations. A more detailed account is the book [69] by the same author. The article [35] by Ditlevsen is focused on tipping of the climate system. While the phenomenon of a critical transition, an abrupt qualitative change in the system, is easy to grasp intuitively, and many insights have been gained, its mathematical foundation is still rather vague. The book [69] strives to explain many of the examples by means of bifurcations. The theory of topological bifurcations is well developed, but lies beyond the scope of this thesis, so we refer the reader to the ample literature on this topic. However, it has been suggested that topological bifurcations alone cannot sufficiently explain all critical transitions.

As mentioned in Section 1.1, many models incorporate a random component, and it has been found that this noise can also lead to a critical transition: a system in a stable state can be pushed to another state by a large enough perturbation. The models presented in [43] are an example. One of our main results (Theorem 3.28 in Chapter 3) gives necessary and sufficient conditions for the presence of noise-induced tipping in non-autonomous random systems generated by a stochastic approximation algorithm.

Wieczorek, Ashwin et al. [75] identify yet another type of critical transition, which they call rate-induced tipping. They study a system in which a parameter changes linearly without putting the system through a classical bifurcation, and where noise is not present. They observe that the speed, or rate, at which the parameter changes can influence the stability of the system. Vaguely speaking, a rate-induced tipping occurs when the parameter change is so fast that the system in a stable state does not have enough time to follow the parameter change and instead tips into another state.

In [7] the authors give a coherent definition of rate-induced tipping in parameter shift systems ẋ_t = f(x_t, Λ(rt)), where Λ is a bounded, real-valued (parameter shift) function, limiting to some λ± as t → ±∞, and r > 0 is the speed or rate of parameter change. They show that for every stable fixed point x− of the limiting ODE ẋ_t = f(x_t, λ−) there is a non-autonomous pullback attracting fixed point of the parameter shift system converging to x− as t → −∞. If x− gets continuously transformed into a stable fixed point x+ of ẋ_t = f(x_t, λ+)

[Figure 1: An example for rate-induced tipping. Four panels: the ramp Λ(rt) against t (top left), f0(x) against x (top right), and the solutions x_t against rt for r = 0.2 (bottom left) and r = 0.4 (bottom right).]

Figure 1: A simple model that displays rate-induced tipping is given by f(x, λ) = f0(x − λ) with f0(x) = x − x³ (top right panel) and a ramping function Λ that increases linearly from 0 to 2 on the interval [0, 1] and is constant otherwise (top left panel). For t < 0 the non-autonomous ODE ẋ_t = f(x_t, Λ(rt)) is equivalent to the autonomous ODE ẋ_t = f0(x_t), which has attracting equilibria at x = 1 and x = −1. Similarly, for t > 1/r it is equivalent to the ODE ẋ_t = f(x_t, 2) with attracting equilibria x = 1 and x = 3, where 1 is linearly shifted into 3 and −1 into 1. The bottom two panels show the solution of ẋ_t = f(x_t, Λ(rt)) starting at x_t = 1 for t < 0, for rates r = 0.2 (left) and r = 0.4 (right); the aforementioned shift of the equilibria is indicated by the dashed lines. On the left hand side, the solution x_t follows the dashed line, albeit with a slight delay, whereas on the right hand side it tips towards the other equilibrium.
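The behavior in Figure 1 is easy to reproduce numerically. The following Python sketch (the Euler scheme, step size and integration horizon are our own choices) integrates ẋ_t = f0(x_t − Λ(rt)) for the two rates shown:

```python
def f0(x):
    return x - x**3  # equilibria at -1, 0, 1; the outer two are attracting

def Lambda(s):
    """Ramp from 0 to 2 on [0, 1], constant otherwise (cf. Figure 1)."""
    return 2.0 * min(max(s, 0.0), 1.0)

def solve(r, x0=1.0, t0=-1.0, dt=1e-3):
    """Euler integration of dx/dt = f0(x - Lambda(r t)), run until well
    after the ramp ends at t = 1/r."""
    x, t = x0, t0
    while t < 1.0 / r + 10.0:
        x += dt * f0(x - Lambda(r * t))
        t += dt
    return x

x_slow = solve(r=0.2)  # tracks the shifted equilibrium: ends near 3
x_fast = solve(r=0.4)  # tips towards the other equilibrium: ends near 1
assert abs(x_slow - 3.0) < 0.1
assert abs(x_fast - 1.0) < 0.1
```

The slow solution squeezes through the slow bottleneck near the shifted equilibrium before the ramp ends, while at the higher rate the state is left behind and falls into the other basin.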

[Figure 2: tipping probability (between 0 and 1) plotted against the rate r of parameter change.]

Figure 2: Probability that the example system of Section 4.4 undergoes rate-induced tipping, depending on the rate r, for a fixed noise size and parameter shift Λ. Figure 6 shows similar plots for various noise sizes and choices of Λ.

under the parameter change from λ− to λ+, a rate-induced tipping occurs if the aforementioned non-autonomous point does not converge to x+. These results are generalized to arbitrary attractors of the limiting systems in [2]. We will present some of the results from that paper in Chapter 4. A toy example is displayed and explained in Figure 1.

Ashwin, Wieczorek et al. propose in [8] that the three scenarios described above are typical, and they subsequently suggest a classification of tipping scenarios into the three useful categories

• bifurcation induced tipping (B-tipping),
• noise induced tipping (N-tipping) and
• rate-induced tipping (R-tipping).

However, these categories need not be mutually disjoint. Ritchie and Sieber [65] perform a phenomenological study of a system with both noise and a parameter change. They observe that tipping is a random phenomenon, but that the tipping probability depends on the rate; tipping occurs more likely at higher rates. In Chapter 4 we are concerned with exactly this kind of system, albeit on a discrete time line and with bounded noise. That this makes a difference is explored in Section 4.4. For a given example with a parameter shift function very similar to the one used in [65] we performed a numerical calculation of tipping probabilities. Figure 2 shows a plot of rate versus tipping probability. The perhaps most striking observation from this analysis is that the tipping probability is not a monotone function of the rate, contradicting both the findings of [65] and the intuition that a faster change of a parameter should make it easier for a system to be knocked out of equilibrium. At this stage we cannot give a full analytic explanation of the observed phenomenon, but we discuss a few possible starting points for such an analysis.

Due to the simultaneous presence of noise and an explicit time dependence of the parameter change, the framework of non-autonomous random dynamical systems is a natural choice. Building on the ideas in [7] we describe a parameter shift system with additive noise as a family (ϕ^r)_{r>0} of NRDS converging to limiting autonomous RDS as time tends to ∞ and −∞, with the limiting systems not depending on the rate r. The main result of this chapter, Theorem 4.18, states, roughly speaking, that any nice enough random attractor A− of the past-limiting system admits a non-autonomous random attractor (A^r_n)_{n∈Z} of ϕ^r for every rate r, such that A^r_n converges to A− in the semi-Hausdorff distance as n → −∞.

Using these non-autonomous attractors, we say the system tips at rate r if A^r_n does not converge to an attractor A+ of the future-limiting system as n → ∞. As usual with convergence of objects in probability spaces, there are different types of convergence, which lead to different definitions of tipping. A weaker version is presented in Definition 4.22 and a stronger one in Definition 4.24. The latter has the advantage that a tipping probability can be defined as well. These attractors, plus a similarly constructed family of repellers, are also the foundation of the algorithm we used to calculate the tipping probabilities in Figure 6.
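As a rough illustration of the path-wise notion, a tipping probability can be estimated by Monte Carlo: run a noisy discretization of the Figure 1 model over many bounded-noise realizations and count the paths that fail to track the shifted equilibrium. The Euler-type scheme, the noise model and all constants below are illustrative assumptions of ours, not the attractor-repeller based algorithm of Chapter 4.

```python
import random

def f0(x):
    return x - x**3

def Lambda(s):
    return 2.0 * min(max(s, 0.0), 1.0)  # ramp from 0 to 2 on [0, 1]

def tipping_probability(r, noise, paths=200, dt=0.01, seed=1):
    """Fraction of bounded-noise paths, started on the attractor x = 1,
    that end near the 'wrong' equilibrium 1 instead of tracking 1 -> 3."""
    rng = random.Random(seed)
    tipped = 0
    for _ in range(paths):
        x, t = 1.0, 0.0
        while t < 1.0 / r + 10.0:  # the ramp ends at t = 1/r
            x += dt * (f0(x - Lambda(r * t)) + rng.uniform(-noise, noise))
            t += dt
        if abs(x - 1.0) < 0.5:
            tipped += 1
    return tipped / paths

# With a small noise bound the deterministic picture of Figure 1 survives:
p_slow = tipping_probability(r=0.2, noise=0.02)  # tracking persists
p_fast = tipping_probability(r=0.4, noise=0.02)  # tipping persists
assert p_slow == 0.0
assert p_fast == 1.0
```

For intermediate rates and larger noise bounds the estimate can take values strictly between 0 and 1; the thesis's Figure 2 shows that this dependence on r need not be monotone.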

1.4 Structure of the thesis

The main body of this thesis is divided into three thematically distinct parts, corresponding to Chapters 2, 3 and 4. Chapter 2 is a more conceptual account of random dynamical systems theory. Our main goal is to extend well established notions and results from autonomous RDS theory to a non-autonomous setting. For this reason, Chapter 2 is split up into two main parts that progress in a parallel motion.

The first part gives a short overview of some key elements of autonomous random dynamical systems theory, relying entirely on existing literature. We present the general RDS framework of a skew product flow over an ergodic base, and introduce concepts such as random invariant points, sets and measures, decomposition of random measures, stationary measures, and pullback attractors. Moreover we state two classic theorems, namely the random Krylov-Boguliubov Theorem and the theorem on the 1:1 correspondence of stationary measures and Markov invariant measures.

In the second part we present the concept of a non-autonomous random dynamical system as a cocycle flow on a metric space. As opposed to the existing literature on this topic, we do not assume that the base is generated by an ergodic or at least measure preserving dynamical system, but by an arbitrary measurable transformation on a probability space. This allows us to include systems in our framework where the distribution of the noise changes over time, as is the case for example with the stochastic approximations presented in Chapter 3. We generalize the notions of random invariant points, sets and measures to our extended setting, taking inspiration from existing works such as [24, 30, 32], before proving a non-autonomous random Krylov-Boguliubov result (Theorem 2.31). We proceed with introducing non-autonomous stationary measures and a non-autonomous Markov property. To our knowledge, this has not been done in the existing literature.
One has to be a bit careful with the latter, since the past of the system can be understood with respect to both the explicit time component and the noise component. Theorem 2.38 is a non-autonomous generalization of the aforementioned correspondence theorem for stationary and Markov measures. Finally, we present a concept of pullback attraction and discuss some of its properties and differences to the autonomous case.

Chapter 3 starts with a formal introduction of Robbins-Monro type algorithms as recursively defined stochastic processes approximating a mean field ODE, and sets out and explains some standard assumptions. While the largest part of the existing literature seems to understand stochastic approximations as a special type of recursively defined stochastic processes, we introduce an alternative point of view by describing them within the non-autonomous RDS framework introduced in Chapter 2. This will prove to be more adequate for some of our research. In the next section we present a few example applications that can take the form of stochastic approximations: generalized Pólya urns and their application in economics, two different models of learning in game theory, and finally stochastic gradient descent algorithms as often used in optimization and machine learning problems. Section 3.3 is a brief recapitulation of some established results about stochastic approximations, mostly due to Benaïm et al.

Section 3.4 is concerned with noise-induced tipping in stochastic approximation NRDS with bounded noise; we focus on the one-dimensional case. We assume that the mean field is hyperbolic and has a finite number of equilibria. We provide a simple condition under which almost every path of the corresponding stochastic approximation converges to one of the stable equilibria of the mean field. Theorem 3.16 shows that a.e. path is bounded, which allows us to use the results from Section 3.3. After establishing this, we show in Lemma 3.23 and Theorem 3.26 that the unstable equilibria of the mean field correspond 1:1 to repelling non-autonomous fixed points of the NRDS converging to them. This is where it becomes apparent that the NRDS approach is more suitable than the classical stochastic process viewpoint, because in the latter case any given solution of the Robbins-Monro algorithm a.s. does not converge to an unstable point.
These repelling points separate the phase space into (non-autonomous random) regions, each of which serves as the basin of attraction of a stable equilibrium, and this allows us to give a precise definition of noise-induced transitions. Finally, Theorem 3.28 shows the existence of critical values of the noise size at which transitions between two given stable points become possible. The rest of Section 3.4 shows how the technique developed can be applied to other scenarios, and also its limitations. First we assume that the condition preventing paths from being unbounded is violated. Based on Lemma 3.33, we use our methods to provide at least a local description of the dynamics in this case. Example 3.35, however, shows that in this case the global dynamics can be more complicated. Finally, we briefly discuss stochastic approximations of parameter-dependent mean fields and on the unit circle (Theorem 3.36).

Section 3.5 is motivated by [60], where the author studies mean fields with non-hyperbolic touchpoints in the context of generalized Pólya urn models; he presents conditions under which the probability of convergence to such a point is zero or strictly positive. However, the methods used rely on the specific structure of the model. By applying a different approach, we managed to show that the main result of [60] holds true for a much larger class of stochastic approximations (Theorem 3.38). In particular, our result includes scenarios with bounded as well as unbounded noise.

Chapter 4 is devoted to rate-induced tipping in random systems. Our starting point is the deterministic framework for rate-induced tipping proposed by Ashwin, Wieczorek et al. in [9, 7], which we briefly recall in Section 4.1. Section 4.2 is a conceptual analysis of asymptotically autonomous NRDS, i.e. systems whose dynamics converge to those of a limiting autonomous RDS as time goes to −∞.
We are mostly concerned with the following question: if this limiting system has an attractor, does it correspond to a non-autonomous attractor “tracking” the autonomous one? In Definition 4.7 we introduce two different concepts of

tracking, one of which is weaker than the other. Theorem 4.9 gives a condition under which a given limiting attractor is tracked in the weak sense. In a similar fashion, Theorem 4.11 provides some stronger conditions under which an attracting fixed point of the limiting system is tracked in the strong sense. In both cases the proof is constructive, and moreover it is shown that the resulting non-autonomous attractor is maximal in the sense that every other non-autonomous tracking attractor is contained in it. Section 4.3 introduces NRDS that stem from parameter shift systems as defined by Ashwin, Wieczorek et al., but with bounded additive noise. Theorem 4.18 shows that for small enough noise size those systems fall within the scope of the previous section, so that the existence of tracking attractors is guaranteed. Moreover, we prove in Theorem 4.19 that they depend continuously on the rate. These attractors allow us to extend the definition of rate-induced tipping to random dynamical systems. Theorem 4.23 shows that there is a strictly positive threshold the rate has to pass in order for tipping to occur. This is known in the deterministic case, but our result extends it to the random setting. Corresponding to the above-mentioned strong version of tracking is a path-wise notion of tipping, which in turn allows us to speak about tipping probabilities. We explore the properties of tipping probabilities through an example in Section 4.4, but the methods developed can be applied in a wider context, the extent of which is unclear to us at the present stage. The main tools we use are non-autonomous random repellers tracking autonomous random repellers of the future limit system. After giving an equivalent characterization of tipping in Lemma 4.30 using those repellers, we then apply the continuity of attractors and repellers to show that the tipping probability is uniformly continuous as a function of the rate (Theorem 4.32).
A further analysis of tipping probabilities is performed with a numerical method we developed, based on the characterization of tipping via repellers. Our results show that, surprisingly, the tipping probability is not a monotone function of the rate, and moreover that it is not differentiable. We present an intuition for why this is the case, but a rigorous proof is still to be found. Finally, in Theorem 4.35, we calculate the limits of tipping probabilities as the noise size tends to zero.
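To fix ideas, the basic object behind the stochastic approximation results sketched above is a Robbins–Monro-type recursion with decreasing step sizes and bounded noise. The following minimal numerical sketch is our own toy illustration (the mean field F(x) = x − x³ and all parameters are made up for demonstration, not taken from the thesis): almost every path settles at one of the stable equilibria ±1 of the mean field, never at the unstable equilibrium 0.

```python
import random

def robbins_monro(F, x0, n_steps, noise_bound=0.1, seed=0):
    """Iterate x_{n+1} = x_n + gamma_n * (F(x_n) + U_n) with step sizes
    gamma_n = 1/(n+1) and i.i.d. bounded noise U_n uniform on [-b, b]."""
    rng = random.Random(seed)
    x = x0
    for n in range(n_steps):
        gamma = 1.0 / (n + 1)
        x = x + gamma * (F(x) + rng.uniform(-noise_bound, noise_bound))
    return x

# Toy mean field with stable equilibria at +/-1 and an unstable one at 0.
F = lambda x: x - x**3
x_limit = robbins_monro(F, x0=0.5, n_steps=20000)
```

Starting from x0 = 0.5, the iterates track the flow of the mean field ODE x' = F(x) and end up close to the stable equilibrium 1.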

1.5 Notation

The following notational and other conventions are in use throughout this thesis.
• We use the symbol N to denote the set of natural numbers 1, 2, . . . and N_0 for the set of non-negative integers 0, 1, 2, . . . .
• As usual, the symbols Z, Q and R denote the sets of integers, rational numbers and real numbers respectively.
• The empty set ∅ is a finite set and every finite set is countable.

Let B ⊆ A be sets and f : A → R any map.
• We use the shorthand notation sup_B f := sup_{x∈B} f(x), and similarly max_B f, inf_B f and min_B f.
• The symbol 1_B denotes the indicator function of B, i.e. 1_B : A → R with 1_B(x) = 1 if x ∈ B and 1_B(x) = 0 if x ∉ B.

• The power set of A is denoted as P(A).
• The complement of B is B^c := A \ B.

For a subset A ⊆ M of a topological space M we write
• cl A for the closure of A,
• int A for the interior of A and
• ∂A := cl A \ int A for its boundary.

Let (M, d) be a metric space.
• Given a subset A ⊆ M and a point x ∈ M we write

d(x, A) = d(A, x) := inf_{y∈A} d(x, y).

• For ε > 0 we denote by

Bε (x) := {y ∈ M | d (x, y) < ε}

the open ε-ball around x.
• We reuse the symbol B_ε to denote open ε-neighborhoods of sets A ⊆ M,

Bε (A) := {y ∈ M | d (A, y) < ε} .

• The corresponding closed ball and closed neighborhood are denoted by B_{≤ε}(x) and B_{≤ε}(A).
• The diameter of A is

diam (A) := sup {d (x, y) | x, y ∈ A} .

• The semi-Hausdorff distance on subsets of M is denoted as dist and the Hausdorff distance as d_h. For details we refer to Appendix A.

Let (Ω, A, µ) be a measure space and (Θ, B) a measurable space.
• The pushforward measure of µ under a measurable map f : Ω → Θ is defined via f_*µ(B) = µ(f^{−1}B) for B ∈ B, and f_*µ is a measure on (Θ, B).
• For a collection A of subsets of Ω we denote by σ(A) the smallest σ-algebra containing every element of A.
• If f : Ω → Θ is a map and A consists of the sets f^{−1}B with B ∈ B, we write σ(f) = σ(A).
• Similarly, if (f_i)_{i∈I} is a family of maps we write σ(f_i : i ∈ I) for the smallest σ-algebra containing all sets f_i^{−1}B, B ∈ B.
• If (M, T) is a topological space we denote its Borel σ-algebra as B(M) = σ(T).
• A measurable map T : Ω → Ω is bi-measurable if it is invertible and the inverse map is measurable.
• If A, B ∈ A we say that A ⊆ B modulo µ if µ(A \ B) = 0, and A = B modulo µ if A ⊆ B and B ⊆ A modulo µ.

If M is a metric space and µ, µ_1, µ_2, . . . are probability measures on (M, B(M)), we say that µ_n converges weakly to µ if

lim_{n→∞} ∫ h dµ_n = ∫ h dµ

for all continuous, bounded maps h : M → R. We write µ = w-lim_{n→∞} µ_n in this case.
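For finite point sets, the semi-Hausdorff and Hausdorff distances just mentioned can be computed directly. The sketch below is our own illustration (not part of the thesis); it uses the usual conventions dist(A, B) = sup_{x∈A} d(x, B) and d_h(A, B) = max{dist(A, B), dist(B, A)}, with the precise definitions deferred to Appendix A as in the text.

```python
def dist(A, B):
    """Semi-Hausdorff distance dist(A, B) = max_{x in A} min_{y in B} |x - y|
    for finite non-empty sets of reals. Note dist is not symmetric:
    dist(A, B) = 0 exactly when every point of A lies in B."""
    return max(min(abs(x - y) for y in B) for x in A)

def dh(A, B):
    """Hausdorff distance: the symmetrization max{dist(A, B), dist(B, A)}."""
    return max(dist(A, B), dist(B, A))

A = [0.0]            # a single point contained in ...
B = [0.0, 0.5, 1.0]  # ... a three-point set
```

Here dist(A, B) = 0 since A ⊆ B, while dist(B, A) = 1; this asymmetry is what makes the pullback-attraction statements later one-sided.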

2 Random dynamical systems

2.1 Background on autonomous RDS
This section is an overview of important results from (autonomous) RDS theory, serving as a primer for the non-autonomous setting. For further reading on the topic we recommend the book [3].

2.1.1 Skew product systems
A random dynamical system is, roughly speaking, a dynamical system on some metric space (X, d) where the dynamics at each time step are subject to a random influence. In order to describe these dynamics we therefore need not a single map g : X → X, but rather a whole family (g_ω)_{ω∈Ω}. The (random) orbit of an initial value x_0 is then the sequence

x0, x1 = gω0 (x0) , x2 = gω1 (x1) ,... (2.1) where the ω0, ω1,... are picked from the set Ω according to some probability law. It has proven convenient to use a dynamical modeling of this noise influence. By this we mean that we describe the sequence of the noise variables ω0, ω1,... as a dynamical system on some probability space.

Definition 2.1. Let (Ω, A, µ) be a probability space.
(a) A map T : Ω → Ω is called measure preserving if µ(T^{−1}A) = µ(A) for every A ∈ A. In that case we call the tuple (Ω, A, µ, T) a measure preserving dynamical system (MPDS) and the measure µ invariant under T.
(b) An MPDS (Ω, A, µ, T) is called ergodic if µ(A) ∈ {0, 1} for each A ∈ A with T^{−1}A = A. Such sets A are said to be invariant.
(c) We say the MPDS (Ω, A, µ, T) is invertible if the map T is bi-measurable.
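The measure preserving property of Definition 2.1 can be checked empirically in a toy example (our own sketch, not from the thesis): the doubling map on [0, 1) preserves Lebesgue measure, so a large uniform sample puts roughly the same mass on a set A as on its preimage T^{−1}A.

```python
import random

def T(omega):
    """The doubling map on [0, 1): a standard measure preserving (for
    Lebesgue measure) and ergodic map. Note it is not invertible."""
    return (2.0 * omega) % 1.0

# Empirical check of mu(T^{-1}A) = mu(A) for A = [0, 1/3): compare the
# fraction of sample points lying in A with the fraction mapped into A.
rng = random.Random(0)
sample = [rng.random() for _ in range(100000)]
mass_A = sum(1 for w in sample if w < 1.0 / 3.0) / len(sample)
mass_preimage = sum(1 for w in sample if T(w) < 1.0 / 3.0) / len(sample)
```

Both empirical masses are close to 1/3 up to sampling error. Since the doubling map is not invertible, it only illustrates the measure preserving property; the text assumes an ergodic and invertible MPDS from Remark 2.2 onwards.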

Remark 2.2. If (Ω, A, µ, T) is measure preserving/ergodic and invertible, then the inverse map T^{−1} is measure preserving/ergodic as well.
It is a well known result from ergodic theory that ergodic systems are the building blocks of MPDS: every (nice enough) T-invariant measure µ can be decomposed into ergodic measures in a unique way ([37, Theorem 6.2]). Moreover, any ergodic system can be extended to be invertible (cf. [37, Exercise 2.1.7]). Thus we can assume that our noise model is ergodic and invertible without loss of (a lot of) generality. From now on we assume that (Ω, A, µ, T) is an ergodic and invertible MPDS.

Definition 2.3. A random homeomorphism is a map g : Ω × X → X such that ω 7→ g (ω, x) is measurable for every x ∈ X and x 7→ g (ω, x) is continuous for every ω ∈ Ω.

Instead of g(ω, x) we will write g_ω(x). Just as in (2.1), the dynamics in X are described by the maps g_ω, but at the same time we can now evolve ω according to T. This leads to the skew product map

T ⋉ g : Ω × X → Ω × X, (ω, x) 7→ (Tω, g_ω(x)).

This allows us to describe the inherently non-autonomous random dynamical system on X as living within an autonomous dynamical system on the extended phase space Ω × X. We

We can describe the n time step dynamics on X by applying (T ⋉ g)^n and extracting the second component. We observe that (T ⋉ g)^n is also of skew product form, namely T^n ⋉ g^n, where

g^n_ω := g_{T^{n−1}ω} ◦ · · · ◦ g_{Tω} ◦ g_ω.

One can easily see that the family (g^n)_{n∈N} of random homeomorphisms has the cocycle property

g^{n+k}_ω = g^n_{T^k ω} ◦ g^k_ω.

Example 2.4 (Barnsley’s chaos game). Let X = [0, 1] and define the maps

g_0, g_1 : X → X, g_0(x) = x/3, g_1(x) = x/3 + 2/3.

Barnsley’s chaos game is defined as follows. For a given initial value x_0 ∈ X we create a sequence (x_n)_{n∈N_0} in the following way. Assume we know x_n. Then we randomly pick an

ω_n ∈ {0, 1} and define x_{n+1} := g_{ω_n}(x_n). This defines a random dynamical system, and we want to show how to write it as a skew product. One way to do this is to pick all the ω_n simultaneously and then shift the obtained sequence accordingly. Let Ω^+ := {0, 1}^{N_0} be the set of all one-sided sequences ω = (ω_n)_{n∈N_0} with values in {0, 1}. If we equip this space with the Borel σ-algebra A w.r.t. the usual metric, we can define a probability measure as follows. To any finite sequence α_0, . . . , α_n ∈ {0, 1} we can associate the cylinder

[α_0, . . . , α_n] := {ω ∈ Ω^+ | ω_0 = α_0, . . . , ω_n = α_n} ∈ A.

Then there is a unique probability measure µ on (Ω^+, A) such that

µ([α_0, . . . , α_n]) = 2^{−(n+1)}

for all cylinders [α_0, . . . , α_n]. Selecting a sequence ω according to µ is equivalent to selecting all the ω_n individually and independently from {0, 1} with probability 1/2 each. Instead of accessing ω_n directly, we shift the sequence ω to the left n times and read out the value at the 0 component. Formally we introduce the shift map

T : Ω^+ → Ω^+, Tω = (ω_{n+1})_{n∈N_0},

and in a slight abuse of notation we write g_ω := g_{ω_0}. The orbit of a point x_0 is then given by x_{n+1} = g_{T^n ω}(x_n), i.e. x_n = g^n_ω(x_0) for all n ∈ N_0. The shift T is not invertible on Ω^+, but it can be made invertible by extending it to the space Ω = {0, 1}^Z of two-sided sequences. It is well known that T is ergodic on both Ω and Ω^+.
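Example 2.4 is easy to simulate. The sketch below is our own illustration: it runs the chaos game with fair coin flips and confirms that after the first step every iterate lies in [0, 1/3] ∪ [2/3, 1], the first stage of the middle-thirds Cantor construction on which the orbit accumulates.

```python
import random

def g(i, x):
    """The two fiber maps of Barnsley's chaos game: g_0(x) = x/3 and
    g_1(x) = x/3 + 2/3."""
    return x / 3.0 if i == 0 else x / 3.0 + 2.0 / 3.0

def chaos_game_orbit(x0, n_steps, seed=0):
    """Random orbit x_{n+1} = g_{omega_n}(x_n) with i.i.d. fair coin
    flips omega_n in {0, 1}."""
    rng = random.Random(seed)
    orbit = [x0]
    for _ in range(n_steps):
        orbit.append(g(rng.randint(0, 1), orbit[-1]))
    return orbit

orbit = chaos_game_orbit(0.5, 200)
```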

2.1.2 Random invariant sets and measures
Random dynamical systems are often studied by means of random invariant measures. We will present a few classical results below, some of which we will generalize to the non-autonomous case in Chapter 2.2. For a broader introduction to the topic we refer to the book [27] by Crauel. For a set A ⊆ Ω × X and ω ∈ Ω we denote by

A(ω) := {x ∈ X | (ω, x) ∈ A}

the ω-fiber of A. Using this notation we can re-interpret subsets A ⊆ Ω × X as maps from Ω to the power set P(X) of X. We will use both notions interchangeably.

Definition 2.5 ([27, Definition 2.1]). A random closed/compact set is a subset A ⊆ Ω × X such that A(ω) is closed/compact for every ω ∈ Ω and

ω 7→ d(x, A(ω)) is measurable for every x ∈ X. A set U ⊆ Ω × X is called random open if its fiberwise complement U^c, defined via U^c(ω) := X \ U(ω), is random closed. We use the umbrella term random set for both random open and random closed sets.

Remark 2.6. Crauel and Kloeden propose in [29] that any measurable subset A ∈ A ⊗ B(X) should be called a random set. Open/closed/compact random sets are similarly defined by openness/closedness/compactness of the fibers A(ω). They justify this proposition with the following example: given a non-measurable subset N ⊆ X, the set A = Ω × N is a random set in the sense of Definition 2.5, but not measurable as a subset of Ω × X. However, Proposition 2.4 in [27] suggests that both notions are equivalent in the case that (Ω, A, µ) is a complete probability space, and Lemma 2.7 in the same book shows that for any A ∈ A ⊗ B(X) with closed fibers A(ω) there exists a random closed set Ã in the sense of Definition 2.5 with A = Ã a.s. Since we are mostly concerned with random closed sets, the difference in definition boils down to a question of measurability. This will not cause any problems throughout this thesis. There are several equivalent characterizations of random closed/compact/open sets, see e.g. [27, Proposition 2.4, Theorem 2.6] or [22, Theorem III.2].

Definition 2.7. A random set A is said to be forward invariant for the RDS T ⋉ g if g_ω A(ω) ⊆ A(Tω) for a.e. ω ∈ Ω. It is invariant if g_ω A(ω) = A(Tω) for almost every ω ∈ Ω.

If A ⊆ Ω × X is a random set, it is easy to see that the fibers of (T ⋉ g)A can be expressed as

[(T ⋉ g)A](Tω) = g_ω A(ω),

such that the above definition of (forward) invariance is equivalent to (forward) invariance under the skew product flow if we identify two random sets whenever they agree in almost every fiber. A special case arises when each fiber A(ω) consists of exactly one point.

Definition 2.8. A random fixed point is a measurable map a : Ω → X such that g_ω(a(ω)) = a(Tω) for a.e. ω ∈ Ω.

We will not make a strict distinction between a random fixed point a and the corresponding invariant random compact set {(ω, a(ω)) | ω ∈ Ω}. All definitions and results formulated for invariant random (compact) sets are assumed to include random fixed points as well. Random measures are probability measures living on the extended phase space Ω × X which preserve the ergodic structure in the ω component.

Definition 2.9. A random measure over (Ω, A , µ) is a probability measure α on A ⊗ B (X) such that α (A × X) = µ (A) for all A ∈ A . We also say that random measures have marginal µ on Ω. This structure allows us to decompose random measures into their ω-fibers.

Theorem 2.10 ([27, Proposition 6.6]). Let α be a random measure over (Ω, A, µ). There exists a decomposition of α into a family (α_ω)_{ω∈Ω} of probability measures on X such that

ω 7→ α_ω(B) is measurable for every B ∈ B(X) and

α(A) = ∫ α_ω(A(ω)) dµ(ω)

for all A ∈ A ⊗ B(X). Moreover, this decomposition is a.s. unique in the following sense: if (β_ω)_{ω∈Ω} is another decomposition of α, then α_ω = β_ω for a.e. ω ∈ Ω.
In analogy to Definition 2.7, we recall the notion of random invariant measures.

Definition 2.11. A random invariant measure is a random measure α such that gω∗αω = αT ω for a.e. ω ∈ Ω.

A standard argument shows that for a random measure α the fibers of its pushforward under the skew product transformation are

((T ⋉ g)_* α)_{Tω} = g_{ω*} α_ω,

such that α is invariant if and only if (T ⋉ g)_* α = α. The following statement is a generalization of the well known Krylov–Bogolyubov Theorem from topological dynamics (see e.g. [19, Section 4.6]). A proof can be found in Chapter 6 of [27]. We borrow these ideas to prove Theorem 2.31, which extends this result to non-autonomous random systems.

Theorem 2.12. Let A be an invariant random compact set. There is at least one random invariant measure α supported on A, i.e. α (A) = 1.

Of special importance are so-called Markov invariant measures, as they are linked to statistical properties of the RDS. The defining property of a Markov process is, roughly speaking, that the future is independent of the past. The equivalent notion for RDS is known as the Markov property. We give a precise definition below, loosely following the presentation in [50, Chapter 1.3.3]. For −∞ ≤ p < q ≤ ∞ we define the σ-algebra

F_{p,q} := σ( g^k_{T^n ω}(x) : x ∈ X and n ∈ Z, k ∈ N with p ≤ n < n + k ≤ q ).

We call F^− := F_{−∞,−1} the σ-algebra of the past and F^+ := F_{0,∞} the σ-algebra of the future.

Definition 2.13.
(a) An RDS is called a Markov RDS if F^− and F^+ are independent. In this case we say that T ⋉ g has the Markov property.
(b) A random invariant measure α is called a Markov measure if the map ω 7→ α_ω(B) is F^−-measurable for all B ∈ B(X).

The name Markov property is justified, as in a Markov RDS over X ⊆ R^d, the stochastic process (Y_n)_{n∈Z} defined via

Y_n(ω) = g^n_ω(x)

is a Markov chain w.r.t. the filtration (F_{−∞,n})_{n∈Z}. Indeed, we have the relation

Y_{n+1}(ω) = g_{T^n ω}(Y_n(ω)),

and as Y_n is F_{−∞,n}-measurable and ω 7→ g_{T^n ω}(·) is independent of F_{−∞,n}, we see that

E[Y_{n+1} | F_{−∞,n}](ω) = ∫ g_{ω′}(Y_n(ω)) dµ(ω′) = E[Y_{n+1} | Y_n].   (2.2)

Equation (2.2) moreover shows that (Y_n)_{n∈Z} is homogeneous, and we can infer that its generator takes the form

M : P → P, Mρ = ∫ g_{ω*} ρ dµ(ω),

where P denotes the set of all Borel probability measures on R^d.

Definition 2.14. A measure ρ ∈ P is called stationary for M if Mρ = ρ.
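For the chaos game of Example 2.4 the Markov operator M can be approximated on an empirical sample (a Monte Carlo sketch of our own, not from the thesis): each sample point is pushed through g_0 or g_1 with probability 1/2. Iterating M drives any initial measure towards the stationary measure, which here is the Cantor measure putting mass 1/2 on each of [0, 1/3] and [2/3, 1].

```python
import random

def g(i, x):
    """Fiber maps of Barnsley's chaos game (Example 2.4)."""
    return x / 3.0 if i == 0 else x / 3.0 + 2.0 / 3.0

def markov_step(sample, rng):
    """One application of the Markov operator M = (1/2)(g_0)_* + (1/2)(g_1)_*,
    approximated on an empirical sample: each point is pushed through a
    randomly chosen fiber map."""
    return [g(rng.randint(0, 1), x) for x in sample]

rng = random.Random(1)
sample = [rng.random() for _ in range(10000)]  # approximates Lebesgue on [0, 1]
for _ in range(10):
    sample = markov_step(sample, rng)

# Empirical mass of the left third; for the stationary measure it is 1/2.
mass_left = sum(1 for x in sample if x <= 1.0 / 3.0) / len(sample)
```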

Markov operators describe the statistical evolution of the RDS, but a priori it seems that they do not contain any actual dynamical information. However, the following result, due to Ledrappier and Young [54] and Crauel [26], shows that stationary measures correspond 1:1 to invariant Markov measures of the random dynamical system. It is Theorem 4.2.9 in [50], where a proof is presented. Theorem 2.38 in the next chapter is a generalization of this result to the non-autonomous case.

Theorem 2.15. Let ϕ be a Markov RDS.
(a) If α is a Markov measure, then

ρ := ∫ α_ω dµ(ω)   (2.3)

is a stationary measure. (b) If ρ is a stationary measure for M, there exists a Markov measure α fulfilling (2.3). The disintegration of this measure α is a.s. given as

α_ω = w-lim_{k→∞} (g^k_{T^{−k}ω})_* ρ,

where w-lim denotes the weak limit of probability measures on X.

Remark 2.16. If a is a past-measurable random fixed point then we can define a Markov measure via αω = δa(ω). That this measure is invariant follows from the relation

g_{ω*} α_ω = δ_{g_ω(a(ω))} = α_{Tω}.

In that case the corresponding stationary measure is given as

ρ (B) = µ {a ∈ B} .
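For Barnsley's chaos game this correspondence between a random fixed point and its stationary measure can be checked by simulation (our own sketch, not from the thesis): approximating a(ω) by a deep pullback over a freshly sampled past and collecting the empirical law over many noise realizations reproduces the Cantor-type stationary measure, which puts mass 1/2 on each of [0, 1/3] and [2/3, 1].

```python
import random

def g(i, x):
    """Fiber maps of Barnsley's chaos game (Example 2.4)."""
    return x / 3.0 if i == 0 else x / 3.0 + 2.0 / 3.0

def random_fixed_point(rng, depth=40):
    """Approximate a(omega) = g_{omega_{-1}} o ... o g_{omega_{-depth}}(0)
    by a finite pullback over a sampled past; since each map contracts by
    1/3, the approximation error is at most 3 ** (-depth)."""
    past = [rng.randint(0, 1) for _ in range(depth)]  # (omega_{-1}, omega_{-2}, ...)
    x = 0.0
    for i in reversed(past):  # apply the map with the oldest noise symbol first
        x = g(i, x)
    return x

rng = random.Random(7)
points = [random_fixed_point(rng) for _ in range(10000)]
mass_left = sum(1 for x in points if x <= 1.0 / 3.0) / len(points)
```

Here mass_left approximates ρ([0, 1/3]) = µ{a ∈ [0, 1/3]} = 1/2, since a(ω) lies in the left third exactly when ω_{−1} = 0.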

2.1.3 Attractors and repellers
For a continuous dynamical system given by a map F : X → X on some metric space X, a local attractor is a compact invariant set A such that there is an η > 0 with

lim_{n→∞} dist(F^n B_η(A), A) = 0,   (2.4)

or in other words, the sequence (F^k B_η(A))_{k∈N} converges to A in the Hausdorff distance. In order to generalize this definition to an RDS T ⋉ g, the most obvious ansatz would be to study the sequence

(g^k_ω(B_η(K(ω))))_{k∈N}   (2.5)

for a random compact invariant set K, but in general this sequence does not converge as k → ∞. This can easily be seen in the example of Barnsley’s chaos game, as each element of the sequence independently has probability 1/2 to be contained in either [0, 1/3] or [2/3, 1]. However, if instead of interpreting F^n in (2.4) as going from time 0 to time n, we think of it as going from time −n to 0 and replace (2.5) by

(g^k_{T^{−k}ω}(B_η(K(T^{−k}ω))))_{k∈N},   (2.6)

we get an alternative definition of random attraction. To generalize even further, we can allow any random set to be attracted.

Definition 2.17. A random set U is said to be a random neighborhood of a random set A if there exists a random open set Ũ such that A(ω) ⊆ Ũ(ω) ⊆ U(ω) for a.e. ω ∈ Ω.

Definition 2.18. An invariant random compact set A is said to pullback attract a bounded random set U if

dist(g^k_{T^{−k}ω} U(T^{−k}ω), A(ω)) → 0 as k → ∞

for a.e. ω ∈ Ω. If A attracts a forward invariant random neighborhood of itself, we call it a local pullback attractor.

Remark 2.19. Sequences as defined in (2.5) may converge to a “moving target”, which leads us to the concept of a forward attractor. Precisely, an invariant compact random set A is said to forward attract a bounded random set U if

dist(g^k_ω U(ω), A(T^k ω)) → 0 as k → ∞

for a.e. ω ∈ Ω. Forward and pullback convergence are not equivalent; the paper [42] gives examples of both forward attracting and non-forward attracting pullback attractors. However, if a.s. convergence is replaced by weak convergence, forward and pullback convergence coincide. The paper [28] provides a simple criterion for the existence of a local pullback attractor. Given any random set U, the omega limit of U is the random set Ω_U with fibers

Ω_U(ω) := lim_{k→∞} g^k_{T^{−k}ω} U(T^{−k}ω).

Theorem 2.20 ([28, Proposition 3.6]). Assume U is a random set and K is a random compact set such that for a.e. ω ∈ Ω there is k_0 ∈ N with g^k_{T^{−k}ω} U(T^{−k}ω) ⊆ K(ω) whenever k ≥ k_0.² Then Ω_U is a compact invariant random set attracting U.

A converse result is [31, Lemma 8].

Lemma 2.21. Assume A, U are compact, invariant random sets such that A attracts U. Then ΩU ⊆ A a.s.

Remark 2.22. Let T ⋉ g be a Markov RDS and assume A attracts a past-measurable random compact neighborhood U of itself. Then Lemma 2.21 implies that A = Ω_A ⊆ Ω_U ⊆ A a.s. But since Ω_U is past-measurable, this implies that A is past-measurable as well. In the case that A is a random fixed point, it suffices that it attracts any past-measurable U; in particular this is true for deterministic U.

Example 2.23. This continues Barnsley’s chaos game from Example 2.4. For ω ∈ Ω we have that

g^k_{T^{−k}ω}([0, 1]) = g_{ω_{−1}} ◦ · · · ◦ g_{ω_{−k}}([0, 1])

is a decreasing sequence of compact intervals of lengths 3^{−1}, 3^{−2}, . . . , such that the intersection over all these sets contains exactly one point, which we denote as a(ω). It is not hard to verify that a : Ω → R is an attracting random fixed point. Moreover, it is not hard to see that a is a bijection between Ω^− and the middle-thirds Cantor set.
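The nested intervals in Example 2.23 are easy to compute explicitly. The sketch below is our own illustration: for the constant past ω_{−1} = ω_{−2} = · · · = 1, the pullback images g_{ω_{−1}} ◦ · · · ◦ g_{ω_{−k}}([0, 1]) shrink at rate 3^{−k} onto a(ω) = 1, the fixed point of g_1.

```python
def g(i, x):
    """Fiber maps of Barnsley's chaos game (Example 2.4)."""
    return x / 3.0 if i == 0 else x / 3.0 + 2.0 / 3.0

def pullback_interval(past, k):
    """Endpoints of g_{omega_{-1}} o ... o g_{omega_{-k}}([0, 1]), where
    `past` lists the noise symbols (omega_{-1}, omega_{-2}, ...). The
    innermost map g_{omega_{-k}} is applied first; both maps are increasing,
    so it suffices to track the two interval endpoints."""
    lo, hi = 0.0, 1.0
    for i in reversed(past[:k]):
        lo, hi = g(i, lo), g(i, hi)
    return lo, hi

past = [1] * 30          # the all-ones past
lo, hi = pullback_interval(past, 30)
```

Both endpoints agree with a(ω) = 1 up to an error of order 3^{−30}.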

²In the terminology of [28], K absorbs U.

2.2 Non-autonomous random dynamical systems

2.2.1 Non-autonomous sets and measures
Random dynamical systems as described in the previous section are non-autonomous in their nature due to the presence of the noise variables, but by using a dynamical model for the noise we could make them behave autonomously in several ways.
(a) The dynamics on the extended phase space Ω × X are autonomous.
(b) The “average dynamics”, described by the Markov operator M : P → P, are autonomous.
These were due to the fact that we assumed stationarity of the base transformation and that the time dependence of the fiber maps g_ω was through the noise ω alone. In this section we want to study a more general type of random dynamical system that does not fulfill the above assumptions. As before, we assume that (Ω, A, µ) is a probability space and (X, d) a metric space. The base dynamics are given by a measurable map T : Ω → Ω. In the autonomous case, the assumption that T preserves the measure µ was natural in the sense that all fiber maps g_ω, g_{Tω}, . . . along an orbit have the same distribution. However, in this chapter we consider the case of fiber maps with an explicit time dependence, and thus a priori their distribution can depend on the time as well. So in general we no longer assume that T preserves µ. Instead, T generates a whole family of probability measures µ_n := T^n_* µ. The systems studied in Chapter 3 have a non-measure-preserving base in general.
A non-autonomous random dynamical system (NRDS) over (Ω, A, µ, T) consists of a family (f_n)_{n∈T} of measurable maps f_n : Ω × X → X, where T ∈ {N_0, Z} can be one-sided or two-sided (discrete) time. We write f_{n,ω}(x) := f_n(ω, x) and assume that all the maps f_{n,ω} : X → X are continuous. We will use the non-autonomous flow notation

ϕ(k, n; ω) := f_{n+k−1,T^{k−1}ω} ◦ · · · ◦ f_{n,ω}

to describe the dynamics starting at time n ∈ T and going k ∈ N_0 steps forward under the noise realization ω ∈ Ω, where we assume the usual convention ϕ(0, n; ω)x = x for all ω ∈ Ω, n ∈ T and x ∈ X.

Definition 2.24. The map ϕ : N_0 × T × Ω × X → X is called a non-autonomous random dynamical system. We will use the abbreviation NRDS. The maps f_{n,ω} are called the fiber maps of ϕ. We say ϕ
(a) has invertible fibers if f_{n,ω} is invertible for all n ∈ T and a.e. ω ∈ Ω.
(b) is invertible if it has invertible fibers and T : Ω → Ω is bi-measurable.
(c) is reversible if it is invertible and T = Z.

From its definition it is clear that ϕ fulfills the cocycle property

ϕ(k + l, n; ω) = ϕ(l, n + k; T^k ω) ◦ ϕ(k, n; ω)   (2.7)

for all k, l ∈ N_0, n ∈ T and ω ∈ Ω. This accounts for the fact that going from time n to time n + k + l directly is the same as going from n first to n + k and then afterwards to n + k + l. If the NRDS is invertible, we can extend the definition of ϕ to negative values of k as well. The most natural way to do so is defining

ϕ(−k, n; ω) := ϕ(k, n − k; T^{−k}ω)^{−1},

Abbreviation   Full expression                             Reference
RDS            (autonomous) random dynamical system        Section 2.1
NRDS           non-autonomous random dynamical system      Definition 2.24
NARS           non-autonomous random set                   Definition 2.25
NARF           non-autonomous random fixed point           Definition 2.26
NARM           non-autonomous random measure               Definition 2.28
NASM           non-autonomous stationary measure           Definition 2.37

Table 2: Overview of the abbreviations introduced in this chapter.

for k ∈ N such that n − k ∈ T, as this is coherent with the cocycle property. For autonomous systems the notions (a), (b) and (c) coincide, but they are different in the non-autonomous setting. The systems studied in Chapter 3 can have invertible fibers but not be invertible, and they are never reversible. As usual for non-autonomous dynamical systems, all the objects we study become time-dependent as well. Thus we need to amend the notions of fixed points, invariant sets and attractors from the previous chapter.
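The cocycle property (2.7) can be checked mechanically in a toy implementation. The sketch below is our own illustration with made-up affine fiber maps that depend explicitly on both the time n and the noise state ω; it verifies that composing ϕ(k, n; ω) with ϕ(l, n + k; T^k ω) reproduces ϕ(k + l, n; ω).

```python
def T(omega):
    """Toy base dynamics: a deterministic shift on Omega = Z, just to
    exercise the bookkeeping of noise states."""
    return omega + 1

def f(n, omega, x):
    """Made-up fiber map f_{n, omega} with explicit time dependence."""
    return 0.5 * x + 0.1 * n + 0.01 * omega

def phi(k, n, omega, x):
    """The cocycle phi(k, n; omega) = f_{n+k-1, T^{k-1} omega} o ... o f_{n, omega},
    with phi(0, n; omega) the identity."""
    for j in range(k):
        x = f(n + j, omega, x)
        omega = T(omega)
    return x

# Check (2.7): phi(k + l, n; omega) = phi(l, n + k; T^k omega) o phi(k, n; omega).
k, l, n, omega0, x0 = 3, 4, -2, 5, 1.7
lhs = phi(k + l, n, omega0, x0)
rhs = phi(l, n + k, omega0 + k, phi(k, n, omega0, x0))  # omega0 + k is T^k omega0
```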

Definition 2.25. A non-autonomous random set, or NARS for short, is a sequence (A_n)_{n∈T} of random sets. We say it is closed/compact/open if all the A_n are closed/compact/open in the sense of Definition 2.5. It is called invariant if

f_{n,ω}(A_n(ω)) = A_{n+1}(Tω)   (2.8)

for all n ∈ T and µ_n-a.e. ω ∈ Ω. Just as in Chapter 2.1, we have a notion of random fixed points as special cases of invariant NARS. Again, all definitions and results that hold for (compact) invariant NARS are understood to include random fixed points as well.

Definition 2.26. A non-autonomous random fixed point, or NARF for short, is a sequence (a_n)_{n∈T} of measurable maps a_n : Ω → X such that f_{n,ω}(a_n(ω)) = a_{n+1}(Tω) for all n ∈ T and µ_n-a.e. ω ∈ Ω.

From the cocycle property (2.7) it follows that invariance implies

ϕ(k, n; T^n ω) A_n(T^n ω) = A_{n+k}(T^{n+k} ω)

for all choices of k, n, ω.

Remark 2.27. Assume T is bi-measurable. Let n_0 ∈ T and (A_n)_{n≤n_0} be a sequence of random sets such that (2.8) holds for all ω ∈ Ω and n < n_0. Then (A_n)_{n∈T} with

A_{n_0+k}(ω) := ϕ(k, n_0; T^{−k}ω) A_{n_0}(T^{−k}ω)   (2.9)

is an invariant NARS. Moreover, if (B_n)_{n∈T} is an invariant NARS with A_n = B_n a.s. for all n ≤ n_0, then this is true for all n ∈ T.

If T = N_0 or ϕ is invertible, then (2.9) implies a 1:1 correspondence between random sets and NARS, as we can recover (A_n)_{n∈T} from A_0 alone. In those cases the notion is rather weak, such that one usually has to single out NARS with special properties in order to obtain dynamical information about the system.

Definition 2.28. A non-autonomous random measure (NARM) is a sequence (α_n)_{n∈T} of measures on A ⊗ B(X) such that α_n(A × X) = µ_n(A) = µ(T^{−n}A) for all n ∈ T and A ∈ A. In other words, a NARM is a sequence of random measures (α_n)_{n∈T} where α_n is a random measure over (Ω, A, µ_n). From Theorem 2.10 it follows that each α_n possesses a measurable decomposition (α_{n,ω})_{ω∈Ω}, which is µ_n-a.s. uniquely defined via

α_n = ∫ α_{n,ω} dµ_n(ω).

Definition 2.29. A NARM (α_n)_{n∈T} is called invariant under ϕ if

α_{n+1,Tω} = (f_{n,ω})_* α_{n,ω}

for all n ∈ T and µ_n-a.e. ω ∈ Ω. With the skew product notation

Φ_n : Ω × X → Ω × X, (ω, x) 7→ (T ⋉ f_n)(ω, x) = (Tω, f_{n,ω}(x)),

we can give an alternative characterization in the case of invertible T. First we want to point out that Φ_{n*}α_n is a random measure over (Ω, A, µ_{n+1}) for all n ∈ T. Given A ∈ A we have Φ_n^{−1}(A × X) = T^{−1}A × X and thus Φ_{n*}α_n(A × X) = µ_n(T^{−1}A) = µ_{n+1}(A).

Lemma 2.30. Assume T is bi-measurable. A NARM (α_n)_{n∈T} is invariant if and only if Φ_{n*}α_n = α_{n+1}.

Proof. For C ∈ A ⊗ B(X) we have the relation

(Φ_n^{−1}C)(ω) = {x ∈ X | (Tω, f_{n,ω}(x)) ∈ C} = f_{n,ω}^{−1}(C(Tω))

and thus

Φ_{n*}α_n(C) = ∫ (f_{n,ω})_* α_{n,ω}(C(Tω)) dµ_n(ω)   (2.10)
             = ∫ (f_{n,T^{−1}ω})_* α_{n,T^{−1}ω}(C(ω)) dµ_{n+1}(ω).   (2.11)

This shows that the decomposition of the random measure Φ_{n*}α_n w.r.t. µ_{n+1} is given by

((f_{n,T^{−1}ω})_* α_{n,T^{−1}ω})_{ω∈Ω},

and the fact that these decompositions are µ_{n+1}-a.s. unique concludes the proof.

Our aim is to prove the following generalization of Theorem 2.12.

Theorem 2.31. Assume T is bi-measurable and let (A_n)_{n∈T} be a compact, forward invariant NARS. There is at least one invariant NARM (α_n)_{n∈T} such that α_n(A_n) = 1 for all n ∈ T.

In order to prove this result, we will introduce a topology on (non-autonomous) random measures with nice convergence properties. We follow the ideas of Chapters 3, 4 and 6 in [27]. For now let (Ω, A, µ) be any probability space.

Definition 2.32. A map h : Ω × X → R is called random continuous if
• ω 7→ h(ω, x) is measurable for all x ∈ X,
• x 7→ h(ω, x) is continuous and bounded for a.e. ω ∈ Ω and
• the map ω 7→ sup_{x∈X} |h(ω, x)| is µ-integrable.

In analogy to the well-known weak topology on the set of Borel measures of some metric space, the narrow topology on the set M_µ of random probability measures over (Ω, A, µ) is defined as follows.

Definition 2.33. A sequence α^1, α^2, . . . ∈ M_µ converges in the narrow topology to some α ∈ M_µ if

∫_{Ω×X} h dα^k → ∫_{Ω×X} h dα   (2.12)

for all random continuous functions h.

It is shown in [27, Lemma 3.16] that it suffices to have the convergence (2.12) for all random continuous functions with values in [0, 1]. If A is a random compact set we write Mµ (A) for the set of all random probability measures α with α (A) = 1.

Theorem 2.34. For every random compact set A, the set Mµ (A) is sequentially compact in the narrow topology, i.e. every sequence in Mµ (A) has a narrowly convergent subsequence.

Proof. Let α^1, α^2, . . . ∈ M_µ(A). Theorems 4.3 and 4.4 in [27] show that there is a subsequence (α^{k_i})_{i∈N} and α ∈ M_µ such that α^{k_i} → α narrowly. It remains to show that α(A) = 1, but this follows from Theorem 3.17 in [27].

Proof of Theorem 2.31. The sets M_{µ_n}(A_n) for n ∈ T are sequentially compact in the narrow topology due to Theorem 2.34, and thus the set

NM := ∏_{n∈T} M_{µ_n}(A_n)

of all non-autonomous random measures supported on (A_n)_{n∈T} is sequentially compact w.r.t. the product topology. The Measurable Selection Theorem [27, Theorem 2.6] implies that for each n ∈ T there is a measurable map a_n : Ω → X such that a_n ∈ A_n a.s. Then the measure δ_{a_n} on A ⊗ B(X) with fibers (δ_{a_n})_ω = δ_{a_n(ω)} is an element of M_{µ_n}(A_n), which shows that NM is non-empty.
Let us first focus on the case T = Z. For every α_n ∈ M_{µ_n}(A_n), Φ_{n*}α_n is a random measure over (Ω, A, µ_{n+1}) and Φ_{n*}α_n(A_{n+1}) ≥ α_n(A_n) = 1, such that the operator

Φ̂ : NM → NM, (Φ̂α)_{n+1} = Φ_{n*}α_n

is well defined. It is moreover sequentially continuous due to [27, Lemma 6.7].

The operator Φ̂ was constructed such that the invariant NARM supported on (A_n)_{n∈Z} are exactly the fixed points of Φ̂, and it remains to construct such a fixed point. Starting from an arbitrary α^0 ∈ NM we let

α^k := (1/k) ∑_{j=0}^{k−1} Φ̂^j α^0 ∈ NM.

Let k_i ↗ ∞ be a subsequence such that α^{k_i} converges to some α ∈ NM. We claim that this α has the required property. To see this, first note that

Φ̂ α^{k_i} = α^{k_i} + (1/k_i) (Φ̂^{k_i} α^0 − α^0).

For a random continuous h : Ω × X → [0, 1] we have

| (1/k_i) ( ∫ h d(Φ̂^{k_i} α^0)_n − ∫ h dα^0_n ) | ≤ 2/k_i,

which yields

lim_{i→∞} ∫ h d(Φ̂ α^{k_i})_n = lim_{i→∞} ∫ h d(α^{k_i})_n = ∫ h dα_n.

Thus, (Φ̂ α^{k_i})_n → α_n in the narrow topology for all n ∈ Z. Continuity of Φ̂ finally implies that

Φ̂ α = lim_{i→∞} Φ̂ α^{k_i} = α,

and thus α is a fixed point of Φ̂. In the case where T = N_0, one can apply the same argument to the operator

Φ̂ : NM → NM, (α_n)_{n∈N_0} 7→ (α_0, Φ_{0*}α_0, Φ_{1*}α_1, . . . ).

In the following we will present a way of extending the notion of Markov dynamical systems to the non-autonomous case. We assume from now on that T is bi-measurable and T = Z. We denote as

F^+_N := σ( ϕ(k, n; T^n ω)x : k ∈ N, n ≥ N, x ∈ X ),
F^−_N := σ( ϕ(k, n − k; T^{n−k}ω)x : k ∈ N, n ≤ N, x ∈ X ),

the σ-algebras of the future after N and the past before N, respectively.

Definition 2.35.
(a) The NRDS ϕ is called a Markov NRDS if for every N ∈ Z, F^+_N and F^−_N are independent w.r.t. the measure µ.
(b) An invariant NARM (α_n)_{n∈Z} is called a non-autonomous Markov measure or Markov NARM if for all n ∈ Z and B ∈ B(X) the map ω 7→ α_{n,T^n ω}(B) is F^−_n-measurable.

For an NRDS on X ⊆ R^d, being Markov means that for all x ∈ X and n ∈ Z the stochastic process (Y_k)_{k∈Z} with

Y_k(ω) = ϕ(k, n; T^n ω)x

is a Markov chain w.r.t. the measure µ and the filtration (F^−_{n+k})_{k∈Z}. Indeed, the Markov property implies that

E[Y_{k+1} | F^−_{n+k}](ω) = E[ω′ 7→ f_{n+k,T^{n+k}ω′}(Y_k(ω′)) | F^−_{n+k}](ω)
= ∫ f_{n+k,T^k ω′}(Y_k(ω)) dµ_n(ω′)
= ∫ f_{n+k,ω′}(Y_k(ω)) dµ_{n+k}(ω′)
= E[Y_{k+1} | Y_k](ω),

and from this relation we can read off the generating Markov semigroup (M_n)_{n∈Z} as

M_n : P → P, M_n ρ = ∫ (f_{n,ω})_* ρ dµ_n(ω).

Here $\mathcal{P}$ denotes the set of all Borel probability measures on $X$. Note that this Markov process is not homogeneous, as $M_n \neq M_m$ for $n \neq m$ in general. In other words, the action of the Markov semigroup on $\mathcal{P}$ generates a cocycle

\[ M(k, n) = M_{n+k-1} \circ \cdots \circ M_n. \]
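For a finite state space this cocycle can be made concrete: each $M_n$ is a stochastic matrix acting on probability vectors, and $M(k,n)$ is the composition above. A minimal sketch, in which the matrices and their $n$-dependence are invented purely for illustration:

```python
# Non-homogeneous Markov cocycle M(k, n) = M_{n+k-1} ∘ ... ∘ M_n on a
# two-point state space. Each M_n is a row-stochastic matrix acting on
# probability (row) vectors; the matrices below are arbitrary examples.

def apply(M, rho):
    """Push a probability vector rho forward through the stochastic matrix M."""
    return [sum(rho[i] * M[i][j] for i in range(len(rho))) for j in range(len(M[0]))]

def M_n(n):
    """Time-dependent transition matrix: the kernel changes with n (invented)."""
    p = 0.5 + 0.4 / (abs(n) + 2)
    return [[p, 1 - p], [1 - p, p]]

def cocycle(k, n, rho):
    """M(k, n) rho: apply M_n first, M_{n+k-1} last, matching the display above."""
    for m in range(n, n + k):
        rho = apply(M_n(m), rho)
    return rho

rho = [1.0, 0.0]
# Cocycle property M(k + l, n) = M(l, n + k) ∘ M(k, n):
lhs = cocycle(5, 0, rho)
rhs = cocycle(3, 2, cocycle(2, 0, rho))
print(lhs, rhs)  # the two compositions agree
```

The cocycle property is exactly associativity of the matrix products, which the two computed vectors confirm.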

The following lemma shows that this cocycle M is compatible with the cocycle ϕ.

Lemma 2.36. Assume $\varphi$ is Markov, let $h\colon X \to \mathbb{R}$ be measurable and bounded, and let $\rho \in \mathcal{P}$. Then for all $n \in \mathbb{Z}$ and $k \in \mathbb{N}$,
\[ \int_X h(x) \, d[M(k,n)\rho](x) = \int_\Omega \int_X h(\varphi(k,n;\omega)x) \, d\rho(x) \, d\mu_n(\omega). \tag{2.13} \]

Proof. We show (2.13) by induction over $k$. For $k = 1$ and $h = \mathbb{1}_B$ with $B \in \mathcal{B}(X)$, (2.13) reduces to the definition of the Markov operator $M_n$ and is therefore true by default. Extending this to all bounded, measurable $h$ is a standard argument³. Now assume we have shown (2.13) for some $k$. Then
\[
\begin{aligned}
\int_X h(x) \, d[M(k+1,n)\rho](x)
&= \int_X h(x) \, d[M(1,n+k)M(k,n)\rho](x) \\
&= \int_\Omega \int_X h(f_{n+k,\omega}(x)) \, d[M(k,n)\rho](x) \, d\mu_{n+k}(\omega) & \text{(2.14a)} \\
&= \int_\Omega \int_\Omega \int_X h\bigl(f_{n+k,\,T^{n+k}\omega} \circ \varphi(k,n;T^n\omega')(x)\bigr) \, d\rho(x) \, d\mu(\omega') \, d\mu(\omega) & \text{(2.14b)} \\
&= \int_\Omega \int_X h\bigl(\varphi(k+1,n;T^n\omega)x\bigr) \, d\rho(x) \, d\mu(\omega) & \text{(2.14c)} \\
&= \int_\Omega \int_X h\bigl(\varphi(k+1,n;\omega)x\bigr) \, d\rho(x) \, d\mu_n(\omega). & \text{(2.14d)}
\end{aligned}
\]
Hereby we used

• the result for $k = 1$ in (2.14a),
• the induction assumption and the fact that $x \mapsto h(f_{n+k,\omega}(x))$ is measurable and bounded for every $\omega \in \Omega$, as well as the relation $T^l_*\mu = \mu_l$, in (2.14b),
• the Markov property and the cocycle property of $\varphi$ in (2.14c), and
• the relation $\mu_n = T^n_*\mu$ in (2.14d).

Just as in the autonomous setting, we are interested in fixed points of the Markov semigroup; but as already mentioned above, this semigroup acts non-autonomously, so that we have to work with non-autonomous measures.

Definition 2.37. A non-autonomous stationary measure (NASM) is a sequence $(\rho_n)_{n\in\mathbb{Z}}$ of Borel measures on $X$ such that $M_n\rho_n = \rho_{n+1}$.

We can now generalize Theorem 2.15 to our non-autonomous case.

Theorem 2.38. Let $\varphi$ be a Markov NRDS.
(a) If $(\alpha_n)_{n\in\mathbb{Z}}$ is a Markov NARM, then a NASM is defined via
\[ \rho_n = \int_\Omega \alpha_{n,\omega} \, d\mu_n(\omega). \tag{2.15} \]

(b) If $(\rho_n)_{n\in\mathbb{Z}}$ is a NASM, then there exists a unique Markov NARM $(\alpha_n)_{n\in\mathbb{Z}}$ such that

\[ \alpha_{n,\omega} = \operatorname*{w\text{-}lim}_{k\to\infty} \varphi(k, n-k; T^{-k}\omega)_* \rho_{n-k} \tag{2.16} \]

for every $n \in \mathbb{Z}$ and $\mu_n$-a.e. $\omega \in \Omega$. Moreover, (2.15) holds.

This in particular implies a 1:1 correspondence between NASMs and Markov NARMs. Our proof is based on the ideas from the proof of the autonomous case presented in [50, Theorem 4.2.9].

Proof. (a) Let $\rho_n$ be as defined in (2.15) and let $B \in \mathcal{B}(X)$. Then the Markov property implies that
\[
\begin{aligned}
M_n\rho_n(B) &= \int_\Omega \rho_n\bigl(f_{n,\omega}^{-1}B\bigr) \, d\mu_n(\omega) \\
&= \int_\Omega \int_\Omega (f_{n,T^n\omega})_* \alpha_{n,T^n\omega'}(B) \, d\mu(\omega) \, d\mu(\omega') \\
&= \int_\Omega (f_{n,T^n\omega})_* \alpha_{n,T^n\omega}(B) \, d\mu(\omega) \\
&= \int_\Omega \alpha_{n+1,\omega}(B) \, d\mu_{n+1}(\omega) \\
&= \rho_{n+1}(B).
\end{aligned}
\]

³ As (2.13) is linear in $h$, it holds true for all step functions $h$ of the shape $h = \sum_{i=1}^n c_i \mathbb{1}_{B_i}$ with $B_i \in \mathcal{B}(X)$ and $c_i \in \mathbb{R}$. For every measurable, bounded function $h$ there is a sequence of step functions $h_n$ such that $h_n \to h$ a.s., and thus the Dominated Convergence Theorem implies that (2.13) holds for all such $h$.

(b) Let $(\rho_n)_{n\in\mathbb{Z}}$ be a NASM and let $h\colon X \to \mathbb{R}$ be continuous and bounded. For fixed $n \in \mathbb{Z}$ we write
\[ \zeta_k(\omega) := \int_X h\bigl(\varphi(k, n-k; T^{-k}\omega)x\bigr) \, d\rho_{n-k}(x) = \int_X h \, d\varphi(k, n-k; T^{-k}\omega)_*\rho_{n-k}. \]

All $\zeta_k$ are measurable and uniformly bounded by $\sup_X |h|$. We have the relation
\[ \zeta_{k+1}(T^n\omega) = \int_X h\bigl(\varphi(k, n-k; T^{n-k}\omega) \circ f_{n-k-1,\,T^{n-k-1}\omega}(x)\bigr) \, d\rho_{n-k-1}(x). \]
Using the Markov property of $\varphi$ we see that
\[
\begin{aligned}
\mathbb{E}\bigl[\zeta_{k+1} \circ T^n \mid \mathcal{F}^+_{n-k}\bigr](\omega)
&= \int_\Omega \int_X h\bigl(\varphi(k, n-k; T^{n-k}\omega) \circ f_{n-k-1,\,T^{n-k-1}\omega'}(x)\bigr) \, d\rho_{n-k-1}(x) \, d\mu(\omega') \\
&= \int_\Omega \int_X h\bigl(\varphi(k, n-k; T^{n-k}\omega) \circ f_{n-k-1,\,\omega'}(x)\bigr) \, d\rho_{n-k-1}(x) \, d\mu_{n-k-1}(\omega') \\
&= \int_X h\bigl(\varphi(k, n-k; T^{n-k}\omega)x\bigr) \, d\rho_{n-k}(x) \\
&= \zeta_k(T^n\omega)
\end{aligned}
\]
for $\mu$-a.e. $\omega \in \Omega$. This shows that $(\zeta_k \circ T^n)_{k\in\mathbb{N}}$ is a bounded martingale with respect to the filtration $(\mathcal{F}^+_{n-k})_{k\in\mathbb{N}}$. Doob's Martingale Convergence Theorem B.12 thus implies that $\zeta_k \circ T^n$ converges $\mu$-a.s., or equivalently that $\zeta_k$ converges $\mu_n$-a.s.

According to [50, Theorem 7.5.2] there is a random measure $\alpha_n$ over $(\Omega, \mathcal{A}, \mu_n)$ such that (2.16) holds $\mu_n$-a.s. Let us now show that the NARM $(\alpha_n)_{n\in\mathbb{Z}}$ generated this way is invariant for $\varphi$. This follows from (2.16), as
\[
\begin{aligned}
(f_{n,\omega})_* \alpha_{n,\omega}
&= \operatorname*{w\text{-}lim}_{k\to\infty} (f_{n,\omega})_* \varphi(k, n-k; T^{-k}\omega)_* \rho_{n-k} \\
&= \operatorname*{w\text{-}lim}_{k\to\infty} \varphi\bigl(k+1, (n+1)-(k+1); T^{-(k+1)}T\omega\bigr)_* \rho_{(n+1)-(k+1)} \\
&= \alpha_{n+1,T\omega}
\end{aligned}
\]
for all $n \in \mathbb{Z}$ and $\mu_n$-a.e. $\omega \in \Omega$. That this NARM fulfills the Markov property is also clear from (2.16), so it remains to show that (2.15) holds. If we denote
\[ \tilde\rho_n := \int_\Omega \alpha_{n,\omega} \, d\mu_n(\omega), \]
this is equivalent to $\tilde\rho_n = \rho_n$. Let $h\colon X \to \mathbb{R}$ be bounded and continuous. Using (2.16), the Dominated Convergence Theorem and (2.13) we get
\[
\begin{aligned}
\int_X h \, d\tilde\rho_n &= \int_\Omega \int_X h(x) \, d\alpha_{n,\omega}(x) \, d\mu_n(\omega) \\
&= \lim_{k\to\infty} \int_\Omega \int_X h\bigl(\varphi(k, n-k; T^{-k}\omega)x\bigr) \, d\rho_{n-k}(x) \, d\mu_n(\omega) \\
&= \lim_{k\to\infty} \int_X h \, d\bigl[M(k, n-k)\rho_{n-k}\bigr] \\
&= \int_X h \, d\rho_n.
\end{aligned}
\]

As this is true for any bounded continuous $h$, we have indeed that $\tilde\rho_n = \rho_n$.

2.2.2 Attractors and repellers

In the case of two-sided time we can generalize the notion of pullback attraction to the non-autonomous case. Several concepts of attraction and attractors for NRDS have been suggested in the literature, see for example [30, 24, 32]. A discussion of these various approaches would be too extensive for this thesis, so we refer to the literature. We only present a basic concept of one set attracting another that generalizes Definition 2.18 and coincides, for example, with Definition 2.4 in [32]. This will be sufficient for our purposes.

Definition 2.39. Let $\mathbb{T} = \mathbb{Z}$ and $T$ be bi-measurable. We say that a compact invariant NARS $(A_n)_{n\in\mathbb{Z}}$
(a) (pullback) attracts a NARS $(U_n)_{n\in\mathbb{Z}}$ if
\[ \lim_{k\to\infty} \operatorname{dist}\bigl(\varphi(k, n-k; T^{-k}\omega)U_{n-k}(T^{-k}\omega),\, A_n(\omega)\bigr) = 0 \tag{2.17} \]

for all $n \in \mathbb{Z}$ and $\mu_n$-a.e. $\omega \in \Omega$;
(b) (pullback) attracts a random set $U$ if

\[ \lim_{k\to\infty} \operatorname{dist}\bigl(\varphi(k, n-k; T^{-k}\omega)U(T^{-k}\omega),\, A_n(\omega)\bigr) = 0 \tag{2.18} \]
for all $n \in \mathbb{Z}$ and $\mu_n$-a.e. $\omega \in \Omega$.

Of course, (b) is a special case of (a), as one can reinterpret the random set $U$ as a NARS $(U)_{n\in\mathbb{Z}}$. The following theorem shows that attractivity is already determined if the above relation holds up to a certain time $n_0$. Together with Remark 2.27 this provides a handy tool for constructing attractors, as it suffices to construct them up to a certain $n_0$.

Theorem 2.40. Assume $T$ is bi-measurable and $\mathbb{T} = \mathbb{Z}$, and let $(A_n)_{n\in\mathbb{Z}}$ be a compact invariant NARS. If $(U_n)_{n\in\mathbb{Z}}$ is a NARS such that for some $n_0 \in \mathbb{Z}$, (2.17) holds for all $n \leq n_0$ and a.e. $\omega \in \Omega$, then $(A_n)_{n\in\mathbb{Z}}$ pullback attracts $(U_n)_{n\in\mathbb{Z}}$.

Proof. Assume w.l.o.g. that $n_0 = 0$ and fix some $n > 0$. We want to show that (2.17) holds for this $n$ and a.e. $\omega \in \Omega$. Let $k > n$ and denote

\[ m := k - n \in \mathbb{N} \quad\text{and}\quad h_\omega := \varphi(n, 0; \omega). \]

Note that $h_\omega$ is independent of $k$. The cocycle property of NRDS implies that

\[ \varphi(k, n-k; T^{n-k}\omega) = h_\omega \circ \varphi(m, -m; T^{-m}\omega). \]
Moreover we have

\[ A_{n-k}(T^{n-k}\omega) = A_{-m}(T^{-m}\omega) \quad\text{and}\quad A_n(T^n\omega) = h_\omega(A_0(\omega)). \]
By assumption,

\[ \lim_{m\to\infty} \operatorname{dist}\bigl(\varphi(m, -m; T^{-m}\omega)U_{-m}(T^{-m}\omega),\, A_0(\omega)\bigr) = 0 \]
for a.e. $\omega \in \Omega$. Fix one such $\omega$ and let $\mathcal{K}$ denote the set of all compact subsets of $K := B_1(A_0(\omega))$. As $h_\omega\colon K \to X$ is uniformly continuous, the map $\mathcal{K} \ni K' \mapsto h_\omega(K')$ is continuous w.r.t. the Hausdorff metric. As moreover $\varphi(m, -m; T^{-m}\omega)U_{-m}(T^{-m}\omega) \in \mathcal{K}$ for all large enough $m$, the claim follows.

Remark 2.41. This shows that being an attractor in a non-autonomous system is much less restrictive than in the autonomous case. Consider for example an invertible system that is contracting in the following sense. Assume $X = \mathbb{R}$ and that all the $f_{n,\omega}$ are Lipschitz with a Lipschitz constant $\vartheta < 1$ that does not depend on $n$ or $\omega$. Then let $(A_n)_{n\in\mathbb{Z}}$ be any compact, invariant NARS and pick any $\eta > 0$. For every choice of $n \in \mathbb{Z}$, $k \in \mathbb{N}$ and $\omega \in \Omega$ we have, on the one hand,

\[ \varphi(k, n-k; T^{-k}\omega)\,B_\eta\bigl(A_{n-k}(T^{-k}\omega)\bigr) \supseteq A_n(\omega), \]
and on the other hand,

\[ \operatorname{diam}\Bigl(\varphi(k, n-k; T^{-k}\omega)\,B_\eta\bigl(A_{n-k}(T^{-k}\omega)\bigr)\Bigr) \leq \operatorname{diam}\bigl(A_n(\omega)\bigr) + 2\vartheta^k \eta. \]

These two relations together imply that (2.18) holds for all $n \in \mathbb{Z}$ and $\omega \in \Omega$. Thus, every compact invariant NARS attracts every bounded neighborhood of itself. To further exemplify the difference between autonomous and non-autonomous attractors, one can put an autonomous RDS into a non-autonomous framework. Let for example $\Omega = [-1, 1]$ with any measure $\mu$ preserved by some $T\colon \Omega \to \Omega$, and define
\[ f_{n,\omega}(x) := g_\omega(x) := \frac{1}{2}x + \omega. \]
Then the corresponding NRDS $\varphi$ has infinitely many compact, invariant NARS, all of which are attractors according to the above argument. But on the other hand, the autonomous RDS generated by $g$, despite describing the same dynamics as $\varphi$, has a unique compact invariant random set, which is also the only attractor.

Assume for a moment that $\varphi$ is reversible. In this setting it is possible to define the reverse-time system $\varphi^*$ via

\[ \varphi^*(k, n; \omega) := \varphi(-k, -n; \omega) = \varphi(k, -n-k; T^{-k}\omega)^{-1}. \]

This is again a NRDS, but over the base $(\Omega, \mathcal{A}, \mu, T^{-1})$. Indeed, it fulfills the cocycle property

\[ \varphi^*(k+l, n; \omega) = \varphi^*(l, n+k; T^{-k}\omega) \circ \varphi^*(k, n; \omega). \]

Also, from the definition it is clear that $\varphi^{**} = \varphi$. For any NARS $(A_n)_{n\in\mathbb{Z}}$ we define $A^*_n := A_{-n}$. From that definition we immediately get the following.

Lemma 2.42. A NARS $(A_n)_{n\in\mathbb{Z}}$ is invariant for $\varphi$ if and only if $(A^*_n)_{n\in\mathbb{Z}}$ is invariant for $\varphi^*$.

Proof. Assume $(A_n)_{n\in\mathbb{Z}}$ is $\varphi$-invariant. Then
\[
\begin{aligned}
\varphi^*\bigl(k, n; T^{-n}\omega\bigr)A^*_n(T^{-n}\omega) &= \varphi(-k, -n; T^{-n}\omega)A_{-n}(T^{-n}\omega) \\
&= A_{-n-k}(T^{-n-k}\omega) \\
&= A^*_{n+k}(T^{-n-k}\omega).
\end{aligned}
\]

The other implication follows the same way, using $\varphi = \varphi^{**}$.

The dual concept to an attractor in dynamics is a repeller, which can be defined as an attractor of the reverse-time system. Let $(R_n)_{n\in\mathbb{Z}}$ and $(U_n)_{n\in\mathbb{Z}}$ be NARS such that $(R^*_n)_{n\in\mathbb{Z}}$ pullback attracts $(U^*_n)_{n\in\mathbb{Z}}$ in $\varphi^*$. We can reverse time again and rewrite the $\varphi^*$ version of (2.17) using the forward-time cocycle $\varphi$ to obtain

\[ \lim_{k\to\infty} \operatorname{dist}\bigl(\varphi(-k, n+k; T^k\omega)U_{n+k}(T^k\omega),\, R_n(\omega)\bigr) = 0 \tag{2.19} \]
for every $n \in \mathbb{Z}$ and $\mu_n$-a.e. $\omega \in \Omega$. This relation does not require invertibility of $T$, and if $n \geq 0$ we do not need $\mathbb{T} = \mathbb{Z}$ either, so we can give a more general definition of a repeller.

Definition 2.43. Assume $\varphi$ has invertible fibers. A NARS $(R_n)_{n\in\mathbb{T}}$ is said to (pullback) repel a NARS $(U_n)_{n\in\mathbb{T}}$ if (2.19) holds for all $n \in \mathbb{T}$ and a.e. $\omega \in \Omega$. If $U \subseteq \Omega \times X$ is a random set, we say that $(R_n)_{n\in\mathbb{T}}$ pullback repels $U$ if it pullback repels the NARS $(U)_{n\in\mathbb{T}}$.

Remark 2.44. Strictly speaking, maps of the shape $\varphi(-k, n+k; \omega)$ are not well-defined in the case of a non-invertible base transformation $T$, so the notation $\varphi(-k, n+k; T^k\omega)$ in (2.19) is a somewhat sloppy way of writing $\varphi(k, n; \omega)^{-1}$, but it is coherent with the cocycle property.

Remark 2.45. With a similar argument one could also define attraction for $\mathbb{T} = -\mathbb{N}_0$, but that case only allows dynamics in the past, so we will omit it.

3 Stochastic Approximations

3.1 Setup and notation

In this chapter we will study Robbins-Monro algorithms, a recursively defined type of stochastic process that first appeared in the paper [66]. Numerous models have been studied in which a stochastic process can be brought into this particular form; we will give a few examples below. We will use the notion of conditional expectation and some standard martingale theory throughout this section. The reader who wants to refresh their knowledge is referred to Appendix B.

The RM-algorithm is a recursion rule for stochastic processes $(x_n)_{n\in\mathbb{N}_0}$. A common form (e.g. [13, 59]) of stating it is

\[ x_{n+1} = x_n + \gamma_{n+1}\bigl(f(x_n) + U_{n+1}\bigr), \tag{3.1} \]
with the following properties.
(a) $x_0 \in \mathbb{R}^d$ is the deterministic initial value.
(b) The map $f\colon \mathbb{R}^d \to \mathbb{R}^d$ is continuous. We will refer to it as the mean field of the RM-algorithm.
(c) The noise process $(U_n)_{n\in\mathbb{N}}$ is a martingale difference sequence in $\mathbb{R}^d$ on some filtered probability space $(\Omega, \mathcal{A}, \mu, (\mathcal{F}_n)_{n\in\mathbb{N}})$.
(d) $\gamma_1, \gamma_2, \ldots > 0$ are the step sizes or time steps, which are chosen such that $\gamma_n \to 0$ and $\sum_{n=1}^\infty \gamma_n = \infty$.
The RM-algorithm has the form of an Euler approximation of the ODE

\[ \dot x = f(x) \tag{3.2} \]
with varying time steps $\gamma_n$, perturbed by noise terms $U_n$. Beginning in the 1970s, a link between the asymptotic behavior of (3.1) and solutions of (3.2) has been established (cf. [56, 51]). Interpreting $\gamma_n$ as the time steps of an Euler approximation, the elapsed time after $n$ steps is
\[ \tau_0 = 0 \quad\text{and}\quad \tau_n := \sum_{k=1}^n \gamma_k \quad\text{for } n \in \mathbb{N}, \]
so taking the limit $n \to \infty$ implies $\tau_n \to \infty$, which is somewhat equivalent to taking the limit $t \to \infty$ in the solutions $x_t$ of (3.2). This is the reason for the condition $\sum_{n=1}^\infty \gamma_n = \infty$ above.
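The recursion (3.1) and the time scale $\tau_n$ are straightforward to simulate. The sketch below iterates (3.1) for the invented choices $f(x) = -x$, bounded uniform noise and $\gamma_n = 1/n$; the iterates then track the flow of $\dot x = -x$ towards its equilibrium $0$ while the elapsed time $\tau_n$ grows like the harmonic series:

```python
import random

# Robbins-Monro iteration x_{n+1} = x_n + gamma_{n+1} (f(x_n) + U_{n+1}),
# here for the invented example f(x) = -x, gamma_n = 1/n, uniform noise.
random.seed(1)

def f(x):          # mean field of the ODE  x' = -x
    return -x

x, tau = 5.0, 0.0  # deterministic initial value x_0 and elapsed time tau_0
for n in range(1, 20001):
    gamma = 1.0 / n                   # step sizes: gamma_n -> 0, sum = infinity
    U = random.uniform(-1.0, 1.0)     # bounded martingale difference noise
    x += gamma * (f(x) + U)
    tau += gamma                      # tau_n = gamma_1 + ... + gamma_n

print(x, tau)  # x ends up near the stable equilibrium 0; tau is about log(20000)
```

For this particular $f$ and $\gamma_n$ the iterate is exactly the running average of the noise, so the decay towards $0$ can also be checked by hand.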

Remark 3.1. In some works, such as [13, 15], an additional term $b_n$ appears in the RM-algorithm,
\[ x_{n+1} = x_n + \gamma_{n+1}\bigl(f(x_n) + U_{n+1} + b_{n+1}\bigr), \]
where $(b_n)_{n\in\mathbb{N}}$ is an adapted process with $b_n \to 0$ a.s. While this term is crucial for the formulation of many applications, mathematically speaking it is usually negligible due to the fact that it vanishes as $n \to \infty$. Even though many of the results presented below remain valid with this additional term, we will follow the example of [13, Remark 4.5] and usually not include it in our formulations.

From a dynamical point of view, stochastic approximations can be understood as NRDS. This, however, requires a dynamical modeling of the noise process. For any given $(U_n)_{n\in\mathbb{N}}$, where $U_n$ has values in $\mathbb{R}^d$, we can define a dynamical system $(\Omega', \mathcal{A}', \mu', T')$ with

\[ \Omega' = (\mathbb{R}^d)^{\mathbb{N}} \quad\text{and}\quad \mathcal{A}' := \mathcal{B}(\Omega'). \]

As measure $\mu'$ we choose the distribution of the $\Omega'$-valued random variable $(U_n)_{n\in\mathbb{N}}$, i.e. for any set $A \in \mathcal{A}'$ we define
\[ \mu'(A) = \mu\{(U_1, U_2, \ldots) \in A\}. \]
If
\[ \pi_n \colon \Omega' \to \mathbb{R}^d, \quad \omega \mapsto \omega_n \]
denotes the projection onto the $n$-th component, then the process $(\pi_n)_{n\in\mathbb{N}}$ has the same distribution as $(U_n)_{n\in\mathbb{N}}$, such that we can assume w.l.o.g. that $(\Omega, \mathcal{A}, \mu) = (\Omega', \mathcal{A}', \mu')$ and $U_n = \pi_n$. Then we have the relation $U_n = U_1 \circ T^{n-1}$, where

\[ T\colon \Omega \to \Omega, \quad \omega \mapsto (\omega_{n+1})_{n\in\mathbb{N}} \]
is the shift on $\Omega$. Following our notation from the previous chapter, we can now define a NRDS $\varphi$ via

\[ f_{n,\omega}(x) = x + \gamma_{n+1}\bigl(f(x) + \omega_1\bigr) = x + \gamma_{n+1}\bigl(f(x) + U_1(\omega)\bigr). \]

If $(x_n)_{n\in\mathbb{N}_0}$ is a stochastic approximation in the way we defined it above, it fulfills the relation
\[ x_{n+1}(\omega) = f_{n,T^n\omega}\bigl(x_n(\omega)\bigr). \]

3.2 Examples

In this section we present a few models that take the form (3.1). For further examples we refer to the book [51] and the survey paper [61].

3.2.1 Urn models and market competition

The basic Pólya urn was first described in [36]. It consists of an urn that is initially filled with one red and one blue ball. A ball is then drawn at random and put back into the urn, together with a new ball of the same color. This process is repeated over and over, and we want to study the long-time behavior of the proportion $x_n$ of red balls. Let $q_n$ be the number of red balls after drawing $n$ times from the urn, so $x_n = \frac{q_n}{n+2}$. We can model the drawing process using a sequence $(\xi_n)_{n\in\mathbb{N}}$ of i.i.d. random variables, uniformly distributed on $[0,1]$. Since the probability of drawing red in step $n+1$ is $x_n$, we say red is drawn if $\xi_n \leq x_n$, and blue is drawn otherwise. This means that $q_{n+1} = q_n + \mathbb{1}_{\{\xi_n \leq x_n\}}$, and thus

\[ x_{n+1} = \frac{q_n + \mathbb{1}_{\{\xi_n \leq x_n\}}}{n+3} = x_n + \frac{1}{n+3}\bigl(-x_n + \mathbb{1}_{\{\xi_n \leq x_n\}}\bigr). \tag{3.3} \]

Since $\mathbb{E}\bigl[-x_n + \mathbb{1}_{\{\xi_n \leq x_n\}} \mid \mathcal{F}_n\bigr] = 0$, this is a RM-algorithm with $\gamma_n = \frac{1}{n+2}$, $f \equiv 0$ and

\[ U_{n+1} = -x_n + \mathbb{1}_{\{\xi_n \leq x_n\}}. \]
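The urn recursion (3.3) is easy to simulate directly. In the sketch below (seed and run lengths are arbitrary), each run of the urn settles near some limiting proportion, and different runs settle near different limits, reflecting the fact that the limit of $x_n$ is itself random:

```python
import random

def polya_path(n_steps, rng):
    """Run the Polya urn recursion (3.3): draw red with probability x_n."""
    q, total = 1, 2                 # one red and one blue ball initially
    for n in range(n_steps):
        if rng.random() <= q / total:   # red drawn: add another red ball
            q += 1
        total += 1                       # one ball is added in every step
    return q / total

rng = random.Random(0)
limits = [polya_path(20000, rng) for _ in range(5)]
print(limits)  # five runs, each near its own random limit in (0, 1)
```

Since $f \equiv 0$ here, the mean field ODE gives no drift at all, and the spread of the five final proportions illustrates that.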

A Freedman urn (cf. [40]) is a modification of the above model, where the probability of drawing red in step $n+1$ is given by $p(x_n)$ for some increasing, surjective function $p\colon [0,1] \to [0,1]$. Equation (3.3) has to be modified to

\[ x_{n+1} = \frac{q_n + \mathbb{1}_{\{\xi_n \leq p(x_n)\}}}{n+3} = x_n + \frac{1}{n+3}\bigl(-x_n + \mathbb{1}_{\{\xi_n \leq p(x_n)\}}\bigr). \tag{3.4} \]
In this case

\[ \mathbb{E}\bigl[-x_n + \mathbb{1}_{\{\xi_n \leq p(x_n)\}} \mid \mathcal{F}_n\bigr] = p(x_n) - x_n =: f(x_n), \]
such that we can rewrite (3.4) as a RM-algorithm
\[ x_{n+1} = x_n + \frac{1}{n+3}\bigl(f(x_n) + U_{n+1}\bigr) \]
with $U_{n+1} = -x_n + \mathbb{1}_{\{\xi_n \leq p(x_n)\}} - f(x_n)$.

On the application side, Freedman urns can be used, for example, to model the market share between two competing products or companies if neither of them is intrinsically better than the other (e.g. VHS vs. Betamax, Apple vs. IBM or Android vs. iOS). The popular account [4] discusses such an approach. For example, one possibility proposed in this paper is to use a Freedman urn with
\[ p(x) = \frac{x^\alpha}{x^\alpha + (1-x)^\alpha} \]
with $\alpha > 0$. The rationale behind this model is that consumers tend to buy the more popular product, and the higher the current market share of a product is, the stronger the tendency towards this particular product becomes. The corresponding mean field ODE $\dot x = p(x) - x$ has two stable equilibria $x = 0$ and $x = 1$, which correspond to monopolies of either product. The results presented in the next section can be used to show that the stochastic approximation a.s. converges to one of those monopolies.
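The market-competition variant can be simulated the same way, replacing the draw probability by $p(x_n)$. In the sketch below the exponent $\alpha = 2$ and the seed are arbitrary choices; the simulated market shares then drift towards the monopolies:

```python
import random

ALPHA = 2.0  # arbitrary choice of the exponent in p

def p(x):
    """Draw probability p(x) = x^a / (x^a + (1 - x)^a)."""
    return x**ALPHA / (x**ALPHA + (1 - x)**ALPHA)

def freedman_path(n_steps, rng):
    """Freedman urn recursion (3.4): draw red with probability p(x_n)."""
    q, total = 1, 2
    for n in range(n_steps):
        if rng.random() <= p(q / total):
            q += 1
        total += 1
    return q / total

rng = random.Random(2)
shares = [freedman_path(20000, rng) for _ in range(8)]
print(shares)  # final market shares cluster near the monopolies 0 and 1
```

Compared with the Pólya urn above, the only change is the nonlinear feedback $p$, which is exactly what creates the bistable mean field $\dot x = p(x) - x$.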

3.2.2 Learning in games

Stochastic approximation algorithms appear in numerous forms in game theory. We will present the Arthur model for reinforcement learning ([6]), as well as a stochastic version of fictitious play ([41]).

A $2 \times 2$ bimatrix game consists of two players A and B competing with each other. Both players have two actions, 0 and 1, available. Players get rewarded based on the actions chosen, with the payoff depending simultaneously on the actions chosen by both players. Those payoffs are encoded in two payoff matrices
\[ A = \begin{pmatrix} a_{0,0} & a_{0,1} \\ a_{1,0} & a_{1,1} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_{0,0} & b_{0,1} \\ b_{1,0} & b_{1,1} \end{pmatrix}. \]

If A plays action $i$ and B plays action $j$, then the reward will be $a_{i,j}$ for A and $b_{j,i}$ for B. A strategy is a probability distribution between the two actions 0 and 1. We identify a number $x \in [0,1]$ with the strategy $\vec x = \binom{x}{1-x}$. If $x^A$ and $x^B$ are the strategies chosen by the players, then the expected payoffs for A and B are given by

\[ p^A(\vec x^A, \vec x^B) = (\vec x^A)^t A \vec x^B \quad\text{and}\quad p^B(\vec x^A, \vec x^B) = (\vec x^B)^t B \vec x^A. \]

The main goal of learning theory is for both players to find a strategy that maximizes their expected payoff. If all players had full information about the game, this would be a simple optimization task. But if one assumes that parts of the information are hidden from one or both players, different approaches can be used to learn the missing information from repeated play. We briefly discuss two common approaches.

Stochastic fictitious play. Fictitious play, as introduced by Brown [20], assumes that each player has knowledge of their own payoff matrix and the full history of actions chosen by the opposing player. Let $a^i_n \in \{0,1\}$ be the action chosen by player $i = A, B$ in step $n$ and
\[ x^i_n := \frac{1}{n} \sum_{k=1}^n a^i_k \]
be the average over the first $n$ actions. The idea of fictitious play is for each player at each step to pick the best response to the opponent's previous play, i.e. the action that has the highest expected payoff based on the action average of their opponent. Formally,

\[
\begin{cases}
a^A_{n+1} = \mathrm{BR}^A(x^B_n) := \arg\max_{a\in\{0,1\}} p^A(a, x^B_n), \\
a^B_{n+1} = \mathrm{BR}^B(x^A_n) := \arg\max_{a\in\{0,1\}} p^B(x^A_n, a).
\end{cases}
\]
This leads to the recursive relations
\[ x^i_{n+1} = x^i_n + \frac{1}{n+1}\bigl(\mathrm{BR}^i(x^{1-i}_n) - x^i_n\bigr). \tag{3.5} \]
This purely deterministic equation is a sort of degenerate version of the Robbins-Monro algorithm with $U_n \equiv 0$.

A randomized version of this model was proposed by Fudenberg and Kreps in [41]. Instead of fixing the payoff matrices $A$ and $B$ we allow them to depend on some parameter $\zeta$, modeling some external influence. Thus the payoff functions and best response functions depend on this parameter as well. We assume further that $\zeta$ is chosen randomly in each step, so let $(\zeta_n)_{n\in\mathbb{N}}$ be an i.i.d. sequence of parameters. We thus obtain a stochastic version of (3.5),
\[ x^i_{n+1} = x^i_n + \frac{1}{n+1}\bigl(\mathrm{BR}^i(x^{1-i}_n, \zeta_n) - x^i_n\bigr) = x^i_n + \frac{1}{n+1}\bigl(f^i(x_n) + U^i_{n+1}\bigr), \tag{3.6} \]
where
\[ f^i(x) = \mathbb{E}\bigl[\mathrm{BR}^i(x^{1-i}, \zeta_n)\bigr] - x^i = \mu\bigl\{\mathrm{BR}^i(x^{1-i}, \zeta_n) = 1\bigr\} - x^i \]
and
\[ U^i_{n+1} = \mathrm{BR}^i(x^{1-i}_n, \zeta_n) - x^i_n - f^i(x_n). \]
That $(U_n)_{n\in\mathbb{N}}$ is a martingale difference sequence follows from the fact that
\[ \mathbb{E}\bigl[\mathrm{BR}^i(x^{1-i}_n, \zeta_n) \,\big|\, \mathcal{F}_n\bigr] - x^i_n = f^i(x_n), \]
and thus (3.6) is a Robbins-Monro algorithm. The map $f\colon [0,1]^2 \to \mathbb{R}^2$ is called the game vector field. A Nash distribution equilibrium is a point $x^*$ such that $f(x^*) = 0$, or in other words, an equilibrium point of the mean field ODE $\dot x = f(x)$. Benaïm and Hirsch proved in [16, Theorem 2.2] that under some mild assumptions, $x_n$ converges a.s. to a Nash distribution equilibrium.
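The recursion (3.6) can be simulated for a concrete game. The sketch below uses a coordination game (payoff 1 when actions match, 0 otherwise) and models the external parameter $\zeta$ as a small additive payoff shock; both the game and the shock distribution are our invented choices for illustration:

```python
import random

# Stochastic fictitious play (3.6) for a 2x2 coordination game: both players
# receive payoff 1 when their actions match and 0 otherwise. The external
# parameter zeta is modelled (invented choice) as an additive shock on each
# action's expected payoff.
PAYOFF = [[1.0, 0.0], [0.0, 1.0]]    # same payoff matrix for both players

def best_response(opp_freq, rng):
    """Best response to the opponent's empirical frequency of action 1,
    with each action's expected payoff perturbed by an i.i.d. shock."""
    expected = [
        (1 - opp_freq) * PAYOFF[a][0] + opp_freq * PAYOFF[a][1] + rng.uniform(0.0, 0.1)
        for a in (0, 1)
    ]
    return 0 if expected[0] >= expected[1] else 1

rng = random.Random(3)
xA = xB = 0.5                         # initial empirical frequencies of action 1
for n in range(1, 20001):
    aA = best_response(xB, rng)       # players best-respond to each other's averages
    aB = best_response(xA, rng)
    xA += (aA - xA) / (n + 1)         # the recursion (3.5)/(3.6)
    xB += (aB - xB) / (n + 1)

print(xA, xB)  # the two frequencies settle together near a coordination equilibrium
```

The mixed point $x = \frac12$ is an unstable equilibrium of the game vector field here, so simulated paths are pushed towards one of the two near-pure equilibria.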

Reinforcement learning. This model assumes that players can only observe their own actions and the resulting payoffs; the opponent's actions as well as the payoff matrices are unknown to them. All payoffs $a_{i,j}, b_{i,j}$ are assumed to be strictly positive. The idea of reinforcement learning is that an action should be played more often in the future if it has resulted in a higher payoff in the past.

Each player starts with an initial propensity $q^i_0$, $i = A, B$, for strategy 1. Then, every time they choose to play strategy 1, the observed payoff is added to that propensity. This leads to two sequences $q^i_0, q^i_1, \ldots$ of propensities. Let $\pi^i_n$ denote the payoff of player $i$ in round $n$. In this setup, the total payoff for $i$ after $n$ steps is the random number $\pi^i_1 + \cdots + \pi^i_n$. In the reinforcement learning model proposed by Arthur in [6], this total payoff is renormalized to be $Cn^\nu$ with some $C > 0$ and $\nu \in \left(\frac12, 1\right]$. If $\delta^i_n$ denotes the indicator of $i$ playing action 1 in step $n$, the (renormalized) propensities follow the recursion

\[ q^i_{n+1} = \bigl(q^i_n + \delta^i_n \pi^i_{n+1}\bigr)\, \frac{C(n+1)^\nu}{Cn^\nu + \pi^i_{n+1}}. \]
The strategy $i$ uses to determine her action in step $n+1$ is given by

\[ x^i_n = \frac{q^i_n}{Cn^\nu}. \]
In [62] it is shown that the $x_n$ follow a recursion of the form
\[ x^i_{n+1} = x^i_n + \frac{1}{Cn^\nu}\bigl(f^i(x^A_n, x^B_n) + U^i_{n+1} + b^i_{n+1}\bigr), \tag{3.7} \]
where $(U^i_n)_{n\in\mathbb{N}}$ is a martingale difference sequence and $b^i_n$ is a random variable of order $O(n^{-\nu})$, hence $b^i_n \to 0$ a.s. Note that the calculations presented in [62] follow an idea similar to the approach we presented above for the Freedman urn.

The map $f := (f^A, f^B)\colon [0,1]^2 \to \mathbb{R}^2$ is the so-called replicator map of the game. It contains information about the dynamic behavior of the game. Thus, the Arthur model of reinforcement learning is a stochastic approximation (in the sense of Remark 3.1) of the replicator dynamics of the game. For further information on the latter we refer the reader to [44, Chapter 7].

Remark 3.2. An alternative approach is the model by Erev and Roth ([67, 38]), which is identical to the Arthur model, just without the renormalization of the total payoff. If one tries to obtain a recursion in the style of (3.7) for this case, one gets random time steps $\gamma_n$, namely the inverse of the total payoff $\Pi^i_n = \pi^i_1 + \cdots + \pi^i_n$ after step $n$. However, as shown in [46], one can still obtain a Robbins-Monro algorithm by introducing the additional variables $\mu^i_n = \frac{n}{\Pi^i_n}$.

A Nash equilibrium of the game is a pair of strategies $(x^A, x^B) \in [0,1]^2$ such that no player can unilaterally increase their own payoff by deviating from their strategy. A Nash equilibrium is strict if unilaterally deviating from it strictly decreases the average payoff of the deviating player. Each $2 \times 2$ bimatrix game has at least one Nash equilibrium ([11, Theorem 3.3.2]), but not necessarily a strict one, and all Nash equilibria are equilibria of the replicator equation $\dot x = f(x)$ ([44, Theorem 7.2.1]). Hopkins showed in [62, Theorem 1] that for games with at least one strict Nash equilibrium, the process $(x_n)_{n\in\mathbb{N}_0}$ defined in (3.7) a.s. converges to a strict Nash equilibrium, and that every strict Nash equilibrium is the limit of $(x_n)_{n\in\mathbb{N}_0}$ with positive probability. Otherwise, the game has a Nash equilibrium in the interior $(0,1)^2$ of the phase space, and the replicator ODE has periodic solutions cycling around this Nash equilibrium. In this case $(x_n)_{n\in\mathbb{N}_0}$ converges either to this equilibrium, to one of the periodic orbits, or to the boundary $\partial[0,1]^2$ of the phase space, and each of the three options occurs with positive probability.

Remark 3.3. This setup can be extended to games with more than two players and strategies while retaining the form of a Robbins-Monro algorithm, see e.g. [46]. But replicator dynamics in those cases can get very complicated, so no result in the generality of the above statements is known.

3.2.3 Stochastic gradient descent

The following example is discussed by Bottou in [17], a paper which remarkably has the expression "Stochastic Approximation" in its title. A classical problem in supervised machine learning is the following. Given a certain number of samples $(x_i, y_i)_{i=1,\ldots,N}$ (the training set), one wants to establish a correlation between the $x_i$ and the $y_i$ using a parameterized model. If we assume that $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$, such a model is a family of maps $F_\lambda\colon \mathbb{R}^d \to \mathbb{R}$ with parameters $\lambda \in \Lambda \subseteq \mathbb{R}^m$. For example, $\lambda$ could be the weights of a neural network and $F_\lambda$ describes the network structure.

The goal is to optimize the parameter $\lambda$ in such a way that, given any input $x_i$, the model predicts the value of $y_i$ fairly well. To measure the quality of such a prediction we use an error function $Q\colon \mathbb{R}^2 \to [0,\infty)$. The prediction error for a given data tuple $(x_i, y_i)$ under a given parameter $\lambda$ is then $Q(F_\lambda(x_i), y_i)$, and we want to minimize the average error over the training set. Precisely, we want to find a parameter $\lambda^*$ that minimizes the risk function

\[ J\colon \Lambda \to [0,\infty), \quad J(\lambda) = \frac{1}{N} \sum_{i=1}^N Q\bigl(F_\lambda(x_i), y_i\bigr). \]
Given enough regularity of $J$, a traditional computational approach to this optimization problem would be to perform a batch gradient descent, i.e. for some initial guess $\lambda_0$ we define a sequence $\lambda_1, \lambda_2, \ldots$ via
\[ \lambda_{n+1} = \lambda_n - \gamma \operatorname{grad} J(\lambda_n) \]
with some step size $\gamma > 0$. Under convexity and other mild conditions, $\lambda_n \to \lambda^*$. But we need to calculate $\operatorname{grad}_\lambda Q(F_\lambda(x_i), y_i)$ for every $i \in \{1,\ldots,N\}$ in every single step. This can be computationally very heavy, especially if the gradient has to be calculated numerically as well.

A more efficient approach can be a stochastic gradient descent, where at each step we pick exactly one $i$ at random, and instead of taking the average over all training data we only calculate the gradient at this datum. Precisely, let $i_1, i_2, \ldots$ be a sequence of i.i.d. random variables, uniformly distributed in $\{1,\ldots,N\}$. Then for some initial value $\lambda_0 \in \Lambda$ we let

\[ \lambda_{n+1} = \lambda_n - \gamma_{n+1} \operatorname{grad}_\lambda Q\bigl(F_{\lambda_n}(x_{i_{n+1}}), y_{i_{n+1}}\bigr) \tag{3.8} \]
with training rates $\gamma_1, \gamma_2, \ldots > 0$. To ensure convergence one usually has to choose $\gamma_n$ such that $\gamma_n \to 0$ and $\sum_{n=1}^\infty \gamma_n = \infty$. Note that
\[ \mathbb{E}\bigl[\operatorname{grad}_\lambda Q\bigl(F_{\lambda_n}(x_{i_{n+1}}), y_{i_{n+1}}\bigr) \,\big|\, i_1, \ldots, i_n\bigr] = \operatorname{grad} J(\lambda_n), \]

such that we obtain a martingale difference sequence

\[ U_{n+1} := \operatorname{grad} J(\lambda_n) - \operatorname{grad}_\lambda Q\bigl(F_{\lambda_n}(x_{i_{n+1}}), y_{i_{n+1}}\bigr) \]
and we can rewrite (3.8) as the Robbins-Monro algorithm

\[ \lambda_{n+1} = \lambda_n + \gamma_{n+1}\bigl(-\operatorname{grad} J(\lambda_n) + U_{n+1}\bigr). \]

In Chapter 5 of [17] it is shown that under certain regularity conditions $\lambda_n \to \lambda^*$ a.s.
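The recursion (3.8) can be checked on a toy problem where $\lambda^*$ is known in closed form. The sketch below (data and training rates are invented) fits a scalar linear model $F_\lambda(x) = \lambda x$ with quadratic error $Q(y', y) = (y' - y)^2$, for which $\operatorname{grad}_\lambda Q = 2x(\lambda x - y)$ and the minimizer of $J$ is $\lambda^* = \sum_i x_i y_i / \sum_i x_i^2$:

```python
import random

rng = random.Random(4)

# Invented training set: y_i is roughly 3 * x_i.
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [3.0 * x + 0.1 * rng.uniform(-1.0, 1.0) for x in xs]

# Closed-form minimizer of the risk J for the model F_lambda(x) = lambda * x
# with quadratic error Q(y', y) = (y' - y)^2.
lam_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Stochastic gradient descent (3.8): one uniformly chosen training sample per
# step, training rates gamma_n = 1/n.
lam = 0.0
for n in range(1, 50001):
    i = rng.randrange(len(xs))
    grad = 2.0 * xs[i] * (lam * xs[i] - ys[i])  # grad_lambda Q(F_lambda(x_i), y_i)
    lam -= grad / n

print(lam, lam_star)  # the SGD iterate ends up close to the exact minimizer
```

The comparison with the batch solution makes the stochastic approximation structure visible: the sampled gradient equals $\operatorname{grad} J(\lambda_n)$ plus a martingale difference term, exactly as in the derivation above.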

3.3 The Limit Set Theorem

This section is a summary of results by Benaïm and coworkers from the 1990s (see e.g. [12], [13] or [15]). Their approach to stochastic approximations builds on work done by Ljung [56] and Kushner and Clark [51]. The main idea is that the Robbins-Monro algorithm (3.1) approximates the mean field ODE (3.2) over finite intervals of time with arbitrary precision, provided the cumulative influence of the noise over the same time interval is small.

In order to compare the behavior of the stochastic approximation to that of the mean field ODE, Benaïm and Hirsch [15] introduced the notion of asymptotic pseudotrajectories. They can be defined with respect to any semi-flow $\Phi$ on an arbitrary metric space $(M, d)$. In our case this will be the (semi-)flow generated by the mean field ODE (3.2), i.e. if $t \mapsto x(t)$ is the solution of (3.2) with initial condition $x(0) = x_0$, then $\Phi_t(x_0) = x(t)$.

Definition 3.4. An asymptotic pseudotrajectory w.r.t. a semi-flow $\Phi$ is a continuous map $X\colon [0,\infty) \to M$ such that

\[ \lim_{t\to\infty} \sup_{h\in[0,T]} d\bigl(\Phi_h(X(t)), X(t+h)\bigr) = 0 \]
for all $T > 0$.

This means that, given that $t$ is large enough, one can approximate $X$ on the interval $[t, t+T]$ by evolving the point $X(t)$ according to $\Phi$, within any given error bound. However, this does not mean that $X$ describes the long-term behavior of a single trajectory.

Example 3.5. Let $M = \mathbb{R}$ with the usual metric and take the constant flow $\Phi_t(x) = x$. Then $X(t) = \log(t+1)$ defines an asymptotic pseudotrajectory. Indeed, for any given $T > 0$ we have the estimate
\[ \sup_{h\in[0,T]} \bigl|\Phi_h(X(t)) - X(t+h)\bigr| \leq \bigl|\log(t+1) - \log(t+T+1)\bigr| \leq \frac{T}{t+1} \xrightarrow[t\to\infty]{} 0, \]
but for any given $t > 0$ and $x \in M$ we have
\[ \lim_{T\to\infty} \bigl|\Phi_T(x) - X(t+T)\bigr| = \infty, \]
so $X$ never stays close to any given trajectory of $\Phi$ for an arbitrarily long amount of time. In particular, $X$ is not a trajectory of the flow itself.
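Example 3.5 is easy to check numerically: with the constant flow, the pseudotrajectory error over $[t, t+T]$ is $\log(t+T+1) - \log(t+1)$, which decays like $T/t$, while the distance to any fixed trajectory grows without bound. A small sketch, with arbitrary sample points:

```python
import math

def X(t):
    """The asymptotic pseudotrajectory of Example 3.5."""
    return math.log(t + 1.0)

T = 10.0

def sup_error(t):
    """sup over h in [0, T] of |Phi_h(X(t)) - X(t + h)| for the constant
    flow Phi_h(x) = x; the supremum is attained at h = T."""
    return X(t + T) - X(t)

errors = [sup_error(t) for t in (10.0, 100.0, 1000.0, 10000.0)]
print(errors)       # decreasing towards 0, as Definition 3.4 requires

# ...but X(t) leaves every fixed trajectory Phi_t(x) = x behind:
print(X(1e6) - 0.0)  # distance from the trajectory through x = 0 keeps growing
```

The two printed quantities separate exactly the two properties discussed in the example: local shadowing of the flow versus divergence from every single orbit.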

41 The limit set L (X) of a map X : [0, ∞) → M is the set of all of its accumulation points,

\[ L(X) = \bigl\{ y \in M \,\big|\, \exists\, t_1, t_2, \ldots > 0 \colon t_n \nearrow \infty \text{ and } X(t_n) \to y \bigr\}. \]
The simple example above shows that we cannot expect limit sets of asymptotic pseudotrajectories to be limit sets of actual orbits of the semi-flow $\Phi$, but Benaïm and Hirsch found the following characterization.

Theorem 3.6 (Limit Set Theorem [15, Theorem 8.2]).
(a) Let $X$ be an asymptotic pseudotrajectory with compact image. Then the limit set $L(X)$ is internally chain transitive.
(b) Let $A \subseteq M$ be internally chain transitive. Then there exists an asymptotic pseudotrajectory $X$ such that $A = L(X)$.

For the reader unfamiliar with the notion of internal chain transitivity we refer to Appendix C for a short overview and to [25] for a more detailed account.

Asymptotic pseudotrajectories are defined as continuous maps from $[0,\infty)$ to $M$, but solutions of (3.1) are in discrete time. We can make them continuous by interpolation, and the following way seems the most natural one, given that we interpret the $\gamma_n$ as time steps (cf. [13]). Given an $\mathbb{R}^d$-valued stochastic process $(x_n)_{n\in\mathbb{N}_0}$ we define a piecewise affine, continuous stochastic process $(X_t)_{t\in[0,\infty)}$ via

\[ X_{\tau_n + s} = x_n + s\, \frac{x_{n+1} - x_n}{\gamma_{n+1}} \quad\text{for } s \in [0, \gamma_{n+1}]. \tag{3.9} \]

In particular we have $X_{\tau_n} = x_n$. For $T > 0$ and $n \in \mathbb{N}$ define the random variables
\[ \Delta_n(T) := \sup\left\{ \left| \sum_{l=n+1}^{n+k} \gamma_l U_l \right| \,:\, k \in \mathbb{N} \text{ such that } \tau_{n+k} \leq \tau_n + T \right\}, \tag{3.10} \]
i.e. $\Delta_n(T)$ describes the maximal cumulative noise over a finite time interval of length $T$, starting at time $\tau_n$. The key result linking the stochastic approximation to the mean field ODE is the following, due to Benaïm.

Theorem 3.7 ([13, Theorem 4.1]). Let $(X_t)_{t\in[0,\infty)}$ be the interpolation (3.9) of a solution of (3.1) and $\Phi$ the semi-flow induced by (3.2). Then $t \mapsto X_t(\omega)$ is an asymptotic pseudotrajectory with respect to $\Phi$ for every $\omega \in \Omega$ such that
(a) $\Delta_n(T)(\omega) \to 0$ as $n \to \infty$ for all $T > 0$, and
(b) $(X_t(\omega))_{t\in[0,\infty)}$ is bounded.

Verifying condition (a) can often be done using martingale techniques. The following situation will be the most relevant for this thesis.

Proposition 3.8 ([13, Proposition 4.2]). Assume there exists $q \geq 2$ such that

\[ \sup_{n\in\mathbb{N}} \mathbb{E}\bigl[|U_n|^q\bigr] < \infty \quad\text{and}\quad \sum_{n=1}^\infty \gamma_n^{1+\frac{q}{2}} < \infty. \]
Then condition (a) of Theorem 3.7 holds for a.e. $\omega \in \Omega$.

For $q = 2$ this follows from the fact that the sum

\[ \sum_{n=1}^\infty \gamma_n U_n \]
converges a.s. due to Doob's Martingale Convergence Theorem B.12. The more sophisticated argument for the case $q > 2$ can be found in [13].

From our NRDS viewpoint, solutions of (3.1) are special cases of NARF, namely those with a deterministic value at $n = 0$. But we can easily extend Benaïm's result to any NARF in the following way.

Corollary 3.9. Let $(q_n)_{n\in\mathbb{N}}$ be a NARF for the NRDS generated by (3.1), and let $(X_t)_{t\in[0,\infty)}$ be the interpolation (3.9) of the stochastic process $(\xi_n)_{n\in\mathbb{N}_0}$ defined as $\xi_n(\omega) = q_n(T^n\omega)$. Then the assertion of Theorem 3.7 remains true.

Proof. Fix $\omega \in \Omega$ such that assumptions (a) and (b) of Theorem 3.7 are fulfilled. Let $x_0 := \xi_0(\omega)$ and let $(x_n)_{n\in\mathbb{N}_0}$ be the stochastic approximation starting at $x_0$, with $(X'_t)_{t\geq 0}$ its interpolation (3.9). Then Theorem 3.7 applies to $X'_t$, but $X'_t(\omega) = X_t(\omega)$, so $t \mapsto X_t(\omega)$ is an asymptotic pseudotrajectory.
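Condition (a) of Theorem 3.7 can be probed numerically. The sketch below estimates $\Delta_n(T)$ of (3.10) along a single sample path of bounded i.i.d. noise with $\gamma_l = 1/l$, a combination satisfying Proposition 3.8 with $q = 2$; the seed and window length $T = 1$ are arbitrary choices:

```python
import random

rng = random.Random(5)
N = 200000
gammas = [1.0 / l for l in range(1, N + 1)]          # gamma_l for l = 1..N
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]   # bounded i.i.d. noise, mean 0

def delta(n, T):
    """Delta_n(T): largest |sum_{l=n+1}^{n+k} gamma_l U_l| over windows with
    tau_{n+k} - tau_n <= T (0-based indexing into the arrays above)."""
    best, acc, elapsed, l = 0.0, 0.0, 0.0, n
    while l < N and elapsed + gammas[l] <= T:
        acc += gammas[l] * noise[l]
        elapsed += gammas[l]
        best = max(best, abs(acc))
        l += 1
    return best

vals = [delta(n, 1.0) for n in (100, 1000, 10000, 100000)]
print(vals)  # the maximal cumulative noise over unit time windows shrinks with n
```

The decay of these estimates is exactly what Proposition 3.8 guarantees here: $\sup_n \mathbb{E}[|U_n|^2]$ is finite for bounded noise and $\sum_n \gamma_n^2 = \sum_n 1/n^2 < \infty$.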

3.4 Noise-induced tipping in one dimension

3.4.1 Preliminaries

The aim of this section is to describe a phenomenon known as noise-induced tipping. From the Limit Set Theorem 3.6 we know that all bounded paths of a stochastic approximation converge to an internally chain transitive set. The question we investigate here is whether a trajectory can be trapped near such a chain transitive set. To simplify things we restrict ourselves to a situation where all trajectories converge to a stable fixed point of the mean field ODE. In the following we state and discuss three assumptions we will be using throughout this Section 3.4.

Assumption (A). The map $f : \mathbb{R} \to \mathbb{R}$ is $C^1$, hyperbolic and has $\infty$ as a source, i.e. $f \in C^1$, the set $\{f = 0\} = \{s^0 < z^1 < s^1 < \cdots < z^N < s^N\}$ is finite, and $f'(s^i) < 0$ for $i = 0, \ldots, N$ and $f'(z^i) > 0$ for $i = 1, \ldots, N$.

Assumption (B). Denote by $\kappa_{n,k}$ a regular conditional distribution of $(U_{n+1}, \ldots, U_{n+k})$ given

$$\mathcal{G}_{n+k} := \sigma(U_l : l \geq n+k+1).$$

There is $R > 0$ such that $\mu\{|U_n| \leq R\} = 1$ for all $n \in \mathbb{N}$ and

$$(\forall n, k \in \mathbb{N})\ (\mu\text{-a.e. } \omega \in \Omega):\quad \pm(R, \ldots, R) \in \operatorname{supp} \kappa_{n,k}(\omega, \cdot).$$

Assumption (C). For all $T > 0$ we have $\Delta_n(T) \to 0$ a.s., there is $\sigma_0 > 0$ such that for all $n \in \mathbb{N}$ a.s.

$$\mathbb{E}\big[|U_{n+1}| \mid \mathcal{F}_n\big] > \sigma_0, \tag{3.11}$$

and $\mu\{x_n \to z^i\} = 0$ for all $i = 1, \ldots, N$ for each fixed initial value $x_0 \in \mathbb{R}$.

Assumption (A) means that the mean field ODE $\dot{x} = f(x)$ has an odd number of fixed points which are in the order stable–unstable–…–stable. This implies that the internally chain transitive sets are exactly the singletons $\{s^i\}$ and $\{z^i\}$. That those sets are internally chain transitive is obvious. On the other hand, assume $A$ is internally chain transitive; then $A$ is compact, connected and invariant by Theorem C.6, hence $A = [x, y]$ with equilibrium points $x \leq y \in \{f = 0\}$. If $x < y$ then $A$ must contain one of the points $s^i$. But in this case $s^i$ is a proper attractor for the flow restricted to $A$, and thus $A$ is not internally chain transitive according to Theorem C.6.

The number $R$ from Assumption (B) is an optimal bound for the noise variables $U_n$, and it remains optimal even when conditioned on the future. This means that given any finite range $n+1, \ldots, n+k$ there is a positive probability that all noise variables $U_{n+1}, \ldots, U_{n+k}$ assume a value arbitrarily close to $R$, even when information on the noise $U_l$ for $l > n+k$ is taken into account. Similarly, $-R$ is an optimal lower bound for the noise.

Example 3.10. Let $(V_n)_{n\in\mathbb{N}}$ be an i.i.d. sequence of random variables with $\mu\{V_n = 1\} = \frac{1}{2} = \mu\{V_n = -1\}$ and let $W$ be independent of the $V_n$ with $\mu\{W = 1\} = \frac{1}{2} = \mu\{W = 2\}$. Then the sequence $U_n := W V_n$ violates (B). Even though $R = 2$ is an optimal bound for the $U_n$, as $\mu\{U_n = 2\} = \mu\{U_n = -2\} = \frac{1}{4}$ for all $n$, this is no longer true when conditioned on the future. Indeed, for any given $n \in \mathbb{N}$ the event

$$A = \{|U_l| \leq 1 \text{ for all } l \geq n+1\} = \{W = 1\}$$

is an element of $\mathcal{G}_n$ and has probability $\mu(A) = \frac{1}{2}$, but

$$\mu(|U_n| > 1 \mid A) = 0.$$
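Example 3.10 can be illustrated with a quick Monte Carlo experiment (our own sketch; the distribution parameters are those of the example, the sample sizes are arbitrary):

```python
import random

random.seed(0)
N_PATHS, HORIZON = 100_000, 20  # number of samples and observed future steps

hits_2 = 0          # paths with |U_1| = 2: checks that the bound R = 2 is attained
future_small = 0    # paths on the event A = {|U_l| <= 1 for all l >= 2}
hits_2_given_A = 0  # |U_1| = 2 among those; never happens, since A = {W = 1}

for _ in range(N_PATHS):
    w = random.choice([1, 2])                      # W, independent of the V_n
    v = [random.choice([-1, 1]) for _ in range(HORIZON)]
    u = [w * vn for vn in v]                       # U_n = W * V_n
    if abs(u[0]) == 2:
        hits_2 += 1
    if all(abs(un) <= 1 for un in u[1:]):          # the conditioning event A
        future_small += 1
        if abs(u[0]) == 2:
            hits_2_given_A += 1

print(hits_2 / N_PATHS)        # approx. 1/2 = mu{|U_1| = 2}
print(future_small / N_PATHS)  # approx. 1/2 = mu(A)
print(hits_2_given_A)          # exactly 0: given A, the bound 2 is never attained
```

The last counter being exactly zero (not merely small) is the point of the example: conditioning on the future collapses the support of $U_n$ from $\{\pm 1, \pm 2\}$ to $\{\pm 1\}$.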

If we denote by $\kappa_n$ a regular conditional distribution of $U_n$ given $\mathcal{G}_n$, we immediately get that (B) implies the seemingly weaker condition

$$(\forall n \in \mathbb{N})\ (\mu\text{-a.e. } \omega \in \Omega):\quad \pm R \in \operatorname{supp} \kappa_n(\omega, \cdot), \tag{3.12}$$

but the following lemma can be used to show that they are in fact equivalent.

Lemma 3.11. Let $X : \Omega \to \mathbb{R}^d$ and $Y : \Omega \to \mathbb{R}^e$ be integrable and $\mathcal{G} \subseteq \mathcal{A}$ a $\sigma$-algebra. Assume for $A \in \mathcal{B}(\mathbb{R}^d)$ and $B \in \mathcal{B}(\mathbb{R}^e)$ that

$$\mu(X \in A \mid Y, \mathcal{G}) > 0 \quad\text{and}\quad \mu(Y \in B \mid \mathcal{G}) > 0 \quad (\mu\text{-a.s.}).$$

Then also $\mu(X \in A, Y \in B \mid \mathcal{G}) > 0$ ($\mu$-a.s.).

Proof. We have

$$\mu(X \in A, Y \in B \mid \mathcal{G}) = \mathbb{E}\big[\mathbb{E}[\mathbf{1}_{\{X\in A\}}\mathbf{1}_{\{Y\in B\}} \mid Y, \mathcal{G}] \mid \mathcal{G}\big] = \mathbb{E}\big[\mathbf{1}_{\{Y\in B\}}\,\mu(X \in A \mid Y, \mathcal{G}) \mid \mathcal{G}\big]$$

by Theorem B.2. Let $\mathcal{G}^+ := \{G \in \mathcal{G} \mid \mu(G) > 0\}$. For every $G \in \mathcal{G}^+$ we have

$$\mu(\{Y \in B\} \cap G) = \int_G \mu(Y \in B \mid \mathcal{G})\, d\mu > 0$$

by the assumption $\mu(Y \in B \mid \mathcal{G}) > 0$ a.s. This implies that for every $G \in \mathcal{G}^+$ we have

$$\int_G \mathbf{1}_{\{Y\in B\}}\,\mu(X \in A \mid Y, \mathcal{G})\, d\mu > 0,$$

as $\mu(X \in A \mid Y, \mathcal{G}) > 0$ a.s., and hence it follows that $\mu(X \in A, Y \in B \mid \mathcal{G}) > 0$ ($\mu$-a.s.).

Proposition 3.12. Condition (B) and (3.12) are equivalent.

Proof. That (3.12) follows from (B) is clear, so assume (3.12) holds. We will show that (B) holds by induction over $k$. As $\kappa_{n,1} = \kappa_{n+1}$, we immediately get that the inclusion in (B) holds for all $n \in \mathbb{N}$ and $k = 1$.

Assume now that (B) is proven for all $n$ and some $k$. That $\operatorname{supp} \kappa_{n,k+1}(\omega, \cdot) \subseteq [-R, R]^{k+1}$ for a.e. $\omega \in \Omega$ is clear. Let $\mathcal{V}$ be a countable neighborhood basis of $R_{k+1} := (R, \ldots, R) \in \mathbb{R}^{k+1}$, i.e. $\mathcal{V}$ is a countable collection of open sets containing $R_{k+1}$ such that for every neighborhood $U$ of $R_{k+1}$ there is $V \in \mathcal{V}$ with $V \subseteq U$. For every $V \in \mathcal{V}$ we can find open neighborhoods $A$ of $R_k$ and $B$ of $R$ with $A \times B \subseteq V$. Let $X = (U_{n+1}, \ldots, U_{n+k})$, $Y = U_{n+k+1}$ and $\mathcal{G} = \mathcal{G}_{n+k+1}$. We have $\mu(X \in A \mid Y, \mathcal{G}) = \kappa_{n,k}(\cdot)(A) > 0$ (a.s.) because of the induction assumption, and

$$\mu(Y \in B \mid \mathcal{G}) = \kappa_{n+k+1}(\cdot)(B) > 0 \quad (\text{a.s.})$$

by (3.12). Thus, Lemma 3.11 implies that

$$\kappa_{n,k+1}(\cdot)(V) \geq \kappa_{n,k+1}(\cdot)(A \times B) = \mu(X \in A, Y \in B \mid \mathcal{G}) > 0 \quad (\text{a.s.}).$$

As $\mathcal{V}$ is countable we can find a full measure set $\Omega_0 \in \mathcal{A}$ such that

$$\kappa_{n,k+1}(\omega)(V) > 0 \quad\text{for all } V \in \mathcal{V} \text{ and } \omega \in \Omega_0,$$

and as $\mathcal{V}$ is a neighborhood basis, this extends to all neighborhoods $V$ of $R_{k+1}$. Thus $R_{k+1} \in \operatorname{supp} \kappa_{n,k+1}(\omega, \cdot)$ for all $\omega \in \Omega_0$ and $n \in \mathbb{N}_0$. The same argument applies to the point $-R_{k+1}$.

Assumption (C) finally guarantees a certain richness of the noise. The condition $\Delta_n(T) \to 0$ for a.e. $\omega \in \Omega$ is a fundamental assumption in Theorem 3.7, which itself ensures that the Limit Set Theorem 3.6 is applicable. In view of Proposition 3.8 this is for example the case if there is $q > 0$ with $\sum_{n=1}^{\infty} \gamma_n^q < \infty$. Then, if $(x_n)_{n\in\mathbb{N}}$ is a Robbins–Monro algorithm, almost every bounded path converges to an equilibrium of $\dot{x} = f(x)$. However, the following example shows that (A) and (B) alone do not exclude the possibility of an unstable point as the limit of $(x_n)_{n\in\mathbb{N}}$, a case which we want to exclude in this section.

Example 3.13. Let $U_1, U_2, \ldots$ be independent random variables with values in $\{-1, 0, 1\}$ and

$$\mu\{U_n = 1\} = \mu\{U_n = -1\} = \frac{1}{4n^2}, \qquad \mu\{U_n = 0\} = 1 - \frac{1}{2n^2}.$$

Then the event $\{U_n = 0 \text{ for all } n \in \mathbb{N}\}$ has probability $\prod_{n=1}^{\infty}\big(1 - \frac{1}{2n^2}\big) > 0$. So in the case $x_0 = z^i$ for some $i$ we have that $x_n \to z^i$ with positive probability.
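The positivity of the infinite product in Example 3.13 can be checked numerically; the closed form via Euler's sine product $\prod_{n\geq 1}(1 - x^2/n^2) = \sin(\pi x)/(\pi x)$ with $x = 1/\sqrt{2}$ is our own addition for comparison:

```python
import math

# Partial products of prod_{n>=1} (1 - 1/(2 n^2)), the probability in
# Example 3.13 that every noise variable vanishes.
partial = 1.0
for n in range(1, 1_000_000):
    partial *= 1.0 - 0.5 / (n * n)

# Euler's sine product with x = 1/sqrt(2) gives the exact limit.
x = 1.0 / math.sqrt(2.0)
closed_form = math.sin(math.pi * x) / (math.pi * x)

print(partial, closed_form)  # both approx. 0.358 -- strictly positive
```

So the degenerate event has probability about $0.358$, far from negligible; this is exactly the pathology that the last part of Assumption (C) rules out.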

The papers [59, 13] both contain results that provide mild regularity conditions under which (3.11) implies that unstable points cannot be limit points of $(x_n)_{n\in\mathbb{N}_0}$ with positive probability.

Under the three conditions (A), (B) and (C), we will say that noise-induced tipping occurs when a path of $(x_n)_{n\in\mathbb{N}}$ stays close to some $s^i$ for a while but eventually converges to $s^j$ with $i \neq j$.

Remark 3.14. In Example 3.10, (C) holds while (B) is violated, whereas Example 3.13 fulfills (B) but not (C).

3.4.2 The invertible case

From now on assume that (A), (B) and (C) hold. We want to treat the noise size $R$ as a bifurcation parameter and show how it enables or prevents tipping, i.e. transitions of trajectories of (3.1) between stable equilibrium points of (3.2). The NRDS formalism is a powerful tool for describing this phenomenon, as it allows us to treat a stochastic approximation not separately for each initial value $x_0$, but rather as a dynamical system on the whole phase space. We will give a description of noise-induced tipping in terms of pullback repellers of that system. Recall that we rewrote (3.1) as a NRDS with

$$f_{n,\omega}(x) = x + \gamma_{n+1}\big(f(x) + U_1(\omega)\big).$$

We start our analysis in the simple case where $f'$ is bounded from below. This implies that all the maps $f_{n,\omega}$ are diffeomorphisms whenever $\gamma_n \leq \frac{1}{M}$, where $M := -\inf f' > 0$. W.l.o.g. we will assume that this holds true for all $n \in \mathbb{N}$. If conversely $\inf_{\mathbb{R}} f' = -\infty$, there are points $y_1, y_2, \ldots$ such that $f'(y_n) = -\frac{1}{\gamma_{n+1}}$. Hence $f'_{n,\omega}(y_n) = 0$ and $f_{n,\omega}$ is not a diffeomorphism for any $n \in \mathbb{N}_0$. The exact condition we work with will be

Assumption (D).

$$(\exists M > 0):\quad \inf_{\mathbb{R}} f' \geq -M \quad\text{and}\quad \sup_{n\in\mathbb{N}} \gamma_n < \frac{1}{M}. \tag{D}$$

We assume this to hold true for the remainder of this section.

Example 3.15. Let $p(x) = a_n x^n + \cdots + a_1 x + a_0$ with $n \in \mathbb{N}$ odd and $a_n < 0$. It is easy to verify that $f := \arctan \circ\, p$ fulfills both (A) and (D), provided the roots of $p$ are simple.

Recall that Theorems 3.6 and 3.7 imply convergence only for bounded paths. We will now show that Assumption (D) guarantees this for almost every path.
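For the concrete instance $p(x) = x - x^3$ of Example 3.15 (the mean field used in Figure 3 below), the conditions (A) and (D) can be verified numerically; the grid resolution here is an ad hoc choice of ours:

```python
import math

# Sanity check for Example 3.15 with p(x) = x - x^3 (odd degree, negative
# leading coefficient): f = arctan(p) has zeros -1, 0, 1 with the sign
# pattern required by (A), and f' = p'/(1 + p^2) is bounded below, so (D)
# holds as soon as sup_n gamma_n < 1/M.
def f(x):
    return math.atan(x - x**3)

def fprime(x):
    return (1.0 - 3.0 * x * x) / (1.0 + (x - x**3) ** 2)

# f'(s^i) < 0 at the stable zeros +-1, f'(z^1) > 0 at the unstable zero 0
assert fprime(-1.0) < 0 and fprime(1.0) < 0 and fprime(0.0) > 0

# crude lower bound for inf f' over a wide grid (f' -> 0 as |x| -> infinity,
# so the infimum is attained on a compact set)
grid = [-10 + 0.001 * k for k in range(20001)]
M = -min(fprime(x) for x in grid)
print(M)  # f' >= -M everywhere, so (D) requires sup_n gamma_n < 1/M
```

With this $M \approx 2.6$, the step sizes $\gamma_n = n^{-1/2}$ of Figure 3 satisfy (D) from $n = 50$ onwards, since $\gamma_{50} \approx 0.14 < 1/M$.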

Theorem 3.16. Under (D), every solution $(x_n)_{n\in\mathbb{N}_0}$ of (3.1) is almost surely bounded.

Note that because of (A) we can find $K > 0$ such that $s^i, z^i \in [-K, K]$ for all $i$. The set $[-K, K]$ is absorbing in the sense that for every $x_0 \in \mathbb{R}$ there is a $t_0 > 0$ such that $\Phi_t(x_0) \in [-K, K]$ for all $t > t_0$. A key to the proof is the following lemma, which shows how (D) limits the behavior of the stochastic approximation for paths that are far away from this absorbing region $[-K, K]$.

Lemma 3.17. For every $C \in \mathbb{R}$ there exists $K' > K$ such that for a.e. $\omega \in \Omega$ and every $n \in \mathbb{N}_0$: if $x < -K'$ then $f_{n,T^n\omega}(x) < C$, and if $x > K'$ then $f_{n,T^n\omega}(x) > C$.

Proof. Let $\Gamma := \sup_{n\in\mathbb{N}} \gamma_n < \frac{1}{M}$ and denote by

$$F_n(x) := x + \gamma_n f(x)$$

the deterministic part of $f_{n,\omega}$. For $x \in \mathbb{R}$ and $n \in \mathbb{N}$ there exists, by the Mean Value Theorem, a $\xi \in \mathbb{R}$ such that

$$F_n(x) = F_n(K) + F_n'(\xi)(x - K).$$

Note that $F_n(K) \geq K - \Gamma|f(K)|$ for all $n$. Thus we have

$$F_n(x) \geq K - \Gamma|f(K)| + (1 - \Gamma M)(x - K). \tag{3.13}$$

If we now let $K' > K$ be so large that the right hand side of (3.13) is larger than $\Gamma R + C$ whenever $x > K'$, we get that for a.e. $\omega \in \Omega$, every $n \in \mathbb{N}_0$ and $x > K'$,

$$f_{n,T^n\omega}(x) = F_{n+1}(x) + \gamma_{n+1}U_{n+1}(\omega) > \Gamma R + C - \Gamma R = C.$$

By applying a similar argument, and possibly increasing the value of $K'$, we can also achieve that $f_{n,T^n\omega}(x) < C$ for $x < -K'$.

Proof of Theorem 3.16. Fix an initial value $x_0$ with path $(x_n)_{n\in\mathbb{N}_0}$ and let $A \in \mathcal{A}$ be the set on which $(x_n)_{n\in\mathbb{N}_0}$ is unbounded. We write $A = A^+ \cup A^-$, where $A^\pm$ is the set on which $\limsup_{n\to\infty} \pm x_n = \infty$. The proof is organized in four steps. First we show that on $A^+$, every path must be below $K$ for infinitely many $n$. Then we use this to deduce that $x_n$ has to make infinitely many upcrossings of the strip $[K+1, K+2]$ and give a minimum time such an upcrossing takes. In Step 3 we apply Benaïm's estimate for the cumulative noise to show that this cannot happen with positive probability. Finally we apply the same methodology to $A^-$.

Step 1. Let $\omega \in A^+$. Assume there is $n_0 \in \mathbb{N}$ such that $x_n(\omega) > K$ for all $n \geq n_0$. Then we have

$$x_{n_0+k}(\omega) \leq x_{n_0}(\omega) + \sum_{l=n_0+1}^{n_0+k} \gamma_l U_l(\omega). \tag{3.14}$$

If $\sum_{n=1}^{\infty} \gamma_n^2 < \infty$, then $\sum_{l=n_0+1}^{\infty} \gamma_l U_l$ converges a.s. by Doob's $L^2$-Martingale Convergence Theorem B.12. This contradicts $\limsup_{n\to\infty} x_n(\omega) = \infty$. Otherwise

$$\liminf_{k\to\infty} \sum_{l=n_0+1}^{n_0+k} \gamma_l U_l = -\infty \quad (\text{a.s.})$$

due to Theorem B.15. But this means that $x_n(\omega) < K$ for some $n \geq n_0$, also a contradiction. Hence $x_n < K$ for infinitely many $n$ on $A^+$.

Step 2. Fix one $\omega \in A^+$ and write $x_n$ instead of $x_n(\omega)$. Following Step 1, there must be infinitely many upcrossings of the strip $[K+1, K+2]$, i.e. infinitely many $n \in \mathbb{N}$ such that there is $k \in \mathbb{N}$ with

$$x_n < K+1 \leq x_{n+1}, \ldots, x_{n+k} \leq K+2 < x_{n+k+1}.$$

Fix such $n$ and $k$ for a moment and let $K'$ be as in Lemma 3.17, with $C = K$. This means that a.s. $x_n > -K'$. We denote $c := \max_{[-K',K+1]} f$ and choose $n_0 \in \mathbb{N}$ such that $\gamma_n \leq \frac{1}{2}(R+c)^{-1}$ for all $n \geq n_0$. In this case we get the estimate

$$x_n = x_{n+1} - \gamma_{n+1}\big(f(x_n) + U_{n+1}\big) \geq K + 1 - \gamma_{n+1}(c + R) \geq K + \tfrac{1}{2}.$$

Now choose $T > 0$ with $RT < 1$ and let $m(n)$ be the largest integer such that $\tau_{n+m(n)} \leq \tau_n + T$. From (3.14) we get

$$x_{n+m(n)} \leq x_n + R\big(\tau_{n+m(n)} - \tau_n\big) < K + 2,$$

which implies $k \geq m(n)$. This shows that, modulo $\mu$,

$$A^+ \subseteq \limsup_{n\to\infty} B_n = \bigcap_{n\in\mathbb{N}} \bigcup_{k\geq n} B_k,$$

where

$$B_n := \Big\{K + \tfrac{1}{2} < x_n < K+1 \leq x_{n+1}, \ldots, x_{n+m(n)} < K+2\Big\}.$$

Step 3. On the set $B_n$ we can refine (3.14) to

$$x_{n+m(n)} \leq K + 1 + \sum_{l=n+1}^{n+m(n)} \gamma_l(-\varepsilon + U_l) \leq K + 1 - \frac{\varepsilon T}{2} + \sum_{l=n+1}^{n+m(n)} \gamma_l U_l \tag{3.15}$$

for all large enough $n$, where $\varepsilon := -\max_{[K+\frac{1}{2},K+2]} f > 0$.

Let $\Delta_n := \Delta_n(T)$ as in (3.10) and define

$$\Omega_n := \Big\{\Delta_k \leq \frac{\varepsilon T}{4} \text{ for all } k \geq n\Big\}.$$

According to Assumption (C) we have $\Omega_n \nearrow \Omega$ modulo $\mu$. On the set $\Omega_n \cap B_n$, (3.15) implies that

$$x_{n+m(n)} \leq K + 1 - \frac{\varepsilon T}{4} < K + 1,$$

which is a contradiction. Hence $B_n \subseteq \Omega_n^c$ for all large enough $n$. This finally implies that

$$\mu(A^+) \leq \mu\Big(\bigcap_{n\in\mathbb{N}} \bigcup_{k\geq n} \Omega_k^c\Big) = \mu\Big(\bigcap_{n\in\mathbb{N}} \Omega_n^c\Big) = 0.$$

Step 4. A similar argument shows that $\mu(A^-) = 0$ and thus $\mu(A) = 0$.

This result, together with Theorems 3.6 and 3.7 immediately implies the following.

Corollary 3.18. Under assumptions (A)–(D), every solution $(x_n)_{n\in\mathbb{N}_0}$ of (3.1) converges to one of the points $s^i$. In other words, there is a random variable $x_\infty$ with values in $\{s^0, \ldots, s^N\}$ such that $x_n \to x_\infty$ a.s.

In some sense the points $s^i$ act as attractors for the NRDS, which is remarkable since they do not depend on $n$ or $\omega$ but are computed solely from the mean field ODE $\dot{x} = f(x)$. However, which initial values converge to which stable point depends on both the starting time and the noise, so the "influence sphere" $(B_n^i)_{n\in\mathbb{N}_0}$ of each point $s^i$ is a sequence of subsets of $\Omega \times \mathbb{R}$,

$$B_n^i(\omega) := \{x \in \mathbb{R} \mid \varphi(k, n; T^n\omega)x \to s^i \text{ as } k \to \infty\}$$

for $n \in \mathbb{N}_0$ and $\omega \in \Omega$. Those basins of attraction present a dynamical way of describing convergence of stochastic processes $(x_n)_{n\in\mathbb{N}}$ generated by (3.1); $x_n(\omega) \to s^i$ is equivalent to $x_n(\omega) \in B_n^i(T^n\omega)$ for one, and hence all, $n \in \mathbb{N}_0$. It is clear that

$$\varphi(k, n; \omega)B_n^i(\omega) = B_{n+k}^i(T^k\omega) \tag{3.16}$$

for all $k, n \in \mathbb{N}$ and $\omega \in \Omega$.

Our next goal is to study some properties of the sets $B_n^i$. Recall the notation $\mu_n := T_*^n\mu$ from Chapter 2.2. In the noise model we presented at the beginning of this chapter this means that $\mu_n$ is the law of the random variable $U_n$. It is clear from the definition that $f_{n,\omega}B_n^i(\omega) = B_{n+1}^i(T\omega)$ for every $n \in \mathbb{N}$, $i = 0, \ldots, N$ and $\mu_n$-a.e. $\omega \in \Omega$, so the natural question that arises is whether they are random sets in the sense of Definition 2.25. The path to answering this question gives us much stronger results.

A first observation is that each $B_n^i(\omega)$ is an interval, and that those intervals are ordered with $i$. Precisely, if $j > i$, $x \in B_n^i(\omega)$ and $y \in B_n^j(\omega)$, then $y > x$. This follows from the fact that all $f_{n,\omega}$ are monotonically increasing functions.

We say a path of $(x_n)_{n\in\mathbb{N}_0}$ is trapped near an equilibrium $s^i$ after time $n$ if it enters a small neighborhood of $s^i$ before time $n$ and never leaves it again. We start by showing that the probability of being trapped near $s^i$ converges to one as $n \to \infty$.

Lemma 3.19. For all $i = 0, \ldots, N$ there exists $\varepsilon = \varepsilon^i > 0$ such that

$$\mu\{\omega \in \Omega \mid (s^i - \varepsilon, s^i + \varepsilon) \subseteq B_{n+k}^i(T^{n+k}\omega) \text{ for all } k \in \mathbb{N}\} \xrightarrow{n\to\infty} 1.$$

A result of a similar type is Theorem 7.3 in [13], and we borrow the key ideas from the proof of that result; our result, however, does not need any of the rather technical conditions that are necessary in [13]. The lemma means that any path of (3.1) has a high probability of converging to $s^i$ if it is close to $s^i$ at a large enough time. The proof relies on the following estimate.

Lemma 3.20. Let $\varphi$ be a stochastic approximation NRDS with a locally Lipschitz mean field $f$. Let $T > 0$ and $n \in \mathbb{N}$, and assume that for $x \in \mathbb{R}$ there are $\Omega_0 \in \mathcal{A}$ and bounded sets $\tilde{Q} \subseteq Q \subseteq \mathbb{R}$ such that $\varphi(k, n; T^n\omega)\tilde{Q} \subseteq Q$ for all $\omega \in \Omega_0$ and all $k \in \mathbb{N}$ with $\tau_{n+k} \leq \tau_n + T$. Denote the largest such number $k$ by $m(n)$. There exist constants $c_i = c_i(T, Q) > 0$, $i = 1, 2$, not depending on $n$, such that

$$\sup_{k=1,\ldots,m(n)} \big|\varphi(k, n; T^n\omega)x - \Phi_{\tau_{n+k}-\tau_n}(x)\big| \leq c_1\Delta_n(T)(\omega) + c_2\gamma_{n+m(n)} \tag{3.17}$$

for all $x \in \tilde{Q}$, with $\Delta_n(T)$ as in (3.10).

Proof. Follows from Lemma 4.4 in [12].

Proof of Lemma 3.19. Fix $0 \leq i \leq N$ and write $I_r$ for the compact interval $[s^i - r, s^i + r]$. As $f$ is assumed to be hyperbolic, there exists $\varepsilon > 0$ such that

$$\sup_{I_{2\varepsilon}} f' < 0.$$

Fix $0 < T < \frac{\varepsilon}{R}$ and $0 < \delta < \varepsilon$ such that $\Phi_{T/2}(I_\varepsilon) \subseteq I_\delta$. We denote by $m(n) \in \mathbb{N}_0$ the largest integer such that $\tau_{n+m(n)} \leq \tau_n + T$. Moreover let $-\eta := \inf_{I_{2\varepsilon}} f' < 0$ and let $n_0$ be so large that $\gamma_n < \min\{\frac{1}{\eta}, \frac{T}{2}\}$ and $m(n) > 0$ for all $n \geq n_0$. We show in a first step, through an elementary estimate, that trajectories starting in $I_\varepsilon$ do not leave $I_{2\varepsilon}$ in time $< T$, which allows us to apply Lemma 3.20 in a second step.

Step 1. Let $n \geq n_0$ and $x \in I_\varepsilon$. We write $x = s^i + \rho$ with $\rho \in [-\varepsilon, \varepsilon]$. The Mean Value Theorem gives us a (random) number $\xi(\omega) \in I_\varepsilon$ such that

$$f_{n,T^n\omega}(x) = s^i + \big(1 + \gamma_{n+1}f'(\xi(\omega))\big)\rho + \gamma_{n+1}U_{n+1}(\omega).$$

As $0 < \gamma_{n+1} < \frac{1}{\eta}$ and $0 > f'(\xi(\omega)) \geq -\eta$, the factor $\big(1 + \gamma_{n+1}f'(\xi(\omega))\big)$ lies in the interval $(0, 1)$. Thus, on the full measure set

$$\Omega_0 := \{|U_n| \leq R \text{ for all } n \in \mathbb{N}\}$$

we get $f_{n,T^n\omega}(x) \in I_{\varepsilon+\gamma_{n+1}R}$. This means that on $\Omega_0$,

$$f_{n,T^n\omega}(I_\varepsilon) \subseteq I_{\varepsilon+\gamma_{n+1}R} \subseteq I_{\varepsilon+RT} \subseteq I_{2\varepsilon}.$$

A finite iteration of this argument shows that similarly

$$\varphi(k, n; T^n\omega)I_\varepsilon \subseteq I_{\varepsilon+(\gamma_{n+1}+\cdots+\gamma_{n+k})R} \subseteq I_{2\varepsilon}$$

for all $k = 1, \ldots, m(n)$ and $\omega \in \Omega_0$.

Step 2. We now apply Lemma 3.20 with $\tilde{Q} = I_\varepsilon$ and $Q = I_{2\varepsilon}$ to get the estimate

$$\big|\varphi(m(n), n; T^n\omega)x - \Phi_{\tau_{n+m(n)}-\tau_n}(x)\big| \leq c_1\Delta_n(T)(\omega) + c_2\gamma_{n+m(n)} =: a_n(\omega) \tag{3.18}$$

for all $n \geq n_0$, $x \in I_\varepsilon$ and $\omega \in \Omega_0$. Note that $a_n$ does not depend on $x \in I_\varepsilon$. Let $\bar{\varepsilon} := \varepsilon - \delta > 0$ and $A_n := \{a_k \leq \bar{\varepsilon} \text{ for all } k \geq n\}$. Since $\gamma_{n+m(n)} \to 0$ and $\Delta_n(T) \to 0$ a.s. by Assumption (C), we have $A_n \nearrow \Omega$ modulo $\mu$.

Fix some $n \geq n_0$ and set $x_0^\pm := s^i \pm \varepsilon$ and

$$x_l^\pm(\omega) := \varphi(l, n; T^n\omega)x_0^\pm.$$

As $\gamma_{n+l} < \frac{T}{2}$ for all $l \in \mathbb{N}$, we have $t := \tau_{n+m(n)} - \tau_n > \frac{T}{2}$. The definition of $\delta$ implies that $\Phi_t(x_0^\pm) \in I_\delta$. Thus, for all $\omega \in A_n \cap \Omega_0$, (3.18) implies that $x_{m(n)}^\pm(\omega) \in I_\varepsilon$. Now we can apply the same argument to the points $x_{m(n)}^\pm(\omega)$ to find that $x_{m^2(n)}^\pm(\omega) \in I_\varepsilon$ for all $\omega \in A_n \cap \Omega_0$, where $m^2(n) = m(n) + m(n + m(n))$. Iterating this process we get numbers $m^3(n) < m^4(n) < \ldots$ with

$$x_{m^r(n)}^\pm(\omega) \in I_\varepsilon \quad\text{for all } r \in \mathbb{N} \text{ and } \omega \in A_n \cap \Omega_0. \tag{3.19}$$

According to Corollary 3.18, there is a full measure set $\Omega_n \in \mathcal{A}$ such that both $x_l^+(\omega)$ and $x_l^-(\omega)$ converge to one of the points $s^0, \ldots, s^N$ as $l \to \infty$ for all $\omega \in \Omega_n$. This and (3.19) imply that $x_l^\pm(\omega) \to s^i$ whenever $\omega \in \tilde{A}_n := A_n \cap \Omega_0 \cap \Omega_n$. In other words, $x_0^\pm = s^i \pm \varepsilon \in B_n^i(T^n\omega)$, and thus $I_\varepsilon \subseteq B_n^i(T^n\omega)$ for all $\omega \in \tilde{A}_n$.

For any $k \geq 0$ we can, in a similar manner, find a set $\tilde{A}_{n+k}$, which is equal to $A_{n+k}$ modulo $\mu$, such that $I_\varepsilon \subseteq B_{n+k}^i(T^{n+k}\omega)$ for all $\omega \in \tilde{A}_{n+k}$. But as $A_n \subseteq A_{n+k}$, we get that $I_\varepsilon \subseteq B_{n+k}^i(T^{n+k}\omega)$ for $\mu$-a.e. $\omega \in \tilde{A}_n$. This implies that

$$\mu\{\omega \in \Omega \mid (s^i - \varepsilon, s^i + \varepsilon) \subseteq B_{n+k}^i(T^{n+k}\omega) \text{ for all } k \in \mathbb{N}\} \geq \mu(\tilde{A}_n) = \mu(A_n) \xrightarrow{n\to\infty} 1.$$

From this we can already deduce non-emptiness of the basins.

Theorem 3.21. For $i = 0, \ldots, N$ and $n \in \mathbb{N}$ we have $p_n^i := \mu\{\omega \in \Omega \mid B_n^i(T^n\omega) \neq \emptyset\} = 1$.

Proof. Equation (3.16) implies that $p_n^i$ does not depend on $n$. But then Lemma 3.19 already yields

$$\mu\{\omega \in \Omega \mid B_n^i(T^n\omega) \neq \emptyset\} \geq \mu\{\omega \in \Omega \mid s^i \in B_n^i(T^n\omega)\} \xrightarrow{n\to\infty} 1.$$

Lemma 3.22. The random set $B_n(\omega) := \mathbb{R} \setminus \bigcup_{i=0}^N B_n^i(\omega)$ is $\mu_n$-a.s. finite for all $n \in \mathbb{N}$.

Proof. As $B_n(T^n\omega) = \varphi(n, 0; \omega)B_0(\omega)$ according to (3.16), it suffices to prove the statement for $n = 0$. Let $q \in \mathbb{Q}$. According to Corollary 3.18 there exists a full measure set $\Omega_q \in \mathcal{A}$ such that for every $\omega \in \Omega_q$ there is $0 \leq i \leq N$ with $q \in B_0^i(\omega)$. This means that on the full measure set $\bigcap_{q\in\mathbb{Q}} \Omega_q$ we have $\mathbb{Q} \subseteq \bigcup_{i=0}^N B_0^i(\omega)$, and hence $B_0$ a.s. contains no interval. But since it is the complement of a finite union of intervals, it must be a finite set in those cases.

This means that for $\mu_n$-a.e. $\omega \in \Omega$, the two neighboring intervals $B_n^{i-1}(\omega)$ and $B_n^i(\omega)$ touch in exactly one point

$$q_n^i(\omega) := \sup B_n^{i-1}(\omega) = \inf B_n^i(\omega)$$

for $i = 1, \ldots, N$. The following result shows that we can extend $q_n^i$ to measurable maps on the whole of $\Omega$ in a coherent way.

Lemma 3.23. There exists a sequence of measurable sets $\Omega_1, \Omega_2, \ldots \in \mathcal{A}$ and a NARF $(\hat{q}_n^i)_{n\in\mathbb{N}_0}$ for each $i = 1, \ldots, N$ such that $\mu_n(\Omega_n) = 1$ and $\hat{q}_n^i(\omega) = q_n^i(\omega)$ for all $n \in \mathbb{N}_0$, $i = 1, \ldots, N$ and $\omega \in \Omega_n$.

Proof. For every $n \in \mathbb{N}_0$ let $A_{n,0} \in \mathcal{A}$ be such that $\mu_n(A_{n,0}) = 1$ and $q_n^i(\omega)$ is well defined for all $\omega \in A_{n,0}$ and $i = 1, \ldots, N$. It is clear that the sets $B_n^i \subseteq \Omega \times \mathbb{R}$ are measurable w.r.t. $\mathcal{A} \otimes \mathcal{B}(\mathbb{R})$. Thus the graphs

$$G_n^i := \{(\omega, q_n^i(\omega)) \mid \omega \in A_{n,0}\}$$

of $q_n^i$ over $A_{n,0}$ are elements of $\mathcal{A} \otimes \mathcal{B}(\mathbb{R})$ with compact fibers $G_n^i(\omega)$. Lemma 2.7 in [27] implies that there exist sets $A_{n,1} \in \mathcal{A}$ with $\mu_n(A_{n,1}) = 1$ and compact random sets $\tilde{G}_n^i$ such that $\tilde{G}_n^i(\omega) = G_n^i(\omega)$ whenever $\omega \in A_{n,1}$. We define $\Omega_n := A_{n,0} \cap A_{n,1}$ and the maps

$$\hat{q}_n^i : \Omega \to \mathbb{R}, \qquad \hat{q}_n^i(\omega) = \begin{cases} q_n^i(\omega), & \text{if } \omega \in \Omega_n, \\ s^i, & \text{otherwise.} \end{cases}$$

Given an open set $U \subseteq \mathbb{R}$ we have that

$$\Omega_n^c \cap \{\hat{q}_n^i \in U\} \in \{\Omega_n^c, \emptyset\}$$

is trivially measurable and

$$\Omega_n \cap \{\hat{q}_n^i \in U\} = \Omega_n \cap \{\tilde{G}_n^i \cap U \neq \emptyset\}$$

is measurable due to [27, Proposition 2.4]. This shows that all maps $\hat{q}_n^i$ are measurable. For each $n \in \mathbb{N}$ and $\omega \in \Omega_n \cap T^{-1}\Omega_{n+1}$ we have the relation

$$f_{n,\omega}\hat{q}_n^i(\omega) = f_{n,\omega}q_n^i(\omega) = q_{n+1}^i(T\omega) = \hat{q}_{n+1}^i(T\omega),$$

and since $\mu_n\{\Omega_n \cap T^{-1}\Omega_{n+1}\} = 1$ for all $n \in \mathbb{N}_0$, this shows that $(\hat{q}_n^i)_{n\in\mathbb{N}_0}$ is indeed a NARF for the NRDS $\varphi$.

From now on we assume w.l.o.g. that $q_n^i = \hat{q}_n^i$.

Corollary 3.24. For $i = 1, \ldots, N$ and $\mu$-a.e. $\omega \in \Omega$, $q_n^i(T^n\omega) \to z^i$.

Proof. Fix $i = 1, \ldots, N$. According to Corollary 3.9 and Theorem 3.6, $(q_n^i \circ T^n)_{n\in\mathbb{N}}$ converges a.s. to one of the equilibrium points of (3.2). Lemma 3.19 implies that there is $\varepsilon > 0$ such that $\mu(A_n) \to 1$, where

$$A_n := \{\omega \in \Omega \mid s^{i-1} + \varepsilon < q_k^i(T^k\omega) < s^i - \varepsilon \text{ for all } k \geq n\}.$$

But if $\omega \in A_n$, the only possible limit for the sequence $(q_k^i(T^k\omega))_{k\in\mathbb{N}_0}$ is $z^i$.

In particular this implies that the sets $B_n^i(\omega)$ do not contain their end points and are therefore open intervals. In the spirit of Lemma 3.23 we can redefine $B_n^i$ as

$$B_n^0 = (-\infty, q_n^1), \qquad B_n^i = (q_n^i, q_n^{i+1}) \text{ for } i = 1, \ldots, N-1, \qquad B_n^N = (q_n^N, \infty).$$

Corollary 3.25. For each $i = 1, \ldots, N$, $(B_n^i)_{n\in\mathbb{N}_0}$ is an open, invariant NARS.

The fact that $f$ is repelling around $z^i$ also implies that the $(q_n^i)_{n\in\mathbb{N}_0}$ are pullback repelling.

Theorem 3.26. For each $i = 1, \ldots, N$ there exists $\eta > 0$ such that $(q_n^i)_{n\in\mathbb{N}_0}$ pullback-repels an $\eta$-neighborhood of itself.

Proof. Fix $i \in \{1, \ldots, N\}$ and let $\eta > 0$ be such that

$$c := \inf_{B_{3\eta}(z^i)} f' > 0.$$

We have to show that

$$\operatorname{dist}\Big(\varphi(-k, n+k; T^{n+k}\omega)B_\eta\big(q_{n+k}^i(T^{n+k}\omega)\big),\ q_n^i(T^n\omega)\Big) \xrightarrow{k\to\infty} 0 \tag{3.20}$$

for every $n \in \mathbb{N}$ and a.e. $\omega \in \Omega$. Let $\varepsilon > 0$. According to Corollary 3.24 there is $n_0 \in \mathbb{N}$ such that $\mu(A) > 1 - \varepsilon$ with

$$A := \{q_n^i \circ T^n \in B_\eta(z^i) \text{ for all } n \geq n_0\}.$$

We claim that (3.20) holds for all $\omega \in A$ and $n \in \mathbb{N}$. First note that we can apply an argument analogous to Theorem 2.40 to see that it is enough to prove (3.20) for all $\omega \in A$ and $n \geq n_0$. Moreover, as

$$q_n^i(T^n\omega) \in \varphi(-k, n+k; T^{n+k}\omega)B_\eta\big(q_{n+k}^i(T^{n+k}\omega)\big),$$

it suffices to show that

$$d_{k,n}(\omega) := \operatorname{diam}\Big(\varphi(-k, n+k; T^{n+k}\omega)B_\eta\big(q_{n+k}^i(T^{n+k}\omega)\big)\Big) \xrightarrow{k\to\infty} 0$$

for all $n \geq n_0$ and $\omega \in A$. To see this, fix $k \in \mathbb{N}$ and let

$$C_l := \varphi(-l, n+k; T^{n+k}\omega)B_\eta\big(q_{n+k}^i(T^{n+k}\omega)\big).$$

We claim that for all $l = 0, \ldots, k$ we have $C_l \subseteq B_{3\eta}(z^i)$ and

$$\operatorname{diam}(C_l) \leq 2\eta \prod_{j=0}^{l-1}\big(1 + c\gamma_{n+k-j}\big)^{-1}. \tag{3.21}$$

For $l = 0$ this follows from $n \geq n_0$ and $\omega \in A$. Now assume this is proven for $l - 1$. Then we have

$$C_l = \big(f_{n+k-l,T^{n+k-l}\omega}\big)^{-1}C_{l-1}.$$

As $C_{l-1} \subseteq B_{3\eta}(z^i)$, this implies

$$\operatorname{diam}(C_l) \leq \big(1 + c\gamma_{n+k-(l-1)}\big)^{-1}\operatorname{diam}(C_{l-1}) \leq 2\eta \prod_{j=0}^{l-1}\big(1 + c\gamma_{n+k-j}\big)^{-1} \leq 2\eta,$$

and as moreover $q_{n+k-l}^i(T^{n+k-l}\omega) \in C_l \cap B_\eta(z^i)$, this implies that $C_l \subseteq B_{3\eta}(z^i)$. Evaluating (3.21) with $l = k$ we finally get that

$$d_{k,n}(\omega) \leq 2\eta \prod_{j=1}^{k}\big(1 + c\gamma_{n+j}\big)^{-1} \xrightarrow{k\to\infty} 0$$

for all $n \geq n_0$ and $\omega \in A$. Thus the probability that (3.20) holds for all $n \in \mathbb{N}$ is at least $1 - \varepsilon$, and as $\varepsilon > 0$ was arbitrary, the theorem is proven.
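The contraction factor in this proof vanishes precisely because $\sum_n \gamma_n = \infty$. A quick numerical check (the values $\gamma_n = n^{-1/2}$, $c = 0.5$ and the starting index are our own illustrative choices):

```python
import math

# The diameter bound in the proof of Theorem 3.26 is
#   2*eta * prod_{j=1}^{k} (1 + c * gamma_{n+j})^{-1},
# which tends to 0 whenever sum gamma_n = infinity.  We sum logs to avoid
# underflow and exponentiate at the end.
c, n = 0.5, 10
log_prod = 0.0
for j in range(1, 100_001):
    log_prod -= math.log1p(c / math.sqrt(n + j))

print(math.exp(log_prod))  # vanishingly small: the neighborhood collapses onto the repeller
```

After $10^5$ steps the product is already far below machine-relevant scales, consistent with the pullback repulsion asserted by the theorem.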

So far we have constructed open NARS which separate the non-autonomous extended phase space into $N+1$ regions, each of them associated with one of the attracting fixed points $s^i$, and we saw that the borders between those regions are pullback repelling NARF, each one associated with a repelling fixed point $z^i$. Lemma 3.19, or equivalently Corollary 3.24, shows that the probability of being trapped near a stable equilibrium converges to one as time $n$ goes to $\infty$. It remains to answer the question of whether or not this probability actually reaches one, i.e. whether $x_n$ being close enough to some $s^i$ for large enough $n$ implies that almost always $x_n \to s^i$. It turns out that the answer depends on the noise size $R$.

Let $0 \leq i, j \leq N$. We say that there are transitions between $s^i$ and $s^j$ if $\mu\{\omega \in \Omega \mid s^i \in B_n^j(T^n\omega)\} > 0$ for all $n \in \mathbb{N}$. The probabilities $\mu\{\omega \in \Omega \mid s^i \in B_n^j(T^n\omega)\}$ depend on the noise level $R$. So far we have ignored this $R$-dependency in our notation out of convenience; now we need to be more precise. Therefore, if there are transitions from $s^i$ to $s^j$ under a given noise level $R$, we will indicate this fact by writing $i \overset{R}{\rightarrow} j$. Otherwise we will use the notation $i \overset{R}{\nrightarrow} j$.

Remark 3.27. Lemma 3.19 implies that $i \overset{R}{\rightarrow} i$ for all $R > 0$ and $i = 0, \ldots, N$.

The rest of this section is dedicated to the proof of the following bifurcation result for the stochastic approximation NRDS $\varphi$ with bifurcation parameter $R$.

Theorem 3.28. For every pair $(i, j)$ with $0 \leq i, j \leq N$ and $i \neq j$ there exists a critical parameter $R^*_{(i,j)} > 0$ such that

$$i \overset{R}{\nrightarrow} j \quad\text{if } R < R^*_{(i,j)}, \qquad i \overset{R}{\rightarrow} j \quad\text{if } R > R^*_{(i,j)}.$$

In particular, if $R < R_0^* := \min_{(i,j)} R^*_{(i,j)}$ there are no transitions between stable equilibria, and for $R > R_1^* := \max_{(i,j)} R^*_{(i,j)}$, transitions between any two stable equilibria are possible. So all the bifurcations happen in $[R_0^*, R_1^*]$ and their number is at most $(N+1)N$.

We will prove this in two steps. The following Proposition 3.29 deals with the case $R < R^*_{(i,j)}$ and Proposition 3.30 handles the situation $R > R^*_{(i,j)}$. As a positive side effect we get a way to determine the bifurcation parameters $R^*_{(i,j)}$.
Proposition 3.29. Let $R > 0$ and $i \neq j$. If either $i < j$ and $R < -\inf_{(s^i,s^j)} f$, or $i > j$ and $R < \sup_{(s^j,s^i)} f$, then $i \overset{R}{\nrightarrow} j$.

Proof. We only show the case where $i < j$ and $R < -\inf_{(s^i,s^j)} f$, as the other case is analogous. Under this condition there exist $\varepsilon > 0$ and $s^i < a < b < s^j$ such that $f < -R - \varepsilon$ on the interval $[a, b]$. From Lemma 3.17 it follows that there is $K' < b$ such that

$$\varphi(1, n; T^n\omega)x < b \quad\text{for all } n \in \mathbb{N}_0,\ x < K' \text{ and a.e. } \omega \in \Omega. \tag{3.22}$$

Let $n_0$ be so large that

$$\gamma_{n+1} \leq \frac{b - a}{\sup_{[K',b]}|f| + R}$$

for all $n \geq n_0$. Assume $i \overset{R}{\rightarrow} j$. Then there exists a set $A \in \mathcal{A}$ of positive measure such that $x_k(\omega) \to s^j$ for all $\omega \in A$, where

$$x_k(\omega) := \varphi(k, n_0; T^{n_0}\omega)s^i.$$

Figure 3: Realizations of interpolated graphs of a stochastic approximation with mean field $f(x) = \arctan(x - x^3)$, step sizes $\gamma_n = \frac{1}{\sqrt{n}}$ and noise size $R = 3$, starting in $x_{50} = 1$ at time step $n = 50$ (or $t \approx 12.75$). The top panel shows a graph that initially stays close to the stable equilibrium $1$ but then transits to the stable equilibrium $-1$, while the bottom graph stays near $1$ over the observed time period.
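The experiment behind Figure 3 can be re-created with a few lines of code (our own re-implementation; the thesis only assumes the bounds of Assumption (B), so taking $U_n$ uniform on $[-R, R]$ is an illustrative choice of noise law):

```python
import math, random

# Stochastic approximation x_{n+1} = x_n + gamma_{n+1} * (f(x_n) + U_{n+1})
# with f(x) = arctan(x - x^3), gamma_n = n^{-1/2}, noise size R = 3,
# started at x_50 = 1 as in Figure 3.
def f(x):
    return math.atan(x - x**3)

def run_path(seed, n_start=50, x_start=1.0, n_end=5000, R=3.0):
    rng = random.Random(seed)
    x = x_start
    for n in range(n_start, n_end):
        gamma = 1.0 / math.sqrt(n + 1)
        x += gamma * (f(x) + rng.uniform(-R, R))
    return x

finals = [run_path(seed) for seed in range(200)]
near_plus = sum(abs(x - 1.0) < 0.5 for x in finals)
near_minus = sum(abs(x + 1.0) < 0.5 for x in finals)
print(near_plus, near_minus)  # most paths settle near one of the stable equilibria
```

Both counters are positive for these parameters: a sizable fraction of paths tips from the starting equilibrium $1$ to $-1$, mirroring the two behaviors shown in the figure panels.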

W.l.o.g. we assume that the assertion of (3.22) holds true and that $|U_n(\omega)| \leq R$ for every $\omega \in A$. Now fix one $\omega \in A$. For convenience we write $x_k$ instead of $x_k(\omega)$. There must be $k_0 \in \mathbb{N}_0$ such that $x_{k_0} \leq b$ and $x_{k_0+1} > b$. From (3.22) it follows that $x_{k_0} \in [K', b]$. Thus,

$$|x_{k_0+1} - x_{k_0}| = \gamma_{k_0+1}\big|f(x_{k_0}) + U_{k_0+1}(\omega)\big| \leq b - a,$$

which means that $x_{k_0} > a$. This now yields

$$x_{k_0+1} = x_{k_0} + \gamma_{k_0+1}\big(f(x_{k_0}) + U_{k_0+1}(\omega)\big) \leq x_{k_0} + \gamma_{k_0+1}(-R - \varepsilon + R) < x_{k_0} < b,$$

a contradiction.

Proposition 3.30. Let $R > 0$ and $i \neq j$. If either $i < j$ and $R > -\inf_{(s^i,s^j)} f$, or $i > j$ and $R > \sup_{(s^j,s^i)} f$, then $i \overset{R}{\rightarrow} j$.

Proof. Again it suffices to focus on the first case where $i < j$ and $R > -\inf_{(s^i,s^j)} f$, as the second one is analogous. Fix some $\delta > 0$ such that $R - \delta > -\inf_{(s^i,s^j)} f$ and let $\varepsilon = \varepsilon^j$ be as in Lemma 3.19. There is $n_0 \in \mathbb{N}_0$ such that

$$\gamma_{n+1} \leq \frac{\varepsilon}{\sup_{[s^i,s^j]}|f| + R}$$

for all $n \geq n_0$. Fix such an $n$ and write $x_0 := s^i$ and

$$x_{k+1} = x_k + \gamma_{n+k+1}\big(f(x_k) + R\big),$$

i.e. $(x_k)_{k\in\mathbb{N}}$ is a (hypothetical) path of the stochastic approximation starting in $s^i$ at time $n$, where every noise realization takes the value $R$. As long as $x_k \in [s^i, s^j]$ we have the estimate $x_{k+1} \geq x_k + \delta\gamma_{n+k+1}$, hence there is $k_0 \in \mathbb{N}$ with $x_{k_0} \leq s^j < x_{k_0+1}$. Moreover, $x_{k_0+1} - x_{k_0} \leq \varepsilon$, such that $x_{k_0} \in (s^j - \varepsilon, s^j + \varepsilon)$. The maps $\varphi(k, n; T^n\omega)$ depend continuously on $\omega_{n+1}, \ldots, \omega_{n+k}$, and hence there is an open neighborhood $V$ of the point $(R, \ldots, R) \in \mathbb{R}^{k_0}$ such that

$$\varphi(k_0, n; T^n\omega)s^i \in (s^j - \varepsilon, s^j + \varepsilon)$$

for all

$$\omega \in A := \big\{\omega' \in \Omega \mid (\omega'_{n+1}, \ldots, \omega'_{n+k_0}) \in V\big\} \in \mathcal{A}.$$

From Lemma 3.19 it follows that there is a set $B \in \mathcal{A}$ with $\mu(B) > 0$ such that

$$\varphi(k, n+k_0; T^{n+k_0}\omega)x \xrightarrow{k\to\infty} s^j$$

for all $x \in (s^j - \varepsilon, s^j + \varepsilon)$ and $\omega \in B$, and it is clear that for $\omega \in A \cap B$,

$$\varphi(k, n; T^n\omega)s^i \xrightarrow{k\to\infty} s^j.$$

It remains to show that $\mu(A \cap B) > 0$. Recall the notation $\kappa_{n,k}$ and $\mathcal{G}_n$ from the definition of (B). As the maps $\varphi(k, n+k_0; T^{n+k_0}\omega)$ only depend on $\omega$ through $\omega_{n+k_0+1}, \omega_{n+k_0+2}, \ldots$, we see that $B \in \mathcal{G}_{n+k_0}$, and (B) implies that $\kappa_{n,k_0}(\omega)(V) > 0$ for a.e. $\omega \in \Omega$. This concludes the proof, as

$$\mu(A \cap B) = \int_B \kappa_{n,k_0}(\omega)(V)\, d\mu(\omega) > 0.$$

Remark 3.31. There is no imminent mathematical reason for the assumption that

$$\underline{R} := -\operatorname*{ess\,inf} U_n = \operatorname*{ess\,sup} U_n =: \overline{R}.$$

If we allow $\underline{R} \neq \overline{R}$, then we can treat both as bifurcation parameters, where $\underline{R}$ bifurcates at the values $R^*_{(i,j)}$ with $i > j$ and $\overline{R}$ at the values $R^*_{(i,j)}$ with $i < j$. However, the mathematical insights in this case are not much different from the case of symmetric bounds, while the notational complexity increases. Moreover, a natural way of changing the size of the noise would be to multiply the random variables $U_n$ by a positive number. In view of the original stochastic approximation by Robbins and Monro, this could correspond to an improvement or a decrease of measurement precision. In this case we cannot alter $\underline{R}$ and $\overline{R}$ independently. Thus, we chose to formulate all our results in the symmetric case.

3.4.3 The non-invertible case

In this section we will explore some consequences of omitting Assumption (D), while we still assume (A)–(C) to hold. As the maps $f_{n,\omega}$ no longer need to be invertible, we are no longer able to talk about pullback repellers in the sense of Definition 2.43, but we will show how the results from the previous section can be applied to construct an object that generalizes the idea of the repeller as a separatrix between regions of convergence in the extended non-autonomous phase space.

Assumption (D) was a requirement for paths of the stochastic approximation to be bounded (Theorem 3.16), and the following example shows that without (D) this is in general not true.

Example 3.32. Let $\gamma_n = \frac{1}{n}$ and $f(x) = x - x^3$, hence (D) is violated. There is a number $n_0 \in \mathbb{N}$ such that

$$f(x) \leq -2x^2 - x - R \quad\text{for all } x > n_0.$$

For any $n \geq n_0$ and $x \geq n+1$ we have a.s.

$$f_{n,T^n\omega}(x) \leq x + \frac{1}{n+1}\big(f(x) + R\big) \leq x - \frac{x}{n+1}(2x + 1) \leq -x - 1 \leq -(n+2),$$

and similarly $f_{n,T^n\omega}(x) \geq n+2$ for $x \leq -(n+1)$. An iteration of this argument shows that a.s. $|\varphi(k, n; T^n\omega)x| \geq n+k+2$ for all $|x| \geq n+1$, and thus $(\varphi(k, n; T^n\omega)x)_{k\in\mathbb{N}_0}$ is not bounded and in particular does not converge to one of the stable equilibrium points $\pm 1$ of (3.2).

The main idea for this section will be to replace $f$ by a map $\tilde{f}$ that fulfills (D) and coincides with $f$ on some forward invariant interval containing all the equilibria. In order to have such a set we need the following assumption.

Assumption (E). There is $K > 0$ such that $f(x) < -R$ for all $x > K$ and $f(x) > R$ for all $x < -K$.

The following lemma shows that this guarantees not only the existence of one forward invariant set, but of arbitrarily large ones.
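The explosion in Example 3.32 is easy to observe numerically (our own sketch; the noise is taken uniform on $[-3, 3]$ purely for illustration, since the divergence is driven by $f$ alone):

```python
import random

# Example 3.32: with gamma_n = 1/n and f(x) = x - x^3, Assumption (D) fails
# and a path started far outside the absorbing region blows up in a few steps.
random.seed(1)
f = lambda x: x - x**3

x, n = 10.0, 10       # start at x_10 = 10
steps = 0
while abs(x) <= 1e6 and steps < 50:
    n += 1
    x += (1.0 / n) * (f(x) + random.uniform(-3.0, 3.0))
    steps += 1

print(steps, x)  # |x| exceeds 1e6 after only a handful of steps
```

The iterates overshoot from one side to the other with rapidly growing magnitude, exactly the alternating-sign escape $|x_k| \geq n+k+2$ established in the example.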

Lemma 3.33. Under (E), there is $K' > 0$ such that for every $L > K'$ there exists $n_0 = n_0(L) \in \mathbb{N}$ with

$$f_{n,T^n\omega}[-L, L] \subseteq [-L, L] \quad\text{and}\quad f'_{n,T^n\omega}(x) > 0$$

for all $n \geq n_0$, $x \in [-L, L]$ and a.e. $\omega \in \Omega$.

Proof. Let $K$ be as in (E) and define

$$c := \sup_{[-K,K]} |f|, \qquad \Gamma := \sup_{n\in\mathbb{N}} \gamma_n, \qquad K' := K + \Gamma(c + R).$$

Then for any $L > K'$ write

$$\tilde{c} := \sup_{[-L,L]} |f| \qquad\text{and}\qquad -c' := \inf_{[-L,L]} f',$$

and choose $n_0$ so large that

$$\gamma_n < \min\Big\{\frac{1}{c'}, \frac{L + K}{\tilde{c} + R}\Big\}$$

for all $n \geq n_0$. Fix such an $n$ and let first $x \in [-K, K]$. For a.e. $\omega \in \Omega$ we have

$$f_{n,T^n\omega}(x) \leq x + \Gamma(c + R) \leq K' \leq L,$$

and similarly $f_{n,T^n\omega}(x) \geq -L$. For $x \in [K, L]$ we have

$$f_{n,T^n\omega}(x) \leq x + \gamma_{n+1}\big(f(x) + R\big) \leq x \leq L$$

and

$$f_{n,T^n\omega}(x) \geq x - \gamma_{n+1}(\tilde{c} + R) \geq K - (L + K) \geq -L.$$

An analogous argument shows that a.s. $f_{n,T^n\omega}(x) \in [-L, L]$ for all $x \in [-L, -K]$. Moreover, for every $x \in [-L, L]$ we get

$$f'_{n,T^n\omega}(x) = 1 + \gamma_{n+1}f'(x) \geq 1 - \gamma_{n+1}c' > 0$$

by the choice of $n_0$.
0 0 0 0 Let J := [−K ,K ] with K from Lemma 3.33. We assume w.l.o.g. that n0 (K ) = 1. Once a solution of (3.1) enters J it will forever stay there. Thus, the behavior of f outside of J is not relevant in this case anymore and we could replace f by a function f 0 which fulfills (D) and coincides with f on J. This function f˜ generates a stochastic approximation NRDS ϕ˜ to which the results of the previous section can be applied. Let (B˜i ) the attracting regions n n∈N0 and (˜qi ) the repellers of this new system. n n∈N0 For i, n, ω denote ˆi ˜i i Bn (ω) := J ∩ Bn (ω) = J ∩ Bn (ω) . i ˆi k If x ∈ Bn (ω) for some i, n, ω then there is a k such that ϕ (k, n; ω) x ∈ Bn+k (T ω). Thus we get the representation i [ −1 ˆi k Bn (ω) = ϕ (k, n; ω) Bn+k (T ω). k∈N0 i This together with Theorem 3.21 implies that µn {ω ∈ Ω | Bn = ∅} = 0 also in this case. SN i n We saw in example 3.32 that i=0 Bn (T ω) does not need to cover the whole real line anymore, so the assertion of Lemma 3.22 is no longer true. We can partially fix this by introducing a divergent region

\[
B^\infty_n(\omega) := \{x \in \mathbb R \mid (\varphi(k,n;\omega)x)_{k\in\mathbb N_0} \text{ is unbounded}\}.
\]

Denoting $I := \{0,\dots,N,\infty\}$, Theorems 3.6 and 3.7 imply that for each $x \in \mathbb R$ and $n \in \mathbb N_0$ and $\mu$-almost every $\omega \in \Omega$ there is $i \in I$ such that $x \in B^i_n(T^n\omega)$. Thus, in analogy to Lemma 3.22, we get the following.

Lemma 3.34. The random set $B_n := \mathbb R \setminus \bigcup_{i\in I} B^i_n$ is $\mu$-a.s. countable for all $n \in \mathbb N$.

Note that we only get countability of $B_n$, as opposed to finiteness in Lemma 3.22, because the latter was a consequence of the monotonicity of $f_{n,\omega}$. Later we will give an example where $B_n$ is indeed a.s. infinite. This means that for a.e. noise realization $\omega$ there are at most countably many points $x \in \mathbb R$ such that $(\varphi(k,n;T^n\omega)x)_{k\in\mathbb N_0}$ is bounded but does not converge to one of the stable equilibria $s^0,\dots,s^N$.

Let $\tilde\Omega_n \in \mathcal A$ be a full $\mu$-measure set on which $B_{n,\omega}$ is countable and assumption (a) of Theorem 3.7 holds. For $x \in B_n(\omega)$ and $\omega \in \tilde\Omega_n$, the Limit Set Theorem 3.6 implies that $\varphi(k,n;T^n\omega)x$ converges to one of the unstable points $z^i$ as $k \to \infty$, and thus, for large enough $k \in \mathbb N$, we must have $\varphi(k,n;T^n\omega)x = \tilde q^i_{n+k}(T^{n+k}\omega)$ for some $i$. This gives us the representation
\[
B_n(\omega) = \bigcup_{i=1}^N Q^i_n(\omega)
\]
with the countable, non-empty NARS

\[
Q^i_n(\omega) := \bigcup_{k\in\mathbb N_0} \varphi(k,n;\omega)^{-1}\, \{q^i_{n+k}(T^k\omega)\}
\]
for every $n \in \mathbb N$ and $\mu$-a.e. $\omega \in \Omega$.

Example 3.35. This continues Example 3.32. We want to show that the sets $Q^i_n(T^n\omega)$ are countably infinite and no longer singletons. The mean field $f(x) = x - x^3$ has two stable equilibria $\pm 1$. Let $(\tilde B^\pm_n)_{n\in\mathbb N_0}$ be the corresponding basins of attraction of some truncated $\tilde f$, and let $q^0_n = \tilde q^0_n$ be the separating NARF converging a.s. to $0$ (cf. Corollary 3.24). As above we define

\[
Q_n(\omega) = \bigcup_{k\in\mathbb N} \varphi(k,n;\omega)^{-1}\{q^0_{n+k}(T^k\omega)\}.
\]
We find a full $\mu$-measure set $\tilde\Omega$, a (small) $\varepsilon > 0$ and $n_0 \in \mathbb N$ such that $q^0_n(T^n\omega) \in [-\varepsilon,\varepsilon]$ for all $n \ge n_0$ and $\omega \in \tilde\Omega$.

The map $F_n(x) = x + \frac{1}{n} f(x)$ has three roots, at $0$ and $\pm C_n$, where $C_n = \sqrt{1+n}$. A simple calculation shows that
\[
F_n(C_n + \delta) = -\Bigl(2 + \frac{2}{n}\Bigr)\delta - \frac{3\sqrt{n+1}}{n}\,\delta^2 - \frac{\delta^3}{n}.
\]

Thus, by putting $\delta := R + \varepsilon$, we see that for all $n \ge n_0$ there is a point $q^{+1}_n(T^n\omega) \in B_\delta(C_{n+1})$ such that
\[
f_{n,T^n\omega}\bigl(q^{+1}_n(T^n\omega)\bigr) = q^0_{n+1}(T^{n+1}\omega).
\]
Similarly we find $q^{-1}_n \in B_\delta(-C_{n+1})$ with

\[
f_{n,T^n\omega}\bigl(q^{-1}_n(T^n\omega)\bigr) = q^0_{n+1}(T^{n+1}\omega).
\]

The maps Fn assume a local maximum

\[
m_n = \frac{2}{3}\Bigl(1+\frac{1}{n}\Bigr)\sqrt{\frac{1+n}{3}}
\]
at the point
\[
c_n = \sqrt{\frac{1+n}{3}}
\]
and a local minimum $-m_n$ at $-c_n$. Since, for large enough $n$, $q^{+1}_{n+1}(T^{n+1}\omega) > m_{n+1} + R$, there is exactly one point $q^{-2}_n(T^n\omega)$ such that $f_{n,T^n\omega}(q^{-2}_n(T^n\omega)) = q^{+1}_{n+1}(T^{n+1}\omega)$. Note that $f_{n,T^n\omega}$

is monotonically decreasing on $(-\infty,-c_n)$ and, for large enough $n$, $q^{-1}_n(T^n\omega) < -c_n$. Hence the fact that $q^{+1}_{n+1}(T^{n+1}\omega) > q^0_{n+1}(T^{n+1}\omega)$ implies

\[
q^{-2}_n(T^n\omega) < q^{-1}_n(T^n\omega).
\]

In a similar fashion one can construct the point $q^{+2}_n(T^n\omega) > q^{+1}_n(T^n\omega)$. Iterating this argument we get a two-sided sequence $(q^k_n(T^n\omega))_{k\in\mathbb Z}$, strictly increasing in $k$, such that
\[
\varphi(|k|, n; T^n\omega)\, q^k_n(T^n\omega) = q^0_{n+|k|}(T^{n+|k|}\omega).
\]
By construction, $Q_n(T^n\omega) = \{q^k_n(T^n\omega) \mid k \in \mathbb Z\}$ for all $n \ge n_0$ and

\[
Q_n(T^n\omega) = \varphi(n_0 - n, n; T^n\omega)^{-1}\, Q_{n_0}(T^{n_0}\omega)
\]

for $n < n_0$. Thus all the sets $Q_n$ are countably infinite, and hence the basins $B^\pm_n$ are a disjoint union of a countably infinite number of intervals.

As a side remark, we saw in Example 3.32 that $B^+_n(T^n\omega) \cup B^-_n(T^n\omega) \subseteq [-n-1, n+1]$. This means that the limits
\[
q^{\pm\infty}_n(T^n\omega) := \lim_{k\to\pm\infty} q^k_n(T^n\omega)
\]
exist in $\mathbb R$. Since $q^{\infty}_n(T^n\omega) > q^k_n(T^n\omega)$ for all $k \in \mathbb Z$, we get that $q^{\infty}_n(T^n\omega) \in B^\infty_n(T^n\omega)$, and similarly $q^{-\infty}_n(T^n\omega) \in B^\infty_n(T^n\omega)$, thus

\[
B^\infty_n(T^n\omega) = \bigl(-\infty,\, q^{-\infty}_n(T^n\omega)\bigr] \cup \bigl[q^{\infty}_n(T^n\omega),\, \infty\bigr).
\]
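The unbounded paths making up the divergent region are easy to reproduce numerically. The following sketch (not part of the thesis) iterates the stochastic approximation for $f(x) = x - x^3$ with zero noise, which is an admissible realization for any noise bound $R$, starting well outside the equilibria; the cubic overshoot makes the magnitudes explode.

```python
# Minimal numerical sketch: x_{n+1} = x_n + (f(x_n) + U_{n+1})/(n+1)
# with mean field f(x) = x - x^3 and noise set to zero, started far
# outside the invariant region. The magnitudes |x_n| blow up, which
# illustrates why f must be truncated outside a forward invariant
# interval before the results of the previous section apply.

def f(x):
    return x - x**3

def sa_step(x, n, u=0.0):
    # one stochastic approximation step with gamma_{n+1} = 1/(n+1)
    return x + (f(x) + u) / (n + 1)

x = 10.0  # |x_0| well outside [-1, 1]
mags = [abs(x)]
for n in range(5):
    x = sa_step(x, n)
    mags.append(abs(x))

print(mags)  # strictly increasing magnitudes
```

Only a handful of steps are shown because the magnitudes grow triply exponentially and would overflow floating point arithmetic shortly afterwards.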

3.4.4 Further remarks

Parameter-dependent mean fields. In classical bifurcation theory of autonomous ODEs one usually studies families $(f_\lambda)_{\lambda\in\Lambda}$ of vector fields depending continuously on some, potentially multi-dimensional, parameter $\lambda$ from a metric space $\Lambda$. Let $\Phi^\lambda$ be the flow of $\dot x = f_\lambda(x)$. Two parameters $\lambda$ and $\lambda'$, or rather the flows associated to them, are called topologically conjugate if there exists a homeomorphism that maps orbits of $\Phi^\lambda$ to orbits of $\Phi^{\lambda'}$ without changing the direction of time. We say the system bifurcates at $\lambda^* \in \Lambda$ if every neighborhood of $\lambda^*$ contains two parameters $\lambda$ and $\lambda'$ which are not topologically conjugate. We refer the reader to [52] for details. In one dimension this usually means that equilibria appear, disappear, collide, split or change their stability. A well-known example is given by $f_\lambda(x) = \lambda x - x^3$ with $\lambda \in \mathbb R$. For $\lambda > 0$, this system has two stable equilibria at $\pm\sqrt\lambda$ and an unstable one at $0$; if we decrease $\lambda$, they collide into a single saddle point $0$ at $\lambda^* = 0$, which becomes unstable for $\lambda < 0$. This is called a pitchfork bifurcation.

Assume the family $(f_\lambda)_{\lambda\in(a,b)}$ with $f_\lambda\colon \mathbb R \to \mathbb R$ undergoes a bifurcation at $\lambda^* \in (a,b)$ such that the number of stable fixed points changes and such that $f_\lambda$ is hyperbolic for all $\lambda \ne \lambda^*$. The aforementioned pitchfork example is such a system. Let $\varphi^\lambda$ be the stochastic approximation with mean field $f_\lambda$, where neither the $\gamma_n$ nor the $U_n$ depend on $\lambda$. Then $\varphi^\lambda$ undergoes a similar bifurcation, as the number of possible limit points changes according to Theorem 3.21. But our results show that $\varphi^\lambda$ can bifurcate at parameters not corresponding to a bifurcation in $\Phi^\lambda$. The bifurcation parameters calculated in Propositions 3.29 and 3.30 depend on $\lambda$, so for a fixed noise level $R$, a change in $\lambda$ might drop the critical value below $R$ or push it above.

In our pitchfork example $f_\lambda(x) = \lambda x - x^3$, the system $\Phi^\lambda$ has two stable equilibrium points $\pm\sqrt\lambda$ for $\lambda > 0$. Due to symmetry there exists only one bifurcation parameter $R^*(\lambda)$ in the sense of Theorem 3.28, which we can calculate as
\[
R^*(\lambda) = \frac{2\lambda}{3}\sqrt{\frac{\lambda}{3}}.
\]
Fixing a noise level, for example $R = 1$, this means that transitions between the stable equilibrium points are possible for $R^*(\lambda) < 1$ and impossible for $R^*(\lambda) > 1$. Solving for $\lambda$ finally gives us a bifurcation parameter
\[
\lambda^* = \frac{3}{2}\sqrt[3]{2} > 0
\]
for $(\varphi^\lambda)_{\lambda>0}$ which does not correspond to a bifurcation in $\Phi^\lambda$.
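The formula for $R^*(\lambda)$ and the subcritical behavior can be checked numerically. The sketch below is an illustration only (assumptions: uniform noise on $[-R,R]$ and step sizes $\gamma_n = 1/(n+50)$, the offset avoiding the transient blow-up discussed earlier); for $\lambda = 3$ we have $R^*(3) = 2 > R = 1$, so a path should settle near one of $\pm\sqrt 3$ without transitions.

```python
import math
import random

def r_star(lam):
    # critical noise level for the mean field f_lambda(x) = lambda*x - x^3
    return (2 * lam / 3) * math.sqrt(lam / 3)

def simulate(lam, R, steps, seed=0):
    # stochastic approximation with uniform noise on [-R, R];
    # gamma_n = 1/(n + 50) is an assumption made for this sketch
    rng = random.Random(seed)
    x = 0.1
    for n in range(steps):
        gamma = 1.0 / (n + 50)
        u = rng.uniform(-R, R)
        x += gamma * (lam * x - x**3 + u)
    return x

lam_star = (3 / 2) * 2 ** (1 / 3)  # solves R*(lam) = 1
x_end = simulate(lam=3.0, R=1.0, steps=200_000)
print(x_end)  # close to +sqrt(3) or -sqrt(3)
```

Note that for $R = 1 < R^*(3)$ the restoring drift dominates the worst-case noise on a whole subinterval between the equilibria, so a crossing is not merely unlikely but impossible once the path has settled.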

Stochastic approximations on the unit circle. Our results can be transferred to stochastic approximations on $S^1$. Let for example $f\colon S^1 \to S^1$ be a hyperbolic vector field on the unit circle $S^1 = \mathbb R / \mathbb Z$. A stochastic approximation algorithm with mean field $f$ takes the same form as (3.1), but with all operations modulo 1. Applying the coordinate transformation
\[
\psi\colon S^1 \to \mathbb R^2, \qquad x \mapsto \begin{pmatrix} \sin 2\pi x \\ \cos 2\pi x \end{pmatrix}
\]
to the $x_n$, we get the recursion
\[
y_{n+1} := \psi(x_{n+1}) = y_n + \gamma_{n+1}\bigl(f\circ\psi^{-1}(y_n) + U_{n+1}\bigr)\cdot 2\pi \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} y_n + O(\gamma_{n+1}^2),
\]
which we can bring to the form

\[
y_{n+1} = y_n + \gamma_{n+1}\bigl(g(y_n) + V_{n+1} + b_{n+1}\bigr)
\]
with a martingale difference sequence $(V_n)_{n\in\mathbb N}$ and an adapted process $(b_n)_{n\in\mathbb N}$ such that $b_n \to 0$ a.s. As mentioned in [13, Remark 4.5], the Limit Set Theorem remains true for this type of recursion. Since $|y_n| = 1$ for all $n \in \mathbb N$, we can deduce that $(y_n)_{n\in\mathbb N}$, and hence $(x_n)_{n\in\mathbb N}$, converges almost surely to an equilibrium of $g$ or $f$, respectively. Let $\varphi$ be the flow of the stochastic approximation on $S^1$. We can apply the same techniques as in Section 3.4.2 to obtain the following result; the formal proof is left to the reader. We assume that $S^1$ is equipped with the usual metric

\[
d(x,y) := \min_{n\in\mathbb Z} |x - y - n|.
\]
Theorem 3.36. Let $f\colon S^1 \to S^1$ be a continuously differentiable and hyperbolic vector field with exactly one stable fixed point $s$ and one unstable fixed point $z$. Then there is $R^* > 0$ such that the following assertions hold.

(a) If $R < R^*$, there exist $n_0 \in \mathbb N_0$ and $\varepsilon > 0$ such that for all $n \ge n_0$, $k \in \mathbb N$, $x \in B_\varepsilon(s)$ and almost every $\omega \in \Omega$,
\[
\varphi(k,n;T^n\omega)x \notin B_\varepsilon(z).
\]

(b) If $R > R^*$, then for every $\varepsilon > 0$ and $n_0 \in \mathbb N$ there is a set $A \in \mathcal A$ of positive measure such that for every $\omega \in A$ and $x \in (s-\varepsilon, s]$ there exist $n \ge n_0$ and $k \ge 2$ with $\varphi(k,n;T^n\omega)x \in [s, s+\varepsilon)$ and
\[
\varphi(l,n;T^n\omega)x \notin B_\varepsilon(z) \quad\text{for } l = 1,\dots,k-1.
\]

If we choose $\varepsilon$ and $n_0$ in the second case such that $\gamma_n \sup_{S^1} |f| < \varepsilon$ for all $n \ge n_0$, then on the set $A$ all paths will eventually wrap around the whole unit circle in counterclockwise direction. In other words, Theorem 3.36 says that for $R > R^*$ there will be paths that go counterclockwise around the circle with positive probability, whereas for $R < R^*$ the unstable point $z$ forms a barrier that cannot be crossed. Of course the result remains true if we formulate it for the clockwise direction. An alternative approach is to extend $f$ to a 1-periodic function on $\mathbb R$. Then case (b) of Theorem 3.36 corresponds to a transition between neighboring stable fixed points which are a distance of 1 apart. If $f$ has more than one stable fixed point, we can generalize Theorem 3.36 in the fashion of Theorem 3.28, where we have to take both clockwise and counterclockwise transits into account. This is easier to formulate for periodic functions. Again, the formal proof is omitted and left to the reader.

Theorem 3.37. Let $f\colon \mathbb R \to \mathbb R$ be periodic and hyperbolic. Then the assertions of Theorems 3.16 and 3.28 hold true.
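The subcritical case (a) of Theorem 3.36 can be illustrated with a concrete circle field. The following sketch (my own choice of example, not the thesis's): $f(x) = -\sin(2\pi x)/(2\pi)$ on $S^1 = \mathbb R/\mathbb Z$ has one stable fixed point $s = 0$ and one unstable fixed point $z = 1/2$, with $\sup_{S^1}|f| = 1/(2\pi)$; for a noise bound well below that, the path settles at $s$ and never approaches $z$.

```python
import math
import random

def f(x):
    # 1-periodic mean field with stable point 0 and unstable point 1/2
    return -math.sin(2 * math.pi * x) / (2 * math.pi)

def circle_dist(x, y):
    # the usual metric d(x, y) = min_n |x - y - n| on R/Z
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def simulate(R, steps, x0=0.3, seed=1):
    rng = random.Random(seed)
    x = x0
    for n in range(steps):
        u = rng.uniform(-R, R)
        x = (x + (f(x) + u) / (n + 2)) % 1.0  # all operations modulo 1
    return x

x_end = simulate(R=0.05, steps=100_000)
print(circle_dist(x_end, 0.0))  # small: the path settled at s = 0
```

For $R$ well above $1/(2\pi)$ the barrier at $z$ can be crossed in both directions, which is the wrapping behavior of case (b); a robust numerical demonstration of that would require estimating transit probabilities and is omitted here.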

3.5 A bifurcation arising from touchpoints

In this section we study a stochastic approximation of the form
\[
x_{n+1} = x_n + \frac{1}{n+1}\bigl(f(x_n) + U_{n+1}\bigr) \tag{3.23}
\]
where $f\colon \mathbb R \to [0,\infty)$ fulfills $f(0) = 0$ and $f(x) > 0$ for $x \ne 0$.

This means that $f$ has only one fixed point, which is not hyperbolic. In fact we do not even need to assume smoothness of $f$ in this section; we only retain the assumption that $f$ is continuous and that the mean field ODE $\dot x = f(x)$ has solutions for all initial values $x_0$ and times $t \ge 0$. This question is motivated by [60], where Pemantle discusses the possibility of convergence to these so-called touchpoints in the context of urn models. Pemantle's proof relies on the specific form of the noise variables of the urn process. By applying a different strategy to the problem, we can extend Pemantle's result to a more general case while significantly shortening the proofs.

As in the previous subsections we assume that $(U_n)_{n\in\mathbb N}$ is a martingale difference sequence. We still assume the richness condition (3.11) for the noise, but it need no longer be bounded. Instead we only assume a uniform conditional $L^2$ bound: precisely, we assume there exist $\sigma_0, \sigma_1 > 0$ such that

\[
\sigma_0^2 \le \mathbb E\bigl[|U_{n+1}|^2 \mid \mathcal F_n\bigr] \quad\text{and}\quad \mathbb E\bigl[|U_{n+1}|^2 \mid \mathcal F_n\bigr] \le \sigma_1^2
\]

almost surely for all $n \in \mathbb N$. This choice of the $U_n$ makes sure that for every $n \in \mathbb N$ the limit
\[
Z_n := \sum_{k=n}^\infty \frac{1}{k}\, U_k
\]
exists by the Martingale Convergence Theorem B.12.

Theorem 3.38. (a) If there exists $0 < \alpha < \frac12$ such that $f(x) \le \alpha|x|$ for all $x \in \mathbb R$, then for large enough initial value $x_0$ we have $\mu\{x_n \to 0\} > 0$.

(b) If there are $\delta, \varepsilon > 0$ such that $\mu\{Z_n < 0\} < \varepsilon$ for all $n \in \mathbb N$ and $f(x) \ge \frac12 |x|$ for all $x \in [-\delta,\delta]$, then $\mu\{x_n \to 0\} = 0$ for every choice of $x_0$.

If we choose $f(x) = \alpha|x|$, this result implies a bifurcation of the corresponding stochastic approximation (3.23) with critical value $\alpha_{\mathrm{crit}} = \frac12$, which does not correspond to a bifurcation of the mean field ODE $\dot x = \alpha|x|$. Moreover, this bifurcation is not induced by the noise size; in fact the noise need not even be bounded in this case.

In order to prove this result we will need the following lemma, in which one can see the relevance of the critical value $\alpha = \frac12$. We will later use it to estimate $L^2$ norms of a certain martingale and then apply Doob's inequality and Theorem B.15. For $\alpha \in (0,\frac12)$ and $k, n \in \mathbb N$ we write
\[
p_n(\alpha) := \prod_{i=1}^{n} \Bigl(1 - \frac{\alpha}{i}\Bigr) \quad\text{and}\quad q_{k,n} := \prod_{i=n+1}^{n+k} \Bigl(1 - \frac{1}{2i}\Bigr). \tag{3.24}
\]

Lemma 3.39. For every $\alpha < \frac12$ we have
\[
c(\alpha) := \sum_{k=1}^\infty \frac{1}{k^2\, p_k(\alpha)^2} < \infty
\]
and, for every $n \in \mathbb N$,
\[
\sum_{k=1}^\infty \frac{1}{(n+k)^2\, q_{k,n}^2} = \infty.
\]

Proof. Let $\alpha < \frac12$ and choose some $\lambda \in (\alpha, \frac12)$. Since

\[
1 - \alpha x = (1-\lambda x) + (\lambda-\alpha)x = \exp(-\lambda x) + (\lambda-\alpha)x + O(x^2),
\]
there are $k_0 \in \mathbb N$ and a constant $C > 0$ such that
\[
p_k(\alpha) \ge C \exp\Bigl(-\lambda \sum_{i=1}^k \frac{1}{i}\Bigr) \ge C\,(ek)^{-\lambda}
\]

for all $k \ge k_0$, where we used the estimate $\sum_{i=1}^k \frac{1}{i} \le 1 + \log k = \log(ek)$. Thus
\[
\sum_{k=k_0}^\infty \frac{1}{k^2\, p_k(\alpha)^2} \le \frac{e^{2\lambda}}{C^2} \sum_{k=k_0}^\infty \frac{1}{k^{2-2\lambda}} < \infty,
\]
which proves $c(\alpha) < \infty$.

For the second claim, we can use the estimates $1 - x \le e^{-x}$ and

\[
\sum_{i=n+1}^{n+k} \frac{1}{i} \ge \log(n+k) - \log(n+1)
\]
to obtain

\[
q_{k,n} \le \exp\Bigl(-\frac12 \sum_{i=n+1}^{n+k} \frac{1}{i}\Bigr) \le \exp\Bigl(-\frac12\bigl(\log(n+k) - \log(n+1)\bigr)\Bigr) = \sqrt{n+1}\,(n+k)^{-\frac12}.
\]

Then we use this inequality to see that

\[
\sum_{k=1}^\infty \frac{1}{(n+k)^2\, q_{k,n}^2} \ge \frac{1}{n+1} \sum_{k=1}^\infty \frac{1}{n+k} = \infty. \qquad\square
\]
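Both claims of the lemma are easy to sanity-check numerically. The sketch below (illustration only) accumulates partial sums of the two series for $\alpha = 0.3$ and $n = 5$: the $p_k$-series barely grows between $K = 10^4$ and $K = 10^5$, while the $q_{k,n}$-series keeps growing like a harmonic series.

```python
# Numerical check of Lemma 3.39: partial sums of
#   sum_k 1/(k^2 p_k(alpha)^2)      (converges for alpha < 1/2)
#   sum_k 1/((n+k)^2 q_{k,n}^2)     (diverges like a harmonic series)

def partial_sums(alpha, n, K):
    p, s_p = 1.0, 0.0
    q, s_q = 1.0, 0.0
    for k in range(1, K + 1):
        p *= 1.0 - alpha / k          # p_k(alpha)
        s_p += 1.0 / (k * p) ** 2
        q *= 1.0 - 0.5 / (n + k)      # q_{k,n}
        s_q += 1.0 / ((n + k) * q) ** 2
    return s_p, s_q

s_p_4, s_q_4 = partial_sums(0.3, n=5, K=10_000)
s_p_5, s_q_5 = partial_sums(0.3, n=5, K=100_000)
print(s_p_5 - s_p_4, s_q_5 - s_q_4)  # tiny increment vs. log-sized increment
```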

Proof of Theorem 3.38. (a) Assume x0 < 0 and An := {x0, . . . , xn < 0}. Because of f (x) ≤ α|x|, we have the estimate

\[
x_{k+1} \le \Bigl(1 - \frac{\alpha}{k+1}\Bigr) x_k + \frac{1}{k+1}\, U_{k+1}
\]
on the set $A_n$ for $k = 1,\dots,n$. Iterating this inequality yields that on $A_n$, for $k = 1,\dots,n+1$,

\[
x_k \le p_k(\alpha)\,(x_0 + M_k),
\]
where $p_k(\alpha)$ is as in (3.24) and

\[
M_k := \sum_{j=1}^k \frac{1}{j\, p_j(\alpha)}\, U_j.
\]

This means that we have the inclusion

\[
A_n \supseteq \{M_1,\dots,M_n < -x_0\}.
\]

From Lemmas B.10 and 3.39 it follows that
\[
\mathbb E[M_n^2] \le \sigma_1^2 \sum_{k=1}^\infty \frac{1}{k^2\, p_k(\alpha)^2} \le \sigma_1^2\, c(\alpha)
\]
for all $n \in \mathbb N$. Thus we can apply Doob's inequality (Theorem B.11) to get the estimate
\[
\mu(\Omega \setminus A_n) \le \mu\Bigl\{\max_{k=1,\dots,n} |M_k| > -x_0\Bigr\} \le \frac{\sigma_1^2\, c(\alpha)}{x_0^2}.
\]

Note that the sequence $(A_n)_{n\in\mathbb N}$ is decreasing and that on the limit
\[
A_\infty := \bigcap_{n\in\mathbb N} A_n = \{x_n < 0 \text{ for all } n \in \mathbb N\}
\]
we have $x_n \to 0$ a.s. by Theorems 3.6 and 3.7. Thus

\[
\mu\{x_n \to 0\} \ge \mu(A_\infty) \ge 1 - \frac{\sigma_1^2\, c(\alpha)}{x_0^2},
\]
the right hand side of which is positive for $x_0 < -\sigma_1\sqrt{c(\alpha)}$.

(b) Assume $\mu\{x_n \to 0\} > 0$ and let $\delta, \varepsilon$ be as in the assumption. Then there is $m \in \mathbb N$ such that $\mu(A) > 0$ with

\[
A = \{x_n \in [-\delta,\delta] \text{ for all } n \ge m \text{ and } x_n \to 0\} \in \mathcal F_\infty.
\]

According to Corollary B.14 we can find $n \ge m$ and $B \in \mathcal F_n$ such that $\mu(B) > 0$ and
\[
\mu(A \mid B) \ge 1 - \frac{\varepsilon}{2}. \tag{3.25}
\]
Let
\[
\tau := \inf\{k > n \mid x_k > 0\}.
\]
Obviously $\tau$ is bounded from below by $n+1$, and we want to see that it is a.s. finite. Let $C := \{\tau = \infty\}$. In a similar fashion to part (a) we can get the estimate

\[
x_{n+k} \ge q_{k,n}\,(x_n + N_k)
\]
with $q_{k,n}$ as in (3.24) and
\[
N_k := \sum_{i=1}^k \frac{1}{(n+i)\, q_{i,n}}\, U_{n+i}.
\]

Lemma 3.39 and Theorem B.15 imply that $\limsup_{k\to\infty} N_k = \infty$. But this means that $N_k > -x_n$ for some $k$ and thus $x_{n+k} > 0$, a contradiction to $\tau = \infty$. So we conclude that $\mu(C) = 0$, or equivalently $\tau < \infty$ a.s. Note that for all $k \in \mathbb N$ the estimate
\[
x_{\tau+k} \ge x_\tau + \sum_{i=\tau+1}^{\tau+k} \frac{1}{i}\, U_i
\]
holds. On the set $A$ this implies

\[
0 = \lim_{k\to\infty} x_{\tau+k} \ge x_\tau + Z_{\tau+1} > Z_{\tau+1}.
\]
Combining this with (3.25) we get
\[
\mu(Z_{\tau+1} < 0 \mid B) \ge \mu(A \mid B) \ge 1 - \frac{\varepsilon}{2}.
\]

As $\tau \ge n+1$ and $B \in \mathcal F_n$, we have $B \in \mathcal F_\tau$. On the other hand, Lemma B.18 implies that $\mu(Z_{\tau+1} > 0 \mid \mathcal F_\tau) > \varepsilon$, a contradiction. $\square$
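Part (a) of Theorem 3.38 can be seen at work numerically. The sketch below is an illustration under assumptions of my own choosing ($f(x) = \alpha|x|$ with $\alpha = 0.3 < \frac12$ and i.i.d. uniform noise on $[-1,1]$, which is a martingale difference sequence): started far enough to the left, most paths stay negative and are then contracted towards the touchpoint $0$ at the polynomial rate $p_k(\alpha) \sim k^{-\alpha}$.

```python
import random

# Simulate x_{k+1} = x_k + (alpha*|x_k| + U_{k+1})/(k+1) and record
# whether the path ever leaves the negative half line. Paths that
# stay negative satisfy x_k = p_k(alpha)(x_0 + M_k) and shrink to 0.

def run(alpha, x0, steps, seed):
    rng = random.Random(seed)
    x = x0
    for n in range(steps):
        u = rng.uniform(-1.0, 1.0)
        x += (alpha * abs(x) + u) / (n + 1)
        if x >= 0.0:          # path left the negative half line
            return None
    return x

results = [run(alpha=0.3, x0=-5.0, steps=20_000, seed=s) for s in range(20)]
stayed = [x for x in results if x is not None]
print(len(stayed), min(stayed) if stayed else None)
```

The fraction of paths that stay negative is close to the lower bound $1 - \sigma_1^2 c(\alpha)/x_0^2$ from the proof; a path that crosses $0$ is pushed away from the touchpoint instead.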

3.6 Conclusions and outlook

Sections 3.4 and 3.5 are somewhat complementary: the first describes the dynamics of stochastic approximations with only hyperbolic limit points, while the second studies touchpoints, which are necessarily non-hyperbolic. In situations where both can occur, one may be able to combine the techniques from both scenarios to obtain a description of the dynamics. This, together with the modifications mentioned in Section 3.4.4, provides a machinery to understand a large class of one-dimensional stochastic approximations from a dynamical systems point of view. This includes e.g. some generalized Pólya urn models. On the other hand, many examples, some of which we have presented in more detail, live in higher dimensions. Our analysis can thus be understood as a starting point for developing a general understanding of the dynamics of stochastic approximations. Let us discuss a few challenges that might occur.

We only dealt with equilibrium points, but in higher dimensions limit sets of paths of the stochastic approximation can have a much more complicated geometry. For example, Sato et al. show in [68] that the replicator dynamics of a 3x3 bimatrix game can be chaotic, and thus the limiting objects of a reinforcement learning algorithm as described in Section 3.2.2 are potentially very complex. But even fixed points need not fall into the categories described above, as they can have stable, unstable and even neutral directions at the same time.

In order to prove that the transition probability is strictly positive in Proposition 3.29, we used an extremal noise realization, where in every step the noise pushes towards the target equilibrium with maximal strength. In higher dimensions such a straightforward approach is not sufficient in general. The following example illustrates this point.

Example 3.40. Consider the 2-dimensional parameterized mean field

\[
f_\varepsilon\colon \mathbb R^2 \to \mathbb R^2, \qquad f_\varepsilon(x,y) = \begin{pmatrix} e^{-y^2}(x - x^3) \\ -\varepsilon y \end{pmatrix}
\]
with $\varepsilon > 0$, and assume the $U_n$ are independent and uniformly distributed on $B_{\le R}(0)$. This vector field has three equilibrium points: two stable ones, $s^0 = (-1,0)$ and $s^1 = (1,0)$, and an unstable one at the origin. Assume we want to construct a path from $s^0$ to $s^1$ using extremal noise realizations. Attempting to do so in the most direct way, i.e. $U_n = (R,0)$, results in the equations

\[
x_{n+1} = x_n + \gamma_{n+1}(x_n - x_n^3 + R), \qquad y_{n+1} = 0,
\]

but according to Theorem 3.28 there is a critical value $R^*$ such that $x_n$ will never get close to $s^1$ if $R < R^*$. Assume instead that we start at a point $x_N = (-1, y_N)$ with $y_N > 0$; then we are able to construct a path towards $(1, \bar y)$ with extremal noise realizations $(R,0)$ provided $e^{-\bar y^2} R^* < R$. From $(1,\bar y)$ it is then easy to get to $s^1$. Now if $\bar y < \frac{R}{\varepsilon}$, we can find $N \in \mathbb N$ such that for $U_1,\dots,U_N = (0,R)$ and $(x_0,y_0) = s^0$ we have $y_N > \bar y$, and thus we have found a path from $s^0$ to $s^1$. Hence the number $R_\varepsilon$ that solves the equation
\[
R_\varepsilon = \exp\Bigl(-\frac{R_\varepsilon^2}{\varepsilon^2}\Bigr)\, R^*
\]

is an upper bound for the critical noise size of $f_\varepsilon$, which is strictly smaller than $R^*$. It is not hard to see that $R_\varepsilon$ is in fact equal to the critical value.
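The two strategies of Example 3.40 can be compared numerically. The sketch below is an illustration under simplifying assumptions not made in the thesis: $\varepsilon = 0.01$, noise bound $R = 0.2 < R^* = 2/(3\sqrt3) \approx 0.385$, and a small constant step size instead of $\gamma_n \to 0$. The direct extremal strategy $U = (R,0)$ stalls before the barrier, while first spending the noise budget on $U = (0,R)$ to raise $y$ weakens the $x$-drift enough to cross.

```python
import math

EPS, R, DT = 0.01, 0.2, 0.01   # illustrative parameter choices

def step(x, y, ux, uy):
    # one Euler step of the noisy mean field dynamics
    dx = math.exp(-y * y) * (x - x**3) + ux
    dy = -EPS * y + uy
    return x + DT * dx, y + DT * dy

def direct(steps=20_000):
    x, y = -1.0, 0.0
    for _ in range(steps):
        x, y = step(x, y, R, 0.0)    # push in x only
    return x

def detour(steps_up=600, steps_across=5_000):
    x, y = -1.0, 0.0
    for _ in range(steps_up):        # phase 1: push y up
        x, y = step(x, y, 0.0, R)
    for _ in range(steps_across):    # phase 2: push x across
        x, y = step(x, y, R, 0.0)
    return x

x_direct = direct()
x_detour = detour()
print(x_direct, x_detour)  # stalls below 0 vs. crosses towards s^1
```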

For practical purposes one might want good estimates of transition probabilities, or even a numerical way of calculating them. This is a non-trivial task. Assume that for a given path $(x_n)$ of the stochastic approximation we have a way of saying "a tipping event occurred before time $n$"; then the probability $p_n$ of this happening can be approximated with numerical methods, at least by a brute force approach. But then it is not clear how good an approximation $p_n$ is for the value $\lim_{k\to\infty} p_k$. Some analytic estimates, for example on the convergence speed of the $p_n$, or some recursive relation will be necessary. At this point, the best estimate we could produce is based on the extremal noise realizations described above, giving lower bounds on $p_n$; but Figure 3 shows that paths oscillate during transitions, so extremal solutions are not the typical case. This is a hint that such estimates are very poor.

4 Rate-induced tipping in random systems

4.1 Deterministic R-tipping

We give an introduction to rate-induced tipping for solutions of ODEs, based on the work of Ashwin, Wieczorek and coworkers ([2, 7, 9]). Our goal is to develop an equivalent theory in the context of random dynamical systems. The framework first provided in [9] is that of a parameter shift system, which is a non-autonomous ODE of the form
\[
\dot x_t = f(x_t, \Lambda(rt)) \tag{4.1}
\]
where $f\colon \mathbb R^d \times [\lambda_-, \lambda_+] \to \mathbb R^d$ is $C^1$ and $\Lambda$ is a smooth parameter shift.

Definition 4.1. Let $-\infty < \lambda_- < \lambda_+ < \infty$. A parameter shift from $\lambda_-$ to $\lambda_+$ is a continuous map $\Lambda\colon \mathbb R \to [\lambda_-, \lambda_+]$ such that
\[
\lim_{t\to\pm\infty} \Lambda(t) = \lambda_\pm. \tag{4.2}
\]
We call it a smooth parameter shift if $\Lambda$ is $C^2$ and additionally $\lim_{t\to\pm\infty} \Lambda'(t) = 0$.

Property (4.2) can be understood in the sense that the non-autonomous ODE (4.1) is close to the autonomous ODE $\dot x_t = f(x_t, \lambda_-)$ for small $t < 0$ and close to $\dot x_t = f(x_t, \lambda_+)$ for large $t > 0$. In the terminology of [63], the system (4.1) is asymptotically autonomous.

Recall that a compact set $A \subseteq \mathbb R^d$ is an attractor for some flow $\psi = (\psi_t)_{t\in\mathbb R}$ if it is invariant under $\psi$ and there is $\eta > 0$ such that

\[
\mathrm{dist}\bigl(\psi_t B_\eta(A), A\bigr) \xrightarrow{t\to\infty} 0.
\]
If additionally there are $c, v > 0$ such that
\[
d\bigl(\psi_t(x), A\bigr) \le c\, e^{-vt}\, d(x, A) \tag{4.3}
\]
for all $x \in B_\eta(A)$ and $t > 0$, we say the attractor is exponential. Here we are particularly interested in the flows $\psi^\lambda$ generated by solutions of the ODE $\dot x_t = f(x_t, \lambda)$. If $A_-$ is an attractor for $\psi^{\lambda_-}$, then the following notion, introduced in [2], loosely speaking indicates that the attractor persists if we change the parameter continuously from $\lambda_-$ to $\lambda_+$. We write $\mathcal K$ for the set of compact subsets of $\mathbb R^d$, endowed with the Hausdorff distance.

Definition 4.2. A stable branch is a continuous map $X\colon [\lambda_-, \lambda_+] \to \mathcal K$ such that $X(\lambda)$ is an exponentially stable attractor for $\psi^\lambda$, with the constants $c, v, \eta$ in (4.3) not depending on $\lambda \in [\lambda_-, \lambda_+]$.

The parameter shift system also generates a non-autonomous flow $\phi$ in the following way. If $\tau \mapsto x_\tau$ is the solution of (4.1) with initial value $x_t = x$, then we define $\phi(s,t)x := x_{t+s}$, so $t$ refers to the starting time and $s \in \mathbb R$ is the elapsed time. We give definitions of invariant sets and pullback attractors for parameter shift systems which are deterministic non-autonomous counterparts to the corresponding objects defined in Chapter 2.2.

4In the terminology of Ashwin et al ([2, 7]), parameter shift always refers to what we call a smooth parameter shift. But in our time discrete setting continuity will be sufficient for all results.

Definition 4.3. A set valued map $A\colon \mathbb R \to \mathcal K$ is called invariant under $\phi$ if $\phi(s,t)A_t = A_{s+t}$ for all $s,t \in \mathbb R$. We say it is a pullback attractor if it is invariant and if there is a bounded open set $U$ with $\lim_{t\to-\infty} A_t \subseteq U$ and

\[
\lim_{s\to\infty} \mathrm{dist}\bigl(\phi(s, t-s)U, A_t\bigr) = 0
\]
for all $t \in \mathbb R$.

The main result in [2] shows that for any given attractor A− of the past limit system we can find a non-autonomous attractor for φ whose backwards limit is contained in A−. The way we present it here is a conglomerate of the results in [2, Section B].

Theorem 4.4. Let $A_-$ be an attractor for $\psi^{\lambda_-}$. Then for all small enough $\eta > 0$ the set valued map $A^{[\Lambda,r,A_-]}$, defined via

\[
A^{[\Lambda,r,A_-]}_t := \lim_{s\to\infty} \phi(s, t-s)\, B_\eta(A_-),
\]

is an attractor for $\phi$ with $\lim_{t\to-\infty} A^{[\Lambda,r,A_-]}_t \subseteq A_-$.

If we imagine the attractor $A_-$ to live at time $t = -\infty$, the non-autonomous attractor $A^{[\Lambda,r,A_-]}$ is in some sense its natural extension to times $t > -\infty$. Assuming that $X$ is a stable branch from $A_-$ to some attractor $A_+$ of $\psi^{\lambda_+}$, we can pose the question: does $A^{[\Lambda,r,A_-]}_t$ extend the attractor $A_+$ in the same way? Unsurprisingly, the answer depends on $\Lambda$ and $r$. In [2, Definition III.1], three different cases are distinguished.

Definition 4.5. Let $X$ be a stable branch from $A_-$ to $A_+$ and let $A^{[\Lambda,r,A_-]}_\infty := \lim_{t\to\infty} A^{[\Lambda,r,A_-]}_t$. We say there is
• end-point tracking if $A^{[\Lambda,r,A_-]}_\infty \subseteq A_+$,
• total tipping if $A^{[\Lambda,r,A_-]}_\infty \cap A_+ = \emptyset$,
• partial tipping otherwise.
We say there is tipping if there is either partial or total tipping.

For a fixed parameter shift $\Lambda$, all three cases can potentially occur for different rates, as demonstrated in Section IV of [2]. In analogy to the definition of critical values for bifurcations, we say that $r^*$ is a critical rate if every $\varepsilon > 0$ admits a rate $r \in B_\varepsilon(r^*)$ such that the tipping behaviors of $r$ and $r^*$ differ. The following result shows that end-point tracking is always possible.

Theorem 4.6. In the situation of Definition 4.5, for every parameter shift $\Lambda$ there is an $r^* > 0$ such that there is end-point tracking whenever $r < r^*$.
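End-point tracking versus tipping is easy to visualize on a one-dimensional example. The sketch below uses the saddle-node normal form $\dot x = (x + \Lambda(rt))^2 - 1$ with a tanh parameter shift, a standard model in the R-tipping literature rather than one taken from this thesis: for a slow rate the solution started on the past attractor $x = -\Lambda - 1$ end-point tracks it to $-\lambda_+ - 1$, while a fast rate makes it overshoot the unstable branch and blow up.

```python
import math

LAM_PLUS = 3.0

def shift(t):
    # smooth parameter shift from 0 to lambda_+
    return LAM_PLUS * (1.0 + math.tanh(t)) / 2.0

def solve(r, t0=-60.0, t1=60.0, dt=1e-3):
    # Euler integration of dx/dt = (x + Lambda(r t))^2 - 1,
    # started on the past stable branch x = -Lambda - 1
    x = -shift(r * t0) - 1.0
    t = t0
    while t < t1:
        x += dt * ((x + shift(r * t)) ** 2 - 1.0)
        t += dt
        if x > 10.0:
            return None   # tipped: solution escapes to +infinity
    return x

x_slow = solve(r=0.1)     # end-point tracking: x ends near -lambda_+ - 1 = -4
x_fast = solve(r=10.0)    # rate-induced tipping
print(x_slow, x_fast)
```

The blow-up in the fast case happens because, after a near-instantaneous shift, the initial condition lies above the unstable equilibrium $-\Lambda - 1 + 2$ of the future system, mirroring the mechanism behind Theorem 4.6: for small enough rates this can never happen.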

4.2 Asymptotically autonomous NRDS

In preparation for a formal study of tipping in a random context, we first establish a few results about asymptotically autonomous NRDS. Assume the NRDS generated by the random continuous maps $(f_n)_{n\in\mathbb Z}$ on $X = \mathbb R^d$ over the ergodic base $(\Omega, \mathcal A, \mu, T)$ converges to a limiting autonomous RDS $T \ltimes g$ in the following sense. Let $\mathcal K$ be the set of all compact subsets of $\mathbb R^d$ and define
\[
\beta_n(K) := \operatorname*{ess\,sup}_{\omega\in\Omega}\, \sup_K |f_{n,\omega} - g_\omega|
\]

for $K \in \mathcal K$ and $n \in \mathbb Z$. We say that $f_n$ converges uniformly on compact sets and uniformly in $\omega \in \Omega$ to the past limit $g$ if $\beta_n(K) \to 0$ as $n \to -\infty$ for every $K \in \mathcal K$.

Definition 4.7. Let $A_-$ be a random compact set and $(A_n)_{n\in\mathbb Z}$ a NARS.
(a) We say that $(A_n)_{n\in\mathbb Z}$ past-tracks $A_-$ uniformly if
\[
\lim_{n\to-\infty}\ \operatorname*{ess\,sup}_{\omega\in\Omega}\ \mathrm{dist}\bigl(A_n(T^n\omega), A_-(T^n\omega)\bigr) = 0. \tag{4.4}
\]

(b) We say that $(A_n)_{n\in\mathbb Z}$ weakly past-tracks $A_-$ if $\mathrm{dist}(A_n, A_-) \to 0$ in probability as $n \to -\infty$.

It is clear that uniform tracking implies weak tracking. Conversely, if $(A_n)_{n\in\mathbb Z}$ is weakly past-tracking, then there is an increasing sequence $n_1 < n_2 < \dots$ of natural numbers such that
\[
\mathrm{dist}\bigl(A_{-n_k}(T^{-n_k}\omega), A_-(T^{-n_k}\omega)\bigr) \xrightarrow{k\to\infty} 0
\]
for a.e. $\omega \in \Omega$. Convergence to a constant in probability and in distribution are equivalent, so weak past-tracking could equivalently be defined as $\mathrm{dist}(A_n, A_-) \to 0$ in distribution. We say that a given random set $U$ is essentially bounded if there is a compact set $K \in \mathcal K$ such that $U \subseteq K$ a.s.; in other words, the essential supremum of the random variables $\sup_{u\in U} |u|$ is finite. The following simple method of detecting weak past-tracking will be used later on.

Lemma 4.8. Assume the random compact set $A_-$ attracts an essentially bounded neighborhood $U$ of itself. If $(A_n)_{n\in\mathbb Z}$ is an invariant NARS such that for a.e. $\omega \in \Omega$ there is $n_0(\omega) \in \mathbb Z$ with $A_n(\omega) \subseteq U(\omega)$ for all $n \le n_0(\omega)$, then $(A_n)_{n\in\mathbb Z}$ weakly past-tracks $A_-$.

Proof. First we show that
\[
A_{-\infty}(\omega) := \lim_{n\to-\infty} A_n(\omega)
\]
defines a compact invariant random set for $T \ltimes g$. To see this, let $K \in \mathcal K$ be a compact set with $U \subseteq K$ a.s. and denote $C_n(\omega) := \mathrm{cl}\bigl(\bigcup_{k\le n} A_k(\omega)\bigr)$. For every $\varepsilon > 0$ there is $n_1 \in \mathbb Z$ with $\beta_n(K) \le \varepsilon$ for all $n \le n_1$. Thus,

\[
d_h\bigl(g_\omega A_n(\omega), A_{n+1}(T\omega)\bigr) = d_h\bigl(g_\omega A_n(\omega), f_{n,\omega} A_n(\omega)\bigr) \le \varepsilon
\]
whenever $n \le \min\{n_0(\omega), n_1\}$. This implies that also

\[
d_h\bigl(g_\omega C_n(\omega), C_{n+1}(T\omega)\bigr) \le \varepsilon
\]
for those $n$ and hence

\[
d_h\bigl(g_\omega A_{-\infty}(\omega), A_{-\infty}(T\omega)\bigr) = \lim_{n\to-\infty} d_h\bigl(g_\omega C_n(\omega), C_{n+1}(T\omega)\bigr) \le \varepsilon
\]
by Lemma A.4 and Theorem A.3. As $\varepsilon > 0$ was arbitrary, this implies $g_\omega A_{-\infty}(\omega) = A_{-\infty}(T\omega)$ for a.e. $\omega \in \Omega$. Let $\Omega U$ be the omega limit of $U$. Lemma 2.21 implies that $\Omega U \subseteq A_-$. Since $A_{-\infty}$ is invariant and $A_{-\infty} \subseteq U$ a.s., we have

\[
A_{-\infty} = \Omega A_{-\infty} \subseteq \Omega U \subseteq A_-.
\]

Let again $\varepsilon > 0$. Lemma A.5 gives us an a.s. defined random variable $-\infty < n_\varepsilon \le n_0$ with

\[
\mathrm{dist}\bigl(A_n(\omega), A_{-\infty}(\omega)\bigr) \le \varepsilon
\]
whenever $n < n_\varepsilon(\omega)$, and thus

\[
\mu\{\mathrm{dist}(A_n, A_{-\infty}) < \varepsilon\} \ge \mu\{n_\varepsilon > n\} \xrightarrow{n\to-\infty} 1.
\]

Knowing that $A_{-\infty} \subseteq A_-$ a.s., this implies that $\mathrm{dist}(A_n, A_-) \to 0$ in probability. $\square$

The rest of this subsection is dedicated to the question whether for a given invariant random set $A_-$ of the limiting system $T \ltimes g$ there exists a tracking NARS. We will focus our attention on the case that $A_-$ is a local attractor. The first result gives a rather simple condition for the existence of a weakly past-tracking NARS.

Theorem 4.9. Assume that the invariant random compact set $A_-$ attracts an essentially bounded random compact neighborhood $U$ of itself which has the property that there exists $\varepsilon > 0$ with $B_{\le\varepsilon}(g_\omega U(\omega)) \subseteq U(T\omega)$ for a.e. $\omega \in \Omega$. Then there exist a compact, invariant NARS $(A_n)_{n\in\mathbb Z}$ and $n_0 < 0$ with the following properties.
(a) $A_n(\omega) = \bigcap_{k\in\mathbb N} \varphi(k, n-k; T^{-k}\omega)\, U(T^{-k}\omega)$ for $n \le n_0$ and a.e. $\omega \in \Omega$.
(b) $A_n \subseteq U$ for $n \le n_0$.
(c) $(A_n)_{n\in\mathbb Z}$ weakly past-tracks $A_-$.
(d) $(A_n)_{n\in\mathbb Z}$ attracts $U$.
(e) If $(C_n)_{n\in\mathbb Z}$ is a NARS weakly past-tracking $A_-$, then $C_n \subseteq A_n$ a.s.

Proof. Let $K \subseteq \mathbb R^d$ be compact such that $U \subseteq K$ a.s. and choose $n_0 < 0$ with $\beta_n(K) < \varepsilon$ whenever $n < n_0$. This means that

\[
f_{n,\omega} U(\omega) \subseteq B_{\le\varepsilon}\bigl(g_\omega U(\omega)\bigr) \subseteq U(T\omega)
\]
in that case, and hence

\[
A_n(\omega) := \bigcap_{k\in\mathbb N} \varphi(k, n-k; T^{-k}\omega)\, U(T^{-k}\omega) \subseteq U(\omega)
\]
is non-empty and compact as a decreasing intersection of bounded, non-empty compact sets. Moreover, we have that

\[
f_{n,\omega} A_n(\omega) = \bigcap_{k\in\mathbb N} f_{n,\omega} \circ \varphi(k, n-k; T^{-k}\omega)\, U(T^{-k}\omega) = \bigcap_{k\in\mathbb N} \varphi\bigl(k+1, (n+1)-(k+1); T^{-(k+1)}T\omega\bigr)\, U(T^{-(k+1)}T\omega) = A_{n+1}(T\omega),
\]
such that we can extend $(A_n)_{n\le n_0}$ to an invariant NARS $(A_n)_{n\in\mathbb Z}$.

by assumption for a.e. $\omega \in \Omega$. As $\mathrm{dist}(C_n, A_-) \to 0$ in probability, there is a sequence $n_k \nearrow +\infty$ such that
\[
\mathrm{dist}\bigl(C_{-n_k}(T^{-n_k}\omega), A_-(T^{-n_k}\omega)\bigr) \xrightarrow{k\to\infty} 0
\]

for a.e. $\omega \in \Omega$. In particular, $C_{-n_k}(T^{-n_k}\omega) \subseteq U(T^{-n_k}\omega)$ for such $\omega$ and all large enough $k$, such that

\[
\mathrm{dist}\bigl(C_n(\omega), A_n(\omega)\bigr) \le \limsup_{k\to\infty}\, \mathrm{dist}\bigl(\varphi(n_k, n - n_k; T^{-n_k}\omega)\, U(T^{-n_k}\omega),\, A_n(\omega)\bigr) = 0
\]
for all $n \in \mathbb Z$ and a.e. $\omega \in \Omega$ due to (d). $\square$

We can say more in the case where the attractor $A_-$ is a random fixed point and $g_\omega$ is locally attracting around it. In fact, it is enough to have attraction in the long-term average.

Definition 4.10. We say that $T \ltimes g$ is contracting on average on a random set $U$ if there exists a measurable map $L\colon \Omega \to [0,\infty)$ with $\mathbb E L < 1$ such that $g_\omega$ is $L(\omega)$-Lipschitz on $U(\omega)$.

For the following result we will assume that $f_{n,\omega}$ and $g_\omega$ are $C^1$ and that the differentials converge uniformly on compact sets, uniformly in $\omega \in \Omega$. To formalize this, let

\[
\gamma_n(K) := \operatorname*{ess\,sup}_{\omega\in\Omega}\, \sup_K |Df_{n,\omega} - Dg_\omega|
\]
for $K \in \mathcal K$. If $T \ltimes g$ is contracting on average on a forward invariant neighborhood of a random fixed point $a_-$, then $a_-$ attracts this neighborhood. We give no proof of this statement, since the techniques are essentially contained in the proof of the following generalization to NRDS, whose structure is analogous to Theorem 4.9.

Theorem 4.11. Assume that $g_\omega$ and $f_{n,\omega}$ are continuously differentiable and $\gamma_n(K) \to 0$ as $n \to -\infty$ for every $K \in \mathcal K$. Assume further that a random fixed point $a_-$ of $T \ltimes g$ is contracting on average on an essentially bounded random neighborhood $U$ of itself, which has the property that there exists $\varepsilon > 0$ such that $B_{\le\varepsilon}(g_\omega U(\omega)) \subseteq U(T\omega)$ for a.e. $\omega \in \Omega$. Then there is a NARF $(a_n)_{n\in\mathbb Z}$ with the following properties.
(a) $(a_n)_{n\in\mathbb Z}$ past-tracks $a_-$ uniformly.
(b) $(a_n)_{n\in\mathbb Z}$ attracts $U$.
(c) If the invariant NARS $(C_n)_{n\in\mathbb Z}$ weakly past-tracks $a_-$, then $C_n = \{a_n\}$ a.s.

As opposed to Theorem 4.9, this result guarantees the existence of a unique strongly past-tracking attractor, which is unique even in the weak sense. We will use the following simple observation.

Lemma 4.12. For a measurable map $h\colon \Omega \to [0,\infty]$ with $\mathbb E h < 1$ we write

\[
P_n h(\omega) := \prod_{k=1}^{n} h(T^{-k}\omega) \quad\text{and}\quad Q_n h(\omega) := \sum_{k=0}^{n-1} P_k h(\omega).
\]

Then limn→∞ Pnh = 0 a.s. and limn→∞ Qnh < ∞ a.s.

Proof. Let
\[
S_n h(\omega) := \frac{1}{n} \sum_{k=1}^{n} h(T^{-k}\omega).
\]
Then, by the inequality of arithmetic and geometric means and Birkhoff's Ergodic Theorem, we have
\[
\limsup_{n\to\infty} \sqrt[n]{P_n h} \le \lim_{n\to\infty} S_n h = \mathbb E h < 1
\]
and thus $P_n h \to 0$ a.s. For a generic $\omega \in \Omega$ there are $n_0 \in \mathbb N$ and $\eta < 1$ such that $S_n h(\omega) < \eta$ for all $n \ge n_0$. Using the inequality of arithmetic and geometric means a second time, we can conclude that

\[
\sum_{n=n_0}^\infty P_n h(\omega) \le \sum_{n=n_0}^\infty \bigl(S_n h(\omega)\bigr)^n \le \sum_{n=n_0}^\infty \eta^n < \infty. \qquad\square
\]

Instead of proving Theorem 4.11 directly, we first reformulate it in a more technical and slightly more general way, using the notation of Lemma 4.12. This version will be used in the next section of this chapter.

Theorem 4.13. Let $f_{n,\omega}$, $g_\omega$, $U$ and $a_-$ be as in Theorem 4.11, let $K \in \mathcal K$ be such that $U \subseteq K$ a.s., and let $L$ be as in Definition 4.10. Pick $\delta > 0$ such that $\mathbb E L + \delta < 1$ and let $h(\omega) := L(\omega) + \delta$. There exists a NARF $(a_n)_{n\in\mathbb Z}$ fulfilling (a)–(c) of Theorem 4.11. Moreover, if $n_0$ is chosen such that

\[
\sup_{n\le n_0} \beta_n(K) \le \varepsilon \quad\text{and}\quad \sup_{n\le n_0} \gamma_n(K) \le \delta,
\]
then
\[
d\bigl(\varphi(k, n-k; T^{-k}\omega)\, U(T^{-k}\omega),\, a_n(\omega)\bigr) \le \mathrm{diam}(K)\, P_k h(\omega) \tag{4.5}
\]
for a.e. $\omega \in \Omega$ and every $n \le n_0$. A particular case arises when $L < 1$ can be chosen constant, and thus $h$ is constant as well. Then equation (4.5) shows that $a_n$ attracts $U$ exponentially fast, uniformly in $\omega$.

Proof of Theorem 4.13. First we observe that $\beta_n(K) < \varepsilon$ implies $f_{n,\omega} U(\omega) \subseteq U(T\omega)$ for a.e. $\omega \in \Omega$. As $g_\omega$ is $L(\omega)$-Lipschitz on $U(\omega)$, we have the estimate $|Dg_\omega(x)| \le L(\omega)$ for all $x \in U(\omega)$. Hence, by choice of $n_0$, we have $|Df_{n,\omega}(x)| \le h(\omega)$ for a.e. $\omega \in \Omega$ and all $x \in U(\omega)$. For $n \le n_0$ and $k \in \mathbb N$ we denote

\[
V^n_k(\omega) := \varphi(k, n-k; T^{-k}\omega)\, U(T^{-k}\omega).
\]

The sequence $(V^n_k(\omega))_{k\in\mathbb N_0}$ is a decreasing sequence of non-empty compact sets, and Lemma 4.12 implies that
\[
\mathrm{diam}\bigl(V^n_k(\omega)\bigr) \le \mathrm{diam}(K)\, P_k h(\omega) \xrightarrow{k\to\infty} 0
\]
a.s. Thus, for a.e. $\omega \in \Omega$, the intersection $\bigcap_{k\in\mathbb N} V^n_k(\omega)$ contains exactly one point, which we denote by $a_n(\omega)$.

The cocycle property of $\varphi$ implies that $f_{n,\omega} V^n_k(\omega) = V^{n+1}_{k+1}(T\omega)$, such that $f_{n,\omega}(a_n(\omega)) = a_{n+1}(T\omega)$, and thus we can extend $(a_n)_{n\le n_0}$ to a random fixed point $(a_n)_{n\in\mathbb Z}$. That this fixed point attracts $U$ is clear from the definition of $a_n$ and Theorem 2.40, and (4.5) is obvious as well. Let us now show that $(a_n)_{n\in\mathbb Z}$ uniformly past-tracks $a_-$. To this end let

\[
\Delta_n(\omega) := |a_n(\omega) - a_-(\omega)|.
\]

Whenever n < n0 we have the a.s. estimate

\[
\Delta_{n+1}(T\omega) \le |f_{n,\omega}(a_n(\omega)) - f_{n,\omega}(a_-(\omega))| + |f_{n,\omega}(a_-(\omega)) - g_\omega(a_-(\omega))| \le h(\omega)\, \Delta_n(\omega) + \beta_n(K).
\]

A k-fold application of this inequality yields

\[
\Delta_n(\omega) \le P_k h(\omega)\, \Delta_{n-k}(T^{-k}\omega) + \sum_{j=1}^{k} \beta_{n-j}(K)\, P_{j-1} h(\omega) \le P_k h(\omega)\cdot \mathrm{diam}(K) + Q_k h(\omega) \cdot \sup_{j=1,\dots,k} \beta_{n-j}(K).
\]

Taking limits k → ∞, Lemma 4.12 gives us a random variable 0 ≤ Q < ∞ such that

$$\Delta_n(\omega) \le Q(\omega) \cdot \sup_{j < n} \beta_j(K),$$
but as $\beta_n \to 0$ for $n \to -\infty$ this proves uniform past-tracking. It remains to prove (c). First note that by assumption we have

$$B_\varepsilon(a_-(T\omega)) \subseteq B_{\le\varepsilon}(g_\omega U(\omega)) \subseteq U(T\omega) \tag{4.6}$$
for a.e. $\omega \in \Omega$. If the NARS $(C_n)_{n\in\mathbb{Z}}$ is weakly past-tracking $a_-$, there is a subsequence $k_l \searrow -\infty$ such that $\operatorname{dist}(C_{k_l}(T^{k_l}\omega), a_-(T^{k_l}\omega)) \to 0$ a.s. as $l \to \infty$. Due to (4.6) we can assume that $C_{k_l}(T^{k_l}\omega) \subseteq U(T^{k_l}\omega)$ for a.e. $\omega \in \Omega$ and all large enough $l \in \mathbb{N}$. This shows now that in this case

$$C_n(T^n\omega) = \varphi(n - k_l, k_l; T^{k_l}\omega)\, C_{k_l}(T^{k_l}\omega) \subseteq V^n_{n-k_l}(T^n\omega).$$
Recall that the sequence $(V^n_k(\omega))_{k\in\mathbb{N}}$ is decreasing, such that finally
$$C_n(\omega) \subseteq \bigcap_{l\in\mathbb{N}} V^n_{n-k_l}(\omega) = \bigcap_{k\in\mathbb{N}} V^n_k(\omega) = \{a_n(\omega)\}.$$

Remark 4.14. There is a number of possible alternative definitions of past-tracking for which similar results are obtainable. For example, one could define almost sure past-tracking as $\operatorname{dist}(A_n(T^n\omega), A_-(T^n\omega)) \to 0$ as $n \to -\infty$ for a.e. $\omega \in \Omega$. Then, if already $\sup_K |f_{n,T^n\omega} - g_{T^n\omega}| \to 0$ for a.e. $\omega \in \Omega$ and the NARF $a_-$ attracts an essentially bounded neighborhood $U$ on which it is contracting, there is a unique NARF past-tracking $a_-$ in the almost sure sense. Potentially one could also think about $L^p$ convergence of $\operatorname{dist}(A_n, A_-)$ for $p < \infty$.

4.3 Random parameter shift systems and rate induced tipping

In this section we introduce bounded noise to parameter shift systems as described in Section 4.1. We focus on the time-discrete case. For simplicity we assume w.l.o.g. that $\lambda_- = 0$, so all parameter shifts are from $0$ to $\lambda_+ > 0$. Let $(F^\lambda)_{\lambda\in[0,\lambda_+]}$ be a family of $C^1$ maps $\mathbb{R}^d \to \mathbb{R}^d$ such that $F^\lambda \to F := F^0$ compactly as $\lambda \to 0$ and $F^\lambda \to F^+ := F^{\lambda_+}$ compactly as $\lambda \to \lambda_+$, i.e. for every compact set $K \subseteq \mathbb{R}^d$,

$$\lim_{\lambda\to 0} \sup_K |F^\lambda - F| = 0 \quad\text{and}\quad \lim_{\lambda\to\lambda_+} \sup_K |F^\lambda - F^+| = 0.$$

λ A typical (model) situation is F (x) = F (x − λ) + λ where the graph of a function F : R → R is shifted along the diagonal. We assume moreover that (λ, x) 7→ F λ (x) is continuous n both variables x and λ. Z The noise model we are using is a shift system on the space Ω = B≤1 (0) of two-sided d sequences with values in the closed unit ball in R , endowed with its Borel-σ-algebra A = ⊗ Z B (Ω) and the measure µ = m Z, where m is the normalized Lebesgue measure on B≤1 (0) . a sequence ω = (ω ) according to µ means that all the components ω are picked n n∈Z n Z independently and uniformly distributed on B≤1 (0) . It is well known (see e.g. [48, Example 20.12]) that µ is ergodic w.r.t. the right shift

$$T: \Omega \to \Omega, \quad T\omega = (\omega_{n+1})_{n\in\mathbb{Z}}.$$
We will scale this noise with a parameter $\sigma > 0$. For a parameter shift $\Lambda$ from $0$ to $\lambda_+$ and a rate $r > 0$ we obtain a NRDS $\varphi^{[\Lambda,r]}$ from the fiber maps
$$f^{[\Lambda,r]}_{n,\omega}(x) = F^{\Lambda(rn)}(x) + \sigma\omega_0. \tag{4.7}$$
If there is no ambiguity about $\Lambda$ and $r$, we may omit the $[\Lambda,r]$ dependence from the notation for convenience. All those systems are asymptotically autonomous to the past limit RDS $T \ltimes g$ with fiber maps $g_\omega(x) = F(x) + \sigma\omega_0$. Given the shift structure of the underlying space $\Omega$, the past $\sigma$-algebra $\mathcal{F}_-$ of $T \ltimes g$ coincides with the past $\sigma$-algebras $\mathcal{F}^-_N$ of the NRDS $\varphi^r$ and is generated by the projections $\omega \mapsto \omega_n$ for $n < 0$. In other words, a measurable map $h: \Omega \to \mathbb{R}$ is past-measurable if it depends only on $\omega_{-1}, \omega_{-2}, \dots$. From this it is clear that $T \ltimes g$ and $\varphi^r$ have the Markov property (see Sections 2.1.2 and 2.2.1). Convergence of $f^{[\Lambda,r]}_n$ to $g_\omega$ is not only uniform on compact subsets and uniform in $\omega$; it is even locally uniform in $r$.
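As a concrete illustration, the fiber maps (4.7) and the associated cocycle can be written down in a few lines. This is a hedged sketch rather than code from the thesis: it assumes $d = 1$, a hypothetical stand-in map $F(x) = x/2$, and borrows the linear ramp used later in Section 4.4; the names `fiber` and `phi` are ours.

```python
import random

def F(x):
    # stand-in C^1 map with attracting fixed point 0 (hypothetical choice)
    return 0.5 * x

def fiber(n, w, x, r, lam_plus=1.0, sigma=0.05):
    # f_{n,omega}(x) = F^{Lambda(rn)}(x) + sigma * omega_0, cf. (4.7),
    # with F^lam(x) = F(x - lam) + lam and a linear ramp Lambda
    lam = lam_plus * min(max(3.0 * r * n - 1.0, 0.0), 1.0)
    return F(x - lam) + lam + sigma * w

def phi(k, n, omega, x, r):
    """Cocycle of the NRDS: k steps starting at time n.
    omega[j] is the noise component driving step j."""
    for j in range(k):
        x = fiber(n + j, omega[j], x, r)
    return x

# pullback iteration: for times far in the past the system behaves like the
# autonomous limit RDS with fiber maps g_omega(x) = F(x) + sigma * omega_0
rng = random.Random(0)
noise = [rng.uniform(-1, 1) for _ in range(40)]
x_end = phi(40, -200, noise, 1.0, 1e-4)
```

The cocycle property $\varphi(k+l, n) = \varphi(l, n+k)\circ\varphi(k, n)$ holds by construction, since both sides compose the same fiber maps in the same order.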

Lemma 4.15. Let Λ be a parameter shift and I = [a, b] ⊆ (0, ∞) a compact interval. (a) For every K ∈ K and ε > 0 there is n0 ∈ Z such that

$$\beta^{[\Lambda,r]}_n(K) := \operatorname*{ess\,sup}_{\omega\in\Omega}\, \sup_K |f^{[\Lambda,r]}_{n,\omega} - g_\omega| < \varepsilon$$

for all n ≤ n0 and r ∈ I.

(b) If $DF^\lambda \to DF$ compactly as $\lambda \to 0$, then for any given $\delta > 0$ we can choose $n_0$ such that additionally
$$\gamma^{[\Lambda,r]}_n(K) := \operatorname*{ess\,sup}_{\omega\in\Omega}\, \sup_K |Df^{[\Lambda,r]}_{n,\omega} - Dg_\omega| < \delta$$

whenever $r \in I$ and $n \le n_0$.

Proof. For $K \in \mathcal{K}$ there is $\lambda^* > 0$ such that $\sup_K |F^\lambda - F| < \varepsilon$ for all $\lambda < \lambda^*$. As $\Lambda(t) \to 0$ for $t \to -\infty$, there is $t^* \in \mathbb{R}$ such that $\Lambda(t) < \lambda^*$ for all $t < t^*$. If $n_0$ is chosen such that $n_0 a < t^*$, then for all $n \le n_0$ and $r \in I$ we have $nr \le n_0 a < t^*$ and thus

$$|f^{[\Lambda,r]}_{n,\omega}(x) - g_\omega(x)| = |F^{\Lambda(rn)}(x) - F(x)| < \varepsilon$$
for all $\omega \in \Omega$ and $x \in K$, which proves (a). For (b) we have to pick $\lambda^*$ so small that additionally $\sup_K |DF^\lambda - DF| < \delta$ for all $\lambda < \lambda^*$, and proceed as above.

Recall that a compact set $A \subseteq \mathbb{R}^d$ is an attractor for $F$ if $FA = A$ and if there is a compact neighborhood $U$ of $A$ such that $\operatorname{dist}(F^n U, A) \to 0$ as $n \to \infty$. In particular, $U$ can be chosen such that $FU \subseteq \operatorname{int} U$. If $A$ consists of a single (fixed) point $a$, then we call $a$ asymptotically stable if $F$ is contracting on a neighborhood of $a$. We disturb $F$ by noise of order $\sigma$, and we want to be able to say when an attractor or a stable fixed point persists under this noise.

Definition 4.16. Let $\sigma > 0$.
(a) An attractor $A$ of $F$ is called $\sigma$-stable if there is a compact neighborhood $U_A$ of $A$ such that $B_{\le\sigma}(FU_A) \subseteq \operatorname{int} U_A$.
(b) A fixed point $a$ of $F$ is called asymptotically $\sigma$-stable if there are a compact neighborhood $U_a$ of $a$ and $L \in (0,1)$ such that $B_{\le\sigma}(FU_a) \subseteq \operatorname{int} U_a$ and $F$ is $L$-Lipschitz on $U_a$.

Every attractor $A$ of $F$ is $\sigma$-stable and every attracting fixed point $a$ asymptotically $\sigma$-stable if $\sigma > 0$ is small enough, so (asymptotic) $\sigma$-stability is more a property of $\sigma$ than of the attractor itself. Given a $\sigma$-stable attractor $A$, the random compact set $U = \Omega \times U_A$ fulfills the assumptions of Theorem 2.20 with $K = U$, such that $A_- := \Omega_U$ attracts the neighborhood $U$ of itself. In this case $U$ is forward-invariant for $T \ltimes g$, such that the fibers of $A_-$ can be expressed as
$$A_-(\omega) = \bigcap_{k\in\mathbb{N}} \operatorname{cl}\left(g^k_{T^{-k}\omega}\, U(T^{-k}\omega)\right). \tag{4.8}$$
If $a$ is an asymptotically $\sigma$-stable fixed point, then we can argue in a similar fashion that there is a random fixed point $a_-$ that attracts the random neighborhood $\Omega \times U_a$, and $a_-(\omega)$ is the sole element of
$$\{a_-(\omega)\} = \bigcap_{k\in\mathbb{N}} g^k_{T^{-k}\omega}\, U_a. \tag{4.9}$$
Remark 4.17. Associated to the RDS $T \ltimes g$ is the set-valued dynamical system

$$G_\sigma: \mathcal{K} \to \mathcal{K}, \quad K \mapsto B_{\le\sigma}(F(K))$$

on the set $\mathcal{K}$ of compact subsets of $\mathbb{R}^d$. Given a set $K \in \mathcal{K}$, $G_\sigma(K)$ is the set of all possible values $g_\omega(x)$ for $x \in K$ and $\omega \in \Omega$. An attractor $A$ is $\sigma$-stable if and only if there is a compact neighborhood $U_A$ of $A$ such that $G_\sigma(U_A) \subseteq \operatorname{int} U_A$.

In this case the set $E_A := \bigcap_{n\in\mathbb{N}} G^n_\sigma(U_A) \in \mathcal{K}$ is a fixed point of $G_\sigma$, also known as an invariant set. The relevance of such invariant sets for the dynamics of $T \ltimes g$ is manifested in [77]. The authors prove that, under some mild regularity condition, there is a finite set $\rho_1, \dots, \rho_n$ of stationary measures (cf. Definition 2.14) with the following properties: every stationary measure is a convex combination of $\rho_1, \dots, \rho_n$, and the supports $E_i$ of the $\rho_i$ are pairwise disjoint and minimal invariant for $G_\sigma$, i.e. they contain no invariant set other than $\emptyset$ and themselves. This can help describing $A_-$.

Consider for example $F(x) = \frac{4}{\pi}\arctan(x)$. This map has two attracting fixed points $1$ and $-1$, and the interval $A = [-1, 1]$ is therefore an attractor as well. As $F(\mathbb{R}) \subseteq (-2, 2)$, the attractor $A$ is always $\sigma$-stable, for example by the choice $U_A := [-2-2\sigma,\, 2+2\sigma]$. Given the monotonicity of $F$, the associated random attractor $A_-$ as in (4.8) has intervals as fibers,

$$A_-(\omega) = [x(\omega),\, y(\omega)],$$
where $x, y$ are past-measurable random fixed points. As in Remark 2.16, each of them generates a stationary measure, which we denote by $\rho_x, \rho_y$. For small enough $\sigma$, both $1$ and $-1$ are asymptotically $\sigma$-stable, $E_{-1}$ and $E_1$ are disjoint, invariant sets for $G_\sigma$, and we can identify the corresponding attracting random fixed points (4.9) as $x$ and $y$. If $\sigma$ crosses a certain threshold $\sigma^*$⁵, $E_{-1}$ and $E_1$ disappear, and $E_A$ becomes the unique and therefore minimal invariant set. Hence $\rho_x = \rho_y$, and Theorem 2.15 implies that $x = y$ a.s. In other words, the interval $A_-(\omega)$ collapses into a single point at the bifurcation value $\sigma^*$. That this happens in a discontinuous way follows from the results in [53].

It should be mentioned that this set-valued approach, in the obvious generalization to non-autonomous systems, can be used to study tipping for parameter shift systems with bounded noise. This was done in [21], where results are obtained in fairly high generality. However, we want to focus on the probabilistic nature of tipping, for which a purely set-valued approach is not feasible, as it cannot distinguish individual paths of the NRDS. The following theorem provides two ways of generalizing Theorem 4.4 to a random setting.

Theorem 4.18.
(a) Let $A$ be a $\sigma$-stable attractor of $F$ and $A_-$ as in (4.8). There is a compact, invariant NARS $(A^{[\Lambda,r,A_-]}_n)_{n\in\mathbb{Z}}$ for $\varphi^{[\Lambda,r]}$ that weakly past-tracks $A_-$, attracts $\Omega \times U_A$ and contains every other compact, invariant NARS weakly past-tracking $A_-$.
(b) Let $a$ be an asymptotically $\sigma$-stable fixed point of $F$ with $a_-$ as in (4.9). There is an a.s. unique NARF $(a^{[\Lambda,r,a_-]}_n)_{n\in\mathbb{Z}}$ for $\varphi^{[\Lambda,r]}$ that uniformly past-tracks $a_-$ and attracts $\Omega \times U_a$. Moreover, $(a^{[\Lambda,r,a_-]}_n)_{n\in\mathbb{Z}}$ is the a.s. unique invariant NARS past-tracking $a_-$ in the weak sense.
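For the arctan example the bifurcation threshold $\sigma^*$ can be computed directly from the tangency condition $F'(x) = 1$ mentioned in footnote 5. The following is a short sketch of our own numerics, not code from the thesis:

```python
import math

def F(x):
    # the example map F(x) = (4/pi) * arctan(x) from Remark 4.17
    return 4.0 / math.pi * math.atan(x)

def dF(x):
    return 4.0 / (math.pi * (1.0 + x * x))

# Tangency of x -> F(x) - sigma with the diagonal: F'(x*) = 1 and
# F(x*) - sigma = x*, hence x* = sqrt(4/pi - 1) and sigma* = F(x*) - x*.
x_star = math.sqrt(4.0 / math.pi - 1.0)
sigma_star = F(x_star) - x_star   # roughly 0.09
```

For $\sigma < \sigma^*$ the graphs of $x \mapsto F(x) \pm \sigma$ still cross the diagonal near $\pm 1$, so the invariant sets $E_{-1}$ and $E_1$ persist; at $\sigma = \sigma^*$ they disappear.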

Proof. In (a) we can pick an ε > 0 such that B≤σ+ε (FUA) ⊆ UA which implies that

$$B_{\le\varepsilon}(g_\omega U_A) \subseteq B_{\le\sigma+\varepsilon}(FU_A) \subseteq U_A$$
for all $\omega \in \Omega$. As moreover $A_-(\omega) \subseteq U_A$ is compact, the random open set $U := \Omega \times \operatorname{int} U_A$ fulfills the assumptions of Theorem 4.9. For (b) we can similarly apply Theorem 4.11 with $U = \Omega \times U_a$ and $L$ as specified in Definition 4.16.

⁵One can determine $\sigma^*$ explicitly as the (unique) value of $\sigma$ such that the graphs of $x \mapsto F(x) \pm \sigma$ are tangent to the diagonal $y = x$ in one point. The same principle is applied to a different $F$ in the next section, see Figure 5.

In the following we want to analyze the $r$-dependence of the objects constructed above. In the case of a $\sigma$-stable fixed point, each random variable $a^{[\Lambda,r,a_-]}_n$ has its values in the compact set $U_a$ and is therefore an element of $L^\infty$.

Theorem 4.19. Let $a$ be an asymptotically $\sigma$-stable fixed point of $F$ and $\Lambda$ a parameter shift. The mapping
$$(0,\infty) \to L^\infty_\mu, \quad r \mapsto a^{[\Lambda,r,a_-]}_n$$
is continuous w.r.t. the $L^\infty_\mu$-norm for every $n \in \mathbb{Z}$.

Proof. For the sake of simplicity we suppress all $\Lambda$- and $a_-$-dependencies in the notation throughout this proof and write $U = U_a$. For a fixed $r > 0$ we let $\rho > 0$ be such that $I := [r-\rho, r+\rho] \subseteq (0,\infty)$. Due to Lemma 4.15 and Theorem 4.13 we can find $n_0 \in \mathbb{Z}$ and $\Theta \in (0,1)$ such that
$$|\varphi^{\bar r}(k, n-k; T^{-k}\omega)x - a^{\bar r}_n(\omega)| \le \Theta^k \operatorname{diam}(U)$$
for all $n \le n_0$, $\bar r \in I$, $\omega \in \Omega$ and $x \in U$. For $\eta > 0$ fix $k \in \mathbb{N}$ with $\Theta^k \operatorname{diam}(U) \le \frac{\eta}{3}$ and some arbitrary $x \in U_a$. Then
$$|a^r_n(\omega) - a^{\bar r}_n(\omega)| \le \frac{2\eta}{3} + |\varphi^r(k, n-k; T^{-k}\omega)x - \varphi^{\bar r}(k, n-k; T^{-k}\omega)x| \tag{4.10}$$
for all $\omega \in \Omega$. As $\varphi^{\bar r}(k, n-k; T^{-k}\omega)x$ depends on $\omega$ only through $\omega_{n-k}, \dots, \omega_{n-1}$, the map

$$[r-\rho, r+\rho] \times B_{\le 1}(0)^k \to \mathbb{R}^d, \quad (\bar r, \omega_{n-k}, \dots, \omega_{n-1}) \mapsto \varphi^{\bar r}(k, n-k; T^{-k}\omega)x$$
is well defined. Moreover, it is uniformly continuous, being a continuous map on a compact domain. Hence we can find $\delta \in (0,\rho)$ such that for all $\omega \in \Omega$ and $\bar r \in B_\delta(r)$,
$$|\varphi^r(k, n-k; T^{-k}\omega)x - \varphi^{\bar r}(k, n-k; T^{-k}\omega)x| < \frac{\eta}{3},$$
and (4.10) implies that
$$|a^r_n - a^{\bar r}_n|_{L^\infty} \le \eta.$$
This shows that $\bar r \mapsto a^{\bar r}_n$ is continuous in $r$ whenever $n \le n_0$. For $n > n_0$ write $n = n_0 + k$. Similarly to above, the map

$$[r-\rho, r+\rho] \times B_{\le 1}(0)^k \times U \to \mathbb{R}^d, \quad (\bar r, \omega_0, \dots, \omega_{k-1}, y) \mapsto \varphi^{\bar r}(k, n_0; \omega)y$$
is well defined and uniformly continuous, and we can find $\delta_1 > 0$ such that

$$|\varphi^r(k, n_0; \omega)y - \varphi^{\bar r}(k, n_0; \omega)\bar y| < \varepsilon \tag{4.11}$$
for all $\omega \in \Omega$, $\bar r \in [r-\rho, r+\rho]$ and $y, \bar y \in U$ with $|(r,y) - (\bar r, \bar y)| < \delta_1$. Due to the result for $n \le n_0$ there is $\delta_2 > 0$ such that, whenever $|r - \bar r| < \delta_2$, we have
$$|a^r_{n_0}(\omega) - a^{\bar r}_{n_0}(\omega)| < \frac{\delta_1}{\sqrt 2}.$$
Thus if
$$|r - \bar r| < \delta := \min\left\{\frac{\delta_1}{\sqrt 2},\, \delta_2\right\}$$

we can apply (4.11) with $y = a^r_{n_0}(\omega)$ and $\bar y = a^{\bar r}_{n_0}(\omega)$ to obtain that

$$|a^r_{n_0+k}(T^k\omega) - a^{\bar r}_{n_0+k}(T^k\omega)| < \varepsilon.$$
Note that indeed we have
$$|(r,y) - (\bar r, \bar y)| = \sqrt{|r-\bar r|^2 + |y-\bar y|^2} < \delta_1.$$

As $T$ preserves $\mu$, this implies that $\bar r \mapsto a^{\bar r}_n$ is continuous in $r$ whenever $n > n_0$.

Now let us generalize the notion of stable branches introduced in Definition 4.2 to $\sigma$-stable attractors. We do so in what appears to be the most natural way.

Definition 4.20. A $\sigma$-stable branch is a pair $(U, A)$ of continuous maps $U, A: [0, \lambda_+] \to \mathcal{K}$ such that $A(\lambda)$ is a $\sigma$-stable attractor for $F^\lambda$, $A(\lambda) \subseteq U(\lambda)$ and $B_{\le\sigma}(F^\lambda U(\lambda)) \subseteq \operatorname{int} U(\lambda)$ for all $\lambda \in [0, \lambda_+]$. We say it is convex if $U(\lambda)$ is a convex set for all $\lambda \in [0, \lambda_+]$.

If $(U, A)$ is a $\sigma$-stable branch, then $A(0)$ is a $\sigma$-stable attractor of $F$ and $U(0)$ contains the random attractor $A_-$ as in (4.8). Similarly we can find a random attractor $A_+$ of the future limit system $T \ltimes \hat g$ contained in $U(\lambda_+)$. In this case we say that $A_-$ and $A_+$ are connected via the $\sigma$-stable branch $(U, A)$. We make the following observation, which motivates our definition of a tipping point.

Lemma 4.21. Let $A_-$ and $A_+$ be connected via the convex $\sigma$-stable branch $(U, A)$ and let $\Lambda$ be a parameter shift. There exists $r^* > 0$ such that for every $r \in (0, r^*)$ we can find $n_1 \in \mathbb{Z}$ with $A^{[\Lambda,r,A_-]}_n \subseteq U(\lambda_+)$ a.s. for all $n \ge n_1$.

We delay the rather technical proof of this result to the end of this chapter. It means, roughly speaking, that for small enough $r$ the non-autonomous attractor which is past-tracking $A_-$ will eventually be close to the future limit attractor $A_+$. This motivates the following definition.

Definition 4.22. Let $\Lambda$ be a parameter shift and $r > 0$, and let $A_-$ and $A_+$ be connected via the convex $\sigma$-stable branch $(U, A)$. We say that $A_-$ tips weakly under $(\Lambda, r)$ if $\operatorname{dist}(A^{[\Lambda,r,A_-]}_n, A_+)$ does not converge to $0$ in probability.

An immediate consequence is the following generalization of Theorem 4.6.

Theorem 4.23. Let $A_-$ and $A_+$ be connected via the convex $\sigma$-stable branch $(U, A)$. For every parameter shift $\Lambda$ there exists $r^* > 0$ such that $A_-$ does not tip weakly for any rate $r < r^*$.

Proof. If we pick r∗ as in Lemma 4.21 then

$$A^{[\Lambda,r,A_-]}_\infty(\omega) := \lim_{n\to\infty} A^{[\Lambda,r,A_-]}_n(\omega) \subseteq U(\lambda_+)$$
for a.e. $\omega \in \Omega$ and $r < r^*$, and we can apply a forward-version of Lemma 4.8.

A more detailed analysis is possible in the case of two attracting fixed points that are connected via a $\sigma$-stable branch $(U, A)$. In this case we can replace the singleton-set-valued map $A$ by a map $a: [0, \lambda_+] \to \mathbb{R}^d$. If we assume moreover that there is $L \in (0,1)$ such that $F^\lambda$ is $L$-Lipschitz on $U(\lambda)$, at least for $\lambda = 0, \lambda_+$, then there is a unique attracting random fixed point $a_-$ of $T \ltimes g$ contained in $U(0)$, given by (4.9). Similarly there is an a.s. unique attracting fixed point $a_+$ of the future limit system $T \ltimes \hat g$ contained in $U(\lambda_+)$. In this case we say that $a_-$ and $a_+$ are connected via the contracting $\sigma$-stable branch $(U, a)$. If we assume that $DF^\lambda \to DF^+$ compactly as $\lambda \to \lambda_+$, then for any $\Lambda, r$, part (b) of Theorem 4.18 implies the existence of an a.s. unique non-autonomous attractor $(a^{[\Lambda,r,a_-]}_n)_{n\in\mathbb{Z}}$ which past-tracks $a_-$ uniformly. In particular, $a^{[\Lambda,r,a_-]}_n$ converges to $a_-$ pathwise:

$$\lim_{n\to-\infty} |a^{[\Lambda,r,a_-]}_n(T^n\omega) - a_-(T^n\omega)| = 0$$
for (almost) every $\omega \in \Omega$. Based on this observation we propose the following.

Definition 4.24. Let $a_-$ and $a_+$ be connected via a contracting $\sigma$-stable branch and assume $DF^\lambda \to DF^+$ compactly as $\lambda \to \lambda_+$. Given a parameter shift $\Lambda$, a rate $r > 0$ and $\omega \in \Omega$, we say that $a_-$ tips along $\omega$ if

$$|a^{[\Lambda,r,a_-]}_n(T^n\omega) - a_+(T^n\omega)| \not\to 0 \quad\text{as } n \to \infty.$$
We call the set

$$\Omega^{[\Lambda,r,a_-]}_t := \{\omega \in \Omega \mid a_- \text{ tips along } \omega\}$$
the tipping set of $[\Lambda, r, a_-]$, and the map

$$\operatorname{Pr}^{[\Lambda,a_-]}: (0,\infty) \to [0,1], \quad \operatorname{Pr}^{[\Lambda,a_-]}(r) = \mu\left(\Omega^{[\Lambda,r,a_-]}_t\right)$$

the tipping probability of $[\Lambda, a_-]$. We call $\Omega^{[\Lambda,r,a_-]}_{nt} := \Omega \setminus \Omega^{[\Lambda,r,a_-]}_t$ the non-tipping set or tracking set of $[\Lambda, r, a_-]$.

A simple but handy tool for detecting tracking is the following.

Lemma 4.25. In the situation of Definition 4.24, for any given $\Lambda$ there exist $t^* \in \mathbb{R}$ and $\varepsilon > 0$ such that
$$B_{\le\varepsilon}\left(f^{[\Lambda,r]}_{n,\omega}\, U(\lambda_+)\right) \subseteq U(\lambda_+)$$
and
$$\operatorname*{ess\,sup}_{\omega\in\Omega}\, \operatorname{dist}\left(\varphi^{[\Lambda,r]}(k, n; \omega)U(\lambda_+),\, a_+(T^k\omega)\right) \xrightarrow{k\to\infty} 0$$
whenever $rn > t^*$.

The proof of this result is a combination of techniques from the proofs of Theorem 4.13 and Lemma 4.15, thus we only sketch the main steps, and leave the details to the dedicated reader.

80 Sketch of proof. For a given Λ there exist numbers t∗ > 0, ε > 0 and L < Θ < 1 such that

$$B_{\le\varepsilon}\left(f^{[\Lambda,r]}_{n,\omega}\, U(\lambda_+)\right) \subseteq U(\lambda_+) \quad\text{and}\quad \sup_{U(\lambda_+)} |Df^{[\Lambda,r]}_{n,\omega}| < \Theta$$

whenever $rn > t^*$. Fix $n_0$ and $r$ with $rn_0 > t^*$ and define

$$\Delta^k_n(\omega, y) := |\varphi(k, n; \omega)y - a_+(T^k\omega)|$$
for $n \ge n_0$ and $y \in U(\lambda_+)$. We have the inequality

$$\Delta^k_n(\omega, y) \le \Theta^k\, \Delta^0_n(\omega, y) + \sum_{j=1}^{k} \Theta^{j-1}\, \beta_{n+k-j}(U(\lambda_+))$$

$$\le \Theta^k \operatorname{diam}(U(\lambda_+)) + \frac{1}{1-\Theta}\, \sup_{j\ge n} \beta_j(U(\lambda_+)),$$
and taking limits $k \to \infty$ and then $n \to \infty$ gives the desired result, as the right-hand side of this inequality depends neither on $\omega \in \Omega$ nor on $y \in U(\lambda_+)$.

We can prove an analogous version of Theorem 4.23 for this case.

Theorem 4.26. Let $a_-$ and $a_+$ be connected via a contracting, convex $\sigma$-stable branch. For every parameter shift $\Lambda$ there is $r^* > 0$ such that $a^{[\Lambda,r,a_-]}_n \to a_+$ in $L^\infty_\mu$ whenever $r \in (0, r^*)$. In particular, $\operatorname{Pr}^{[\Lambda,a_-]}(r) = 0$ for all $r \in (0, r^*)$.

Proof. Let $r^*$ be as in Lemma 4.21, fix some $r \in (0, r^*)$, and let $n_1$ be as in the same lemma. We can assume w.l.o.g. that $rn_1 > t^*$ with $t^*$ from Lemma 4.25. Then for all $n \ge n_1$ we have $a^{[\Lambda,r,a_-]}_n \in U(\lambda_+)$ a.s. Writing $n = n_1 + k$ we obtain

$$|a^{[\Lambda,r,a_-]}_n(T^n\omega) - a_+(T^n\omega)| \le \operatorname{dist}\left(\varphi^{[\Lambda,r]}(k, n_1; T^{n_1}\omega)U(\lambda_+),\, a_+(T^n\omega)\right),$$
and it remains to take the essential supremum and the limit $k \to \infty$ and apply Lemma 4.25.

The analytic properties of general tipping probabilities as maps from $(0,\infty)$ to the unit interval are mostly unknown to us at this point. A numerical analysis of an example, conducted in the next section, indicates that we cannot expect differentiability or monotonicity in general. The particular map chosen there induces a continuous tipping probability map $\operatorname{Pr}$, and the corresponding proof could easily be adapted to more general hyperbolic fixed points in the case $d = 1$. We believe that this is true for a much more general class of functions, but we do not have a rigorous argument supporting this claim. However, a direct consequence of Lemma 4.25 is semicontinuity.

Proposition 4.27. In the situation of Definition 4.24, the map $\operatorname{Pr}^{[\Lambda,a_-]}$ is upper semicontinuous.

Proof. Let $r \in (0,\infty)$, pick $\rho > 0$ such that $I := [r-\rho, r+\rho] \subseteq (0,\infty)$, and let $n_1 \ge \frac{t^*}{r-\rho}$ with $t^*$ as in Lemma 4.25. We drop $\Lambda$ and $a_-$ from the notation for the rest of this proof. Denoting
$$C_n := \{\omega \in \Omega \mid B_{\le\varepsilon}(a^r_n(T^n\omega)) \subseteq U(\lambda_+)\}$$

81 with ε from Lemma 4.25 we have the relation

$$\Omega^r_{nt} = \bigcup_{n \ge n_1} C_n.$$

Due to Lemma 4.25, $C_n \subseteq C_{n+1}$ for all $n \ge n_1$, such that for any given $\eta > 0$ there is one such $n$ with
$$\mu(\Omega^r_{nt}) \le \mu(C_n) + \eta.$$
Theorem 4.19 implies that $\bar r \mapsto a^{\bar r}_n(T^n\omega)$ is continuous, and thus there is $\delta \in (0,\rho]$ such that
$$a^{\bar r}_n(T^n\omega) \in B_{\le\varepsilon}(a^r_n(T^n\omega)) \subseteq U(\lambda_+)$$
for all $\omega \in C_n$ and $\bar r \in B_\delta(r)$, and Lemma 4.25 yields that $\omega \in \Omega^{\bar r}_{nt}$ in this case. Summing up, we have the relation

$$\operatorname{Pr}(\bar r) = 1 - \mu(\Omega^{\bar r}_{nt}) \le 1 - \mu(C_n) \le \operatorname{Pr}(r) + \eta$$
whenever $\bar r \in B_\delta(r)$, which shows that $\operatorname{Pr}$ is upper semicontinuous in $r$.

It remains to prove Lemma 4.21. We will use the following two geometrical results.

Lemma 4.28. Let $(U, A)$ be a $\sigma$-stable branch. There exists $\varepsilon > 0$ such that $B_{\le\sigma+\varepsilon}(F^\lambda U(\lambda)) \subseteq U(\lambda)$ for all $\lambda \in [0, \lambda_+]$.

Proof. For $\lambda \in [0, \lambda_+]$ let
$$\hat\varepsilon(\lambda) := d\left(B_{\le\sigma}(F^\lambda U(\lambda)),\, \mathbb{R}^d \setminus \operatorname{int} U(\lambda)\right).$$

As $(U, A)$ is a $\sigma$-stable branch, we have $\hat\varepsilon(\lambda) > 0$ for all $\lambda \in [0, \lambda_+]$. $U$ is a continuous map on the compact set $[0, \lambda_+]$; thus there is $K \in \mathcal{K}$ with $U(\lambda) \subseteq K$ for all $\lambda \in [0, \lambda_+]$. Since
$$[0, \lambda_+] \times K \to \mathbb{R}^d, \quad (\lambda, x) \mapsto F^\lambda(x)$$
is uniformly continuous, the map $\lambda \mapsto F^\lambda U(\lambda)$ is (uniformly) continuous as well, which in turn implies that $\hat\varepsilon: [0, \lambda_+] \to (0, \infty)$ is continuous and thus assumes a minimum $\varepsilon > 0$. It is then clear from the definition of $\hat\varepsilon$ that this $\varepsilon$ has the desired property.

Lemma 4.29. Let $I, J, K \in \mathcal{K}$ with $K$ convex, and assume there is $\varepsilon > 0$ such that $B_{\le\varepsilon}(I) \subseteq J$ and $d_h(J, K) \le \varepsilon$. Then $I \subseteq K$.

Proof. Assume that $I \not\subseteq K$ and pick a point $x \in I \setminus K$. Since $K$ is convex there is a unique $y \in K$ such that $\delta := d(x, K) = |x - y|$. Construct a point $z \in B_{\le\varepsilon}(I) \subseteq J$ by extending the line from $y$ through $x$ by length $\varepsilon$; in formulas,

$$z = x + \frac{\varepsilon}{\delta}(x - y) = y + \frac{\delta+\varepsilon}{\delta}(x - y).$$

Convexity of K is equivalent to convexity of the map d ( · ,K). Thus,

$$\delta = d(x, K) = d((1-t)y + tz,\, K) \le (1-t)\cdot d(y, K) + t\cdot d(z, K) = t\cdot d(z, K)$$

with $t = \frac{\delta}{\delta+\varepsilon}$. This can be rewritten as $d(z, K) \ge \varepsilon + \delta > \varepsilon$, which contradicts $z \in J$ and $d_h(J, K) \le \varepsilon$.

Proof of Lemma 4.21. Let $\varepsilon$ be as in Lemma 4.28. As $U$ is continuous on a compact domain, it is uniformly continuous, and thus there is $\delta > 0$ such that $d_h(U(\lambda), U(\lambda')) < \varepsilon$ whenever $|\lambda - \lambda'| < \delta$. The map $\Lambda$ is also uniformly continuous because the limits $\lim_{t\to\pm\infty}\Lambda(t)$ exist in $\mathbb{R}$; hence we can find a number $r^*$ such that $|\Lambda(t) - \Lambda(t')| < \delta$ whenever $|t - t'| < r^*$. Now we fix any $r \in (0, r^*)$ and drop the $[\Lambda, r]$ dependence from the notation for the rest of this proof. We denote $V_n := U(\Lambda(rn))$. $U$ is continuous, and thus there exists a compact set $K \in \mathcal{K}$ such that $U(\lambda) \subseteq K$ for all $\lambda \in [0, \lambda_+]$. There is a number $n_0 \in \mathbb{Z}$ such that for all $n \le n_0$ we have

• $\beta_n(K) < \frac{\varepsilon}{2}$,
• $d_h(U(0), V_{n+1}) < \frac{\varepsilon}{2}$ and
• $A_n(\omega) \subseteq U(0)$.

In this case we have the relation

$$B_{\le\frac{\varepsilon}{2}}(f_{n,\omega}U(0)) \subseteq B_{\le\varepsilon}(g_\omega U(0)) \subseteq B_{\le\sigma+\varepsilon}(FU(0)) \subseteq U(0).$$

Thus we can use the invariance of An and Lemma 4.29 to see that

$$A_{n+1}(T\omega) = f_{n,\omega}A_n(\omega) \subseteq f_{n,\omega}U(0) \subseteq V_{n+1}. \tag{4.12}$$

This shows that $A_n \subseteq V_n$ a.s. whenever $n < n_0$. In the next step we extend this result to all $n \in \mathbb{Z}$. By the choice of $r < r^*$ we achieve that $d_h(V_n, V_{n+1}) \le \varepsilon$ and moreover
$$B_{\le\varepsilon}(f_{n,\omega}(V_n)) \subseteq B_{\le\varepsilon+\sigma}(F^{\Lambda(rn)}V_n) \subseteq V_n.$$

Using the same argument as in (4.12) iteratively, we thus obtain that An ⊆ Vn a.s. for all n ∈ Z. Finally we can pick a number n1 ∈ Z such that dh (Vn,U (λ+)) < ε for all n ≥ n1, and in a similar manner as above we can conclude that An ⊆ U (λ+) for all n ≥ n1.

4.4 An example

In this section we want to study an example in one dimension, $d = 1$, with just one attracting fixed point. We will provide a numerical recipe for calculating the tipping probability and analyze some features of its graph. The map we are studying is
$$F(x) = c \int_0^x \left(\operatorname{arccot}(y) + \frac{1}{10}\right) dy = c\left(x \operatorname{arccot}(x) + \frac{1}{2}\ln(1+x^2) + \frac{x}{10}\right)$$
with normalizing constant $c > 0$ such that $F(1) = 1$. This function was chosen because it has the nice property of being a concave homeomorphism. The parameter shift $\Lambda$ will be the ramping function
$$\Lambda(t) = \begin{cases} 0, & t < \frac{1}{3}, \\ \lambda_+, & t > \frac{2}{3}, \\ (3t-1)\lambda_+, & \text{otherwise.} \end{cases}$$


Figure 4: Graphs of F and the parameter shift Λ
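The map $F$ and the ramp $\Lambda$ shown in Figure 4 can be transcribed directly; the derivative $F'(x) = c(\operatorname{arccot}(x) + 1/10)$ comes for free from the integral definition. This is a plain transcription of ours, with `C` the normalizing constant fixed by $F(1) = 1$:

```python
import math

# normalizing constant c with F(1) = 1, since
# F(1)/c = arccot(1) + (1/2) ln 2 + 1/10 = pi/4 + ln(2)/2 + 1/10
C = 1.0 / (math.pi / 4 + math.log(2) / 2 + 0.1)

def F(x):
    # F(x) = c * (x * arccot(x) + (1/2) * ln(1 + x^2) + x/10)
    return C * (x * (math.pi / 2 - math.atan(x)) + 0.5 * math.log1p(x * x) + x / 10)

def dF(x):
    # F'(x) = c * (arccot(x) + 1/10): positive and decreasing, so F is an
    # increasing concave homeomorphism
    return C * (math.pi / 2 - math.atan(x) + 0.1)

def Lam(t, lam_plus):
    # ramping parameter shift: 0 for t < 1/3, lam_plus for t > 2/3
    return lam_plus * min(max(3.0 * t - 1.0, 0.0), 1.0)
```

The fixed points $0$ and $1$ of $F$ are then classified by `dF`: $F'(0) = c(\pi/2 + 1/10) \approx 1.36 > 1$, while $F'(1) = c(\pi/4 + 1/10) \approx 0.72 < 1$.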

With $F^\lambda(x) := F(x-\lambda) + \lambda$ this setup induces a family of NRDS $\varphi^r := \varphi^{[\Lambda,r]}$ as in (4.7).

For a given compact subset $K \subseteq \mathbb{R}$ let $L$ be a Lipschitz constant of $F$ on $B_{\le\lambda_+}(K)$, which exists as $F$ is $C^1$. Then for all $x \in K$ we have
$$|F^{\Lambda(rn)}(x) - F(x)| = |F(x - \Lambda(rn)) - F(x) + \Lambda(rn)| \le (L+1)\,\Lambda(rn),$$
which shows that indeed $F^\lambda \to F$ compactly as $\lambda \to 0$. Similarly, $DF^\lambda \to DF$ compactly as $\lambda \to 0$, and $F^\lambda \to F^+$ and $DF^\lambda \to DF^+$ as $\lambda \to \lambda_+$. Note that by the choice of $\Lambda$, $\varphi^r = \varphi^{2/3}$ for all $r \ge \frac{2}{3}$. We denote the forward and backward limit systems as

$$g_\omega(x) = F(x) + \sigma\omega_0 \quad\text{and}\quad \hat g_\omega(x) = F(x - \lambda_+) + \lambda_+ + \sigma\omega_0,$$
respectively. One sees easily that $F$ is monotonically increasing and has two fixed points, an unstable one at $x = 0$ and a stable one at $x = 1$. There is a unique point $x^*$ in $(0,1)$ with $F'(x^*) = 1$, and the map $x \mapsto F(x) - x$ assumes its maximum $\sigma^* > 0$ at $x = x^*$. If we pick $0 < \sigma < \sigma^*$, then the maps $x \mapsto F(x) \pm \sigma$ have two fixed points each, which we denote as
$$q_{\min} < 0 < q_{\max} < x^* < a_{\min} < 1 < a_{\max}.$$
It is then not hard to see that $1$ is asymptotically $\sigma$-stable, where $U_1$ can be any compact neighborhood of $I^\sigma = [a_{\min}, a_{\max}]$ with $U_1 \subseteq (x^*, \infty)$. In this case $I^\sigma$ is the unique minimal forward-invariant set of the set-valued system $G_\sigma$ as described in Remark 4.17. Figure 5 gives a graphical overview of the defined quantities. Thus we are in the situation of Theorem 4.11, and there exist an a.s. unique attracting random fixed point $a_- \in [a_{\min}, a_{\max}]$ of $T \ltimes g$ and, for each $r > 0$, an a.s. unique attracting NARF $(a^r_n)_{n\in\mathbb{Z}}$ uniformly past-tracking $a_-$. In fact, $f_{n,\omega} = g_\omega$ for all $n$ with $rn < \frac{1}{3}$, and thus we have $a^r_n = a_-$ in those cases. Similarly, the future limit system $T \ltimes \hat g$ has an a.s. unique attracting random fixed point $a_+ \in [a_{\min} + \lambda_+,\, a_{\max} + \lambda_+]$. It is also clear that

$$(U, a): [0, \lambda_+] \to \mathcal{K} \times \mathbb{R}, \quad \lambda \mapsto (U_1 + \lambda,\, 1 + \lambda)$$

Figure 5: Graphical definition of $q_{\min}$, $q_{\max}$, $a_{\min}$, $a_{\max}$, $I^\sigma$, $x^*$ and $\sigma^*$.

defines a contracting, convex $\sigma$-stable branch connecting $a_-$ and $a_+$. We denote the tipping probability function from Definition 4.24 by $\operatorname{Pr}$. By the choice of $F$ and $(\Omega, \mathcal{A}, \mu, T)$, all the NRDS $\varphi^r$ are reversible, and we can apply the same argumentation as above to see that each $\varphi^r$ has an a.s. unique repelling NARF $(q^r_n)_{n\in\mathbb{Z}}$, which is identical to the unique repelling random fixed point $q_+ \in [q_{\min} + \lambda_+,\, q_{\max} + \lambda_+]$ of $T \ltimes \hat g$ whenever $rn > \frac{2}{3}$. This repelling fixed point is a valuable tool in analyzing tipping probabilities beyond the scope of the previous section, as we have the following characterization.

Lemma 4.30. For all $r > 0$ and $n \in \mathbb{Z}$,
$$\Omega^r_t = \{a^r_n \circ T^n \le q^r_n \circ T^n\}. \tag{4.13}$$

Proof. That (4.13) does not depend on $n$ is clear from the monotonicity of $f_{n,\omega}$. If $a^r_n(T^n\omega) \le q^r_n(T^n\omega)$ for some $\omega \in \Omega$, then for any given $n$ with $rn > \frac{2}{3}$ we have
$$a^r_n(T^n\omega) \le q_{\max} + \lambda_+ < a_{\min} + \lambda_+ \le a_+(T^n\omega),$$
and hence $\omega \in \Omega^r_t$. Assume on the other hand that $a^r_n(T^n\omega) > q^r_n(T^n\omega)$. Fix $n_0$ with $n_0 r > \frac{2}{3}$ and let $\varepsilon > 0$ be so small that $x^* - \varepsilon > q_{\max}$. For simplicity we will write $a_n$ and $q_n$ instead of $a^r_n(T^n\omega)$ and $q^r_n(T^n\omega)$, respectively. First we show that $a_n > \lambda_+ + x^* - \varepsilon$ for some large enough $n \ge n_0$. To see this, assume $a_n \le \lambda_+ + x^* - \varepsilon$ for all $n$ with $rn > \frac{2}{3}$ and denote
$$\Theta := F'(x^* - \varepsilon) = \inf_{x \le x^* - \varepsilon} F'(x) > 1.$$

Then we have by the Mean Value Theorem that

$$a_{n+k} \ge q_{n+k} + \Theta^k (a_n - q_n) \xrightarrow{k\to\infty} \infty,$$
a contradiction. This implies now that $a_n > \lambda_+ + x^* + \varepsilon$ for some large enough $n$. Indeed, let

$$\delta := \sup_{x\in[x^*-\varepsilon,\, x^*+\varepsilon]} F(x) - x > 0.$$

Then, if $a_n \in [\lambda_+ + x^* - \varepsilon,\, \lambda_+ + x^* + \varepsilon]$ for all $n$ with $nr > \frac{2}{3}$, we get

$$a_{n+k} \ge k\delta + a_n \xrightarrow{k\to\infty} \infty,$$
again a contradiction. If we assume w.l.o.g. that $x^* + \varepsilon \in U_1$, Lemma 4.25 shows that in this case indeed $\omega \in \Omega^r_{nt}$.

We have already established that $a^r_n$ depends continuously on $r$, so the above characterization of tipping already indicates that $\operatorname{Pr}$ is a continuous function. This is indeed the case, but in order to prove it we need the following.

Lemma 4.31. $\mu\{a^r_n = q^r_n\} = 0$.

Proof. We show that each $a_n = a^r_n$ has a density $h_n$ w.r.t. the Lebesgue measure on $\mathbb{R}$. For $rn < \frac{1}{3}$ this follows from the fact that $a_n = a_-$ and Theorem 1.3 in [77]. Assume now that we have constructed a density $h_n$. Since $a_n$ is measurable w.r.t. $\omega_{-1}, \omega_{-2}, \dots$ and $f_{n,\omega}$ w.r.t. $\omega_0$, we can use the Markov property of $\varphi^r$ and the fact that the $f_{n,\omega}$ are diffeomorphisms to see that for $B \in \mathcal{B}(\mathbb{R})$,
$$\mu\{a_{n+1} \in B\} = \iint 1_{f^{-1}_{n,\omega}B}(a_n(\omega'))\, d\mu(\omega)\, d\mu(\omega') = \iint 1_{f^{-1}_{n,\omega}B}(x)\, h_n(x)\, dx\, d\mu(\omega)$$

$$= \int 1_B(x) \int \frac{h_n(f^{-1}_{n,\omega}(x))}{f'_{n,\omega}(f^{-1}_{n,\omega}(x))}\, d\mu(\omega)\, dx,$$
such that we have identified
$$h_{n+1}(x) := \int \frac{h_n(f^{-1}_{n,\omega}(x))}{f'_{n,\omega}(f^{-1}_{n,\omega}(x))}\, d\mu(\omega)$$
as the density of $a_{n+1}$.⁶ Similarly, all $q_n$ are absolutely continuous w.r.t. the Lebesgue measure, and moreover $a_n$ and $q_n$ are independent, such that also $a_n - q_n$ is absolutely continuous, which concludes the proof.

This was the last step missing to show the aforementioned continuity.

Theorem 4.32. The map $\operatorname{Pr}: (0,\infty) \to [0,1]$ is uniformly continuous.

⁶In fact we applied a non-autonomous version of the random operator defined in [77].

86 Proof. Let r0, r1, r2,... ∈ (0, ∞) with rk → r0 and write

$$h_k(\omega) := a^{r_k}_n(\omega) - q^{r_k}_n(\omega)$$
as well as
$$\tilde h_k := 1_{\{h_k < 0\}}$$
for an arbitrary $n \in \mathbb{Z}$. Theorem 4.19 implies that $h_k \to h_0$ a.s., and thus $\tilde h_k \to \tilde h_0$ on the set $\{h_0 \ne 0\}$, which is of full measure according to Lemma 4.31. Hence the Dominated Convergence Theorem yields that
$$\operatorname{Pr}(r_k) = \mathbb{E}\tilde h_k \xrightarrow{k\to\infty} \mathbb{E}\tilde h_0 = \operatorname{Pr}(r_0),$$
which shows continuity of $\operatorname{Pr}$. Moreover, Theorem 4.26 implies that $\operatorname{Pr}(r) \equiv 0$ on some interval $(0, r^*)$, and if $r \ge \frac{2}{3}$ then $\varphi^r$ does not depend on $r$, such that $\operatorname{Pr}$ is constant on $[\frac{2}{3}, \infty)$. This shows uniform continuity.

Numerical method to calculate Pr

Lemma 4.30 allows us to devise a numerical method to approximately calculate the tipping probability function $\operatorname{Pr}$. First, for $r > 0$ we denote
$$N_0(r) := \left\lfloor \frac{1}{3r} \right\rfloor \quad\text{and}\quad N_1(r) := \left\lceil \frac{2}{3r} \right\rceil, \tag{4.14}$$
i.e. $N_0(r)$ is the largest integer $n$ such that $f^r_n = g$ and $N_1(r)$ the smallest integer $n$ with $f^r_n = \hat g$. Due to Lemma 4.31 and the fact that $q_+$ repels $[\lambda_+ + q_{\min},\, \lambda_+ + q_{\max}]$, the random integer

$$N_2(r, \omega) := \inf\{n \ge N_1(r) \mid a^r_n(T^n\omega) \notin [q_{\min} + \lambda_+,\, q_{\max} + \lambda_+]\}$$
is almost surely finite for every $r > 0$. We denote $k_0(r, \omega) := N_2(r, \omega) - N_0(r)$ and

$$G(r, \omega) := \varphi^r(k_0(r, \omega), N_0(r); \omega).$$

Lemma 4.30 and the fact that $a^r_{N_0(r)} = a_-$ imply that $\omega \in \Omega^r_t$ if and only if

$$a^r_{N_2(r,\omega)}(T^{N_2(r,\omega)}\omega) = G(r, T^{N_0(r)}\omega)\, a_-(T^{N_0(r)}\omega) < q_{\min} + \lambda_+.$$

By definition, $G(r, \omega)$ depends only on $\omega_0, \omega_1, \dots$, and $a_-(\omega)$ only on $\omega_{-1}, \omega_{-2}, \dots$ by the reasoning from Remark 2.22. Thus they are independent, and due to the $T$-invariance of $\mu$ we deduce that
$$\operatorname{Pr}(r) = \iint 1_{\{G(r,\omega)a_-(\omega') < q_{\min} + \lambda_+\}}\, d\mu(\omega)\, d\mu(\omega'). \tag{4.15}$$

Based on this we implemented the following numerical method for calculating $\operatorname{Pr}(r)$.

1.) First we sample $N$ points $a^1, \dots, a^N$ from $[a_{\min}, a_{\max}]$ according to the law of $a_-$. As $g^k_{T^{-k}\omega}(1)$ converges exponentially fast towards $a_-(\omega)$, by the $T$-invariance of $\mu$ this can be well approximated by $a^i = g^K_{\omega^i}(1)$ for some large enough $K$, where the $\omega^i \in [-1,1]^K$ are picked independently.

2.) For each $i = 1, \dots, N$ we calculate $G(r, \omega^{i,j})a^i$ for $M$ independent noise realizations $\omega^{i,1}, \dots, \omega^{i,M} \in [-1,1]^{k_0}$ and set $p_{i,j} = 1$ if $G(r, \omega^{i,j})a^i < q_{\min} + \lambda_+$ and $p_{i,j} = 0$ otherwise.

3.) Due to the Strong Law of Large Numbers, (4.15) yields that

$$\operatorname{Pr}(r) = \lim_{M,N\to\infty} \frac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M} p_{i,j}$$
a.s., such that for sufficiently large $M$ and $N$ we have

$$\operatorname{Pr}(r) \approx \frac{1}{MN}\sum_{i=1}^{N}\sum_{j=1}^{M} p_{i,j}.$$
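The three steps can be condensed into a compact Monte Carlo sketch. This is a simplified stand-in for the thesis implementation, not the original code: it uses much smaller sample sizes, finds $q_{\min}, q_{\max}$ by bisection, realizes the stopping time $N_2$ as escape from the repelling interval, and caps the iteration count as a crude safeguard.

```python
import math
import random

C = 1.0 / (math.pi / 4 + math.log(2) / 2 + 0.1)   # normalization: F(1) = 1

def F(x):
    # F(x) = c * (x * arccot(x) + (1/2) ln(1 + x^2) + x/10)
    return C * (x * (math.pi / 2 - math.atan(x)) + 0.5 * math.log1p(x * x) + x / 10)

def Lam(t, lam_plus):
    # ramping parameter shift from 0 to lam_plus
    return lam_plus * min(max(3.0 * t - 1.0, 0.0), 1.0)

def bisect(f, a, b, it=80):
    # simple bisection for a sign change of f on [a, b]
    fa = f(a)
    for _ in range(it):
        m = 0.5 * (a + b)
        if (f(m) > 0) == (fa > 0):
            a, fa = m, f(m)
        else:
            b = m
    return 0.5 * (a + b)

def tipping_probability(r, sigma, lam_plus, N=100, M=20, K=300, rng=random):
    # fixed points q_min < 0 < q_max of x -> F(x) +/- sigma (needs sigma < sigma*)
    q_min = bisect(lambda x: F(x) + sigma - x, -1.0, 0.0)
    q_max = bisect(lambda x: F(x) - sigma - x, 1e-9, 0.47)
    n0 = math.floor(1 / (3 * r))                  # N_0(r), cf. (4.14)
    hits = 0
    for _ in range(N):
        a = 1.0                                   # step 1: sample a_- by iterating
        for _ in range(K):                        # the past limit system from x = 1
            a = F(a) + sigma * rng.uniform(-1, 1)
        for _ in range(M):                        # step 2: M realizations per sample
            x, n = a, n0
            for _ in range(2000):                 # iteration cap as crude safeguard
                lam = Lam(r * n, lam_plus)
                x = F(x - lam) + lam + sigma * rng.uniform(-1, 1)
                n += 1
                if r * n >= 2 / 3:                # future limit reached: compare
                    if x < q_min + lam_plus:      # with the repelling interval
                        hits += 1                 # p_{i,j} = 1: tipping
                        break
                    if x > q_max + lam_plus:
                        break                     # p_{i,j} = 0: tracking
    return hits / (N * M)                         # step 3: the average in (4.15)
```

For slow rates the orbit tracks the moving attractor and the estimate stays near 0, while a fast shift drops the orbit below the repeller with high probability.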

For practical reasons we calculate this average as an online update: we start with $p_0 = 0$, and if $p_{i,j}$ is the $k$-th value calculated, we update
$$p_k = p_{k-1} + \frac{1}{k}\left(p_{i,j} - p_{k-1}\right), \tag{4.16}$$
such that $\operatorname{Pr}(r) \approx p_{MN}$. We want to emphasize that (4.16) is a Robbins–Monro algorithm as described in Chapter 3.

Figure 6 shows a number of plots of $\operatorname{Pr}$ for various values of $\sigma$ and $\lambda_+$, obtained by the above algorithm with $N = 2000$, $K = 1000$ and $M = 5000$. We make the following observations.

(a) Independently of $\sigma$ and $\lambda_+$, the tipping probability is not a monotone function of the rate. There seem to be a certain number of points at which the graph of $\operatorname{Pr}$ has a peak, and the positions of these peaks are independent of $\lambda_+$ and $\sigma$.

(b) For the given $\lambda_+$ there are three critical rates $r_1 < r_2 < r_3$ such that the noiseless system ($\sigma = 0$) tips for rates $r_1 < r < r_2$ and $r > r_3$, and does not tip for $r < r_1$ and $r_2 < r < r_3$. The graphs of $\operatorname{Pr}$ for various values of $\sigma$ seem to intersect in the points $(r_i, \frac{1}{2})$. Moreover, between $r_1$ and $r_2$ and to the right of $r_3$, $\operatorname{Pr}(r)$ seems to grow monotonically towards $1$, and to decrease monotonically towards $0$ otherwise.

At the present stage we do not have analytical explanations for all the phenomena described above, but in the following we want to lay out some observations and partial results that seem to have some significance.

(a) A closer analysis of the positions of the peaks in the graphs of $\operatorname{Pr}$ suggests some regularity: they seem to be located at rates of the form $r = \frac{2}{3k}$ with $k \in \mathbb{N}$. These rates appear naturally in the formulation of the problem. If we let $N_0(r)$ and $N_1(r)$ be as in (4.14), then $N_1(r) - N_0(r)$ is the number of time steps $n$ such that $f_n$ is not equal to either the past or the future limit of the NRDS. Explicitly one can calculate that
$$N_1(r) - N_0(r) = \begin{cases} k+1 & \text{if } \frac{2}{3(2k+1)} \le r < \frac{2}{3(2k-1)} \text{ and } r \ne \frac{1}{3k}, \\ k & \text{if } r = \frac{1}{3k}, \end{cases}$$

where $k \in \mathbb{N}$. Thus the points $r = \frac{2}{3k}$ are exactly the discontinuities of $N_1 - N_0$. We know that $\operatorname{Pr}(r) = 0$ for small enough $r$ due to Theorem 4.26, and if $\operatorname{Pr} \equiv 1$ on a neighborhood of a point $r$, then $\operatorname{Pr}$ is obviously differentiable in $r$. Thus we formulate the following conjecture.

Conjecture 4.33. For any given $\lambda_+$ and $\sigma$ the tipping probability $\operatorname{Pr}$ is at least $C^1$ in all but finitely many points. The exceptional points are exactly those of the form $r = \frac{2}{3k}$ with $k \in \mathbb{N}$ such that $0 < \operatorname{Pr}(r) < 1$. Moreover, $\operatorname{Pr}$ assumes a local maximum in all those points.
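With the floor/ceiling expressions from (4.14), the case distinction for $N_1(r) - N_0(r)$ is easy to check numerically. The following quick sanity check is ours, not from the thesis; `diff_formula` encodes the piecewise expression and is only valid for $r < \frac{2}{3}$:

```python
import math

def N0(r):
    # largest n with f_n^r = g, i.e. with r*n <= 1/3
    return math.floor(1 / (3 * r))

def N1(r):
    # smallest n with f_n^r = g-hat, i.e. with r*n >= 2/3
    return math.ceil(2 / (3 * r))

def diff_formula(r):
    # the piecewise expression for N1(r) - N0(r) stated above (r < 2/3)
    k = 1
    while not (2 / (3 * (2 * k + 1)) <= r < 2 / (3 * (2 * k - 1))):
        k += 1
    return k if abs(r - 1 / (3 * k)) < 1e-12 else k + 1

for r in (0.07, 0.1, 0.13, 0.21, 0.3, 1 / 3, 0.41, 0.55):
    assert N1(r) - N0(r) == diff_formula(r)
```

The discontinuities of `diff_formula` sit exactly at the rates $r = \frac{2}{3k}$, which are the conjectured peak positions.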

[Plots omitted; legend: λ+ ∈ {1.0, 1.05, 1.1, 1.15, 1.2, 1.25} (top panel, σ = 0.04) and σ ∈ {0.0, 0.01, 0.02, 0.03, 0.04, 0.05} (bottom panel, λ+ = 1.05).]

Figure 6: Tipping probability Pr (r) plotted against the rate r for a fixed noise level σ = 0.04 and varying values of λ+ (top image) and for a fixed value of λ+ = 1.05 and varying noise levels σ (bottom image), including the limiting case σ = 0 of the deterministic system without added noise.


Figure 7: Critical rates for the deterministic system (i.e. σ = 0). For each value of λ+ there are up to 5 critical rates. The graph separates the (λ+, r)-parameter plane into two regions; the region to the right of the graph contains exactly the parameter values for which there is deterministic tipping.

The following statement could be a partial explanation.

Lemma 4.34. If $r \in \left[\frac{4}{9}, \frac{2}{3}\right)$, then $\Pr\left(\frac{r}{2}\right) = \Pr(r)$.

Proof. Assume $r \in \left[\frac{4}{9}, \frac{2}{3}\right)$. In this case $nr \in \left[\frac{1}{3}, \frac{2}{3}\right)$ only for $n = 1$, and thus $f^r_n = g$ for $n \le 0$, $f^r_n = \hat g$ for $n \ge 2$ and
\[
f^r_{1,\omega}(x) = F(x - \Lambda(r)) + \Lambda(r) + \sigma\omega_0.
\]
Similarly, if $r' \in \left[\frac{2}{9}, \frac{1}{3}\right)$, then $nr' \in \left[\frac{1}{3}, \frac{2}{3}\right)$ if and only if $n = 2$, and with $r' = \frac{r}{2}$ we get that $f^r_n = f^{r'}_{n+1}$ for all $n \in \mathbb{Z}$ and thus $\Omega_t^{r'} = T^{-1}\Omega_t^{r}$, and the claim follows from the invariance of µ.

A similar argument shows that $\Pr\left(\frac{1}{6}\right) = \Pr\left(\frac{1}{4}\right) = \Pr\left(\frac{1}{2}\right)$ and $\Pr\left(\frac{1}{3}\right) = \Pr(r)$ for all $r \ge \frac{2}{3}$.

(b) Let us first turn our attention to the deterministic counterpart of the systems $\varphi^r$, i.e. the boundary case σ = 0. For any $r$, this system is given via

\[
\Psi^r(k, n)\,x = F^r_{n+k-1} \circ \cdots \circ F^r_n(x), \quad\text{where } F^r_i(x) = F(x - \Lambda(ri)) + \Lambda(ri).
\]

In accordance with the findings of this section, we can define tipping for the family $(\Psi^r)_{r>0}$ in the following way. If $N_0$ and $N_1$ are as in (4.14), and $k_1 := N_1 - N_0$, then we say there is tipping at rate $r$ if and only if

\[
\Psi^r(k_1(r), N_0(r))(1) < \lambda_+. \tag{4.17}
\]

If (4.17) holds with "=" instead of "<", we say that $r$ is a critical rate. Figure 7 shows a numerical calculation of the critical rates depending on λ+. This was done with a brute-force algorithm: for a given λ+, check for 10,000 rates, evenly spaced in (0.01, 0.67), whether the system tips or not. If for two consecutive rates $r_1$ and $r_2$ a different result was found, repeat the process with 10,000 points in $(r_1, r_2)$ to determine a critical rate with an error margin of $10^{-8}$. We see that for every value of λ+, between one and five such critical rates exist.

Using a Taylor approximation one can obtain the following result.

Theorem 4.35. Let $\varphi^r$ be the family of NRDS as defined at the beginning of this Section 4.4, with associated tipping probability function Pr. For every rate $r > 0$ the following assertions are true.
(a) If the deterministic system tips at rate $r$, then $\Pr(r) \to 1$ in the limit $\sigma \to 0$.
(b) If $r$ is a critical rate, then $\Pr(r) \to \frac{1}{2}$ as $\sigma \to 0$.
(c) Otherwise, $\Pr(r) \to 0$ as $\sigma \to 0$.

Our proof is based on the following three lemmas, in which we first show that the attractor $a_-$ and the repeller $q_+$ differ from the corresponding attractor and repeller of the locally linearized, and thus symmetric, system only by terms of order $\sigma^2$. Then we use this to obtain a representation of $a_{N_1}$ as a sum of a random variable symmetric around $\Psi(k_1(r), N_0(r))(1)$ and a small error term.
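The brute-force refinement described above can be written generically for any boolean tipping predicate; the predicate used below is a stand-in with known flip points, not the actual parameter shift system:

```python
def critical_rates(tips, lo, hi, coarse=10_000, fine=10_000):
    """Two-stage grid search: scan [lo, hi] with `coarse` intervals and,
    wherever the tipping verdict flips between neighbours, rescan that
    subinterval with `fine` intervals.  The result has an error margin of
    roughly (hi - lo) / (coarse * fine)."""
    found = []
    step = (hi - lo) / coarse
    for i in range(coarse):
        a, b = lo + i * step, lo + (i + 1) * step
        if tips(a) != tips(b):
            fstep = (b - a) / fine
            for j in range(fine):
                fa, fb = a + j * fstep, a + (j + 1) * fstep
                if tips(fa) != tips(fb):
                    found.append(0.5 * (fa + fb))
    return found

# Stand-in predicate that "tips" exactly for rates in (0.2, 0.45):
rates = critical_rates(lambda r: 0.2 < r < 0.45, 0.01, 0.67)
```

With the parameters of the text (10,000 points at both stages over (0.01, 0.67)) the error margin is about $3 \cdot 10^{-9}$, consistent with the quoted $10^{-8}$.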

All that remains is to combine these lemmas to get a corresponding expression for $a_{N_1} - q_{N_1}$, in which we can apply the limit $\sigma \to 0$.

Lemma 4.36. There exist a symmetric, bounded random variable $X$, $\sigma_0 \in (0, \sigma^*)$, $c > 0$ and a family $(\xi^\sigma)_{\sigma\in(0,\sigma_0)}$ of random variables such that the following assertions hold for all $\sigma \in (0, \sigma_0)$.
• $X$ is measurable w.r.t. $\omega_{-1}, \omega_{-2}, \ldots$.
• $X$ is absolutely continuous w.r.t. the Lebesgue measure on $\mathbb{R}$.
• $a_- = 1 + \sigma X + \xi^\sigma$.
• $|\xi^\sigma| \le c\sigma^2$.

Proof. We write $\Theta := F'(1) \in (0, 1)$. By the Implicit Function Theorem there exist $\sigma_0 \in (0, \sigma^*)$ and $c_1 > 0$ such that $a_{\max}(\sigma) < 1 + \sigma c_1$ and $a_{\min}(\sigma) > 1 - \sigma c_1$ for all $\sigma \in (0, \sigma_0)$. By Taylor's Theorem we can write
\[
g_\omega(1 + x) = 1 + \Theta x + R(x) + \sigma\omega_0
\]
with the remainder term $R$ fulfilling the inequality
\[
|R(x)| \le c_2 x^2
\]
whenever $1 + x \in I^\sigma$ and $\sigma \in (0, \sigma_0)$, where
\[
c_2 := \frac{1}{2} \sup_{1+y \in I^{\sigma_0}} |F''(1 + y)|.
\]

For $k \in \mathbb{N}$ and $\omega \in \Omega$ we denote
\[
X_k(\omega) := \sum_{j=1}^{k} \Theta^{j-1} \omega_{-j}.
\]

We show by induction that there exists a sequence of random variables $(\xi^\sigma_k)_{k\in\mathbb{N}}$ such that
\[
|\xi^\sigma_k| \le c_1 c_2 \sigma^2 \sum_{j=0}^{k-1} \Theta^j
\]
and
\[
g^k_{T^{-k}\omega}(1) = 1 + \sigma X_k(\omega) + \xi^\sigma_k(\omega)
\]
for all $k \in \mathbb{N}$, $\omega \in \Omega$ and $\sigma \in (0, \sigma_0)$. For $k = 1$ this is clear, so assume we have proved this claim up to some value $k \in \mathbb{N}$. Then we have

\[
\begin{aligned}
g^{k+1}_{T^{-(k+1)}\omega}(1)
&= g_{T^{-1}\omega}\left(1 + \sigma X_k(T^{-1}\omega) + \xi^\sigma_k(T^{-1}\omega)\right) \\
&= 1 + \sigma\left(\Theta X_k(T^{-1}\omega) + \omega_{-1}\right) + R(\tilde X_{k+1}(\omega)) + \Theta\xi^\sigma_k(T^{-1}\omega) \\
&= 1 + \sigma X_{k+1}(\omega) + \xi^\sigma_{k+1}(\omega),
\end{aligned}
\]
where
\[
\tilde X_{k+1}(\omega) := \sigma X_k(T^{-1}\omega) + \xi^\sigma_k(T^{-1}\omega)
\quad\text{and}\quad
\xi^\sigma_{k+1} := R(\tilde X_{k+1}) + \Theta\,\xi^\sigma_k \circ T^{-1}.
\]

As $I^\sigma$ is forward invariant for all $g_\omega$, $1 + \tilde X_{k+1} \in I^\sigma$ a.s., and thus we have the estimate

\[
|\xi^\sigma_{k+1}| \le c_2 \tilde X_{k+1}^2 + \Theta c_1 c_2 \sigma^2 \sum_{j=0}^{k-1} \Theta^j \le c_1 c_2 \sigma^2 \sum_{j=0}^{k} \Theta^j.
\]

Now, the sequence $(X_k)_{k\in\mathbb{N}}$ is Cauchy, so that its limit $X$ exists a.s. Indeed we have the estimate
\[
|X_{k+l}(\omega) - X_k(\omega)| = \left|\Theta^k \sum_{j=0}^{l-1} \Theta^j \omega_{-(k+j+1)}\right| \le \frac{\Theta^k}{1 - \Theta}.
\]
That $X$ is symmetric and measurable with respect to $\omega_{-1}, \omega_{-2}, \ldots$ is clear, and as $|X_k| \le \frac{1}{1-\Theta}$ a.s. for all $k \in \mathbb{N}$, $X$ is bounded as well. $X$ is a random fixed point of the RDS with base $(\Omega, \mathcal{A}, \mu, T)$ and fiber maps $(\omega, x) \mapsto \Theta x + \omega_0$, and thus is absolutely continuous w.r.t. the Lebesgue measure according to [77, Theorem 1.3]. Due to (4.9),

\[
a_- = 1 + \lim_{k\to\infty}\left(\sigma X_k + \xi^\sigma_k\right) \tag{4.18}
\]

a.s., and thus $\xi^\sigma := \lim_{k\to\infty} \xi^\sigma_k$ is well defined with
\[
|\xi^\sigma| \le c\sigma^2 := \frac{c_1 c_2}{1 - \Theta}\,\sigma^2.
\]
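The limit $X$ constructed above is the stationary solution of the linear system $x \mapsto \Theta x + \omega_0$. A small Monte Carlo sketch (uniform noise on $[-1, 1]$ is an assumption made for illustration) shows the a.s. bound $|X| \le \frac{1}{1-\Theta}$ and the symmetry of its law:

```python
import random

def sample_X(theta, k=200, rng=random.Random(0)):
    """Truncated series X_k = sum_{j=1}^k theta^(j-1) * omega_{-j},
    with the omega_{-j} i.i.d. uniform on [-1, 1]."""
    return sum(theta ** (j - 1) * rng.uniform(-1.0, 1.0)
               for j in range(1, k + 1))

theta = 0.6
xs = [sample_X(theta) for _ in range(10_000)]
bound = 1.0 / (1.0 - theta)                 # here: 2.5
assert all(abs(x) <= bound for x in xs)     # the geometric-series bound
mean = sum(xs) / len(xs)                    # near 0, since the law is symmetric
```

The truncation at $k = 200$ is harmless, since the tail is bounded by $\Theta^k/(1-\Theta)$, mirroring the Cauchy estimate in the proof.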

By applying the same argument to the inverse of the future limit system $T \ltimes \hat g$ we obtain the following.

Lemma 4.37. There exist a symmetric, bounded random variable $Z$, $\sigma_0 \in (0, \sigma^*)$, $c > 0$ and a family $(\zeta^\sigma)_{\sigma\in(0,\sigma_0)}$ of random variables such that the following assertions hold for all $\sigma \in (0, \sigma_0)$.
• $Z$ is measurable w.r.t. $\omega_0, \omega_1, \ldots$.
• $Z$ is absolutely continuous w.r.t. the Lebesgue measure on $\mathbb{R}$.
• $q_+ = \lambda_+ + \sigma Z + \zeta^\sigma$.
• $|\zeta^\sigma| \le c\sigma^2$.

Let $N_0$ and $N_1$ be as in (4.14).

Lemma 4.38. There exist a symmetric, bounded random variable $Y$, $\sigma_0 \in (0, \sigma^*)$ and a family $(\upsilon^\sigma)_{\sigma\in(0,\sigma_0)}$ of random variables such that the following assertions hold.
• $Y$ is measurable w.r.t. $\omega_{-1}, \omega_{-2}, \ldots$.
• $Y$ is absolutely continuous w.r.t. the Lebesgue measure on $\mathbb{R}$.
• $a_{N_1} = \Psi(k_1, N_0)(1) + \sigma Y + \upsilon^\sigma$ for all $\sigma \in (0, \sigma_0)$.
• $\upsilon^\sigma = o(\sigma)$.

Proof. Let $\sigma_0$ be as in Lemma 4.36. For $k = 0, \ldots, k_1$ we denote $y_k = \Psi(k, N_0)(1)$. Let moreover $p$ denote an upper bound on $|F''|$. We show by a finite induction that there exist symmetric, past-measurable and absolutely continuous random variables $Y_0, \ldots, Y_{k_1}$ and random variables $\upsilon^\sigma_0, \ldots, \upsilon^\sigma_{k_1} = o(\sigma)$, $\sigma \in (0, \sigma_0)$, such that
\[
a_{N_0+k} = y_k + \sigma Y_k + \upsilon^\sigma_k.
\]
Then the lemma is proven, as we can choose $Y = Y_{k_1}$ and $\upsilon^\sigma = \upsilon^\sigma_{k_1}$. For $k = 0$ our claim is true, as $a_{N_0} = a_-$, and thus we can pick $Y_0 = X$ and $\upsilon^\sigma_0 = \xi^\sigma$ from Lemma 4.36. Assume now the claim has been proven up to some value $k$. Due to Taylor's Theorem we can write

\[
F_{N_0+k}(y_k + x) = y_{k+1} + b_k x + R_k(x)
\]
with $b_k = F'_{N_0+k}(y_k)$ and the remainder term $R_k$ fulfilling the inequality
\[
|R_k(x)| \le \frac{p}{2} x^2.
\]
With the notations

\[
\tilde Y_k := \sigma Y_k + \upsilon^\sigma_k, \qquad
Y_{k+1}(\omega) := b_k Y_k(T^{-1}\omega) + \omega_{-1}
\]
and
\[
\upsilon^\sigma_{k+1}(\omega) := R_k(\tilde Y_k(T^{-1}\omega)) + b_k \upsilon^\sigma_k(T^{-1}\omega),
\]
this implies the relation
\[
\begin{aligned}
a_{N_0+k+1}(\omega)
&= F_{N_0+k}\left(y_k + \sigma Y_k(T^{-1}\omega) + \upsilon^\sigma_k(T^{-1}\omega)\right) + \sigma\omega_{-1} \\
&= y_{k+1} + \sigma Y_{k+1}(\omega) + \upsilon^\sigma_{k+1}(\omega).
\end{aligned}
\]

That $Y_{k+1}$ is bounded, symmetric, absolutely continuous and past-measurable follows from the fact that it is a sum of two independent random variables fulfilling all those properties. Moreover we have
\[
\frac{|\upsilon^\sigma_{k+1}|}{\sigma}
\le p\,\frac{\tilde Y_k^2}{\sigma} + b_k \frac{|\upsilon^\sigma_k|}{\sigma}
\le p\sigma\left(Y_k + \frac{|\upsilon^\sigma_k|}{\sigma}\right)^2 + b_k \frac{|\upsilon^\sigma_k|}{\sigma}
\xrightarrow{\sigma\to 0} 0,
\]
since $Y_k$ is bounded and $\upsilon^\sigma_k \in o(\sigma)$.

Proof of Theorem 4.35. With the notation from Lemmas 4.37 and 4.38 we can write
\[
a_{N_1} - q_{N_1} = a_{N_1} - q_+ = \Psi(k_1, N_0)(1) - \lambda_+ + \sigma D^\sigma,
\]
where
\[
D^\sigma := Y - Z + \frac{\upsilon^\sigma - \zeta^\sigma}{\sigma}.
\]
Note that $\sigma D^\sigma \to 0$ a.s. We analyze the three cases (a), (b) and (c) of the theorem separately.

(a) If the deterministic system tips, then $\Psi(k_1, N_0)(1) - \lambda_+ < 0$ and thus, as $\sigma D^\sigma \to 0$ a.s., we see that
\[
\Pr(r) = \mu\{a_{N_1} - q_{N_1} < 0\} \xrightarrow{\sigma\to 0} 1.
\]

(b) At a critical rate $r$ we have $a_{N_1} - q_{N_1} = \sigma D^\sigma$ and hence
\[
\Pr(r) = \mu\{D^\sigma < 0\}.
\]
The sequence of $D^\sigma$ converges a.s. to the random variable $D := Y - Z$, and therefore also in distribution. As a sum of two independent, symmetric and absolutely continuous random variables, $D$ is symmetric and absolutely continuous itself. In particular, the cumulative distribution function of $D$ is continuous in 0 and attains the value $\frac{1}{2}$ there. This now implies that
\[
\Pr(r) = 1 - \mu\{-D^\sigma \le 0\} \xrightarrow{\sigma\to 0} 1 - \mu\{-D \le 0\} = \frac{1}{2}.
\]

(c) Analogous to (a).
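Case (b) uses only that $D = Y - Z$ is symmetric with a distribution function continuous at 0. A Monte Carlo sanity check, with arbitrary symmetric stand-ins for $Y$ and $Z$ (these are not the variables from the lemmas):

```python
import random

rng = random.Random(1)

def sample_D():
    """One sample of D = Y - Z with symmetric stand-in distributions."""
    y = rng.uniform(-1.0, 1.0)                                  # stand-in for Y
    z = (rng.uniform(-1.0, 1.0) + rng.uniform(-1.0, 1.0)) / 2   # stand-in for Z
    return y - z

samples = [sample_D() for _ in range(100_000)]
frac_negative = sum(d < 0 for d in samples) / len(samples)
# By symmetry of D, frac_negative is close to 1/2.
```

Any other pair of independent symmetric, absolutely continuous laws gives the same limiting value $\frac12$, which is exactly why the critical-rate case is distribution-free.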

4.5 Further remarks and outlook

Generalizations of results and limitations The results from Section 4.2 assume the existence of very specific random neighborhoods of $A_-$ and $a_-$ respectively. Though formulated in a slightly more general way, they are somewhat tailored to the situation of bounded, additive noise as described in Section 4.3.

Given an attractor $A_-$ of the past-limit system, Definition 4.24 of a tipping probability only makes sense in the case that there is a non-autonomous random attracting set $(A_n)_{n\in\mathbb{Z}}$ such that
\[
\operatorname{dist}\left(A_n(T^n\omega), A_-(T^n\omega)\right) \xrightarrow{n\to-\infty} 0 \tag{4.19}
\]
for a.e. $\omega \in \Omega$. Unfortunately, Theorem 4.9 does not guarantee the existence of such an attractor; we have to resort to Theorem 4.11. However, the prerequisites of this result are fairly strong: $g_\omega$ must be contracting around $A_-$, and this inevitably implies that $A_-$ is in fact a random fixed point. A possible way around this restriction could be the following condition on $g$ and $U$.

Assumption (†). Let $A_-$ be a compact invariant random set for $T \ltimes g$ that attracts an essentially bounded random neighborhood $U$ of itself with the following properties.
(a) For every $\delta > 0$ there is a $k_0 \in \mathbb{N}$ such that for all $k \ge k_0$ and a.e. $\omega \in \Omega$,

\[
g^k_{T^{-k}\omega} U(T^{-k}\omega) \subseteq B_{\le\delta}(A_-(\omega)).
\]

(b) For all $K \in \mathcal{K}$ there is $L = L(K)$ such that $g_\omega$ is $L$-Lipschitz on $K$ for a.e. $\omega \in \Omega$.

This is clearly fulfilled if $g$ is contracting on $U$, but we believe that this condition holds for a larger class of attractors. However, at the present stage we cannot give any proof or example of that. Under this assumption one can follow the path of the proof of Theorem II.2 in [2]. The first step is to adapt [63, Lemma 7.1] to our time-discrete, random setting.

Lemma 4.39. Let $K$ be compact and assume (†) holds. For every $k_0 \in \mathbb{N}_0$ and $\delta > 0$ there exists $n_0 = n_0(k_0, \delta)$ such that

\[
|\varphi(k, n; \omega)x - g^k_\omega(x)| \le \delta
\]

for all $n \le n_0$, $k = 0, \ldots, k_0$, a.e. $\omega \in \Omega$ and all $x \in K$ such that $g^k_\omega(x) \in K$ for all $k = 0, \ldots, k_0$.

Sketch of proof. Let $L = L(B_{\le\delta}(K))$, $c = \delta\left(1 + L + \cdots + L^{k_0-1}\right)^{-1}$ and $n_0$ be so small that $\beta_n := \beta_n(B_{\le\delta}(K)) < c$ for all $n \le n_0 + k_0$. Similarly to the proof of Theorem 4.13 we get the inequality
\[
|\varphi(k+1, n; \omega)x - g^{k+1}_\omega(x)| \le \beta_{n+k} + L\,|\varphi(k, n; \omega)x - g^k_\omega(x)|
\]
and thus
\[
|\varphi(k, n; \omega)x - g^k_\omega(x)| \le \delta
\]
for all suitable $k$, $x$, $\omega$ and $n \le n_0$.

Then one can follow the same steps as the authors do in the aforementioned paper [2] to obtain the following result.

Theorem 4.40. Under assumption (†), there is n0 ∈ Z such that

\[
A_n(\omega) := \lim_{k\to\infty} \varphi(k, n-k; T^{-k}\omega)\,U(T^{-k}\omega)
\]
for $n \le n_0$ defines a compact, invariant NARS $(A_n)_{n\in\mathbb{Z}}$ which past-tracks $A_-$ uniformly and attracts the random set $U$.
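The role of the constant $c = \delta(1 + L + \cdots + L^{k_0-1})^{-1}$ in the sketch of the proof of Lemma 4.39 can be checked numerically: iterating the recursion $e_{k+1} \le \beta_{n+k} + L e_k$ with the worst case $\beta \equiv c$ accumulates to exactly $\delta$ after $k_0$ steps.

```python
def iterated_error(L, k0, delta):
    """Worst-case error after k0 steps of e_{k+1} = c + L * e_k, e_0 = 0,
    with c = delta / (1 + L + ... + L^(k0 - 1))."""
    c = delta / sum(L ** j for j in range(k0))
    e = 0.0                     # e_0 = |phi(0, n; omega)x - x| = 0
    for _ in range(k0):
        e = c + L * e           # one application of the Lipschitz estimate
    return e

delta = 0.1
e = iterated_error(L=2.0, k0=6, delta=delta)
assert e <= delta + 1e-12       # the accumulated error never exceeds delta
```

The geometric sum $c(1 + L + \cdots + L^{k_0-1}) = \delta$ closes the induction, which is why the bound $\beta_n < c$ on the whole relevant index range is needed.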

Alternatively one could search for a concept of tipping probability in the weak tracking case. Extending the idea from Definition 4.24 one can define a non-tipping set as

\[
\Omega_{nt}^{[\Lambda,r,A_-]} := \left\{\omega \in \Omega \,\middle|\, \operatorname{dist}\left(A_n^{[\Lambda,r,A_-]}(T^n\omega), A_+(T^n\omega)\right) \to 0\right\}.
\]

This, of course, raises the question to what extent it is justified to demand a stronger future tracking property than for the past. It should also be noted that the ansatz

\[
\mu\left\{A_\infty^{[\Lambda,r,A_-]} \subseteq A_+\right\} \tag{4.20}
\]

in the spirit of Definition 4.5 fails, since the random set $A_\infty^{[\Lambda,r,A_-]} \cap A_+$ is forward invariant and thus the value of (4.20) is either 0 or 1 due to the ergodicity of the base transformation.

The notion of partial tipping is somewhat ambiguous in a random context as well. One can argue that a tipping probability in (0, 1) is a form of partial tipping. On the other hand, the weak tipping framework permits a natural generalization of partial tipping in a spatial sense like Definition 4.5. As discussed above, the events

\[
\left\{A_\infty^{[\Lambda,r,A_-]} \subseteq A_+\right\} \quad\text{and}\quad \left\{A_\infty^{[\Lambda,r,A_-]} \cap A_+ = \emptyset\right\}
\]
both have probability either 0 or 1, and thus we can say there is partial tipping if both of them have probability 0.

Modifications of the example from Section 4.4 We consider the following modifications to the tipping problem.⁷ Instead of evaluating Λ in the points $rn$ we use the points $t_n = r(n + \delta_n)$, where $\delta_n$ is a small deterministic or random time translation. These might help with understanding the observed and described phenomena.

Deterministic fixed. For $\delta \in (0, 1)$ we let

\[
f^r_{n,\omega}(x) := F\left(x - \Lambda(r(n+\delta))\right) + \Lambda(r(n+\delta)) + \sigma\omega_0.
\]

The limiting systems $T \ltimes g$ and $T \ltimes \hat g$ remain unchanged.

Deterministic r-dependent. As before, $\delta_n = \delta$ is fixed in time, but this time chosen depending on $r$ such that $t_{N_1} = \frac{1}{3}$ for $N_1 = \lfloor\frac{1}{3r}\rfloor$. This makes the setup more regular, as the function "$r \mapsto$ number of $n$ such that $t_n \in \left[\frac{1}{3}, \frac{2}{3}\right)$" becomes monotone. This is somewhat equivalent to using a parameter shift $\bar\Lambda(t) = \Lambda\left(t - \frac{1}{3}\right)$.

Random. Let $\varepsilon \in (0, 1)$. For this case we want the numbers $\delta_n$ to be i.i.d. random numbers, uniformly distributed in $\left[-\frac{\varepsilon}{2}, \frac{\varepsilon}{2}\right]$ and independent from $\omega$. Let $(\Xi, \mathcal{B}, \nu, S)$ be a copy of $(\Omega, \mathcal{A}, \mu, T)$ and define
\[
f^r_{n,\omega,\xi}(x) = F\left(x - \tilde\Lambda(r, n, \xi_0)\right) + \tilde\Lambda(r, n, \xi_0) + \sigma\omega_0
\]
for $\xi = (\xi_n)_{n\in\mathbb{N}} \in \Xi$, where
\[
\tilde\Lambda(r, n, \xi) = \Lambda\left(r\left(n + \frac{\varepsilon}{2}\,\xi\right)\right).
\]

This defines a NRDS over $(\Omega \times \Xi, \mathcal{A} \otimes \mathcal{B}, \mu \otimes \nu, T \times S)$ limiting to RDS with fibers $g_{\omega,\xi} = g_\omega$ and $\hat g_{\omega,\xi} = \hat g_\omega$.

Figure 8 shows tipping probabilities in all three scenarios. In the deterministic fixed case the qualitative shape of the graph of Pr does not change; adding the time translations only shifts the positions of the peaks.

For deterministic, r-dependent time translations, the graph of Pr appears to be monotone but still not smooth. The points of non-differentiability remain the same, but interestingly the right limit of Pr′ in those points appears to be 0.

In the random scenario the tipping probability increases for most values of $r$; only around $r = \frac{1}{3}$ and for $r > \frac{2}{3}$ does it seem to decrease, but the sharp peaks seem to soften. The "extreme"

⁷ As per suggestion of Peter Ashwin.

[Plots omitted. Panel parameters: top σ = 0.04, λ+ = 1.05, δ ∈ {0.0, 0.4, 0.8}; middle σ = 0.07, λ+ ∈ {1.0, 1.05, 1.1, 1.15, 1.2, 1.25}; bottom σ = 0.04, λ+ = 1.05, ε ∈ {0.0, 0.25, 0.5, 0.75, 1.0}.]

Figure 8: Modified tipping example. Top: Tipping probabilities for a fixed time delay, for several values of δ. Middle: Deterministic r-dependent time delay; this is equivalent to using the parameter shift $t \mapsto \Lambda(t - \frac{1}{3})$, therefore the interesting range of rates is $[0, \frac{1}{3}]$. Bottom: Tipping probabilities for random time translations of varying size ε, including the limiting case ε = 0.

case ε = 1 is included in the figure, but even in this case Pr is not monotone. This is somewhat surprising, as in this case the points $r_n := r(n + \delta_n)$ for $n = N_0, \ldots, N_1$ evenly cover the interval $\left[\frac{1}{3}, \frac{2}{3}\right]$ in the following sense: if $(r^i_n)_{i\in\mathbb{N}}$ are i.i.d. realizations of $r_n$, then the set
\[
\{r^i_n(\omega) \mid n = N_0, \ldots, N_1,\ i \in \mathbb{N}\}
\]
is dense in $\left[\frac{1}{3}, \frac{2}{3}\right]$ for a.e. $\omega \in \Omega$, and thus one might expect $\varphi^r$ to behave more like a time-continuous system, in which discretization effects do not play any role. In this case one would expect to find a monotone dependence of tipping on the rate, as was observed in [65].
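The covering claim can be sampled directly; assuming ε = 1, i.i.d. $\delta_n$ uniform on $[-\frac12, \frac12]$ and, purely for illustration, $r = 0.05$ (so that $rn \in [\frac13, \frac23]$ for $n = 7, \ldots, 13$):

```python
import random

rng = random.Random(2)
r = 0.05
points = []
for _ in range(5_000):              # i.i.d. realizations of (delta_n)
    for n in range(7, 14):          # the n with r*n in [1/3, 2/3]
        t = r * (n + rng.uniform(-0.5, 0.5))
        if 1 / 3 <= t <= 2 / 3:
            points.append(t)

points.sort()
max_gap = max(b - a for a, b in zip(points, points[1:]))
# max_gap shrinks towards 0 as the number of realizations grows,
# so the sampled evaluation times fill [1/3, 2/3].
```

For ε = 1 the intervals $[r(n-\frac12), r(n+\frac12)]$ for consecutive $n$ abut exactly, which is why the union covers the whole transition window without holes.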

References

[1] Adili, A., and Wang, B. Random attractors for non-autonomous stochastic FitzHugh- Nagumo systems with multiplicative noise. Discrete Contin. Dyn. Syst., Dynamical systems, differential equations and applications. 9th AIMS Conference. Suppl. (2013), 1–10.

[2] Alkhayuon, H., and Ashwin, P. Rate induced tipping from periodic attractors: partial tipping and connecting orbits. Chaos 28, 3 (2018), 033608.

[3] Arnold, L. Random Dynamical Systems. Springer, 1998.

[4] Arthur, B. Positive feedbacks in the economy. Scientific American (1990), 92–99.

[5] Arthur, B., Ermoliev, Y., and Kaniovskii, Y. Path dependent processes and the emergence of macro-structure. Eur. J. Oper. Res 30 (1987), 294–303.

[6] Arthur, W. B. On designing economic agents that behave like human agents. J. Evolutionary Econ. 3 (1993), 1–22.

[7] Ashwin, P., Perryman, C., and Wieczorek, S. Parameter shifts for nonautonomous systems in low dimensions: bifurcation- and rate-induced tipping. Nonlinearity 30, 2 (2017), 2185–2210.

[8] Ashwin, P., Wieczorek, S., Vitolo, R., and Cox, P. Tipping points in open systems: bifurcation, noise-induced and rate-dependent examples in the climate system. Philos. Trans. Royal Soc. A 370 (2012), 1166–1184.

[9] Ashwin, P., Wieczorek, S., Vitolo, R., and Cox, P. Tipping points in open systems: bifurcation, noise-induced and rate-dependent examples in the climate system. Phil. Trans. R. Soc 370 (2012), 1166–1184.

[10] Barnsley, M., and Vince, A. The chaos game on a general iterated function system. Ergod. Th. & Dynam. Sys. 31 (2011), 1073–1079.

[11] Barron, E. Game Theory: An Introduction, second ed. John Wiley & Sons, Ltd., 2013.

[12] Benaïm, M. A dynamical systems approach to stochastic approximations. Siam J. Control and Optimization 34, 2 (1996), 437–472.

[13] Benaïm, M. Dynamics of stochastic approximation algorithms. Séminaire de probabilités (Strasbourg) 33 (1999), 1–68.

[14] Benaïm, M., Benjamini, I., Chen, Y., and Lima, Y. A generalized Pólya’s urn with graph based interactions. Random Structures Algorithms 46, 4 (2015), 614–634.

[15] Benaïm, M., and Hirsch, M. W. Asymptotic pseudotrajectories and chain recurrent flows, with applications. J. Dynam. Differential Equations 8, 1 (1996), 141–176.

[16] Benaïm, M., and Hirsch, M. W. Learning processes, mixed equilibria and dynamical systems. Games Econ. Behav. 29 (1999), 36–72.

[17] Bottou, L. On-line Learning and Stochastic Approximations. Cambridge University Press, 1998, pp. 9–42.

[18] Bowen, R. ω-limit sets for axiom A diffeomorphisms. J. Differential Equations 18, 2 (1975), 333–339.

[19] Brin, M., and Stuck, G. Introduction to Dynamical Systems. Cambridge University Press, 2002.

[20] Brown, G. W. Iterative solution of games by fictitious play. Cowles Commission Monograph No. 13. John Wiley & Sons, Inc., 1951, pp. 374–376.

[21] Carigi, G. Rate-induced tipping in nonautonomous dynamical systems with bounded noise. MRes thesis, Imperial College London and University of Reading, 9 2017.

[22] Castaing, C., and Valadier, M. Convex Analysis and Measurable Multifunctions, vol. 580 of Lecture Notes in Mathematics. Spriger, 1977.

[23] Chekroun, M., Simonnet, E., and Ghil, M. Stochastic climate dynamics: Random attractors and time-dependent invariant measures. Physica D 240 (2011), 1685–1700.

[24] Cherubini, A. M., Lamb, J. S. W., Rasmussen, M., and Sato, Y. A random dynamical systems perspective on stochastic resonance. Nonlinearity 30, 7 (2017), 2835– 2853.

[25] Conley, C. Isolated invariant sets and the Morse index, vol. 38 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1978.

[26] Crauel, H. Markov measures for random dynamical systems. Stochastics and Stochastics Reports 37 (1991), 153–173.

[27] Crauel, H. Random Probability Measures on Polish Spaces. Taylor & Francis, 2002.

[28] Crauel, H., and Flandoli, F. Attractors for random dynamical systems. Probab. Theory Relat. Fields 100 (1994), 365–393.

[29] Crauel, H., and Kloeden, P. Nonautonomous and random attractors. Jahresber. Dtsch. Math.-Ver. 117, 3 (2015), 173–206.

[30] Crauel, H., Kloeden, P. E., and Yang, M. Random attractors of stochastic reaction- diffusion equations on variable domains. Stoch. Dyn. 11, 2-3 (2011), 301–314.

[31] Crauel, H., and Scheutzow, M. Minimal random attractors. J. Differential Equations 265, 2 (2018), 702–718.

[32] Cui, H., and Langa, J. A. Uniform attractors for non-autonomous random dynamical systems. J. Differential Equations 263 (2017), 1255–1268.

[33] Davis, B. A comparison test for Martingale inequalities. Ann. Math. Stat. 40, 5 (1969), 1852–1854.

[34] Dijkstra, H. A., Frankcombe, L., and von der Heydt, A. S. A stochastic dynamical systems view of the Atlantic Multidecadal Oscillation. Phil. Trans. R. Soc. A 366 (2008), 2545–2560.

[35] Ditlevsen, P. Tipping points in the climate system. Cambridge University Press, 2017, pp. 33–53.

[36] Eggenberger, F., and Pólya, G. Über die Statistik verketteter Vorgänge. J. Appl. Math. Mech. 3, 4 (1923), 279–289.

[37] Einsiedler, M., and Ward, T. Ergodic Theory, vol. 259 of Graduate Texts in Mathematics. Springer, 2011.

[38] Erev, I., and Roth, A. Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. A. Econ. Rev. 88, 4 (1998), 848–881.

[39] FitzHugh, R. Impulses and physiological states in theoretical models of nerve membrane. Biophysical J. 1 (1961), 445–466.

[40] Freedman, D. A. Bernard Friedman’s urn. Ann. Math. Statist. 36, 3 (1965), 956–970.

[41] Fudenberg, D., and Kreps, D. Learning mixed equilibria. Games Econom. Behav. 5, 3 (1993), 320–367.

[42] Gharaei, M., and Homburg, A. J. Random interval diffeomorphisms. Discrete Contin. Dyn. Syst. Ser. S 10, 2 (2017), 241–272.

[43] Hasselmann, K. Stochastic climate models. I. Theory. Tellus 28 (1976), 473–485.

[44] Hofbauer, J. Evolutionary Games and Population Dynamics. Cambridge University Press, 1998.

[45] Hopkins, E. Two competing models on how people learn in games. Econometrica 70, 6 (2002), 2141–2166.

[46] Hopkins, E., and Posch, M. Attainability of boundary points under reinforcement learning. Games Econom. Behav. 53, 1 (2005), 110–125.

[47] Kiefer, J., and Wolfowitz, J. Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23 (1952), 462–466.

[48] Klenke, A. Probability Theory: A Comprehensive Course. Springer London, 2008.

[49] Kloeden, P., and Rasmussen, M. Nonautonomous Dynamical Systems, vol. 176 of Mathematical Surveys and Monographs. American Mathematical Society, 2011.

[50] Kuksin, S., and Shirikyan, A. Mathematics Of Two-dimensional Turbulence. Cam- bridge University Press, 2012.

[51] Kushner, H., and Clark, D. Stochastic approximation methods for constrained and unconstrained systems. No. 26 in Applied Mathematical Sciences. Springer, 1978.

[52] Kuznetsov, Y. Elements of Applied Bifurcation Theory, second ed. No. 112 in Applied Mathematical Sciences. Springer, 1998.

[53] Lamb, J. S. W., Rasmussen, M., and Rodrigues, C. S. Topological bifurcations of minimal invariant sets for set-valued dynamical systems. Proc. Am. Math. Soc. 143, 9 (2015), 3927–3937.

[54] Ledrappier, F., and Young, L.-S. Entropy for random transformations. Probab. Th. Rel. Fields 80 (1988), 217–240.

[55] Lima, Y. Graph-based Pólya’s urn: completion of the linear case. Stoch. Dyn. 16, 2 (2016), 13 pp.

[56] Ljung, L. Strong convergence of a stochastic approximation algorithm. Annals of Statistics 6, 3 (1978), 680–696.

[57] Melbourne, I., and Stuart, A. A note on diffusion limits of chaotic skew product flows. Nonlinearity 24 (2011), 1361–1367.

[58] Merton, R. C. Lifetime portfolio selection under uncertainty: The continuous-time case. The Review of Economics and Statistics 51, 3 (1969), 247–257.

[59] Pemantle, R. Nonconvergence to unstable points in urn models and stochastic approx- imations. Ann. Prob. 18 (1990), 698–712.

[60] Pemantle, R. When are touchpoints limits for generalized Pólya urns? Proc. Amer. Math. Soc. 113, 1 (1991), 235–243.

[61] Pemantle, R. A survey of random processes with reinforcement. Probab. Surv. 4 (2007), 1–79.

[62] Posch, M. Cycling in a stochastic learning algorithm for normal form games. J. Evol. Econ. 7 (1997), 193–207.

[63] Rasmussen, M. Attractivity and Bifurcation for Non Autonomous Dynamical Systems. No. 1907 in Lecture Notes in Mathematics. Springer, 2007.

[64] Renlund, H. Generalized Pólya Urns via Stochastic Approximation. PhD thesis, Uppsala University, February 2009.

[65] Ritchie, P., and Sieber, J. Early-warning indicators for rate-induced tipping. Chaos 26, 9 (2016), 093116, 13.

[66] Robbins, H., and Monro, S. A stochastic approximation method. Ann. Math. Statist. 22 (1951), 400–407.

[67] Roth, A., and Erev, I. Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term. Games Econ. Behav. 8 (1995), 164–212.

[68] Sato, Y., Akiyama, E., and Farmer, J. D. Chaos in learning a simple two-person game. Proc. Natl. Acad. Sci. USA 99, 7 (2002), 4748–4751.

[69] Scheffer, M. Critical Transitions in Nature and Society. Princeton Studies in Complexity. Princeton University Press, 2009.

[70] Scheffer, M., Bascompte, J., Brock, W., Brovkin, V., Carpenter, S., Dakos, V., Held, H., Nes, E. v., Rietkerk, M., and Sugihara, G. Early-warning signals for critical transitions. Nature 461 (2009), 53 – 59.

[71] Sell, G. R. Nonautonomous differential equations and dynamical systems. II. Limiting equations. Transactions of the American Mathematical Society 127 (1967), 263–283.

[72] Sell, G. R. Nonautonomous differential equations and topological dynamics I. The basic theory. Transactions of the American Mathematical Society 127, 2 (1967), 241–262.

[73] Wang, B. Sufficient and necessary criteria for existence of pullback attractors for non- compact random dynamical systems. J. Differential Equations 253, 5 (2012), 1544–1583.

[74] Wang, B. Random attractors for non-autonomous stochastic wave equations with multiplicative noise. Discrete Contin. Dyn. Syst. 34, 1 (2014), 269–300.

[75] Wieczorek, S., Ashwin, P., Luke, C. M., and Cox, P. M. Excitability in ramped systems: the compost-bomb instability. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 467, 2129 (2011), 1243–1269.

[76] Williams, D. Probability with Martingales. Cambridge University Press, 1991.

[77] Zmarrou, H., and Homburg, A. J. Bifurcations of stationary measures of random diffeomorphisms. Ergodic Theory Dynam. Systems 27, 5 (2007), 1651–1692.

[78] Zmarrou, H., and Homburg, A. J. Dynamics and bifurcations of random circle diffeomorphisms. Discrete Contin. Dyn. Syst. Ser. B 10 (2008), 719–731.

A Hausdorff distance

We want to recall a few basic facts about the Hausdorff distance. For a more complete overview we refer to Chapter 2 of the book [22]. Let (M, d) be a separable metric space and denote by K the set of all compact subsets of M. For x ∈ M and K ∈ K we use the standard notation

\[
d(x, K) = d(K, x) = \inf_{y\in K} d(x, y).
\]

Definition A.1. Let $K, L \in \mathcal{K}$.
(a) The excess of $K$ over $L$, or semi-Hausdorff distance of $K$ and $L$, is

\[
\operatorname{dist}(K, L) := \sup_{x\in K} d(x, L) = \inf\{\varepsilon > 0 \mid K \subseteq B_\varepsilon(L)\}.
\]

(b) The Hausdorff distance of K and L is

\[
d_H(K, L) := \max\{\operatorname{dist}(K, L), \operatorname{dist}(L, K)\}.
\]
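For finite point sets both quantities can be computed directly; a minimal sketch:

```python
def excess(K, L):
    """Semi-Hausdorff distance dist(K, L) for finite sets of reals."""
    return max(min(abs(x - y) for y in L) for x in K)

def hausdorff(K, L):
    """Hausdorff distance d_H(K, L) = max of the two excesses."""
    return max(excess(K, L), excess(L, K))

K, L = [0.0, 0.5, 1.0], [1.0, 1.5, 2.0]
# excess(K, L) = 1.0 (the point 0 is at distance 1 from L),
# excess(L, K) = 1.0 (the point 2 is at distance 1 from K).
```

Note the asymmetry of the excess: excess([0.0, 1.0], [0.0, 1.0, 2.0]) is 0, while the reverse excess is 1 — which is why $d_H$ takes the maximum of the two.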

Theorem A.2 (Corollary 2.9 in [22]). The Hausdorff distance is a metric on $\mathcal{K}$. If $(M, d)$ is separable then $(\mathcal{K}, d_H)$ is separable as well.

Theorem A.3. If $f: M \to M$ is (uniformly) continuous, then $\mathcal{K} \to \mathcal{K},\ K \mapsto f(K)$ is (uniformly) continuous as well.

Lemma A.4. If $K_1, K_2, \ldots \in \mathcal{K}$ is a decreasing sequence, then $K_n \to \bigcap_{n\in\mathbb{N}} K_n$ w.r.t. $d_H$.

Proof. Let $K := \bigcap_{n\in\mathbb{N}} K_n$. As $(K_n)$ is decreasing and $K \subseteq K_n$, the sequence $\operatorname{dist}(K_n, K)$ is decreasing as well and has a limit $c \ge 0$. If $c$ were positive, there would be points $x_n \in K_n$ with $d(x_n, K) > \frac{c}{2}$. As $K$ and $K_1$ are compact, $(x_n)_{n\in\mathbb{N}}$ has an accumulation point in $K$, a contradiction.

If $K_n \to K$ in $\mathcal{K}$, then $K$ is the upper limit of the $K_n$,
\[
K = \limsup_{n\to\infty} K_n := \bigcap_{n_0\in\mathbb{N}} \operatorname{cl}\left(\bigcup_{n\ge n_0} K_n\right),
\]
see [22, Theorem II.2]. The upper limit is always closed, but it can be empty, and in the case of non-compact $M$ it could be unbounded. But in compact spaces every sequence in $\mathcal{K}$ converges to its upper limit in the semi-Hausdorff distance.

Lemma A.5. Let M be compact and K1,K2,... ∈ K. Then

\[
\lim_{n\to\infty} \operatorname{dist}\left(K_n, \limsup_{n\to\infty} K_n\right) = 0.
\]

Proof. Let $A_n := \operatorname{cl}\left(\bigcup_{k\ge n} K_k\right)$ and apply Lemma A.4.

In general we do not have convergence in $d_H$. Let for example $M = [0, 2]$ and $K_n = [0, 1]$ for $n$ even and $K_n = [1, 2]$ for $n$ odd. Then $K_n \not\to M = \limsup_{n\to\infty} K_n$.

B Conditional expectation, martingales and stopping times

In this section we will give a brief introduction to the theory of martingales, and we will introduce a few classic results that we use above, in particular in Section 3. We will start by recalling the definition and a few basic facts about conditional expectations, then introduce martingales and martingale difference sequences as well as list a few classic results about them. Finally, we will present a few concepts about stopping times.

Throughout the whole section, $(\Omega, \mathcal{A}, \mu)$ is a probability space. Expectation with respect to $\mu$ will be denoted by the symbol $\mathbb{E}$. For $p \ge 1$ we write $L^p_\mu$ for the set of (equivalence classes modulo null sets of) measurable maps $f: \Omega \to \mathbb{R}$ with $\mathbb{E}[|f|^p] < \infty$.

Conditional expectations Let $f \in L^1_\mu$ and $\mathcal{F} \subseteq \mathcal{A}$ a sub-σ-algebra. The conditional expectation of $f$ w.r.t. $\mathcal{F}$ is the best guess of the value of $f$ given the information contained in $\mathcal{F}$. We follow Chapter 8 in [48], where proofs of all results below can be found.

Theorem and Definition B.1. There is an a.s. unique random variable $Y \in L^1_\mu$ with the following properties.
(a) $Y$ is $\mathcal{F}$-measurable.
(b) For each $F \in \mathcal{F}$,
\[
\int_F Y \,\mathrm{d}\mu = \int_F f \,\mathrm{d}\mu.
\]
This random variable is called the conditional expectation of $f$ with respect to $\mathcal{F}$. We use the symbol $\mathbb{E}[f \mid \mathcal{F}] = Y$. If $\mathcal{F} = \sigma(X)$ is generated by some random variable $X$, we write $\mathbb{E}[f \mid \mathcal{F}] = \mathbb{E}[f \mid X]$.

Note that conditional expectations are random variables rather than scalars. The following elementary but useful properties can easily be derived from this definition.

Theorem B.2. Let $f, g \in L^1_\mu$, $\mathcal{F} \subseteq \mathcal{G} \subseteq \mathcal{A}$ be σ-algebras and $c \in \mathbb{R}$.
(a) $\mathbb{E}\big[\mathbb{E}[f \mid \mathcal{F}]\big] = \mathbb{E}[f]$.
(b) $\mathbb{E}[f + cg \mid \mathcal{F}] = \mathbb{E}[f \mid \mathcal{F}] + c \cdot \mathbb{E}[g \mid \mathcal{F}]$.
(c) If $g$ is $\mathcal{F}$-measurable, then $\mathbb{E}[fg \mid \mathcal{F}] = g \cdot \mathbb{E}[f \mid \mathcal{F}]$ and in particular $\mathbb{E}[g \mid \mathcal{F}] = g$.
(d) $\mathbb{E}\big[\mathbb{E}[f \mid \mathcal{G}] \,\big|\, \mathcal{F}\big] = \mathbb{E}\big[\mathbb{E}[f \mid \mathcal{F}] \,\big|\, \mathcal{G}\big] = \mathbb{E}[f \mid \mathcal{F}]$.
(e) If $f$ is independent from $\mathcal{F}$, then $\mathbb{E}[f \mid \mathcal{F}] = \mathbb{E}[f]$.

Definition B.3. The conditional probability of $A \in \mathcal{A}$ w.r.t. $\mathcal{F}$ is the random variable

\[
\mu(A \mid \mathcal{F}) := \mathbb{E}[\mathbb{1}_A \mid \mathcal{F}].
\]

Following Theorem and Definition B.1, conditional probabilities are a.s. uniquely defined via $\mathcal{F}$-measurability and the relation
\[
(\forall F \in \mathcal{F}):\quad \mu(A \cap F) = \int_F \mu(A \mid \mathcal{F}) \,\mathrm{d}\mu.
\]
Remark B.4. In elementary statistics, conditional probabilities are defined with respect to positive measure sets. For $A, B \in \mathcal{A}$ with $\mu(B) > 0$ we write
\[
\mu(A \mid B) := \frac{\mu(A \cap B)}{\mu(B)}.
\]

This is somehow contained as a special case in the above definition: if we condition on the σ-algebra $\mathcal{F} = \{\emptyset, \Omega, B, B^c\}$, then we get
\[
\mu(A \mid \mathcal{F}) = \mu(A \mid B^c)\,\mathbb{1}_{B^c} + \mu(A \mid B)\,\mathbb{1}_B.
\]
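On a finite probability space the identity above can be verified directly; a small sketch with a fair die (the specific events are illustrative):

```python
from fractions import Fraction

outcomes = set(range(1, 7))               # fair die
P = {i: Fraction(1, 6) for i in outcomes}

A = {2, 4, 6}                             # "even"
B = {4, 5, 6}                             # "at least four"
Bc = outcomes - B

def cond(A, B):
    """Elementary conditional probability mu(A | B)."""
    return sum(P[i] for i in A & B) / sum(P[i] for i in B)

# mu(A | F) for F = {0, Omega, B, B^c} is a random variable,
# constant on B and on B^c:
mu_A_given_F = {i: cond(A, B) if i in B else cond(A, Bc) for i in outcomes}

# Averaging property (Theorem B.2(a)): E[mu(A | F)] = mu(A) = 1/2.
assert sum(P[i] * mu_A_given_F[i] for i in outcomes) == Fraction(1, 2)
```

The dictionary `mu_A_given_F` is exactly the random variable $\mu(A \mid \mathcal{F})$ from the display: it takes the value $\mu(A \mid B) = \frac23$ on $B$ and $\mu(A \mid B^c) = \frac13$ on $B^c$.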

So far we have only defined conditional probabilities as random variables $\mu(A \mid \mathcal{F})$, individually for any given set $A \in \mathcal{A}$. However, those random variables are only well defined up to sets of measure 0 that can a priori depend on $A$, so it is not clear that for any $\omega \in \Omega$ the map $A \mapsto \mu(A \mid \mathcal{F})(\omega)$ is actually a probability measure. In the case of conditional distributions, however, it is known that this can be achieved.

Definition B.5. Let $U$ be a random variable. A map $\kappa: \Omega \times \mathcal{B}(\mathbb{R}) \to [0, 1]$ is called a regular conditional distribution of $U$ given $\mathcal{F}$ if
(a) $B \mapsto \kappa(\omega, B)$ is a probability measure for a.e. $\omega \in \Omega$, and
(b) $\omega \mapsto \kappa(\omega, B)$ is a version of the conditional probability $\mu(U \in B \mid \mathcal{F})$.

Theorem B.6 ([48, Theorem 8.28]). For every random variable $U: \Omega \to \mathbb{R}$ and every sub-σ-algebra $\mathcal{F} \subseteq \mathcal{A}$ there exists a regular conditional distribution of $U$ given $\mathcal{F}$.

Martingales

Definition B.7. (a) A filtration on $(\Omega, \mathcal{A}, \mu)$ is an increasing sequence of σ-algebras

\[
\mathcal{F}_1 \subseteq \mathcal{F}_2 \subseteq \cdots \subseteq \mathcal{A}.
\]

The limit of the filtration is the σ-algebra

\[
\mathcal{F}_\infty := \sigma\left(\bigcup_{n\in\mathbb{N}} \mathcal{F}_n\right).
\]
(b) A stochastic process $(M_n)_{n\in\mathbb{N}}$ is adapted to a filtration $(\mathcal{F}_n)_{n\in\mathbb{N}}$ if $M_n$ is $\mathcal{F}_n$-measurable for every $n \in \mathbb{N}$.

This means that an adapted process carries at most as much information up to a time $n$ as the filtration. For a given stochastic process $(M_n)_{n\in\mathbb{N}}$ a natural choice for the filtration is

Fn := σ (Mk : k = 1, . . . , n) , as it contains exactly the information of the process. For that reason it is called the natural filtration. But for the rest of this section we will fixate just any filtration ( ) and simply Fn n∈N speak of adapted processes. Definition B.8. (a)A martingale is an adapted process (M ) such that n n∈N

E[Mn+1 | Fn] = Mn

for all n ∈ N.

(b) A martingale difference sequence is an adapted process (Un)n∈N such that

E[Un+1 | Fn] = 0

for all n ∈ N.

Implicitly this definition assumes that Mn, Un ∈ L¹µ. Theorem B.2(d) implies that

E[Mn+k | Fn] = Mn in the martingale case and E[Un+k | Fn] = 0 in the martingale difference case for all n, k ∈ N.

Remark B.9. The two definitions are equivalent in the following way.
(a) If (Mn)n∈N is a martingale, then we can define a martingale difference sequence via U1 := M1 and Un := Mn − Mn−1 for n ≥ 2.
(b) Conversely, if (Un)n∈N is a martingale difference sequence and c1, c2, . . . ∈ R are arbitrary weights, then

Mn := ∑_{k=1}^{n} ck Uk   (B.1)

defines a martingale.

The representation of a martingale via a martingale difference sequence as in (B.1) has the advantage that we obtain a simple expression for the second moments of the resulting martingale. The following rule is known as the Pythagorean formula.⁸

Lemma B.10 ([76, Section 12.1]). Let (Mn) be a martingale defined as in (B.1). Then for all n ∈ N,

E[Mn²] = ∑_{k=1}^{n} ck² · E[Uk²].

The second moments of a martingale are an important measure for determining its long-time behavior. We will now present a few classic results establishing this fact. Doob's martingale inequality gives an upper bound for the probability that a martingale leaves an interval [−c, c] before time n.
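The Pythagorean formula of Lemma B.10 is easy to check numerically in a special case. The sketch below (an illustrative simulation, not part of the thesis; the i.i.d. Rademacher choice for the differences is an assumption made here) estimates E[Mn²] by Monte Carlo and compares it with ∑ ck², which is the predicted value since E[Uk²] = 1.

```python
import random

# Sanity check of the Pythagorean formula for a toy martingale:
# take U_k i.i.d. Rademacher (+1 or -1), a martingale difference
# sequence w.r.t. its natural filtration with E[U_k^2] = 1, so
# E[M_n^2] = sum_k c_k^2 * E[U_k^2] = sum_k c_k^2.

def simulate_second_moment(weights, num_samples=200_000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        # One sample of M_n = sum_k c_k U_k.
        m = sum(c * rng.choice((-1.0, 1.0)) for c in weights)
        total += m * m
    return total / num_samples

weights = [1.0, 0.5, 0.25, 0.125]
predicted = sum(c * c for c in weights)   # = 1.328125 by Lemma B.10
estimated = simulate_second_moment(weights)
print(predicted, estimated)
```

With 200 000 samples the Monte Carlo estimate agrees with the predicted second moment to within a few standard errors.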

Theorem B.11 (Doob's L²-inequality, [48, Theorem 11.2]). Let (Mn) be a martingale with E[Mn²] < ∞ for all n ∈ N. Then

µ( max_{k=1,...,n} |Mk| > c ) ≤ E[Mn²] / c²

for all n ∈ N and c > 0.

This result can then be used to show that a martingale converges if its second moments are uniformly bounded.
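For the simple random walk, Doob's L²-inequality can be verified empirically. In this sketch (an illustrative simulation under the assumption of ±1 increments; not from the thesis) we have E[Mn²] = n by the Pythagorean formula, so the bound reads µ(max |Mk| > c) ≤ n/c².

```python
import random

# Empirical check of Doob's L^2-inequality for the simple random walk
# M_k = X_1 + ... + X_k with X_i = +/-1. Here E[M_n^2] = n, so the
# inequality predicts mu(max_{k<=n} |M_k| > c) <= n / c^2.

def exceedance_probability(n, c, num_samples=100_000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        m, peak = 0, 0
        for _ in range(n):
            m += rng.choice((-1, 1))
            peak = max(peak, abs(m))   # running maximum of |M_k|
        if peak > c:
            hits += 1
    return hits / num_samples

n, c = 20, 10
empirical = exceedance_probability(n, c)
bound = n / c**2   # = 0.2
print(empirical, bound)
```

The empirical exceedance probability is well below the bound here; Doob's inequality is far from tight for this walk, but it holds uniformly over all L²-martingales.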

Theorem B.12 (Doob's L²-convergence theorem, [48, Theorem 11.10]). Let (Mn) be a martingale with sup_{n∈N} E[Mn²] < ∞. Then there exists an F∞-measurable random variable M∞ ∈ L²µ such that Mn → M∞ a.s. and in L².

⁸ The spaces L²µ(Fn) of Fn-measurable, square-integrable functions are subspaces of the Hilbert space L²µ, and the linear operator E[· | Fn] is the orthogonal projection onto L²µ(Fn) (cf. [48, Corollary 8.16]). Being a martingale difference sequence in L²µ means that U1, . . . , Un ∈ L²µ(Fn) and Un+1 ⊥ L²µ(Fn) for all n ∈ N, and thus (Un)n∈N is an orthogonal family in L²µ.

A stronger version of this result exists for uniformly integrable martingales. We will not state that result in its full generality, as it is not needed for our purposes, but only a direct corollary of it, which can be used to show that any set A ∈ A∞ can be approximated arbitrarily closely by sets in some Fn.

Theorem B.13 (Lévy's 0–1 law, [76, Section 14.2]). Let f ∈ L¹µ. Then

E[f | Fn] → E[f | F∞]

a.s. and in L¹µ.

Corollary B.14. Let A ∈ A∞ with µ (A) > 0 and ε > 0. There exist n ∈ N and B ∈ Fn such that µ (A | B) > 1 − ε.

Proof. By Lévy's 0–1 law, µ(A | Fn) → 1A a.s., and thus for some large enough n we have µ(B) > 0 with B := {µ(A | Fn) > 1 − ε} ∈ Fn. Moreover,

µ(A ∩ B) = ∫_B µ(A | Fn) dµ ≥ (1 − ε) µ(B).

The following is a sort of converse to the L² convergence theorem. It was proven by Davis in [33]; for our purposes we reformulate the original result slightly.

Theorem B.15. Let (Un)n∈N be a martingale difference sequence such that there are constants σ0, σ1 > 0 with

E[Un+1² | Fn] ≤ σ1² and E[|Un+1| | Fn] ≥ σ0

almost surely for all n ∈ N, and let c1, c2, . . . > 0 with ∑_{n=1}^{∞} cn² = ∞. Then a.s.

lim inf_{n→∞} ∑_{k=1}^{n} ck Uk = −∞ and lim sup_{n→∞} ∑_{k=1}^{n} ck Uk = ∞.

Proof. Using Jensen's inequality we see that

2 2 2 Σn := E [Un+1 | Fn] ≥ σ0

a.s. for all n ∈ N, and thus dn+1 := Un+1 Σn⁻¹ is well defined. Obviously E[dn+1² | Fn] = 1 and

E[|dn+1| | Fn] = E[|Un+1| | Fn] / Σn ≥ σ0 / σ1 > 0.

Putting vn := cn Σn−1 we see that vn is measurable w.r.t. Fn−1 and ∑_{n=1}^{∞} vn² = ∞, hence the claim follows from the main result in [33].

Stopping times

Closely related to filtrations and martingales is the concept of stopping times. Roughly speaking, a stopping time is a rule or strategy for terminating a stochastic process that does not take any information beyond the stopping point into account.

Definition B.16. A random variable τ: Ω → N ∪ {∞} is called a stopping time, if {τ = n} ∈ Fn for all n ∈ N.

Definition B.17. Let τ be a stopping time.
(a) The σ-algebra Fτ generated by τ consists of all A ∈ A such that A ∩ {τ = n} ∈ Fn for all n ∈ N.
(b) Let (Xn)n∈N be a stochastic process. The stopped process is the random variable Xτ with Xτ(ω) := Xτ(ω)(ω).
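The standard example of a stopping time is the first exit time of a random walk from an interval: deciding whether {τ = n} has occurred only requires the path up to time n. The sketch below (an illustrative simulation, not from the thesis; the level b and the seed are arbitrary choices) computes such a τ and the stopped value Xτ for one realization.

```python
import random

# First exit time of a simple random walk from (-b, b): a stopping
# time in the sense of Definition B.16, since {tau = n} depends only
# on X_1, ..., X_n. X_tau is the stopped process of Definition B.17.

def first_exit(path, b):
    """Return the first (1-based) index n with |X_n| >= b, else None."""
    for n, x in enumerate(path, start=1):
        if abs(x) >= b:
            return n
    return None

rng = random.Random(2)
path, total = [], 0
for _ in range(1000):
    total += rng.choice((-1, 1))
    path.append(total)

tau = first_exit(path, b=5)
x_tau = path[tau - 1]   # the stopped value X_tau for this realization
print(tau, x_tau)
```

Because the walk moves in unit steps, |Xτ| = 5 exactly at the exit time, while |Xk| < 5 for all k < τ; this is the sense in which τ uses no information beyond the stopping point.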

One can easily show that Fτ is the smallest σ-algebra with respect to which Xτ is measurable for every adapted process (Xn)n∈N. Indeed, if (Xn)n∈N is adapted and B ∈ B(R), then

{Xτ ∈ B} ∩ {τ = n} = {Xn ∈ B} ∩ {τ = n} ∈ Fn.

On the other hand, if A ∈ Fτ, then the process Xn := 1A∩{τ=n} is adapted and Xτ = 1A.

Lemma B.18. Assume the stochastic process (Xn)n∈N fulfills the inequality

E[Xn | Fn] ≥ c

for some c ∈ R and all n ∈ N. Then for every a.s. finite stopping time τ,

E[Xτ | Fτ ] ≥ c.

Proof. First we show that

E [1{τ=n}Xn | Fτ ] = E [1{τ=n}Xn | Fn].

To see this, let A ∈ Fτ. Then we have

∫_A E[1{τ=n} Xn | Fn] dµ = ∫_{A∩{τ=n}} E[Xn | Fn] dµ = ∫_{A∩{τ=n}} Xn dµ = ∫_A 1{τ=n} Xn dµ.

This implies that a.s.

E[Xτ | Fτ] = ∑_{n=1}^{∞} E[1{τ=n} Xn | Fτ] = ∑_{n=1}^{∞} 1{τ=n} E[Xn | Fn] ≥ c.

C Chain recurrence and asymptotic pseudotrajectories

In this section we discuss some aspects of chain recurrence and chain transitivity. These ideas go back to Charles Conley and his seminal work [25]. For this section let (M, d) be a metric space and

Φ: [0, ∞) × M → M, (t, x) ↦ Φt(x)

a semi-flow on M, i.e. Φ0(x) = x and Φs+t(x) = Φt(Φs(x)) for all x ∈ M and s, t ≥ 0. An orbit or trajectory of the semi-flow Φ is a set of the form O(x0) := {Φt(x0) | t ≥ 0}, and we say there is an orbit from x0 to x if x = Φt(x0) for some t ≥ 0, or equivalently x ∈ O(x0). However, if we allow small perturbations of our system, this definition of orbits is a bit too restrictive.

Definition C.1. Let δ, T > 0 and a, b ∈ M. A (δ, T) pseudo-orbit from a to b is a finite sequence of points a = x0, x1, . . . , xn = b ∈ M such that there exist t0, . . . , tn−1 > T with

d(Φti(xi), xi+1) < δ for i = 0, . . . , n − 1.

In a pseudo-orbit we do not need to follow the same trajectory forever, but are allowed to jump a finite number of times; the parameters δ and T control the maximum size of a jump and the minimum duration between two subsequent jumps. Letting δ → 0 and T → ∞ leads to the concepts of chain recurrence and chain transitivity.
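Definition C.1 is directly checkable for a concrete semi-flow. The sketch below (an illustrative toy example, not from the thesis; the semi-flow Φt(x) = x·e^{−t} on R and all parameter values are choices made here) verifies whether a candidate sequence of points and leg durations forms a (δ, T) pseudo-orbit.

```python
import math

# A checker for Definition C.1 on the toy semi-flow Phi_t(x) = x*exp(-t)
# on R: every point flows monotonically toward the fixed point 0.

def phi(t, x):
    return x * math.exp(-t)

def is_pseudo_orbit(points, times, delta, T):
    """Check d(Phi_{t_i}(x_i), x_{i+1}) < delta with every t_i > T."""
    if len(times) != len(points) - 1:
        return False
    return all(
        t > T and abs(phi(t, x) - y) < delta
        for x, y, t in zip(points, points[1:], times)
    )

# Flowing from x0 = 1 for time t0 = 3 lands at e^{-3} ~ 0.0498, so a
# single jump of size < 0.1 reaches 0: a (0.1, 2) pseudo-orbit 1 -> 0.
ok = is_pseudo_orbit([1.0, 0.0], [3.0], delta=0.1, T=2.0)
print(ok)
```

Shrinking δ forces longer flowing legs before each jump: the same two-point sequence fails for δ = 0.01 with t0 = 3, since the required jump e^{−3} ≈ 0.0498 then exceeds the allowed size.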

Definition C.2. A point x ∈ M is chain recurrent, if for every choice of δ, T > 0 there is a (δ, T ) pseudo-orbit from x to itself. We denote by R (Φ) the set of all chain recurrent points of the flow Φ. We say that Φ is chain recurrent if R (Φ) = M.

Definition C.3. The flow Φ is called chain transitive, if for every pair a, b ∈ M and every choice of δ, T > 0 there is a (δ, T ) pseudo-orbit from a to b.

Definition C.4. A compact subset A ⊆ M is called invariant, if Φt (A) = A for all t > 0. An invariant subset is called internally chain recurrent (transitive), if the restricted flow Φ|A is chain recurrent (transitive).

Definition C.5. A compact, invariant subset A ⊆ M is called an attractor, if A 6= ∅ and if there exists an open set O with A ⊆ O and dist (Φt (O) ,A) → 0 as t → ∞. It is called a proper attractor if, in addition, A 6= M.

The following result due to Bowen [18] gives us a handy characterization of internal chain transitivity.

Theorem C.6. Let A ⊂ M be a compact, invariant set. The following assertions are equiva- lent. (a) A is internally chain transitive. (b) A is internally chain recurrent and connected.

(c) The restricted flow Φ|A does not admit a proper attractor.
