
Long–time behavior of a spherical mean field model

vorgelegt von Diplom-Mathematiker René Dahms

Von der Fakultät Mathematik und Naturwissenschaften der Technischen Universität Berlin zur Erlangung des akademischen Grades

Doktor der Naturwissenschaften – Dr. rer. nat. –

genehmigte Dissertation

Promotionsausschuß:

Vorsitzender: Prof. Dr. D. Krüger
Berichter: Prof. Dr. J. Gärtner
Prof. Dr. M. Scheutzow
Prof. Dr. A. Wakolbinger

Tag der wissenschaftlichen Aussprache: 01. 10. 2002

Berlin 2002 D 83

Summary

In this thesis, we study some aspects of the long time behavior of a spherical mean field model. This model is motivated by problems in mathematical physics and statistical mechanics. More precisely, we consider a system of N ∈ ℕ diffusion processes (particles) on a sphere, where the drift depends on the empirical measure of these N particles in a rotation invariant manner. The object of main interest is the behavior of the process X^{(N)} of the empirical measures, for large N. This process can be viewed as a random perturbation of the deterministic McKean–Vlasov dynamics µ^{(N)}. The appropriate approach to study such random dynamics consists of an infinite dimensional generalization of the classical Freidlin–Wentzell theory, which is a powerful tool for studying randomly perturbed finite dimensional dynamical systems in the case of small noisy disturbances.

In the first part we investigate a hierarchical model of several levels of mean field interactions. For each level n ∈ ℕ, we derive a large deviation principle, as N tends to infinity, for the corresponding invariant distributions of the measure valued empirical processes at level n. We give an explicit formula for the associated rate function and characterize its zero set. It turns out that this set is a finite dimensional sphere S^{(n)} of probability measures. This sphere can be characterized by its radius r^{(n)} ∈ [0, 1). We obtain a phase transition at a critical value of the mean field interaction strength. Moreover, we calculate this critical value explicitly. Finally, we analyze the sequence of radii r^{(n)} and give some criteria for its behavior as n tends to infinity.

In the second part we consider a more general setting with particles moving on a finite dimensional Riemannian manifold instead of on a sphere. We prove a moderate deviation principle for the processes X^{(N)} − µ^{(N)}, as N tends to infinity. Such a principle measures deviations in intermediate scales between two boundary regimes, namely the regime of the central limit theorem, where the scale is equal to 1/√N, and the large deviation scale, being equal to one. Moreover, we derive a nice integral formula for the rate function of the moderate deviation principle and analyze its zero set. It turns out that this set contains only the Schwartz distribution zero. The main tools we use in this part are the theory of partial differential equations in Sobolev spaces and stochastic analysis on Riemannian manifolds.

In the last part we study the time speeded processes (X^{(N)}(Nt))_{t∈[0,T]}. Using the first two parts, we derive a functional central limit theorem for these processes. More precisely, we prove that, as N tends to infinity, the processes (X^{(N)}(Nt))_{t∈[0,T]} converge in distribution to a Brownian motion on the sphere S^{(1)} of probability measures. The variance of this Brownian motion is evaluated in terms of an eigenfunction of an associated diffusion operator.

Zusammenfassung

In dieser Arbeit untersuchen wir einige Aspekte des Langzeitverhaltens von sphärischen Teilchenmodellen mit Mittelwertwechselbeziehung. Diese Modelle sind durch Problemstellungen der mathematischen Physik und der statistischen Mechanik motiviert. Genauer gesagt, betrachten wir N ∈ ℕ Diffusionen (Teilchen), deren Drift vom empirischen Maß dieser N Teilchen rotationsinvariant abhängt. Dem Verhalten des Prozesses X^{(N)} der empirischen Maße für große N gilt dabei unser Hauptinteresse. Dieser Prozeß kann als eine zufällige Störung der deterministischen McKean–Vlasov–Dynamik µ^{(N)} aufgefaßt werden. Der geeignete Zugang zum Studium solcher zufälligen Systeme besteht in einer unendlich-dimensionalen Verallgemeinerung der klassischen Freidlin–Wentzell–Theorie. Diese Theorie ist ein mächtiges Werkzeug in der Untersuchung von endlich-dimensionalen deterministischen dynamischen Systemen mit kleinen zufälligen Störungen.

Im ersten Teil untersuchen wir ein hierarchisches Modell, das aus mehreren Stufen besteht. Für jede Stufe n ∈ ℕ leiten wir ein Prinzip der großen Abweichungen für die invarianten Verteilungen der maßwertigen empirischen Prozesse der n-ten Stufe her. Die zugehörige Ratenfunktion geben wir explizit an und charakterisieren ihre Nullstellenmenge. Es stellt sich heraus, daß diese Menge eine endlich-dimensionale Sphäre S^{(n)} von Wahrscheinlichkeitsmaßen ist. Diese Sphäre kann durch ihren Radius r^{(n)} ∈ [0, 1) beschrieben werden. Wir zeigen, daß an einem kritischen Wert für die Stärke der Mittelwertwechselbeziehung ein Phasenübergang stattfindet, und geben diesen kritischen Wert explizit an. Weiterhin analysieren wir die Folge der Radien und leiten einige Kriterien für ihr Verhalten beim Grenzübergang n → ∞ her.

Im zweiten Teil untersuchen wir allgemeinere zufällige Teilchenmodelle auf endlich-dimensionalen Riemannschen Mannigfaltigkeiten anstatt einer Sphäre. Wir beweisen ein Prinzip moderater Abweichungen für die Prozesse X^{(N)} − µ^{(N)} für N gegen Unendlich. Die Größenordnung der Abweichungen, die mit solch einem Prinzip gemessen werden, liegt zwischen der Skala 1/√N des zentralen Grenzwertsatzes und der Skala Eins der großen Abweichungen. Weiterhin leiten wir eine kompakte Integralform der zugehörigen Ratenfunktion her und untersuchen ihre Nullstellenmenge. Es stellt sich heraus, daß diese Menge als einziges Element die Schwartz'sche Distribution Null enthält. Die wichtigsten Werkzeuge, die wir in diesem Teil verwenden, sind die Theorie partieller Differentialgleichungen in Sobolevräumen und stochastische Analysis auf Riemannschen Mannigfaltigkeiten.

Im letzten Teil untersuchen wir die zeitbeschleunigten Prozesse (X^{(N)}(Nt))_{t∈[0,T]}. Unter Verwendung der ersten beiden Teile beweisen wir einen funktionalen zentralen Grenzwertsatz. Genauer gesagt, beweisen wir, daß die Prozesse (X^{(N)}(Nt))_{t∈[0,T]} für N → ∞ in Verteilung gegen eine Brownsche Bewegung auf der Sphäre S^{(1)} konvergieren. Weiterhin berechnen wir die Varianz dieser Brownschen Bewegung mittels Eigenfunktionen eines assoziierten Diffusionsoperators.

Contents

Introduction

1 Invariant distributions and a hierarchical model
   1.1 The model and basic notation
   1.2 Level one
       1.2.1 Large deviations at level one
       1.2.2 The rate function at level one
   1.3 Higher Levels
       1.3.1 Large deviations at higher levels
   1.4 The behavior of the sequence of radii
       1.4.1 A criterion for a positive limit of the radii
       1.4.2 Some examples
   1.5 Notes

2 Moderate deviations
   2.1 The model and basic notation
   2.2 Formulation of the moderate deviation principle
   2.3 Some statements about integral equations
   2.4 The behavior of the McKean–Vlasov path
   2.5 The free model
       2.5.1 Large deviations for the martingale term
       2.5.2 Identification of the rate function
       2.5.3 Large deviations for the free model
   2.6 The coupled model
       2.6.1 Local large deviations
       2.6.2 Exponential tightness
       2.6.3 Proof of Theorem 2.2.2
   2.7 Notes

3 The dynamic behavior
   3.1 The model and basic notation
   3.2 Construction of the test functions
       3.2.1 Properties of the test functions f_{0,β}
       3.2.2 Properties of the test functions f_{1,β}
   3.3 Long time behavior of the empirical processes
       3.3.1 Exponential convergence of the McKean–Vlasov path
       3.3.2 Convergence of the empirical processes to the sphere S
   3.4 Proof of Theorem 3.1.1
   3.5 Wrong test functions
   3.6 Notes

Appendix
   A.1 Logarithmic Sobolev inequalities
   A.2 Sobolev Spaces
   A.3 Stochastic differential equations on manifolds
   A.4 Frequently used notation

Introduction

In this thesis we study the asymptotic behavior of a mean field model of weakly interacting diffusions on spheres in the limit of (1) diverging time, (2) diverging number of particles, and (3) appropriate couplings of the two limits. Mean field models of interacting diffusions are of great interest in statistical mechanics and genetics and have been studied by many authors. Among others, Donald Dawson and Andreas Greven studied in several articles mean field models of interacting diffusions which arise in population genetics. They considered interacting diffusion processes with linear drift terms, which depend on the empirical mean of the system. Equations with such drift terms model migration, selection, mutation and recombination in population genetics. Moreover, they analyzed a hierarchical model of such interacting diffusions. One important aspect they studied is the long–time behavior of such particle systems under multiple space–time scales. For a survey about such mean field models and more recent results related to population genetics, see [Gre00] by Andreas Greven.

Other mean field models are motivated by problems in mathematical physics and statistical mechanics. For a survey of such mean field models, see [DG88] by Donald Dawson and Jürgen Gärtner. In this thesis we will study some interesting phenomena of such models. Therefore, let us describe such a mean field model in Euclidean space in the next section.

A Euclidean mean field model

For each N ∈ ℕ, consider a system of interacting diffusions x_1, …, x_N in d-dimensional Euclidean space R^d. In contrast to the above mentioned models, which are of interest in genetics, one considers here a non-linear drift term, given by the gradient of an external potential V. If V(x) increases sufficiently fast as ‖x‖ → ∞, then this potential induces a kind of compactification and exhibits strong properties. For instance, it causes non-degenerate diffusions with almost compact state space. This means that the potential ensures that all the particles x_k stay for a long time period in a compact subset of R^d with high probability. Moreover, the particles x_1, …, x_N are coupled via a mean field interaction, i.e., via an additional drift term, which forces each particle to move in the direction of the empirical mean of the system.

One aspect of the long–time behavior is the limit in time, for each fixed number of particles. This leads to the study of equilibrium states (also called steady states in the physics literature) and is closely related to Gibbs measures. The typical phenomena arising are phase transitions. This means, if the strength of the mean field interaction is large enough, then there are several equilibrium states in the limit as N → ∞. Otherwise, only one equilibrium state occurs, which is the same behavior as for the free model, i.e., the model without mean field interaction. This allows for the following interpretation. Macroscopically one can only measure the mean field interaction if it is large enough. Otherwise, the diffusion part of the particles dominates and conceals the mean field interaction as N tends to infinity.

Other aspects of the long–time behavior are the limit N → ∞, for each fixed compact time interval, and appropriate couplings of the limit in time and the limit N → ∞. The typical phenomena of interest in these cases will be discussed later.

Mathematically, our mean field model can be described by the following system of Itô stochastic differential equations

$$ dx_k(t) = \Big( -\operatorname{grad} V(x_k(t)) + \frac{J}{N}\sum_{l=1}^{N}\big(x_l(t) - x_k(t)\big) \Big)\,dt + \sigma\,dW_k(t), \qquad k = 1, \dots, N. $$

Here σ > 0 is the diffusion constant, J ≥ 0 is the strength of the mean field interaction and W_1, …, W_N are independent standard Brownian motions on R^d. The main object of interest is the measure valued empirical process X^{(N)}(t) = (1/N) Σ_{k=1}^N δ_{x_k(t)}, where δ_x denotes the Dirac measure at x. Several authors have derived laws of large numbers and functional central limit theorems for this process on fixed compact time intervals, see [DG87] for references.

Since the distribution of X^{(N)} is invariant under permutations of the initial data we can conceive this process as a Markov process with state space P(R^d), the set of all probability measures on R^d. Using Itô's formula, we compute

$$ d\big\langle X^{(N)}(t), f\big\rangle = \big\langle X^{(N)}(t), L^{X^{(N)}(t)} f\big\rangle\,dt + \frac{1}{\sqrt{N}}\,dM_f^{(N)}(t), $$

for all f ∈ C²(R^d), where M_f^{(N)} is a martingale that depends on f. Here ⟨µ, f⟩ denotes the action of the probability measure µ ∈ P(R^d) on the function f, and L is a certain measure dependent diffusion operator. Hence, we see that X^{(N)} represents a random perturbation of a deterministic dynamics, namely the McKean–Vlasov dynamics. A solution µ ∈ C([0, ∞); P(R^d)) of the McKean–Vlasov dynamics with initial datum µ_0 ∈ P(R^d) is called McKean–Vlasov path starting at µ_0.

The appropriate approach to study such random dynamics consists of an infinite dimensional generalization of the Freidlin–Wentzell theory, see [FW84] by Mark I. Freidlin and Alexander D. Wentzell. They studied randomly perturbed finite dimensional dynamical systems in the case of small noisy disturbances.

If the initial data X^{(N)}(0) converge, as N → ∞, fast enough to some probability measure µ_0, then the measure valued empirical processes X^{(N)} converge on each compact time interval to the McKean–Vlasov path starting at µ_0. For a precise statement and a proof of this fact, see the article [DG87] by Donald Dawson and Jürgen Gärtner.

In [DG87] much more was shown. Under appropriate assumptions on the potential V, a large deviation principle for the measure valued empirical processes (X^{(N)}(t))_{t∈[0,T]} was proven, even for more general measure dependent drift terms. Roughly speaking, a large deviation principle is a statement of the type
$$ \mathbb{P}\big( X^{(N)}(\cdot) \approx \nu(\cdot) \big) \approx e^{-N I(\nu(\cdot))}, \qquad \text{as } N \to \infty, $$

for ν ∈ C([0,T]; P(R^d)). The functional I: C([0,T]; P(R^d)) → [0, ∞] is called rate function associated with the large deviation principle and characterizes the exponential rate of decay of the probability that the measure valued empirical process is in a small neighborhood of the measure valued path ν. Large deviation principles are a very powerful tool to study different asymptotic questions in probability theory and its applications. In particular, the existence of a unique minimal point of the rate function implies, in most cases, a law of large numbers with this minimal point as the deterministic limit. Since the rate function I is nonnegative, the set of all minimal points of I is equal to the zero set of I. In general, the zero set of the rate function contains all accumulation points of the sequence (X^{(N)})_{N∈ℕ}. For an introduction to the theory of large deviation principles, see for instance [DZ93] by Amir Dembo and Ofer Zeitouni or [DH00] by Frank den Hollander. We will mostly follow the notation in [DZ93].

It turns out that the deterministic system can have more than one equilibrium state. In [DG89] Donald Dawson and Jürgen Gärtner studied a mean field model, where the associated deterministic dynamics have separated steady states. If N is large then the measure valued empirical process will normally follow a McKean–Vlasov path leading into a small neighborhood of one such equilibrium state and then perform small fluctuations around it. But from time to time X^{(N)} will make attempts to escape from the domain of attraction of this equilibrium. Sooner or later one of these attempts will be successful and the measure valued empirical process will undergo a transition into a small neighborhood of another equilibrium state. This type of dynamical phase transition is called tunneling. It turns out that the time the measure valued empirical process stays in a neighborhood of one steady state grows exponentially as the number of particles tends to infinity.

Assume that the equilibrium states of the McKean–Vlasov dynamics form a connected finite dimensional submanifold of P(R^d). For instance, this should be the case in dimension d ≥ 2 for potentials V that are "nice" and invariant under rotations. The expected behavior of the random dynamics, i.e., of the measure valued empirical process X^{(N)}, is as follows. If N is large then X^{(N)} normally follows the McKean–Vlasov path starting in X^{(N)}(0) into a small neighborhood of some equilibrium state and then performs a diffusion in a small neighborhood of the manifold of all equilibrium states. It turns out that, in contrast to the case of separated steady states, this behavior can be observed in times that grow linearly as the number of particles tends to infinity.
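As a crude numerical illustration of the particle system just described (not part of the thesis), the following Python sketch integrates the system of SDEs with an Euler–Maruyama scheme. The rotationally invariant double-well potential V(x) = ‖x‖⁴/4 − ‖x‖²/2 and all parameter values are our own assumptions, chosen only to make the mean field effect visible.

```python
import numpy as np

# Euler-Maruyama sketch of the Euclidean mean field system
#   dx_k = ( -grad V(x_k) + (J/N) * sum_l (x_l - x_k) ) dt + sigma dW_k.
# Assumed potential: V(x) = ||x||^4/4 - ||x||^2/2 (rotation invariant double well);
# all parameters are illustrative and not taken from the thesis.

def grad_V(x):
    # gradient of ||x||^4/4 - ||x||^2/2, acting row-wise: (||x||^2 - 1) * x
    return (np.sum(x**2, axis=1, keepdims=True) - 1.0) * x

def simulate(N=300, d=2, J=2.0, sigma=0.4, dt=1e-3, steps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.1 * rng.normal(size=(N, d))            # initial particle positions
    for _ in range(steps):
        mean = x.mean(axis=0)                    # empirical mean of the system
        drift = -grad_V(x) + J * (mean - x)      # (J/N) * sum_l (x_l - x_k) = J * (mean - x_k)
        x += drift * dt + sigma * np.sqrt(dt) * rng.normal(size=(N, d))
    return x

if __name__ == "__main__":
    x = simulate()
    # the particle cloud approximates the empirical measure X^(N) at the final time
    print("norm of empirical mean:", np.linalg.norm(x.mean(axis=0)))
```

With a strong coupling J the cloud tends to settle near one point of the sphere of potential minima, while for small J and larger σ the diffusion dominates; this mirrors, in a very informal way, the role of the interaction strength discussed above.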

A spherical mean field model

The lack of compactness of Euclidean space R^d causes major technical problems. Moreover, the claimed correspondence between a "nice" rotation invariant potential V and a connected finite dimensional manifold of steady states of the McKean–Vlasov dynamics is closely related to the GHS (Griffiths–Hurst–Sherman) inequality, which is proven for dimension one only, see [EMN76].

To avoid these technical complications we study a mean field model on the d-dimensional sphere S^d rather than on R^d. We call this model the spherical mean field model. Heuristically, we consider the mean field model on R^{d+1} and force each particle to stay all the time on S^d = {x ∈ R^{d+1} : ‖x‖_{R^{d+1}} = 1} by choosing

$$ V(x) = \begin{cases} 0, & \text{for } x \in S^d,\\ \infty, & \text{otherwise.} \end{cases} $$

Mathematically, the spherical mean field model can be described by the following system of Stratonovich type stochastic differential equations

$$ dx_k(t) = J\,B\big(X^{(N)}(t)\big)(x_k(t))\,dt + \sigma A(x_k(t)) * dW_k(t), \qquad k = 1, \dots, N, $$

where W_1, …, W_N are independent Brownian motions on R^{d+1}. The components A_1, …, A_{d+1} of A are fixed vector fields on S^d with
$$ \sum_{i=1}^{d+1} A_i^2 = \Delta. $$
Here ∆ denotes the Beltrami–Laplace operator (Laplacian) on S^d. Moreover, the value of the drift B(µ) at some point x ∈ S^d is the projection of the vector ⟨µ, id_{R^{d+1}}⟩ = ∫_{R^{d+1}} z µ(dz), attached at x, onto the tangent space T_x S^d. Here and in the following we will denote a probability measure on S^d and its unique extension to a probability measure on R^{d+1} by the same symbol. For an introduction to stochastic differential equations (SDE's) on manifolds, see for instance [HT94] by Wolfgang Hackenbroch and Anton Thalmaier, [IW89] by Nobuyuki Ikeda and Shinzo Watanabe or [Eme89] by Michel Emery. In Appendix A.3 we present some basic facts about SDE's on manifolds.

As in the Euclidean case, we denote by X^{(N)} the measure valued empirical process of the particle system. We will study the following three interesting aspects:

• The behavior of the invariant distributions of X^{(N)} as N → ∞ and a related hierarchical model.

• A moderate deviation principle for (X^{(N)}(t) − µ^{(N)}(t))_{t∈[0,T]}, where µ^{(N)} is the McKean–Vlasov path with initial datum X^{(N)}(0).

• An invariance principle for X^{(N)} as N → ∞, i.e., weak convergence of (X^{(N)}(Nt))_{t∈[0,T]} to a Brownian motion on a sphere of probability measures.

Although the second aspect may be of interest by itself, we mainly use it as a technical tool to prove the third one, which is our main aim. Let us now discuss each of these aspects in detail.

The behavior of the invariant distributions in a hierarchical model

The hierarchical model is arranged in several levels. Let us explain step by step the transition from one level to the next, starting with the system of two levels.
For N ∈ ℕ, let Π^{(N)} denote the invariant distribution of the process X^{(N)}. The above large deviation principle suggests that, as N → ∞, the process (X^{(N)}(t))_{t∈[0,T]} follows the McKean–Vlasov path with initial datum X^{(N)}(0). Hence we guess that Π^{(N)} is asymptotically concentrated on some subset of the set of all equilibrium states of the McKean–Vlasov dynamics. In our particular situation, a McKean–Vlasov path with initial datum µ_0 ∈ P(S^d) is a solution µ ∈ C([0, ∞); P(S^d)) of the parabolic partial differential equation
$$ \frac{d}{dt}\big\langle \mu(t), f\big\rangle = \big\langle \mu(t), L^{\mu(t)} f\big\rangle, \qquad f \in C^\infty(S^d), \quad \mu(0) = \mu_0 \in \mathcal{P}(S^d). $$
The measure dependent diffusion operator L is given by
$$ L^\eta f = \frac{\sigma^2}{2}\Delta f + J\,B(\eta) f, \qquad \eta \in \mathcal{P}(S^d),\ f \in C^\infty(S^d). $$

Denote by (L^η)^* the adjoint operator of L^η. Then each steady state µ ∈ P(S^d) of the McKean–Vlasov dynamics solves (L^µ)^* µ = 0.

We will consider large deviations for the sequence (Π^{(N)})_{N∈ℕ} and derive an explicit formula for the corresponding rate function I_inv. It will turn out that the zero set of I_inv is a sphere

$$ S = \{\nu^\alpha : \alpha \in S^d\} \subset \mathcal{P}(S^d), $$

where the ν^α are probability measures on S^d with density

$$ \frac{d\nu^\alpha}{d\lambda}(x) = \frac{\exp\big(\frac{2J}{\sigma^2}\, r_0\, (\alpha, x)_{\mathbb{R}^{d+1}}\big)}{\int \exp\big(\frac{2J}{\sigma^2}\, r_0\, (\alpha, y)_{\mathbb{R}^{d+1}}\big)\,\lambda(dy)}, \qquad x \in S^d, $$
for some radius r_0 ∈ [0, 1). Here (·, ·)_{R^{d+1}} and λ denote the Euclidean inner product in R^{d+1} and the uniform distribution on S^d, respectively. The uniqueness of this radius is a consequence of the GHS–inequality, see [EMN76] and [Sim93], and of the rotation invariance of the model. As mentioned above, the set S contains all accumulation points of the sequence (Π^{(N)})_{N∈ℕ}.

The radius r_0 depends on the mean field constant J. We will see that there exists a critical strength J_c > 0 of the mean field interaction, at which a phase transition occurs. More precisely, if 0 ≤ J ≤ J_c then the radius r_0 is equal to zero, and the sphere S is degenerate, i.e., contains only the uniform distribution on S^d. On the other hand, if J > J_c then r_0 is positive. This allows for the following interpretation. If J ≤ J_c then macroscopically one does not see the mean field interaction and qualitatively the coupled particle system cannot be distinguished from the free one, i.e., from the system of independent Brownian motions on S^d.

So far, we have described the stationary behavior at level one. Higher levels are defined as follows. At level two we consider N independent copies of level one boxes, each with N particles, and add an additional mean field interaction with strength J^{(2)}/N between the level one empirical means ⟨X_l^{(N,1)}(t), id_{R^{d+1}}⟩ projected onto the tangent manifold TS^d. Here the subscript l refers to the l-th level one box and the second superscript indicates the level. Hence, X_l^{(N,1)} is the measure valued empirical process of the l-th level one box. The level two empirical processes are given by

$$ X^{(N,2)}(t) = \frac{1}{N}\sum_{l=1}^{N} \delta_{X_l^{(N,1)}(t)} \in \mathcal{P}(\mathcal{P}(S^d)) = \mathcal{P}^{(2)}(S^d), \qquad t \in [0, \infty),\ N \in \mathbb{N}. $$

Assume that the radius r_0^{(1)} of the level one sphere S = S^{(1)} is positive. Then one can ask whether or not the level two mean field interaction is macroscopically measurable. It turns out that there exists a critical mean field interaction strength J_c · (r_0^{(1)})^{-2} between the level one components, at which a phase transition occurs. This behavior can be interpreted as follows. If J^{(2)} is larger than this critical value then a kind of synchronization of the level one boxes happens. Otherwise, qualitatively the coupled level two system cannot be distinguished from the system of independent copies of level one boxes without mean field interaction between them.

At level three we consider N independent copies of level two boxes, each containing N level one boxes, and an additional mean field interaction with strength J^{(3)}/N² between the level two empirical means ⟨X_l^{(N,2)}(t), ⟨·, id_{R^{d+1}}⟩⟩ projected onto the tangent manifold TS^d. Higher levels are defined inductively.

For each level n ∈ ℕ, we prove a large deviation principle for the invariant distributions of the empirical processes X^{(N,n)} as N → ∞. Moreover, we derive an explicit formula for the corresponding rate function and identify its zero set. It turns out that this set is equal to a sphere S^{(n)} ⊂ P^{(n)}(S^d) with radius r_0^{(n)} ∈ [0, r_0^{(n−1)}). The radius r_0^{(n)} depends on all mean field interaction constants J^{(m)}, m ≤ n, of lower levels. Moreover, if the radius r_0^{(n−1)} is positive then there exists a positive critical value of the mean field interaction strength J^{(n)}, at which a phase transition occurs at level n. More precisely, we have r_0^{(n)} = 0 for J^{(n)} ≤ J_c · (r_0^{(n−1)})^{-2} and r_0^{(n)} > 0 for J^{(n)} > J_c · (r_0^{(n−1)})^{-2}.

Since the sequence (r_0^{(n)})_{n∈ℕ} is non-increasing there are two possible scenarios. Either there exists a level n ∈ ℕ with r_0^{(n)} = 0, and hence r_0^{(m)} = 0 for all m ≥ n, or all radii are positive. Unfortunately, we are not able to present a simple criterion on the sequence (J^{(n)})_{n∈ℕ} of mean field constants allowing us to distinguish between the two scenarios. If all radii are positive then one can ask whether or not they converge to a positive limit as n → ∞. We will derive a simple sufficient criterion on the mean field interaction constants for the limit being equal to zero. It turns out that if Σ_{n=1}^∞ 1/J^{(n)} = ∞ then the limit of the radii is equal to zero. Defining new parameters γ^{(n)} ≥ 0, which depend on J^{(n)} and r_0^{(n−1)}, we can prove a somewhat explicit statement of the form

$$ \lim_{n\to\infty} r_0^{(n)} = 0 \quad\Longleftrightarrow\quad \sum_{n=1}^{\infty} \frac{1}{\gamma^{(n)}} = \infty. $$
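To make the sufficient criterion just mentioned concrete (this example is ours, not worked out in the text):
$$ J^{(n)} \equiv J > 0 \quad\Longrightarrow\quad \sum_{n=1}^{\infty} \frac{1}{J^{(n)}} = \infty \quad\Longrightarrow\quad \lim_{n\to\infty} r_0^{(n)} = 0, $$
so constant interaction strengths always drive the radii to zero, and a positive limit can only occur if the J^{(n)} grow fast enough for Σ_n 1/J^{(n)} to converge, e.g. J^{(n)} = n² or J^{(n)} = 2^n.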

Weak convergence to a Brownian motion

Let us return to our (level one) mean field model on the sphere S^d. Assume that the mean field interaction strength J is larger than the critical value J_c. Then the radius r_0 of the sphere S is positive. From the large deviation principle for the invariant distributions and the central limit theorem one can guess the long time behavior of the measure valued empirical processes X^{(N)}, N ∈ ℕ, under the time scale t ↦ Nt as N tends to infinity. Heuristically, the rate function I_inv = I_inv^{(1)} plays the role of a potential for the McKean–Vlasov dynamics, see [DG88]. This means that

$$ \dot\mu(t) = (L^{\mu(t)})^*\mu(t) = -\frac{1}{2\sigma^2}\,\operatorname{grad}_{\mathcal{P}} I_{\mathrm{inv}}(\mu(t)), \qquad t > 0, $$

for all McKean–Vlasov paths µ. Here grad_P may be regarded as the gradient with respect to a certain weak Riemannian structure on the infinite dimensional manifold P(S^d). Therefore, we expect that the distance between the McKean–Vlasov path and the zero set of the rate function I_inv vanishes as t → ∞. Since I_inv is zero on S there is no "drift" on the sphere S. This leads to the suggestion that, under the above time scale, the measure valued empirical processes X^{(N)} converge, as N → ∞, weakly to a diffusion process on the sphere S. By rotation invariance, this diffusion must be a Brownian motion on S. By a "Brownian motion on S" we mean the process ν^φ, where φ is a Brownian motion on S^d.

The time scale t ↦ Nt can be guessed from the quadratic variation process of X^{(N)}. Indeed, since
$$ \big[\big[\langle X^{(N)}, f\rangle, \langle X^{(N)}, f\rangle\big]\big]_t = \frac{\sigma^2}{N}\int_0^t \big\langle X^{(N)}(s), \|\operatorname{grad} f\|^2\big\rangle\,ds, $$
for all f ∈ C^∞(S^d), the proper time scale should be the claimed one. Here grad and ‖·‖ denote the gradient on S^d and the norm on TS^d, respectively. Therefore, we have to show first the convergence of the processes X^{(N)} to the sphere S and second identify the weak limit. In order to prove the first statement we will use a moderate deviation result, which is presented in the next section.

Since the manifold P(S^d) is infinite dimensional we have to choose a proper class of test functions to solve our problem. It will turn out that this is the most difficult part of the proof. Since each measure ν^α ∈ S is uniquely determined by its "mean"

$$ \big\langle \nu^\alpha, \mathrm{id}_{\mathbb{R}^{d+1}}\big\rangle, $$
this "mean" seems to be a good candidate for a test function. But this turns out to be wrong. We will discuss this issue in Section 3.5. In order to get a first feeling for the right choice of the test functions, we use Itô's formula to compute the following "linear" approximation

$$ d\big\langle X^{(N)}(t) - \nu^\beta, f\big\rangle = \big\langle X^{(N)}(t) - \nu^\beta, G^{\nu^\beta} f\big\rangle\,dt + R\big((X^{(N)}(t) - \nu^\beta)^{\otimes 2}\big)\,dt + dM_f(t), \qquad (*) $$

for each β ∈ S^d and each f ∈ C^∞(S^d), where M_f is a martingale, which depends on f, and R((X^{(N)}(t) − ν^β)^{⊗2}) is a quadratic remainder term. The distribution dependent operator G is defined by
$$ \big\langle \vartheta_1, G^{\vartheta_2} f\big\rangle = \big\langle \vartheta_1, L^{\vartheta_2} f\big\rangle + J\,\big\langle \vartheta_2, B(\vartheta_1) f\big\rangle, $$

for f ∈ C^∞(S^d) and Schwartz distributions ϑ_1, ϑ_2 ∈ D'(S^d). Therefore, in order to control the first drift term on the right hand side of (∗), the eigenfunctions corresponding to the eigenvalue zero of the operator G^{ν^β} should be a good choice for test functions.

In reality we will use a "quadratic" approximation of X^{(N)}(t) − ν^β. The idea is to add to (∗) a term of the form ⟨(X^{(N)}(t) − ν^β)^{⊗2}, g⟩, where we choose g ∈ C^∞(S^d × S^d) in such a way that the remaining drift term depends continuously on (X^{(N)}(t) − µ^{(N)}(t))^{⊗3}. Then one can use a moderate deviation principle to control this remaining drift term. Moreover, we have to replace β by a semi-martingale ϕ^{(N)}, which causes additional terms on the right hand side of (∗). Another problem, which occurs in dimension larger than one, is as follows. There exists more than one rotation of the sphere S^d which maps one given point x ∈ S^d to another given point y ∈ S^d. These technical problems are the reason why we decided to prove our claim only in dimension one.

Moderate deviations

Instead of the sphere S^d, we consider more generally an arbitrary d-dimensional connected compact Riemannian C^∞-manifold M without boundary. As usual ∆ denotes the Laplacian on M. Moreover, let D'(M) denote the Schwartz space of all distributions on M. Unless otherwise noted, all non-random differential geometrical objects are C^∞.

Fix a distribution dependent vector field B(ϑ) that depends linearly on ϑ ∈ D'(M). For N ∈ ℕ, we consider the particle system x_1, …, x_N ∈ C([0, ∞); M), which solves the following system of Stratonovich type stochastic differential equations

$$ dx_k(t) = J\,B\big(X^{(N)}(t)\big)(x_k(t))\,dt + \sigma A(x_k(t)) * dW_k(t), \qquad k = 1, \dots, N, $$

where σ > 0 is the diffusion constant and J ≥ 0 is the strength of the mean field interaction. Moreover, W_1, …, W_N are independent Brownian motions on R^m, for some fixed m ∈ ℕ. The components A_1, …, A_m of A are fixed vector fields, which solve
$$ \sum_{i=1}^{m} A_i^2 = \Delta. $$

Note that such vector fields A_1, …, A_m exist, at least for m ≥ d + 1. This is a consequence of Whitney's embedding theorem, see for instance [HT94].

Suppose that, as N → ∞, the empirical measure X^{(N)}(0) converges fast enough to some probability measure ν ∈ P(M). Denote by µ and µ^{(N)} the McKean–Vlasov paths with initial datum ν and X^{(N)}(0), respectively. Then from [DG87] we know that the process (X^{(N)}(t))_{t∈[0,T]} converges almost surely in C([0,T]; P(M)) to µ as N → ∞.

Moderate deviations measure the difference between X^{(N)} and µ or between X^{(N)} and µ^{(N)} in a more precise way. Since, under certain conditions on the initial data, the moderate deviation principle for (X^{(N)} − µ)_{N∈ℕ} is a consequence of the moderate deviation principle for (X^{(N)} − µ^{(N)})_{N∈ℕ}, we only discuss the latter one here.

Roughly speaking, a moderate deviation principle for the sequence (X^{(N)} − µ^{(N)})_{N∈ℕ} is a statement of the form
$$ \mathbb{P}\Big( \tfrac{1}{\gamma_N}\big(X^{(N)}(\cdot) - \mu^{(N)}(\cdot)\big) \approx \vartheta(\cdot) \Big) \approx e^{-N\gamma_N^2\, I(\vartheta)}, \qquad \text{as } N \to \infty, $$
for suitable ϑ ∈ C([0,T]; D'(M)) and some rate function I: C([0,T]; D'(M)) → [0, ∞]. Here

(γ_N)_{N∈ℕ} is an arbitrary sequence of positive numbers with
$$ \lim_{N\to\infty} \gamma_N = 0 \qquad\text{and}\qquad \lim_{N\to\infty} N\gamma_N^2 = \infty. $$
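For concreteness (our illustration, not from the thesis), any polynomial scale strictly between the two boundary regimes is admissible:
$$ \gamma_N = N^{-\alpha} \ \text{with} \ 0 < \alpha < \tfrac{1}{2}, \qquad \text{so that} \qquad \gamma_N \to 0 \ \text{and} \ N\gamma_N^2 = N^{1-2\alpha} \to \infty. $$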

This means that we study intermediate deviation scales between two boundary regimes, namely the regime of the central limit theorem, where γ_N = 1/√N, and the large deviation scale, where γ_N = 1. Clearly a moderate deviation principle may also be seen as a family of large deviation principles with rates Nγ_N².

In order to prove the claimed moderate deviation principle we will first study the free particle system x̃_1, …, x̃_N, i.e., N independent copies of the following Stratonovich type stochastic differential equation

$$ d\tilde x(t) = J\,B\big(\mu(t) + \gamma_N \vartheta(t)\big)(\tilde x(t))\,dt + \sigma A(\tilde x(t)) * dW(t), $$
for a given McKean–Vlasov path µ and suitable deterministic deviations given by ϑ ∈ C([0,T]; D'(M)). Denote by X_ϑ^{(N)} the measure valued empirical process of this free particle system. Using Itô's formula, we compute
$$ d\Big\langle \tfrac{1}{\gamma_N}\big(X_\vartheta^{(N)}(t) - \mu(t)\big), f\Big\rangle = F\Big(\tfrac{1}{\gamma_N}\big(X_\vartheta^{(N)}(t) - \mu(t)\big)\Big)\,dt + d\big\langle \widetilde M^{(N)}(t), f\big\rangle, $$
for all f ∈ C^∞(M) and some functional F that depends on f. Here M̃^{(N)} is a measure valued martingale.

Assume that the sequence (X_ϑ^{(N)}(0))_{N∈ℕ} converges fast enough to the initial datum ν of the McKean–Vlasov path µ. Then it turns out that the sequence (M̃^{(N)})_{N∈ℕ} satisfies a large deviation principle with rate Nγ_N². Moreover, we will deduce a "nice" representation of the corresponding rate function. Using the theory of partial differential equations in Sobolev spaces, we will obtain (1/γ_N)(X_ϑ^{(N)} − µ^{(N)}) as the image of M̃^{(N)} under a certain continuous mapping. Therefore, a suitable large deviation principle for the processes (1/γ_N)(X_ϑ^{(N)} − µ^{(N)}) then follows with the help of the contraction principle, see [DZ93]. Second, we will see that in a γ_N-neighborhood of ϑ the coupled and the free particle system have asymptotically the same distribution. This local result, together with an exponential tightness argument, yields the claimed moderate deviation principle for (X^{(N)} − µ^{(N)})_{N∈ℕ}.

Consider again M = S^d and assume that the mean field interaction J is larger than the critical value J_c. A consequence of the moderate deviation principle is as follows. For N ∈ ℕ, assume that X^{(N)}(0) starts from a γ_N-neighborhood of some fixed ν^α ∈ S. Then, as N → ∞, the measure valued empirical process X^{(N)} will stay all the time up to NT in a suitable γ_N-neighborhood of the sphere S with high probability. Hence, for each functional R, depending continuously on (X^{(N)}(t) − µ^{(N)}(t))^{⊗3}, we get
$$ \lim_{N\to\infty} \mathbb{P}\Big( \sup_{t\in[0,NT]} R\big((X^{(N)}(t) - \mu^{(N)}(t))^{\otimes 3}\big) > \varepsilon \Big) = 0, $$
for all ε > 0 and all T ≥ 0. Because of this, we can and will use our moderate deviation principle to solve the corresponding problems mentioned in the last section.

Organization of this thesis

In Chapter 1 we analyze the invariant distributions in the hierarchical model. Chapter 2 contains the moderate deviation results, and in Chapter 3 we prove the weak convergence of the time scaled measure valued empirical processes to a Brownian motion on a sphere of probability measures. Finally, in the appendix we present some basic results about logarithmic Sobolev inequalities (Appendix A.1), about Sobolev spaces on manifolds (Appendix A.2) and about stochastic analysis and differential equations on manifolds (Appendix A.3).

Open problems and generalizations

In this section we discuss four interesting possible generalizations of our results.

Problem 1: Prove the convergence in distribution of the time scaled empirical processes (X^{(N)}(Nt))_{t∈[0,T]} to a Brownian motion on the sphere S, for dimension larger than one. In Section 3.6 we will specify the arising technical problems more precisely. Moreover, we will present some ideas for a solution of these problems.

Problem 2: Consider the mean field model in R^d, d ≥ 2, with a rotation invariant potential V. Assume that the steady states of the McKean–Vlasov dynamics form a sphere. This assumption is closely related to the GHS–inequality. We believe that one can prove statements analogous to those for the spherical mean field model, provided the potential V is "nice" enough. For some comments about the Euclidean case, see Sections 1.5 and 2.7.

Problem 3: Extend the investigations of the hierarchical equilibrium system to the dynamical behavior of the hierarchical system. For instance, if the strength of the mean field interaction at level one is larger than the critical value then we expect the following behavior.

Under the time scale t ↦ Nt the measure valued empirical process of each level one box converges in distribution to a Brownian motion on the sphere S^{(1)} as N → ∞. At level two a kind of phase transition should occur at a critical value of the level two mean field interaction strength. That means, if J^{(2)} is larger than this critical value then there should be a kind of synchronization of the "Brownian motions" of the level one boxes. Otherwise, macroscopically the level two mean field interaction should not be observable.

Under the time scale t ↦ N²t we expect that the level two empirical process converges, as N → ∞, in distribution to a Brownian motion on the sphere S^{(2)} of probability measures on the sphere S^{(1)}, which itself contains probability measures on S^d. Moreover, there should exist a critical value of the level two mean field interaction strength such that the radius of the sphere S^{(2)} is positive if and only if J^{(2)} is larger than the critical value.

The behavior of higher levels under various time scales should be analogous.

Problem 4: Consider non-constant diffusion coefficients, an arbitrary compact Riemannian manifold, non-rotation-invariant potentials V, or more general drift terms which depend on the empirical measure. One serious problem arising in such general settings is to identify the equilibrium states of the McKean–Vlasov dynamics. Moreover, the set of all equilibrium states can be very irregular.

Assume that the set of equilibrium states of the McKean–Vlasov dynamics forms a smooth finite dimensional, possibly non-connected, manifold N. Then we expect the following behavior of the measure valued empirical processes X^{(N)}, for a large number N of particles. First, X^{(N)} will normally follow a McKean–Vlasov path into a neighborhood of some steady state contained in some connected component of N and then perform a diffusion in a neighborhood of this component. This phenomenon should be detectable in time scales that grow linearly as N → ∞. But from time to time X^{(N)} will make attempts to escape from the domain of attraction of this component of N. Sooner or later one of these attempts will be successful and the measure valued empirical process will undergo a transition into a small neighborhood of an equilibrium state of another component of N. We expect that the time which the measure valued empirical process takes to perform such tunneling grows exponentially as N → ∞.

Acknowledgments

It is a great pleasure to gratefully acknowledge the support of my advisor Professor Jürgen Gärtner. He generously shared with me his knowledge, experience, and his sense of good style in mathematics, and moreover provided an excellent research environment. His suggestions and inspiring remarks significantly enhanced the content and improved the presentation of this thesis.
Very special thanks go to friends and colleagues for their great support and encouragement. Achim Döbler, Wolfgang König, Jens Abel, Urs Gruber and Felix Esche merit a special note of thanks for inspiring discussions and their support of my thesis. Finally, my heartfelt thanks go to Adriana and my family for being there.
Financial support by the Deutsche Forschungsgemeinschaft via the Graduiertenkolleg "Stochastic Processes and Probabilistic Analysis" is gratefully acknowledged.

Chapter 1

Invariant distributions and a hierarchical model

1.1 The model and basic notation

In this section we study the behavior of the invariant distributions of the empirical processes at each level. Fix d ∈ ℕ and denote the d-dimensional sphere by S^d. Equip S^d with the natural Riemannian structure and denote by grad, ∆ and (·, ·) the gradient and Laplacian on S^d and the inner product on the tangent manifold TS^d, respectively. Although all differential geometrical objects used in this section are intrinsic, we sometimes consider

$$ S^d = \{x \in \mathbb{R}^{d+1} : \|x\|_{\mathbb{R}^{d+1}} = 1\} \tag{1.1} $$
as a submanifold of R^{d+1}.

Now let us describe the hierarchical spherical mean field model we want to analyze. At level one we consider, for each N ∈ ℕ, the particle system (x_1(t), …, x_N(t)), t ∈ R_+, of N diffusion processes on S^d defined by the following system of Stratonovich type stochastic differential equations

$$ dx_k(t) = J^{(1)}\,\operatorname{grad}\big\langle X^{(N,1)}(t), \cos(\cdot, x_k(t))\big\rangle\,dt + \sigma A(x_k(t)) * dW_k(t). \tag{1.2} $$

Here J^{(1)} ≥ 0 measures the strength of the mean field interaction, σ > 0 is the diffusion constant and W_1, …, W_N are independent Brownian motions on R^{d+1}. Moreover, the components A_1, …, A_{d+1} of A are vector fields which satisfy

$$ \sum_{i=1}^{d+1} A_i A_i f = \Delta f, \qquad \text{for all } f \in C^\infty(S^d). \tag{1.3} $$

For instance, such vector fields A_1, …, A_{d+1} can be constructed in the following way: Consider S^d as a submanifold of R^{d+1} and let A(x) be the orthogonal projection onto T_x S^d, for each x ∈ S^d. Then the components of A satisfy (1.3), see for instance [HT94].

Moreover, we define the cosine cos(x, y) between two points x, y ∈ S^d as the cosine of the angle between x and y. More precisely, consider S^d via (1.1) as a submanifold of R^{d+1}. Then

$$ \cos(x, y) := (x, y)_{\mathbb{R}^{d+1}}. \tag{1.4} $$


In (1.2) the empirical measure

$$ X^{(N,1)}(t) := \frac{1}{N}\sum_{k=1}^{N} \delta_{x_k(t)}, \tag{1.5} $$

of the N particles x_1(t), …, x_N(t) acts on the variable "·". As a consequence of (1.2),

$$ f(x_k(t)) - f(x_k(0)) - \int_0^t \Big( \frac{\sigma^2}{2}\Delta f(x_k(s)) + J^{(1)}\Big(\operatorname{grad}\big\langle X^{(N,1)}(s), \cos(\cdot, x_k(s))\big\rangle,\ \operatorname{grad} f(x_k(s))\Big) \Big)\,ds $$
is a (real valued) martingale, for each f ∈ C^∞(S^d).

Again, consider S^d via (1.1) as a submanifold of R^{d+1}. Then the mean field term of (1.2) turns out to be the usual mean field interaction between x_k(t) and the empirical mean of the whole system projected onto the tangent space T_{x_k(t)}S^d, i.e.,

$$ dx_k(t) = \frac{J^{(1)}}{N}\sum_{l=1}^{N} \big( x_l(t) - (x_l(t), x_k(t))_{\mathbb{R}^{d+1}}\, x_k(t) \big)\,dt + \sigma A(x_k(t)) * dW_k(t). $$
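For a quick numerical look at this level one system (our own sketch, not part of the thesis), one can integrate the embedded equation above with a naive Euler step and renormalisation back onto the sphere. This is not a faithful Stratonovich integrator, and all parameter values (N, d, J^{(1)}, σ, step size) are illustrative assumptions.

```python
import numpy as np

# Crude simulation of the embedded level one dynamics
#   dx_k = (J1/N) * sum_l ( x_l - (x_l, x_k) x_k ) dt + sigma A(x_k) * dW_k,
# where A(x) is the orthogonal projection onto T_x S^d. Euler step plus
# renormalisation is used purely for illustration; parameters are assumptions.

def simulate_sphere(N=500, d=2, J1=3.0, sigma=1.0, dt=1e-3, steps=50_000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(N, d + 1))
    x /= np.linalg.norm(x, axis=1, keepdims=True)        # start uniformly on S^d
    for _ in range(steps):
        m = x.mean(axis=0)                               # empirical mean in R^{d+1}
        drift = J1 * (m - (x @ m)[:, None] * x)          # tangential part of the mean
        dW = np.sqrt(dt) * rng.normal(size=(N, d + 1))
        noise = sigma * (dW - np.sum(dW * x, axis=1, keepdims=True) * x)  # A(x_k) dW_k
        x = x + drift * dt + noise
        x /= np.linalg.norm(x, axis=1, keepdims=True)    # project back onto the sphere
    return x

if __name__ == "__main__":
    x = simulate_sphere()
    print("empirical radius r^(1):", round(float(np.linalg.norm(x.mean(axis=0))), 3))
```

With σ = 1 and d = 2 the critical value of Proposition 1.2.5 below is J_c = (d + 1)σ²/2 = 1.5, so for the assumed J^{(1)} = 3 the printed radius should settle near the positive solution of the fixed point equation (1.16), while for J^{(1)} well below J_c it should fluctuate around zero.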

At level two we take N ∈ ℕ independent copies of level one boxes and an additional mean field interaction of strength J^{(2)}/N ≥ 0 between the level one empirical means of the N level one boxes. The corresponding particle system is defined by the following system of Stratonovich type stochastic differential equations

$$ \begin{aligned} dx_{k,l}(t) = {}& J^{(1)}\,\operatorname{grad}\big\langle X_l^{(N,1)}(t), \cos(\cdot, x_{k,l}(t))\big\rangle\,dt + \sigma A(x_{k,l}(t)) * dW_{k,l}(t) \\ &+ \frac{J^{(2)}}{N}\,\operatorname{grad}\Big( \big\langle X^{(N,2)}(t), \cos(\cdot, x_{k,l}(t))\big\rangle - \big\langle X_l^{(N,1)}(t), \cos(\cdot, x_{k,l}(t))\big\rangle \Big)\,dt, \end{aligned} \tag{1.6} $$
where W_{k,l}, 1 ≤ k, l ≤ N, are independent Brownian motions on R^{d+1} and A has the same components A_1, …, A_{d+1} as at level one. The subscript l on the level one empirical measure X_l^{(N,1)} refers to the l-th level one box. The action of the level two empirical measure

$$ X^{(N,2)}(t) := \frac{1}{N}\sum_{l=1}^{N} \delta_{X_l^{(N,1)}(t)} $$

on a function f ∈ C(S^d) is defined by

$$ \big\langle X^{(N,2)}(t), f\big\rangle := \big\langle X^{(N,2)}(t), \langle\cdot, f\rangle\big\rangle = \frac{1}{N}\sum_{l=1}^{N} \big\langle X_l^{(N,1)}(t), f\big\rangle. $$

Consider S^d via (1.1) as a submanifold of R^{d+1}. Then the additional mean field term of (1.6) is the usual one between the empirical mean of the whole system and the empirical mean of the l-th level one box projected onto the tangent manifold T_{x_{k,l}(t)}S^d.

For each higher level n ≥ 3 we take N ∈ ℕ independent copies of level n − 1 boxes and an additional mean field interaction with strength J^{(n)}/N^{n−1} between the N level n − 1 boxes.

Now let us recall the definition of a large deviation principle.

Definition 1.1.1 Let E be a topological space equipped with the Borel σ–field. Moreover,

let (a(N))_{N∈ℕ} be a sequence of positive real numbers which tends to infinity. We say that a family (µ_N)_{N∈ℕ} of probability measures on E satisfies a large deviation principle (LDP) on E with rate a(N) and rate function I: E → [0, ∞] if the following three statements are fulfilled:

(i) For each open subset U of E we have

$$ \liminf_{N\to\infty} \frac{1}{a(N)} \ln \mu_N(U) \ge - \inf_{x\in U} I(x). $$

(ii) For each closed subset F of E we have

$$ \limsup_{N\to\infty} \frac{1}{a(N)} \ln \mu_N(F) \le - \inf_{x\in F} I(x). $$

(iii) The rate function has compact level sets, i.e., the set {x ∈ E: I(x) ≤ s} is compact in E, for each s ≥ 0.

A sequence (ξ_N)_{N∈ℕ} of random variables satisfies a large deviation principle in E with rate a(N) and rate function I if their distributions satisfy a large deviation principle on E with rate a(N) and rate function I.
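A standard example of Definition 1.1.1, recalled here because it is exactly the ingredient used in the proof of Proposition 1.2.1 below: by Sanov's theorem, the empirical measures L_N := (1/N) Σ_{k=1}^N δ_{ξ_k} of independent, λ-distributed random variables ξ_1, ξ_2, … on S^d satisfy a large deviation principle on P(S^d) with rate N and rate function
$$ I(\mu) = \begin{cases} \Big\langle \mu, \ln\dfrac{d\mu}{d\lambda}\Big\rangle, & \mu \ll \lambda,\\[4pt] \infty, & \text{otherwise,} \end{cases} $$
i.e., the relative entropy of µ with respect to λ.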

There are many books about large deviations. Some of them are listed in the references. We will use the notation of the book [DZ93] by Amir Dembo and Ofer Zeitouni.

An outline of Chapter 1 is as follows. At each level n ∈ ℕ, we will see that the invariant distributions of the measure valued empirical processes

$$ X^{(N,n)}(t) := \frac{1}{N}\sum_{l=1}^{N} \delta_{X_l^{(N,n-1)}(t)} \tag{1.7} $$

satisfy, as N → ∞, a large deviation principle with rate N. Moreover, the zero set of the corresponding rate function is equal to a d-dimensional sphere S^{(n)} of radius r_0^{(n)} ≥ 0 in the space
$$ \mathcal{P}^{(n)}(S^d) := \begin{cases} \mathcal{P}(S^d), & \text{for } n = 1,\\ \mathcal{P}(\mathcal{P}^{(n-1)}(S^d)), & \text{for } n > 1. \end{cases} \tag{1.8} $$

We will prove that if r_0^{(n−1)} > 0 then there exists a positive critical mean field interaction strength J_c > 0, depending on r_0^{(n−1)}, such that r_0^{(n)} > 0 if and only if J^{(n)} is larger than the critical value.

Finally, we will discuss the behavior of the sequence of radii (r_0^{(n)})_{n∈ℕ}, which turns out to be non-increasing. Of great interest is the question whether or not the sequence (r_0^{(n)})_{n∈ℕ} eventually reaches zero or stays positive. Unfortunately, we cannot present a simple criterion on the sequence (J^{(n)})_{n∈ℕ} to distinguish these two cases. However, we shall discuss important special cases.

1.2 Level one

Fix J^{(1)} ≥ 0. In this section we study the invariant distributions Π^{(N,1)} ∈ P^{(2)}(S^d) of the measure valued empirical processes X^{(N,1)}, N ∈ ℕ, defined in (1.5). Since the distribution of X^{(N,1)} is invariant under permutation of the initial configuration (x_1(0), …, x_N(0)), we can conceive X^{(N,1)} as a Markov process on the set P^{(1)}(S^d) = P(S^d) of all probability measures on S^d. First we will prove a large deviation principle for the invariant distributions Π^{(N,1)} of X^{(N,1)}. In the second part we will discuss the rate function of this LDP.

1.2.1 Large deviations at level one

First let us compute the invariant distribution Π^{(N,1)} of the measure valued empirical process X^{(N,1)}. The particle system defined in (1.2) is a diffusion process on (S^d)^N with generator

$$ L^{(N,1)} f = \frac{\sigma^2}{2}\Delta_N f + \big(\operatorname{grad}_N V^{(N,1)}, \operatorname{grad}_N f\big)_N, \qquad f \in C^\infty((S^d)^N), $$

where grad_N, ∆_N and (·, ·)_N denote the gradient and Laplacian on (S^d)^N and the Riemannian inner product on the tangent manifold T(S^d)^N, respectively. The potential V^{(N,1)} is given by

$$ V^{(N,1)}(y) = \frac{J^{(1)}}{2N}\sum_{l,k=1}^{N} \cos(y_l, y_k), \qquad y = (y_1, \dots, y_N) \in (S^d)^N. $$

A short calculation shows that the adjoint operator (L^{(N,1)})^* of L^{(N,1)} is of the form

$$ (L^{(N,1)})^* f = \operatorname{div}_N\Big( \frac{\sigma^2}{2}\operatorname{grad}_N f - f\,\operatorname{grad}_N V^{(N,1)}\Big), \qquad f \in C^\infty((S^d)^N), $$

where div_N denotes the divergence on (S^d)^N. From the last equation one easily computes the invariant distribution µ of the particle system (x_1(t), …, x_N(t)). The density of µ with respect to the uniform distribution λ_N on (S^d)^N is given by
$$ \frac{d\mu}{d\lambda_N}(y) = \frac{1}{C}\exp\Big(\frac{2}{\sigma^2}V^{(N,1)}(y)\Big) = \frac{1}{C}\exp\bigg( N\,\frac{J^{(1)}}{\sigma^2}\Big\langle \frac{1}{N}\sum_{k=1}^{N}\delta_{y_k} \otimes \frac{1}{N}\sum_{k=1}^{N}\delta_{y_k},\ \cos\Big\rangle\bigg), $$
where C > 0 is the normalizing constant. From the last equation we obtain the invariant distribution Π^{(N,1)} ∈ P^{(2)}(S^d) = P(P(S^d)) of the measure valued empirical process X^{(N,1)}, i.e.,
$$ \Pi^{(N,1)}(d\mu) = \frac{1}{C}\exp\Big( N\,\frac{J^{(1)}}{\sigma^2}\,\langle \mu\otimes\mu, \cos\rangle\Big)\,\Pi_0^{(N,1)}(d\mu), \tag{1.9} $$
where C > 0 is the normalizing constant and Π_0^{(N,1)} is the distribution of the empirical measure (1/N) Σ_{k=1}^N δ_{ξ_k} of N independent uniformly distributed random variables ξ_1, …, ξ_N ∈ S^d. The following large deviation principle can be guessed from (1.9).

(N,1) where C > 0 is the normalizing constant and Π0 is the distribution of the empirical measure 1 PN d N k=1 δξk of N independent uniform distributed random variables ξ1, . . . , ξN ∈ S . The following large deviation principle can be guessed from (1.9). 1.2. LEVEL ONE 5

Proposition 1.2.1 The sequence (Π^{(N,1)})_{N∈ℕ} satisfies a large deviation principle on P^{(1)}(S^d) with rate N and rate function

$$ I^{(1)}_{\mathrm{inv}}(\mu) := \begin{cases} \Big\langle \mu, \ln\dfrac{d\mu}{d\lambda}\Big\rangle - \dfrac{J^{(1)}}{\sigma^2}\langle \mu\otimes\mu, \cos\rangle + C, & \text{if } \mu \ll \lambda,\\[6pt] \infty, & \text{otherwise,} \end{cases} \tag{1.10} $$

where C ∈ ℝ is a constant such that inf{I^{(1)}_{inv}(µ): µ ∈ P^{(1)}(S^d)} = 0, and λ denotes the uniform distribution on S^d.

Proof: Applying Sanov's theorem, see [DZ93, Theorem 6.2.10], to Π_0^{(N,1)} ∈ P^{(2)}(S^d), we get a large deviation principle on P^{(1)}(S^d) for these measures with rate N and rate function

$$ \widetilde I^{(1)}_{\mathrm{inv}}(\mu) = \begin{cases} \Big\langle \mu, \ln\dfrac{d\mu}{d\lambda}\Big\rangle, & \mu \ll \lambda,\\[6pt] \infty, & \text{otherwise.} \end{cases} $$

Because the mapping P(S^d) ∋ µ ↦ ⟨µ ⊗ µ, cos⟩ ∈ ℝ is continuous and bounded, we can apply Varadhan's lemma, see [DZ93, Theorem 4.3.1], and the claimed LDP is proven. □

In the next section we look at the rate function I^{(1)}_{inv}. In particular, we will identify its zero set.

1.2.2 The rate function at level one

The aim of this section is to characterize the set

$$ \{\mu \in \mathcal{P}^{(1)}(S^d) : I^{(1)}_{\mathrm{inv}}(\mu) = 0\}. $$

Let us define the length r^{(1)}(µ) ∈ [0, 1] and the "angle" α^{(1)}(µ) ∈ S^d of the mean of a probability measure µ ∈ P(S^d). Consider S^d via (1.1) as a submanifold of R^{d+1}. Then we define
$$ \int_{\mathbb{R}^{d+1}} z\,\mu(dz) = \big\langle \mu, \mathrm{id}_{\mathbb{R}^{d+1}}\big\rangle =: r^{(1)}(\mu)\,\alpha^{(1)}(\mu). \tag{1.11} $$
Then we have

$$ \langle \mu\otimes\nu, \cos\rangle = r^{(1)}(\mu)\,r^{(1)}(\nu)\,\cos\big(\alpha^{(1)}(\mu), \alpha^{(1)}(\nu)\big), \qquad \text{for all } \mu,\nu \in \mathcal{P}(S^d). \tag{1.12} $$

This explains why we call r^{(1)}(µ) and α^{(1)}(µ) length and "angle" of the mean of µ, respectively. Having this in mind, we can rewrite the rate function I^{(1)}_{inv} as

$$ I^{(1)}_{\mathrm{inv}}(\mu) = \begin{cases} \Big\langle \mu, \ln\dfrac{d\mu}{d\lambda}\Big\rangle - \dfrac{J^{(1)}}{\sigma^2}\big(r^{(1)}(\mu)\big)^2 + C, & \mu \ll \lambda,\\[6pt] \infty, & \text{otherwise.} \end{cases} \tag{1.13} $$

The following lemma gives us a necessary condition for I^{(1)}_{inv}(µ) = 0.

Lemma 1.2.2 If I^{(1)}_{inv}(µ) is equal to zero then µ is absolutely continuous with respect to the uniform distribution λ on S^d and
$$ \frac{d\mu}{d\lambda}(x) = \frac{\exp\big(\frac{2J^{(1)}}{\sigma^2}\, r^{(1)}(\mu)\cos(x, \alpha^{(1)}(\mu))\big)}{\big\langle \lambda, \exp\big(\frac{2J^{(1)}}{\sigma^2}\, r^{(1)}(\mu)\cos(\cdot, \alpha^{(1)}(\mu))\big)\big\rangle}, \qquad x \in S^d. \tag{1.14} $$

Proof: Suppose I^{(1)}_{inv}(µ) = 0. Then µ is absolutely continuous with respect to λ. Denote the density dµ/dλ by ϕ. Because of I^{(1)}_{inv}(µ) = 0, we have ϕ ln ϕ ∈ L_1(S^d; λ). Take some arbitrary µ̃ ∈ P(S^d) with dµ̃/dλ = ψ ∈ C^∞(S^d) and define the convex function h: R_+ → R by h(x) := x ln x, x ∈ R_+. Then we get
$$ 0 \le \frac{I^{(1)}_{\mathrm{inv}}\big((1-\varepsilon)\mu + \varepsilon\tilde\mu\big)}{\varepsilon} = \Big\langle \lambda, \frac{h((1-\varepsilon)\varphi + \varepsilon\psi) - h(\varphi)}{\varepsilon}\Big\rangle - \frac{J^{(1)}}{\sigma^2}\,\frac{\big(r^{(1)}((1-\varepsilon)\mu + \varepsilon\tilde\mu)\big)^2 - \big(r^{(1)}(\mu)\big)^2}{\varepsilon}, $$
for all ε ∈ (0, 1]. The convexity of h implies
$$ \frac{h((1-\varepsilon)\varphi + \varepsilon\psi) - h(\varphi)}{\varepsilon} \le h(\psi) - h(\varphi) \in L_1(S^d; \lambda). $$
Therefore, applying Fatou's lemma it follows that
$$ 0 \le \lim_{\varepsilon\downarrow 0} \frac{I^{(1)}_{\mathrm{inv}}\big((1-\varepsilon)\mu + \varepsilon\tilde\mu\big)}{\varepsilon} \le \Big\langle \lambda, \lim_{\varepsilon\downarrow 0}\frac{h((1-\varepsilon)\varphi + \varepsilon\psi) - h(\varphi)}{\varepsilon}\Big\rangle - \frac{2J^{(1)} r^{(1)}(\mu)}{\sigma^2}\Big( \big\langle \lambda, \psi\cos(\cdot, \alpha^{(1)}(\mu))\big\rangle - r^{(1)}(\mu)\Big). $$

Using I^{(1)}_{inv}(µ) = 0, we can proceed with
$$ 0 \le \Big\langle \lambda, \psi\Big(\ln\varphi - \frac{2J^{(1)}}{\sigma^2}\, r^{(1)}(\mu)\cos(\cdot, \alpha^{(1)}(\mu)) + C\Big)\Big\rangle, $$
for some constant C, which depends on µ. Since the last inequality is true for all probability densities ψ ∈ C^∞(S^d), there exists some constant C_0 > 0 such that ϕ(x) ≥ C_0, for λ-almost all x ∈ S^d. This implies that for each ψ ∈ C^∞(S^d) there exists a constant δ > 0 such that
$$ (1-\varepsilon)\varphi(x) + \varepsilon\psi(x) \ge 0 \qquad\text{and}\qquad \frac{h((1-\varepsilon)\varphi(x) + \varepsilon\psi(x)) - h(\varphi(x))}{\varepsilon} \ge h'\Big(\frac{C_0}{2}\Big), $$
for λ-almost all x ∈ S^d and all ε ∈ [−δ, 1]. Therefore, we can compute
$$ 0 = \frac{d\,I^{(1)}_{\mathrm{inv}}\big((1-\varepsilon)\mu + \varepsilon\tilde\mu\big)}{d\varepsilon}\bigg|_{\varepsilon=0} = \Big\langle \lambda, \psi\Big(\ln\varphi - \frac{2J^{(1)}}{\sigma^2}\, r^{(1)}(\mu)\cos(\cdot, \alpha^{(1)}(\mu)) + C\Big)\Big\rangle. $$
Since this is true for all probability densities ψ ∈ C^∞(S^d) we get
$$ \varphi(x) = \exp\Big( \frac{2J^{(1)}}{\sigma^2}\, r^{(1)}(\mu)\cos(x, \alpha^{(1)}(\mu)) - C\Big), $$
for λ-almost all x ∈ S^d. And since µ is a probability measure, (1.14) follows. □

From (1.12) we get an additional constraint, namely
$$ \big(r^{(1)}(\mu)\big)^2 = \langle \mu\otimes\mu, \cos\rangle = r^{(1)}(\mu)\,\big\langle \mu, \cos(\cdot, \alpha^{(1)}(\mu))\big\rangle. $$

Fix J ≥ 0 and α ∈ S^d. Define the mapping G_J : R_+ → [0, 1) by
$$ G_J(r) := \frac{\big\langle \lambda, \cos(\cdot,\alpha)\exp\big(\frac{2J}{\sigma^2}\, r\cos(\cdot,\alpha)\big)\big\rangle}{\big\langle \lambda, \exp\big(\frac{2J}{\sigma^2}\, r\cos(\cdot,\alpha)\big)\big\rangle}, \qquad r \in \mathbb{R}_+. \tag{1.15} $$
One easily computes that G_J is independent of α. Therefore, a probability measure µ ∈ P(S^d) with I^{(1)}_{inv}(µ) = 0 has to solve
$$ r^{(1)}(\mu) = G_{J^{(1)}}\big(r^{(1)}(\mu)\big). \tag{1.16} $$
The following lemma will tell us more about (1.15).

Lemma 1.2.3 Fix J > 0. Then the mapping G_J is strictly increasing and strictly concave.

Proof: Fix α ∈ S^d. The claimed monotonicity of G_J follows from
$$ G_J'(r) = \frac{2J}{\sigma^2}\Big( \big\langle \mu(r), \cos^2(\cdot,\alpha)\big\rangle - \big\langle \mu(r), \cos(\cdot,\alpha)\big\rangle^2 \Big) > 0, \tag{1.17} $$
where µ(r) is the probability measure on S^d with density

$$ \frac{d\mu(r)}{d\lambda}(x) = \frac{\exp\big(\frac{2J}{\sigma^2}\, r\cos(x,\alpha)\big)}{\big\langle \lambda, \exp\big(\frac{2J}{\sigma^2}\, r\cos(\cdot,\alpha)\big)\big\rangle}, \qquad x \in S^d. \tag{1.18} $$

A proof of the strict concavity of G_J can be found in the book of Barry Simon [Sim93, Theorem II.13.5]. □

Remark 1.2.4 The concavity of G_J is closely related to the GHS (Griffiths–Hurst–Sherman) inequality, see [EMN76] and [Sim93].

Now we are able to identify the zero set of the rate function I^{(1)}_{inv}.

Proposition 1.2.5 For each J^{(1)} ≥ 0 there exists a unique r_0^{(1)} ∈ [0, 1) such that the set {µ ∈ P(S^d): I^{(1)}_{inv}(µ) = 0} is equal to the sphere
$$ S^{(1)} := \{\nu^\alpha : \alpha \in S^d\}, \tag{1.19} $$

where the ν^α are probability measures on S^d with density

$$ \frac{d\nu^\alpha}{d\lambda}(x) = \frac{\exp\big(\frac{2J^{(1)}}{\sigma^2}\, r_0^{(1)}\cos(x,\alpha)\big)}{\big\langle \lambda, \exp\big(\frac{2J^{(1)}}{\sigma^2}\, r_0^{(1)}\cos(\cdot,\alpha)\big)\big\rangle}, \qquad x \in S^d, \tag{1.20} $$

and r_0^{(1)} solves (1.16). Moreover, r_0^{(1)} is positive if and only if J^{(1)} > J_c, where
$$ J_c := \frac{d+1}{2}\,\sigma^2, \tag{1.21} $$
see Figure 1.1.


Figure 1.1: Dependence of r_0^{(1)} on J^{(1)}. [The two panels of the original figure show the fixed point of G_{J^{(1)}}: a positive r_0^{(1)} for J^{(1)} > J_c and r_0^{(1)} = 0 for J^{(1)} ≤ J_c.]
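The dependence sketched in Figure 1.1 can be reproduced numerically. The following Python sketch (ours, for illustration only) treats the case d = 1, where the uniform average over S^1 becomes an average over the angle; it evaluates G_J by quadrature and locates the fixed point of (1.22) by simple iteration.

```python
import numpy as np

# Numerical illustration of Figure 1.1 for d = 1 (the circle):
#   G_J(r) = <lambda, cos * exp(2J r cos / sigma^2)> / <lambda, exp(2J r cos / sigma^2)>,
# and r_0^(1) is the largest fixed point of r = G_J(r), cf. (1.22).
# Parameter values are illustrative; for d = 1, (1.21) gives J_c = sigma^2.

THETA = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)   # uniform grid on S^1

def G(J, r, sigma=1.0):
    w = np.exp(2.0 * J * r * np.cos(THETA) / sigma**2)
    return float(np.mean(np.cos(THETA) * w) / np.mean(w))

def radius(J, sigma=1.0, iterations=400):
    r = 1.0                       # start above the fixed point and iterate downwards
    for _ in range(iterations):
        r = G(J, r, sigma)
    return r

if __name__ == "__main__":
    for J in (0.5, 0.9, 1.1, 1.5, 3.0):
        print(f"J = {J:3.1f}   r_0^(1) ≈ {radius(J):.4f}")
```

For σ = 1 the printed radii should be essentially zero for J < J_c = 1 and strictly positive for J > J_c, in line with Proposition 1.2.5.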

Proof: From Lemma 1.2.2 we know that each probability measure µ ∈ P(S^d) with I^{(1)}_{inv}(µ) = 0 is of the form (1.14). Therefore, we only have to look at solutions of the fixed point equation

$$ r = G_{J^{(1)}}(r). \tag{1.22} $$

Fix α ∈ S^d. Then, using (1.17), we compute

$$ G_{J^{(1)}}'(0) = \frac{2J^{(1)}}{\sigma^2}\,\big\langle \lambda, (\cos(\cdot,\alpha))^2\big\rangle = \frac{2J^{(1)}}{(d+1)\sigma^2}. $$

Hence, G_{J^{(1)}}'(0) > 1 if and only if J^{(1)} > J_c.
Assume that J^{(1)} ≤ J_c. Then, since G_{J^{(1)}} is strictly increasing and strictly concave, only r = r_0^{(1)} = 0 solves (1.22). Therefore, only the uniform distribution λ on S^d solves (1.14) and (1.16). Hence, the zero set of the rate function contains only λ and we can identify this set with the degenerate sphere S^{(1)} of radius r_0^{(1)} = 0.
Now assume that J^{(1)} > J_c and define µ(r) by (1.18). Since G_{J^{(1)}}'(0) > 1 we get

$$ \frac{d^2 I^{(1)}_{\mathrm{inv}}(\mu_r)}{dr^2}\bigg|_{r=0} = \frac{2J^{(1)}}{\sigma^2}\, G_{J^{(1)}}'(0)\big(1 - G_{J^{(1)}}'(0)\big) < 0. $$

Therefore, the uniform distribution on S^d does not minimize I^{(1)}_{inv} and there must exist another solution r_0^{(1)} ∈ (0, 1) of (1.22). Because G_{J^{(1)}} is strictly concave, this solution is unique and we can identify the zero set of I^{(1)}_{inv} with the sphere S^{(1)} of radius r_0^{(1)}. □
At the end of this section we look at the weak limit of the invariant distributions Π^{(N,1)} as N → ∞.

Proposition 1.2.6 As N tends to infinity, the invariant distributions Π^{(N,1)} of the measure valued empirical processes X^{(N,1)} converge weakly in P^{(2)}(S^d) to the uniform distribution on the sphere S^{(1)} with radius r_0^{(1)}.

Proof: Because of Propositions 1.2.1 and 1.2.5, each accumulation point of the sequence (Π^{(N,1)})_{N∈ℕ} is a probability measure on the sphere S^{(1)}. But since each Π^{(N,1)} is invariant under rotations of S^d, such an accumulation point has to be the uniform distribution. Now our claim follows from the compactness of P^{(2)}(S^d). □

1.3 Higher Levels

Let n ≥ 2 and fix mean field constants J^{(1)}, …, J^{(n)} ≥ 0. For each N ∈ ℕ, define the set Λ_N := {1, …, N}. The model we want to study can be described by the following system of Stratonovich type stochastic differential equations

$$ \begin{aligned} dx_k(t) = {}& J^{(1)}\,\operatorname{grad}\big\langle X_k^{(N,1)}(t), \cos(\cdot, x_k(t))\big\rangle\,dt + \sigma A(x_k(t)) * dW_k(t) \\ &+ \sum_{u=2}^{n} \frac{J^{(u)}}{N^{u-1}}\,\operatorname{grad}\Big( \big\langle X_k^{(N,u)}(t), \cos(\cdot, x_k(t))\big\rangle - \big\langle X_k^{(N,u-1)}(t), \cos(\cdot, x_k(t))\big\rangle \Big)\,dt, \end{aligned} \tag{1.23} $$
k = (k_1, …, k_n) ∈ Λ_N^n. Here σ > 0 is the diffusion constant and W_k, k ∈ Λ_N^n, are independent Brownian motions on R^{d+1}. Moreover, the components A_1, …, A_{d+1} of A are the same as at level one.
For a measure µ ∈ P^{(n)}(S^d) and a function f ∈ C(S^d), we define
$$ \langle \mu, f\rangle := \begin{cases} \langle \mu, f\rangle, & \text{if } n = 1,\\ \langle \mu, \langle\cdot, f\rangle\rangle, & \text{if } n > 1. \end{cases} \tag{1.24} $$
In particular, for 1 ≤ u ≤ n, the action of the level u empirical measure of the k-th level u box on a function f ∈ C(S^d) is defined by

$$ \big\langle X_k^{(N,u)}(t), f\big\rangle := \begin{cases} \big\langle X_k^{(N,1)}(t), f\big\rangle, & \text{if } u = 1,\\ \big\langle X_k^{(N,u)}(t), \langle\cdot, f\rangle\big\rangle, & \text{if } u > 1 \end{cases} \;=\; \frac{1}{N^u}\sum_{l_1,\dots,l_u=1}^{N} f\big(x_{l_1,\dots,l_u,k_{u+1},\dots,k_n}(t)\big). \tag{1.25} $$

Remark 1.3.1 Consider S^d via (1.1) as a submanifold of R^{d+1}. Then the drift terms in (1.23) are differences of the empirical mean of the (k_1, …, k_{n−u})-th level u box and the empirical means of the N level u − 1 sub-boxes projected onto the tangent manifold T_{x_k(t)}S^d.

Like at level one, let us define the length r^{(n)}(µ) ∈ [0, 1] and the "angle" α^{(n)}(µ) ∈ S^d of the "mean" of a measure µ ∈ P^{(n)}(S^d) as follows. Consider the sphere S^d via (1.1) as a submanifold of R^{d+1}. Then we define
$$ \big\langle \mu, \mathrm{id}_{\mathbb{R}^{d+1}}\big\rangle =: r^{(n)}(\mu)\,\alpha^{(n)}(\mu). \tag{1.26} $$
Here we denote a measure µ ∈ P^{(n)}(S^d) and its unique extension to an element of P^{(n)}(R^{d+1}) by the same symbol. Definition (1.26) implies
$$ \langle \mu\otimes\nu, \cos\rangle = r^{(n)}(\mu)\,r^{(n)}(\nu)\,\cos\big(\alpha^{(n)}(\mu), \alpha^{(n)}(\nu)\big), \qquad \text{for all } \mu,\nu \in \mathcal{P}^{(n)}(S^d). \tag{1.27} $$
This explains why we call r^{(n)}(µ) and α^{(n)}(µ) the length and "angle" of the "mean" of the measure µ ∈ P^{(n)}(S^d), respectively. In the next section we want to prove the following theorem.

Theorem 1.3.2 (i) Let n ∈ ℕ. Then the invariant distributions Π^{(N,n)}, N ∈ ℕ, of the measure valued empirical processes X^{(N,n)} satisfy a large deviation principle on P^{(n)}(S^d) with rate N and rate function

$$ I^{(n)}_{\mathrm{inv}}(\mu) = \begin{cases} \Big\langle \mu, \ln\dfrac{d\mu}{d\lambda^{(n-1)}}\Big\rangle - \dfrac{J^{(n)}}{\sigma^2}\big(r^{(n)}(\mu)\big)^2 + C^{(n)}, & \text{if } \mu \ll \lambda^{(n-1)},\\[6pt] \infty, & \text{otherwise.} \end{cases} \tag{1.28} $$

Here λ^{(n−1)} is the uniform distribution on the set

$$ \{\nu \in \mathcal{P}^{(n-1)}(S^d) : I^{(n-1)}_{\mathrm{inv}}(\nu) = 0\} $$

and $C^{(n)} \in \mathbb R$ is a constant such that $\inf\bigl\{I^{(n)}_{\mathrm{inv}}(\mu)\colon \mu \in \mathcal P^{(n)}(S^d)\bigr\} = 0$.

(ii) There exists a sequence $1 =: r_0^{(0)} > r_0^{(1)} \ge r_0^{(2)} \ge r_0^{(3)} \ge \dots \ge 0$ such that the following holds for each $n \in \mathbb N$. The zero set of the rate function $I^{(n)}_{\mathrm{inv}}$ is equal to the sphere

$$S^{(n)} := \{\nu^{\alpha,n}\colon \alpha \in S^d\}, \tag{1.29}$$

with
$$\frac{d\nu^{\alpha,n}}{d\lambda}\bigl(\nu^{\beta,n-1}\bigr) = \frac{\exp\Bigl(\frac{2J^{(n)}}{\sigma^2}\, r_0^{(n-1)} r_0^{(n)} \cos(\beta,\alpha)\Bigr)}{\frac{1}{2\pi}\displaystyle\int_0^{2\pi} \exp\Bigl(\frac{2J^{(n)}}{\sigma^2}\, r_0^{(n-1)} r_0^{(n)} \cos(x)\Bigr) dx}. \tag{1.30}$$

Here λ denotes the uniform distribution on the sphere S(n−1).

Moreover, $r_0^{(n)}$ is positive if and only if $J^{(n)} > J_c\bigl(r_0^{(n-1)}\bigr) = \frac{d+1}{2}\,\sigma^2\,\bigl(r_0^{(n-1)}\bigr)^{-2}$.

(iii) As N tends to infinity the invariant distribution Π(N,n) converges in P(n+1)(Sd) weakly to the uniform distribution on the sphere S(n).

Remark 1.3.3 (a) Propositions 1.2.1, 1.2.5 and 1.2.6 yield the statements of Theorem 1.3.2 for n = 1. Our proof of Theorem 1.3.2 runs via an induction on n.

(b) There are no essential differences between the sphere S(n) and the uniform distribution on this set on the one hand and the sphere Sd and the uniform distribution on Sd on the other hand. Therefore, the assertions in (ii) and (iii) follow from (i) in the same way as Propositions 1.2.5 and 1.2.6 follow from Proposition 1.2.1. We shall not carry out the details.

(n) (c) At each level n ∈ N the radius r0 solves

$$r_0^{(n)} = \bigl\langle \nu^{\alpha,n}, \cos(\cdot, \alpha)\bigr\rangle = r_0^{(n-1)}\, G_{J^{(n)}}\bigl(r_0^{(n-1)} r_0^{(n)}\bigr),$$

for all $\alpha \in S^d$. Since $G_{J^{(n)}}(x) < 1$, for all $x \in \mathbb R_+$, it follows that the sequence $(r_0^{(n)})_{n\in\mathbb N}$ is even strictly decreasing until it reaches zero (provided this happens).

1.3.1 Large deviations at higher levels

In this section we prove part (i) of Theorem 1.3.2. Therefore, fix a level $n \ge 2$ and assume that Theorem 1.3.2 holds at level $n-1$. As at level one, the particle system can be described by its generator
$$L^{(N,n)} f = \frac{\sigma^2}{2}\Delta_{N^n} f + \bigl(\operatorname{grad}_{N^n} V^{(N,n)}, \operatorname{grad}_{N^n} f\bigr)_{N^n}, \qquad f \in C^\infty\bigl((S^d)^{N^n}\bigr),$$
where $\operatorname{grad}_{N^n}$, $\Delta_{N^n}$ and $(\cdot,\cdot)_{N^n}$ denote the gradient and Laplacian on $(S^d)^{N^n}$ and the Riemannian inner product on the tangent manifold $T(S^d)^{N^n}$, respectively. For $N \in \mathbb N$, let $\Lambda_N$ denote the set $\{1,\dots,N\}$. Then the potential $V^{(N,n)}$ is given by

$$V^{(N,n)}(y) = \frac{J^{(n)}}{2N^{2n-1}} \sum_{l,k\in\Lambda_N^n} \cos(y_l, y_k) + \sum_{u=1}^{n-1} \frac{J^{(u)}(1+O(1))}{2N^{n+u-1}} \sum_{k\in\Lambda_N^n} \sum_{l\in\Lambda_N^u} \cos\bigl(y_k, y_{l_1,\dots,l_u,k_{u+1},\dots,k_n}\bigr), \tag{1.31}$$
for $y = (y_k)_{k\in\Lambda_N^n} \in (S^d)^{N^n}$, where the $O(1)$ are real numbers depending on the levels $n$ and $u$ and on the number of particles; they converge to zero as $N$ tends to infinity. A short calculation shows that the adjoint operator $(L^{(N,n)})^*$ of $L^{(N,n)}$ is of the form
$$(L^{(N,n)})^* f = \operatorname{div}_{N^n}\Bigl(\frac{\sigma^2}{2}\operatorname{grad}_{N^n} f - f \operatorname{grad}_{N^n} V^{(N,n)}\Bigr), \qquad f \in C^\infty\bigl((S^d)^{N^n}\bigr),$$
where $\operatorname{div}_{N^n}$ denotes the divergence on $(S^d)^{N^n}$. Using the notation in (1.27), we see that the first term on the right hand side of (1.31) can be written as
$$\frac{J^{(n)}}{2}\, N\, \bigl(r^{(n)}(X^{(N,n)})\bigr)^2,$$
and the summands in the second term are equal to
$$\frac{J^{(u)}(1+O(1))}{2}\, N\, \frac{1}{N^{n-u}}\sum_{k\in\Lambda_N^{n-u}} \bigl(r^{(u)}(X_k^{(N,u)})\bigr)^2, \qquad 1 \le u < n,$$
where the index $k \in \Lambda_N^{n-u}$ refers to the $k$-th level $u$ box. From the operator $(L^{(N,n)})^*$ we can compute the invariant distribution $\mu^{(n)} \in \mathcal P\bigl((S^d)^{N^n}\bigr)$ of the particle system as follows:
$$\mu^{(n)}(dx) = \frac{1}{C} \exp\Bigl(\frac{2}{\sigma^2} V^{(N,n)}(x)\Bigr)\, \lambda_{N^n}(dx) = \frac{1}{\widetilde C} \exp\Bigl(\frac{J^{(n)}}{\sigma^2}\, N\, \bigl(r^{(n)}(X^{(N,n)})\bigr)^2\Bigr)\, \bigl(\widetilde\mu^{(n-1)}\bigr)^{\otimes N}(dx), \tag{1.32}$$
where $\widetilde\mu^{(n-1)}$ is the invariant distribution of the level $n-1$ particle system with mean field interactions $\widetilde J^{(N,u)} \ge 0$, $1 \le u < n$, which differ from $J^{(u)}$ by factors of the form $1 + O(1)$. Therefore, the level $n$ empirical process $X^{(N,n)}$ has the following invariant distribution
$$\Pi^{(N,n)}(d\nu) = \frac{1}{C} \exp\Bigl(\frac{J^{(n)}}{\sigma^2}\, N\, \bigl(r^{(n)}(\nu)\bigr)^2\Bigr)\, \Pi_0^{(N,n)}(d\nu), \tag{1.33}$$

where $\Pi_0^{(N,n)} \in \mathcal P^{(n+1)}(S^d)$ is the distribution of the empirical measure $\frac1N\sum_{l=1}^N \delta_{\xi_l}$ of $N$ independent $\widetilde\Pi^{(N,n-1)}$–distributed random variables $\xi_1,\dots,\xi_N$. Here, $\Pi^{(N,n-1)}$ and $\widetilde\Pi^{(N,n-1)}$ only differ by the mean field interaction parameters $J^{(u)}$ and $\widetilde J^{(N,u)}$, $1 \le u < n$. Let us first derive the LDP for the measures $\widetilde\Pi^{(N,n-1)}$ from the LDP for $\Pi^{(N,n-1)}$.

Lemma 1.3.4 The measures $\widetilde\Pi^{(N,n-1)}$, $N \in \mathbb N$, satisfy a LDP on $\mathcal P^{(n-1)}(S^d)$ with rate $N$ and rate function $I^{(n-1)}_{\mathrm{inv}}$. Moreover, $(\widetilde\Pi^{(N,n-1)})_{N\in\mathbb N}$ converges in $\mathcal P^{(n)}(S^d)$ to the uniform distribution on the sphere $S^{(n-1)}$ with radius $r_0^{(n-1)}$ as $N$ tends to infinity.

Proof: For $N \in \mathbb N$, the measure $\widetilde\Pi^{(N,n-1)}$ is absolutely continuous with respect to $\Pi^{(N,n-1)}$ with density
$$\varphi(\nu) = \frac{\exp\Bigl(N \sum_{u=1}^{n-1} \frac{J^{(u)}}{\sigma^2}\, O(1)\, \bigl\langle \nu, (r^{(u)}(\cdot))^2\bigr\rangle\Bigr)}{\displaystyle\int \exp\Bigl(N \sum_{u=1}^{n-1} \frac{J^{(u)}}{\sigma^2}\, O(1)\, \bigl\langle \eta, (r^{(u)}(\cdot))^2\bigr\rangle\Bigr)\, \Pi^{(N,n-1)}(d\eta)}.$$
Since

$$\lim_{N\to\infty}\ \sup_{\nu \in \mathcal P^{(n-1)}(S^d)}\ \sum_{u=1}^{n-1} \frac{J^{(u)}}{\sigma^2}\, O(1)\, \bigl\langle \nu, (r^{(u)}(\cdot))^2\bigr\rangle = 0, \tag{1.34}$$

one easily derives the LDP for $\widetilde\Pi^{(N,n-1)}$ from the LDP for $\Pi^{(N,n-1)}$; see, for instance, the proof of Varadhan's lemma in [DZ93]. Moreover, by the same arguments as in the proof of Proposition 1.2.6, we deduce the weak convergence of $(\widetilde\Pi^{(N,n-1)})_{N\in\mathbb N}$ to the uniform distribution on the sphere $S^{(n-1)}$ from the large deviation principle for this sequence. $\Box$

Looking once more at (1.33), we see that if $\Pi_0^{(N,n)} \in \mathcal P^{(n+1)}(S^d)$ satisfies a LDP, then we can apply Varadhan's Lemma to derive a LDP for the measures $\Pi^{(N,n)}$. Moreover, $\Pi_0^{(N,n)} \in \mathcal P^{(n+1)}(S^d)$ is the distribution of the empirical measure of independent identically distributed random variables. The distribution of these random variables depends on $N$ and converges, by Lemma 1.3.4, weakly to the uniform distribution on the sphere $S^{(n-1)}$. Therefore, we will use the following generalization of Sanov's Theorem.

Proposition 1.3.5 Let $(E,\varrho)$ be a Polish space equipped with the Borel $\sigma$–algebra. For $N \in \mathbb N$, let $\mu^{(N)}$ be a probability measure on $E$, and let $\xi_1,\dots,\xi_N$ be i.i.d. random variables with distribution $\mu^{(N)}$. Denote by $\widetilde\Pi^{(N)}$ the distribution of their empirical measure $\frac1N\sum_{l=1}^N \delta_{\xi_l}$. Assume that $\mu^{(N)}$ converges, as $N\to\infty$, weakly to some $\mu \in \mathcal P(E)$. Then the family $(\widetilde\Pi^{(N)})_{N\in\mathbb N}$ satisfies a LDP on $\mathcal P(E)$ with rate $N$ and rate function
$$\Lambda^*(\nu) = \begin{cases} \bigl\langle \nu, \ln\frac{d\nu}{d\mu}\bigr\rangle, & \nu \ll \mu,\\ \infty, & \text{otherwise.}\end{cases}$$

For completeness we present a proof of this simple extension of Sanov’s theorem.

Proof: Since µ(N) converges weakly to µ there exists a probability space (Ω, F,P ) and E– (N) valued random variables ψl and ζl, N ∈ N and 1 ≤ l ≤ N, such that 1.3. HIGHER LEVELS 13

(N) (i) the family (ψl , ζl)1≤l≤N is independent, for each N ∈ N,

(N) (N) (ii) the distribution of ψl under P is µ , for each 1 ≤ l ≤ N,

(iii) the distribution of ζl under P is µ, for each 1 ≤ l ≤ N, and

 (N)  (iv) lim P %(ψ , ζl) > ε = 0, for all l ∈ N. N→∞ l A proof of this statement can be found in the book [Dud89, Theorem 11.7.1] of Richard M. (N) 1 PN Dudley. From (i) and (ii) it follows that Πe is the distribution of N l=1 δ (N) . For N ∈ N, ψl (N) 1 PN denote by Π the distribution of N l=1 δζl . Then it follows from Sanov’s Theorem that (N) the sequence (Π )N∈N satisfies a large deviation principle on P(E) with rate N and rate function Λ∗. δ Denote by d% the Prohorov metric on P(E) induced by % and by F the closed δ–neighbor- hood of F ⊆ E. Then, for each ε > 0, we get

$$P\Bigl(d_\varrho\Bigl(\frac1N\sum_{l=1}^N \delta_{\psi_l^{(N)}}, \frac1N\sum_{l=1}^N \delta_{\zeta_l}\Bigr) > \varepsilon\Bigr)
= P\Bigl(\inf\Bigl\{\delta \ge 0\colon \frac1N\sum_{l=1}^N \bigl(\mathbf 1_{\{\psi_l^{(N)} \in F\}} - \mathbf 1_{\{\zeta_l \in F^\delta\}}\bigr) \le \delta\ \ \forall\, F\subseteq E \text{ closed}\Bigr\} > \varepsilon\Bigr)$$
$$\le P\bigl(\varrho(\psi_l^{(N)}, \zeta_l) > \varepsilon \text{ for at least } [N\varepsilon] \text{ indices } l\bigr)
\le \binom{N}{[N\varepsilon]}\, P\bigl(\varrho(\psi_1^{(N)}, \zeta_1) > \varepsilon\bigr)^{[N\varepsilon]},$$
where $[x]$ denotes the greatest integer less than or equal to $x$. Hence
$$\lim_{N\to\infty} \frac1N \ln P\Bigl(d_\varrho\Bigl(\frac1N\sum_{l=1}^N \delta_{\psi_l^{(N)}}, \frac1N\sum_{l=1}^N \delta_{\zeta_l}\Bigr) > \varepsilon\Bigr) = -\infty. \tag{1.35}$$
In other words, the families $(\widetilde\Pi^{(N)})_{N\in\mathbb N}$ and $(\Pi^{(N)})_{N\in\mathbb N}$ are exponentially equivalent. Therefore, the same large deviation principle holds for $(\widetilde\Pi^{(N)})_{N\in\mathbb N}$ as for $(\Pi^{(N)})_{N\in\mathbb N}$, see [DZ93, Theorem 4.2.13]. $\Box$

Now we are able to prove part (i) of Theorem 1.3.2 for level $n \ge 2$ under the assumption that Theorem 1.3.2 holds for level $n-1$.

Proof of part (i) of Theorem 1.3.2: Because of Lemma 1.3.4, the measures Πe (N,n−1) converge, as N → ∞, weakly to the uniform distribution λ on the sphere S(n−1). Therefore, we can apply Proposition 1.3.5 and derive a LDP on P(n)(Sd) with rate N and rate function

$$\widetilde I^{(n)}_{\mathrm{inv}}(\nu) = \begin{cases} \bigl\langle \nu, \ln\frac{d\nu}{d\lambda}\bigr\rangle, & \nu \ll \lambda,\\ \infty, & \text{otherwise,}\end{cases}$$

for the measures $\Pi_0^{(N,n)}$, $N \in \mathbb N$. Now we use Varadhan's Lemma to get the claimed LDP for the measures $\Pi^{(N,n)}$, $N \in \mathbb N$. $\Box$

1.4 The behavior of the sequence of radii

In the previous sections we have seen that a sequence $(J^{(n)})_{n\in\mathbb N}$ of nonnegative mean field interaction intensities yields a non–increasing sequence $(r_0^{(n)})_{n\in\mathbb N}$ of radii, which, as $N \to \infty$, characterize the behavior of the invariant distributions of the level $n$ empirical processes $X^{(N,n)}$. The question we want to answer in this section is whether there exists a level $n \in \mathbb N$ such that $r_0^{(n)}$ equals zero, or whether all radii are positive. In the latter case we are interested in the limit $\lim_{n\to\infty} r_0^{(n)}$. The first question has an easy but unsatisfactory answer, which is a direct consequence of part (ii) of Theorem 1.3.2.

Lemma 1.4.1 For each level $n \in \mathbb N$, the radius $r_0^{(n)}$ is positive if and only if $J^{(n)} > \frac{d+1}{2\,(r_0^{(n-1)})^2}\,\sigma^2$.

This answer is unsatisfactory because the critical value for the mean field constant $J^{(n)}$ depends recursively on all $J^{(u)}$, $u < n$. Moreover, a better characterization of $r_0^{(n)} > 0$ is still an open problem.

1.4.1 A criterion for a positive limit of the radii (n) In this section we present a criterion for a positive limit of the sequence (r0 )n∈N. Therefore, let us define the following new parameter

$$\gamma^{(n)} := \frac{2 J^{(n)} \bigl(r_0^{(n-1)}\bigr)^2}{(d+1)\,\sigma^2}, \qquad n \ge 1.$$

Then, for each level $n \in \mathbb N$, the radius $r_0^{(n)}$ is positive if and only if $\gamma^{(n)} > 1$. Moreover, $r_0^{(n)}$ is a solution of
$$r_0^{(n)} = r_0^{(n-1)}\, G_{J^{(n)}}\bigl(r_0^{(n-1)} r_0^{(n)}\bigr) = r_0^{(n-1)}\, G_{J^{(n)}}\Bigl(\frac{\sigma^2 (d+1)}{2 J^{(n)}\, r_0^{(n-1)}}\, \gamma^{(n)}\, r_0^{(n)}\Bigr).$$

Therefore, $x^{(n)} := \dfrac{r_0^{(n)}}{r_0^{(n-1)}}$ solves
$$x^{(n)} = G_{J^{(n)}}\Bigl(\frac{\sigma^2(d+1)}{2J^{(n)}}\,\gamma^{(n)} x^{(n)}\Bigr) = \frac{\displaystyle\int_0^\pi \cos(\alpha)\,(\sin\alpha)^{d-1}\, e^{(d+1)\gamma^{(n)} x^{(n)} \cos\alpha}\, d\alpha}{\displaystyle\int_0^\pi (\sin\alpha)^{d-1}\, e^{(d+1)\gamma^{(n)} x^{(n)} \cos\alpha}\, d\alpha}. \tag{1.36}$$
For the last equation we consider $S^d$ via (1.1) as a submanifold of $\mathbb R^{d+1}$. Now it follows that
$$r_0^{(n)} = \prod_{u=1}^{n} x^{(u)}.$$
Before we can state the criterion for the behavior of the radii we require some estimates.
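Before turning to these estimates, here is a minimal numerical sketch (not part of the thesis argument) of the fixed point equation (1.36): for a prescribed sequence of parameters $\gamma^{(n)} > 1$ it solves (1.36) by fixed point iteration with a simple Riemann sum and accumulates $r_0^{(n)}$ as the product of the $x^{(u)}$. The concrete sequence of $\gamma^{(n)}$ used below is an illustrative assumption; in the model the $\gamma^{(n)}$ are determined recursively by the $J^{(n)}$ and $r_0^{(n-1)}$.

```python
import numpy as np

def solve_x(gamma, d, tol=1e-12, max_iter=10_000):
    """Fixed point iteration for x = right hand side of (1.36)."""
    alpha = np.linspace(0.0, np.pi, 2001)
    w = np.sin(alpha) ** (d - 1)                    # sin(alpha)^{d-1} weight
    x = 1.0
    for _ in range(max_iter):
        e = np.exp((d + 1) * gamma * x * np.cos(alpha))
        x_new = np.sum(np.cos(alpha) * w * e) / np.sum(w * e)   # d*alpha cancels
        if abs(x_new - x) < tol:
            break
        x = x_new
    return max(x_new, 0.0)

d = 2
gammas = [(n + 1) ** 1.5 for n in range(1, 31)]     # hypothetical gamma^(n) > 1
r = 1.0
for g in gammas:
    r *= solve_x(g, d)                              # r_0^(n) = product of the x^(u)
print(r)    # stays bounded away from zero, consistent with Theorem 1.4.3 below
```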

Lemma 1.4.2 For each dimension $d \ge 1$ and each $\varepsilon > 0$ there exists a constant $K > 0$ such that
$$1 - \frac{d+\varepsilon}{2y} \le G_J\Bigl(\frac{\sigma^2}{2J}\, y\Bigr) \le 1 - \frac{d-\varepsilon}{2y}, \qquad \text{for all } y > K \text{ and all } J > 0. \tag{1.37}$$

Proof: Fix J > 0. First note that from standard analytic arguments it follows that, for each 0 dimension d ≥ 1, the distributions ψy ∈ D ([0, π]), y ∈ R+, defined by

$$\langle \psi_y, f\rangle := \frac{\displaystyle\int_0^\pi f(\alpha)\, (\sin\alpha)^{d-1} e^{y\cos\alpha}\, d\alpha}{\displaystyle\int_0^\pi (\sin\alpha)^{d-1} e^{y\cos\alpha}\, d\alpha}, \qquad f \in C([0,\pi]),$$

converge weakly to $\delta_0$ as $y$ tends to infinity, i.e.,

lim hψy, fi = f(0), for all f ∈ C([0, π]). y→∞

Second, our statement (1.37) is equivalent to

$$\lim_{y\to\infty} y\Bigl(1 - G_J\Bigl(\frac{\sigma^2}{2J} y\Bigr)\Bigr) = \frac d2.$$

Take d ≥ 3. Then, using the integration by parts formula, we compute

$$y\Bigl(1 - G_J\Bigl(\frac{\sigma^2}{2J}y\Bigr)\Bigr) = \frac{\int_0^\pi y(1-\cos\alpha)(\sin\alpha)^{d-1} e^{y\cos\alpha}\, d\alpha}{\int_0^\pi (\sin\alpha)^{d-1} e^{y\cos\alpha}\, d\alpha}
= \frac{\int_0^\pi \bigl((\sin\alpha)^{d-1} + (d-2)\cos\alpha\,(1-\cos\alpha)(\sin\alpha)^{d-3}\bigr) e^{y\cos\alpha}\, d\alpha}{\int_0^\pi (\sin\alpha)^{d-1} e^{y\cos\alpha}\, d\alpha}
= 1 + (d-2)\Bigl\langle \psi_y, \frac{(1-\cos)\cos}{\sin^2}\Bigr\rangle,$$

d which converge to 2 as y tends to infinity. For d = 2, using once more the integration by parts formula, we get

$$y\Bigl(1 - G_J\Bigl(\frac{\sigma^2}{2J}y\Bigr)\Bigr) = 1 - \frac{2y}{e^{2y}-1},$$

d which converge to 1 = 2 , as y tend to infinity. Finally, for d = 1, we get

$$y\Bigl(1 - G_J\Bigl(\frac{\sigma^2}{2J}y\Bigr)\Bigr) = \frac{\int_0^{3\pi/4} y(1-\cos\alpha)e^{y\cos\alpha}\, d\alpha + \int_{3\pi/4}^{\pi} y(1-\cos\alpha)e^{y\cos\alpha}\, d\alpha}{\int_0^\pi e^{y\cos\alpha}\, d\alpha}.$$

The absolute value of the second summand on the right hand side is smaller than $\frac{y\pi}{2}\, e^{-y/\sqrt 2}$, which converges to zero as $y$ tends to infinity. The first summand on the right hand side can be handled in the same way as in higher dimensions, i.e.,

3π 4 R y cos(α) 2 y(1 − cos(α))e dα σ 0 lim y(1 − GJ ( y)) = lim y→∞ 2J y→∞ π R ey cos(α) dα 0 3π 4 √ − √y  (1−cos(α)) cos(α)  2 R y cos(α) −(1 + 2)e + 1 − sin(α)2 e dα = lim 0 y→∞ π R ey cos(α) dα 0 (1 − cos(α)) cos(α) 1 = 1 − lim = . α→0 sin(α)2 2 2

Now we can formulate a criterion for the behavior of the radii.

Theorem 1.4.3
$$\lim_{n\to\infty} r_0^{(n)} > 0 \iff \sum_{n=1}^\infty \frac{1}{\gamma^{(n)}} < \infty. \tag{1.38}$$

Remark 1.4.4 This criterion is not as useful as it looks, because $\gamma^{(n)}$ depends on all $\gamma^{(u)}$ with $u < n$. But we can and will use it to compute some important examples; see the next section.

Proof: First note that both sides of (1.38) imply the positivity of all radii. Indeed, assume that some radius $r_0^{(n)}$ equals zero. Then we get $r_0^{(u)} = \gamma^{(u)} = 0$, for all $u > n$. Therefore, $\lim_{n\to\infty} r_0^{(n)} = 0$ as well as $\sum_{n=1}^\infty \frac{1}{\gamma^{(n)}} = \infty$. Moreover, the positivity of all radii implies the positivity of all mean field interaction constants $J^{(n)}$.

Assume that $\lim_{n\to\infty} r_0^{(n)} > 0$. Then the ratio $x^{(n)} = \frac{r_0^{(n)}}{r_0^{(n-1)}}$ has to converge to one as $n$ tends to infinity. Hence, $(\gamma^{(n)})_{n\in\mathbb N}$ converges to infinity as $n$ tends to infinity. Therefore, from Lemma 1.4.2 it follows that

$$x^{(n)} = G_{J^{(n)}}\Bigl(\frac{\sigma^2(d+1)}{2J^{(n)}}\,\gamma^{(n)} x^{(n)}\Bigr) \le 1 - \frac{d}{4(d+1)\gamma^{(n)}x^{(n)}},$$
for large enough $n \in \mathbb N$. This implies
$$x^{(n)} \le \frac12 + \sqrt{\frac14 - \frac{d}{4(d+1)\gamma^{(n)}}}.$$

Using Taylor expansion, we get

d ln(x(n)) ≤ − , 4(d + 1)γ(n) 1.4. THE BEHAVIOR OF THE SEQUENCE OF RADII 17 for large enough n ∈ N. This leads to

$$\frac{d}{4(d+1)} \sum_{n=1}^\infty \frac{1}{\gamma^{(n)}} \le -\lim_{n\to\infty} \ln\Bigl(\prod_{u=1}^n x^{(u)}\Bigr) = -\lim_{n\to\infty} \ln\bigl(r_0^{(n)}\bigr) < \infty.$$

Now assume that $\sum_{n=1}^\infty \frac{1}{\gamma^{(n)}} < \infty$. Then $(\gamma^{(n)})_{n\in\mathbb N}$ converges to infinity as $n$ tends to infinity. Using once more Lemma 1.4.2, we conclude that

$$x^{(n)} = G_{J^{(n)}}\Bigl(\frac{\sigma^2(d+1)}{2J^{(n)}}\,\gamma^{(n)} x^{(n)}\Bigr) \ge 1 - \frac{d}{(d+1)\gamma^{(n)}x^{(n)}},$$
for large $n \in \mathbb N$, which implies
$$x^{(n)} \ge \frac12 + \sqrt{\frac14 - \frac{d}{(d+1)\gamma^{(n)}}}.$$

Indeed, because $x^{(n)}$ is near one, the other root of the quadratic equation is unimportant. Taking first the logarithm on both sides of the last inequality and then a Taylor expansion, we get
$$\ln\bigl(x^{(n)}\bigr) \ge -\frac{2d}{(d+1)\gamma^{(n)}},$$

for large enough n ∈ N. This leads to

$$\lim_{n\to\infty} \ln r_0^{(n)} = \sum_{n=1}^\infty \ln\bigl(x^{(n)}\bigr) \ge -\frac{2d}{d+1}\sum_{n=1}^\infty \frac{1}{\gamma^{(n)}} > -\infty,$$

(n) and limn→∞ r0 > 0 follows. 2

Let us finish this section with the only criterion for the behavior of the radii that we can formulate in terms of the mean field interaction constants $J^{(n)}$, $n \in \mathbb N$.

Corollary 1.4.5
$$\sum_{n=1}^\infty \frac{1}{J^{(n)}} = \infty \implies \lim_{n\to\infty} r_0^{(n)} = 0.$$

Proof: Assume that $\lim_{n\to\infty} r_0^{(n)} > 0$. Then, applying Theorem 1.4.3, we get $\sum_{n=1}^\infty \frac{1}{\gamma^{(n)}} < \infty$. But since
$$\frac{J^{(n)}}{J^{(n+1)}} = \frac{\gamma^{(n)}}{\gamma^{(n+1)}}\Bigl(\frac{r_0^{(n)}}{r_0^{(n-1)}}\Bigr)^2 \le \frac{\gamma^{(n)}}{\gamma^{(n+1)}}, \qquad \text{for all } n \ge 1,$$
the ratio $J^{(n)}/\gamma^{(n)}$ is non–decreasing in $n$, so $\frac{1}{J^{(n)}} \le \frac{\gamma^{(1)}}{J^{(1)}}\,\frac{1}{\gamma^{(n)}}$, and it follows that $\sum_{n=1}^\infty \frac{1}{J^{(n)}} < \infty$. $\Box$

1.4.2 Some examples

In this section we discuss two examples for the behavior of the radii. In the first example we prove that if the mean field interaction constants $J^{(n)}$, $n\in\mathbb N$, grow exponentially fast and $J^{(1)}$ is large enough, then the sequence $(r_0^{(n)})_{n\in\mathbb N}$ has a positive limit. However, there are also sequences $(J^{(n)})_{n\in\mathbb N}$ which do not grow exponentially and nevertheless yield a positive limit of the radii; such a sequence is presented in the second example.

(n) Example 1.4.6 For each δ > 1 and each sequence (J )n∈N with

$$\frac{J^{(n+1)}}{J^{(n)}} \ge \delta, \qquad \text{for all } n\in\mathbb N,$$
there exists a constant $K > 0$ such that

$$J^{(1)} > K \implies \lim_{n\to\infty} r_0^{(n)} > 0.$$

Proof: Take $J^{(1)}$ so large that
$$\bigl(r_0^{(1)}\bigr)^2 =: \theta > \frac{1}{\delta}.$$
Then we get
$$\gamma^{(n)} \ge (\theta\delta)^{n-1}\gamma^{(1)}, \qquad \text{for all } n\in\mathbb N. \tag{1.39}$$

Indeed, (1.39) is satisfied for $n = 1$. Assume that this inequality holds for some $n\in\mathbb N$; then it follows that $\gamma^{(n)} > \gamma^{(1)}$. Therefore, $\frac{r_0^{(n)}}{r_0^{(n-1)}} \ge \frac{r_0^{(1)}}{r_0^{(0)}} = r_0^{(1)}$ and we compute

$$\gamma^{(n+1)} = \frac{J^{(n+1)}}{J^{(n)}}\Bigl(\frac{r_0^{(n)}}{r_0^{(n-1)}}\Bigr)^2 \gamma^{(n)} \ge (\theta\delta)^n \gamma^{(1)}.$$

Finally, (1.39) leads to
$$\sum_{n=1}^\infty \frac{1}{\gamma^{(n)}} < \infty,$$
and our statement follows from Theorem 1.4.3. $\Box$

Example 1.4.7 Fix $\varepsilon > 0$. For $n\in\mathbb N$, define $\gamma^{(n)} = n^{1+\varepsilon}$. Then, by Theorem 1.4.3, the radii have a positive limit. Hence, $x^{(n)}$ converges to one as $n$ tends to infinity. And because of

$$\frac{J^{(n+1)}}{J^{(n)}} = \frac{\gamma^{(n+1)}}{\gamma^{(n)}}\,\frac{1}{(x^{(n)})^2} = \frac{(n+1)^{1+\varepsilon}}{n^{1+\varepsilon}}\,\frac{1}{(x^{(n)})^2}, \qquad n\in\mathbb N,$$

the ratio $\frac{J^{(n+1)}}{J^{(n)}}$ converges to one as $n$ tends to infinity. Hence, the sequence $(J^{(n)})_{n\in\mathbb N}$ does not grow exponentially fast.
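As a quick hedged illustration of the dichotomy in Theorem 1.4.3 (not part of the thesis argument), note that for $d = 1$ the right hand side of (1.36) reduces to the Bessel ratio $I_1(2\gamma x)/I_0(2\gamma x)$. The two test sequences for $\gamma^{(n)}$ below are hypothetical; the first is summable in $1/\gamma^{(n)}$, the second is not.

```python
import numpy as np
from scipy.special import ive   # exponentially scaled modified Bessel functions

def x_of_gamma(gamma, iters=300):
    # For d = 1, (1.36) reads x = I_1(2*gamma*x) / I_0(2*gamma*x); the exponential
    # scaling of `ive` cancels in the ratio and avoids overflow for large gamma.
    x = 1.0
    for _ in range(iters):
        x = ive(1, 2.0 * gamma * x) / ive(0, 2.0 * gamma * x)
    return x

def radius(gammas):
    r = 1.0
    for g in gammas:
        r *= x_of_gamma(g)
    return r

n = np.arange(1, 201)
print(radius(2.0 * n ** 1.2))   # sum 1/gamma^(n) < infinity: product stays positive
print(radius(2.0 * n ** 0.5))   # sum 1/gamma^(n) = infinity: product drifts towards zero
```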

1.5 Notes

There are several possibilities to modify the model. For instance, one can consider $\mathbb R^d$ instead of $S^d$, i.e., a particle system which solves (at level one) the following system of Itô stochastic differential equations

$$dx_k(t) = \Bigl(-\operatorname{grad} V(x_k(t)) + J^{(1)}\frac1N\sum_{l=1}^N \bigl(x_l(t) - x_k(t)\bigr)\Bigr) dt + \sigma\, dW_k(t), \tag{1.40}$$

for $1 \le k \le N$. Here $\sigma > 0$ is the diffusion constant and $W_1,\dots,W_N$ are independent Brownian motions on $\mathbb R^d$. To obtain a well defined model, the potential $V\colon \mathbb R^d \to \mathbb R$ has to be “nice” and has to grow sufficiently fast to infinity as $\|x\| \to \infty$. Furthermore, we assume that $V$ is invariant under rotations. As in the spherical case one can define higher levels. Under suitable conditions on the potential $V$ one can prove statements similar to Theorem 1.3.2 for this mean field model in $\mathbb R^d$. Two problems arise in the proof. The first one is that we would like to apply Varadhan's Lemma to the functionals

$$\mathcal P^{(n)}(\mathbb R^d) \ni \mu \mapsto \bigl\langle \mu, |x|^2\bigr\rangle \in \mathbb R.$$

These functionals are not continuous with respect to the weak topology on $\mathcal P^{(n)}(\mathbb R^d)$. But this problem can be solved by using a stronger topology on $\mathcal P^{(n)}(\mathbb R^d)$. For details see the article [DG87] of Donald A. Dawson and Jürgen Gärtner, where they prove a large deviation principle for the measure valued process $X^{(N,1)}$. The second problem is related to the GHS inequality and appears only at level one. In order to identify the zero set of the rate function we have to solve the equation

$$r = \frac{\displaystyle\int_{\mathbb R^d} x_1\, e^{-\frac{1}{\sigma^2}\bigl(2V(x) + |x|^2 - 2J^{(1)} r x_1\bigr)}\, dx}{\displaystyle\int_{\mathbb R^d} e^{-\frac{1}{\sigma^2}\bigl(2V(x) + |x|^2 - 2J^{(1)} r x_1\bigr)}\, dx} =: G(r),$$

where $x_1$ is the first coordinate of $x \in \mathbb R^d$. The GHS inequality would yield the strict concavity of $G$ on $\mathbb R_+$, and one could then argue in the same way as in our case. Unfortunately, no such inequality is known for $d > 1$. Only in dimension one does such a result exist; it was proven in the paper [EMN76] by Richard S. Ellis, James L. Monroe and Charles M. Newman.

Chapter 2

Moderate deviations

2.1 The model and basic notation

Fix d ≥ 1 and let M be a d–dimensional connected compact Riemannian C∞–manifold without boundary. In this chapter we prove moderate deviations for a huge class of mean field models on M. In order to make this statement more precise let us first state some definitions and notations.

Definition 2.1.1 Let $E$ be a separable Banach space equipped with the topology induced by its metric. We say that a family $(\xi_N)_{N\in\mathbb N}$ of $E$–valued random variables satisfies a moderate deviation principle in $E$ if there exists a rate function $I\colon E \to [0,\infty]$ such that, for each sequence $(\gamma_N)_{N\in\mathbb N}$ of positive numbers with
$$\lim_{N\to\infty} \gamma_N = 0 \quad\text{and}\quad \lim_{N\to\infty} N\gamma_N^2 = \infty, \tag{2.1}$$
the family $\bigl(\frac{\xi_N}{\gamma_N}\bigr)_{N\in\mathbb N}$ satisfies a large deviation principle (see Definition 1.1.1) in $E$ with rate $N\gamma_N^2$ and rate function $I$.

Denote by $\operatorname{grad}$, $\Delta$, $\lambda$, $(\cdot,\cdot)$ and $\|\cdot\|$ the gradient, the Laplacian and the uniform distribution on $M$, and the inner product and the norm on the tangent manifold $TM$, respectively. Fix some $m\in\mathbb N$ and vector fields $A_1,\dots,A_m$ such that each solution $x$ of the Stratonovich type stochastic differential equation
$$dx(t) = \sum_{i=1}^m A_i(x(t)) * dW^i(t)$$
is a Brownian motion on $M$, for each choice of real valued Brownian motions $W^1,\dots,W^m$. Define $A := (A_1,\dots,A_m)$. We study the particle system $x_1,\dots,x_N \in C([0,\infty); M)$ which satisfies the following system of Stratonovich type stochastic differential equations

(N) dxk(t) = B(X (t))(xk(t)) dt + σA(xk(t)) ∗ dWk(t), (2.2)

where $\sigma > 0$ is the diffusion constant and $W_1,\dots,W_N$ are independent Brownian motions on $\mathbb R^m$. The vector field $B$ depends on the measure valued empirical process
$$X^{(N)}(t) := \frac1N\sum_{k=1}^N \delta_{x_k(t)}, \tag{2.3}$$

21 22 CHAPTER 2. MODERATE DEVIATIONS where δx denotes the Dirac measure on x. Denote by D0(M) the Schwartz space of all distributions on M. Furthermore, by τM we denote the space of all vector fields b ∈ C∞(M; T M). We will make the following assumptions on the distribution dependent vector field.

Assumption 2.1.2 We assume that B: D0(M) → τM is linear and continuous.

Remark 2.1.3 Let µ be a finite measure on a compact Polish space E, f ∈ C0,∞(E × M) and (Vx)x∈E be a family of vector fields in τM, which depends continuously on x ∈ E. Then

B: D0(M) → τM with B(ϑ) := hµ, hϑ, fi V i satisfies Assumption 2.1.2. ∞ In particular, for V1,...,Vn ∈ τM and f1, . . . , fn ∈ C (M) the vector field

$$B\colon \mathcal D'(M) \to \tau M \quad\text{with}\quad B(\vartheta) := \sum_{k=1}^n \langle \vartheta, f_k\rangle\, V_k$$
fulfills Assumption 2.1.2.
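As a hedged illustration of the coupled system (2.2)–(2.3) with a drift of the form covered by this remark, the following sketch takes the simplest manifold $M = S^1$ (angles modulo $2\pi$), $f_1 = \sin$, $f_2 = \cos$ and $V_1, V_2$ multiples of $\partial_\theta$, and integrates the particles by Euler–Maruyama; all numerical parameters are illustrative assumptions. On $S^1$ with a constant noise coefficient the Stratonovich and Itô formulations coincide, so the simple scheme below suffices for an illustration.

```python
import numpy as np

# Sketch of (2.2)-(2.3) on M = S^1 with B(theta)(x) = <theta,sin> cos(x) - <theta,cos> sin(x).
rng = np.random.default_rng(1)
N, sigma, T, dt = 500, 0.7, 5.0, 1e-3
x = rng.uniform(0.0, 2.0 * np.pi, size=N)        # initial particle positions

def drift(x):
    s, c = np.sin(x).mean(), np.cos(x).mean()    # <X^(N)(t), sin>, <X^(N)(t), cos>
    return s * np.cos(x) - c * np.sin(x)         # B(X^(N)(t)) evaluated at each particle

for _ in range(int(T / dt)):
    x += drift(x) * dt + sigma * np.sqrt(dt) * rng.normal(size=N)
    x %= 2.0 * np.pi

# length of the mean of the empirical measure, cf. (1.26)
print(np.hypot(np.sin(x).mean(), np.cos(x).mean()))
```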

The following two operators play a decisive role.

Definition 2.1.4 For each ϑ ∈ D0(M), define the operators Lϑ and Gϑ by

$$L^\vartheta f := B(\vartheta) f + \frac{\sigma^2}{2}\Delta f, \qquad f \in C^\infty(M), \tag{2.4}$$
$$\bigl\langle \varphi, G^\vartheta f\bigr\rangle := \bigl\langle \varphi, L^\vartheta f\bigr\rangle + \bigl\langle \vartheta, B(\varphi) f\bigr\rangle, \qquad f\in C^\infty(M),\ \varphi \in \mathcal D'(M). \tag{2.5}$$

As consequence of (2.2)

$$\bigl\langle X^{(N)}(t), f\bigr\rangle - \bigl\langle X^{(N)}(0), f\bigr\rangle - \int_0^t \bigl\langle X^{(N)}(s), L^{X^{(N)}(s)} f\bigr\rangle\, ds$$

is a real valued martingale, for each f ∈ C∞(M). Now let us define McKean–Vlasov paths.

Definition 2.1.5 A measure valued path µ ∈ C([0, ∞); P(M)) is called McKean–Vlasov path with initial datum ν ∈ P(M) if it solves (in distributional sense)

$$\Bigl(\frac{d}{dt} - \bigl(L^{\mu(t)}\bigr)^*\Bigr)\mu(t) = 0, \qquad \mu(0) = \nu.$$

In Section 2.4 we will analyze McKean–Vlasov paths. Finally, we have to specify the Banach spaces in which we want to derive a moderate deviation principle. To this end, we define the Sobolev spaces $H_p(M)$ as follows.

Definition 2.1.6 Let $\lambda_l$, $l \in \mathbb N$, be the eigenvalues of the operator $-\Delta$, and denote by $e_l$, $l\in\mathbb N$, an orthonormal basis of corresponding eigenfunctions in $L_2(M)$. For $p \in \mathbb R$, we define the set $H_p(M)$ by
$$H_p(M) = \bigl\{\vartheta \in \mathcal D'(M)\colon \|\vartheta\|_{H_p} < \infty\bigr\}, \tag{2.6}$$
where the $H_p$–norm is given by

$$\|\vartheta\|_{H_p}^2 := \sum_{l=1}^\infty (1+\lambda_l)^p\, \langle \vartheta, e_l\rangle^2. \tag{2.7}$$
Then the inner product

$$(\vartheta_1, \vartheta_2)_{H_p} := \sum_{l=1}^\infty (1+\lambda_l)^p\, \langle\vartheta_1, e_l\rangle\,\langle\vartheta_2, e_l\rangle, \qquad \vartheta_1,\vartheta_2 \in H_p(M), \tag{2.8}$$
makes $H_p(M)$ a Hilbert space.
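To make Definition 2.1.6 concrete in the simplest case $M = S^1$ (so $d = 1$, eigenvalues $k^2$ with Fourier eigenfunctions), the following hedged sketch evaluates the truncated series (2.7) for $\vartheta = \delta_{x_0} - \lambda$; the truncation level and the point $x_0$ are illustrative assumptions.

```python
import numpy as np

# H_p norm (2.7) on M = S^1: eigenfunctions 1/sqrt(2*pi), cos(k.)/sqrt(pi), sin(k.)/sqrt(pi)
# with eigenvalues k^2.  For theta = delta_{x0} - lambda the constant mode drops out and
# the k-th coefficients are cos(k*x0)/sqrt(pi) and sin(k*x0)/sqrt(pi).
def hp_norm_sq(p, x0=0.3, k_max=200_000):
    k = np.arange(1, k_max + 1, dtype=float)
    lam = k ** 2
    coef_sq = (np.cos(k * x0) ** 2 + np.sin(k * x0) ** 2) / np.pi   # = 1/pi for each k
    return np.sum((1.0 + lam) ** p * coef_sq)

print(hp_norm_sq(p=-1.0))    # finite: a Dirac mass lies in H_p for p < -d/2 = -1/2
print(hp_norm_sq(p=-0.25))   # truncated sum keeps growing as k_max increases
```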

Note that $H_0(M)$ is equal to $L_2(M)$. For $p\in\mathbb R$, define
$$\kappa_p := \sup_{x\in M} \sum_{l=1}^\infty (1+\lambda_l)^p\, \|\operatorname{grad} e_l(x)\|^2. \tag{2.9}$$
Then, by part (ii) of Lemma A.2.3, the constants $\kappa_p$ are finite for all $p < -\frac d2 - 1$. In Section A.2 we state some more properties of the Sobolev spaces $H_p(M)$. If a distribution dependent vector field $B$ satisfies Assumption 2.1.2, then its restriction to $H_p(M)$ is continuous with respect to the $H_p$–norm topology.

Lemma 2.1.7 Let B satisfy Assumption 2.1.2. Fix p, q ∈ R. Then the mapping

Hp(M) × Hq(M) 3 (ϑ1, ϑ2) 7→ B(ϑ1)ϑ2 ∈ Hq−1(M)

is continuous. Moreover, there exists a constant c > 0 such that

$$\|B(\vartheta_1)\vartheta_2\|_{H_{q-1}} \le c\, \|\vartheta_1\|_{H_p}\, \|\vartheta_2\|_{H_q}, \tag{2.10}$$

for all ϑ1 ∈ Hp(M) and all ϑ2 ∈ Hq(M).

Proof: Fix some atlas U = (U1,...,Un) of M. For m ∈ N and a vector field A ∈ τM with m coordinates (a1, . . . , ad), define the C –norm of A by

d 2 X α 2 kAkCm := sup kD aikC0 . d α∈N , |α|≤m i=1

α ∂|α| d Here |α| := α1 + ... + αd and D := α1 αd , for α ∈ N . ∂x1 ·...·∂xd Choose m ∈ N in such a way that p ≤ m. Then, using part (ii) of Lemma A.2.4, we get

kB(ϑ )ϑ k ≤ c kB(ϑ )k m kϑ k , (2.11) 1 2 Hq−1 1 1 C 2 Hq 24 CHAPTER 2. MODERATE DEVIATIONS

0 for all ϑ1 ∈ D (M), all ϑ2 ∈ Hq(M) and some constant c1 > 0, which depends on q and the atlas U. Since B satisfies Assumption 2.1.2 there exists a constant c2 > 0, depending on p and the atlas U, such that

kB(ϑ)k m ≤ c kϑk , (2.12) C 2 Hp 0 for all ϑ ∈ Hp(M). Indeed, since B is a continuous mapping from D (M) to τM its restriction to H (M) is continuous with respect to the norm topology induced by k·k . Therefore, (2.12) p Hp follows from the linearity of B. Combining (2.11) and (2.12), we get

kB(ϑ )ϑ k ≤ c kϑ k kϑ k , (2.13) 1 2 Hq−1 3 1 Hp 2 Hq

for all ϑ1 ∈ Hp(M), all ϑ2 ∈ Hq(M) and some constant c3 > 0, which depends on p, q and the atlas U. Since the Hp–norm is independent of the chosen atlas we can take the infimum over all atlases of M on both sides of (2.13) and (2.10) follows. 2

Denote by µ(N) the McKean–Vlasov path starting at X(N)(0). The aim of the following (N) (N) sections is to prove a moderate deviation principle for the sequence (X − µ )N∈N in C([0,T ]; Hp(M)), for all T > 0 and sufficiently negative p ∈ R.

2.2 Formulation of the moderate deviation principle

Let B be a distribution dependent vector field, which satisfies Assumption 2.1.2. For N ∈ N, define the particle system x1, . . . , xN by (2.2). We will study the measure valued empirical processes N 1 X X(N)(t) = δ ,N ∈ . N xk(t) N k=1 Denote by µ(N) the McKean–Vlasov path with initial datum X(N)(0). Moreover, we consider the following class of distribution valued functions.

Definition 2.2.1 Let T > 0 and µ ∈ C([0,T ]; P(M)). We call a distribution valued function ϑ ∈ C([0,T ]; D0(M)) absolutely continuous with respect to µ if for each ε > 0 there exists a δ > 0 such that

$$\sum_{k=1}^n (t_k - s_k) < \delta \implies \sum_{k=1}^n \bigl\|\vartheta(t_k) - \vartheta(s_k)\bigr\|_{H_{-1}\bigl(M;\, \frac{1}{t_k - s_k}\int_{s_k}^{t_k}\mu(t)\,dt\bigr)} < \varepsilon,$$

for all disjoint intervals (s1, t1),..., (sn, tn) ⊂ [0,T ], where

$$\|\vartheta\|^2_{H_{-1}(M;\mu)} := \sup_{\substack{f\in C^\infty(M)\\ \langle\mu, \|\operatorname{grad} f\|^2\rangle > 0}} \frac{\langle \vartheta, f\rangle^2}{\bigl\langle \mu, \|\operatorname{grad} f\|^2\bigr\rangle}, \qquad \mu\in\mathcal P(M),\ \vartheta\in\mathcal D'(M). \tag{2.14}$$

We denote by ACµ the set of all distribution valued functions that are absolutely continuous with respect to µ.

The aim of this chapter is to prove the following statement. 2.2. FORMULATION OF THE MODERATE DEVIATION PRINCIPLE 25

Theorem 2.2.2 Fix $T>0$ and $p < -\frac d2 - 1$. Assume that $X^{(N)}(0)$ converges, as $N\to\infty$, almost surely in $\mathcal P(M)$ to some non–random $\nu \in \mathcal P(M)$. Denote by $\mu \in C([0,T]; \mathcal P(M))$ the McKean–Vlasov path with initial datum $\nu$. Then $(X^{(N)} - \mu^{(N)})_{N\in\mathbb N}$ satisfies a moderate deviation principle in $C([0,T]; H_{p-1}(M))$ with rate function

 T 2  1 R ϑ˙(t) − (Gµ(t))∗ϑ(t) dt , if ϑ ∈ AC , ϑ(0) = 0, 2σ2 µ I(ϑ) = 0 H−1(M;µ(t)) (2.15)  ∞ , otherwise.

In particular, I(ϑ) = 0 if and only if ϑ = 0.

We will split the proof in several parts. First, since we use in all parts statements about solutions u ∈ C([0,T ]; Hp(M)) of integral equations, which are related to parabolic partial differential equations, we will give in the next section a short introduction to this topic. Fix ϑ ∈ C([0,T ]; Hr(M)), for some r ∈ R. In Section 2.5.3 we will analyze the free model X(N), i.e., the measure valued empirical process of the particle system x ,..., x ∈ ϑ e1 eN C([0, ∞); M), which solves the system of Stratonovich type stochastic differential equations

dxek(t) = B(µ(t) + γN ϑ(t))(xek(t)) dt + σA(xek(t)) ∗ dWk(t).

Assume that $X^{(N)}_\vartheta(0) = X^{(N)}(0)$, for all $N\in\mathbb N$, and fix a sequence $(\gamma_N)_{N\in\mathbb N}$ which fulfills (2.1). We will see that the family $\bigl(\frac{1}{\gamma_N}(X^{(N)}_\vartheta - \mu^{(N)})\bigr)_{N\in\mathbb N}$ satisfies a large deviation principle in $C([0,T]; H_{p-1}(M))$ with rate $N\gamma_N^2$.

In a small $\gamma_N$–neighborhood of $\vartheta$ the processes $X^{(N)}$ and $X^{(N)}_\vartheta$ are almost the same; this local result is proven in Section 2.6.1. Finally, Theorem 2.2.2 follows from an exponential tightness argument, see Section 2.6.3. Assume that Theorem 2.2.2 is valid. Then we get the following moderate deviation principle.

d Corollary 2.2.3 Fix T > 0 and p < − 2 − 1. Assume that for each sequence (γN )N∈N of positive numbers, which satisfies (2.1), we have

$$\lim_{N\to\infty} \frac{1}{N\gamma_N^2} \ln P\Bigl(\bigl\|X^{(N)}(0) - \nu\bigr\|_{H_{p-1}} > \gamma_N\,\varepsilon\Bigr) = -\infty, \tag{2.16}$$
for all $\varepsilon>0$ and some non–random $\nu\in\mathcal P(M)$. Denote by $\mu \in C([0,T];\mathcal P(M))$ the McKean–Vlasov path with initial datum $\nu$. Then $(X^{(N)} - \mu)_{N\in\mathbb N}$ satisfies a moderate deviation principle in $C([0,T]; H_{p-1}(M))$ with rate function $I$.

Proof: Because of (2.16), it follows from the Borel–Cantelli lemma, see [Bau91, Lemma 11.1], that $X^{(N)}(0)$ converges almost surely in $H_{p-1}(M)$ to $\nu$. Since $p-1 < -\frac d2$ we can apply Lemma A.2.3.(iii) and obtain the almost sure convergence of $X^{(N)}(0)$ to $\nu$ in $\mathcal P(M)$. For $N\in\mathbb N$, denote by $\mu^{(N)}$ the McKean–Vlasov path with initial datum $X^{(N)}(0)$. Then from Theorem 2.2.2 it follows that $(X^{(N)} - \mu^{(N)})_{N\in\mathbb N}$ satisfies a moderate deviation principle in $C([0,T]; H_{p-1}(M))$ with rate function $I$.

Now fix some sequence (γN )N∈N of positive numbers, which satisfies (2.1). Later in Corol- lary 2.4.4 we prove that

$$\sup_{t\in[0,T]} \frac{\bigl\|\mu(t) - \mu^{(N)}(t)\bigr\|_{H_{p-1}}}{\gamma_N} \le c\, \frac{\bigl\|\nu - X^{(N)}(0)\bigr\|_{H_{p-1}}}{\gamma_N},$$
for all $N\in\mathbb N$ and some constant $c>0$. Therefore, (2.16) implies that $\frac{1}{\gamma_N}(X^{(N)} - \mu^{(N)})$ and $\frac{1}{\gamma_N}(X^{(N)} - \mu)$ are exponentially equivalent in $C([0,T]; H_{p-1}(M))$ with rate $N\gamma_N^2$. Hence, the same LDP holds for $\frac{1}{\gamma_N}(X^{(N)} - \mu)$ as for $\frac{1}{\gamma_N}(X^{(N)} - \mu^{(N)})$, see [DZ93, Theorem 4.2.13]. Since the sequence $(\gamma_N)_{N\in\mathbb N}$ was arbitrary, a moderate deviation principle for the sequence $(X^{(N)} - \mu)_{N\in\mathbb N}$ follows. $\Box$

2.3 Some statements about integral equations

We are interested in solutions u ∈ C([0, ∞); Hp(M)) of equations like

t Z   u(t) − u(0) = −A(s)u(s) + f(s) ds + M(t), 0 where A is a generalized partial differential operator (see below). If M is not differentiable in time then we cannot use the standard results about parabolic partial differential equations. Moreover, we require uniform results for a whole class of operators A. Therefore, we have to adapt statements about parabolic partial differential equations. There are many publications about parabolic partial differential equations in several spaces. We do not want to give a list of references here, because it would be always incomplete. Some are listed in the references. Most of the publications are either for beginners or for professionals. Therefore, either the statements do not fit with our needs or they are so universal that it is even hard to understand the notation. I only found one book [Wlo82] by Joseph Wloka, where the statements are universal enough and the proofs can be easily verified. Let us first specify the operators A we want to study.

Assumption 2.3.1 Fix T > 0, p ∈ R and q ∈ R+. We assume that the operator A: [0,T ] × Hp(M) → Hp−2q(M) satisfies the following three conditions:

(i) The mapping A(t): Hp(M) → Hp−2q(M) is linear, for all t ∈ [0,T ]. (ii) The mapping t 7→ (ϑ , A(t)ϑ ) is measurable, for all ϑ , ϑ ∈ H (M). 1 2 Hp−q 1 2 p

(iii) There exist constants C1,C2 > 0 and C3 ≥ 0 such that

$$(\vartheta_1, A(t)\vartheta_2)_{H_{p-q}} \le C_1\, \|\vartheta_1\|_{H_p}\|\vartheta_2\|_{H_p}, \tag{2.17}$$
$$(\vartheta_1, A(t)\vartheta_1)_{H_{p-q}} \ge C_2\, \|\vartheta_1\|^2_{H_p} - C_3\, \|\vartheta_1\|^2_{H_{p-q}}, \tag{2.18}$$

for all t ∈ [0,T ] and all ϑ1, ϑ2 ∈ Hp(M).

Remark 2.3.2 If A satisfies the assumptions above for some p ∈ R and some q ∈ R+ then the adjoint operator A∗ defined by A∗(t) := (A(t))∗, t ∈ [0,T ], satisfies Assumption 2.3.1 for p and q, too. Moreover, the constants C1,C2 and C3 are the same. 2.3. SOME STATEMENTS ABOUT INTEGRAL EQUATIONS 27

Now we can prove the following statement about the solution of integral equations.

Proposition 2.3.3 Assume that A: [0,T ] × Hp(M) → Hp−2q(M) satisfies Assumption 2.3.1 for some p ∈ R and q ∈ R+. Then, for each f ∈ C([0,T ]; Hp−2q(M)), each g ∈ Hp−q(M) and each M ∈ C([0,T ]; Hp(M)), there exists an unique solution

u ∈ L2([0,T ]; Hp(M)) ∩ C([0,T ]; Hp−q(M))

of the integral equation

$$u(t) - g = -\int_0^t A(s)u(s)\, ds + \int_0^t f(s)\, ds + M(t). \tag{2.19}$$
Moreover, the mapping $(M, f, g) \mapsto u$ from $C([0,T]; H_p(M)) \times C([0,T]; H_{p-2q}(M)) \times H_{p-q}(M)$ to $L_2([0,T]; H_p(M)) \cap C([0,T]; H_{p-q}(M))$ is continuous, and there exists a constant $c > 0$, depending only on $T$ and the constants $C_1, C_2, C_3$ of Assumption 2.3.1.(iii), such that

$$\sup_{t\in[0,T]} \|u(t)\|^2_{H_{p-q}} + \int_0^T \|u(t)\|^2_{H_p}\, dt \le c\Bigl(\|g\|^2_{H_{p-q}} + \sup_{t\in[0,T]}\|M(t)\|^2_{H_p} + \int_0^T \|f(t)\|^2_{H_{p-2q}}\, dt\Bigr). \tag{2.20}$$

Proof: First we observe that the integral equation (2.19) can be written as the following parabolic partial differential equation

 d  + A(t) (u(t) − M(t)) = −A(t)M(t) + f(t) dt u(0) − M(0) = g − M(0). (2.21)

Since M is an element of C([0,T ]; Hp(M)) and the operator A fulfill (2.17) of Assumption 2.3.1 we have A(t)M(t) ∈ L2([0,T ]; Hp−2q(M)). Indeed, using (2.17) and part (i) of Lemma A.2.2, a short calculation yields

2 (ϕ, A(t)M(t))H kA(t)M(t)k2 = sup p−q Hp−2q 2 ∞ kϕk ϕ∈C (M) Hp ≤ C2 kM(t)k2 . 1 Hp 28 CHAPTER 2. MODERATE DEVIATIONS

Following the proofs of [Wlo82, Satz 26.1, Chapter IV] and [Wlo82, Satz 25.5, Chapter IV] we get an unique solution

(u − M) ∈ L2([0,T ]; Hp(M)) ∩ C([0,T ]; Hp−q(M)) of (2.21). Moreover, there exists a constant c1 > 0, depending only on T and the constants C1,C2,C3 of Assumption 2.3.1, such that

T Z sup ku(t) − M(t)k2 + ku(t) − M(t)k2 dt Hp−q Hp t∈[0,T ] 0  T  Z ≤ c kg + M(0)k2 + kA(t)M(t) + f(t)k2 dt 1  Hp−q Hp−2q  0  T T  Z Z ≤ c kg + M(0)k2 + C2 kM(t)k2 dt + kf(t)k2 dt . 1  Hp−q 1 Hp Hp−2q  0 0

Since M ∈ C([0,T ]; Hp(M)) it follows that

u ∈ L2([0,T ]; Hp(M)) ∩ C([0,T ]; Hp−q(M))

and

T T Z Z sup ku(t)k2 + ku(t)k2 dt ≤ sup kM(t)k2 + (1 + c C2) kM(t)k2 dt Hp−q Hp Hp−q 1 1 Hp t∈[0,T ] t∈[0,T ] 0 0 T Z + c kf(t)k2 dt + c kg + M(0)k2 1 Hp−2q 1 Hp−q 0  T  Z ≤ c kgk2 + sup kM(t)k2 + kf(t)k2 dt ,  Hp−q Hp Hp−2q  t∈[0,T ] 0

2 with c = 1 + c1 + T (c1 + c1C1 ). This proves (2.20) and the claimed continuity. 2

Remark 2.3.4 (a) One could wonder why we lose smoothness, i.e., why the solution $u$ of the integral equation (2.19) is only an element of $C([0,T]; H_{p-q}(M))$ whereas $M$ is an element of $C([0,T]; H_p(M))$. In the proof of Proposition 2.3.3 we only required that $M$ is an element of $L_2([0,T]; H_p(M)) \cap C([0,T]; H_{p-q}(M))$. Therefore, one can extend the result so that the solution $u$ of (2.19) is the image of $(M, f, g)$ under a continuous mapping from

(L2([0,T ]; Hp(M)) ∩ C([0,T ]; Hp−q(M))) × C([0,T ]; Hp−2q(M)) × Hp−q(M)

to L2([0,T ]; Hp(M)) ∩ C([0,T ]; Hp−q(M))). 2.3. SOME STATEMENTS ABOUT INTEGRAL EQUATIONS 29

(b) The result of Proposition 2.3.3 can be generalized to pseudo differential operators, see for instance [CP82, Section 5 of Chapter 6] by Jacques Chazarain and Alainor Piriou or [LSU68] by Olga A. Ladyˇzenskaya, Vsevolod A. Solonnikov and Nina N. Ural’tseva, but it is harder to figure out on what the constant c depends.

(c) A triple Hp−2q(M), Hp−q(M) and Hp(M) of Hilbert spaces like in the proof of Propo- sition 2.3.3 is called Gelfand triple.

The following two lemmata will tell us that the operators −Lϑ(·) and −Gϑ(·) satisfy As- sumption 2.3.1 for all p ∈ R and q = 1. Moreover, the constants C1,C2 > 0 and C3 ≥ 0 can be taken uniform for a huge class of distribution valued functions ϑ ∈ C([0,T ]; D0(M)). First we study the operator −Lϑ(·).

ϑ(t) Lemma 2.3.5 Fix T ≥ 0 and r ∈ R. The operator A(t) := −L , t ∈ [0, ∞), satisfies Assumption 2.3.1, for q = 1, all p ∈ R and all ϑ ∈ C([0,T ]; Hr(M)). Moreover, the constants C1,C2,C3 can be chosen uniformly for all ϑ ∈ C([0,T ]; Hr(M)) with

sup kϑ(t)k < K, Hr t∈[0,T ]

for some constant K > 0.

Proof: Since [0,T ] 3 t 7→ B(ϑ(t)) ∈ τM is continuous the mapping

[0,T ] 3 t 7→ (ϑ , A(t)ϑ ) ∈ 1 2 Hp−1 R

is continuous, too, and therefore measurable, for each fixed pair ϑ1, ϑ2 ∈ Hp(M). Moreover, A(t) is by definition a linear mapping from Hp(M) to Hp−2(M), for each fix t ∈ [0,T ]. Fix r ∈ R and K > 0. Using the duality of H 1 (M) and H 3 (M) together with Lemma p− 2 p− 2 2.1.7, we get a constant c > 0, depending on p and r, such that

(ϑ1, B(ϑ(t))ϑ2) ≤ kϑ1k kB(ϑ(t))ϑ2k Hp−1 H 1 H 3 p− 2 p− 2 ≤ c kϑ(t)k kϑ k kϑ k , (2.22) Hr 1 H 1 2 H 1 p− 2 p− 2

for all ϑ1, ϑ2 ∈ Hp(M), all t ∈ [0,T ] and all ϑ ∈ C([0,T ]; Hr(M)). Furthermore,

∞ X (ϑ , −∆ϑ ) = (1 + λ )p−1 hϑ , e i hϑ , λ e i , for all ϑ , ϑ ∈ H (M). (2.23) 1 2 Hp−1 l 1 l 2 l l 1 2 p l=1 Using the Cauchy–Schwarz inequality, this implies

∞ X p−1 (ϑ1, −∆ϑ2) = (1 + λ ) λ hϑ1, e i hϑ2, e i Hp−1 l l l l l=1 ∞ ∞ X p−1 2 X p−1 2 ≤ (1 + λl) λl hϑ1, eli (1 + λl) λl hϑ2, eli l=1 l=1 ≤ kϑ k kϑ k , 1 Hp 2 Hp 30 CHAPTER 2. MODERATE DEVIATIONS for all ϑ1, ϑ2 ∈ Hp(M). Combining this with (2.22), it follows that

σ2 (ϑ1, A(t)ϑ2) ≤ (ϑ1, −∆ϑ2) + (ϑ1, B(ϑ(t))ϑ2) Hp−1 2 Hp−1 Hp−1 σ2  ≤ + c K kϑ1k kϑ2k . 2 Hp Hp

In other words the operator −Lν(·) satisfies (2.17). In order to prove (2.18) let us first study distributions ϑ1 ∈ Hp(M) with hϑ1, 1i = 0. Then (2.23) implies λ 2 kϑ1k ≤ (ϑ1, −∆ϑ1) , 1 + λ Hp Hp−1 where λ > 0 is the smallest eigenvalue of −∆ larger than zero. This together with (2.22) yield

σ2 λ (ϑ , A(t)ϑ ) ≥ kϑ k2 − c K kϑ k2 . 1 1 Hp−1 1 Hp 1 H 1 2 1 + λ p− 2

Using A.2.2.(v), we can find a constant C3 > 0, depending on c K and p, such that

2 σ λ 2 2 (ϑ1, A(t)ϑ1) ≥ kϑ1k − C3 kϑ1k , Hp−1 4 1 + λ Hp Hp−1

for all ϑ1 ∈ Hp(M) with hϑ1, 1i = 0. Finally, we have to enlarge C3 such that

2 σ 2 2 (1, A(t)1) = 0 ≥ k1k − C3 k1k . Hp−1 4 Hp Hp−1 2

Corollary 2.3.6 Fix T ≥ 0. Then there exist constants C1,C2 > 0 and C3 ≥ 0 such that the ϑ(·) operator −L satisfies Assumption 2.3.1, for q = 1, all p ∈ R and all ϑ ∈ C([0,T ]; P(M)).

d Proof: Fix r < − 2 . Then by part (iii) of Lemma A.2.3 there exists a constant K > 0 such that kϑk < K, Hr for all ϑ ∈ P(M). Therefore, our claim follows from Lemma 2.3.5. 2

Now let us study the operator −Gϑ(·).

Lemma 2.3.7 Fix T ≥ 0. The operator −Gϑ(·) satisfies Assumption 2.3.1, for q = 1, all p ∈ R and all ϑ ∈ C([0,T ]; Hr(M)) with r > p − 1. Moreover, the constants C1,C2,C3 can be chosen uniformly for all ϑ ∈ C([0,T ]; D0(M)) with

sup kϑ(t)k < K, Hr t∈[0,T ]

for some constant K > 0. 2.4. THE BEHAVIOR OF THE MCKEAN–VLASOV PATH 31

Proof: The first two assumptions 2.3.1.(i) and 2.3.1.(ii) can be verified in the same way as for the operator −Lϑ(·) in the proof of Lemma 2.3.5. Fix r > p − 1 and K > 0. Using Lemma 2.1.7, we get a constant c > 0, depending on p, r and K, such that

| (B(ϑ )ϑ(t), ϑ ) | ≤ kB(ϑ )ϑ(t)k kϑ k 1 2 Hp−1 1 Hr−1 2 H2p−1−r ≤ c kϑ(t)k kϑ k kϑ k , Hr 1 H2p−1−r 2 H2p−1−r

for all ϑ1, ϑ2 ∈ Hp(M) and all t ∈ [0,T ]. Since 2p − 1 − r < p we can proceed in the same way as in the proof of Lemma 2.3.5. 2

Corollary 2.3.8 Fix T ≥ 0. Then there exist constants C1,C2 > 0 and C3 ≥ 0 such ϑ(·) d that the operator −G satisfies Assumption 2.3.1, for q = 1, all p < − 2 + 1 and all ϑ ∈ C([0,T ]; P(M)).

d Proof: Choose r ∈ R in such a way that − 2 > r > p − 1. Then part (iii) of Lemma A.2.3 implies sup kϑk < K, Hr ϑ∈P(M) for some K > 0. Therefore, our claim is a consequence of Lemma 2.3.7. 2

2.4 The behavior of the McKean–Vlasov path

This section presents some statements about McKean–Vlasov paths defined in 2.1.5. Let us first study the existence and uniqueness of the McKean–Vlasov path with initial datum ν ∈ P(M).

Lemma 2.4.1 Let $\nu$ be a probability measure on $M$. Then there exists a unique McKean–Vlasov path $\mu$ with initial datum $\nu$, i.e., a unique solution $\mu \in C([0,\infty); \mathcal P(M))$ of the equation
$$\Bigl(\frac{d}{dt} - \bigl(L^{\mu(t)}\bigr)^*\Bigr)\mu(t) = 0, \qquad \text{for all } t>0,$$
with $\mu(0) = \nu$.

We only want to sketch the idea of the proof. For n ∈ N, denote by µn ∈ C([0, ∞); P(M)) the unique solution of  d  − (Lµn−1(t))∗ µ (t) = 0, for all t > 0, dt n

with $\mu_n(0) = \nu$, where we set $\mu_0 \equiv \nu$. Since, for each $n\in\mathbb N$, the operator $L^{\mu_{n-1}(\cdot)}$ is the generator of a diffusion process on $M$, such a solution $\mu_n \in C([0,\infty); \mathcal P(M))$ always exists. As in the proofs of Lemma 2.4.3 and Corollary 2.4.4, stated in the next section, one can show that up to a time $\varepsilon > 0$, which is independent of the initial datum $\nu \in \mathcal P(M)$, the mapping $\mu_{n-1} \mapsto \mu_n$ is a contraction on $C([0,\varepsilon]; H_p(M))$, for $p < -\frac d2$. Therefore, up to time $\varepsilon$ there exists a McKean–Vlasov path starting at $\nu$. Since $\varepsilon$ does not depend on $\nu$, we can restart at time $\varepsilon$ with $\mu(\varepsilon)$ and obtain a McKean–Vlasov path up to time $2\varepsilon$. Lemma 2.4.1 then follows by iteration.
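The iteration sketched above can be made concrete numerically. The following hedged illustration (not the thesis argument) takes $M = S^1$, the drift $B(\vartheta)(x) = \langle\vartheta,\sin\rangle\cos x - \langle\vartheta,\cos\rangle\sin x$ from Remark 2.1.3, and replaces the Sobolev space estimates by a plain explicit finite difference scheme for the frozen-drift Fokker–Planck equation; the grid, time step, $\sigma$ and the initial density are all illustrative assumptions.

```python
import numpy as np

# Picard iteration mu_{n-1} -> mu_n behind Lemma 2.4.1 on M = S^1: at each step the
# *linear* equation d/dt mu_n = -(b_{n-1} mu_n)' + (sigma^2/2) mu_n'' is solved with
# the drift b_{n-1}(t, x) frozen along the previous iterate.
GRID, T, DT, SIGMA = 256, 1.0, 2e-4, 0.8
theta = np.linspace(0.0, 2.0 * np.pi, GRID, endpoint=False)
dx, steps = theta[1] - theta[0], int(T / DT)

def drift_of(mu):                                   # b(x) computed from the density mu
    s = np.sum(mu * np.sin(theta)) * dx
    c = np.sum(mu * np.cos(theta)) * dx
    return s * np.cos(theta) - c * np.sin(theta)

def picard_step(prev_path, nu):
    mu, path = nu.copy(), []
    for k in range(steps):
        b = drift_of(prev_path[k])
        dflux = (np.roll(b * mu, -1) - np.roll(b * mu, 1)) / (2 * dx)
        lap = (np.roll(mu, -1) - 2 * mu + np.roll(mu, 1)) / dx ** 2
        mu = mu + DT * (-dflux + 0.5 * SIGMA ** 2 * lap)
        path.append(mu)
    return path

nu = (1.0 + 0.5 * np.cos(theta)) / (2.0 * np.pi)    # illustrative initial density
path = [nu] * steps                                 # mu_0: constant-in-time path
for n in range(1, 6):
    new_path = picard_step(path, nu)
    print(n, max(np.max(np.abs(a - b)) for a, b in zip(new_path, path)))
    path = new_path                                 # successive iterates get closer
```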

Now we present three statements about McKean–Vlasov paths. We start with the smooth- ness of McKean–Vlasov paths.

Lemma 2.4.2 Fix q ∈ R, m ∈ N and δ > 0.

(i) There exists a constant c1 = c1(δ, q) > 0 such that sup kµ(t)k2 ≤ c , Hq 1 t∈[δ,∞) for all McKean–Vlasov paths µ ∈ C([0, ∞); P(M)). d Moreover, if q < − 2 then we can take δ = 0.

(ii) For each atlas U of M, there exists a constant c2 = c2(U, δ, m) > 0 such that 2 sup kµ(t)kCm ≤ c2, t∈[δ,∞) for all McKean–Vlasov paths µ ∈ C([0, ∞); P(M)).

d Proof: Without loss of generality we assume that δ < 1. Take r < − 2 . Then P(M) is by Lemma A.2.3.(iii) compactly embedded in Hr(M). Since µ solves d  ∗ µ(t) = Lµ(t) µ(t) , µ(0) = ν ∈ P(M) ⊆ H (M), dt r

we can apply Proposition 2.3.3 and get constants a1, a2 > 0, depending on r, such that t Z sup kµ(s)k2 + kµ(s)k2 ds ≤ a kνk2 ≤ a . (2.24) Hr Hr+1 1 Hr 2 s∈[0,t] 0

for all t ∈ [0, 1]. Since a1 and a2 are independent of ν ∈ P(M) we can iterate these arguments and get sup kµ(t)k2 ≤ a . Hr 2 t∈[0,∞)

Because Hr(M) is continuously embedded in Hq(M), for q ≤ r, we have proven (i), for d q < − 2 and δ = 0. If q ≥ r then choose n ∈ N in such a way that r + n < q ≤ r + n + 1. From equation (2.24) it follows that µ(t) is an element of Hr+1(M) for almost all t ∈ [0, 1]. In particular, there δ exists a time s ∈ [0, n+1 ] with a (n + 1) kµ(s)k2 ≤ 2 . Hr+1 δ Now we can use s as new starting time for the McKean–Vlasov path µ. Iterating these n + 1 times, we get a constant c1 = c1(δ, q) > 0 such that sup kµ(t)k2 ≤ sup kµ(t)k ≤ c . Hq Hr+n+1 1 t∈[δ,δ+1] t∈[δ,δ+1] Since all was independent of the initial datum ν inequality (i) follows. m d Moreover, Hq(M) is continuously embedded in C (M), for q > 2 + m, see Lemma A.2.2. Therefore, inequality (ii) follows from (i). 2 2.4. THE BEHAVIOR OF THE MCKEAN–VLASOV PATH 33

Now let us analyze the dependence of the McKean–Vlasov path on the initial data.

Lemma 2.4.3 Fix 0 < δ < T , r ∈ R and n ∈ N. Then there exists a constant c = c(T, δ, r, n) > 0 such that

sup kµ (t) − µ (t)k2 ≤ c kν − ν k2 , (2.25) 1 2 Hr+n 1 1 2 Hr t∈[δ,T ] for all McKean–Vlasov paths µ1, µ2 ∈ C([0,T ]; P(M)) with initial data ν1, ν2 ∈ P(M), respectively. d Moreover, if p < − 2 + 1 and n = 0 then we can take δ = 0.

Proof: Fix r ∈ R and ν1, ν2 ∈ P(M). If ν1 − ν2 is not an element of Hr(M) then there is nothing to prove. d Assume that ν1 − ν2 ∈ Hr(M) and r < − 2 + 1. Then the measure valued path µ1 − µ2 solves d (µ (t) − µ (t)) = (Gµ2(t))∗(µ (t) − µ (t)) + R(µ (t) − µ (t)), (2.26) dt 1 2 1 2 1 2 with hR(µ1(t) − µ2(t)), fi = hµ1(t) − µ2(t), B(µ1(t) − µ2(t))fi , for f ∈ C∞(M). Because of Corollary 2.3.8 and Lemma 2.4.2, the operator −Gµ2(·) satisfies Assumption 2.3.1. Hence, we can apply Proposition 2.3.3 to (2.26) and get a constant a1 = a1(T, r) > 0 such that

t Z sup kµ (s) − µ (s)k2 + kµ (s) − µ (s)k2 ds (2.27) 1 2 Hr 1 2 Hr+1 s∈[0,t] 0  t  Z ≤ a kν − ν k2 + kR(µ (s) − µ (s))k2 ds , 1  1 2 Hr 1 2 Hr−1  0

for all t ∈ [0,T ]. Using Lemma 2.1.7, we get a constant a2 = a2(r) > 0 with kR(µ (t) − µ (t))k ≤ a kµ (t) − µ (t)k sup kµ (s) − µ (s)k . 1 2 Hr−1 2 1 2 Hr 1 2 H d s∈[0,T ] − 2 −1

Applying A.2.3.(iii) to the second factor on the right hand side, we can proceed with

kR(µ (t) − µ (t))k ≤ a kµ (t) − µ (t)k , 1 2 Hr−1 3 1 2 Hr

for all t ∈ [0,T ] and some constant a3 = a3(r). Inserting this in (2.27) and applying Gronwall’s inequality, see [DZ93, Lemma E.6], it follows that

sup kµ (t) − µ (t)k2 ≤ a kν − ν k2 , 1 2 Hr 4 1 2 Hr t∈[0,T ]

d for some constant a4 = a4(T, r). This proves our claim for r < − 2 + 1 and n = 0 with δ = 0. Inequality (2.27) implies that µ1(t)−µ2(t) is an element of Hr+1(M), for almost all t ∈ [0,T ]. δ Especially, there exists a time s ∈ [0, n ] such that

2 n a1(1 + T a4) 2 kµ1(s) − µ2(s)k ≤ kν1 − ν2k . Hr+1 δ Hr 34 CHAPTER 2. MODERATE DEVIATIONS

Now we can use s as new starting time for the McKean–Vlasov paths µ1 and µ2. Iterating d this n times, we get (2.25) for r < − 2 + 1. d Note that we only used the assumption r < − 2 + 1 in order to verify that the operator µ2(·) d −G satisfies Assumption 2.3.1 on [0,T ]. If r ≥ − 2 +1 then by Lemma 2.4.2 and Corollary µ2(·) δ 2.3.8 the operator −G satisfies Assumption 2.3.1 on [ 2 ,T ], for each δ > 0. Therefore, we δ d can take 2 as new starting time and proceed in the same way as for r < − 2 + 1. 2

Corollary 2.4.4 Fix 0 < δ < T , q, p ∈ R and m ∈ N.

(i) There exists a constant c1 = c1(T, δ, p, q) > 0 such that

sup kµ (t) − µ (t)k2 ≤ c kν − ν k2 , 1 2 Hq 1 1 2 Hp t∈[δ,T ]

for all McKean–Vlasov paths µ1, µ2 ∈ C([0,T ]; P(M)) with initial data ν1, ν2 ∈ P(M)∩ Hp(M), respectively. d Moreover, if q < − 2 + 1 and q ≤ p then we can take δ = 0.

(ii) For each atlas U of M, there exists a constant c2 = c2(U, T, δ, p, m) > 0 such that

2 2 sup kµ (t) − µ (t)k m ≤ c kν − ν k , 1 2 C 2 1 2 Hp t∈[δ,T ]

for all McKean–Vlasov paths µ1, µ2 ∈ C([0,T ]; P(M)) with initial data ν1, ν2 ∈ P(M)∩ Hp(M), respectively.

Proof: Take r = p and n ∈ N in such a way that q ≤ r + n. Then (i) follows from Lemma d 2.4.3. Moreover, if q < − 2 + 1 and q ≤ p then we can take r = q and n = 0 in order to get (i) with δ = 0. m Since by Lemma A.2.2.(iii) the space Hq(M) is continuously embedded in C (M), for d q > 2 + m, inequality (ii) follows from (i). 2

2.5 The free model

Fix a distribution dependent vector field B, which satisfies Assumption 2.1.2, and fix some distribution valued function ϑ ∈ C([0,T ]; Hr(M)), for some r ∈ R. Moreover, fix some sequence (γN )N∈N of positive numbers, which fulfills (2.1). We are interested in the free par- ticle system, i.e., solutions xek, k ∈ N, of the following system of Stratonovich type stochastic differential equations

(N) dxek(t) = B(µ (t) + γN ϑ(t))(xek(t)) dt + σA(xek(t)) ∗ dWk(t). (2.28) m Here Wk, k ∈ N, are independent Brownian motions on R and m ∈ N and A are taken from the definition of the coupled model (2.2). For N ∈ , we denote by X(N) the measure valued empirical process of the free particle N ϑ system, i.e., N (N) 1 X X (t) := δ (2.29) ϑ N xek(t) k=1 2.5. THE FREE MODEL 35 and by µ(N) is the McKean–Vlasov path with initial datum X(N)(0). ϑ (N) µ (·)+γN ϑ(·) Note, that the operator L is the generator for each diffusion xek and that the family (xk)k∈N is independent. Moreover, since the distribution of the measure valued empirical process X(N) is invariant ϑ under permutation of the initial data (x (0),..., x (0)) we can conceive X(N) as Markov e1 eN ϑ process with state space P(M). A consequence of (2.28) and Itˆo’sformula is that

t * (N) (N) + Z   D E X (t) − X (0) 1 (N) (N) Mf(N)(t), f := ϑ ϑ , f − X (s), Lµ (s)+γN ϑ(s)f ds (2.30) γN γN ϑ 0 is a (real valued) martingale, for each f ∈ C∞(M). Because the processes X(N), N ∈ , depending on γ , we cannot speak of a moderate ϑ N N deviation principle holding for these processes. Therefore, we will look at each sequence 2 (γN )N∈N separately and prove a large deviation principle with rate NγN for the processes 1 (X(N) − µ(N)), N ∈ . γN ϑ N (N) In the next section we will study the martingale term Mf , N ∈ N. More precisely, we will prove two large deviation principles in different spaces. This will help us to derive a “nice” shaped representation of the rate function, see Section 2.5.2. Finally, in Section 2.5.3 we will present a large deviation principle for the processes 1 (X(N) − µ(N)), N ∈ , with rate Nγ2 . γN ϑ N N

2.5.1 Large deviations for the martingale term

Fix a distribution dependent vector field B, which satisfies Assumption 2.1.2, and fix some distribution valued function ϑ ∈ C([0,T ]; Hr(M)), for some r ∈ R. Moreover, fix some sequence (γN )N∈N of positive numbers, which fulfills (2.1). In this section we study the martingale term Mf(N) defined in (2.30). There are two natural kinds of spaces in which one (N) can try to prove a large deviation principle for the sequence (Mf )N∈N. On the one hand we can look at C([0,T ]; Hp(M)), for some p ∈ R. Then there exists a standard procedure to derive a large deviation principle. First one prove a large deviation principle for the finite dimensional distributions, where one can use the G¨artner–Ellis Theorem, see [DZ93, Corollary (N) 4.4.27]. Then one take the projective limit to get a LDP for the sequence (Mf )N∈N in the [0,T ] space (Hp(M)) , see [DZ93, Theorem 4.6.1]. Finally, one has to prove exponential tightness of the processes in C([0,T ]; Hp(M)). On the other hand one can look at the spaces L2([0,T ]; Hp(M)), for some p ∈ R. We will do both. Then we get two representations of the rate function, which we will use in Section 2.5.2 to derive a “nice” shaped representation of the rate function. (N) Let us start with some properties of the measure valued martingales Mf , N ∈ N.

d Lemma 2.5.1 Fix T > 0, p < − 2 − 1 and ϑ ∈ C([0,T ]; Hr(M)), for some r ∈ R. Then the following statements are valid.

(N) (i) For each N ∈ N, the measure valued martingale Mf is the mean of N independent 36 CHAPTER 2. MODERATE DEVIATIONS

(N) (N) measure valued martingales Mf1 ,..., MfN with quadratic covariation

t 2 Z D (N) E D (N) E σ [[ Mf , f , Mf , g ]]t = (grad f, grad g) ds. k k 2 xek(s) γN 0

This means that N 1 X (N) Mf(N) = Mf . N k k=1

t D E D E σ2 Z D E (ii) [[ M (N), f , M (N), g ]] = X(N)(s), (grad f, grad g) ds, f f t 2 ϑ NγN 0 for all f, g ∈ C∞(M) and all t ∈ [0,T ].

 2   Nγ2 α 1 (iii) P sup M (N)(τ) − M (N)(s) > α ≤ exp − N + , f f 2 s≤τ≤t Hp 8(t − s)σ κp 4 for all N ∈ N, all α > 0 and all 0 ≤ s ≤ t ≤ T . √ !! (N) (iv) sup E exp C NγN sup Mf (t) < ∞, for all C ∈ R. H N∈N 0≤t≤T p

(N) (N) (v) The mapping t ∈ [0,T ] 7→ Mf (t) ∈ Hp(M) is almost surely continuous. Hence, Mf is an element of C([0,T ]; Hp(M)) almost surely.

(N) Proof: For 1 ≤ k ≤ N, define Mfk by

 t  D E 1 Z (N) (N) µ (s)+γN ϑ(s) Mfk (t), f := f(xek(t)) − f(xek(0)) − L f(xek(s)) ds , γN 0

∞ (N) (N) for all f ∈ C (M), where xek is taken from definition (2.28). Then Mf1 ,..., MfN are inde- (N) (N) (N) pendent and Mf is by definition the mean of Mf1 ,..., MfN . Moreover, a short calculation yields the claimed formula for the quadratic covariation and (i) is proven. Now, (ii) follows directly from (i). In order to prove (iii) fix N ∈ N and 0 ≤ s ≤ t ≤ T . Define for shorter notations

(N) Mft := Mf (t), t ∈ [0,T ]. (2.31)

Then, using Itˆo’s formula, we calculate

∞ τ 2 Z 2 2 X p D E D E δNγN Mfτ − Mfs = 2δNγN (1 + λl) Mfu − Mfs, el d Mfu, el , Hp l=1 s τ ∞ Z 2 X p D E D E + δNγN (1 + λl) [[ M,f el , M,f el ]]u du, (2.32) l=1 s 2.5. THE FREE MODEL 37 for all δ > 0 and all 0 ≤ s ≤ τ ≤ t. Here the first term on the right hand side of (2.32) is a martingale M after time s with quadratic variation

[[M,M]]τ (2.33) τ Z ∞ 2 2 4 X p p D ED E D E D E = 4δ N γN (1 + λl) (1 + λk) Mfu − Mfs, el Mfu − Mfs, ek d[[ M,f el , M,f ek ]]u s l,k=1 τ Z * ∞ 2+ (N) X D E = 4δ2Nγ2 σ2 X (u), (1 + λ )p M − M , e grad e du N ϑ l fu fs l l s l=1 τ * ∞ + Z 2 2 2 2 (N) X p 2 ≤ 4δ NγN σ Mfu − Mfs X (u), (1 + λl) kgrad elk du Hp ϑ s l=1 τ Z 2 2 2 2 ≤ 4δ NγN σ κp Mfu − Mfs du . Hp 0 In the third line we used the Cauchy–Schwarz inequality. Taking the supremum over τ in (2.32), we get

2 1 2 2 δNγN sup Mfτ − Mfs ≤ sup (M(τ) − [[M,M]]τ ) + δσ (t − s)κp s≤τ≤t Hp s≤τ≤t 2 2 2 2 2 + 2δ NγN σ (t − s)κp sup Mfτ − Mfs . s≤τ≤t Hp

1 Choosing δ = 2 , it follows that 4(t−s)σ κp

Nγ2 2  1  1 N sup M − M ≤ sup M(τ) − [[M,M]] + . 2 fτ fs τ 8(t − s)σ κp s≤τ≤t Hp s≤τ≤t 2 4

Finally, we get by Doob’s sub–martingale inequality

 2  (N) (N) P sup Mf (τ) − Mf (s) > α s≤τ≤t Hp    2  1 NγN α 1 ≤ P sup M(τ) − [[M,M]]τ > 2 − s≤τ≤t 2 8(t − s)σ κp 4  2  NγN α 1 ≤ exp − 2 + , 8(t − s)σ κp 4 for each α > 0. This proves (iii). Now, (iv) follows from (iii). Indeed, !! √ (N) E exp C NγN sup Mf (t) 0≤t≤T Hp ∞ ! ! Z √ (N) = P exp C NγN sup Mf (t) > y dy 0≤t≤T Hp 0 38 CHAPTER 2. MODERATE DEVIATIONS

∞ ! Z 2 (ln y)2 ≤ P sup M (N)(t) > dy f 2 2 0≤t≤T Hp C NγN 0 ∞ (ln y)2 2 2 Z − + 1 q 8C T σ κp+1 8C2T σ2κ 4 2 2 ≤ e p dy = 8C T σ κpπ e 4 . 0 D E Proof of (v): First, the mapping C∞(M) 3 f 7→ Mf(N)(t), f is by definition linear, for all t ∈ [0,T ] almost surely. Moreover, using first Doob’s sub–martingale inequality and then Itˆo’sformula, we get

! ∞ ! 2 2 (N) X p D (N) E E sup Mf (t) = E sup (1 + λl) Mf (t), el (2.34) Hp t∈[0,T ] t∈[0,T ] l=1 ∞ ! 2 X p D (N) E ≤ (1 + λl) E sup Mf (t), el l=1 t∈[0,T ] ∞  2 X p D (N) E ≤ 4 (1 + λl) E Mf (T ), el l=1 ∞ X p  D (N) E D (N) E  = 4 (1 + λl) E [[ Mf , el , Mf , el ]]T l=1 ∞ 4σ2T X ≤ sup (1 + λ )p kgrad e k2 . Nγ2 l l x N x∈M l=0

d Because of p < − 2 − 1, Lemma A.2.3.(ii) yields the finiteness of (2.34). This together with (N) (N) the linearity of Mf (t) proves Mf (t) ∈ Hp(M), for all t ∈ [0,T ], almost surely. Finally, (2.32) implies

t ∞ 2 Z X p D E D E Mft − Mfs = 2 (1 + λl) Mfu − Mfs, el d Mfu, el Hp s l=1 t 2 Z ∞ N σ X p 1 X 2 + (1 + λl) kgrad elk du , 2 xek(u) NγN N s l=1 k=1 for all 0 ≤ s ≤ t ≤ T . Therefore, it follows that

 2  σ2 E M − M ≤ κ (t − s) . ft fs 2 p Hp NγN 1 Applying first Itˆo’sformula and then (2.33), for δ = 2 , we get NγN  4   2 2  E Mft − Mfs = E [[ Mf· − Mfs , Mf· − Mfs ]]t Hp Hp Hp t 4σ2κ Z  2  ≤ p E M − M du 2 fu fs NγN Hp s 2.5. THE FREE MODEL 39

t 4 2 Z 4 2 4σ κp 2σ κp 2 ≤ 2 4 (u − s)du = 2 4 (t − s) . N γN N γN s Now the Kolmogorov–Prohorov criterion for almost surely continuous paths proves (v), see for instance [HT94][Satz 2.11]. 2

If $X^{(N)}_\vartheta(0)$ converges, as $N \to \infty$, almost surely to a probability measure $\nu$, then one expects $X^{(N)}_\vartheta$ to be in a neighborhood of the McKean–Vlasov path starting at $\nu$. The following lemma makes this statement precise.

Lemma 2.5.2 Fix T > 0 and ϑ ∈ C([0,T ]; H (M)), for some r ∈ . Assume that X(N)(0) r R ϑ converges, as N → ∞, almost surely in P(M) to some non–random ν ∈ P(M). Denote the McKean–Vlasov path with initial datum ν by µ ∈ C([0,T ]; P(M)). Then

t t Z D E Z lim E X(N)(s), f(s) ds = hµ(s), f(s)i ds , (2.35) N→∞ ϑ 0 0 for all t ∈ [0,T ] and all f ∈ C(M × [0,T ]).

Proof: Take p < − d . Then, by Lemma A.2.3.(iii), the sequence (X(N)(0)) converges in 2 ϑ N∈N Hp(M) almost surely to ν. For N ∈ N, define   µ(N)(t) := E X(N)(t)|X(N)(0) , for t ∈ [0,T ], ϑ ϑ ϑ and denote by µ(N) ∈ C([0,T ]; P(M)) the McKean–Vlasov path with initial datum X(N)(0). ϑ Using Itˆo’sformula, one easily compute that µ(N) is a solution of ϑ

(N) (N) (N) dµ (t) = (Lµ (t)+γN ϑ(t))∗µ (t) dt. ϑ ϑ

Therefore, the distribution valued function u = µ(N) − µ solves ϑ  d   ∗ − (Gµ(t))∗ u(t) = R(u(t)) + B µ(N)(t) − µ(N)(t) + γ ϑ(t) µ(N)(t) , dt ϑ N ϑ u(0) = X(N)(0) − ν , ϑ with hR(u(t)), fi = hu(t), B(u(t))fi , for all f ∈ C∞(M). Now, by Proposition 2.3.3, we get

 t Z sup ku(s)k2 ≤ c ku(0)k2 + kR(u(s))k2 ds (2.36) Hp  Hp Hp−1 s∈[0,t] 0 t  Z  ∗ 2 + B µ(N)(s) − µ(N)(s) + γ ϑ(s) µ(N)(s) ds , ϑ N ϑ  Hp−1 0 40 CHAPTER 2. MODERATE DEVIATIONS for all t ∈ [0,T ] and some constant c > 0. Using Lemma 2.1.7, we conclude that kR(u(s))k ≤ c ku(s)k2 , Hp−1 Hp

B(ϑ(s))∗µ(N)(s) ≤ c ϑ(s) µ(N)(s) , ϑ Hr ϑ Hp−1 Hp

B(µ(N)(s) − µ(N)(s))∗µ(N)(s) ≤ c µ(N)(s) − µ(N)(s) µ(N)(s) , ϑ ϑ ϑ ϑ Hp−1 Hp Hp for some constant c > 0. d Because ϑ is an element of C([0,T ]; Hr(M)) and p < − 2 , we get sup ϑ(t) < ∞ and sup ku(t)k ≤ 2 sup kηk < ∞. (2.37) Hr Hp Hp t∈[0,T ] t∈[0,T ] η∈P(M) Using this, we can proceed with  t  Z  2  sup ku(s)k2 ≤ c ku(0)k2 + ku(s)k2 + γ2 + µ(N)(s) − µ(N)(s) ds , (2.38) Hp  Hp Hp N ϑ  s∈[0,t] Hp 0 for all t ∈ [0,T ] and some constant c > 0. Since µ(N) − µ(N) solves ϑ   d (N) − (Lµ )∗ (µ(N)(t) − µ(N)(t)) = −γ B(ϑ(t))∗µ(N)(t), t ∈ [0,T ], dt ϑ N ϑ µ(N)(0) − µ(N)(0) = 0, ϑ it follows from Proposition 2.3.3 and (2.37) that

sup µ(N)(t) − µ(N)(t) ≤ c γ , ϑ N t∈[0,T ] Hp for some constant c > 0. Applying Gronwall’s inequality, see [DZ93, Lemma E.6], to (2.38), we get a new constant c > 0 such that 2  2  sup µ(N)(t) − µ(t) ≤ c µ(N)(0) − µ(0) + γ2 . ϑ ϑ N t∈[0,T ] Hp Hp The right hand side converges almost surely to zero as N tends to infinity. This implies t Z lim µ (s) − µ(s), f(s) ds = 0 , for all t ∈ [0,T ] almost surely, N→∞ ϑ 0 for all f ∈ C∞,0(M × [0,T ]). Finally, using dominated convergence, we conclude that

t t Z D E Z D   E lim E X(N)(s), f(s) ds = lim E µ(N)(s) , f(s) ds N→∞ ϑ N→∞ ϑ 0 0 t Z = hµ(s), f(s)i ds, 0 first for all f ∈ C∞,0(M × [0,T ]) and then for all f ∈ C(M × [0,T ]). 2 2.5. THE FREE MODEL 41

Now let us look at the logarithmic moment generating function of increments of the measure (N) valued empirical martingales Mf , N ∈ N.

d Lemma 2.5.3 Fix p < − 2 − 1, T > 0 and ϑ ∈ C([0,T ]; Hr(M)), for some r ∈ R. Assume that X(N)(0) converges, as N → ∞, almost surely in P(M) to some non–random ν ∈ P(M). ϑ Denote by µ ∈ C([0,T ]; P(M)) the McKean–Vlasov path with initial datum ν. Then

n !! 1 2 X D (N) (N) E lim ln E exp NγN Mf (tl) − Mf (tl−1), fl N→∞ Nγ2 N l=1 n tl σ2 X Z D E = µ(s), kgradf k2 ds =: Λ (f , . . . , f ) , (2.39) 2 l t0,...,tn 1 n l=1 tl−1

for all f1, . . . , fn ∈ H−p(M) and all 0 = t0 ≤ t1,..., ≤ tn = T .

Proof: From 2.5.1.(i) it follows that

n !! 2 X D (N) (N) E E exp NγN Mf (tl) − Mf (tl−1), fl l=1 N n !! Y 2 X D (N) (N) E = E exp γN Mfk (tl) − Mfk (tl−1), fl , k=1 l=1

for all N ∈ N. Using a Taylor expansion, we can proceed with

 2 N 4 n ! Y γN X D (N) (N) E = 1 + E Mf (tl) − Mf (tl−1), fl  2 k k k=1 l=1 !3 γ3 n D E N X (N) (N) ξk + E γN Mf (tl) − Mf (tl−1), fl e , 6 k k  l=1

2 D E γN Pn (N) (N) 3 x for some random ξk between 0 and 3 l=1 Mfk (tl) − Mfk (tl−1), fl . Since x < 3e , for all x ∈ R, we compute

n !3 X D (N) (N) E ξ E γN Mfk (tl) − Mfk (tl−1), fl e l=1 !! n D E X (N) (N) ≤ 3 E exp γN (γN + 1) Mfk (tl) − Mfk (tl−1), fl l=1 !! (N) ≤ 3 E exp 2nγN (γN + 1) sup M (s) max kfnk fk H−p 0≤s≤T Hp 1≤l≤n √ !! (N) ≤ 3 E exp 2n(γN + 1) NγN sup M (s) max kfnk , f H−p 0≤s≤T Hp 1≤l≤n 42 CHAPTER 2. MODERATE DEVIATIONS for each 1 ≤ k ≤ N. The right hand side is by 2.5.1.(iv) finite uniform in N ∈ N. Using Taylor expansion for the logarithm, we get n !! 1 2 X D (N) (N) E ln E exp Nγ Mf (tl) − Mf (tl−1), fl Nγ2 N N l=1  2  N 4 n ! 1 X γN X D (N) (N) E 3 =  E Mf (tl) − Mf (tl−1), fl + O(γN ) , Nγ2 2 k k N k=1 l=1

3 3 3 where O(γN ) are real numbers with limN→∞|O(γN )/γN | < ∞. Using Itˆo’sformula and part (ii) of Lemma 2.5.1, we can proceed with

$$=\frac{\sigma^2}{2}\sum_{l=1}^n\int_{t_{l-1}}^{t_l}\mathbb E\big\langle X^{(N)}_\vartheta(s),\|\mathrm{grad}\,f_l\|^2\big\rangle\,ds+O(\gamma_N)\,.$$
Since $p<-\frac d2-1$ and $f_1,\dots,f_n$ are elements of $H_{-p}(M)$, Lemma A.2.2.(iii) implies that $\|\mathrm{grad}\,f_l\|^2$ is an element of $C(M)$, for $1\le l\le n$. Therefore, (2.39) is a consequence of the last calculation and Lemma 2.5.2. $\Box$

Before we can prove a large deviation principle for the finite dimensional distributions of the processes $\widetilde M^{(N)}$, $N\in\mathbb N$, we require a statement about exponential tightness.

Lemma 2.5.4 Fix $p<-\frac d2-1$, $T>0$ and $\vartheta\in C([0,T];H_r(M))$, for some $r\in\mathbb R$. Then for all $0=t_0\le t_1\le\dots\le t_n=T$ and all $\alpha>0$ there exists a compact set $K_\alpha\subseteq(H_p(M))^n$ such that
$$\lim_{\alpha\to\infty}\limsup_{N\to\infty}\frac{1}{N\gamma_N^2}\ln P\Big(\big(\widetilde M^{(N)}(t_1)-\widetilde M^{(N)}(t_0),\dots,\widetilde M^{(N)}(t_n)-\widetilde M^{(N)}(t_{n-1})\big)\notin K_\alpha\Big)=-\infty.$$

Proof: Choose $q\in\mathbb R$ in such a way that $-\frac d2-1>q>p$. For $\alpha>0$, define
$$K_\alpha:=\big\{\vartheta\in H_p(M)\colon\|\vartheta\|^2_{H_q}\le\alpha\big\}^n.$$
Then $K_\alpha$ is closed and hence, by A.2.2.(ii), compact in $(H_p(M))^n$. Now from 2.5.1.(iii) it follows that
$$P\Big(\big(\widetilde M^{(N)}(t_1)-\widetilde M^{(N)}(t_0),\dots,\widetilde M^{(N)}(t_n)-\widetilde M^{(N)}(t_{n-1})\big)\notin K_\alpha\Big)\le n\,P\Big(\sup_{0\le s,t\le T}\big\|\widetilde M^{(N)}(t)-\widetilde M^{(N)}(s)\big\|^2_{H_q}>\alpha\Big)\le n\,e^{-\frac{N\gamma_N^2}{8\sigma^2\kappa_q T}\frac{\alpha}{2}+\frac14}\,.$$
This proves our claim. $\Box$

Recall the definition (2.14) of $\|\vartheta\|_{H_{-1}(M;\mu)}$, for $\vartheta\in\mathcal D'(M)$ and $\mu\in\mathcal P(M)$, i.e.,
$$\|\vartheta\|^2_{H_{-1}(M;\mu)}:=\sup_{\substack{f\in C^\infty(M)\\ \langle\mu,\|\mathrm{grad}f\|^2\rangle>0}}\frac{\langle\vartheta,f\rangle^2}{\big\langle\mu,\|\mathrm{grad}f\|^2\big\rangle}\,.$$
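To make the weighted dual norm (2.14) more concrete, the following finite dimensional sketch (an added illustration, not taken from the thesis; the grid size, the density and the test distribution are arbitrary choices) evaluates it on the circle. After discretisation, $\langle\mu,\|\mathrm{grad}f\|^2\rangle$ becomes a quadratic form $f^{\mathsf T}Af$ with a weighted stiffness matrix $A$, and the supremum over $f$ equals $\vartheta^{\mathsf T}A^{+}\vartheta$ with the Moore–Penrose pseudo-inverse $A^{+}$, provided $\vartheta$ annihilates constants.

import numpy as np

# Finite-dimensional illustration (added, not from the thesis) of the dual
# norm (2.14) on a grid of the circle:
#   ||theta||_{H_{-1}(M;mu)}^2 = sup_f <theta,f>^2 / <mu, |grad f|^2> .
# After discretisation, <mu,|f'|^2> becomes the quadratic form f^T A f with a
# weighted stiffness matrix A, and the supremum over f equals theta^T A^+ theta,
# provided theta annihilates constants.
n = 200
x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
h = 2 * np.pi / n

mu = np.exp(np.cos(x))                     # a von Mises shaped density, cf. (3.2)
mu /= mu.sum() * h

D = (np.roll(np.eye(n), 1, axis=1) - np.eye(n)) / h    # periodic forward difference
A = D.T @ np.diag(mu * h) @ D              # f^T A f approximates <mu, |f'|^2>

theta = np.sin(2 * x)
theta -= theta.mean()                      # theta must vanish on constants
theta_vec = theta * h                      # coefficients of f -> <theta, f>

dual_norm_sq = theta_vec @ np.linalg.pinv(A) @ theta_vec
print("||theta||_{H_{-1}(mu)}^2 approx.", dual_norm_sq)

The same construction applies, with $\mu$ replaced by time averages of the McKean–Vlasov path, to the Riemann-sum expressions appearing below.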

Now we are able to handle the finite dimensional distributions of Mf(N). 2.5. THE FREE MODEL 43

Proposition 2.5.5 Fix $p<-\frac d2-1$, $T>0$ and $\vartheta\in C([0,T];H_r(M))$, for some $r\in\mathbb R$. Assume that $X^{(N)}_\vartheta(0)$ converges, as $N\to\infty$, almost surely in $\mathcal P(M)$ to some non-random $\nu\in\mathcal P(M)$. Denote by $\mu\in C([0,T];\mathcal P(M))$ the McKean–Vlasov path with initial datum $\nu$. Fix $0=t_0\le t_1\le\dots\le t_n=T$. Then $\big((\widetilde M^{(N)}(t_0),\dots,\widetilde M^{(N)}(t_n))\big)_{N\in\mathbb N}$ satisfies a LDP in $(H_p(M))^{n+1}$ with rate $N\gamma_N^2$ and rate function
$$\Lambda^*_{t_0,\dots,t_n}(\vartheta_0,\dots,\vartheta_n):=\begin{cases}\displaystyle\sum_{l=1}^n\frac{t_l-t_{l-1}}{2\sigma^2}\,\bigg\|\frac{\vartheta_l-\vartheta_{l-1}}{t_l-t_{l-1}}\bigg\|^2_{H_{-1}\big(M;\,\frac{1}{t_l-t_{l-1}}\int_{t_{l-1}}^{t_l}\mu(s)\,ds\big)}, & \text{for }\vartheta_0=0,\\[3mm] \infty, & \text{for }\vartheta_0\neq0.\end{cases}$$

Proof: Note that Λt0,...,tn , defined by (2.39), is finite and Gateaux differentiable. Combining Lemma 2.5.3, Lemma 2.5.4 and the G¨artner–Ellis Theorem, see [DZ93, Corollary 4.4.27], we  (N) (N) (N) (N)  get a large deviation principle for (Mf (t1) − Mf (t0),..., Mf (tn) − Mf (tn−1)) in N∈N n 2 (Hp(M)) with rate NγN and rate function

n ! ∗ X Λet0,...,tn (ϑ1, . . . , ϑn) = sup hϑl, fli − Λt0,...,tn (f1, . . . , fn) f1,...,fn∈H−p(M) l=1   n tl X σ2 Z D E = sup hϑ , f i − µ(s), kgradf k2 ds  l l 2 l  f1,...,fn∈H−p(M) l=1 tl−1   n tl X σ2 Z D E = sup hϑ , fi − µ(s), kgradfk2 ds .  l 2  l=1 f∈H−p(M) tl−1

Since the mapping $H_{-p}(M)\ni f\mapsto\langle\vartheta_l,f\rangle$ is continuous and $C^\infty(M)$ is a dense subset of $H_{-p}(M)$, we may take the supremum over all $f\in C^\infty(M)$ instead of the supremum over all $f\in H_{-p}(M)$. Replacing $f$ by $cf$ and taking first the supremum over $c\in\mathbb R$, we get
$$\widetilde\Lambda^*_{t_0,\dots,t_n}(\vartheta_1,\dots,\vartheta_n)=\frac{1}{2\sigma^2}\sum_{l=1}^n\ \sup_{\substack{f\in C^\infty(M)\\ \int_{t_{l-1}}^{t_l}\langle\mu(s),\|\mathrm{grad}f\|^2\rangle\,ds>0}}\ \frac{\langle\vartheta_l,f\rangle^2}{\int_{t_{l-1}}^{t_l}\big\langle\mu(s),\|\mathrm{grad}f\|^2\big\rangle\,ds}=\sum_{l=1}^n\frac{t_l-t_{l-1}}{2\sigma^2}\,\bigg\|\frac{\vartheta_l}{t_l-t_{l-1}}\bigg\|^2_{H_{-1}\big(M;\,\frac{1}{t_l-t_{l-1}}\int_{t_{l-1}}^{t_l}\mu(s)\,ds\big)}\,.$$
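For the reader's convenience, the elementary optimization behind the last step is the following one-line computation (added here; it is not spelled out in the original text):
$$\sup_{c\in\mathbb R}\Big(c\,a-\frac{\sigma^2}{2}\,c^2\,b\Big)=\frac{a^2}{2\sigma^2 b}\,,\qquad a\in\mathbb R,\ b>0,$$
attained at $c=a/(\sigma^2b)$; applied with $a=\langle\vartheta_l,f\rangle$ and $b=\int_{t_{l-1}}^{t_l}\langle\mu(s),\|\mathrm{grad}f\|^2\rangle\,ds$, it produces the quotient appearing above.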

n n Since the mapping F :(Hp(M)) → {0} × (Hp(M)) with

n ! X F (ϑ1, . . . , ϑn) = 0, ϑ1, ϑ1 + ϑ2,..., ϑn l=1 is bijective and continuous we can apply the contraction principle, see for instance [DZ93,  (N) (N)  n Theorem 4.2.1]. The result is a LDP for (Mf (t0),..., Mf (tn)) in {0} × (Hp(M)) N∈N 44 CHAPTER 2. MODERATE DEVIATIONS

∗ n n+1 with rate function Λt0,...,tn . Finally, the set {0} × (Hp(M)) is closed in (Hp(M)) . There- n+1 fore, we can extend the large deviation principle to the larger space (Hp(M)) , see [DZ93, Lemma 4.1.5]. 2

In order to derive a large deviation principle for processes in C([0,T ]; Hp(M)) from the LDP for the finite dimensional distributions of these processes one requires a statement about exponential tightness in C([0,T ]; Hp(M)). Such a result we present in the following lemma.

Lemma 2.5.6 Fix $p<-\frac d2-1$, $T>0$ and $\vartheta\in C([0,T];H_r(M))$, for some $r\in\mathbb R$. Then for all $\alpha>0$ there exists a compact set $K_\alpha\subseteq C([0,T];H_p(M))$ with
$$\lim_{\alpha\to\infty}\limsup_{N\to\infty}\frac{1}{N\gamma_N^2}\ln P\big(\widetilde M^{(N)}\notin K_\alpha\big)=-\infty\,.$$

d Proof: Choose q ∈ R in such a way that − 2 −1 > q > p. For α > 0, define Kα := K1,α ∩K2,α with ( ) K := ϑ ∈ C([0,T ]; H (M)): sup kϑ(s)k ≤ α 1,α p Hq 0≤s≤T   ∞   \  1  K2 := ϑ ∈ C([0,T ]; Hp(M)): sup kϑ(t) − ϑ(s)k ≤ . Hp n n=1  0≤s,t≤T   |s−t|< 1  n3α

Then the sets Kα are closed subsets of C([0,T ]; Hp(M)). Therefore, A.2.2.(vi) together with A.2.2.(ii) yield the compactness of Kα. Using 2.5.1.(iv), we get

1  (N)  lim lim ln P Mf 6∈ K1,α = −∞ . α→∞ 2 N→∞ NγN Finally, from 2.5.1.(iv) it follows that     ∞ 1 (N) X (N) (N) P Mf 6∈ K2,α ≤ P  sup Mf (t) − Mf (s) >  0≤s,t≤T, |s−t|< 1 Hp n n=1 n3α ∞ 2 NγN α 1 X − 2 n+ ≤ e 8σ κp 4 n=1 Nγ2 α − N + 1 e 8σ2κp 4 = . Nγ2 α − N 1 − e 8σ2κp This implies 1  (N)  lim lim ln P Mf 6∈ K2,α = −∞ . α→∞ 2 N→∞ NγN 2

(N) Now we can state the first LDP for the measure valued martingales Mf , N ∈ N. 2.5. THE FREE MODEL 45

Proposition 2.5.7 Fix $p<-\frac d2-1$, $T>0$ and $\vartheta\in C([0,T];H_r(M))$, for some $r\in\mathbb R$. Assume that $X^{(N)}_\vartheta(0)$ converges, as $N\to\infty$, almost surely in $\mathcal P(M)$ to some non-random $\nu\in\mathcal P(M)$. Denote by $\mu\in C([0,T];\mathcal P(M))$ the McKean–Vlasov path with initial datum $\nu$. Then $(\widetilde M^{(N)})_{N\in\mathbb N}$ satisfies a LDP in $C([0,T];H_p(M))$ with rate $N\gamma_N^2$ and rate function
$$I(\vartheta)=\begin{cases}\displaystyle\sup_{t_0,\dots,t_n}\ \sum_{l=1}^n\frac{t_l-t_{l-1}}{2\sigma^2}\,\bigg\|\frac{\vartheta(t_l)-\vartheta(t_{l-1})}{t_l-t_{l-1}}\bigg\|^2_{H_{-1}\big(M;\,\frac{1}{t_l-t_{l-1}}\int_{t_{l-1}}^{t_l}\mu(s)\,ds\big)}, & \text{if }\vartheta(0)=0,\\[3mm] \infty, & \text{otherwise,}\end{cases}\qquad(2.40)$$
where the supremum is taken over all $0=t_0<t_1<\dots<t_n=T$ and all $n\in\mathbb N$.

Proof: Because of Proposition 2.5.5, the finite dimensional distributions of Mf(N) satisfy a large deviation principle. Therefore, we can take the projective limit, see [DZ93, Theorem (N) [0,T ] 4.6.1]. The result is a large deviation principle for (Mf )N∈N in (Hp(M)) with rate 2 [0,T ] NγN and rate function I. Here (Hp(M)) is equipped with the topology of pointwise convergence. (N) Because of Lemma 2.5.1.(v), the processes Mf , N ∈ N, are almost surely elements of C([0,T ]; Hp(M)). Moreover,

∞ X kϑ(t) − ϑ(s)k2 = (1 + λ )p hϑ(t) − ϑ(s), e i2 Hp l l l=1 t ∞ Z X p D 2E −p −2 ≤ (1 + λ ) µ(u), kgrad e k du + (1 + λ ) l kϑ(t) − ϑ(s)k t l l l R H−1(M; µ(u)du) l=1 s s ∞ 2  X −2 ≤ 2σ (t − s) κp + l I(ϑ) , l=1

for all $0\le s\le t\le T$. In other words, $I$ is infinite outside of $C([0,T];H_p(M))$. Therefore, we can restrict the LDP to the space $C([0,T];H_p(M))$ equipped with the topology induced by $(H_p(M))^{[0,T]}$, see [DZ93, Lemma 4.1.5]. This topology is weaker than the natural topology on $C([0,T];H_p(M))$, i.e., the topology of uniform convergence in time. But the sequence $(\widetilde M^{(N)})_{N\in\mathbb N}$ is, by Lemma 2.5.6, exponentially tight in $C([0,T];H_p(M))$ with respect to the uniform topology. Hence, the claimed large deviation principle follows, see [DZ93, Corollary 4.2.6]. $\Box$

Take $\vartheta\in C([0,T];H_p(M))$ such that $I(\vartheta)<\infty$. Then the formula (2.40) for $I(\vartheta)$ looks like a Riemann sum for the integral
$$\frac{1}{2\sigma^2}\int_0^T\big\|\dot\vartheta(s)\big\|^2_{H_{-1}(M;\mu(s))}\,ds.\qquad(2.41)$$
Therefore, we expect that for "nice" distribution valued functions $\vartheta\in C([0,T];H_p(M))$ the rate function $I$ is equal to (2.41), and that $I(\vartheta)=\infty$ for all other distribution valued functions. We will see that a distribution valued function $\vartheta\in C([0,T];H_p(M))$ is "nice" if and only if it is an element of $AC_\mu$.

We cannot prove this conjecture directly. But we are able to prove a second LDP for the sequence $(\widetilde M^{(N)})_{N\in\mathbb N}$ with another representation $\bar I$ of the rate function, and to show that the expected rate function lies between $I$ and $\bar I$. Therefore, let us finish this section with a second large deviation principle for the processes $\widetilde M^{(N)}$, $N\in\mathbb N$.

Proposition 2.5.8 Fix $p<-\frac d2-1$, $T>0$ and $\vartheta\in C([0,T];H_r(M))$, for some $r\in\mathbb R$. Assume that $X^{(N)}_\vartheta(0)$ converges, as $N\to\infty$, almost surely in $\mathcal P(M)$ to some non-random $\nu\in\mathcal P(M)$. Denote by $\mu\in C([0,T];\mathcal P(M))$ the McKean–Vlasov path with initial datum $\nu$. Then $(\widetilde M^{(N)})_{N\in\mathbb N}$ satisfies a LDP in $L_2([0,T];H_p(M))$ with rate $N\gamma_N^2$ and rate function
$$\bar I(\vartheta)=\begin{cases}\displaystyle\sup_{f}\ \frac{\Big(\int_0^T\langle\dot\vartheta(t),f(t)\rangle\,dt\Big)^2}{2\sigma^2\int_0^T\big\langle\mu(t),\|\mathrm{grad}f(t)\|^2\big\rangle\,dt}\,, & \text{if }\vartheta(0)=0,\\[4mm] \infty\,, & \text{otherwise.}\end{cases}\qquad(2.42)$$

Here $\dot\vartheta$ denotes the derivative in the distribution sense of $\vartheta\in\mathcal D'([0,T]\times M)$ with respect to the first variable. The supremum in the first line of the right hand side of (2.42) is taken over all $f\in C^\infty([0,T]\times M)$ with $\int_0^T\big\langle\mu(s),\|\mathrm{grad}\,f(s)\|^2\big\rangle\,ds>0$ and $f(T)=0$.

Proof: First, like in the proof of Lemma 2.5.3, we compute

  T   T 2 1 Z D E Z D E ln E exp Nγ2 M (N)(t), f(T ) = E  M (N)(t), f(t)  + O(γ ), 2   N f   f   N NγN 0 0

for all f ∈ L2([0,T ]; H−p), where O(γN ) are real numbers with limN→∞|O(γN )/γN | < ∞. Then we use Itˆo’sformula and Lemma 2.5.2 to show that

 T 2 T T 2 Z 2 Z * Z + D (N) E σ lim E  Mf (t), f(t)   = µ(t), grad f(s)ds dt =: Λ(f). N→∞   2 0 0 t

(N) By Lemma 2.5.6 the sequence (Mf )N∈N is exponentially tight in C([0,T ]; Hp(M)). This implies the exponential tightness of this sequence in L2([0,T ]; Hp(M)). Therefore, we can (N) apply the G¨artner–Ellis theorem and get a LDP for (Mf )N∈N in L2([0,T ]; Hp(M)) with 2 rate NγN and rate function

 2  T T* T + Z σ2 Z Z   I(ϑ) = sup  hϑ(t), f(t)i dt − µ(t), grad f(s)ds dt. f∈L2([0,T ];H−p(M)) 2 0 0 t

Since the first summand on the right hand side is continuous in f ∈ L2([0,T ]; H−p(M)) and ∞ C ([0,T ] × M) is a dense subset of L2([0,T ]; H−p(M)) it follows that

 T  Z I(ϑ) = sup  hϑ(t), f(t)i dt − Λ(f) . f∈C∞([0,T ]×M) 0 2.5. THE FREE MODEL 47

Replacing f by cf and taking first the supremum over c ∈ R, we get

2 T ! R hϑ(t), f(t)i dt 0 I(ϑ) = sup 2 f∈C∞([0,T ]×M) T * T +  2 T T 2σ2 R µ(t), grad R f(s)ds dt R µ(s), grad R f(u) du ds>0 0 t 0 s 2 T ! R d ϑ(t), dt g(t) dt 0 = sup T T g(·)=R f(s)ds∈C∞([0,T ]×M) 2 R D 2E · 2σ µ(t), kgrad g(s)k dt T 0 R hµ(s),kgrad g(s)k2ids>0 0 !2 T D E R ϑ˙(t), g(t) dt − hϑ(0), g(0)i 0 = sup ∞ T g∈C ([0,T ]×M), g(T )=0, D 2E T 2 R R 2 2σ µ(t), kgrad g(s)k dt hµ(s),kgrad g(s)k ids>0 0 0

In the last line ϑ˙ denotes the derivative of the distribution ϑ ∈ D0([0,T ] × M) with respect to the first coordinate. Finally, if ϑ(0) 6= 0 then one easily computes I(ϑ) = ∞. 2

2.5.2 Identification of the rate function

Let $\mu$ be an element of $C([0,T];\mathcal P(M))$ and define $H_{-1}(M;\mu(s))$ by (2.14), for $s\in[0,T]$. In this section we want to derive a "nice" representation of the rate function associated with the LDP for the processes $\widetilde M^{(N)}$, $N\in\mathbb N$, proven in Proposition 2.5.7. To this end, recall Definition 2.2.1 of the set $AC_\mu$ of all distribution valued functions $\vartheta\in C([0,T];\mathcal D'(M))$ which are absolutely continuous with respect to $\mu\in C([0,T];\mathcal P(M))$. The idea to look at this set of distributions is taken from the article [DG87] by Donald Dawson and Jürgen Gärtner. Therefore, let us first verify that our definition is a special case of the definition in that paper.

Lemma 2.5.9 Fix T > 0 and µ ∈ C([0,T ]; P(M)). Then each ϑ ∈ ACµ is absolutely continuous in the sense of Definition 4.1 of [DG87], i.e., there exists a neighborhood U of 0 ∞ in C (M) and an absolutely continuous function H: [0,T ] → R such that

| hϑ(t), fi − hϑ(s), fi | ≤ |H(t) − H(s)| , for all s, t ∈ [0,T ] and all f ∈ U.

Proof: Take U = {f ∈ C∞(M): kgradfk < 1} and

n X H(t) = sup kϑ(tl) − ϑ(tl−1)k 1 R tl . H−1(M; µ(t)dt) 0=t

Then H is absolutely continuous. Moreover, n X | hϑ(t), fi − hϑ(s), fi | ≤ sup kϑ(tl) − ϑ(tl−1)k 1 R tl H−1(M; µ(t)dt) s=t

for all 0 ≤ s ≤ t ≤ T . 2

The next lemma tells us that the time derivative of $\vartheta\in AC_\mu$ exists as a distribution valued, almost everywhere defined function. This explains why we call such a distribution valued function absolutely continuous.

Lemma 2.5.10 Fix $T>0$, $\mu\in C([0,T];\mathcal P(M))$ and assume that $\vartheta\in AC_\mu$. For each $f\in C^\infty(M)$, the real function $\langle\vartheta(\cdot),f\rangle$ is absolutely continuous. Moreover, the derivative
$$\dot\vartheta(t)=\lim_{h\to0}\frac{\vartheta(t+h)-\vartheta(t)}{h}$$
exists in the distribution sense for Lebesgue almost all $t\in[0,T]$, i.e., $\dot\vartheta(t)\in\mathcal D'(M)$ for Lebesgue almost all $t\in[0,T]$.

A proof can be found in [DG87, Lemma 4.2]. Moreover, we have the following integration by parts formula.

Lemma 2.5.11 Fix 0 ≤ s ≤ t ≤ T , µ ∈ C([0,T ]; P(M)) and assume that ϑ ∈ ACµ. Then

t t Z D E Z D E hϑ(t), f(t)i − hϑ(s), f(s)i = ϑ˙(u), f(u) du + ϑ(u), f˙(u) du (2.43) s s holds, for all f ∈ C∞([0,T ] × M). Moreover,

t Z

ϑ(t) − ϑ(s) − ϑ˙(u) du = 0 , (2.44)

s H−1(M;µ(t)) for all 0 ≤ s ≤ t ≤ T .

Proof: Formula (2.43) is taken from [DG87, Lemma 4.3]. To prove (2.44) we use (2.43) and get

 t 2 t 2 R ˙ Z ϑ(t) − ϑ(s) − ϑ(u)du, f ˙ s ϑ(t) − ϑ(s) − ϑ(u) du = sup D E f∈C∞(M) µ(t), kgradfk2 s 2 H−1(M;µ(t)) hµ(t),kgradfk i>0  t 2 R d ϑ(u), dt f du s = sup D E = 0 . f∈C∞(M) µ(t), kgradfk2 hµ(t),kgradfk2i>0 2 2.5. THE FREE MODEL 49

Before we will deduce a “nice” shaped representation of the rate function associated with (N) the large deviation principle for the measure valued martingales Mf , N ∈ N, we present a result for the H−1(M; µ) norm, which looks like a version of Fatou’s lemma.

Lemma 2.5.12 Fix µ ∈ P(M) and ϑ ∈ D0(M). Assume that

(k) (i) The sequence (µ )k∈N of probability measures on M converges weakly to µ. (k) (k) (ii) The sequence (ϑ )k∈N of elements of H−1(M; µ ) converges weakly to ϑ. Then 2 lim ϑ(k) ≥ kϑk2 . (2.45) (k) H−1(M;µ) k→∞ H−1(M;µ )

Proof: If the left hand side of (2.45) is equal to infinity then there is nothing to prove. 0 Otherwise, we define 0 := 0. Then we get 2 2 ϑ(k), f ϑ(k) ≥ , (2.46) (k) D E H−1(M;µ ) µ(k), kgradfk2

∞ D 2E for all k ∈ N and all f ∈ C (M), with µ, kgradfk = 1. Taking in (2.46) the limit inferior of k → ∞, it follows that

2 2 ϑ(k), f lim ϑ(k) ≥ lim (k) D E k→∞ H−1(M;µ ) k→∞ µ(k), kgradfk2 hϑ, fi2 = D E, µ, kgradfk2

D E for all f ∈ C∞(M), with µ, kgradfk2 = 1. Therefore,

2 hϑ, fi2 lim ϑ(k) ≥ sup (k) D E k→∞ H−1(M;µ ) f∈C∞(M) µ, kgradfk2 hµ,kgradfk2i=1 = kϑk2 . H−1(M;µ) 2 Now we are able to present a “nice” representation of the rate function associated with the (N) LDP for the measure valued martingales Mf , N ∈ N, proven in Proposition 2.5.7.

Lemma 2.5.13 Fix $p\le-\frac d2-1$, $T>0$ and $\nu\in\mathcal P(M)$. Denote by $\mu\in C([0,T];\mathcal P(M))$ the McKean–Vlasov path with initial datum $\nu$. Define the functional $\tilde I\colon C([0,T];H_p(M))\to[0,\infty]$ by
$$\tilde I(\vartheta):=\begin{cases}\dfrac{1}{2\sigma^2}\displaystyle\int_0^T\big\|\dot\vartheta(t)\big\|^2_{H_{-1}(M;\mu(t))}\,dt, & \vartheta\in AC_\mu,\ \vartheta(0)=0,\\[2mm] \infty, & \text{otherwise.}\end{cases}\qquad(2.47)$$

Then
$$I(\vartheta)=\tilde I(\vartheta)=\bar I(\vartheta)\,,\qquad(2.48)$$
for all $\vartheta\in C([0,T];H_p(M))$.

Proof: First a short sketch of the proof. We will show the following three parts

(1) I = I on C([0,T ]; Hp(M)).

(2) I(ϑ) ≤ Ie(ϑ), for all ϑ ∈ C([0,T ]; Hp(M)).

(3) Ie(ϑ) ≤ I(ϑ), for all ϑ ∈ C([0,T ]; Hp(M)).

(N) Part (1): The sequence (Mf )N∈N satisfies by Proposition 2.5.7 a large deviation principle in C([0,T ]; Hp(M)) with rate function I. If we furnish L2([0,T ]; Hp(M)) with the strongest topology τ, which induces on C([0,T ]; Hp(M)) the topology of uniform convergence, then C([0,T ]; Hp(M)) is a closed subset of L2([0,T ]; Hp(M)). Therefore, we can extent the LDP (N) holding for (Mf )N∈N to the larger space L2([0,T ]; Hp(M)) equipped with the topology τ, where the new rate function Iˆ is equal to I on C([0,T ]; Hp(M)). (N) Since the topology τ is stronger then the L2–topology (Mf )N∈N satisfies a LDP with rate function Iˆ in L2([0,T ]; Hp(M)) equipped with the usual topology. (N) Moreover, the sequence (Mf )N∈N satisfies by Proposition 2.5.8 a LDP with rate function I in L2([0,T ]; Hp(M)) equipped with the usual topology, too. Now since L2([0,T ]; Hp(M)) is a regular Hausdorff space I and Iˆ are equal, see [DZ93, Lemma 4.1.4], which implies I = I on C([0,T ]; Hp(M)). Part (2): If ϑ 6∈ ACµ or ϑ(0) 6= 0 then Ie(ϑ) = ∞ and there is nothing to prove. Now assume that ϑ ∈ ACµ with ϑ(0) = 0. Then by Lemma 2.5.10

ϑ(t + h) − ϑ(t) ϑ˙(t) := lim h→0 h exists in D0(M), for almost all t ∈ [0,T ]. Using the Cauchy–Schwarz inequality, we get

!2 T D E R ϑ˙(t), f(t) dt 0 2σ2 I(ϑ) = sup ∞ T f∈C ([0,T ]×M), f(T )=0, D 2E T R R 2 µ(t), kgradf(t)k dt hµ(s),kgrad f(s)k ids>0 0 0 T D E2 Z ϑ˙(t), f(t) ≤ sup D E dt f∈C∞([0,T ]×M), f(T )=0, µ(t), kgradf(t)k2 T 0 R hµ(s),kgrad f(s)k2ids>0 0 T D E2 Z ϑ˙(t), f(t) 2 ≤ sup D E dt ≤ 2σ Ie(ϑ) . f∈C∞([0,T ]×M), f(T )=0, µ(t), kgradf(t)k2 0 T R hµ(s),kgrad f(s)k2ids>0 0 2.5. THE FREE MODEL 51

Part (3): If ϑ(0) 6= 0 then there is nothing to prove. Now assume that ϑ ∈ ACµ with ϑ(0) = 0. For x ∈ R, denote by [x] the largest integer less or equal then x. For k ∈ N, take the following linear approximations of ϑ and µ

kt +1 T  kt T   kt T  [ T ] k ϑ [ T + 1] k − ϑ [ T ] k k Z ϑ(k)(t) := and µ(k)(t) := µ(s) ds , 2 T 2T k kt T [ T ] k (k) (k) for t ∈ [0,T ]. Then, as k → ∞, the sequences (ϑ (t))k∈N and (µ (t))k∈N converge weakly to ϑ˙(t) and µ(t), respectively, for almost all t ∈ [0,T ]. Now we compute n 2 2 X ϑ(tl) − ϑ(tl−1) 2σ I(ϑ) = sup (tl − tl−1) tl t − t 1 R 0=t0<...

Finally, assume that ϑ 6∈ ACµ and ϑ(0) = 0. Then there exist a positive number δ and sequences 0 ≤ tn < . . . < tn ≤ T , n ∈ , such that 1 kn N k Xn lim (tn − tn ) = 0 n→∞ l l−1 l=1 and kn X n n ϑ(t ) − ϑ(t ) tn ≥ δ, for all n ∈ . l l−1 l N 1 R l=1 H−1(M; tn−tn µ(t)dt) l l−1 tn l−1 Fix % > 0 arbitrarily. Then we get

 t  n Z l X D 2E I(ϑ) = sup sup hϑ(tl) − ϑ(tl−1), fi − µ(t), kgradfk dt ∞   0=t0<...

∞ For each 1 ≤ l ≤ kn, choose fl ∈ C (M) in such a way that

n n % n n ϑ(t ) − ϑ(t ), f ≥ ϑ(t ) − ϑ(t ) tn l l−1 l l l−1 l 2 1 R H−1(M; tn−tn µ(t)dt) l l−1 tn l−1

n tl 1 R D 2E 2 and tn−tn µ(t), kgrad flk dt = % . Then we can proceed with l l−1 n tl−1   kn X % n n n n 2 %δ I(ϑ) ≥ lim ϑ(t ) − ϑ(t ) tn − (t − t )% ≥ .  l l−1 l l l−1  n→∞ 2 1 R  2 l=1 H−1(M; tn−tn µ(t)dt) l l−1 tn l−1

Since % > 0 was arbitrarily it follows that

I(ϑ) = ∞ = Ie(ϑ).

2

2.5.3 Large deviations for the free model

Fix a sequence $(\gamma_N)_{N\in\mathbb N}$ of positive numbers which satisfies (2.1). Denote by $\mu^{(N)}$ the McKean–Vlasov path with initial datum $X^{(N)}_\vartheta(0)$. Then $\big(\frac{1}{\gamma_N}(X^{(N)}_\vartheta-\mu^{(N)})\big)_{N\in\mathbb N}$ satisfies the following large deviation principle.

Theorem 2.5.14 Fix $T>0$, $p<-\frac d2-1$ and $\vartheta\in C([0,T];\mathcal D'(M))$. Assume that $X^{(N)}_\vartheta(0)$ converges, as $N\to\infty$, almost surely in $\mathcal P(M)$ to some non-random $\nu\in\mathcal P(M)$. Denote by $\mu\in C([0,T];\mathcal P(M))$ the McKean–Vlasov path with initial datum $\nu$. Then the processes $Z^{(N)}:=\frac{1}{\gamma_N}\big(X^{(N)}_\vartheta-\mu^{(N)}\big)$, $N\in\mathbb N$, satisfy a LDP in $C([0,T];H_{p-1}(M))$ with rate $N\gamma_N^2$ and rate function
$$I_\vartheta(\vartheta)=\begin{cases}\dfrac{1}{2\sigma^2}\displaystyle\int_0^T\big\|\dot\vartheta(t)-(L^{\mu(t)})^*\vartheta(t)-B(\vartheta(t))^*\mu(t)\big\|^2_{H_{-1}(M;\mu(t))}\,dt, & \vartheta\in AC_\mu,\ \vartheta(0)=0,\\[2mm] \infty, & \text{otherwise.}\end{cases}\qquad(2.49)$$

(N) Proof: First let us study the solution Ze ∈ C([0,T ]; Hp−1(M)) of

t t Z Z Ze(N)(t) = (Lµ(s))∗Ze(N)(s) ds + B(ϑ(s))∗µ(s) ds + Mf(N)(t) . 0 0

From Proposition 2.3.3 and Corollary 2.3.6 it follows that Ze(N) is the image of Mf(N) with respect to a continuous mapping from C([0,T ]; Hp(M)) to C([0,T ]; Hp−1(M)). Therefore, (N) we can apply the contraction principle, see [DZ93, Theorem 4.2.1], to the LDP for (Mf )N∈N (N) proven in Proposition 2.5.7. The result is a LDP for (Ze )N∈N in C([0,T ]; Hp−1(M)) with 2 rate NγN . The corresponding rate function is by Lemma 2.5.13 equal to Iϑ. 2.5. THE FREE MODEL 53

Moreover, u := Z(N) − Ze(N) solves  d  − (Lµ(t))∗ u(t) = B(ϑ(t))∗(X(N)(t) − µ(t)) + B(µ(N)(t) − µ(t))∗Z(N)(t). dt ϑ Again by Proposition 2.3.3 we get

2 2 sup Z(N) − Z(N) ≤ c sup B(ϑ(t))∗(X(N)(t) − µ(t)) (2.50) e ϑ t∈[0,T ] Hp−1 t∈[0,T ] Hp−2 ! 2 (N) ∗ (N) + sup B(µ (t) − µ(t)) Z (t) , t∈[0,T ] Hp−2

for some constant c > 0. Since ϑ is an element of C([0,T ]; Hr(M)) it follows from Lemma 2.1.7 that

B(ϑ(t))∗(X(N)(t) − µ(t)) ≤ c X(N)(t) − µ(t)) , (2.51) ϑ ϑ Hp−2 Hp−1

B(µ(N)(t) − µ(t))∗Z(N)(t) ≤ c µ(N)(t) − µ(t) Z(N)(t) , (2.52) Hp−2 Hp Hp−1 for all t ∈ [0,T ] and some constant c > 0. Moreover, u := X(N) − µ solves ϑ t Z   ∗  u(t)−(X(N)(0)−ν) = (Lµ(s))∗u(s) + B µ(N)(s) − µ(s) + γ ϑ(s) X(N)(s) ds+γ M (N)(t), ϑ N ϑ N f 0 which, using ones more Proposition 2.3.3, implies

2 2 2 sup X(N)(t) − µ(t) ≤ c X(N)(0) − ν + γ2 sup M (N)(t) ϑ ϑ N f t∈[0,T ] Hp−1 Hp−1 t∈[0,T ] Hp !  ∗ 2 + sup B µ(N)(t) − µ(t) + γ ϑ(t) X(N)(t) , N ϑ t∈[0,T ] Hp−2

d for some constant c > 0. Furthermore, since ϑ is an element of C([0,T ]; Hr(M)) and p < − 2 we get

B(µ(N)(t) − µ(t) + γ ϑ(t))∗X(N)(t) N ϑ Hp−2 ! (N) (N) ≤ c γN sup ϑ(t) + µ (t) − µ(t) X (t) Hr ϑ t∈[0,T ] Hp−1 Hp−1   (N) ≤ c γN + µ (t) − µ(t) , Hp−1 for all t ∈ [0,T ] and some constant c > 0, which may change from line to line. This together with (2.51) implies

2 2 2 sup B(ϑ(t))∗(X(N)(t) − µ(t)) ≤ c X(N)(0) − ν + sup µ(N)(t) − µ(t) ϑ ϑ t∈[0,T ] Hp−2 Hp−1 t∈[0,T ] Hp−1 !! 2 2 (N) + γN 1 + sup Mf (t) . t∈[0,T ] Hp 54 CHAPTER 2. MODERATE DEVIATIONS

In order to estimate (2.52) let us look at the process Z(N). It solves

t Z   Z(N)(t) − Z(N)(0) = (Lµ(s))∗Z(N)(s) + B(ϑ(s))∗X(N)(s) ds ϑ 0 t Z + B(µ(N)(s) − µ(s))∗Z(N)(s)ds + Mf(N)(t), 0 for all t ∈ [0,T ]. Applying once more Proposition 2.3.3, we get t 2 Z 2 2 sup Z(N)(s) ≤ c B(ϑ(s))∗X(N)(s) ds + sup M (N)(s) ϑ s∈[0,t] Hp−1 Hp−2 s∈[0,T ] Hp 0 t ! Z 2 + B(µ(N)(s) − µ(s))∗Z(N)(s) ds , Hp−2 0 for all t ∈ [0,T ] and some constant c > 0. Moreover,

2 2 sup B(ϑ(s))∗X(N)(s) ≤ c sup ϑ(s) sup kηk2 < ∞ ϑ Hr Hp−1 s∈[0,T ] Hp−2 s∈[0,T ] η∈P(M) and 2 B(µ(N)(s) − µ(s))∗Z(N)(s) ≤ c sup kηk2 Z(N)(s) , Hp−1 Hp−2 η∈P(M) Hp−1 for some constant c > 0. Therefore,

 t  2 2 Z 2 (N) (N) (N) sup Z (s) ≤ c 1 + sup M (s) + Z (s) ds , s∈[0,t] Hp−1 s∈[0,T ] Hp Hp−1 0 for all t ∈ [0,T ] and some constant c > 0. Applying Gronwall’s inequality, see [DZ93, Lemma E.6], it follows that ! 2 2 (N) (N) sup Z (s) ≤ c 1 + sup M (s) , s∈[0,T ] Hp−1 s∈[0,T ] Hp for some constant c > 0. Using this, we can estimate (2.52) as follows ! 2 2 2 (N) ∗ (N) (N) (N) sup B(µ (t) − µ(t)) Z (t) ≤ c sup µ (t) − µ(t) 1 + sup M (t) . t∈[0,T ] Hp−2 t∈[0,T ] Hp−1 t∈[0,T ] Hp All together, we get 2 (N) (N) sup Z (t) − Ze (t) t∈[0,T ] Hp−1 ! !! 2 2 2 ≤ c X(N)(0) − ν + sup µ(N)(t) − µ(t) + γ2 2 + sup M (N)(t) . ϑ N f Hp−1 t∈[0,T ] Hp−1 t∈[0,T ] Hp 2.6. THE COUPLED MODEL 55

Because of p < − d + 1, the almost surely weak convergence of the sequence (X(N)(0)) to 2 ϑ N∈N ν ∈ P(M) implies the almost surely convergence in Hp−1(M). Applying Corollary 2.4.4, it follows that

(N) lim sup µ (t) − µ(t) = 0, almost surely. N→∞ t∈[0,T ] Hp−1 Hence, for each ε > 0, we get ! 1 lim ln P sup Z(N)(t) − Z(N)(t) > ε 2 e N→∞ NγN t∈[0,T ] Hp−1 ! 1 2 ≤ lim lim ln P sup M (N)(t) > K , 2 f K→∞ N→∞ NγN t∈[0,T ] Hp

(N) (N) which is by Lemma 2.5.1.(iii) equal to −∞. In other words, (Z )N∈N and (Ze )N∈N are (N) exponentially equivalent in C([0,T ]; Hp−1(M)). Hence, the same LDP holds for (Z )N∈N (N) as it holds for (Ze )N∈N, see [DZ93, Theorem 4.2.13]. 2

2.6 The coupled model

In this section we prove the moderate deviation principle for the measure valued processes (N) (N) d X − µ presented in Theorem 2.2.2. Fix some p < − 2 − 1 and some sequence (γN )N∈N of positive numbers, which satisfies (2.1). We have to estimate the probability for X(N) − µ(N) being in a γN –neighborhood of some ϑ ∈ C([0,T ]; Hp−1(M)). As we will see in Section 2.6.1 this probability is almost the same as for the free system X(N) − µ(N). This will give us a ϑ “weak” LDP, i.e., a LDP, where the upper bound 1.1.1.(ii) only holds for compact sets. In order to derive the full LDP we have to extend this to all closed subsets. One way to do 1 (N) (N) this is to show exponential tightness of the processes (X − µ ) in C([0,T ]; Hp−1(M)) γN 2 with rate NγN . Unfortunately, we cannot prove this directly. But in Section 2.6.2 we will construct a family of processes, which is exponentially tight and exponentially equivalent to 1 (X(N) − µ(N)), N ∈ . In Section 2.6.3 we will see that this together with the local result γN N is enough to prove Theorem 2.2.2.

2.6.1 Local large deviations As claimed before we want to compare the coupled model with the free one in a neighborhood

of ϑ ∈ C([0,T ]; Hp−1(M)). In order to do this we fix a sequence (γN )N∈N of positive numbers, which satisfies (2.1). For a distribution valued function ϑ ∈ C([0,T ]; Hp−1(M)), define the free particle system by (2.28) and the corresponding measure valued empirical process X(N) by ϑ (2.29). Denote by µ(N) the McKean–Vlasov path with initial datum X(N)(0) and take p < − d − ϑ 2 1 (N) (N) 1. From Theorem 2.5.14 we know that (X − µ ) satisfies a LDP in C([0,T ]; Hp−1(M)) γN ϑ 2 with rate NγN and rate function Iϑ. Note that Iϑ(ϑ) = I(ϑ), for all ϑ ∈ C([0,T ]; Hp−1(M)). Moreover, we denote by X(N) the measure valued empirical process of the coupled particle system (2.2). We take the same initial data for the free and the coupled model. Then in a γ –neighborhood of ϑ the processes X(N) − µ(N) and X(N) − µ(N) are almost the same. The N ϑ following theorem will make this statement more precise. 56 CHAPTER 2. MODERATE DEVIATIONS

Theorem 2.6.1 Fix $T>0$ and $p<-\frac d2-1$. Assume that $X^{(N)}(0)$ converges, as $N\to\infty$, almost surely in $\mathcal P(M)$ to some non-random $\nu\in\mathcal P(M)$. Denote by $\mu\in C([0,T];\mathcal P(M))$ the McKean–Vlasov path with initial datum $\nu$. Then the following assertions are valid for each $\vartheta\in C([0,T];H_{p-1}(M))$ with $\vartheta(0)=0$.

(i) For each open neighborhood $V$ of $\vartheta$,
$$\liminf_{N\to\infty}\frac{1}{N\gamma_N^2}\ln P\Big(\frac{1}{\gamma_N}\big(X^{(N)}-\mu^{(N)}\big)\in V\Big)\ \ge\ -I(\vartheta)\,.$$

(ii) For each $\delta>0$, there exists an open neighborhood $V$ of $\vartheta$ such that
$$\limsup_{N\to\infty}\frac{1}{N\gamma_N^2}\ln P\Big(\frac{1}{\gamma_N}\big(X^{(N)}-\mu^{(N)}\big)\in V\Big)\ \le\ \begin{cases}-I(\vartheta)+\delta, & I(\vartheta)<\infty,\\ -\delta, & I(\vartheta)=\infty.\end{cases}$$

N Proof: Fix ϑ ∈ C([0,T ]; Hp−1(M)) with ϑ(0) = 0 and take (x1, . . . , xN ) ∈ M with

N 1 X δ = X(N)(0). N xk k=1

We denote by P (N) ∈ P(C([0,T ]; MN )) the distribution of the coupled particle system (x1(·), . . . , xN (·)) with initial datum (x1, . . . , xN ) defined by (2.2). Furthermore, we denote by (N) N Pe the distribution on C([0,T ]; M ) of the free particle system (xe1(·),..., xeN (·)) generated µ(N)+γ ϑ (N) by L N with initial datum (x1, . . . , xN ), see (2.28). By Ee we denote the expectation with respect to Pe(N). The Cameron–Martin–Girsanov formula tells us, that P (N) and Pe(N) are equivalent with Radon–Nikodym derivative

P (N)  1  = exp M(T ) − [[M,M]]T , Pe(N) 2

where M is a continuous Pe(N) martingale with M(0) = 0 and quadratic variation

T* N N 2+ Z 1 X 1 X [[M,M]] (y(·)) = N δ , B( δ ) − B(µ(N)(s) + γ ϑ) ds, t yk(s) yk(s) N N N 0 k=1 k=1

N y(·) = (y1(·), . . . , yN (·)) ∈ C([0,T ]; M ). In order to prove (i) fix δ > 0 arbitrarily. Assume without loss of generality that I(ϑ) < ∞. Since B satisfies Assumption 2.1.2 there exists for each ε > 0 an open neighborhood of ϑ ∈ C([0,T ]; Hp−1(M)) such that 2 ε sup B(ϑ)(x) − B(ϑ)(x) ≤ , x∈M T

2 N for all ϑ ∈ V . This implies [[M,M]]T (y(·)) ≤ εNγN , for all y(·) ∈ C([0,T ]; M ) with

N ! 1 1 X (N) δyk(·) − µ (·) ∈ V. γN N k=1 2.6. THE COUPLED MODEL 57

Take r, q > 1 with r−1 + q−1 = 1. Using H¨older’s inequality, we get  1  P (X(N) − µ(N)) ∈ V γN N ! ! (N) 1 1 X (N) = P δxk(·) − µ ∈ V γN N k=1   (N) M(T )− 1 [[M,M]] = E e 2 T 1( ! ) e  N  1 1 P δ −µ(N) ∈V γ N xk(·) N k=1   − q+r Nγ2 ε (N) M(T )+ q [[M,M]] ≥ e 2r N E e 2r T 1( ! ) e  N  1 1 P δ −µ(N) ∈V γ N xk(·) N k=1 N ! !r − r q+r 2  q 1 q q  q 1 1 X − 2r NγN ε (N) − r M(T )− 2 [[ r M, r M]]T (N) (N) ≥ e Ee e Pe δxk(·) − µ ∈ V . γN N k=1 Since  q 1 q q  exp − M − [[ M, M]] r 2 r r is a Pe(N) martingale starting at one we can proceed with   1 1 (N) (N) lim 2 ln P (X − µ ) ∈ V N→∞ NγN γN N ! ! q + r 1 1 1 X ≥ − ε + r ln P (N) δ − µ(N) ∈ V lim 2 e xk(·) 2r N→∞ Nγ γN N N k=1 q + r 1  1  = − ε + r lim ln P (X(N) − µ(N)) ∈ V 2 ϑ 2r N→∞ NγN γN q + r q + r ≥ − ε − rI (ϑ) = − ε − rI(ϑ) . 2r ϑ 2r Here we used the LDP for (X(N)) proven in Theorem 2.5.14. Taking r > 1 and ε > 0 ϑ N∈N small enough, it follows that   1 1 (N) (N) lim 2 ln P (X − µ ) ∈ V ≥ −I(ϑ) − δ . N→∞ NγN γN Since δ > 0 was arbitrary we get (i). In order to prove (ii) we only want to look at the case I(ϑ) < ∞. The other case follows by similar arguments. Fix δ > 0 arbitrary. Since (X(N)) satisfies a LDP we can choose ϑ N∈N an open neighborhood V of ϑ in such a way that   1 (N) (N) δ δ P (X − µ ) ∈ V ≤ −Iϑ(ϑ) + = −I(ϑ) + γN ϑ 2 2 and ε sup B(ϑ)(x) − B(ϑ)(x) ≤ , for all ϑ ∈ V. x∈M T 58 CHAPTER 2. MODERATE DEVIATIONS

Take r, q > 1 with r−1 + q−1 = 1. Then, using once more H¨older’s inequality, we compute  1  P (X(N) − µ(N)) ∈ V γN N ! ! (N) 1 1 X (N) = P δxk(·) − µ ∈ V γN N k=1   (N) M(T )− 1 [[M,M]] = E e 2 T 1( ! ) e  N  1 1 P δ −µ(N) ∈V γ N xk(·) N k=1   r−1 Nγ2 ε (N) M(T )− r [[M,M]] ≤ e 2 N E e 2 T 1( ! ) e  N  1 1 P δ −µ(N) ∈V γ N xk(·) N k=1 1 1 N ! ! q r−1 2  1  r 1 1 X 2 NγN ε (N) rM(T )− 2 [[rM,rM]]T (N) (N) ≤ e Ee e Pe δxk(·) − µ ∈ V . γN N k=1

1 (N) Since exp(rM − 2 [[rM, rM]]) is a Pe martingale starting at one it follows that   1 1 (N) (N) lim 2 ln P (X − µ ) ∈ V N→∞ NγN γN N ! ! r − 1 1 1 1 1 X ≤ ε + lim ln P (N) δ − µ(N) ∈ V 2 e xk(·) 2 q N→∞ Nγ γN N N k=1 r − 1 1 1  1  = ε + lim ln P (X(N) − µ(N)) ∈ V 2 ϑ 2 q N→∞ NγN γN r − 1 1 δ ≤ ε − (I(ϑ) − ). 2 q 2 Therefore, taking q > 1 and ε > 0 small enough, we finally get   1 1 (N) (N) lim 2 ln P (X − µ ) ∈ V ≤ −I(ϑ) + δ . N→∞ NγN γN 2

2.6.2 Exponential tightness

As in the sections before, fix a measure dependent vector field $B$ which satisfies Assumption 2.1.2 and a sequence $(\gamma_N)_{N\in\mathbb N}$ of positive numbers which fulfills (2.1). Moreover, we define the measure valued empirical process $X^{(N)}$ of the coupled particle system (2.2) by (2.3). Denote by $\mu^{(N)}$ the McKean–Vlasov path with initial datum $X^{(N)}(0)$. In this section we want to construct measure valued processes $\widetilde Z^{(N)}$, $N\in\mathbb N$, which are exponentially tight in $C([0,T];H_{p-1}(M))$ and exponentially equivalent to $\frac{1}{\gamma_N}(X^{(N)}-\mu^{(N)})$, both with rate $N\gamma_N^2$. Therefore, we first study the distribution valued martingales $M^{(N)}$, $N\in\mathbb N$, defined by
$$\big\langle M^{(N)}(t),f\big\rangle:=\Big\langle\frac{X^{(N)}(t)-X^{(N)}(0)}{\gamma_N},f\Big\rangle-\frac{1}{\gamma_N}\int_0^t\big\langle X^{(N)}(s),L^{X^{(N)}(s)}f\big\rangle\,ds,\qquad(2.53)$$
for all $f\in C^\infty(M)$. The next lemma gives us some properties of these distribution valued martingales, especially the exponential tightness in $C([0,T];H_p(M))$.

(N) Lemma 2.6.2 Fix T > 0 and define M , N ∈ N, by (2.53). Then the statements (ii)–(v) of Lemma 2.5.1 are valid for M (N) and X(N) instead of M (N) and X(N), respectively. f ϑ d Fix p < − 2 − 1. Then for each α > 0 there exists a compact set Kα ⊆ C([0,T ]; Hp−1(M)) with 1  (N)  lim lim ln P M 6∈ Kα = −∞ . (2.54) α→∞ 2 N→∞ NγN

Proof: In the proof of parts (ii)–(v) of Lemma 2.5.1 we only used that the distribution valued (N) martingales Mf , N ∈ N, of the free model have quadratic covariation

t D E D E Z D E [[ M (N), f , M (N), g ]] = X(N)(s), (gradf, gradg) ds. f f t ϑ 0

The same is true for $M^{(N)}$ with $X^{(N)}$ instead of $X^{(N)}_\vartheta$. In order to check (2.54), we can follow the proof of Lemma 2.5.6. $\Box$

Now suppose that X(N)(0) converges, as N → ∞, almost surely in P(M) to some non– random measure ν ∈ P(M). Denote by µ the McKean–Vlasov path with initial datum ν. (N) For N ∈ N, we define the process Ze ∈ C([0,T ]; Hp−1(M)) as the unique solution of

t Z Ze(N)(t) = (Gµ(s))∗Ze(N)(s) ds + M (N)(t) . (2.55) 0

This is a kind of linearization for the process 1 (X(N) −µ(N)) in the infinite dimensional space γN C([0,T ]; Hp−1(M)). From Proposition 2.3.3 we know, that a solution of (2.55) exists, for all d T > 0 and all p < − 2 − 1. The next lemma will transfer the exponential tightness of the (N) (N) sequence (M )N∈N to the sequence (Ze )n∈N.

d (N) Lemma 2.6.3 Fix T > 0 and p < − 2 − 1. Define Ze by (2.55). (N) 2 Then Ze is exponentially tight in C([0,T ]; Hp−1(M)) with rate NγN , i.e., for each α > 0 there exists a compact set Kα ⊆ C([0,T ]; Hp−1(M)) such that

1  (N)  lim lim ln P Ze 6∈ Kα = −∞ . α→∞ 2 N→∞ NγN

Proof: We know from Proposition 2.3.3 that Ze(N) is the image of M (N) under a bijective con- tinuous mapping from C([0,T ]; Hp(M)) to C([0,T ]; Hp−1(M)). Because exponential tight- ness is conserved under continuous mappings, our claim follows from 2.6.2. 2   Finally, (Z(N)) and 1 (X(N) − µ(N)) are closely related. This means, they are e N∈N γN N∈N 2 exponentially equivalent in C([0,T ]; Hp−1(M)) with rate NγN . 60 CHAPTER 2. MODERATE DEVIATIONS

Lemma 2.6.4 Fix T > 0 and p < − d − 1. Assume that X(N)(0) converges, as N → ∞, 2 ϑ almost surely in P(M) to some non–random ν ∈ P(M). Denote by µ ∈ C([0,T ]; P(M)) the (N) McKean–Vlasov path with initial datum ν. For N ∈ N, define Ze by (2.55). (N) Then the sequences (Ze )N∈N and

(N)  1 (N) (N)  (Z )N∈N := (X − µ ) (2.56) γN N∈N 2 are exponentially equivalent in C([0,T ]; Hp−1(M)) with rate NγN , i.e., ! 1 lim ln P sup Z(N)(t) − Z(N)(t) > δ = −∞, (2.57) 2 e N→∞ NγN 0≤t≤T Hp−1 for all δ > 0.

(N) Proof: For N ∈ N, let us first look at Z . This process solves t Z   Z(N)(t) = M (N)(t) + (Lµ(s))∗Z(N)(s) + B(Z(N)(s))∗X(N)(s) + B(µ(N)(s) − µ(s))∗Z(N)(s) ds, 0 for all t ∈ [0,T ]. Applying Proposition 2.3.3, we get

2 (N) sup Z (s) (2.58) s∈[0,t] Hp−1 t 2 Z 2 2 (N) (N) ∗ (N) (N) ∗ (N) ≤ c sup M (s) + c B(Z (s)) X (s) + B(µ (s) − µ(s)) Z (s) ds, s∈[0,T ] Hp Hp−2 Hp−2 0 for all t ∈ [0,T ] and some constant c > 0. Lemma 2.1.7 implies

2 2 B(Z(N)(s))∗X(N)(s) ≤ c Z(N)(s) sup kηk2 , Hp−1 Hp−2 Hp−1 η∈P(M) 2 2 B(µ(N)(s) − µ(s))∗Z(N)(s) ≤ c Z(N)(s) sup kηk2 , Hp−1 Hp−2 Hp−1 η∈P(M)

d for all s ∈ [0,T ] and some a constant c > 0. Since p < − 2 it follows from Lemma A.2.3.(iii) that sup kνk2 < ∞. Hp−1 ν∈P(M) This implies

 t  2 Z 2 2 (N) (N) (N) sup Z (s) ≤ c  Z (s) ds + sup M (t)  , s∈[0,t] Hp−1 Hp−1 s∈[0,T ] Hp 0 for all t ∈ [0,T ] and some constant c > 0. Applying Gronwall’s inequality, see [DZ93, Lemma E.6], we get 2 2 (N) (N) sup Z (s) ≤ c sup M (t) , (2.59) s∈[0,T ] Hp−1 s∈[0,T ] Hp 2.6. THE COUPLED MODEL 61 for some constant c > 0. Now we can start with the proof of (2.57). By definition Z(N) − Ze(N) solves

t Z Z(N)(t) − Ze(N)(t) = (Gµ(s))∗(Z(N)(s) − Ze(N)(s)) ds 0 t Z   + B(Z(N)(s))∗(X(N)(s) − µ(s)) + B(µ(N)(s) − µ(s))∗Z(N)(s) ds, 0 for all t ∈ [0,T ]. Now Proposition 2.3.3 yields 2 (N) (N) sup Z (t) − Ze (t) (2.60) t∈[0,T ] Hp−1 T Z 2 ≤ B(Z(N)(t))∗(X(N)(t) − µ(t)) + B(µ(N)(t) − µ(t))∗Z(N)(t) dt . Hp−2 0 Because of Lemma 2.1.7, the following inequalities are valid

B(Z(N)(t))∗(X(N)(t) − µ(t)) Hp−2 (N) ∗ (N) (N) ∗ (N) ≤ γN B(Z (t)) Z (t) + B(Z (t)) (µ (t) − µ(t)) Hp−2 Hp−2  2  (N) (N) (N) ≤ c γN Z (t) + Z (t) µ (t) − µ(t) Hp−1 Hp−1 Hp−1 and B(µ(N)(t) − µ(t))∗Z(N)(t) ≤ c µ(N)(t) − µ(t) Z(N)(t) , Hp−2 Hp−1 Hp−1 for all t ∈ [0,T ] and some constant c > 0. Therefore, using (2.59), we conclude that 2 B(Z(N)(t))∗(X(N)(t) − µ(t)) + B(µ(N)(t) − µ(t))∗Z(N)(t) Hp−2 ! 4 2 2 2 (N) (N) (N) ≤ c γN sup M (t) + sup µ (t) − µ(t) sup M (t) , t∈[0,T ] Hp t∈[0,T ] Hp−1 t∈[0,T ] Hp for some constant c > 0. Inserting the last estimate in (2.60), we get 2 (N) (N) sup Z (t) − Ze (t) (2.61) t∈[0,T ] Hq−1 ! 2 2 2 (N) 2 (N) (N) ≤ c sup M (t) γN sup M (t) + sup µ (t) − µ(t) , t∈[0,T ] Hp t∈[0,T ] Hp t∈[0,T ] Hp−1 for some constant c > 0. d (N) Since p − 1 < − 2 the almost surely weak convergence of the sequence (X (0))N∈N to ν implies the almost surely converge in Hp−1(M). Therefore, by Corollary 2.4.4 we get

(N) lim sup µ (t) − µ(t) = 0, almost surely. N→∞ t∈[0,T ] Hp−1 62 CHAPTER 2. MODERATE DEVIATIONS

Finally, using (2.61) and Lemma 2.6.2, we conclude that ! 1 2 lim ln P sup Z(N)(t) − Z(N)(t) > δ 2 e N→∞ NγN 0≤t≤T Hq−1 ! 1 2 ≤ lim lim ln P sup M (N)(t) > K 2 K→∞ N→∞ NγN t∈[0,T ] Hq−1 = −∞ , for each δ > 0. 2

2.6.3 Proof of Theorem 2.2.2

In this section we finally piece everything together and prove Theorem 2.2.2. Therefore, fix a distribution dependent vector field $B$ which satisfies Assumption 2.1.2. For each sequence $(\gamma_N)_{N\in\mathbb N}$ of positive numbers which fulfills (2.1), define the processes $\widetilde Z^{(N)}$ and $Z^{(N)}$ by (2.55) and (2.56), respectively.

Proof of Theorem 2.2.2: Fix some sequence $(\gamma_N)_{N\in\mathbb N}$ of positive numbers which satisfies (2.1). From Lemma 2.6.4 we know that the sequences $(Z^{(N)})_{N\in\mathbb N}$ and $(\widetilde Z^{(N)})_{N\in\mathbb N}$ are exponentially equivalent in $C([0,T];H_{p-1}(M))$ with rate $N\gamma_N^2$. This implies that the local result of Theorem 2.6.1, which holds for the sequence $(Z^{(N)})_{N\in\mathbb N}$, holds for $(\widetilde Z^{(N)})_{N\in\mathbb N}$, too. Since, in addition, $(\widetilde Z^{(N)})_{N\in\mathbb N}$ is by Lemma 2.6.3 exponentially tight in $C([0,T];H_{p-1}(M))$ with rate $N\gamma_N^2$, it follows that $(\widetilde Z^{(N)})_{N\in\mathbb N}$ satisfies a large deviation principle in $C([0,T];H_{p-1}(M))$ with rate $N\gamma_N^2$ and rate function $I$. Now, using once more the exponential equivalence of $(Z^{(N)})_{N\in\mathbb N}$ and $(\widetilde Z^{(N)})_{N\in\mathbb N}$, we get the same LDP for $(Z^{(N)})_{N\in\mathbb N}$. Since the sequence $(\gamma_N)_{N\in\mathbb N}$ was arbitrary, the claimed moderate deviation principle follows.
Assume that $I(\vartheta)=0$, for some $\vartheta\in C([0,T];H_{p-1}(M))$. Then

$$\Big\|\Big(\frac{d}{dt}-(G^{\mu(t)})^*\Big)\vartheta(t)\Big\|_{H_{-1}(M;\mu(t))}=0,\qquad\text{for almost all }t\in[0,T].$$
Because $\mu$ is a McKean–Vlasov path, Lemma 2.4.2 yields $\|\mu(t)\|_{C^0}<\infty$, for all $t\in(0,T]$. Therefore, we get
$$\Big\|\Big(\frac{d}{dt}-(G^{\mu(t)})^*\Big)\vartheta(t)\Big\|_{H_{-1}}=0,\qquad\text{for almost all }t\in[0,T].$$
This implies
$$\Big(\frac{d}{dt}-(G^{\mu(t)})^*\Big)\vartheta(t)=0,\qquad\text{for almost all }t\in[0,T].$$
Because of $I(\vartheta)=0$, the distribution valued function $\vartheta$ is an element of $AC_\mu$ with $\vartheta(0)=0$. Therefore, we can proceed with
$$\vartheta(t)-\int_0^t(G^{\mu(s)})^*\vartheta(s)\,ds=0,\qquad\text{for all }t\in[0,T].$$
Applying Proposition 2.3.3, it follows that $\vartheta(t)=0$, for all $t\in[0,T]$. $\Box$

2.7 Notes

As in Section 1.5 we want to analyze the mean field model in $\mathbb R^d$, i.e., the particle system
$$dx_k(t)=\big(-\operatorname{grad}V(x_k(t))+B(X^{(N)})\big)\,dt+\sigma\,dW_k(t).$$
Here, $W_1,\dots,W_N$ are independent Brownian motions on $\mathbb R^d$ and $V\colon\mathbb R^d\to\mathbb R$ is a suitable potential. As in our case, we define the measure valued empirical processes $X^{(N)}$ by

$$X^{(N)}(t):=\frac1N\sum_{k=1}^N\delta_{x_k(t)}$$
and denote by $\mu^{(N)}$ the McKean–Vlasov path with initial datum $X^{(N)}(0)$. Our proof of the moderate deviation principle requires the compactness of the manifold $M$. Without compactness of the space we get several technical problems. For instance, the Laplace operator does not have a discrete spectrum, and we cannot define the Sobolev spaces $H_p(\mathbb R^d)$ analogously to (2.6). One way to solve this problem is to take a suitable potential $V$ such that the operator
$$-\Delta_V f:=-\frac{\sigma^2}{2}\Delta f+(\operatorname{grad}V,\operatorname{grad}f)\,,\qquad f\in C^\infty(\mathbb R^d),$$
has a discrete non-positive spectrum. Then $\Delta_V$ is self-adjoint in the Hilbert space $L_2(\mathbb R^d;\mu_V)$, where the probability measure $\mu_V$ is defined by
$$\frac{d\mu_V}{d\lambda}(x)=\frac1C\exp\Big(-\frac{2}{\sigma^2}V(x)\Big)\,.$$
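As a plausibility check (added here, not part of the thesis), a single integration by parts shows why $\mu_V$ is the natural reference measure for $\Delta_V$: for $f,g\in C^\infty_c(\mathbb R^d)$,
$$\int_{\mathbb R^d}(-\Delta_Vf)\,g\,d\mu_V=\int_{\mathbb R^d}\Big(-\frac{\sigma^2}{2}\Delta f+(\operatorname{grad}V,\operatorname{grad}f)\Big)g\,\frac{e^{-2V/\sigma^2}}{C}\,dx=\frac{\sigma^2}{2}\int_{\mathbb R^d}(\operatorname{grad}f,\operatorname{grad}g)\,d\mu_V\,,$$
which is symmetric in $f$ and $g$ and non-negative for $f=g$. Hence $-\Delta_V$ is a non-negative symmetric operator in $L_2(\mathbb R^d;\mu_V)$, in line with the sign convention used above.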

Now one can define Hilbert spaces $H_p(\mathbb R^d)$ as in (2.6), with $-\Delta_V$ and $\mu_V$ instead of $-\Delta$ and the uniform distribution $\lambda$, respectively. Furthermore, one has to prove results similar to those of Section A.2. In particular, one has to prove that the eigenvalues $\lambda_l$ of $-\Delta_V$ tend fast enough to $\infty$ as $l$ tends to infinity. Moreover, one may have to change the topology on $\mathcal P(\mathbb R^d)$, see [DG87] and [Gär88]. We expect that if the potential $V$ is smooth and grows fast enough to infinity as $\|x\|$ tends to infinity, then the sequence $(X^{(N)}-\mu^{(N)})_{N\in\mathbb N}$ satisfies a moderate deviation principle.

Chapter 3

The dynamic behavior

3.1 The model and basic notation

Let $S$ be the one dimensional sphere. We will identify $S$ with $\mathbb R$ modulo $2\pi$. The particle system we are interested in can be described by the following system of Itô stochastic differential equations
$$dx_k(t):=J\big\langle X^{(N)}(t),\sin(\cdot-x_k(t))\big\rangle\,dt+\sigma\,dW_k(t),\qquad(3.1)$$
where $J\ge0$ is the mean field interaction constant and $\sigma>0$ is the diffusion constant. Moreover, $W_1,\dots,W_N$ are independent Brownian motions on $\mathbb R$ and the measure valued empirical process $X^{(N)}$ is defined by
$$X^{(N)}(t):=\frac1N\sum_{k=1}^N\delta_{x_k(t)}\,.$$
Since we only study level one, we will drop the index of the level.
In the last chapter we have seen that, for large $N\in\mathbb N$, the process $X^{(N)}$ is very close to the McKean–Vlasov path $\mu^{(N)}$ with initial datum $X^{(N)}(0)$. Moreover, we know from Chapter 1 that the invariant distributions of the measure valued empirical processes $X^{(N)}$, $N\in\mathbb N$, converge to the uniform distribution on the sphere of probability measures
$$S=\{\nu^\alpha\colon\alpha\in S\}.$$

Here, the probability measures $\nu^\alpha$, $\alpha\in S$, are defined by
$$\frac{d\nu^\alpha}{dx}=\frac1C\exp\Big(\frac{2J}{\sigma^2}\,r_0\cos(\alpha-x)\Big),\qquad(3.2)$$
for a suitable normalizing constant $C>0$, which is independent of $\alpha\in S$. Moreover, the radius $r_0\ge0$ solves
$$r_0=\big\langle\nu^\alpha,\cos(\alpha-\cdot)\big\rangle\qquad(3.3)$$
and is positive if and only if the mean field constant $J$ is larger than the critical value $J_c=\sigma^2$. We define the operators $L^\vartheta$ and $G^\vartheta$ by (2.4) and (2.5), respectively, for the distribution dependent vector field
$$(B(\vartheta)f)(x)=J\,f'(x)\,\big\langle\vartheta,\sin(\cdot-x)\big\rangle\,,$$

i.e.,
$$(L^\vartheta f)(x)=\frac{\sigma^2}{2}f''(x)+J\,f'(x)\,\big\langle\vartheta,\sin(\cdot-x)\big\rangle\,,\qquad(3.4)$$
$$(G^\vartheta f)(x)=(L^\vartheta f)(x)+J\,\big\langle\vartheta,f'\sin(x-\cdot)\big\rangle\,,\qquad(3.5)$$
for all $\vartheta\in\mathcal D'(S)$ and all $f\in C^\infty(S)$. Moreover, we define the operator
$$\widetilde G^\vartheta f:=G^\vartheta f-\big\langle\vartheta,G^\vartheta f\big\rangle\,,\qquad\text{for }\vartheta\in\mathcal D'(S)\text{ and }f\in C^\infty(S).\qquad(3.6)$$
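Before analyzing the model, a minimal simulation sketch may help to visualize the dynamics (3.1). The code below is an added illustration; the Euler–Maruyama discretisation and all parameter values are choices made here and are not taken from the thesis. It integrates the particle system and records the length of the empirical mean vector, which for $J>J_c$ should settle near the radius $r_0$ of (3.3).

import numpy as np

# Euler-Maruyama sketch for the particle system (3.1) on the circle:
#   dx_k = J * < X^(N), sin(. - x_k) > dt + sigma dW_k .
# All numerical values below are illustrative choices, not taken from the text.
rng = np.random.default_rng(0)
N, J, sigma = 1000, 2.0, 1.0              # J > J_c = sigma^2, so r_0 > 0
dt, n_steps = 0.01, 20000

x = rng.uniform(0.0, 2 * np.pi, size=N)   # initial particle positions

for _ in range(n_steps):
    m_c, m_s = np.cos(x).mean(), np.sin(x).mean()
    # <X^(N), sin(. - x_k)> = m_s cos(x_k) - m_c sin(x_k)
    drift = J * (m_s * np.cos(x) - m_c * np.sin(x))
    x = (x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)) % (2 * np.pi)

r_emp = np.hypot(np.cos(x).mean(), np.sin(x).mean())
print("empirical radius r(X^(N)) after the run:", r_emp)   # close to r_0 for J > J_c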

For $\beta\in S$ define $f_{0,\beta}\in C^\infty(S)$ by
$$\big\langle\nu^\beta,f_{0,\beta}\big\rangle=0\qquad\text{and}\qquad L^{\nu^\beta}f_{0,\beta}(x)=J\,\big\langle\nu^\beta,\sin(\cdot-x)\big\rangle\,,\quad x\in S.$$
As we will see in Section 3.2.1, $f_{0,\beta}$ is well defined. Moreover, in a suitable Hilbert space, $f_{0,\beta}$ spans the eigenspace corresponding to the eigenvalue zero of the operator $\widetilde G^{\nu^\beta}$.
One could expect that a suitably time scaled version of the processes $X^{(N)}$ converges, as $N\to\infty$, in distribution to a Brownian motion on the sphere $S$ of probability measures. Because of
$$d\big\langle X^{(N)}(Nt),f\big\rangle=N\big\langle X^{(N)}(Nt),L^{X^{(N)}(Nt)}f\big\rangle\,dt+\sigma\,d\big\langle M^{(N)}(t),f\big\rangle\,,\qquad f\in C^\infty(S),$$
where $\langle M^{(N)}(t),f\rangle$ is a martingale with quadratic variation
$$\big[\big[\langle M^{(N)},f\rangle,\langle M^{(N)},f\rangle\big]\big]_t=\frac{\sigma^2}{N}\int_0^t\big\langle X^{(N)}(Ns),|f'|^2\big\rangle\,ds,$$
this time scale has to be $t\mapsto Nt$. This leads to the following functional central limit theorem.

Theorem 3.1.1 Fix $T>0$, $p<-\frac32$ and $J>J_c$. Assume that there exists a $\beta\in S$ such that the sequence $(X^{(N)}(0))_{N\in\mathbb N}$ of random probability measures on $S$ converges, as $N\to\infty$, weakly to $\nu^\beta$ almost surely. Furthermore, assume that
$$\frac{1}{\gamma_N}\big\|X^{(N)}(0)-\nu^\beta\big\|_{H_{p-1}}\ \text{converges, as }N\to\infty,\text{ in probability to zero},\qquad(3.7)$$
for some sequence $(\gamma_N)_{N\in\mathbb N}$ of positive numbers with
$$\lim_{N\to\infty}N\gamma_N^3=0.\qquad(3.8)$$
Then the time speeded up processes $(X^{(N)}(Nt))_{t\in[0,T]}$ converge, as $N\to\infty$, in distribution to a Brownian motion on the sphere $S$ of probability measures with variance
$$\tilde\sigma^2=\sigma^2\,\big\langle\nu^\beta,|f'_{0,\beta}|^2\big\rangle^{-1}>\sigma^2$$
and initial datum $\nu^\beta$. More precisely, there exist angle processes $\varphi^{(N)}\in C([0,\infty);S)$ such that

(i) for each $\varepsilon>0$,
$$\lim_{N\to\infty}P\Big(\sup_{t\in[0,T]}\big\|X^{(N)}(Nt)-\nu^{\varphi^{(N)}(Nt)}\big\|_{H_{p-1}}>\varepsilon\Big)=0;$$

(ii) the time speeded up processes $(\varphi^{(N)}(Nt))_{t\in[0,T]}$ converge, as $N\to\infty$, in distribution to a Brownian motion on $S$ with variance $\tilde\sigma^2$ and initial datum $\beta$.

2 Remark 3.1.2 (a) As we will see in Section 3.2.1 the variance σe is independent of β. (N) β (b) If the empirical measures X (0), N ∈ N, satisfy a central limit theorem with mean ν then the condition (3.7) is fulfilled. In particular, if X(N)(0) is the empirical measure of N independent νβ–distributed random variables then all conditions of Theorem 3.1.1 are satisfied.

(c) Because of rotation invariance, it is enough to prove the existence of processes ϕ(N) ∈ C([0, ∞); S) satisfying (ii) of Theorem 3.1.1 with initial datum zero and

(i’) for each ε > 0, !

lim P sup X(N) ◦ D−1 (Nt) − νβ > ε = 0, ϕ(N) N→∞ t∈[0,T ] Hp−1

where $D_\alpha$ denotes the rotation on $S$ by the angle $\alpha\in S$.

(d) For notes about higher dimensional spheres see the last section of this chapter.

Since our state space $\mathcal P(S)$ is infinite dimensional, we have to look at suitable test functions. In order to get a first feeling, fix $\alpha\in S$ and compute the first order approximation
$$d\big\langle X^{(N)}(t)-\nu^\alpha,f\big\rangle=\Big(\big\langle X^{(N)}(t)-\nu^\alpha,\widetilde G^{\nu^\alpha}f\big\rangle+R\big((X^{(N)}(t)-\nu^\alpha)^{\otimes2}\big)\Big)\,dt+d\big\langle M^{(N)}(t),f\big\rangle\,,$$
for all $f\in C^\infty(S)$. We will see in the next section that $\widetilde G^{\nu^\alpha}$ is a self-adjoint operator in a suitable Hilbert space with non-positive discrete spectrum. Therefore, it should be a good idea to choose the eigenfunctions corresponding to the eigenvalue zero as test functions. Then we have to control the rest term $R$, and we have to check how well these test functions serve our needs. The first problem is not as easy as it looks. Indeed, we will see that we have to add a "second order approximation" in order to control the rest term. The idea is to obtain a rest term which depends continuously on $(X^{(N)}-\nu^\alpha)^{\otimes3}$ and to use a moderate deviation principle to control it.
In the next section we will construct the angle processes $\varphi^{(N)}$ together with the proper test functions. Then, in Section 3.3, we will show that $X^{(N)}$ stays very close to the sphere $S$ of probability measures during the whole time interval $t\in[0,NT]$. For this we will first show that a McKean–Vlasov path which starts in a suitable neighborhood of this sphere converges exponentially fast to some measure $\nu^\beta$. This behavior, together with the moderate deviation result proven in Chapter 2, yields the claimed behavior of the process $X^{(N)}$.

In Section 3.4 we will piece everything together and prove Theorem 3.1.1.
In order to identify a measure $\nu^\beta\in S$ it is enough to compute the value of $\langle\nu^\beta,\cos(\cdot-\alpha)\rangle$ for at least two suitable $\alpha\in S$. Therefore, one could guess that $\cos(\alpha-\cdot)$, for some suitable $\alpha\in S$, is a good candidate for a test function. In Section 3.5 we will discuss why this guess is wrong, i.e., why the functions $\cos(\alpha-\cdot)$, $\alpha\in S$, yield too large a variance for the limiting Brownian motion.

3.2 Construction of the test functions

In this section we construct the processes ϕ(N) together with the proper test functions. Fix J > Jc. Then the sphere S has a positive radius r0. (N) (N) For N ∈ N, let us assume that ϕ is a semi–martingale on S with ϕ (0) = 0. We define the process Y (N)(t) := X(N) ◦ D−1 (t), i.e., (3.9) ϕ(N) D E D E Y (N)(t), f = X(N)(t), f(· + ϕ(N)(t)) , for all f ∈ C∞(S).

For each function f ∈ C∞(S), we get

N D E 1 X d Y (N)(Nt) − νβ, f = σ f 0(x (Nt) + ϕ(N)(Nt))dW (Nt) (3.10) N k k k=1 D (N) β νβ E D (N) β ⊗2 E + N Y (Nt) − ν , Ge f dt + N (Y (Nt) − ν ) , R1,βf dt N D E σ X + Y (N)(Nt), f 0 dϕ(N)(Nt) + f 00(x (Nt) + ϕ(N)(Nt))d[[W , ϕ(N)]] N k k Nt k=1 1 D E + Y (N)(Nt), f 00 d[[ϕ(N), ϕ(N)]] , 2 Nt with

(R1,βf)(x, y) J Jr   = sin(x − y)(f 0(y) − f 0(x)) − 0 sin(β − x)(f 0 (x) − 1) + sin(β − y)(f 0 (y) − 1) 2 2 0,β 0,β for f ∈ C∞(S). We want to choose ϕ(N) in such a way that D E (N) β sup Y (Nt) − ν , f t∈[0,T ] converges, as N → ∞, in probability to zero, for suitable test functions f. β If Ge ν f = 0 then the second term on the right hand side of (3.10) vanishes. Assume that there exists such function f0,β, which is not constant. To control the third term we will add

D (N) β ⊗2 E (Y (Nt) − ν ) , f1,β , where f1,β is a solution of

νβ νβ νβ νβ (Ge ⊗ Ge )f1,β = Ge 1 f1,β + Ge 2 f1,β = −R1,βf0,β. (3.11) 3.2. CONSTRUCTION OF THE TEST FUNCTIONS 69

β β Here the subscript at the operator Ge ν refers to the variable, which Ge ν acts on. Then we (N) define the processes ϕ , N ∈ N, such that D (N) β E D (N) β ⊗2 E Y (Nt) − ν , f0,β + (Y (Nt) − ν ) , f1,β (3.12)

D (N) β E D (N) β ⊗2 E = Y (0) − ν , f0,β + (Y (0) − ν ) , f1,β ,

(N) (N) for all t ≥ 0 up to a suitable τ . For N ∈ N, the process ϕ solves  (N) β⊗3 A1 Y (t) − ν σ2 A + O(Y (N)(t) − νβ) dϕ(N)(t) = dM (N)(t) + dt + 2 dt , (3.13) (N) β (N) β 2 A3(Y (t), ν ) N (A3(Y (t), ν )) where M (N) is a continuous martingale with quadratic variation

t D β 0 2E (N) β σ2 Z ν , |f0,β| + O(Y (s) − ν ) [[M (N),M (N)]] = ds t (N) β 2 N (A3(Y (s), ν )) 0 and 0 3 A1(ϑ) = hϑ, R2,βf1,βi , ϑ ∈ (D (S)) , ∞ (R2,βf)(x, y, z) = (R1,βf(·, z))(x, y) + (R1,βf(z, ·))(x, y), f ∈ C (S × S),  (N) β D β 0 E (N) β A3 Y (t), ν = ν , f0,β + O(Y (t) − ν ) and  2  D β 0 2E  β β ∂ 1 D β 00 E A2 = ν , |f0,β| ν ⊗ ν , f1,β + ν , f0,β ∂x1∂x2 2  2  D β 0 E D β 0 00 E β β ∂ 0 0  + ν , f0,β ν , f0,βf0,β + ν ⊗ ν , f1,β(f0,β ⊕ f0,β) ∂x1∂x2 2  2  D β 0 E β ∂ + ν , f0,β ν , trace f1,β , ∂x1∂x2 Furthermore, each term O(Y (N)(t) − νβ) depends continuously on Y (N)(t) − νβ ∈ D0(S). (N) β (N) β Moreover, (O(Y (t) − ν ))N∈N converges, as N → ∞, weakly to zero if (Y (t) − ν )N∈N converges weakly to zero. Therefore, we have to analyze the test functions f0,β and f1,β.

3.2.1 Properties of the test functions f0,β

In this section we analyze the functions f0,β defined in the last section. Let us first look at β β the operators Lν and Ge ν defined by (3.4) and (3.6), respectively.

Lemma 3.2.1 Fix $J>J_c$ and $\beta\in S$. Denote by $\widetilde H_1$ the set of all $f\in L_2(S)$ with
$$\big\langle\nu^\beta,f\big\rangle=0\qquad\text{and}\qquad\big\langle\nu^\beta,|f'|^2\big\rangle<\infty.$$
Then the inner product $\langle\nu^\beta,g'f'\rangle$ makes $\widetilde H_1$ a Hilbert space, and the following assertions are valid:

(i) The operator $L^{\nu^\beta}$ is self-adjoint in $\widetilde H_1$ with negative discrete spectrum.

(ii) The operator $\widetilde G^{\nu^\beta}$ is self-adjoint in $\widetilde H_1$ with non-positive discrete spectrum. Moreover, zero is a simple eigenvalue. The corresponding eigenfunctions are the real multiples of the function $f_{0,\beta}\in C^\infty(S)$ defined by
$$\big(L^{\nu^\beta}f_{0,\beta}\big)(x)=J\,\big\langle\nu^\beta,\sin(\cdot-x)\big\rangle\,.\qquad(3.14)$$

Proof: By standard arguments it follows that He 1 is a Hilbert space. Using (3.2), one easily compute

D β E D β E D β E D β E νβ, (Lν f)0g0 = νβ, f 0(Lν g)0 and νβ, (Ge ν f)0g0 = νβ, f 0(Ge ν g)0

∞ νβ νβ ∞ for all f, g ∈ C (S). Moreover, L and Ge map He 1 ∩ C (S) into He 1. β β In order to study the spectrum of Lν and Ge ν let us first analyze the second derivative of β the rate function Iinv defined by (1.10) in Chapter 1. Since Iinv(ν ) = 0 it follows that Iinv is minimal at νβ and we get

2 d β 0 ≤ 2 Iinv((1 − h)ν + hν) dh h=0,    2 D β 2E 2J β cos = ν , (ϕ − 1) − ν , (ϕ − 1) σ2 sin =: R(ϕ)

dν ∞ for all ν ∈ P(S) with dνβ = ϕ ∈ C (S). Define  cos  cos m := νβ, and m := νβ, (ϕ − 1) . sin ϕ−1 sin

β Assume that R(ϕ) = 0. Then, using Iinv(ν ) = 0, we conclude that

d 0 = R((1 − h)ϕ + hψ) dh h=0   2J cos  = 2 νβ, ψ (ϕ − 1) − − m, m σ2 sin ϕ−1

for all ψ ∈ C∞(S) with ψ ≥ 0 and νβ, ψ = 1. Therefore, ϕ − 1 solves

2J cos  ϕ − 1 = − m, m . σ2 sin ϕ−1

Now R(ϕ) = 0 yields * + 2J cos 2 2J 0 = ( )2 νβ, − m, m − km k2 (3.15) σ2 sin ϕ−1 σ2 ϕ−1    2J d 2 = 2 mϕ−1, V (m + hmϕ−1) − kmϕ−1k , σ dh h=0 3.2. CONSTRUCTION OF THE TEST FUNCTIONS 71 with 2π  cos(x)  2J m, R cos(x) σ2 (sin(x)) sin(x) e dx V (m) := 0 , for m ∈ 2. 2π  cos(x)  R 2J m,( ) R e σ2 sin(x) dx 0

2 Because V is invariant under rotations, equation (3.15) holds for all mϕ−1 ∈ R , which are orthogonal to m. Otherwise, if mϕ−1 = c m, for some c ∈ R \{0}, then we get   d 2 2 0 2 c m, V (m(1 + hc)) = c m GJ (r0) < mϕ−1. dh h=0

Here we used Propositions 1.2.5 together with Lemma 1.2.3. Therefore, equation (3.15) is fulfilled if and only if mϕ−1 and m are orthogonal. Hence we get

D β E ϕ(x) − 1 = c ν , sin(· − x) , for some constant c ∈ R.

Now, using the integration by parts formula, we compute

2  0  0   0 β 0  νβ  β 0  νβ  β 0 cos ν , f Ge f = ν , f L f + J ν , f sin !  2     2 2 β  νβ  2J β cos νβ = − ν , L f − ν , L f σ2 σ2 sin   νβ 2 β 2 L f = − sup Lν f(x) R  + 1 2  νβ  σ x∈S sup L f(x) x∈S ≤ 0 ,

∞ for all f ∈ C (S). Moreover, equality holds if and only if f = cf0,β, for some c ∈ R. Therefore, β β the spectrum of Lν is negative and the spectrum of Ge ν is non–positive. Moreover, one easily νβ sees that f0,β belongs to He 1 and hence zero is a single eigenvalue of the operator Ge . νβ νβ Up to now we have proven that L and Ge are semi–bounded symmetric operators on He 1 β β with dense domains. Therefore, Lν and Ge ν are self–adjoint. This mean there exist unique self–adjoint extensions. This statement is taken from [DS63, Theorem 2 of Chapter XII.5.2]. We will denote the self–adjoint extensions by the same symbols. β νβ νβ Since ν ∈ Hp(S), for all p ∈ R, the operators −L and −G satisfy by Lemma 2.3.5 and Lemma 2.3.7 the Assumption 2.3.1 for all p ∈ R and q = 1. Moreover, the constants C1,C2 > 0 and C3 ≥ 0 are independent of β ∈ S. Taking λ > 2C3 and using (2.18), we compute

β 2  β  (λI − Lν )g ≥ λ2 kgk2 + 2λ g, −Lν g H1 H1 H1   ≥ λ2 kgk2 − kgk2 + 2C λ kgk2 H1 H0 2 H2 ≥ 2C λ kgk2 , 2 H2 72 CHAPTER 3. THE DYNAMIC BEHAVIOR for all g ∈ H (S). Since H is a subspace of H (S) and the set {g ∈ H (S): kgk ≤ 1} is 2 e 1 1 1 H2 νβ −1 compact in H1(S) it follows that the resolvent (λI − L ) is compact. Hence, the spectrum of Lνβ is discrete. Furthermore, taking λ ≥ 0, we get

β 2 β 2 D β E  β  (λI − Ge ν )g ≥ (λI − Gν )g − 2 νβ, Gν g (λI − Gν )g, 1 H1 H1 H1 β 2  β   β  = (λI − Gν )g − 2 (Gν )∗νβ, g g, (λI − Gν )∗1 H1 H0 H1 β 2 β β ≥ (λI − Gν )g − 2 kgk2 (Gν )∗νβ (λI − Gν )∗1 H0 H1 H0 H2 β 2 β  β  ≥ (λI − Gν )g − 2 kgk2 (Gν )∗νβ λ + (Gν )∗1 H0 H1 H0 H2 β 2 ≥ (λI − Gν )g − C(λ + 1) kgk2 , H0 H1

νβ for all g ∈ H2(S) and some constant C ≥ 0. Since −G satisfies the Assumption 2.3.1 we can proceed in the same way as for the operator Lνβ and derive the discreteness of the spectrum β of Ge ν . 2

Now let us study the functions $f_{0,\beta}$, $\beta\in S$.

Lemma 3.2.2 Fix $J>J_c$ and $\beta\in S$. Define $f_{0,\beta}$ by (3.14). Then the following assertions are valid:

(i) $\displaystyle f'_{0,\beta}(x)=1-\frac{\exp\big(-\frac{2J}{\sigma^2}\langle\nu^\beta,\cos(\cdot-x)\rangle\big)}{\frac{1}{2\pi}\int_0^{2\pi}\exp\big(\frac{2J}{\sigma^2}r_0\cos(y)\big)\,dy}=1-\frac{\exp\big(-\frac{2J}{\sigma^2}r_0\cos(\beta-x)\big)}{\frac{1}{2\pi}\int_0^{2\pi}\exp\big(\frac{2J}{\sigma^2}r_0\cos(y)\big)\,dy}\,,$
where $r_0$ is the unique positive solution of $r=G_J(r)$, see (1.15).

(ii) $f_{0,\beta}(\beta+x)=-f_{0,\beta}(\beta-x)$, for all $x\in S$.

(iii) $f_{0,\beta}(x)=f_{0,\alpha}(x-\beta+\alpha)$, for all $\alpha,x\in S$.

(iv) $\big\langle\nu^\beta,g'f'_{0,\beta}\big\rangle=\big\langle\nu^\beta,g'\big\rangle$, for all $g\in C^1(S)$.

(v) $1>\big\langle\nu^\beta,f'_{0,\beta}\big\rangle=1-\Big(\frac{1}{2\pi}\int_0^{2\pi}\exp\big(\frac{2J}{\sigma^2}r_0\cos(x)\big)\,dx\Big)^{-2}>0$.

(vi) $\big\langle\nu^\beta,f''_{0,\beta}\big\rangle=0$.

(vii) $\big\langle\nu^\alpha,f_{0,\beta}\big\rangle=0$ if and only if $\beta-\alpha=0\bmod\pi$.

Proof: By standard techniques one gets (i). From this, (ii) and (iii) follow immediately. Using the integration by parts formula twice, we conclude that

Jensen’s inequality, see [Bau91, Satz 3.9], and (i) gives us (v). 00 Because of (iii) it is enough to prove (vi) for β = 0. Since f0,0 is odd f0,0 is odd, too. Hence, 00 0 integrating f0,0 with the even distribution ν , yields zero. α α cos 0 cos 1 Now take α ∈ S and define m = ν , sin and m = ν , sin = r0 0 . Then we get

2π 1 R 2J α  2π exp σ2 km − mk cos(x) dx d α α 0 0 hν , f0,0i = − ν , f0,0 = − 1. (3.16) dα  2π 2 1 R 2J  2π exp σ2 r0 cos(x) dx 0

Since the numerator is increasing in $\|m-\bar m\|$, it follows that $\frac{d}{d\alpha}\langle\nu^\alpha,f_{0,0}\rangle$ has only one minimum (at $m=\bar m$) and only one maximum (at $m=-\bar m$). Therefore, the mapping $\alpha\mapsto\langle\nu^\alpha,f_{0,0}\rangle$ has at most two zeros. Moreover, using (i), we get $\langle\nu^\alpha,f_{0,0}\rangle=0$ for $\alpha=0\bmod\pi$. This proves (vii) for $\beta=0$. The general case now follows from (iii). $\Box$
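Because Lemma 3.2.2 gives $f'_{0,\beta}$ in closed form, the radius $r_0$, the mean $\langle\nu^\beta,f'_{0,\beta}\rangle$ and the limit variance $\tilde\sigma^2=\sigma^2\langle\nu^\beta,|f'_{0,\beta}|^2\rangle^{-1}$ of Theorem 3.1.1 can be evaluated numerically. The following sketch is an added illustration, not part of the thesis; the parameter values and the quadrature grid are arbitrary. It iterates the self-consistency equation (3.3) and then checks item (v) of the lemma.

import numpy as np

# Illustration (not from the thesis): solve the self-consistency equation (3.3)
# for r_0 by fixed-point iteration, evaluate f'_{0,0} via Lemma 3.2.2(i), check
# item (v), and compute the variance of Theorem 3.1.1,
#   sigma_tilde^2 = sigma^2 / <nu^beta, |f'_{0,beta}|^2>.
J, sigma = 2.0, 1.0                      # illustrative values with J > J_c = sigma^2
a = 2.0 * J / sigma**2

x = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
dx = x[1] - x[0]

def nu(r):                               # density of nu^0 from (3.2), beta = 0
    w = np.exp(a * r * np.cos(x))
    return w / (w.sum() * dx)

r0 = 0.5                                 # start away from the unstable fixed point r = 0
for _ in range(2000):
    r0 = (nu(r0) * np.cos(x)).sum() * dx       # iterate r -> <nu_r, cos>, eq. (3.3)

Zbar = np.exp(a * r0 * np.cos(x)).mean()       # (1/2pi) * int exp(a r0 cos y) dy
f0_prime = 1.0 - np.exp(-a * r0 * np.cos(x)) / Zbar    # Lemma 3.2.2(i) with beta = 0

dens = nu(r0)
mean_f0p = (dens * f0_prime).sum() * dx        # should equal 1 - Zbar**(-2), item (v)
mean_f0p_sq = (dens * f0_prime**2).sum() * dx
print("r_0           :", r0)
print("<nu, f_0'>    :", mean_f0p, "  check (v):", 1.0 - Zbar**(-2))
print("sigma_tilde^2 :", sigma**2 / mean_f0p_sq, "  ( > sigma^2 =", sigma**2, ")")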

3.2.2 Properties of the test functions f1,β

In this section we analyze the functions f1,β.

∞ Lemma 3.2.3 Fix J > Jc and β ∈ S. For f ∈ C (S), define J (R f)(x, y) = sin(x − y)(f 0(y) − f 0(x)) 1,β 2 Jr   − 0 sin(β − x)(f 0 (x) − 1) + sin(β − y)(f 0 (y) − 1) . 2 0,β 0,β ∞ Then there exists an symmetric solution f1,β ∈ C (S × S) of

νβ νβ (Ge ⊗ Ge )f1,β + R1,βf0,β = 0 (3.17) with  2   2  β β ∂ D β 0 E β ∂ 2 ν ⊗ ν , f1,β + ν , f0,β ν , trace f1,β = 0. (3.18) ∂x1∂x2 ∂x1∂x2

β β Proof: First, let us look at the operator Ge ν ⊗ Ge ν . Using Lemma 3.2.1, we conclude that this operator is self–adjoint in n o He 2 := f ∈ He 1 ⊗ He 1: f(x, y) = f(y, x), for all x, y ∈ S with discrete spectrum. Moreover, zero is a single eigenvalue with eigenfunction f0,β ⊗ f0,β. Since 2   R f (x, y) = cos(β − x) − r sin(β − y)f 0 (y) J 1,β 0,β 0 0,β   0 + cos(β − y) − r0 sin(β − x)f0,β(x)

 0  − sin(β − x) cos(β − y)f0,β(y) − r0

 0  − sin(β − y) cos(β − x)f0,β(x) − r0 , 74 CHAPTER 3. THE DYNAMIC BEHAVIOR

for all x, y ∈ S × S, we see that R1,βf0,β is an element of He 2. Furthermore,  2  1 1 β β 0 0 ∂ (R1,βf0,β, f0,β ⊗ f0,β)H = ν ⊗ ν , f0,β ⊗ f0,β R1,βf0,β J e 2 J ∂x1∂x2 D β 0 0 ED β 0 0 0 E = 2 ν , cos (β − ·)f0,β ν , (sin(β − ·)f0,β) f0,β

D β 0 0 ED β 0 0 0 E −2 ν , sin (β − ·)f0,β ν , (cos(β − ·)f0,β) f0,β

Using Lemma 3.2.2 and the integration by parts formula, we get

D E 2 D β E νβ, (cos(β − ·)f 0 )0f 0 = νβ, cos(β − ·)f 0 Lν f 0,β 0,β σ2 0,β 0,β 2Jr D E = − 0 νβ, cos(β − ·) sin(β − ·)f 0 σ2 0,β 2Jr D E = − 0 νβ, cos(β − ·) sin(β − ·) σ2 = 0 and D β 0 0 E D β 0 E ν , cos (β − ·)f0,β = ν , cos (β − ·) = 0 Therefore, it follows that (R1,βf0,β, f0,β ⊗ f0,β) = 0. He 2 In other words R1,βf0,β is orthogonal to the eigenspace corresponding to the eigenvalue zero νβ νβ of the operator Ge ⊗ Ge . Therefore, there exists an unique solution fe1,β ∈ He 2 of (3.17) with   fe1,β, f0,β ⊗ f0,β = 0. He 2 νβ νβ Moreover, since R1,βf0,β and all coefficients of the operator Ge ⊗ Ge are elements of ∞ ∞ C (S × S) the function fe1,β has to be an element of C (S × S), too. Now define

D 2 E νβ, trace ∂ f ∂x1∂x2 e1,β f1,β := fe1,β − f0,β ⊗ f0,β. D β 0 E 3 ν , f0,β

νβ νβ Then Ge ⊗ Ge f1,β = 0. Moreover, using the properties of f0,β presented in Lemma 3.2.2, we get  2   2  β β ∂ D β 0 E β ∂ 2 ν ⊗ ν , f1,β + ν , f0,β ν , trace f1,β ∂x1∂x2 ∂x1∂x2 D β ∂2 E  2  ν , trace fe1,β β β ∂ 2 ∂x1∂x2 D β β 0 0 E = 2 ν ⊗ ν , fe1,β − D E ν ⊗ ν , f0,β ⊗ f0,β ∂x1∂x2 3 β 0 ν , f0,β

D β ∂2 E  2  ν , trace fe1,β D β 0 E β ∂ 1 ∂x1∂x2 D β 0 2E + ν , f0,β ν , trace fe1,β − D E ν , (f0,β) ∂x1∂x2 3 β 0 ν , f0,β  2  β β ∂ = 2 ν ⊗ ν , fe1,β ∂x1∂x2 3.3. LONG TIME BEHAVIOR OF THE EMPIRICAL PROCESSES 75

D β 0 E   = 2 ν , f0,β fe1,β, f0,β ⊗ f0,β He 2 = 0.

2

0 0 Remark 3.2.4 The operators Lν and Ge ν can also be conceived as self–adjoint operators on the sub–Hilbert spaces of all even or all odd functions f ∈ He 1. Maybe this could be useful in higher dimensions.

3.3 Long time behavior of the empirical processes

Fix J > Jc. Then the sphere S has a positive radius r0. Fix T > 0 and a sequence (γN )N∈N of positive numbers, which satisfies (2.1). We show that if the measure valued empirical process (N) X starts in a γN –neighborhood of the sphere S then it stays the whole time up to NT in a γN -neighborhood of this sphere with large probability. The idea to prove this result is first (N) to show that X stays up to time T with large probability in a γN –neighborhood of the McKean–Vlasov path µ(N) with initial datum X(N)(0). Second, we verify that the McKean– Vlasov path converges exponentially fast to the sphere S. Then we use the of X(N) in order to prove the claimed long–time behavior of X(N). In the next section we will start with the study of the McKean–Vlasov path.

3.3.1 Exponential convergence of the McKean–Vlasov path In this section we continue the study of the McKean–Vlasov path started in Section 2.4. Fix J > Jc. Then the radius r0 of the sphere S is positive. We prove that if the initial datum ν of the McKean–Vlasov path µ ∈ C([0, ∞); P(S)) is in a suitable neighborhood of this sphere then µ converges exponentially fast to some probability measure νβ ∈ S. We start with some notations. For µ ∈ P(S), define the length r(µ) ∈ [0, 1] and the angle α(µ) ∈ S of the mean of µ by

 cos cos(α(µ)) µ, = r(µ) . (3.19) sin sin(α(µ))

If there is no risk of confusion then we will write r(t) and α(t) instead of r(µ(t)) and α(µ(t)), respectively, for a measure valued path µ ∈ C([0, ∞); P(S)). Moreover, we define the probability measure µ by

dµ 1 2J  (x) := exp r(µ) cos(α(µ) − x) (3.20) dλ Z(r(µ)) σ2 1 2J cos(x)  cos = exp , µ, , Z(r(µ)) σ2 sin(x) sin with 2π Z 2J  Z(r) := exp r cos(x) λ(dx). (3.21) σ2 0 76 CHAPTER 3. THE DYNAMIC BEHAVIOR

Here λ denotes the uniform distribution on S. Note that the function G defined by (1.15) is σ2 the derivative of 2J ln(Z), i.e., d 2J ln(Z(r)) = G(r). dr σ2 In this section we require logarithmic Sobolev inequalities.

Definition 3.3.1 We say that a probability measure µ on a manifold S satisfies a logarithmic Sobolev inequality with constant κ > 0 if

* !2+ 2 ϕ D 2E κ µ, ϕ ln ≤ µ, kgradϕk , for all ϕ ∈ L2(µ). kϕkµ

Here, kϕkµ denotes the L2–norm of ϕ with respect to µ, i.e.,

2 2 kϕkµ := µ, ϕ .

In our situation we get.

Lemma 3.3.2 There exists a constant κ > 0 such that for each µ ∈ P(S) the probability − 8J measure µ satisfies a logarithmic Sobolev inequality with constant κe σ2 .

Proof: Since

− 4J dµ dµ 4J e σ ≤ inf (x) ≤ sup (x) ≤ e σ , for all µ ∈ P(S), x∈S dλ x∈S dλ the claimed logarithmic Sobolev inequality follows from general results about logarithmic Sobolev inequalities on compact manifolds. For a short introduction, see Appendix A.1. 2 In order to prove the exponential convergence of the McKean–Vlasov path we use estimates for the rate function Iinv of the LDP for the invariant distributions of the measure valued empirical processes X(N) defined by (1.10) in Chapter 1. Moreover, we use the relative entropy Ξ(µ|ν) of a probability measure µ with respect to another probability measure ν, i.e., ( D E µ, ln dµ , if µ  ν, Ξ(µ|ν) := dν (3.22) ∞, otherwise.

Then we get J I (µ) = Ξ(µ|λ) − (r(µ)2 + r2) + ln(Z(r )) (3.23) inv σ2 0 0 J = Ξ(µ|µ) + (r(µ)2 − r2) − ln(Z(r(µ))) + ln(Z(r )) σ2 0 0 J = Ξ(µ|να(µ)) − (r(µ) − r )2. σ2 0

The following lemma presents an estimate for the rate function Iinv in terms of the relative entropy. 3.3. LONG TIME BEHAVIOR OF THE EMPIRICAL PROCESSES 77

Lemma 3.3.3 Fix J > Jc and µ0 ∈ P(S). Denote by µ ∈ C([0, ∞); P(S)) the McKean– Vlasov path with initial datum µ0. Then there exists a constant c > 0 independent of µ0 such that t Z Iinv(µ(t)) ≤ −c Ξ(µ(u)|µ(u)) du + Iinv(µ(s)), for all 0 ≤ s ≤ t. (3.24) s

In particular, Iinv(µ(·)) is non–increasing.

Proof: First, assume that (3.24) is fulfilled. Then, because of the non–positivity of the relative entropy, Iinv(µ(·)) is non–increasing. Furthermore,

t Z d I (µ(t)) = I (µ(u))du + I (µ(s)) inv du inv inv s t Z      2 d dµ(u) J d cos = µ(u), ln − µ(u), du + Iinv(µ(s)), du dλ σ2 du sin s for all 0 ≤ s ≤ t. For the first integrand of the right hand side we get

d  dµ(u) µ(u), ln du dλ  dµ(u)  d dµ(u) = µ(u), Lµ(u) ln + µ(u), ln (3.25) dλ du dλ  dµ(u)  dµ(u) dµ(u)  d dµ(u) = µ(u), Lµ(u) ln + µ(u), Lµ(u) ln + λ, dλ dµ(u) dµ(u) du dλ

The last summand in the last line is equal to zero. Using the integration by parts formula for the second summand in the last line of (3.25), we can proceed with

d  dµ(u) µ(u), ln du dλ  dµ(u) σ2  dµ(u)0  dµ(u)0 = µ(u), Lµ(u) ln − µ(u), ln dλ 2 dµ(u) dµ(u) 0 2    2 * s ! + J d cos 2 dµ(u) = µ(u), − 2σ µ(u), . σ2 du sin dµ(u)

Therefore,

t 0 2 Z * s ! + 2 dµ(u) Iinv(µ(t)) = −2σ µ(u), du + Iinv(µ(s)). dµ(u) s Now, using the logarithmic Sobolev inequality proven in Lemma 3.3.2, our claimed estimate 2 − 8J (3.24) follows with c = −2σ κe σ2 , where κ > 0 is taken from Lemma 3.3.3. 2 78 CHAPTER 3. THE DYNAMIC BEHAVIOR

To derive exponential estimates from (3.24) we have to estimate the relative entropy Ξ(µ(u)|µ(u)) in terms of Iinv(µ(u)), for all u ≥ 0. For this we require the following in- equalities.

Lemma 3.3.4 Fix J > Jc. Then the following assertions are valid: (i) There exist constants c, c > 0 such that

J c(r − r )2 ≤ (r2 − r2) − ln(Z(r)) + ln(Z(r )) ≤ c(r − r )2, 0 σ2 0 0 0 for all r ∈ [0, 1]

(ii) For each a ∈ (0, r0) there exists a constant c > 0 such that

2 2 (r − GJ (r)) ≥ c(r − r0) ,

for all r ∈ [a, 1].

J 2 2 (r −r )−ln(Z(r))+ln(Z(r0)) σ2 0

2 (GJ (r)−r)

r r 0 r0 1 0 r0 1

Proof: Let us first study the function J h(r) := (r2 − r2) − ln(Z(r)) + ln(Z(r ). σ2 0 0 Differentiating h, we get 2J h0(r) = (r − G (r)), σ2 J

which is by Lemma 1.2.3 and Proposition 1.2.5 zero if and only if r = 0 or r = r0. Since 00 00 h (0) < 0 and h (r0) > 0 we get (i) by a Taylor expansion, see the right picture above. An analogue discussion proves (ii), see the left picture above. 2 Now we can proceed with estimates between the relative entropy, the rate function and the L2–norm of the McKean–Vlasov path.

Lemma 3.3.5 Fix J > Jc. Then the following assertions are valid: (i) For all µ ∈ P(S), J (r(µ) − r )2 ≤ Ξ(µ|να(µ)). σ2 0 (ii) For all µ ∈ P(S), α(µ) Ξ(µ|µ) ≤ Iinv(µ) ≤ Ξ(µ|ν ). 3.3. LONG TIME BEHAVIOR OF THE EMPIRICAL PROCESSES 79

(iii) For each K > 0 there exists a constant c > 0 such that

2 α(µ) α(µ) α 4J α 2 c µ − ν ≤ Ξ(µ|ν ) = inf Ξ(µ|ν ) ≤ e σ2 inf kµ − ν k , H0 H0 α∈S α∈S

for all µ ∈ P(S) with kµkC0 < K.

(iv) For each K > 0 and each 0 < a < r0 there exists a constant c > 0 such that

Ξ(µ|να(µ)) ≤ c Ξ(µ|µ),

for all µ ∈ P(S) with kµkC0 < K and r(µ) ≥ a.

Proof: Part (i) and the second inequality of (ii) follow immediately from (3.23). Moreover, since by Lemma (i) J (r2 − r2) − ln(Z(r)) + ln(Z(r )) ≥ 0, σ2 0 0 for all r ∈ [0, 1], the first inequality of (ii) follows from (3.23), too. In order to prove (iv) fix K > 0. Using Taylor expansion, we get a constant c1 > 0 such 2 that x ln x ≥ (x − 1) + c1(x − 1) is fulfilled, for all x ∈ [0,K]. Therefore,

 dµ dµ  Ξ(µ|να(µ)) = να(µ), ln (3.26) dνα(µ) dνα(µ) * +  dµ 2 ≥ c να(µ), − 1 1 dνα(µ) 2 µ − να(µ), f = c sup 1 α(µ) 2 f∈C∞(S) ν , f −1 2 dνα(µ) µ − να(µ), f ≥ c sup , 1 2 dλ f∈C∞(S) hλ, f i C0 for all µ ∈ P(S) with kµkC0 < K. dνα(µ) 4J A simple calculation shows that ≤ e σ2 and the first inequality of (iii) is proven. dλ C0 Moreover, 2J Ξ(µ|να) − Ξ(µ|να(µ)) = r r(µ)(1 − cos(α(µ) − α)) , σ2 0 for all µ ∈ P(S) and all α ∈ S. Therefore, α(µ) is the “best choice”, i.e.,

Ξ(µ|να(µ)) = inf Ξ(µ|να). α∈S

2 In order to finish the proof of (iii) we use that x ln x ≤ (x − 1) + (x − 1) , for all x ∈ R+, and get

 dµ dµ  Ξ(µ|να) = να, ln dνα dνα * +  dµ 2 ≤ να, − 1 dνα 80 CHAPTER 3. THE DYNAMIC BEHAVIOR

hµ − να, fi2 = sup α 2 f∈C∞(S) hν , f i α 2 4J hµ − ν , fi σ2 ≤ e sup 2 f∈C∞(S) hλ, f i 4J α 2 = e σ2 kµ − ν k , H0 for all α ∈ S. Now fix K > 0 and r0 > a > 0. Then analogous to (3.26) we get a constant c1 > 0 such that 2 − 4J hµ − µ, fi σ2 Ξ(µ|µ) ≥ c1e sup 2 , f∈C∞(S) hλ, f i

cos cos  for all µ ∈ P(S) with kµkC0 < K. Taking f = sin , µ − µ, sin , we can proceed with   2 − 4J cos − 4J 2 Ξ(µ|µ) ≥ c e σ2 µ − µ, = c e σ2 (r(µ) − G (r(µ))) . 1 sin 1 J Here we used α(µ) = α(µ) and r(µ) = GJ (r(µ)), for all µ ∈ P(S). If in addition r(µ) ≥ a then we can apply Lemma 3.3.4 and get a constant c2 > 0 such that 2J J c Ξ(µ|µ) ≥ (r(µ)2 − r2) − ln(Z(r(µ))) + ln(Z(r )) + (r(µ) − r )2 2 σ2 0 0 σ2 0 = Ξ(µ|να(µ)) − Ξ(µ|µ), and (iv) is proven. 2 We finish this section with the proof of the claimed exponential convergence of the McKean– Vlasov path.

Proposition 3.3.6 Fix J > Jc, p ∈ R and µ0 ∈ P(S) ∩ Hp(S). Denote by µ(·) the McKean– Vlasov path with initial datum µ0. Then there exist constants ε = ε(p) > 0, c1 = c1(p) > 0 and c2 = c2(p) > 0 such that α 2 inf kµ0 − ν kH < ε α∈S p implies the existence of some β ∈ S with 2 µ(t) − νβ ≤ c e−c2t inf kµ − ναk2 , (3.27) 1 0 Hp Hp α∈S for all t ∈ R+.

Proof: First, we note that in the following all constants will be positive and independent of the initial datum µ0. Furthermore, we denote by the letter “a” constants, which depend on p, and by the letter “b” constants, which are independent of p. Moreover, note that if a McKean–Vlasov path starts in some να ∈ S then it will stay there all time. Let us recall some estimates from Section 2.4: 3.3. LONG TIME BEHAVIOR OF THE EMPIRICAL PROCESSES 81

(a) By Corollary 2.4.4 the following inequalities are valid

α 2 α 2 inf kµ(1) − ν kH ≤ a1 inf kµ0 − ν kH , (3.28) α∈S 0 α∈S p sup inf kµ(t + s) − ναk2 ≤ b inf kµ(t) − ναk2 , for all t ≥ 0. (3.29) H0 1 H0 s∈[0,1] α∈S α∈S

(b) From Lemma 2.4.2 it follows that

inf kµ(t)kC0 ≤ b2. (3.30) t∈[1,∞)

(c) By Lemma 3.3.5 we get α 2 Iinv(µ) ≤ b3 inf kµ − ν kH , (3.31) α∈S 0

for all µ ∈ P(S) with kµkC0 ≤ b2. (d) Lemma 3.3.5 yields

α 2 inf kµ − ν kH ≤ b4 Iinv(µ), (3.32) α∈S 0 Ξ(µ|µ) ≤ Iinv(µ) ≤ b5 Ξ(µ|µ), (3.33) for all µ ∈ P(S) with

2 2 r0 kµk 0 ≤ b and (r(µ) − r ) ≤ . C 2 0 2

r2 r2 Now choose ε = ε(p) > 0 in such a way that ε < 0 and ε < 0 . Assume that 2b1a1 2b1a1b3b4

α 2 inf kµ0 − ν kH < ε. α∈S p Then combining (3.28)–(3.31), it follows that    2 2 α cos α 2 (r(1 + s) − r0) = inf µ(1 + s) − ν , ≤ inf kµ(1 + s) − ν kH α∈S sin α∈S 0 α 2 ≤ b1 inf kµ(1) − ν kH α∈S 0 α 2 ≤ b1a1 inf kµ0 − ν kH α∈S p r2 ≤ 0 , 2 for all s ∈ [0, 1]. Therefore, (3.32) is applicable to µ(t), for t ∈ [1, 2]. This leads to

2 α 2 α 2 (r(t) − r0) ≤ inf kµ(t) − ν kH ≤ b4 Iinv(µ(t)) ≤ b3b4 inf kµ(t) − ν kH , (3.34) α∈S 0 α∈S 0

r2 for all t ∈ [1, 2]. In particular, since ε < 0 it follows that 2b1a1b3b4

2 α 2 (r(2) − r) ≤ b3 Iinv(µ(2)) ≤ b3b4 inf kµ(2) − ν kH α∈S 0 2 α 2 r0 ≤ b1a1b3b4 inf kµ0 − ν kH ≤ . α∈S p 2 82 CHAPTER 3. THE DYNAMIC BEHAVIOR

Therefore, since Iinv(µ(·)) is by Lemma 3.3.3 non–increasing we can iterate this procedure and get (3.34) for all t ≥ 1. Moreover, (3.33) is fulfilled for µ(t), for all t ≥ 1. Using this together with the logarithmic Sobolev inequality presented in Lemma 3.3.3, we get

t Z Iinv(µ(t)) ≤ −c2 Iinv(µ(u)) du + Iinv(µ(s)), s for all 1 ≤ s ≤ t and some constant c2 = c2(p) > 0. This implies

−c2t −c2t α 2 Iinv(µ(t)) ≤ Iinv(1)e ≤ a1b3 e inf kµ0 − ν kH , (3.35) α∈S p for all t ≥ 1. Using once more (3.34), we conclude that

α 2 −c2t α 2 inf kµ(t) − ν kH ≤ a1b3b4 e inf kµ0 − ν kH , α∈S 0 α∈S p for all t ≥ 1. Part (i) of Corollary 2.4.4 implies

α 2 α 2 inf kµ(t + 1) − ν kH ≤ a2 inf kµ(t) − ν kH , α∈S p α∈S 0 for all t ≥ 0. Therefore, we can proceed with

α 2 −c2t α 2 inf kµ(t) − ν kH ≤ a1b3b4a2 e inf kµ0 − ν kH , α∈S p α∈S p for all t ≥ 2. Using Lemma 2.4.3, we get

α 2 α 2 inf kµ(t) − ν kH ≤ a3 kµ0 − ν kH , α∈S p p for all t ∈ [0, 2]. Hence, it follows that

α 2 −c2t α 2 inf kµ(t) − ν kH ≤ a4 e inf kµ0 − ν kH , (3.36) α∈S p α∈S p for all t ≥ 0. Moreover, take m ∈ N such that p + 1 > m ≥ p. Then Lemma A.2.2.(iv) yields

α1 α2 2 α1 α2 2 2 2 kν − ν k ≤ a kν − ν k m ≤ a r | cos(α ) − cos(α )| , Hp 5 C 6 0 1 2

for all α1, α2 ∈ S. Therefore, we get

2 α(t) α ν − ν ≤ a6 r0 cos(α(t)) − r0 cos(α) Hp  2 2 ≤ a6 |r(t) cos(α(t)) − r0 cos(α)| + |r(t) − r0|    2 α cos ≤ 2a6 µ(t) − ν , (3.37) sin ≤ a kµ(t) − ναk2 , 7 Hp for all t ≥ 0 and all α ∈ S. Using this together with (3.36), we conclude that

2 µ(t) − να(t) ≤ a (1 + a )e−c2t inf kµ − ναk2 , (3.38) 4 7 0 Hp Hp α∈S 3.3. LONG TIME BEHAVIOR OF THE EMPIRICAL PROCESSES 83 for all t ≥ 0. cos 2 Now let us analyze µ(t), sin , for t ≥ 0. This R –valued function solves

d  cos σ2  cos   sin2 − sin cos  cos µ(t), = − µ(t), + J µ(t), µ(t), . dt sin 2 sin − sin cos cos2 sin

A short calculation yields

  sin2 − sin cos  cos σ2  cos να(t), µ(t), = µ(t), . − sin cos cos2 sin 2J sin

Denote by kAk the absolute value of the largest eigenvalue of the symmetric matrix A. Then, using (3.38), we can proceed with

   2   2  2 d cos α(t) sin − sin cos µ(t), ≤ J µ(t) − ν , dt sin − sin cos cos2 2 α(t) ≤ a8 µ(t) − ν Hp −c2t α 2 ≤ a9 e inf kµ(t) − ν kH , α∈S p

cos for all t ≥ 0. This implies that µ(t), sin converges, as t → ∞, exponentially fast to a 2 vector m ∈ R , i.e.,

 cos 2 a µ(t), − m ≤ 9 e−c2t inf kµ − ναk2 , for all t ≥ 0. 0 H0 sin c2 α∈S

cos(β) Because of (3.36), this vector has to be of the form m = r0 sin(β) , for some β ∈ S. Finally the last inequality together with (3.37) and (3.38) imply

2 2 2 µ(t) − νβ ≤ µ(t) − να(t) + να(t) − νβ Hp Hp Hp 2    2 α(t) β cos ≤ µ(t) − ν + 2a6 µ(t) − ν , Hp sin

−c2t α 2 ≤ c1 e inf kµ0 − ν kH , α∈S α

for all t ≥ 0 and some constant c1 = c1(p) > 0. 2

3.3.2 Convergence of the empirical processes to the sphere S

Fix J > Jc. Then the sphere S has a positive radius r0. Assume that the sequence (γN )N∈N of positive numbers satisfies (2.1). In the last section we have seen that the McKean–Vlasov path converges exponentially fast to some element of the sphere S. (N) (N) Moreover, from Chapter 2 we know that the sequence (X −µ )N∈N satisfies a moderate deviation principle with rate function I. Remind that I(ϑ) = 0 if and only if ϑ = 0. For this reason one could expect that X(N) converges to the sphere S, too. The following proposition makes this statement more precise. 84 CHAPTER 3. THE DYNAMIC BEHAVIOR

3 Proposition 3.3.7 Fix T > 0, p < − 2 , J > Jc and f: N → N with

1 lim 2 ln f(N) < ∞. (3.39) N→∞ NγN Assume that   X(N)(0) − να lim P  inf > ε = 0, (3.40) N→∞ α∈S γN Hp−1 for some sequence (γN )N∈N that satisfies (2.1) and all ε > 0. Moreover, assume that

inf lim X(N)(0) − να = 0, almost surely. (3.41) α∈S N→∞ Hp−1

Then   X(N)(t) − να

lim P  sup inf > ε = 0, (3.42) N→∞ t∈[0,T f(N)] α∈S γN Hp−1 for all ε > 0.

Proof: Without loss of generality we take T > 0 and K > 0 so large that

1 c e−c2T < and K > 1 + c (1 − e−c2T ), 1 2 1 where the constants c1 and c2 are taken from Proposition 3.3.6. Fix ε > 0. For N ∈ N and 0 ≤ k ≤ f(N) − 1, define the sets    X(N)(t) − να  (N) Ak := sup inf > ε , t∈[kT,(k+1)T ] α∈S γN  Hp−1     X(N)(kT ) − να ε  (N) Bk := inf > , α∈S γN K  Hp−1    (N) α Ck := inf lim X (kT ) − ν 6= 0 . α∈S N→∞ Hp−1

Denote by Ac the complement of a set A. Using the Markov property of the measure valued empirical processes X(N), we get

  X(N)(t) − να

P  sup inf > ε t∈[0,T f(N)] α∈S γN Hp−1 f(N)−1     (N) X (N) (N) (N) c ≤ P B0 ∪ C0 + P Ak ∪ Bk+1 ∪ Ck+1 (Bk ∪ Ck) k=0     (N) (N) (N) (N) c ≤ P B0 ∪ C0 + f(N)P A0 ∪ B1 ∪ C1 (B0 ∪ C0) . 3.3. LONG TIME BEHAVIOR OF THE EMPIRICAL PROCESSES 85

From (3.40) and (3.41) it follows that the first summand on the right hand side tends to zero (N) (N) as N tends to infinity. Therefore, we have to analyze the probabilities of A0 , B1 and C1 (N) c under the condition (B0 ∪ C0) . (N) (N) (N) Let us start with A0 . Denote by µ the McKean–Vlasov path with initial datum X (0). Then we get     µ(N)(t) − να c (N) (N) c 1 (N) c P A0 (B0 ∪ C0) ≤ P  sup inf > ε (B0 ∪ C0)  (3.43) t∈[0,T ] α∈S γN K Hp−1   X(N)(t) − µ(N)(t) K − c 1 (N) c + P  sup > ε (B0 ∪ C0) . t∈[0,T ] γN K Hp−1 Now, Proposition 3.3.6 yields   µ(N)(t) − να c 1 (N) c P  sup inf > ε (B0 ∪ C0)  t∈[0,T ] α∈S γN K Hp−1   µ(N)(0) − να 1 (N) c ≤ P  sup inf > ε (B0 ∪ C0)  , t∈[0,T ] α∈S γN K Hp−1 which is zero for all N ∈ N. Using once more (3.40), we can apply Theorem 2.2.2 to the second summand on the right hand side of (3.43). The result is

1   lim ln P A(N) (B(N) ∪ C )c = −∞ 2 0 0 0 N→∞ NγN and it follows that   (N) (N) c lim f(N)P A0 (B0 ∪ C0) = 0. N→∞ (N) Now let us look at the probability of B1 . We compute     µ(N)(T ) − να 1 (N) (N) c (N) c P B1 (B0 ∪ C0) ≤ P inf > ε (B0 ∪ C0)  (3.44) α∈S γN 2K Hp−1   X(N)(T ) − µ(N)(T ) 1 (N) c + P  > ε (B0 ∪ C0) . γN 2K Hp−1 Since we have chosen T > 0 large enough we get by Proposition 3.3.6   µ(N)(T ) − να 1 (N) c P inf > ε (B0 ∪ C0)  α∈S γN 2K Hp−1   µ(N)(0) − να 1 (N) c ≤ P inf > ε (B0 ∪ C0)  , α∈S γN K Hp−1 86 CHAPTER 3. THE DYNAMIC BEHAVIOR which is zero for all N ∈ N. Using again Theorem 2.2.2, we can handle the second summand of the right hand side of (3.44) in the same way as for the sets A(N). The result is   (N) (N) c lim f(N)P B1 (B0 ∪ C0) = 0. N→∞ Finally,

    (N) c (N) (N) (N) c P C1 (B0 ∪ C0) ≤ P lim X (T ) − µ (T ) 6= 0 (B0 ∪ C0) N→∞ Hp−1   (N) α (N) c + P inf lim µ (T ) − ν 6= 0 (B0 ∪ C0) , (3.45) α∈S N→∞ Hp−1

(N) (N) Since the zero set of the rate function I associated with the LDP of (X −µ )N∈N contains (N) (N) only the null distribution the sequence (X − µ )N∈N converges by the Borel–Cantelli lemma almost surely to zero, as N → ∞. Hence, the first summand on the right hand side of (3.45) vanishes. Using once more Proposition 3.3.6, we conclude that the second summand vanishes, too. All together prove   X(N)(t) − να

P  sup inf > ε = 0. t∈[0,T f(N)] α∈S γN Hp−1

Since ε > 0 was arbitrary we are done. 2

3.4 Proof of Theorem 3.1.1

Fix J > Jc. Then the sphere S has a positive radius r0. Let us first recall some notations. The functions f0,β and f1,β are taken from (3.14) and Lemma 3.2.3, respectively. In Section 3.2 we have defined the processes

Y (N)(t) := X(N)(t) ◦ D−1 ,N ∈ , ϕ(N)(t) N

where the angle processes ϕ(N) are up to a stopping time τ (N) the solution of the Itˆostochastic differential equation

 (N) β⊗3 A1 Y (t) − ν σ2 A + O(Y (N)(t) − νβ) dϕ(N)(t) = dM (N)(t) + dt + 2 dt , (3.46) (N) β (N) β 2 A3(Y (t), ν ) N (A3(Y (t), ν ))

Here M (N) is a continuous martingale with quadratic variation

t D β 0 2E (N) β σ2 Z ν , |f0,β| + O(Y (s) − ν ) [[M (N),M (N)]] = ds. t (N) β 2 N (A3(Y (s), ν )) 0 Moreover,

0 3 A1(ϑ) = hϑ, R2,βf1,βi , ϑ ∈ (D (S)) , 3.4. PROOF OF THEOREM 3.1.1 87

∞ (R2,βf)(x, y, z) = (R1,βf(·, z))(x, y) + (R1,βf(z, ·))(x, y), f ∈ C (S × S), Jr   (R f)(x, y) = − 0 sin(β − x)(f 0 (x) − 1) + sin(β − y)(f 0 (y) − 1) 1,β 2 0,β 0,β J + sin(x − y)(f 0(y) − f 0(x)), x, y ∈ S, 2  (N) β D β 0 E (N) β A3 Y (t), ν = ν , f0,β + O(Y (t) − ν )

and  2  D β 0 2E  β β ∂ 1 D β 00 E A2 = ν , |f0,β| ν ⊗ ν , f1,β + ν , f0,β ∂x1∂x2 2  2  D β 0 E D β 0 00 E β β ∂ 0 0  + ν , f0,β ν , f0,βf0,β + ν ⊗ ν , f1,β(f0,β ⊕ f0,β) ∂x1∂x2 2  2  D β 0 E β ∂ + ν , f0,β ν , trace f1,β . ∂x1∂x2

Each term O(Y (N)(t) − νβ) depends continuously on Y (N)(t) − νβ ∈ D0(S). Moreover, the (N) β (N) β sequence (O(Y (t) − ν ))N∈N converges weakly to zero if (Y (t) − ν )N∈N converges, as N → ∞, weakly to zero. For such an angle process ϕ(N) we have

D (N) β E D (N) β ⊗2 E Y (t) − ν , f0,β + (Y (t) − ν ) , f1,β (3.47)

D (N) β E D (N) β ⊗2 E = Y (0) − ν , f0,β + (Y (0) − ν ) , f1,β ,

(N) for all t ∈ [0, τ ] and all N ∈ N.

3 Proof of Theorem 3.1.1 : Fix T > 0 and p < − 2 . First note that the assumptions of Proposition 3.3.7 are fulfilled. We want to prove (i’) of Remark 3.1.2 instead of (i) of Theorem 3.1.1. D β 0 E Lemma 3.2.2 yields that ν , f0,β > 0 and hence all coefficients of (3.46) are bounded and measurable up to the stopping time  1 D E τ (N) := inf t ≥ 0: |O(Y (N)(t) − νβ)| ≥ νβ, f 0 . 2 0,β

Hence, ϕ(N) is well defined up to the stopping time τ (N). For t > τ (N), we define ϕ(N)(t) := ϕ(N)(τ (N)). We split the proof in the following three parts !

(N) β (1) lim P sup Y (t) − ν > ε = 0, for all ε > 0, N→∞ 0≤t≤TN∧τ (N) Hp−1   (2) lim P τ (N) < T N = 0 and N→∞

(N) (3) the processes (ϕ (Nt))t∈[0,T ] converge, as N → ∞, to a Brownian motion on S with −1 2 2 D β 0 2E 2 variance σe = σ ν , |f0,β| > σ and initial datum 0. 88 CHAPTER 3. THE DYNAMIC BEHAVIOR

Part (1): Since

∂ D α β E D α β ⊗2 E D β 0 E ν − ν , f0,β + (ν − ν ) , f1,β = ν , f0,β > 0 ∂α α=β

there exists for each small enough ε > 0 a constant δ > 0 such that ( )

β ξ ∈ C([0,T ]; P(S)): sup ξ(t) − ν > ε (3.48) t∈[0,T ] Hp−1 (

⊆ ξ ∈ C([0,T ]; P(S)): ξ(0) − νβ > δ or Hp−1 D E D E β β ⊗2 sup ξ(t) − ν , f0,β + (ξ(t) − ν ) , f1,β > δ or t∈[0,T ] ) sup inf kξ(t) − ναk > δ . Hp−1 t∈[0,T ] α∈S

β In other words, it is impossible to leave the ε–ball in Hp−1(S) around ν without leaving the tube associated with the right hand side of (3.48), see Figure 1.

ν − νβ < ε Hp−1 α inf kν − ν kH < δ β β ⊗2 α∈S p−1 | ν − ν , f0,β + (ν − ν ) , f1,β | > δ

νβ S

Figure 1.

From Proposition 3.3.7 we know that !

(N) α lim P sup inf Y (t) − ν > δ N→∞ t∈[0,NT ∧τ (N)] α∈S Hp−1 !

(N) α ≤ lim P sup inf X (t) − ν > δ N→∞ t∈[0,NT ] α∈S Hp−1 = 0,

for all δ > 0. Moreover, from (3.47) it follows that ! D E D E (N) β (N) β ⊗2 lim P sup Y (t) − ν , f0,β + (Y (t) − ν ) , f1,β > δ N→∞ t∈[0,NT ∧τ (N)]  D (N) β E D (N) β ⊗2 E  = lim P Y (0) − ν , f0,β + (Y (0) − ν ) , f1,β > δ , N→∞ 3.4. PROOF OF THEOREM 3.1.1 89 which is by (3.7) equal to zero, for all δ > 0. Using once more (3.7), we get

  lim P Y (N)(0) − νβ > δ = 0 N→∞ Hp−1 for all δ > 0 All together prove (1). Part (2): Since for small enough ε > 0 we have

( ) n o (N) (N) β τ < T N ⊆ sup Y (t) − ν > ε t∈[0,NT ∧τ (N)] Hp−1 part (2) follows from part (1). Part (3): Because of Lemma 3.2.2 and Lemma 3.2.3, we get A2 = 0. This together with (3.46) imply

(N) (N) sup ϕ (Nt) − M (Nt) τ(N) t∈[0,T ∧ N ]

NA ((Y (N)(Nt) − νβ)⊗3) O(Y (N)(Nt) − νβ) ≤ T sup 1 + . (3.49) (N) β (N) β 2 τ(N) A3(Y (Nt), ν ) (A3(Y (Nt), ν )) t∈[0,T ∧ N ]

From part (1) it follows that

D E (N) β β 0 sup A3(Y (Nt), ν ) − ν , f0,β τ(N) t∈[0,T ∧ N ] converges in probability to zero as N tends to infinity. Moreover, by the same argument we see that

(N) β sup O(Y (Nt) − ν ) τ(N) t∈[0,T ∧ N ] converges in probability to zero as N tends to infinity. In order to estimate the first summand of the right hand side of (3.49) let us look at the McKean–Vlasov path ϑ with start in some ϑ(0) ∈ P(S). For β ∈ S and t ≥ 0, we get

d D E D E ϑ(t) − νβ, f + (ϑ(t) − νβ)⊗2, f dt 0,β 1,β D β νβ E  β ⊗3 = ϑ(t) − ν , G f0,β + R2,β (ϑ(t) − ν )

 β ⊗3 = R2,β (ϑ(t) − ν ) .

In particular, taking ϑ(0) = να, for some α ∈ S, it follows that

 α β ⊗3 R2,β (ν − ν ) = 0. 90 CHAPTER 3. THE DYNAMIC BEHAVIOR

This implies   (N) β ⊗3 P  sup |NA1((Y (Nt) − ν ) )| > ε τ(N) t∈[0,T ∧ N ]   √ 3 (N) α ≤ P  sup inf N(X (Nt) − ν ) > ε , τ(N) α∈S H−2 t∈[0,T ∧ N ] which, as N tends to infinity, converges by Proposition 3.3.7 to zero, for each ε > 0. Therefore,

(N) (N) sup ϕ (Nt) − M (Nt) t∈[0,T ] converges in probability to zero as N tends to infinity. (N) Because of part (1), the quadratic variation of the continuous martingale (M (Nt))t∈[0,T ] converges, as N → ∞, in probability to D E νβ, |f 0 |2 2 2 0,β σe t = σ t 2 . D β 0 E ν , f0,β

(N) (N) This implies that (M (Nt))t∈[0,T ] and therefore (ϕ (Nt))t∈[0,T ] converges, as N → ∞, in 2 distribution to a Brownian motion on S with variance σe and initial datum zero, see [EK86, Theorem 1.4 of chapter 7, p. 339]. Finally, Lemma 3.2.2 yields

1 1 σ2 = σ2 = σ2 > σ2. e D β 0 2E D β 0 E ν , |f0,β| ν , f0,β

2

3.5 Wrong test functions

We are still discussing the one dimensional model defined by (3.1). We identify the one α dimensional sphere S with R modulo 2π. Let J > Jc. Then the sphere S = {ν : α ∈ S} has a positive radius r0. In the last section we have seen that the time speeded up measure valued (N) empirical process (X (Nt))t∈[0,T ] converges, as N tends to infinity, weakly to a Brownian motion on the sphere S with variance

σ2 σ2 = > σ2. e D 0 0 E ν , f0,0

In this section we want to explain why the “mean” is the wrong test function. Or more precise, why we don’t look at test functions of the form

cos(α − ·), α ∈ S. 3.5. WRONG TEST FUNCTIONS 91

(N) Assume that ϕ , N ∈ N, are S–valued semi–martingales. Moreover, assume that as N tends to infinity (N) (ϕ (Nt))t∈[0,T ] converges in distribution to a Brownian motion on S starting at 0 with varianceσ ¯2 and D E X(N)(Nt), f(· + ϕ(N)(Nt))

converges in probability to νβ, f , for all f ∈ C∞(S) and all t ∈ [0,T ]. Using Itˆo’sformula, we get

N D E σ X d X(N)(Nt), f(· + ϕ(N)(Nt)) = f 0(x (Nt) + ϕ(N)(Nt))dW (Nt) N k k k=1 D E + X(N)(Nt), f 0(· + ϕ(N)(Nt)) dϕ(N)(Nt)

D (N) E + N X(N)(Nt), (LX (Nt)f)(· + ϕ(N)(Nt)) dt 1 D E2 + X(N)(Nt), f 00(· + ϕ(N)(Nt)) d[[ϕ(N), ϕ(N)]] 2 Nt N σ X + f 00(x (Nt) + ϕ(N)(Nt))d[[W , ϕ(N)]] , N k k Nt k=1 for all N ∈ N. If the function f is the right choice for a test function then the variances of

t N Z σ X f 0(x (Ns) + ϕ(N)(Ns))dW (Ns) N k k 0 k=1 and t Z D E X(N)(Ns), f 0(· + ϕ(N)(Ns)) dϕ(N)(Ns) 0 should converge to the same limit as N tends to infinity. Therefore, let us compute the quadratic variation of both processes. We get

t Z D E σ2 X(N)(Ns), f 0(· + ϕ(N)(Ns))2 ds 0 and t Z 2 D (N) 0 (N) E (N) (N) X (Ns), f (· + ϕ (Ns)) d[[ϕ , ϕ ]]s, 0 respectively. As N tends to infinity, this converge to D E σ2t νβ, |f 0|2

and D E2 σ¯2t νβ, f 0 , 92 CHAPTER 3. THE DYNAMIC BEHAVIOR respectively. Therefore, for a proper test function f, it should follow that D E D E2 σ2 νβ, |f 0|2 =σ ¯2 νβ, f 0 .

2 2 Note that for f = f0,β the last equation yields the correct varianceσ ¯ = σe . Now assume that cos(α − ·) is the right choice for a test function, for some α ∈ S. Then we get D E D E2 σ2 νβ, | sin(α − ·)|2 =σ ¯2 νβ, sin(α − ·) . Using the integration by parts formula, we get  σ2 σ2  σ2 + (1 − ) sin2(α − β) =σ ¯2r2 sin2(α − β). 2J J 0 If α = β thenσ ¯ has to be equal to infinity, which makes no sense. Therefore, cos(β − ·) is not a good choice for a test function. Otherwise, it follows that

σ2 σ2 2 2 2 2J + (1 − J ) sin (α − β) σ¯ = σ 2 2 . r0 sin (α − β) π The right hand side attains its minimum at α = 2 + β. Therefore, 2 ! 1 − σ νβ, cos2(β − ·) σ¯2 ≥ σ2 2J = σ2 2 β 2 r0 hν , cos(β − ·)i β 2 D β 0 2E ν , cos (β − ·) ν , |f0,β| = σ2 . β 2 D β 0 2E hν , cos(β − ·)i ν , |f0,β| Using first the Cauchy–Schwarz inequality and then Lemma 3.2.2, we conclude that 2 D β 0 E ν , cos(β − ·)f0,β σ¯2 ≥ σ2 β 2 D β 0 2E hν , cos(β − ·)i ν , |f0,β| νβ, cos(β − ·) 2 = σ2 β 2 D β 0 2E hν , cos(β − ·)i ν , |f0,β| σ2 = , D β 0 E ν , f0,β where equality holds if and only if 0 f0,β(x) = c cos(β − x), for all x ∈ S and some constant c ∈ R. Using part (i) of Lemma 3.2.2, we see that this is obviously not the case. Therefore, the test functions cos(α − ·), α ∈ S, would lead to a variance σ2 σ¯2 > = σ2, D β 0 E e ν , f0,β which is to large. In other words, the functions cos(α − ·), α ∈ S, are the wrong test functions. 3.6. NOTES 93

3.6 Notes

First we make some notes about higher dimensions. Fix d ≥ 2. Let Sd be the d–dimensional sphere. Denote by X(N) the measure valued empirical process of the coupled model defined by (1.2). Assume that the mean field interaction constant J is larger than Jc. Then the sphere S has a positive radius r0. We expect that Theorem 3.1.1 is true for d ≥ 2, too. If we try to adapt our proof for d d dimension one we have to specify what the rotation Dα of S with angle α ∈ S means. Note, since d ≥ 2 there does not exists an unique rotation of Sd, which maps a given point x ∈ Sd onto another given point y ∈ Sd. d One way to define the rotation Dα is as follows. Fix some basis (e1, . . . , ed) of TβS and assume that ϕ(N) is a Sd–valued semi–martingale starting at β. Then the horizontal lift, see (N) [HT94, Satz 7.141], yields an unique (ϕ (t), e1(t), . . . , ed(t))t∈[0,∞) with

d (e1(t), . . . , ed(t)) is a basis of Tϕ(N)(t)S , for all t ∈ [0, ∞),

d and (e1(0), . . . , ed(0)) = (e1, . . . , ed). For t ∈ [0, ∞), we define the rotation Dϕ(N)(t) of S with angle ϕ(N)(t) in such a way that (N) Dϕ(N)(t)β = ϕ (t) and

Dϕ(N)(t)ei = ei(t), 1 ≤ i ≤ d. (N) For N ∈ N, assume that ϕ is a solution of

(N) (N) (N) dϕ (t) = A0 (ϕ (t), x1(t), . . . , xN (t)) dt (3.50) N d+1 X X (N) (N) i + Ak,i (ϕ (t), x1(t), . . . , xN (t)) ∗ dWk(t), k=1 i=1

(N) (N) for some vector fields A0 and Ak,i , 1 ≤ k ≤ N and 1 ≤ i ≤ d + 1. Here W1,...,WN are the d+1 same Brownian motions on R as in (1.2). Now one can try to define the vector fields of (3.50) in such a way that D E D E X(N)(t) ◦ D−1 − νβ, f = X(N)(0) ◦ D−1 − νβ, f , ϕ(N)(t) ϕ(N)(0) for all t ≥ 0 and suitable test functions f ∈ C∞(Sd). Like in the one dimensional case the β eigenfunctions corresponding to the eigenvalue zero of the operator Ge ν are good candidates β for such test functions. One can prove that the operator Ge ν is self–adjoint in a suitable Hilbert space and that the eigenspace corresponding to the eigenvalue zero is d–dimensional. Therefore, in order to prove our conjecture it should be possible to following the idea of the proof of the one dimensional case. d Now let us make some notes about the mean field model in R defined by (1.40). The main d problem one has in this case is the non–compactness of R . And in contrast to Chapter 1 this is an essential problem. Some aspects we already discussed in Section 2.7. d Another aspect of the non–compactness of R is to derive a suitable logarithmic Sobolev inequality for the proof of Lemma 3.3.3. Fortunately, this problem is solved, for “nice” potentials, see the article [BE84] of Dominique Bakry and Michel Emery. 94 APPENDIX

None the less we expect that for “nice” potentials the time speeded measure valued empirical d processes converge to a Brownian motion on a finite dimensional submanifold of P(R ). Like in Chapter 1 one could study higher levels. We only want to make some remarks (1) for level two in the one dimensional case. Fix T > 0 and J > Jc. Then the level one (N,1) measure valued empirical processes (Xl (Nt))t∈[0,T ] of the l-th box converge, as N → ∞, in distribution to a Brownian motion Wl. The question is whether or not there occurs a kind of synchronization between these Brownian motions as N tends to infinity. We guess that there exists a critical level two mean field interaction Jc, which may be differ from the critical (2) value in the stationary case, such that a synchronization appears if and only if J > Jc. One method to prove such a result are large deviation principles. The time scale t 7→ N 2t should yield another interesting phenomenon. We expect that (N,2) 2 the level two measure valued empirical processes (X (N t))t∈[0,T ] converge, as N → ∞, in distribution to a Brownian motion on the sphere S(2). The main problem for both time scales would be the construction of proper test functions, which are now functions on P(S). However, if one understand level two it should be no problem to analyze higher levels. Appendix

A.1 Logarithmic Sobolev inequalities

In this section we state some result about logarithmic Sobolev inequalities. Therefore, let M be a finite dimensional connected compact Riemannian C∞–manifold without boundary. We say a probability measure µ on M satisfies a logarithmic Sobolev inequality with constant κ > 0 if * !2+ ϕ D E κ µ, ϕ2 ln ≤ µ, kgradϕk2 , kϕkµ

for all ϕ ∈ L2(µ), where kϕkµ denotes the L2–norm of ϕ with respect to µ. Logarithmic Sobolev inequalities are closely related to hypercontractivity of diffusion pro- cesses, see for instance [Eme89] or [DS89]. Since we are on a connected compact Riemannian C∞–manifold one knows that each probability measure µ on M, with

dµ dµ 0 < c1 ≤ inf ≤ sup ≤ c2 < ∞ x∈M dλ x∈M dλ satisfies a logarithmic Sobolev inequality for some constant κ > 0, see [DS89]. Here λ denotes the Lebesgue measure on M. We want to illustrate why the constant κ > 0 can be taken in such a way that it only depends on c1 and c2. Let us start with two elementary but very useful inequalities.

Lemma A.1.1 (i) For all s, t ∈ R, one has

s2 ln s2 − s2 ln t2 − s2 + t2 ≥ 0, (A.1)

where equality holds if and only if s2 = t2. Here we define ln(0) := −∞.

(ii) For all probability measures µ, ν on M and all functions ϕ ∈ L2(µ) ∩ L2(ν), one has

* !2+ 2 ϕ D 2 2 2 2 2 2E µ, ϕ ln ≤ µ, ϕ ln ϕ − ϕ ln kϕkν − ϕ + kϕkν , kϕkµ

where kϕkµ denotes the L2–norm of ϕ with respect to the measure µ.

Proof: Consider the left hand side of (A.1) as a function of t ∈ R. An easy calculation shows that this function is minimal if and only if t2 = s2. Hence, (i) follows.

Moreover, (ii) is a consequence of (i) with t = kϕkν and s = kϕkµ. 2

95 96 APPENDIX

Now we can explain how to derive a logarithmic Sobolev inequality from another.

Lemma A.1.2 Assume that the probability measure ν on M satisfies a logarithmic Sobolev inequality with constant κ > 0. Let µ be another probability measure on M with dµ 0 < δ ≤ ≤ K, (A.2) dν δ for some δ, K > 0. Then ν satisfies a logarithmic Sobolev inequality with constant κ K .

Proof: First, from (A.2) it follows that L2(µ) = L2(ν). Therefore, we get by A.1.1.(ii) * !2+ 2 ϕ D 2 2 2 2 2 2E µ, ϕ ln ≤ µ, ϕ ln ϕ − ϕ ln kϕkν − ϕ + kϕkν , kϕkµ

for all ϕ ∈ L2(µ). The integrand on the right hand side is by A.1.1.(i) nonnegative. Hence, * !2+ 2 ϕ D 2 2 2 2 2 2E µ, ϕ ln ≤ K ν, ϕ ln ϕ − ϕ ln kϕkν − ϕ + kϕkν kϕkµ * +  ϕ 2 = K ν, ϕ2 ln kϕkν K D E ≤ ν, kgradϕk2 κ K D E ≤ µ, kgradϕk2 . κδ 2 All together, we get the following uniform result.

Proposition A.1.3 Fix c1, c2 > 0. There exists a constant κ > 0 such that if a probability measure µ on M satisfies dµ dµ c1 ≤ inf ≤ sup ≤ c2 x∈M dλ x∈M dλ then it satisfies a logarithmic Sobolev inequality with constant κ c1 . c2 Proof: Since M is a finite dimensional connected compact Riemannian C∞–manifold without boundary the uniform distribution on M satisfies a logarithmic Sobolev inequality for some constant κ > 0. Now our claim follows directly from Lemma A.1.2. 2

A.2 Sobolev Spaces

Let M be a finite dimensional connected compact Riemannian C∞–manifold without bound- ary. We denote by ∆ the Laplace operator and by λ the uniform distribution on M. Sobolev spaces are very useful in the theory of partial differential equations, see for instance Section 2.3. There are different possibilities to define such spaces, each with its own assets and drawbacks. We will use the following definition taken from the book [CP82] of Jacques Chazarain and Alain Piriou. A.2. SOBOLEV SPACES 97

Definition A.2.1 Let λl, l ∈ N, be the eigenvalues of the operator −∆ and denote by (el)l∈N a orthonormal basis of L2(M) of eigenfunctions. For p ∈ R, we define the set Hp(M) by n o H (M) = ϑ ∈ D0(M): kϑk < ∞ , (A.3) p Hp with norm ∞ X kϑk2 := (1 + λ )p hϑ, e i2 . (A.4) Hp l l l=1 Then the inner product ∞ X (ϑ , ϑ ) := (1 + λ )p hϑ , e i hϑ , e i , ϑ , ϑ ∈ H (M), 1 2 Hp l 1 l 2 l 1 2 p l=1

makes Hp(M) a Hilbert space. The following properties can be found in several books about Sobolev spaces or partial differential equations.

∗ Lemma A.2.2 (i) Hp+q(M) = Hp−q(M), for all p ∈ R and all q ∈ R.

(ii) The Sobolev space Hp(M) is continuously and compactly embedded in Hq(M), for all p, q ∈ R with p > q. d m (iii) Fix m ∈ N and p > 2 + m. Then Hp(M) is continuously embedded in C (M). (iv) Fix p ∈ R and some atlas U of M. Take m ∈ N with m ≥ p. Then there exists a constant c > 0, depending on m and U, such that α kϑk ≤ c sup kD ϑk 0 = c kϑk m , for all ϑ ∈ H (M). Hp C C p d α∈N , |α|≤m

(v) For all p ∈ R, all q > 0 and all ε > 0 there exists a constant c > 0 such that kϑk ≤ ε kϑk + c kϑk , Hp Hp+q Hp−q

for all ϑ ∈ Hp(M).

(vi) Fix p ∈ R. A subset K of C([0,T ]; Hp(M)) is relatively compact if and only if the following two assertions are valid.

(a) For each t ∈ [0,T ], the set {ϑ(t): ϑ ∈ K} is compact in Hp(M) and (b) lim sup sup kϑ(t) − ϑ(s)k = 0 . Hp δ→0 ϑ∈K 0≤s,t≤T |t−s|<δ

Proof: The duality is given by ∞ X (f, g) = (1 + λ )p hf, e i hg, e i , f ∈ H (M), g ∈ H (M). Hp l l l p+q p−q l=0 This proves (i). Moreover, the statements (ii)–(v) can be found in [CP82, Chapter 2]. Finally, (vi) is a consequence of the Arzel`a–Ascoli theorem. 2 98 APPENDIX

Since we want to use Sobolev spaces for solutions µ ∈ P(M) of parabolic partial differential equations we have to know whether or not a probability measure is an element of the Sobolev space Hp(M), for some p ∈ R. Therefore, we require estimates for the eigenvalues λl as l tends to infinity. A very good book about this topic is [Cha84] by Isaac Chavel. Most of the results of the next lemma are taken from this book. ∞ X p 2 d Lemma A.2.3 (i) sup (1 + λl) el(x) < ∞, for each p < − 2 . x∈M l=1 (ii) For p ∈ R, define ∞ X p 2 κp := sup (1 + λl) kgrad elkx . x∈M l=1 d Then κp is finite, for all p < − 2 − 1. d (iii) Fix p < − 2 . Then P(M) is continuously and compactly embedded in Hp(M). Proof: In order to prove (i) and (ii) one use the identity ∞ X −tλl p(t, x, y) = e el(x)el(y) , t ∈ [0,T ], x, y ∈ M, l=1 and estimates for the heat kernel p(t, x, y), see [Cha84, Chapter VI, Section 4]. d d Fix p < − 2 and take r ∈ R in such a way that p < r < − 2 . Using Jensen’s inequality, see [Bau91, Satz 3.9], we get 2 ∞ Z Z X kµk2 = δ µ(dx) ≤ kδ k2 µ(dx) ≤ sup (1 + λ )re (x)2, Hr x x Hr l l x∈M Hp l=1 for all µ ∈ P(M). By (i) the right hand side finite. Therefore, using A.2.2.(ii), we conclude that P(M) is a compact subset of Hp(M).

Now suppose that (µn)n∈N converges, as n → ∞, in P(M) to some µ ∈ P(M). Then the set {µn: n ∈ N} is a compact subset of Hp(M). Let µe be some accumulation point of {µn: n ∈ N} in Hp(M), i.e., the limit in Hp(M) of some subsequence (µnk )k∈N. Then we have

hµ, fi = lim hµn , fi = hµ, fi , e k→∞ k for all f ∈ C∞(M). Hence, it follows that µ = µ. This implies that (µ ) converges to µ e n n∈N in Hp(M) as n → ∞. Therefore, the embedding is continuous. 2

Finally, we give some statements about the behavior of the Hp–norm.

Lemma A.2.4 Fix p ∈ R. Then the following mappings are continuous: (i) Multiplication with a function ϕ ∈ C∞(M), i.e., ∞ C (M) × Hp(M) 3 (ϕ, ϑ) 7→ ϕϑ ∈ Hp(M). Moreover, for each atlas U of M and each m ∈ N with m ≥ p, there exists a constant c > 0, depending on m and U, such that

kϕϑk ≤ c kϕk m kϑk , Hp C Hp ∞ for all ϑ ∈ Hp(M) and all ϕ ∈ C (M). A.3. STOCHASTIC DIFFERENTIAL EQUATIONS ON MANIFOLDS 99

(ii) Applying smooth vector fields B, i.e.,

τM × Hp(M) 3 (B, ϑ) 7→ Bϑ ∈ Hp−1(M).

Moreover, for each atlas U and each m ∈ N with m ≥ p, there exists a constant c > 0, depending on m and U, such that

kBϑk ≤ c kBk m kϑk , Hp−1 C Hp

for all ϑ ∈ Hp(M) and all B ∈ τM.

(iii) Applying the adjoint operator B∗ of a smooth vector field B, i.e.,

∗ ∗ ∞ τM×Hp(M) 3 (B, ϑ) 7→ B ϑ ∈ Hp−1(M), with hB ϑ, fi = hϑ, Bfi , f ∈ C (M).

Moreover, for each atlas U and each m ∈ N with m ≥ p, there exists a constant c > 0, depending on m and U, such that

∗ kB ϑk ≤ c kBk m kϑk , Hp−1 C Hp

for all ϑ ∈ Hp(M) and all B ∈ τM.

Proof: See [CP82, Theorem 2.19. of Chapter 2]. 2

A.3 Stochastic differential equations on manifolds

∞ Fix d ∈ N and let M be a d–dimensional connected compact Riemannian C –manifold with- out boundary. As usual grad and ∆ denote the gradient and Laplacian on M, respectively. Moreover, let (·, ·) denote the inner product on the tangent manifold T M. Unless otherwise denoted all non–random differential geometrical objects are C∞.

Fix a filtered probability space (Ω, F,P, (F)t∈R+ ), which satisfies the usual condition, i.e., F0 contains all P –null sets of F.A M–valued adapted process x = (x(t))t∈R+ with continuous ∞ trajectories is called a semi–martingale on M if for each f ∈ C (M) the R–valued process (f(x(t))t∈R+ is a continuous semi–martingale in the usual sense. This definition is due to Laurent Schwartz. We want to consider M–valued semi–martingales x, which are solutions of Stratonovich type stochastic differential equations of the following kind

m X i dx(t) = B(t)(x(t))dt + σ Ai(x(t)) ∗ dW (t), (A.5) i=1

1 m where σ > 0 is the diffusion constant, m ∈ N and W ,...,W are independent standard Brownian motions on R. Moreover, the drift vector field B(t), t ∈ R+, depends continuously on t and A1,...,Am are vector fields with

m X ∞ AiAif = ∆f, for all f ∈ C (M). i=1 100 APPENDIX

If m ∈ N is large enough then such vector fields A1,...,Am always exist. A semi–martingale on M adapted to the filtration generated by the Brownian motions W1,...,Wm is by definition a solution of (A.5) if

t Z  σ2  f(x(t)) − f(x(0)) = (B(s)(x(s)), grad f(x(s))) + ∆f(x(s)) ds 2 0 t m Z X i + σ (Ai(x(s)), grad f(x(s))) dW (s), i=1 0 for all t ≥ 0 and all f ∈ C∞(M). Here the stochastic integral is an Itˆointegral. σ2 Each solution x of (A.5) is a diffusion process on M with generator 2 ∆ + B(·). Moreover, for each initial datum x0 ∈ M, there exists an unique solution of (A.5) with x(0) = x0, see Theorem 1.1 of Chapter V of [IW89]. If B vanishes then the solution x of (A.5) is called a Brownian motion on M. For shorter notations we define A := (A1,...,Am). Then (A.5) can be rewritten as

dx(t) = B(t)(x(t))dt + σA(x(t)) ∗ dW (t),

1 m m where W = (W ,...,W ) is a standard Brownian motion on R and m X i A(x(t)) ∗ dW (t) := Ai(x(t)) ∗ dW (t). i=1 For a detailed introduction to the topic of and differential equations on manifolds, see for instance [HT94] by Wolfgang Hackenbroch and Anton Thalmaier, [IW89] by Nobuyuki Ikeda and Shinzo Watanabe or [Eme89] by Michel Emery. Note that in contrast to the Euclidean case in general there does not exists “the Brownian motion” on a given manifold. This means, some properties of a Brownian motion on a manifold depend on the construction, i.e., on the vector fields A1,...,Am. In order to illustrate this phenomenon we want to study one property of a Brownian motion on the one dimensional sphere S. 2 One way to construct a Brownian motion on S is as follows. Consider S ={x∈R : kxk 2 =1} 2 R as submanifold of R . Then the tangent space TxS at the point x = (x1, x2) ∈ S is equal to

TxS = span{(x1, −x2)}.

Therefore, 2 A(x) := (x1, −x2), x = (x1, x2) ∈ R , is a vector field. One easily compute

A2f = f 00,

for all f ∈ C∞(S). Hence, the solution x ∈ C([0, ∞); S) of

dx(t) = A(x(t)) ∗ dW (t) and x(0) = z (A.6)

is a Brownian motion on S, for each Brownian motion W on R and each initial datum z ∈ S. A.3. STOCHASTIC DIFFERENTIAL EQUATIONS ON MANIFOLDS 101

A second way to construct a Brownian motion on S is to take the vector fields

A1(x) := x2A(x) and A2(x) = −x1A(x), x ∈ S.

Then the solution y ∈ C([0, ∞); S) of

1 2 d(y(t) = A1(y(t)) ∗ dW (t) + A2(y(t)) ∗ dW (t) and y(0) = z (A.7)

1 2 2 is a Brownian motion on S, for each Brownian motion W = (W ,W ) on R and each initial datum z ∈ S.

Now fix to points z1 and z2 of S. Denote by xz1 and xz2 the Brownian motions on S con- structed with the same Brownian motion W via (A.6) with initial data z1 and z2, respectively. Then one has

kxz (t) − xz (t)k 2 = kz1 − z2k 2 , for all t ≥ 0 a.s. 1 2 R R

On the other hand, denote by yz1 and yz2 the Brownian motions on S constructed with the same Brownian motion W = (W1,W2) via (A.7) with initial data z1 and z2, respectively. Then one can show that

lim kyz (t) − yz (t)k 2 = 0 a.s. t→∞ 1 2 R Therefore, some properties of a Brownian motion on a manifold depend on its construction. 102 APPENDIX

A.4 Frequently used notation

symbol description, (page of definition) d R d–dimensional Euclidean space S one dimensional sphere Sd d–dimensional sphere M connected compact Riemannian C∞–manifold TxM tangent space at x ∈ M T M tangent manifold of M τM space of all C∞ vector fields over M ∆ Beltrami–Laplace operator grad gradient div divergence λ uniform distribution κp see (pg. 23) cos(x, y) cosine of the angle between x and y, (pg. 1) kfk norm of f

kfkV norm of f ∈ V (f, g) inner product of f and g

(f, g)V inner product of f, g ∈ V C(M) space of all continuous functions f: M → R C([0,T ]; V ) space of all continuous functions f: [0,T ] → V Cm(M) space of all m–times continuously differentiable functions f: M → R ∞ C (M) space of all f: V → R possessing continuous derivatives of all orders L2(M) space of all functions f: M → R, which are square integrable with respect to the uniform distribution λ L2(M; µ) space of all functions f: M → R, which are square integrable with respect to the measure µ L2([0,T ]; V ) space of all functions f: [0,T ] → V , with kf(·)kV ∈ L2([0,T ]) D0(M) Schwartz space of all distribution on M, algebraic dual of C∞(M) hϑ, fi action of ϑ on f Hp(M) Sobolev space of order p ∈ R, (pg. 23) kϑk Sobolev norm of order p ∈ , (pg. 23) Hp R (ϑ , ϑ ) inner product in H (M), (pg. 23) 1 2 Hp p H−1(M; µ) Sobolev space of order −1 with respect to the measure µ, (pg. 24) ACµ set of all distribution valued functions ϑ ∈ C([0,T ]; D0(M)), which are absolutely continuous with respect to µ ∈ C([0,T ]; P(M)), (pg. 24) B(·) distribution dependent vector field, which satisfies Assumption 2.1.2 A.4. FREQUENTLY USED NOTATION 103

symbol description (page of definition) Lµ see (pg. 22) and (pg. 66) Gµ see (pg. 22) and (pg. 66) Ge µ see (pg. 66) P(M) space of all probability measures on M (n) P (M) space of all level n ∈ N probability measures on M, (pg. 3) P (A) probability of the set A E (ξ) expectation of the random variable ξ W Brownian motion (n) J strength of the level n ∈ N mean field interaction JJ (1) Jc critical value of the mean field interaction, (pg. 7) X(N) measure valued empirical process of the coupled model, (pg. 21) (N,n) X level n ∈ N measure valued empirical process of the coupled model, (pg. 2) and (pg. 3) ϕ(N) angle process, (pg. 69) Y (N) rotated measure valued empirical process of the coupled model, (pg. 68) X(N) measure valued empirical process of the free model, (pg. 34) ϑ M (N) measure valued martingale part of X(N), (pg. 58) M (N) measure valued martingale part of X(N), (pg. 58) f ϑ (γN )N∈N rate for moderate deviation principles, (pg. 21) Ξ(µ|ν) relative entropy of the probability measure µ with respect to the probability measure ν, (pg. 76) (n) Iinv level n ∈ N rate function of the large deviation principle for the invariant distributions, (pg. 10) (1) Iinv Iinv, (pg. 5) I rate function of the moderate deviation principle for the coupled model, (pg. 25)

Iϑ rate function of the LDP for the free model, (pg. 52) α,n (n) ν zero points of Iinv, (pg. 7) and (pg. 10) να να,1, (pg. 7) S(n) sphere of all να,n, α ∈ Sd, (pg. 7) and (pg. 10) SS(1), (pg. 7) (n) (n) d r (µ) level n ∈ N length of µ ∈ P (S ), (pg. 5) and (pg. 9) (n) (n) d α (µ) level n ∈ N angle of µ ∈ P (S ), (pg. 5) and (pg. 9) (n) (n) r0 radius of the sphere S , (pg. 10) (1) r0 r0 GJ see (pg. 7) LDP large deviation principle 104 APPENDIX Bibliography

[Bau91] Heinz Bauer. Wahrscheinlichkeitstheorie. De Gruyter Lehrbuch. Berlin etc.: Walter de Gruyter, 1991.

[BE84] Dominique Bakry and Michel Emery. Hypercontractivit´ede semi-groupes de diffu- sion (Hypercontractivity for diffusion semi-groups). C. R. Acad. Sci., Paris, S´er. I, 299:775–778, 1984.

[Cha84] Isaac Chavel. Eigenvalues in Riemannian geometry. With a chapter by Burton Randol. With an appendix by Jozef Dodziuk., volume 115. Pure and Applied Math- ematics, Orlando etc.: Academic Press, Inc, 1984.

[CP82] Jacques Chazarain and Alain Piriou. Introduction to the theory of linear partial dif- ferential equations. Transl. from the French ed. by Trans. Inter-Scientia, Tonbridge, England., volume 14. Studies in Mathematics and Its Applications, Amsterdam - New York - Oxford: North-Holland Publishing Company, 1982.

[DG87] Donald A. Dawson and J¨urgen G¨artner. Large deviations from the McKean–Vlasov limit for weakly interacting diffusions. Stochastics, 20:247–308, 1987.

[DG88] Donald A. Dawson and J¨urgen G¨artner. Long time behaviour of interacting dif- fusions. In Stochastic calculus in application, Proc. Symp., Cambridge/UK 1987, Pitman Res. Notes Math. Ser. 197, 29–54 . 1988.

[DG89] Donald A. Dawson and J¨urgen G¨artner. Large deviations, free energy functional and quasi-potential for a mean field model of interacting diffusions. Mem. Am. Math. Soc., 398, 1989.

[DH00] Frank Den Hollander. Large deviations. Fields Institute Monographs. 14. Provi- dence, RI: AMS, American Mathematical Society., 2000.

[DS63] Nelson Dunford and Jacod T. Schwartz. Linear operators. Part I–III. New York and London: Interscience Publishers, a division of John Wiley and Sons, 1963.

[DS89] Jean-Dominique Deuschel and Daniel W. Stroock. Large deviations. Rev. ed. Pure and Applied Mathematics, 137. Boston, MA etc.: Academic Press, Inc, 1989.

[Dud89] Richard M. Dudley. Real analysis and probability. Wadsworth & Brooks/Cole Mathematics Series. Pacific Grove, CA, 1989.

[DZ93] Amir Dembo and Ofer Zeitouni. Large Deviations Techniques and Applications. Jones and Bartlett Publishers, 1993.

105 106 BIBLIOGRAPHY

[EK86] Stewart N. Ethier and Thomas G. Kurtz. Markov processes. Characterization and convergence. Wiley Series in Probability and Mathematical . New York etc.: John Wiley & Sons., 1986.

[Eme89] Michel Emery. Stochastic calculus in manifolds. With an appendix by P. A. Meyer. Universitext. Berlin etc.: Springer-Verlag, 1989.

[EMN76] Richard S. Ellis, James L. Monroe, and Charles M. Newman. The GHS and Other Correlation Inequalities for a Class of Even Ferromagnets. Commun. Math. Phys., 46:167–182, 1976.

[FW84] Mark I. Freidlin and Alexander D. Wentzell. Random perturbations of dynamical systems. Springer–Verlag, New York, 1984.

[G¨ar88] J¨urgen G¨artner. On the McKean-Vlasov limit for interacting diffusions. Math. Nachr., 137:197–248, 1988.

[Gre00] Andreas Greven. Interacting stochastic systems: longtime behavior and its renor- malization analysis. Jahresber. Dtsch. Math.-Ver., 104:149–170, 2000.

[HT94] Wolfgang Hackenbroch and Anton Thalmaier. Stochastische Analysis. B. G. Teub- ner Stuttgart, 1994.

[IW89] Nobuyuki Ikeda and Shinzo Watanabe. Stochastic Differential Equations and Dif- fusion Processes. North-Holland Publishing Company, 1989.

[LSU68] Olga A. Ladyˇzenskaya, Vsevolod A. Solonnikov, and Nina N. Ural’tseva. Linear and quasi-linear equations of parabolic type. Translations of Mathematical Monographs. 23. Providence, RI: American Mathematical Society, 1968.

[Sim93] Barry Simon. The statistical mechanics of lattice gases. Vol. I. Princeton Series in Physics. Princeton, 1993.

[Wlo82] Joseph Wloka. Partielle Differentialgleichungen. B. G. Teubner, Stuttgart, 1982.