arXiv:1901.02556v1 [math.PR] 8 Jan 2019 Introduction 1 : Φ a fapoesdsrbdb cenVao tcatcdifferent stochastic McKean-Vlasov a by case described process a of law edsigihtocss a) cases: two distinguish We e kqatttv rpgto fcasvadfeeta ca differential via chaos of propagation quantitative Weak h i fti oki opoiea xc ekerrepninbetwe expansion error weak exact an provide to is work this of aim The P µ 2 siae o nebe fitrcigprilsadso t show and particles interacting able properties. of are ensembles we for Furthermore, estimates systems. particle quantit interacting reveals for work ga regularit this Ultimately, the mean-field analyse SDEs. of and McKean-Vlasov case literature each the connected for required in intimately conditions equation is master and the measures second called probability The of measures. flow of space the particl the corresponding on the derivatives of functional laws marginal of law empirical ereo mohes edsigihtocss a) cases: two distinguish We smoothness. of degree of sample a of measure empirical µ ae st hwta ne utberglrt conditions, regularity suitable under that show to is paper o oepstv constants positive some for nue yte2Wsesendistance 2-Wasserstein the by induced N ( b) ; R steeprclmaueo h agnllw ftecorrespondin the of laws marginal the of measure empirical the is osdrtemti space metric the Consider d ) µ → samria a fMKa-lsvsohsi differential stochastic McKean-Vlasov of law marginal a is R enFaçi Chassagneux Jean-François fteeprclmeasure empirical the of 2 colo ahmtc,Uiest fEdinburgh of University Mathematics, of School µ N ntesaeo measures of space the on C steeprclmaueof measure empirical the is | 1 1 Φ( SM nvriéPrsDiderot Paris Université LSPM, C , . . . , ( P µ 2 ) ( N R − d k E admvralsdsrbtdas distributed variables random ) − W , Φ( W µ 1 N µ 2 htd o eedon depend not do that 2 Let . ) N P ∈ Abstract 1 fsur nerbelw on laws integrable square of ) uazSzpruch Lukasz , | = 1 2 : Φ ( k X j R =1 − µ 1 N d P ) N C steeprclmaueof measure empirical the is 2 n t eemnsi limit deterministic its and j j ( R + ntecs ffntoaso h asof laws the of functionals of case the in y ehave we tv siae fpoaaino chaos of propagation of estimates ative d ytm h rtcs ssuidusing studied is case first The system. e opoiewa rpgto fchaos of propagation weak provide to ) a hs a aesm remarkable some have may these hat O N aerle na t-yefruafor formula Itô-type an on relies case e.W tt h eea regularity general the state We mes. ( → oPE ntesaeo measures, of space the on PDEs to smlsfrom -samples N 1 k R qaini hc case which in equation 2 a qain(cVSE,i which in (McKV-SDE), equation ial N ) n li Tse Alvin and , , eafnto and function a be where , µ h anrsl fthis of result main The . atcesystem. particle g na(olna)functional (nonlinear) a en R k d orsod othe to corresponds µ ihtetopology the with b) ; N smlsfrom -samples Φ( 2 µ µ µ stemarginal the is ) N µ , N µ ethe be san is P ∈ lculus 2 ( R d ) . In the first case where µN is the empirical measure of N-samples from µ, the only interesting case is when the functional Φ is non-linear. To provide some context to our results, one may, for example, assume that Φ is Lipschitz continuous with respect to the Wasserstein distance, i.e, there exists a constant C > 0 such that
Φ(µ) Φ(ν) CW (µ,ν), µ,ν (Rd) , | − |≤ 2 ∀ ∈ P2 one could bound Φ(µ) EΦ(µ ) by EW (µ,µ ). Consequently, following [15] or [14], the rate of | − N | 2 N convergence in the number of samples N deteriorates as the dimension d increases. On the other hand, recently, authors [13, Lem. 5.10] made a remarkable observation that if the functional Φ is twice-differentiable with respect to the functional derivative (see Section 2.1.1), then one can obtain a dimension-independent bound for the strong error E Φ(µ) Φ(µ ) p, p 4, which is of order | − N | ≤ O(N −1/2) (as expected by CLT). Here, we study a weak error and show that, (see Theorem 2.17) if Φ is (2k + 1)-times differentiable with respect to the functional derivative, then indeed we have
k−1 C 1 Φ(µ) EΦ(µ ) = j + O( ), | − N | N j N k Xj=1 for some positive and explicit constants C1,...,Ck−1 that do not depend on N. The result is of inde- pendent interest, but is also needed to obtain a complete expansion for the error in particle approx- imations of McKV-SDEs that we discuss next. The second situation we treat in this work concerns estimates of propagation-of-chaos type 1. Consider a probability space (Ω, , P) with a d-dimensional Brownian motion W . We are interested F in the McKean-Vlasov process Xs,ξ with interacting kernels b and σ, starting from a random { t }t∈[s,T ] variable ξ, defined by the SDE2
t t X0,ξ = ξ + b(X0,ξ, L (X0,ξ)) dr + σ(Xs,ξ, L (X0,ξ)) dW , t [s, T ], (1.1) t r r r r r ∈ Z0 Z0 s,ξ s,ξ d d d where L (Xr ) denotes the law of Xr and functions b = (b ) : R (R ) R and σ = i 1≤i≤d × P2 → (σ ) : Rd (Rd) Rd Rd satisfy suitable conditions so that there exists a unique weak i,j 1≤i,j≤d × P2 → ⊗ solution (see e.g. [34] or, for more up-to-date panorama on research on existence and uniqueness, see [32, 20, 1, 12]). McKean-Vlasov SDE (1.1) can be derived as a limit of interacting diffusions. Indeed, 0,ξ N 1 N one can approximate the law L (X ) by the empirical measure X := δ j,N generated by N · · N j=1 X· particles i,N defined as {X }1≤i≤N P t t i,N = ξ + b i,N , XN dr + σ i,N , XN dW i, 1 i N, t [0, T ], (1.2) Xt i Xr r Xr r r ≤ ≤ ∈ Z0 Z0 1We would like to remark that our results also cover a situation where the law µ is induced by a system of stochastic differential equations with random initial conditions that are not of McKean-Vlasov type in which case the samples are i.i.d. 2We assume without loss of generality that the dimensions of X and W are the same because we will not make any non-degeneracy assumption on the diffusion coefficient σ in our work. In particular, one dimension of X could be time itself.
2 where W i, 1 i N, are independent d-dimensional Brownian motions and ξ , 1 i N, are i.i.d. ≤ ≤ i ≤ ≤ random variables with the same distribution as ξ. It is well known, [34, Prop. 2.1], that the property XN of propagation of chaos is equivalent to weak convergence of measure-valued random variables t L N L XN Rd to (Xt). A common strategy is to establish tightness of π = ( t ) ( ( )) and to identify N ∈ P P the limit by showing that π converges weakly to δL (Xt). This approach does not reveal quantitative bounds we seek in this paper, but is a very active area of research. We refer the reader to [18, 34, 29] for the classical results in this direction and to [22, 3, 16, 30, 26] for an account (non-exhaustive) of recent results. On the other hand, the results on quantitative propagation of chaos are few and far in between. In the case when coefficients of (1.1) depend on the measure component linearly, i.e., are of the form b(x,µ)= B(x,y) µ(dy), σ(x,µ)= Σ(x,y)µ(dy) , Rd Rd Z Z with B, Σ being Lipschitz continuous in both variables, it follows from a simple calculation [34] to see that W (L ( i,N ), L (X0,ξ)) = O(N −1/2). We refer to Sznitman’s result as strong propagation of 2 Xt t chaos. Note that in this work we treat the case of McKean-Vlasov SDEs with coefficients with general measure dependence. In that case, as explicitly demonstrated in [7, Ch. 1], the rate of strong propaga- tion of chaos deteriorates with the dimension d. This is due to the fact that one needs to estimate the difference between the empirical law of i.i.d. samples from µ and µ itself using results such as [15] or [14]. In the special case when the diffusion coefficient is constant, with linear measure dependence on the drift (which lies in some negative Sobolev space), the rate of convergence in the total variation norm has been shown to be O(1/√N) in [21]. Of course, in a strong setting, O(1/√N) is widely considered to be optimal as it corresponds to the size of stochastic fluctuations as predicted by the CLT. In this work, we are interested in weak quantitative estimates of propagation of chaos. Indeed, this new direction of research has been put forward very recently by two independent works [24, Ch. 9] and [31, Th. 2.1]. The authors presented novel weak estimates of propagation of chaos for linear functions in measure, i.e. Φ(µ) := F (x)µ(dx) with F : Rd R being smooth. This gives the rate Rd → of convergence O(1/N), plus the error due to approximation of the functional of the initial law (see R [31, Lem. 4.6] for a discussion of a dimensional-dependent case). While the aim of [31] is to estab- lish quantitative propagation of chaos for the Boltzmann’s equation, in a spirit of Kac’s programme [23, 28], Theorem 6.1 in [31, Th. 6.1] specialises their result to McKV-SDEs studied here, but only for elliptic diffusion coefficients that do not depend on measure and symmetric Lipschitz drifts with linear measure dependence. The key idea behind both results is to work with the semigroup that acts on the space of functions of measure, sometimes called the lifted semigroup, which can be viewed a dual to the space of probability measures on (Rd) as presented in [30]. A similar research programme, but P in the context of mean-field games with a common noise, has been successfully undertaken in [6]. In this work, the authors study the master equation, which is a PDE driven by the Markov generator of a lifted semi-group. They show that existence of classical solution to that PDE is the key to obtain quantitative bounds between an n-player Nash system and its mean field limit. Indeed, perturbation analysis of the PDE on the space of measures leads to the weak error being of the order O(1/N). In this work we build on these observations, and identify minimal assumptions for the expansion in number of particles N to hold. Next, we verify these assumptions for McKV-SDEs with a general drift
3 and general (and possibly non-elliptic) diffusion coefficients. We also consider non-linear functionals of measure. The main theorem in this paper, Theorem 2.17, states that given sufficient regularity we have k−1 C 1 E Φ(XN ) Φ(L (X0,ξ)) = j + O( ), T − T N j N k j=1 X where C1,...,Ck−1 are constants that do not depend on N. As mentioned above, the method of expansion relies heavily on the calculus on ( (Rd),W ) and P2 2 we follow the approach presented by P. Lions in his course at Collège de France [27] (redacted by Cardaliaguet [5]). To obtain such an expansion, one needs to rely on some smoothness property for the solution of (1.1) as it is always the case when one wants to gets error expansion for some approxim- ating procedure (see [36] for a similar expansion on the weak error expansion of SDE approximation with time-discretisation). The important object in our study, similarly to [6], is the PDE written on the space [0, T ] (Rd), which corresponds to the lifted semigroup and comes from the Itô’s formula × P2 of functionals of measures established in [4] and [9]. Smoothness properties on the functions (m) V (see Definition 2.6) required for expansion (2.17) to hold are formulated in Theorem 2.9. A natural question is then to identify some sufficient conditions on the SDE coefficients to guarantee the smooth- ness property of the functions (m). We give one possible answer to this question in Theorem 2.17. V In that theorem, we show that if the coefficients of the SDE are smooth, then the functions (m) are V also smooth enough provided that Φ is itself smooth. This result, which is expected, comes from an extension of Theorem 7.2 in [4] (see Theorem 2.15). While in the current paper, we assume high order of smoothness of the coefficients of McKV-SDEs and Φ, we anticipate this general approach to be valid under a less regular setting. Indeed, when working with a strictly elliptic setting with some structural conditions, the lifted semigroup may be smooth even in the case when drift and diffusion coefficients are irregular. This has been demonstrated in [11, 12]. Similarly, Φ does not need to be smooth for the lifted semigroup to be differentiable in the measure direction. This has been shown using techniques of Malliavin calculus in [10]. Finally, when the underlying equation has some special structure, the more classical approach can be deployed to study weak propagation of chaos property [2]. The analysis of irregular cases goes beyond the scope of this paper. To sum up, there are three main contributions in this paper. Firstly, the main result (Theorem 2.17) allows us to use Romberg extrapolation to obtain an estimator of X with weak error being in the order of O( 1 ), for each k N. (See Section 1.1 for details.) Thus, effectively, a higher-order particle system N k ∈ (in terms of the weak error) can be constructed up to a desired order of approximation. Secondly, the analysis in this paper makes use of the notions of measure derivatives and linear functional derivatives by generalising them to an arbitrary order of differentiation. This is in line with the approach in [10]. Some properties (e.g. Lemma 2.5) relate the regularity of the two notions of derivatives in measure and might be of an independent interest. In particular, the generalisation of Theorem 7.2 in [4] from second order derivatives in measure to higher order derivatives is proven to be useful in the analysis of McKean-Vlasov SDEs in general. Finally, as a by-product of the weak error expansion, a version of the law of large numbers in terms of functionals of measures is developed in Theorem 2.12.
4 1.1 Romberg extrapolation and ensembles of particles In this section we construct an ensemble particle system in the spirit of Richardson’s extrapolation method [33] that has been studied in the context of time-discretisation of SDEs in [36] and in the context of discretisation of SPDEs in [19]. Let F : Rd R be a Borel-measurable function and define Φ(µ) := F (x)µ(dx). Observe that → Rd N R 1 E[Φ(XN )] = E F ( i,N ) = E[F ( 1,N )]. T N XT XT Xi=1 E 0,ξ E 1,N Hence, the weak error reads [F (XT )] [F ( T )] . By the result of Theorem 2.17, we can apply | − X | E 0,ξ the technique of Romberg extrapolation to construct an estimator which approximates [F (XT )] k such that the weak error is of the order of O(1/N ). More precisely, for k = 2, since C1 is independent of N, C 1 EF ( i,N ) E[F (X0,ξ)] = 1 + O XT − T N N 2 and C 1 EF ( i,2N ) E[F (X0,ξ)] = 1 + O . XT − T 2N N 2 Hence, 1 2EF ( i,2N ) EF ( i,N ) E[F (X0,ξ)] = O . XT − XT − T N 2 For general k, we can use a similar method to show that
k 1 α EF ( i,mN ) E[F (X0,ξ)] = O , m XT − T N k m=1 X where mk α = ( 1)k−m , 1 m k. m − m!(k m)! ≤ ≤ − To motivate the study of the weak error expansion we will analyse an estimator that uses M ensembles of particles. Fix M 1. The ensembles are indexed by j. For j 1,...,M , consider ≥ ∈ { } t t (i,j),N = ξ + b (i,j),N , X(j,N) dr + σ (i,j),N , X(j,N) dW (i,j), 1 i N, (1.3) Xt (i,j) Xr r Xr r r ≤ ≤ Z0 Z0 where W i : 1 i N are M independent ensembles each consisting of N d-dimensional { ≤ ≤ }1≤j≤M Brownian motions; and ξ : 1 i N are M independent ensembles each consisting of N { (i,j) ≤ ≤ }1≤j≤M i.i.d. random variables with the same distribution as ξ. We consider the following estimator
M k mN 1 1 α F ( (i,j),mN ). M m mN XT m=1 Xj=1 X Xi=1 5 Next we analyse mean-square error3 of this estimator
M k mN 1 1 2 E E[F (X )] α F ( (i,j),mN ) T − M m mN XT m=1 Xj=1 X Xi=1 k 2 2 E[F (X )] α EF ( 1,mN ) ≤ T − m XT mX=1 k mN M k mN 1 1 1 2 + 2E E α F ( i,mN ) α F ( (i,j),mN ) . m mN XT − M m mN XT mX=1 Xi=1 Xj=1 mX=1 Xi=1 The first term on the right-hand side is studied in Theorem 2.17 and, provided that the coefficients of (1.1) are sufficiently smooth, it converges with order (N −2k). Control of the second term follows O from the qualitative strong propagation of chaos. Indeed, we write
M k mN 1 1 Var α F ( (i,j),mN ) M m mN XT m=1 Xj=1 X Xi=1 M k mN M k mN 1 1 1 1 2Var α F (X(i,j)) + 2Var α F ( (i,j),mN ) F (X(i,j)) , ≤ M m mN T M m mN XT − T m=1 m=1 Xj=1 X Xi=1 Xj=1 X Xi=1 where X(i,j) denotes the solution of (1.1) driven by W i,j with initial data ξi,j. Hence, independence implies that
M k mN k 1 1 1 1 Var α F (X(i,j)) α2 Var[F (X(1,1))]. M m mN T ≤ M m mN T m=1 m=1 Xj=1 X Xi=1 X On the other hand,
M k mN 1 1 Var α F ( (i,j),mN ) F (X(i,j)) M m mN XT − T Xj=1 mX=1 Xi=1 k mN k2 1 2 α2 E F ( (i,j),mN ) F (X(i,j)) ≤ M m mN XT − T mX=1 Xi=1 k mN k2 1 α2 E F ( (i,j),mN ) F (X(i,j)) 2 , ≤ M m mN XT − T m=1 i=1 X X where Jensen’s inequality is used. Using the fact F is Lipschitz continuous and the result on a dimension- free bound for strong propagation of chaos, established in [35], there exists a constant C > 0 with no
3We look at the mean-square error for simplicity, but a similar computation could be done to verify the Lindeberg condi- tion and produce CLT with an appropriate scaling.
6 dependence on N such that C E[ F ( (i,j),mN ) F (X(i,j)) 2] . | XT − T | ≤ mN Consequently, we have
M k mN k 1 1 2 1 1 E E[F (X )] α F ( (i,j),mN ) C(N −2k + α2 ). T − M m mN XT ≤ M m mN m=1 m=1 Xj=1 X Xi=1 X Since there are M ensembles corresponding to the estimator and each ensemble has k sub-particle sys- tems with mN particles each, m 1,...,k , the total number of interactions is = M k (mN)2. ∈ { } C m=1 When we take N = ǫ−1/k and M = ǫ−2+1/k the mean-square error is of the order O(ǫ2) (since k P α2 m−1 is a constant). The corresponding number of interactions is of the order O(ǫ−2−1/k). m=1 m C The message here is that as the smoothness increases, less interactions among particles are needed Pwhen approximating the law of McKean-Vlasov SDE (1.1). We would like to stress out again that the dimension of the system does not deteriorate the rate of convergence, in contrast to results presented in the literature [8, 15, 30]. It is instructive to compare the above computation with a usual mean- square analysis of a single particle system
N 1 2 E E[F (X )] F ( i,N ) T − N XT Xi=1 N 2 1 2 = E[F (X )] E[F ( 1,N )] + E E[F ( 1,N )] F ( i,N ) . T − XT XT − N XT Xi=1 As above, invoking strong propagation of chaos, one can show that the second term is of order O N −1 . That means that there would be no gain to go beyond what we can obtain from the strong propagation of chaos analysis to control the first term. Taking N = ǫ−2 results in mean-square error being of the order O(ǫ2) and number of interactions = N 2 = ǫ−4. That clearly demonstrates that C working with ensembles of particles leads to an improvement in quantitative properties of propagation of chaos, which is interesting on its own but can also be explored when simulating particle systems on the computer.
Notations. The 2 Wasserstein metric is defined by • − 1 2 2 W2(µ,ν) := inf x y π(dx, dy) , π∈Π(µ,ν) Rd Rd | − | Z × where Π(µ,ν) denotes the set of couplings between µ and ν i.e. all measures on B(Rd Rd) such × that π(B, Rd)= µ(B) and π(Rd,B)= ν(B) for every B B(Rd). ∈
7 Uniqueness in law of (1.1) implies that for any random variables ξ,ξ′ such that L (ξ)= L (ξ′)= ′ • L s,ξ L s,ξ s,µ s,ξ µ, we have (Xt )= (Xt ). Therefore, we adopt the notation Xt := Xt if only the law of the process is concerned.
When the total number N of particles is clear from context, we will often simply write i for • X i,N . X For any x,y Rd, we denote their inner product by xy. Since different measure derivatives lie in • ∈ different tensor product spaces, we use to denote the Euclidean norm for any tensor product | · | space in the form Rd1 . . . Rdℓ . ⊗ ⊗ The law of any random variable Z is denoted by L (Z). For any function f : (Rd) R, its lift • P2 → f˜ : L2(Ω, , P; Rd) R is defined by f˜(ξ)= f(L (ξ)). F → Also, (Ωˆ, ˆ, Pˆ) stands for a copy of (Ω, , P), which is useful to represent the Lions’ derivative of • F F a function of a probability measure. Any random variable η defined on (Ω, , P) is represented F by ηˆ as a pointwise copy on (Ωˆ, ˆ, Pˆ). In the section on regularity, we shall introduce a sequence F of copies of (Ω, , P), denoted by (Ω(n), (n), P(n)) . As before, any random variable η defined F { F }n on (Ω, , P) is represented by η(n) as a pointwise copy on (Ω(n), (n), P(n)). F F For T > 0, we define the following subsets of [0, T ]m, m 1: • ≥ ∆m := (t ,...,t ) [0, T ]m 0 We often denote (t1,...,tm) = t and (t1,...,tm−1) = τ. We shall also sometimes use the con- vention ∆0 :=: ∆0 := for simplicity of notation. T T ∅ With the above definition, we denote • f(t)dt := f(t1,...,tm)dt1 . . . dtm. m Z∆T Z0 For any function f :∆m R, we always denote by ∂ f(t ,...,t ) the partial derivatives of f in • T → t 1 m the variable tm at (t1,...,tm) whenever they exist. L2 denotes the set of square integrable random variables, 2 the set of square-integrable pro- • 1 H gressively measurable processes θ such that T θ 2 2 L2. 0 | s| ∈ R 8 2 Method of weak error expansion 2.1 Calculus on the space of measures Our method of proof is based on expansion of an auxiliary map satisfying a PDE on the Wasserstein space. One of the most important tools of the paper is thus the theory of differentiation in measure. We make an intensive use of the so-called “L-derivatives” and “linear functional derivatives” that we recall now, following essentially [6]. We also introduce a higher-order version of this derivative as this is needed in the proofs of our expansion. 2.1.1 Linear functional derivatives A continuous function δU : (Rd) Rd R is said to be the linear functional derivative of δm P2 × → U : (Rd) R, if P2 → for any bounded set (Rd), y δU (m,y) has at most quadratic growth in y uniformly in • K ⊂ P2 7→ δm m , ∈ K for any m,m′ (Rd), • ∈ P2 1 δU U(m′) U(m)= ((1 s)m + sm′,y) (m′ m)(dy) ds. (2.1) − Rd δm − − Z0 Z For the purpose of our work, we need to introduce derivatives at any order p 1. ≥ Definition 2.1. For any p 1, the p-th order linear functional of the function U is a continuous δpU d ≥ d p−1 d function from p : (R ) (R ) R R satisfying δm P2 × × → p for any bounded set (Rd), (y,y′) δ U (m,y,y′) has at most quadratic growth in (y,y′) • K ⊂ P2 7→ δmp uniformly in m , ∈ K for any m,m′ (Rd), • ∈ P2 p−1 p−1 1 p δ U ′ δ U δ U ′ ′ ′ ′ p−1 (m ,y) p−1 (m,y)= p ((1 s)m + sm ,y,y ) (m m)(dy )ds, δm − δm Rd δm − − Z0 Z provided that the (p 1)-th order derivative is well defined. − The above derivatives are defined up to an additive constant via (2.1). They are normalised by δpU (m, 0) = 0 . (2.2) δmp We make the following easy observation, which will be useful in the latest parts. 9 Lemma 2.2. If U admits linear functional derivatives up to order q, then the following expansion holds q−1 1 δpU ′ y ′ ⊗p y U(m ) U(m)= p (m, ) m m (d ) − p! Rpd δm { − } Xp=1 Z 1 1 δqU q−1 ′ y ′ ⊗q y + (1 t) q ((1 t)m + tm , ) m m (d )dt. (q 1)! − Rqd δm − { − } − Z0 Z Proof. We define [0, 1] t f(t)= U (1 t)m + tm′ = U m + t(m′ m) R (2.3) ∋ 7→ − − ∈ and apply Taylor-Lagrange formula to f up to order q, namely q−1 1 1 1 f(1) f(0) = f (p)(0) + (1 t)(q−1)f (q)(t)dt. − p! (q 1)! − p=1 0 X − Z It remains to show that δpU (p) ′ y ′ ⊗p y f (t)= p (m + t(m m), ) m m (d ), p 0,...,q . (2.4) Rpd δm − { − } ∀ ∈ { } Z by induction. Since (2.4) holds trivially for p = 0, we suppose that (2.4) holds for p 0,...,q 1 . ∈ { − } Then f (p)(t + h) f (p)(t) − h 1 δpU ′ y ′ ⊗p y = p (m + (t + h)(m m), ) m m (d ) h Rpd δm − { − } Z δpU ′ y ′ ⊗p y p (m + t(m m), ) m m (d ) − Rpd δm − { − } Z 1 δp+1U ′ y ′ ′ ′ ′ ⊗p y = p+1 (m + (t + sh)(m m), ,y ) (m m)(dy ) ds m m (d ). Rpd Rd δm − − { − } Z Z0 Z Taking h 0 gives (2.4) for p + 1. This completes the proof. → 2.1.2 L-derivatives The above notion of linear functional derivatives is not enough for our work. We shall need to consider further derivatives in the non-measure argument of the derivative function. If the function y δU (m,y) is of class 1, we consider the intrinsic derivative of U that we denote 7→ δm C δU ∂ U(m,y) := ∂ (m,y) . µ y δm 10 The notation is borrowed from the literature on mean field games and corresponds to the notion of “L-derivative” introduced by P.-L. Lions in his lectures at Coll?ge de France [27]. Traditionally, it is introduced by considering a lift on an L2 space of the function U and using the Fr?chet differentiability of this lift on this Hilbert space. The equivalence between the two notions is proved in [8, Tome I, Chapter 5], where the link with the notion of derivatives used in optimal transport theory is also made. In this context, higher order derivatives are introduced by iterating the operator ∂µ and the deriv- ation in the non-measure arguments. Namely, at order 2, one considers (Rd) Rd (m,y) ∂ ∂ U(m,y) and (Rd) Rd Rd (m,y,y′) ∂2U(m,y,y′) . P2 × ∋ 7→ y µ P2 × × ∋ 7→ µ This leads in particular to the notion of a fully 2 function that will be of great interest for us (see C [9]). Definition 2.3 (Fully 2). A function U : (Rd) R is fully 2 if the following mappings C P2 → C (Rd) Rd (m,y) ∂ U(m,y) P2 × ∋ 7→ µ (Rd) Rd (m,y) ∂ ∂ U(m,y) P2 × ∋ 7→ y µ (Rd) Rd Rd (m,y,y′) ∂2U(m,y,y′) P2 × × ∋ 7→ µ are well-defined and continuous for the product topologies. Let us observe for later use that if the function U is fully 2 and moreover satisfies, for any compact C subset (Rd), K ⊂ P2 2 2 sup ∂µU(m,y) + ∂y∂µU(m,y) dm(y) < + , Rd | | | | ∞ m∈K Z then it follows from Theorem 3.3 in [9] that U can be expanded along the flow of marginals of an It? process. Namely, let µt = L (Xt) where dX = b dt + σ dW , X L2, t t t t 0 ∈ with b and a := σ σ′ , then ∈H2 t t t ∈H2 t 1 U(µ )= U(µ )+ E ∂ U(µ , X )b + Tr ∂ ∂ U(µ , X )a ds. (2.5) t 0 µ s s s 2 { y µ s s s} Z0 In order to prove our expansion, we need to iterate the application of the previous chain rule and in order to proceed, we need to use higher order derivatives of the measure functional. Inspired by the work [10], for any k N, we formally define the higher order derivatives in meas- ∈ ures through the following iteration (provided that they actually exist): for any k 2, (i ,...,i ) ≥ 1 k ∈ 1,...,d k and x ,...,x Rd, the function ∂kf : (Rd) (Rd)k (Rd)⊗k is defined by { } 1 k ∈ µ P2 × → k k−1 ∂µf(µ,x1,...,xk) := ∂µ ∂µ f( ,x1,...,xk−1) (µ,xk) , (2.6) (i ,...,i ) · (i1,...,ik−1) i 1 k k 11 and its corresponding mixed derivatives in space ∂ℓk ...∂ℓ1 ∂kf : (Rd) (Rd)k (Rd)⊗(k+ℓ1+...ℓk) vk v1 µ P2 × → are defined by ∂ℓk ∂ℓ1 ℓk ℓ1 k k N ∂v ...∂v ∂µf(µ,x1,...,xk) := . . . ∂µf(µ,x1,...,xk) , ℓ1 ...ℓk 0 . k 1 ∂xℓk ∂xℓ1 ∈ ∪{ } (i1,...,ik) k 1 (i1,...,ik) (2.7) Since this notation for higher order derivatives in measure is quite cumbersome, we introduce the following multi-index notation for brevity. This notation was first proposed in [10]. Definition 2.4 (Multi-index notation). Let n,ℓ be non-negative integers. Also, let β = (β1,...,βn) be an n-dimensional vector of non-negative integers. Then we call any ordered tuple of the form (n,ℓ, β) or (n, β) a multi-index. For a function f : Rd (Rd) R, the derivative D(n,ℓ,β)f(x, µ, v , . . . , v ) is × P2 7→ 1 n defined as (n,ℓ,β) βn β1 ℓ n D f(x, µ, v1, . . . , vn) := ∂vn ...∂v1 ∂x∂µ f(x, µ, v1, . . . , vn) if this derivative is well-defined. For any function Φ : (Rd) R, we define P2 → (n,β) βn β1 n D Φ(µ, v1, . . . , vn) := ∂vn ...∂v1 ∂µ Φ(µ, v1, . . . , vn), if this derivative is well-defined. Finally, we also define the order 4 (n,ℓ, β) (resp. (n, β) ) by | | | | (n,ℓ, β) := n + β + ...β + ℓ, (n, β) := n + β + ...β . (2.8) | | 1 n | | 1 n As for the first order case, we can establish the following relationship with linear functional deriv- atives, see e.g. [6] for the correspondence up to order 2, δ δ δn ∂nU( )= ∂ ...∂ U( )= ∂ ...∂ U( ) , (2.9) µ · yn δm y1 δm · yn y1 δmn · provided one of the two derivatives is well-defined. Next, we deduce the following lemma that will be useful later on. p ∞ Lemma 2.5. Let p 1 and assume that ∂µU L . Then ≥ ∈ δpU (m,y ,...,y ) C ( y p + + y p) , δmp 1 p ≤ p | 1| · · · | p| for some constant Cp > 0. Proof. We sketch the proof by induction in dimension one, for ease of notation. Let p 1. ≥ First, we compute that δpU δpU 1 δpU (m,y ,...,y )= (m,y ,...,y , 0) + ∂ (m,y ,...,t y ) dt . δmp 1 p δmp 1 p−1 tp δmp 1 p p p Z0 4 We do not consider ‘zeroth’ order derivatives in our definition, i.e. at least one of n, β1,...,βn and ℓ must be non-zero, for every multi-index n, ℓ, (β1,...,βn). 12 Let ∂xp denote the derivative w.r.t. the pth component of the spatial variables. From the convention of normalisation (2.2), we simply obtain that δpU 1 δpU (m,y ,...,y )= y ∂ (m,y ,...,t y ) dt . δmp 1 p p xp δmp 1 p p p Z0 Let k δpU (m,y ,...,y )= δmp 1 p δpU yp−k ...yp ∂xp−k ...∂xp p (m,y1,...,yp−k−1,tp−kyp−k,...,tpyp) dtp−k . . . dtp. (2.10) k+1 δm Z[0,1] Then, observing that δpU ∂ ...∂ (m,y ,...,y , 0,t y ,...,t y ) = 0, xp−k xp δmp 1 p−k−2 p−k p−k p p we recover δpU (m,y ,...,y )= δmp 1 p δpU yp−k−1 ...yp ∂xp−k−1 ...∂xp p (m,y1,...,yp−k−2,tp−k−1yp−k−1,...,tpyp) dtp−k−1 . . . dtp. k+2 δm Z[0,1] Setting k = p 1 in (2.10), we then obtain − p δ U p p (m,y1,...,yp)= y1 ...yp ∂µU(m,t1y1,...,tpyp)dt1 . . . dtp. δm p Z[0,1] p The proof is concluded by invoking the boundedness assumption of ∂µU along with Young’s inequality. 2.2 Weak error expansion along dynamics To state our expansion for the dynamic case, we will need some notion of smoothness given in the following definition. Definition 2.6. Let m be a positive integer. A function U : ∆m (Rd) R is of class (∆m) if the T × P2 → D T following conditions hold: i) U is jointly continuous on ∆m (Rd). T × P2 ii) For all t ∆m, U(t, ) is fully C2. ∈ T · 13 iii) Let L be a positive constant. For all t ∆m and ξ L2(Rd), ∈ T ∈ E ∂ U(t, L (ξ))(ξ) 2 + ∂ ∂ U(t, L (ξ))(ξ) 2 + ∂2U(t, L (ξ))(ξ,ξ) 2 L. | µ | | υ µ | | µ | ≤ iv) m = 1: s U(s,µ) is continuously differentiable on (0, T ). • 7→ m> 1: for all (τ ,...,τ ) ∆m−1 and all µ (Rd), the function • 1 m−1 ∈ T ∈ P2 (0,τ ) s U((τ ,...,τ ,s),µ) R m−1 ∋ 7→ 1 m−1 ∈ is continuously differentiable on (0,τm−1). v) The functions ∆m L2(Rd) (t,ξ) ∂ U(t, L (ξ))(ξ) L2(Rd) T × ∋ 7→ t ∈ ∆m L2(Rd) (t,ξ) ∂ U(t, L (ξ))(ξ) L2(Rd) T × ∋ 7→ µ ∈ ∆m L2(Rd) (t,ξ) ∂ ∂ U(t, L (ξ))(ξ) L2(Rd×d) T × ∋ 7→ υ µ ∈ ∆m L2(Rd) (t,ξ) ∂2U(t, L (ξ))(ξ,ξ) L2(Rd×d) T × ∋ 7→ µ ∈ are continuous. We define recursively the functions Φ(m), (m), 1 m k, that are used to prove the expansion. V ≤ ≤ Definition 2.7. i) For m = 1, we set Φ(0) =Φ and define (1) : [0, T ] (Rd) R by V × P2 → (1)(t,µ) := (t,µ)=Φ(0)(L (Xt,µ)) = Φ(L (Xt,µ)) . V V T T Assuming that (1) belongs to the class (∆1 ), we set Φ(1) : (0, T ) (Rd) R as V D T × P2 → (1) 2 (1) Φ (t,µ) := Tr ∂µ (t,µ)(x,x)a(x,µ) µ(dx). (2.11) Rd V Z h i ii) For 1 (m) 2 (m) Φ (t,µ) := Tr ∂µ (t,µ)(x,x)a(x,µ) µ(dx). Rd V Z h i A key point in our work is to show that the previous definition is licit under some assumptions on the coefficient functions b, σ and Φ (Theorem 2.16 and Theorem 2.17). Before we proceed we state the following assumptions 14 (Lip) b and σ are Lipschitz continuous with respect to the Euclidean norm and the W2 norm. (UB) There exists L> 0 such that σ(x,µ) L, for every x Rd and µ (Rd). | |≤ ∈ ∈ P2 It will become apparent from the proofs that when working only with (Lip), higher order integrability conditions would need to be stated in Definition (2.6). We refrain from this extension and assume (UB) to improve readability of the paper, but encourage a curious reader to perform this simple extension. We begin with the following technical lemma. d Lemma 2.8. Assume (Lip) and (UB). Let m be a positive integer and f : 2(R ) R be a continuous m d P t,µ → function. Consider U : ∆ (R ) R given by U((τ,t),µ) := f([Xτ − ]) and set U (t,µ) = T × P2 → m 1 τ U((τ,t),µ), where τ ∆m−1. If U is of class (∆m), then the following statements hold: ∈ T D T (i) U satisfies on (0,τ ) (Rd) the following PDE τ m−1 × P2 1 ∂sUτ (s,µ)+ ∂µUτ (s,µ)(y)b(y,µ)+ Tr ∂v∂µUτ (s,µ)(y)a(y,µ) µ(dy) = 0, (2.12) Rd 2 Z with terminal condition U (τ , ) = f(τ, ), where a = (a ) : Rd (Rd) Rd Rd τ m−1 · · i,k 1≤i,k≤d × P2 → ⊗ denotes the diffusion operator q a (x,µ) := σ (x,µ)σ (x,µ), x Rd, µ (Rd). i,k i,j k,j ∀ ∈ ∀ ∈ P2 Xj=1 (ii) Uτ can be expanded along the flow of random measure associated to the particle system (1.2) as follows, for all 0 t τ , ≤ ≤ m−1 t XN XN 1 XN 2 XN XN N Uτ (t, t )= Uτ (0, 0 )+ Tr a(υ, s )∂µUτ (s, s )(υ, υ) s (dυ)ds + Mt , (2.13) 2N Rd Z0 Z N N where M is a square integrable martingale with M0 = 0. 0,ξ Proof. (i) By the flow property, we observe that the function [0,τm−1) s Uτ (s, L (Xs )) R is 0,ξ ∋ 7→ ∈ 0,ξ s,Xs 0,ξ constant. Indeed, Uτ (s, L (Xs )) = f(L (Xτ )) = f(L (Xτ )). Applying the chain rule in both time and measure arguments between t and t + h, we get h L 0,ξ E L 0,ξ 0,ξ 0,ξ L 0,ξ 0 = ∂tUτ (s, (Xs )) + ∂µUτ (s, (Xs ))(Xs )b(Xs , (Xs )) Z0 1 h i + Tr a(X0,ξ, L (X0,ξ))∂ ∂ U (s, L (X0,ξ))(X0,ξ) ds. 2 s s υ µ τ s s h i 15 Dividing by h and letting h 0 allows to recover the first claim. → (ii) To recover the expansion, we use the known strategy of considering finite dimensional projection of U. Namely, for a fixed number of particles N, we define N 1 u(t,x ,...,x ) := U (t, δ ) . 1 N τ N xi Xi=1 From Definition 2.6(ii), (iv) and (v), we have that u is C1,2([0,τ) (Rd)n) (see Proposition 3.1 in [9]. × Recalling the link between the derivatives of u and U (again see Proposition 3.1 in [9]), we can apply the classical Ito’s formula to t U (t, XN )= u(t, 1,..., N ) to get 7→ τ t Xt Xt N 1 t U (t, XN )= U(0, XN )+ ∂ U (s, XN )( i)σ( i, XN )dW i =: M N (2.14) τ t 0 N µ τ s Xs Xs s s t i=0 Z0 t X XN XN XN 1 XN XN XN + ∂tU(s, s )+ ∂µUτ (s, s )(υ)b(υ, s )+ Tr a(υ, s )∂υ∂µUτ (s, s )(υ) s (dυ) ds Rd 2 Z0 Z (2.15) t 1 XN 2 XN XN + Tr a(υ, s )∂µUτ (s, s )(υ, υ) s (dυ)ds . (2.16) 2N Rd Z0 Z XN We first note that the term in (2.15) is precisely (2.12) evaluated at (s, s ) and is thus equal to zero. We now study the local martingale term M N in (2.14). We simply compute N 1 t E M N 2 = E ∂ U (s, XN )( i)σ( i, XN ) 2 ds | t | N 2 | µ τ s Xs Xs s | i=1 Z0 Xt C E XN 2 XN ∂µUτ (s, s )(υ) s (dυ) ds , ≤ N Rd | | Z0 Z where we have used (UB). Using Definition 2.6(iii), we have XN 2 XN ∂µUτ (s, s )(υ) s (dυ) C, Rd | | ≤ Z which concludes that M N is a square integrable martingale. Theorem 2.9 (Weak error expansion: dynamic case). Assume (Lip) and (UB). Suppose that Definition 2.7 is well-posed for m 1,...,k . Then the weak error in the particle approximation can be expressed ∈ { } as k−1 1 1 E Φ(XN ) Φ(L (X0,ξ)) = C + N + O( ), (2.17) T − T N j j Ij+1 N k j=0 X 16 where C0 := 0 and (m) L 0,ξ Cm := Φ (t, (Xtm ))dt , m 1,...,k 1 , m ∈ { − } Z∆T and N := E (0, XN ) (0, L (ξ)) and I1 V 0 −V N := E (m)((τ, 0), XN ) (m)((τ, 0), L (ξ)) dτ, m 2,...,k . m − 0 I ∆m 1 V −V ∈ { } Z T h i Proof. Part 1: We first check that the constants (C , N ) are well defined. m Im+1 0≤m≤k−1 For 1 m k 1, we first show that the function ≤ ≤ − ∆m (Rd) (t,µ) Φ(m)(t,µ) R T × P2 ∋ 7→ ∈ is continuous. Indeed, let (tn,µn)n be a sequence converging to (t,µ) in the product topology. Then there exists a sequence (ξn) of random variable such that L (ξn) = µn converging to ξ with law µ in L2. By continuity of σ, Definition 2.7 and Definition 2.6(v), Γ := Tr ∂2 (m)(t , L (ξ ))(ξ ,ξ )a(ξ , L (ξ )) Tr ∂2 (m)(t, L (ξ))(ξ,ξ)a(ξ, L (ξ)) =: Γ n µV n n n n n n → µV h i h i in probability. Next, since σ is bounded, 2 2 E Tr ∂2 (m)(t , L (ξ ))(ξ ,ξ )a(ξ , L (ξ )) CE ∂2 (m)(t , L (ξ ))(ξ ,ξ ) C, µV n n n n n n ≤ µV n n n n ≤ h i where the last inequality follows from the fact that (m) is of class (∆m), by Definition 2.6 (iii). By de V D T La Vallée Poussin Theorem, the previous computation shows that (Γn) is uniformly integrable and thus Φ(m)(t ,µ ) = E[Γ ] E[Γ] = Φ(m)(t,µ). Observing that ∆m t (t, L (X0,ξ)) ∆m (Rd) is n n n → T ∋ 7→ tm ∈ T × P2 continuous, we conclude that ∆m t Φ(m)(t, L (X0,ξ)) R is also continuous (hence measurable) T ∋ 7→ tm ∈ and therefore Cm is well-defined. Hence, by the definition of (m), for each µ (Rd), the function V ∈ P2 ∆m−1 τ (m)((τ, 0),µ) R T ∋ 7→ V ∈ is continuous. Also, by the previous argument along with Definition 2.6(iii), we can see that Φ(m) is uniformly bounded. Therefore, the function (τ,µ) (m)((τ, 0),µ) R 7→ V ∈ is also uniformly bounded. By the dominated convergence theorem, the function ∆m−1 τ E (m)((τ, 0), XN ) R T ∋ 7→ V 0 ∈ h i 17 is continuous. This shows that N is well-defined. Im Part 2: We now proceed with the proof of the expansion, which is done by induction on m. Base step: We decompose the weak error as E Φ(XN ) Φ(L (X0,ξ)) = E (T, XN ) (0, XN ) + E (0, XN ) (0, L (ξ)) . (2.18) T − T V T −V 0 V 0 −V Applying Lemma 2.8(ii) for the first term in the right-hand side and taking expectation on both side, we obtain that T E XN L 0,ξ E XN L 1 XN 2 XN XN Φ( T ) Φ( (XT )) = (0, 0 ) (0, (ξ)) + Tr a(υ, s )∂µ (s, s )(υ, υ) s (dυ)ds . − V −V 2N Rd V Z0 Z h i Recalling the definition of Φ(1) in (2.11), we get 1 T E Φ(XN ) Φ(L (X0,ξ)) = E (0, XN ) (0, L (ξ)) + E Φ(1)(t , XN ) dt . T − T V 0 −V N 1 t1 1 Z0 h i h i From Part 1, we know that (1) is uniformly bounded and thus T E (1) XN , where Φ 0 Φ (t1, t1 ) dt1 < C C > 0 does not depend on N. This proves the induction for the base step. R Induction step: Assume that for 1 m−1 1 1 E XN L 0,ξ N E (m) t XN t Φ( T ) Φ( (XT )) = j j+1 + Cj + m Φ ( , tm ) d . − N I N m j=0 Z∆T h i X h i Then, we observe that E Φ(m)(t, XN ) Φ(m)(t, L (X0,ξ)) tm − tm h i = E (m+1)((t,t ), XN ) (m+1)((t, 0), XN )+ (m+1)((t, 0), XN ) (m+1)((t, 0), L (ξ)) , V m tm −V 0 V 0 −V h i which leads to E Φ(XN ) Φ(L (X0,ξ)) T − T mh m−1 i 1 N 1 1 E (m+1) XN (m+1) XN = j + Cj + ((t,tm), tm ) ((t, 0), 0 ) dt.(2.19) N j I N j N m m V −V ∆T Xj=0 Xj=1 Z h i Applying Lemma 2.8(ii) to (m+1)(t, ), we obtain that V · E (m+1)((t,t ), XN ) (m+1)((t, 0), XN ) V m tm −V 0 h1 tm i E 2 (m+1) t XN XN XN = Tr ∂µ (( ,tm+1), tm+1 )(υ, υ)a(v, tm+1 ) tm+1 (dυ)dtm+1 . 2N 0 Rd V Z Z h i 18 Inserting this back into (2.19), we get m m 1 1 1 E XN L 0,ξ N E (m+1) t XN t Φ( T ) Φ( (XT )) = j + Cj + Φ ( , tm+1 ) d . − N j I N j N m+1 m+1 ∆T h i Xj=0 Xj=1 Z h i E (m+1) XN The proof is concluded by observing that m+1 Φ (t, t ) dt < C, due to the uniform ∆T m+1 (m+1) boundedness of Φ given in Part 1. R h i 2.3 Weak error expansion for the initial condition Assuming enough smoothness of the functions (m), we can take care of the terms N appearing in V Im the previous theorem, which are error made at time 0. The following weak error analysis relies on the notion of linear functional derivatives. We first start by studying the weak error generated between the evaluation of the function at a measure and its empirical measure counterparts. We prove two results: one dealing mainly with low order expansion and the order one, available at any order. The main assumption we work with relates to the couple (U, m), where U is a function with domain (Rd). P2 (p-LFD) The pth order linear functional derivative of U exists and is continuous and that for any family (ξi)1≤i≤p of random variable identically distributed with law m the following holds p E δ U sup p (ν,ξ1,...,ξp) L(U,m) , Rd δm ≤ "ν∈P2( ) # for some positive constant L(U,m). We first make the following observation regarding assumption (p-LFD), that will be of later use. Remark 2.10. (i) Lemma 2.5 states that δpU (m)(y ,...,y ) C y p + . . . + y p , (2.20) δmp 1 p ≤ | 1| | p|