arXiv:2010.14178v1 [math.PR] 27 Oct 2020

Stability estimates for invariant measures of diffusion processes, with applications to stability of moment measures and Stein kernels

Max Fathi and Dan Mikulincer

October 28, 2020

Abstract

We investigate stability of invariant measures of diffusion processes with respect to $L^p$ distances on the coefficients, under an assumption of log-concavity. The method is a variant of a technique introduced by Crippa and De Lellis to study transport equations. As an application, we prove a partial extension of an inequality of Ledoux, Nourdin and Peccati relating transport distances and Stein discrepancies to a non-Gaussian setting via the moment map construction of Stein kernels.

1 Introduction

Let $X_t, Y_t$ be stochastic processes in $\mathbb{R}^d$ which satisfy the SDEs,

$$dX_t = a(X_t)\,dt + \sqrt{2\tau(X_t)}\,dB_t, \qquad dY_t = b(Y_t)\,dt + \sqrt{2\sigma(Y_t)}\,dB_t. \tag{1}$$

Here $B_t$ is a standard Brownian motion, $a, b$ are vector-valued functions, and $\sigma, \tau$ take values in the cone of symmetric $d \times d$ positive definite matrices, which we shall denote by $\mathcal{S}^d_{++}$. Given $X_0$ and $Y_0$, we shall write the marginal laws of the processes as $X_t \sim \mu_t$ and $Y_t \sim \nu_t$.

Suppose that, in some sense to be made precise later, $a$ is close to $b$ and $\sigma$ is close to $\tau$. One can ask whether the measure $\mu_t$ must then be close to $\nu_t$. Our goal here is to study the quantitative regime of this problem. The method used here is an adaptation of a technique developed by Crippa and De Lellis for transport equations. That method was introduced in the SDE setting in [8, 32]. Our implementation here will be a bit different, to allow for estimates that behave better for large times, and will allow us to compare the invariant measures of the two processes, under suitable ergodic assumptions.

We will be especially interested in the case where $\sigma, \tau$ are uniformly bounded from below. In such a situation, $X_t$ and $Y_t$ admit unique invariant measures [5], which we shall denote, respectively, as $\mu$ and $\nu$. In this setting, we will think about $\nu$ as the reference measure and quantify the discrepancy in the coefficients as:

$$\beta := \|a - b\|_{L^1(\nu)} + \|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu)}.$$

Our main result is an estimate of the form (see precise formulation below)

$$\mathrm{dist}(\mu,\nu) \le h(\beta),$$

where $\mathrm{dist}(\cdot,\cdot)$ stands for an appropriate notion of distance, which will here be a transport distance, and $\lim_{\beta \to 0} h(\beta) = 0$.

As an application of our stability estimate, we will show that if two uniformly log-concave measures satisfy similar integration by parts formulas, in the sense arising in Stein's method, then the measures must be close. This problem was the original motivation of our study.

1.1 Background on stability for transport equations

Part of the present work is a variant in the SDE setting of a now well-established quantitative theory for transport equations with non-smooth coefficients, pioneered by Crippa and De Lellis [13]. As demonstrated by the DiPerna-Lions theory ([14] and later estimates in [13]), there is a significant difference in the stability of solutions to differential equations when the coefficients are Lipschitz continuous versus when they only belong to some Sobolev space (and are not necessarily globally Lipschitz). Our focus will be on the latter, and arguably more challenging, case. As we shall later discuss, the techniques can be carried over to the setting of stochastic differential equations we are interested in here, as worked out in [8, 32].

The strategy for quantitative estimates in the Lagrangian setting, introduced in [13], relies on controlling the behavior of $\ln(1 + |X_t - Y_t|/\delta)$ for two flows $(X_t)$ and $(Y_t)$, with a parameter $\delta$ that will be very small, of the order of the difference between the vector fields driving the flows. A crucial idea is the use of the Lusin-Lipschitz property of Sobolev vector fields [33], which allows one to get Lipschitz-like estimates on large regions, in a controlled way. We will discuss this idea in more detail in Section 2.2. The ideas of [13] were adapted to the Eulerian setting in [41, 42], using a transport distance with logarithmic cost.

Existence and uniqueness of solutions to SDEs and Fokker-Planck equations with non-smooth coefficients by adapting DiPerna-Lions theory was first addressed in [29, 19], inspiring many further developments, such as [16, 46, 49, 8]. We will not discuss the issue of well-posedness much here, and focus on more quantitative aspects of the problem. We refer for example to [32, Section 3.1] and [46] for a comprehensive discussion of the issues, in particular with respect to the different ways of defining a notion of solution. As pointed out in [8, 32], the kind of quantitative methods used here could also prove well-posedness by approximation with processes with smoother coefficients.

2 Results

2.1 Main result

We consider two diffusion processes of the form (1), and assume that they admit unique invariant measures $\mu$ and $\nu$. We fix some real number $p \ge 2$ with $q$ its Hölder conjugate, so that $\frac{1}{q} + \frac{1}{p} = 1$. We then make the following assumptions:

H1. (Regularity of coefficients) There exists a function $g : \mathbb{R}^d \to \mathbb{R}$, such that for any $x, y \in \mathbb{R}^d$:

$$\|a(x) - a(y)\|,\ \|\sqrt{\tau(x)} - \sqrt{\tau(y)}\| \le (g(x) + g(y))\,\|x - y\|, \tag{2}$$

and

$$\|g\|_{L^{2q}(\mu)} < \infty.$$

To simplify some notations later on, we also assume that $g \ge 1$ pointwise, which does not strengthen the assumption since we can always replace $g$ by $\max(g, 1)$.

H2. (Integrability of relative density) Both $\mu$ and $\nu$ have finite second moments, and it holds that

$$\left\|\frac{d\nu}{d\mu}\right\|_{L^p(\mu)} < \infty.$$

H3. (Exponential convergence to equilibrium) There exist constants $\kappa, C_H > 0$ such that for any initial data $\mu_0$ and any $t \ge 0$ we have

$$W_2(\mu_t, \mu) \le C_H e^{-\kappa \cdot t}\, W_2(\mu_0, \mu).$$

We denote the second moments by

$$m_2^2(\mu) = \int_{\mathbb{R}^d} |x|^2\, d\mu, \qquad m_2^2(\nu) = \int_{\mathbb{R}^d} |x|^2\, d\nu.$$

Assumption (H1) is the Lusin-Lipschitz property we mentioned above, which we shall discuss in some depth in Section 2.2 below. One should think about it as a generalization of Lipschitz continuity, as it essentially means that the coefficients may be well approximated by Lipschitz functions on arbitrarily large sets. In particular, this property holds for Sobolev functions when the reference measure is log-concave. When $C_H = 1$, the third assumption corresponds to contractivity, which for reversible diffusion processes is equivalent to a lower bound on the Bakry-Émery curvature [38]. Allowing for a constant $C_H > 1$ makes it possible to cover other examples, such as hypocoercive dynamics, notably because the assumption is then invariant under a change to an equivalent metric, up to the value of $C_H$. See for example [35].

Now, for $R > 0$, define the truncated quadratic Wasserstein distance by:

$$\tilde W_{2,R}^2(\nu,\mu) := \inf_{\pi} \int \min\left(|x - y|^2, R\right) d\pi, \tag{3}$$

where the infimum is taken over all couplings of $\mu$ and $\nu$. This is a distance on the space of measures, weaker than the classical $W_2$, defined by

$$W_2^2(\nu,\mu) := \inf_{\pi} \int |x - y|^2\, d\pi.$$

With the above notations our main result reads:

Theorem 2.1. Assume (H1), (H2) and (H3) hold, and denote

$$\beta := \|a - b\|_{L^1(\nu)} + \|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu)}.$$

Then, for any $R > 1$,

$$\tilde W_{2,R}^2(\nu,\mu) \le 100\, C_H R \cdot \|g\|_{L^{2q}(\mu)}^2\, \|d\nu/d\mu\|_{L^p(\mu)} \cdot \frac{\ln\left(\ln\left(1 + \frac{R}{\beta}\right)\right) + \ln\left(m_2^2(\mu) + m_2^2(\nu)\right) + \kappa \cdot R}{\kappa \cdot \ln\left(1 + \frac{R}{\beta}\right)},$$

where $\frac{1}{q} + \frac{1}{p} = 1$.
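To get a feeling for how slowly the bound of Theorem 2.1 decays, the following minimal Python sketch evaluates its right-hand side for decreasing values of β. The function name, the argument names and the numerical values are illustrative assumptions and do not come from the paper; the formula is the one read from the statement above.

```python
import math


def theorem_2_1_rhs(beta, R, kappa, C_H, g_norm, density_norm, m2_mu, m2_nu):
    """Right-hand side of the bound on the squared truncated distance.

    Hypothetical argument names (not from the paper):
      g_norm       stands for ||g||_{L^{2q}(mu)},
      density_norm stands for ||d nu / d mu||_{L^p(mu)},
      m2_mu, m2_nu stand for the second moments m_2^2(mu) and m_2^2(nu).
    """
    log_term = math.log(1.0 + R / beta)
    numerator = math.log(log_term) + math.log(m2_mu + m2_nu) + kappa * R
    prefactor = 100.0 * C_H * R * g_norm ** 2 * density_norm
    return prefactor * numerator / (kappa * log_term)


if __name__ == "__main__":
    # Purely illustrative parameter values.
    for beta in (1e-2, 1e-8, 1e-32, 1e-128):
        value = theorem_2_1_rhs(beta, R=2.0, kappa=1.0, C_H=1.0,
                                g_norm=1.0, density_norm=1.0,
                                m2_mu=1.0, m2_nu=1.0)
        print(beta, value)
```

With these illustrative values the bound does decrease as β decreases, but only logarithmically, so β has to shrink by many orders of magnitude before the right-hand side becomes small.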

Remark 2.1. Essentially, the theorem says that $\tilde W_{2,R}(\nu,\mu)$ decreases at a rate which is proportional to $\sqrt{\ln\left(1 + \frac{R}{\beta}\right)}^{\,-1}$. We could improve the rate to $\ln\left(1 + \frac{R}{\beta}\right)^{-1}$ by considering a truncated $W_1$ distance, as shall be discussed in Remark 3.1. However, for our application to Stein kernels it is more natural to work with $W_2$.

Let us now discuss the role of the term $\|d\nu/d\mu\|_{L^p(\mu)}$ in Theorem 2.1. In order to use $\|a - b\|_{L^1(\nu)} + \|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu)}$ as a measure of discrepancy, it seems necessary that the supports of $\mu$ and $\nu$ intersect, otherwise we could just change $\sigma$ and $a$ on a $\mu$-negligible set and have $\beta = 0$ in the conclusion of the theorem, which obviously fails in this particular situation. Thus, since the bound in Theorem 2.1 only makes sense when $\|d\nu/d\mu\|_{L^p(\mu)}$ is finite, one may view this term as an a-priori guarantee on the common support of $\mu$ and $\nu$.

The logarithmic rate obtained here may seem quite weak. In the setting of transport equations, the logarithmic bounds obtained by the method considered here are sometimes actually sharp [42]. We do not know much about optimality in the stochastic setting, since it may be that the presence of noise would help, while the method used here cannot do better in the stochastic setting than in the deterministic setting.

As alluded to in the introduction, if the coefficients $a$ and $\sqrt{\tau}$ are actually $L$-Lipschitz, in which case $g \equiv \frac{L}{2}$ in (H1), then one may greatly improve the rate in Theorem 2.1.

Theorem 2.2. Assume $a$ and $\sqrt{\tau}$ are $L$-Lipschitz and that (H3) holds, and denote

$$\beta := \|a - b\|_{L^2(\nu)} + \|\sqrt{\tau} - \sqrt{\sigma}\|_{L^2(\nu)}.$$

Then,

$$W_2(\nu,\mu) \le 15\, C_H^{\frac{4L^2+1}{2\kappa}}\, \beta \left(\frac{L}{\kappa} + 1\right).$$

This type of estimate is part of the folklore, and a version of it appears for example in [6].
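As a small illustration of the Lipschitz case, the sketch below evaluates the right-hand side of Theorem 2.2 as read from the statement above. The function and argument names are hypothetical and the numbers are arbitrary; the point is only that the bound is linear in β, in contrast with the logarithmic rate of Theorem 2.1.

```python
def theorem_2_2_rhs(beta, L, kappa, C_H):
    """Right-hand side 15 * C_H^((4L^2+1)/(2 kappa)) * beta * (L/kappa + 1).

    All argument names are illustrative; beta is the L^2 discrepancy
    between the coefficients of the two diffusions.
    """
    exponent = (4.0 * L ** 2 + 1.0) / (2.0 * kappa)
    return 15.0 * C_H ** exponent * beta * (L / kappa + 1.0)


if __name__ == "__main__":
    # Halving beta halves the bound (illustrative values).
    for beta in (0.4, 0.2, 0.1):
        print(beta, theorem_2_2_rhs(beta, L=1.0, kappa=1.0, C_H=2.0))
```

Note that the prefactor grows exponentially in $L^2/\kappa$ whenever $C_H > 1$, so the estimate is only informative for moderate Lipschitz constants.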

2.2 About the Lusin-Lipschitz property for Sobolev functions

We will now discuss Assumption (H1) in some more depth. As mentioned previously, it is motivated by the Lusin-Lipschitz property of Sobolev functions with respect to the Lebesgue measure [33]: if a function $f : \mathbb{R}^d \to \mathbb{R}$ satisfies $\int |\nabla f|^p\, dx < \infty$ then for a.e. $x, y \in \mathbb{R}^d$ we have

$$|f(x) - f(y)| \le \left(M|\nabla f|(x) + M|\nabla f|(y)\right) |x - y|, \tag{4}$$

where $M$ is the Hardy-Littlewood maximal operator, defined on a non-negative function $g$ as

$$Mg(x) := \sup_{r > 0}\, |B_r|^{-1} \int_{B_r(x)} g(y)\, dy,$$

$C$ a dimension-free constant, and $B_r$ is the Euclidean ball of radius $r$. This operator satisfies the dimension-free continuity property

$$\|Mf\|_{L^p(dx)} \le C_p \|f\|_{L^p(dx)},$$

when $p > 1$. The dimension-free bound on $C_p$ is due to E. Stein [45].

In particular, if $\nabla f \in L^p(dx)$, then $f$ is $\lambda$-Lipschitz on the regions where $M|\nabla f|$ is smaller than $\lambda/2$, which are large when $\lambda$ is, by the Markov inequality. The important advantage of an estimate in terms of $M|\nabla f|$ rather than $\nabla f$ is that, even if both $\nabla f(x)$ and $\nabla f(y)$ are controlled, we do not automatically get an estimate on $f(x) - f(y)$, since the straight line from $x$ to $y$ may well go through a region where $|\nabla f|$ is arbitrarily large. The use of (4) nicely bypasses this issue.

An important issue for the applications we shall discuss here is that working in functional spaces weighted with the Lebesgue measure is not always the most natural choice when dealing with stochastic processes. It is often preferable to work in $L^p(\mu)$ with a probability measure adapted to the problem considered, which here shall be the invariant measure of the process considered. However, in general the maximal operator has no reason to be continuous over $L^p(\mu)$, unless $\mu$ has a density with respect to the Lebesgue measure that is uniformly bounded from above and below on its support. As soon as $\mu$ is not compactly supported, this cannot be the case. Therefore, we shall make strong use of a work of Ambrosio, Brué and Trevisan [1], which proves a Lusin-type property for Sobolev functions with respect to a log-concave measure. The proof uses an operator different from the Hardy-Littlewood maximal operator, more adapted to the setting. We shall not discuss here the specifics of that operator, since we only need the Lusin property, and not the maximal operator itself. The exact statement of their result, in the restricted setting of log-concave measures on $\mathbb{R}^d$, is as follows:

Proposition 2.3. Let $\mu$ be a log-concave measure, and $p \ge 2$. Then for any function $f \in W^{1,p}(\mu)$, there exists a function $g$ such that $|f(x) - f(y)| \le (g(x) + g(y))\,|x - y|$ for a.e. $x$ and $y$, and with $\|g\|_{L^p(\mu)} \le C_p \|\nabla f\|_{L^p(\mu)}$, with $C_p$ some universal constant that only depends on $p$.

This statement is proved in [1, Theorem 4.1], and also holds for maps taking values in some Hilbert space. It is written there only for $p = 2$, but the reason for that restriction is that they work in the more general setting of possibly nonsmooth RCD spaces, rather than just $\mathbb{R}^d$ endowed with a measure. The only place where they require the restriction to $p = 2$ is when using the Riesz inequality [1, Remark 3.9], which in the smooth setting is known for general values of $p$, as proved in [2].
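The classical inequality (4) can be checked by hand in a toy situation. The sketch below is only an illustration under strong simplifying assumptions that are not in the paper: dimension one, the Lebesgue measure, a piecewise-linear function on a uniform grid, and a maximal function restricted to radii that are multiples of the grid step, with $|f'|$ extended by zero outside the grid. All names are hypothetical. In this discrete setting the inequality holds with constant one, because the interval between two nodes is contained in the ball of radius $|x - y|$ around either of them.

```python
import math


def nodes_from_slopes(slopes, h):
    """Values of a piecewise-linear f at the grid nodes, with f(0) = 0."""
    values = [0.0]
    for s in slopes:
        values.append(values[-1] + h * s)
    return values


def maximal_function(abs_slopes, j):
    """Discrete maximal function of |f'| at node j.

    Supremum over radii r = m*h of the average of |f'| over
    (x_j - r, x_j + r), where |f'| is extended by zero outside the grid.
    The step h cancels in the average, so it is not needed here.
    """
    n = len(abs_slopes)
    best = 0.0
    for m in range(1, n + 1):
        lo = max(0, j - m)
        hi = min(n, j + m)
        best = max(best, sum(abs_slopes[lo:hi]) / (2 * m))
    return best


def lusin_lipschitz_holds(slopes, h, tol=1e-9):
    """Check |f(x_j) - f(x_k)| <= (M(j) + M(k)) |x_j - x_k| for all nodes."""
    values = nodes_from_slopes(slopes, h)
    abs_slopes = [abs(s) for s in slopes]
    maximal = [maximal_function(abs_slopes, j) for j in range(len(values))]
    for j in range(len(values)):
        for k in range(j + 1, len(values)):
            lhs = abs(values[k] - values[j])
            rhs = (maximal[j] + maximal[k]) * (k - j) * h
            if lhs > rhs + tol:
                return False
    return True


if __name__ == "__main__":
    # A function with a short region of very large slope.
    slopes = [math.sin(0.3 * i) * (10.0 if 40 <= i < 45 else 1.0)
              for i in range(100)]
    print(lusin_lipschitz_holds(slopes, h=0.01))
```

The example function has a narrow region with a large derivative; the maximal function is large near that region only, which is exactly the localized Lipschitz control that (H1) is meant to capture.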

2.3 Related works

As mentioned previously, the adaptation of the Crippa-De Lellis method to derive quantitative estimates for stochastic differential equations was already considered in [8] and [32]. The results of [32] give stability estimates with bounds that depend on $\|\nabla\sigma\|_{L^p(dx)}$. Considering estimates weighted with the Lebesgue measure allows the Hardy-Littlewood maximal function to be used directly. As mentioned above, the main focus here is to get estimates that are weighted with respect to a probability measure adapted to the problem, which may behave very differently, for example when the coefficients of the two SDEs are uniformly close, but not compactly supported.

To use estimates in weighted spaces, [8] considers functions such that

$$\int (M|f|)^2\, d\mu_t + \int (M|\nabla f|)^2\, d\mu_t < \infty,$$

with $\mu_t$ the flow of the SDE. The authors also consider other function spaces of the same nature, sharper in dimension one, or that handle weaker integrability conditions on $M|\nabla f|$ than $L^p$ (but stronger than $L^1$). Since the space depends on the law of the flow at all times, it may be difficult to determine estimates on such norms. For the application considered in Section 2.4, we do not know whether the approach of [8] could apply.

We shall also focus on establishing very explicit quantitative estimates in transport distance, highlighting in particular the dependence on the dimension.

Another approach was developed in [6] to directly obtain relative entropy estimates between the distributions at finite times. The upside of that approach is that the quantitative estimates are considerably stronger, depending polynomially on some distance between the coefficients. The two downsides are that they depend on stronger Sobolev norms, requiring that the derivatives of the two diffusion coefficients are close in some sense, as well as on a-priori Fisher information-like bounds on the relative densities, rather than $L^p$ bounds. Fisher information-like estimates were then derived in [7] by directly comparing generators via a Poisson equation, also using stronger Sobolev norms.

Finally, when the two diffusion coefficients match, one can derive relative entropy bounds via Girsanov's theorem. Unfortunately, this strategy cannot work when the two diffusion coefficients differ.

2.4 An application to Stein kernels

We now explain how our result might be applied in the context of Stein's method for bounding distances between probability measures. The theory was developed by C. Stein in [43, 44] to control distances to the standard Gaussian along the central limit theorem. Since then it has found many applications for bounding distances between probability measures, in both Gaussian and non-Gaussian situations [39, 36, 10]. At the heart of the theory lies the following observation, sometimes called Stein's lemma (see [39]):

If $G \sim \gamma$ is the standard Gaussian in $\mathbb{R}^d$ then it satisfies the following integration by parts formula, for any regular test function $f$,

$$\mathbb{E}\left[\langle \nabla f(G), G \rangle\right] = \mathbb{E}\left[\Delta f(G)\right],$$

where $\Delta$ is the Laplacian. Moreover, the Gaussian is the only measure which satisfies this formula.

Given a measure $\nu$ on $\mathbb{R}^d$ and $X \sim \nu$, a matrix valued map $\tau : \mathbb{R}^d \to M_d(\mathbb{R})$ is said to be a Stein kernel for $\nu$ if it mimics the above formula:

$$\mathbb{E}\left[\langle \nabla f(X), X \rangle\right] = \mathbb{E}\left[\langle \nabla^2 f(X), \tau(X) \rangle_{HS}\right]. \tag{5}$$

Observe that the map $\tau_\gamma \equiv I_d$, which is constantly the identity, is a Stein kernel for $\gamma$. Stein's lemma suggests that if $\tau_\nu$ is close to the identity then $\nu$ should be close to $\gamma$. This is in fact true, and there are many examples of precise quantitative statements implementing this idea, for various distances between measures, such as transport distances, the total variation distance, or the Kolmogorov distance in dimension 1. The one most relevant to the present work is an inequality of [30], which states that for any $\tau$ which is a Stein kernel for a measure $\nu$,

$$W_2^2(\nu, \gamma) \le \|\tau - I_d\|_{L^2(\nu)}^2, \tag{6}$$

where $W_2$ is the quadratic Wasserstein distance. The proof of this inequality strongly relies on Gaussian algebraic identities, such as the Mehler formula for the Ornstein-Uhlenbeck semigroup. We are interested in similar estimates when neither of the measures is Gaussian. The main motivation comes from the fact that Stein's method, in its classical implementations, is hard to use for target measures that do not satisfy certain exact algebraic properties (typically, explicit knowledge of the eigenvectors of an associated Markov semigroup). We shall prove that a weaker inequality holds for certain non-Gaussian reference measures and for one particular construction of Stein kernels. To understand this construction we require the following definition.

Definition 2.4 (Moment map). Let $\mu$ be a measure on $\mathbb{R}^d$. A moment map of $\mu$ is a convex function $\varphi : \mathbb{R}^d \to \mathbb{R}$ such that $e^{-\varphi}$ is a centered probability density whose push-forward by $\nabla\varphi$ is $\mu$.

As was shown in [12, 40], if $\mu$ is centered and has a finite first moment and a density, then its moment map exists and is unique as long as we enforce essential continuity at the boundary of its support. The moment map $\varphi$ can be realized as the optimal transport map between some source log-concave measure and the target measure $\mu$, where we enforce that the gradient of the source measure's potential must equal the transport map itself.

The correspondence between the convex function $\varphi$ and the measure $\mu$ is actually a bijection, up to a translation of $\varphi$, and the measure associated with a given convex function is known as its moment measure. If $\mu$ has a density $\rho$, then $\varphi$ solves the Monge-Ampère-type PDE

$$e^{-\varphi} = \rho(\nabla\varphi)\, \det(\nabla^2\varphi).$$

This PDE, sometimes called the toric Kähler-Einstein PDE, first appeared in the geometry literature [48, 15, 4, 31], where it plays a role in the construction of Kähler-Einstein metrics on certain complex manifolds. Variants with different nonlinearities have recently been considered, for example in [21].

The connection between moment maps and Stein kernels was made in [18]. Specifically, it was proven that if $\varphi$ is the moment map of $\mu$, then (up to regularity issues) the matrix valued map

$$\tau_\mu := \nabla^2\varphi\left(\nabla\varphi^{-1}\right), \tag{7}$$

is a Stein kernel for $\mu$. Since $\varphi$ is a convex function, $\tau_\mu$ turns out to be supported on positive semi-definite matrices. For this specific construction of a Stein kernel we will prove the following analogue of (6).

Theorem 2.5. Let $\mu$ be a log-concave measure on $\mathbb{R}^d$ such that

$$\alpha I_d \le -\nabla^2 \ln\left(d\mu/dx\right) \le \frac{1}{\alpha} I_d,$$

for some $\alpha \in (0, 1]$ and let $\tau_\mu$ be its Stein kernel defined in (7). If $\nu$ is any other probability measure and $\sigma$ is a Stein kernel for $\nu$ such that (1) is well defined, then for $\beta = \|\sqrt{\tau} - \sqrt{\sigma}\|_{L^2(\mu)}$ and $M = \max\left(m_2^2(\mu), m_2^2(\nu)\right)$,

$$W_2^2(\mu,\nu) \le C \alpha^{-6} d^3 M \ln(M)\, \|d\nu/d\mu\|_{\infty}\, \frac{\ln\left(\ln\left(1 + \frac{M}{\beta}\right)\right) + M\ln(M)}{\ln\left(1 + \frac{M}{\beta}\right)}.$$

Moreover, if $\mu$ is radially symmetric, has full support, and $d > c$, for some universal constant $c > 0$,

$$W_2^2(\mu,\nu) \le C \alpha^{-20} d^{7/2} M \ln(M)\, \|d\nu/d\mu\|_{L^2(\mu)}\, \frac{\ln\left(\ln\left(1 + \frac{M}{\beta}\right)\right) + M\ln(M)}{\ln\left(1 + \frac{M}{\beta}\right)}.$$

Finally, if $\sqrt{\tau_\mu}$ is $L$-Lipschitz, then

$$W_2(\mu,\nu) \le 100\, \alpha^{-2(4L^2+1)} (2L + 1)\, \beta.$$

It should be emphasized that, except in dimension one, Stein kernels are not unique. Constructions different from the one studied here have been provided for example in [9, 11, 34, 37]. Unlike the functional inequalities of [30] for the Gaussian measure, our results will only work for the Stein kernels constructed from moment maps (at least for one of the two measures). In particular, in order to define a stochastic flow from a Stein kernel, we must require the kernel to take positive values, which to our knowledge is not guaranteed for other constructions.

While this estimate is somewhat weak, it seems to be one of the few instances where we can estimate a distance from a discrepancy for a class of target measures, without explicit algebraic requirements for an associated Markov generator. Recently, there has been progress on implementing Stein's method for wide classes of target measures via Malliavin calculus [17, 20].

Note that if one of the two measures is Gaussian, since the natural Stein kernel for the standard Gaussian is constant, and hence Lipschitz, one could use the stronger Theorem 2.2 to get a stability estimate, which would still be weaker than that of [30], but with the sharp exponent.

One may wonder why we do not prove this type of estimate directly using Stein's method. The key difference lies in that we do not need a second-order regularity bound on solutions of Stein's equation, which we do not even know how to prove. To be more precise, the natural way to try to use Stein's method for this problem would be to apply the generator approach using the generator of the process $dX_t = -X_t\, dt + \sqrt{2\tau(X_t)}\, dB_t$, where $\tau$ is the Stein kernel for $\mu$. Applying Stein's method to bound, say, the $W_1$ distance would require us to bound $\|\nabla f\|_\infty$ and $\|\nabla^2 f\|_\infty$ for solutions to the Stein equation

$$-x \cdot \nabla f + \mathrm{Tr}\left(\tau \nabla^2 f\right) = g - \int g\, d\mu$$

for arbitrary 1-Lipschitz data $g$. While a slightly stronger version of Assumption (H3) could be used to prove bounds on $\|\nabla f\|_\infty$, the techniques used here would not help to bound $\|\nabla^2 f\|_\infty$. So using Stein's method would require some ingredients we do not have. Indeed, proving second-order bounds is usually the most difficult step in implementing Stein's method via diffusion processes, and in the literature this has mostly been done for measures satisfying certain algebraic properties, such as having an explicit orthogonal basis of polynomials that are eigenvectors for an associated diffusion process (for example Gaussians or gamma distributions).
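Stein's lemma from the beginning of this section can be verified exactly on polynomial test functions in dimension one, where it reads $\mathbb{E}[G f'(G)] = \mathbb{E}[f''(G)]$. The following Python sketch does this with integer arithmetic, using the standard fact that $\mathbb{E}[G^k]$ vanishes for odd $k$ and equals $(k-1)!!$ for even $k$. The representation of a polynomial as a list of coefficients and all function names are assumptions made for the illustration.

```python
def gaussian_moment(k):
    """E[G^k] for a standard Gaussian G: 0 for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result


def derivative(coeffs):
    """Derivative of the polynomial sum_k coeffs[k] * x^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def gaussian_expectation(coeffs):
    """E[p(G)] for the polynomial p with the given coefficients."""
    return sum(c * gaussian_moment(k) for k, c in enumerate(coeffs))


def stein_sides(coeffs):
    """Return (E[G f'(G)], E[f''(G)]) for the polynomial f."""
    first = derivative(coeffs)
    second = derivative(first)
    lhs = gaussian_expectation([0] + first)  # multiplying by x shifts degrees
    rhs = gaussian_expectation(second)
    return lhs, rhs


if __name__ == "__main__":
    # f(x) = x^4, then f(x) = 1 + 2x - x^3 + 5x^6.
    print(stein_sides([0, 0, 0, 0, 1]))
    print(stein_sides([1, 2, 0, -1, 0, 0, 5]))
```

In this one-dimensional Gaussian case the Stein kernel is the constant 1, so the identity checked here is the special case of (5) with $\tau \equiv 1$.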

3 Proofs of stability bounds

A rough outline of the proofs is as follows: as a first step we will use Itô's formula to show that (H1) implies bounds on the measures $\mu_t$ and $\nu_t$, for fixed $t$. Indeed, (H1) will allow us to replace quantities like $\|\tau(X_t) - \tau(Y_t)\|$, which will arise through the use of Itô's formula, by something more similar to $\|X_t - Y_t\|$. We will then use (H2) to transfer those estimates to the measure $\nu$ as well. After establishing that $\mu_t$ and $\nu_t$ are close, (H3) will be used to establish the same for $\mu$ and $\nu$.

We first demonstrate this in the easier case of globally Lipschitz coefficients.

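3.1 Lipschitz coefficients - proof of Theorem 2.2

Proof of Theorem 2.2. By Itô's formula, we have

$$d\|X_t - Y_t\|^2 = 2\langle X_t - Y_t, a(X_t) - b(Y_t)\rangle\, dt + \sqrt{8}\,(X_t - Y_t)\left(\sqrt{\sigma}(X_t) - \sqrt{\tau}(Y_t)\right) dB_t + \left\|\sqrt{\sigma}(X_t) - \sqrt{\tau}(Y_t)\right\|_{HS}^2\, dt.$$

So,

$$\frac{d}{dt}\mathbb{E}\left[\|X_t - Y_t\|^2\right] \le \mathbb{E}\left[\|X_t - Y_t\|^2\right] + \mathbb{E}\left[\|a(X_t) - b(Y_t)\|^2\right] + \mathbb{E}\left[\left\|\sqrt{\sigma}(X_t) - \sqrt{\tau}(Y_t)\right\|_{HS}^2\right].$$

We have

$$\mathbb{E}\left[\|a(X_t) - b(Y_t)\|^2\right] \le 2\mathbb{E}\left[\|a(X_t) - a(Y_t)\|^2\right] + 2\mathbb{E}\left[\|a(Y_t) - b(Y_t)\|^2\right] \le 2L^2\, \mathbb{E}\left[\|X_t - Y_t\|^2\right] + 2\|a - b\|_{L^2(\nu_t)}^2,$$

and

$$\mathbb{E}\left[\left\|\sqrt{\sigma}(X_t) - \sqrt{\tau}(Y_t)\right\|_{HS}^2\right] \le 2\mathbb{E}\left[\left\|\sqrt{\sigma}(X_t) - \sqrt{\sigma}(Y_t)\right\|_{HS}^2\right] + 2\mathbb{E}\left[\left\|\sqrt{\sigma}(Y_t) - \sqrt{\tau}(Y_t)\right\|^2\right] \le 2L^2\, \mathbb{E}\left[\|X_t - Y_t\|^2\right] + 2\|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu_t)}^2.$$

Combine the above displays to obtain

$$\frac{d}{dt}\mathbb{E}\left[\|X_t - Y_t\|^2\right] \le (1 + 4L^2)\, \mathbb{E}\left[\|X_t - Y_t\|^2\right] + 2\|a - b\|_{L^2(\nu_t)}^2 + 2\|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu_t)}^2.$$

We choose $\mu_0 = \nu_0 = \nu$ so that $\nu_t = \nu$ for all $t \ge 0$, and denote $r = 2\|a - b\|_{L^2(\nu)}^2 + 2\|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu)}^2$. To bound $W_2^2(\nu,\mu)$, we consider the differential equation

$$f'(t) = (1 + 4L^2) f(t) + r, \quad \text{with initial condition } f(0) = 0.$$

Its unique solution is given by $f(t) = r\, \frac{e^{(4L^2+1)t} - 1}{4L^2+1}$. Thus, by Gronwall's inequality

$$W_2^2(\nu,\mu_t) = W_2^2(\nu_t,\mu_t) \le \mathbb{E}\left[\|X_t - Y_t\|^2\right] \le r\, \frac{e^{(4L^2+1)t} - 1}{4L^2 + 1}.$$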
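The closed-form solution of the comparison equation $f'(t) = (1 + 4L^2) f(t) + r$, $f(0) = 0$, used in the Gronwall step can be double-checked numerically. The sketch below compares the formula with a plain explicit Euler integration; the function names, the step count and the parameter values are arbitrary choices made for the illustration.

```python
import math


def gronwall_solution(t, L, r):
    """Closed form f(t) = r * (exp((4L^2 + 1) t) - 1) / (4L^2 + 1)."""
    c = 4.0 * L ** 2 + 1.0
    return r * math.expm1(c * t) / c


def euler_solution(t_end, L, r, steps=20000):
    """Explicit Euler scheme for f' = (1 + 4L^2) f + r with f(0) = 0."""
    c = 4.0 * L ** 2 + 1.0
    h = t_end / steps
    f = 0.0
    for _ in range(steps):
        f += h * (c * f + r)
    return f


if __name__ == "__main__":
    # Illustrative values only.
    L, r, t_end = 0.5, 0.3, 1.0
    print(gronwall_solution(t_end, L, r), euler_solution(t_end, L, r))
```

Both values agree up to the discretization error of the Euler scheme, which confirms that the displayed $f$ solves the equation.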

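By Assumption (H3) we also know that

$$W_2(\mu_t,\mu) \le C_H e^{-\kappa \cdot t}\, W_2(\nu,\mu).$$

Thus,

$$W_2(\nu,\mu) \le W_2(\nu,\mu_t) + W_2(\mu_t,\mu) \le \sqrt{r\, \frac{e^{(4L^2+1)t} - 1}{4L^2 + 1}} + C_H e^{-\kappa \cdot t}\, W_2(\nu,\mu),$$

or equivalently, when $t$ is large enough,

$$W_2(\nu,\mu) \le \sqrt{\frac{r}{4L^2 + 1}}\, \frac{\sqrt{e^{(4L^2+1)t} - 1}}{1 - e^{-\kappa \cdot t} C_H} \le \sqrt{\frac{r}{4L^2 + 1}}\, \frac{e^{(4L^2+1)t/2}}{1 - e^{-\kappa \cdot t} C_H}.$$

We now take $t = \frac{\ln\left(1 + \frac{2\kappa}{4L^2+1}\right) + \ln(C_H)}{\kappa}$ to get

$$W_2(\nu,\mu) \le \sqrt{\frac{r}{4L^2 + 1}} \left(\frac{4L^2 + 1}{2\kappa} + 1\right) \left(1 + \frac{2\kappa}{4L^2 + 1}\right)^{\frac{4L^2+1}{2\kappa}} C_H^{\frac{4L^2+1}{2\kappa}} \le 10\, C_H^{\frac{4L^2+1}{2\kappa}} \sqrt{r} \left(\frac{L}{\kappa} + 1\right).$$

To finish the proof it is enough to observe that $r \le 2\beta^2$.

3.2 Proof of Theorem 2.1

To prove Theorem 2.1, we will first show that, under suitable assumptions, for a given $t > 0$, the measure $\mu_t$ cannot be too different from $\nu_t$. Following [13] we define the logarithmic transport distance, which serves as a natural measure of distance between $\mu_t$ and $\nu_t$:

$$\mathcal{D}_\delta(\mu,\nu) := \inf_{\pi} \int \ln\left(1 + \frac{|x - y|^2}{\delta^2}\right) d\pi,$$

where $\delta > 0$ and the infimum is taken over all couplings of $\mu$ and $\nu$, i.e. $\mathcal{D}_\delta$ is a transport cost (but not a distance, and the cost is concave, not convex).

We have the following connection between $\mathcal{D}_\delta$ and $\tilde W_{2,R}^2$, which is essentially the same as [41, Lemma 5]. The proof of this lemma may be found in the appendix.

Lemma 3.1. For any $R, \delta, \epsilon > 0$, we have

$$\tilde W_{2,R}^2(\mu,\nu) \le \delta^2 \exp\left(\frac{\mathcal{D}_\delta(\mu,\nu)}{\epsilon}\right) + R\epsilon + R\, \frac{\mathcal{D}_\delta(\mu,\nu)}{\ln\left(1 + \frac{R^2}{\delta^2}\right)}.$$

Remark 3.1. We can define $\tilde W_{1,R}$ in the same way, and a similar proof would also show that

$$\tilde W_{1,R}(\mu,\nu) \le \delta \exp\left(\frac{\mathcal{D}_\delta(\mu,\nu)}{\epsilon}\right) + R\epsilon + R\, \frac{\mathcal{D}_\delta(\mu,\nu)}{\ln\left(1 + \frac{R}{\delta^2}\right)},$$

which motivates Remark 2.1.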
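The right-hand side of Lemma 3.1 depends on the free parameter ε. As a quick sanity check, the sketch below evaluates it for the particular choice $\epsilon = \mathcal{D}_\delta / \ln(1 + R/\delta)$ with $\delta < R$, and compares it with the simpler quantity $2R\left(\delta + \mathcal{D}_\delta / \ln(1 + R/\delta)\right)$. The function names are hypothetical, `D` stands for a given positive value of $\mathcal{D}_\delta(\mu,\nu)$, and the formula for the right-hand side is the one read from the statement of the lemma.

```python
import math


def lemma_3_1_rhs(D, delta, R, eps):
    """delta^2 exp(D/eps) + R eps + R D / ln(1 + R^2/delta^2)."""
    return (delta ** 2 * math.exp(D / eps)
            + R * eps
            + R * D / math.log(1.0 + R ** 2 / delta ** 2))


def simplified_rhs(D, delta, R):
    """2 R (delta + D / ln(1 + R/delta))."""
    return 2.0 * R * (delta + D / math.log(1.0 + R / delta))


def simplification_holds(D, delta, R):
    """Check the comparison for eps = D / ln(1 + R/delta), assuming delta < R."""
    eps = D / math.log(1.0 + R / delta)
    return lemma_3_1_rhs(D, delta, R, eps) <= simplified_rhs(D, delta, R) * (1.0 + 1e-12)


if __name__ == "__main__":
    for D in (0.01, 1.0, 50.0):
        for delta in (1e-6, 0.1, 1.0):
            for R in (1.5, 10.0, 1000.0):
                print(D, delta, R, simplification_holds(D, delta, R))
```

With this choice of ε the exponential term equals $\delta^2 + R\delta \le 2R\delta$, and the two remaining terms are each at most $R\,\mathcal{D}_\delta / \ln(1 + R/\delta)$ because $R^2/\delta^2 \ge R/\delta$ when $\delta < R$.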

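Observe that if $\delta < R$, then by choosing $\epsilon = \frac{\mathcal{D}_\delta(\mu,\nu)}{\ln\left(1 + \frac{R}{\delta}\right)}$ in the above lemma, we obtain

$$\tilde W_{2,R}^2(\mu,\nu) \le 2R\left(\delta + \frac{\mathcal{D}_\delta(\mu,\nu)}{\ln\left(1 + \frac{R}{\delta}\right)}\right). \tag{8}$$

Moreover, if both $\mu$ and $\nu$ have tame tails then it can be shown that for $R$ large enough,

$$W_2^2(\mu,\nu) \simeq \tilde W_{2,R}^2(\nu,\mu).$$

This is made rigorous in Lemma A.1, in the appendix. For the logarithmic transport distance, we will prove: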

Lemma 3.2. Suppose that (H1) and (H2) hold and that $X_0 = Y_0$ almost surely with $Y_0 \sim \nu$. Then, for any $t, \delta > 0$,

$$\mathcal{D}_\delta(\mu_t,\nu_t) \le 2t\left(10\, \|d\nu/d\mu\|_{L^p(\mu)}\, \|g\|_{L^{2q}(\mu)}^2 + \frac{1}{\delta}\|a - b\|_{L^1(\nu)} + \frac{2}{\delta^2}\|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu)}^2\right),$$

where $q$ is such that $\frac{1}{q} + \frac{1}{p} = 1$.

Let us first show how to derive Theorem 2.1 from Lemma 3.2.

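Proof of Theorem 2.1. To ease the notation we will denote

$$\alpha = 20\, \|d\nu/d\mu\|_{L^p(\mu)}\, \|g\|_{L^{2q}(\mu)}^2, \qquad \beta = 2\|a - b\|_{L^1(\nu)} + 2\|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu)}.$$

We choose $\delta = \beta$ in Lemma 3.2 and obtain:

$$\mathcal{D}_\delta(\mu_t,\nu_t) \le (\alpha + 1)\, t.$$

Now, combine the above estimate with (8) to get

$$\tilde W_{2,R}^2(\mu_t,\nu_t) \le 2R\left(\beta + \frac{(\alpha + 1)}{\ln\left(1 + \frac{R}{\beta}\right)}\, t\right) \le 2R\, \frac{(\alpha + 1)}{\ln\left(1 + \frac{R}{\beta}\right)}\, (t + R). \tag{9}$$

To see the second inequality note that $\beta \le R \ln\left(1 + \frac{R}{\beta}\right)^{-1}$. With Assumption (H3), we have

$$\tilde W_{2,R}(\mu_t,\mu) \le W_2(\mu_t,\mu) \le C_H e^{-t\kappa}\, W_2(\nu,\mu) \le C_H e^{-t\kappa} \sqrt{\int |x|^2\, d\mu + \int |x|^2\, d\nu}.$$

Observe as well that since $\nu$ is an invariant measure,

$$\tilde W_{2,R}(\nu_t,\nu) = 0.$$

We thus get

$$\tilde W_{2,R}(\nu_t,\nu) + \tilde W_{2,R}(\mu_t,\mu) \le C_H e^{-t\kappa} \sqrt{m_2^2(\mu) + m_2^2(\nu)}.$$

Take

$$t_0 := \frac{1}{\kappa} \ln\left(\sqrt{\left(m_2^2(\mu) + m_2^2(\nu)\right) \ln\left(1 + \frac{R}{\beta}\right)}\right),$$

for which

$$\tilde W_{2,R}(\nu_{t_0},\nu) + \tilde W_{2,R}(\mu_{t_0},\mu) \le \frac{C_H}{\sqrt{\ln\left(1 + \frac{R}{\beta}\right)}},$$

and, by using (9),

$$\tilde W_{2,R}(\mu_{t_0},\nu_{t_0}) \le \sqrt{\frac{2R(\alpha + 1)}{\ln\left(1 + \frac{R}{\beta}\right)}\, (t_0 + R)}.$$

To conclude the proof, we use the triangle inequality,

$$\tilde W_{2,R}(\mu,\nu) \le \tilde W_{2,R}(\nu_{t_0},\nu) + \tilde W_{2,R}(\mu_{t_0},\mu) + \tilde W_{2,R}(\mu_{t_0},\nu_{t_0}) \le \frac{C_H}{\sqrt{\ln\left(1 + \frac{R}{\beta}\right)}} + \sqrt{\frac{2R(\alpha + 1)}{\ln\left(1 + \frac{R}{\beta}\right)}\, (t_0 + R)}$$

$$\le \frac{C_H + \sqrt{\frac{2R(\alpha + 1)}{\kappa}\left(\ln\left(\left(m_2^2(\mu) + m_2^2(\nu)\right) \ln\left(1 + \frac{R}{\beta}\right)\right) + \kappa R\right)}}{\sqrt{\ln\left(1 + \frac{R}{\beta}\right)}}.$$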
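The time $t_0$ in the proof above is chosen so that the relaxation term $C_H e^{-t\kappa}\sqrt{m_2^2(\mu) + m_2^2(\nu)}$ collapses to $C_H / \sqrt{\ln(1 + R/\beta)}$. The short sketch below checks this arithmetic numerically; the names are hypothetical, `m2_sum` stands for $m_2^2(\mu) + m_2^2(\nu)$, and the parameter values are arbitrary.

```python
import math


def t0(kappa, m2_sum, R, beta):
    """t_0 = (1/kappa) * ln( sqrt( m2_sum * ln(1 + R/beta) ) )."""
    return math.log(math.sqrt(m2_sum * math.log(1.0 + R / beta))) / kappa


def relaxation_term(t, kappa, C_H, m2_sum):
    """C_H * exp(-kappa t) * sqrt(m2_sum), the bound coming from (H3)."""
    return C_H * math.exp(-kappa * t) * math.sqrt(m2_sum)


if __name__ == "__main__":
    # Illustrative values only.
    kappa, C_H, m2_sum, R, beta = 0.7, 3.0, 5.0, 4.0, 1e-3
    lhs = relaxation_term(t0(kappa, m2_sum, R, beta), kappa, C_H, m2_sum)
    rhs = C_H / math.sqrt(math.log(1.0 + R / beta))
    print(lhs, rhs)
```

The two printed numbers coincide up to rounding, which is the identity used to pass from the bound in terms of $e^{-t\kappa}$ to the bound in terms of $\ln(1 + R/\beta)$.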

3.2.1 Proof of Lemma 3.2

In this section our goal is to bound the logarithmic distance between $X_t$ and $Y_t$ and thus prove Lemma 3.2. Towards this, we let $Z_t = X_t - Y_t$. A straightforward application of Itô's formula gives the following result, whose proof may be found in [32, Section 4.1].

Lemma 3.3.

$$\mathcal{D}_\delta(\mu_t,\nu_t) \le \mathcal{D}_\delta(\mu_0,\nu_0) + 2\int_0^t \mathbb{E}\left[\frac{\langle Z_s, a(X_s) - b(Y_s)\rangle}{|Z_s|^2 + \delta^2}\right] ds + 2\int_0^t \mathbb{E}\left[\frac{\left\|\sqrt{\sigma}(X_s) - \sqrt{\tau}(Y_s)\right\|^2}{|Z_s|^2 + \delta^2}\right] ds.$$

With the above inequality we may then prove:

Lemma 3.4. Let $t \ge 0$. Then,

$$\mathcal{D}_\delta(\mu_t,\nu_t) \le \mathcal{D}_\delta(\mu_0,\nu_0) + 2\int_0^t \left(5\left(\|g\|_{L^2(\mu_s)}^2 + \|g\|_{L^2(\nu_s)}^2\right) + \frac{1}{\delta}\|a - b\|_{L^1(\nu_s)} + \frac{2}{\delta^2}\|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu_s)}^2\right) ds.$$

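Proof. We have

$$\mathbb{E}\left[\frac{\langle Z_s, a(X_s) - b(Y_s)\rangle}{|Z_s|^2 + \delta^2}\right] \le \mathbb{E}\left[\frac{|a(X_s) - b(Y_s)|}{\sqrt{|Z_s|^2 + \delta^2}}\right] \le \mathbb{E}\left[\frac{|a(X_s) - a(Y_s)|}{\sqrt{|Z_s|^2 + \delta^2}}\right] + \mathbb{E}\left[\frac{|a(Y_s) - b(Y_s)|}{\sqrt{|Z_s|^2 + \delta^2}}\right].$$

Using Assumption (H1), we get

$$\mathbb{E}\left[\frac{|a(X_s) - a(Y_s)|}{\sqrt{|Z_s|^2 + \delta^2}}\right] \le \mathbb{E}\left[g(X_s) + g(Y_s)\right] = \|g\|_{L^1(\mu_s)} + \|g\|_{L^1(\nu_s)}.$$

We also have

$$\mathbb{E}\left[\frac{|a(Y_s) - b(Y_s)|}{\sqrt{|Z_s|^2 + \delta^2}}\right] \le \frac{1}{\delta}\|a - b\|_{L^1(\nu_s)}.$$

So,

$$\mathbb{E}\left[\frac{\langle Z_s, a(X_s) - b(Y_s)\rangle}{|Z_s|^2 + \delta^2}\right] \le \|g\|_{L^1(\mu_s)} + \|g\|_{L^1(\nu_s)} + \frac{1}{\delta}\|a - b\|_{L^1(\nu_s)}.$$

Similar calculations yield

$$\mathbb{E}\left[\frac{\left\|\sqrt{\sigma}(X_s) - \sqrt{\tau}(Y_s)\right\|^2}{|Z_s|^2 + \delta^2}\right] \le 4\|g\|_{L^2(\mu_s)}^2 + 4\|g\|_{L^2(\nu_s)}^2 + \frac{2}{\delta^2}\left\|\sqrt{\sigma} - \sqrt{\tau}\right\|_{L^2(\nu_s)}^2.$$

Since we assumed for convenience that $g \ge 1$, we have $\|g\|_{L^2(\rho)}^2 \ge \|g\|_{L^1(\rho)} \ge 1$ for any probability measure $\rho$ we consider. We now plug the above displays into Lemma 3.3.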

Lemma 3.2 is now a consequence of the previous lemma.

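Proof of Lemma 3.2. We start from Lemma 3.4. Since $\mu_0 = \nu_0 = \nu$, and $\nu$ is the invariant measure of the evolution equation for $(Y_t)$, we have

$$\mathcal{D}_\delta(\mu_t,\nu_t) \le 2\int_0^t \left(5\left(\|g\|_{L^2(\mu_s)}^2 + \|g\|_{L^2(\nu)}^2\right) + \frac{1}{\delta}\|a - b\|_{L^1(\nu)} + \frac{2}{\delta^2}\|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu)}^2\right) ds. \tag{10}$$

Let $q = \left(1 - \frac{1}{p}\right)^{-1}$. By Hölder's inequality,

$$\|g\|_{L^2(\nu)}^2 \le \|g\|_{L^{2q}(\mu)}^2 \left\|\frac{d\nu}{d\mu}\right\|_{L^p(\mu)}.$$

Also,

$$\|g\|_{L^2(\mu_s)}^2 \le \|g\|_{L^{2q}(\mu)}^2 \left\|\frac{d\mu_s}{d\mu}\right\|_{L^p(\mu)} \le \|g\|_{L^{2q}(\mu)}^2 \left\|\frac{d\nu}{d\mu}\right\|_{L^p(\mu)},$$

where in the second inequality we have used that $\left\|\frac{d\mu_s}{d\mu}\right\|_{L^p(\mu)}$ is monotonically decreasing in $s$. We plug the above displays into (10), to obtain

$$\mathcal{D}_\delta(\mu_t,\nu_t) \le 2t\left(10\, \|d\nu/d\mu\|_{L^p(\mu)}\, \|g\|_{L^{2q}(\mu)}^2 + \frac{1}{\delta}\|a - b\|_{L^1(\nu)} + \frac{2}{\delta^2}\|\sqrt{\sigma} - \sqrt{\tau}\|_{L^2(\nu)}^2\right),$$

which concludes the proof.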

4 Proofs of the applications to Stein kernels

In this section we fix a measure $\mu$ on $\mathbb{R}^d$, with Stein kernel $\tau_\mu$, constructed as in (7). For now, we make the assumption that $\tau_\mu$ is positive definite and uniformly bounded from below. To apply our result, we must first construct an Itô diffusion process with $\mu$ as its unique invariant measure. Define the process $X_t$ to satisfy the following SDE:

$$dX_t = -X_t\, dt + \sqrt{2\tau_\mu(X_t)}\, dB_t. \tag{11}$$

Lemma 4.1. $\mu$ is the unique invariant measure of the process $X_t$.

Proof. Let $L$ be the infinitesimal generator of $(X_t)$. That is, for a twice differentiable test function $f$,

$$Lf(x) = \langle -x, \nabla f(x)\rangle + \langle \tau_\mu(x), \nabla^2 f(x)\rangle_{HS}.$$

From the definition of the Stein kernel (5), we have

$$\mathbb{E}_\mu\left[Lf(x)\right] = 0,$$

for any such test function. We conclude that $\mu$ is an invariant measure of the process. Uniqueness follows, since $\tau_\mu$ is uniformly bounded from below ([5]).

Before proving Theorem 2.5 we collect several facts concerning this process.

4.1 Lusin-Lipschitz Property for moment maps

We would now like to claim that the kernel $\tau_\mu$ exhibits Lipschitz-like properties as in Assumption (H1). For this to hold we restrict our attention to a more regular class of measures. Henceforth, we assume that $\mu = e^{-V(x)}dx$ is an isotropic log-concave measure whose support equals $\mathbb{R}^d$ and that there exists a constant $\alpha > 0$, such that

$$\alpha I_d \le \nabla^2 V \le \frac{1}{\alpha} I_d. \tag{12}$$

In some sense, this assumption can be viewed as restricting ourselves to measures that are not too far from a Gaussian distribution. Under this assumption the main result of this section is that Stein kernels satisfy the Lusin-Lipschitz property that we need in order to apply Theorem 2.1. That is:

Lemma 4.2. Let $\mu$ be an isotropic log-concave measure on $\mathbb{R}^d$ satisfying (12) and let $\tau_\mu$ be its Stein kernel constructed from the moment map. Then, there exists a function $g : \mathbb{R}^d \to \mathbb{R}$ such that for almost every $x, y \in \mathbb{R}^d$:

$$\left\|\sqrt{\tau_\mu(x)} - \sqrt{\tau_\mu(y)}\right\| \le (g(x) + g(y))\, \|x - y\|,$$

and

$$\|g\|_{L^2(\mu)} \le C d^{3/2} \alpha^{-1},$$

where $C > 0$ is a universal constant. Moreover, there exists a constant $c$ such that if $\mu$ is radially symmetric and has full support then we also have, for $d > c$,

$$\|g\|_{L^4(\mu)} < C d^{7/4} \alpha^{-8}.$$

In the sequel we will use the following notation: for $v \in \mathbb{R}^d$, $\partial_v \varphi$ is the directional derivative of $\varphi$ along $v$. Repeated derivatives will be denoted as $\partial^2_{uv}\varphi$, $\partial^3_{uvw}\varphi$, etc. If $e_i$, for $i = 1, \dots, d$, is a standard unit vector, we will abbreviate $\partial_i \varphi = \partial_{e_i}\varphi$.

Recall that $\tau_\mu = \nabla^2\varphi(\nabla\varphi^{-1})$, where $\nabla\varphi$ pushes the measure $e^{-\varphi}dx$ onto $\mu$. Thus, keeping in mind Proposition 2.3, our first objective is to show $\partial^3_{ijk}\varphi \in W^{1,2}(\mu)$, for every $i, j, k = 1, \dots, d$. This will be a consequence of the following result:

Proposition 4.3 (Third-order regularity bounds on moment maps). Assume that $\mu$ is isotropic and that $\nabla^2 V \ge \alpha I_d$. Then, for $i, j = 1,\dots,d$ and $j \ne i$,

1. $\int |\nabla\partial^2_{ii}\varphi|^2 e^{-\varphi}dx \le C\alpha^{-1}$.
2. $\int |\nabla\partial^2_{ij}\varphi|^2 e^{-\varphi}dx \le C(d + \alpha^{-1})$.

Here $C$ is a dimension-free constant, independent of $\mu$.

Note that under the isotropy condition, necessarily $\alpha \le 1$. These bounds build upon the following estimates:

Proposition 4.4. Assume that $\mu = e^{-V(x)}dx$ is log-concave and isotropic and $\varphi$ is its moment map.

1. For any direction $e \in S^{d-1}$ we have
\[ \int (\partial^2_{ee}\varphi)^p e^{-\varphi}dx \le 8^p p^{2p}. \]
2. $\int \langle (\nabla^2\varphi)^{-1}\nabla\partial^2_{ee}\varphi, \nabla\partial^2_{ee}\varphi\rangle e^{-\varphi}dx \le 32\sqrt{\int \langle x, e\rangle^4 d\mu} \le C$, with $C$ a dimension-free constant, that does not depend on $\mu$.
3. If $\mu$ has a convex support and $\nabla^2 V \ge \alpha I_d$ with $\alpha > 0$, then $\nabla^2\varphi \le \alpha^{-1} I_d$.
4. If $\mu$ has full support and $\nabla^2 V \le \beta I_d$ with $\beta > 0$ then $\nabla^2\varphi \ge \beta^{-1} I_d$.

The first part was proved in [23] (see [18, Proposition 3.2] for the precise statement). The second part is an immediate consequence of [24, eq (55)]. The third part was proved in [23]. The last part is part of the proof of [28, Theorem 3.4].

Proof of Proposition 4.3. The first part is an immediate consequence of items 2 and 3 of Proposition 4.4. For the second part, with several successive integrations by parts, we have,

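\begin{align*}
\int (\partial^3_{ijk}\varphi)^2 e^{-\varphi}dx
&= -\int (\partial^4_{iijk}\varphi)(\partial^2_{jk}\varphi)e^{-\varphi}dx + \int (\partial^3_{ijk}\varphi)(\partial^2_{jk}\varphi)(\partial_i\varphi)e^{-\varphi}dx \\
&= \int (\partial^3_{iik}\varphi)(\partial^3_{jjk}\varphi)e^{-\varphi}dx - \int (\partial^3_{iik}\varphi)(\partial^2_{jk}\varphi)(\partial_j\varphi)e^{-\varphi}dx \\
&\quad + \int (\partial^3_{ijk}\varphi)(\partial^2_{jk}\varphi)(\partial_i\varphi)e^{-\varphi}dx \\
&\le \frac{1}{2}\int (\partial^3_{ijk}\varphi)^2 e^{-\varphi}dx + \int (\partial^3_{iik}\varphi)^2 e^{-\varphi}dx + \frac{1}{2}\int (\partial^3_{jjk}\varphi)^2 e^{-\varphi}dx \\
&\quad + \frac{1}{2}\int (\partial^2_{jk}\varphi)^4 e^{-\varphi}dx + \frac{1}{4}\int \left((\partial_i\varphi)^4 + (\partial_j\varphi)^4\right) e^{-\varphi}dx. \tag{13}
\end{align*}
Moreover, since $\nabla^2\varphi$ is positive-definite, we have $|\partial^2_{jk}\varphi| \le (\partial^2_{jj}\varphi + \partial^2_{kk}\varphi)/2$, and therefore
\[ \sum_k \int (\partial^2_{jk}\varphi)^4 e^{-\varphi}dx \le \frac{1}{8}\sum_k \int (\partial^2_{jj}\varphi + \partial^2_{kk}\varphi)^4 e^{-\varphi}dx \le Cd. \tag{14} \]
Summing (13) over $k$ implies the result, via the moment bounds for isotropic log-concave distributions and the 2nd order bounds on $\varphi$.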
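The pointwise inequality used before (14), namely that an off-diagonal entry of a positive semi-definite matrix is at most the average of the two corresponding diagonal entries, can be checked on random examples. The following is a minimal sketch with hypothetical helper names; it draws matrices of the form $BB^T$ rather than actual Hessians of moment maps.

```python
import random

def random_psd(d, rng):
    """Return A = B B^T for a random d x d matrix B, so A is positive semi-definite."""
    b = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    return [[sum(b[i][m] * b[j][m] for m in range(d)) for j in range(d)]
            for i in range(d)]

def offdiag_bound_holds(a, tol=1e-9):
    """Check |A_jk| <= (A_jj + A_kk) / 2 for every pair (j, k)."""
    d = len(a)
    return all(abs(a[j][k]) <= (a[j][j] + a[k][k]) / 2.0 + tol
               for j in range(d) for k in range(d))

rng = random.Random(0)
checks = [offdiag_bound_holds(random_psd(5, rng)) for _ in range(200)]
```

The bound follows from $|A_{jk}| \le \sqrt{A_{jj}A_{kk}}$ and the arithmetic-geometric mean inequality, so every entry of `checks` is expected to be true.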

We will also need the following result about radially symmetric functions.

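Proposition 4.5. Suppose that (12) holds and that $\mu = e^{-V(x)}dx$ is radially symmetric and has full support. Then, there exists an absolute constant $c > 0$, such that if $d > c$:
\[ \int |\partial^3_{ijk}\varphi|^4 e^{-\varphi}dx \le C\alpha^{-30}d^4, \]
for some absolute constant $C > 0$.

Proof. Note that $\varphi$ satisfies the Monge-Ampère equation
\[ e^{-\varphi} = e^{-V(\nabla\varphi)}\det\left(\nabla^2\varphi\right), \]
and that it can be verified that if $V$ is a radial function then so is $\varphi$. Let $i = 1,\dots,d$. By taking the logarithm and differentiating the above equation we get:
\[ \partial_i\varphi = \langle\nabla V(\nabla\varphi), \nabla\partial_i\varphi\rangle - \mathrm{Tr}\left(\nabla^2\partial_i\varphi\,(\nabla^2\varphi)^{-1}\right). \]
By Proposition 4.4, $\alpha I_d \le \nabla^2\varphi \le \frac{1}{\alpha}I_d$. Hence,
\[ \mathrm{Tr}\left((\nabla^2\varphi)^{-1}\nabla^2\partial_i\varphi\right) \le |\partial_i\varphi| + |\langle\nabla V(\nabla\varphi), \nabla\partial_i\varphi\rangle| \le |\partial_i\varphi| + \alpha^{-1}\sqrt{d}\,\|\nabla V(\nabla\varphi)\|, \tag{15} \]
where the second inequality used Cauchy-Schwarz along with $\|\nabla\partial_i\varphi\| \le \sqrt{d}\,\|\nabla^2\varphi\|_{op}$. The proof will now be conducted in three steps: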

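1. We will bound $\mathrm{Tr}\left((\nabla^2\varphi)^{-1}\nabla^2\partial_i\varphi\right)$ in terms of $\mathrm{Tr}\left(\nabla^2\partial_i\varphi\right) = \sum_{j=1}^d \partial^3_{jji}\varphi$.
2. Using (15), we'll show that $\int\left(\sum_{j=1}^d \partial^3_{jji}\varphi\right)^4 e^{-\varphi}dx$ cannot be large.
3. Finally, we will use the previous step to bound $\int\left(\partial^3_{kji}\varphi\right)^4 e^{-\varphi}dx$.

Step 1: We now wish to understand $\mathrm{Tr}\left((\nabla^2\varphi)^{-1}\nabla^2\partial_i\varphi\right)$. Write $\varphi(x) = f(\|x\|^2)$, so that,
\[ \nabla^2\varphi(x) = 2f'(\|x\|^2)I_d + 4f''(\|x\|^2)xx^T. \tag{16} \]
The bounds on $\nabla^2\varphi$ imply the following inequalities, which we shall freely use below:
\[ \alpha \le 2f'(\|x\|^2),\quad 2f'(\|x\|^2) + 4f''(\|x\|^2)\|x\|^2 \le \alpha^{-1}, \]
and
\[ 4|f''(\|x\|^2)|\,\|x\|^2 \le \alpha^{-1}. \]
By the Sherman-Morrison formula,
\[ \left(\nabla^2\varphi\right)^{-1}(x) = \frac{1}{2f'(\|x\|^2)}\left(I_d - \frac{4f''(\|x\|^2)}{2f'(\|x\|^2) + 4f''(\|x\|^2)\|x\|^2}\,xx^T\right). \]
So,
\[ \mathrm{Tr}\left((\nabla^2\varphi)^{-1}\nabla^2\partial_i\varphi\right) = \frac{1}{2f'(\|x\|^2)}\left(\sum_{j=1}^d \partial^3_{jji}\varphi - \frac{4f''(\|x\|^2)}{2f'(\|x\|^2) + 4f''(\|x\|^2)\|x\|^2}\,\partial^3_{xxe_i}\varphi\right). \tag{17} \]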
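The Sherman-Morrison step can be verified numerically for a matrix of the form $aI_d + b\,xx^T$, which is the shape of (16) with $a = 2f'(\|x\|^2)$ and $b = 4f''(\|x\|^2)$. The sketch below uses hypothetical numerical values for $a$ and $b$ (they are not derived from any particular $f$) and plain Python lists in place of a linear algebra library.

```python
import random

def hessian_radial(x, a, b):
    """Matrix a*I + b*x x^T, the shape of (16)."""
    d = len(x)
    return [[(a if i == j else 0.0) + b * x[i] * x[j] for j in range(d)]
            for i in range(d)]

def inverse_sherman_morrison(x, a, b):
    """Closed-form inverse (1/a) * (I - b / (a + b|x|^2) * x x^T)."""
    d = len(x)
    r2 = sum(t * t for t in x)
    c = b / (a + b * r2)
    return [[((1.0 if i == j else 0.0) - c * x[i] * x[j]) / a for j in range(d)]
            for i in range(d)]

def matmul(p, q):
    d = len(p)
    return [[sum(p[i][m] * q[m][j] for m in range(d)) for j in range(d)]
            for i in range(d)]

rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(4)]
# Hypothetical values: a > 0 and a + b|x|^2 >= 1.3 - 0.8 > 0, so the matrix is invertible.
a, b = 1.3, -0.2
product = matmul(hessian_radial(x, a, b), inverse_sherman_morrison(x, a, b))
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(4) for j in range(4))
```

Here `max_error` measures how far the product is from the identity; it should be at the level of floating-point rounding.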

A calculation shows

\[ \partial^3_{jji}\varphi(x) = 4x_i\left(2x_j^2 f'''(\|x\|^2) + f''(\|x\|^2) + 2\delta_{ij}f''(\|x\|^2)\right), \]
and
\[ \sum_{j=1}^d \partial^3_{jji}\varphi(x) = 4x_i\left(2\|x\|^2 f'''(\|x\|^2) + (d+2)f''(\|x\|^2)\right). \]
Also,
\[ \partial^3_{xxe_i}\varphi = 4x_i\left(2\|x\|^4 f'''(\|x\|^2) + 3\|x\|^2 f''(\|x\|^2)\right). \]
Thus, if
\[ D(x) := 1 - \frac{4f''(\|x\|^2)\|x\|^2}{2f'(\|x\|^2) + 4f''(\|x\|^2)\|x\|^2} = \frac{2f'(\|x\|^2)}{2f'(\|x\|^2) + 4f''(\|x\|^2)\|x\|^2}, \]
then
\begin{align*}
\sum_{j=1}^d \partial^3_{jji}\varphi &- \frac{4f''(\|x\|^2)}{2f'(\|x\|^2) + 4f''(\|x\|^2)\|x\|^2}\,\partial^3_{xxe_i}\varphi \\
&= \left(1 - \frac{4f''(\|x\|^2)\|x\|^2}{2f'(\|x\|^2) + 4f''(\|x\|^2)\|x\|^2}\right) 8x_i\|x\|^2 f'''(\|x\|^2) \\
&\quad + \left(d + 2 - 3\,\frac{4f''(\|x\|^2)\|x\|^2}{2f'(\|x\|^2) + 4f''(\|x\|^2)\|x\|^2}\right) 4x_i f''(\|x\|^2) \\
&= D(x)\,8x_i\|x\|^2 f'''(\|x\|^2) + (d - 1 + 3D(x))\,4x_i f''(\|x\|^2) \\
&= D(x)\sum_{j=1}^d \partial^3_{jji}\varphi + \left(d - 1 + 3D(x) - D(x)(d+2)\right) 4x_i f''(\|x\|^2) \\
&\ge D(x)\sum_{j=1}^d \partial^3_{jji}\varphi - Cd\alpha^{-3}\frac{|x_i|}{\|x\|^2} \\
&= D(x)\,\mathrm{Tr}\left(\nabla^2\partial_i\varphi\right) - Cd\alpha^{-3}\frac{|x_i|}{\|x\|^2}.
\end{align*}
In the inequality, we have used (16) along with the bounds $4|f''(\|x\|^2)| \le \frac{\alpha^{-1}}{\|x\|^2}$, and $\alpha^2 \le D(x) \le \alpha^{-2}$. (17) then implies:
\[ \mathrm{Tr}\left((\nabla^2\varphi)^{-1}\nabla^2\partial_i\varphi\right) \ge \frac{D(x)}{2f'(\|x\|^2)}\,\mathrm{Tr}\left(\nabla^2\partial_i\varphi\right) - \frac{Cd\alpha^{-3}}{2f'(\|x\|^2)}\,\frac{1}{\|x\|}. \]

Step 2: We now integrate with respect to the moment measure, so the estimate from the previous step, along with the bounds $\alpha^2 \le D(x)$, $\alpha \le 2f'(\|x\|^2) \le \alpha^{-1}$, and (15) give:
\[ \int \mathrm{Tr}\left(\nabla^2\partial_i\varphi\right)^4 e^{-\varphi}dx \le C\alpha^{-12}\left(\int |\partial_i\varphi|^4 e^{-\varphi}dx + \alpha^{-4}d^2\int \|\nabla V(\nabla\varphi)\|^4 e^{-\varphi}dx + d^4\alpha^{-16}\int \frac{1}{\|x\|^4}e^{-\varphi}dx\right). \tag{18} \]

Let us look at each term on the right hand side. By a change of variable

\[ \int |\partial_i\varphi|^4 e^{-\varphi}dx = \int |x_i|^4 d\mu \le C, \]
since higher moments of coordinates of isotropic log-concave measures are controlled. For the second term, since $\nabla\varphi$ is a transport map, we get that
\[ \int \|\nabla V(\nabla\varphi)\|^4 e^{-\varphi}dx = \int \|\nabla V\|^4 d\mu. \]
Recalling that $\alpha I_d \le \nabla^2 V \le \frac{1}{\alpha}I_d$, we apply the Poincaré inequality for $\mu$, and since $|\nabla|\nabla V|^2| = 2|\nabla^2 V\nabla V| \le 2\alpha^{-1}|\nabla V|$,
\[ \int \|\nabla V(\nabla\varphi)\|^4 e^{-\varphi}dx \le \left(\int |\nabla V|^2 d\mu\right)^2 + 4\alpha^{-3}\int |\nabla V|^2 d\mu. \]
By integration by parts, $\int |\nabla V|^2 d\mu = \int \Delta V d\mu \le d\alpha^{-1}$, and hence
\[ \int \|\nabla V(\nabla\varphi)\|^4 e^{-\varphi}dx \le 5d^2\alpha^{-4}. \]
For the third integral, we may use the fact that when $d \ge c$, a reverse Hölder inequality holds for negative moments, and may be applied to radially symmetric log-concave measures (see [22, Theorem 1.4]). According to the inequality,
\[ \int \frac{1}{\|x\|^4}e^{-\varphi}dx \le C\left(\int \|x\|^2 e^{-\varphi}dx\right)^{-2} \le C\alpha^{-2}. \]
We now plug the previous three displays into (18) to conclude,
\[ \int \left(\sum_{j=1}^d \partial^3_{jji}\varphi\right)^4 e^{-\varphi}dx = \int \mathrm{Tr}\left(\nabla^2\partial_i\varphi\right)^4 e^{-\varphi}dx \le C\alpha^{-30}d^4. \tag{19} \]

Step 3: Now, let i, j, k = 1,...,d be distinct. We have,

\begin{align*}
\int \left(\partial^3_{ijk}\varphi\right)^4 e^{-\varphi}dx
&= 8^4\int \left(x_i x_j x_k f'''(\|x\|^2)\right)^4 e^{-\varphi}dx \\
&\le \frac{8^4}{2^4}\left(\int \left(x_i x_j^2 f'''(\|x\|^2)\right)^4 e^{-\varphi}dx + \int \left(x_i x_k^2 f'''(\|x\|^2)\right)^4 e^{-\varphi}dx\right) \\
&\le \frac{8^4}{2^3}\int \left(x_i\|x\|^2 f'''(\|x\|^2)\right)^4 e^{-\varphi}dx \\
&\le 8^4\int \left(x_i\left(\|x\|^2 f'''(\|x\|^2) + (d+2)f''(\|x\|^2)\right)\right)^4 e^{-\varphi}dx + 8^5\int \left(x_i(d+2)f''(\|x\|^2)\right)^4 e^{-\varphi}dx \\
&\le C\left(\int \left(\sum_{j=1}^d \partial^3_{jji}\varphi(x)\right)^4 e^{-\varphi}dx + \int \left((d+2)x_i f''(\|x\|^2)\right)^4 e^{-\varphi}dx\right) \\
&\le C\left(\alpha^{-30}d^4 + (d+2)^4\int \left(x_i f''(\|x\|^2)\right)^4 e^{-\varphi}dx\right),
\end{align*}
where we have used (19) in the last inequality. For the remaining integral term, denote $x_{\sim i} = (x_1,\dots,x_{i-1},x_{i+1},\dots,x_d)$. Then
\begin{align*}
\int \left(x_i f''(\|x\|^2)\right)^4 e^{-\varphi}dx
&= \int \frac{1}{\|x_{\sim i}\|^4}\left(x_i\|x_{\sim i}\|\cdot f''(\|x\|^2)\right)^4 e^{-\varphi}dx \\
&\le \int \frac{1}{\|x_{\sim i}\|^4}\sum_{j\ne i} x_i^4 x_j^4 f''(\|x\|^2)^4 e^{-\varphi}dx \\
&= \int \frac{1}{\|x_{\sim i}\|^4}\sum_{j\ne i}\left(\partial^2_{ij}\varphi\right)^4 e^{-\varphi}dx \\
&\le d\alpha^{-4}\int \frac{1}{\|x_{\sim i}\|^4}e^{-\varphi}dx \le Cd\alpha^{-6}.
\end{align*}
In the last inequality we again used the reverse Hölder inequality for negative moments of radially symmetric log-concave measures. Plugging this estimate into the previous display finishes the proof, when $|\{i,j,k\}| = 3$. The other cases can be proven similarly.

We are now in a position to prove Lemma 4.2.

Proof of Lemma 4.2. Since $\nabla\varphi^{-1}$ transports $\mu$ to $e^{-\varphi}dx$, from Proposition 4.3 we conclude that $\tau_\mu \in W^{1,2}(\mu)$ and that $\|D\tau_\mu\|_{L^2(\mu)} \le Cd^{3/2}\alpha^{-1/2}$, where $D$ stands for the total derivative operator. Thus, by Proposition 2.3, there exists a function $\tilde g$ for which,
\[ \|\tau_\mu(x) - \tau_\mu(y)\| \le (\tilde g(x) + \tilde g(y))\,\|x - y\|, \]
and $\|\tilde g\|_{L^2(\mu)} \le C\frac{d^{3/2}}{\alpha^{1/2}}$. Proposition 4.4 along with (12) shows
\[ \sqrt{\alpha}\,I_d \le \sqrt{\tau_\mu}. \]
Hence,
\[ \left\|\sqrt{\tau_\mu(x)} - \sqrt{\tau_\mu(y)}\right\| \le \frac{1}{\sqrt{\alpha}}\,\|\tau_\mu(x) - \tau_\mu(y)\|. \]
Take now $g := \frac{1}{\sqrt{\alpha}}\tilde g$ to conclude the proof. If $\mu$ is radially symmetric, then Proposition 4.5 shows $\|D\tau_\mu\|_{L^4(\mu)} \le Cd^{7/4}\alpha^{-15/2}$ and the proof continues in a similar way.

4.2 Exponential convergence to equilibrium

We now show that the process (11) satisfies the exponential convergence to equilibrium property we require, as long as (12) is satisfied.

Lemma 4.6. Assume that $\mu = e^{-V}dx$ with $\alpha^{-1}I_d \ge \nabla^2 V \ge \alpha I_d$. Then the diffusion process (11) satisfies Assumption H3 with $\kappa = 1/2$ and $C_H = \alpha^{-2}$.

Proof. As demonstrated in [27], the diffusion process $\nabla\varphi^*(X_t)$, where $(X_t)$ solves (11), satisfies the Bakry-Émery curvature dimension condition $CD(1/2, \infty)$ when viewed as the canonical diffusion process on the weighted manifold $(\mathbb{R}^d, (\nabla^2\varphi)^{-1}, e^{-\varphi})$. Therefore it is a contraction in Wasserstein distance, with respect to the Riemannian metric $d_\varphi$ with tensor $(\nabla^2\varphi)^{-1}$ [38]. That is
\[ W_{2,d_\varphi}(\mu_t\circ\nabla\varphi, e^{-\varphi}) \le e^{-t/2}\,W_{2,d_\varphi}(\mu_0\circ\nabla\varphi, e^{-\varphi}). \]
From the bounds on $\nabla^2\varphi$ given by Proposition 4.4, we have
\[ \alpha^{-1}|x - y|^2 \ge d_\varphi^2(x,y) \ge \alpha|x - y|^2, \]
and the result follows, using again the two-sided Lipschitz bounds on $\nabla\varphi$.

4.3 Stability for Stein kernels

Proof of Theorem 2.5. We consider the two processes

\[ dX_t = -X_t\,dt + \sqrt{2\tau(X_t)}\,dB_t, \qquad dY_t = -Y_t\,dt + \sqrt{2\sigma(Y_t)}\,dB_t. \]

By Lemma 4.1, $\mu$ and $\nu$ are the respective invariant measures of $X_t$ and $Y_t$. By Lemma 4.6, Assumption H3 is satisfied with $\kappa = \frac{1}{2}$, $C_H = \alpha^{-2}$. By Lemma 4.2, Assumption H1 is satisfied with $\|g\|^2_{L^2(\mu)} \le Cd^3\alpha^{-2}$.

Set $p = \infty$, $q = 1$ and $R > 0$. Plugging the above estimates into Theorem 2.1, we get
\[ \widetilde{W}^2_{2,R}(\nu,\mu) \le C\alpha^{-6}d^3 R\,\|d\nu/d\mu\|_\infty\,\frac{\ln\left(\ln\left(1 + \frac{R}{\beta}\right)\right) + \ln(M) + R}{\ln\left(1 + \frac{R}{\beta}\right)}. \]
$\nu$ and $\mu$ are log-concave and in particular have sub-exponential tails. We apply Lemma A.1, from the appendix, to obtain a constant $C' > 0$ such that
\[ W_2(\nu,\mu) \le 2\widetilde{W}_{2,C'M\ln(M)}(\nu,\mu), \]
which proves the first part of the theorem. For the second part, if $\mu$ is radially symmetric, then we take $p = q = 2$, and by Lemma 4.2, H1 is now satisfied with $\|g\|^2_{L^4(\mu)} \le Cd^{7/2}\alpha^{-16}$ and the proof continues in the same way. The last part of the theorem is an immediate consequence of Theorem 2.2.

Acknowledgments: We thank Ronen Eldan, Bo’az Klartag, Claude Le Bris, Michel Ledoux and Guillaume Mijoule for useful discussions and for their enlightening comments. Part of this work was done while M.F. was a guest of the teams Matherials and Mokaplan at the INRIA Paris, whose hospitality is gratefully acknowledged. M.F. was supported by the Projects MESA (ANR-18-CE40-006) and EFI (ANR-17-CE40-0030) of the French National Research Agency (ANR).

References

[1] L. Ambrosio, E. Brué and D. Trevisan, Lusin-type approximation of Sobolev by Lipschitz functions, in Gaussian and RCD(K, ∞) spaces. Adv. Math., 339 (2018), 426–452.

[2] D. Bakry, Étude des transformations de Riesz dans les variétés riemanniennes à courbure de Ricci minorée. Séminaire de Probabilités, XXI, Lecture Notes in Math. 1247, 137–172, Springer, Berlin.

[3] D. Bakry, M. Ledoux and L. Saloff-Coste. Markov semigroups at Saint-Flour. Probability at Saint-Flour, Springer, Heidelberg. 2012.

[4] R. J. Berman and B. Berndtsson, Real Monge-Ampère equations and Kähler-Ricci solitons on toric log Fano varieties. Annales de la faculté des sciences de Toulouse Sér. 6, 22 no. 4, (2013), p. 649–711.

[5] C. Bianca and C. Dogbe, On the existence and uniqueness of invariant measure for multidimensional diffusion processes, Nonlinear Stud., 24, 437–468, 2017.

[6] V. I. Bogachev, M. Röckner and S. V. Shaposhnikov, Distances between transition probabilities of diffusions and applications to nonlinear Fokker–Planck–Kolmogorov equations. J. Funct. Anal. 271 (2016), no. 5, 1262–1300.

[7] V. I. Bogachev, M. Röckner and S. V. Shaposhnikov, The Poisson equation and estimates for distances between stationary distributions of diffusions. J. Math. Sci. (N.Y.) 232 (2018), no. 3, Problems in mathematical analysis. No. 92 (Russian), 254–282.

[8] N. Champagnat and P.-E. Jabin, Strong solutions to stochastic differential equations with rough coefficients. Ann. Probab. 46 (2018), no. 3, 1498–1541.

[9] S. Chatterjee, Fluctuations of eigenvalues and second order Poincaré inequalities. Probab. Theory Related Fields, 143, 1-40, 2009.

[10] S. Chatterjee, A short survey of Stein’s method. Proceedings of ICM 2014, Vol IV, 1-24, 2014.

[11] T. Courtade, M. Fathi and A. Pananjady, Existence of Stein kernels via spectral gap, and discrepancy bounds. Ann. IHP: Probab. Stat. 55, 2, 2019.

[12] D. Cordero-Erausquin and B. Klartag, Moment measures, J. Funct. Anal. 268 (2015) 3834–3866.

[13] G. Crippa and C. De Lellis, Estimates and regularity results for the DiPerna-Lions flow. J. Reine Angew. Math. 616 (2008), 15–46.

[14] R.J. DiPerna and P. L. Lions, Ordinary differential equations, transport theory and Sobolev spaces, Invent. Math., 3, 511–547, 1989.

[15] S. K. Donaldson, Kähler geometry on toric manifolds, and some other manifolds with large symmetry. Handbook of geometric analysis. Adv. Lect. Math. (ALM), Vol. 7, No. 1, Int. Press, Somerville, MA, 29–75, 2008.

[16] S. Fang, D. Luo and A. Thalmaier, Stochastic differential equations with coefficients in Sobolev spaces. J. Funct. Anal. 259 (2010), no. 5, 1129–1168.

[17] X. Fang, Q.-M. Shao and L. Xu, Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Probab. Theory Related Fields 174 (2019), no. 3-4, 945–979.

[18] M. Fathi, Stein kernels and moment maps. Ann. Probab., Vol. 47, No. 4, 2172-2185, 2019.

[19] A. Figalli, Existence and uniqueness of martingale solutions for SDEs with rough or degenerate coefficients. J. Funct. Anal. 254 (2008), no. 1, 109–153.

[20] J. Gorham, A. B. Duncan, S. J. Vollmer and L. Mackey, Measuring sample quality with diffusions. Ann. Appl. Probab. 29 (2019), no. 5, 2884–2928.

[21] K. Huynh and F. Santambrogio, q-Moment Measures and Applications: a new Approach via Optimal Transport. Arxiv preprint, 2020.

[22] G. Paouris, Small ball probability estimates for log-concave measures. Trans. Amer. Math. Soc. 364 (2012), no 1, 287–308.

[23] B. Klartag, Logarithmically-concave moment measures I. Geometric Aspects of Functional Analysis, Lecture Notes in Math. 2116, Springer (2014), 231–260.

[24] B. Klartag and A. V. Kolesnikov, Eigenvalue distribution of optimal transportation. Analysis & PDE., Vol. 8, No. 1, (2015), 33–55.

[25] B. Klartag and A. V. Kolesnikov, Remarks on curvature in the transportation metric. Analysis Math., Vol. 43, No. 1, (2017), 67–88.

[26] A. V. Kolesnikov, On Sobolev Regularity of Mass Transport and Transportation Inequalities, Theory of Probability and Its Applications. 2013. Vol. 57. No. 2. P. 243-264.

[27] A. V. Kolesnikov, Hessian metrics, CD(K,N)-spaces, and optimal transportation of log-concave measures, Discrete and Continuous Dynamical Systems - Series A. 2014. Vol. 34. No. 4. P. 1511-1532.

[28] A. V. Kolesnikov and E. D. Kosov, Moment measures and stability for Gaussian inequalities. Theory Stoch. Process. 22 (2017), no. 2, 47–61.

[29] C. Le Bris and P.-L. Lions, Existence and uniqueness of solutions to Fokker-Planck type equations with irregular coefficients. Comm. Partial Differential Equations 33 (2008), no. 7-9, 1272–1317.

[30] M. Ledoux, I. Nourdin and G. Peccati, Stein’s method, logarithmic Sobolev and transport inequalities Geom. Funct. Anal. 25, 256–306 (2015).

[31] E. Legendre, Toric Kähler-Einstein metrics and convex compact polytopes, Journal of Geometric Analysis, 26(1), 399–427, 2016.

[32] H. Li and D. Luo, Quantitative stability estimates for Fokker-Planck equations. J. Math. Pures Appl. (9) 122 (2019), 125–163.

[33] F. C. Liu, A Luzin type property of Sobolev functions. Indiana Univ. Math. J. 26 (1977), no. 4, 645–651.

[34] G. Mijoule, G. Reinert and Y. Swan, Stein operators, kernels and discrepancies for multivariate continuous distributions. Arxiv preprint, 2018.

[35] P. Monmarché, Generalized Γ calculus and application to interacting particles on a graph, 2015, Potential Analysis 50, no. 3, 439-466, 2019

[36] I. Nourdin and G. Peccati, Normal approximations with Malliavin calculus: from Stein’s method to universality. Cambridge Tracts in Mathematics. Cambridge University Press, 2012.

[37] I. Nourdin, G. Peccati and A. Réveillac. Multivariate normal approximation using Stein’s method and Malliavin calculus. Ann. I.H.P. Proba. Stat., 46(1):45–58, 2010

[38] M.-K. von Renesse and K.-T. Sturm, Transport inequalities, gradient estimates, entropy, and Ricci curvature. Comm. Pure Appl. Math. 58 (2005), no. 7, 923–940.

[39] N. Ross, Fundamentals of Stein’s method. Probability Surveys Vol. 8 (2011) 210–293.

[40] F. Santambrogio, Dealing with moment measures via entropy and optimal transport. J. Funct. Anal. 271 (2016), no. 2, 418–436.

[41] C. Seis, A quantitative theory for the continuity equation. Ann. I.H.P. (C) Analyse Non Lineaire 34, No. 7: 1837–1850, 2017.

[42] C. Seis, Optimal stability estimates for continuity equations. Proc. Roy. Soc. Edinburgh Sect. A 148 (2018), no. 6, 1279–1296.

[43] C. Stein, A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability theory, pages 583–602, Berkeley, Calif., 1972. Univ. California Press.

[44] C. Stein. Approximate computation of expectations. Institute of Mathematical Statistics Lecture Notes - Monograph Series, 7. Institute of Mathematical Statistics, Hayward, CA, 1986.

[45] E. M. Stein and J.-O. Strömberg, Behavior of maximal functions in $\mathbb{R}^n$ for large n. Ark. Mat. 21 (2): 259–269, 1983.

[46] D. Trevisan, Well-posedness of multidimensional diffusion processes with weakly differentiable coefficients. Electron. J. Probab. 21 (2016), Paper No. 22, 41 pp.

[47] C. Villani, Topics in optimal transportation. Vol. 58 of Graduate Studies in Mathematics, Amer. Math. Soc., Providence, RI, 2003.

[48] X.-J. Wang and X. H. Zhu, Kähler-Ricci solitons on toric manifolds with positive first Chern class, Adv. Math., 188, 47–103, 2004.

[49] L. Xie and X. Zhang, Sobolev differentiable flows of SDEs with local Sobolev and super-linear growth coefficients. Ann. Probab. 44 (2016), no. 6, 3661–3687.

A Transport inequalities for the truncated Wasserstein distance

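Proof of Lemma 3.1. We let $\pi$ denote the optimal coupling for $\mathcal{D}_\delta$ and $(X,Y) \sim \pi$. Define the sets
\[ D = \left\{(x,y)\in\mathbb{R}^{2d} : \|x - y\|^2 \le R\right\}, \qquad \widetilde{D} = \left\{(x,y)\in D : \ln\left(1 + \frac{\|x - y\|^2}{\delta^2}\right) \le \frac{\mathcal{D}_\delta(\mu,\nu)}{\epsilon}\right\}. \]
We now write
\[ \mathbb{E}\left[\min\left(\|X - Y\|^2, R\right)\right] = \mathbb{E}\left[\|X - Y\|^2\mathbb{1}_{\widetilde{D}}\right] + \mathbb{E}\left[\|X - Y\|^2\mathbb{1}_{D\setminus\widetilde{D}}\right] + R\cdot\mathbb{E}\left[\mathbb{1}_{\mathbb{R}^{2d}\setminus D}\right], \]
and bound each term separately. Observe that for any $\alpha \in \mathbb{R}$,
\[ \ln\left(1 + \frac{x^2}{\delta^2}\right) \le \alpha \iff |x| \le \delta\sqrt{e^{\alpha} - 1}. \]
Thus,
\[ \mathbb{E}\left[\|X - Y\|^2\mathbb{1}_{\widetilde{D}}\right] \le \delta^2\exp\left(\frac{\mathcal{D}_\delta(\mu,\nu)}{\epsilon}\right). \]
Next, by Markov's inequality
\[ \mathbb{P}\left(\ln\left(1 + \frac{\|X - Y\|^2}{\delta^2}\right) \ge \frac{\mathcal{D}_\delta(\mu,\nu)}{\epsilon}\right) \le \frac{\epsilon}{\mathcal{D}_\delta(\mu,\nu)}\,\mathbb{E}\left[\ln\left(1 + \frac{\|X - Y\|^2}{\delta^2}\right)\right] = \epsilon. \]
So,
\[ \mathbb{E}\left[\|X - Y\|^2\mathbb{1}_{D\setminus\widetilde{D}}\right] \le R\,\mathbb{E}\left[\mathbb{1}_{\mathbb{R}^{2d}\setminus\widetilde{D}}\right] \le R\epsilon. \]
Finally, a second application of Markov's inequality gives
\[ R\,\mathbb{E}\left[\mathbb{1}_{\mathbb{R}^{2d}\setminus D}\right] \le R\cdot\mathbb{P}\left(\|X - Y\|^2 \ge R\right) = R\cdot\mathbb{P}\left(\ln\left(1 + \frac{\|X - Y\|^2}{\delta^2}\right) \ge \ln\left(1 + \frac{R}{\delta^2}\right)\right) \le R\,\frac{\mathcal{D}_\delta(\mu,\nu)}{\ln\left(1 + \frac{R}{\delta^2}\right)}. \]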
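The three estimates of this proof can be combined and checked on simulated data. The sketch below is an illustration under explicit assumptions: it works in dimension one, it uses an arbitrary empirical coupling instead of the optimal one, and the quantity `D` is the empirical mean of $\ln(1 + |X - Y|^2/\delta^2)$ for that coupling, standing in for $\mathcal{D}_\delta(\mu,\nu)$. The function name and the parameter values are hypothetical.

```python
import math
import random

def truncated_cost_and_bound(pairs, delta, eps, R):
    """Return (E[min(|X-Y|^2, R)], three-term bound) for an empirical coupling.

    The bound is delta^2 * exp(D/eps) + R*eps + R*D / ln(1 + R/delta^2),
    with D computed for the given coupling (an assumption of this sketch).
    """
    sq = [(x - y) ** 2 for x, y in pairs]
    n = len(sq)
    lhs = sum(min(s, R) for s in sq) / n
    D = sum(math.log(1.0 + s / delta**2) for s in sq) / n
    bound = (delta**2 * math.exp(D / eps)
             + R * eps
             + R * D / math.log(1.0 + R / delta**2))
    return lhs, bound

rng = random.Random(2)
# Y is a small perturbation of X, so the two marginals are close.
pairs = []
for _ in range(5000):
    x = rng.gauss(0.0, 1.0)
    pairs.append((x, x + 0.1 * rng.gauss(0.0, 1.0)))

results = [truncated_cost_and_bound(pairs, delta, eps, R)
           for delta in (0.05, 0.1, 0.5)
           for eps in (0.1, 0.5)
           for R in (0.5, 1.0, 4.0)]
```

Since each step of the proof (the pointwise bound on $\widetilde{D}$ and the two uses of Markov's inequality) also holds for an empirical measure, the first entry of every pair in `results` should not exceed the second.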

Lemma A.1. Let $X \sim \mu$, $Y \sim \nu$ be two centered random vectors in $\mathbb{R}^d$. Assume that both $X$ and $Y$ are sub-exponential with parameter $M$, in the sense that for every $k \ge 2$,
\[ \frac{\mathbb{E}\left[\|X\|^k\right]^{1/k}}{k},\ \frac{\mathbb{E}\left[\|Y\|^k\right]^{1/k}}{k} \le M. \]
Then
\[ W_2^2(\mu,\nu) \le 2\widetilde{W}^2_{2,CM\ln(M)}(\mu,\nu), \]
for a universal constant $C > 0$.

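Proof. Fix $R > 0$ and let $\pi$ denote the optimal coupling for $\widetilde{W}_{2,R}$ and $(X,Y) \sim \pi$. Then,
\begin{align*}
\mathbb{E}\left[\|X - Y\|^2\right] &= \mathbb{E}\left[\|X - Y\|^2\mathbb{1}_{\|X - Y\|^2 \le R}\right] + \mathbb{E}\left[\|X - Y\|^2\mathbb{1}_{\|X - Y\|^2 > R}\right] \\
&\le \widetilde{W}^2_{2,R}(\mu,\nu) + \sqrt{\mathbb{E}\left[\|X - Y\|^4\right]\mathbb{P}\left(\|X - Y\| > R\right)}.
\end{align*}
$X - Y$ also has a sub-exponential law with parameter $C'M$, where $C' > 0$ is a constant. Thus, for some other constant $C$,
\[ \sqrt{\mathbb{E}\left[\|X - Y\|^4\right]} \le CM\,\mathbb{E}\left[\|X - Y\|^2\right], \]
and
\[ \sqrt{\mathbb{P}\left(\|X - Y\| > R\right)} \le e^{-\frac{R}{CM}}. \]
Take $R = CM\ln(2CM)$, to get
\[ \mathbb{E}\left[\|X - Y\|^2\right] \le \widetilde{W}^2_{2,R}(\mu,\nu) + \frac{1}{2}\mathbb{E}\left[\|X - Y\|^2\right], \]
which implies,
\[ W_2^2(\mu,\nu) \le \mathbb{E}\left[\|X - Y\|^2\right] \le 2\widetilde{W}^2_{2,R}(\mu,\nu). \]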
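The choice of truncation level in the last step is a short piece of arithmetic: with $R = CM\ln(2CM)$, the product of the two bounds $CM\,\mathbb{E}\|X - Y\|^2$ and $e^{-R/(CM)}$ equals $\frac{1}{2}\mathbb{E}\|X - Y\|^2$. The sketch below checks this for a few hypothetical values of $C$ and $M$ (the proof does not specify them); the function names are illustrative only.

```python
import math

def truncation_radius(C, M):
    """R = C*M*ln(2*C*M), the truncation level chosen at the end of the proof."""
    return C * M * math.log(2.0 * C * M)

def tail_factor(C, M):
    """C*M * exp(-R/(C*M)): the coefficient multiplying E|X-Y|^2 in the tail term."""
    return C * M * math.exp(-truncation_radius(C, M) / (C * M))

# Hypothetical constants, chosen only to exercise the formula.
factors = [tail_factor(C, M) for C in (1.0, 3.0) for M in (1.0, 2.5, 10.0)]
```

Every entry of `factors` should equal one half, independently of $C$ and $M$, which is why the tail term can be absorbed into the left-hand side.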