IEEE TRANSACTIONS ON , VOL. 64, NO. 8, AUGUST 2018 5691 Quantitative Stability of the Entropy Power Inequality

Thomas A. Courtade , Member, IEEE,MaxFathi,andAshwinPananjady

Abstract—We establish quantitative stability results for the these inequalities is reviewed in Section III-B. Due to the entropy power inequality (EPI). Specifically, we show that if formal equivalence of the Shannon-Stam and Lieb inequalities, uniformly log-concave densities nearly saturate the EPI, then they we shall generally refer to both as the EPI. must be close to Gaussian densities in the quadratic Kantorovich- Wasserstein distance. Furthermore, if one of the densities is It is well known that equality is achieved in the Shannon- Gaussian and the other is log-concave, or more generally has Stam EPI if and only if X and Y are Gaussian vectors with positive spectral gap, then the deficit in the EPI can be controlled proportional covariances. Equivalently, δEPI,t (µ, ν) vanishes if in terms of the L1-Kantorovich-Wasserstein distance or relative and only if µ, ν are Gaussian measures that are identical up entropy, respectively. As a counterpoint, an example shows to translation.1 However, despite the fundamental role the EPI that the EPI can be unstable with respect to the quadratic Kantorovich-Wasserstein distance when densities are uniformly plays in information theory and crisp characterization of equal- log-concave on sets of measure arbitrarily close to one. Our ity cases, few stability estimates are known. Specifically, our stability results can be extended to non-log-concave densities, motivating question is the following quantitative reinforcement provided certain regularity conditions are met. The proofs are of equality conditions for the EPI: based on mass transportation. If δEPI,t (µ, ν) is small, must µ and ν be ‘close’ to Gaussian Index Terms—Entropy power inequality, optimal transport, measures, which are themselves ‘close’ to each other, in a stability. precise and quantitative sense? I. INTRODUCTION Toward answering this question, our main result is a dimension-free, quantitative stability estimate for the EPI. ET X and Y be independent random vectors on n with R More specifically, we show that if the measures µ, ν have uni- corresponding laws µ and ν,eachabsolutelycontinuous L formly log-concave densities and nearly saturate either form of with respect to Lebesgue measure. The celebrated entropy the EPI, then they must also be close to Gaussian measures in power inequality (EPI) proposed by Shannon [2] and proved the quadratic Kantorovich-Wasserstein distance. We also show by Stam [3] asserts that that the EPI is not stable (with respect to the same criterion) in N(X Y ) N(X) N(Y ), (1) situations where the densities nearly satisfy the same regularity + ≥ + conditions. Other quantitative deficit estimates are obtained where N(X) N(µ) 1 e2 h(X)/n denotes the entropy 2πe when one of the two variables is Gaussian and the other is power of X,and≡ h(X) :=h(µ) f log f is the entropy Rn log-concave or has positive spectral gap. Dimension-dependent of X having density f .Foraparameter≡ =− t (0, 1),letusdefine ! ∈ estimates are obtained in certain more general situations. Before stating the main results, let us first introduce some δEPI,t (µ, ν) h(√tX √1 tY) th(X) (1 t)h(Y ) . := + − − + − notation. We let $ $(Rn) denote the set of centered " (2)# ≡ n Gaussian probability measures on R ,andletγ denote the Unaware of the works by Shannon, Stam and Blachman [4], standard Gaussian measure on Rn.Thatis,2 Lieb [5] rediscovered the EPI by establishing δ (µ, ν) 0 EPI,t n x 2/2 dx ≥ dγ(x) dγ (x) e−| | . and noting its formal equivalence to (1). 
The equivalence of = = (2π)n/2 Manuscript received June 12, 2017; revised January 3, 2018; accepted Next, we recall that the L2-Kantorovich-Wasserstein distance February 1, 2018. Date of publicationFebruary20,2018;dateofcurrent version July 12, 2018. This paper was presented in part at the 2017 between probability measures µ, ν is defined according to International Symposium on Information Theory [1]. T. A. Courtade and 1/2 A. Pananjady were supported by NSF (Center for Science of Information) 2 W2(µ, ν) inf E X Y , under Grant CCF-1528132 and Grant CCF-0939370. M. Fathi was supported = | − | by NSF FRG under Grant DMS-1361122. T. A. Courtade and M. Fathi " # n acknowledge support from the France–Berkeley Fund. where denotes the Euclidean metric on R and the infimum T. A. Courtade and A. Pananjady are with the Department of Electrical is over|·| all couplings on X, Y with marginal laws X µ Engineering and Computer Sciences, University of California, Berkeley, CA and Y ν.IfX µ is a centered random vector, then∼ we 94720 USA (e-mail: [email protected]). ∼ ∼ M. Fathi is with CNRS, 75016 Paris, France, and also with the Institut de Mathématiques de Toulouse, Université Paul Sabatier, 31062 Toulouse, 1Lieb did not settle the cases of equality; this was done later by France. Carlen and Soffer [6]. Communicated by I. Sason, Associate Editor at Large. 2Explicit dependence of quantities on the ambient dimension n will be Digital Object Identifier 10.1109/TIT.2018.2808161 suppressed in situations where our arguments are the same in all dimensions. 0018-9448 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. 5692 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 64, NO. 8, AUGUST 2018

write &µ EXX⊤ to denote the covariance matrix of X.For additive on product measures, so the estimate (3) is dimension- = 1/2 asymmetricpositivesemidefinitematrix&,wewrite& free, which is compatible with the additivity of δEPI,t on to denote the unique symmetric positive semidefinite matrix product measures. satisfying & &1/2&1/2. Theorem 3 may be readily adapted to the setting of uni- Remark 1:= Both forms of the EPI are invariant to transla- formly log-concave densities. Toward this end, let η>0and tion of the measures µ, ν.Thus,ourpersistentassumptionof recall that h(η1/2 X) h(X) 1 log η,sothatδ (µ, ν) = + 2 EPI,t centered probability measures is for convenience and comes is invariant under the rescaling (X, Y ) (η1/2 X,η1/2Y ). without loss of generality. Similarly, if X µ has density f that→ is uniformly log- Remark 2: Without explicit mention, we assume throughout concave in the sense∼ that that all entropies exist in the usual Lebesgue sense. When 2 dealing with the Shannon-Stam or Lieb inequalities, one gen- log f ηI, (4) −∇ ≥ erally needs to make this assumption. Indeed, it is possible for h(X Y ) to not exist in the usual Lebesgue sense even though then a change of variables reveals that the density fη asso- + ciated with the rescaled η1/2 X satisfies h(X) and h(Y ) exist and are finite [7, Proposition 1]. That 2 ϕ log fη I. In particular, fηdx e dγ for some convex being said, our main results primarily concern log-concave −∇ ≥ = − distributions. As such, all involved entropies are guaranteed function ϕ.Thus,Theorem3isequivalenttothefollowing: to exist (e.g., [8]). Corollary 5: If µ and ν are centered probability measures with densities satisfying (4),then

Organization δEPI,t (µ, ν) The rest of this paper is organized as follows: t(1 t) 2 2 2 η − inf W2 (µ, γ1) W2 (ν,γ2) W2 (γ1,γ2) . Sections II-A and II-B describe our main stability results ≥ 2 γ1,γ2 $ + + for log-concave densities and the relationship to previous ∈ " # work, respectively. Section II-C gives an example where This result will also apply to certain families of non log- the EPI is not stable with respect to the quadratic concave measures, see Remark 17. 2 2 For convenience, let d (µ) infγ0 $ W (µ, γ0) denote Kantorovich-Wasserstein distance when regularity conditions W2 := ∈ 2 are not met. Section III gives proofs of our main results the squared W2-distance from µ to the set of centered Gaussian measures. Using the inequality (a b c)2 3(a2 b2 c2) and a brief discussion of techniques, which are based on + + ≤ + + optimal mass transportation. We conclude in Section IV with and the triangle inequality for W2,wemayconcludeaweaker, extensions of our results to more general settings. but potentially more convenient variant of Corollary 5. Corollary 6: If µ and ν are centered probability measures with densities satisfying (4),then II. MAIN RESULTS t(1 t) This section describes our main results, and also compares δ (µ, ν) η − d2 (µ) d2 (ν) W 2(µ, ν) . to previously known stability estimates. Proofs are generally EPI,t ≥ 8 W2 + W2 + 2 deferred until Section III. " # (5) The Shannon-Stam form of the entropy power inequality (1) A. Stability of the EPI for Log-Concave Densities is oftentimes preferred to Lieb’s inequality for applications Our main result is the following: in information theory. Starting with Corollary 6, we may ϕ ψ Theorem 3: Let µ e− γ and ν e− γ be centered establish an analogous estimate for the Shannon-Stam EPI. probability measures, where= ϕ and ψ are= convex. Then Recall first that (1) attains equality if and only if µ, ν are Gaussian with proportional covariance matrices. Motivated by δ (µ, ν) EPI,t this, we define the quantity t(1 t) 2 2 2 − inf W2 (µ, γ1) W2 (ν,γ2) W2 (γ1,γ2) . 2 ≥ 2 γ1,γ2 $ + + 2 1/2 1/2 ∈ dF (µ, ν) inf √θ&µ √1 θ&ν , (6) " (3)# := θ (0,1) − − F ∈ $ $ ϕ $ $ Remark 4: The notation µ e− γ is shorthand for where A F $ Tr(A A) denotes$ Frobenius dµ ϕ = ϕ ∥ ∥ := ⊤ e− .Measuresoftheformµ e− γ for convex (or Hilbert-Schmidt) norm of a real matrix A,toprovidea dγ = = % ϕ have several names in the literature. Such names include convenient measure of distance between the second order ‘strongly log-concave densities’, ‘log-concave perturbation of statistics of µ, ν.Inparticular,d2 (µ, ν) 0ifandonlyif F = Gaussian’, ‘uniformly convex potential’ and ‘strongly convex &µ and &ν are proportional. With this notation established, potential’ (see [9, pp. 50–51]). This situation also corresponds we have the following quantitative reinforcement of equality n to the Bakry-Émery condition CD(1, )whenthespaceisR . conditions in the Shannon-Stam EPI: ∞ Under the assumptions of the theorem, the three terms in Corollary 7: Let µ and ν be centered probability measures the RHS of (3) explicitly give necessary conditions for the n on R satisfying (4) with parameters ηµ and ην ,respectively. deficit δEPI,t (µ, ν) to be small. In particular, µ, ν must each Then, be quantitatively close to Gaussian measures, which are them- selves quantitatively close to one another. Additionally, W 2 is N(µ ν) (N(µ) N(ν)) + (µ, ν), 2 ∗ ≥ + EPI COURTADE et al.:QUANTITATIVESTABILITYOFTHEEPI 5693

where +EPI(µ, ν) is defined as the quantity for probability measures µ, ν with log-concave densities, there is an explicit function R such that min θηµ,(1 θ)ην exp { − } (1 θ)d2 (µ) 4 n − W2 N(µ ν) (N(µ) N(ν)) R(µ, ν), & " ∗ ≥ + θd2 (ν) d2 (µ, ν) , (7) where R(µ, ν) 1withequalityonlyifµ, ν are Gaussian + W2 + F ≥ ' measures. However, this result is not directly comparable to # 3 and θ is chosen to satisfy θ/(1 θ) N(µ)/N(ν). ours since the function R(µ, ν) is quite complicated, and As remarked above, equality is− attained= in (1) if and only does not explicitly control the distance of µ, ν to the space if µ, ν are Gaussian with proportional covariances. Under of Gaussian measures. Toscani leaves this as an open problem the stated assumptions of log-concavity, these conditions are [13, Remark 7]. Corollary 7 provides a satisfactory answer to explicitly captured by the last three terms in (7). his problem when µ, ν satisfy (4) for some parameter η>0. We also derive a stability estimate when one vector is simply Similarly, Theorems 8 and 9 provide an answer when one log-concave (but not uniformly so) and the other vector is of the measures is Gaussian and the other satisfies regularity Gaussian, involving the L1-Kantorovich-Wasserstein distance conditions. n Next, we compare Corollary 5 to the main result of with respect to the ℓ1 metric on R :Thatis, Ball and Nguyen [14], which states that if µ is a centered n isotropic probability measure (i.e., &µ I) with spectral gap λ W1,1(µ, ν) inf E X Y 1 inf E Xi Yi , = := ∥ − ∥ = | − | and log-concave density (not necessarily uniformly), then i 1 (= λ λ where the infimum is over all couplings of X, Y with marginal δ (µ, µ) D(µ γ) W 2(µ, γ), (9) EPI,1/2 ≥ 4(1 λ) ∥ ≥ 8(1 λ) 2 laws X µ and Y ν. + + Theorem∼ 8: For any∼ log-concave centered random vector where the second inequality is Talagrand’s information- in Rn with law µ, transportation inequality. Now, if µ satisfies (4), then Corollary 5 yields the similar bound 1 2 δ (µ, γ) Ct(1 t) min(n− W (µ, γ), 1), (8) EPI,t ≥ − 1,1 η 2 δEPI,1/2(µ, µ) inf W2 (µ, γ0). ≥ 4 γ0 $ with C an absolute constant that does not depend on µ ∈ or dimension n. Given the similarity, Corollary 5 may be viewed as a close This estimate is reminiscentofthedeficitestimateson relative of (9), which holds for non-identical measures and Talagrand’s inequality of [10] and [11], with a remainder term all parameters t (0, 1).However,twopointsshouldbe ∈ that stays bounded when the distance becomes large. Note mentioned: (i) a stability estimate with respect to W2 is weaker that W1,1 grows linearly with dimension for product measures, than one involving relative entropy; and (ii) η-uniform log- 1 2 so the term n− W1,1(µ, γ) in the RHS of (8) has the correct concavity in the sense of (4) ensures a spectral gap of at least dependence on dimension. η,butnotviceversa.Thus,itisinterestingtoaskwhetherthe Finally, in a more general direction, a measure µ is said hypothesis of Corollary 5 can be weakened to require only n to have spectral gap λ if, for all smooth s R R with aspectralgap;theresultofBallandNguyenandasimilar sdµ 0, : → earlier result by Ball et al. [15] in dimension one provides = some grounds for cautious optimism. In Section 4, we shall ! λ s2 dµ s 2 dµ. 
obtain a one-dimensional result for non log-concave measures ≤ |∇ | ) ) under a stronger assumption than [15] (namely a Cheeger n Theorem 9: If µ is a centered probability measure on R isoperimetric inequality), but with the advantage of being valid with spectral gap λ,then for non-identical measures. It should be noted that Theorem 9 has weaker hypotheses δEPI,t (µ, γ) min(λ, 1)t(1 t)D(µ γ), than (9) since log-concavity is not assumed, however it applies ≥ − ∥ to the EPI deficit δ (µ, γ),ratherthanconvolutionof where D(µ γ) dµ log dµ is the relative entropy between EPI,t dγ identical measures µ which may be of interest in applications µ and γ . ∥ = to the central limit theorem. All log-concave! distributions have positive spectral gap [12], The results referenced above assume log-concave densities, so the hypothesis of Theorem 9 is weaker than that of as do we (for the most part). In contrast, the refined EPI Theorem 8. However, the advantage of (8) is that it does established in [16] provides a qualitative stability estimate not rely on any quantitative information on µ,onlythatitis for the EPI when µ is arbitrary and ν is Gaussian. However, log-concave. the deficit is quantified in terms of the so-called strong data processing function,andisthereforenotdirectlycomparable B. Relation to Prior Work to the present results. Nevertheless, a noteworthy consequence is a reverse entropy power inequality, which does bear some As remarked above, a few stability estimates are known resemblance to the result of Corollary 7. In particular, for for the EPI. Here, we review those that are most relevant and comment on the relationship to our results. To begin, we 3 R(µ, ν) is expressed in terms of integrals ofnonlinearfunctionalsevaluated mention a stability result due to Toscani [13], which asserts along the evolutes of µ, ν under the heat semigroup. 5694 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 64, NO. 8, AUGUST 2018 arbitrary probability measures µ, ν on Rn with finite second However, both results do give quantitative bounds on entropy moments, it was shown in [17] that production under convolution in terms of a distance from Gaussian measures. In general, the constants in (3) will be N(µ ν) (N(µ) N(ν))((1 θ)p(µ) θp(ν)) , (10) ∗ ≤ + − + much better than those in (11) which, although numerical, can where θ is the same as in the definition of +EPI(µ, ν) and be quite small. We return to the setting of radially symmetric 1 measures in Section IV. p(µ) n N(µ)J(µ) 1istheStam defect,withJ(µ) denoting:= .≥ We have p(µ) 1onlyifµ Finally, we mention Carlen and Soffer’s qualitative stability is Gaussian, and thus p(µ) may reasonably be= interpreted as estimate for the EPI that holds under general conditions [6]. ameasureofhowfarµ is from the set of Gaussian measures. Roughly speaking, their resultisthefollowing:ifprobability Thus, the deficit term (1 θ)p(µ) θp(ν) in (10) bears a measures µ, ν on Rn are isotropic with Fisher informations − + 2 pleasant resemblance to the deficit term (1 θ)d (µ) upper bounded by J0,thenthereisanonlinearfunction − W2 + θd2 (ν) in Corollary 7. Importantly, though, the former is / R 0, ), strictly increasing from 0, depending only W2 : →[ ∞ an upper bound on N(µ ν),whilethelatteryieldsalower on the dimension n,theparametert,thenumberJ0 and bound. ∗ smoothness and decay properties of µ, ν that satisfies We would be remiss to not mention that the inequality δ (µ, ν) /(D(µ γ)). 
p(µ) 1 mentioned above is known as Stam’s inequality, EPI,t ≥ ∥ ≥ and is equivalent to Gross’ celebrated logarithmic Sobolev The construction of the function / relies on a compactness inequality for Gaussian measure [18], [19]. Taking µ ν argument, and therefore is non-explicit. As such, it is again in (10) gives the sharpening p(µ) exp 2 δ (µ,= µ) , ≥ n EPI,t not directly comparable to our results. However, it does settle holding for any probability measure µ with finite second cases of equality. moment. Equivalently, if µ is centered, then * + 1 I (µ γ) D(µ γ) δ (µ, µ), C. Instability of the EPI: An Example 2 ∥ ≥ ∥ + EPI,t As a counterpoint to Theorem 3 and to provide justification where I (µ γ) log dµ 2 dµ denotes the relative Fisher dγ for the regularity assumptions therein, we observe that there information∥ of:=µ with|∇ respect| to γ .Whenµ satisfies (4), are probability measures that satisfy the hypotheses required we have a dimension-free! quantitative stability result for the in Theorem 3 on sets of measure arbitrarily close to one, but logarithmic Sobolev inequality 1 I (µ γ) D(µ γ).This 2 severely violate its conclusion. is an improvement upon the main result∥ of≥ Indrei∥ and Mar- Proposition 10: There is a family of probability measures con [20], who consider the subset of densities satisfying (4) (µ ) on with finite and uniformly bounded entropies for some parameter η>0, whose Hessians are also uniformly ϵ ϵ>0 R and second moments such that upper bounded. Unfortunately, this improvement is already 1) The measures µϵ essentially satisfy (4) for η 1. obsolete, as Fathi et al. [10] have recently shown that a similar = That is, limϵ 0 µϵ(1ϵ) 1,where1ϵ x result holds for all probability measures with positive spectral 2 ↓ = := { | d log f (x) 1 with dµ f dx. gap. Interestingly though, (10) does imply a general upper dx2 ϵ ϵ ϵ 2) The− measures µ≥saturate} the EPI= as ϵ approaches zero. bound on δ (µ, ν) involving Fisher informations. Specifi- ϵ EPI,t That is, lim δ (µ ,µ ) 0 for all t (0, 1). cally, for arbitrary probability measures µ, ν with finite second ϵ 0 EPI,t ϵ ϵ 3) The measures↓ µ are bounded= away from∈ Gaussians moments, ϵ in the W2 metric; specifically, lim infϵ 0 infγ0 $ 1 2 ↓ ∈ W (µϵ,γ0)>1/3. δEPI,t (µ, ν) (1 t) I (µ γ) D(µ γ) 2 ≤ − 2 ∥ − ∥ We remark that the measures (µ ) in the proposition & ' ϵ ϵ>0 1 are not necessarily pathological. In fact, it suffices to con- t I (ν γ) D(ν γ) . + 2 ∥ − ∥ sider simple Gaussian mixtures that approximate a Gaussian & ' measure, albeit with heavy tails. Moreover, the result can See [17] for details. Other deficit estimates for the logarithmic be trivially extended to arbitrary dimension by considering Sobolev inequality have been obtained in [21] and [22]. product measures. If X µ is a radially symmetric random vector on n, R Proof: Define the density f as n 2, satisfying∼ modest regularity conditions (e.g,. convolu- ϵ tion≥ with a Gaussian measure of small variance is sufficient), √ϵ ϵx2 √1 ϵ (1 ϵ)x2 fϵ(x) ϵ e− (1 ϵ) − e− − . (12) then it was recently established in [23] that, for any ε>0 = √π + − √π ε 1 ε δ (µ, µ) Cε(µ)n D + (µ γµ), (11) EPI,1/2 ≥ ∥ Evidently, fϵ is a Gaussian mixture having unit variance; the mixture components have variance (2ϵ) 1 and (2(1 ϵ)) 1, where γµ denotes the Gaussian measure with the same covari- − − respectively. − ance as µ,andCε(µ) is an explicit function that depends Proof of 1: On any interval, as ϵ 0, the densities only on ε,afinitenumberofmomentsofµ,anditsregu- ↓ larity. 
This closely parallels quantitative estimates on entropy ( fϵ)ϵ>0 and their derivatives converge uniformly to those of production in the Boltzmann equation [24], [25]. Neither (3) the Gaussian density with variance 1/2. Therefore, nor (11) imply the other since the hypotheses required are lim fϵ′′(x) 2 x R. (13) quite different (strong log-concavity vs. radial symmetry). − ϵ 0 = ∀ ∈ ↓ COURTADE et al.:QUANTITATIVESTABILITYOFTHEEPI 5695

Since the measures µϵ converge weakly to a Gaussian measure III. DISCUSSION AND PROOFS with variance 1/2, it is straightforward to conclude that The remainder of this paper makes use of ideas from limϵ 0 µϵ(1ϵ) 1, where 1ϵ is defined as in the statement ↓ = optimal mass transport. All necessary ingredients are high- of the proposition. lighted as they are needed. The unfamiliar reader is directed Proof of 2: This follows immediately from [26, Th. 1] due to the references, as well as the comprehensive introduc- to pointwise convergence of uniformly bounded densities and tions [29], [30]. Although not related to the present paper, uniformly bounded second moments. th we remark that optimal transport and information theory Proof of 3: Let m p(µ) denote the p absolute moment together play an important role in establishing concentration of associated with µ.Forconjugateexponentsp, q 1, Hölder’s ≥ measure; see, e.g., [31]–[33] and references therein. In prob- inequality implies that abilistic terms, the theory of optimal transport systematically 2 1/p 1/q investigates the problem of transporting one probability mea- W2 (µϵ,γs ) s 1 2m p (µϵ)mq (γs), (14) ≥ + − sure µ to another probability measure ν.Tothisend,amap where γs is the Gaussian measure with variance s.Indeed,for T Rn Rn is said to transport a probability measure µ to X µϵ and Z γs,thereisacouplingbetweenX and Z : → n ∼ ∼ another probability measure ν (both on R )ifthepushforward such that of µ under T is ν (i.e., ν T #µ). That is, 2 2 2 2 = W (µϵ ,γs) E X Z E Z E X 2E XZ 2 = | − | = [ ]+ [ ]− [ ] f Tdµ fdν 1 s 2E XZ , ◦ = = + − [ ] ) ) for all continuous bounded functions f Rn R.Inthe where the second equality follows since X and Z have second : → moments 1 and s,respectively.Now,Hölder’sinequality language of probability, if X has law µ and T transports p 1/p q 1/q 1/p 1/q µ to ν,thentherandomvariableT (X) has law ν.Under yields E XZ E X E Z m p (µϵ)mq (γs), giving (14).[ ]≤ [| | ] [| | ] ≡ general conditions (e.g., µ is absolutely continuous w.r.t. the Recall the characterization of Gaussian moments Lebesgue measure), valid transportation maps are guaranteed to exist. More remarkably, we are often ensured the existence (2s)q q 1 mq (γs) $ + of transportation maps with useful structure. For example, = π 2 , & ' in dimension 1, if Fµ, Fν denote the cumulative distribution 1 for s, q > 0. Let us now specialize (14) by choosing p 3/2 functions for µ and ν,respectively,thenT F− Fµ = = ν ◦ and q 3. In this case, transports µ to ν,providedFµ, Fν are strictly increasing. This = construction results in a transportation map T that is monotone 2 2/3 2 s increasing; in other words, it is realized as the derivative of a W2 (µϵ,γs) 1 s 2m3/2(µϵ) 1/3 ≥ + − ,π convex function. The spirit of this construction generalizes to 2 4/3 any dimension, and is a cornerstone result of optimal transport 1 m3/2(µϵ), ≥ − π1/3 theory: where the first inequality is (14) and the second inequality Theorem 12 (Brenier-McCann, [29], [34], [35]): Consi- follows by minimizing over s > 0. der two probability measures µ, ν on Rn,andassumethatµ By the dominated convergence theorem, we have is absolutely continuous with respect to the Lebesgue measure. There exists a unique map T (which we shall call the Brenier 1 lim m3/2(µϵ) m3/2(γ1/2) $(5/4). map)transportingµ onto ν that arises as the gradient of a ϵ 0 = = π ↓ , convex lower semicontinuous function. 
Moreover, this map is So, putting everything together, we may conclude such that 2 2 2 2 lim inf inf W2 (µϵ,γ0) lim inf inf W2 (µϵ,γs) W (µ, ν) E X T (X) , ϵ 0 γ0 $ = ϵ 0 s>0 2 ↓ ∈ ↓ = [| − | ] 4/3 2 1 where X has law µ,andthereforeT(X) has law ν.Inother 1 $(5/4) words, (X, T (X)) is an optimal coupling for the distance W . ≥ − π1/3 π 2 -, . The starting point for the proofs of our main results comes 2 1 ($(5/4))4/3 from a recent paper of Rioul [36]. Through an impressively = − π short sequence of direct but carefully chosen steps, Rioul 0.441562 > 1/3. ≈ recently gave a new proof of the EPI based on transportation of measures. From his proof, we may readily distill the following: Note that the first line follows since µϵ has mean zero, and n n n n therefore the minimization over all Gaussian measures may be Lemma 13: Let T1 R R and T2 R R be diffeomorphisms satisfying: µ→ T #γ and :ν T→#γ .Ifµ restricted to minimization over those with mean zero. ! 1 2 and have finite entropies, then= = Remark 11: Our construction of fϵ is closely related to ν the counterexamples proposed by Bobylev and Cercignani det(t T1(X ∗) (1 t) T2(Y ∗)) in their disproof of Cercignani’s conjecture on entropy pro- δEPI,t (µ, ν) E log | ∇ +t − ∇ 1 t|, ≥ det( T1(X ∗)) det( T2(Y ∗)) − duction in the Boltzmann equation [27]. This construction | ∇ | | ∇ | (15) also appeared in the context of the Boltzmann equation in [28, Proposition 23]. where X γ and Y γ are independent. ∗ ∼ ∗ ∼ 5696 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 64, NO. 8, AUGUST 2018

Remark 14: For a vector-valued map 2 (21,22, In addition, we remind the reader that a random vector n n = ...,2n) R R ,wewrite 2 to denote its Jacobian. X having a log-concave density enjoys (i) finite second : → ∂ ∇ That is, ( 2(x))ij 2 j (x). moment (in fact, finite moments of all orders); and (ii) finite ∂xi In words,∇ (15) shows= that the deficit in the EPI can always entropy h(X).SinceTheorem3requireslog-concavedensities, be bounded from below by a function of the Jacobians T1 these conditions will be implicitly assumedthroughoutthe ∇ and T2,whereT1 and T2 are are invertible and differentiable proof. ∇ maps that transport measures γ to µ and γ to ν,respectively. Proof of Theorem 3: Assume first that the densities ϕ ψ n When T1 and T2 are Knöthe maps (see [30], [37], [38]), e− and e− are smooth and strictly positive on R .Also, the Jacobians T1 and T2 are upper triangular matrices with let X γ and Y γ be independent. Define T1 to ∇ ∇ ∗ ∼ ∗ ∼ positive diagonal entries. Using this property, Rioul concludes be the Brenier map transporting γ to µ,andletT2 denote δEPI,t (µ, ν) 0usingconcavityofthelogarithmappliedto the Brenier map transporting γ to ν.Werecallherethata ≥ the eigenvalues (diagonal entries) of T1 and T2.Bystrict Brenier map is always the gradient of a convex function by ∇ ∇ concavity of the logarithm, saturation of this inequality implies the Brenier-McCann theorem, and therefore T2 and T1 ∇ ∇ that the diagonal entries of T1 and T2 must be equal almost are positive semidefinite since they coincide with Hessians ∇ ∇ everywhere. Combining this information with the fact that of convex functions. In fact, since all densities involved are arelativeentropyterm(omittedabove)mustvanish,Rioul non-vanishing, they are positive definite. Moreover, when the recovers the well known necessary and sufficient conditions for densities are strictly positive on the whole space, we know by δEPI,t (µ, ν) to vanish. Specifically, µ and ν must be Gaussian results of Caffarelli [40], [41] that the maps T1 and T2 are measures, equal up to translation. C1-smooth. Instead of the Knöthe map, our proofs will use the Brenier Using the assumed smoothness and convexity of the poten- map (defined in Theorem 12), which is always realized as tials ϕ and ψ,Caffarelli’scontractiontheorem(see[42]and, the gradient of a convex function. Toward this end, departing e.g., [29, Th. 9.14]) implies that T1 and T2 are 1-Lipschitz, from Rioul’s argument based on Knöthe maps, if T and T are 1 2 so that λmax( T1) 1andλmax( T2) 1. Therefore, since taken instead to be Brenier maps (again, transporting γ to µ ∇ ≤ ∇ ≤ T2 and T1 are positive definite, Lemma 15 yields the and γ to ν,respectively),thentheJacobians T1 and T2 are following∇ (pointwise)∇ estimate symmetric positive definite by the Brenier-McCann∇ Theorem.∇ log det(t T1 (1 t) T2) Thus, concavity of the log-determinant function on the posi- ∇ + − ∇ tive semidefinite cone immediately gives the EPI from (15). t log det( T1) (1 t) log det( T2) ≥ ∇ + − ∇ Moreover, by strict concavity of the log-determinant function, t(1 t) 2 − T1 T2 F . equality in the EPI implies that T1(X ∗) T2(Y ∗) almost + 2 ∥∇ −∇ ∥ ∇ =∇ everywhere, and are thus constant. Hence, T1 and T2 are Combined with (15) we obtain: necessarily affine functions, identical up to translation. This immediately implies that δ (µ, ν) 0onlyifµ, ν are δEPI,t (µ, ν) EPI,t = Gaussian measures with identical covariances. t(1 t) 2 − E (T1(X ∗) X ∗) (T2(Y ∗) Y ∗) . 
Unfortunately, while both arguments easily settle cases ≥ 2 ∥∇ − −∇ − ∥F of equality in the EPI, neither yield quantitative stability Now, define matrices A E (T1(X ∗) X ∗) and B = [∇ − ] = estimates. However, we note that the Brenier map is gen- E (T2(Y ∗) Y ∗) .Byorthogonality, we have erally better suited for establishing quantitative stability in [∇ − ] (T (X ) X ) (T (Y ) Y ) 2 functional inequalities. Indeed, it was remarked by Figalli, E 1 ∗ ∗ 2 ∗ ∗ F ∥∇ − −∇ − ∥ 2 Maggi and Pratelli in their comparison to Gromov’s proof of E (T1(X ∗) (I A)X ∗) (T2(Y ∗) (I B)Y ∗) F = ∥∇ 2 − + −∇ − + ∥ the isoperimetric inequality that the Brenier map is generally A B F more efficient than the Knöthe map in establishing quantitative +∥ − ∥ 2 E (T1(X ∗) (I A)X ∗) F stability estimates due to its rigid structure [39]. We shall = ∥∇ − + ∥ (T (Y ) (I B)Y ) 2 A B 2 fruitfully exploit the properties of the Brenier map in our proof E 2 ∗ ∗ F F + ∥∇ − + 2 ∥ +∥ − ∥ of Theorem 3. E T1(X ∗) (I A)X ∗ ≥ | − + | 2 2 E T2(Y ∗) (I B)Y ∗ (I A) (I B) F A. Proof of Theorem 3 + | − + 2 | +∥ + − + ∥ The proof of Theorem 3 is short, but makes use of several E T1(X ∗) (I A)X ∗ = | − + | 2 2 foundational results from the theory of optimal transport. E T2(Y ∗) (I B)Y ∗ E (I A)X ∗ (I B)X ∗ + | − + | + | + − + | We will also need the following lemma; a proof can be found The final inequality is due to the Gaussian Poincaré inequality in the Appendix. f 2 dγ f 2 dγ ,holdingforeveryC1-smooth Lemma 15: For positive definite matrices A, Band | |n ≤ |∇ | f R R with mean zero. Indeed, its application is justified t 0, 1 ,wehave ! : 1 → ! ∈[ ] by C -smoothness of the Brenier maps among log-concave log det(tA (1 t)B) t log det(A) (1 t) log det(B) distributions, and the identity + − ≥ t+(1 −t) − 2 2 E T1(X ∗) (I A)X ∗ xdµ(x) (I A) xdγ(x) 0, + 2max λmax(A),λmax(B) [ − + ]= − + = { 2 } ) ) A B F , ×∥ − ∥ which holds similarly for Y ∗.Thedesiredinequalitynow where λmax( ) denotes the largest eigenvalue. follows from the definition of W2.Indeed,letγA and γB denote · COURTADE et al.:QUANTITATIVESTABILITYOFTHEEPI 5697

the laws of (I A)X and (I B)Y ,respectively.Evidently, where $K $ denotes the set of centered Gaussian mea- + ∗ + ∗ ⊂ γA and γB are both Gaussian measures. Since T1(X ) µ sures with second moments bounded by K .Fromthis,itis ∗ ∼ and T2(Y ) ν by construction, straightforward to argue that for s sufficiently small, ∗ ∼ 2 2 E T1(X ∗) (I A)X ∗ W (µ, γA) +(µ ,ν ) | − + | ≥ 2 s s 2 2 2 and similarly, inf W2 (µ, γ1) W2 (ν,γ2) W2 (γ1,γ2) √sC, ≥ γ1,γ2 $ + + − 2 2 ∈ E T2(Y ∗) (I B)Y ∗ W (ν,γB). " # (16) | − + | ≥ 2 Since X , Y are equal in distribution, (I B)X also has law ∗ ∗ + ∗ where C C(µ, ν) < depends only on the second γB,sothat moments of=µ and ν.Thus,thefirstpartoftheclaimfollows.∞ 2 2 The second claim follows immediately from the fact that E (I A)X ∗ (I B)X ∗ W2 (γA,γB). | + − + | ≥ lims 0 h(µs ) h(µ) for any µ having density and finite sec- Summarizing above, we may conclude ond↓ moment.= See, e.g., [6, Lemma 1.2].

δEPI,t (µ, ν) The third claim can be found in [9, Th. 3.7(b)]. ! Remark 17: The assumption of log-concavity is mainly used t(1 t) 2 − E (T1(X ∗) X ∗) (T2(Y ∗) Y ∗) to ensure that the optimal transport map is Lipschitz. This can ≥ 2 ∥∇ − −∇ − ∥F sometimes still be the case in other situations. For example, t(1 t) 2 2 2 − W (µ, γA) W (ν,γB ) W (γA,γB) the recent work [43] shows that this property holds for certain ≥ 2 2 + 2 + 2 families of bounded perturbations of the Gaussian measure, t(1 t)" 2 2 2 # − inf W2 (µ, γ1) W2 (ν,γ2) W2 (γ1,γ2) . including the radially symmetric case. ≥ 2 γ1,γ2 $ + + ∈ " # Thus, Theorem 3 holds when the densities are smooth and positive everywhere. If this is not the case, we may first B. Proof of Corollary 7 convolve µ, ν with a Gaussian having small variance so that Before proving Corollary 7, let us briefly review the formal the resulting densities will be both smooth and positive. The equivalence between the different forms of the EPI. To this general result then follows by considering arbitrarily small end, let X, Y be independent random vectors in Rn whose perturbations, which is justified by Proposition 16, found entropies exist. Let t (0, 1) and observe that the Shannon- ∈ below. Stam inequality (1) together with convexity of x ex gives ! 5→ Proposition 16: For a probability measure µ and s > 0,let exp(2 h(√tX √1 tY)/n) µs denote the probability measure corresponding to X √sZ, + − where X µ and Z γ are independent.Ifprobability+ exp(2 h(√tX)/n) exp(2 h(√1 tY)/n) ∼ ∼ ≥ + − measures µ, ν have finite second moments, and t exp(2 h(X)/n) (1 t) exp(2 h(Y )/n) = + − 2 2 2 exp(2(th(X) (1 t)h(Y ))/n). +(µ, ν) inf W2 (µ, γ1) W2 (ν,γ2) W2 (γ1,γ2) , ≥ + − := γ1,γ2 $ + + ∈ " # Taking logarithms reveals that the Shannon-Stam inequality then lims 0 +(µs,νs ) +(µ,ν ).Moreover,ifµ, ν have ↓ = implies Lieb’s inequality: densities, then lims 0 δEPI,t (µs,νs ) δEPI,t (µ, ν).Finally, ↓ = if µ satisfies (4) with parameter η>0,thenµs satisfies (4) h(√tX √1 tY) th(X) (1 t)h(Y ). with parameter η/(1 sη). + − ≥ + − + However, Lieb argued that a simple rescaling allows one to Proof: Evidently, if γs denotes the Gaussian measure with make the reverse deduction. In particular, define the random covariance matrix sI, then µs µ γs.Observethatforany 1/2 1/2 = ∗ vectors X t− X and Y (1 t)− Y .Inthiscase, fixed Gaussian measures γ1,γ2 (not to be confused with γs), ˜ ˜ we have by= Lieb’s inequality = − 2 2 2 W2 (µ, γ1) W2 (ν,γ2) W2 (γ1,γ2) +2 + 2 h(X Y ) h(√t X √1 tY ) th(X) (1 t)h(Y ) (17) W (µ γs,γ1 γs) W (ν γs,γ2 γs) + = ˜ + − ˜ ≥ ˜ + − ˜ ≥ 2 ∗ ∗ + 2 ∗ ∗ 2 for all t 0, 1 .Inparticular,wemaychooset to satisfy W2 (γ1 γs,γ2 γs) + ∗ ∗ t/(1 t) ∈[N(X])/N(Y ),inwhichcase +(µs,νs ), − = ≥ th(X) (1 t)h(Y ) so that +(µ, ν) +(µs,νs ).Itremainstoproveaninequality ˜ ˜ ≥ + − n in the reverse direction. th(X) (1 t)h(Y ) (t log t (1 t) log(1 t)) = + − − 2 + − − Since W2 is a metric on the space of probability measures n with finite second moments, the reverse triangle inequality (exp(2h(X)/n) exp(2h(Y )/n)) . (18) = 2 + implies that W (µ, ν) W (µ, δ ) W (ν,δ ) ,whereδ 2 2 0 2 0 0 Thus, substituting into (17), we obtain is the Dirac measure at≥ 0.| Squaring both− sides, it follows| that 2 2 2 2 n W2 (µ, ν) ( E X E Y ) for X µ and Y ν. h(X Y ) (exp(2 h(X)/n) exp(2 h(Y )/n)) , Therefore,≥ there is| K| − K (µ,| | ν) < depending∼ only∼ on + ≥ 2 + % = % ∞ the second moments of µ and ν such that which is precisely Shannon-Stam inequality (1). 2 2 2 Proof: Let X, Y and X, Y be as above. 
If X µ +(µ, ν) inf W2 (µ, γ1) W2 (ν,γ2) W2 (γ1,γ2) , ˜ ˜ ∼ = γ1,γ2 $K + + satisfies (4) with parameter ηµ,thenbyachangeofvariables ∈ " # 5698 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 64, NO. 8, AUGUST 2018 the density of X satisfies (4) with parameter tη .Asimilar The L1 Poincaré inequality for the Gaussian measure ˜ µ statement holds for Y .Thus,bothX and Y satisfy (4) with (cf. [44, Proposition 1.8]) states that ˜ ˜ ˜ parameter min tηµ,(1 t)ην . Next, let µ{and ν denote− } the laws of X and Y ,respec- f dγ 2 f dγ (20) ˜ ˜ n n ˜ ˜ 2 2 R | | ≤ R |∇ | tively. We note that t(1 t)dW (µ) (1 t)dW (µ) and ) ) − 2 ˜ = − 2 n t(1 t)d2 (ν) td2 (ν) by a simple rescaling. Also, we have for every smooth mean zero f R R.Seethediscus- − W2 ˜ = W2 : → 1 the following lower bound on W 2(µ, ν): sion in Section IV for more information about L Poincaré 2 ˜ ˜ inequalities and their relation to Cheeger inequalities. t(1 t)W 2(µ, ν) d2 (µ, ν). − 2 ˜ ˜ ≥ F Since (T (X ∗), X ∗) is a valid coupling between µ and γ , This follows from rescaling, the fact that W2 is non-increasing we have under rescaled convolution, the central limit theorem, weak n continuity of W ,theidentityforW distance between W1,1(µ, γ) E T (X ∗) X ∗ 1 E Ti (X ∗) X ∗ . 2 2 ≤ ∥ − ∥ = | − i | Gaussian measures mentioned in the proof of Theorem 3, and i 1 = 2 ( finally the definition of dF in (6). Smoothness of the Brenier map among log-concave densities Now, using the above identities and applying (5), we have and the fact that T (X ) X is zero mean justifies application ∗ − ∗ h(X Y ) of (20). This in conjunction with the ℓ1 ℓ2 bound x 1 n − ∥ ∥ ≤ + √n x 2 for x R gives the estimate h(√t X √1 tY ) ∥ ∥ ∈ = ˜ + − ˜ n W1,1(µ, γ) 2 E (Ti (X ∗) X ∗) th(X) (1 t)h(Y ) ≤ |∇ − i | ≥ ˜ + − ˜ i 1 (= t(1 t) 2 min tηµ,(1 t)ην − d (µ) √n2E T (X ∗) I) F + { − } 8 W2 ˜ ≤ ∥∇ − ∥ " 2 2 2 √n2E (λi 1) . d (ν) W (µ, ν) = − + W2 ˜ + 2 ˜ ˜ /, 0 # ( th(X) (1 t)h(Y ) Hence in this situation, if we have an L2 bound on the largest ≥ ˜ + − ˜ eigenvalue of T ,wecandeduceaW1,1 estimate on the min tηµ,(1 t)ην ∇ 1 t d2 deficit (in contrast to using a uniform bound as in the proof { − } ( ) W2 (µ) + 8 − of Theorem 3). " 2 3 2 2 2 AresultofKolesnikovassertsthat λ ( λn ) tdW (ν) dF (µ, ν) . E n 2 E + 2 + (see [45, Th. 6.1] and the discussion at the[ top] of≤ page 1526).[ ] Now, we imitate Lieb’s argument and choose t to# satisfy Moreover, t/(1 t) N(X)/N(Y ).Inthiscase,theidentity(18)holds, − = giving 2 E λn 1 E λn 1 1 E (λi 1) . [ ]≤ + [| − |] ≤ + − h(X Y ) /, 0 +n ( (exp(2 h(X)/n) exp(2 h(Y )/n)) From this estimate we deduce ≥ 2 + min tη ,(1 t)η µ ν (1 t)d2 (µ) td2 (ν) (λ 1)2 { − } W2 W2 E i + 8 − + /, − 0 " 2 ( dF (µ, ν) 2 + 2 # 4 3 (λ 1)2 δ (µ, γ). Multiplying through by 2/n,takingexponentsandrelabel- 4 E i EPI,t ≤ 5 + / − 0 7t(1 t) ing t θ completes the proof. 5 , − ← 6 ( ! 2 2 Since r/√1 r c min(r, 1) and 2E (λi 1) 1/2 + ≥ − ≥ C. Proof of Theorem 8 n− W1,1(µ, γ),weconclude 8%2 9 Proof of Theorem 8: Assume that X γ ,andletT 1/2 ∗ 2δEPI,t (µ, γ)/(t(1 t)) C min(n− W1,1(µ, γ), 1), be the Brenier map sending γ onto µ.Forconvenience,let∼ − ≥ us define random variables (λi )1 i n as the eigenvalues of and% the result follows. ≤ ≤ T (X ) in increasing order (so that λ1 λ2 λn ∇ ∗ ≤ ≤··· a.s.). The combination of (15) and Lemma 15 yields in this D. 
Proof of Theorem 9 case 2 For a probability measure ν,letI (ν γ)and D(ν γ) denote t(1 t) T (X ∗) I F the relative Fisher information and relative∥ entropy,∥ respec- δEPI,t (µ, γ) − E ∥∇ − ∥ 2 ≥ 2 1 λmax( T (X )) tively, of ν with respect to the standard Gaussian measure γ / + ∇ ∗ 0 on n.Sinceweshallonlybeconcernedwith as reference n 2 R γ t(1 t) i 1(λi 1) measure, we opt henceforth for the more compact notation − E = −2 . (19) = 2 1 λn I (ν) I (ν γ) and D(ν) D(ν γ) to simplify the 12 + 3 := ∥ := ∥ COURTADE et al.:QUANTITATIVESTABILITYOFTHEEPI 5699 expressions in this section. Thus, if h is the density of ν with so we need only integrate the RHS of (22). Using the d respect to γ ,then differential form of de Bruijn’s identity ds I (νs ) D(νs ), integration by parts gives =− D(ν) D(ν γ) h log hdγ 2s := ∥ = ∞ 1 λ(e 1) ) I (νs ) + − ds 1 λ(e2(t s) 1) and )0 + + − 1 λ(e2s 1) ∞ 2 D ∞ D u s ds h (νs ) + 2(t s−) (νs ) ′( ) I (ν) I (ν γ) |∇ | dγ, =− 1 λ(e 1) + 0 := ∥ = h 1 + + − 30 ) ) 1 ∞ D(ν) D(νs )u′(s)ds, (23) provided h is sufficiently smooth. = 1 λ(e2t 1) + )0 If X ν and Z γ are independent, we let νt denote + − ∼ t ∼ 2t 1/2 where, for convenience, we have defined the law of e X (1 e ) Z.Inotherwords,νt is − + − − the evolution of ν at time t along the Ornstein-Uhlenbeck 1 λ(e2s 1) u(s) . process. The following integral form of de Bruijn identity is + 2(t s−) := 1 λ(e + 1) classical [6], [46], [47]: If ν is centered and has finite second + − moment, then Now, we note that e2s(e2t 1)(λ 1)λ ∞ u (s) 2 − − . D(ν) I (νs )ds. (21) ′ 2(s t) 2 = = (1 λ(e + 1)) )0 + − Thus, if λ 1, then u (s) 0andtherefore In the proof of [10, Th. 1], Fathi, Indrei and Ledoux ≤ ′ ≤ established the following inequality: 0∞ D(νs )u′(s)ds 0. Hence, (22) together with (23) and de Bruijn’s identity≤ yields Theorem 18: If ν is a centered probability measure with ! spectral gap λ>0,thenforallt 0 1 ≥ D(νt ) D(ν) . ≤ 1 λ(e2t 1) 2t 1 I (νt ) e− I (ν) . + − ≤ 1 λ(e2t 1) On the other hand, if λ 1, then u (s) 0, and we may + − ≥ ′ ≥ We also note that the convolution inequality for Fisher infor- bound mation implies that I (ν ) e 2t I (ν),soTheorem18yieldsa t − ∞ D(ν )u (s)ds stability result for the exponentia≤ ldecayofFisherinformation s ′ )0 along the Ornstein-Uhlenbeck process, provided the starting ∞ 2s measure has positive spectral gap. D(ν) e− u′(s)ds ≤ 0 We now aim to establish the following Corollary of (18), ) ∞ 1 from which Theorem 9 will follow. D(ν)2(e2 t 1)(λ 1)λ ds = − − (1 λ(e2(s t) 1))2 Corollary 19: If ν is a centered probability measure with )0 + + − λ(e2 t 1) λ 1 λ 1 spectral gap λ,thenforallt 0 D(ν) − − log 1 − ≥ = (λ 1) 1 λ(e2 t 1) + − λe2t − & + − & '' 2t 1 2t D(νt ) e− D(ν) max , e− to conclude ≤ 1 λ(e2t 1) & + − ' 1 λ(e2s 1) ∞ I ds Proof: In their proof of [10, Th. 1], Fathi, Indrei and (νs ) + 2(t s−) 0 1 λ(e + 1) Ledoux also noted that if ν has spectral gap λ,thenνs has ) + 1 − spectral gap at least ∞ D(ν) 2t D(νs )u′ (s)ds = 1 λ(e 1) + 0 | 2s + − ) λe λ(e2 t 1) λ 1 λs D(ν) D(ν) − log 1 − = 1 λ(e2s 1) ≤ + (λ 1) − λe2t + − 2t − & ' So, as a consequence of this and Theorem 18, we have e− D(ν), ≤ 2t 1 where the last inequality is due to log(1 x) x for x < 1. I (νt s ) e− I (νs ) 2t − ≤− + ≤ 1 λs(e 1) So, as before, (22) together with (23) and de Bruijn’s identity + 2s− yields 2t 1 λ(e 1) e− I (νs ) + − . (22) = 1 λ(e2(t s) 1) 2t + D(νt ) e− D(ν), + − ≤ By de Bruijn’s identity4: completing the proof. ! 
∞ Remark 20: In principle, Corollary 19 should follow via I (νt s)ds D(νt ), an application of Gronwall’s lemma to Theorem 18. However, 0 + = ) the integration involved there appears no less tedious than the 4Positive spectral gap implies finite moments of all orders, so an explicit computations above, so we have opted for the direct proof assumption of finite second moment is not necessary here. given here. 5700 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 64, NO. 8, AUGUST 2018

Now, Theorem 9 follows immediately. In particular if ν Applying this result and following the same steps as in the is centered and has spectral gap λ,thendefinitionsand proof of Theorem 3 yields Corollary 19 yield the desired inequality t(1 t) δ (µ, γ) − W 2(µ, γ). EPI,t 8c2n 2 δ 2t (ν,γ) ≥ EPI,e− 2t which concludes the proof. ! e− D(ν) D(νt ) = − We also prove a lower bound when neither of the two λ 2t 2t measures are Gaussian, but with an even worse dependence e− (1 e− )D(ν) min 2t 2t , 1 ≥ − e− λ(1 e− ) on the dimension: 2t 2t & + − ' e− (1 e− ) min (λ, 1) D(ν). Proposition 22: Let T and T be the Brenier maps sending ≥ − 1 2 Gaussians to X and Y, respectively. Assume that X and Y are 1 centered and that the maps Ti are C and satisfy the pointwise bound T x c x 2 for all x, for some c . IV. EXTENSIONS λmax( i ( )) 1 > 1 Then there is∇ a universal≤ constant+| | C > 0 such that The proof of Theorem 3 uses the fact that, under the % assumptions of uniform log-concavity, the Brenier maps are δEPI,t (µ, ν) 2 Ct(1 t) 2 2 2 Lipschitz to bound the square of the largest eigenvalue λmax −2 inf (W2 (µ, γ1) W2 (ν,γ2) W2 (γ1,γ2)). ≥ (cn) γ1,γ2 $ + + in the deficit estimate for the log-concavity of the deter- ∈ minant. A natural question is whether we can use weaker Proof: Let us define assumptions on the map and still obtain a deficit estimate for 2 1 cn E (1 X ∗ )− , the EPI. It turns out that we can get an estimate, provided the := [ +| | ] largest eigenvalue of T (x) grows at most linearly at infinity. 1 T1(X ∗) I A cn− E ∇ − , We shall later see that∇ in dimension 1, as well as for multi- := 1 X 2 1 +| ∗| 3 dimensional radially symmetric measures, this assumption of 1 T2(Y ∗) I B c− E ∇ − . eigenvalue growth holds as soon as the law of the random := n 1 Y 2 1 +| ∗| 3 variable satisfies a Cheeger isoperimetric inequality, which As in the proof of Theorem 3, we have is a stronger assumption than the spectral gap assumption 2 used for the one-dimensional result of [15], but equivalent in t(1 t) T1(X ∗) I ( T2(Y ∗) I) δ (µ, ν) − ∥∇ − − ∇ − ∥F , (non-uniformly) log-concave situations. EPI,t 2 E 2 2 ≥ 2c / (1 X ∗ )(1 Y ∗ ) 0 Afirstcaseinwhichweestablishadeficitestimateiswhen +| | +| | 2 2 2 one of the two variables is Gaussian: where we have used the naïve bound λmax c (1 x ) (1 y 2),whichultimatelyleadstotheworseningofthe≤ +| | Proposition 21: Let µ be a centered probability measure +| | on Rn,andletT betheBreniermapsendingthestandard dependence on the dimension. We then have 1 2 Gaussian measure γ onto µ.IfT isC and satisfies the T1(X ∗) I ( T2(Y ∗) I) F 2 ∥∇ − − ∇ − ∥ pointwise bound λmax( T (x)) c 1 x for all x, for E 2 2 / (1 X ∗ )(1 Y ∗ ) 0 some c > 1,then ∇ ≤ +| | +| | +| | % T (X ) I c A 2 t(1 t) 1 ∗ n F 2 E ∥∇ 2− − ∥2 δEPI,t (µ, γ) − W (µ, γ). = (1 X )(1 Y ) ≥ 8c2n 2 / +| ∗| +| ∗| 0 2 This estimate can be compared to Theorem 8. Its advantage T2(Y ∗) I cn B F 2 2 E ∥∇ − − ∥ c cn(A B) is that it involves the more natural W distance, and that + (1 X 2)(1 Y 2) + n∥ − ∥F 2 / +| ∗| +| ∗| 0 its assumptions may be satisfied in certain non-log-concave T1(X ∗) I cn A T2(Y ∗) I cn B situations (as we shall later see for one-dimensional random 2E ∇ − − , ∇ − − − 1 X 2 1 Y 2 variables). However, it has the downside of sometimes being : +| ∗| +| ∗| ; cn(A B) T1(X ∗) I cn A worse, even for product measures in high dimension, and of 2E − , ∇ − − + 1 Y 2 1 X 2 requiring more quantitative information on the measure µ,via 1: +| ∗| +| ∗| ;3 the constant c in the eigenvalue bound. 
cn(A B) T2(Y ∗) I cn B 2E − , ∇ − − Proof: Using the assumption, the combination of (15) and − 1 X 2 1 Y 2 1: +| ∗| +| ∗| ;3 Lemma 15 gives 2 T1(X ∗) (I cn A) F cnE ∥∇ − +2 ∥ 2 = (1 X ∗ ) t(1 t) T (X ∗) I F / +| | 0 δEPI,t (µ, γ) −2 E ∥∇ −2 ∥ . 2 ≥ 2c / 1 X ∗ 0 T2(Y ∗) (I cn B) F 2 2 +| | cnE ∥∇ − + ∥ c cn A cn B + (1 Y 2) + n∥ − ∥F According to [48, Corollary 5.6] (see also [49] for the one- / +| ∗| 0 dimensional case), the standard Gaussian measure in dimen- and then the proof continues in the same way as for the sion n satisfies the weighted Poincaré inequality previous proposition. The constant cn above is the expectation 2 1 1 2 of (1 X ∗ )− ,whichisofordern− . ! f (X ∗) 1 To+| apply| these results, we want to know when does the E |∇ | Var ( f (X ∗)). 1 X 2 ≥ 4n 1 +| ∗| 3 Brenier map satisfy the eigenvalue bound. We shall prove COURTADE et al.:QUANTITATIVESTABILITYOFTHEEPI 5701 that for one-dimensional measures and for radially symmetric This result can be extended to the non-centered case just by log concave measures, such an assumption holds when the translating. measure satisfies a certain isoperimetric inequality. Note that a measure that is the image of the exponential Proposition 23: If the law of X is given by an exponential measure by a Lipschitz map necessarily satisfies Cheeger’s 1 measure µ(dx) 2 exp( x )dx, then the Brenier map T inequality, so the first part of this statement is actually an transporting a standard= Gaussian−| | random variable onto X equivalence. satisfies the bound Proof: Showing an upper bound on the derivative of the map is the same as proving a lower bound on the derivative of 2 T ′(x) c 1 x the inverse map T (which is the optimal map sending µ onto ≤ + ˜ for some c > 0 % the exponential measure). A computation shows that, if we Proof: The optimal map from a measure µ onto a denote by f the density of the law of X,wehaveforx positive measure ν with positive densities in dimension one is given 2 f (x) 1 by x Fν− (Fµ(x)),whereFµ(x) µ( , x ) is the T ′(x) . −→ = ]−∞ ] ˜ = 1 Fµ(x) distribution function associated with µ.Fortheexponential − measure the distribution function can be explicitly computed Since µ has median 0, the Cheeger inequality implies that for x x α as F (x) 1 e /2forx 0ande /2forx < 0. x 0weget f (x) (1 Fµ(x)) and the first part of the exp − ≥ ≥ 2 − Consider=x >−0. A direct computation≥ shows that result immediately follows, after applying the same reasoning x2/2 for negative x. e− 2 T ′(x) The second part can be deduced just by using the fact that = √2π 1 Fγ (x) − in dimension 1 the optimal map from the Gaussian onto µ where Fγ is the distribution function of the standard Gaussian is the composition of the map from the Gaussian onto the measure. There exists a constant c such that 1 Fγ (x) exponential measure with T . ! x2/2 − ≥ The same argument can be generalized to radially symmet- e− ,andtheboundonT immediately follows. By sym- c√1 x2 ′ ric random vectors with log-concave density: + metry, the same bound applies when x < 0. Proposition 26: Assume that X is a radially symmetric To prove the lower bound on 1 Fγ (x),wejustusethe n − random vector in R ,whoselawislog-concaveandsatisfies fact that aCheegerinequalitywithconstantα.Thereexistsaconstant x2/2 t2/2 e− 1 ∞ e− cn (depending on n, but otherwise independent of X) such that 1 Fγ (x) 2 dt the optimal map T transporting the Gaussian measure onto − = √2πx − √2π x t ) 1 2 x2/2 Xsatisfiesλmax( T (x)) cnα− 1 x . 
e− 1 Fγ (x) Remark 27: Bobkov∇ [53]≤ showed that+| the| optimal constant − , % ≥ √2πx − x2 λ in the Poincaré inequality for a radially symmetric log- 2 1 and the existence of a suitable constant easily follows. ! concave random variable satisfies nE X − /12 λ 2 1 [| | ] ≤ ≤ Definition 24: A probability measure µ is said to satisfy a nE X − .Sinceforlog-concavemeasuresthesquareof [| | ] Cheeger isoperimetric inequality with constant λ>0 if for the Cheeger constant and the Poincaré constant are equiva- any measureable set A we have lent [51], the constant α in the Proposition always exists and 2 1/2 is comparable to √nE X − ,uptouniversalconstants. µ+(∂ A) λµ(A)(1 µ(A)) [| | ] ≥ − Proof: Let µrad be the law of X ,andT be the optimal µ(A B ) µ(A) rn 1 r2/2| | ˜ ϵ transport sending γrad − e dr onto µrad.TheBrenier where µ+(∂ A) lim supϵ 0 + ϵ − ,withBϵ the ball √2π − := ↓ := with center 0 and radius ϵ. optimal map sending the Gaussian onto the law of X is then For log-concave measures (i.e., those having a log-concave given by x T ( x )x/ x .Thiscanbecheckedbyverifying ˜ density), Buser’s theorem [50] states that satisfying a Cheeger that it sends−→ the Gaussian| | | measure| onto the law of X (which inequality and a Poincaré inequality is equivalent, up to is a simple change of variable argument) and that it is the universal constants (or more precisely, the extension of Buser’s gradient of the convex function H ( x ) with H T .Since ′ ˜ theorem to weighted spaces [51]). In general, the Cheeger the Brenier map is the only transport| | map that arises= as the inequality is stronger than the Poincaré inequality. It is equiv- gradient of a convex function, T is necessarily the Brenier 1 alent (up to universal constants) to the L Poincaré inequality map. f dµ c f dµ for every function with average zero. |∇ | ≥ | | We then compute the gradient of the map T ,whichisgiven More generally, for log-concave measures the isoperimetric by T (x)v v x v,x T ( x ) x T ( x ) x,v . !inequality and! the exponential concentration property are ∇ = x − x 3 ⟨ ⟩ ˜ | | + x 2 ˜ ′ | | ⟨ ⟩ Since T (0) 0, using| | the| | mean value theorem,| | we therefore equivalent [52]. = " # have T (x)v v x v,x T (t) x T ( x ) x,v for Theorem 25: Assume X is a one-dimensional random vari- x 2 ˜ ′ x 2 ˜ ′ ∇ = − | | ⟨ ⟩ + | | | | ⟨ ⟩ able with positive density and median 0,thatsatisfiesa some t (0, x ).Fromthis,weseethattoprovethedesired" # upper bound∈ | on| the eigenvalues of T ,itisenoughtoshow Cheeger inequality with constant α.Thentheoptimalmap ∇ 1 that T (r) c√1 r 2. transporting the exponential measure onto X is α− -Lipschitz. ˜ ′ ≤ + As a consequence, the optimal map T transport- To prove this bound, we consider the symmetrized versions ing the Gaussian measure onto the law of X satisfies of µrad and γrad by extending them by symmetry to R, T (x) c (1 x 2). and dividing the density by 2 so that they are still ′ ≤ α +| | 5702 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 64, NO. 8, AUGUST 2018 probability measures. We denote these measures by We now multiply equation (26a) by t and (26b) by (1 t) and − µrad and γrad. These measures are still log-concave, add them to obtain andˆ their Cheegerˆ constants are comparable to those of the tf(x) (1 t) f (y) original measures, up to a universal constant, via Bobkov’s + − t(1 t)2m(x,w) t2(1 t)m(y,w) theorem we mentioned in Remark27.Moreover,theirmedian f (w) − + − y x 2. is located at 0. 
We also extend T to by antisymmetry, and ≥ + 2 | − | ˜ R denote the function we obtain by T .Itiseasytocheckthat By definition, we have m(x,w) m(x, y) and m(y,w) ˆ ≥ ≥ T is the optimal map sending γrad onto µrad. m(x, y),andthisprovesthelemma. ˆ ˆ ˆ ! Following the same arguments as for the one-dimensional Proof of Lemma 15: The function f ( ) log det( ) case, denoting by p the density of µ,wehaveforr 0 is known to be strictly convex and twice· differentiable=− in· ˆ ≥ the interior of the positive semidefinite cone. Substituting the n 1 r2/2 1 T ′(r) Cr − e− /p(F− (F (r)) µrad γrad definition of function f into Lemma 30 yields ˆ ≤ ˆ ˆ n 1 r2/2 1 1 Cαr − e− Fγrad (r)− (1 Fγrad (r))− log det(tA (1 t)B) t log det(A) (1 t) log det(B) ≤ ˆ − ˆ n 1 r2/2 1 + − ≥ + − C r e F r m(A, B) 2 α − − (1 γrad ( ))− t(1 t) A B . ≤ − + − 2 ∥ − ∥F Cα 1 r 2 ≤ + 2 1 1 It is a standard fact that f (M) M− M− ,with % r2/2 n 2 ∇ = ⊗ where we have used the estimate 1 Fγrad (r) Ce− r − denoting the Kronecker product. The minimum eigenvalue − ≥ ⊗ 2 for large enough r,andC was some positive constant that of this Kronecker product is given by 1/λmax(M),where changed from line to line. Note that the constant is dimension- λmax(M) is the largest eigenvalue of M.Wethereforehave dependent in the final inequality. ! 1 Remark 28: In this proof, the log-concavity is only used m(A, B) min 2 . ≥ t 0,1 λmax(tA (1 t)B) to ensure that µrad satisfies a Cheeger inequality with a ∈[ ] + − constant comparableˆ to α.Thisisnotnecessarilythecasefor Using the convexity of the maximum eigenvalue then yields non log-concave radially symmetric measures (for example, the desired result. ! the uniform measure on a two-dimensional annulus). REFERENCES

APPENDIX [1] T. A. Courtade, M. Fathi, and A. Pananjady, “Wasserstein stability of the entropy power inequality for log-concave random vectors,” in Proc. N STIMATE FOR THE OG ETERMINANT UNCTION A E L -D F IEEE Int. Symp. Inf. Theory (ISIT),Jun.2017,pp.659–663. Definition 29: A twice differentiable function f [2] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. : Tech. J.,vol.27,no.3,pp.623–656,Jul./Oct.1948. dom f R is said to be m(x, y)-strongly convex between [3] A. J. Stam, “Some inequalities satisfied by the quantities of information x, y dom→ fif 2 f (tx (1 t)y) m(x, y)I,forall of Fisher and Shannon,” Inf. Control,vol.2,no.2,pp.101–112, t 0∈, 1 . ∇ + − ≥ Jun. 1959. ∈[ ] [4] N. M. Blachman, “The convolution inequality for entropy powers,” IEEE Lemma 30: For all t 0, 1 ,am(x, y)-strongly convex Trans. Inf. Theory,vol.IT-11,no.2,pp.267–271,Apr.1965. function f between x and∈ y[ satisfies] [5] E. H. Lieb, “Proof of an entropy conjecture of Wehrl,” Commun. Math. Phys.,vol.62,no.1,pp.35–41,1978. [6] E. Carlen and A. Soffer, “Entropy production by block variable summa- tf(x) (1 t) f (y) + − tion and central limit theorems,” Commun. Math. Phys.,vol.140,no.2, m(x, y) 2 pp. 339–371, 1991. f (tx (1 t)y) t(1 t) x y . [7] S. G. Bobkov and G. P. Chistyakov, “Entropy power inequality for the ≥ + − + − 2 | − | Rényi entropy,” IEEE Trans. Inf. Theory,vol.61,no.2,pp.708–714, Proof: The Taylor series expansion of f for any two points Feb. 2015. [8] S. Bobkov and M. Madiman, “The entropy per coordinate of a random a, b dom f yields vector is highly constrained under convexity conditions,” IEEE Trans. ∈ Inf. Theory,vol.57,no.8,pp.4940–4954,Aug.2011. f (a) f (b) f (b), b a [9] A. Saumard and J. A. Wellner, “Log-concavity and strong log-concavity: = 1 +⟨∇ − ⟩ Areview,”Statist. Surveys,vol.8,p.45,Apr.2014. 2 [10] M. Fathi, E. Indrei, and M. Ledoux, “Quantitative logarithmic Sobolev b a, f (t0a (1 t0)b)(b a) (24) + 2⟨ − ∇ + − − ⟩ inequalities and stability estimates,” Discrete Continuous Dyn. Syst., vol. 36, no. 12, pp. 6835–6853, 2016. m(a, b) 2 f (b) f (b), b a a b , (25) [11] D. Cordero-Erausquin, “Transport inequalities for log-concave measures, ≥ +⟨∇ − ⟩+ 2 | − | quantitative forms and applications,” Can. J. Math.,vol.69,no.3, pp. 481–501, 2017. where equation (24) holds for some t0 0, 1 ,andinequal- ∈[ ] [12] S. G. Bobkov, “Isoperimetric and analytic inequalities for log-concave ity (25) follows from Definition 29. probability measures,” Ann. Probab.,vol.27,no.4,pp.1903–1921, Denote w " tx (1 t)y.Applyinginequality(25)twice Oct. 1999. + − [13] G. Toscani, “A strengthened entropy power inequality for log-concave yields the two inequalities densities,” IEEE Trans. Inf. Theory,vol.61,no.12,pp.6550–6559, Dec. 2015. m(x,w) f (x) f (w) f (w), w x w x 2 (26a) [14] K. Ball and V. H. Nguyen, “Entropy jumps for isotropic log-concave ≥ +⟨∇ − ⟩+ 2 | − | random vectors and spectral gap,” Studia Math.,vol.213,no.1, m(y,w) pp. 81–96, 2012. f (y) f (w) f (w), w y w y 2. (26b) [15] K. Ball, F. Barthe, and A. Naor, “Entropy jumps in the presence of a ≥ +⟨∇ − ⟩+ 2 | − | spectral gap,” Duke Math. J.,vol.119,no.1,pp.41–63,2003. COURTADE et al.:QUANTITATIVESTABILITYOFTHEEPI 5703

Thomas A. Courtade (S'06–M'13) received the B.Sc. degree (summa cum laude) in electrical engineering from Michigan Technological University in 2007, and the M.S. and Ph.D. degrees from the University of California, Los Angeles (UCLA), in 2008 and 2012, respectively. He is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Prior to joining UC Berkeley in 2014, he was a Postdoctoral Fellow supported by the NSF Center for Science of Information.
Prof. Courtade's honors include an NSF CAREER award and a Hellman Fellowship. He also received the Distinguished Ph.D. Dissertation Award and an Excellence in Teaching Award from the UCLA Department of Electrical Engineering, and a Jack Keil Wolf Student Paper Award for the 2012 International Symposium on Information Theory.

Max Fathi received the Ph.D. degree from the Université Pierre et Marie Curie, Paris, in 2014. He is currently a CNRS researcher at the Institut de Mathématiques de Toulouse.

Ashwin Pananjady is a Ph.D. candidate in the Department of Electrical Engineering and Computer Sciences (EECS) at the University of California, Berkeley. He received the B.Tech. degree (with honors) in electrical engineering from IIT Madras in 2014. His research interests include statistics, optimization, and information theory. He is a recipient of the Governor's Gold Medal from IIT Madras.