Quantitative Stability of the Entropy Power Inequality
Thomas A. Courtade, Member, IEEE, Max Fathi, and Ashwin Pananjady

Abstract—We establish quantitative stability results for the entropy power inequality (EPI). Specifically, we show that if uniformly log-concave densities nearly saturate the EPI, then they must be close to Gaussian densities in the quadratic Kantorovich-Wasserstein distance. Furthermore, if one of the densities is Gaussian and the other is log-concave, or more generally has positive spectral gap, then the deficit in the EPI can be controlled in terms of the L1-Kantorovich-Wasserstein distance or relative entropy, respectively. As a counterpoint, an example shows that the EPI can be unstable with respect to the quadratic Kantorovich-Wasserstein distance when densities are uniformly log-concave on sets of measure arbitrarily close to one. Our stability results can be extended to non-log-concave densities, provided certain regularity conditions are met. The proofs are based on mass transportation.

Index Terms—Entropy power inequality, optimal transport, stability.

I. INTRODUCTION

Let $X$ and $Y$ be independent random vectors on $\mathbb{R}^n$ with corresponding laws $\mu$ and $\nu$, each absolutely continuous with respect to Lebesgue measure. The celebrated entropy power inequality (EPI), proposed by Shannon [2] and proved by Stam [3], asserts that
\[
N(X+Y) \ge N(X) + N(Y), \tag{1}
\]
where $N(X) \equiv N(\mu) := \frac{1}{2\pi e} e^{2h(X)/n}$ denotes the entropy power of $X$, and $h(X) \equiv h(\mu) := -\int_{\mathbb{R}^n} f \log f$ is the entropy of $X$ having density $f$. For a parameter $t \in (0,1)$, let us define
\[
\delta_{\mathrm{EPI},t}(\mu,\nu) := h\big(\sqrt{t}\,X + \sqrt{1-t}\,Y\big) - \big(t\,h(X) + (1-t)\,h(Y)\big). \tag{2}
\]
Unaware of the works by Shannon, Stam and Blachman [4], Lieb [5] rediscovered the EPI by establishing $\delta_{\mathrm{EPI},t}(\mu,\nu) \ge 0$ and noting its formal equivalence to (1). The equivalence of these inequalities is reviewed in Section III-B. Due to the formal equivalence of the Shannon-Stam and Lieb inequalities, we shall generally refer to both as the EPI.

It is well known that equality is achieved in the Shannon-Stam EPI if and only if $X$ and $Y$ are Gaussian vectors with proportional covariances. Equivalently, $\delta_{\mathrm{EPI},t}(\mu,\nu)$ vanishes if and only if $\mu,\nu$ are Gaussian measures that are identical up to translation. (Lieb did not settle the cases of equality; this was done later by Carlen and Soffer [6].) However, despite the fundamental role the EPI plays in information theory and the crisp characterization of its equality cases, few stability estimates are known. Specifically, our motivating question is the following quantitative reinforcement of the equality conditions for the EPI:

If $\delta_{\mathrm{EPI},t}(\mu,\nu)$ is small, must $\mu$ and $\nu$ be 'close' to Gaussian measures, which are themselves 'close' to each other, in a precise and quantitative sense?

Toward answering this question, our main result is a dimension-free, quantitative stability estimate for the EPI. More specifically, we show that if the measures $\mu,\nu$ have uniformly log-concave densities and nearly saturate either form of the EPI, then they must also be close to Gaussian measures in the quadratic Kantorovich-Wasserstein distance. We also show that the EPI is not stable (with respect to the same criterion) in situations where the densities nearly satisfy the same regularity conditions. Other quantitative deficit estimates are obtained when one of the two variables is Gaussian and the other is log-concave or has positive spectral gap. Dimension-dependent estimates are obtained in certain more general situations.
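As a quick sanity check on (2), the deficit admits a closed form when both inputs are isotropic Gaussians: using $h(N(0,\sigma^2 I_n)) = \frac{n}{2}\log(2\pi e \sigma^2)$ (entropies in nats), one finds $\delta_{\mathrm{EPI},t} = \frac{n}{2}\big[\log(t\sigma_X^2 + (1-t)\sigma_Y^2) - t\log\sigma_X^2 - (1-t)\log\sigma_Y^2\big] \ge 0$ by concavity of the logarithm, vanishing if and only if $\sigma_X^2 = \sigma_Y^2$. The following minimal Python sketch (function names are ours, for illustration only) evaluates this closed form:

```python
import numpy as np

def gaussian_entropy(var, n=1):
    # Differential entropy of N(0, var * I_n): (n/2) * log(2*pi*e*var), in nats.
    return 0.5 * n * np.log(2 * np.pi * np.e * var)

def epi_deficit_gaussian(var_x, var_y, t, n=1):
    # For independent isotropic Gaussians X, Y, the combination
    # sqrt(t) X + sqrt(1-t) Y is N(0, (t*var_x + (1-t)*var_y) * I_n).
    h_mix = gaussian_entropy(t * var_x + (1 - t) * var_y, n)
    return h_mix - t * gaussian_entropy(var_x, n) - (1 - t) * gaussian_entropy(var_y, n)

print(epi_deficit_gaussian(1.0, 1.0, t=0.3))  # 0.0: identical Gaussians saturate the EPI
print(epi_deficit_gaussian(1.0, 4.0, t=0.3))  # > 0 by concavity of the logarithm
```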
Before stating the main results, let us first introduce some notation. We let $\mathcal{G} \equiv \mathcal{G}(\mathbb{R}^n)$ denote the set of centered Gaussian probability measures on $\mathbb{R}^n$, and let $\gamma$ denote the standard Gaussian measure on $\mathbb{R}^n$. That is,
\[
d\gamma(x) = e^{-|x|^2/2}\,\frac{dx}{(2\pi)^{n/2}}.
\]
(Explicit dependence of quantities on the ambient dimension $n$ will be suppressed in situations where our arguments are the same in all dimensions.)

Next, we recall that the $L^2$-Kantorovich-Wasserstein distance between probability measures $\mu,\nu$ is defined according to
\[
W_2(\mu,\nu) = \inf \big(\mathbb{E}|X-Y|^2\big)^{1/2},
\]
where $|\cdot|$ denotes the Euclidean metric on $\mathbb{R}^n$ and the infimum is over all couplings of $X,Y$ with marginal laws $X \sim \mu$ and $Y \sim \nu$.
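For centered Gaussian measures, $W_2$ is explicit: $W_2^2(N(0,\Sigma_1), N(0,\Sigma_2)) = \mathrm{Tr}\,\Sigma_1 + \mathrm{Tr}\,\Sigma_2 - 2\,\mathrm{Tr}\big(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2}\big)^{1/2}$, a standard closed form (the Bures distance between covariances). A short numerical sketch, with a function name of our choosing:

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_centered_gaussians(sigma1, sigma2):
    # W2^2(N(0,S1), N(0,S2)) = Tr(S1) + Tr(S2) - 2*Tr((S1^{1/2} S2 S1^{1/2})^{1/2}).
    root1 = np.real(sqrtm(sigma1))
    cross = np.real(sqrtm(root1 @ sigma2 @ root1))
    w2_sq = np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross)
    return np.sqrt(max(w2_sq, 0.0))  # clip tiny negative round-off

print(w2_centered_gaussians(np.eye(2), np.eye(2)))        # 0.0
print(w2_centered_gaussians(np.eye(2), 4.0 * np.eye(2)))  # sqrt(2): variances 1 vs 4 per axis
```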
If $X \sim \mu$ is a centered random vector, then we write $\Sigma_\mu = \mathbb{E}[XX^\top]$ to denote the covariance matrix of $X$. For a symmetric positive semidefinite matrix $\Sigma$, we write $\Sigma^{1/2}$ to denote the unique symmetric positive semidefinite matrix satisfying $\Sigma = \Sigma^{1/2}\Sigma^{1/2}$.

Remark 1: Both forms of the EPI are invariant to translation of the measures $\mu,\nu$. Thus, our persistent assumption of centered probability measures is for convenience and comes without loss of generality.

Remark 2: Without explicit mention, we assume throughout that all entropies exist in the usual Lebesgue sense. When dealing with the Shannon-Stam or Lieb inequalities, one generally needs to make this assumption. Indeed, it is possible for $h(X+Y)$ to not exist in the usual Lebesgue sense even though $h(X)$ and $h(Y)$ exist and are finite [7, Proposition 1]. That being said, our main results primarily concern log-concave distributions. As such, all involved entropies are guaranteed to exist (e.g., [8]).

Organization

The rest of this paper is organized as follows: Sections II-A and II-B describe our main stability results for log-concave densities and the relationship to previous work, respectively. Section II-C gives an example where the EPI is not stable with respect to the quadratic Kantorovich-Wasserstein distance when regularity conditions are not met. Section III gives proofs of our main results and a brief discussion of techniques, which are based on optimal mass transportation. We conclude in Section IV with extensions of our results to more general settings.

II. MAIN RESULTS

This section describes our main results, and also compares them to previously known stability estimates. Proofs are generally deferred until Section III.

A. Stability of the EPI for Log-Concave Densities

Our main result is the following:

Theorem 3: Let $\mu = e^{-\varphi}\gamma$ and $\nu = e^{-\psi}\gamma$ be centered probability measures, where $\varphi$ and $\psi$ are convex. Then
\[
\delta_{\mathrm{EPI},t}(\mu,\nu) \ge \frac{t(1-t)}{2} \inf_{\gamma_1,\gamma_2 \in \mathcal{G}} \big( W_2^2(\mu,\gamma_1) + W_2^2(\nu,\gamma_2) + W_2^2(\gamma_1,\gamma_2) \big). \tag{3}
\]

Remark 4: The notation $\mu = e^{-\varphi}\gamma$ is shorthand for $\frac{d\mu}{d\gamma} = e^{-\varphi}$. Measures of the form $\mu = e^{-\varphi}\gamma$ for convex $\varphi$ have several names in the literature.

Both sides of (3) are additive on product measures, so the estimate (3) is dimension-free, which is compatible with the additivity of $\delta_{\mathrm{EPI},t}$ on product measures.

Theorem 3 may be readily adapted to the setting of uniformly log-concave densities. Toward this end, let $\eta > 0$ and recall that $h(\eta^{1/2}X) = h(X) + \frac{1}{2}\log\eta$, so that $\delta_{\mathrm{EPI},t}(\mu,\nu)$ is invariant under the rescaling $(X,Y) \to (\eta^{1/2}X, \eta^{1/2}Y)$. Similarly, if $X \sim \mu$ has density $f$ that is uniformly log-concave in the sense that
\[
-\nabla^2 \log f \ge \eta I, \tag{4}
\]
then a change of variables reveals that the density $f_\eta$ associated with the rescaled random variable $\eta^{1/2}X$ satisfies $-\nabla^2 \log f_\eta \ge I$. In particular, $f_\eta\,dx = e^{-\varphi}\,d\gamma$ for some convex function $\varphi$. Thus, Theorem 3 is equivalent to the following:

Corollary 5: If $\mu$ and $\nu$ are centered probability measures with densities satisfying (4), then
\[
\delta_{\mathrm{EPI},t}(\mu,\nu) \ge \eta\,\frac{t(1-t)}{2} \inf_{\gamma_1,\gamma_2 \in \mathcal{G}} \big( W_2^2(\mu,\gamma_1) + W_2^2(\nu,\gamma_2) + W_2^2(\gamma_1,\gamma_2) \big).
\]
This result will also apply to certain families of non-log-concave measures; see Remark 17.

For convenience, let $d^2_{W_2}(\mu) := \inf_{\gamma_0 \in \mathcal{G}} W_2^2(\mu,\gamma_0)$ denote the squared $W_2$-distance from $\mu$ to the set of centered Gaussian measures. Using the inequality $(a+b+c)^2 \le 3(a^2+b^2+c^2)$ and the triangle inequality for $W_2$, we may conclude a weaker, but potentially more convenient, variant of Corollary 5.

Corollary 6: If $\mu$ and $\nu$ are centered probability measures with densities satisfying (4), then
\[
\delta_{\mathrm{EPI},t}(\mu,\nu) \ge \eta\,\frac{t(1-t)}{8} \big( d^2_{W_2}(\mu) + d^2_{W_2}(\nu) + W_2^2(\mu,\nu) \big). \tag{5}
\]

The Shannon-Stam form of the entropy power inequality (1) is oftentimes preferred to Lieb's inequality for applications in information theory. Starting with Corollary 6, we may establish an analogous estimate for the Shannon-Stam EPI. Recall first that (1) attains equality if and only if $\mu,\nu$ are Gaussian with proportional covariance matrices. Motivated by this, we define the quantity
\[
d_F^2(\mu,\nu) := \inf_{\theta \in (0,1)} \big\| \sqrt{\theta}\,\Sigma_\mu^{1/2} - \sqrt{1-\theta}\,\Sigma_\nu^{1/2} \big\|_F^2, \tag{6}
\]
where $\|A\|_F := \sqrt{\mathrm{Tr}(A^\top A)}$ denotes the Frobenius (or Hilbert-Schmidt) norm of a real matrix $A$.
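The infimum in (6) is over a single scalar $\theta$, so $d_F$ can be evaluated numerically from the covariance matrices alone. The sketch below (function names ours, for illustration) does so and shows that $d_F$ vanishes when the covariances are proportional, consistent with the equality condition of (1):

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import minimize_scalar

def d_frobenius(sigma_mu, sigma_nu):
    # d_F(mu, nu): minimize || sqrt(theta) * Sigma_mu^{1/2}
    #                        - sqrt(1-theta) * Sigma_nu^{1/2} ||_F over theta in (0,1).
    a = np.real(sqrtm(sigma_mu))
    b = np.real(sqrtm(sigma_nu))
    obj = lambda th: np.linalg.norm(np.sqrt(th) * a - np.sqrt(1.0 - th) * b, "fro") ** 2
    res = minimize_scalar(obj, bounds=(1e-9, 1.0 - 1e-9), method="bounded")
    return np.sqrt(res.fun)

# Proportional covariances: theta absorbs the scaling (theta = c/(1+c) for
# Sigma_nu = c * Sigma_mu), so d_F vanishes, matching the equality case of (1).
print(d_frobenius(np.eye(2), 4.0 * np.eye(2)))      # ~ 0
print(d_frobenius(np.diag([1.0, 4.0]), np.eye(2)))  # > 0: not proportional
```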