Strengthening the Entropy Power Inequality

2016 IEEE International Symposium on Information Theory Strengthening the Entropy Power Inequality Thomas A. Courtade Department of Electrical Engineering and Computer Sciences University of California, Berkeley Email: [email protected] Abstract—We tighten the entropy power inequality (EPI) when We remark that there exist various improvements of the EPI one of the random summands is Gaussian. Our strengthening is in the literature (e.g,. [2] for log-concave densities, and [3] for closely related to strong data processing for Gaussian channels subsets), however none are notably similar to that presented and generalizes the (vector extension of) Costa’s EPI. This leads to a new reverse EPI and, as a corollary, sharpens Stam’s inequal- here. The reader is referred to the recent survey [4] for an ity relating entropy power and Fisher information. Applications overview. to network information theory are given, including a short self- A conditional version of the EPI is often useful in applica- contained proof of the converse for the two-encoder quadratic tions. Theorem 1 easily generalizes along these lines. Indeed, Gaussian source coding problem. The proof of our main result due to joint convexity of log(2x + 2y) in x; y, we obtain the is based on weak convergence and a doubling argument for establishing Gaussian optimality via rotational-invariance. following corollary of Theorem 1: Corollary 1. Suppose X; W are conditionally independent I. INTRODUCTION AND MAIN RESULT given Q, and moreover that W is conditionally Gaussian given For a random variable X with density f, the differential Q. Then, for any V satisfying X ! (X + W ) ! V jQ, entropy of X is defined by1 Z 22(h(X+W jQ)−I(X;V jQ)) h(X) = − f(x) log f(x)dx: (1) ≥ 22(h(XjQ)−I(X+W ;V jQ)) + 22h(W jQ): (5) Similarly, h(X) is defined to be the differential entropy of a As one would expect, Theorem 1 also admits a vector random vector X on Rn. Shannon’s celebrated entropy power generalization, which may be regarded as our main result: inequality (EPI) asserts that for X; W independent Theorem 2. Suppose X; W are n-dimensional random vec- 22h(X+W ) ≥ 22h(X) + 22h(W ): (2) tors that are conditionally independent given Q, and moreover that W is conditionally Gaussian given Q. Then, for any V Under the assumption that W is Gaussian, we prove the satisfying X ! (X + W) ! V jQ, following strengthening of (2): 2 (h(X+WjQ)−I(X;V jQ)) 2 2 n Theorem 1. Let X ∼ PX , and let W ∼ N(0; σ ) be 2 (h(XjQ)−I(X+W;V jQ)) 2 h(WjQ) independent of X. For any V satisfying X ! (X +W ) ! V , ≥ 2 n + 2 n : (6) 2(h(X+W )−I(X;V )) 2(h(X)−I(X+W ;V )) 2h(W ) 2 ≥ 2 + 2 : (3) The restriction of W to be conditionally Gaussian in Theo- The notation X ! (X + W ) ! V indicates that the rem 2 should not be viewed as a severe limitation. Indeed, in random variables X, X + W and V form a Markov chain, typical applications of the EPI, one of the variables is Gaussian in that order. Throughout, we write X ! Y ! V jQ to denote as surveyed by Rioul [5, Section I]. random variables X; Y; V; Q with joint distribution factoring In the following section, we demonstrate several applications of Theorem 2. In particular, we show that Costa’s EPI [6] as PXYVQ = PXQPY jXQPV jYQ. That is, X ! Y ! V form a Markov chain conditioned on Q. and its generalization [7] are immediate corollaries of Theorem To familiarize the reader with (3), we remark that the 2, along with a new reverse EPI and a sharpening of the function Stam-Gross logarithmic Sobolev inequality. Also, we will see that Theorem 2 leads to a very brief proof of the converse for gI (t; PAB) = sup fI(V ; A): I(V ; B) ≤ tg ; (4) the rate region of the quadratic Gaussian two-encoder source- V :A!B!V coding problem [8]. Finally, an application to the one-sided is the best-possible (or strong) data-processing function Gaussian interference channel is discussed. Proofs of the main (cf. [1]) for the pair (A; B) ∼ PAB since I(V ; A) ≤ results are sketched in Section III. gI (I(V ; B);PAB) for any V satisfying A ! B ! V . Adopting the notation of Theorem 1 and defining Y = X+W , II. APPLICATIONS inequality (3) may be restated as A. Generalized Costa’s EPI and a New Reverse EPI 22(h(Y )−gI (t;PXY )) ≥ 22(h(X)−t) + 22h(W ) for all t ≥ 0. Costa’s EPI [6] asserts concavity of entropy power, and has been generalized to a vector setting by Liu et al. [7]. Hence, the slack in Shannon’s EPI (2) for Gaussian W (due We demonstrate below that this generalization follows as a to non-Gaussianness of X) can be improved by choosing corollary to Theorem 2 by taking V equal to X contaminated an appropriate point on the strong data processing curve by additive Gaussian noise. In this sense, Theorem 2 may be gI (·;PXY ). In view of this, (3) might be called a strong EPI. interpreted as a further generalization of Costa’s EPI, where This work was supported by NSF Grants CCF-1528132 and CCF-0939370 the additive noise is no longer restricted to be Gaussian. First, (Center for Science of Information). we have the following new EPI for three random summands 1When the integral (1) does not exist, or if X does not have density, then (one of which is Gaussian): we adopt the convention that h(X) = −∞. 978-1-5090-1805-5/16/$31.00 ©2016 IEEE 2294 2016 IEEE International Symposium on Information Theory Theorem 3. Let X ∼ PX; Z ∼ PZ and W ∼ N(0; Σ) be Since Wagner et al.’s original proof of the sum-rate constraint, independent, n-dimensional random vectors. Then other proofs have been proposed (e.g., [15]), however all 2 (h(X+W)+h(Z+W)) known proofs are quite complex. Below, we show that the 2 n converse result for the entire rate region is a direct consequence 2 (h(X)+h(Z)) 2 (h(X+Z+W)+h(W)) ≥ 2 n + 2 n : (7) of Theorem 2, thus unifying the results of [8] and [12] under a common and succinct EPI. Proof. This is an immediate consequence of Theorem 2 by n putting V = X + Z + W and rearranging exponents. Theorem 6. [8] Let X; Y = fXi;Yigi=1 be independent The vector generalization of Costa’s EPI now follows: identically distributed pairs of jointly Gaussian random vari- n nRX ables with correlation ρ. Let φX : R ! f1;:::; 2 g and Theorem 4. [7] Let X ∼ PX and N ∼ N(0; Σ) be n nRY φY : R ! f1;:::; 2 g, and define independent, n-dimensional random vectors. For a positive 1 semidefinite matrix A I, d kX − [Xjφ (X); φ (Y)]k2 (12) X , nE E X Y 2 h(X+A1=2N) 1=n 2 h(X) 1=n 2 h(X+N) 2 n ≥ jI − Aj 2 n + jAj 2 n : 1 d kY − [Yjφ (X); φ (Y)]k2 : (13) Y , nE E X Y Proof. Let N1; N2 be independent copies of N, and put W = 1=2 1=2 q 4ρ2D A N1 and Z = (I − A) N2. Since N = Z + W in Then, for β(D) , 1 + 1 + (1−ρ2)2 , distribution, the claim follows from Theorem 3. 1 1 Theorem 3 also yields a new reverse EPI as a corollary, R ≥ log 1 − ρ2 + ρ22−2RY (14) X 2 d which sharpens Stam’s inequality. Toward this end, suppose X 1 1 2 2 −2RX X has smooth density f and define the entropy power N(X) RY ≥ log 1 − ρ + ρ 2 (15) and Fisher information J(X) as 2 dY 1 (1 − ρ2)β(d d ) 1 2 h(X) 2 X Y N(X) = 2 n J(X) = Ekr ln f(X)k : (8) RX + RY ≥ log : (16) 2πe 2 2dX dY Letting W ∼ N(0; tI) in Theorem 3 and applying de Bruijn’s Our proof hinges on a simple corollary2 of Theorem 2: Identity as t ! 0, we obtain the following reverse EPI: Proposition 1. For X; Y as above, Theorem 5. Let X ∼ PX; Z ∼ PZ be independent, n- − 2 (I(Y;U)+I(X;V jU)) 2 − 2 (I(X;U)+I(Y;V jU)) 2 dimensional random vectors with smooth densities. Then 2 n ≥ ρ 2 n + 1 − ρ nN(X + Z) ≤ N(X)N(Z)(J(X) + J(Z)) : (9) for any U; V satisfying U ! X ! Y ! V . In contrast to other reverse EPIs that tend to be more Proof. Since mutual information is invariant to scaling, we restrictive in their assumptions (e.g., log-concavity of densities may assume without loss of generality that Yi = ρXi + Zi, 2 [9]), inequality (9) applies generally and bounds the entropy where Xi ∼ N(0; 1) and Zi ∼ N(0; 1 − ρ ), independent of power N(X+Z) in terms of the marginal entropy powers and Xi. Now, Theorem 2 implies Fisher informations. We refer the reader to the survey [4] for 2 (h(YjU)−I(X;V jU)) 2 (h(ρXjU)−I(Y;V jU)) 2 h(Z) 2 n ≥ 2 n + 2 n an overview of other known reverse EPIs. 2 2 (h(XjU)−I(Y;V jU)) 2 Stam’s inequality [10] (or, equivalently, the Gaussian log- = ρ 2 n +2πe(1 − ρ ): arithmic Sobolev inequality [11]) states N(X)J(X) ≥ n.

Strengthening the Entropy Power Inequality

On Relations Between the Relative Entropy and $\Chi^ 2$-Divergence

A Lower Bound on the Differential Entropy of Log-Concave Random Vectors with Applications

Entropy Power Inequalities the American Institute of Mathematics

On the Entropy of Sums

Relative Entropy and Score Function: New Information–Estimation Relationships Through Arbitrary Additive Perturbation

Cramér-Rao and Moment-Entropy Inequalities for Renyi Entropy And

Two Remarks on Generalized Entropy Power Inequalities

Entropy Power Inequalities for Qudits M¯Arisozols University of Cambridge

On Relations Between the Relative Entropy and Χ2-Divergence, Generalizations and Applications

On the Similarity of the Entropy Power Inequality And

Inequalities in Information Theory a Brief Introduction

Rényi Entropy Power and Normal Transport