
Strengthening the Entropy Power Inequality

Thomas A. Courtade
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Email: [email protected]

Abstract—We tighten the entropy power inequality (EPI) when one of the random summands is Gaussian. Our strengthening is closely related to strong data processing for Gaussian channels and generalizes the (vector extension of) Costa's EPI. This leads to a new reverse EPI and, as a corollary, sharpens Stam's inequality relating entropy power and Fisher information. Applications to network information theory are given, including a short self-contained proof of the converse for the two-encoder quadratic Gaussian source coding problem. The proof of our main result is based on weak convergence and a doubling argument for establishing Gaussian optimality via rotational-invariance.

This work was supported by NSF Grants CCF-1528132 and CCF-0939370 (Center for Science of Information).

I. INTRODUCTION AND MAIN RESULT

For a random variable X with density f, the differential entropy of X is defined by¹

h(X) = −∫ f(x) log f(x) dx.   (1)

Similarly, h(X) is defined to be the differential entropy of a random vector X on Rⁿ.

¹ When the integral (1) does not exist, or if X does not have a density, we adopt the convention that h(X) = −∞.

Shannon's celebrated entropy power inequality (EPI) asserts that for X, W independent

2^{2h(X+W)} ≥ 2^{2h(X)} + 2^{2h(W)}.   (2)

Under the assumption that W is Gaussian, we prove the following strengthening of (2):

Theorem 1. Let X ∼ P_X, and let W ∼ N(0, σ²) be independent of X. For any V satisfying X → (X + W) → V,

2^{2(h(X+W) − I(X;V))} ≥ 2^{2(h(X) − I(X+W;V))} + 2^{2h(W)}.   (3)

The notation X → (X + W) → V indicates that the random variables X, X + W and V form a Markov chain, in that order. Throughout, we write X → Y → V | Q to denote random variables X, Y, V, Q with joint distribution factoring as P_{XYVQ} = P_{XQ} P_{Y|XQ} P_{V|YQ}. That is, X → Y → V form a Markov chain conditioned on Q.

To familiarize the reader with (3), we remark that the function

g_I(t, P_{AB}) = sup_{V : A→B→V} { I(V; A) : I(V; B) ≤ t }   (4)

is the best-possible (or strong) data-processing function (cf. [1]) for the pair (A, B) ∼ P_{AB}, since I(V; A) ≤ g_I(I(V; B), P_{AB}) for any V satisfying A → B → V. Adopting the notation of Theorem 1 and defining Y = X + W, inequality (3) may be restated as

2^{2(h(Y) − g_I(t, P_{XY}))} ≥ 2^{2(h(X) − t)} + 2^{2h(W)}   for all t ≥ 0.

Hence, the slack in Shannon's EPI (2) for Gaussian W (due to non-Gaussianness of X) can be improved by choosing an appropriate point on the strong data processing curve g_I(·, P_{XY}). In view of this, (3) might be called a strong EPI.

We remark that there exist various improvements of the EPI in the literature (e.g., [2] for log-concave densities, and [3] for subsets); however, none are notably similar to that presented here. The reader is referred to the recent survey [4] for an overview.

A conditional version of the EPI is often useful in applications. Theorem 1 easily generalizes along these lines. Indeed, due to joint convexity of log(2^x + 2^y) in (x, y), we obtain the following corollary of Theorem 1:

Corollary 1. Suppose X, W are conditionally independent given Q, and moreover that W is conditionally Gaussian given Q. Then, for any V satisfying X → (X + W) → V | Q,

2^{2(h(X+W|Q) − I(X;V|Q))} ≥ 2^{2(h(X|Q) − I(X+W;V|Q))} + 2^{2h(W|Q)}.   (5)

As one would expect, Theorem 1 also admits a vector generalization, which may be regarded as our main result:

Theorem 2. Suppose X, W are n-dimensional random vectors that are conditionally independent given Q, and moreover that W is conditionally Gaussian given Q. Then, for any V satisfying X → (X + W) → V | Q,

2^{(2/n)(h(X+W|Q) − I(X;V|Q))} ≥ 2^{(2/n)(h(X|Q) − I(X+W;V|Q))} + 2^{(2/n)h(W|Q)}.   (6)

The restriction of W to be conditionally Gaussian in Theorem 2 should not be viewed as a severe limitation. Indeed, in typical applications of the EPI, one of the variables is Gaussian, as surveyed by Rioul [5, Section I].
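As a quick sanity check (this calculation does not appear in the original text), (3) holds with equality when X is Gaussian and V is a Gaussian observation of the sum. Suppose X ∼ N(0, a), W ∼ N(0, b), and V = X + W + N with N ∼ N(0, t) independent of (X, W). Then

2^{2(h(X+W) − I(X;V))} = 2πe (a + b) · (b + t)/(a + b + t),
2^{2(h(X) − I(X+W;V))} + 2^{2h(W)} = 2πe · a t/(a + b + t) + 2πe · b = 2πe (a + b)(b + t)/(a + b + t),

so both sides of (3) coincide for every noise level t ≥ 0; in particular, Gaussian X with a Gaussian auxiliary V saturates (3), not only the classical EPI (2).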

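To spell out the convexity step behind Corollary 1 (a short argument, included here only for completeness): apply (3) conditionally on Q = q (translations do not affect any of the terms, so the conditional mean of W is immaterial), write a_q = 2(h(X|Q=q) − I(X+W;V|Q=q)) and b_q = 2h(W|Q=q), and take logarithms to obtain 2(h(X+W|Q=q) − I(X;V|Q=q)) ≥ log₂(2^{a_q} + 2^{b_q}). Averaging over Q and using Jensen's inequality for the jointly convex function (x, y) ↦ log₂(2^x + 2^y) gives

2(h(X+W|Q) − I(X;V|Q)) ≥ E_Q[ log₂(2^{a_Q} + 2^{b_Q}) ] ≥ log₂( 2^{2(h(X|Q) − I(X+W;V|Q))} + 2^{2h(W|Q)} ),

which is (5) after exponentiating.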

In the following section, we demonstrate several applications of Theorem 2. In particular, we show that Costa's EPI [6] and its generalization [7] are immediate corollaries of Theorem 2, along with a new reverse EPI and a sharpening of the Stam-Gross logarithmic Sobolev inequality. Also, we will see that Theorem 2 leads to a very brief proof of the converse for the rate region of the quadratic Gaussian two-encoder source-coding problem [8]. Finally, an application to the one-sided Gaussian interference channel is discussed. Proofs of the main results are sketched in Section III.

II. APPLICATIONS

A. Generalized Costa's EPI and a New Reverse EPI

Costa's EPI [6] asserts concavity of entropy power, and has been generalized to a vector setting by Liu et al. [7]. We demonstrate below that this generalization follows as a corollary to Theorem 2 by taking V equal to X contaminated by additive Gaussian noise. In this sense, Theorem 2 may be interpreted as a further generalization of Costa's EPI, where the additive noise is no longer restricted to be Gaussian. First, we have the following new EPI for three random summands (one of which is Gaussian):

Theorem 3. Let X ∼ P_X, Z ∼ P_Z and W ∼ N(0, Σ) be independent, n-dimensional random vectors. Then

2^{(2/n)(h(X+W) + h(Z+W))} ≥ 2^{(2/n)(h(X) + h(Z))} + 2^{(2/n)(h(X+Z+W) + h(W))}.   (7)

Proof. This is an immediate consequence of Theorem 2 by putting V = X + Z + W and rearranging exponents.

The vector generalization of Costa's EPI now follows:

Theorem 4. [7] Let X ∼ P_X and N ∼ N(0, Σ) be independent, n-dimensional random vectors. For a positive semidefinite matrix A ⪯ I,

2^{(2/n)h(X + A^{1/2}N)} ≥ |I − A|^{1/n} 2^{(2/n)h(X)} + |A|^{1/n} 2^{(2/n)h(X+N)}.

Proof. Let N₁, N₂ be independent copies of N, and put W = A^{1/2}N₁ and Z = (I − A)^{1/2}N₂. Since N = Z + W in distribution, the claim follows from Theorem 3.

Theorem 3 also yields a new reverse EPI as a corollary, which sharpens Stam's inequality. Toward this end, suppose X has smooth density f and define the entropy power N(X) and Fisher information J(X) as

N(X) = (1/(2πe)) 2^{(2/n)h(X)},   J(X) = E‖∇ ln f(X)‖².   (8)

Letting W ∼ N(0, tI) in Theorem 3 and applying de Bruijn's identity as t → 0, we obtain the following reverse EPI:

Theorem 5. Let X ∼ P_X, Z ∼ P_Z be independent, n-dimensional random vectors with smooth densities. Then

n N(X + Z) ≤ N(X) N(Z) ( J(X) + J(Z) ).   (9)

In contrast to other reverse EPIs that tend to be more restrictive in their assumptions (e.g., log-concavity of densities [9]), inequality (9) applies generally and bounds the entropy power N(X+Z) in terms of the marginal entropy powers and Fisher informations. We refer the reader to the survey [4] for an overview of other known reverse EPIs.

Stam's inequality [10] (or, equivalently, the Gaussian logarithmic Sobolev inequality [11]) states N(X)J(X) ≥ n. By taking X, Z to be IID in (9), we have sharpened Stam's inequality:

N(X)J(X) ≥ n · N( (X + Z)/√2 ) / N(X).   (10)

Finally, we note that Stam's inequality controls the deficit in the classical EPI in the following sense: if both X and Z nearly saturate Stam's inequality, then (9) can be rearranged to show the classical EPI will also be nearly saturated for the sum X + Z. A similar statement holds for the convolution inequality for Fisher information. Indeed, applying Stam's inequality to the sum X + Z, inequality (9) yields

1/J(X+Z) ≤ ( 1/J(X) + 1/J(Z) ) p(X) p(Z),   (11)

where p(X) := (1/n) N(X)J(X) ≥ 1, and p(Z) is defined similarly.
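For the reader's convenience, the short calculations behind (10) and (11) are spelled out below; they are implicit in the text above, and use only (9), Stam's inequality, and the scaling identity N(aX) = a²N(X). Taking Z to be an independent copy of X in (9) gives

n N(X+Z) ≤ 2 N(X)² J(X),   and since N(X+Z) = 2 N( (X+Z)/√2 ),

this rearranges to (10); the classical EPI gives N((X+Z)/√2) ≥ N(X), so (10) indeed sharpens Stam's inequality N(X)J(X) ≥ n. For (11), Stam's inequality applied to the sum gives N(X+Z) ≥ n/J(X+Z); substituting into (9) and dividing by n² yields

1/J(X+Z) ≤ N(X)N(Z)( J(X) + J(Z) )/n² = ( 1/J(X) + 1/J(Z) ) · (N(X)J(X)/n) · (N(Z)J(Z)/n) = ( 1/J(X) + 1/J(Z) ) p(X) p(Z).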


B. Two-Encoder Quadratic Gaussian Source Coding

The converse for the two-encoder quadratic Gaussian source coding problem was a longstanding open problem in the field of network information theory until its ultimate resolution by Wagner et al. [8]. Wagner et al.'s work built upon Oohama's earlier solution to the one-helper problem [12] and the independent solutions to the Gaussian CEO problem [13], [14]. Since Wagner et al.'s original proof of the sum-rate constraint, other proofs have been proposed (e.g., [15]); however, all known proofs are quite complex. Below, we show that the converse result for the entire rate region is a direct consequence of Theorem 2, thus unifying the results of [8] and [12] under a common and succinct EPI.

Theorem 6. [8] Let (X, Y) = {(Xᵢ, Yᵢ)}ᵢ₌₁ⁿ be independent, identically distributed pairs of jointly Gaussian random variables with correlation ρ. Let φ_X : Rⁿ → {1, ..., 2^{nR_X}} and φ_Y : Rⁿ → {1, ..., 2^{nR_Y}}, and define

d_X ≜ (1/n) E‖X − E[X | φ_X(X), φ_Y(Y)]‖²   (12)
d_Y ≜ (1/n) E‖Y − E[Y | φ_X(X), φ_Y(Y)]‖².   (13)

Then, for β(D) ≜ 1 + √(1 + 4ρ²D/(1 − ρ²)²),

R_X ≥ (1/2) log( (1/d_X)(1 − ρ² + ρ² 2^{−2R_Y}) )   (14)
R_Y ≥ (1/2) log( (1/d_Y)(1 − ρ² + ρ² 2^{−2R_X}) )   (15)
R_X + R_Y ≥ (1/2) log( (1 − ρ²)β(d_X d_Y) / (2 d_X d_Y) ).   (16)

Our proof hinges on a simple corollary of Theorem 2 (Proposition 1 below, first established by the author and Jiao in [16]):

Proposition 1. For X, Y as above,

2^{−(2/n)(I(Y;U) + I(X;V|U))} ≥ ρ² 2^{−(2/n)(I(X;U) + I(Y;V|U))} + 1 − ρ²

for any U, V satisfying U → X → Y → V.

Proof. Since mutual information is invariant to scaling, we may assume without loss of generality that Yᵢ = ρXᵢ + Zᵢ, where Xᵢ ∼ N(0, 1) and Zᵢ ∼ N(0, 1 − ρ²), independent of Xᵢ. Now, Theorem 2 implies

2^{(2/n)(h(Y|U) − I(X;V|U))} ≥ 2^{(2/n)(h(ρX|U) − I(Y;V|U))} + 2^{(2/n)h(Z)}
  = ρ² 2^{(2/n)(h(X|U) − I(Y;V|U))} + 2πe(1 − ρ²).

Since 2^{−(2/n)h(Y)} = 2^{−(2/n)h(X)} = 1/(2πe), multiplying through by 1/(2πe) establishes the claim.

Proof of Theorem 6. For convenience, put U = φ_X(X) and V = φ_Y(Y). Using the Markov relationship U → X → Y → V, we may rearrange the exponents in Proposition 1 to obtain the equivalent inequality

2^{−(2/n)(I(X;U,V) + I(Y;U,V))} ≥ 2^{−(2/n)I(X,Y;U,V)} ( 1 − ρ² + ρ² 2^{−(2/n)I(X,Y;U,V)} ).   (17)

The left- and right-hand sides of (17) are monotone decreasing in (1/n)(I(X;U,V) + I(Y;U,V)) and (1/n)I(X,Y;U,V), respectively. Therefore, if (1/n)I(X,Y;U,V) ≤ R and

(1/n)( I(X;U,V) + I(Y;U,V) ) ≥ (1/2) log(1/D)   (18)

for some pair (R, D), then we have D ≥ 2^{−2R}(1 − ρ² + ρ² 2^{−2R}), which is a quadratic inequality with respect to the term 2^{−2R}. This is easily solved using the quadratic formula to obtain

2^{−2R} ≤ 2D / ( (1 − ρ²)β(D) )  ⟹  R ≥ (1/2) log( (1 − ρ²)β(D) / (2D) ),

where β(D) ≜ 1 + √(1 + 4ρ²D/(1 − ρ²)²). Note that Jensen's inequality and the maximum-entropy property of Gaussians imply (1/n)I(X;U,V) ≥ (1/2) log(1/d_X) and (1/n)I(Y;U,V) ≥ (1/2) log(1/d_Y), so that

(1/n)( I(X;U,V) + I(Y;U,V) ) ≥ (1/2) log( 1/(d_X d_Y) ),   (19)

establishing (16) since (1/n)I(X,Y;U,V) ≤ (1/n)(H(U) + H(V)) ≤ R_X + R_Y. Similarly, Proposition 1 implies

2^{2R_X + log d_X} ≥ 2^{(2/n)(I(X;U|V) − I(X;U,V))} = 2^{−(2/n)I(X;V)}   (20)
  ≥ (1 − ρ²) + ρ² 2^{−(2/n)I(Y;V)}   (21)
  ≥ (1 − ρ²) + ρ² 2^{−2R_Y}.   (22)

Rearranging (and symmetry) yields (14)-(15).

C. One-sided Gaussian Interference Channel

We now briefly discuss how Theorem 2 might be applied to the interference channel. (Costa's EPI plays a role in GIC converse results, e.g., [17]; since we have seen that Theorem 2 subsumes Costa's EPI, the GIC is a natural application.) Recall that the one-sided Gaussian interference channel (GIC) is a memoryless channel, with input-output relationship given by

Y₁ = X₁ + W₁   (23)
Y₂ = αY₁ + X₂ + W₂,   (24)

where Xᵢ and Yᵢ are the channel inputs and observations corresponding to Encoder i and Decoder i, respectively, for i = 1, 2. Here, W₁ ∼ N(0, 1) and W₂ ∼ N(0, 1 − α²) are independent of each other and of the channel inputs X₁, X₂. We assume |α| < 1 since the capacity is known in the strong interference regime of |α| > 1. Observe that we have expressed the one-sided GIC in degraded form, which has capacity region identical to the corresponding non-degraded version [18]. The capacity region of the one-sided GIC remains unknown in the regime of |α| < 1.

For convenience, define Y₀ = αY₁ + W₂, and let C(α, P₁, P₂) denote the set of achievable rate pairs (R₁, R₂) for the one-sided GIC described above, when user i is subject to power constraint Pᵢ, i = 1, 2. See [19] for formal definitions.

Theorem 7. (R₁, R₂) ∈ C(α, P₁, P₂) only if

2^{−2R₂ + o(1)} ≥ 2^{−(2/n)I(X₁ⁿ,X₂ⁿ;Y₂ⁿ)} × sup_{Vⁿ : Y₁ⁿ → Y₀ⁿ → Vⁿ} { α² 2^{2R₁ − (2/n)I(Y₀ⁿ;Vⁿ|Y₁ⁿ)} + (1 − α²) 2^{(2/n)I(Y₁ⁿ;Vⁿ)} },

for some P_{X₁ⁿX₂ⁿ} = P_{X₁ⁿ}P_{X₂ⁿ} satisfying E[‖Xᵢⁿ‖²] ≤ nPᵢ, i = 1, 2.

Proof. The proof is a direct consequence of Fano's inequality and Theorem 2 applied to any Vⁿ such that Y₁ⁿ → Y₂ⁿ → Vⁿ | X₂ⁿ. Since I(Y₂ⁿ;Vⁿ|X₂ⁿ) = I(Y₀ⁿ;Vⁿ,X₂ⁿ) and I(Y₁ⁿ;Vⁿ|X₂ⁿ) = I(Y₁ⁿ;Vⁿ,X₂ⁿ), the supremum can be taken over Vⁿ : Y₁ⁿ → Y₀ⁿ → Vⁿ instead of Y₁ⁿ → Y₂ⁿ → Vⁿ | X₂ⁿ.

The Han-Kobayashi achievable region [19], [20] evaluated for Gaussian inputs (without power control) can be expressed as the set of rate pairs (R₁, R₂) satisfying Rᵢ ≤ (1/2) log(1 + Pᵢ), i = 1, 2, and

2^{−2R₂} ≥ ( α² P₂ / ((P₂ + 1 − α²)(1 + α²P₁ + P₂)) ) 2^{2R₁} + (1 − α²)/(P₂ + 1 − α²).   (25)

Interestingly, (25) takes a similar form to the outer bound of Theorem 7; however, it is known that transmission without power control is suboptimal for the Gaussian Z-interference channel in general [21], [22]. Nevertheless, it may be possible to identify a random variable Vⁿ in the supremum in Theorem 7, possibly depending on X₂ⁿ, which ultimately improves known bounds.

III. PROOF SKETCH

Here we sketch the proofs of Theorems 1 and 2; complete details can be found in [23]. For random variables X, Y ∼ P_{XY}, we write X|{Y = y} to denote the random variable X conditional on {Y = y}. A sequence of random variables {Xₙ, n ≥ 1} will be denoted by the shorthand {Xₙ}, and convergence of {Xₙ} in distribution to a random variable X∗ is written Xₙ →_D X∗.

We begin with an optimization problem: For a random variable X ∼ P_X, let Y be defined via the additive Gaussian noise channel P_{Y|X} given by Y = √snr X + Z, where Z ∼ N(0, 1), and define the family of functionals

s_λ(X, snr) = −h(X) + λ h(Y) + inf_{V : X→Y→V} { I(Y; V) − λ I(X; V) }   (26)

parameterized by λ ≥ 1. For (X, Y, Q) ∼ P_{XQ}P_{Y|X}, define the functional of P_{XQ}

s_λ(X, snr|Q) = −h(X|Q) + λ h(Y|Q) + inf_{V : X→Y→V|Q} { I(Y; V|Q) − λ I(X; V|Q) }.   (27)

We consider the optimization problem

V_λ(snr) = inf_{P_{XQ} : E[X²] ≤ 1} s_λ(X, snr|Q).   (28)

In (28), it suffices to consider Q ∈ Q with |Q| ≤ 2. By Fenchel-Caratheodory-Bunt, this is sufficient to preserve the values of E[X²] = Σ_q p(q) E[X²|Q = q] and s_λ(X, snr|Q) = Σ_q p(q) s_λ(X, snr|Q = q).

The essence of Theorem 1 is the explicit characterization:

Theorem 8.

V_λ(snr) = (1/2)[ λ log( λ2πe/(λ−1) ) − log( 2πe/(λ−1) ) + log(snr) ]   if snr ≥ 1/(λ−1),
V_λ(snr) = (1/2)[ λ log( 2πe(1 + snr) ) − log( 2πe ) ]   if snr ≤ 1/(λ−1).
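As a quick consistency check on Theorem 8 (not part of the original statement), the two branches agree at the threshold snr = 1/(λ−1): the first branch evaluates to

(1/2)[ λ log( λ2πe/(λ−1) ) − log( 2πe/(λ−1) ) + log( 1/(λ−1) ) ] = (1/2)[ λ log( λ2πe/(λ−1) ) − log(2πe) ],

while the second branch gives (1/2)[ λ log( 2πe(1 + 1/(λ−1)) ) − log(2πe) ] = (1/2)[ λ log( λ2πe/(λ−1) ) − log(2πe) ], so V_λ(·) is continuous there.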

2296 2016 IEEE International Symposium on Information Theory

The idea behind proving Theorem 8 is that we only need to consider Gaussian random variables in optimization problem (28). Our argument is based on weak convergence and draws inspiration from a technique employed by Geng and Nair for establishing Gaussian optimality via rotational-invariance [24], which has roots in a doubling trick applied successfully in the study of functional inequalities (e.g., [25]–[27]). The critical ingredients can be summarized as follows:

Lemma 1. There exists a sequence {Xₙ, Qₙ} satisfying

lim_{n→∞} s_λ(Xₙ, snr|Qₙ) = V_λ(snr)   (29)
and E[Xₙ²] ≤ 1,   (30)

and (Xₙ, Qₙ) →_D (X∗, Q∗), with X∗|{Q∗ = q} ∼ N(μ_q, σ_X²) for P_{Q∗}-a.e. q, with σ_X² ≤ 1 not depending on q.

Lemma 2. If Xₙ →_D X∗ ∼ N(μ, σ_X²) and sup_n E[Xₙ²] < ∞, then lim inf_{n→∞} s_λ(Xₙ, snr) ≥ s_λ(X∗, snr).

The proof of Lemma 1 is sketched in Section III-A. The proof of Lemma 2 is more mechanical (though nontrivial) and is omitted due to space constraints.

With the above lemmas in hand, the proof of Theorem 8 follows from calculus and the classical EPI. We require the following proposition, which is an easy corollary of the conditional EPI.

Proposition 2. Let X ∼ N(0, γ) and Z ∼ N(0, 1) be independent, and define Y = √snr X + Z. Then for λ ≥ 1,

inf_{V : X→Y→V} { I(Y; V) − λ I(X; V) } = (1/2)[ log( (λ−1)γ snr ) − λ log( ((λ−1)/λ)(1 + γ snr) ) ]   if γ snr ≥ 1/(λ−1),
inf_{V : X→Y→V} { I(Y; V) − λ I(X; V) } = 0   if γ snr ≤ 1/(λ−1).

Proof of Theorem 8. Noting that s_λ(X, snr) is invariant to translations of E[X], it follows from Lemmas 1 and 2 that

V_λ(snr) = inf_{0 ≤ γ ≤ 1} s_λ(X_γ, snr),   where X_γ ∼ N(0, γ).

Recalling the definition of s_λ(·, snr), Proposition 2 implies

s_λ(X_γ, snr) = (1/2)[ λ log( λ2πe/(λ−1) ) − log( 2πe/(λ−1) ) + log(snr) ]   if γ snr ≥ 1/(λ−1),
s_λ(X_γ, snr) = (1/2)[ λ log( 2πe(1 + γ snr) ) − log( 2πeγ ) ]   if γ snr ≤ 1/(λ−1).

Differentiating with respect to γ, we find that (1/2)[ λ log(2πe(1 + γ snr)) − log(2πeγ) ] is decreasing in γ provided γ snr ≤ 1/(λ−1). Therefore, taking γ = 1 minimizes s_λ(X_γ, snr) over the interval γ ∈ [0, 1], proving the claim.

Given the explicit characterization of V_λ(snr), a dual form of (3), we are now in a position to prove Theorem 1.

Proof of Theorem 1. We first establish (3) under the additional assumption that E[X²] < ∞. Toward this goal, since mutual information is invariant to scaling, it is sufficient to prove that, for Y = √snr X + Z with E[X²] ≤ 1 and Z ∼ N(0, 1) independent of X, we have

2^{2(h(Y) − I(X;V))} ≥ snr · 2^{2(h(X) − I(Y;V))} + 2^{2h(Z)}   (31)

for V satisfying X → Y → V. Multiplying both sides by σ² and choosing snr := Var(X)/σ² gives the desired inequality (3) when E[X²] < ∞. Thus, to prove (31), observe by definition of V_λ(snr) that

−h(X) + I(Y; V) ≥ λ( I(X; V) − h(Y) ) + V_λ(snr).   (32)

Minimizing the RHS over λ proves the inequality. In particular, elementary calculus shows that the RHS of (32) is minimized when λ satisfies λ/(λ−1) = 2πe · 2^{−2(I(X;V) − h(Y))}. Substituting into (32) and recalling 2^{2h(Z)} = 2πe proves (31).

The assumption that E[X²] < ∞ can be eliminated via a truncation argument. See [23].
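For completeness, the scaling step invoked at the start of the proof of Theorem 1 can be written out as follows (a routine verification; it is only asserted in the text above). Assume without loss of generality that E[X] = 0 and 0 < Var(X) < ∞, since both sides of (3) are invariant to translations of X. Set snr := Var(X)/σ² and apply (31) to X̃ = X/√Var(X) and Z = W/σ, so that Y = √snr X̃ + Z = (X + W)/σ. Using h(aU) = h(U) + log|a| and the scale invariance of mutual information, (31) reads

2^{2(h(X+W) − log σ − I(X;V))} ≥ snr · 2^{2(h(X) − (1/2) log Var(X) − I(X+W;V))} + 2^{2(h(W) − log σ)};

multiplying through by σ² and noting σ² · snr / Var(X) = 1 gives exactly (3).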

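A remark on Proposition 2 (this computation is not part of the paper's argument and only verifies one direction): when γ snr > 1/(λ−1), the value in the first case is already attained by restricting to Gaussian auxiliaries V = Y + N with N ∼ N(0, t) independent of (X, Y). Writing a = γ snr, one has I(Y;V) = (1/2) log( (a+1+t)/t ) and I(X;V) = (1/2) log( (a+1+t)/(1+t) ), and the choice t = (a+1)/( a(λ−1) − 1 ) gives

I(Y;V) − λ I(X;V) = (1/2)[ log( (λ−1)γ snr ) − λ log( ((λ−1)/λ)(1 + γ snr) ) ],

so the infimum in Proposition 2 is at most the stated value; the matching lower bound is the content of the proposition itself (see [23]).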

A. Proof of Lemma 1

We now sketch the proof of Lemma 1, the most significant technical ingredient needed to establish Theorem 8. We begin with two lemmas, stated without proof (see [23] for details). The first is a superadditivity property enjoyed by s_λ(X, snr|Q) on doubling, and the second is a characterization of the normal distribution in the context of weak convergence.

Lemma 3. Let P_{Y|X} be the Gaussian channel Y = √snr X + Z, where Z ∼ N(0, 1) is independent of X. Now, suppose (X, Y, Q) ∼ P_{XQ}P_{Y|X}, and let (X₁, Y₁, Q₁) and (X₂, Y₂, Q₂) denote two independent copies of (X, Y, Q). Define

X₊ = (1/√2)(X₁ + X₂),   X₋ = (1/√2)(X₁ − X₂),   (33)

and in a similar manner, define Y₊, Y₋. Letting Q = (Q₁, Q₂), we have for λ ≥ 1

2 s_λ(X, snr|Q) ≥ s_λ(X₊, snr|X₋, Q) + s_λ(X₋, snr|Y₊, Q)
2 s_λ(X, snr|Q) ≥ s_λ(X₊, snr|Y₋, Q) + s_λ(X₋, snr|X₊, Q).

Lemma 4. Suppose (X_{1,n}, X_{2,n}) →_D (X_{1,∗}, X_{2,∗}) with sup_n E[X_{i,n}²] < ∞ for i = 1, 2. Let (Z₁, Z₂) ∼ N(0, σ²I) be pairwise independent of (X_{1,n}, X_{2,n}) and, for i = 1, 2, define Y_{i,n} = X_{i,n} + Zᵢ. If X_{1,n}, X_{2,n} are independent and

lim inf_{n→∞} I( X_{1,n} + X_{2,n}; X_{1,n} − X_{2,n} | Y_{1,n}, Y_{2,n} ) = 0,   (34)

then X_{1,∗}, X_{2,∗} are independent Gaussian random variables with identical variances.

The proof of Lemma 4 is omitted due to space constraints. However, we point out that it is similar in spirit to a famous result of Bernstein: If X₁, X₂ are independent random variables such that X₁ + X₂ and X₁ − X₂ are independent, then X₁ and X₂ are normal, with identical variances.

Proof of Lemma 1. For convenience, we will refer to any sequence {Xₙ, Qₙ} satisfying (29)-(30) as admissible. Since s_λ(Xₙ, snr|Qₙ) is invariant to translations of the mean of Xₙ, we may restrict our attention to admissible sequences satisfying E[Xₙ] = 0 without any loss of generality.

Begin by letting {Xₙ, Qₙ} be an admissible sequence with the property that

lim_{n→∞} ( h(Yₙ|Qₙ) − h(Xₙ|Qₙ) ) ≤ lim inf_{n→∞} ( h(Y′ₙ|Q′ₙ) − h(X′ₙ|Q′ₙ) )   (35)

for any other admissible sequence {X′ₙ, Q′ₙ}. Such a sequence can always be constructed by a diagonalization argument, and therefore exists. Moreover, we can show that the LHS of (35) is finite due to V_λ(snr) < ∞ and the fact that conditioning reduces entropy.

By the same logic as in the remark following (28), we may assume that Qₙ ∈ Q, where |Q| = 3, since this is sufficient to preserve the values of E[Xₙ²], s_λ(Xₙ, snr|Qₙ) and (h(Yₙ|Qₙ) − h(Xₙ|Qₙ)). Thus, since Q is finite and E[Xₙ²] ≤ 1, the sequence {Xₙ, Qₙ} is tight. By Prokhorov's theorem, we may assume that there is some (X∗, Q∗) for which (Xₙ, Qₙ) →_D (X∗, Q∗) by restricting our attention to a subsequence of {Xₙ, Qₙ} if necessary. Moreover, E[X∗²] ≤ lim inf_{n→∞} E[Xₙ²] ≤ 1 by Fatou's lemma.

Next, for a given n, let (X_{1,n}, Q_{1,n}) and (X_{2,n}, Q_{2,n}) denote two independent copies of (Xₙ, Qₙ). Define

X_{+,n} = (1/√2)(X_{1,n} + X_{2,n}),   X_{−,n} = (1/√2)(X_{1,n} − X_{2,n}).

In a similar manner, define Y_{+,n}, Y_{−,n}, and put Qₙ = (Q_{1,n}, Q_{2,n}). Applying Lemma 3, we obtain

2 s_λ(Xₙ, snr|Qₙ) ≥ s_λ(X_{+,n}, snr|X_{−,n}, Qₙ) + s_λ(X_{−,n}, snr|Y_{+,n}, Qₙ),   (36)

and the symmetric inequality

2 s_λ(Xₙ, snr|Qₙ) ≥ s_λ(X_{+,n}, snr|Y_{−,n}, Qₙ) + s_λ(X_{−,n}, snr|X_{+,n}, Qₙ).   (37)

By independence of X_{1,n} and X_{2,n} and the assumption that E[Xₙ] = 0, we have

E[X_{+,n}²] = E[X_{−,n}²] = (1/2)E[X_{1,n}²] + (1/2)E[X_{2,n}²] = E[Xₙ²] ≤ 1.

Hence, it follows that the terms in the RHS of (36) and the RHS of (37) are each lower bounded by V_λ(snr). Since lim_{n→∞} s_λ(Xₙ, snr|Qₙ) = V_λ(snr) by definition, we find that

V_λ(snr) = lim_{n→∞} (1/2)( s_λ(X_{+,n}, snr|Y_{−,n}, Qₙ) + s_λ(X_{−,n}, snr|Y_{+,n}, Qₙ) ).

In particular, by letting the random pair (X′ₙ, Q′ₙ) correspond to equal time-sharing between the pairs (X_{+,n}, (Y_{−,n}, Qₙ)) and (X_{−,n}, (Y_{+,n}, Qₙ)), we have constructed an admissible sequence {X′ₙ, Q′ₙ} which satisfies

lim_{n→∞} s_λ(X′ₙ, snr|Q′ₙ) = V_λ(snr).   (38)

The following identity can be shown by standard manipulation:

h(Y′ₙ|Q′ₙ) − h(X′ₙ|Q′ₙ) = h(Yₙ|Qₙ) − h(Xₙ|Qₙ) + (1/2) I( X_{+,n}; X_{−,n} | Y_{+,n}, Y_{−,n}, Qₙ ).   (39)

Since the sequence {X′ₙ, Q′ₙ} is admissible, it must also satisfy (35). Therefore, in view of (39) and the fact that the LHS of (35) is finite, this implies that

lim inf_{n→∞} I( X_{1,n} + X_{2,n}; X_{1,n} − X_{2,n} | Y_{1,n}, Y_{2,n}, Qₙ ) = 0.

This completes the proof since an application of Lemma 4 guarantees that, for P_{Q∗}-a.e. q, the random variable X∗|{Q∗ = q} is normal with variance not depending on q, and moreover we have already observed that E[X∗²] ≤ 1, so the variance of X∗|{Q∗ = q} is at most unity as claimed.

B. Extension to Random Vectors

The vector generalization of Shannon's EPI is proved by a combination of conditioning, Jensen's inequality and induction. The same argument does not appear to readily apply in generalizing Theorem 1 to its vector version due to complications arising from the Markov constraint X → (X + W) → V. However, the desired generalization may be established by noting an additivity property enjoyed by the dual form.

For a random vector X ∼ P_X, let Y be defined via the additive Gaussian noise channel Y = Γ^{1/2}X + Z, where Z ∼ N(0, I) is independent of X and Γ is a diagonal matrix with nonnegative diagonal entries. Analogous to the scalar case, define for λ ≥ 1

s_λ(X, Γ|Q) = −h(X|Q) + λ h(Y|Q) + inf_{V : X→Y→V|Q} { I(Y; V|Q) − λ I(X; V|Q) },   (40)

and consider the optimization problem

V_λ(Γ) = inf_{P_{XQ} : E[Xᵢ²] ≤ 1, i ∈ [n]} s_λ(X, Γ|Q).   (41)

Theorem 9. If Γ = diag(snr₁, snr₂, ..., snrₙ), then

V_λ(Γ) = Σᵢ₌₁ⁿ V_λ(snrᵢ).

Proof. Let Γ be a block diagonal matrix with blocks given by Γ = diag(Γ₁, Γ₂). Partition X = (X₁, X₂) and Z = (Z₁, Z₂) such that Yᵢ = Γᵢ^{1/2}Xᵢ + Zᵢ for i = 1, 2. Then, for any V such that X → Y → V | Q, it is shown in [23] that

s_λ(X, Γ|Q) ≥ s_λ(X₁, Γ₁|X₂, Q) + s_λ(X₂, Γ₂|Y₁, Q).   (42)

Hence, V_λ(Γ) ≥ Σᵢ₌₁² V_λ(Γᵢ). Induction proves the claim.

To finish, the proof of Theorem 2 follows similarly to that of Theorem 1, but by first whitening W and employing Theorem 9 in place of Theorem 8.
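A remark on Theorem 9 (this direction is not spelled out in the sketch above): the reverse inequality V_λ(Γ) ≤ Σᵢ V_λ(snrᵢ) follows by restricting (41) to product distributions. Indeed, if P_{XQ} = Πᵢ P_{XᵢQᵢ} and V = (V₁, ..., Vₙ) is generated coordinate-wise, with each Vᵢ depending only on (Yᵢ, Qᵢ), then every term in (40) splits into a sum over coordinates, so

s_λ(X, Γ|Q) ≤ Σᵢ₌₁ⁿ [ −h(Xᵢ|Qᵢ) + λ h(Yᵢ|Qᵢ) + I(Yᵢ; Vᵢ|Qᵢ) − λ I(Xᵢ; Vᵢ|Qᵢ) ];

taking the infimum over each Vᵢ and then over each P_{XᵢQᵢ} with E[Xᵢ²] ≤ 1 yields V_λ(Γ) ≤ Σᵢ₌₁ⁿ V_λ(snrᵢ).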

REFERENCES

[1] F. du Pin Calmon, Y. Polyanskiy, and Y. Wu, "Strong data processing inequalities in power-constrained Gaussian channels," in IEEE International Symposium on Information Theory, 2015, pp. 2558–2562.
[2] G. Toscani, "A strengthened entropy power inequality for log-concave densities," IEEE Transactions on Information Theory, vol. 61, no. 12, pp. 6550–6559, 2015.
[3] M. Madiman and A. Barron, "Generalized entropy power inequalities and monotonicity properties of information," IEEE Transactions on Information Theory, vol. 53, no. 7, pp. 2317–2329, 2007.
[4] M. Madiman, J. Melbourne, and P. Xu, "Forward and reverse entropy power inequalities in convex geometry," arXiv preprint arXiv:1604.04225, 2016.
[5] O. Rioul, "Information theoretic proofs of entropy power inequalities," IEEE Transactions on Information Theory, vol. 57, no. 1, pp. 33–55, Jan. 2011.
[6] M. Costa, "A new entropy power inequality," IEEE Transactions on Information Theory, vol. 31, no. 6, pp. 751–760, Nov. 1985.
[7] R. Liu, T. Liu, H. Poor, and S. Shamai, "A vector generalization of Costa's entropy-power inequality with applications," IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1865–1879, Apr. 2010.
[8] A. Wagner, S. Tavildar, and P. Viswanath, "Rate region of the quadratic Gaussian two-encoder source-coding problem," IEEE Transactions on Information Theory, vol. 54, no. 5, pp. 1938–1961, May 2008.
[9] S. Bobkov and M. Madiman, "Reverse Brunn-Minkowski and reverse entropy power inequalities for convex measures," Journal of Functional Analysis, vol. 262, no. 7, pp. 3309–3339, 2012.
[10] A. J. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Information and Control, vol. 2, no. 2, pp. 101–112, 1959.
[11] L. Gross, "Logarithmic Sobolev inequalities," American Journal of Mathematics, vol. 97, no. 4, pp. 1061–1083, 1975.
[12] Y. Oohama, "Gaussian multiterminal source coding," IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1912–1923, Nov. 1997.
[13] V. Prabhakaran, D. Tse, and K. Ramachandran, "Rate region of the quadratic Gaussian CEO problem," in IEEE International Symposium on Information Theory, June 2004, pp. 119–124.
[14] Y. Oohama, "Rate-distortion theory for Gaussian multiterminal source coding systems with several side informations at the decoder," IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2577–2593, July 2005.
[15] J. Wang, J. Chen, and X. Wu, "On the sum rate of Gaussian multiterminal source coding: New proofs and results," IEEE Transactions on Information Theory, vol. 56, no. 8, pp. 3946–3960, 2010.
[16] T. Courtade and J. Jiao, "An extremal inequality for long Markov chains," in 52nd Annual Allerton Conference on Communication, Control, and Computing, 2014, pp. 763–770.
[17] Y. Polyanskiy and Y. Wu, "Wasserstein continuity of entropy and outer bounds for interference channels," arXiv preprint arXiv:1504.04419, 2015.
[18] M. Costa, "On the Gaussian interference channel," IEEE Transactions on Information Theory, vol. 31, no. 5, pp. 607–615, Sep. 1985.
[19] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2012.
[20] T. S. Han and K. Kobayashi, "A new achievable rate region for the interference channel," IEEE Transactions on Information Theory, vol. 27, no. 1, pp. 49–60, 1981.
[21] M. H. Costa, "Noisebergs in Z-Gaussian interference channels," in Information Theory and Applications Workshop (ITA), 2011, pp. 1–6.
[22] C. Nair and M. H. Costa, "Gaussian Z-interference channel: Around the corner," in Information Theory and Applications Workshop (ITA), 2016.
[23] T. A. Courtade, "Strengthening the entropy power inequality," arXiv preprint arXiv:1602.03033, 2016.
[24] Y. Geng and C. Nair, "The capacity region of the two-receiver Gaussian vector broadcast channel with private and common messages," IEEE Transactions on Information Theory, vol. 60, no. 4, pp. 2087–2104, Apr. 2014.
[25] E. H. Lieb, "Gaussian kernels have only Gaussian maximizers," Inventiones Mathematicae, vol. 102, no. 1, pp. 179–208, 1990.
[26] E. A. Carlen, "Superadditivity of Fisher's information and logarithmic Sobolev inequalities," Journal of Functional Analysis, vol. 101, no. 1, pp. 194–211, 1991.
[27] F. Barthe, "Optimal Young's inequality and its converse: a simple proof," Geometric & Functional Analysis GAFA, vol. 8, no. 2, pp. 234–242, 1998.