arXiv:cs/0701050v2 [cs.IT] 13 Apr 2007 nrp ne h vrac rsrig rnfrain[5 transformation preserving” “variance the under entropy any for X nycekdta h eesr odto o oa minimu local a for condition of necessary the that checked only convergenc with theorem [4]. limit entropy relative central provi in the in instrumental of and is separation version and source strong [3]) blind e.g., in (see, application [2]. deconvolution finds channel also broadcast EPI MIMO non-Gaussian The Gaussian the determine of to of capacity used region early the was capacity as bound it Recently, to Shannon channels. [1] by noise additive paper used 1948 was his It as theorems. coding source or a logarithms all that paper this throught natural. assumed is it where ifrniletoyo h rbblt est function density probability the of entropy differential r nfc roso natraiestatement alternative an of proofs fact in are on ntedfeeta nrp ftesmo independent of sum the of entropy variables random differential the on bound nomto n ietp ohF rMS representations. MMSE or mu of FI properties both the simp on sidesteps new, based a and solely providing is information by which gap EPI, the the proofs fill versi of then of dual proof and essentially view proof, are same unified they the that a of showing present MMSE, first and FI we er via paper, mean-square entropy this minimum In differential or (MMSE). (FI) of information representations Fisher’s either integral of on mutual proofs theoretic or hinge information entropy available see exception: (EPI) of an inequality properties be power basic entropy the Shannon’s from information, deduced be can iheult if equality with hno’ ro fteEI[]wsicmlt nta he that in incomplete was [1] EPI the of proof Shannon’s channel of converses proving in application its finds EPI The hno’ nrp oe nqaiy(P)gvsalower a gives (EPI) inequality power entropy Shannon’s Abstract h sdfie as defined is ipePofo h nrp-oe Inequality Entropy-Power the of Proof Simple A ( X h exp(2 + ( 0 √ hl otueu nomto hoei inequalities theoretic information useful most While — Y ≤ X λ ( ) h ,Y X, λ sstse.Aalbergru roso h EPI the of proofs rigorous Available satisfied. is ( X + ≤ X ,Y X, + √ ) i rpriso uulInformation Mutual of Properties via 1 h and 7−→ 1 hc mut otecnaiyo the of concavity the to amounts which , ( Y .I I. X − )) Y = ) ihdensities: with Y λ W ≥ NTRODUCTION r asinrno aibe.The variables. random Gaussian are exp(2 = E ) n ≥ √ log X λ λh h p ( ( X ( X 1 X + )+exp(2 + )) (1 + ) ) √ et oEe,GTTe´cmPrs(ENST) GET/T´el´ecom Paris ComElec, Dept. o 1 , − − Y. λ aiTc nttt NSLTCI CNRS & Institute ParisTech λ ) h mi:[email protected] Email: h ( ( Y Y )) ) p h EPI the ( lve Rioul Olivier sto ms x using ai,France Paris, ga ng ) tual ]: ons ror the (3) (2) (4) (1) of re m le e MEi aeb h olwn dniydrvdi [11]: in derived identity following the by made is MMSE qa entropies: equal oseta 3 seuvln o() define (1), to equivalent is (3) that see To ntepoete fteMS netmtn h input the estimating in MMSE the bas of EPI properties the of the proof alternative on an provided [10] [9], Shamai where X be hc sidpnetof independent is which able, seas lcmn[].I sbsdo h rprisof properties the on based is It (FI) [7]). information Fisher’s Blachman also (see ftelgrtm ovrey aigepnnil,()wri (3) exponentials, taking for Conversely, concav logarithm. the the from follows of (3) sides, both of logarithms Taking uptmta information mutual output h ikbtendfeeta nrp n Ibigd Bruijn de 17.7.2]: being Thm. FI and [8, entropy identity differential between link the eoe h inlt-os ai.Ti MEi civdby achieved is estimator MMSE mean conditional This the ratio. signal-to-noise the denotes hr h xetto stknoe h on itiuino distribution joint variables the over random taken is expectation the where MEadF ersnain n s nybscproperties basic only use and representations FI and MMSE more alternative EPI. is the the representation using proving FI for that convenient of iden- [10] and Bruijn’s place insightful claimed de in been to representation has equivalent MMSE It be to (6). out tity turns identity This variance conditional the by asincanlgvnteoutput the given channel Gaussian a x 2 exp h rtrgru ro fteEIwsgvnb tm[6] Stam by given was EPI the of proof rigorous first The nti ae,w hwta ti osbet vi both avoid to possible is it that show we paper, this In 1 = − ,V U, h λ √ ( Y Z e U λ ) 2 dt h d or mle 1 for (1) implies ( N ∼ , √ I Y ( Var λ U λ dt d X = = h + ; (0 ( x 2 exp ( √ X √ e X √ J X , 2 1 X t | 1) h ( 1 − Y + X ( V λ X and − h = ) = ) √ sasadr asinrno vari- random Gaussian standard a is + ( )) U V λ ) / Z t Z x 2 exp = ) ( ≥ Y E λ e E = ) n n ert 1 sfollows: as (1) rewrite and , h oncinbteninput- between connection The . = ) 2 n I λe h hsns that so chosen ( ( ( X X X 2 X X 1 2 ˆ p h p 1 2 ) − ; ′ Var ( ( eety edu u and Verd´u, Guo Recently, . ( ( U J Y + Y X X E h ) ( = ) = ) e X ) ( Y ( { (1 + ) 2 X V X  h + = 2 ) ( | | E Y √ htis, that , o ,V U, Y √ ( − √ ) h , X t X } ) ( Z t . X t ) λ Y | 2 U Y ) yterelations the by o ) + e ) ) , , 2 + − and h Z n sgiven is and x 2 exp ( Z V ) h . where , ) ( . λ V Z h ( ) X X have the f tten ) and (6) (5) (7) (8) ity ed to = ’s t of mutual information. The new proof of the EPI presented where S(X)= p′(X)/p(X) is a zero-mean . in this paper is based on a convexity inequality for mutual The following conditional mean representation is due to Blach- information under the variance preserving transformation (4): man [7]: S(X + Z)= E S(Z) X + Z . (12) { | } Theorem 1: If X and Y are independent random variables, By the “law of total variance”, this gives and if Z is Gaussian independent of X, Y , then J(X + Z)= Var S(X + Z) I(√λX + √1 λ Y + √tZ; Z) { } − = Var E S(Z) X + Z λI(X + √tZ; Z)+(1 λ)I(Y + √tZ; Z) (9) { { | }} ≤ − = Var S(Z) Var S(Z) X + Z { }− { | } for all 0 λ 1 and t 0. = J(Z) Var S(Z) X + Z (13) Apart from≤ its≤ intrinsic≥ interest, we show that inequality (9) − { | } We now use the fact that is standard Gaussian. It is easily reduces to the EPI by letting t . Z seen by direct calculation that and , Before turning to the proof→ of ∞ Theorem 1, we make the S(Z) = Z J(Z)=1 and, therefore, J(X + Z)=1 Var Z X− + Z . Since Z connection between earlier proofs of the EPI via FI and via E E − { | Var } − MMSE by focusing on the essential ingredients common to (Z X + Z)= (X X + Z) X we have Z X + Z = Var | , thereby| showing− (10). { | } the proofs. This will give an idea of the level of difficulty that X X + Z Note{ | that when} is Gaussian but not standard Gaussian, we is required to understand the conventional approaches, while Z have Var and Var , and (10) also serving as a guide to understand the new proof which uses S(Z) = Z/ (Z) J(Z)=1/ (Z) generalizes to − similar ingredients, but is comparatively simpler and shorter. The remainder of the paper is organized as follows. Sec- Var(Z)J(X + Z)+ J(Z)Var(X X + Z)=1. (14) tion II gives a direct proof of a simple relation between FI | and MMSE, interprets (6) and (8) as dual consequences of Another proof, which is based on a data processing argument a generalized identity, and explores the relationship between and avoids Blachman’s representation, is given in [13]. the two previous proofs of the EPI via FI and via MMSE. It B. Intepretation is shown that these are essentially dual versions of the same Equation (10) provides a new estimation theoretic interpre- proof; they follow the same lines of thought and each step has tation of Fisher’s information of a noisy version X′ = X + Z an equivalent formulation, and a similar interpretation, in terms of X. It is just the complementary quantity to the MMSE that of FI and MMSE. Section III then proves Theorem 1 and the results from estimating X from X′. The estimation is all the EPI using two basic ingredients common to earlier approaches, more better as the MMSE is lower, that is, as X′ provides namely 1) a “data processing” argument applied to (4); 2) a higher FI. Thus Fisher’s information is a measure of least Gaussian perturbation method. The reader may wish to skip squares estimation’s efficiency, when estimation is made in to this section first, which does not use the results presented additive Gaussian noise. earlier. The new approach has the advantage of being very To illustrate, consider the special case of a Gaussian random simple in that it relies only on the basic properties of mutual variable X. Then the best estimator is the linear regression information. 2 estimator, with MMSE equal to Var(X X′)=(1 ρ )Var(X) where ρ = Var(X)/Var(X ) is the| correlation− factor be- II. PROOFSOFTHE EPI VIA FI AND MMSE REVISITED ′ tween X andpX′: The central link between FI and MMSE takes the form of Var(X) a simple relation which shows that they are complementary Var(X X′)= . (15) | Var(X)+1 quantities in the case of a standard Gaussian perturbation Z independent of X: Meanwhile, J(X′) is simply the reciprocal of the variance of X′: Var (10) 1 J(X + Z)+ (X X + Z)=1 J(X′)= . (16) | Var(X)+1 This identity was mentioned in [11] to show that (6) and (8) Both quantities sum to one, in accordance with (10). In the are equivalent. We first provide a direct proof of this relation, case of non-Gaussian noise, we have the more general iden- and then use it to unify and simplify existing proofs of the EPI tity (13) which also links and conditional via FI and via MMSE. In particular, two essential ingredients, variance, albeit in a more complicated form. namely, Fisher’s information inequality [7], [12], and a related inequality for MMSE [9], [10], will be shown to be equivalent C. Dual versions of de Bruijn’s Identity from (10). De Bruijn’s identity can be stated in the form [5] d 1 A. A new proof of (10) h(X + √tZ) = J(X)Var(Z). (17) dt t=0 2 Fisher’s information (5) can be written in the form The conventional technical proof of (17) is obtained by in- J(X)= E S2(X) = Var S(X) (11) tegrating by parts using a diffusion equation satisfied by the { } { } 2 Gaussian distribution (see e.g., [7] and [8, Thm. 17.7.2]). A Here J(XG)=1/σ and (23) is nonnegative by the Cram´er- simpler proof of a more general result is included in [13]. Rao inequality. Now for t =0, Dh(X +√tZ)= Dh(X), and From (17), we deduce the following. since non-Gaussianness is scale invariant, Dh(X + √tZ) = Theorem 2 (de Bruijn’s identity): For any two random in- Dh(Z + X/√t) Dh(Z)= 0 as t + . Therefore, → → ∞ dependent random variables X and Z, integrating (22) from t = 0 to + we obtain a FI integral representation for ∞ [4] d 1 √ √ Var h(X + tZ)= J(X + tZ) (Z) (18) 1 ∞ dt 2 Dh(X)= DJ (X + √tZ) dt (24) 2 Z0 if Z is Gaussian, and or 1 1 ∞ 1 d 1 Var 2 √ h(X + √tZ)= J(X) (Z X + √tZ) (19) h(X)= log 2πeσ J(X + tZ) 2 dt. dt 2 | 2 − 2 Z0 − σ + t  (25) if is Gaussian. X Similarly, from (19) with X and Z interchanged, we obtain a This theorem is essentially contained in [11]. In fact, noting dual identity: that I(X; √tX + Z) = h(√tX + Z) h(Z), it is easily − d 1 seen that (19), with X and Z interchanged, is the identity (8) Dh(√tX + Z)= DV (X √tX + Z), (26) of Guo, Verd´uand Shamai. Written in the form (19) it is dt 2 | clear that this is a dual version of the conventional de Bruijn’s where for Y = √tX + Z and YG = √tXG + Z, identity (18). Note that both identities reduce to (17) for t =0. DV (X Y )= Var(XG YG) Var(X Y ). (27) For completeness we include a simple proof for t> 0. | | − | Proof: Equation (18) easily follows from (17) using the Again this quantity is nonnegative, because Var(XG YG) = 2 2 | stability property of Gaussian distributions under convolution: σ /(tσ + 1) is the MMSE achievable by a linear esti- mator, which is suboptimal for non-Gaussian X. Now for substitute X + √t′ Z′ for X in (17), where Z and Z′ are taken to be iid Gaussian random variables, and use the fact t = 0, Dh(√tX + Z) = Dh(Z) vanishes, and since non- Gaussianness is scale invariant, Dh(√tX + Z) = Dh(X + that √tZ + √t′ Z′ and √t + t′ Z are identically distributed. Z/√t) Dh(X) as t + . Therefore, integrating (26) To prove (19) we use the complementary relation (14) in → → ∞ the form from t = 0 to + we readily obtain the MMSE integral representation for differential∞ entropy [11]: Var(X)J(X + √tZ)+ tJ(X)Var(Z X + √tZ)=1 (20) | 1 ∞ Dh(X)= DV (X √tX + Z) dt (28) where X is Gaussian. Let u = 1/t. By (18) (with X and Z 2 Z0 | interchanged), we have or 2 1 1 ∞ σ d d 1 h(X)= log 2πeσ2 Var(X √tX+Z) dt. √ √ 2 h(X + tZ)= h( uX + Z)+ log t 2 − 2 Z0 1+ tσ − | dt dtn 2 o  (29) 1 d 1 √ This MMSE representation was first derived in [11] and used = 2 h( uX + Z)+ −t du 2t in [9], [10] to prove the EPI. The FI representation (25) can 1 1 = Var(X)J(√uX + Z)+ be used similarly, yielding essentially Stam’s proof of the −2t2 2t 1 1 EPI [6], [7]. These proofs are sketched below. = Var(X)J(X + √tZ)+ . Note that the equivalence between FI and MMSE rep- −2t 2t resentations (24), (28) is immediate by the complementary which combined with (20) proves (19). relation (10), which can be simply written as

D. Equivalent integral representations of differential entropy DJ (X + Z)= DV (X X + Z). (30) | Consider any random variable with density and finite vari- In fact, it is easy to see that (24) and (28) prove each ance σ2 = Var(X). Its non-Gaussianness is defined as the other by (30) after making a change of variable u = 1/t. divergence with respect to a Gaussian random variable XG Both sides of (30) are of course simultaneously nonnegative with identical second centered moments, and is given by and measure a “non-Gaussianness” of X when estimated in additive Gaussian noise. Interestingly, these FI and MMSE Dh(X)= h(XG) h(X) (21) − non-Gaussianities coincide. 1 G 2 It is also useful to note that in the above derivations, the where h(X ) = 2 log 2πeσ . Let Z be standard Gaussian, independent of X. From (18), we obtain Gaussian random variable XG may very well be chosen such that σ2 = Var(XG) is not equal to Var(X). Formulas (21)– d 1 Dh(X + √tZ)= DJ (X + √tZ), (22) (30) still hold, even though quantities such as Dh, DJ , and dt −2 DV may take negative values. In particular, the right-hand where sides of (25) and (29) do not depend on the particular choice DJ (X)= J(X) J(XG). (23) of σ. − E. Simplified proofs of the EPI via FI and via MMSE At any rate, the equivalence between (32) and (33) is im- For Gaussian random variables XG, YG with the same mediate by the complementary relation (10) or its generalized variance, the EPI in the form (3) holds trivially with equality. version (14), as can be easily seen. Either one of (32), (33) Therefore, to prove the EPI, it is sufficient to show the gives a proof of the EPI. following convexity property: The above derivations illuminate intimate connections be- tween both proofs of the EPI, via FI and via MMSE. They Dh(W ) λDh(X)+(1 λ)Dh(Y ) (31) ≤ − do not only follow the same lines of argumentation, but are where Dh( ) is defined by (21) and W is defined as in (4). also shown to be dual in the sense that through (10), each By the last· remark of the preceding subsection, the FI and step in these proofs has an equivalent formulation and similar MMSE representations (24), (28) hold. Therefore, to prove the interpretation in terms of FI or MMSE. EPI, it is sufficient to show that either one of the following III. A NEWPROOFOFTHE EPI inequalities holds. For convenience in the following proof, define W by (4) √ DJ (W + tZ) and let α = √tλ and β = t(1 λ). − λDJ (X + √tZ)+(1 λ)DJ (Y + √tZ) p ≤ − A. Proof of Theorem 1 DV (W √t W + Z) | Similarly as in the conventional proofs of the EPI, we λDV (X √tX + Z)+(1 λ)DV (Y √t Y + Z). ≤ | − | use the fact that the linear combination √λ(X + αZ) + These in turn can be written as √1 λ(Y +βZ)= W +√tZ cannot bring more information than− the individual variables X + αZ and Y + βZ together. J(W + √tZ) Thus, by the data processing inequality for mutual information, λJ(X + √tZ)+(1 λ)J(Y + √tZ) (32) ≤ − I(W + √tZ; Z) I(X + αZ, Y + βZ; Z). (37) Var(W √t W + Z) ≤ | Let U = X + αZ, V = Y + βZ and develop using the chain λVar(X √tX + Z)+(1 λ)Var(Y √t Y + Z), rule for mutual information: ≥ | − | (33) because these inequalities hold trivially with equality for G I(U, V ; Z)= I(U; Z)+ I(V ; Z U) X | and YG. I(U; Z)+ I(V ; Z U)+ I(U; V ) ≤ | Inequality (33) is easily proved in the form = I(U; Z)+ I(V ; U,Z) Var(W √t W + Z) Var(W √tX + Z′, √t Y + Z′′) (34) = I(U; Z)+ I(V ; Z)+ I(U; V Z). | ≥ | | where Z′ and Z′′ are standard Gaussian random variables, Since X and Y are independent, U and V are conditionally independent of X, Y and of each other, and Z = √λZ′ + independent given Z, and therefore, I(U; V Z)=0. Thus, we | √1 λZ′′. This has a simple interpretation [9], [10]: it is obtain the inequality better− to estimate the sum of independent random variables from individual noisy measurements than from the sum of I(W + √tZ; Z) I(X + αZ; Z)+ I(Y + βZ; Z) (38) ≤ these measurements. Assume for the moment that I(X + αZ; Z) admits a second- The dual inequality (32) can also be proved in the form order Taylor expansion about α = 0 as t 0. Since I(X + → J(W ) λJ(X)+(1 λ)J(Y ) (35) αZ; Z) vanishes for α = 0, and since mutual information is ≤ − nonnegative, we may write where we have substituted X, Y for X + √tZ′, Y + √tZ′′. This inequality is known as the Fisher’s information inequal- I(X + αZ; Z)= λI(X + √tZ; Z)+ o(α2) ity [5], and an equivalent formulation is o t where o(α2)= o(t) and ( ) tends to zero as t 0. Similarly 1 1 1 t → + . (36) I(Y +βZ; Z)=(1 λ)I(Y +√tZ; Z)+o(t). It follows that J(X + Y ) J(X) J(Y ) ≥ in the vicinity of t =0− , Blachman [7] gave a short proof using representation (12) and Zamir [12] gave an insightful proof using a data processing I(W + √tZ; Z) λI(X + √tZ; Z) ≤ argument. Again (36) has a simple interpretation, which is + (1 λ)I(Y + √tZ; Z)+ o(t). (39) very similar to that of (34). Here 1/J(X) is the Cram´er-Rao − lower bound (CRB) of the mean-squared error of the unbiaised We now remove the o(t) term by using the assumption that estimator X for a translation parameter, and (36) states that Z is Gaussian. Consider the variables X′ = X + √t′ Z1′ , in terms of the CRB it is better to estimate the translation Y ′ = Y + √t′ Z2′ , where Z1′ ,Z2′ are identically distributed as parameter corresponding to the sum of independent random Z but independent of all other random variables. This Gaussian variables X + Y from individual measurements than from the perturbation ensures that densities are smooth, so that I(X′ + sum of these measurements. αZ; Z) and I(Y ′ + βZ; Z) both admit a second-order Taylor expansion about t = 0. We may, therefore, apply (39) to X′ We now let t in the right-hand side of this inequality. Let → ∞ and Y ′, which gives ε =1/√t and XG be a Gaussian random variable independent of Z, with identical second moments as X. Then I(X; X + √ √ I(W + t′ Z′ + √tZ; Z) λI(X + t′ Z1′ + √tZ; Z) √tZ)= I(X; εX+Z)= h(εX+Z) h(Z) h(εXG+Z) ≤ 1 2 2 − ≤ − √ h(Z)= log(1 + σX /tσZ ), which tends to zero as t . + (1 λ)I(Y + t′ Z2′ + √tZ; Z)+ o(t) (40) 2 − This holds similarly for the other terms in the right-hand→ si ∞de where Z′ = √λZ′ +√1 λZ′ is identically distributed as Z. of (43). Therefore, the EPI (3) follows. 1 − 2 Applying the obvious identity I(X +Z′ +Z; Z)= I(X +Z′ + In light of this proof, we see that theorem 1, which contains 2 Z; Z′ +Z) I(X +Z′; Z′) and using the stability property of the EPI as the special case where σ , merely states that − Z the Gaussian distribution under convolution, (40) boils down the difference h(W + Z) λh(X +→Z) ∞ (1 λ)h(Y + Z) to between both sides of the− EPI (3) decreases− − as independent f(t′ + t) f(t′)+ o(t) Gaussian noise Z is added. This holds in accordance with the ≤ fact that this difference is zero for Gaussian random variables where we have noted f(t) = I(W + √tZ; Z) λI(X + with identical variances. √tZ; Z) (1 λ)I(Y + √tZ; Z). It follows that −f(t) is non − − One may wonder if mutual informations in the form increasing in t, and since it clearly vanishes for t =0, it always I(X; √tX + Z) rather than I(X + √tZ; Z) could be used in assumes non positive values for all t 0. This completes the ≥ the above derivation of Theorem 1 and the EPI, in a similar proof of (9). manner as Verd´uand Guo’s proof uses (29) rather than (25). Interestingly, this proof uses two basic ingredients common In fact, this would amount to proving (43), whose natural to earlier proofs presented in section II: 1) the fact that two derivation using the data processing inequality for mutual variables together bring more information than their sum, information is through (37). which is here expressed as a data processing inequality for The same proof we used above can be employed verbatim to mutual information; 2) a Gaussian perturbation argument using prove the EPI for random vectors. Generalizations to various an auxiliary variable Z. extended versions of EPI are provided in a follow-up to this In fact, Theorem 1 could also be proved using either one work [13]. of the integral representations (25), (29), which are equivalent by virtue of (10) and obtained through de Bruijn’s identity as REFERENCES explained in section II. The originality in the present proof is [1] C. E. Shannon, “A mathematical theory of communication,” Bell System that it does neither require de Bruijn’s identity nor the notions Technical Journal, vol. 27, pp. 623–656, Oct. 1948. [2] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz), “The capacity region of FI or MMSE. of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Transactions on , vol. 52, no. 9, pp. 3936–3964, Sept. B. The discrete case 2006. [3] J. F. Bercher and C. Vignat, “Estimating the entropy of a signal with The above proof of Theorem 1 does not require that X or Y applications,” IEEE Transactions on Signal Processing, vol. 48, no. 6, be random variables with densities. Therefore, Theorem 1 also pp. 1687–1694, June 2000. holds for discrete (finitely or countably) real valued random [4] A. R. Barron, “Entropy and the central limit theorem,” The Annals of Probability, vol. 14, no. 1, 336–342 1986. variables. Verd´uand Guo [9] proved that the EPI, in the [5] A. Dembo, T. M. Cover, and J. A. Thomas, “Information theoretic form (3), also holds in this case, where differential entropies inequalities,” IEEE Transactions on Information Theory, vol. 37, no. 6, are replaced by entropies. We call attention that this is in fact pp. 1501–1518, Nov. 1991. [6] A. J. Stam, “Some inequalities satisfied by the quantities of information a trivial consequence of the stronger inequality of Fisher and Shannon,” Information and Control, vol. 2, pp. 101–112, June 1959. H(√λX + √1 λ Y ) max H(X), H(Y ) (41) [7] N. M. Blachman, “The convolution inequality for entropy powers,” IEEE − ≥  Transactions on Information Theory, vol. 11, no. 2, pp. 267–271, Apr. for any two independent discrete random variables X and Y . 1965. This inequality is easily obtained by noting that [8] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006. [9] S. Verd´uand D. Guo, “A simple proof of the entropy-power inequality,” H(W ) H(W Y )= H(X Y )= H(X) (42) ≥ | | IEEE Transactions on Information Theory, vol. 52, no. 5, pp. 2165– 2166, May 2006. and similarly for H(Y ). [10] D. Guo, S. Shamai (Shitz), and S. Verd´u, “Proof of entropy power inequalities via MMSE,” in Proceedings of the IEEE International C. Proof of the EPI Symposium on Information Theory, Seattle, USA, July 2006, pp. 1011– 1015. We now show that the EPI for differential entropies, in the [11] ——, “Mutual information and minimum mean-square error in Gaussian form (3), follows from Theorem 1. By the identity I(X + channels,” IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005. Z; Z)+ h(X)= I(X; X + Z)+ h(Z), inequality (9) can be [12] R. Zamir, “A proof of the Fisher information inequality via a data pro- written in the form cessing argument,” IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 1246–1250, May 1998. h(W ) λh(X) (1 λ)h(Y ) I(W ; W + √tZ) [13] O. Rioul, “Information theoretic proofs of entropy power inequalities,” − − − ≥ IEEE Transactions on Information Theory, submitted for publication, λI(X; X + √tZ) (1 λ)I(Y ; Y + √tZ). (43) arXiv cs.IT/0704.1751, April 2007. − − −