IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-30, NO. 6, NOVEMBER 1984, p. 837

Correspondence

On the Similarity of the Entropy Power Inequality and the Brunn-Minkowski Inequality

MAX H. M. COSTA, MEMBER, IEEE, AND THOMAS M. COVER, FELLOW, IEEE

Abstract: The entropy power inequality states that the effective variance (entropy power) of the sum of two independent random variables is greater than the sum of their effective variances. The Brunn-Minkowski inequality states that the effective radius of the set sum of two sets is greater than the sum of their effective radii. Both of these inequalities are recast in a form that enhances their similarity. In spite of this similarity, there is as yet no common proof of the inequalities. Nevertheless, their intriguing similarity suggests that new results relating to entropies may be found from known results in geometry, and vice versa. Two applications of this reasoning are presented. First, an isoperimetric inequality for entropy is proved, showing that the spherical normal distribution minimizes the trace of the Fisher information matrix given an entropy constraint, just as a sphere minimizes the surface area given a volume constraint. Second, a theorem involving the effective radii of growing convex sets is proved.

I. THE ENTROPY POWER INEQUALITY

Let a random variable X have a probability density function f(x), x \in R. Then its (differential) entropy H(X) is defined as

    H(X) = -\int f(x) \ln f(x) \, dx.

Shannon's entropy power inequality [1] states that, for X and Y independent random variables having density functions,

    e^{2H(X+Y)} \ge e^{2H(X)} + e^{2H(Y)}.    (1)

We wish to recast this inequality. First we observe that a normal random variable Z \sim \phi(z) = (1/\sqrt{2\pi\sigma^2}) e^{-z^2/2\sigma^2} with variance \sigma^2 has entropy

    H(Z) = -\int \phi \ln \phi = \tfrac{1}{2} \ln 2\pi e \sigma^2.    (2)

By inverting, we see that if Z is normal with entropy H(Z), then its variance is

    \sigma^2 = \frac{1}{2\pi e} e^{2H(Z)}.    (3)

Thus, the entropy power inequality is an inequality between effective variances, where the effective variance (entropy power) is simply the variance of the normal random variable with the same entropy.

The preceding equations allow the entropy power inequality to be rewritten in the equivalent form

    H(X+Y) \ge H(X'+Y'),    (4)

where X' and Y' are independent normal variables with corresponding entropies H(X') = H(X) and H(Y') = H(Y). Verification of this restatement follows from the use of (1) to show that

    \frac{1}{2\pi e} e^{2H(X+Y)} \ge \frac{1}{2\pi e} e^{2H(X)} + \frac{1}{2\pi e} e^{2H(Y)}
        = \frac{1}{2\pi e} e^{2H(X')} + \frac{1}{2\pi e} e^{2H(Y')}
        = \sigma_{X'}^2 + \sigma_{Y'}^2
        = \sigma_{X'+Y'}^2
        = \frac{1}{2\pi e} e^{2H(X'+Y')},    (5)

where the penultimate equality follows from the fact that the sum of two independent normals is normal, with variance equal to the sum of the variances.

By the same line of reasoning, the entropy power inequality for independent random n-vectors X and Y, given by

    e^{2H(X+Y)/n} \ge e^{2H(X)/n} + e^{2H(Y)/n},    (6)

can be recast as

    H(X+Y) \ge H(X'+Y'),    (7)

where X', Y' are independent multivariate normal random vectors with proportional covariance matrices and corresponding entropies.

II. THE BRUNN-MINKOWSKI INEQUALITY

Let A and B be two measurable sets in R^n. The set sum C = A + B of these sets may be written as

    C = \{ x + y : x \in A, y \in B \}.    (8)

Let V(A) denote the volume of A. The Brunn-Minkowski inequality [2], [3] states that

    V^{1/n}(A+B) \ge V^{1/n}(A) + V^{1/n}(B).    (9)

To recast this inequality, we observe that an n-sphere S with radius r has volume

    V(S) = c_n r^n.    (10)

Thus, if S is a sphere with volume V, its radius is

    r = (V/c_n)^{1/n}.    (11)

Hence the Brunn-Minkowski inequality can be viewed as an inequality between the radii of the spherical equivalents of the sets. Rewriting the Brunn-Minkowski inequality following the model in (5) gives

    V(A+B) \ge V(A'+B'),    (12)

where A' and B' are spheres with volumes V(A') = V(A) and V(B') = V(B).

Manuscript received October 21, 1983; revised July 16, 1984. This work was partially supported by the National Science Foundation under Grant ECS82-11568, the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), the Instituto de Pesquisas Espaciais (INPE) of Brazil, and the Joint Services Electronics Program under Grant DAAG-29-81-K-0057. This work was presented at the IEEE International Symposium on Information Theory, St. Jovite, PQ, Canada, September 26-30, 1983.
T. Cover is with the Department of Electrical Engineering and Statistics, Stanford University, Stanford, CA 94305 USA.
M. Costa is with Instituto de Pesquisas Espaciais, Caixa Postal 515, São José dos Campos, SP 12200, Brazil.
0018-9448/84/1100-0837$01.00 © 1984 IEEE

III. COMPARISONS

Inspecting the convolution reveals further similarity between the entropy power and Brunn-Minkowski inequalities. The entropy H(X+Y) is a functional of the convolution, i.e.,

    H(X+Y) = -\int f_Z \ln f_Z,    (13)

where

    f_Z(z) = (f_X * f_Y)(z) = \int f_X(t) f_Y(z-t) \, dt.

Similarly, the volume V(A+B) is a functional of a convolution, i.e.,

    V(A+B) = \int I_C(z) \, dz,    (14)

where I_C(z) is the indicator function for C = A + B, given by

    I_C(z) = \max_t I_A(t) I_B(z-t).    (15)

We note that f_Z(z) is the L_1 norm of h_z(t) = f_X(t) f_Y(z-t); that I_C(z) is the L_\infty norm of h_z(t) = I_A(t) I_B(z-t); and that L_1 and L_\infty are dual spaces.

Finally, e^H, like V, is a measure of volume. For example, for all random variables X with support set A we have H(X) \le \ln V(A), with equality if the probability density is uniform over A. Moreover, from the asymptotic equipartition property, we know that the volume of the set of \epsilon-typical n-sequences (X_1, X_2, ..., X_n), with the X_i independent and identically distributed (i.i.d.) according to f(x), is equal to e^{nH(X_1)} to the first order in the exponent. Thus, e^{H(X)} = e^{nH(X_1)} is the volume of the typical set for X = (X_1, X_2, ..., X_n). To suggest a link between the Gaussian distribution and spheres, we note that the \epsilon-typical set of n-sequences (X_1, X_2, ..., X_n) is given by a sphere when the X_i are i.i.d. according to the Gaussian distribution.

These observations suggest not only that the two inequalities may be different manifestations of the same underlying idea, but also that there may be a continuum of inequalities between L_1 and L_\infty, each with its respective natural definition of volume.

In spite of the obvious similarity between the above inequalities, there is no apparent similarity between any of the known proofs of the Brunn-Minkowski inequality [2]-[4] and the Stam and Blachman [5], [6] proofs of the entropy power inequality, nor have we succeeded in finding a new common proof. Nevertheless, the similarity of these inequalities suggests that we may find new results relating to entropies from known results in geometry, and vice versa. We present two applications of this reasoning.

IV. ISOPERIMETRIC INEQUALITIES

It is known that the sphere minimizes surface area for a given volume. A proof follows immediately from the Brunn-Minkowski inequality [4], as shown below. For "regular" sets A, the surface area S(A) of A is given by

    S(A) = \lim_{\epsilon \to 0} \frac{V(A + S_\epsilon) - V(A)}{\epsilon},    (16)

where S_\epsilon is a sphere of radius \epsilon > 0. Using the Brunn-Minkowski inequality (12), we have

    S(A) \ge \lim_{\epsilon \to 0} \frac{V(A' + S_\epsilon) - V(A')}{\epsilon} = S(A'),    (17)

where A' is a sphere with volume V(A') = V(A).

By analogy, we define S(X), the "surface area" of a multivariate random variable X, as

    S(X) \triangleq \lim_{\epsilon \to 0} \frac{e^{2H(X+Z_\epsilon)} - e^{2H(X)}}{\epsilon},    (18)

where Z_\epsilon is Gaussian with covariance matrix \epsilon I and Z_\epsilon is independent of X. Thus S(X) is the rate of change of e^{2H} when a small normal random variable is added. We may write

    S(X) = \lim_{t_0 \to 0} \frac{d}{dt} e^{2H(X+Z_t)} \Big|_{t=t_0}.    (19)

Now let X_t \triangleq X + Z_t have density denoted by p_t(x_t). Assuming H(X) is finite, we prove in the Appendix that

    \frac{d}{dt} H(X_t) = \frac{1}{2} E\left[ \frac{\|\nabla p_t\|^2}{p_t^2} \right],    (20)

where \|\nabla p_t\| denotes the norm of the gradient of p_t(x_t). It follows that

    S(X) = E\left[ \frac{\|\nabla p\|^2}{p^2} \right] e^{2H(X)}.

From the entropy power inequality (7), we have

    e^{2H(X+Z_\epsilon)} \ge e^{2H(X'+Z_\epsilon)},    (21)

where X' is a Gaussian n-vector with covariance matrix of the form \sigma^2 I such that H(X') = H(X). Thus, from (18) and (21), we obtain the following bound on S(X):

    S(X) \ge \lim_{\epsilon \to 0} \frac{e^{2H(X'+Z_\epsilon)} - e^{2H(X')}}{\epsilon} = S(X').    (22)

For the spherical Gaussian vector X', the "surface area" is found from (20) to be

    S(X') = \frac{2\pi e n \, e^{2H(X')}}{e^{2H(X')/n}}.    (23)

Recalling that H(X') = H(X), we combine (20), (22), and (23) to obtain the entropy analog of the isoperimetric inequality:

    \frac{1}{n} E\left[ \frac{\|\nabla p\|^2}{p^2} \right] \ge 2\pi e \, e^{-2H(X)/n}.    (24)

Let J(X) denote the trace of the Fisher information matrix for the translation family of densities \{ p(x - \theta) \}, \theta \in R^n, x \in R^n. We formalize the above inequality in a theorem.

Theorem: Suppose the multivariate random variable X has finite entropy H(X), and suppose J(X) exists. Let X_t denote X + Z_t, where Z_t is spherical normal with covariance matrix tI and independent of X. If X has a continuous density p(x), so that

    J(X) = E\left[ \frac{\|\nabla p\|^2}{p^2} \right],

and if the integrals H(X_t) and J(X_t) converge uniformly near t = 0, so that

    H(X) = \lim_{t \to 0} H(X_t)   and   J(X) = \lim_{t \to 0} J(X_t),