Moment-entropy inequalities for a random vector

Erwin Lutwak, Deane Yang, and Gaoyong Zhang

Abstract— The p-th moment matrix is defined for a real random vector, generalizing the classical second moment matrix. Sharp inequalities relating the p-th moment and Renyi entropy are established, generalizing the classical inequality relating the second moment and the Shannon entropy. The extremal distributions for these inequalities are completely characterized.

I. INTRODUCTION

In [9] the authors demonstrated how the classical information-theoretic inequality for the Shannon entropy and second moment of a real random variable could be extended to inequalities for Renyi entropy and the p-th moment. The extremals of these inequalities were also completely characterized. Moment-entropy inequalities, using Renyi entropy, for discrete random variables have also been obtained by Arikan [2].

We describe how to extend the definition of the second moment matrix of a real random vector to that of the p-th moment matrix. Using this, we extend the moment-entropy inequalities and the characterization of the extremal distributions proved in [9] to higher dimensions.

Variants and generalizations of the theorems presented can be found in work of the authors [8], [10], [11] and Bastero-Romance [3].

The authors would like to thank Christoph Haberl for his careful reading of this paper and valuable suggestions for improving it.

E. Lutwak ([email protected]), D. Yang ([email protected]), and G. Zhang ([email protected]) are with the Department of Mathematics, Polytechnic University, Brooklyn, New York, and were supported in part by NSF Grant DMS-0405707.

II. THE p-TH MOMENT MATRIX OF A RANDOM VECTOR

A. Basic notation

Throughout this paper we denote:

$\mathbb{R}^n$ = n-dimensional Euclidean space,
$x \cdot y$ = standard Euclidean inner product of $x, y \in \mathbb{R}^n$,
$|x| = \sqrt{x \cdot x}$,
$S$ = positive definite symmetric n-by-n matrices,
$|A|$ = determinant of $A \in S$,
$|K|$ = Lebesgue measure of $K \subset \mathbb{R}^n$.

The standard Euclidean ball in $\mathbb{R}^n$ will be denoted by $B$, and its volume by $\omega_n$. Each inner product on $\mathbb{R}^n$ can be written uniquely as
$$(x, y) \in \mathbb{R}^n \times \mathbb{R}^n \mapsto \langle x, y\rangle_A = Ax \cdot Ay,$$
for $A \in S$. The associated norm will be denoted by $|\cdot|_A$.

Throughout this paper, $X$ denotes a random vector in $\mathbb{R}^n$. The probability measure on $\mathbb{R}^n$ associated with a random vector $X$ is denoted $m_X$. We will denote the standard Lebesgue density on $\mathbb{R}^n$ by $dx$. By the density function $f_X$ of a random vector $X$, we mean the Radon-Nikodym derivative of the probability measure $m_X$ with respect to Lebesgue measure.

If $V$ is a vector space and $\Phi : \mathbb{R}^n \to V$ is a continuous function, then the expected value of $\Phi(X)$ is given by
$$E[\Phi(X)] = \int_{\mathbb{R}^n} \Phi(x)\, dm_X(x).$$

We call a random vector $X$ nondegenerate if $E[|v \cdot X|] > 0$ for each nonzero $v \in \mathbb{R}^n$.

B. The p-th moment of a random vector

For $p \in (0, \infty)$, the standard p-th moment of a random vector $X$ is given by
$$E[|X|^p] = \int_{\mathbb{R}^n} |x|^p\, dm_X(x). \qquad (1)$$
More generally, the p-th moment with respect to the inner product $\langle\cdot,\cdot\rangle_A$ is
$$E[|X|_A^p] = \int_{\mathbb{R}^n} |x|_A^p\, dm_X(x).$$

C. The p-th moment matrix

The second moment matrix of a random vector $X$ is defined to be
$$M_2[X] = E[X \otimes X],$$
where for $v \in \mathbb{R}^n$, $v \otimes v$ is the linear transformation given by $x \mapsto (x \cdot v)v$. Recall that $M_2[X - E[X]]$ is the covariance matrix. An important observation is that the definition of the moment matrix does not use the inner product on $\mathbb{R}^n$.

A unique characterization of the second moment matrix is the following: Let $M = M_2[X]$. The inner product $\langle\cdot,\cdot\rangle_{M^{-1/2}}$ is the unique one whose unit ball has maximal volume among all inner products $\langle\cdot,\cdot\rangle_A$ that are normalized so that the second moment satisfies $E[|AX|^2] = n$.

We extend this characterization to a definition of the p-th moment matrix $M_p[X]$ for all $p \in (0, \infty)$.

Theorem 1: If $p \in (0, \infty)$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite p-th moment, then there exists a unique matrix $A \in S$ such that
$$E[|X|_A^p] = n$$
and
$$|A| \geq |A'|,$$
for each $A' \in S$ such that $E[|X|_{A'}^p] = n$. Moreover, the matrix $A$ is the unique matrix in $S$ satisfying
$$I = E[AX \otimes AX\, |AX|^{p-2}].$$

We define the p-th moment matrix of a random vector $X$ to be $M_p[X] = A^{-p}$, where $A$ is given by the theorem above. The proof of the theorem is given in §IV.
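To make the characterization in Theorem 1 concrete, here is a small numerical sketch in Python (our illustration, not part of the paper; the function name pth_moment_matrix and the use of a general-purpose optimizer are our own choices). It estimates the p-th moment matrix of a finite sample by maximizing $\log|A|$ over candidates $A \in S$, each rescaled so that $E[|AX|^p] = n$, and returns $M_p[X] = A^{-p}$. For $p = 2$ the result should approximately recover the sample second moment matrix $E[X \otimes X]$.

import numpy as np
from scipy.optimize import minimize

def pth_moment_matrix(samples, p):
    # Estimate M_p[X] = A^{-p} from samples of X, where A in S maximizes
    # log|A| subject to E[|AX|^p] = n (the characterization in Theorem 1).
    N, n = samples.shape
    tril = np.tril_indices(n)

    def A_of(params):
        L = np.zeros((n, n))
        L[tril] = params
        A0 = L @ L.T + 1e-10 * np.eye(n)                    # candidate in S
        mom = np.mean(np.linalg.norm(samples @ A0, axis=1) ** p)
        return (n / mom) ** (1.0 / p) * A0                   # rescale so that E[|AX|^p] = n

    def neg_logdet(params):
        return -np.linalg.slogdet(A_of(params))[1]

    res = minimize(neg_logdet, np.eye(n)[tril], method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-9, "fatol": 1e-12})
    A = A_of(res.x)
    w, V = np.linalg.eigh(A)
    return A, (V * w ** (-p)) @ V.T                          # (A, M_p[X] = A^{-p})

rng = np.random.default_rng(0)
X = rng.standard_normal((20000, 2)) @ np.array([[2.0, 0.3], [0.0, 1.0]])
A, M2 = pth_moment_matrix(X, 2.0)
print(M2)                      # should be close to the sample second moment matrix
print(X.T @ X / len(X))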

III. MOMENT-ENTROPY INEQUALITIES

A. Entropy

The Shannon entropy of a random vector $X$ is defined to be
$$h[X] = -\int_{\mathbb{R}^n} f_X \log f_X\, dx,$$
provided that the integral above exists. For $\lambda > 0$ the $\lambda$-Renyi entropy power of a density function is defined to be
$$N_\lambda[X] = \begin{cases} \left(\int_{\mathbb{R}^n} f_X^\lambda\right)^{\frac{1}{1-\lambda}} & \text{if } \lambda \neq 1,\\[1ex] e^{h[X]} & \text{if } \lambda = 1,\end{cases}$$
provided that the integral above exists. Observe that
$$\lim_{\lambda \to 1} N_\lambda[X] = N_1[X].$$
The $\lambda$-Renyi entropy of a random vector $X$ is defined to be
$$h_\lambda[X] = \log N_\lambda[X].$$
The entropy $h_\lambda[X]$ is continuous in $\lambda$ and, by the Hölder inequality, decreasing in $\lambda$. It is strictly decreasing, unless $X$ is a uniform random vector. It follows by the chain rule that
$$N_\lambda[AX] = |A|\, N_\lambda[X], \qquad (2)$$
for each $A \in S$.

B. Relative entropy

Given two random vectors $X, Y$ in $\mathbb{R}^n$, their relative Shannon entropy or Kullback–Leibler distance [6], [5], [1] (also, see page 231 in [4]) is defined by
$$h_1[X,Y] = \int_{\mathbb{R}^n} f_X \log\frac{f_X}{f_Y}\, dx, \qquad (3)$$
provided that the integral above exists. Given $\lambda > 0$, we define the relative $\lambda$-Renyi entropy power of $X$ and $Y$ as follows. If $\lambda \neq 1$, then
$$N_\lambda[X,Y] = \frac{\left(\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\, dx\right)^{\frac{1}{1-\lambda}} \left(\int_{\mathbb{R}^n} f_Y^\lambda\, dx\right)^{\frac{1}{\lambda}}}{\left(\int_{\mathbb{R}^n} f_X^\lambda\, dx\right)^{\frac{1}{\lambda(1-\lambda)}}}, \qquad (4)$$
and, if $\lambda = 1$, then
$$N_1[X,Y] = e^{h_1[X,Y]},$$
provided in both cases that the righthand side exists. Define the $\lambda$-Renyi relative entropy of random vectors $X$ and $Y$ by
$$h_\lambda[X,Y] = \log N_\lambda[X,Y].$$
Observe that $h_\lambda[X,Y]$ is continuous in $\lambda$.
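As a quick numerical illustration of the $\lambda$-Renyi entropy power (ours, not from the paper), the following Python snippet estimates $N_\lambda[X]$ for a standard Gaussian vector by Monte Carlo, using the identity $\int f^\lambda\, dx = E[f(X)^{\lambda-1}]$ for $X$ with density $f$, and compares it with the closed form $N_\lambda[X] = (2\pi)^{n/2}\lambda^{-n/(2(1-\lambda))}$; as $\lambda \to 1$ both tend to $(2\pi e)^{n/2} = e^{h[X]}$.

import numpy as np

# For X with density f and lambda != 1, int f^lambda dx = E[f(X)^(lambda - 1)],
# so N_lambda[X] = E[f(X)^(lambda - 1)]^(1/(1 - lambda)) can be estimated by sampling.
rng = np.random.default_rng(1)
n, lam, N = 3, 0.8, 200_000
X = rng.standard_normal((N, n))                               # standard Gaussian in R^n
f_vals = (2 * np.pi) ** (-n / 2) * np.exp(-0.5 * np.sum(X ** 2, axis=1))
N_mc = np.mean(f_vals ** (lam - 1)) ** (1 / (1 - lam))
N_exact = (2 * np.pi) ** (n / 2) * lam ** (-n / (2 * (1 - lam)))
print(N_mc, N_exact)          # both tend to (2*pi*e)^(n/2) = e^{h[X]} as lam -> 1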

Lemma 2: If $X$ and $Y$ are random vectors such that $h_\lambda[X]$, $h_\lambda[Y]$, and $h_\lambda[X,Y]$ are finite, then
$$h_\lambda[X,Y] \geq 0.$$
Equality holds if and only if $X = Y$.

Proof: If $\lambda > 1$, then by the Hölder inequality,
$$\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\, dx \leq \left(\int_{\mathbb{R}^n} f_Y^\lambda\, dx\right)^{\frac{\lambda-1}{\lambda}} \left(\int_{\mathbb{R}^n} f_X^\lambda\, dx\right)^{\frac{1}{\lambda}},$$
and if $\lambda < 1$, then we have
$$\int_{\mathbb{R}^n} f_X^\lambda = \int_{\mathbb{R}^n} (f_Y^{\lambda-1} f_X)^\lambda f_Y^{\lambda(1-\lambda)} \leq \left(\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\right)^\lambda \left(\int_{\mathbb{R}^n} f_Y^\lambda\right)^{1-\lambda}.$$
The inequality for $\lambda = 1$ follows by taking the limit $\lambda \to 1$. The equality conditions for $\lambda \neq 1$ follow from the equality conditions of the Hölder inequality. The inequality for $\lambda = 1$, including the equality condition, follows from the Jensen inequality (details may be found, for example, on page 234 in [4]).

C. Generalized Gaussians

We call the extremal random vectors for the moment-entropy inequalities generalized Gaussians and recall their definition here.

Given $t \in \mathbb{R}$, let
$$t_+ = \max(t, 0).$$
Let
$$\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx$$
denote the Gamma function, and let
$$\beta(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$
denote the Beta function.

For each $p \in (0, \infty)$ and $\lambda \in (n/(n+p), \infty)$, define the standard generalized Gaussian to be the random vector $Z$ in $\mathbb{R}^n$ whose density function $f_Z : \mathbb{R}^n \to [0, \infty)$ is given by
$$f_Z(x) = \begin{cases} a_{p,\lambda}\, \big(1 + (1-\lambda)|x|^p\big)_+^{1/(\lambda-1)} & \text{if } \lambda \neq 1,\\[1ex] a_{p,1}\, e^{-|x|^p} & \text{if } \lambda = 1,\end{cases} \qquad (5)$$
where
$$a_{p,\lambda} = \begin{cases} \dfrac{p\,(1-\lambda)^{n/p}}{n\omega_n\, \beta\!\left(\frac{n}{p}, \frac{1}{1-\lambda} - \frac{n}{p}\right)} & \text{if } \lambda < 1,\\[2ex] \dfrac{p}{n\omega_n\, \Gamma\!\left(\frac{n}{p}\right)} & \text{if } \lambda = 1,\\[2ex] \dfrac{p\,(\lambda-1)^{n/p}}{n\omega_n\, \beta\!\left(\frac{n}{p}, \frac{\lambda}{\lambda-1}\right)} & \text{if } \lambda > 1.\end{cases}$$
Any random vector $Y$ in $\mathbb{R}^n$ that can be written as $Y = AZ$, for some $A \in S$, is called a generalized Gaussian.
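The normalizing constant $a_{p,\lambda}$ in (5) can be checked numerically. The following Python sketch (our illustration; the helper names are hypothetical) evaluates $a_{p,\lambda}$ from the three-case formula and verifies by one-dimensional radial integration that $f_Z$ integrates to 1.

import numpy as np
from scipy.integrate import quad
from scipy.special import beta, gamma

def a_p_lambda(n, p, lam):
    # normalizing constant of the standard generalized Gaussian in (5)
    omega_n = np.pi ** (n / 2) / gamma(n / 2 + 1)             # volume of the unit ball
    if lam < 1:
        return p * (1 - lam) ** (n / p) / (n * omega_n * beta(n / p, 1 / (1 - lam) - n / p))
    if lam == 1:
        return p / (n * omega_n * gamma(n / p))
    return p * (lam - 1) ** (n / p) / (n * omega_n * beta(n / p, lam / (lam - 1)))

def total_mass(n, p, lam):
    # integrate f_Z over R^n in radial coordinates; the result should be ~ 1
    a = a_p_lambda(n, p, lam)
    omega_n = np.pi ** (n / 2) / gamma(n / 2 + 1)
    if lam == 1:
        radial, upper = (lambda r: np.exp(-r ** p)), np.inf
    else:
        radial = lambda r: np.maximum(1 + (1 - lam) * r ** p, 0.0) ** (1 / (lam - 1))
        upper = np.inf if lam < 1 else (1 / (lam - 1)) ** (1 / p)
    return quad(lambda r: n * omega_n * a * radial(r) * r ** (n - 1), 0, upper)[0]

for lam in (0.7, 1.0, 2.5):
    print(lam, total_mass(n=2, p=1.5, lam=lam))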

D. Information measures of generalized Gaussians

If $0 < p < \infty$ and $\lambda > n/(n+p)$, the $\lambda$-Renyi entropy power of the standard generalized Gaussian random vector $Z$ is given by
$$N_\lambda[Z] = \begin{cases} a_{p,\lambda}^{-1} \left(1 + \dfrac{n(\lambda-1)}{p\lambda}\right)^{\frac{1}{\lambda-1}} & \text{if } \lambda \neq 1,\\[2ex] e^{n/p}\, a_{p,1}^{-1} & \text{if } \lambda = 1.\end{cases}$$
If $0 < p < \infty$ and $\lambda > n/(n+p)$, then the p-th moment of $Z$ is given by
$$E[|Z|^p] = \left[\lambda\left(1 + \frac{p}{n}\right) - 1\right]^{-1}.$$
We define the constant
$$c(n, p, \lambda) = \frac{E[|Z|^p]^{1/p}}{N_\lambda[Z]^{1/n}} = a_{p,\lambda}^{1/n} \left[\lambda\left(1 + \frac{p}{n}\right) - 1\right]^{-1/p} b(n, p, \lambda), \qquad (6)$$
where
$$b(n, p, \lambda) = \begin{cases} \left(1 - \dfrac{n(1-\lambda)}{p\lambda}\right)^{\frac{1}{n(1-\lambda)}} & \text{if } \lambda \neq 1,\\[2ex] e^{-1/p} & \text{if } \lambda = 1.\end{cases}$$
Observe that if $\lambda \neq 1$ and $0 < p < \infty$, then
$$\int_{\mathbb{R}^n} f_Z^\lambda = a_{p,\lambda}^{\lambda-1}\, \big(1 + (1-\lambda)E[|Z|^p]\big), \qquad (7)$$
and if $\lambda = 1$, then
$$h[Z] = -\log a_{p,1} + E[|Z|^p]. \qquad (8)$$
We will also need the following scaling identities:
$$f_{tZ}(x) = t^{-n} f_Z(t^{-1}x), \qquad (9)$$
for each $x \in \mathbb{R}^n$. Therefore,
$$\int_{\mathbb{R}^n} f_{tZ}^\lambda\, dx = t^{n(1-\lambda)} \int_{\mathbb{R}^n} f_Z^\lambda\, dx, \qquad (10)$$
and
$$E[|tZ|^p] = t^p E[|Z|^p].$$

E. Spherical moment-entropy inequalities

The proof of Theorem 2 in [9] extends easily to prove the following. A more general version can be found in [7].

Theorem 3: If $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ such that $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{E[|X|^p]^{1/p}}{N_\lambda[X]^{1/n}} \geq c(n, p, \lambda),$$
where $c(n, p, \lambda)$ is given by (6). Equality holds if and only if $X = tZ$, for some $t \in (0, \infty)$.

Proof: For convenience let $a = a_{p,\lambda}$. Let
$$t = \left(\frac{E[|X|^p]}{E[|Z|^p]}\right)^{1/p} \qquad (11)$$
and $Y = tZ$.

If $\lambda \neq 1$, then by (9) and (5), (1), (11), and (7),
$$\begin{aligned}\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X &\geq a^{\lambda-1} t^{n(1-\lambda)} + (1-\lambda)\,a^{\lambda-1} t^{n(1-\lambda)-p} \int_{\mathbb{R}^n} |x|^p f_X(x)\, dx\\ &= a^{\lambda-1} t^{n(1-\lambda)} \big(1 + (1-\lambda)t^{-p} E[|X|^p]\big)\\ &= a^{\lambda-1} t^{n(1-\lambda)} \big(1 + (1-\lambda)E[|Z|^p]\big)\\ &= t^{n(1-\lambda)} \int_{\mathbb{R}^n} f_Z^\lambda,\end{aligned} \qquad (12)$$
where equality holds if $\lambda < 1$. It follows that if $\lambda \neq 1$, then by Lemma 2, (4), (10) and (12), and (11), we have
$$1 \leq N_\lambda[X,Y] = \left(\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\right)^{\frac{1}{1-\lambda}} \left(\int_{\mathbb{R}^n} f_Y^\lambda\right)^{\frac{1}{\lambda}} \left(\int_{\mathbb{R}^n} f_X^\lambda\right)^{-\frac{1}{\lambda(1-\lambda)}} \leq \left(t^n\, \frac{N_\lambda[Z]}{N_\lambda[X]}\right)^{1/\lambda} = \left(\frac{E[|X|^p]^{n/p}}{E[|Z|^p]^{n/p}}\, \frac{N_\lambda[Z]}{N_\lambda[X]}\right)^{1/\lambda}.$$
If $\lambda = 1$, then by Lemma 2, (3) and (5), and (8) and (11),
$$\begin{aligned}0 &\leq h_1[X,Y]\\ &= -h[X] - \log a + n\log t + t^{-p}E[|X|^p]\\ &= -h[X] + h[Z] + \frac{n}{p}\log\frac{E[|X|^p]}{E[|Z|^p]}.\end{aligned}$$
In both cases, rearranging gives $E[|X|^p]^{1/p}/N_\lambda[X]^{1/n} \geq E[|Z|^p]^{1/p}/N_\lambda[Z]^{1/n} = c(n, p, \lambda)$. Lemma 2 shows that equality holds in all cases if and only if $Y = X$.

F. Elliptic moment-entropy inequalities

Corollary 4: If $A \in S$, $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ satisfying $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{E[|X|_A^p]^{1/p}}{|A|^{1/n} N_\lambda[X]^{1/n}} \geq c(n, p, \lambda), \qquad (13)$$
where $c(n, p, \lambda)$ is given by (6). Equality holds if and only if $X = tA^{-1}Z$ for some $t \in (0, \infty)$.

Proof: By (2) and Theorem 3,
$$\frac{E[|X|_A^p]^{1/p}}{|A|^{1/n} N_\lambda[X]^{1/n}} = \frac{E[|AX|^p]^{1/p}}{N_\lambda[AX]^{1/n}} \geq \frac{E[|Z|^p]^{1/p}}{N_\lambda[Z]^{1/n}},$$
and equality holds if and only if $AX = tZ$ for some $t \in (0, \infty)$.

G. Affine moment-entropy inequalities

Optimizing Corollary 4 over all $A \in S$ yields the following affine inequality.

Theorem 5: If $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ satisfying $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{|M_p[X]|^{1/p}}{N_\lambda[X]} \geq n^{-n/p}\, c(n, p, \lambda)^n,$$
where $c(n, p, \lambda)$ is given by (6). Equality holds if and only if $X = A^{-1}Z$ for some $A \in S$.

Proof: Substitute $A = M_p[X]^{-1/p}$ into (13).

Conversely, Corollary 4 follows from Theorem 5 by Theorem 1.
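The constant $c(n, p, \lambda)$ in (6) is explicit, so Theorem 3 can be spot-checked numerically. The Python sketch below (ours, not from the paper) computes $c(n, p, \lambda)$ from the closed forms above and compares it with the ratio $E[|X|^p]^{1/p}/N_\lambda[X]^{1/n}$ for $X$ uniform on the unit ball, for which $E[|X|^p] = n/(n+p)$ and $N_\lambda[X] = \omega_n$; every comparison is expected to report that the inequality holds.

import numpy as np
from scipy.special import beta, gamma

def c_const(n, p, lam):
    # the constant c(n, p, lambda) of (6), assembled from a_{p,lambda} and b(n, p, lambda)
    omega_n = np.pi ** (n / 2) / gamma(n / 2 + 1)
    if lam < 1:
        a = p * (1 - lam) ** (n / p) / (n * omega_n * beta(n / p, 1 / (1 - lam) - n / p))
    elif lam == 1:
        a = p / (n * omega_n * gamma(n / p))
    else:
        a = p * (lam - 1) ** (n / p) / (n * omega_n * beta(n / p, lam / (lam - 1)))
    b = np.exp(-1 / p) if lam == 1 else (1 - n * (1 - lam) / (p * lam)) ** (1 / (n * (1 - lam)))
    return a ** (1 / n) * (lam * (1 + p / n) - 1) ** (-1 / p) * b

n = 3
omega_n = np.pi ** (n / 2) / gamma(n / 2 + 1)
for p in (0.5, 1.0, 2.0):
    for lam in (0.9, 1.0, 3.0):
        if lam <= n / (n + p):
            continue                                           # outside the admissible range
        ratio = (n / (n + p)) ** (1 / p) / omega_n ** (1 / n)  # uniform-ball moment/entropy ratio
        print(p, lam, bool(ratio >= c_const(n, p, lam)))       # expected: True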
IV. PROOF OF THEOREM 1

A. Isotropic position of a probability measure

A Borel measure $\mu$ on $\mathbb{R}^n$ is said to be in isotropic position, if
$$\int_{\mathbb{R}^n} \frac{x \otimes x}{|x|^2}\, d\mu(x) = \frac{1}{n}\, I, \qquad (14)$$
where $I$ is the identity matrix.

Lemma 6: If $p \geq 0$ and $\mu$ is a Borel probability measure in isotropic position, then for each $A \in S$,
$$|A|^{-1/n} \left(\int_{\mathbb{R}^n} \frac{|Ax|^p}{|x|^p}\, d\mu(x)\right)^{1/p} \geq 1,$$
with equality holding if and only if $A = aI$ for some $a > 0$. (For $p = 0$ the left side is interpreted as $|A|^{-1/n}\exp\int_{\mathbb{R}^n} \log(|Ax|/|x|)\, d\mu(x)$.)

Proof: By Hölder's inequality,
$$\left(\int_{\mathbb{R}^n} \frac{|Ax|^p}{|x|^p}\, d\mu(x)\right)^{1/p} \geq \exp\left(\int_{\mathbb{R}^n} \log\frac{|Ax|}{|x|}\, d\mu(x)\right),$$
so it suffices to prove the $p = 0$ case only. By (14),
$$\int_{\mathbb{R}^n} \frac{(x \cdot e)^2}{|x|^2}\, d\mu(x) = \frac{1}{n}, \qquad (15)$$
for any unit vector $e$.

Let $e_1, \ldots, e_n$ be an orthonormal basis of eigenvectors of $A$ with corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$. By the concavity of $\log$, and (15),
$$\begin{aligned}\int_{\mathbb{R}^n} \log\frac{|Ax|}{|x|}\, d\mu(x) &= \frac{1}{2}\int_{\mathbb{R}^n} \log\frac{|Ax|^2}{|x|^2}\, d\mu(x)\\ &= \frac{1}{2}\int_{\mathbb{R}^n} \log\left(\sum_{i=1}^n \lambda_i^2\, \frac{(x \cdot e_i)^2}{|x|^2}\right) d\mu(x)\\ &\geq \frac{1}{2}\sum_{i=1}^n \int_{\mathbb{R}^n} \log(\lambda_i^2)\, \frac{(x \cdot e_i)^2}{|x|^2}\, d\mu(x)\\ &= \log|A|^{1/n}.\end{aligned}$$
The equality condition follows from the strict concavity of $\log$.

B. Proof of theorem

Lemma 7: If $p > 0$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite p-th moment, then there exists $c > 0$ such that
$$E[|e \cdot X|^p] \geq c, \qquad (16)$$
for every unit vector $e$.

Proof: The left side of (16) is a positive continuous function on the unit sphere, which is compact.

Theorem 8: If $p \geq 0$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite p-th moment, then there exists $A \in S$, unique up to a scalar multiple, such that
$$|A|^{-1/n} E[|AX|^p]^{1/p} \leq |A'|^{-1/n} E[|A'X|^p]^{1/p}, \qquad (17)$$
for every $A' \in S$.

Proof: Let $S' \subset S$ be the subset of matrices whose maximum eigenvalue is exactly 1. This is a bounded set inside the set of all symmetric matrices, with its boundary $\partial S'$ equal to the positive semidefinite matrices with maximum eigenvalue 1 and minimum eigenvalue 0. Given $A' \in S'$, let $e$ be an eigenvector of $A'$ with eigenvalue 1. By Lemma 7,
$$|A'|^{-1/n} E[|A'X|^p]^{1/p} \geq |A'|^{-1/n} E[|X \cdot e|^p]^{1/p} \geq c^{1/p}\, |A'|^{-1/n}. \qquad (18)$$
Therefore, if $A'$ approaches the boundary $\partial S'$, the left side of (18) grows without bound. Since the left side of (18) is a continuous function on $S'$, the existence of a minimum follows.

Let $A \in S$ be such a minimum and $Y = AX$. Then for each $B \in S$,
$$\begin{aligned}|B|^{-1/n} E[|BY|^p]^{1/p} &= |A|^{1/n}\, |BA|^{-1/n} E[|(BA)X|^p]^{1/p}\\ &\geq |A|^{1/n}\, |A|^{-1/n} E[|AX|^p]^{1/p}\\ &= E[|Y|^p]^{1/p},\end{aligned} \qquad (19)$$
with equality holding if and only if equality holds in (17) with $A' = BA$. Setting $B = I + tB'$ for $B' \in S$, we get
$$|I + tB'|^{-1/n} E[|(I + tB')Y|^p]^{1/p} \geq E[|Y|^p]^{1/p},$$
for each $t$ near 0. It follows that
$$\frac{d}{dt}\bigg|_{t=0}\, |I + tB'|^{-1/n} E[|(I + tB')Y|^p]^{1/p} = 0,$$
for each $B' \in S$. A straightforward computation shows that this holds only if
$$\frac{1}{n}\, E[|Y|^p]\, I = E[Y \otimes Y\, |Y|^{p-2}]. \qquad (20)$$
Applying Lemma 6 to
$$d\mu(x) = \frac{|x|^p\, dm_Y(x)}{E[|Y|^p]}$$
implies that equality holds in (19) only if $B = aI$ for some $a \in (0, \infty)$. This, in turn, implies that equality holds in (17) only if $A' = aA$.

Theorem 1 follows from Theorem 8 by rescaling $A$ so that $E[|Y|^p] = n$ and substituting $Y = AX$ into (20).
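Lemma 6 can be illustrated on an exactly isotropic discrete measure. The Python sketch below (our illustration, not from the paper) places mass $1/(2n)$ at each of $\pm e_1, \ldots, \pm e_n$, so that (14) holds exactly, and checks that $|A|^{-1/n} \left(\int |Ax|^p/|x|^p\, d\mu\right)^{1/p} \geq 1$ for random $A \in S$, with equality when $A$ is a multiple of the identity.

import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 1.3
points = np.vstack([np.eye(n), -np.eye(n)])   # support of mu: +-e_1, ..., +-e_n, mass 1/(2n) each

def functional(A):
    # |A|^(-1/n) * ( int |Ax|^p / |x|^p dmu )^(1/p); here |x| = 1 on the support of mu
    vals = np.linalg.norm(points @ A, axis=1) ** p
    return np.linalg.det(A) ** (-1 / n) * np.mean(vals) ** (1 / p)

for _ in range(3):
    B = rng.standard_normal((n, n))
    A = B @ B.T + n * np.eye(n)               # a random element of S
    print(functional(A))                      # Lemma 6: always >= 1
print(functional(2.7 * np.eye(n)))            # equality case A = aI: prints 1.0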

REFERENCES

[1] Shun-ichi Amari, Differential-geometrical methods in statistics, Lecture Notes in Statistics, vol. 28, Springer-Verlag, New York, 1985. MR 86m:62053
[2] Erdal Arikan, An inequality on guessing and its application to sequential decoding, IEEE Trans. Inform. Theory 42 (1996), 99–105.
[3] J. Bastero and M. Romance, Positions of convex bodies associated to extremal problems and isotropic measures, Adv. Math. 184 (2004), no. 1, 64–88.
[4] Thomas M. Cover and Joy A. Thomas, Elements of information theory, John Wiley & Sons Inc., New York, 1991, A Wiley-Interscience Publication.
[5] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar. 2 (1967), 299–318. MR 36 #2428
[6] S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statistics 22 (1951), 79–86. MR 12,623a
[7] E. Lutwak, D. Yang, and G. Zhang, The Cramer-Rao inequality for star bodies, Duke Math. J. 112 (2002), 59–81.
[8] E. Lutwak, D. Yang, and G. Zhang, Moment-entropy inequalities, Annals of Probability 32 (2004), 757–774.
[9] E. Lutwak, D. Yang, and G. Zhang, Cramer-Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information, IEEE Trans. Inform. Theory 51 (2005), 473–478.
[10] E. Lutwak, D. Yang, and G. Zhang, Lp John ellipsoids, Proc. London Math. Soc. 90 (2005), 497–520.
[11] E. Lutwak, D. Yang, and G. Zhang, Optimal Sobolev norms and the Lp Minkowski problem, Int. Math. Res. Not. (2006), 62987, 1–21.

Erwin Lutwak Erwin Lutwak received his B.S., M.S., and Ph.D. degrees in Mathematics from Polytechnic University, where he is now Professor of Mathematics.

Deane Yang Deane Yang received his B.A. in mathematics and physics from University of Pennsylvania and Ph.D. in mathematics from Harvard University. He has been an NSF Postdoctoral Fellow at the Courant Institute and on the faculty of Rice University and Columbia University. He is now a full professor at Polytechnic University.

Gaoyong Zhang Gaoyong Zhang received his B.S. degree in mathematics from Wuhan University of Science and Technology, M.S. degree in mathematics from Wuhan University, Wuhan, China, and Ph.D. degree in mathematics from Temple University, Philadelphia. He was a Rademacher Lecturer at the University of Pennsylvania, a member of the Institute for Advanced Study at Princeton, and a member of the Mathematical Sciences Research Institute at Berkeley. He was an assistant professor and is now a full professor at Polytechnic University.