Affine moments of a random vector

Erwin Lutwak, Songjun Lv, Deane Yang, and Gaoyong Zhang

E. Lutwak ([email protected]), D. Yang ([email protected]), and G. Zhang ([email protected]) are with the Department of Mathematics, Polytechnic Institute of New York University, and were supported in part by NSF Grant DMS-0706859. S. Lv ([email protected]) is with the College of Mathematics and Computer Science, Chongqing Normal University, and was supported in part by Chinese NSF Grant 10801140.

Abstract—An affine invariant p-th moment is defined for a random vector and used to prove sharp moment-entropy inequalities that are more general and stronger than standard moment-entropy inequalities.

Index Terms—Moment, affine moment, information measure, information theory, entropy, Rényi entropy, Shannon entropy, Shannon theory, entropy inequality, Gaussian, generalized Gaussian.

I. INTRODUCTION

The moment-entropy inequality states that a continuous random variable with given second moment has maximal Shannon entropy if and only if it is a Gaussian (see, for example, [15], Theorem 9.6.5). The Fisher information inequality states that a continuous random variable with given Fisher information has minimal Shannon entropy if and only if it is a Gaussian (see [40]). In [32] these classical inequalities were extended to sharp inequalities involving the Rényi entropy, p-th moment, and generalized Fisher information of a random variable, where the extremal distributions are no longer Gaussians but are fat-tailed distributions that the authors call generalized Gaussians.

There are different ways to extend the definition of a p-th moment to random vectors in $\mathbb{R}^n$. The most obvious one is to define it using the Euclidean norm of the random vector. This, however, assumes that the standard inner product on $\mathbb{R}^n$ gives the right invariant scalar measure of error or noise in the random vector. It is more appropriate to seek a definition of moment that does not rely on the standard inner product.

For example, if the moment of a random vector is defined using the standard Euclidean norm, then the extremals for the corresponding classical moment-entropy inequality are Gaussians with covariance matrix equal to a multiple of the identity. It is more desirable to define an invariant moment measure, where any Gaussian (or generalized Gaussian) is an extremal distribution and not just the one whose covariance matrix is a multiple of the identity.

One approach for defining an invariant moment measure, taken in [35], leads to the definition of the p-th moment matrix and the corresponding sharp moment-entropy inequalities. Here, we introduce a different approach, where a scalar affine moment measure is defined by averaging 1-dimensional p-th moments, obtained by projecting an n-dimensional random vector along each possible direction in $\mathbb{R}^n$, with respect to a given measure, and then optimizing this average over all probability measures with given p-th moment. An explicit integral formula for this moment is derived. We then establish sharp information inequalities giving a sharp lower bound of Rényi entropy in terms of the affine moment. These new affine moment-entropy inequalities imply the moment-entropy inequalities obtained in [32], [35].

In [26] an approach similar to the one taken here is used to define an affine invariant version of Fisher information, and a corresponding sharp Fisher information inequality is established.

It is worth noting that the affine moment measure introduced here, as well as the affine Fisher information studied in [26], is closely related to analogous affine invariant generalizations of the surface area of a convex body (see, for example, [9], [27], [30]). In fact, the results presented here are part of an ongoing effort by the authors to explore connections between information theory and geometry.

Previous results on this include a connection between the entropy power inequality in information theory and the Brunn-Minkowski inequality in convex geometry, first demonstrated by Lieb [23] and also discussed by Costa and Cover [12]. This was developed further by Cover, Dembo, and Thomas [14] (also, see [15]). Also, see [10] for generalizations of various entropy and Fisher information inequalities related to mass transportation, and [4]–[8], [37], [38] for a new connection between affine information inequalities and log-concavity.

In view of this, the authors of [19] began to systematically explore connections between information theory and convex geometry. The goals are to both establish information-theoretic inequalities that are the counterparts of geometric inequalities and investigate possible applications of ideas in information theory to convex geometry. This has led to papers in both information theory (see [19], [26], [31], [32], [35]) and convex geometry (see [28], [29]). In the next section we follow the suggestion of a referee and provide a brief survey of this ongoing effort.

II. INFORMATION THEORETIC AND GEOMETRIC INEQUALITIES

The usual way of associating a random vector X in $\mathbb{R}^n$ with a compact convex set K in $\mathbb{R}^n$ is to define X as the uniform random vector in K. In [19], a different construction of random vectors associated to a star-shaped set was introduced. It was also shown that the information theoretic invariants of the distributions constructed, called contoured distributions, are equivalent to geometric invariants of the convex set K. This provides a direct link between sharp information theoretic inequalities satisfied by contoured distributions and sharp geometric inequalities satisfied by convex sets.

Let K be a bounded star-shaped set in $\mathbb{R}^n$ about the origin. Its gauge function, $g_K : \mathbb{R}^n \to [0, \infty)$, is defined by
$$g_K(x) = \inf\{t > 0 : x \in tK\}.$$
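The gauge function can be evaluated numerically from nothing more than a membership test for K, using the equivalence $x \in tK \iff x/t \in K$ for $t > 0$. The following is a minimal sketch, not from the paper; the function names and the bisection tolerances are our own choices.

```python
import numpy as np

def gauge(x, contains, t_max=1e6, iters=60):
    """Gauge function g_K(x) = inf{t > 0 : x in tK} of a star-shaped set K
    (with the origin in its interior), given only a membership oracle `contains`.
    Bisects on t, using: x in tK  iff  x/t in K."""
    x = np.asarray(x, dtype=float)
    if np.allclose(x, 0.0):
        return 0.0
    lo, hi = 0.0, t_max
    if not contains(x / hi):        # x is outside every dilate tK we test
        return np.inf
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if contains(x / mid):
            hi = mid                # x in (mid)K, so g_K(x) <= mid
        else:
            lo = mid
    return hi

# Example: K = unit Euclidean ball, for which g_K(x) = |x|.
in_ball = lambda y: np.dot(y, y) <= 1.0
print(gauge([3.0, 4.0], in_ball))   # ~5.0
```

For the unit ball the gauge is the Euclidean norm, which the example reproduces; for a general star-shaped K only the membership oracle changes.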

The gauge function is homogeneous of degree 1.

A random vector X has a contoured distribution if its probability density function is given by
$$f_X(x) = \psi(g_K(x - x_0)),$$
where K is a bounded star-shaped set with respect to the point $x_0$, and $\psi$ is a 1-dimensional function that we call the radial profile. If $\psi$ is monotone, then the level sets of $f_X$ are dilates of K with respect to $x_0$.

A straightforward calculation shows that the entropy $h(X)$ of X is given by
$$h(X) = c_0(\psi, n)\, V(K),$$
where $c_0(\psi, n)$ is a constant depending only on the radial profile $\psi$ and the dimension n, and $V(K)$ is the n-dimensional volume (i.e., n-dimensional Lebesgue measure) of K. Another calculation shows that the Fisher information $J(X)$ of X is given by
$$J(X) = c_1(\psi, n)\, S_2(K),$$
where $c_1(\psi, n)$ is another constant depending on $\psi$ and n, and
$$S_2(K) = \int_{\partial K} \frac{dS(x)}{x \cdot \nu(x)}$$
is called the $L_2$ surface area of K, where the outer unit normal vector $\nu(x)$ exists for almost every $x \in \partial K$ with respect to the $(n-1)$-dimensional Hausdorff measure dS.

The $L_2$ surface area $S_2(K)$ is a geometric invariant in the $L_p$ Brunn-Minkowski theory in convex geometry. The usual surface area is viewed as the $L_1$ surface area in the $L_p$ Brunn-Minkowski theory (see [25]). The formulas for the Fisher information and the $L_2$ surface area imply that classical information theory is closely associated with the $L_2$ Brunn-Minkowski theory.

Let K be a convex body (compact convex set with nonempty interior) and X be a random vector in $\mathbb{R}^n$. Denote by B the n-dimensional unit ball and by Z the n-dimensional standard Gaussian random vector. The following variational formula,
$$\lim_{t \to 0^+} \frac{V(K +_2 \sqrt{t}\,B) - V(K)}{t} = \frac{1}{2}S_2(K),$$
where $+_2$ is the $L_2$ Minkowski addition (see [25]), and de Bruijn's identity,
$$\lim_{t \to 0^+} \frac{h(X + \sqrt{t}\,Z) - h(X)}{t} = \frac{1}{2}J(X),$$
further illustrate the close connection of information theory and geometry.

These connections lead to the definition of a new ellipsoid in the $L_2$ Brunn-Minkowski theory which is the counterpart of the Fisher information matrix (see [28]), and thus a Cramér-Rao inequality for star-shaped sets was proved in [29]. The results were, in turn, applied to information theory to show new information-theoretic inequalities (see [19]).

It is then natural to investigate the counterpart of the $L_p$ Brunn-Minkowski theory in information theory. The new ellipsoid discovered was shown in [33] to be the $L_2$ case of a family of ellipsoids, called $L_p$ John ellipsoids, while the classical John ellipsoid (the ellipsoid of maximal volume inside a convex body) is the $L_\infty$ case. This new result in convex geometry suggests that it is natural to define a concept of $L_p$ Fisher information matrix as a corresponding object of the $L_p$ John ellipsoid. The usual Fisher information matrix is the $L_2$ case. This was done in the recent paper [26]. An extension of the covariance matrix to the $L_p$ covariance matrix (also called the p-moment matrix) was given earlier in the paper [35], which corresponds to another family of ellipsoids in geometry that contains the classical Legendre ellipsoid and is conceptually dual to the family of $L_p$ John ellipsoids.

Affine isoperimetric inequalities are central in the $L_p$ Brunn-Minkowski theory. The authors have been exploring their corresponding affine information-theoretic inequalities. See the survey papers [24] and [42] on affine isoperimetric inequalities in convex geometry.

For $p \ge 1$, $\lambda > \frac{n}{n+p}$, and independent random vectors X, Y in $\mathbb{R}^n$, the following moment-entropy inequality was proved in [31],
$$E(|X \cdot Y|^p) \ge c\, N_\lambda(X)^p N_\lambda(Y)^p, \qquad (1)$$
where $N_\lambda$ denotes the $\lambda$-Rényi entropy power, and c is the best constant, which is attained when X, Y are certain generalized Gaussian random vectors. The affine isoperimetric inequality behind this moment-entropy inequality is an $L_p$ extension of the well-known Blaschke-Santaló inequality in geometry (see [36]).

The Shannon entropy $h(X)$ and the $\lambda$-Rényi entropy power $N_\lambda(X)$ of a continuous random vector X in $\mathbb{R}^n$ are affine invariants, that is, they are invariant under volume-preserving linear transformations of random vectors. To establish affine information-theoretic inequalities as counterparts of affine isoperimetric inequalities, affine notions of Fisher information and moments, as corresponding notions of affine surface areas, are needed.

In [26], the authors introduced the notion of affine $(p, \lambda)$-Fisher information $\Psi_{p,\lambda}(X)$ of a random vector X in $\mathbb{R}^n$, which is an analogue of the $L_p$ integral affine surface area of a convex body. It was shown that the following affine Fisher information and entropy inequality holds:
$$\Psi_{p,\lambda}(X)\, N_\lambda(X)^{p((\lambda-1)n+1)} \ge c, \qquad (2)$$
where $1 \le p < n$, $\lambda > 1 - \frac{1}{n}$, and c is the best constant, which is attained when X is a generalized Gaussian random vector. This inequality is proved by using an $L_p$ affine Sobolev inequality established in [30] (see also [41]). The $L_p$ affine Sobolev inequality is stronger than the classical $L_p$ Sobolev inequality and comes from the $L_p$ Petty projection inequality established in [27], which is an important affine isoperimetric inequality in the $L_p$ Brunn-Minkowski theory of convex geometry.

It is one of the purposes of this paper to introduce the notion of the affine p-th moment $M_p(X)$ of a random vector X in $\mathbb{R}^n$, and to establish an affine moment-entropy inequality. We shall prove the following theorem.

Theorem 1: If $1 \le p < \infty$, $\lambda > \frac{n}{n+p}$, and X is a random vector in $\mathbb{R}^n$ with finite $\lambda$-Rényi entropy and p-th moment, then
$$M_p(X) \ge c\, N_\lambda(X)^p, \qquad (3)$$
where c is the best constant, which is attained only when X is a generalized Gaussian random vector.

The inequality (3) is proved by using (1). Thus, the affine isoperimetric inequality behind the affine moment-entropy inequality (3) is the $L_p$ extension of the Blaschke-Santaló inequality.
III. PRELIMINARIES

Let X be a random vector in $\mathbb{R}^n$ with probability density function $f_X$. If A is an invertible $n \times n$ matrix, then
$$f_{AX}(y) = |A|^{-1} f_X(A^{-1}y), \qquad (4)$$
where $|A|$ is the absolute value of the determinant of A.

A. The p-th moment of a random vector

For $p \in (0, \infty)$, the p-th moment of X is defined to be
$$E(|X|^p) = \int_{\mathbb{R}^n} |x|^p f_X(x)\,dx,$$
where $|\cdot|$ denotes the standard Euclidean norm and dx the standard Lebesgue measure in $\mathbb{R}^n$.

B. Entropy power

The Shannon entropy of X is given by
$$h(X) = -\int_{\mathbb{R}^n} f_X(x) \log f_X(x)\,dx.$$
For $\lambda > 0$, the $\lambda$-Rényi entropy power of X is defined to be
$$N_\lambda(X) = \begin{cases} \left(\displaystyle\int_{\mathbb{R}^n} f_X^\lambda(x)\,dx\right)^{\frac{1}{n(1-\lambda)}} & \text{if } \lambda \ne 1,\\[2mm] e^{\frac{1}{n}h(X)} & \text{if } \lambda = 1,\end{cases}$$
and the $\lambda$-Rényi entropy is
$$h_\lambda(X) = n \log N_\lambda(X).$$
By (4),
$$N_\lambda(AX) = |A|^{\frac{1}{n}} N_\lambda(X), \qquad (5)$$
for any invertible matrix A.

IV. GENERALIZED GAUSSIANS

A. Definition

If $\alpha > 0$ and $\beta < \alpha/n$, the corresponding generalized standard Gaussian random vector $Z \in \mathbb{R}^n$ has density function
$$f_Z(x) = \begin{cases} a_{\alpha,\beta}\left(1 - \dfrac{\beta}{\alpha}|x|^\alpha\right)_+^{\frac{1}{\beta} - \frac{n}{\alpha} - 1} & \text{if } \beta \ne 0,\\[2mm] a_{\alpha,0}\, e^{-|x|^\alpha/\alpha} & \text{if } \beta = 0,\end{cases}$$
where $t_+ = \max\{t, 0\}$ for $t \in \mathbb{R}$,
$$a_{\alpha,\beta} = \frac{\Gamma(\frac{n}{2}+1)}{\pi^{n/2}} \cdot \begin{cases} \dfrac{\frac{\alpha}{n}\left(-\frac{\beta}{\alpha}\right)^{n/\alpha}}{B\!\left(\frac{n}{\alpha},\, 1 - \frac{1}{\beta}\right)} & \text{if } \beta < 0,\\[3mm] \dfrac{1}{\alpha^{n/\alpha}\,\Gamma(\frac{n}{\alpha}+1)} & \text{if } \beta = 0,\\[3mm] \dfrac{\frac{\alpha}{n}\left(\frac{\beta}{\alpha}\right)^{n/\alpha}}{B\!\left(\frac{n}{\alpha},\, \frac{1}{\beta} - \frac{n}{\alpha}\right)} & \text{if } \beta > 0,\end{cases}$$
$\Gamma(\cdot)$ denotes the gamma function, and $B(\cdot,\cdot)$ denotes the beta function. Any random vector that can be written as $W = AZ$, for an invertible matrix A, is called a generalized Gaussian. If $\alpha = 2$ and $\beta = 0$, then Z is the standard Gaussian random vector with mean 0 and covariance matrix equal to the identity.

The density functions of the generalized standard Gaussian distributions, which are also called Barenblatt functions, have been found to arise naturally as extremals for Sobolev type inequalities and sharp inequalities relating moment, Rényi entropy, and generalized Fisher information (see, for example, [1], [3], [10], [11], [13], [16], [17], [20], [26], [31], [32], [34], [35]). The usual Gaussians and Student distributions are generalized Gaussians. The whole class of generalized Gaussian distributions in this form was first studied in [31] as extremal distributions of moment-entropy inequalities. Many authors have studied special cases of generalized Gaussians (see, for example, [1]–[3], [13], [16], [18], [21], [22], [39]).

B. Information measures of generalized Gaussians

Given $0 < p < \infty$ and $\lambda > n/(n+p)$, set the parameters $\alpha$ and $\beta$ of the standard generalized Gaussian Z so that
$$\alpha = p, \qquad \frac{1}{\beta} - \frac{n+\alpha}{\alpha} = \frac{1}{\lambda - 1}, \qquad (6)$$
and $\beta = 0$ when $\lambda = 1$. We assume throughout this paper that p, $\lambda$, and the parameters $\alpha$ and $\beta$ of the standard generalized Gaussian satisfy these equations.

The p-th moment of Z is given by
$$E(|Z|^p) = n, \qquad (7)$$
and its Rényi entropy power by
$$N_\lambda(Z) = \begin{cases} a_{\alpha,\beta}^{-\frac{1}{n}}\left(1 - \dfrac{n\beta}{\alpha}\right)^{\frac{1}{n(1-\lambda)}} & \text{if } \lambda \ne 1,\\[2mm] a_{\alpha,0}^{-\frac{1}{n}}\, e^{\frac{1}{\alpha}} & \text{if } \lambda = 1.\end{cases} \qquad (8)$$
See [35] for similar formulas. The following are sketches of the calculations.
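Before turning to those calculations, here is a small numerical sanity check of (6) and (7), not part of the paper; the parameter and function names are our own. The radial form of $f_Z$ is integrated in one dimension and normalized numerically, so the check does not rely on the closed-form constant $a_{\alpha,\beta}$.

```python
import numpy as np
from scipy.integrate import quad

def radial_profile(r, n, alpha, beta):
    """Unnormalized radial part of the generalized Gaussian density f_Z."""
    if beta == 0.0:
        return np.exp(-r**alpha / alpha)
    base = 1.0 - (beta / alpha) * r**alpha
    expo = 1.0 / beta - n / alpha - 1.0
    return np.where(base > 0.0, base, 0.0) ** expo     # the (.)_+ part

def moment_of_Z(p, lam, n=2):
    """Numerically check E(|Z|^p) = n, with alpha = p and beta set from lambda via (6)."""
    alpha = p
    beta = 0.0 if lam == 1.0 else 1.0 / ((n + alpha) / alpha + 1.0 / (lam - 1.0))
    r_max = (alpha / beta) ** (1.0 / alpha) if beta > 0 else np.inf   # finite support if beta > 0
    dens = lambda r: radial_profile(r, n, alpha, beta)
    mass = quad(lambda r: dens(r) * r**(n - 1), 0, r_max)[0]       # proportional to total mass
    mom  = quad(lambda r: dens(r) * r**(p + n - 1), 0, r_max)[0]   # proportional to E(|Z|^p)
    return mom / mass   # the spherical factor n*omega_n cancels in the ratio

for lam in (0.7, 1.0, 1.5):
    print(lam, moment_of_Z(p=2.0, lam=lam, n=2))   # all close to 2 = n
```

The three values of $\lambda$ exercise the cases $\beta < 0$, $\beta = 0$, and $\beta > 0$ of the definition above.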

For $\beta < 0$ (i.e., $\frac{n}{n+\alpha} < \lambda < 1$), by polar coordinates and the change of variable $1 - \frac{\beta}{\alpha}r^\alpha = \frac{1}{t}$, we have
$$\begin{aligned} E(|Z|^\alpha) &= a_{\alpha,\beta}\int_{\mathbb{R}^n} |x|^\alpha \left(1 - \tfrac{\beta}{\alpha}|x|^\alpha\right)_+^{\frac{1}{\beta}-\frac{n}{\alpha}-1} dx\\ &= a_{\alpha,\beta}\, n\omega_n \int_0^\infty \left(1 - \tfrac{\beta}{\alpha}r^\alpha\right)_+^{\frac{1}{\beta}-\frac{n}{\alpha}-1} r^{n+\alpha-1}\, dr\\ &= a_{\alpha,\beta}\, n\omega_n\, \frac{1}{\alpha}\left(\frac{\alpha}{-\beta}\right)^{\frac{n}{\alpha}+1} \int_0^1 t^{-\frac{1}{\beta}-1}(1-t)^{\frac{n}{\alpha}}\, dt\\ &= a_{\alpha,\beta}\, \frac{n\pi^{n/2}}{\alpha\,\Gamma(1+\frac{n}{2})}\left(\frac{\alpha}{-\beta}\right)^{\frac{n}{\alpha}+1} B\!\left(-\frac{1}{\beta},\, \frac{n}{\alpha}+1\right)\\ &= n, \end{aligned}$$
and
$$\begin{aligned} N_\lambda(Z) &= \left(a_{\alpha,\beta}^{\lambda}\int_{\mathbb{R}^n}\left(1 - \tfrac{\beta}{\alpha}|x|^\alpha\right)_+^{\lambda(\frac{1}{\beta}-\frac{n}{\alpha}-1)} dx\right)^{\frac{1}{n(1-\lambda)}}\\ &= a'_{\alpha,\beta}\left[n\omega_n \int_0^\infty \left(1 - \tfrac{\beta}{\alpha}r^\alpha\right)_+^{\frac{1}{\beta}-\frac{n}{\alpha}} r^{n-1}\, dr\right]^{\frac{1}{n(1-\lambda)}}\\ &= a'_{\alpha,\beta}\left[\frac{n\omega_n}{\alpha}\left(\frac{\alpha}{-\beta}\right)^{\frac{n}{\alpha}} \int_0^1 t^{-\frac{1}{\beta}-1}(1-t)^{\frac{n}{\alpha}-1}\, dt\right]^{\frac{1}{n(1-\lambda)}}\\ &= a'_{\alpha,\beta}\left[\frac{n\omega_n}{\alpha}\left(\frac{\alpha}{-\beta}\right)^{\frac{n}{\alpha}} B\!\left(-\frac{1}{\beta},\, \frac{n}{\alpha}\right)\right]^{\frac{1}{n(1-\lambda)}}\\ &= a_{\alpha,\beta}^{-\frac{1}{n}}\left(1 - \frac{\beta n}{\alpha}\right)^{\frac{1}{n(1-\lambda)}}, \end{aligned}$$
where $a'_{\alpha,\beta} = a_{\alpha,\beta}^{-\frac{1}{n}(1 + \frac{1}{\lambda-1})}$ and, by (6), $\lambda(\frac{1}{\beta}-\frac{n}{\alpha}-1) = \frac{1}{\beta}-\frac{n}{\alpha}$.

For $0 < \beta < \frac{\alpha}{n}$ (i.e., $\lambda > 1$), by polar coordinates and the change of variable $\frac{\beta}{\alpha}r^\alpha = t$, we have
$$\begin{aligned} E(|Z|^\alpha) &= a_{\alpha,\beta}\int_{\mathbb{R}^n} |x|^\alpha \left(1 - \tfrac{\beta}{\alpha}|x|^\alpha\right)_+^{\frac{1}{\beta}-\frac{n}{\alpha}-1} dx\\ &= a_{\alpha,\beta}\, n\omega_n \int_0^\infty \left(1 - \tfrac{\beta}{\alpha}r^\alpha\right)_+^{\frac{1}{\beta}-\frac{n}{\alpha}-1} r^{n+\alpha-1}\, dr\\ &= a_{\alpha,\beta}\, n\omega_n\, \frac{1}{\alpha}\left(\frac{\alpha}{\beta}\right)^{\frac{n}{\alpha}+1} \int_0^1 t^{\frac{n}{\alpha}}(1-t)^{\frac{1}{\beta}-\frac{n}{\alpha}-1}\, dt\\ &= a_{\alpha,\beta}\, \frac{n\pi^{n/2}}{\alpha\,\Gamma(1+\frac{n}{2})}\left(\frac{\alpha}{\beta}\right)^{\frac{n}{\alpha}+1} B\!\left(\frac{1}{\beta}-\frac{n}{\alpha},\, \frac{n}{\alpha}+1\right)\\ &= n, \end{aligned}$$
and
$$\begin{aligned} N_\lambda(Z) &= \left(a_{\alpha,\beta}^{\lambda}\int_{\mathbb{R}^n}\left(1 - \tfrac{\beta}{\alpha}|x|^\alpha\right)_+^{\lambda(\frac{1}{\beta}-\frac{n}{\alpha}-1)} dx\right)^{\frac{1}{n(1-\lambda)}}\\ &= a'_{\alpha,\beta}\left[n\omega_n \int_0^\infty \left(1 - \tfrac{\beta}{\alpha}r^\alpha\right)_+^{\frac{1}{\beta}-\frac{n}{\alpha}} r^{n-1}\, dr\right]^{\frac{1}{n(1-\lambda)}}\\ &= a'_{\alpha,\beta}\left[\frac{n\omega_n}{\alpha}\left(\frac{\alpha}{\beta}\right)^{\frac{n}{\alpha}} \int_0^1 t^{\frac{n}{\alpha}-1}(1-t)^{\frac{1}{\beta}-\frac{n}{\alpha}}\, dt\right]^{\frac{1}{n(1-\lambda)}}\\ &= a'_{\alpha,\beta}\left[\frac{n\omega_n}{\alpha}\left(\frac{\alpha}{\beta}\right)^{\frac{n}{\alpha}} B\!\left(\frac{n}{\alpha},\, 1 + \frac{1}{\beta} - \frac{n}{\alpha}\right)\right]^{\frac{1}{n(1-\lambda)}}\\ &= a_{\alpha,\beta}^{-\frac{1}{n}}\left(1 - \frac{\beta n}{\alpha}\right)^{\frac{1}{n(1-\lambda)}}. \end{aligned}$$

For $\beta = 0$ (i.e., $\lambda = 1$), by polar coordinates and the change of variable $\frac{1}{\alpha}r^\alpha = t$, we have
$$\begin{aligned} E(|Z|^\alpha) &= a_{\alpha,0}\int_{\mathbb{R}^n} |x|^\alpha e^{-|x|^\alpha/\alpha}\, dx\\ &= a_{\alpha,0}\, n\omega_n \int_0^\infty e^{-r^\alpha/\alpha}\, r^{n+\alpha-1}\, dr\\ &= a_{\alpha,0}\, n\omega_n\, \alpha^{\frac{n}{\alpha}} \int_0^\infty e^{-t}\, t^{\frac{n}{\alpha}}\, dt\\ &= a_{\alpha,0}\, \frac{n\pi^{n/2}}{\Gamma(1+\frac{n}{2})}\, \alpha^{\frac{n}{\alpha}}\, \Gamma\!\left(\frac{n}{\alpha}+1\right)\\ &= n, \end{aligned}$$
$$\begin{aligned} h(Z) &= -\int_{\mathbb{R}^n} f_Z(x)\log f_Z(x)\, dx\\ &= -\int_{\mathbb{R}^n} f_Z(x)\log\!\left(a_{\alpha,0}\, e^{-|x|^\alpha/\alpha}\right) dx\\ &= -\int_{\mathbb{R}^n} f_Z(x)\left(\log a_{\alpha,0} - \frac{|x|^\alpha}{\alpha}\right) dx\\ &= -\log a_{\alpha,0} + \frac{1}{\alpha}E(|Z|^\alpha)\\ &= -\log a_{\alpha,0} + \frac{n}{\alpha}, \end{aligned}$$
and thus,
$$N_1(Z) = a_{\alpha,0}^{-\frac{1}{n}}\, e^{\frac{1}{\alpha}}.$$

V. NOTIONS OF AN AFFINE MOMENT

If G is the standard Gaussian random vector in $\mathbb{R}^n$ with mean 0 and variance matrix equal to the identity, then the classical moment-entropy inequality (see, for example, [15]) states that for a random vector X in $\mathbb{R}^n$,
$$\frac{E(|X|^2)}{N_1(X)^2} \ge \frac{E(|G|^2)}{N_1(G)^2}, \qquad (9)$$
with equality if and only if $X = tG$, for some $t \in \mathbb{R}\setminus\{0\}$. In [31], [32], [35], this was extended to the following inequality for the $\lambda$-Rényi entropy and p-th moment.

Theorem 2: If $p \in (0, \infty)$, $\lambda > n/(n+p)$, and X is a random vector in $\mathbb{R}^n$ such that $N_\lambda(X), E(|X|^p) < \infty$, then
$$\frac{E(|X|^p)}{N_\lambda(X)^p} \ge \frac{E(|Z|^p)}{N_\lambda(Z)^p},$$
with equality if and only if there exists $t > 0$ such that $X = tZ$, where Z is the standard generalized Gaussian with parameters $\alpha$ and $\beta$ satisfying (6).
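When the density of X is known, the quantities appearing in Theorem 2 can be estimated by Monte Carlo, since $\int f^\lambda\,dx = E[f(X)^{\lambda-1}]$ and $h(X) = -E[\log f(X)]$. The sketch below is our own illustration, not part of the paper; the matrix A and the sample size are arbitrary choices. It estimates the ratio $E(|X|^p)/N_\lambda(X)^p$ for a Gaussian random vector with a non-trivial covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

def renyi_entropy_power(f_vals, lam, n):
    """Monte Carlo estimate of N_lambda(X) from density values f(X_i):
    for lam != 1, integral of f^lam equals E[f(X)^(lam-1)];
    for lam == 1, h(X) = -E[log f(X)] and N_1 = exp(h/n)."""
    if lam == 1.0:
        return np.exp(-np.mean(np.log(f_vals)) / n)
    return np.mean(f_vals ** (lam - 1.0)) ** (1.0 / (n * (1.0 - lam)))

n, p, lam = 2, 2.0, 0.8                       # lam > n/(n+p) = 0.5
A = np.array([[2.0, 0.3], [0.0, 0.5]])        # illustrative covariance factor
X = rng.standard_normal((200_000, n)) @ A.T   # X = A Z, Z standard normal

Sigma_inv = np.linalg.inv(A @ A.T)
log_det = np.linalg.slogdet(A @ A.T)[1]
quad_form = np.einsum('ij,jk,ik->i', X, Sigma_inv, X)
f_vals = np.exp(-0.5 * quad_form) / ((2 * np.pi) ** (n / 2) * np.exp(0.5 * log_det))

moment = np.mean(np.linalg.norm(X, axis=1) ** p)   # E(|X|^p)
N_lam = renyi_entropy_power(f_vals, lam, n)
print(moment / N_lam ** p)   # at least the corresponding ratio for the extremal generalized Gaussian
```

The same estimator works for any density that can be evaluated pointwise, which is how one would check the inequalities of this section on examples.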
In [35], an affine p-th moment of a random vector X is defined by
$$m_p(X) = \inf\{E(|AX|^p) : A \in SL(n)\},$$
and the following affine moment-entropy inequality was shown.

Theorem 3: If $p \in (0,\infty)$, $\lambda > n/(n+p)$, and X is a random vector in $\mathbb{R}^n$ satisfying $N_\lambda(X), E(|X|^p) < \infty$, then
$$\frac{m_p(X)}{N_\lambda(X)^p} \ge \frac{E(|Z|^p)}{N_\lambda(Z)^p},$$
with equality if and only if $X = TZ$ for the standard generalized Gaussian Z and some $T \in GL(n)$.

Theorem 3 is formally stronger than Theorem 2, but the two are equivalent; Theorem 3 is an affine formulation of Theorem 2. The affine moment $m_p(X)$ has no explicit formula in terms of the density of X. This makes its calculation difficult. Is there a notion of affine moment that has an explicit formula and also gives an essentially stronger moment-entropy inequality than those in Theorems 2 and 3?

The definition of $m_p(X)$ can be formulated differently as follows. Let $\mathcal{F}$ be the class of norms on $\mathbb{R}^n$ given by
$$\mathcal{F} = \{\|\cdot\|_A : A \in SL(n)\},$$
where $\|\cdot\|_A$ is defined by $\|x\|_A = |Ax|$, $x \in \mathbb{R}^n$. Then
$$m_p(X) = \inf_{\|\cdot\|_A \in \mathcal{F}} E(\|X\|_A^p).$$

We shall use a larger class of norms than $\mathcal{F}$ to define the affine moment $M_p(X)$. The norms are generated by the p-cosine transforms of density functions. We shall give an explicit formula for the affine moment $M_p(X)$ and establish the affine moment-entropy inequality in Theorem 1, which is essentially stronger than Theorems 2 and 3.

A similar approach was used in [26] to define a notion of affine Fisher information.

VI. THE AFFINE p-TH MOMENT OF A RANDOM VECTOR

A. Definition

A random vector X in $\mathbb{R}^n$ with density g is said to have finite p-th moment for $p > 0$, if
$$\int_{\mathbb{R}^n} |x|^p g(x)\,dx < \infty.$$

The p-cosine transform of a random vector Y with finite p-th moment and density g defines the following norm on $\mathbb{R}^n$,
$$\|x\|_{Y,p} = \left(\int_{\mathbb{R}^n} |x \cdot y|^p g(y)\,dy\right)^{\frac{1}{p}}, \qquad x \in \mathbb{R}^n. \qquad (10)$$

If $p > 0$ and X is a random vector, then we define the affine p-th moment of X to be
$$M_p(X) = \inf_{N_\lambda(Y) = c_1} E(\|X\|_{Y,p}^p), \qquad (11)$$
where each random vector Y is independent of X and has finite p-th moment and $\lambda$-Rényi entropy power equal to a constant $c_1$, which will be chosen appropriately later.

The definition above appears to depend on the parameter $\lambda$, but by Theorem 5 and (14), which are stated in Sections VI-B and VI-D, the value of $M_p(X)$ is in fact independent of $\lambda$ when the constant $c_1$ is properly chosen. We also show below that the infimum in the definition above is achieved, and that the affine p-th moment $M_p(X)$ is invariant under volume-preserving linear transformations of the random vector X.

B. An integral representation for affine moments

The following is a special case of the dual Minkowski inequality for random vectors established in [31], Lemma 4.1.

Lemma 4: If $p > 0$, $\lambda > \frac{n}{n+p}$, and X and Y are independent random vectors in $\mathbb{R}^n$ with finite p-th moment, then
$$\int_{\mathbb{R}^n} E(|y \cdot X|^p)\, g(y)\, dy \;\ge\; N_\lambda(Y)^p \left(a_1 \int_{S^{n-1}} E(|u \cdot X|^p)^{-\frac{n}{p}}\, du\right)^{-\frac{p}{n}},$$
where g is the density of Y, $S^{n-1}$ is the unit sphere in $\mathbb{R}^n$, du denotes the Lebesgue measure on $S^{n-1}$,
$$a_1 = \begin{cases} \dfrac{a_0}{n}\, B\!\left(\dfrac{n}{p},\, \dfrac{1}{1-\lambda} - \dfrac{n}{p}\right) & \text{if } \lambda < 1,\\[3mm] \dfrac{1}{n}\left(\dfrac{pe}{n}\right)^{\frac{n}{p}} \Gamma\!\left(1 + \dfrac{n}{p}\right) & \text{if } \lambda = 1,\\[3mm] \dfrac{a_0}{n}\, B\!\left(\dfrac{n}{p},\, \dfrac{\lambda}{\lambda-1}\right) & \text{if } \lambda > 1,\end{cases}$$
and
$$a_0 = \frac{n}{p}\left(1 + \frac{n(\lambda-1)}{p\lambda}\right)^{\frac{1}{\lambda-1}}\left(1 + \frac{p\lambda}{n(\lambda-1)}\right)^{\frac{n}{p}}.$$
Equality is attained if the density of Y is given by
$$g(y) = \begin{cases} b\,(1 + a\|y\|^p)^{\frac{1}{\lambda-1}} & \text{if } \lambda < 1,\\ b\, e^{-a\|y\|^p} & \text{if } \lambda = 1,\\ b\,(1 - a\|y\|^p)_+^{\frac{1}{\lambda-1}} & \text{if } \lambda > 1,\end{cases} \qquad (12)$$
for some $a, b > 0$, where the norm $\|\cdot\|$ is given by $\|y\|^p = E(|y \cdot X|^p)$ for each $y \in \mathbb{R}^n$.

The following is the integral representation for the affine p-th moment.

Theorem 5: If $p > 0$ and X is a random vector with finite p-th moment and density f, then
$$M_p(X) = \left(c_0 \int_{S^{n-1}} E(|u \cdot X|^p)^{-\frac{n}{p}}\, du\right)^{-\frac{p}{n}},$$
where $c_0 = a_1/c_1^n$.

Proof: If Y is a random vector such that $N_\lambda(Y) = c_1$, then by the Fubini theorem and Lemma 4,
$$\begin{aligned} E(\|X\|_{Y,p}^p) &= \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} |x \cdot y|^p g(y)\,dy\, f(x)\,dx\\ &= \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} |x \cdot y|^p f(x)\,dx\, g(y)\,dy\\ &= \int_{\mathbb{R}^n} E(|y \cdot X|^p)\, g(y)\,dy\\ &\ge \left(c_0 \int_{S^{n-1}} E(|u \cdot X|^p)^{-\frac{n}{p}}\, du\right)^{-\frac{p}{n}}. \end{aligned}$$
Moreover, by the equality condition of Lemma 4, equality is attained if Y is a random vector whose density is given by (12), normalized so that $N_\lambda(Y) = c_1$. Therefore,
$$M_p(X) = \inf_{N_\lambda(Y) = c_1} E(\|X\|_{Y,p}^p) = \left(c_0 \int_{S^{n-1}} E(|u \cdot X|^p)^{-\frac{n}{p}}\, du\right)^{-\frac{p}{n}}.$$
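Theorem 5 turns $M_p(X)$ into a quantity that can be estimated directly from samples of X: average $E(|u \cdot X|^p)^{-n/p}$ over directions u on the sphere, raise the spherical average to the power $-p/n$, and multiply by a constant; with the normalization $c_1$ chosen as in (14) below, that constant works out to $n\omega_n/\omega_{n,p}$. The following Monte Carlo sketch is our own, with illustrative sample sizes and an arbitrary matrix A, and assumes that normalization.

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(1)

def affine_moment(X, p, n_dirs=500):
    """Monte Carlo version of the integral representation of M_p(X):
    average E(|u.X|^p)^(-n/p) over random unit directions u, raise to -p/n,
    and multiply by n*omega_n / omega_{n,p}, written with gamma functions."""
    n = X.shape[1]
    U = rng.standard_normal((n_dirs, n))
    U /= np.linalg.norm(U, axis=1, keepdims=True)          # uniform directions on S^{n-1}
    dir_moments = np.mean(np.abs(X @ U.T) ** p, axis=0)    # E(|u.X|^p) for each u
    sphere_avg = np.mean(dir_moments ** (-n / p))
    const = np.sqrt(np.pi) * gamma((n + p) / 2) / (gamma(n / 2) * gamma((p + 1) / 2))
    return const * sphere_avg ** (-p / n)

# Illustration: a non-spherical Gaussian in R^2 (the matrix A is an arbitrary choice).
A = np.array([[3.0, 0.0], [1.0, 0.4]])
X = rng.standard_normal((20_000, 2)) @ A.T
print(affine_moment(X, p=2.0))                  # affine moment M_2(X)
print(np.mean(np.linalg.norm(X, axis=1) ** 2))  # Euclidean moment E(|X|^2), strictly larger here
```

For this anisotropic example the Euclidean moment printed on the last line exceeds the affine moment, which is what the comparison in Section VI-E below predicts.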

C. Affine invariance of the affine p-th moment

Theorem 6: If X is a random vector in $\mathbb{R}^n$ with finite p-th moment, then
$$M_p(AX) = M_p(X),$$
for each $A \in SL(n)$.

Proof: If $A \in GL(n)$, then it follows by (10) that
$$\begin{aligned} E(\|AX\|_{Y,p}^p) &= \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} |Ax \cdot y|^p g(y)\,dy\, f(x)\,dx\\ &= \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} |x \cdot A^t y|^p g(y)\,dy\, f(x)\,dx\\ &= E(\|X\|_{A^tY,p}^p). \end{aligned}$$
Thus, if $A \in SL(n)$, then by (5),
$$\begin{aligned} M_p(AX) &= \inf_{N_\lambda(Y)=c_1} E(\|AX\|_{Y,p}^p)\\ &= \inf_{N_\lambda(Y)=c_1} E(\|X\|_{A^tY,p}^p)\\ &= \inf_{N_\lambda(A^tY)=c_1} E(\|X\|_{A^tY,p}^p)\\ &= M_p(X). \end{aligned}$$

D. The affine p-th moment of a spherically contoured random vector

Denote the volume of the unit ball in $\mathbb{R}^n$ by
$$\omega_n = \frac{\pi^{n/2}}{\Gamma(1+\frac{n}{2})},$$
and observe that
$$\int_{S^{n-1}} du = n\omega_n. \qquad (13)$$
Let
$$\omega_{n,p} = \int_{S^{n-1}} |u \cdot e_n|^p\, du = \frac{2\pi^{\frac{n-1}{2}}\,\Gamma(\frac{p+1}{2})}{\Gamma(\frac{n+p}{2})},$$
where $e_n$ is a fixed unit vector, for example, $e_n = (0, \ldots, 0, 1)$.

We choose the constant $c_1$ so that the constant $c_0 = a_1/c_1^n$ is given by
$$c_0 = \frac{1}{n\omega_n}\left(\frac{\omega_{n,p}}{n\omega_n}\right)^{\frac{n}{p}} = \frac{\Gamma(\frac{n}{2})}{2\pi^{n/2}}\left(\frac{\Gamma(\frac{n}{2})\,\Gamma(\frac{p+1}{2})}{\pi^{\frac{1}{2}}\,\Gamma(\frac{n+p}{2})}\right)^{\frac{n}{p}}. \qquad (14)$$

A random vector X is called spherically contoured if its density f can be written as $f(x) = F(|x|)$, $x \in \mathbb{R}^n$, where $F : [0,\infty) \to [0,\infty)$.

Lemma 7: If X is a spherically contoured random vector with finite p-th moment and density given by $f(x) = F(|x|)$, $x \in \mathbb{R}^n$, then
$$M_p(X) = n\omega_n \int_0^\infty F(\rho)\,\rho^{p+n-1}\, d\rho. \qquad (15)$$

Proof: For each $u \in S^{n-1}$,
$$\begin{aligned} E(|u \cdot X|^p) &= \int_{\mathbb{R}^n} |u \cdot x|^p F(|x|)\,dx\\ &= \int_0^\infty \int_{S^{n-1}} |u \cdot v|^p\, dv\, F(\rho)\,\rho^{p+n-1}\, d\rho\\ &= \omega_{n,p} \int_0^\infty F(\rho)\,\rho^{p+n-1}\, d\rho. \end{aligned}$$
By Theorem 5,
$$\begin{aligned} M_p(X) &= \left(c_0 \int_{S^{n-1}} E(|u \cdot X|^p)^{-\frac{n}{p}}\, du\right)^{-\frac{p}{n}}\\ &= n\omega_n \int_0^\infty F(\rho)\,\rho^{p+n-1}\, d\rho. \end{aligned}$$

E. Affine versus Euclidean

Lemma 8: If $p > 0$ and X is a random vector in $\mathbb{R}^n$, then
$$M_p(X) \le E(|X|^p). \qquad (16)$$
Equality holds if and only if the function $v \mapsto E(|v \cdot X|^p)$ is constant for $v \in S^{n-1}$. In particular, equality holds if X is spherically contoured.

Proof: If f is the density of X and $u \in S^{n-1}$, then
$$\begin{aligned} \int_{S^{n-1}} E(|u \cdot X|^p)\, du &= \int_{S^{n-1}}\int_{\mathbb{R}^n} |u \cdot x|^p f(x)\,dx\, du\\ &= \int_{\mathbb{R}^n}\int_{S^{n-1}} \left|u \cdot \frac{x}{|x|}\right|^p du\; |x|^p f(x)\,dx\\ &= \omega_{n,p}\, E(|X|^p). \end{aligned}$$
Therefore, by Theorem 5, (13), and Hölder's inequality,
$$\begin{aligned} (n\omega_n c_0)^{\frac{p}{n}}\, M_p(X) &= \left(\frac{1}{n\omega_n}\int_{S^{n-1}} E(|u \cdot X|^p)^{-\frac{n}{p}}\, du\right)^{-\frac{p}{n}}\\ &\le \frac{1}{n\omega_n}\int_{S^{n-1}} E(|u \cdot X|^p)\, du\\ &= \frac{\omega_{n,p}}{n\omega_n}\, E(|X|^p). \end{aligned}$$
Since (14) gives $(n\omega_n c_0)^{p/n} = \omega_{n,p}/(n\omega_n)$, inequality (16) follows. The equality condition follows from the equality condition of Hölder's inequality.

By Lemma 8 and (7), we get the following.

Corollary 9: If Z is the standard generalized Gaussian, then
$$M_p(Z) = n.$$
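The equality case of Lemma 8 is easy to see numerically: for a spherically contoured example such as the uniform distribution on a ball, an estimate of $M_p(X)$ built from the integral representation agrees with the empirical Euclidean moment $E(|X|^p)$. The following self-contained sketch is ours; the dimension, exponent, and sample sizes are arbitrary choices, and the constant is again the one fixed by (14).

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(2)
n, p, N = 3, 1.5, 20_000

# X uniform on the unit ball in R^3: spherically contoured, so by Lemma 8
# the affine moment should coincide with the Euclidean p-th moment.
V = rng.standard_normal((N, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # uniform directions
R = rng.random(N) ** (1.0 / n)                  # radius with density n r^{n-1}
X = V * R[:, None]

U = rng.standard_normal((500, n))
U /= np.linalg.norm(U, axis=1, keepdims=True)
dir_mom = np.mean(np.abs(X @ U.T) ** p, axis=0)                 # E(|u.X|^p) per direction
const = np.sqrt(np.pi) * gamma((n + p) / 2) / (gamma(n / 2) * gamma((p + 1) / 2))
M_p = const * np.mean(dir_mom ** (-n / p)) ** (-p / n)

print(M_p, np.mean(np.linalg.norm(X, axis=1) ** p))   # nearly equal, both about n/(n+p)
```

Replacing the ball by any non-spherical set makes the two printed values separate, with the affine moment the smaller one, in line with (16).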

VII. AFFINE p-TH MOMENT-ENTROPY INEQUALITIES

A. Proof of Theorem 1

The following bilinear moment-entropy inequality is established in [31].

Theorem 10: Let $p \ge 1$, $\lambda > n/(n+p)$. There exists a constant $c > 0$ such that if X and Y are independent random vectors in $\mathbb{R}^n$ with finite p-th moments, then
$$E(|X \cdot Y|^p) \ge c\, N_\lambda(X)^p N_\lambda(Y)^p,$$
with equality holding if and only if X and Y are certain generalized Gaussians.

We use this theorem to establish the following affine moment-entropy inequality, which is Theorem 1.

Theorem 11: If $p \in [1, \infty)$, $\lambda \in (n/(n+p), \infty)$, and X is a random vector in $\mathbb{R}^n$ with finite $\lambda$-Rényi entropy and p-th moment, then
$$\frac{M_p(X)}{N_\lambda(X)^p} \ge \frac{M_p(Z)}{N_\lambda(Z)^p}, \qquad (17)$$
with equality if and only if X is a generalized Gaussian.

Proof: If Y is independent of X and has finite p-th moment and Rényi entropy power $N_\lambda(Y) = c_1$, then by (10) and Theorem 10,
$$E(\|X\|_{Y,p}^p) = E(|X \cdot Y|^p) \ge c\, c_1^p\, N_\lambda(X)^p.$$
The desired inequality (17) now follows from the definition (11) of $M_p(X)$. The equality condition follows from the equality conditions of Lemma 4 and Theorem 10 (or (7) and Corollary 9).

B. Affine implies Euclidean

Proposition 12: The affine moment-entropy inequality in Theorem 11 is stronger than the Euclidean moment-entropy inequality in Theorem 2.

Proof: Observe that equality holds in Lemma 8 for a standard generalized Gaussian random vector Z because it is spherically contoured. By Lemma 8, Theorem 11, and the equality condition of Lemma 8,
$$\frac{E(|X|^p)}{N_\lambda(X)^p} \ge \frac{M_p(X)}{N_\lambda(X)^p} \ge \frac{M_p(Z)}{N_\lambda(Z)^p} = \frac{E(|Z|^p)}{N_\lambda(Z)^p}.$$
Therefore, the Euclidean moment-entropy inequality in Theorem 2 is weaker than the affine moment-entropy inequality in Theorem 11.

ACKNOWLEDGMENT

The authors would like to thank the referees for their helpful comments.

REFERENCES

[1] S. Amari, Differential-Geometrical Methods in Statistics, ser. Lecture Notes in Statistics. New York: Springer-Verlag, 1985, vol. 28.
[2] A. Andai, "On the geometry of generalized Gaussian distributions," J. Multivariate Anal., vol. 100, no. 4, pp. 777–793, 2009.
[3] E. Arikan, "An inequality on guessing and its application to sequential decoding," IEEE Trans. Inform. Theory, vol. 42, pp. 99–105, 1996.
[4] A. D. Barbour, O. Johnson, I. Kontoyiannis, and M. Madiman, "Compound Poisson approximation via information functionals," Electron. J. Probab., vol. 15, pp. 1344–1368, 2010.
[5] S. Bobkov and M. Madiman, "Concentration of the information in data with log-concave distributions," Ann. Probab., vol. 39, no. 4, pp. 1528–1543, 2011.
[6] S. Bobkov and M. Madiman, "The entropy per coordinate of a random vector is highly constrained under convexity conditions," IEEE Trans. Inform. Theory, vol. 57, no. 8, pp. 4940–4954, 2011.
[7] S. Bobkov and M. Madiman, "Dimensional behaviour of entropy and information," C. R. Math. Acad. Sci. Paris, vol. 349, no. 3-4, pp. 201–204, 2011.
[8] S. Bobkov and M. Madiman, "Reverse Brunn-Minkowski and reverse entropy power inequalities for convex measures," J. Funct. Anal., vol. 262, no. 7, pp. 3309–3339, 2012.
[9] S. Campi and P. Gronchi, "The Lp-Busemann-Petty centroid inequality," Adv. Math., vol. 167, pp. 128–141, 2002.
[10] D. Cordero-Erausquin, W. Gangbo, and C. Houdré, "Inequalities for generalized entropy and optimal transportation," in Recent Advances in the Theory and Applications of Mass Transport, Contemp. Math., vol. 353, pp. 73–94. Amer. Math. Soc., Providence, RI, 2004.
[11] D. Cordero-Erausquin, B. Nazaret, and C. Villani, "A mass-transportation approach to sharp Sobolev and Gagliardo-Nirenberg inequalities," Adv. Math., vol. 182, pp. 307–332, 2004.
[12] M. H. M. Costa and T. M. Cover, "On the similarity of the entropy power inequality and the Brunn-Minkowski inequality," IEEE Trans. Inform. Theory, vol. IT-30, pp. 837–839, Nov. 1984.
[13] J. A. Costa, A. O. Hero, and C. Vignat, "A characterization of the multivariate distributions maximizing Renyi entropy," in Proceedings of 2002 IEEE International Symposium on Information Theory, 2002, p. 263.
[14] T. M. Cover, A. Dembo, and J. A. Thomas, "Information theoretic inequalities," IEEE Trans. Inform. Theory, vol. 37, pp. 1501–1518, Nov. 1991.
[15] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons Inc., 1991, a Wiley-Interscience Publication.
[16] I. Csiszár, "Information-type measures of difference of probability distributions and indirect observations," Studia Sci. Math. Hungar., vol. 2, pp. 299–318, 1967.
[17] M. Del Pino and J. Dolbeault, "Best constants for Gagliardo-Nirenberg inequalities and applications to nonlinear diffusions," J. Math. Pures Appl. (9), vol. 81, pp. 847–875, 2002.
[18] K. T. Fang, S. Kotz, and K. W. Ng, Symmetric Multivariate and Related Distributions, London, U.K.: Chapman & Hall, 1990, vol. 36, Monographs on Statistics and Applied Probability.
[19] O. G. Guleryuz, E. Lutwak, D. Yang, and G. Zhang, "Information-theoretic inequalities for contoured probability distributions," IEEE Trans. Inform. Theory, vol. 48, no. 8, pp. 2377–2383, Aug. 2002.
[20] O. Johnson and C. Vignat, "Some results concerning maximum Rényi entropy distributions," Ann. Inst. H. Poincaré Probab. Statist., vol. 43, no. 3, pp. 339–351, 2007.
[21] C. P. Kitsos and N. K. Tavoularis, "Logarithmic Sobolev inequalities for information measures," IEEE Trans. Inform. Theory, vol. 55, no. 6, pp. 2554–2561, 2009.
[22] C. Kitsos and N. Tavoularis, "New entropy type information measures," in Proc. 31st Int. Conf. Inf. Technol. Interfaces, pp. 255–260, 2009.
[23] E. H. Lieb, "Proof of an entropy conjecture of Wehrl," Commun. Math. Phys., vol. 62, no. 1, pp. 35–41, 1978.
[24] E. Lutwak, "Selected affine isoperimetric inequalities," in Handbook of Convex Geometry, edited by Gruber and Wills, Elsevier, North-Holland, 1993, pp. 151–176.
[25] E. Lutwak, "The Brunn-Minkowski-Firey theory, I: Mixed volumes and the Minkowski problem," J. Diff. Geom., vol. 38, pp. 131–150, 1993.
[26] E. Lutwak, S. Lv, D. Yang, and G. Zhang, "Extensions of Fisher information and Stam's inequality," IEEE Trans. Inform. Theory, vol. 58, pp. 1319–1327, 2012.
[27] E. Lutwak, D. Yang, and G. Zhang, "Lp affine isoperimetric inequalities," J. Differential Geom., vol. 56, pp. 111–132, 2000.
[28] E. Lutwak, D. Yang, and G. Zhang, "A new ellipsoid associated with convex bodies," Duke Math. J., vol. 104, pp. 375–390, 2000.
[29] E. Lutwak, D. Yang, and G. Zhang, "The Cramer-Rao inequality for star bodies," Duke Math. J., vol. 112, pp. 59–81, 2002.
[30] E. Lutwak, D. Yang, and G. Zhang, "Sharp affine Lp Sobolev inequalities," J. Differential Geom., vol. 62, pp. 17–38, 2002.
[31] E. Lutwak, D. Yang, and G. Zhang, "Moment-entropy inequalities," Annals of Probability, vol. 32, pp. 757–774, 2004.
[32] E. Lutwak, D. Yang, and G. Zhang, "Cramér-Rao and moment-entropy inequalities for Rényi entropy and generalized Fisher information," IEEE Trans. Inform. Theory, vol. 51, pp. 473–478, 2005.
[33] E. Lutwak, D. Yang, and G. Zhang, "Lp John ellipsoids," Proc. London Math. Soc., vol. 90, pp. 497–520, 2005.
[34] E. Lutwak, D. Yang, and G. Zhang, "Optimal Sobolev norms and the Lp Minkowski problem," Int. Math. Res. Not., 2006, Art. ID 62987, 21 pp.
[35] E. Lutwak, D. Yang, and G. Zhang, "Moment-entropy inequalities for a random vector," IEEE Trans. Inform. Theory, vol. 53, pp. 1603–1607, 2007.
[36] E. Lutwak and G. Zhang, "Blaschke-Santaló inequalities," J. Differential Geom., vol. 47, pp. 1–16, 1997.
[37] M. Madiman and A. Barron, "Generalized entropy power inequalities and monotonicity properties of information," IEEE Trans. Inform. Theory, vol. 53, no. 7, pp. 2317–2329, 2007.
[38] M. Madiman and P. Tetali, "Information inequalities for joint distributions, with interpretations and applications," IEEE Trans. Inform. Theory, vol. 56, no. 6, pp. 2699–2713, 2010.
[39] S. Nadarajah, "The Kotz-type distribution with applications," Statistics, vol. 37, no. 4, pp. 341–358, 2003.
[40] A. J. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Information and Control, vol. 2, pp. 101–112, 1959.
[41] G. Zhang, "The affine Sobolev inequality," J. Differential Geom., vol. 53, pp. 183–202, 1999.
[42] G. Zhang, "New affine isoperimetric inequalities," in ICCM 2007, vol. II, pp. 239–267.

Erwin Lutwak is Professor of Mathematics at Polytechnic Institute of NYU. He received his B.S. and M.S. in Mathematics from this institution when it was named Polytechnic Institute of Brooklyn and his Ph.D. in Mathematics when it was named Polytechnic Institute of New York.

Songjun Lv is Associate Professor of Mathematics at Chongqing Normal University. He received his B.S. and M.S. in Mathematics from Hunan Normal University, and his Ph.D. in Mathematics from Shanghai University.

Deane Yang is Professor of Mathematics at Polytechnic Institute of New York University. He received his B.A. in Mathematics and Physics from the University of Pennsylvania and his Ph.D. in Mathematics from Harvard University.

Gaoyong Zhang is Professor of Mathematics at Polytechnic Institute of New York University. He received his B.S. in Mathematics from Wuhan University of Science and Technology, M.S. in Mathematics from Wuhan University, and Ph.D. in Mathematics from Temple University.