Moment-entropy inequalities for a random vector

Erwin Lutwak, Deane Yang, and Gaoyong Zhang

Abstract— The p-th moment matrix is defined for a real random vector, generalizing the classical second moment matrix. Sharp inequalities relating the p-th moment and Renyi entropy are established, generalizing the classical inequality relating the second moment and the Shannon entropy. The extremal distributions for these inequalities are completely characterized.

I. INTRODUCTION

In [9] the authors demonstrated how the classical information-theoretic inequality for the Shannon entropy and second moment of a real random variable could be extended to inequalities for Renyi entropy and the p-th moment. The extremals of these inequalities were also completely characterized. Moment-entropy inequalities, using Renyi entropy, for discrete random variables have also been obtained by Arikan [2].

We describe how to extend the definition of the second moment matrix of a real random vector to that of the p-th moment matrix. Using this, we extend the moment-entropy inequalities and the characterization of the extremal distributions proved in [9] to higher dimensions.

Variants and generalizations of the theorems presented can be found in work of the authors [8], [10], [11] and Bastero-Romance [3].

The authors would like to thank Christoph Haberl for his careful reading of this paper and valuable suggestions for improving it.

E. Lutwak ([email protected]), D. Yang ([email protected]), and G. Zhang ([email protected]) are with the Department of Mathematics, Polytechnic University, Brooklyn, New York, and were supported in part by NSF Grant DMS-0405707.

II. THE p-TH MOMENT MATRIX OF A RANDOM VECTOR

A. Basic notation

Throughout this paper we denote:

$\mathbb{R}^n$ = n-dimensional Euclidean space,
$x \cdot y$ = standard Euclidean inner product of $x, y \in \mathbb{R}^n$,
$|x| = \sqrt{x \cdot x}$,
$S$ = positive definite symmetric n-by-n matrices,
$|A|$ = determinant of $A \in S$,
$|K|$ = Lebesgue measure of $K \subset \mathbb{R}^n$.

The standard Euclidean ball in $\mathbb{R}^n$ will be denoted by $B$, and its volume by $\omega_n$. Each inner product on $\mathbb{R}^n$ can be written uniquely as
$$(x, y) \in \mathbb{R}^n \times \mathbb{R}^n \mapsto \langle x, y\rangle_A = Ax \cdot Ay,$$
for $A \in S$. The associated norm will be denoted by $|\cdot|_A$.

Throughout this paper, $X$ denotes a random vector in $\mathbb{R}^n$. The probability measure on $\mathbb{R}^n$ associated with a random vector $X$ is denoted $m_X$. We will denote the standard Lebesgue density on $\mathbb{R}^n$ by $dx$. By the density function $f_X$ of a random vector $X$, we mean the Radon-Nikodym derivative of the probability measure $m_X$ with respect to Lebesgue measure.

If $V$ is a vector space and $\Phi : \mathbb{R}^n \to V$ is a continuous function, then the expected value of $\Phi(X)$ is given by
$$E[\Phi(X)] = \int_{\mathbb{R}^n} \Phi(x)\, dm_X(x).$$

We call a random vector $X$ nondegenerate if $E[|v \cdot X|] > 0$ for each nonzero $v \in \mathbb{R}^n$.

B. The p-th moment of a random vector

For $p \in (0, \infty)$, the standard p-th moment of a random vector $X$ is given by
$$E[|X|^p] = \int_{\mathbb{R}^n} |x|^p\, dm_X(x). \qquad (1)$$
More generally, the p-th moment with respect to the inner product $\langle\cdot,\cdot\rangle_A$ is
$$E[|X|_A^p] = \int_{\mathbb{R}^n} |x|_A^p\, dm_X(x).$$

C. The p-th moment matrix

The second moment matrix of a random vector $X$ is defined to be
$$M_2[X] = E[X \otimes X],$$
where for $v \in \mathbb{R}^n$, $v \otimes v$ is the linear transformation given by $x \mapsto (x \cdot v)v$. Recall that $M_2[X - E[X]]$ is the covariance matrix. An important observation is that the definition of the moment matrix does not use the inner product on $\mathbb{R}^n$.

A unique characterization of the second moment matrix is the following: Let $M = M_2[X]$. The inner product $\langle\cdot,\cdot\rangle_{M^{-1/2}}$ is the unique one whose unit ball has maximal volume among all inner products $\langle\cdot,\cdot\rangle_A$ that are normalized so that the second moment satisfies $E[|AX|^2] = n$.

We extend this characterization to a definition of the p-th moment matrix $M_p[X]$ for all $p \in (0, \infty)$.

Theorem 1: If $p \in (0, \infty)$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite p-th moment, then there exists a unique matrix $A \in S$ such that
$$E[|X|_A^p] = n$$
and
$$|A| \geq |A'|,$$
for each $A' \in S$ such that $E[|X|_{A'}^p] = n$. Moreover, the matrix $A$ is the unique matrix in $S$ satisfying
$$I = E[AX \otimes AX\, |AX|^{p-2}].$$

We define the p-th moment matrix of a random vector $X$ to be $M_p[X] = A^{-p}$, where $A$ is given by the theorem above. The proof of the theorem is given in §IV.
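To make the characterization in Theorem 1 concrete, here is a small numerical sketch in Python (our illustration, not part of the paper; the function name pth_moment_matrix and the use of a general-purpose optimizer are our own choices). It estimates the p-th moment matrix of a finite sample by maximizing $\log|A|$ over candidates $A \in S$, each rescaled so that $E[|AX|^p] = n$, and returns $M_p[X] = A^{-p}$. For $p = 2$ the result should approximately recover the sample second moment matrix $E[X \otimes X]$.

import numpy as np
from scipy.optimize import minimize

def pth_moment_matrix(samples, p):
    # Estimate M_p[X] = A^{-p} from samples of X, where A in S maximizes
    # log|A| subject to E[|AX|^p] = n (the characterization in Theorem 1).
    N, n = samples.shape
    tril = np.tril_indices(n)

    def A_of(params):
        L = np.zeros((n, n))
        L[tril] = params
        A0 = L @ L.T + 1e-10 * np.eye(n)                    # candidate in S
        mom = np.mean(np.linalg.norm(samples @ A0, axis=1) ** p)
        return (n / mom) ** (1.0 / p) * A0                   # rescale so that E[|AX|^p] = n

    def neg_logdet(params):
        return -np.linalg.slogdet(A_of(params))[1]

    res = minimize(neg_logdet, np.eye(n)[tril], method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-9, "fatol": 1e-12})
    A = A_of(res.x)
    w, V = np.linalg.eigh(A)
    return A, (V * w ** (-p)) @ V.T                          # (A, M_p[X] = A^{-p})

rng = np.random.default_rng(0)
X = rng.standard_normal((20000, 2)) @ np.array([[2.0, 0.3], [0.0, 1.0]])
A, M2 = pth_moment_matrix(X, 2.0)
print(M2)                      # should be close to the sample second moment matrix
print(X.T @ X / len(X))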

III. MOMENT-ENTROPY INEQUALITIES

A. Entropy

The Shannon entropy of a random vector $X$ is defined to be
$$h[X] = -\int_{\mathbb{R}^n} f_X \log f_X\, dx,$$
provided that the integral above exists. For $\lambda > 0$ the $\lambda$-Renyi entropy power of a density function is defined to be
$$N_\lambda[X] = \begin{cases} \left(\int_{\mathbb{R}^n} f_X^\lambda\right)^{\frac{1}{1-\lambda}} & \text{if } \lambda \neq 1,\\[1ex] e^{h[X]} & \text{if } \lambda = 1,\end{cases}$$
provided that the integral above exists. Observe that
$$\lim_{\lambda \to 1} N_\lambda[X] = N_1[X].$$
The $\lambda$-Renyi entropy of a random vector $X$ is defined to be
$$h_\lambda[X] = \log N_\lambda[X].$$
The entropy $h_\lambda[X]$ is continuous in $\lambda$ and, by the Hölder inequality, decreasing in $\lambda$. It is strictly decreasing, unless $X$ is a uniform random vector. It follows by the chain rule that
$$N_\lambda[AX] = |A|\, N_\lambda[X], \qquad (2)$$
for each $A \in S$.

B. Relative entropy

Given two random vectors $X, Y$ in $\mathbb{R}^n$, their relative Shannon entropy or Kullback–Leibler distance [6], [5], [1] (also, see page 231 in [4]) is defined by
$$h_1[X,Y] = \int_{\mathbb{R}^n} f_X \log\frac{f_X}{f_Y}\, dx, \qquad (3)$$
provided that the integral above exists. Given $\lambda > 0$, we define the relative $\lambda$-Renyi entropy power of $X$ and $Y$ as follows. If $\lambda \neq 1$, then
$$N_\lambda[X,Y] = \frac{\left(\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\, dx\right)^{\frac{1}{1-\lambda}} \left(\int_{\mathbb{R}^n} f_Y^\lambda\, dx\right)^{\frac{1}{\lambda}}}{\left(\int_{\mathbb{R}^n} f_X^\lambda\, dx\right)^{\frac{1}{\lambda(1-\lambda)}}}, \qquad (4)$$
and, if $\lambda = 1$, then
$$N_1[X,Y] = e^{h_1[X,Y]},$$
provided in both cases that the righthand side exists. Define the $\lambda$-Renyi relative entropy of random vectors $X$ and $Y$ by
$$h_\lambda[X,Y] = \log N_\lambda[X,Y].$$
Observe that $h_\lambda[X,Y]$ is continuous in $\lambda$.
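As a quick numerical illustration of the $\lambda$-Renyi entropy power (ours, not from the paper), the following Python snippet estimates $N_\lambda[X]$ for a standard Gaussian vector by Monte Carlo, using the identity $\int f^\lambda\, dx = E[f(X)^{\lambda-1}]$ for $X$ with density $f$, and compares it with the closed form $N_\lambda[X] = (2\pi)^{n/2}\lambda^{-n/(2(1-\lambda))}$; as $\lambda \to 1$ both tend to $(2\pi e)^{n/2} = e^{h[X]}$.

import numpy as np

# For X with density f and lambda != 1, int f^lambda dx = E[f(X)^(lambda - 1)],
# so N_lambda[X] = E[f(X)^(lambda - 1)]^(1/(1 - lambda)) can be estimated by sampling.
rng = np.random.default_rng(1)
n, lam, N = 3, 0.8, 200_000
X = rng.standard_normal((N, n))                               # standard Gaussian in R^n
f_vals = (2 * np.pi) ** (-n / 2) * np.exp(-0.5 * np.sum(X ** 2, axis=1))
N_mc = np.mean(f_vals ** (lam - 1)) ** (1 / (1 - lam))
N_exact = (2 * np.pi) ** (n / 2) * lam ** (-n / (2 * (1 - lam)))
print(N_mc, N_exact)          # both tend to (2*pi*e)^(n/2) = e^{h[X]} as lam -> 1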

Lemma 2: If $X$ and $Y$ are random vectors such that $h_\lambda[X]$, $h_\lambda[Y]$, and $h_\lambda[X,Y]$ are finite, then
$$h_\lambda[X,Y] \geq 0.$$
Equality holds if and only if $X = Y$.

Proof: If $\lambda > 1$, then by the Hölder inequality,
$$\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\, dx \leq \left(\int_{\mathbb{R}^n} f_Y^\lambda\, dx\right)^{\frac{\lambda-1}{\lambda}} \left(\int_{\mathbb{R}^n} f_X^\lambda\, dx\right)^{\frac{1}{\lambda}},$$
and if $\lambda < 1$, then we have
$$\int_{\mathbb{R}^n} f_X^\lambda = \int_{\mathbb{R}^n} (f_Y^{\lambda-1} f_X)^\lambda f_Y^{\lambda(1-\lambda)} \leq \left(\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\right)^\lambda \left(\int_{\mathbb{R}^n} f_Y^\lambda\right)^{1-\lambda}.$$
The inequality for $\lambda = 1$ follows by taking the limit $\lambda \to 1$. The equality conditions for $\lambda \neq 1$ follow from the equality conditions of the Hölder inequality. The inequality for $\lambda = 1$, including the equality condition, follows from the Jensen inequality (details may be found, for example, on page 234 in [4]).

C. Generalized Gaussians

We call the extremal random vectors for the moment-entropy inequalities generalized Gaussians and recall their definition here.

Given $t \in \mathbb{R}$, let
$$t_+ = \max(t, 0).$$
Let
$$\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx$$
denote the Gamma function, and let
$$\beta(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$
denote the Beta function.

For each $p \in (0, \infty)$ and $\lambda \in (n/(n+p), \infty)$, define the standard generalized Gaussian to be the random vector $Z$ in $\mathbb{R}^n$ whose density function $f_Z : \mathbb{R}^n \to [0, \infty)$ is given by
$$f_Z(x) = \begin{cases} a_{p,\lambda}\, \big(1 + (1-\lambda)|x|^p\big)_+^{1/(\lambda-1)} & \text{if } \lambda \neq 1,\\[1ex] a_{p,1}\, e^{-|x|^p} & \text{if } \lambda = 1,\end{cases} \qquad (5)$$
where
$$a_{p,\lambda} = \begin{cases} \dfrac{p\,(1-\lambda)^{n/p}}{n\omega_n\, \beta\!\left(\frac{n}{p}, \frac{1}{1-\lambda} - \frac{n}{p}\right)} & \text{if } \lambda < 1,\\[2ex] \dfrac{p}{n\omega_n\, \Gamma\!\left(\frac{n}{p}\right)} & \text{if } \lambda = 1,\\[2ex] \dfrac{p\,(\lambda-1)^{n/p}}{n\omega_n\, \beta\!\left(\frac{n}{p}, \frac{\lambda}{\lambda-1}\right)} & \text{if } \lambda > 1.\end{cases}$$
Any random vector $Y$ in $\mathbb{R}^n$ that can be written as $Y = AZ$, for some $A \in S$, is called a generalized Gaussian.
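The normalizing constant $a_{p,\lambda}$ in (5) can be checked numerically. The following Python sketch (our illustration; the helper names are hypothetical) evaluates $a_{p,\lambda}$ from the three-case formula and verifies by one-dimensional radial integration that $f_Z$ integrates to 1.

import numpy as np
from scipy.integrate import quad
from scipy.special import beta, gamma

def a_p_lambda(n, p, lam):
    # normalizing constant of the standard generalized Gaussian in (5)
    omega_n = np.pi ** (n / 2) / gamma(n / 2 + 1)             # volume of the unit ball
    if lam < 1:
        return p * (1 - lam) ** (n / p) / (n * omega_n * beta(n / p, 1 / (1 - lam) - n / p))
    if lam == 1:
        return p / (n * omega_n * gamma(n / p))
    return p * (lam - 1) ** (n / p) / (n * omega_n * beta(n / p, lam / (lam - 1)))

def total_mass(n, p, lam):
    # integrate f_Z over R^n in radial coordinates; the result should be ~ 1
    a = a_p_lambda(n, p, lam)
    omega_n = np.pi ** (n / 2) / gamma(n / 2 + 1)
    if lam == 1:
        radial, upper = (lambda r: np.exp(-r ** p)), np.inf
    else:
        radial = lambda r: np.maximum(1 + (1 - lam) * r ** p, 0.0) ** (1 / (lam - 1))
        upper = np.inf if lam < 1 else (1 / (lam - 1)) ** (1 / p)
    return quad(lambda r: n * omega_n * a * radial(r) * r ** (n - 1), 0, upper)[0]

for lam in (0.7, 1.0, 2.5):
    print(lam, total_mass(n=2, p=1.5, lam=lam))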

D. Information measures of generalized Gaussians

If $0 < p < \infty$ and $\lambda > n/(n+p)$, the $\lambda$-Renyi entropy power of the standard generalized Gaussian random vector $Z$ is given by
$$N_\lambda[Z] = \begin{cases} a_{p,\lambda}^{-1} \left(1 + \dfrac{n(\lambda-1)}{p\lambda}\right)^{\frac{1}{\lambda-1}} & \text{if } \lambda \neq 1,\\[2ex] e^{n/p}\, a_{p,1}^{-1} & \text{if } \lambda = 1.\end{cases}$$
If $0 < p < \infty$ and $\lambda > n/(n+p)$, then the p-th moment of $Z$ is given by
$$E[|Z|^p] = \left[\lambda\left(1 + \frac{p}{n}\right) - 1\right]^{-1}.$$
We define the constant
$$c(n, p, \lambda) = \frac{E[|Z|^p]^{1/p}}{N_\lambda[Z]^{1/n}} = a_{p,\lambda}^{1/n} \left[\lambda\left(1 + \frac{p}{n}\right) - 1\right]^{-1/p} b(n, p, \lambda), \qquad (6)$$
where
$$b(n, p, \lambda) = \begin{cases} \left(1 - \dfrac{n(1-\lambda)}{p\lambda}\right)^{\frac{1}{n(1-\lambda)}} & \text{if } \lambda \neq 1,\\[2ex] e^{-1/p} & \text{if } \lambda = 1.\end{cases}$$
Observe that if $\lambda \neq 1$ and $0 < p < \infty$, then
$$\int_{\mathbb{R}^n} f_Z^\lambda = a_{p,\lambda}^{\lambda-1}\, \big(1 + (1-\lambda)E[|Z|^p]\big), \qquad (7)$$
and if $\lambda = 1$, then
$$h[Z] = -\log a_{p,1} + E[|Z|^p]. \qquad (8)$$
We will also need the following scaling identities:
$$f_{tZ}(x) = t^{-n} f_Z(t^{-1}x), \qquad (9)$$
for each $x \in \mathbb{R}^n$. Therefore,
$$\int_{\mathbb{R}^n} f_{tZ}^\lambda\, dx = t^{n(1-\lambda)} \int_{\mathbb{R}^n} f_Z^\lambda\, dx, \qquad (10)$$
and
$$E[|tZ|^p] = t^p E[|Z|^p].$$

E. Spherical moment-entropy inequalities

The proof of Theorem 2 in [9] extends easily to prove the following. A more general version can be found in [7].

Theorem 3: If $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ such that $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{E[|X|^p]^{1/p}}{N_\lambda[X]^{1/n}} \geq c(n, p, \lambda),$$
where $c(n, p, \lambda)$ is given by (6). Equality holds if and only if $X = tZ$, for some $t \in (0, \infty)$.

Proof: For convenience let $a = a_{p,\lambda}$. Let
$$t = \left(\frac{E[|X|^p]}{E[|Z|^p]}\right)^{1/p} \qquad (11)$$
and $Y = tZ$.

If $\lambda \neq 1$, then by (9) and (5), (1), (11), and (7),
$$\begin{aligned}\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X &\geq a^{\lambda-1} t^{n(1-\lambda)} + (1-\lambda)\,a^{\lambda-1} t^{n(1-\lambda)-p} \int_{\mathbb{R}^n} |x|^p f_X(x)\, dx\\ &= a^{\lambda-1} t^{n(1-\lambda)} \big(1 + (1-\lambda)t^{-p} E[|X|^p]\big)\\ &= a^{\lambda-1} t^{n(1-\lambda)} \big(1 + (1-\lambda)E[|Z|^p]\big)\\ &= t^{n(1-\lambda)} \int_{\mathbb{R}^n} f_Z^\lambda,\end{aligned} \qquad (12)$$
where equality holds if $\lambda < 1$. It follows that if $\lambda \neq 1$, then by Lemma 2, (4), (10) and (12), and (11), we have
$$1 \leq N_\lambda[X,Y] = \left(\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\right)^{\frac{1}{1-\lambda}} \left(\int_{\mathbb{R}^n} f_Y^\lambda\right)^{\frac{1}{\lambda}} \left(\int_{\mathbb{R}^n} f_X^\lambda\right)^{-\frac{1}{\lambda(1-\lambda)}} \leq \left(t^n\, \frac{N_\lambda[Z]}{N_\lambda[X]}\right)^{1/\lambda} = \left(\frac{E[|X|^p]^{n/p}}{E[|Z|^p]^{n/p}}\, \frac{N_\lambda[Z]}{N_\lambda[X]}\right)^{1/\lambda}.$$
If $\lambda = 1$, then by Lemma 2, (3) and (5), and (8) and (11),
$$\begin{aligned}0 &\leq h_1[X,Y]\\ &= -h[X] - \log a + n\log t + t^{-p}E[|X|^p]\\ &= -h[X] + h[Z] + \frac{n}{p}\log\frac{E[|X|^p]}{E[|Z|^p]}.\end{aligned}$$
In both cases, rearranging gives $E[|X|^p]^{1/p}/N_\lambda[X]^{1/n} \geq E[|Z|^p]^{1/p}/N_\lambda[Z]^{1/n} = c(n, p, \lambda)$. Lemma 2 shows that equality holds in all cases if and only if $Y = X$.

F. Elliptic moment-entropy inequalities

Corollary 4: If $A \in S$, $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ satisfying $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{E[|X|_A^p]^{1/p}}{|A|^{1/n} N_\lambda[X]^{1/n}} \geq c(n, p, \lambda), \qquad (13)$$
where $c(n, p, \lambda)$ is given by (6). Equality holds if and only if $X = tA^{-1}Z$ for some $t \in (0, \infty)$.

Proof: By (2) and Theorem 3,
$$\frac{E[|X|_A^p]^{1/p}}{|A|^{1/n} N_\lambda[X]^{1/n}} = \frac{E[|AX|^p]^{1/p}}{N_\lambda[AX]^{1/n}} \geq \frac{E[|Z|^p]^{1/p}}{N_\lambda[Z]^{1/n}},$$
and equality holds if and only if $AX = tZ$ for some $t \in (0, \infty)$.

G. Affine moment-entropy inequalities

Optimizing Corollary 4 over all $A \in S$ yields the following affine inequality.

Theorem 5: If $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ satisfying $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{|M_p[X]|^{1/p}}{N_\lambda[X]} \geq n^{-n/p}\, c(n, p, \lambda)^n,$$
where $c(n, p, \lambda)$ is given by (6). Equality holds if and only if $X = A^{-1}Z$ for some $A \in S$.

Proof: Substitute $A = M_p[X]^{-1/p}$ into (13).

Conversely, Corollary 4 follows from Theorem 5 by Theorem 1.
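The constant $c(n, p, \lambda)$ in (6) is explicit, so Theorem 3 can be spot-checked numerically. The Python sketch below (ours, not from the paper) computes $c(n, p, \lambda)$ from the closed forms above and compares it with the ratio $E[|X|^p]^{1/p}/N_\lambda[X]^{1/n}$ for $X$ uniform on the unit ball, for which $E[|X|^p] = n/(n+p)$ and $N_\lambda[X] = \omega_n$; every comparison is expected to report that the inequality holds.

import numpy as np
from scipy.special import beta, gamma

def c_const(n, p, lam):
    # the constant c(n, p, lambda) of (6), assembled from a_{p,lambda} and b(n, p, lambda)
    omega_n = np.pi ** (n / 2) / gamma(n / 2 + 1)
    if lam < 1:
        a = p * (1 - lam) ** (n / p) / (n * omega_n * beta(n / p, 1 / (1 - lam) - n / p))
    elif lam == 1:
        a = p / (n * omega_n * gamma(n / p))
    else:
        a = p * (lam - 1) ** (n / p) / (n * omega_n * beta(n / p, lam / (lam - 1)))
    b = np.exp(-1 / p) if lam == 1 else (1 - n * (1 - lam) / (p * lam)) ** (1 / (n * (1 - lam)))
    return a ** (1 / n) * (lam * (1 + p / n) - 1) ** (-1 / p) * b

n = 3
omega_n = np.pi ** (n / 2) / gamma(n / 2 + 1)
for p in (0.5, 1.0, 2.0):
    for lam in (0.9, 1.0, 3.0):
        if lam <= n / (n + p):
            continue                                           # outside the admissible range
        ratio = (n / (n + p)) ** (1 / p) / omega_n ** (1 / n)  # uniform-ball moment/entropy ratio
        print(p, lam, bool(ratio >= c_const(n, p, lam)))       # expected: True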
IV. PROOF OF THEOREM 1

A. Isotropic position of a probability measure

A Borel measure $\mu$ on $\mathbb{R}^n$ is said to be in isotropic position, if
$$\int_{\mathbb{R}^n} \frac{x \otimes x}{|x|^2}\, d\mu(x) = \frac{1}{n}\, I, \qquad (14)$$
where $I$ is the identity matrix.

Lemma 6: If $p \geq 0$ and $\mu$ is a Borel probability measure in isotropic position, then for each $A \in S$,
$$|A|^{-1/n} \left(\int_{\mathbb{R}^n} \frac{|Ax|^p}{|x|^p}\, d\mu(x)\right)^{1/p} \geq 1,$$
with equality holding if and only if $A = aI$ for some $a > 0$. (For $p = 0$ the left side is interpreted as $|A|^{-1/n}\exp\int_{\mathbb{R}^n} \log(|Ax|/|x|)\, d\mu(x)$.)

Proof: By Hölder's inequality,
$$\left(\int_{\mathbb{R}^n} \frac{|Ax|^p}{|x|^p}\, d\mu(x)\right)^{1/p} \geq \exp\left(\int_{\mathbb{R}^n} \log\frac{|Ax|}{|x|}\, d\mu(x)\right),$$
so it suffices to prove the $p = 0$ case only. By (14),
$$\int_{\mathbb{R}^n} \frac{(x \cdot e)^2}{|x|^2}\, d\mu(x) = \frac{1}{n}, \qquad (15)$$
for any unit vector $e$.

Let $e_1, \ldots, e_n$ be an orthonormal basis of eigenvectors of $A$ with corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$. By the concavity of $\log$, and (15),
$$\begin{aligned}\int_{\mathbb{R}^n} \log\frac{|Ax|}{|x|}\, d\mu(x) &= \frac{1}{2}\int_{\mathbb{R}^n} \log\frac{|Ax|^2}{|x|^2}\, d\mu(x)\\ &= \frac{1}{2}\int_{\mathbb{R}^n} \log\left(\sum_{i=1}^n \lambda_i^2\, \frac{(x \cdot e_i)^2}{|x|^2}\right) d\mu(x)\\ &\geq \frac{1}{2}\sum_{i=1}^n \int_{\mathbb{R}^n} \log(\lambda_i^2)\, \frac{(x \cdot e_i)^2}{|x|^2}\, d\mu(x)\\ &= \log|A|^{1/n}.\end{aligned}$$
The equality condition follows from the strict concavity of $\log$.

B. Proof of theorem

Lemma 7: If $p > 0$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite p-th moment, then there exists $c > 0$ such that
$$E[|e \cdot X|^p] \geq c, \qquad (16)$$
for every unit vector $e$.

Proof: The left side of (16) is a positive continuous function on the unit sphere, which is compact.

Theorem 8: If $p \geq 0$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite p-th moment, then there exists $A \in S$, unique up to a scalar multiple, such that
$$|A|^{-1/n} E[|AX|^p]^{1/p} \leq |A'|^{-1/n} E[|A'X|^p]^{1/p}, \qquad (17)$$
for every $A' \in S$.

Proof: Let $S' \subset S$ be the subset of matrices whose maximum eigenvalue is exactly 1. This is a bounded set inside the set of all symmetric matrices, with its boundary $\partial S'$ equal to the positive semidefinite matrices with maximum eigenvalue 1 and minimum eigenvalue 0. Given $A' \in S'$, let $e$ be an eigenvector of $A'$ with eigenvalue 1. By Lemma 7,
$$|A'|^{-1/n} E[|A'X|^p]^{1/p} \geq |A'|^{-1/n} E[|X \cdot e|^p]^{1/p} \geq c^{1/p}\, |A'|^{-1/n}. \qquad (18)$$
Therefore, if $A'$ approaches the boundary $\partial S'$, the left side of (18) grows without bound. Since the left side of (18) is a continuous function on $S'$, the existence of a minimum follows.

Let $A \in S$ be such a minimum and $Y = AX$. Then for each $B \in S$,
$$\begin{aligned}|B|^{-1/n} E[|BY|^p]^{1/p} &= |A|^{1/n}\, |BA|^{-1/n} E[|(BA)X|^p]^{1/p}\\ &\geq |A|^{1/n}\, |A|^{-1/n} E[|AX|^p]^{1/p}\\ &= E[|Y|^p]^{1/p},\end{aligned} \qquad (19)$$
with equality holding if and only if equality holds in (17) with $A' = BA$. Setting $B = I + tB'$ for $B' \in S$, we get
$$|I + tB'|^{-1/n} E[|(I + tB')Y|^p]^{1/p} \geq E[|Y|^p]^{1/p},$$
for each $t$ near 0. It follows that
$$\frac{d}{dt}\bigg|_{t=0}\, |I + tB'|^{-1/n} E[|(I + tB')Y|^p]^{1/p} = 0,$$
for each $B' \in S$. A straightforward computation shows that this holds only if
$$\frac{1}{n}\, E[|Y|^p]\, I = E[Y \otimes Y\, |Y|^{p-2}]. \qquad (20)$$
Applying Lemma 6 to
$$d\mu(x) = \frac{|x|^p\, dm_Y(x)}{E[|Y|^p]}$$
implies that equality holds in (19) only if $B = aI$ for some $a \in (0, \infty)$. This, in turn, implies that equality holds in (17) only if $A' = aA$.

Theorem 1 follows from Theorem 8 by rescaling $A$ so that $E[|Y|^p] = n$ and substituting $Y = AX$ into (20).
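Lemma 6 can be illustrated on an exactly isotropic discrete measure. The Python sketch below (our illustration, not from the paper) places mass $1/(2n)$ at each of $\pm e_1, \ldots, \pm e_n$, so that (14) holds exactly, and checks that $|A|^{-1/n} \left(\int |Ax|^p/|x|^p\, d\mu\right)^{1/p} \geq 1$ for random $A \in S$, with equality when $A$ is a multiple of the identity.

import numpy as np

rng = np.random.default_rng(2)
n, p = 4, 1.3
points = np.vstack([np.eye(n), -np.eye(n)])   # support of mu: +-e_1, ..., +-e_n, mass 1/(2n) each

def functional(A):
    # |A|^(-1/n) * ( int |Ax|^p / |x|^p dmu )^(1/p); here |x| = 1 on the support of mu
    vals = np.linalg.norm(points @ A, axis=1) ** p
    return np.linalg.det(A) ** (-1 / n) * np.mean(vals) ** (1 / p)

for _ in range(3):
    B = rng.standard_normal((n, n))
    A = B @ B.T + n * np.eye(n)               # a random element of S
    print(functional(A))                      # Lemma 6: always >= 1
print(functional(2.7 * np.eye(n)))            # equality case A = aI: prints 1.0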

REFERENCES

[1] Shun-ichi Amari, Differential-geometrical methods in statistics, Lecture Notes in Statistics, vol. 28, Springer-Verlag, New York, 1985. MR 86m:62053
[2] Erdal Arikan, An inequality on guessing and its application to sequential decoding, IEEE Trans. Inform. Theory 42 (1996), 99–105.
[3] J. Bastero and M. Romance, Positions of convex bodies associated to extremal problems and isotropic measures, Adv. Math. 184 (2004), no. 1, 64–88.
[4] Thomas M. Cover and Joy A. Thomas, Elements of information theory, John Wiley & Sons Inc., New York, 1991, A Wiley-Interscience Publication.
[5] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar. 2 (1967), 299–318. MR 36 #2428
[6] S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statistics 22 (1951), 79–86. MR 12,623a
[7] E. Lutwak, D. Yang, and G. Zhang, The Cramer-Rao inequality for star bodies, Duke Math. J. 112 (2002), 59–81.
[8] E. Lutwak, D. Yang, and G. Zhang, Moment-entropy inequalities, Annals of Probability 32 (2004), 757–774.
[9] E. Lutwak, D. Yang, and G. Zhang, Cramer-Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information, IEEE Trans. Inform. Theory 51 (2005), 473–478.
[10] E. Lutwak, D. Yang, and G. Zhang, Lp John ellipsoids, Proc. London Math. Soc. 90 (2005), 497–520.
[11] E. Lutwak, D. Yang, and G. Zhang, Optimal Sobolev norms and the Lp Minkowski problem, Int. Math. Res. Not. (2006), 62987, 1–21.

Erwin Lutwak Erwin Lutwak received his B.S., M.S., and Ph.D. degrees in Mathematics from Polytechnic University, where he is now Professor of Mathematics.

Deane Yang Deane Yang received his B.A. in mathematics and physics from University of Pennsylvania and Ph.D. in mathematics from Harvard University. He has been an NSF Postdoctoral Fellow at the Courant Institute and on the faculty of Rice University and Columbia University. He is now a full professor at Polytechnic University.

Gaoyong Zhang Gaoyong Zhang received his B.S. degree in mathematics from Wuhan University of Science and Technology, M.S. degree in mathematics from Wuhan University, Wuhan, China, and Ph.D. degree in mathematics from Temple University, Philadelphia. He was a Rademacher Lecturer at the University of Pennsylvania, a member of the Institute for Advanced Study at Princeton, and a member of the Mathematical Sciences Research Institute at Berkeley. He was an assistant professor and is now a full professor at Polytechnic University.