Rényi Entropy Power and Normal Transport
ISITA2020, Kapolei, Hawai’i, USA, October 24-27, 2020

Olivier Rioul, LTCI, Télécom Paris, Institut Polytechnique de Paris, 91120 Palaiseau, France

Abstract: A framework for deriving Rényi entropy-power inequalities (REPIs) is presented that uses linearization and an inequality of Dembo, Cover, and Thomas. Simple arguments are given to recover the previously known Rényi EPIs and derive new ones, by unifying a multiplicative form with constant c and a modification with exponent α of previous works. An information-theoretic proof of the Dembo-Cover-Thomas inequality, equivalent to Young's convolutional inequality with optimal constants, is provided, based on properties of Rényi conditional and relative entropies and using transportation arguments from Gaussian densities. For log-concave densities, a transportation proof of a sharp varentropy bound is presented. This work was partially presented at the 2019 Information Theory and Applications Workshop, San Diego, CA.

I. INTRODUCTION

We consider the r-entropy (Rényi entropy of exponent r, where r > 0 and r ≠ 1) of an n-dimensional zero-mean random vector X ∈ ℝ^n having density f ∈ L^r(ℝ^n):

    h_r(X) = (1/(1−r)) log ∫_{ℝ^n} f^r(x) dx = −r′ log ‖f‖_r        (1)

where ‖f‖_r denotes the L^r norm of f, and r′ = r/(r−1) is the conjugate exponent of r, such that 1/r + 1/r′ = 1. Notice that either r > 1 and r′ > 1, or 0 < r < 1 and r′ < 0. The limit as r → 1 is the classical h_1(X) = h(X) = −∫_{ℝ^n} f(x) log f(x) dx.

Letting N(X) = exp(2h(X)/n) be the corresponding entropy power [1], the famous entropy power inequality (EPI) [1], [2] writes N(Σ_{i=1}^m X_i) ≥ Σ_{i=1}^m N(X_i) for any independent random vectors X_1, X_2, ..., X_m ∈ ℝ^n. The link with the Rényi entropy h_r(X) was first made in [3] in connection with a strengthened Young's convolutional inequality, where the EPI is obtained by letting exponents tend to 1 [4, Thm 17.8.3]. Recently, there has been increasing interest in Rényi entropy-power inequalities [5].

The Rényi entropy power N_r(X) is defined [6] as the average power of a white Gaussian vector having the same Rényi entropy as X. If X* ∼ N(0, σ²I) is white Gaussian, an easy calculation yields

    h_r(X*) = (n/2) log(2πσ²) + (n/2) (r′/r) log r.        (2)

Since equating h_r(X*) = h_r(X) gives σ² = e^{2h_r(X)/n} / (2π r^{r′/r}), we define N_r(X) = e^{2h_r(X)/n} as the r-entropy power.

Bobkov and Chistyakov [6] extended the classical EPI to the r-entropy by incorporating an r-dependent constant c > 0:

    N_r(Σ_{i=1}^m X_i) ≥ c Σ_{i=1}^m N_r(X_i).        (3)

Ram and Sason [7] improved (increased) the value of c by making it depend also on the number m of independent vectors X_1, X_2, ..., X_m. Bobkov and Marsiglietti [8] proved another modification of the EPI for the Rényi entropy:

    N_r^α(Σ_{i=1}^m X_i) ≥ Σ_{i=1}^m N_r^α(X_i)        (4)

with a power exponent parameter α > 0. Due to the non-increasing property of the α-norm, if (4) holds for α it also holds for any α′ > α. The value of α was further improved (decreased) by Li [9]. All the above EPIs were found for Rényi entropies of orders r > 1. Recently, the α-modification of the Rényi EPI (4) was extended to orders r < 1 for two independent variables having log-concave densities by Marsiglietti and Melbourne [10]. The starting point of all the above works was Young's strengthened convolutional inequality.

In this paper, we build on the results of [11] to provide simple proofs for Rényi EPIs of the general form

    N_r^α(Σ_{i=1}^m X_i) ≥ c Σ_{i=1}^m N_r^α(X_i)        (5)

with constant c > 0 and exponent α > 0. The present framework uses only basic properties of Rényi entropies and is based on a transportation argument from normal densities and a change of variable by rotation, which was previously used to give a simple proof of Shannon's original EPI [12].
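As a quick numerical illustration of (1) and (2) (a minimal sketch of my own, not part of the paper; function names and parameter choices are illustrative), one can evaluate h_r of a one-dimensional Gaussian by direct integration of f^r and compare with the closed form of (2) for n = 1:

    import numpy as np
    from scipy.integrate import quad

    def h_r_gaussian_numeric(sigma, r):
        # h_r(X) = (1/(1-r)) log integral of f^r(x) dx for X ~ N(0, sigma^2), as in (1)
        f = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
        integral, _ = quad(lambda x: f(x)**r, -np.inf, np.inf)
        return np.log(integral) / (1 - r)

    def h_r_gaussian_closed(sigma, r):
        # Closed form (2) with n = 1: (1/2) log(2 pi sigma^2) + (1/2)(r'/r) log r, with r' = r/(r-1)
        r_conj = r / (r - 1)
        return 0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * (r_conj / r) * np.log(r)

    for r in (0.5, 2.0, 3.0):   # one order below 1 and two above
        print(r, h_r_gaussian_numeric(1.3, r), h_r_gaussian_closed(1.3, r))

Both evaluations agree to numerical precision, for orders below and above 1.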
II. LINEARIZATION

The first step toward proving (5) is the following linearization lemma, which generalizes [9, Lemma 2.1].

Lemma 1. For independent X_1, X_2, ..., X_m, the Rényi EPI in the general form (5) is equivalent to the following inequality:

    h_r(Σ_{i=1}^m √λ_i X_i) − Σ_{i=1}^m λ_i h_r(X_i) ≥ (n/2) [ (log c)/α + (1/α − 1) H(λ) ]        (6)

for any distribution λ = (λ_1, ..., λ_m) of entropy H(λ).

Proof. Note the scaling property h_r(aX) = h_r(X) + n log|a| for any a ≠ 0, established by a change of variable. It follows that N_r(aX) = a² N_r(X). Now first suppose (5) holds. Then

    h_r(Σ_{i=1}^m √λ_i X_i) = (n/(2α)) log N_r^α(Σ_{i=1}^m √λ_i X_i)                                  (7)
                            ≥ (n/(2α)) log Σ_{i=1}^m N_r^α(√λ_i X_i) + (n/(2α)) log c
                            = (n/(2α)) log Σ_{i=1}^m λ_i^α N_r^α(X_i) + (n/(2α)) log c                (8)
                            ≥ (n/(2α)) Σ_{i=1}^m λ_i log(λ_i^{α−1} N_r^α(X_i)) + (n/(2α)) log c       (9)
                            = Σ_{i=1}^m λ_i h_r(X_i) + (n(α−1)/(2α)) Σ_{i=1}^m λ_i log λ_i + (n/(2α)) log c

which proves (6). The scaling property is used in (8) and the concavity of the logarithm is used in (9).

Conversely, suppose that (6) is satisfied for all λ_i > 0 such that Σ_{i=1}^m λ_i = 1. Set λ_i = N_r^α(X_i) / Σ_{j=1}^m N_r^α(X_j). Then

    N_r^α(Σ_{i=1}^m X_i) = exp( (2α/n) h_r(Σ_{i=1}^m √λ_i · X_i/√λ_i) )
                         ≥ c · exp( (2α/n) Σ_{i=1}^m λ_i h_r(X_i/√λ_i) ) · e^{(1−α) Σ_{i=1}^m λ_i log(1/λ_i)}
                         = c ∏_{i=1}^m ( λ_i^{α−1} N_r^α(X_i/√λ_i) )^{λ_i} = c ∏_{i=1}^m ( λ_i^{−1} N_r^α(X_i) )^{λ_i}
                         = c ∏_{i=1}^m ( Σ_{j=1}^m N_r^α(X_j) )^{λ_i} = c Σ_{i=1}^m N_r^α(X_i)

which proves (5).

III. THE REPI OF DEMBO-COVER-THOMAS

As a second ingredient we have the following result, which was essentially established by Dembo, Cover and Thomas [3]. It is this Rényi version of the EPI which led them to prove Shannon's original EPI by letting the Rényi exponents tend to 1.

Theorem 1. Let r_1, ..., r_m, r be exponents whose conjugates r′_1, ..., r′_m, r′ are of the same sign and satisfy Σ_{i=1}^m 1/r′_i = 1/r′, and let λ_1, ..., λ_m be the discrete probability distribution with λ_i = r′/r′_i. Then, for independent zero-mean X_1, X_2, ..., X_m,

    h_r(Σ_{i=1}^m √λ_i X_i) − Σ_{i=1}^m λ_i h_{r_i}(X_i) ≥ h_r(Σ_{i=1}^m √λ_i X*_i) − Σ_{i=1}^m λ_i h_{r_i}(X*_i)        (10)

where X*_1, X*_2, ..., X*_m are i.i.d. standard Gaussian N(0, I). Equality holds if and only if the X_i are i.i.d. Gaussian.

It is easily seen from the expression (2) of the Rényi entropy of a Gaussian that (10) is equivalent to

    h_r(Σ_{i=1}^m √λ_i X_i) − Σ_{i=1}^m λ_i h_{r_i}(X_i) ≥ (n/2) ( (r′/r) log r − Σ_{i=1}^m λ_i (r′_i/r_i) log r_i ).        (11)

Note that the l.h.s. is very similar to that of (6), except that different Rényi exponents are present. This will be the crucial step toward proving (5).

Theorem 1 (for m = 2) was derived in [3] as a rewriting of Young's strengthened convolutional inequality with optimal constants. Section VII provides a simple transportation proof, which uses only basic properties of Rényi entropies.

Proof. By Lemma 1 for α = 1 we only need to check that the r.h.s. of (13) is greater than (n/2) log c for any choice of the λ_i's, that is, for any choice of exponents r_i such that Σ_{i=1}^m 1/r′_i = 1/r′. Thus, (3) will hold for log c = min_λ A(λ). Now, by the log-sum inequality [4, Thm 2.7.1],

    Σ_{i=1}^m (1/r_i) log(1/r_i) ≥ ( Σ_{i=1}^m 1/r_i ) log( (Σ_{i=1}^m 1/r_i) / m ) = (m − 1/r′) log( (m − 1/r′)/m )        (15)

with equality if and only if all the r_i are equal, that is, the λ_i are equal to 1/m. Thus, min_λ A(λ) = (r′/r) log r + r′ (m − 1/r′) log( (m − 1/r′)/m ) = log c.

Note that log c = (r′/r) log r + (mr′ − 1) log(1 − 1/(mr′)) decreases (and tends to (r′/r) log r − 1) as m increases. Thus, a universal constant independent of m is obtained by taking

    c = inf_m  r^{r′/r} (1 − 1/(mr′))^{mr′−1} = r^{r′/r} / e,        (16)

as was established by Bobkov and Chistyakov [6].

Proposition 2 (Li [9]). The Rényi EPI (4) holds for r > 1 and

    α = [ 1 + (r′/r) log₂ r + (2r′ − 1) log₂(1 − 1/(2r′)) ]^{−1}.

Li [9] remarked that this value of α is strictly smaller (better) than the value α = (r+1)/2 obtained previously by Bobkov and Marsiglietti [8]. In [11] it is shown that it cannot be further improved in our framework by making it depend on m.
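The constants above are easy to tabulate. The following sketch is my own (not from the paper) and assumes the closed forms exactly as reconstructed above, i.e. the m-dependent constant with log c = (r′/r) log r + (mr′−1) log(1 − 1/(mr′)), the universal constant of (16), and the exponent of Proposition 2; function names are illustrative.

    import numpy as np

    def c_bobkov_chistyakov(r, m=None):
        # With r' = r/(r-1): log c = (r'/r) log r + (m r' - 1) log(1 - 1/(m r')).
        # m=None returns the universal value r^{r'/r}/e of (16), the infimum over m.
        rp = r / (r - 1)
        if m is None:
            return r ** (rp / r) / np.e
        return np.exp((rp / r) * np.log(r) + (m * rp - 1) * np.log(1 - 1 / (m * rp)))

    def alpha_li(r):
        # Exponent of Proposition 2: [1 + (r'/r) log2 r + (2r' - 1) log2(1 - 1/(2r'))]^(-1)
        rp = r / (r - 1)
        return 1.0 / (1.0 + (rp / r) * np.log2(r) + (2 * rp - 1) * np.log2(1 - 1 / (2 * rp)))

    def alpha_bobkov_marsiglietti(r):
        # Exponent (r + 1)/2 of [8], for comparison
        return (r + 1) / 2

    r = 2.0
    print([round(c_bobkov_chistyakov(r, m), 4) for m in (2, 5, 20, 1000)])  # decreases with m
    print(round(c_bobkov_chistyakov(r), 4))                                 # universal value 2/e for r = 2
    for r in (1.5, 2.0, 3.0, 10.0):
        print(r, round(alpha_li(r), 3), alpha_bobkov_marsiglietti(r))       # Li's exponent is the smaller one

For r = 2, the m-dependent constant decreases from about 0.84 (m = 2) toward 2/e ≈ 0.74, and Li's exponent is about 1.33 against the Bobkov-Marsiglietti value 1.5, in line with the remarks above.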
Proof. Since the announced α does not depend on m, we can always assume that m = 2. By Lemma 1 for c = 1,