ISITA2020, Kapolei, Hawai'i, USA, October 24–27, 2020

Rényi Entropy Power and Normal Transport

Olivier Rioul LTCI, Télécom Paris, Institut Polytechnique de Paris, 91120, Palaiseau, France

Abstract—A framework for deriving Rényi entropy-power inequalities (REPIs) is presented that uses linearization and an inequality of Dembo, Cover, and Thomas. Simple arguments are given to recover the previously known Rényi EPIs and derive new ones, by unifying a multiplicative form with constant $c$ and a modification with exponent $\alpha$ of previous works. An information-theoretic proof of the Dembo–Cover–Thomas inequality—equivalent to Young's convolutional inequality with optimal constants—is provided, based on properties of Rényi conditional and relative entropies and using transportation arguments from Gaussian densities. For log-concave densities, a transportation proof of a sharp varentropy bound is presented.

This work was partially presented at the 2019 Information Theory and Applications Workshop, San Diego, CA.

I. INTRODUCTION

We consider the $r$-entropy (Rényi entropy of exponent $r$, where $r>0$ and $r\neq 1$) of an $n$-dimensional zero-mean random vector $X\in\mathbb{R}^n$ having density $f\in L^r(\mathbb{R}^n)$:

  h_r(X) = \frac{1}{1-r}\log\int_{\mathbb{R}^n} f^r(x)\,dx = -r'\log\|f\|_r   (1)

where $\|f\|_r$ denotes the $L^r$ norm of $f$, and $r'=\frac{r}{r-1}$ is the conjugate exponent of $r$, such that $\frac1r+\frac1{r'}=1$. Notice that either $r>1$ and $r'>1$, or $0<r<1$ and $r'<0$. For a white Gaussian $X^*\sim\mathcal N(0,\sigma^2\mathrm I)$ one finds

  h_r(X^*) = \frac n2\log(2\pi\sigma^2) + \frac n2\,\frac{\log r}{r-1}.   (2)

The corresponding Rényi entropy power is $N_r(X)=\exp\bigl(\frac2n h_r(X)\bigr)$. Bobkov and Chistyakov [6] extended Shannon's original EPI [1], [2] to the $r$-entropy by incorporating an $r$-dependent constant $c>0$:

  N_r\Bigl(\sum_{i=1}^m X_i\Bigr) \ge c\sum_{i=1}^m N_r(X_i).   (3)

Ram and Sason [7] improved (increased) the value of $c$ by making it depend also on the number $m$ of independent vectors $X_1,X_2,\ldots,X_m$. Bobkov and Marsiglietti [8] proved another modification of the EPI for the Rényi entropy:

  N_r^\alpha\Bigl(\sum_{i=1}^m X_i\Bigr) \ge \sum_{i=1}^m N_r^\alpha(X_i)   (4)

with a power exponent parameter $\alpha>0$. Due to the non-increasing property of the $\alpha$-norm, if (4) holds for $\alpha$ it also holds for any $\alpha'>\alpha$. The value of $\alpha$ was further improved (decreased) by Li [9]. All the above EPIs were found for Rényi entropies of orders $r>1$. Recently, the $\alpha$-modification of the Rényi EPI (4) was extended to orders $<1$ for two independent variables having log-concave densities by Marsiglietti and Melbourne [10]. The starting point of all the above works was Young's strengthened convolutional inequality.

In this paper, we build on the results of [11] to provide simple proofs for Rényi EPIs of the general form

  N_r^\alpha\Bigl(\sum_{i=1}^m X_i\Bigr) \ge c\sum_{i=1}^m N_r^\alpha(X_i)   (5)

with constant $c>0$ and exponent $\alpha>0$. The present framework uses only basic properties of Rényi entropies and is based on a transportation argument from normal densities and a change of variable by rotation, which was previously used to give a simple proof of Shannon's original EPI [12].

II. LINEARIZATION

The first step toward proving (5) is the following linearization lemma, which generalizes [9, Lemma 2.1].

Lemma 1. Let $c,\alpha>0$. If for any independent zero-mean random vectors $X_1,\ldots,X_m$ and any $\lambda_1,\ldots,\lambda_m>0$ such that $\sum_{i=1}^m\lambda_i=1$,

  h_r\Bigl(\sum_{i=1}^m\sqrt{\lambda_i}\,X_i\Bigr) \ge \sum_{i=1}^m\lambda_i h_r(X_i) + \frac n{2\alpha}\log c + \frac n2\Bigl(\frac1\alpha-1\Bigr)H(\lambda)   (6)

where $H(\lambda)=\sum_{i=1}^m\lambda_i\log\frac1{\lambda_i}$, then the Rényi EPI (5) holds.

Proof. Set $\lambda_i=N_r^\alpha(X_i)\big/\sum_{j=1}^m N_r^\alpha(X_j)$. Then

  N_r^\alpha\Bigl(\sum_{i=1}^m X_i\Bigr) = \exp\Bigl[\frac{2\alpha}n\,h_r\Bigl(\sum_{i=1}^m\sqrt{\lambda_i}\,\frac{X_i}{\sqrt{\lambda_i}}\Bigr)\Bigr]
    \ge \exp\Bigl[\frac{2\alpha}n\sum_{i=1}^m\lambda_i h_r\Bigl(\frac{X_i}{\sqrt{\lambda_i}}\Bigr)\Bigr]\cdot c\,e^{(1-\alpha)\sum_i\lambda_i\log\frac1{\lambda_i}}
    = c\prod_{i=1}^m N_r^\alpha\Bigl(\frac{X_i}{\sqrt{\lambda_i}}\Bigr)^{\lambda_i}\prod_{i=1}^m\lambda_i^{-(1-\alpha)\lambda_i}
    = c\prod_{i=1}^m\Bigl(\frac{N_r^\alpha(X_i)}{\lambda_i}\Bigr)^{\lambda_i}
    = c\sum_{i=1}^m N_r^\alpha(X_i)

since $N_r^\alpha(X_i/\sqrt{\lambda_i})=N_r^\alpha(X_i)\,\lambda_i^{-\alpha}$ and $N_r^\alpha(X_i)/\lambda_i$ does not depend on $i$. This proves (5). □
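As a sanity check on definitions (1)–(2), the Rényi entropy of a one-dimensional Gaussian can be verified numerically. This is a minimal sketch (the helper names and the grid are ours, not the paper's); the closed form is the $n=1$ case of (2).

```python
import numpy as np

# Numerical sanity check of definition (1) for a 1-D Gaussian N(0, sigma^2):
# h_r computed by integrating f^r on a grid should match the closed form (2),
# h_r = (1/2) log(2*pi*sigma^2) + (1/2) log(r)/(r - 1).
def h_r_numeric(r, sigma=1.0):
    x = np.linspace(-40 * sigma, 40 * sigma, 400001)
    f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    y = f**r
    integral = np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0])  # trapezoid rule
    return np.log(integral) / (1 - r)

def h_r_closed(r, sigma=1.0):
    return 0.5 * np.log(2 * np.pi * sigma**2) + 0.5 * np.log(r) / (r - 1)

for r in (0.5, 2.0, 3.0):
    assert abs(h_r_numeric(r) - h_r_closed(r)) < 1e-6
```

The check covers both regimes $r<1$ and $r>1$ used throughout the paper.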

Copyright (C) 2020 by IEICE

III. THE REPI OF DEMBO–COVER–THOMAS

As a second ingredient we have the following result, which was essentially established by Dembo, Cover and Thomas [3]. It is this Rényi version of the EPI which led them to prove Shannon's original EPI by letting the Rényi exponents tend to 1.

Theorem 1. Let $r_1,\ldots,r_m,r$ be exponents whose conjugates $r'_1,\ldots,r'_m,r'$ are of the same sign and satisfy $\sum_{i=1}^m\frac1{r'_i}=\frac1{r'}$, and let $\lambda_1,\ldots,\lambda_m$ be the discrete probability distribution defined by $\lambda_i=r'/r'_i$. Then, for independent zero-mean $X_1,X_2,\ldots,X_m$,

  h_r\Bigl(\sum_{i=1}^m\sqrt{\lambda_i}\,X_i\Bigr) - \sum_{i=1}^m\lambda_i h_{r_i}(X_i) \ge h_r\Bigl(\sum_{i=1}^m\sqrt{\lambda_i}\,X_i^*\Bigr) - \sum_{i=1}^m\lambda_i h_{r_i}(X_i^*)   (10)

where $X_1^*,X_2^*,\ldots,X_m^*$ are i.i.d. standard Gaussian $\mathcal N(0,\mathrm I)$. Equality holds if and only if the $X_i$ are i.i.d. Gaussian.

It is easily seen from the expression (2) of the Rényi entropy of a Gaussian that (10) is equivalent to

  h_r\Bigl(\sum_{i=1}^m\sqrt{\lambda_i}\,X_i\Bigr) - \sum_{i=1}^m\lambda_i h_{r_i}(X_i) \ge \frac n2\Bigl(r'\,\frac{\log r}r - \sum_{i=1}^m\lambda_i r'_i\,\frac{\log r_i}{r_i}\Bigr).   (11)

Note that the l.h.s. is very similar to that of (6) except that different Rényi exponents are present. This will be the crucial step toward proving (5).

Theorem 1 (for $m=2$) was derived in [3] as a rewriting of Young's strengthened convolutional inequality with optimal constants. Section VII provides a simple transportation proof, which uses only basic properties of Rényi entropies.

IV. REPIs FOR ORDERS $r>1$

If $r>1$, then $r'>0$ and all the $r'_i$ are positive and greater than $r'$. Therefore, all the $r_i$ are less than $r$. Using the well-known fact that $h_r(X)$ is nonincreasing in $r$ (see also (22) below),

  h_{r_i}(X_i) \ge h_r(X_i) \qquad (i=1,2,\ldots,m).   (12)

Plugging this into (11), one obtains

  h_r\Bigl(\sum_{i=1}^m\sqrt{\lambda_i}\,X_i\Bigr) - \sum_{i=1}^m\lambda_i h_r(X_i) \ge \frac n2\,r'\Bigl(\frac{\log r}r - \sum_{i=1}^m\frac{\log r_i}{r_i}\Bigr)   (13)

where $\lambda_i=r'/r'_i$. From Lemma 1 it suffices to establish that the r.h.s. of this inequality exceeds that of (6) to prove (5) for appropriate constants $c$ and $\alpha$. For future reference define

  A(\lambda) = |r'|\Bigl(\frac{\log r}r - \sum_{i=1}^m\frac{\log r_i}{r_i}\Bigr)
            = |r'|\Bigl(\sum_{i=1}^m\bigl(1-\tfrac{\lambda_i}{r'}\bigr)\log\bigl(1-\tfrac{\lambda_i}{r'}\bigr) - \bigl(1-\tfrac1{r'}\bigr)\log\bigl(1-\tfrac1{r'}\bigr)\Bigr).   (14)

(The absolute value $|r'|$ is needed in the next section where $r'$ is negative.) This function is strictly convex in $\lambda=(\lambda_1,\lambda_2,\ldots,\lambda_m)$ because $x\mapsto(1-x/r')\log(1-x/r')$ is strictly convex. Note that $A(\lambda)$ vanishes in the limiting cases where $\lambda$ tends to one of the standard unit vectors $(1,0,\ldots,0)$, ..., $(0,0,\ldots,0,1)$; since every $\lambda$ is a convex combination of these vectors and $A(\lambda)$ is strictly convex, one has $A(\lambda)<0$.

Using the properties of $A(\lambda)$ it is immediate to recover

Proposition 1 (Ram and Sason [7]). The Rényi EPI (3) holds for $r>1$ and $c = r^{r'/r}\bigl(1-\frac1{mr'}\bigr)^{mr'-1}$.

Proof. By Lemma 1 for $\alpha=1$ we only need to check that the r.h.s. of (13) is greater than $\frac n2\log c$ for any choice of the $\lambda_i$'s, that is, for any choice of exponents $r_i$ such that $\sum_{i=1}^m\frac1{r'_i}=\frac1{r'}$. Thus, (3) will hold for $\log c=\min_\lambda A(\lambda)$. Now, by the log-sum inequality [4, Thm. 2.7.1],

  \sum_{i=1}^m\frac1{r_i}\log\frac1{r_i} \ge \Bigl(\sum_{i=1}^m\frac1{r_i}\Bigr)\log\frac{\sum_{i=1}^m 1/r_i}{m} = \Bigl(m-\frac1{r'}\Bigr)\log\Bigl(1-\frac1{mr'}\Bigr)   (15)

with equality if and only if all the $r_i$ are equal, that is, all the $\lambda_i$ are equal to $1/m$. Thus, $\min_\lambda A(\lambda) = r'\frac{\log r}r + (mr'-1)\log\bigl(1-\frac1{mr'}\bigr) = \log c$. □

Note that $\log c = r'\frac{\log r}r + (mr'-1)\log\bigl(1-\frac1{mr'}\bigr) < 0$ decreases (and tends to $r'\frac{\log r}r - 1$) as $m$ increases. Thus, a universal constant independent of $m$ is obtained by taking

  c = \inf_m\; r^{r'/r}\Bigl(1-\frac1{mr'}\Bigr)^{mr'-1} = \frac{r^{r'/r}}e,   (16)

as was established by Bobkov and Chistyakov [6].

Proposition 2 (Li [9]). The Rényi EPI (4) holds for $r>1$ and $\alpha = \bigl[1 + \frac{r'}r\log_2 r + (2r'-1)\log_2\bigl(1-\frac1{2r'}\bigr)\bigr]^{-1}$.

Li [9] remarked that this value of $\alpha$ is strictly smaller (better) than the value $\alpha=\frac{r+1}2$ obtained previously by Bobkov and Marsiglietti [8]. In [11] it is shown that it cannot be further improved in our framework by making it depend on $m$.

Proof. Since the announced $\alpha$ does not depend on $m$, we can always assume that $m=2$. By Lemma 1 for $c=1$, we only need to check that the r.h.s. of (13) is greater than $\frac n2\bigl(\frac1\alpha-1\bigr)H(\lambda)$ for any choice of the $\lambda_i$'s, that is, for any choice of exponents $r_i$ such that $\sum_{i=1}^2\frac1{r'_i}=\frac1{r'}$. Thus, (4) will hold for $\frac1\alpha-1=\min_\lambda\frac{A(\lambda)}{H(\lambda)}$. Li [9] showed—this is also easily proved using [10, Lemma 8]—that the minimum is attained when $\lambda=(1/2,1/2)$. The corresponding value of $A(\lambda)/H(\lambda)$ is $\bigl[r'\frac{\log r}r+(2r'-1)\log\bigl(1-\frac1{2r'}\bigr)\bigr]/\log 2=\frac1\alpha-1$. □

The above value of $\alpha$ is $>1$. However, using the same method, it is easy to obtain Rényi EPIs with exponent values $\alpha<1$. In this way we obtain a new Rényi EPI:

Proposition 3. The Rényi EPI (5) holds for $r>1$, $0<\alpha<1$ and $c = \bigl[r^{r'/r}\bigl(1-\frac1{mr'}\bigr)^{mr'-1}\bigr]^\alpha\big/m^{1-\alpha}$.

Proof. By Lemma 1 we only need to check that the r.h.s. of (13) is greater than $\frac n2\bigl[\frac{\log c}\alpha+(\frac1\alpha-1)H(\lambda)\bigr]$, that is, $A(\lambda)\ge\frac{\log c}\alpha+(\frac1\alpha-1)H(\lambda)$ for any choice of the $\lambda_i$'s, that is, for any choice of exponents $r_i$ such that $\sum_{i=1}^m\frac1{r'_i}=\frac1{r'}$. Thus, for a given $0<\alpha<1$, (5) will hold for $\log c=\min_\lambda\bigl[\alpha A(\lambda)-(1-\alpha)H(\lambda)\bigr]$. From the preceding proofs (since both $A(\lambda)$ and $-H(\lambda)$ are convex functions of $\lambda$), the minimum is attained when all the $\lambda_i$'s are equal. This gives $\log c=\alpha\bigl(r'\frac{\log r}r+(mr'-1)\log\bigl(1-\frac1{mr'}\bigr)\bigr)-(1-\alpha)\log m$. □

V. REPIs FOR ORDERS $r<1$ AND LOG-CONCAVE DENSITIES

If $r<1$, then $r'<0$ and all the $r'_i$ are negative, with $r_i>r$. Now the opposite inequality of (12) holds and the method of the preceding section fails. For log-concave densities, however, (12) can be replaced by a similar inequality in the right direction.
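The behavior of the constant of Proposition 1 and its limit (16) can be checked numerically. This is a small sketch (the helper name `c_ram_sason` is ours); it only evaluates the closed-form expressions above.

```python
import math

# Ram-Sason constant c(m, r) = r^{r'/r} (1 - 1/(m r'))^{m r' - 1} of
# Proposition 1, and its limit r^{r'/r}/e from (16) as m grows.
def c_ram_sason(m, r):
    rp = r / (r - 1)  # conjugate exponent r'
    return r ** (rp / r) * (1 - 1 / (m * rp)) ** (m * rp - 1)

r = 2.0
values = [c_ram_sason(m, r) for m in (2, 3, 10, 1000)]
# c decreases with the number m of summands ...
assert all(a > b for a, b in zip(values, values[1:]))
# ... toward the universal Bobkov-Chistyakov constant r^{r'/r}/e of (16):
limit = r ** ((r / (r - 1)) / r) / math.e
assert abs(values[-1] - limit) < 1e-3
```

For $r=2$ the sequence starts at $c=2(3/4)^3=0.84375$ for $m=2$ and tends to $2/e\approx0.7358$.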


A density $f$ is log-concave if $\log f$ is concave in its support, i.e., for all $0<\mu<1$,

  f(x)^\mu f(y)^{1-\mu} \le f(\mu x+(1-\mu)y).   (17)

Theorem 2 (Fradelizi, Madiman and Wang [13]). If $X$ has a log-concave density, then $h_r(rX)-rh_r(X)=(1-r)h_r(X)+n\log r$ is concave in $r$.

This concavity property is used in [13] to derive a sharp "varentropy bound". Section VIII provides an alternate transportation proof along the same lines as in Section VII.

By Theorem 2, since $n\log r+(1-r)h_r(X)$ is concave in $r$ and vanishes for $r=1$, the slope $\frac{n\log r+(1-r)h_r(X)}{r-1}$ is nonincreasing in $r$. In other words, $h_r(X)+n\frac{\log r}{1-r}$ is nondecreasing in $r$. Now since all the $r_i$ are $>r$,

  h_{r_i}(X_i)+n\,\frac{\log r_i}{1-r_i} \ge h_r(X_i)+n\,\frac{\log r}{1-r} \qquad (i=1,\ldots,m).   (18)

Plugging this into (11), one obtains

  h_r\Bigl(\sum_{i=1}^m\sqrt{\lambda_i}\,X_i\Bigr) - \sum_{i=1}^m\lambda_i h_r(X_i)
  \ge n\sum_{i=1}^m\lambda_i\Bigl(\frac{\log r}{1-r}-\frac{\log r_i}{1-r_i}\Bigr) + \frac n2\,r'\Bigl(\frac{\log r}r-\sum_{i=1}^m\frac{\log r_i}{r_i}\Bigr)
  = \frac n2\,r'\Bigl(\sum_{i=1}^m\frac{\log r_i}{r_i}-\frac{\log r}r\Bigr)   (19)

where we have used that $\lambda_i=r'/r'_i$ for $i=1,2,\ldots,m$. Notice that, quite surprisingly, the r.h.s. of (19) for $r<1$ ($r'<0$) is the opposite of that of (13) for $r>1$ ($r'>0$). However, since $r'$ is now negative, the r.h.s. is exactly equal to $\frac n2 A(\lambda)$, which is still convex and negative. For this reason, the proofs of the following propositions for $r<1$ are near repeats of those obtained previously for $r>1$.

Proposition 4. The Rényi EPI (3) for log-concave densities holds for $r<1$ and $c = r^{r'/r}\bigl(1-\frac1{mr'}\bigr)^{mr'-1}$.

Proof. Identical to that of Proposition 1 except for the change $r'\to|r'|$ in the expression (14) of $A(\lambda)$. □

Proposition 5 (Marsiglietti and Melbourne [10]). The Rényi EPI (4) for log-concave densities holds for $r<1$ and $\alpha = \bigl[1+\frac{|r'|}r\log_2 r+(2|r'|+1)\log_2\bigl(1+\frac1{2|r'|}\bigr)\bigr]^{-1}$.

Proof. Identical to that of Proposition 2 except for the change $r'\to|r'|$ in the expression of $A(\lambda)$. □

Proposition 6. The REPI (5) for log-concave densities holds for $r<1$, $0<\alpha<1$ and $c = \bigl[r^{r'/r}\bigl(1-\frac1{mr'}\bigr)^{mr'-1}\bigr]^\alpha\big/m^{1-\alpha}$.

Proof. Identical to that of Proposition 3 except for the change $r'\to|r'|$ in the expression of $A(\lambda)$. □

VI. RELATIVE AND CONDITIONAL RÉNYI ENTROPIES

Before turning to transportation proofs of Theorems 1 and 2, it is convenient to review some definitions and properties. The following notions were previously used for discrete variables, but can be easily adapted to variables with densities.

Definition 1 (Escort Variable [14]). If $f\in L^r(\mathbb R^n)$, its escort density of exponent $r$ is defined by

  f_r(x) = f^r(x)\Big/\int_{\mathbb R^n} f^r(x)\,dx.   (20)

Let $X_r\sim f_r$ denote the corresponding escort random variable.

Proposition 7. Let $r\neq1$ and assume that $X\sim f\in L^s(\mathbb R^n)$ for all $s$ in a neighborhood of $r$. Then

  \frac{\partial}{\partial r}\bigl[(1-r)h_r(X)\bigr] = \mathbb E\log f(X_r) = -h(X_r\|X)   (21)

  \frac{\partial}{\partial r}h_r(X) = -\frac1{(1-r)^2}\,D(X_r\|X) \le 0   (22)

  \frac{\partial^2}{\partial r^2}\bigl[(1-r)h_r(X)\bigr] = \operatorname{Var}\log f(X_r)   (23)

where $h(X\|Y)=\int f\log(1/g)$ denotes the cross-entropy and $D(X\|Y)=\int f\log(f/g)$ is the Kullback–Leibler divergence.

Proof. By the hypothesis, one can differentiate under the integral sign. It is easily seen that $\frac{\partial}{\partial r}(1-r)h_r(X)=\frac{\partial}{\partial r}\log\int f^r=\frac{\int f^r\log f}{\int f^r}=\mathbb E\log f(X_r)$. Taking another derivative yields $\frac{\partial^2}{\partial r^2}(1-r)h_r(X)=\frac{\int f^r(\log f)^2\cdot\int f^r-(\int f^r\log f)^2}{(\int f^r)^2}=\operatorname{Var}\log f(X_r)$. Since $(1-r)\frac{\partial}{\partial r}h_r(X)=\frac{\partial}{\partial r}\bigl[(1-r)h_r(X)\bigr]+h_r(X)$ we have $(1-r)^2\frac{\partial}{\partial r}h_r(X)=(1-r)\,\mathbb E\log f(X_r)+\log\int f^r=-\int f_r\log(f_r/f)=-D(X_r\|X)$. □

Eq. (22) gives a new proof that $h_r(X)$ is nonincreasing in $r$. It is strictly decreasing if $X_r$ is not distributed as $X$, that is, if $X$ is not uniformly distributed. Equation (23) shows that $(1-r)h_r(X)$ is convex in $r$, that is, $\int f^r$ is log-convex in $r$ (which is essentially equivalent to Hölder's inequality).

Definition 2 (Relative Rényi Entropy [15]). Given $X\sim f$ and $Y\sim g$, their relative Rényi entropy of exponent $r$ (relative $r$-entropy) is given by

  \Delta_r(X\|Y) = D_{1/r}(X_r\|Y_r)

where $D_r(X\|Y)=\frac1{r-1}\log\int f^rg^{1-r}$ is the $r$-divergence [16]. When $r\to1$ both the relative $r$-entropy and the $r$-divergence tend to the Kullback–Leibler divergence $D(X\|Y)=\Delta(X\|Y)$ (also known as the relative entropy). For $r\neq1$ the two notions do not coincide. It is easily checked from the definitions that

  \Delta_r(X\|Y) = -r'\log\int f_r^{1/r}g_r^{1/r'} = -r'\log\mathbb E\,g_r^{1/r'}(X) - h_r(X)   (24)

  h_r(X) = -r'\log\mathbb E\,f_r^{1/r'}(X).   (25)

Thus, just like in the case $r=1$, the relative $r$-entropy (24) is the difference between the expression (25) of the $r$-entropy in which $f_r$ is replaced by $g_r$, and the $r$-entropy itself.

Since the Rényi divergence $D_r(X\|Y)=\frac1{r-1}\log\int f^rg^{1-r}$ is nonnegative and vanishes if and only if the two distributions $f$ and $g$ coincide, the relative $r$-entropy $\Delta_r(X\|Y)$ enjoys the same property. From (24) we have the following

Proposition 8 (Rényi–Gibbs inequality). If $X\sim f$, then

  h_r(X) \le -r'\log\mathbb E\,g_r^{1/r'}(X)   (26)

for any density $g$, with equality if and only if $f=g$ a.e. Letting $r\to1$ one recovers the usual Gibbs inequality.

Definition 3 (Arimoto's Conditional Rényi Entropy [18]).

  h_r(X|Z) = -r'\log\mathbb E\,\|f(\cdot|Z)\|_r = -r'\log\mathbb E\,f_r^{1/r'}(X|Z).

Proposition 8 applied to $f(x|z)$ and $g(x|z)$ gives the inequality $h_r(X|Z=z)\le-r'\log\mathbb E\,g_r^{1/r'}(X|Z=z)$ which, averaged over $Z$, yields the following conditional Rényi–Gibbs inequality:

  h_r(X|Z) \le -r'\log\mathbb E\,g_r^{1/r'}(X|Z).   (27)
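The escort definition (20) and identity (25) can be checked by direct numerical integration for a standard Gaussian. This is a sketch (the grid, tolerance and helper names are ours); it evaluates both sides of (25) independently from definition (1).

```python
import numpy as np

# Identity (25): h_r(X) = -r' log E f_r^{1/r'}(X), with f_r = f^r / int f^r
# the escort density (20), checked by numerical integration for X ~ N(0,1).
r = 3.0
rp = r / (r - 1)                      # conjugate exponent r'
x = np.linspace(-40, 40, 400001)
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def integrate(y):                     # trapezoid rule on the grid x
    return np.sum((y[1:] + y[:-1]) / 2) * (x[1] - x[0])

f_r = f**r / integrate(f**r)          # escort density of exponent r
lhs = np.log(integrate(f**r)) / (1 - r)           # h_r(X) by definition (1)
rhs = -rp * np.log(integrate(f * f_r**(1 / rp)))  # right-hand side of (25)
assert abs(lhs - rhs) < 1e-8
```

The agreement is exact up to floating-point rounding, since (25) is an algebraic identity in $f$.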


If in particular we put $g(x|z)=f(x)$, independent of $z$, the r.h.s. becomes equal to (25). We have thus obtained a simple proof of the following

Proposition 9 (Conditioning reduces $r$-entropy [18]).

  h_r(X|Z) \le h_r(X)   (28)

with equality if and only if $X$ and $Z$ are independent.

Another important property is the data processing inequality [16], which implies $D_r(T(X)\|T(Y))\le D_r(X\|Y)$ for any transformation $T$. The same holds for the relative $r$-entropy when the transformation is applied to escort variables:

Proposition 10 (Data processing inequality for relative $r$-entropy). If $X^*,Y^*,X,Y$ are random vectors such that

  X_r = T(X_r^*) \quad\text{and}\quad Y_r = T(Y_r^*),   (29)

then $\Delta_r(X\|Y)\le\Delta_r(X^*\|Y^*)$.

Proof. $\Delta_r(X\|Y)=D_{1/r}(X_r\|Y_r)=D_{1/r}(T(X_r^*)\|T(Y_r^*))\le D_{1/r}(X_r^*\|Y_r^*)=\Delta_r(X^*\|Y^*)$. □

When $T$ is invertible, inequalities in both directions hold:

Proposition 11 (Relative $r$-entropy preserves transport). For an invertible transport $T$ satisfying (29), $\Delta_r(X\|Y)=\Delta_r(X^*\|Y^*)$.

From (24) the equality $\Delta_r(X\|Y)=\Delta_r(X^*\|Y^*)$ can be rewritten as the following identity:

  -r'\log\mathbb E\,g_r^{1/r'}(X) - h_r(X) = -r'\log\mathbb E\,g_r^{*\,1/r'}(X^*) - h_r(X^*).   (30)

Assuming $T$ is a diffeomorphism, the density $g_r^*$ of $Y_r^*$ is given by the change of variable formula $g_r^*(u)=g_r(T(u))\,|T'(u)|$ where the Jacobian $|T'(u)|$ is the absolute value of the determinant of the Jacobian matrix $T'(u)$. In this case (30) can be rewritten as

  -r'\log\mathbb E\,g_r^{1/r'}(X) - h_r(X) = -r'\log\mathbb E\Bigl[g_r^{1/r'}(T(X^*))\,|T'(X^*)|^{1/r'}\Bigr] - h_r(X^*).   (31)

VII. A TRANSPORTATION PROOF OF THEOREM 1

We proceed to prove (10). It is easily seen, using finite induction on $m$, that it suffices to prove the corresponding inequality for $m=2$ arguments:

  h_r(\sqrt\lambda\,X+\sqrt{1-\lambda}\,Y) - \lambda h_p(X) - (1-\lambda)h_q(Y)
  \ge h_r(\sqrt\lambda\,X^*+\sqrt{1-\lambda}\,Y^*) - \lambda h_p(X^*) - (1-\lambda)h_q(Y^*)   (32)

with equality if and only if $X,Y$ are i.i.d. Gaussian. Here $X^*$ and $Y^*$ are i.i.d. standard Gaussian $\mathcal N(0,\mathrm I)$, and the triple $(p,q,r)$ and its associated $\lambda\in(0,1)$ satisfy the following conditions: $p,q,r$ have conjugates $p',q',r'$ of the same sign which satisfy $\frac1{p'}+\frac1{q'}=\frac1{r'}$ (that is, $\frac1p+\frac1q=1+\frac1r$) and $\lambda=\frac{r'}{p'}$, $1-\lambda=\frac{r'}{q'}$.

Lemma 2 (Normal Transport). Let $f$ be given and $X^*\sim\mathcal N(0,\sigma^2\mathrm I)$. There exists a diffeomorphism $T:\mathbb R^n\to\mathbb R^n$ with log-concave Jacobian $|T'|$ such that $X=T(X^*)\sim f$.

Thus $T$ transports the normal $X^*$ to $X$. The log-concavity property is that for any such transports $T,U$ and $\lambda\in(0,1)$, we have

  |T'(X^*)|^\lambda\,|U'(Y^*)|^{1-\lambda} \le |\lambda T'(X^*)+(1-\lambda)U'(Y^*)|.   (33)

The proof of Lemma 2 is very simple for one-dimensional variables [19], where $T$ is just an increasing function with continuous derivative $T'>0$ and where (33) is the classical arithmetic-geometric mean inequality. For dimensions $n>1$, Lemma 2 comes in two flavors:

(i) Knöthe maps: $T$ can be chosen such that its Jacobian matrix $T'$ is (lower) triangular with positive diagonal elements (Knöthe–Rosenblatt map [20], [21]). Two different elementary proofs are given in [12]. Inequality (33) results from the concavity of the logarithm applied to the Jacobian matrices' diagonal elements.

(ii) Brenier maps: $T$ can be chosen such that its Jacobian matrix $T'$ is symmetric positive definite (Brenier map [22], [23]). In this case (33) is Ky Fan's inequality [4, §17.9].

The key argument is now the following. Considering escort variables, by transport (Lemma 2), one can write $X_p=T(X_p^*)$ and $Y_q=U(Y_q^*)$ for two diffeomorphisms $T$ and $U$ satisfying (33). Then by transport preservation (Proposition 11), we have $\lambda\Delta_p(X\|U)+(1-\lambda)\Delta_q(Y\|V)=\lambda\Delta_p(X^*\|U^*)+(1-\lambda)\Delta_q(Y^*\|V^*)$ for any $U\sim\varphi$ and $V\sim\psi$, which from (31) can be easily rewritten in the form

  -r'\log\mathbb E\,\Phi^{1/r'}(X,Y) - \lambda h_p(X) - (1-\lambda)h_q(Y)
  = -r'\log\mathbb E\Bigl[\Phi^{1/r'}(T(X^*),U(Y^*))\,\bigl(|T'(X^*)|^\lambda|U'(Y^*)|^{1-\lambda}\bigr)^{1/r'}\Bigr] - \lambda h_p(X^*) - (1-\lambda)h_q(Y^*)   (34)

where we have noted $\Phi(x,y)=\varphi_p^\lambda(x)\,\psi_q^{1-\lambda}(y)$. Such an identity holds, by the change of variable $x=T(x^*)$, $y=U(y^*)$, for any function $\Phi(x,y)$ of $x$ and $y$. Now from (25) we have

  h_r(\sqrt\lambda\,X+\sqrt{1-\lambda}\,Y) = -r'\log\mathbb E\,\theta_r^{1/r'}(\sqrt\lambda\,X+\sqrt{1-\lambda}\,Y)

where $\theta$ is the density of $\sqrt\lambda\,X+\sqrt{1-\lambda}\,Y$. Therefore, the l.h.s. of (32) can be written as

  h_r(\sqrt\lambda\,X+\sqrt{1-\lambda}\,Y) - \lambda h_p(X) - (1-\lambda)h_q(Y)
  = -r'\log\mathbb E\,\theta_r^{1/r'}(\sqrt\lambda\,X+\sqrt{1-\lambda}\,Y) - \lambda h_p(X) - (1-\lambda)h_q(Y).   (35)

Applying (34) to $\Phi(x,y)=\theta_r(\sqrt\lambda\,x+\sqrt{1-\lambda}\,y)$ and using the inequality (33) gives

  h_r(\sqrt\lambda\,X+\sqrt{1-\lambda}\,Y) - \lambda h_p(X) - (1-\lambda)h_q(Y)
  \ge -r'\log\mathbb E\,\varphi^{1/r'}(X^*,Y^*) - \lambda h_p(X^*) - (1-\lambda)h_q(Y^*)   (36)

where $\varphi(x^*,y^*)=\theta_r\bigl(\sqrt\lambda\,T(x^*)+\sqrt{1-\lambda}\,U(y^*)\bigr)\cdot|\lambda T'(x^*)+(1-\lambda)U'(y^*)|$. To conclude we need the following

Lemma 3 (Normal Rotation [12]). If $X^*,Y^*$ are i.i.d. Gaussian, then for any $0<\lambda<1$, the rotation

  \tilde X = \sqrt\lambda\,X^* + \sqrt{1-\lambda}\,Y^*, \qquad \tilde Y = -\sqrt{1-\lambda}\,X^* + \sqrt\lambda\,Y^*   (37)

yields i.i.d. Gaussian variables $\tilde X,\tilde Y$.

Lemma 3 is easily proved by considering covariance matrices. A deeper result (Bernstein's lemma, not used here) states that this property of remaining i.i.d. by rotation characterizes the Gaussian distribution [19, Lemma 4], [24, Chap. 5].

Since the starred variables can be expressed in terms of the tilde variables by the inverse rotation $X^*=\sqrt\lambda\,\tilde X-\sqrt{1-\lambda}\,\tilde Y$, $Y^*=\sqrt{1-\lambda}\,\tilde X+\sqrt\lambda\,\tilde Y$, inequality (36) can be written as

  h_r(\sqrt\lambda\,X+\sqrt{1-\lambda}\,Y) - \lambda h_p(X) - (1-\lambda)h_q(Y)
  \ge -r'\log\mathbb E\,\psi^{1/r'}(\tilde X|\tilde Y) - \lambda h_p(X^*) - (1-\lambda)h_q(Y^*),   (38)
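Lemma 3 reduces to the orthogonality of the rotation matrix in (37), which is immediate to check numerically. This is a minimal sketch (the value $\lambda=0.3$ is an arbitrary choice of ours).

```python
import numpy as np

# Lemma 3 (normal rotation): if X*, Y* are i.i.d. N(0, I), then
#   Xt =  sqrt(l) X* + sqrt(1-l) Y*,   Yt = -sqrt(1-l) X* + sqrt(l) Y*
# are again i.i.d. N(0, I).  Per scalar component, the joint covariance
# transforms as R I R^T, which equals I because R is a rotation matrix.
l = 0.3
R = np.array([[np.sqrt(l), np.sqrt(1 - l)],
              [-np.sqrt(1 - l), np.sqrt(l)]])
cov = R @ np.eye(2) @ R.T   # covariance of (Xt, Yt)
assert np.allclose(cov, np.eye(2))  # unit variances, zero cross-covariance
```

Zero cross-covariance together with joint Gaussianity gives independence, which is the whole content of the lemma.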


where $\psi(\tilde x|\tilde y)=\theta_r\bigl(\sqrt\lambda\,T(\sqrt\lambda\,\tilde x-\sqrt{1-\lambda}\,\tilde y)+\sqrt{1-\lambda}\,U(\sqrt{1-\lambda}\,\tilde x+\sqrt\lambda\,\tilde y)\bigr)\cdot|\lambda T'(\sqrt\lambda\,\tilde x-\sqrt{1-\lambda}\,\tilde y)+(1-\lambda)U'(\sqrt{1-\lambda}\,\tilde x+\sqrt\lambda\,\tilde y)|$. Making the change of variable $z=\sqrt\lambda\,T(\sqrt\lambda\,\tilde x-\sqrt{1-\lambda}\,\tilde y)+\sqrt{1-\lambda}\,U(\sqrt{1-\lambda}\,\tilde x+\sqrt\lambda\,\tilde y)$, we check that $\int\psi(\tilde x|\tilde y)\,d\tilde x=\int\theta_r(z)\,dz=1$ since $\theta_r$ is a density. Hence $\psi(\tilde x|\tilde y)$ is a conditional density, and by (27),

  -r'\log\mathbb E\,\psi^{1/r'}(\tilde X|\tilde Y) \ge h_r(\tilde X|\tilde Y)   (39)

where $h_r(\tilde X|\tilde Y)=h_r(\tilde X)=h_r(\sqrt\lambda\,X^*+\sqrt{1-\lambda}\,Y^*)$ since $\tilde X$ and $\tilde Y$ are independent. Combining with (38) yields the announced inequality (32).

It remains to settle the equality case in (32). From the above proof, equality holds in (32) if and only if both (33) and (39) are equalities. The rest of the argument depends on whether Knöthe or Brenier maps are used:

(i) Knöthe maps: In the case of Knöthe maps, Jacobian matrices are triangular and equality in (33) holds if and only if for all $i$, $\frac{\partial T_i}{\partial x_i}(X^*)=\frac{\partial U_i}{\partial y_i}(Y^*)$ a.s. Since $X^*$ and $Y^*$ are independent Gaussian, this implies that $\frac{\partial T_i}{\partial x_i}$ and $\frac{\partial U_i}{\partial y_i}$ are constant and equal. In particular the Jacobian $|\lambda T'(\sqrt\lambda\,\tilde x-\sqrt{1-\lambda}\,\tilde y)+(1-\lambda)U'(\sqrt{1-\lambda}\,\tilde x+\sqrt\lambda\,\tilde y)|$ is constant. Now since $h_r(\tilde X|\tilde Y)=h_r(\tilde X)$, equality in (39) holds only if $\psi(\tilde x|\tilde y)$ does not depend on $\tilde y$, which implies that $\sqrt\lambda\,T(\sqrt\lambda\,\tilde x-\sqrt{1-\lambda}\,\tilde y)+\sqrt{1-\lambda}\,U(\sqrt{1-\lambda}\,\tilde x+\sqrt\lambda\,\tilde y)$ does not depend on the value of $\tilde y$. Taking derivatives with respect to $\tilde y_j$ for all $j$, we have $\frac{\partial T_i}{\partial x_j}(X^*)=\frac{\partial U_i}{\partial y_j}(Y^*)$ a.s. for all $i,j$. In other words, $T'(X^*)=U'(Y^*)$ a.s.

(ii) Brenier maps: In the case of Brenier maps the argument is simpler. Jacobian matrices are symmetric positive definite and by strict concavity, Ky Fan's inequality (33) is an equality only if $T'(X^*)=U'(Y^*)$ a.s.

In both cases, since $X^*$ and $Y^*$ are independent, this implies that $T'(X^*)=U'(Y^*)$ is constant. Therefore, $T$ and $U$ are linear transformations, equal up to an additive constant ($=0$ since the random vectors are assumed of zero mean). It follows that $X_p=T(X_p^*)$ and $Y_q=U(Y_q^*)$ are Gaussian with respective distributions $X_p\sim\mathcal N(0,\mathrm K/p)$ and $Y_q\sim\mathcal N(0,\mathrm K/q)$. Hence, $X$ and $Y$ are i.i.d. Gaussian $\mathcal N(0,\mathrm K)$. This ends the proof of Theorem 1.

We note that this section has provided an information-theoretic proof of the strengthened Young's convolutional inequality (with optimal constants), since (32) is a rewriting of this convolutional inequality [3].

VIII. A TRANSPORTATION PROOF OF THEOREM 2

Define $r=\lambda p+(1-\lambda)q$ where $0<\lambda<1$. It is required to show that

  (1-r)h_r(X)+n\log r \ge \lambda\bigl[(1-p)h_p(X)+n\log p\bigr] + (1-\lambda)\bigl[(1-q)h_q(X)+n\log q\bigr].

By Lemma 2 there exist two diffeomorphisms $T,U$ such that one can write $pX_p=T(X^*)$ and $qX_q=U(X^*)$. Then, by these changes of variables, $X^*$ has density

  \frac1{p^n}\,f_p\Bigl(\frac{T(x^*)}p\Bigr)\,|T'(x^*)| = \frac1{q^n}\,f_q\Bigl(\frac{U(x^*)}q\Bigr)\,|U'(x^*)|   (40)

which can be written

  \frac{f^p\bigl(T(x^*)/p\bigr)\,|T'(x^*)|}{\exp\bigl[(1-p)h_p(X)+n\log p\bigr]} = \frac{f^q\bigl(U(x^*)/q\bigr)\,|U'(x^*)|}{\exp\bigl[(1-q)h_q(X)+n\log q\bigr]}.

Taking the geometric mean with exponents $\lambda$ and $1-\lambda$, integrating over $x^*$ and taking the logarithm gives the representation

  \lambda\bigl[(1-p)h_p(X)+n\log p\bigr] + (1-\lambda)\bigl[(1-q)h_q(X)+n\log q\bigr]
  = \log\int f^{\lambda p}\Bigl(\frac{T(x^*)}p\Bigr)\,f^{(1-\lambda)q}\Bigl(\frac{U(x^*)}q\Bigr)\,|T'(x^*)|^\lambda\,|U'(x^*)|^{1-\lambda}\,dx^*.

Now, by log-concavity (17) (with $\mu=\lambda p/r$) and (33),

  \lambda\bigl[(1-p)h_p(X)+n\log p\bigr] + (1-\lambda)\bigl[(1-q)h_q(X)+n\log q\bigr]
  \le \log\int f^r\Bigl(\frac{\lambda T(x^*)+(1-\lambda)U(x^*)}r\Bigr)\,|\lambda T'(x^*)+(1-\lambda)U'(x^*)|\,dx^*
  = \log r^n\!\int f^r = (1-r)h_r(X)+n\log r.

This ends the proof of Theorem 2.

This theorem asserts that the second derivative $\frac{\partial^2}{\partial r^2}\bigl[(1-r)h_r(X)+n\log r\bigr]\le0$. From (23) this gives $\operatorname{Var}\log f(X_r)\le n/r^2$, that is, $\operatorname{Var}\log f_r(X_r)\le n$. Setting $r=1$, this is the varentropy bound $\operatorname{Var}\log f(X)\le n$ of [13].

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 623–656, Oct. 1948.
[2] O. Rioul, "Information theoretic proofs of entropy power inequalities," IEEE Trans. Inf. Theory, vol. 57, no. 1, pp. 33–55, Jan. 2011.
[3] A. Dembo, T. M. Cover, and J. A. Thomas, "Information theoretic inequalities," IEEE Trans. Inf. Theory, vol. 37, no. 6, pp. 1501–1518, Nov. 1991.
[4] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Wiley, 2006.
[5] M. Madiman, J. Melbourne, and P. Xu, "Forward and reverse entropy power inequalities in convex geometry," in Convexity and Concentration, ser. IMA Volumes in Mathematics and its Applications, E. Carlen, M. Madiman, and E. Werner, Eds. Springer, 2017, vol. 161, pp. 427–485.
[6] S. G. Bobkov and G. P. Chistyakov, "Entropy power inequality for the Rényi entropy," IEEE Trans. Inf. Theory, vol. 61, no. 2, pp. 708–714, Feb. 2015.
[7] E. Ram and I. Sason, "On Rényi entropy power inequalities," IEEE Trans. Inf. Theory, vol. 62, no. 12, pp. 6800–6815, Dec. 2016.
[8] S. G. Bobkov and A. Marsiglietti, "Variants of the entropy power inequality," IEEE Trans. Inf. Theory, vol. 63, no. 12, pp. 7747–7752, Dec. 2017.
[9] J. Li, "Rényi entropy power inequality and a reverse," Studia Mathematica, vol. 242, pp. 303–319, Feb. 2018.
[10] A. Marsiglietti and J. Melbourne, "On the entropy power inequality for the Rényi entropy of order [0, 1]," IEEE Trans. Inf. Theory, vol. 65, no. 3, pp. 1387–1396, Mar. 2019.
[11] O. Rioul, "Rényi entropy power inequalities via normal transport and rotation," Entropy, vol. 20, no. 9, p. 641, Sep. 2018.
[12] ——, "Yet another proof of the entropy power inequality," IEEE Trans. Inf. Theory, vol. 63, no. 6, pp. 3595–3599, Jun. 2017.
[13] M. Fradelizi, M. Madiman, and L. Wang, "Optimal concentration of information content for log-concave densities," in High Dimensional Probability VII: The Cargèse Volume. Basel: Birkhäuser, 2016.
[14] J.-F. Bercher, "Source coding with escort distributions and Rényi entropy bounds," Physics Letters A, vol. 373, no. 36, pp. 3235–3238, 2009.
[15] A. Lapidoth and C. Pfister, "Two measures of dependence," in IEEE Int. Conf. Science of Electrical Engineering (ICSEE 2016), 2016.
[16] T. van Erven and P. Harremoës, "Rényi divergence and Kullback-Leibler divergence," IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 3797–3820, Jul. 2014.
[17] S. Verdú, "α-mutual information," in Information Theory and Applications Workshop (ITA 2015), Feb. 2015.
[18] S. Fehr and S. Berens, "On the conditional Rényi entropy," IEEE Trans. Inf. Theory, vol. 60, no. 11, pp. 6801–6810, Nov. 2014.
[19] O. Rioul, "Optimal transportation to the entropy-power inequality," in IEEE Information Theory and Applications Workshop (ITA 2017), Feb. 2017.
[20] M. Rosenblatt, "Remarks on a multivariate transformation," Ann. Math. Stat., vol. 23, no. 3, pp. 470–472, 1952.
[21] H. Knöthe, "Contributions to the theory of convex bodies," Michigan Math. J., vol. 4, pp. 39–52, 1957.
[22] Y. Brenier, "Polar factorization and monotone rearrangement of vector-valued functions," Comm. Pure Applied Math., vol. 44, no. 4, pp. 375–417, Jun. 1991.
[23] R. J. McCann, "Existence and uniqueness of monotone measure-preserving maps," Duke Math. J., vol. 80, pp. 309–324, Nov. 1995.
[24] W. Bryc, The Normal Distribution: Characterizations with Applications, ser. Lecture Notes in Statistics. Springer, 1995, vol. 100.
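As a numerical illustration of the varentropy bound $\operatorname{Var}\log f(X)\le n$ discussed in Section VIII, the following sketch (for $n=1$; the grid and the Laplace example are our additions) evaluates the varentropy of two log-concave densities by quadrature.

```python
import numpy as np

# Varentropy bound Var log f(X) <= n (here n = 1) for log-concave densities,
# checked by numerical integration.  log f is supplied in closed form to
# avoid log(0) at the tails where f underflows.
x = np.linspace(-30, 30, 600001)
dx = x[1] - x[0]

def varentropy(f, logf):
    m = np.sum(f * logf) * dx                # E log f(X)
    return np.sum(f * (logf - m) ** 2) * dx  # Var log f(X)

log_gauss = -x**2 / 2 - 0.5 * np.log(2 * np.pi)
log_laplace = -np.abs(x) - np.log(2.0)
v_gauss = varentropy(np.exp(log_gauss), log_gauss)
v_laplace = varentropy(np.exp(log_laplace), log_laplace)
assert abs(v_gauss - 0.5) < 1e-4    # Gaussian: varentropy 1/2 < n = 1
assert abs(v_laplace - 1.0) < 1e-4  # Laplace: varentropy 1, meeting n = 1
```

For the Gaussian, $\log f(X)$ is an affine function of $X^2$, so its variance is $\operatorname{Var}(X^2)/4=1/2$; for the Laplace density, $|X|$ is exponential with unit variance, so the bound happens to be met with equality.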

Copyright (C) 2020 by IEICE 5