arXiv:1703.09518v1 [cs.IT] 28 Mar 2017

Existence and Continuity of Differential Entropy for a Class of Distributions

Hamid Ghourchian, Amin Gohari, Senior Member, IEEE, and Arash Amini, Senior Member, IEEE

Abstract—In this paper, we identify a class of absolutely continuous probability distributions, and show that the differential entropy is uniformly convergent over this space of distributions under the metric of total variation distance. One of the advantages of this class is that the requirements could be readily verified for a given distribution.

Index Terms—Absolutely continuous random vector, differential entropy, total variation distance, information theory.

Manuscript received November 17, 2016; revised February 22, 2017; accepted March 22, 2017. The work of A. Gohari was supported by Sharif University of Technology under Grant QB950607. The authors are with the Department of Electrical Engineering, Advanced Communication Research Institute (ACRI), Sharif University of Technology, Tehran, Iran. E-mails: [email protected], {aminzadeh, aamini}@sharif.ir.

I. INTRODUCTION

THE differential entropy of a given probability distribution represented by the density p(x) is defined as

    h(p) = -∫_R p(x) log p(x) dx    (1)

if the integral converges. First defined by Shannon in [1], the differential entropy of a random variable is derived by subtracting log m from the discrete entropy of the quantized version of the random variable with quantization step size 1/m, and taking its limit as m tends to infinity [2, p. 247]. Both the existence and the continuity of entropy have received attention in the literature due to their numerous applications, e.g., see [3], [4]. To guarantee these two properties, it is common to convolve the given probability density function (pdf) with a normal distribution to smooth it out, or alternatively, to impose constraints on the distribution. For the latter approach see for instance [5], and check [4] for an application of this approach in network information theory.

Assuming the existence of entropy for a convergent sequence of probability measures (in some metric), it is of importance to determine whether the differential entropies also converge. In order to sketch the current landscape, we provide a non-rigorous statement of the existing results, without giving the exact technical conditions. Assume that a sequence of absolutely continuous (w.r.t. the Lebesgue measure) probability measures with pdfs p_k(x) converges to the pdf p(x). We say that the convergence holds in total variation if the total variation distance ‖p_k − p‖_1 = ∫ |p_k(x) − p(x)| dx vanishes as k → ∞. We say that p_k converges to p in relative entropy if D(p_k‖p) → 0, where D(·‖·) is the relative entropy between the two pdfs, defined as

    D(p‖q) = ∫ p(x) log [p(x)/q(x)] dx.    (2)

Finally, we say that the convergence holds in ratio if p_k(x)/p(x) → 1 for all x ∈ supp(p).

Pinsker's inequality shows that the convergence in relative entropy implies the convergence in total variation [6, p. 33]. Provided that the ratios p_k(x)/p(x) remain suitably bounded, it is shown in [7] that the convergence in relative entropy implies the convergence in entropy; it is also shown there that if p_k is the "projection" of some distribution onto a closed convex set of distributions, then the convergence of p_k to p in either entropy or total variation implies the convergence in relative entropy. In [8], it is shown that, under similar conditions, the pointwise convergence of the distributions implies their convergence in entropy. In [7], [9], it is shown that the convergence in total variation, accompanied with the convergence in ratio, implies the convergence in entropy and in relative entropy. With c_1 < p_k(x)/p(x) < c_2 for some 0 < c_1 < c_2 < ∞, the convergence in entropy is obtained in [11]; further, the condition on the ratio could be dropped if the distributions are of finite support, in which case [12] shows that the convergence in entropy holds. A technique devised in [10, p. 12] reveals that if p_k(x), p(x) < m and |x|^{1+δ} p_k(x), |x|^{1+δ} p(x) are all integrable over the real line for some m, δ > 0, then one can show that the convergence in total variation implies the convergence in entropy.

Finally, a class of "regular probability distributions" of random vectors is identified in [4], for which the entropy difference of any two members could be controlled by their Wasserstein distance and variance. More precisely, the authors of [4] define the class of (c_1, c_2)-regular distributions as the set of pdfs satisfying ‖∇ log p(x)‖ ≤ c_1 ‖x‖ + c_2 for all x ∈ supp(p); it is proved in [4, Proposition 1] that if X ∼ p_X and Y ∼ p_Y are (c_1, c_2)-regular, then h(X) and h(Y) exist and

    |h(X) − h(Y)| ≤ ∆,    (3)

where ∆ is a constant that grows like √n as we increase the dimension n. This regularity condition is restrictive: for instance, log p(x) needs to be differentiable, whereas the pdf of an absolutely continuous distribution may not even be continuous. However, under the restrictive regularity condition, the scaling behavior of the results in [4] in terms of n is superior to that of this paper, as the constant that we find for (α, v, m)–AC^n grows like n log n.

In this paper, we define the new class (α, v, m)–AC^n of n-dimensional absolutely continuous probability measures for which the convergence in total variation leads to the convergence in entropy. The parameters α, v and m represent upper bounds on the αth-order moment ∫ ‖x‖_α^α p(x) dx and on the supremum of p(x), where ‖x‖_α := (∑_{i=1}^n |x_i|^α)^{1/α}; such quantities are easier to evaluate than the integrals appearing in the conditions above. This class is a subset of the one considered in [13], for which the existence of the differential entropy is shown, but not its continuity. Thus, with more stringent conditions, we prove the continuity of entropy in addition to its existence.

This paper is organized as follows: in Section II, some definitions are given; the main result is presented in Section III, and its proof is given in Section IV and the appendix.
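Before proceeding, a small numerical illustration (ours, not from the paper) shows why plain total variation convergence cannot control differential entropy: mixing the uniform density on [0, 1] with an ever-narrower spike keeps the total variation distance small while the entropy diverges, because no single bound on the densities covers the whole sequence. This is precisely the failure mode that the class defined below rules out. All quantities are computed in closed form; the family and helper name are our own choices.

```python
import numpy as np

# p_k mixes U[0,1] with a spike of width exp(-k^2) and mass 1/k.
# The total variation distance to U[0,1] vanishes like 1/k, yet
# h(p_k) ~ -k -> -infinity, since sup p_k explodes with k.

def spiked_uniform(k):
    w = np.exp(-float(k) ** 2)       # spike width
    eps = 1.0 / k                    # spike mass
    hi = (1.0 - eps) + eps / w       # density value on [0, w]
    lo = 1.0 - eps                   # density value on (w, 1]
    tv = 2.0 * eps * (1.0 - w)       # exact ||p_k - 1||_1
    h = -(w * hi * np.log(hi) + (1.0 - w) * lo * np.log(lo))
    return tv, h, hi

for k in [2, 4, 8, 16]:
    tv, h, sup_p = spiked_uniform(k)
    print(f"k={k:2d}  TV={tv:.2e}  h(p_k)={h:+8.2f}  sup p_k={sup_p:.2e}")
# TV -> 0 while h(p_k) -> -infinity and sup p_k blows up.
```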

II. DEFINITIONS AND NOTATIONS

All the logarithms in this paper are in base e. Random variables and vectors are denoted by capital letters, with vectors distinguished by bold letters.

Definition 1 (Absolutely Continuous Random Vector). Let B^n be the Borel σ-field of R^n and let X be a real-valued, n-dimensional random vector that is measurable with respect to B^n. We call X an absolutely continuous random vector if its probability measure µ, induced on (R^n, B^n), is absolutely continuous with respect to the Lebesgue measure (i.e., µ(A) = 0 for all A ∈ B^n with zero Lebesgue measure). We denote the set of all absolutely continuous distributions by AC.

The Radon-Nikodym theorem implies that for each X ∈ AC there exists a B^n-measurable function p : R^n → [0, ∞) such that for all A ∈ B^n we have that

    Pr{X ∈ A} = ∫_A p(x) dx.    (4)

The function p is called the probability density function (pdf) of X [6, p. 21]. The property X ∈ AC is alternatively written as p ∈ AC. Random variables with an absolutely continuous distribution are generally known as continuous random variables.

Remark 1. Intuitively speaking, the absolute continuity of a probability measure w.r.t. the Lebesgue measure implies that the distribution has a pdf with no delta functionals, i.e., the cumulative distribution function (CDF) is continuous. It does not imply the absolute continuity (or even continuity) of its probability density function p(x) as a function of x.

Definition 2 (Differential Entropy). [2, Chapter 2] For an n-dimensional X ∈ AC with density p, we define the differential entropy h(X), or h(p), as

    h(p) = h(X) := −∫_{R^n} p(x) log p(x) dx,    (5)

if the integral converges. We use the convention x log x = 0 at x = 0.

Definition 3. Given α, m, v > 0 and n ∈ N, we define (α, v, m)–AC^n to be the class of all X ∈ AC such that the (corresponding) density function p : R^n → [0, ∞) satisfies

    ∫_{R^n} ‖x‖_α^α p(x) dx < v, and    (6)

    ess sup_{x ∈ R^n} p(x) < m.    (7)

Remark 2. Observe that a scalar continuous random variable X is in (α, v, m)–AC^1 if it has an α-moment E[|X|^α] < v and a bounded density function p_X(x) < m. It is often not difficult to verify these properties for a given random variable: many commonly used random variables have bounded density functions and at least one finite moment. Furthermore, if X_1 ∈ (α, v_1, m_1)–AC^1 and X_2 ∈ (α, v_2, m_2)–AC^1 are independent random variables, then X_1 + X_2 ∈ (α, c_α(v_1 + v_2), min{m_1, m_2})–AC^1, where c_α = 1 if α ∈ [0, 1] and c_α = 2^{α−1} for α > 1. This is because f_{X_1+X_2}(x) = ∫ f_{X_1}(t) f_{X_2}(x − t) dt ≤ m_1 ∫ f_{X_2}(x − t) dt = m_1 (and symmetrically for m_2), while the α-moment of X_1 + X_2 can be bounded by the moments of X_1 and X_2 via the c_r-inequality for moments. Thus, any linear combination of the commonly used variables also belongs to the defined class of distributions.

Definition 4. [14] The density of an n-dimensional random vector with the "α-generalized normal distribution" N(0, (αv/n)^{1/α} I_n, α) is given by

    φ_{n,α,v}(x) = ( n^{1/α} / ( 2 (αv)^{1/α} Γ(1/α + 1) ) )^n e^{−(n/(αv)) ‖x‖_α^α},    (8)

where I_n is the identity matrix of size n, the scalars α, v > 0 are arbitrary, and Γ is the gamma function. The α-generalized normal distribution satisfies E[‖X_{n,α,v}‖_α^α] = v.
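The verification recipe of Remark 2 is easy to automate. The following sketch (ours; the helper names phi and class_parameters are our own) numerically estimates the α-moment (6) and the density supremum (7) for the one-dimensional α-generalized normal of Definition 4, which also serves as a sanity check of the normalization of (8) and of the moment identity E[‖X_{n,α,v}‖_α^α] = v for n = 1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def phi(x, alpha, v, n=1):                     # density (8), scalar case
    norm = (n ** (1 / alpha)
            / (2 * (alpha * v) ** (1 / alpha) * gamma(1 / alpha + 1))) ** n
    return norm * np.exp(-(n / (alpha * v)) * abs(x) ** alpha)

def class_parameters(pdf, alpha, grid):
    # alpha-moment (6) by quadrature; grid maximum as a crude ess sup (7)
    moment = quad(lambda x: abs(x) ** alpha * pdf(x), -np.inf, np.inf)[0]
    sup = max(pdf(x) for x in grid)
    return moment, sup

alpha, v = 1.5, 2.0
mass = quad(lambda x: phi(x, alpha, v), -np.inf, np.inf)[0]
moment, sup = class_parameters(lambda x: phi(x, alpha, v), alpha,
                               np.linspace(-15, 15, 3001))
print(f"mass = {mass:.6f} (should be 1)")
print(f"E|X|^alpha = {moment:.6f} (should be v = {v})")
print(f"sup p = {sup:.4f}")
# phi then belongs to (alpha, v', m')-AC^1 for any v' > v and m' > sup p,
# exactly in the spirit of Remark 2.
```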
III. MAIN RESULT

Our main result, stated below in Theorem 1, implies that the differential entropy is uniformly continuous over (α, v, m)–AC^n with respect to the total variation metric

    ‖p_X − p_Y‖_1 = ∫_{R^n} |p_X(x) − p_Y(x)| dx.    (9)

Theorem 1. The differential entropy of any p_X ∈ (α, v, m)–AC^n is well-defined and satisfies

    |h(p_X)| ≤ (n/α) log(αv/n) + log(max{1, m}) + n log[2 Γ(1/α + 1)] + n/α + 1.    (10)

Furthermore, for all p_X, p_Y ∈ (α, v, m)–AC^n satisfying ‖p_X − p_Y‖_1 ≤ 2/e, we have that

    |h(p_X) − h(p_Y)| ≤ c_1 ‖p_X − p_Y‖_1 + c_2 ‖p_X − p_Y‖_1 log(1/‖p_X − p_Y‖_1),    (11)

where

    c_1 = (n/α) log(2αv/n) + log(me) + log(2/e) + n log[2 Γ(1/α + 1)] + n/α + 1,    (12)

    c_2 = n/α + 2.    (13)
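To make the constants concrete, the following sketch (ours) evaluates the right-hand sides of (10)-(13) for n = 1 and checks them against numerically computed entropies of two Gaussian densities; the class parameters v and m are our choices and must satisfy (6), (7) for both densities.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def h_bound(alpha, v, m, n=1):                    # right-hand side of (10)
    return (n / alpha * np.log(alpha * v / n) + np.log(max(1.0, m))
            + n * np.log(2 * gamma(1 / alpha + 1)) + n / alpha + 1)

def c1_c2(alpha, v, m, n=1):                      # constants (12), (13)
    c1 = (n / alpha * np.log(2 * alpha * v / n) + np.log(m * np.e)
          + np.log(2 / np.e) + n * np.log(2 * gamma(1 / alpha + 1))
          + n / alpha + 1)
    return c1, n / alpha + 2

def gauss(x, mu, sig):
    return np.exp(-(x - mu) ** 2 / (2 * sig ** 2)) / (sig * np.sqrt(2 * np.pi))

def h(p):                                         # entropy by quadrature
    f = lambda x: -p(x) * np.log(p(x)) if p(x) > 0 else 0.0
    return quad(f, -np.inf, np.inf)[0]

alpha, v, m = 2.0, 1.5, 0.4     # E|X|^2 <= 1.44 < v, both density sups < m
pX = lambda x: gauss(x, 0.0, 1.0)
pY = lambda x: gauss(x, 0.0, 1.2)

delta = quad(lambda x: abs(pX(x) - pY(x)), -np.inf, np.inf)[0]
c1, c2 = c1_c2(alpha, v, m)
print(f"(10): |h| = {abs(h(pX)):.3f} <= {h_bound(alpha, v, m):.3f}")
print(f"delta = {delta:.3f} (condition: <= 2/e = {2 / np.e:.3f})")
print(f"(11): {abs(h(pX) - h(pY)):.4f} <= "
      f"{c1 * delta + c2 * delta * np.log(1 / delta):.4f}")
```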

We now complement the above result by showing that the differential entropy may no longer be well-defined if we remove either of (6) or (7) from the definition of (α, v, m)–AC^n.

Example 1. Consider the following pdf defined over the real line R:

    p(x) = 1/(x log^2 x)  for x > e,  and  p(x) = 0  for x ≤ e.    (14)

Note that p(x) is a valid density function with no delta functionals; it defines a continuous random variable and thus p ∈ AC. Also, observe that p(x) < 1/e for all x ∈ R. On the other hand, by applying the change of variables y = log x, we observe that the αth-order moment is infinite for any α > 0:

    ∫_e^∞ x^α / (x log^2 x) dx = ∫_1^∞ e^{αy} / y^2 dy = ∞.    (15)

Hence, this distribution satisfies (7), but not (6). With the same change of variables y = log x, it can be observed that

    h(p) = ∫_e^∞ (1/(x log^2 x)) log(x log^2 x) dx = +∞.    (16)

Example 2. Consider the following probability density function:

    q(x) = 1/( x (−log x) (log(−log x))^2 )  for 0 < x < e^{−e},  and  q(x) = 0  otherwise.    (17)

Here, (6) is satisfied but (7) is violated. Note that q(x) is a valid density function with no delta functionals, thus q ∈ AC. Since the support of this pdf is finite, all of its αth-order moments are finite. However, by the change of variables y = log(−log x), we have that

    h(q) = ∫_0^{e^{−e}} [ log( x (−log x) (log(−log x))^2 ) / ( x (−log x) (log(−log x))^2 ) ] dx = −∞.    (18)
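The divergences claimed in (15) and (16) are easy to watch numerically after the substitution y = log x, under which p(x) dx becomes dy/y^2 on (1, ∞). The truncated integrals below (our illustration) grow without bound as the cutoff L increases.

```python
import numpy as np
from scipy.integrate import quad

# Example 1 in the y = log x coordinates of (15): mass is exactly 1, the
# truncated alpha-moment explodes with L, and the truncated entropy
# integral of (16) keeps growing (slowly, like log L).

mass = quad(lambda y: 1 / y ** 2, 1, np.inf)[0]
print(f"total mass = {mass:.6f}")                                # -> 1.0

alpha = 0.5
for L in [10, 20, 40]:
    moment = quad(lambda y: np.exp(alpha * y) / y ** 2, 1, L)[0]     # (15)
    h_part = quad(lambda y: (y + 2 * np.log(y)) / y ** 2, 1, L)[0]   # (16)
    print(f"L={L:3d}   moment={moment:14.1f}   entropy partial={h_part:.3f}")
# Condition (6) fails for every alpha > 0 while (7) holds, and
# h(p) = +infinity, exactly as Example 1 claims.
```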

IV. PROOF OF THE MAIN RESULT

A. Proof of (10): Upper bound on the differential entropy

Let Y := c_m^{1/n} X, where c_m = max{m, 1}. The existence of the differential entropy h(Y) is equivalent to the existence of the differential entropy h(X), and when both exist we have that [2, Theorem 8.6.4]

    h(X) = h(Y) − log c_m.    (19)

The benefit of defining Y is that its density function, denoted by q, is bounded from above by one:

    q(y) = (1/c_m) p_X( y / c_m^{1/n} ) ≤ 1  a.e.    (20)

Hence, for any y ∈ R^n, we know that

    q(y) log( 1/q(y) ) ≥ 0.    (21)

To prove the convergence of

    −∫_{R^n} q(y) log q(y) dy = ∫_{R^n} q(y) log( 1/q(y) ) dy,    (22)

let us consider

    ∆_w = ∫_{‖y‖_∞ ≤ w} q(y) log( 1/q(y) ) dy,    (23)

for w > 0, where ‖y‖_∞ := max_{i ∈ {1,...,n}} |y_i| and y_i is the ith element of y. By recalling (21), we deduce that ∆_w is increasing in w. In addition,

    h(q) = h(Y) = lim_{w→∞} ∆_w.    (24)

Thus, to prove the existence of the differential entropy h(Y), it suffices to find a constant κ such that ∆_w ≤ κ for all w. Then, κ would also serve as an upper bound for the entropy h(Y). Let

    Θ_w = ∫_{‖y‖_∞ ≤ w} q(y) log( q(y) / φ_{n,α,c_m^{α/n} v}(y) ) dy,    (25)

where φ_{n,α,c_m^{α/n} v} is defined in Definition 4. Except for the domain of the integral, Θ_w defines a relative entropy measure. The utility of relative entropy to bound entropy is a known technique; in particular, we link Θ_w to ∆_w by

    Θ_w = ∫_{‖y‖_∞ ≤ w} q(y) log( q(y) e^{ (n/(α c_m^{α/n} v)) ‖y‖_α^α } [ 2 Γ(1/α + 1) (α c_m^{α/n} v / n)^{1/α} ]^n ) dy    (26)

    = n log[ 2 Γ(1/α + 1) (α c_m^{α/n} v / n)^{1/α} ] ∫_{‖y‖_∞ ≤ w} q(y) dy + (n/(α c_m^{α/n} v)) ∫_{‖y‖_∞ ≤ w} ‖y‖_α^α q(y) dy − ∆_w.    (27)

This could be rewritten as

    ∆_w = ( n log b_1 + log c_m ) ∫_{‖y‖_∞ ≤ w} q(y) dy + (n/(α c_m^{α/n} v)) ∫_{‖y‖_∞ ≤ w} ‖y‖_α^α q(y) dy − Θ_w    (28)

    ≤ n log b_1 + |log c_m| + n/α − Θ_w,    (29)

where b_1 := (αv/n)^{1/α} [2 Γ(1/α + 1)], and the last inequality is true because

    ∫_{‖y‖_∞ ≤ w} q(y) dy ≤ ∫_{R^n} q(y) dy = 1,    (30)

    ∫_{‖y‖_∞ ≤ w} ‖y‖_α^α q(y) dy ≤ ∫_{R^n} ‖y‖_α^α q(y) dy    (31)

    = E[ ‖Y‖_α^α ] ≤ c_m^{α/n} v.    (32)

We now find a lower bound for Θ_w. Note that for any x ≥ 0 and a > 0, we have x log(x/a) ≥ x − a. Thus,

    Θ_w = ∫_{‖y‖_∞ ≤ w} q(y) log( q(y) / φ_{n,α,c_m^{α/n} v}(y) ) dy    (33)

    ≥ ∫_{‖y‖_∞ ≤ w} ( q(y) − φ_{n,α,c_m^{α/n} v}(y) ) dy    (34)

    = ∫_{‖y‖_∞ ≤ w} q(y) dy − ∫_{‖y‖_∞ ≤ w} φ_{n,α,c_m^{α/n} v}(y) dy ≥ −1,    (35)

where the validity of the last equation comes from the fact that φ_{n,α,c_m^{α/n} v} and q are both probability density functions, so their integrals over any region lie between 0 and 1. By combining (29) and (35), we obtain that

    ∆_w ≤ n log b_1 + |log c_m| + n/α + 1.    (36)

This establishes an upper bound on ∆_w independent of w, which completes the proof of the existence of h(Y) and, in turn, of the existence of h(X). Moreover, since ∆_w ≥ 0 by (21), we get 0 ≤ h(Y) ≤ n log b_1 + |log c_m| + n/α + 1; combining this with (19) and log c_m = log(max{1, m}) yields the bound (10).
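The next sketch (ours) traces this argument numerically for n = 1: with a standard normal density and our class parameters (so that c_m = max{m, 1} = 1 and q = p_X), it computes ∆_w of (23) for growing w and checks that ∆_w increases monotonically while staying below the w-independent bound (36).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

alpha, v, m, n = 2.0, 1.1, 0.4, 1    # valid class parameters for N(0, 1)
c_m = max(m, 1.0)
q = lambda y: np.exp(-y ** 2 / 2) / np.sqrt(2 * np.pi)

b1 = (alpha * v / n) ** (1 / alpha) * 2 * gamma(1 / alpha + 1)
kappa = n * np.log(b1) + abs(np.log(c_m)) + n / alpha + 1     # bound (36)
print(f"kappa = {kappa:.4f}")

for w in [0.5, 1.0, 2.0, 4.0, 8.0]:
    delta_w = quad(lambda y: q(y) * np.log(1 / q(y)), -w, w)[0]   # (23)
    print(f"w = {w:4.1f}   Delta_w = {delta_w:.4f}")
# Delta_w increases monotonically to h(Y) = 0.5*log(2*pi*e) ~ 1.4189 and
# every value stays below kappa, as the existence argument requires.
```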

B. Proof of (11): Continuity of the differential entropy

The existence of h(X) and h(Y) follows from (10). The idea is to adapt the argument of [2, Theorem 17.3.3] from discrete entropy to differential entropy. Let δ = ‖p_X − p_Y‖_1. Observe that δ = 0 yields p_X = p_Y almost everywhere and h(X) = h(Y). For δ > 0, consider two new random vectors

    X′ = (me)^{1/n} X,  Y′ = (me)^{1/n} Y.    (37)

Because of Lemma 1 from the appendix, p_{X′}, p_{Y′} ∈ (α, (me)^{α/n} v, 1/e)–AC^n, as well as

    ∫_{R^n} |p_{X′}(x) − p_{Y′}(x)| dx = δ,    (38)

    |h(X) − h(Y)| = |h(X′) − h(Y′)|.    (39)

We further define a random vector Z with density p_Z as

    p_Z(x) := |p_{X′}(x) − p_{Y′}(x)| / δ.    (40)

We shall show that p_Z ∈ (α, 2(me)^{α/n} v/δ, 2/(eδ))–AC^n. Since p_{X′}, p_{Y′} represent absolutely continuous distributions, the same applies to p_Z. We further have

    p_Z(x) = |p_{X′}(x) − p_{Y′}(x)| / δ    (41)

    ≤ ( p_{X′}(x) + p_{Y′}(x) ) / δ ≤ 2/(eδ)  a.e.    (42)

Regarding the α-moment of Z, we can write that

    E[ ‖Z‖_α^α ] = ∫_{R^n} ‖x‖_α^α ( |p_{X′}(x) − p_{Y′}(x)| / δ ) dx    (43)

    ≤ ∫_{R^n} ‖x‖_α^α ( ( p_{X′}(x) + p_{Y′}(x) ) / δ ) dx ≤ 2 (me)^{α/n} v / δ.    (44)

The latter result confirms p_Z ∈ (α, 2(me)^{α/n} v/δ, 2/(eδ))–AC^n. Now, from (10), we know that h(Z) exists, and

    |h(Z)| ≤ (n/α) log( 2α (me)^{α/n} v / (nδ) ) + log( max{ 2/(eδ), 1 } ) + n log[ 2 Γ(1/α + 1) ] + n/α + 1.    (45)

This result could be simplified by applying δ ≤ 2/e (so that, in particular, δ < 1) as

    |h(Z)| ≤ c_1 + (c_2 − 1) |log δ|,    (46)

where c_1 and c_2 are given in (12), (13). Next, we show that

    |h(X′) − h(Y′)| ≤ δ log(1/δ) + δ h(Z).    (47)

This inequality, together with (39) and (46), completes the proof: combining them gives |h(p_X) − h(p_Y)| ≤ δ log(1/δ) + δ( c_1 + (c_2 − 1) log(1/δ) ) = c_1 δ + c_2 δ log(1/δ), which is (11). In order to prove (47), we can write

    |h(X′) − h(Y′)| = | ∫_{R^n} ( p_{X′}(x) log(1/p_{X′}(x)) − p_{Y′}(x) log(1/p_{Y′}(x)) ) dx |    (48)

    ≤ ∫_{R^n} | p_{X′}(x) log p_{X′}(x) − p_{Y′}(x) log p_{Y′}(x) | dx.    (49)

Lemma 2 in the appendix results in

    | p_{X′}(x) log p_{X′}(x) − p_{Y′}(x) log p_{Y′}(x) |    (50)

    ≤ δ p_Z(x) log( 1/(δ p_Z(x)) ).    (51)

Consequently,

    |h(X′) − h(Y′)| ≤ ∫_{R^n} δ p_Z(x) log( 1/(δ p_Z(x)) ) dx    (52)

    = δ log(1/δ) + δ h(Z),    (53)

which completes the proof.
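The chain (37)-(53) can also be exercised numerically. The sketch below (ours; n = 1, two Gaussians, with m = 0.4 a valid sup bound for both) scales by me so that the scaled densities are bounded by 1/e as in Lemma 1, builds p_Z per (40), and verifies the key estimate (47) directly.

```python
import numpy as np
from scipy.integrate import quad

def gauss(x, mu, sig):
    return np.exp(-(x - mu) ** 2 / (2 * sig ** 2)) / (sig * np.sqrt(2 * np.pi))

m = 0.4                        # shared density bound (7) for both members
s = m * np.e                   # scaling factor (me)^{1/n} with n = 1
pXp = lambda x: gauss(x / s, 0.0, 1.0) / s    # density of X' = s*X
pYp = lambda x: gauss(x / s, 0.0, 1.2) / s    # density of Y' = s*Y

def h(p):                      # differential entropy by quadrature
    f = lambda x: -p(x) * np.log(p(x)) if p(x) > 0 else 0.0
    return quad(f, -np.inf, np.inf, limit=200)[0]

delta = quad(lambda x: abs(pXp(x) - pYp(x)),
             -np.inf, np.inf, limit=200)[0]                       # (38)
pZ = lambda x: abs(pXp(x) - pYp(x)) / delta                       # (40)

lhs = abs(h(pXp) - h(pYp))
rhs = delta * np.log(1 / delta) + delta * h(pZ)                   # (47)
print(f"delta = {delta:.4f}:  {lhs:.4f} <= {rhs:.4f}")
```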

REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: John Wiley & Sons, 2006.
[3] A. R. Barron, L. Gyorfi, and E. C. van der Meulen, "Distribution estimation consistent in total variation and in two types of information divergence," IEEE Trans. Inf. Theory, vol. 38, no. 5, pp. 1437–1454, 1992.
[4] Y. Polyanskiy and Y. Wu, "Wasserstein continuity of entropy and outer bounds for interference channels," IEEE Trans. Inf. Theory, vol. 62, no. 7, pp. 3992–4002, 2016.
[5] A. R. Barron, "Monotonic central limit theorem for densities," Department of Statistics, Stanford University, Stanford, CA, Tech. Rep. 50, Mar. 1984.
[6] S. Ihara, Information Theory for Continuous Systems. Singapore: World Scientific, 1993.
[7] F. J. Piera and P. Parada, "On convergence properties of Shannon entropy," Problems of Information Transmission, vol. 45, no. 2, pp. 75–94, Jun. 2009.
[8] M. Godavarti and A. Hero, "Convergence of differential entropies," IEEE Trans. Inf. Theory, vol. 50, no. 1, pp. 171–176, 2004.
[9] I. Sason, "On reverse Pinsker inequalities," arXiv preprint arXiv:1503.07118, 2015.
[10] O. Johnson, Information Theory and the Central Limit Theorem. London: Imperial College Press, 2004.
[11] S. Ikeda, "Continuity and characterization of Shannon-Wiener information measure for continuous probability distributions," Annals of the Institute of Statistical Mathematics, vol. 11, pp. 131–144, 1959.
[12] A. Otáhal, Finiteness and Continuity of Differential Entropy. Heidelberg: Physica-Verlag HD, 1994, pp. 415–419.
[13] C. Nair, B. Prabhakar, and D. Shah, "On entropy for mixtures of discrete and continuous variables," arXiv preprint cs/0607075, 2006.
[14] I. R. Goodman and S. Kotz, "Multivariate θ-generalized normal distributions," Journal of Multivariate Analysis, vol. 3, no. 2, pp. 204–219, 1973.
APPENDIX

Lemma 1. Let X, Y be distributed according to p_X, p_Y ∈ (α, v, m)–AC^n, respectively, and let X′ = (me)^{1/n} X, Y′ = (me)^{1/n} Y be scaled versions of X and Y, respectively. Then,

    p_{X′}, p_{Y′} ∈ (α, (me)^{α/n} v, 1/e)–AC^n,    (54)

    ‖p_X − p_Y‖_1 = ‖p_{X′} − p_{Y′}‖_1,    (55)

    h(X) − h(Y) = h(X′) − h(Y′).    (56)

Proof: For (54), note that

    p_{X′}(x) = (1/(me)) p_X( x / (me)^{1/n} ).    (57)

Since p_X ∈ (α, v, m)–AC^n, then p_{X′}(x) ≤ 1/e a.e. Furthermore, we have E[ ‖X′‖_α^α ] = (me)^{α/n} E[ ‖X‖_α^α ] ≤ (me)^{α/n} v. Equation (55) is immediate from the definition of the total variation distance, and (56) holds because scaling a random vector by (me)^{1/n} increases its entropy by n · (1/n) log(me) = log(me), which cancels in the difference.

Lemma 2. [2, Theorem 17.3.3] For every x, y ∈ [0, 1/e] we have that

    |x log x − y log y| ≤ |x − y| log( 1/|x − y| ).    (58)
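Lemma 2 is elementary but easy to misstate; the following grid check (ours) confirms (58) over [0, 1/e], with t log t set to 0 at t = 0 per the convention of Definition 2.

```python
import numpy as np

# Grid check of (58): |x log x - y log y| <= |x - y| log(1/|x - y|)
# for all x, y in [0, 1/e].  Inner np.where guards avoid log(0).

f = lambda t: np.where(t > 0, t * np.log(np.where(t > 0, t, 1.0)), 0.0)

ts = np.linspace(0.0, 1 / np.e, 401)
x, y = np.meshgrid(ts, ts)
d = np.abs(x - y)
lhs = np.abs(f(x) - f(y))
rhs = np.where(d > 0, d * np.log(np.where(d > 0, 1 / d, 1.0)), 0.0)
print("violations:", np.count_nonzero(lhs > rhs + 1e-12))   # -> 0
print("max slack :", float(np.max(rhs - lhs)))
```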