Rényi Measures for Commonly Used Univariate Continuous Distributions

M. Gil¹,*, F. Alajaji, T. Linder
Department of Mathematics and Statistics, Queen's University, Kingston, ON, K7L 3N6

Abstract

Probabilistic 'distances' (also called divergences), which in some sense assess how 'close' two probability distributions are to one another, have been widely employed in probability, statistics, information theory, and related fields. Of particular importance due to their generality and applicability are the Rényi divergence measures. This paper presents closed-form expressions for the Rényi and Kullback-Leibler divergences for nineteen commonly used univariate continuous distributions, as well as those for multivariate Gaussian and Dirichlet distributions. In addition, a table summarizing four of the most important information measure rates for zero-mean stationary Gaussian processes, namely Rényi entropy, differential Shannon entropy, Rényi divergence, and Kullback-Leibler divergence, is presented. Lastly, a connection between the Rényi divergence and the variance of the log-likelihood ratio of two distributions is established, thereby extending a previous result by Song [J. Stat. Plan. Infer. 93 (2001)] on the relation between Rényi entropy and the log-likelihood function. A table with the corresponding variance expressions for the univariate distributions considered here is also included.

Keywords: Rényi divergence, Rényi divergence rate, Kullback divergence, probabilistic distances, divergences, continuous distributions, log-likelihood ratio.

1. Introduction

In his 1961 work [27], Rényi introduced generalized information and divergence measures which naturally extend Shannon entropy [29] and Kullback-Leibler divergence (KLD) [19]. For a probability density f on R^n, the Rényi entropy of order α is defined via h_α(f) := (1 − α)^{−1} ln ∫ f(x)^α dx for α > 0 and α ≠ 1.

*Corresponding author. Email addresses: [email protected] (M. Gil), [email protected] (F. Alajaji), [email protected] (T. Linder).
¹Present address: McMaster University, Department of Engineering Physics, 1280 Main Street West, John Hodgins Engineering Building, Room A315, Hamilton, Ontario, Canada L8S 4L7. Phone: (905) 525-9140 ext. 24545; Fax: (905) 527-8409.

Under appropriate conditions, in the limit as α → 1, h_α(f) converges to the differential Shannon entropy h(f) := −∫ f(x) ln f(x) dx = −E_f[ln f]. If g is another probability density on R^n, the Rényi divergence of order α between f and g is given by D_α(f||g) := (α − 1)^{−1} ln ∫ f(x)^α g(x)^{1−α} dx for α > 0 and α ≠ 1. Under appropriate conditions, in the limit α → 1, D_α(f||g) converges to the KLD between f and g, D(f||g) := ∫ f(x) ln(f(x)/g(x)) dx = E_f[ln(f/g)].

While a significant number of other divergence measures have since been introduced [1, 3, 9], Rényi divergences are especially important because of their generality, applicability, and the fact that they possess operational definitions in the context of hypothesis testing [7, 2] as well as coding [14]. Additional applications of Rényi divergences include the derivation of a family of test statistics for the hypothesis that the coefficients of variation of k normal populations are equal [23], their use in problems of classification, indexing, and retrieval, for example [15], and test statistics for partially observed diffusion processes [28], to name a few.

The ubiquity of Rényi divergences suggests the importance of establishing their general mathematical properties as well as having a compilation of readily available analytical expressions for commonly used distributions. The mathematical properties of the Rényi information measures have been studied both directly, e.g., in [27, 32, 13], and indirectly through their relation to f-divergences [6, 21]. On the other hand, a compilation of important Rényi divergence and KLD expressions does not seem to be currently available in the literature. Some isolated results can be found in separate works, e.g.: Rényi divergence for Dirichlet distributions (via the Chernoff distance expression) [26]; Rényi divergence for two multivariate Gaussian distributions [16, 15], a result that can be traced back to [5] and [31]; Rényi divergence and KLD for a special case of univariate Pareto distributions [4]; and KLD for multivariate normal and univariate Gamma [25, 24], as well as Dirichlet and Wishart [24].

The need for an analogous compilation for Rényi and Shannon entropies has already been addressed in the literature. Closed-form expressions for differential Shannon and Rényi entropies for several univariate continuous distributions are presented in the work by Song [30]. The author also introduces an 'intrinsic loglikelihood-based distribution measure,' G_f, derived from the Rényi entropy. Song's work was followed by [22], where the differential Shannon and Rényi entropy, as well as Song's intrinsic measure, for an additional 26 continuous univariate distribution families are presented. The same authors then expanded these results to several multivariate families in [34].

The main objective of this work is to extend these compilations to include expressions for the KLD and Rényi divergence of commonly used univariate continuous distributions. Since most applications involve two distributions within the same family, this was the focus of the calculations as well. The complete derivations can be found in [11], where it is also shown that the independently derived expressions are in agreement with the aforementioned existing results in the literature.

We present tables containing these expressions for nineteen univariate continuous distributions, as well as those for Dirichlet and multivariate Gaussian distributions, in Section 2.2. In Section 2.3 we provide a table summarizing the expressions for the Rényi entropy, Shannon entropy, Rényi divergence, and KLD rates for zero-mean stationary Gaussian sources. Finally, in Section 3, we establish a relationship between the Rényi divergence and the variance of the log-likelihood ratio and present a table of the corresponding variance expressions for the various univariate distributions considered in this paper.
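To make the definitions above concrete, the following minimal Python sketch (not part of the original paper; the exponential rate parameters are arbitrary illustrative choices) evaluates D_α(f||g) by direct numerical integration of the defining integral for two exponential densities and shows it approaching the KLD as α → 1.

```python
# Minimal numerical illustration (not from the paper): the Renyi divergence
# D_alpha(f||g) defined above, evaluated by numerical integration for two
# exponential densities, approaches the Kullback-Leibler divergence as
# alpha -> 1.  The rates lam_f, lam_g are illustrative values.
import numpy as np
from scipy.integrate import quad

lam_f, lam_g = 2.0, 0.5          # assumed example parameters

def f(x): return lam_f * np.exp(-lam_f * x)
def g(x): return lam_g * np.exp(-lam_g * x)

def renyi_divergence(alpha):
    # D_alpha(f||g) = (alpha - 1)^{-1} ln \int f^alpha g^{1-alpha} dx
    integral, _ = quad(lambda x: f(x)**alpha * g(x)**(1 - alpha), 0, np.inf)
    return np.log(integral) / (alpha - 1)

def kl_divergence():
    # D(f||g) = \int f ln(f/g) dx
    integral, _ = quad(lambda x: f(x) * np.log(f(x) / g(x)), 0, np.inf)
    return integral

for alpha in (0.5, 0.9, 0.99, 0.999):
    print(f"alpha = {alpha:6.3f}   D_alpha = {renyi_divergence(alpha):.6f}")
print(f"KLD (alpha -> 1 limit)    = {kl_divergence():.6f}")
```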

2. Closed-Form Expressions for Rényi and Kullback Divergences of Continuous Distributions

We consider commonly used families of univariate continuous distributions and present Rényi and Kullback divergence expressions between two members of a given family. While all the expressions given in this section have been independently derived in [11], we note that in the cases of distributions belonging to exponential families, the results can be computed via a closed-form formula originally obtained by Liese and Vajda [20], presented below. This result seems to be largely unknown in the literature. For example, the works [26, 4, 25, 24] do not reference it, while similar work by Vajda and Darbellay [8] on differential entropy for exponential families is cited in some of the works compiling the corresponding expressions, such as [22, 34], and even in the entropy section of [4]. Some expressions for Rényi divergences and/or Kullback divergences for cases where the formula by Liese and Vajda is not applicable are also derived in [11], namely the Rényi and Kullback divergence for general univariate Laplacian, general univariate Pareto, Cramér, and uniform distributions, as well as the KLD for general univariate Gumbel and Weibull densities. The integration calculations are carried out using standard techniques such as substitution and integration by parts, reparametrizations that express the integrand as a known probability density scaled by some factor, and integral representations of special functions, in particular the Gamma and related functions.

2.1. Rényi Divergence for Natural Exponential Families

In Chapter 2 of their 1987 book Convex Statistical Distances [20], Liese and Vajda derive a closed-form expression for the Rényi divergence between two members of a canonical exponential family. Note that their definition of Rényi divergence, here denoted by R_α(f_i||f_j), differs by a factor of α from the one considered in this work; that is, D_α(f_i||f_j) = αR_α(f_i||f_j). Also, the authors allow the parameter α to be any real number, as opposed to the more standard definitions where α > 0. Consider a natural exponential family of probability measures P_τ on R^n having densities p_τ(x) = [C(τ)]^{−1} exp⟨τ, T(x)⟩, where T : R^n → R^k, τ is a k-dimensional parameter vector, and ⟨·,·⟩ denotes the standard inner product in R^k. Denote the natural parameter space by Θ. Then R_α(f_i||f_j) is given by the following cases (a small numerical illustration is sketched after the case list):

• If α ∉ {0, 1} and ατ_i + (1 − α)τ_j ∈ Θ,

R_α(f_i||f_j) = [α(α − 1)]^{−1} ln [ C(ατ_i + (1 − α)τ_j) / (C(τ_i)^α C(τ_j)^{1−α}) ];

• If α ∉ {0, 1} and ατ_i + (1 − α)τ_j ∉ Θ,

R_α(f_i||f_j) = +∞;

• If α = 0,

R_α(f_i||f_j) = ∆(τ_i, τ_j);

• If α = 1,

R_α(f_i||f_j) = ∆(τ_j, τ_i);

with ∆(τ_i, τ_j) := lim_{α↓0} α^{−1}[ αD(τ_i) + (1 − α)D(τ_j) − D(ατ_i + (1 − α)τ_j) ] and D(τ) := ln C(τ).
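As a small illustration of how the formula above can be used (this sketch is ours, not taken from [20] or [11]), the exponential density λe^{−λx} on x > 0 can be written in natural form with τ = −λ, T(x) = x, and C(τ) = −1/τ on Θ = (−∞, 0). The snippet below evaluates αR_α(f_i||f_j) from the closed form and compares it with D_α obtained by direct numerical integration; the parameter values are arbitrary.

```python
# Sketch (not from the references): the Liese-Vajda closed form above, applied
# to the exponential family written in natural form p_tau(x) = C(tau)^{-1} exp(tau*x)
# on x > 0, with tau = -lambda and C(tau) = -1/tau.  The value alpha * R_alpha is
# compared against D_alpha obtained by direct numerical integration.
import numpy as np
from scipy.integrate import quad

tau_i, tau_j = -2.0, -0.5        # natural parameters (tau = -lambda), illustrative values
C = lambda tau: -1.0 / tau       # normalizing constant of the natural exponential family

def R_alpha(alpha):
    tau_mix = alpha * tau_i + (1 - alpha) * tau_j
    assert tau_mix < 0, "alpha*tau_i + (1-alpha)*tau_j must stay in the natural parameter space"
    return np.log(C(tau_mix) / (C(tau_i)**alpha * C(tau_j)**(1 - alpha))) / (alpha * (alpha - 1))

def D_alpha_numeric(alpha):
    fi = lambda x: np.exp(tau_i * x) / C(tau_i)
    fj = lambda x: np.exp(tau_j * x) / C(tau_j)
    integral, _ = quad(lambda x: fi(x)**alpha * fj(x)**(1 - alpha), 0, np.inf)
    return np.log(integral) / (alpha - 1)

for alpha in (0.3, 0.7, 1.5):
    print(alpha, alpha * R_alpha(alpha), D_alpha_numeric(alpha))   # the two values should agree
```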

2.2. Tables

The results presented in the following tables are derived in [11], except for the derivation of the KLD for Dirichlet distributions, which can be found in [24]. A few of these results can also be found in the references mentioned in Section 1. Table 2 and Table 3 present the expressions for Rényi and Kullback divergences, respectively. We follow the convention 0 ln 0 = 0, which is justified by continuity. The densities associated with the distributions are given in Table 1. The table of Rényi divergences includes, for each entry, a finiteness condition under which the given expression is valid. For all other cases (and α > 0), D_α(f_i||f_j) = ∞. In the cases where the closed-form expression is a piecewise-defined function, the conditions for each case are presented alongside the corresponding formula, and it is implied that for all other cases D_α(f_i||f_j) = ∞. The expressions for the Rényi divergence of the Laplace and Cramér distributions are still continuous at α = λ_i/(λ_i + λ_j) (where λ is the scale parameter of the Laplace distribution) and at α = 1/2, respectively; this can easily be verified using l'Hospital's rule.

One important property of the Rényi divergence is that D_α(T(X)||T(Y)) = D_α(X||Y) for any invertible transformation T. This follows from the more general data processing inequality (see, e.g., [20, 21]). For example, the Rényi divergence between two lognormal densities is the same as that between two normal densities; a small numerical illustration of this invariance is sketched below. Also, applying the appropriate T and reparametrizing, one can find similar relationships between other distributions, such as between Weibull and Gumbel, Gamma and Chi (or the relevant special cases), and the half-normal and Lévy under equal supports.

Note that Γ(x), ψ(x), and B(x) denote the Gamma, digamma, and multivariate Beta functions, respectively. Also, γ ≈ 0.5772 is the Euler-Mascheroni constant. Lastly, note the direction of the divergence: i ↦ j.
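The invariance property can be checked numerically. The short sketch below (ours, with illustrative parameter values) computes the Rényi divergence between two log-normal densities by numerical integration and compares it with the univariate Gaussian closed form of Table 2 for the corresponding normal densities.

```python
# Illustrative check (parameter values are assumptions) of the invariance
# D_alpha(T(X)||T(Y)) = D_alpha(X||Y) for the invertible map T(x) = exp(x):
# the Renyi divergence between two log-normal densities, computed by numerical
# integration, matches the closed form for the underlying normal densities.
import numpy as np
from scipy.integrate import quad

mu_i, s_i = 0.3, 1.0
mu_j, s_j = -0.2, 1.5
alpha = 0.7

def normal_renyi(alpha):
    # univariate Gaussian entry of Table 2
    var_star = alpha * s_j**2 + (1 - alpha) * s_i**2
    return (np.log(s_j / s_i) + np.log(s_j**2 / var_star) / (2 * (alpha - 1))
            + alpha * (mu_i - mu_j)**2 / (2 * var_star))

def lognormal_pdf(x, mu, s):
    return np.exp(-(np.log(x) - mu)**2 / (2 * s**2)) / (x * s * np.sqrt(2 * np.pi))

def lognormal_renyi_numeric(alpha):
    integrand = lambda x: lognormal_pdf(x, mu_i, s_i)**alpha * lognormal_pdf(x, mu_j, s_j)**(1 - alpha)
    integral, _ = quad(integrand, 0, np.inf)
    return np.log(integral) / (alpha - 1)

print(normal_renyi(alpha), lognormal_renyi_numeric(alpha))   # the two values should agree
```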

Table 1: Common Continuous Distributions (name: density; parameter restrictions; support)

Beta: f(x) = x^{a−1}(1 − x)^{b−1} / B(a, b);  a, b > 0;  x ∈ (0, 1)
Chi: f(x) = 2^{1−k/2} x^{k−1} e^{−x²/(2σ²)} / (σ^k Γ(k/2));  σ > 0, k ∈ N;  x > 0
χ²: f(x) = x^{d/2−1} e^{−x/2} / (2^{d/2} Γ(d/2));  d ∈ N;  x > 0
Cramér: f(x) = θ / (2(1 + θ|x|)²);  θ > 0;  x ∈ R
Dirichlet: f(x) = (1/B(a)) ∏_{k=1}^{d} x_k^{a_k−1};  a ∈ R^d, a_k > 0, d ≥ 2;  x ∈ R^d with Σ_k x_k = 1
Exponential: f(x) = λ e^{−λx};  λ > 0;  x > 0
Gamma: f(x) = x^{k−1} e^{−x/θ} / (θ^k Γ(k));  θ > 0, k > 0;  x > 0
Multivariate Gaussian: f(x) = exp(−(1/2)(x − μ)'Σ^{−1}(x − μ)) / ((2π)^{n/2}|Σ|^{1/2});  μ ∈ R^n, Σ symmetric positive definite;  x ∈ R^n
Univariate Gaussian: f(x) = e^{−(x−μ)²/(2σ²)} / √(2πσ²);  σ > 0, μ ∈ R;  x ∈ R
Special Bivariate Gaussian: f(x) = exp(−(1/2) x'Φ^{−1}x) / (2π(1 − ρ²)^{1/2}), with Φ = [1 ρ; ρ 1];  ρ ∈ (−1, 1);  x ∈ R²
Gumbel: f(x) = (1/β) e^{−(x−μ)/β} exp(−e^{−(x−μ)/β});  μ ∈ R, β > 0;  x ∈ R
Half-Normal: f(x) = √(2/(πσ²)) e^{−x²/(2σ²)};  σ > 0;  x > 0
Inverse Gaussian: f(x) = (λ/(2πx³))^{1/2} exp(−λ(x − μ)²/(2μ²x));  λ > 0, μ > 0;  x > 0
Laplace: f(x) = (1/(2λ)) e^{−|x−θ|/λ};  λ > 0, θ ∈ R;  x ∈ R
Lévy: f(x) = √(c/(2π)) e^{−c/(2(x−μ))} / (x − μ)^{3/2};  c > 0, μ ∈ R;  x > μ
Log-normal: f(x) = (1/(xσ√(2π))) e^{−(ln x − μ)²/(2σ²)};  σ > 0, μ ∈ R;  x > 0
Maxwell-Boltzmann: f(x) = √(2/π) x² e^{−x²/(2σ²)} / σ³;  σ > 0;  x > 0
Pareto: f(x) = a m^a x^{−(a+1)};  a, m > 0;  x > m
Rayleigh: f(x) = (x/σ²) e^{−x²/(2σ²)};  σ > 0;  x > 0
Uniform: f(x) = 1/(b − a);  a < b;  a < x < b
Weibull: f(x) = k λ^{−k} x^{k−1} e^{−(x/λ)^k};  k, λ > 0;  x > 0

Table 2: Rényi Divergences for Common Continuous Distributions (direction i ↦ j)

Beta:
D_α(f_i||f_j) = ln[ B(a_j, b_j)/B(a_i, b_i) ] + (1/(α − 1)) ln[ B(a_α, b_α)/B(a_i, b_i) ],
where a_α = αa_i + (1 − α)a_j and b_α = αb_i + (1 − α)b_j.  Finiteness: a_α > 0 and b_α > 0.

Chi:
D_α(f_i||f_j) = ln[ σ_j^{k_j} Γ(k_j/2) / (σ_i^{k_i} Γ(k_i/2)) ] + (1/(α − 1)) ln[ (Γ(k_α/2)/(σ_i^{k_i} Γ(k_i/2))) (σ_i²σ_j²/(σ²)*_α)^{k_α/2} ],
where (σ²)*_α = ασ_j² + (1 − α)σ_i² and k_α = αk_i + (1 − α)k_j.  Finiteness: (σ²)*_α > 0 and k_α > 0.

χ²:
D_α(f_i||f_j) = ln[ Γ(d_j/2)/Γ(d_i/2) ] + (1/(α − 1)) ln[ Γ(d_α/2)/Γ(d_i/2) ],
where d_α = αd_i + (1 − α)d_j.  Finiteness: d_α > 0.

Cramér:
For α = 1/2:
D_α(f_i||f_j) = ln(θ_i/θ_j) + 2 ln[ (θ_i − θ_j) / (θ_i ln(θ_i/θ_j)) ].
For α ≠ 1/2:
D_α(f_i||f_j) = ln(θ_i/θ_j) + (1/(α − 1)) ln[ θ_i (1 − (θ_j/θ_i)^{2α−1}) / ((θ_i − θ_j)(2α − 1)) ].

Dirichlet:
D_α(f_i||f_j) = ln[ B(a_j)/B(a_i) ] + (1/(α − 1)) ln[ B(a_α)/B(a_i) ],
where a_α = αa_i + (1 − α)a_j (componentwise).  Finiteness: a_{α,k} > 0 for all k.

Exponential:
D_α(f_i||f_j) = ln(λ_i/λ_j) + (1/(α − 1)) ln(λ_i/λ_α),
where λ_α = αλ_i + (1 − α)λ_j.  Finiteness: λ_α > 0.

Gamma:
D_α(f_i||f_j) = ln[ Γ(k_j)θ_j^{k_j} / (Γ(k_i)θ_i^{k_i}) ] + (1/(α − 1)) ln[ (Γ(k_α)/(Γ(k_i)θ_i^{k_i})) (θ_iθ_j/θ*_α)^{k_α} ],
where θ*_α = αθ_j + (1 − α)θ_i and k_α = αk_i + (1 − α)k_j.  Finiteness: θ*_α > 0 and k_α > 0.

Multivariate Gaussian:
D_α(f_i||f_j) = (α/2)(μ_i − μ_j)'(Σ*_α)^{−1}(μ_i − μ_j) − (1/(2(α − 1))) ln[ |Σ*_α| / (|Σ_i|^{1−α}|Σ_j|^{α}) ],
where Σ*_α = αΣ_j + (1 − α)Σ_i.  Finiteness: αΣ_i^{−1} + (1 − α)Σ_j^{−1} positive definite.

Univariate Gaussian:
D_α(f_i||f_j) = ln(σ_j/σ_i) + (1/(2(α − 1))) ln[ σ_j²/(σ²)*_α ] + α(μ_i − μ_j)²/(2(σ²)*_α),
where (σ²)*_α = ασ_j² + (1 − α)σ_i².  Finiteness: (σ²)*_α > 0.

Special Bivariate Gaussian:
D_α(f_i||f_j) = (1/2) ln[ (1 − ρ_j²)/(1 − ρ_i²) ] − (1/(2(α − 1))) ln[ (1 − (ρ*_α)²)/(1 − ρ_j²) ],
where ρ*_α = αρ_j + (1 − α)ρ_i.  Finiteness: αΦ_i^{−1} + (1 − α)Φ_j^{−1} positive definite.

Gumbel, fixed scale (β_i = β_j = β):
D_α(f_i||f_j) = (μ_i − μ_j)/β + (1/(α − 1)) ln[ e^{μ_i/β} / (e^{μ/β})_α ],
where (e^{μ/β})_α = αe^{μ_i/β} + (1 − α)e^{μ_j/β}.  Finiteness: (e^{μ/β})_α > 0.

Half-Normal:
D_α(f_i||f_j) = ln(σ_j/σ_i) + (1/(α − 1)) ln[ (σ_j²/(σ²)*_α)^{1/2} ],
where (σ²)*_α = ασ_j² + (1 − α)σ_i².  Finiteness: (σ²)*_α > 0.

Inverse Gaussian:
D_α(f_i||f_j) = (1/2) ln(λ_i/λ_j) + (1/(2(α − 1))) ln(λ_i/λ_α) + (1/(α − 1)) [ (λ/μ)_α − ((λ/μ²)_α λ_α)^{1/2} ],
where (λ/μ²)_α = αλ_i/μ_i² + (1 − α)λ_j/μ_j², λ_α = αλ_i + (1 − α)λ_j, and (λ/μ)_α = αλ_i/μ_i + (1 − α)λ_j/μ_j.
Finiteness: (λ/μ²)_α ≥ 0 and λ_α > 0.

Laplace:
For α = λ_i/(λ_i + λ_j):
D_α(f_i||f_j) = ln(λ_j/λ_i) + |θ_i − θ_j|/λ_j + ((λ_i + λ_j)/λ_j) ln[ 2λ_i / (λ_i + λ_j + |θ_i − θ_j|) ].
For α ≠ λ_i/(λ_i + λ_j) and αλ_j + (1 − α)λ_i > 0:
D_α(f_i||f_j) = ln(λ_j/λ_i) + (1/(α − 1)) ln[ λ_iλ_j² g(α) / (α²λ_j² − (1 − α)²λ_i²) ],
where g(α) = (α/λ_i) exp(−(1 − α)|θ_i − θ_j|/λ_j) − ((1 − α)/λ_j) exp(−α|θ_i − θ_j|/λ_i).

Lévy, equal supports (μ_i = μ_j):
D_α(f_i||f_j) = (1/2) ln(c_i/c_j) + (1/(2(α − 1))) ln(c_i/c_α),
where c_α = αc_i + (1 − α)c_j.  Finiteness: c_α > 0.

Log-normal:
D_α(f_i||f_j) = ln(σ_j/σ_i) + (1/(2(α − 1))) ln[ σ_j²/(σ²)*_α ] + α(μ_i − μ_j)²/(2(σ²)*_α),
where (σ²)*_α = ασ_j² + (1 − α)σ_i².  Finiteness: (σ²)*_α > 0.

Maxwell-Boltzmann:
D_α(f_i||f_j) = 3 ln(σ_j/σ_i) + (1/(α − 1)) ln[ (σ_j²/(σ²)*_α)^{3/2} ],
where (σ²)*_α = ασ_j² + (1 − α)σ_i².  Finiteness: (σ²)*_α > 0.

Pareto:
For α ∈ (0, 1):
D_α(f_i||f_j) = ln(a_i/a_j) + ln(m_i^{a_i}/m_j^{a_j}) + (1/(α − 1)) ln[ a_i m_i^{a_i} / (a_α M^{a_α}) ],
where a_α = αa_i + (1 − α)a_j and M = max{m_i, m_j}.
For α > 1, m_i ≥ m_j, and a_α = αa_i + (1 − α)a_j > 0:
D_α(f_i||f_j) = ln(a_i/a_j) + a_j ln(m_i/m_j) + (1/(α − 1)) ln(a_i/a_α).

Rayleigh:
D_α(f_i||f_j) = 2 ln(σ_j/σ_i) + (1/(α − 1)) ln[ σ_j²/(σ²)*_α ],
where (σ²)*_α = ασ_j² + (1 − α)σ_i².  Finiteness: (σ²)*_α > 0.

Uniform:
For α ∈ (0, 1) and b_m := min{b_i, b_j} > a_M := max{a_i, a_j}:
D_α(f_i||f_j) = ln[ (b_j − a_j)/(b_i − a_i) ] + (1/(α − 1)) ln[ (b_m − a_M)/(b_i − a_i) ].
For α > 1 and (a_i, b_i) ⊆ (a_j, b_j):
D_α(f_i||f_j) = ln[ (b_j − a_j)/(b_i − a_i) ].

Weibull, fixed shape (k_i = k_j = k):
D_α(f_i||f_j) = k ln(λ_j/λ_i) + (1/(α − 1)) ln[ λ_j^k / (λ^k)*_α ],
where (λ^k)*_α = αλ_j^k + (1 − α)λ_i^k.  Finiteness: (λ^k)*_α > 0.
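As a spot-check of one of the less obvious entries, the following sketch (ours; parameter values are illustrative) compares the Laplace expression of Table 2 with a direct numerical evaluation of the defining integral.

```python
# Numerical spot-check (not part of the paper) of the Laplace entry of Table 2
# for alpha != lambda_i/(lambda_i + lambda_j); parameter values are illustrative.
import numpy as np
from scipy.integrate import quad

th_i, lam_i = 0.0, 1.0
th_j, lam_j = 2.0, 3.0
alpha = 0.6

def laplace_pdf(x, th, lam):
    return np.exp(-abs(x - th) / lam) / (2 * lam)

def renyi_table(alpha):
    d = abs(th_i - th_j)
    g = (alpha / lam_i) * np.exp(-(1 - alpha) * d / lam_j) \
        - ((1 - alpha) / lam_j) * np.exp(-alpha * d / lam_i)
    ratio = lam_i * lam_j**2 * g / (alpha**2 * lam_j**2 - (1 - alpha)**2 * lam_i**2)
    return np.log(lam_j / lam_i) + np.log(ratio) / (alpha - 1)

def renyi_numeric(alpha):
    integrand = lambda x: laplace_pdf(x, th_i, lam_i)**alpha * laplace_pdf(x, th_j, lam_j)**(1 - alpha)
    a, b = sorted([th_i, th_j])          # split at the kinks of the two densities
    integral = sum(quad(integrand, lo, hi)[0]
                   for lo, hi in [(-np.inf, a), (a, b), (b, np.inf)])
    return np.log(integral) / (alpha - 1)

print(renyi_table(alpha), renyi_numeric(alpha))   # the two values should coincide
```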

Table 3: Kullback Divergences for Common Continuous Distributions (direction i ↦ j)

Beta:
D(f_i||f_j) = ln[ B(a_j, b_j)/B(a_i, b_i) ] + (a_i − a_j)ψ(a_i) + (b_i − b_j)ψ(b_i) + [a_j + b_j − (a_i + b_i)]ψ(a_i + b_i).

Chi:
D(f_i||f_j) = (ψ(k_i/2)/2)(k_i − k_j) + ln[ (σ_j/σ_i)^{k_j} Γ(k_j/2)/Γ(k_i/2) ] + (k_i/(2σ_j²))(σ_i² − σ_j²).

χ²:
D(f_i||f_j) = ln[ Γ(d_j/2)/Γ(d_i/2) ] + ((d_i − d_j)/2) ψ(d_i/2).

Cramér:
D(f_i||f_j) = ((θ_i + θ_j)/(θ_i − θ_j)) ln(θ_i/θ_j) − 2.

Dirichlet:
D(f_i||f_j) = ln[ B(a_j)/B(a_i) ] + Σ_{k=1}^{d} (a_{i,k} − a_{j,k}) [ ψ(a_{i,k}) − ψ(Σ_{l=1}^{d} a_{i,l}) ].

Exponential:
D(f_i||f_j) = ln(λ_i/λ_j) + (λ_j − λ_i)/λ_i.

Gamma:
D(f_i||f_j) = k_i (θ_i − θ_j)/θ_j + ln[ Γ(k_j)θ_j^{k_j} / (Γ(k_i)θ_i^{k_i}) ] + (k_i − k_j)(ln θ_i + ψ(k_i)).

Multivariate Gaussian:
D(f_i||f_j) = (1/2) ln(|Σ_j|/|Σ_i|) + (1/2) [ tr(Σ_j^{−1}Σ_i) + (μ_i − μ_j)'Σ_j^{−1}(μ_i − μ_j) − n ].

Univariate Gaussian:
D(f_i||f_j) = [ (μ_i − μ_j)² + σ_i² − σ_j² ] / (2σ_j²) + ln(σ_j/σ_i).

Special Bivariate Gaussian:
D(f_i||f_j) = (1/2) ln[ (1 − ρ_j²)/(1 − ρ_i²) ] + (ρ_j² − ρ_jρ_i)/(1 − ρ_j²).

General Gumbel:
D(f_i||f_j) = ln(β_j/β_i) + γ(β_i/β_j − 1) + (μ_i − μ_j)/β_j + e^{(μ_j − μ_i)/β_j} Γ(β_i/β_j + 1) − 1.

Half-Normal:
D(f_i||f_j) = ln(σ_j/σ_i) + (σ_i² − σ_j²)/(2σ_j²).

Inverse Gaussian:
D(f_i||f_j) = (1/2) [ ln(λ_i/λ_j) + λ_j/λ_i − 1 + λ_j(μ_i − μ_j)²/(μ_iμ_j²) ].

Laplace:
D(f_i||f_j) = ln(λ_j/λ_i) + |θ_i − θ_j|/λ_j + (λ_i/λ_j) exp(−|θ_i − θ_j|/λ_i) − 1.

Lévy, equal supports (μ_i = μ_j):
D(f_i||f_j) = (1/2) ln(c_i/c_j) + (c_j − c_i)/(2c_i).

Log-normal:
D(f_i||f_j) = [ (μ_i − μ_j)² + σ_i² − σ_j² ] / (2σ_j²) + ln(σ_j/σ_i).

Maxwell-Boltzmann:
D(f_i||f_j) = 3 ln(σ_j/σ_i) + 3(σ_i² − σ_j²)/(2σ_j²).

Pareto:
D(f_i||f_j) = a_j ln(m_i/m_j) + ln(a_i/a_j) + (a_j − a_i)/a_i, for m_i ≥ m_j, and ∞ otherwise.

Rayleigh:
D(f_i||f_j) = 2 ln(σ_j/σ_i) + (σ_i² − σ_j²)/σ_j².

Uniform:
D(f_i||f_j) = ln[ (b_j − a_j)/(b_i − a_i) ], for (a_i, b_i) ⊆ (a_j, b_j), and ∞ otherwise.

General Weibull:
D(f_i||f_j) = ln[ (k_i/k_j)(λ_j/λ_i)^{k_j} ] + γ(k_j − k_i)/k_i + (λ_i/λ_j)^{k_j} Γ(1 + k_j/k_i) − 1.
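The Kullback divergence expressions can be checked in the same way. The sketch below (ours; parameter values are illustrative) compares the general Weibull entry of Table 3 with a numerical evaluation of E_{f_i}[ln(f_i/f_j)].

```python
# Numerical spot-check (not part of the paper) of the general Weibull entry of
# Table 3 against direct numerical integration of E_{f_i}[ln(f_i/f_j)];
# the parameter values are illustrative.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

k_i, lam_i = 1.5, 2.0
k_j, lam_j = 2.5, 1.0

def weibull_pdf(x, k, lam):
    return (k / lam**k) * x**(k - 1) * np.exp(-(x / lam)**k)

def kld_table():
    return (np.log((k_i / k_j) * (lam_j / lam_i)**k_j)
            + np.euler_gamma * (k_j - k_i) / k_i
            + (lam_i / lam_j)**k_j * gamma(1 + k_j / k_i) - 1)

def kld_numeric():
    integrand = lambda x: weibull_pdf(x, k_i, lam_i) * np.log(
        weibull_pdf(x, k_i, lam_i) / weibull_pdf(x, k_j, lam_j))
    return quad(integrand, 0, np.inf)[0]

print(kld_table(), kld_numeric())   # should agree to integration accuracy
```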

2.3. Information Rates for Stationary Gaussian Processes

It is also of interest to extend divergence measures to stochastic processes, in which case one considers the rate of the given divergence measure. Given two real-valued processes X = {X_n}_{n∈N} and Y = {Y_n}_{n∈N}, the Rényi divergence rate between X and Y is D_α(X||Y) := lim_{n→∞} (1/n) D_α(f_{X^n}||f_{Y^n}), whenever the limit exists, where f_{X^n} and f_{Y^n} are the densities of X^n = (X_1, ..., X_n) and Y^n = (Y_1, ..., Y_n), respectively. The KLD, Rényi entropy, and differential Shannon entropy rates are defined similarly. In Table 4 we summarize these expressions for stationary zero-mean Gaussian processes, with the appropriate references. Note that ϕ(λ) and ψ(λ) are the power spectral densities of X and Y on [−π, π], respectively.

Table 4: Information Rates for Stationary Zero-Mean Gaussian Processes

Differential entropy rate [18, 17]:
(1/2) ln(2πe) + (1/(4π)) ∫_{−π}^{π} ln ϕ(λ) dλ.
Conditions: ln ϕ(λ) ∈ L¹[−π, π]; also valid for nonzero-mean stationary Gaussian processes.

Rényi entropy rate [12]:
(1/2) ln(2π α^{1/(α−1)}) + (1/(4π)) ∫_{−π}^{π} ln ϕ(λ) dλ.
Conditions: ln ϕ(λ) ∈ L¹[−π, π]; also valid for nonzero-mean stationary Gaussian processes.

KLD rate D(X||Y) [17]:
(1/(4π)) ∫_{−π}^{π} [ ϕ(λ)/ψ(λ) − 1 − ln(ϕ(λ)/ψ(λ)) ] dλ.
Conditions: ϕ(λ)/ψ(λ) is bounded, or ψ(λ) > a > 0 for all λ ∈ [−π, π] and ϕ ∈ L²[−π, π].

Rényi divergence rate D_α(X||Y) [31]:
(1/(4π(1 − α))) ∫_{−π}^{π} ln[ h(λ) / (ϕ(λ)^{1−α} ψ(λ)^{α}) ] dλ,  where h(λ) := αψ(λ) + (1 − α)ϕ(λ).
Conditions: ψ(λ) and ϕ(λ) are essentially bounded and essentially bounded away from zero, and α ∈ (0, 1).
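For a concrete instance of Table 4, consider two zero-mean stationary Gaussian AR(1) processes X_n = aX_{n−1} + W_n with innovation variance σ_w², whose power spectral density is σ_w²/|1 − a e^{−iλ}|². The sketch below (ours; the AR(1) coefficients and variances are illustrative) evaluates the differential entropy rate and the KLD rate by numerical integration; for the entropy rate, the result can be compared with the classical Kolmogorov-Szegő value (1/2) ln(2πe σ_w²).

```python
# Illustrative sketch (not from the paper): Table 4's differential entropy rate
# and KLD rate, evaluated by numerical integration for two zero-mean stationary
# Gaussian AR(1) processes with spectral density sigma_w^2 / |1 - a e^{-i lambda}|^2.
import numpy as np
from scipy.integrate import quad

def ar1_psd(lmbda, a, var_w):
    return var_w / np.abs(1.0 - a * np.exp(-1j * lmbda))**2

a_x, var_x = 0.8, 1.0      # process X (illustrative parameters)
a_y, var_y = 0.3, 2.0      # process Y

phi = lambda l: ar1_psd(l, a_x, var_x)
psi = lambda l: ar1_psd(l, a_y, var_y)

entropy_rate = 0.5 * np.log(2 * np.pi * np.e) + \
    quad(lambda l: np.log(phi(l)), -np.pi, np.pi)[0] / (4 * np.pi)

kld_rate = quad(lambda l: phi(l) / psi(l) - 1 - np.log(phi(l) / psi(l)),
                -np.pi, np.pi)[0] / (4 * np.pi)

print("entropy rate (spectral integral):", entropy_rate)
print("entropy rate (0.5*ln(2*pi*e*var_w)):", 0.5 * np.log(2 * np.pi * np.e * var_x))
print("KLD rate:", kld_rate)
```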

3. Rényi Divergence and the Log-likelihood Ratio

Song [30] pointed out a relationship between the derivative of the Rényi entropy with respect to the parameter α and the variance of the log-likelihood function of a distribution. Let f be a probability density; then

lim_{α→1} (d/dα) h_α(f) = −(1/2) Var(ln f(X)),

assuming the integrals involved are well defined and the differentiation operations are legitimate. Extending Song's idea, we note that the variance of the log-likelihood ratio (LLR) between two densities can be similarly derived from an analytic formula for their Rényi divergence of order α. Let f_i and f_j be two probability densities such that D_α(f_i||f_j) is n times continuously differentiable with respect to α (n ≥ 2). Assuming differentiation and integration can be interchanged, one obtains

lim_{α→1} (d/dα) D_α(f_i||f_j) = (1/2) Var_{f_i}( ln[f_i(X)/f_j(X)] ).

Considering the integral G(α) := ∫ f_i(x)^α f_j(x)^{1−α} dx for α > 0, α ≠ 1, we have

lim_{α→1} (d^n/dα^n) G(α) = E_{f_i}[ (ln[f_i(X)/f_j(X)])^n ].

Hence, from the definition of D_α(f_i||f_j), we find

lim_{α→1} (d/dα) D_α(f_i||f_j) = lim_{α→1} (1/(α − 1)²) [ (α − 1) G(α)^{−1} (dG(α)/dα) − ln G(α) ]
                              = (1/2) ( E_{f_i}[ (ln[f_i(X)/f_j(X)])² ] − (E_{f_i}[ ln(f_i(X)/f_j(X)) ])² ),

where we used l'Hospital's rule to evaluate the limit. This connection between the Rényi divergence and the LLR variance becomes practically useful in light of the Rényi divergence expressions presented in Section 2.2. Table 5 presents these quantities for the continuous univariate distributions considered in this paper. Note that ψ^{(1)}(x) refers to the polygamma function of order 1.
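The relation above can be verified numerically for specific families. The sketch below (ours; parameter values are illustrative) takes a finite-difference derivative of the closed-form univariate Gaussian D_α near α = 1 and compares it with one half of the corresponding Var(LLR) expression from Table 5.

```python
# Numerical sanity check (not part of the paper) of the limit above for two
# univariate Gaussian densities: a finite-difference derivative of the closed-form
# D_alpha (Table 2) near alpha = 1 versus one half of the Var(LLR) of Table 5.
import numpy as np

mu_i, s_i = 1.0, 1.2
mu_j, s_j = 0.0, 0.8

def D_alpha(alpha):
    var_star = alpha * s_j**2 + (1 - alpha) * s_i**2
    return (np.log(s_j / s_i) + np.log(s_j**2 / var_star) / (2 * (alpha - 1))
            + alpha * (mu_i - mu_j)**2 / (2 * var_star))

eps = 1e-4
deriv_at_1 = (D_alpha(1 + eps) - D_alpha(1 - eps)) / (2 * eps)   # d D_alpha / d alpha at alpha ~ 1

var_llr = (2 * s_i**2 * (mu_i - mu_j)**2 + (s_i**2 - s_j**2)**2) / (2 * s_j**4)

print(deriv_at_1, 0.5 * var_llr)    # the two numbers should be close
```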

Table 5: Var(LLR) for Common Continuous Univariate Distributions, i.e., Var_{f_i}( ln[f_i(X)/f_j(X)] )

Beta: (a_i − a_j)² ψ^{(1)}(a_i) + (b_i − b_j)² ψ^{(1)}(b_i) − (a_i − a_j + b_i − b_j)² ψ^{(1)}(a_i + b_i)

Chi: [ 2(σ_i² − σ_j²)(k_i(σ_i² + σ_j²) − 2k_jσ_j²) + σ_j⁴ (k_i − k_j)² ψ^{(1)}(k_i/2) ] / (4σ_j⁴)

χ²: (1/4)(d_i − d_j)² ψ^{(1)}(d_i/2)

Cramér: 4 [ (θ_i − θ_j)² − θ_iθ_j (ln(θ_j/θ_i))² ] / (θ_i − θ_j)²

Exponential: (λ_j − λ_i)² / λ_i²

Gamma: [ (θ_i − θ_j)(k_i(θ_i + θ_j) − 2k_jθ_j) + θ_j² (k_i − k_j)² ψ^{(1)}(k_i) ] / θ_j²

Univariate Gaussian: [ 2σ_i²(μ_i − μ_j)² + (σ_i² − σ_j²)² ] / (2σ_j⁴)

Gumbel (fixed scale): e^{−2μ_i/β} ( e^{μ_i/β} − e^{μ_j/β} )²

Half-Normal: (σ_j² − σ_i²)² / (2σ_j⁴)

Inverse Gaussian: [ λ_iλ_j²μ_i⁴ − 2λ_iλ_j²μ_i²μ_j² + μ_j⁴( λ_iλ_j² + 2μ_i(λ_i − λ_j)² ) ] / (4λ_i²μ_iμ_j⁴)

Laplace: (1/λ_j²) [ λ_i²(2 − e^{−2|θ_i−θ_j|/λ_i}) − 2λ_iλ_j e^{−|θ_i−θ_j|/λ_i} − 2(λ_i + λ_j)|θ_i − θ_j| e^{−|θ_i−θ_j|/λ_i} + λ_j² ]

Lévy, equal supports (μ_i = μ_j): (c_i − c_j)² / (2c_i²)

Log-normal: [ 2σ_i²(μ_i − μ_j)² + (σ_i² − σ_j²)² ] / (2σ_j⁴)

Maxwell-Boltzmann: 3(σ_j² − σ_i²)² / (2σ_j⁴)

Pareto: (a_i − a_j)² / a_i²

Rayleigh: (σ_j² − σ_i²)² / σ_j⁴

Weibull (fixed shape): (λ_i^k − λ_j^k)² / λ_j^{2k}

Song [30] also proposed a nonparametric estimate of Var(ln f(X)) based on separately estimating ∆_1 := E[ln f(X)] and ∆_2 := E[(ln f(X))²] by the plug-in estimates ∆̂_{l,n} := ∫ f_n(x) (ln f_n(x))^l dx, l = 1, 2, where f_n is a density estimate of f obtained from the independent and identically distributed (i.i.d.) samples (X_1, X_2, ..., X_n) drawn according to f. Theorem 3.1 in [30] establishes precise conditions under which ∆̂_{2,n} − ∆̂²_{1,n} converges with probability one to Var(ln f(X)) as n → ∞ (the estimate is strongly consistent). Assuming one has access to i.i.d. samples (X_1, ..., X_n) and (Y_1, ..., Y_n) drawn according to f_i and f_j, respectively, we can propose a similar estimate of Var_{f_i}( ln[f_i(X)/f_j(X)] ). Let

∆̃_{l,n} := ∫ f_{i,n}(x) ( ln[f_{i,n}(x)/f_{j,n}(x)] )^l dx,  l = 1, 2,

where f_{i,n} and f_{j,n} are kernel density estimates of f_i and f_j based on (X_1, ..., X_n) and (Y_1, ..., Y_n), respectively. It is an interesting problem to find exact conditions on f_i and f_j such that appropriate choices of f_{i,n} and f_{j,n} make ∆̃_{2,n} − ∆̃²_{1,n} a strongly consistent estimate of Var_{f_i}( ln[f_i(X)/f_j(X)] ). Investigating this problem is beyond the scope of this paper, but it is likely that the consistency proof in [30] can be extended to this case under appropriate restrictions on f_i and f_j. Alternatively, the partitioning, k(n) nearest-neighbor, and minimum spanning tree based methods reviewed in [33] and the references therein should also provide consistent estimates.
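A minimal sketch of this plug-in idea is given below, under the assumption that Gaussian kernel density estimates (as implemented in scipy.stats.gaussian_kde) are an acceptable choice for f_{i,n} and f_{j,n}; the choice of estimator and its consistency conditions are left open here, as discussed above. The samples are drawn from two Gaussians so that the estimate can be compared against the corresponding closed-form entry of Table 5.

```python
# Sketch of the proposed plug-in estimate of Var(LLR), assuming Gaussian kernel
# density estimates for f_{i,n} and f_{j,n}.  Illustrative data: i.i.d. samples
# from two Gaussians, so the result can be compared with Table 5.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.integrate import quad

rng = np.random.default_rng(0)
n = 5000
mu_i, s_i = 1.0, 1.2
mu_j, s_j = 0.0, 0.8
x_samples = rng.normal(mu_i, s_i, n)
y_samples = rng.normal(mu_j, s_j, n)

f_i_n = gaussian_kde(x_samples)
f_j_n = gaussian_kde(y_samples)

def delta(l):
    # \tilde{Delta}_{l,n} = \int f_{i,n}(x) [ln(f_{i,n}(x)/f_{j,n}(x))]^l dx
    integrand = lambda x: f_i_n(x)[0] * np.log(f_i_n(x)[0] / f_j_n(x)[0])**l
    lo, hi = x_samples.min() - 1.0, x_samples.max() + 1.0   # truncate to where f_{i,n} has mass
    return quad(integrand, lo, hi, limit=200)[0]

var_llr_estimate = delta(2) - delta(1)**2
var_llr_closed_form = (2 * s_i**2 * (mu_i - mu_j)**2 + (s_i**2 - s_j**2)**2) / (2 * s_j**4)
print(var_llr_estimate, var_llr_closed_form)
```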

4. Conclusion

In this work we presented closed-form expressions for Rényi and Kullback-Leibler divergences for nineteen commonly used univariate continuous distributions, as well as those expressions for Dirichlet and multivariate Gaussian distributions. We also presented a table summarizing four of the most important information measure rates for zero-mean stationary Gaussian processes, together with the relevant references. Lastly, we established a connection between the log-likelihood ratio between two distributions and their Rényi divergence, extending the work of Song [30], who considers the log-likelihood function and its relation to Rényi entropy. Following this connection, we have provided the corresponding expressions for the nineteen univariate distributions considered here. The present compilation is of course not complete. Particularly valuable future additions would be the Rényi divergence expressions for Student-t and Kumaraswamy distributions, as well as for the exponentiated and beta-normal distributions [10].

5. Acknowledgements

The authors are grateful to NSERC of Canada whose funds partially supported this work.

[1] J. Aczél, Z. Daróczy, On Measures of Information and their Characterization, Academic Press, 1975.

[2] F. Alajaji, P. N. Chen, Z. Rached, Csiszár's cutoff rates for the general hypothesis testing problem, IEEE Transactions on Information Theory 50 (4) (2004) 663-678.

[3] C. Arndt, Information Measures: Information and its Description in Science and Engineering, Signals and Communication Technology, Springer, 2004.

[4] M. Asadi, N. Ebrahimi, G. G. Hamedani, E. S. Soofi, Information measures for Pareto distributions and order statistics, in: Advances in Distribution Theory, Order Statistics, and Inference, Springer, 2006, pp. 207-223.

[5] J. Burbea, The convexity with respect to Gaussian distributions of divergences of order α, Utilitas Mathematica 26 (1984) 171-192.

[6] I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tud. Akad. Mat. Kutató Int. Közl. 8 (1963) 85-108.

[7] I. Csiszár, Generalized cutoff rates and Rényi's information measures, IEEE Transactions on Information Theory 41 (1) (1995) 26-34.

[8] G. A. Darbellay, I. Vajda, Entropy expressions for multivariate continuous distributions, IEEE Transactions on Information Theory (2000) 709-712.

[9] P. A. Devijver, J. Kittler, Pattern Recognition: A Statistical Approach, Prentice-Hall International, 1982.

[10] N. Eugene, C. Lee, F. Famoye, Beta-normal distribution and its applications, Communications in Statistics - Theory and Methods 31 (4) (2002) 497-512.

[11] M. Gil, On Rényi divergence measures for continuous alphabet sources, Master's thesis, Queen's University (2011).

[12] L. Golshani, E. Pasha, Rényi entropy rate for Gaussian processes, Information Sciences 180 (8) (2010) 1486-1491.

[13] L. Golshani, E. Pasha, G. Yari, Some properties of Rényi entropy and Rényi entropy rate, Information Sciences 179 (14) (2009) 2426-2433.

[14] P. Harremoës, Interpretations of Rényi entropies and divergences, Physica A: Statistical Mechanics and its Applications 365 (1) (2006) 57-62.

[15] A. O. Hero, B. Ma, O. Michel, J. Gorman, Alpha-divergence for classification, indexing and retrieval, Tech. rep., University of Michigan (2001).

[16] T. Hobza, D. Morales, L. Pardo, Rényi statistics for testing equality of autocorrelation coefficients, Statistical Methodology 6 (4) (2009) 424-436.

[17] S. Ihara, Information Theory for Continuous Systems, World Scientific, 1993.

[18] A. N. Kolmogorov, A new metric invariant of transient dynamical systems and automorphisms in Lebesgue spaces, Dokl. Akad. Nauk SSSR 119 (1958).

[19] S. Kullback, R. A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics 22 (1) (1951) 79-86.

[20] F. Liese, I. Vajda, Convex Statistical Distances, Teubner, 1987.

[21] F. Liese, I. Vajda, On divergences and informations in statistics and information theory, IEEE Transactions on Information Theory 52 (10) (2006) 4394-4412.

[22] S. Nadarajah, K. Zografos, Formulas for Rényi information and related measures for univariate distributions, Information Sciences 155 (1-2) (2003) 119-138.

[23] M. C. Pardo, J. A. Pardo, Use of Rényi's divergence to test for the equality of the coefficients of variation, Journal of Computational and Applied Mathematics 116 (1) (2000) 93-104.

[24] W. D. Penny, Kullback-Leibler divergences of normal, gamma, Dirichlet and Wishart densities, Tech. rep., Wellcome Department of Cognitive Neurology (2001).

[25] W. D. Penny, S. Roberts, Bayesian methods for autoregressive models, in: Neural Networks for Signal Processing X, Proceedings of the 2000 IEEE Signal Processing Society Workshop, vol. 1, 2000.

[26] T. W. Rauber, T. Braun, K. Berns, Probabilistic distance measures of the Dirichlet and Beta distributions, Pattern Recognition 41 (2) (2008) 637-645.

[27] A. Rényi, On measures of entropy and information, in: Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, 1961.

[28] M. J. Rivas, M. T. Santos, D. Morales, Rényi test statistics for partially observed diffusion processes, Journal of Statistical Planning and Inference 127 (1-2) (2005) 91-102.

[29] C. E. Shannon, A mathematical theory of communication, Bell System Technical Journal 27 (1948) 379-423.

[30] K. Song, Rényi information, loglikelihood and an intrinsic distribution measure, Journal of Statistical Planning and Inference 93 (1-2) (2001) 51-69.

[31] I. Vajda, Theory of Statistical Inference and Information, Kluwer, Dordrecht, 1989.

[32] T. van Erven, P. Harremoës, Rényi divergence and majorization, in: Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, 2010.

[33] Q. Wang, S. Kulkarni, S. Verdú, Universal estimation of information measures for analog sources, Foundations and Trends in Communications and Information Theory 5 (3) (2009) 265-353.

[34] K. Zografos, S. Nadarajah, Expressions for Rényi and Shannon entropies for multivariate distributions, Statistics & Probability Letters 71 (1) (2005) 71-84.
