Expressions for the Entropy of Binomial-Type Distributions∗

Mahdi Cheraghchi†
Department of Computing, Imperial College London, London, UK

Abstract We develop a general method for computing logarithmic and log-gamma expectations of distributions. As a result, we derive series expansions and integral representations of the entropy for several fundamental distributions, including the Poisson, binomial, beta-binomial, negative binomial, and hypergeometric distributions. Our results also establish connections between these entropy functions and the Riemann zeta function and its generalizations.

1 Introduction

Deriving expressions for the Shannon entropy of commonly studied distributions is of fundamental significance to information and communication theory, statistics, and theoretical computer science. For many distributions, an exact closed-form expression for the entropy is known. A non-comprehensive list of such distributions includes the uniform, Bernoulli, geometric, exponential, Laplace, normal, log-normal, Pareto, Cauchy, Weibull, Rayleigh, t, Dirichlet, Wishart, chi-squared, scaled inverse chi-squared, gamma, and inverse-gamma distributions. In many cases, the entropy is a simple expression in terms of the first few moments, and in other cases, the logarithmic expectation of the distribution takes a tractable form. For many fundamental distributions, however, we do not expect to have direct, closed-form expressions for the entropy in terms of elementary, or common special, functions. In such cases, high-quality approximations, series expansions, or integral forms for the entropy are desirable. In this work, we focus on what we call "binomial-type" distributions, which exhibit factorial terms in the expressions for their probability mass functions. In fact, we derive a general expression for computing log-gamma expectations, i.e., expressions of the form E[log Γ(α+X)], for any distribution in terms of its generating function. Specific examples that we will use to demonstrate our technique include the Poisson, binomial, beta-binomial, negative binomial, and hypergeometric distributions. We recall that the binomial distribution is defined to capture the number of success events in a series of n Bernoulli trials, for a given parameter n and a given success probability. This contains the Poisson distribution as a limiting case, and is in turn a limiting case of the more general beta-binomial distribution. A negative binomial distribution is defined similarly to the binomial distribution, but with a varying number of trials and a fixed number r of success events. Finally, a hypergeometric distribution is regarded as an analogue of the binomial distribution where the Bernoulli trials are performed without replacement. Apart from their fundamental

∗A preliminary version of this work appears in Proceedings of the 2018 IEEE International Symposium on Infor- mation Theory (ISIT). †Email: [email protected].

nature, the entropy of such distributions is of particular significance to information theory, for example in quantifying the number of bits required for compression of the corresponding sources. Moreover, understanding the entropy of binomial and Poisson distributions is a key step towards a characterization of the capacity of basic channels with synchronization errors, including the deletion and Poisson-repeat channels (cf. [Mit08]). Poisson entropy is also of key significance to the theory of optical communication and in understanding the capacity of Poisson-type channels in this context (cf. [Sha90, Ver99, Ver08, AW12]). Indeed, an integral expression for the log-gamma expectation of Poisson random variables (similar to one of our results) was the key to [Mar07, CR18a] for obtaining sharp elementary estimates on the capacity of the discrete-time Poisson channel. On the other hand, similar integral representations, of the type we derive, were used in recent works of the author [Che18, CR18b] to derive strong upper bounds on the capacity of the binary deletion, Poisson-repeat, and related channels with synchronization errors. For a wide range of the parameters, binomial-type distributions are approximated by a normal distribution via the central limit theorem. When the variance of these distributions is large (and, for the negative binomial distribution, the order parameter r is large), the entropy is quite accurately estimated by the entropy of a normal distribution with matching variance. However, as the variance decreases, the quality of this approximation deteriorates. In fact, as the variance σ² tends to zero, the entropy of a normal distribution, which is (1/2) log(2πeσ²), diverges to −∞, while the actual entropy of the underlying distribution tends to zero. In such cases, a more refined expression for the entropy in terms of a convergent power series or integral expression would correctly capture its behavior. Our starting point is a simple, but curious, manipulation of the entropy expression for the Poisson distribution. Recall that a Poisson distribution is defined by the probability mass function

\[
\mathrm{Poi}(k;\lambda) = \frac{\lambda^k e^{-\lambda}}{k!},
\]
where k ∈ {0, 1, . . .}, and the parameter λ > 0 is the mean and variance of the distribution. The entropy of this distribution is thus equal, by the definition of Shannon entropy, to

\[
H_{\mathrm{Poi}}(\lambda) := -\sum_{k=0}^{\infty} \mathrm{Poi}(k;\lambda)\log\bigl(\mathrm{Poi}(k;\lambda)\bigr) = -\lambda\log(\lambda/e) + e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k \log k!}{k!}. \tag{1}
\]

As mentioned above, HPoi(λ) tends to (1/2) log(2πeσ²) = (1/2) log(2πeλ) as λ grows. A more accurate estimate is given in [EB88]; namely, that for large λ,
\[
H_{\mathrm{Poi}}(\lambda) = \frac{1}{2}\log(2\pi e\lambda) - \frac{1}{12\lambda} - \frac{1}{24\lambda^2} - \frac{19}{360\lambda^3} + O(\lambda^{-4}).
\]
We remark that asymptotic expansions for the entropy of binomial and negative binomial distributions for large mean (in terms of the difference between the entropy and the Gaussian entropy estimate) have been derived in [JS99] and [CGKK13] (and, among other results, in [DV98] and [Kne98]). Let X be a Poisson distributed random variable with mean λ, so that the second term on the

right-hand side of (1) becomes EPoi(λ) := E[log X!]. We now write, using the convolution formula,
\[
E_{\mathrm{Poi}}(\lambda) = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k \log k!}{k!}
= \sum_{j=0}^{\infty}\lambda^j \sum_{k=0}^{j}\frac{(-1)^{j-k}\log k!}{(j-k)!\,k!}
= \sum_{j=0}^{\infty}\frac{\lambda^j}{j!}\sum_{k=0}^{j}\binom{j}{k}(-1)^{j-k}\log k!
= \sum_{j=0}^{\infty}\frac{(-\lambda)^j}{j!}\,c(j), \tag{2}
\]

where we have defined the coefficients
\[
c(j) := \sum_{k=0}^{j}\binom{j}{k}(-1)^{k}\log k!.
\]
The first few values of c(j), for j = 0, 1, . . ., are (see integer sequences A122214 and A122215)
\[
\log\left\{1,\ 1,\ 2,\ \tfrac{4}{3},\ \tfrac{32}{27},\ \tfrac{4096}{3645},\ \tfrac{67108864}{61509375},\ \tfrac{4503599627370496}{4204742431640625},\ \tfrac{2535301200456458802993406410752}{2396825584582984447479248046875}\right\}
\approx \{0,\ 0,\ 0.693147,\ 0.287682,\ 0.169899,\ 0.116655,\ 0.0871265,\ 0.068664,\ 0.0561673\},
\]
where the logarithms are taken to base e. We have thus derived the Maclaurin series expansion of the function EPoi(λ), which converges for all λ > 0. If we formally replace each coefficient c(j) in (2) with (e^t − 1)^j, the expression turns into the power series expansion of e^{λ(e^t−1)}, which is the moment generating function of the Poisson distribution. Our main observation is that this is not a coincidence, and in fact the same phenomenon occurs for the log-gamma expectation of any (discrete or continuous) distribution defined over the non-negative reals. In particular, we prove the following:

Theorem 1. Let M(t) be the moment generating function of any (continuous or discrete) distribution with mean µ = M'(0), and α > 0 be a parameter. Suppose M(t) is analytic around t = 0 and let Q(z) := M(log(z + 1)) be represented by the power series Q(z) = 1 + \sum_{j=1}^{\infty} q(j) z^j. Then, for a random variable X sampled from the distribution given by M, we have the following:
\[
\mathbb{E}[\log\Gamma(X+\alpha)] = \log\Gamma(\alpha) + \sum_{j=1}^{\infty}(-1)^j q(j)\,c_\alpha(j) \tag{3}
\]
\[
= \log\Gamma(\alpha) + \int_0^\infty \left(\frac{\mu e^{-t}}{t} - \frac{e^{-\alpha t}\,(1-M(-t))}{t\,(1-e^{-t})}\right)dt \tag{4}
\]
\[
= \log\Gamma(\alpha) + \mu\log\alpha - \int_0^1 \frac{(1-z)^{\alpha-1}}{z\log(1-z)}\,\bigl(Q(-z)+\mu z-1\bigr)\,dz, \tag{5}
\]
\[
\mathbb{E}[\log(X+\alpha)] = \sum_{j=1}^{\infty}(-1)^j q(j-1)\,c_\alpha(j) \tag{6}
\]
\[
= \int_0^\infty \frac{e^{-t} - e^{-\alpha t}M(-t)}{t}\,dt, \tag{7}
\]
where
\[
c_\alpha(j) := -\sum_{k=0}^{j-1}\binom{j-1}{k}(-1)^k\log(k+\alpha). \tag{8}
\]
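As a quick numerical sanity check of Theorem 1 (an addition to the text, not part of the original derivation), the short Python snippet below compares the series (3) with a brute-force evaluation of E[log Γ(X + α)] for a Poisson random variable, for which Q(z) = e^{λz} and q(j) = λ^j/j!. The parameter values and the use of the Python standard library are incidental choices.

import math

def c(alpha, j):
    # logarithmic difference coefficients, eq. (8)
    return -sum(math.comb(j - 1, k) * (-1)**k * math.log(k + alpha)
                for k in range(j))

lam, alpha = 2.5, 1.7

# direct expectation E[log Gamma(X + alpha)] for X ~ Poisson(lam)
direct, pk = 0.0, math.exp(-lam)
for k in range(200):
    direct += pk * math.lgamma(k + alpha)
    pk *= lam / (k + 1)

# series (3) with q(j) = lam^j / j!
series = math.lgamma(alpha) + sum(
    (-1)**j * lam**j / math.factorial(j) * c(alpha, j) for j in range(1, 80))

print(direct, series)   # the two numbers should agree to many decimal places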

The proof of the above theorem turns out to be remarkably simple; yet it provides a general and powerful tool for deriving series expansions and integral expressions for the entropy of distributions involving factorial terms in their probability mass functions, a task that may seem elusive by a direct approach.

1.1 Summary of the main results We apply Theorem 1 to derive series and integral expressions for the entropy and log-gamma expectations of several distributions over the non-negative integers. We will particularly consider the Poisson, binomial, beta-binomial, negative binomial, and hypergeometric distributions that are respectively defined by the probability mass functions below1 (for k = 0, 1,...):

\[
\mathrm{Poi}(k;\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad \lambda > 0 \tag{9}
\]
\[
\mathrm{Bin}(k;n,p) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad n > 0,\ p\in(0,1) \tag{10}
\]
\[
\mathrm{BBin}(k;n,\alpha,\beta) = \binom{n}{k}\frac{B(k+\alpha,\,n-k+\beta)}{B(\alpha,\beta)}, \qquad n,\alpha,\beta > 0 \tag{11}
\]
\[
\mathrm{NBin}(k;r,p) = \binom{k+r-1}{k} p^k (1-p)^{r}, \qquad r > 0,\ p\in(0,1) \tag{12}
\]
\[
\mathrm{HG}(k;N,K,n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}, \qquad N,K,n > 0 \tag{13}
\]
where B(α, β) = Γ(α)Γ(β)/Γ(α+β) denotes the beta function. We remark that a similar technique has been employed in [Kne98] (and rediscovered in [Mar07]) to derive integral expressions for the entropy of the Poisson, binomial, and negative binomial distributions. Let XPoi, XBin, XBBin, XNBin, and XHG be random variables drawn from their respective distributions above. In Sections 4, 5, 6, 7, and 8, we derive the expressions below. All expressions converge for the whole range of parameters, and logarithms are taken to the base in which the entropy is measured.

\[
H(X_{\mathrm{Poi}}) = -\lambda\log(\lambda/e) + \sum_{j=2}^{\infty}\frac{c(j)}{j!}\,(-\lambda)^j \tag{14}
\]
\[
= -\lambda\log(\lambda/e) + \int_0^1 \frac{1 - e^{-\lambda z} - \lambda z}{z\log(1-z)}\,dz, \tag{15}
\]
where we use the convention c(j) := c1(j).²

\[
H(X_{\mathrm{Bin}}) = n\,h(p) + \int_0^\infty \frac{(1-p+pe^{-t})^n + (p+(1-p)e^{-t})^n - e^{-nt} - 1}{t\,(e^{t}-1)}\,dt \tag{16}
\]
\[
= n\,h(p) + \sum_{j=2}^{\infty}\binom{n}{j}(-1)^j c(j)\bigl(p^j + (1-p)^j - 1\bigr), \tag{17}
\]

1 We refer the reader to standard textbooks on probability (such as [DS12]) and information theory (such as [CT06]) for the standard definitions of the various notions used. 2 We note that the expression for c(j) in (2) indeed coincides with c1(j) due to the fact that the logarithm function is the first finite derivative of log-gamma.

where h(p) := −p log p − (1 − p) log(1 − p) is the binary entropy function.

\[
H(X_{\mathrm{BBin}}) = -\log\bigl(nB(n,\alpha+\beta)\bigr) + \sum_{j=1}^{\infty}\binom{n}{j}\frac{(-1)^j}{(\alpha+\beta)^{(j)}}\Bigl(\bigl(c(j)-c_\alpha(j)\bigr)\alpha^{(j)} + \bigl(c(j)-c_\beta(j)\bigr)\beta^{(j)}\Bigr), \tag{18}
\]
where we use the notation a^{(j)} for the rising factorial a(a + 1) ··· (a + j − 1).
\[
\mathbb{E}[\log\Gamma(X_{\mathrm{BBin}}+r)] = \log\Gamma(r) + \frac{n\alpha\log r}{\alpha+\beta} \tag{19}
\]
\[
\qquad + \int_0^1 \frac{(1-z)^{r-1}\bigl({}_2F_1(-n,\alpha;\alpha+\beta;z) + n\alpha z/(\alpha+\beta) - 1\bigr)}{z\log(1-z)}\,dz \tag{20}
\]
\[
= \log\Gamma(r) + \sum_{j=1}^{\infty}(-1)^j c_r(j)\,\frac{\alpha^{(j)}}{(\alpha+\beta)^{(j)}}\binom{n}{j}. \tag{21}
\]

\[
H(X_{\mathrm{NBin}}) = \frac{r\,h(p)}{1-p} + \sum_{j=1}^{\infty}\binom{j+r-1}{j}\left(\frac{p}{p-1}\right)^{j}\bigl(c(j)-c_r(j)\bigr) \tag{22}
\]
\[
= \frac{r\,(h(p) - p\log r)}{1-p} + \int_0^1 \frac{\bigl((1-z)^{r-1}-1\bigr)\bigl((1+pz/(1-p))^{-r} + prz/(1-p) - 1\bigr)}{z\log(1-z)}\,dz. \tag{23}
\]

\[
H(X_{\mathrm{HG}}) = \log\binom{N}{n} - \log K! - \log(N-K)!
\]
\[
\quad + \sum_{j=2}^{\infty}\frac{(-1)^j c(j)}{(N)_j\; j!}\Bigl((K)_j(n)_j + (N-K)_j(n)_j + (K)_j(N-n)_j + (N-K)_j(N-n)_j\Bigr). \tag{24}
\]
We demonstrate examples of connections between the entropy functions via functional transformations, as well as connections between them and the Riemann zeta function and its many generalizations. We believe that such connections will stimulate further research towards a full understanding of the entropy of such fundamental distributions as the Poisson, binomial, and related distributions. Among our results, we show (in Section 4, Theorem 2) that the Laplace transform of H(XPoi), regarded as a function of the expectation λ, is equal to
\[
\frac{\gamma+\log z}{z^2} - \frac{1}{z(1+z)}\,\Phi'\!\left(\frac{1}{1+z}\right),
\]
where Φ′ is the derivative of the polylogarithm function as defined in (31). Another example is a connection between the geometric distribution (on 0, 1, . . .) and the Poisson distribution. Letting XGeom denote a geometrically distributed random variable, we show (in (45)) that the logarithmic expectation E[log(XGeom + 1)], as a function of the mean of XGeom, generates the logarithmic difference coefficients cα(j) defined in (8). Moreover, we express this function in terms of the Laplace transform of the entropy of a Poisson distribution (Section 7, Theorem 6). Finally, we derive connections between the cα(j) and their generating function and the Riemann zeta function (Appendix A) and its generalizations (Section 2, (33)), which we believe may serve as a natural continuation point towards a complete understanding of the entropy of Poisson and related distributions.


Figure 1: Logarithmic difference coefficients cα(j) as a function of j.

Organization. The rest of the article is organized as follows. In Section 2, we study the coefficients cα(j) in (8) and their various properties. Section 3 gives a proof of Theorem 1. In Sections 4, 5, 6, 7, and 8, we respectively apply Theorem 1 to obtain expressions for the entropy and log-gamma expectations of the Poisson, binomial, beta-binomial, negative binomial, and hypergeometric distributions. We conclude in Section 9 with a brief discussion of possible future directions and questions raised by this work.

2 The logarithmic difference coefficients cα(j)

The coefficients cα(j) defined in (8) naturally appear in the analytic study of zeta functions (we will study an example related to this work in Appendix A) and are of fundamental importance in the calculus of finite differences. They are essentially the Newton series expansion coefficients of the logarithm function around the point α, which is why we call them the logarithmic difference coefficients. The coefficients are plotted³ for various choices of α in Figure 1. We observe that the function 1/(j log j) closely approximates c1(j). Recall that the jth forward difference of a function f at a point α is defined as
\[
\Delta^j[f](\alpha) := \sum_{k=0}^{j}\binom{j}{k}(-1)^{j-k} f(k+\alpha), \tag{25}
\]
or equivalently, as the j-fold application of the discrete derivative ∆[f](α) := f(α + 1) − f(α) of the function at α (letting ∆⁰[f](α) = f(α)). Furthermore, we recall that the Newton series expansion of a function f around the point α is given by
\[
f(x+\alpha) = \sum_{j=0}^{\infty}\binom{x}{j}\,\Delta^j[f](\alpha), \tag{26}
\]

3 The plot depicts the continuous interpolation of the cα(j) given by (27), so that the values are meaningful for non-integer choices of j as well.

which is the same as the formula for the Taylor expansion, with continuous derivatives replaced by forward differences and powers of x replaced by factorial powers. Note that x need not be an integer. Applying (25) to f(x) = log x, and comparing with (8), we see that

\[
\Delta^j[\log](\alpha) = (-1)^{j+1} c_\alpha(j+1),
\]
so that
\[
\log(x+\alpha) = \sum_{j=0}^{\infty}\binom{x}{j}(-1)^{j+1} c_\alpha(j+1).
\]
Unlike the Taylor expansion, one can verify that the above series converges to log(x + α) for all values of x > −α. The coefficients cα(j) have a compact integral representation. When j > 1, we have

\[
c_\alpha(j) = \int_0^\infty \frac{(1-e^{-t})^{j-1} e^{-\alpha t}}{t}\,dt, \tag{27}
\]
which can be readily verified by a binomial expansion of the integrand and using the following basic identity (which holds for any n > 0) on each resulting term:

\[
\int_0^\infty \frac{e^{-t} - e^{-nt}}{t}\,dt = \log n. \tag{28}
\]
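The following small Python sketch (an addition for illustration, assuming SciPy is available) numerically compares the integral representation (27) with the defining finite sum (8) for a few arbitrary values of α and j.

import math
from scipy.integrate import quad

def c_sum(alpha, j):                      # definition (8)
    return -sum(math.comb(j - 1, k) * (-1)**k * math.log(k + alpha)
                for k in range(j))

def c_int(alpha, j):                      # integral representation (27), valid for j > 1
    f = lambda t: (1.0 - math.exp(-t))**(j - 1) * math.exp(-alpha * t) / t
    val, _ = quad(f, 0.0, math.inf, limit=200)
    return val

for alpha in (1.0, 0.5, 3.2):
    for j in (2, 5, 12):
        print(alpha, j, c_sum(alpha, j), c_int(alpha, j))   # each pair should match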

It is worthwhile to understand the generating function of the coefficients cα(j). In order to do so, we start by recalling the Lerch transcendent (cf. [EMOT53, p. 27])

\[
\Phi(z,s,\alpha) := \sum_{k=0}^{\infty}\frac{z^k}{(k+\alpha)^s} = \frac{1}{\Gamma(s)}\int_0^\infty \frac{t^{s-1} e^{-\alpha t}}{1 - z e^{-t}}\,dt. \tag{29}
\]
By taking the derivative of the above series with respect to s at s = 0, we obtain the generating function of the logarithmic sequence
\[
\Phi'_\alpha(z) := \frac{d}{ds}\,\Phi(z,s,\alpha)\Big|_{s=0} = -\sum_{k=0}^{\infty}\log(k+\alpha)\,z^k. \tag{30}
\]
When α = 1, in which case we drop the subscript from the notation, the above can be written in terms of the polylogarithm function Li_s(z) := \sum_{k=1}^{\infty} z^k k^{-s} as

\[
\Phi'(z) := \Phi'_1(z) = \frac{d}{ds}\,\frac{\mathrm{Li}_s(z)}{z}\Big|_{s=0}. \tag{31}
\]

Now, observe that the coefficient sequence (−cα(j + 1)), j = 0, 1, . . ., is the binomial transform [Knu73, p. 136] of the sequence (log(k + α)), k = 0, 1, . . .. Let

\[
C_\alpha(z) := \sum_{j=0}^{\infty} c_\alpha(j)\, z^j \tag{32}
\]

be the generating function of (cα(j)), j = 0, 1, . . ., where we define cα(0) := 0. Although we may treat the series formally, note that (32) converges when |z| < 1 or z = −1 (since cα(j) is a decreasing

sequence for α > 0). Using (29) and the formula for the generating function of the binomial transform, we thus have
\[
C_\alpha(z) = \frac{z}{1-z}\,\Phi'_\alpha\!\left(\frac{-z}{1-z}\right). \tag{33}
\]
In this regard, (3) in Theorem 1 can be interpreted as the assertion that the log-gamma expectation E[log Γ(X + α)] − log Γ(α) of any distribution over the non-negative reals is the inner product of the power series coefficients of Cα(−z) and those of the function Q(z) := M(log(z + 1)) derived from the moment generating function (the power series coefficients of Q(z), in turn, are the factorial moments of the distribution divided by the corresponding factorials). In other words, the Hadamard product of the functions Cα(−z) and Q(z) evaluated at z = 1 is equal to E[log Γ(X + α)] − log Γ(α). We demonstrate another characterization of the generating function Cα(z) in Section 7. Namely, we will show in (43) that Cα(−z) is the log-gamma expectation of a geometric distribution over 0, 1, . . . with mean z. Furthermore, in Appendix A, we present an intriguing connection between the logarithmic difference coefficients (and their generating function) and the Riemann zeta function and its generalized form, the Hurwitz zeta function. This leads to a formula for a weighted summation of the coefficients cα(j) in terms of the digamma function and Harmonic numbers.
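As an illustrative check of (33) (an addition to the text, assuming the mpmath library is available), one can compare a truncated evaluation of the series (32) against the right-hand side of (33), with Φ′α obtained by numerically differentiating mpmath's Lerch transcendent with respect to s at s = 0. The test point z and the parameter α below are arbitrary.

from mpmath import mp, mpf, lerchphi, diff, binomial, log

mp.dps = 40
alpha, z = mpf('1.5'), mpf('-0.4')

def c(j):                                   # logarithmic difference coefficients (8)
    return -sum(binomial(j - 1, k) * (-1)**k * log(k + alpha) for k in range(j))

lhs = sum(c(j) * z**j for j in range(1, 120))                  # partial sum of (32)
w = -z / (1 - z)
rhs = z / (1 - z) * diff(lambda s: lerchphi(w, s, alpha), 0)   # right side of (33)
print(lhs, rhs)   # should agree to many significant digits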

3 Proof of Theorem 1

We consider the Newton series expansion of the function log Γ(x) around the point α given by (26), or equivalently, the series expansion of f(x) := log Γ(x + α) around zero. The discrete derivative of f is given by
\[
\Delta[f](x) = f(x+1) - f(x) = \log\frac{\Gamma(x+\alpha+1)}{\Gamma(x+\alpha)} = \log(x+\alpha).
\]
Therefore, for j > 0, the jth forward difference of f is the (j − 1)st forward difference of the logarithmic function g(x) := log(x + α). This can be written down, via (25), as
\[
\Delta^j[f](0) = \Delta^{j-1}[g](0) = -\sum_{k=0}^{j-1}\binom{j-1}{k}(-1)^{j-k}\log(k+\alpha) \overset{(8)}{=} (-1)^j c_\alpha(j). \tag{34}
\]
Recall that the factorial moment generating function of a distribution with moment generating function M(t) is given by M(log z), and the coefficients of the power series expansion of this function around z = 1 determine the factorial moments. Therefore, the power series expansion of the function Q(z) = M(log(z + 1)) (which is also understood as generating the inverse Stirling transform of the moment sequence) around z = 0, namely Q(z) = \sum_{j=0}^{\infty} q(j) z^j, determines the factorial moments of the distribution. The jth factorial moment of X sampled from the distribution defined by M is given by E[(X)_j] = j!\,q(j), where (X)_j denotes the falling factorial. Note that q(0) = 1 and q(1) = E[X] = µ. Using (34), we can write down the (convergent) Newton expansion of f(x) and take its expectation using the information Q(z) gives on the factorial moments, as follows:
\[
\mathbb{E}\bigl[\log\Gamma(X+\alpha)\bigr] \overset{(26)}{=} \mathbb{E}\Bigl[\log\Gamma(\alpha) + \sum_{j=1}^{\infty}\binom{X}{j}(-1)^j c_\alpha(j)\Bigr] = \log\Gamma(\alpha) + \sum_{j=1}^{\infty}(-1)^j q(j)\,c_\alpha(j).
\]

This proves (3). In order to derive (4), we use (27) in the above result for j > 1 and note that cα(1) = − log α, which gives

\[
\mathbb{E}[\log\Gamma(X+\alpha)] = \log\Gamma(\alpha) + \mu\log\alpha + \sum_{j=2}^{\infty}(-1)^j q(j)\int_0^\infty \frac{(1-e^{-t})^{j-1} e^{-\alpha t}}{t}\,dt.
\]

Recall that M(−t) = Q(e^{−t} − 1). Using this in the above expression, we may change the order of (convergent) summation and integration and write

\[
\mathbb{E}[\log\Gamma(X+\alpha)] = \log\Gamma(\alpha) + \mu\log\alpha + \int_0^\infty \frac{e^{-\alpha t}}{t(1-e^{-t})}\sum_{j=2}^{\infty}(-1)^j q(j)(1-e^{-t})^j\,dt
\]
\[
= \log\Gamma(\alpha) + \mu\log\alpha + \int_0^\infty \frac{e^{-\alpha t}}{t(1-e^{-t})}\bigl(Q(e^{-t}-1) - 1 - \mu(e^{-t}-1)\bigr)\,dt
\]
\[
\overset{(28)}{=} \log\Gamma(\alpha) + \int_0^\infty \left(\frac{\mu e^{-t} - \mu e^{-\alpha t}}{t} + \frac{e^{-\alpha t}(M(-t)-1)}{t(1-e^{-t})} + \frac{\mu e^{-\alpha t}}{t}\right)dt
\]
\[
= \log\Gamma(\alpha) + \int_0^\infty \left(\frac{\mu e^{-t}}{t} - \frac{e^{-\alpha t}(1 - M(-t))}{t(1-e^{-t})}\right)dt,
\]
which proves (4). Now, (5) can be simply verified by rewriting the above integral expression in terms of the variable z = e^{−t} − 1. Let \bar{Q}(z) := Q(z) − µz − 1 = \sum_{j=2}^{\infty} q(j)z^j. We have t = −log(1 + z) and dt = −dz/(1 + z) = −e^t dz, so that (4) becomes

\[
\mathbb{E}[\log\Gamma(X+\alpha)] = \log\Gamma(\alpha) + \int_0^\infty\left(\frac{\mu e^{-t}}{t} - \frac{\mu e^{-\alpha t}}{t}\right)dt - \int_0^1 \frac{(1-z)^{\alpha-1}}{z\log(1-z)}\,\bar{Q}(-z)\,dz
\]
\[
\overset{(28)}{=} \log\Gamma(\alpha) + \mu\log\alpha - \int_0^1 \frac{(1-z)^{\alpha-1}}{z\log(1-z)}\,\bar{Q}(-z)\,dz,
\]
which proves (5). In fact, applying the same change of variables to (27) shows that, for j > 1,

\[
c_\alpha(j) = -\int_0^1 \frac{(1-z)^{\alpha-1} z^{j-1}}{\log(1-z)}\,dz.
\]
In order to derive (6), we repeat the Newton expansion, but directly on the logarithmic function g, noting, from (34), that ∆^j[g](0) = (−1)^{j+1} cα(j + 1). So we have
\[
\mathbb{E}[\log(X+\alpha)] \overset{(26)}{=} \mathbb{E}\Bigl[\sum_{j=0}^{\infty}\binom{X}{j}(-1)^{j+1}c_\alpha(j+1)\Bigr] = \sum_{j=1}^{\infty}(-1)^j q(j-1)\,c_\alpha(j).
\]

Finally, (7) immediately follows by writing log(X + α) in integral form using (28) and taking the expectation of the integrand.
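As a small illustration of the theorem just proved (an addition to the text, assuming SciPy is available), the snippet below checks the integral representation (7) against a direct evaluation of E[log(X + α)] for a Poisson random variable; the parameter values are arbitrary.

import math
from scipy.integrate import quad

lam, alpha = 3.0, 0.8

# direct E[log(X + alpha)] for X ~ Poisson(lam)
direct, pk = 0.0, math.exp(-lam)
for k in range(300):
    direct += pk * math.log(k + alpha)
    pk *= lam / (k + 1)

# integral representation (7), with M(-t) = exp(lam*(exp(-t) - 1))
integrand = lambda t: (math.exp(-t)
                       - math.exp(-alpha * t) * math.exp(lam * (math.exp(-t) - 1))) / t
integral, _ = quad(integrand, 0, math.inf, limit=300)

print(direct, integral)   # the two values should agree closely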

4 Entropy of the Poisson distribution

As the first application of Theorem 1, consider a Poisson distributed random variable X with mean λ, and define EPoi(λ) := E[log X!], so that we have HPoi(λ) := H(X) = −λ log(λ/e) + EPoi(λ). The

generating function of the distribution is M(t) = exp(λ(e^t − 1)), so we have

\[
Q(z) = M(\log(z+1)) = e^{\lambda z} = \sum_{j=0}^{\infty}\frac{\lambda^j}{j!}\,z^j.
\]

Theorem 1, applied with α = 1 in (3), now directly implies that

\[
E_{\mathrm{Poi}}(\lambda) = \sum_{j=2}^{\infty}\frac{c(j)}{j!}\,(-\lambda)^j, \tag{35}
\]
where we recall the shorthand c(j) := c1(j), thus recovering the Maclaurin series expansion of EPoi in (2). Note that this expansion is the same as the Newton series expansion of the function log(λ!), with factorial powers of λ replaced by actual powers. Since (c(j)), j ≥ 2, is a decreasing sequence, a simple ratio test reveals that the above power series expansion absolutely converges for all λ > 0. Furthermore, observe that the integral expression (15) can be immediately recovered from (5) in Theorem 1. While we do not know of a representation of the function EPoi(λ) in terms of elementary or natural special functions, we may see that its Laplace transform (which we will revisit later in Section 7) takes an interesting form. Let ÊPoi(z) denote the Laplace transform of EPoi(λ). By the Laplace transform formula for power series (namely, L{λ^j} = j!(1/z)^{j+1}), and using (35), we may write
\[
\hat{E}_{\mathrm{Poi}}(z) = -\sum_{j=2}^{\infty} c(j)\,(-1/z)^{j+1} \overset{(33)}{=} -\frac{1}{z(1+z)}\,\Phi'\!\left(\frac{1}{1+z}\right). \tag{36}
\]
Combined with the Laplace transform of −λ log(λ/e), which is (γ + log z)/z² (where γ ≈ 0.57721 is the Euler-Mascheroni constant), we conclude the following:

Theorem 2. The Laplace transform of the entropy function HPoi(λ) is given by
\[
\mathcal{L}\{H_{\mathrm{Poi}}\}(z) = \frac{\gamma+\log z}{z^2} - \frac{1}{z(1+z)}\,\Phi'\!\left(\frac{1}{1+z}\right),
\]
where Φ′ is the derivative of the polylogarithm function defined in (31).

Remark 3. An alternative way of proving Theorem 2 is to directly start from (1) and use the frequency shifting property of the Laplace transform, and then the convolution property of generating functions (applied to Φ′(z) and the geometric series 1/(1 − z)), in order to generate the sequence (log k!), k = 0, 1, . . ., from (log(1 + k)), k = 0, 1, . . ..
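The following rough numerical check of Theorem 2 (an addition to the text; it assumes SciPy and mpmath and uses an arbitrary test point z) compares a brute-force Laplace transform of HPoi with the closed form above. Entropies are in nats.

import math
from scipy.integrate import quad
from mpmath import mp, mpf, lerchphi, diff, euler, log as mlog

mp.dps = 25
z = 1.3

def H_poi(lam):
    # Shannon entropy (nats) of Poisson(lam) by direct summation
    if lam == 0:
        return 0.0
    h, p = 0.0, math.exp(-lam)
    for k in range(int(lam + 20 * math.sqrt(lam)) + 30):
        if p > 0:
            h -= p * math.log(p)
        p *= lam / (k + 1)
    return h

lhs, _ = quad(lambda lam: H_poi(lam) * math.exp(-z * lam), 0, 60)
phi1 = diff(lambda s: lerchphi(mpf(1) / (1 + z), s, 1), 0)      # Phi'(1/(1+z))
rhs = (euler + mlog(z)) / z**2 - phi1 / (z * (1 + z))
print(lhs, rhs)   # should agree to several decimal places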

5 Entropy of the binomial distribution

Recall that the binomial distribution with parameters n, p is defined by the probability mass function in (10). In general, n need not be an integer. This distribution has mean np, variance np(1 − p), and moment generating function (1 − p + pe^t)^n. Let X be a random variable distributed according to a binomial distribution with parameters n, p. Similarly to the Poisson distribution, the difficulty in computing H(X) is captured by the computation of EBin(n, p) := E[log X!], and, directly from (10), we can see that H(X) = nh(p) − log Γ(n + 1) + EBin(n, p) + EBin(n, 1 − p).

Using (4), we can immediately write down an integral representation of EBin(n, p):

\[
E_{\mathrm{Bin}}(n,p) = \int_0^\infty\left(\frac{npe^{-t}}{t} - \frac{e^{-t}\bigl(1-(1-p+pe^{-t})^n\bigr)}{t(1-e^{-t})}\right)dt,
\]
so that we have the integral representation of H(X) given by (16):

\[
H(X) = n\,h(p) + \int_0^\infty \frac{(1-p+pe^{-t})^n + (p+(1-p)e^{-t})^n - e^{-nt} - 1}{t\,(e^t-1)}\,dt.
\]

Now, observe that the function Q(z) in Theorem 1 is Q(z) = (1 + pz)^n = \sum_{j=0}^{\infty}\binom{n}{j}(pz)^j. Therefore, by (3), we have

\[
E_{\mathrm{Bin}}(n,p) = \sum_{j=2}^{\infty}\binom{n}{j} c(j)\,(-p)^j = \sum_{j=2}^{\infty}\frac{c(j)}{j!}\,(-p)^j\,(n)_j, \tag{37}
\]
where (n)_j denotes the falling factorial. This gives us the following series expansion for the entropy:

\[
H(X) = n\,h(p) + \sum_{j=2}^{\infty}\binom{n}{j}(-1)^j c(j)\bigl(p^j + (1-p)^j - 1\bigr),
\]
thus confirming (17).

Remark 4. Observe the remarkable similarity between (37) and the analogous quantity EPoi for the Poisson distribution in (2): indeed, (37) is obtained from (2) by letting λ = np and replacing powers of n with factorial powers of the same order. This makes intuitive sense, as the Poisson distribution is the limiting case of the binomial distribution as n tends to infinity and the expectation np is fixed to the desired parameter λ, in which case the falling factorials (n)_j are within a multiplicative factor 1 + O(1/n) of the corresponding actual powers n^j, a factor that tends to 1 as n grows.
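As a sanity check of (17) (an addition to the text; for integer n the series terminates at j = n), the following Python snippet compares the series with a direct evaluation of the binomial entropy for arbitrary parameter values. Entropy is in nats.

import math

def c(j):   # c(j) = c_1(j), eq. (8) with alpha = 1
    return -sum(math.comb(j - 1, k) * (-1)**k * math.log(k + 1) for k in range(j))

n, p = 10, 0.3
h = -p * math.log(p) - (1 - p) * math.log(1 - p)        # binary entropy

direct = -sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
              * math.log(math.comb(n, k) * p**k * (1 - p)**(n - k))
              for k in range(n + 1))

series = n * h + sum(math.comb(n, j) * (-1)**j * c(j) * (p**j + (1 - p)**j - 1)
                     for j in range(2, n + 1))

print(direct, series)   # both should print the same value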

6 Entropy of the beta-binomial distribution

A beta-binomial distribution is defined by positive parameters n, α, β (where n is typically an integer) and the probability mass function BBin(k; n, α, β) in (11). The moment generating function for this distribution is
\[
M(t) = {}_2F_1(-n,\alpha;\alpha+\beta;1-e^t),
\]
where t < log 2 and 2F1 denotes the hypergeometric function defined as

\[
{}_2F_1(a,b;c;z) = \sum_{j=0}^{\infty}\frac{a^{(j)}\,b^{(j)}}{c^{(j)}}\,\frac{z^j}{j!}, \tag{38}
\]
where a^{(j)} denotes the rising factorial. Let X be a random variable with the beta-binomial distribution defined by parameters n, α, β. The distribution of X is a compound distribution generated by sampling a random parameter p ∈ (0, 1) according to a beta distribution with parameters α, β, and subsequently drawing from a binomial distribution defined by parameters n, p. The expectation of p is α/(α + β) (and thus E[X] = nα/(α + β)). For a fixed ratio α/(α + β), the variance of p decreases as α grows, and thus the binomial distribution is a limiting case of the beta-binomial distribution when α → ∞ and α/(α + β) = p.

Similarly to the Poisson and binomial distributions, the difficulty in computing the entropy of a beta-binomial distribution lies in the computation of

EBBin(n, α, β, r) := E[log Γ(X + r)], for a fixed parameter r (which is either 1, α, or β). Using this notation, and noting that the distribution of n − X is given by BBin(k; n, β, α), the entropy can be written as

\[
H(X) = \log B(\alpha,\beta) + \mathbb{E}\left[\log\frac{\Gamma(n+\alpha+\beta)\,\Gamma(X+1)\,\Gamma(n-X+1)}{\Gamma(n+1)\,\Gamma(X+\alpha)\,\Gamma(n-X+\beta)}\right]
\]
\[
= \log B(\alpha,\beta) + \log\frac{\Gamma(n+\alpha+\beta)}{\Gamma(n+1)} + E_{\mathrm{BBin}}(n,\alpha,\beta,1) + E_{\mathrm{BBin}}(n,\beta,\alpha,1)
\]
\[
\quad - E_{\mathrm{BBin}}(n,\alpha,\beta,\alpha) - E_{\mathrm{BBin}}(n,\beta,\alpha,\beta). \tag{39}
\]

As before, we may use (4) and (5), noting that Q(z) = M(log(z + 1)) = {}_2F_1(-n,\alpha;\alpha+\beta;-z), to derive an integral representation of the function EBBin as
\[
E_{\mathrm{BBin}}(n,\alpha,\beta,r) \overset{(4)}{=} \log\Gamma(r) + \int_0^\infty\left(\frac{n\alpha\, e^{-t}}{(\alpha+\beta)\,t} - \frac{e^{-rt}\bigl(1 - {}_2F_1(-n,\alpha;\alpha+\beta;1-e^{-t})\bigr)}{t(1-e^{-t})}\right)dt
\]
\[
\overset{(5)}{=} \log\Gamma(r) + \frac{n\alpha\log r}{\alpha+\beta} + \int_0^1 \frac{(1-z)^{r-1}\bigl({}_2F_1(-n,\alpha;\alpha+\beta;z) + n\alpha z/(\alpha+\beta) - 1\bigr)}{z\log(1-z)}\,dz,
\]
which confirms (19). By (38), the coefficient of z^j in the series expansion of Q(z) is equal to

\[
q(j) = \frac{\alpha^{(j)}}{(\alpha+\beta)^{(j)}}\binom{n}{j}.
\]

Therefore, using (3), we obtain the series expansion of EBBin(n, α, β, r) in (21):

\[
E_{\mathrm{BBin}}(n,\alpha,\beta,r) = \log\Gamma(r) + \sum_{j=1}^{\infty}(-1)^j c_r(j)\,\frac{\alpha^{(j)}}{(\alpha+\beta)^{(j)}}\binom{n}{j}. \tag{40}
\]

Observe that, letting p := α/(α + β), if p is fixed while α grows large, the ratio α^{(j)}/(α + β)^{(j)} becomes p^j (1 + O(1/α)), and thus the terms in (40) converge to those in (37) (albeit (37) is written for the special case r = 1). This is consistent with the fact that the binomial distribution is the limiting distribution of the beta-binomial distribution for fixed p as α tends to infinity. Plugging this result into (39), we derive a series expansion for the entropy of the beta-binomial distribution:

\[
H(X) = \log\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)} + \log\frac{\Gamma(n+\alpha+\beta)}{n\,\Gamma(n)} - \log\Gamma(\alpha) - \log\Gamma(\beta)
\]
\[
\quad + \sum_{j=1}^{\infty}\binom{n}{j}\frac{(-1)^j}{(\alpha+\beta)^{(j)}}\Bigl(\bigl(c(j)-c_\alpha(j)\bigr)\alpha^{(j)} + \bigl(c(j)-c_\beta(j)\bigr)\beta^{(j)}\Bigr)
\]
\[
= -\log\bigl(nB(n,\alpha+\beta)\bigr) + \sum_{j=1}^{\infty}\binom{n}{j}\frac{(-1)^j}{(\alpha+\beta)^{(j)}}\Bigl(\bigl(c(j)-c_\alpha(j)\bigr)\alpha^{(j)} + \bigl(c(j)-c_\beta(j)\bigr)\beta^{(j)}\Bigr),
\]
which proves (18).
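The following short Python check (an addition to the text) compares the series (18), which terminates at j = n for integer n, against a direct evaluation of the beta-binomial entropy; the parameter values below are arbitrary and entropy is in nats.

import math

def c(alpha, j):    # logarithmic difference coefficients (8); c(1, j) is c(j)
    return -sum(math.comb(j - 1, k) * (-1)**k * math.log(k + alpha) for k in range(j))

def lbeta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def rising(a, j):
    out = 1.0
    for i in range(j):
        out *= a + i
    return out

n, a, b = 8, 2.5, 1.3

# direct entropy of BBin(k; n, a, b)
direct = 0.0
for k in range(n + 1):
    pk = math.comb(n, k) * math.exp(lbeta(k + a, n - k + b) - lbeta(a, b))
    direct -= pk * math.log(pk)

# series (18)
series = -math.log(n) - lbeta(n, a + b)
for j in range(1, n + 1):
    series += (math.comb(n, j) * (-1)**j / rising(a + b, j)
               * ((c(1, j) - c(a, j)) * rising(a, j)
                  + (c(1, j) - c(b, j)) * rising(b, j)))

print(direct, series)   # the two values should coincide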

12 7 Entropy of the negative binomial distribution

Recall that the negative binomial distribution is defined by the probability mass function NBin(k; r, p) in (12), for parameters r > 0 and p ∈ (0, 1), and has mean pr/(1 − p). When r is an integer, the distribution captures the sum of r independent, identically distributed geometric random variables. The moment generating function of the distribution is given by

\[
M(t) = \left(\frac{1-p}{1-pe^{t}}\right)^{r}.
\]

Let X be a negative binomial random variable with parameters r and p, and define ENBin(r, p, α) := E[log Γ(X + α)]. Using this notation, we may write

\[
H(X) = -r\log(1-p) - \frac{pr\log p}{1-p} + \log\Gamma(r) + E_{\mathrm{NBin}}(r,p,1) - E_{\mathrm{NBin}}(r,p,r)
\]
\[
= \frac{r\,h(p)}{1-p} + \log\Gamma(r) + E_{\mathrm{NBin}}(r,p,1) - E_{\mathrm{NBin}}(r,p,r). \tag{41}
\]

Let q := p/(1 − p). We now consider the function Q(z) = M(log(z + 1)), which can be written as

\[
Q(z) = \left(\frac{1-p}{1-p(z+1)}\right)^{r} = \left(\frac{1}{1-qz}\right)^{r} = \sum_{j=0}^{\infty}\binom{-r}{j}(-qz)^j = \sum_{j=0}^{\infty}\binom{j+r-1}{j}(qz)^j, \tag{42}
\]
and thus the jth power series coefficient of Q(z) is q(j) = \binom{j+r-1}{j}q^j; i.e., the jth factorial moment of the distribution is equal to \binom{j+r-1}{j}\,j!\,q^j. We are now ready to apply Theorem 1 and conclude, using (3), that

\[
E_{\mathrm{NBin}}(r,p,\alpha) = \log\Gamma(\alpha) + \sum_{j=1}^{\infty}\binom{j+r-1}{j}(-q)^j\,c_\alpha(j). \tag{43}
\]

Plugging this result into (41), we thus have verified (22):

\[
H(X) = \frac{r\,h(p)}{1-p} + \sum_{j=1}^{\infty}\binom{j+r-1}{j}\left(\frac{p}{p-1}\right)^{j}\bigl(c(j)-c_r(j)\bigr).
\]
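As a numerical sanity check of (22) (an addition to the text; the parameter values below are arbitrary, with p < 1/2 so that a short truncation of the series suffices), the following Python snippet compares the truncated series with a direct evaluation of the negative binomial entropy, in nats.

import math

def c(alpha, j):
    return -sum(math.comb(j - 1, k) * (-1)**k * math.log(k + alpha) for k in range(j))

r, p = 2.5, 0.3
h = -p * math.log(p) - (1 - p) * math.log(1 - p)

def pmf(k):     # NBin(k; r, p) for possibly non-integer r, via log-gamma
    return math.exp(math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
                    + k * math.log(p) + r * math.log(1 - p))

direct = -sum(pmf(k) * math.log(pmf(k)) for k in range(300))

series = r * h / (1 - p)
for j in range(1, 80):
    coeff = math.exp(math.lgamma(j + r) - math.lgamma(j + 1) - math.lgamma(r))
    series += coeff * (p / (p - 1))**j * (c(1, j) - c(r, j))

print(direct, series)   # the two values should agree closely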

Notice that, when r = 1, the series (22) reduces to H(X) = h(p)/(1 − p), which is the entropy of a geometric distribution. Let us now consider the logarithmic expectation of the distribution. Define E′NBin(r, p, α) := E[log(X + α)]. Similarly to ENBin, we can use (6) to expand E′NBin as

\[
E'_{\mathrm{NBin}}(r,p,\alpha) = -\sum_{j=1}^{\infty}\binom{j+r-2}{j-1}(-q)^{j-1}\,c_\alpha(j) = \frac{1}{q}\sum_{j=0}^{\infty}\binom{j+r-2}{j-1}(-q)^{j}\,c_\alpha(j). \tag{44}
\]

When r = 1, this reduces to the logarithmic expectation of a geometric distribution. Let us write

\[
E_{\mathrm{Geom}}(p,\alpha) := E'_{\mathrm{NBin}}(1,p,\alpha) \overset{(44)}{=} \frac{1}{q}\sum_{j=0}^{\infty}(-q)^j c_\alpha(j) \overset{(32)}{=} \frac{1}{q}\,C_\alpha(-q) \tag{45}
\]
\[
\overset{(33)}{=} -\frac{1}{1+q}\,\Phi'_\alpha\!\left(\frac{q}{1+q}\right) = (p-1)\,\Phi'_\alpha(p), \tag{46}
\]
where Cα(·) is the generating function of the logarithmic difference coefficients defined in (32) and Φ′α is the derivative of the Lerch transcendent (and of the polylogarithm for α = 1) defined in (30) and (31). We have thus shown the following characterization of the generating function of the logarithmic difference coefficients:

Corollary 5. Let EGeom(p, α) := E[log(X +α)] where X ≥ 0 is geometrically distributed with mean q = p/(1 − p). Then, qEGeom(p, α) = Cα(−q), where Cα(·) is the generating function of the coefficients cα(j) defined in (8). By combining Corollary 5, (33), and (36), and simple manipulations, we arrive at the following curious result:

Theorem 6. Let EGeom(p) := E[log(X + 1)], where X ≥ 0 is geometrically distributed with mean p/(1 − p). Similarly, define EPoi(λ) := E[log Y !], where Y is a Poisson-distributed random variable with mean λ. Then, the Laplace transform of EPoi is given by
\[
\mathcal{L}\{E_{\mathrm{Poi}}\}(z) = \frac{1}{z^2}\,E_{\mathrm{Geom}}\!\left(\frac{1}{1+z}\right).
\]
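The following spot check of Theorem 6 at a single point z (an addition to the text, assuming SciPy is available) computes the left-hand side by brute-force numerical integration and the right-hand side by direct summation over the geometric distribution.

import math
from scipy.integrate import quad

def E_poi(lam):        # E[log X!] for X ~ Poisson(lam), by direct summation
    s, pk = 0.0, math.exp(-lam)
    for k in range(int(lam + 20 * math.sqrt(lam)) + 40):
        s += pk * math.lgamma(k + 1)
        pk *= lam / (k + 1)
    return s

def E_geom(p):         # E[log(X+1)] for X geometric on {0,1,...} with mean p/(1-p)
    return sum((1 - p) * p**k * math.log(k + 1) for k in range(500))

z = 2.0
lhs, _ = quad(lambda lam: E_poi(lam) * math.exp(-z * lam), 0, 40)
rhs = E_geom(1.0 / (1 + z)) / z**2
print(lhs, rhs)        # should agree to several decimal places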

In order to derive the integral representation for H(X) given in (23), we may apply Theorem 1 (specifically, (5)) with the choice of Q(z) in (42) to write, recalling that E[X] = qr = rp/(1 − p),

\[
E_{\mathrm{NBin}}(r,p,\alpha) = \log\Gamma(\alpha) + qr\log\alpha - \int_0^1 \frac{(1-z)^{\alpha-1}}{z\log(1-z)}\bigl((1+qz)^{-r} + qrz - 1\bigr)\,dz,
\]
and thus, plugging the above into (41),

\[
H(X) = \frac{r\,h(p)}{1-p} + \log\Gamma(r) - \int_0^1 \frac{(1+qz)^{-r} + qrz - 1}{z\log(1-z)}\,dz
\]
\[
\quad - \log\Gamma(r) - qr\log r + \int_0^1 \frac{(1-z)^{r-1}\bigl((1+qz)^{-r} + qrz - 1\bigr)}{z\log(1-z)}\,dz
\]
\[
= \frac{r\,(h(p) - p\log r)}{1-p} + \int_0^1 \frac{\bigl((1-z)^{r-1}-1\bigr)\bigl((1+pz/(1-p))^{-r} + prz/(1-p) - 1\bigr)}{z\log(1-z)}\,dz,
\]
which recovers (23).

8 Entropy of the hypergeometric distribution

The hypergeometric distribution is regarded as an analogue of the binomial distribution, with the difference that the Bernoulli trials are performed without replacement. Namely, the parameters n, N, K are respectively regarded as the number of trials, the population size, and the number of success states in the population. For each trial, an item is drawn, uniformly at random, from the population, in which initially K specific items are marked as "success states". The trial results in a success if a success state is drawn, and is otherwise a failure. After each trial, the drawn item is discarded from the population, and the resulting random variable counts the number of successful trials. The probability mass function for this distribution is given in (13). The distribution contains the binomial distribution (and thus, the Poisson distribution) as a limiting case when the ratio K/N = p is a given success probability and N tends to infinity. While the parameters N, K, n are normally set to be integers, the distribution (over k = 0, 1, . . .) still normalizes to a total mass of 1 and is thus well defined even if the parameters are chosen to be non-integral, in which case the binomial coefficients should be understood in terms of the beta function. As in the beta-binomial distribution, the moment generating function for the hypergeometric distribution involves the hypergeometric function, and is given by the expression
\[
M(t) = \binom{N-K}{n}\,{}_2F_1(-n,-K;\,N-K-n+1;\,e^t)\Big/\binom{N}{n}.
\]
Thus, in this case, the function Q(z) = M(log(z + 1)) is equal to
\[
Q(z) = 1 + \sum_{j=1}^{\infty} q(j)z^j = \binom{N-K}{n}\,{}_2F_1(-n,-K;\,N-K-n+1;\,1+z)\Big/\binom{N}{n}.
\]
Despite the seemingly complicated expression, the factorial moments of the distribution, namely j!\,q(j), are known to have a simple form [Pot53], so that
\[
q(j) = \frac{(K)_j\,(n)_j}{(N)_j\; j!},
\]
where (a)_b = \binom{a}{b}\,b! denotes the falling factorial. We are now ready to apply Theorem 1 to show that, letting X be drawn from the hypergeometric distribution with probability mass function (13),
\[
E_{N,K,n} := \mathbb{E}[\log X!] = \sum_{j=2}^{\infty}(-1)^j q(j)\,c(j) = \sum_{j=2}^{\infty}(-1)^j c(j)\,\frac{(K)_j\,(n)_j}{(N)_j\; j!}. \tag{47}
\]

The entropy of X can be directly expressed from (13) as
\[
H(X) = \log\binom{N}{n} - \log K! - \log(N-K)! + \mathbb{E}\bigl[\log X! + \log(n-X)! + \log(K-X)! + \log(N-K-n+X)!\bigr]. \tag{48}
\]
We now use the following basic symmetries of the distribution:
\[
\mathrm{HG}(k;N,K,n) = \mathrm{HG}(n-k;N,N-K,n) = \mathrm{HG}(K-k;N,K,N-n) = \mathrm{HG}(N-K-n+k;N,N-K,N-n),
\]
where the last identity is obtained by combining the first two. This, in turn, implies that
\[
\mathbb{E}[\log(n-X)!] = E_{N,N-K,n}, \qquad \mathbb{E}[\log(K-X)!] = E_{N,K,N-n}, \qquad \mathbb{E}[\log(N-K-n+X)!] = E_{N,N-K,N-n}.
\]

Plugging these back into (48), combined with (47), leads to the following series for the entropy of the hypergeometric distribution:
\[
H(X) = \log\binom{N}{n} - \log K! - \log(N-K)!
\]
\[
\quad + \sum_{j=2}^{\infty}\frac{(-1)^j c(j)}{(N)_j\; j!}\Bigl((K)_j(n)_j + (N-K)_j(n)_j + (K)_j(N-n)_j + (N-K)_j(N-n)_j\Bigr),
\]
proving (24).

9 Discussion

In this work, we studied a general method for deriving series expansions and integral representations for logarithmic and log-gamma expectations of arbitrary distributions. As a result, we obtained entropy expressions for several fundamental distributions, including the Poisson, binomial, beta-binomial, negative binomial, and hypergeometric distributions. It is natural to ask whether the technique can be extended to derive clean expressions for the entropy of other distributions. Another natural direction is whether the techniques can be used to obtain clean, general, and high-precision estimates of the entropy functions in terms of elementary functions. We have also discovered connections between logarithmic expectations of different distributions, and moreover, connections between them and generalizations of the Riemann zeta function via the Laplace transform. An intriguing question is whether such curious connections with functional transforms are isolated facts or can be further developed into a richer theory. Finally, our work calls for further study and a better understanding of the logarithmic difference coefficients cα(j), which are also of interest in the analytic study of zeta functions.

Acknowledgement

The author thanks an anonymous reviewer for comments on related works [Kne98,JS99,CGKK13].

References

[AW12] R. Atar and T. Weissman, Mutual information, relative entropy, and estimation in the Poisson channel, IEEE Transactions on Information Theory 58 (2012), no. 3, 1302–1318.
[CGKK13] J. Cichoń, Z. Gołębiewski, M. Kardas, and M. Klonowski, On delta-method of moments and probabilistic sums, Proceedings of the Tenth Workshop on Analytic Algorithmics and Combinatorics (ANALCO), 2013, pp. 91–98.
[Che18] M. Cheraghchi, Capacity upper bounds for deletion-type channels, Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), 2018, pp. 493–506.
[CR18a] M. Cheraghchi and J. Ribeiro, Improved capacity upper bounds for the discrete-time Poisson channel, Proceedings of the IEEE International Symposium on Information Theory, 2018.
[CR18b] M. Cheraghchi and J. Ribeiro, Sharp analytical capacity upper bounds for sticky and related channels, CoRR abs/1806.06218 (2018).
[CT06] T. M. Cover and J. A. Thomas, Elements of information theory, Second edition, John Wiley and Sons, 2006.
[DS12] M. H. DeGroot and M. J. Schervish, Probability and statistics, Fourth edition, Addison-Wesley, 2012.
[DV98] A. G. D'yachkov and P. A. Vilenkin, Asymptotics of the Shannon and Renyi entropies for sums of independent random variables, Proceedings of the IEEE International Symposium on Information Theory, 1998, p. 376.

[EB88] R. J. Evans and J. Boersma, The entropy of a Poisson distribution (C. Robert Appledorn), SIAM Review 30 (1988), no. 2, 314–317.
[EMOT53] A. Erdélyi, W. Magnus, F. Oberhettinger, and F. G. Tricomi, Higher transcendental functions, Vol. 1, McGraw-Hill, 1953.
[Has30] H. Hasse, Ein Summierungsverfahren für die Riemannsche ζ-Reihe, Mathematische Zeitschrift 32 (1930), no. 1, 458–464.
[JS99] P. Jacquet and W. Szpankowski, Entropy computations via analytic depoissonization, IEEE Transactions on Information Theory 45 (1999), no. 4, 1072–1081.
[Kne98] C. Knessl, Integral representations and asymptotic expansions for Shannon and Renyi entropies, Applied Mathematics Letters 11 (1998), no. 2, 69–74.
[Knu73] D. Knuth, The art of computer programming, Vol. 3, Addison-Wesley, 1973.
[Mar07] A. Martinez, Spectral efficiency of optical direct detection, JOSA B 24 (2007), no. 4, 739–749.
[Mit08] M. Mitzenmacher, A survey of results for deletion channels and related synchronization channels, Proceedings of SWAT 2008: 11th Scandinavian Workshop on Algorithm Theory, 2008, pp. 1–3.
[Pot53] R. B. Potts, Note on the factorial moments of standard distributions, Australian Journal of Physics 6 (1953), 498.
[Sha90] S. Shamai, Capacity of a pulse amplitude modulated direct detection photon channel, IEE Proceedings I - Communications, Speech and Vision 137 (1990), no. 6, 424–430.
[Ver08] S. Verdú, Shannon and Poisson, Symposium on Foundations of Wireless Networks and Beyond, 2008.
[Ver99] S. Verdú, Poisson communication theory, The International Technion Communication Day in honor of Israel Bar-David, 1999. Invited talk.

A Connection between the cα(j), the Riemann zeta function, and Harmonic numbers

In this appendix, we observe a connection between the logarithmic difference coefficients and the Riemann zeta function, leading to an interesting formula for a harmonically weighted summation of the coefficients. P∞ The following convergent Newton expansion series for Hurwitz zeta function ζ(s, α) := n=0(n+ α)−s (which reduces to the Riemann zeta function ζ(s) at α = 1) was given by Hasse [Has30]: For all α > 0 and s 6= 1,

\[
\zeta(s,\alpha)\,(s-1) = \sum_{j=1}^{\infty}\frac{1}{j}\sum_{k=0}^{j-1}\binom{j-1}{k}(-1)^k (k+\alpha)^{1-s}. \tag{49}
\]

Taking the derivative of the above in s, and denoting ζ′(s, α) := \frac{d}{ds}\zeta(s,\alpha), gives

\[
\zeta(s,\alpha) + (s-1)\,\zeta'(s,\alpha) = -\sum_{j=1}^{\infty}\frac{1}{j}\sum_{k=0}^{j-1}\binom{j-1}{k}(-1)^k (k+\alpha)^{1-s}\log(k+\alpha).
\]

Observe, from (8), that the inner summation on the right-hand side at s = 1 equals −cα(j), so that the right-hand side tends to \sum_{j=1}^{\infty} c_\alpha(j)/j as s → 1. The limit of the left-hand side of the above equality at s = 1 can be deduced from the first two terms of the Laurent series expansion of the Hurwitz zeta function,
\[
\zeta(s,\alpha) = \frac{1}{s-1} - \psi(\alpha) + \sum_{n=1}^{\infty} a_n (s-1)^n,
\]
and is thus equal to −ψ(α), where ψ(α) = \frac{d}{d\alpha}\log\Gamma(\alpha) = H_{\alpha-1} − γ is the digamma function, γ ≈ 0.57721 is the Euler-Mascheroni constant, and H_k is the kth Harmonic number. When α is

a positive integer, we thus have that the left-hand side at s → 1 is equal to γ − \sum_{i=1}^{\alpha-1}\frac{1}{i} (and to γ when α = 1). We conclude the following identity on the logarithmic difference coefficients: for all α > 0,
\[
\sum_{j=1}^{\infty}\frac{c_\alpha(j)}{j} = -\psi(\alpha) = \gamma - H_{\alpha-1}. \tag{50}
\]

Note that the left-hand side can be rewritten as \int_0^t \frac{C_\alpha(z)}{z}\,dz\,\Big|_{t=1}, where Cα(z) is the generating function of the cα(j) defined in (32).
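As a slowly convergent numerical illustration of (50) at α = 1 (an addition to the text, assuming SciPy is available), the partial sums of \sum_j c(j)/j, with c(j) evaluated via the integral representation (27), approach the Euler-Mascheroni constant γ; the truncation error after J terms is of order 1/(J log J).

import math
from scipy.integrate import quad

def c1(j):   # c(j) = c_1(j) via the integral representation (27), valid for j > 1
    val, _ = quad(lambda t: (1 - math.exp(-t))**(j - 1) * math.exp(-t) / t,
                  0, math.inf, limit=300, epsabs=1e-12, epsrel=1e-10)
    return val

J = 2000
partial = sum(c1(j) / j for j in range(2, J + 1))    # c(1) = 0, so start at j = 2
print(partial, 0.5772156649015329)   # gamma; should agree to within roughly 1e-4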
