Chapter 8: Differential Entropy

Chapter 8 outline

• Motivation
• Definitions
• Relation to discrete entropy
• Joint and conditional differential entropy
• Relative entropy and mutual information
• Properties
• AEP for Continuous Random Variables

Motivation

• Our goal is to determine the capacity of an AWGN channel

[Figure: AWGN channel model, Y = hX + N, with Gaussian noise N ~ N(0, P_N); a wireless channel with fading is sketched as the channel gain h varying over time.]

Motivation

• Our goal is to determine the capacity of an AWGN channel


$C = \frac{1}{2}\log\frac{|h|^2 P + P_N}{P_N} = \frac{1}{2}\log(1 + \mathrm{SNR})$ (bits/channel use)
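A quick numerical sketch of this formula (my own illustration, not course code; the parameter names P, P_N, and h below are assumptions):

```python
import numpy as np

def awgn_capacity(P, P_N, h=1.0):
    """C = 0.5 * log2(1 + |h|^2 * P / P_N) in bits per channel use."""
    snr = (abs(h) ** 2) * P / P_N
    return 0.5 * np.log2(1.0 + snr)

# SNR = 10 (10 dB) gives about 1.73 bits per channel use.
print(awgn_capacity(P=1.0, P_N=0.1))
```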

Motivation

• Need to define entropy and mutual information between CONTINUOUS random variables

• Can you guess?

• Discrete X, p(x): $H(X) = -\sum_{x} p(x)\log p(x)$

• Continuous X, f(x): $h(X) = -\int f(x)\log f(x)\,dx$ (a quick numerical check follows below)
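A small sanity check on the two definitions, assuming example distributions of my own choosing (not from the slides): discrete entropy as a finite sum, and the differential entropy of N(0,1) by numerical integration.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def discrete_entropy_bits(p):
    """H(X) = -sum_x p(x) log2 p(x), with the 0 log 0 = 0 convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def differential_entropy_gaussian_bits(sigma):
    """h(X) = -integral f(x) log2 f(x) dx, approximated by quadrature."""
    f = lambda x: norm.pdf(x, scale=sigma)
    val, _ = integrate.quad(lambda x: -f(x) * np.log2(f(x)), -12 * sigma, 12 * sigma)
    return val

print(discrete_entropy_bits([0.5, 0.25, 0.25]))      # 1.5 bits
print(differential_entropy_gaussian_bits(1.0))       # ~2.047 bits = 0.5*log2(2*pi*e)
```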

Definitions - densities

Properties - densities


Quantized random variables

an interpretation of the differential entropy: It is the logarithm of the equivalent side length of the smallest set that contains most of the probability. Hence low entropy implies that the random variable is confined to a small effective volume and high entropy indicates that the random variable is widely dispersed.
Note. Just as the entropy is related to the volume of the typical set, there is a quantity called Fisher information which is related to the surface area of the typical set. We discuss Fisher information in more detail in Sections 11.10 and 17.8.

8.3 RELATION OF DIFFERENTIAL ENTROPY TO DISCRETE ENTROPY

Consider a random variable X with density f(x) illustrated in Figure 8.1. Suppose that we divide the range of X into bins of length $\Delta$. Let us assume that the density is continuous within the bins. Then, by the mean value theorem, there exists a value $x_i$ within each bin such that

$f(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta} f(x)\,dx.$ (8.23)

Consider the quantized random variable $X^\Delta$, which is defined by

$X^\Delta = x_i \quad \text{if } i\Delta \le X < (i+1)\Delta.$ (8.24)

FIGURE 8.1. Quantization of a continuous random variable. [Sketch: density f(x) over x, partitioned into bins of width $\Delta$.]
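The point of this construction is the relation $H(X^\Delta) + \log\Delta \to h(X)$ as $\Delta \to 0$ (equivalently (8.84) in the chapter summary, $H([X]_{2^{-n}}) \approx h(X) + n$). A simulation sketch for a standard Gaussian, with a binning grid of my own choosing:

```python
import numpy as np
from scipy.stats import norm

def quantized_entropy_bits(delta, sigma=1.0, span=12.0):
    """Entropy of X^Delta where X ~ N(0, sigma^2) is binned into cells of width delta."""
    edges = np.arange(-span * sigma, span * sigma + delta, delta)
    probs = np.diff(norm.cdf(edges, scale=sigma))   # P(i*delta <= X < (i+1)*delta)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

h_exact = 0.5 * np.log2(2 * np.pi * np.e)           # h(N(0,1)) ~ 2.047 bits
for n_bits in [1, 2, 4, 8]:
    delta = 2.0 ** (-n_bits)
    approx = quantized_entropy_bits(delta) + np.log2(delta)
    print(n_bits, round(approx, 4), round(h_exact, 4))   # approx converges to h_exact
```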


Differential entropy - definition

Examples

[Figure: a density f(x) plotted against x, with support from a to b.]
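If the sketched density is the uniform distribution on [a, b] (an assumption on my part, though it is the standard first example), then $h(X) = \log_2(b-a)$ bits, which can be zero or negative when $b - a \le 1$. A quick check, with interval widths chosen by me:

```python
import numpy as np

# Uniform on [a, b]: f(x) = 1/(b - a), so h(X) = -E[log2 f(X)] = log2(b - a).
for width in [8.0, 1.0, 0.125]:
    print(width, np.log2(width))   # 3.0, 0.0, -3.0 bits
```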

Examples
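Assuming the second worked example is the Gaussian N(0, σ²), for which $h(X) = \frac{1}{2}\log_2(2\pi e\sigma^2)$ bits (cf. (8.85) in the chapter summary), a Monte Carlo estimate of $E[-\log_2 f(X)]$ agrees with the closed form. My own sketch:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma = 2.0
x = rng.normal(0.0, sigma, size=500_000)
monte_carlo = np.mean(-np.log2(norm.pdf(x, scale=sigma)))
closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma ** 2)
print(monte_carlo, closed_form)    # both ~3.047 bits
```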

Differential entropy - the good the bad and the ugly


Differential entropy - multiple RVs

Differential entropy of a multi-variate Gaussian
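The formula here is $h(\mathcal{N}_n(\mu, K)) = \frac{1}{2}\log_2\big((2\pi e)^n |K|\big)$ bits (see (8.86) in the summary). A small sketch with a covariance matrix of my own choosing, which also shows that the joint entropy is at most the sum of the marginal entropies:

```python
import numpy as np

def h_gaussian_bits(K):
    """h(N(mu, K)) = 0.5 * log2((2*pi*e)^n * det K); independent of the mean mu."""
    K = np.atleast_2d(K)
    n = K.shape[0]
    return 0.5 * np.log2(((2 * np.pi * np.e) ** n) * np.linalg.det(K))

K = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(h_gaussian_bits(K))                                       # joint entropy, ~4.50 bits
print(h_gaussian_bits(K[:1, :1]) + h_gaussian_bits(K[1:, 1:]))  # sum of marginals, ~4.59 bits
```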

Parallels with discrete entropy....

Proof (of the inequality $\Pr(X = X') \ge 2^{-H(p)-D(p\|r)}$ for independent $X \sim p$ and $X' \sim r$): We have

$2^{-H(p)-D(p\|r)} = 2^{\sum p(x)\log p(x) + \sum p(x)\log\frac{r(x)}{p(x)}}$ (2.151)
$= 2^{\sum p(x)\log r(x)}$ (2.152)
$\le \sum p(x)\, 2^{\log r(x)}$ (2.153)
$= \sum p(x) r(x)$ (2.154)
$= \Pr(X = X'),$ (2.155)

where the inequality follows from Jensen's inequality and the convexity of the function $f(y) = 2^y$.

The following telegraphic summary omits qualifying conditions.

SUMMARY

Definition The entropy H(X) of a discrete random variable X is defined by

$H(X) = -\sum_{x \in \mathcal{X}} p(x)\log p(x).$ (2.156)

Properties of H
1. $H(X) \ge 0$.
2. $H_b(X) = (\log_b a) H_a(X)$.
3. (Conditioning reduces entropy) For any two random variables, X and Y, we have $H(X|Y) \le H(X)$ (2.157), with equality if and only if X and Y are independent.
4. $H(X_1, X_2, \ldots, X_n) \le \sum_{i=1}^{n} H(X_i)$, with equality if and only if the $X_i$ are independent.
5. $H(X) \le \log|\mathcal{X}|$, with equality if and only if X is distributed uniformly over $\mathcal{X}$.
6. $H(p)$ is concave in p.

Parallels with discrete entropy....

Definition The relative entropy $D(p\|q)$ of the probability mass function p with respect to the probability mass function q is defined by

$D(p\|q) = \sum_x p(x)\log\frac{p(x)}{q(x)}.$ (2.158)

Definition The mutual information between two random variables X and Y is defined as

$I(X;Y) = \sum_{x \in \mathcal{X}}\sum_{y \in \mathcal{Y}} p(x,y)\log\frac{p(x,y)}{p(x)p(y)}.$ (2.159)

Alternative expressions
$H(X) = E_p \log\frac{1}{p(X)},$ (2.160)
$H(X,Y) = E_p \log\frac{1}{p(X,Y)},$ (2.161)
$H(X|Y) = E_p \log\frac{1}{p(X|Y)},$ (2.162)
$I(X;Y) = E_p \log\frac{p(X,Y)}{p(X)p(Y)},$ (2.163)

$D(p\|q) = E_p \log\frac{p(X)}{q(X)}.$ (2.164)

Properties of D and I
1. $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$.
2. $D(p\|q) \ge 0$ with equality if and only if $p(x) = q(x)$ for all $x \in \mathcal{X}$.
3. $I(X;Y) = D(p(x,y)\|p(x)p(y)) \ge 0$, with equality if and only if $p(x,y) = p(x)p(y)$ (i.e., X and Y are independent).
4. If $|\mathcal{X}| = m$, and u is the uniform distribution over $\mathcal{X}$, then $D(p\|u) = \log m - H(p)$.
5. $D(p\|q)$ is convex in the pair (p, q).

Chain rules
Entropy: $H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i|X_{i-1}, \ldots, X_1)$.
Mutual information: $I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y|X_1, X_2, \ldots, X_{i-1})$.

Fano’s inequality. Let Pe Pr X(Y)ˆ X . Then = { ̸= }

H(Pe) Pe log H(X Y). (2.166) + |X| ≥ |

Inequality. If X and X′ are independent and identically distributed, then

H(X) Pr(X X′) 2− , (2.167) = ≥

PROBLEMS

2.1 Coin flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips required. (a) Find the entropy H(X) in bits. The following expressions may be useful:

∞ 1 ∞ r rn , nrn . = 1 r = (1 r)2 n 0 n 0 != − != − (b) A random variable X is drawn according to this distribution. Find an “efficient” sequence of yes–no questions of the form, 42 ENTROPY, RELATIVE ENTROPY, AND MUTUAL INFORMATION

Definition The relative entropy D(p q) of the probability mass function p with respect to the probability∥ mass function q is defined by p(x) D(p q) p(x) log . (2.158) ∥ = x q(x) Definition The mutual information! between two random variables X and Y is defined as p(x, y) I(X Y) p(x, y) log . (2.159) ; = p(x)p(y) x y !∈X !∈Y Alternative expressions 1 H(X) Ep log , (2.160) = p(X) 1 H(X,Y) Ep log , (2.161) = p(X, Y) 1 H(X Y) Ep log , (2.162) | = p(X Y) | p(X, Y) Parallels withI(X Y)discreteEp log entropy...., (2.163) ; = p(X)p(Y) p(X) D(p q) Ep log . (2.164) || = q(X) Properties of D and I 1. I(X Y) H(X) H(X Y) H(Y) H(Y X) H(X) H(Y); H(X,Y).= − | = − | = + − .... 2. D(p q) 0 with equality if and only if p(x) q(x), for all x . ∥ ≥ = ∈ X 3. I(X Y) D(p(x, y) p(x)p(y)) 0, with equality if and only if p(x,; y) =p(x)p(y) (i.e.,|| X and Y≥are independent). = .... 4. If m, and u is the uniform distribution over , then D(p u)| Xlog|=m H(p). X ∥ = − .... 5. D(p q) is convex in the pair (p, q). || .... Chain rules n Entropy: H(X1,X2,...,Xn) i 1 H(Xi Xi 1,...,X1). = | − Mutual information: = n " I(X1,X2,...,Xn Y) i 1 I(Xi Y X1,X2,...,Xi 1). ; = = ; | − "

Parallels with discrete entropy....

Relative entropy: $D(p(x,y)\|q(x,y)) = D(p(x)\|q(x)) + D(p(y|x)\|q(y|x))$.

Jensen's inequality. If f is a convex function, then $Ef(X) \ge f(EX)$.

Log sum inequality. For n positive numbers, $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$,

$\sum_{i=1}^{n} a_i \log\frac{a_i}{b_i} \ge \left(\sum_{i=1}^{n} a_i\right)\log\frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} b_i}$ (2.165)

with equality if and only if $\frac{a_i}{b_i}$ = constant.

Data-processing inequality. If $X \to Y \to Z$ forms a Markov chain, then $I(X;Y) \ge I(X;Z)$.

Sufficient statistic. T(X) is sufficient relative to $\{f_\theta(x)\}$ if and only if $I(\theta; X) = I(\theta; T(X))$ for all distributions on θ.

Fano’s inequality. Let Pe Pr X(Y)ˆ X . Then = { ̸= }

H(Pe) Pe log H(X Y). (2.166) + |X| ≥ |

Inequality. If X and X′ are independent and identically distributed, then $\Pr(X = X') \ge 2^{-H(X)}.$ (2.167)

Differential entropy - the good the bad and the ugly

Relative entropy and mutual information

Properties

A quick example

• Find the mutual information between two correlated Gaussian random variables with correlation coefficient ρ

• What is I(X;Y)?
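For jointly Gaussian X and Y with unit variances and correlation ρ, the answer works out to $I(X;Y) = -\frac{1}{2}\log(1-\rho^2)$. A closed-form check via $I(X;Y) = h(X) + h(Y) - h(X,Y)$ and the Gaussian entropy formula; a sketch under the unit-variance assumption:

```python
import numpy as np

def h_gaussian_bits(K):
    """h(N(mu, K)) in bits for covariance matrix K."""
    K = np.atleast_2d(K)
    n = K.shape[0]
    return 0.5 * np.log2(((2 * np.pi * np.e) ** n) * np.linalg.det(K))

rho = 0.9
K = np.array([[1.0, rho],
              [rho, 1.0]])
I_xy = 2 * h_gaussian_bits(np.array([[1.0]])) - h_gaussian_bits(K)
print(I_xy, -0.5 * np.log2(1 - rho ** 2))   # both ~1.198 bits
```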

More properties of differential entropy


Examples of changes in variables
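One example presumably worked here is scaling: $h(aX) = h(X) + \log|a|$ (see (8.90) in the summary). For a Gaussian, scaling by a multiplies σ by |a|, so the identity can be checked in closed form; a sketch of my own:

```python
import numpy as np

h_gauss = lambda sigma: 0.5 * np.log2(2 * np.pi * np.e * sigma ** 2)   # bits

a, sigma = 3.0, 1.0
print(h_gauss(abs(a) * sigma))             # h(aX), ~3.632 bits
print(h_gauss(sigma) + np.log2(abs(a)))    # h(X) + log2|a|, same value
```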

Concavity and convexity

• Same as for discrete entropy and mutual information....

The AEP for continuous RVs

Here it turns out that $p(X_1, X_2, \ldots, X_n)$ is close to $2^{-nH}$ with high probability. We summarize this by saying, "Almost all events are almost equally surprising." This is a way of saying that

$\Pr\{(X_1, X_2, \ldots, X_n) : p(X_1, X_2, \ldots, X_n) = 2^{-n(H \pm \epsilon)}\} \approx 1$ (3.1)

if $X_1, X_2, \ldots, X_n$ are i.i.d. $\sim p(x)$.

In the example just given, where $p(X_1, X_2, \ldots, X_n) = p^{\sum X_i} q^{n - \sum X_i}$, we are simply saying that the number of 1's in the sequence is close to np (with high probability), and all such sequences have (roughly) the same probability $2^{-nH(p)}$. We use the idea of convergence in probability, defined as follows:

Definition (Convergence of random variables). Given a sequence of random variables, $X_1, X_2, \ldots$, we say that the sequence $X_1, X_2, \ldots$ converges to a random variable X:

1. In probability if for every $\epsilon > 0$, $\Pr\{|X_n - X| > \epsilon\} \to 0$
2. In mean square if $E(X_n - X)^2 \to 0$
3. With probability 1 (also called almost surely) if $\Pr\{\lim_{n\to\infty} X_n = X\} = 1$

3.1 ASYMPTOTIC EQUIPARTITION PROPERTY THEOREM

• The AEP for discrete RVs said.....

The asymptotic equipartition property is formalized in the following theorem.

Theorem 3.1.1 (AEP) If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then

$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \to H(X)$ in probability. (3.2)

Proof: Functions of independent random variables are also independent random variables. Thus, since the $X_i$ are i.i.d., so are $\log p(X_i)$. Hence, by the weak law of large numbers,

$-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n}\sum_i \log p(X_i)$ (3.3)
$\to -E\log p(X)$ in probability (3.4)
$= H(X),$ (3.5)

which proves the theorem.

• The AEP for continuous RVs says.....
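The continuous analog (filled in on the slide) states that $-\frac{1}{n}\log f(X_1, \ldots, X_n) \to h(X)$ in probability for i.i.d. $X_i \sim f$, consistent with (8.82) in the chapter summary. A simulation sketch for a standard Gaussian, my own illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
h_true = 0.5 * np.log2(2 * np.pi * np.e)     # h(N(0,1)) in bits

for n in [10, 100, 10_000]:
    x = rng.normal(size=n)
    # For i.i.d. samples, -(1/n) log2 f(x^n) is the mean of -log2 f(x_i).
    empirical = -np.mean(np.log2(norm.pdf(x)))
    print(n, round(empirical, 3), round(h_true, 3))
```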

Typical sets

• One of the points of the AEP is to define typical sets.

• Typical set for discrete RVs...

• Typical set of continuous RVs....

Typical sets and volumes
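The volume statement is $\mathrm{Vol}(A_\epsilon^{(n)}) \doteq 2^{nh(X)}$ (see (8.83) in the summary), so $2^{h(X)}$ acts as an effective support width per dimension. A tiny closed-form sketch with examples of my choosing:

```python
import numpy as np

h_gauss_1 = 0.5 * np.log2(2 * np.pi * np.e)   # h(N(0,1)) ~ 2.047 bits
print(2 ** h_gauss_1)                         # effective width ~4.13 per dimension

h_unif_4 = np.log2(4.0)                       # uniform on [0, 4]
print(2 ** h_unif_4)                          # effective width exactly 4
```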

Maximum entropy distributions

• For a discrete random variable taking on K values, what distribution maximizes the entropy?

• Can you think of a continuous counterpart? (a numerical comparison follows below)

[Look ahead to Ch.12, pg. 409-412]
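For the discrete question the maximizer is the uniform distribution, with H = log K; for the continuous counterpart under a variance constraint it is the Gaussian (cf. (8.92) in the summary and Ch. 12). A closed-form spot check against two other unit-variance densities, a comparison of my own:

```python
import numpy as np

e = np.e
h_gaussian = 0.5 * np.log2(2 * np.pi * e)     # N(0,1):                          ~2.047 bits
h_uniform  = np.log2(np.sqrt(12.0))           # uniform, width sqrt(12), var 1:  ~1.792 bits
h_laplace  = np.log2(2 * e / np.sqrt(2.0))    # Laplace, scale 1/sqrt(2), var 1: ~1.943 bits
print(h_gaussian, h_uniform, h_laplace)       # the Gaussian is largest
```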

Maximum entropy distributions

[Look ahead to Ch.12, pg. 409-412]

Maximum entropy examples

Prove 2 ways!

Maximum entropy examples

Prove 2 ways!

Estimation error and differential entropy

• A counterpart to Fano's inequality for discrete RVs...

calculate a function $g(Y) = \hat{X}$, where $\hat{X}$ is an estimate of X and takes on values in $\hat{\mathcal{X}}$. We will not restrict the alphabet $\hat{\mathcal{X}}$ to be equal to $\mathcal{X}$, and we will also allow the function g(Y) to be random. We wish to bound the probability that $\hat{X} \ne X$. We observe that $X \to Y \to \hat{X}$ forms a Markov chain. Define the probability of error

$P_e = \Pr\{\hat{X} \ne X\}.$ (2.129)

Theorem 2.10.1 (Fano's Inequality) For any $\hat{X}$ such that $X \to Y \to \hat{X}$, with $P_e = \Pr(X \ne \hat{X})$, we have

$H(P_e) + P_e \log|\mathcal{X}| \ge H(X|\hat{X}) \ge H(X|Y).$ (2.130)

This inequality can be weakened to

$1 + P_e \log|\mathcal{X}| \ge H(X|Y)$ (2.131)

or

$P_e \ge \frac{H(X|Y) - 1}{\log|\mathcal{X}|}.$ (2.132)

Why can't we use Fano's?

Remark Note from (2.130) that $P_e = 0$ implies that $H(X|Y) = 0$, as intuition suggests.

Proof: We first ignore the role of Y and prove the first inequality in (2.130). We will then use the data-processing inequality to prove the more traditional form of Fano’s inequality, given by the second inequality in (2.130). Define an error random variable,

$E = \begin{cases} 1 & \text{if } \hat{X} \ne X, \\ 0 & \text{if } \hat{X} = X. \end{cases}$ (2.133)

Then, using the chain rule for entropies to expand $H(E, X|\hat{X})$ in two different ways, we have

$H(E, X|\hat{X}) = H(X|\hat{X}) + H(E|X, \hat{X})$ (2.134)
$= H(E|\hat{X}) + H(X|E, \hat{X}).$ (2.135)

(Here $H(E|X,\hat{X}) = 0$, $H(E|\hat{X}) \le H(P_e)$, and $H(X|E,\hat{X}) \le P_e\log|\mathcal{X}|$.)

Since conditioning reduces entropy, $H(E|\hat{X}) \le H(E) = H(P_e)$. Now since E is a function of X and $\hat{X}$, the conditional entropy $H(E|X,\hat{X})$ is 0.

Estimation error and differential entropy
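The continuous counterpart (also the last line of the chapter summary) is $E(X - \hat{X}(Y))^2 \ge \frac{1}{2\pi e}e^{2h(X|Y)}$, with equality for Gaussian X and the conditional-mean estimator. A sketch for $X \sim N(0, \sigma^2)$ with no useful side information, so that $h(X|Y) = h(X)$; the numbers are my own example:

```python
import numpy as np

sigma = 1.5
h_nats = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)   # h(X) in nats
lower_bound = np.exp(2 * h_nats) / (2 * np.pi * np.e)
print(lower_bound, sigma ** 2)   # bound equals sigma^2; met with equality by Xhat = E[X] = 0
```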

Summary

SUMMARY

$h(X) = h(f) = -\int_S f(x)\log f(x)\,dx$ (8.81)
$f(X^n) \doteq 2^{-nh(X)}$ (8.82)
$\mathrm{Vol}(A_\epsilon^{(n)}) \doteq 2^{nh(X)}.$ (8.83)
$H([X]_{2^{-n}}) \approx h(X) + n.$ (8.84)
$h(\mathcal{N}(0, \sigma^2)) = \frac{1}{2}\log 2\pi e\sigma^2.$ (8.85)

$h(\mathcal{N}_n(\mu, K)) = \frac{1}{2}\log(2\pi e)^n |K|.$ (8.86)
$D(f\|g) = \int f\log\frac{f}{g} \ge 0.$ (8.87)
$h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i|X_1, X_2, \ldots, X_{i-1}).$ (8.88)
$h(X|Y) \le h(X).$ (8.89)
$h(aX) = h(X) + \log|a|.$ (8.90)
$I(X;Y) = \int f(x,y)\log\frac{f(x,y)}{f(x)f(y)} \ge 0.$ (8.91)
$\max_{EXX^t = K} h(X) = \frac{1}{2}\log(2\pi e)^n|K|.$ (8.92)
$E(X - \hat{X}(Y))^2 \ge \frac{1}{2\pi e} e^{2h(X|Y)}.$

$2^{nH(X)}$ is the effective alphabet size for a discrete random variable. $2^{nh(X)}$ is the effective support set size for a continuous random variable. $2^C$ is the effective alphabet size of a channel of capacity C.
