Entropy and Mutual Information (Continuous Random Variables)
Máster Universitario en Ingeniería de Telecomunicación
I. Santamaría, Universidad de Cantabria
Contents
Introduction
Differential Entropy
Joint and Conditional Differential Entropy
Relative Entropy
Mutual Information
Introduction
- We introduce the (differential) entropy and mutual information of continuous random variables
- We will need these concepts, for instance, to determine the capacity of the AWGN channel
- Some important differences appear with respect to the case of discrete random variables:
  - Continuous random variables ⟹ differential entropy (strictly speaking, not entropy)
  - Unlike the entropy of a discrete random variable, the differential entropy of a continuous random variable can be negative
  - It does not give the average information in X
- The relative entropy and mutual information concepts extend to the continuous case in a straightforward manner, and convey the same information
Definitions

Let X be a continuous random variable with cumulative distribution function

    F(x) = Pr{X ≤ x},

and probability density function (pdf)

    f(x) = dF(x)/dx.

F(x) and f(x) are both assumed to be continuous functions.

Definition: The differential entropy h(X) of a continuous random variable X with pdf f(x) is defined as

    h(X) = −∫ f(x) log f(x) dx,

where the integration is carried out over the support of the random variable.
Example 1: Entropy of a uniform distribution, X ∼ U(a, b)

    f(x) = 1/(b − a),   a ≤ x ≤ b

    h(X) = −∫ f(x) log f(x) dx = −∫ₐᵇ (1/(b − a)) log(1/(b − a)) dx = log(b − a)

Note that h(X) < 0 if (b − a) < 1
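As a quick numerical sanity check, the sketch below (assuming NumPy; the helper `differential_entropy_mc` is hypothetical, not from the course material) estimates h(X) through the identity h(X) = E[−log f(X)] and compares it with log(b − a):

```python
import numpy as np

def differential_entropy_mc(samples, pdf, base=2.0):
    """Monte Carlo estimate of h(X) = E[-log f(X)], in the given log base."""
    return np.mean(-np.log(pdf(samples))) / np.log(base)

rng = np.random.default_rng(0)
a, b = 0.0, 0.5  # width b - a < 1, so the differential entropy is negative

samples = rng.uniform(a, b, size=100_000)
uniform_pdf = lambda x: np.full_like(x, 1.0 / (b - a))

h_mc = differential_entropy_mc(samples, uniform_pdf)
h_exact = np.log2(b - a)  # log(b - a) in bits; here log2(0.5) = -1
print(h_mc, h_exact)
```

Since −log f(x) is constant on the support, the Monte Carlo average coincides with the exact value, and it is indeed negative here.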
Example 2: Entropy of a normal distribution, X ∼ N(0, σ²)

    f(x) = (1/√(2πσ²)) e^(−x²/(2σ²))

    h(X) = −∫ f(x) log f(x) dx
         = −∫ f(x) [−(1/2) log(2πσ²) − (x²/(2σ²)) log e] dx
         = (1/2) log(2πσ²) + (σ²/(2σ²)) log e
         = (1/2) log(2πeσ²)

    h(X) = (1/2) log(2πeσ²)
4
3
2
1
h(X) 0
-1
-2
-3 0 2 4 6 8 10 <2
It is a concave function of σ2
Maximum entropy distribution

For a fixed variance (E[X²] = σ²), the normal distribution is the pdf that maximizes the differential entropy:

    maximize over f   −∫ f(x) log f(x) dx
    subject to        f(x) ≥ 0,
                      ∫ f(x) dx = 1,
                      ∫ x² f(x) dx = σ².

This is a convex optimization problem (the entropy is a concave functional of f) whose solution is

    f(x) = (1/√(2πσ²)) e^(−x²/(2σ²))
This result will be important later
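To illustrate the maximum-entropy property, this sketch (assuming NumPy) compares the standard closed-form differential entropies, in bits, of three densities tuned to the same variance; the Gaussian comes out largest:

```python
import numpy as np

# For a fixed variance sigma^2, closed-form differential entropies (bits):
#   Gaussian:                        (1/2) log2(2*pi*e*sigma^2)
#   Uniform of width w, var w^2/12:  log2(w)
#   Laplace of scale b, var 2b^2:    log2(2*b*e)
sigma2 = 1.0

h_gauss = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
w = np.sqrt(12 * sigma2)       # uniform width matching the variance
h_unif = np.log2(w)
b = np.sqrt(sigma2 / 2)        # Laplace scale matching the variance
h_lap = np.log2(2 * b * np.e)

print(h_gauss, h_unif, h_lap)  # the Gaussian entropy is the largest
```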
Densities

Given two random variables X and Y, we have:

- Joint pdf: f(x, y)
- Marginal pdfs: f(x) = ∫ f(x, y) dy   and   f(y) = ∫ f(x, y) dx
- Conditional pdf: f(x|y) = f(x, y)/f(y)

Independence: f(x, y) = f(x) f(y)
Joint and conditional entropy

Definition: Let X = (X₁, ..., X_N)ᵀ be an N-dimensional random vector with density f(x) = f(x₁, ..., x_N). The (joint) differential entropy of X is defined as

    h(X) = −∫ f(x) log f(x) dx

Definition: Let (X, Y) have a joint pdf f(x, y). The conditional differential entropy h(X|Y) is defined as

    h(X|Y) = −∫∫ f(x, y) log f(x|y) dx dy

As for discrete random variables, the following relationships also hold:

    h(X, Y) = h(X) + h(Y|X)
    h(X, Y) = h(Y) + h(X|Y)
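The chain rule can be verified in closed form for a jointly Gaussian pair, using the Gaussian entropy expressions (the joint-entropy formula for Gaussian vectors appears in the next example, and the conditional X|Y = y of a jointly Gaussian pair is Gaussian with variance σ²(1 − ρ²), a standard fact assumed here); a sketch with NumPy:

```python
import numpy as np

# Chain-rule check h(X,Y) = h(Y) + h(X|Y) for a jointly Gaussian pair
# with equal variances sigma^2 and correlation rho (entropies in bits).
rho, sigma2 = 0.8, 1.0

# h(Y) for Y ~ N(0, sigma^2)
h_y = 0.5 * np.log2(2 * np.pi * np.e * sigma2)
# X | Y = y is Gaussian with variance sigma^2 (1 - rho^2), for every y
h_x_given_y = 0.5 * np.log2(2 * np.pi * np.e * sigma2 * (1 - rho**2))
# Joint entropy from the determinant: h(X,Y) = (1/2) log2((2*pi*e)^2 |C|)
C = sigma2 * np.array([[1.0, rho], [rho, 1.0]])
h_xy = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(C))

print(h_xy, h_y + h_x_given_y)  # the two sides agree
```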
Example 1: Entropy of a multivariate normal distribution, X ∼ N(0, C)

Let X = (X₁, ..., X_N)ᵀ be an N-dimensional Gaussian vector with zero mean and covariance matrix C,

    f(x) = (1/((2π)^(N/2) |C|^(1/2))) e^(−(1/2) xᵀC⁻¹x)

    h(X) = −∫ f(x) log f(x) dx
         = −∫ f(x) [−(1/2) log((2π)^N |C|) − (log e/2) xᵀC⁻¹x] dx
         = (1/2) log((2π)^N |C|) + (N log e)/2
         = (1/2) log((2πe)^N |C|)

where we have used the fact that E[xᵀC⁻¹x] = N

    h(X) = (1/2) log((2πe)^N |C|) = (1/2) log det(2πe C)
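In code, this formula is best evaluated with a log-determinant rather than the determinant itself, to avoid overflow for large N; a sketch assuming NumPy (the helper name `gaussian_entropy_bits` is hypothetical):

```python
import numpy as np

def gaussian_entropy_bits(C):
    """h(X) = (1/2) log2 det(2*pi*e*C) for X ~ N(0, C), via a stable slogdet."""
    C = np.asarray(C, dtype=float)
    sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * C)
    assert sign > 0, "covariance matrix must be positive definite"
    return 0.5 * logdet / np.log(2)

# Sanity check: for N = 1, C = [[sigma^2]] recovers (1/2) log2(2*pi*e*sigma^2)
sigma2 = 2.0
h1 = gaussian_entropy_bits([[sigma2]])
print(h1, 0.5 * np.log2(2 * np.pi * np.e * sigma2))
```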
As a particular case, consider a 2D vector containing two correlated Gaussian random variables. Let X = (X₁, X₂)ᵀ be a zero-mean Gaussian random vector with covariance matrix

    C = (σ²/2) [ 1  ρ ]
               [ ρ  1 ]

where

    ρ = E[X₁X₂] / (√E[X₁²] √E[X₂²])

is the correlation coefficient (−1 ≤ ρ ≤ 1). Applying the previous result, the entropy of X is

    h(X) = log(πeσ² √(1 − ρ²))

If ρ = 0, X = X₁ + jX₂ is a complex normal random variable (X ∼ CN(0, σ²)) with entropy h(X) = log(πeσ²)
[Figure: h(X) for σ² = 1 plotted versus ρ²; it is a concave function of ρ²]
Properties

Let us review a few more properties of the differential entropy and the mutual information that might be useful later

- The differential entropy is invariant to a translation (a change in the mean of the pdf):

      h(X + a) = h(X)

  Proof: This follows directly from the definition of differential entropy

- The differential entropy changes with a change of scale:

      h(aX) = h(X) + log |a|

  Proof: Let Y = aX; then the pdf of Y is

      f_Y(y) = (1/|a|) f_X(y/a)
Applying now the definition of differential entropy, we have

    h(aX) = −∫ f_Y(y) log f_Y(y) dy
          = −∫ (1/|a|) f_X(y/a) log[(1/|a|) f_X(y/a)] dy
          = −∫ f_X(x) log f_X(x) dx + log |a|
          = h(X) + log |a|

where the change of variables x = y/a has been applied in the third step.

- The extension to random vectors is

      h(AX) = h(X) + log |det(A)|
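The vector scaling property can be checked in closed form for a Gaussian vector, since AX ∼ N(0, ACAᵀ) and both sides then have explicit entropies; a sketch assuming NumPy:

```python
import numpy as np

# Check h(AX) = h(X) + log2|det A| for X ~ N(0, C), where AX ~ N(0, A C A^T)
rng = np.random.default_rng(2)
N = 3
A = rng.normal(size=(N, N))  # a random matrix is almost surely invertible
C = np.eye(N)

def h_gauss(C):
    """Gaussian differential entropy (1/2) log2 det(2*pi*e*C), in bits."""
    sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * np.asarray(C))
    return 0.5 * logdet / np.log(2)

lhs = h_gauss(A @ C @ A.T)
rhs = h_gauss(C) + np.log2(abs(np.linalg.det(A)))
print(lhs, rhs)  # the two sides agree
```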
Relative entropy

Definition: The relative entropy (Kullback–Leibler divergence) D(f||g) between two continuous densities f and g is defined by

    D(f||g) = ∫ f(x) log( f(x)/g(x) ) dx.

Note that D(f||g) is finite only if the support of f(x) is contained in the support of g(x).

The KL divergence satisfies the following properties (identical to the discrete case):

- D(f||g) ≥ 0
- D(f||g) = 0 iff f = g
Example 1: Relative entropy between two normal distributions with different means and variances

    f(x) = (1/√(2πσ₁²)) e^(−(x−μ₁)²/(2σ₁²))   and   g(x) = (1/√(2πσ₂²)) e^(−(x−μ₂)²/(2σ₂²))

    D(f||g) = ∫ f(x) log( f(x)/g(x) ) dx
            = ∫ f(x) [ log(σ₂/σ₁) + log(e) ( −(x−μ₁)²/(2σ₁²) + (x−μ₂)²/(2σ₂²) ) ] dx

    D(f||g) = (log e / 2) [ ln(σ₂²/σ₁²) + σ₁²/σ₂² + (μ₁−μ₂)²/σ₂² − 1 ]

As defined, the relative entropy is measured in bits. If we used ln instead of log in the definition, it would be measured in nats; the only difference in the previous expression would be the log e factor.
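A sketch (assuming NumPy; the helper name `kl_gauss_bits` is hypothetical) that evaluates the closed form and cross-checks it against direct numerical integration of f log₂(f/g):

```python
import numpy as np

def kl_gauss_bits(mu1, s1sq, mu2, s2sq):
    """Closed-form D(f||g) in bits for f = N(mu1, s1sq), g = N(mu2, s2sq)."""
    nats = 0.5 * (np.log(s2sq / s1sq) + s1sq / s2sq
                  + (mu1 - mu2) ** 2 / s2sq - 1.0)
    return nats / np.log(2)

# Cross-check against a direct Riemann sum of f(x) log2(f(x)/g(x))
mu1, s1sq, mu2, s2sq = 0.0, 1.0, 1.0, 2.0
x = np.linspace(-15.0, 15.0, 200_001)
f = np.exp(-(x - mu1) ** 2 / (2 * s1sq)) / np.sqrt(2 * np.pi * s1sq)
g = np.exp(-(x - mu2) ** 2 / (2 * s2sq)) / np.sqrt(2 * np.pi * s2sq)
dx = x[1] - x[0]
d_num = np.sum(f * np.log2(f / g)) * dx
print(kl_gauss_bits(mu1, s1sq, mu2, s2sq), d_num)
```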
- For σ₁ = σ₂ = 1 and μ₁ = 0:

      D(f||g) = (1/2) μ₂² log e

[Figure: D(f||g) plotted versus μ₂; it is a convex function of μ₂]
- For σ₁ = 1 and μ₁ = μ₂:

      D(f||g) = (log e / 2) [ ln(σ₂²) + 1/σ₂² − 1 ]

[Figure: D(f||g) plotted versus σ₂²; it is a convex function of σ₂²]
Mutual information

Definition 1: The mutual information I(X; Y) between the random variables X and Y is given by

    I(X; Y) = h(X) − h(X|Y) = h(Y) − h(Y|X)

Definition 2: The mutual information I(X; Y) between two random variables with joint distribution f(x, y) is defined as the KL divergence between the joint distribution and the product of the marginals:

    I(X; Y) = D( f(x, y) || f(x)f(y) ) = ∫∫ f(x, y) log( f(x, y) / (f(x)f(y)) ) dx dy
            = E[ log( f(X, Y) / (f(X)f(Y)) ) ]
Two important properties (identical to the case of discrete random variables)
1. I (X ; Y ) ≥ 0 with equality iff X and Y are independent
2. h(X |Y ) ≤ h(X ) with equality iff X and Y are independent
Example 1: Let us consider the following Additive White Gaussian Noise (AWGN) channel:

    Y = X + N,  with input X ∼ N(0, σx²) and noise N ∼ N(0, σn²)

    I(X; Y) = h(Y) − h(Y|X)

- h(Y) = (1/2) log( 2πe(σx² + σn²) )
- h(Y|X) = h(N) = (1/2) log( 2πeσn² )

    I(X; Y) = (1/2) log( 1 + σx²/σn² )

Does this look familiar?
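This expression is the capacity of the AWGN channel. A small sketch (assuming NumPy; the helper name `awgn_mi_bits` is hypothetical) evaluates it both directly and as the entropy difference h(Y) − h(N):

```python
import numpy as np

def awgn_mi_bits(snr):
    """I(X;Y) = (1/2) log2(1 + snr), in bits per real channel use."""
    return 0.5 * np.log2(1.0 + snr)

# Same quantity as the difference of Gaussian entropies h(Y) - h(N)
sx2, sn2 = 4.0, 1.0
h_y = 0.5 * np.log2(2 * np.pi * np.e * (sx2 + sn2))
h_n = 0.5 * np.log2(2 * np.pi * np.e * sn2)
print(h_y - h_n, awgn_mi_bits(sx2 / sn2))  # the two computations agree
```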
[Figure: I(X; Y) plotted versus snr = σx²/σn². The mutual information grows very fast at first and then much more slowly for high values of the signal-to-noise ratio]

Example 2: Mutual information between correlated Gaussian variables. Let (X, Y)ᵀ be a zero-mean Gaussian random vector with covariance matrix given by
    C = [ σ²   ρσ² ]
        [ ρσ²  σ²  ]

where

    ρ = E[XY] / (√E[X²] √E[Y²])

is the correlation coefficient.

    I(X; Y) = h(Y) − h(Y|X) = h(Y) + h(X) − h(X, Y)

- h(Y) = h(X) = (1/2) log(2πeσ²)
- h(X, Y) = (1/2) log( (2πe)² |C| ) = (1/2) log( (2πe)² σ⁴(1 − ρ²) )

    I(X; Y) = −(1/2) log(1 − ρ²)
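A sketch (assuming NumPy; the helper name `gaussian_mi_bits` is hypothetical) comparing the closed form with the entropy decomposition I = h(X) + h(Y) − h(X, Y):

```python
import numpy as np

def gaussian_mi_bits(rho):
    """I(X;Y) = -(1/2) log2(1 - rho^2) for jointly Gaussian X, Y."""
    return -0.5 * np.log2(1.0 - rho**2)

# Entropy decomposition for equal variances sigma^2 and correlation rho
rho, sigma2 = 0.9, 2.0
C = sigma2 * np.array([[1.0, rho], [rho, 1.0]])
h_x = 0.5 * np.log2(2 * np.pi * np.e * sigma2)            # h(X) = h(Y)
h_xy = 0.5 * np.log2((2 * np.pi * np.e) ** 2 * np.linalg.det(C))
print(2 * h_x - h_xy, gaussian_mi_bits(rho))  # the two computations agree
```

Note that the result depends only on ρ, not on σ²: mutual information is invariant to scaling of the variables.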
[Figure: I(X; Y) plotted versus ρ²]
- If ρ = 0 then I(X; Y) = 0, which implies that X and Y are independent random variables (for jointly Gaussian variables, uncorrelated implies independent)
- If ρ² = 1 then I(X; Y) = ∞, since X and Y are fully correlated