Entropy and Mutual Information (Continuous Random Variables)

Máster Universitario en Ingeniería de Telecomunicación

I. Santamaría, Universidad de Cantabria

Contents

Introduction

Differential Entropy

Joint and Conditional Differential Entropy

Relative Entropy

Mutual Information


Introduction

- We introduce the (differential) entropy and mutual information of continuous random variables
- We will need these concepts, for instance, to determine the capacity of the AWGN channel
- Some important differences appear with respect to the case of discrete random variables:
  - Continuous random variables lead to the differential entropy (strictly speaking, not an entropy)
  - Unlike the entropy of discrete random variables, the differential entropy of a continuous random variable can be negative
  - It does not give the average information in X
- The relative entropy and mutual information concepts extend to the continuous case in a straightforward manner and convey the same information


Definitions

Let X be a continuous random variable with cumulative distribution function given by

$$F(x) = \Pr\{X \le x\},$$

and probability density function (pdf) given by

$$f(x) = \frac{dF(x)}{dx}.$$

F(x) and f(x) are both assumed to be continuous functions.

Definition: The differential entropy h(X) of a continuous random variable X with pdf f(x) is defined as

$$h(X) = -\int f(x)\log f(x)\,dx,$$

where the integration is carried out on the support of the r.v.


Example 1: Entropy of a uniform distribution, X ∼ U(a, b)

[Figure: pdf of the uniform distribution, constant at 1/(b − a) on the interval [a, b]]

$$h(X) = -\int f(x)\log f(x)\,dx = -\int_a^b \frac{1}{b-a}\log\left(\frac{1}{b-a}\right)dx = \log(b-a)$$

Note that h(X) < 0 if (b − a) < 1
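A quick numerical sanity check (my own sketch, not part of the slides; it uses numpy/scipy and an arbitrary interval with b − a < 1):

```python
import numpy as np
from scipy import integrate

# Sketch: check h(X) = log2(b - a) for X ~ U(a, b) by integrating
# -f(x) log2 f(x) over the support of the random variable.
a, b = 0.0, 0.5                              # example interval with b - a < 1

f = lambda x: 1.0 / (b - a)                  # uniform pdf on [a, b]
integrand = lambda x: -f(x) * np.log2(f(x))  # -f log2 f

h_numeric, _ = integrate.quad(integrand, a, b)
h_closed = np.log2(b - a)

print(h_numeric, h_closed)   # both ≈ -1 bit for b - a = 0.5
```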


Example 2: Entropy of a normal distribution, X ∼ N(0, σ²)

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}$$

$$h(X) = -\int f(x)\log f(x)\,dx = -\int f(x)\log\left(\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}\right)dx$$
$$= -\int f(x)\left(-\frac{1}{2}\log(2\pi\sigma^2) - \frac{x^2}{2\sigma^2}\log e\right)dx = \frac{1}{2}\log(2\pi\sigma^2) + \frac{\sigma^2}{2\sigma^2}\log e = \frac{1}{2}\log(2\pi e\sigma^2)$$

$$h(X) = \frac{1}{2}\log(2\pi e\sigma^2)$$
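A similar numerical check of the closed form (a sketch with an arbitrary value of σ):

```python
import numpy as np
from scipy import integrate

# Sketch: verify h(X) = 0.5*log2(2*pi*e*sigma^2) for X ~ N(0, sigma^2).
sigma = 2.0
f = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Integrate -f log2 f over a range wide enough to capture all the mass
h_numeric, _ = integrate.quad(lambda x: -f(x) * np.log2(f(x)), -20, 20)
h_closed = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

print(h_numeric, h_closed)   # ≈ 3.05 bits for sigma = 2
```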


[Figure: differential entropy h(X) = ½ log(2πeσ²) plotted versus σ²]

It is a concave function of σ²


Maximum entropy distribution

For a fixed second moment (E[X²] = σ²), the normal distribution is the pdf that maximizes the entropy:

$$\begin{aligned}
\underset{f(x)}{\text{maximize}}\quad & -\int f(x)\log f(x)\,dx\\
\text{subject to}\quad & f(x) \ge 0,\\
& \int f(x)\,dx = 1,\\
& \int x^2 f(x)\,dx = \sigma^2.
\end{aligned}$$

This is a convex optimization problem (the entropy is a concave function) whose solution is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}$$

This result will be important later.
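As an illustration (a sketch, not a proof; the uniform competitor and σ = 1 are arbitrary choices), one can compare the Gaussian entropy with that of a uniform density matched to the same second moment:

```python
import numpy as np

# Sketch: among densities with the same second moment sigma^2, the Gaussian
# attains the largest differential entropy; compare with a matched uniform.
sigma = 1.0

h_gauss = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

# U(-c, c) has second moment c^2/3, so c = sigma*sqrt(3) matches E[X^2] = sigma^2
c = sigma * np.sqrt(3)
h_unif = np.log2(2 * c)

print(h_gauss, h_unif)   # ≈ 2.05 bits vs ≈ 1.79 bits: the Gaussian is larger
```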

Densities

Given two random variables X and Y, we have:

- Joint pdf f(x, y)
- Marginal pdfs
  $$f(x) = \int f(x,y)\,dy \qquad f(y) = \int f(x,y)\,dx$$
- Conditional pdf
  $$f(x|y) = \frac{f(x,y)}{f(y)}$$

Independence: f(x, y) = f(x)f(y)


Joint and conditional entropy

Definition: Let $X = (X_1, \ldots, X_N)^T$ be an N-dimensional random vector with density $f(\mathbf{x}) = f(x_1, \ldots, x_N)$. The (joint) differential entropy of X is defined as

$$h(\mathbf{X}) = -\int f(\mathbf{x})\log f(\mathbf{x})\,d\mathbf{x}$$

Definition: Let (X, Y) have a joint pdf f(x, y). The conditional differential entropy h(X|Y) is defined as

$$h(X|Y) = -\int f(x,y)\log f(x|y)\,dx\,dy$$

Like for discrete random variables, the following relationships also hold:

$$h(X,Y) = h(X) + h(Y|X)$$
$$h(X,Y) = h(Y) + h(X|Y)$$
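A small closed-form check of the chain rule for a jointly Gaussian pair (a sketch; the variances and correlation below are arbitrary example values, and it uses the Gaussian entropy expressions, including the joint one derived just below):

```python
import numpy as np

# Sketch: check h(X,Y) = h(X) + h(Y|X) in closed form for a jointly Gaussian pair.
sx, sy, rho = 1.5, 0.8, 0.6
C = np.array([[sx**2, rho * sx * sy],
              [rho * sx * sy, sy**2]])

h_joint = 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * C))
h_x = 0.5 * np.log2(2 * np.pi * np.e * sx**2)
# Given X = x, Y is Gaussian with variance sy^2 * (1 - rho^2)
h_y_given_x = 0.5 * np.log2(2 * np.pi * np.e * sy**2 * (1 - rho**2))

print(h_joint, h_x + h_y_given_x)   # identical up to rounding
```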


Example 1: Entropy of a multivariate normal distribution, X ∼ N(0, C)

Let $X = (X_1, \ldots, X_N)^T$ be an N-dimensional Gaussian vector with zero mean and covariance matrix C,

$$f(\mathbf{x}) = \frac{1}{(\sqrt{2\pi})^N |\mathbf{C}|^{1/2}}\, e^{-\frac{1}{2}\mathbf{x}^T\mathbf{C}^{-1}\mathbf{x}}$$

$$h(\mathbf{X}) = -\int f(\mathbf{x})\log f(\mathbf{x})\,d\mathbf{x} = -\int f(\mathbf{x})\left(-\frac{1}{2}\log\left((2\pi)^N|\mathbf{C}|\right) - \frac{\log e}{2}\,\mathbf{x}^T\mathbf{C}^{-1}\mathbf{x}\right)d\mathbf{x}$$
$$= \frac{1}{2}\log\left((2\pi)^N|\mathbf{C}|\right) + \frac{N\log e}{2} = \frac{1}{2}\log\left((2\pi e)^N|\mathbf{C}|\right),$$

where we have used the fact that $E[\mathbf{x}^T\mathbf{C}^{-1}\mathbf{x}] = N$

$$h(\mathbf{X}) = \frac{1}{2}\log\left((2\pi e)^N|\mathbf{C}|\right) = \frac{1}{2}\log\det(2\pi e\,\mathbf{C})$$
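A Monte Carlo sanity check of this formula (a sketch; the covariance matrix and sample size are arbitrary choices):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch: estimate h(X) = -E[log2 f(X)] by Monte Carlo and compare it with
# 0.5*log2(det(2*pi*e*C)) for an example 3x3 covariance matrix C.
C = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.5]])
N = C.shape[0]

mvn = multivariate_normal(mean=np.zeros(N), cov=C)
x = mvn.rvs(size=200_000, random_state=0)

h_mc = -np.mean(mvn.logpdf(x)) / np.log(2)   # convert nats -> bits
h_closed = 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * C))

print(h_mc, h_closed)   # should agree to about two decimal places
```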

As a particular case, let us consider a 2D vector containing two correlated Gaussian random variables. Let $X = (X_1, X_2)^T$ be a zero-mean Gaussian random vector with covariance matrix given by

$$\mathbf{C} = \frac{\sigma^2}{2}\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix},$$

where

$$\rho = \frac{E[X_1 X_2]}{\sqrt{E[X_1^2]}\sqrt{E[X_2^2]}}$$

is the correlation coefficient (−1 ≤ ρ ≤ 1). Applying the previous result, the entropy of X is

$$h(\mathbf{X}) = \log\left(\pi e\sigma^2\sqrt{1-\rho^2}\right)$$

If ρ = 0, X = X_1 + jX_2 is a complex normal random variable (X ∼ CN(0, σ²)) with entropy h(X) = log(πeσ²)


[Figure: entropy h(X) of the 2D Gaussian vector (σ² = 1) plotted versus ρ²]

It is a concave function of ρ²


Properties

Let us review a few more properties of the differential entropy and the mutual information that might be useful later.

- The differential entropy is invariant to a translation (a change in the mean of the pdf):

  $$h(X) = h(X + a)$$

  Proof: The proof follows directly from the definition of differential entropy.

- The differential entropy changes with a change of scale:

  $$h(aX) = h(X) + \log|a|$$

  Proof: Let Y = aX; then the pdf of Y is

  $$f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y}{a}\right).$$


  Applying now the definition of differential entropy, we have

  $$h(aX) = -\int f_Y(y)\log f_Y(y)\,dy = -\int \frac{1}{|a|}\, f_X\!\left(\frac{y}{a}\right)\log\left(\frac{1}{|a|}\, f_X\!\left(\frac{y}{a}\right)\right)dy$$
  $$= -\int f_X(x)\log f_X(x)\,dx + \log|a| = h(X) + \log|a|$$

- An extension to random vectors is as follows:

  $$h(\mathbf{A}\mathbf{X}) = h(\mathbf{X}) + \log|\det(\mathbf{A})|$$
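A quick check of the vector scaling property for a Gaussian example, where both sides have closed forms (A and C below are arbitrary choices):

```python
import numpy as np

# Sketch: check h(AX) = h(X) + log2|det(A)| for a Gaussian vector X ~ N(0, C).
C = np.array([[1.0, 0.2],
              [0.2, 0.5]])
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

h_x = 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * C))
# AX is Gaussian with covariance A C A^T
h_ax = 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * A @ C @ A.T))

print(h_ax, h_x + np.log2(abs(np.linalg.det(A))))   # equal up to rounding
```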


Relative entropy

Definition: The relative entropy (Kullback-Leibler divergence) D(f||g) between two continuous densities is defined by

$$D(f\|g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx.$$

Note that D(f||g) is finite only if the support of f(x) is contained in the support of g(x).

The KL distance satisfies the following properties (identical to the discrete case):

- D(f||g) ≥ 0
- D(f||g) = 0 iff f = g

Example 1: Relative entropy between two normal distributions with different means and variances,

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}} \quad\text{and}\quad g(x) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}$$

$$D(f\|g) = \int f(x)\log\frac{f(x)}{g(x)}\,dx = \int N(\mu_1,\sigma_1^2)\log\frac{N(\mu_1,\sigma_1^2)}{N(\mu_2,\sigma_2^2)}\,dx$$
$$= \int N(\mu_1,\sigma_1^2)\left(\log\frac{\sigma_2}{\sigma_1} + \log(e)\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2} + \frac{(x-\mu_2)^2}{2\sigma_2^2}\right)\right)dx$$

$$D(f\|g) = \frac{1}{2}\log e\left(\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} + \frac{(\mu_1-\mu_2)^2}{\sigma_2^2} - 1\right)$$

As defined, the relative entropy is measured in bits. If we used ln instead of log in the definition it would be measured in nats; the only difference in the previous expression would be the log e factor.

- For σ₁ = σ₂ = 1 and μ₁ = 0:

  $$D(f\|g) = \frac{1}{2}\mu_2^2\log e$$

[Figure: D(f||g) plotted versus μ₂]

It is a convex function of μ₂
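As a numerical cross-check of the closed-form expression for D(f||g) derived above (a sketch; the mean and variance values are arbitrary examples):

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Sketch: compare the closed-form D(f||g) with direct numerical integration
# of f(x) log2(f(x)/g(x)) for two example Gaussians.
m1, s1 = 0.0, 1.0
m2, s2 = 1.0, 2.0

def integrand(x):
    # f(x) * log2(f(x)/g(x)), written with logpdf for numerical stability
    log_ratio = norm.logpdf(x, m1, s1) - norm.logpdf(x, m2, s2)
    return norm.pdf(x, m1, s1) * log_ratio / np.log(2)

d_numeric, _ = integrate.quad(integrand, -15, 15)
d_closed = 0.5 * np.log2(np.e) * (np.log(s2**2 / s1**2)
                                  + s1**2 / s2**2
                                  + (m1 - m2)**2 / s2**2 - 1)

print(d_numeric, d_closed)   # both in bits, equal up to integration error
```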


- For σ₁ = 1 and μ₁ = μ₂:

  $$D(f\|g) = \frac{1}{2}\log e\left(\ln(\sigma_2^2) + \frac{1}{\sigma_2^2} - 1\right)$$

[Figure: D(f||g) plotted versus σ₂²]

It is a convex function of σ₂²


Mutual information

Definition 1: The mutual information I(X; Y) between the random variables X and Y is given by

$$I(X;Y) = h(X) - h(X|Y) = h(Y) - h(Y|X)$$

Definition 2: The mutual information I(X; Y) between two random variables with joint distribution f(x, y) is defined as the KL distance between the joint distribution and the product of their marginals:

$$I(X;Y) = D\big(f(x,y)\,\|\,f(x)f(y)\big) = \int f(x,y)\log\frac{f(x,y)}{f(x)f(y)}\,dx\,dy = E\left[\log\frac{f(X,Y)}{f(X)f(Y)}\right]$$
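A Monte Carlo sketch of this expectation form for a bivariate Gaussian pair (the correlation values and sample size are arbitrary choices; it only illustrates the definition):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Sketch: estimate I(X;Y) = E[log2 f(X,Y)/(f(X)f(Y))] by Monte Carlo for a
# zero-mean, unit-variance Gaussian pair with correlation rho.
def mi_estimate(rho, n=200_000):
    C = np.array([[1.0, rho], [rho, 1.0]])
    joint = multivariate_normal(mean=[0.0, 0.0], cov=C)
    xy = joint.rvs(size=n, random_state=0)
    log_ratio = (joint.logpdf(xy)
                 - norm.logpdf(xy[:, 0])
                 - norm.logpdf(xy[:, 1]))
    return np.mean(log_ratio) / np.log(2)    # nats -> bits

print(mi_estimate(0.0))   # ≈ 0: independent variables
print(mi_estimate(0.8))   # > 0: dependent variables
```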


Two important properties (identical to the case of discrete random variables)

1. I(X;Y) ≥ 0, with equality iff X and Y are independent

2. h(X|Y) ≤ h(X), with equality iff X and Y are independent


Example 1: Let us consider the following Additive White Gaussian Noise (AWGN) channel

[Figure: AWGN channel, Y = X + N, with input X ∼ N(0, σₓ²) and noise N ∼ N(0, σₙ²)]

$$I(X;Y) = h(Y) - h(Y|X)$$

- $h(Y) = \frac{1}{2}\log\left(2\pi e(\sigma_x^2 + \sigma_n^2)\right)$
- $h(Y|X) = h(N) = \frac{1}{2}\log\left(2\pi e\,\sigma_n^2\right)$

$$I(X;Y) = \frac{1}{2}\log\left(1 + \frac{\sigma_x^2}{\sigma_n^2}\right)$$

Sounds familiar?


[Figure: I(X;Y) plotted versus snr = σₓ²/σₙ²]

The mutual information grows very fast at first and then much more slowly for high values of the signal-to-noise ratio, snr = σₓ²/σₙ².
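A short sketch evaluating this expression at a few example SNR values (chosen arbitrarily) to illustrate the logarithmic growth:

```python
import numpy as np

# Sketch: evaluate I(X;Y) = 0.5*log2(1 + snr) for the AWGN channel at a few
# SNR values, showing the fast initial growth and the slow growth at high SNR.
for snr_db in [0, 10, 20, 30]:
    snr = 10 ** (snr_db / 10)
    mi = 0.5 * np.log2(1 + snr)
    print(f"snr = {snr_db:2d} dB -> I(X;Y) = {mi:.2f} bits")
```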

Example 2: Mutual information between correlated Gaussian variables. Let $(X, Y)^T$ be a zero-mean Gaussian random vector with covariance matrix given by

$$\mathbf{C} = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix},$$

where

$$\rho = \frac{E[XY]}{\sqrt{E[X^2]}\sqrt{E[Y^2]}}$$

$$I(X;Y) = h(Y) - h(Y|X) = h(Y) + h(X) - h(X,Y)$$

- $h(Y) = h(X) = \frac{1}{2}\log(2\pi e\sigma^2)$
- $h(X,Y) = \frac{1}{2}\log\left((2\pi e)^2|\mathbf{C}|\right) = \frac{1}{2}\log\left((2\pi e)^2\sigma^4(1-\rho^2)\right)$

$$I(X;Y) = -\frac{1}{2}\log(1-\rho^2)$$
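A quick check of this result against h(X) + h(Y) − h(X, Y) computed from the Gaussian entropy formulas (the values of ρ are arbitrary examples):

```python
import numpy as np

# Sketch: compare I(X;Y) = -0.5*log2(1 - rho^2) with h(X) + h(Y) - h(X,Y)
# evaluated from the closed-form Gaussian entropies.
sigma = 1.0
for rho in [0.0, 0.5, 0.9, 0.99]:
    C = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])
    h_x = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
    h_xy = 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * C))
    mi_from_entropies = 2 * h_x - h_xy
    mi_closed = -0.5 * np.log2(1 - rho**2)
    print(rho, round(mi_from_entropies, 4), round(mi_closed, 4))
```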


[Figure: I(X;Y) plotted versus ρ²]

- If ρ² = 0, then I(X;Y) = 0, which implies that X and Y are independent random variables
- If ρ² = 1, then I(X;Y) = ∞, since X and Y are fully correlated
