Information Theory: Principles and Applications
Tiago T. V. Vinhoza
May 28, 2010

Outline
1. Differential Entropy: Definition, Other Information Measures, Properties
2. Gaussian Channel: Capacity, Coding Theorem, Achievability and Converse, Parallel Gaussian Channels: Waterfilling
3. Fading Channels

Differential Entropy: Definition

Differential entropy is the entropy of a continuous random variable. Let X be a random variable with cumulative distribution function F_X(x) and probability density function p_X(x). Its differential entropy is

    h(X) = -\int_S p_X(x) \log p_X(x)\, dx

where S is the support set of the random variable X, that is, the set where p_X(x) > 0. As in the discrete case, the differential entropy depends only on p_X(x).

Differential Entropy: Example 1

Uniform distribution. Consider a random variable uniformly distributed between 0 and a:

    h(X) = -\int_S p_X(x) \log p_X(x)\, dx = -\int_0^a \frac{1}{a} \log\frac{1}{a}\, dx = \log a

Note that for a < 1 we have \log a < 0, so the differential entropy can be negative. The volume of the support set, 2^{h(X)} = 2^{\log a} = a, is nevertheless always a non-negative quantity.

Differential Entropy: Example 2

Gaussian distribution with zero mean and variance \sigma^2:

    p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/2\sigma^2}

    h(X) = -\int_S p_X(x) \ln p_X(x)\, dx
         = -\int_S p_X(x) \ln\left[\frac{1}{\sqrt{2\pi\sigma^2}} e^{-x^2/2\sigma^2}\right] dx
         = \int_S p_X(x) \ln\sqrt{2\pi\sigma^2}\, dx + \int_S \frac{x^2}{2\sigma^2}\, p_X(x)\, dx
         = \frac{1}{2}\ln 2\pi\sigma^2 + \frac{E[X^2]}{2\sigma^2}
         = \frac{1}{2}\ln 2\pi\sigma^2 + \frac{1}{2}
         = \frac{1}{2}\ln 2\pi e\sigma^2 \text{ nats} = \frac{1}{2}\log 2\pi e\sigma^2 \text{ bits}

Relations Between Differential Entropy and Discrete Entropy

Consider a random variable X with density p_X(x). Divide the range of X into bins of length \Delta. By the mean value theorem, each bin contains a point x_i such that

    p_X(x_i)\Delta = \int_{i\Delta}^{(i+1)\Delta} p_X(x)\, dx

Define the quantized random variable X^\Delta by

    X^\Delta = x_i \quad \text{if } i\Delta \le X < (i+1)\Delta,

so that P(X^\Delta = x_i) = p_X(x_i)\Delta.

The entropy of this quantized variable is

    H(X^\Delta) = -\sum_{i=-\infty}^{\infty} p_i \log p_i
                = -\sum_{i=-\infty}^{\infty} p_X(x_i)\Delta \log\bigl(p_X(x_i)\Delta\bigr)
                = -\sum_{i=-\infty}^{\infty} p_X(x_i)\Delta \log p_X(x_i) - \sum_{i=-\infty}^{\infty} p_X(x_i)\Delta \log\Delta
                = -\sum_{i=-\infty}^{\infty} p_X(x_i)\Delta \log p_X(x_i) - \log\Delta

If the density p_X(x) is Riemann integrable, then

    H(X^\Delta) + \log\Delta \to h(X) \quad \text{as } \Delta \to 0.

The entropy of an n-bit quantization of a continuous random variable X is therefore approximately h(X) + n. (A numerical sketch of this relation is given after the joint differential entropy definition below.)

Example: If X is uniformly distributed on [0, 1] and we let \Delta = 2^{-n}, then h(X) = 0 and H(X^\Delta) = n, so n bits suffice to describe X to n-bit accuracy.

Example: If X is uniformly distributed on [0, 1/8), the first three bits after the binary point are always zero. To describe X to n-bit precision we need only n - 3 bits, which agrees with h(X) = \log(1/8) = -3.

Joint Differential Entropy

The differential entropy of a random vector X^n composed of the n random variables X_1, X_2, \ldots, X_n with density p_{X^n}(x^n) is defined as

    h(X^n) = -\int \cdots \int p_{X^n}(x^n) \log p_{X^n}(x^n)\, dx^n
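As a numerical check of the quantization relation H(X^\Delta) + \log\Delta \approx h(X) and of Examples 1 and 2 above, the following Python sketch estimates differential entropies by binning a density. The bin widths, the truncation of the Gaussian to \pm 8\sigma, and the parameter values a = 0.5, \sigma = 1 are illustrative choices, not taken from the slides.

```python
# Numerical check of H(X^Delta) + log(Delta) ~ h(X) for the uniform and
# Gaussian examples; p_i = p_X(x_i)*Delta is approximated with bin centers.
import numpy as np

def quantized_entropy_bits(pdf, lo, hi, delta):
    """H(X^Delta) in bits for a quantizer with bins of width delta on [lo, hi]."""
    edges = np.arange(lo, hi + delta, delta)
    centers = 0.5 * (edges[:-1] + edges[1:])
    probs = pdf(centers) * delta           # p_i ~ p_X(x_i) * Delta
    probs = probs[probs > 0]
    probs = probs / probs.sum()            # absorb truncation/rounding error
    return -np.sum(probs * np.log2(probs))

a, sigma = 0.5, 1.0
uniform_pdf = lambda x: np.where((x >= 0) & (x <= a), 1.0 / a, 0.0)
gauss_pdf = lambda x: np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

for delta in (1e-2, 1e-3, 1e-4):
    h_unif = quantized_entropy_bits(uniform_pdf, 0.0, a, delta) + np.log2(delta)
    h_gauss = quantized_entropy_bits(gauss_pdf, -8 * sigma, 8 * sigma, delta) + np.log2(delta)
    print(f"Delta={delta:g}: uniform {h_unif:.4f} (exact {np.log2(a):.4f}), "
          f"Gaussian {h_gauss:.4f} (exact {0.5 * np.log2(2 * np.pi * np.e * sigma**2):.4f})")
```

As \Delta shrinks, both estimates approach \log a and \frac{1}{2}\log 2\pi e\sigma^2 respectively, illustrating the limit stated above.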
Joint Differential Entropy: Example

Entropy of a multivariate Gaussian distribution. Let X_1, X_2, \ldots, X_n form a Gaussian random vector with mean \mu and covariance matrix K, that is, X^n \sim N(\mu, K). Then

    h(X^n) = h(X_1, X_2, \ldots, X_n) = \frac{1}{2}\log\bigl((2\pi e)^n |K|\bigr),

where |K| denotes the determinant of the covariance matrix K.

Conditional Differential Entropy

If X, Y have a joint pdf p_{XY}(x, y), the conditional differential entropy h(X|Y) is defined as

    h(X|Y) = -\iint p_{XY}(x, y) \log p_{X|Y}(x|y)\, dx\, dy

Since p_{XY}(x, y) = p_{X|Y}(x|y)\, p_Y(y), we can write

    h(X, Y) = h(X|Y) + h(Y)

Relative Entropy

Relative entropy is a measure of the distance between two continuous distributions. The relative entropy between two probability density functions f_X(x) and g_X(x) is defined as

    D(f_X \| g_X) = \int f_X(x) \log\frac{f_X(x)}{g_X(x)}\, dx

It satisfies D(f_X \| g_X) \ge 0, with equality if and only if f_X(x) = g_X(x). In general D(f_X \| g_X) \ne D(g_X \| f_X).

Mutual Information

The mutual information of two continuous random variables X and Y is defined as the relative entropy between the joint probability density p_{XY}(x, y) and the product of the marginals p_X(x) and p_Y(y):

    I(X; Y) = D\bigl(p_{XY}(x, y) \,\|\, p_X(x) p_Y(y)\bigr) = \iint p_{XY}(x, y) \log\frac{p_{XY}(x, y)}{p_X(x) p_Y(y)}\, dx\, dy

As in the discrete case, I(X; Y) = h(X) - h(X|Y) = h(Y) - h(Y|X).

The mutual information of continuous random variables is the limit of the mutual information between their quantized versions:

    I(X^\Delta; Y^\Delta) = H(X^\Delta) - H(X^\Delta|Y^\Delta) \approx \bigl(h(X) - \log\Delta\bigr) - \bigl(h(X|Y) - \log\Delta\bigr) = I(X; Y)

Properties

Translation does not change the differential entropy:

    h(X + c) = h(X)

Multiplication by a constant adds the log of its magnitude:

    h(aX) = h(X) + \log|a|

The same property holds for random vectors:

    h(AX^n) = h(X^n) + \log|\det A|,

where |\det A| is the absolute value of the determinant of A.

The multivariate Gaussian distribution maximizes the entropy over all distributions with the same covariance matrix:

    h(X^n) \le \frac{1}{2}\log\bigl((2\pi e)^n |K|\bigr),

with equality if and only if X^n \sim N(0, K).

Asymptotic Equipartition Property

As in the discrete case, we can define a typical set and characterize it. Let X_1, X_2, \ldots, X_n be a sequence of iid random variables with probability density function p_X(x). Then

    -\frac{1}{n}\log p_{X_1 X_2 \cdots X_n}(X_1, X_2, \ldots, X_n) \to E[-\log p_X(X)] = h(X)

As in the discrete case, this result follows from the weak law of large numbers.
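A small Monte Carlo sketch of the convergence stated above: for iid N(0, \sigma^2) samples, -\frac{1}{n}\log_2 p(X_1, \ldots, X_n) should concentrate around h(X) = \frac{1}{2}\log_2 2\pi e\sigma^2. The sample size n, the number of trials, and \sigma are illustrative choices.

```python
# AEP illustration: -(1/n) log2 p(X_1,...,X_n) for iid Gaussian samples
# concentrates around the differential entropy h(X).
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 1.0, 1000, 5
h_exact = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)   # h(X) in bits

for _ in range(trials):
    x = rng.normal(0.0, sigma, size=n)
    # log2 of the Gaussian pdf evaluated at each sample
    log2_pdf = -0.5 * np.log2(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2) * np.log2(np.e)
    empirical = -np.mean(log2_pdf)                      # -(1/n) log2 p(x^n) for iid samples
    print(f"-(1/n) log2 p(x^n) = {empirical:.4f}   h(X) = {h_exact:.4f}")
```

With n = 1000 the empirical values already cluster tightly around h(X) \approx 2.05 bits, as the weak law of large numbers predicts.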
The typical set A_\epsilon^{(n)} with respect to p_X(x) is the set of sequences (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n with the following property:

    A_\epsilon^{(n)} = \left\{ x^n : \left| -\frac{1}{n}\log p_{X^n}(x^n) - h(X) \right| \le \epsilon \right\}

The properties of the typical set for continuous random variables are the same as the ones for the discrete case.

The volume of the typical set for continuous random variables is the analog of the cardinality of the typical set in the discrete case. The volume Vol(A) of a set A \subset \mathbb{R}^n is defined as

    \mathrm{Vol}(A) = \int_A dx_1\, dx_2 \cdots dx_n

Asymptotic Equipartition Property: Properties

    P(X^n \in A_\epsilon^{(n)}) > 1 - \epsilon for n sufficiently large.
    \mathrm{Vol}(A_\epsilon^{(n)}) \le 2^{n(h(X)+\epsilon)} for all n.
    \mathrm{Vol}(A_\epsilon^{(n)}) \ge (1-\epsilon)\, 2^{n(h(X)-\epsilon)} for n sufficiently large.

The results for joint typicality follow the ones for the discrete case.

Gaussian Channel

The Gaussian channel is a discrete-time channel where the output at time i, Y_i, is the sum of the input X_i and Gaussian noise Z_i:

    Y_i = X_i + Z_i, \quad Z_i \sim N(0, N)

The noise is assumed to be independent of the input.

Noiseless case: infinite capacity. Any real number can be transmitted without error.

Unconstrained inputs: infinite capacity. Even with noise, we can choose the inputs arbitrarily far apart, so that they are distinguishable at the output with probability of error as small as we want.

Gaussian Channel: Power Constraint

Limitations on the input: an average power constraint is assumed. For any length-n codeword (x_1, x_2, \ldots, x_n), it is required that

    \frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P

Gaussian Channel: A Suboptimal Use

Suppose we want to send one bit over the channel in one use of the channel. Given the power constraint, the two possible signals to transmit are +\sqrt{P} and -\sqrt{P}.
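A minimal simulation of this one-bit scheme, assuming the receiver decides by the sign of Y (the minimum-distance rule); the values of P, N, and the number of transmitted bits are illustrative. The simulated error rate should match Q(\sqrt{P/N}).

```python
# One bit per channel use over Y = X + Z, Z ~ N(0, N), sent as +sqrt(P) or
# -sqrt(P) and decided by the sign of Y (minimum-distance decision).
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
P, N, n_bits = 1.0, 0.5, 200_000

bits = rng.integers(0, 2, size=n_bits)
x = np.where(bits == 1, np.sqrt(P), -np.sqrt(P))      # antipodal signaling
y = x + rng.normal(0.0, np.sqrt(N), size=n_bits)      # Gaussian channel output
decided = (y > 0).astype(int)                         # decide by the sign of Y

p_err_sim = np.mean(decided != bits)
p_err_theory = 0.5 * erfc(sqrt(P / N) / sqrt(2))      # Q(sqrt(P/N))
print(f"simulated Pe = {p_err_sim:.4f}, Q(sqrt(P/N)) = {p_err_theory:.4f}")
```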