
LECTURE 3
Convergence and Asymptotic Equipartition Property

Last time:
• Convexity and concavity
• Jensen's inequality
• Positivity of mutual information
• Data processing theorem
• Fano's inequality

Lecture outline:
• Types of convergence
• Weak Law of Large Numbers
• Strong Law of Large Numbers
• Asymptotic Equipartition Property

Reading: Sections 3.1-3.2.

Types of convergence

Recall what a random variable is: a mapping from its set of sample values $\Omega$ to $\mathbb{R}$,
$$X : \Omega \to \mathbb{R}, \qquad \xi \mapsto X(\xi).$$
In the cases we have been discussing, $\Omega = \mathcal{X}$ and we map onto $[0, 1]$.

• Sure convergence: a random sequence $X_1, X_2, \ldots$ converges surely to r.v. $X$ if for every $\xi \in \Omega$ the sequence $X_n(\xi)$ converges to $X(\xi)$ as $n \to \infty$.

• Almost sure convergence (also called convergence with probability 1): the random sequence converges a.s. (w.p. 1) to $X$ if the sequence $X_1(\xi), X_2(\xi), \ldots$ converges to $X(\xi)$ for all $\xi$ except possibly on a subset of $\Omega$ of probability 0.

• Mean-square convergence: $X_1, X_2, \ldots$ converges in the m.s. sense to r.v. $X$ if $\lim_{n \to \infty} E\bigl[|X_n - X|^2\bigr] = 0$.

• Convergence in probability: the sequence converges in probability to $X$ if for every $\epsilon > 0$, $\lim_{n \to \infty} \Pr[|X_n - X| > \epsilon] = 0$.

• Convergence in distribution: the sequence converges in distribution if the cumulative distribution function $F_n(x) = \Pr(X_n \le x)$ satisfies $\lim_{n \to \infty} F_n(x) = F_X(x)$ at all $x$ for which $F_X$ is continuous.

Relations among types of convergence

Venn diagram of the relations (figure not shown): sure convergence implies almost sure convergence, which implies convergence in probability, which implies convergence in distribution; mean-square convergence also implies convergence in probability.

Weak Law of Large Numbers

Let $X_1, X_2, \ldots$ be i.i.d. with finite mean $\mu$ and variance $\sigma_X^2$, and let
$$M_n = \frac{X_1 + \cdots + X_n}{n}.$$
• $E[M_n] = \mu$
• $\mathrm{Var}(M_n) = \sigma_X^2 / n$
$$\Pr(|M_n - \mu| \ge \epsilon) \le \frac{\sigma_X^2}{n \epsilon^2}$$

This is a consequence of Chebyshev's inequality. For a random variable $X$,
$$\sigma_X^2 = \sum_{x \in \mathcal{X}} (x - E[X])^2 P_X(x)$$
$$\sigma_X^2 \ge c^2 \Pr(|X - E[X]| \ge c)$$
$$\Pr(|X - E[X]| \ge c) \le \frac{\sigma_X^2}{c^2}$$
$$\Pr(|X - E[X]| \ge k \sigma_X) \le \frac{1}{k^2}$$

Strong Law of Large Numbers

Theorem (SLLN): If the $X_i$ are i.i.d. and $E_X[|X|] < \infty$, then
$$M_n = \frac{X_1 + \cdots + X_n}{n} \to E_X[X] \quad \text{w.p. } 1.$$

AEP

If $X_1, \ldots, X_n$ are i.i.d. with distribution $P_X$, then
$$-\frac{1}{n} \log\bigl(P_{X_1, \ldots, X_n}(x_1, \ldots, x_n)\bigr) \to H(X) \quad \text{in probability.}$$

Notation: $X_i^j = (X_i, \ldots, X_j)$ (when $i = 1$, the subscript is generally omitted).

Proof: create a r.v. $Y$ that takes the value $y_i = -\log(P_X(x_i))$ with probability $P_X(x_i)$ (note that the value of $Y$ is determined by its probability distribution). We now apply the WLLN to $Y$:
$$-\frac{1}{n} \log\bigl(P_{X^n}(x^n)\bigr) = -\frac{1}{n} \sum_{i=1}^{n} \log\bigl(P_X(x_i)\bigr) = \frac{1}{n} \sum_{i=1}^{n} y_i.$$
By the WLLN applied to $Y$,
$$\frac{1}{n} \sum_{i=1}^{n} y_i \to E_Y[Y] \quad \text{in probability,}$$
and
$$E_Y[Y] = -E_Z\bigl[\log(P_X(Z))\bigr] = H(X)$$
for a r.v. $Z$ identically distributed with $X$.

Consequences of the AEP: the typical set

Definition: $A_\epsilon^{(n)}$ is the typical set with respect to $P_X(x)$: the set of sequences $x^n \in \mathcal{X}^n$, among all possible sequences, whose probability satisfies
$$2^{-n(H(X)+\epsilon)} \le P_{X^n}(x^n) \le 2^{-n(H(X)-\epsilon)},$$
equivalently
$$H(X) - \epsilon \le -\frac{1}{n} \log\bigl(P_{X^n}(x^n)\bigr) \le H(X) + \epsilon.$$

As $n$ increases, the bounds get closer together, so we are considering a smaller range of probabilities.

We shall use the typical set to describe a set with characteristics that belong to the majority of elements in that set.

Note: the variance of the self-information $-\log(P_X(X))$ is finite for a finite alphabet, which is what the Chebyshev-based WLLN argument requires.

Why is it typical? The AEP says that for every $\epsilon > 0$ and $\delta > 0$ there exists $n_0$ such that for all $n > n_0$,
$$\Pr\bigl(A_\epsilon^{(n)}\bigr) \ge 1 - \delta$$
(note: $\delta$ can be taken equal to $\epsilon$).

How big is the typical set? A small numerical sketch follows below; the counting argument that answers the question precisely comes right after it.
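A minimal numerical sketch of these two properties, assuming a Bernoulli(p) source; the values of p, eps, the block lengths, and the number of Monte Carlo trials below are arbitrary illustrative choices, not part of the lecture. The simulation estimates $\Pr(A_\epsilon^{(n)})$, and the exact count of $|A_\epsilon^{(n)}|$ uses the fact that, for a Bernoulli source, typicality of a block depends only on its number of ones.

```python
import math
import random

# Illustrative parameters (arbitrary choices, not from the lecture):
p = 0.3      # P_X(1) for a Bernoulli(p) source; P_X(0) = 1 - p
eps = 0.05   # epsilon in the typicality condition

# Source entropy H(X) in bits
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def sample_entropy(k, n):
    """-(1/n) log2 P_{X^n}(x^n) for a block of length n containing k ones."""
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n

def prob_typical(n, trials=4000):
    """Monte Carlo estimate of Pr(X^n in A_eps^(n))."""
    hits = 0
    for _ in range(trials):
        k = sum(random.random() < p for _ in range(n))  # ones in an i.i.d. block
        if abs(sample_entropy(k, n) - H) <= eps:
            hits += 1
    return hits / trials

def typical_set_size(n):
    """Exact |A_eps^(n)|: typicality depends only on the number of ones k."""
    return sum(math.comb(n, k) for k in range(n + 1)
               if abs(sample_entropy(k, n) - H) <= eps)

print(f"H(X) = {H:.4f} bits, eps = {eps}")
for n in (100, 400, 1600):
    size = typical_set_size(n)
    print(f"n = {n:4d}: Pr(A_eps^(n)) ~ {prob_typical(n):.3f}, "
          f"(1/n) log2 |A_eps^(n)| = {math.log2(size) / n:.4f} "
          f"(compare H - eps = {H - eps:.4f}, H + eps = {H + eps:.4f})")
```

As n grows, the estimated $\Pr(A_\epsilon^{(n)})$ approaches 1 and $\tfrac{1}{n}\log_2|A_\epsilon^{(n)}|$ stays consistent with the size bounds derived next.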
$$1 = \sum_{x^n \in \mathcal{X}^n} P_{X^n}(x^n) \ge \sum_{x^n \in A_\epsilon^{(n)}} P_{X^n}(x^n) \ge \sum_{x^n \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = \bigl|A_\epsilon^{(n)}\bigr|\, 2^{-n(H(X)+\epsilon)}$$
$$\Rightarrow \bigl|A_\epsilon^{(n)}\bigr| \le 2^{n(H(X)+\epsilon)}$$

For $n$ sufficiently large, $\Pr\bigl(A_\epsilon^{(n)}\bigr) \ge 1 - \epsilon$ (by the AEP), so
$$1 - \epsilon \le \sum_{x^n \in A_\epsilon^{(n)}} P_{X^n}(x^n) \le \bigl|A_\epsilon^{(n)}\bigr|\, 2^{-n(H(X)-\epsilon)}$$
$$\Rightarrow \bigl|A_\epsilon^{(n)}\bigr| \ge 2^{n(H(X)-\epsilon)} (1 - \epsilon)$$

Visualize: of the $|\mathcal{X}|^n$ possible sequences, only about $2^{nH(X)}$ are typical, yet they carry almost all of the probability (figure not shown).

Consequences of the AEP: using the typical set for compression

Describing a sequence in the typical set requires no more than $n(H(X) + \epsilon) + 1$ bits (the correction of 1 bit is due to integrality).

Describing a sequence in the atypical set $A_\epsilon^{(n)^C}$ requires no more than $n \log(|\mathcal{X}|) + 1$ bits.

Add another bit to indicate whether the sequence is in $A_\epsilon^{(n)}$ or not, to get the whole description.

Let $l(x^n)$ be the length of the binary description of $x^n$. For every $\epsilon > 0$ there exists $n_0$ such that for all $n > n_0$,
$$E_{X^n}[l(X^n)] = \sum_{x^n \in A_\delta^{(n)}} P_{X^n}(x^n)\, l(x^n) + \sum_{x^n \in A_\delta^{(n)^C}} P_{X^n}(x^n)\, l(x^n)$$
$$\le \sum_{x^n \in A_\delta^{(n)}} P_{X^n}(x^n) \bigl(n(H(X) + \delta) + 2\bigr) + \sum_{x^n \in A_\delta^{(n)^C}} P_{X^n}(x^n) \bigl(n \log(|\mathcal{X}|) + 2\bigr)$$
$$\le n H(X) + n\epsilon + 2$$
for $\delta$ small enough with respect to $\epsilon$ (using $\Pr\bigl(A_\delta^{(n)^C}\bigr) \le \delta$ for $n$ large), so
$$E_{X^n}\Bigl[\frac{1}{n}\, l(X^n)\Bigr] \le H(X) + \epsilon$$
for $n$ sufficiently large.

Jointly typical sequences

$A_\epsilon^{(n)}$ is the jointly typical set with respect to $P_{X,Y}(x, y)$: the set of sequence pairs $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$, among all possible pairs, whose probabilities satisfy
$$2^{-n(H(X)+\epsilon)} \le P_{X^n}(x^n) \le 2^{-n(H(X)-\epsilon)}$$
$$2^{-n(H(Y)+\epsilon)} \le P_{Y^n}(y^n) \le 2^{-n(H(Y)-\epsilon)}$$
$$2^{-n(H(X,Y)+\epsilon)} \le P_{X^n, Y^n}(x^n, y^n) \le 2^{-n(H(X,Y)-\epsilon)}$$
for $(X^n, Y^n)$ sequences of length $n$ drawn i.i.d. according to
$$P_{X^n, Y^n}(x^n, y^n) = \prod_{i=1}^{n} P_{X,Y}(x_i, y_i).$$

$$\Pr\bigl((X^n, Y^n) \in A_\epsilon^{(n)}\bigr) \to 1 \text{ as } n \to \infty$$

To see this, use the union bound:
$$\Pr\bigl((X^n, Y^n) \notin A_\epsilon^{(n)}\bigr) \le \Pr\bigl((X^n, Y^n) \notin A_\epsilon'''^{(n)}\bigr) + \Pr\bigl(X^n \notin A_\epsilon''^{(n)}\bigr) + \Pr\bigl(Y^n \notin A_\epsilon'^{(n)}\bigr),$$
where $A'''$ denotes the typicality condition on the pair, $A''$ the one on $X$ alone, and $A'$ the one on $Y$ alone. Each term on the right-hand side goes to 0 (by the AEP applied to the pair, to $X$, and to $Y$, respectively).
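A minimal simulation sketch of joint typicality and of the union bound above; the joint pmf, eps, block lengths, and trial counts are arbitrary illustrative assumptions. For each sampled block it checks the three typicality conditions on the same data, so the printed inequality is exactly the union bound, and each term on the right is seen to vanish as n grows.

```python
import math
import random

# Illustrative joint pmf on {0,1} x {0,1} (an arbitrary choice, not from the lecture):
P = {(0, 0): 0.50, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.25}
Px = {x: sum(P[(x, y)] for y in (0, 1)) for x in (0, 1)}   # marginal of X
Py = {y: sum(P[(x, y)] for x in (0, 1)) for y in (0, 1)}   # marginal of Y

def entropy(pmf):
    return -sum(q * math.log2(q) for q in pmf.values() if q > 0)

HX, HY, HXY = entropy(Px), entropy(Py), entropy(P)
eps = 0.05

pairs = list(P)                              # the four symbol pairs
weights = [P[pr] for pr in pairs]
lx = {x: -math.log2(Px[x]) for x in Px}      # per-symbol self-informations
ly = {y: -math.log2(Py[y]) for y in Py}
lxy = {pr: -math.log2(P[pr]) for pr in P}

def miss_rates(n, trials=2000):
    """Estimate Pr of violating each typicality condition, and of leaving A_eps^(n)."""
    miss_x = miss_y = miss_xy = miss_any = 0
    for _ in range(trials):
        block = random.choices(pairs, weights=weights, k=n)   # i.i.d. (X_i, Y_i)
        sx = sum(lx[x] for x, _ in block) / n                 # -(1/n) log2 P(x^n)
        sy = sum(ly[y] for _, y in block) / n                 # -(1/n) log2 P(y^n)
        sxy = sum(lxy[pr] for pr in block) / n                # -(1/n) log2 P(x^n, y^n)
        bad_x = abs(sx - HX) > eps
        bad_y = abs(sy - HY) > eps
        bad_xy = abs(sxy - HXY) > eps
        miss_x += bad_x
        miss_y += bad_y
        miss_xy += bad_xy
        miss_any += (bad_x or bad_y or bad_xy)
    t = trials
    return miss_any / t, miss_xy / t, miss_x / t, miss_y / t

print(f"H(X) = {HX:.3f}, H(Y) = {HY:.3f}, H(X,Y) = {HXY:.3f} bits, eps = {eps}")
for n in (50, 200, 1000, 2000):
    m_any, m_xy, m_x, m_y = miss_rates(n)
    print(f"n = {n:4d}: Pr(not jointly typical) ~ {m_any:.3f} "
          f"<= {m_xy:.3f} + {m_x:.3f} + {m_y:.3f} (union bound terms)")
```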
MIT OpenCourseWare
http://ocw.mit.edu

6.441 Information Theory
Spring 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.