Asymptotic Equipartition Property

Lecture 5: Asymptotic Equipartition Property
• Law of large numbers for products of random variables
• AEP and consequences

Dr. Yao Xie, ECE587, Information Theory, Duke University

Stock market

• Initial investment $Y_0$, daily return ratio $r_i$; on day $t$ your money is
  $$Y_t = Y_0 \, r_1 \cdots r_t.$$
• Now suppose the return ratios $r_i$ are i.i.d., with
  $$r_i = \begin{cases} 4 & \text{w.p. } 1/2 \\ 0 & \text{w.p. } 1/2. \end{cases}$$
• So you think the expected return ratio is $E r_i = 2$,
• and then
  $$E Y_t = E(Y_0 \, r_1 \cdots r_t) = Y_0 (E r_i)^t = Y_0 2^t \,?$$

Is "optimized" really optimal?

• With $Y_0 = 1$, the actual return $Y_t$ goes like
  $$1, \; 4, \; 16, \; 0, \; 0, \; 0, \ldots$$
• Optimizing the expected return is not optimal?
• Fundamental reason: products do not behave the same way as sums.

(Weak) law of large numbers

Theorem. For independent, identically distributed (i.i.d.) random variables $X_i$,
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i \to EX, \quad \text{in probability}.$$

• Convergence in probability: for every $\epsilon > 0$, $P\{|X_n - X| > \epsilon\} \to 0$.
• Proof by Chebyshev's inequality (Markov's inequality applied to $(\bar{X}_n - EX)^2$).
• So this means
  $$P\{|\bar{X}_n - EX| \le \epsilon\} \to 1, \quad n \to \infty.$$

Other types of convergence

• In mean square: as $n \to \infty$,
  $$E(X_n - X)^2 \to 0.$$
• With probability 1 (almost surely): as $n \to \infty$,
  $$P\left\{\lim_{n \to \infty} X_n = X\right\} = 1.$$
• In distribution: $\lim_n F_n = F$ (at continuity points of $F$), where $F_n$ and $F$ are the cumulative distribution functions of $X_n$ and $X$.

Product of random variables

• How does $\sqrt[n]{\prod_{i=1}^n X_i}$ behave?
• Geometric mean $\sqrt[n]{\prod_{i=1}^n X_i} \;\le\;$ arithmetic mean $\frac{1}{n}\sum_{i=1}^n X_i$.
• Examples:
  – Volume $V$ of a random box with dimensions $X_i$: $V = X_1 \cdots X_n$
  – Stock return $Y_t = Y_0 \, r_1 \cdots r_t$
  – Joint distribution of i.i.d. RVs: $p(x_1, \ldots, x_n) = \prod_{i=1}^n p(x_i)$

Law of large numbers for products of random variables

• We can write $X_i = e^{\log X_i}$.
• Hence
  $$\sqrt[n]{\prod_{i=1}^n X_i} = e^{\frac{1}{n}\sum_{i=1}^n \log X_i}.$$
• So from the LLN applied to $\log X_i$ (with Jensen's inequality, since $\log$ is concave),
  $$\sqrt[n]{\prod_{i=1}^n X_i} \to e^{E \log X} \le e^{\log EX} = EX.$$

• Stock example:
  $$E \log r_i = \frac{1}{2}\log 4 + \frac{1}{2}\log 0 = -\infty,$$
  so $Y_t = Y_0 \, e^{t \cdot \frac{1}{t}\sum_i \log r_i} \approx Y_0 \, e^{t \, E \log r_i} \to 0$ as $t \to \infty$ (indeed $Y_t$ hits 0 as soon as any $r_i = 0$).
• Example:
  $$X = \begin{cases} a & \text{w.p. } 1/2 \\ b & \text{w.p. } 1/2, \end{cases} \qquad \sqrt[n]{\prod_{i=1}^n X_i} \to \sqrt{ab} \le \frac{a+b}{2}.$$
  (The first code sketch at the end of this section checks this limit numerically.)

Asymptotic equipartition property (AEP)

• The LLN states that
  $$\frac{1}{n}\sum_{i=1}^n X_i \to EX.$$
• The AEP states that for most sequences,
  $$\frac{1}{n} \log \frac{1}{p(X_1, X_2, \ldots, X_n)} \to H(X),$$
  so that
  $$p(X_1, X_2, \ldots, X_n) \approx 2^{-nH(X)}.$$
• Analyze using the LLN for products of random variables.

AEP lies at the heart of information theory:
• proof of lossless source coding,
• proof of channel capacity,
• and more...

AEP

Theorem. If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then
$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \to H(X), \quad \text{in probability}.$$

Proof:
$$-\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) = -\frac{1}{n} \sum_{i=1}^n \log p(X_i) \to -E \log p(X) = H(X),$$
by the weak LLN applied to the i.i.d. random variables $\log p(X_i)$. There are several consequences.

Typical set

A typical set $A_\epsilon^{(n)}$ contains all sequences $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ with the property
$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}.$$
(The second code sketch below tests this condition for a Bernoulli sequence.)
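A minimal numerical check of the two-point example above, assuming the illustrative values $a = 1$ and $b = 4$ (my choice, so $\sqrt{ab} = 2$ while $(a+b)/2 = 2.5$); this is a sketch, not part of the lecture's material:

```python
import numpy as np

rng = np.random.default_rng(0)

a, b = 1.0, 4.0                  # assumed values for the two-point example
n = 100_000
x = rng.choice([a, b], size=n)   # i.i.d. draws, each value w.p. 1/2

# Arithmetic mean -> EX = (a + b)/2 = 2.5, by the ordinary LLN.
print("arithmetic mean:", x.mean())

# Geometric mean -> exp(E log X) = sqrt(ab) = 2.0, by the LLN applied to log X.
print("geometric mean: ", np.exp(np.mean(np.log(x))))
```

Running it shows the two means converging to different limits, which is the whole point of treating products through their logarithms.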
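The typicality condition can likewise be tested numerically. Below is a small sketch for an i.i.d. Bernoulli source, assuming the illustrative parameters $p = 0.8$ (matching the coin example in the next section), $n = 1000$, and $\epsilon = 0.05$:

```python
import numpy as np

rng = np.random.default_rng(1)

p, n, eps = 0.8, 1000, 0.05          # assumed parameters for the demo
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # H(0.8) ~ 0.722 bits

x = rng.random(n) < p                # i.i.d. Bernoulli(p) sequence
k = int(x.sum())                     # number of 1s

# Sample entropy -(1/n) log2 p(X_1, ..., X_n); by the AEP it concentrates at H(X).
sample_entropy = -(k * np.log2(p) + (n - k) * np.log2(1 - p)) / n

print(f"H(X) = {H:.4f} bits, sample entropy = {sample_entropy:.4f} bits")
print("in typical set:", abs(sample_entropy - H) <= eps)   # typicality test
```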
Not all sequences are created equal

• Coin tossing example: $X \in \{0, 1\}$, $p(1) = 0.8$.
  $$p(1, 0, 1, 1, 0, 1) = p^{\sum x_i} (1-p)^{6 - \sum x_i} = p^4 (1-p)^2 = 0.0164$$
  $$p(0, 0, 0, 0, 0, 0) = (1-p)^6 = 0.000064$$
• In this example, if $(x_1, \ldots, x_n) \in A_\epsilon^{(n)}$,
  $$H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, \ldots, x_n) \le H(X) + \epsilon.$$
• This means a binary sequence is in the typical set exactly when its fraction of 1s, $k/n$, is approximately $p$.

[Figure: array of dots representing the sequences of $\mathcal{X}^n$; arrows mark the small fraction of sequences that are typical.]

$p = 0.6$, $n = 25$, $k$ = number of "1"s. (The first code sketch at the end of this section reproduces this table.)

 k   (25 choose k)   (25 choose k) p^k (1-p)^(25-k)   -(1/n) log p(x^n)
 0               1                 0.000000              1.321928
 1              25                 0.000000              1.298530
 2             300                 0.000000              1.275131
 3            2300                 0.000001              1.251733
 4           12650                 0.000007              1.228334
 5           53130                 0.000045              1.204936
 6          177100                 0.000227              1.181537
 7          480700                 0.000925              1.158139
 8         1081575                 0.003121              1.134740
 9         2042975                 0.008843              1.111342
10         3268760                 0.021222              1.087943
11         4457400                 0.043410              1.064545
12         5200300                 0.075967              1.041146
13         5200300                 0.113950              1.017748
14         4457400                 0.146507              0.994349
15         3268760                 0.161158              0.970951
16         2042975                 0.151086              0.947552
17         1081575                 0.119980              0.924154
18          480700                 0.079986              0.900755
19          177100                 0.044203              0.877357
20           53130                 0.019891              0.853958
21           12650                 0.007104              0.830560
22            2300                 0.001937              0.807161
23             300                 0.000379              0.783763
24              25                 0.000047              0.760364
25               1                 0.000003              0.736966

Consequences of AEP

Theorem.
1. If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then
   $$H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$$
2. $P\{A_\epsilon^{(n)}\} \ge 1 - \epsilon$ for $n$ sufficiently large.
3. $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$.
4. $|A_\epsilon^{(n)}| \ge (1 - \epsilon) \, 2^{n(H(X)-\epsilon)}$ for $n$ sufficiently large.

Property 1

If $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$, then
$$H(X) - \epsilon \le -\frac{1}{n} \log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon.$$

• Proof from the definition: $(x_1, x_2, \ldots, x_n) \in A_\epsilon^{(n)}$ if
  $$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)};$$
  take $-\frac{1}{n}\log$ of each term.
• The number of bits used to describe sequences in the typical set is approximately $nH(X)$.

Property 2

$P\{A_\epsilon^{(n)}\} \ge 1 - \epsilon$ for $n$ sufficiently large.

• Proof: From the AEP, because
  $$-\frac{1}{n} \log p(X_1, \ldots, X_n) \to H(X)$$
  in probability, for a given $\epsilon > 0$, when $n$ is sufficiently large,
  $$P\left\{\left|-\frac{1}{n} \log p(X_1, \ldots, X_n) - H(X)\right| \le \epsilon\right\} \ge 1 - \epsilon,$$
  and the event on the left is exactly $\{X^n \in A_\epsilon^{(n)}\}$.
  – High probability: sequences in the typical set are "most typical".
  – These sequences almost all have the same probability – "equipartition".

Properties 3 and 4: size of the typical set

$$(1 - \epsilon) \, 2^{n(H(X)-\epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$$

• Proof:
  $$1 = \sum_{(x_1, \ldots, x_n)} p(x_1, \ldots, x_n) \ge \sum_{(x_1, \ldots, x_n) \in A_\epsilon^{(n)}} p(x_1, \ldots, x_n) \ge \sum_{(x_1, \ldots, x_n) \in A_\epsilon^{(n)}} 2^{-n(H(X)+\epsilon)} = |A_\epsilon^{(n)}| \, 2^{-n(H(X)+\epsilon)}.$$

  On the other hand, $P\{A_\epsilon^{(n)}\} \ge 1 - \epsilon$ for $n$ sufficiently large, so
  $$1 - \epsilon \le \sum_{(x_1, \ldots, x_n) \in A_\epsilon^{(n)}} p(x_1, \ldots, x_n) \le |A_\epsilon^{(n)}| \, 2^{-n(H(X)-\epsilon)}.$$
• The size of the typical set depends on $H(X)$. (The second sketch below checks these bounds by brute force for a small $n$.)
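The table above can be reproduced directly from the formulas in its header; here is a short sketch using only the Python standard library:

```python
from math import comb, log2

n, p = 25, 0.6

# One row per k: the number of sequences with k ones, the probability of
# observing exactly k ones, and the sample entropy -(1/n) log2 p(x^n)
# shared by every sequence with k ones.
print(" k   (n choose k)   prob of k ones   -(1/n) log2 p(x^n)")
for k in range(n + 1):
    count = comb(n, k)
    prob_k = count * p ** k * (1 - p) ** (n - k)
    sample_entropy = -(k * log2(p) + (n - k) * log2(1 - p)) / n
    print(f"{k:2d} {count:14d} {prob_k:16.6f} {sample_entropy:20.6f}")
```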
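Properties 2–4 can also be checked by brute force for a small block length. The sketch below assumes the illustrative parameters $n = 12$ and $\epsilon = 0.1$ (my choice); note that $n = 12$ is not yet "sufficiently large", so Property 2's bound may fall short even while the size bounds hold:

```python
from itertools import product
from math import log2

p, n, eps = 0.6, 12, 0.1                     # assumed small-n parameters
H = -(p * log2(p) + (1 - p) * log2(1 - p))   # H(0.6) ~ 0.971 bits

# Enumerate all 2^n binary sequences and collect the typical set.
size, prob = 0, 0.0
for x in product([0, 1], repeat=n):
    k = sum(x)
    px = p ** k * (1 - p) ** (n - k)
    if abs(-log2(px) / n - H) <= eps:        # typicality condition (Property 1)
        size += 1
        prob += px

print(f"P(A) = {prob:.3f}   (Property 2 needs n sufficiently large)")
print(f"|A| = {size}")
print(f"lower bound (1-eps) 2^(n(H-eps)) = {(1 - eps) * 2 ** (n * (H - eps)):.0f}")
print(f"upper bound 2^(n(H+eps))         = {2 ** (n * (H + eps)):.0f}")
```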
• When $p = 1/2$ in the coin-tossing example, $H(X) = 1$ and $2^{nH(X)} = 2^n$: all sequences are typical sequences.

Typical set diagram

• This enables us to divide all sequences into two sets:
  – Typical set: high probability of occurring, sample entropy close to the true entropy, so we will focus on analyzing sequences in the typical set.
  – Non-typical set: small probability, can be ignored in general.

[Diagram: the set $\mathcal{X}^n$ with $|\mathcal{X}|^n$ elements, containing the typical set $A_\epsilon^{(n)}$ with $\approx 2^{n(H+\epsilon)}$ elements.]

Data compression scheme from AEP

• Let $X_1, X_2, \ldots, X_n$ be i.i.d. RVs drawn from $p(x)$.
• We wish to find short descriptions for such sequences of RVs.
• Divide all sequences in $\mathcal{X}^n$ into two sets:
  – Non-typical set: description takes $n \log |\mathcal{X}| + 2$ bits.
  – Typical set: description takes $n(H + \epsilon) + 2$ bits.
• Use one bit to indicate which set:
  – Typical set $A_\epsilon^{(n)}$: use prefix "1". Since there are no more than $2^{n(H(X)+\epsilon)}$ such sequences, indexing requires no more than $\lceil n(H(X) + \epsilon) \rceil$ bits (plus one extra bit for the prefix).
  – Non-typical set: use prefix "0". Since there are at most $|\mathcal{X}|^n$ sequences, indexing requires no more than $\lceil n \log |\mathcal{X}| \rceil$ bits (plus one extra bit for the prefix).
• Notation: $x^n = (x_1, \ldots, x_n)$, $l(x^n)$ = length of the codeword for $x^n$.
• We can prove that for $n$ sufficiently large,
  $$E\left[\frac{1}{n} l(X^n)\right] \le H(X) + \epsilon.$$
  (A minimal sketch of this two-part code appears after this section.)

Summary of AEP

Almost everything is almost equally probable.

• Reasons that the AEP has $H(X)$:
  – $-\frac{1}{n} \log p(x^n) \to H(X)$, in probability
  – $n(H(X) + \epsilon)$ bits suffice to describe a random sequence on average
  – $2^{H(X)}$ is the effective alphabet size
  – The typical set is the smallest set with probability near 1
  – The size of the typical set is $\approx 2^{nH(X)}$
  – The probabilities of elements in the typical set are nearly uniform
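A minimal sketch of the two-part code for a binary source, assuming the illustrative parameters $p = 0.9$, $n = 12$, $\epsilon = 0.1$; the helper `typical_code` is my own name, and the brute-force enumeration of the typical set is only feasible for tiny $n$ (the lecture's argument, of course, needs no enumeration):

```python
from itertools import product
from math import ceil, log2

def typical_code(p, n, eps):
    """Two-part AEP code for i.i.d. Bernoulli(p) sequences of length n.

    Typical sequences get prefix '1' plus a ceil(n*(H+eps))-bit index;
    all other sequences get prefix '0' plus the raw n-bit sequence
    (for a binary alphabet, n log2|X| = n bits index all of X^n).
    """
    H = -(p * log2(p) + (1 - p) * log2(1 - p))
    # Enumerate the typical set by brute force (small n only).
    typical = [x for x in product([0, 1], repeat=n)
               if abs(-log2(p ** sum(x) * (1 - p) ** (n - sum(x))) / n - H) <= eps]
    index = {x: i for i, x in enumerate(typical)}
    idx_bits = ceil(n * (H + eps))       # enough bits, since |A| <= 2^(n(H+eps))

    def encode(x):
        if x in index:
            return "1" + format(index[x], f"0{idx_bits}b")
        return "0" + "".join(map(str, x))

    return encode

encode = typical_code(p=0.9, n=12, eps=0.1)
print(encode((1,) * 12))                              # all ones: atypical, 13 bits
print(encode((1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1)))   # typical, 8 bits
```

The typical codeword costs $1 + \lceil n(H + \epsilon) \rceil = 8$ bits instead of the raw $13$, which is the expected-length saving the theorem formalizes.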
