Entropy Rate

Lecture 6: Entropy Rate
Dr. Yao Xie, ECE587, Information Theory, Duke University

Outline
• Entropy rate $H(X)$
• Random walk on a graph

Coin tossing versus poker
• Toss a fair coin and observe a sequence Head, Tail, Tail, Head, ···
  $p(x_1, x_2, \ldots, x_n) \approx 2^{-nH(X)}$
• Play cards with a friend and observe a sequence A♣ K♥ Q♦ J♠ 10♣ ···
  $p(x_1, x_2, \ldots, x_n) \approx\ ?$

How to model dependence: Markov chain
• A stochastic process $X_1, X_2, \cdots$
  – States $X_1, \ldots, X_n$, each $X_i$ taking values in the alphabet $\mathcal{X}$
  – The next step depends only on the previous state:
    $p(x_{n+1} \mid x_n, \ldots, x_1) = p(x_{n+1} \mid x_n)$
  – Transition probability $p_{ij}$: the probability of the transition $i \to j$
  – $p(x_{n+1}) = \sum_{x_n} p(x_n)\, p(x_{n+1} \mid x_n)$
  – $p(x_1, x_2, \cdots, x_n) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_{n-1})$

Hidden Markov model (HMM)
• Used extensively in speech recognition, handwriting recognition, and machine learning.
• Markov process $X_1, X_2, \ldots, X_n$: unobservable
• We observe a random process $Y_1, Y_2, \ldots, Y_n$ such that $Y_i \sim p(y_i \mid x_i)$
• We can build a probability model (sampled in the first code sketch below):
  $p(x^n, y^n) = p(x_1) \prod_{i=1}^{n-1} p(x_{i+1} \mid x_i) \prod_{i=1}^{n} p(y_i \mid x_i)$

Time-invariant Markov chain
• A Markov chain is time invariant if the conditional probability $p(x_n \mid x_{n-1})$ does not depend on $n$:
  $p(X_{n+1} = b \mid X_n = a) = p(X_2 = b \mid X_1 = a)$ for all $a, b \in \mathcal{X}$
• For this kind of Markov chain, define the transition matrix
  $P = \begin{bmatrix} P_{11} & \cdots & P_{1n} \\ \vdots & \ddots & \vdots \\ P_{n1} & \cdots & P_{nn} \end{bmatrix}$

Simple weather model
• $\mathcal{X} = \{\text{Sunny } S,\ \text{Rainy } R\}$
• $p(S \mid S) = 1 - \beta$, $p(R \mid R) = 1 - \alpha$, $p(R \mid S) = \beta$, $p(S \mid R) = \alpha$
  $P = \begin{bmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{bmatrix}$
[Figure: two-state diagram with self-loops $1-\beta$ at S and $1-\alpha$ at R, and cross transitions $\beta$ (S → R) and $\alpha$ (R → S)]

• Probability of seeing the sequence SSRR:
  $p(SSRR) = p(S)\, p(S \mid S)\, p(R \mid S)\, p(R \mid R) = p(S)(1-\beta)\beta(1-\alpha)$
• How will such sequences behave after many days of observation?
• Which sequences of observations are more typical?
• What is the probability of seeing a typical sequence?

Stationary distribution
• Stationary distribution: a distribution $\mu$ on the states such that the distribution at time $n+1$ is the same as the distribution at time $n$.
• Our weather example:
  – If $\mu(S) = \frac{\alpha}{\alpha+\beta}$ and $\mu(R) = \frac{\beta}{\alpha+\beta}$, with
    $P = \begin{bmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{bmatrix}$
  – then
    $p(X_{n+1} = S) = p(S \mid S)\mu(S) + p(S \mid R)\mu(R) = (1-\beta)\frac{\alpha}{\alpha+\beta} + \alpha\frac{\beta}{\alpha+\beta} = \frac{\alpha}{\alpha+\beta} = \mu(S).$

• How to calculate the stationary distribution (checked numerically in the second code sketch below)
  – The stationary distribution $\mu_i$, $i = 1, \cdots, |\mathcal{X}|$, satisfies
    $\mu_i = \sum_j \mu_j p_{ji} \quad (\mu = \mu P), \qquad \sum_{i=1}^{|\mathcal{X}|} \mu_i = 1.$
  – "Detailed balancing": [Figure: for the two-state weather chain, the probability flow $\mu(S)\beta$ from S to R balances the flow $\mu(R)\alpha$ from R to S]

Stationary process
• A stochastic process is stationary if the joint distribution of any subset is invariant to time shifts:
  $p(X_1 = x_1, \cdots, X_n = x_n) = p(X_2 = x_1, \cdots, X_{n+1} = x_n)$
• Example: coin tossing
  $p(X_1 = \text{head}, X_2 = \text{tail}) = p(X_2 = \text{head}, X_3 = \text{tail}) = p(1-p)$

Entropy rate
• When the $X_i$ are i.i.d., $H(X^n) = H(X_1, \cdots, X_n) = \sum_{i=1}^{n} H(X_i) = nH(X)$
• With a dependent sequence $X_i$, how does $H(X^n)$ grow with $n$? Still linearly?
• The entropy rate characterizes the growth rate.
• Definition 1: average entropy per symbol
  $H(X) = \lim_{n \to \infty} \frac{H(X^n)}{n}$
• Definition 2: rate of information innovation
  $H'(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \cdots, X_1)$
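First code sketch: the HMM factorization above is easy to sample from. This is a minimal Python/NumPy illustration built on the weather chain; the parameter values `alpha`, `beta` and the emission matrix `E` are made-up assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.3, 0.1                                # illustrative values
P = np.array([[1 - beta, beta], [alpha, 1 - alpha]])  # p(x_{i+1} | x_i), states [S, R]
E = np.array([[0.9, 0.1], [0.2, 0.8]])                # p(y_i | x_i), assumed emissions
mu = np.array([alpha, beta]) / (alpha + beta)         # start from the stationary distribution

def sample_hmm(n):
    """Draw (x^n, y^n) from p(x^n, y^n) = p(x1) prod p(x_{i+1}|x_i) prod p(y_i|x_i)."""
    x = np.empty(n, dtype=int)
    y = np.empty(n, dtype=int)
    x[0] = rng.choice(2, p=mu)
    for i in range(n):
        if i > 0:
            x[i] = rng.choice(2, p=P[x[i - 1]])       # hidden Markov step
        y[i] = rng.choice(2, p=E[x[i]])               # emission given hidden state
    return x, y

x, y = sample_hmm(10)
print("hidden:  ", x)   # unobservable in practice
print("observed:", y)   # only this sequence is seen
```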
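Second code sketch: a numerical check of the stationary-distribution calculation for the weather model. Again a sketch assuming NumPy, with arbitrary illustrative `alpha` and `beta`; it verifies $\mu = \mu P$, detailed balance, and recovers $\mu$ as the left eigenvector of $P$ with eigenvalue 1.

```python
import numpy as np

alpha, beta = 0.3, 0.1                                # illustrative values
P = np.array([[1 - beta, beta],
              [alpha, 1 - alpha]])                    # rows: from-state, columns: to-state

# Closed-form stationary distribution from the slides
mu = np.array([alpha, beta]) / (alpha + beta)

assert np.allclose(mu @ P, mu)                        # stationarity: mu = mu P
assert np.isclose(mu[0] * beta, mu[1] * alpha)        # detailed balance: mu(S) beta = mu(R) alpha

# Alternative: mu is the left eigenvector of P with eigenvalue 1
eigvals, eigvecs = np.linalg.eig(P.T)
v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
print(mu, v / v.sum())                                # both give the same distribution
```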
• $H'(X)$ exists for stationary $X_i$:
  $H(X_n \mid X_1, \cdots, X_{n-1}) \le H(X_n \mid X_2, \cdots, X_{n-1}) \quad (1)$
  $= H(X_{n-1} \mid X_1, \cdots, X_{n-2}) \quad (2)$
  where (1) holds because conditioning reduces entropy, and (2) holds by stationarity.
  – $H(X_n \mid X_1, \cdots, X_{n-1})$ decreases as $n$ increases
  – conditional entropy is nonnegative, so the sequence is bounded below by 0
  – hence the limit must exist

• $H(X) = H'(X)$ for stationary $X_i$:
  By the chain rule,
  $\frac{1}{n} H(X_1, \cdots, X_n) = \frac{1}{n} \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \cdots, X_1)$
• Each term $H(X_n \mid X_1, \cdots, X_{n-1}) \to H'(X)$
• Cesàro mean: if $a_n \to a$ and $b_n = \frac{1}{n} \sum_{i=1}^{n} a_i$, then $b_n \to a$.
• So
  $\frac{1}{n} H(X_1, \cdots, X_n) \to H'(X)$

AEP for stationary processes
  $-\frac{1}{n} \log p(X_1, \cdots, X_n) \to H(X)$
• $p(X_1, \cdots, X_n) \approx 2^{-nH(X)}$
• The typical set contains about $2^{nH(X)}$ typical sequences, each of probability about $2^{-nH(X)}$
• We can use $nH(X)$ bits to represent a typical sequence

Entropy rate of a Markov chain
• For a stationary Markov chain,
  $H(X) = \lim H(X_n \mid X_{n-1}, \cdots, X_1) = \lim H(X_n \mid X_{n-1}) = H(X_2 \mid X_1)$
• By definition, $p(X_2 = j \mid X_1 = i) = P_{ij}$
• Entropy rate of a Markov chain:
  $H(X) = -\sum_{ij} \mu_i P_{ij} \log P_{ij}$

Calculating the entropy rate is fairly easy (see the first code sketch after the Summary):
1. Find the stationary distribution $\mu_i$.
2. Plug in the transition probabilities $P_{ij}$:
  $H(X) = -\sum_{ij} \mu_i P_{ij} \log P_{ij}$

Entropy rate of the weather model
• Stationary distribution $\mu(S) = \frac{\alpha}{\alpha+\beta}$, $\mu(R) = \frac{\beta}{\alpha+\beta}$, with
  $P = \begin{bmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{bmatrix}$
• $H(X) = \frac{\alpha}{\alpha+\beta}\left[-\beta \log \beta - (1-\beta)\log(1-\beta)\right] + \frac{\beta}{\alpha+\beta}\left[-\alpha \log \alpha - (1-\alpha)\log(1-\alpha)\right] = \frac{\alpha}{\alpha+\beta} H(\beta) + \frac{\beta}{\alpha+\beta} H(\alpha)$
• $H(X) \le 1$ bit, with the maximum at $\alpha = \beta = 1/2$: the chain degenerates to an independent (fair-coin) process

Random walk on a graph
• An undirected graph with $m$ nodes $\{1, \ldots, m\}$
• Edge $(i, j)$ has weight $W_{ij} \ge 0$, with $W_{ij} = W_{ji}$
• A particle walks randomly from node to node
• The random walk $X_1, X_2, \cdots$ is a sequence of vertices
• Given $X_n = i$, the next node is chosen from the neighbors of $i$ with probability
  $P_{ij} = \frac{W_{ij}}{\sum_k W_{ik}}$

Entropy rate of a random walk on a graph (see the second code sketch after the Summary)
• Let
  $W_i = \sum_j W_{ij}, \qquad W = \sum_{i,j\,:\,i > j} W_{ij}$
• The stationary distribution is
  $\mu_i = \frac{W_i}{2W}$
• One can verify that this is a stationary distribution: $\mu P = \mu$
• The stationary probability of node $i$ is proportional to the total weight of the edges emanating from it (locality)

Summary
• AEP: for a stationary process $X_1, X_2, \cdots$ with $X_i \in \mathcal{X}$, as $n \to \infty$,
  $p(x_1, \cdots, x_n) \approx 2^{-nH(X)}$
• Entropy rate:
  $H(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, \ldots, X_1) = \lim_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n)$
• Random walk on a graph:
  $\mu_i = \frac{W_i}{2W}$
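First code sketch: the two-step recipe for the entropy rate of a Markov chain, applied to the weather model, followed by a simulation illustrating the AEP claim that $-\frac{1}{n}\log_2 p(X_1,\cdots,X_n) \to H(X)$. A minimal Python/NumPy sketch; `alpha = 0.3`, `beta = 0.1` are illustrative assumptions.

```python
import numpy as np

alpha, beta = 0.3, 0.1                                # illustrative values
P = np.array([[1 - beta, beta], [alpha, 1 - alpha]])  # weather chain, states [S, R]
mu = np.array([alpha, beta]) / (alpha + beta)         # step 1: stationary distribution

# Step 2: H(X) = -sum_ij mu_i P_ij log2 P_ij  (0 log 0 treated as 0)
logP = np.zeros_like(P)
np.log2(P, out=logP, where=P > 0)
H = -np.sum(mu[:, None] * P * logP)

# Cross-check against the closed form H(X) = mu(S) H(beta) + mu(R) H(alpha)
h = lambda p: -p * np.log2(p) - (1 - p) * np.log2(1 - p)
print(H, mu[0] * h(beta) + mu[1] * h(alpha))          # both ≈ 0.57 bits here

# AEP check: -(1/n) log2 p(X_1,...,X_n) of a simulated path should approach H(X)
rng = np.random.default_rng(0)
n = 100_000
x = np.empty(n, dtype=int)
x[0] = rng.choice(2, p=mu)
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])
logp = np.log2(mu[x[0]]) + np.log2(P[x[:-1], x[1:]]).sum()
print(-logp / n)                                      # ≈ H(X)
```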
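Second code sketch: the random walk on a weighted graph. The symmetric 4-node weight matrix `W` is made up for illustration; the code builds the transition matrix $P_{ij} = W_{ij}/\sum_k W_{ik}$ and checks that $\mu_i = W_i/(2W)$ satisfies $\mu P = \mu$.

```python
import numpy as np

# Hypothetical symmetric weight matrix of a small undirected graph (illustrative)
W = np.array([[0, 2, 1, 0],
              [2, 0, 3, 1],
              [1, 3, 0, 1],
              [0, 1, 1, 0]], dtype=float)
assert np.allclose(W, W.T)          # undirected: W_ij = W_ji

Wi = W.sum(axis=1)                  # W_i = sum_j W_ij, weight emanating from node i
total = W.sum() / 2                 # W = sum over undirected edges (i > j)
mu = Wi / (2 * total)               # claimed stationary distribution mu_i = W_i / (2W)

P = W / Wi[:, None]                 # random-walk transitions P_ij = W_ij / sum_k W_ik
assert np.allclose(mu @ P, mu)      # verify stationarity: mu P = mu
print(mu)
```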
