Lecture 6: Entropy Rate

Dr. Yao Xie, ECE587, Information Theory, Duke University

• Entropy rate H(X)
• Random walk on graph

Coin tossing versus poker

• Toss a fair coin and see a sequence

Head, Tail, Tail, Head ···

p(x1, x2, ..., xn) ≈ 2^{−nH(X)}
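A minimal numerical sketch of this statement in Python. The bias p = 0.7 is a hypothetical choice so the convergence is nontrivial (for a fair coin every sequence has probability exactly 2^{−n}):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.7, 100_000              # hypothetical bias and sequence length
x = rng.random(n) < p            # simulated tosses (True = head)

# -(1/n) log2 p(x1, ..., xn) should approach H(X) = H(p) ≈ 0.881 bits
log_p = np.where(x, np.log2(p), np.log2(1 - p)).sum()
print(-log_p / n)
```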

• Play card games with a friend and see a sequence

A♣ K♥ Q♦ J♠ 10♣ ···

p(x1, x2, ..., xn) ≈ ?

How to model dependence: Markov chain

• A Markov chain X1, X2, ···

– States {X1, ..., Xn}, each state Xi ∈ X
– The next step depends only on the previous state

p(xn+1|xn,..., x1) = p(xn+1|xn).

– Transition probability

Pij: the transition probability of i → j
– p(xn+1) = Σ_{xn} p(xn) p(xn+1|xn)
– p(x1, x2, ···, xn) = p(x1) p(x2|x1) ··· p(xn|xn−1)
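These two identities are just vector–matrix products. A minimal sketch with a hypothetical 2-state chain (all numbers are made up for illustration):

```python
import numpy as np

# Hypothetical transition matrix: P[i, j] = p(next = j | current = i)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
p0 = np.array([1.0, 0.0])        # p(x1): start in state 0

# Marginal update: p(x_{n+1}) = sum over xn of p(xn) p(x_{n+1} | xn)
print(p0 @ P)                    # [0.9, 0.1]

# Chain rule: p(x1, ..., xn) = p(x1) * prod_i p(x_{i+1} | x_i)
seq = np.array([0, 0, 1, 1])
print(p0[seq[0]] * np.prod(P[seq[:-1], seq[1:]]))   # 1.0 * 0.9 * 0.1 * 0.7
```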

Hidden Markov model (HMM)

• Used extensively in speech recognition, handwriting recognition, and machine learning.

• Markov process X1, X2, ..., Xn is unobservable (hidden)

• Observe a random process Y1, Y2,..., Yn, such that

Yi ∼ p(yi|xi)

• We can build a probability model

p(x^n, y^n) = p(x1) ∏_{i=1}^{n−1} p(xi+1|xi) ∏_{i=1}^{n} p(yi|xi)
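A sketch of this factorization for a hypothetical toy HMM (initial distribution, transitions, and emissions below are made-up numbers):

```python
import numpy as np

p1 = np.array([0.5, 0.5])        # p(x1): initial hidden-state distribution
P  = np.array([[0.9, 0.1],       # p(x_{i+1} | x_i): hidden transitions
               [0.3, 0.7]])
B  = np.array([[0.8, 0.2],       # p(y_i | x_i): emission probabilities
               [0.1, 0.9]])

def joint(x, y):
    """p(x^n, y^n) = p(x1) * prod p(x_{i+1}|x_i) * prod p(y_i|x_i)."""
    return p1[x[0]] * np.prod(P[x[:-1], x[1:]]) * np.prod(B[x, y])

x = np.array([0, 0, 1])
y = np.array([0, 1, 1])
print(joint(x, y))               # 0.5 * (0.9 * 0.1) * (0.8 * 0.2 * 0.9)
```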

Time-invariant Markov chain

• A Markov chain is time invariant if the conditional probability p(xn|xn−1) does not depend on n

p(Xn+1 = b|Xn = a) = p(X2 = b|X1 = a), for all a, b ∈ X

• For this kind of Markov chain, define the transition matrix

P = [ P11 ··· P1n
       ⋮   ⋱   ⋮
      Pn1 ··· Pnn ]

Simple weather model

• X = {Sunny: S, Rainy: R}

• p(S|S) = 1 − β, p(R|R) = 1 − α, p(R|S) = β, p(S|R) = α

P = [ 1 − β    β
        α    1 − α ]

#"

"! " "!#" "

"

• Probability of seeing a sequence SSRR:

p(SSRR) = p(S)p(S|S)p(R|S)p(R|R) = p(S)(1 − β)β(1 − α)
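Plugging in hypothetical rates (α = 0.4, β = 0.2), and taking p(S) to be the stationary probability α/(α+β), an assumption for illustration:

```python
alpha, beta = 0.4, 0.2           # hypothetical p(S|R) and p(R|S)
p_S = alpha / (alpha + beta)     # assume p(S) is the stationary probability

# p(SSRR) = p(S) p(S|S) p(R|S) p(R|R) = p(S)(1 - beta) beta (1 - alpha)
print(p_S * (1 - beta) * beta * (1 - alpha))   # 0.064
```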

• How will such sequences behave after many days of observations?

• What sequences of observations are more typical?

• What is the probability of seeing a typical sequence?

Stationary distribution

• Stationary distribution: a distribution µ on the states such that the distribution at time n + 1 is the same as the distribution at time n.

• Our weather example:

– If µ(S) = α/(α+β), µ(R) = β/(α+β), with

P = [ 1 − β    β
        α    1 − α ]

– Then

p(Xn+1 = S) = p(S|S)µ(S) + p(S|R)µ(R) = (1 − β) · α/(α+β) + α · β/(α+β) = α/(α+β) = µ(S).

• How to calculate the stationary distribution

– Stationary distribution µi, i = 1, ··· , |X| satisfies

µi = Σj µj Pji  (i.e., µ = µP),  and  Σi µi = 1
(solved numerically in the sketch after this list)

– “Detailed balancing”: the probability flow between each pair of states balances, µi Pij = µj Pji; for the weather chain, µ(S)β = µ(R)α

# "

" $! "#!"

$" "#""

Stationary process

• A stochastic process is stationary if the joint distribution of any subset is invariant to time-shift

p(X1 = x1, ··· , Xn = xn) = p(X2 = x1, ··· , Xn+1 = xn).

• Example: i.i.d. coin tossing with P(head) = p

p(X1 = head, X2 = tail) = p(X2 = head, X3 = tail) = p(1 − p).

Entropy rate

• When Xi are i.i.d., entropy H(X^n) = H(X1, ···, Xn) = Σ_{i=1}^{n} H(Xi) = nH(X)

• With a dependent sequence Xi, how does H(X^n) grow with n? Still linearly?

• Entropy rate characterizes the growth rate

• Definition 1: average entropy per symbol

H(X) = lim_{n→∞} H(X^n) / n

• Definition 2: rate of information innovation

H′(X) = lim_{n→∞} H(Xn|Xn−1, ···, X1)

• H′(X) exists for stationary Xi

H(Xn|X1, ···, Xn−1) ≤ H(Xn|X2, ···, Xn−1)   (conditioning reduces entropy)  (1)

= H(Xn−1|X1, ···, Xn−2)   (stationarity)  (2)

– H(Xn|X1, ···, Xn−1) decreases as n increases
– H(Xn|X1, ···, Xn−1) ≥ 0
– A decreasing sequence bounded below must have a limit

• H(X) = H′(X) for stationary Xi

(1/n) H(X1, ···, Xn) = (1/n) Σ_{i=1}^{n} H(Xi|Xi−1, ···, X1)   (chain rule)

• Each H(Xn|X1, ···, Xn−1) → H′(X)

• Cesàro mean:

If an → a and bn = (1/n) Σ_{i=1}^{n} ai, then bn → a.

• So (1/n) H(X1, ···, Xn) → H′(X)

AEP for stationary process

−(1/n) log p(X1, ···, Xn) → H(X)

• p(X1, ···, Xn) ≈ 2^{−nH(X)}

• The typical set has size ≈ 2^{nH(X)}

• We can use nH(X) bits to represent a typical sequence; a numerical check follows
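A sketch checking the AEP numerically for a hypothetical 2-state Markov chain, started from its stationary distribution so the process is stationary:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],        # hypothetical stationary chain
              [0.3, 0.7]])
mu = np.array([0.75, 0.25])      # its stationary distribution (mu P = mu)

n = 50_000
x = np.empty(n, dtype=int)
x[0] = rng.choice(2, p=mu)       # start in stationarity
for i in range(1, n):
    x[i] = rng.choice(2, p=P[x[i - 1]])

# -(1/n) log2 p(x1, ..., xn) should approach the entropy rate H(X)
log_p = np.log2(mu[x[0]]) + np.log2(P[x[:-1], x[1:]]).sum()
print(-log_p / n)                # ≈ 0.75 H(0.1) + 0.25 H(0.3) ≈ 0.57 bits
```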

Entropy rate for Markov chain

• For a stationary Markov chain

H(X) = lim H(Xn|Xn−1, ··· , X1) = lim H(Xn|Xn−1) = H(X2|X1)

• By definition, p(X2 = j|X1 = i) = Pij

• Entropy rate of a Markov chain:

H(X) = − Σ_{i,j} µi Pij log Pij

Calculating the entropy rate is fairly easy

1. Find stationary distribution µi

2. Use the transition probabilities Pij:

H(X) = − Σ_{i,j} µi Pij log Pij
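These two steps in code, as a minimal sketch (the chain below is hypothetical):

```python
import numpy as np

def entropy_rate(P):
    """-sum_{i,j} mu_i P_ij log2 P_ij for the stationary distribution mu."""
    # Step 1: find mu as the left eigenvector of P with eigenvalue 1
    w, v = np.linalg.eig(P.T)
    mu = np.real(v[:, np.argmin(np.abs(w - 1))])
    mu /= mu.sum()
    # Step 2: weight the per-row entropies by mu (0 log 0 treated as 0)
    logP = np.log2(P, where=P > 0, out=np.zeros_like(P))
    return -(mu[:, None] * P * logP).sum()

P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
print(entropy_rate(P))           # ≈ 0.57 bits per symbol
```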

Entropy rate of weather model

Stationary distribution µ(S) = α/(α+β), µ(R) = β/(α+β), with

P = [ 1 − β    β
        α    1 − α ]

H(X) = β/(α+β) [−α log α − (1−α) log(1−α)] + α/(α+β) [−β log β − (1−β) log(1−β)]
     = β/(α+β) H(α) + α/(α+β) H(β)
     ≤ H(2αβ/(α+β)) ≤ H(√(αβ))

Maximum when α = β = 1/2: degenerates to an independent process
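A quick check of the closed form (same hypothetical α, β as before); at α = β = 1/2 it gives the maximum of 1 bit:

```python
import numpy as np

def H2(p):                       # binary entropy in bits
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha, beta = 0.4, 0.2
closed = beta / (alpha + beta) * H2(alpha) + alpha / (alpha + beta) * H2(beta)
print(closed)                    # ≈ 0.805 bits; equals 1 at alpha = beta = 0.5
```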

Random walk on graph

• An undirected graph with m nodes {1,..., m}

• Edge between i and j has weight Wij ≥ 0 (Wij = Wji)

• A particle walks randomly from node to node

• Random walk X1, X2, ··· : a sequence of vertices

• Given Xn = i, the next node is chosen among the neighbors of i with probability

Pij = Wij / Σk Wik
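Building this transition matrix from a hypothetical symmetric weight matrix:

```python
import numpy as np

# Hypothetical weights on a 3-node undirected graph (W_ij = W_ji, 0 = no edge)
W = np.array([[0., 1., 2.],
              [1., 0., 3.],
              [2., 3., 0.]])

P = W / W.sum(axis=1, keepdims=True)   # P_ij = W_ij / sum_k W_ik
print(P)
```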

Entropy rate of random walk on graph

• Let Wi = Σj Wij (total weight at node i), and W = Σ_{i,j: i>j} Wij (total edge weight, each edge counted once)

• The stationary distribution is µi = Wi / (2W)

• Can verify this is a stationary distribution: µP = µ (see the sketch after this list)

• Stationary probability µi ∝ total weight of edges emanating from node i (locality)
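A sketch verifying µP = µ for the hypothetical graph above:

```python
import numpy as np

W = np.array([[0., 1., 2.],      # same hypothetical weight matrix as above
              [1., 0., 3.],
              [2., 3., 0.]])
Wi = W.sum(axis=1)               # W_i: total weight at node i
total = W.sum() / 2              # W: each undirected edge counted once
mu = Wi / (2 * total)

P = W / Wi[:, None]              # P_ij = W_ij / W_i
print(mu, np.allclose(mu @ P, mu))   # mu is stationary: True
```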

Summary

• AEP for a stationary process X1, X2, ···, Xi ∈ X: as n → ∞

p(x1, ···, xn) ≈ 2^{−nH(X)}

• Entropy rate

H(X) = lim_{n→∞} H(Xn|Xn−1, ..., X1) = lim_{n→∞} (1/n) H(X1, ..., Xn)

• Random walk on graph: µi = Wi / (2W)
