Lecture 6: Entropy Rate
• Entropy rate H(X)
• Random walk on graph
Dr. Yao Xie, ECE587, Information Theory, Duke University

Coin tossing versus poker
• Toss a fair coin and see a sequence
Head, Tail, Tail, Head ···
p(x1, x2, ..., xn) ≈ 2^{−nH(X)}
• Play a card game with a friend and see a sequence
A♣ K♥ Q♦ J♠ 10♣ ···
p(x1, x2, ..., xn) ≈ ?
How to model dependence: Markov chain
• A stochastic process X1, X2, ···
– States {X1, ..., Xn}, each state Xi ∈ X
– The next step depends only on the previous state
p(xn+1|xn,..., x1) = p(xn+1|xn).
– Transition probability
  Pi j: the transition probability of i → j
– p(xn+1) = ∑_{xn} p(xn) p(xn+1|xn)
– p(x1, x2, ··· , xn) = p(x1) p(x2|x1) ··· p(xn|xn−1)
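The chain-rule factorization above can be sketched in code. This is a minimal illustration with an assumed 2-state chain; the transition probabilities and initial distribution are made-up numbers, not from the lecture.

```python
import random

# Assumed 2-state transition probabilities P[i][j] = p(next = j | current = i)
P = {0: {0: 0.9, 1: 0.1},
     1: {0: 0.5, 1: 0.5}}
p_init = {0: 0.5, 1: 0.5}  # assumed initial distribution p(x1)

def joint_prob(seq):
    # p(x1, ..., xn) = p(x1) * prod_i p(x_{i+1} | x_i), by the Markov property
    prob = p_init[seq[0]]
    for a, b in zip(seq, seq[1:]):
        prob *= P[a][b]
    return prob

def sample_path(n, seed=0):
    # Simulate n steps: the next state depends only on the current one
    rng = random.Random(seed)
    x = rng.choices([0, 1], weights=[p_init[0], p_init[1]])[0]
    path = [x]
    for _ in range(n - 1):
        x = rng.choices([0, 1], weights=[P[x][0], P[x][1]])[0]
        path.append(x)
    return path
```

For example, joint_prob([0, 0, 1]) multiplies p(x1 = 0), P[0][0], and P[0][1], exactly the factorization on the slide.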
Hidden Markov model (HMM)
• Used extensively in speech recognition, handwriting recognition, machine learning.
• Markov process X1, X2,..., Xn, unobservable
• Observe a random process Y1, Y2,..., Yn, such that
Yi ∼ p(yi|xi)
• We can build a probability model
p(x^n, y^n) = p(x1) ∏_{i=1}^{n−1} p(xi+1|xi) ∏_{i=1}^{n} p(yi|xi)
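The HMM joint probability above factors into an initial term, transition terms, and emission terms. A small sketch with an assumed toy model (all numbers illustrative):

```python
# Toy HMM: hidden states 0/1, observations 'a'/'b' (all values assumed)
A = [[0.8, 0.2], [0.3, 0.7]]          # transitions p(x_{i+1} | x_i)
B = [{'a': 0.9, 'b': 0.1},            # emissions p(y_i | x_i = 0)
     {'a': 0.2, 'b': 0.8}]            # emissions p(y_i | x_i = 1)
pi = [0.6, 0.4]                       # initial distribution p(x1)

def joint(xs, ys):
    # p(x^n, y^n) = p(x1) * prod_{i<n} p(x_{i+1}|x_i) * prod_i p(y_i|x_i)
    p = pi[xs[0]]
    for a, b in zip(xs, xs[1:]):
        p *= A[a][b]
    for x, y in zip(xs, ys):
        p *= B[x][y]
    return p
```

Each factor in the code matches one factor in the slide's product formula.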
Time-invariant Markov chain
• A Markov chain is time invariant if the conditional probability p(xn|xn−1) does not depend on n
p(Xn+1 = b|Xn = a) = p(X2 = b|X1 = a), for all a, b ∈ X
• For such a Markov chain, define the transition matrix

      [ P11 ··· P1n ]
  P = [  ⋮   ⋱   ⋮  ]
      [ Pn1 ··· Pnn ]
Simple weather model
• X = {Sunny: S, Rainy: R}
• p(S|S) = 1 − β, p(R|R) = 1 − α, p(R|S) = β, p(S|R) = α

      [ 1 − β    β   ]
  P = [   α    1 − α ]
[Figure: two-state transition diagram between S and R with transition probabilities α and β]
• Probability of seeing a sequence SSRR:
p(SSRR) = p(S)p(S|S)p(R|S)p(R|R) = p(S)(1 − β)β(1 − α)
• How will such sequences behave after many days of observations?
• What sequences of observations are more typical?
• What is the probability of seeing a typical sequence?
Stationary distribution
• Stationary distribution: a distribution µ on the states such that the distribution at time n + 1 is the same as the distribution at time n.
• Our weather example:
  – If µ(S) = α/(α + β), µ(R) = β/(α + β)

        [ 1 − β    β   ]
    P = [   α    1 − α ]
– Then
p(Xn+1 = S) = p(S|S)µ(S) + p(S|R)µ(R)
            = (1 − β) · α/(α + β) + α · β/(α + β)
            = α/(α + β) = µ(S).
• How to calculate stationary distribution
– Stationary distribution µi, i = 1, ··· , |X| satisfies
µi = ∑_j µ j P ji  (i.e., µ = µP),  and  ∑_{i=1}^{|X|} µi = 1.
– “Detailed balance”: µi Pi j = µ j P ji for every pair of states i, j
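The fixed-point condition µ = µP can be solved numerically by iterating µ ← µP. A minimal sketch for the weather chain, assuming example values α = 0.3, β = 0.1 (not from the lecture):

```python
# Power iteration for the stationary distribution mu = mu P
alpha, beta = 0.3, 0.1              # assumed example parameters
P = [[1 - beta, beta],              # row S: p(S|S), p(R|S)
     [alpha, 1 - alpha]]            # row R: p(S|R), p(R|R)

mu = [0.5, 0.5]                     # any starting distribution
for _ in range(1000):
    # one step of mu <- mu P
    mu = [sum(mu[j] * P[j][i] for j in range(2)) for i in range(2)]

# For this chain the closed form is mu(S) = alpha/(alpha+beta),
# mu(R) = beta/(alpha+beta), so the iterate should match it
```

Since the chain is ergodic, the iteration converges geometrically; the second eigenvalue here is 1 − α − β.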
Stationary process
• A stochastic process is stationary if the joint distribution of any subset is invariant to time-shift
p(X1 = x1, ··· , Xn = xn) = p(X2 = x1, ··· , Xn+1 = xn).
• Example: coin tossing
p(X1 = head, X2 = tail) = p(X2 = head, X3 = tail) = p(1 − p).
Entropy rate

• When the Xi are i.i.d., the entropy is H(X^n) = H(X1, ··· , Xn) = ∑_{i=1}^{n} H(Xi) = nH(X)
• With a dependent sequence Xi, how does H(X^n) grow with n? Still linearly?
• Entropy rate characterizes the growth rate
• Definition 1: average entropy per symbol
H(X) = lim_{n→∞} (1/n) H(X^n)
• Definition 2: rate of information innovation
H′(X) = lim_{n→∞} H(Xn|Xn−1, ··· , X1)
• H′(X) exists for stationary Xi
H(Xn|X1, ··· , Xn−1) ≤ H(Xn|X2, ··· , Xn−1)   (conditioning reduces entropy)
                     = H(Xn−1|X1, ··· , Xn−2)   (stationarity)

– So H(Xn|X1, ··· , Xn−1) is non-increasing in n
– It is bounded below by 0
– Hence the limit exists
• H(X) = H′(X) for stationary Xi
(1/n) H(X1, ··· , Xn) = (1/n) ∑_{i=1}^{n} H(Xi|Xi−1, ··· , X1)   (chain rule)
′ • Each H(Xn|X1, ··· , Xn−1) → H (X)
• Cesàro mean: if an → a and bn = (1/n) ∑_{i=1}^{n} ai, then bn → a.
• So (1/n) H(X1, ··· , Xn) → H′(X)
AEP for stationary process
−(1/n) log p(X1, ··· , Xn) → H(X)
• p(X1, ··· , Xn) ≈ 2^{−nH(X)}
• Typical sequences form a typical set of size ≈ 2^{nH(X)}
• We can use about nH(X) bits to represent a typical sequence
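The AEP statement can be checked by simulation: along a long sample path, −(1/n) log2 p(X1, ..., Xn) should settle near the entropy rate. A sketch using the weather chain with assumed parameters α = 0.3, β = 0.1 (illustrative values, not from the lecture):

```python
import math
import random

alpha, beta = 0.3, 0.1                       # assumed example parameters
P = [[1 - beta, beta], [alpha, 1 - alpha]]   # states: 0 = S, 1 = R
mu = [alpha / (alpha + beta), beta / (alpha + beta)]  # stationary distribution

rng = random.Random(1)
n = 100_000
x = rng.choices([0, 1], weights=mu)[0]
log_p = math.log2(mu[x])
for _ in range(n - 1):
    nx = rng.choices([0, 1], weights=P[x])[0]
    log_p += math.log2(P[x][nx])             # accumulate log p of the path
    x = nx

# Entropy rate H(X) = -sum_{i,j} mu_i P_ij log2 P_ij
H_rate = -sum(mu[i] * P[i][j] * math.log2(P[i][j])
              for i in range(2) for j in range(2))
print(-log_p / n, H_rate)   # the two numbers should be close for large n
```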
Entropy rate for Markov chain
• For Markov chain
H(X) = lim H(Xn|Xn−1, ··· , X1) = lim H(Xn|Xn−1) = H(X2|X1)
• By definition p(X2 = j|X1 = i) = Pi j
• Entropy rate of Markov chain

  H(X) = − ∑_{i, j} µi Pi j log Pi j
Calculating the entropy rate is fairly easy
1. Find stationary distribution µi
2. Use the transition probabilities Pi j:

   H(X) = − ∑_{i, j} µi Pi j log Pi j
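The two steps above translate directly into a short routine. This is a sketch assuming an ergodic chain, so the stationary distribution can be found by power iteration; the 3-state example matrix is made up for illustration.

```python
import math

def entropy_rate(P, iters=2000):
    # Step 1: find the stationary distribution by iterating mu <- mu P
    # (assumes the chain is ergodic so the iteration converges)
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[j] * P[j][i] for j in range(n)) for i in range(n)]
    # Step 2: H(X) = -sum_{i,j} mu_i P_ij log2 P_ij (skip zero entries)
    return -sum(mu[i] * P[i][j] * math.log2(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0)

# Example (numbers assumed): a 3-state chain
P3 = [[0.5, 0.25, 0.25],
      [0.2, 0.6, 0.2],
      [0.3, 0.3, 0.4]]
```

As a sanity check, a chain with uniform rows [[0.5, 0.5], [0.5, 0.5]] is an i.i.d. fair coin, so its entropy rate is 1 bit.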
Entropy rate of weather model

Stationary distribution µ(S) = α/(α + β), µ(R) = β/(α + β)

      [ 1 − β    β   ]
  P = [   α    1 − α ]
H(X) = α/(α + β) [−β log β − (1 − β) log(1 − β)] + β/(α + β) [−α log α − (1 − α) log(1 − α)]
     = α/(α + β) H(β) + β/(α + β) H(α)
     ≤ H(2αβ/(α + β)) ≤ H(√(αβ))   (by concavity of H)
Maximum when α = β = 1/2: the chain degenerates to an independent process
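The closed form and the concavity bound can be verified numerically. A sketch with assumed values α = 0.3, β = 0.1 (illustrative, not from the lecture):

```python
import math

def Hb(p):
    # binary entropy in bits, H(p) = -p log2 p - (1-p) log2 (1-p)
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

alpha, beta = 0.3, 0.1                       # assumed example values
muS, muR = alpha / (alpha + beta), beta / (alpha + beta)

# Closed form from the slide
H_rate = muS * Hb(beta) + muR * Hb(alpha)

# The two upper bounds from the concavity chain
bound1 = Hb(2 * alpha * beta / (alpha + beta))
bound2 = Hb(math.sqrt(alpha * beta))
print(H_rate, bound1, bound2)
```

With these values the entropy rate sits below both bounds, and at α = β = 1/2 all three quantities equal 1 bit.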
Random walk on graph
• An undirected graph with m nodes {1,..., m}
• Edge (i, j) has weight Wi j ≥ 0 (Wi j = W ji)
• A particle walks randomly from node to node
• Random walk X1, X2, ··· : a sequence of vertices
• Given Xn = i, next step chosen from neighboring nodes with probability
Pi j = Wi j / ∑_k Wik
Entropy rate of random walk on graph
• Let Wi = ∑_j Wi j,  W = ∑_{i, j: i > j} Wi j
• The stationary distribution is µi = Wi / (2W)
• Can verify this is a stationary distribution: µP = µ
• The stationary probability of node i is proportional to the total weight of edges emanating from it (locality)
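The claim µi = Wi/(2W) can be verified directly on a small example: build P from the weights, form µ, and check µP = µ. The 3-node weighted graph below is an assumed example, not from the lecture.

```python
import math

# Small weighted undirected graph (weights assumed), W[i][j] = W[j][i]
W = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
m = len(W)
Wi = [sum(row) for row in W]          # total weight at node i
Wtot = sum(Wi) / 2                    # each undirected edge counted once

# Transition probabilities P_ij = W_ij / W_i and candidate stationary mu_i = W_i / (2W)
P = [[W[i][j] / Wi[i] for j in range(m)] for i in range(m)]
mu = [Wi[i] / (2 * Wtot) for i in range(m)]

# Check stationarity: (mu P)_j should equal mu_j for every j
muP = [sum(mu[i] * P[i][j] for i in range(m)) for j in range(m)]

# Entropy rate of the walk: H(X) = -sum_{i,j} mu_i P_ij log2 P_ij
H = -sum(mu[i] * P[i][j] * math.log2(P[i][j])
         for i in range(m) for j in range(m) if P[i][j] > 0)
```

The check works because µi Pi j = Wi j/(2W) is symmetric in i and j, which is exactly the detailed-balance condition.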
Summary
• AEP for a stationary process X1, X2, ··· with Xi ∈ X: as n → ∞
p(x1, ··· , xn) ≈ 2^{−nH(X)}
• Entropy rate
H(X) = lim_{n→∞} H(Xn|Xn−1, ..., X1) = lim_{n→∞} (1/n) H(X1, ..., Xn)
• Random walk on graph: µi = Wi / (2W)