INFORMATION THEORY & CODING Week 7: Entropy Rate

Dr. Rui Wang

Department of Electrical and Electronic Engineering, Southern University of Science and Technology (SUSTech)

Email: [email protected]

October 27, 2020

Review Summary

McMillan inequality

Uniquely decodable codes $\Leftrightarrow \sum_i D^{-\ell_i} \le 1$.

Huffman code

$$L^* = \min_{\sum_i D^{-\ell_i} \le 1} \sum_i p_i \ell_i, \qquad H_D(X) \le L^* < H_D(X) + 1.$$
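
As a quick sanity check of these bounds, here is a minimal Python sketch, assuming a binary code alphabet ($D = 2$) and an illustrative PMF of my own choosing, that builds a Huffman code with the standard-library heapq module and verifies $H(X) \le L^* < H(X) + 1$:

```python
import heapq
from math import log2

# Illustrative source PMF (an assumption, not from the slides).
pmf = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}

# Build a binary Huffman code: repeatedly merge the two least likely nodes,
# prefixing '0' to one subtree's codewords and '1' to the other's.
heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(pmf.items())]
heapq.heapify(heap)
counter = len(heap)          # tie-breaker so tuples never compare the dicts
while len(heap) > 1:
    p1, _, c1 = heapq.heappop(heap)
    p2, _, c2 = heapq.heappop(heap)
    merged = {s: "0" + w for s, w in c1.items()}
    merged.update({s: "1" + w for s, w in c2.items()})
    heapq.heappush(heap, (p1 + p2, counter, merged))
    counter += 1
code = heap[0][2]

H = -sum(p * log2(p) for p in pmf.values())           # entropy H(X) in bits
L = sum(pmf[s] * len(w) for s, w in code.items())     # expected codeword length
print(code, H, L)
assert H <= L < H + 1                                  # optimal-code bound
```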

Outline

On average, $nH(X) + 1$ bits suffice to describe $n$ i.i.d. random variables. But what if the random variables are dependent?

Markov Chain: the simplest way to model the correlations among random variables in a stochastic process.

Entropy Rate: the average number of bits needed to describe one random variable in a stochastic process.

How to Model Dependence: Markov Chains

A stochastic process $\{X_i\}$ is an indexed sequence of random variables $(X_1, X_2, \ldots)$ characterized by the joint PMF $p(x_1, x_2, \ldots, x_n)$, where $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ for $n = 1, 2, \ldots$.

Definition
A stochastic process is said to be stationary if the joint distribution of any subset of the sequence of random variables is invariant with respect to shifts in the time index, i.e.,

$$\Pr[X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n] = \Pr[X_{1+\ell} = x_1, X_{2+\ell} = x_2, \ldots, X_{n+\ell} = x_n]$$

for every $n$, every shift $\ell$, and all $x_1, x_2, \ldots, x_n \in \mathcal{X}$.

Markov Chains

Definition

A discrete stochastic process $X_1, X_2, \ldots$ is said to be a Markov chain or a Markov process if, for $n = 1, 2, \ldots$,

$$\Pr[X_{n+1} = x_{n+1} \mid X_n = x_n, X_{n-1} = x_{n-1}, \ldots, X_1 = x_1] = \Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]$$

for all $x_1, x_2, \ldots, x_n, x_{n+1} \in \mathcal{X}$.

In this case, the joint PMF can be written as

$$p(x_1, x_2, \ldots, x_n) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2) \cdots p(x_n \mid x_{n-1}).$$

Hence, a Markov chain is completely characterized by the initial distribution $p(x_1)$ and the transition probabilities $p(x_n \mid x_{n-1})$, $n = 2, 3, 4, \ldots$
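
As a small illustration of this point, here is a minimal sketch, with an assumed two-state alphabet, initial distribution, and transition matrix (none of them from the slides), that samples a path $X_1, \ldots, X_n$ using nothing but $p(x_1)$ and $p(x_n \mid x_{n-1})$:

```python
import numpy as np

rng = np.random.default_rng(0)

states = ["S", "R"]                      # assumed two-state alphabet
p1 = np.array([0.6, 0.4])                # assumed initial distribution p(x1)
P = np.array([[0.8, 0.2],                # assumed transition probabilities:
              [0.3, 0.7]])               # row i is p(x_n = . | x_{n-1} = i)

def sample_path(n):
    """Sample X_1, ..., X_n from p(x1) and the transition probabilities only."""
    x = [rng.choice(len(states), p=p1)]
    for _ in range(n - 1):
        x.append(rng.choice(len(states), p=P[x[-1]]))
    return [states[i] for i in x]

print(sample_path(10))
```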

Markov Chains

Definition
A Markov chain is called time invariant if the transition probability $p(x_{n+1} \mid x_n)$ does NOT depend on $n$, i.e., for $n = 1, 2, \ldots$,

$$\Pr[X_{n+1} = b \mid X_n = a] = \Pr[X_2 = b \mid X_1 = a], \quad \forall a, b \in \mathcal{X}.$$

We deal with time invariant Markov chains. If $\{X_i\}$ is a Markov chain, $X_n$ is called the state at time $n$. A time invariant Markov chain is characterized by its initial state and a probability transition matrix $P = [P_{ij}]$, $i, j \in \{1, 2, \ldots, m\}$, where $P_{ij} = \Pr[X_{n+1} = j \mid X_n = i]$.

Markov Chain Example: Simple Weather Model

X = {Sunny: S, Rainy: R}

p(S|S) = 1 − β, p(R|R) = 1 − α, p(R|S) = β, p(S|R) = α

$$P = \begin{pmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{pmatrix}$$

Markov Chain Example: Simple Weather Model

Probability of seeing a sequence SSRR:

$$p(SSRR) = p(S)\,p(S|S)\,p(R|S)\,p(R|R) = p(S)(1-\beta)\,\beta\,(1-\alpha)$$

Suppose the first day is "Sunny" with probability $\gamma$. What is the weather distribution of the second day, the third day, and so on?

Stationary Distribution

If the PMF of the random variable at time $n$ is $\mu_i^{(n)} = \Pr[X_n = i]$, the PMF at time $n+1$, say $\mu_j^{(n+1)} = \Pr[X_{n+1} = j]$, can be written as

$$\mu_j^{(n+1)} = \sum_i \mu_i^{(n)} \Pr[X_{n+1} = j \mid X_n = i] = \sum_i \mu_i^{(n)} P_{ij}.$$

$\{\mu_i^{(n)} \mid \forall i\}$ is called a stationary distribution if $\mu_i^{(n)} = \mu_i^{(n+1)}$ for all $i$.
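
To see these definitions in action (and to answer the weather question from the previous slide), here is a minimal sketch with assumed values of $\alpha$, $\beta$ and $\gamma$ that iterates $\mu^{(n+1)} = \mu^{(n)} P$ and watches the distribution settle at the stationary distribution:

```python
import numpy as np

alpha, beta, gamma = 0.3, 0.1, 0.9        # illustrative parameter values
P = np.array([[1 - beta, beta],           # states ordered (S, R)
              [alpha, 1 - alpha]])

mu = np.array([gamma, 1 - gamma])         # day-1 distribution [Pr(S), Pr(R)]
for day in range(2, 12):
    mu = mu @ P                           # mu^(n+1) = mu^(n) P
    print(day, mu)

# The limit is the stationary distribution [alpha/(alpha+beta), beta/(alpha+beta)].
print(np.array([alpha, beta]) / (alpha + beta))
```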

Stationary Distribution

If $\mu = [\mu_S, \mu_R] = \left[\frac{\alpha}{\alpha+\beta}, \frac{\beta}{\alpha+\beta}\right]$ and

$$P = \begin{pmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{pmatrix},$$

then
$$p(X_{n+1} = S) = p(S|S)\,\mu_S + p(S|R)\,\mu_R = (1-\beta)\frac{\alpha}{\alpha+\beta} + \alpha\frac{\beta}{\alpha+\beta} = \frac{\alpha}{\alpha+\beta} = \mu_S.$$

Stationary Distribution

How to calculate the stationary distribution? The stationary distribution $\mu_i$, $i = 1, 2, \ldots, |\mathcal{X}|$, satisfies

$$\mu_j = \sum_{i=1}^{|\mathcal{X}|} \mu_i P_{ij} \quad\text{and}\quad \sum_{i=1}^{|\mathcal{X}|} \mu_i = 1.$$
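
One way to solve these equations numerically, sketched here with an illustrative transition matrix, is to stack the balance equations $\mu(P - I) = 0$ with the normalization constraint and do a least-squares solve:

```python
import numpy as np

def stationary_distribution(P):
    """Solve mu = mu P together with sum(mu) = 1 via least squares."""
    m = P.shape[0]
    # Transpose so the unknown mu is a column vector: (P^T - I) mu = 0.
    A = np.vstack([P.T - np.eye(m), np.ones((1, m))])
    b = np.concatenate([np.zeros(m), [1.0]])
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

# Assumed weather-model matrix with alpha = 0.3, beta = 0.1.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
print(stationary_distribution(P))   # approximately [0.75, 0.25]
```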

Entropy Rate

When the $X_i$'s are i.i.d., the entropy

$$H(X^n) = H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i) = nH(X).$$

With dependent sequences of $X_i$'s, how does $H(X^n)$ grow with $n$?

The entropy rate characterizes this growth rate.

Entropy Rate

Definition 1: average entropy per symbol

$$H(X) = \lim_{n \to \infty} \frac{H(X_1, X_2, \ldots, X_n)}{n}$$

Definition 2: conditional entropy of the last r.v. given the past

$$H'(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, X_{n-2}, \ldots, X_1)$$

Entropy Rate

Theorem 4.2.2

For a stationary stochastic process, $H(X_n \mid X_{n-1}, \ldots, X_1)$ is nonincreasing in $n$ and has a limit $H'(X)$.

Proof.

$$H(X_{n+1} \mid X_1, X_2, \ldots, X_n) \le H(X_{n+1} \mid X_n, \ldots, X_2) = H(X_n \mid X_{n-1}, \ldots, X_1),$$
where the inequality holds because conditioning reduces entropy and the equality follows from stationarity.

- $H(X_n \mid X_{n-1}, \ldots, X_1)$ decreases as $n$ increases
- Entropy is nonnegative: $H(X_n \mid X_{n-1}, \ldots, X_1) \ge 0$
- Hence the limit must exist (a nonincreasing sequence bounded below converges).

Entropy Rate

Theorem 4.2.1
For a stationary stochastic process, $H(X) = H'(X)$.

Proof. By the chain rule,
$$\frac{1}{n} H(X_1, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$
By Theorem 4.2.2, $H(X_n \mid X_{n-1}, \ldots, X_1) \to H'(X)$.
Cesàro mean: if $a_n \to a$ and $b_n = \frac{1}{n}\sum_{i=1}^{n} a_i$, then $b_n \to a$. So
$$\frac{1}{n} H(X_1, \ldots, X_n) \to H'(X).$$
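
A small numerical illustration of the theorem, using an assumed stationary two-state Markov chain (the matrix and its stationary distribution are my own choices): it computes $H(X_1, \ldots, X_n)$ exactly by enumeration and compares the per-symbol entropy with the conditional entropy $H(X_n \mid X_{n-1}, \ldots, X_1)$ obtained from the chain rule:

```python
import numpy as np
from itertools import product

# Assumed stationary two-state Markov chain (not from the slides).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
mu = np.array([0.75, 0.25])          # its stationary distribution, so the process is stationary

def joint_entropy(n):
    """Exact H(X_1, ..., X_n) in bits, enumerating all 2^n sequences."""
    H = 0.0
    for seq in product(range(2), repeat=n):
        p = mu[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a, b]
        H -= p * np.log2(p)
    return H

for n in (2, 4, 8, 12):
    per_symbol = joint_entropy(n) / n                      # (1/n) H(X_1,...,X_n)
    conditional = joint_entropy(n) - joint_entropy(n - 1)  # H(X_n | X_{n-1},...,X_1)
    print(n, per_symbol, conditional)                      # both tend to the same limit
```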

AEP for Stationary Ergodic Process (Chapter 16)

$$-\frac{1}{n} \log p(X_1, \ldots, X_n) \to H(X)$$

$$p(X_1, \ldots, X_n) \approx 2^{-nH(X)}$$

Typical sequences lie in a typical set of about $2^{nH(X)}$ elements.

We can use $nH(X)$ bits to represent typical sequences.
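
A rough empirical check of the AEP statement, for an assumed stationary two-state Markov chain: sample one long path and compare $-\frac{1}{n}\log p(X_1, \ldots, X_n)$ with the chain's entropy rate (computed here with the Markov-chain formula that appears on the following slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed stationary two-state Markov chain (matrix and mu are illustrative).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
mu = np.array([0.75, 0.25])              # its stationary distribution

n = 50_000
x = np.empty(n, dtype=int)
x[0] = rng.choice(2, p=mu)
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])

# Per-symbol log-probability of the sampled sequence.
logp = np.log2(mu[x[0]]) + np.log2(P[x[:-1], x[1:]]).sum()
rate = -(mu[:, None] * P * np.log2(P)).sum()   # entropy rate H(X2 | X1) in bits
print(-logp / n, rate)                          # these two should be close
```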

Entropy Rate for Markov Chain

For a stationary Markov chain, the entropy rate is

$$H(X) = H'(X) = \lim_{n\to\infty} H(X_n \mid X_{n-1}, \ldots, X_1) = \lim_{n\to\infty} H(X_n \mid X_{n-1}) = H(X_2 \mid X_1).$$

Let $P_{ij} = \Pr[X_2 = j \mid X_1 = i]$. By definition, the entropy rate of a stationary Markov chain is
$$H(X) = H(X_2 \mid X_1) = \sum_i \mu_i \Big( -\sum_j P_{ij} \log P_{ij} \Big) = -\sum_{ij} \mu_i P_{ij} \log P_{ij}.$$

To Calculate Entropy Rate

1. Find the stationary distribution $\mu_i$:

$$\mu_i = \sum_j \mu_j P_{ji} \quad\text{and}\quad \sum_{i=1}^{|\mathcal{X}|} \mu_i = 1.$$

2. Use the transition probabilities $P_{ij}$:
$$H(X) = -\sum_{ij} \mu_i P_{ij} \log P_{ij}$$
(both steps are sketched in code below).
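
A minimal sketch combining the two steps for an arbitrary (illustrative) transition matrix, using base-2 logarithms so the result is in bits per symbol:

```python
import numpy as np

def entropy_rate(P):
    """Entropy rate of a stationary Markov chain with transition matrix P (bits/symbol)."""
    m = P.shape[0]
    # Step 1: stationary distribution from mu = mu P and sum(mu) = 1.
    A = np.vstack([P.T - np.eye(m), np.ones((1, m))])
    b = np.concatenate([np.zeros(m), [1.0]])
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Step 2: H(X) = -sum_ij mu_i P_ij log2 P_ij (entries with P_ij = 0 contribute 0).
    logs = np.where(P > 0, np.log2(np.where(P > 0, P, 1.0)), 0.0)
    return -np.sum(mu[:, None] * P * logs)

# Illustrative 3-state chain (an assumption, not from the slides).
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
print(entropy_rate(P))
```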

Entropy Rate of Weather Model

Stationary distribution: $\mu(S) = \frac{\alpha}{\alpha+\beta}$, $\mu(R) = \frac{\beta}{\alpha+\beta}$

$$P = \begin{pmatrix} 1-\beta & \beta \\ \alpha & 1-\alpha \end{pmatrix}$$

$$H(X) = \mu(S) H(\beta) + \mu(R) H(\alpha) = \frac{\alpha}{\alpha+\beta} H(\beta) + \frac{\beta}{\alpha+\beta} H(\alpha) \le H\!\left(\frac{2\alpha\beta}{\alpha+\beta}\right),$$
where the last inequality follows from Jensen's inequality (concavity of $H$).

Maximum when $\alpha = \beta = 1/2$: the chain degenerates to an independent (i.i.d.) process.
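
A quick numerical check of the weather-model result, with assumed values of $\alpha$ and $\beta$ and a small helper H2 for the binary entropy function: the closed form $\mu(S)H(\beta) + \mu(R)H(\alpha)$ matches the general formula $-\sum_{ij}\mu_i P_{ij}\log P_{ij}$ and stays below the Jensen bound:

```python
import numpy as np

def H2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

alpha, beta = 0.3, 0.1                       # illustrative parameter values
P = np.array([[1 - beta, beta],
              [alpha, 1 - alpha]])
mu_S, mu_R = alpha / (alpha + beta), beta / (alpha + beta)

closed_form = mu_S * H2(beta) + mu_R * H2(alpha)              # mu(S)H(beta) + mu(R)H(alpha)
general = -sum(m * p * np.log2(p) for m, row in zip([mu_S, mu_R], P) for p in row)
upper = H2(2 * alpha * beta / (alpha + beta))                 # Jensen upper bound

print(closed_form, general, upper)   # closed_form == general, both <= upper
```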

Reading & Homework

Reading: Whole Chapter 4

Homework: Problems 4.7(a-d), 4.9
