ECE644 – Lecture 2

Asymptotic Equipartition Property (AEP) Data Compression

01/23/02, Information Theory, Spring 2002, Hairong Qi

Recap

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

$$H(X,Y) = -\sum_{x}\sum_{y} p(x,y) \log_2 p(x,y)$$

$$H(Y|X) = \sum_{x} p(x)\, H(Y|X=x) = -\sum_{x}\sum_{y} p(x,y) \log p(y|x)$$

$$D(p\|q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
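The recap quantities can be evaluated numerically for a small joint distribution. The sketch below is only an illustration: the joint pmf `p_xy` and the helper names `marginal`, `entropy`, and `kl_divergence` are assumptions made for this example and do not come from the slides. All logarithms are taken base 2.

```python
import math

# Hypothetical joint pmf p(x, y) on {0,1} x {0,1}; values chosen for illustration only.
p_xy = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

def marginal(joint, axis):
    """Sum the joint pmf over the other coordinate."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

def entropy(pmf):
    """H = -sum p log2 p, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def kl_divergence(p, q):
    """D(p||q) = sum p log2(p/q); assumes q > 0 wherever p > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

p_x = marginal(p_xy, 0)
p_y = marginal(p_xy, 1)

H_X = entropy(p_x)
H_Y = entropy(p_y)
H_XY = entropy(p_xy)
# Conditional entropy from its definition: -sum p(x,y) log2 p(y|x).
H_Y_given_X = -sum(p * math.log2(p / p_x[x]) for (x, y), p in p_xy.items() if p > 0)
# Mutual information as the relative entropy between p(x,y) and p(x)p(y).
product = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}
I_XY = kl_divergence(p_xy, product)

print(f"H(X)={H_X:.4f}  H(X,Y)={H_XY:.4f}  H(Y|X)={H_Y_given_X:.4f}  I(X;Y)={I_XY:.4f}")
```

Writing the mutual information as a relative entropy reuses one helper for both $D(p\|q)$ and $I(X;Y)$, which mirrors the form of the last two recap formulas.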


Venn Diagram

[Figure: Venn diagram relating $H(X)$, $H(Y)$, and $H(X,Y)$, with regions labeled $H(X|Y)$, $I(X;Y)$, and $H(Y|X)$]

$$H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)$$

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$$
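As a brief sketch of why the chain rule in the diagram holds, one can factor the joint pmf as $p(x,y) = p(x)\,p(y|x)$ and split the logarithm. This is the standard argument, written out here as a worked step rather than taken from the slide:

```latex
\begin{align*}
H(X,Y) &= -\sum_{x}\sum_{y} p(x,y) \log p(x,y) \\
       &= -\sum_{x}\sum_{y} p(x,y) \log \bigl[ p(x)\, p(y|x) \bigr] \\
       &= -\sum_{x}\sum_{y} p(x,y) \log p(x) \;-\; \sum_{x}\sum_{y} p(x,y) \log p(y|x) \\
       &= -\sum_{x} p(x) \log p(x) \;+\; H(Y|X) \\
       &= H(X) + H(Y|X).
\end{align*}
```

The same factoring applied to $\log \frac{p(x,y)}{p(x)p(y)} = \log \frac{p(x|y)}{p(x)}$ gives $I(X;Y) = H(X) - H(X|Y)$.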


Asymptotic Equipartition Property

• Weak law of large numbers: for i.i.d. random variables, $\frac{1}{n}\sum_{i=1}^{n} X_i$ is close to its expected value $EX$ for large values of $n$.
• AEP: $\frac{1}{n}\log\frac{1}{p(X_1, X_2, \ldots, X_n)}$ is close to the entropy $H$; that is, $p(X_1, X_2, \ldots, X_n)$ is close to $2^{-nH}$.


The AEP

If $X_1, X_2, \ldots$ are i.i.d. $\sim p(x)$, then

$$\frac{1}{n}\log\frac{1}{p(X_1, X_2, \ldots, X_n)} \to H(X) \quad \text{in probability.}$$
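The convergence can be observed empirically. The following is a minimal simulation sketch, assuming a Bernoulli source with $\Pr\{X_i = 1\} = 0.6$, a fixed random seed, and a hypothetical helper `sample_statistics`; none of these choices come from the slide. It reports both the sample mean (weak law of large numbers) and the per-symbol value $-\frac{1}{n}\log_2 p(X_1,\ldots,X_n)$ (AEP) for increasing $n$.

```python
import math
import random

def sample_statistics(p_one, n, rng):
    """Draw n i.i.d. Bernoulli(p_one) symbols; return (sample mean, -1/n log2 p(x^n))."""
    xs = [1 if rng.random() < p_one else 0 for _ in range(n)]
    # For an i.i.d. source, log p(x^n) is the sum of the per-symbol log-probabilities.
    log_p = sum(math.log2(p_one if x == 1 else 1.0 - p_one) for x in xs)
    return sum(xs) / n, -log_p / n

p_one = 0.6  # assumed source parameter
H = -(p_one * math.log2(p_one) + (1 - p_one) * math.log2(1 - p_one))

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {}
for n in (10, 100, 1000, 100000):
    results[n] = sample_statistics(p_one, n, rng)
    mean, rate = results[n]
    print(f"n={n:6d}  sample mean={mean:.4f}  -1/n log2 p={rate:.4f}  (H={H:.4f})")
```

For small $n$ the two statistics can sit noticeably away from $EX$ and $H$; the gap shrinks as $n$ grows, which is all the AEP claims.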


The Typical Set

The typical set $A_\epsilon^{(n)}$ with respect to $p(x)$ is the set of sequences $(x_1, x_2, \ldots, x_n) \in \Re^n$ with the following property:

$$2^{-n(H(X)+\epsilon)} \le p(x_1, x_2, \ldots, x_n) \le 2^{-n(H(X)-\epsilon)}$$

Other properties:
1) $H(X) - \epsilon \le -\frac{1}{n}\log p(x_1, x_2, \ldots, x_n) \le H(X) + \epsilon$
2) $\Pr\{A_\epsilon^{(n)}\} > 1 - \epsilon$ for $n$ sufficiently large
3) $|A_\epsilon^{(n)}| \le 2^{n(H(X)+\epsilon)}$
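For a small alphabet and short block length, the typical set can be enumerated by brute force and the size bound checked directly. The sketch below assumes a hypothetical three-symbol source with pmf $(0.5, 0.25, 0.25)$, $n = 8$, and $\epsilon = 0.2$; these values are illustrative and not part of the lecture.

```python
import itertools
import math

# Hypothetical three-symbol source; pmf, n and eps are illustrative choices.
pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
n, eps = 8, 0.2

H = -sum(p * math.log2(p) for p in pmf.values())  # entropy in bits

def sample_entropy(seq):
    """-1/n log2 p(x_1..x_n) for an i.i.d. source."""
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

# Property 1 is used as the membership test for the typical set.
typical = [seq for seq in itertools.product(pmf, repeat=n)
           if abs(sample_entropy(seq) - H) <= eps]

prob_typical = sum(math.prod(pmf[s] for s in seq) for seq in typical)
size_bound = 2 ** (n * (H + eps))

print(f"H(X) = {H}")
print(f"|A| = {len(typical)} of {len(pmf) ** n} sequences, bound = {size_bound:.0f}")
print(f"Pr(A) = {prob_typical:.4f}")
```

With such a short block length the probability of the typical set comes out at about 0.71, below $1 - \epsilon = 0.8$. That does not contradict property 2, which only holds for $n$ sufficiently large; the size bound in property 3 holds for every $n$.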


Data Compression - Problem

• Let $X_1, X_2, \ldots, X_n$ be i.i.d. random variables drawn from the probability mass function $p(x)$
• Try to find short descriptions for such sequences of random variables


Data Compression - AEP

• Divide all sequences into two sets: the typical set and the non-typical set

[Figure: the set of all sequences $\Re^n$, divided into the typical set $A_\epsilon^{(n)}$ and the non-typical set]

What is $A_\epsilon^{(n)}$?
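One way to turn the two-set split into an actual code is to send a one-bit flag followed by an index: a short index into the typical set, or a full-length index into the set of all sequences. The sketch below is an assumption about how such a scheme could look, not the lecture's own construction. The function `build_code`, the binary source with $\Pr\{1\} = 0.2$, $n = 12$, and $\epsilon = 0.15$ are all hypothetical, and the index length uses the exact size of the enumerated typical set rather than the bound $n(H+\epsilon)$.

```python
import itertools
import math

def build_code(pmf, n, eps):
    """Return (encode, decode) for a flag-plus-index code built on the typical set."""
    H = -sum(p * math.log2(p) for p in pmf.values())
    symbols = sorted(pmf)
    all_seqs = list(itertools.product(symbols, repeat=n))

    def sample_entropy(seq):
        return -sum(math.log2(pmf[s]) for s in seq) / n

    typical = [s for s in all_seqs if abs(sample_entropy(s) - H) <= eps]

    # Bits needed to index each set (at least 1 so the index field is never empty).
    typ_bits = max(1, math.ceil(math.log2(max(len(typical), 1))))
    raw_bits = max(1, math.ceil(math.log2(len(all_seqs))))
    typ_index = {s: i for i, s in enumerate(typical)}
    raw_index = {s: i for i, s in enumerate(all_seqs)}

    def encode(seq):
        seq = tuple(seq)
        if seq in typ_index:  # flag "0": short index into the typical set
            return "0" + format(typ_index[seq], f"0{typ_bits}b")
        # flag "1": full-length index into the set of all sequences
        return "1" + format(raw_index[seq], f"0{raw_bits}b")

    def decode(bits):
        table = typical if bits[0] == "0" else all_seqs
        return table[int(bits[1:], 2)]

    return encode, decode

# Hypothetical binary source: Pr{"1"} = 0.2, block length 12.
encode, decode = build_code({"0": 0.8, "1": 0.2}, n=12, eps=0.15)

common = tuple("000000000101")  # two ones: falls in the typical set here
rare = tuple("111111111111")    # all ones: non-typical
for seq in (common, rare):
    code = encode(seq)
    print("".join(seq), "->", code, f"({len(code)} bits)")
```

In this example typical sequences cost 10 bits and all others 13 bits, compared with 12 bits for sending the block uncoded. The scheme pays off only when the typical set carries most of the probability, which requires larger $n$ than brute-force enumeration allows.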


Expected Length of the Codeword

$$E\bigl(l(X^n)\bigr) = \;??$$
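A possible line of attack, following the standard textbook argument, is sketched below. It rests on assumptions not stated on the slide: typical sequences are indexed with at most $n(H+\epsilon)+1$ bits, all other sequences with at most $n\log|\mathcal{X}|+1$ bits, each codeword carries one extra flag bit, and $n$ is large enough that $\Pr\{A_\epsilon^{(n)}\} > 1-\epsilon$. Here $|\mathcal{X}|$ denotes the alphabet size and $H = H(X)$; both are notation introduced for this sketch.

```latex
\begin{align*}
E\bigl(l(X^n)\bigr)
  &= \sum_{x^n \in A_\epsilon^{(n)}} p(x^n)\, l(x^n)
   + \sum_{x^n \notin A_\epsilon^{(n)}} p(x^n)\, l(x^n) \\
  &\le \Pr\bigl\{A_\epsilon^{(n)}\bigr\}\,\bigl[n(H+\epsilon) + 2\bigr]
   + \Pr\bigl\{\overline{A_\epsilon^{(n)}}\bigr\}\,\bigl[n \log|\mathcal{X}| + 2\bigr] \\
  &\le n(H+\epsilon) + \epsilon\, n \log|\mathcal{X}| + 2 \\
  &= n(H + \epsilon'), \qquad \epsilon' = \epsilon + \epsilon \log|\mathcal{X}| + \tfrac{2}{n}.
\end{align*}
```

Under these assumptions $\epsilon'$ can be made arbitrarily small by choosing $\epsilon$ small and $n$ large, so the expected description length per symbol approaches $H(X)$.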


Calculation of Typical Set

• Consider a sequence of i.i.d. binary random variables $X_1, X_2, \ldots, X_n$, where the probability that $X_i = 1$ is 0.6
– (a) Calculate $H(X)$
– (b) With $n = 25$ and $\epsilon = 0.1$, which sequences fall in the typical set?
– (c) What is the probability of the typical set?
– (d) How many elements are there in the typical set?
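Parts (a) to (d) can be set up numerically without listing all $2^{25}$ sequences, because the probability of a binary sequence depends only on its number of ones $k$. The sketch below is one way to organize the calculation; the variable names are illustrative, and typicality is tested with the condition $|-\frac{1}{n}\log_2 p(x^n) - H(X)| \le \epsilon$.

```python
import math

p_one, n, eps = 0.6, 25, 0.1  # values given in the problem statement

# (a) Entropy of a single binary symbol, in bits.
H = -(p_one * math.log2(p_one) + (1 - p_one) * math.log2(1 - p_one))

typical_ks = []      # (b) numbers of ones for which a sequence is typical
prob_typical = 0.0   # (c) total probability of the typical set
size_typical = 0     # (d) number of sequences in the typical set

for k in range(n + 1):
    # log2-probability of any one sequence containing exactly k ones
    log_p = k * math.log2(p_one) + (n - k) * math.log2(1 - p_one)
    if abs(-log_p / n - H) <= eps:
        count = math.comb(n, k)  # number of sequences with k ones
        typical_ks.append(k)
        size_typical += count
        prob_typical += count * 2 ** log_p

print(f"(a) H(X) = {H:.6f} bits")
print(f"(b) typical when the number of ones is in {typical_ks}")
print(f"(c) Pr(typical set) = {prob_typical:.6f}")
print(f"(d) |typical set| = {size_typical}, bound 2^(n(H+eps)) = {2 ** (n * (H + eps)):.0f}")
```

Grouping sequences by $k$ keeps the loop to 26 iterations; the same grouping would not apply to a non-binary or non-i.i.d. source.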


Homework 1 (Due 1/30/02)

1) Prove the three equations on Lecture 1, slide 17
2) Problem 16
3) Calculation of typical set described in slide 10 of Lecture 2
4) Read handout and identify research area
5) Reading: “: Reluctant Father of the Digital Age” by M. Mitchell Waldrop
6) Experiment: “Shannon’s experiment to calculate the Entropy of English”
7) Reading: “A mathematical theory of communication” by Claude E. Shannon

