Asymptotic Equipartition Property An Analogous of the Law of Large Numbers

Radu Trˆımbit¸a¸s

UBB

October 2012

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 1 / 19 Asymptotic equipartition property - IntroductionI

1 The law of large numbers states that for i.i.d. RVs n ∑ Xi is close to E (X ) for large values of n. The AEP states that states that 1 log 1 is close to the n p(X1,X2,...,Xn) entropy H.

The probability p (X1, X2, ... , Xn) assigned to an observed sequence will be close to 2−nH . This enables us to divide the set of all sequences into two sets, the typical set, where the sample entropy is close to the true entropy, and the nontypical set, which contains the other sequences.

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 2 / 19 Asymptotic equipartition property - IntroductionII

Definition 1 (Convergence of random variables)

Given a sequence of RVs, X1, X2, ... we say that the sequences X1, X2, ... converges to a random variable X p 1 In probability (Xn −→ X ) if for every ε > 0, P (|Xn − X | > ε) → 0. 2 2 In mean square if E (Xn − X ) → 0. a.s. 3 With probability 1 (also called almost surelyX n −→ X ) if P(limn→∞ Xn = X ) = 1.

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 3 / 19 Asymptotic equipartition property theoremI

Theorem 2 (AEP)

If X1, X2, ... are i.i.d.∼ p(x), then

1 p − log p (X , X , ... , X ) −→ H(X ) (1) n 1 2 n

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 4 / 19 Asymptotic equipartition property theoremII

Proof.

Xi i.i.d =⇒ log p(Xi ) i.i.d, by the weak law of large numbers 1 1 − log p (X1, X2, ... , Xn) = − ∑ log p(Xi ) n n i p −→ −E (log p(X ) = H(X ).

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 5 / 19 Asymptotic equipartition property theoremIII

Definition 3 (n) The typical set Aε with respect to p(x) is the set of sequences n (x1, x2, ... , xn) ∈ X with the property

−n(H(X )+ε) −n(H(X )−ε) 2 ≤ p(x1, x2, ... , xn) ≤ 2 . (2)

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 6 / 19 Asymptotic equipartition property theoremIV

Theorem 4 (n) (1) If (x1, x2, ... , xn) ∈ Aε , then 1 H(X ) − ε ≤ − log p(x , x , ... , x ) ≤ H(X ) + ε. n 1 2 n

 (n) (2) P Aε > 1 − ε for n sufficiently large.

(n) n(H(X )+ε) (3) Aε ≤ 2 , where |A| denotes the cardinal of A.

(n) n(H(X )−ε) (4) Aε ≥ (1 − ε) 2 for n sufficiently large.

Thus, the typical set has probability nearly 1, all elements of the typical set are nearly equiprobable, and the number of elements in the typical set is nearly 2nH .

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 7 / 19 Asymptotic equipartition property theoremV

Proof. (n) (1) from definition of Aε . (n) (2) follows from Theorem 2, since P((X1, X2, ... , Xn) ∈ Aε → 1 as n → ∞. Thus, ∀δ > 0∃n0 such that for all n ≥ n0   1 P − log p (X1, X2, ... , Xn) − H(X ) < ε > 1 − δ. n

Setting δ = ε complets the proof of (2).

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 8 / 19 Asymptotic equipartition property theoremVI Proof - continuation. (3)

−n(H(X )+ε) −n(H(X )+ε) (n) 1 = ∑ p(x) ≥ ∑ p(x) ≥ ∑ 2 = 2 Aε x∈X n (n) (n) x∈Aε x∈Aε

(n) n(H(X )+ε) Hence Aε ≤ 2 .  (n) (4) For sufficiently large n, P Aε > 1 − ε, so that

  (n) −n(H(X )−ε) −n(H(X )−ε) (n) 1 − ε < P Aε ≤ ∑ 2 = 2 Aε (n) x∈Aε which yields to

(n) n(H(X )−ε) Aε ≥ (1 − ε) 2 .

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 9 / 19 Consequences of the AEP: Data compressionI

X1, X2, ... , Xn ∼ p(x), i.i.d We wish short descriptions of such sequences n (n) We divide sequences in X in two sets: the typical set Aε and its complement (Figure 1) All elements in each set are ordered (e.g. lexicografic order) (n) We represent each sequence of Aε by its index of the sequence in the set. n(H+ε) (n) Since there are ≤ 2 sequences in Aε , the indexing requires ≤ n(H + ε) + 1 bits We prefix each sequence by 0: ≤ n(H + ε) + 2 bits required (Figure 2) (n) Similarly, for the complement of Aε we need n log |X | + 1; we prefix each sequence by 1

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 10 / 19 Consequences of the AEP: Data compressionII

Features of the coding scheme one-to-one and easy decodadle; the initial bit acts as a flag to indicate the length of the codeword (n) For complement of Aε we have a brute-force enumeration, without C  (n) n taking into account that the number of sequences in Aε ≤ |X | Typical sequences have short descriptions of length≈ nH n n Notations x - a sequence x1, x2, ... , xn; `(x ) is the length of the codeword corresponding to xn

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 11 / 19 Consequences of the AEP: Data compressionIII

Theorem 5

Let X n i.i.d ∼ p(x), and ε > 0. The there exists a code that maps sequences xn of length n into binary strings such that the mappings is one-to-one (and therefore invertible) and

 1  E ` (X n) ≤ H(X ) + ε (3) n

for n sufficiently large.

We can represent sequences X n using nH(X ) bits on the average.

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 12 / 19 Consequences of the AEP: Data compressionIV Proof.  (n) If n is sufficiently large so that P Aε ≥ 1 − ε, the expected length of codeword is E (` (X n)) = ∑ p(xn)`(xn) xn = ∑ p(xn)`(xn) + ∑ p(xn)`(xn) (n)  C xn ∈A n (n) ε x ∈ Aε ≤ ∑ p(xn) (n (H + ε) + 2) + ∑ p(xn) (n log |X | + 2) (n)  C xn ∈A n (n) ε x ∈ Aε    (n)  (n)C = P Aε (n (H + ε) + 2) + P Aε (n log |X | + 2)

≤ n (H + ε) + εn log |X | + 2 = n H + ε0

0 2 But, ε = ε + ε log |X | + n can be made arbitrarily small. Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 13 / 19 Typical sets and source coding

Figure: Typical sets and source coding

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 14 / 19 Source coding using the typical sets

Figure: Source code using the typical set

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 15 / 19 High-probability sets and the typical setsI

(n) It is not clear that Aε is the smallest set that contains most of the probability. Definition 6 (n) n For each n = 1, 2, ... , let Bδ ⊂ X with

 (n) P Bδ ≥ 1 − δ. (4)

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 16 / 19 High-probability sets and the typical setsII

Theorem 7

1 0 Let X1, X2, ... , Xn be i.i.d. ∼ p(x). For δ < 2 and any δ > 0, if  (n) P Bδ > 1 − δ, then

1 (n) log B > H − δ0 (5) n δ for n sufficiently large.

Definition 8 . The notation an = bn means 1 a lim log n = 0. n→∞ n bn

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 17 / 19 High-probability sets and the typical setsIII

. Thus, an = bn implies an and bn are equal to the first order in the exponent. We can restate the above theorem: if δn → 0 and εn → 0, then

n . n . B( ) = A( ) = 2nH . δn εn

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 18 / 19 References

Thomas M. Cover, Joy A. Thomas, Elements of , 2nd edition, Wiley, 2006. David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003. Robert M. Gray, Entropy and Information Theory, Springer, 2009

Radu Trˆımbit¸a¸s (UBB) Asymptotic Equipartition Property October 2012 19 / 19