Shujun LI (李树钧): INF-10845-20091 Multimedia Coding

Lecture 4: Lossless Coding (I)

May 13, 2009

Outline

• Review
  • With proofs of some theorems (new)
• Entropy Coding: Overview
• Huffman Coding: The Optimal Code
• Information Theory: Hall of Fame

Review

Image and video coding: Where is IT?

[Block diagram: Input Image/Video → Pre-Processing → Lossy Coding → Lossless Coding → Encoded Image/Video; the decoder applies Lossless Coding, Lossy Coding and Post-Processing in reverse to produce the Decoded Image/Video. Predictive Coding and Visual Quality Measurement also appear in the pipeline.]

Coding: A Formal Definition

• Input: x=(x1,…,xm), where xi∈X
• Output: y=(y1,…,ym), where yi∈Y* = Y ∪ Y² ∪ …
• Encoding: y=F(x)=f(x1)…f(xm), where f(xi)=yi
  • yi is the codeword corresponding to the input symbol xi.
  • The mapping f: X→Y* is called a code.
  • F is called the extension of f.
  • If F is an injection, i.e., x1≠x2 ⇒ y1≠y2, then f is a uniquely decodable (UD) code.
• Decoding: x*=F⁻¹(y)
  • When x*=x, we have a lossless code.
  • When x*≠x, we have a lossy code.
  • A lossy code cannot be a UD code.

Memoryless random (i.i.d.) source

• Notations
  • A source emits symbols in a set X.
  • At any time, each symbol xi is emitted from the source with a fixed probability pi, independent of any other symbols.
  • Any two emitted symbols are independent of each other: the probability that a symbol xj appears after another symbol xi is pipj.
  • ⇒ There is a discrete distribution P={Prob(xi)=pi|∀xi∈X}, which describes the statistical behavior of the source.
• A memoryless random source is simply represented as a 2-tuple (X,P).
  • P can be simply represented as a vector P=[p1,p2,…], once we define an order of all the elements in X.

Prefix-free (PF) code

• We say a code f: X→Y* is prefix-free (PF), or instantaneous,
  • if no codeword is a prefix of another codeword, i.e., there do not exist two distinct symbols x1, x2∈X such that f(x1) is a prefix of f(x2).
• Properties of PF codes
  • PF codes are always UD codes.
  • PF codes can be uniquely represented by a b-ary tree.
  • PF codes can be decoded without reference to future symbols.

Kraft-McMillan number

• Kraft-McMillan Number: K = Σ_{x∈X} 1/b^L(x), where L(x) denotes the length of the codeword f(x) and b is the size of Y.
• Theorem 1 (Kraft): K≤1 ⇔ a PF code exists
  • K≤1 is often called the Kraft Inequality.
• Theorem 2 (McMillan): a code is UD ⇔ K≤1
• Theorems 1+2: a UD code always has a PF counterpart.
• ⇒ It is PF codes, not UD codes in general, that really matter.
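The Kraft-McMillan number is easy to check by hand or in code. A minimal Python sketch (the function name is illustrative, not from the lecture):

```python
def kraft_mcmillan(lengths, b=2):
    """K = sum over codewords of 1/b**L(x), for a b-ary code alphabet."""
    return sum(b ** -L for L in lengths)

# The binary PF code {0, 10, 110, 111} has lengths [1, 2, 3, 3]:
# K = 1/2 + 1/4 + 1/8 + 1/8 = 1, so Kraft's inequality K <= 1 holds.
assert kraft_mcmillan([1, 2, 3, 3]) == 1.0

# Lengths [1, 1, 2] give K = 1.25 > 1: no UD (hence no PF) code exists.
assert kraft_mcmillan([1, 1, 2]) > 1
```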

Definition of entropy (Shannon, 1948)

• Given a memoryless random source (X, P) with probability distribution P=[p1, …, pn], its entropy to base b is defined as follows:

  H_b(X) = H_b(P) = Σ_{i=1}^{n} pi·log_b(1/pi)

• When b=2, the subscript b may be omitted, and we write H(X)=H(P).
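The definition translates directly into code. A small sketch (illustrative name; terms with pi = 0 are conventionally dropped since pi·log(1/pi) → 0):

```python
import math

def entropy(P, b=2):
    """H_b(P) = sum_i p_i * log_b(1/p_i); zero-probability terms contribute 0."""
    return sum(p * math.log(1 / p, b) for p in P if p > 0)

# A fair binary source carries 1 bit/symbol; a biased one carries less.
assert entropy([0.5, 0.5]) == 1.0
assert entropy([0.9, 0.1]) < 1.0
```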

Properties of entropy

• The comparison theorem: if P=[p1,…,pn] and Q=[q1,…,qn] are two probability distributions, then

  H_b(P) = Σ_{i=1}^{n} pi·log_b(1/pi) ≤ Σ_{i=1}^{n} pi·log_b(1/qi)

  and the equality holds if and only if P=Q.
• ⇒ H_b(P)≤log_b(n), and the equality holds if and only if p1=…=pn=1/n (uniform distribution).
• H_b(Pⁿ) = n·H_b(P)

Proof of the comparison theorem *

• Lemma: ∀x>0, ln(x)≤x−1, and equality holds if and only if x=1.

[Plots of ln(x) against x−1 on (0, 2], illustrating ln(x) ≤ x−1 with equality only at x=1.]

Proof of the comparison theorem *

• We only need to prove it for the case b=e, since log_b(x) and ln(x) differ only by a constant factor.
• From the Lemma, ln(qi/pi) ≤ qi/pi − 1, with equality if and only if qi/pi = 1 (i.e., qi = pi).
• ⇒ H_e(P) − Σ_{i=1}^{n} pi·ln(1/qi)
    = Σ_{i=1}^{n} pi·ln(1/pi) − Σ_{i=1}^{n} pi·ln(1/qi)
    = Σ_{i=1}^{n} pi·ln(qi/pi)
    ≤ Σ_{i=1}^{n} pi·(qi/pi − 1)
    = Σ_{i=1}^{n} qi − Σ_{i=1}^{n} pi = 1 − 1 = 0 †

Shannon’s source coding theorem (I)

• The entropy of a memoryless random source defines the lower bound of the efficiency of all UD (uniquely decodable) codes.
  • Denoting by L the average length of all the codewords of a UD code, this theorem says

    L ≥ H_b(X)

Proof of Shannon Theorem (I) *

• Calculate the Kraft-McMillan number K = Σ_{i=1}^{n} 1/b^L(xi).
• Construct a “virtual” probability distribution Q=[q1,…,qi,…,qn], where qi = (1/K)·b^(−L(xi)).
• From the comparison theorem, we have

  H_b(P) = Σ_{i=1}^{n} pi·log_b(1/pi)
         ≤ Σ_{i=1}^{n} pi·log_b(1/qi)
         = Σ_{i=1}^{n} pi·log_b(K·b^L(xi))
         = log_b(K) + L

  Since K≤1 implies log_b(K)≤0, we get H_b(P) ≤ L. †

Shannon’s source coding theorem (II)

• Given a memoryless random source (X, P) with probability distribution P=[p1, …, pn].
• Make a PF code (Shannon code) as follows:
  • Find L=[L1,…,Ln], where Li is the least positive integer such that b^Li ≥ 1/pi, i.e., Li = ⌈log_b(1/pi)⌉.
  • One can prove K=∑(1/b^Li)≤1; then Kraft’s Theorem ensures there must be a PF code.
  • Then, for this PF code, we can prove

    H_b(X) ≤ L
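The Shannon code lengths can be computed directly. A sketch (illustrative names) that also checks, on one example, that Kraft holds and the average length sits within one bit of the entropy (each Li < log_b(1/pi)+1, so L < H+1):

```python
import math

def shannon_lengths(P, b=2):
    """L_i = ceil(log_b(1/p_i)): least positive integer with b**L_i >= 1/p_i."""
    return [max(1, math.ceil(math.log(1 / p, b))) for p in P]

P = [0.4, 0.3, 0.2, 0.1]
L = shannon_lengths(P)
assert L == [2, 2, 3, 4]

# Kraft holds, so a PF code with these lengths exists, and the
# average length lies between the entropy and entropy + 1.
K = sum(2 ** -Li for Li in L)
H = sum(p * math.log2(1 / p) for p in P)
avg = sum(p * Li for p, Li in zip(P, L))
assert K <= 1 and H <= avg < H + 1
```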

Proof of Shannon Theorem (II) *

• ∑(1/b^Li) ≤ ∑(1/(1/pi)) = ∑pi = 1 ⇒ There exists a PF code.
• Li = ⌈log_b(1/pi)⌉ ≥ log_b(1/pi) ⇒ L = ∑pi·Li ≥ ∑pi·log_b(1/pi) = H_b(X) †

Approaching the entropy

• Given a memoryless random source (X, P), generate an extended source (Xⁿ, Pⁿ).
• H_b(Xⁿ) ≤ Ln < H_b(Xⁿ)+1, where Ln is the average codeword length for the extended source, so H_b(X) ≤ Ln/n < H_b(X)+1/n.
• Let n→∞; we have Ln/n → H_b(X).
• Problem: n might be too large to be used in practice.

Entropy Coding

Image and video encoding: A big picture

[Block diagram: Input Image/Video → Pre-Processing (A/D Conversion, Pre-Filtering, Partitioning, …) → Lossy Coding (Differential Coding and Compensation, Predictive Coding, Quantization, …) → Lossless Coding (Entropy Coding, Context-Based Coding, Dictionary-Based Coding, Model-Based Coding, Run-Length Coding, …) → Post-Processing (Post-filtering) → Encoded Image/Video]

The ingredients of entropy coding

• A random source (X, P)
• A statistical model (X, P’) as an estimation of the random source
• An algorithm to optimize the coding performance (i.e., to minimize the average codeword length)
• At least one designer …

FLC, VLC and V2FLC

• FLC = Fixed-length coding/code(s)/codeword(s)
  • Each symbol xi emitted from a random source (X, P) is encoded as an n-bit codeword, where |X|≤2ⁿ.
• VLC = Variable-length coding/code(s)/codeword(s)
  • Each symbol xi emitted from a random source (X, P) is encoded as an ni-bit codeword.
  • FLC can be considered as a special case of VLC, where n1=…=n_|X|.
• V2FLC = Variable-to-fixed length coding/code(s)/codeword(s)
  • A symbol or a string of symbols is encoded as an n-bit codeword.
  • V2FLC can also be considered as a special case of VLC.

Static coding vs. Dynamic/Adaptive coding

• Static coding = The statistical model P’ is static, i.e., it does not change over time.
• Dynamic/Adaptive coding = The statistical model P’ is dynamically updated, i.e., it adapts itself to the context (i.e., changes over time).
  • Dynamic/Adaptive coding ⊂ Context-based coding
• Hybrid coding = Static + Dynamic coding
  • A codebook is maintained at the encoder side, and the encoder dynamically chooses a code for a number of symbols and informs the decoder about the choice.

An overview of entropy coding methods

• Shannon coding
• Shannon-Fano coding
• Huffman coding
• Arithmetic coding (Range coding)
  • Shannon-Fano-Elias coding
• Universal coding
  • Exp-Golomb coding (H.264/MPEG-4 AVC, Dirac)
  • Elias coding family, Levenshtein coding, …
• Non-universal coding
  • Truncated binary coding, …
  • Golomb coding ⊃ Rice coding
• Tunstall coding ⊂ V2FLC
• …

David Salomon, Variable-length Codes for Data Compression, Springer, 2007

Shannon-Fano Code

• Shannon-Fano code ≠ Shannon code
• The basic idea
  • Sort all the probabilities in decreasing order.
  • Divide all the probabilities into two parts, such that the difference between the total probabilities of the left and the right parts is minimized.
  • Assign the left and the right parts as the two children of the parent node.
  • Repeat the above steps for each part that contains more than one symbol.
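The recursive splitting above fits in a few lines of Python. A sketch (illustrative names; it assumes the probabilities are already sorted in decreasing order):

```python
def shannon_fano(symbols, probs):
    """Recursive Shannon-Fano: split into two parts with nearly equal
    total probability; prepend 0 for the left part, 1 for the right."""
    if len(symbols) == 1:
        return {symbols[0]: ""}
    # Find the split point minimizing |left total - right total|.
    best_i, best_diff = 1, float("inf")
    for i in range(1, len(symbols)):
        diff = abs(sum(probs[:i]) - sum(probs[i:]))
        if diff < best_diff:
            best_i, best_diff = i, diff
    code = {}
    for s, c in shannon_fano(symbols[:best_i], probs[:best_i]).items():
        code[s] = "0" + c
    for s, c in shannon_fano(symbols[best_i:], probs[best_i:]).items():
        code[s] = "1" + c
    return code

# The example from the next slide reproduces A->00 ... E->111.
code = shannon_fano(list("ABCDE"), [0.35, 0.2, 0.19, 0.13, 0.13])
```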

Shannon-Fano Code: An example

• X={A,B,C,D,E}, P=[0.35, 0.2, 0.19, 0.13, 0.13], Y={0,1}

First split: {A,B} (0.35+0.2=0.55) vs. {C,D,E} (0.19+0.13+0.13=0.45); then {A} vs. {B}, and {C} vs. {D,E} (0.13+0.13=0.26); finally {D} vs. {E}.

A Possible Code:
A → 00
B → 01
C → 10
D → 110
E → 111

Tunstall coding: V2FLC *

• The code f maps a symbol or a string of symbols in X* to a fixed-length (n-bit) codeword in Y*.
• The basic idea
  • Prepare a list of all symbols: X’={x1,…,xi,…,xm}.
  • Select the symbol xi with the largest probability.
  • Remove xi from X’, and add the m extended strings xix1,…,xixm into X’.
  • If |X’|+(m−1)≤2ⁿ, repeat the above steps.
  • Use an n-bit integer to represent each element in X’.
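The steps above can be sketched with a max-heap keyed on string probability (names are illustrative; the final codeword assignment by sorted order is just one possible choice):

```python
import heapq

def tunstall(symbols, probs, n):
    """Build a Tunstall dictionary of at most 2**n strings by
    repeatedly expanding the most probable string."""
    m = len(symbols)
    heap = [(-p, s) for s, p in zip(symbols, probs)]  # max-heap via negation
    heapq.heapify(heap)
    while len(heap) + (m - 1) <= 2 ** n:
        neg_p, s = heapq.heappop(heap)            # most probable entry
        for sym, p in zip(symbols, probs):        # replace it by its m extensions
            heapq.heappush(heap, (neg_p * p, s + sym))
    strings = sorted(s for _, s in heap)
    # Assign each string an n-bit codeword by its index.
    return {s: format(i, f"0{n}b") for i, s in enumerate(strings)}

# The example from the next slide: X={A,B,C}, P=[0.7,0.2,0.1], n=3.
code = tunstall("ABC", [0.7, 0.2, 0.1], 3)
```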

Tunstall coding: An example *

• X={A,B,C}, P=[0.7, 0.2, 0.1], Y={0,1}, n=3

Expansion: {A(0.7), B(0.2), C(0.1)} → expand A → {AA(0.49), AB(0.14), AC(0.07), B, C} → expand AA → {AAA(0.343), AAB(0.098), AAC(0.049), AB, AC, B, C}.

A Possible Code:
AAA → 000
AAB → 001
AAC → 010
AB → 011
AC → 100
B → 101
C → 110

Universal coding (code)

• A code is called universal if L ≤ C1(H+C2) for all possible values of H, where C1, C2 ≥ 1.
  • You may see a different definition somewhere, but the basic idea remains the same – a universal code works like an optimal code, except that there is a bound defined by a constant C1.
• A universal code is called asymptotically optimal if C1→1 when H→∞.

What are universal codes for?

• When the statistics of a source are not exactly known, we cannot optimize the code according to the probability distribution.
• A universal code may still be optimal for a specific probability distribution.
• Most universal codes are designed for monotonic probability distributions: i>j ⇒ pi≤pj.
  • The uniform distribution is a special case of monotonic ones.

Coding positive/non-negative integers

• Any finite or countably infinite set X can be mapped to a set of positive or non-negative integers.
  • Example: Z={0, −1, 1, −2, 2, …} ⇒ {0, 1, 2, 3, 4, …}

    f(x) = 2x, when x≥0; 2|x|−1, when x<0.
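The example mapping is a one-liner (the helper name is hypothetical, not from the lecture):

```python
def zigzag(x):
    """Map Z = {0, -1, 1, -2, 2, ...} onto {0, 1, 2, 3, 4, ...}:
    f(x) = 2x for x >= 0, and 2|x| - 1 for x < 0."""
    return 2 * x if x >= 0 else 2 * (-x) - 1

assert [zigzag(x) for x in (0, -1, 1, -2, 2)] == [0, 1, 2, 3, 4]
```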

Coding positive/non-negative integers

• Naive binary coding
  • |X|=2^k: k-bit binary representation of an integer.
• Truncated binary coding
  • |X|=2^k+b (0<b<2^k): X={0, 1, …, 2^k+b−1}. The first 2^k−b symbols 0, …, 2^k−b−1 are encoded with k bits each; the remaining 2b symbols 2^k−b, …, 2^k+b−1 are encoded with k+1 bits each (after adding the offset 2^k−b).
• Unary code (Stone-age binary coding)
  • |X|=∞: X=Z⁺={1, 2, …}; f(x) = 0…01 or 1…10, with x−1 zeros (resp. ones) before the final bit.
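Both schemes are short enough to sketch directly (illustrative names; the unary convention follows the zeros-then-one form above):

```python
def unary(x):
    """Unary code of x >= 1: x-1 zeros followed by a one."""
    return "0" * (x - 1) + "1"

def truncated_binary(x, n):
    """Truncated binary code of x in {0, ..., n-1}, n = 2**k + b, 0 <= b < 2**k.
    The first 2**k - b symbols get k bits; the rest get k+1 bits."""
    k = n.bit_length() - 1       # largest k with 2**k <= n
    b = n - 2 ** k
    if x < 2 ** k - b:
        return format(x, f"0{k}b")
    return format(x + 2 ** k - b, f"0{k + 1}b")

# |X| = 5 = 2**2 + 1: symbols 0..2 use 2 bits, symbols 3..4 use 3 bits.
assert [truncated_binary(x, 5) for x in range(5)] == ["00", "01", "10", "110", "111"]
assert unary(3) == "001"
```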

Golomb coding and Rice coding

• Golomb coding = Unary coding + Truncated binary coding
  • An integer x is divided into two parts (quotient and remainder) according to a parameter M: q = ⌊x/M⌋, r = mod(x, M) = x − q·M.
  • Golomb code = unary code of q + truncated binary code of r.
  • When M=1, Golomb coding = unary coding.
  • When M=2^k, Golomb coding = Rice coding.
  • Golomb code is the optimal code for the geometric distribution: Prob(x=i)=(1−p)^(i−1)·p, where 0<p<1.
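A self-contained sketch combining the two parts (illustrative names; the unary part is written as q zeros and a terminating one — bit conventions vary between texts):

```python
def golomb(x, M):
    """Golomb code of x >= 0 with parameter M: unary part for q = x // M,
    truncated binary part for r = x mod M over an alphabet of size M."""
    q, r = divmod(x, M)
    k = M.bit_length() - 1           # M = 2**k + b, 0 <= b < 2**k
    b = M - 2 ** k
    if b == 0:                       # M a power of two: plain k-bit remainder
        tb = format(r, f"0{k}b") if k else ""
    elif r < 2 ** k - b:
        tb = format(r, f"0{k}b")
    else:
        tb = format(r + 2 ** k - b, f"0{k + 1}b")
    return "0" * q + "1" + tb

# M = 4 = 2**2 (a Rice code): x = 11 -> q = 2, r = 3 -> "001" + "11".
assert golomb(11, 4) == "00111"
# M = 1 reduces to unary coding.
assert golomb(5, 1) == "000001"
```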

Exp-Golomb coding (Universal)

• Exp-Golomb coding ≠ Golomb coding
• Exp-Golomb coding of order k=0 is used in some video coding standards such as H.264.
• The encoding process
  • |X|=∞: X={0, 1, 2, …}
  • Calculate q = ⌊x/2^k⌋ + 1, and n_q = ⌊log₂ q⌋.
  • Exp-Golomb code = unary code of n_q+1 (i.e., n_q zeros followed by a one, which is the MSB of q) + the n_q LSBs of q + the k-bit representation of r = mod(x, 2^k) = x − (q−1)·2^k.
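The steps above can be sketched as follows (illustrative names; note that the n_q zeros plus the full binary form of q together give exactly the prefix described above, since format(q, "b") has n_q+1 bits and starts with a one):

```python
def exp_golomb(x, k=0):
    """Order-k Exp-Golomb code of x >= 0:
    q = x // 2**k + 1, n_q = floor(log2 q); emit n_q zeros, then the
    (n_q + 1)-bit binary form of q, then the k-bit remainder."""
    q, r = divmod(x, 2 ** k)
    q += 1
    nq = q.bit_length() - 1           # floor(log2 q)
    code = "0" * nq + format(q, "b")  # n_q zeros + binary of q
    return code + (format(r, f"0{k}b") if k else "")

# Order 0 (as used for H.264 syntax elements): 0..4 ->
assert [exp_golomb(x) for x in range(5)] == ["1", "010", "011", "00100", "00101"]
```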

More codes

• Elias gamma code
  • |X|=∞: X=Z⁺={1, 2, …}
  • Calculate n_x = ⌊log₂ x⌋ + 1.
  • The code = n_x−1 zeros + the n_x-bit binary representation of the integer x.
  • The modified Elias gamma code over {0, 1, …} = the Exp-Golomb code of order 0.
  • Elias gamma code is optimal for the following distribution: Prob(x=i)=1/(2i²).
• Elias delta code, Elias omega code
• Fibonacci code
• Levenstein code
• …

David Salomon, Variable-length Codes for Data Compression, Springer, 2007
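The Elias gamma construction above, as a two-line sketch (illustrative name):

```python
def elias_gamma(x):
    """Elias gamma code of x >= 1: n_x - 1 zeros, then the n_x-bit
    binary representation of x, where n_x = floor(log2 x) + 1."""
    nx = x.bit_length()              # equals floor(log2 x) + 1
    return "0" * (nx - 1) + format(x, "b")

assert elias_gamma(1) == "1"
assert elias_gamma(2) == "010"
assert elias_gamma(9) == "0001001"
```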

Huffman Coding

How was Huffman code invented?

• In 1951 David A. Huffman and his classmates in an electrical engineering graduate course on information theory were given the choice of a term paper or a final exam. For the term paper, Huffman’s professor, Robert M. Fano, had assigned what at first appeared to be a simple problem. Students were asked to find the most efficient method of representing numbers, letters or other symbols using a binary code. Besides being a nimble intellectual exercise, finding such a code would enable information to be compressed for transmission over a computer network or for storage in a computer’s memory. …

Scientific American, vol. 265, no. 3, pp. 54-58, 1991 http://www.huffmancoding.com/david-huffman/scientific-american


Huffman’s rules of making optimal codes

• Source statistics: P=P0=[p1,…,pm], where p1≥…≥pm−1≥pm.

• Rule 1: L1≤…≤Lm−1=Lm.

• Rule 2: If L1≤…≤Lm−2<Lm−1=Lm, the two longest codewords differ only in their last digit.

• Rule 3: Each possible bit sequence of length Lm−1 must be either a codeword or the prefix of some codeword.

Homework: Think about why these three rules are correct. Please do NOT depend on any textbook or paper to help you. Just try to make use of the definitions of PF codes and the average codeword length to derive contradictions.

Huffman coding: The idea

• P=P0=[p1,…,pm], where p1≥…≥pm.

• Choose the two smallest probabilities pm−1, pm, and merge them to get a new probability pm−1’.

• Get a new probability distribution P1=[p1,…,pm−2,pm−1’], re-sorted so that p1≥…≥pm−2≥pm−1’.

• Generate a tree, in which pm−1’ is the parent node and the two selected smallest probabilities pm−1 and pm are child nodes.

• Repeat the above process until there is only one probability (which is equal to 1) left.
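The merging procedure maps directly onto a binary heap. A minimal Python sketch (symbol and variable names are illustrative; the exact codewords depend on tie-breaking, but the codeword lengths are optimal):

```python
import heapq

def huffman(symbols, probs):
    """Huffman coding as described above: repeatedly pop and merge the
    two least probable nodes until a single root remains."""
    # Entries: (probability, tie-breaker, leaf symbol or (left, right) pair).
    heap = [(p, i, s) for i, (s, p) in enumerate(zip(symbols, probs))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, (a, b)))
        count += 1
    code = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: recurse left/right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                             # leaf: record its codeword
            code[node] = prefix
    walk(heap[0][2], "")
    return code

# The example from the next slide: lengths [2, 2, 2, 3, 3], average 2.2 bits.
code = huffman(["1", "2", "3", "4", "5"], [0.4, 0.2, 0.2, 0.1, 0.1])
assert sorted(len(c) for c in code.values()) == [2, 2, 2, 3, 3]
```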

Huffman code: An example

• X={1,2,3,4,5}, P=[0.4, 0.2, 0.2, 0.1, 0.1].

Merging order: p4+5 = 0.1+0.1 = 0.2; p3+4+5 = 0.2+0.2 = 0.4; p1+2 = 0.4+0.2 = 0.6; p1+2+3+4+5 = 0.6+0.4 = 1.

A Possible Code:
1 → 00
2 → 01
3 → 10
4 → 110
5 → 111

Huffman code: An optimal code

• Relation between Huffman code and Shannon code: H ≤ L_Huffman ≤ L_Shannon-Fano ≤ L_Shannon

• When pmax ≥ 0.5, L_Huffman ≤ H + pmax

• When pmax < 0.5, L_Huffman ≤ H + pmax + log₂(2(log₂e)/e) ≈ H + pmax + 0.086

• When each pi is a negative power of 2, the Huffman code reaches the entropy.

Not all optimal codes can be generated!

• The Huffman code construction method may not be able to generate all possible optimal codes.

Figure 1.30 of [David Salomon, Variable-length Codes for Data Compression, Springer, 2007]

Huffman code: Small X problem

• Problem
  • When |X| is small, the coding gain is less obvious.
  • As a special case, when |X|=2, Huffman coding cannot compress the data at all – no matter what the probabilities are, each symbol has to be encoded as a single bit.
• Solutions
  • Solution 1: Work on Xⁿ rather than X.
  • Solution 2: Dual tree coding = Huffman coding + Tunstall coding

Huffman code: Variance problem

• Problem
  • There are multiple choices of the two smallest probabilities if more than two nodes have the same probability during any step of the coding process.
  • Huffman codes with a larger variance may cause trouble for data transmissions via a CBR (constant bit-rate) channel – a larger buffer is needed.
• Solution
  • Shorter subtrees first. (A single node’s height is 0.)

Modified Huffman code

• Problem
  • If |X| is too large, the construction of the Huffman tree will take too long and the memory used for the tree will be too demanding.
• Solution
  • Divide X into two sets X1={si | p(si)>2⁻ᵛ} and X2={si | p(si)≤2⁻ᵛ}.
  • Perform Huffman coding for the new set X3=X1∪{X2}.
  • Append f(X2) as the prefix of the naive binary representation of each symbol in X2.

Fixed Huffman table

• Practical implementation of Huffman coding
  • Assume P is fixed for all inputs.
  • The Huffman tree is pre-constructed and a Huffman table is generated.
  • The Huffman table is used as a LUT (look-up table) for encoding and decoding purposes.

A Possible Code:
1 → 00
2 → 01
3 → 10
4 → 110
5 → 111

Huffman code: Decoding problem

• Problem
  • The normal decoding process needs to repeatedly go through the Huffman tree. ⇒ The decoding speed is too slow.
• Solution
  • Divide the whole Huffman tree into several subtrees with the same prefix.
  • For each subtree ti, construct a partial LUT whose entries are di-bit integers, where di is the maximal depth of the subtree.
  • Progressively go through each subtree ti by reading di bits each time.

Adaptive/Dynamic Huffman coding

• The encoder and decoder dynamically update the statistical models, and thus the Huffman trees, to adapt to the changing behavior of the source.

• A 0-node with the least probability is used to represent symbols that have not appeared.
• For each new encoded symbol, the statistical model is updated; the 0-node may be divided and thus the whole tree is updated.

Information Theory: Hall of Fame

Hall of Fame (I)

• Claude Elwood Shannon (1916-2001)
  • Father of Information Theory
  • Kyoto Prize (京都賞, 1985)
  • IEEE Medal of Honor (1966)

Homework: watch this video at home –
Claude Shannon: Father of the Information Age, UCTV, January 30, 2002

Hall of Fame (II)

• Robert Mario Fano (1917-)
  • Claude E. Shannon Award (IEEE Information Theory Society, 1976)
  • Member of US NAS (1978) and NAE (1973)
  • Fellow of AAAS (American Academy of Arts and Sciences, 1958)
  • Fellow of IEEE (1954)
• David Albert Huffman (1925-1999)
  • IEEE Richard W. Hamming Medal (1999)
  • Fellow of IEEE (1974)
• Peter Elias (1923-2001)
  • Claude E. Shannon Award (IEEE Information Theory Society, 1977)
  • Richard W. Hamming Medal (IEEE, 2002)
  • Member of US NAS (1975) and NAE (1979)
  • Fellow of AAAS (American Academy of Arts and Sciences, ?) and AAAS (American Association for the Advancement of Science, ?)
  • Fellow of IEEE (?) and ACM (1994)

Hall of Fame (III)

• Solomon Wolf Golomb (1932-)
  • Claude E. Shannon Award (IEEE Information Theory Society, 1985)
  • Richard W. Hamming Medal (IEEE, 2000)
  • Member of US NAS (2003) and NAE (1976)
  • Foreign Member of Russian Academy of Natural Science (1994)
  • Fellow of AAAS (American Association for the Advancement of Science, ?)
  • Fellow of IEEE (1982)

Hall of Fame (IV)

• Robert Gray Gallager (1931-)
  • Claude E. Shannon Award (IEEE Information Theory Society, 1983)
  • IEEE Medal of Honor (1990)
  • Member of US NAS (1992) and NAE (1979)
  • Fellow of AAAS (American Academy of Arts and Sciences, 1990)
  • Fellow of IEEE (1968)
• Vladimir Iosifovich Levenshtein (1935-)
  • Richard W. Hamming Medal (IEEE, 2006)
  • Fellow of IEEE (2003)

Hall of Fame (*)

• Richard Wesley Hamming (1915-1998)
  • ACM Turing Award (1968)
  • Member of US NAE (1980)
  • Former President of ACM (1958-1960)
  • Fellow of IEEE (1968) and ACM (1994)

Richard W. Hamming, You and Your Research, Bell Communications Research Colloquium Seminar, 7 March 1986, transcribed by J. F. Kaiser, URL: http://www.cs.virginia.edu/~robins/YouAndYourResearch.html
