Lecture 4: Lossless Coding


Lecture 4: Lossless Coding (I)
Shujun LI (李树钧): INF-10845-20091 Multimedia Coding
May 13, 2009

Outline
• Review — with proofs of some theorems (new)
• Entropy Coding: Overview
• Huffman Coding: The Optimal Code
• Information Theory: Hall of Fame

Review

Image and video coding: Where is IT?
[Figure: the image/video coding pipeline. Encoder: Input Image/Video → Pre-Processing → Lossy Coding → Lossless Coding → Encoded Image/Video (…110…11001…); decoder: the same stages in reverse order, yielding the Decoded Image/Video. Predictive coding spans the lossy and lossless coding stages, and visual quality measurement evaluates the result.]

Coding: A Formal Definition
• Input: x = (x1, …, xm), where xi ∈ X.
• Output: y = (y1, …, ym), where yi ∈ Y* = Y ∪ Y² ∪ …
• Encoding: y = F(x) = f(x1)…f(xm), where f(xi) = yi.
  - yi is the codeword corresponding to the input symbol xi.
  - The mapping f: X → Y* is called a code.
  - F is called the extension of f.
  - If F is an injection, i.e., x1 ≠ x2 ⇒ y1 ≠ y2, then f is a uniquely decodable (UD) code.
• Decoding: x* = F⁻¹(y).
  - When x* = x, we have a lossless code.
  - When x* ≠ x, we have a lossy code.
  - A lossy code cannot be a UD code.

Memoryless random (i.i.d.) source
• Notations
  - A source emits symbols from a set X.
  - At any time, each symbol xi is emitted from the source with a fixed probability pi, independent of any other symbols.
  - Any two emitted symbols are independent of each other: the probability that a symbol xj appears after another symbol xi is pi·pj.
  - ⇒ There is a discrete distribution P = {Prob(xi) = pi | ∀xi ∈ X} that describes the statistical behavior of the source.
• A memoryless random source is simply represented as a 2-tuple (X, P).
  - P can be represented as a vector P = [p1, p2, …] once an order of the elements of X is fixed.

Prefix-free (PF) code
• A code f: X → Y* is prefix-free (PF), or instantaneous, if no codeword is a prefix of another codeword, i.e., there do not exist two distinct symbols x1, x2 ∈ X such that f(x1) is a prefix of f(x2).
• Properties of PF codes
  - PF codes are always UD codes.
  - A PF code can be uniquely represented by a b-ary tree.
  - PF codes can be decoded without reference to future symbols.

Kraft-McMillan number
• Kraft-McMillan number: K = Σ_{x∈X} 1/b^L(x), where L(x) denotes the length of the codeword f(x) and b is the size of Y.
• Theorem 1 (Kraft): K ≤ 1 ⇔ a PF code with these codeword lengths exists. (K ≤ 1 is often called the Kraft inequality.)
• Theorem 2 (McMillan): a code is UD ⇒ K ≤ 1.
• Theorems 1+2: a UD code always has a PF counterpart with the same codeword lengths.
• ⇒ Nothing is lost by restricting attention to PF codes instead of general UD codes.

Definition of entropy (Shannon, 1948)
• Given a memoryless random source (X, P) with probability distribution P = [p1, …, pn], its entropy to base b is defined as
  H_b(X) = H_b(P) = Σ_{i=1}^n pi·log_b(1/pi).
• When b = 2, the subscript b may be omitted, and we write H(X) = H(P).

Properties of entropy
• The comparison theorem: if P = [p1, …, pn] and Q = [q1, …, qn] are two probability distributions, then
  H_b(P) = Σ_{i=1}^n pi·log_b(1/pi) ≤ Σ_{i=1}^n pi·log_b(1/qi),
  and the equality holds if and only if P = Q.
• ⇒ H_b(P) ≤ log_b n, and the equality holds if and only if p1 = … = pn = 1/n (the uniform distribution).
• H_b(Pⁿ) = n·H_b(P).
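These definitions are easy to check numerically. Below is a minimal Python sketch (not part of the lecture; the function names and the example distribution are our own) that computes the entropy, the cross-entropy bound from the comparison theorem, and the Kraft-McMillan number of a set of codeword lengths.

```python
import math

def kraft_mcmillan(lengths, b=2):
    """Kraft-McMillan number K = sum over codewords of 1/b^L(x)."""
    return sum(b ** -L for L in lengths)

def entropy(p, b=2):
    """H_b(P) = sum_i p_i * log_b(1/p_i); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(1.0 / pi, b) for pi in p if pi > 0)

def cross_entropy(p, q, b=2):
    """sum_i p_i * log_b(1/q_i), which upper-bounds H_b(P) by the comparison theorem."""
    return sum(pi * math.log(1.0 / qi, b) for pi, qi in zip(p, q) if pi > 0)

# Example distributions (ours, for illustration):
P = [0.5, 0.25, 0.125, 0.125]
Q = [0.25, 0.25, 0.25, 0.25]

print(entropy(P))            # 1.75 bits/symbol
print(cross_entropy(P, Q))   # 2.0 >= H(P), as the comparison theorem predicts
print(entropy(Q))            # 2.0 = log2(4), the maximum for 4 symbols

# The binary PF code {0, 10, 110, 111} has lengths [1, 2, 3, 3]:
print(kraft_mcmillan([1, 2, 3, 3]))  # 1.0 <= 1, so the Kraft inequality holds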
Proof of the comparison theorem *
• Lemma: ∀x > 0, ln x ≤ x − 1, and equality holds if and only if x = 1.
  [Figure: plots of ln(x) against x − 1, and of −ln(x) against x − 1, over 0 < x ≤ 2, illustrating the lemma.]
• We only need to prove the theorem for the base b = e, since log_b(x) = ln(x)/ln(b).
• From the lemma, ln(qi/pi) ≤ qi/pi − 1, with equality if and only if qi/pi = 1 (i.e., qi = pi).
• ⇒ H_e(P) − Σ_{i=1}^n pi·ln(1/qi) = Σ_{i=1}^n pi·ln(1/pi) − Σ_{i=1}^n pi·ln(1/qi)
  = Σ_{i=1}^n pi·ln(qi/pi) ≤ Σ_{i=1}^n pi·(qi/pi − 1)
  = Σ_{i=1}^n qi − Σ_{i=1}^n pi = 1 − 1 = 0.

Shannon's source coding theorem (I)
• The entropy of a memoryless random source defines the lower bound on the efficiency of all UD (uniquely decodable) codes.
  - Denoting by L̄ the average length of the codewords of a UD code, the theorem says L̄ ≥ H_b(X).

Proof of Shannon Theorem (I) *
• Calculate the Kraft-McMillan number K = Σ_{i=1}^n 1/b^L(xi).
• Construct a "virtual" probability distribution Q = [q1, …, qn], where qi = 1/(K·b^L(xi)).
• From the comparison theorem, we have
  H_b(P) = Σ_{i=1}^n pi·log_b(1/pi) ≤ Σ_{i=1}^n pi·log_b(1/qi) = Σ_{i=1}^n pi·log_b(K·b^L(xi)) = log_b K + L̄.
  Since K ≤ 1, log_b K ≤ 0, and therefore H_b(P) ≤ L̄.

Shannon's source coding theorem (II)
• Given a memoryless random source (X, P) with probability distribution P = [p1, …, pn].
• Make a PF code (the Shannon code) as follows:
  - Find L = [L1, …, Ln], where Li is the least positive integer such that b^Li ≥ 1/pi, i.e., Li = ⌈log_b(1/pi)⌉.
  - One can prove K = Σ_i 1/b^Li ≤ 1; then Kraft's theorem ensures there must be a PF code with these lengths.
  - Then, for this PF code, we can prove H_b(X) ≤ L̄ < H_b(X) + 1.

Proof of Shannon Theorem (II) *
• b^Li ≥ 1/pi ⇒ Σ_i 1/b^Li ≤ Σ_i pi = 1 ⇒ a PF code with these lengths exists.
• Li = ⌈log_b(1/pi)⌉ ⇒ Li < log_b(1/pi) + 1.
• L̄ = Σ_i pi·Li < Σ_i pi·(log_b(1/pi) + 1) = H_b(X) + 1.
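As a concrete check of the Shannon-code construction, here is a short Python sketch (ours, not from the slides; the example distribution is an assumption) that derives the lengths Li = ⌈log_b(1/pi)⌉ for b = 2 and verifies both the Kraft inequality and the bound H_b(X) ≤ L̄ < H_b(X) + 1.

```python
import math

def shannon_code_lengths(p, b=2):
    """L_i = ceil(log_b(1/p_i)): the least integer with b^L_i >= 1/p_i.
    Note: floating-point log can round up at exact powers of b; the
    example below avoids such probabilities."""
    return [math.ceil(math.log(1.0 / pi, b)) for pi in p]

P = [0.4, 0.3, 0.2, 0.1]                      # example distribution (ours)
L = shannon_code_lengths(P)                   # [2, 2, 3, 4]
K = sum(2 ** -Li for Li in L)                 # 0.6875 <= 1: a PF code exists
avg_len = sum(pi * Li for pi, Li in zip(P, L))
H = sum(pi * math.log2(1.0 / pi) for pi in P)
assert H <= avg_len < H + 1                   # the bound proved above
print(L, K, avg_len, H)                       # [2, 2, 3, 4] 0.6875 2.4 ~1.846
```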
Approaching the entropy
• Given a memoryless random source (X, P), generate the extended source (Xⁿ, Pⁿ).
• H_b(Xⁿ) ≤ L̄n < H_b(Xⁿ) + 1 ⇒ H_b(X) ≤ L̄ < H_b(X) + 1/n, noting that H_b(Xⁿ) = n·H_b(X) and L̄n = n·L̄.
• Letting n → ∞, we have L̄ → H_b(X).
• Problem: n might be too large to be used in practice.

Entropy Coding

Image and video encoding: A big picture
[Figure: the encoding pipeline Input Image/Video → Pre-Processing → Lossy Coding → Lossless Coding → Post-Processing (post-filtering) → Encoded Image/Video. Pre-processing includes A/D conversion, color space conversion, pre-filtering, partitioning, …; lossy coding includes predictive coding (differential coding, motion estimation and compensation, context-based coding, …), quantization, and transform coding; lossless coding includes entropy coding, dictionary-based coding, model-based coding, run-length coding, ….]

The ingredients of entropy coding
• A random source (X, P).
• A statistical model (X, P′) as an estimate of the random source.
• An algorithm to optimize the coding performance (i.e., to minimize the average codeword length).
• At least one designer …

FLC, VLC and V2FLC
• FLC = fixed-length coding/code(s)/codeword(s)
  - Each symbol xi emitted from a random source (X, P) is encoded as an n-bit codeword, where |X| ≤ 2ⁿ.
• VLC = variable-length coding/code(s)/codeword(s)
  - Each symbol xi emitted from a random source (X, P) is encoded as an ni-bit codeword.
  - FLC can be considered a special case of VLC, where n1 = … = n_|X|.
• V2FLC = variable-to-fixed length coding/code(s)/codeword(s)
  - A symbol or a string of symbols is encoded as an n-bit codeword.
  - V2FLC can also be considered a special case of VLC.

Static coding vs. Dynamic/Adaptive coding
• Static coding = the statistical model P′ is static, i.e., it does not change over time.
• Dynamic/Adaptive coding = the statistical model P′ is dynamically updated, i.e., it adapts itself to the context (it changes over time); see the sketch after this list.
  - Dynamic/Adaptive coding ⊂ Context-based coding.
• Hybrid coding = Static + Dynamic coding
  - A codebook is maintained at the encoder side; the encoder dynamically chooses a code for a number of symbols and informs the decoder about the choice.
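To illustrate what "dynamically updated" means in practice, the sketch below (our illustration; the class name and the Laplace-smoothing choice are assumptions, not from the lecture) maintains a count-based adaptive model P′ that encoder and decoder can update in lockstep, so the model never has to be transmitted.

```python
from collections import Counter

class AdaptiveModel:
    """A minimal adaptive statistical model P': symbol probabilities are
    estimated from counts and updated after every coded symbol, keeping
    encoder and decoder synchronized without sending the model."""

    def __init__(self, alphabet):
        # Laplace smoothing (initial count 1 per symbol) ensures no
        # symbol ever has probability 0.
        self.counts = Counter({s: 1 for s in alphabet})
        self.total = len(alphabet)

    def prob(self, symbol):
        # Current estimate P'(symbol), used to choose its codeword.
        return self.counts[symbol] / self.total

    def update(self, symbol):
        # Both sides call this after each symbol is coded/decoded.
        self.counts[symbol] += 1
        self.total += 1

model = AdaptiveModel("abc")
for s in "aababc":
    p = model.prob(s)   # probability P'(s) the coder would use for s
    model.update(s)     # adapt P' to the data seen so far
print({s: round(model.prob(s), 3) for s in "abc"})  # skewed toward 'a'
```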