Today’s Topics

• Source Coding Techniques
• Huffman Code
• Two-pass Huffman Code
• Lempel-Ziv Encoding
• Lempel-Ziv Decoding

Mohamed Hamada
Software Engineering Lab, The University of Aizu
Email: [email protected]
URL: http://www.u-aizu.ac.jp/~hamada

Source Coding Techniques

1. Huffman Code.
2. Two-pass Huffman Code.
3. Lempel-Ziv Code.
4. Fano Code.
5. Shannon Code.
6. Arithmetic Code.

Where source coding fits in a communication system:

[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → User of Information]

Source Coding Techniques

1. Huffman Code.

With the Huffman code, in the binary case, the two least probable source output symbols are joined together, resulting in a new message alphabet with one less symbol:

1. Take together the two smallest probabilities: P(i) + P(j)
2. Replace symbols i and j by a new symbol
3. Go to step 1, until only one symbol remains

Application examples: JPEG, MPEG, MP3
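The three steps above can be sketched as follows. This is a minimal illustration, not code from the lecture; the function name `huffman_codes` and the use of a heap to find the two smallest probabilities are my own choices.

```python
import heapq

def huffman_codes(probs):
    """Build a binary Huffman code by repeatedly joining the two
    least probable entries (steps 1-3 above)."""
    # Each heap entry: (probability, tie-breaker, {symbol: code-so-far})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # smallest probability
        p2, _, c2 = heapq.heappop(heap)  # second smallest
        # Prefix '0' to one group's codewords and '1' to the other's,
        # then treat the merged pair as a single new symbol.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        counter += 1
        heapq.heappush(heap, (p1 + p2, counter, merged))
    return heap[0][2]

codes = huffman_codes({"s0": 0.1, "s1": 0.2, "s2": 0.4, "s3": 0.2, "s4": 0.1})
```

With the five-symbol source used in the worked example later in this lecture, the resulting average code length is 2.2 bits/symbol; the individual codewords depend on how probability ties are broken, which is exactly why more than one valid solution exists.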

1. Huffman Code.

For computer data, data reduction should be:
• lossless: no errors at reproduction
• universal: effective for different types of data

ADVANTAGES:
• uniquely decodable code
• smallest average codeword length

DISADVANTAGES:
• LARGE tables give complexity
• sensitive to channel errors

Huffman is not universal! It is only valid for one particular type of source: if the source has no known probability distribution, the Huffman code cannot be applied.

Huffman Coding: Example

Compute the Huffman code for the source shown:

Symbol s_k   Probability p_k
s0           0.1
s1           0.2
s2           0.4
s3           0.2
s4           0.1

Note that the entropy of S is

H(S) = 0.4 log2(1/0.4) + 2 × 0.2 log2(1/0.2) + 2 × 0.1 log2(1/0.1) = 2.12193 bits/symbol

Solution A, Stage I: list the symbols in order of decreasing probability:

Symbol   Stage I
s2       0.4
s1       0.2
s3       0.2
s0       0.1
s4       0.1
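The entropy value quoted above can be checked directly; this is an illustrative computation, not part of the original lecture material.

```python
from math import log2

# Probabilities of s2, s1, s3, s0, s4
p = [0.4, 0.2, 0.2, 0.1, 0.1]

# H(S) = sum of p_k * log2(1/p_k)
H = sum(pk * log2(1 / pk) for pk in p)
print(round(H, 5))  # 2.12193
```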

Solution A

Stage II (the two smallest probabilities, 0.1 + 0.1, are merged):

Symbol   Stage I   Stage II
s2       0.4       0.4
s1       0.2       0.2
s3       0.2       0.2
s0       0.1       0.2
s4       0.1

Stage III (0.2 + 0.2 merged):

Symbol   Stage I   Stage II   Stage III
s2       0.4       0.4        0.4
s1       0.2       0.2        0.4
s3       0.2       0.2        0.2
s0       0.1       0.2
s4       0.1

Solution A

Stage IV (0.2 + 0.4 merged; assign bits 0 and 1 back down the merge tree):

Symbol   Stage I   Stage II   Stage III   Stage IV   Code word
s2       0.4       0.4        0.4         0.6        00
s1       0.2       0.2        0.4         0.4        10
s3       0.2       0.2        0.2                    11
s0       0.1       0.2                               010
s4       0.1                                         011

Resulting code:

Symbol s_k   Probability p_k   Code word c_k
s0           0.1               010
s1           0.2               10
s2           0.4               00
s3           0.2               11
s4           0.1               011

H(S) = 2.12193 bits/symbol

Average code length:

L = 0.4 × 2 + 2 × 0.2 × 2 + 2 × 0.1 × 3 = 2.2

which satisfies H(S) ≤ L < H(S) + 1.

THIS IS NOT THE ONLY SOLUTION!

Another Solution B

Symbol   Stage I   Stage II   Stage III   Stage IV   Code word
s2       0.4       0.4        0.4         0.6        1
s1       0.2       0.2        0.4         0.4        01
s3       0.2       0.2        0.2                    000
s0       0.1       0.2                               0010
s4       0.1                                         0011

Resulting code:

Symbol s_k   Probability p_k   Code word c_k
s0           0.1               0010
s1           0.2               01
s2           0.4               1
s3           0.2               000
s4           0.1               0011

H(S) = 2.12193 bits/symbol

Average code length:

L = 0.4 × 1 + 0.2 × 2 + 0.2 × 3 + 2 × 0.1 × 4 = 2.2

which again satisfies H(S) ≤ L < H(S) + 1.

What is the difference between the two solutions?

• They have the same average code length.
• They differ in the variance of the codeword length:

  σ² = Σ_{k=0}^{K−1} p_k (l_k − L)²

• Solution A: σ² = 0.16
• Solution B: σ² = 1.36

Source Coding Techniques

1. Huffman Code.
2. Two-pass Huffman Code.
3. Lempel-Ziv Code.
4. Fano Code.
5. Shannon Code.
6. Arithmetic Code.
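The comparison above can be verified numerically. This is a small illustrative check, assuming the codeword lengths read off Solutions A and B.

```python
p = [0.1, 0.2, 0.4, 0.2, 0.1]   # probabilities of s0..s4
len_A = [3, 2, 2, 2, 3]         # Solution A: 010, 10, 00, 11, 011
len_B = [4, 2, 1, 3, 4]         # Solution B: 0010, 01, 1, 000, 0011

def avg_and_var(p, lens):
    """Average codeword length L and variance sum p_k*(l_k - L)^2."""
    L = sum(pk * lk for pk, lk in zip(p, lens))
    var = sum(pk * (lk - L) ** 2 for pk, lk in zip(p, lens))
    return L, var

L_A, var_A = avg_and_var(p, len_A)  # 2.2, 0.16
L_B, var_B = avg_and_var(p, len_B)  # 2.2, 1.36
```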

Source Coding Techniques

2. Two-pass Huffman Code.

This method is used when the probabilities of the symbols in the information source are unknown. So we first estimate these probabilities by counting the occurrences of each symbol in the given message, and then find the corresponding Huffman code. This can be summarized in the following two passes:

Pass 1: Measure the occurrence probability of each character in the message.
Pass 2: Build the Huffman code from these probabilities.

Example

Consider the message: M = ABABABABABACADABACADABACADABACAD

L(M) = 32
#(A) = 16   p(A) = 16/32 = 0.5
#(B) = 8    p(B) = 8/32 = 0.25
#(C) = 4    p(C) = 4/32 = 0.125
#(D) = 4    p(D) = 4/32 = 0.125
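Pass 1 on the message above can be sketched as follows; this is an illustrative snippet, not part of the original slides.

```python
from collections import Counter

M = "ABABABABABACADABACADABACADABACAD"

# Pass 1: estimate probabilities by counting symbol occurrences.
counts = Counter(M)
probs = {s: c / len(M) for s, c in counts.items()}
# probs == {'A': 0.5, 'B': 0.25, 'C': 0.125, 'D': 0.125}

# Pass 2 would feed these estimated probabilities into the
# Huffman procedure from the previous section.
```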

Source Coding Techniques

3. Lempel-Ziv Code.

Lempel-Ziv Coding

• Huffman coding requires knowledge of a probabilistic model of the source.
• This is not necessarily always feasible.
• The Lempel-Ziv code is an adaptive coding technique that does not require prior knowledge of the symbol probabilities.
• Lempel-Ziv coding is the basis of the well-known ZIP compression format.

Lempel-Ziv Coding: History

• Universal: effective for different types of data
• Lossless: no errors at reproduction

Applications: GIF, TIFF, V.42bis modem compression standard, PostScript Level 2

History:
- 1977: published by Abraham Lempel and Jacob Ziv
- 1984: LZ-Welch (LZW) published in IEEE Computer
- 1986: Sperry patent transferred to Unisys; the GIF file format required use of the LZW algorithm

Lempel-Ziv Coding Example

Input: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Codebook
Index:        1  2
Subsequence:  0  1

Lempel-Ziv Coding Example (cont’d)

Input: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Parsing the input into subsequences not seen before, the codebook grows one entry per step: 00, then 01, 011, 10, 010, 100, and 101.

Codebook
Index:        1  2  3   4   5    6   7    8    9
Subsequence:  0  1  00  01  011  10  010  100  101

Lempel-Ziv Coding Example (cont’d)

Input: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…

Codebook
Index:           1  2  3   4   5    6   7    8    9
Subsequence:     0  1  00  01  011  10  010  100  101
Representation:        11  12  42   21  41   61   62

Each representation is the codebook index of the prefix subsequence followed by the index of the innovation symbol (bit 0 → index 1, bit 1 → index 2). For encoding, the prefix index is written in 3-bit binary followed by the raw innovation bit:

Decimal  Binary
1        001
2        010
4        100
6        110

Information bits:     0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1…
Source-encoded bits:  0010 0011 1001 0100 1000 1100 1101
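The parsing and encoding traced above can be sketched as follows. This is an illustrative snippet assuming the slide’s conventions (codebook seeded with 0 and 1 at indices 1 and 2, prefix index written in 3 bits); the function name `lz_encode` is my own.

```python
def lz_encode(bits):
    """Parse a bit string into never-seen subsequences and encode each
    as a 3-bit prefix index followed by the raw innovation bit."""
    book = {"0": 1, "1": 2}       # indices 1 and 2 seeded as on the slide
    blocks, w = [], ""
    for b in bits:
        if w + b in book:
            w += b                # known subsequence: keep extending
        else:
            # emit (index of prefix w in 3-bit binary) + innovation bit
            blocks.append(format(book[w], "03b") + b)
            book[w + b] = len(book) + 1
            w = ""                # start the next subsequence
    return blocks

print(lz_encode("000101110010100101"))
# ['0010', '0011', '1001', '0100', '1000', '1100', '1101']
```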

How Come this is Compression?!

• The hope is: if the bit sequence is long enough, eventually the fixed-length code words will be shorter than the subsequences they represent.
• When applied to English text:
  • Lempel-Ziv achieves approximately 55% compression
  • Huffman coding achieves approximately 43%

Encoding Idea: Lempel-Ziv-Welch (LZW)

Assume we have just read a segment w from the text, and a is the next symbol.

• If wa is not in the dictionary:
  • Write the index of w in the output file.
  • Add wa to the dictionary, and set w ← a.
• If wa is in the dictionary:
  • Process the next symbol with segment wa.

LZ (LZW) Encoding Example

Initial dictionary: a = 0, b = 1, c = 2
Input string: a a b a a c a b c a b

LZ encoding process:
1. Start with the basic symbol set.
2. Read symbols, extending the current segment w while wa is in the dictionary.
3. When wa is not in the dictionary, output the index of w, add wa to the dictionary, and continue with w = a.

Input read       Output  Dictionary update
aa               0       aa = 3   (aa not in dictionary: output index of a)
aab              0       ab = 4   (continue with b, store ab)
aaba             1       ba = 5   (continue with a, store ba)
aabaac           3       aac = 6  (aa in dictionary, aac not)
aabaaca          2       ca = 7
aabaacabc        4       abc = 8
aabaacabcab      7       cab = 9
(end of input)   1                (flush the final segment b)

LZ Encoder: aabaacabcab → 0 0 1 3 2 4 7 1
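The encoding trace above can be sketched as follows. This is a minimal illustration assuming the initial dictionary a = 0, b = 1, c = 2 from the slide; the function name `lzw_encode` is my own.

```python
def lzw_encode(text, alphabet):
    """LZW encoding as described above: extend segment w while wa is
    in the dictionary; otherwise output index of w and add wa."""
    d = {ch: i for i, ch in enumerate(alphabet)}
    out, w = [], ""
    for a in text:
        if w + a in d:
            w += a                 # wa in dictionary: keep reading
        else:
            out.append(d[w])       # write the index of w
            d[w + a] = len(d)      # add wa to the dictionary
            w = a                  # set w <- a
    if w:
        out.append(d[w])           # flush the final segment
    return out

print(lzw_encode("aabaacabcab", "abc"))  # [0, 0, 1, 3, 2, 4, 7, 1]
```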

LZ (LZW) Decoding Example

Initial dictionary: a = 0, b = 1, c = 2
Input string: 0 0 1 3 2 4 7 1

Each code determines an output segment; the previous segment plus the first letter of the current one is added to the dictionary.

Output string    Output  Dictionary update
a                a
aa               a       aa = 3   (output a determines the update)
aab              b       ab = 4
aabaa            aa      ba = 5
aabaac           c       aac = 6
aabaacab         ab      ca = 7
aabaacabca       ca      abc = 8
aabaacabcab      b       cab = 9

LZ Decoder: 0 0 1 3 2 4 7 1 → aabaacabcab

Exercise

1. Find the Huffman code for the following source:

Symbol   Probability
h        0.1
e        0.1
l        0.4
o        0.25
w        0.05
r        0.05
d        0.05

2. Find the LZ code for the following input:
0011001111010100010001001
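The LZW decoding trace above can be sketched as follows; an illustrative snippet (function name `lzw_decode` is my own), again assuming the initial dictionary a = 0, b = 1, c = 2.

```python
def lzw_decode(codes, alphabet):
    """LZW decoding: expand each index, adding (previous segment +
    first letter of current segment) to the dictionary."""
    d = {i: ch for i, ch in enumerate(alphabet)}
    out = d[codes[0]]
    w = out
    for c in codes[1:]:
        # If c names the entry being created this step, it must be
        # w + w[0] (this corner case does not occur in the example).
        entry = d[c] if c in d else w + w[0]
        out += entry
        d[len(d)] = w + entry[0]   # previous segment + first letter
        w = entry
    return out

print(lzw_decode([0, 0, 1, 3, 2, 4, 7, 1], "abc"))  # aabaacabcab
```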
