Multimedia Communications
Huffman Coding
Copyright S. Shirani

Optimal codes
• Suppose that si -> wi ∈ A is an encoding scheme for a source alphabet S = {s1, …, sm}, and suppose that the source letters s1, …, sm occur with relative frequencies f1, …, fm respectively. The average codeword length of the code is defined as
  l̄ = Σ_{i=1..m} fi li
where li is the length of wi.
• The average number of code letters required to encode a source text consisting of N source letters is N · l̄.
• It may be expensive and time consuming to transmit long sequences of code letters, so it is desirable for l̄ to be as small as possible.
• It is in our power to make l̄ small by arranging the encoding scheme cleverly.

Optimal codes
• What constraints should we observe?
• The resulting code should be uniquely decodable.
• Considering what we saw in the previous chapter, we confine ourselves to prefix codes.
• An encoding scheme that minimizes l̄ is called an optimal encoding.
• The process of finding the optimal code was algorithmized by Huffman.

Optimal codes
• The necessary conditions for an optimal variable-length binary code are:
1. Given any two letters aj and ak, if P(aj) ≥ P(ak) then lj ≤ lk.
2. The two least probable letters have codewords with the same (maximum) length.
3. In the tree corresponding to the optimum code, there must be two branches stemming from each intermediate node.
4. Suppose we change an intermediate node into a leaf node by combining all the leaves descending from it into a composite word of a reduced alphabet. Then, if the original tree was optimal for the original alphabet, the reduced tree is optimal for the reduced alphabet.

Conditions #1 and #2
• Condition #1 is obvious: giving the shorter codeword to the more probable letter can only lower the average length.
• For condition #2, suppose an optimum code C exists in which the two codewords corresponding to the least probable symbols do not have the same length, and suppose the longer codeword is k bits longer than the shorter one.
• As C is optimal, the codewords corresponding to the least probable symbols are also the longest.
• As C is a prefix code, none of the other codewords is a prefix of this longest codeword.
• This means that, even if we drop the last k bits of the longest codeword, the codewords will still satisfy the prefix condition.
• By dropping the k bits, we obtain a new code with a shorter average codeword length. Therefore, C cannot be optimal.

Conditions #3 and #4
• Condition #3: if there were any intermediate node with only one branch stemming from it, we could remove that branch without affecting the decipherability of the code while reducing its average length.
• Condition #4: if this condition were not satisfied, we could find a code with a smaller average length for the reduced alphabet, then simply expand the composite word back into its leaves and obtain a code for the original alphabet with a smaller average length, contradicting the optimality of the original code.

• It can be shown that codes generated by the Huffman algorithm (explained shortly) meet the above conditions.
• In fact, it can be shown that not only does Huffman's algorithm always give a "right answer", it can also give every "right answer".
• For the proof see section 4.3.1 in "Information Theory and Data Compression" by D. Hankerson.

Building a Huffman Code
• The main idea:
– Let S be a source with alphabet A = {a1, …, aN}, ordered so that P(a1) ≥ … ≥ P(aN).
– Let S' be a source with alphabet A' = {a'1, …, a'N-1} such that a'i = ai for i = 1, …, N-2, while a'N-1 is a composite symbol replacing the two least probable symbols, with P(a'N-1) = P(aN-1) + P(aN).
– Then if a prefix code is optimum for S', the corresponding prefix code for S (append 0 and 1 to the codeword of a'N-1 to obtain the codewords of aN-1 and aN) is also optimum.

Building a Huffman Code
• Example with P(a1) = .2, P(a2) = .4, P(a3) = .2, P(a4) = .1, P(a5) = .1. Repeatedly merging the two least probable symbols (a4 and a5 into a'4 (.2), then a3 and a'4 into a'3 (.4), then a1 and a'3 into a''3 (.6), then a2 and a''3 into a'''3 (1)) and assigning 0/1 to the branches gives:
a2 (.4) : 1
a1 (.2) : 01
a3 (.2) : 000
a4 (.1) : 0010
a5 (.1) : 0011
average length = 2.2 bits, entropy = 2.122 bits
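The reduction procedure above can be sketched in a few lines of code. This is an illustrative sketch, not code from the notes: a min-heap repeatedly extracts the two least probable groups of symbols and merges them, prepending one bit to every codeword in each group. Ties are broken by insertion order rather than by position in a sorted list, so individual codewords can differ from the slides, but the average length comes out the same.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build one valid binary Huffman code from a dict of symbol frequencies."""
    tick = count()  # tie-breaker so the heap never has to compare symbol tuples
    heap = [(f, next(tick), (sym,)) for sym, f in freqs.items()]
    heapq.heapify(heap)
    code = {sym: "" for sym in freqs}
    while len(heap) > 1:
        f0, _, g0 = heapq.heappop(heap)  # least probable group
        f1, _, g1 = heapq.heappop(heap)  # second least probable group
        for sym in g0:                   # 0-branch: prepend a bit to each codeword
            code[sym] = "0" + code[sym]
        for sym in g1:                   # 1-branch
            code[sym] = "1" + code[sym]
        heapq.heappush(heap, (f0 + f1, next(tick), g0 + g1))
    return code

freqs = {"a1": 0.2, "a2": 0.4, "a3": 0.2, "a4": 0.1, "a5": 0.1}
code = huffman_code(freqs)
avg = sum(freqs[s] * len(w) for s, w in code.items())  # ≈ 2.2 bits/symbol
```

Because the 0/1 assignment and the tie-breaking are arbitrary, any run produces one of several equally good codes; only the codeword lengths (and hence the average) are constrained.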
• We can use a tree diagram to build the Huffman code.
• The assignment of 0 and 1 to the branches is arbitrary; different assignments give different Huffman codes with the same average codeword length.
• Sometimes we use the counts of symbols instead of their probabilities.
• We might draw the tree horizontally or vertically.

Building a Huffman Code
• With counts instead of probabilities (a2: 4, a1: 2, a3: 2, a4: 1, a5: 1, total 10), the same tree gives:
a2 : 1
a1 : 01
a3 : 000
a4 : 0010
a5 : 0011

Minimum Variance Huffman Codes
• Depending on where in the sorted list the combined symbol is placed, we obtain different Huffman codes with the same average length (same compression performance).
• In some applications we do not want the codeword lengths to vary significantly from one symbol to another (example: fixed-rate channels).
• To obtain a minimum variance Huffman code, we always put the combined symbol as high in the list as possible.

Minimum Variance Huffman Codes
• For the same source, placing each combined symbol as high in the list as possible gives:
a2 (.4) : 00
a1 (.2) : 10
a3 (.2) : 11
a4 (.1) : 010
a5 (.1) : 011
average length = 2.2 bits, entropy = 2.122 bits

Huffman Codes
• The average length l̄ of a Huffman code satisfies
  H(S) ≤ l̄ < H(S) + 1
• The upper bound is loose. A tighter bound is
  l̄ < H(S) + Pmax + 0.086
where Pmax is the largest symbol probability (and for Pmax ≥ 0.5, l̄ < H(S) + Pmax).

Extended Huffman Codes
• If the probability distribution is very skewed (large Pmax), Huffman codes become inefficient.
• We can reduce the rate by grouping symbols together.
• Consider a source S of independent symbols with alphabet A = {a1, …, aN}.
• Construct an extended source S(n) by grouping n source symbols together.
• The extended symbols in the extended alphabet A(n) are the N^n blocks (ai1, …, ain), each with probability P(ai1) ··· P(ain).
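The bound H(S) ≤ l̄ < H(S) + 1 can be checked numerically on the five-symbol example from earlier; this is a quick sanity check, not part of the original notes.

```python
import math

# Five-symbol example: probabilities and Huffman codeword lengths
# in the order a2, a1, a3, a4, a5 (codewords 1, 01, 000, 0010, 0011).
probs = [0.4, 0.2, 0.2, 0.1, 0.1]
lengths = [1, 2, 3, 4, 4]

entropy = -sum(p * math.log2(p) for p in probs)       # H(S) ≈ 2.122 bits
avg_len = sum(p * l for p, l in zip(probs, lengths))  # l-bar ≈ 2.2 bits

assert entropy <= avg_len < entropy + 1               # H(S) <= l-bar < H(S) + 1
assert avg_len < entropy + max(probs) + 0.086         # the tighter bound
```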
Extended Huffman Codes: Example
• Huffman code (n = 1):
a1 .95 : 0
a3 .03 : 10
a2 .02 : 11
R = 1.05 bits/symbol, H = .335 bits/symbol
• Huffman code (n = 2):
a1a1 .9025 : 0
a1a3 .0285 : 100
a3a1 .0285 : 101
a1a2 .0190 : 111
a2a1 .0190 : 1101
a3a3 .0009 : 110000
a3a2 .0006 : 110010
a2a3 .0006 : 110001
a2a2 .0004 : 110011
R = .611 bits/symbol
• The rate gets close to the entropy only for n > 7.

Extended Huffman Codes
• We can build a Huffman code for the extended source with a bit rate R(n) that satisfies
  H(S(n)) ≤ R(n) < H(S(n)) + 1
• But R = R(n)/n and, for i.i.d. sources, H(S) = H(S(n))/n, so
  H(S) ≤ R < H(S) + 1/n
• As n → ∞, R → H(S). However, complexity (memory, computations) increases exponentially with n, and convergence is slow for skewed distributions.

Nonbinary Huffman Codes
• The code elements come from an alphabet with m > 2 letters.
• Observations:
1. The m symbols that occur least frequently will have codewords of the same length.
2. The m symbols with the lowest probability differ only in the last position.
• Example: ternary Huffman code for a source with six letters.
– First combining the three letters with the lowest probability gives a reduced alphabet with 4 letters.
– Then combining the three lowest-probability letters of that alphabet gives an alphabet with only two letters.
– We now have three code symbols to assign but only two letters, so we are wasting one of the code symbols.

Nonbinary Huffman Codes
– Instead of combining three letters at the beginning we could have combined two, giving an alphabet of size 5.
– Combining three letters from this alphabet then leaves an alphabet of size 3.
– So we can combine three in the first step and two in the second, or two in the first step and three in the second. Which one is better?
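The drop in rate from grouping can be reproduced with a short computation. This sketch is not from the notes: it computes Huffman codeword lengths for the single symbols and for the pair-extended alphabet of the skewed three-symbol source in the example above. The exact codewords depend on tie-breaking, but the rates match the example.

```python
import heapq
from itertools import count, product

def huffman_avg_len(probs):
    """Average codeword length of a binary Huffman code (tracks lengths only)."""
    tick = count()  # tie-breaker so the heap never compares symbol groups
    heap = [(p, next(tick), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    depth = {s: 0 for s in probs}
    while len(heap) > 1:
        p0, _, g0 = heapq.heappop(heap)  # merge the two least probable groups
        p1, _, g1 = heapq.heappop(heap)
        for s in g0 + g1:
            depth[s] += 1                # each symbol in the merge gains one bit
        heapq.heappush(heap, (p0 + p1, next(tick), g0 + g1))
    return sum(probs[s] * depth[s] for s in probs)

p = {"a1": 0.95, "a2": 0.02, "a3": 0.03}  # skewed source from the example

r1 = huffman_avg_len(p)                   # n = 1: ≈ 1.05 bits/symbol
p2 = {a + b: p[a] * p[b] for a, b in product(p, repeat=2)}  # i.i.d. pairs
r2 = huffman_avg_len(p2) / 2              # n = 2: ≈ 0.611 bits/symbol
```

Building the code for triples, quadruples, and so on continues to lower the rate toward H ≈ .335 bits/symbol, at the cost of an alphabet that grows as N^n.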
– Observations:
• All letters combined at the same stage have codewords of the same length.
• The symbols with the lowest probability will have the longest codewords.
– Therefore, if at some stage we are allowed to combine fewer than m symbols, the logical place to do so is the very first stage.
• In the general case of a code alphabet with m symbols (m-ary) and a source with N symbols, the number of letters combined in the first step is the j with 2 ≤ j ≤ m satisfying j ≡ N modulo (m - 1); every later step combines m letters.

Adaptive Huffman Codes
• Two-pass encoders: first collect statistics, then build the Huffman code and use it to encode the source.
• One-pass (recursive) encoders:
– Develop the code based on the statistics of the symbols already encoded.
– The decoder can build its own copy of the code in the same way.
– It is possible to modify the code without redesigning the entire tree.
– More complex than arithmetic coding.

Adaptive Huffman Codes
• Feature: no statistical study of the source text needs to be done beforehand.
• The encoder keeps statistics, in the form of counts of source letters, as the encoding proceeds, and modifies the encoding according to those statistics.
• What about the decoder? If the decoder knows the rules and conventions under which the encoder proceeds, it will know how to decode the next letter.
• Besides the code stream, the decoder should be supplied with the details of how the encoder started and how it will proceed in each situation.

Adaptive Huffman Codes
• In adaptive Huffman coding, the tree and the corresponding encoding scheme change as the symbol counts change.
• Two versions of adaptive Huffman coding will be described:
1. Primary version
2. Knuth and Gallager method

Primary version
• The leaf nodes (source letters) are sorted non-increasingly.
• We merge from below in the case of a tie.
• When two nodes are merged they are called siblings.
• When two nodes are merged, the parent node is ranked at the higher of the two sibling positions.
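The encoder/decoder synchronization described above can be sketched with a deliberately naive one-pass scheme (this is neither the primary version nor the Knuth and Gallager method from these notes): both sides start from agreed initial counts, and after every symbol each side updates its counts identically, so the decoder can always rebuild exactly the code the encoder just used. The alphabet, sample message, and initial counts of 1 are illustrative assumptions; rebuilding the whole code per symbol is precisely the cost that Knuth and Gallager avoid by updating the tree incrementally.

```python
import heapq
from itertools import count

def build_code(counts):
    """Rebuild a binary Huffman code from the current symbol counts."""
    tick = count()  # tie-breaker so the heap never compares symbol tuples
    heap = [(c, next(tick), (s,)) for s, c in counts.items()]
    heapq.heapify(heap)
    code = {s: "" for s in counts}
    while len(heap) > 1:
        c0, _, g0 = heapq.heappop(heap)  # merge the two least frequent groups
        c1, _, g1 = heapq.heappop(heap)
        for s in g0:
            code[s] = "0" + code[s]
        for s in g1:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (c0 + c1, next(tick), g0 + g1))
    return code

alphabet = ["a", "b", "c"]             # agreed on by encoder and decoder
message = "abba"                       # illustrative input

# Encoder: use the current code, then update the count of the symbol just sent.
enc_counts = {s: 1 for s in alphabet}  # agreed initial counts
bitstream = ""
for sym in message:
    bitstream += build_code(enc_counts)[sym]
    enc_counts[sym] += 1

# Decoder: mirrors the encoder, so it rebuilds the same code at every step.
dec_counts = {s: 1 for s in alphabet}
decoded, bits = "", bitstream
while bits:
    inv = {w: s for s, w in build_code(dec_counts).items()}
    n = 1
    while bits[:n] not in inv:         # prefix code: exactly one match
        n += 1
    sym = inv[bits[:n]]
    decoded += sym
    dec_counts[sym] += 1
    bits = bits[n:]
```

Note how no statistics are transmitted: the code stream alone, plus the agreed starting state and update rule, is enough for the decoder to track the encoder.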