
Entropy Coding

A complete entropy coding system, which is an encoder/decoder pair, consists of the process of "encoding" (or "compressing") a random source (typically quantized transform coefficients) and the process of "decoding" (or "decompressing") the compressed signal to perfectly regenerate the original random source. In other words, there is no loss of information due to the process of entropy coding.

Entropy coding is also known as "zero-error coding" or lossless coding. Entropy coding is widely used in virtually all popular international multimedia compression standards such as JPEG and MPEG.

[Figure: Random Source → Entropy Encoding → Compressed Source; Compressed Source → Entropy Decoding → Random Source.]

Thus, entropy coding does not introduce any distortion, and hence the combination of the entropy encoder and the entropy decoder faithfully reconstructs the input to the entropy encoder.

Therefore, any possible loss of information or distortion that may be introduced in a signal compression system is not due to entropy encoding/decoding. As we discussed previously, a typical signal compression system includes, for example, a transform process, a quantization process, and an entropy coding stage. In such a system, the distortion is introduced by quantization. Moreover, for such a system, and from the perspective of the entropy encoder, the input "random source" to that encoder is the quantized transform coefficients.

[Figure: Random Source → Transform (examples: KLT, DCT, wavelets) → Quantization → Entropy Coding (examples: Huffman, arithmetic) → Compressed Source; the quantized transform coefficients form the input to the entropy coder.]


Code Design and Notations

In general, entropy coding (or "source coding") is achieved by designing a code, $C$, which provides a one-to-one mapping from any possible outcome of a random variable $X$ (the "source") to a codeword.

There are two alphabets in this case: one alphabet is the traditional alphabet $\mathcal{A}$ of the random source $X$, and the second alphabet, $B$, is the one that is used for constructing the codewords. Based on the second alphabet $B$, we can construct and define the set $D^*$, which is the set of all finite-length strings of symbols drawn from the alphabet $B$.

The most common and popular codes are binary codes, where the alphabet of the codewords is simply the binary bits "zero" and "one".

[Figure: the alphabet $\mathcal{A}$ of the random source $X$ (e.g., $A, a, B, b, C, c, \ldots$) is mapped by the code $C$ to a set of codewords in $D^*$ (e.g., $00, 01, 100, 101, \ldots$), built from the alphabet of code symbols $B = \{b_1, b_2\} = \{0, 1\}$. In this example, $|B| = 2$.]


Binary codes can be represented efficiently using binary trees. In this case, the two branches of the root node represent the possible values of the first bit of a codeword. Once that first bit is known, and if the codeword has a second bit, the next pair of branches represents the second bit, and so on.

[Figure: binary tree representation of a binary (D-ary, $D = 2$) code, with nodes $0, 1$; $00, \ldots, 11$; $000, \ldots, 111$; the set of codewords is $\{0, 10, 110, 111\}$, constructed from the alphabet of code symbols $B = \{0, 1\}$, $|B| = D = 2$.]


Definition

A source code, $C$, is a mapping from a random variable (source) $X$ with alphabet $\mathcal{A}$ to a finite-length string of symbols, where each string of symbols (codeword) is a member of the set $D^*$:
$$C : \mathcal{A} \to D^* .$$

The codewords in $D^*$ are formed from an alphabet $B$ that has $D$ elements: $|B| = D$. We say that we have a D-ary code, or that $B$ is a D-ary alphabet.

As discussed previously, the most common case is when the alphabet $B$ is the set $B = \{0, 1\}$; in this case, $D = 2$ and we have binary codewords.


Example

Let $X$ be a random source with $x \in \{1, 2, 3, 4\}$. Let $B = \{0, 1\}$, and hence $|B| = D = 2$. Then:
$$D^* = \{0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, \ldots, 111, \ldots\} .$$

We can define the code $C$ as follows:

Codeword            Length
$C(x=1) = 0$        $L_1 = 1$
$C(x=2) = 10$       $L_2 = 2$
$C(x=3) = 110$      $L_3 = 3$
$C(x=4) = 111$      $L_4 = 3$


Definition

For a random variable $X$ with a p.m.f. $p_1, p_2, \ldots, p_m$, the expected length of a code $C(X)$ is:
$$L(C) = \sum_{i=1}^{m} p_i L_i .$$

Code Types

The design of a good code follows the basic notion of entropy: to random outcomes with high probability, a good code assigns "short" codewords, and vice versa. The overall objective is to make the average length $\bar{L} = L(C)$ as small as possible.
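As a quick illustration of the formula, the following sketch computes the expected length of the four-symbol code from the earlier example; the probabilities used here are hypothetical (the example did not specify a p.m.f.):

```python
def expected_length(pmf, lengths):
    """Average codeword length L(C) = sum_i p_i * L_i."""
    return sum(p * L for p, L in zip(pmf, lengths))

# Code from the example above: C = {0, 10, 110, 111} with lengths 1, 2, 3, 3.
lengths = [1, 2, 3, 3]
pmf = [0.5, 0.25, 0.125, 0.125]        # hypothetical source probabilities
print(expected_length(pmf, lengths))   # -> 1.75 bits per symbol
```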


In addition, we have to design codes that are uniquely decodable. In other words, if the source generates a sequence $x_1, x_2, x_3, \ldots$ that is mapped into a sequence of codewords $C(x_1), C(x_2), C(x_3), \ldots$, then we should be able to recover the original source sequence $x_1, x_2, x_3, \ldots$ from the codeword sequence $C(x_1), C(x_2), C(x_3), \ldots$.

In general, and as a start, we are interested in codes that map each random outcome $x_i$ into a unique codeword that differs from the codeword of any other outcome. For a random source with alphabet $\{1, 2, \ldots, m\}$, a non-singular code meets the following constraint:
$$C(i) \ne C(j) \quad \forall\, i \ne j .$$


Although a non-singular code is uniquely decodable for a single symbol, it does not guarantee unique decodability for a sequence of outcomes of $X$.

Example:

Outcome    Code C1    Code C2
$x = 1$    1          10
$x = 2$    10         00
$x = 3$    101        11
$x = 4$    111        110


In the above example, the code C1 is non-singular; however, it is not uniquely decodable. Meanwhile, the code C2 is both non-singular and uniquely decodable. Therefore, not all non-singular codes are uniquely decodable; however, every uniquely decodable code is non-singular.

It is important to note that a uniquely decodable code may require the decoding of multiple codewords to uniquely identify the original source sequence. This is the case for the above code C2. (Can you give an example where the C2 decoder needs to wait for more codewords before being able to uniquely decode a sequence?)
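The notes do not give a mechanical test for unique decodability; for completeness, here is a sketch of the classical Sardinas-Patterson test applied to the two codes above (the function and variable names are our own):

```python
def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: True iff no codeword sequence has two parses."""
    if len(set(codewords)) < len(codewords):
        return False                              # singular codes are never uniquely decodable
    C = set(codewords)

    def dangling(A, B):
        # suffixes w such that some a in A equals b + w for a b in B, with w nonempty
        return {a[len(b):] for a in A for b in B if a != b and a.startswith(b)}

    S = dangling(C, C)                            # S_1
    seen = set()
    while S and not (S & C):                      # stop if a dangling suffix is a codeword
        if S <= seen:                             # no new suffixes -> uniquely decodable
            return True
        seen |= S
        S = dangling(C, S) | dangling(S, C)       # S_{n+1}
    return not (S & C)

print(is_uniquely_decodable(["1", "10", "101", "111"]))   # False (code C1)
print(is_uniquely_decodable(["10", "00", "11", "110"]))   # True  (code C2)
```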


Therefore, it is highly desirable to design a uniquely decodable code that can be decoded instantaneously upon receiving each codeword. This type of code is known as an instantaneous, prefix-free, or simply prefix code. In a prefix code, no codeword can be used as a prefix of any other codeword.

Example:

In the following example, no codeword is used as a prefix for any other codeword:

$C(1) = 0$, $C(2) = 10$, $C(3) = 110$, $C(4) = 111$.
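The prefix condition is easy to check mechanically; the small sketch below (our own helper, not part of the notes) tests whether any codeword is a prefix of another:

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of any other codeword."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))   # True  (the prefix code above)
print(is_prefix_free(["1", "10", "101", "111"]))   # False (code C1: "1" prefixes "10")
print(is_prefix_free(["10", "00", "11", "110"]))   # False (code C2: "11" prefixes "110")
```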


It should be rather intuitive that every prefix code is uniquely decodable, but the converse is not always true.

In summary, the three major types of codes, non-singular, uniquely decodable, and prefix codes, are related as shown in the following diagram.

[Figure: nested sets, from the outside in: all possible codes ⊃ non-singular codes ⊃ uniquely decodable codes ⊃ prefix (instantaneous) codes.]


Kraft Inequality

Based on the above discussion, it should be clear that uniquely decodable codes represent a subset of all possible codes. Also, prefix codes are a subset of uniquely decodable codes. Prefix codes meet a certain constraint, which is known as the Kraft inequality.

Theorem

For any prefix D-ary code $C$ with codeword lengths $L_1, L_2, \ldots, L_m$, the following must be satisfied:
$$\sum_{i=1}^{m} D^{-L_i} \le 1 .$$
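As a quick numerical check, the sketch below (our own helper) evaluates the Kraft sum for the binary prefix code used throughout, with lengths 1, 2, 3, 3, and for a set of lengths that no binary prefix code can have:

```python
def kraft_sum(lengths, D=2):
    """Evaluate sum_i D**(-L_i); a prefix D-ary code must give a value <= 1."""
    return sum(D ** (-L) for L in lengths)

print(kraft_sum([1, 2, 3, 3]))   # 1.0  -> satisfies the Kraft inequality (with equality)
print(kraft_sum([1, 1, 2]))      # 1.25 -> no binary prefix code has these lengths
```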


Conversely, given a set of codeword lengths that meets the inequality $\sum_{i=1}^{m} D^{-L_i} \le 1$, there exists a prefix code with this set of lengths.

Proof

A prefix code $C$ can be represented by a D-ary tree. Below we illustrate the proof using a binary code and a corresponding binary tree. (The same principles apply to higher-order codes/trees.)

For illustration purposes, let us consider the code $C(1) = 0$, $C(2) = 10$, $C(3) = 110$, $C(4) = 111$. This code can be represented as follows.


[Figure: binary tree representation of a binary (D-ary, $D = 2$) prefix code with the set of codewords $\{0, 10, 110, 111\} \subset D^*$ and code-symbol alphabet $B = \{0, 1\}$, $|B| = D = 2$.]

An important attribute of the above tree representation of codes is the number of leaf nodes that are associated with each codeword. For example, for the first codeword, $C(1) = 0$, there are four leaf nodes associated with it. Similarly, the codeword $C(2) = 10$ has two leaf nodes.


The last two codewords are leaf nodes themselves, and hence each of them is associated with a single leaf node (itself).

[Figure: the same binary tree, highlighting the group $\mathcal{L}_1$ of leaf nodes of the codeword 0 and the group $\mathcal{L}_2$ of leaf nodes of the codeword 10.]


Note that for a prefix code, no codeword can be an ancestor of any other codeword.

Let $L_{\max}$ be the maximum length among all codeword lengths of a prefix code. Each codeword with length $L_i \le L_{\max}$ sits at depth $L_i$ of the D-ary tree.

[Figure: the same binary tree annotated with the disjoint leaf-node groups $\mathcal{L}_1$ (leaf nodes of codeword 0), $\mathcal{L}_2$ (leaf nodes of codeword 10), $\mathcal{L}_3$ (leaf nodes of codeword 110), and $\mathcal{L}_4$ (leaf nodes of codeword 111).]


Hence, the total number of leaf nodes that are associated with (i.e., descendants of) a codeword at depth $L_i$ is $D^{L_{\max} - L_i}$. Furthermore, since each group $\mathcal{L}_i$ of leaf nodes of a codeword with length $L_i$ is disjoint from every other group of leaf nodes $\mathcal{L}_j$, we have:
$$\sum_{i=1}^{m} D^{L_{\max} - L_i} \le D^{L_{\max}}, \quad \text{which implies} \quad \sum_{i=1}^{m} D^{-L_i} \le 1 .$$

By similar arguments, one can construct a prefix code for any set of lengths that satisfies the above constraint $\sum_{i=1}^{m} D^{-L_i} \le 1$. QED
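To make the converse concrete, here is a small sketch (our own construction, not from the notes) that builds a binary prefix code from a set of lengths satisfying the Kraft inequality, assigning codewords greedily in order of increasing length:

```python
def prefix_code_from_lengths(lengths):
    """Build a binary prefix code for lengths that satisfy the Kraft inequality."""
    assert sum(2 ** (-L) for L in lengths) <= 1, "lengths violate the Kraft inequality"
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])   # shortest first
    codewords = [None] * len(lengths)
    value, prev_len = 0, 0
    for idx in order:
        L = lengths[idx]
        value <<= (L - prev_len)                     # extend the running value to length L
        codewords[idx] = format(value, "0{}b".format(L))
        value += 1                                   # next codeword starts past this subtree
        prev_len = L
    return codewords

print(prefix_code_from_lengths([1, 2, 3, 3]))        # ['0', '10', '110', '111']
```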


Optimum Codes

Here we address the issue of finding minimum-length codes given the constraint imposed by the Kraft inequality. In particular, we are interested in finding codes that satisfy:
$$L^* = \min_{L_1, L_2, \ldots, L_m} L(C) = \min_{L_1, L_2, \ldots, L_m} \sum_{i=1}^{m} p_i L_i \quad \text{such that} \quad \sum_{i=1}^{m} D^{-L_i} \le 1 .$$

If we assume that equality is satisfied, $\sum_{i=1}^{m} D^{-L_i} = 1$, we can formulate the problem using Lagrange multipliers.


Consequently, we can minimize the following objective function:
$$J = \sum_{i=1}^{m} p_i L_i + \lambda \sum_{i=1}^{m} D^{-L_i} .$$

Setting the derivative to zero:
$$\frac{\partial J}{\partial L_i} = p_i - \lambda\, D^{-L_i} \ln D = 0 \;\Rightarrow\; D^{-L_i^*} = \frac{p_i}{\lambda \ln D} .$$

Using the constraint $\sum_{i=1}^{m} D^{-L_i^*} = 1$:
$$\lambda = \frac{1}{\ln D} \;\Rightarrow\; D^{-L_i^*} = p_i \;\Rightarrow\; L_i^* = -\log_D p_i .$$


Therefore, the average length $L(C^*)$ of an optimum code can be expressed as:
$$L^* = \sum_{i=1}^{m} p_i L_i^* \;\Rightarrow\; L^* = -\sum_{i=1}^{m} p_i \log_D p_i \;\Rightarrow\; L^* = H_D(X),$$
where $H_D(X)$ is the entropy of the original source $X$ (measured with a logarithmic base $D$). For a binary code, $D = 2$, and the average length is the same as the standard (base-2) entropy measured in bits.

Based on the above derivation, achieving an optimum prefix code $C^*$ with an average length $L^* = H_D(X)$ is only possible when:
$$D^{-L_i^*} = p_i, \quad \text{i.e.,} \quad L_i^* = -\log_D p_i .$$
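The sketch below (illustrative numbers of our own choosing) computes the ideal codeword lengths $-\log_D p_i$ and the entropy $H_D(X)$ for a binary code, showing one distribution whose ideal lengths are integers and one whose are not:

```python
import math

def ideal_lengths(pmf, D=2):
    """Ideal (generally non-integer) codeword lengths L_i* = -log_D(p_i)."""
    return [-math.log(p, D) for p in pmf]

def entropy(pmf, D=2):
    """Source entropy H_D(X) = -sum_i p_i log_D(p_i)."""
    return -sum(p * math.log(p, D) for p in pmf)

pmf = [0.5, 0.25, 0.125, 0.125]     # ideal lengths happen to be integers
print(ideal_lengths(pmf))           # [1.0, 2.0, 3.0, 3.0]
print(entropy(pmf))                 # 1.75 bits

pmf = [0.4, 0.3, 0.2, 0.1]          # ideal lengths are not integers
print([round(L, 3) for L in ideal_lengths(pmf)])   # [1.322, 1.737, 2.322, 3.322]
```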


However, in general, the probability distribution values $p_i$ do not necessarily yield integer-valued lengths for the codewords.

Below, we state one of the most fundamental theorems in information theory, which relates the average length of any prefix code to the entropy of a random source with general distribution values $p_i$. This theorem, commonly known as the entropy bound theorem, shows that no code can have an average length smaller than the entropy of the random source.


Theorem (Entropy Bound)

The expected length $L$ of a prefix D-ary code $C$ for a random source $X$ with entropy $H_D(X)$ satisfies the following inequality:
$$L \ge H_D(X),$$
with equality if-and-only-if $D^{-L_i} = p_i$ for all $i$.

Observation from the Entropy Bound Theorem

The Entropy Bound Theorem and its proof lead to important observations, outlined below:

For random sources with distributions that satisfy $p_i = D^{-L_i}$, where $L_i$ is an integer for $i = 1, 2, \ldots, m$, there exists a prefix code that achieves the entropy $H_D(X)$. Such distributions are known as D-adic.


For the binary case, $D = 2$, we have a dyadic distribution (or a dyadic code). An example of a dyadic distribution is:
$$p_1 = \tfrac{1}{2},\; p_2 = \tfrac{1}{4},\; p_3 = \tfrac{1}{8},\; p_4 = \tfrac{1}{8}; \quad \text{with} \quad L_1 = 1,\; L_2 = 2,\; L_3 = 3,\; L_4 = 3 .$$

Entropy Coding Methods

Here, we will discuss leading examples of entropy coding methods that are broadly used in practice and that have been adopted by leading international compression standards. In particular, we will discuss Huffman coding and arithmetic coding, both of which lead to optimal entropy coding.


Key Properties of Optimum Prefix Codes

Here, we outline a few key properties of optimum prefix codes that will lead to the Huffman coding procedure. We adopt the notation $C_i$ to represent the codeword with length $L_i$ of a code $C$.

Property 1

If $C_j$ and $C_k$ are two codewords of an optimum prefix code $C$, then:
$$p_j > p_k \;\Rightarrow\; L_j \le L_k .$$


Property 2

Assuming $p_1 \ge p_2 \ge \cdots \ge p_{m-1} \ge p_m$, the longest codewords of an optimum code have the same length:
$$L_{m-1} = L_m .$$

[Figure: a binary tree for a code whose two least probable codewords have lengths $L_{m-1} = 2$ and $L_m = 3$.]


[Figures: two binary-tree diagrams. In the first, the longest codeword has $L_m = 3$ while $L_{m-1} = 2$, leaving a shorter codeword unused; in the second, the longest codeword is moved to the unused shorter codeword so that $L_m = 2$.]


Property 3

There exists an optimum code $C$ in which the longest codewords are siblings (i.e., they differ only in their last bit).

Property 4

For a binary random source, the optimum prefix code has lengths:
$$L_1 = L_2 = 1 .$$


The Huffman Entropy Coding Procedure

The above properties lead to the Huffman entropy coding procedure for generating prefix codes. A core notion in this procedure is the observation that optimizing a given code $C$ is equivalent to optimizing a shortened version $C'$.

The Huffman coding procedure can be summarized by the following steps:

1. Sort the outcomes according to the probability distribution: $p_1 \ge p_2 \ge \cdots \ge p_{m-1} \ge p_m$.
2. Merge the two least probable outcomes, and assign a "zero" to one outcome and a "one" to the other (treat them as a binary pair and use an "optimum" binary code).
3. Repeat step 2 until we are left with a binary source which, once merged, results in a probability of 1.

We now illustrate the Huffman procedure using a few examples.
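Before the worked examples, here is a compact sketch of the same merging steps in Python (using the standard library's heapq; function and variable names are ours, not from the notes):

```python
import heapq
import itertools

def huffman_code(pmf):
    """Return binary codewords built by repeatedly merging the two least probable entries."""
    counter = itertools.count()                      # tie-breaker so the heap never compares lists
    heap = [(p, next(counter), [i]) for i, p in enumerate(pmf)]
    codes = [""] * len(pmf)
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)          # least probable group
        p1, _, group1 = heapq.heappop(heap)          # second least probable group
        for i in group0:
            codes[i] = "0" + codes[i]                # prepend the bit assigned at this merge
        for i in group1:
            codes[i] = "1" + codes[i]
        heapq.heappush(heap, (p0 + p1, next(counter), group0 + group1))
    return codes

# The example that follows: p = 1/3, 1/3, 1/4, 1/12.
pmf = [1/3, 1/3, 1/4, 1/12]
codes = huffman_code(pmf)
print(codes)                                         # one optimal code; ties may be merged differently
print(sum(p * len(c) for p, c in zip(pmf, codes)))   # ~2.0 bits/symbol in either case
```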


Example

Find an optimum set of codewords for the source with
$$p_1 = \tfrac{1}{3}, \quad p_2 = \tfrac{1}{3}, \quad p_3 = \tfrac{1}{4}, \quad p_4 = \tfrac{1}{12}; \qquad C_1 = ?,\; C_2 = ?,\; C_3 = ?,\; C_4 = ?$$

The optimum codewords must meet the following: $L_1 \le L_2 \le L_3 \le L_4$; $L_3 = L_4$; and $C_3$ and $C_4$ are siblings.


Combining the two least probable outcomes, and then using the least probable outcomes of the shortened source:

[Figures: first, $p_3 = \tfrac{1}{4}$ (assigned bit 0) and $p_4 = \tfrac{1}{12}$ (assigned bit 1) are merged into a combined probability of $\tfrac{1}{3}$; then, in the shortened source $\{\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\}$, $p_2 = \tfrac{1}{3}$ (bit 0) and the combined outcome (bit 1) are merged into $\tfrac{2}{3}$.]


Now we have a probability of one, and there is nothing else to merge. Tracing the assigned bits back through the merges gives the codewords:

$C_1 = 0$, $C_2 = 10$, $C_3 = 110$, $C_4 = 111$.

[Figure: the complete merging tree; the final merge combines $p_1 = \tfrac{1}{3}$ (bit 0) and the combined probability $\tfrac{2}{3}$ (bit 1) into 1.]

What is the average length $\bar{L}$?


In some cases, we may encounter more than one choice for merging the probability distribution values. (This was the case in the above example.) One important question is: what is the impact of selecting one choice for combining the probabilities versus the other? We illustrate this below by selecting an alternative option for combining the probabilities.

[Figure: the alternative merging order leads to the code $C_1 = 00$, $C_2 = 01$, $C_3 = 10$, $C_4 = 11$.]


As can be seen in the above example, the Huffman procedure can lead to different prefix codes (if multiple options for merging are encountered). Hence, an important question is: does one option provide a better code (in terms of providing a smaller average code length $\bar{L}$)?

The Huffman procedure can also be used for the case when $D > 2$ (i.e., when the code is no longer binary). Care should be taken, though, when dealing with a non-binary code design.
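For the two codes obtained above, a quick check (sketch below, using exact fractions) shows that the average length is the same, 2 bits per symbol, so for this particular source neither merging choice is better:

```python
from fractions import Fraction

pmf = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 4), Fraction(1, 12)]

def average_length(pmf, codewords):
    """Exact average codeword length sum_i p_i * L_i."""
    return sum(p * len(c) for p, c in zip(pmf, codewords))

print(average_length(pmf, ["0", "10", "110", "111"]))   # first merging choice -> 2
print(average_length(pmf, ["00", "01", "10", "11"]))    # alternative choice   -> 2
```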


Arithmetic Coding

Although Huffman codes are optimal on a symbol-by-symbol basis, there is still room for improvement in terms of achieving lower "overhead". For example, a binary source with entropy $H(X) < 1$ still requires one bit per symbol when using a Huffman code. Hence, if, for example, $H(X) = 0.5$, then a Huffman code spends double the amount of bits per symbol (relative to the true optimum limit of $H(X) = 0.5$).

Arithmetic coding is an approach that addresses the overhead issue by coding a continuous sequence of source symbols while trying to approach the entropy limit $H(X)$. Arithmetic coding has roots in a coding approach proposed by Shannon, Fano, and Elias, and hence is sometimes called Shannon-Fano-Elias (SFE) coding. Therefore, we first outline the principles and procedures of SFE codes, and then describe arithmetic coding.


Shannon-Fano-Elias Coding

The SFE coding procedure is based on the cumulative distribution function (CDF) $F(x)$ of a random source $X$:
$$F(x) = \Pr(X \le x) .$$

The CDF provides a unique one-to-one mapping for the possible outcomes of any random source $X$.


In other words, if we denote the alphabet of a discrete random source $X$ by the integer index set $\mathcal{A} = \{1, 2, \ldots, m\}$, then it is well known that:
$$F(i) \ne F(j), \quad \forall\, i \ne j .$$

This can be illustrated by the following example of a typical CDF of a discrete random source.

[Figure: staircase CDF $F(x) = F(i)$ of a discrete source with outcomes $x = i \in \{1, 2, 3, 4\}$, showing the values $F(1), F(2), F(3), F(4)$.]


One important characteristic of the CDF of a discrete random source is that the CDF defines a set of non-overlapping intervals in its range of possible values between "zero" and "one". (Recall that the CDF provides a measure of probability, and hence it is always confined between "zero" and "one".) Based on the above CDF example, we can have a well-defined set of non-overlapping intervals, as shown in the next figure.


[Figure: the same staircase CDF, with the non-overlapping intervals between $F(1), F(2), F(3), F(4)$ marked on the vertical axis.]

Another important observation is that the size of each (non-overlapping) interval in the range of the CDF $F(x)$ is defined by the probability-mass-function (PMF) value $p(i) = \Pr(X = i)$ of the particular outcome $X = i$. This is the same as the size of the "jumps" that we observe in the staircase-like shape of the CDF of a discrete random source. This is highlighted by the next figure.


[Figure: the staircase CDF with the jump at each outcome $i$ equal to $p_i$; the interval sizes are $p_1, p_2, p_3, p_4$.]

Overall, by using the CDF of a random source, one can define a unique mapping between any possible outcome and a particular (unique) interval in the range between "zero" and "one". Furthermore, one can select any value within each (unique) interval of a corresponding random outcome $i$ to represent that outcome. This selected value serves as a "codeword" for that outcome $i$.

The SFE procedure, which is based on the above CDF-driven principles of unique mapping, can be defined as follows:

1. Map each outcome $X = i$ to the interval $[F(i-1), F(i))$, where the lower endpoint $F(i-1)$ is inclusive and the upper endpoint $F(i)$ is exclusive.


2. Select a particular value within the interval $[F(i-1), F(i))$ to represent the outcome $X = i$. This value is known as the "modified CDF" and is denoted by $\bar{F}(i)$.

In principle, any value within the interval $[F(i-1), F(i))$ can be used for the modified CDF $\bar{F}(i)$. A natural choice is the middle of the corresponding interval $[F(i-1), F(i))$. Hence, the modified CDF can be expressed as follows:


$$\bar{F}(i) = F(i-1) + \frac{p_i}{2},$$
which, in turn, can be expressed as:
$$\bar{F}(i) = \frac{F(i) + F(i-1)}{2} .$$

This is illustrated by the next figure.

[Figure: the staircase CDF with the modified CDF values $\bar{F}(1), \ldots, \bar{F}(4)$ marked at the midpoints of the corresponding intervals.]


So far, it should be clear that $\bar{F}(i) \in [0, 1)$, and that it provides a unique mapping for the possible random outcomes of $X$.

3. Generate a codeword to represent $\bar{F}(i)$, and hence to represent the outcome $X = i$. Below we consider simple examples of such codewords according to the SFE coding procedure.

Examples of Modified CDF Values and Codewords

The following table outlines a "dyadic" set of examples of values that could be used for a modified CDF $\bar{F}(i)$, and the corresponding codewords for such values.


F̄(i)             Binary representation   Codeword
1/2 = 2^(-1)      0.1                     1
1/4 = 2^(-2)      0.01                    01
1/8 = 2^(-3)      0.001                   001

The above values of the modified CDF can be combined to represent higher-precision values, as shown in the next table.

F̄(i)                       Binary representation   Codeword
0.75  = 2^(-1) + 2^(-2)     0.11                    11
0.625 = 2^(-1) + 2^(-3)     0.101                   101


In general, the number of bits needed to code the modified CDF value $\bar{F}(i)$ could be infinite, since $\bar{F}(i)$ could be any real number. In practice, however, a finite number of bits $L_i$ is used to represent ("approximate") $\bar{F}(i)$. It should be clear that the number of bits $L_i$ must be sufficiently large to make sure that the codeword representing $\bar{F}(i)$ is unique (i.e., there should be no overlap in the intervals representing the random outcomes). By using a truncated value for the original value $\bar{F}(i)$, we anticipate a loss in precision.


Let $\lfloor \bar{F}(i) \rfloor_{L_i}$ be the truncated value used to represent the original modified CDF $\bar{F}(i)$ based on $L_i$ bits. Naturally, the larger the number of bits used, the higher the precision and the smaller the difference between $\bar{F}(i)$ and $\lfloor \bar{F}(i) \rfloor_{L_i}$.

It can be shown that the difference between the original modified CDF value $\bar{F}(i)$ and its approximation $\lfloor \bar{F}(i) \rfloor_{L_i}$ satisfies the following inequality:
$$\bar{F}(i) - \lfloor \bar{F}(i) \rfloor_{L_i} < \frac{1}{2^{L_i}} .$$


Consequently, and based on the definition of the modified CDF value $\bar{F}(i) = F(i-1) + \frac{p_i}{2}$, in order to maintain a unique mapping, the maximum error $2^{-L_i}$ has to be smaller than $p_i/2$:
$$\frac{1}{2^{L_i}} < \frac{p_i}{2} .$$

This leads to the following constraint on the length $L_i$:
$$\log_2\!\left(\frac{1}{2^{L_i}}\right) < \log_2\!\left(\frac{p_i}{2}\right) \;\Rightarrow\; -L_i < \log_2 p_i - \log_2 2 \;\Rightarrow\; -L_i < \log_2 p_i - 1 \;\Rightarrow\; L_i > \log_2\!\left(\frac{1}{p_i}\right) + 1 .$$


Therefore:
$$L_i = \left\lceil \log_2 \frac{1}{p_i} \right\rceil + 1 .$$
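A short sketch of the single-symbol SFE encoder based on the formulas above (function and variable names are ours); it reproduces the codewords in the example table that follows:

```python
import math

def sfe_code(pmf):
    """SFE codewords: truncate the modified CDF to L_i = ceil(log2(1/p_i)) + 1 bits."""
    codewords = []
    F_prev = 0.0                                   # F(i-1)
    for p in pmf:
        F_bar = F_prev + p / 2                     # modified CDF: midpoint of [F(i-1), F(i))
        L = math.ceil(math.log2(1 / p)) + 1        # number of bits for this outcome
        bits, value = "", F_bar
        for _ in range(L):                         # binary expansion of F_bar, truncated to L bits
            value *= 2
            bits += str(int(value))
            value -= int(value)
        codewords.append(bits)
        F_prev += p                                # advance the CDF
    return codewords

print(sfe_code([0.5, 0.25, 0.125, 0.125]))         # ['01', '101', '1101', '1111']
```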

Example

The following table shows an example of a random source $X$ with four possible outcomes, the corresponding PMF, CDF, and modified CDF values, and the codewords used based on SFE coding.


X = i   p_i     F(i)    F̄(i)     F̄(i) (binary)   SFE code   L_i
1       0.5     0.5     0.25      0.01            01         2
2       0.25    0.75    0.625     0.101           101        3
3       0.125   0.875   0.8125    0.1101          1101       4
4       0.125   1.0     0.9375    0.1111          1111       4

Arithmetic Coding

The advantages of the SFE coding procedure can be realized when it is used to code multiple outcomes of the random source under consideration. Arithmetic coding is basically SFE coding applied to multiple outcomes of the random source.


Under arithmetic coding, we code a sequence of $n$ outcomes $\mathbf{i} = (i_1, i_2, \ldots, i_n)$, where each outcome $i_j \in \{1, 2, \ldots, m\}$. Each possible vector $\mathbf{X} = \mathbf{i}$ of the random source $X$ is mapped to a unique value:
$$\bar{F}(\mathbf{i}) = \bar{F}^{(n)} \in [0, 1) .$$

The best way to illustrate arithmetic coding is through a couple of examples, as shown below.


Example 1

Arithmetic coding begins by dividing the "zero" to "one" range based on the CDF of the random source. In this example, the source can take one of three possible outcomes, $x = i \in \{1, 2, 3\}$.

[Figure: the interval $[0, 1)$ divided at $F(1)$, $F(2)$, and $F(3) = 1$.]


If we assume that we are interested in coding $n = 2$ outcomes, the following figures show the particular interval and corresponding value $\bar{F}(\mathbf{x}) = \bar{F}(\mathbf{i})$ that arithmetic coding focuses on to code the vector $(i_1, i_2) = (3, 2)$.

[Figure: the sub-interval corresponding to $i_1 = 3$ is itself subdivided according to the CDF, and the vector $(3, 2)$ is mapped to the second sub-interval within it.]


[Figures: for $\mathbf{x} = \mathbf{i} = (3, 2)$, the interval for $i_1 = 3$ is subdivided, the sub-interval for $i_2 = 2$ is selected, and the value $\bar{F}(\mathbf{x})$ inside it is the number transmitted to represent the vector $(3, 2)$.]


Similarly, the following figure shows the particular interval and corresponding value $\bar{F}(\mathbf{x}) = \bar{F}(\mathbf{i})$ that arithmetic coding focuses on to code the vector $(i_1, i_2) = (1, 3)$.

[Figure: for $\mathbf{i} = (1, 3)$, the interval for $i_1 = 1$ is subdivided, and the sub-interval for $i_2 = 3$ contains the transmitted value $\bar{F}(\mathbf{x})$.]


Based on the above examples, we can define:
$$\bar{F}^{(n)} = \frac{F_l^{(n)} + F_u^{(n)}}{2} \quad \text{and} \quad \Delta^{(n)} = F_u^{(n)} - F_l^{(n)},$$
where $F_u^{(n)}$ and $F_l^{(n)}$ are the upper and lower bounds of the unique interval $[F_l^{(n)}, F_u^{(n)})$ that $\bar{F}^{(n)}$ belongs to. Below, we use these expressions to illustrate the arithmetic coding procedure.

Example

The coding process starts with the initial values:
$$F_l^{(0)} = 0, \quad F_u^{(0)} = 1, \quad \Delta^{(0)} = F_u^{(0)} - F_l^{(0)} = 1 .$$


After the initial step, the interval $\Delta^{(n)} = F_u^{(n)} - F_l^{(n)}$ and the corresponding value $\bar{F}^{(n)} = \frac{F_l^{(n)} + F_u^{(n)}}{2}$ are updated according to the particular outcomes that the random source is generating. This is illustrated below.

[Figure: the initial interval $[F_l^{(0)}, F_u^{(0)}) = [0, 1)$ with width $\Delta^{(0)}$, subdivided at $F(1)$, $F(2)$, $F(3)$.]


[Figures: example with $\mathbf{i} = (i_1, i_2) = (2, 3)$. Starting from $[F_l^{(0)}, F_u^{(0)}) = [0, 1)$ with width $\Delta^{(0)}$, observing $i_1 = 2$ selects the sub-interval between $F(1)$ and $F(2)$ as the new interval $[F_l^{(1)}, F_u^{(1)})$ with width $\Delta^{(1)}$.]


For $i_1 = 2$, the interval bounds are updated as:
$$F_u^{(1)} = F_l^{(0)} + \Delta^{(0)} F(i_1) = 0 + 1 \cdot F(2) = F(2),$$
$$F_l^{(1)} = F_l^{(0)} + \Delta^{(0)} F(i_1 - 1) = 0 + 1 \cdot F(1) = F(1).$$

[Figures: the same diagrams annotated with these updates for $\mathbf{i} = (2, 3)$.]


For the second outcome, $i_2 = 3$, the interval is refined in the same way:
$$F_u^{(2)} = F_l^{(1)} + \Delta^{(1)} F(i_2), \qquad F_l^{(2)} = F_l^{(1)} + \Delta^{(1)} F(i_2 - 1).$$

[Figure: the interval $[F_l^{(1)}, F_u^{(1)})$ subdivided again, with $[F_l^{(2)}, F_u^{(2)})$ of width $\Delta^{(2)}$ selected for $\mathbf{i} = (2, 3)$.]

The arithmetic coding procedure can be summarized by the steps outlined below.


1. Initialize: $F_l^{(0)} = 0$, $F_u^{(0)} = 1$, $\Delta^{(0)} = 1$.
2. For each new outcome $i_n$, update:
$$F_u^{(n)} = F_l^{(n-1)} + \Delta^{(n-1)} F(i_n), \qquad F_l^{(n)} = F_l^{(n-1)} + \Delta^{(n-1)} F(i_n - 1), \qquad \Delta^{(n)} = F_u^{(n)} - F_l^{(n)} .$$
3. After the last outcome, compute the value that represents the sequence:
$$\bar{F}(\mathbf{x}) = \bar{F}^{(n)} = \frac{F_l^{(n)} + F_u^{(n)}}{2} .$$

Similar to SFE coding, after determining the value $\bar{F}^{(n)}$, we use $L^{(n)}$ bits to represent $\bar{F}^{(n)}$ according to the constraint:
$$L^{(n)} = \left\lceil \log_2 \frac{1}{p(\mathbf{x})} \right\rceil + 1 .$$
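A direct transcription of these steps into Python (the probabilities and the input sequence below are hypothetical, chosen only to exercise the recursion):

```python
import math

def arithmetic_encode(pmf, sequence):
    """Return (F_bar, L) for a sequence of 1-based outcomes, following the recursion above."""
    cdf = [0.0]
    for p in pmf:                                  # cdf[i] = F(i), with F(0) = 0
        cdf.append(cdf[-1] + p)
    F_l, F_u, delta = 0.0, 1.0, 1.0                # step 1: initialization
    p_x = 1.0                                      # probability of the whole sequence
    for i in sequence:                             # step 2: interval refinement
        F_u = F_l + delta * cdf[i]
        F_l = F_l + delta * cdf[i - 1]
        delta = F_u - F_l
        p_x *= pmf[i - 1]
    F_bar = (F_l + F_u) / 2                        # step 3: value representing the sequence
    L = math.ceil(math.log2(1 / p_x)) + 1          # number of bits to transmit
    return F_bar, L

print(arithmetic_encode([0.5, 0.25, 0.25], [2, 3]))   # (0.71875, 5)
```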
