Entropy Achieving Codes

Univ.-Prof. Dr.-Ing. Markus Rupp, LVA 389.141 Fachvertiefung Telekommunikation (LVA 389.137 Image and Compression)

Last change: Jan. 20, 2020

Resume

• Lossless coding example
• Consider the following source, in which each two-bit symbol occurs with equal probability. Redundancy is added by a parity-check bit:

Input   Output
00      000
01      011
10      101
11      110

Resume

• How large is the entropy of this source?

$$H(U) = -\sum_{k=0}^{K-1} P(a_k)\log_2(P(a_k)) = -\sum_{k=0}^{K-1} p_k \log_2(p_k) = -4\cdot\frac{1}{4}\log_2\!\left(\frac{1}{4}\right) = 2\,\text{bit}$$

• Let us assume we receive only the first and third bit, thus 2 bit per symbol.
• It thus must be possible to recompute the original signal. How?

Resume

• Solution

Input   Output
0X0     000
0X1     011
1X1     101
1X0     110

b2 = b1 XOR b3
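A minimal sketch of this relation in Python (the helper names are illustrative, not from the slides): the third transmitted bit is the parity bit, so an erased middle position can be restored from the two outer bits.

```python
def encode(u1, u2):
    """Map the two information bits to the codeword (b1, b2, b3) with parity b3 = b1 XOR b2."""
    return (u1, u2, u1 ^ u2)            # 00 -> 000, 01 -> 011, 10 -> 101, 11 -> 110

def decode(b1, b3):
    """Recover (b1, b2) from the first and third received bit only: b2 = b1 XOR b3."""
    return (b1, b1 ^ b3)

# Every codeword is recovered from its first and third bit alone.
assert all(decode(c[0], c[2]) == (u1, u2)
           for u1 in (0, 1) for u2 in (0, 1) for c in [encode(u1, u2)])
```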

Resume

• Consider the probability mass function fX(x) of a discrete memoryless source.
• Q: What happens with the entropy if a constant is added? → fX(x+c)
  – A: Nothing, the entropy remains unchanged.
• Q: What happens with the entropy if the range is doubled? x → 2x
  – A: Nothing, the entropy remains unchanged.

Resume

• Consider the pdf fX(x) of a continuous memoryless source.
• Q: What happens with the entropy if a constant is added? → fX(x+c)
  – A: Nothing, the entropy remains unchanged.
• Q: What happens with the entropy if the range is doubled? x → 2x → 2 fX(2x)
  – A: A lot!

Resume

• Q: What happens with the entropy if the range is doubled? x → 2x → 2 fX(2x)
• A: If x → cx, then
  – for |c| < 1 the density becomes more concentrated around the mean and thus the entropy decreases;
  – for |c| > 1 the density becomes smeared out and thus the entropy increases.
  (A short derivation is given below.)
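This can be made precise for the differential entropy by a standard change-of-variables argument (not on the slide): for Y = cX with c ≠ 0,

$$f_Y(y) = \frac{1}{|c|}\, f_X\!\left(\frac{y}{c}\right)$$

$$h(Y) = -\int f_Y(y)\,\log_2 f_Y(y)\,dy = -\int f_X(x)\,\big[\log_2 f_X(x) - \log_2|c|\big]\,dx = h(X) + \log_2|c|$$

So for |c| < 1 the differential entropy drops by |log2|c|| bit, for |c| > 1 it grows by log2|c| bit, and a pure shift x → x + c leaves it unchanged.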

Resume
Augustin Louis Cauchy (21.8.1789 – 23.5.1857), French Mathematician

• Are there distributions without variance?
• Consider the Cauchy distribution:

$$f_X(x) = \frac{a}{\pi\,(x^2 + a^2)}$$

$$m_x = \int_{-\infty}^{\infty} \frac{a\,x}{\pi\,(x^2 + a^2)}\,dx = 0$$

$$\sigma_x^2 = \int_{-\infty}^{\infty} \frac{a\,x^2}{\pi\,(x^2 + a^2)}\,dx = \infty$$

Resume
Paul Pierre Lévy (15.9.1886 – 15.12.1971), French Mathematician

• Are there distributions without mean and variance?
• Consider the Lévy distribution (inverse gamma):

$$f_X(x) = \sqrt{\frac{c}{2\pi}}\;\frac{e^{-\frac{c}{2(x-a)}}}{(x-a)^{3/2}}\,;\quad x > a$$

$$m_x = \int_a^{\infty} x\,\sqrt{\frac{c}{2\pi}}\;\frac{e^{-\frac{c}{2(x-a)}}}{(x-a)^{3/2}}\,dx = \infty$$

$$\sigma_x^2 = \int_a^{\infty} x^2\,\sqrt{\frac{c}{2\pi}}\;\frac{e^{-\frac{c}{2(x-a)}}}{(x-a)^{3/2}}\,dx = \infty$$

http://pillowlab.cps.utexas.edu/teaching/CompNeuro10/slides/slides16_EntropyMethods.pdf

Outline

• Back to entropy

• Entropy achieving codes
  – Huffman codes
  – Golomb and Elias codes
  – Adaptive entropy coding
  – Run length codes

Optimal Code Example

• Take for example 16 symbols Z={0000,0001,….,1110,1111} all with probability p=1/16.

• H(Z)=?

• A fixed-length 4-bit code is optimal for this equal distribution of symbols, since H(Z) = log2(16) = 4 bit.
• But what if the symbols are not equally distributed? (See the sketch below.)
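A small numerical check (a sketch, not from the slides; the function name is illustrative):

```python
import math

def entropy_bits(p):
    """Entropy in bit of a discrete distribution given as a list of probabilities."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

# 16 equally likely 4-bit symbols: H(Z) = log2(16) = 4 bit, so the fixed
# 4-bit code already reaches the entropy.
print(entropy_bits([1 / 16] * 16))           # 4.0

# A skewed distribution has lower entropy, so a fixed-length code wastes bits.
print(entropy_bits([0.75, 0.125, 0.125]))    # ~1.0613 (the example used later)
```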

Huffman Code Example

Example: Morse vs. Huffman

Vector

Huffman: special form of arithmetic coding→achieving entropy

Unary Codes

• A unary code represents a natural number n by n − 1 ones followed by a zero; for example, 5 is represented as 11110. Some representations use n − 1 zeros followed by a one (as in the "Unary code" column below). The ones and zeros are interchangeable without loss of generality.

n    Unary code    Alternative
1    1             0
2    01            10
3    001           110
4    0001          1110
5    00001         11110
6    000001        111110
7    0000001       1111110
8    00000001      11111110
9    000000001     111111110
10   0000000001    1111111110

Unary (German: unär, monadisch); opposite: binary.
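A minimal sketch of the encoder and decoder for the table's "Unary code" column (the function names are hypothetical):

```python
def unary_encode(n):
    """'Unary code' column of the table: n - 1 zeros followed by a one."""
    return "0" * (n - 1) + "1"

def unary_decode(bits):
    """Length of the code up to and including the terminating one."""
    return bits.index("1") + 1

assert unary_encode(5) == "00001"
assert unary_decode(unary_encode(7)) == 7
```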

Unary Codes

• Unary coding is an optimally efficient encoding for the following discrete probability distribution

$$p_n = 2^{-n};\quad n = 1, 2, 3, \dots$$

• In symbol-by-symbol coding, it is optimal for any geometric distribution

$$p_n = (k-1)\,k^{-n};\quad n = 1, 2, 3, \dots$$

for which k ≥ φ ≈ 1.6180339887…, the golden ratio, or, more generally, for any discrete distribution for which

$$p_n \ge p_{n+1} + p_{n+2};\quad n = 1, 2, 3, \dots$$
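The golden-ratio threshold follows directly from this inequality (a short check, not on the slide): for the geometric distribution above,

$$(k-1)k^{-n} \;\ge\; (k-1)k^{-(n+1)} + (k-1)k^{-(n+2)} \;\Longleftrightarrow\; 1 \ge \frac{1}{k} + \frac{1}{k^2} \;\Longleftrightarrow\; k^2 - k - 1 \ge 0 \;\Longleftrightarrow\; k \ge \frac{1+\sqrt{5}}{2} = \varphi .$$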

Golomb Codes
Solomon W. Golomb (1932–2016)

Efficient for geometric distributions, less complex than unary codes
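The construction itself is shown in the figure on the slide. As an illustration only, here is a sketch of the common Golomb-Rice special case (divisor m = 2^k); this is an assumption for demonstration and not necessarily the variant shown on the slide:

```python
def golomb_rice_encode(n, k):
    """Golomb-Rice code with divisor m = 2**k: quotient in unary (ones terminated
    by a zero), then the remainder as a plain k-bit number."""
    q, r = divmod(n, 1 << k)
    return "1" * q + "0" + format(r, "0{}b".format(k))

print(golomb_rice_encode(9, 2))    # 11001  (quotient 2 -> '110', remainder 1 -> '01')
```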

Example: Elias Gamma Code
Peter Elias (Nov. 23, 1923 – Dec. 7, 2001)

Implied probability:
•  1 = 2^0 + 0 = 1            1/2
•  2 = 2^1 + 0 = 010          1/8
•  3 = 2^1 + 1 = 011          1/8
•  4 = 2^2 + 0 = 00100        1/32
•  5 = 2^2 + 1 = 00101        1/32
•  6 = 2^2 + 2 = 00110        1/32
•  7 = 2^2 + 3 = 00111        1/32
•  8 = 2^3 + 0 = 0001000      1/128
•  9 = 2^3 + 1 = 0001001      1/128
• 10 = 2^3 + 2 = 0001010      1/128
• 11 = 2^3 + 3 = 0001011      1/128
• 12 = 2^3 + 4 = 0001100      1/128
• 13 = 2^3 + 5 = 0001101      1/128
• 14 = 2^3 + 6 = 0001110      1/128
• 15 = 2^3 + 7 = 0001111      1/128
• 16 = 2^4 + 0 = 000010000    1/512
• 17 = 2^4 + 1 = 000010001    1/512

Interpretation as Golomb code with flexible subblock length!
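A minimal sketch of the construction implied by the table (for 2^k ≤ n < 2^(k+1): k leading zeros, then the (k+1)-bit binary representation of n); the helper names are illustrative:

```python
def elias_gamma_encode(n):
    """For 2**k <= n < 2**(k+1): k leading zeros, then n written in binary (k+1 bits)."""
    k = n.bit_length() - 1
    return "0" * k + bin(n)[2:]

def elias_gamma_decode(bits):
    """Count the leading zeros k, then read the following k+1 bits as a binary number."""
    k = bits.index("1")
    return int(bits[k:2 * k + 1], 2)

assert elias_gamma_encode(10) == "0001010"          # matches the table
assert elias_gamma_decode("000010001") == 17        # matches the table
```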

Elias Coding: pre-Arithmetic Code

$$-\log_2(f_X(x)) + 1 \;\le\; L \;\le\; -\log_2(f_X(x)) + 2$$

Arithmetic Coding Example: SQUEEZE

• Assume every letter occurrence is equally likely, i.e., has probability 1/7 = 0.143.
• We then find: P(S) = P(Q) = P(U) = P(Z) = 1/7, P(E) = 3/7.
• The entropy is 14.9 bits for the entire word.
• Huffman coding would result in 15 bits: E: 0, Q: 100, S: 101, U: 110, Z: 111.
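A quick numerical check of these two figures (a sketch; the greedy merge below is the textbook Huffman construction and reproduces the code lengths, not necessarily the exact codewords of the slide):

```python
import heapq, math
from collections import Counter

word = "SQUEEZE"
counts = Counter(word)                      # E: 3, S, Q, U, Z: 1 each
n = len(word)

# Entropy of the whole 7-letter word: about 14.9 bit.
H = -sum(c * math.log2(c / n) for c in counts.values())
print(round(H, 2))                          # 14.9

# Greedy Huffman construction: repeatedly merge the two least likely nodes.
heap = [(c, i, {s: ""}) for i, (s, c) in enumerate(counts.items())]
heapq.heapify(heap)
next_id = len(heap)
while len(heap) > 1:
    c1, _, t1 = heapq.heappop(heap)
    c2, _, t2 = heapq.heappop(heap)
    merged = {s: "0" + w for s, w in t1.items()}
    merged.update({s: "1" + w for s, w in t2.items()})
    heapq.heappush(heap, (c1 + c2, next_id, merged))
    next_id += 1

codebook = heap[0][2]
print(sum(counts[s] * len(codebook[s]) for s in counts))   # 15 bit for SQUEEZE
```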

Example: Arithmetic Coding

0.647705 (decimal) = 0.101001011101 (binary)

Example: Arithmetic Coding

• Eventually, the range for the last letter E becomes 0.64769 - 0.64772. This means that we can encode the word SQUEEZE with a single number in this range. The binary number in this range with the smallest number of bits is 0.101001011101, which corresponds to 0.647705 decimal. The '0.' prefix does not have to be transmitted because every arithmetic coded message starts with this prefix. So we only need to transmit the sequence 101001011101, which is only 12 bits. This number is even below the optimal number of bits of 14.90, but this is not entirely fair. Since the last letter of SQUEEZE is also the most common one, the final range is relatively large, making it easier to fit a value in it with less than the optimal number of bits. If the word was SQUEEEZ, we also would have needed 15 bits with arithmetic coding, but as messages get longer, arithmetic coding clearly outperforms Huffman coding.

• The arithmetic decoding process is similar to the encoding process. We know we have transmitted the value 0.647705. Starting at the top of the figure above, we see that this number falls in the range 0.571 - 0.715, so the first letter must be an S. The decoder also subdivides this range, and we see that the value 0.647705 now falls in the range 0.633 - 0.653, so the second letter must be a Q. This process is repeated until the entire word SQUEEZE is decoded.
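A sketch of the interval narrowing for SQUEEZE (the ordering of the letter sub-intervals below is an assumption made for illustration, so the resulting interval sits at a different position than the slide's 0.64769 - 0.64772, but it has the same width):

```python
from fractions import Fraction

# Letter model for SQUEEZE: P(E) = 3/7, all other letters 1/7 each.
probs = {"E": Fraction(3, 7), "Q": Fraction(1, 7), "S": Fraction(1, 7),
         "U": Fraction(1, 7), "Z": Fraction(1, 7)}
low_of, cum = {}, Fraction(0)
for s, p in probs.items():          # cumulative lower bounds of the sub-intervals
    low_of[s] = cum
    cum += p

def encode_interval(word):
    """Narrow [low, low + width) one letter at a time; any number inside the
    final interval identifies the whole word."""
    low, width = Fraction(0), Fraction(1)
    for s in word:
        low += width * low_of[s]
        width *= probs[s]
    return low, low + width

lo, hi = encode_interval("SQUEEZE")
print(float(lo), float(hi))         # interval of width (3/7)**3 * (1/7)**4, about 3.3e-5
```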

Problem in Entropy Codes

• Entropy achieving codes show quite good performance as long as the probability of any single event is not larger than 0.5.
• Note that the AC coefficients of the DCT are Laplacian distributed, with a high likelihood (p > 0.5) of zeros.
• → example

Example

• Let's assume we have only three values occurring:
  – 0 with p = 0.75
  – -1 and +1 with p = 0.125 each
• Let's use a Golomb code:
  – 1 for 0
  – 010 for -1
  – 011 for +1

Example

• Entropy: H = 1.0613 bit/symbol
• Golomb code: 0.75 × 1 + 2 × 0.125 × 3 = 1.5 bit/symbol

• Note that for
  – 0 with p = 0.5
  – -1 and +1 with p = 0.25 each
  the entropy is H = 1.5!

Run Length (En)Coding

• Consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. Let us take a hypothetical single scan line, with B representing a black pixel and W representing white:

• WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW

• If we apply the run-length encoding (RLE) algorithm to the above hypothetical scan line, we get the following:
• 12W1B12W3B24W1B14W. Interpret this as twelve W's, one B, twelve W's, three B's, etc.
• The run-length code represents the original 67 characters in only 18 characters (see the sketch below).
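A minimal sketch of this encoder (using Python's itertools.groupby; the function name is illustrative):

```python
from itertools import groupby

def rle_encode(line):
    """Classic run-length encoding: emit the run length followed by the run's symbol."""
    return "".join(f"{len(list(group))}{symbol}" for symbol, group in groupby(line))

scan_line = "W" * 12 + "B" + "W" * 12 + "B" * 3 + "W" * 24 + "B" + "W" * 14
encoded = rle_encode(scan_line)
print(encoded)                         # 12W1B12W3B24W1B14W
print(len(scan_line), len(encoded))    # 67 18
```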

Run Length Coding

• More efficient coding notation: W = 0, B = 1
• WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
• 12W1B 12W3B 24W1B 14W
• JPEG: (12,1), (12,1), (0,1), (0,1), (24,1), EOB
• Even better: 12, 12, 0, 0, 24, EOB

Run Length Coding

• Run-length encoding performs lossless data compression and is well suited to palette-based iconic images. It does not work well at all on continuous-tone images such as photographs, although JPEG uses it quite effectively on the coefficients that remain after transforming and quantizing image blocks.
• Run-length encoding is used in fax machines (combined with other techniques into Modified Huffman coding). It is relatively efficient because most faxed documents are mostly white space, with occasional interruptions of black.

Run Length Coding

• History: see Lempel and Ziv (Welch) = LZW.
• There are various US patents for LZW and similar algorithms. LZ78 is covered by US patent 4,464,650, filed on August 10, 1981 and granted on August 7, 1984, to the Sperry Corporation (which later merged with Unisys); inventors: Lempel, Ziv, Cohn and Eastman.
• Two further US patents were granted for the LZW algorithm: No. 4,814,746 by Victor S. Miller and Mark N. Wegman (IBM), filed on June 1, 1983, as well as No. 4,558,302 by Welch for the Sperry Corporation, filed on June 20, 1983.
• US patent 4,558,302 caused the largest controversy.

United States Patent 4,558,302 (Welch, December 10, 1985): High speed data compression and decompression apparatus and method

• Abstract A data compressor compresses an input stream of data character signals by storing in a string table strings of data character signals encountered in the input stream. The compressor searches the input stream to determine the longest match to a stored string. Each stored string comprises a prefix string and an extension character where the extension character is the last character in the string and the prefix string comprises all but the extension character. Each string has a code signal associated therewith and a string is stored in the string table by, at least implicitly, storing the code signal for the string, the code signal for the string prefix and the extension character. When the longest match between the input data character stream and the stored strings is determined, the code signal for the longest match is transmitted as the compressed code signal for the encountered string of characters and an extension string is stored in the string table. The prefix of the extended string is the longest match and the extension character of the extended string is the next input data character signal following the longest match. Searching through the string table and entering extended strings therein is effected by a limited search hashing procedure. Decompression is effected by a decompressor that receives the compressed code signals and generates a string table similar to that constructed by the compressor to effect lookup of received code signals so as to recover the data character signals comprising a stored string. The decompressor string table is updated by storing a string having a prefix in accordance with a prior received code signal and an extension character in accordance with the first character of the currently recovered string.

• Inventors: Welch; Terry A. (Concord, MA) Assignee: Sperry Corporation (New York, NY) Appl. No.: 06/505,638 Filed: June 20, 1983

When will RLE decrease Entropy?

• Note that in general we generate a pair of values for each value:
• {1 2 3 0 1 2} → {(0,1) (0,2) (0,3) (1,1) (0,1) (0,2)}
• Thus RLE will double the number of elements
• and most certainly increase (double?) the entropy,
• unless there are more than 50% zeros:
• {0 2 0 0 0 1} → {(1,2) (3,1)}
  (See the sketch below.)
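A sketch of the (zero run, value) pairing used in the two examples above (the function name is hypothetical; EOB handling for trailing zeros is omitted):

```python
def run_level_encode(values):
    """Emit (number of preceding zeros, nonzero value) pairs; trailing zeros would
    be replaced by an EOB marker (omitted here)."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

print(run_level_encode([1, 2, 3, 0, 1, 2]))   # [(0, 1), (0, 2), (0, 3), (1, 1), (0, 1), (0, 2)]
print(run_level_encode([0, 2, 0, 0, 0, 1]))   # [(1, 2), (3, 1)]
```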

Run Length Coding

5th Homework

• Let us generate Huffman codes for the Lena grayscale image at various levels of the Haar transform.
• Compute the corresponding data file sizes (sum of bits) required to store the information.
• Compare to the result of a zip file.
• Please write your own Huffman coder; you may compare its output with the corresponding Matlab procedure.

6th Homework

• Exercise 6
• Let us now improve our Huffman coding by first applying run-level coding.
• Compare the various stages of Haar-coded images and compare the required bit lengths.

Summary

• Open issue:
  – All entropy codes are variable length codes (VLC).
  – Thus, if a single bit is wrong, we do not know where the next codewords start!
  – This is a big problem when transmitting with entropy encoded schemes.

– Example: 010 1011000110 00110 011 – 010 1 011 0001100 1 1
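To make the loss of synchronization concrete, a small sketch using the three-symbol code from the earlier example (0 → 1, -1 → 010, +1 → 011); the bit pattern and the error position are chosen arbitrarily for illustration:

```python
code = {"1": 0, "010": -1, "011": +1}    # the prefix-free code from the earlier example

def vlc_decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in code:                  # prefix-free: a codeword is recognized as soon as it completes
            out.append(code[buf])
            buf = ""
    return out

bits = "010" + "1" + "011" + "1" + "010"     # encodes -1, 0, +1, 0, -1
print(vlc_decode(bits))                      # [-1, 0, 1, 0, -1]

corrupted = bits[:4] + "1" + bits[5:]        # a single flipped bit
print(vlc_decode(corrupted))                 # [-1, 0, 0, 0, 0, 0, -1]  (wrong symbols, wrong count)
```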
