Lossless compression in lossy compression systems

 Almost every lossy compression system contains a lossless compression system

[Figure: block diagram of a lossy compression system: Transform → Quantizer → Lossless Encoder → Lossless Decoder → Dequantizer → Inverse Transform; the lossless encoder/decoder pair forms the embedded lossless compression system]

 We discuss the basics of lossless compression first, then move on to lossy compression

Topics in lossless compression

 Binary decision trees and variable length coding
 Entropy and bit-rate
 Prefix codes, Huffman codes, Golomb codes

 Joint entropy, conditional entropy, sources with memory
 Lossless compression standards

Example: 20 Questions

 Alice thinks of an outcome (from a finite set), but does not disclose her selection.
 Bob asks a series of yes/no questions to uniquely determine the outcome chosen. The goal of the game is to ask as few questions as possible on average.
 Our goal: design the best strategy for Bob.

Example: 20 Questions (cont.)

 Which strategy is better?

[Figure: two candidate question trees for outcomes A-F, with branches labeled 0 (= no) and 1 (= yes); one tree is balanced, the other unbalanced]

 Observation: The collection of questions and answers yields a binary code for each outcome.

Fixed length codes

[Figure: balanced binary code tree of depth 3 assigning 3-bit code words to eight outcomes A-H]

 Average description length for $K$ outcomes: $l_{av} = \lceil \log_2 K \rceil$
 Optimum for equally likely outcomes
 Verify by modifying the tree

Variable length codes

 If outcomes are NOT equally probable:
 Use shorter descriptions for likely outcomes
 Use longer descriptions for less likely outcomes
 Intuition: optimum balanced code trees, i.e., trees for equally likely outcomes, can be pruned to yield unbalanced trees for unequal probabilities. The unbalanced code trees thus obtained are also optimum.

 Hence, an outcome of probability $p$ should require about $\log_2 \frac{1}{p}$ bits

Entropy of a random variable

 Consider a discrete, finite-alphabet random variable X

Alphabet X {0 ,1,2 ,..., K 1}

PMF f X x PX  x for each x X 

associated with the event X=x

hXX x log2 f x  Entropy of X is the expected value of that information

H X  E hXXX X    f xlog2 f x x X  Unit: bits

Information and entropy: properties

 Information is non-negative: $h_X(x) \geq 0$
 Information $h_X(x)$ strictly increases with decreasing probability $f_X(x)$
 Boundedness of entropy:
$0 \leq H(X) \leq \log_2 |\mathcal{X}|$
with equality on the left if only one outcome can occur, and equality on the right if all outcomes are equally likely
 Very likely and very unlikely events do not substantially change entropy:
$p \log_2 p \to 0$ for $p \to 0$ or $p \to 1$


Example: Binary random variable

H X   plog22 p  (1  p )log (1  p )

[Plot: binary entropy $H(X)$ versus $p$; $H(X) = 0$ for the deterministic cases $p = 0$ and $p = 1$, with maximum of 1 bit for equally likely outcomes, $p = 0.5$]

Entropy and bit-rate

 Consider an IID random process $\{X_n\}$ (or “source”) where each sample $X_n$ (or “symbol”) possesses identical entropy $H(X)$
 $H(X)$ is called the “entropy rate” of the random process.

 Noiseless Source Coding Theorem [Shannon, 1948]

 The entropy H(X) is a lower bound for the average word length R of a decodable variable-length code for the symbols.

 Conversely, the average word length $R$ can approach $H(X)$, if sufficiently large blocks of symbols are encoded jointly.
 Redundancy of a code: $\varrho = R - H(X) \geq 0$

Variable length codes

 Given an IID random process $\{X_n\}$ with alphabet $\mathcal{X}$ and PMF $f_X(x)$

 Task: assign a distinct code word, $c_x$, to each element $x \in \mathcal{X}$, where $c_x$ is a string of $\|c_x\|$ bits, such that each symbol $x_n$ can be determined, even if the code words $c_x$ are directly concatenated in a bitstream
 Codes with the above property are said to be “uniquely decodable.”
 Prefix codes
 No code word is a prefix of any other code word
 Uniquely decodable, symbol by symbol, in natural order $0, 1, 2, \ldots, n, \ldots$

Example of non-decodable code

Encode the sequence of source symbols 0, 2, 3, 0, 1: resulting bit-stream 0 10 11 0 01

Encode the sequence of source symbols 1, 0, 3, 0, 1: resulting bit-stream 01 0 11 0 01

 Same bit-stream for different sequences of source symbols: ambiguous, not uniquely decodable
 BTW: not a prefix code.
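A quick Python check (my illustration; the code table {0→"0", 1→"01", 2→"10", 3→"11"} is the one implied by the encodings above) confirms both the ambiguity and the violated prefix condition:

```python
code = {0: "0", 1: "01", 2: "10", 3: "11"}   # code table implied above

def encode(symbols):
    return "".join(code[s] for s in symbols)

print(encode([0, 2, 3, 0, 1]))   # 01011001
print(encode([1, 0, 3, 0, 1]))   # 01011001 -> same bits: not uniquely decodable

# Prefix condition: no code word may be a prefix of another.
words = list(code.values())
print(any(a != b and b.startswith(a) for a in words for b in words))
# True: "0" is a prefix of "01" -> not a prefix code
```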

Unique decodability: McMillan and Kraft conditions

 Necessary condition for unique decodability [McMillan]

 cx  21 x X

 Given a set of code word lengths $\|c_x\|$ satisfying the McMillan condition, a corresponding prefix code always exists [Kraft]
 Hence, the McMillan inequality is both necessary and sufficient.
 Also known as the Kraft inequality or Kraft-McMillan inequality.
 No loss by only considering prefix codes.
 Prefix code is not unique.
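The following sketch (illustrative) evaluates the McMillan sum and carries out the Kraft construction, assigning code words in order of non-decreasing length. Note that it generally produces a different, equally valid prefix code than a hand-built tree with the same lengths, illustrating that the prefix code is not unique:

```python
def mcmillan_sum(lengths):
    # <= 1 is necessary (McMillan) and sufficient (Kraft)
    # for a prefix code with these code word lengths
    return sum(2.0 ** -l for l in lengths)

def prefix_code_from_lengths(lengths):
    """Kraft construction: walk a binary counter, left-shifting
    whenever the target code word length increases."""
    assert mcmillan_sum(lengths) <= 1.0
    codes, value, prev = [], 0, 0
    for l in sorted(lengths):
        value <<= l - prev            # pad counter to the new length
        codes.append(format(value, "0{}b".format(l)))
        value += 1
        prev = l
    return codes

print(mcmillan_sum([2, 2, 2, 3, 4, 4]))           # 1.0
print(prefix_code_from_lengths([2, 2, 2, 3, 4, 4]))
# ['00', '01', '10', '110', '1110', '1111']
```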

Prefix Decoder

[Block diagram: the input buffer feeds a shift register long enough to hold the longest code word; a code word LUT maps the register contents to the decoded symbol, and a code word length LUT determines by how many bits ($\|c_x\|$) to advance the input buffer]
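In software, the same idea reduces to buffering bits and matching against the code table; a minimal (bit-at-a-time, not LUT-accelerated) Python sketch:

```python
def decode_prefix(bitstream, code):
    """Decode a concatenation of prefix-code words.
    The prefix property guarantees that the first match
    of the accumulated bits is the intended code word."""
    inverse = {w: s for s, w in code.items()}
    symbols, buf = [], ""
    for bit in bitstream:
        buf += bit
        if buf in inverse:
            symbols.append(inverse[buf])
            buf = ""
    assert buf == "", "bit-stream ended inside a code word"
    return symbols

code = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode_prefix("0101110", code))   # ['A', 'B', 'D', 'A']
```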

Binary trees and prefix codes

 Any binary tree can be converted into a prefix code by traversing the tree from root to leaves.
[Figure: example tree yielding the code words 00, 01, 10, 111, 1100, 1101]
 Any prefix code corresponding to a binary tree meets the McMillan condition with equality:
$\sum_{x \in \mathcal{X}} 2^{-\|c_x\|} = 1$, here $3 \cdot 2^{-2} + 2^{-3} + 2 \cdot 2^{-4} = 1$

Binary trees and prefix codes (cont.)

 Augmenting a binary tree by two new nodes does not change the McMillan sum.
 Pruning a binary tree does not change the McMillan sum: a leaf at depth $l$ replaced by two children at depth $l+1$ satisfies
$2^{-(l+1)} + 2^{-(l+1)} = 2^{-l}$
 McMillan sum for the simplest binary tree: $2^{-1} + 2^{-1} = 1$

Instantaneous variable length encoding without redundancy

 A code without redundancy, i.e., $\varrho = R - H(X) = 0$, requires all individual code word lengths
$l_k = -\log_2 f_X(k)$
 All probabilities would have to be binary fractions: $f_X(k) = 2^{-l_k}$
 Example: $H(X) = 1.75$ bits, $R = 1.75$ bits, $\varrho = 0$

Huffman Code

 The design procedure for variable length codes proposed by Huffman (1952) always finds a code with minimum redundancy.
 Obtain the code tree as follows (a code sketch follows the steps):

1 Pick the two symbols with lowest probabilities and merge them into a new auxiliary symbol.

2 Calculate the probability of the auxiliary symbol.

3 If more than one symbol remains, repeat steps 1 and 2 for the new auxiliary alphabet.

4 Convert the code tree into a prefix code.
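A compact Python sketch of this merge procedure (illustrative; the 0/1 branch assignment at each merge is arbitrary, so the code words may differ from a hand-built tree while the lengths stay optimal):

```python
import heapq, itertools

def huffman_code(pmf):
    """Build {symbol: code word} by repeatedly merging the two
    least probable (possibly auxiliary) symbols."""
    tie = itertools.count()   # breaks probability ties without comparing dicts
    heap = [(p, next(tie), {s: ""}) for s, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # lowest probability
        p1, _, c1 = heapq.heappop(heap)   # second lowest
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# code word lengths 1, 2, 3, 3 -> R = H(X) = 1.75 bits/symbol
```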

Huffman Code - Example

Fixed length coding: $R_{fixed} = 4$ bits/symbol

Huffman code: $R_{Huffman} = 2.77$ bits/symbol
Entropy: $H(X) = 2.69$ bits/symbol
Redundancy of the Huffman code: $\varrho = 0.08$ bits/symbol

Redundancy of prefix code for general distribution

 Huffman code redundancy: $0 \leq \varrho < 1$ bit/symbol

 Theorem: For any distribution $f_X$, a prefix code can be found whose rate $R$ satisfies
$H(X) \leq R \leq H(X) + 1$
 Proof
 Left-hand inequality: Shannon's noiseless coding theorem
 Right-hand inequality: choose code word lengths
$\|c_x\| = \lceil -\log_2 f_X(x) \rceil$
Resulting rate
$R = \sum_{x \in \mathcal{X}} f_X(x) \lceil -\log_2 f_X(x) \rceil \leq \sum_{x \in \mathcal{X}} f_X(x) \left(1 - \log_2 f_X(x)\right) = H(X) + 1$
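A numerical check of the proof's construction (my illustration, with an arbitrary PMF):

```python
import math

pmf = [0.4, 0.3, 0.2, 0.1]
lengths = [math.ceil(-math.log2(p)) for p in pmf]   # [2, 2, 3, 4]

kraft_ok = sum(2.0 ** -l for l in lengths) <= 1     # prefix code exists
H = -sum(p * math.log2(p) for p in pmf)             # ~1.846 bits
R = sum(p * l for p, l in zip(pmf, lengths))        # 2.4 bits
print(kraft_ok, H <= R <= H + 1)                    # True True
```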

Vector Huffman coding

 Huffman coding very inefficient for H(X) << 1 bit/symbol

 Remedy:

 Combine $m$ successive symbols into a new “block-symbol”
 Huffman code for block-symbols
 Redundancy: $H(X) \leq R \leq H(X) + \frac{1}{m}$

 Can also be used to exploit statistical dependencies between successive symbols
 Disadvantage: exponentially growing alphabet size $|\mathcal{X}|^m$
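A sketch of block-symbol coding for an IID source, reusing huffman_code from the earlier sketch; the biased binary PMF is an arbitrary example:

```python
import itertools, math

def block_pmf(pmf, m):
    # IID: block probability is the product of symbol probabilities;
    # the block alphabet has |X|**m entries
    return {blk: math.prod(pmf[s] for s in blk)
            for blk in itertools.product(pmf, repeat=m)}

pmf = {"0": 0.9, "1": 0.1}        # H(X) ~ 0.469 bit/symbol
for m in (1, 2, 3):
    fm = block_pmf(pmf, m)
    code = huffman_code(fm)
    R = sum(fm[b] * len(w) for b, w in code.items()) / m
    print(m, round(R, 3))         # per-symbol rate approaches H(X)
```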

Truncated Huffman Coding

 Idea: reduce size of Huffman code table and maximum Huffman code word length by Huffman-coding only the most probable symbols.

 Combine the J least probable symbols of an alphabet of size K into an auxiliary symbol ESC

 Use Huffman code for alphabet consisting of remaining K-J most probable symbols and the symbol ESC

 If the ESC symbol is encoded, append $\lceil \log_2 J \rceil$ bits to specify the exact symbol from the full alphabet
 Results in increased average code word length; trade off complexity and efficiency by choosing J
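A sketch of the table construction and the ESC path (my illustration; it assumes "ESC" itself is not a source symbol and reuses huffman_code from the earlier sketch):

```python
import math

def truncated_huffman(pmf, J):
    """Huffman-code only the K-J most probable symbols plus ESC."""
    ranked = sorted(pmf, key=pmf.get, reverse=True)
    rare = ranked[-J:]                        # J least probable symbols
    reduced = {s: pmf[s] for s in ranked[:-J]}
    reduced["ESC"] = sum(pmf[s] for s in rare)
    return huffman_code(reduced), rare, math.ceil(math.log2(J))

def encode_symbol(s, code, rare, index_bits):
    if s in code:
        return code[s]
    # rare symbol: ESC code word + fixed-length index among the J symbols
    return code["ESC"] + format(rare.index(s), "0{}b".format(index_bits))
```

The average code word length rises slightly because each rare symbol now costs the ESC word plus $\lceil \log_2 J \rceil$ bits, but the Huffman table shrinks from K to K-J+1 entries.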

Adaptive Huffman Coding

 Use, if source statistics are not known ahead of time
 Forward adaptation
   Measure source statistics at the encoder by analyzing the entire data
   Transmit the Huffman code table ahead of the compressed bit-stream
   JPEG uses this concept (even though often default tables are transmitted)
 Backward adaptation
   Measure source statistics both at encoder and decoder, using the same previously decoded data
   Regularly generate identical Huffman code tables at transmitter and receiver
   Saves the overhead of forward adaptation, but usually yields poorer code tables, since they are based on past observations
   Generally avoided due to the computational burden at the decoder


 “Geometric” source

Alphabet  0,1,...  PMF f x  2x 1 , x  0 XX

 Optimal prefix code with redundancy 0 is “unary” code (“comma code”)

$c_0$ = "1", $c_1$ = "01", $c_2$ = "001", $c_3$ = "0001", ...
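A direct transcription of the unary ("comma") code into Python:

```python
def unary_encode(x):
    # x >= 0 maps to x zeros terminated by a single one
    return "0" * x + "1"

def unary_decode(bits):
    # count zeros up to the first one (the "comma")
    return bits.index("1")

print([unary_encode(x) for x in range(4)])   # ['1', '01', '001', '0001']
```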

 Consider a geometric source with faster decay:
PMF $f_X(x) = (1-\vartheta)\vartheta^x$, with $0 < \vartheta < \frac{1}{2}$; $x \geq 0$
 Unary code is still the optimum prefix code (i.e., Huffman code), but not redundancy-free

Golomb coding

 For a geometric source with slower decay:
PMF $f_X(x) = (1-\vartheta)\vartheta^x$, with $\frac{1}{2} < \vartheta < 1$; $x \geq 0$
 Idea: express each $x$ as $x = m x_q + x_r$ with $x_q = \lfloor x/m \rfloor$ and $x_r = x \bmod m$
 Distribution of the new random variables:
$f_{X_q}(x_q) = \sum_{i=0}^{m-1} f_X(m x_q + i) = \vartheta^{m x_q} \sum_{i=0}^{m-1} f_X(i)$
$f_{X_r}(x_r) = \frac{1-\vartheta}{1-\vartheta^m} \vartheta^{x_r}$ for $0 \leq x_r < m$
 $X_q$ and $X_r$ are statistically independent.

Golomb coding (cont.)

 Golomb coding
 Choose an integer divisor $m \geq 1$
 Encode $x_q$ optimally by the unary code
 Encode $x_r$ by a modified binary code, using code word lengths
$k_a = \lfloor \log_2 m \rfloor$ and $k_b = \lceil \log_2 m \rceil$
 Concatenate the bits for $x_q$ and $x_r$
 In practice, $m = 2^k$ is often used, so $x_r$ can be encoded with constant code word length $\log_2 m = k$

Golomb code examples

x    Unary code    Golomb, m = 2    Golomb, m = 4
0    1             10               100
1    01            11               101
2    001           010              110
3    0001          011              111
4    00001         0010             0100
5    000001        0011             0101
6    0000001       00010            0110
7    00000001      00011            0111
8    000000001     000010           00100
...

(For m = 2 and m = 4, the unary code of $x_q$ is followed by a constant-length binary code of $x_r$.)

Golomb parameter estimation

 Expected value for the geometric distribution:
$E[X] = \sum_{x=0}^{\infty} x (1-\vartheta)\vartheta^x = \frac{\vartheta}{1-\vartheta}$, hence $\vartheta = \frac{E[X]}{1 + E[X]}$
 Rule for optimum performance of the Golomb code: choose $m$ such that
$\vartheta^m = \left(\frac{E[X]}{E[X]+1}\right)^m \approx \frac{1}{2}$
 With $m = 2^k$, a reasonable setting, even if $E[X] \gg 1$ does not hold:
$k = \max\left(0, \left\lceil \log_2 \frac{E[X]}{2} \right\rceil\right)$
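A sketch of this parameter rule with $E[X]$ replaced by a sample mean (the guard for very small means is my addition):

```python
import math

def golomb_k(samples):
    mean = sum(samples) / len(samples)     # estimate of E[X]
    if mean <= 1:                          # degenerate / tiny means -> k = 0
        return 0
    return max(0, math.ceil(math.log2(mean / 2)))

print(golomb_k([0, 1, 0, 3, 9, 2, 5, 4]))  # mean 3.0 -> k = 1
```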

Adaptive Golomb coder (JPEG-LS)

Pick the best Golomb code for each symbol from a running estimate of the mean:

Initialize $A = \hat{E}[X]$ (initial estimate of the mean) and $N = 1$
For each $n = 0, 1, 2, \ldots$
1 Set $k = \max\left(0, \left\lceil \log_2 \frac{A}{2N} \right\rceil\right)$ (picks the best Golomb code)
2 Code symbol $x_n$ using the Golomb code with parameter $k$
3 If $N \geq N_{max}$, set $A \leftarrow \lfloor A/2 \rfloor$ and $N \leftarrow \lfloor N/2 \rfloor$ (avoids overflow and slowly forgets the past)
4 Update $A \leftarrow A + x_n$ and $N \leftarrow N + 1$
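A sketch of the adaptation loop, reusing rice_encode from the earlier sketch; the initial mean estimate and the N_max value are arbitrary assumptions:

```python
import math

def adaptive_golomb_encode(symbols, a_init=4, n_max=64):
    """Backward-adaptive Golomb coding in the style sketched above:
    the decoder can mirror A and N exactly, so no side info is sent."""
    A, N = a_init, 1                  # A: accumulated sum, N: symbol count
    out = []
    for x in symbols:
        k = max(0, math.ceil(math.log2(max(A, 1) / (2 * N))))
        out.append(rice_encode(x, k))
        if N >= n_max:                # halve to avoid overflow and
            A, N = A // 2, N // 2     # slowly forget the past
        A, N = A + x, N + 1
    return "".join(out)
```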
