Lossless compression in lossy compression systems
Almost every lossy compression system contains a lossless compression system
[Figure: block diagram of a lossy compression system: Transform → Quantizer → Lossless Encoder on the encoding side, and Lossless Decoder → Dequantizer → Inverse Transform on the decoding side. The lossless encoder/decoder pair is itself a complete lossless compression system.]
We discuss the basics of lossless compression first, then move on to lossy compression
Topics in lossless compression
- Binary decision trees and variable length coding
- Entropy and bit-rate
- Prefix codes, Huffman codes, Golomb codes
- Joint entropy, conditional entropy, sources with memory
- Fax compression standards
- Arithmetic coding
Example: 20 Questions
Alice thinks of an outcome (from a finite set), but does not disclose her selection. Bob asks a series of yes/no questions to uniquely determine the outcome chosen. The goal of the game is to ask as few questions as possible on average. Our goal: Design the best strategy for Bob.
Example: 20 Questions (cont.)
Which strategy is better?
[Figure: two binary decision trees over the outcomes A–F, where each answer (0 = no, 1 = yes) selects a branch; one tree is balanced, the other unbalanced, and each outcome corresponds to a root-to-leaf path.]
Observation: The collection of questions and answers yields a binary code for each outcome.
Fixed length codes
[Figure: balanced binary code tree of depth 3 with leaves A–H; every outcome receives a 3-bit code word.]
Average description length for K outcomes: $l_{av} = \lceil \log_2 K \rceil$. Optimum for equally likely outcomes; verify by modifying the tree.
Variable length codes
If outcomes are NOT equally probable:
- Use shorter descriptions for likely outcomes
- Use longer descriptions for less likely outcomes
Intuition: optimum balanced code trees, i.e., trees for equally likely outcomes, can be pruned to yield unbalanced trees for unequal probabilities. The unbalanced code trees thus obtained are also optimum.
Hence, an outcome of probability p should require about $\log_2 \frac{1}{p}$ bits.
Entropy of a random variable
Consider a discrete, finite-alphabet random variable X
Alphabet $\mathcal{X} = \{0, 1, 2, \ldots, K-1\}$
PMF $f_X(x) = P(X = x)$ for each $x \in \mathcal{X}$
Information associated with the event $X = x$: $h_X(x) = -\log_2 f_X(x)$
Entropy of X is the expected value of that information:
$H(X) = E[h_X(X)] = -\sum_{x \in \mathcal{X}} f_X(x) \log_2 f_X(x)$
Unit: bits
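As a quick numerical illustration (my own, not from the slides), the definition maps directly into code; the example PMFs below are arbitrary:

```python
import math

def entropy(pmf):
    """H(X) = -sum over x of f_X(x) * log2 f_X(x), in bits (0 * log 0 := 0)."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits
print(entropy([0.25] * 4))                 # 2.0 bits = log2(4), the maximum for K = 4
```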
Information and entropy: properties
Information is nonnegative: $h_X(x) \geq 0$
Information $h_X(x)$ strictly increases with decreasing probability $f_X(x)$
Boundedness of entropy: $0 \leq H(X) \leq \log_2 |\mathcal{X}|$
- Equality on the left if only one outcome can occur
- Equality on the right if all outcomes are equally likely
Very likely and very unlikely events do not substantially change entropy:
$p \log_2 p \to 0$ for $p \to 0$ or $p \to 1$
Example: Binary random variable
$H(X) = -p \log_2 p - (1-p) \log_2 (1-p)$
[Plot: binary entropy $H(X)$ versus $p \in [0,1]$; $H(X) = 0$ at $p = 0$ and $p = 1$ (deterministic), maximum $H(X) = 1$ bit at $p = 0.5$ (equally likely).]
Entropy and bit-rate
Consider an IID random process $\{X_n\}$ (or “source”) where each sample $X_n$ (or “symbol”) possesses the identical entropy $H(X)$. $H(X)$ is called the “entropy rate” of the random process.
Noiseless Source Coding Theorem [Shannon, 1948]
The entropy H(X) is a lower bound for the average word length R of a decodable variable-length code for the symbols.
Conversely, the average word length R can approach H(X) if sufficiently large blocks of symbols are encoded jointly.
Redundancy of a code: $\varrho = R - H(X) \geq 0$
Variable length codes
Given an IID random process $\{X_n\}$ with alphabet $\mathcal{X}$ and PMF $f_X(x)$.
Task: assign a distinct code word $c_x$ to each element $x \in \mathcal{X}$, where $c_x$ is a string of $\|c_x\|$ bits, such that each symbol $x_n$ can be determined even if the code words $c_{x_n}$ are directly concatenated in a bitstream. Codes with this property are said to be “uniquely decodable.”
Prefix codes:
- No code word is a prefix of any other code word
- Uniquely decodable, symbol by symbol, in natural order $0, 1, 2, \ldots, n, \ldots$
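A minimal sketch of symbol-by-symbol decoding (the code table is an arbitrary example of mine): because no code word is a prefix of another, the decoder can emit a symbol the moment its buffer matches a code word.

```python
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}  # an example prefix code
INVERSE = {cw: s for s, cw in CODE.items()}

def encode(symbols):
    # Code words are simply concatenated, with no separators.
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in INVERSE:   # prefix property: the first match is the only match
            out.append(INVERSE[buf])
            buf = ""
    assert buf == "", "bitstream ended in the middle of a code word"
    return out

assert decode(encode(list("ABCADD"))) == list("ABCADD")
```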
Example of non-decodable code
Code: $c_0 =$ “0”, $c_1 =$ “01”, $c_2 =$ “10”, $c_3 =$ “11”
Encode the sequence of source symbols 0, 2, 3, 0, 1. Resulting bit-stream: 0 10 11 0 01
Encode the sequence of source symbols 1, 0, 3, 0, 1. Resulting bit-stream: 01 0 11 0 01
Same bit-stream for different sequences of source symbols: ambiguous, not uniquely decodable. (Note that this is not a prefix code: “0” is a prefix of “01”.)
Unique decodability: McMillan and Kraft conditions
Necessary condition for unique decodability [McMillan]
$\sum_{x \in \mathcal{X}} 2^{-\|c_x\|} \leq 1$
Given a set of code word lengths $\|c_x\|$ satisfying the McMillan condition, a corresponding prefix code always exists [Kraft]. Hence, the McMillan inequality is both necessary and sufficient; it is also known as the Kraft inequality or Kraft-McMillan inequality. There is no loss by considering only prefix codes, and the prefix code is not unique (see the sketch below).
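The Kraft direction is constructive. A sketch (my own illustration): check the McMillan sum, then hand out code words in order of increasing length (the canonical construction).

```python
def mcmillan_sum(lengths):
    return sum(2.0 ** -l for l in lengths)

def prefix_code_from_lengths(lengths):
    """Assign code words in order of increasing length (canonical construction)."""
    assert mcmillan_sum(lengths) <= 1, "McMillan/Kraft condition violated"
    code, value, prev_len = [], 0, 0
    for l in sorted(lengths):
        value <<= l - prev_len      # extend the running code value to length l
        code.append(format(value, f"0{l}b"))
        value += 1                  # next free code word of this length
        prev_len = l
    return code

print(prefix_code_from_lengths([2, 2, 2, 3, 4, 4]))
# ['00', '01', '10', '110', '1110', '1111'] -- no word is a prefix of another
```

For these lengths the McMillan sum is exactly 1, yet the resulting code differs from the tree-derived code with the same lengths two slides below, illustrating that the prefix code is not unique.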
Prefix Decoder
[Block diagram: prefix decoder. An input buffer feeds a shift register long enough to hold the longest code word. The register contents address a code word LUT, which outputs the decoded symbol, and a code word length LUT, which outputs $\|c_x\|$; the buffer is then advanced by $\|c_x\|$ bits.]
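A software analogue of this architecture (a sketch, reusing the arbitrary example code from above): every possible window of L_max bits indexes a symbol LUT and a code word length LUT.

```python
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}  # same example code as above
L_MAX = max(len(cw) for cw in CODE.values())          # shift register width

# Build the two LUTs: every L_MAX-bit window maps to the symbol whose code
# word is a prefix of it, and to that code word's length (the advance amount).
symbol_lut, length_lut = {}, {}
for s, cw in CODE.items():
    n_pad = L_MAX - len(cw)
    for pad in range(1 << n_pad):
        window = cw + format(pad, f"0{n_pad}b") if n_pad else cw
        symbol_lut[window], length_lut[window] = s, len(cw)

def decode(bits):
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + L_MAX].ljust(L_MAX, "0")  # pad at stream end
        out.append(symbol_lut[window])
        pos += length_lut[window]   # "advance ||c_x|| bits"
    return out

assert decode("0101100111") == list("ABCAD")
```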
Binary trees and prefix codes
Any binary tree can be converted into a prefix code by traversing the tree from root to leaves, labeling the branches 0 and 1; here the leaves yield the code words 00, 01, 10, 111, 1100, 1101.
Any prefix code corresponding to a binary tree meets the McMillan condition with equality:
$\sum_{x \in \mathcal{X}} 2^{-\|c_x\|} = 1$
(here $3 \cdot 2^{-2} + 2^{-3} + 2 \cdot 2^{-4} = 1$)
Binary trees and prefix codes (cont.)
Augmenting the binary tree by two new nodes at depth $l+1$ does not change the McMillan sum:
$2^{-(l+1)} + 2^{-(l+1)} = 2^{-l}$
Pruning the binary tree does not change the McMillan sum either.
McMillan sum for the simplest binary tree: $2^{-1} + 2^{-1} = 1$
Instantaneous variable length encoding without redundancy
A code without redundancy, i.e., $R = H(X)$, requires all individual code word lengths $l_k = -\log_2 f_X(k)$. All probabilities would have to be binary fractions: $f_X(k) = 2^{-l_k}$.
Example: probabilities 1/2, 1/4, 1/8, 1/8 give $H(X) = 1.75$ bits, $R = 1.75$ bits, $\varrho = 0$.
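A quick check of this statement (mine, using the dyadic probabilities of the example): choosing $l_k = -\log_2 f_X(k)$ makes the rate equal the entropy exactly.

```python
import math

pmf = [0.5, 0.25, 0.125, 0.125]                 # all binary fractions 2**-l_k
lengths = [int(-math.log2(p)) for p in pmf]     # l_k = -log2 f_X(k) = 1, 2, 3, 3

H = -sum(p * math.log2(p) for p in pmf)
R = sum(p * l for p, l in zip(pmf, lengths))
print(H, R)   # 1.75 1.75 -> redundancy R - H(X) = 0
```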
Huffman Code
The design algorithm for variable length codes proposed by Huffman (1952) always finds a code with minimum redundancy. Obtain the code tree as follows (a code sketch follows the steps):
1 Pick the two symbols with lowest probabilities and merge them into a new auxiliary symbol.
2 Calculate the probability of the auxiliary symbol.
3 If more than one symbol remains, repeat steps 1 and 2 for the new auxiliary alphabet.
4 Convert the code tree into a prefix code.
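The four steps map directly onto a priority queue; a sketch of the construction (mine, not the course's reference implementation) that returns the code table instead of an explicit tree:

```python
import heapq
from itertools import count

def huffman_code(pmf):
    """pmf: dict symbol -> probability. Returns dict symbol -> code word."""
    ticket = count()  # tie-breaker so equal probabilities never compare the dicts
    heap = [(p, next(ticket), {s: ""}) for s, p in pmf.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Steps 1-2: merge the two least probable (auxiliary) symbols and
        # assign the auxiliary symbol the sum of their probabilities.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + cw for s, cw in c0.items()}
        merged.update({s: "1" + cw for s, cw in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(ticket), merged))  # step 3: repeat
    return heap[0][2]  # step 4: accumulated branch labels form the prefix code

print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'} (code words up to 0/1 relabeling)
```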
Huffman Code - Example
Fixed length coding: $R_{fixed} = 4$ bits/symbol
Huffman code: $R_{Huffman} = 2.77$ bits/symbol
Entropy: $H(X) = 2.69$ bits/symbol
Redundancy of the Huffman code: $\varrho = 0.08$ bits/symbol
Redundancy of prefix code for general distribution
Huffman code redundancy: $0 \leq \varrho < 1$ bit/symbol
Theorem: For any distribution $f_X$, a prefix code can be found whose rate R satisfies
$H(X) \leq R \leq H(X) + 1$
Proof. The left-hand inequality is Shannon's noiseless coding theorem. For the right-hand inequality, choose code word lengths $\|c_x\| = \lceil -\log_2 f_X(x) \rceil$; these satisfy the Kraft inequality, since $\sum_x 2^{-\lceil -\log_2 f_X(x) \rceil} \leq \sum_x f_X(x) = 1$, so a prefix code exists. The resulting rate is
$R = \sum_{x \in \mathcal{X}} f_X(x) \lceil -\log_2 f_X(x) \rceil \leq \sum_{x \in \mathcal{X}} f_X(x) (1 - \log_2 f_X(x)) = H(X) + 1$
Vector Huffman coding
Huffman coding is very inefficient for $H(X) \ll 1$ bit/symbol.
Remedy:
- Combine m successive symbols into a new “block symbol”
- Huffman code for the block symbols
- Redundancy: $H(X) \leq R \leq H(X) + \frac{1}{m}$
Can also be used to exploit statistical dependencies between successive symbols.
Disadvantage: exponentially growing alphabet size $|\mathcal{X}|^m$.
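A sketch of the block-symbol idea (it reuses the huffman_code helper from the earlier sketch; the binary source below is an arbitrary example): build the product PMF over m-tuples and Huffman-code the tuples.

```python
import math
from itertools import product

def block_rate(pmf, m):
    """Per-symbol rate of a Huffman code over blocks of m IID symbols."""
    block_pmf = {t: math.prod(pmf[s] for s in t) for t in product(pmf, repeat=m)}
    code = huffman_code(block_pmf)          # helper from the sketch above
    return sum(p * len(code[t]) for t, p in block_pmf.items()) / m

pmf = {"0": 0.9, "1": 0.1}                  # H(X) ~= 0.469 bits/symbol
for m in (1, 2, 4):
    print(m, block_rate(pmf, m))            # 1.0, 0.645, ~0.49 -> approaches H(X)
```

Note that block_pmf has $|\mathcal{X}|^m$ entries, which is exactly the exponential growth named as the disadvantage above.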
Truncated Huffman Coding
Idea: reduce size of Huffman code table and maximum Huffman code word length by Huffman-coding only the most probable symbols.
Combine the J least probable symbols of an alphabet of size K into an auxiliary symbol ESC.
Use a Huffman code for the alphabet consisting of the K-J most probable symbols and the symbol ESC.
If the ESC symbol is encoded, append $\lceil \log_2 J \rceil$ bits to specify the exact symbol from the full alphabet.
This results in an increased average code word length; choosing J trades off complexity against efficiency (see the sketch below).
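A sketch of the ESC mechanism (an illustration with made-up tables, not any standard's exact format):

```python
import math

HUFFMAN = {"a": "0", "b": "10", "ESC": "11"}   # Huffman table for K-J symbols + ESC
RARE = ["c", "d", "e", "f"]                    # the J = 4 least probable symbols

def truncated_encode(x):
    if x in HUFFMAN:
        return HUFFMAN[x]
    j_bits = math.ceil(math.log2(len(RARE)))   # ceil(log2 J) extra bits
    return HUFFMAN["ESC"] + format(RARE.index(x), f"0{j_bits}b")

print(truncated_encode("a"))   # '0'
print(truncated_encode("e"))   # '1110' = ESC ('11') + index 2 ('10')
```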
Adaptive Huffman Coding
Use if source statistics are not known ahead of time.
Forward adaptation:
- Measure source statistics at the encoder by analyzing the entire data
- Transmit the Huffman code table ahead of the compressed bit-stream
- JPEG uses this concept (even though often default tables are transmitted)
Backward adaptation:
- Measure source statistics both at encoder and decoder, using the same previously decoded data
- Regularly generate identical Huffman code tables at transmitter and receiver
- Saves the overhead of forward adaptation, but usually yields poorer code tables, since they are based on past observations
- Generally avoided due to the computational burden at the decoder
Unary coding
“Geometric” source:
Alphabet $\mathcal{X} = \{0, 1, \ldots\}$, PMF $f_X(x) = 2^{-(x+1)}$, $x \geq 0$
The optimal prefix code with redundancy $\varrho = 0$ is the “unary” code (“comma code”):
$c_0 =$ “1”, $c_1 =$ “01”, $c_2 =$ “001”, $c_3 =$ “0001”, ...
Now consider a geometric source with faster decay:
PMF $f_X(x) = (1-\vartheta)\vartheta^x$, with $0 < \vartheta < \frac{1}{2}$; $x \geq 0$
The unary code is still the optimum prefix code (i.e., a Huffman code), but no longer redundancy-free.
Golomb coding
For a geometric source with slower decay:
PMF $f_X(x) = (1-\vartheta)\vartheta^x$, with $\frac{1}{2} < \vartheta < 1$; $x \geq 0$
Idea: express each x as $x = m x_q + x_r$ with quotient $x_q = \lfloor x/m \rfloor$ and remainder $x_r = x \bmod m$.
Distribution of the new random variables:
$f_{X_q}(x_q) = \sum_{i=0}^{m-1} f_X(m x_q + i) = \vartheta^{m x_q} \sum_{i=0}^{m-1} f_X(i)$
$f_{X_r}(x_r) = \frac{1-\vartheta}{1-\vartheta^m} \vartheta^{x_r}$ for $0 \leq x_r < m$
$X_q$ and $X_r$ are statistically independent.
Golomb coding (cont.)
Golomb coding:
1 Choose an integer divisor $m \geq 1$
2 Encode $x_q$ optimally by the unary code
3 Encode $x_r$ by a modified binary code, using code word lengths $k_a = \lfloor \log_2 m \rfloor$ and $k_b = \lceil \log_2 m \rceil$
4 Concatenate the bits for $x_q$ and $x_r$
In practice, $m = 2^k$ is often used, so $x_r$ can be encoded with constant code word length $\log_2 m = k$ (see the sketch below).
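A sketch of this procedure (mine; I realize the “modified binary code” as the standard truncated binary code, in which the first $2^{k_b} - m$ remainders receive $k_a$ bits and the rest receive $k_b$ bits):

```python
def golomb_encode(x, m):
    """Golomb code word for x >= 0 with divisor m >= 1, as a bit string."""
    xq, xr = divmod(x, m)                    # x = m*xq + xr
    bits = "0" * xq + "1"                    # unary code for the quotient xq
    k = (m - 1).bit_length()                 # k_b = ceil(log2 m)
    if k == 0:                               # m = 1: pure unary code
        return bits
    u = (1 << k) - m                         # first u remainders get k_a = k-1 bits
    if xr < u:
        return bits + format(xr, "b").zfill(k - 1)
    return bits + format(xr + u, "b").zfill(k)

def golomb_decode(bits, m):
    """Decode a single Golomb code word (position tracking omitted for brevity)."""
    xq = bits.index("1")                     # leading zeros give the quotient
    pos, k = xq + 1, (m - 1).bit_length()
    if k == 0:
        return m * xq
    u = (1 << k) - m
    xr = int(bits[pos:pos + k - 1], 2) if k > 1 else 0
    if xr >= u:                              # long (k_b-bit) code word
        xr = int(bits[pos:pos + k], 2) - u
    return m * xq + xr

for x in range(8):                           # round-trip check with m = 3
    assert golomb_decode(golomb_encode(x, 3), 3) == x
print([golomb_encode(x, 3) for x in range(6)])
# ['10', '110', '111', '010', '0110', '0111']
```

With $m = 2^k$ the short code words disappear ($u = 0$) and every remainder gets exactly k bits, matching the note above.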
Golomb code examples
Code words for the Golomb code examples (each is the unary code for $x_q$ followed by a constant length code for $x_r$: 1 bit for m=2, 2 bits for m=4):

x    Unary (m=1)   m=2      m=4
0    1             10       100
1    01            11       101
2    001           010      110
3    0001          011      111
4    00001         0010     0100
5    000001        0011     0101
6    0000001       00010    0110
7    00000001      00011    0111
...  ...           ...      ...
Golomb parameter estimation
Expected value for the geometric distribution:
$E[X] = \sum_{x=0}^{\infty} x (1-\vartheta) \vartheta^x = \frac{\vartheta}{1-\vartheta}$, hence $\vartheta = \frac{E[X]}{1+E[X]}$
Rule for optimum performance of the Golomb code: choose m such that $\vartheta^m = \left(\frac{E[X]}{1+E[X]}\right)^m \approx \frac{1}{2}$.
Approximation: $m = 2^k$ with $k = \max\left(0, \left\lceil \log_2 \frac{E[X]}{2} \right\rceil\right)$, i.e., $m \approx \frac{E[X]}{2}$, is a reasonable setting, even if $E[X] \gg 1$ does not hold.
Adaptive Golomb coder (JPEG-LS)
Initialize $A = \hat{E}[X]$ (an initial estimate of the mean) and $N = 1$
For each $n = 0, 1, 2, \ldots$:
1 Set $k = \max\left(0, \left\lceil \log_2 \frac{A}{2N} \right\rceil\right)$ (pick the best Golomb code)
2 Code symbol $x_n$ using the Golomb code with parameter k
3 If $N = N_{\max}$: set $A \leftarrow A/2$ and $N \leftarrow N/2$ (avoid overflow and slowly forget the past)
4 Update $A \leftarrow A + x_n$ and $N \leftarrow N + 1$
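A sketch of this adaptation loop (variable names follow the slide; golomb_encode is the helper from the earlier sketch; a_init and n_max are arbitrary choices of mine):

```python
import math

def adaptive_golomb_encode(symbols, a_init=4, n_max=64):
    """Backward-adaptive Golomb coder over nonnegative integer symbols."""
    A, N = a_init, 1                     # running sum estimate and symbol count
    out = []
    for x in symbols:
        # Pick the best Golomb code: k = max(0, ceil(log2(A / (2N))))
        k = max(0, math.ceil(math.log2(A / (2 * N))))
        out.append(golomb_encode(x, 1 << k))       # code x with m = 2**k
        if N == n_max:                   # avoid overflow, slowly forget the past
            A, N = max(1, A // 2), N // 2          # keep A >= 1 (my safeguard)
        A, N = A + x, N + 1              # update the running mean estimate A/N
    return "".join(out)
```

The decoder applies the identical updates to its decoded symbols, so both sides regenerate the same k without any side information, which is exactly the backward adaptation idea from the adaptive Huffman slide.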