Image Encoding & Compression: Information Theory, Pixel-Based Encoding, Predictive Encoding, Transform-Based Encoding
Digital Image Processing Lectures 25 & 26
M.R. Azimi, Professor
Department of Electrical and Computer Engineering Colorado State University
Area 4: Image Encoding and Compression
Goal: To exploit the redundancies in the image in order to reduce the number of bits to represent an image or a sequence of images (e.g., video).
Applications:
- Image Transmission: e.g., HDTV, 3DTV, satellite/military communication, and teleconferencing.
- Image Storage: e.g., document storage & retrieval, medical image archives, weather maps, and geological surveys.

Categories of Techniques:
1 Pixel Encoding: PCM, run-length encoding, bit-plane, Huffman encoding, entropy encoding
2 Predictive Encoding: delta modulation, 2-D DPCM, inter-frame methods
3 Transform-Based Encoding: DCT-based, WT-based, zonal encoding
4 Others: vector quantization (clustering), neural network-based, hybrid encoding
Encoding System

There are three steps involved in any encoding system (Fig. 1):
a. Mapping: removes redundancies in the image; should be invertible.
b. Quantization: the mapped values are quantized using uniform or Lloyd-Max quantizers.
c. Coding: optimal codewords are assigned to the quantized values.
Figure 1: A Typical Image Encoding System.
However, before we discuss several types of encoding systems, we need to review some basic results from information theory.
Measure of Information & Entropy
Assume there is a source (e.g., an image) that generates a discrete set of independent messages (e.g., grey levels) rk with probability Pk, k ∈ [1, L], where L is the number of messages (or levels).
Figure 2: Source and message.
Then, information associated with rk is
Ik = −log2 Pk bits

Clearly, Σ_{k=1}^{L} Pk = 1. For equally likely levels (messages), the information can be transmitted as an n-bit binary number:

Pk = 1/L = 1/2^n → Ik = n bits
For images, the Pk's are obtained from the histogram.
As an example, consider a binary image with r0 = Black, P0 = 1 and r1 = White, P1 = 0; then Ik = 0, i.e., no information.

Entropy: the average information generated by the source,

H = Σ_{k=1}^{L} Pk Ik = −Σ_{k=1}^{L} Pk log2 Pk   (avg. bits/pixel)

Entropy also represents a measure of redundancy. Let L = 4, P1 = P2 = P3 = 0 and P4 = 1; then H = 0, i.e., the most certain case and thus maximum redundancy. Now let L = 4, P1 = P2 = P3 = P4 = 1/4; then H = 2, i.e., the most uncertain case and hence least redundant. Maximum entropy occurs when the levels are equally likely, Pk = 1/L, k ∈ [1, L]; then

Hmax = −Σ_{k=1}^{L} (1/L) log2(1/L) = log2 L

Thus, 0 ≤ H ≤ Hmax.

Entropy and Coding

Entropy represents the lower bound on the number of bits required to code the coder inputs, i.e., for a set of coder inputs vk, k ∈ [1, L], with probabilities Pk, it is guaranteed that they cannot be coded using fewer than H bits on average. If we design a code with codewords Ck, k ∈ [1, L], and corresponding word lengths βk, the average number of bits required by the coder is

R(L) = Σ_{k=1}^{L} βk Pk
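The entropy examples above (the L = 4 cases) are easy to check in a few lines. A minimal sketch, assuming the probabilities Pk are already available (e.g., from a normalized histogram); the function name is ours:

```python
import math

def entropy(probs):
    """H = -sum(P_k * log2(P_k)) in average bits/pixel.
    Levels with P_k = 0 contribute nothing to the sum."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely levels: most uncertain, least redundant (H = log2 4 = 2).
h_max = entropy([0.25, 0.25, 0.25, 0.25])
# One certain level: most certain, maximum redundancy (H = 0).
h_min = entropy([0.0, 0.0, 0.0, 1.0])
```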
Figure 3: Coder producing codewords Ck with lengths βk.

Shannon's Entropy Coding Theorem (1949): the average length R(L) is bounded by

H ≤ R(L) ≤ H + ε,  ε = 1/L
i.e., it is possible to encode a source with entropy H without distortion using an average of H + ε bits/message, or to encode the source with distortion using H bits/message. The optimality of the coder depends on how close R(L) is to H.
Example: Let L = 2, P1 = p and P2 = 1 − p, 0 ≤ p ≤ 1. Thus, the entropy is H = −p log2 p − (1 − p) log2(1 − p). The above figure shows H as a function of p. Clearly, since the source is binary, we can use 1 bit/pixel. This corresponds to Hmax = 1 at p = 1/2. However, if p = 1/8, H ≈ 0.54, i.e., there is more redundancy, and it is then possible to find a coding scheme that uses only about 0.54 bits/pixel.
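The binary-source entropy curve can be evaluated directly. A small sketch (function name ours):

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1 - p)*log2(1 - p) for a binary source."""
    if p in (0.0, 1.0):
        return 0.0  # a certain source carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

h_half = binary_entropy(0.5)    # maximum entropy: 1 bit/pixel
h_skew = binary_entropy(1 / 8)  # a skewed source needs far fewer bits on average
```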
Remark: The maximum achievable compression is

C = (average bit rate of original raw data, B) / (average bit rate of encoded data, R(L))

Thus,

B/(H + ε) ≤ C ≤ B/H,  ε = 1/L

Since some distortion is inevitable in any image transmission, it is necessary to find the minimum number of bits needed to encode the image while allowing a certain level of distortion.

Rate Distortion Function

Let D be a fixed distortion between the actual values x and the reproduced values x̂. The question is: allowing distortion D, what is the minimum number of bits required to encode the data? If we consider x to be a Gaussian r.v. with variance σx², the distortion is

D = E[(x − x̂)²]
The rate distortion function is defined by
RD = (1/2) log2(σx²/D),  0 ≤ D ≤ σx²
RD = 0,  D > σx²

or, compactly, RD = Max[0, (1/2) log2(σx²/D)]. At maximum distortion, D ≥ σx², RD = 0, i.e., no information needs to be transmitted.
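The rate distortion function is straightforward to evaluate numerically. A minimal sketch (function name ours):

```python
import math

def rate_distortion(var_x, D):
    """R_D = max(0, 0.5 * log2(var_x / D)): bits/pixel for allowed distortion D."""
    if D >= var_x:
        return 0.0  # distortion at or above the signal variance: send nothing
    return 0.5 * math.log2(var_x / D)

r_low = rate_distortion(4.0, 1.0)    # 0.5*log2(4) = 1 bit/pixel
r_high = rate_distortion(4.0, 0.25)  # tighter distortion budget costs 2 bits/pixel
```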
Figure 4: Rate Distortion Function RD versus D.

RD shows the number of bits required for distortion D. Since RD represents the number of bits/pixel, N = 2^RD = (σx²/D)^{1/2}, and D is considered to be the quantization noise variance. This variance can be minimized using the Lloyd-Max quantizer. In the transform domain we can assume that x is white (e.g., due to the KL transform).

Pixel-Based Encoding
Encode each pixel ignoring inter-pixel dependencies. Among the methods are:

1. Entropy Coding
Every block of the image is entropy encoded based upon the Pk's within that block. This produces a variable-length code for each block, depending on the spatial activity within the block.

2. Run-Length Encoding
Scan the image horizontally or vertically and, while scanning, map each group of pixels with the same intensity to a pair (gi, li), where gi is the intensity and li is the length of the "run". This method can also be used for detecting edges and boundaries of an object. It is mostly used for images with a small number of gray levels and is not effective for highly textured images.
Example 1: Consider the following 8 × 8 image.

4 4 4 4 4 4 4 0
4 5 5 5 5 5 4 0
4 5 6 6 6 5 4 0
4 5 6 7 6 5 4 0
4 5 6 6 6 5 4 0
4 5 5 5 5 5 4 0
4 4 4 4 4 4 4 0
4 4 4 4 4 4 4 0

The run-length codes using the vertical (continuous top-down) scanning mode are:

(4,9) (5,5) (4,3) (5,1) (6,3) (5,1) (4,3) (5,1) (6,1) (7,1) (6,1) (5,1) (4,3) (5,1) (6,3) (5,1) (4,3) (5,5) (4,10) (0,8)

i.e., a total of 20 pairs = 40 numbers. Horizontal scanning would lead to 34 pairs = 68 numbers, which is more than the actual number of pixels (64).
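A run-length coder for a chosen scan order can be sketched as follows (function names ours). Applying it to the vertical scan of the 8 × 8 image above reproduces the 20 pairs listed:

```python
def run_length_encode(pixels):
    """Collapse a 1-D scan of intensities into (intensity, run length) pairs."""
    runs = []
    for g in pixels:
        if runs and runs[-1][0] == g:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([g, 1])   # start a new run
    return [tuple(r) for r in runs]

def run_length_decode(runs):
    """Invert the mapping: expand each pair back into a pixel sequence."""
    return [g for g, l in runs for _ in range(l)]

def vertical_scan(image):
    """Continuous top-down, column-by-column scan of a row-major image."""
    rows, cols = len(image), len(image[0])
    return [image[m][n] for n in range(cols) for m in range(rows)]
```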
Example 2: Let the transition probabilities for run-length encoding of a binary image (0: black and 1: white) be p0 = P(0|1) and p1 = P(1|0). Assuming all runs are independent, find (a) the average run lengths, (b) the entropies of the white and black runs, and (c) the compression ratio.

Solution: A run of length l ≥ 1 can be represented by a geometric r.v. Xi with PMF

P(Xi = l) = pi (1 − pi)^{l−1},  i = 0, 1

which corresponds to the first occurrence of a 0 or a 1 after l independent trials. (Note that 1 − P(0|1) = P(1|1) and 1 − P(1|0) = P(0|0).) Thus, for the average we have

µXi = Σ_{l=1}^{∞} l P(Xi = l) = Σ_{l=1}^{∞} l pi (1 − pi)^{l−1}

which, using the series Σ_{n=1}^{∞} n a^{n−1} = 1/(1 − a)², reduces to µXi = 1/pi. The entropy is given by

HXi = −Σ_{l=1}^{∞} P(Xi = l) log2 P(Xi = l)
    = −pi Σ_{l=1}^{∞} (1 − pi)^{l−1} [log2 pi + (l − 1) log2(1 − pi)]
Using the same series formula, we get

HXi = −(1/pi) [pi log2 pi + (1 − pi) log2(1 − pi)]

The achievable compression ratio is

C = (HX0 + HX1)/(µX0 + µX1) = (HX0/µX0) P0 + (HX1/µX1) P1

where Pi = µXi/(µX0 + µX1) is the a priori probability of black (i = 0) or white (i = 1) pixels; with µXi = 1/pi this gives P0 = p1/(p0 + p1) and P1 = p0/(p0 + p1).
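The closed forms above can be checked numerically against the defining series. A small sketch (function names ours), truncating the geometric series once its tail becomes negligible:

```python
import math

def run_mean(p):
    """Mean of the geometric run length: mu = 1/p."""
    return 1.0 / p

def run_entropy(p):
    """Closed-form run entropy: H = -(1/p)[p*log2(p) + (1-p)*log2(1-p)]."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p)) / p

def run_entropy_series(p, terms=5000):
    """Direct evaluation of -sum P(X=l) log2 P(X=l) for comparison."""
    total = 0.0
    for l in range(1, terms + 1):
        P = p * (1 - p) ** (l - 1)
        if P == 0.0:
            break  # geometric tail has underflowed; remaining terms negligible
        total -= P * math.log2(P)
    return total
```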
3. Huffman Encoding

The algorithm consists of the following steps.
1 Arrange the symbols with probabilities Pk in decreasing order and consider them as "leaf nodes" of a tree.
2 Merge two nodes with smallest prob to form a new node whose prob is the sum of the two merged nodes. Go to Step 1 and repeat until only two nodes are left (“root nodes”).
3 Arbitrarily assign 1’s and 0’s to each pair of branches merging into a node.
4 Read sequentially from the root node to each leaf node to form the associated code for each symbol.

Example 3: For the same image as in the previous example, which requires 3 bits/pixel using standard PCM, we can arrange the table on the next page.
Gray level | # occurrences | Pk    | Ck   | βk | Pk βk | −Pk log2 Pk
0          | 8             | 0.125 | 0000 | 4  | 0.5   | 0.375
1          | 0             | 0     | -    | 0  | 0     | 0
2          | 0             | 0     | -    | 0  | 0     | 0
3          | 0             | 0     | -    | 0  | 0     | 0
4          | 31            | 0.484 | 1    | 1  | 0.484 | 0.507
5          | 16            | 0.25  | 01   | 2  | 0.5   | 0.5
6          | 8             | 0.125 | 001  | 3  | 0.375 | 0.375
7          | 1             | 0.016 | 0001 | 4  | 0.064 | 0.095
Total      | 64            | 1     |      |    | R     | H

Codewords Ck are obtained by constructing the binary tree as in Fig. 5.
Figure 5: Tree Structure for Huffman Encoding.
Note that in this case, we have
R = Σ_{k=1}^{8} βk Pk = 1.923 bits/pixel

H = −Σ_{k=1}^{8} Pk log2 Pk = 1.852 bits/pixel

Thus,

1.852 ≤ R = 1.923 ≤ H + 1/L = 1.977

i.e., an average of about 2 bits/pixel (instead of 3 bits/pixel using PCM) can be used to code the image. However, a drawback of the standard Huffman encoding method is that the codes have variable lengths.
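The tree construction of Steps 1-4 can be sketched with a priority queue. A minimal illustration (not the lecture's exact tie-breaking); on the histogram of Example 3 it reproduces the codeword lengths and R = 1.923 bits/pixel:

```python
import heapq
from itertools import count

def huffman_codes(probs):
    """Build Huffman codewords for {symbol: probability} by repeatedly
    merging the two least probable nodes (Steps 1-4 above)."""
    tiebreak = count()  # keeps heap comparisons well-defined for equal probs
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Prefix a bit onto every codeword in each merged subtree.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]
```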
Predictive Encoding
Idea: Remove the mutual redundancy among successive pixels in a region of support (ROS), or neighborhood, and encode only the new information. This method is based upon linear prediction. Let us start with 1-D linear predictors. An Nth-order linear prediction of x(n) based on the N previous samples is generated using a 1-D autoregressive (AR) model
x̂(n) = a1 x(n−1) + a2 x(n−2) + ··· + aN x(n−N)
The ai's are model coefficients determined from some sample signals. Instead of encoding x(n), the prediction error e(n) = x(n) − x̂(n) is encoded, as it requires a substantially smaller number of bits. At the receiver, we reconstruct x(n) using the previously decoded values x(n − k) and the encoded error signal, i.e., x(n) = x̂(n) + e(n). This method is also referred to as differential PCM (DPCM).
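The 1-D prediction and its error signal can be sketched directly (function names ours; samples before the start of the signal are taken as zero):

```python
def predict(x, a):
    """x_hat(n) = a[0]*x(n-1) + ... + a[N-1]*x(n-N); out-of-range samples are 0."""
    N = len(a)
    return [sum(a[i] * x[n - 1 - i] for i in range(N) if n - 1 - i >= 0)
            for n in range(len(x))]

def prediction_error(x, a):
    """e(n) = x(n) - x_hat(n): the signal actually handed to the encoder."""
    return [xn - xh for xn, xh in zip(x, predict(x, a))]

# A slowly varying ramp with a 1st-order predictor (a1 = 1): after the first
# sample, the errors are small first differences rather than full amplitudes.
e = prediction_error([10, 11, 12, 13, 14], [1.0])
```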
Minimum Variance Prediction

The predictor

x̂(n) = Σ_{i=1}^{N} ai x(n − i)

is the best Nth-order linear mean-squared predictor of x(n), i.e., it minimizes the MSE

ε² = E[(x(n) − x̂(n))²]
This minimization w.r.t. the ak's results in the following "orthogonality property":

∂ε²/∂ak = −2 E[(x(n) − x̂(n)) x(n − k)] = 0,  1 ≤ k ≤ N

which leads to the normal equation
rxx(k) − Σ_{i=1}^{N} ai rxx(k − i) = σe² δ(k),  0 ≤ k ≤ N

where rxx(k) is the autocorrelation of the data x(n) and σe² is the variance of the driving process e(n).
Plugging in the values k ∈ [0, N] gives the AR Yule-Walker equations for solving for the ai's and σe², i.e.,
[ rxx(0)    rxx(1)   ···  rxx(N)   ] [  1  ]   [ σe² ]
[ rxx(1)    rxx(0)   ···  rxx(N−1) ] [ −a1 ]   [  0  ]
[   ⋮          ⋮      ⋱      ⋮     ] [  ⋮  ] = [  ⋮  ]
[ rxx(N)   rxx(N−1)  ···  rxx(0)   ] [ −aN ]   [  0  ]    (1)

Note that the correlation matrix Rx in this case is both Toeplitz and Hermitian. The solution to this system of linear equations is given by
σe² = 1/[Rx⁻¹]_{1,1},   ai = −σe² [Rx⁻¹]_{i+1,1}

where [Rx⁻¹]_{i,j} is the (i, j)th element of the matrix Rx⁻¹.
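For N = 1 the system (1) can be solved in closed form, which makes the structure easy to see. A small sketch (function name ours):

```python
def yule_walker_order1(r0, r1):
    """Solve [[r0, r1], [r1, r0]] [1, -a1]^T = [sigma_e^2, 0]^T:
    the second row gives a1 = r1/r0, the first gives sigma_e^2 = r0 - a1*r1."""
    a1 = r1 / r0
    sigma_e2 = r0 - a1 * r1
    return a1, sigma_e2

# Strongly correlated data (r1 close to r0) leaves a small innovation variance,
# which is exactly why encoding e(n) instead of x(n) saves bits.
a1, s2 = yule_walker_order1(1.0, 0.9)
```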
In the 2-D case, an AR model with a non-symmetric half-plane (NSHP) ROS is used. This ROS is shown in Fig. 6, where the image is scanned left-to-right and top-to-bottom.
Figure 6: A 1st Order 2-D AR Model with NSHP ROS.
For a 1st-order 2-D AR model,

x(m, n) = a01 x(m, n−1) + a11 x(m−1, n−1) + a10 x(m−1, n) + a1,−1 x(m−1, n+1) + e(m, n)

where the aij's are model coefficients. Then, the best linear prediction of x(m, n) is

x̂(m, n) = a01 x(m, n−1) + a11 x(m−1, n−1) + a10 x(m−1, n) + a1,−1 x(m−1, n+1)
Note that at every pixel, four previously scanned pixels are needed to generate the predicted value x̂(m, n). Fig. 7 shows the pixels that need to be stored in the global "state vector" for this 1st-order predictor.
Figure 7: Global State Vector.

Assuming that the reproduced (quantized) values up to (m, n−1) are available, we generate

x̂′(m, n) = a01 x′(m, n−1) + a11 x′(m−1, n−1) + a10 x′(m−1, n) + a1,−1 x′(m−1, n+1)

Then, the prediction error

e(m, n) := x(m, n) − x̂′(m, n)   (quantizer input)

is applied to the quantizer. The quantized value e′(m, n) is encoded and transmitted; it is also used to generate the reproduced value

x′(m, n) = e′(m, n) + x̂′(m, n)   (reproduced value)
The entire process at the transmitter and receiver is depicted in Fig. 8. Clearly, it is assumed that the model coefficients are available at the receiver.
Figure 8: Block Diagram of 2-D Predictive Encoding System.
It is interesting to note that
q(m, n) := x(m, n) − x′(m, n)   (PCM quantization error)
         = e(m, n) − e′(m, n)   (DPCM quantization error)
However, for the same quantization error q(m, n), DPCM requires far fewer bits.
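The transmitter loop of Fig. 8 can be sketched as follows, with a plain uniform quantizer standing in for the Lloyd-Max quantizer of the lecture (coefficients and step size are illustrative):

```python
def dpcm_2d(image, coeffs, step):
    """One-pass 2-D DPCM with a 1st-order NSHP predictor and a uniform
    quantizer of the given step. Returns the quantized errors e'(m,n) and
    the reproduced image x'(m,n) = e'(m,n) + x_hat'(m,n).
    coeffs = (a01, a11, a10, a1m1) multiply x'(m,n-1), x'(m-1,n-1),
    x'(m-1,n), x'(m-1,n+1); off-image neighbors are taken as 0."""
    a01, a11, a10, a1m1 = coeffs
    M, N = len(image), len(image[0])
    xr = [[0.0] * N for _ in range(M)]  # reproduced values
    eq = [[0.0] * N for _ in range(M)]  # quantized prediction errors
    def get(m, n):
        return xr[m][n] if 0 <= m < M and 0 <= n < N else 0.0
    for m in range(M):
        for n in range(N):
            pred = (a01 * get(m, n - 1) + a11 * get(m - 1, n - 1)
                    + a10 * get(m - 1, n) + a1m1 * get(m - 1, n + 1))
            e = image[m][n] - pred            # quantizer input
            eq[m][n] = step * round(e / step)  # uniform quantizer
            xr[m][n] = pred + eq[m][n]         # reproduced value
    return eq, xr
```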
Performance Analysis of DPCM

For straight PCM, the rate distortion function is

RPCM = (1/2) log2(σx²/σq²) bits/pixel

i.e., the number of bits required per pixel in the presence of a particular distortion σq² = E[q²(m, n)]. For DPCM, the rate distortion function is

RDPCM = (1/2) log2(σe²/σq²) bits/pixel

for the same distortion. Clearly, σe² ≪ σx² → RDPCM ≪ RPCM. The bit reduction of DPCM over PCM is

RPCM − RDPCM = (1/2) log2(σx²/σe²) = (1/0.6) log10(σx²/σe²)

The achieved compression depends on the inter-pixel redundancy, i.e., for an image with no redundancy (a random image),

σx² = σe² → RPCM = RDPCM

Transform-Based Encoding
Idea: Reduce redundancy by applying a unitary transformation to blocks of an image; the decorrelated coefficients/features are then encoded. The process of transform-based encoding, or block quantization, is depicted in Fig. 9. The image is first partitioned into non-overlapping blocks. Each block is then unitary transformed, and the principal coefficients are quantized and encoded.
Figure 9: Transform-Based Encoding Process.
Q1: What are the best mapping matrices A and B, so that maximum redundancy removal is achieved while the distortion due to coefficient reduction is minimized? Q2: What is the best quantizer that gives minimum quantization distortion?
Theorem: Let x be a random vector representing blocks of an image and y = Ax its transformed version, with components y(k) that are mutually uncorrelated. These components are quantized to y^o, then encoded and transmitted. At the receiver, the decoded values are reconstructed using matrix B, i.e., x^o = B y^o. The objective is to find the optimum matrices A and B and the optimum quantizer such that
D = E[‖x − x^o‖²]
is minimized.

1. The optimum matrices are A = Ψ*ᵀ and B = Ψ, i.e., the KL transform pair.
2. The optimum quantizer is the Lloyd-Max quantizer.

Proof: See Jain's book (Ch. 11).
Bit Allocation

The goal is to optimally allocate a given total number of bits M to the N (retained) components of y^o so that the distortion is minimized:
D = (1/N) Σ_{k=1}^{N} E[(y(k) − y^o(k))²] = (1/N) Σ_{k=1}^{N} σk² f(mk)

where f(·) is the quantizer distortion function, σk² is the variance of coefficient y(k), and mk is the number of bits allocated to y^o(k). Optimal bit allocation involves finding the mk's that minimize D subject to M = Σ_{k=1}^{N} mk. Note that coefficients with higher variance contain more information than those with lower variance; thus, more bits are assigned to them to improve performance.

i. Shannon's Allocation Strategy

mk = mk(θ) = Max(0, (1/2) log2(σk²/θ))

where θ must be found to produce an average rate of p = M/N bits per pixel (bpp).
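Finding θ in Shannon's strategy amounts to a 1-D root search, since the total allocation decreases monotonically in θ. A sketch using geometric bisection (variances and bit budget are illustrative, function names ours):

```python
import math

def shannon_allocation(variances, theta):
    """m_k(theta) = max(0, 0.5 * log2(sigma_k^2 / theta))."""
    return [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]

def solve_theta(variances, M, lo=1e-12, hi=None, iters=200):
    """Bisect on theta so the (real-valued) allocations sum to M bits."""
    hi = hi if hi is not None else max(variances)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint: theta spans many decades
        if sum(shannon_allocation(variances, mid)) > M:
            lo = mid  # too many bits handed out: theta must increase
        else:
            hi = mid
    return mid
```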
ii. Segall Allocation Strategy
mk(θ) = (1/1.78) log2(1.46 σk²/θ),   0.083 σk² ≥ θ > 0
mk(θ) = (1/1.57) log2(σk²/θ),        σk² ≥ θ > 0.083 σk²
mk(θ) = 0,                           θ > σk²

where θ solves Σ_{k=1}^{N} mk(θ) = M.

iii. Huang/Schultheiss Allocation Strategy

This bit allocation approximates the optimal non-uniform allocation for Gaussian coefficients, giving:

m̂k = M/N + (1/2) log2 σk² − (1/2N) Σ_{i=1}^{N} log2 σi²

mk = Int[m̂k],  with M = Σ_{k=1}^{N} mk fixed.

Figs. 10 and 11 show reconstructed images of Lena and Barbara using the Shannon (SNR_Lena = 20.55 dB, SNR_Barb = 17.24 dB) and Segall (SNR_Lena = 21.23 dB, SNR_Barb = 16.90 dB) bit allocation methods for an average of p = 1.5 bpp, together with the corresponding error images.
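The Huang/Schultheiss rule is a closed form plus integer rounding; a sketch follows (the greedy rounding correction is our own simple choice, not prescribed by the lecture):

```python
import math

def huang_schultheiss(variances, M):
    """m_hat_k = M/N + 0.5*log2(var_k) - (1/(2N)) * sum_i log2(var_i),
    then rounded to integers while keeping the total at M bits."""
    N = len(variances)
    mean_log = sum(math.log2(v) for v in variances) / N
    mhat = [M / N + 0.5 * math.log2(v) - 0.5 * mean_log for v in variances]
    m = [max(0, round(x)) for x in mhat]
    while sum(m) > M:  # most over-allocated coefficient loses a bit
        i = max(range(N), key=lambda k: m[k] - mhat[k])
        m[i] -= 1
    while sum(m) < M:  # most under-allocated coefficient gains a bit
        i = max(range(N), key=lambda k: mhat[k] - m[k])
        m[i] += 1
    return m
```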
Figure 10: Reconstructed & Error Images-Shannon's (1.5 bpp).
Figure 11: Reconstructed & Error Images-Segall's (1.5 bpp).