
Image Encoding & Compression: Information Theory, Pixel-Based Encoding, Predictive Encoding, Transform-Based Encoding

Digital Image Processing Lectures 25 & 26

M.R. Azimi, Professor

Department of Electrical and Computer Engineering, Colorado State University

Area 4: Image Encoding and Compression

Goal: To exploit the redundancies in the image in order to reduce the number of bits needed to represent an image or a sequence of images (e.g., video).

Applications:
Image Transmission: e.g., HDTV, 3DTV, satellite/military, and teleconferencing.
Image Storage: e.g., document storage & retrieval, medical image archives, weather maps, and geological surveys.

Categories of Techniques:
1 Pixel-Based Encoding: PCM, run-length encoding, bit-plane encoding, Huffman encoding
2 Predictive Encoding: DPCM, 2-D DPCM, inter-frame methods
3 Transform-Based Encoding: DCT-based, WT-based, zonal encoding
4 Others: vector quantization (clustering), neural network-based, hybrid encoding


Encoding System
There are three steps involved in any encoding system (Fig. 1):

a. Mapping: Removes redundancies in the images. Should be invertible.
b. Quantization: Mapped values are quantized using uniform or Lloyd-Max quantizers.
c. Coding: Optimal codewords are assigned to the quantized values.

Figure 1: A Typical Image Encoding System.

However, before we discuss several types of encoding systems, we need to review some basic results from information theory.

Measure of Information & Entropy

Assume there is a source (e.g., an image) that generates a discrete set of independent messages (e.g., grey levels) $r_k$ with probability $P_k$, $k \in [1,L]$, where $L$ is the number of messages (or number of levels).

Figure 2: Source and message.

Then, the information associated with $r_k$ is
$$I_k = -\log_2 P_k \quad \text{bits}$$
Clearly, $\sum_{k=1}^{L} P_k = 1$. For equally likely levels (messages), the information can be transmitted as an $n$-bit binary number:
$$P_k = \frac{1}{L} = \frac{1}{2^n} \;\Rightarrow\; I_k = n \ \text{bits}$$

For images, the $P_k$'s are obtained from the histogram.

As an example, consider a binary image with $r_0 = \text{Black}$, $P_0 = 1$ and $r_1 = \text{White}$, $P_1 = 0$; then $I_k = 0$, i.e., no information.

Entropy: the average information generated by the source,
$$H = \sum_{k=1}^{L} P_k I_k = -\sum_{k=1}^{L} P_k \log_2 P_k \quad \text{avg. bits/pixel}$$
Entropy also represents a measure of redundancy. Let $L = 4$, $P_1 = P_2 = P_3 = 0$ and $P_4 = 1$; then $H = 0$, i.e., the most certain case and thus maximum redundancy. Now let $L = 4$, $P_1 = P_2 = P_3 = P_4 = 1/4$; then $H = 2$, i.e., the most uncertain case and hence the least redundant. Maximum entropy occurs when the levels are equally likely, $P_k = \frac{1}{L}$, $k \in [1,L]$; then
$$H_{\max} = -\sum_{k=1}^{L} \frac{1}{L}\log_2\frac{1}{L} = \log_2 L$$
Thus, $0 \le H \le H_{\max}$.

Entropy and Coding
Entropy represents the lower bound on the number of bits required to code the coder inputs, i.e., for a set of coder inputs $v_k$, $k \in [1,L]$, with probabilities $P_k$, it is guaranteed that it is not possible to code them using fewer than $H$ bits on average. If we design a code with codewords $C_k$, $k \in [1,L]$, with corresponding lengths $\beta_k$, the average number of bits required by the coder is
$$R(L) = \sum_{k=1}^{L} \beta_k P_k.$$
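As a concrete illustration of these definitions (an added sketch, not from the notes), the snippet below estimates the $P_k$'s from an image's grey-level histogram and evaluates $H$; the test image is hypothetical.

```python
import numpy as np

def entropy_bits_per_pixel(img, levels=256):
    """H = -sum_k P_k log2 P_k, with P_k estimated from the grey-level histogram."""
    hist = np.bincount(np.asarray(img, dtype=np.int64).ravel(), minlength=levels)
    P = hist / hist.sum()
    P = P[P > 0]                      # terms with P_k = 0 contribute nothing
    return float(-np.sum(P * np.log2(P)))

# Hypothetical 8-bit test image: H is at most log2(256) = 8 bits/pixel.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
print(entropy_bits_per_pixel(img))    # close to 8 for uniform noise
```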

Figure 3: Coder producing codewords $C_k$ with lengths $\beta_k$.

Shannon's Entropy Coding Theorem (1949): The average length $R(L)$ is bounded by
$$H \le R(L) \le H + \epsilon, \quad \epsilon = 1/L$$


i.e., it is possible to encode a source with entropy $H$ without distortion using an average of $H + \epsilon$ bits/message; or it is possible to encode the source with distortion using $H$ bits/message. The optimality of the coder depends on how close $R(L)$ is to $H$.

Example: Let $L = 2$, $P_1 = p$ and $P_2 = 1 - p$, $0 \le p \le 1$. Thus, the entropy is $H = -p\log_2 p - (1 - p)\log_2(1 - p)$. The figure above shows $H$ as a function of $p$. Clearly, since the source is binary, we can use 1 bit/pixel. This corresponds to $H_{\max} = 1$ at $p = 1/2$. However, if $p = 1/8$, $H \approx 0.54$, i.e., there is more redundancy, and it is then possible to find a coding scheme that uses only about 0.54 bits/pixel.


Remark: The maximum achievable compression is
$$C = \frac{\text{average bit rate of original raw data } (B)}{\text{average bit rate of encoded data } (R(L))}$$
Thus
$$\frac{B}{H + \epsilon} \le C \le \frac{B}{H}, \qquad \epsilon = 1/L$$
Since a certain amount of distortion is inevitable in any image transmission, it is necessary to find the minimum number of bits needed to encode the image while allowing a certain level of distortion.

Rate Distortion Function
Let $D$ be a fixed distortion between the actual values $x$ and the reproduced values $\hat{x}$. The question is: allowing a distortion $D$, what is the minimum number of bits required to encode the data? If we consider $x$ as a Gaussian r.v. with variance $\sigma_x^2$, $D$ is
$$D = E[(x - \hat{x})^2]$$

The rate distortion function is defined by
$$R_D = \begin{cases} \frac{1}{2}\log_2\dfrac{\sigma_x^2}{D} & 0 \le D \le \sigma_x^2 \\ 0 & D > \sigma_x^2 \end{cases} \;=\; \max\!\left[0, \frac{1}{2}\log_2\frac{\sigma_x^2}{D}\right]$$
At maximum distortion, $D \ge \sigma_x^2$, $R_D = 0$, i.e., no information needs to be transmitted.
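A short numerical version of this rate distortion function (illustrative only; `sigma_x2` and the distortion values are assumptions):

```python
import numpy as np

def rate_distortion(D, sigma_x2):
    """R_D = max(0, 0.5*log2(sigma_x^2 / D)) bits for a Gaussian source."""
    return np.maximum(0.0, 0.5 * np.log2(sigma_x2 / np.asarray(D, dtype=float)))

sigma_x2 = 4.0
print(rate_distortion([0.25, 1.0, 4.0, 8.0], sigma_x2))   # -> [2., 1., 0., 0.] bits
```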

Figure 4: Rate Distortion Function $R_D$ versus $D$.

$R_D$ gives the number of bits required for distortion $D$. Since $R_D$ represents the number of bits/pixel, $N = 2^{R_D} = \left(\sigma_x^2/D\right)^{1/2}$, and $D$ is considered to be the quantization noise variance. This variance can be minimized using the Lloyd-Max quantizer. In the transform domain we can assume that $x$ is white (e.g., due to the KL transform).

Pixel-Based Encoding

Encode each pixel ignoring inter-pixel dependencies. Among the methods are:

1. Entropy Coding
Every block of an image is entropy encoded based upon the $P_k$'s within the block. This produces a variable-length code for each block depending on the spatial activity within the block.

2. Run-Length Encoding
Scan the image horizontally or vertically and, while scanning, map each group of pixels with the same intensity to a pair $(g_i, l_i)$, where $g_i$ is the intensity and $l_i$ is the length of the "run". This method can also be used for detecting edges and boundaries of an object. It is mostly used for images with a small number of grey levels and is not effective for highly textured images.


Example 1: Consider the following 8 × 8 image:

4 4 4 4 4 4 4 0
4 5 5 5 5 5 4 0
4 5 6 6 6 5 4 0
4 5 6 7 6 5 4 0
4 5 6 6 6 5 4 0
4 5 5 5 5 5 4 0
4 4 4 4 4 4 4 0
4 4 4 4 4 4 4 0

The run-length pairs using the vertical (continuous top-down) scanning mode are:
(4,9) (5,5) (4,3) (5,1) (6,3) (5,1) (4,3) (5,1) (6,1) (7,1) (6,1) (5,1) (4,3) (5,1) (6,3) (5,1) (4,3) (5,5) (4,10) (0,8)
i.e., a total of 20 pairs = 40 numbers. Horizontal scanning would lead to 34 pairs = 68 numbers, which is more than the actual number of pixels (i.e., 64).
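A small sketch (added for illustration) that reproduces the run-length pairs of Example 1 for both scanning modes:

```python
from itertools import groupby
import numpy as np

def run_length_encode(seq):
    """(grey level, run length) pairs for a 1-D scan."""
    return [(int(g), len(list(r))) for g, r in groupby(seq)]

img = np.array([
    [4, 4, 4, 4, 4, 4, 4, 0],
    [4, 5, 5, 5, 5, 5, 4, 0],
    [4, 5, 6, 6, 6, 5, 4, 0],
    [4, 5, 6, 7, 6, 5, 4, 0],
    [4, 5, 6, 6, 6, 5, 4, 0],
    [4, 5, 5, 5, 5, 5, 4, 0],
    [4, 4, 4, 4, 4, 4, 4, 0],
    [4, 4, 4, 4, 4, 4, 4, 0],
])

vertical = run_length_encode(img.T.ravel())    # continuous top-down (column-major) scan
horizontal = run_length_encode(img.ravel())    # row-by-row scan
print(len(vertical), len(horizontal))          # 20 pairs vs 34 pairs
print(vertical)
```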


Example 2: Let the transition probabilities for run-length encoding of a binary image (0: black and 1: white) be $p_0 = P(0|1)$ and $p_1 = P(1|0)$. Assuming all runs are independent, find (a) the average run lengths, (b) the entropies of the white and black runs, and (c) the compression ratio.

Solution: A run of length $l \ge 1$ can be represented by a geometric r.v. $X_i$ with PMF $P(X_i = l) = p_i(1 - p_i)^{l-1}$, $i = 0, 1$, which corresponds to the first occurrence of a 0 or a 1 after $l$ independent trials. (Note that $1 - P(0|1) = P(1|1)$ and $1 - P(1|0) = P(0|0)$.) Thus, for the average we have
$$\mu_{X_i} = \sum_{l=1}^{\infty} l\,P(X_i = l) = \sum_{l=1}^{\infty} l\,p_i(1 - p_i)^{l-1}$$
which, using the series $\sum_{n=1}^{\infty} n a^{n-1} = \frac{1}{(1-a)^2}$, reduces to $\mu_{X_i} = \frac{1}{p_i}$. The entropy is given by
$$H_{X_i} = -\sum_{l=1}^{\infty} P(X_i = l)\log_2 P(X_i = l) = -p_i \sum_{l=1}^{\infty} (1 - p_i)^{l-1}\left[\log_2 p_i + (l-1)\log_2(1 - p_i)\right]$$

Using the same series formula, we get
$$H_{X_i} = -\frac{1}{p_i}\left[p_i\log_2 p_i + (1 - p_i)\log_2(1 - p_i)\right]$$
The achievable compression ratio is
$$C = \frac{H_{X_0} + H_{X_1}}{\mu_{X_0} + \mu_{X_1}} = \frac{H_{X_0} P_0}{\mu_{X_0}} + \frac{H_{X_1} P_1}{\mu_{X_1}}$$
where $P_i = \frac{p_i}{p_0 + p_1}$ are the a priori probabilities of the pixels.
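A quick numeric check of these formulas (hypothetical transition probabilities; $C$ is computed from the first form given above):

```python
import numpy as np

def run_stats(p0, p1):
    """Average run lengths, run entropies, and compression ratio C for
    geometric black/white runs with transition probabilities p0, p1."""
    Hb = lambda p: -(p * np.log2(p) + (1 - p) * np.log2(1 - p))  # binary entropy
    mu0, mu1 = 1.0 / p0, 1.0 / p1          # mu_Xi = 1/p_i
    H0, H1 = Hb(p0) / p0, Hb(p1) / p1      # H_Xi = -(1/p_i)[p_i log2 p_i + (1-p_i) log2(1-p_i)]
    C = (H0 + H1) / (mu0 + mu1)            # achievable bits/pixel (first form of C above)
    return mu0, mu1, H0, H1, C

print(run_stats(p0=0.1, p1=0.2))           # hypothetical transition probabilities
```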


3. Huffman Encoding consists of the following steps.

1 Arrange the messages with probabilities $P_k$ in decreasing order and consider them as "leaf nodes" of a tree.

2 Merge the two nodes with the smallest probabilities to form a new node whose probability is the sum of those of the two merged nodes. Go to Step 1 and repeat until only two nodes are left ("root nodes").

3 Arbitrarily assign 1’s and 0’s to each pair of branches merging into a node.

4 Read sequentially from the root node to each leaf node to form the associated code for each message.

Example 3: For the same image as in the previous example, which requires 3 bits/pixel using standard PCM, we can arrange the table on the next page.


Gray level   # occurrences   Pk      Ck     βk   Pk·βk   −Pk log2 Pk
0            8               0.125   0000   4    0.5     0.375
1            0               0       -      0    -       -
2            0               0       -      0    -       -
3            0               0       -      0    -       -
4            31              0.484   1      1    0.484   0.507
5            16              0.25    01     2    0.5     0.5
6            8               0.125   001    3    0.375   0.375
7            1               0.016   0001   4    0.064   0.095
Total        64              1                   R       H

Codewords $C_k$ are obtained by constructing the binary tree as in Fig. 5.

Figure 5: Tree Structure for Huffman Encoding.

Note that in this case, we have

$$R = \sum_{k=1}^{8} \beta_k P_k = 1.923 \ \text{bits/pixel}$$
$$H = -\sum_{k=1}^{8} P_k \log_2 P_k = 1.852 \ \text{bits/pixel}$$
Thus,
$$1.852 \le R = 1.923 \le H + \frac{1}{L} = 1.977$$
i.e., an average of 2 bits/pixel (instead of 3 bits/pixel using PCM) can be used to code the image. However, the drawback of the standard Huffman encoding method is that the codes have variable lengths.
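As an illustration (not part of the notes), a compact Huffman construction with Python's heapq for the probabilities of Example 3; tie-breaking may assign different codewords than Fig. 5, but the code lengths and the average rate agree:

```python
import heapq

def huffman_codes(probs):
    """Build Huffman codewords for a {symbol: probability} mapping."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)          # two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {4: 31/64, 5: 16/64, 0: 8/64, 6: 8/64, 7: 1/64}   # from Example 3
codes = huffman_codes(probs)
R = sum(probs[s] * len(c) for s, c in codes.items())
print(codes, R)    # average length close to the 1.92 bits/pixel found above
```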

Predictive Encoding

Idea: Remove mutual redundancy among successive pixels in a region of support (ROS) or neighborhood and encode only the new information. This method is based upon linear prediction. Let us start with 1-D linear predictors. An $N$th order linear prediction of $x(n)$ based on the $N$ previous samples is generated using a 1-D autoregressive (AR) model:

$$\hat{x}(n) = a_1 x(n-1) + a_2 x(n-2) + \cdots + a_N x(n-N)$$

The $a_i$'s are model coefficients determined from sample statistics. Now, instead of encoding $x(n)$, the prediction error $e(n) = x(n) - \hat{x}(n)$ is encoded, as it requires a substantially smaller number of bits. Then, at the receiver, we reconstruct $x(n)$ using the previously encoded values $x(n-k)$ and the encoded error, i.e.,
$$x(n) = \hat{x}(n) + e(n)$$
This method is also referred to as differential PCM (DPCM).
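A minimal 1-D DPCM sketch (my illustration, assuming a first-order predictor with coefficient $a$ and a uniform quantizer; the closed loop predicts from reconstructed samples, as in the 2-D scheme described later):

```python
import numpy as np

def dpcm_1d(x, a=0.95, step=0.5):
    """First-order closed-loop DPCM: predict from reconstructed samples,
    quantize the prediction error uniformly, reconstruct at the 'receiver'."""
    x = np.asarray(x, dtype=float)
    x_rec = np.zeros_like(x)     # reconstructed samples (available at both ends)
    e_q = np.zeros_like(x)       # quantized prediction errors (what gets transmitted)
    for n in range(len(x)):
        pred = a * x_rec[n - 1] if n > 0 else 0.0
        e = x[n] - pred                        # prediction error e(n)
        e_q[n] = step * np.round(e / step)     # uniform quantizer
        x_rec[n] = pred + e_q[n]               # x(n) ~ x_hat(n) + e(n)
    return e_q, x_rec

# Hypothetical correlated signal (e.g., one image scan line).
rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(0.0, 1.0, 256))
e_q, x_rec = dpcm_1d(x)
print(np.var(x), np.var(e_q))    # the error variance is far smaller than the signal variance
```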


Minimum Variance Prediction
The predictor
$$\hat{x}(n) = \sum_{i=1}^{N} a_i x(n-i)$$
is the best $N$th order linear mean-squared predictor of $x(n)$, i.e., it minimizes the MSE
$$\varepsilon = E\left[\left(x(n) - \hat{x}(n)\right)^2\right]$$

Minimizing with respect to the $a_k$'s results in the following "orthogonality property":
$$\frac{\partial \varepsilon}{\partial a_k} = -2E\left[\left(x(n) - \hat{x}(n)\right)x(n-k)\right] = 0, \quad 1 \le k \le N$$
which leads to the normal equations

$$r_{xx}(k) - \sum_{i=1}^{N} a_i\, r_{xx}(k-i) = \sigma_e^2\,\delta(k), \quad 0 \le k \le N$$
where $r_{xx}(k)$ is the autocorrelation of the data $x(n)$ and $\sigma_e^2$ is the variance of the driving process $e(n)$.


Plugging in the different values of $k \in [0, N]$ gives the AR Yule-Walker equations for solving for the $a_i$'s and $\sigma_e^2$, i.e.,

$$\begin{bmatrix} r_{xx}(0) & r_{xx}(1) & \cdots & r_{xx}(N) \\ r_{xx}(1) & r_{xx}(0) & \cdots & r_{xx}(N-1) \\ \vdots & & \ddots & \vdots \\ r_{xx}(N) & r_{xx}(N-1) & \cdots & r_{xx}(0) \end{bmatrix} \begin{bmatrix} 1 \\ -a_1 \\ \vdots \\ -a_N \end{bmatrix} = \begin{bmatrix} \sigma_e^2 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (1)$$

Note that the correlation matrix $R_x$ in this case is both Toeplitz and Hermitian. The solution to this system of linear equations is given by
$$\sigma_e^2 = \frac{1}{[R_x^{-1}]_{1,1}}, \qquad a_i = -\sigma_e^2\,[R_x^{-1}]_{i+1,1}$$
where $[R_x^{-1}]_{i,j}$ is the $(i,j)$th element of the matrix $R_x^{-1}$.
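In practice the Yule-Walker system is solved from autocorrelation estimates. A small numpy sketch (my illustration; it solves the equivalent $k = 1..N$ normal equations and then recovers $\sigma_e^2$ from the $k = 0$ equation):

```python
import numpy as np

def yule_walker(x, N):
    """Estimate AR(N) coefficients a_i and driving variance sigma_e^2 from data."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    L = len(x)
    r = np.array([np.dot(x[:L - k], x[k:]) / L for k in range(N + 1)])   # r_xx(0..N)
    R = np.array([[r[abs(i - j)] for j in range(N)] for i in range(N)])  # Toeplitz matrix
    a = np.linalg.solve(R, r[1:])          # r_xx(k) = sum_i a_i r_xx(k-i), k = 1..N
    sigma_e2 = r[0] - np.dot(a, r[1:])     # k = 0 equation
    return a, sigma_e2

# Verify on a synthetic AR(2) process with known coefficients.
rng = np.random.default_rng(2)
e = rng.normal(0, 1, 5000)
x = np.zeros(5000)
for n in range(2, 5000):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + e[n]
print(yule_walker(x, 2))                   # close to ([0.75, -0.5], 1.0)
```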


In the 2-D case, an AR model with a non-symmetric half-plane (NSHP) ROS is used. This ROS is shown in Fig. 6 when the image is scanned from left to right and top to bottom.

Figure 6: A 1st Order 2-D AR Model with NSHP ROS.

For a 1st order 2-D AR model,
$$x(m,n) = a_{01}x(m,n-1) + a_{11}x(m-1,n-1) + a_{10}x(m-1,n) + a_{1,-1}x(m-1,n+1) + e(m,n)$$
where the $a_{i,j}$'s are the model coefficients. Then, the best linear prediction of $x(m,n)$ is
$$\hat{x}(m,n) = a_{01}x(m,n-1) + a_{11}x(m-1,n-1) + a_{10}x(m-1,n) + a_{1,-1}x(m-1,n+1)$$


Note that at every pixel four previously scanned pixels are needed to generate predicted value xˆ(m, n). Fig. 7 shows those pixels that need to be stored in the global ”state vector” for this 1st order predictor.

Figure 7: Global State Vector.

Assuming that the reproduced (quantized) values up to $(m, n-1)$ are available, we generate
$$\hat{x}'(m,n) = a_{01}x'(m,n-1) + a_{11}x'(m-1,n-1) + a_{10}x'(m-1,n) + a_{1,-1}x'(m-1,n+1)$$
Then the prediction error is applied to the quantizer:
$$e(m,n) := x(m,n) - \hat{x}'(m,n) \quad \text{(quantizer input)}$$
The quantized value $e'(m,n)$ is encoded and transmitted. It is also used to generate the reproduced value using
$$x'(m,n) = e'(m,n) + \hat{x}'(m,n) \quad \text{(reproduced value)}$$
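A sketch of the closed-loop 2-D DPCM transmitter just described, with an assumed uniform quantizer, hypothetical NSHP coefficients, and a simple zero border rule (all of these are my assumptions, not part of the notes):

```python
import numpy as np

def dpcm2d_encode(img, a01, a11, a10, a1m1, step=4.0):
    """Closed-loop 2-D DPCM with a 1st-order NSHP predictor and a uniform quantizer.
    Returns the quantized errors e'(m,n) and the reproduced image x'(m,n)."""
    M, N = img.shape
    x_rep = np.zeros((M, N))      # reproduced values x'
    e_q = np.zeros((M, N))        # quantized prediction errors e'
    def xr(m, n):                 # reproduced value, zero outside the image (border rule)
        return x_rep[m, n] if 0 <= m < M and 0 <= n < N else 0.0
    for m in range(M):
        for n in range(N):
            pred = (a01 * xr(m, n - 1) + a11 * xr(m - 1, n - 1)
                    + a10 * xr(m - 1, n) + a1m1 * xr(m - 1, n + 1))
            e = img[m, n] - pred                    # quantizer input e(m,n)
            e_q[m, n] = step * np.round(e / step)   # quantized value e'(m,n)
            x_rep[m, n] = pred + e_q[m, n]          # reproduced value x'(m,n)
    return e_q, x_rep

# Hypothetical coefficients and a smooth test image; in practice the a_ij come
# from the 2-D normal equations.
mm, nn = np.meshgrid(np.arange(32), np.arange(32), indexing="ij")
img = 100 + 0.5 * mm + 0.3 * nn
e_q, x_rep = dpcm2d_encode(img, 0.7, -0.45, 0.7, 0.05)
print(np.var(img), np.var(e_q), np.abs(img - x_rep).max())  # |q(m,n)| <= step/2
```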

The entire process at the transmitter and receiver is depicted in Fig. 8. Clearly, it is assumed that the model coefficients are available at the receiver.

Figure 8: Block Diagram of 2-D Predictive Encoding System.

It is interesting to note that

$$q(m,n) := x(m,n) - x'(m,n) = e(m,n) - e'(m,n)$$
i.e., the PCM quantization error equals the DPCM quantization error.

However, for the same quantization error $q(m,n)$, DPCM requires far fewer bits.

Performance Analysis of DPCM
For straight PCM the rate distortion function is
$$R_{PCM} = \frac{1}{2}\log_2\left(\sigma_x^2/\sigma_q^2\right) \ \text{bits/pixel}$$
i.e., the number of bits required per pixel in the presence of a particular distortion $\sigma_q^2 = E[q^2(m,n)]$. For DPCM the rate distortion function is
$$R_{DPCM} = \frac{1}{2}\log_2\left(\sigma_e^2/\sigma_q^2\right) \ \text{bits/pixel}$$
for the same distortion. Clearly, $\sigma_e^2 \ll \sigma_x^2 \Rightarrow R_{DPCM} \ll R_{PCM}$. The bit reduction of DPCM over PCM is
$$R_{PCM} - R_{DPCM} = \frac{1}{2}\log_2\left(\sigma_x^2/\sigma_e^2\right) \approx \frac{1}{0.6}\log_{10}\left(\sigma_x^2/\sigma_e^2\right)$$
The achieved compression depends on the inter-pixel redundancy, i.e., for an image with no redundancy (random),
$$\sigma_x^2 = \sigma_e^2 \;\Rightarrow\; R_{PCM} = R_{DPCM}$$

Transform-Based Encoding

Idea: Reduce redundancy by applying a unitary transformation to blocks of an image; the redundancy-removed coefficients/features are then encoded. The process of transform-based encoding, or block quantization, is depicted in Fig. 9. The image is first partitioned into non-overlapping blocks. Each block is then unitary transformed, and the principal coefficients are quantized and encoded.

Figure 9: Transform-Based Encoding Process.

Q1: What are the best mapping matrices A and B, such that maximum redundancy removal is achieved and, at the same time, the distortion due to coefficient reduction is minimized?
Q2: What is the best quantizer that gives minimum quantization distortion?


Theorem: Let $x$ be a random vector representing blocks of an image and $y = Ax$ be its transformed version, with components $y(k)$ that are mutually uncorrelated. These components are quantized to $y^o$ and then encoded and transmitted. At the receiver the decoded values are reconstructed using a matrix $B$, i.e., $x^o = B y^o$. The objective is to find the optimum matrices $A$ and $B$ and the optimum quantizer such that

$$D = E\left[\| x - x^o \|^2\right]$$

is minimized. Then:
1. The optimum matrices are $A = \Psi^{*t}$ and $B = \Psi$, i.e., the KL transform pair.
2. The optimum quantizer is the Lloyd-Max quantizer.
Proof: See Jain's book (Ch. 11).
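The theorem identifies the KL transform with the Lloyd-Max quantizer as optimal. Purely as an illustration (my own sketch, with assumed names and parameters), the block-quantization pipeline of Fig. 9 can be mimicked with the DCT as a common practical stand-in for the KLT, a zonal mask retaining the low-frequency coefficients, and a plain uniform quantizer instead of Lloyd-Max:

```python
import numpy as np

def dct_matrix(N=8):
    """Orthonormal DCT-II matrix (a common practical stand-in for the KLT)."""
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] = np.sqrt(1.0 / N)
    return C

def block_quantize(img, zone=4, step=8.0, N=8):
    """Transform N x N blocks, keep a zonal set of low-frequency coefficients,
    uniformly quantize them, and reconstruct."""
    C = dct_matrix(N)
    mask = np.add.outer(np.arange(N), np.arange(N)) < zone   # upper-left zonal mask
    out = np.zeros_like(img, dtype=float)
    for i in range(0, img.shape[0], N):
        for j in range(0, img.shape[1], N):
            y = C @ img[i:i+N, j:j+N] @ C.T          # forward transform y = A x
            y_q = step * np.round(y / step) * mask   # quantize retained coefficients
            out[i:i+N, j:j+N] = C.T @ y_q @ C        # reconstruction x^o = B y^o
    return out

# Hypothetical smooth 64 x 64 test image (dimensions divisible by the block size).
m, n = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
img = 128 + 40 * np.sin(m / 10.0) + 30 * np.cos(n / 8.0)
rec = block_quantize(img)
print(np.mean((img - rec) ** 2))                     # average distortion D
```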


Bit Allocation
The goal is to optimally allocate a given total number of bits ($M$) to the $N$ (retained) components of $y^o$ so that the distortion

$$D = \frac{1}{N}\sum_{k=1}^{N} E\left[\left(y(k) - y^o(k)\right)^2\right] = \frac{1}{N}\sum_{k=1}^{N} \sigma_k^2\, f(m_k)$$

is minimized, where
$f(\cdot)$: quantizer distortion function,
$\sigma_k^2$: variance of coefficient $y(k)$,
$m_k$: number of bits allocated to $y^o(k)$.

Optimal bit allocation involves finding the $m_k$'s that minimize $D$ subject to $M = \sum_{k=1}^{N} m_k$. Note that coefficients with higher variance carry more information than those with lower variance; thus, more bits are assigned to them to improve performance. A code sketch of these allocation rules is given after strategy iii below.

i. Shannon's Allocation Strategy
$$m_k = m_k(\theta) = \max\left(0, \frac{1}{2}\log_2\frac{\sigma_k^2}{\theta}\right)$$
$\theta$: must be chosen to produce an average rate of $p = \frac{M}{N}$ bits per pixel (bpp).


ii. Segall Allocation Strategy
$$m_k(\theta) = \begin{cases} \frac{1}{1.78}\log_2\!\left(\frac{1.46\,\sigma_k^2}{\theta}\right) & 0.083\,\sigma_k^2 \ge \theta > 0 \\[4pt] \frac{1}{1.57}\log_2\!\left(\frac{\sigma_k^2}{\theta}\right) & \sigma_k^2 \ge \theta > 0.083\,\sigma_k^2 \\[4pt] 0 & \theta > \sigma_k^2 \end{cases}$$
where $\theta$ solves $\sum_{k=1}^{N} m_k(\theta) = M$.

iii. Huang/Schultheiss Allocation Strategy
This bit allocation approximates the optimal non-uniform allocation for Gaussian coefficients, giving
$$\hat{m}_k = \frac{M}{N} + \frac{1}{2}\log_2\sigma_k^2 - \frac{1}{2N}\sum_{i=1}^{N}\log_2\sigma_i^2, \qquad m_k = \mathrm{Int}[\hat{m}_k], \quad \text{with } M = \sum_{k=1}^{N} m_k \text{ fixed}.$$

Figs. 10 and 11 show the reconstructed images of Lena and Barbara using the Shannon ($\mathrm{SNR}_{Lena} = 20.55$ dB, $\mathrm{SNR}_{Barb} = 17.24$ dB) and Segall ($\mathrm{SNR}_{Lena} = 21.23$ dB, $\mathrm{SNR}_{Barb} = 16.90$ dB) bit allocation methods for an average of $p = 1.5$ bpp, together with the corresponding error images.
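Referring back to the allocation strategies above, here is the promised sketch (my own, with hypothetical coefficient variances): Shannon's rule with $\theta$ found by bisection, and the Huang/Schultheiss closed form with plain rounding (which may need a small adjustment to hit $M$ exactly):

```python
import numpy as np

def shannon_allocation(var, M):
    """Shannon rule: m_k(theta) = max(0, 0.5*log2(var_k/theta)),
    with theta found by bisection so that sum_k m_k ~ M."""
    bits = lambda theta: np.maximum(0.0, 0.5 * np.log2(var / theta))
    lo, hi = 1e-12, var.max()
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bits(mid).sum() > M:
            lo = mid              # too many bits -> raise theta
        else:
            hi = mid
    return bits(hi)

def huang_schultheiss_allocation(var, M):
    """m_hat_k = M/N + 0.5*log2 var_k - (1/2N) sum_i log2 var_i, rounded to integers.
    Rounding (and any negative values) may require a small adjustment to hit M."""
    m_hat = M / len(var) + 0.5 * np.log2(var) - np.mean(0.5 * np.log2(var))
    return np.rint(m_hat).astype(int)

var = np.array([400.0, 100.0, 25.0, 9.0, 4.0, 1.0])   # hypothetical coefficient variances
print(shannon_allocation(var, M=12))
print(huang_schultheiss_allocation(var, M=12))
```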


Figure 10: Reconstructed & Error Images, Shannon's allocation (1.5 bpp).



Figure 11: Reconstructed & Error Images, Segall's allocation (1.5 bpp).

