Lecture Notes of the Compression Series, 2005 - 2. Transform Coding

Prof. Amar Mukherjee Weifeng Sun Topics

z Introduction to Image Compression
z Transform Coding
z Subband Coding, Filter Banks
z Haar Transform
z SPIHT, EZW, JPEG 2000
z Wireless Video Compression

#2 Transform Coding

z Why transform coding?
z The purpose of the transformation is to convert the data into a form where compression is easier. The transformation maps correlated pixels into a representation where they are decorrelated. The new values are usually smaller on average than the original values, so the net effect is to reduce the redundancy of the representation.
z After transformation, the coefficients can be quantized according to their statistical properties, producing a much more compact representation of the original image data.

#3 Transform Coding Block Diagram

z Transmitter:

Original Image f(j,k) -> Segment into n*n Blocks -> Forward Transform F(u,v) -> Quantization and Coder F*(u,v) -> Channel

z Receiver:

Channel -> Decoder F*(u,v) -> Inverse Transform F(u,v) -> Combine n*n Blocks -> Reconstructed Image f(j,k)

#4 How Transform Coders Work

z Divide the image into 1x2 blocks of neighboring pixels (x1, x2).
z Typical transforms use 8x8 or 16x16 blocks.

#5 Joint Probability Distribution

z Observe the Joint Probability Distribution or the Joint Histogram.

[Figure: joint histogram (probability) of neighboring pixel values (x1, x2)]

#6 Pixel Correlation in Image[Amar]

z Rotate 45° clockwise:

Y = [Y1; Y2] = [cos 45°, sin 45°; -sin 45°, cos 45°] [X1; X2]

Source Image: Amar

Before Rotation After Rotation

#7 Pixel Correlation Map in [Amar] -- coordinate distribution

z Upper: Before Rotation z Lower: After Rotation

z Notice the variance of Y2 is smaller than the variance of X2. z Compression: apply entropy coder on Y2.

#8 Pixel Correlation in Image[Lenna]

z Let’s look at another example

Y = [Y1; Y2] = [cos 45°, sin 45°; -sin 45°, cos 45°] [X1; X2]

Source Image: Lenna

Before Rotation After Rotation

#9 Pixel Correlation Map in [Lenna] -- coordinate distribution

z Upper: Before Rotation z Lower: After Rotation

z Notice the variance of Y2 is smaller than the variance of X2. z Compression: apply entropy coder on Y2.

#10 Rotation Matrix

z Rotated 45 degrees clockwise

Y = [Y1; Y2] = AX = [cos 45°, sin 45°; -sin 45°, cos 45°] [X1; X2] = [√2/2, √2/2; -√2/2, √2/2] [X1; X2]

z Rotation matrix A

A = [cos 45°, sin 45°; -sin 45°, cos 45°] = [√2/2, √2/2; -√2/2, √2/2] = (√2/2) [1, 1; -1, 1]

#11 Orthogonal/orthonormal Matrix

z The rotation matrix is orthogonal:
z the dot product of a row with itself is nonzero: Ai · Aj ≠ 0 if i = j;
z the dot product of different rows is 0: Ai · Aj = 0 if i ≠ j.
z Furthermore, the rotation matrix is orthonormal: the dot product of a row with itself is 1, so Ai · Aj = 1 if i = j, and 0 otherwise.

z Example: A = (√2/2) [1, 1; -1, 1]

#12 Reconstruct the Image

z Goal: recover X from Y.
z Since Y = AX, we have X = A^-1 Y.
z Because the inverse of an orthonormal matrix is its transpose, A^-1 = A^T.
z So X = A^-1 Y = A^T Y.
z We have the inverse matrix:

A^-1 = A^T = [cos 45°, -sin 45°; sin 45°, cos 45°] = (√2/2) [1, -1; 1, 1]

#13 Energy Compaction

Y = [Y1; Y2] = A [X1; X2],  with A = (√2/2) [1, 1; -1, 1]

z The rotation matrix A compacts the energy into Y1.

z Energy here is measured as the variance (http://davidmlane.com/hyperstat/A16252.html):
σ² = (1/(N-1)) Σ_{i=1}^{N} (x_i - µ)², where µ is the mean
z Given X = [4; 5], then Y = AX = (√2/2) [1, 1; -1, 1] [4; 5] = (√2/2) [9; 1] = [6.364; 0.707]
z The total energy of X equals that of Y: 4² + 5² = 6.364² + 0.707² = 41.
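The rotation example above can be checked numerically. This is a minimal sketch (names like `transform`, `err_X`, `err_Y` are just illustrative, not from the slides):

```python
import math

# 45-degree clockwise rotation matrix from the slide
c = math.sqrt(2) / 2
A = [[c, c],
     [-c, c]]

def transform(M, x):
    """Multiply matrix M by column vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

X = [4, 5]
Y = transform(A, X)                      # approximately [6.364, 0.707]

energy_X = sum(v * v for v in X)         # 16 + 25 = 41
energy_Y = sum(v * v for v in Y)         # also 41: orthonormal transforms preserve energy

# Relative error from discarding the smallest-magnitude component:
err_X = min(abs(v) for v in X) ** 2 / energy_X   # 16/41, about 0.39
err_Y = min(abs(v) for v in Y) ** 2 / energy_Y   # 0.707^2/41, about 0.012
```

Discarding the small rotated coefficient costs far less energy than discarding either original pixel, which is the whole point of the rotation.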

z The transformation makes Y2 (0.707) very small.
z If we discard min{X}, the relative error is 4²/41 = 0.39.
z If we discard min{Y}, the relative error is 0.707²/41 = 0.012.
z Conclusion: we can discard min{Y} with much more confidence.

#14 Idea of Transform Coding

z Transform the input pixels X0, X1, X2, ..., Xn-1 into coefficients Y0, Y1, ..., Yn-1 (real values).
z The coefficients have the property that most of them are near zero: most of the "energy" is compacted into a few coefficients.
z Scalar quantize the coefficients.
z This is bit allocation: important coefficients should have more quantization levels.
z Entropy encode the quantization symbols.

#15 Forward transform (1D)

z Get the sequence Y from the sequence X.
z Each element of Y is a linear combination of elements in X:

Y_j = Σ_{i=0}^{n-1} a_{j,i} X_i,  j = 0, 1, ..., n-1

z In matrix form, Y = AX, where row j of A holds the weights a_{j,0}, ..., a_{j,n-1}. The rows of A are the basis vectors.

The elements of the matrix are also called the weights of the linear transform, and they should be independent of the data (except for the KLT).

#16 Choosing the Weights of the Basis Vector

z The general guideline for choosing the values of A is to make Y0 large while keeping Y1, ..., Yn-1 small.
z A coefficient Yi will be large if the weights a_{i,j} reinforce the corresponding data items X_j; this requires the weights and the data values to have similar signs. The converse is also true: Yi will be small if the weights and the data values have dissimilar signs.

#17 Extracting Features of Data

z Thus, the basis vectors should extract distinct features of the data vectors and must be independent (orthogonal). Note the pattern of +1 and -1 entries in the matrix: they are intended to pick up the low and high "frequency" components of the data.
z Normally, the coefficients decrease in magnitude in the order Y0, Y1, ..., Yn-1.
z So Y = AX is more amenable to compression than X.

#18 Energy Preserving (1D)

z Another consideration in choosing the rotation matrix is to conserve energy.
z For example, we have the orthogonal matrix

A = [1 1 1 1; 1 1 -1 -1; 1 -1 -1 1; 1 -1 1 -1],  X = [4; 6; 5; 2],  Y = AX = [17; 3; -5; 1]

z Energy before rotation: 4² + 6² + 5² + 2² = 81
z Energy after rotation: 17² + 3² + (-5)² + 1² = 324
z Energy changed!
z Solution: scale A by a scale factor (here 1/2, since each row has norm 2). The scaling does not change the fact that most of the energy is concentrated in the low-frequency components.
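The scaling fix can be verified directly. A small sketch with the numbers from this slide (the helper name `matvec` is illustrative):

```python
# The unscaled 4x4 matrix and data vector from the slide
A = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]
X = [4, 6, 5, 2]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

Y_unscaled = matvec(A, X)                 # [17, 3, -5, 1]
# sum of squares of X is 81; of Y_unscaled it is 324: energy not preserved

# Each row of A has norm 2, so A/2 is orthonormal:
A_scaled = [[v / 2 for v in row] for row in A]
Y = matvec(A_scaled, X)                   # [8.5, 1.5, -2.5, 0.5]
# sum of squares of Y is 81 again: energy preserved
```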

#19 Energy Preserving, Formal Proof

z The sum of the squares (the energy) of the transformed sequence is the same as the sum of the squares of the original sequence:

Σ_{i=0}^{n-1} Yi² = Y^T Y = (AX)^T (AX) = X^T A^T A X = X^T (A^T A) X = X^T X = Σ_{i=0}^{n-1} Xi²

since A^T A = I for an orthonormal matrix (see slide #12).
z Most of the energy is concentrated in the low-frequency coefficients.

#20 Why we are interested in the orthonormal matrix?

z Normally, it is computationally difficult to compute the inverse of a matrix.
z For an orthonormal matrix, the inverse of the transformation matrix is simply its transpose: A^-1 = A^T.
z Example: the 8x8 Walsh-Hadamard matrix A = (1/√8) H, where H has entries ±1. Since H is symmetric, B = A^-1 = A^T = A.

#21 Two Dimensional Transform

z From the input image I, we take the data block D. Given the transform matrix A:

I = [4 6 7 8; 5 2 4 4; 6 3 9 6; 7 5 6 9]

A = (1/2) [1 1 1 1; 1 1 -1 -1; 1 -1 -1 1; 1 -1 1 -1]

D = [4 7 6 9; 6 8 3 6; 5 4 7 6; 2 4 5 9]

z The 2-D transformation goes as:

Y = A D A^T = [22.75 -2.75 0.75 -3.75; 1.75 3.25 -0.25 -1.75; 0.25 -3.25 0.25 -2.25; 1.25 -1.25 0.75 1.75]

z Notice the energy compaction into the top-left corner.
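The 2-D transform Y = A D A^T above can be reproduced with plain matrix arithmetic. A minimal sketch (helper names `matmul`/`transpose` are illustrative):

```python
# Transform matrix and 4x4 data block from the slide
A = [[0.5, 0.5, 0.5, 0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5]]
D = [[4, 7, 6, 9],
     [6, 8, 3, 6],
     [5, 4, 7, 6],
     [2, 4, 5, 9]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(row) for row in zip(*M)]

Y = matmul(matmul(A, D), transpose(A))
# Y[0][0] is 22.75: most of the energy lands in the top-left coefficient
```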

#22 Two Dimensional Transform

z Because transformation matrix is orthonormal, AT = A−1 z So, we have z Forward transform Y = AXA−1 = AXAT

z Backward transform X = A−1YA = ATYA

#23 Linear Separable transform

z A two-dimensional transform can be simplified into two iterations of a one-dimensional transform:

Y_{k,l} = Σ_{i=0}^{N-1} Σ_{j=0}^{N-1} a_{k,i} X_{i,j} a_{l,j}

z That is, a column-wise transform followed by a row-wise transform.

#24 Transform and Filtering

z Consider the orthonormal transform A = (√2/2) [1, 1; -1, 1].
z If A is used to transform a vector of 2 identical elements x = [x, x]^T, the transformed sequence will be (√2 x, 0)^T, indicating that the "low frequency" or "average" value is √2 x and the "high frequency" component is 0, because the values do not vary.
z If x = [3, 1]^T or [3, -1]^T, the output sequence will be (2√2, -√2)^T or (√2, -2√2)^T, respectively. Now the high-frequency component is nonzero, and its magnitude is larger for [3, -1]^T, indicating a much larger variation. Thus, the two coefficients behave like the outputs of a "low-pass" and a "high-pass" filter, respectively.

#25 Transform and Functional Approximation

z A transform is a kind of function approximation.
z An image is a data set, and any data set is a function.
z A transform approximates the image function by a combination of simpler, well-defined "waveforms" (basis functions).
z Not all basis sets are equal in terms of compression.
z The DCT and similar real-valued transforms are computationally easier than the Fourier transform.

#26 Comparison of various transforms

z The KLT is optimal in the sense of decorrelation and energy packing.
z The Walsh-Hadamard Transform is especially easy to implement:
z its basis functions are either -1 or +1, so only additions/subtractions are necessary.

#27 Two-Dimensional Basis Matrix

z The outer product of two vectors V1 and V2 is defined as V1^T V2. For example,

V1^T V2 = [a; b; c] [d e f] = [ad ae af; bd be bf; cd ce cf]

z For a matrix A of size n*n, the outer product of the ith and jth rows is defined as

α_ij = [a_{i,0}; a_{i,1}; ...; a_{i,n-1}] [a_{j,0} a_{j,1} ... a_{j,n-1}] = [a_{i,0}a_{j,0}, a_{i,0}a_{j,1}, ..., a_{i,0}a_{j,n-1}; ...; a_{i,n-1}a_{j,0}, a_{i,n-1}a_{j,1}, ..., a_{i,n-1}a_{j,n-1}]

#28 Outer Product

z For example, if A = (√2/2) [1, 1; 1, -1], we have:

α00 = (√2/2)[1; 1] · (√2/2)[1 1] = (1/2) [1 1; 1 1]
α01 = (√2/2)[1; 1] · (√2/2)[1 -1] = (1/2) [1 -1; 1 -1]
α10 = (√2/2)[1; -1] · (√2/2)[1 1] = (1/2) [1 1; -1 -1]
α11 = (√2/2)[1; -1] · (√2/2)[1 -1] = (1/2) [1 -1; -1 1]

#29 Outer Product (2)

z We have shown earlier that X = A^T Y A.
z Consider X to be a 2*2 matrix:

[x00 x01; x10 x11] = (1/2) [1 1; 1 -1] [y00 y01; y10 y11] [1 1; 1 -1]

= (1/2) [y00+y10, y01+y11; y00-y10, y01-y11] [1 1; 1 -1]

= (1/2) [y00+y10+y01+y11, y00+y10-y01-y11; y00-y10+y01-y11, y00-y10-y01+y11]

= (1/2) { y00 [1 1; 1 1] + y01 [1 -1; 1 -1] + y10 [1 1; -1 -1] + y11 [1 -1; -1 1] }

= y00 α00 + y01 α01 + y10 α10 + y11 α11
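The expansion of X as a weighted sum of basis matrices can be cross-checked against the direct product X = A^T Y A. A sketch, with an arbitrary coefficient matrix Y chosen just for the check:

```python
import math

c = math.sqrt(2) / 2
A = [[c, c], [c, -c]]    # the 2x2 orthonormal matrix from slide 28

def outer(u, v):
    """Outer product of two vectors, as a matrix."""
    return [[ui * vj for vj in v] for ui in u]

# Basis matrices: alpha[i][j] = outer product of row i and row j of A
alpha = [[outer(A[i], A[j]) for j in range(2)] for i in range(2)]

Y = [[3.0, 1.0], [2.0, -1.0]]   # arbitrary coefficients

# X = sum over i,j of y_ij * alpha_ij
X = [[sum(Y[i][j] * alpha[i][j][r][s] for i in range(2) for j in range(2))
      for s in range(2)] for r in range(2)]

# Direct evaluation of X = A^T Y A for comparison
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

At = [[A[j][i] for j in range(2)] for i in range(2)]
X_direct = matmul(matmul(At, Y), A)
```

Both evaluations agree, confirming that the basis matrices span the reconstruction.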

#30 Basis Matrix

z The quantities α00, α01, α10, α11 are called the basis matrices in 2-D space.
z In general, the outer products of the rows of an n*n orthonormal matrix form a basis matrix set in 2 dimensions. The coefficient multiplying α00 is called the DC coefficient (note that all the elements of the DC basis matrix are 1, indicating an averaging operation); the other coefficients have alternating values and are called AC coefficients.

#31 Fast Cosine Transform

z 2D 8X8 basis functions of the DCT: z The horizontal frequency of the basis functions increases from left to right and the vertical frequency of the basis functions increases from top to bottom.


#32 Amplitude distribution of the DCT coefficients

z Histograms of 8x8 DCT coefficient amplitudes measured for natural images.
z The DC coefficient is typically uniformly distributed.
z The AC coefficients have a Laplacian distribution with zero mean.

#33 Discrete Cosine Transform (DCT)

z Conventional image data have reasonably high inter-element correlation.
z The DCT avoids the generation of the spurious spectral components that are a problem with the DFT, and it has a fast implementation that avoids complex algebra.

#34 One-dimensional DCT

z The basic idea is to decompose the image into a set of "waveforms", each with a particular "spatial" frequency.
z To human eyes, high spatial frequencies are imperceptible, and a good approximation of the image can be created by keeping only the lower frequencies.
z Consider the one-dimensional case first. The 8 arbitrary grayscale values (with range 0 to 255, shown in the next slide) are level shifted by 128 (as is done by JPEG).

#35 One-dimensional DCT

An example of 1-D DCT decomposition:

[Figure (b): the 8 input samples f(x), x = 1..8, before the DCT (image data).]
[Figure (c): the coefficients S(u), u = 1..8, after the DCT.]

[Figure: the 8 basis functions for the 1-D DCT, u = 0, 1, ..., 7.]

#36 One-dimensional DCT

z The waveforms can be denoted as w(f) = cos(fθ), with 0 ≤ θ ≤ π and frequencies f = 0, 1, ..., 7. Each wave is sampled at 8 points

θ = π/16, 3π/16, 5π/16, 7π/16, 9π/16, 11π/16, 13π/16, 15π/16

to form a basis vector.
z The eight basis vectors constitute a matrix A:

1      1      1      1      1      1      1      1
0.981  0.831  0.556  0.195 -0.195 -0.556 -0.831 -0.981
0.924  0.383 -0.383 -0.924 -0.924 -0.383  0.383  0.924
0.831 -0.195 -0.981 -0.556  0.556  0.981  0.195 -0.831
0.707 -0.707 -0.707  0.707  0.707 -0.707 -0.707  0.707
0.556 -0.981  0.195  0.831 -0.831 -0.195  0.981 -0.556
0.383 -0.924  0.924 -0.383 -0.383  0.924 -0.924  0.383
0.195 -0.556  0.831 -0.981  0.981 -0.831  0.556 -0.195
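The matrix above is just the sampled cosines w(f) = cos(f·θ_k) with θ_k = (2k+1)π/16. A short sketch that regenerates it:

```python
import math

# Row f holds cos(f * theta_k) sampled at theta_k = (2k+1)*pi/16, k = 0..7
A = [[math.cos(f * (2 * k + 1) * math.pi / 16) for k in range(8)]
     for f in range(8)]

# Row 0 is all ones (the DC basis vector); row 1 starts 0.981, 0.831, ...
# Row 4 alternates +/- 0.707; row 7 starts 0.195, matching the table above.
```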

#37 #38 One-dimensional DCT

z The output of the DCT transform is: S(u) = A[I(x)]T where A is the 8*8 transformation matrix defined in the previous slide, and I(x) is the input signal. z S(u) are called the coefficients for the DCT transform for input signal I(x).

#39 One-dimensional FDCT and IDCT with N=8 Sample points

z The 1-D DCT in JPEG is defined as:
z FDCT: S(u) = (C(u)/2) Σ_{x=0}^{7} I(x) cos[(2x+1)uπ/16],  for u = 0, 1, ..., 7
z IDCT: I(x) = Σ_{u=0}^{7} (C(u)/2) S(u) cos[(2x+1)uπ/16],  for x = 0, 1, ..., 7

z Where (u is frequency)
z I(x) is the 1-D sample
z S(u) is the 1-D DCT coefficient
z And C(u) = √2/2 for u = 0; C(u) = 1 for u > 0

#40 One-dimensional FDCT and IDCT with N Sample points

z The 1-D DCT in JPEG is defined as:
z FDCT: S(u) = √(2/N) C(u) Σ_{x=0}^{N-1} I(x) cos[(2x+1)uπ/(2N)],  for u = 0, 1, ..., N-1
z IDCT: I(x) = √(2/N) Σ_{u=0}^{N-1} C(u) S(u) cos[(2x+1)uπ/(2N)],  for x = 0, 1, ..., N-1

z Where (u is frequency)
z I(x) is the 1-D sample
z S(u) is the 1-D DCT coefficient, and C(u) = √2/2 for u = 0; C(u) = 1 for u > 0

#41 One-dimensional FDCT and IDCT

z As an example, let I(x) = [12, 10, 8, 10, 12, 10, 8, 11]. Then

S(u) = A[I(x)]^T = [28.6375, 0.5712, 0.4619, 1.757, 3.182, -1.729, 0.191, -0.309]

z If we now apply the IDCT, we get back I(x). We can quantize the coefficients S(u) and still obtain a very good approximation of I(x).
z For example, IDCT(28.6, 0.6, 0.5, 1.8, 3.2, -1.8, 0.2, -0.3) = (12.0254, 10.0233, 7.96054, 9.93097, 12.0164, 9.9932, 7.99354, 10.9989)
z While IDCT(28, 0, 0, 2, 3, -2, 0, 0) = (11.236, 9.6244, 7.6628, 9.573, 12.347, 10.014, 8.053, 10.684)
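The N=8 FDCT/IDCT pair from the previous slides, applied to this example. A minimal sketch:

```python
import math

def C(u):
    return math.sqrt(2) / 2 if u == 0 else 1.0

def fdct(I):
    """8-point FDCT: S(u) = (C(u)/2) * sum_x I(x) cos[(2x+1)u*pi/16]."""
    return [C(u) / 2 * sum(I[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                           for x in range(8)) for u in range(8)]

def idct(S):
    """8-point IDCT: I(x) = sum_u (C(u)/2) * S(u) cos[(2x+1)u*pi/16]."""
    return [sum(C(u) / 2 * S[u] * math.cos((2 * x + 1) * u * math.pi / 16)
                for u in range(8)) for x in range(8)]

I = [12, 10, 8, 10, 12, 10, 8, 11]
S = fdct(I)      # S[0] is about 28.6375, as on the slide
R = idct(S)      # round trip recovers I exactly (up to float error)

# Coarsely quantized coefficients still reconstruct I reasonably well:
approx = idct([28, 0, 0, 2, 3, -2, 0, 0])
```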

#42 Two-dimensional FDCT and IDCT for 8X8 block

z The 2-D DCT in JPEG is defined as:
z FDCT: S(v,u) = (C(v)/2)(C(u)/2) Σ_{y=0}^{7} Σ_{x=0}^{7} I(y,x) cos[(2y+1)vπ/16] cos[(2x+1)uπ/16]
z IDCT: I(y,x) = Σ_{v=0}^{7} Σ_{u=0}^{7} (C(v)/2)(C(u)/2) S(v,u) cos[(2y+1)vπ/16] cos[(2x+1)uπ/16]
z Where
z I(y,x) is the 2-D sample
z S(v,u) is the 2-D DCT coefficient
z And C(u) = √2/2 for u = 0, C(u) = 1 for u > 0; C(v) = √2/2 for v = 0, C(v) = 1 for v > 0

#43 Fast Cosine Transform

z 2D 8X8 basis functions of the DCT: z The horizontal frequency of the basis functions increases from left to right and the vertical frequency of the basis functions increases from top to bottom.


#44 Two-dimensional FDCT for NXM block

z FDCT: S(v,u) = (2/√(NM)) C(v) C(u) Σ_{y=0}^{M-1} Σ_{x=0}^{N-1} I(y,x) cos[(2y+1)vπ/(2M)] cos[(2x+1)uπ/(2N)]

for u = 0, 1, ..., N-1 and v = 0, 1, ..., M-1

z Where I(y,x) is the 2-D sample, S(v,u) is the 2-D DCT coefficient, and C(v) = √2/2 for v = 0, C(v) = 1 for v > 0; C(u) = √2/2 for u = 0, C(u) = 1 for u > 0

#45 Separable Function

z The 2-D FDCT is a separable function because we can express the formula as

S(v,u) = √(2/M) C(v) Σ_{y=0}^{M-1} { √(2/N) C(u) Σ_{x=0}^{N-1} I(y,x) cos[(2x+1)uπ/(2N)] } cos[(2y+1)vπ/(2M)]

for u = 0, 1, ..., N-1 and v = 0, 1, ..., M-1

This means that we can compute a 2-D FDCT by first computing a row-wise 1-D FDCT and then performing a column-wise 1-D FDCT on the output. This is possible, as we explained earlier, because the basis matrices are orthonormal.

#46 Two-dimensional IDCT for NXM block

I(y,x) = (2/√(NM)) Σ_{v=0}^{M-1} Σ_{u=0}^{N-1} C(v) C(u) S(v,u) cos[(2y+1)vπ/(2M)] cos[(2x+1)uπ/(2N)]

for x = 0, 1, ..., N-1 and y = 0, 1, ..., M-1.

This function is again separable, and the computation can be done in two steps: a row-wise 1-D IDCT followed by a column-wise 1-D IDCT. A straightforward algorithm to compute either the FDCT or the IDCT needs (for a block of NXN pixels) on the order of O(N³) multiplication/addition operations. After the ground-breaking discovery of the O(N log N) FFT algorithm for the DFT, several researchers proposed efficient algorithms for DCT computation.

#47 JPEG Introduction - The background

z JPEG stands for Joint Photographic Experts Group
z A standard image compression method is needed to enable interoperability of equipment from different manufacturers
z It is the first international digital image compression standard for continuous-tone images (grayscale or color)
z The history of JPEG: the selection process

#48 JPEG Introduction – what’s the objective?

z "very good" or "excellent" compression rate, reconstructed image quality, and transmission rate
z be applicable to practically any kind of continuous-tone digital source image
z have manageable computational complexity
z have the following modes of operation:
z sequential encoding
z progressive encoding
z lossless encoding
z hierarchical encoding

#49 JPEG Architecture Standard

Source image data -> encoder -> compressed image data -> decoder -> reconstructed image data

Image compression system

Source image data -> statistical model -> descriptors -> entropy encoder -> compressed image data
(the encoder uses model tables and entropy coding tables)

The basic parts of a JPEG encoder

#50 JPEG Overview (cont.)

JPEG has the following Operation Modes:

„ Sequential DCT-based mode

„ Progressive DCT-based mode

„ Sequential lossless mode

„ Hierarchical mode

JPEG entropy coding supports:

„ Huffman encoding

„ Arithmetic encoding

#51 JPEG Baseline System

JPEG Baseline system is composed of:

„ Sequential DCT-based mode

„ 8×8 blocks, DCT-based encoder

Source image data -> FDCT -> quantizer -> statistical model -> entropy encoder -> compressed image data
(the quantizer and the entropy encoder each use a table specification)

The basic architecture of the JPEG Baseline system

#52 The Baseline System

DC coefficient

„ The DCT coefficient values can be regarded as the relative amounts of the 2-D spatial frequencies contained in the 8×8 block

„ The upper-left corner coefficient is called the DC coefficient, which is a measure of the average energy of the block

„ The other coefficients are called AC coefficients; coefficients corresponding to high frequencies tend to be zero or near zero for most natural images

#53 Quantization Tables in DCT

z Human eyes are less sensitive to high frequencies.
z We use different quantization values for different frequencies:
z higher frequency, bigger quantization value;
z lower frequency, smaller quantization value.
z Each DCT coefficient corresponds to a certain frequency.
z The high-level HVS is much more sensitive to variations in the achromatic channel than in the chromatic channels.

#54 The Baseline System – Quantization

z Why quantization?
z To achieve further compression by representing DCT coefficients with no greater precision than is necessary to achieve the desired image quality.
z Generally, the "high frequency coefficients" have larger quantization values.
z Quantization makes most coefficients zero, which makes the compression system efficient, but it is the main source that makes the system "lossy" and introduces distortion in the reconstructed image.
z The quantization step-size parameters for JPEG are given by two Quantization Matrices, each element of the matrix being an integer between 1 and 255. The DCT coefficients are divided by the corresponding quantization parameters and rounded to the nearest integer.

#55 Quantization

F'(u,v) = Round( F(u,v) / Q(u,v) )

F(u,v): original DCT coefficient
F'(u,v): DCT coefficient after quantization
Q(u,v): quantization value

There are two quantization tables: one for the luminance component and the other for the chrominance component. The JPEG standard does not prescribe the values of the tables; that is left up to the user, but it recommends the following two tables under normal circumstances. These tables were created by analyzing data on human perception, and a lot of trial and error.

#56 Quantization Tables in DCT

z So, we have two quantization tables. z Measured for an “average” person. z Higher frequency, bigger quantization value. z Lower frequency, smaller quantization value.

Luminance quantization table:
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68  109 103 77
24  35  55  64  81  104 113 92
49  64  78  87  103 121 120 101
72  92  95  98  112 100 103 99

Chrominance quantization table:
17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
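A sketch of the quantization step F'(u,v) = Round(F(u,v)/Q(u,v)) using the luminance table. (Python's `round` uses round-half-to-even rather than the exact rounding rule of a real codec, which is fine for a sketch; the function names are illustrative.)

```python
# Recommended JPEG luminance quantization table (from the slide)
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(F, Q):
    """F'(u,v) = Round(F(u,v) / Q(u,v)); high-frequency entries mostly become 0."""
    return [[round(f / q) for f, q in zip(frow, qrow)]
            for frow, qrow in zip(F, Q)]

def dequantize(Fq, Q):
    """Decoder side: multiply back by the step size (precision is lost)."""
    return [[f * q for f, q in zip(frow, qrow)]
            for frow, qrow in zip(Fq, Q)]

# Example: a coefficient of 100 survives coarsely at DC but is crushed at high frequency
F = [[100] * 8 for _ in range(8)]
Fq = quantize(F, LUMA_Q)     # Fq[0][0] == 6, Fq[7][7] == 1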

#57 Two-dimensional DCT

z The image samples are shifted from unsigned integers with range [0, 2^P - 1] to signed integers with range [-2^(P-1), 2^(P-1) - 1], where P is the sample precision. Thus samples in the range 0 to 255 are shifted to the range -128 to 127, and those in the range 0 to 4095 to the range -2048 to 2047. This zero-shift is done in JPEG to reduce the internal precision requirements of the DCT calculations.
z How to interpret the DCT coefficients?
z The DCT coefficient values can be regarded as the relative amounts of the 2-D spatial frequencies contained in the 8×8 block.
z F(0,0) is called the DC coefficient; it is a measure of the average energy of the block.
z The other coefficients are called AC coefficients; coefficients corresponding to high frequencies tend to be zero or near zero for most images.
z The energy is concentrated in the upper-left corner.

#58 Baseline System - DC coefficient coding

z After the quantization step, the DC coefficients are encoded separately using differential encoding. The DC difference sequence consists of the DC coefficient of the first block followed by the difference values of the DC coefficients of the succeeding blocks. The average values of succeeding blocks are usually correlated.

quantized DC coefficients -> DPCM -> DC differences

Differential pulse code modulation

#59 Baseline System - AC coefficient coding

z AC coefficients are arranged into a zig-zag sequence, because most of the significant coefficients are located in the upper-left corner of the matrix. The high-frequency components are mostly 0's and can be efficiently coded by run-length encoding.

Zig-zag scan order (horizontal frequency increases left to right, vertical frequency top to bottom):

0  1  5  6  14 15 27 28
2  4  7  13 16 26 29 42
3  8  12 17 25 30 41 43
9  11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
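The scan order above can be generated by walking the anti-diagonals of the block, alternating direction. A minimal sketch (the function name is illustrative):

```python
def zigzag_order(n=8):
    """Return the (y, x) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                # s = y + x indexes the anti-diagonal
        diag = [(y, s - y) for y in range(n) if 0 <= s - y < n]
        if s % 2 == 0:
            diag.reverse()                    # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

# Build the index table shown above: index[y][x] = position in the scan
order = zigzag_order()
index = [[0] * 8 for _ in range(8)]
for i, (y, x) in enumerate(order):
    index[y][x] = i
# index[0] == [0, 1, 5, 6, 14, 15, 27, 28], matching the first row of the table
```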

#60 An Example (Ref: T. Acharya, P. Tsai, "JPEG 2000 Standard for Image Compression", Wiley-Interscience, 2005)

The 8X8 Image Data Block:
110 110 118 118 121 126 131 131
108 111 125 122 120 125 134 135
106 119 129 127 125 127 138 144
110 126 130 133 133 131 141 148
115 116 119 120 122 125 137 139
115 106 99  110 107 116 130 127
110 91  82  101 99  104 120 118
103 76  70  95  92  91  107 106

#61 #62 Baseline System - Statistical modeling

z Statistical modeling translates the inputs into a sequence of "symbols" for the Huffman coder to use.
z Statistical modeling on DC coefficients:
z Category: size class (SSSS)
z Magnitude: amplitude of the difference (additional bits)

#63 Coding the DC coefficients

The prediction errors are classified into 16 categories, modulo 2^16. Category c contains the range of integers [-(2^c - 1), +(2^c - 1)], excluding the numbers in the range [-(2^(c-1) - 1), +(2^(c-1) - 1)], which fall in the previous category. The category number is encoded using either a fixed code (SSSS), a unary code (0, 10, 110, ...), or a variable-length Huffman code.

#64 Coding the DC/AC coefficients

If the error is positive, its magnitude is encoded using 'category' number of bits with a leading '1'. If it is negative, the 1's complement of the positive number is used. Thus, if the error is -6 and the Huffman code for category 3 is 1110, it gets the code (1110 001). The JPEG standard recommends two Huffman codes for the category numbers: one for luminance DC and the other for chrominance DC (see Salomon, p. 289; this is also described in Annex K, Tables K.3 and K.4, of the standard).
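The category/extra-bits scheme for DC differences can be sketched in a few lines (the Huffman code for the category itself comes from the recommended tables and is not shown; function names are illustrative):

```python
def dc_category(diff):
    """Category c is the number of bits needed for |diff| (0 when diff == 0)."""
    return abs(diff).bit_length()

def extra_bits(diff):
    """Positive values: plain binary with leading 1. Negative: one's complement."""
    c = dc_category(diff)
    if c == 0:
        return ""                      # category 0 carries no extra bits
    if diff > 0:
        return format(diff, "0{}b".format(c))
    # one's complement of |diff| on c bits: diff + 2^c - 1
    return format(diff + (1 << c) - 1, "0{}b".format(c))

# The slide's example: error -6 is category 3, extra bits '001',
# so with Huffman code 1110 for category 3 the full code is 1110 001.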

z Statistical modeling on AC coefficients: z Run-length: RUN-SIZE=16*RRRR+SSSS z Amplitude: amplitude of difference (additional bits)

#65 The AC coefficients contain just a few nonzero numbers, with runs of zeros and a long run of trailing zeros. For each nonzero number, two values are generated: 1) RRRR, the number of consecutive zeros preceding the nonzero coefficient being encoded; 2) SSSS, the category giving the number of bits required to encode the coefficient's amplitude. The combination RUN-SIZE = 16*RRRR + SSSS is encoded with a variable-length Huffman code, specified in the form of a table (see next slide). The second attribute of the AC coefficient is its amplitude, which is encoded exactly the same way as the DC values. Note that the code (0,0) is reserved for EOB (signifying that the remaining AC coefficients are just a single trailing run of 0's). The code (15,0) is reserved for a maximum run of 16 zeros; a longer run is broken down into runs of 16 plus possibly a single run shorter than 16. JPEG recommends two Huffman code tables for AC coefficients, Tables K.5 and K.6, for luminance and chrominance values.

#66 Encoding the AC Coefficients

[Table: Huffman AC statistical model - run-length/amplitude combinations used for Huffman coding of AC coefficients]

#67 Example (Contd.)

z Let us finish our example. The DC coefficient -6 has the code 1110001. The AC coefficients will be encoded as:
z (0,3)(-6), (0,3)(6), (0,3)(-5), (1,2)(2), (1,1)(-1), (5,1)(-1), (2,1)(-1), (0,1)(1), (0,0). Table K.5 gives the codes
z 1010, 00, 100, 1100, 11011, 11100 and 11110101 for the symbol pairs (0,0), (0,1), (0,3), (1,1), (1,2), (2,1) and (5,1), respectively. The variable-length codes for the AC amplitudes 1, -1, 2, -5, 6 and -6 are 1, 0, 10, 010, 110 and 001, respectively. Thus JPEG gives a compressed representation of the example image using only 55 bits:
z 1110001 100001 100110 100010 1101110 11000 11110100 111000 001 1010
z The uncompressed representation would have required 512 bits!

#68 Example to illustrate encoding/decoding

#69 JPEG Progressive Model

z Why a progressive model?
z Quick transmission
z The image is built up in coarse-to-fine passes
z First stage: encode a rough but recognizable version of the image
z Later stage(s): the image is refined by successive scans until the final image is obtained
z Two ways to do this:
z Spectral selection: send DC and AC coefficients separately
z Successive approximation: send the most significant bits first, then the least significant bits

#70 JPEG Lossless Model

Sample values -> DPCM -> Descriptors

Differential pulse code modulation

Predictors for lossless coding (neighborhood: C B above, A X in the current row, where X is the sample being predicted):

selection value   prediction strategy
0                 no prediction
1                 A
2                 B
3                 C
4                 A + B - C
5                 A + (B - C)/2
6                 B + (A - C)/2
7                 (A + B)/2

#71 JPEG Hierarchical Model

z The hierarchical model is an alternative to the progressive model (a pyramid).
z Steps:
z filter and down-sample the original image by the desired multiples of 2 in each dimension
z encode the reduced-size image using one of the above coding modes
z use the up-sampled image as a prediction of the original at the next resolution, and encode the difference
z repeat until the full-resolution image has been encoded

#72 The Effect of Segmentation

z The image samples are grouped into 8×8 blocks. 2-D DCT is applied on each 8×8 blocks.

z Because of blocking, the spatial frequencies in the image and the spatial frequencies of the cosine basis functions are not precisely equivalent. According to Fourier's theorem, all the harmonics of the fundamental frequencies must be present in the basis functions to be precise. Nonetheless, the relationship between DCT frequency and spatial frequency is a close approximation if we take into account the sensitivity of the human eye for detecting contrast in frequency.
z The segmentation also introduces what are called "blocking artifacts". These become very pronounced if the DC coefficients vary considerably from block to block. The artifacts appear as abrupt edges in the image, and abrupt edges imply high frequency. The effect can be minimized if the non-zero AC coefficients are kept.

#73 Some other transforms

z Discrete Fourier Transform (DFT) z Haar Transform z Karhunen Loève Transform (KLT) z Walsh-Hadamard Transform (WHT)

#74 Discrete Fourier Transform (DFT)

z Well-known for its connection to spectral analysis and filtering.
z Extensive study done on its fast implementation (O(N log2 N) for an N-point DFT).
z Has the disadvantage of storage and manipulation of complex quantities, and the creation of spurious spectral components due to the assumed periodicity of image blocks.

#75 Haar Transform

z A very fast transform; among the easiest to implement.
z Useful in edge detection, image coding, and image analysis problems.
z Energy compaction is only fair, so it does not yield the best compression.

[Figure: 2-D basis functions of the Haar transform]

#76 Karhunen Loève Transform (KLT)

z Karhunen Loève Transform (KLT) yields decorrelated transform coefficients. z Basis functions are eigenvectors of the covariance matrix of the input signal. z KLT achieves optimum energy concentration. z Disadvantages: z KLT dependent on signal statistics z KLT not separable for image blocks z Transform matrix cannot be factored into sparse matrices

#77 Walsh-Hadamard Transform (WHT)

z Although far from optimum in an energy-packing sense for typical imagery, its simple implementation (basis functions are either -1 or +1) has made it widely popular.

Transformation matrix: A = (1/√8) H, where H is the 8x8 Walsh-Hadamard matrix with entries ±1.

#78 Walsh-Hadamard Transform (WHT)

z The Walsh-Hadamard transform requires only additions and subtractions.
z Nevertheless, advances in high-speed signal processing and fast DCT algorithms have reduced its use.

#79 Transform Coding: Summary

z Purpose of a transform:
z de-correlation
z energy concentration
z KLT is optimum, but signal dependent and, hence, without a fast algorithm
z DCT reduces the blocking artifacts that appear with the DFT
z Threshold coding + zig-zag scan + 8x8 block size is widely used today
z JPEG, MPEG, ITU-T H.263
z Fast algorithm for the scaled 8-point DCT: 5 multiplications, 29 additions
z Audio coding: MP3 = MPEG-1 Layer 3 uses the DCT

#80