Lecture Notes of the Compression Series, 2005 - 2. Transform Coding

Prof. Amar Mukherjee Weifeng Sun Topics

z Introduction to Image Compression
z Transform Coding
z Subband Coding, Filter Banks
z Haar Transform
z SPIHT, EZW, JPEG 2000
z Wireless Video Compression

#2 Transform Coding

z Why transform coding?
z The purpose of the transformation is to convert the data into a form where compression is easier. The transformation maps correlated pixels into a representation where they are decorrelated. The new values are usually smaller on average than the original values, so the net effect is to reduce the redundancy of the representation.
z After transformation, the coefficients can be quantized according to their statistical properties, producing a much more compact representation of the original image data.

#3 Transform Coding Block Diagram

z Transmitter:

Original Image f(j,k) -> Segment into n*n Blocks -> Forward Transform F(u,v) -> Quantization and Coder F*(u,v) -> Channel

z Receiver:

Channel -> Decoder F*(u,v) -> Inverse Transform F(u,v) -> Combine n*n Blocks -> Reconstructed Image f(j,k)

#4 How Transform Coders Work

z Divide the image into 1x2 blocks of neighboring pixels (x1, x2).
z Typical transforms use 8x8 or 16x16 blocks.

#5 Joint Probability Distribution

z Observe the Joint Probability Distribution or the Joint Histogram.

[Figure: joint histogram (probability) of neighboring pixel values (x1, x2)]

#6 Pixel Correlation in Image[Amar]

z Rotate 45° clockwise:

Y = [Y1; Y2] = [cos 45°, sin 45°; -sin 45°, cos 45°] [X1; X2]

Source Image: Amar

Before Rotation After Rotation

#7 Pixel Correlation Map in [Amar] -- coordinate distribution

z Upper: Before Rotation z Lower: After Rotation

z Notice the variance of Y2 is smaller than the variance of X2. z Compression: apply entropy coder on Y2.

#8 Pixel Correlation in Image[Lenna]

z Let’s look at another example

Y = [Y1; Y2] = [cos 45°, sin 45°; -sin 45°, cos 45°] [X1; X2]

Source Image: Lenna

Before Rotation After Rotation

#9 Pixel Correlation Map in [Lenna] -- coordinate distribution

z Upper: Before Rotation z Lower: After Rotation

z Notice the variance of Y2 is smaller than the variance of X2. z Compression: apply entropy coder on Y2.

#10 Rotation Matrix

z Rotated 45 degrees clockwise

Y = [Y1; Y2] = AX = [cos 45°, sin 45°; -sin 45°, cos 45°] [X1; X2] = [√2/2, √2/2; -√2/2, √2/2] [X1; X2]

z Rotation matrix A

A = [cos 45°, sin 45°; -sin 45°, cos 45°] = [√2/2, √2/2; -√2/2, √2/2] = (√2/2) [1, 1; -1, 1]

#11 Orthogonal/orthonormal Matrix

z The rotation matrix is orthogonal:
z the dot product of a row with itself is nonzero: Ai · Aj ≠ 0 if i = j;
z the dot product of different rows is 0: Ai · Aj = 0 if i ≠ j.
z Furthermore, the rotation matrix is orthonormal: the dot product of a row with itself is 1, so Ai · Aj = 1 if i = j, and 0 otherwise.

z Example: A = (√2/2) [1, 1; -1, 1]

#12 Reconstruct the Image

z Goal: recover X from Y.
z Since Y = AX, we have X = A^-1 Y.
z Because the inverse of an orthonormal matrix is its transpose, A^-1 = A^T.
z So X = A^-1 Y = A^T Y.
z We have the inverse matrix:

A^-1 = A^T = [cos 45°, -sin 45°; sin 45°, cos 45°] = (√2/2) [1, -1; 1, 1]

#13 Energy Compaction

Y = [Y1; Y2] = A [X1; X2],  with A = (√2/2) [1, 1; -1, 1]

z The rotation matrix A compacts the energy into Y1.

z Energy here is measured as the variance (http://davidmlane.com/hyperstat/A16252.html):
σ² = (1/(N-1)) Σ_{i=1}^{N} (x_i - µ)², where µ is the mean
z Given X = [4; 5], then Y = AX = (√2/2) [1, 1; -1, 1] [4; 5] = (√2/2) [9; 1] = [6.364; 0.707]
z The total energy of X equals that of Y: 4² + 5² = 6.364² + 0.707² = 41.
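The rotation example above can be checked numerically. This is a minimal sketch (names like `transform`, `err_X`, `err_Y` are just illustrative, not from the slides):

```python
import math

# 45-degree clockwise rotation matrix from the slide
c = math.sqrt(2) / 2
A = [[c, c],
     [-c, c]]

def transform(M, x):
    """Multiply matrix M by column vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

X = [4, 5]
Y = transform(A, X)                      # approximately [6.364, 0.707]

energy_X = sum(v * v for v in X)         # 16 + 25 = 41
energy_Y = sum(v * v for v in Y)         # also 41: orthonormal transforms preserve energy

# Relative error from discarding the smallest-magnitude component:
err_X = min(abs(v) for v in X) ** 2 / energy_X   # 16/41, about 0.39
err_Y = min(abs(v) for v in Y) ** 2 / energy_Y   # 0.707^2/41, about 0.012
```

Discarding the small rotated coefficient costs far less energy than discarding either original pixel, which is the whole point of the rotation.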

z The transformation makes Y2 (0.707) very small.
z If we discard min{X}, the relative error is 4²/41 = 0.39.
z If we discard min{Y}, the relative error is 0.707²/41 = 0.012.
z Conclusion: we can discard min{Y} with much more confidence.

#14 Idea of Transform Coding

z Transform the input pixels X0, X1, X2, ..., Xn-1 into coefficients Y0, Y1, ..., Yn-1 (real values).
z The coefficients have the property that most of them are near zero: most of the "energy" is compacted into a few coefficients.
z Scalar quantize the coefficients.
z This is bit allocation: important coefficients should have more quantization levels.
z Entropy encode the quantization symbols.

#15 Forward transform (1D)

z Get the sequence Y from the sequence X.
z Each element of Y is a linear combination of elements in X:

Y_j = Σ_{i=0}^{n-1} a_{j,i} X_i,  j = 0, 1, ..., n-1

z In matrix form, Y = AX, where row j of A holds the weights a_{j,0}, ..., a_{j,n-1}. The rows of A are the basis vectors.

The elements of the matrix are also called the weights of the linear transform, and they should be independent of the data (except for the KLT).

#16 Choosing the Weights of the Basis Vector

z The general guideline for choosing the values of A is to make Y0 large while keeping Y1, ..., Yn-1 small.
z A coefficient Yi will be large if the weights a_{i,j} reinforce the corresponding data items X_j; this requires the weights and the data values to have similar signs. The converse is also true: Yi will be small if the weights and the data values have dissimilar signs.

#17 Extracting Features of Data

z Thus, the basis vectors should extract distinct features of the data vectors and must be independent (orthogonal). Note the pattern of +1 and -1 entries in the matrix: they are intended to pick up the low and high "frequency" components of the data.
z Normally, the coefficients decrease in magnitude in the order Y0, Y1, ..., Yn-1.
z So Y = AX is more amenable to compression than X.

#18 Energy Preserving (1D)

z Another consideration in choosing the rotation matrix is to conserve energy.
z For example, we have the orthogonal matrix

A = [1 1 1 1; 1 1 -1 -1; 1 -1 -1 1; 1 -1 1 -1],  X = [4; 6; 5; 2],  Y = AX = [17; 3; -5; 1]

z Energy before rotation: 4² + 6² + 5² + 2² = 81
z Energy after rotation: 17² + 3² + (-5)² + 1² = 324
z Energy changed!
z Solution: scale A by a scale factor (here 1/2, since each row has norm 2). The scaling does not change the fact that most of the energy is concentrated in the low-frequency components.
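The scaling fix can be verified directly. A small sketch with the numbers from this slide (the helper name `matvec` is illustrative):

```python
# The unscaled 4x4 matrix and data vector from the slide
A = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, -1, 1],
     [1, -1, 1, -1]]
X = [4, 6, 5, 2]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

Y_unscaled = matvec(A, X)                 # [17, 3, -5, 1]
# sum of squares of X is 81; of Y_unscaled it is 324: energy not preserved

# Each row of A has norm 2, so A/2 is orthonormal:
A_scaled = [[v / 2 for v in row] for row in A]
Y = matvec(A_scaled, X)                   # [8.5, 1.5, -2.5, 0.5]
# sum of squares of Y is 81 again: energy preserved
```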

#19 Energy Preserving, Formal Proof

z The sum of the squares (the energy) of the transformed sequence is the same as the sum of the squares of the original sequence:

Σ_{i=0}^{n-1} Yi² = Y^T Y = (AX)^T (AX) = X^T A^T A X = X^T (A^T A) X = X^T X = Σ_{i=0}^{n-1} Xi²

since A^T A = I for an orthonormal matrix (see slide #12).
z Most of the energy is concentrated in the low-frequency coefficients.

#20 Why we are interested in the orthonormal matrix?

z Normally, it is computationally difficult to compute the inverse of a matrix.
z For an orthonormal matrix, the inverse of the transformation matrix is simply its transpose: A^-1 = A^T.
z Example: the 8x8 Walsh-Hadamard matrix A = (1/√8) H, where H has entries ±1. Since H is symmetric, B = A^-1 = A^T = A.

#21 Two Dimensional Transform

z From the input image I, we take the data block D. Given the transform matrix A:

I = [4 6 7 8; 5 2 4 4; 6 3 9 6; 7 5 6 9]

A = (1/2) [1 1 1 1; 1 1 -1 -1; 1 -1 -1 1; 1 -1 1 -1]

D = [4 7 6 9; 6 8 3 6; 5 4 7 6; 2 4 5 9]

z The 2-D transformation goes as:

Y = A D A^T = [22.75 -2.75 0.75 -3.75; 1.75 3.25 -0.25 -1.75; 0.25 -3.25 0.25 -2.25; 1.25 -1.25 0.75 1.75]

z Notice the energy compaction into the top-left corner.
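The 2-D transform Y = A D A^T above can be reproduced with plain matrix arithmetic. A minimal sketch (helper names `matmul`/`transpose` are illustrative):

```python
# Transform matrix and 4x4 data block from the slide
A = [[0.5, 0.5, 0.5, 0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5]]
D = [[4, 7, 6, 9],
     [6, 8, 3, 6],
     [5, 4, 7, 6],
     [2, 4, 5, 9]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(row) for row in zip(*M)]

Y = matmul(matmul(A, D), transpose(A))
# Y[0][0] is 22.75: most of the energy lands in the top-left coefficient
```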

#22 Two Dimensional Transform

z Because transformation matrix is orthonormal, AT = A−1 z So, we have z Forward transform Y = AXA−1 = AXAT

z Backward transform X = A−1YA = ATYA

#23 Linear Separable transform

z A two-dimensional transform can be simplified into two iterations of a one-dimensional transform:

Y_{k,l} = Σ_{i=0}^{N-1} Σ_{j=0}^{N-1} a_{k,i} X_{i,j} a_{l,j}

z That is, a column-wise transform followed by a row-wise transform.

#24 Transform and Filtering

z Consider the orthonormal transform A = (√2/2) [1, 1; -1, 1].
z If A is used to transform a vector of 2 identical elements x = [x, x]^T, the transformed sequence will be (√2 x, 0)^T, indicating that the "low frequency" or "average" value is √2 x and the "high frequency" component is 0, because the values do not vary.
z If x = [3, 1]^T or [3, -1]^T, the output sequence will be (2√2, -√2)^T or (√2, -2√2)^T, respectively. Now the high-frequency component is nonzero, and its magnitude is larger for [3, -1]^T, indicating a much larger variation. Thus, the two coefficients behave like the outputs of a "low-pass" and a "high-pass" filter, respectively.

#25 Transform and Functional Approximation

z A transform is a kind of function approximation.
z An image is a data set, and any data set is a function.
z A transform approximates the image function by a combination of simpler, well-defined "waveforms" (basis functions).
z Not all basis sets are equal in terms of compression.
z The DCT and similar real-valued transforms are computationally easier than the Fourier transform.

#26 Comparison of various transforms

z The KLT is optimal in the sense of decorrelation and energy packing.
z The Walsh-Hadamard Transform is especially easy to implement:
z its basis functions are either -1 or +1, so only additions/subtractions are necessary.

#27 Two-Dimensional Basis Matrix

z The outer product of two vectors V1 and V2 is defined as V1^T V2. For example,

V1^T V2 = [a; b; c] [d e f] = [ad ae af; bd be bf; cd ce cf]

z For a matrix A of size n*n, the outer product of the ith and jth rows is defined as

α_ij = [a_{i,0}; a_{i,1}; ...; a_{i,n-1}] [a_{j,0} a_{j,1} ... a_{j,n-1}] = [a_{i,0}a_{j,0}, a_{i,0}a_{j,1}, ..., a_{i,0}a_{j,n-1}; ...; a_{i,n-1}a_{j,0}, a_{i,n-1}a_{j,1}, ..., a_{i,n-1}a_{j,n-1}]

#28 Outer Product

z For example, if A = (√2/2) [1, 1; 1, -1], we have:

α00 = (√2/2)[1; 1] · (√2/2)[1 1] = (1/2) [1 1; 1 1]
α01 = (√2/2)[1; 1] · (√2/2)[1 -1] = (1/2) [1 -1; 1 -1]
α10 = (√2/2)[1; -1] · (√2/2)[1 1] = (1/2) [1 1; -1 -1]
α11 = (√2/2)[1; -1] · (√2/2)[1 -1] = (1/2) [1 -1; -1 1]

#29 Outer Product (2)

z We have shown earlier that X = A^T Y A.
z Consider X to be a 2*2 matrix:

[x00 x01; x10 x11] = (1/2) [1 1; 1 -1] [y00 y01; y10 y11] [1 1; 1 -1]

= (1/2) [y00+y10, y01+y11; y00-y10, y01-y11] [1 1; 1 -1]

= (1/2) [y00+y10+y01+y11, y00+y10-y01-y11; y00-y10+y01-y11, y00-y10-y01+y11]

= (1/2) { y00 [1 1; 1 1] + y01 [1 -1; 1 -1] + y10 [1 1; -1 -1] + y11 [1 -1; -1 1] }

= y00 α00 + y01 α01 + y10 α10 + y11 α11
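The expansion of X as a weighted sum of basis matrices can be cross-checked against the direct product X = A^T Y A. A sketch, with an arbitrary coefficient matrix Y chosen just for the check:

```python
import math

c = math.sqrt(2) / 2
A = [[c, c], [c, -c]]    # the 2x2 orthonormal matrix from slide 28

def outer(u, v):
    """Outer product of two vectors, as a matrix."""
    return [[ui * vj for vj in v] for ui in u]

# Basis matrices: alpha[i][j] = outer product of row i and row j of A
alpha = [[outer(A[i], A[j]) for j in range(2)] for i in range(2)]

Y = [[3.0, 1.0], [2.0, -1.0]]   # arbitrary coefficients

# X = sum over i,j of y_ij * alpha_ij
X = [[sum(Y[i][j] * alpha[i][j][r][s] for i in range(2) for j in range(2))
      for s in range(2)] for r in range(2)]

# Direct evaluation of X = A^T Y A for comparison
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

At = [[A[j][i] for j in range(2)] for i in range(2)]
X_direct = matmul(matmul(At, Y), A)
```

Both evaluations agree, confirming that the basis matrices span the reconstruction.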

#30 Basis Matrix

z The quantities α00, α01, α10, α11 are called the basis matrices in 2-D space.
z In general, the outer products of the rows of an n*n orthonormal matrix form a basis matrix set in 2 dimensions. The coefficient multiplying α00 is called the DC coefficient (note that all the elements of the DC basis matrix are 1, indicating an averaging operation); the other coefficients have alternating values and are called AC coefficients.

#31 Fast Cosine Transform

z 2D 8X8 basis functions of the DCT: z The horizontal frequency of the basis functions increases from left to right and the vertical frequency of the basis functions increases from top to bottom.


#32 Amplitude distribution of the DCT coefficients

z Histograms of 8x8 DCT coefficient amplitudes measured for natural images.
z The DC coefficient is typically uniformly distributed.
z The AC coefficients have a Laplacian distribution with zero mean.

#33 Discrete Cosine Transform (DCT)

z Conventional image data have reasonably high inter-element correlation.
z The DCT avoids the generation of the spurious spectral components that are a problem with the DFT, and it has a fast implementation that avoids complex algebra.

#34 One-dimensional DCT

z The basic idea is to decompose the image into a set of "waveforms", each with a particular "spatial" frequency.
z To human eyes, high spatial frequencies are imperceptible, and a good approximation of the image can be created by keeping only the lower frequencies.
z Consider the one-dimensional case first. The 8 arbitrary grayscale values (with range 0 to 255, shown in the next slide) are level shifted by 128 (as is done by JPEG).

#35 One-dimensional DCT

An example of 1-D DCT decomposition:

[Figure (b): the 8 input samples f(x), x = 1..8, before the DCT (image data).]
[Figure (c): the coefficients S(u), u = 1..8, after the DCT.]

[Figure: the 8 basis functions for the 1-D DCT, u = 0, 1, ..., 7.]

#36 One-dimensional DCT

z The waveforms can be denoted as w(f) = cos(fθ), with 0 ≤ θ ≤ π and frequencies f = 0, 1, ..., 7. Each wave is sampled at 8 points

θ = π/16, 3π/16, 5π/16, 7π/16, 9π/16, 11π/16, 13π/16, 15π/16

to form a basis vector.
z The eight basis vectors constitute a matrix A:

1      1      1      1      1      1      1      1
0.981  0.831  0.556  0.195 -0.195 -0.556 -0.831 -0.981
0.924  0.383 -0.383 -0.924 -0.924 -0.383  0.383  0.924
0.831 -0.195 -0.981 -0.556  0.556  0.981  0.195 -0.831
0.707 -0.707 -0.707  0.707  0.707 -0.707 -0.707  0.707
0.556 -0.981  0.195  0.831 -0.831 -0.195  0.981 -0.556
0.383 -0.924  0.924 -0.383 -0.383  0.924 -0.924  0.383
0.195 -0.556  0.831 -0.981  0.981 -0.831  0.556 -0.195
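The matrix above is just the sampled cosines w(f) = cos(f·θ_k) with θ_k = (2k+1)π/16. A short sketch that regenerates it:

```python
import math

# Row f holds cos(f * theta_k) sampled at theta_k = (2k+1)*pi/16, k = 0..7
A = [[math.cos(f * (2 * k + 1) * math.pi / 16) for k in range(8)]
     for f in range(8)]

# Row 0 is all ones (the DC basis vector); row 1 starts 0.981, 0.831, ...
# Row 4 alternates +/- 0.707; row 7 starts 0.195, matching the table above.
```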

#37 #38 One-dimensional DCT

z The output of the DCT transform is: S(u) = A[I(x)]T where A is the 8*8 transformation matrix defined in the previous slide, and I(x) is the input signal. z S(u) are called the coefficients for the DCT transform for input signal I(x).

#39 One-dimensional FDCT and IDCT with N=8 Sample points

z The 1-D DCT in JPEG is defined as:
z FDCT: S(u) = (C(u)/2) Σ_{x=0}^{7} I(x) cos[(2x+1)uπ/16],  for u = 0, 1, ..., 7
z IDCT: I(x) = Σ_{u=0}^{7} (C(u)/2) S(u) cos[(2x+1)uπ/16],  for x = 0, 1, ..., 7

z Where (u is frequency)
z I(x) is the 1-D sample
z S(u) is the 1-D DCT coefficient
z And C(u) = √2/2 for u = 0; C(u) = 1 for u > 0

#40 One-dimensional FDCT and IDCT with N Sample points

z The 1-D DCT in JPEG is defined as:
z FDCT: S(u) = √(2/N) C(u) Σ_{x=0}^{N-1} I(x) cos[(2x+1)uπ/(2N)],  for u = 0, 1, ..., N-1
z IDCT: I(x) = √(2/N) Σ_{u=0}^{N-1} C(u) S(u) cos[(2x+1)uπ/(2N)],  for x = 0, 1, ..., N-1

z Where (u is frequency)
z I(x) is the 1-D sample
z S(u) is the 1-D DCT coefficient, and C(u) = √2/2 for u = 0; C(u) = 1 for u > 0

#41 One-dimensional FDCT and IDCT

z As an example, let I(x) = [12, 10, 8, 10, 12, 10, 8, 11]. Then

S(u) = A[I(x)]^T = [28.6375, 0.5712, 0.4619, 1.757, 3.182, -1.729, 0.191, -0.309]

z If we now apply the IDCT, we get back I(x). We can quantize the coefficients S(u) and still obtain a very good approximation of I(x).
z For example, IDCT(28.6, 0.6, 0.5, 1.8, 3.2, -1.8, 0.2, -0.3) = (12.0254, 10.0233, 7.96054, 9.93097, 12.0164, 9.9932, 7.99354, 10.9989)
z While IDCT(28, 0, 0, 2, 3, -2, 0, 0) = (11.236, 9.6244, 7.6628, 9.573, 12.347, 10.014, 8.053, 10.684)
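The N=8 FDCT/IDCT pair from the previous slides, applied to this example. A minimal sketch:

```python
import math

def C(u):
    return math.sqrt(2) / 2 if u == 0 else 1.0

def fdct(I):
    """8-point FDCT: S(u) = (C(u)/2) * sum_x I(x) cos[(2x+1)u*pi/16]."""
    return [C(u) / 2 * sum(I[x] * math.cos((2 * x + 1) * u * math.pi / 16)
                           for x in range(8)) for u in range(8)]

def idct(S):
    """8-point IDCT: I(x) = sum_u (C(u)/2) * S(u) cos[(2x+1)u*pi/16]."""
    return [sum(C(u) / 2 * S[u] * math.cos((2 * x + 1) * u * math.pi / 16)
                for u in range(8)) for x in range(8)]

I = [12, 10, 8, 10, 12, 10, 8, 11]
S = fdct(I)      # S[0] is about 28.6375, as on the slide
R = idct(S)      # round trip recovers I exactly (up to float error)

# Coarsely quantized coefficients still reconstruct I reasonably well:
approx = idct([28, 0, 0, 2, 3, -2, 0, 0])
```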

#42 Two-dimensional FDCT and IDCT for 8X8 block

z The 2-D DCT in JPEG is defined as:
z FDCT: S(v,u) = (C(v)/2)(C(u)/2) Σ_{y=0}^{7} Σ_{x=0}^{7} I(y,x) cos[(2y+1)vπ/16] cos[(2x+1)uπ/16]
z IDCT: I(y,x) = Σ_{v=0}^{7} Σ_{u=0}^{7} (C(v)/2)(C(u)/2) S(v,u) cos[(2y+1)vπ/16] cos[(2x+1)uπ/16]
z Where
z I(y,x) is the 2-D sample
z S(v,u) is the 2-D DCT coefficient
z And C(u) = √2/2 for u = 0, C(u) = 1 for u > 0; C(v) = √2/2 for v = 0, C(v) = 1 for v > 0

#43 Fast Cosine Transform

z 2D 8X8 basis functions of the DCT: z The horizontal frequency of the basis functions increases from left to right and the vertical frequency of the basis functions increases from top to bottom.


#44 Two-dimensional FDCT for NXM block

z FDCT: S(v,u) = (2/√(NM)) C(v) C(u) Σ_{y=0}^{M-1} Σ_{x=0}^{N-1} I(y,x) cos[(2y+1)vπ/(2M)] cos[(2x+1)uπ/(2N)]

for u = 0, 1, ..., N-1 and v = 0, 1, ..., M-1

z Where I(y,x) is the 2-D sample, S(v,u) is the 2-D DCT coefficient, and C(v) = √2/2 for v = 0, C(v) = 1 for v > 0; C(u) = √2/2 for u = 0, C(u) = 1 for u > 0

#45 Separable Function

z The 2-D FDCT is a separable function because we can express the formula as

S(v,u) = √(2/M) C(v) Σ_{y=0}^{M-1} { √(2/N) C(u) Σ_{x=0}^{N-1} I(y,x) cos[(2x+1)uπ/(2N)] } cos[(2y+1)vπ/(2M)]

for u = 0, 1, ..., N-1 and v = 0, 1, ..., M-1

This means that we can compute a 2-D FDCT by first computing a row-wise 1-D FDCT and then performing a column-wise 1-D FDCT on the output. This is possible, as we explained earlier, because the basis matrices are orthonormal.

#46 Two-dimensional IDCT for NXM block

I(y,x) = (2/√(NM)) Σ_{v=0}^{M-1} Σ_{u=0}^{N-1} C(v) C(u) S(v,u) cos[(2y+1)vπ/(2M)] cos[(2x+1)uπ/(2N)]

for x = 0, 1, ..., N-1 and y = 0, 1, ..., M-1.

This function is again separable, and the computation can be done in two steps: a row-wise 1-D IDCT followed by a column-wise 1-D IDCT. A straightforward algorithm to compute either the FDCT or the IDCT needs (for a block of NXN pixels) on the order of O(N³) multiplication/addition operations. After the ground-breaking discovery of the O(N log N) FFT algorithm for the DFT, several researchers proposed efficient algorithms for DCT computation.

#47 JPEG Introduction - The background

z JPEG stands for Joint Photographic Experts Group
z A standard image compression method is needed to enable interoperability of equipment from different manufacturers
z It is the first international digital image compression standard for continuous-tone images (grayscale or color)
z The history of JPEG: the selection process

#48 JPEG Introduction – what’s the objective?

z "very good" or "excellent" compression rate, reconstructed image quality, and transmission rate
z be applicable to practically any kind of continuous-tone digital source image
z have manageable computational complexity
z have the following modes of operation:
z sequential encoding
z progressive encoding
z lossless encoding
z hierarchical encoding

#49 JPEG Architecture Standard

Source image data -> encoder -> compressed image data -> decoder -> reconstructed image data

Image compression system

Source image data -> statistical model -> descriptors -> entropy encoder -> compressed image data
(the encoder uses model tables and entropy coding tables)

The basic parts of a JPEG encoder

#50 JPEG Overview (cont.)

JPEG has the following Operation Modes:

„ Sequential DCT-based mode

„ Progressive DCT-based mode

„ Sequential lossless mode

„ Hierarchical mode

JPEG entropy coding supports:

„ Huffman encoding

„ Arithmetic encoding

#51 JPEG Baseline System

JPEG Baseline system is composed of:

„ Sequential DCT-based mode

„ 8×8 blocks, DCT-based encoder

Source image data -> FDCT -> quantizer -> statistical model -> entropy encoder -> compressed image data
(the quantizer and the entropy encoder each use a table specification)

The basic architecture of the JPEG Baseline system

#52 The Baseline System

DC coefficient

„ The DCT coefficient values can be regarded as the relative amounts of the 2-D spatial frequencies contained in the 8×8 block

„ The upper-left corner coefficient is called the DC coefficient, which is a measure of the average energy of the block

„ The other coefficients are called AC coefficients; coefficients corresponding to high frequencies tend to be zero or near zero for most natural images

#53 Quantization Tables in DCT

z Human eyes are less sensitive to high frequencies.
z We use different quantization values for different frequencies:
z higher frequency, bigger quantization value;
z lower frequency, smaller quantization value.
z Each DCT coefficient corresponds to a certain frequency.
z The high-level HVS is much more sensitive to variations in the achromatic channel than in the chromatic channels.

#54 The Baseline System – Quantization

z Why quantization?
z To achieve further compression by representing DCT coefficients with no greater precision than is necessary to achieve the desired image quality.
z Generally, the "high frequency coefficients" have larger quantization values.
z Quantization makes most coefficients zero, which makes the compression system efficient, but it is the main source that makes the system "lossy" and introduces distortion in the reconstructed image.
z The quantization step-size parameters for JPEG are given by two Quantization Matrices, each element of the matrix being an integer between 1 and 255. The DCT coefficients are divided by the corresponding quantization parameters and rounded to the nearest integer.

#55 Quantization

F'(u,v) = Round( F(u,v) / Q(u,v) )

F(u,v): original DCT coefficient
F'(u,v): DCT coefficient after quantization
Q(u,v): quantization value

There are two quantization tables: one for the luminance component and the other for the chrominance component. The JPEG standard does not prescribe the values of the tables; that is left up to the user, but it recommends the following two tables under normal circumstances. These tables were created by analyzing data on human perception, and a lot of trial and error.

#56 Quantization Tables in DCT

z So, we have two quantization tables. z Measured for an “average” person. z Higher frequency, bigger quantization value. z Lower frequency, smaller quantization value.

Luminance quantization table:
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68  109 103 77
24  35  55  64  81  104 113 92
49  64  78  87  103 121 120 101
72  92  95  98  112 100 103 99

Chrominance quantization table:
17  18  24  47  99  99  99  99
18  21  26  66  99  99  99  99
24  26  56  99  99  99  99  99
47  66  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
99  99  99  99  99  99  99  99
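A sketch of the quantization step F'(u,v) = Round(F(u,v)/Q(u,v)) using the luminance table. (Python's `round` uses round-half-to-even rather than the exact rounding rule of a real codec, which is fine for a sketch; the function names are illustrative.)

```python
# Recommended JPEG luminance quantization table (from the slide)
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(F, Q):
    """F'(u,v) = Round(F(u,v) / Q(u,v)); high-frequency entries mostly become 0."""
    return [[round(f / q) for f, q in zip(frow, qrow)]
            for frow, qrow in zip(F, Q)]

def dequantize(Fq, Q):
    """Decoder side: multiply back by the step size (precision is lost)."""
    return [[f * q for f, q in zip(frow, qrow)]
            for frow, qrow in zip(Fq, Q)]

# Example: a coefficient of 100 survives coarsely at DC but is crushed at high frequency
F = [[100] * 8 for _ in range(8)]
Fq = quantize(F, LUMA_Q)     # Fq[0][0] == 6, Fq[7][7] == 1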

#57 Two-dimensional DCT

z The image samples are shifted from unsigned integers with range [0, 2^P - 1] to signed integers with range [-2^(P-1), 2^(P-1) - 1], where P is the sample precision. Thus samples in the range 0 to 255 are shifted to the range -128 to 127, and those in the range 0 to 4095 to the range -2048 to 2047. This zero-shift is done in JPEG to reduce the internal precision requirements of the DCT calculations.
z How to interpret the DCT coefficients?
z The DCT coefficient values can be regarded as the relative amounts of the 2-D spatial frequencies contained in the 8×8 block.
z F(0,0) is called the DC coefficient; it is a measure of the average energy of the block.
z The other coefficients are called AC coefficients; coefficients corresponding to high frequencies tend to be zero or near zero for most images.
z The energy is concentrated in the upper-left corner.

#58 Baseline System - DC coefficient coding

z After the quantization step, the DC coefficients are encoded separately using differential encoding. The DC difference sequence consists of the DC coefficient of the first block followed by the difference values of the DC coefficients of the succeeding blocks. The average values of succeeding blocks are usually correlated.

quantized DC coefficients -> DPCM -> DC differences

Differential pulse code modulation

#59 Baseline System - AC coefficient coding

z AC coefficients are arranged into a zig-zag sequence, because most of the significant coefficients are located in the upper-left corner of the matrix. The high-frequency components are mostly 0's and can be efficiently coded by run-length encoding.

Zig-zag scan order (horizontal frequency increases left to right, vertical frequency top to bottom):

0  1  5  6  14 15 27 28
2  4  7  13 16 26 29 42
3  8  12 17 25 30 41 43
9  11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
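The scan order above can be generated by walking the anti-diagonals of the block, alternating direction. A minimal sketch (the function name is illustrative):

```python
def zigzag_order(n=8):
    """Return the (y, x) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                # s = y + x indexes the anti-diagonal
        diag = [(y, s - y) for y in range(n) if 0 <= s - y < n]
        if s % 2 == 0:
            diag.reverse()                    # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

# Build the index table shown above: index[y][x] = position in the scan
order = zigzag_order()
index = [[0] * 8 for _ in range(8)]
for i, (y, x) in enumerate(order):
    index[y][x] = i
# index[0] == [0, 1, 5, 6, 14, 15, 27, 28], matching the first row of the table
```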

#60 An Example (Ref: T. Acharya, P. Tsai, "JPEG 2000 Standard for Image Compression", Wiley-Interscience, 2005)

The 8X8 Image Data Block:
110 110 118 118 121 126 131 131
108 111 125 122 120 125 134 135
106 119 129 127 125 127 138 144
110 126 130 133 133 131 141 148
115 116 119 120 122 125 137 139
115 106 99  110 107 116 130 127
110 91  82  101 99  104 120 118
103 76  70  95  92  91  107 106

#61 #62 Baseline System - Statistical modeling

z Statistical modeling translates the inputs into a sequence of "symbols" for the Huffman coder to use.
z Statistical modeling on DC coefficients:
z Category: size class (SSSS)
z Magnitude: amplitude of the difference (additional bits)

#63 Coding the DC coefficients

The prediction errors are classified into 16 categories, modulo 2^16. Category c contains the range of integers [-(2^c - 1), +(2^c - 1)], excluding the numbers in the range [-(2^(c-1) - 1), +(2^(c-1) - 1)], which fall in the previous category. The category number is encoded using either a fixed code (SSSS), a unary code (0, 10, 110, ...), or a variable-length Huffman code.

#64 Coding the DC/AC coefficients

If the error is positive, its magnitude is encoded using 'category' number of bits with a leading '1'. If it is negative, the 1's complement of the positive number is used. Thus, if the error is -6 and the Huffman code for category 3 is 1110, it gets the code (1110 001). The JPEG standard recommends two Huffman codes for the category numbers: one for luminance DC and the other for chrominance DC (see Salomon, p. 289; this is also described in Annex K, Tables K.3 and K.4, of the standard).
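The category/extra-bits scheme for DC differences can be sketched in a few lines (the Huffman code for the category itself comes from the recommended tables and is not shown; function names are illustrative):

```python
def dc_category(diff):
    """Category c is the number of bits needed for |diff| (0 when diff == 0)."""
    return abs(diff).bit_length()

def extra_bits(diff):
    """Positive values: plain binary with leading 1. Negative: one's complement."""
    c = dc_category(diff)
    if c == 0:
        return ""                      # category 0 carries no extra bits
    if diff > 0:
        return format(diff, "0{}b".format(c))
    # one's complement of |diff| on c bits: diff + 2^c - 1
    return format(diff + (1 << c) - 1, "0{}b".format(c))

# The slide's example: error -6 is category 3, extra bits '001',
# so with Huffman code 1110 for category 3 the full code is 1110 001.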

z Statistical modeling on AC coefficients: z Run-length: RUN-SIZE=16*RRRR+SSSS z Amplitude: amplitude of difference (additional bits)

#65 The AC coefficients contain just a few nonzero numbers, with runs of zeros and a long run of trailing zeros. For each nonzero number, two values are generated: 1) RRRR, the number of consecutive zeros preceding the nonzero coefficient being encoded; 2) SSSS, the category giving the number of bits required to encode the coefficient's amplitude. The combination RUN-SIZE = 16*RRRR + SSSS is encoded with a variable-length Huffman code, specified in the form of a table (see next slide). The second attribute of the AC coefficient is its amplitude, which is encoded exactly the same way as the DC values. Note that the code (0,0) is reserved for EOB (signifying that the remaining AC coefficients are just a single trailing run of 0's). The code (15,0) is reserved for a maximum run of 16 zeros; a longer run is broken down into runs of 16 plus possibly a single run shorter than 16. JPEG recommends two Huffman code tables for AC coefficients, Tables K.5 and K.6, for luminance and chrominance values.

#66 Encoding the AC Coefficients

[Table: Huffman AC statistical model - run-length/amplitude combinations used for Huffman coding of AC coefficients]

#67 Example (Contd.)

z Let us finish our example. The DC coefficient -6 has the code 1110001. The AC coefficients will be encoded as:
z (0,3)(-6), (0,3)(6), (0,3)(-5), (1,2)(2), (1,1)(-1), (5,1)(-1), (2,1)(-1), (0,1)(1), (0,0). Table K.5 gives the codes
z 1010, 00, 100, 1100, 11011, 11100 and 11110101 for the symbol pairs (0,0), (0,1), (0,3), (1,1), (1,2), (2,1) and (5,1), respectively. The variable-length codes for the AC amplitudes 1, -1, 2, -5, 6 and -6 are 1, 0, 10, 010, 110 and 001, respectively. Thus JPEG gives a compressed representation of the example image using only 55 bits:
z 1110001 100001 100110 100010 1101110 11000 11110100 111000 001 1010
z The uncompressed representation would have required 512 bits!

#68 Example to illustrate encoding/decoding

#69 JPEG Progressive Model

z Why a progressive model?
z Quick transmission
z The image is built up in coarse-to-fine passes
z First stage: encode a rough but recognizable version of the image
z Later stage(s): the image is refined by successive scans until the final image is obtained
z Two ways to do this:
z Spectral selection: send DC and AC coefficients separately
z Successive approximation: send the most significant bits first, then the least significant bits

#70 JPEG Lossless Model

Sample values -> DPCM -> Descriptors

Differential pulse code modulation

Predictors for lossless coding (neighborhood: C B above, A X in the current row, where X is the sample being predicted):

selection value   prediction strategy
0                 no prediction
1                 A
2                 B
3                 C
4                 A + B - C
5                 A + (B - C)/2
6                 B + (A - C)/2
7                 (A + B)/2

#71 JPEG Hierarchical Model

z The hierarchical model is an alternative to the progressive model (a pyramid).
z Steps:
z filter and down-sample the original image by the desired multiples of 2 in each dimension
z encode the reduced-size image using one of the above coding modes
z use the up-sampled image as a prediction of the original at the next resolution, and encode the difference
z repeat until the full-resolution image has been encoded

#72 The Effect of Segmentation

z The image samples are grouped into 8×8 blocks. 2-D DCT is applied on each 8×8 blocks.

z Because of blocking, the spatial frequencies in the image and the spatial frequencies of the cosine basis functions are not precisely equivalent. According to Fourier's theorem, all the harmonics of the fundamental frequencies must be present in the basis functions to be precise. Nonetheless, the relationship between DCT frequency and spatial frequency is a close approximation if we take into account the sensitivity of the human eye for detecting contrast in frequency.
z The segmentation also introduces what are called "blocking artifacts". These become very pronounced if the DC coefficients vary considerably from block to block. The artifacts appear as abrupt edges in the image, and abrupt edges imply high frequency. The effect can be minimized if the non-zero AC coefficients are kept.

#73 Some other transforms

z Discrete Fourier Transform (DFT) z Haar Transform z Karhunen Loève Transform (KLT) z Walsh-Hadamard Transform (WHT)

#74 Discrete Fourier Transform (DFT)

z Well-known for its connection to spectral analysis and filtering.
z Extensive study done on its fast implementation (O(N log2 N) for an N-point DFT).
z Has the disadvantage of storage and manipulation of complex quantities, and the creation of spurious spectral components due to the assumed periodicity of image blocks.

#75 Haar Transform

z A very fast transform; among the easiest to implement.
z Useful in edge detection, image coding, and image analysis problems.
z Energy compaction is only fair, so it does not yield the best compression.

[Figure: 2-D basis functions of the Haar transform]

#76 Karhunen Loève Transform (KLT)

z Karhunen Loève Transform (KLT) yields decorrelated transform coefficients. z Basis functions are eigenvectors of the covariance matrix of the input signal. z KLT achieves optimum energy concentration. z Disadvantages: z KLT dependent on signal statistics z KLT not separable for image blocks z Transform matrix cannot be factored into sparse matrices

#77 Walsh-Hadamard Transform (WHT)

z Although far from optimum in an energy-packing sense for typical imagery, its simple implementation (basis functions are either -1 or +1) has made it widely popular.

Transformation matrix: A = (1/√8) H, where H is the 8x8 Walsh-Hadamard matrix with entries ±1.

#78 Walsh-Hadamard Transform (WHT)

z The Walsh-Hadamard transform requires only additions and subtractions.
z Nevertheless, advances in high-speed signal processing and fast DCT algorithms have reduced its use.

#79 Transform Coding: Summary

z Purpose of a transform:
z de-correlation
z energy concentration
z KLT is optimum, but signal dependent and, hence, without a fast algorithm
z DCT reduces the blocking artifacts that appear with the DFT
z Threshold coding + zig-zag scan + 8x8 block size is widely used today
z JPEG, MPEG, ITU-T H.263
z Fast algorithm for the scaled 8-point DCT: 5 multiplications, 29 additions
z Audio coding: MP3 = MPEG-1 Layer 3 uses the DCT

#80