Ubiquitous Computing and Communication Journal

A HYBRID TRANSFORMATION TECHNIQUE FOR ADVANCED CODING

M. Ezhilarasan, P. Thambidurai
Department of Computer Science & Engineering and Information Technology, Pondicherry Engineering College, Pondicherry – 605 014, India
[email protected]

ABSTRACT
A video encoder compresses video through the combination of three main modules: motion estimation and compensation, transformation, and entropy encoding. Among these three modules, transformation is the module that removes the spatial redundancy present in the spatial domain of the video sequence. The Discrete Cosine Transform (DCT) is the de facto transformation method in existing image and video coding standards. Even though the DCT has very good energy-preservation and decorrelation properties, it suffers from blocking artifacts. To overcome this problem, a hybridization method has been incorporated into the transformation module of the video encoder. This paper presents a hybrid transformation module that applies the DCT to the inter frames and a combination of wavelet filters to the intra frames of the video sequence. The proposal is also applied to the existing H.264/AVC standard. Extensive experiments have been conducted with various standard CIF and QCIF video sequences. The results show that the proposed hybrid transformation technique considerably outperforms the existing technique used in H.264/AVC.

Keywords: Data Compression, DCT, DWT, Video Coding, Transformation.

1 INTRODUCTION

Transform coding techniques have become an important paradigm in image and video coding standards, in which the Discrete Cosine Transform (DCT) [1][2] is applied due to its high decorrelation and energy compaction properties. In the past two decades, many contributions have focused on the Discrete Wavelet Transform (DWT) [3][4] for its performance in image coding. These two popular techniques, DCT and DWT, are widely applied in image and video coding applications. The International Organization for Standardization / International Electrotechnical Commission (ISO/IEC) and the International Telecommunication Union – Telecommunication Standardization Sector (ITU-T) have developed their own video coding standards, viz. the Moving Picture Experts Group (MPEG) standards MPEG-1, MPEG-2 and MPEG-4 for multimedia, and H.261, H.263, H.263+, H.263++ and H.26L for videoconferencing applications. Recently, MPEG and the Video Coding Experts Group (VCEG) jointly designed a new standard, H.264 / MPEG-4 Part 10 [5], to provide better compression of video sequences. There has been a tremendous contribution by researchers and experts of various institutions and research laboratories over the past two decades to take up the recent technology requirements in video coding standards.

In Advanced Video Coding (AVC) [6], video is captured as a sequence of frames. Each frame is compressed by partitioning it into one or more slices, where each slice consists of a sequence of macroblocks. These macroblocks are transformed, quantized and encoded. The transformation module converts the frame data from the time domain to the frequency domain, which decorrelates the energy (i.e., the amount of information present in the frame) in the spatial domain. It also concentrates the energy of the frame into a small number of transform coefficients, which are more efficient to encode than the original frame. Since the transformation module is reversible in nature, this process does not change the information content of the source input signal during the encoding and decoding process.

As per the Human Visual System (HVS), human eyes are more sensitive to low-frequency signals than to high-frequency signals. The decisive objective of this paper is to develop a hybrid technique that achieves higher performance in the parameters specified above than the existing technique used in the current advanced video coding standard. In this paper, a combination of orthogonal and bi-orthogonal wavelet filters is applied at

Volume 3 Number 3 Page 89 www.ubicc.org Ubiquitous Computing and Communication Journal

different decomposition levels for the intra frames, and the DCT is applied to the inter frames of the video encoder. Even though only the intra frames are coded with the wavelet transform, the impact can also be seen in inter-frame coding: with better-quality anchor pictures retained in the frame memory for prediction, the remaining inter-frame pictures are coded more efficiently with the DCT. The proposed transformation method is also implemented in the H.264/AVC reference software [7]. The paper is organized as follows. In Section 2, the basics of transform coding methods are highlighted. The proposed hybrid transformation technique is described in Section 3. Extensive experimental results and discussion are given in Section 4, followed by the conclusion in Section 5.

2 BASICS OF TRANSFORM CODING

For any inter-frame video coding standard, the basic functional modules are motion estimation and compensation, transformation, quantization and entropy encoding. As shown in Fig. 1, the temporal redundancies that exist between successive frames are reduced by the motion estimation and compensation module. The residue, i.e. the difference between the original and the motion-compensated frame, is applied to the sequence of transformation and quantization modules. The spatial redundancy that exists among neighboring pixels in an image or intra frame is minimized by these modules.

2.1 Basics of Transformation

From the basic concepts of information theory, coding symbols as vectors is more efficient than coding them as scalars [8]. Using this phenomenon, groups of blocks of consecutive symbols from the source video input are taken as vectors. There is high correlation among neighboring pixels in an image or intra frame of video. Transformation is a reversible model [9] that decorrelates the symbols in the given blocks. In the recent image and video coding standards, the following transformation techniques are applied due to their orthonormal property and energy compactness.

2.1.1 Discrete Cosine Transform

The Discrete Cosine Transform is a widely used transform coding technique in image and video compression algorithms. It is able to decorrelate the input signal in a data-independent manner. When an image or a frame is transformed by the DCT, it is first divided into blocks, typically of size 8x8 pixels. These blocks are transformed separately, without any influence from the surrounding blocks. The top-left coefficient in each block is called the DC coefficient and corresponds to the average value of the block. The rightmost coefficients in the block are the ones with the highest horizontal frequency, while the coefficients at the bottom have the highest vertical frequency; the coefficient in the bottom-right corner therefore has the highest frequency of all. The forward DCT of a discrete signal f(i,j) of an (MxN) block, and the inverse DCT (IDCT) giving the reconstructed image f̃(i,j) for the same (MxN) block, are defined as

F(u,v) = (2 C(u) C(v) / √(MN)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} f(i,j) cos[(2i+1)uπ / 2M] cos[(2j+1)vπ / 2N]    (1)

f̃(i,j) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} (2 C(u) C(v) / √(MN)) F(u,v) cos[(2i+1)uπ / 2M] cos[(2j+1)vπ / 2N]    (2)
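As an illustration (not part of the original paper, and not the integer approximation that H.264/AVC actually uses), Eqs. (1)–(2) can be transcribed directly in Python with NumPy; the constants C(u), C(v) are the ones defined in the text below the equations, and the gradient test block is an arbitrary stand-in for smooth image content:

```python
import numpy as np

def C(x):
    # Normalization constants used in Eqs. (1)-(2): C(0) = sqrt(2)/2, else 1.
    return np.sqrt(2) / 2 if x == 0 else 1.0

def dct2(f):
    """Forward 2-D DCT of an M x N block, transcribed directly from Eq. (1)."""
    M, N = f.shape
    i, j = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    F = np.zeros((M, N))
    for u in range(M):
        for v in range(N):
            basis = (np.cos((2 * i + 1) * u * np.pi / (2 * M)) *
                     np.cos((2 * j + 1) * v * np.pi / (2 * N)))
            F[u, v] = 2 * C(u) * C(v) / np.sqrt(M * N) * np.sum(f * basis)
    return F

def idct2(F):
    """Inverse 2-D DCT (IDCT), transcribed directly from Eq. (2)."""
    M, N = F.shape
    u, v = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    Cu = np.where(u == 0, np.sqrt(2) / 2, 1.0)
    Cv = np.where(v == 0, np.sqrt(2) / 2, 1.0)
    f = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            basis = (np.cos((2 * i + 1) * u * np.pi / (2 * M)) *
                     np.cos((2 * j + 1) * v * np.pi / (2 * N)))
            f[i, j] = np.sum(2 * Cu * Cv / np.sqrt(M * N) * F * basis)
    return f

# A smooth 8x8 block (a gradient), a stand-in for natural image content.
block = np.fromfunction(lambda i, j: 10.0 * i + 5.0 * j, (8, 8))
F = dct2(block)
# Reversibility: the block is recovered up to floating-point error.
assert np.allclose(idct2(F), block)
# Energy compaction: the DC coefficient dominates for smooth content.
assert abs(F[0, 0]) > abs(F[1:, 1:]).max()
```

The direct double loop makes the correspondence with the equations obvious; production codecs instead use separable fast transforms (or, in H.264/AVC, an integer transform).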

where i, u = 0, 1, …, M−1 and j, v = 0, 1, …, N−1, and the constants C(u) and C(v) are obtained by

C(x) = √2/2 if x = 0, and C(x) = 1 otherwise.

[Figure 1 (block diagram of the basic video encoder) appears here.]
Figure 1: Basic video encoding module.

The transformation module converts the residue symbols from the time domain into the frequency domain, which decorrelates the energy present in the spatial domain; this is appropriate for quantization. The quantized transform coefficients and the motion displacement vectors obtained from the motion estimation and compensation module are applied to the entropy encoding (Variable Length Coding) module, which removes the statistical redundancy. These modules are briefly introduced in the following subsections.

The MPEG standards apply the DCT for video compression. The compression exploits the spatial and temporal redundancies that occur in video objects or frames. Spatial redundancy can be exploited by simply coding each frame separately; this technique is referred to as intra-frame coding. Additional compression can be achieved by taking advantage of the fact that consecutive frames are often almost identical. This temporal compression has the potential for a major reduction over simply encoding each frame separately, though the effect is lessened by the fact that video contains frequent scene changes. This technique is referred to as inter-frame coding. The DCT and motion-compensated inter-frame prediction are combined: the coder subtracts the motion-compensated prediction from the source picture to form a 'prediction error' picture. The prediction error is transformed with the DCT, the coefficients are quantized using scalar quantization, and the quantized values are coded using arithmetic coding. The coded luminance and chrominance prediction error is combined with the 'side information' required by the decoder, such as motion vectors and synchronizing information, and formed into a bit stream for transmission. This technique works well with a stationary background and a moving foreground, since only the movement in the foreground is coded.

Despite all the advantages of the JPEG and MPEG compression schemes based on the DCT, namely simplicity, satisfactory performance, and the availability of special-purpose hardware for implementation, they are not without shortcomings. Since the input image needs to be 'blocked', correlation across the block boundaries is not eliminated. The result is noticeable and annoying 'blocking artifacts', particularly at low bit rates.

2.1.2 Discrete Wavelet Transform

Wavelets are functions defined over a finite interval and having an average value of zero. The basic idea of the wavelet transform is to represent any arbitrary function as a superposition of a set of such wavelets or basis functions. These basis functions, or child wavelets, are obtained from a single prototype wavelet called the mother wavelet, by dilations (scaling) and translations. Wavelets are used to characterize detail information; the averaging information is formally determined by a kind of dual to the mother wavelet, called the scaling function φ(t). The main concept of wavelets is that at a particular level of resolution j, the set of translates indexed by n forms a basis at that level. The translates forming the basis at the next level j+1, a coarser level, can all be written as a weighted sum of the level-j basis functions. The scaling function is chosen such that the coefficients of its translates are all necessarily bounded; in particular, it satisfies the dilation (two-scale) equation

φ(t) = Σ_{n∈Z} √2 h0[n] φ(2t − n)    (3)

The dilation equation is a recipe for finding a function that can be built from a sum of copies of itself that are scaled, translated and dilated. Equation (3) expresses a condition that a function must satisfy to be a scaling function and, at the same time, forms a definition of the scaling vector h0. The wavelet at the coarser level is expressed analogously as

ψ(t) = Σ_{n∈Z} √2 h1[n] φ(2t − n)    (4)

The discrete high-pass impulse response h1[n], which describes the details via the wavelet function, can be derived from the discrete low-pass impulse response h0[n] using the following equation:

h1[n] = (−1)^n h0[1 − n]    (5)

The number of coefficients in the impulse response is called the number of taps of the filter. For orthogonal filters, the forward transform and its inverse are transposes of each other, and the analysis filters are identical to the synthesis filters.

2.2 Quantization

A quantizer [10][11] reduces the number of bits needed to store the transformed coefficients by reducing the precision of their values. Since this is a many-to-one mapping, it is a lossy process, and it is the main source of compression in an encoder. Quantization can be performed on each individual coefficient, which is referred to as scalar quantization, or on a group of coefficients together, which is referred to as vector quantization.

Uniform quantization partitions the domain of input values into equally spaced intervals, except for the two outer intervals. The end points of the partition intervals are called the quantizer decision boundaries. The output or reconstruction value corresponding to each interval is taken to be the midpoint of the interval. The length of each interval, fixed in the case of uniform quantization, is referred to as the step size, denoted by the symbol Δ and given by

Δ = 2 X_max / M    (6)

where M is the number of quantizer levels and X_max is the maximum range of the input symbols.
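The filter relation of Eq. (5) and the uniform quantizer of Eq. (6) can be sketched together as follows. This is an illustrative Python sketch, not the codec's implementation: the test signal, the 64-level quantizer range and the error threshold are arbitrary choices for the demonstration, and the two-tap Haar pair stands in for the longer filters used later in the paper.

```python
import numpy as np

# Orthonormal Haar low-pass filter h0 (the [0.707, 0.707] filter of Section 3).
h0 = np.array([1 / np.sqrt(2), 1 / np.sqrt(2)])

# High-pass filter from Eq. (5): h1[n] = (-1)^n * h0[1 - n], for n = 0, 1.
h1 = np.array([(-1) ** n * h0[1 - n] for n in range(2)])

def haar_analysis(x):
    """One DWT level: filter with h0/h1 and downsample by 2 (x has even length)."""
    pairs = x.reshape(-1, 2)
    approx = pairs @ h0          # low-pass branch (scaled averages)
    detail = pairs @ h1          # high-pass branch (scaled differences)
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of one level; for orthogonal filters, synthesis is the transpose."""
    pairs = np.outer(approx, h0) + np.outer(detail, h1)
    return pairs.reshape(-1)

def uniform_quantize(c, x_max, levels):
    """Midpoint uniform quantizer with step size from Eq. (6): delta = 2*x_max/M."""
    delta = 2 * x_max / levels
    index = np.floor((c + x_max) / delta).clip(0, levels - 1)
    return -x_max + (index + 0.5) * delta   # reconstruct at the interval midpoint

x = np.array([10.0, 12.0, 14.0, 20.0, 20.0, 18.0, 4.0, 2.0])
a, d = haar_analysis(x)
# Perfect reconstruction without quantization:
assert np.allclose(haar_synthesis(a, d), x)
# With quantization (x_max = 32, M = 64, so delta = 1), the loss stays bounded:
x_hat = haar_synthesis(uniform_quantize(a, 32, 64), uniform_quantize(d, 32, 64))
assert np.max(np.abs(x_hat - x)) < 1.0
```

With a step size of Δ = 1, each coefficient is off by at most Δ/2, so the reconstructed samples are off by at most 2·(Δ/2)/√2 ≈ 0.71, which the final assertion checks.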
In this work, the quantizer used in H.264 has been adopted for inter-frame motion-compensated predictive coding, which allows an acceptable loss in quality for the given video sequences.

Returning to the wavelet transform of Section 2.1.2: the scaling function φ(t), along with its translates, forms a basis at the coarser level j+1 but not at level j. Instead, at level j the set of translates of the scaling function φ(t), together with the set of translates of the mother wavelet ψ(t), does form a basis. Since the translates of the scaling function φ(t) at a coarser level can be written exactly as a weighted sum of translates at a finer level, the scaling function must satisfy the dilation equation (3).

2.3 Motion Estimation

Motion estimation (ME) [12] is the process of estimating the pixels of the current frame from reference frame(s). Block-matching motion estimation, or the block matching algorithm (BMA), which is a temporal-redundancy removal technique


between two or more successive frames, is an integral part of most motion-compensated video coding standards. Frames are divided into regular-sized blocks, referred to as macroblocks (MB). The block-matching method finds the best-matched block in the previous frame. Based on a block distortion measure (BDM), the displacement of the best-matched block is described as the motion vector (MV) of the block in the current frame. The best match is usually evaluated by a cost function based on a BDM such as the mean absolute difference (MAD), defined as

MAD(i, j) = (1 / MN) Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} | c(x + k, y + l) − p(x + k + i, y + l + j) |    (7)

where M x N is the size of the macroblock, c(.,.) and p(.,.) denote the pixel intensities in the current frame and in a previously processed frame respectively, (x, y) is the position of the upper-left corner of the current block, (k, l) indexes the pixels within the block, and (i, j) represents the candidate displacement relative to the position of the current block. After checking each location in the search area, the motion vector is determined as the displacement (i, j) at which the MAD has its minimum value. In this work, an exhaustive full search has been applied for motion-compensated prediction.

2.4 Entropy Encoding

Following Claude E. Shannon [8], the entropy H of an information source with alphabet S = {s1, s2, …, sn} is defined as

H(S) = Σ_{i=1}^{n} p_i log2(1 / p_i)    (8)

where p_i is the probability of symbol s_i in S. The term log2(1/p_i) indicates the amount of information contained in s_i, which corresponds to the number of bits needed to encode s_i. An entropy encoder further compresses the quantized values to give a better compression ratio. It uses a model to determine the probability of each quantized value and produces an appropriate code based on these probabilities, so that the resultant output code stream is smaller than the input stream. The most commonly used entropy encoders are the Huffman encoder [13] and the arithmetic encoder [14]. It is important to note that a properly designed entropy encoder is absolutely necessary, along with an optimal signal transformation, to get the best possible compression.

Arithmetic coding is a more modern coding method that usually outperforms Huffman coding in practice. In arithmetic coding, a message is represented by an interval of real numbers between 0 and 1. As the message becomes longer, the interval needed to represent it becomes smaller, and the number of bits needed to specify that interval grows. Successive symbols of the message reduce the size of the interval in accordance with the symbol probabilities generated by the model. Arithmetic coding is more complex than Huffman coding in its implementation. The CAVLC used in H.264 has been considered in the experiments for the entropy encoding process.

2.5 Motivation for this work

The DCT is the best transformation technique for motion estimation and motion-compensated predictive coding models. Due to the blocking-artifact problems encountered with the DCT, subband coding methods are considered as an alternative; the DWT is the best alternative method because of its energy compaction and preservation properties. Due to the ringing artifacts incurred by the DWT, there has been a tremendous contribution from researchers and experts of various institutes and research labs over the past two decades.

In addition to the transformation module, in the DCT-based motion-compensated predictive (MCP) [15] coding architecture, previously processed frames are considered as reference frames to predict the future frames. Even though the transformation module is energy-preserving and lossless, the subsequent quantization of the transformed coefficients, applied to achieve higher compression, leads to further loss in the frames that are stored in the frame memory as reference frames for future frame prediction. Decoded frames are used for the prediction of new frames as per the MCP coding technique. JPEG 2000 [16] proved that high-quality image compression can be achieved by applying the DWT. This motivates us to use a combination of orthogonal and bi-orthogonal wavelet filters at different levels of decomposition for the intra frames, and the DCT for the inter frames, of the video sequence.

3 PROPOSED HYBRID TRANSFORMATION WITH DIFFERENT COMBINATIONS OF WAVELET FILTERS

In order to improve the efficiency of the transformation phase, the following techniques are adopted in the transformation module of the encoder: orthogonal and bi-orthogonal wavelet filters, such as the Haar filter and the Daubechies 9/7 filter, are considered for the intra frames, and the Discrete Cosine Transform for the inter frames of the video sequence. Figure 2 illustrates an overview of the H.264/AVC encoder with the hybrid transformation technique. Previously processed frames (F'n−1) are used to perform Motion Estimation and Motion Compensated Prediction, which yields motion vectors. These motion vectors are used to form a motion-compensated frame. In the case of inter frames, this frame is subtracted from the current frame (Fn), and the residual frame is transformed using the Discrete Cosine Transform (T) and quantized (Q). In the case of intra frames, the current frame is transformed using the Discrete Wavelet Transform (DWT) with different orthogonal wavelet filters, such as Haar and Daubechies, and quantized (Q). The quantized transform coefficients are then entropy coded and transmitted or stored through the NAL, along with the motion vectors found in the motion estimation process.

[Figure 2 (encoder block diagram: Fn → (inter) ME/MC against F'n−1 → residual → T → Q, or (intra) → choose intra prediction → DWT → Q; → reorder → entropy encoder → NAL; inverse path Q'−1 → IT/IDWT → deblocking filter → F'n) appears here.]
Figure 2: Encoder in the hybrid transformation with wavelet filters.

For predicting the subsequent frames from the anchor intra frames, the quantized transform coefficients are dequantized again (Q'−1), inversely transformed (IT), and retained in the frame store memory for motion-compensated prediction. In the case of intra frames, the inverse Discrete Wavelet Transform is applied in order to obtain the reconstructed reference frames (F'n), through the de-blocking filter, for the inter frames of the video sequence.

The hybrid transformation technique employs different techniques for the different categories of frames. Intra frames are coded using both the Haar wavelet filter coefficients [0.707, 0.707] and the bi-orthogonal Daubechies 9/7 wavelet filter coefficients shown in Table 1 [16], in different combinations at different decomposition levels. Because of the wavelet's advantages over the DCT, such as complete spatial correlation among the pixels of the whole frame and the avoidance of undesirable blocking artifacts, the intra frame is reconstructed with high quality. The first frame in a GOF is intra-frame coded; frequent intra frames enable random access to the coded stream. Inter frames are predicted from previously decoded intra frames.

Table 1: Biorthogonal (Daubechies 9/7) wavelet filter coefficients [16].

Analysis filter coefficients
   i     Lowpass gL(i)    Highpass gH(i)
   0      0.602949018      1.115087052
  ±1      0.266864118     −0.591271763
  ±2     −0.078223267     −0.057543526
  ±3     −0.016864118      0.091271763
  ±4      0.026748757          —

Synthesis filter coefficients
   i     Lowpass hL(i)    Highpass hH(i)
   0      1.115087052      0.602949018
  ±1      0.591271763     −0.266864118
  ±2     −0.057543526     −0.078223267
  ±3     −0.091271763      0.016864118
  ±4          —            0.026748757

Table 2: Proposed combinations of wavelet filters.

  Proposed      1st-level        2nd-level
  combination   decomposition    decomposition
  P1            Haar             Haar
  P2            Haar             Daub
  P3            Daub             Haar
  P4            Daub             Daub

4 EXPERIMENTAL RESULTS AND DISCUSSION

The experiments were conducted on three CIF video sequences, "Bus" (352x288, 149 frames), "Stefan" (352x288, 89 frames) and "Flower Garden" (352x288, 299 frames), and two QCIF video sequences, "Suzie" (176x144, 149 frames) and "Mobile" (176x144, 299 frames). The experimental results show that the developed hybrid transform coding with the wavelet filter combinations outperforms conventional DCT-based video coding in terms of quality.

The Peak Signal-to-Noise Ratio (PSNR) is commonly used to measure quality. It is obtained on a logarithmic scale from the Mean Squared Error (MSE) between the original and the reconstructed image or video frame, with respect to the highest available symbol value in the spatial domain:

PSNR = 10 log10( (2^n − 1)^2 / MSE ) dB    (9)

where n is the number of bits per image symbol. The fundamental trade-off is between bit rate and fidelity [17]; the aim of any source encoding system is to make this trade-off acceptable while keeping moderate coding efficiency.

Table 2 shows the combinations of the orthogonal Haar and bi-orthogonal Daubechies 9/7 wavelet filters at different levels of decomposition in transform coding. These combinations are simulated in the H.264/AVC codec, where the DCT is the de facto transformation technique for both the intra and inter frames of the video sequence.

Table 3 shows the performance comparison, in terms of Peak Signal-to-Noise Ratio (PSNR), of the existing de facto DCT transformation against the proposed wavelet filter combinations. The values in the table represent the average PSNR for the Luminance (Y) component.
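A minimal sketch of the hybrid dispatch described in Section 3 (assuming toy 16x16 frames, a single 2-D Haar analysis level in place of the Table 2 filter combinations, and omitting motion estimation, quantization and entropy coding):

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square block via the orthonormal DCT matrix (cf. Eq. (1))."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    D = np.sqrt(2 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    D[0, :] = np.sqrt(1 / n)
    return D @ block @ D.T

def haar2(frame):
    """One 2-D Haar analysis level, returning the LL, LH, HL, HH subbands."""
    lo = (frame[:, 0::2] + frame[:, 1::2]) / np.sqrt(2)   # columns, low-pass
    hi = (frame[:, 0::2] - frame[:, 1::2]) / np.sqrt(2)   # columns, high-pass
    def rows(x):
        return ((x[0::2, :] + x[1::2, :]) / np.sqrt(2),
                (x[0::2, :] - x[1::2, :]) / np.sqrt(2))
    (ll, lh), (hl, hh) = rows(lo), rows(hi)
    return ll, lh, hl, hh

def transform_frame(frame, frame_type, reference=None):
    """Hybrid dispatch: DWT for intra frames, block DCT of the residual for inter."""
    if frame_type == "intra":
        return haar2(frame)                      # wavelet path (Section 3)
    residual = frame - reference                 # motion compensation omitted here
    blocks = residual.reshape(2, 8, 2, 8).transpose(0, 2, 1, 3)  # 8x8 tiling
    return np.array([[dct2(b) for b in row] for row in blocks])

rng = np.random.default_rng(1)
intra = rng.normal(128, 10, (16, 16))
inter = intra + rng.normal(0, 2, (16, 16))       # nearly identical next frame
ll, lh, hl, hh = transform_frame(intra, "intra")
coeffs = transform_frame(inter, "inter", reference=intra)
print(ll.shape, coeffs.shape)                    # (8, 8) (2, 2, 8, 8)
```

The dispatch on frame type is the essential point; everything else in a real encoder (prediction, quantization, the deblocking loop) wraps around this choice.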

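PSNR as defined in Eq. (9) can be computed directly; a small sketch assuming 8-bit symbols (the frame contents here are synthetic, chosen only so the expected value is easy to check by hand):

```python
import numpy as np

def psnr(original, reconstructed, bits=8):
    """PSNR in dB per Eq. (9): 10*log10((2^n - 1)^2 / MSE), n bits per symbol."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10 * np.log10((2 ** bits - 1) ** 2 / mse)

original = np.tile(np.arange(176, dtype=np.uint8), (144, 1))  # QCIF-sized frame
noisy = original.astype(float) + 2.0                          # uniform error of 2
print(round(psnr(original, noisy), 2))  # MSE = 4 -> 10*log10(255**2 / 4) = 42.11
```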

The Chrominance (U and V) components are reported as well. As per the Human Visual System, human eyes are more sensitive to the Luminance component than to the Chrominance components. In this analysis, both the Luminance and Chrominance components are considered, due to the importance of colour in near-lossless applications. There is a 0.12 dB Y-PSNR improvement for the P4 combination over the DCT transformation for the 'Bus' CIF sequence. When the comparison is made for the 'Stefan' CIF sequence, a 0.31 dB Y-PSNR improvement is achieved by the P1 combination over the existing transformation. A 0.14 dB Y-PSNR improvement over the DCT transformation is obtained with the P4 combination for the 'Flower Garden' CIF sequence.

Table 3: PSNR comparison for the various video sequences.

  Sequence         Existing     P1       P2       P3       P4
                     (dB)      (dB)     (dB)     (dB)     (dB)
  Bus          Y    35.77     35.03    35.88    35.88    35.89
               U    35.83     35.81    35.83    35.82    35.82
               V    36.04     36.03    36.04    36.03    36.03
  Stefan       Y    36.38     35.69    36.50    36.50    36.50
               U    35.00     35.00    35.01    35.00    35.00
               V    36.90     36.90    36.91    36.91    36.91
  Flower       Y    36.00     35.72    36.13    36.13    36.14
  Garden       U    36.51     36.49    36.47    36.50    36.50
               V    34.93     34.92    34.93    34.94    34.93
  Suzie        Y    37.62     37.57    37.66    37.68    37.68
               U    43.76     43.71    43.72    43.75    43.74
               V    43.32     43.35    43.43    43.39    43.39
  Mobile       Y    33.95     33.92    34.10    34.10    34.10
               U    35.13     35.12    35.10    35.08    35.08
               V    34.92     34.96    34.91    34.91    34.91

As far as the QCIF sequences 'Suzie' and 'Mobile' are concerned, up to 0.15 dB of Y-PSNR improvement is achieved when the bi-orthogonal wavelet filters are used in the 2nd level of decomposition of the wavelet operation for the intra frames of the video sequences. In both the CIF and QCIF video sequences, a comparable quality improvement is attained as far as the Chrominance components (U-PSNR and V-PSNR) are concerned.

5 CONCLUSION

In this paper, a hybrid transformation technique for advanced video coding has been proposed, in which the intra frames of the video sequence are coded by the DWT with Haar and Daubechies wavelet filters and the inter frames of the video sequence are coded with the DCT technique. The hybrid transformation technique has been simulated in the existing H.264/AVC reference software. Experiments were conducted with various standard CIF and QCIF video sequences, namely Bus, Stefan, Flower Garden, Mobile and Suzie. The performance parameter considered in this paper is the PSNR. The performance evaluations show that the hybrid transformation technique significantly outperforms the existing DCT transformation method used in H.264/AVC. The experimental results also demonstrate that the combination of the Haar wavelet filter in the 1st level of decomposition and the Daubechies wavelet filter in the 2nd level of decomposition outperforms the other combinations and the original DCT used in the existing AVC standard.

ACKNOWLEDGEMENT

The authors wish to thank S. Anusha, A. R. Srividhya, S. Vanitha, V. Rajalakshmi, R. Ramya, M. Vishnupriya, A. Arun, V. Vijay Anand, S. Dhinesh Kumar and P. Navaneetha Krishnan, undergraduate students, for their valuable help.

6 REFERENCES

[1] Zixiang Xiong, Kannan Ramachandran, Michael T. Orchard and Ya-Qin Zhang: A Comparative Study of DCT- and Wavelet-Based Image Coding, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 5, pp. 692-695 (1999).
[2] N. Ahmed, T. Natarajan and K. R. Rao: Discrete Cosine Transform, IEEE Transactions on Computers, pp. 90-93 (1974).
[3] Ingrid Daubechies: Ten Lectures on Wavelets, Capital City Press, Pennsylvania, pp. 53-105 (1992).
[4] Marc Antonini, Michel Barlaud, Pierre Mathieu and Ingrid Daubechies: Image Coding Using Wavelet Transform, IEEE Transactions on Image Processing, Vol. 1, No. 2, pp. 205-220 (1992).
[5] Gary J. Sullivan, Pankaj Topiwala and Ajay Luthra: The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions, SPIE Conference on Applications of Digital Image Processing XXVII (2004).
[6] Iain E. G. Richardson: H.264 and MPEG-4 Video Compression, John Wiley & Sons (2003).
[7] ftp://ftp.imtc.org/jvt-experts/reference_software.
[8] C. E. Shannon: A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27, pp. 623-656 (1948).
[9] Keith Jack: Video Demystified, Penram International Publishing Pvt. Ltd., Mumbai, pp. 234-236 (2001).
[10] Allen Gersho: Quantization, IEEE Communications Society Magazine, pp. 16-29 (1977).
[11] Peng H. Ang, Peter A. Ruetz and David Auld: Video Compression Makes Big Gains, IEEE Spectrum (1991).
[12] Frederic Dufaux, Fabrice Moscheni: Motion


Estimation Techniques for Digital TV: A Review and a New Contribution, Proceedings of the IEEE, Vol. 83, No. 6, pp. 858-876 (1995).
[13] D. A. Huffman: A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the IRE, Vol. 40, No. 9, pp. 1098-1101 (1952).
[14] P. G. Howard, J. C. Vitter: Arithmetic Coding for Data Compression, Proceedings of the IEEE, Vol. 82, No. 6, pp. 857-865 (1994).
[15] K. R. Rao, J. J. Hwang: Techniques and Standards for Image, Video and Audio Coding, Prentice Hall, NJ, pp. 85-96 (1996).
[16] B. E. Usevitch: A Tutorial on Modern Lossy Wavelet Image Compression: Foundations of JPEG 2000, IEEE Signal Processing Magazine, Vol. 18, No. 5, pp. 22-35 (2001).
[17] Gary J. Sullivan, Thomas Wiegand: Video Compression – From Concepts to the H.264/AVC Standard, Proceedings of the IEEE, Vol. 93, No. 1, pp. 18-31 (2005).
