Ubiquitous Computing and Communication Journal

A HYBRID TRANSFORMATION TECHNIQUE FOR ADVANCED CODING

M. Ezhilarasan, P. Thambidurai
Department of Computer Science & Engineering and Information Technology, Pondicherry Engineering College, Pondicherry – 605 014, India
[email protected]

ABSTRACT
A video encoder compresses video through the combination of three main modules: motion estimation and compensation, transformation, and entropy encoding. Among these three modules, transformation is the module that removes the spatial redundancy present in the spatial domain of the video sequence. The Discrete Cosine Transform (DCT) is the de facto transformation method in existing image and video coding standards. Even though the DCT has very good energy-preservation and decorrelation properties, it suffers from blocking artifacts. To overcome this problem, a hybridization method has been incorporated into the transformation module of the video encoder. This paper presents a hybrid transformation module that applies the DCT to the inter frames and a combination of wavelet filters to the intra frames of the video sequence. The proposal is also applied to the existing H.264/AVC standard. Extensive experiments have been conducted with various standard CIF and QCIF video sequences. The results show that the proposed hybrid transformation technique considerably outperforms the existing technique used in H.264/AVC.

Keywords: Data Compression, DCT, DWT, Video Coding, Transformation.

1 INTRODUCTION

Transform coding techniques have become an important paradigm in image and video coding standards, in which the Discrete Cosine Transform (DCT) [1][2] is applied due to its high decorrelation and energy compaction properties. In the past two decades, many contributions have focused on the Discrete Wavelet Transform (DWT) [3][4] for its performance in image coding. These two popular techniques, DCT and DWT, are widely applied in image and video coding applications. The International Organization for Standardization / International Electrotechnical Commission (ISO/IEC) and the International Telecommunication Union – Telecommunication Standardization Sector (ITU-T) have developed their own video coding standards, viz. the Moving Picture Experts Group (MPEG) standards MPEG-1, MPEG-2 and MPEG-4 for multimedia, and H.261, H.263, H.263+, H.263++ and H.26L for videoconferencing applications. Recently, MPEG and the Video Coding Experts Group (VCEG) jointly designed a new standard, H.264 / MPEG-4 Part 10 [5], to provide better compression of video sequences. There has been a tremendous contribution by researchers and experts of various institutions and research laboratories over the past two decades to take up the recent technology requirements in video coding standards.

In Advanced Video Coding (AVC) [6], video is captured as a sequence of frames. Each frame is compressed by partitioning it into one or more slices, where each slice consists of a sequence of macroblocks. These macroblocks are transformed, quantized and encoded. The transformation module converts the frame data from the time domain to the frequency domain, which decorrelates the energy (i.e., the amount of information present in the frame) in the spatial domain. It also concentrates the energy of the frame into a small number of transform coefficients, which are more efficient to encode than the original frame. Since the transformation module is reversible in nature, this process does not change the information content of the source input signal during the encoding and decoding process.

As per the Human Visual System (HVS), human eyes are more sensitive to low-frequency signals than to high-frequency signals. The decisive objective of this paper is to develop a hybrid technique that achieves higher performance in the parameters specified above than the existing technique used in the current advanced video coding standard. In this paper, a combination of orthogonal and bi-orthogonal wavelet filters is applied at

Volume 3 Number 3 Page 89 www.ubicc.org Ubiquitous Computing and Communication Journal

different decomposition levels for the intra frames, and the DCT is applied to the inter frames of the video encoder. Even though only the intra frames are coded with the wavelet transform, the impact can also be seen in inter-frame coding: with better-quality anchor pictures retained in the frame memory for prediction, the remaining inter-frame pictures are coded more efficiently with the DCT. The proposed transformation method is also implemented in the H.264/AVC reference software [7]. The paper is organized as follows. In Section 2, the basics of transform coding methods are highlighted. The proposed hybrid transformation technique is described in Section 3. Extensive experimental results and discussion are given in Section 4, followed by the conclusion in Section 5.

2 BASICS OF TRANSFORM CODING

For any inter-frame video coding standard, the basic functional modules are motion estimation and compensation, transformation, quantization and entropy encoding. As shown in Fig. 1, the temporal redundancies that exist between successive frames are reduced by the motion estimation and compensation module. The residue, i.e. the difference between the original and the motion-compensated frame, is applied to the sequence of transformation and quantization modules. The spatial redundancy that exists among neighboring pixels in an image or intra frame is minimized by these modules.

2.1 Basics of Transformation

From the basic concepts of information theory, coding symbols as vectors is more efficient than coding them as scalars [8]. Using this phenomenon, groups of blocks of consecutive symbols from the source video input are taken as vectors. There is high correlation among neighboring pixels in an image or intra frame of video. Transformation is a reversible model [9] that decorrelates the symbols in the given blocks. In the recent image and video coding standards, the following transformation techniques are applied due to their orthonormal property and energy compactness.

2.1.1 Discrete Cosine Transform

The Discrete Cosine Transform is a widely used transform coding technique in image and video compression algorithms. It is able to decorrelate the input signal in a data-independent manner. When an image or a frame is transformed by the DCT, it is first divided into blocks, typically of size 8x8 pixels. These blocks are transformed separately, without any influence from the surrounding blocks. The top-left coefficient in each block is called the DC coefficient and corresponds to the average value of the block. The rightmost coefficients in the block are the ones with the highest horizontal frequency, while the coefficients at the bottom have the highest vertical frequency; the coefficient in the bottom-right corner therefore has the highest frequency of all. The forward DCT of a discrete signal f(i,j) of an (MxN) block, and the inverse DCT (IDCT) giving the reconstructed image f̃(i,j) for the same (MxN) block, are defined as

F(u,v) = (2 C(u) C(v) / √(MN)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} f(i,j) cos[(2i+1)uπ / 2M] cos[(2j+1)vπ / 2N]    (1)

f̃(i,j) = Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} (2 C(u) C(v) / √(MN)) F(u,v) cos[(2i+1)uπ / 2M] cos[(2j+1)vπ / 2N]    (2)
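As an illustration (not part of the original paper, and not the integer approximation that H.264/AVC actually uses), Eqs. (1)–(2) can be transcribed directly in Python with NumPy; the constants C(u), C(v) are the ones defined in the text below the equations, and the gradient test block is an arbitrary stand-in for smooth image content:

```python
import numpy as np

def C(x):
    # Normalization constants used in Eqs. (1)-(2): C(0) = sqrt(2)/2, else 1.
    return np.sqrt(2) / 2 if x == 0 else 1.0

def dct2(f):
    """Forward 2-D DCT of an M x N block, transcribed directly from Eq. (1)."""
    M, N = f.shape
    i, j = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    F = np.zeros((M, N))
    for u in range(M):
        for v in range(N):
            basis = (np.cos((2 * i + 1) * u * np.pi / (2 * M)) *
                     np.cos((2 * j + 1) * v * np.pi / (2 * N)))
            F[u, v] = 2 * C(u) * C(v) / np.sqrt(M * N) * np.sum(f * basis)
    return F

def idct2(F):
    """Inverse 2-D DCT (IDCT), transcribed directly from Eq. (2)."""
    M, N = F.shape
    u, v = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    Cu = np.where(u == 0, np.sqrt(2) / 2, 1.0)
    Cv = np.where(v == 0, np.sqrt(2) / 2, 1.0)
    f = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            basis = (np.cos((2 * i + 1) * u * np.pi / (2 * M)) *
                     np.cos((2 * j + 1) * v * np.pi / (2 * N)))
            f[i, j] = np.sum(2 * Cu * Cv / np.sqrt(M * N) * F * basis)
    return f

# A smooth 8x8 block (a gradient), a stand-in for natural image content.
block = np.fromfunction(lambda i, j: 10.0 * i + 5.0 * j, (8, 8))
F = dct2(block)
# Reversibility: the block is recovered up to floating-point error.
assert np.allclose(idct2(F), block)
# Energy compaction: the DC coefficient dominates for smooth content.
assert abs(F[0, 0]) > abs(F[1:, 1:]).max()
```

The direct double loop makes the correspondence with the equations obvious; production codecs instead use separable fast transforms (or, in H.264/AVC, an integer transform).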

where i, u = 0, 1, …, M−1 and j, v = 0, 1, …, N−1, and the constants C(u) and C(v) are obtained by

C(x) = √2/2 if x = 0, and C(x) = 1 otherwise.

[Figure 1 (block diagram of the basic video encoder) appears here.]
Figure 1: Basic video encoding module.

The transformation module converts the residue symbols from the time domain into the frequency domain, which decorrelates the energy present in the spatial domain; this is appropriate for quantization. The quantized transform coefficients and the motion displacement vectors obtained from the motion estimation and compensation module are applied to the entropy encoding (Variable Length Coding) module, which removes the statistical redundancy. These modules are briefly introduced in the following subsections.

The MPEG standards apply the DCT for video compression. The compression exploits the spatial and temporal redundancies that occur in video objects or frames. Spatial redundancy can be exploited by simply coding each frame separately; this technique is referred to as intra-frame coding. Additional compression can be achieved by taking advantage of the fact that consecutive frames are often almost identical. This temporal compression has the potential for a major reduction over simply encoding each frame separately, though the effect is lessened by the fact that video contains frequent scene changes. This technique is referred to as inter-frame coding. The DCT and motion-compensated inter-frame prediction are combined: the coder subtracts the motion-compensated prediction from the source picture to form a 'prediction error' picture. The prediction error is transformed with the DCT, the coefficients are quantized using scalar quantization, and the quantized values are coded using arithmetic coding. The coded luminance and chrominance prediction error is combined with the 'side information' required by the decoder, such as motion vectors and synchronizing information, and formed into a bit stream for transmission. This technique works well with a stationary background and a moving foreground, since only the movement in the foreground is coded.

Despite all the advantages of the JPEG and MPEG compression schemes based on the DCT, namely simplicity, satisfactory performance, and the availability of special-purpose hardware for implementation, they are not without shortcomings. Since the input image needs to be 'blocked', correlation across the block boundaries is not eliminated. The result is noticeable and annoying 'blocking artifacts', particularly at low bit rates.

2.1.2 Discrete Wavelet Transform

Wavelets are functions defined over a finite interval and having an average value of zero. The basic idea of the wavelet transform is to represent any arbitrary function as a superposition of a set of such wavelets or basis functions. These basis functions, or child wavelets, are obtained from a single prototype wavelet called the mother wavelet, by dilations (scaling) and translations. Wavelets are used to characterize detail information; the averaging information is formally determined by a kind of dual to the mother wavelet, called the scaling function φ(t). The main concept of wavelets is that at a particular level of resolution j, the set of translates indexed by n forms a basis at that level. The translates forming the basis at the next level j+1, a coarser level, can all be written as a weighted sum of the level-j basis functions. The scaling function is chosen such that the coefficients of its translates are all necessarily bounded; in particular, it satisfies the dilation (two-scale) equation

φ(t) = Σ_{n∈Z} √2 h0[n] φ(2t − n)    (3)

The dilation equation is a recipe for finding a function that can be built from a sum of copies of itself that are scaled, translated and dilated. Equation (3) expresses a condition that a function must satisfy to be a scaling function and, at the same time, forms a definition of the scaling vector h0. The wavelet at the coarser level is expressed analogously as

ψ(t) = Σ_{n∈Z} √2 h1[n] φ(2t − n)    (4)

The discrete high-pass impulse response h1[n], which describes the details via the wavelet function, can be derived from the discrete low-pass impulse response h0[n] using the following equation:

h1[n] = (−1)^n h0[1 − n]    (5)

The number of coefficients in the impulse response is called the number of taps of the filter. For orthogonal filters, the forward transform and its inverse are transposes of each other, and the analysis filters are identical to the synthesis filters.

2.2 Quantization

A quantizer [10][11] reduces the number of bits needed to store the transformed coefficients by reducing the precision of their values. Since this is a many-to-one mapping, it is a lossy process, and it is the main source of compression in an encoder. Quantization can be performed on each individual coefficient, which is referred to as scalar quantization, or on a group of coefficients together, which is referred to as vector quantization.

Uniform quantization partitions the domain of input values into equally spaced intervals, except for the two outer intervals. The end points of the partition intervals are called the quantizer decision boundaries. The output or reconstruction value corresponding to each interval is taken to be the midpoint of the interval. The length of each interval, fixed in the case of uniform quantization, is referred to as the step size, denoted by the symbol Δ and given by

Δ = 2 X_max / M    (6)

where M is the number of quantizer levels and X_max is the maximum range of the input symbols.
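The filter relation of Eq. (5) and the uniform quantizer of Eq. (6) can be sketched together as follows. This is an illustrative Python sketch, not the codec's implementation: the test signal, the 64-level quantizer range and the error threshold are arbitrary choices for the demonstration, and the two-tap Haar pair stands in for the longer filters used later in the paper.

```python
import numpy as np

# Orthonormal Haar low-pass filter h0 (the [0.707, 0.707] filter of Section 3).
h0 = np.array([1 / np.sqrt(2), 1 / np.sqrt(2)])

# High-pass filter from Eq. (5): h1[n] = (-1)^n * h0[1 - n], for n = 0, 1.
h1 = np.array([(-1) ** n * h0[1 - n] for n in range(2)])

def haar_analysis(x):
    """One DWT level: filter with h0/h1 and downsample by 2 (x has even length)."""
    pairs = x.reshape(-1, 2)
    approx = pairs @ h0          # low-pass branch (scaled averages)
    detail = pairs @ h1          # high-pass branch (scaled differences)
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of one level; for orthogonal filters, synthesis is the transpose."""
    pairs = np.outer(approx, h0) + np.outer(detail, h1)
    return pairs.reshape(-1)

def uniform_quantize(c, x_max, levels):
    """Midpoint uniform quantizer with step size from Eq. (6): delta = 2*x_max/M."""
    delta = 2 * x_max / levels
    index = np.floor((c + x_max) / delta).clip(0, levels - 1)
    return -x_max + (index + 0.5) * delta   # reconstruct at the interval midpoint

x = np.array([10.0, 12.0, 14.0, 20.0, 20.0, 18.0, 4.0, 2.0])
a, d = haar_analysis(x)
# Perfect reconstruction without quantization:
assert np.allclose(haar_synthesis(a, d), x)
# With quantization (x_max = 32, M = 64, so delta = 1), the loss stays bounded:
x_hat = haar_synthesis(uniform_quantize(a, 32, 64), uniform_quantize(d, 32, 64))
assert np.max(np.abs(x_hat - x)) < 1.0
```

With a step size of Δ = 1, each coefficient is off by at most Δ/2, so the reconstructed samples are off by at most 2·(Δ/2)/√2 ≈ 0.71, which the final assertion checks.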
In this work, the quantizer used in H.264 has been adopted for inter-frame motion-compensated predictive coding, which allows an acceptable loss in quality for the given video sequences.

Returning to the wavelet transform of Section 2.1.2: the scaling function φ(t), along with its translates, forms a basis at the coarser level j+1 but not at level j. Instead, at level j the set of translates of the scaling function φ(t), together with the set of translates of the mother wavelet ψ(t), does form a basis. Since the translates of the scaling function φ(t) at a coarser level can be written exactly as a weighted sum of translates at a finer level, the scaling function must satisfy the dilation equation (3).

2.3 Motion Estimation

Motion estimation (ME) [12] is the process of estimating the pixels of the current frame from reference frame(s). Block-matching motion estimation, or the block matching algorithm (BMA), which is a temporal-redundancy removal technique


between two or more successive frames, is an integral part of most motion-compensated video coding standards. Frames are divided into regular-sized blocks, referred to as macroblocks (MB). The block-matching method finds the best-matched block in the previous frame. Based on a block distortion measure (BDM), the displacement of the best-matched block is described as the motion vector (MV) of the block in the current frame. The best match is usually evaluated by a cost function based on a BDM such as the mean absolute difference (MAD), defined as

MAD(i, j) = (1 / MN) Σ_{k=0}^{M−1} Σ_{l=0}^{N−1} | c(x + k, y + l) − p(x + k + i, y + l + j) |    (7)

where M x N is the size of the macroblock, c(.,.) and p(.,.) denote the pixel intensities in the current frame and in a previously processed frame respectively, (x, y) is the position of the upper-left corner of the current block, (k, l) indexes the pixels within the block, and (i, j) represents the candidate displacement relative to the position of the current block. After checking each location in the search area, the motion vector is determined as the displacement (i, j) at which the MAD has its minimum value. In this work, an exhaustive full search has been applied for motion-compensated prediction.

2.4 Entropy Encoding

Following Claude E. Shannon [8], the entropy H of an information source with alphabet S = {s1, s2, …, sn} is defined as

H(S) = Σ_{i=1}^{n} p_i log2(1 / p_i)    (8)

where p_i is the probability of symbol s_i in S. The term log2(1/p_i) indicates the amount of information contained in s_i, which corresponds to the number of bits needed to encode s_i. An entropy encoder further compresses the quantized values to give a better compression ratio. It uses a model to determine the probability of each quantized value and produces an appropriate code based on these probabilities, so that the resultant output code stream is smaller than the input stream. The most commonly used entropy encoders are the Huffman encoder [13] and the arithmetic encoder [14]. It is important to note that a properly designed entropy encoder is absolutely necessary, along with an optimal signal transformation, to get the best possible compression.

Arithmetic coding is a more modern coding method that usually outperforms Huffman coding in practice. In arithmetic coding, a message is represented by an interval of real numbers between 0 and 1. As the message becomes longer, the interval needed to represent it becomes smaller, and the number of bits needed to specify that interval grows. Successive symbols of the message reduce the size of the interval in accordance with the symbol probabilities generated by the model. Arithmetic coding is more complex than Huffman coding in its implementation. The CAVLC used in H.264 has been considered in the experiments for the entropy encoding process.

2.5 Motivation for this work

The DCT is the best transformation technique for motion estimation and motion-compensated predictive coding models. Due to the blocking-artifact problems encountered with the DCT, subband coding methods are considered as an alternative; the DWT is the best alternative method because of its energy compaction and preservation properties. Due to the ringing artifacts incurred by the DWT, there has been a tremendous contribution from researchers and experts of various institutes and research labs over the past two decades.

In addition to the transformation module, in the DCT-based motion-compensated predictive (MCP) [15] coding architecture, previously processed frames are considered as reference frames to predict the future frames. Even though the transformation module is energy-preserving and lossless, the subsequent quantization of the transformed coefficients, applied to achieve higher compression, leads to further loss in the frames that are stored in the frame memory as reference frames for future frame prediction. Decoded frames are used for the prediction of new frames as per the MCP coding technique. JPEG 2000 [16] proved that high-quality image compression can be achieved by applying the DWT. This motivates us to use a combination of orthogonal and bi-orthogonal wavelet filters at different levels of decomposition for the intra frames, and the DCT for the inter frames, of the video sequence.

3 PROPOSED HYBRID TRANSFORMATION WITH DIFFERENT COMBINATIONS OF WAVELET FILTERS

In order to improve the efficiency of the transformation phase, the following techniques are adopted in the transformation module of the encoder: orthogonal and bi-orthogonal wavelet filters, such as the Haar filter and the Daubechies 9/7 filter, are considered for the intra frames, and the Discrete Cosine Transform for the inter frames of the video sequence. Figure 2 illustrates an overview of the H.264/AVC encoder with the hybrid transformation technique. Previously processed frames (F'n−1) are used to perform Motion Estimation and Motion Compensated Prediction, which yields motion vectors. These motion vectors are used to form a motion-compensated frame. In the case of inter frames, this frame is subtracted from the current frame (Fn), and the residual frame is transformed using the Discrete Cosine Transform (T) and quantized (Q). In the case of intra frames, the current frame is transformed using the Discrete Wavelet Transform (DWT) with different orthogonal wavelet filters, such as Haar and Daubechies, and quantized (Q). The quantized transform coefficients are then entropy coded and transmitted or stored through the NAL, along with the motion vectors found in the motion estimation process.

[Figure 2 (encoder block diagram: Fn → (inter) ME/MC against F'n−1 → residual → T → Q, or (intra) → choose intra prediction → DWT → Q; → reorder → entropy encoder → NAL; inverse path Q'−1 → IT/IDWT → deblocking filter → F'n) appears here.]
Figure 2: Encoder in the hybrid transformation with wavelet filters.

For predicting the subsequent frames from the anchor intra frames, the quantized transform coefficients are dequantized again (Q'−1), inversely transformed (IT), and retained in the frame store memory for motion-compensated prediction. In the case of intra frames, the inverse Discrete Wavelet Transform is applied in order to obtain the reconstructed reference frames (F'n), through the de-blocking filter, for the inter frames of the video sequence.

The hybrid transformation technique employs different techniques for the different categories of frames. Intra frames are coded using both the Haar wavelet filter coefficients [0.707, 0.707] and the bi-orthogonal Daubechies 9/7 wavelet filter coefficients shown in Table 1 [16], in different combinations at different decomposition levels. Because of the wavelet's advantages over the DCT, such as complete spatial correlation among the pixels of the whole frame and the avoidance of undesirable blocking artifacts, the intra frame is reconstructed with high quality. The first frame in a GOF is intra-frame coded; frequent intra frames enable random access to the coded stream. Inter frames are predicted from previously decoded intra frames.

Table 1: Biorthogonal (Daubechies 9/7) wavelet filter coefficients [16].

Analysis filter coefficients
   i     Lowpass gL(i)    Highpass gH(i)
   0      0.602949018      1.115087052
  ±1      0.266864118     −0.591271763
  ±2     −0.078223267     −0.057543526
  ±3     −0.016864118      0.091271763
  ±4      0.026748757          —

Synthesis filter coefficients
   i     Lowpass hL(i)    Highpass hH(i)
   0      1.115087052      0.602949018
  ±1      0.591271763     −0.266864118
  ±2     −0.057543526     −0.078223267
  ±3     −0.091271763      0.016864118
  ±4          —            0.026748757

Table 2: Proposed combinations of wavelet filters.

  Proposed      1st-level        2nd-level
  combination   decomposition    decomposition
  P1            Haar             Haar
  P2            Haar             Daub
  P3            Daub             Haar
  P4            Daub             Daub

4 EXPERIMENTAL RESULTS AND DISCUSSION

The experiments were conducted on three CIF video sequences, "Bus" (352x288, 149 frames), "Stefan" (352x288, 89 frames) and "Flower Garden" (352x288, 299 frames), and two QCIF video sequences, "Suzie" (176x144, 149 frames) and "Mobile" (176x144, 299 frames). The experimental results show that the developed hybrid transform coding with the wavelet filter combinations outperforms conventional DCT-based video coding in terms of quality.

The Peak Signal-to-Noise Ratio (PSNR) is commonly used to measure quality. It is obtained on a logarithmic scale from the Mean Squared Error (MSE) between the original and the reconstructed image or video frame, with respect to the highest available symbol value in the spatial domain:

PSNR = 10 log10( (2^n − 1)^2 / MSE ) dB    (9)

where n is the number of bits per image symbol. The fundamental trade-off is between bit rate and fidelity [17]; the aim of any source encoding system is to make this trade-off acceptable while keeping moderate coding efficiency.

Table 2 shows the combinations of the orthogonal Haar and bi-orthogonal Daubechies 9/7 wavelet filters at different levels of decomposition in transform coding. These combinations are simulated in the H.264/AVC codec, where the DCT is the de facto transformation technique for both the intra and inter frames of the video sequence.

Table 3 shows the performance comparison, in terms of Peak Signal-to-Noise Ratio (PSNR), of the existing de facto DCT transformation against the proposed wavelet filter combinations. The values in the table represent the average PSNR for the Luminance (Y) component.
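A minimal sketch of the hybrid dispatch described in Section 3 (assuming toy 16x16 frames, a single 2-D Haar analysis level in place of the Table 2 filter combinations, and omitting motion estimation, quantization and entropy coding):

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of a square block via the orthonormal DCT matrix (cf. Eq. (1))."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    D = np.sqrt(2 / n) * np.cos((2 * i + 1) * k * np.pi / (2 * n))
    D[0, :] = np.sqrt(1 / n)
    return D @ block @ D.T

def haar2(frame):
    """One 2-D Haar analysis level, returning the LL, LH, HL, HH subbands."""
    lo = (frame[:, 0::2] + frame[:, 1::2]) / np.sqrt(2)   # columns, low-pass
    hi = (frame[:, 0::2] - frame[:, 1::2]) / np.sqrt(2)   # columns, high-pass
    def rows(x):
        return ((x[0::2, :] + x[1::2, :]) / np.sqrt(2),
                (x[0::2, :] - x[1::2, :]) / np.sqrt(2))
    (ll, lh), (hl, hh) = rows(lo), rows(hi)
    return ll, lh, hl, hh

def transform_frame(frame, frame_type, reference=None):
    """Hybrid dispatch: DWT for intra frames, block DCT of the residual for inter."""
    if frame_type == "intra":
        return haar2(frame)                      # wavelet path (Section 3)
    residual = frame - reference                 # motion compensation omitted here
    blocks = residual.reshape(2, 8, 2, 8).transpose(0, 2, 1, 3)  # 8x8 tiling
    return np.array([[dct2(b) for b in row] for row in blocks])

rng = np.random.default_rng(1)
intra = rng.normal(128, 10, (16, 16))
inter = intra + rng.normal(0, 2, (16, 16))       # nearly identical next frame
ll, lh, hl, hh = transform_frame(intra, "intra")
coeffs = transform_frame(inter, "inter", reference=intra)
print(ll.shape, coeffs.shape)                    # (8, 8) (2, 2, 8, 8)
```

The dispatch on frame type is the essential point; everything else in a real encoder (prediction, quantization, the deblocking loop) wraps around this choice.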

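PSNR as defined in Eq. (9) can be computed directly; a small sketch assuming 8-bit symbols (the frame contents here are synthetic, chosen only so the expected value is easy to check by hand):

```python
import numpy as np

def psnr(original, reconstructed, bits=8):
    """PSNR in dB per Eq. (9): 10*log10((2^n - 1)^2 / MSE), n bits per symbol."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10 * np.log10((2 ** bits - 1) ** 2 / mse)

original = np.tile(np.arange(176, dtype=np.uint8), (144, 1))  # QCIF-sized frame
noisy = original.astype(float) + 2.0                          # uniform error of 2
print(round(psnr(original, noisy), 2))  # MSE = 4 -> 10*log10(255**2 / 4) = 42.11
```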

The Chrominance (U and V) components are reported as well. As per the Human Visual System, human eyes are more sensitive to the Luminance component than to the Chrominance components. In this analysis, both the Luminance and Chrominance components are considered, due to the importance of colour in near-lossless applications. There is a 0.12 dB Y-PSNR improvement for the P4 combination over the DCT transformation for the 'Bus' CIF sequence. When the comparison is made for the 'Stefan' CIF sequence, a 0.31 dB Y-PSNR improvement is achieved by the P1 combination over the existing transformation. A 0.14 dB Y-PSNR improvement over the DCT transformation is obtained with the P4 combination for the 'Flower Garden' CIF sequence.

Table 3: PSNR comparison for the various video sequences.

  Sequence         Existing     P1       P2       P3       P4
                     (dB)      (dB)     (dB)     (dB)     (dB)
  Bus          Y    35.77     35.03    35.88    35.88    35.89
               U    35.83     35.81    35.83    35.82    35.82
               V    36.04     36.03    36.04    36.03    36.03
  Stefan       Y    36.38     35.69    36.50    36.50    36.50
               U    35.00     35.00    35.01    35.00    35.00
               V    36.90     36.90    36.91    36.91    36.91
  Flower       Y    36.00     35.72    36.13    36.13    36.14
  Garden       U    36.51     36.49    36.47    36.50    36.50
               V    34.93     34.92    34.93    34.94    34.93
  Suzie        Y    37.62     37.57    37.66    37.68    37.68
               U    43.76     43.71    43.72    43.75    43.74
               V    43.32     43.35    43.43    43.39    43.39
  Mobile       Y    33.95     33.92    34.10    34.10    34.10
               U    35.13     35.12    35.10    35.08    35.08
               V    34.92     34.96    34.91    34.91    34.91

As far as the QCIF sequences 'Suzie' and 'Mobile' are concerned, up to 0.15 dB of Y-PSNR improvement is achieved when the bi-orthogonal wavelet filters are used in the 2nd level of decomposition of the wavelet operation for the intra frames of the video sequences. In both the CIF and QCIF video sequences, a comparable quality improvement is attained as far as the Chrominance components (U-PSNR and V-PSNR) are concerned.

5 CONCLUSION

In this paper, a hybrid transformation technique for advanced video coding has been proposed, in which the intra frames of the video sequence are coded by the DWT with Haar and Daubechies wavelet filters and the inter frames of the video sequence are coded with the DCT technique. The hybrid transformation technique has been simulated in the existing H.264/AVC reference software. Experiments were conducted with various standard CIF and QCIF video sequences, namely Bus, Stefan, Flower Garden, Mobile and Suzie. The performance parameter considered in this paper is the PSNR. The performance evaluations show that the hybrid transformation technique significantly outperforms the existing DCT transformation method used in H.264/AVC. The experimental results also demonstrate that the combination of the Haar wavelet filter in the 1st level of decomposition and the Daubechies wavelet filter in the 2nd level of decomposition outperforms the other combinations and the original DCT used in the existing AVC standard.

ACKNOWLEDGEMENT

The authors wish to thank S. Anusha, A. R. Srividhya, S. Vanitha, V. Rajalakshmi, R. Ramya, M. Vishnupriya, A. Arun, V. Vijay Anand, S. Dhinesh Kumar and P. Navaneetha Krishnan, undergraduate students, for their valuable help.

6 REFERENCES

[1] Zixiang Xiong, Kannan Ramachandran, Michael T. Orchard and Ya-Qin Zhang: A Comparative Study of DCT- and Wavelet-Based Image Coding, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 5, pp. 692-695 (1999).
[2] N. Ahmed, T. Natarajan and K. R. Rao: Discrete Cosine Transform, IEEE Transactions on Computers, pp. 90-93 (1974).
[3] Ingrid Daubechies: Ten Lectures on Wavelets, Capital City Press, Pennsylvania, pp. 53-105 (1992).
[4] Marc Antonini, Michel Barlaud, Pierre Mathieu and Ingrid Daubechies: Image Coding Using Wavelet Transform, IEEE Transactions on Image Processing, Vol. 1, No. 2, pp. 205-220 (1992).
[5] Gary J. Sullivan, Pankaj Topiwala and Ajay Luthra: The H.264/AVC Advanced Video Coding Standard: Overview and Introduction to the Fidelity Range Extensions, SPIE Conference on Applications of Digital Image Processing XXVII (2004).
[6] Iain E. G. Richardson: H.264 and MPEG-4 Video Compression, John Wiley & Sons (2003).
[7] ftp://ftp.imtc.org/jvt-experts/reference_software.
[8] C. E. Shannon: A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27, pp. 623-656 (1948).
[9] Keith Jack: Video Demystified, Penram International Publishing Pvt. Ltd., Mumbai, pp. 234-236 (2001).
[10] Allen Gersho: Quantization, IEEE Communications Society Magazine, pp. 16-29 (1977).
[11] Peng H. Ang, Peter A. Ruetz and David Auld: Video Compression Makes Big Gains, IEEE Spectrum (1991).
[12] Frederic Dufaux, Fabrice Moscheni: Motion


Estimation Techniques for Digital TV: A Review and a New Contribution, Proceedings of the IEEE, Vol. 83, No. 6, pp. 858-876 (1995).
[13] D. A. Huffman: A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the IRE, Vol. 40, No. 9, pp. 1098-1101 (1952).
[14] P. G. Howard, J. C. Vitter: Arithmetic Coding for Data Compression, Proceedings of the IEEE, Vol. 82, No. 6, pp. 857-865 (1994).
[15] K. R. Rao, J. J. Hwang: Techniques and Standards for Image, Video and Audio Coding, Prentice Hall, NJ, pp. 85-96 (1996).
[16] B. E. Usevitch: A Tutorial on Modern Lossy Wavelet Image Compression: Foundations of JPEG 2000, IEEE Signal Processing Magazine, Vol. 18, No. 5, pp. 22-35 (2001).
[17] Gary J. Sullivan, Thomas Wiegand: Video Compression – From Concepts to the H.264/AVC Standard, Proceedings of the IEEE, Vol. 93, No. 1, pp. 18-31 (2005).
