Video Compression Standards: a Comparative Analysis Of

EE5359 Spring 2012

EE5359 Project Proposal on

VIDEO COMPRESSION STANDARDS: A COMPARATIVE ANALYSIS OF H.264, DIRAC AND AVS PART 2

By Sudeep Gangavati ID : 1000717165

LIST OF ACRONYMS

AU: Access Unit

AVS: Audio Video Standard

AVS-M: Audio Video Standard for mobile

B-Frame: Bidirectionally Interpolated Frame

BBC: British Broadcasting Corporation

CAVLC: Context Adaptive Variable Length Coding

CBP: Coded Block Pattern

CIF: Common Intermediate Format EE5359 Spring 2012

DIP: Direct Intra Prediction

DPB: Decoded Picture Buffer

FPS: Frames Per Second

HD: High Definition

ICT: Integer Cosine Transform

IDR: Instantaneous Decoding Refresh

I-Frame: Intra Frame

ITU-T: International Telecommunication Union

JPEG: Joint Photographic Experts Group

MSE: Mean Square Error

PSNR: Peak Signal to Noise Ratio

QCIF: Quarter Common Intermediate Format

SMPTE: Society of Motion Picture and Television Engineers

SSIM: Structural Similarity Index

VIDEO COMPRESSION STANDARDS: A COMPARATIVE ANALYSIS OF H.264, DIRAC AND AVS PART 2 Objective: The objective of this project to study, implement and compare video coding standards like H.264/AVC [1], Dirac [3] and AVS China P2 (AVS video) [8]. The analysis will be carried out and different performance metrics like MSE, PSNR, bitrate, SSIM [13] and video quality will be evaluated for high resolution videos at various bitrates.

Motivation: With the ever increasing demand for video compression, several different video coding standards have been developed to address the needs efficient video coding. The project attempts to implement and evaluate the video coding standards that have been extensively used for video broadcasting, storage and distribution. The video coding standards that will be evaluated are H.264/AVC (Main Profile) [1], AVS China P2 (Base Profile) [8] and Dirac [3]. Since EE5359 Spring 2012

Introduction: H.264/AVC

The H.264/AVC is the latest advanced video coding standard developed by ITU-T Video Coding Experts Group together with the ISO/IEC Moving Picture Experts Group [1]. It is the most widely used video coding standard [15] [28] for streaming videos, mobile/handheld applications, HDTV broadcasting etc. The H.264 standard supports three sampling patters for luminance component (Y), red-difference chroma component (Cr) and blue-difference chroma component (Cb) [20]. The 4:4:4 sampling means that the three components (Y: Cr: Cb) have the same resolution and a sample of each component exists at every pixel position as show in Figure 1(a).

Fig. 1(a) 4:4:4 , 4:2:2, 4:2:0 Sampling patterns [20]

An H.264 encoder converts the raw video into a compressed version and the decoder converts the compressed video back to its original format. The H.264 encoder block diagram is shown in Figure 1(b). EE5359 Spring 2012

Fig.1 (b) Basic H.264 encoder structure [1]

The encoder performs transform, quantization, prediction and encoding to produce compressed video. The decoder shown in Figure 2 on the other hand does the inverse operations to obtain the uncompressed video.

Fig.2 H.264 decoder structure [2] [17]

H.264 Profiles H.264 standard defines a set of profiles and levels to set points of conformance for different classes of applications and services. For each profile there are specific encoding tools that will be supported by the decoders conforming to that profile. There are mainly seven profiles [2]: Baseline, Main, Extended, High, High 10, High 4:2:2, High 4:4:4. EE5359 Spring 2012

Main profile is designed for digital storage media and television broadcasting. Extended profile is designed for multimedia services over Internet. Baseline profile is aimed at real time applications such as video conferencing. High profile mainly aims at applications such as content distribution, studio editing and high resolution videos. Profiles are shown in Figure 3.

Fig.3 H.264 profiles and levels [2]

Video Coding Algorithm

The encoder block diagram is shown in Figure 1 [1]. Encoder will select between intra and inter-coding for blocks of each picture. Intra coding exploits several spatial prediction modes as shown in Figure 3(a) and (b) to reduce the redundancies in the signal for a single picture. Inter coding does the inter prediction of each block of sample values from previously coded pictures. Inter coding uses motion vectors for block based inter-prediction to reduce temporal redundancy between different pictures. The deblocking filter is used to reduce the blocking artifacts at the block boundaries. The prediction residual is further transformed to remove spatial correlation in the block before it is quantized. Finally, the intra predicted modes or motion vectors are combined with the quantized transform coefficient information and encoded using arithmetic coding or entropy coding.

Prediction in H.264 The H.264 video coding standard employs two different prediction techniques called intra prediction and inter prediction in order to predict the current macroblock [20]. In intra prediction, the prediction for the current macroblock of image samples is created EE5359 Spring 2012 from previously coded samples in the same frame. In an intra macroblock, there are three choices for the intra macroblock size for the luma component namely 16 x 16, 8 x 8 or 4 x 4 [1][20]. A single prediction block is generated using one of a number of possible prediction modes. For a 4 x 4 macroblock there are nine modes, for 8 x 8 macroblock size available in High profile also has 9 modes but for 16 x 16 there are 4 modes [20].

During the prediction, one mode is selected and is then used to predict the values. The modes for 4 x 4 and 16 x 16 luma prediction modes are shown in Figure 3 (a) and Figure 3 (b) respectively.

Fig. 3(a) Intra prediction modes for 4 x 4 block size [20].

Fig. 3 (b) Intra prediction modes for 16 x 16 block size [20].

In inter prediction, motion estimation and motion compensation techniques are used to predict the current macroblock [20]. This process involves selecting a prediction region, generating a prediction block and subtracting this from the original block of samples to form a residual that is transformed, quantized and then encoded. The macroblock can be of different sizes as shown in Figure 3 (c). EE5359 Spring 2012

1 macroblock partition of 2 macroblock partitions of 2 macroblock partitions of 4 sub-macroblocks of 16*16 luma samples and 16*8 luma samples and 8*16 luma samples and 8*8 luma samples and associated chroma samples associated chroma samples associated chroma samples associated chroma samples

0 0 1 Macroblock 0 partitions 0 1 1 2 3

1 sub-macroblock partition 2 sub-macroblock partitions 2 sub-macroblock partitions 4 sub-macroblock partitions of 8*8 luma samples and of 8*4 luma samples and of 4*8 luma samples and of 4*4 luma samples and associated chroma samples associated chroma samples associated chroma samples associated chroma samples

Sub-macroblock 0 0 1 partitions 0 0 1 1 2 3

Fig.3(c) Inter prediction macroblock sizes [20]

Deblocking Filter A filter is applied to every decoded macroblock as shown in Figure 3(d) to reduce the blocking distortion [20]. The deblocking filter is applied after the inverse transform in the encoder before reconstructing and storing the macroblock for future predictions and in the decoder before reconstructing and displaying the macroblock. The filter smooths the block edges, improving the appearance of the decoded frames.

Fig. 3 (d) Boundaries in a macroblock to be filtered (luma boundaries shown with solid lines and chroma boundaries shown with dotted lines) [2]

DIRAC EE5359 Spring 2012

Dirac is a video codec originally developed by BBC [3]. The main aim of Dirac video standard is to provide high-quality video compression for web streaming and HDTV applications. BBC used Dirac to transmit HDTV pictures of Beijing Olympics in 2008 [3] [4]. Dirac Pro is a version of Dirac family of compression tools mainly optimized for video production and archiving applications and the focus is on high quality and low latency. Dirac Pro is intended for high quality applications with lower compression ratios [4][5]. In the Dirac codec, image motion is tracked and the motion information is used to make a prediction of a later frame. A transform is applied to the prediction error between the current frame and the previous frame aided by motion compensation and the transform coefficients are quantized and entropy coded. Temporal and spatial redundancies are removed by motion estimation, motion compensation and discrete wavelet transform respectively. Dirac uses a more flexible and efficient form of entropy coding called arithmetic coding which packs the bits efficiently into the bit stream [30]. Dirac Pro supports the following technical aspects [5]:

 Intra-frame coding only  10 bit 4:2:2  No Subsampling  Lossless or visually lossless compression  Low latency on encode/decode  Robust over multiple passes  Support for multiple HD image formats and frame rates  Low complexity for decoding

The main difference in the Dirac and Dirac Pro is the treatment in the final process in compression – the arithmetic coding. Arithmetic coding is processing intensive and introduces delay. These features are undesirable in high end production work and hence Dirac Pro omits arithmetic coding. EE5359 Spring 2012

Fig.4 (a) Dirac encoder [3]

Fig.4 (b) Dirac decoder [3]

The encoder and decoder block diagram are shown in Figure 4 (a) and (b) respectively.

Architecture

Dirac can compress any size of picture from low resolution QCIF to HDTV. Dirac employs wavelet compression instead of discrete cosine transform [6] used in other codecs. Another application of wavelet transform is the JPEG 2000 compression standard for still images [7]. Motion Estimation EE5359 Spring 2012

In Dirac, frames have two essential properties. Firstly, they are either predicted from other frames i.e. Inter. Secondly they can be used to predict other frames. All combinations of these properties are possible, and any inter frame can be predicted from up to two reference frames. But in Dirac pro, only intra frame coding is used. Dirac Pro provides spatial and quality scalabilities, useful to save bandwidth during the transmission of a single bit stream to receivers with different image resolution and bandwidth requirements [6]. Dirac pro has been adopted by SMPTE as VC-2 [29].

AVS CHINA

AVS is an acronym for Audio Video Standard which is a compression codec for digital audio and video developed by China [8]. AVS China was developed to replace the most used H.264/AVC standard. AVS China finds its applications in high resolution broadcast, video on wireless communications medium etc. AVS China has been divided into various parts and thus dividing the AVS china architecture into various sub-fields. The AVS standard has been divided into 10 parts as shown in Figure 5 [16].

Fig.5 AVS parts [16] EE5359 Spring 2012

AVS part 1 considers the system for broadcast. AVS Part 2 considers the video part. AVS Part3 covers the audio part and AVS Part 6 includes content creation. The AVS Part 2 encoding and decoding structures [8] are shown in Figure6 (a) and (b).

Fig.6 (a) AVS China part 2 encoding structure [8].

Fig.6 (b) AVS China part 2 decoding structure [8].

System architecture

AVS Part 2 is hybrid coding based on spatial and temporal predictions, integer transform and entropy coding. The system architecture is illustrated in Figure.6 [8] Intra Prediction EE5359 Spring 2012

Spatial prediction as shown in Figure 7 is used in intra coding in AVS part 2 to exploit spatial correlations of picture. The intra prediction is based on 8x8 block. The intra prediction method is derived from the neighboring pixels in left and top blocks. There are five luminance intra prediction modes, and four chrominance intra prediction modes. The reconstructed pixels of neighboring blocks before deblocking filter are used on reference pixels for the current block .

Fig.7 Five different modes for 8 x8 block intra luminance prediction [8].

Deblocking filter The deblocking filter is used to reduce/eliminate the block artifacts and enhance both subjective and objective performance. AVS Part 2 deblocking filter first calculates the boundary strength (BS) of each block boundary, and then applies different filters for different BS. VLC AVS Part 2 utilizes an efficient context based 2D-VLC entropy coder for coding 8x8 block-size transform coefficients. 2D-VLC means that a pair Run-Level is regarded as one event and jointly coded. EE5359 Spring 2012

EXPERIMENTAL RESULTS: DIRAC: Video sequence used: news_qcif.yuv Width: 176 Height: 144 Total number of frames: 300 Number of frames used: 100 Frame rate: 25 FPS File Size: 3713kB

Quality Compresse Bit rate Y- Y-MSE Y-SSIM Comrpessio Factor d File Size (kBps) PSNR(dB) n Ratio

0 38 9.573 25.773 187.13 0.79 98:1

5 161 40.571 36.134 19.588 0.95 23:1

10 1379 135.55 46.798 1.41 0.98 2.7:1

15 2780 230.12 55.012 1.01 0.99 1.33

Table 1: Parametric values for Dirac video codec EE5359 Spring 2012

Fig. 8 (a) Output of Dirac video codec at Quality Fig.8 (b) Output of Dirac video codec at Quality Factor Factor of 0. of 5.

Fig.8 (c) Output of Dirac video codec at Quality Factor of 10 EE5359 Spring 2012

CONCLUSION This project aims at a thorough study, implementation and exhaustive comparison of video coding standards like H.264 (Main Profile), Dirac and AVS part 2 ( Base Profile). Analysis has been carried out and different performance metrics like MSE, PSNR etc. have been computed for Dirac video codec. The software will be tested on other video sequences as well. Implementation of H.264 and AVS will be done in the coming days. Based on the values of these performance metrics, conclusions will be drawn as to which video coding standard is best suited for video broadcasting.

References: EE5359 Spring 2012

[1] T. Wiegand, G. Sullivan, G. Bjontegaard and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp.560-576, July 2003. [2] S.K.Kwon, A. Tamhankar and K.R.Rao, “Overview of H.264/MPEG-4 Part 10” J.VCIR, Vol. 17, pp. 186-216, April 2006, Special Issue on “Emerging H.264/AVC video coding standard”. [3] “ The Dirac web page” :http://www.bbc.co.uk/rd/projects/dirac/intro.shtml. [4] “Dirac Codec Wiki Page ” at http://en.wikipedia.org/wiki/Dirac(codec). [5] “Dirac Pro web page” at http://www.bbc.co.uk/rd/projects/dirac/diracpro.shtml. [6] “Video on the web “ at http://etill.net/projects/dirac_theora_evaluation/. [7] T. Borer, and T. Davies, “Dirac video compression using open technology”, BBC EBU Technical Review, July 2005. [8] L. Yu et al, “An overview of AVS-Video: tools, performance and complexity”, Visual Communications and Image Processing, Proc. of SPIE, vol. 5960, pp.679-690, July 2006. [9] AVS Video Expert Group, “Information technology – Advanced coding of audio and video – Part 2: Video (AVS1-P2 JQP FCD 1.0)”, Audio Video Coding Standard Group of China (AVS), Doc. AVS-N1538, Sep. 2008.

[10] Special issue on “AVS and its applications” Signal processing: Image Communication, vol.24, pp. 245-344, April 2009.

[11] JVT ”Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T rec. H.264– ISO/IEC 14496-10 AVC),” March 2003, JVT- G050 available on http://ip.hhi.de/imagecom_G1/assets/pdfs/JVT-G050.pdf . [12] K. Onthriar, K. K. Loo and Z. Xue, “Performance comparison of emerging Dirac video codec with H.264/AVC”, IEEE International Conference on Digital Telecommunications, 2006, ICDT apos; Vol. 06, Page: 22, Issue: 29-31, Aug. 2006.

[13] Z. Wang and A.C. Bovik, “A universal image quality index”, IEEE Signal Processing Letters, Vol.9, pp. 81-84, March 2002.

[14] A. Ravi, “Performance analysis and comparison of the Dirac video codec with H.264/AVC”, M.S. Thesis, Electrical Engineering Department, University of Texas at Arlington, August 2009. EE5359 Spring 2012

[15] “H.264/MPEG-4 AVC web page” http://en.wikipedia.org/wiki/H.264/MPEG-4_AVC.

[16] H.264 AVC JM Software : http://iphome.hhi.de/suehring/tml/.

[17] H.264 decoder: http://www.adalta.it/Pages/407/266881_266881.jpg.

[18] W. Gao et al, “AVS - The Chinese next-generation video coding standard” NAB, Las Vegas, 2004.

[19] X. Wang et al, “Performance comparison of AVS and H.264/AVC video coding standards” J. Computer Science and Technology, vol.21, No.3, pp.310-314, May 2006.

[20] Iain Richardson, “ The H.264 advanced video coding standard”, Second Edition, Wiley, 2010

[21] AVS China part 2 video software, password protected : ftp://124.207.250.92/.

[22] S. Swaminathan and K.R. Rao, “Multiplexing and demultiplexing of AVS CHINA video with AAC audio,” TELSIKS 2011, Nis, Serbia, 5-8 Oct. 2011. [23] Dirac Pro Software : http://diracvideo.org/download/

[24] M. Tun, K.K. Loo and J. Cosmas, “Semi-hierarchical motion estimation for the Dirac video codec,” 2008 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting, pp.1–6, March 31-April 2, 2008.

[25] T. Davies, “The Dirac Algorithm”: http://dirac.sourceforge.net/documentation/algorithm/, 2008.

[26] Dirac video codec – A programmer's guide: http://dirac.sourceforge.net/documentation/code/programmers_guide/toc.htm

[26] B. Tang et al, “AVS encoder performance and complexity analysis based on mobile video communication”, WRI International conference on Communications and Mobile Computing, CMC ‘09, vol. 3, pp. 102-107, 6-8 Jan. 2009.

[27] A. Ravi and K.R. Rao, “Performance analysis and comparison of the Dirac video codec with H.264 / MPEG-4 Part 10 AVC”, IJWMIP, vol.9, No. 4, pp.635-654, 2011. EE5359 Spring 2012

[28] T. Wiegand and G.J. Sullivan, “The picturephone is here. Really,” IEEE Spectrum, vol. 48, pp. 50-54, Sept. 2011. [29] Adoption of Dirac pro by SMPTE as VC-2 “"SMPTE 2042-1-2009", Sept. 2009.

[30] T. Borer and T. Davies, “Dirac video compression using open technology,” BBC EBU Technical Review, July 2005.