Advanced Image Coding

TUSHAR SAXENA FALL 2011

THESIS PROPOSAL

Reducing the encoding time of H.264 Baseline profile using parallel programming techniques

INSTRUCTOR: DR. K. R. RAO

Tushar Saxena, Department of Electrical Engineering, University of Texas at Arlington

LIST OF ACRONYMS

ASO Arbitrary slice ordering

API Application Programming Interface

BMA Block-matching algorithm

CPU Central Processing Unit

CUDA Compute Unified Device Architecture

EE Electrical Engineering

FMO Flexible Macroblock Ordering

GPU Graphics Processing Unit

HD High Definition

ME Motion Estimation

M.S. Master of Science

NAL Network Abstraction Layer

OpenMP Open Multi Processing

UTA University of Texas at Arlington

Reducing the encoding time of H.264 Baseline profile using parallel programming techniques

Abstract:

H.264 [5] is a video compression standard used for the recording, compression and distribution of high definition video. It also supports multiview coding, scalable coding, etc. The Baseline and Extended profiles are designed for handheld devices, video streaming, etc. H.264 reduces the amount of information required to reproduce the input video by exploiting redundancy in the pictures being encoded, both spatially (within the same picture) and temporally (between pictures). However, these computations are very complex, and the resulting increase in encoding time restricts the use of H.264 in real-time applications.

To make H.264 suitable for real-time applications, its encoding time must be reduced. This can be achieved by encoding video frames in parallel instead of sequentially, so that more than one frame (depending on the number of cores in the system) is encoded in the time otherwise needed for a single frame. With advances in technology and the ever-growing need for bandwidth and processing power, many parallel programming techniques [9] are now available. Since the scope of this thesis is limited to reducing the encoding time of the H.264 Baseline profile on central processing units (CPU), a set of libraries known as Open Multi Processing (OpenMP) will be used for this purpose.

What is H.264:

H.264 [5] is a video compression standard that can achieve high quality video at relatively low bit rates. One of the main features that makes this possible is variable block-size motion compensation with small block sizes [see Fig.1].

Fig.1: Block sizes available for motion prediction in H.264 [2]
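As a sketch of the partition sizes in Fig.1, the C fragment below lists the seven luma block sizes available for inter prediction and counts how many motion vectors a macroblock needs when it is split uniformly into blocks of one size. The type and function names are hypothetical, chosen only for illustration. Smaller blocks follow motion more closely but need more motion vectors (and more bits to code them), so the encoder has to weigh the partition choice for every macroblock.

```c
/* Luma block sizes H.264 allows for inter prediction: a 16x16 macroblock
   can be partitioned as 16x16, 16x8, 8x16 or 8x8, and each 8x8 partition
   can be further split into 8x4, 4x8 or 4x4 sub-macroblock partitions. */
typedef struct {
    int width;
    int height;
} BlockSize;

static const BlockSize kInterBlockSizes[] = {
    {16, 16}, {16, 8}, {8, 16}, {8, 8}, /* macroblock partitions */
    {8, 4},   {4, 8},  {4, 4}           /* sub-macroblock partitions */
};

enum { NUM_INTER_BLOCK_SIZES = sizeof kInterBlockSizes / sizeof kInterBlockSizes[0] };

/* Motion vectors needed when the whole 16x16 macroblock is split into
   blocks of size b (one vector per block, single reference list). */
int motion_vectors_per_macroblock(BlockSize b) {
    return (16 / b.width) * (16 / b.height);
}
```

For example, a macroblock coded entirely as 4x4 blocks carries 16 motion vectors, against a single vector for one 16x16 partition.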

The encoder [see Fig. 2] splits each input frame into macroblocks of 16x16 pixels. In inter-coded frames, the macroblocks are then predicted using a technique known as motion estimation (ME) [1], an essential part of inter-picture prediction that contributes greatly to reducing the bit rate.

FIG.2: H.264 VIDEO ENCODER BLOCK DIAGRAM [2]
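The split into 16x16 macroblocks described above can be sketched as follows; the helper name is hypothetical. When a frame dimension is not a multiple of 16, the count is rounded up because the encoder pads the frame to whole macroblocks (1080 lines, for instance, are coded as 1088).

```c
/* Number of 16x16 macroblocks covering a width x height frame.
   Partial macroblocks at the right and bottom edges count as whole ones. */
int macroblock_count(int width, int height) {
    int mbs_per_row = (width + 15) / 16;
    int mb_rows = (height + 15) / 16;
    return mbs_per_row * mb_rows;
}
```

For a QCIF frame (176x144) this gives 11 x 9 = 99 macroblocks; for a 1920x1080 HD frame it gives 120 x 68 = 8160, each of which goes through prediction, which illustrates why encoding time grows quickly with resolution.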

Once a frame has been encoded, the coded video data is organized into network abstraction layer (NAL) units [see Fig. 3], each of which is effectively a packet containing an integer number of bytes. The first byte of each NAL unit is a header byte that indicates the type of data in the NAL unit, and the remaining bytes contain payload data of the type indicated by the header.

FIG.3: NAL unit interface between encoder and decoder [2]
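The one-byte NAL unit header has a fixed layout: a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc (zero when the NAL unit's contents are not used for reference) and a 5-bit nal_unit_type. The sketch below decodes it; the struct and function names are hypothetical, while the field names and widths follow the standard.

```c
#include <stdint.h>

typedef struct {
    int forbidden_zero_bit; /* always 0 in a conforming stream */
    int nal_ref_idc;        /* 0 = contents not used for reference */
    int nal_unit_type;      /* e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS */
} NalHeader;

/* Split the first byte of a NAL unit into its three header fields. */
NalHeader parse_nal_header(uint8_t first_byte) {
    NalHeader h;
    h.forbidden_zero_bit = (first_byte >> 7) & 0x01; /* bit 7 */
    h.nal_ref_idc = (first_byte >> 5) & 0x03;        /* bits 6-5 */
    h.nal_unit_type = first_byte & 0x1F;             /* bits 4-0 */
    return h;
}
```

For example, 0x67 decodes to nal_ref_idc = 3 and nal_unit_type = 7 (sequence parameter set), and 0x65 to nal_unit_type = 5 (IDR slice).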

The NAL units are then made available at the input of the decoder [see Fig. 4], where the encoded data is decoded to reconstruct the frames (an approximation of the original frames, since the compression is lossy).

FIG.4: H.264 VIDEO DECODER BLOCK DIAGRAM [2]

ME [see Fig. 5] is an important part of inter-picture prediction. It is the process of determining the motion vectors that best describe the transformation from one frame to another. A motion vector (MV), written as (dx, dy), is the displacement of a block (for example, part of a moving object) between the reference frame and the current frame. H.264 encoders commonly use a block-matching algorithm (BMA) [3] to locate, in a reference frame, the block that best matches the current macroblock, searching around the macroblock's own position.

FIG.5: Multi frame ME [4]
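Full search with the sum of absolute differences (SAD) as the matching cost is the simplest BMA, so it is sketched below to show what "locating the matching macroblock" involves. This is an illustration of the basic technique, not the search used by any particular encoder (reference software such as JM also provides faster search strategies). It assumes 8-bit luma frames stored row by row and integer-pixel motion only (H.264 itself refines vectors to quarter-pixel accuracy); all names are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

#define MB_SIZE 16 /* macroblock width and height in pixels */

typedef struct {
    int dx;  /* horizontal displacement in pixels */
    int dy;  /* vertical displacement in pixels */
    int sad; /* matching cost of this displacement */
} MotionVector;

/* SAD between the 16x16 block at (cx, cy) in the current frame and the
   16x16 block at (rx, ry) in the reference frame. Both frames are 8-bit
   luma planes of the same width, stored row by row. */
static int block_sad(const unsigned char *cur, const unsigned char *ref,
                     int width, int cx, int cy, int rx, int ry) {
    int sad = 0;
    for (int y = 0; y < MB_SIZE; y++)
        for (int x = 0; x < MB_SIZE; x++)
            sad += abs(cur[(cy + y) * width + (cx + x)] -
                       ref[(ry + y) * width + (rx + x)]);
    return sad;
}

/* Full-search block matching: evaluate every integer displacement within
   +/-range of the macroblock at (cx, cy) and keep the one with the lowest
   SAD. Candidate blocks that would extend outside the frame are skipped. */
MotionVector full_search(const unsigned char *cur, const unsigned char *ref,
                         int width, int height, int cx, int cy, int range) {
    MotionVector best = {0, 0, INT_MAX};
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = cx + dx, ry = cy + dy;
            if (rx < 0 || ry < 0 || rx + MB_SIZE > width || ry + MB_SIZE > height)
                continue;
            int sad = block_sad(cur, ref, width, cx, cy, rx, ry);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

With a search range of ±16, this evaluates up to 33 x 33 = 1089 candidate positions of 256 pixel differences each for every macroblock, and multi-frame ME repeats the search for each reference frame, which is why ME accounts for a large share of the encoding time.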

H.264 exploits both spatial and temporal redundancy. Temporal redundancy is exploited using ME [see Fig. 5], whereas spatial redundancy is exploited using intra prediction modes [see Fig. 6]. Since these computations are very complex, they increase the encoding time drastically, restricting the use of H.264 in real-time applications. For intra prediction of luma samples, homogeneous regions are typically predicted as 16x16 blocks with four prediction modes [see Fig. 7], while non-homogeneous (detailed) regions are predicted as 4x4 blocks with nine prediction modes [see Fig. 6].

Fig.6: Nine intra prediction modes for 4x4 block sizes [11]

FIG.7: Four intra prediction modes for 16x16 block sizes [12]
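The three simplest of the nine 4x4 intra modes in Fig. 6 (mode 0 vertical, mode 1 horizontal, mode 2 DC) are sketched below, together with a naive mode decision that keeps the mode whose prediction has the lowest SAD. This is a simplified sketch: all eight neighbouring pixels are assumed to be available, the six diagonal modes are omitted, and practical encoders usually use a more elaborate cost (for example, one that also counts the bits needed to code the mode). The function names are hypothetical.

```c
#include <stdlib.h>

enum { MODE_VERTICAL = 0, MODE_HORIZONTAL = 1, MODE_DC = 2 };

/* Build the 4x4 prediction for one mode. `top` holds the 4 reconstructed
   pixels above the block and `left` the 4 pixels to its left. */
void intra4x4_predict(int mode, const unsigned char top[4],
                      const unsigned char left[4], unsigned char pred[4][4]) {
    int dc = 0;
    for (int i = 0; i < 4; i++)
        dc += top[i] + left[i];
    dc = (dc + 4) >> 3; /* rounded mean of the 8 neighbours (DC mode) */

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            if (mode == MODE_VERTICAL)
                pred[y][x] = top[x];       /* copy the row above downwards */
            else if (mode == MODE_HORIZONTAL)
                pred[y][x] = left[y];      /* copy the left column across */
            else
                pred[y][x] = (unsigned char)dc;
        }
}

/* Try each of the three modes and return the one with the smallest SAD. */
int best_intra4x4_mode(unsigned char block[4][4], const unsigned char top[4],
                       const unsigned char left[4]) {
    int best_mode = MODE_VERTICAL, best_sad = -1;
    for (int mode = MODE_VERTICAL; mode <= MODE_DC; mode++) {
        unsigned char pred[4][4];
        intra4x4_predict(mode, top, left, pred);
        int sad = 0;
        for (int y = 0; y < 4; y++)
            for (int x = 0; x < 4; x++)
                sad += abs(block[y][x] - pred[y][x]);
        if (best_sad < 0 || sad < best_sad) {
            best_sad = sad;
            best_mode = mode;
        }
    }
    return best_mode;
}
```

Repeating this evaluation for every mode of every block is the kind of repeated computation that makes intra coding expensive.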

Profiles of H.264:

FIG.8: Profiles of H.264 [5]

Fig. 8 shows the four profiles of H.264: Baseline, Main, Extended and High.

1] Baseline Profile: It is intended primarily for low-cost applications that require additional robustness to data loss, and is used in some videoconferencing and mobile applications. This profile includes all the features supported in the Constrained Baseline Profile, plus three additional features that can be used for loss robustness (or for other purposes such as low-delay multi-point video stream compositing): flexible macroblock ordering (FMO), arbitrary slice ordering (ASO) and redundant slices.

2] Main Profile: It is designed for digital storage media and television broadcasting. The Main profile, which is a subset of the High profile, was designed with compression efficiency as its main target.

3] Extended Profile: Intended as the streaming video profile, the Extended profile has relatively high compression capability and additional tools for robustness to data losses and for server stream switching.

4] High Profile: The High profile adds coding tools, such as adaptive selection between 4x4 and 8x8 transforms, that deliver better video quality at a given bit rate, which can limit or avoid costly network upgrades. High definition (HD) systems benefit the most from this profile, which is widely used for HD broadcast and disc storage applications.

Advantages and Disadvantages of H.264:

H.264 encoding and decoding are more computationally complex than those of some other codecs, such as MPEG-4 Part 2 [13]. This is mainly because of the variable block-size motion compensation with small block sizes and the adaptive directional intra prediction, and it limits the use of H.264 in real-time applications. However, the compression performance of H.264 is significantly better than that of these codecs, so the choice depends on the requirements of the application.

Proposed changes to the H.264 Baseline profile encoder to reduce the encoding time:

One way to make H.264 suitable for real-time applications is to reduce the encoding time by encoding several frames in parallel. From the software point of view, this can be done by incorporating parallel programming techniques, such as the OpenMP libraries, into the encoding algorithm. The strategy proposed for encoding the frames in parallel is as follows:

Step 1] Divide the frames to be encoded into 2 equal sets. For example, if 30 frames are to be encoded, set 1 contains frames 1 to 15 and set 2 contains frames 16 to 30.

Step 2] Perform intra coding on frame 1 and frame 16 only, in parallel. Frame 1 is then used as the reference frame for frame 2, frame 16 as the reference frame for frame 17, and so on.

Step 3] Perform inter coding on frame 2 and frame 17 in parallel, using OpenMP constructs added to the encoding algorithm. Repeat for frames 3 and 18, and so on, until all the frames are encoded.

FIG.9: Parallel processing of frames to reduce encoding time

As seen from Fig. 9, the 30 frames are divided into 2 sets: set 1 contains frames 1 to 15 and set 2 contains frames 16 to 30. Frames 1 and 16 are intra coded in parallel and act as reference frames for frames 2 and 17. Similarly, frames 2 and 17 are inter coded in parallel and act as reference frames for frames 3 and 18. This process continues until all the frames are encoded. Hence, on a processor with at least two cores, all the frames can ideally be encoded in about half the time required for sequential encoding; in practice, thread management overhead will make the saving somewhat smaller.
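The three steps can be sketched with OpenMP as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: intra_encode and inter_encode are hypothetical placeholders for the encoder's frame-level routines (they only record which reference each frame used, so the schedule can be checked), and the frame count is assumed to divide evenly into the two sets, as in Fig. 9.

```c
#define NUM_FRAMES 30
#define NUM_SETS 2 /* assumes NUM_FRAMES is divisible by NUM_SETS */

/* Hypothetical stand-ins for the encoder's frame-level routines. They only
   record each frame's reference (-1 = intra coded) so the schedule can be
   checked; a real encoder would code the frame here. Index = frame number. */
static int reference_of[NUM_FRAMES + 1];

static void intra_encode(int frame) { reference_of[frame] = -1; }
static void inter_encode(int frame, int ref) { reference_of[frame] = ref; }

/* Step 1: split the frames into NUM_SETS equal sets (1-15 and 16-30).
   Step 2: intra code the first frame of every set in parallel.
   Step 3: inter code the next frame of every set in parallel, each using the
   previous frame of its own set as reference, until all frames are coded. */
void encode_sets_in_parallel(void) {
    const int per_set = NUM_FRAMES / NUM_SETS;
    for (int i = 0; i < per_set; i++) {
        /* One thread per set: frames 1 and 16, then 2 and 17, and so on. */
        #pragma omp parallel for num_threads(NUM_SETS)
        for (int s = 0; s < NUM_SETS; s++) {
            int frame = s * per_set + i + 1;
            if (i == 0)
                intra_encode(frame);
            else
                inter_encode(frame, frame - 1);
        }
        /* Implicit barrier: both frames are finished before the next pair starts. */
    }
}
```

Each thread writes only its own frame's entry, so no locking is needed. If the code is compiled without OpenMP support (for GCC, without the -fopenmp flag), the pragma is ignored and the same frames are encoded sequentially, which is a convenient way to compare the parallel and sequential encoding times.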

What is OpenMP:

OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++ and Fortran on many platforms, including Unix and Microsoft Windows.

FIG.10: ILLUSTRATION OF MULTITHREADING IN PARALLEL [6]

OpenMP is an implementation of multithreading [see Fig. 10], a method of parallelization in which the master "thread" (a series of instructions executed consecutively) "forks" a specified number of slave "threads" and a task is divided among them. The threads then run concurrently, with the runtime environment allocating threads to different processors. The section of code that is meant to run in parallel is marked with a preprocessor directive (a #pragma omp line in C and C++) that causes the threads to be formed before the section is executed; when the section finishes, the threads "join" back into the master thread.
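As a small illustration of the fork-join model, the hypothetical helper below (not part of any encoder) computes the SAD between two whole frames. The #pragma omp parallel for directive marks the parallel section: the master thread forks a team, the rows are shared among the threads, each thread accumulates a private partial sum, and the reduction clause adds the partial sums together when the threads join. The thread-count query is guarded by the standard _OPENMP macro so the code also builds without OpenMP.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of absolute differences between two 8-bit frames of the same size.
   Rows are divided among the threads; reduction(+:total) gives each thread
   a private copy of total and sums the copies at the join. */
long frame_sad(const unsigned char *a, const unsigned char *b,
               int width, int height) {
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            total += abs(a[y * width + x] - b[y * width + x]);
    }
    return total;
}

/* Threads the runtime would use for the next parallel region. */
int available_threads(void) {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1; /* compiled without OpenMP: everything runs on one thread */
#endif
}
```

The result is the same whether one thread or many run the loop; only the elapsed time changes, which is the property the proposed frame-parallel encoder relies on.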

Previous research work carried out:

1] Name: D. Han et al. Title: “Low complexity H.264 encoder using machine learning”. Year: Sept. 2010, Sejong University, Seoul, Korea. Description: see [7].

2] Name: P. R. Ramolia Title: “Low Complexity AVS-M Using Machine Learning Algorithm C4.5”. Year: May 2011, Electrical Engineering (EE) Department, University of Texas at Arlington (UTA), Arlington, USA. Description: see [8].

3] Name: Hitesh Yadav Title: “Optimization of the deblocking filter in H.264 codec for real time implementation”. Year: May 2006, Master of Science (M.S.) Thesis, EE Department, UTA. Description: Reduced the encoding time by improving the deblocking filter algorithm, making it suitable for real-time applications.

4] Name: Suchethan Swaroop Title: “Low complexity H.264 encoder using machine learning for streaming applications”. Year: May 2011, M.S. Thesis, EE Department, UTA. Description: Machine learning was adopted to reduce the encoder complexity of H.264. Machine learning is a branch of artificial intelligence concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data.

5] Name: Amruta Kulkarni Title: “Implementation of fast inter-prediction mode decision algorithm in H.264/AVC video encoder”. Year: Expected Dec. 2011, M.S. Thesis, EE Department, UTA. Description: It is proposed to implement a complexity reduction algorithm for inter mode selection in H.264/AVC video coding.

6] Name: Tejas Sathe Title: “Complexity reduction in H.264 encoder using Compute Unified Device Architecture (CUDA) programming”. Year: Expected Dec. 2011, M.S. Thesis, EE Department, UTA. Description: CUDA programming is used for general-purpose computing on Graphics Processing Units (GPU). To make H.264 applicable to real-time applications, the encoding time needs to be minimized. CUDA programming can help achieve this by executing computationally intensive parts of the encoder in parallel on the GPU.

REFERENCES:

[1] F. Dufaux and F. Moscheni, “Motion estimation techniques for digital TV: a review and a new contribution”, Proceedings of the IEEE, Vol. 83, No. 6, pp. 858-876, Jun. 1995.

[2] M. Wien, “Variable Block-Size Transforms for H.264/AVC”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 564-567, Jul. 2003.

[3] J. Vanne, “A Configurable Motion Estimation Architecture for Block-Matching Algorithms”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 19, No. 4, pp. 620-628, Apr. 2009.

[4] T. Wiegand et al., “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, Jul. 2003.

[5] I. E. G. Richardson, “H.264 and MPEG-4 Video Compression: Video Coding for Next Generation Multimedia”, Wiley 2nd edition, August 2010.

[6] Wikipedia, the free, web-based, collaborative, multilingual encyclopedia supported by the non-profit Wikimedia Foundation.

[7] D. Han et al., “Low complexity H.264 encoder using machine learning”, IEEE SPA, Poznan, Poland, pp. 40-43, Sept. 2010.

[8] P. R. Ramolia and K.R. Rao, “Low Complexity AVS-M Using Machine Learning Algorithm C4.5”, TELSIKS 2011, Nis, Serbia, 5-8 Oct. 2011.

[9] H. Kalva, “Parallel programming for multimedia applications”, Springer Science and Business Media, Florida Atlantic University, Florida, USA, Dec. 2010.

[10] JM Software version 18.0, H.264/AVC codec software, Website: http://iphome.hhi.de/suehring/tml/.

[11] C. C. Cheng, “Fast three step intra prediction algorithm for 4×4 blocks in H.264”, IEEE International Symposium on Circuits and Systems (ISCAS), Vol. 2, pp. 1509-1512, May 2005.

[12] M. Jafani and S. Kasaei, “Fast Intra-Prediction Mode Decision in H.264 Advanced Video Coding”, IEEE Singapore International Conference on Communication Systems (ICCS), pp. 1-6, Oct. 2006.

[13] M. Roitzsch and M. Pohlack, “Principles for the Prediction of Video Decoding Times Applied to MPEG-1/2 and MPEG-4 Part 2 Video”, 27th IEEE International Real-Time Systems Symposium (RTSS), pp. 271-280, Dec. 2006.
