An Efficient Hardware Design for Intra-Prediction in H.264/AVC

An Efficient Hardware Design for Intra-prediction in H.264/AVC Decoder Muhammad Nadeem, Stephan Wong, and Georgi Kuzmanov Computer Engineering Laboratory, Delft University of Technology Mekelweg 4, 2628CD Delft, The Netherlands {M.Nadeem, J.S.S.M.Wong, G.K.Kuzmanov}@tudelft.nl Abstract—The H.264/AVC intra-frame codec is widely that H.264/AVC outperforms JPEG2000, a state-of-the- used to compress image/video data for applications like art still-image coding standard, in terms of subjective Digital Still Camera (DSC), Digital Video Camera (DVC), as well as objective image quality. This makes the Television Studio Broadcast, and Surveillance video. Intra- prediction is one of the top 3 compute-intensive processing H.264/AVC intra-frame codec, an attractive choice for functions in the H.264/AVC baseline decoder [6] and, an image compression engine. Applications like Dig- therefore, consumes significant number of compute cycles ital Still Camera (DSC) employ intra-frame encoding a processor. In this paper, we propose a configurable, technique to compress high-resolution images. high-throughput, and area-efficient hardware design for In a video frame encoded in intra mode, the current the intra-prediction unit. The intra-prediction algorithm is × optimized to significantly reduce the redundancy in addi- macro-block (i.e., 16 16 pixels block) is predicted from tion operations (e.g., 27% reduction when compared with the previously encoded neighboring macro-blocks (MB) state-of-the-art in literature [12]). The area requirement from the same video frame. Therefore, an intra-frame for our hardware implementation of the optimized intra- with all intra-MB does not depend on any other video prediction algorithm is further reduced by employing a frame(s) and can be decoded independently. A video configurable design to reuse data paths for mutually exclusive processing scenarios. The proposed design is described encoded with intra-frames only is easier to edit than in VHDL and synthesized under 0.18μm CMOS standard video with inter-frames (frames predicted from past or cell technology. While working at a clock frequency of future video frames). Similarly in many surveillance 150 MHz, it can easily meet the throughput requirement systems, the video is compressed using intra frames of HDTV resolutions and consumes only 21K gates. encoding mode due to legal reasons. Courts in many Keywords-intra-prediction; H.264/AVC decoder; image countries do not accept the predicted image frames as le- and video compression; inverse integer transform. gal evidence [5]. As a result, a typical video surveillance system compresses video using intra encoded frames I. INTRODUCTION only. Consequently, intra-only video coding is widely The Advanced Video Coding standard H.264/AVC, used coding technique in Television Studio Broadcast, also known as MPEG-4 part 10, is jointly developed Digital Cinema and Surveillance video applications. by ITU-T VCEG and ISO/IEC MPEG [1]. It is able to The H.264/AVC baseline decoder is approximately achieve significantly higher compression efficiency over 2.5 times more time complex than the H.263 baseline the previous video coding standards, like H.263 and decoder [6]. According to the analysis of run-time MPEG-2/4. The H.264/AVC provides similar subjective profiling data of H.264/AVC baseline decoder sub- video quality as that of MPEG-2 with at least 50% functions, intra-prediction is one of the top 3 compute- more reduction in bit-rate [2], [3]. It provides up to intensive functions [6]. A high-throughput intra-frame 30% better compression when compared with H.263+ processing chain is, therefore, an important requirement and MPEG-4 Advanced Simple Profile (ASP) [4]. This for H.264/AVC intra-frame decoder for real-time video significantly higher compression efficiency is achieved processing applications. The demanding characteristics at the cost of additional computational complexity of the of the intra-prediction algorithm suggest a hardware video coding algorithms in H.264/AVC, approximately implementation for such function for high-definition 10 times higher than the MPEG-4 simple profile [4]. video applications, where even larger frame sizes at The H.264/AVC supports multiple directional intra higher frame rates are to be processed in real time. prediction modes (4 modes for the luma 16 × 16 / Several different hardware architectures have been chroma 8 × 8 block types and 9 modes for the 4 × 4 proposed in the literature in last few years and most block type) to reduce the spatial redundancy in the of them are developed from the encoder’s requirements video signal. These multiple intra prediction modes point of view. In a video decoder, while decoding the help to significantly improve the encoding performance compressed input video bit-stream, the block type and of an H.264 intra-frame encoder. Studies [6] shown intra-prediction mode for the current macro-block are 978-1-4577-0069-9/11/$26.00 ©2011 IEEE already known. Therefore, this information can be used Recon to reduce on-chip area cost by designing overlapped Frame MC data paths for mutually exclusive processing scenarios. (Fn-1) Bitstream In this paper, we focus on the realization of a high- Intra Input throughput and area-efficient design of intra-prediction Prediction unit in H.264/AVC video decoder and provide the + following specific contributions: Recon Deblocking -1 Entropy Frame T-1 Q Filter + Decoder • A proposal to optimize the intra-prediction algo- (Fn) rithm for 4 × 4 luma blocks by decomposing the filter kernels. The proposed decomposition significantly reduces the additions operations for its Figure 1. Functional block diagram of H.264/AVC decoder. hardware implementation (27%-60% reduction). • A configurable hardware design of intra-prediction cost significant amount of hardware resources (approx- module to reduce on-chip area by using hardware imately 29K gates). sharing approach for mutually exclusive processing Similarly, in [12], an efficient hardware implementa- scenarios (approx. 21K gates). tion for intra-prediction unit is proposed. The design targets the H.264/AVC encoder and uses a so-called Furthermore, we also optimize the common equa- combined module approach to generate a subset of intra- tions in the intra-prediction algorithm, to compute the prediction modes in parallel. Furthermore, the intra- × × prediction samples for 16 16 luminance and 8 8 prediction algorithm is optimized to significantly reduce chrominance blocks. the number of arithmetic operation for the computation The remainder of this paper is organized as follows. of prediction samples. The proposed solution, however, The related work is presented in Section II. Section III does not fully eliminate the redundancy in the algorithm provides an overview of the intra-prediction algorithm and also primarily targets the H.264/AVC encoder. and also describes the proposed optimizations in the In short, most of the hardware implementations pre- algorithm. The proposed configurable hardware design sented in the literature do not specifically target the is presented in Section IV, whereas design evaluation H.264/AVC decoder implementation and, therefore, fails is provided in Section V. Finally, Section VI concludes to fully exploit the prior intra-prediction mode informa- this paper. tion to reduce area cost. Some of them do not support the computation of all of the intra-prediction modes II. RELATED WORK and therefore, cannot be used in a general H.264/AVC In the last few years, many researchers proposed a video decoder where any of the intra-prediction modes number of optimized algorithms and efficient hardware can appear in the encoded bit-stream. Similarly, some implementations for intra-prediction unit in H.264/AVC. of them have made attempts to reduce the redundancy For instance, a hardware implementation with five-stage in the intra-prediction algorithm. However, with the registers and three configurable data paths for intra- proposed decomposition in this paper, further arithmetic prediction is proposed in [8]. The proposed solution can operations reduction is possible. process approximately 100 VGA frames in real time. In [9], an efficient intra-frame codec is proposed. III. INTRA-PREDICTION ALGORITHM The proposed solution can process 720p @ 30 fps In this section, we briefly introduce the intra-frame in real time and can be used for both the encoder processing chain in an H.264/AVC decoder. Subse- and decoder implementations. This solution, however, quently, an overview of the intra-prediction algorithm excludes the plane mode for 16 × 16 luma and 8 × 8 for 4 × 4 luminance blocks, 16 × 16 luminance blocks, chroma blocks. The proposed solution, therefore, can and 8 × 8 chrominance blocks is provided in separate be used in a matched encoder-decoder scenarios only subsections. The optimizations to reduce the number where it is guaranteed that plane mode is not used for of operations in the intra-prediction algorithm are also intra-prediction. explained along with algorithm overview. High-throughput hardware designs for a H.264/AVC The functional block diagram for an H.264/AVC decoder are proposed in [10] and [11]. The proposed intra-frame decoder is depicted in Fig. 1. The entropy designs can process high-definition (HD) video in real decoder unit parses the input bit-stream and decodes the time. Since no attempts were made to optimize the intra- intra-prediction mode used for the current MB. This prediction algorithm to reduce the arithmetic operations, intra-prediction mode information is passed on

Load more