J Sign Process Syst manuscript No. (will be inserted by the editor)
High-speed FPGA Architecture for CABAC Decoding Acceleration in H.264/AVC Standard
Roberto R. Osorio · Javier D. Bruguera
Abstract Video encoding and decoding are comput- Keywords H.264, CABAC, FPGA, arithmetic codes, ing intensive applications that require high performance video coding processors or dedicated hardware. Video decoding of- fers a high parallel processing potential that may be exploited. However, a particular task challenges paral- lelization: entropy decoding. In H.264 and SVC video 1 Introduction standards, this task is mainly carried out using arith- metic decoding, an strictly sequential algorithm that The new H.264/AVC [15] video coding standard repre- achieves results close to the entropy limit. By accel- sents a significant improvement in quality and compres- erating arithmetic decoding, the bottleneck is removed sion ratio over previous standards such as MPEG-2 and and parallel decoding is enabled. Many works have been MPEG-4 Part 2. This advantage comes with a price as published on accelerating pure binary encoding and de- H.264 is characterized by high computational complex- coding. However, little research has been done into how ity. to integrate binary decoding with context managing In H.264, the Context-based Adaptive Binary Arith- and control without losing performance. In this work metic Coding (CABAC) algorithm offers a significant we propose a FPGA-based architecture that achieves compression gain with respect to the baseline method real time decoding for high-definition video by sustain- (&10%). However, this algorithm cannot be parallelized ing a 1 bin per cycle throughput. This is accomplished to a significant extent. by implementing fast bin decoding; a novel and area Sequential tasks pose a threat on performance as efficient context-managing mechanism; and optimized they limit the available parallelism due to Amdahl’s control scheduling. law. This problem is addressed in [3] where some solu- tions are proposed, such as high clock frequency pro- Work supported in part by Ministry of Science and Innova- cessors that are allocated to sequential tasks, or recon- tion of Spain, co-funded by the FEDER funds of the European figurable accelerators. Union, under contract TIN2010-17541, and by the Xunta de The need for hardware acceleration in arithmetic de- Galicia, Program for Consolidation of Competitive Research Groups ref. 2010/6 and 2010/28. coding (AD) has been addressed in [1] and [13] among others. Subsequently, the remaining decoding tasks are Roberto R. Osorio carried out in parallel using either reconfigurable hard- Dept. of Electronic and Systems, University of A Coru˜na,Spain ware, or a number of programmable processors in a E-mail: [email protected] manycore system. Javier D. Bruguera FPGA is a mature technology that allows accelerat- Centro de Investigacion en Tecnologias de la Informacion ing computing tasks without coming into the high non- (CITIUS), recurring costs and lack of flexibility of ASICs. Recon- University of Santiago de Compostela, Spain He is a member of the European Network of Excellence on figurability is a key factor, as the combination of pro- High Performance and Embedded Architecture and Compi- grammable and reconfigurable parallel processing pro- lation (HiPEAC). vides the adaptability required by modern multi-media E-mail: [email protected] platforms that must implement multiple standards. 2 Roberto R. Osorio, Javier D. Bruguera
Several architectures have been proposed for AD bin 1 1 1 1 1 1 1 1 1 0 0 0 1 1 in recent years that apply different strategies. In [8, meaning > 0 > 1 > 2 > 3 > 4 > 5 > 6 > 7 > 8 <