Constant Bit-Rate Control Efficiency with Fast Motion Estimation in H.264/Avc Video Coding Standard

CONSTANT BIT-RATE CONTROL EFFICIENCY WITH FAST MOTION ESTIMATION IN H.264/AVC VIDEO CODING STANDARD Daniele Alfonso, Daniele Bagni, Luca Celetto and Simone Milani* Advanced System Technology labs, STMicroelectronics, Agrate Brianza, Italy (Europe) * Department of Information Engineering, University of Padova, Italy (Europe) emails: [email protected], [email protected] ABSTRACT greatly reduces the computation with respect to FSBM with- The emerging H.264/AVC video coding standard provides out visible quality losses at SD-TV resolution. ρ significant enhancements in compression efficiency with The JVT-D030/E069 and -domain algorithms are pre- respect to its ancestors in the MPEG and H.26x families. In sented in sections 2 and 3 respectively, whereas section 4 this paper, we analyse the performance of two different Con- introduces the motion estimation method. The numerical stant Bit-Rate control methods, suitable for H.264/AVC en- results are showed in section 5, followed by our conclusions coding at SD-TV resolution, where the motion estimation is in section 6. performed by a fast proprietary predictive algorithm. 2. JVT-D030/E069 RATE CONTROL 1. INTRODUCTION The JVT-D030/E069 [4, 5] is a CBR control method, devel- The H.264/AVC video coding standard [1] greatly outper- oped from the widely known TM5 [6], originally conceived forms its ancestors in the MPEG and H.26x families, provid- for MPEG-2 video coding. The main difference between the ing about 50% more compression at equal quality [2, 3] in two algorithms consists in the definition of luminance mac- all kinds of applications, from low-bitrate streaming to high- roblock activity, which is the Sum of Absolute Differences quality storage. after prediction (either Intra or Inter) for JVT-D030/E069, Until now, the performance of H.264/AVC was analysed and variance for TM5. mostly in an ideal environment, with pure variable bitrate (at Given a target bitrate, the JVT algorithm determines fixed quantization values) and using the common Full-Search the QP quantizer for each macroblock operating at three Block-Matching (FSBM) algorithm for motion estimation. In levels: Group-Of-Pictures (GOP), frame/field picture and practical applications, however, we need to impose some macroblock. In comparison with the original version [4, 5], constraints on the encoding process, i.e. on the number of we changed the experimental constants, in order to opti- bits generated per time unit in case of limited bandwidth, and mally adapt the algorithm to SD-TV video formats. on the overall computational complexity for real-time encoding. The former goal can be obtained by a rate control algo- 2.1 Rate control at GOP level rithm, which adapts the quantization of the residual coding in The target number of bits for each GOP is order to match a target bitrate, whereas the computational BitRate R = N ⋅ + R (1) complexity can be reduced by adopting a fast algorithm to PictureRate prev perform motion estimation, which is one of the most resource where N is the number of frames in the GOP, i.e. the dis- consuming tasks in the whole encoding process. tance between two Intra coded pictures, and Rprev is the num- Indeed, the relationship between rate control and motion ber of exceeding bits after the encoding of the previous estimation is an important topic of investigation, because the GOP, which is initialised to zero for the first GOP of the two systems interact with each other, determining the overall sequence. performance of the encoder. In fact, if the motion estimation is not efficient, the prediction error will be greater, forcing 2.2 Rate control at frame/field level the rate control to raise the quantization step, in the attempt At the beginning of each frame or field (respectively in case to match the target bitrate. This will negatively affect both of progressive or interlaced source video), the target number the achieved quality and the motion estimation of successive of bits for each I, P and B picture is computed as in the fol- frames, as the motion search is performed referencing the lowing: reconstructed frames, which are less correlated to the current N X N X BitRate one due to the coarser quantization. T = maxR /1+ p p + b b , i × In this paper, we consider two Constant Bit-Rate (CBR) K p X i Kb X i 8 PictureRate ρ control algorithms, named JVT-D030/E069 and -domain, N K X BitRate T = maxR / N + b p b , (2) while using a proprietary motion estimation technique that p p × Kb X p 8 PictureRate 1971 • n N pKb X p BitRate d0 is the initial virtual buffer occupancy, respectively set T = maxR / N + , i p i b i b b × as d =20·r/31, d =K ·d and d =K ·d for I, P and B K p X b 8 PictureRate 0 0 p 0 0 b 0 pictures at the beginning of the encoding process; where R is defined in (1) and Np and Nb are the number of • remaining P and B frames or fields in the GOP, respectively. Bm-1 is the number of bits generated by encoding the first After the encoding of each frame, R is updated as m-1 macroblocks in the picture (composed of MB_CNT macroblocks in total). R = R − S where S is the number of bits used to encode the current 3. ρ-DOMAIN RATE CONTROL frame, and Np and Nb are decremented by one if the current frame or field was of P- or B-type respectively. Traditional rate control algorithms, as also the JVT- Xi, Xp and Xb define the content complexity of the differ- D030/E069, operate in the so-called q-domain, where rate ent picture types. They are initialised with and distortion characteristic curves of the encoder are de- Xi = ( 155 · BitRate ) / 115 termined as a function of the quantization step. However, an Xp = ( 100 · BitRate ) / 115 alternative is to consider them as functions of the percentage Xb = 0.9 · Xp of null quantized transform coefficients, indicated by ρ. This and after the encoding of each frame they are updated as is named ρ-domain analysis, and an efficient rate control 1 method exploiting this theory was presented in [7] and prop- X {}= ⋅ S ⋅QP i, p,b 2 avg erly adapted to H.264/AVC in [8]. where QPavg is the average quantizer for the current picture. The number of bits R for each picture can be expressed Finally, Kp=1.1 and Kb=1.4 define the relative complex- as a function of ρ through the relation ity of I pictures with respect to P and B ones. ρ = λ ⋅ ρ + λ R( ) 1 2 (3) With the approximation that no bits are sent if all coeffi- 2.3 Rate control at macroblock level cients are null, equation (3) becomes Before encoding each macroblock, an initial quantizer is R(ρ) = λ ⋅ (1− ρ) (4) chosen according to the following formula Given a target bit budget Tn for the n-th frame, the d n × 31 corresponding ρ can be determined from (4) as = m + n Qm dq T r ρ = 1− n (5) n λ where r is a constant called reaction parameter, defined as The parameter λ can be estimated from the results of the bit _ rate r = 10⋅ encoding of previous frames, whereas the percentage of null frame _ rate coefficients ρ can be written as a function of the quantizer dq is named delta parameter and it depends from the activ- QP in the following way ity of the current macroblock in the following way ρ ∆ = (6) ( ) ∑ px (QP) − floor(AvgAct / act −1), 0 < act / AvgAct <= 1/ 2 QP <∆ m m = < < dq 0, 1/ 2 actm / AvgAct 2 where px(q) is the probability distribution of transform coef- − >= ficients x and [-∆,+∆] is the quantization step interval asso- floor(actm / AvgAct) 1, actm / AvgAct 2 AvgAct is the average activity for the current picture and ciated with the null reconstructed value. For the probability distribution, we empirically have chosen a laplacian- actm the activity of the current m-th macroblock. At the beginning of the sequence we set AvgAct =2000, impulsive function defined as i 2 − q AvgActp=1500 and AvgActb=800. = α ⋅δ + β px (q) (q) e (7) We would remark that the Intra/Inter prediction could be being δ(q) the Dirac impulse. performed twice for each macroblock in JVT-D030/E069. In Writing (6) in integral form and considering (7) we obtain a first step, we derive the activity using as temporary QP the − 2 ∆ α 1 β one of the previously encoded macroblock. Hence, we en- ρ(∆) = + ⋅1− e +α +α code the macroblock with the final QP computed by the rate 1 1 control, when it differs from the temporary one. Experimen- from which we have tally we found that every macroblock is Intra/Inter predicted β ∆ = − ()+α − ρ ()+α (8) about 1.25 times, on the average, for target bitrates in the ln 1 n 1 ranges from 2 to 7 Mbit/s at SD-TV resolution. 2 The coefficients α and β can be accurately approximated by The dm parameter indicates the occupation of the virtual buffer for picture type n=i,p,b, and it is specified as relating them to the average activity of the picture in the polynomial form: n = n + − × − d d B − T (m 1) / MB _ CNT α(AvgAct) = α + α ⋅ log( AvgAct) + α ⋅ log log( AvgAct) m 0 m 1 n 0 1 2 where: β = β + β ⋅ + β ⋅ 2 (AvgAct) 0 1 AvgAct 2 (AvgAct) • i p b dm , dm , and dm represent the fullness of virtual buffers and then tabulated.

Constant Bit-Rate Control Efficiency with Fast Motion Estimation in H.264/Avc Video Coding Standard

Digital Speech Processing— Lecture 17

Audio Coding for Digital Broadcasting

Challenges in Relaying Video Back to Mission Control

Hikvision H.264+ Encoding Technology

TMS320C6000 U-Law and A-Law Companding with Software Or The

Bit Rate Adaptation Using Linear Quadratic Optimization for Mobile Video Streaming

A Predictive H.263 Bit-Rate Control Scheme Based on Scene

A Packetization and Variable Bitrate Interframe Compression Scheme for Vector Quantizer-Based Distributed Speech Recognition

AVC to the Max: How to Configure Encoder

Adaptive Bitrate Streaming Over Cellular Networks: Rate Adaptation and Data Savings Strategies

Comments on `Area and Power Efficient DCT Architecture for Image

Neural Rate Control for Video Encoding Using Imitation Learning