2011 18th IEEE International Conference on Image Processing

LOW COMPLEXITY DEBLOCKING FILTER PERCEPTUAL OPTIMIZATION FOR THE HEVC

Matteo Naccari 1, Catarina Brites 1,2, João Ascenso 1,3 and Fernando Pereira 1,2
Instituto de Telecomunicações 1 – Instituto Superior Técnico 2, Instituto Superior de Engenharia de Lisboa 3
{matteo.naccari, catarina.brites, joao.ascenso, fernando.pereira}@lx.it.pt

ABSTRACT

The compression efficiency of the state-of-art H.264/AVC video coding standard must be improved to accommodate the compression needs of high definition video contents. To this end, ITU and MPEG started a new standardization project called High Efficiency Video Coding. The codec under development still relies on transform domain quantization and includes the same in-loop deblocking filter adopted in the H.264/AVC standard to reduce quantization blocking artifacts. This deblocking filter provides two offsets to vary the amount of filtering for each image area. This paper proposes a perceptual optimization of these offsets based on a quality metric able to quantify the blocking artifacts impact on the perceived video quality. The proposed optimization involves low computational complexity and provides quality improvements with respect to a non-perceptually optimized H.264/AVC deblocking filter. Moreover, the proposed optimization allows up to 92% of complexity reduction regarding a brute force perceptual optimization which exhaustively tests all the possible offsets values.

Index Terms— Blocking artifacts, H.264/AVC deblocking filter, High Efficiency Video Coding, Perceptual optimization.

1. INTRODUCTION

Recent advances in video capturing and display technologies will further increase the presence of high and ultra high definition video contents in multimedia mass market applications. To accommodate the higher compression efficiency required by these applications, the compression capabilities of the state-of-art H.264/AVC video coding standard [1] must be improved. This target is currently gaining evidence with the standardization activities in the High Efficiency Video Coding (HEVC) project. These activities are the result of a successful Call for Proposals (CfP) [2], issued in January 2010 by ITU and MPEG, which joined efforts in the so-called Joint Collaborative Team on Video Coding (JCT-VC). Given the interesting results provided by many of the CfP proponents [3], the JCT-VC decided to further investigate the most promising coding tools, integrating them in a video codec called Test Model under Consideration (TMuC) [4]. The Rate-Distortion (RD) performance improvement achieved by the TMuC codec is mainly due to the novel coding tools which explicitly tackle the higher spatio-temporal redundancy present in high definition videos. Moreover, the perceived quality is improved by means of two kinds of in-loop filters which reduce the quantization artifacts: the adaptive H.264/AVC deblocking filter [5] and the symmetric Wiener filter [6]. The H.264/AVC deblocking filter was designed to filter the blocking artifacts while preserving image edges. Furthermore, this deblocking filter allows adaptively modulating the amount of filtering for each block edge by means of two offsets [5]. This modulation may be driven by an objective quality metric able to express the subjective impact of the quantization blocking artifacts.

In this context, this paper proposes a novel perceptual optimization algorithm for the deblocking filter included in the TMuC codec. The perceptual optimization is performed by varying the two aforementioned offsets to minimize a Generalized Block-edge Impairment Metric (GBIM) [7], taken as a good quality metric to quantify the blocking artifacts visibility. The proposed novelty may be summarized in two main contributions: first, the GBIM is extended to cover the new block sizes considered in the TMuC codec and, second, a low complexity deblocking filter offsets perceptual optimization is proposed to improve the GBIM quality while significantly reducing the computational resources that would be required by a brute force approach where all possible offset values would be exhaustively tested. The proposed deblocking filter offsets perceptual optimization is parameterized to become a dynamic, tuneable tool able to provide some additional video quality at the cost of some low computational complexity increase at both the encoder and decoder sides, since the deblocking filter is in the coding loop. To the best of the authors' knowledge, there is no proposal in the literature for a perceptual optimization of the deblocking filter, notably by varying the deblocking filter offsets of the H.264/AVC standard.

The remainder of this paper is organized as follows: Section 2 briefly summarizes the relevant coding tools already considered in the HEVC project. In Section 3, the GBIM is briefly presented and then its extension to HEVC is proposed. Section 4 proposes the low complexity deblocking filter offsets perceptual optimization while Section 5 presents the experimental results. Finally, Section 6 concludes the paper and discusses some future research work.

2. RELATED BACKGROUND

All the 27 proposals submitted to the JCT-VC CfP were based on a hybrid block-based motion compensated predictive video coding architecture similar to the one used by the H.264/AVC video coding standard and its predecessors. Nevertheless, several novel coding tools were proposed for the following coding modules: intra prediction, motion compensation, frequency transformation, entropy coding and in-loop filtering [3], [4]. In the intra coding process, new spatial prediction directions have been included together with the planar prediction mode, allowing to encode an 8×8 image area by sending only 1 pixel value for the luminance and 2 pixel values for the chrominance components. In the motion compensation process, new block sizes larger than the usual 16×16 have been defined by means of the more flexible Coding Tree Block (CTB) structure [4]. A CTB is a B×B image area which can be recursively quad-tree split. The size B, together with the maximum splitting level, may vary at sequence level and be signalled in the sequence parameter set. These new partitioning sizes lead to new integer discrete cosine transform sizes, notably 16×16, 32×32 and 64×64, in addition to the usual H.264/AVC 4×4 and 8×8 sizes. For the entropy coding process, decoding parallelization and more efficient context modeling for binary arithmetic coding have been proposed. Finally, in the in-loop filtering process, the H.264/AVC deblocking filter has been extended to accommodate the new block sizes and a symmetric Wiener filter has been added on top of the deblocking filter to reduce the quantization distortion inside reconstructed blocks [6].
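As a rough illustration of the recursive quad-tree CTB splitting described above, the following sketch enumerates the leaf blocks of one CTB. The block representation, the `split_decision` callback (a stand-in for the encoder's RD decision) and the `min_size` bound are hypothetical, not part of the TMuC specification.

```python
def ctb_blocks(x, y, size, split_decision, min_size=4):
    """Enumerate the leaf blocks (x, y, size) of a CTB under recursive
    quad-tree splitting, as sketched in Section 2 (illustrative only)."""
    if size > min_size and split_decision(x, y, size):
        half = size // 2
        blocks = []
        # Recurse into the four quadrants of the current block.
        for dx in (0, half):
            for dy in (0, half):
                blocks += ctb_blocks(x + dx, y + dy, half,
                                     split_decision, min_size)
        return blocks
    return [(x, y, size)]
```

For example, splitting a 64×64 CTB down to 16×16 leaves yields a regular 4×4 arrangement of sixteen blocks; an RD-driven `split_decision` would instead produce the irregular partitioning grid discussed in Section 3.2.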


Regarding the in-loop deblocking filter, the key element in this paper, the current TMuC software includes two alternative filters: the one standardized in the H.264/AVC standard and another proposed by the Tandberg-Ericsson-Nokia (TEN) consortium, which considers 8×8 or larger block sizes and has a less complex filter enabling and disabling logic [8]. As a side effect, the TEN filter has lower flexibility for varying the filter offsets. Therefore, in this paper, the H.264/AVC deblocking filter will be considered for the proposed offsets perceptual optimization as it allows more adaptability in the offsets variation and, thus, the provision of a more powerful tool to trade off additional quality against additional complexity.

3. PROPOSED GENERALIZED BLOCK-EDGE IMPAIRMENT METRIC EXTENSION

This section presents the proposed Generalized Block-edge Impairment Metric (GBIM) [7] extension towards the new block sizes adopted in the TMuC codec.

3.1. The baseline generalized block-edge impairment metric

The GBIM is an objective metric able to assess the impact of blocking edge artifacts on the quality perceived by a human observer. The GBIM measures the luminance differences for pixels lying at the image block edges and accounts for some Human Visual System (HVS) perceptual mechanisms, making its blocking artifacts visibility assessment closer to the human observer judgement. The HVS masking mechanisms considered by the GBIM are:

1. Perceptual weighting of the block edge pixel differences: Each difference between the intensities of two pixels lying at the block edge is multiplied by a perceptual weight wp quantifying the HVS luminance masking behaviour responsible for the low HVS sensitivity to artifacts in darker and brighter image areas.

2. Normalization of block edge pixel differences: The above pixel differences are also normalized by the average difference over those pixels which do not lie at block edges, to model the HVS masking mechanism making artifacts less visible in image areas with high spatial activity.

The GBIM quantifies the final blocking artifacts visibility with a quality score in the range [0, +∞), where the lower the score, the less visible the blocking artifacts. Furthermore, the GBIM definition refers to the luminance component as the HVS is more sensitive to the distortion in this component [7]. The GBIM ability to quantify the blocking artifacts visibility has been assessed initially for MPEG-1 Video [7] and afterwards for H.263+ and H.264/AVC video [9]. The study in [9] reports the GBIM as the best objective metric among the tested metrics designed for that specific purpose. The proposed GBIM extension regards the following terms [7]:

1. Perceptually weighted block edge pixel difference, Mh (Mv): This term represents the norm of the horizontal (or vertical) block edge pixel differences, weighted by the perceptual weight wp.

2. Perceptually weighted non block edge average difference, Eh (Ev): This term represents the norm of the average difference for those pixels between two horizontal (or vertical) block edges. As for the Mh,v terms, this average difference is also weighted by wp.

The frame level GBIM (GBIMf) is then obtained as:

    GBIMf = 0.5 · (Mh / Eh) + 0.5 · (Mv / Ev).    (1)

The sequence level GBIM (GBIMseq) is obtained as the average of the GBIMf values. Finally, it should be noted that the GBIM operates in no-reference modality (i.e. the original content is not required for the metric computation) and, thus, it can also be used at the decoder side where the original content is not available.

3.2. The extended generalized block-edge impairment metric

The baseline GBIM was designed for image partitionings taken as regular grids of non-overlapped B×B blocks where B was equal to eight [7]. However, the various block sizes adopted in the TMuC codec lead to an irregular image partitioning grid. The GBIM extension towards this irregular image partitioning grid consists in computing Mh,v and Eh,v assuming a regular image partitioning grid with a block size equal to the smallest size used by the TMuC codec, and by setting to zero those differences which do not correspond to block edges in the irregular partitioning. Therefore, the extended GBIM is computed as:

1. Regular grid partitioning Mh,v and Eh,v computation: For all frames, compute Mh,v and Eh,v as specified in [7] over a regular image grid with a block size equal to the smallest size used by the TMuC codec for the video sequence being coded (e.g. 4×4).

2. Non block edge element suppression in Mh,v and Eh,v: For all frames, set to zero the Mh,v and Eh,v elements which do not correspond to block edges in the irregular TMuC image partitioning.

3. Frame level GBIM computation: For all frames, compute GBIMf as in (1). Note that now Mh,v and Eh,v exclude the pixels which do not lie at block edges in the irregular partitioning.

4. Sequence level GBIM computation: Compute the GBIMseq value as the average over all the already computed GBIMf values.

For all the test videos used in the experimental results, it has been verified that the extended GBIMseq values were coherent with the findings reported in [7]; this means, for example, that GBIMseq increases as the bitrate decreases and that higher GBIMseq values are obtained when the deblocking filter is disabled.
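The frame-level computation of (1), including the non-edge suppression of the extension, can be sketched as follows. This is a minimal sketch only: the perceptual weights are fixed to wp = 1, block edges are passed in explicitly as boundary indices, and the norm choices are simplifying assumptions rather than the exact formulation of [7].

```python
import numpy as np

def gbim_frame(luma, edge_cols, edge_rows):
    """Sketch of GBIMf = 0.5*(Mh/Eh) + 0.5*(Mv/Ev) as in Eq. (1).

    luma      : 2-D array of luminance samples.
    edge_cols : column indices c (c >= 1) such that the boundary between
                columns c-1 and c is a block edge; listing only the true
                edges emulates the irregular-grid extension of Section 3.2.
    edge_rows : same for horizontal edges.
    Perceptual weights wp are set to 1 for simplicity (assumption).
    """
    luma = luma.astype(np.float64)

    def directional_term(img, edges):
        # Absolute differences between adjacent columns: shape (H, W-1);
        # the boundary between columns c-1 and c maps to diff column c-1.
        diff = np.abs(np.diff(img, axis=1))
        edge_mask = np.zeros(diff.shape[1], dtype=bool)
        edge_mask[[c - 1 for c in edges]] = True
        m = np.linalg.norm(diff[:, edge_mask])          # block-edge term (M)
        e = np.linalg.norm(diff[:, ~edge_mask].mean(axis=1)) \
            if (~edge_mask).any() else 1.0              # non-edge term (E)
        return m / max(e, 1e-9)

    term_v_edges = directional_term(luma, edge_cols)    # across vertical edges
    term_h_edges = directional_term(luma.T, edge_rows)  # across horizontal edges
    return 0.5 * term_v_edges + 0.5 * term_h_edges
```

A frame with a strong intensity jump at a declared block edge and mild texture elsewhere scores high, while the same jump at an undeclared boundary is simply absorbed into the masking term, which is exactly the behaviour the extension needs for irregular partitionings.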

4. PROPOSED LOW COMPLEXITY DEBLOCKING FILTER PERCEPTUAL OPTIMIZATION

This section proposes a low complexity deblocking filter perceptual optimization. First, the offsets perceptual optimization problem taking the GBIM as the perceptual metric is formulated, and then the proposed low complexity optimization algorithm is presented.

4.1. GBIM based deblocking filter perceptual optimization

The final goal of the H.264/AVC deblocking filter [5] is to filter the block edge artifacts due to quantization while preserving the actual image edges. The classification between block edges due to quantization artifacts and actual image edges is performed by checking the differences among the pixels lying across a block edge. More specifically, over each set of pixels lying across a block edge, their differences are compared with two thresholds, (α, β), as specified in [5]. The values for (α, β) depend both on the used Quantization Parameter (QP) and on two offsets OA and OB. The offsets OA, OB can be varied to improve the final perceptual quality of the decoded frames. In particular, assuming the GBIM as the metric to measure the perceptual impact of quantization artifacts, the offsets OA, OB can be selected to optimize (i.e. minimize) the GBIM value measured for a given horizontal or vertical block edge (hereafter denoted as GBIM_BE). In this paper, the aforementioned optimization is performed on the luminance component of the coded frames and the optimized offsets are then applied also to the chrominance components, since no further RD performance improvement has been observed for separate chrominance optimization. At the block edge level, the offsets perceptual constrained optimization problem may be defined as:


    min_{OA,OB} GBIM_BE(OA, OB)   s.t.   {OA ∈ CS_A, OB ∈ CS_B},    (2)

where CS_A and CS_B denote the candidate sets of values for the offsets OA and OB, respectively. The norm computed for the aforementioned terms Mh,v and Eh,v implies that GBIM_BE is not an additive metric. Therefore, minimizing GBIM_BE does not guarantee minimizing GBIMf. However, it has been experimentally verified that GBIMf can be approximated quite well by the average of GBIM_BE over all the filtered block edges. Therefore, by minimizing GBIM_BE, a good approximation of the frame level minimum can be achieved. An extension of the problem expressed in (2) to the GBIM frame level is considered for future work.

4.2. Low complexity perceptual optimization

The relation between the H.264/AVC deblocking filtering [5] and GBIM_BE is highly nonlinear and, thus, does not allow finding a closed form solution for the minimization problem in (2). Thus, the optimal offsets OA and OB can only be found by an exhaustive search over the candidate sets CS_A and CS_B, a so-called brute force solution. This exhaustive search involves high computational complexity as each block edge is filtered |CS_A|·|CS_B| times, where |·| denotes the set cardinality. This high computational complexity becomes even more problematic as this exhaustive search is performed also at the decoder side, where the computational resources are typically scarcer. To lower the computational complexity involved in the perceptual optimization of the offsets OA, OB, this paper proposes a novel low complexity optimization algorithm estimating the offsets (OA, OB) based on the experimentally observed correlation between the optimal offsets values and the GBIM_BE values among block edges with the same size. Therefore, for each block edge, the proposed low complexity offsets perceptual optimization searches among all the block edges already filtered for those with the same block size and with GBIM_BE values close to the one of the block edge being considered. For all the block edges with these characteristics, their offsets (OA, OB) are used to obtain offsets estimates for the considered block edge. The algorithm proceeds as follows:

1. For each block edge i (horizontal and vertical) to be filtered do
2. Initial GBIM_BE computation: Compute the GBIM_BE value for block edge i to quantify the similarity between i and the already filtered block edges.
3. For each block edge j already filtered and with the same size and type (i.e. horizontal or vertical) as i do
4. GBIM_BE difference computation: Compute d1 = |GBIM_BE(j) − GBIM_BE(i)|, expressing how similar the GBIM_BE values for j and i are.
5. Spatial block edge distance computation: Compute d2 = ((xi − xj)² + (yi − yj)²)^(1/2), where (x, y) denote the top left corner coordinates of the block to which i and j belong. d2 is used to reduce the complexity involved in the search over the filtered block edges by defining a maximum distance for this search.
6. Block edge j similarity check: If d1 < ε (similar GBIM_BE) and d2 < ρ (close enough), then add the j offsets (OA, OB) to a temporary set S for block edge i.
7. If the size of S ≥ N, with N an experimentally derived threshold, estimate (OA, OB) for block edge i as the median of the offsets in S. The threshold N guarantees that a significant number of offsets estimates is available, while the median provides robustness to outliers.
8. Else, compute OA and OB with exhaustive search over CS_A, CS_B.
9. Block edge i filtering: Filter block edge i as specified in [5] with the offsets (OA, OB) obtained above.
10. Set i to the next block edge and go to 1.

In the proposed low complexity perceptual offsets optimization, only the spatial block edge correlation is considered since the experimental results revealed a weak temporal correlation between the offsets and the GBIM_BE values.
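The ten steps above can be sketched as follows. The edge representation (a dict with size, type and the owning block's top-left corner), and the `gbim_be` and `exhaustive_search` callables abstracting the actual filtering, are hypothetical interfaces introduced for illustration; the default parameter values mirror those reported in Section 5.1.

```python
import math
from statistics import median

def estimate_offsets(edges, gbim_be, exhaustive_search,
                     eps=0.45, rho=300.0, n_min=3, cs=range(-5, 6)):
    """Sketch of the low-complexity offsets estimation of Section 4.2.

    edges                : iterable of dicts with 'size', 'type' ('h'/'v')
                           and top-left 'x', 'y' of the owning block.
    gbim_be(edge)        : GBIM_BE value of an edge before filtering.
    exhaustive_search(edge, cs): brute-force fallback returning (oa, ob).
    """
    history = []                                   # already-filtered edges
    result = []
    for edge in edges:                             # step 1
        g = gbim_be(edge)                          # step 2
        candidates = []
        for prev in history:                       # step 3: same size and type
            if prev['size'] != edge['size'] or prev['type'] != edge['type']:
                continue
            d1 = abs(prev['gbim'] - g)             # step 4: GBIM_BE closeness
            d2 = math.hypot(edge['x'] - prev['x'],
                            edge['y'] - prev['y'])  # step 5: spatial distance
            if d1 < eps and d2 < rho:              # step 6: similarity check
                candidates.append((prev['oa'], prev['ob']))
        if len(candidates) >= n_min:               # step 7: median estimate
            oa = median(c[0] for c in candidates)
            ob = median(c[1] for c in candidates)
        else:                                      # step 8: brute force
            oa, ob = exhaustive_search(edge, cs)
        history.append(dict(edge, gbim=g, oa=oa, ob=ob))
        result.append((oa, ob))                    # steps 9-10: filter, advance
    return result
```

Note how the expensive `exhaustive_search` is only paid while the history is too sparse; once N similar, nearby edges of the same size have been filtered, each new edge reuses their median offsets at negligible cost, which is the source of the complexity reduction quantified in Section 5.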

5. EXPERIMENTAL RESULTS

This section presents the experiments performed to assess the RD performance and the computational complexity of the proposed low complexity perceptual deblocking filter optimization, hereafter denoted as the Low-complexity solution.

5.1. Coding conditions

The high definition videos selected from the JCT-VC test set have been encoded according to the "random access, high-efficiency" configuration [10] and using the TMuC software version 0.9. Document [10] defines four QP values for the Intra coded frames, QPI: 22, 27, 32 and 37; the inter coded frames QP values are automatically derived by the TMuC software according to the Group Of Pictures (GOP) length. Due to space limitations, the presented RD performance only refers to four representative videos, selected by clustering the Spatial Index (SI) and Temporal Index (TI) space, with these indexes computed as in [11]. The SI-TI clustered space in Figure 1 led to the choice of the Kimono1, BasketballDrive, RaceHorses and PartyScene sequences. The first two videos have 1920×1080 spatial resolution with frame rates of 24 and 50 fps, while the last two videos have 832×480 spatial resolution with frame rates of 30 and 50 fps. For the proposed low complexity offsets perceptual optimization, the values for the parameters ε, ρ and N are 0.45, 300 and 3, respectively, while the candidate sets CS_A and CS_B assume values in the integer range [−5, 5]. Finally, the symmetric Wiener filter is disabled to fully assess the perceptual quality improvement brought by the proposed offsets optimization.

Figure 1: SI-TI clustering for the JCT-VC test sequences.

5.2. Assessment methodology

The RD performance will be measured taking GBIMseq as the perceptual distortion metric. Regarding the computational complexity, it is measured through the number of times the deblocking filtering is invoked over each line of pixels across a block edge. The deblocking filtering calls are accumulated at the frame level and, at the sequence level, the complexity is taken as the average over all video frames (θseq). To assess the proposed low complexity offsets perceptual optimization, the test videos have also been encoded with the following codec configurations:

1. Zero-offsets solution, where the block edge deblocking process is always carried out with OA = OB = 0, as typically happens in the literature where the offsets are not used.
2. High-complexity solution, where the deblocking process is carried out using offsets determined with exhaustive search on CS_A, CS_B.
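The order of magnitude of the θseq reductions can be intuited with a toy invocation-count model. This model is an illustration introduced here, not the paper's actual measurement: it assumes brute force filters every edge |CS_A|·|CS_B| = 11·11 = 121 times, while the low-complexity scheme filters once per edge whose offsets were estimated from the history and falls back to brute force for the rest.

```python
def filtering_calls(num_edges, fallback_fraction, cs_size=11):
    """Toy model of deblocking-filter invocations per frame.

    num_edges         : number of block edges to filter.
    fallback_fraction : fraction of edges for which no offsets estimate
                        is found, forcing exhaustive search (step 8).
    cs_size           : |CS_A| = |CS_B| (11 for the integer range [-5, 5]).
    Returns (brute-force calls, low-complexity calls, reduction in %).
    """
    brute = num_edges * cs_size * cs_size
    low = num_edges * ((1 - fallback_fraction) * 1
                       + fallback_fraction * cs_size * cs_size)
    reduction = 100.0 * (brute - low) / brute
    return brute, low, reduction
```

With a 10% fallback rate this model already yields a reduction close to 89%, in the same range as the Δθseq values reported in Section 5.3; higher fallback rates (as occur at low bitrates, where more edges are filtered) shrink the reduction, matching the observed trend.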


5.3. Experimental results

The measured RD performance is shown in Table 1. As may be noted, the proposed Low-complexity solution improves the final quality of the coded video (i.e. lower GBIMseq values are obtained) with respect to the Zero-offsets solution at the cost of some (low) investment in computational complexity. Moreover, the proposed Low-complexity solution provides a significant complexity reduction over the High-complexity solution, up to 92% with an average of 76% over all the tested videos. Furthermore, the quality improvements with respect to the Zero-offsets solution are higher for lower bitrate values: this is to be expected as blocking artifacts are more pronounced at lower bitrates. At the same time, the computational complexity reduction decreases with the bitrate, which is mainly due to the higher number of filtered block edges for which, sometimes, an offsets estimate cannot be found and, thus, exhaustive search has to be performed. As an example, the RD performance for the PartyScene sequence is graphically shown in Figure 2. Finally, Figure 3 shows the RD performance for the RaceHorses sequence for different ρ parameter values; as expected, there is a clear trade-off between the RD performance gain and the computational complexity, clearly showing that the proposed solution allows finding the ideal trade-off point for the application at hand.

Table 1: RD performance.

Sequence        | QPI | Zero-offsets       | High-complexity    | Low-complexity     | Δθseq [%]
                |     | Bitrate   GBIMseq  | Bitrate   GBIMseq  | Bitrate   GBIMseq  |
                |     | [kbps]             | [kbps]             | [kbps]             |
Kimono1         | 22  |  4900.34  1.136    |  4902.62  1.091    |  4914.03  1.125    | -86.25
                | 27  |  2265.14  1.218    |  2264.31  1.097    |  2269.43  1.146    | -77.27
                | 32  |  1103.23  1.248    |  1102.73  1.115    |  1105.76  1.154    | -65.69
                | 37  |   551.64  1.249    |   552.14  1.117    |   553.70  1.158    | -53.90
BasketballDrive | 22  | 18496.21  1.183    | 18428.75  1.151    | 18551.57  1.172    | -92.41
                | 27  |  6406.74  1.276    |  6403.96  1.204    |  6432.56  1.236    | -87.81
                | 32  |  2948.57  1.376    |  2947.69  1.259    |  2959.23  1.320    | -80.79
                | 37  |  1492.99  1.427    |  1491.93  1.284    |  1496.03  1.352    | -70.55
RaceHorses      | 22  |  5065.70  1.219    |  5060.61  1.202    |  5137.51  1.211    | -91.96
                | 27  |  2144.45  1.302    |  2142.69  1.256    |  2154.21  1.284    | -86.04
                | 32  |   994.76  1.417    |   993.04  1.327    |   996.99  1.366    | -76.94
                | 37  |   469.45  1.506    |   468.74  1.381    |   470.23  1.429    | -64.91
PartyScene      | 22  |  7115.44  1.155    |  7127.02  1.146    |  7249.46  1.151    | -89.14
                | 27  |  3232.28  1.180    |  3234.86  1.156    |  3249.19  1.162    | -79.38
                | 32  |  1517.90  1.214    |  1518.36  1.168    |  1514.90  1.182    | -68.29
                | 37  |   697.90  1.228    |   697.86  1.171    |   693.46  1.183    | -55.91
Average         | -   |   3712.7  1.270    |   3708.6  1.195    |   3734.3  1.227    | -76.70

6. CONCLUSION AND FUTURE WORK

This paper proposes a low complexity offsets perceptual optimization for the deblocking filtering in the TMuC codec. The proposed optimization improves the perceptual quality of the decoded video by reducing the visibility of quantization artifacts. Moreover, the proposed low complexity optimization significantly reduces the computational complexity with respect to a brute force perceptual optimization. Future work regards the frame level extension of the minimization problem in (2) and the automatic tuning of the ρ parameter based on some video features to obtain a given RD performance versus complexity trade-off, as shown in Figure 3.

Figure 2: RD performance for the PartyScene sequence.

Figure 3: RD performance for various ρ values for the RaceHorses sequence.

7. REFERENCES

[1] T. Wiegand, G. J. Sullivan, G. Bjøntegaard and A. Luthra, "Overview of the H.264/AVC video coding standard", IEEE Trans. on Circuits and Syst. for Video Technol., vol. 13, no. 7, pp. 560-576, July 2003.
[2] ISO/IEC JTC1/SC29/WG11, "Joint Call for Proposals on Video Compression Technology", Doc. N11113, 91st Meeting: Kyoto, Jan. 2010.
[3] G. J. Sullivan and J.-R. Ohm, "Meeting report of the first meeting of the Joint Collaborative Team on Video Coding", JCTVC-A200, 1st Meeting: Dresden, Apr. 2010.
[4] JCT-VC, "Suggestion for a test model", JCTVC-A033r1, 1st Meeting: Dresden, Apr. 2010.
[5] P. List, A. Joch, J. Lainema, G. Bjøntegaard and M. Karczewicz, "Adaptive deblocking filter", IEEE Trans. on Circuits and Syst. for Video Technol., vol. 13, no. 7, pp. 614-619, July 2003.
[6] T. Chujoh, N. Wada and G. Yasuda, "Quadtree-based adaptive loop filter", ITU-T document COM16-C181, Jan. 2009.
[7] H. R. Wu and M. Yuen, "A generalized block-edge impairment metric for video coding", IEEE Signal Process. Lett., vol. 4, no. 11, pp. 317-320, Nov. 1997.
[8] K. Ugur, K. R. Andersson and A. Fuldseth, "Description of video coding technology proposal by Tandberg, Nokia, Ericsson", JCTVC-A119, 1st Meeting: Dresden, Apr. 2010.
[9] A. Leontaris, P. C. Cosman and A. R. Reibman, "Quality evaluation of motion-compensated edge artifacts in compressed video", IEEE Trans. on Image Process., vol. 16, no. 4, pp. 943-956, Apr. 2007.
[10] F. Bossen, "Common test conditions and software reference configurations", JCTVC-B300, 2nd Meeting: Geneva, CH, Jul. 2010.
[11] ITU-T, "Subjective video quality assessment methods for multimedia applications", Recommendation ITU-T P.910, Sept. 1999.
