Content-Aware Convolutional Neural Network for In-Loop Filtering in High Efficiency Video Coding
Total Page:16
File Type:pdf, Size:1020Kb
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 7, JULY 2019 3343 Content-Aware Convolutional Neural Network for In-Loop Filtering in High Efficiency Video Coding Chuanmin Jia , Student Member, IEEE, Shiqi Wang , Member, IEEE,XinfengZhang , Member, IEEE, Shanshe Wang , Jiaying Liu , Senior Member, IEEE, Shiliang Pu, and Siwei Ma , Senior Member, IEEE Abstract— Recently, convolutional neural network (CNN) has Index Terms— High-efficiency video coding (HEVC), in-loop attracted tremendous attention and has achieved great success filter, convolutional neural network. in many image processing tasks. In this paper, we focus on CNN technology combined with image restoration to facilitate I. INTRODUCTION video coding performance and propose the content-aware CNN based in-loop filtering for high-efficiency video coding (HEVC). HE block-based compression framework has been widely In particular, we quantitatively analyze the structure of the Tadopted in many existing image/video coding standards, proposed CNN model from multiple dimensions to make the e.g., JPEG [1], H.264/AVC [2], AVS2 [3] and High Efficiency model interpretable and optimal for CNN-based loop filtering. Video Coding (HEVC) [4]. However, the block-based predic- More specifically, each coding tree unit (CTU) is treated as an independent region for processing, such that the proposed tion and quantization [1], [2], [4] in these existing compression content-aware multimodel filtering mechanism is realized by framework will introduce the discontinuities along the block the restoration of different regions with different CNN models boundary, error information around texture contour, as well under the guidance of the discriminative network. To adapt the as the loss of high frequency details, which correspond to image content, the discriminative neural network is learned to the blocking, ringing and blurring artifacts, respectively. The analyze the content characteristics of each region for the adaptive selection of the deep learning model. The CTU level control is strengths of these artifacts are important determinants of video also enabled in the sense of rate-distortion optimization. To learn quality. Therefore, in-loop filtering plays a significant role the CNN model, an iterative training method is proposed by to promote the reconstruction quality of decoded video, and simultaneously labeling filter categories at the CTU level and the majority of state-of-the-art in-loop filtering algorithms are fine-tuning the CNN model parameters. The CNN based in-loop investigated with such purpose. filter is implemented after sample adaptive offset in HEVC, and extensive experiments show that the proposed approach To suppress the blocking artifacts in video coding, significantly improves the coding performance and achieves up the in-loop deblocking filters are investigated over the past to 10.0% bit-rate reduction. On average, 4.1%, 6.0%, 4.7%, several decades [3], [5]–[9], the philosophy of which mainly and 6.0% bit-rate reduction can be obtained under all intra, lies in designing low-pass filters to restrain the blocking low delay, low delay P, and random access configurations, artifacts by adaptively smoothing the boundary pixels. More- respectively. over, the strength of deblocking filtering can also be deter- mined by comparing the discontinuities between adjacent Manuscript received September 20, 2018; revised January 12, 2019; block boundaries with certain thresholds. The deblocking accepted January 22, 2019. Date of publication January 31, 2019; date of filters are originally applied for 4 × 4 block boundaries in current version May 22, 2019. This work was supported in part by the H.264/AVC [2], [6]. The advanced deblocking filter is then National Natural Science Foundation of China under Grant 61632001 and Grant 61872400, in part by the National Basic Research Program of China designed in HEVC [4], [5], which is able to accommodate the (973 Program) under Grant 2015CB351800, in part by the High-Performance quad-tree block partition process. Although deblocking filters Computing Platform of Peking University, Hong Kong RGC Early Career can reduce the blocking artifacts efficiently by smoothing the Scheme, under Grant 9048122 (CityU 21211018), in part by the City University of Hong Kong under Grant 7200539/CS, in part by the National boundary pixels, its application scope is restricted due to the Natural Science Foundation of China under Grant 61772043, and in part by the design philosophy that only boundary pixels are processed Beijing Natural Science Foundation under Grant L182002 and Grant 4192025. while inner ones within a block have been largely ignored. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Xin Li. (Corresponding author: Shanshe Wang.) Obviously, it is difficult to be applied in handling other kinds C. Jia, S. Wang, and S. Ma are with the Institute of Digital Media, of artifacts (e.g., ringing and blurring). To compensate the Peking University, Beijing 100871, China (e-mail: [email protected]; artifacts induced by block-based transform and coarse quanti- [email protected]; [email protected]). S. Wang is with the Department of Computer Science, City University of zation, several novel in-loop filtering methods such as sample Hong Kong, Hong Kong (e-mail: [email protected]). adaptive offset (SAO) [10], adaptive loop filtering (ALF) [11] X. Zhang is with the School of Computer and Control Engineering, and image denoising based method [12]–[17] were investi- University of the Chinese Academy of Sciences, Beijing 100049, China (e-mail: [email protected]). gated, all of which could efficiently remove the artifacts and J. Liu is with the Institute of Computer Science and Technology, Peking obtain better coding performance. University, Beijing 100871, China (e-mail: [email protected]). Recently, deep learning (DL) [18], especially CNN, has S. Pu is with the Hikvision Research Institute, Hangzhou 310052, China (e-mail: [email protected]). been bringing revolutionary breakthrough in many tasks such Digital Object Identifier 10.1109/TIP.2019.2896489 as visual and textural data processing. Deep learning based 1057-7149 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. 3344 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 28, NO. 7, JULY 2019 image restoration and denoising methods [19], [20] have also • Deblocking Filter. Due to the quantization of block- achieved state-of-the-art performances. Regarding image and based coding, the prediction error cannot be completely video coding, which tend to become intelligence originated as compensated. Therefore, the discontinuity often appears well, DL based coding tools have been widely investigated, along the block boundaries, especially under low bit- e.g., in the end-to-end image compression framework [21], rate coding circumstances. The design philosophy of sub-pixel interpolation [22], [23], inter-prediction [24], in-loop deblocking filter [28], [29] is low-pass filtering the filtering [25], [26]. block boundaries to smooth the jagged and discontin- In this work, we aim to achieve the content-aware in-loop uous edges or boundaries. Hence, deblocking filter has filter via the multiple CNN models and the corresponding been adopted as a core coding tool since the video discriminative network to well adapt to images with different coding standards H.263+ [3], [5]–[9]. For low bit-rate content characteristics. In summary, the contributions of this video coding, Kim et al. [29] proposed an algorithm paper are summarized as follows. with two separate filtering modes, which are selected • A fully convolutional network architecture with inception by pixel behavior around the block boundary. Recently, structure [27] is analyzed and designed to enhance the deblocking algorithms [30], [31] with higher flexibilities quality of the reconstructed frame in video coding. With were also presented to determine the filter strength by the proposed CNN structure, a content-aware loop filter- content complexity estimation instead of boundary-pixel ing scheme based on multiple CNN models is proposed thresholds. for high efficiency video coding. • SAO and ALF. The deblocking filters cannot adequately • We employ the discriminative network to adaptively restore the quality degraded frame, since the inner pixels select the CNN model for each CTU. As such, the content have been ignored in the deblocking process. In view of adaptive selection of the appropriate filtering parameter this, more in-loop filtering algorithms such SAO [10] and is casted into a classification problem and solved with the ALF [11] were proposed, which potentially take all pixels data-driven DL approach. within each CTU into consideration. SAO belongs to the • We investigate an iteratively training strategy to learn statistical algorithm, and the key idea is to compensate the multiple CNN models, which achieves simultaneous for the sample distortion by classifying the reconstructed learning of the near-optimal model parameters as well as samples into different categories. Inspired by Wiener the content category. Extensive validations provide useful filter theory [32]–[34], ALF aims to minimize distortions evidence regarding the effectiveness of the proposed between the original and reconstructed pixels and trains approach. the low-pass filter coefficients at the encoder. The coeffi- The remainder of this