Deep Non-Local Kalman Network for Video Compression Artifact Reduction Guo Lu, Xiaoyun Zhang, Wanli Ouyang, Dong Xu, Fellow, IEEE, Li Chen, and Zhiyong Gao

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2019.2943214, IEEE Transactions on Image Processing DEEP NON-LOCAL KALMAN NETWORK 1 Deep Non-local Kalman Network for Video Compression Artifact Reduction Guo Lu, Xiaoyun Zhang, Wanli Ouyang, Dong Xu, Fellow, IEEE, Li Chen, and Zhiyong Gao Abstract—Video compression algorithms are widely used to YLGHRZLWKDUWLIDFWV YLGHRZLWKDUWLIDFWV reduce the huge size of video data, but they also introduce unpleasant visual artifacts due to the lossy compression. In order to improve the quality of the compressed videos, we proposed a deep non-local Kalman network for compression artifact reduction. Specifically, the video restoration is modeled as a Kalman filtering procedure and the decoded frames can be restored from the proposed deep Kalman model. Instead of using the noisy previous decoded frames as temporal information, the less noisy previous restored frame is employed in a recursive UHVWRUHGYLGHR UHVWRUHGYLGHR way, which provides the potential to generate high quality E 2XUUHFXUVLYHSLSHOLQH restored frames. In the proposed framework, several deep neural D 7UDGLWLRQDOSLSHOLQH networks are utilized to estimate the corresponding states in the Kalman filter and integrated together in the deep Kalman filtering network. More importantly, we also exploit the non-local prior information by incorporating the spatial and temporal non- local networks for better restoration. Our approach takes the advantages of both the model-based methods and learning-based methods, by combining the recursive nature of the Kalman model F )ORZEDVHGLPDJHZDUS G 1RQORFDORSHUDWLRQ and powerful representation ability of neural networks. Extensive experimental results on the Vimeo-90k and HEVC benchmark Fig. 1. Different methodologies for video artifact reduction (a) the traditional datasets demonstrate the effectiveness of our proposed method. pipeline without considering previous restored frames. (b) our Kalman filtering pipeline. Exploiting the temporal information by (c) flow based image warp Index Terms—Video Compression Artifact Reduction, Deep and (d) our non-local operation. wi and wj represent the non-local weight. Neural Network, Kalman Model, Recursive Filtering, Video Restoration In this paper, we propose a deep non-local Kalman filtering I. INTRODUCTION network for video compression artifact reduction and our Considering the increasing amount of video data over the motivations are two-fold. First, the restoration process for Internet, compression algorithms (e.g., H.264 and HEVC) [1]– the current frame can benefit from the previous restored [3] have been applied to reduce the storage size and band- frames. It is expected that the previous restored frame can width. However, these algorithms also introduce compression provide more accurate temporal information compared with artifacts, such as blocking, blurring and ringing artifacts. In the original decoded frame. Therefore, we can employ more order to obtain high quality images/videos at the decoder side, precise temporal information from previous restored frames a lot of compression artifact reduction algorithms have been and build a robust video artifact removal system with high proposed to generate artifact-free images in the past decades. performance. It is obvious that the dependence of previous Previously, manually designed filters [4], [5] and sparse restored frames will lead to a dynamic recursive solution for coding based methods [6]–[9] are proposed to solve this video artifact removal. More importantly, this scheme provides problem. Recently, learning based approaches have been suc- the opportunity to utilize long term temporal information cessfully applied to a lot of computer vision tasks [10]– through the recursive pipeline. As we know, most learning [20], such as super-resolution [15], [16], denoising [17] and based approaches [16]–[19], [21] for artifact reduction focus artifact reduction [18]–[20]. In [18], the convolutional neural on image artifact reduction. Although the temporal information networks (CNN) are firstly utilized for image compression is utilized in video artifact reduction [22] or video super- artifact reduction, which demonstrates the effectiveness of resolution [23]–[25], each frame is restored separately (see CNN model. Fig.1(a)) without considering the previous restored frames. Guo Lu, Xiaoyun Zhang, Li Chen and Zhiyong Gao are with Institude In summary, we aims to build a dynamic filtering scheme (see of Image Communication and Network Engineering, Department of Elec- Fig.1(b)) to exploit accurate temporal information in previous tronic Engineering, Shanghai Jiao Tong University, Shanghai, China. (email: frame for high quality restoration. fluguo2014,xiaoyun.zhang,hilichen, [email protected]). Correspond- ing authors: Xiaoyun Zhang and Zhiyong Gao. Second, spatial/temporal non-local prior information is ben- Wanli Ouyang is with The University of Sydney, SenseTime Computer eficial for image/video restoration task. In the past decades, Vision Research Group, Australia. (email: [email protected]) Dong Xu is with The University of Sydney, Australia. (email: non-local prior has been successfully applied to image restora- [email protected]) tion tasks (e.g., image denoising [26] and image super- 1057-7149 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2019.2943214, IEEE Transactions on Image Processing DEEP NON-LOCAL KALMAN NETWORK 2 resolution [8]). However, it is still unclear how to utilize framework. this powerful information for the learning based approach, especially for video restoration. More importantly, motion clue II. RELATED WORK is crucial for the video restoration task. Most learning based A. Single Image Compression Artifact Reduction video restoration approaches [22], [24] tried to use the optical In the past decades, a lot of methods have been proposed to flow to align the temporal neighbour frames for reconstruction reduce the artifacts introduced by compression. In [28], [29], (see Fig.1(c)). Therefore, the quality of restored frame heavily the hand-crafted filters are proposed for reducing blocking rely on the accuracy of estimated optical flow and may degrade and ringing artifacts. Although these methods utilized sophis- for the complex regions. At the same time, the non-local ticated designed filters, the decoded frames are still far from prior can capture the similarity between two neighbouring satisfactory. In order to improve the performance for artifact frames and can be used as an implicit approach to utilize reduction, the sparse coding technique is also widely utilized the motion clue (see Fig. 1(d)), which is more robust and [8], [9], [30]. For example, Chang et al. [8] employed sparse lightweight. Therefore, it is feasible to enhance the restoration coding and tried to learn powerful representations to reduce the by employing the non-local prior information. compression artifacts. In [30], Wang et al. built a deep dual- In this paper, a deep non-local Kalman filtering network domain based fast restoration model by leveraging the large is proposed for video compression artifact reduction. Our capacity of deep networks and problem-specific knowledge. proposed framework is designed as a post-processing module Recently, a lot of convolutional neural network based meth- and can be easily extended to different compression algo- ods have been proposed for the low-level vision tasks, such as rithms. Specifically, the video artifact reduction is formulated super-resolution, denoising and artifact reduction. For exam- as a Kalman filtering procedure, which means the decoded ple, a CNN based neural network (ARCNN) for JPEG artifact frame can be refined recursively by utilizing the information reduction was proposed in [18]. Inspired by ARCNN, various propagated from previous restored frames. In our deep non- techniques (such as residual learning [31], skip connection local Kalman model, two CNN based neural networks (i.e., [31], [32], batch normalization [17], perceptual loss [33], prediction network and measurement network) are proposed to residual block [20] and generative adversarial network [20]) perform Kalman filtering procedure. The prediction network have been employed to generate high quality reconstructed tries to calculate prior estimation based on the previous frames. In [17], a 20-layer neural network by using batch restored frame, while the measurement network aims at obtain- normalization and residual learning was proposed for both de- ing the measurement through the current decoded frame. Both noising and other low-level vision tasks. In [16], Tai et al. built prediction network and measurement network incorporate the a memory persistent network based on a recursive unit and a spatial/temporal non-local prior information by utilizing the gate unit for image restoration task. It should be mentioned well-designed non-local network. Then the prior estimation that a lot of works [34]–[36] also tried to learn the image prior and the measurement are combined together in the Kalman by using CNN and

Load more