Deep Kalman Filtering Network for Video Compression Artifact Reduction

Guo Lu1, Wanli Ouyang2,3, Dong Xu2, Xiaoyun Zhang1, Zhiyong Gao1, and Ming-Ting Sun4

1 Shanghai Jiao Tong University, China {luguo2014,xiaoyun.zhang,zhiyong.gao}@sjtu.edu.cn 2 The University of Sydney, Australia {wanli.ouyang,dong.xu}@sydney.edu.au 3 The University of Sydney, SenseTime Computer Vision Group 4 University of Washington, USA [email protected]

Abstract. When lossy video compression algorithms are applied, compression artifacts often appear in videos, making decoded videos unpleasant for human visual systems. In this paper, we model the video artifact reduction task as a Kalman filtering procedure and restore decoded frames through a deep Kalman filtering network. Different from the existing works that use the noisy previous decoded frames as temporal information in the restoration problem, we utilize the less noisy previous restored frame and build a recursive filtering scheme based on the Kalman model. This strategy can provide more accurate and consistent temporal information, which produces higher quality restoration results. In addition, the strong prior information of the prediction residual is also exploited for restoration through a well designed neural network. These two components are combined under the Kalman framework and optimized through the deep Kalman filtering network. Our approach can well bridge the gap between model-based methods and learning-based methods by integrating the recursive nature of the Kalman model and the highly non-linear transformation ability of deep neural networks. Experimental results on the benchmark dataset demonstrate the effectiveness of our proposed method.

Keywords: Compression artifact reduction, deep neural network, Kalman model, recursive filtering, video restoration

1 Introduction

Compression artifact reduction methods aim at generating artifact-free images from lossy decoded images. To store and transfer the large number of images and videos on the Internet, image and video compression algorithms (e.g., JPEG, H.264) are widely used [1–3]. However, these algorithms often introduce undesired compression artifacts, such as blocking, blurring and ringing artifacts. Thus, compression artifact reduction has attracted increasing attention and many methods have been developed in the past few decades.


Fig. 1. Different methodologies for video artifact reduction: (a) the traditional pipeline without considering previous restored frames; (b) our Kalman filtering pipeline. Artifact reduction results of (c) Xue et al. [21] and (d) our proposed deep Kalman filtering network. (e) Original frame, (f) decoded frame, (g) quantized prediction residual, and (h) distortion between the original frame and the decoded frame.

Early works use manually designed filters [4, 5] and sparse coding methods [6–9] to remove compression artifacts. Recently, convolutional neural network (CNN) based approaches have been successfully applied to many computer vision tasks [10–20], such as super-resolution [15, 16], denoising [17] and artifact reduction [18–20]. In particular, Dong et al. [18] first proposed a four-layer neural network to eliminate JPEG compression artifacts.

For the video artifact reduction task, our motivations are two-fold. First, the restoration process for the current frame could benefit from the previous restored frames, because the previous restored frame can provide more accurate information than the decoded frame. Temporal information (such as motion clues) from neighbouring frames is therefore more precise and robust, which offers the potential to further improve performance. In addition, the dependence on previous restored frames naturally leads to a recursive pipeline for video artifact reduction: the current frame is recursively restored by potentially utilizing all past restored frames, which means we can leverage effective information propagated from previous estimations. Currently, most state-of-the-art approaches for compression artifact reduction are limited to removing artifacts in a single image [16–19]. Although the video artifact reduction method [21] and video super-resolution methods [22–24] try to combine temporal information for the restoration tasks, these methods ignore the previous restored frame and restore each frame separately, as shown in Fig. 1(a). Therefore, the video artifact reduction performance can be further improved by using an appropriate dynamic filtering scheme.

Second, modern video compression algorithms may contain powerful prior information that can be utilized to restore the decoded frame. It has been observed that practical video compression standards are not optimal according to information theory [9], so the resulting compressed code streams still have redundancies. It is possible, at least theoretically, to improve the restoration results by exploiting the knowledge hidden in the code streams. For video compression algorithms, inter prediction is a fundamental technique used to reduce temporal redundancy. Therefore, the decoded frame consists of two components: the prediction frame and the quantized prediction residual. As shown in Fig. 1(g)-(h), the distortion between the original frame and the decoded frame has a strong relationship with the quantized prediction residual, i.e., regions with high distortion often correspond to high quantized prediction residual values. Therefore, it is possible to enhance the restoration by employing this task-specific prior information.

In this paper, we propose a deep Kalman filtering network (DKFN) for video artifact reduction. The proposed approach can be used as a post-processing technique, and thus can be applied to different compression algorithms.
Specifically, we model the video artifact reduction problem as a Kalman filtering procedure, which can recursively restore the decoded frame and capture information propagated from previous restored frames. To perform Kalman filtering for decoded frames, we build two deep convolutional neural networks: a prediction network and a measurement network. The prediction network aims to obtain a prior estimation based on the previous restored frame. At the same time, we investigate the quantized prediction residual in the coding algorithms, and a novel measurement network incorporating this strong prior information is proposed for robust measurement. After that, the restored frame can be obtained by fusing the prior estimation and the measurement under the Kalman framework. Our proposed approach bridges the gap between model-based methods and learning-based methods by integrating the recursive nature of the Kalman model and the highly non-linear transformation ability of neural networks. Therefore, our approach can restore high quality frames from a series of decoded video frames. To the best of our knowledge, we are the first to develop a deep convolutional neural network under the Kalman filtering framework for video artifact reduction.

In summary, the main contributions of this work are two-fold. First, we formulate the video artifact reduction problem as a Kalman filtering procedure, which can recursively restore the decoded frames. In this procedure, we utilize CNNs to predict and update the state of Kalman filtering. Second, we employ the quantized prediction residual as strong prior information for video artifact reduction through a deep neural network. Experimental results show that our proposed approach outperforms the state-of-the-art methods for reducing video compression artifacts.

2 Related Work

2.1 Single Image Compression Artifact Reduction

Many methods have been proposed to remove compression artifacts. Early methods [25, 26] designed new filters to reduce blocking and ringing artifacts. One disadvantage of these methods is that such manually designed filters cannot sufficiently handle the compression degradation and may over-smooth the decoded images. Learning methods based on sparse coding were also proposed for image artifact reduction [8, 9, 27]. Chang et al. [8] proposed to learn a sparse representation from a training image set, which is used to reduce artifacts introduced by compression. Liu et al. [9] exploited the DCT information and built a sparsity-based dual domain approach.

Recently, deep convolutional neural network based methods have been successfully utilized for low-level computer vision tasks. Dong et al. [18] proposed the artifact reduction CNN (ARCNN) to remove the artifacts from JPEG compression. Inspired by ARCNN, several methods have been proposed to reduce compression artifacts by using various techniques, such as residual learning [28], skip connections [28, 29], batch normalization [17], perceptual loss [30], residual blocks [20] and generative adversarial networks [20]. For example, Zhang et al. [17] proposed a 20-layer neural network based on batch normalization and residual learning to eliminate Gaussian noise with unknown noise level. Tai et al. [16] proposed a memory block, consisting of a recursive unit and a gate unit, to explicitly mine persistent memory through an adaptive learning process.
In addition, many methods [31–33] were proposed to learn the image prior by using CNNs and achieved competitive results for the image restoration tasks.

As mentioned in [9], compressed code streams still have redundancies. Therefore, it is possible to obtain a more robust estimation by exploiting the prior knowledge hidden in the encoder. However, most of the previous works do not exploit this important prior information. Although the works in [9, 19, 27] proposed to combine DCT information, it is not sufficient, especially for the video artifact reduction task. In our work, we further exploit the prior information in the code streams and incorporate the prediction residual into our framework for robust compression artifact reduction.

2.2 Deep Learning for Video Restoration

Due to the popularity of neural networks for image restoration, several CNN based methods [21, 22, 34–36] were also proposed for the video restoration tasks. For video super-resolution, Liao et al. [34] first generated an ensemble of SR drafts via motion compensation, and then used a CNN model to restore the high resolution frame from all drafts. Kappeler et al. [35] estimated optical flow and selected the corresponding patches across frames to train a CNN model for video super-resolution. Based on the spatial transformer network (STN) [36], the works in [21–24] aligned the neighboring frames according to the estimated optical flow or transform parameters and increased the temporal coherence for the video SR task. Tao et al. [22] achieved sub-pixel motion compensation and resolution enhancement with high performance. Xue et al. [21] utilized a joint training strategy to optimize the motion estimation and video restoration tasks and achieved the state-of-the-art results for video artifact reduction.

Compared to the methods for single image compression artifact reduction (see Section 2.1), the video restoration methods exploit temporal information. However, these methods process noisy/low-resolution videos separately without considering the previous restored frames. Therefore, they cannot improve video restoration performance by utilizing more accurate temporal information. In our work, we recursively restore each frame in the videos by leveraging the previous restored frame for video artifact reduction. Although the works in [37, 38] try to combine deep neural networks and the Kalman filter, they are not designed for image/video enhancement tasks.

3 Methodology

We first give a brief introduction to the basic Kalman filter and then describe our formulation for video artifact reduction and the corresponding network design.

Introduction of Denotations. Let $V = \{X_1, X_2, \ldots, X_{t-1}, X_t, \ldots\}$ denote an uncompressed video sequence, where $X_t \in \mathbb{R}^{mn \times 1}$ is the video frame at time step $t$ and $m \times n$ represents the spatial resolution. In order to simplify the description, we only analyze video frames with a single channel, although we consider RGB/YUV channels in our implementation. After compression, $X_t^c$ is the decoded frame of $X_t$. $\hat{X}_t^-$ denotes the prior estimation and $\hat{X}_t$ denotes the posterior estimation for restoring $X_t$ from the decoded frame $X_t^c$. $R_t^c$ is the quantized prediction residual in video coding standards, such as H.264, and $R_t$ is the corresponding unquantized prediction residual.

3.1 Brief Introduction of Kalman Filter

The Kalman filter [39] is an efficient recursive filter that estimates the internal state from a series of noisy measurements. In the artifact reduction task, the internal state is the original image to be restored, and the noisy measurements can be considered as the images with compression artifacts.

Preliminary Formulation. The Kalman filter model assumes that the state $X_t$ at time $t$ evolves from the state $X_{t-1}$ at time $t-1$ according to

$$X_t = A_t X_{t-1} + w_{t-1}, \qquad (1)$$

where $A_t$ is the transition matrix at time $t$ and $w_{t-1}$ is the process noise. The measurement $Z_t$ of the true state $X_t$ is defined as follows,

$$Z_t = H X_t + v_t, \qquad (2)$$

where $H$ is the measurement matrix and $v_t$ represents the measurement noise. However, the system may be non-linear in some complex scenarios. Therefore, a non-linear model for the transition process in Eq. (1) can be formulated as follows,
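For concreteness, the linear model of Eq. (1) and Eq. (2) can be simulated in a few lines of Python; this is an illustrative sketch with toy dimensions and noise levels, not part of the proposed method.

```python
import numpy as np

# Toy simulation of the linear state-space model in Eq. (1) and Eq. (2).
# A 4-dimensional state stands in for a tiny image; all values are illustrative.
rng = np.random.default_rng(0)
n = 4
A = np.eye(n)                    # transition matrix A_t (identity for a static scene)
H = np.eye(n)                    # measurement matrix H
Q = 1e-4 * np.eye(n)             # covariance of the process noise w_{t-1}
U = 1e-2 * np.eye(n)             # covariance of the measurement noise v_t

x = rng.normal(size=n)           # initial true state X_0
for t in range(1, 4):
    w = rng.multivariate_normal(np.zeros(n), Q)
    v = rng.multivariate_normal(np.zeros(n), U)
    x = A @ x + w                # Eq. (1): X_t = A_t X_{t-1} + w_{t-1}
    z = H @ x + v                # Eq. (2): Z_t = H X_t + v_t
    print(t, np.round(x, 3), np.round(z, 3))
```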

$$X_t = f(X_{t-1}, w_{t-1}), \qquad (3)$$

where $f(\cdot)$ is the non-linear transition model. The linear Kalman filter corresponds to Eq. (1) and Eq. (2); the non-linear Kalman filter corresponds to Eq. (3) and Eq. (2).

Kalman Filtering. As shown in Fig. 2(a), Kalman filtering consists of two steps: prediction and update. In the prediction step, it calculates the prior estimation from the posterior estimation of the previous time step.


Fig. 2. (a) Basic Kalman model. (b) Overview of the proposed deep Kalman filtering network for video artifact reduction. $X_t^c$ is the decoded frame at time $t$. $\hat{X}_{t-1}$ represents the restored frame from time $t-1$. The prediction network generates the prior estimate $\hat{X}_t^-$ for the original frame based on $X_t^c$ and $\hat{X}_{t-1}$. The measurement network uses the decoded frame $X_t^c$ and the quantized prediction residual $R_t^c$ to obtain an initial measurement $Z_t$. After that, we can build the posterior estimate by fusing the prior estimate $\hat{X}_t^-$ and the measurement $Z_t$.

For the non-linear model given by Eq. (3) and Eq. (2), the prediction step is accomplished by two sub-steps as follows,

$$\text{Prior state estimation:}\quad \hat{X}_t^- = f(\hat{X}_{t-1}, 0), \qquad (4)$$
$$\text{Covariance estimation:}\quad P_t^- = A_t P_{t-1} A_t^T + Q_{t-1}, \qquad (5)$$

where $Q_{t-1}$ is the covariance of the process noise $w_{t-1}$ at time $t-1$, and $P_t^-$ is a covariance matrix used for the update step. $A_t$ in Eq. (5) is defined as the Jacobian matrix of $f(\cdot)$, i.e., $A_t = \partial f(\hat{X}_{t-1}, 0) / \partial X$ [40]. The prediction procedure for the linear model can be considered as a special case by setting $f(\hat{X}_{t-1}, 0) = A_t \hat{X}_{t-1}$.

In the update step, the Kalman filter calculates the posterior estimate by fusing the prior estimate from the prediction step and the measurement. Details about the update step can be found in Section 3.6.

An overview of Kalman filtering is shown in Fig. 2(a). First, the prediction step uses the estimated state $\hat{X}_{t-1}$ at time $t-1$ to obtain a prior state estimation $\hat{X}_t^-$ at time $t$. Then the prior state estimation $\hat{X}_t^-$ and the measurement $Z_t$ are used by the update step to obtain the posterior estimation of the state, denoted by $\hat{X}_t$. These two steps are performed recursively.
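To make the prediction step concrete, a minimal NumPy sketch of Eq. (4) and Eq. (5) is given below. The non-linear function `f` and the finite-difference Jacobian are illustrative stand-ins; in our framework they are replaced by the temporal mapping network (Section 3.3) and the linearization network (Section 3.4).

```python
import numpy as np

def ekf_predict(x_post, P_post, f, Q, eps=1e-6):
    """Prediction step of the non-linear Kalman filter: Eq. (4) and Eq. (5)."""
    n = x_post.size
    x_prior = f(x_post)                      # Eq. (4): prior state, zero process noise
    # Finite-difference Jacobian A_t = df/dX evaluated at x_post; the proposed
    # framework sidesteps this computation with a linearization network.
    A = np.empty((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        A[:, j] = (f(x_post + dx) - f(x_post)) / eps
    P_prior = A @ P_post @ A.T + Q           # Eq. (5): prior covariance
    return x_prior, P_prior

# Toy usage with a mildly non-linear transition function.
f = lambda x: x + 0.1 * np.tanh(x)
x_prior, P_prior = ekf_predict(np.ones(4), np.eye(4), f, 1e-4 * np.eye(4))
```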

3.2 Overview of our Deep Kalman Filtering Network

Fig. 2(b) shows an overview of the proposed deep Kalman filtering network. In this framework, the previous restored frame $\hat{X}_{t-1}$ and the decoded frame $X_t^c$ are

used for obtaining a prior estimate $\hat{X}_t^-$. Then the measurement $Z_t$ obtained from the measurement network and the prior estimate $\hat{X}_t^-$ are used by the update step for obtaining the posterior estimation $\hat{X}_t$. The proposed DKFN framework follows the Kalman filtering procedure by using the prediction and update steps for obtaining the estimated state $\hat{X}_t$. The main differences from the original Kalman filtering are as follows.

First, in the prior estimation sub-step of the prediction step, we use the temporal mapping sub-network as the non-linear function $f(\cdot)$ in Eq. (3) to obtain the prior state estimation. Specifically, the temporal mapping sub-network takes the previous restored frame $\hat{X}_{t-1}$ and the decoded frame $X_t^c$ as the input and generates the prior estimate $\hat{X}_t^-$ for the current frame. Details are given in Section 3.3.

Second, in the covariance estimation sub-step, the transition matrix $A_t$ in Eq. (5) is approximated by a linearization sub-network. In the conventional non-linear Kalman filter, the Jacobian matrix of the non-linear function $f(\cdot)$ is used to obtain $A_t$. For our CNN implementation of $f(\cdot)$, however, it is too complex to compute, especially when we need to compute the Jacobian matrix for each pixel location. Therefore, we approximate the calculation of the Jacobian matrix by using a linearization sub-network. Details are given in Section 3.4.

Third, in the update procedure, we use a measurement network to generate the measurement. In comparison, the conventional Kalman filter might directly use the decoded frame with compression artifacts as the measurement. Details are given in Section 3.5.

3.3 Temporal Mapping Network

Mathematical Formulation. Based on the temporal characteristics of video sequences, the temporal mapping sub-network is used for implementing the non-linear function $f(\cdot)$ in the prior state estimation as follows:

$$\hat{X}_t^- = F(\hat{X}_{t-1}, X_t^c; \theta_f), \qquad (6)$$

where $\theta_f$ are the trainable parameters. Eq. (6) indicates that the prior estimation of the current frame $X_t$ is related to its estimated temporal neighbouring frame $\hat{X}_{t-1}$ and its decoded frame $X_t^c$ at the current time step. This formulation is based on the following assumptions. First, the temporal evolution characteristics can provide a strong motion clue to predict $X_t$ based on the previous frame $\hat{X}_{t-1}$. Second, due to the existence of complex motion scenarios with occlusion, it is necessary to exploit the information from the decoded frame $X_t^c$ to build a more accurate estimation for $X_t$. Based on this assumption, our formulation in Eq. (6) adds the decoded frame $X_t^c$ as an extra input to the transition function $f(\cdot)$ defined in Eq. (4).

Network Implementation. The temporal mapping sub-network architecture is shown in Fig. 3(a). Specifically, each residual block contains two convolutional layers with the pre-activation structure [41]. We use convolutional layers with 3×3 kernels and 64 output channels in the whole network except the last layer. The last layer is a convolutional layer with one feature map and without a non-linear transform. Generalized divisive normalization (GDN) and inverse GDN (IGDN) [42] are used because they are well-suited for Gaussianizing data from natural images. More training details are discussed in Section 3.7.


Fig. 3. Network architecture of the proposed (a) temporal mapping network, (b) linearization network, and (c) measurement network. For better illustration, we omit the matrix conversion process of $\hat{X}_t$, $X_t^c$ and $R_t^c$ from $\mathbb{R}^{mn \times 1}$ to $\mathbb{R}^{m \times n}$. Here 'Conv,3x3,64' represents the convolution operation with 3×3 kernels and 64 feature maps. 'Reshape, $m \times n$' is the operation that reshapes one matrix to $m \times n$. $\oplus$ and $\otimes$ represent element-wise addition and matrix multiplication.
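A minimal tf.keras sketch of the temporal mapping sub-network trunk is shown below, assuming 256×448 single-channel inputs. To keep the sketch dependency-free, the GDN/IGDN layers described above are replaced by plain ReLU activations; the actual model uses GDN/IGDN.

```python
import tensorflow as tf

def res_block(x, filters=64):
    # Pre-activation residual block [41]. The paper places GDN/IGDN [42] in the
    # trunk; plain ReLU is used here to keep the sketch dependency-free.
    y = tf.keras.layers.ReLU()(x)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same")(y)
    return tf.keras.layers.Add()([x, y])

def temporal_mapping_net(h=256, w=448):
    x_prev = tf.keras.Input((h, w, 1), name="x_hat_prev")  # restored frame at t-1
    x_dec = tf.keras.Input((h, w, 1), name="x_dec")        # decoded frame X_t^c
    y = tf.keras.layers.Concatenate()([x_prev, x_dec])
    y = tf.keras.layers.Conv2D(64, 3, padding="same")(y)
    for _ in range(6):                          # 6 residual blocks, as in Fig. 3(a)
        y = res_block(y)
    # Last layer: one feature map, no non-linear transform.
    y = tf.keras.layers.Conv2D(1, 3, padding="same")(y)
    return tf.keras.Model([x_prev, x_dec], y, name="temporal_mapping")

model = temporal_mapping_net()
```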

3.4 Linearization Network

The linearization network aims to learn a linear transition matrix $A_t$ for the covariance estimation in Eq. (5) adaptively for different image regions. It is non-trivial to calculate the Jacobian matrix of the transition function $F(\cdot)$ and linearize it through a Taylor series. Therefore, we use a simple neural network to learn a transition matrix. Specifically, given the prior estimation $\hat{X}_t^-$, the previous restored frame $\hat{X}_{t-1}$ and the decoded frame $X_t^c$, the problem is expressed in the following way,

$$\hat{X}_t^- = F(\hat{X}_{t-1}, X_t^c; \theta_f) \approx \tilde{A}_t \hat{X}_{t-1}, \quad \text{where } \tilde{A}_t = G(\hat{X}_{t-1}, X_t^c; \theta_m), \qquad (7)$$

where $G(\hat{X}_{t-1}, X_t^c; \theta_m)$ is the linearization network with the parameters $\theta_m$, and $\tilde{A}_t$ is the output of this network. The network architecture is shown in Fig. 3(b). Given $\hat{X}_{t-1}, X_t^c \in \mathbb{R}^{mn \times 1}$, the network generates a transition matrix $\tilde{A}_t \in \mathbb{R}^{mn \times mn}$ as an approximation to $A_t$ in Eq. (5).
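The following is a hedged tf.keras sketch of such a patch-level linearization network, assuming 4×4 patches (the patch size used during training, see Section 3.7); the internal layer configuration is illustrative rather than the exact one in Fig. 3(b).

```python
import numpy as np
import tensorflow as tf

def linearization_net(p=4):
    """Patch-level sketch of G(.; theta_m): maps two p x p patches to a
    (p*p) x (p*p) transition matrix (Eq. (7)). Layer sizes are illustrative."""
    x_prev = tf.keras.Input((p, p, 1))       # patch of the restored frame at t-1
    x_dec = tf.keras.Input((p, p, 1))        # co-located patch of X_t^c
    y = tf.keras.layers.Concatenate()([x_prev, x_dec])
    y = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(y)
    y = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(y)
    y = tf.keras.layers.Flatten()(y)
    y = tf.keras.layers.Dense(p * p * p * p)(y)
    A_tilde = tf.keras.layers.Reshape((p * p, p * p))(y)
    return tf.keras.Model([x_prev, x_dec], A_tilde, name="linearization")

# Usage: \tilde{A}_t should map the flattened previous patch to the prior patch.
net = linearization_net()
prev = np.random.rand(1, 4, 4, 1).astype("float32")
dec = np.random.rand(1, 4, 4, 1).astype("float32")
A_tilde = net([prev, dec]).numpy()[0]        # (16, 16) transition matrix
prior_patch = A_tilde @ prev.reshape(16)     # right-hand side of Eq. (7)
```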

3.5 Measurement Network

Since prediction based coding is one of the most important techniques in video coding standards (such as MPEG-2, H.264 and H.265), we take this coding approach into consideration when designing the measurement network. In prediction based coding, the decoded frame $X_t^c$ can be decoupled into two components, i.e., $X_t^c = X_t^p + R_t^c$, where $X_t^p$ and $R_t^c$ represent the prediction frame and the quantized prediction residual, respectively. Note that quantization is only applied to the prediction residual, so the distortion in compression only comes from $R_t^c$. For non-predictive video codecs, such as JPEG2000, we use an existing warp operation [23, 36] from the previous decoded frame to the current decoded frame, and the difference between them is considered as the prediction residual $R_t^c$. For most video codecs (e.g., H.264 and H.265), we can directly utilize the quantized prediction residual in the code streams.

We obtain the measurement using the quantized prediction residual $R_t^c$ as follows,

$$Z_t = X_t^p + \hat{R}_t, \quad \text{where } \hat{R}_t = M(X_t^c, R_t^c; \theta_z), \qquad (8)$$

where $\hat{R}_t$ is the restored residual, in which the effect of quantization is removed so that $Z_t$ is close to the original image $X_t$. We use a deep neural network as shown in Fig. 3(c) (with the same architecture as Fig. 3(a)) for the function $M(\cdot)$. This network takes the decoded frame $X_t^c$ and the quantized prediction residual $R_t^c$ as the input and estimates the restored residual $\hat{R}_t$. There are two advantages of our formulation for measurement. On one hand, instead of utilizing the decoded frame $X_t^c$ as the measurement, our formulation avoids explicitly modeling the complex relationship between original frames and decoded frames. On the other hand, most of the existing artifact reduction methods can be embedded into our model as the measurement method, which provides a flexible framework to obtain a more accurate measurement value.

3.6 Update Step

Given the prior state estimation $\hat{X}_t^-$ from the temporal mapping network (Section 3.3), the transition matrix $\tilde{A}_t$ obtained from the linearization network (Section 3.4), and the measurement $Z_t$ obtained from the measurement network (Section 3.5), we can use the following steps to obtain the posterior estimation of the restored image:

$$P_t^- = \tilde{A}_t P_{t-1} \tilde{A}_t^T + Q_{t-1}, \qquad (9)$$
$$K_t = P_t^- H^T (H P_t^- H^T + U_t)^{-1}, \qquad (10)$$
$$\hat{X}_t = \hat{X}_t^- + K_t (Z_t - H \hat{X}_t^-), \qquad (11)$$
$$P_t = (I - K_t H) P_t^-, \qquad (12)$$

where $\hat{X}_t$ represents the posterior estimation of the image $X_t$ (Eq. (9) corresponds to the covariance estimation and is listed here for better presentation). $P_t^-$ and $P_t$ are the estimated state covariance matrices for the prior estimation and the posterior estimation, respectively. $K_t$ is the Kalman gain at time $t$.

$H$ is the measurement matrix defined in Eq. (2) and is assumed to be an identity matrix in this work. $Q_{t-1}$ and $U_t$ are the process noise covariance matrix and the measurement noise covariance matrix, respectively. We assume $Q_{t-1}$ and $U_t$ to be constant over time. For more details about the update procedure of Kalman filtering, please refer to [40].

Discussion. Our approach can address the error accumulation problem of the recursive pipeline through the adaptive Kalman gain. For example, when errors accumulate in the previous restored frames, the degree of reliability of the prior estimation (i.e., information from the previous frame) decreases, and the final result depends more on the measurement (i.e., the current frame).
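Putting Sections 3.3–3.6 together, one recursive restoration step can be sketched as follows. For readability the sketch uses per-pixel (element-wise) covariances and $H = I$; the three sub-networks are passed in as stand-in callables rather than the trained models.

```python
import numpy as np

def dkfn_step(x_prev_hat, P_prev, x_dec, r_dec, nets, Q, U):
    """One recursive DKFN restoration step with H = I and per-pixel
    (element-wise) covariances; `nets` holds stand-ins for the learned
    temporal mapping, linearization and measurement networks."""
    # Prediction (Sections 3.3 and 3.4).
    x_prior = nets["temporal_mapping"](x_prev_hat, x_dec)      # Eq. (4)/(6)
    a = nets["linearization"](x_prev_hat, x_dec)               # per-pixel ~A_t
    P_prior = a * P_prev * a + Q                               # Eq. (9)

    # Measurement (Section 3.5): X_t^p = X_t^c - R_t^c, Z_t = X_t^p + R_hat_t.
    z = (x_dec - r_dec) + nets["measurement"](x_dec, r_dec)    # Eq. (8)

    # Update (Section 3.6) with H = I.
    K = P_prior / (P_prior + U)                                # Eq. (10)
    x_post = x_prior + K * (z - x_prior)                       # Eq. (11)
    P_post = (1.0 - K) * P_prior                               # Eq. (12)
    return x_post, P_post

# Trivial stand-ins so the sketch runs end-to-end on random frames.
nets = {
    "temporal_mapping": lambda xp, xd: 0.5 * (xp + xd),
    "linearization": lambda xp, xd: np.ones_like(xp),
    "measurement": lambda xd, rd: np.zeros_like(rd),
}
frame = np.random.rand(256, 448)
x_post, P_post = dkfn_step(frame, np.ones_like(frame), frame, 0.1 * frame,
                           nets, Q=1e-4, U=1e-2)
```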

3.7 Training Strategy

There are three sets of trainable parameters, $\theta_f$, $\theta_m$ and $\theta_z$, in our approach. First, the parameters $\theta_f$ in the temporal mapping network are optimized as follows,

$$L_f(\theta_f) = \| X_t - F(\hat{X}_{t-1}, X_t^c; \theta_f) \|_2^2. \qquad (13)$$

Note that the minimization procedure for Eq. (13) requires the restored frame $\hat{X}_{t-1}$ of the previous time step, which leads to a chicken-and-egg problem. A straightforward method is to feed several frames of a clip into the network and train on all the input frames in each iteration. However, this strategy increases GPU memory consumption significantly, and simultaneously training multiple frames of a video clip is non-trivial. Alternatively, we adopt an on-line update strategy. Specifically, the estimation result $\hat{X}_t$ in each iteration is saved in a buffer. In the following iterations, $\hat{X}_t$ in the buffer is used to provide more accurate temporal information when estimating $X_{t+1}$. Therefore, each training sample in the buffer is updated once per epoch, and we only need to optimize one frame of a video clip in each iteration, which is more efficient.

After that, the parameters $\theta_f$ are fixed and we optimize the linearization network $G(\theta_m)$ by using the following loss function:

$$L_m(\theta_m) = \| \hat{X}_t^- - G(\hat{X}_{t-1}, X_t^c; \theta_m) \hat{X}_{t-1} \|_2^2. \qquad (14)$$

Note that we use a small patch size (4×4) to reduce the computational cost when optimizing $\theta_m$. Then, we train the measurement network and optimize $\theta_z$ based on the following loss function,

$$L_z(\theta_z) = \| X_t - (M(X_t^c, R_t^c; \theta_z) + X_t^p) \|_2^2. \qquad (15)$$

Finally, we fine-tune the whole deep Kalman filtering network based on the loss $L$ defined as follows,

$$L(\theta) = \| X_t - \hat{X}_t \|_2^2, \qquad (16)$$

where $\theta$ are the trainable parameters in the deep Kalman filtering network.
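A sketch of the on-line update strategy for Eq. (13) is given below; the model interface and buffer layout are hypothetical and only illustrate how the restored frame of each clip is cached and reused as the temporal input in later iterations.

```python
import numpy as np

# Hypothetical sketch of the on-line update strategy: the buffer caches the
# latest restored frame of every clip, so that later iterations can use it as
# the temporal input instead of re-processing the whole clip.
buffer = {}  # clip_id -> last restored frame \hat{X}_t

def train_one_frame(model, clip_id, x_dec_t, x_orig_t, x_dec_prev):
    # Fall back to the previous decoded frame until a restored one is cached.
    x_prev = buffer.get(clip_id, x_dec_prev)
    x_restored = model(x_prev, x_dec_t)              # F(.; theta_f) in Eq. (6)
    loss = np.mean((x_orig_t - x_restored) ** 2)     # Eq. (13), MSE
    # ... backpropagate `loss` and update theta_f here ...
    buffer[clip_id] = x_restored                     # refresh the buffer entry
    return loss
```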

4 Experiments

To demonstrate the effectiveness of the proposed model for video artifact reduction, we perform experiments on the benchmark dataset Vimeo-90K [21]. Our approach is implemented on the TensorFlow [43] platform. It takes 22 hours to train the whole model using two Titan X GPUs.

Fig. 4. Quantitative (PSNR/SSIM) and visual comparison of JPEG2000 artifact reduction on the Vimeo dataset for q=20 (from left to right: JPEG2000, ARCNN, ToFlow, Ours).

4.1 Experimental Setup

Dataset. The Vimeo-90K dataset [21] was recently built for evaluating different video processing tasks, such as video denoising, video super-resolution (SR) and video artifact reduction. It consists of 4,278 videos with 89,800 independent clips that are different from each other in content. All frames have a resolution of 448×256. For video compression artifact reduction, we follow [21] and use 64,612 clips for training and 7,824 clips for performance evaluation. In this section, PSNR and SSIM [44] are utilized as the evaluation metrics.

To demonstrate the effectiveness of the proposed method, we generate compressed/decoded frames through two coding settings, i.e., the HEVC codec (x265) with quantization parameters qp=32 and qp=37, and the JPEG2000 codec with quality q=20 and q=40.

Implementation Details. For model training, we use the Adam solver [45] with an initial learning rate of 0.001, β1 = 0.9 and β2 = 0.999. The learning rate is divided by 10 after every 20 epochs. We apply gradient clipping with a global norm of 0.001 to stabilize the training process. The mini-batch size is set to 32. We use the method in [46] for weight initialization. Our approach takes 0.15s to restore a color image with the size of 448×256.
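As a reference, these optimization settings map onto TensorFlow roughly as follows; `model` and `loss_fn` are placeholders, and the step-decay schedule is omitted for brevity.

```python
import tensorflow as tf

# Adam with the hyper-parameters described above.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999)

def train_step(model, loss_fn, inputs, target):
    with tf.GradientTape() as tape:
        loss = loss_fn(target, model(inputs))
    grads = tape.gradient(loss, model.trainable_variables)
    grads, _ = tf.clip_by_global_norm(grads, 0.001)  # global-norm gradient clipping
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```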

We first train the temporal mapping network using the loss $L_f$ in Eq. (13). After 40 epochs, we fix the parameters $\theta_f$ and train the linearization network using the loss $L_m$. Then we train the measurement network using the loss $L_z$ in Eq. (15). After 40 epochs, the training loss becomes stable. Finally, we fine-tune the whole model. In the following experiments, we train different models for different codecs and quality levels.


Fig. 5. Quantitative (PSNR/SSIM) and visual comparison of different methods for HEVC artifact reduction on the Vimeo dataset at qp=37 (from left to right: HEVC, ARCNN, HEVC-LF, Ours).

4.2 Experimental Results

Comparison with the State-of-the-art Methods. To demonstrate the effectiveness of our approach, we compare it with several recent image and video artifact reduction methods: ARCNN [18], DnCNN [17], V-BM4D [47] and ToFlow [21]. In addition, modern video codecs already have a default artifact reduction scheme. For example, HEVC utilizes a loop filter [1] (HEVC-LF) to reduce blocking artifacts. This technique is also included for comparison. For ARCNN [18] and DnCNN [17], we use the code provided by the authors and train their models on the Vimeo training dataset. For V-BM4D and ToFlow, we directly cite their results from [21]. The results of HEVC-LF are generated by enabling the loop filter and SAO [1] in the HEVC codec (x265). For a fair comparison with the existing approaches, we follow [21] and only evaluate the 4th frame of each clip in the Vimeo dataset. The quantitative results are reported in Table 1 and Table 2. As we can see, our proposed approach outperforms the state-of-the-art methods by more than 0.6 dB in terms of PSNR.

Table 1. Average PSNR/SSIM results on the Vimeo dataset for JPEG2000 artifact reduction (q=20, 40).

| Dataset | Setting | ARCNN [18] | DnCNN [17] | V-BM4D [47] | ToFlow [21] | Ours |
|---------|---------|------------|------------|-------------|-------------|------|
| Vimeo | q=20 | 36.11/0.960 | 37.26/0.967 | 35.75/0.959 | 36.92/0.966 | 37.93/0.971 |
| Vimeo | q=40 | 34.21/0.944 | 35.22/0.953 | 33.99/0.940 | 34.97/0.953 | 35.88/0.958 |

Table 2. Average PSNR/SSIM results on the Vimeo test sequences for HEVC artifact reduction (qp=32, 37).

| Dataset | Setting | ARCNN [18] | DnCNN [17] | HEVC-LF [1] | Ours |
|---------|---------|------------|------------|-------------|------|
| Vimeo | qp=32 | 34.87/0.954 | 35.58/0.961 | 34.19/0.950 | 35.81/0.962 |
| Vimeo | qp=37 | 32.54/0.930 | 33.01/0.936 | 31.98/0.923 | 33.23/0.939 |

Table 3. Ablation study of the proposed deep Kalman filtering method on the Vimeo-90K dataset. The results with and without using the prediction residual (PR) in the measurement network (MN) are reported in the first two rows. The results with and without using the recursive filtering (RF) scheme in the temporal mapping network (TM) are reported in the 3rd and 4th rows. Our full model is MN+PR+TM+RF (the 5th row).

| MN | PR | TM | RF | PSNR/SSIM |
|----|----|----|----|-----------|
| ✓ | | | | 37.15/0.967 |
| ✓ | ✓ | | | 37.49/0.968 |
| | | ✓ | | 37.35/0.967 |
| | | ✓ | ✓ | 37.76/0.970 |
| ✓ | ✓ | ✓ | ✓ | 37.93/0.971 |

Qualitative comparisons of ARCNN [18], ToFlow [21], HEVC-LF [1] and ours are shown in Fig. 4 and Fig. 5. In these figures, blocking artifacts exist in the JPEG2000/HEVC decoded frames; our proposed method successfully removes these artifacts while other methods still produce observable artifacts. For example, the equipment (the fourth row in Fig. 4) and the railing (the fourth row in Fig. 5) both have complex texture and structure; our method can well recover these complex regions while the other baseline methods may fail.

Ablation Study of the Measurement Network (MN). In this subsection, we investigate the effectiveness of the proposed measurement network. Note that the output of our measurement network itself can be readily used as the artifact reduction result, so the results in this subsection are obtained without using the temporal mapping network. In order to validate that the prediction residual can serve as important prior information for improving the performance, we train another model with the same architecture but without using the prediction residual (PR) as the input. Therefore, it generates restored frames by only using the decoded frames as the input. Quantitative results on the Vimeo-90K dataset are listed in Table 3. When compared with our simplified model without the prediction residual (see the 1st row), our simplified model with the prediction residual (MN+PR, see the 2nd row) boosts the performance by 0.34 dB in terms of PSNR. This demonstrates that incorporating strong prior information can improve the restoration performance.

Ablation Study of the Temporal Mapping Network (TM). We further evaluate the effectiveness of the temporal mapping network. Note that the output of our temporal mapping network itself can also be readily used for video artifact reduction, so the results in this subsection are obtained without using the measurement network. For comparison, we train another model that utilizes the same network architecture as our temporal mapping network but takes the concatenation of $X_t^c$ and $X_{t-1}^c$ as the input. Namely, it restores the current frame without considering previous restored frames. The quantitative results are reported in Table 3.

When compared with our simplified model without using recursive filtering (RF) (see the 3rd row), our simplified model with recursive filtering (TM+RF, see the 4th row) significantly improves the quality of the restored frames by 0.41 dB in terms of PSNR. A possible explanation is that our recursive filtering scheme can effectively leverage information from previous restored frames, which provides more accurate pixel information. It is worth mentioning that the result in the 5th row is the best, as we combine the outputs from both the measurement network and the temporal mapping network through the Kalman update process.

Cross Dataset Validation. The results on the HEVC standard sequences (Class D) and the MPI Sintel Flow dataset in Table 4 show that our approach performs better than the state-of-the-art methods.

Table 4. Average PSNR/SSIM results evaluated on two new datasets for video artifact reduction (JPEG2000, q=20) for cross dataset validation.

| Test Dataset | ToFlow [21] | DnCNN [17] | Ours |
|--------------|-------------|------------|------|
| HEVC Sequences | 32.37/0.948 | 33.19/0.953 | 33.83/0.958 |
| MPI Sintel dataset | 34.78/0.959 | 36.40/0.969 | 37.01/0.973 |

Comparison with the RNN Based Approach. We use a recurrent network to completely replace the Kalman filter in Fig. 2. Specifically, the same CNN architecture is used to extract features from the distorted frames at each time step, and a convolutional gated recurrent unit (GRU) module is used to restore the original image based on these features. The result of our work is 37.93 dB, which outperforms the recurrent network based method (37.10 dB). One possible explanation is that it is difficult to train the recurrent network, while our pipeline makes the network easier to learn by using the domain knowledge of the prediction residual and combining both the measurement and the prior estimation.

5 Conclusions

In this paper, we have proposed a deep Kalman filtering network for video artifact reduction. We model the video compression artifact reduction task as a Kalman filtering procedure and update the state function by learning deep neural networks. Our framework can take advantage of both the recursive nature of Kalman filtering and the representation learning ability of neural networks. Experimental results have demonstrated the superiority of our deep Kalman filtering network over the state-of-the-art methods. Our methodology can also be extended to solve other low-level computer vision tasks, such as video super-resolution or denoising, which will be studied in the future.

Acknowledgement: This work was supported by a research project from SenseTime. This work was also supported in part by the National Natural Science Foundation of China (61771306, 61521062), the Natural Science Foundation of Shanghai (18ZR1418100), the Chinese National Key S&T Special Program (2013ZX01033001-002-002), STCSM Grant 17DZ1205602, and the Shanghai Key Laboratory of Digital Media Processing and Transmissions (STCSM 18DZ2270700).

References

1. Sullivan, G.J., Ohm, J., Han, W.J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. TCSVT 22(12) (2012)
2. Schwarz, H., Marpe, D., Wiegand, T.: Overview of the scalable video coding extension of the H.264/AVC standard. TCSVT 17(9) (2007)
3. Lu, G., Zhang, X., Chen, L., Gao, Z.: Novel integration of frame rate up conversion and HEVC coding based on rate-distortion optimization. TIP 27(2) (2018)
4. Shen, M.Y., Kuo, C.C.J.: Review of postprocessing techniques for compression artifact removal. Journal of Visual Communication and Image Representation 9(1) (1998)
5. Reeve, H.C., Lim, J.S.: Reduction of blocking effects in image coding. Optical Engineering 23(1) (1984)
6. Jung, C., Jiao, L., Qi, H., Sun, T.: Image deblocking via sparse representation. Signal Processing: Image Communication 27(6) (2012)
7. Choi, I., Kim, S., Brown, M.S., Tai, Y.W.: A learning-based approach to reduce JPEG artifacts in image matting. In: ICCV. (2013)
8. Chang, H., Ng, M.K., Zeng, T.: Reducing artifacts in JPEG decompression via a learned dictionary. IEEE Transactions on Signal Processing 62(3) (2014)
9. Liu, X., Wu, X., Zhou, J., Zhao, D.: Data-driven sparsity-based restoration of JPEG-compressed images in dual transform-pixel domain. In: CVPR. (2015)
10. Ouyang, W., Wang, X.: Joint deep learning for pedestrian detection. In: ICCV. (2013)
11. Ouyang, W., Wang, X., Zeng, X., Qiu, S., Luo, P., Tian, Y., Li, H., Yang, S., Wang, Z., Loy, C.C., et al.: DeepID-Net: Deformable deep convolutional neural networks for object detection. In: CVPR. (2015)
12. Wang, L., Ouyang, W., Wang, X., Lu, H.: Visual tracking with fully convolutional networks. In: ICCV. (2015)
13. Zhao, R., Ouyang, W., Wang, X.: Unsupervised salience learning for person re-identification. In: CVPR. (2013)
14. Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: ECCV. (2014)
15. Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2) (2016)
16. Tai, Y., Yang, J., Liu, X., Xu, C.: MemNet: A persistent memory network for image restoration. In: CVPR. (2017)
17. Zhang, K., Zuo, W., Chen, Y., Meng, D., Zhang, L.: Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. TIP 26(7) (2017)
18. Dong, C., Deng, Y., Change Loy, C., Tang, X.: Compression artifacts reduction by a deep convolutional network. In: ICCV. (2015)
19. Guo, J., Chao, H.: Building dual-domain representations for compression artifacts reduction. In: ECCV. (2016)
20. Galteri, L., Seidenari, L., Bertini, M., Del Bimbo, A.: Deep generative adversarial compression artifact removal. arXiv preprint arXiv:1704.02518 (2017)
21. Xue, T., Chen, B., Wu, J., Wei, D., Freeman, W.T.: Video enhancement with task-oriented flow. arXiv preprint arXiv:1711.09078 (2017)
22. Tao, X., Gao, H., Liao, R., Wang, J., Jia, J.: Detail-revealing deep video super-resolution. In: ICCV. (2017)

23. Liu, D., Wang, Z., Fan, Y., Liu, X., Wang, Z., Chang, S., Huang, T.: Robust video super-resolution with learned temporal dynamics. In: CVPR. (2017)
24. Caballero, J., Ledig, C., Aitken, A., Acosta, A., Totz, J., Wang, Z., Shi, W.: Real-time video super-resolution with spatio-temporal networks and motion compensation. In: CVPR. (2017)
25. Foi, A., Katkovnik, V., Egiazarian, K.: Pointwise shape-adaptive DCT for high-quality denoising and deblocking of grayscale and color images. TIP 16(5) (2007)
26. Zhang, X., Xiong, R., Fan, X., Ma, S., Gao, W.: Compression artifact reduction by overlapped-block transform coefficient estimation with block similarity. TIP 22(12) (2013)
27. Wang, Z., Liu, D., Chang, S., Ling, Q., Yang, Y., Huang, T.S.: D3: Deep dual-domain based fast restoration of JPEG-compressed images. In: CVPR. (2016)
28. Svoboda, P., Hradis, M., Barina, D., Zemcik, P.: Compression artifacts removal using convolutional neural networks. arXiv preprint arXiv:1605.00366 (2016)
29. Mao, X.J., Shen, C., Yang, Y.B.: Image denoising using very deep fully convolutional encoder-decoder networks with symmetric skip connections. arXiv preprint (2016)
30. Guo, J., Chao, H.: One-to-many network for visually pleasing compression artifacts reduction. In: CVPR. (2017)
31. Zhang, K., Zuo, W., Gu, S., Zhang, L.: Learning deep CNN denoiser prior for image restoration. arXiv preprint (2017)
32. Chang, J.R., Li, C.L., Poczos, B., Kumar, B.V., Sankaranarayanan, A.C.: One network to solve them all—solving linear inverse problems using deep projection models. arXiv preprint (2017)
33. Bigdeli, S.A., Zwicker, M., Favaro, P., Jin, M.: Deep mean-shift priors for image restoration. In: NIPS. (2017)
34. Liao, R., Tao, X., Li, R., Ma, Z., Jia, J.: Video super-resolution via deep draft-ensemble learning. In: ICCV. (2015)
35. Kappeler, A., Yoo, S., Dai, Q., Katsaggelos, A.K.: Video super-resolution with convolutional neural networks. IEEE Transactions on Computational Imaging 2(2) (2016)
36. Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: NIPS. (2015)
37. Shashua, S.D.C., Mannor, S.: Deep robust Kalman filter. arXiv preprint arXiv:1703.02310 (2017)
38. Krishnan, R.G., Shalit, U., Sontag, D.: Deep Kalman filters. arXiv preprint arXiv:1511.05121 (2015)
39. Kalman, R.E.: A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82(1) (1960)
40. Haykin, S.S., et al.: Kalman filtering and neural networks. Wiley Online Library (2001)
41. He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: ECCV. (2016)
42. Ballé, J., Laparra, V., Simoncelli, E.P.: Density modeling of images using a generalized normalization transformation. arXiv preprint arXiv:1511.06281 (2015)
43. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
44. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. TIP 13(4) (2004)

45. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
46. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics. (2010)
47. Maggioni, M., Boracchi, G., Foi, A., Egiazarian, K.: Video denoising, deblocking, and enhancement through separable 4-D nonlocal spatiotemporal transforms. TIP 21(9) (2012)